Platform AI vs. Community-Governed AI — A Structural Analysis

Series: Community-Scale AI Governance — A Research Perspective on the Village Platform (Article 2 of 5) Author: My Digital Sovereignty Ltd Date: June 2026 Licence: CC BY 4.0 International

The Corpus Problem

The governance properties of an AI system are substantially determined by its training corpus. This is not a secondary concern — it is a first-order architectural property.

Commercial LLMs from major platform providers are trained on web-scale corpora: billions of documents scraped from the open internet, supplemented by licensed datasets and proprietary collections. The resulting systems are broad in capability and correspondingly broad in their distributional assumptions.

The internet, as a training corpus, over-represents certain domains and perspectives:

English-language content, and within English, American usage and conventions
Commercial and marketing registers
Individualistic framing of social and ethical questions
Secular therapeutic language for topics traditionally addressed by communal or spiritual traditions
Technical and professional discourse
Content produced in the last two decades, with limited historical depth

It correspondingly under-represents:

Communal decision-making traditions and governance practices
Domain-specific professional registers (liturgical, ecological, indigenous, cooperative)
Oral traditions and cultures with limited digital presence
The operational records of small organisations: meeting minutes, internal correspondence, community bulletins
Non-Western ethical and governance frameworks

This distributional imbalance is not correctable by scale. A larger web corpus amplifies the same biases. It is a structural property of the data source, not a sampling error.

Domain-Specific AI: The Alternative and Its Limitations

The Village platform takes a different architectural approach: a smaller model, trained on a layered corpus that prioritises domain-specific content over breadth.

The training architecture has three layers:

Platform layer. Common operational knowledge shared across all deployments — how the platform functions, what features are available, navigational assistance. This layer is analogous to a shared ontology across instances.

Community layer. Content specific to a particular deployment — the records, communications, and documents produced by the community that operates the instance. This layer is what differentiates one deployment from another and grounds the model's outputs in local context.

Consent layer. A structural constraint: no content enters the training corpus without explicit, auditable consent from the content creator. This is enforced architecturally, not by policy.

The resulting system is narrower than a commercial LLM. It cannot discuss topics outside its training domain with any competence. It will not produce general-purpose creative writing or engage in wide-ranging conversation. What it offers instead is outputs grounded in a specific community's records, verifiable against those records.

Limitations of this approach

Several limitations should be acknowledged:

Reduced generality. A domain-specific model cannot assist with tasks outside its training domain. Community members who need general-purpose AI assistance must use a separate system.

Corpus size constraints. Small communities produce limited content. A model trained on a few hundred documents has a correspondingly narrow knowledge base. The quality of outputs is directly constrained by the volume and quality of community content.

Retraining lag. The community layer requires periodic retraining to incorporate new content. Between retraining cycles, the model's knowledge is stale. The current retraining cadence (weekly during beta) may be insufficient for rapidly changing contexts.

Fine-tuning fragility. Domain-specific fine-tuning overlays new patterns on a base model's existing distribution. Under certain query conditions — particularly novel or complex questions — the base model's patterns may reassert themselves, a phenomenon known in the literature as catastrophic forgetting. The extent to which this affects governance-relevant outputs in practice is not yet well-characterised for this system.

Guardian Agents: External Verification Architecture

The Village platform does not rely solely on training to ensure output quality. It interposes a verification layer — termed "Guardian Agents" — between the model's outputs and the end user.

The Guardian Agent architecture comprises four independent verification mechanisms:

Semantic grounding verification. The model's output is compared against the community's document corpus using embedding-based similarity measures. Outputs that lack sufficient grounding in actual records are flagged or suppressed.

Claim-level decomposition. The output is decomposed into individual claims, each verified independently. This addresses the common failure mode where a response contains a mixture of grounded and ungrounded assertions.

Behavioural drift monitoring. A longitudinal monitoring layer tracks patterns in the model's outputs over time, detecting systematic shifts in tone, framing, or accuracy that may indicate distributional drift or degradation.

Adaptive feedback integration. Community members' feedback (both explicit ratings and moderator corrections) is incorporated into the verification thresholds. This creates a feedback loop where the verification system becomes more attuned to the community's expectations over time.

Counter-arguments and failure modes

The Guardian Agent architecture is a research contribution, not a solved problem. Several counter-arguments and failure modes warrant examination:

Semantic similarity is not truth. Embedding-based verification measures semantic proximity, not factual accuracy. A statement that is semantically close to a source document may still be factually wrong — paraphrases can invert meaning while preserving embedding similarity. The claim-level decomposition layer partially addresses this, but false positives and false negatives remain.

Verification coverage is incomplete. The guardians can verify claims against existing records. They cannot verify claims about topics not covered by the community's records. For novel questions, the system faces a choice between refusing to answer (conservative but unhelpful) and generating unverifiable outputs (helpful but unguarded). The current implementation flags low-confidence responses rather than suppressing them, which transfers the verification burden to the end user.

Feedback loops can introduce bias. The adaptive feedback mechanism assumes that community feedback is a reliable signal. In practice, feedback may be sparse, biased toward certain user demographics, or reflect preferences that conflict with accuracy. The system does not currently distinguish between feedback that corrects factual errors and feedback that reflects aesthetic or ideological preferences.

Computational overhead. Four-layer verification adds latency and computational cost. For time-sensitive queries, this overhead may degrade the user experience to the point where the system is not used — a governance failure by non-adoption rather than by technical error.

Delegation and the Locus of Control

The preceding analysis concerns the corpus — whose patterns the system carries. The agentic turn (Article 1) introduces a second, orthogonal axis: the locus of control once the system acts rather than merely advises. Delegating a task to a commercial agent transfers not only data but operative authority — the capacity to take actions (transactions, communications, record modifications) on infrastructure the deploying party does not control, with intervention points determined by the provider. Recent commentary characterises this as an accountability and data-privacy problem in the guise of an AI-capability question. The community-governed alternative does not eliminate delegation; it bounds it. Actions are confined to the community's own boundary, restricted to reversible operations, and gated by the boundary-enforcement mechanism (Article 3) such that consequential or irreversible actions require human authorisation. The architectural claim is not that the agent is more capable, but that the locus of control remains inspectable and recoverable — a property orthogonal to raw capability and, in this analysis, prior to it.

The Trade-Off: An Analytical Framework

The choice between commercial AI and community-governed AI is not a choice between a good option and a bad option. It is a choice between different trade-off profiles:

Dimension	Commercial Platform AI	Community-Governed AI
Breadth of capability	High	Low (domain-specific)
Distributional bias	Reflects web-scale corpus	Reflects community corpus
Verifiability	Low (proprietary, opaque)	Higher (open-source, auditable)
Data sovereignty	Data flows to provider	Data remains within community boundary
Verification architecture	Provider-controlled	Community-inspectable
Computational resources	Substantial (cloud-scale)	Constrained (local or small-cloud)
Generalisability	High	Low (by design)
Action authority (agentic)	Provider-mediated, few intervention points	Community-bounded, human-gated

Neither profile is categorically superior. The appropriate choice depends on the governance priorities of the deploying community — a point that itself constitutes a governance decision.

Replicability and Generalisability

A question of particular interest to the research community is whether the Village architecture is replicable and generalisable beyond its current deployment context.

The platform is designed for multi-tenant operation across diverse community types (the current implementation supports nine product types, from parishes to conservation groups to alumni associations). The vocabulary system adapts terminology to the community context, which suggests a degree of generalisability in the platform layer.

However, several factors limit confident claims about generalisability:

The system has been deployed with a small number of communities. The evidence base for cross-context effectiveness is thin.
The Guardian Agent thresholds are calibrated against specific content types. Whether they generalise to communities with fundamentally different content structures is untested.
The consent architecture assumes a membership model with authenticated access. Communities with different access models (open-access, anonymous contribution) would require architectural modifications.

These are open research questions, not resolved design decisions.

This is Article 2 of 5 in the "Community-Scale AI Governance" series. For the full Guardian Agents architecture, visit Village AI on Agentic Governance.

Previous: What AI Is, What It Is Not, and What Remains Uncertain Next: Why Policy-Based AI Governance Is Insufficient — The Structural Alternative