New to AI governance? Start with What Is AI, Really? for foundational context.
Every AI system today has the same problem: it sounds confident whether it is right or wrong.
Ask a large language model to summarise a community discussion and it will produce fluent, well-structured prose. It will not tell you which claims it verified against your actual records and which it invented to fill gaps. It will not distinguish between "I found this in your FAQ" and "I generated this because it sounded plausible." It will present everything with the same calm authority.
For a community that trusts its tools — a parish preserving its history, a whānau maintaining whakapapa, a neighbourhood coordinating care — this is not a minor inconvenience. It is a fundamental accountability failure. The AI produces answers. Nobody can see how it arrived at them. And the better the AI gets, the harder the errors are to spot.
Guardian Agents are Village's answer to this problem.
What Guardian Agents Do
Guardian Agents are a system of four interlocking layers that wrap around every AI interaction in your Village. They do not replace human judgement. They provide the evidence humans need to exercise it.
Reviewers check every AI response before it reaches you. They compare what the AI says against your community's actual records — your FAQ, your stories, your documents. Each claim gets a groundedness score: how well does this statement match what your community actually knows? Responses that cannot be verified get flagged before delivery.
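To make that concrete, here is a minimal sketch of how a per-claim groundedness check can work, assuming embeddings are computed locally. The names, types, and threshold are illustrative, not Village's actual API:

```typescript
// Minimal sketch: score a claim against community records via embedding
// cosine similarity. All names and the threshold value are illustrative.

type CommunityRecord = { title: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Groundedness = best similarity across all of the community's records.
function groundedness(claimEmbedding: number[], records: CommunityRecord[]): number {
  return records.reduce((best, r) => Math.max(best, cosine(claimEmbedding, r.embedding)), 0);
}

// Claims scoring below the tenant's threshold are flagged before delivery.
const GROUNDEDNESS_THRESHOLD = 0.75; // illustrative; real thresholds are tenant-scoped
const isFlagged = (score: number) => score < GROUNDEDNESS_THRESHOLD;
```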
Monitors watch for patterns that individual reviews cannot catch. If the AI starts drifting off-topic across multiple conversations, if its accuracy drops over a period of hours, if unusual request patterns emerge — Monitors detect the trend and alert your moderators. No single bad response triggers this; it takes a pattern.
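A sketch of what "it takes a pattern" might look like in practice, assuming the Monitor keeps a rolling history of groundedness scores. The window size and floor are invented for illustration:

```typescript
// No single bad response triggers an alert; a sustained drop across a
// rolling window does. Window size and floor are illustrative values.

function shouldAlert(recentScores: number[], windowSize = 20, floor = 0.7): boolean {
  if (recentScores.length < windowSize) return false; // not enough history yet
  const window = recentScores.slice(-windowSize);
  const mean = window.reduce((sum, s) => sum + s, 0) / windowSize;
  return mean < floor; // alert on the trend, not the outlier
}
```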
Protectors act before the AI even processes a request. They screen for prompt injection attacks, enforce rate limits, and verify that requests stay within your community's boundaries. If someone tries to manipulate the AI into ignoring its guidelines, Protectors catch it at the door.
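A simplified sketch of that gate, assuming a pattern list and a per-minute rate limit. Real injection screening is considerably more sophisticated than a few regular expressions; this only illustrates the "caught at the door" flow:

```typescript
// Hypothetical pre-processing gate: rate limit first, then injection
// patterns. Patterns and limits here are illustrative only.

type Verdict = { allowed: boolean; reason?: string };

const INJECTION_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /disregard your guidelines/i,
];

function screenRequest(text: string, requestsThisMinute: number, limit = 30): Verdict {
  if (requestsThisMinute >= limit) return { allowed: false, reason: "rate limit exceeded" };
  for (const pattern of INJECTION_PATTERNS) {
    if (pattern.test(text)) return { allowed: false, reason: "possible prompt injection" };
  }
  return { allowed: true };
}
```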
Adaptive Learning is where Guardian Agents become genuinely different. When your moderators review alerts and make decisions — confirming a genuine issue or dismissing a false alarm — the system learns from those decisions. Not by retraining the AI. By adjusting its own detection thresholds based on evidence, subject to moderator approval.
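One way to picture the propose-then-approve loop, with every name and number invented for illustration: the guardian derives a proposal from moderator decisions, and nothing takes effect without a human sign-off.

```typescript
// Sketch of evidence-gated threshold adjustment. The guardian proposes;
// the threshold only moves once a moderator approves. Illustrative only.

type AlertDecision = "confirmed" | "dismissed";

type ThresholdProposal = {
  current: number;
  proposed: number;
  evidence: AlertDecision[]; // the moderator decisions justifying the change
  approvedBy?: string;       // unset until a human signs off
};

// A high false-alarm rate suggests the threshold is too strict: propose a
// small, bounded adjustment rather than retraining anything.
function propose(current: number, evidence: AlertDecision[]): ThresholdProposal {
  const dismissed = evidence.filter(d => d === "dismissed").length;
  const falseAlarmRate = evidence.length > 0 ? dismissed / evidence.length : 0;
  const proposed = falseAlarmRate > 0.8 ? current - 0.02 : current;
  return { current, proposed, evidence };
}

function effectiveThreshold(p: ThresholdProposal): number {
  return p.approvedBy ? p.proposed : p.current; // no approval, no change
}
```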
What You See as a Member
Confidence Badges
Every AI response in your Village now carries a small visual indicator:
- Verified: the AI's response is well-supported by your community's own records
- Partially verified: some claims are supported, others could not be confirmed
- Unverified: the response could not be matched to your community's records
These badges are ambient — they are always present, requiring no action from you. Over time, they help you develop an intuitive sense of when to read more carefully.
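One plausible mapping from per-claim results to a badge; the rule below is a guess at the shape, not Village's actual cut-offs:

```typescript
// Illustrative badge assignment from claim verification counts.

type Badge = "Verified" | "Partially verified" | "Unverified";

function badgeFor(verifiedClaims: number, totalClaims: number): Badge {
  if (totalClaims === 0 || verifiedClaims === 0) return "Unverified";
  if (verifiedClaims === totalClaims) return "Verified";
  return "Partially verified";
}
```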
Dig Deeper — Source Analysis on Demand
If you enable "Show detailed AI source analysis" in your settings, every AI response gains an expandable panel. Open it and you see exactly what the Guardian found:
2 of 3 claims verified
✓ "The church was founded in 1892" — Source: Our History
✓ "Rev. Williams led the first service" — Source: Founders FAQ
○ "It was the largest parish in the region" — No matching source found
The tone is informational, not alarming: "Here is what we can trace. Here is what we cannot." You decide what to do with that information.
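For developers curious about what might sit behind that panel, here is a guess at the data shape; the field names are assumptions, not Village's schema:

```typescript
// Hypothetical shape of the source-analysis payload behind the panel.

type ClaimAnalysis = {
  claim: string;      // e.g. "The church was founded in 1892"
  verified: boolean;
  source?: string;    // e.g. "Our History"; absent when no match was found
};

type SourcePanel = {
  claims: ClaimAnalysis[];
  summary: string;    // e.g. "2 of 3 claims verified"
};
```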
Why Village Built This Differently
Most AI safety approaches share a common assumption: use more AI to check the AI. Build a second model to evaluate the first model's output. Add a "guardrail" layer that is itself a language model making probabilistic judgements about another language model's probabilistic judgements.
This creates what engineers call a common-mode failure. If both the AI and its checker share the same fundamental assumptions, they will share the same blind spots. The checker confirms the error because the checker was built the same way.
Village's Guardian Agents take a fundamentally different approach, built on four design principles:
1. Mathematical, Not Generative
Guardians do not use AI to judge AI. They use mathematical operations — embedding cosine similarity — to measure how closely an AI response aligns with your community's actual source material. This is arithmetic, not inference. It cannot be manipulated by clever phrasing and it does not hallucinate. The recursive trust problem — who watches the watchers? — is resolved by making the watchers do mathematics, not generate opinions.
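"Arithmetic, not inference" is meant literally. A tiny worked example with three-dimensional vectors (real embeddings have hundreds of dimensions, but the operation is identical, and the same inputs always give the same score):

```typescript
// Cosine similarity is deterministic arithmetic: dot product over norms.

const cosine = (x: number[], y: number[]): number => {
  const dot = x.reduce((sum, xi, i) => sum + xi * y[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, vi) => sum + vi * vi, 0));
  return dot / (norm(x) * norm(y));
};

console.log(cosine([1, 2, 3], [2, 4, 6]));  // 1.0: same direction, fully aligned
console.log(cosine([1, 2, 3], [3, -1, 0])); // ~0.085: nearly orthogonal, barely related
```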
2. Sovereign by Design
All guardian processing runs on your community's own infrastructure. No data leaves your tenant boundary for safety checks. Your community's content is never sent to an external service to be evaluated. The guardians are as sovereign as the AI they protect.
3. Human Authority Preserved
Guardians propose. Moderators decide. The system provides evidence-based analysis and recommendations, but every threshold change, every pattern addition, every override requires explicit human approval. The evidence burden is deliberately asymmetric: loosening safety thresholds requires stronger evidence than tightening them. The system is designed to fail conservatively.
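The asymmetry can be stated as a simple rule; the numbers below are invented purely to show its shape:

```typescript
// Loosening a threshold (fewer responses flagged) demands far more
// moderator-decision evidence than tightening one. Numbers illustrative.

function hasEnoughEvidence(current: number, proposed: number, decisions: number): boolean {
  const loosening = proposed < current; // a lower threshold flags less
  const required = loosening ? 50 : 10; // asymmetric evidence burden
  return decisions >= required;
}
```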
4. Tenant-Scoped Governance
Your community has its own constitutional principles, its own anomaly baselines, its own threshold overrides. What counts as an anomaly in a parish archive is different from what counts as an anomaly in a neighbourhood coordination group. Guardian Agents respect this. Each community's guardians learn from that community's patterns, governed by that community's moderators.
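In configuration terms, this might look something like the sketch below; every field name is an assumption, not Village's actual schema:

```typescript
// Hypothetical per-tenant guardian configuration: each community carries
// its own principles, baselines, and approved overrides.

type TenantGuardianConfig = {
  tenantId: string;
  constitutionalPrinciples: string[];   // the community's own rules
  groundednessThreshold: number;        // moderator-approved override
  anomalyBaseline: {
    meanScore: number;                  // learned from this tenant's history
    stdDev: number;
  };
};
```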
What This Means for Your Community
Guardian Agents do not make AI perfect. No system can. What they do is make AI accountable — accountable to your community's own records, your community's own moderators, your community's own constitutional principles.
When your AI assistant summarises a discussion, you can see whether it drew from your actual documents or filled in gaps. When it answers a question, you can trace the answer back to its source. When patterns shift, your moderators know about it before members notice.
This is not AI governance as an afterthought. It is governance embedded in the architecture — present in every interaction, visible when you want to see it, working quietly when you do not.
Leigh McMullen of Gartner (May 2025) describes guardian agents evolving through three phases: quality control, observation, and protection — all framed as "AI designed to monitor other AI." Village's Guardian Agents already encompass all three of those phases and add a fourth, Adaptive Learning, that Gartner does not envision at all. But the gap is not just about phases. Gartner's model rests on assumptions Village rejects:
- AI checking AI, inheriting the same blind spots
- Cloud infrastructure, not your community's own servers
- Universal thresholds: the same settings for every customer, with no concept of your community's specific principles
- Safety as an automated feature, not a constitutional governance architecture where your moderators hold authority
Even IBM's Sovereign Core — the first enterprise AI platform designed for local governance, launched in January 2026 — addresses only where your data sits. It does not give your community a vote on what the AI does with it.
Village's Guardian Agents go beyond both. They are not just early — they are already past where the industry is heading. Mathematical verification instead of AI-checking-AI. Sovereign processing that keeps everything within your community. Human authority that cannot be automated away. And governance scoped to your community's own constitutional principles, not a platform default.
Your stories deserve an AI that shows its working.
Village is currently in beta pilot, with Guardian Agents included in all subscriptions. We are accepting applications from communities, families, and organisations. Beta founding partners receive locked-for-life founding rates.