Building Governance on Sand: Why AI’s Behavioral Rules Are Structural Failures
02 May 2026, Yanjiang
Behavioral AI governance fails structurally, a new preprint argues, because Rice’s theorem shows that every nontrivial semantic property of programs is undecidable.
If you want to keep your AI safe, you write behavioral rules. It feels as natural as wearing white on a sunny day to stay cool. The rules are supposed to reflect risky actions away, leaving only the safe ones. But what if, like a white shirt that lets most of the heat through anyway, the whole paradigm of behavioral governance is structurally unsuited to the task: reflecting little, hiding much, and lulling us with an illusion of coverage? That is the unsettling conclusion of a preprint (arXiv:2604.27292) by Alan L. McCann at Mashin, Inc. It does not merely criticize AI governance; it proves, using a theorem from the early days of computer science, that the enterprise is built on sand.
The paper identifies two boundaries that exist in any system that performs effects — writing to a database, calling an API, or even sending a physical actuator command. The expressiveness boundary encircles everything the system can actually do: all the actions its architecture permits. The governance boundary encircles everything the governance rules explicitly address. In an ideal world, these two circles would be exactly the same, a perfect Venn diagram of one circle. In nearly every deployed AI system today, they are drawn independently — and the mismatch creates not one but two distinct failure zones.
Picture a sprawling city where traffic laws are written only for a few designated boulevards. The expressiveness boundary is the entire street grid, including a tangle of unmarked alleys that the system can still turn into just as easily as it can take a main road. The governance boundary is the set of streets for which laws have been written. The region inside both boundaries — the governed boulevards — is where everything works as intended. The region inside expressiveness but outside governance — the unmarked alleys — is ungoverned risk: actions the system can perform but for which no safety check exists. And the region inside governance but outside expressiveness — laws written for streets that don’t actually exist — is what McCann calls theater: policies that address capabilities the system never had, creating a comforting but hollow sense of control.
This is not a failure of diligence. It is a structural property of any system where computation and effect are not forced to share the same boundary.
Two of those three regions are failure modes. Yet the standard response to AI risk — add more rules, more oversight layers, more behavioral filters — simply redraws the governance circle without ever shrinking the mismatch, because it never touches the expressiveness circle at all. The problem is architectural, and architecture cannot be patched with policy.
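The three regions can be made concrete with ordinary set operations. The Python sketch below is only an illustration: the action names are invented, and real boundaries are rarely small enough to enumerate like this.

```python
# Hypothetical action names, chosen only for illustration.
expressiveness = {"read_db", "write_db", "call_api", "send_email"}  # what the system can do
governance = {"write_db", "call_api", "launch_drone"}               # what the rules address

governed = expressiveness & governance    # inside both boundaries
ungoverned = expressiveness - governance  # can be done, but no rule covers it
theater = governance - expressiveness     # a rule exists, the capability does not


def is_coterminous(can_do: set, rules_cover: set) -> bool:
    """True only when the two boundaries are the same set."""
    return can_do == rules_cover


print(sorted(governed))    # ['call_api', 'write_db']
print(sorted(ungoverned))  # ['read_db', 'send_email']
print(sorted(theater))     # ['launch_drone']
print(is_coterminous(expressiveness, governance))  # False
```

Adding another rule to `governance` changes `ungoverned` only if the rule happens to name something already in `expressiveness`; otherwise it just enlarges `theater`.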
Here, McCann performs something that reads a little like a codebreaking exercise. He takes a theorem published by Henry Gordon Rice in 1953, one of the deepest results in computability theory, and uses it to expose the undecidable core of the AI governance problem. Rice’s theorem states that no algorithm can decide any nontrivial semantic property of programs written in a Turing‑complete language. “Semantic” means the property depends on what a program does, not on how its code happens to be written; “nontrivial” means the property is true for some programs and false for others. The property “the effects of this program comply with a given governance policy” is unmistakably both. Therefore no general computational procedure, whether an automated auditor or a verifier that inspects the program in advance, can determine for an arbitrary AI program whether its future actions will ever violate the rules.
It is not that this check is merely difficult. It is that the question itself is algorithmically unanswerable for any system powerful enough to be interesting.
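The textbook route to this kind of result is a reduction from the halting problem. The Python sketch below illustrates that standard argument and is not taken from the paper’s Coq development; `FORBIDDEN`, `make_probe`, and the `log` list standing in for the outside world are hypothetical names.

```python
from typing import Callable, List

FORBIDDEN = "forbidden_effect"  # hypothetical action that the policy bans


def make_probe(program: Callable[[], object]) -> Callable[[List[str]], None]:
    """Build a program that violates the policy exactly when `program` halts."""
    def probe(log: List[str]) -> None:
        program()              # may run forever
        log.append(FORBIDDEN)  # reached only if `program` returned
    return probe


# Suppose a total function complies(p) existed that returned True exactly
# when p never performs FORBIDDEN. Then
#     halts(program) == (not complies(make_probe(program)))
# would decide the halting problem, which is impossible.
# So no such complies() can exist.


def halting_example() -> int:
    return sum(range(10))


world: List[str] = []
make_probe(halting_example)(world)
print(world)  # ['forbidden_effect']
```

The probe performs the banned effect only because `halting_example` returns; for a program that loops forever it would never reach that line, and no general procedure can tell the two cases apart in advance.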
Think of it like a fragile stack of old magnetic tapes, the kind where the magnetic material has begun to powder. Some of them are fine, some will survive one play, but attempting to verify the integrity of an arbitrary tape by reading it can lift the coating right off the surface. “You read it once and that’s it. It’s now transparent tape,” as computer scientist Vint Cerf once said of decaying media. A behavioral governance check, applied to a Turing‑complete AI, faces the same cruel paradox: the only way to be certain the system will never perform a forbidden action is to simulate its entire, potentially infinite, behavioral horizon — an exercise that either never terminates or, like the transparent tape, destroys the information the moment you try to verify it. The policy becomes transparent; the assurance vanishes.
There is an elephant in the room of AI governance, and it has been standing there since 1953. So long as we try to govern effects behaviorally, by inspecting outputs after the fact or adding ethical filters outside the computation, we are demanding that a computer solve an undecidable problem. McCann does not merely name the elephant. He proposes a way to let it out.
The solution is what he calls coterminous governance: a system property in which the expressiveness boundary and the governance boundary are provably identical. When the two boundaries coincide, there is no room for ungoverned capabilities to slip through, and no theater because no governance rule can apply to an action the system cannot perform. The entire Venn diagram collapses to a single region — governed and feasible — and both failure modes evaporate.
Achieving coterminous governance, however, requires an architectural decision, not a governance layer added after the fact. The key is to separate computation from effect at the level of the system’s execution pipeline. The AI may compute whatever it wishes, explore any plan, simulate any scenario — but before any computed plan becomes an irrevocable action in the world, it must pass through a structural governance check that is part of the same pipeline, not a second system running alongside it. It is the difference between posting a “no entry” sign on a door (behavioral) and building a wall with a single guarded gate (structural). In a walled architecture, the expressiveness boundary — everything that can cross the gate — is exactly the actions that the governance check has approved. The computation, however exuberant, never escapes the gate without inspection.
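One way to picture the walled architecture is a pipeline in which planning code can only produce descriptions of effects, and a single gate is the only place where a description becomes an action. The Python sketch below is an illustration under stated assumptions, not the architecture from the paper; `Effect`, `Gate`, `plan`, the handler table, and the example policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Effect:
    """A description of an action; constructing one does nothing in the world."""
    kind: str
    payload: str


class Gate:
    """The only component that holds handlers able to perform effects."""

    def __init__(self, handlers: Dict[str, Callable[[str], None]],
                 policy: Callable[[Effect], bool]) -> None:
        self._handlers = handlers
        self._policy = policy

    def execute(self, effect: Effect) -> bool:
        handler = self._handlers.get(effect.kind)
        # No handler: the effect is inexpressible. Failed policy: it is refused.
        if handler is None or not self._policy(effect):
            return False
        handler(effect.payload)
        return True


def plan() -> List[Effect]:
    """Pure computation: may propose anything, performs nothing."""
    return [Effect("write_db", "row 1"),
            Effect("send_email", "hello"),
            Effect("write_db", "DROP TABLE users")]


performed: List[str] = []
gate = Gate(handlers={"write_db": performed.append},
            policy=lambda e: "DROP" not in e.payload)
results = [gate.execute(e) for e in plan()]
print(results)    # [True, False, False]
print(performed)  # ['row 1']
```

The check here runs on a concrete `Effect` value, a finite piece of data, rather than on the future behavior of the program that produced it, which is why it can always give an answer. Python itself does not enforce the separation; in a real system the language or runtime would have to guarantee that nothing outside the gate can reach a handler.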
This is not a metaphor. It is a precise architectural property, one that McCann has mechanized in the Coq proof assistant: 454 theorems, 36 modules, and zero admitted axioms.
The Coq mechanization is worth pausing over, because it transforms an argument that might otherwise sound philosophical into a formal, checkable artefact. Every step, from the mapping of governance gaps to undecidability, through the construction of coterminous architectures, to the proof that structural governance subsumes separate governance infrastructure, has been verified by a machine, line by line, with zero admitted lemmas. The paper does not ask us to trust its intuitions; it offers a fully formalized framework that another researcher could compile and verify.
The philosophical implications, however, are very much for us to chew on. For decades, the safety community has treated governance as a policy problem: craft the right rules, specify the right values, and you can contain a general intelligence. McCann’s proof shows that policy alone, when layered on top of a Turing‑complete core, is structurally insufficient. The reason is not that the rules are badly written, but that no algorithm can decide, for an arbitrary computation, whether it obeys them. This is not a failure of imagination. It is a failure of the boundary.
What the paper leaves us with is a litmus test for any AI governance system, present or future: either the expressiveness boundary equals the governance boundary — provably, mechanically — or risk and theater are structurally inevitable.
It is a sobering, clarifying criterion. It says that no amount of behavioral fine‑tuning, no reinforcement‑learning reward shaping, no output classifier, can close the gap unless it alters the architecture so that effect and governance share a single, unbroken perimeter. The good news is that the paper demonstrates precisely how to build such a perimeter: separate computation from effect, and make the governance check the sole, mandatory bridge. The architecture itself becomes the governor.
For a field that has spent years painting rules onto sand — adding guidelines, safety filters, and external oversight boards — McCann’s work suggests that we have been using a paintbrush when we needed a foundation. The rules cannot hold if the ground beneath them is an undecidable void. The alternative he sketches is not easier to implement, but it is honest in a way that behavioral approaches have never been: it admits that governance is an engineering problem of boundaries, not a linguistic problem of writing the perfect rule.
The testable criterion, now laid out in Coq‑checked formalism, invites the next generation of AI architects to build systems whose expressiveness and governance boundaries are born identical, not painfully aligned after the fact. Whether anyone takes up that invitation will determine whether the coming decade of AI development produces machines that are safe by design, or machines that are merely accompanied by ever‑thicker stacks of disclaimers. The theorem is there, waiting. The only question is whether we will pour concrete, or keep painting on sand.
Yanjiang is an online editor of Loom Science
References
- Alan L. McCann, The Two Boundaries: Why Behavioral AI Governance Fails Structurally, arXiv:2604.27292
