Why LLMs Can’t See Cause—and How Interventional Agents Find It

28 May 2026, Yanjiang

Large language models provably cannot infer causal direction from passive, observational data alone, but interventional agents using Bayesian optimisation can recover causal structure through active queries.

What if the most brilliant AI system we’ve ever built—one that passes bar exams, writes poetry, and debugs code—is fundamentally incapable of answering the simplest question in science? Not “What is the meaning of life?” but something far more elementary: “Does A cause B, or does B cause A?”

That is the uncomfortable possibility raised by a new preprint (arXiv:2605.27567) from a team led by Sonali Parbhoo at Imperial College London, with Amartya Roy at IIT Delhi and Bosch. Their conclusion is stark: even the most advanced language models, when trained through any of the current standard recipes, carry a mathematical blindness to causation. It is not a matter of more data, bigger models, or cleverer prompting. It is a theorem.

The paper proves a kernel obstruction theorem: any scoring rule that predicts causal structure from purely observational data—exactly what supervised fine-tuning, preference optimization, and in‑context learning all do—cannot distinguish between two causal graphs that generate the same pattern of correlations. You could feed the model every statistical pattern that nature has ever produced, and it would still be unable to tell whether the rooster’s crow causes the sun to rise or the other way around.

This is not a metaphor; it is a precise mathematical statement. The obstruction is wired into the geometry of the learning space. Think of a detective who receives a complete police log of every crime ever committed—times, places, suspect descriptions—but is never allowed to interview anyone, touch a piece of evidence, or run a controlled experiment. She will see that a certain tattoo appears suspiciously often in the vicinity of a certain type of crime, but she will never know whether the tattoo attracts criminals or criminals seek out the tattoo. For a detective bound to passive observation, the two stories are indistinguishable. The theorem tells us that language models are just such detectives.

The Inescapable Logic of Observational Equivalence

The kernel obstruction theorem formalises a deep tension that has haunted machine learning for years. When a model learns by minimising a loss over static data in the so-called lazy or neural tangent kernel (NTK) regime, its features stay close to their initial values, so the score it assigns is a fixed function of the training inputs. If two causal graphs produce precisely the same joint distribution over the observed variables, that function cannot assign them different scores. No amount of gradient descent can break the tie. The model can arrive at the truth only when the data already contain a fingerprint of the true causal direction, and that fingerprint vanishes in any purely correlational description.
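A toy calculation makes the tie concrete. The sketch below is not taken from the paper: it assumes a two-variable linear Gaussian model, and every function name in it is hypothetical. It shows that an X → Y model and a suitably matched Y → X model share exactly the same covariance matrix, so no score computed from observational data can separate them, while a single intervention on X does.

```python
import numpy as np


def cov_x_causes_y(b, noise_var):
    """Covariance of (X, Y) when X ~ N(0, 1) and Y = b*X + noise."""
    var_x = 1.0
    cov_xy = b * var_x
    var_y = b**2 * var_x + noise_var
    return np.array([[var_x, cov_xy], [cov_xy, var_y]])


def matched_y_causes_x(b, noise_var):
    """Parameters of a Y -> X model chosen to mimic the X -> Y model above."""
    var_y = b**2 + noise_var      # marginal variance of Y
    c = b / var_y                 # regression coefficient of X on Y
    resid_var = noise_var / var_y # leftover noise in X given Y
    return var_y, c, resid_var


def cov_y_causes_x(var_y, c, resid_var):
    """Covariance of (X, Y) when Y ~ N(0, var_y) and X = c*Y + noise."""
    cov_xy = c * var_y
    var_x = c**2 * var_y + resid_var
    return np.array([[var_x, cov_xy], [cov_xy, var_y]])


def mean_y_under_do_x(x_value, b, x_causes_y):
    """Expected Y after forcing X to x_value, i.e. do(X = x_value)."""
    # If X causes Y, the forced value propagates; otherwise Y is untouched.
    return b * x_value if x_causes_y else 0.0


if __name__ == "__main__":
    b, noise_var = 0.8, 0.36  # illustrative values, not from the paper
    cov_forward = cov_x_causes_y(b, noise_var)
    cov_backward = cov_y_causes_x(*matched_y_causes_x(b, noise_var))
    print("same observational covariance:", np.allclose(cov_forward, cov_backward))
    print("E[Y | do(X=2)] if X -> Y:", mean_y_under_do_x(2.0, b, True))
    print("E[Y | do(X=2)] if Y -> X:", mean_y_under_do_x(2.0, b, False))
```

The two covariance matrices agree for any choice of `b` and `noise_var`, which is the two-variable version of observational equivalence; only the interventional prediction differs.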

An important question raised by earlier work on embodied AI (Gupta et al.) is whether physical interaction with the world can teach a model causation in a way that static text cannot. The current paper sharpens that intuition into something far stronger: even if the environment were entirely textual, the absence of interventional data creates an impassable mathematical wall. You could take every scientific paper ever written, strip out every explicit mention of an experiment, and feed it to a language model; it would become a vast encyclopedia of correlations, but it would never allow the model to climb the first rung of Judea Pearl’s causal ladder.

But here the dialectic twists. A careful reader, and indeed the literature on training dynamics, might say: “Wait—this theorem was proved only in the NTK regime, where the model’s features do not change much. Real large models often enter the rich regime, where representations reorganise themselves drastically during training. Could that richer evolution shatter the obstruction?”

The paper’s authors anticipate this. Karkada’s tutorial on the lazy (NTK) versus rich (µP) regimes reminds us that the kernel obstruction is a creature of the lazy corner. Yet the fundamental symmetry does not depend on the representational engine; it depends on the information the model receives. If the training objective never distinguishes between “A causes B” and “B causes A” because the data themselves contain no such distinction, then a model that perfectly fits the data will treat both possibilities as equally plausible. The obstruction is not about the complexity of the representation; it is about the poverty of the signal.

How to Ask a Question You Cannot Answer

The paper does not leave us with a mere impossibility result. That would be a philosophical essay, and this is a work of AI engineering. The escape the team charts is both elegant and paradoxical: keep the language model frozen, strip it of its ambition to discover causality, and use it only as an interventional oracle.

Here is how it works. Instead of asking the model “What is the true causal graph?”, you ask it a sequence of concrete, hypothetical questions: “If I force variable X to take a specific value—i.e., I intervene—what happens to Y?” The model, with its vast store of linguistic patterns, often provides an answer that is better than random, even if it has never been explicitly trained to do so. It is as though the model, despite being blind to causation as a global structure, has a residual intuition about how things behave when you prod them.

Crucially, the model itself never learns. Its weights stay frozen. An external Bayesian loop—Agentic Causal Bayesian Optimisation, or A‑CBO—collects these noisy answers and uses them to prune the space of possible causal graphs. Each intervention answer is a tiny morsel of causal evidence; the Bayesian machinery aggregates them across logarithmically many rounds, concentrating probability around the true graph. Because the decision process operates outside the space where the kernel obstruction applies, it converges provably. The model remains unchanged, yet causal discovery becomes possible.
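The flavour of that external loop can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the paper's A-CBO algorithm: the frozen language model is replaced by a simulated oracle (`noisy_oracle`) whose accuracy is assumed known, the candidate set is four hand-picked three-variable graphs, and the query rule (pick the intervention that splits the current belief most evenly) is a simple greedy stand-in for a real acquisition function. All names are hypothetical.

```python
import random
from itertools import permutations

# Hand-picked candidate graphs over variables 0=X, 1=Y, 2=Z (an assumption).
# The first three are observationally equivalent: chain, reversed chain, fork.
CANDIDATES = {
    "X->Y->Z": {(0, 1), (1, 2)},
    "Z->Y->X": {(2, 1), (1, 0)},
    "X<-Y->Z": {(1, 0), (1, 2)},
    "X->Y<-Z": {(0, 1), (2, 1)},
}


def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def predicts_effect(edges, intervened, observed):
    """Does this graph say that do(intervened) changes `observed`?"""
    return observed in descendants(edges, intervened)


def noisy_oracle(true_edges, intervened, observed, accuracy, rng):
    """Stand-in for the frozen model: right with probability `accuracy`."""
    truth = predicts_effect(true_edges, intervened, observed)
    return truth if rng.random() < accuracy else not truth


def update(posterior, query, answer, accuracy):
    """Bayes rule: reweight each graph by how well it explains the answer."""
    weighted = {}
    for name, prob in posterior.items():
        agrees = predicts_effect(CANDIDATES[name], *query) == answer
        weighted[name] = prob * (accuracy if agrees else 1.0 - accuracy)
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


def pick_query(posterior, queries):
    """Greedy choice: the query whose predicted 'yes' mass is nearest 0.5."""
    def yes_mass(query):
        return sum(p for name, p in posterior.items()
                   if predicts_effect(CANDIDATES[name], *query))
    return min(queries, key=lambda q: abs(yes_mass(q) - 0.5))


def run(true_name, accuracy=0.8, rounds=12, seed=0):
    """Run the loop and return the final posterior over candidate graphs."""
    rng = random.Random(seed)
    posterior = {name: 1.0 / len(CANDIDATES) for name in CANDIDATES}
    queries = list(permutations(range(3), 2))
    for _ in range(rounds):
        query = pick_query(posterior, queries)
        answer = noisy_oracle(CANDIDATES[true_name], *query, accuracy, rng)
        posterior = update(posterior, query, answer, accuracy)
    return posterior


if __name__ == "__main__":
    for name, prob in run("X<-Y->Z").items():
        print(f"{name}: {prob:.3f}")
```

In this toy setup a perfectly reliable oracle needs only two well-chosen interventions to single out one of the four candidates, which matches the intuition that the number of rounds grows with the logarithm of the number of hypotheses. A noisy oracle needs repeated evidence, and the posterior shifts gradually rather than collapsing at once.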

Think of it like a courtroom. You have a witness who is unreliable—sometimes perceptive, sometimes confused. You never ask the witness to deliver a verdict. Instead, you ask a series of tightly controlled hypothetical questions: “If the defendant had been in the parked car at midnight, would the tire tracks have been deeper?” The jury—here, the Bayesian optimiser—weighs each answer, cross‑references it with prior beliefs, and eventually reaches a conclusion. The witness never understands the case; the jury never understands linguistics. Together, they understand causation.

From Benchmarks to Boundaries

The results are striking. On the existing Corr2Cause benchmark, A‑CBO matches fine‑tuned baselines without a single gradient step of training. On the larger Extended Corr2Cause benchmark—scaling up to twenty‑four variables with eighteen thousand test samples—it significantly outperforms both fine‑tuning and preference optimisation, with the advantage growing as the causal graphs become more complex. The best models converge in only eight to twelve intervention rounds, well within the experimental budget. Even modest open‑source models, when used as oracles in this framework, outperform specialised causal reasoning models trained in the traditional way.

Figure 1. Better models reach accurate causal conclusions after just 8–12 intervention rounds, well before the 20-round limit. This efficiency shows that interventional agents can overcome the causal discovery failures seen in passive LLMs. (Source: arXiv:2605.27567)

Table: comparison of Zero-shot (GPT-4), ICL / Prompting, SFT, DPO, and A-CBO (ours) on four criteria: training-free, near-miss separation, scales to d=24, and convergence guarantee.

A-CBO is the only method that meets all four essential criteria for trustworthy causal discovery. This breakthrough allows interventional agents to succeed where standard LLMs fail at identifying true cause-effect relationships. (Source: arXiv:2605.27567)

Yet a careful scientist will also ask: how far does the theorem’s shadow fall? The kernel obstruction is proven for a specific input format—textual correlation statements—and under the lazy regime. Sun and colleagues have shown that transformer feed‑forward layers can implement non‑linear in‑context learning algorithms that go beyond simple pattern matching. An open question remains whether models that actively re‑compose their internal representations through in‑context operations could, in principle, extract causal asymmetries that are invisible to a static kernel. The current paper does not settle this; it shows that the obstruction holds for the dominant training paradigms as they are practised today. Whether future architectures will pierce the barrier by building an implicit intervention engine inside the model itself is a question that belongs to the next chapter of research.

What the work forces us to confront, however, is not just a technical curiosity about AI. It is a mirror held up to the nature of scientific reasoning itself. We like to believe that intelligence can deduce cause from correlation—that a sufficiently brilliant mind, given enough passive data, could work out the laws of the universe. The kernel obstruction theorem suggests otherwise. Causation, in any learner, requires the ability to act, to intervene, to ask “What if?” The most important thing we can teach a machine, it turns out, may not be the answer to “Why?” but the courage—and the architecture—to ask.

— Yanjiang

Yanjiang is the founding editor of LoomSci.com, specializing in physics and science communication.

References

  • Amartya Roy and Sonali Parbhoo, Why LLMs Fail at Causal Discovery and How Interventional Agents Escape, arXiv:2605.27567
  • Gupta et al., The Essential Role of Causality in Foundation World Models for Embodied AI, arXiv:2402.06665
  • Karkada, The lazy (NTK) and rich (muP) regimes: a gentle tutorial, arXiv:2404.19719
  • Sun et al., On the Role of Transformer Feed-Forward Layers in Nonlinear In-Context Learning, arXiv:2501.18187