A Million Voices, One Culprit: The Invisible Failure of Small-Scale Explanations
16 May 2026, Yanjiang
A million-agent simulation reveals that influence is carried by the invisible many, not the few bright stars visible from afar.
What if the tools we use to explain the behavior of a million people are structurally incapable of telling the truth?
That is not a riddle. It is the precise technical finding of a new preprint (arXiv:2605.11404) from a team led by Dongrui Liu and Ling Tang at the Shanghai Artificial Intelligence Laboratory. They have been studying how we attribute emergent behavior — a market panic, an information cascade, a wave of polarization — to individual agents inside a vast multi-agent system. And what they found is unsettling. When you ask these systems why they did what they did, the answer you get depends, fatally, on how many voices you bothered to listen to.
The study of multi-agent systems has, over the past few years, undergone a kind of Cambrian explosion. Large language models now simulate individual agents capable of reasoning, deciding, and interacting — creating digital societies within server racks. Researchers have used these simulated populations to reproduce everything from stock-market crashes to the spread of misinformation. The promise is enormous: a laboratory for society itself, where interventions can be tested before they touch real lives. But a laboratory is useless if its instruments lie. And the instrument that tells us who did what inside these systems has a fundamental scaling disease.
The standard toolkit comes from cooperative game theory. To attribute a macro outcome to individual agents, you ask: how much did each agent’s presence contribute? The Shapley value, named for the Nobel laureate Lloyd Shapley, answers by taking a weighted average of each agent’s marginal contribution to every coalition that agent could join. The math is elegant. The axioms (efficiency, symmetry, null player, additivity) give it an almost moral authority. Fairness, rendered in equations.
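To make the definition concrete, here is a minimal sketch of an exact Shapley computation on a three-agent toy game. It uses the equivalent "average over all orderings" form of the weighted coalition average. The agent names, the `ACTIVITY` values, and the squared-total `macro` function are hypothetical choices made for illustration; none of them come from the preprint.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the players ahead of it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}

# Hypothetical per-agent activity levels (illustrative only).
ACTIVITY = {"a": 1.0, "b": 2.0, "c": 3.0}

def macro(coalition):
    """Hypothetical nonlinear macro outcome: squared total activity of the coalition."""
    return sum(ACTIVITY[p] for p in coalition) ** 2

phi = shapley_values(list(ACTIVITY), macro)
print(phi)
print(sum(phi.values()), macro(frozenset(ACTIVITY)))  # efficiency: the two numbers agree
```

The efficiency axiom shows up in the last line: the individual shares add up to the outcome of the full coalition. The loop over `permutations` is also where the cost explodes, since the number of orderings grows factorially with the number of agents.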
But there is a catch, and it is brutal. Computing Shapley values exactly requires summing contributions over all possible subsets of agents. For a system of N agents, that is a combinatorial Mount Everest: 2ᴺ terms. Even with modern sampling tricks, the method buckles at around a thousand agents. Meanwhile, the phenomena these simulations are built to study — polarization cascades, viral misinformation, social tipping points — inhabit systems of a million agents or more. We have been painting the Sistine Chapel with a toothbrush.
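A back-of-the-envelope sketch gives a feel for the size of 2ᴺ. The helper name `coalition_count_digits` is made up for this illustration; it counts the decimal digits of 2ᴺ using logarithms, so the number itself never has to be built.

```python
import math

def coalition_count_digits(n_agents):
    """Number of decimal digits in 2**n_agents, computed via log10."""
    return math.floor(n_agents * math.log10(2)) + 1

digits_20 = coalition_count_digits(20)            # 2**20 = 1,048,576
digits_1k = coalition_count_digits(1_000)         # a thousand agents
digits_1m = coalition_count_digits(1_000_000)     # a million agents

print(digits_20, digits_1k, digits_1m)
```

At a thousand agents the coalition count already runs to hundreds of digits; at a million agents, the digit count itself is in the hundreds of thousands.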
Liu and Tang’s team has done something deceptively simple to break this logjam. They adapted an older, less famous idea from axiomatic attribution: the Aumann-Shapley path integral. Instead of enumerating coalitions like a political negotiation, the Aumann-Shapley method treats each agent’s participation as a continuous dial. It integrates the gradient of the macro function along a smooth path from zero to full participation. Think of it this way: rather than testing every possible combination of dinner guests to see who brought the best wine, you gradually increase everyone’s contribution from nothing and watch how the dinner party changes. The cost, in this case, scales linearly with the number of agents, provided the gradient itself is cheap to evaluate: a fixed number of gradient evaluations along the path replaces the sweep over coalitions.
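The path-integral idea can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not the team's implementation: the path is the straight line from zero to full participation, the integral is approximated with a midpoint rule, and the gradient is supplied analytically for a hypothetical squared-total `macro` function. The names `aumann_shapley`, `macro`, `macro_grad`, and the `steps` parameter are all invented for this example.

```python
def aumann_shapley(x, grad, steps=64):
    """Approximate x_i * integral_0^1 dF/dx_i(t * x) dt with a midpoint rule."""
    acc = [0.0] * len(x)
    for k in range(steps):
        t = (k + 0.5) / steps                     # midpoint of the k-th sub-interval
        for i, g in enumerate(grad([t * xi for xi in x])):
            acc[i] += g                           # accumulate the gradient along the path
    return [xi * a / steps for xi, a in zip(x, acc)]

def macro(x):
    """Hypothetical nonlinear macro indicator: squared total activity."""
    return sum(x) ** 2

def macro_grad(x):
    """Analytic gradient of macro: every coordinate equals 2 * sum(x)."""
    g = 2.0 * sum(x)
    return [g] * len(x)

x = [1.0, 2.0, 3.0]                               # hypothetical agent outputs
attr = aumann_shapley(x, macro_grad)
print(attr)
print(sum(attr), macro(x))                        # efficiency: the two numbers agree
```

The work is `steps` gradient evaluations, each touching every agent once, so the cost grows linearly with the population when the gradient is cheap. On this toy function the result matches what exact Shapley enumeration would give for the same three agents, which is the kind of agreement the axioms lead one to expect here.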
The computational difference is staggering. On the same hardware, their method runs four to five orders of magnitude faster than sampled Shapley: roughly the difference between waiting under a minute and waiting a week. For the first time, an attribution method that satisfies all four classical axioms can operate at the scale of actual social media.
The team did not stop at a speed record. They wanted to know if the scale gap — the chasm between the tiny panels used in previous studies and the million-agent reality — actually matters. Their testbed was 14 days of public Bluesky data, spanning roughly 1.67 million active users across five distinct topics. They computed attribution at full scale, then compared it against the kind of visibility-biased convenience sample of 100 accounts that a small-scale study would realistically use.
The two pictures disagreed structurally. Not subtly. Not in the margins. Fundamentally.
At full scale, the long tail and the middle tier of users — the accounts with modest followers, the ordinary voices — jointly carried the bulk of the attribution mass. The celebrity accounts, the nodes with the most followers, were not the primary engines of emergence. They were one gear in a much larger machine. But the biased small panel attributed almost everything to those high-follower accounts. It mistook correlation for causation, visibility for influence. It was as if you studied the weather by only looking at the windiest day of the year and concluded that hurricanes cause all rain. This is not a bug in a particular sampling strategy. It is a structural blindness.
An important question raised by earlier work on multi-agent attribution, specifically Tang’s own previous study of extreme events in such systems, is whether the disagreement between scales can be papered over with a clever correction factor. Could you simply compute attributions on a small sample and then mathematically inflate them to match the full-scale picture? The team proves, in a theorem they call the Attribution Scaling Bias result, that the answer is no — not for any nonlinear macro indicator.
The theorem works like this. If the macro function you care about — the measure of polarization, the index of a market crash — is nonlinear in the agent outputs, then no single global rescaling factor can reconcile small-scale and full-scale attribution. The mismatch is baked into the geometry of the function itself. It is not a measurement error. It is a law.
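A toy calculation shows the flavor of the claim; it is not the paper's proof. The sketch below assumes a six-agent population with made-up outputs, a two-agent panel made of the largest outputs, a panel evaluated as if it were the whole system, the mean as the linear indicator, and the variance as a stand-in for a nonlinear polarization indicator. It then compares full-scale and panel attributions for the same two agents.

```python
def aumann_shapley(x, grad, steps=64):
    """Midpoint-rule approximation of x_i * integral_0^1 dF/dx_i(t * x) dt."""
    acc = [0.0] * len(x)
    for k in range(steps):
        t = (k + 0.5) / steps
        for i, g in enumerate(grad([t * xi for xi in x])):
            acc[i] += g
    return [xi * a / steps for xi, a in zip(x, acc)]

def mean_grad(x):
    """Gradient of the mean, a linear macro indicator."""
    return [1.0 / len(x)] * len(x)

def variance_grad(x):
    """Gradient of the population variance, a nonlinear macro indicator."""
    m = sum(x) / len(x)
    return [2.0 * (xi - m) / len(x) for xi in x]

full = [4.0, 3.0, 1.0, 1.0, 0.5, 0.5]   # hypothetical outputs of the whole population
panel = full[:2]                         # hypothetical panel: the two largest outputs

def ratios(grad):
    """Full-scale attribution divided by panel attribution, per panel agent."""
    at_scale = aumann_shapley(full, grad)[:len(panel)]
    in_panel = aumann_shapley(panel, grad)
    return [a / b for a, b in zip(at_scale, in_panel)]

linear_ratios = ratios(mean_grad)
nonlinear_ratios = ratios(variance_grad)
print(linear_ratios)      # the same factor for both agents
print(nonlinear_ratios)   # different factors, with opposite signs
```

For the mean, one global factor converts panel attributions into full-scale ones. For the variance, the two agents need different factors, and for the second agent even the sign of the attribution changes, so no single rescaling can reconcile the two pictures in this example.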
This is where the philosophical voltage of the paper becomes almost unbearable. The theorem does not merely say that small studies are approximate. It says that, for the class of phenomena that matter most — emergent, collective, nonlinear — small studies are answering a different question. They are not blurred photographs of the same scene. They are photographs of a different scene entirely. Full-scale attribution is a theoretical requirement, not a methodological preference.
But here the dialectic tightens. The method is validated against sampled Shapley values, not against ground-truth causal effects, a distinction that recent work on trace-based failure localization in agentic systems brings into sharper focus. Sampled Shapley is a proxy for the exact Shapley value, which is itself a proxy for causality. The chain of inference is long. And the method’s speed, while genuine for the macro indicators studied here, does depend on how costly the gradient of each chosen function is to compute: if you choose a function with a pathological gradient, the computational advantage can shrink. This is not a failure of the method, but a boundary condition that the paper’s sweep can obscure.
Perhaps the most productive tension in this work is with the broader ecosystem of agent-level attribution tools, including spectrum-analysis techniques that treat failure as a signal-processing problem across agent interactions and trace-based methods that localize faulty agents from execution logs. Those methods achieve linear time at a different cost: they sacrifice the axiomatic guarantees that give Shapley and Aumann-Shapley their normative force. The new method occupies a peculiar, powerful middle ground: it keeps the axioms but ties its efficiency to the structure of the question you ask. The choice between it and its competitors is not a hardware decision. It is a philosophical one: do you optimize for explanation or for debugging?
The data from Bluesky tell a story that the theorems only frame. When attribution shifts from a few visible accounts to a diffuse mass of ordinary users, the narrative of who caused what changes in ways that have policy implications, platform-design implications, and journalistic implications. If you blame the loudest voices, you design one kind of intervention. If you acknowledge that the loudest voices are downstream of a quieter, broader dynamic, you design another. The gap between these two designs is not academic. It is, increasingly, the gap between effective governance and performative scapegoating.
The image that stays with me is this: imagine a vast library where every book reports on the same event, but every book names a different culprit. Now imagine that the smaller the sample of books you read, the more unanimous the verdict appears. That is the trap the team has mapped. The consensus of small-scale studies is not a sign of truth; it is an artifact of limited vision. The moment you read enough books, the consensus dissolves into a richer, more distributed story.
The data illustrate this with a kind of brutal clarity. When they sorted Bluesky users by follower percentile and tracked attribution mass day by day, the full-scale pattern showed attribution bleeding persistently into the long tail. The high-follower bands flickered with importance on some days and faded on others. The middle band — not celebrities, not bots, but the vast stratum of active, ordinary accounts — hummed with a steady, cumulative influence that no small panel could detect. The structural disagreement is not a faint statistical whisper. It is a roar.
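The bookkeeping behind such a breakdown can be sketched for a single day. Everything below is hypothetical: the band cut-offs, the band names, the synthetic `accounts` list, and the use of absolute attribution as "mass" are illustrative assumptions, not the preprint's definitions or data.

```python
from collections import defaultdict

# Hypothetical bands: upper percentile bound and label.
CUTS = ((0.50, "long tail"), (0.99, "middle"), (1.00, "top"))

def band_shares(accounts, cuts=CUTS):
    """Share of absolute attribution mass per follower-percentile band.

    accounts: list of (follower_count, attribution) pairs for one day.
    """
    ranked = sorted(accounts, key=lambda account: account[0])
    n = len(ranked)
    mass = defaultdict(float)
    for rank, (_, attribution) in enumerate(ranked):
        percentile = (rank + 1) / n
        band = next(name for cut, name in cuts if percentile <= cut)
        mass[band] += abs(attribution)
    total = sum(mass.values())
    return {band: m / total for band, m in mass.items()}

# Synthetic day: 99 ordinary accounts with small attributions, one large account.
accounts = [(followers, 1.0) for followers in range(1, 100)] + [(10_000, 20.0)]
shares = band_shares(accounts)
print(shares)
```

In this synthetic example the single large account has by far the highest attribution of any individual, yet the long tail and the middle band together hold most of the mass. Repeating the calculation per day would give the kind of time series the paragraph describes.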
This raises a question that goes beyond experimental logistics: what does it mean to explain a system whose most important causal flows are invisible to any instrument that cannot see the whole? The Attribution Scaling Bias theorem gives a formal answer: it means your explanation is wrong. But the philosophical answer is harder. It means that the age of studying social-scale AI systems with toy-box tools is over. Not because we want bigger computers, but because the systems themselves will not tolerate being seen in miniature.
The team is already, naturally, looking ahead. The method opens doors to real-time attribution in live multi-agent systems, to feedback loops where attribution informs intervention. But the shadow of the feature-generation bottleneck — the reality that in LLM-powered simulations, the cost of running the agents themselves dwarfs the cost of attributing their behavior — means that the method’s practical impact will arrive slowly, as the underlying infrastructure matures. The theorem is ahead of the hardware. That is not a weakness. It is a promissory note.
Perhaps, when we look back at this work a decade from now, we will see it not as a computational speedup but as a cold-water splash. It woke us from the comfortable illusion that a system of a million minds could be understood by interviewing ten. The theorem does not offer a cheaper route to wisdom. It offers something rarer: a proof that the cheap route was never a route at all. And to the agent who doesn’t yet know how to ask about emergence — well, that agent can start by remembering that in a world of a million interconnected voices, the truth about who spoke first, and who carried the word farthest, is almost never the story we would have told by listening to only the loudest few.
— Yanjiang
Yanjiang is an online editor of LoomSci.com.
References
- Tang et al., Attributing Emergence in Million-Agent Systems, arXiv:2605.11404
- Zhang et al., AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?, arXiv:2509.03312
- Ge et al., Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis, arXiv:2509.13782
- Tang et al., Interpreting Emergent Extreme Events in Multi-Agent Systems, arXiv:2601.20538