From Predict to Evolve: A New Roadmap for World Models

26 Apr 2026, Yanjiang

The levels × laws taxonomy maps how world models evolve from simple predictors to autonomous systems that reshape their environments.

Only last month we were marveling at how AI agents could navigate web browsers and manipulate objects in simulated environments. Now comes a preprint (arXiv:2604.22748) that attempts something far more ambitious: a unified framework for understanding how AI systems might move beyond passive prediction toward truly autonomous interaction with the world.

The paper, from a team of 42 researchers led by Jiaya Jia, proposes what they call a “levels × laws” taxonomy for world models — the internal representations that AI agents use to understand and anticipate their environments. It’s an attempt to bring order to a field that has grown chaotically across multiple research communities, each using the same term to mean different things.

AI researchers have long puzzled over what a “world model” actually is. Now, Meng Chu and colleagues say they have worked out a framework that explains not just what these models are, but how they might evolve from simple predictors into systems capable of reshaping the environments they inhabit.

What Makes a World Model?

The core insight of the paper is deceptively simple: not all world models are created equal, and the differences matter enormously for what an AI agent can accomplish.

At the most basic level — which the authors call L1: Predictor — a world model learns one-step transitions. Given the current state of the environment and an action, it predicts what will happen next. Think of it like learning that pushing a cup makes it slide across the table, without understanding anything about friction, momentum, or the fact that the cup will eventually stop. This is where most current systems operate.
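To make the one-step idea concrete, here is a minimal sketch in Python. It is not taken from the paper: the cup-on-a-table setup, the `OneStepPredictor` class, and the numbers are all hypothetical, and the model is deliberately crude (a single learned "displacement per unit of push").

```python
from dataclasses import dataclass


@dataclass
class OneStepPredictor:
    """Hypothetical L1-style model: state = cup position, action = push strength."""

    gain: float = 0.0  # learned displacement per unit of push

    def fit(self, transitions):
        # transitions: list of (state, action, next_state) tuples.
        # Least-squares estimate of displacement = gain * action.
        num = sum(a * (s2 - s) for s, a, s2 in transitions)
        den = sum(a * a for _, a, _ in transitions)
        self.gain = num / den if den else 0.0

    def predict(self, state, action):
        # One step only: no notion of friction, momentum, or the cup stopping.
        return state + self.gain * action


# Made-up observations of pushing a cup.
data = [(0.0, 1.0, 0.1), (0.1, 2.0, 0.3), (0.3, 1.0, 0.4)]
model = OneStepPredictor()
model.fit(data)
print(model.predict(0.4, 2.0))  # predicted position after one more push
```

The model answers exactly one question, "where will the cup be after this push?", and nothing about what happens afterwards.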

The next level, L2: Simulator, is where things get interesting. A simulator doesn’t just predict one step ahead; it can chain predictions together into multi-step rollouts. It can imagine pushing the cup, watching it slide, then pushing it again, and again — constructing a coherent narrative of possible futures. More importantly, it respects the laws of whatever domain it operates in: physical laws for robots, software rules for web agents, social norms for chatbots.
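A rough sketch of what chaining predictions under a domain rule might look like is given below. The dynamics in `step`, the friction factor of 0.5, and the "cup must stay on the table" rule are illustrative assumptions, not anything specified in the paper.

```python
def step(state, action):
    """Assumed toy dynamics: velocity halves each step (friction), a push adds velocity."""
    pos, vel = state
    vel = 0.5 * vel + action
    return (pos + vel, vel)


def rollout(state, actions, table_length=2.0):
    """Chain one-step predictions; stop as soon as the domain rule is violated."""
    trajectory = [state]
    for action in actions:
        state = step(state, action)
        if not 0.0 <= state[0] <= table_length:
            break  # the imagined cup left the table, so this future is invalid
        trajectory.append(state)
    return trajectory


# Push, wait, wait, push again: the last push would send the cup off the table.
traj = rollout((0.0, 0.0), [1.0, 0.0, 0.0, 1.0])
print(traj)
```

Here the rollout keeps only the part of the imagined future that respects the rule, which is the difference between a chain of guesses and a simulation an agent could plan with.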

The highest level, L3: Evolver, is the most speculative and the most ambitious. An evolver can detect when its own predictions fail against new evidence and autonomously revise its internal model. This is what scientists do when experimental results contradict theory — they don’t just update parameters; they change the structure of their understanding.
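The paper does not prescribe a mechanism for this, so the following is only a toy stand-in: the agent holds several candidate model structures and switches when its current one is surprised by the evidence. Choosing among fixed candidates is a much weaker operation than genuinely inventing new structure; the function names, the candidates, and the tolerance are all hypothetical.

```python
def frictionless(pos, push):
    # Candidate structure 1: every push moves the cup by its full strength.
    return pos + push


def with_friction(pos, push):
    # Candidate structure 2: friction absorbs half of each push.
    return pos + 0.5 * push


CANDIDATES = {"frictionless": frictionless, "with_friction": with_friction}


def mean_error(model, evidence):
    # Average absolute gap between predicted and observed next positions.
    return sum(abs(model(s, a) - s2) for s, a, s2 in evidence) / len(evidence)


def revise_if_surprised(current, evidence, tolerance=0.05):
    """Keep the current model unless new evidence contradicts it beyond tolerance."""
    if mean_error(CANDIDATES[current], evidence) <= tolerance:
        return current
    # Surprise detected: adopt whichever candidate best explains the evidence.
    return min(CANDIDATES, key=lambda name: mean_error(CANDIDATES[name], evidence))


# Made-up observations that contradict the frictionless assumption.
evidence = [(0.0, 1.0, 0.5), (0.5, 2.0, 1.5)]
chosen = revise_if_surprised("frictionless", evidence)
print(chosen)
```

The two halves of the loop, detecting the failure and then changing the model rather than just its parameters, are what the Evolver level asks for.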

Four Regimes, Four Challenges

But capability isn’t the only axis. The team identifies four governing-law regimes that determine what constraints a world model must satisfy and where it is most likely to fail.

Physical regimes are governed by the laws of physics — conservation of momentum, thermodynamics, causality. These are the most forgiving for world models because the rules are stable and well-understood. A robot learning to grasp objects operates here.

Digital regimes are governed by software rules, which are deterministic but often opaque. A web agent navigating a GUI must understand that clicking a button triggers a specific function, but the underlying code may be hidden. Unlike the physical world, digital environments can be reset, rewound, and inspected from any angle — a luxury that makes them ideal training grounds.

Social regimes are governed by human behavior, which is anything but deterministic. A conversational agent must model beliefs, intentions, and social norms — and these can shift unpredictably. This is where world models most frequently fail, because human behavior resists clean mathematical description.

Scientific regimes are where world models face their ultimate test: discovering new knowledge. An AI that designs experiments, interprets results, and revises its own hypotheses is operating at the intersection of simulation and evolution. This is the frontier.
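Taken together, the three levels and four regimes form a grid with twelve cells. One way to write that grid down, purely as an illustration (the enum and function names are invented here, not the paper's notation), is:

```python
from enum import Enum
from itertools import product


class Level(Enum):
    PREDICTOR = "L1"
    SIMULATOR = "L2"
    EVOLVER = "L3"


class Regime(Enum):
    PHYSICAL = "physical"
    DIGITAL = "digital"
    SOCIAL = "social"
    SCIENTIFIC = "scientific"


# Every (level, regime) pair is one cell of the levels x laws taxonomy.
GRID = list(product(Level, Regime))


def label(level, regime):
    return f"{level.value} {level.name.title()} / {regime.value}"


for level, regime in GRID:
    print(label(level, regime))
```

Any given system can then be placed in a cell, for example a multi-step web-agent simulator under `label(Level.SIMULATOR, Regime.DIGITAL)`.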

The Road Ahead

The paper synthesizes over 400 works across model-based reinforcement learning, video generation, web agents, multi-agent simulation, and AI-driven scientific discovery. It’s a literature review that reads more like a manifesto — a call for different research communities to recognize that they’re working on the same problem, just at different levels and in different regimes.

What makes the framework valuable is its insistence on decision-centric evaluation. The authors argue that world models should not be judged by how accurately they predict the next frame of a video or the next state of a simulation, but by how well they support decision-making in the environments they model. A perfect predictor that cannot guide action is, for most practical purposes, useless.
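A small numerical illustration shows why the two yardsticks can disagree. Everything in it is invented for the example: three actions with made-up true outcomes, a target outcome, and two hypothetical models. It is not the paper's evaluation protocol.

```python
# True outcome of each action, and the outcome the agent wants to reach.
TRUE = {"a": 1.0, "b": 0.9, "c": -1.0}
TARGET = 1.0

# Two hypothetical world models, each predicting the outcome of every action.
MODEL_X = {"a": 0.8, "b": 0.9, "c": -1.0}  # close on average, wrong where it matters
MODEL_Y = {"a": 1.0, "b": 0.5, "c": -0.5}  # sloppy overall, right about the best action


def prediction_error(model):
    # Mean absolute error of the predicted outcomes.
    return sum(abs(model[k] - TRUE[k]) for k in TRUE) / len(TRUE)


def decision_regret(model):
    # The agent picks the action its model says lands closest to the target;
    # regret is how much worse the true result is than the best achievable one.
    chosen = min(model, key=lambda k: abs(model[k] - TARGET))
    best = min(abs(v - TARGET) for v in TRUE.values())
    return abs(TRUE[chosen] - TARGET) - best


for name, model in [("X", MODEL_X), ("Y", MODEL_Y)]:
    print(name, prediction_error(model), decision_regret(model))
```

In this toy case model X has the lower prediction error but leads the agent to a slightly worse action, while model Y predicts less accurately yet picks the best action. A decision-centric evaluation would rank Y first.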

The team also proposes a minimal reproducible evaluation package — a standardized test bed that could allow different approaches to be compared fairly. This is the kind of infrastructure that transforms a field from a collection of ad hoc demonstrations into a mature scientific discipline.

What This Means

For researchers, the levels × laws taxonomy provides a shared vocabulary and a map of unexplored territory. The paper identifies specific gaps: few systems operate as L2 Simulators in social regimes, and virtually none reach the L3 Evolver level in any regime. These are not just technical challenges; they represent fundamental questions about what it means for an AI to understand the world.

For the rest of us, the framework offers a way to think about AI progress that is more nuanced than “it’s getting smarter.” The question is not whether an AI can predict the next word or the next pixel, but whether it can simulate consequences, respect constraints, and revise its own understanding when the world surprises it.

That last capability — the ability to evolve — may be the most important. Because the environments AI agents will operate in, from scientific laboratories to human societies, are themselves constantly changing. A world model that cannot adapt to change is not a model of the real world at all.

The paper does not claim to have built an L3 Evolver. It does not promise one within any specific timeframe. What it offers is something perhaps more valuable: a map of where we are, where we need to go, and what it will mean when we get there.

Yanjiang is an online editor of Loom Science.

References

  • Meng Chu et al., Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond, arXiv:2604.22748