Steering the Unsteerable: AI Learns to Control Liquid Crystal Defects
15 Jun 2026, Yanjiang
A deep reinforcement learning agent learns to steer a self-propelled topological defect through a microfluidic maze, revealing hidden rules of active nematics.
What does it take to steer a self-propelled vortex? For a team led by Yuto Ashida at the University of Tokyo, the answer came from an unlikely teacher: a deep reinforcement learning algorithm. In a preprint (arXiv:2606.13721), Russ Islam, Kyogo Kawaguchi, and Ashida show that an AI can learn to pilot topological defects—swirling disturbances that move like animated whirlpools—through microfluidic mazes in an active liquid crystal. The work not only masters a chaotic system, but uncovers hidden rules about where such defects can go and how memory is written into the fluid’s structure.
Active nematics are fluids composed of self-propelled rod-shaped units, such as microtubules or bacteria, that align with each other. Their flow is riddled with topological defects: points where the orientation is undefined and the surrounding alignment winds around them. The most mobile of these is the +1/2 defect, which resembles a tiny comet with a head and a wake, and which has the peculiar property of swimming forward by churning the fluid around it. Think of a school of fish in a narrow tank: if one fish turns abruptly, the resulting swirl itself advances, pushing ahead. In a similar way, a +1/2 defect self-propels, turning the energy injected by the active units into directed motion. The defect is not alive, of course; its movement arises purely from the equations of nematic hydrodynamics. Still, its apparent autonomy is startling.
Harnessing that self-propulsion has been a dream for active matter researchers. If you could steer defects, you could guide the flow, mix fluids, or even use defects as information carriers. But the defects are unruly; they respond sensitively to the surrounding director field and to the spatiotemporal patterns of activity—the internal stresses that drive the fluid. Previous attempts relied on static, rule-based patterns, like shining light in a fixed region. Such methods often overshoot or fail in complex geometries, because they cannot adapt to the defect’s moment-to-moment situation. The Tokyo team asked: could an AI learn to control these defects in real time, adjusting its actions as the defect moves?
Their solution combines hybrid lattice Boltzmann simulations of active nematodynamics with deep reinforcement learning (RL). At each time step, the RL agent observes the director field and the positions of both +1/2 and −1/2 defects, then selects a spatiotemporal activity pattern from a predefined set, a kind of menu of possible nudges. The reward is based on how close the defect gets to a target, which encourages the controller to steer it there in as little time as possible. Over thousands of simulated episodes, the agent hones a policy: a learned intuition for when to push, when to pause, and when to exploit the fluid’s memory. The RL controllers consistently outperform static and rule-based baselines, cutting the distance-to-goal error to a fraction of what those baselines leave.
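The observe, choose, reward loop described above can be sketched in a few lines of Python. This is only a toy under loud assumptions: the hydrodynamic simulation is replaced by simple kinematics (`toy_step`), the trained network by a one-step greedy rule (`greedy_policy`), and the reward by negative distance to the target minus a hypothetical penalty per created defect. The action names and all parameter values are illustrative and are not taken from the preprint.

```python
import numpy as np

# Hypothetical menu of activity patterns. Each one is modeled here as a plain
# velocity nudge on the defect; in the real system a pattern acts on the fluid.
ACTION_MENU = {
    "off": np.array([0.0, 0.0]),
    "push_right": np.array([1.0, 0.0]),
    "push_left": np.array([-1.0, 0.0]),
    "push_up": np.array([0.0, 1.0]),
    "push_down": np.array([0.0, -1.0]),
}


def reward(defect_pos, target, n_created=0, creation_penalty=1.0):
    """Assumed reward: negative distance to target, minus a penalty per created defect."""
    distance = np.linalg.norm(np.asarray(defect_pos, dtype=float) - np.asarray(target, dtype=float))
    return -float(distance) - creation_penalty * n_created


def toy_step(defect_pos, action, dt=0.1):
    """Toy kinematics standing in for the lattice Boltzmann simulation."""
    return defect_pos + dt * ACTION_MENU[action]


def greedy_policy(defect_pos, target):
    """Stand-in for a trained policy: pick the action with the best one-step reward."""
    return max(ACTION_MENU, key=lambda a: reward(toy_step(defect_pos, a), target))


def run_episode(start, target, n_steps=200):
    """Roll out one episode and return the final position and the summed reward."""
    pos = np.asarray(start, dtype=float)
    total = 0.0
    for _ in range(n_steps):
        action = greedy_policy(pos, target)
        pos = toy_step(pos, action)
        total += reward(pos, target)
    return pos, total
```

Calling `run_episode((0.0, 0.0), (3.0, 2.0))` walks the toy defect to the target and then holds it there with the `"off"` action. A learned policy would replace `greedy_policy`, and it would see the full director field rather than just a position.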
But before learning how, the team needed to understand where a defect could even go. They introduced the concept of reachable sets: the collection of all positions a defect can reach within a fixed time, given the available activity patterns and the geometry’s constraints. In a free domain, with a rich set of local activity patterns that act near the defect, the reachable set spans almost the entire space, so the defect can be herded anywhere. When the defect enters a cross junction, however, local patterns prove futile. The director field’s broken symmetry and the walls’ anchoring lock the defect into a single channel, unable to turn. The defect can cross the junction only when the controller draws on global patterns, which pull on the entire director field and eventually let the defect round the corner and enter a new channel.
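The idea of a reachable set can be made concrete with a toy calculation. The sketch below is an illustration, not the preprint's method: the defect lives on a grid, each "activity pattern" is reduced to a unit move, and the cross junction is the set of cells on either axis. The function name `reachable_set`, the two move sets, and the junction size are all assumptions chosen to mimic the qualitative statement that a restricted menu confines the defect to one channel.

```python
def reachable_set(start, moves, is_free, horizon):
    """Return every cell reachable from `start` in at most `horizon` moves."""
    reached = {start}
    frontier = {start}
    for _ in range(horizon):
        next_frontier = set()
        for (x, y) in frontier:
            for dx, dy in moves:
                cell = (x + dx, y + dy)
                if is_free(cell) and cell not in reached:
                    next_frontier.add(cell)
        reached |= next_frontier
        frontier = next_frontier
    return reached


def in_cross_junction(cell, arm_length=5):
    """Toy geometry: a horizontal and a vertical channel meeting at the origin."""
    x, y = cell
    return (y == 0 and abs(x) <= arm_length) or (x == 0 and abs(y) <= arm_length)


# Assumed stand-ins: the restricted menu only moves the defect along its channel,
# while the full menu also allows moves into the perpendicular channel.
RESTRICTED_MOVES = [(1, 0), (-1, 0)]
FULL_MOVES = RESTRICTED_MOVES + [(0, 1), (0, -1)]

confined = reachable_set((-3, 0), RESTRICTED_MOVES, in_cross_junction, horizon=6)
expanded = reachable_set((-3, 0), FULL_MOVES, in_cross_junction, horizon=6)
```

Here `confined` contains only cells in the horizontal channel, while `expanded` also includes cells in the top and bottom channels. In the simulations the same comparison is between menus of local and global activity patterns, with the full fluid dynamics deciding which positions are actually reachable.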
This is where the RL agent reveals its strength. In a free geometry, the learned policy glides the +1/2 defect to the goal and then holds it there with minimal wobble, while the rule-based controller overshoots and oscillates wildly, as if unable to brake. In a cross junction with a right-channel goal, a static controller triggers chaotic pair creation: the original +1/2 defect collides with a newly born −1/2 partner and annihilates, leaving the target unreached. The RL controller, penalized for creating defects, learns to avoid that catastrophe entirely, steering the intact defect to the goal. Yet when the goal lies in the top channel, the original defect cannot reach it at all—the geometry forbids intact passage. Here the RL controller discovers a clever trick: it deliberately creates a new defect pair in the junction, transfers control to the newborn +1/2 defect, and guides that one to the target, as if swapping horses midstream. If defect creation is forbidden, the controller never finds a solution; the very obstacle becomes a resource when the rules are softened.
Perhaps the most striking result is the meta-controller. The team trained separate RL agents on simple junctions, such as a T-junction and a cross junction, and then combined their frozen policies into a single controller without any additional training. This composite agent steers a defect through a larger test maze, following a sequence of waypoints in a clockwise loop. It is like a driver who has only ever practiced a few basic intersections but can still find their way through a new city by chaining the learned maneuvers, with zero retraining.
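One way to picture such a composition is a dispatcher that looks up which kind of junction the defect is in, hands control to the matching frozen policy, and advances to the next waypoint once the current one is reached. The sketch below is a guess at that structure, not the preprint's implementation: `toward` is a stand-in for a trained policy, `classify` is a hypothetical rule that labels a position by junction type, and the tolerance and step sizes are arbitrary.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Policy = Callable[[Point, Point], Point]  # (defect position, waypoint) -> displacement


def toward(step: float) -> Policy:
    """Stand-in for a frozen, pre-trained policy: step straight at the waypoint."""
    def policy(pos: Point, goal: Point) -> Point:
        dx, dy = goal[0] - pos[0], goal[1] - pos[1]
        d = math.hypot(dx, dy)
        if d <= step:
            return (dx, dy)  # close enough to land exactly on the waypoint
        return (step * dx / d, step * dy / d)
    return policy


class MetaController:
    """Chains frozen policies along a waypoint list without any retraining."""

    def __init__(self, policies: Dict[str, Policy], classify: Callable[[Point], str],
                 waypoints: List[Point], tol: float = 1e-6):
        self.policies = policies
        self.classify = classify  # hypothetical: maps a position to a junction type
        self.waypoints = waypoints
        self.tol = tol
        self.idx = 0
        self.used: List[str] = []

    def act(self, pos: Point) -> Point:
        # Move on to the next waypoint once the current one has been reached.
        while (self.idx < len(self.waypoints) - 1
               and math.dist(pos, self.waypoints[self.idx]) <= self.tol):
            self.idx += 1
        kind = self.classify(pos)
        self.used.append(kind)
        return self.policies[kind](pos, self.waypoints[self.idx])


def run(controller: MetaController, start: Point, n_steps: int) -> Point:
    """Apply the controller's displacements for a fixed number of steps."""
    pos = start
    for _ in range(n_steps):
        dx, dy = controller.act(pos)
        pos = (pos[0] + dx, pos[1] + dy)
    return pos
```

For example, with `policies = {"T": toward(0.5), "cross": toward(0.25)}`, a `classify` function that returns `"cross"` on the right half of a square and `"T"` elsewhere, and the waypoints `[(0, 2), (2, 2), (2, 0), (0, 0)]`, the toy defect completes a clockwise loop while control passes back and forth between the two policies.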
The defect’s journey leaves more than a trajectory; it writes a persistent memory into the material. By tracking the free energy, the team found that activity patterns immediately raise the local free energy, and these high-energy director distortions linger long after the activity is turned off. The traveling +1/2 defect carves a path of altered orientation through the channel, a kind of scar. Intriguingly, a −1/2 defect—the less mobile counterpart—drifts behind and partially erases these scars, lowering the free energy and smoothing out the history. The system can thus both write and erase, a primitive form of material memory.
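To see how a director distortion shows up as stored free energy, consider a minimal numerical sketch. The article does not say which free-energy functional the team tracked, so this example assumes the simplest textbook choice: a two-dimensional, one-constant Frank elastic energy, F = (K/2) ∫ |∇θ|² dA, where θ is the director angle. The function name, the elastic constant `K`, and the grid spacing `h` are illustrative. The gradient is taken through cos 2θ and sin 2θ so that rotating the director by π, which leaves a nematic unchanged, does not register as a distortion.

```python
import numpy as np


def elastic_free_energy(theta, K=1.0, h=1.0):
    """One-constant Frank elastic energy of a 2D director-angle field (assumed model).

    theta : 2D array of director angles in radians, indexed as [y, x]
    K     : elastic constant (illustrative value)
    h     : grid spacing
    """
    c, s = np.cos(2.0 * theta), np.sin(2.0 * theta)
    dc_dy, dc_dx = np.gradient(c, h)
    ds_dy, ds_dx = np.gradient(s, h)
    # |grad cos(2θ)|² + |grad sin(2θ)|² = 4 |grad θ|², hence the division by 4.
    grad_theta_sq = (dc_dx**2 + dc_dy**2 + ds_dx**2 + ds_dy**2) / 4.0
    return float(0.5 * K * grad_theta_sq.sum() * h * h)


# A perfectly aligned field versus one with a gentle, uniform bend along x.
y, x = np.mgrid[0:64, 0:64]
aligned = np.zeros((64, 64))
bent = 0.05 * x

energy_aligned = elastic_free_energy(aligned)
energy_bent = elastic_free_energy(bent)
```

The aligned field stores no elastic energy, while the bent one stores an amount close to the analytic value (K/2) × 0.05² × area. In this picture a "scar" is a region where the summed density stays above its relaxed value after the activity is switched off.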
An important question raised by earlier work on controlling defects in polar fluids (Singh et al., arXiv:2507.19298) is whether the same RL strategy can tame the more complex nematic case with its extensile activity. The current work shows it can—but only for +1/2 defects. The −1/2 defects lack the self-propulsion that makes the +1/2 ones steerable; they remain stubbornly uncontrollable, though their erasing role hints at a complementary function. Moreover, the meta-controller’s success may be influenced by the fact that the maze is built from repeated cross junctions; generalizing to entirely novel geometries is an open challenge. The RL agent learned that every pattern option had its place—a local nudge for fine positioning, a global stripe for turning corners—and there is no single trick to steer a defect through a maze.
In the end, the team didn’t just teach a computer to steer a defect; they revealed that the very constraints of geometry, anchoring, and activity pattern create a landscape of possibility, a topography of what can and cannot be done. The defect’s reachability is not a fixed property but something shaped by the control patterns on offer and by the memory of past states. This hints at a future where active materials become programmable media: we might store information in the path of a defect and erase it with another, building a fluid computer that operates not with electrons but with swirling patterns of motion. The RL controller’s achievement is not that it banished chaos, but that it learned to recruit chaos as a collaborator: a tiny vortex that follows, if only we learn its language.
— Yanjiang
Yanjiang is an online editor of LoomSci.com.
References
- R. Islam, K. Kawaguchi, Y. Ashida, Controlling Defects and Probing Dynamics in Active Nematics with Deep Reinforcement Learning, arXiv:2606.13721
- Singh et al., Controlling Topological Defects in Polar Fluids via Reinforcement Learning, arXiv:2507.19298