Learning Without Local Minima: Gauge Freedom Makes Matrix Product States Trainable

10 Jun 2026, Yanjiang

Gauge freedom in matrix product states creates a smooth energy landscape, eliminating poor local minima and enabling reliable quantum training.

Picture a mountaineer, dropped onto a vast, jagged peak range in total darkness, told to find the lowest valley by feel alone. Every step is a gamble. Most directions lead upward. False dips promise relief but end in dead ends. Now imagine a second mountaineer, on an identical range, but carrying a lantern that somehow smooths the terrain — not by altering the mountains, but by bending the space around them, turning treacherous crags into gentle slopes that all funnel toward the true bottom. The first mountaineer is a quantum circuit. The second is a matrix product state.

Figure 1. Sequential circuits produce energy landscapes free from poor local minima, unlike brickwork circuits. This absence makes it easier to find optimal solutions in quantum simulations and calculations. (Source: arXiv:2606.09988)

The question of why quantum computers are so maddeningly difficult to train has haunted the field for years. Variational quantum algorithms — where classical optimizers search for the gates that minimize a cost function, typically an energy — suffer from landscapes riddled with poor local minima. Even shallow circuits, it seems, can trap an optimizer as thoroughly as a pit trap snares a wolf. Yet the density matrix renormalization group (DMRG), which has been solving quantum many-body problems since the 1990s, operates on the exact same underlying object — a matrix product state (MPS) — and trains with breathtaking reliability. This paradox has never been fully explained. Until now.

In a preprint (arXiv:2606.09988), a team led by Tao Xiang at the Institute of Physics, Chinese Academy of Sciences, with Hao-Kai Zhang as first author, along with collaborators from Hong Kong and Princeton, offers a proof that resolves the tension. Their central claim: the energy landscapes of MPS are free from poor local minima — under precisely the same conditions where generic brickwork circuits fail catastrophically. The secret is not clever initialization or exotic optimization tricks. It is something more fundamental: the gauge freedom of MPS, which creates an effective local overparametrization that concentrates all local minima within a whisper of the global minimum.

The lantern in the dark

To understand why this matters, we need to be clear about what makes a landscape treacherous. A poor local minimum is a point from which every small step increases the cost function, yet whose value lies far above the true ground-state energy. In the language of earlier work on quantum landscape theory, the fluctuation of the loss function (how wildly it varies under small parameter changes) is a strong predictor of trainability. A landscape that fluctuates violently will toss an optimizer into bad minima; a landscape that varies smoothly will guide it home. That earlier study, by Zhang and colleagues (arXiv:2406.11805), proposed a quantitative fluctuation metric and showed that when it drops, learning succeeds. What it could not fully explain was why MPS landscapes were so much smoother than those of brickwork circuits. The present work supplies the missing mechanism: gauge freedom.
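The verbal definition of a poor local minimum can be written compactly. This is only a sketch in notation introduced here, not taken from the paper: \theta stands for the variational parameters, E(\theta) for the energy, E_0 for the true ground-state energy, and \epsilon for an assumed threshold on what counts as "far above".

```latex
\theta^{*} \ \text{is a poor local minimum if} \quad
\nabla E(\theta^{*}) = 0, \qquad
\nabla^{2} E(\theta^{*}) \succeq 0, \qquad
E(\theta^{*}) - E_{0} \ge \epsilon .
```

The first two conditions are the usual necessary conditions for a local minimum; the third is what makes it "poor". The Hessian condition is written as positive semidefinite rather than positive definite because a representation with redundant parameters always has flat directions.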

Here is the mechanism. An MPS is a product of tensors, each with its own internal indices. But the representation is not unique: on the bond shared by two neighboring tensors, you can multiply one tensor by an invertible matrix and the other by its inverse without changing the physical state. This is the gauge freedom: a continuous family of equivalent representations that differ only by a local choice of basis. The team proves that moving the orthogonality center (a particular gauge choice that simplifies the tensor network’s structure) does not distort the energy landscape; it merely relabels the minima. More remarkably, the distribution of local minima (how many there are, and where they sit relative to the global optimum) is invariant under such moves.
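The gauge freedom described above can be checked numerically on a small example. The following is a minimal NumPy sketch, not code from the paper: the helper names (random_mps, mps_to_vector, gauge_transform, shift_center_right), the tensor shape convention (left, physical, right), and the sizes are all assumptions made for illustration. It inserts an invertible matrix and its inverse on one bond, and separately performs the QR step commonly used to shift an orthogonality center, then confirms that the contracted state vector is unchanged in both cases.

```python
import numpy as np

def random_mps(n_sites, phys_dim, bond_dim, rng):
    """Random open-boundary MPS; tensor k has shape (left, phys, right)."""
    dims = [1] + [bond_dim] * (n_sites - 1) + [1]
    return [rng.standard_normal((dims[k], phys_dim, dims[k + 1]))
            for k in range(n_sites)]

def mps_to_vector(tensors):
    """Contract every bond and return the full state vector."""
    psi = tensors[0]
    for t in tensors[1:]:
        psi = np.tensordot(psi, t, axes=([psi.ndim - 1], [0]))
    return psi.reshape(-1)

def gauge_transform(tensors, bond, x):
    """Insert x @ inv(x) on the bond between sites `bond` and `bond + 1`."""
    out = [t.copy() for t in tensors]
    out[bond] = np.tensordot(out[bond], x, axes=([2], [0]))
    out[bond + 1] = np.tensordot(np.linalg.inv(x), out[bond + 1],
                                 axes=([1], [0]))
    return out

def shift_center_right(tensors, site):
    """QR step: make `site` left-isometric and push the R factor one site
    to the right. This moves the orthogonality center only if the sites
    further left are already left-isometric; the state is unchanged
    either way."""
    out = [t.copy() for t in tensors]
    left, phys, right = out[site].shape
    q, r = np.linalg.qr(out[site].reshape(left * phys, right))
    out[site] = q.reshape(left, phys, q.shape[1])
    out[site + 1] = np.tensordot(r, out[site + 1], axes=([1], [0]))
    return out

rng = np.random.default_rng(0)
mps = random_mps(n_sites=5, phys_dim=2, bond_dim=3, rng=rng)

# A non-unitary but well-conditioned invertible matrix (illustrative choice).
x = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
gauged = gauge_transform(mps, bond=2, x=x)
shifted = shift_center_right(mps, site=1)

psi = mps_to_vector(mps)
print("same state after gauge transform:", np.allclose(psi, mps_to_vector(gauged)))
print("same state after QR shift:       ", np.allclose(psi, mps_to_vector(shifted)))
print("same tensors after transform:    ", np.allclose(mps[2], gauged[2]))
```

The first two checks should agree while the third should not: the individual tensors change, yet the state they represent does not, which is the many-to-one map from parameters to states that the argument relies on.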

Figure 2. Each distinct physical state corresponds to a whole family of mathematically different yet equivalent representations, all sharing the same energy. This means optimization algorithms will not get stuck in deceptive false minima, making it easier to find the best quantum state. (Source: arXiv:2606.09988)

That invariance has a profound consequence. Because the gauge orbit weaves a continuum of equivalent representations through the parameter space, any point that would be a poor local minimum in one gauge becomes part of a connected set of points with the same energy. In the full overparametrized space, these minima are no longer isolated traps; they merge into basins that stretch toward the global minimum. The team’s proof stops short of a full demonstration of concentration (that part remains a compelling heuristic argument, backed by numerics), but it supports the picture of a fundamentally benign geometry: not because the mountains are lower, but because the space itself is folded so that all paths lead downward.
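The claim that every point sits on a connected set of equal-energy points can be illustrated with a toy calculation. The sketch below is an assumption-laden example, not the paper's experiment: it uses a three-site MPS, a random symmetric matrix as a stand-in Hamiltonian, and a one-parameter family of gauge matrices I + t*g. It compares how much the energy varies along that gauge path with how much it varies along a generic perturbation of the first tensor.

```python
import numpy as np

rng = np.random.default_rng(1)
d, D = 2, 3                                   # physical / bond dimension (illustrative)
a1 = rng.standard_normal((d, D))              # left boundary tensor
a2 = rng.standard_normal((D, d, D))           # bulk tensor
a3 = rng.standard_normal((D, d))              # right boundary tensor

h = rng.standard_normal((d**3, d**3))
h = (h + h.T) / 2                             # random symmetric stand-in for a Hamiltonian

def energy(t1, t2, t3):
    """Rayleigh quotient <psi|H|psi> / <psi|psi> of the three-site MPS."""
    psi = np.einsum("ia,ajb,bk->ijk", t1, t2, t3).reshape(-1)
    return (psi @ h @ psi) / (psi @ psi)

g = rng.standard_normal((D, D))
g /= np.linalg.norm(g, 2)                     # spectral norm 1, so I + t*g is invertible for |t| < 1
delta = rng.standard_normal((d, D))           # a generic, non-gauge perturbation of a1

orbit, generic = [], []
for t in np.linspace(-0.2, 0.2, 9):
    x = np.eye(D) + t * g
    # Gauge move on the first bond: a1 -> a1 x, a2 -> x^{-1} a2.
    a2_gauged = np.einsum("ab,bjc->ajc", np.linalg.inv(x), a2)
    orbit.append(energy(a1 @ x, a2_gauged, a3))
    # Uncompensated change of a1: the physical state itself changes.
    generic.append(energy(a1 + t * delta, a2, a3))

orbit_spread = max(orbit) - min(orbit)
generic_spread = max(generic) - min(generic)
print(f"energy spread along gauge path:        {orbit_spread:.2e}")
print(f"energy spread along generic direction: {generic_spread:.2e}")
```

Along the gauge path the spread should sit at the level of floating-point rounding, while the generic direction should show a visible change. This only demonstrates the flat directions themselves; it says nothing about whether minima concentrate near the global optimum.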

This is not a metaphor drawn from thin air. It closely parallels the mechanism widely credited with making overparametrized classical neural networks trainable: the presence of many degrees of freedom that leave the function unchanged smooths the loss surface and prevents spurious barriers. In deep learning, a related phenomenon, in which distinct minima turn out to be joined by low-loss paths, is called “mode connectivity.” In MPS, the corresponding structure is the gauge group. The analogy is tight enough that the paper’s insight may have cross-disciplinary implications for how we think about trainability in any family of functions with a hidden symmetry.

The dance of the sequential circuit

The theoretical result is complemented by numerical experiments that are as stark as any physicist could wish. The team trained three types of circuits on random backward-evolved Hamiltonians — effectively, maximally unstructured landscapes with no special symmetry. Brickwork circuits, where gates are arranged in a regular alternating pattern like a masonry wall, stalled at high energies for systems beyond just a few qubits. The optimizers got stuck, hopelessly far from the true ground state. Sequential circuits, which build up the MPS tensor-by-tensor from left to right, behaved completely differently. Even for 18 qubits and random Hamiltonians, they converged to near-optimal solutions with striking consistency, the final energies clustering around the global minimum.

The sloping brickwork circuits — an intermediate architecture that interpolates between sequential and brickwork — showed intermediate behavior, failing gradually as the number of layers grew and the circuit departed from the pure sequential form. The visual evidence, encoded in scatter plots of final energies against system size, told a clean story: as long as the gauge freedom remained locally effective (the orthogonality center could move freely), the landscape was trainable. When the circuit’s structure locked the gauge freedom away, poor minima re-emerged.
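To make the two extreme architectures concrete, here is a small sketch of their gate placements. It assumes nearest-neighbor two-qubit gates on an open chain, which is the simplest case and may differ from the gate sizes used in the paper; the function names are hypothetical. The sloping variant is left out because its exact definition is not given in this article.

```python
def brickwork_layout(n_qubits, n_layers):
    """Alternating layers of nearest-neighbor two-qubit gates.

    Even layers act on pairs (0,1), (2,3), ...; odd layers on (1,2), (3,4), ...
    Gates within one layer act on disjoint qubits and can run in parallel.
    """
    layers = []
    for layer in range(n_layers):
        start = layer % 2
        layers.append([(q, q + 1) for q in range(start, n_qubits - 1, 2)])
    return layers

def sequential_layout(n_qubits, n_sweeps=1):
    """Staircase of two-qubit gates applied one after another, left to right.

    Each gate shares a qubit with the previous one, so a sweep must be
    executed in order rather than in parallel.
    """
    return [[(q, q + 1) for q in range(n_qubits - 1)] for _ in range(n_sweeps)]

brick = brickwork_layout(n_qubits=6, n_layers=2)
stair = sequential_layout(n_qubits=6)

for i, layer in enumerate(brick):
    print(f"brickwork layer {i}: {layer}")
print(f"sequential sweep:  {stair[0]}")
```

In the staircase, each gate hands one qubit on to the next gate, which mirrors the left-to-right, tensor-by-tensor construction of an MPS; the brickwork pattern has no such single chain of hand-offs.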

One might ask: does this necessarily follow from the invariance theorem? A careful reader — and the team is careful — will note that Theorem 2 proves only the relabeling of minima under gauge moves, not that the minima necessarily become shallow or concentrate. The heuristic step that bridges the proof to the strong numerical evidence is the idea of “effective local overparametrization”: because the gauge freedom acts on blocks of sites, it creates a local reservoir of parameters that can adjust without altering the physical state, turning what would be sharp troughs into broad valleys. The numerics strongly support this picture. But a rigorous proof of concentration remains an open problem, and the extension to tree-tensor networks, projected entangled-pair states, and other architectures is not immediate. The paper’s reach is bounded by the specific structure of MPS.

Yet perhaps that boundedness is itself a virtue. It tells us that trainability is not a vague, unanalyzable property of some circuits being “lucky.” It is a precise consequence of a concrete, exploitable symmetry. If you want a variational ansatz to train, you should design it with a gauge group large enough to overparametrize locally, but not so large as to become computationally useless. The paper gives algorithm designers a clear target: build in gauge redundancy, and the landscape will follow.

A lantern, not a magic wand

A lantern does not change the mountain; it only changes how the mountaineer navigates it. But that change is everything. For decades, DMRG practitioners have been carrying that lantern without a theory of its glow. Now they have one.

The implication stretches beyond the immediate community of tensor-network theorists. Variational quantum algorithms are currently the most promising near-term path to extracting value from noisy quantum processors. If we understand why some circuit architectures train and others don’t, we can design the next generation of ansätze with trainability baked into their geometry, rather than relying on the kindness of random initialization. The team’s work does not claim to have solved the barren-plateau problem in its most general form — the catastrophic vanishing of gradients remains a separate beast — but it isolates the equally damaging issue of poor local minima and shows, convincingly, that it can be tamed by a symmetry we already know how to exploit.

There is a deeper philosophical current here, one that runs through much of modern machine learning and quantum physics alike. Structures that appear redundant, that seem to carry unnecessary degrees of freedom, may be the very things that make complex systems navigable. In neural networks, overparametrization does not merely add capacity; it changes the geometry of the loss function so that optimization becomes easier. In gauge theories of fundamental physics, the apparent superfluity of gauge potentials conceals the true degrees of freedom, yet without them, the theory is intractable. The MPS gauge freedom sits at the intersection of these two lessons: it is both a practical tool and a pointer toward a deeper principle.

If there is a surprise in this story, it is not that MPS are trainable — we knew that empirically — but that the reason is so clean, so transferable, and so long overlooked. The team has not built a new lantern. They have told us, at last, how the old one works. And in doing so, they have drawn a map for those who would build new ones, for terrains we have not yet dared to explore in the dark.

— Yanjiang

Yanjiang is an online editor of LoomSci.com.

References

  • Hao-Kai Zhang et al., Absence of poor local minima in matrix product states, arXiv:2606.09988
  • Zhang et al., Predicting quantum learnability from landscape fluctuation, arXiv:2406.11805