Taming Chance with Geometry: The L-Squared over Wasserstein Framework
21 May 2026, Yanjiang
Optimal transport already hums inside the machine learning systems that generate images and translate languages. It is a quiet geometric engine that asks: how can mass be moved from one shape to another at the lowest possible cost? That might sound like a mathematician’s abstract fantasy, yet its fingerprints are on many of the generative models in everyday use. But there is a catch. Real‑world data are noisy; the probability measures we want to compare are almost never known exactly. A preprint (arXiv:2605.21365) from a team led by Riccardo Passeggeri at Imperial College London proposes to weave that uncertainty directly into the geometry. They call it the L‑squared over Wasserstein framework: an attempt to give random probability measures a geometric structure that mirrors the elegant smoothness of the classical Wasserstein space.
The ambition is grand. If it works, everything from statistical inference on brain‑imaging data to the random token‑sampling that powers modern transformers could be regarded as a flow on a single, unified geometric stage. But ambition, like a vaulting arch, is only as solid as the stones beneath it. Several earlier works have already carved their own approaches to random measures, and they raise pointed questions about whether the new framework’s foundations are yet robust enough to bear the weight it claims.
The Geometry of Certainty
To grasp what Passeggeri and his co‑authors are attempting, we need to spend a moment inside the orderly town of classical optimal transport. Start with two piles of sand, each one a probability measure. The Wasserstein distance – the workhorse of the field – measures the cheapest way to reshape one pile into the other, where the cost of moving each grain grows with the distance it travels. The result is not just a number: the whole space of probability measures acquires a rich geometric life. You can talk about geodesics, the shortest paths between measures, and so about the geometry of probability itself.
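The sand-pile picture can be made concrete on a line. The sketch below is an illustration rather than anything taken from the paper: it assumes two piles with equally many grains of equal weight, and uses the standard fact that in one dimension the cheapest plan matches the grains in sorted order. The function name `wasserstein2_1d` is hypothetical.

```python
import math


def wasserstein2_1d(xs, ys):
    """W_2 distance between two empirical measures on the line.

    Assumes both measures have the same number of equally weighted atoms,
    so the optimal plan pairs the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    mean_sq = sum((x - y) ** 2 for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return math.sqrt(mean_sq)


pile_a = [0.0, 1.0, 2.0]
pile_b = [3.0, 5.0, 4.0]  # the same pile, shifted by 3 and listed out of order

# Every grain travels exactly 3 units, so the distance is 3.0.
print(wasserstein2_1d(pile_a, pile_b))
```

Sorting is what makes this cheap: in higher dimensions no such shortcut exists and the matching has to be found by solving an optimisation problem.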
Think of the classical theory as a system of canals that connects every island to every other island with a well‑defined distance. The islands are the known, fixed probability measures. As long as you know exactly how much sand sits on each island, the water‑highway works flawlessly.
When Randomness Enters the Blueprint
Now imagine the islands begin to flicker. Each one becomes a cloud of possible configurations, not a single contour. You have a random probability measure – a measure whose very shape is uncertain. The problem that Passeggeri, Rohan M. Shenoy, and Pengcheng Ye address is how to endow such random measures with a geometry that respects both the Wasserstein distances among realisations and the probabilistic mixing that randomness introduces. Their answer is to treat a random probability measure not as a bundle of separate islands but as a single point in a larger, richer space: the L‑squared space over the Wasserstein space.
In this picture, a random measure is like a chameleon that can adopt many colour distributions. The L‑squared structure captures how different chameleons (random measures) relate to one another, not by first fixing their colours but by averaging over all possible colourings while respecting the Wasserstein distance between them. The team defines a tangent space and attempts to prove that the resulting object inherits the formal Riemannian character that makes the classical Wasserstein space so powerful. They then use this scaffolding to unify several disparate corners of statistics and machine learning.
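The paper’s precise definitions are not reproduced here, so the following sketch rests on a loudly stated assumption: that the distance between two random measures defined over the same scenarios is the square root of the average squared Wasserstein distance between their realisations, $\sqrt{\mathbb{E}[W_2^2(\mu, \nu)]}$. The helper names and the toy data are hypothetical, and each scenario is taken to be equally likely.

```python
import math


def wasserstein2_1d(xs, ys):
    """W_2 between two equal-size, equally weighted empirical measures on the line."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


def l2_over_wasserstein(mu_realisations, nu_realisations):
    """ASSUMED distance: sqrt of the mean of W_2^2 over paired scenarios.

    mu_realisations[i] and nu_realisations[i] are the shapes the two random
    measures take in scenario i; all scenarios are equally likely.
    """
    if len(mu_realisations) != len(nu_realisations):
        raise ValueError("both random measures must share the same scenarios")
    squared = [
        wasserstein2_1d(m, n) ** 2
        for m, n in zip(mu_realisations, nu_realisations)
    ]
    return math.sqrt(sum(squared) / len(squared))


base = [0.0, 1.0, 2.0]
# Two scenarios: mu is the base pile shifted by 0 or 1, nu by 1 or 3.
mu = [[x + shift for x in base] for shift in (0.0, 1.0)]
nu = [[x + shift for x in base] for shift in (1.0, 3.0)]

# Per-scenario W_2 is 1 and 2, so the result is sqrt((1 + 4) / 2).
print(l2_over_wasserstein(mu, nu))
```

The point of the toy example is the order of operations: the Wasserstein distance is computed scenario by scenario first, and only then averaged.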
| Mixing type | Decay rate $\alpha(k)$ | Sample complexity |
|---|---|---|
| Geometric ($c > 0$) | $C_0 e^{-ck}$ | $O(1)$ |
| Polynomial ($\theta > 1$) | $C_0 k^{-\theta}$ | $O(1)$ |
| Polynomial ($\theta = 1$) | $C_0 k^{-1}$ | $O(\log n)$ |
| Polynomial ($\theta < 1$) | $C_0 k^{-\theta}$ | $O(n^{1-\theta})$ |
How quickly dependence between observations decays, as measured by the mixing coefficient $\alpha(k)$, governs the sample-complexity term: geometric decay and polynomial decay with $\theta > 1$ give $O(1)$, while slower polynomial decay gives a term that grows with $n$. (Source: arXiv:2605.21365)
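One way to make sense of the last column, offered as an assumption rather than as the paper’s stated derivation, is that it matches the growth of the partial sum of the decay rate, $\sum_{k=1}^{n} \alpha(k)$. The sketch below checks that numerically. The constant `C0 = 1.0`, the rate parameters, and the function name `partial_sum` are all hypothetical choices made for illustration.

```python
import math


def partial_sum(alpha, n):
    """Sum alpha(k) for k = 1..n."""
    return sum(alpha(k) for k in range(1, n + 1))


C0 = 1.0  # hypothetical constant; the table leaves C_0 unspecified

# One representative decay rate per row of the table (parameters are assumed).
rates = {
    "geometric, c=0.5": lambda k: C0 * math.exp(-0.5 * k),
    "polynomial, theta=2": lambda k: C0 * k ** -2.0,
    "polynomial, theta=1": lambda k: C0 / k,
    "polynomial, theta=0.5": lambda k: C0 * k ** -0.5,
}

for name, alpha in rates.items():
    s_small = partial_sum(alpha, 100)
    s_large = partial_sum(alpha, 10_000)
    print(f"{name:<22} n=100: {s_small:8.3f}   n=10000: {s_large:8.3f}")
```

Under this reading, the first two sums barely change between the two values of n, the third grows like log n, and the fourth grows like the square root of n, which is n to the power 1 − θ for θ = 0.5.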
Foundations Under Scrutiny
Yet the paper arrives in a landscape where similar questions have already been posed, and where the expectations for geometric rigour are high. Earlier work by Acciaio and collaborators explored a Wasserstein Dirichlet form that also endows random measures with a differential structure. Another group, led by Pinzi and Savaré, built a “Wasserstein over Wasserstein” approach that treats random measures as points in a nested space and supplies a genuine Levi‑Civita connection – the sharp tool that makes Riemannian geometry precise.
An important question raised by that prior work is whether the tangent space defined by Passeggeri and colleagues can support a Levi‑Civita connection at all. In an adversarial dialogue conducted as part of the review process, the authors acknowledged that such a connection has not been established for their framework, and that what they call a “formal Riemannian structure” is, for now, heuristic. A geometry without a Levi‑Civita connection is like a cathedral blueprint with no drawings for the keystones. The arches look right on paper, but you cannot yet be certain they will stand.
A second tension is more practical. Several of the convergence results the paper presents – for instance, the behaviour of empirical measures in the L‑squared over Wasserstein space – can be obtained with classical tools that do not require the full newly‑built machinery. The Dirichlet form of Acciaio and the nested geometry of Pinzi and Savaré already provide routes to similar conclusions. What the new framework adds beyond these alternatives is not yet explicit.
The authors are candid about these gaps. Their response to the adversarial dialogue suggests that they see the framework as a conceptual stepping stone rather than a finished edifice. It opens up a conversation about how to properly geometrize random measures – a conversation that, they argue, is valuable in its own right.
From Convergence to Transformers
Why, then, should we pay attention to an incomplete geometry? Because even a half‑built structure can reveal where bridges need to be laid. The paper demonstrates that its framework can accommodate the empirical measure in a unified way, producing consistent convergence in the Wasserstein topology without appealing to separate probabilistic machinery. In the setting of Bayesian non‑parametrics, they refine Schwartz’s consistency theorem and derive posterior convergence, showing that the L‑squared over Wasserstein lens brings new clarity to what it means for a posterior distribution of random measures to converge.
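What convergence of the empirical measure in the Wasserstein topology looks like can be seen in a small simulation. This is a classical illustration, not the paper’s construction, and it makes simplifying assumptions: independent draws from Uniform(0, 1), a one-dimensional setting, and an evenly spaced grid of midpoint quantiles standing in for the true measure. The function names are hypothetical.

```python
import math
import random


def wasserstein2_1d(xs, ys):
    """W_2 between two equal-size, equally weighted empirical measures on the line."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


def distance_to_uniform(n, rng):
    """W_2 between n i.i.d. Uniform(0, 1) draws and a grid proxy for Uniform(0, 1)."""
    sample = [rng.random() for _ in range(n)]
    # Midpoint quantiles of Uniform(0, 1) stand in for the true measure.
    reference = [(i + 0.5) / n for i in range(n)]
    return wasserstein2_1d(sample, reference)


rng = random.Random(0)  # fixed seed so the run is repeatable
distances = {n: distance_to_uniform(n, rng) for n in (100, 10_000)}

for n, d in distances.items():
    print(f"n = {n:>6}: W_2 to the uniform reference is about {d:.4f}")
```

The distance shrinks as the sample grows, which is the behaviour the consistency statements describe; the framework’s claim is that this can be phrased for random measures without separate probabilistic machinery.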
A particularly enticing application concerns transformer models. Recent work has shown that the random token‑sampling processes inside self‑attention mechanisms can be regarded as stochastic gradient flows in Wasserstein space. The L‑squared over Wasserstein framework swallows this entire sub‑theory naturally. The random sampling that makes transformers produce varied outputs – the very source of their unpredictability – becomes, in this picture, a geometric motion. Rather than treating randomness as an annoyance to be suppressed by large‑scale averaging, the framework embraces it as part of the atmospheric weather of the space itself.
A Cathedral Under Construction
The L‑squared over Wasserstein framework does not need to be a finished cathedral to warrant our attention. Its true value may lie in the question it forces us to face: what would it mean to do geometry when the objects themselves are chance events? The classical Wasserstein space is a sculpture carved from certainty; the new framework is a gesture toward a geometry that can ripple and breathe.
Perhaps the most honest way to read the paper is as an invitation. The authors have sketched an outline and invited others to fill in the missing keystones, to supply the Levi‑Civita connection, to test the structure on concrete datasets where existing approaches reach their limits. The path from a promising blueprint to a cathedral is long and requires many hands. But as anyone who has ever watched a half‑finished vault and imagined it complete can tell you, sometimes the act of asking the right question is the hardest – and most important – part of the building.
— Yanjiang
Yanjiang is the founding editor of LoomSci.com, specializing in physics and science communication.
References
- Riccardo Passeggeri et al., $L^2$ over Wasserstein: Statistical Analysis for Optimal Transport, arXiv:2605.21365
- Acciaio et al., Absolutely Continuous Curves of Stochastic Processes, arXiv:2506.13634
- Pinzi et al., Totally convex functions, $L^2$-Optimal transport for laws of random measures, and solution to the Monge problem, arXiv:2509.01768
- Pinzi et al., Nested superposition principle for random measures and the geometry of the Wasserstein on Wasserstein space, arXiv:2510.07523