Learning to Generate Rare Events from Topological Fingerprints

16 Jun 2026, Yanjiang

A neural network learns to generate realistic rare events by reading the hidden topological fingerprints—like loops and voids—in data’s persistent homology.

Generating market crashes on demand. It sounds like a recipe for financial apocalypse, but a preprint (arXiv:2606.15452) from Emre Yusuf, Ren Takahashi, and Jayabrata Bhaduri turns the idea into a scientific tool. The team shows that rare events — crashes, disease outbreaks, sudden failures — carry a hidden topological signature, a shape encoded in data that a neural network can learn to read and then use to create realistic synthetic time series of these extreme events.

Rare events are, by definition, hard to find. A stock market crash, a pandemic surge, a catastrophic equipment failure: each arrives so infrequently that a model trained on historical data sees only a handful of examples. Traditional statistical methods, built on averages and smooth distributions, treat the tail of the distribution as noise — something to be ignored or smoothed away. Generative models that try to reproduce the whole distribution often collapse to the well‑populated central region, leaving the rare, fascinating extremes untouched. The result is a familiar kind of blindness: models that are excellent at predicting calm weather but useless when the storm finally arrives.

What Yusuf and colleagues noticed is that rare events, despite their scarcity, leave a clear trace not in the statistics of individual values but in the global shape of the data. When a time series contains a crash, the cloud of points — seen as a set in a high‑dimensional space — abruptly changes its topology. A new loop may form, a cluster may break apart, a void may appear. These topological changes are often far more stable and distinctive than any statistical moment, and they can be captured with a branch of mathematics called persistent homology.
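One common way to turn a time series into such a cloud of points is a sliding-window (delay) embedding. The article does not say which embedding PHINN uses, so the sketch below is only an illustration under that assumption; the function name `delay_embed` and the parameters `dim` and `lag` are hypothetical.

```python
import numpy as np

def delay_embed(series, dim=3, lag=1):
    """Map a 1-D series to points (x[t], x[t+lag], ..., x[t+(dim-1)*lag]).

    Assumption: a plain delay embedding; the embedding used by PHINN
    is not specified in the article.
    """
    x = np.asarray(series, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("series too short for this dim and lag")
    # Column k holds the series shifted forward by k * lag steps.
    return np.stack([x[k * lag : k * lag + n_points] for k in range(dim)], axis=1)

# Toy example: a periodic signal traces out a closed curve in the embedding space.
t = np.linspace(0.0, 4.0 * np.pi, 200)
cloud = delay_embed(np.sin(t), dim=2, lag=25)
print(cloud.shape)  # (175, 2)
```

Each row of `cloud` is one point. A sudden jump in the series moves a handful of rows far from the rest, which is the kind of change in shape the paragraph above describes.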

To understand persistent homology, imagine a collection of data points scattered on a table. Now imagine a tiny ball around each point, and slowly increase the radius of all balls simultaneously. At first the balls are separate, and the data looks like many isolated components. As the radius grows, some balls start to overlap and connect, merging components. Later, rings of connected balls may form, circling empty regions: these are loops. At still larger radii, loops fill in and disappear, and in data with three or more dimensions, enclosed voids may emerge and later vanish. Persistent homology tracks the birth and death of every such topological feature (components, loops, voids) across all scales. One compact summary of that record is a Betti curve: a graph that shows, for each radius, how many features of each kind are alive.
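The growing-balls picture can be tried on a toy point cloud. The sketch below is not the paper's pipeline: it assumes a Vietoris-Rips complex (an edge whenever two balls overlap, a filled triangle whenever three points are pairwise connected) and counts components and loops with plain linear algebra over the reals. The names `betti_numbers` and `betti_curve` are hypothetical, and the brute-force approach only suits a few dozen points.

```python
import itertools
import numpy as np

def betti_numbers(points, radius):
    """Betti-0 (components) and Betti-1 (loops) of a Vietoris-Rips complex.

    Two balls of radius r overlap when their centres are at most 2r apart,
    so points i and j are joined when dist(i, j) <= 2 * radius.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    close = dist <= 2.0 * radius
    edges = [e for e in itertools.combinations(range(n), 2) if close[e]]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [(i, j, k) for i, j, k in itertools.combinations(range(n), 3)
                 if close[i, j] and close[j, k] and close[i, k]]

    # Boundary matrix d1 sends each edge (i, j) to vertex j minus vertex i.
    d1 = np.zeros((n, len(edges)))
    for col, (i, j) in enumerate(edges):
        d1[i, col], d1[j, col] = -1.0, 1.0
    # Boundary matrix d2 sends triangle (i, j, k) to (j,k) - (i,k) + (i,j).
    d2 = np.zeros((len(edges), len(triangles)))
    for col, (i, j, k) in enumerate(triangles):
        d2[edge_index[(j, k)], col] = 1.0
        d2[edge_index[(i, k)], col] = -1.0
        d2[edge_index[(i, j)], col] = 1.0

    rank1 = int(np.linalg.matrix_rank(d1)) if edges else 0
    rank2 = int(np.linalg.matrix_rank(d2)) if triangles else 0
    betti0 = n - rank1                      # connected components
    betti1 = (len(edges) - rank1) - rank2   # independent unfilled loops
    return betti0, betti1

def betti_curve(points, radii):
    """Evaluate both Betti numbers on a grid of radii."""
    values = [betti_numbers(points, r) for r in radii]
    return [b0 for b0, _ in values], [b1 for _, b1 in values]

# Eight points on a unit circle: isolated points, then one loop, then a filled disc.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
b0, b1 = betti_curve(circle, radii=[0.1, 0.5, 1.1])
print(b0)  # [8, 1, 1]
print(b1)  # [0, 1, 0]
```

The loop count goes from 0 to 1 and back to 0 as the radius grows, which is the birth and death of a single feature. Dedicated libraries compute the same information far more efficiently and for higher dimensions.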

This is not a mere metaphor. The Betti curves provide a genuine, computable fingerprint of the data’s shape. A normal market day and a day with a flash crash produce point clouds with radically different Betti curves: in the crash scenario, a loop may suddenly snap into existence at precisely the radius corresponding to the extreme price movement, persist for a range of scales, and then vanish. The timing and duration of that loop encode the magnitude and character of the crash. Yusuf and colleagues realized that if a generative model could be taught to see and reproduce these topological fingerprints, it would not need many examples of rare events. It would need only to learn the mapping between topological shapes and the time series that give rise to them.

The framework they built, which they call PHINN (Persistent Homology Inspired Neural Network), uses a flow‑matching strategy. Flow‑matching works by learning a continuous velocity field that smoothly transports samples from a simple noise distribution into the complex distribution of real time‑series data — a transformation that, in PHINN’s case, is applied to the point‑cloud representation of the time series. The crucial innovation is that PHINN conditions this transformation on the desired Betti curves. In other words, the network is told: “Make the output data have these topological features at these scales,” and it learns to steer the flow so that the final point cloud matches that target. A further layer of control comes from a persistence‑landscape loss, which penalises the network whenever the generated Betti curves deviate from the target, ensuring homology consistency across the entire transformation.
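A minimal sketch of the flow-matching idea follows, assuming the widely used straight-line path between a noise sample and a data sample; the article does not state which variant PHINN adopts. Here `velocity_fn` stands in for the neural network and `cond` for a target Betti curve sampled on a fixed grid. Both names are hypothetical, and the persistence-based penalty is left out.

```python
import numpy as np

def flow_matching_batch(x1, rng):
    """Build one training batch for straight-line flow matching.

    x1: array of shape (batch, dim) holding real samples.
    Returns the interpolated points x_t, the times t, and the target velocities.
    """
    x0 = rng.standard_normal(x1.shape)       # samples from the simple noise distribution
    t = rng.uniform(size=(x1.shape[0], 1))   # one random time in [0, 1) per sample
    x_t = (1.0 - t) * x0 + t * x1            # point on the straight path from x0 to x1
    target = x1 - x0                         # constant velocity along that path
    return x_t, t, target

def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Mean squared error between predicted and target velocities."""
    x_t, t, target = flow_matching_batch(x1, rng)
    pred = velocity_fn(x_t, t, cond)         # the model also sees the conditioning signal
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))

# Toy usage with a placeholder "model" that always predicts zero velocity.
rng = np.random.default_rng(0)
data = rng.standard_normal((16, 4)) + 3.0    # stand-in for embedded time-series points
cond = np.array([8.0, 1.0, 1.0])             # stand-in for a target Betti curve
zero_model = lambda x_t, t, c: np.zeros_like(x_t)
print(flow_matching_loss(zero_model, data, cond, rng))
```

Training would adjust the model's parameters to lower this loss. At generation time, the learned velocity field is integrated from noise towards data while `cond` is held at the desired topology.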

Rather than treating rare events as statistical outliers that must be fought with ever‑larger datasets, PHINN draws on the persistent homology of the data’s point‑cloud embedding, pulling out topological loops and voids that swell and contract as a scale parameter changes. The network learns to recreate these swelling patterns faithfully, effectively imprinting the rare‑event signature onto the generated time series.

One of the most striking features of PHINN is its natural‑language interface. A user can describe the desired rare‑event topology in plain words — “a sharp crash with a sudden recovery” or “a prolonged slow‑burn outbreak” — and the model translates this into a target Betti curve. The same interface also enables cross‑domain meta‑learning: a network trained on financial crashes can, with minimal additional training, be instructed to generate epidemiological outbreaks, simply by adjusting the topological target. This few‑shot capability breaks the usual mould of generative models that must be retrained from scratch for each new domain.

The team did not stop at accuracy. They also provided certified adversarial robustness, meaning that an attacker trying to perturb the input to steer the generated event away from the target topology cannot succeed beyond a mathematically guaranteed bound. In a world where generative models are increasingly vulnerable to small, imperceptible perturbations, this kind of certification is a rare and valuable property.

On financial and epidemiological benchmarks, PHINN cut the topological error, measured by β-RMSE (a metric that captures mismatches in Betti curves), by up to 63 % compared to leading statistical and diffusion-based baselines. At the same time, its transition accuracy, the ability to correctly reproduce when a rare event occurs, reached 0.84 in the comparison table below, against 0.59 for the strongest baseline. These improvements hold across multiple datasets and come with rigorous statistical confidence. The model matched the tail coverage of specialised jump-diffusion models while surpassing them in shape fidelity: it generated not only the correct extreme values but also the correct temporal structure around those extremes.
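The article does not give the exact formula behind β-RMSE. One plausible reading, assumed here purely for illustration, is the root-mean-square difference between a generated and a target Betti curve sampled on the same grid of radii. The function name `betti_rmse` is hypothetical.

```python
import numpy as np

def betti_rmse(generated_curve, target_curve):
    """Root-mean-square gap between two Betti curves on a shared radius grid.

    Assumption: this is one simple way to score a mismatch in Betti curves;
    the paper's precise definition may differ.
    """
    g = np.asarray(generated_curve, dtype=float)
    t = np.asarray(target_curve, dtype=float)
    if g.shape != t.shape:
        raise ValueError("curves must be sampled on the same grid")
    return float(np.sqrt(np.mean((g - t) ** 2)))

# Toy example: the generated data has one extra component at the second radius.
target = [8, 1, 1, 1]
generated = [8, 2, 1, 1]
print(betti_rmse(generated, target))  # 0.5
```

A score of zero would mean the generated data has the same number of features as the target at every sampled scale.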

Event Archetype             Decision Trigger               CPR
Supply node fragmentation   Activate dual-source protocol  0.97
Feedback dependency loop    Halt automated re-order        0.96
Resilience cavity collapse  Escalate to crisis committee   0.95
Full network crisis         Invoke continuity-of-ops plan  0.98
Flash liquidity freeze      Suspend automated hedging      0.97
Slow-onset concentration    Initiate diversification       0.94

Rare-event time series generated by PHINN preserve the same topological fingerprints as historically validated events, with high Certified Persistence Ratios (CPR). This indicates that the model reliably captures the hidden structure of rare scenarios, which is crucial for training robust forecasting systems. (Source: arXiv:2606.15452)

Method                β-RMSE   TA     Disc.
Jump-Diff. (Merton)   2.29     0.23   0.28
Human Expert Panel    1.49     0.55   0.18
FM-TS                 1.64     0.39   0.18
PHINN-Uncond          1.61     0.40   0.18
FIDE                  1.51     0.43   0.15
TF-TS (H1)            1.27     0.59   0.16
PHINN (ours)          0.71     0.84   0.09

PHINN surpasses all competing methods in both shape accuracy and statistical realism for rare financial and multi-modal events: it has the lowest β-RMSE, the highest transition accuracy (TA), and the lowest Disc. score in the comparison. This suggests the model can generate trustworthy synthetic data for critical applications like fraud detection or anomaly forecasting. (Source: arXiv:2606.15452)

The implications run beyond any single application domain. A tool that can generate plausible rare‑event scenarios from a topological specification could transform how we stress‑test financial systems, prepare public‑health responses, or design safety margins for engineering infrastructure. It suggests that topology, far from being an abstract curiosity, can serve as a practical language for describing the shape of disaster — a language that machines can learn, understand, and translate into action.

Yusuf and colleagues have not solved every problem. The current approach requires a choice of embedding of the time series into a point cloud, and the right embedding may depend on the nature of the rare events one hopes to capture. Yet the road ahead is clear: the methods of topological data analysis, once the territory of pure mathematicians, are now entering the toolbox of applied machine learning. Rare events may always remain rare, but they need no longer remain unpredictable. Their shape may be the key we have been missing.

— Yanjiang

Yanjiang is an online editor of LoomSci.com.

References

  • Emre Yusuf et al., PHINN: Persistent Homology Inspired Neural Network for Rare-Event Time Series Generation, arXiv:2606.15452