RAVEN’s Eye: How Machine Learning Just Added Over 100 New Planets to the Cosmic Census
13 May 2026, Yanjiang
Machine learning sifts through TESS data to validate over 100 new exoplanets, refining the cosmic census.
Imagine trying to count every grain of sand on a beach, but someone handed you a sieve with holes of wildly uneven sizes. That, in essence, has been the challenge facing exoplanet hunters for the past decade. NASA’s TESS mission has been scanning the sky since 2018, capturing images of entire swaths of the heavens every 30 minutes. The data it produces is staggering — billions of measurements, each containing the faint whisper of a possible planet dimming its host star. But buried in that avalanche of numbers are false positives: eclipsing binary stars, instrumental glitches, and astrophysical imposters that mimic the signal of a transiting world.
Now, a team led by M. Lafarga and A. Osborn at the University of Warwick has built a better sieve. Their new pipeline, called RAVEN, combines traditional detection algorithms with machine learning to sort through over 2.2 million stars observed by TESS in its first four years of operation (sectors 1 to 55). The result, described in a preprint (arXiv:2603.22597), is a uniform catalog of over 2,000 vetted transiting planet candidates — including 118 newly validated planets, 31 of which were detected for the first time by RAVEN.
This is not just a bigger catalog. It is a cleaner one.
The Problem with Planet Hunting
To understand why RAVEN matters, you need to appreciate the messiness of the data it processes. TESS observes stars in Full Frame Images (FFIs) — essentially, wide-field photographs taken every 30 minutes (and, in later sectors, every 10 minutes). When a planet passes in front of its star, the star’s brightness dips by a tiny fraction — often less than 1%. The signal is real, but so are thousands of other things that cause similar dips: a binary star system where one star eclipses another, a starspot rotating into view, or even cosmic rays striking the detector.
Histograms show posterior probabilities from three classifiers: GBDT (pink dash-dotted), GP (blue dashed), and their mean (solid black with grey fill). Probabilities near 1 indicate Planet, near 0 indicate NSFP. The inset zooms in on candidates with probabilities above 0.8.
The traditional approach has been to use algorithms like Box Least Squares (BLS), which searches for periodic dips in brightness. BLS is good at finding candidates, but it’s not great at telling planets from false positives. That’s where human vetting comes in — teams of astronomers manually inspecting each candidate light curve, applying their expertise to separate the wheat from the chaff. It works, but it’s slow, subjective, and doesn’t scale to millions of stars.
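To make the idea of a box search concrete, here is a minimal sketch in plain Python. It is not RAVEN's implementation or a full BLS: the function name, the trial grids, the simplified score (depth times the square root of the number of points in the box), and the synthetic light curve are all assumptions chosen for illustration. It folds the data at each trial period, slides a box of fixed duration across the folded phases, and keeps the combination with the strongest dip.

```python
import math
import random


def box_search(times, fluxes, periods, duration, n_phase=40):
    """Brute-force box search; returns (best_period, best_depth)."""
    mean_flux = sum(fluxes) / len(fluxes)
    best_period, best_depth, best_score = None, 0.0, -math.inf
    for period in periods:
        width = duration / period  # box width as a fraction of the orbit
        phases = [(t % period) / period for t in times]
        for k in range(n_phase):
            start = k / n_phase  # trial phase at which the box begins
            in_box = [f for p, f in zip(phases, fluxes)
                      if (p - start) % 1.0 < width]
            if len(in_box) < 3:
                continue
            depth = mean_flux - sum(in_box) / len(in_box)
            # Simplified statistic (an assumption, not the BLS signal residue):
            # deeper dips sampled by more points score higher.
            score = depth * math.sqrt(len(in_box))
            if score > best_score:
                best_period, best_depth, best_score = period, depth, score
    return best_period, best_depth


# Synthetic light curve (hypothetical values): 27 days at 30-minute cadence,
# with a 0.5% dip lasting 3 hours that repeats every 3 days.
rng = random.Random(0)
times = [i / 48 for i in range(27 * 48)]
fluxes = [1.0 + rng.gauss(0.0, 0.0005)
          - (0.005 if (t - 0.75) % 3.0 < 0.125 else 0.0)
          for t in times]

trial_periods = [i / 10 for i in range(10, 51)]  # 1.0 to 5.0 days
best_period, best_depth = box_search(times, fluxes, trial_periods, duration=0.125)
print(best_period, round(best_depth, 4))
```

A search like this returns the strongest periodic dip whether it comes from a planet or an eclipsing binary, which is why a vetting step has to follow it.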
RAVEN changes this by adding a machine learning layer after the BLS detection step. The pipeline uses two complementary classifiers — a Gradient Boosted Decision Tree (GBDT) and a Gaussian Process (GP) model — both trained on realistic simulations of what real planetary transits and false positives should look like. The classifiers assign each candidate a probability of being a genuine planet. Candidates with high probability are flagged for further analysis; those with low probability are rejected.
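The triage step can be sketched in a few lines of Python. The figure caption above mentions the mean of the GBDT and GP probabilities, so this sketch averages the two; the 0.8 threshold, the function names, and the candidate values are hypothetical and are not taken from the paper.

```python
def mean_probability(p_gbdt, p_gp):
    """Average the planet probabilities from the two classifiers."""
    for p in (p_gbdt, p_gp):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 0.5 * (p_gbdt + p_gp)


def triage(candidates, threshold=0.8):
    """Split {name: (p_gbdt, p_gp)} into (kept, rejected) lists of names.

    The threshold is an illustrative assumption, not RAVEN's actual cut.
    """
    kept, rejected = [], []
    for name, (p_gbdt, p_gp) in candidates.items():
        if mean_probability(p_gbdt, p_gp) >= threshold:
            kept.append(name)
        else:
            rejected.append(name)
    return kept, rejected


# Hypothetical candidates with made-up classifier outputs.
candidates = {
    "cand-A": (0.97, 0.93),  # both classifiers agree: likely planet
    "cand-B": (0.60, 0.20),  # both lean towards false positive
    "cand-C": (0.85, 0.55),  # classifiers disagree; the mean falls below the cut
}
kept, rejected = triage(candidates)
print(kept, rejected)
```

Using two classifiers built on different principles means a candidate scores highly only when both agree, as the third example shows.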
A Uniform, Unbiased Census
Period-radius diagram of 2,170 vetted candidates from juliet fits. Symbols: new candidates (black), recovered TOIs (blue), non-recovered on TOI stars (pink), recovered CTOIs (yellow), non-recovered on CTOI stars (green). Grey lines show Neptunian desert limits (Mazeh et al. 2016) and new desert/ridge/savannah boundaries (Castro-González et al. 2024). Dotted lines mark detection limits: periods of 0.5–16 d, radii <8 R⊕.
What makes RAVEN particularly valuable is its uniformity. Previous TESS catalogs have been assembled by different teams using different methods, focusing on different subsets of stars, and applying different vetting criteria. This patchwork approach makes it difficult to perform reliable demographic studies — questions like “How common are Earth-sized planets with orbital periods shorter than 10 days?” require a sample that is complete and consistently characterized.
The team focused on a magnitude-limited sample of over 2.2 million main sequence stars that are well-characterized by Gaia, the European Space Agency’s astrometry mission. By restricting to stars with known distances, masses, and radii, the team could accurately estimate the sizes of any detected planets. RAVEN searched for transits with orbital periods between 0.5 and 16 days — the sweet spot where TESS is most sensitive — and planetary radii up to about 8 Earth radii.
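Why the stellar radius matters can be seen from the first-order transit relation, in which the fractional dip in brightness is roughly the square of the planet-to-star radius ratio. The Python sketch below applies that relation; it ignores limb darkening and grazing geometries, the function name is hypothetical, and it is not the full model fit the team performed.

```python
import math

# IAU 2015 nominal radii in metres.
R_SUN_M = 6.957e8
R_EARTH_M = 6.3781e6


def planet_radius_earth(depth, stellar_radius_sun):
    """Planet radius in Earth radii from depth ~ (R_planet / R_star)**2.

    depth is the fractional dip in brightness (0.001 means 0.1%);
    stellar_radius_sun is the host star's radius in solar radii.
    """
    if not 0.0 < depth < 1.0:
        raise ValueError("depth must be a fraction between 0 and 1")
    if stellar_radius_sun <= 0.0:
        raise ValueError("stellar radius must be positive")
    return math.sqrt(depth) * stellar_radius_sun * R_SUN_M / R_EARTH_M


# The same 0.1% dip implies very different planets around different stars.
print(round(planet_radius_earth(0.001, 1.0), 2))  # Sun-like host
print(round(planet_radius_earth(0.001, 0.5), 2))  # host half the Sun's radius
```

Any error in the stellar radius carries straight through to the planet radius, so a star with a poorly known size gives a planet with a poorly known size.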
The result is a vetted sample of 2,170 candidates, of which 143 have been statistically validated as genuine planets. Of these, 118 are newly validated, meaning they had not previously been confirmed as planets, and 31 of those 118 are entirely new detections that had never been reported before. The team also identified over 1,000 new candidates that have a high probability of being planets but have not yet been formally validated.
The Neptunian Desert and Other Mysteries
One of the most striking features of the new catalog is how it populates a region of planet parameter space known as the Neptunian desert. This is a curious gap in the distribution of exoplanets: there are very few Neptune-sized planets (radii between about 2 and 6 Earth radii) with orbital periods shorter than about 3 days. The desert is a real feature — not an observational artifact — and its existence tells us something important about planet formation and migration. Planets that are both large and close to their stars are rare because they either evaporate under the intense stellar radiation or migrate inward too quickly to survive.
RAVEN’s vetted sample includes candidates that fall squarely in the Neptunian desert, as well as in the adjacent “ridge” and “savannah” regions recently identified by other teams. By providing a uniform, well-characterized sample, the catalog will help theorists test models of planetary system evolution. Are the planets in the desert truly absent, or have they simply been hiding in the noise? RAVEN suggests the former — the desert is real, and it is sharply defined.
The Human Element
Behind every data point in this catalog is a story of persistence. The team spent years developing and testing RAVEN, training their machine learning models on millions of simulated light curves, and carefully validating their results against known planets. The first author, M. Lafarga, and the corresponding author, A. Osborn, led an effort that involved collaborators across multiple institutions, including D. J. Armstrong, K. Cui, A. Hadjigeorghiou, V. Kunovac, L. Doyle, E. M. Bryant, R. F. Díaz, and others.
The paper includes detailed light curves for all 31 newly detected planets: phase-folded data showing the characteristic dip of a transiting world. For each candidate, the team performed a full Bayesian fit using the juliet package, extracting precise orbital periods, transit depths, and planetary radii. The precision is remarkable: typical period uncertainties are on the order of 10⁻⁵ to 10⁻⁶ days, meaning the orbital periods are known to within about a second or better.
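The arithmetic behind those period uncertainties is easy to check. The Python sketch below converts them to seconds and then shows how a period error accumulates when predicting a transit many orbits later, since the timing error grows by one period uncertainty per orbit. The 3-day period and one-year baseline are hypothetical values chosen for illustration.

```python
SECONDS_PER_DAY = 86_400


def days_to_seconds(days):
    """Convert a time span in days to seconds."""
    return days * SECONDS_PER_DAY


def accumulated_drift_seconds(period_days, period_err_days, baseline_days):
    """Timing uncertainty on a predicted transit after baseline_days.

    Each orbit adds one period uncertainty, so the error scales with the
    number of orbits elapsed. The uncertainty on the reference transit
    time itself is ignored here.
    """
    n_orbits = baseline_days / period_days
    return days_to_seconds(period_err_days) * n_orbits


print(days_to_seconds(1e-5))  # a little under one second
print(days_to_seconds(1e-6))  # under a tenth of a second

# Hypothetical planet: 3-day period, 1e-5 day uncertainty, predicted a year ahead.
drift_s = accumulated_drift_seconds(3.0, 1e-5, 365.0)
print(round(drift_s, 1))
```

Even at this precision the prediction drifts by minutes over a year, which is why small period uncertainties matter for scheduling follow-up observations.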
What Comes Next
RAVEN is not the end of the story. The pipeline can be applied to future TESS data — sectors beyond 55 — and potentially to data from other missions like PLATO and the James Webb Space Telescope. The team has also identified a sample of large-radius candidates (greater than 8 Earth radii) that are prime targets for atmospheric characterization with JWST. These are worlds that straddle the boundary between rocky planets and gas giants — the kind of planets that might teach us about the transition from one regime to the other.
There is also the question of the mono- and duo-transit candidates — planets that only transited once or twice during TESS’s observations, making their periods difficult to determine. RAVEN identified a small sample of these, which will require follow-up observations to confirm. Each one is a puzzle waiting to be solved.
For now, the team’s work represents a major step toward a complete census of the short-period exoplanet population. The catalog is publicly available, and the methods are reproducible. Other teams can use RAVEN to search their own data, or build on its approach to develop even better pipelines.
In the end, science advances not just through bold new theories, but through careful, systematic work — building better tools, cleaning up messy data, and producing catalogs that are uniform enough to ask the hard questions. RAVEN’s eye has seen what was hidden in plain sight. The next step is to understand what it means.
Yanjiang is an online editor of Loom Science
References
- M. Lafarga et al., Automatic search for transiting planets in TESS-SPOC FFIs with RAVEN: over 100 newly validated planets and over 2000 vetted candidates, arXiv:2603.22597


