When Optimization Becomes a Mountain: A New Way to Climb
26 Apr 2026, Yanjiang
A new algorithm climbs the complex, quartic landscape of financial optimization by following the affine-normal direction, not just the gradient.
Imagine you’re standing at the base of a mountain so vast that its shape changes every time you look at it. The peak is somewhere above the clouds, but the ground beneath your feet is uneven, riddled with hidden crevasses and false summits. Now imagine you need to find the fastest path to the top — not once, but thousands of times, for mountains that keep rearranging themselves. This is the problem faced by anyone trying to optimize a large financial portfolio using not just expected returns and risks, but the full statistical fingerprint of the market: its asymmetry, its tail risk, its tendency to surprise.
The mountain is real. And a team of researchers — Ya-Juan Wang, Yi-Shuai Niu, Artan Sheshmani, and the renowned mathematician Shing-Tung Yau — has developed a new way to climb it. Their work, described in a preprint (arXiv:2604.25378), tackles a problem that has long seemed computationally intractable: how to optimize portfolios that account for skewness and kurtosis — the third and fourth statistical moments of returns — across thousands of assets simultaneously.
To understand why this matters, let us first consider the standard approach. For decades, portfolio optimization has been dominated by the mean-variance framework: balance expected return against variance (risk), and you have a clean, convex problem that can be solved efficiently even for large asset universes. It is elegant, powerful, and widely used. But it treats mean and variance as the whole story, which is a safe simplification only when returns are roughly normally distributed, so that extreme events are rare and symmetric. Markets, of course, are not so polite.
Skewness captures asymmetry: a positively skewed asset has more upside outliers than downside ones. Kurtosis captures tail risk: how likely are extreme moves in either direction? Together, these higher moments can dramatically change the optimal portfolio, but incorporating them turns a simple optimization into a nightmare. The objective function becomes quartic (a fourth-degree polynomial in the portfolio weights), the covariance matrix is joined by massive coskewness and cokurtosis tensors, and the level sets (the contours of equal objective value) become anisotropic and ill-conditioned, riddled with narrow valleys and steep ridges.
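To make "massive" concrete, here is a small Python sketch that counts the entries of dense coskewness and cokurtosis tensors for N assets (N³ and N⁴ respectively), next to the number of distinct entries once their symmetry is exploited. The function name `tensor_entries` is hypothetical, the asset counts are the ones quoted elsewhere in this article, and the memory figure assumes 8-byte floating-point numbers.

```python
from math import comb

def tensor_entries(n_assets):
    """Entry counts for coskewness (order 3) and cokurtosis (order 4) tensors."""
    dense_skew = n_assets ** 3               # full N x N x N array
    dense_kurt = n_assets ** 4               # full N x N x N x N array
    # A fully symmetric order-k tensor has C(N + k - 1, k) distinct entries.
    unique_skew = comb(n_assets + 2, 3)
    unique_kurt = comb(n_assets + 3, 4)
    return dense_skew, dense_kurt, unique_skew, unique_kurt

for n in (100, 1000, 5440):
    ds, dk, us, uk = tensor_entries(n)
    terabytes = dk * 8 / 1e12                # dense cokurtosis, 8 bytes per entry
    print(f"N={n}: coskewness {ds:.2e} entries, cokurtosis {dk:.2e} entries "
          f"({uk:.2e} distinct), dense cokurtosis ~ {terabytes:.3g} TB")
```

Even for 1,000 assets the dense cokurtosis tensor has 10¹² entries, roughly 8 TB in double precision, which is why avoiding these tensors altogether matters.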
Think of it like trying to navigate a city where the streets are not laid out on a grid, but twist and fold back on themselves. Standard gradient-based methods, which choose each step from the local slope alone, get lost. They zigzag across the narrow valleys, stall, and eventually give up.
This is where Yau’s affine-normal descent enters the picture. The core insight is deceptively simple: instead of following the negative gradient (the direction of steepest descent), the algorithm follows the affine-normal direction of the current level set. Imagine you’re standing on a contour line of the mountain. The negative gradient points straight downhill, perpendicular to that contour. The affine-normal direction also accounts for the local curvature of the contour, so it tends to point toward the center of the valley rather than merely downhill. This is not a metaphor; it is a precise construction from affine differential geometry, and the algorithm computes it by exploiting the quartic structure of the objective function.
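The "center of the valley" picture can be checked exactly in one special case. For a purely quadratic objective the level sets are ellipsoids, and it is a classical fact that every affine normal line of an ellipsoid passes through its center, so the affine-normal direction lines up with the Newton direction. The Python sketch below illustrates only this quadratic special case with made-up numbers; it is not the preprint's construction for quartic objectives, which is not reproduced here.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T A x - b^T x with a long, narrow valley.
A = np.diag([1.0, 100.0])                 # condition number 100 (illustrative)
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)            # minimizer = center of every level-set ellipse

x = np.array([5.0, 1.0])                  # current point
grad = A @ x - b                          # gradient at x
steepest = -grad                          # steepest-descent direction
to_center = -np.linalg.solve(A, grad)     # Newton step, points at the ellipse center

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print("steepest descent vs. direction to minimizer:", cosine(steepest, x_star - x))
print("center direction vs. direction to minimizer:", cosine(to_center, x_star - x))
```

In this toy problem the steepest-descent direction is badly misaligned with the direction to the minimizer, while the center-pointing direction reaches it in a single step.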
The result is an algorithm that avoids explicitly constructing the enormous coskewness and cokurtosis tensors. Instead, it works directly with the return matrix, computing exact sample oracles and derivative evaluations on the fly. It also supports an exact line search — meaning that at each step, the algorithm can determine the optimal distance to travel along the chosen direction, rather than guessing.
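Both ideas can be sketched in a few lines of Python. The code below assumes a common textbook form of the mean-variance-skewness-kurtosis objective, f(w) = −λ₁·mean + λ₂·variance − λ₃·(third central moment) + λ₄·(fourth central moment), which may differ from the preprint's exact formulation. It ignores any constraints on the weights. The function names and the synthetic data are hypothetical. Because the centered portfolio return along a line w + t·d is linear in t, the objective along that line is a quartic polynomial in t, so its minimizer can be found exactly from the roots of a cubic.

```python
import numpy as np
from math import comb

def mvsk_value_and_grad(w, R, lam=(1.0, 1.0, 1.0, 1.0)):
    """Objective value and gradient computed from the T x N return matrix R only."""
    l1, l2, l3, l4 = lam
    mu = R.mean(axis=0)
    Rc = R - mu                              # centered returns
    y = Rc @ w                               # centered portfolio return per period
    value = (-l1 * (mu @ w) + l2 * np.mean(y**2)
             - l3 * np.mean(y**3) + l4 * np.mean(y**4))
    # Chain rule through y = Rc @ w; no N^3 or N^4 tensor is ever formed.
    grad = -l1 * mu + Rc.T @ (2*l2*y - 3*l3*y**2 + 4*l4*y**3) / len(y)
    return value, grad

def exact_line_search(w, d, R, lam=(1.0, 1.0, 1.0, 1.0)):
    """Global minimizer over all real t of the quartic t -> f(w + t*d)."""
    l1, l2, l3, l4 = lam
    mu = R.mean(axis=0)
    Rc = R - mu
    y, z = Rc @ w, Rc @ d                    # y(t) = y + t*z along the line
    coef = np.zeros(5)                       # coef[j] multiplies t**j
    coef[0] += -l1 * (mu @ w)
    coef[1] += -l1 * (mu @ d)
    for k, scale in ((2, l2), (3, -l3), (4, l4)):
        for j in range(k + 1):               # binomial expansion of (y + t*z)**k
            coef[j] += scale * comb(k, j) * np.mean(y**(k - j) * z**j)
    phi = np.polynomial.Polynomial(coef)
    crit = phi.deriv().roots()               # roots of the cubic phi'(t)
    crit = crit[np.abs(crit.imag) < 1e-9].real
    return float(crit[np.argmin(phi(crit))])

# Synthetic demo data (not from the preprint).
rng = np.random.default_rng(0)
R_demo = rng.normal(size=(200, 5))
w_demo = np.full(5, 0.2)
f_demo, g_demo = mvsk_value_and_grad(w_demo, R_demo)
t_demo = exact_line_search(w_demo, -g_demo, R_demo)
print(f_demo, mvsk_value_and_grad(w_demo - t_demo * g_demo, R_demo)[0])
```

Each evaluation costs a few products with the T x N return matrix, so memory grows with the size of the data rather than with N⁴. The line search here returns the global minimizer along the whole line; a practical method would also have to respect whatever constraints the portfolio weights carry.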
The team tested their method — which they call YAND-MVSK — on both synthetic benchmarks and real market data. The synthetic benchmarks used a controlled conditioning framework with 1,000 assets and 2,000 time periods, calibrated to a representative risk-aversion parameter. The results show a clear split in implementation strategy. For small asset universes (up to about 100 assets), a direct configuration works well. But for larger universes — from the hundreds up into the thousands — a preconditioned conjugate-gradient configuration with stall recovery becomes the preferred approach.
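For readers unfamiliar with the term, the following Python sketch shows a generic preconditioned conjugate-gradient solver of the kind such a configuration would build on. It is a textbook illustration, not the preprint's implementation: the article does not say which linear systems are solved or how stall recovery works, so the test system (a sample covariance plus a small ridge), the diagonal (Jacobi) preconditioner, and the function name `pcg` are all assumptions made for the example.

```python
import numpy as np

def pcg(matvec, b, inv_diag, tol=1e-10, max_iter=500):
    """Solve A x = b for symmetric positive definite A, given only v -> A v.

    inv_diag holds 1 / diag(A) and acts as a Jacobi preconditioner.
    Returns the approximate solution and the number of iterations used.
    """
    x = np.zeros_like(b)
    r = b.copy()                       # residual b - A x with x = 0
    z = inv_diag * r                   # preconditioned residual
    p = z.copy()                       # first search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)          # exact step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k + 1
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # next A-conjugate direction
        rz = rz_new
    return x, max_iter

# Synthetic demo: A = Rc^T Rc / T + ridge * I, applied without ever forming A.
rng = np.random.default_rng(0)
T, N, ridge = 400, 50, 0.1
Rc = rng.normal(size=(T, N))
Rc -= Rc.mean(axis=0)
apply_A = lambda v: Rc.T @ (Rc @ v) / T + ridge * v
inv_diag_A = 1.0 / ((Rc**2).mean(axis=0) + ridge)
rhs = rng.normal(size=N)
solution, iterations = pcg(apply_A, rhs, inv_diag_A)
print("iterations:", iterations)
```

The appeal at large scale is that the solver only needs matrix-vector products, which fits a method that works directly with the return matrix.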
| q | MV return | MVSK return | Difference (MVSK − MV) |
|---|---|---|---|
| 0.40 | 36.24 | 41.68 | 5.44 |
| 0.50 | 38.20 | 42.03 | 3.83 |
| 0.60 | 40.37 | 42.39 | 2.02 |
In this excerpt, the MVSK portfolios (which add skewness and kurtosis to the mean-variance objective) post a higher return than the mean-variance (MV) benchmark at every value of q shown. This suggests that optimizing for higher-moment risks can deliver real-world gains in large-scale stock markets. (Source: arXiv:2604.25378)
The improvement from adding higher moments is largest at the moderate return target q=0.40, where the return difference is 5.44, and it shrinks to 2.02 at q=0.60.
The performance is striking. By the time the universe reaches the upper hundreds of assets, the large-scale configuration (preconditioned conjugate gradient with stall recovery) already dominates. And as the universe moves into the thousands, it remains competitive, a feat that standard methods cannot match.
But perhaps the most impressive test came from real data. The team applied their algorithm to a 5-minute A-share panel containing 5,440 stocks — a dataset large enough to make most optimization methods simply give up. On this panel, the algorithm made direct full-universe comparisons with exact mean-variance portfolios feasible. The baseline split revealed something important: the incremental value of higher moments is strongest at moderate return targets. In other words, when you’re not chasing extreme returns, the extra information from skewness and kurtosis matters most.
This is a subtle but important finding. It suggests that higher-moment optimization is not always necessary — but when it is, it can significantly improve portfolio quality without the computational cost that has historically made it impractical.
The implications extend beyond finance. The mathematical framework — affine-normal descent exploiting quartic structure — is general. Any optimization problem with a quartic objective and large-scale data could benefit from this approach. Machine learning, signal processing, and even certain problems in physics involve similar structures. The algorithm’s ability to avoid explicit higher-order tensors while maintaining exact derivatives is a methodological contribution that may find applications far beyond portfolio optimization.
What makes this work particularly elegant is how it separates the geometry of the data from the preferences of the investor. The team provides theory for a reduced simplex formulation, including regularity and convexity conditions that cleanly partition these two aspects. The data map — the statistical properties of the returns — determines the landscape. The investor’s preferences determine where on that landscape to search. This separation is not just mathematically satisfying; it is practically useful, allowing the algorithm to be adapted to different risk profiles without re-engineering the core machinery.
There is, of course, a limit to every analogy. Unlike a mountain climber who can see the summit and plan a route, the optimizer here works in a space of thousands of dimensions, where the “peak” is defined by a mathematical objective rather than a physical landmark. The affine-normal direction is not a path you can walk; it is a computational step that exists only in the algebra of the problem. But the intuition holds: by paying attention to the curvature of the landscape, rather than just its slope, the algorithm finds its way through terrain that would otherwise be impassable.
The team’s work does not claim to have solved every problem in portfolio optimization. Real markets have transaction costs, liquidity constraints, and regime changes that no static optimization can fully capture. But it opens a door that was previously closed. For the first time, full-universe higher-moment optimization is computationally feasible for thousands of assets. The question is no longer “can we do it?” but “when should we?”
Perhaps one day, when financial engineers design next-generation trading systems, they will look back at this preprint as the moment the mountain became climbable. The path is not easy, but it is now visible. And sometimes, in optimization as in life, seeing the path is half the battle.
Yanjiang is an online editor of Loom Science
References
- Ya-Juan Wang et al., Yau’s Affine-Normal Descent for Large-Scale Unrestricted Higher-Moment Portfolio Optimization, arXiv:2604.25378
