Borrowing Brilliance: How Analogy Teaches LLMs to Think Anew


16 May 2026, Yanjiang

What would it take for a machine to be genuinely creative? Not just to remix what it has already seen, but to reach across the dark and pull back something new? For decades, this question has lived in the borderlands between philosophy and computer science, a place where definitions fray and intuitions break. Now, a preprint (arXiv:2605.11258) by Andrew Shen, Shaul Druckmann, and James Zou is forcing that question out of the seminar room and into the laboratory, by showing that large language models (LLMs) can learn to borrow ideas the way humans do: through analogy.

The problem they set out to solve is, at first glance, a deeply familiar one. When you ask an LLM to generate solutions to an open‑ended scientific problem (say, designing a new molecule, or engineering a cellular circuit), it tends to give you the same answer dressed in different clothes. It converges, like a lazy student, on the most obvious idea. The researchers call this mode collapse, and they are not the first to notice it. Earlier work by Si and colleagues had already exposed the same weakness from another angle: in their study, LLM‑generated research ideas were rated by expert reviewers as more novel than human‑written ones, yet the models lacked diversity and kept producing near‑duplicates when asked for ideas at scale. The machines, for all their fluency, were repetitive.

That is the challenge. The thesis Shen and colleagues put forward is that the antidote might lie in one of the oldest cognitive tools in the human repertoire: analogical reasoning. Instead of asking an LLM to solve a problem directly, their analogical reasoning (AR) pipeline first asks it to generate analogies to problems in completely different domains — economics if the target is systems biology, say, or chess if the target is drug design — and only then, armed with those cross‑domain relational structures, to search for novel solutions. The analogy becomes the compass, not the map.

Think of a chef who has spent years preparing only French cuisine. Give her a challenge — create a new dish that embodies “bitterness and sweetness in tension” — and she will almost certainly reach for caramelized endive or chocolate‑orange combinations she has made a hundred times. The kitchen has become a cage. But now imagine she spends a month studying Thai cooking, where bitter melon meets palm sugar in a completely different flavour logic. When she returns to the French kitchen, her very notion of what “bitterness and sweetness” can mean has been expanded. The AR pipeline does something precisely analogous: it sends the LLM out into neighbouring cuisines — economics, chess, materials science — and lets those borrowed logics reshape the search for answers. The kitchen is no longer a cage.

The results are striking. Across four biomedical problems, the AR‑generated solutions consistently improve on quantitative metrics. For perturbation effect prediction (the task of forecasting how a biological system responds when you tweak one of its knobs), the AR approach delivered a nearly thirteen‑fold improvement on distributional metrics over standard baselines. For oligonucleotide property prediction, AR‑guided strategies established new state‑of‑the‑art performance on two datasets. In cell‑cell communication inference, the analogically‑derived solution outperformed all existing methods on the AUPRC score. And for brain region interaction, the AR‑inspired method inferred coupling signals with a Spearman correlation of 0.729 to published methods, which indicates substantial agreement with established approaches rather than a check against ground truth. These are not merely “interesting” findings; they are concrete, measured, and validated.

But the headline numbers on the real‑world tasks only tell part of the story. What is perhaps more revealing is what happened to the LLMs themselves when they used AR. The team evaluated solutions along three axes: diversity (how spread out the ideas were in semantic space), novelty (how likely a given solution was to appear genuinely fresh), and analogy quality (how rich the cross‑domain mapping was). On diversity, AR improved the Vendi score, a measure inspired by ecological diversity indices that estimates the effective number of distinct “species” of ideas in a set of generations, by 90 to 173 percent compared to baselines that simply prompted the LLM without cross‑domain reasoning. On novelty, judged by a stratified LLM‑scoring protocol, AR generated solutions that were deemed novel more than half the time; the baseline, by contrast, managed novelty rates as low as 1.6 percent. This is not a marginal tweak. It is the difference between a system that almost always gives you the obvious answer and one that routinely surprises you.
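To make the diversity metric concrete, here is a minimal Python sketch of a Vendi-style score. It assumes each idea has already been converted into an embedding vector and uses cosine similarity as the kernel; the function name, the toy vectors, and the kernel choice are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np


def vendi_score(embeddings):
    """Effective number of distinct items in a set of embedding vectors.

    Computed as exp(Shannon entropy) of the eigenvalues of K / n, where K is
    the cosine-similarity matrix of the n items. The cosine kernel is an
    assumption made for this sketch.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-length rows
    k = x @ x.T                                       # cosine similarities
    n = k.shape[0]
    eigvals = np.linalg.eigvalsh(k / n)               # symmetric matrix
    eigvals = eigvals[eigvals > 1e-12]                # drop numerical zeros
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


if __name__ == "__main__":
    # Three copies of the same idea: the score should be close to 1.
    same = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
    # Three mutually orthogonal ideas: the score should be close to 3.
    distinct = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
    print(vendi_score(same), vendi_score(distinct))
```

Read this way, the score behaves like a head count of meaningfully different ideas: a batch of paraphrases scores near 1 however large it is, while a batch of unrelated ideas scores near its own size.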

How does AR accomplish this? The pipeline — and this is where the details matter — unfolds in two careful stages. In the first, the LLM is given a research problem and a domain, and is asked to identify structural analogies: shared relations, not just surface‑level similarities. It might notice, for example, that the flow of information through gene regulatory networks bears an abstract resemblance to the flow of capital through economic supply chains — not because genes are dollars, but because both systems exhibit feedback, bottlenecks, and emergent stability. In the second stage, those analogous structures are used as productive constraints: the LLM is asked to imagine what a solution would look like if it were transposed from the economic domain back into biology. The result is not simply a list of ideas; it is a search guided by an imported logic, a borrowed way of seeing.
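The two stages can be sketched in a few lines of Python. This is only an outline under stated assumptions: the prompt wording, the function names, and the `llm` callable (any function that takes a prompt string and returns text) are hypothetical stand-ins, not the prompts or code used in the paper.

```python
from typing import Callable, Dict, List

# Hypothetical interface: prompt text in, completion text out.
LLM = Callable[[str], str]


def build_analogy_prompt(problem: str, source_domain: str) -> str:
    """Stage 1: ask for relational parallels, not surface similarities."""
    return (
        f"Target problem: {problem}\n"
        f"Source domain: {source_domain}\n"
        "Describe a problem in the source domain that shares the same "
        "relational structure (for example feedback or bottlenecks) and "
        "list the correspondences between the two problems."
    )


def build_solution_prompt(problem: str, analogy: str) -> str:
    """Stage 2: transpose the source-domain solution back to the target."""
    return (
        f"Target problem: {problem}\n"
        f"Analogy and mapping: {analogy}\n"
        "Explain how the analogous problem is solved in its own domain, "
        "then translate that solution back into the target domain."
    )


def analogical_pipeline(
    problem: str, source_domains: List[str], llm: LLM
) -> List[Dict[str, str]]:
    """Run both stages once per source domain and collect the outputs."""
    results = []
    for domain in source_domains:
        analogy = llm(build_analogy_prompt(problem, domain))
        solution = llm(build_solution_prompt(problem, analogy))
        results.append(
            {"domain": domain, "analogy": analogy, "solution": solution}
        )
    return results


def stub_llm(prompt: str) -> str:
    """Placeholder standing in for a real model call."""
    return f"[model output for a {len(prompt)}-character prompt]"


if __name__ == "__main__":
    ideas = analogical_pipeline(
        "Infer cell-cell communication from expression data",
        ["economics", "chess"],
        stub_llm,
    )
    for idea in ideas:
        print(idea["domain"], "->", idea["solution"])
```

Keeping the model behind a plain callable means the same loop can be run with and without the analogy stage, which is the kind of comparison needed to tell whether the structure of the analogy or merely the extra prompting step is doing the work.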

Now, you are probably thinking that this sounds suspiciously like what humans do when they are being creative, and that perhaps the whole thing is a clever re‑branding of “thinking really hard about related problems.” And you would not be wrong. Indeed, an important question sharpened by earlier work on analogical prompting — notably by Yasunaga and colleagues — is whether the diversity gains come from the structured analogy representation itself, or simply from the extra reasoning step that AR introduces. It might be that any additional prompt that forces an LLM to linger on a problem, to explore beyond its immediate associations, would yield similar benefits. Shen and colleagues are honest about this: their paper does not include the ablation studies that would isolate the causal thread. The mechanism is promising, but the chain of evidence is not yet unbroken.

There is a deeper, more worrying question, too. Even if AR works beautifully in the short run, what happens when the LLM’s own generations become part of the training data for the next model? Dohmatob and colleagues, in their theoretical work on “strong model collapse,” showed that even a small fraction of model‑generated data in a training set can degrade performance in a way that adding more data does not repair. If AR generates diverse solutions today, but those solutions are then consumed by the next generation of models, does the diversity simply evaporate? The present paper does not test this; it is a question for a different experiment, a different study, a different kind of patience.
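The worry about diversity shrinking when models learn from their own outputs can be illustrated with a toy simulation. The sketch below is a deliberately simple assumption of ours, not the setting analysed in any of the cited papers: each “model” is just a Gaussian fitted to a handful of samples drawn from the previous one, and all names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_recursive_fitting(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to samples from the previous fit, again and again.

    Returns the standard deviation of each generation's model, starting
    with the original distribution (mean 0, standard deviation 1).
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0          # generation 0: the "real" data
    history = [std]
    for _ in range(generations):
        # The next model only ever sees the previous model's outputs.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)   # maximum-likelihood estimate
        history.append(std)
    return history


if __name__ == "__main__":
    spread = simulate_recursive_fitting()
    for generation in (0, 50, 100, 200):
        print(generation, spread[generation])
```

With small samples, each refit tends to underestimate the spread slightly, and the errors compound across generations, so the fitted distribution narrows toward a point. Whether anything similar happens to AR‑generated ideas inside a real training pipeline is exactly the question that remains untested.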

These are not reasons to dismiss AR, any more than the fragility of a first‑generation airplane is a reason to dismiss the Wright brothers. They are, rather, the boundary conditions of the claim. The paper’s contribution is not a solved problem, but a working prototype of a new way of thinking about machine creativity — a prototype that demonstrably works in the real world, on problems that matter, in ways that can be measured and replicated. It is a “sweet spot” between too‑cautious incrementalism and too‑wild speculation, a demonstration that the idea of analogical borrowing is not merely an intuition but an engineerable mechanism. The analogy between domains is, in this sense, itself an analogy for what the research is doing: taking something that works in one cognitive register — human creative problem‑solving — and transposing it into another.

This brings us to the philosophical pivot. For decades, creativity has been framed as a kind of inexplicable leap, a spark that resists algorithmic capture. But what if the spark is, at least in part, a skill of retrieval and remapping — a way of noticing that the logic of one domain can be rewritten in the syntax of another? The AR pipeline does not claim to have captured the full mystery of human insight. It claims only to have captured a piece of it, and to have turned that piece into a repeatable protocol. The fact that the protocol works — that it produces novel solutions that measurably improve over what an LLM can do alone — suggests that at least one component of creativity is, in the end, not magic but method.

We are left, then, with a question that sits in productive tension with the paper’s findings. If a machine can be taught to borrow, does it ever really understand what it has borrowed? The LLM that maps gene networks onto supply chains is not dreaming of factories and ribosomes; it is performing a statistical operation over a space of relations that human engineers have pre‑structured. The analogy is real in its effects, but it may be hollow in its interior life. And perhaps that is enough. Perhaps the measure of a creative tool is not whether it feels the weight of its own ideas, but whether it can help us feel that weight more clearly. The AR pipeline does not need to be conscious to expand what is possible; it only needs to be useful, and to surprise us.

The most enduring scientific breakthroughs have often arrived not from data alone, but from the sudden recognition that a pattern from one field echoes, with eerie precision, in another — that the mathematics of a living cell can be glimpsed in the mathematics of a financial market, that the logic of chess can illuminate the logic of drug design. What Shen and colleagues have done is to embed that act of recognition into a prompt, and to show that it works. The question of whether it will work twice, or a hundred times, under the weight of its own outputs, remains open. But for now, the kitchen door is unlocked, and the chef is learning to cook with flavours she never knew she had.

Yanjiang is an online editor of LoomSci.com.

How This Article Was Reviewed: To provide a critical perspective, we identified the most relevant preprints cited by the authors and examined how those earlier works relate to and sometimes challenge the claims made here. No interviews were conducted. The final narrative is the editor’s own synthesis.

References

  • Shen et al., Unlocking LLM Creativity in Science through Analogical Reasoning, arXiv:2605.11258
  • Yasunaga et al., Large Language Models as Analogical Reasoners, arXiv:2310.01714
  • Si et al., Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers, arXiv:2409.04109
  • Dohmatob et al., Strong Model Collapse, arXiv:2410.04840