31
$\begingroup$

The purpose of this question is to collect examples where large language models (LLMs) like ChatGPT have led to notable mathematical developments.

The emphasis in this question is on LLMs, but answers about other machine-learning tools are also welcome.

This question complements two questions that I asked before: Experimental mathematics leading to major advances (January 2010) and The use of computers leading to major mathematical advances II (June 2021). I think it will be useful to keep track of mathematical achievements based on LLMs or assisted by LLMs, since it is considered a serious possibility that LLMs have the potential to change (and automate), or at least assist, research in mathematics.

I relaxed the threshold from "major" (in the previous two questions) to "notable" to allow more answers.

A related question specifically about DeepMind is: What mathematical problems can be attacked using DeepMind's recent mathematical breakthroughs? Another related question, about deep learning, is: What are possible applications of deep learning to research mathematics?

$\endgroup$
12
  • 4
    $\begingroup$ There have been many similar questions on MO to this about the use of AI/machine learning in research math; see, e.g., mathoverflow.net/questions/463937 and other questions linked there. $\endgroup$ Commented Oct 26 at 17:43
  • 8
    $\begingroup$ I haven't voted on the question (in either way), but I consider it likely that answers - if you get some - will lead to a lot of discussion regarding how significant the LLM contribution actually was. $\endgroup$ Commented Oct 26 at 17:48
  • 31
    $\begingroup$ My instinct is to downvote the question, though I don't have any better justification than that I hate the intrusion of AI into every sphere, and would rather not see it here; but that's unreasonable personal bias, so I just won't vote. But it does seem nonsensical to me that the question would be at 9 – 7 while both answers, reasonable as far as I can tell, are at 0 – 2. I hope downvoters will consider leaving a comment about what they think is an appropriate answer. $\endgroup$ Commented Oct 26 at 20:27
  • 3
    $\begingroup$ I think that there should be a special badge for controversial questions :). $\endgroup$ Commented Oct 27 at 19:59
  • 4
    $\begingroup$ Re, just to be clear, I meant my rant to express dissatisfaction with the ubiquity of AI, not with you or this question; I hope I gave no offence. Re, I thought there was, but searching just turned up a post Can we have a badge for controversy? which seems to indicate that the answer to the titular question is, or 15 years ago was, "no." $\endgroup$ Commented Oct 28 at 14:57

11 Answers

22
$\begingroup$

Boris Alexeev and Dustin Mixon last week posted their paper Forbidden Sidon subsets of perfect difference sets, featuring a human-assisted proof in which they had an LLM generate the Lean formalization of their proof. In my view this is one of the promising uses of LLMs, because the verifier naturally guards against hallucinations.

The problem is notable: they give a counterexample to a $1000 Erdős problem (and they note that Marshall Hall had published a counterexample before Erdős made the conjecture).

My caveat: a human must still verify that the definitions and the statement of the main theorem are correct, lest the LLM generate a correct proof, but of a different theorem.
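
To make the caveat concrete, here is a minimal, purely hypothetical Lean sketch (assuming Lean 4 with Mathlib; the definition and theorem names are invented and are not from Alexeev and Mixon's development). If the formal definition fails to capture the intended notion, the kernel will happily certify a proof of the wrong theorem:

```lean
import Mathlib

-- Hypothetical, deliberately broken stand-in definition (not the paper's):
-- suppose "Sidon set" were formalized incorrectly as a vacuous predicate.
def SidonLike (A : Finset ℕ) : Prop := True

-- Lean certifies this "theorem" without complaint, because the proof
-- matches the statement that was actually written down.
theorem every_finset_is_sidonLike (A : Finset ℕ) : SidonLike A := by
  unfold SidonLike
  trivial

-- The checker guarantees that the proof matches the statement; only a human
-- reading the definitions can guarantee the statement matches the intended
-- mathematics.
```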

$\endgroup$
4
  • 4
    $\begingroup$ This is a very interesting paper. But I think it's important to point out that this use of ChatGPT was a mixed success. They do cite one instance where one of their intermediate results (Proposition 20) was formally proved by the LLM autonomously. On the other hand, they also say that their effort at vibe coding the nearly trivial result that if $f$ is a fixed point–free involution on a finite set $S$, then $S$ has even cardinality was "a multi-day struggle." $\endgroup$ Commented Oct 31 at 14:32
  • 3
    $\begingroup$ IMO, what Alexeev and Mixon did was closer to "autoformalization" than to automated discovery of new theorems. Another impressive example of an autoformalization effort is the development by Math.Inc of a tool called Gauss, which helped them complete a challenging formalization project that Tao and Kontorovich had proposed but had not completed. $\endgroup$ Commented Oct 31 at 14:39
  • 1
    $\begingroup$ It wasn't the $1000 problem itself; it was a stronger statement that would have proven the $1000 problem had it been true. But even Erdős said this formulation was most likely false. $\endgroup$ Commented yesterday
  • 1
    $\begingroup$ Note that in this particular case, the statement of the main theorem had already been formalized in a Lean repository of Erdos problems, maintained by Google DeepMind. In particular, in this case the statement had already been inspected by experts. You are absolutely right that in general people probably won't be so lucky. $\endgroup$ Commented 17 hours ago
19
$\begingroup$

Here is an example: Counterexample to majority optimality in NICD with erasures

From the abstract:

We asked GPT-5 Pro to look for counterexamples among a public list of open problems (the Simons ``Real Analysis in Computer Science'' collection). After several numerical experiments, it suggested a counterexample for the Non-Interactive Correlation Distillation (NICD) with erasures question: namely, a Boolean function on 5 bits that achieves a strictly larger value of E|f(z)| than the 5-bit majority function when the erasure parameter is p=0.40. In this very short note we record the finding, state the problem precisely, give the explicit function, and verify the computation step by step by hand so that it can be checked without a computer. In addition, we show that for each fixed odd n the majority is optimal (among unbiased Boolean functions) in a neighborhood of p=0. We view this as a little spark of an AI contribution in Theoretical Computer Science: while modern Large Language Models (LLMs) often assist with literature and numerics, here a concrete finite counterexample emerged.
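
To make the quantity concrete, here is a short brute-force sketch of $E|f(z)|$ under what I take to be the standard erasure model (each coordinate of a uniform $x \in \{-1,1\}^5$ is erased independently with probability $p$, and $f(z)$ is the conditional expectation of $f(x)$ given the surviving coordinates). It only evaluates the 5-bit majority baseline at $p=0.40$; the paper's counterexample function, which I do not reproduce here, would be plugged in the same way.

```python
from itertools import product

def majority(x):
    # 5-bit majority with values in {-1, +1}
    return 1 if sum(x) > 0 else -1

def conditional_expectation(f, z):
    # f(z): average of f over all completions of the erased (0) coordinates
    erased = [i for i, zi in enumerate(z) if zi == 0]
    total, count = 0.0, 0
    for fill in product([-1, 1], repeat=len(erased)):
        x = list(z)
        for i, b in zip(erased, fill):
            x[i] = b
        total += f(tuple(x))
        count += 1
    return total / count

def expected_abs(f, n=5, p=0.40):
    # E_z |f(z)|, where each coordinate of z is erased (set to 0) with
    # probability p and otherwise equals the corresponding uniform +/-1 bit
    result = 0.0
    for z in product([-1, 0, 1], repeat=n):
        prob = 1.0
        for zi in z:
            prob *= p if zi == 0 else (1 - p) / 2
        result += prob * abs(conditional_expectation(f, z))
    return result

print(expected_abs(majority))  # baseline value for 5-bit majority at p = 0.40
```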

$\endgroup$
9
$\begingroup$

This paper

Sergey Avvakumov, Roman Karasev, Tensor rank of the determinant and periodic triangulations of $\mathbb{R}^n$

https://arxiv.org/abs/2509.22333

includes in the Acknowledgments "We also thank ChatGPT 5 for pointing out that the lower bound in the proof of Theorem 1.5 can be stated in tensor language and is thus equal to the determinant’s tensor rank."

$\endgroup$
1
  • 3
    $\begingroup$ Thanks, Zach! I knew the paper and I met Sergey today, but did not know about the role of ChatGPT :) $\endgroup$ Commented Oct 26 at 20:35
7
$\begingroup$

Not exactly a notable result, but in my recent preprint Evaluation of GPT-5 on an Advanced Extension of Kashihara's Problem I describe how GPT-5 has been able to improve the general version of an extended combinatorial problem I originally solved in 2010.

$\endgroup$
7
$\begingroup$

Scott Aaronson, Phillip Harris, and Freek Witteveen have a recent paper on bounds for amplification of QMA (Quantum Merlin-Arthur). A critical part of the paper involved a linear-algebra trick suggested by GPT-5. See Aaronson's blog entry here.

$\endgroup$
7
$\begingroup$

The paper “Point Convergence of Nesterov's Accelerated Gradient Method: An AI-Assisted Proof” by Uijeong Jang and Ernest Ryu, posted to the arXiv on October 27, 2025, states in the abstract:

The Nesterov accelerated gradient method, introduced in 1983, has been a cornerstone of optimization theory and practice. Yet the question of its point convergence had remained open. In this work, we resolve this longstanding open problem in the affirmative. The discovery of the proof was heavily assisted by ChatGPT, a proprietary large language model, and we describe the process through which its assistance was elicited.

https://arxiv.org/abs/2510.23513

See also this discussion by Damek Davis that helps put the result in perspective: https://x.com/damekdavis/status/1982529760505782510?s=46
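
For readers unfamiliar with the object in question: "point convergence" asks whether the iterates $x_k$ themselves converge to a minimizer, not merely whether $f(x_k)$ converges to the minimal value. Below is a minimal sketch of one standard form of the 1983 iteration (constant step size $1/L$ and the common $k/(k+3)$ momentum factor; other equivalent parameterizations exist, and this is only an illustration, not the paper's setup).

```python
import numpy as np

def nesterov_agm(grad_f, x0, L, num_iters=1000):
    """One common form of Nesterov's accelerated gradient method:
        x_{k+1} = y_k - (1/L) * grad_f(y_k)
        y_{k+1} = x_{k+1} + k/(k+3) * (x_{k+1} - x_k)
    (Other equivalent parameterizations appear in the literature.)"""
    x_prev = np.asarray(x0, dtype=float)
    y = x_prev.copy()
    for k in range(num_iters):
        x = y - grad_f(y) / L
        y = x + (k / (k + 3)) * (x - x_prev)
        x_prev = x
    return x_prev

# Toy example: f(x) = 0.5 * x^T A x, so grad_f(x) = A x and L = lambda_max(A).
A = np.diag([1.0, 0.01])          # ill-conditioned quadratic
x_star = nesterov_agm(lambda x: A @ x, x0=[1.0, 1.0], L=1.0)
print(x_star)                      # iterates approach the minimizer (0, 0)
```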

$\endgroup$
5
$\begingroup$

Here is a paper on bottleneck duality in flow networks with lattice coefficients, from fall 2024.

https://arxiv.org/abs/2410.00315

The appendix to this paper details how the main result and the proof were generated by GPT-o1-mini in September 2024. It was very difficult to get a correct proof at the time; current models nail a correct proof immediately.
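
For a feel for the kind of statement involved, here is a toy brute-force check of what I take to be the classical real-valued special case of bottleneck duality, which the paper generalizes to lattice coefficients: the maximum over s-t paths of the minimum capacity along the path equals the minimum over s-t cuts of the maximum capacity of a crossing edge. The example graph is my own, not from the paper.

```python
from itertools import combinations, permutations
import random

def bottleneck_duality_check(n=5, seed=0):
    """Brute-force check, on a small random capacitated digraph, that
       max over s-t paths of the min edge capacity along the path
     = min over s-t cuts  of the max capacity of an edge crossing the cut."""
    rng = random.Random(seed)
    s, t = 0, n - 1
    cap = {(u, v): rng.randint(1, 10) for u in range(n) for v in range(n) if u != v}

    # max-min over all simple s-t paths
    best_path = 0
    for k in range(n - 1):
        for mid in permutations([v for v in range(n) if v not in (s, t)], k):
            path = (s, *mid, t)
            best_path = max(best_path, min(cap[e] for e in zip(path, path[1:])))

    # min-max over all s-t cuts (S contains s, its complement contains t)
    others = [v for v in range(n) if v not in (s, t)]
    best_cut = float("inf")
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            S = {s, *extra}
            crossing = [cap[(u, v)] for (u, v) in cap if u in S and v not in S]
            best_cut = min(best_cut, max(crossing))
    return best_path, best_cut

print(bottleneck_duality_check())   # the two numbers agree
```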

$\endgroup$
1
  • $\begingroup$ This is funny! I once tried in vain to detropicalize max-flow-min-cut (you can find some traces of that on MO), while you have managed to tropicalize it even further (+ becomes max) and then extend it to distributive lattices :) $\endgroup$ Commented Oct 28 at 17:54
5
$\begingroup$

At the request of Gil Kalai, I'm converting a comment to an answer.

The paper "Mathematical exploration and discovery at scale" by Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, and Adam Zsolt Wagner was just posted to the arXiv: https://arxiv.org/abs/2511.02864.

Below is the abstract of the paper.

AlphaEvolve is a generic evolutionary coding agent that combines the generative capabilities of LLMs with automated evaluation in an iterative evolutionary framework that proposes, tests, and refines algorithmic solutions to challenging scientific and practical problems. In this paper we showcase AlphaEvolve as a tool for autonomously discovering novel mathematical constructions and advancing our understanding of long-standing open problems. To demonstrate its breadth, we considered a list of 67 problems spanning mathematical analysis, combinatorics, geometry, and number theory. The system rediscovered the best known solutions in most of the cases and discovered improved solutions in several. In some instances, AlphaEvolve is also able to generalize results for a finite number of input values into a formula valid for all input values. Furthermore, we are able to combine this methodology with Deep Think and AlphaProof in a broader framework where the additional proof-assistants and reasoning systems provide automated proof generation and further mathematical insights. These results demonstrate that large language model-guided evolutionary search can autonomously discover mathematical constructions that complement human intuition, at times matching or even improving the best known results, highlighting the potential for significant new ways of interaction between mathematicians and AI systems. We present AlphaEvolve as a powerful new tool for mathematical discovery, capable of exploring vast search spaces to solve complex optimization problems at scale, often with significantly reduced requirements on preparation and computation time.
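
To convey the flavor of the propose/test/refine loop the abstract describes, here is a heavily simplified, hypothetical sketch of evolutionary search over programs that construct a combinatorial object. The `llm_propose_variant` stub and the toy scoring problem are placeholders of my own; they are not AlphaEvolve's actual interfaces, and a real system would call an LLM to rewrite the candidate code.

```python
import random
import re

def llm_propose_variant(program_source: str) -> str:
    """Placeholder for the LLM call: return a mutated variant of the candidate
    program that still runs. In AlphaEvolve the mutation is proposed by a
    language model; here we merely tweak a numeric constant."""
    new_step = random.randint(1, 20)
    return re.sub(r"step = \d+", f"step = {new_step}", program_source, count=1)

def evaluate(program_source: str) -> float:
    """Automated evaluator: execute the candidate program and score the
    combinatorial object produced by its build() function."""
    scope = {}
    exec(program_source, scope)          # the candidate defines build()
    return float(len(scope["build"]()))  # toy score: size of the set it builds

# Seed program: builds a subset of {0, ..., 99} by a simple rule.
seed = (
    "def build():\n"
    "    step = 7\n"
    "    return {i for i in range(100) if i % step == 0}\n"
)

population = [seed]
for generation in range(20):
    parent = max(population, key=evaluate)   # keep the best candidate so far
    child = llm_propose_variant(parent)      # propose an edit to its source
    population = [parent, child]             # test and refine
best = max(population, key=evaluate)
print(evaluate(best))                         # best toy score found so far
```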

$\endgroup$
3
  • $\begingroup$ In a nutshell, the idea is to solve a combinatorial optimization problem by evolving code for generating combinatorial objects rather than evolving the combinatorial objects themselves. To do this, one needs to be able to make small random perturbations of the code while still having the code compile; this is where LLMs come in, since writing code is one thing LLMs are good at. $\endgroup$ Commented yesterday
  • $\begingroup$ @TimothyChow I believe the approach used to find better cap sets by DeepMind (discussed at mathoverflow.net/questions/463937) was along the same lines. $\endgroup$ Commented yesterday
  • $\begingroup$ Yes, AlphaEvolve is "FunSearch 2.0". $\endgroup$ Commented yesterday
4
$\begingroup$

Using deep neural networks, DeepMind and collaborators numerically found a class of unstable singularities of the porous media and 3D Euler (with boundary) equations. Notable here is the fact that the level of precision of their solutions "meets the stringent requirements for rigorous mathematical validation via computer-assisted proofs" (quote from the paper).

Paper here: https://arxiv.org/abs/2509.14185

Article: https://deepmind.google/discover/blog/discovering-new-solutions-to-century-old-problems-in-fluid-dynamics/

$\endgroup$
1
  • 3
    $\begingroup$ Note that this is an application of neural networks, but not LLMs. (I think they tried to use AlphaEvolve, but this wasn't the main ingredient in the paper...) $\endgroup$ Commented Oct 28 at 21:42
3
$\begingroup$

A recent paper by Nagda, Raghavan, and Thakurta, "Reinforced generation of combinatorial structures: Applications to complexity theory," reports that they received help from AlphaEvolve to improve the best-known bounds for Max-3CUT and Max-4CUT. Their idea seems quite general, so I would not be surprised if more complexity-theory results are improved this way.

$\endgroup$
3
$\begingroup$

I have hesitated to post this example because I don't think it's really a "notable mathematical development" as such, but after seeing the other answers, I think this one is worth mentioning.

As reported in Scientific American, Epoch AI invited several mathematicians, including Ken Ono, to a meeting designed to generate challenge problems for "FrontierMath". Among other things, Ono came up with what he thought was a Ph.D.-thesis-level problem: "What is the 5th power moment of Tamagawa numbers of elliptic curves over $\mathbb{Q}$?" To Ono's amazement, the AI autonomously solved the problem. You can read Ono's account on his Facebook page (also reproduced below), or listen to him talk about it here.

Even if this is a cherry-picked example (the best one from the whole meeting), this strikes me as a very impressive achievement. But see also this tweet by Daniel Litt, who was also one of the invited mathematicians but was not too impressed when he read over the chat log.

[Screenshot of Ken Ono's Facebook post]

$\endgroup$
1
  • 2
    $\begingroup$ A similar project, but on a smaller scale and led by Christian Stump, for using PhD-level mathematics problems to benchmark AI is: math.science-bench.ai $\endgroup$ Commented 2 days ago
