AI Mathematical Proof Verification: The New Research Frontier
- Terence Tao used AI to help write part of a peer-reviewed mathematics paper, checking every step before publication.
- AI systems now solve advanced mathematics problems and assist research workflows.
- Mathematicians increasingly rely on formal verification to validate AI-generated proofs.
- Researchers say verifying AI-generated mathematics remains a growing challenge.
In March, Fields Medal-winning mathematician Terence Tao used an artificial intelligence system to help write part of a peer-reviewed mathematics paper. The AI was used to search existing research and identify patterns rather than create new mathematics, and Tao checked every step before the paper was published.
The experiment reflected a prediction Tao made in 2023 that is now beginning to take shape. At the time, he said “2026-level AI, when used properly, will be a trustworthy co-author in mathematical research.” Speaking at an Institute for Pure and Applied Mathematics (IPAM) conference earlier this year, he said AI tools were saving “more time than they waste” and were “ready for primetime.”
AI has advanced rapidly since then. It has moved beyond solving undergraduate-level problems to tackling International Mathematical Olympiad questions. DeepMind’s AlphaGeometry solved 25 of 30 Olympiad geometry problems, while AlphaProof later reached silver-medal-level performance across the full competition. AI is no longer just helping mathematicians work faster. It is beginning to produce proofs that resemble those written by human researchers.
That progress has shifted the focus of the debate. The key question is no longer whether AI can generate mathematical proofs, but whether those proofs can be trusted.
Verification, Not Generation
Tao calls his vision for AI-assisted research “Big Mathematics.” Instead of replacing mathematicians, he sees AI handling technical tasks while researchers concentrate on creativity and new ideas.
He believes mathematics has a major advantage over many other fields because its results can be formally checked.
“If it wasn’t for this formal verification layer, opening projects up without any safeguards would just be a disaster,” Tao said. “But in math, we can completely check and verify outputs, and this really filters out a lot of the rubbish.”
Formal verification converts mathematical proofs into computer-readable code that proof assistants such as Lean can examine line by line. Every logical step is verified by software instead of relying only on human review.
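As a rough illustration of what that looks like in practice, here is a minimal sketch in Lean 4 that uses only the core library. The theorem names are made up for this example and do not come from mathlib or from any work discussed in this article. Lean accepts the file only if every step type-checks.

```lean
-- Lean 4, core library only (no mathlib import required).
-- The theorem names below are illustrative, not taken from any real project.

-- Adding zero on the right leaves a natural number unchanged.
-- `rfl` succeeds because both sides reduce to the same term by definition.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl

-- Addition of natural numbers is commutative.
-- The proof appeals to `Nat.add_comm`, a lemma that ships with Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

If either statement were changed to something false, such as `n + 0 = n + 1`, Lean would reject the file and refuse to accept the proof. The same mechanical check applies whether a human or an AI system wrote it.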
The system has expanded quickly. Lean’s open-source mathematics library, mathlib, now contains more than two million lines of formally verified mathematics contributed by hundreds of researchers worldwide, making it one of the world’s largest mathematical verification projects.
Even so, researchers warn that verification systems may struggle to keep up with the growing volume of AI-generated mathematics.
The Verification Bottleneck
Tao argues that AI itself is not the biggest challenge.
“The level of automation and AI power that you can profitably use before it becomes slop is roughly proportionate to how stringent your verification is,” he said.
The concern is already affecting academic publishing.
In March alone, arXiv received more than 30,000 submissions. As AI-generated manuscripts increased, the repository announced that authors whose papers contain “incontrovertible evidence” of unchecked AI-generated material could face a one-year submission ban.
The platform explained its policy clearly:
“If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can’t trust anything in the paper.”
The policy does not ban AI-assisted writing. It requires researchers to verify AI-generated content before publication.
Correct Isn’t Always Enough
Verification is only one purpose of a mathematical proof. Proofs must also explain why a theorem is true.
Andrew Wiles spent about seven years working on Fermat’s Last Theorem before announcing a proof in 1993, only for a flaw to be discovered during peer review. It took roughly another year to repair the argument before it was accepted. The case remains a reminder that even human-written mathematics requires careful checking.
AI presents a different challenge. A proof assistant may confirm every logical step, but still provide little understanding of the underlying mathematics. Researchers say today’s AI is becoming increasingly good at producing correct proofs while still struggling to offer meaningful explanations.
Tao saw this firsthand earlier this year when he and other mathematicians tested AI systems on ten difficult research problems. The models often produced convincing arguments, but Tao noted that many were so technical that non-experts “would struggle to verify any AI-generated output,” leaving only a small group of specialists able to judge whether the results were truly correct.
Formal verification also has limits. It works best when mathematics can be translated into machine-readable logic, while many frontier research problems remain difficult to formalise.
Trust May Become the Scarcest Resource
Imperial College London’s Kevin Buzzard, a leading advocate of formal mathematics, has also urged caution. He argues that although AI is becoming better at generating proofs and code, mathematical understanding still depends on explaining ideas, not simply verifying them.
Formal verification can reduce logical errors, but it cannot automatically produce insight.
As AI makes it easier to generate mathematical papers, researchers believe the real shortage will not be new proofs. Instead, the scarce resource will be people and verification systems capable of deciding which AI-generated mathematics is genuinely worth trusting.