
Large Language Models for Mathematicians (2023)

mgraczyk

This paper is over a year old and didn't review the literature at the time well enough to see the early work that led to current-generation reasoning models.

Now it's out of date: automated RL on math problems seems to work and scales with compute. As we scale available compute 100x over the next 5 years and reduce its cost by around 10x over the same time frame, it will become increasingly clear that LLMs running for a long time are capable of replacing most mathematics research.

hnax

I'm afraid you're overlooking the fact that, since AI scales linearly with constraints and exponentially with compute, the importance of knowledge (i.e. constraints) dominates over data (compute) for AI to be useful.

Ericson2314

I don't think this article is very good

https://xenaproject.wordpress.com/2024/12/22/can-ai-do-maths...

https://xenaproject.wordpress.com/2025/01/20/think-of-a-numb...

These from the mathlib founder are much more informative!

kmill

Kevin Buzzard did great work to popularize mathlib among mathematicians (that's how I got involved myself), but he didn't found mathlib!

There's a blurb about the history here: https://leanprover-community.github.io/papers/mathlib-paper....

layer8

These are about current (and future) mathematical capabilities of LLMs, but not about how mathematicians can employ LLMs in their work.

wrp

About 20 years ago, I was working on a project to build a vector model over a corpus of math texts. What ultimately killed it was that I couldn't figure out a way to automatically reduce the equations to consistent, searchable text. This paper, and others I've read, ignore that issue. Does anyone know how it is actually handled?

threeducks

Vision-language models are quite good at converting images to LaTeX or any other kind of representation these days. Here is a demo where you can upload a screenshot of a page with an equation and ask the neural network to e.g. "Transcribe this equation as LaTeX.":

https://huggingface.co/spaces/Qwen/Qwen2-VL

You can also run a smaller model locally if you have enough VRAM, for example Qwen2.5-VL-7B-Instruct:

https://github.com/QwenLM/Qwen2.5-VL?tab=readme-ov-file#usin...

Also works reasonably well with hand-written equations.

For searching through similar equations, you can probably embed each equation as a high-dimensional vector and then search for the closest vector. Here is a ranking of text embedding networks:

https://huggingface.co/spaces/mteb/leaderboard
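The embedding-search idea boils down to nearest-neighbor lookup by cosine similarity. A minimal pure-Python sketch (the toy 3-dimensional vectors stand in for real equation embeddings, which would come from one of the models on the leaderboard):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, corpus):
    """Index of the corpus vector most similar to the query."""
    return max(range(len(corpus)),
               key=lambda i: cosine_similarity(query, corpus[i]))

# Toy vectors standing in for embeddings of three equations.
corpus = [[1.0, 0.0, 0.2], [0.1, 0.9, 0.3], [0.0, 0.2, 1.0]]
query = [0.95, 0.05, 0.15]
print(nearest(query, corpus))  # index of the closest equation embedding
```

In practice you would use an approximate-nearest-neighbor index (e.g. FAISS) instead of the linear scan once the corpus gets large.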

Or if you want something more deterministic, parse the LaTeX equation to create an abstract syntax tree for which there are plenty of similarity measures.
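A crude sketch of the "normalize, then compare" idea: alpha-rename single-letter variables in order of first appearance, so that expressions differing only in variable names map to the same canonical string. This is a toy tokenizer, not a real LaTeX parser, and it ignores binding structure entirely:

```python
import re

def canonicalize(latex: str) -> str:
    """Rename bare single-letter variables to v0, v1, ... in order of
    first appearance; keep LaTeX commands and other symbols intact."""
    # Match LaTeX commands first, then single letters, then any other
    # non-whitespace character.
    tokens = re.findall(r"\\[A-Za-z]+|[A-Za-z]|\S", latex)
    mapping = {}
    out = []
    for tok in tokens:
        if len(tok) == 1 and tok.isalpha():
            mapping.setdefault(tok, f"v{len(mapping)}")
            out.append(mapping[tok])
        else:
            out.append(tok)
    return " ".join(out)

print(canonicalize(r"f(x) = x^2"))  # same canonical form as the next line
print(canonicalize(r"g(y) = y^2"))
```

Both inputs normalize to the same token string, so an exact-match or edit-distance comparison now sees them as equivalent.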

wrp

I think LaTeX renderings are not unique. Is there a schema for normalizing them?

lanstin

In practice, communities of research mathematicians develop LaTeX styles, conventions for naming variables, and shared assumptions about which proofs can be taken for granted. Collaborations are one way the LaTeX gets synchronized, as are professor/grad-student relationships.

So when you ingest all the LaTeX, you get the semantics, LaTeX conventions, and variable naming of each school of mathematics for free.

meowkit

Not sure I follow your question. Pretty sure they tokenize the LaTeX, which should be searchable as long as the unrendered LaTeX is available.

wrp

We were doing OCR on printed text. I tried to get access to the Zentralblatt MATH corpus, but no go.

Thorrez

Reminds me of this article about an AI model that can generate proofs for Math Olympiad:

https://deepmind.google/discover/blog/ai-solves-imo-problems...

Disclosure: I work at Google, but not on AI.

jenny91

A paper about LLMs for Mathematicians written by four authors, none of whom seem to be actual mathematicians? (At least based on a quick scroll through their past arXiv papers, they seem like ML researchers.)

Just call it LLMs for ML Researchers, but then it doesn't sound anywhere near as exciting?

tptacek

I started with the first author on this paper, Simon Frieder, and found his page: he's a postdoc mathematician (and ML researcher) at Oxford. I don't think this was a particularly sharp critique.

kadushka

This paper was posted in December, and somehow it completely ignores the o1 model, which was specifically optimized for reasoning.

mgraczyk

December 2023

lanstin

It isn't that good at maths though, at least compared to Claude, at least in diophantine approximation and numerical linear algebra.

westurner

It makes sense for LLMs to work with testable code for symbolic mathematics: Computer Algebra System (CAS) code instead of LaTeX, which only roughly corresponds to the underlying math.

Are LLMs training on the AST parses of the symbolic expressions, or on token co-occurrence? What about training on the relations between code and tests?

Benchmarks for math and physics LLMs: FrontierMath, TheoremQA, Multi SWE-bench: https://news.ycombinator.com/item?id=42097683
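To illustrate the difference between the two representations the question contrasts, here is a toy example using Python's own `ast` module as a stand-in for a math-expression parser: the raw token text of an expression versus its structural parse, which makes operator precedence and nesting explicit.

```python
import ast

# One symbolic expression, two representations a model could train on:
# the raw token text, or the structural parse below.
expr = "a * (b + c)"

tree = ast.parse(expr, mode="eval")
print(ast.dump(tree.body))
# e.g. BinOp(left=Name(id='a'), op=Mult(), right=BinOp(left=Name(id='b'),
#      op=Add(), right=Name(id='c'))) -- exact output varies by Python version
```

The AST encodes that the addition is nested under the multiplication, which a flat token sequence only implies via the parentheses.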

amelius

Waiting for llms for compilers.


sylware

Math models very probably have significant differences. For instance, they are strapped to a formal solver. Not to mention, they should be in "continuous" ML learning mode.

I wonder whether, from the particle-physics observation data we have, a "physics" model could infer a hard mathematical theory, using those previous math models (and probably other tools), that would fit this very observation data.

Sci-fi: if those mathematical theories are beyond us, we would need other models to try to extract some predictions that would be interesting for us to verify with "real-life"/reasonable experiments.

youreth4tguy

I notice how much I trust LLMs nowadays: I spent a lot of time trying to understand the answer to the first question, thinking I was just "not getting it", until realizing that the answer is just wrong, and then also realizing that the question doesn't even make sense in the slightest.

mmiyer

The question is perfectly valid: https://math.stackexchange.com/questions/60050/find-a-functi...

The paper notes GPT-4 can solve it (they seem to have asked ChatGPT 3.5; this paper is old by AI standards, the first version being from Dec 2023).

lanstin

Yeah, they aren't that good at spotting wrong questions (though better than a year ago). Claude is especially likely to handle this correctly, while the GPT o-series models will push back wrongly. Something in Gemini's positioning casts it as an all-knowing expert rather than a tool for exploring new true statements: it ends almost everything with "is there anything else I can explain about the distribution of algebraic numbers with a given height?"

lanstin

Actually, I just tried one of my test questions on GPT o3-mini-high and it got it. Very nice; back in the lead over Claude. (My last check was with o1, although if they used the whole chat history to train or fine-tune, I gave away the answer, since in the end I forced o1 to accept it. Lots of arm-twisting language, though.)