The Math Is Haunted
168 comments
·July 30, 2025brookst
bubblyworld
Formalising natural language statements is a minefield of difficulties, for (imo) essentially the same reasons that writing code which interacts with the real world is so difficult. Concepts you take for granted like identity, time, causality... all of that stuff needs to be fleshed out carefully and precisely in the formalism for facts to be relatable to each other (or even expressible in the first place).
Not to discourage you - it's a cool problem! OpenCog comes to mind as a project that tried to take this all the way, and there's a field devoted to this stuff in academia called KRR (knowledge representation and reasoning). The IJCAI journal is full of research on similar topics.
(also see all the different logics philosophers use to formalise different kinds of argument involving time/modality/probability - there are a lot, but they aren't very "modular" and can't easily be mixed unless someone has solved that since I last looked into it)
eru
> Concepts you take for granted like identity, time, causality... all of that stuff needs to be fleshed out carefully and precisely in the formalism for facts to be relatable to each other (or even expressible in the first place).
Yes, and different formalisations of identity apply in different contexts.
Eg remember the famous line about not being able to step in the same river twice.
photonthug
> logics philosophers use .. aren't very "modular" and can't easily be mixed
Not sure if the model-checking communities would agree with you there. For example CTL-star [0] mixes tree-logic and linear-temporal, then PCTL adds probability on top. Knowledge, belief, and strategy-logics are also mixed pretty freely in at least some model checkers. Using mixed combinations of different-flavored logic does seem to be going OK in practice, but I guess this works best when those diverse logics can all be reduced towards the same primitive data structures that you want to actually crunch (like binary decision diagrams, or whatever).
If no primitive/fast/generic structure can really be shared between logics, then you may be stuck with some irreconcilable continuous-vs-discrete or deterministic-vs-probabilistic disconnect, and then require multiple model-checkers for different pieces of one problem. So even if mixing different flavors of logics is already routine.. there's lots of improvements to hope for if practically everything can be directly represented in one place like lean. Just like mathematicians don't worry much about switching back and forth from geometry/algebra, less friction between representations would be great.
Speaking of CTL, shout out to Emerson[1], who won a Turing award. If he hadn't died recently, I think he'd be surprised to hear anyone suggest he was a philosopher instead of a computer scientist ;)
[0]: https://en.wikipedia.org/wiki/CTL* [1]: https://en.wikipedia.org/wiki/E._Allen_Emerson
bubblyworld
Yeah, not suggesting philosophers are the only people using logics, but they've certainly been using them the longest!
Indeed, I've seen various attempts to tackle the problem including what you are suggesting - expressing the semantics of different logics in some base formalism like FOL in such a way that they can interplay with each other. In my experience the issue is that it's not always clear how two "sublogics" should interact, and in most cases people just pick some reasonable choice of semantics depending on the situation you are trying to model. So you end up with the same issue of having to construct a new logic for every novel situation you encounter, if that makes sense?
Logics for computing are a good example - generally you use them to formalise and prove properties of a program or spec, so they are heavily geared towards expressing stuff like liveness, consistency invariants and termination properties.
I haven't read about CTL though, thanks! I'll check it out. Hopefully I didn't write too much nonsense here =)
> Just like mathematicians don't worry much about switching back and forth from geometry/algebra [...]
As an ex-mathematician I think we worry a lot about transitioning between viewpoints like that. Some of the most interesting modern work on foundations is about finding the right language for unifying them - have a look at Schulze's work on condensed mathematics, for example, or basically all of Grothendieck's algebraic geometry work. It's super deep stuff.
> then you may be stuck with some irreconcilable continuous-vs-discrete or deterministic-vs-probabilistic disconnect
Agreed, I think this is one of the cruxes, and lately I'm starting to feel that maybe strict formal systems aren't the way to go for general-purpose modelling. Perhaps we need to take some inspiration from nature - completely non-deterministic, very messy, and nevertheless capable of reasoning about the universe around it!
zozbot234
> (also see all the different logics philosophers use to formalise different kinds of argument involving time/modality/probability - there are a lot, but they aren't very "modular" and can't easily be mixed unless someone has solved that since I last looked into it)
From a logic-as-types perspective, modalities turn out to be monads. So the problem of "mixing" modalities is quite similar to the problem of composing monads in programming languages.
bjackman
I think it's quite rare that the most important beliefs you should derive from the news can be justified by a collection of absolute statements.
I think you'd be better served by a tool for calculating chains of Bayesian reasoning. I've seen a tool that does this for numerical estimations.
lblume
Correct. The only way to not subject oneself to (economic) irrationality is to model your beliefs about the world using probabilities and using Bayes to update in light of new evidence.
t_mann
Careful, such an approach could easily end up giving an aura of logical objectivity to arbitrarily radical and nonsensical ideas. The political views of one of the fathers of modern logic may serve as a cautionary tale [0].
[0] https://en.m.wikipedia.org/wiki/Gottlob_Frege#Political_view...
CobrastanJorji
I think it'd be more interesting to map out entire trees of arguments about a topic. Start with something big, like "is there a God," then come up with all of the arguments for or against, then come up with all of the arguments against those arguments, then the counters to those arguments. Really explore it as a sort of debate space. Then bring in citations not as backing but for historical context: "Plato made this argument in such and such." The idea wouldn't be so much to decide a winner as to prevent us from going around in circles by making a map.
Scarblac
> Start with something big, like "is there a God,"
To do that you first need a definition of the concept God.
And then you realize that all the people making arguments in the past were using their own unspoken and incompatible definitions.
vntok
Kialo does this for online debates: https://www.kialo.com/
An example tree for the statement "God exists": https://www.kialo.com/god-exists-3491?path=3491.0~3491.1
magicalhippo
> I’ve been kicking around the idea of something like Lean (maybe even just Lean?) to rewrite news and other non-fiction articles, treating statements as theorems that need to be proven.
I found I got much better at writing non-fiction after having math at uni. I would help proof-read essays and other hand-ins by my SO and sister, and apply similar rigor as you mention. Stuff like "you show C follows from B here, but you haven't actually given an argument for why B follows A, so you can't then claim C follows from A".
It's tempting to say that with LLMs this seems like a plausible task to turn it into a program, but the hallucination issue puts a damper on that scheme.
TheOtherHobbes
There are rhetorical tricks which rely on this to be persuasive. You can say "Thing X is happening, so we should do Thing Y", and people will nod.
If you're sneaky about it it will read like a logical conclusion when in fact X and Y are only loosely related contextually, and there is no logical chain at all.
A standard political trick is to blame X on something emotive and irrelevant, and offer Y as a false solution which distracts from the real causes of the problem.
This is used so often it's become a core driver of policy across multiple domains.
Although it's very effective, it's a crude way to use this. There are more subtle ways - like using X to insinuate criticism of a target when Y is already self-evident.
Point being, a lot of persuasive non-fiction, especially in politics, law, religion, and marketing, uses tricks like these. And many others.
They work because they work in the domain of narrative logic - persuading through stories and parables with embedded emotional triggers and credible-sounding but fake explanations, where the bar of "That sounds plausible" is very low.
LLMs already know some of this. You can ask ChatGPT to make any text more persuasive, and it will give you some ideas. You can also ask it to read a text, pull out the rhetorical tricks, and find the logical flaws.
It won't do as good a job as someone who uses rhetoric for a living. But it will do a far better job than the average reader, who is completely unaware of rhetoric.
refulgentis
A mild damper at best, RAG-based pipelines are mature now.
Alas, things like this aren't logic proofs.
It bothers me to my core when I see this idea. Sometimes at FAANG. Blissfully management learned to...not promote...them.
petesergeant
> A mild damper at best, RAG-based pipelines are mature now.
I work with RAG pipelines all day, and the idea that hallucination isn't an ongoing major issue doesn't match my experience _at all_. Possibly a skill issue on my part, but, also on the part of everyone I ever talk to in the same space too.
null
BlarfMcFlarf
What about proven facts that get disproven? Is there room to rethink your priors?
atomicnature
Proof != evidence. In evidence, we corroborate, collate, add more sources, weigh evidence, judge. Proof is a totally different process. Only in the mathematical do one prove something, everywhere else we build up evidence, corroborate, etc.
poulpy123
News and non-fiction articles are not math and cannot be treated as math. At best you could build a tool that check the most glaring contradictions (like a typo changing a number), and I'm not even sure it can be consistent without building a software that understand language. At worse you would build a tool that spit bullshit that millions of people would treat as gospel
derdi
Nitpick, but it's a bit strange to say that the two_eq_two theorem looks like a function. It looks more like a constant, since it has no arguments. (Yes I know that constants are nullary functions.)
I would find the following a more convincing presentation:
theorem x_eq_x (x:nat) : x = x := by
rfl
theorem 2_eq_2 : 2 = 2 := by
exact (x_eq_x 2)
Here x_eq_x looks like a function, and in 2_eq_2's proof we apply it like a function.danabramov
Fair! I decided not to do that because the way arguments work (and dependent types in general — like being able to return a proof of `x = x` given x) is unusual for people like me, and deserves an article of its own. So I’m kicking that to one of the next articles.
Waterluvian
Wait… so are we basically compiling a dictionary of proofs all stemming from a handful of self-evident truths? And every further proof is just some logical aggregation of previous proofs?
Can someone please turn this into a Zachtronics style game?! I badly, direly want this. There’s a game called Euclidea that’s kind of like this for trigonometry and the whole premise of building a tower of logic is deeply attractive to me.
Is this what pure math is about? Is this what people who become math profs are feeling in their soul? The sheer thrill of adding to that dictionary of proofs?
Aside: I recall some famous mathematician had made a list of base proofs that you just hold to be true. Can someone remind me who, and/or what that list is called? I’m guessing they’re considered axioms.
wging
There is a game already, though it may not be exactly what you want (and the goal is definitely not "generate all known math"). I've played around with it, and I do think it's pretty fun.
The article mentions it, in fact: https://adam.math.hhu.de/#/g/leanprover-community/nng4
Waterluvian
How did I miss this! Thanks for pointing it out. This is scratching the itch.
magicalhippo
Had a blast with that game, really scratched that puzzle itch, and got to relearn some long-forgotten knowledge.
SilasX
I played through a lot of it, and while it was fun, I wouldn't call it gamified in the sense that Zachtronics does it. As an FYI, here are some sticking points I ran into as well:
1) It uses the term "goal" for the the current stage of the proof you're working on, which is counterintuitive. In common speech, the goal is the theorem you're trying to prove, not the current step you're on in proving it. It would be like if I were helping you get to work in the morning, and I said, "Okay, you're current [sub]goal is that you're done brushing your teeth." No. That's not a goal. That's progress toward a goal.
2) It doesn't warn you early on about the left-associativeness of the operator binding, or what to do about it. So you can apply what seem like valid transformations but then parentheses pop up that you don't expect, and which you have no guidance on how to deal with. I had to ask ChatGPT what was going on to solve it.
carodgers
> Aside: I recall some famous mathematician had made a list of base proofs that you just hold to be true. Can someone remind me who, and/or what that list is called? I’m guessing they’re considered axioms.
Take a look at zeroth-order logic.
treyd
You might be thinking of Euclid's axioms, which defines points, lines, planes, etc, and that lines can be parallel. This is an interesting one because the system is violated if your space is not flat, like if you're on a sphere.
You could also be thinking of Zermelo-Fraenknel set theory (ZF/ZFC), which most of modern mathematics is ultimately based upon.
cnity
This highlights I think what is an ultimate pitfall of something like Lean. It is something like this: when an assumption is broken or violated in mathematics, the whole field of mathematics can _grow_. Non-euclidean geometry, imaginary numbers, etc are all examples of this. Trying to cram maths into a programming language sounds like it would constrain it, removing the creativity and reducing all maths to a search space.
RossBencina
Lean does not reduce the mathematical search space as you suggest. Yes there is a fixed, compact low-level logical core that everything above it depends on. But this is equivalent to an encoding of the logical foundations that mathematics, that formal mathematics depends on in any case. On top that you have mathematical theories built on assumptions (axioms) and you can specify whatever axioms you like and change them at will. To use your analogy: the "search space" is parameterised by user-defined sets of axioms and assumptions.
null
ants_everywhere
> Can someone please turn this into a Zachtronics style game?!
That game is called math :) Partially joking, but I do think a game version would be fun.
> Is this what pure math is about?
More or less yes for an undergrad, but when you get to research it feels different.
> I badly, direly want this
Consider checking out an abstract algebra book. Maybe Dummit and Foote or something else popular. Proofs in algebra often have a satisfying game-like quality. The more popular books will have solutions you can find online if you get stuck.
danabramov
>More or less yes for an undergrad, but when you get to research it feels different.
Would you mind telling more about how it feels different in research?
ants_everywhere
You can get into a meditative zone when you're manipulating equations using techniques you've internalized.
In research you're doing a lot more of things like reading papers, putting your thoughts into words, trying to understand something the author of the paper barely understands, feeling lost and unsure where to look next. All of that can feel good too (or it cannot depending on the person) but it's a different feeling than playing a logic game.
As one example, Euclidia is a fun meditative game. But compare the difference in feeling between doing an exercise from Euclid and trying to prove the parallel postulate. It took centuries to realize you couldn't prove it, and then there was a lot of hard work trying to figure out what geometry was like if you get rid of it.
RossBencina
> Proofs in algebra often have a satisfying game-like quality.
Interesting. I find them banal and deeply unsatisfying.
ants_everywhere
That's fine. Different people prefer different subjects. But IME the mode graduate student finds algebra to be an enjoyable class because of the proofs.
jheitmann
Check out the game Bombe [1]. It's a minesweeper variant where instead of directly flagging or uncovering cells, you define rules for when cells can be flagged. As it gets more advanced you end up building lemmas that implicitly chain off each other. Then as _you_ get more advanced (and the game removes some arbitrary restrictions around your toolset) you can generalize your rules and golf down what you've already constructed.
swagmoney1606
In my mind this is literally what math is. We start with axioms, and derive conclusions. There's probably more to it than that, but that's the understanding I'm at now.
galaxyLogic
But shouldn't it also be part of the axioms what are the rules that allow you to derive new theorems from them?
So then you could self-apply it and start ... deriving new rules of how you can derive new theorems and thus also new rules, from axioms?
I'm jusr confused a bit about "axioms" and "rules". What's the difference?
sroelants
The rules that you use to compose axioms and propositions are a different set of axioms defined by the Logic system you're using. e.g., can a proof consist of infinitely many steps? Can I use the law of excluded middle? Some logic systems won't let you re-use the same proposition more than once, etc,...
They're usually considered separate, because they're orthogonal to the foundational axioms you're using to build up your mathematical systems. With the exact same system of axioms, you might be able to prove or disprove certain things using some logic systems, but not others.
null
JonChesterfield
Choosing the axioms is difficult.
null
petesergeant
Presumably made easier by something like Lean where you can have a very minimal set of axioms, because things you might use as axioms already have proved versions, in Lean.
null
hcs
Some good mentions elsewhere in the thread, another to check out is The Incredible Proof Machine https://incredible.pm/
foooooobar
Fun fact: the author of the The Incredible Proof Machine (Joachim Breitner) also works on Lean :-)
7373737373
One problem I have with learning Lean is that tactics - like rfl in the example - are overloaded, and their full semantics not completely explained/understandable from the tutorials. Unlike, say, C programming where one may understand what happens to the program state down to the bit, it feels too fuzzy. And the rewrite (rw) tactic syntax doesn't feel natural either.
LegionMammal978
Yeah, I've similarly found the tactics in Coq (now Rocq) difficult to internalize. E.g., I might have "A = B" and "P(A,A)" available, and I want to conclude "P(A,B)", but the rewrite will fail for some arcane reason. (Issues with the definition of some of the intermediate structures, I'd imagine.)
On the other end of the spectrum, I've recently been playing with Metamath and its set.mm database, which has no programmable tactics at all, only concrete inferences that can be used in a proof. (E.g., the modus ponens inference ax-mp says that "|- ph" and "|- ( ph -> ps )" prove "|- ps", where "ph" and "ps" are variables that can be substituted.) Alas, it's not much better, since now you have to memorize all the utility lemmas you might need!
7373737373
Agreed - the Metamath base language (and its verifier) seem to be the most tractable of all I've seen, although it is probably still quite far away from the complexity of the high level language(s) that compile to it.
Derived from it, the currently best attempt to achieve an unambiguous and secure language seems to be Metamath Zero: https://github.com/digama0/mm0
gylterud
This is one of the reasons I prefer Agda. It is usually written without tactics, you just write the proof term in a functional programming language via the Curry–Howard correspondence. The trade off is that you must be more disciplined with creating useful abstractions and functions, otherwise proving even routine stuff becomes really tedious.
Paracompact
I'm not sure you prefer Agda so much as you prefer providing proof terms functionally rather than imperatively. Then again I've never used Agda; how does it differ from Coq/Rocq minus Ltac or Lean minus tactics?
logicchains
Coq has much less support for working with dependent types; you need lots of annotations compared to Agda (which has superior dependent pattern matching support). Lean is somewhere between the two.
daxfohl
The surprising thing to me was that tactics are all written in "user-level" code, outside the proof kernel. This makes sense in the sense that you want a small, thoroughly tested kernel that doesn't change. But it also implies that if you use tactics in your proof, then your proof can go from correct to failing if one of the tactics you use gets modified between releases. Is that a problem in real world use?
Tainnor
Yes. I've seen proofs fail after a mathlib update because the behaviour of simp changed. I've never seen it do less after an update (so far), but sometimes it'll simplify more and then the next steps may fail to work.
I've since adopted the suggestion (that is afaik used in mathlib) to never use bare simp unless it closes the current goal directly. Instead, if you write simp?, Lean will run the simplifier and tell you exactly which theorems it used (in the form of simp only [...]) which you can then insert into the proof instead.
kmill
At least you can 'go to definition' on the tactics and see what they're doing. It's a lot to take in at the beginning, but it can all be inspected and understood. (At least until you get to the fundamental type theory; the reduction rules are a lot harder to get into.)
> the rewrite (rw) tactic syntax doesn't feel natural either.
Do you have any thoughts on what a natural rewrite syntax would be?
7373737373
> Do you have any thoughts on what a natural rewrite syntax would be?
Not yet, but I'd probably prefer something that more explicitly indicated (in English, or some sort of more visually "pointing" indicator) which specific parts of the previous step would be replaced
It feels weird, or I'd have to get used to that both what is being replaced and what it is replaced with depends on some distant context, it's very indirect as it requires switching attention between the tactics tree and the context or previous proofs
danabramov
If you have specific ideas, I'm also curious! I wonder if Lean can be extended to support what you're thinking of — its ability to have custom syntax and tactics seems really powerful. That's part of what excites me about Lean as I'm also not always the biggest fan of existing tactics, but they seem to evolve similarly to other pieces of software.
emmelaich
It was interesting to me as it didn't fit my expectations. As a math theory ignoramus, I expected that reflection and rewrite are more fundamental than addition. But Lean seems to assume addition but require explicit rfl and rewrite.
Perhaps there's a Lean "prelude" that does it for you.
derdi
Yes, there is a prelude that defines natural numbers and an addition function on them. As the post notes, the reflexivity tactic "unfolds" the addition, meaning that it applies the addition function to the constants it is given, to arrive at a constant. This is not specific to addition, it unfolds other function definitions too. So addition doesn't have to come first, you are right that reflexivity is more fundamental.
danabramov
Author here — this is correct. I’ve added a paragraph on this in an edit btw so it’s possible the parent poster hasn’t seen it yet. Reproducing the paragraph:
(Here, rfl closes ⊢ 3 + 3 = 6, but for a different reason than one might think. It doesn’t really “know” that 3 + 3 is 6. Rather, rfl unfolds the definitions on both sides before comparing them. As 3, 6, and + get unfolded, both sides turn into something like Nat.zero.succ.succ.succ.succ.succ.succ. That’s why it actually is a something = something situation, and rfl is able to close it.)
Also, another resource on this is https://xenaproject.wordpress.com/2019/05/21/equality-part-1...
armchairhacker
It's because the addition can evaluate to only one form: `3 + 3` and `6` both evaluate to `succ (succ (succ (succ (succ (succ zero)))))`, similarly Lean can infer `4 * 3 = 1 + 3 + 8` because both evaluate to the same as `12`. But an expression can be rewritten to an infinite number of forms, e.g. `x * 2` can be rewritten to `x + x`, `(x + (x / 2)) * 4/3)`, etc. So `refl` can automatically evaluate both sides but not rewrite them.
solomonb
This is why I prefer Agda, where everything comes down to pattern matching.
agnishom
You can absolutely use pattern matching in Lean instead of tactics, if you prefer to write the proof that is closer to "what is going on under the hood"
solomonb
yeah i didn't mean to imply you cannot do that, but tactics seem to be highly encouraged.
I'm actually a big fan of Lean, I just like it more as a programming language for writing programs with dependent types then as a proof checker.
danabramov
Yeah the documentation is also quite fragmented because tactics are user-definable and some are from Lean vs from Mathlib etc. I've gotten quite decent at using the basic ones but I still sometimes ask on Zulip if something isn't working as I expected.
swagmoney1606
You may like Agda. I prefer Lean even though you are right about this.
Karliss
Is there a way to read lean proofs noninteractively.
After playing with the natural number a game a bit, proofs quickly ended up being opaque sequences of "rw [x]" commands which felt unreadable. It's nice that the editor allows interactively viewing the state at different points, but having to click on each line ruins the flow of reading. Imagine if you had to read python code which has no indentation, braces or anything similar and only way to know where if statement ends or an else block starts is by clicking on each line. My impression might be influenced by the limited vocabulary that the early levels of natural number game provides. Does the richer toolset provided by full lean make it easier to make proofs readable without requiring you to click on each line to get the necessary context?
RossBencina
> Is there a way to read lean proofs noninteractively.
I'd like to see an answer to this question. I was looking into it the other day.
I found this: https://xenaproject.wordpress.com/2019/02/11/lean-in-latex/ Which gives a way to do the clicking-through outside the editor. And maybe gives some insight into how Lean people see things.
derdi
Rocq used to have a "mathematical proof language". It's hard to find examples, but this shows the flavor (https://stackoverflow.com/a/40739190):
Lemma foo:
forall b: bool, b = true -> (if b then 0 else 1) = 0.
proof.
let b : bool.
per cases on b.
suppose it is true. thus thesis.
suppose it is false. thus thesis.
end cases.
end proof.
Qed.
So the idea was to make proofs read more like "manual" proofs as you would find them in math papers. Apparently nobody used this, so it was removed.Isabelle's Isar proof language is similar, and AFAIK the standard way of proving in Isabelle (example from https://courses.grainger.illinois.edu/cs576/sp2015/doc/isar-...):
lemma "map f xs = map f ys ==> length xs = length ys"
proof (induct ys arbitrary: xs)
case Nil thus ?case by simp
next
case (Cons y ys) note Asm = Cons
show ?case
proof (cases xs)
case Nil
hence False using Asm(2) by simp
thus ?thesis ..
next
case (Cons x xs’)
with Asm(2) have "map f xs’ = map f ys" by simp
from Asm(1)[OF this] ‘xs = x#xs’‘ show ?thesis by simp
qed
qed
You spell out the structure and intermediate results you want, and the "by ..." blocks allow you to specify concrete tactics for parts where the detains don't matter and the tactic does what you want. In this proof, "simp" (a kind of simplify/reflexivity tactic) is enough for all the intermediate steps.I don't know if there is anything like this for Lean, but maybe this provides some keywords for a web search. Or inspiration for a post in some Lean forum.
danabramov
This is a great question! I’m still not very experienced so take it with a grain of salt. But here’s my take on it.
I’ve been spending time with Lean quite a bit for the past few months. When I look at a proof, I don’t “read” it the same way as I do with programming. It feels a bit more like “scanning”. What stands out is the overall structure of the argument, what tactics are used, and what lemmas are used.
In real Lean code, the accepted style is to indent any new goals, work on one goal at a time (except a few special parallel tactics), and outdent when the goal is done. That’s what I mean by the “shape” of the argument. See some examples in this PR I’m working on: https://github.com/gaearon/analysis-solutions/pull/7/files
Once you’re familiar with what tactics do, you can infer a lot more. For example `intro` steps into quantifiers and “eats” assumptions, if I see `constructor` I know it’s breaking apart the goal into multiple, and so on.
Keep in mind that in reality all tactics do is aid you in producing a tree of terms. As in, actually the proofs are all tree-shaped. It’s even possible to write them that way directly. Tactics are more like a set of macros and DSL for writing that tree very concisely. So when I look at some tactics (constructor, use, refine, intro), I really see tree manipulation (“we’re splitting pieces, we’re filling in this part before that part, etc”).
However, it’s still different from reading code in the sense that I’d need to click into the middle to know for sure what assertion a line is dealing with. How much this is a problem I’m not sure.
In a well-written proof with a good idea, you can often retrace the shape of the argument by reading because the flow of thought is similar to a paper proof. So someone who wants to communicate what they’re doing is generally able to by choosing reasonable names, clear flow of the argument, appropriate tactics and — importantly! — extracting smaller lemmas for non-obvious results. Or even inline expressions that state hypotheses clearly before supplying a few lines of proof. On the other hand, there are cases where the proof is obvious to a human but the machine struggles and you have to write some boilerplate to grind through it. Those are sometimes solved by more powerful tactics, but sometimes you just have to write the whole thing. And in that case I don’t necessarily aim for it being “understandable” but for it being short. Lean users call that golfing. Golfed code has a specific flavor to it. Again in my experience golfing is used when a part of the proof would be obvious on paper (so a mathematician wouldn’t want to see 30 lines of code dedicated to that part), or when Lean itself is uniquely able to cut through some tedious argument.
So to summarize, I feel like a lot of it is implicit, there are techniques to make it more explicit when the author wants, it doesn’t matter as much as I expected it would, and generally as your mental model of tactics improves, you’ll be able to read Lean code more fluidly without clicking. Also, often all you need to understand the argument is to overview the names of the lemmas it depends on. Whereas the specific order doesn’t matter because there’s a dozen way to restructure it without changing the substance.
harperlee
Perhaps this thread is a good place to ask, could anyone contribute their own opinion about the relative future of lean vs. idris/coq/agda? I want to dedicate some time to this from a knowledge representation point of view, but I'm unsure about which of them will have less risk of ending up as so many esoteric languages... I sank a lot of time on clojure core.logic for a related project and got burnt with the low interest / small community issue already, so I've been hesitant to start with any of them for some time.
armchairhacker
IME Lean and Coq/Rocq are used more in practice, and have bigger libraries and communities, than Idris and Agda.
Rocq is the most common for program verification, but I suspect mainly due to it being older, and it has weird quirks due to its age, so Lean may catch up. Lean is the most common for proving mathematical theorems.
Big projects verified in Rocq include CompCert, CertiCoq, and sel4. Additionally, some large companies use Rocq to verify critical software like in airplanes (there's a list at https://github.com/ligurio/practical-fm although it may not be accurate). Big projects in Lean include mathlib (collection of various mathematical proofs), and the ongoing work to prove Fermat's Last Theorem (https://imperialcollegelondon.github.io/FLT/) and PFR (https://teorth.github.io/pfr/). I'm not aware of "real-world" projects in Idris and Agda but may be wrong.
That said, they're all small communities compared to something like C++ or JavaScript. Moreover, verifying programs is very slow and tedious (relative to writing them), so I wouldn't be surprised if we see a big breakthrough (perhaps with AI) that fundamentally changes the landscape. But remember that even with a breakthrough your skills may be transferable.
daxfohl
I wouldn't put much money on any of them. IME most mathematicians aren't that interested in formalization, and the gulf between a hand written proof and and a computer verified syntax is pretty huge.
Theyre interesting to learn and play with for their own sake, but I'd be reluctant to make any bets on the future of any of them.
If I had to choose, Lean seems to have the most momentum, though the others have been around longer and each has some loyal users.
bravesoul2
I love the article. Few can cross the bridge and describe these sorts of things in an easy to digest way. The secret is showing all the tiny steps experts might not see as it is too obvious. Thank you!
danabramov
Thanks!
nowittyusername
I wonder suppose you are not relying on any tricks or funky shenanigans. is it possible to just throw random stuff at Lean and find interesting observations based if it approves? like use an automated system or an llm that tries all types of wild proofs/theories? and sees if it works? maybe im asking wrong questions though or not stating it well.. this is above my understanding, i could barely get my head around prolog.
Paracompact
As someone who does certified programming professionally, I believe generative AI and formal methods are a match made in heaven. I might even go so far as to wager that the notion that human programmers will or will not be replaced by LLM-based AI is entirely dependent on whether these AI can get good at certified programming + compositional reasoning.
> is it possible to just throw random stuff at Lean and find interesting observations based if it approves?
Traditional AI has an easy time with checkers, because the search space is small. Chess is a little harder. Go still cannot be reasonably tackled by non-machine learning AI. Meanwhile, the search space (number of available moves together with the diversity of explorable states) for a sophisticated formal language becomes unfathomably large.
When the problem is known to be of a certain nature, often you can time-efficiently brute force it via SMT solvers. Traditionally SMT solvers and proof assistants have been separate branches of formal methods, but they're finally learning to play off each other's strengths (cf. Sledgehammer, Lean-SMT).
> like use an automated system or an llm that tries all types of wild proofs/theories? and sees if it works?
Research in this vein needs to be made more mainstream. I'm excited, though, that there have been big funders behind these ideas for years now, even before LLMs became big. Cf. "Learning to Find Proofs and Theorems by Learning to Refine Search Strategies" for earlier work, or DeepSeek-Prover for more recent attempts. I'm no machine learning expert, but it seems it's still a very open question how to best train these things and what their future potential is.
All in all mainstream LLMs are still rather mediocre at languages like Rocq and Lean. And when they're wrong, their proof scripts are extremely tedious to try to troubleshoot and correct. But I have hope AI tooling in formal methods will mature greatly over time.
benreesman
I'm already doing almost all my LLM-assisted programming in Haskell, all the devops in check-heavy Nix, and starting to move everything into Dhall.
There's no way that Python written by LLMs will survive contact with Haskell written by LLMs in a standup competition.
One imagines the scope for this sort of thing goes very high in formality.
danabramov
This is an active area of research and experimentation!
Much of Lean community are on Zulip (which is kind of like a forum?) and you can see many relevant threads here: https://leanprover.zulipchat.com/#narrow/channel/219941-Mach...
kevinqi
as someone who hasn't seen Lean before but was curious from alphaproof, love the intro! curious if you can mention what you're working on in Lean?
danabramov
For now I'm just learning math with it!
Currently I'm going through https://github.com/teorth/analysis (Tao's Lean companion to his textbook) and filling in the `sorry`s in the exercises (my solutions are in https://github.com/gaearon/analysis-solutions).
kevinqi
very cool. btw, I also love that "sorry" is the "any" equivalent in Lean
barrenko
As an abstract rule of thumb, how much would one have to beef up on logic before attempting to screw around analysis and something like Lean?
ljlolel
Maybe before we have AGI we should get an AI that can translate Andrew’s proof into Lean for us. Easily tractable, checkable, useful, and build-upon-able
grumbelbart2
Work is already underway in that direction:
https://github.com/deepseek-ai/DeepSeek-Prover-V2
but also
https://deepmind.google/discover/blog/alphaevolve-a-gemini-p...
Terence Tao is doing a lot of work in this direction.
zozbot234
Kevin Buzzard is reportedly working on formalizing a modern proof of FLT. This effort has already managed to surface some unsound arguments in one of the prereqs for the proof (namely crystalline cohomology) https://xenaproject.wordpress.com/2024/12/11/fermats-last-th... though the issue has since been fixed.
7373737373
Does Lean have some sort of verification mode for untrusted proofs that guarantees that a given proof certainly does not use any "sorry" (however indirectly), and does not add to the "proving power" of some separately given fixed set of axioms with further axioms or definitions?
danabramov
Does `#print axioms some_theorem` mentioned at the end of the article qualify? This would show if it depends on `sorry`, even transitively, or on some axioms you haven't vetted.
7373737373
Oh, I missed that, thanks! It would be cool to use this to visualize the current state and progress on, and depth of the "proof dependency graph"!
masterjack
Yes, you can `print axioms` to make sure no axioms were added, make sure it compiles with no warnings or errors. There’s also a SafeVerify utility that checks more thoroughly and catches some tricks that RL systems have found
jonny_eh
Apparently this is possible with macros? I dunno: https://github.com/leanprover/lean3/issues/1355
kmill
That's Lean 3, from eight years ago, and it's from before 'sorry' really existed in the way we know it now.
---
To answer the GP's question: Not only is there a verification mode, but Lean generates object files with the fully elaborated definitions and theorems. These can be rechecked by the kernel, or by external verifiers. There's no need to trust the Lean system itself, except to make sure that the theorem statements actually correspond to what we think they're supposed to be.
treyd
What exactly do these object files look like?
I’ve been kicking around the idea of something like Lean (maybe even just Lean?) to rewrite news and other non-fiction articles, treating statements as theorems that need to be proven. Proofs could include citations, and could be compound things like “this is a fact if three of my approved sources asserted it as a fact”
It should then be possible to get a marked-up version of any document with highlighting for “proven” claims.
Sure, it’s not perfect, but I think it would be an interesting way to apply the rigor that used to be the job of publications.