OCaml as my primary language
289 comments
August 13, 2025 · noelwelsh
nine_k
I'd say that Google strives to have a reasonably short list of languages approved for production-touching code. Rust can replace / complement C++, while OCaml cannot (it could replace Go instead... fat chance!). So I suspect that the team picked Rust because it was the only blessed language with ADTs, not because they wouldn't like something with faster compile times.
No way OCaml could have stolen Rust's thunder: we have a number of very decent and performant GC-based languages, from Go to Haskell; we only had one bare-metal-worthy expressive language in 2010, C++, and it was pretty terrible (still is, but before C++11 and C++17 it was even more terrible).
GhosT078
In 2010, Ada 2005 was the most bare-metal-worthy expressive language. Now that would be Ada 2022.
nine_k
While we're at it: what was / is holding Ada back? I haven't seen a single open-source project built in Ada, nor have I heard about any closed-source corporate project that uses Ada's superpowers. (Most likely these exist! But I have not seen any available, or at least well-publicized.)
People will go to great lengths to use a tool that has some kind of superpower, despite syntactic weirdness or tooling deficiencies. People study and use LISP descendants like Clojure, APL descendants like K, "academic" languages like Haskell and OCaml, they write hobby projects in niche languages like Nim or Odin, they even use C++ templates.
Why is Ada so under-represented? It must have a very mature ecosystem. I suspect that it's just closed-source mostly, and the parties involved don't see much value in opening up. If so, Ada is never going to make it big, and will slowly retreat under the pressure of better-known open alternatives, even in entrenched areas like aerospace.
hardwaregeek
Wouldn’t Kotlin be a more reasonable choice in that case? It has ADTs and a lot of the same niceties of Rust.
Artamus
I'm inclined to think that the Python -> Rust migration was only for some odds and ends. I know the biggest recipient of Rust training was the Android platform team at first, which I think also used a lot of C++.
Kotlin is definitely available at Google, but when talking about sum types et al it's not nearly as nice to use as Rust / OCaml.
sureglymop
Yes. It can also be compiled to native. I just think it was held back too much by Java/JVM backwards compatibility, but then again that's probably also the justification for its existence.
I definitely find it (and jetpack compose) make developing android apps a much better experience than it used to be.
What I like a lot about Kotlin are its well-written documentation and the trailing lambdas feature. That is definitely directly OCaml-inspired (though I also recently saw it in a newer language, the "use" feature in Gleam). But in Kotlin it looks nicer imo. It allows declarative code to look pretty much like JSON, which makes it more beginner-friendly than the use syntax.
But Kotlin doesn't really significantly stand out among Java, C#, Swift, Go, etc. And so it is kind of doomed to be a somewhat domain specific language imo.
michaelcampbell
Kotlin never had the "shiny new" aspect to it that Rust did; everyone gave it a bit of side-eye for coming from a company that wrote the IDE to support it well.
rendaw
I think GP means "algebraic data types", not "abstract data types", probably specifically tagged unions. Both Kotlin and Java can (now) do something similar with sealed classes, but it's quite a bit less ergonomic.
actionfromafar
The garbage collector in Kotlin makes it a no-go for C or C++ displacement.
swiftcoder
Potentially, but Kotlin is even more recent than Rust, and didn't get blessed internally at Google till somewhat later.
StopDisinfo910
> I feel if OCaml had got its act together around about 2010 with multicore and a few other annoyances[1]
OCaml had its act together. It was significantly nicer than Python when I used it professionally in 2010. Just look at what JaneStreet achieved with it.
The main impediment to OCaml was always that it was not American nor mainly developed from the US.
People like to believe there is some technical merit to language popularity, but the reality is it's all fashion-based. Rust is popular because they did a ton of outreach. They used to pay someone full time mostly to toot their horn.
jvican
Hear, hear. This theory also explains why other languages such as Scala were never really mainstream despite allowing Java- and Kotlin-style programming and having a much broader follower base in Europe. Lack of outreach, concerted marketing, and advocacy from American companies that have always dominated the narrative.
nothrabannosir
Python, PHP, Ruby: all not American though, right? Do you mean those only got hockey-stick growth when they happened to get picked up by the USA? But then couldn't the same have happened for OCaml? And if so: why didn't it, as it supposedly did for the others?
Not to mention Linux I guess
StopDisinfo910
Van Rossum moved to the US and Python actually got US government funding pretty early. Lerdorf did PHP in Canada, so America. Ruby is kind of the exception, being Japanese, but it's pretty niche - nearly as much as OCaml - and is mostly popular for Rails, which was started from Denmark but by someone working for, you guessed it, a US company. Torvalds announced Linux on a US Usenet list and moved to California soon after.
OCaml's difference is that the core development is done by INRIA, a French public institution, and the people in charge had zero interest in moving to the US or chasing the US market.
It's funny because OCaml became a bit more popular when people from the UK (Cambridge, JaneStreet) started doing more outreach.
IshKebab
I agree. If OCaml had solved some of its bigger paper cuts it could have been a real player, and compilation time is much better than Rust's too. The paper cuts:
* OPAM is quite buggy and extremely confusing.
* Windows support is very bad. If you ever tried to use Perl on Windows back in the day... it's worse than that.
* Documentation is terse to the point of uselessness.
* The syntax style is quite hard to mentally parse and also not very recoverable. If you miss some word or character the error can be "the second half of the file has a syntax error". Not very fun. Rust's more traditional syntax is much easier to deal with.
Rust basically has none of those issues. Really the only advantage I can see with OCaml today is compile time, which is important, but it's definitely not important enough to make me want to use OCaml.
jll29
I'd say the Modula-2 inspired module system is a very valuable asset compared to today's Rust.
The only contact I had with OCaml was when I wrote a bug report to a university professor because I wanted his tool to process one of my files, but the file was larger than OCaml's int type could handle. That itself wasn't the problem - he wrote that it wasn't straightforward to fix. (This is a bug of the type "couldn't have happened in Common LISP". But I guess even in C one could replace int by FILE_SIZE_TYPE and #define it as unsigned size_t, for instance.)
pjmlp
It is more the other way around: ML predates Modula-2, and module-system-like ideas were already present in Mesa and UCSD Pascal. :)
pjmlp
That is why, if I feel like doing ML-style programming, I'd rather reach for Kotlin, Scala or F# than Rust, and even then Java and C# have gotten enough inspiration that I can also feel at home while using them.
I am no stranger to ML type systems; my first one was Caml Light, when OCaml was still known as Objective Caml, and Miranda was still something being discussed in programming language lectures at my university.
From what I see, I also kind of find the same: too many people rush out for Rust thinking that ML type systems are something new introduced by Rust, without having the background of where it all comes from.
michaelcampbell
> too many people rush out for Rust
Yes
> thinking that ML type systems is something new introduced by Rust
This feels off to me. Of the hype-train that Rust has been for a while now, the _type_ of type system it has hasn't been one of the cars, as it were.
At least in my readings; I'm sure we travel in different circles, but even the few academic oriented things have been other language features and memory safety through and through.
gerdesj
"I feel if OCaml had got its act together ..."
The great thing is we have choice. We have a huge number of ways to express ideas and ... do them!
I might draw a parallel with the number of spoken languages extant in the UK (only ~65M people). You are probably familiar with English. There are rather a lot more languages here. Irish, Scottish Gaelic, Welsh - these are the thriving Celtic languages (and they probably have some sub-types). Cornish formally died out in the sixties (the last two sisters who spoke it natively passed away) but it has been revived by some locals, and given that the revivalists could still communicate with living people who had first-hand experience, I think we can count it as a language that was largely saved. Cumbric ... counting still used by shepherds - something like: yan, tan, tithera toe.
I am looking at OCaml because I'm the next generation to worry about genealogy in my family and my uncle has picked Geneweb to store the data, taking over from TMG - a Windows app. His database contains roughly 140,000 individuals. Geneweb is programmed in OCaml.
If you think that programming languages are complicated ... have a go at genealogy. You will soon discover something called GEDCOM and then you will weep!
DrewADesign
For personal projects? Sure. In nearly any development organization larger than one person, unilaterally deciding to use OCaml instead of what everybody else uses would go over about as well as unilaterally deciding to use Aramaic at meetings.
Akronymus
From the GEDCOM wiki page it doesn't seem that bad. At least the data format itself. But it seems like it's very easy to screw up working with it.
gerdesj
GEDCOM was developed by the Mormons and was not really designed to encompass the rich diversity of human relationships. You can seriously overload the notes fields!
Genealogy is really hard but it is important to not get too bogged down with formality and get the data stashed in one form or another. Genealogy "facts" decay with a horribly short half life. It is also important to accept all "facts" as is and not try to interpret them too much - that's something else.
I'm 55 years old and have memories of dealing with my grandad on my mother's side, who was born in 1901. So, within reason, I can record some second-hand facts going back to very early C20. My parents were born 1942/3. etc etc. However, the gold standard is written evidence of a fact.
I think that genealogy really needs to discover systems like Elastic/Open Search and not try to coerce the data to a fixed schema.
Everyone thinks their family tree is a tree. No it really isn't - it's a thicket!
I have a relative whom my uncle found, from Padstow in Cornwall in C16 - her first born was probably a bastard fathered by the local squire's son. There's a soldier from WWI whom the family "knowledge" from around two generations ago was convinced was a deserter. It turns out he was honorably discharged and emigrated to Canada. One of my G^5 grandads died from septicemia after punching a window in a drunken pub brawl.
All of the above has documentary evidence, except for the desertion thing, which turned out to be bollocks. Oh, there is a good chance of a mass-murderer back in C18 near Devizes, Wiltshire!
This is where IT gets really interesting. How on earth do you go about genealogy data? You are sure to piss off someone(s) who want to wear the rose tinted specs and if you think modern politics are a bit challenging, why not try to deal with politics across centuries and the random weirdness that is your own family 8)
beezlewax
It's weird to see someone from the UK champion the Irish language as a choice as if they didn't try to systematically wipe it from the face of the earth for quite a long period of time.
Choice is good of course so do keep up the good work.
gerdesj
My grandad's family on my dad's side was driven out of Dublin by some folks who went on to set their house on fire, back in the day. They were warned off by a maid and legged it to England. A few generations earlier, on the same line, my ancestors were German immigrants to London.
I will upvote your comment nonetheless. I see it has been DVd which I don't like to see - we all have our views.
unstruktured
There is absolutely no reason to use double semicolons in practice. The only place you really should see them is when using the REPL.
sigzero
Yeah, it makes me think he doesn't understand them in OCaml.
acjohnson55
I worked in OCaml for a year and I couldn't tell you by memory what the difference was. I remember being very annoyed by OCaml's many language quirks.
yodsanklai
> aesthetically the double semicolons are an abomination and irk me far more.
I think they have been optional for like 20 years, except in the top-level interactive environment to force execution.
That being said, I still don't get why people are so upset about the syntax. You'll internalize it after a week of writing OCaml code.
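A minimal sketch of what the optional double semicolons mean in practice, assuming a plain compiled .ml file:

```ocaml
(* square.ml - no ;; is needed anywhere in a source file;
   top-level bindings delimit themselves *)
let square x = x * x

let () =
  print_int (square 5);
  print_newline ()
```

In the interactive toplevel, by contrast, `;;` is what tells the REPL you're done typing and it should evaluate.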
swiftcoder
Erlang faces a similar uphill battle when it comes to syntax - there are three different punctuation marks used as terminators depending on context, and you have to keep in your head the rules for all 3. As someone who has written quite a bit of Erlang, but infrequently, it's always a battle.
And I think that's a big part of the reason Elixir has done so well (Elixir pretty much started out as Erlang-but-with-Ruby-syntax).
whimsicalism
I spent more than a week writing OCaml and still found the syntax pretty annoying. ReasonML would have been nice if the OCaml community actually cared, but they are a bit insular.
yawaramin
Reason syntax is fully supported by the OCaml ecosystem and has been for many years.
munificent
> I feel if OCaml had got its act together around about 2010 with multicore and a few other annoyances[1] it could have been Rust.
Arguably, that could have been Scala and for a while it seemed like it would be Scala but then it kind of just... didn't.
I suspect some of that was that the programming style of some high profile Scala packages really alienated people by pushing the type system and operator overloading much farther than necessary.
whimsicalism
Scala was always going to be hamstrung by the fact that it's a JVM language and yes, the crazy stuff people did with the language didn't help.
owlstuffing
I agree with that, but I think Scala has deeper problems.
It tries to be a better Java and a better OCaml at the same time. This split personality led to Scala’s many dialects, which made it notorious for being difficult to read and reason about, particularly as a mainstream language contender.
Above all, Scala is considered a functional language with imperative OOP qualities. And it more or less fits that description. But like it or not primarily functional languages don’t have a strong reputation for building large maintainable enterprise software.
That’s the quiet part no one says out loud.
It’s like how in academic circles Lisp is considered the most pure and most powerful of programming languages, which may be true. At the same time most real-world decision makers see it as unsuitable as a mainstream language. If it were otherwise, we’d have seen a Lisp contend with imperative langs such as Java, C#, TypeScript, etc.
I’ve always attributed this disconnect to the fact that people naturally model the world around them as objects with state — people don’t think functionally.
birdfood
OCaml is probably my favourite language.
The most involved project I did with it was a CRUD app for organising Writer's Festivals.
The app was 100% OCaml (ReasonML so I could get JSX) + Dream + HTMX + DataTables. I used modules to get reusable front end templates. I loved being able to make a change to one of my data models and have the compiler tell me almost instantly where the change broke the front end. The main value of the app was getting data out of excel into a structured database, but I was also able to provide templated and branded itineraries in .odt format, create in memory zipped downloads so that I didn't need to touch the server disk. I was really impressed by how much I could achieve with the ecosystem.
But having to write all my database queries in strings and then marshal the data through types was tiring (and effectively not compile time type checked) and I had to roll my own auth. I often felt like I was having to work on things that were not core to the product I was trying to build.
I've spent a few years bouncing around different languages and I think my take away is that there is no perfect language. They all suck in their own special way.
Now I'm building an app just for me and I'm using Rails. Pretty much everything I've wanted to reach for has a good default answer. I really feel like I'm focused on what is relevant to the product I'm building and I'm thinking about things unrelated to language like design layout and actually shipping the thing.
BenGosub
What is the idiomatic way to handle the results from the database in a strongly typed functional language?
JaggerJo
DarkLang, which was initially written in OCaml, eventually switched to F#. From what I remember, the main reasons were the library ecosystem and concurrency.
I know .NET in and out, so I might be biased. Most of the boring parts have multiple good solutions that I can pick from. I don't have to spend time on things that are not essential to the problem I actually want to solve.
I've used F# professionally for multiple years and maintain a quite popular UI library written in it. But even with .NET there still are gaps because of the smaller F# language ecosystem. Not everything "just works" between CLR languages - sometimes it's a bit more complicated.
The main point I'm trying to make is that going off the beaten path (C#) for example also comes with a cost. That cost might or might not be offset by the more expressive language. It's important to know this so you are not surprised by it.
With OCaml it's similar, I'd say. You get a really powerful language, but you're off the beaten path. Sure, there are a few companies using it in production - but their use case might be different than yours. On Jane Street's Signals and Threads podcast they often talk about their really specific use cases.
_mu
I haven't worked in OCaml but I have worked a bit in F# and found it to be a pleasant experience.
One thing I am wondering about in the age of LLMs is if we should all take a harder look at functional languages again. My thought is that if FP languages like OCaml / Haskell / etc. let us compress a lot of information into a small amount of text, then that's better for the context window.
Possibly we might be able to put much denser programs into the model and one-shot larger changes than are achievable in languages like Java / C# / Ruby / etc.?
jappgar
That was my optimistic take before I started working on a large Haskell code base.
Aside from the obvious problem that there's not enough FP in the training corpus, it seems like terser languages don't work all that well with LLMs.
My guess is that verbosity actually helps the generation self-correct... if it predicts some "bad" tokens it can pivot more easily and still produce working code.
sshine
> terser languages don't work all that well with LLMs
I'd believe that, but I haven't tried enough yet. It seems to do quite well with jq. I wonder how it fares with APL.
When Claude generates Haskell code, I constantly want to reduce it. Doing that is a very mechanical process; I wonder if giving an agent a linter would give better results than offloading it all to the LLM.
gylterud
I usually treat the LLM generated Haskell code as a first draft.
The power of Haskell in this case is the fearless refactoring the strong type system enables. So even if the code generated is not beautiful, it can sit there and do a job until the surrounding parts have taken shape, and then be refactored into something nice when I have a moment to spare.
willhslade
APL is executed right to left and LLMs... aren't.
Vosporos
Can't you just run HLint on it?
yawaramin
There's actually a significant difference between Haskell and OCaml here so we can't lump them together. OCaml is a significantly simpler, and moderately more verbose, language than Haskell. That helps LLMs when they do codegen.
b_e_n_t_o_n
This has been my experience as well. AI writes Go better than any language besides maybe HTML and JavaScript/Python.
byw
I wonder if it has more to do with larger training data than the languages themselves.
gf000
My completely non-objective experiment of writing a simple CLI game in C++ and Haskell shows that the lines of code were indeed fewer in the case of Haskell... but the number of words was roughly the same, meaning the Haskell code was just "wider" instead of "higher".
And then I didn't even make this "experiment" with Java or another managed, more imperative language which could have shed some weight due to not caring about manual memory management.
So I'm not sure how much truth is in there - I think it differs based on the given program: some programs lend themselves better to an imperative style, others prefer a more functional one.
QuadmasterXLII
My experience is that width is faster than height to type - mostly from lack of time spent indenting. This is _completely_ fixed by using a decent auto-formatter, but at least for me the bias towards width lingers on, because it took me years to notice that I needed an auto-formatter.
gf000
It may be faster to type - but does it matter? I have never been even close to being bottlenecked by typing speed. The only difference is that I "buffer" between lines or between different segments within a single line (but possibly both).
Buttons840
If LLMs get a little better at writing code, we might want to use really powerful type systems and effect systems to limit what they can do and ensure it is correct.
For instance, dependent types allow us to say something like "this function will return a sorted list", or even "this function will return a valid Sudoku solution", and these things will be checked at compile time--again, at compile time.
Combine this with an effect system and we can suddenly say things like "this function will return a valid Sudoku solution, and it will not access the network or filesystem", and then you let the LLM run wild. You don't even have to review the LLM output, if it produces code that compiles, you know it works, and you know it doesn't access the network or filesystem.
Of course, if LLMs get a lot better, they can probably just do all this in Python just as well, but if they only get a little better, then we might want to build better deterministic systems around the unreliable LLMs to make them reliable.
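OCaml itself doesn't have dependent types, but its GADTs can encode a weak form of this kind of compile-time guarantee. A minimal sketch under that caveat (illustrative names, length-preservation rather than sortedness; full dependent types would need something like Idris or Agda):

```ocaml
(* A length-indexed list: the type tracks the list's size, so the
   compiler checks that map preserves length. *)
type zero
type 'n succ

type ('a, 'n) vec =
  | Nil : ('a, zero) vec
  | Cons : 'a * ('a, 'n) vec -> ('a, 'n succ) vec

(* map provably preserves the length index 'n at compile time *)
let rec map : type n. ('a -> 'b) -> ('a, n) vec -> ('b, n) vec =
  fun f -> function
    | Nil -> Nil
    | Cons (x, xs) -> Cons (f x, map f xs)
```

A version of map that dropped or duplicated elements simply wouldn't typecheck, which is the flavour of "the compiler checks the property" being described above.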
gylterud
The day when LLMs generate useful code with dependent types! That would be awesome!
gylterud
I have found that Haskell has two good things going for it when it comes to LLM code generation. Both have to do with correctness.
The expressive type system catches a lot of mistakes, and the fact that they are compile errors which can be fed right into the LLM again means that incorrect code is caught early.
The second is property-based testing. With it I have had the LLM generate amazingly efficient, correct code, by iteratively making it more and more efficient - running QuickCheck on each pass. The LLM is not super good at writing the tests, but if you add some yourself, you quickly root out any mistakes in the generated code.
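As a concrete sketch of that workflow in OCaml terms (assuming the QCheck library, OCaml's rough analogue of QuickCheck; the function under test and property names are illustrative):

```ocaml
(* Property-based tests: random inputs probe invariants that
   hand-picked examples would miss. *)
let sort = List.sort compare

let prop_idempotent =
  QCheck.Test.make ~count:1000 ~name:"sort is idempotent"
    QCheck.(list int)
    (fun l -> sort (sort l) = sort l)

let prop_length =
  QCheck.Test.make ~count:1000 ~name:"sort preserves length"
    QCheck.(list int)
    (fun l -> List.length (sort l) = List.length l)

let () = QCheck_runner.run_tests_main [ prop_idempotent; prop_length ]
```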
akoboldfrying
Property-based testing is available in other languages. E.g., JS has fast-check, inspired by quickcheck.
gylterud
The way code is written in Haskell - small, laser-focused functions and clearly defined, mockable side effects - lends itself very well to property-based testing.
This is not impossible to achieve in other languages, but I haven't seen it used as prevalently elsewhere.
dkarl
In Scala, I've had excellent luck using LLMs to speed up development when I'm using cats-effect, an effects library.
My experience in the past with something like cats-effect has been that there are straightforward things that aren't obvious, and if you haven't been using it recently, and maybe even if you've been using it but haven't solved a similar problem recently, you can get stuck trawling through the docs squinting at type signatures looking for what turns out to be, in hindsight, an elegant and simple solution. LLMs have vastly reduced this kind of friction. I just ask, "In cats-effect, how do I...?" and 80% of the time the answer gets me immediately unstuck. The other 20% of the time I provide clarifying context or ask a different LLM.
I haven't done enough maintenance coding yet to know if this will radically shift my view of the cost/benefit of functional programming with effects, but I'm very excited. Writing cats-effect code has always been satisfying and frustrating in equal measure, and so far, I'm getting the confidence and correctness with a fraction of the frustration.
I haven't unleashed Claude Code on any cats-effect code yet. I'm curious to see how well it will do.
omcnoe
I think that functional languages do actually have some advantages when it comes to LLMs, but not due to terseness.
Rather, immutability/purity is a huge advantage because it plays better with the small context window of LLMs. An LLM then doesn't have to worry about side effects or mutable references to data outside the scope currently being considered.
sshine
> My thought is that if FP languages like OCaml / Haskell / etc. let us compress a lot of information into a small amount of text, then that's better for the context window.
Claude Code's Haskell style is very verbose; if-then-elsey, lots of nested case-ofs, do-blocks at multiple levels of indentation, very little naming things at top level.
Given a sample of a simple API client, and a request to do the same but for another API, it did very well.
I concluded that I just have more opinions about Haskell than Java or Rust. If it doesn’t look nice, why even bother with Haskell.
I reckon that you could seed it with style examples that take up very little context space. Also, remind it not to enable language pragmas per file when they're already in .cabal, and similar.
esafak
I think LLMs benefit from training examples, static typing, and an LSP implementation more than terseness.
nextos
Exactly. My experience building a system that generates Dafny and Liquid Haskell is that you can get much further than with a language that is limited to dynamic or simple static types.
pmahoney
I tried to like OCaml for a few years. The things that hold me back the most are niggling things that are largely solved in more "modern" langs, the biggest being the inability to "print" arbitrary objects.
There are ppx things that can automatically derive "to string" functions, but it's a bit of effort to set up, it's not as nice to use as what's available in Rust, and it can't handle things like Set and Map types without extra work, e.g. [1] (from 2021, so the situation may have changed).
Compare to golang, where you can just use "%v" and related format strings to print nearly anything with zero effort.
[1] https://discuss.ocaml.org/t/ppx-deriving-implementation-for-...
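For reference, a minimal sketch of the ppx route mentioned above (assuming the ppx_deriving show plugin is enabled in the build):

```ocaml
(* With ppx_deriving.show, [@@deriving show] generates a printer
   per annotated type, here show_point : point -> string *)
type point = { x : int; y : int } [@@deriving show]

let () = print_endline (show_point { x = 1; y = 2 })
(* prints something like: { M.x = 1; y = 2 } *)
```

The attribute has to be attached type by type, which is the setup cost being contrasted with Go's universal %v.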
throwaway127482
Go's %v leaves a lot to be desired, even when using %+#v to print even more info. I wish there was a format string to deeply traverse into pointers. Currently I have to import go-spew for that, which is a huge annoyance.
Python does it best from what I've seen so far, with its __repr__ method.
jerf
The default %v does leave some to be desired, but don't underestimate the utility of being able to shove anything at it and get something back. This is especially important because this applies recursively; you can have a structure that may have something "unprintable" buried deeply in it, but at least it won't prevent you from printing everything else.
Strongly-typed languages that do not force any sort of stringification on values, and thus refuse to compile if you try to dump a simple log message of one of these values out, are really annoying to work with. I understand the conceptual purity of saying "Hey, maybe not everything even has a string representation" but it makes debugging a real pain. If I were writing a new language today I think I'd mandate that everything gets a default debugging string output by default because the alternative is just so rough.
Even a not-great printer that may have a sort of "*unprintable*" bailout, or print something not terribly useful, but doesn't actually stop you from printing anything, is better than a language that completely rejects it at compile time.
tucnak
Go has both Stringer and GoStringer interfaces, which is basically the same thing as __repr__.
garbthetill
What a brilliant article; it really puts to rest, for me, the whole "why not use F#?" argument. In almost every OCaml thread, someone suggests F# as a way to sidestep OCaml's tooling.
I've always been curious about OCaml, especially since some people call it "Go with types" and I'm not a fan of writing Rust. But I'm still not sold on OCaml as a whole; its evangelists just don't win me over the way the Erlang, Ruby, Rust, or Zig folks do. I just can't see the vision.
debugnik
Funny, I moved to OCaml to sidestep F# tooling. At least last time I used F#: Slow compiler, increasingly C#-only ecosystem, weak and undocumented MSBuild (writing custom tasks would otherwise be nice!), Ionide crashes, Fantomas is unsound...
But OCaml sadly can't replace F# for all my use cases. F# does get access to many performance-oriented features that the CLR supports and OCaml simply can't, such as value-types. Maybe OxCaml can fix that long term, but I'm currently missing a performant ML-like with a simple toolchain.
joshmarlow
It's been a few years since I've touched OCaml - the ecosystem just wasn't what I wanted - but the core language is still my favorite.
And the best way I can describe why is that my code generally ends up with a few heavy functions that do too much; I can fix it once I notice it, but that's the direction my code tends to go in.
In my OCaml code, I would look for the big function and... just not find it. No single workhorse that does a lot - for some reason it was just easier for me to write good code.
Now I do Rust for side projects because I like the type system - but I would prefer OCaml.
I keep meaning to check out F# though, for all of these reasons.
loxs
I migrated from OCaml to Rust around 2020, and haven't looked back. Although Rust is quite a lot less elegant and has some unpleasant deficiencies (lambdas, closures, currying)... and I end up having to close one eye sometimes and clone some large data structure to make my life easier... But regardless, its huge ecosystem and great tooling allow me to build things comparatively so easily that OCaml has no chance. As a bonus, the end result is seriously faster - I know because I rewrote one of my projects, and for some time I had feature parity between the OCaml and Rust versions.
Nevertheless, I have fond memories of OCaml and a great amount of respect for the language design. Haven't checked on it since, probably should. I hope part of the problems have been solved.
jasperry
Your comment makes me think the kind of people who favor OCaml over Rust wouldn't necessarily value a huge ecosystem or the most advanced tooling. They're the kind who value the elegance aspect above almost all else, and prefer to build things from the ground up, using no more than a handful of libraries and a very basic build procedure.
loxs
Yeah, I was that kind of person, then I wrote a real tool that does real work in OCaml... and then I discovered that I am no longer such a person and went to Rust.
ackfoobar
> the end result is seriously faster
Do you have a ballpark value of how much faster Rust is? Also I wonder if OxCaml will be roughly as fast with less effort.
loxs
Just the straight/naive rewrite was ~3 times faster on my benchmark (which was running the program on the real dataset), and then I went down the rabbit hole, optimized it further, and ended up ~5 times faster. Then I slapped Rayon on top and got another ~2-3x depending on the number of cores and disk speed (the problem wasn't embarrassingly parallel, but still got a nice speedup).
Of course, all of this was mostly unneeded, but I just wanted to find out what I was getting myself into, and I was very happy with the result. My move to Rust was mostly not because of speed, but I still needed a fast language (and OCaml qualifies). This was also before the days of multicore OCaml, so nowadays it would matter even less.
javcasas
Were you using the ocamlopt compiler? By default, ocaml runs in a VM, but few people figure that out because it is not screaming its name all the time like a pokemon (looking at you JVM/CLR). But ocaml can be compiled to machine code with significant performance improvements.
debugnik
> By default, ocaml runs in a VM,
The Dune build system does default to ocamlopt nowadays, although maybe not back around 2020.
loxs
Yeah, I just checked in my repository, it was ocamlopt
shortrounddev2
OCaml is a great language without great tooling. Desperately needs a good LSP implementation to run breakpoints and other debugging tools on VSCode or other LSP-aware IDEs. I know there ARE tools available but there isn't great support for them and they don't work well
debugnik
LSP isn't the protocol that interfaces with debuggers, that'd be DAP. You're right that OCaml debugging is kinda clunky at the moment.
OCaml does have an okay LSP implementation though, and it's getting better; certainly more stable than F#'s in my experience, since that comparison is coming up a lot in this comment section.
StopDisinfo910
What's clunky about the OCaml debugger?
OCaml has been shipping with an actual, fully functional reverse debugger for ages.
Is the issue mostly integration with the debugging UI of VS Code?
anentropic
it was clunky AF last time I tried to use it https://discuss.ocaml.org/t/debug-ocaml-code/10867/18
and yeah integrating to VS Code debugging UI would be ideal
I really like OCaml, so I hope the community can continue to improve the UX of these features
nukifw
Indeed, efforts should be made in terms of DAP (https://microsoft.github.io/debug-adapter-protocol//), extending the following experimentation: https://lambdafoo.com/posts/2024-03-25-ocaml-debugging-with-.... However, I find the assertion about tooling a bit exaggerated, don't you?
lambda_foo
Using DAP with VSCode and OCaml's bytecode debugger (https://github.com/hackwaly/ocamlearlybird) works reasonably well already. My blog post above can be modified to work with VSCode, but I'm primarily an Emacs user. The main issue with Emacs and DAP is the DAP modes are not very polished.
The missing piece for OCaml is debugging native code: the compiler doesn't emit enough DWARF information, and debuggers like LLDB or GDB need to be modified to understand OCaml's DWARF information. Even there, DAP with LLDB/GDB works; it's just that the debugger doesn't have enough information to work well. You end up debugging assembly or C (if it's parts of the runtime). I've written up a PR documenting native debugging here: https://github.com/ocaml/ocaml/pull/13747.
dismalaf
?? OCaml has had a completion engine for as long as I can remember (definitely over a decade) and it powers their LSP these days. I do know however that the community focuses mostly on Vim and Emacs.
jasperry
Question about terminology: Is it common to call higher-order function types "exponential types" as the article does? I know what higher-order functions are, but am having trouble grasping why the types would be called "exponential".
xigoi
ackfoobar has already given a good reason why function types are called exponential, but there is an even deeper reason: function types interact algebraically the same way as exponents.
The type A → (B → C) is isomorphic to (A × B) → C (via currying). This is analogous to the rule (cᵇ)ᵃ = cᵇ˙ᵃ.
The type (A + B) → C is isomorphic to (A → C) × (B → C) (a function with a case expression can be replaced with a pair of functions). This is analogous to the rule cᵃ⁺ᵇ = cᵃ·cᵇ.
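In OCaml, the first isomorphism is witnessed directly by a curry/uncurry pair; a quick sketch:

```ocaml
(* witnesses of the isomorphism between ('a * 'b) -> 'c
   and 'a -> 'b -> 'c *)
let curry f a b = f (a, b)
let uncurry f (a, b) = f a b

(* round-tripping recovers an equivalent function:
   uncurry (curry f) behaves like f, and vice versa *)
```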
ackfoobar
Since the cardinalities match the algebra, it's no surprise that identities translate, but seeing them still brings a smile to my face.
The correspondence can be pushed much further - to differentiation!
https://codewords.recurse.com/issues/three/algebra-and-calcu...
ackfoobar
A first-order function type is already exponential.
A sum type has as many possible values as the sum of its cases. E.g. `A of bool | B of bool` has 2+2=4 values. Similarly for product types and exponential types. E.g. the type bool -> bool has 2^2=4 values (id, not, const true, const false) if you don't think about side effects.
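For instance, the four inhabitants of bool -> bool, written out in OCaml:

```ocaml
(* the four total functions of type bool -> bool *)
let id b = b            (* identity *)
let neg b = not b       (* negation *)
let const_true _ = true
let const_false _ = false
```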
jolmg
> bool -> bool has 2^2=4 values
Not the best example since 2*2=4 also.
How about this bit of Haskell:
f :: Bool -> Maybe Bool
That's 3 ^ 2 = 9, right?
f False = Nothing
f False = Just True
f False = Just False
f True = Nothing
f True = Just True
f True = Just False
Those are 6. What would be the other 3? Or should it actually be a*b=6?
EDIT: Nevermind, I counted wrong. Here are the 9:
f x = case x of
True -> Nothing
False -> Nothing
f x = case x of
True -> Nothing
False -> Just False
f x = case x of
True -> Nothing
False -> Just True
f x = case x of
True -> Just False
False -> Nothing
f x = case x of
True -> Just False
False -> Just False
f x = case x of
True -> Just False
False -> Just True
f x = case x of
True -> Just True
False -> Nothing
f x = case x of
True -> Just True
False -> Just False
f x = case x of
True -> Just True
False -> Just True
ackfoobar
Good point. Well, there's the Ordering type built into Haskell (LT | EQ | GT). Ordering -> Bool has 2^3=8 values (const true, const false, == LT, == EQ, == GT, is_lte, is_gte, ne).
EDIT: now you see why I used the smallest type possible to make my point. Exponentials get big FAST (duh).
jasperry
You didn't list all functions, just input-output pairs. Each function is a map from every possible input to an output:
f1 False = Nothing, f1 True = Nothing
f2 False = Nothing, f2 True = Just True
...
This gives the correct 3^2 = 9 functions.
nukifw
Usually we speak only about sum and product (because such articles usually refer to ADTs, Algebraic Data Types). A function is not really data, so it is not included. But you can use the same trick (i.e., a -> b has cardinality b^a) to compute the number of potential inhabitants.
vram22
I had the same doubt.
Here is my uneducated guess:
In math, after sum and product, comes exponent :)
So they may have used that third term in an analogous manner in the example.
voidhorse
The answers in the replies are all good, but the real reason is that in category theory the construct that models function types is called an "exponential object". The choice of that name stems from the reasons explored in the replies, in particular from the fact that the number of total functions from A to B is always determined by an exponent (cardinality of B raised to the power of cardinality of A).
ackfoobar
> Sum types: For example, Kotlin and Java (and de facto C#) use a construct associated with inheritance relations called sealing.
This has the benefit of giving you the ability to refer to a case as its own type.
> the expression of sums verbose and, in my view, harder to reason about.
You declare the sum type once, and use it many times. Slightly more verbose sum type declaration is worth it when it makes using the cases cleaner.
sunnydiskincali
> This has the benefit of giving you the ability to refer to a case as its own type.
A case of a sum-type is an expression (of the variety called a type constructor); of course it has a type.
datatype shape =
Circle of real
| Rectangle of real * real
| Point
Circle : real -> shape
Rectangle : real * real -> shape
Point : shape
A case itself isn't a type, though it has a type. Thanks to pattern matching, you're already unwrapping the parameter to the type-constructor when handling the case of a sum-type. It's all about declaration locality. (real * real) doesn't depend on the existence of shape.
The moment you start ripping cases as distinct types out of the sum-type, you create the ability to side-step exhaustiveness, and sum-types become useless in making invalid program states unrepresentable. They're also no longer sum-types. If you have a sum-type of nominally distinct types, the sum-type is contingent on the existence of those types. In a class hierarchy, this relationship is bizarrely reversed, and there are knock-on effects to that.
> You declare the sum type once, and use it many times.
And you typically write many sum-types. They're disposable. And more to the point, you also have to read the code you write. The cost of verbosity here is underestimated.
> Slightly more verbose sum type declaration is worth it when it makes using the cases cleaner.
C#/Java don't actually have sum-types. It's an incompatible formalism with their type systems.
Anyways, let's look at these examples:
C#:
public abstract record Shape;
public sealed record Circle(double Radius) : Shape;
public sealed record Rectangle(double Width, double Height) : Shape;
public sealed record Point() : Shape;
double Area(Shape shape) => shape switch
{
Circle c => Math.PI * c.Radius * c.Radius,
Rectangle r => r.Width * r.Height,
Point => 0.0,
_ => throw new ArgumentException("Unknown shape", nameof(shape))
};
ML:
datatype shape =
Circle of real
| Rectangle of real * real
| Point
val result =
case shape of
Circle r => Math.pi * r * r
| Rectangle (w, h) => w * h
| Point => 0.0
They're pretty much the same outside of C#'s OOP quirkiness getting in its own way.
ackfoobar
> The moment you start ripping cases as distinct types out of the sum-type, you create the ability to side-step exhaustiveness and sum-types become useless in making invalid program states unrepresentable.
Quite the opposite, that gives me the ability to explicitly express what kinds of values I might return. With your shape example, you cannot express in the type system "this function won't return a point". But with sum type as sealed inheritance hierarchy I can.
> C#/Java don't actually have sum-types.
> They're pretty much the same
Not sure about C#, but in Java if you write `sealed` correctly you won't need the catch-all throw.
If they're not actual sum types but are pretty much the same, what good does the "actually" do?
tomsmeding
> Not sure about C#, but in Java if you write `sealed` correctly you won't need the catch-all throw.
Will the compiler check that you have handled all the cases still? (Genuinely unsure — not a Java programmer)
sunnydiskincali
> With your shape example, you cannot express in the type system "this function won't return a point".
Sure you can, that's just subtyping. If it returns a value that's not a point, the domain has changed from the shape type and you should probably indicate that.
structure Shape = struct
datatype shape =
Circle of real
| Rectangle of real * real
| Point
end
structure Bound = struct
datatype shape =
Circle of real
| Rectangle of real * real
end
This is doing things quick and dirty. For this trivial example it's fine, and I think a good example of why making sum-types low friction is a good idea. It completely changes how you solve problems when they're fire and forget like this.
That's not to say it's the only way to solve this problem, though. And for heavy-duty problems, you typically write something like this using higher-kinded polymorphism:
signature SHAPE_TYPE = sig
datatype shape =
Circle of real
| Rectangle of real * real
| Point
val Circle : real -> shape
val Rectangle : real * real -> shape
val Point : shape
end
functor FullShape () : SHAPE_TYPE = struct
datatype shape =
Circle of real
| Rectangle of real * real
| Point
val Circle = Circle
val Rectangle = Rectangle
val Point = Point
end
functor RemovePoint (S : SHAPE_TYPE) :> sig
type shape
val Circle : real -> shape
val Rectangle : real * real -> shape
end = struct
type shape = S.shape
val Circle = S.Circle
val Rectangle = S.Rectangle
end
structure Shape = FullShape()
structure Bound = RemovePoint(Shape)
This is extremely overkill for the example, but it also demonstrates a power you're not getting out of C# or Java without usage of reflection. This is closer to the system of inheritance, but it's a bit better designed. The added benefit here over reflection is that the same principle of "invalid program states are unrepresentable" applies here as well, because it's the exact same system being used. You'll also note that even though it's a fair bit closer conceptually to classes, the sum-type is still distinct.
Anyways, in both cases, this is now just:
DoesNotReturnPoint : Shape.shape -> Bound.shape
Haskell has actual GADTs and proper higher-kinded polymorphism, and a few other features where this all looks very different and much terser. Newer languages bake subtyping into the grammar.
> If they're not actual sum types but are pretty much the same, what good does the "actually" do?
Conflation of two different things here. The examples given are syntactically similar, and they're both treating the constituent part of the grammar as a tagged union. The case isn't any cleaner was the point.
However in the broader comparison between class hierarchies and sum-types? They're not similar at all. Classes can do some of the things that sum-types can do, but they're fundamentally different and encourage a completely different approach to problem-solving, conceptualization and project structure... in all but the most rudimentary examples. As I said, my 2nd example here is far closer to a class-hierarchy system than sum-types, though it's still very different. And again, underlining that because of the properties of sum-types, thanks to their specific formalization, they're capable of things class hierarchies aren't. Namely, enforcing valid program-states at a type-level. Somebody more familiar with object-oriented formalizations may be a better person to ask than me on why that is the case.
It's a pretty complicated space to talk about, because these type systems deviate on a very basic and fundamental level. Shit just doesn't translate well, and it's easy to find false friends. Like how the Japanese word for "name" sounds like the English word, despite not being a loan word.
nukifw
In the specific case of OCaml, this is also possible using indexing and GADTs or polymorphic variants. But generally, referencing a case as its own type serves different purposes. From my point of view, distinguishing between sum branches often tends to result in code that is difficult to reason about and difficult to generalise, due to concerns about variance and loss of type equality.
ackfoobar
Unless you reach an unsound part of the type system I don't see how. Could you provide an example?
nukifw
- You can use GADTs (https://ocaml.org/manual/5.2/gadts-tutorial.html) and indexes to give a concrete type to every constructor:
```ocaml
type _ treated_as =
| Int : int -> int treated_as
| Float : float -> float treated_as
let f (Int x) = x + 1 (* val f : int treated_as -> int *)
```
- You can use the structural nature of polymorphic variants (https://ocaml.org/manual/5.1/polyvariant.html):
```ocaml
let f = function
| `Foo x -> string_of_int (x + 1)
| `Bar x -> x ^ "Hello"
(* val f : [< `Foo of int | `Bar of string] -> string *)
let g = function
| `Foo _ -> ()
| _ -> ()
(* val g : [> `Foo of 'a ] -> unit *)
```
(Notice the difference between `>` and `<` in the signature?)
And since OCaml also has an object model, you can also encode sums and sealing using modules (and private type abbreviations).
voidhorse
I'm not sure why people are debating the merits of sum types versus sealed types in response to this. I prefer functional languages myself, but you are entirely correct that sealed types can fully model sum types and that the type level discrimination you get for free via subtyping makes them slightly easier to define and work with than sum types reliant on polymorphism.
Operationally these systems and philosophies are quite different, but mathematically we are all working in more or less an equivalent category, and all the type system shenanigans you have in FP are possible in OOP modulo explicit limits placed on the language, and vice versa.
ackfoobar
> I'm not sure why
Me neither.
> you are entirely correct that sealed types can fully model sum types
I want to be wrong, in that case I learn something new.
wiseowise
> Slightly more verbose sum type declaration is worth it *when it makes using the cases cleaner.*
Correct. This is not the case when you talk about Java/Kotlin. Just ugliness and the typical boilerplate-heavy approach of JVM languages.
ackfoobar
> Just ugliness and typical boilerplate heavy approach of JVM languages.
I have provided a case of how using inheritance to express sum types can help at the use site. You attacked without substantiating your claim.
wiseowise
Kotlin's/Java's implementation is just a poor man's version of a very restricted set of real sum types. I have no idea what
> This has the benefit of giving you the ability to refer to a case as its own type.
means.
gf000
You mistyped "backwards compatible change" going back to close to 3 decades.
nine_k
I wish somebody with this amount of experience would compare the benefits / shortcomings of using the ReasonML syntax. (The article mentions it once, in passing.)
globuous
What I missed most were let bindings (https://ocaml.org/manual/5.3/bindingops.html)
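For the unfamiliar, a minimal sketch of what these binding operators buy you, here with Option (the helper name is illustrative):

```ocaml
(* ( let* ) as bind for option: sequence steps that may fail,
   without nesting match expressions *)
let ( let* ) o f = Option.bind o f

let add_heads xs ys =
  let* x = List.nth_opt xs 0 in
  let* y = List.nth_opt ys 0 in
  Some (x + y)

(* add_heads [1; 2] [10] = Some 11
   add_heads []     [10] = None *)
```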
ReasonML has custom operators that allow for manipulating monads somewhat sanely (>>= operators and whatnot); Rescript (ReasonML's "fork") did not, last time I checked. But Rescript does have an async/await syntax which helps a lot with async code; ReasonML did not last time I checked, so you had to use raw promises.
I believe Melange (which the article briefly talks about) supports let bindings with the Reason syntax.
And this kinda changes everything if you use React. Because you can now have sane JSX with let bindings. Which you could not until Melange. Indeed, you can PPX your way out of it in OCaml syntax, but I'm not sure the syntax highlighting works well in code editors. It did not in mine, anyway, last time I checked.
So for frontend coding, Melange's ReasonML is great as you have both, and let bindings can approximate async syntax quite well on top of making monadic code readable.
For backend code, as a pythonista, I hate curlies, and I do like parenthesis-less function calls and definitions a lot. But as a beginner OCamler I still have a lot of trouble with non-variable function arguments, as I need to do "weird" parenthesis stuff.
Hope this “helps”!
nukifw
Sorry, I never used ReasonML so I don't see any advantage of using ReasonML except it had the time to die twice in 4 years :)
hardwaregeek
I like Reason syntax and I wish it was more common, but I think if you want to engage in the OCaml community it’s probably better to just bite the bullet and use the standard syntax. It’s what almost everybody uses so you’ll need to understand it to read any code or documentation in the ecosystem
myaccountonhn
I don't have extensive experience, but in the little I have, I ran into issues with the LSP not working as well.
rybosome
I’d have liked to see the use of dependency injection via the effects system expanded upon. The idea that the example program could use pattern matching to bind to either test values or production ones is interesting, but I can’t conceptualize what that would look like with the verbal description alone.
Also, I had no idea that the module system had its own type system, that’s wild.
mrkeen
Haskeller here!
> The idea that the example program could use pattern matching to bind to either test values or production ones is interesting, but I can’t conceptualize what that would look like with the verbal description alone.
The article appears to have described the free monad + interpreter pattern, that is, each business-logic statement doesn't execute the action (as a verb), but instead constructs it as a noun and slots it into some kind of AST. Once you have an AST you can execute it with either a ProdAstVisitor or a TestAstVisitor which will carry out the commands for real.
More specific to your question, it sounds like the pattern matching you mentioned is choosing between Test.ReadFile and Test.WriteFile at each node of the AST (not between Test.ReadFile and Prod.ReadFile.)
I think the Haskell community turned away a little from free monad + interpreter when it was pointed out that the 'tagless final' approach does the same thing with less ceremony, by just using typeclasses.
> I’d have liked to see the use of dependency injection via the effects system expanded upon.
I'm currently doing DI via effects, and I found a technique I'm super happy with:
At the lowest level, I have a bunch of classes & functions which I call capabilities, e.g
FileOps (readTextFile, writeTextFile, ...)
Logger (info, warn, err, ...)
Restful (postJsonBody, ...)
These are tightly-focused on doing one thing, and must not know anything about the business. No code here would need to change if I changed jobs.
At the next level up I have classes & functions which can know about the business (and the lower-level capabilities):
StoredCommands (fetchStoredCommands) - this uses the 'Restful' capability above to construct and send a payload to our business servers.
At the top of my stack I have a type called CliApp, which represents all the business logic things I can do, e.g.
I associate CliApp with all its actual implementations (low-level and mid-level) using type classes:
instance FileOps CliApp where
readTextFile = readTextFileImpl
writeTextFile = writeTextFileImpl
...
instance Logger CliApp where
info = infoImpl
warn = warnImpl
err = errImpl
...
instance StoredCommands CliApp where
fetchStoredCommands = fetchStoredCommandsImpl
...
In this way, CliApp doesn't have any of 'its own' implementations, it's just a set of bindings to the actual implementations.
I can create a CliTestApp which has a different set of bindings, e.g.
instance Logger CliTestApp where
info msg = -- maybe store message using in-memory list so I can assert on it?
Now here's where it gets interesting. Each function (all the way from top to bottom) has its effects explicitly in the type system. If you're unfamiliar with Haskell, a function either having IO or not (in its type sig) is a big deal. Non-IO essentially rules out non-determinism.
The low-level prod code (capabilities) is allowed to do IO, as signaled by the MonadIO in the type sig:
readTextFileImpl :: MonadIO m => FilePath -> m (Either String Text)
but the equivalent test double is not allowed to do IO, per:
readTextFileTest :: Monad m => FilePath -> m (Either String Text)
And where it gets crazy for me is: the high-level business logic (e.g. fetchStoredCommands) will be allowed to do IO if run via CliApp, but will not be allowed to do IO if run via CliTestApp, which for me is 'having my cake and eating it too'.
Another way of looking at it is: if I invent a new capability (e.g. Caching) and start calling it from my business logic, the CliTestApp pointing at that same business logic will give a compile-time error that it doesn't have its own Caching implementation. If I try to 'cheat' by wiring the CliTestApp to the prod Caching (which would make my test cases non-deterministic) I'll get another compile-time error.
Would it work in OCaml? Not sure, the article says:
> Currently, it should be noted that effect propagation is not tracked by the type system
rybosome
Thanks for the detailed reply, that’s very cool! This looks great, very usable way to do DI.
Do you use Haskell professionally? If so, is this sort of DI style common?
mrkeen
No idea how common it is. It's never been my main work language but I've been using it at work for prototyping/scripting/automation/analysis etc for at least ten years now.
It looks like what I've done is most similar to https://www.parsonsmatt.org/2018/03/22/three_layer_haskell_c...
And I think this was the post that proposed tagless-final over free-monads: https://markkarpov.com/post/free-monad-considered-harmful.ht...
abathologist
I am not a Haskell expert, nor an expert in effect systems, but, AFAIU, what mrkeen has provided is an analogous pattern in Haskell -- where effects are represented via the free monad with an interpreter -- and not an account of what is described in the article.
In OCaml we can (and do) also manage effects via monadic style. However, we don't have ad hoc polymorphism (e.g., no typeclasses), so that aspect of the dependency injection must go thru the more verbose (but more explicit, and IMO, easier to reason about) route of parametric modules.
The point in the article is that effect handlers allow direct-style code with custom effects, which enable a kind of "dependency injection" which actually looks and feels much more like just specifying different contexts within which to run the same program. If you are very used to doing everything in Haskell's monadic style, you may not really notice the difference, until you try to compose the handlers.
Here is an example I've put together to show what the author is talking about in practice: https://gist.github.com/shonfeder/a87d7d92626be06d17d2e795c6...
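For a self-contained flavour of that direct style, here is a minimal sketch against OCaml 5's Effect module (the effect and function names are illustrative, not taken from the article or the gist):

```ocaml
(* Business logic performs an effect in direct style, without
   knowing who will handle it. Requires OCaml >= 5.0. *)
open Effect
open Effect.Deep

type _ Effect.t += Read_file : string -> string Effect.t

let line_count () =
  let contents = perform (Read_file "data.txt") in
  List.length (String.split_on_char '\n' contents)

(* A test handler injects a canned value instead of touching disk *)
let run_with_stub f =
  try_with f ()
    { effc =
        (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Read_file _ ->
              Some
                (fun (k : (a, _) continuation) ->
                  continue k "one\ntwo\nthree")
          | _ -> None) }

let () = assert (run_with_stub line_count = 3)
```

A production handler would perform the real file read at the same spot; the business logic itself doesn't change, which is the "same program, different context" point being made above.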
I saw a talk by someone from Google about their experiences using Rust in the Android team. Two points stuck out: they migrated many projects from Python, so performance can't have been that much of a concern, and in their surveys the features people liked most were basics like pattern matching and ADTs. My conclusion is that for a lot of tasks the benefit from Rust came from ML circa 1990, not lifetimes etc. I feel if OCaml had got its act together around about 2010 with multicore and a few other annoyances[1] it could have been Rust. Unfortunately it fell into the gap between what academia could justify working on and what industry was willing to do.
[1]: Practically speaking, the 31-bit Ints are annoying if you're trying to do any bit bashing, but aesthetically the double semicolons are an abomination and irk me far more.
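For context on the footnote: OCaml's native int sacrifices one bit to a runtime tag, so it is 31-bit on 32-bit platforms and 63-bit on 64-bit ones. A small sketch of checking this and of the boxed fixed-width alternatives:

```ocaml
(* Sys.int_size reports the native int width (63 on 64-bit systems);
   Int32/Int64 give exact widths for bit bashing, at boxing cost *)
let () =
  Printf.printf "native int bits: %d\n" Sys.int_size;
  let x = Int32.logand 0xFFl 0x0Fl in
  Printf.printf "Int32 logand: %ld\n" x
```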