Why does Mill use Scala?

77 comments

February 10, 2025

toprerules

The best config system I've ever seen used plain old Python to generate static configs. Everyone knows Python. Python is easy to do data munging in, as demonstrated by its popularity as the #1 data science tool. There's boundless libraries to make Python more functional, use stricter typing, or reduce the amount of side effects it can cause. Even Starlark is just a dialect of Python.

You can spend decades building a complicated configuration language, use a bespoke functional language as Mill does, but if you're a single company that can enforce code quality and just wants to get the job done, I feel like everything else is just unnecessary and over-engineered to scratch some academic itch for a "better system" that enforces "purity" at the cost of velocity.

I also think that now that LLMs are all the rage, how much context do you think they have for bespoke config language vs Scala vs Python? I think we know the answer to that one.

IshKebab

Python is a terrible choice for that sort of thing. Who really wants to have to set up a venv and deal with pip nonsense just to write a config file? Hell even installing Python is sometimes difficult.

have-a-break

You could say the same thing about setting up a JVM...

pjmlp

JVM doesn't have everything all over the place, JAVA_HOME and PATH suffices.

It doesn't require reaching out to tons of C libraries when performance is called for, and there are at least two free beer options to AOT compile to native code, if required.

IshKebab

I definitely would!

fastasucan

You don’t need any of that now that we have uv (https://github.com/astral-sh/uv)

IshKebab

Not really. Uv is great but it doesn't really help here. If an app uses Python as its configuration file format then the app is running Python. It's going to do `python3 config.py` or similar. It doesn't know anything about uv.

So you would still need to create a uv project, run `uv sync` and `uv activate` or whatever and then run your app. Not practical.

The only option if you use Python as a config file format is to stick to old features (Python 3.6) and not use any third party libraries. But op was saying third party libraries are one of the benefits of using Python...

MathMonkeyMan

Stick to the standard library as of an oldish version of python (3.6?) and it's pretty much zero-install zero-config.

threeseed

On a Mac, Python has always been a challenge.

Up until recently Apple only included Python2 and so developers used Homebrew to install Python3. Now it’s very common to find two versions of Python3 installed on a Mac developer’s laptop that conflict with each other.

You really want to be using virtualenv.

iforgot22

If only Python had the equivalent of npm.

aiiizzz

Thought that was pdm. Never saw it used so far.

MathMonkeyMan

This does work well. A team I was on at a past job did exactly this. On Unix the service literally ran `std::system("python config.py >config.json")` on startup.

The problem with this is that the answer to the question "what kind of configuration can I expect?" is "simulate the script and find out."

If the script is written well, and is short, then the parameters that are filled in by the runtime environment are apparent. Over time, though, there is a risk that the script will not remain written well, and it almost certainly won't remain short.
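As a minimal sketch of the pattern described above, a `config.py` along these lines could print a static JSON config for the service to read. The key names, the `APP_ENV` and `APP_WORKERS` environment variables, and the defaults are all hypothetical, and only the standard library is used. Reading every environment value inside one function is one way to keep the runtime-supplied parameters apparent as the script grows.

```python
import json
import os
import sys


def build_config(env):
    """Return the full config as a plain dict.

    Every value taken from the runtime environment is read here, in one
    place. The variable names and keys are hypothetical.
    """
    environment = env.get("APP_ENV", "dev")
    workers = int(env.get("APP_WORKERS", "2"))
    return {
        "environment": environment,
        "workers": workers,
        "log_level": "debug" if environment == "dev" else "info",
        "listen": {"host": "127.0.0.1", "port": 8080},
    }


def main():
    # Emit sorted, indented JSON on stdout so that
    # `python config.py >config.json` captures a diff-friendly result.
    json.dump(build_config(os.environ), sys.stdout, indent=2, sort_keys=True)
    sys.stdout.write("\n")


if __name__ == "__main__":
    main()
```

Because `build_config` takes the environment as an argument, it can be called with a plain dict to see what a given environment would produce without running the service.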

siriusfeynman

An approach I use is splitting my config tools into 2 stages

Stage 1 creates an "explicit" config that can be exported to plaintext that contains exactly what is going to be created/modified with no abstraction/simplification

Stage 2 applies the "explicit" config

You get to be as clever as you want in stage 1 to avoid excessive copy pasting or not being able to know what your tool is going to do because all you have to go on is some homegrown DSL
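A rough sketch of the two-stage split described above, under assumed names: `plan`, `render`, `apply_plan`, the step fields, and the list standing in for a real backend are all hypothetical. Stage 1 is free to use loops and lookups, while stage 2 only walks the explicit list.

```python
import json


def plan(services, replicas_by_env, env):
    """Stage 1: expand compact inputs into an explicit list of changes.

    Loops, defaults and other cleverness live here. The result is plain
    data with nothing left to evaluate.
    """
    steps = []
    for name in sorted(services):
        steps.append({
            "action": "create",
            "resource": f"{env}-{name}",
            "replicas": replicas_by_env[env],
        })
    return steps


def render(steps):
    """Export the explicit config as plaintext for review or diffing."""
    return json.dumps(steps, indent=2, sort_keys=True)


def apply_plan(steps, backend):
    """Stage 2: apply the explicit config verbatim, with no extra logic."""
    for step in steps:
        backend.append((step["action"], step["resource"], step["replicas"]))


if __name__ == "__main__":
    explicit = plan({"api", "worker"}, {"dev": 1, "prod": 3}, "prod")
    print(render(explicit))  # inspect exactly what stage 2 will do
    applied = []             # stand-in for a real system
    apply_plan(explicit, applied)
```

The rendered text can be checked into review or diffed against the previous run before anything is applied.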

iforgot22

You run into the same problem with config DSLs, except now you're dealing with a DSL. Config is almost never going to be static.

MathMonkeyMan

True. One advantage I can imagine for a DSL is that it constrains what is possible and optimizes (syntactically) what it's supposed to be for. I think that the author of Nix justified its language that way.

The counterargument is "eventually you'll need every facility provided by a programming language, so just start with a programming language."

I'm not sure how I feel about it. The YAML templating situation in Kubernetes is a [shit show][1]. Then again, I did once cave into the temptation of writing a [lisp-like XML preprocessor][2] to make my configurations less verbose. It doesn't have any access to the environment, though, so it's not a general purpose configuration language, just a shorthand for static XML.

[1]: https://www.davidgoffredo.com/no-string-templates

[2]: https://github.com/dgoffredo/llama
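One way to illustrate the hazard of templating structured config as strings is sketched below. JSON stands in for YAML here because Python's standard library has no YAML parser, and the function names and the `name`/`replicas` keys are hypothetical.

```python
import json


def render_with_template(name):
    # String templating: the value is pasted into the text unescaped.
    return '{"name": "%s", "replicas": 2}' % name


def render_structured(name):
    # Structured generation: build the data, let the serializer escape it.
    return json.dumps({"name": name, "replicas": 2})


def is_valid_json(text):
    """Report whether the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    tricky = 'web "blue"'  # a value containing quote characters
    print(is_valid_json(render_with_template(tricky)))
    print(is_valid_json(render_structured(tricky)))
```

The templated version works for simple values and produces unparseable output once a value contains a quote, while the structured version leaves escaping to the serializer.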

lmm

Scala is hardly some obscure bespoke language. It's a top-20, maybe top-10 programming language, that's been around for 20+ years (and had far fewer breaking changes than Python over that time). Most Python translates directly into Scala, but with the benefit of a proper sound type system and full IDE support. And it's a great language for data munging.

makeitdouble

The claim sounded outlandish, but Scala does indeed look to be around the top 10~20 languages in hiring, for instance:

https://www.devjobsscanner.com/blog/top-8-most-demanded-prog...

Scala is only in 0.5% of the scanned job offerings, and is far far behind the major languages in numbers, but I was surprised there's more demand than Rust or even Perl to be honest.

iforgot22

I'm not surprised it's above Rust and Perl, but it's below Dart?! Ouch.

bdangubic

this “top 10” your LLM hallucinating? :)

threeseed

> Everyone knows Python

No they don’t. Just like everyone doesn’t know Cobol, Fortran, Scala etc.

But by having a programming language as your build tool you now make it harder for new people to onboard. As in order to build a project they often need to learn some unique, specific to the language syntax. And in order to find this syntax they look around on Github and because it’s a programming language every project has their own unique, specific to the project approach.

Versus something like Cargo.toml where it’s simple and consistent regardless of which project you look at.

emidln

> No they don’t. Just like everyone doesn’t know Cobol, Fortran, Scala etc.

Sure somebody might not have Python experience, but it's pretty easy to just not hire someone who says they don't know Python and isn't willing to learn for the role. I don't know that you'd filter out many candidates out of any random 100 devs.

threeseed

I am talking about graduates and others new to programming.

Of course they are willing to learn for the role but making it hard for them in the beginning can forever turn them off a language. That has been a big problem with Scala and Spark.

lmm

> Sure somebody might not have Python experience, but it's pretty easy to just not hire someone who says they don't know Python and isn't willing to learn for the role.

This works just as well for Scala.

iforgot22

So then they need to know toml (Tom's Obvious Minimal Language)? https://github.com/gtk-rs/examples/blob/master/Cargo.toml I don't know what this file says.

wocram

I think it's hard to argue that Cargo.toml is any simpler than Python. Json might be ubiquitous enough for anyone to read and understand, but if Python is foreign then toml is no better.

aidenn0

I don't know about Python specifically, but using a language I'm familiar with to generate ninja files (+ any header/environment/&c) for the build has become my go-to way of doing builds in the past 18 months or so.
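As an illustration of this approach, with Python standing in for whatever language is familiar, a generator for a small `build.ninja` might look like the sketch below. The rule names, the `cc` default, and the source file names are assumptions, and paths containing spaces or `$` would need Ninja escaping that this sketch does not do.

```python
def ninja_file(sources, output="app", cc="cc"):
    """Return the text of a build.ninja that compiles and links C sources."""
    lines = [
        f"cc = {cc}",
        "",
        "rule compile",
        "  command = $cc -c $in -o $out",
        "  description = CC $out",
        "",
        "rule link",
        "  command = $cc $in -o $out",
        "  description = LINK $out",
        "",
    ]
    objects = []
    for src in sorted(sources):
        # main.c -> main.o; one compile edge per source file
        obj = src.rsplit(".", 1)[0] + ".o"
        objects.append(obj)
        lines.append(f"build {obj}: compile {src}")
    # A single link edge consumes every object file.
    lines.append(f"build {output}: link {' '.join(objects)}")
    lines.append(f"default {output}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect stdout to build.ninja, then run `ninja`.
    print(ninja_file(["main.c", "util.c"]), end="")
```

The generator stays a pure function from a list of sources to text, so the resulting file can be inspected or diffed before Ninja ever runs.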

hcarvalhoalves

> I also think that now that LLMs are all the rage, how much context do you think they have for bespoke config language vs Scala vs Python? I think we know the answer to that one.

Nothing against Python, but of all the reasons to choose a technology, whatever is more represented on the dataset of some LLM is the worst reason.

This is a death spiral. There's no hope for the future of this industry if newcomers are thinking like this.

iforgot22

I cared about programming languages when I was a newcomer. Stopped caring about 10 years ago. They're just tools, each with their own gotchas and different design choices I couldn't care less about. Between two tools that both work ok, I will definitely pick whichever one my team and I can learn the easiest, and that includes LLM coverage.

asalahli

> There's boundless libraries to make Python more functional, use stricter typing, or reduce the amount of side effects it can cause.

What are some examples of a library that can limit or prevent side effects of a piece of python code? I could use one right now.

eptcyka

Ah yes, the age old belief that all software is complex enough that one must first run some other bespoke turing complete program to build every single piece of software.

And of all the languages to pick for this, python, with its non-hermetic execution environment is bound to bite you in the ass, once your buildscripts start depending on libraries. Oh, you could use poetry to solve the library issue with python, or maybe it'll be setuptools, pip or whatever is the flavour of the month in python packaging.

After fighting with Nix for a sufficiently long time, I think most language specific build tools are not necessarily the best solution to the problem of automating a build for a bit of software written in language X. Complex projects will eventually evolve to depend on multiple languages (unless you're the Linux kernel), at which point the specialized language build tools turn into cumbersome barriers in the build process, where different build tools are not aware of the caching, conventions and configurations of any other tool. As such, in an ideal world, any new language would come with a compiler or bundler that can be supported well by higher level build/packaging tools. And bespoke python scripts ain't that.

koito17

The article fails to mention whether Scala can ensure code is deterministic and hermetic. Starlark code is deterministic and hermetic, but the article never mentions this. Unfortunately, Starlark does not have static typing, but I think types would make Starlark one of the best languages today for build configuration.

In the Clojure community, there was a huge push for "builds are programs". I somewhat agree with this assertion, but I also think "one should restrict the class of programs a build belongs to". Neither Clojure nor Scala, compared to Starlark, seem to offer a way to ensure builds belong to a deterministic subset of programs.

Thus I am still wondering "why Scala?". I have never used Scala, but reading this whole article gives me the impression that Mill is the Scala equivalent of Clojure's tools.build. That is not what I would want in a build system.

fmbb

What is special in starlark that makes it hermetic and deterministic?

Scala is deterministic.

If you call functions that have side effects and nondeterministic behavior you can fall outside the comforts of determinism. But you can stumble upon library functions someone wrote in Starlark that accidentally put you there as well.

The Starlark homepage says

> Hermetic execution - Execution cannot access the file system, network, system clock. It is safe to execute untrusted code.

But the last time I wrote Starlark it was to define build targets in Bazel. And executing the builds definitely accessed my file system and the network, otherwise builds would have no results.

thirtyseven

Bazel splits the build into multiple phases. Starlark only comes into play in the first two, load and analysis. During these phases, Starlark code doesn't have access to the filesystem, except in a few very limited cases like using the glob() function to expand a wildcard to a list of source files. Furthermore, it only generates an abstract graph of build actions. The Bazel engine is responsible for executing this graph in later stages, which might result in non-hermetic things happening but usually not.

Starlark has intentionally limited functionality such as lacking Turing completeness or global variables. This provides guarantees that it can be executed in parallel and will have a finite runtime.
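The split between describing actions and executing them can be sketched in plain Python (3.9+ for `graphlib`). This is neither Starlark nor Bazel's API: `Action`, `cc_binary`, and `execution_order` are hypothetical names, and the "engine" here only orders the graph rather than running anything.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Action:
    """A description of work to do later; creating one runs nothing."""
    output: str
    inputs: tuple
    command: tuple


def cc_binary(name, srcs):
    """Analysis-phase 'rule' (hypothetical): a pure function returning Actions."""
    actions = []
    objects = []
    for src in srcs:
        obj = src[:-2] + ".o"  # assumes every source name ends in ".c"
        objects.append(obj)
        actions.append(Action(obj, (src,), ("cc", "-c", src, "-o", obj)))
    actions.append(Action(name, tuple(objects), ("cc", *objects, "-o", name)))
    return actions


def execution_order(actions):
    """Engine side: order actions so producers come before consumers."""
    by_output = {a.output: a for a in actions}
    # Map each output to the inputs that are themselves produced by an action.
    graph = {
        a.output: [i for i in a.inputs if i in by_output] for a in actions
    }
    return [by_output[out] for out in TopologicalSorter(graph).static_order()]


if __name__ == "__main__":
    for action in execution_order(cc_binary("app", ["main.c", "util.c"])):
        print(action.output, "<-", " ".join(action.command))
```

Nothing in `cc_binary` touches the filesystem; it only returns data, which is what lets a separate engine decide when, where, and whether each command actually runs.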

cbeach

Bazel enforces a hermetic sandbox for Starlark to operate within.

All input files have to be declared, and the build process can only see files that are declared.

Starlark cannot access arbitrary files at runtime, and it deliberately has no APIs for things like system time or random number generation, or global state.

Scala, on the other hand, has no such restrictions. As much as I love Scala, I think it’s an odd choice for a pure, deterministic system. Although perhaps if you use Scala to build a DSL (an area where Scala shines) you could engineer a pure functional sandbox within Scala.

fmbb

Do you only declare the file names, or also file content and owner and group and everything up front?

Sure it deliberately has a bunch of restrictions in its base form. But in order to use it for anything you have to write custom actions or download and execute custom actions others wrote. And this will mutate your file system and run arbitrary executables.

michaelmior

> Scala is deterministic.

I don't see how Scala is any more deterministic than any other language.

fmbb

That’s what I’m saying. About Starlark.

Kwpolska

Because it’s a Scala project, written by a Scala fan, simple as that. No need to come up with extra justification.

kunley

"Mill is a fast, scalable, multi-language build tool that supports Java, Scala, Kotlin, and Python".

So, while I understand this tool can resonate in the JVM world, I have no idea why one would want to pull Java into their toolset in order to build Python.

briankelly

I don’t know who is using mill but I’d guess Scala people and Scala frequently implies Spark which in turn implies PySpark. For instance I could see it being useful for a team building a source connector which will be Scala-centric but need some PySpark additions as well. Even without Spark, Scala is frequently used for data centric applications and those kinds of teams would naturally be working with python as well.

wavemode

What does it even mean to "build" Python?

pletnes

I imagine creating a wheel file with your code, metadata and resources? Also recall that a lot of python packages have dependencies written in C, Fortran, Rust etc.

Also, many tools exist to create executable programs - basically bundling the python interpreter with some .py files, etc.

tacticus

shouting at pip inconsistencies

kunley

Yeah had the same thought actually

vander_elst

It seems this is mainly a scala project and then they are using scala also for the configuration, it probably makes sense for them.

rpcope1

One interesting thing is that there are comparisons to Maven and Gradle but not sbt. Do people just not use sbt anymore or is it omitted because it's also Scala and/or prone to becoming a mess?

ezst

https://mill-build.org/mill/comparisons/sbt.html

videogreg93

I used sbt in my previous job and didn't hate it. All I want from build systems is to get out of my way and it let us do that pretty well. Very simple to add new tasks as well.

openplatypus

We use it.

It is actively developed.

It doesn't get in the way.

It does what it says on the tin.

ATMLOTTOBEER

It’s slow, and the task/setting key resolution makes no sense unless you waste several hours reading sbt docs (time I will never get back). I’m not saying dump sbt for mill but u gotta admit sbt kinda sucks. Afaik this is also the conclusion most larger orgs that use scala come to when they start calling scalac from bazel or similar.

gdgghhhhh

The title really confused me until I realized this has nothing to do with https://millcomputing.com/ :-)

pabs3

Wonder when Mill Computing will become more publicly active and have hardware available.

waste_monk

It seemed quite promising but from the outside momentum appears to be almost completely stalled, with only a handful of posts per year on the forum.

I'd be curious to know if there's progress being made behind the scenes.

pabs3

From forum threads in the last year or two it sounds like they have gotten quite far and only need investment to progress further.

https://millcomputing.com/topic/any-plans-for-2024/ https://millcomputing.com/topic/yearly-ping-and-see-how-thin...

Edit: posted a HN thread about getting investors for Mill:

https://news.ycombinator.com/item?id=43054697

ncgl

The first thing any self respecting python dev is going to do on a new repo is implement typechecking. And the second thing they're going to do is complain about pip/pyenv.

These days I would trade my python experience for scala, even knowing it'd mean fewer job prospects. We make a lot of excuses for python.

Lyngbakr

I'm surprised that Lua wasn't included in the discussion. I'm not saying they necessarily should have chosen to use it, but it's a notable omission given its popularity as a config language.

rubenvanwyk

A lot of these reasons also apply to Kotlin and Kotlin is arguably simpler than Scala, why not just use Kotlin? It is much more widely adopted than Scala.

agentultra

I'm not big on programming languages being used for configuration. It adds a lot of complexity and maintenance burden. Configurations are often read and your reader isn't a compiler but now they have to imagine what the final configuration state will be after evaluating the "program" that generates the configuration. I think plain old configuration languages are better, even if verbose, since the usual text-based tooling works quite well for managing, searching, etc.

I use nix a lot and the main thing that bothers me about it is the language. People are quick to get clever with it. It becomes a morass of code that is difficult to read for anyone but experts and when it breaks and you're not that expert... good luck fixing it.

winwang

Unfortunately, I find that real-world configs don't quite conform to easy understandability past tutorial examples, i.e. k8s and yaml.