Levels of configuration languages
60 comments
· April 12, 2025 · susam
bangonkeyboard
A top-level /AppleInternal/ directory on macOS, even if empty, will enable certain features in Apple developer tools.
LeifCarrotson
I've observed similar things happening in other tools too - third-party tools that list a certain set of file formats by default, and enable Autodesk formats if C:/Autodesk is present.
It seems to be most common in systems that cross major business domains (or come from completely separate companies). If you could just add an entry to the featureful, versioned, controlled config file then you'd do that, but if you can't, developers frequently resort to the "does this path name resolve" heuristic.
somat
A number of traditional Unix utilities change their behavior based on the name they are invoked as; /bin/test and /bin/[ come to mind. A quick survey of OpenBSD finds:
eject mt
[ test
chgrp chmod
cksum md5 sha1 sha256 sha512
cpio pax tar
ksh rksh sh
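The name-based dispatch these multi-call binaries do can be sketched in shell. This is an illustrative sketch, not the real tools: a function stands in for the program, and its argument plays the role of argv[0].

```shell
# Sketch of argv[0] dispatch: one program, many names.
# A real multi-call binary inspects the name it was invoked as.
dispatch() {
  case "$(basename "$1")" in
    md5)    echo "md5 mode" ;;
    sha256) echo "sha256 mode" ;;
    *)      echo "unknown: $(basename "$1")" ;;
  esac
}

dispatch /usr/bin/md5      # prints "md5 mode"
dispatch /usr/bin/sha256   # prints "sha256 mode"
```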
Taken to its logical extreme, you end up with something like crunchgen https://man.openbsd.org/crunchgen which merges many independent programs into one and selects which one to run based on the name.
And I am guilty of abusing symbolic links as a simple single-value key-value store. It turns out the link does not need to point to anything, and using readlink(1) was easier than parsing a file.
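The symlink trick looks like this in practice (paths here are made up): the link never has to resolve, so its target can hold an arbitrary string value.

```shell
# Sketch: a dangling symlink as a single-value store.
dir=$(mktemp -d)
ln -sfn "42" "$dir/counter"   # "write": the target string is the value
readlink "$dir/counter"       # "read": prints 42
ln -sfn "43" "$dir/counter"   # -f replaces the link, overwriting the value
readlink "$dir/counter"       # prints 43
```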
cduzz
rsh and ssh used to connect you to the host named by whatever file name they were invoked as...
so I had, in my home directory
~me/bin/snoopy
and if I wanted to log into snoopy, I'd just type
$ snoopy
and it'd rsh me into snoopy.
Hilarity was the day when someone ran
cd /export/home && for user in * ; do chown -R $user:users $user ; done
(note the lack of the -h flag, which would have chowned the symbolic link itself instead of the file the link points to)
ks2048
This gives me an idea - store small integer parameters (<= 511) as file permissions (the r/w/x bits for user/group/other) on an empty file.
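An illustrative sketch of that idea (GNU `stat` is assumed for reading the bits back; BSD stat spells it differently):

```shell
# Sketch: abuse the nine permission bits as an integer store (0-511).
dir=$(mktemp -d)
touch "$dir/value"
n=300
chmod "$(printf '%o' "$n")" "$dir/value"   # write: 300 decimal -> 454 octal
oct=$(stat -c '%a' "$dir/value")           # read back as octal (GNU stat)
printf '%d\n' "0$oct"                      # convert octal to decimal: prints 300
```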
rzzzt
Larger integers can go into UID:GID values.
umbra07
I use this approach for testing conditional logic in shell scripts sometimes.
esafak
That's just cramming a flag into the file system.
alexambarch
I’d argue Terraform/HCL is quite popular as a Level 4 configuration language. My biggest issue with it is that once things get sufficiently complex, you wish you were using a Level 5 language.
In fact, it’s hard to see where a Level 4 language perfectly fits. After you’ve surpassed the abilities of JSON or YAML (and you don’t opt for slapping on a templating engine like Helm does), it feels like jumping straight to Level 5 is worth the effort for the tooling and larger community.
default-kramer
I'm very surprised we don't see more people using a level 5 language to generate Terraform (as level 3 JSON) for this exact reason. It would seem to be the best of both worlds -- use the powerful language to enforce consistency and correctness while still being able to read and diff the simple output to gain understanding. In this hypothetical workflow, Terraform constructs like variables and modules would not be used; they would be replaced by their counterparts in the level 5 language.
https://developer.hashicorp.com/terraform/language/syntax/js...
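For reference, the JSON syntax linked above mirrors HCL's block structure as nested objects, so a level-5 generator only has to emit plain data. A minimal, hypothetical `main.tf.json` (resource type and names invented for illustration):

```json
{
  "resource": {
    "aws_s3_bucket": {
      "logs": {
        "bucket": "example-logs-bucket"
      }
    }
  }
}
```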
JanMa
That actually works quite well. I once built a templating engine for Terraform files, based on jq, that reads in higher-level YAML definitions of the resources to be created and outputs valid Terraform JSON config. The main reason back then was that you couldn't dynamically create Terraform provider definitions in Terraform itself.
Later on I migrated the solution to Terramate which made it a lot more maintainable because you write HCL to template Terraform config instead of JQ filters.
harshitaneja
This is exactly how we do it, with our own rudimentary internal library and scripts. It is barely enough, and even though I worry at times that it will break unexpectedly, so far we have been surprised by how stable everything has been.
I really wish there were a first-party solution or a well-established library for this, but I suspect that while it is easy to build just enough to support specific use cases, building a solution generic enough for everyone would be quite an undertaking.
danpalmer
The problem with HCL is that it's a Level 4 language masquerading as a Level 3 language, rather than a Level 4 language masquerading as a Level 5 (like Starlark, Dhall, even Jsonnet). Because of that its syntax is very limited and it needs awkwardly nuanced semantics, and it becomes difficult to use well as a result.
HCL is best used when the problem you're solving is nearly one you could use a level 3 language for, whereas in my experience, Starlark is only really worth it when what you need is nearly Python.
miningape
The choice between 4 and 5 is more about what you get to avoid. By choosing level 5 you open the door to some really complicated configurations and many more footguns. When you stay at level 4 you're forced into more "standardised" blocks of code that can easily be looked up online and understood.
Level 4 is also far more declarative by nature, you cannot fully compute stuff so a lot is abstracted away declaratively. This also leads to simpler code since you're less encouraged to get into the weeds of instantiation and rather just declare what you'd like.
Overall it's about forcing simplicity by not allowing the scope of possibilities to explode. Certainly there are cases where you can't represent problems cleanly, but I think that tradeoff is worth it because of lowered complexity.
Another benefit of level 4 is that it's easier for your code to stay the same while changing the underlying system you're configuring, since there's a driver layer between the level 4 configuration and the system which can (ideally) be swapped out.
18172828286177
> Don't waste time on discussions within a level. For example, JSON and YAML both have their problems and pitfalls but both are probably good enough.
Disagree. YAML is considerably easier to work with than JSON, and it’s worth dying on that hill.
zzo38computer
I don't really like either format (I am not sure which is worse; both have significant problems). YAML has problems such as the Norway problem and many other syntax pitfalls, JSON has different problems, and some problems are shared between them. Unicode is one problem both of them have. Numbers are a problem in some implementations of JSON, though the spec does not require it. (Many other formats share some of these problems too, such as using Unicode, or using floating-point numbers and not integers, etc.)
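The Norway problem mentioned above, in a nutshell (the behavior depends on the loader: YAML 1.1 schemas resolve these scalars, the YAML 1.2 core schema does not):

```yaml
# Under YAML 1.1, the unquoted scalar "no" resolves to boolean false,
# so Norway silently vanishes from this list of country codes:
countries: [gb, se, no]           # 1.1 loaders read [gb, se, false]
countries_quoted: [gb, se, "no"]  # quoting keeps it the string "no"
```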
I think DER is better (for structured data); although it is a binary format, it is in canonical form. I made up the TER format, a text format that you can compile to DER, with some additional types that can be used (such as a key/value list type). While Unicode is supported, there are other (sometimes better) character sets which you can also use.
(However, not all configuration files need structured data, and sometimes programs are also useful to include, and these and other considerations are also relevant for other formats, so not everything should use the same file formats anyways.)
MOARDONGZPLZ
I love that there is one comment saying JSON is better, and then yours saying YAML is better.
drewcoo
To be fair, both say "don't waste time" on it.
dijksterhuis
anchors/aliases/overrides are one of my favourite yaml features. i've done so much configuration de-duplication with them, it's unreal.
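For readers who haven't used them, a small sketch of the de-duplication this enables (the merge key `<<` is a widely supported YAML 1.1 extension; field names here are invented):

```yaml
defaults: &defaults        # anchor: give this mapping a name
  adapter: postgres
  host: localhost

development:
  <<: *defaults            # alias + merge: inherit all keys from defaults
  database: dev_db

production:
  <<: *defaults
  database: prod_db
  host: db.internal        # locally override an inherited key
```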
ks2048
I'm not sure "total" vs "Turing-complete" makes a huge difference - you could just terminate with an error after X seconds.
For example, can "total programming languages" include: "for i in range(10000000000000): do_something()"?
If so, your config file can still hang - even though it provably terminates.
lmm
It's a lot easier to accidentally make a file that takes forever than to accidentally make a file that takes a long but finite amount of time.
sgeisenh
> Don't waste time on discussions within a level.
I disagree with this. YAML has too many footguns (boolean conversions being the first among them) not to mention it is a superset of JSON. Plain old JSON or TOML are much simpler.
sevensor
Lack of nulls in TOML is a headache. No two YAML libraries agree on what a given YAML text means. Although JSON is bad at numbers, that's more easily worked around.
xelxebar
> YAML has too many footguns (boolean conversions being the first among them)
Copying my own comment from elsewhere: https://news.ycombinator.com/item?id=43670716.
This has been fixed since 2009 with YAML 1.2. The problem is that everyone uses libyaml (e.g. PyYAML etc.), which is stuck on 1.1 for reasons.
The 1.2 spec just treats all scalar types as opaque strings, along with a configurable mechanism[0] for auto-converting non-quoted scalars if you so please.
As such, I really don't quite grok why upstream libraries haven't moved to YAML 1.2. Would love to hear details from anyone with more info.
[0]: https://yaml.org/spec/1.2.2/#chapter-10-recommended-schemas
ajb
I'm not convinced by reducing this to a single dimension. There are differences in both 'what can be expressed' and 'what validation can be done' which are somewhat independent of each other
qznc
Hm, you got me thinking about reversible computing and how it could be applied to configuration.
Debugging a configuration becomes tedious once computation is involved. You think some value should be "foo" but it is "bar". Why is it "bar"? If someone wrote it there, the fix is simply to change it. If "bar" is the result of some computation, you have to understand the algorithm and its inputs, which is significantly harder.
Given a "reversible" programming language that might be easier. Such languages are weird though and I don't know much about them. For example: https://en.wikipedia.org/wiki/Janus_(time-reversible_computi...
ajb
Interesting idea! Although, maybe you just want to be able to run the configuration language in a reversible debugger?
This issue becomes even harder when you have some kind of solver involved, like a constraint solver or unification. As a user the solver is supposed to make your life easier but if it rejects something without a good enough error message you are stuck; having to examine the solver code to work out why is a much worse experience than not having a solver. (This is the same issue with clever type systems that need a solver)
waynecochran
https://jsonnet.org/ - I had never heard of this before. This seems like the JSON I wish I had. Of course, at some point you could just use JavaScript; I guess that fits with option 5.
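For a taste of what Jsonnet adds over plain JSON - variables, arithmetic, and object extension - a small sketch (field names invented):

```jsonnet
// evaluate with: jsonnet example.jsonnet
local base = { replicas: 2, image: 'nginx' };
{
  dev: base,
  // object extension: prod inherits from base and overrides replicas,
  // with super referring to the base object (2 * 3 = 6)
  prod: base { replicas: super.replicas * 3 },
}
```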
rssoconnor
Dave Cunningham created jsonnet from some conversations I had with him about how Nix's lazy language allows one to make recursive references between parts of one's configuration in a declarative way. No need to order the evaluation beforehand.
Dave also designed a way of doing "object oriented" programming in Nix which eventually turned into what is now known as overlays.
P.S. I'm pretty sure jsonnet is Turing complete. Once you get any level of programming, it's very hard not to be Turing complete.
liveify
I made a decision early on in a project to replace YAML with jsonnet for configuration and it was the best decision I made on that project - I’ve written tens of thousands of lines of jsonnet since.
bblb
"vibeconfig"
1. You give LLM the requirements
2. It spits out whatever monstrosity is required to configure the software or service in question
3. When issues later arise, you just vibeconfig again with new requirements
Eventually new vibeconfig tools will rise because even those three steps are not complex enough. These call LLM APIs to inject the config files dynamically at runtime. "But it's a security issue". So another vertical is born: auditing and securing the vibeconfig LLM autogeneration toolsets.
retropragma
As a TypeScript developer, my answer is to use unconfig [1]. Support for JSON, JavaScript, or best of all, TypeScript.
Or wire up your own utility function with bundle-require [2], another favorite of mine, for loading TS or JS config files.
[1]: https://github.com/antfu-collective/unconfig [2]: https://github.com/egoist/bundle-require
chewbacha
Reminds me a lot of the configuration complexity clock: https://mikehadlow.blogspot.com/2012/05/configuration-comple...
It’s made the page before and proposes that these forms are cyclic.
longor1996
The article actually (accidentally or on purpose?) refers to just that:
> How to avoid this madness? Introduce another low-level configuration file. Back to level one...
chubot
Hm I also made a taxonomy of 5 categories of config languages, which is a bit different
Survey of Config Languages https://github.com/oils-for-unix/oils/wiki/Survey-of-Config-...
Languages for String Data
Languages for Typed Data
Programmable String-ish Languages
Programmable Typed Data
Internal DSLs in General Purpose Languages
Their taxonomy is:
String in a File
A List
Nested Data Structures
Total Programming Languages
Full Programming Language
So the last category (#5) is the same, the first one is different (they start with plain files), and the middle is a bit different.
FWIW I don't think "Total" is useful - for example, take Starlark. The more salient things about Starlark are that it is restricted so that it evaluates very fast in parallel, and it has no I/O to the external world. IMO it has nothing to do with Turing completeness.
Related threads on the “total” issue:
https://lobste.rs/s/dyqczr/find_mkdir_is_turing_complete
https://lobste.rs/s/gcfdnn/why_dhall_advertises_absence_turi...
ks2048
> I actually like XML. It isn't "cool" like YAML anymore, but it has better tooling support (e.g. schema checking) and doesn't try to be too clever. Just try to stay away from namespaces and don't be afraid of using attributes.
I agree with this (even though, in practice, I usually just use JSON or YAML) - it avoids some of the pitfalls of both JSON and YAML: it has comments and lacks ambiguity. The main annoyances are textContent (is whitespace important?), attributes vs children, the verbosity of closing tags, etc.
retropragma
Every time I work with XML data, I hate it. Just use JSONC imo.
About 20 or so years ago, I came across a configuration pattern that could arguably be called "Level 0": configuration by file existence. The file itself would typically be empty, so no parsing, syntax, or schema is involved. For example, if the file /opt/foo/foo.txt exists, the software does one thing, but if it is missing, the software does another. Effectively, the existence of the file serves as a boolean configuration flag.
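A sketch of the pattern in shell (a temporary directory stands in for a real path like /opt/foo):

```shell
# "Level 0" configuration: the mere existence of a file is the flag.
feature_enabled() {
  [ -e "$1" ]                    # true iff the flag file exists
}

dir=$(mktemp -d)
feature_enabled "$dir/foo.txt" && echo on || echo off  # prints "off"
: > "$dir/foo.txt"               # "set" the flag by creating an empty file
feature_enabled "$dir/foo.txt" && echo on || echo off  # prints "on"
```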