Stop writing CLI validation. Parse it right the first time
September 6, 2025
bschwindHN
Rust with Clap solved this forever ago.
Also - don't write CLI programs in languages that don't compile to native binaries. I don't want to have to drag around your runtime just to execute a command line tool.
MathMonkeyMan
Almost every command line tool has runtime dependencies that must be installed on your system.
$ ldd /usr/bin/rg
linux-vdso.so.1 (0x00007fff45dd7000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000070764e7b1000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000070764e6ca000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000070764de00000)
/lib64/ld-linux-x86-64.so.2 (0x000070764e7e6000)
The worst is compiling a C program with a compiler that uses a more recent libc than is installed on the installation host.
craftkiller
Don't let your dreams be dreams
$ wget 'https://github.com/BurntSushi/ripgrep/releases/download/14.1.1/ripgrep-14.1.1-x86_64-unknown-linux-musl.tar.gz'
$ tar -xvf 'ripgrep-14.1.1-x86_64-unknown-linux-musl.tar.gz'
$ ldd ripgrep-14.1.1-x86_64-unknown-linux-musl/rg
ldd (0x7f1dcb927000)
$ file ripgrep-14.1.1-x86_64-unknown-linux-musl/rg
ripgrep-14.1.1-x86_64-unknown-linux-musl/rg: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), static-pie linked, stripped
3836293648
Which only works on Linux. No other OS supports fully static binaries; you always need to link against libc for syscalls.
Sharlin
Sure, but Rust specifically uses static linking for everything but the very basics (i.e. libc) in order to avoid DLL hell.
bschwindHN
Yes but I've never had a native tool fail on a missing libc. I've had several Python tools and JS tools fail on missing the right version of their interpreter. Even on the right interpreter version Python tools frequently shit the bed because they're so fragile.
mjevans
I have. During system upgrades, usually along unsupported paths.
If you're ever living dangerously, bring along busybox-static. It might not be the best, but you'll thank yourself later.
sestep
I statically link all my Linux CLI tools against musl for this reason. Or use Nix.
dboon
That’s the first rule anyone writing portable binaries learns. Compile against an old libc, and stuff tends to just work.
delta_p_delta_x
> Compile against an old libc
This clause is abstracting away a ton of work. If you want to compile the latest LLVM and get 'portable C++26', you need to bootstrap everything, including CMake from that old-hat libc on some ancient distro like CentOS 6 or Ubuntu 12.04.
I've said it before, I'll say it again: the Linux kernel may maintain ABI compatibility, but the fact that GNU libc breaks it anyway makes it a moot point. It is a pain to target older Linux with a newer distro, which is by far the most common development use case.
1718627440
> The worst is compiling a C program with a compiler that uses a more recent libc than is installed on the installation host.
This is only a problem when the program uses a symbol that was only introduced in the newer libc. In other words, when the program made a choice to deliberately need that newer symbol.
majorbugger
I will keep writing my CLI programs in the languages I want, thanks. Has it crossed your mind that these programs might be for yourself or for internal consumption? When you know the runtime will be installed anyway?
dcminter
You do you, obviously, but "now let npm work its wicked way" is an off-putting step for some of us when narrowing down which tool to use.
My most comfortable tool is Java, but I'm not going to persuade most of the HN crowd to install a JVM unless the software I'm offering is unbearably compelling.
Internal to work? Yeah, Java's going to be an easy sell.
I don't think OP necessarily meant it as a political statement.
goku12
There should be some way to define the CLI argument format and its constraints in some sort of DSL that can be compiled into the target language before the final compilation of the application. This way, it can be language agnostic (though I don't know why you would need this) without the need for another runtime. The same interface specification should be able to represent a customizable help/usage message with sane defaults, generate dynamic tab-completion code for multiple shells, generate code for good-quality customizable error messages in case of CLI argument errors, and generate a neatly formatted man page with provisions for additional content, etc.
In fact, I think something like this already exists. I just can't recollect the project.
vvillena
This is not an issue with Java and the other JVM languages; it's simple to use GraalVM and package a static binary.
lazide
Most Java CLIs (well, non-shitty ones), and most distributed Java programs in general, package their own JVMs in a hermetic environment. It's just saner.
bschwindHN
That's fine, I'll be avoiding using them :)
perching_aix
You'll avoid using his personal tooling he doesn't share, and his internal tooling he shares where you don't work?
Are you stuck in write-only mode or something? How does this make any sense to you?
rs186
Apparently that ship has sailed. Claude Code and Gemini CLI require a Node.js installation, and the Gemini README reads as if npm is a tool that everybody knows and has already installed.
dboon
Opencode is a great model-agnostic alternative which does not require a separate runtime.
yunohn
Opencode uses TS and Golang; it definitely needs a runtime for the TS part. CPU usage hovers around 100% for me on an MBP M3 Max.
Sharlin
That's terrible, but at the very least there's the tiny justification that those are web API clients rather than standalone/local tools.
jampekka
> Also - don't write CLI programs in languages that don't compile to native binaries. I don't want to have to drag around your runtime just to execute a command line tool.
And don't write programs with languages that depend on CMake and random tarballs to build and/or shared libraries to run.
I usually have far fewer issues dragging a runtime around than fighting with builds.
perching_aix
Like shell scripts? Cause I mean, I agree, I think this world would be a better place if starting tomorrow shell scripts were no longer a thing. Just probably not what you meant.
ycombobreaker
Shell scripts are a byproduct of the shell existing. Generations of programmers have cut their teeth in CLI environments. Anything that made shell scripts "no longer a thing" would necessarily destroy the interactive environment, and sounds like a ladder-pull to the curiosity of future generations.
bschwindHN
> I think this world would be a better place if starting tomorrow shell scripts were no longer a thing.
Pretty much agreed - once any sort of complicated logic enters a shell script it's probably better off written in C/Rust/Go or something akin to that.
geon
This seems like a really weird stance. Who are you to dictate what language people should use? Why CLIs in particular?
bschwindHN
I'm just making an opinionated suggestion for the case when you're shipping a tool to end users and you don't want the tool to suck. Attaching a Python or Node.js runtime to your tool is a quick way to make it suck for end users. It's laziness on the part of a dev who didn't bother learning a better tool for the job.
dcminter
The declarative form of clap is not quite as well documented as the programmatic approach (but it's not too bad to figure out usually).
One of the things I love about clap is that you can configure it to automatically spit out --help info, and you can even get it to generate shell autocompletions for you!
I think there are some other libraries that are challenging it now (fewer dependencies or something?) but clap sets the standard to beat.
LtWorf
> Also - don't write CLI programs in languages that don't compile to native binaries. I don't want to have to drag around your runtime just to execute a command line tool.
Go programs compile to native executables, but they're still rather slow to start, especially if you just want --help.
jmull
> Think about it. When you get JSON from an API, you don't just parse it as any and then write a bunch of if-statements. You use something like Zod to parse it directly into the shape you want. Invalid data? The parser rejects it. Done.
Isn’t writing code and using zod the same thing? The difference being who wrote the code.
Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.
MrJohz
I think the key part, although the author doesn't quite make it explicit, is that (a) the parsing happens all up front, rather than weaving validation and logic together, and (b) the parsing creates a new structure that encodes the invariants of the application, so that the rest of the application no longer needs to check anything.
Whether you do that with Zod or manually or whatever isn't important, the important thing is having a preprocessing step that transforms the data and doesn't just validate it.
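To make that concrete, here's a minimal TypeScript sketch (the names are illustrative, not from the article): one boundary function transforms the raw input into a type whose invariants the rest of the program can simply rely on.

    // Raw input: what the outside world hands us.
    interface RawInput {
      host?: string;
      port?: string;
    }

    // Parsed output: the invariants are encoded in the type itself.
    interface ServerConfig {
      host: string;
      port: number; // guaranteed to be a valid port number
    }

    // The single boundary: bad input is rejected here, once.
    function parseConfig(raw: RawInput): ServerConfig {
      if (raw.host === undefined) throw new Error("--host is required");
      const port = Number(raw.port ?? "8080");
      if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`invalid port: ${raw.port}`);
      }
      return { host: raw.host, port };
    }

Everything downstream takes a ServerConfig and never re-checks the host or the port.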
1718627440
But when you parse all arguments first before throwing error messages, you can create much better error messages, since they can be more holistic. To do that you need to represent the invalid configuration as a type.
geon
Sure. Then you return that validated data structure from the parsing function and never touch the invalid data structure again. That's exactly what "Parse, don't validate" means.
12_throw_away
> To do that you need to represent the invalid configuration as a type
Right - and one thing that keeps coming up for me is that, if you want to maintain complex invariants, it's quite natural to express them in terms of the domain object itself (or maybe, ugh, a DTO with the same fields), rather than in terms of input constraints.
makeitdouble
The base assumption is that parsing upfront costs less than validating along the way. I think that's a common case, but not common enough to apply it as a generic principle.
For instance, if validating parameter values requires multiple trips to a DB or other external system, weaving the calls into the logic can spare duplicating these round trips. Light "surface" validation can still be applied, but that's not what we're talking about here, I think.
MrJohz
It's not about costing less, it's about program structure. The goal should be to move from interface type (in this case a series of strings passed on the command line) to internal domain type (where we can use rich data types and enforce invariants like "if server, then all server properties are specified") as quickly as possible. That way, more of the application can be written to use those rich data types, avoiding errors or unnecessary defensive programming.
Even better, that conversion from interface type to internal type should ideally happen at one explicit point in the program: a function call which rejects all invalid inputs and returns a type that enforces the invariants we're interested in. That way, we have a clean boundary point between the outside world and the inside one.
This isn't a performance issue at all, it's closer to the "imperative shell, functional core" ideas about structuring your application and data.
lmm
> if validating parameter values requires multiple trips to a DB or other external system, weaving the calls in the logic can spare duplicating these round trips
Sure, but probably at the cost of leaving everything in a horribly inconsistent state when you error out partway through. Which is almost always not worth it.
bigstrat2003
Yeah, the "parse, don't validate" advice seems vacuous to me because of this. Someone is doing that validation. I think the advice would perhaps be phrased better as "try to not reimplement popular libraries when you could just use them".
lock1
When I first saw the "Parse, don't validate" title, it struck me as a catchy but perhaps unnecessarily clever catchphrase: too ambiguous to be meaningful for anyone outside the target audience (Haskellers, in this case).
That said, I fully agree with the article content itself. It basically just boils down to:
When you create a program, eventually you'll need to process and check whether input data is valid or not. In a C-like language, you have two options:

    void validate(struct Data d);

or

    struct ValidatedData;
    ValidatedData validate(struct Data d);

"Parse, don't validate" is just trying to say: don't do `void validate(struct Data d)` (a procedure returning `void`), do `ValidatedData validate(struct Data d)` (a function returning `ValidatedData`) instead. It doesn't mean you need to explicitly create or name everything as a "parser". It also doesn't mean "don't validate": inside `ValidatedData validate(struct Data d)` you'll eventually have "validation" logic similar to the `void` counterpart.
Specifically, the article tries to teach folks to utilize the type system to their advantage. Rather than praying to never forget invoking `validate(d)` on every single call site, make the type signature only accept `ValidatedData` type so the compiler will complain loudly if future maintainers try to shove `Data` type to it. This strategy offloads the mental burden of remembering things from the dev to the compiler.
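In TypeScript, the same trick is commonly done with a branded type (a sketch; the brand field is just an illustrative convention, not a standard API):

    interface Data {
      name: string;
    }

    // The brand makes ValidatedData a distinct type that a plain
    // Data value is not assignable to.
    type ValidatedData = Data & { readonly __brand: "ValidatedData" };

    function validate(d: Data): ValidatedData {
      if (d.name.length === 0) throw new Error("name must not be empty");
      return d as ValidatedData;
    }

    // Accepts only ValidatedData; passing a raw Data is a compile error.
    function save(d: ValidatedData): void {
      console.log(`saving ${d.name}`);
    }

    save(validate({ name: "ok" })); // compiles
    // save({ name: "ok" });        // rejected by the compiler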
I'm not exactly sure why the "Parse, don't validate" catchphrase keeps getting reused in other language communities. It's not clear to the non-FP community what the distinction between "parse" and "validate" is, let alone what a "parser combinator" is. Yet somehow other articles keep reusing this same catchphrase.
Lvl999Noob
The difference, in my opinion, is that you received the CLI args in the form

    some_cli <some args> --some-option --no-some-option

Before parsing, the argument array contains both the flag to enable the option and the flag to disable it. Validation would either throw an error or accept it as either enabled or disabled. But importantly, it wouldn't change the arguments. If the assumption is that the last option overwrites anything before it, then the CLI command is valid with the option disabled.
And now, correct behaviour relies on all the code using that option always making the same assumption.
Parsing, on the other hand, would create a new config where `option` is an enum: either enabled, disabled, or not given. No confusion about multiple flags or anything. It provides a single view for the rest of the program of what the input config was.
Whether that parsing is done by a third-party library or first-party code, declaratively or imperatively, is beside the point.
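As an illustrative sketch (not any particular library's API), the pair of flags can be folded left-to-right into a three-valued union during parsing:

    type OptionState = "enabled" | "disabled" | "not-given";

    // Last flag wins; the rest of the program only ever sees the
    // resolved OptionState, never the raw argv.
    function parseSomeOption(argv: string[]): OptionState {
      let state: OptionState = "not-given";
      for (const arg of argv) {
        if (arg === "--some-option") state = "enabled";
        if (arg === "--no-some-option") state = "disabled";
      }
      return state;
    }

    console.log(parseSomeOption(["--some-option", "--no-some-option"]));
    // "disabled": the last occurrence wins, decided in exactly one place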
andreygrehov
What is ValidatedData? A subset of Data that is valid? This makes no sense to me. The way I see it, you use 'validate' when the format of the data you are validating is the exact same format you are going to be working with right after, meaning the return type doesn't matter. A return type implies transformation, a write operation of sorts, whereas validation is always a read-only operation.
dwattttt
Sibling says this with code, but to distil the advice: reflect the result of your validation in the type system.
Then instead of validating a loose type & still using the loose type, you're parsing it from a loose type into a strict type.
The key point is you never need to look at a loose type and think "I don't need to check this is valid, because it was checked before"; the type system tracks that for you.
8n4vidtmkvmk
Everyone seems hung up on the type system, but I think the validity of the data is the important part. I'd still want to convert strings to ints, trim whitespace, drop extraneous props and all of that jazz even if I was using plain JS without types.
I still wouldn't need to check the inputs again because I know it's already been processed, even if the type system can't help me.
geon
This might be a clearer phrasing: "Parse and validate ONCE AND FOR ALL, instead of sprinkling validation everywhere you need to access the data."
But I suppose it isn't as catchy.
remexre
The difference between parse and validate is

    function parse(x: Foo): Bar { ... }
    const y = parse(x);

and

    function validate(x: Foo): void { ... }
    validate(x);
    const y = x as Bar;

Zod has a parser API, not a validator API.
yakshaving_jgt
Parsing includes validation.
The point is you don’t check that your string only contains valid characters and then continue passing that string through your system. You parse your string into a narrower type, and none of the rest of your system needs to be programmed defensively.
To describe this advice as “vacuous” says more about you than it does about the author.
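E.g. (a sketch, names mine): instead of checking a string and passing the same string along, parse it once into a narrower type that only the parser can produce:

    // The narrower type: a plain string is not assignable to it.
    type Identifier = string & { readonly __brand: "Identifier" };

    function parseIdentifier(s: string): Identifier {
      if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(s)) {
        throw new Error(`not a valid identifier: ${s}`);
      }
      return s as Identifier;
    }

    // Downstream code takes Identifier and needs no defensive checks.
    function lookup(name: Identifier): void {
      console.log(`looking up ${name}`);
    }

    lookup(parseIdentifier("foo_bar")); // fine
    // lookup("anything at all");       // compile error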
akoboldfrying
Yes, both are writing code. But nearly all the time, the constraints you want to express can be expressed with zod, and in that case using zod means you write less code, and the code you do write is more correct.
> Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.
Yes, judgement is required to make depending on zod (or any library) worthwhile. This is not different in principle from trusting those same things hold for TypeScript, or Node, or V8, or the C++ compiler V8 was compiled with, or the x86_64 chip it's running on, or the laws of physics.
jmull
Sure... the laws of physics last broke backwards compatibility at the Big Bang, Zod last broke backwards compatibility a few months ago.
12_throw_away
I like this advice, and yeah, I always try to make illegal states unrepresentable, possibly even to a fault.
The problem I run into here is - how do you create good error messages when you do this? If the user has passed you input with multiple problems, how do you build a list of everything that's wrong with it if the parser crashes out halfway through?
ffsm8
I think you're looking at it too literally. What people usually mean by "making invalid state unrepresentable" applies to the main application that holds your domain code, which should be separate from your inputs.
He even gives the example of Zod, which is a validation library that he defines to be a parser.
What he wants to say is: "I don't want to write my own validation in a CLI, give me a good API already that first validates and then converts the inputs into my declared schema"
MrJohz
> I don't want to write my own validation in a CLI, give me a good API already that first validates and then converts the inputs into my declared schema
But that _is_ parsing, at least in the sense of "parse, don't validate". It's about turning inputs into real objects representing the domain code that you're about to be working with. The result is still going to be a DTO of some description, but it will be a DTO with guaranteed invariants that are useful to you. For example, a post request shouldn't be parsed into a user object just because it shares a lot of fields in common with a user. Instead it should become a DTO with the invariants fulfilled that makes sense for a DTO. Some of those invariants are simple (like "dates should be valid" -> the DTO contains Date objects not strings), and some will be more complex like the "if the server is active, then the port also needs to be provided" restriction from the article.
This is one of the key ideas behind Zod - it isn't just trying to validate whether an object matches a certain schema, but it converts the result into a type that accurately expresses the invariants that must be in place if the object is valid.
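For instance, the article's "if the server is active, then the port also needs to be provided" rule can be captured along these lines (my sketch using a string discriminator, not the article's exact code):

    import { z } from "zod";

    // The rule lives in the schema, so the parsed type itself
    // rules out a server without a port.
    const Config = z.discriminatedUnion("mode", [
      z.object({
        mode: z.literal("server"),
        port: z.number().int().min(1).max(65535),
      }),
      z.object({ mode: z.literal("client") }),
    ]);

    type Config = z.infer<typeof Config>;
    // = { mode: "server"; port: number } | { mode: "client" }

    Config.parse({ mode: "server", port: 8080 });         // ok
    console.log(Config.safeParse({ mode: "server" }).success); // false: port missing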
ffsm8
I don't disagree with the desire to get a good API like that. I was just pointing out that this was the core of the author's desire, as 12_throw_away correctly pointed out that _true_ parsing and making invalid state unrepresentable forces you to error out on the first mismatch, which makes it impossible to raise multiple issues. The only way around that is to allow invalid state during the input phase.
Zod also allows invalid state as input, then attempts to shoehorn it into the desired schema, which still runs the validations the author was complaining about, just not in code he wrote.
8n4vidtmkvmk
Zod might be a validation library, but it also does type coercion and transforms. I believe that's what the author means by a parser.
goku12
Apparently not. The author cites the example of JSON parsing for APIs. You usually don't split it into a generic parse into native data types and then validate the result in memory (unless you're in a dynamically typed language and don't use a validation schema). Instead, the expected native data type of the result (composed using structs, enums, unions, vectors, etc.) is defined first, and then you try to parse the JSON into that data type. Any JSON errors and schema violations will error out in a single step.
mark38848
Just use optparse-applicative in PureScript. Applicatives are great for this and the library gives it to you for free.
bradrn
> Just use optparse-applicative in PureScript.
Or in Haskell!
adinisom
If we're talking about UI, the flip side is to not harm the user's data. So despite containing errors, it needs to be representable, even if it can't be passed further along to back-end systems.
For parsing specifically, there's literature on error recovery to try to make progress past the error.
ambicapter
Most validation libraries worth their salt give you options to deal with this sort of thing? They'll hand you an aggregate error with an 'errors' array, or they'll let you write an error message "prettify-er" to make a particular validation error easier to read.
pmarreck
Right, but that's validation, and this article is talking about parsing (not validating) into an already-correct structure by making invalid inputs unrepresentable.
So maybe the reason they were able to reduce the code is that they lost the ability to do good error reporting.
jpc0
How is getting an error array not making invalid input unrepresentable?
You either get the correctly parsed data or you get an error array. The incorrect input was never represented in code, versus a 0 value being returned or, even worse, random gibberish.
A trivial example: 1/0 should return DivisionByZero, not 0 or infinity or NaN or whatever else. You can then decide in your UI whether that is a case you want to handle as an error or as an edge case, but the parser knows it is not possible to represent.
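A sketch of that idea in TypeScript (the names are mine): division returns a result type, so the zero case becomes a value the caller has to handle rather than a silent Infinity.

    type DivResult =
      | { ok: true; value: number }
      | { ok: false; error: "DivisionByZero" };

    function divide(a: number, b: number): DivResult {
      if (b === 0) return { ok: false, error: "DivisionByZero" };
      return { ok: true, value: a / b };
    }

    const r = divide(1, 0);
    if (!r.ok) {
      // The UI decides: show an error, or treat it as an edge case.
      console.error(r.error);
    }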
lmm
You parse into an applicative validation structure, combine those together, and then once you've brought everything together you handle that as either erroring out with all the errors or continuing with the correct config. It's easier to do that with a parsing approach than a validating approach, not harder.
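A minimal sketch of that accumulation in TypeScript (not any particular library's API): each field parses to either errors or a value, and combining two results keeps all the errors.

    type Validated<T> =
      | { ok: true; value: T }
      | { ok: false; errors: string[] };

    // Applicative-style combine: collect the errors from both sides.
    function combine<A, B>(a: Validated<A>, b: Validated<B>): Validated<[A, B]> {
      if (a.ok && b.ok) return { ok: true, value: [a.value, b.value] };
      return {
        ok: false,
        errors: [...(a.ok ? [] : a.errors), ...(b.ok ? [] : b.errors)],
      };
    }

    const host: Validated<string> = { ok: false, errors: ["--host is required"] };
    const port: Validated<number> = { ok: false, errors: ["--port must be an integer"] };

    console.log(combine(host, port));
    // { ok: false, errors: ["--host is required", "--port must be an integer"] }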
Ygg2
Parsers can be made not to fail on the first error. You return either a parsed structure or an array of the errors found.
The HTML5 parser is notoriously friendly to errors; see the adoption agency algorithm.
Thaxll
This works if all errors are self-contained; stopping at the first one is fine too.
geysersam
Maybe you can use his `or` construct to allow a `--server` without `--port`, but then also add a default `error_message` property.
After parsing you check if `error_message` exists and raise that error.
akoboldfrying
Agree. It should definitely be possible to get error messages on par with what TypeScript gives you when you try to assign an object literal to an incompatibly typed variable; whether that's currently the case, and how difficult it would be to get there if not, I don't know.
nine_k
This is a recurring idea: "Parse, don't validate". Previously:
https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va... (2019, using Haskell)
https://www.lelanthran.com/chap13/content.html (April 2025, using C)
jetrink
The author credits Alexis King at the beginning and links to that post.
SloopJon
I don't see anything in the post or the linked tutorial that gives a flavor of the user experience when you supply an invalid option. I tried running the example, but I've forgotten too much about Node and TypeScript to make it work. (It can't resolve the @optique references.) What happens when you pass --foo, --target bar, or --port 3.14?
macintux
I had a similar question: to me, the output format “or” statement looks like it might deterministically pick one winner instead of alerting the user that they erred. A good parser is terrific, but it needs to give useful feedback.
Dragging-Syrup
Absolutely; I think calling the function xor would be more appropriate.
andrewguy9
Docopt!
Make the usage string be the specification!
A criminally underused library.
tomjakubowski
A great example of "declaration follows use" outside of C syntax.
fragmede
My favorite. A bit too much magic for some, but it seems well specified to me.
esafak
The "problem" is that some languages don't have rich enough type systems to encode all the constraints that people want to support with CLI options. And many programmers aren't that great at wielding the type systems at their disposal.
geon
I just recently implemented my own parser combinator lib in typescript too. It was surprisingly simple in the end.
This function parses a number in 6502 asm. So `255` in dec or `$ff` in hex: https://github.com/geon/dumbasm/blob/main/src/parsers/parseN...
I looked at several typescript libraries but they all felt off. Writing my own at least ensured I know how it works.
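Such a function might look roughly like this sketch (my reconstruction, not the linked code):

    type ParseResult = { value: number; rest: string } | undefined;

    // Parses a 6502-style number literal: decimal `255` or hex `$ff`.
    function parseNumber(input: string): ParseResult {
      const hex = input.match(/^\$([0-9a-fA-F]+)/);
      if (hex) return { value: parseInt(hex[1], 16), rest: input.slice(hex[0].length) };
      const dec = input.match(/^[0-9]+/);
      if (dec) return { value: parseInt(dec[0], 10), rest: input.slice(dec[0].length) };
      return undefined;
    }

    console.log(parseNumber("$ff")); // { value: 255, rest: "" }
    console.log(parseNumber("255")); // { value: 255, rest: "" }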
SoftTalker
I like just writing functions for each valid combination of flags and parameters. Anything that isn’t handled is default rejected. Languages like Erlang with pattern matching and guards make this a breeze.
bsoles
>> // This is a parser
>> const port = option("--port", integer());
I don't understand. Why is this a parser? Isn't it just a way of enforcing a type in a language that doesn't have types?
I was expecting something like a state machine that takes the command line text and parses it to validate the syntax and values.
hansvm
The heavy lifting happens in the definitions of `option` and `integer`. Those will take in whatever arguments they take in and output some sort of `Stream -> Result<Tuple<T, Stream>>` function.
That might sound messy but to the author's point about parser combinators not being complicated, they really don't take much time to get used to, and they're quite simple if you wanted to build such a library yourself. There's not much code (and certainly no magic) going on under the hood.
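A rough TypeScript sketch of that shape (the names are invented here, not Optique's internals): a parser consumes a prefix of argv and hands back the value plus the remaining stream.

    type Stream = string[];
    type Result<T> = { ok: true; value: T; rest: Stream } | { ok: false };
    type Parser<T> = (input: Stream) => Result<T>;

    // Matches `--name` and delegates the next token to the value parser.
    function option<T>(name: string, value: Parser<T>): Parser<T> {
      return (input) => {
        if (input[0] !== name) return { ok: false };
        return value(input.slice(1));
      };
    }

    function integer(): Parser<number> {
      return (input) => {
        const n = Number(input[0]);
        if (input.length === 0 || !Number.isInteger(n)) return { ok: false };
        return { ok: true, value: n, rest: input.slice(1) };
      };
    }

    const port = option("--port", integer());
    console.log(port(["--port", "8080"])); // { ok: true, value: 8080, rest: [] }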
The advantage of that parsing approach:
It's reasonably declarative. This seems like the author's core point. Parser-combinator code largely looks like just writing out the object you want as a parse result, using your favorite combinator library as the building blocks, and everything automagically works, with amazing type-checking if your language has such features.
The disadvantages:
1. Like any parsing approach, you have to actually consider all the nuances of what you really want parsed (e.g., conditional rules around whitespace handling). It looks a little to me (just from the blog post, not having examined the inner workings yet) like this project side-stepped that by working with the `Stream` type as just the `argv` list, allowing you to be able to say things like "parse the next blob as a string" without also having to encode whitespace and blob boundaries.
2. It's definitely slower (and more memory-intensive) than a hand-rolled parser, and usually also worse in that regard than other sorts of "auto-generated" parsing code.
For CLI arguments, especially if they picked argv as their base stream type, those disadvantages mostly don't exist. I could see it performing poorly for argv parsing for something like `cp` though (maybe not -- maybe something like `git cp`, which has more potential parse failures from delimiters like `--`?), which has both options and potentially ginormous lists of files; if you're not very careful in your argument specification then you might have exponential backtracking issues, and where that would be blatantly obvious in a hand-rolled parser it'll probably get swept under the rug with parser combinators.
m463
This kind of stuff is what makes me appreciate python's argparse.
It's a genuine pleasure to use, and I use it often.
If you dig a little deeper into it, it does all the type and value validation, file validation, it does required and mutually exclusive args, it does subargs. And it lets you do special cases of just about anything.
And of course it does the "normal" stuff like short + long args, boolean args, args that are lists, default values, and help strings.
MrJohz
Actually, I think argparse falls into the same trap that the author is talking about. You can define lots of invariants in the parser, and say that these two arguments can't be passed together, or that this argument, if specified, requires these arguments to also be specified, etc. But the end result is a namespace with a bunch of key-value pairs on it, and argparse doesn't play well with typing systems like mypy or pyright. So the rest of the tool has to assume that the invariants were correctly specified up-front.
The result is that you often still see this kind of defensive programming, where argparse ensures that an invariant holds, but other functions still check the same invariant later on, because they might have been called a different way or just because the developer isn't sure whether everything was checked where they are in the program.
What I think the author is looking for is a combination of argparse and Pydantic, such that when you define a parser using argparse, it automatically creates the relevant Pydantic classes that define the type of the parsed arguments.
bvrmn
In the general case, generating CLI options from app models leads to horrible CLI UX. The opposite is also true: working with "nice" CLI options as direct app models is horrendous.
You need a boundary to convert nice opts into nice types. For example, Pydantic models could take an argparse namespace and convert it to something manageable.
MrJohz
I mean, that's much the same as working with web APIs or any other kind of interface. Your DTO will probably be different from your internal models. But that doesn't mean it can't contain invariants, or that you can't parse it into a meaningful type. A DTO that's just a grab-bag of optional values is a pain to work with.
Although in practice, I find clap's approach works pretty well: define an object that represents the parsed arguments as you want them, with annotations for details that can't be represented in the type system, and then derive a parser from that. It works because Rust has ADTs and other tools for building meaningful types, and because the derive process can do so much. The result is an arguments object that you can quite easily pass to the function which runs the command.
sgarland
Precisely my thought. I love argparse, but you can really back yourself into a corner if you aren’t careful.
js2
> What I think the author is looking for is a combination of argparse and Pydantic
Not quite that, but https://typer.tiangolo.com/ is fully type driven.
hahn-kev
It's almost like you want compile time type safety
MrJohz
You can have that with Mypy and friends in Python, and Typescript in the JS world. The problem is that older libraries often don't utilise that type safety very well because their API wasn't designed for it.
The library in the original post is essentially a Javascript library, but it's one designed so that if you use it with Typescript, it provides that type safety.