A steam locomotive from 1993 broke my yarn test
70 comments · April 2, 2025
bouke
tlb
Exec doesn't know about shell aliases. Only what's in the $PATH.
I liked the shell in MPW (Mac Programmer's Workshop, pre-NeXT) where common commands had both long names and short ones. You'd type the short ones at the prompt, but use the long, unambiguous ones in scripts.
Kwpolska
PowerShell has long commands and short aliases, but the aliases can still shadow executables, e.g. the `sc` alias for `Set-Content` shadows `sc.exe` for configuring services. And you only notice when you see no output and weird text files in the current working directory.
szszrk
The networking crowd probably thinks it's obvious, because of things like the Cisco CLI, or even MikroTik. Or the `ip` CLI as well, I guess.
I never bothered to check the origin of that pattern.
hnlmorg
I've taken entire web farms offline due to an unexpected expansion of a command on a Cisco load balancer.
The command in question was:
administer-all-port-shutdown
(Or something to that effect; it's been many years now.) And so I went to log in via serial port (like I said, many years ago, so this device didn't have SSH), didn't get the prompt I was expecting. So I typed the user name again:
admin
And shortly afterwards all of our alarms started going off. The worst part of the story is that this happened twice before I realised what I'd done!
I still maintain that the full command is a stupid name if it means a phrase as common as “admin” can turn your load balancer off. But I also learned a few valuable lessons about being more careful when running commands on Cisco gear.
skykooler
Theoretically you could do this in Linux by calling /usr/bin/sl or whatever - but since various distros put binaries in different places, that would probably cause more problems than it could solve.
Tractor8626
No, this is not the real problem. There is nothing you can do if your `bash`, `ls`, `cat`, `grep`, etc. do something they're not supposed to do.
Proper error handling would be helpful though.
Etheryte
The fact that Jest blindly calls whatever binary is installed as `sl` is downright reckless and that's an understatement. If they need the check, a simple way to avoid the problem would be to install it as a dependency, call `require.resolve()` [0] and Bob's your uncle. If they don't want the bundle size, write a heuristic, surely Meta can afford it. Blindly stuffing strings into exec and hoping it works out is not fine.
[0] https://nodejs.org/api/modules.html#requireresolverequest-op...
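A minimal sketch of that approach, with `@example/sapling-cli` as a purely hypothetical package name used only for illustration:

```javascript
// Sketch: resolve a declared dependency instead of trusting $PATH.
// "@example/sapling-cli" is a hypothetical package name.
let saplingEntry = null;
try {
  saplingEntry = require.resolve('@example/sapling-cli');
} catch (err) {
  // Not declared as a dependency: skip Sapling detection entirely
  // instead of exec'ing whatever binary happens to be called `sl`.
}
```

If `require.resolve()` throws, the tool knows the dependency was never declared and can bail out rather than probing the user's $PATH.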
Joker_vD
"That's just, like, your opinion, man". There is another school of thought that postulates that an app should use whatever tools that exist in the ambient environment that the user has provided the app with, instead of pulling and using random 4th-party dependencies from who knows where. If I symlinked e.g. "find", or "python3", or "sh", or "sl" to my weird interceptor/preprocessor/trapper script, that most likely means that I do want the apps to use it, damn it, not their own homebrewed versions.
> a simple way to avoid the problem would be to install it as a dependency
I once saw a Makefile that had "apt remove -y [libraries and tools that somehow confuse this Makefile] ; apt install -y [some other random crap]" as a pre-install step, I kid you not. Thankfully, I didn't run it with "sudo make" (as the README suggested), but holy shit, the presumptuousness of some people.
The better way would have been to have the "Sapling CLI" explicitly declared as a dependency, and checked for, somehow. But as the whole history of dev experience shows, that's too much to ask of people, and dev containers are, sadly, the sanest and most robust way to go.
Etheryte
I think where our opinions differ is what boundaries this logic should cross. When I'm in Bash-land, I'm happy that my Bash-isms use the rest of what's available in the Bash env. When I'm in Node, likewise, as this is an expected and desirable outcome. Where this doesn't sit right with me is when a Node-land script crosses this boundary and starts mucking around with things from a different domain.
In general, I would want everything to work by the principle of least surprise, so Node stuff interacts with Node dependencies, Python does Python things, Bash does Bash env, etc. If I need one to interact with the other, I want to be explicit about it, not have some spooky action at a distance.
blueflow
What else should the test runner do?
pavel_lishin
There must be a better way to tell if a repo is a Sapling repo than by running some arbitrary binary, right?
Symbiote
For Git one could look for .git/config. There must be something equivalent.
pasc1878
Use the full path of sl rather than relying on $PATH, the same way cron and macOS GUI apps do, I assume for this exact reason.
stonegray
Is the full path guaranteed? For example, Homebrew, Snap, and apt might each put it in a different place. $PATH is a useful tool.
Joker_vD
How would knowing the full path help you anyway? It's either in "/usr/bin/sl", or "/usr/local/bin", or "~/.local/bin", now what?
By the way, believe it or not, POSIX compliance requires existence of only two directories (/dev and /tmp) and three files (/dev/console, /dev/null, and /dev/tty) on the system; everything else is completely optional, including existence of /bin, /etc, and /usr.
skipants
What if the full path is just `/usr/bin/sl`?
charcircuit
Finding the full path of sl requires looking at $PATH
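In a POSIX shell, `command -v` is exactly that lookup, and it consults $PATH to do it:

```shell
# `command -v` prints the path (or alias/function) the shell would
# use for a name; a non-zero exit means it isn't on $PATH at all.
command -v sl || echo "no sl on PATH"
```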
GTP
Just from the title, I suspected that Steam Locomotive had something to do with it, so I quickly glanced through the article up to the point where the locomotive shows up. I've sometimes toyed with the idea of making a version called Slow Locomotive, where the train slows down every time you press Ctrl-C.
dullcrisp
If you press ^Z does it stop entirely?
And do these sorts of ideas ever get you into trouble?
throwanem
I once reimplemented in Perl Nethack's logic for phase-of-moon and Friday 13th computation and notification, and added the resulting cute little script to the root .profile on our consulting firm's main web hosting boxes.
I didn't get fired when my boss found it by surprise a couple months (and lunar cycles) later, but I did learn a valuable lesson about how one may wisely limit one's exercise of whimsy.
Google took a few years more to achieve the same discovery, as I recall, but presumably this has to do with pedagogical methods involving not as many ex-sergeants.
fifticon
As a systems programmer employed for 30+ years, when I read a story like this I get angry at the highly piled, brittle system, not at the guy having sl installed. I am aware there exists a third option of not getting angry in the first place, but I hate opaque, non-robust crap. This smells like everything I hate about front-end tooling: ignorance and arrogance in perfect balance.
ericmcer
What would you have done differently? They were dependent on sl (Sapling, a Facebook source control system), but the user had overwritten the expected path with a shell script. That is not something most engineers would build around... "what if the user is overwriting the path to dependencies with nonsense shell scripts?"
It doesn't feel like something that is entirely the Jest maintainers fault, I am not sure why Jest needs a source control system but there are probably decent reasons.
Like if I overwrite `ls` to a shell script that deletes everything on my desktop and then I execute code you wrote that relies on `ls` are you to blame because you didn't validate its behavior before calling it?
MD87
The difference is that `ls` is specified in POSIX and everyone has roughly the same expectations of what it does.
Nothing specifies what a binary called `sl` does. The user didn't "overwrite" anything. They just had an `sl` binary that was not the `sl` binary Jest expects. Arguably they had the more commonly known binary with that name.
mmlb
Use the lessons learned from those before us in more heterogeneous days, aka inspect the binaries you're going to call out to for fitness. Things like "check if grep is GNU or BSD" or "check if sl is Sapling or Steam Locomotive".
I've done that a bit to deal with macOS's crippled Bash, for example.
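A sketch of that kind of fitness probe; matching on the version string ("GNU grep" here) is a common trick, but the exact strings are assumptions to verify against the systems you actually target:

```shell
# Probe which grep we actually have before relying on GNU-only flags.
if grep --version 2>/dev/null | grep -q 'GNU grep'; then
  GREP_FLAVOR=gnu
else
  GREP_FLAVOR=bsd   # macOS/BSD grep, or no --version support at all
fi
echo "grep flavor: $GREP_FLAVOR"
```

The same pattern would let a tool distinguish a Sapling `sl` from a steam locomotive before trusting its output.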
ploxiln
jest (or whatever was trying to auto-detect a "sapling" repo) should take explicit configuration to enable "sapling" or "mercurial" or whatever integration. And not try to run "sl" 16+ times in various modules/threads trying to auto-detect it.
"automagic" things trying to be easy and helpful is really a significant source of my stress fixing software these days.
sixothree
I hate to say it but choosing to name something sl in the first place is about as arrogant as you can get. I just can’t understand the world in which sl was an acceptable name to use much less an acceptable executable to have a dependency on.
Tractor8626
Totally happens in C code too. Maybe even more often.
Just today I had Proxmox not working because of an invalid localhost line in /etc/hosts. And I had a problem logging in to KDE because /etc/shadow was owned by root.
In both cases, only incomprehensible error messages. Luckily the solutions were googleable.
salmonellaeater
A useful error message would have made this a 1-minute investigation. The "fix" of trying to detect this specific program is much too narrow. The right fix is to change Yarn to print a message about what it was trying to do (check for a Sapling repo) and what happened instead. This is also likely a systemic problem, so a good engineer would go through the whole program and fix other places that need it.
burnte
I discovered sl in 1999, and forgot about it. I rediscovered it 5 years later when, on my personal server, I typoed ls as sl and hit enter. A steam locomotive drove across my screen, I remembered installing it 5 years earlier, and I laughed my butt off. I wound up pranking myself, and it took 5 years to pay off!
pjc50
Plus points for using strace. It's one of those debugging tools everyone knows about for emergencies that can't be solved at a higher level, and a great convenience of using Linux. The Windows ETW system is much harder to use, and I'm not sure it's even possible at all under OSX security.
throwway120385
I have solved an incredible number of problems just by looking at strace output very carefully. Strace combined with Wireshark or Tcpdump are incredible as a toolset for capturing what a program is doing and for capturing what the effect is either on the USB or the NIC.
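The post's hunt boils down to roughly one line: trace which binaries a command launches. A sketch, with `sh -c ls` standing in for the real command (e.g. `yarn test`):

```shell
# -f follows child processes; -e trace=execve keeps only program
# launches. (`sh -c ls` stands in for the real command under test.)
if command -v strace >/dev/null; then
  strace -f -e trace=execve -o trace.log sh -c 'ls > /dev/null'
  grep execve trace.log   # one line per binary launched
fi
```

Scanning that output for an unexpected path (say, `/usr/games/sl`) is exactly the kind of needle this technique finds quickly.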
frizlab
macOS has dtrace which is actually nicer to use. Cannot be used on all processes when SIP is on though.
pjc50
Last time I tried SIP prevented me from using it on my own processes, but I may have been holding it wrong.
dontlaugh
macOS’s Solaris-inspired dtrace is actually nicer, especially the UI.
mrguyorama
The chrome folks built https://randomascii.wordpress.com/2015/04/14/uiforetw-window... to improve ETW usability.
You usually don't need that full industrial-level tracing on Windows, though! Process Monitor is 95% of the solution for most people, and provides very similar functionality to strace, while being a lot easier to read.
snovymgodym
The real story here is that the author and his coworker wasted a bunch of time tracking down this bug because their dev environment was badly set up.
> his system (MacOS) is not affected at all versus mine (Linux)
> nvm use v20 didn't fix it
If you are writing something like NodeJS, 99% of the time it will only ever be deployed server-side on Linux, most likely in a container.
As such, your dev environment should include a dev dockerfile and all of your work should be done from that container. This also has the added benefit of marginally sandboxing the thousands of mystery-meat NPM packages that you will no doubt be downloading from the rest of your machine.
There is zero reason to even mess with a "works on my machine" or a "try a different node version" situation on this kind of NodeJS project. Figure out your dependencies, codify them in your container definition, and move on. Oh, your tests work on MacOS? Great, it could not matter less because you're not deploying there.
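For instance, a minimal dev Dockerfile along those lines (the image tag and yarn commands are illustrative assumptions, not taken from any particular project):

```dockerfile
# Pin the Node version the production image uses, so "works on my
# machine" and "nvm use v20" never enter the picture.
FROM node:20-bookworm-slim

WORKDIR /app

# Install dependencies from the lockfile only; no host-machine state.
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

COPY . .
CMD ["yarn", "test"]
```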
Honestly, kind of shocking that a company like Cloudflare wouldn't have more standard development practices in place.
bilekas
> If you are writing something like NodeJS, 99% of the time it will only ever be deployed server-side on Linux, most likely in a container.
I'm really curious where you're getting this impression from? I for one never run Docker containers on my dual-core Atom server with 4 GB of RAM, but I have a lot of Node services running.
> There is zero reason to even mess with a "works on my machine" or a "try a different node version" situation on this kind of NodeJS project
There are a lot of reasons to investigate these things; in fact, that's what I would expect from larger, more industry-involved companies. Knowing the finer nuances and details of these things can be important. What might seem benign can just as quickly become something really dangerous or important when working at a huge scale such as Cloudflare's.
Edit: BTW, I do agree mistakes were made, and the hell that is NPM supply-chain attacks is terrifying. Those are the points I would focus on more, personally.
snovymgodym
> I'm really curious where you're getting this impression from?
Experience mainly, though perhaps I live in a bubble. My "99%" assertion was more pointed at the "server-side on Linux" part than the "most likely in a container" part.
Really the point I wanted to make was that your development and test environment should be the same as, or as close as possible to, your production environment.
If your app is going to be deployed on Red Hat Enterprise Linux (whether in a container, VM, or bare metal), then don't bother chasing down cryptic NPM errors that arise when you run it on Ubuntu, Mac, or Windows. Just run everything out of a RHEL Docker container which mimics your production environment and spend your limited time doing the actual task at hand. It simply is not worth your time to rabbit-hole endlessly on NPM errors that happen in an environment you'll never deploy to.
> There are a lot of reasons to investigate these things, ...
Sure, I don't really disagree with that and generally it's good to have a solid understanding of your tools and what lies in the layers below the abstractions that you normally work with. The detective work in the post is solid.
But the thing is that the author was supposed to be learning NodeJS in order to ramp up on a React project. But he got derailed (heh) by this side quest which delayed him being able to do the actual work he set out to do. Whether or not it was worth the time is subjective. But either way, it would not have happened in the first place with better dev environment practices.
bilekas
> Really the point I wanted to make was that your development and test environment should be the same as, or as close as possible to, your production environment.
I’m really glad to hear that actually, I think you did make that point but it was a bit overlooked with the other points.
About having better dev environments, I think you're also spot on, not just with infrastructure but also with support from other, maybe more experienced, developers who could identify these things early and share knowledge. For me at least, that's one of the main development requirements: if you're not learning, you should be teaching.
throwanem
The last time I dealt with a non-dockerized Node deployment, at work or at home, was in 2013. That this was also the year of Docker's initial release is no coincidence at all.
bilekas
I think for production it's a good move; it just doesn't feel like a safe assumption that the majority of Node services are containerized.
Kwpolska
Naming your source control tool after a common mistyping of ls is such a Facebook move.
m4rtink
Yeah! What are they going to do next, call a programming language "go" or something? Even Google wouldn't be that stupid. Imagine Googling for that and getting only irrelevant stuff!
12345hn6789
Go slice array differences golang
computerfriend
Naming it after a commonly installed program that has been around since 1993 is also some hubris.
mrguyorama
The reality is that most devs writing code in Facebook were not alive in 93, and certainly weren't Linux admins at that time.
Does Facebook even have any greybeards in the trenches?
wrs
I had a similar problem where builds were timing out. When I looked at the build log, there was a calendar in it (?!). I eventually figured out a script was calling `date`, and something I had `go install`ed (I think) had a test binary called `date` that was an interactive calendar.
rossdavidh
I demonstrated that I am not a serious or good programmer by installing steam locomotive on my Linux laptop immediately after reading this.
normie3000
> git commit, which hooked into yarn test
There's the real wtf. How are you meant to commit a failing test? Or any other kind of work in progress?
zdragnar
You mark the failing test with "failing". The test runner knows that it might fail but doesn't fail the suite.
I'm not a big fan of git commit hooks, but they can give faster feedback than waiting for a CI runner to point out something that should have been obvious, if you keep them lightweight (such as style linting or compiler warnings).
Edit: replaced "Todo" with "failing" since we're talking about jest specifically: https://jestjs.io/docs/api#testfailingname-fn-timeout
computerfriend
git commit -n
So the real problem is that Jest just executes whatever `sl` resolves to. The fix they intend to release doesn't address that; it merely tries to recognise the train steaming through. How is this acceptable behaviour from a test runner? It looks like a disaster waiting to happen. What if I have `alias sl='rm -rf /'`, as one typically wants such a command close at hand?