
My failed attempt to shrink all NPM packages by 5%

franga2000

We wouldn't have to worry about over-the-wire package size if the modern DevOps approach wasn't "nuke everything, download from the Internet" every build.

Back in my Java days, most even small-time dev shops had a local Maven registry that would pass through and cache the big ones. A CI job, even if the "container" was nuked before each build, would create maybe a few kilobytes of Internet traffic, possibly none at all.

Now your average CI job spins up a fresh VM or container, pulls a Docker base image, apt installs a bunch of system dependencies, pip/npm/... installs a bunch of project dependencies, packages things up and pushes the image to the Docker registry. No Docker layer caching because it's a fresh VM, no package manager caching because it's a fresh container, no object caching because... you get the idea.

Even if we accept that the benefits of the "clean slate every time" approach outweigh the gross inefficiency, why aren't we at least doing basic HTTP caching? I guess ingress is cheap and the egress on the other side is "someone else's money".

alimbada

After reading the article, this comment and the comment thread further down on pnpm[1], it feels to me like the NPM team are doing everyone a disservice by ignoring the inefficiencies in the packaging system. It may not be deliberate or malicious, but they could easily have provided better solutions than the one proposed in the article which, in my opinion, is a band-aid solution at best. The real fix would be to implement what you mention here: local registry and caching, and/or symlinking a la pnpm.

[1] https://news.ycombinator.com/item?id=42841658

j-krieger

In every org I've worked with, we had a local dependency mirror in the GitOps architecture.

robertlagrant

Lots of places use a cache like Artifactory so they don't get slammed with costs, and are resilient to network outages and dependency builds vanishing.

Spivak

I really don't want to go back to the old world where every part of your build is secretly stateful and fails in mysterious hard to reproduce ways.

You can and should have your own caching proxy for all your builds but local caches are evil.

Terr_

Yeah, I would also note that in addition to speed/transfer-costs, having an organizational package proxy is useful for reproducibility and security.

sd9

The final pro/cons list: https://github.com/npm/rfcs/pull/595#issuecomment-1200480148

I don't find the cons all that compelling to be honest, or at least I think they warrant further discussion to see if there are workarounds (e.g. a choice of compression scheme for a library like typescript, if they would prefer faster publishes).

It would have been interesting to see what eventually played out if the author hadn't closed the RFC themselves. It could have been the sort of thing that eventually happens after 2 years, but then quietly makes everybody's lives better.

jerf

"I don't find the cons all that compelling to be honest"

This is a solid example of how things change at scale. Concerns I wouldn't even think about for my personal website become things I need to think about for the download site being hit by 50,000 of my customers become big deals when operating at the scale of npm.

You'll find those arguments the pointless nitpicking of entrenched interests who just don't want to make any changes, until you experience your very own "oh man, I really thought this change was perfectly safe and now my entire customer base is trashed" moment, and then suddenly things like "hey, we need to consider how this affects old signatures and the speed of decompression and just generally whether this is worth the non-zero risks for what are in the end not really that substantial benefits" start to sound like prudence rather than obstruction.

I do not say this as the wise Zen guru sitting cross-legged and meditating from a position of being above it all; I say it looking at my own battle scars from the Perfectly Safe things I've pushed out to my customer base, only to discover some tiny little nit caused me trouble. Fortunately I haven't caused any true catastrophes, but that's as much luck as skill.

Attaining the proper balance between moving forward even though it incurs risk and just not changing things that are working is the hardest part of being a software maintainer, because both extremes are definitely bad. Everyone tends to start out in the former situation, but then when they are inevitably bitten it is important not to overcorrect into terrified fear of ever changing anything.

pif

> This is a solid example of how things change at scale.

5% is 5% at any scale.

michaelmior

Yes and no. If I'm paying $5 a month for storage, I probably don't care about saving 5% of my storage costs. If I'm paying $50,000/month in storage costs, 5% savings is a lot more worthwhile to pursue

gregmac

5% off your next lunch and 5% off your next car are very much not the same thing.

horsawlarway

5% of newly published packages, with a potentially serious degradation to package publish times for those who have to do that step.

Given his numbers, let's say he saves 100 TB of bandwidth over a year. At AWS egress pricing... that's $5,000 total saved.

And arguably - NPM is getting at least some of that savings by adding CPU costs to publishers at package time.

Feels like... not enough to warrant a risky ecosystem change to me.

syncsynchalt

In some scenarios the equation flips, and the enterprise is looking for _more_ scale.

The more bandwidth that Cloudflare needs, the more leverage they have at the peering table. The larger GitHub's largest repo (the @types / DefinitelyTyped repo owned by Microsoft) gets, the more experience the owner of GitHub (also Microsoft) gets in hosting the world's largest git repos.

I would say this qualifies as one of those cases, as npmjs is hosted on Azure. The more resources that NPM needs, the more Microsoft can build towards parity with AWS's footprint.

Aicy

That's right, and 5% of a very small number is a very small number. 5% of a very big number is a big number.

knighthack

Do you even know how absolute numbers work vis-à-vis percentages?

sd9

I agree with everything you said, but it doesn’t contradict my point

jerf

I'm saying you probably don't find them compelling because from your point of view, the problems don't look important to you. They don't from my point of view either. But my point of view is the wrong point of view. From their point of view, this would be plenty to make me think twice, and several times over past that, before changing something so deeply fundamental to the system for what is a benefit that nobody who is actually paying the price for the package size seems to be particularly enthusiastic about. If the people paying the bandwidth bill aren't even that excited about a 5% reduction, then the cost/benefit analysis tips over into essentially "zero benefit, non-zero cost", and that's not very compelling.

alt227

I feel massively increasing publish time is a valid reason not to push this though considering such small gains and who the gains apply to.

scott_w

I agree, going from 1 second to 2.5 minutes is a huge negative change, in my opinion. I know publishing a package isn't something you do 10x a day but it's probably a big enough change that, were I doing it, I'd think the publish process is hanging and keep retrying it.

pletnes

If you’re working on the build process itself, you’ll notice it a lot!

rererereferred

Since it's backwards compatible, individual maintainers could enable it in their own pipeline if they don't have issues with the slowdown. It sounds like it could be a single flag in the publish command.

michaelmior

Probably not worth the added complexity, but in theory, the package could be published immediately with the existing compression and then in the background, replaced with the Zopfli-compressed version.

aja12

> Probably not worth the added complexity, but in theory, the package could be published immediately with the existing compression and then in the background, replaced with the Zopfli-compressed version.

Checksum matters aside, wouldn't that turn the 5% bandwidth savings into an almost double bandwidth increase though? IMHO, considering the complexity to even make it a build time option, the author made the right call.

Null-Set

No, it can't because the checksums won't match.

hiatus

Wouldn't that result in a different checksum for package-lock.json?

advisedwang

The pros aren't all that compelling either. The npm repo is the only group that this would really be remotely significant for, and there seemed to be no interest. So it doesn't take much of a con to nix a solution to a non-problem.

ForOldHack

Every single download, until the end of time, is affected: it speeds up the servers, speeds up the updates, saves disk space on the update servers, and saves on bandwidth costs and usage.

Everyone benefits; the only cost is an ultra-microscopic amount of time on the front end and a tiny cost on the client end, and for a very significant number of users, time and money saved. The examples of compression here...

pclmulqdq

Plus a few years of a compression expert writing a JS implementation of what was likely some very cursed C. And someone auditing its security. And someone maintaining it.

n4r9

I felt the same. The proposal wasn't rejected! Also, performance gains go beyond user stories - e.g. they reduce infra costs and environmental impact - so I think the main concerns of the maintainers could have been addressed.

IshKebab

> The proposal wasn't rejected!

They soft-rejected by requiring more validation than was reasonable. I see this all the time. "But did you consider <extremely unlikely issue>? Please go and run more tests."

It's pretty clear that the people making the decision didn't actually care about the bandwidth savings, otherwise they would have put the work in themselves to do this, e.g. by requiring Zopfli for popular packages. I doubt Microsoft cares if it takes an extra 2 minutes to publish Typescript.

Kind of a wild decision considering NPM uses 4.5 PB of traffic per week. 5% of that is 225 TB/week, which according to my brief checks costs around $10k/week!

I guess this is a "not my money" problem fundamentally.

johnfn

This doesn't seem quite correct to me. They weren't asking for "more validation than was reasonable". They were asking for literally any proof that users would benefit from the proposal. That seems like an entirely reasonable thing to ask before changing the way every single NPM package gets published, ever.

I do agree that 10k/week is non-negligible. Perhaps that means the people responsible for the 10k weren't in the room?

bombcar

Or another way to look at it is it's just (at most!) 5% off an already large bill, and it might cost more than that elsewhere.

And I can buy 225 TB of bandwidth for less than $2k, I assume Microsoft can get better than some HN idiot buying Linode.

arccy

massively increase the open source github actions bill for runners running longer (compute is generally more expensive) to publish for a small decrease in network traffic (bandwidth is cheap at scale)?

lyu07282

> which according to my brief checks costs around $10k/week

That's the market price though, for Microsoft its a tiny fraction of that.

macspoofing

> I don't find the cons all that compelling to be honest

I found it reasonable.

The 5% improvement was balanced against the cons of increased CLI complexity, lack of a native JS zopfli implementation, and slower compression... and 5% just wasn't worth it at the moment - and I agree.

>or at least I think they warrant further discussion

I think that was the final statement.

sd9

Yes, but there’s a difference between “this warrants further discussion” and “this warrants further discussion and I’m closing the RFC”. The latter all but guarantees that no further discussion will take place.

philipwhiuk

No it doesn't. It only does that if you think discussion around future improvements belongs in RFCs.

mootothemax

> I don't find the cons all that compelling to be honest, or at least I think they warrant further discussion

It needs a novel JS port of a C compression library, which will be wired into a heavily-used and public-facing toolchain, and is something that will ruin a significant number of people's days if it breaks.

For me, that kind of ask needs a compelling use case from the start.

cedws

Last I checked npm packages were full of garbage including non-source code. There's no reason for node_modules to be as big as it usually is, text compresses extremely well. It's just general sloppiness endemic to the JavaScript ecosystem.

eitau_1

It's not even funny:

  $ ll /nix/store/*-insect-5.9.0/lib/node_modules/insect/node_modules/clipboardy/fallbacks/*
  /nix/store/…-insect-5.9.0/lib/node_modules/insect/node_modules/clipboardy/fallbacks/linux:
  .r-xr-xr-x 129k root  1 Jan  1970 xsel

  /nix/store/…-insect-5.9.0/lib/node_modules/insect/node_modules/clipboardy/fallbacks/windows:
  .r-xr-xr-x 444k root  1 Jan  1970 clipboard_i686.exe
  .r-xr-xr-x 331k root  1 Jan  1970 clipboard_x86_64.exe
(clipboardy ships executables and none of them can be run on NixOS btw)

dicytea

I don't know why, but clipboard libraries tend to be really poorly implemented, especially in scripting languages.

I just checked out clipboardy and all they do is dispatch binaries from the path and hope it's the right one (or if it's even there at all). I think I had a similar experience with Python and Lua scripts. There's an unfunny amount of poorly-written one-off clipboard scripts out there just waiting to be exploited.

I'm only glad that the go-to clipboard library in Rust (arboard) seems solid.

cedws

Are they reproducible? Shipping binaries in JS packages is dodgy AF - a Jia Tan attack waiting to happen.

eitau_1

The executables are vendored in the repo [0].

[0] https://github.com/sindresorhus/clipboardy/tree/main/fallbac...

tanepiper

Yep, I wrote a script that starts at a root `node_modules` folder and iterates through to remove anything not required (dotfiles, Dockerfile, .md files, etc.) - in one of our smaller apps this removes about 25 MB of fluff; some packages have 60-70 MB of crap removed.

vinnymac

You might be interested in e18e if you would like to see that change: https://e18e.dev/

They’ve done a lot of great work already.

KTibow

Does this replace ljharb stuff?

hombre_fatal

One of the things I like about node_modules is that it's not purely source code and it's not purely build artifacts.

You can read the code and you can usually read the actual README/docs/tests of the package instead of having to find it online. And you can usually edit library code for debugging purposes.

If node_modules is taking up a lot of space across a bunch of old projects, just write the `find` script that recursively deletes them all; you can always run `npm install` in the future when you need to work on that project again.
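
For illustration, a minimal Node-flavoured sketch of that cleanup (the starting directory comes from the command line; a `find ... -prune -exec rm -rf` one-liner does the same job):

    // delete-node-modules.ts - rough sketch, not a polished tool.
    // Walks a directory tree and deletes every node_modules folder it finds.
    import { readdirSync, rmSync } from 'node:fs';
    import { join } from 'node:path';

    function removeNodeModules(dir: string): void {
      for (const entry of readdirSync(dir, { withFileTypes: true })) {
        if (!entry.isDirectory()) continue;
        const full = join(dir, entry.name);
        if (entry.name === 'node_modules') {
          // Nuke it; `npm install` can always recreate it later.
          rmSync(full, { recursive: true, force: true });
        } else {
          removeNodeModules(full); // keep walking into ordinary directories
        }
      }
    }

    removeNodeModules(process.argv[2] ?? '.');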

MortyWaves

Totally agree with you. I wish npm did a better job of filtering the crap files out of packages.

Alifatisk

At least switch to pnpm to minimize the bloat.

MBCook

As someone who mostly works in Java it continues to floor me that this isn’t the default. Why does every project I work on need an identical copy of possibly hundreds of packages if they’re the same version?

I also like Yarn pnp’s model of leaving node_modules as zip files. CPUs are way faster than storage, they can decompress on the fly. Less disk space at rest, less disk slack, less filesystem bookkeeping.

Every single filesystem is way faster at dealing with one file than dozens/hundreds. Now multiply that by the hundreds of packages it does this for, and it adds up.

jefozabuss

I just installed a project with pnpm, about 120 packages, mostly react/webpack/eslint/redux related

with prod env: 700MB

without prod env: 900MB

sadly the bloat cannot be avoided that well :/

jeffhuys

pnpm stores them in a central place and symlinks them. You’ll see the benefits when you have multiple projects with a lot of the same packages.

TheRealPomax

That's on the package publishers, not NPM. They give you an `.npmignore` that's trivially filled out to ensure your package isn't full of garbage, so if someone doesn't bother using that: that's on them, not NPM.

(And it's also a little on the folks who install dependencies: if the cruft in a specific library bothers you, hit up the repo and file an issue (or even MR/PR) to get that .npmignore file filled out. I've helped folks reduce their packages by 50+MB in some cases, it's worth your own time as much as it is theirs)

silverwind

It's much better to allowlist the files meant to be published using `files` in package.json because you never know what garbage the user has in their folder at the time of publish.

On a typical project with a build step, only a `dist` folder would be published.

TheRealPomax

Not a fan of that one myself (it's far easier to tell what doesn't belong in a package vs. what does belong in a package) but that option does exist, so as a maintainer you really have no excuse, and as a user you have multiple MR/PRs that you can file to help them fix their cruft.

> On a typical project with a build step, only a `dist` folder would published.

Sort of, but always include your docs (readme, changelog, license, and whatever true docs dir you have, if you have one). No one should need a connection for those.

hinkley

I believe I knocked 10% off of our node_modules directory by filing .npmignore PRs or bug reports to tools we used.

Now if rxjs weren’t a dumpster fire…

fergie

Props to anyone who tries to make the world a better place.

It's not always obvious who has the most important use cases. In the case of NPM they are prioritizing the user experience of module authors. I totally see how this change would be great for module consumers, yet create potentially massive inconvenience for module authors.

Interesting write-up

atiedebee

I think "massive" is overstating it. I don't think deploying a new version of a package is something that happens many times a day, so it wouldn't be a constant pain point.

Also, since this is a case of having something compressed once and decompressed potentially thousands of times, it seems like the perfect tool for the job.

philipwhiuk

Every build in a CI system would probably create the package.

This is changing every build in every CI system to make it slower.

mkesper

Just use it on the release build.

abound

A few people have mentioned the environmental angle, but I'd care more about if/how much this slows down decompression on the client. Compressing React 20x slower once is one thing, but 50 million decompressions being even 1% slower is likely net more energy intensive, even accounting for the saved energy transmitting 4-5% fewer bits on the wire.

web007

It's very likely zero or positive impact on the decompression side of things.

Starting with smaller data means everything ends up smaller. It's the same decompression algorithm in all cases, so it's not some special / unoptimized branch of code. It's yielding the same data in the end, so writes equal out plus or minus disk queue fullness and power cycles. It's _maybe_ better for RAM and CPU because more data fits in cache, so less memory is used and the compute is idle less often.

It's relatively easy to test decompression efficiency if you think CPU time is a good proxy for energy usage: go find something like React and test the decomp time of gzip -9 vs zopfli. Or even better, find something similar but much bigger so you can see the delta and it's not lost in rounding errors.
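
A rough sketch of that measurement in Node, assuming you've already produced two gzip files of the same payload with the respective tools (the file names below are placeholders); both are plain DEFLATE streams, so the same gunzip path decodes them:

    // decompress-bench.ts - crude timing sketch, not a rigorous benchmark
    import { readFileSync } from 'node:fs';
    import { gunzipSync } from 'node:zlib';

    function bench(label: string, file: string, iterations = 200): void {
      const compressed = readFileSync(file);
      const start = performance.now();
      for (let i = 0; i < iterations; i++) gunzipSync(compressed);
      const msPerRun = (performance.now() - start) / iterations;
      console.log(`${label}: ${compressed.length} bytes, ~${msPerRun.toFixed(2)} ms per decompress`);
    }

    bench('gzip -9', 'react.gzip9.tgz');  // placeholder: tarball compressed with gzip -9
    bench('zopfli ', 'react.zopfli.tgz'); // placeholder: same tarball compressed with zopfli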

DannyBee

I can speak to this - there is no meaningful decompression effect across an insane set of tested data at Google and elsewhere. Zopfli was invented prior to brotli

Zopfli is easiest to think of as something that just tries harder than gzip to find matches and better encodings. Much harder.

decompression speed is linear either way.

It's easiest to think of decompression as a linear time vm executor[1], where the bytecoded instructions are basically

go back <distance> bytes, output the next <length> bytes you see, then output character <c>

(outputting literal data is the instruction <0,0,{character to output}>)

Assuming you did not output a file larger than the original uncompressed file (why would you bother?), you will, worst case, process N bytes during decompression, where N is the size of the original input file.

The practical decompression speed is driven by cache behavior, but it thrashes the cache no matter what.

In practice, reduction of size vs gzip occurs by either finding larger runs, or encodings that are smaller than the existing ones.

After all, if you want the compressed file to shrink, you need to output fewer instructions somehow, or make more of the instructions identical (so they can be represented in fewer bits by later Huffman coding).

In practice, this has almost exclusively positive effects on decompression speed - either the vm has fewer things to process (which is faster), or more of the things it does look the same (which has better cache behavior).

[1] this is one way archive formats will sometimes choose to deal with multiple compression method support - encode them all to the same kind of bytecode (usually some form of copy + literal instruction set), and then decoding is the same for all of them. ~all compression algorithms output some bytecode like the above on their own already, so it's not a lot of work. This doesn't help you support other archive formats, but if you want to have a bunch of per-file compression options that you pick from based on what works best, this enables you to still only have to have one decoder.
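
The "tiny VM" framing above can be made concrete with a toy decoder. This is only an illustration of the shape of the loop (real DEFLATE packs these instructions as Huffman-coded bits), but it shows why decode cost is linear in the output size no matter how hard the compressor searched:

    // Toy LZ decoder: each instruction is <distance, length, literal>.
    // A pure literal is the instruction <0, 0, c>.
    type Instruction = { distance: number; length: number; literal?: string };

    function decode(program: Instruction[]): string {
      const out: string[] = [];
      for (const { distance, length, literal } of program) {
        // "go back <distance> bytes, output the next <length> bytes you see"
        for (let i = 0; i < length; i++) out.push(out[out.length - distance]);
        // "...then output character <c>"
        if (literal !== undefined) out.push(literal);
      }
      return out.join('');
    }

    // "banana": three literals, then copy "an" from 2 bytes back and append "a".
    console.log(decode([
      { distance: 0, length: 0, literal: 'b' },
      { distance: 0, length: 0, literal: 'a' },
      { distance: 0, length: 0, literal: 'n' },
      { distance: 2, length: 2, literal: 'a' },
    ]));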

sltkr

For formats like deflate, decompression time doesn't generally depend on compressed size. (zstd is similar, though memory use can depend on the compression level used).

This means an optimization like this is virtually guaranteed to be a net positive on the receiving end, since you always save a bit of time/energy when downloading a smaller compressed file.

adgjlsfhk1

This seems like a place where the more ambitious version that switches to zstd might have better tradeoffs. You would get similar or better compression, with faster decompression and recompression than the current gzip/zopfli approach. It would lose backward compatibility though...

vlovich123

Not necessarily - you could retain backward compat by publishing both gzip and zstd variants and having downloaders with newer npm versions prefer the zstd one. Over time, you could require that packages only upload zstd going forward, and either generate zstd versions of the backlog of unmaintained packages, or at least of those that see some amount of traffic, if you're willing to drop very old packages. The ability to install arbitrary versions of packages probably means you're better off reprocessing the backlog, although that may cost more than doing nothing.

The package lock checksum is probably a more solvable issue with some coordination.

The benefit of doing this, though, is less immediate - it would take a few years to show a payoff, and bets with that kind of delayed payoff are not typically made by the kind of committee decision process described (for better or worse).

bufferoverflow

Brotli and lzo1b have good compression ratios and pretty fast decompression speeds. Compression speed should not matter that much, since you only do it once.

https://quixdb.github.io/squash-benchmark/

There are even more obscure options:

https://www.mattmahoney.net/dc/text.html

MBCook

That's a much higher hurdle to jump. I don't blame the author for trying this first.

If accepted, it might have been a good stepping stone too. A chance to get to know everyone and their concerns and how they think.

So if you wanted to see how this works (proposal + in prod) and then come back later proposing something bigger, like switching off gzip entirely, that would make sense to me as a possible follow-up.

PaulHoule

Years back I came to the conclusion that conda using bzip2 for compression was a big mistake.

Back then if you wanted to use a particular neural network it was meant for a certain version of Tensorflow which expected you to have a certain version of the CUDA libs.

If you had to work with multiple models the "normal" way to do things was use the developer unfriendly [1][2] installers from NVIDIA to install a single version of the libs at a time.

Turned out you could have many versions of CUDA installed as long as you kept them in different directories and set the library path accordingly, it made sense to pack them up for conda and install them together with everything else.

But oh boy was it slow to unpack those bzip2 packages! Since conda had good caching, if you build environments often at all you could be paying more in decompress time than you pay in compression time.

If you were building a new system today you'd probably use zstd since it beats gzip on both speed and compression.

[1] click... click... click...

[2] like they're really going to do something useful with my email address

zahlman

>But oh boy was it slow to unpack those bzip2 packages! Since conda had good caching, if you build environments often at all you could be paying more in decompress time than you pay in compression time.

For Paper, I'm planning to cache both the wheel archives (so that they're available without recompressing on demand) and unpacked versions (installing into new environments will generally use hard links to the unpacked cache, where possible).

> If you were building a new system today you'd probably use zstd since it beats gzip on both speed and compression.

FWIW, in my testing LZMA is a big win (and I'm sure zstd would be as well, but LZMA has standard library support already). But there are serious roadblocks to adopting a change like that in the Python ecosystem. This sort of idea puts them several layers deep in meta-discussion - see for example https://discuss.python.org/t/pep-777-how-to-re-invent-the-wh... . In general, progress on Python packaging gets stuck in a double-bind: try to change too little and you won't get any buy-in that it's worthwhile, but try to change too much and everyone will freak out about backwards compatibility.

PaulHoule

I designed a system which was a lot like uv but written in Python and when I looked at the politics I decided not to go forward with it. (My system also had the problem that it had to be isolated from other Pythons so it would not get its environment trashed, with the ability for software developers to trash their environment I wasn't sure it was a problem that could be 100% solved. uv solved it by not being written in Python. Genius!)

zahlman

Yes, well - if I still had reason to care about the politics I'd be in much the same position, I'm sure. As is, I'm going to just make the thing, write about it, and see who likes it.

stabbles

One thing that's excellent about zopfli (apart from being gzip compatible) is how easy it is to bootstrap:

    git clone https://github.com/google/zopfli.git
    cc -O2 zopfli/src/zopfli/*.c -lm
It just requires a C compiler and linker.

stabbles

The main downside, though: it's impressively slow.

Comparing to gzip isn't really worth it. Combine pigz (threaded) with zlib-ng (simd) and you get decent performance. pigz is used in `docker push`.

For example, gzipping llvm.tar (624MB) takes less than a second for me:

    $ time /home/harmen/spack/opt/spack/linux-ubuntu24.04-zen2/gcc-13.2.0/pigz-2.8-5ptdjrmudifhjvhb757ym2bzvgtcsoqc/bin/pigz -k hello.tar 
    
    real    0m0.779s
    user    0m11.126s
    sys     0m0.460s
At the same time, zopfli compiled with -O3 -march=native takes 35 minutes. No wonder it's not popular.

It is almost 2700x slower than the state of the art for just 6.8% bytes saved.

Levitating

> 2700x slower

That is impressively slow.

In my opinion even the 28x decrease in performance mentioned would be a no-go. Sure the package saves a few bytes but I don't need my entire pc to grind to a halt every time I publish a package.

Besides, storage is cheap but CPU power draw is not. Imagine the additional CO2 that would have to be produced if this RFC was merged.

> 2 gigabytes of bandwidth per year across all installations

This must be a really rough estimate and I am curious how it was calculated. In any case 2 gigabytes over a year is absolutely nothing. Just my home network can produce a terabyte a day.

bonzini

2 GB for the author's package which is neither extremely common nor large; it would be 2 TB/year just for react core.

orta

Congrats on a great write-up. Sometimes trying to ship something at that sorta scale turns out to just not really make sense in a way that is hard to see at the beginning.

Another personal win is that you got a very thorough understanding of the people involved and how the outreach parts of the RFC process works. I've also had a few fail, but I've also had a few pass! Always easier to do the next time

hinkley

Pulling on this thread, there are a few people who have looked at the ways zopfli is inefficient. Including this guy who forked it, and tried to contribute a couple improvements back to master:

https://github.com/fhanau/Efficient-Compression-Tool

These days if you’re going to iterate on a solution you’d better make it multithreaded. We have laptops where sequential code uses 8% of the available cpu.

vardump

> These days if you’re going to iterate on a solution you’d better make it multithreaded.

Repetition eliminating compression tends to be inherently sequential. You'd probably need to change the file format to support chunks (or multiple streams) to do so.

Because of LZ back references, you can't LZ compress different chunks separately on different cores and have only one compression stream.

Statistics acquisition (histograms) and entropy coding could be parallel I guess.

(Not a compression guru, so take above with a pinch of salt.)

hinkley

There are gzip variants that break the file into blocks and run in parallel. They lose a couple of % by truncating the available history.

But zopfli appears to do a lot of backtracking to find the best permutations for matching runs that have several different solutions. There’s a couple of ways you could run those in parallel. Some with a lot of coordination overhead, others with a lot of redundant calculation.

jsheard

I wonder if it would make more sense to pursue Brotli at this point, Node has had it built-in since 10.x so it should be pretty ubiquitous by now. It would require an update to NPM itself though.
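
For reference, the client-side plumbing is already in the standard library; a minimal sketch of a Brotli round-trip with node:zlib (the buffer contents are just a stand-in for a package tarball):

    // brotli-roundtrip.ts - minimal sketch using Node's built-in brotli bindings
    import { brotliCompressSync, brotliDecompressSync, constants } from 'node:zlib';

    const tarball = Buffer.from('pretend this is an uncompressed package tarball');

    // Max quality is brotli's slowest/densest setting - the natural choice for
    // compress-once, download-many artifacts like registry tarballs.
    const compressed = brotliCompressSync(tarball, {
      params: { [constants.BROTLI_PARAM_QUALITY]: constants.BROTLI_MAX_QUALITY },
    });

    const restored = brotliDecompressSync(compressed);
    console.log(compressed.length, restored.equals(tarball)); // smaller, and lossless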

silverwind

+1 to brotli. Newly published packages could use brotli by default, so old ones stay compatible.

canucker2016

Here's the Brotli supporter's blog post about adding Brotli support to NPM packages.

https://jamiemagee.co.uk/blog/honey-i-shrunk-the-npm-package...

and the related HN discussion from that time:

https://news.ycombinator.com/item?id=37754489

frabjoused

This reminds me of a time I lost an argument with John-David Dalton about cleaning up/minifying lodash as an npm dependency, because when including the readme and license for every sub-library, a lodash import came to ~2.5MB at the time. This also took a lot of seeking time for disks because there were so many individual files.

The conversation started and ended at the word cache.

MBCook

> This also took a lot of seeking time for disks because there were so many individual files.

The fact NPM keeps things in node_modules unzipped seems wild to me. Filesystems are not great at hundreds of thousands of little files. Some are bad, others are terrible.

Zip files are easier to store, take up less space, and CPUs are faster than disks, so decompressing in memory is probably faster than reading the unzipped files.

That was one of my favorite features of Yarn when I tried it - PnP mode. But since it's not what NPM does, it requires a shim that doesn't work with all packages. Or at least didn't a few years ago.