Going down the rabbit hole of Git's new bundle-URI
19 comments
March 13, 2025 · jakub_g
maccard
Funny you say this. At my last job I managed a 1.5TB Perforce depot with hundreds of thousands of files and had the problem of “how can we speed up CI”. We were on AWS, so I synced the repo, created an EBS snapshot, and used that to make a volume, with the intention of reusing it (as we could shove build intermediates in there too).
It was faster to just sync the workspace over the internet than to create the volume from the snapshot, and a clean build was quicker from the freshly synced workspace than from the snapshotted one, presumably because of how EBS volumes work internally.
We just moved our build machines to the same VPC as the server and our download speeds were no longer an issue.
jclarkcom
VMware?
captn3m0
The linux kernel does the same thing, and publishes bundle files over CDN[0] for CI systems using a script called linux-bundle-clone[1]
[0]: https://www.kernel.org/best-way-to-do-linux-clones-for-your-...
[1]: https://web.git.kernel.org/pub/scm/linux/kernel/git/mricon/k...
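The bundle-clone idea can be sketched as the git commands it boils down to. This is a minimal sketch, not the linked linux-bundle-clone script (its actual steps may differ): the manual variant clones from a pre-downloaded bundle file, repoints `origin` at the real server, and fetches only what is newer than the bundle; the native variant uses `git clone --bundle-uri`, available since Git 2.38. The helper function names and URLs are hypothetical; the functions only build the argument lists, so nothing is executed.

```python
import shlex
from typing import List


def manual_bundle_clone(bundle_path: str, remote_url: str, dest: str) -> List[List[str]]:
    """Hypothetical helper: git commands for a 'seed from bundle, then fetch' clone."""
    return [
        # Clone from the local bundle file: no object enumeration on the server.
        ["git", "clone", bundle_path, dest],
        # The new clone's origin points at the bundle file; repoint it at the real server.
        ["git", "-C", dest, "remote", "set-url", "origin", remote_url],
        # Fetch only the history that is newer than the bundle snapshot.
        ["git", "-C", dest, "fetch", "origin"],
    ]


def native_bundle_uri_clone(bundle_uri: str, remote_url: str, dest: str) -> List[List[str]]:
    """Hypothetical helper: the single command for Git's built-in bundle-URI clone."""
    return [["git", "clone", f"--bundle-uri={bundle_uri}", remote_url, dest]]


if __name__ == "__main__":
    # Example URLs are made up for illustration.
    for cmd in manual_bundle_clone("linux.bundle", "https://git.example.org/linux.git", "linux"):
        print(shlex.join(cmd))
    for cmd in native_bundle_uri_clone(
        "https://cdn.example.org/linux.bundle", "https://git.example.org/linux.git", "linux"
    ):
        print(shlex.join(cmd))
```

The manual variant works with any Git version that can clone from a bundle; the `--bundle-uri` variant lets Git do the download and the follow-up fetch itself.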
autarch
> This has resulted in a contender for the world's smallest open source patch:
Hah, got you beat: https://github.com/eki3z/mise.el/pull/12/files
It's one ASCII character, so a one-byte patch. I don't think you can get smaller than that.
yangman
There is a cursor rendering fix in xf86-video-radeonhd (or perhaps -radeon) that flips a single bit.
It took the group several years to narrow in on.
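For comparing "smallest patch" claims, bytes changed and bits changed are different measures: XOR-ing old and new bytes leaves a 1 exactly where they disagree, so counting those 1s gives the bit-level size. This is a minimal sketch that assumes a same-length, in-place edit; the function names and the sample strings are hypothetical and not taken from the patches mentioned above.

```python
def bytes_changed(old: bytes, new: bytes) -> int:
    """Count differing bytes in a same-length, in-place edit."""
    if len(old) != len(new):
        raise ValueError("this sketch only handles same-length edits")
    return sum(a != b for a, b in zip(old, new))


def bits_changed(old: bytes, new: bytes) -> int:
    """Count differing bits: XOR marks each disagreeing bit with a 1."""
    if len(old) != len(new):
        raise ValueError("this sketch only handles same-length edits")
    return sum(bin(a ^ b).count("1") for a, b in zip(old, new))


if __name__ == "__main__":
    # '0' (0x30) -> '1' (0x31): one byte and one bit.
    print(bytes_changed(b"mask = 0x10", b"mask = 0x11"), bits_changed(b"mask = 0x10", b"mask = 0x11"))
    # 'a' (0x61) -> 'b' (0x62): still one byte, but two bits.
    print(bytes_changed(b"a", b"b"), bits_changed(b"a", b"b"))
```

So a one-byte patch is not always a one-bit patch; it depends on which characters are swapped.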
timdorr
falcor84
What's the story behind that? Did you just deploy a blank commit to trigger a hook?
nine_k
Only accepted and merged commits count!
ZeWaka
That's a line modification, so presumably you'd count just an insertion or just a deletion as 'smaller'.
andrewshadura
Interestingly, Mercurial solved bundles more than ten years ago, and back then they already worked better than Git's do today.
capitainenemo
Not the only Mercurial feature where that's the case... Sad. I keep rooting for the project to implement a Mercurial frontend over a Git database, but they seem to be limited by missing Git features.
nine_k
But branches were more problematic.
capitainenemo
Mercurial has had Git-like "lightweight branches" (bookmarks), without the revision record of Mercurial named branches, for over 15 years. There are good reasons to use the traditional branches, though.
jedimastert
This might actually solve a massive CI problem we've been having... will report back tomorrow.
jwpapi
!remind me
This is super interesting, as I maintain a 1M-commit, 10GB repo at work, and I'm researching ways to let users clone it faster. For now I do something very similar manually: I store a "seed" repo in S3 and use a custom script to fetch from S3 instead of doing `git clone`. (It's faster than cloning from GitHub because, apart from not having to enumerate millions of objects, S3 doesn't throttle the download, while GH seems to throttle at 16MiB/s.)
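To put the 16MiB/s figure in perspective, here is the back-of-the-envelope transfer time. This is a minimal sketch assuming the whole 10GB goes over the wire at a steady 16MiB/s; the actual packfile size and throughput may differ, and the function name is hypothetical.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def transfer_seconds(size_bytes: int, rate_mib_per_s: float) -> float:
    """Seconds to move size_bytes at a constant rate given in MiB/s."""
    return size_bytes / (rate_mib_per_s * MIB)


if __name__ == "__main__":
    # Reading "10GB" as 10 GiB: 10240 MiB / 16 MiB/s = 640 s.
    t = transfer_seconds(10 * 1024**3, 16)
    print(f"{t:.0f} s = {t / 60:.1f} min")  # prints "640 s = 10.7 min"
    # Reading "10GB" as 10^10 bytes gives roughly 596 s instead.
    print(round(transfer_seconds(10 * 10**9, 16)))
```

Either reading gives about ten minutes of pure transfer per clone, before any object enumeration on the server side.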
Semi-related: I've always wondered, but never had time to dig into, what exactly is exchanged between server and client. When I create a new branch off main (still talking about the 1M-commit repo) with just one new tiny commit, I sometimes notice that the client sends far more data than I expected (tens of MBs). I always assumed the client somehow established with the server that it has a certain SHA and uploads only the missing commit, but that doesn't seem to be exactly the case when creating a new branch.
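The "only upload the missing commit" expectation can be modelled as a set difference on the commit graph: everything reachable from the pushed tip, minus everything reachable from the tips the server advertises that the client also has locally. This is a deliberately simplified model, not Git's actual send-pack implementation (which works on objects rather than just commits, and also has an opt-in `push.negotiate` setting); the graph and commit names are hypothetical.

```python
from typing import Dict, List, Set

# Toy history: commit id -> parent ids (hypothetical).
Graph = Dict[str, List[str]]


def reachable(graph: Graph, tips: List[str]) -> Set[str]:
    """All commits reachable from tips; tips unknown to the local graph are skipped."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen or commit not in graph:
            continue
        seen.add(commit)
        stack.extend(graph[commit])
    return seen


def commits_to_send(graph: Graph, pushed_tip: str, advertised_tips: List[str]) -> Set[str]:
    """Commits the client would send: reachable from the new tip but not from server tips it knows."""
    return reachable(graph, [pushed_tip]) - reachable(graph, advertised_tips)


if __name__ == "__main__":
    history: Graph = {"c1": [], "c2": ["c1"], "c3": ["c2"], "f1": ["c3"]}
    # Server advertises main at c3, which the client has: only the new commit goes up.
    print(sorted(commits_to_send(history, "f1", ["c3"])))
    # If no advertised tip is known locally, nothing can be excluded in this model.
    print(sorted(commits_to_send(history, "f1", ["unknown"])))
```

In this toy model, the size of the upload depends entirely on which advertised tips the client can recognize; whether that explains the tens of MBs seen above is not something the thread establishes.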