Going down the rabbit hole of Git's new bundle-URI

jakub_g

This is super interesting, as I maintain a 1M-commit / 10GB repo at work, and I'm researching ways to have users clone it faster. Basically for now I do a very similar thing manually, storing a "seed" repo in S3 and having a custom script to fetch from S3 instead of doing `git clone`. (It's faster than cloning from GitHub, as apart from not having to enumerate millions of objects, S3 doesn't throttle the download, while GH seems to throttle at 16MiB/s.)
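For anyone wanting to try the seed approach described above, here is a minimal sketch, not the custom script mentioned in the comment. It only builds the git command lines for two variants: the built-in `git clone --bundle-uri` (Git 2.38 or newer) and a manual flow that clones from an already-downloaded bundle file and then repoints `origin`. The function names, URLs, and paths are hypothetical placeholders.

```python
import subprocess
from typing import List


def bundle_uri_clone_commands(bundle_uri: str, repo_url: str, dest: str) -> List[List[str]]:
    """Commands for a clone seeded via git's built-in --bundle-uri option.

    Git (2.38+) downloads the bundle first, then fetches from repo_url
    only what the bundle did not already contain.
    """
    return [["git", "clone", f"--bundle-uri={bundle_uri}", repo_url, dest]]


def manual_seed_clone_commands(bundle_path: str, repo_url: str, dest: str) -> List[List[str]]:
    """Commands for the manual flow, assuming the bundle file was already
    downloaded to bundle_path (for example from object storage)."""
    return [
        # A bundle file can be cloned from like any other repository.
        ["git", "clone", bundle_path, dest],
        # Point origin at the real server instead of the local bundle file.
        ["git", "-C", dest, "remote", "set-url", "origin", repo_url],
        # Fetch whatever landed on the server after the bundle was created.
        ["git", "-C", dest, "fetch", "origin"],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(bundle_uri_clone_commands(bundle_uri, repo_url, dest))` with your own values would perform the clone. The bundle itself can be produced on a machine that already has the repository, for example with `git bundle create seed.bundle --all`.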

Semi-related: I always wondered but never got time to dig into what exactly are the contents of the exchange between server and client; I sometimes notice that when creating a new branch off main (still talking about the 1M-commit repo), with just one new tiny commit, the amount of data the client sends is way bigger than I expected (tens of MBs). I always assumed the client somehow established with the server that it has a certain sha, and only uploads the missing commit, but it seems that's not exactly the case when creating a new branch.

maccard

Funny you say this. At my last job I managed a 1.5TB Perforce depot with hundreds of thousands of files and had the problem of “how can we speed up CI”. We were on AWS, so I synced the repo, created an EBS snapshot and used that to make a volume, with the intention of reusing it (as we could shove build intermediates in there too).

It was faster to just sync the workspace over the internet than it was to create the volume from the snapshot, and a clean build was quicker from the just sync’ed workspace than the snapshotted one, presumably to do with however EBS volumes work internally.

We just moved our build machines to the same VPC as the server and our download speeds were no longer an issue.

jclarkcom

VMware?

captn3m0

The Linux kernel does the same thing, and publishes bundle files over a CDN[0] for CI systems using a script called linux-bundle-clone[1].

[0]: https://www.kernel.org/best-way-to-do-linux-clones-for-your-...

[1]: https://web.git.kernel.org/pub/scm/linux/kernel/git/mricon/k...

autarch

> This has resulted in a contender for the world's smallest open source patch:

Hah, got you beat: https://github.com/eki3z/mise.el/pull/12/files

It's one ASCII character, so a one-byte patch. I don't think you can get smaller than that.

yangman

There is a cursor rendering fix in xf86-video-radeonhd (or perhaps -radeon) that flips a single bit.

It took the group several years to narrow in on.

timdorr

falcor84

What's the story behind that? Did you just deploy a blank commit to trigger a hook?

nine_k

Only accepted and merged commits count!

ZeWaka

That's a line modification, so presumably you'd count just an insertion or just a deletion as 'smaller'.

autarch

Yes, but so is the PR shown in the article. You're not going to get a diff that's less than one line unless you are using something besides the typical diff and patch tools.

san1t1

My smallest PR was adding a missing executable file permission.

andrewshadura

Interestingly, Mercurial solved bundles more than ten years ago, and even back then they worked better than Git's do today.

capitainenemo

Not the only Mercurial feature where that's the case. Sad. I keep rooting for the project to implement a Mercurial frontend over a Git db, but they seem to be limited by missing Git features.

nine_k

But branches were more problematic.

capitainenemo

Mercurial has had Git-like "lightweight branches" (bookmarks), without the revision record of Mercurial named branches, for over 15 years. There are good reasons to use the traditional branches though.

https://mercurial.aragost.com/kick-start/en/bookmarks/

jedimastert

This actually might solve a massive CI problem we've been having...will report back tomorrow

jwpapi

!remind me