Artisanal handcrafted Git repositories

bradfitz

My recent horror from some git work was discovering how git sorts its tree objects.

The docs just say to sort by C locale (byte-order sorting). Easy. Except git was sometimes rejecting my packfiles as being bogus per its fsck code, saying my trees were misordered.

TURNS OUT THERE'S AN UNDOCUMENTED RULE: you need to append an implicit forward slash to directory tree entry names before you sort them.

That forward slash is not encoded in the tree object, nor is the type of the entry. You just put the 20-byte SHA-1 hash, which points to either a blob or a tree (or a commit for submodules).

So you can have one directory with directory "testing" and file "testing.md" and it'll sort differently than a directory with two files "testing" and "testing.md".
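A minimal sketch of that ordering rule in Python, assuming entries are given as (name, is_dir) pairs. The helper names `tree_sort_key` and `sort_tree_entries` are made up for illustration and are not part of Git:

```python
def tree_sort_key(name: bytes, is_dir: bool) -> bytes:
    # Directory names are compared as if they ended in "/".
    # The slash is only used for sorting; it is never stored in the tree.
    return name + b"/" if is_dir else name


def sort_tree_entries(entries):
    """Sort (name, is_dir) pairs in byte order with the implicit slash applied."""
    return sorted(entries, key=lambda entry: tree_sort_key(entry[0], entry[1]))


# Directory "testing" next to file "testing.md": "." (0x2e) sorts before "/" (0x2f).
with_directory = sort_tree_entries([(b"testing", True), (b"testing.md", False)])

# Two plain files: ordinary byte order, the shorter name comes first.
two_files = sort_tree_entries([(b"testing", False), (b"testing.md", False)])

print([name for name, _ in with_directory])
print([name for name, _ in two_files])
```

With the directory, "testing.md" comes first; with two files, "testing" comes first.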

You can see a repro at https://gist.github.com/bradfitz/4751c58b07b57ff303cbfec3e39...

(So to verify whether a tree object is formatted correctly, you need to have the blobs of all the entries in the tree, at least one level)

lucasoshiro

Something that I really like in Git is how its data structures are easy to understand and how transparent it is. It's possible to write your own "Git" compatible with existing Git directories only by reading how it works under the hood
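As a small illustration of that transparency, here is a sketch of how a blob's object ID can be computed by hand, assuming a SHA-1 repository. The function names are my own, not Git's:

```python
import hashlib
import zlib


def blob_payload(content: bytes) -> bytes:
    # A blob is stored as: "blob", a space, the decimal length, a NUL byte, then the content.
    return b"blob " + str(len(content)).encode("ascii") + b"\0" + content


def blob_object_id(content: bytes) -> str:
    # The object ID is the SHA-1 of the header plus the content.
    return hashlib.sha1(blob_payload(content)).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    # A loose object on disk is the same payload, zlib-compressed.
    return zlib.compress(blob_payload(content))


print(blob_object_id(b""))
```

The result for a file can be compared against `git hash-object <file>`.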

shivasaxena

I agree, but only in theory.

Projects like gitoxide have been in development for years now.

fiddlerwoaroof

I wrote a nearly complete implementation of git file format parsers in Common Lisp over like a month of evenings and weekends. I’m sure there are hard parts between where I am and a full git implementation but you can get quite a bit of utility out of a relatively small amount of effort.

MrJohz

It's a case of Pareto. Parsing the git file format is relatively simple, but handling all the weird states a Git repo can be in and doing the correct things to those files in each state is a lot harder. And then adding the network protocol on top of that makes directly reproducing Git quite difficult.

I know JJ used to use Git2 for a lot of network operations like pushing and pulling, but ran into too many issues with SSH handling that they've since switched to directly invoking the Git binary for those operations.

lucasoshiro

Yeah, I wrote mine in Haskell. It's a good exercise for understanding how Git works

chubot

Not sure what gitoxide is, but libgit2 already exists, and it seems to be an independent implementation - https://github.com/libgit2/libgit2

I think Github and most big Git hosts use it

steveklabnik

libgit2 has a ton of compatibility issues, especially around authentication, that make it only useful in some circumstances.

(gitoxide is a similar project but in Rust, it's not ready for the big time either, though it keeps on getting better!)

3eb7988a1663

Jujutsu threw in the towel and is shelling out to the git CLI because of minor variations in libgit2 vs the binary.

Failing to find a write-up, but there was this Lobsters thread [0] where someone from GitLab reported they had to do the same owing to some discrepancies vs the binary, where all of the real development happens.

[0] https://lobste.rs/s/vmdggh/jujutsu_v0_30_0_released

veganjay

Neat to see this done by hand! It helps demystify the magic behind git commands.

If you like this, I also recommend "Write Yourself a Git", where you build a minimal git implementation using python: https://wyag.thb.lt/

bhasi

A similar project is CodeCrafters' Build Your Own Git: https://app.codecrafters.io/courses/git/overview

wonderwonder

How cool, thank you

sc68cal

To the site author: I'm on an M1 MacBook Pro and honestly I can't really read the text. It's far too small, and increasing the zoom just makes the text larger but the margins narrower. Firefox reader mode also renders really badly.

Please, consider making the layout better for us old coders whose eyes are going, or for hi res displays

derefr

FYI: the pinch-to-zoom gesture from mobile browsers (from before websites were mobile-responsive) has also long been implemented for all modern desktop browsers. It's viewport zoom, which is far better than the font-scaling zoom you get by pressing Cmd-+, and makes this site easily readable.

(The much-less-well-known mobile double-tap-on-text gesture [it zooms-to-fit whatever element you tapped on to the width of the viewport] was also ported to desktop browsers. Though, on desktop with a touchpad, it's a two-finger double-tap — which I don't think anyone would ever even think to try.)

BobaFloutist

Double tap on text highlights it for me. Is that an iPhone/Android thing or what?

derefr

As I said, it's a two finger double-tap.

But also, under further investigation — and unlike with pinch-to-zoom — desktop support for the two-finger double-tap gesture seems to be specific to macOS. (Which is weird, because Chrome has support for arbitrary multitouch gesture processing to enable the JS multitouch API. So you'd think Chrome's support for "the multitouch gestures the OS expects" would be built on top of that generic multitouch recognizer [and therefore working everywhere that recognizer works], instead of expecting the OS to pre-recognize specific gestures and translate them to specific OS input events.)

sam_lowry_

Works great on Firefox for Android though )

lucasoshiro

Also works great on Safari on an M1 MacBook Air, here

BobbyTables2

I realize the concept is very similar but would love to see a writeup on how Docker stores images using OverlayFS. (Has quite a bit of metadata!)

kassah

The simplicity of Git is awesome. Great article! I had looked at what it would take to find a single file in a remote git repo. I decided against talking the git protocol directly and just checking out the entire repo to get a single file. Reading through this makes me think I may have given up too easily.

I asked a few git hosting providers, and they all said they had private APIs developed internally for the purpose.

HexDecOctBin

Okay, there's something I have been thinking about recently. Is it possible to somehow make Git use the Content Defined Chunking algorithm from rsync? Maybe somehow using clean/smudge? If not git, then maybe Mercurial, Fossil or any other DVCS?

This would help with large binary assets without having to deal with the mess that is LFS, as long as the assets were uncompressed.
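For anyone unfamiliar with the idea, here is a rough sketch of content-defined chunking. It uses a gear-style rolling hash, which is one common way to do it and not rsync's actual algorithm. The table, mask and size limits are arbitrary assumptions, and none of this is wired into Git:

```python
import hashlib

# Fixed table of pseudo-random 32-bit values, one per byte value.
# Assumption: any fixed, well-mixed table would do; this one is derived from SHA-256.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big") for i in range(256)]


def split_chunks(data: bytes, mask: int = (1 << 12) - 1,
                 min_size: int = 256, max_size: int = 65536) -> list:
    """Split data where the rolling hash of the recent bytes matches the mask."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Shifting left pushes old bytes out of the 32-bit window.
        rolling = ((rolling << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        # Cut when the low bits are all zero, or when the chunk gets too large.
        if (size >= min_size and (rolling & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because cut points depend on nearby content rather than absolute offsets, an insertion early in a file tends to change only the chunks around it, so unchanged chunks could be stored once under their hash.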

aeblyve

I thought this was going to be a sardonic article about doing programming without LLMs.

lioeters

I'm starting to see this kind of wording as a unique selling point, that some software (or article, visual art, etc.) is handcrafted and artisanal, as opposed to AI-generated. "Every word was written by me, a human being!" At this point in the emerging technology I can usually tell the difference intuitively, but it's possible that one day it will be indistinguishable - and the quality of "handmade" will be simply a matter of branding for niche enthusiasts, like vinyl records.

gerdesj

This is all very well but how does Linus Torvalds use git? Given he invented the bloody thing, it might be nice to see how the Boss uses it!

git was created to scratch an itch (actually a bit of a roiling boil, that needed a serious amount of soothing ointment and as it turns out: a compiler, some source code and quite a lot of effort). ... anyway the history of it is well documented.

FFS: git was called git because a Finnish bloke with English as a second, but well used, tongue had learned what a "git" is and it seemed appropriate. Bear in mind that Mr T was deeply in his shouty phase at that point in time.

Artisanal git sounds all kinds of wrong 8) It's just a tool to do a job and I suggest you use it in the same way as the XKCD comic mandates (that is the official manual, despite what you might think)

The Conclusion is spot on - great article.

deadbabe

[flagged]

ChrisMarshallNY

My understanding is that Mercurial is sort of Beta to Git's VHS. There are some definite advantages, but it's losing support.

GuB-42

I am sure that it is because the porn industry settled on Git :)

Anyways, I started on Mercurial, and I think it has a better UX, but technically I now prefer Git. The success of Git over Mercurial surprised me a little because of that: Git is not an easy version control system to get into, at least when compared to Mercurial, which shouldn't help adoption. I guess it is just because some big names decided on Git.

Mercurial and Git use the same fundamental principles, and one is not really better than the other, just details.

zanecodes

I thought all the cool kids were on Pijul, or was it Darcs? Maybe it was Fossil? No wait, it was definitely Jujutsu.

jact

Can confirm that cool kids are definitely using Fossil

lysace

I would have called this: "Futzing around with internal git data structures".