Skip to content(if available)orjump to list(if available)

Breaking Git with a carriage return and cloning RCE

armchairhacker

> The result of all this, is when a submodule clone is performed, it might read one location from path = ..., but write out a different path that doesn’t end with the ^M.

How does this achieve “remote code execution” as the article states? How serious is it from a security perspective?

> I'm not sharing a PoC yet, but it is an almost trivial modification of an exploit for CVE-2024-32002. There is also a test in the commit fixing it that should give large hints.

EDIT: from the CVE-2024-32002

> Repositories with submodules can be crafted in a way that exploits a bug in Git whereby it can be fooled into writing files not into the submodule's worktree but into a .git/ directory. This allows writing a hook that will be executed while the clone operation is still running, giving the user no opportunity to inspect the code that is being executed.

So a repository can contain a malicious git hook. Normally git hooks aren’t installed by ‘git clone’, but this exploit allows one to, and a git hook can run during the clone operation.

beala

More information here (https://github.blog/open-source/git/git-security-vulnerabili...) on the new CVE:

> When reading a configuration value, Git will strip any trailing carriage return (CR) and line feed (LF) characters. When writing a configuration value, however, Git does not quote trailing CR characters, causing them to be lost when they are read later on. When initializing a submodule whose path contains a trailing CR character, the stripped path is used, causing the submodule to be checked out in the wrong place.

> If a symlink already exists between the stripped path and the submodule’s hooks directory, an attacker can execute arbitrary code through the submodule’s post-checkout hook.

Along with a bunch of other git CVEs that are worth a look.

gitfan86

This seems easy for GitHub to block

caust1c

Yes, unfortunately it's pretty trivial. Any time arbitrary file write is possible, RCE is usually possible too.

null

[deleted]

pinoy420

[flagged]

jmb99

In this case, there is more than enough information given to make an exploit trivial for anyone actually investigating the issue. I don’t see a reason to distribute PoC exploit code here, it’s fairly clear what the problem is as well as at least one possible RCE implementation.

zahlman

A well-written article should still have spelled out what armchairhacker did. The article spent many paragraphs working through what seemed to me like a very obvious chain of reasoning that didn't need so much hand-holding, and then left me completely bewildered at the end with the last step. And even with the explanation, I'm still not sure why writing the files to a different path allows the hook to be written. (Surely Git knows that it's still in the middle of a recursive clone operation and that it therefore shouldn't accept any hooks?)

10000truths

This is a big problem with using ad-hoc DSLs for config - there's often no formal specification for the grammar, and so the source of truth for parsing is spread between the home-grown serialization implementation and the home-grown deserialization implementation. If they get out of sync (e.g. someone adds new grammar to the parser but forgets to update the writer), you end up with a parser differential, and tick goes the time bomb. The lesson: have one source of truth, and generate everything that relies on it from that.

fpoling

This bug is orthogonal to one source of truth. It is a pure logic bug that could have existed in a standard system library for config files if such existed on Unix.

And consider that consequences of such bug would be much worse if it was in a standard system library. At least here it is limited mostly to developers where machines are updated.

ajross

Nitpick: the DSL here ("ini file format") is arguably ad-hoc, but it's extremely common and well-understood, and simple enough to make a common law specification work well enough in practice. The bug here wasn't due to the format. What you're actually complaining about is the hand-coded parser[1] sitting in the middle of a C program like a bomb waiting to go off. And, yes, that nonsense should have died decades ago.

There are places for clever hand code, even in C, even in the modern world. Data interchange is very not much not one of them. Just don't do this. If you want .ini, use toml. Use JSON if you don't. Even YAML is OK. Those with a penchant for pain like XML. And if you have convinced yourself your format must be binary (you're wrong, it doesn't), protobufs are there for you.

But absolutely, positively, never write a parser unless your job title is "programming language author". Use a library for this, even if you don't use libraries for anything else.

[1] Fine fine, lexer. We are nitpicking, after all.

heisenbit

How many hand crafted lexers dealing with lf vs. cr-lf encodings do exist? My guess is n > ( number of people who coded > 10 KSLOC ).

JdeBP

Reading someone quote Jon Postel in the context of CR+LF brings back memories.

* https://jdebp.uk/FGA/qmail-myths-dispelled.html#MythAboutBar...

"that may not be the most sensible advice now", says M. Leadbeater today. We were saying that a lot more unequivocally, back in 2003. (-:

As Mark Crispin said then, the interpretations that people put on it are not what M. Postel would have agreed with.

Back in the late 1990s, Daniel J. Bernstein did the famous analysis that noted that parsing and quoting when converting between human-readable and machine-readable is a source of problems. And here we are, over a quarter of a century later, with a quoter that doesn't quote CRs (and even after the fix does not look for all whitespace characters).

Amusingly, git blame says that the offending code was written 19 years ago, around the time that Daniel J. Bernstein was doing the 10 year retrospective on the dicta about parsing and quoting.

* https://github.com/git/git/commit/cdd4fb15cf06ec1de588bee457...

* https://cr.yp.to/qmail/qmailsec-20071101.pdf

I suppose that we just have to keep repeating the lessons that were already hard learned in the 20th century, and still apply in the 21st.

darig

[dead]

Lockal

"trivial modification of an existing exploit"...

Why git does not use Landlock? I know it is Linux-only, but why? "git clone" should only have r/o access to config directory and r/w to clone directory. And no subprocesses. In every exploit demo: "Yep, <s>it goes to a square hole</s> it launches a calculator".

TheDong

> no subprocesses

I guess you're okay with breaking all git hooks, including post-checkout, because those are subprocesses as a feature.

You can always run your git operations in a container with seccomp or such if you're not using any of the many features that it breaks

Spivak

This would also break custom commands. Which if you don't know about it, is a pretty cool feature.

Drop a git-something executable in your path and you can execute it as git something.

byearthithatius

Why is this helpful? Just add the executable itself to path and execute it with "something" instead of "git something". Why are we making git an intermediary ? I am kind of stupid and this is genuine.

SSLy

> And no subprocesses.

have you never used git over ssh?

deanc

Would homebrew itself be problematic here? Does it do recursive cloning?

At least a cursory glance at the repo suggests it might: https://github.com/Homebrew/brew/blob/700d67a85e0129ab8a893f...

msgodel

It would be odd if it didn't. Although the goal of homebrew is to execute the code in the repo.

The only situation where the RCE here is a problem is if you clone github repos containing data you don't want to execute. That's fairly unusual.

null

[deleted]

leni536

The question is whether recursive submodule checkout happens after some integrity/signature validation or before. The RCE can be an issue in the latter case.

lossolo

It seems like Homebrew still provides a vulnerable version, the same goes for Debian Bookworm.

therealmarv

guess I have to wait a bit more... no update to git 2.50.1 on Homebrew yet.

null

[deleted]

dwrodri

The year is 2025 and we are still cleaning up string parsing RCEs in C. I should revisit why in the world efforts for a more secure string parsing in C never took off.

jerf

As the article says: "I find this particularly interesting because this isn't fundamentally a problem of the software being written in C. These are logic errors that are possible in nearly all languages, the common factor being this is a vulnerability in the interprocess communication of the components (either between git and external processes, or within the components of git itself). It is possible to draw a parallel with CRLF injection as seen in HTTP (or even SMTP smuggling)."

You can write this in any language. None of them will stop you. I'm on the cutting edge of "stop using C", but this isn't C's fault.

gpm

You can, but in languages like python/java/go/rust/... you wouldn't, because you wouldn't write serialization/de-serialization code by hand but call out to a battle hardened library.

This vulnerability is the fault of the C ecosystem where there is no reasonable project level package manager so everyone writes everything from scratch. It's exacerbated by the combination of a lack of generics (rust/java's solution), introspection (java/python's solution), and poor preprocessor in C (go's solution) so it wouldn't even be easy to make a ergonomic general purpose parser.

shakna

Python's pathlib wouldn't help you here, it can encode the necessary bits. Especially with configparser - it's 20 year old configuration reader. Java's story is worse.

What part of this would be prevented by another language?

You'd need to switch your data format to something like json, toml, etc. to prevent this from the outset. But JSON was first standardised 25 years ago, and AJAX wasn't invented when this was written. JSON was a fledgling and not widely used yet.

I guess we had netrc - but that's not standardised and everyone implements it differently. Same story for INI.

There was XML - at a time when it was full of RCEs, and everyone was acknowledging that its parser would be 90% of your program. Would you have joined the people disparaging json at the time as reinventing xml?

This vulnerability is the fault of data formats not being common enough to be widely invented yet.

bangaladore

I have a feeling that this code was developed before any of those languages were widely popular and before their package managers or packages were mature.

This file was written like 20 years ago.

alexvitkov

We keep getting RCEs in C because tons of useful programs are written in C. If someone writes a useful program in Rust, we'll get RCEs in Rust.

dietr1ch

It's not that only C programs are useful. It's that subtle mistakes on C result in more catastrophic vulnerabilities.

Make a mistake in application code in a language like, say Java, and you'll end up with an exception.

asplake

The article refutes that somewhat:

> I find this particularly interesting because this isn't fundamentally a problem of the software being written in C. These are logic errors that are possible in nearly all languages, the common factor being this is a vulnerability in the interprocess communication of the components (either between git and external processes, or within the components of git itself).

mrkeen

C programmers don't see C problems. They see logic errors that are possible in any language.

dietr1ch

Running with scissors isn't a problem. The problem is stabbing yourself with them. Isn't it obvious?

bpt3

As mentioned in the article, this is a logic error that has nothing to do with C strings.

eptcyka

Whilst true, there’s a swathe of modern tooling that will aide in marshalling data for IPC. Would you not agree that if protobuf, json or yaml were used, it’d be far less likely for this bug have slipped in?

alexvitkov

In isolation, for any one particular bug, yes, but if you start applying this logic to everything, even problems as simple as reading some bytes from a file, you end up with a heao of dependencies for the most mundane things. We've tried that, it's bad.

bangaladore

The OC was about language choice. You can use protobuf, json or yaml in C as well.

In general, though, all these can be wildly overkill for many tasks. At some point you just need to write good code and actually test it.

greatgib

Having "safe" yaml parsing is a whole topic of head scratching in whatever language of your choice if you want a rabbit hole to look into...

smileson2

If only it were just the c code that was causing people to be owned lol

null

[deleted]

tuetuopay

Using other languages would likely fix the issue but as a side-effect. Most people would expect a C-vs-Rust comparison so I’ll take Go as an example.

Nobody would write the configuration parsing code by hand, and just use whatever TOML library available at hand for Go. No INI shenanigans, and people would just use any available stricter format (strings must be quoted in TOML).

So yeah, Rust and Go and Python and Java and Node and Ruby and whatnot would not have the bug just by virtue of having a package manager. The actual language is irrelevant.

However, whatever the language, the same hand implementation would have had the exact same bug.

TacticalCoder

[flagged]

zahlman

Suppose the system call to list a directory examined the place on the disk where a filename should be, and found bytes representing ASCII control characters. Should it deny the existence of the corresponding file? Assume disk corruption? Something else? After all, maybe (this is admittedly more theoretical than practical) those bytes map to something else in the current locale. It's not like modern Windows which assumes the filenames are all UTF-16.

0x457

Because filenames (and all other strings) are just bags of bytes on unix based systems.

smaudet

Is it just me or is the font an eyestrain on this blog?

MisterTea

I am not sure if there's bias on my part after reading your comment but yes, it is bothersome.

metalliqaz

yeah I see what you mean. it's like the anti-aliasing is broken

null

[deleted]

b0a04gl

why tf is git still running submodule hooks during clone at all. like think. youre cloning a repo which you didnt write it or audit it. and git just... runs a post checkout hook from a submodule it just fetched off the internet. even with this CRLF bug fixed, thats still bananas