Whose code am I running in GitHub Actions?

ptx

There seems to be a slight misunderstanding in the article. It says that the "v2" tag "looks like an immutable reference" and points out that it's actually mutable, as if this was surprising and unintended. It also says that the reason people use tags despite this (making a tradeoff against security) is that "tags are easier to read and compare".

But the GitHub documentation [0] makes it clear that tags for major versions are intended to be mutable and be updated to point to new minor versions as they are released, not because it's "easier to read" but because you "can expect an action's patch version to include necessary critical fixes and security patches, while still remaining compatible with their existing workflows" (as long as the author follows their recommended semantic versioning scheme).

So choosing a major-version tag is GitHub's recommended practice precisely because it is mutable and does change.

[0] https://docs.github.com/en/actions/sharing-automations/creat...

cesarb

> major versions are intended to be mutable and be updated to point to new minor versions as they are released [...] because you "can expect an action's patch version to include necessary critical fixes and security patches [...]

It's two sides of the same coin: on one hand, an update can include fixes for bugs and vulnerabilities; on the other hand, an update can also include new bugs and vulnerabilities (or even malicious code). Updating too quickly can be risky. Updating too slowly can also be risky.

dgl

Unfortunately this makes a mistake by using a short commit ID: "(e.g. a5b3abf)"

That's not a full commit ID, so it can still result in a mutable reference if either someone can find a clash[1] or if they can push a tag with that name and it takes priority in the context it is used (this is somewhat complex, e.g. GitHub prohibits pushes of branches and tags which are exactly 40 hex characters long, but other services may not).

[1]: https://people.kernel.org/kees/colliding-with-the-sha-prefix...

bewuethr

Shortened commit SHAs are actually not supported by Actions; if you try, you get

"Unable to resolve action `actions/checkout@11bd719`, the provided ref `11bd719` is the shortened version of a commit SHA, which is not supported. Please use the full commit SHA `11bd71901bbe5b1630ceea73d27597364c9af683` instead."

chatmasta

What if the repository has a tag called 11bd719? Does Git/GitHub forbid creation of this tag if a commit exists with that prefix?

What if a Git commit is created that matches an existing tag? Does Git have a procedure to make a new one? e.g. imagine I pregenerate a few million 8 character tags and wait for a collision

btw: Even if you specify the full commit SHA, this can still be attacked; there have been collision attacks against the hash behind Git commit IDs in the past. At least for older versions of Git, the algorithm was SHA-1. Maybe that’s changed, but an attacker could always construct a malicious repository with intentionally weak hashes with the intent of later swapping one of them. (But at that point they may as well just push the malicious code in the first place.)

anamexis

What is the attack exactly? Only full commit SHAs are valid to reference a commit by SHA. GitHub disallows tags and branch names that could collide with a full commit SHA. There is never any collision between commit SHAs and tags.
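A minimal sketch of how a workflow file could be checked for this, assuming a simple regex over the `uses:` lines rather than a real YAML parser. The function names are made up for illustration; the example refs are the ones quoted in this thread.

```python
import re

# Matches `uses: owner/repo[/path]@ref` lines in workflow YAML text.
# Assumption: a line-based regex is good enough for a quick audit.
USES_RE = re.compile(
    r"^\s*(?:-\s*)?uses:\s*['\"]?([\w.-]+/[\w./-]+)@([^\s'\"#]+)",
    re.MULTILINE,
)
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")   # full SHA-1 commit ID, lowercase hex
SHORT_HEX_RE = re.compile(r"[0-9a-f]{7,39}")  # looks like an abbreviated SHA


def classify_ref(ref: str) -> str:
    """Return 'full-sha', 'short-sha-like', or 'tag-or-branch'."""
    if FULL_SHA_RE.fullmatch(ref):
        return "full-sha"
    if SHORT_HEX_RE.fullmatch(ref):
        return "short-sha-like"
    return "tag-or-branch"


def find_action_refs(workflow_text: str):
    """Return (action, ref, classification) for every `uses:` line found."""
    return [
        (action, ref, classify_ref(ref))
        for action, ref in USES_RE.findall(workflow_text)
    ]


sample = """
steps:
  - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
  - uses: actions/checkout@11bd719
  - uses: tj-actions/changed-files@v2
"""
for action, ref, kind in find_action_refs(sample):
    print(f"{kind:15} {action}@{ref}")
```

Only the `full-sha` entries are immutable references; anything else is either a mutable tag or branch, or a shortened SHA that Actions refuses to resolve.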

immibis

It's still SHA-1 by the way, but they included counter-cryptanalysis to reject objects that appear to be one side of a collision using known techniques.

password4321

So just so I'm clear based on what you've mentioned, even the policy prohibiting 40 hex character tags isn't doing anything to stop a tag that matches the short commit ID?

Also, per this comment on a previous discussion on this incident at https://news.ycombinator.com/item?id=43367987#43369710:

> the real renovate bot immediately took the exfiltration commit from the fake renovate bot and started auto-merging it (updating full SHA1 references)

mmmaantu

SHA pinning won't necessarily help if the dependency you are pinning doesn't pin its own dependencies! You still get stuff pulled via vulnerable tags etc. How long till we get this https://github.com/github/roadmap/issues/592 ...

bracketfocus

https://github.com/features/preview/immutable-actions

They are actually releasing this very soon. I’ve seen some of my workflows use an immutable OCI image for some of GH’s actions like actions/checkout.

sepositus

Yes, this is a crucial distinction to make. The fact of the matter is that you have to treat GitHub Actions like a compromised system. Sure, there aren't a ton of steps you can take to protect builds if it's your primary builder, but you can, for example, not hook up an AWS account with full admin privileges to it (which I've seen more times than I would have liked to).

sureIy

Isn't that wrong? I think you have to pre-bundle your actions, it won't do an npm install.

mikepurvis

I set this up recently at a new company and used yarn + ncc to build compiled JS out of TypeScript. It was a bit hairy as a novice, but ended up working fine.

That protects from npm supply chain stuff, but obviously third-party includes like docker/build-push-action are still a risk.

thenaturalist

Thanks for highlighting this open issue.

The fact they've been stalling this for a good 2.5 years is... insane??

dietrichepp

I just started using GitHub Actions for a personal project, and as you do, I trawled HN for opinions on how to use it.

At first I built a workflow out of steps published on GitHub. I used ilammy/msvc-dev-cmd, lukka/get-cmake, and lukka/run-vcpkg, all to build a project with CMake for Windows targets. Of course I referred to actions by SHA like you should:

   uses: ilammy/msvc-dev-cmd@0b201ec74fa43914dc39ae48a89fd1d8cb592756

But one comment stuck with me. Something like, “You should just run your own code on GitHub Actions, rather than piecing it together from publicly available actions.” That made a lot of sense. I ended up writing a driver program for my personal project’s CI builds. One job builds the driver program, and then the next job runs the driver program to do the entire build.

I wouldn’t do this if I were getting paid for it… it’s more time-consuming. But it means that I am only minimally tied to GitHub Actions. I can run the build driver from my own computer easily enough.
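Roughly the shape of such a driver, as a sketch only: the original is described as a separate program built in its own job, while this version is a Python script with placeholder steps (hypothetical, standing in for real CMake or vcpkg invocations).

```python
import subprocess
import sys

# Hypothetical build steps; replace with the project's real commands.
STEPS = [
    ("configure", [sys.executable, "-c", "print('configuring')"]),
    ("build", [sys.executable, "-c", "print('building')"]),
]


def run_steps(steps) -> list[str]:
    """Run each step in order and stop at the first failure."""
    completed = []
    for name, argv in steps:
        print(f"==> {name}", flush=True)
        result = subprocess.run(argv)
        if result.returncode != 0:
            raise SystemExit(
                f"step {name!r} failed with exit code {result.returncode}"
            )
        completed.append(name)
    return completed


if __name__ == "__main__":
    run_steps(STEPS)
```

The workflow then needs a single `run:` step that invokes the script, and the same command works unchanged on a local machine.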

huijzer

> I ended up writing a driver program for my personal project’s CI builds. One job builds the driver program, and then the next job runs the driver program to do the entire build.

Yes things like that have been discussed before on HN. Also for example use a justfile (or something similar) and then call that from inside the Action to reduce vendor lock-in.

timewizard

I use Actions merely as a way to trigger a custom Webhook. Then I do everything on the server that receives the hook with my own code. I hate YAML that much.

dietrichepp

What I want is something like “please run this command on a server somewhere when event X happens”. Seems like the options are along the lines of:

1. SaaS CI/CD products, like GitHub Actions,

2. Run your own Jenkins cluster,

3. Figure out how to orchestrate cloud resources to do this for you.

Maybe there are easy options that I’m missing. I don’t really want to create docker containers just to build some program I’m working on.

CrimsonRain

Listen to a port from a server. Do a post with an API key. Then run your bash script there.

Run GitHub Actions self-hosted (takes 2 minutes to set up)

Just ssh in and run it.

So many options.
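The first option could look something like this minimal sketch, using only the Python standard library. The header name `X-Api-Key`, the key value, and the `build.sh` path are all assumptions for illustration, and a real deployment would want TLS in front of it.

```python
import hmac
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

API_KEY = "change-me"            # hypothetical shared secret
SCRIPT = ["bash", "./build.sh"]  # hypothetical script to run


def is_authorized(header_value, expected: str = API_KEY) -> bool:
    """Compare the presented key with the expected one in constant time."""
    presented = (header_value or "").encode()
    return hmac.compare_digest(presented, expected.encode())


class BuildHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if not is_authorized(self.headers.get("X-Api-Key")):
            self.send_response(403)
            self.end_headers()
            return
        # Start the script without waiting, so the request returns quickly.
        subprocess.Popen(SCRIPT)
        self.send_response(202)
        self.end_headers()


def make_server(host: str = "127.0.0.1", port: int = 8080) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer((host, port), BuildHandler)
```

Starting it is `make_server().serve_forever()`, and the CI side only has to send a POST with the key in the header.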

timewizard

That's effectively what we are doing. The webhook receives any "custom properties" you have defined on your repo, the ssh url, and critically, the name of the Action that was run. The receiving server can use all of this to select the appropriate pipeline. Our build server is not containerized.

arccy

why not just use regular GitHub webhooks...

timewizard

We do, but you can only trigger those on predefined events, and we want our release manager to be independent of any push or pull mechanisms on the repo. You can also run actions from the GitHub web UI, which makes them available even to non-technical managers.

Our Action has a single step, it has an "if: false" declaration so it never runs, and no runners are engaged. This immediately completes and fires off a "workflow_job" webhook which triggers the build server to act.
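On the receiving side, that flow could be sketched as below. GitHub signs webhook deliveries with an HMAC-SHA256 of the raw body in the `X-Hub-Signature-256` header; the pipeline names and the `workflow_job.workflow_name` payload field used to pick one are assumptions here, not details of the setup described above.

```python
import hashlib
import hmac
import json

# Hypothetical mapping from workflow name to a build-server pipeline.
PIPELINES = {"Release": "release-pipeline", "Nightly": "nightly-pipeline"}


def signature_is_valid(secret: bytes, body: bytes, header) -> bool:
    """Check X-Hub-Signature-256 against an HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), (header or "").encode())


def select_pipeline(secret: bytes, body: bytes, header):
    """Return the pipeline for a delivery, or None if it should be ignored."""
    if not signature_is_valid(secret, body, header):
        return None
    event = json.loads(body)
    # Assumed payload shape: {"workflow_job": {"workflow_name": "..."}}
    name = event.get("workflow_job", {}).get("workflow_name")
    return PIPELINES.get(name)
```

Verifying the signature before parsing the body matters here because the webhook endpoint is the only thing standing between the internet and the build server.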

bdcravens

GitHub Actions is definitely a vector for abuse.

I was looking at Seleniumbase recently, and they tell you that you can use GitHub Actions for web scraping to bypass a lot of blocks (apparently GitHub Actions uses residential IP space)

https://seleniumbase.com/new-video-unlimited-free-web-scrapi...

Marsymars

This seems like a wild thing for a third-party project to promote. The intention of GitHub Actions is to run CI/CD and other repository-related tasks. You’d never see, for instance, Adobe promoting, via YouTube, “unlimited free web OCR with Adobe CLI on GitHub Actions!”

I’ve never heard of Seleniumbase, but this makes them look like a rinky-dink project.

dylan604

The whole point of hacking is to use something in a way unintended by its maker. That could be for something cool/interesting, or it could be for something nefarious. Nobody ever thought a coffee maker should run Doom, but they do. Not sure if there's a morality-clause type of disqualifier for a Show HN, but there are a lot of people who would be interested in seeing how something benign was used for a different purpose. Especially if it saved them money/compute/time/resources/etc.

Intralexical

Yep. And "hackers" who apply that mindset to abusing publicly shared resources are why the rest of us are going to get DRM on our own, private coffee makers.

bastardoperator

I've found this to be a problem that a lot of CI providers suffer from. They allow extension via third-party code, which is awesome; people write useful code; a lot of that useful code doesn't get maintained properly or ever, rots, and eventually everyone has a security issue.

You can also see the GitHub IP space here, I don't think it's "residential", unless that terminology includes Azure and AWS?: https://api.github.com/meta

bdcravens

I'm not sure. Perhaps when you call a browser it's going through another network? I haven't tested this for myself, only going off of what was reported by that project.

bastardoperator

Maybe it was a self hosted runner? I run those locally all the time.

bri3d

They don't use a residential IP space; they use Azure Data Center (which, being less popular, isn't blocked as often as for example EC2).

meltyness

Network enabled compute is definitely an unusual free lunch, but I suppose the trade off is handing out free source code.

spaceywilly

This does not bode well for agentic AI

INTPenis

I always felt that GitLab CI was a lot more understandable. But in GitLab CI, just as in GitHub Actions, you're usually running some container. And aside from the container you're also running some globally defined actions.

That's the most obfuscated part for me, the globally defined actions that can belong to any organisation in GitHub.

In GitLab it was at most a globally defined git repo with templates, but you could somehow understand it better.

dqh

I've never liked the idea of community actions in the critical build path, so I use official actions/* when I can, and otherwise use actions/github-script to invoke the GitHub API via inline JavaScript when I can't.

donatj

> At a glance, this looks like an immutable reference to an already-released “version 2” of this action, but actually this is a mutable Git tag. If somebody changes the v2 tag in the tj-actions/changed-files repo to point to a different commit, this action will run different code the next time it runs.

The worst part of this is that this is BY DESIGN.

I maintain a small handful of actions. You are expected, as an action maintainer, to DELETE and RETAG your major versions when you release a new minor or patch version. That is to say, for instance, your v2 tag should point to the same commit as your latest 2.x.x tag.
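A small sketch of the bookkeeping this implies, assuming plain `vMAJOR.MINOR.PATCH` tag names with no pre-release suffixes. The function name and every tag in the example except v4 and v4.2.2 are made up for illustration.

```python
import re

SEMVER_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def latest_in_major(tags, major: int):
    """Return the highest vMAJOR.x.y tag, or None if there is none."""
    best = None
    for tag in tags:
        match = SEMVER_TAG.fullmatch(tag)
        if not match:
            continue  # skips the moving "v4" tag itself and anything else
        version = tuple(int(part) for part in match.groups())
        # Compare numerically, so 4.10.0 sorts above 4.2.2.
        if version[0] == major and (best is None or version > best[0]):
            best = (version, tag)
    return best[1] if best else None


# Hypothetical tag list; the moving v4 tag should point wherever this lands.
tags = ["v3.6.0", "v4", "v4.2.2", "v4.2.10", "v4.10.0"]
print(latest_in_major(tags, 4))
```

The maintainer then force-moves the major tag to that release, for example with `git tag -f v4 <release-tag>` followed by `git push --force origin v4`, which is exactly the mutation consumers of `@v4` pick up on their next run.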

Not everyone does this mind you, but this is the default and the expected way of operating.

I was frankly kind of taken aback when I learned this. I know for a fact documentation of this used to exist, but I am failing to find it currently.

You can, however, see GitHub themselves doing exactly this here, with the v4 and v4.2.2 tags matching (as of today; v4 will move in the future):

https://github.com/actions/checkout/tags

sureIy

This is probably the dumbest design decision of GitHub Actions. You'd think that the biggest git platform would know better than asking you to force push every release. They should have just used branches, because that's exactly what they're for. Or they should have resolved the right tag themselves, like npm does.

CrimsonRain

I only use a handful of official github and docker actions. This behavior is something that I (and many others) want in that case.

Here @v4 is very similar to how you'd tag a Docker image v4-latest.

Ultimately, it should be a choice, like you have with package.json: pin an exact version, allow patch updates, allow minor updates, or always take the latest.

Joker_vD

> Tags vs commit IDs is a tradeoff between convenience and security. Specifying an exact commit ID means the code won’t change unexpectedly, but tags are easier to read and compare.

Imagine if we could specify both the tag and its commit, and the runner would check, at run-time, whether the specified tag is still pointing to the specified commit. This would essentially "lock" the dependency. Although storing such "locks" inline would probably be a bit too ugly, maybe we could instead collect them all and store them in a separate "file of locks", so to speak. Does anyone know if something like this has been tried before or am I just making up stupid stuff?
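A minimal sketch of that check, assuming the "file of locks" has already been loaded into a dict and that a `resolve` callable (hypothetical; in practice it might wrap `git ls-remote`) returns the commit a tag currently points to. The SHAs below are made-up placeholders.

```python
def check_locks(locks: dict, resolve) -> list:
    """Return (ref, locked_sha, current_sha) for every ref that has moved."""
    moved = []
    for ref, locked_sha in sorted(locks.items()):
        current_sha = resolve(ref)
        if current_sha != locked_sha:
            moved.append((ref, locked_sha, current_sha))
    return moved


# Placeholder lock entries; the 40-character values are not real commits.
locks = {
    "actions/checkout@v4": "b" * 40,
    "tj-actions/changed-files@v2": "a" * 40,
}
# Stand-in for the remote state, where the v2 tag has been moved.
remote = {
    "actions/checkout@v4": "b" * 40,
    "tj-actions/changed-files@v2": "c" * 40,
}

for ref, locked, current in check_locks(locks, remote.get):
    print(f"{ref}: locked {locked[:7]}, now {current[:7]}")
```

A runner that refused to proceed whenever this list is non-empty would get the readability of tags with the immutability of SHAs.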

ph3t

GitHub’s dependency graph is supposed to give us this kind of visibility without any custom scripting, but from my experience it’s pretty spotty and often misses dependencies entirely.

Also, the script from the article doesn’t cover transitive GitHub Actions dependencies. So if a third-party action you’re using relies on a vulnerable action internally, it won’t catch that.
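Covering the transitive case could be sketched as a graph walk like the one below. It assumes a `uses_of` callable (hypothetical) that returns the `uses:` refs found inside a given action's own definition; fetching and parsing those definitions is left out, and the `example/...` action names are invented.

```python
import re

FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_transitive(roots, uses_of):
    """Walk the action dependency graph; return refs not pinned to a full SHA."""
    seen, stack, unpinned = set(), list(roots), []
    while stack:
        ref = stack.pop()
        if ref in seen:
            continue  # also protects against dependency cycles
        seen.add(ref)
        if not FULL_SHA.fullmatch(ref.rpartition("@")[2]):
            unpinned.append(ref)
        stack.extend(uses_of(ref))
    return sorted(unpinned)


sha = "0b201ec74fa43914dc39ae48a89fd1d8cb592756"
# Invented graph: a SHA-pinned action that itself uses tag and branch refs.
graph = {
    f"example/build@{sha}": ["example/setup@v1", f"example/cache@{sha}"],
    "example/setup@v1": ["example/download@main"],
}
print(unpinned_transitive([f"example/build@{sha}"], lambda r: graph.get(r, [])))
```

Here the top-level reference is pinned, yet two mutable refs still end up in the build, which is the gap a per-workflow scan misses.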

password4321

This article appears to be in response to the linked Tj-actions/changed-files GitHub Action Compromised – used by over 23K repos discussed at https://news.ycombinator.com/item?id=43367987 10 days ago; not a duplicate as it discusses a detection tool but perhaps it rhymes.

TheRealPomax

Was this an auto-generated comment? Because yes, of course it's related, it's someone doing their own investigations based on the news.

password4321

I tried to point out that though the topic is a duplicate, the link does add additional value. Do you not think the previous discussion is worth linking? Yeesh

fergie

Right, but let's get real here: GitHub Actions is fundamentally insecure. You are blindly trusting upstream libraries and GitHub itself to respect and protect your secrets.