Congratulations on creating the one billionth repository on GitHub
140 comments
· June 11, 2025 · Aachen
ash_091
A friend of mine spent an entire workday figuring out how to ensure he created the millionth ticket in our help desk. Not sure how he cracked it in the end, but we had a little team party to celebrate the achievement.
This was probably fifteen years ago. I feel like working in tech was more fun back then.
deruta
I was involved in the 99,999th and the 100,000th one in my FQA days.
We were being onboarded, they were just for demo and were promptly deleted. No one cared about the Cool Numbers.
jpsouth
In my first job I raised JIRA-1337 and was pretty chuffed with myself, being on a team of young, nerdy, gamer-type folk. My manager, not so much: they had wanted to raise it (for a meme?), but I was doing actual work rather than watching numbers go up, so it was quite satisfying that it turned out to be a genuine defect.
darkwater
I wonder what the latest ID is today, then...
caleblloyd
Awesome! Only a little over a billion more to go before GitHub’s very own OpenAPI Spec can start overflowing int32 on repositories too, just like it already does for workflow run IDs!
bartread
At the company where I did my stint as CTO, I turned up, noticed they were using 32-bit integers as primary keys on one of their key tables that already had 1.3 billion rows and, at the rate they were adding rows, would overflow their primary key values within months… so we ran a fairly urgent project to migrate the IDs to 64-bit to avoid the total meltdown that would otherwise have ensued.
darkwater
Lived through that with a MySQL table. The best part is that the table was eventually retired (long after the migration) because the whole data model around it was basically wrong.
gchamonlive
What are the challenges of such projects? How many people are usually involved? Does it incur downtime or significant technical challenges for either the infrastructure or the codebase?
bartread
Changing the type of the column is no big deal per se, except that on a massive table it’s a non-trivial operation, BUT you also have to change the type in everything that touches it: everywhere it’s assigned or copied, everywhere it’s sent over the wire and deserialized under assumptions about its size, any tests, and on, and on. And god help you if you’ve got stuff like int.MaxValue having a special meaning (we didn’t in this context, fortunately).
Our hosting environment at that time was a data centre, so we were limited on storage, which complicated matters a bit. Ideally you’d create a copy of the table with a wider PK column and write to both tables, then migrate your reads, etc., but we couldn’t do that because the table was massive and we didn’t have enough space. Procuring more drives was possible but sometimes took weeks - not just dragging a slider in your cloud portal. And then of course you’d have to schedule a maintenance window for somebody to plug them in. It was absolutely archaic, especially when you consider this was late 2017/early 2018.
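For anyone curious what that dual-write pattern looks like in practice, here's a minimal sketch in TypeScript, assuming a MySQL-flavoured schema with a hypothetical orders table and a thin db.query helper (none of this is from the actual project); a real migration would also need a backfill of existing rows and verification before cutting reads over:

    // Hypothetical dual-write during a PK-widening migration.
    // orders:    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, payload TEXT
    // orders_v2: id BIGINT UNSIGNED PRIMARY KEY, payload TEXT (same values, wider type)
    interface Db {
      query(sql: string, params: unknown[]): Promise<void>;
    }

    async function createOrder(db: Db, payload: string): Promise<void> {
      // Write to the old table first so the INT id is generated exactly as before...
      await db.query("INSERT INTO orders (payload) VALUES (?)", [payload]);
      // ...then mirror the row into the wide-keyed table on the same connection,
      // using LAST_INSERT_ID() to copy the id that was just assigned.
      await db.query(
        "INSERT INTO orders_v2 (id, payload) SELECT id, payload FROM orders WHERE id = LAST_INSERT_ID()",
        []
      );
    }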
You need multiple environments so you can do thorough testing, which we barely had at that point, and because every major system component was impacted, we had to redeploy our entire platform. Also, because it was the PK column affected, we couldn’t do any kind of staged migration or rollback without the project becoming much more complex and taking a lot longer - time we didn’t have due to the rate at which we were consuming 32-bit integer values.
In the end it went off without a hitch, but pushing it live was still a bit of a white knuckle moment.
tengbretson
If you've written your services in JavaScript, going from i32 to i64 means your driver is probably going to return it as a string (or a BigInt, or some custom Decimal type) rather than the IEEE 754 number you were getting before. This means you now need to change your interfaces (both internal and public-facing) to a string or some other safely serializable representation. And if you're going to go through all that trouble, you may as well take the opportunity to switch to some UUID strategy anyway.
The alternative is that you can monkey-patch the database driver to parse the i64 id as an IEEE754 number anyway and deal with this problem later when you overflow the JavaScript max safe integer size (2^53), except when that happens it will manifest in some really wacky ways, rather than the db just refusing to insert a new row.
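To make that 2^53 failure mode concrete, here's a small self-contained TypeScript illustration (not tied to any particular driver) of what happens when int64 IDs arriving as decimal strings are coerced into plain numbers:

    // Number.MAX_SAFE_INTEGER is 2^53 - 1 = 9007199254740991; beyond that,
    // distinct integers start mapping to the same IEEE 754 double.
    const ids = ["9007199254740991", "9007199254740992", "9007199254740993"];

    for (const raw of ids) {
      const asNumber = Number(raw);       // what a monkey-patched driver would return
      const roundTrip = String(asNumber); // what you'd send back out in an API response
      console.log(raw, "->", roundTrip, roundTrip === raw ? "ok" : "PRECISION LOST");
    }
    // 9007199254740991 -> 9007199254740991 ok
    // 9007199254740992 -> 9007199254740992 ok   (happens to survive)
    // 9007199254740993 -> 9007199254740992 PRECISION LOST (two distinct IDs now collide)

    // The safe alternatives: keep the ID as a string end to end, or use BigInt.
    const safe: bigint = BigInt("9007199254740993");
    console.log(safe.toString()); // 9007199254740993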
roberttod
I remember such a project; due to our large, aging TypeScript frontend projects it would have added a couple of weeks just to adjust all the affected types. IDs were used in many places deep in the code, and the mismatch caused thousands of type errors, which was a nightmare. I can't remember exactly why it was so tough to go through them all, but we were under intense time pressure.
To speed things up we decided to correct the ID types for the server response, which was key since they were generated from protobuf. But we kept number-typed IDs everywhere else, even though the values would actually be strings, which wouldn't cause many issues because there ain't much reason to be doing numeric operations on an ID, except the odd sort function.
I remember the smirk on my face when I suggested it to my colleague; at the time we knew it was what made sense. It must have been one of the dumbest solutions I've ever thought of, but it allowed us to eventually switch the type to string as we changed code, instead of converting the entire repos at once. Such a JavaScript memory, that one :)
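A rough sketch of that trick, with made-up interface names (not the original codebase's): the regenerated server types say string, a cast lies to the compiler so legacy number-typed code keeps compiling, and the main runtime surprise is lexicographic ordering in the odd sort:

    interface UserResponse {
      id: string; // what the regenerated (e.g. protobuf) server types now say
      name: string;
    }

    interface User {
      id: number; // what thousands of existing call sites still expect
      name: string;
    }

    function fromResponse(r: UserResponse): User {
      // The whole trick: at runtime the id stays a string, but the compiler is
      // told it's a number, so nothing downstream has to change yet.
      return { id: r.id as unknown as number, name: r.name };
    }

    // Where it bites: comparisons are now lexicographic, so id "9" sorts after
    // id "10" until the sort call site is eventually fixed.
    const users = [fromResponse({ id: "10", name: "a" }), fromResponse({ id: "9", name: "b" })];
    users.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
    console.log(users.map((u) => u.id)); // ["10", "9"] - lexicographic, not numeric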
jiggawatts
Not the original commenter, but I've read through half a dozen post-mortems about this kind of thing. The answer is: yes. There are challenges, and sometimes downtime and/or breaking changes are inevitable.
For one, if your IDs are approaching the 2^31 signed integer limit, then by definition, you have nearly two billion rows, which is a very big DB table! There are only a handful of systems that can handle any kind of change to that volume of data quickly. Everything you do to it will either need hours of downtime or careful orchestration of incremental/rolling changes. This issue tends to manifest first on the "biggest" and hence most important table in the business such as "sales entries" or "user comments". It's never some peripheral thing that nobody cares about.
Second, if you're using small integer IDs, that decision was probably motivated in part because you're using those integers as foreign keys and for making your secondary indexes more efficient. GUIDs are "simpler" in some ways but need 4x the data storage (assuming you're using a clustered database like MySQL or SQL Server). Even just the change from 32-bits to 64-bits doubles the size of the storage in a lot of places. For 2 billion rows, this is 8 GB more data minimum, but is almost certainly north of 100 GB across all tables and indexes.
Third, many database engines will refuse to establish foreign key constraints if the types don't match. This can force big-bang changes or very complex duplication of data during the migration phase.
Fourth, this is a breaking change to all of your APIs, both internal and external. Every ORM, REST endpoint, etc... will have to be updated with a new major version. There's a chance that all of your analytics, ETL jobs, etc... will also need to be touched.
Fun times.
cyberax
The same story happened inside Amazon.
neomantra
A couple of weeks ago there were some Lua community issues because LuaRocks surpassed 65,535 packages.
There was a conflict between this and the LuaRocks implementation under LuaJIT [1] [2], inflicting pain on a narrow set of users as their CI/CD pipelines and personal workflows failed.
It was resolved pretty quickly, but interesting!
[1] https://github.com/luarocks/luarocks/issues/1797
[2] https://github.com/openresty/docker-openresty/issues/276
JKCalhoun
I wish I were still at Apple. Probably most people here know that Apple has used an internal tool called "Radar" since, well, forever. Each "Radar" has an ID (bug #) associated with it.
Radars that were bug #1,000,000, etc. were kind of special. Unless someone screwed up (and let down the whole team) they were usually faux-Radars with lots of inside jokes, etc.
Pulling up one was enough since the Radar could reference other Radars ... and generally you would go down the rabbit hole at that point enjoying the ride.
I was a dumbass not to capture (heck, even print) a few of those when I had the opportunity.
xmprt
> I was a dumbass not to capture (heck, even print) a few of those when I had the opportunity.
On the other hand, given how Apple deals with confidential data, you probably wouldn't want to be caught exfiltrating internal documents however benign they are.
bjackman
At Google, the monorepo VCS has monotonic IDs like this for changes. Unfortunately a few years ago when approaching some round number, the system was DOS'd by people running scripts trying to snag the ID. So now it skips IDs in the vicinity of big round numbers :(
I think there's probably a lesson in there about schema design...
msarnoff
#SnakesOnARadar
8organicbits
While we are doing cool GitHub repo IDs, the first is here:
mkagenius
The millionth one is "vim-scripts/nexus.vim"
The 1000th is missing.
dgellow
And the first commit: https://github.com/mojombo/grit/commit/634396b2f541a9f2d58b0...
CGamesPlay
Probably created via a script that just repeatedly checked https://api.github.com/repositories/999999999 until it showed up, and then created a new repository. Since repositories can be modified (and deleted), they could even have given themselves some buffer: create a bunch of repos and just delete the ones that don't get the right number. [append] Looking at the author's other repo created yesterday, I'm betting "yep" was supposed to be the magic number, and "shit" was an admission of missing the mark.
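If that guess is right, the script could have been as small as the following TypeScript sketch. This is purely speculative: the two endpoints (GET /repositories/{id} and POST /user/repos) are real GitHub API routes, but the token handling, repo name, and timing are all assumptions, and nothing here is the actual author's code:

    const TARGET = 1_000_000_000;
    const TOKEN = process.env.GITHUB_TOKEN ?? ""; // hypothetical; needed for the create call

    async function exists(id: number): Promise<boolean> {
      const res = await fetch(`https://api.github.com/repositories/${id}`);
      return res.status === 200;
    }

    async function snipe(): Promise<void> {
      // Wait for the repo just before the target ID to appear...
      while (!(await exists(TARGET - 1))) {
        await new Promise((resolve) => setTimeout(resolve, 1000));
      }
      // ...then immediately create one and see which ID it landed on.
      const res = await fetch("https://api.github.com/user/repos", {
        method: "POST",
        headers: { Authorization: `Bearer ${TOKEN}`, Accept: "application/vnd.github+json" },
        body: JSON.stringify({ name: `attempt-${Date.now()}`, private: false }),
      });
      const repo = await res.json();
      console.log(repo.id === TARGET ? "got it!" : `missed: got ${repo.id}`);
    }

    snipe();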
Does anyone remember D666666 from Facebook? It was a massive codemod; the author used a technique similar to this one to get that particular number.
notfed
Or...not. Why are you assuming this guy purposely grabbed the repo?
CGamesPlay
Mostly just to share an approach to solving the "problem" of getting memorable numbers from a pool of sequential IDs.
But given that this user doesn't have activity very often, and created two repositories as the number was getting close, it feels likely that it was deliberate. I could be wrong!
topherPedersen
You solved the mystery!
umanwizard
On a serious note, I'm a bit surprised that GitHub makes it trivial to compute the rate at which new repositories are created. Isn't that kind of information usually a corporate secret?
cheschire
When your moat is a billion wide, you tend to walk around in your underwear a bit more I guess.
90s_dev
Excellent Diogenes quote reference.
sebastiennight
Do you mean a specific quote here? I couldn't find the reference.
NooneAtAll3
unless you're youtube?
raincole
Is there any reason for GitHub to hide this information though? How could it be used against them?
(I understand many companies default to not expose any information unless forced otherwise.)
xboxnolifes
Companies usually hide this type of information so competitors have a harder time determining if they are growing/shrinking/neutral.
blitzar
Companies usually hide this type of information so VC's / stonk investors will give them more money.
dietr1ch
And engineers thinking of scale would usually try to steer away from a sequential ID because of self-inflicted global locking and hot spots.
toast0
The rate of creation is, like, meh, but being able to enumerate all of the repos might be problematic: following new repos and scanning them for leaked credentials could be a negative... but GitHub may have a feed of new repos anyway?
Also, having a sequence implies at least a global lock on that sequence during repo creation. Repo creation could otherwise be a scoped lock. OTOH, it's not necessarily handled that way --- they could hand out ranges of sequences to different servers/regions and the repo id may not be actually sequential.
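For illustration, the "hand out ranges" idea is essentially a hi/lo allocator. A minimal TypeScript sketch (all names hypothetical, not how GitHub actually does it): each server takes one round-trip to the central sequence per block instead of one per repo, so per-creation work needs no global lock:

    class BlockAllocator {
      private next = 0;
      private end = 0;

      constructor(
        // leaseBlock would hit the central sequence (a DB counter, say) and
        // return the first ID of a freshly reserved block.
        private leaseBlock: (size: number) => Promise<number>,
        private blockSize = 1024,
      ) {}

      async nextId(): Promise<number> {
        if (this.next >= this.end) {
          this.next = await this.leaseBlock(this.blockSize);
          this.end = this.next + this.blockSize;
        }
        return this.next++;
      }
    }

IDs from different servers stay unique but only roughly ordered, and blocks abandoned on restart leave gaps, which is one reason the visible sequence might skip numbers.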
progval
> but github may have a feed of new repos anyway?
Yes: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-... (you can filter to only show repositories created since a given date).
4hg4ufxhy
What would be the issue with global lock? I think repo creation is a very rare event when measured in computer time.
colechristensen
>following new repos and scanning them for leaked credentials could be a negative
People do this. GitHub started doing it too so now you get a nice email from them first instead of another kind of surprise.
beaugunderson
and you can find the latest ID incredibly quickly using binary search! (I used to track a bunch of websites' growth this way)
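For the curious, a minimal TypeScript sketch of that binary search against GitHub's GET /repositories/{id} endpoint. Deleted and private repos also return 404, so this gives an estimate of the high-water mark rather than an exact answer:

    // Treat "the API resolves this ID" as a roughly monotone predicate and
    // binary-search the boundary between assigned and not-yet-assigned IDs.
    async function assigned(id: number): Promise<boolean> {
      const res = await fetch(`https://api.github.com/repositories/${id}`);
      return res.status === 200;
    }

    async function latestRepoId(): Promise<number> {
      // Grow an upper bound by doubling, then bisect the boundary.
      let hi = 1;
      while (await assigned(hi)) hi *= 2; // hi now points past the newest repo
      let lo = Math.floor(hi / 2);        // last ID known to be assigned
      while (lo + 1 < hi) {
        const mid = Math.floor((lo + hi) / 2);
        if (await assigned(mid)) lo = mid;
        else hi = mid;
      }
      return lo; // a few dozen unauthenticated requests for IDs around a billion
    }

    latestRepoId().then((id) => console.log("latest repo id ≈", id));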
paulddraper
You can see the rate of creation of new users too.
Which is arguably even more interesting…
Cyphase
I'm wondering if AasishPokhrel created this repo for the purpose of being the billionth.
paxys
It's pretty easy to game this. Just keep creating repos till you hit number one billion and remove the old ones. Their API makes it trivial. The only issues are rate limits and other people simultaneously creating repos, so it's partly a matter of luck.
GodelNumbering
There was a guy who got fired from Meta for creating excessive automated diffs in pursuit of a certain magic number
paxys
I hope PR #80085 was worth it.
fancyswimtime
69 doesn't seem excessive
nithssh
Sounds interesting, is there anything online about this?
recursive
I don't believe they will renumber the old ones. Also, it can't be trivial, since two people can try this, and only one can win.
Macha
The one who lost doesn't get discussed in this thread.
handfuloflight
There is always one trillion to look forward to!
maniacalhack0r
AasishPokhrel made 2 repos yesterday - "shit" and "yep". No activity between May 17th and June 10th.
I have no idea if it's possible to calculate the rate at which repos are being created and time your repo creation to hit vanity numbers.
kylehotchkiss
I think he's in university for software development in Nepal, and it's really touching that a milestone like this could reach so far out into the world. Hopefully he has a big spot for this on his resume and can find a great career in development!
netsharc
I don't get why this needs a big spot on his resume, or why it should lead to a great career. I'd rate any company/hiring manager that thinks being lucky enough to hit a magic number on some system has any relevance to the work as very insane...
mkagenius
I would be really sus of someone's intelligence if they mentioned this as an achievement. It's fine as a joke, though.
notfed
I find a bit of humor in the fact that this is completely unsolicited attention. There's even a chance the guy is oblivious.
joshdavham
I highly doubt it, but that does sound possible.
Sohcahtoa82
The repo seems to have gotten renamed and now redirects to https://github.com/AasishPokhrel/repository/
Lame. :-(
Sohcahtoa82
It was renamed back! :-D
Aachen
Makes me wonder how many repositories exist in general, across all the local Forgejo and GitLab servers. Heck, include Subversion and Mercurial and git's other friends (and foes!)
Did anyone make a search engine for these yet, so we'd be able to get an estimate by searching for the word "a" or so?
(This always seemed like the big upside of centralised GitHub to me: people can actually find your code. I've been thinking of building such a search engine since MS bought GH, but I didn't think I could handle the marketing side, so it seemed like a waste of effort and I never did it. Recently I was considering whether it would be worth revisiting, given the various projects I'm putting on Codeberg, but maybe someone beat me to the punch)
mdaniel
Well, based on the API enumeration mentioned in sibling comments, surely one doesn't have to estimate
https://docs.gitlab.com/api/projects/#list-all-projects (for dumb reasons it seems GL calls them Projects, not Repositories)
https://codeberg.org/api/swagger#/repository/repoGetByID (that was linked to by the Forgejo.org site, so presumably it's the same for it and Codeberg) and its friend https://gitea.com/api/swagger#/repository/repoGetByID
Heptapod is a "friendly fork" of GitLab CE so its API works the same: https://heptapod.net/pages/faq#api-hgrc
and then I'd guess one would need to index the per-project GitLab instances: Gnome, GNU (if they ever open theirs back up), whatever's going on with Savannah, probably Sourceforge, maybe sourcehut (assuming he doesn't have some political reason to block you), etc
If I won the lottery, I'd probably bankroll a sourcegraph instance (from back when they were Apache) across everything I could get my hands upon, and donate snapshots of it to the Internet Archive
progval
At Software Heritage, we have listed 380M public repositories, 280M of which are on GitHub: https://archive.softwareheritage.org/
Repository search is pretty limited so far: only full-text search on URLs or in a small list of metadata files like package.json.
zaps
Respect to GitHub for committing to the bit
null
Reminds me of the 100 millionth OpenStreetMap changeset (commit). A few people, myself included, were casually trying for it, but in the end it went to someone who wasn't trying and was just busy mapping Africa! Much more wholesome, in hindsight. This person had also previously been nominated for an OSM award. I guess it helps that OpenStreetMap doesn't really allow for creating junk, because it's all live in production, and that's why the Nth changeset is far more likely to land on whatever someone happened to be doing? Either way, a fun achievement for GitHub :)
In case anyone cares to read more about the OSM milestone, the official blog entry: https://blog.openstreetmap.org/2021/02/25/100-million-edits-... My write-up of changeset activity around the event: https://www.openstreetmap.org/user/LucGommans/diary/395954