Congratulations on creating the one billionth repository on GitHub
140 comments
· June 11, 2025 · Aachen
ash_091
A friend of mine spent an entire workday figuring out how to ensure he created the millionth ticket in our help desk. Not sure how he cracked it in the end, but we had a little team party to celebrate the achievement.
This was probably fifteen years ago. I feel like working in tech was more fun back then.
deruta
I was involved in the 99,999th and the 100,000th one in my FQA days.
We were being onboarded, they were just for demo and were promptly deleted. No one cared about the Cool Numbers.
jpsouth
In my first job I raised JIRA-1337 and was pretty chuffed with myself, being on a team of young, nerdy, gamer-type folk. My manager, not so much: they had wanted to raise it (for a meme?), but I was doing actual work rather than watching numbers go up, so it was quite satisfying that it turned out to be a genuine defect.
darkwater
I wonder what the latest ID is today, then...
caleblloyd
Awesome! Only a little over a billion more to go before GitHub’s very own OpenAPI Spec can start overflowing int32 on repositories too, just like it already does for workflow run IDs!
bartread
At the company where I did my stint as CTO, I turned up, noticed they were using 32-bit integers as primary keys on one of their key tables that already had 1.3 billion rows and, at the rate they were adding rows, would overflow their primary key values within months… so we ran a fairly urgent project to migrate the IDs to 64-bit to avoid the total meltdown that would otherwise have ensued.
darkwater
Lived through that with a MySQL table. The best part is that the table was eventually retired (long after the migration) because the whole data model around it was basically wrong.
gchamonlive
What are the challenges of such projects? How many people are usually involved? Does it incur downtime or significant technical challenges for either the infrastructure or the codebase?
bartread
Changing the type of the column is no big deal per se, except that on a massive table it’s a non-trivial operation, BUT you also have to change the type in everything that touches it: everywhere it’s assigned or copied, everywhere it’s sent over the wire and deserialized under assumptions about its size, any tests, and on, and on. And god help you if you’ve got stuff like int.MaxValue having a special meaning (we didn’t in this context, fortunately).
Our hosting environment at that time was a data centre, so we were limited on storage, which complicated matters a bit. Ideally you’d create a copy of the table with a wider PK column and write to both tables, then migrate your reads, etc., but we couldn’t do that because the table was massive and we didn’t have enough space. Procuring more drives was possible but sometimes took weeks - not just dragging a slider in your cloud portal. And then of course you’d have to schedule a maintenance window for somebody to plug them in. It was absolutely archaic, especially when you consider this was late 2017/early 2018.
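For anyone curious what that dual-write pattern looks like in practice, here's a minimal sketch in TypeScript, assuming a MySQL-flavoured schema with a hypothetical orders table and a thin db.query helper (none of this is from the actual project); a real migration would also need a backfill of existing rows and verification before cutting reads over:

    // Hypothetical dual-write during a PK-widening migration.
    // orders:    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, payload TEXT
    // orders_v2: id BIGINT UNSIGNED PRIMARY KEY, payload TEXT (same values, wider type)
    interface Db {
      query(sql: string, params: unknown[]): Promise<void>;
    }

    async function createOrder(db: Db, payload: string): Promise<void> {
      // Write to the old table first so the INT id is generated exactly as before...
      await db.query("INSERT INTO orders (payload) VALUES (?)", [payload]);
      // ...then mirror the row into the wide-keyed table on the same connection,
      // using LAST_INSERT_ID() to copy the id that was just assigned.
      await db.query(
        "INSERT INTO orders_v2 (id, payload) SELECT id, payload FROM orders WHERE id = LAST_INSERT_ID()",
        []
      );
    }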
You need multiple environments so you can do thorough testing, which we barely had at that point, and because every major system component was impacted, we had to redeploy our entire platform. Also, because it was the PK column affected, we couldn’t do any kind of staged migration or rollback without the project becoming much more complex and taking a lot longer - time we didn’t have due to the rate at which we were consuming 32-bit integer values.
In the end it went off without a hitch, but pushing it live was still a bit of a white knuckle moment.
tengbretson
If you've written your services in JavaScript, going from i32 to i64 means your driver is probably going to return it as a string (or a BigInt, or some custom Decimal type) rather than the IEEE 754 number you were getting before. This means you now need to change your interfaces (both internal and public-facing) to a string or some other safely serializable representation. And if you're going to go through all that trouble, you may as well take the opportunity to switch to some UUID strategy anyway.
The alternative is that you can monkey-patch the database driver to parse the i64 id as an IEEE754 number anyway and deal with this problem later when you overflow the JavaScript max safe integer size (2^53), except when that happens it will manifest in some really wacky ways, rather than the db just refusing to insert a new row.
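To make that 2^53 failure mode concrete, here's a small self-contained TypeScript illustration (not tied to any particular driver) of what happens when int64 IDs arriving as decimal strings are coerced into plain numbers:

    // Number.MAX_SAFE_INTEGER is 2^53 - 1 = 9007199254740991; beyond that,
    // distinct integers start mapping to the same IEEE 754 double.
    const ids = ["9007199254740991", "9007199254740992", "9007199254740993"];

    for (const raw of ids) {
      const asNumber = Number(raw);       // what a monkey-patched driver would return
      const roundTrip = String(asNumber); // what you'd send back out in an API response
      console.log(raw, "->", roundTrip, roundTrip === raw ? "ok" : "PRECISION LOST");
    }
    // 9007199254740991 -> 9007199254740991 ok
    // 9007199254740992 -> 9007199254740992 ok   (happens to survive)
    // 9007199254740993 -> 9007199254740992 PRECISION LOST (two distinct IDs now collide)

    // The safe alternatives: keep the ID as a string end to end, or use BigInt.
    const safe: bigint = BigInt("9007199254740993");
    console.log(safe.toString()); // 9007199254740993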
roberttod
I remember such a project; due to our large, aging TypeScript frontend projects it would have added a couple of weeks just to adjust all the affected types. IDs were used in many places deep in the code, and the mismatch caused thousands of type errors, which was a nightmare. I can't remember exactly why it was so tough to go through them all, but we were under intense time pressure.
To speed things up we decided to correct the ID types for the server response, which was key since they were generated from protobuf. But we kept number-typed IDs everywhere else, even though the values would actually be strings, which wouldn't cause many issues because there ain't much reason to be doing numeric operations on an ID, except the odd sort function.
I remember the smirk on my face when I suggested it to my colleague; at the time we knew it was what made sense. It must have been one of the dumbest solutions I've ever thought of, but it allowed us to eventually switch the type to string as we changed code, instead of converting the entire repos at once. Such a JavaScript memory, that one :)
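A rough sketch of that trick, with made-up interface names (not the original codebase's): the regenerated server types say string, a cast lies to the compiler so legacy number-typed code keeps compiling, and the main runtime surprise is lexicographic ordering in the odd sort:

    interface UserResponse {
      id: string; // what the regenerated (e.g. protobuf) server types now say
      name: string;
    }

    interface User {
      id: number; // what thousands of existing call sites still expect
      name: string;
    }

    function fromResponse(r: UserResponse): User {
      // The whole trick: at runtime the id stays a string, but the compiler is
      // told it's a number, so nothing downstream has to change yet.
      return { id: r.id as unknown as number, name: r.name };
    }

    // Where it bites: comparisons are now lexicographic, so id "9" sorts after
    // id "10" until the sort call site is eventually fixed.
    const users = [fromResponse({ id: "10", name: "a" }), fromResponse({ id: "9", name: "b" })];
    users.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
    console.log(users.map((u) => u.id)); // ["10", "9"] - lexicographic, not numeric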
jiggawatts
Not the original commenter, but I've read through half a dozen post-mortems about this kind of thing. The answer is: yes. There are challenges, and sometimes downtime and/or breaking changes are inevitable.
For one, if your IDs are approaching the 2^31 signed integer limit, then by definition, you have nearly two billion rows, which is a very big DB table! There are only a handful of systems that can handle any kind of change to that volume of data quickly. Everything you do to it will either need hours of downtime or careful orchestration of incremental/rolling changes. This issue tends to manifest first on the "biggest" and hence most important table in the business such as "sales entries" or "user comments". It's never some peripheral thing that nobody cares about.
Second, if you're using small integer IDs, that decision was probably motivated in part because you're using those integers as foreign keys and for making your secondary indexes more efficient. GUIDs are "simpler" in some ways but need 4x the data storage (assuming you're using a clustered database like MySQL or SQL Server). Even just the change from 32-bits to 64-bits doubles the size of the storage in a lot of places. For 2 billion rows, this is 8 GB more data minimum, but is almost certainly north of 100 GB across all tables and indexes.
Third, many database engines will refuse to establish foreign key constraints if the types don't match. This can force big-bang changes or very complex duplication of data during the migration phase.
Fourth, this is a breaking change to all of your APIs, both internal and external. Every ORM, REST endpoint, etc... will have to be updated with a new major version. There's a chance that all of your analytics, ETL jobs, etc... will also need to be touched.
Fun times.
cyberax
The same story happened inside Amazon.
neomantra
A couple of weeks ago there were some Lua community issues because LuaRocks surpassed 65,535 packages.
There was a conflict between this and the LuaRocks implementation under LuaJIT [1] [2], inflicting pain on a narrow set of users as their CI/CD pipelines and personal workflows failed.
It was resolved pretty quickly, but interesting!
[1] https://github.com/luarocks/luarocks/issues/1797
[2] https://github.com/openresty/docker-openresty/issues/276
JKCalhoun
I wish I were still at Apple. Probably most people here know that Apple has used an internal tool called "Radar" since, well, forever. Each "Radar" has an ID (bug #) associated with it.
Radars that were bug #1,000,000, etc. were kind of special. Unless someone screwed up (and let down the whole team) they were usually faux-Radars with lots of inside jokes, etc.
Pulling up one was enough since the Radar could reference other Radars ... and generally you would go down the rabbit hole at that point enjoying the ride.
I was a dumbass not to capture (heck, even print) a few of those when I had the opportunity.
xmprt
> I was a dumbass not to capture (heck, even print) a few of those when I had the opportunity.
On the other hand, given how Apple deals with confidential data, you probably wouldn't want to be caught exfiltrating internal documents however benign they are.
bjackman
At Google, the monorepo VCS has monotonic IDs like this for changes. Unfortunately a few years ago when approaching some round number, the system was DOS'd by people running scripts trying to snag the ID. So now it skips IDs in the vicinity of big round numbers :(
I think there's probably a lesson in there about schema design...
msarnoff
#SnakesOnARadar
8organicbits
While we are doing cool GitHub repo IDs, the first is here:
mkagenius
The millionth one is "vim-scripts/nexus.vim"
The 1000th is missing.
dgellow
And the first commit: https://github.com/mojombo/grit/commit/634396b2f541a9f2d58b0...
CGamesPlay
Probably created via a script that just repeatedly checked https://api.github.com/repositories/999999999 until it showed up, and then created a new repository. Since repositories can be modified (and deleted), they could even have given themselves some buffer: create a bunch of repos and just delete the ones that don't get the right number. [append] Looking at the author's other repo created yesterday, I'm betting "yep" was supposed to be the magic number, and "shit" was an admission of missing the mark.
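If that guess is right, the script could have been as small as the following TypeScript sketch. This is purely speculative: the two endpoints (GET /repositories/{id} and POST /user/repos) are real GitHub API routes, but the token handling, repo name, and timing are all assumptions, and nothing here is the actual author's code:

    const TARGET = 1_000_000_000;
    const TOKEN = process.env.GITHUB_TOKEN ?? ""; // hypothetical; needed for the create call

    async function exists(id: number): Promise<boolean> {
      const res = await fetch(`https://api.github.com/repositories/${id}`);
      return res.status === 200;
    }

    async function snipe(): Promise<void> {
      // Wait for the repo just before the target ID to appear...
      while (!(await exists(TARGET - 1))) {
        await new Promise((resolve) => setTimeout(resolve, 1000));
      }
      // ...then immediately create one and see which ID it landed on.
      const res = await fetch("https://api.github.com/user/repos", {
        method: "POST",
        headers: { Authorization: `Bearer ${TOKEN}`, Accept: "application/vnd.github+json" },
        body: JSON.stringify({ name: `attempt-${Date.now()}`, private: false }),
      });
      const repo = await res.json();
      console.log(repo.id === TARGET ? "got it!" : `missed: got ${repo.id}`);
    }

    snipe();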
Does anyone remember D666666 from Facebook? It was a massive codemod; the author used a technique similar to this one to get that particular number.
notfed
Or...not. Why are you assuming this guy purposely grabbed the repo?
CGamesPlay
Mostly just to share an approach to solving the "problem" of getting memorable numbers from a pool of sequential IDs.
But given that this user doesn't have activity very often, and created two repositories as the number was getting close, it feels likely that it was deliberate. I could be wrong!
topherPedersen
You solved the mystery!
umanwizard
On a serious note, I'm a bit surprised that GitHub makes it trivial to compute the rate at which new repositories are created. Isn't that kind of information usually a corporate secret?
cheschire
When your moat is a billion wide, you tend to walk around in your underwear a bit more I guess.
90s_dev
Excellent Diogenes quote reference.
sebastiennight
Do you mean a specific quote here? I couldn't find the reference.
NooneAtAll3
unless you're youtube?
raincole
Is there any reason for GitHub to hide this information though? How could it be used against them?
(I understand many companies default to not expose any information unless forced otherwise.)
xboxnolifes
Companies usually hide this type of information so competitors have a harder time determining if they are growing/shrinking/neutral.
blitzar
Companies usually hide this type of information so VC's / stonk investors will give them more money.
dietr1ch
And engineers thinking of scale would usually try to steer away from a sequential ID because of self-inflicted global locking and hot spots.
toast0
The rate of creation is, like, meh, but being able to enumerate all of the repos might be problematic: following new repos and scanning them for leaked credentials could be a negative... but GitHub may have a feed of new repos anyway?
Also, having a sequence implies at least a global lock on that sequence during repo creation. Repo creation could otherwise be a scoped lock. OTOH, it's not necessarily handled that way --- they could hand out ranges of sequences to different servers/regions and the repo id may not be actually sequential.
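For illustration, the "hand out ranges" idea is essentially a hi/lo allocator. A minimal TypeScript sketch (all names hypothetical, not how GitHub actually does it): each server takes one round-trip to the central sequence per block instead of one per repo, so per-creation work needs no global lock:

    class BlockAllocator {
      private next = 0;
      private end = 0;

      constructor(
        // leaseBlock would hit the central sequence (a DB counter, say) and
        // return the first ID of a freshly reserved block.
        private leaseBlock: (size: number) => Promise<number>,
        private blockSize = 1024,
      ) {}

      async nextId(): Promise<number> {
        if (this.next >= this.end) {
          this.next = await this.leaseBlock(this.blockSize);
          this.end = this.next + this.blockSize;
        }
        return this.next++;
      }
    }

IDs from different servers stay unique but only roughly ordered, and blocks abandoned on restart leave gaps, which is one reason the visible sequence might skip numbers.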
progval
> but github may have a feed of new repos anyway?
Yes: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-... (you can filter to only show repositories created since a given date).
4hg4ufxhy
What would be the issue with global lock? I think repo creation is a very rare event when measured in computer time.
colechristensen
>following new repos and scanning them for leaked credentials could be a negative
People do this. GitHub started doing it too so now you get a nice email from them first instead of another kind of surprise.
beaugunderson
and you can find the latest ID incredibly quickly using binary search! (I used to track a bunch of websites' growth this way)
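For the curious, a minimal TypeScript sketch of that binary search against GitHub's GET /repositories/{id} endpoint. Deleted and private repos also return 404, so this gives an estimate of the high-water mark rather than an exact answer:

    // Treat "the API resolves this ID" as a roughly monotone predicate and
    // binary-search the boundary between assigned and not-yet-assigned IDs.
    async function assigned(id: number): Promise<boolean> {
      const res = await fetch(`https://api.github.com/repositories/${id}`);
      return res.status === 200;
    }

    async function latestRepoId(): Promise<number> {
      // Grow an upper bound by doubling, then bisect the boundary.
      let hi = 1;
      while (await assigned(hi)) hi *= 2; // hi now points past the newest repo
      let lo = Math.floor(hi / 2);        // last ID known to be assigned
      while (lo + 1 < hi) {
        const mid = Math.floor((lo + hi) / 2);
        if (await assigned(mid)) lo = mid;
        else hi = mid;
      }
      return lo; // a few dozen unauthenticated requests for IDs around a billion
    }

    latestRepoId().then((id) => console.log("latest repo id ≈", id));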
paulddraper
You can see the rate of creation of new users too.
Which is arguably even more interesting…
Cyphase
I'm wondering if AasishPokhrel created this repo for the purpose of being the billionth.
paxys
It's pretty easy to game this. Just keep creating repos till you hit number one billion and remove the old ones. Their API makes it trivial. The only issues are rate limits and other people simultaneously creating repos, so it's partly a matter of luck.
GodelNumbering
There was a guy who got fired from Meta for creating excessive automated diffs in pursuit of a certain magic number
paxys
I hope PR #80085 was worth it.
fancyswimtime
69 doesn't seem excessive
nithssh
Sounds interesting, is there anything online about this?
recursive
I don't believe they will renumber the old ones. Also, it can't be trivial, since two people can try this, and only one can win.
Macha
The one who lost doesn't get discussed in this thread.
handfuloflight
There is always one trillion to look forward to!
maniacalhack0r
AasishPokhrel made 2 repos yesterday - "shit" and "yep". No activity between May 17th and June 10th.
I have no idea if it's possible to calculate the rate at which repos are being created and time your repo creation to hit vanity numbers.
kylehotchkiss
I think he's in university for software development in Nepal, and it's really touching that a milestone like this could reach so far out into the world. Hopefully he has a big spot for this on his resume and can find a great career in development!
netsharc
I don't get why this needs a big spot on his resume, or why it should lead to a great career. I'd rate any company/hiring manager that thinks being lucky enough to hit a magic number on some system has any relevance to the work as very insane...
mkagenius
I would be really sus of someone's intelligence if they mentioned this as an achievement. It's fine as a joke, though.
notfed
I find a bit of humor in the fact that this is completely unsolicited attention. There's even a chance the guy is oblivious.
joshdavham
I highly doubt it, but that does sound possible.
Sohcahtoa82
The repo seems to have gotten renamed and now redirects to https://github.com/AasishPokhrel/repository/
Lame. :-(
Sohcahtoa82
It was renamed back! :-D
Aachen
Makes me wonder how many repositories exist in general, across all the local Forgejo and GitLab servers. Heck, include Subversion and Mercurial and git's other friends (and foes!)
Did anyone make a search engine for these yet, so we'd be able to get an estimate by searching for the word "a" or so?
(This always seemed like the big upside of centralised GitHub to me: people can actually find your code. I've been thinking of building such a search engine since MS bought GH, but I didn't think I could handle the marketing side, so it seemed like a waste of effort and I never did it. Recently I was considering whether it would be worth revisiting, given the various projects I'm putting on Codeberg, but maybe someone beat me to the punch)
mdaniel
Well, based on the API enumeration mentioned in sibling comments, surely one doesn't have to estimate
https://docs.gitlab.com/api/projects/#list-all-projects (for dumb reasons it seems GL calls them Projects, not Repositories)
https://codeberg.org/api/swagger#/repository/repoGetByID (that was linked to by the Forgejo.org site, so presumably it's the same for it and Codeberg) and its friend https://gitea.com/api/swagger#/repository/repoGetByID
Heptapod is a "friendly fork" of GitLab CE so its API works the same: https://heptapod.net/pages/faq#api-hgrc
and then I'd guess one would need to index the per-project GitLab instances: Gnome, GNU (if they ever open theirs back up), whatever's going on with Savannah, probably Sourceforge, maybe sourcehut (assuming he doesn't have some political reason to block you), etc
If I won the lottery, I'd probably bankroll a sourcegraph instance (from back when they were Apache) across everything I could get my hands upon, and donate snapshots of it to the Internet Archive
progval
At Software Heritage, we have listed 380M public repositories, 280M of which are on GitHub: https://archive.softwareheritage.org/
Repository search is pretty limited so far: only full-text search on URLs or in a small list of metadata files like package.json.
zaps
Respect to GitHub for committing to the bit
null
Reminds me of the 100 millionth OpenStreetMap changeset (commit). A few people, myself included, were casually trying for it, but in the end it went to someone who wasn't trying and was just busy mapping Africa! Much more wholesome, in hindsight. This person had also previously been nominated for an OSM award. I guess it helps that OpenStreetMap doesn't really allow for creating junk, because it's all live in production, and that's why the Nth changeset is far more likely to land on whatever someone happened to be doing? Either way, a fun achievement for GitHub :)
In case anyone cares to read more about the OSM milestone, the official blog entry: https://blog.openstreetmap.org/2021/02/25/100-million-edits-... My write-up of changeset activity around the event: https://www.openstreetmap.org/user/LucGommans/diary/395954