Source code for the X recommendation algorithm
141 comments
September 9, 2025
Scene_Cast2
Just like with their last release, they only released the architecture and not the weights. It may be useful for analyzing the system if you're a competitor (but from my last dive into it, it seemed like a strict subset of fancier, industry-leading rec systems), or perhaps getting into rec / retrieval systems as a newcomer.
However, this gives roughly zero insight into how Twitter's feed behaves.
barbazoo
Not only no weights. Not sure what people's expectations are but a lot of the time this isn't even valid code with all the redaction they did [1]. I'm confused as to who this is for, this surely isn't the repo they're working on, is it?
[1] https://github.com/twitter/the-algorithm/blob/main/trust_and...
Raed667
This is 100% for headlines and Musk to be able to say "we're open" during interviews. Its actual usefulness is not the point
Aurornis
When they "open sourced" the Tesla Roadster the website only had a couple of mostly useless files. Discussion at the time https://news.ycombinator.com/item?id=38383099
Despite not containing more than a few random files, there were headlines everywhere about the "Open Source Tesla Roadster". There were countless comments, Tweets, and posts about how amazing it was that the Roadster was now open source.
None of the people reporting on it or praising it actually looked at the files and realized you couldn't actually build anything other than the HVAC control board for the car.
pbasista
The reporters should be getting down to the point and asking Elon Musk about the practical usefulness of such a heavily redacted public release.
nextaccountic
Are you talking about this?
wandb_key = ...
wandb.login(...)
It's rather weird that they would add keys to the source code like this, rather than reading from the environment or some secrets service. Rather than redacting the source, they should refactor to remove the keys from the source.
barbazoo
One example, that's right. Another one:
train_query = f"""
SELECT
{{feature_names}},
{",".join(labels)},
...
"""
and right at the top:
cat_names = [
...
]
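For reference, nextaccountic's suggestion of reading the key from the environment instead of embedding it in source could look roughly like the sketch below. This is not code from the release. The helper name `load_wandb_key` is made up for illustration, and it assumes the key is stored in the `WANDB_API_KEY` environment variable.

```python
import os


def load_wandb_key(env=os.environ):
    """Return the W&B API key from the environment, or raise if it is missing.

    `load_wandb_key` is a hypothetical helper, not part of the released code.
    """
    key = env.get("WANDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError("WANDB_API_KEY is not set")
    return key


# Usage, assuming the wandb package is installed:
#   import wandb
#   wandb.login(key=load_wandb_key())
```

With this shape there is nothing to redact before publishing, since the secret never appears in the file.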
mvdtnz
There's no way you got to this bit without skipping over multiple actual redactions, like SQL queries with all of the details replaced with ellipsis. Why are you cherry-picking one innocent instance when you know exactly what the parent comment is talking about?
Levitating
what is your footnote referring to exactly?
nativeit
I know when I think “open source”, I am always thinking “heavily redacted”.
/s
Gabrys1
I'd assume that weights are changing constantly so they'd need to open source a service tweaking the weights in real time rather than the weights themselves...
dotancohen
They could publish a snapshot of any point in time. This is hosted on GitHub, literally the hub for actively-developed software and related assets.
Kaethar
Not an ML expert, but is it feasible to train the weights using the actual Twitter feed as an oracle?
minimaxir
No, even if you somehow were able to download the corpus of all public X posts. There are many hidden signals that are feature engineered in good recsys, and the stripped-down algo won't be able to replicate them.
MiguelHudnandez
It would cost a fortune in API calls, so it's not practical for anyone except internally at corporate.
bpavuk
well, Bluesky and Mastodon posts would suffice, but it's still useless because of how redacted the release is
paulpauper
The criteria for deciding which posts in comments or the feed are spam or should otherwise be suppressed are, unsurprisingly, also hidden. It's known that blue checkmark accounts rank above non-verified ones for comments, but I dunno about feed visibility.
gyanchawdhary
For all its flaws .. it’s still a step up from how Parag and co used to run twitter
jordanscales
Unfortunately, this [0] cancels out everything ten-fold. The owner of the website is boosting the content of himself and the people he supports. This did not happen in the old twitter - not even close.
snapcaster
Why? I've never been a twitter user
gyanchawdhary
Post Musk Twitter is amazing. It lets you see how stories, opinions that you support or don’t are attacked from all sides and Community noted / @grok fact checked … a lot of UX changes too .. pre Musk, the moderation / banning was biased and arbitrary (who is watching the watchers?) .. my personal fav was to see the special tick removed from journalists ..
amelius
There might be some value if someone can show that the feed mis-behaves for some selection of weights.
Scene_Cast2
Nope. Every single system like that will misbehave if given a bad set of weights, or even a random set of weights. I'd go as far as saying that even with "good" weights, it's likely to have some sigma of misbehavior.
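A toy illustration of why the architecture alone says so little: the same scoring code can rank the same posts in opposite order depending on the weights. Everything here (the feature names, the numbers, the linear scorer) is a made-up assumption for illustration and is not taken from the released code.

```python
def rank(posts, weights):
    """Order post ids by a weighted sum of their features, highest score first."""
    def score(features):
        return sum(weights[name] * value for name, value in features.items())
    return sorted(posts, key=lambda post_id: score(posts[post_id]), reverse=True)


# Hypothetical posts and features, invented for this sketch.
posts = {
    "a": {"likes": 0.9, "reports": 0.8},
    "b": {"likes": 0.4, "reports": 0.0},
}

sensible = {"likes": 1.0, "reports": -2.0}  # reports count against a post
careless = {"likes": 1.0, "reports": 0.5}   # reports accidentally boost a post

print(rank(posts, sensible))  # ['b', 'a']
print(rank(posts, careless))  # ['a', 'b']
```

The code is identical in both calls. Only the weights decide whether the heavily reported post ends up on top.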
jsheard
RIP author_is_elon, we hardly knew ye.
uyzstvqs
The file in question is now here: https://github.com/twitter/the-algorithm/blob/main/home-mixe...
author_is_elon, author_is_democrat, and author_is_republican are in fact gone. Now there is grok_politics_neutral, grok_politics_left, and grok_politics_right. This is in addition to a whole group of other categories, such as grok_category_sports and grok_category_music. All are based on annotations by Grok.
Importantly, this file is not used for recommendations. Everything in this file is only used for "metrics tracking purposes to measure how often we serve posts with various attributes." This would also have applied to author_is_elon.
0points
Oh my god lolollol
author_is_elon
author_is_power_user
author_is_democrat
author_is_republican
echelon
Republican, Democrat, and Elon.
Wow.
SXX
South Park: The Game level of irony.
hereme888
Rep, Dem, and "America Party".
ivape
Is this real? We accept that the algorithm may link you abstractly with other people, but I didn’t think they were literally labeling on this level. If you just say “we look for what’s similar and leave it at that”, then there’s much less liability.
This is political targeting. This guy was one of the biggest political donors, how can this fly?
burnte
Yes, he really had twitter change their code to push his tweets more.
0points
> This guy was one of the biggest political donors, how can this fly?
The system is rigged. Haven't you noticed yet?
frabcus
Looks pretty real:
https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
When this started it really put me off X - I'd have tolerated, and almost liked the idea, of a freedom of speech place. But a place that boosts its owner's posts... Nope.
I'm out - it's such a big personal diss of me, I'm not interested any more.
bongodongobob
You do realize people officially register as party members right? I have no idea why this upsets you. It's just categorization. I fucking hope my feeds do this, I do not want to see maga trash.
openquery
I've always wondered - how can I as a non X engineer be sure that the code on GH is actually deployed on their servers?
ml-anon
It’s not. The last “algorithm” release was a random grab bag of code which existed in some of the Twitter repo that might have been tangentially related to recommendations/feed.
Source: worked at Twitter in ML/recsys.
TheAceOfHearts
Anon, when I was looking through this source dump I saw a huge range of timeouts used in various services, do you know if there's any writeup or explanation as to how the engineering team settled on those values?
anonym29
False, this is definitely production code.
Source: I work at Twitter.
jibal
"..." all over the place in 2 year old code is production code?
And people who work at X don't say they work at Twitter.
3np
This is not believable. It's not syntactically valid Python.
https://github.com/twitter/the-algorithm/blob/c54bec0d4e029f...
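Anyone who wants to check a claim like this can ask Python's own parser. The sketch below uses the standard `ast` module. The two sample strings are illustrative: the first mirrors the `wandb_key = ...` line quoted earlier in the thread, the second is a hypothetical redaction style, and neither is the linked file. Note that a bare `...` is a legal expression in Python (the `Ellipsis` literal), so not every redaction breaks the syntax.

```python
import ast


def is_valid_python(source):
    """Return True if the source parses as Python, False on a syntax error."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# A bare ... is the Ellipsis literal, so this assignment parses fine.
print(is_valid_python("wandb_key = ..."))            # True

# Hypothetical redaction of a parameter list: this does not parse.
print(is_valid_python("def train(...):\n    pass"))  # False
```

Running this over each `.py` file in a checkout would show which files a redaction left unparseable.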
majewsky
This does not contradict what GP said.
kklisura
~65k lines added, ~3k removed in a span of more than 2 years. Do you guys do anything there?
jsheard
Even if this is the actual production code at this very second, it won't match prod for long if they continue this pattern of only dropping an update every two years or so.
close04
Honest question. Would you even dare to say you work at Twitter and then spill the beans on some very public lie or misdirection? It’s trivial to match your writing style between your HN comments and your work emails to identify you. Musk is famously a very petty, bitter, and vindictive person with an easy to bruise ego.
I don’t have any knowledge of the reality inside Twitter but I also have no reason to believe the company would be transparent given the many past controversies, or that any one employee would be able to look at this code which has obvious redactions and say “everything else is definitely 100% prod” and not exactly what GP suggested.
GuinansEyebrows
> Source: I work at Twitter.
Please stop
random3
I don’t think that’s the point of open sourcing things, in general
openquery
I agree in general it isn't. But in this case Musk claimed that was the point of open-sourcing the algorithm. Transparency on what they are or are not suppressing.
cma
When Tesla "open sourced" their patents, they required companies taking them up on it to agree, without any reciprocal commitment from Tesla, not to copy their "designs". So you get access to their patents in exchange for vague restrictions broader than the patent or copyright system.
random3
Oh, I see. Well, purely on his claim:bs ratio, I'd take that with a grain of salt too :)
h1fra
you can't, and it's 100% sure it's not this code running in prod
jjordan
100% huh? That's a bold statement with no supporting evidence.
viraptor
Already posted above: https://github.com/twitter/the-algorithm/blob/c54bec0d4e029f...
It's redacted.
jibal
Claiming that there's no supporting evidence is a bold (and obviously false) claim when the code is 2 years old and heavily redacted.
Pxtl
Sounds like the right tone when discussing a Musk project.
gchamonlive
How can you be sure that the machine code that was generated from your C source files actually match the behaviour encoded in them?
https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
lambdaone
This is essentially useless without the training set or the weights. It's open-source theatre.
TheAceOfHearts
I browsed through it a bit and these are some details that raised questions or which I found interesting:
There's multiple mentions of slop, for example: SlopsAuthorScoreFeature in HomeTweetTypePredicates. That means everyone gets a slop score between 0 and 1, which makes me wish that it was openly visible and that people with a high slop score would get a little piggy emoji next to their name.
There's a CLIENT_TWEET_TAKE_SCREENSHOT action, which is likely used to keep track of when a (mobile, presumably) client takes a screenshot. I hadn't considered this before, but for a social media app where posts are often shared externally through screenshots, keeping track of this can give you another engagement metric.
They have two types of NSFW filters: isNsfw and isSoftNsfw, but I couldn't figure out the distinction. Other metadata types include: isGore, isViolent, isSpam, isLowQuality, isOcr.
In ContentFeatureAdapter there's a getTweetLengthType function which shows the range for each tweet type. This is used to set TWEET_LENGTH_TYPE elsewhere. I wonder if it would help your virality to switch up your tweet lengths to regularly put out tweets which hit every length category, or if it doesn't significantly affect your potential reach.
There's a hardcoded list of top-level Grok topics [0]. Just mildly interesting to see what they consider to be top-level categories. Anime has achieved a significant cultural victory by getting separated into its own major category.
The timeout values for different service request types varied a lot across the application, which makes me curious about how they settled on those numbers. This is a question I've pondered in the past but haven't gotten around to researching deeply.
[0] https://github.com/twitter/the-algorithm/blob/c54bec0d4e029f...
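For readers unfamiliar with the pattern, a function like the getTweetLengthType mentioned above typically buckets a raw length into a small set of categories that can then be used as a feature. The sketch below shows the general idea only. The cut points and labels are invented for illustration, and the real ranges in ContentFeatureAdapter are not reproduced here.

```python
from bisect import bisect_right

# Hypothetical cut points in characters. A length below a cut point falls in
# the bucket to its left; these are NOT the values used in the release.
BOUNDS = [30, 140, 280]
LABELS = ["very_short", "short", "medium", "long"]


def tweet_length_type(text):
    """Map a tweet's character count to a coarse length category."""
    return LABELS[bisect_right(BOUNDS, len(text))]


print(tweet_length_type("gm"))       # very_short
print(tweet_length_type("x" * 200))  # medium
```

Whether deliberately varying your tweet lengths across such buckets affects reach can't be answered from the bucketing code alone, since that depends on the unreleased weights.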
saagarjha
I assume soft NSFW is non-hardcore content
swaptr
Not sure if this is the right place to ask, but why does Bluesky feel so much faster to load and interact with compared to X? On the surface, both have similar interfaces and equally rich content, yet Bluesky consistently feels snappier and more responsive, even though it’s the newer platform.
recursivecaveat
Newer is generally faster; it hasn't had time to accumulate kludges and dead ends from years of evolution. The bigger factor though I would imagine is not having 100 tons of analytics tracking everything.
pavel_lishin
No idea, but Twitter is functionally un-usable if you're not logged in.
cropcirclbureau
IIRC, Twitter uses some mongrel version of React Native on the web. That's why you get the 3-second-long loading thingie whenever you open a new tab.
palmfacehn
Could be from lower usage.
thewisenerd
sidenote: when do you think they're going to coax GitHub to transfer the `x` username?
OG_BME
I tried going through the latest diff, but there is so much boilerplate that I wasn't able to find any real insights through skimming.
Has anyone found anything useful? Interesting needle-in-a-haystack problem for LLMs to try as well.
Previous discussions:
25-apr-2022 https://news.ycombinator.com/item?id=31160546 380 comments
31-mar-2023 https://news.ycombinator.com/item?id=35391433 1185 comments