Google Titans architecture, helping AI have long-term memory
145 comments
· December 7, 2025 · okdood64
Palmik
DeepSeek and other Chinese companies. Not only do they publish research, they also put their resources where their mouth (research) is. They actually use it and prove it through their open models.
Most research coming out of the big US labs is a counter-indicator of practical performance: if it worked (too) well in practice, it wouldn't have been published.
Some examples from DeepSeek:
abbycurtis33
[flagged]
CGMthrowaway
Is there evidence that DeepSeek was stolen from the US? Or is that just a talking point like "covid leaked from a lab in china" ?
pylotlight
Which of the ~5-10 papers DeepSeek published were stolen, exactly?
mapmeld
Well, it's cool that they released a paper, but at this point it's been 11 months and you can't download code or weights for a Titans-architecture model anywhere. That puts a lot of companies ahead of them (Meta's Llama, Qwen, DeepSeek). The closest you can get is an unofficial implementation of the paper https://github.com/lucidrains/titans-pytorch
alyxya
The hardest part about making a new architecture is that even if it is better than transformers in every way, it's very difficult to both prove a significant improvement at scale and gain traction. Until Google puts a lot of resources into training a scaled-up version of this architecture, I believe there's plenty of low-hanging fruit in improving existing architectures, such that it'll always take the back seat.
p1esk
> Until Google puts a lot of resources into training a scaled-up version of this architecture
If Google is not willing to scale it up, then why would anyone else?
tyre
Google is large enough, well-funded enough, and the opportunity is great enough to run experiments.
You don't necessarily have to prove it out on large foundation models first. Can it beat out a 32b parameter model, for example?
m101
Prove it beats models of different architectures trained under identical limited resources?
nickpsecurity
But it's companies like Google that built tools like JAX and TPUs, telling us we can throw together models with cheap, easy scaling. Their paper's math is probably harder to put together than an alpha-level prototype, which they need anyway.
So I think they could at least default to doing it for small demonstrators.
UltraSane
Yes. The path dependence for current attention based LLMs is enormous.
root_axis
I don't think the comparison is valid. Releasing code and weights for an architecture that is widely known is a lot different than releasing research about an architecture that could mitigate fundamental problems that are common to all LLM products.
SilverSlash
The newer one is from late May: https://arxiv.org/abs/2505.23735
AugSun
Gemini 3 _is_ that architecture.
FpUser
I've read many very positive reviews of Gemini 3. I tried using it, including Pro, and to me it looks very inferior to ChatGPT. What was very interesting, though, was that when I caught it bullshitting me and called out its BS, Gemini exhibited very human-like behavior. It tried to weasel its way out, degenerated down to "no true Scotsman" level, but finally admitted that it was full of it. This is kind of impressive / scary.
informal007
I don't think the model code is a big deal compared to the idea. If the public could recognize the value of the idea 11 months ago, they could have implemented the code quickly, because there are so many smart engineers in the AI field.
jstummbillig
If that is true, does it follow that this idea does not actually have a lot of value?
mapmeld
Well, we have the idea and the next best thing to official code, but if this was a big revelation, where are all of the Titan models? Given that it's public, I'd think we'd have a few attempts at variants (like all of the Mamba SSMs, etc.) and get a better sense of whether this is valuable or not.
innagadadavida
Just keep in mind it is performance review time at all the tech companies. Their promotion of these papers seems to be directly correlated with that event.
bluecoconut
ByteDance is publishing pretty aggressively.
Recently, my favorite from them was Lumine: https://arxiv.org/abs/2511.08892
Here's their official page: https://seed.bytedance.com/en/research
Hendrikto
Meta is also being pretty open with their stuff. And recently most of the Chinese competition.
okdood64
Oh yes, I believe that's right. What's some frontier research Meta has shared in the last couple years?
markisus
Their VGGT, DINOv3, and Segment Anything models are pretty impressive.
robrenaud
Anything with Jason Weston as a coauthor tends to be pretty well written/readable and often has nice results.
colesantiago
Take a look at JEPAs (Video Joint Embedding Predictive Architecture), SAM (Segment Anything), etc for Meta's latest research.
UltraSane
Meta just published Segment Anything 3, along with a truly amazing version that can create 3D models posed like the people in a photo. It is very impressive.
tonyhart7
"What's some frontier research Meta has shared in the last couple years?"
The current Meta outlook is embarrassing, tbh. The fact that they have the largest social media dataset on the planet and they can't even produce a decent model is a quite "scary" position.
embedding-shape
> Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.
80% of the ecosystem is built on top of companies, groups and individuals publishing their research openly, not sure why Google would get more credit for this than others...
asim
It was not always like this. Google was very secretive in the early days. We did not start to see things until the GFS, BigTable, and Borg (or Chubby) papers in the 2006 timeframe.
okdood64
By 2006, Google was 8 years old. OpenAI is now 10.
vlovich123
Google publishes detailed papers of its architecture once it’s built the next version.
AI is a bit different.
rcpt
PageRank
hiddencost
Every Google publication goes through multiple reviews. If anyone thinks the publication is a competitive risk, it gets squashed.
It's very likely no one is using this architecture at Google for any production workloads. There are a lot of student researchers doing fun proof-of-concept papers; they're allowed to publish because it's good PR and it's good for their careers.
jeffbee
Underrated comment, IMHO. There is such a gulf between what Google does on its own part, and the papers and source code they publish, that I always think about their motivations before I read or adopt it. Think Borg vs. Kubernetes, Stubby vs. gRPC.
cubefox
The author is listed as a "student researcher", which might include a clause that students can publish their results.
Here is a bit more information about this program: https://www.google.com/about/careers/applications/jobs/resul...
doctor_blood
"At long last, we have created the Torment Nexus from the classic novel Don't Create the Torment Nexus"
(In Eclipse Phase, TITAN - the Total Information Tactical Awareness Network - mulched humanity when it went rogue.)
esperent
Hey it was my turn to post this quote today!
kgeist
>The model uses this internal error signal (the gradient) as a mathematical equivalent of saying, "This is unexpected and important!" This allows the Titans architecture to selectively update its long-term memory only with the most novel and context-breaking information
So one can break a model by consistently feeding it with random, highly improbable junk? Everything would be registered as a surprise and get stored, impacting future interactions
andy12_
This is an oversimplification of what Titans does. The model performs nested learning, where the model learns during inference, and during training the model weights learn _how and what_ to learn during inference. If the input contains junk or irrelevant information, the model most likely learned during training to assign low-surprise query and key embeddings to those tokens, because learning those junk tokens would have hurt the overall ability of the model to predict subsequent tokens (and thus would have increased the training loss).
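For intuition, here's a minimal sketch of the surprise-gated memory write in PyTorch. This is my own simplification, not the paper's code: the long-term memory is a small MLP, and the gradient of an associative-recall loss acts as the surprise signal, so well-predicted (unsurprising) inputs barely change the memory. The actual Titans update also uses momentum and a forgetting (decay) term, which I've left out.

```python
# Hypothetical, simplified sketch of a Titans-style test-time memory write.
# Shapes and hyperparameters are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class NeuralMemory(nn.Module):
    """Small MLP whose weights act as the long-term memory M."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        return self.net(k)

def memory_write(memory: NeuralMemory, k: torch.Tensor, v: torch.Tensor, lr: float = 1e-2) -> float:
    """One test-time write: ||M(k) - v||^2 measures how badly the memory
    predicts the new association; its gradient drives the update."""
    loss = ((memory(k) - v) ** 2).mean()
    grads = torch.autograd.grad(loss, tuple(memory.parameters()))
    with torch.no_grad():
        for p, g in zip(memory.parameters(), grads):
            p -= lr * g                      # gradient-descent write into the weights
    return loss.item()                       # rough proxy for "surprise"

mem = NeuralMemory(dim=64)
k, v = torch.randn(8, 64), torch.randn(8, 64)   # toy key/value pairs for one chunk
print(memory_write(mem, k, v))
```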
pmichaud
I’m guessing that this is the first thing they thought of and the problem only exists in the superficial gloss you’re responding to?
bethekidyouwant
In what world can you not always break the response of an AI by feeding it a bunch of random junk?
xnx
Indeed. In what world can you not break any tool when deliberately misusing it?
kgeist
I mean, currently LLMs are stateless, and you can get rid of all the poisoned data by just starting a new conversation (context). The OP's architecture introduces "long-term memory", where junk will accumulate over time.
soerxpso
I believe you're misunderstanding what the OP means about "long-term" memory. From what I can tell, it's not actively modifying the weights of the underlying model, it just "remembers" things from a high number of tokens into the past of its context. The point is that this allows it to remember something it read ~200 pages ago in a very long context window, not that it can remember something from one session into another clean session.
dmix
In something like Cursor, if it messes something up you can click 'undo'. I'd imagine a small snapshot would only be persisted to the memory if you keep its output, and even then it's mostly just a summary.
There are probably lots of small signals of "the user is happy with the output", and the longer the history, the more it will converge on the middle of what you want, including when the user says "don't do [x]", which overrides past stuff.
CooCooCaCha
I mean ideally AI would be resilient to junk, don't you think?
amarant
Ideally, you'd run your own instance of this, I think.
I can see a product where you purchase a model that has basic training, and then, using the features outlined in the paper, it learns on the fly from your usage.
I can also see there being a secondary market for specially trained models, with long-term memory filled with some specific skill, done in some specific way. To make a silly example, imagine buying a licence to Torvalds' OS coding assistant, ready to insult your PRs before you even commit them! (And possibly help you write code in Torvalds' style too.)
This would of course require Linus to use the model enough for it to learn. I won't comment on the likelihood of that happening: it's just a silly example, after all.
vlovich123
Humans are pretty vulnerable to junk so I’m not sure.
idiotsecant
This is the start of what I always thought an AI should have: a limbic system. Humans don't store memory based on novelty, they store it based on emotional content. This is where I was afraid of the tiger, this is where I smelled delicious food, this is what it felt like when I was victorious in the hunt.
AI needs an internal emotional state because that's what drives attention and memory. AI needs to want something.
luckydata
That would be the biggest mistake anyone could make. I hope nobody goes down this route. AI "wanting" things is an enormous risk to alignment.
pixl97
I mean, setting up any neural net with a "goal" is really just defining a want/need. You can't just encode the entire problem space of reality; you have to give the application something to filter out.
idiotsecant
At some point I think we'll have to face the idea that any AI more intelligent than ourselves will by definition be able to evade our alignment tricks.
photochemsyn
This is no different from what happens to humans if they're locked into cult programming situations: they'll start believing and regurgitating all kinds of nonsense if their information stream is tightly curated.
Practically, for use with a codebase development effort, if the model remembers the original design decisions and the discussions about costs and benefits, and can recall all that much later in the process, it's going to start getting really good at thinking about what the next step is, or even making decisions about when a major refactor is needed, etc.
voodooEntity
When I first read the Titans paper, for me it was a "this will be a big step forward" moment.
While I have no "AI" title and don't work in the AI industry, I've spent many years thinking about AI concepts, even long before the whole NN/LLM hype started.
Maybe because of that, I was always really annoyed that LLMs are called AI, because in my years of thinking about how an actual "human-like" thinking AI might work, what an LLM does was far below my minimum definition.
But when I stumbled across the Titans paper, while it still is not an "AI" as I would call it, from my POV it's a massive step in the right direction.
Sometimes I consider writing all my ideas/thoughts about AI down in my blog, but then I think nobody would care anyway since I'm not a known figure *shrug* - so other than being able to say "look, I wrote it years ago!", there's no actual point in doing so, I guess.
However, I'm looking forward to seeing Titans in action, and I guess it will impress us all.
chr15m
Sharing it in your blog over a period of months or years is how you become a known figure eventually.
ocrow
A lot of LLM/AI writing these days can feel lost in the weeds. The specifics of very detailed techniques are undoubtedly interesting, but writing that steps back and looks at the big picture, informed by those details, could be very useful for people who want to think about where this is all going.
Barbing
Are you curious to see whether a blog post shared here might gain any traction and perhaps some valuable feedback?
nasvay_factory
I wrote about that a while ago: https://paxamans.github.io/blog/titans/
moffkalast
Are there any pretrained models with this architecture yet or is it all still completely theoretical beyond Google's unverifiable claims? They published the original Titans paper last year and nobody seems to have built on the idea.
AlexCoventry
The fundamental ideas in the paper aren't particularly novel. They will probably work as advertised.
djrhails
https://github.com/lucidrains/titans-pytorch - is the only public implementation.
But no one appears to have taken the risk/time to properly validate it.
jonplackett
I'm curious whether this makes them more or less susceptible to prompt injection.
On the one hand, learning on the job could allow better training of what not to be influenced by; on the other hand, an injected prompt could have an even deeper long-term effect on them.
dmix
> The Transformer architecture revolutionized sequence modeling with its introduction of attention, a mechanism by which models look back at earlier inputs to prioritize relevant input data
I've always wanted to read how something like Cursor manages memory. It seems to have developed a long history of all of my prompts, and it understands both the codebase and what I'm building slightly better over time, causing fewer errors.
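For reference, the attention mechanism the quoted sentence describes boils down to scaled dot-product attention; here's a generic sketch (nothing Cursor- or Gemini-specific):

```python
# Generic scaled dot-product attention: each position scores every input
# and takes a weighted average of the values, prioritizing relevant data.
import torch
import torch.nn.functional as F

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 10, 64)   # 10 tokens, 64-dim embeddings
out = attention(q, k, v)             # (1, 10, 64): each token is a weighted mix of all tokens
```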
nubg
Very interesting. Is it correct for me to imagine it as some kind of "LoRA" that's continuously adapted as the model goes through its day?
If so, could there perhaps be a step where the LoRA is merged back into the main model?
That would be like sleeping :-)
robrenaud
I don't think that's a great analogy.
LoRAs tend to be adapters bolted onto systems by people other than the system designers, and they are low-rank factorizations.
There is nothing low-rank or adapter-like here.
andy12_
Kind of. You could theoretically use LoRA for this, in fact, but it probably wouldn't have enough capacity to be a proper substitute for the attention mechanism. Instead, a full MLP is trained as input chunks get processed.
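To make the contrast concrete, here's a rough toy sketch of my own (made-up shapes, not the paper's setup): a LoRA-style low-rank delta next to a Titans-style full MLP memory that is rewritten online, chunk by chunk.

```python
# Toy contrast between a LoRA-style low-rank delta and a full-rank MLP memory
# updated at test time (illustrative shapes; not the paper's configuration).
import torch
import torch.nn as nn

dim, rank, chunk_len = 64, 4, 16

# LoRA: frozen base weights plus a rank-4 delta A @ B, normally trained offline.
lora_A = nn.Parameter(torch.randn(dim, rank) * 0.01)
lora_B = nn.Parameter(torch.zeros(rank, dim))
x = torch.randn(chunk_len, dim)
lora_delta = x @ lora_A @ lora_B                  # limited capacity: rank-4 update

# Titans-style memory: a full-rank MLP whose weights are rewritten online,
# chunk by chunk, while the sequence is being processed.
memory = nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
opt = torch.optim.SGD(memory.parameters(), lr=1e-2, momentum=0.9)

sequence = torch.randn(8 * chunk_len, dim)        # stand-in for token embeddings
for chunk in sequence.split(chunk_len):
    keys, values = chunk, chunk                   # a real model derives k/v via learned projections
    loss = ((memory(keys) - values) ** 2).mean()  # write the chunk into memory at test time
    opt.zero_grad()
    loss.backward()
    opt.step()
```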
6r17
Would this also allow it to be aligned further with the user's prompt, notably due to the surprise factor and how it may understand it?
Alifatisk
Titans: Learning to Memorize at Test Time https://arxiv.org/abs/2501.00663
bentt
This just feels like a tremendous missing piece to LLMs. Looking forward to seeing it in action.
willangelo
Very, very interesting, definitely a missing piece in the current AI space.
Small typo: the text "Virtually all successful existing sequence models rely on mean squared error…" is repeated within the same paragraph. Happens to the best of us.
From the blog:
https://arxiv.org/abs/2501.00663
https://arxiv.org/pdf/2504.13173
Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.