
Claude's memory architecture is the opposite of ChatGPT's

modeless

The link to the breakdown of ChatGPT's memory implementation is broken, the correct link is: https://www.shloked.com/writing/chatgpt-memory-bitter-lesson

This is really cool, I was wondering how memory had been implemented in ChatGPT. Very interesting to see the completely different approaches. It seems to me like Claude's is better suited for solving technical tasks while ChatGPT's is more suited to improving casual conversation (and, as pointed out, future ads integration).

I think it probably won't be too long before these language-based memories look antiquated. Someone is going to figure out how to store and retrieve memories in an encoded form that skips the language representation. It may actually be the final breakthrough we need for AGI.
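To make the idea concrete: a minimal sketch of a memory store that keeps memories as dense vectors and retrieves them by similarity rather than by re-reading text. Everything here is hypothetical illustration, not anyone's actual implementation; in particular, `embed()` is a fake hash-based stand-in for a learned encoder, and a truly "encoded" system would not keep the text around at all.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Stand-in for a learned encoder: a deterministic random projection
    # seeded by the text. A real system would use a trained model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class VectorMemory:
    def __init__(self):
        self.keys = []    # encoded memories (unit vectors)
        self.texts = []   # kept only for display in this toy sketch

    def store(self, text: str) -> None:
        self.keys.append(embed(text))
        self.texts.append(text)

    def recall(self, query: str, k: int = 1) -> list[str]:
        # Retrieve by cosine similarity (dot product of unit vectors),
        # skipping any language-level matching.
        q = embed(query)
        sims = [float(q @ key) for key in self.keys]
        order = sorted(range(len(sims)), key=lambda i: -sims[i])
        return [self.texts[i] for i in order[:k]]
```

The interesting design question is exactly the one raised above: whether retrieval keys (and the memories themselves) can live entirely in the model's own representation space, with no text round-trip.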

ornornor

> It may actually be the final breakthrough we need for AGI.

I disagree. As I understand them, LLMs right now don’t understand concepts. They actually don’t understand, period. They’re basically Markov chains on steroids. There is no intelligence in this, and in my opinion actual intelligence is a prerequisite for AGI.
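For readers unfamiliar with the analogy: a literal (bigram) Markov chain text generator looks like the toy below. It predicts the next word purely from counts of what followed the current word in its training text, with no representation of meaning at all; the disputed claim is how far LLMs really are from this.

```python
import random
from collections import defaultdict

def train(text: str) -> dict:
    # Map each word to the list of words observed to follow it.
    follows = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        follows[a].append(b)
    return follows

def generate(follows: dict, start: str, n: int = 10, seed: int = 0) -> str:
    # Walk the chain: each step depends only on the previous word.
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        nxt = follows.get(out[-1])
        if not nxt:
            break
        out.append(rng.choice(nxt))
    return " ".join(out)
```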

extr

They are changing the way memory works soon, too: https://x.com/btibor91/status/1965906564692541621

Edit: They apparently just announced this as well: https://www.anthropic.com/news/memory

qgin

I love Claude's memory implementation, but I turned memory off in ChatGPT. I use ChatGPT for too many disparate things and it was weird when it was making associations across things that aren't actually associated in my life.

kiitos

> Anthropic's more technical users inherently understand how LLMs work.

good (if superficial) post in general, but on this point specifically, emphatically: no, they do not -- no shade, nobody does, at least not in any meaningful sense

omnicognate

Understanding how they work in the sense that permits people to invent and implement them, that provides the exact steps to compute every weight and output, is not "meaningful"?

There is a lot left to learn about the behaviour of LLMs, higher-level conceptual models to be formed to help us predict specific outcomes and design improved systems, but this meme that "nobody knows how LLMs work" is out of control.

lukev

If we are going to create a binary of "understand LLMs" vs "do not understand LLMs", then one way to do it is as you describe; fully comprehending the latent space of the model so you know "why" it's giving a specific output.

This is likely (certainly?) impossible. So not a useful definition.

Meanwhile, I have observed a very clear binary among the people I know who use LLMs: those who treat them like a magic AI oracle, versus those who understand the autoregressive model, the need for context engineering, the fact that outputs are somewhat random (hallucinations exist), setting the temperature correctly...
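One of the concrete mechanics mentioned above, temperature, can be shown in a few lines: dividing next-token logits by a temperature before the softmax sharpens or flattens the sampling distribution. The logit values below are made up for illustration.

```python
import numpy as np

def softmax_with_temperature(logits, temperature: float):
    # Scale logits by 1/temperature, then apply a numerically stable softmax.
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # near-greedy: mass piles on the top token
hot = softmax_with_temperature(logits, 2.0)   # flatter: more diverse, more surprising samples
```

Low temperature makes output repeatable but bland; high temperature increases variety and, with it, the odds of sampling something wrong.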

kingkawn

Thanks for this generalization, but of course there is a broad range of understanding how to improve usefulness and model tweaks across the meat populace.

SweetSoftPillow

If I remember correctly, Gemini also has this feature? Is it more like Claude's or ChatGPT's?

simonw

This post was great, very clear and well illustrated with examples.

richwater

ChatGPT is quickly approaching (perhaps surpassing?) the same concerns that parents, teachers, and psychologists had with traditional social media. It's only going to get worse, and trying to stop technological progress will never work. I'm not sure what the answer is. That they're clearly optimizing for people's attention is even more worrisome.

visarga

> That they're clearly optimizing for people's attention is more worrisome.

Running LLMs is expensive and we can swap models easily. The fight for attention is on; it acts like an evolutionary pressure on LLMs. We already saw the sycophancy trend as a result of it.

WJW

Seems like either a huge evolutionary advantage for the people who can exploit the (sometimes hallucinating, sometimes not) knowledge machine, or else a huge advantage for the people who are predisposed to avoid the attention-sucking knowledge machine. The ecosystem shifted; adapt or be outcompeted.