
Beyond Semantics: Unreasonable Effectiveness of Reasonless Intermediate Tokens

valine

I think it’s helpful to remember that language models are not producing tokens; they are producing a distribution over possible next tokens. Just because your sampler picks a sequence of tokens that contain incorrect reasoning doesn't mean a useful reasoning trace isn’t also contained within the latent space.

It’s a misconception that transformers reason in token space. Tokens don’t attend to other tokens; high-dimensional latents attend to other high-dimensional latents. The final layer of a decoder-only transformer has full access to the entire latent space of all previous positions, the same latents you can project into a distribution over next tokens.
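A quick sketch of what I mean, assuming a Hugging Face causal LM (gpt2 here purely for illustration): attention operates on hidden states, and every position's final latent can be projected into a full next-token distribution, of which the sampler keeps exactly one draw.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The answer is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

h = out.hidden_states[-1]                 # [batch, seq, d_model]: the latents attention actually mixes
logits = out.logits                       # [batch, seq, vocab]: those latents pushed through the unembedding
probs = torch.softmax(logits[0, -1], -1)  # full distribution over next tokens at the last position
print(probs.topk(5))                      # the sampler discards all of this except one draw
```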

woadwarrior01

> Just because your sampler picks a sequence of tokens that contain incorrect reasoning doesn't mean a useful reasoning trace isn’t also contained within the latent space.

That's essentially the core idea in Coconut [1][2]: keeping the reasoning traces in a continuous space.

[1]: https://arxiv.org/abs/2412.06769

[2]: https://github.com/facebookresearch/coconut
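For anyone curious, here's a rough sketch of that continuous-thought idea as I read it (not the authors' code; gpt2 and the step count are placeholders): instead of sampling a token between reasoning steps, the last hidden state is fed straight back in as the next input embedding, so the "thoughts" never leave latent space.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Start from the embedded prompt rather than token ids, so latents can be appended later.
embeds = model.get_input_embeddings()(tok("2 + 2 =", return_tensors="pt").input_ids)

n_latent_steps = 4  # number of continuous "thoughts" before emitting any text
with torch.no_grad():
    for _ in range(n_latent_steps):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        thought = out.hidden_states[-1][:, -1:, :]    # last latent, never decoded into a token
        embeds = torch.cat([embeds, thought], dim=1)  # reused directly as the next input "embedding"

    # Only now project back into token space for a visible answer.
    next_id = out.logits[:, -1, :].argmax(-1)
print(tok.decode(next_id[0]))
```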

modeless

> we then train models on noisy, corrupted traces which have no relation to the specific problem each is paired with, and find that not only does performance remain largely consistent with models trained on correct data, but in some cases can improve upon it

This is the interesting part. We've probably all had the experience where the model is going off the rails during the thinking process but somehow spits out the right answer at the end. Apparently the reasoning doesn't even need to be correct during training?

I guess it suggests to me that the reason CoT helps is that the model gets more compute to think internally, not that the words it produces are meaningful. I'm surprised nobody has come up with a good scheme for adaptive compute per token yet. Maybe we can skip CoT entirely.
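Back-of-the-envelope version of the "more compute" framing (the parameter count and trace length below are made up, and the rough ~2 x params FLOPs-per-token rule of thumb ignores attention and KV-cache costs):

```python
n_params = 7e9                  # hypothetical 7B-parameter model
flops_per_token = 2 * n_params  # rough rule of thumb: one forward pass per generated token

answer_tokens = 20
cot_tokens = 1000               # hypothetical thinking trace

direct = answer_tokens * flops_per_token
with_cot = (cot_tokens + answer_tokens) * flops_per_token
print(f"direct: {direct:.2e} FLOPs, with CoT: {with_cot:.2e} FLOPs, ratio: {with_cot / direct:.0f}x")
# ~51x more sequential compute, whether or not the intermediate tokens mean anything.
```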

trehalose

> We've probably all had the experience where the model is going off the rails during the thinking process but somehow spits out the right answer at the end. Apparently the reasoning doesn't even need to be correct during training?

How do we know if the reasoning was correct or not? Do we have more information about what the model was thinking besides just what it says it was thinking?

kelseyfrog

> I'm surprised nobody has come up with a good scheme for adaptive compute per token yet.

I have one, I just don't have the time or money to research it :(

golol

Post it, let's go.

istjohn

Uh... hmmm... uhhh... ummm...

timhigins

This paper seems to focus on highly algorithmic/puzzle-like problems, which are not the typical application domain of LLMs, and it uses a <500M-parameter model. So my hunch is that "reasoning" works much better for the math, coding, factual recall, and writing tasks that most LLMs actually deal with.

throwawaymaths

Why is it unreasonable that giving the LLM a spot to think, collate long-range attention, and summarize, without the pressure of producing a meaningful next token so quickly, would result in higher effectiveness?


naasking

I wonder if this finding would hold for something like Meta's Large Concept Models.

nullc

Even when you train AI on human language, the tokens can have "subtext" that is only legible to the AI. And, unfortunately, it's not even legible to the AI in ways that it could ever explain to us.

It's no different than how in English we can signal that a statement is related to a kind of politics or that it's about sex through particular word and phrase choice.

Training for reasoning should be expected to amplify the subtext, since any random noise in token selection that by chance is correlated with the right results will get reinforced.

Perhaps you could try to dampen this by training two distinct models for a while, then swapping their reasoning traces for a while before going back. But sadly, distinct models may still end up with similar subtexts due to correlations in their training data. Maybe ones with very different tokenization would be less likely to do so.
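Roughly the shape I have in mind, with entirely hypothetical helpers (generate_reasoning and finetune_on don't correspond to any real API; this is just the outline of the loop): the two models alternate between self-training and training on each other's traces, so subtext tied to one model's internals isn't continually reinforced by its own updates.

```python
def train_with_swapped_traces(model_a, model_b, problems, swap_every=1000):
    """Hypothetical loop: alternate self-training phases with swap phases."""
    for step, problem in enumerate(problems):
        trace_a = model_a.generate_reasoning(problem)  # hypothetical helper
        trace_b = model_b.generate_reasoning(problem)  # hypothetical helper
        if (step // swap_every) % 2 == 0:
            model_a.finetune_on(problem, trace_a)      # normal phase: train on your own trace
            model_b.finetune_on(problem, trace_b)
        else:
            model_a.finetune_on(problem, trace_b)      # swap phase: train on the other
            model_b.finetune_on(problem, trace_a)      # model's trace instead
```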

nihakue

This is such a bonkers line of thinking; I'm so intrigued. So a particular model will have an entire 'culture' only available or understandable to itself. Seems kind of lonely. Like some symbols might activate together for reasons that are totally incomprehensible to us, but make perfect sense to the model. I wonder if an approach like the one in https://www.anthropic.com/research/tracing-thoughts-language... could ever give us insight into any 'inside jokes' present in the model.

I hope that research into understanding LLM qualia eventually allows us to understand, e.g., what it's like to [be a bat](https://en.wikipedia.org/wiki/What_Is_It_Like_to_Be_a_Bat%3F)

candiddevmike

IMO this is why natural language will always be a terrible _interface_: English is a terrible _language_ where words can have wildly different meanings that change over time. There's no ambiguity of intention with traditional UX (or even programming languages).