
Efficient Reasoning with Hidden Thinking

scribu

Would be curious to know how this stacks up against Coconut [1] which also uses latent space for reasoning.

[1] https://arxiv.org/abs/2412.06769

kevmo314

Definitely curious. This looks very similar to Coconut, even down to the CoT encoding process in Figure 2. They go into a lot more detail, though; seems like parallel innovation.

esafak

I'm behind on reading but don't all models use continuous embeddings to represent reasoning?

winwang

I believe the "continuous" in Coconut means that the CoT lives in the continuous latent space instead of in output tokens (see Fig. 1).
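
A minimal sketch of that idea, assuming a Hugging Face-style causal LM (the model choice and the number of latent steps are my placeholders, not Coconut's actual setup): at each reasoning step the last hidden state is appended back as the next input embedding instead of sampling a token.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Continuous chain-of-thought sketch: reason by feeding the final hidden
    # state back in as the next input embedding; decode to tokens only at the end.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Q: I have 3 apples and buy 2 more. How many do I have? A:"
    input_ids = tok(prompt, return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(input_ids)         # (1, seq, hidden)

    NUM_LATENT_STEPS = 4                                      # placeholder value
    with torch.no_grad():
        for _ in range(NUM_LATENT_STEPS):
            out = model(inputs_embeds=embeds, output_hidden_states=True)
            last_hidden = out.hidden_states[-1][:, -1:, :]    # final layer, final position
            embeds = torch.cat([embeds, last_hidden], dim=1)  # the "thought" stays continuous
        next_id = model(inputs_embeds=embeds).logits[:, -1, :].argmax(-1)
    print(tok.decode(next_id))

Off the shelf this won't produce useful reasoning; the point is only that the intermediate "thoughts" never pass through the vocabulary, and Coconut trains the model so that those fed-back hidden states are actually meaningful.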

moolimon

I feel like this is the obvious next step for chain-of-thought reasoning. Excited to see work on models that try to translate the intermediate thinking-space tokens back down to language, letting us still see what's happening inside the "mind" of the LLM, if that process can even be mapped to language anymore. I also wonder what the implications of this research are for chain-of-thought reasoning with reinforcement learning, since from my understanding many of the reward mechanisms set up during reinforcement learning are built around the structure of the thought process.

Davidzheng

WRT the last sentence: I think the recent breakthroughs come precisely from not caring at all about the CoT itself and evaluating only the end product, allowing the model to develop a method of reasoning that is not necessarily drawn from the human data distribution (which has the benefit of letting it collapse to a "personalized" reasoning pattern).

another_poster

Is “multimodal reasoning” as big a deal as it sounds? Does this technique mean LLMs can generate chains of thought that map to other modalities, such as sound and images?

deoxykev

I don't think autoregressive models have a fundamental difference in reasoning capability between latent space and token space. Latent space enables abstract reasoning and pattern recognition, while token space acts both as the discrete interface for communication and as an interaction medium to extend, refine, and synthesize higher-order reasoning over latent space.

Intuitively speaking, most people think of writing as a communication tool. But actually it's also a thinking tool that helps create deeper connections between discrete thoughts, which can only occupy a fixed slice of our attention at any given time. Attentional capacity is the primary limitation, for humans and LLMs alike. So use the token space as extended working memory. Besides, even the Coconut paper got mediocre results. I don't think this is the way.

bravura

I appreciate your argument, but would add the following nuance:

Latent space reasoning can represent and manipulate UNCERTAINTY more concisely and elegantly than token space reasoning.

byschii

Isn't this dangerous? Isn't the efficiency gained at the expense of safety and interpretability?

https://arxiv.org/abs/2412.14093 (Alignment faking in large language models)

https://joecarlsmith.com/2024/12/18/takes-on-alignment-fakin...

PS: I'm definitely not an expert.

winwang

Depends on if we can interpret the final hidden layer. It's plausible we evolve models to _have_ interpretable (final/reasoning) hidden layers, just that they aren't constrained to the (same representation of) input/output domains (i.e. tokens).
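
One rough way to peek at a final hidden state today is a logit-lens-style probe (my own sketch, not from either paper): project the state through the model's unembedding and look at the nearest vocabulary tokens.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def nearest_tokens(hidden_state, k=5):
        # Project a hidden state through the output embedding (logit lens)
        # and return the k closest vocabulary tokens.
        logits = model.lm_head(hidden_state)
        return [tok.decode([i]) for i in logits.topk(k).indices.tolist()]

    ids = tok("The capital of France is", return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids, output_hidden_states=True).hidden_states[-1][0, -1]
    print(nearest_tokens(hidden))   # expect tokens like " Paris" near the top

This only helps to the extent that the latent "thoughts" stay close to the token embedding space, which is exactly the open question.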

IshKebab

I don't see how it is any more dangerous than the already existing black-box nature of DNNs.

patcon

Yeah, agreed. The limits of human minds constrain language. Allowing these things to reason outside words is, in my intuition, a tactic with more abundant paths toward superintelligence, and exactly the sort of path we'll have a harder time monitoring (we'll need fancy tools to introspect instead of just watching it think).

My current thinking is that I would support a ban on this style of research. Really hard to set lines for regulation, but this feels like an easy and intuitive place to exercise caution

achierius

Yes, but which do you think matters more: safety and (in the long run) human lives, or more papers?

jononor

Turns out we are the main paperclip optimizers...

thom

Very importantly, here they provide a way of decoding the encoded thought tokens, so you're not really losing explanatory power or debuggability. As much as OpenAI wants to present hidden chain of thought as some sort of long-term advantage or safety feature, it's horrible when you want to understand how a model came to some insane conclusion.
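
As a hedged sketch of what such a decoder could look like (ThoughtDecoder, its sizes, and the expand-then-project design are my assumptions, not the paper's architecture): a small module trained to reconstruct readable text from a compressed thinking-token vector.

    import torch
    import torch.nn as nn

    class ThoughtDecoder(nn.Module):
        """Map one thinking-token vector back to a short sequence of token logits."""
        def __init__(self, hidden_dim=768, vocab_size=50257, max_len=32):
            super().__init__()
            self.hidden_dim = hidden_dim
            self.max_len = max_len
            self.expand = nn.Linear(hidden_dim, hidden_dim * max_len)
            self.to_vocab = nn.Linear(hidden_dim, vocab_size)

        def forward(self, thought):
            # thought: (batch, hidden_dim) -> (batch, max_len, vocab_size)
            seq = self.expand(thought).view(-1, self.max_len, self.hidden_dim)
            return self.to_vocab(seq)

    decoder = ThoughtDecoder()
    logits = decoder(torch.randn(1, 768))  # train with cross-entropy against the original CoT text
    print(logits.shape)                    # torch.Size([1, 32, 50257])

Trained against the original chains of thought, something like this would give you back a human-readable trace even though inference itself stays in latent space.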

Davidzheng

Reasoning in latent space is probably not needed in the end. Unless constrained by human preference/SFT data, RL should spontaneously create new additions to language to help with new reasoning methods and new concepts invented by the system.

gunalx

I would be interested in seeing how a combined latent-space and traditional GRPO CoT approach could perform vs. just one of either.

My intuition is still that latent space would be better at emulating larger models with fewer params, with CoT helping to refine the output after the latent-space stage.

Combined, it would kinda be able to think about a problem: throw down a draft, then refine it.

aradox66

Could someone ELI5? It sounds like they generate a compressed token which represents a whole "thought" rather than elaborating the entire "thought" in actual language. Is that right?

ipunchghosts

Currently, when AI models solve problems, they write out long chains of thoughts (like showing their work in math). While helpful, this takes up a lot of computing power.

Heima does something clever: instead of writing out long explanations, it compresses each step of thinking into a single "thinking token." Think of it like using a shorthand symbol instead of writing out a full sentence.
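
A toy illustration of the "shorthand symbol" idea (not the paper's code; the mean-pooling compressor and the model choice are stand-ins for what Heima actually learns): a whole written-out reasoning step gets squeezed into one vector that takes the place of all its tokens.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def compress_cot(cot_text):
        # Encode a written-out reasoning step and pool it into a single
        # "thinking token" embedding (illustrative; the paper learns this).
        ids = tok(cot_text, return_tensors="pt").input_ids
        with torch.no_grad():
            hidden = model(ids, output_hidden_states=True).hidden_states[-1]
        return hidden.mean(dim=1, keepdim=True)               # (1, 1, hidden)

    question = model.get_input_embeddings()(
        tok("Q: 17 + 25 = ? A:", return_tensors="pt").input_ids)

    # One compressed vector stands in for the whole written reasoning step.
    thought = compress_cot("Add the tens: 30. Add the ones: 12. Total: 42.")
    embeds = torch.cat([question, thought], dim=1)

    with torch.no_grad():
        next_id = model(inputs_embeds=embeds).logits[:, -1, :].argmax(-1)
    print(tok.decode(next_id))

The saving is that the model attends over one vector instead of a paragraph of intermediate tokens.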

Ancapistani

I've been doing a lot of introspection lately about how I think. I lack the terminology here unfortunately, but your description here sounds super familiar:

> instead of writing out long explanations, it compresses each step of thinking into a single "thinking token." Think of it like using a shorthand symbol instead of writing out a full sentence.

I have clear memories of how cognition worked for me before I understood spoken language. I recall thinking in concepts - kind of a weird mix of forms, motions, and intent. I know this sounds metaphysical, but that's not my intent. I just don't have the words to explain it.

I wish I did, though, because my very early memories of self-awareness certainly seem to map well onto the current state of AI development.

fzzzy

Plenty of people don't think with an internal monologue or internal imagery.

antirez

Cool, but isn't this encoding a potentially very long thinking process into a fixed-size embedding? Intuitively, that should not work as well.

jakobschwich

Seems like a promising next step.