Contemplative LLMs
8 comments
· January 12, 2025
lsy
The title makes it sound like some new architecture, but this is a blog post where someone likes the results they sometimes get when they fiddle with their input to the LLM to suggest “contemplation”, which apparently makes the LLM generate a large paragraph of highly neurotic text before the answer. There are no benchmarks or any investigation of the model to see whether the effect is robust or generalizable, so it’s hard to say whether this is useful or not.
padolsey
Tbf, have you read most of the academic papers on LLMs lately? It's all a lot of boilerplate academic language packaged around "we tried prompting it this way and it did good things". Tho yes, I do appreciate some scientific prudence.
vunderba
Isn't this just a much larger, prompt-based equivalent of chain-of-reasoning systems like QwQ?
riwsky
Pierre Menard, Author of the “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” Paper
m3kw9
If the model was confident in the answer at first, even with contemplation as the context, why would it start saying “that doesn’t seem right” and then rework the answer?
pizza
It might be that the most obvious way to eventually get to the right answer in the long run is to start from the enticing, obvious, and wrong initial guess and then pivot. It definitely seems like an easier behavior to emulate than always getting the answer right on the first try.
DrewADesign
I just really don’t like having to fight through a research tool’s Dunning-Kruger incompetence blindness, and I’m pretty sure most non-technical users don’t either. If you’ve got a built-in habit of academic skepticism toward even the most confidently delivered information, and ideally at least a vague mental model of why that process works, then it’s a useful tool. That describes a portion of the population too small to justify the kind of money we’ve spent on generative AI. And beyond that, you have to know enough about what you’re asking it to realize when it’s full of shit. How much further does that shrink the useful cases? At first, I thought “man, it would be cool to give this thing access to all kinds of APIs so I could get news, weather info, transit info…” but if I just have to double-check everything it says to make sure it’s not just making its response up, what’s the use in that? I sure hope this all becomes a lot more reliable, without me having to tell it to be, really soon.
padolsey
Cool! I love seeing the very subtle emergent behaviours of different CoT approaches. I reckon people still don't fully appreciate the brittle, artistic subtlety of trying to make something akin to logic emerge from these weird transformer machines.
> Now, instead of asking the model to respond in JSON, we use XML tags to separate the start and end of the contemplation phase and the final answer
I suspect the author wisely avoided function-calling/JSON since it doesn't guarantee the sequence of the outputs (the contemplation needs to be generated before the answer). This, along with a few other frailties, makes me almost always use XML-like markup for my LLM API calls.
Markup languages like XML and HTML lend themselves quite beautifully to this task. They are stream-friendly, semantically enriched, and leniently parseable (HTML was designed in part for fallible humans to write and for browsers to incrementally render), and by nature of being "markup" they are complementary to the autoregressive nature of LLMs. One assumes as well that tonnes of prose appears in the HTML found in training corpora, much less so in JSON, which is usually used for transactional data and RPC-like things; that must surely bias JSON completions toward more robotic formations.

FWIW I ended up creating a library (github.com/padolsey/xmllm) to help me get structured data from LLMs using XML (through the forgiving eyes of an HTML parser), so that I never have to rely on specific LLM tool/function-calling abstractions. Even tiny models like Qwen2.5 and Ministral 3B have pretty superb (x|ht)ml compliance, much less so with JSON.
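For illustration, here is a minimal sketch of that lenient-parsing idea, not the xmllm API itself: the <contemplation> and <answer> tag names are assumptions taken from the quoted prompt description, and the extraction uses Python's built-in html.parser, which keeps working even when the model leaves a tag unclosed.

```python
# Minimal sketch (not the xmllm API): extract <contemplation> and <answer>
# sections from raw LLM output with Python's lenient built-in HTML parser,
# so stray prose outside the tags or an unclosed tag doesn't break parsing.
from html.parser import HTMLParser

class SectionExtractor(HTMLParser):
    def __init__(self, wanted=("contemplation", "answer")):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None   # tag we are currently inside, if any
        self.sections = {}    # tag name -> accumulated text

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.current = tag
            self.sections.setdefault(tag, "")

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current:
            self.sections[self.current] += data

raw = """Sure! Let me think this through.
<contemplation>17 * 23... my first instinct says 371, but that doesn't seem right.
17 * 20 = 340, 17 * 3 = 51, so 340 + 51 = 391.</contemplation>
<answer>391"""  # note the unclosed <answer> tag; still recoverable

parser = SectionExtractor()
parser.feed(raw)
parser.close()
print(parser.sections["answer"].strip())  # -> 391
```

The forgiving parse is the point here: a truncated or unterminated <answer> still yields usable text, where a strict JSON parser would reject the whole output.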