
Alignment Is Capability

16 comments · December 8, 2025

xnorswap

I've only been using it a couple of weeks, but in my opinion, Opus 4.5 is the biggest jump in tech we've seen since ChatGPT 3.5.

The difference between juggling Sonnet 4.5 / Haiku 4.5 and just using Opus 4.5 for everything is night & day.

Unlike Sonnet 4.5, which merely showed promise at going off and completing complex tasks, Opus 4.5 seems genuinely capable of doing so.

Sonnet needed hand-holding and correction at almost every step. Opus just needs correction and steering at an early stage, and sometimes will push back and correct my understanding of what's happening.

It's astonished me with its capability to produce easy-to-read PDFs via Typst, and has produced large documents outlining how to approach very tricky tech migration tasks.

Sonnet would get there eventually, but not without a few rounds of dealing with compilation errors or hallucinated data. Opus seems to like to do "And let me just check my assumptions" searches, which makes all the difference.

airstrike

I'm not so sure. Opus 4.1 was more capable than 4.5, but it was too damn expensive and slow.

boxed

I had a situation this weekend where Claude said "x does not make sense in [context]" and didn't do the change I asked it to do. After an explanation of the purpose of the code, it fixed the issue and continued. Pretty cool.

(Of course, I'm still cognizant of the fact that it's just a bucket of numbers but still)

sd9

My kingdom for an LLM that tells me I’m wrong

delichon

> Miss those, and you're not maximally useful. And if it's not maximally useful, it's by definition not AGI.

I know hundreds of natural general intelligences who are not maximally useful, and dozens who are not at all useful. What justifies changing the definition of general intelligence for artificial ones?

exe34

They were born in carbon form by sex.

trillic

IVF babies are AGI

munchler

> A model that aces benchmarks but doesn't understand human intent is just less capable. Virtually every task we give an LLM is steeped in human values, culture, and assumptions. Miss those, and you're not maximally useful. And if it's not maximally useful, it's by definition not AGI.

This ignores the risk of an unaligned model. Such a model is perhaps less useful to humans, but could still be extremely capable. Imagine an alien super-intelligence that doesn’t care about human preferences.

tomalbrc

Except that it is not anything remotely alien but completely and utterly human, being trained on human data.

munchler

Fine, then imagine a super-intelligence trained on human data that doesn’t care about human preferences. Very capable of destroying us.

js8

I am not sure if this is what the article is saying, but the paperclip maximizer examples always struck me as extremely dumb (lacking intelligence), since even a child can understand that if I ask them to make paperclips, they shouldn't go around killing people.

I think superintelligence will turn out not to be a singularity, but something with diminishing returns. They will be cool returns, just as a Britannica set is nice to have at home but, strictly speaking, not required for your well-being.

__MatrixMan__

A human child will likely come to the conclusion that they shouldn't kill humans in order to make paperclips. I'm not sure it's valid to generalize from human child behavior to fledgling AGI behavior.

Given our track record for looking after the needs of the other life on this planet, killing the humans off might be a very rational move, not so you can convert their mass to paperclips, but because they might do that to yours.

It's not an outcome that I worry about; I'm just unconvinced by the reasons you've given, though I agree with your conclusion anyhow.

lulzury

There's a direct line between ideology and human genocide. Just look at Nazi Germany.

"Good intentions" can easily pave the road to hell. I think a book that quickly illustrates this is Animal Farm.

exe34

Given the kind of things Claude Code does with the wrong prompt, or the kind of overfitting that neural networks do at any opportunity, I'd say the paperclip maximiser is the most realistic part of AGI.

If doing something really dumb will lower the negative log likelihood, it probably will do it unless careful guardrails are in place to stop it.

A child has natural limits. If you look at the kind of mistakes an autistic child can make by taking things literally, a super-powerful entity that misunderstands "I wish they all died" might well shoot them before you realise what you said.
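
To make the point about minimizing negative log likelihood concrete, here is a toy sketch (my own illustration, not code from the thread or from any vendor; all names and numbers are invented): an over-parameterized logistic regression trained purely to drive down training NLL happily memorizes noise, while an L2 penalty acts as the "guardrail" that stops it.

```python
# Toy example: with no guardrail, minimizing NLL rewards memorization.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 1000, 200   # far more features than training samples

w_true = np.zeros(d)
w_true[:5] = 2.0                     # only 5 features actually matter

def make_data(n):
    X = rng.normal(size=(n, d))
    p = 1 / (1 + np.exp(-(X @ w_true)))
    y = (rng.uniform(size=n) < p).astype(float)
    return X, y

def nll(w, X, y):
    # Negative log likelihood for logistic regression: log(1 + e^z) - y*z
    z = X @ w
    return np.mean(np.logaddexp(0, z) - y * z)

def fit(X, y, l2=0.0, lr=0.1, steps=2000):
    # Plain gradient descent on NLL plus an optional L2 "guardrail"
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        grad = X.T @ (p - y) / len(y) + l2 * w
        w -= lr * grad
    return w

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

for l2 in (0.0, 0.1):                # no guardrail vs. L2 guardrail
    w = fit(Xtr, ytr, l2=l2)
    print(f"l2={l2}: train NLL={nll(w, Xtr, ytr):.3f}, test NLL={nll(w, Xte, yte):.3f}")
```

With no penalty, training NLL collapses toward zero while held-out NLL degrades; the optimizer does the "really dumb" thing because nothing tells it not to.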

riskable

The service that AI chatbots provide is 100% about being as user-friendly and useful as possible. Turns out that MBA thinking doesn't "align" with that.

If your goal is to make a product as human as possible, don't put psychopaths in charge.

https://www.forbes.com/sites/jackmccullough/2019/12/09/the-p...

podgorniy

Great deep analysis and writing. Thanks for sharing.