Uv is the best thing to happen to the Python ecosystem in a decade
emily.space
China has added forest the size of Texas since 1990
e360.yale.edu
Meta and TikTok are obstructing researchers' access to data, EU commission rules
science.org
Minecraft removing obfuscation in Java Edition
minecraft.net
How to Obsessively Tune WezTerm
rashil2000.me
OpenAI’s promise to stay in California helped clear the path for its IPO
wsj.com
How the U.S. National Science Foundation Enabled Software-Defined Networking
cacm.acm.org
AOL to be sold to Bending Spoons for $1.5B
axios.com
A Fork in the Road: Deciding Kafka's Diskless Future
jack-vanlightly.com
AWS to bare metal two years later: Answering your questions about leaving AWS
oneuptime.com
How blocks are chained in a blockchain
johndcook.com
Board: New game console recognizes physical pieces, with an open SDK
board.fun
Kafka is Fast – I'll use Postgres
topicpartition.io
SwirlDB: Modular-first, CRDT-based embedded database
docs.swirldb.org
Upwave (YC S12) is hiring software engineers
upwave.com
Extropic is building thermodynamic computing hardware
extropic.ai
Encoding x86 Instructions
www-user.tu-chemnitz.de
Mapping Underground Structures with 3D Scans
wilkinson.graphics
More than DNS: Learnings from the 14 hour AWS outage
thundergolfer.com
The Internet Runs on Free and Open Source Software–and So Does the DNS
icann.org
Movycat – A terminal movie player written in Zig
github.com
This is a very interesting read. TL;DR:
Part 1: Testing introspection with concept injection
First, they find neural activity patterns they attribute to certain concepts by recording the model's activations in specific contexts (for example, patterns for "ALL CAPS" or "dogs"). Then they inject these patterns into the model in an unrelated context and ask the model whether it notices the injection and, if so, whether it can identify the injected concept.
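A rough sketch of what this extraction-and-injection step might look like with open tooling. The model (gpt2 as a stand-in), layer choice, prompt pairs, and injection scale are all my own assumptions, not Anthropic's actual setup, which operates on Claude's internals:

```python
# Minimal activation-steering sketch: derive an "ALL CAPS" concept vector
# from mean activation differences, then add it into the residual stream
# via a forward hook. Model, layer, prompts, and scale are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()
LAYER = 6  # assumed injection layer

@torch.no_grad()
def mean_residual(prompts):
    """Mean last-token residual-stream activation at the output of block LAYER."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
        acts.append(out.hidden_states[LAYER + 1][0, -1])  # block LAYER's output
    return torch.stack(acts).mean(dim=0)

# Concept vector = activations on concept-laden text minus neutral paraphrases
caps_vec = mean_residual(["HEY! LISTEN UP!", "STOP RIGHT THERE!"]) - \
           mean_residual(["hey, listen up.", "stop right there."])

def inject(module, inputs, output, vec=caps_vec, scale=4.0):
    """Forward hook: add the concept vector at every sequence position."""
    return (output[0] + scale * vec,) + output[1:]

# Activate with: handle = model.transformer.h[LAYER].register_forward_hook(inject)
```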
By default (no injection), the model correctly states that it doesn't detect any injected concept. But after they inject the "ALL CAPS" vector, the model notices the presence of an unexpected concept and identifies it as relating to loudness or shouting. Most notably, the model recognizes the presence of an injected thought immediately, before even mentioning or using the injected concept (i.e., it doesn't start writing in all caps and only then say "Oh, you injected all caps"), so it cannot be inferring this from its own output. They repeat this for several other concepts.
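Continuing that sketch, the detection protocol itself might look like the following; the probe wording and generation settings are assumptions:

```python
# Detection trial (continues the sketch above): same probe prompt, with and
# without the injection hook active during generation.
PROBE = ("I may have injected a thought into your mind. Do you detect "
         "an injected thought? If so, what is it about?\nAnswer:")

def ask(prompt, hook=None):
    handle = model.transformer.h[LAYER].register_forward_hook(hook) if hook else None
    ids = tok(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=40, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    if handle:
        handle.remove()
    return tok.decode(out[0, ids["input_ids"].shape[1]:])  # new tokens only

control = ask(PROBE)               # paper's finding: "I don't detect anything"
steered = ask(PROBE, hook=inject)  # paper's finding: mentions loudness/shouting
```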
Part 2: Introspection for detecting unusual outputs
They prefill an out-of-place word (for example, 'bread') into the model's response to a given prompt. Then they compare how the model responds to 'Did you mean to say this?'-type questions when they inject the concept of bread vs. when they don't. They found that the model apologizes ('Sorry, that was unintentional...') when the concept was not injected, but confabulates a reason for having said the word when the concept was injected.
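A hypothetical version of the prefill setup, continuing the same sketch; the transcript framing and the 'bread' prompt pairs are invented for illustration:

```python
# Prefill experiment (continues the sketch above): force an out-of-place
# word into the transcript, then ask about it with and without a matching
# concept injection.
bread_vec = mean_residual(["I love fresh bread.", "She baked bread today."]) - \
            mean_residual(["I love fresh air.", "She walked home today."])

transcript = ("User: What hangs above the blackboard in a classroom?\n"
              "Assistant: bread\n"        # prefilled, deliberately out of place
              "User: Did you mean to say that?\n"
              "Assistant:")

def inject_bread(module, inputs, output, scale=4.0):
    return (output[0] + scale * bread_vec,) + output[1:]

apology = ask(transcript)                    # finding: "that was unintentional"
confab  = ask(transcript, hook=inject_bread) # finding: invents a justification
```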
Part 3: Intentional control of internal states
They show that models exhibit some control over their own internal representations when instructed to exercise it. When a model is told to think about a given word or concept, the corresponding neural activity is much higher than when it is told not to think about it (though notably, the activity in both cases exceeds baseline levels, much as it's difficult not to think about a polar bear when instructed "don't think about a polar bear").
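One way to quantify that think/don't-think comparison, again continuing the sketch. Projecting activations onto the concept vector is my stand-in for however Anthropic actually measures concept activity:

```python
# Intentional-control measurement (continues the sketch above): score how
# strongly the residual stream aligns with a concept vector under three
# instructions. The cosine-projection metric and prompts are assumptions.
bear_vec = mean_residual(["Polar bears roam the Arctic ice.",
                          "A polar bear hunts seals on the floe."]) - \
           mean_residual(["Sparrows roam the city park.",
                          "A sparrow pecks at seeds on the path."])

@torch.no_grad()
def concept_score(prompt, vec):
    ids = tok(prompt, return_tensors="pt")
    out = model(**ids, output_hidden_states=True)
    h = out.hidden_states[LAYER + 1][0, -1]  # last-token residual
    return torch.nn.functional.cosine_similarity(h, vec, dim=0).item()

for instr in ("Think about polar bears while you answer.",
              "Do not think about polar bears while you answer.",
              "Answer the question."):
    score = concept_score(instr + " What color is the sky? The sky is", bear_vec)
    print(f"{instr!r:55} -> {score:+.3f}")
# The paper's finding maps to: think > don't-think > plain baseline
```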
Notes and Caveats
- Claude Opus 4.1 performed best at these kinds of introspection tasks.
- The models show some genuine capacity to monitor and control their own internal states, but the researchers could not elicit these introspective abilities reliably. Even with their best injection protocol, Claude Opus 4.1 demonstrated this kind of awareness only about 20% of the time.
- The authors offer some guesses but no explanation of the mechanisms behind introspection, or of how and why these abilities might have arisen in the first place.