The Dragon Hatchling: The missing link between the transformer and brain models

75 comments

·October 22, 2025

alyxya

I tried understanding the gist of the paper, and I’m not really convinced there’s anything meaningful here. It just looks like a variation of the transformer architecture inspired by biology, but no real innovation or demonstrated results.

> BDH is designed for interpretability. Activation vectors of BDH are sparse and positive.

This looks like the main tradeoff of this idea. Sparse and positive activations make me think the architecture has lower capacity than standard transformers. While a more easily interpretable architecture is a good thing, it seems to come at a significant cost to performance and capacity, since transformers use superposition to represent features in activations spanning a larger space. Also, I suspect sparse autoencoders already make transformers just as interpretable as BDH.
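To make "sparse and positive" concrete, here is a minimal NumPy sketch. It is not BDH's code: the pre-activation distribution, the shapes, and the `sparsity` helper are all assumptions made up for illustration. It only contrasts a ReLU-style code (non-negative, mostly zero) with a dense signed code such as tanh.

```python
import numpy as np

def relu(x):
    """Clamp negatives to zero, giving a non-negative activation vector."""
    return np.maximum(x, 0.0)

def sparsity(x, eps=1e-12):
    """Fraction of entries that are (numerically) zero."""
    return float(np.mean(np.abs(x) <= eps))

rng = np.random.default_rng(0)
# Hypothetical pre-activations, shifted negative so most units stay off.
pre = rng.normal(loc=-1.0, scale=1.0, size=(4, 1024))

dense = np.tanh(pre)   # signed code: nearly every unit carries some value
sparse = relu(pre)     # non-negative code: most units are exactly zero

print("dense sparsity: ", sparsity(dense))
print("sparse sparsity:", sparsity(sparse))
```

A dense signed code can pack many features into overlapping directions (superposition), while the sparse code keeps only a few units active per input, which is roughly the capacity-versus-interpretability tradeoff in question.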

jimbo808

There isn't. The title is totally clickbait.

Anything "brain-like" that fits into one single paper is bullshit.

astroflection

The actual paper's title: "The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain"

Don't berate the authors for the HN submitter's carelessness.

jimbo808

The actual title is just as clickbaity.

ljlolel

A real scientist wouldn’t use an imprecise term like “brain-like”

nickpsecurity

That last line isn't true. To be brain-like, it only needs to imitate one thing in the brain. That thing is usually tested in isolation against observed results in human brains. Then, people will combine multiple brain-inspired components in various ways.

That's standard in computational neuroscience. Our standard should simply be whether they are imitating an actual structure or technique in the brain. They usually mention that one. If they don't, it's probably a nonsense comparison to get more views or funding.

jimbo808

I am genuinely baffled by this reply. Every single sentence you've typed is complete and utter nonsense. I'm going to bookmark this as a great example of the Dunning-Kruger effect in the wild.

Just to illustrate the absurdity of your point: I could claim, using your standard, that a fresh pile of cow dung is brain-like because it imitates the warmth and moistness of a brain.

busssard

This is like the beginning or the end of the Crypto Bubble. Publish a whitepaper for the next model architecture and hope that uninformed people with money blow it up your company's... I mean blow up the economy... I mean blow, ahh whatever, you know.

raincole

> the end of the Crypto Bubble

BTC literally hit all time high this month, fyi.

jimbo808

Remember the dotcom bubble? There are still websites, by the way. Doesn't mean it wasn't a bubble.

cootsnuck

What's your point?

House prices are at all time highs too. That doesn't mean the housing bubble never happened.

oofbey

Attention mechanisms are wonderfully interpretable as is. You can literally see which tokens each token is attending to. People don’t bother much these days. But that’s not a strong selling point.
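A toy sketch of what "seeing which tokens each token attends to" means, in plain NumPy. The tokens and the query/key vectors are invented for illustration; in a real model they would come from learned projections, and this is single-head, non-causal attention.

```python
import numpy as np

def attention_weights(q, k):
    """Row-wise softmax of scaled dot products.

    weights[i, j] is how much token i attends to token j.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

tokens = ["the", "dragon", "hatchling"]
# Hypothetical keys (one-hot) and queries chosen so each token
# clearly prefers exactly one other token.
k = np.eye(3)
q = 4.0 * np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0]])

w = attention_weights(q, k)
top = [tokens[j] for j in w.argmax(axis=-1)]
for tok, target in zip(tokens, top):
    print(f"{tok!r} attends most to {target!r}")
```

Reading off the rows of `w` is the whole trick. Whether those weights explain the model's behavior is a separate question.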

fxwin

I haven't read through the entire thing yet, but the long abstract combined with the way the acronym BDH is introduced (What does the B stand for?) along with the very "flowery" name (When neither "dragon" nor "hatchling" appears again past page 2) is rather offputting

- It seems strange to make use of the term "scale-free" and then defer a definition until half way through the paper (in fact, the term is mentioned 3 times after, and 14 times before said definition)

- This might just be CS people doing CS things, but the notation in the paper is awful: Claims/Observations end with a QED-symbol (for example on pages 29 and 30) but without a proof

- They make strong claims about performance and scaling ("It exhibits Transformer-like scaling laws") but the only (I think?) benchmark is a translation task comparison with <1B models, which is ~2 orders of magnitude smaller than SOTA
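For reference, the usual network-science meaning of "scale-free" is a power-law degree distribution. This is the standard textbook form and may not match the definition the paper eventually gives; $k$ (node degree), $\gamma$ (exponent), and $c$ (rescaling factor) are the conventional symbols, not ones taken from the paper.

```latex
P(k) \propto k^{-\gamma}, \qquad P(ck) = c^{-\gamma}\, P(k) \quad \text{for any } c > 0
```

The second relation is where the name comes from: rescaling the degree only changes the distribution by a constant factor, so no characteristic scale stands out.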

mwigdahl

The B stands for "Baby". Baby Dragon Hatchling is their model name.

fxwin

Seems like this should be in the paper! Thanks though

halfdeadcat

It's a 'dragon hatchling' because it is 'scale-free'.

fxwin

Hah, that's pretty clever if it's true :D

polskibus

One of the authors, Adrian, is a very interesting person. Got his PhD at 21, started CS studies at an age when his peers were starting high school. Knowing some of his previous achievements, I’d say his work deserves at least some curiosity.

ZeroCool2u

This is one of the first papers in the neuromorphic vein that I think may hold up. It would be amazing if it did too due to the following properties:

-Linear (transformer) complexity at training time

-Linear scaling with number of tokens

-Online learning(!!!)

The main point that made me cautiously optimistic:

-Empirical results on par with GPT-2

I think this is one of those ideas that needs to be tested with scaled up experiments sooner rather than later, but someone with budget needs to commit. Would love to see HuggingFace do a collab and throw a bit of $$$ at it with a hardware sponsor like Nvidia.
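For anyone wondering how "linear in the number of tokens" and "online" can go together, here is a minimal sketch of generic (unnormalized, softmax-free) linear attention. This is an assumption for illustration, not the paper's actual mechanism: a fixed-size running state is updated once per token and reproduces the output of the quadratic causal form.

```python
import numpy as np

def causal_quadratic(q, k, v):
    """O(T^2): every token explicitly scores all earlier tokens (no softmax)."""
    scores = np.tril(q @ k.T)  # zero out scores for future positions
    return scores @ v

def causal_recurrent(q, k, v):
    """O(T): carry a fixed-size state S = sum over seen tokens of outer(k_t, v_t)."""
    state = np.zeros((k.shape[1], v.shape[1]))
    out = np.empty((q.shape[0], v.shape[1]))
    for t in range(q.shape[0]):
        state += np.outer(k[t], v[t])  # online update with the new token
        out[t] = q[t] @ state          # read out using only the state
    return out

rng = np.random.default_rng(0)
T, d = 16, 8  # hypothetical sequence length and head dimension
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))

print(np.allclose(causal_quadratic(q, k, v), causal_recurrent(q, k, v)))
```

The recurrent version never revisits earlier tokens, so cost per token is constant and the state can keep being updated as new input streams in.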

deviation

I guarantee if there's even a 0.1% chance of this architecture eventually outperforming traditional ones, then Zuckerberg et al are already eating the cost and have teams spinning up experiments doing just that.

ZeroCool2u

Absolutely agreed, but we may not even hear about it as Meta has made it clear they're not necessarily committed to the open source first policy at this point.

nickpsecurity

That's not true. The AI industry appears to play a game of follow the leader copying other companies and major researchers. There's all kinds of good ideas we never see applied by big companies. So, it's not safe to assume they tried them all and they didn't work.

In fact, we've sometimes seen new companies show up with models based on research big companies didn't use, the new models are useful or better in some way, and people use them or big companies acquire them. I'd say that's proof big companies miss a lot of good ideas internally.

bob1029

The nature of the abstract is making me hesitate to go any further on this one. It doesn't even seem to fit within arxiv's web layout.

CaptainOfCoit

Judging science based on the layout of a webpage feels less than ideal :/ The PDF seems to render just fine.

bob1029

This doesn't change the fact that the PDF contains a ~440 word abstract. It comes off as a defensive marketing pitch when it's this long.

batuhandumani

You're truly judging the book by its cover, but I have to give credit where it's due: the abstract is very long.

oofbey

It’s a clear signal the paper is gonna be hard to read. It takes a ton of work to compress complex ideas down to 8 pages for a conference paper. But that work makes it easier to understand. This paper did not do that work. In fact it seems they did the opposite: try to write a LONG paper as if that shows how much originality they have.

polskibus

I posted it 19 days ago. It didn’t get any traction, I wonder why. https://news.ycombinator.com/item?id=45453119

recitedropper

Repo seems legit, and some of the ideas are pretty novel. As always though, we'll have to see how it scales. A lot of interesting architectures have failed the GPT3+ scale test.

As a sidenote--does anyone really think human-like intelligence on silica is a good idea? Assuming it comes with consciousness, which I think is fair to presume, brain-like AI seems to me like a technology that shouldn't be made.

This isn't a doomer position; that human-like AI would bring about the apocalypse. It is one of empathy: At this point in time, our species isn't mature enough to have the ability to spin up conscious beings so readily. I mean look how we treat each other--we can't even treat beings we know to be conscious with kindness and compassion. Mix our immaturity with a newfound ability to create digital life and it'll be the greatest ethical disaster of all time.

It feels like researchers in the space think there is glory to be found in figuring out human-like intelligence on silicon. That glory has even attracted big names outside the space (see John Carmack), under the presumption that the technology is a huge lever for good and likely to bring eternal fame.

I honestly think it is a safer bet that, given how we aren't ready for such technology, the person / team who would go on to actually crack brain-like AI would be remembered closer to Hitler than to Einstein.

ACCount37

We have no clue what "consciousness" even is, let alone what the prerequisites are. Our best guesses are just that. Guesses. Guesswork based on information so sparse that astronomers in ancient Greece might have had a better time guessing what the stars truly are.

For all we know, an ICE in a 2001 Toyota truck is conscious too - just completely inhuman in its consciousness.

Nonetheless, here we are - building humanlike intelligence. Because it's useful. Having machines that think like humans do is very useful. LLMs are a breakthrough in that already - they implement a lot of humanlike thinking on a completely inhuman substrate.

recitedropper

For the record, I'm agnostic to whether or not consciousness is possible upon silica. I think it is pretty safe to say though that it likely is an emergent property of specifically-configured complex systems, and humanlike intelligence on silica is certainly something that might qualify.

I don't think appealing to whether or not inanimate objects may be conscious is sufficient to discount that we are toying with a different beast in machine learning. And, if we were to discover that inanimate objects are in-fact conscious, that would be an even greater reason to reconfigure our society and world around compassion.

I agree that LLMs are a great breakthrough, and I think there are many reasons to doubt consciousness there. But I would suggest we rest on our laurels for a bit, and see what we can get out of LLMs, rather than push to create something that is closer to mimicking humans because it might be more useful. From the evil perspective of pure utility, slaves are quite useful as well.

cootsnuck

Yea, actual "human-like" consciousness would be an ethical nightmare. Any sane company should not be legitimately pursuing this.

My most generous interpretation of Anthropic's flirting with it is they too think it would be a nightmare and are hyper-vigilant. (My more realistic interpretation is that it's just some mix of a Frankenstein complex and hype-boosting.)

recitedropper

I hope your generous interpretation is right... I can't really tell what's going on with Anthropic's theater either. They definitely seem like they are vigilant of bad outcomes, going as far as to publish their own economic index trying to monitor how AI is affecting labor markets.

That said, the cynic in me thinks they give lip service to these things while pushing fully ahead into the unknown on the presumption of glory and a possibility of abundance. A bunch of the leadership are EAs who subscribe to a kind of superintelligence eschatology that goes as far as to give a shot at their own immortality. Given that, I think they act on the assumption that AGI is a necessity, and they'd rather take the risks on everyone's behalf than just not create the technology in the first place.

Them recently flirting with money from the gulf states is a pretty concerning signal pointing to them being more concerned with their own goals rather than ethics.

nix0n

> the person / team who would go on to actually crack brain-like AI would be remembered closer to Hitler than to Einstein

I completely agree. I think that the people who are funding AI research are essentially attempting to create slaves. The engineers actually doing the work have either not thought it through or don't care.

> Assuming it comes with consciousness, which I think is fair to presume, brain-like AI seems to me like a technology that shouldn't be made.

"Fair to presume" is a good way to put it. I'm not convinced that being "like a brain" is either necessary or sufficient for consciousness, but it's necessary to presume it will, because consciousness is not understood well enough for the risk to be eliminated.

lr4444lr

> I mean look how we treat each other--we can't even treat beings we know to be conscious with kindness and compassion. Mix our immaturity with a newfound ability to create digital life and it'll be the greatest ethical disaster of all time.

Or maybe if we had artificial life to abuse, it would be a sufficient outlet for our destructive and selfish impulses so that we would do less of it to genuine life. Maybe it's just an extension of sport contests that scratch that tribal itch to compete and win. There are no easy answers to these questions.

recitedropper

In this thought experiment, I am considering artificial life genuine. I would agree that there could be productive outlets for our selfish impulses if there was something that mimicked their targets without consciousness to experience the externalities of such impulses.

That said, I think probably the best path would just be to build and foster technologies that help our species mature, so if one day we do get the ability to spin-up conscious beings artificially, it can be done in a manner that adds more beauty rather than despair to our universe.

raducu

> human-like intelligence on silica is a good idea.

The famous Chinese Room Translator -- silica is irrelevant. You could probably implement an LLM-like algorithm with pen and paper; do you still think the paper could suffer or be "conscious"?

kelseyfrog

The paper would think it's ridiculous that meat could suffer.

varjag

Suffering isn't necessary outside evolutionary pressures. But if a bouillon of animal proteins could be conscious why not.

recitedropper

I am empathetic to arguments against consciousness being computational. Definitely strange to imagine an algorithm played out on trillions of abacuses being conscious.

That said, I don't think it is a sufficient appeal to entirely discount the possibility that the right process implemented on silicon could in fact be conscious in the same way we are. I'm open to whether or not it is possible--I don't have a vested interest in the space--but silica seems to be a medium that can possibly hold the level of complexity for something like consciousness to emerge.

So this is to say that I agree with you that consciousness likely requires substrate-specific embodiment, but I'm open to silica being a possible substrate. I certainly don't think it can be discounted at this point in time, and I'd suggest that we don't risk a digital holocaust on the bet that it can't.

CaptainOfCoit

> It exhibits Transformer-like scaling laws: we find empirically that BDH rivals GPT2-architecture Transformer performance on language and translation tasks, at the same number of parameters (10M to 1B), for the same training data.

I'm assuming they're using "rivals GPT2-architecture" instead of "surpasses" or "exceeds" because they got close, but didn't manage to create something better. Is that a fair assessment?

ACCount37

Pretty much.

Everyone and their dog says "transformer LLMs are flawed", but words are cheap - and in practice, no one seems to have come up with something that's radically better.

Sidegrades yes, domain specific improvements yes, better performance across the board? Haha no. For how simple autoregressive transformers seem, they sure set a high bar.

pyeri

I've just stepped into LLMs, pytorch, transformers, etc. on the learning path, and I don't know much about advanced AI concepts yet. But I somehow feel that scale alone isn't going to solve the AGI problem. There is something fundamental about the nature of intelligence itself that we don't know yet, and cracking that will lead to the unleashing of true AGI.

badmonster

How does BDH handle long-range dependencies compared to Transformers, given its locally interacting neuron particles? Does the scale-free topology implicitly support efficient global information propagation?

jacobgorm

I read through the first 20+ pages 1.5 weeks ago, and found it quite inspiring. I tried submitting it here, but it did not catch on at the time. I watched the podcast interview with the founder, who seems very smart, but that made me realize that not everything described in the paper has been released as open source, which was a bit disappointing.

neom

This the podcast you watched? https://www.youtube.com/watch?v=mfV44-mtg7c

lackoftactics

The authors seem to have good credentials and I found the repo with code for this paper.

https://github.com/pathwaycom/bdh

There isn't a ton of code and there are a lot of comments in my native language, so at least that is novel to me.