
Zebra-Llama: Towards Efficient Hybrid Models

adityashankar

Due to perverse incentives and the long history of models over-claiming accuracy, it's very hard to believe anything until it is open source and can be tested out

That being said, I do very much believe that the computational efficiency of models is going to go up drastically over the coming months, which poses interesting questions about Nvidia's throne

*previously miswrote and said computational efficiency will go down

danielbln

I think you mean computational efficiency will go _up_ in the future. To your last point: Jevons paradox might apply.

mason_mpls

> Zebra-Llama achieves Transformer-level accuracy with near-SSM efficiency using only 7–11B training tokens (compared to trillions of tokens required for pre-training) and an 8B teacher. Moreover, Zebra-Llama dramatically reduces KV cache size—down to 3.9%, 2%, and 2.73% of the original for the 1B, 3B, and 8B variants, respectively—while preserving 100%, 100%, and 97% of average zero-shot performance on LM Harness tasks.

This is an extraordinary claim; is there a catch I'm missing? Am I misreading?
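
For a rough sense of scale, here's a minimal sketch of what that KV-cache reduction could mean in memory terms, assuming a Llama-3-8B-like config (32 layers, 8 KV heads under GQA, head_dim 128, fp16). The 2.73% factor is the figure quoted above; everything else is an illustrative assumption, not taken from the paper:

    def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8,
                       head_dim=128, bytes_per_elem=2):
        # Both K and V tensors are cached per layer, hence the factor of 2
        return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

    full = kv_cache_bytes(seq_len=128_000)  # full-attention baseline at 128k context
    hybrid = full * 0.0273                  # 2.73% of original, per the quoted 8B figure
    print(f"baseline: {full / 2**30:.1f} GiB, hybrid: {hybrid / 2**30:.2f} GiB")
    # -> baseline: 15.6 GiB, hybrid: 0.43 GiB

If those assumptions hold, the 8B variant's cache for a 128k-token context would drop from roughly 15.6 GiB to under half a GiB.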
