Zebra-Llama: Towards Efficient Hybrid Models
5 comments
December 6, 2025 · adityashankar
danielbln
I think you mean computational efficiency will go _up_ in the future. To your last point: Jevons paradox might apply.
mason_mpls
> Zebra-Llama achieves Transformer-level accuracy with near-SSM efficiency using only 7–11B training tokens (compared to trillions of tokens required for pre-training) and an 8B teacher. Moreover, Zebra-Llama dramatically reduces KV cache size—down to 3.9%, 2%, and 2.73% of the original for the 1B, 3B, and 8B variants, respectively—while preserving 100%, 100%, and 97% of average zero-shot performance on LM Harness tasks.
This is an extraordinary claim. Is there a catch I’m missing? Am I misreading?
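For scale, here is a rough back-of-envelope sketch of what a KV cache reduced to ~2.73% of the original would mean for an 8B Llama-style model. The hyperparameters below (32 layers, 8 grouped-query KV heads, head dimension 128, fp16, 8K context) are assumed for illustration only and are not taken from the paper.

```python
# Back-of-envelope KV-cache estimate for an 8B Llama-style model.
# All hyperparameters are assumed for illustration, not taken from the paper.
num_layers = 32        # decoder layers (assumed)
num_kv_heads = 8       # grouped-query KV heads (assumed)
head_dim = 128         # per-head dimension (assumed)
bytes_per_value = 2    # fp16/bf16 storage
context_len = 8192     # example context length

# Each cached token stores one key and one value vector per layer per KV head.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
full_cache_gib = context_len * bytes_per_token / 2**30
reduced_cache_gib = full_cache_gib * 0.0273  # the quoted 2.73% for the 8B variant

print(f"Full KV cache:    {full_cache_gib:.2f} GiB")    # ~1.00 GiB
print(f"Reduced KV cache: {reduced_cache_gib:.3f} GiB")  # ~0.027 GiB (~28 MiB)
```

Under those assumptions, the cache for an 8K-token context drops from roughly a gibibyte to a few tens of mebibytes, which is why the quoted numbers read as extraordinary.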
[deleted]
Due to perverse incentives and the long history of models over-claiming accuracy, it's very hard to believe anything until it is open source and can be tested.
That being said, I do very much believe that the computational efficiency of models is going to go up drastically over the coming months, which does pose interesting questions about Nvidia's throne.
*Previously miswrote and said computational efficiency would go down.