Looking Back at Speculative Decoding

numeri

I've been slightly annoyed that the Speculative Decoding paper has gotten all the credit for the technique. I first learned about it from Shallow Aggressive Decoding[1], a paper more than a year older.

They introduce the same method but apply it to grammatical error correction, where the "draft" output is just the input itself. The Speculative Decoding paper emphasizes the differences between that work and theirs, saying that their method is more general: they apply it to more domains, allow the draft to come from a smaller model, and extend it to support sampling.
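For anyone unfamiliar with the draft-then-verify idea, here is a minimal Python sketch of the greedy case only. It is not the exact procedure from either paper: `verify_draft` and `toy_target` are hypothetical names, the toy lookup stands in for a real target model, and a real model would score every draft position in one parallel forward pass rather than in a loop.

```python
from typing import Callable, List, Sequence

Token = str
# Hypothetical stand-in for the target model: maps a prefix to its greedy next token.
NextTokenFn = Callable[[Sequence[Token]], Token]


def verify_draft(
    target_next: NextTokenFn,
    prefix: Sequence[Token],
    draft: Sequence[Token],
) -> List[Token]:
    """Return the tokens produced by one verification step (greedy case).

    The loop simulates what a real model would compute in a single
    parallel pass over all draft positions.
    """
    accepted: List[Token] = []
    for guess in draft:
        correct = target_next(list(prefix) + accepted)
        accepted.append(correct)
        if guess != correct:
            # First mismatch: keep the target's token, drop the rest of the draft.
            return accepted
    # Whole draft matched; the same pass also yields one extra token.
    accepted.append(target_next(list(prefix) + accepted))
    return accepted


# Toy "target model" (an assumption for illustration): it always emits
# the corrected sentence, then an end-of-sequence marker.
REFERENCE = "she goes to school every day".split()


def toy_target(prefix: Sequence[Token]) -> Token:
    return REFERENCE[len(prefix)] if len(prefix) < len(REFERENCE) else "<eos>"


# Grammatical error correction setting: the draft is the input itself.
source = "she go to school every day".split()
print(verify_draft(toy_target, [], source))  # ['she', 'goes']
```

The output always equals what plain greedy decoding would produce; the draft only changes how many tokens come out of each step. In this sketch the step stops at the first mismatch, so the remaining input would need to be drafted again from the new prefix.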

All of that is great and deserves its own paper, but it doesn't earn them the credit for inventing the method or the right to rename it, especially since they were aware of Shallow Aggressive Decoding before uploading their first draft.

[1]: https://arxiv.org/abs/2106.04970

fritzo

I first saw the trick in 2009 when Linda Brown Westrick independently discovered it as a way to parallelize sequential sampling of MCMC chains.

numeri

Do you have a good reference I can read? I'd love to learn a bit more and update my mental "citation".

fritzo

Unpublished. I'm sure the idea has come up many times.

veryluckyxyz

https://pytorch.org/blog/hitchhikers-guide-speculative-decod...

https://colab.research.google.com/github/sanchit-gandhi/note...
