Speech and Language Processing (3rd ed. draft)
5 comments
December 8, 2025 · brandonb
I learned speech recognition from the 2nd edition of Jurafsky's book (2008). The field has changed so much it sometimes feels unrecognizable. Instead of hidden Markov models, Gaussian mixture models, triphone state trees, finite state transducers, and so on, nearly the whole stack has been eaten from the inside out by neural networks.
But, there's benefit to the fact that deep learning is now the "lingua franca" across machine learning fields. In 2008, I would have struggled to usefully share ideas with, say, a researcher working on computer vision.
Now neural networks act as a shared language across ML, and ideas can much more easily flow across speech recognition, computer vision, AI in medicine, robotics, and so on. People can flow too, e.g., Dario Amodei got his start working on Baidu's DeepSpeech model and now runs Anthropic.
Makes it a very interesting time to work in applied AI.
ForceBru
> Gaussian mixture models
In what fields did neural networks replace Gaussian mixtures?
brandonb
The acoustic model of a speech recognizer used to be a GMM, which mapped a pre-processed acoustic feature vector (generally MFCCs, Mel-frequency cepstral coefficients) to an HMM state.
Now those layers are neural nets, so acoustic pre-processing, GMM, and HMM are all subsumed by the neural network and trained end-to-end.
One early piece of work here was DeepSpeech2 (2015): https://arxiv.org/pdf/1512.02595
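The swap described above can be sketched in a few lines. This is a toy illustration with random, untrained parameters (the tiny sizes, the spherical unit-variance Gaussians, and the one-hidden-layer MLP are all simplifying assumptions, not anything from a real recognizer): the classic pipeline scores an MFCC frame with one GMM per HMM state, while the neural replacement maps the same frame directly to per-state posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_mix, dim = 3, 2, 13   # toy sizes: HMM states, mixture components, MFCC dims
frame = rng.standard_normal(dim)  # one pre-processed acoustic frame (e.g. 13 MFCCs)

# --- Classic pipeline: one GMM per HMM state assigns the frame a log-likelihood ---
means = rng.standard_normal((n_states, n_mix, dim))
log_weights = np.log(np.full((n_states, n_mix), 1.0 / n_mix))  # uniform mixture weights
# Spherical unit-variance Gaussians keep the density computation short.
log_gauss = -0.5 * (((frame - means) ** 2).sum(axis=-1) + dim * np.log(2 * np.pi))
gmm_loglik = np.logaddexp.reduce(log_weights + log_gauss, axis=-1)  # shape (n_states,)

# --- Neural replacement: a small MLP maps the same frame to state posteriors ---
W1, b1 = rng.standard_normal((32, dim)), np.zeros(32)
W2, b2 = rng.standard_normal((n_states, 32)), np.zeros(n_states)
h = np.maximum(0.0, W1 @ frame + b1)                       # ReLU hidden layer
logits = W2 @ h + b2
posteriors = np.exp(logits - np.logaddexp.reduce(logits))  # softmax over states

print(gmm_loglik.shape, posteriors.shape, posteriors.sum())
```

In a real end-to-end system the network would of course consume the raw spectrogram (or waveform) rather than hand-crafted MFCCs, and be trained jointly with the rest of the stack, which is exactly what makes the separate GMM and HMM stages unnecessary.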
mfalcon
I was eagerly waiting for a chapter on semantic similarity as I was using Universal Sentence Encoder for paraphrase detection, then LLMs showed up before that chapter :).
Latecomers to the field may be tempted to write this book off as antiquated (even though it has been updated to cover transformers, attention, etc.), but a better framing is that it is _grounded_. Understanding the range of related approaches is key to understanding the current dominant paradigm.