Learning Theory from First Principles [pdf]

miltava

This looks interesting. Does anyone know how it compares to other learning theory books, like Foundations of Machine Learning [1], in terms of depth and approachability?

[1] https://cs.nyu.edu/~mohri/mlbook/

mikhael

Learning From Data: https://amlbook.com/

commandersaki

Ah damn I was hoping this would be a position paper on why learning _theory_ from first principles is a good idea.

drsopp

I was hoping the paper was on _learning_ theory.

amelius

It is a book on that. Not sure why they left out "machine" from "machine learning".

grg0

Good resource, boys. Thanks for sharing.

almostgotcaught

I honestly don't understand why people write these books anymore. Let me explain: there used to be a lot of these kinds of survey books that start with linear regression and end at... something classical. I can rattle off a lot of titles (Pattern Recognition and Machine Learning, Elements of Statistical Learning, Intro to Statistical Learning, blah blah blah). They all covered the same material at various levels of sophistication (some of them covered meta theory like PAC learning or shattering dimension or empirical risk minimization or whatever). Some of them took the statistical approach and some of them took the optimization approach. Again: blah blah blah. The synthesis/summary is/was that there is no grand unified theory of machine learning, and that should have been clear to everyone.

And then "deep learning" arrived and it became even more obvious that the only thing that matters is data and time spent crunching numbers (more of both and you get better results no matter the model).

Again I just want to be crystal clear, because I'm sure someone will pop in and claim "oh I still use SVM to pick my family's shopping list": no professional ML engineer/team/org today that ships an ML product "at scale" gives a fuck about SVMs or graphical models or bayes nets or kernel methods. No one. So who cares about all this sophistry? What value is it to learn concentration inequalities - training goes brrr no matter what if you have enough data. And if you don't, if you're really building a model to predict your family's shopping list, I encourage you to reflect on whether it would be simpler to just ask your family what they want for dinner instead.
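
To be concrete about what I'm dismissing: the textbook example of a concentration inequality (a standard result, not something specific to this book) is a Hoeffding-style generalization bound. For a single fixed model h with a loss bounded in [0, 1] and n i.i.d. samples,

    \mathbb{P}\big(|\hat{R}_n(h) - R(h)| \ge \varepsilon\big) \le 2\exp(-2n\varepsilon^2),
    \qquad\text{i.e., with probability at least } 1-\delta,\quad
    |\hat{R}_n(h) - R(h)| \le \sqrt{\frac{\log(2/\delta)}{2n}}.

The gap between training and test risk shrinks like n^{-1/2} whether or not you ever wrote the bound down.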

My 2 cents: teach people/students useful things instead of this stuff. They'll be happier and you'll feel more fulfilled (even though you didn't get to flex your big math brain).

miltava

Maybe the book just isn't for you. That doesn't mean it isn't for anyone.

I understand that deep learning is in vogue now. But when I was in graduate school, a professor asked me why I was using neural nets in a project, since they were not as good as SVMs. We used to study Vapnik, VC dimensions, SVMs, etc., and neural nets were totally out of fashion.

Imagine what would have happened if everybody had used and researched only the methods that worked best at the time. And deep learning could benefit from a theory that explains why, when, and how it works so well. Maybe someone working on this theory could extend it to cover deep learning.

Also I don't think you're right to assume that all models out there are deep learning models. Yes, they are very good for many cases (especially those with less structured data, like images or NLP). But in some cases gradient boosting or even GLMs are better suited for the task (because of the structure and size of the data, or because of computing restrictions).
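
As a quick, hypothetical sketch of the kind of case I mean (my own example using scikit-learn, not something from the book): on a small structured dataset, a classical gradient-boosted model is a perfectly sensible choice and needs neither a huge amount of data nor a GPU.

    # Hypothetical example: classical gradient boosting on a small tabular dataset.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # ~570 rows of structured features: far too little data for deep learning to shine.
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = GradientBoostingClassifier(random_state=0)  # trains on a CPU in seconds
    model.fit(X_tr, y_tr)
    print(accuracy_score(y_te, model.predict(X_te)))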

And in the end, people can just want to learn it because they find it interesting. It’s a bit sad to do only things that are “useful”. That’s my 2 cents.

pinkmuffinere

Some of these “ML” methods have applications outside of what you’d think of as ML. My background is in control theory, which relies on guarantees that you just can’t get from neural nets. Skimming through the outline, there are tons of methods here that are used in control and estimation, so they’re certainly still useful.

shusaku

> no professional ML engineer/team/org today that ships an ML product "at scale" gives a fuck about SVMs or graphical models or bayes nets or kernel methods.

There’s a reason why “AI is just statistics” became a meme: a lot of places do use textbook machine learning techniques and dress them up as AI. Yes, deep learning will win with enough data, but few companies have that luxury.

choonway

This book is not for you.

Teaching others how to replicate solutions is very different from guiding people in solving as-yet-unsolved problems in the field.

This book is for the latter. For the former, you might want to look out for one in the "for dummies" series.

almostgotcaught

> This book is not for you... you might want to look out for one in the "for dummies" series.

My guy, I learned this material from a healthy mix of ESL, Casella & Berger, and Billingsley. I could still, to this day, probably do every proof in this book without reviewing the material. And yet, despite all that training, I still argue this book is not useful for absolutely anything except assigning homework problems and setting exams.

choonway

That's not strange. There are many people who can ace exams but are not competent at applying the material in their workplace.

sarosh

But why does training "go brrr", as you put it?

Francis Bach, the author, makes a good faith effort to explain exactly why this material is beneficial (see https://francisbach.com/my-book-is-out/):

"Why yet another book on learning theory? ...the main reason is that I felt that the current trend in the mathematical analysis of machine learning was leading to overly complicated arguments and results that are often not relevant to practitioners. Therefore, my aim was to propose the simplest formulations that can be derived from first principles, trying to remain rigorous without overwhelming readers with more powerful results that require too much mathematical sophistication."

From my own reading and experience with the mathematical analysis of this "training goes brrr" phenomenon, I thought the material in Chapter 12, Overparameterized Models, was interesting and coherent, with 12.2.4, Linear Regression with Gaussian Projections, being an especially elegant explanation. It would be interesting to hear if you had read/skimmed/perused this section and found it wanting, etc.
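
For anyone curious, here is a rough numerical sketch of that setting (my own toy code, not taken from the book): minimum-norm least squares on random Gaussian projection features, sweeping the number of features p past the interpolation threshold p = n to look for the double-descent shape that Chapter 12 analyzes.

    # Toy sketch (not from the book): min-norm least squares on random Gaussian
    # projections of the inputs, varying the feature count p around p = n.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, n_test = 100, 500, 2000                    # ambient dimension d kept large
    w_star = rng.normal(size=d) / np.sqrt(d)         # well-specified linear target
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    X_te = rng.normal(size=(n_test, d))
    y_te = X_te @ w_star

    for p in [10, 50, 90, 100, 110, 200, 500]:
        W = rng.normal(size=(d, p)) / np.sqrt(d)     # random Gaussian projection
        theta = np.linalg.pinv(X @ W) @ y            # minimum-norm least squares fit
        mse = np.mean((X_te @ W @ theta - y_te) ** 2)
        print(f"p={p:4d}  test MSE={mse:.4f}")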

gloomyday

The pursuit of knowledge is not a linear path. The reason you benefit from deep learning now is that a few people in the past believed neural networks had a future despite not working as well as other techniques such as SVMs.

Discovering knowledge and using the knowledge that works best are very different.

Your argument reminds me of this lecture by Feynman. Quoting him: "...and every theoretical physicist that is any good knows 6 or 7 different theoretical representations for exactly the same physics and knows that they are equivalent... but he keeps them in his head hoping that they'll give him different ideas for guessing."

https://www.youtube.com/watch?v=NM-zWTU7X-k

fancyfredbot

This book doesn't seem massively different from several other existing textbooks. There are also several good textbooks on deep learning specifically (I'd recommend the new Bishop).

This textbook is hardly irrelevant for people who only care about deep learning, though. It covers regularisation, optimisation, overparameterised models, double descent and, err, neural networks. Sounds pretty relevant to me?

If you think the rest of the book is irrelevant then skip it.

You sound a bit nutty when you confidently state nobody uses any of the other methods in this book. How could you possibly know that?

adalarmed

You seem to know a lot about this area. I do not, but I've heard that deep learning models are black boxes that are hard to explain? If you work in something "mission critical" you'd have to explain all the math behind the model. Let's say in healthcare, finance, aviation, etc.

Also, the "big math brains" you're talking about probably read all the books you're shutting down. I'd say their big math brains are the reason we have LLMs today.