
Prediction Games


17 comments

February 5, 2025

derbOac

I have a hard time keeping up with the literature on this and it's not exactly my area of research, but the "overfitting is ok" line has always seemed off and handwavy to me. For one thing, it seems to contradict some pretty basic information-theoretic results.

I guess it seems like parameters need to be "counted" differently, or there's something misunderstood about what a parameter is, or about whether and how it's being adjusted for somewhere. Some of the gradient descent literature I've read makes it seem like there are sometimes adjustments for parameters as part of the optimization process, so saying "overfitting doesn't mean anything" is misleading.

It just seems like something where there's a lot of imprecision in terms that is critically important, no definitive explanations for anything, and so forth.

The results are the results, but then again we have hallucinations and weird adversarial-probe glitches suggestive of overfitting (see also, e.g., http://proceedings.mlr.press/v119/rice20a). I might even suggest that the definition of overfitting in a DL context has been poorly operationalized. Sure, you can have a training set and a test set, but if the test set isn't sufficiently differentiated from the training set, are you going to identify overfitting? Even with a traditional statistical model, if I define the test set a certain way, I can minimize the apparent overfitting.
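The point about test-set construction can be sketched with a toy example (my own illustration, not from the original post): deliberately overfit a high-degree polynomial, then compare its error on a genuinely independent test set against a test set drawn at the same inputs as the training data. The "near-duplicate" test set understates the damage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical, for illustration): noisy samples of a smooth
# function, fit with a polynomial whose degree is close to the sample
# count, i.e. a deliberately overfit model.
def sample(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = sample(20)
p = np.polynomial.Polynomial.fit(x_train, y_train, deg=15)  # heavily overparameterized

def mse(x, y):
    return float(np.mean((p(x) - y) ** 2))

# "Test" set 1: genuinely independent draws -- overfitting shows up.
x_fresh, y_fresh = sample(1000)

# "Test" set 2: the *training inputs* with fresh noise -- the memorized
# structure largely transfers, so the train/test gap looks smaller.
y_near = np.sin(3 * x_train) + rng.normal(0, 0.3, 20)

train_err = mse(x_train, y_train)
near_err = mse(x_train, y_near)
fresh_err = mse(x_fresh, y_fresh)
print(train_err, near_err, fresh_err)
```

Typically the training error is smallest and the error on the near-duplicate test set sits well below the error on truly fresh inputs, which is the sense in which an insufficiently differentiated test set can fail to flag overfitting.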

I guess I just feel like a lot of overfitting discussions tend to be handwavy or misleading, and I wish they were different. The number of parameters has never really been the correct metric for overfitting; it just happens to align nicely with the correct metric in conventional models.

freeone3000

The definition of overfitting is handwavy. It’s a failure to generalize outside of the observed data. The current batch of LLMs is trained on essentially all of the internet; what would something outside of the observed data even look like? What does it mean there?

On the contrary, if a printing press controller “overfits” to the printing press it’s installed on, that is actually pretty desirable!

So what are you actually trying to prevent when you want to prevent “overfitting”, and why?

esafak

All of the Internet does not include everything you can extrapolate from it. When I ask it to help with my code or writing, I am not asking it to reproduce anything.

freeone3000

This sort of gets at the crux of it: aren't you? You're asking for the most probable example of a sequence of tokens in a formalized language (code or English) communicating the idea, where the entire language, its rules, and many examples are in the training set. And the probability of your question, or one much like it, having been asked before is actually quite high. Performance tanks drastically when you ask for APL or MUMPS code instead of JavaScript or Python; the models produce Ol Chiki with significantly less proficiency than English. Does this mean they are drastically overfit to English and Python? And if so, so what?


janalsncm

I have worked on online recommender systems. (Here, “online” means the model is being continuously updated from user interactions.)

Overfitting is typically not a concern. You train on the last N days of user interactions, and because of the volume of data there isn’t time for the model to see an interaction twice.

So you don't need a test set. Your performance metrics may go up or down from day to day depending on data drift.

sdwr

How does "overfitting is ok" violate information theory?

How are hallucinations suggestive of overfitting?

Overfitting is a tactical term, not a strategic one, and is heavily coupled to the specific implementation.

AlotOfReading

I suspect they're trying to relate the pigeonhole principle to overparameterization, but those pieces don't really fit together into a coherent argument for me.

genewitch

What is the difference between a strategy and a tactic?

sdwr

Strategy is the big-picture plan, tactics are "in the moment" actions.

Scamming seniors over the phone is a strategy, pretending to be their grandson is a tactic.

optimalsolver

>This is a bitter lesson about the interplay between techlash activism and big tech power structures. Twenty years of privacy complaints have only made tech companies more powerful.

So we should have done what, exactly, Ben?

jfkrrorj

> Netflix launched an open competition ... in-house recommendation system by 10%.

It worked great for them. Current masterpiece from Netflix has 13 Oscar nominations! Every AI company should learn and apply this lesson!

hobs

They pretty quickly and publicly abandoned that algorithm. They may have since recreated it (the core ideas are pretty reproducible, as the blog states), but yeah, it's interesting that the competition gets brought up without mentioning that they abandoned the winning solution.

kombine

I recently read another great post by the same author, on the connection between constrained optimization and the backpropagation algorithm: https://archives.argmin.net/2016/05/18/mates-of-costate/ Apparently it's based on an older paper by LeCun.

pitt1980

How many $1 million prizes were given out?

esafak

Where's the beef? This is old hat and you could have read it at https://en.wikipedia.org/wiki/Netflix_Prize