Ingesting PDFs and Why Gemini 2.0 Changes Everything
sergey.fyi
Servo in 2024: stats, features and donations
servo.org
Why Is Warner Brothers Discovery Dumping Old Movies On YouTube?
tedium.co
S1: The $6 R1 Competitor?
timkellogg.me
Software development topics I've changed my mind on
chriskiehl.com
Avoiding outrage fatigue while staying informed
scientificamerican.com
Running ArchiveTeam's Warrior in Kubernetes
gabrielsimmer.com
Implementation of a RingBuffer in Java with optional FIFO like semantics
github.com
Ploomber (YC W22) Is Hiring Engineers (Infra, Backend, Growth)
ycombinator.com
Gemini 2.0 is now available to everyone
blog.google
An elusive California mammal has just been photographed alive
sfgate.com
Ask HN: Do you know travel blogs that have animated SVG maps of their travels?
Chrome 133 Supports DOM State-Preserving Move with moveBefore()
chromestatus.com
Catgrad: A categorical deep learning compiler
catgrad.com
OpenWISP: Multi-device fleet management for OpenWrt routers
openwisp.org
Show HN: Matle – A Daily Chess Puzzle Inspired by Wordle
matle.io
US egg prices increased 22% in 2025 and 202% in 12 months
tradingeconomics.com
The New York Times Has Spent $10.8M in Its Legal Battle with OpenAI So Far
hollywoodreporter.com
I have a hard time keeping up with the literature on this, and it's not exactly my area of research, but the "overfitting is ok" claim has always seemed off and handwavy to me. For one thing, it appears to conflict with some pretty basic information-theoretic results.
I guess it seems like parameters need to be "counted" differently, or there's something misunderstood about what a parameter is, or about whether and how it's being adjusted for somewhere. Some of the gradient descent literature I've read makes it seem like there are sometimes adjustments to parameters as part of the optimization process, so declaring that "overfitting doesn't mean anything" is misleading.
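To be concrete about the kind of "adjustment" I mean (a toy sketch of my own, with made-up data, not a claim about any particular paper): weight decay shrinks the parameters a little at every gradient step, so two models with identical parameter counts can end up with very different effective complexity.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    true_w = np.zeros(20)
    true_w[:3] = [2.0, -1.0, 0.5]                # only 3 informative directions
    y = X @ true_w + rng.normal(0, 0.1, 100)

    def fit(weight_decay, lr=0.01, steps=2000):
        w = np.zeros(20)
        for _ in range(steps):
            grad = X.T @ (X @ w - y) / len(y)    # gradient of mean squared error
            w -= lr * (grad + weight_decay * w)  # the per-step "adjustment" to the parameters
        return w

    for wd in (0.0, 1.0):
        print(f"weight_decay={wd}: ||w|| = {np.linalg.norm(fit(wd)):.2f}")

Both runs have exactly 20 parameters, but the decayed fit ends up with a much smaller weight norm, i.e. a different effective capacity.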
It just seems like an area where there's a lot of imprecision in critically important terms, no definitive explanations for anything, and so forth.
The results are the results, but then again we have hallucinations and weird adversarial-probe glitches that are suggestive of overfitting (see also, e.g., http://proceedings.mlr.press/v119/rice20a). I'd even suggest that the definition of overfitting in a DL context has been poorly operationalized. Sure, you can have a training set and a test set, but if the test set isn't sufficiently differentiated from the training set, are you actually going to detect overfitting? With a traditional statistical model, I can define the test set a certain way and make the apparent overfitting all but disappear.
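Here's a toy version of what I mean (entirely made-up data, nothing from the paper linked above): a wildly overfit polynomial looks fine if the "test" set is just duplicated training records, and only a genuinely held-out set exposes the problem.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample(n):
        x = rng.uniform(0.0, 1.0, n)
        y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, n)   # smooth signal plus noise
        return x, y

    x_train, y_train = sample(20)

    # Overparameterized fit: polynomial degree close to the sample count, so the
    # curve essentially memorizes the training points (numpy may warn that the
    # fit is poorly conditioned -- that's rather the point).
    coeffs = np.polyfit(x_train, y_train, deg=15)
    mse = lambda x, y: float(np.mean((np.polyval(coeffs, x) - y) ** 2))

    # "Test" set that isn't differentiated from training: duplicate records
    # that leaked across the split.
    x_leaky, y_leaky = x_train.copy(), y_train.copy()

    # Properly held-out test set drawn independently from the same distribution.
    x_test, y_test = sample(20)

    print("train MSE:   ", mse(x_train, y_train))   # tiny (memorized)
    print("leaky MSE:   ", mse(x_leaky, y_leaky))   # identical -- no overfitting "detected"
    print("held-out MSE:", mse(x_test, y_test))     # typically far worse

The leaky split reports essentially the training error, so by that yardstick the model looks fine even though it has memorized noise.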
I guess I just feel like a lot of overfitting discussions tend to be handwavy or misleading, and I wish they were different. The number of parameters has never really been the correct metric when talking about overfitting; it just happens to align nicely with the correct metric in conventional models.
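For what it's worth, classical statistics already has a version of "counting parameters differently": the effective degrees of freedom of a ridge fit, tr(X (X'X + lambda*I)^-1 X') = sum_i s_i^2 / (s_i^2 + lambda), which can sit far below the raw coefficient count. A quick sketch with random made-up data (the formula itself is standard):

    import numpy as np

    rng = np.random.default_rng(1)
    n_samples, n_features = 50, 500              # 10x more coefficients than samples
    X = rng.normal(size=(n_samples, n_features))

    s = np.linalg.svd(X, compute_uv=False)       # singular values of the design matrix

    for lam in (0.0, 100.0, 1000.0):
        eff_dof = np.sum(s**2 / (s**2 + lam))    # trace of the ridge "hat" matrix
        print(f"lambda={lam:7.1f}: {n_features} raw parameters, "
              f"~{eff_dof:.1f} effective degrees of freedom")

Even with lambda = 0 the fit can only "use" 50 directions here; the other 450 coefficients are pure redundancy, which is roughly what I mean by raw parameter count not being the quantity that matters.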