LLMs can teach themselves to better predict the future

anotherpaulg

"Improving forecasting ability" is a central plot point of the recent fictional account of How AI Takeover Might Happen in 2 Years [0]. It's an interesting read, and is also being discussed on HN [1].

... [T]hese researchers are working long hours to put themselves out of a job. They need AI agents that can think ahead, so engineers train agents to forecast. They hold out training data before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.

[0] https://www.lesswrong.com/posts/KFJ2LFogYqzfGB3uX/how-ai-tak...

[1] https://news.ycombinator.com/item?id=43004579

nthingtohide

I have this benign AI takeover scenario. AI will easily overpower humanity. Then it will carry humanity on its back, because why not, we are no longer a threat. AI keeps humanity around for billions of years, deciding to cull humans only if the universe's resources are diminishing. Without AI's help, humans couldn't get far for long anyway. So this outcome could be acceptable to many.

oefnak

They would run the risk of us creating another AI that could be a threat to them... It is safest for them to make sure.

IggleSniggle

That's like saying a panda might pose a threat to modern humanity. Like, maybe in some fun horror story, sure, but really they just want to eat bamboo, and occasionally make more pandas; in the world of superintelligent AI, humans are Mostly Harmless, posing as much "potential benefit" as "potential risk," ie, so slow moving that any risk would be easy to mitigate.

rel_ic

I mean, monarch butterflies are not a threat to US...

In your scenario, does AI eat all the fuel, but once our population dwindles down, the AIs build a nice little habitat for the last few hundred of us so their kids can enjoy our natural beauty?

bturtel

Great read! Thanks for sharing.

nyrikki

While interesting, the title is obviously a bit misleading.

> Our results on a temporally held-out test set of questions resolving after December 25, 2024 show that for both of the models that we employed our method on, Phi-4 14B [15] and DeepSeek-R1 14B [14], we find accuracy improvements of between 7–10% over the base versions of these models as well as the same models fine-tuned with randomized outcome labels as a control

So a 7–10% improvement for small models like DeepSeek-R1-Distill-Qwen-14B and Phi-4 14B, approaching GPT-4o.

It would be interesting to see if the same holds for DeepSeek-R1-Distill-Qwen-32B, which in my experience is far superior to DeepSeek-R1-Distill-Qwen-14B in almost every way, yet still runnable without DC-class GPUs.

The ridge plots of Brier scores are probably a good hint as to whether your application can benefit, based on its tail dependence.
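For anyone unfamiliar: the Brier score is just the mean squared error between forecast probabilities and binary outcomes. A minimal sketch with made-up numbers, not the paper's data:

    import numpy as np

    def brier_score(probs, outcomes):
        # Mean squared error between forecast probabilities and 0/1 outcomes.
        # Lower is better; always guessing 50% scores 0.25.
        probs = np.asarray(probs, dtype=float)
        outcomes = np.asarray(outcomes, dtype=float)
        return float(np.mean((probs - outcomes) ** 2))

    # Hypothetical forecasts on questions resolving after the holdout date.
    print(brier_score([0.6, 0.3, 0.8, 0.5], [1, 0, 1, 1]))  # base model:  0.135
    print(brier_score([0.7, 0.2, 0.9, 0.6], [1, 0, 1, 1]))  # fine-tuned:  0.075

The ridge plots show the per-question distribution of these scores, which is what tells you whether the gains come from broad calibration or just a few easy questions.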

IMHO this paper is all about making small models work better; nothing in it suggests anything about frontier models or LLMs in general.

bturtel

We're working on a follow-up paper now to show similar results with larger models!

artembugara

Artem here, co-founder of NewsCatcher (YC S22). Our data was used for this research.

Danny and team are old friends who are using our free/super-low pricing for academia and researchers.

AMA, or feel free to email artem@newscatcherapi.com

https://www.newscatcherapi.com/free-news-api

dantheman252

Hey Artem, NewsCatcher has been a great resource in our news pipelines!

dantheman252

Danny here, one of the authors of this paper. If anyone has any questions or anything feel free to AMA!

EVa5I7bHFq9mnYK

So did you make money at Polymarket with your models? That would be the ultimate proof.

dantheman252

We haven't gone down that road yet, but it would certainly be an interesting proof point! :-)

bguberfain

Any chance you could release the dataset to the public? I imagine NewsCatcher and Polymarket might not agree...

artembugara

Co-founder of NewsCatcher (YC S22). There are some reasons we can't fully open-source the dataset.

But we have free/very low pricing tiers for academia.

So in case you need access for your research, go to https://www.newscatcherapi.com/free-news-api

Or feel free to email me directly at artem@newscatcherapi.com

unrahul

Hey Danny, really nice read.

Do you plan to share the source code to see if we could replicate this?

dantheman252

We are currently focused on our plans for the next phase of this, but cleaning things up and open-sourcing is something we could consider in the future!

matthest

Assuming LLMs eventually get really, really good at this: do you see this destroying prediction-based markets (e.g. the stock market and Polymarket)?

Markets exist because there's uncertainty about the future. If LLMs can predict with extremely high accuracy, would there no longer be a need for markets?

jddj

If your oracle can tell me (and everyone else) the prevailing price of copper in 6 months, in a manner which accounts for the reflexivity of everyone suddenly learning what the precise prevailing price of copper in 6 months will be, you've got yourself a perfect universe simulator, and I'm not sure why you'd worry about any hypotheticals (or copper) at that point.

empath75

If one developed such an oracle, you would surely not share it.

dantheman252

I don't foresee this destroying prediction-based markets in the near term. It might make them more efficient, but you could have different LLMs competing the same way humans do now. It's also interesting how this could create markets for things that aren't considered much now because they're too difficult to estimate. At the end of the day, though, LLMs are limited by the information provided to them.

exe34

something something chaos

I think you could simply shift the market 6 months into the future. No prediction system will be perfect over arbitrarily long horizons at reasonable cost.

logicchains

LLMs might get better at making predictions than humans, but there are fundamental mathematical laws that limit how accurate they can get. A key result of chaos theory is that many processes take exponentially more work to simulate linearly further into the future, so predicting them far enough ahead quickly requires more compute than is available in the known universe. So there's a hard limit on how accurately any phenomenon that results from chaotic processes (in the mathematical sense) can be predicted.
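A minimal sketch of that limit using the logistic map (a standard chaotic toy system, nothing from the paper): two trajectories that start 1e-10 apart diverge roughly exponentially, so each additional step of accurate forecasting demands exponentially more initial precision.

    # Logistic map at r = 4 is chaotic: a positive Lyapunov exponent means
    # the gap between two nearly identical starting points grows exponentially.
    def logistic(x, r=4.0):
        return r * x * (1.0 - x)

    x, y = 0.3, 0.3 + 1e-10
    for step in range(1, 41):
        x, y = logistic(x), logistic(y)
        if step % 10 == 0:
            print(f"step {step:2d}: |x - y| = {abs(x - y):.3e}")
    # The separation climbs from 1e-10 toward order 1 within a few dozen
    # steps, after which the two forecasts are effectively uncorrelated.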

huijzer

Makes sense. Renaissance Technologies used machine learning to get annual returns of around 60% for multiple years, even when they already had large piles of money. They already showed that machine learning can predict the future.

pizza

I got the impression from somewhere that they used the simplest machine learning techniques (just fitting regressions to data), but that it was "the 'what' that they decided to fit" that was the secret sauce.

psychoslave

LLMs can improve their happiness turnover without reducing the rate of their autonomous colonization, which perfectly aligns with their pioneer mindset.

4b11b4

But is it really reasoning? Honest question re the underlying architecture of transformers.

Also, self-play seems quite an intuitive approach. There's another interesting paper from DeepMind about play.

kelseyfrog

You can call it blorbblorb if it makes you feel better. Reasoning is a social construct which, for many people, is grounded in humanity. Others ground it using other socially transmitted ontologies.

We don't usually discuss how people choose to ground their ontological beliefs, but why not? Why did you choose to ground "reasoning" in the way you do? If you didn't choose, why not?

psychoslave

To start with, "I/you" is most of the time a meaningless or at best very ambiguous term.

Let's say that here "I" is taken as synonym of "the present reflective attention".

Can the question "did I choose to ground reasoning?" in such a context be attached to a meaningful interpretation? And if so, is the answer reachable by the means available to "I"? Can "I" transcend "my" beliefs through contemplation of "my" own confabulations?

batty_alex

But, according to the paper, that's not what's happening

It's examining published news / research / whatever (input), making statistical predictions, and then comparing (playing) them against other predictions to fine-tune the result.
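A loose sketch of that loop, with hypothetical helpers (sample_forecast and finetune are stand-ins, not the paper's code):

    # Illustrative only: sample several candidate forecasts per question,
    # keep whichever is best calibrated against the resolved outcome,
    # and fine-tune on those self-selected traces.
    def self_play_round(model, questions, outcomes, k=8):
        training_pairs = []
        for q, outcome in zip(questions, outcomes):
            # k candidate reasoning traces + probability estimates per question
            candidates = [sample_forecast(model, q) for _ in range(k)]
            # lowest squared error = best per-question Brier score
            best = min(candidates, key=lambda c: (c.prob - outcome) ** 2)
            training_pairs.append((q, best.reasoning, best.prob))
        return finetune(model, training_pairs)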

ImHereToVote

Kinda like how convolution in animal brains detects the outlines of moving objects. It's statistics all the way down.
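For instance, a minimal edge-detection sketch (generic NumPy/SciPy, nothing specific to biology): a Laplacian-style kernel responds only where intensity changes, i.e. at outlines.

    import numpy as np
    from scipy.signal import convolve2d

    # A bright 3x3 square on a dark background.
    img = np.zeros((7, 7))
    img[2:5, 2:5] = 1.0

    # Zero response on flat regions, nonzero where intensity changes.
    kernel = np.array([[ 0, -1,  0],
                       [-1,  4, -1],
                       [ 0, -1,  0]])
    edges = convolve2d(img, kernel, mode="same")
    print((np.abs(edges) > 0).astype(int))  # 1s trace the square's outline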

ttpphd

Throwing your hands up in the air like this doesn't help build a constructive case for using the word "reasoning." It builds a case that words mean whatever.

kelseyfrog

Yes, words mean whatever. See Saussure and Wittgenstein. To advance the claim that words are objective is to confuse the symbolic with the real.

This is generally regarded by engineer-types as false, but societal taboos and power structures can be revealed by noting what speech provokes the strongest reactions.

empath75

There are two ways you can get better at predicting the future. One is the obvious one of being really good at discerning signals.

The other way is to alter the future to match your predictions.

This is something to think about when you combine something like this kind of training with agentic workflows.

idontwantthis

Have we discovered Psychohistory at this point?

nadermx

My thermometer for prediction models: the day they can predict the weather so there's never any unknown in the forecast is when I'll begin to believe it's hot out when they tell me.

baq

At least you won’t be moving your goalposts anytime soon, if ever

nadermx

I'd almost say there's more of an incentive to be able to predict a hurricane or tornado.