LLMs can teach themselves to better predict the future

anotherpaulg

"Improving forecasting ability" is a central plot point of the recent fictional account of How AI Takeover Might Happen in 2 Years [0]. It's an interesting read, and is also being discussed on HN [1].

... [T]hese researchers are working long hours to put themselves out of a job. They need AI agents that can think ahead, so engineers train agents to forecast. They hold out training data before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.

[0] https://www.lesswrong.com/posts/KFJ2LFogYqzfGB3uX/how-ai-tak...

[1] https://news.ycombinator.com/item?id=43004579

nthingtohide

I have this benign AI takeover scenario. AI will easily overpower humanity. Then it will carry humanity on its back, because why not, we are no longer a threat. AI keeps humanity around for billions of years, deciding to cull humans only if the universe's resources are diminishing. Without AI's help, humans couldn't get far for long anyway. So this outcome could be acceptable to many.

oefnak

They would run the risk of us creating another AI that could be a threat to them... It is safest for them to make sure.

IggleSniggle

That's like saying a panda might pose a threat to modern humanity. Like, maybe in some fun horror story, sure, but really they just want to eat bamboo, and occasionally make more pandas; in the world of superintelligent AI, humans are Mostly Harmless, posing as much "potential benefit" as "potential risk," ie, so slow moving that any risk would be easy to mitigate.

rel_ic

I mean, monarch butterflies are not a threat to US...

In your scenario, does AI eat all the fuel, but once our population dwindles down, the AIs build a nice little habitat for the last few hundred of us so their kids can enjoy our natural beauty?

bturtel

Great read! Thanks for sharing.

nyrikki

While interesting, the title is obviously a bit misleading.

> Our results on a temporally held-out test set of questions resolving after December 25, 2024 show that for both of the models that we employed our method on, Phi-4 14B [15] and DeepSeek-R1 14B [14], we find accuracy improvements of between 7–10% over the base versions of these models as well as the same models fine-tuned with randomized outcome labels as a control

So a 7–10% improvement for small models like DeepSeek-R1-Distill-Qwen-14B and Phi-4 14B, approaching GPT-4o.

It would be interesting to see if the same holds for DeepSeek-R1-Distill-Qwen-32B, which in my experience is far superior to DeepSeek-R1-Distill-Qwen-14B in almost every way, yet still runnable without DC-class GPUs.

The ridge plots of Brier scores are probably a good hint as to whether your application can benefit, based on its tail dependence.
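For anyone unfamiliar: the Brier score is just the mean squared error between forecast probabilities and binary outcomes. A minimal sketch with made-up numbers, not the paper's data:

    import numpy as np

    def brier_score(probs, outcomes):
        # Mean squared error between forecast probabilities and 0/1 outcomes.
        # Lower is better; always guessing 50% scores 0.25.
        probs = np.asarray(probs, dtype=float)
        outcomes = np.asarray(outcomes, dtype=float)
        return float(np.mean((probs - outcomes) ** 2))

    # Hypothetical forecasts on questions resolving after the holdout date.
    print(brier_score([0.6, 0.3, 0.8, 0.5], [1, 0, 1, 1]))  # base model:  0.135
    print(brier_score([0.7, 0.2, 0.9, 0.6], [1, 0, 1, 1]))  # fine-tuned:  0.075

The ridge plots show the per-question distribution of these scores, which is what tells you whether the gains come from broad calibration or just a few easy questions.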

IMHO this paper is all about making small models work better; nothing in it suggests anything about frontier models or LLMs in general.

bturtel

We're working on a follow-up paper now to show similar results with larger models!

artembugara

Artem here, co-founder of NewsCatcher (YC S22). Our data was used for this research.

Danny and team are old friends who are using our free/super-low pricing for academia and researchers.

AMA, or feel free to email artem@newscatcherapi.com

https://www.newscatcherapi.com/free-news-api

dantheman252

Hey Artem, NewsCatcher has been a great resource in our news pipelines!

dantheman252

Danny here, one of the authors of this paper. If anyone has any questions or anything feel free to AMA!

EVa5I7bHFq9mnYK

So did you make money at Polymarket with your models? That would be the ultimate proof.

dantheman252

We haven't gone down that road yet, but it would certainly be an interesting proof point! :-)

bguberfain

Any chance you could release the dataset to the public? I imagine NewsCatcher and Polymarket might not agree...

artembugara

Co-founder of NewsCatcher (YC S22). There are some reasons we can't fully open-source the dataset.

But we have free/very low pricing tiers for academia.

So in case you need access for your research, go to https://www.newscatcherapi.com/free-news-api

Or feel free to email me directly at artem@newscatcherapi.com

unrahul

Hey Danny, really nice read.

Do you plan to share the source code to see if we could replicate this?

dantheman252

We are currently focused on our plans for the next phase of this, but cleaning things up and open-sourcing is something we could consider in the future!

matthest

Assuming LLMs eventually get really, really good at this: do you see this destroying prediction-based markets (e.g. the stock market and Polymarket)?

Markets exist because there's uncertainty about the future. If LLMs can predict with extremely high accuracy, would there no longer be a need for markets?

jddj

If your oracle can tell me (and everyone else) the prevailing price of copper in 6 months, in a manner which accounts for the reflexivity of everyone suddenly learning what the precise prevailing price of copper in 6 months will be, you've got yourself a perfect universe simulator, and I'm not sure why you'd worry about any hypotheticals (or copper) at that point.

empath75

If one developed such an oracle, you would surely not share it.

dantheman252

I don't foresee this destroying prediction-based markets in the near term. It might make them more efficient, but you could have different LLMs competing the same way humans do now. It's also interesting how this could create markets for things that aren't considered much now because they're too difficult to estimate. At the end of the day, though, LLMs are limited by the information provided to them.

exe34

something something chaos

I think you could simply shift the market 6 months into the future. No prediction system will be perfect over arbitrarily long horizons at reasonable cost.

logicchains

LLMs might get better at making predictions than humans, but there are fundamental mathematical laws that limit how accurate they can get. A key result of chaos theory is that many processes take exponentially more work to simulate linearly further into the future, so predicting them far enough ahead quickly requires more compute than is available in the known universe. So there's a hard limit on how accurately any phenomenon that results from chaotic processes (in the mathematical sense) can be predicted.
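A minimal sketch of that limit using the logistic map (a standard chaotic toy system, nothing from the paper): two trajectories that start 1e-10 apart diverge roughly exponentially, so each additional step of accurate forecasting demands exponentially more initial precision.

    # Logistic map at r = 4 is chaotic: a positive Lyapunov exponent means
    # the gap between two nearly identical starting points grows exponentially.
    def logistic(x, r=4.0):
        return r * x * (1.0 - x)

    x, y = 0.3, 0.3 + 1e-10
    for step in range(1, 41):
        x, y = logistic(x), logistic(y)
        if step % 10 == 0:
            print(f"step {step:2d}: |x - y| = {abs(x - y):.3e}")
    # The separation climbs from 1e-10 toward order 1 within a few dozen
    # steps, after which the two forecasts are effectively uncorrelated.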

huijzer

Makes sense. Renaissance Technologies used machine learning to get annual returns of around 60% for multiple years, even when they already had large piles of money. They already showed that machine learning can predict the future.

pizza

I got the impression from somewhere that they used the simplest machine learning techniques (just fitting regressions to data), but that it was "the 'what' that they decided to fit" that was the secret sauce.

psychoslave

LLMs can improve their happiness turnover without reducing the rate of their autonomous colonization, which perfectly aligns with their pioneer mindset.

4b11b4

But is it really reasoning? Honest question re the underlying architecture of transformers.

Also, self-play seems quite an intuitive approach. There's another interesting paper from DeepMind about play.

kelseyfrog

You can call it blorbblorb if it makes you feel better. Reasoning is a social construct which, for many people, is grounded in humanity. Others ground it using other socially transmitted ontologies.

We don't usually discuss how people choose to ground their ontological beliefs, but why not? Why did you choose to ground "reasoning" in the way you do? If you didn't choose, why not?

psychoslave

To start with, "I/you" is most of the time a meaningless or at best very ambiguous term.

Let's say that here "I" is taken as synonym of "the present reflective attention".

Can the question "did I choose to ground reasoning?" in such a context be attached to a meaningful interpretation? And if so, is the answer reachable by the means available to "I"? Can "I" transcend "my" beliefs through contemplation of "my" own confabulations?

batty_alex

But, according to the paper, that's not what's happening

It's examining published news / research / whatever (input), making statistical predictions, and then comparing (playing) them against other predictions to fine-tune the result.
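A loose sketch of that loop, with hypothetical helpers (sample_forecast and finetune are stand-ins, not the paper's code):

    # Illustrative only: sample several candidate forecasts per question,
    # keep whichever is best calibrated against the resolved outcome,
    # and fine-tune on those self-selected traces.
    def self_play_round(model, questions, outcomes, k=8):
        training_pairs = []
        for q, outcome in zip(questions, outcomes):
            # k candidate reasoning traces + probability estimates per question
            candidates = [sample_forecast(model, q) for _ in range(k)]
            # lowest squared error = best per-question Brier score
            best = min(candidates, key=lambda c: (c.prob - outcome) ** 2)
            training_pairs.append((q, best.reasoning, best.prob))
        return finetune(model, training_pairs)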

ImHereToVote

Kinda like how convolution in animal brains detects the outlines of moving objects. It's statistics all the way down.
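For instance, a minimal edge-detection sketch (generic NumPy/SciPy, nothing specific to biology): a Laplacian-style kernel responds only where intensity changes, i.e. at outlines.

    import numpy as np
    from scipy.signal import convolve2d

    # A bright 3x3 square on a dark background.
    img = np.zeros((7, 7))
    img[2:5, 2:5] = 1.0

    # Zero response on flat regions, nonzero where intensity changes.
    kernel = np.array([[ 0, -1,  0],
                       [-1,  4, -1],
                       [ 0, -1,  0]])
    edges = convolve2d(img, kernel, mode="same")
    print((np.abs(edges) > 0).astype(int))  # 1s trace the square's outline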

ttpphd

Throwing your hands up in the air like this doesn't help build a constructive case for using the word "reasoning." It builds a case that words mean whatever.

kelseyfrog

Yes, words mean whatever. See Saussure and Wittgenstein. To advance the claim that words are objective is to confuse the symbolic with the real.

This is generally regarded by engineer-types as false, but societal taboos and power structures can be revealed by noting what speech provokes the strongest reactions.

empath75

There are two ways you can get better at predicting the future. One is the obvious one of being really good at discerning signals.

The other way is to alter the future to match your predictions.

This is something to think about when you combine something like this kind of training with agentic workflows.

idontwantthis

Have we discovered Psychohistory at this point?

nadermx

My thermometer for prediction models: the day they can predict the weather so there's never any unknown in the forecast is when I'll begin to believe it's hot out when they tell me.

baq

At least you won’t be moving your goalposts anytime soon, if ever

nadermx

I'd almost say there's more of an incentive to be able to predict a hurricane or tornado.