Project Aardvark: reimagining AI weather prediction

142 comments

·March 23, 2025

rkagerer

Are all the inputs (from buoys, weather balloons, stations, etc) from decades of history stored, as well as past daily forecasts of existing weather models, so that this AI algorithm (and any future new ones) can be run across historic data and compared in terms of performance to existing methods?

Is there a big Clearinghouse for this data?

Kind of like how fintech algos can be run against historic stock market data to evaluate them.

mschuster91

> Is there a big Clearinghouse for this data?

Many commercial and even third party governments rely on the data from NOAA and its archives, on top of that the EU runs its own EUMETSAT fleet of data, plus a ton of national services - unfortunately, the result is there's a looooot of datasets.

NOAA's dataset is public domain [1], EUMETSAT only requires attribution for most of its data [2]. On top of that you got the EU's Climate Data store [3], ECMWF [4], and ECA&D [5].

The service that many private weather services provide is to aggregate and weigh all of the publicly available datasets, and some also add in data from their own ground stations, commercially licensed "realtime" data from governmental services, and their own models as well.

The interesting question is what DOGE will do regarding NOAA - it is increasingly possible that NOAA will shut down, either to be turned into a pay-for-play model, replaced by private services, or just carelessly dropped in its entirety.

[1] https://www.ncei.noaa.gov/archive

[2] https://user.eumetsat.int/resources/user-guides/data-registr...

[3] https://cds.climate.copernicus.eu/

[4] https://www.ecmwf.int/en/forecasts/datasets

[5] https://www.ecad.eu/dailydata/

Loughla

NOAA saves lives. Cutting that organization would be an absolute slap in the face to rural trump voters who rely on that system for weather alerts.

null

[deleted]

K0balt

NOAA marine forecasts and FAA integration is critical infrastructure for marine transportation and aviation. If they shut that down it will result in direct losses much larger than NOAAs budget, and thousands of lives lost through disaster and workplace accidents.

I sure hope those “smart” people are capable of understanding that.

mschuster91

> I sure hope those “smart” people are capable of understanding that.

Well, NOAA can be privatized, sold off to the highest bidder and be fed money from the annual government to provide said critical infrastructure.

In the end, it's always one giant ass grift.

sunshinesnacks

> The service that many private weather services provide is to aggregate and weigh all of the publicly available datasets

Yes, and more people need to understand this. Too many people seem to think that commercial services won’t be impacted if NOAA stops doing what they do.

lgeorget

The World Meteorological Organization has data that national weather services exchange to make forecast. Apart from that, its on a country-by-country basis.

In France for instance Météo-France has released all of its historical data in January 2024: https://www.data.gouv.fr/fr/organizations/meteo-france/#/dat...

graemep

But national models go beyond there borders and usually have some modelling outside so the exchanges will cover quite a lot of the world.

For example, the UK's metoffice has a low resolution medium term global model: https://www.metoffice.gov.uk/research/approach/modelling-sys...

cship2

How does windy do it? Did they just scrap all the sites or api. Would be great to have global dump every 1 minute of all the countries. Or have it have in a radio broadcast.

jcd000

Windy uses openmeteo, which in turn aggregates many weather providers under one roof/api

_joel

> Kind of like how fintech algos can be run against historic stock market data to evaluate them.

Backtesting, that's called.

axismundi

In weather it's called hindcast

trillic

afttest

NitpickLawyer

> Are all the inputs (from buoys, weather balloons, stations, etc) from decades of history stored

I was also thinking about smartphones. They have barometric data, and while it might vary from phone to phone, I'm sure something like a kalman filter + historic data could do something there.

Think about gathering all the data from "stationary" phones, correlate that with weather sat data, and with real "ground truth" weather stations, and then go back 30 - 60 min / a day, and see what comes out.

orion138

Dark Sky used that data for hyper local forecasting…

https://news.ycombinator.com/item?id=22740466

https://www.theverge.com/2015/6/22/8822767/dark-sky-weather-...

xgulfie

I'm still grumpy about apple buying and shuttering darksky

thatcat

They stopped using it during later years.

counters

Not really; it was a gimmick. They used standard forecast post-processing techniques to bias correct global/regional weather models. There is virtually no evidence they actually used device data in this process.

Havoc

>smartphones. They have barometric data

For anyone else having a TIL moment: It's apparently for vertical position and supposedly sub meter accurate. o_O

p_l

Depends on smartphone (and smartwatches). Not all have it, at times it disappeared from brands that had it earlier.

The smartwatch series I use explicitly include it because it's essentially the "all in one" version in a series that had smartwatches designed for aircraft use (including military versions where they serve as backup cabin pressure warning, apparently)

phillipseamore

"Collecting and processing of barometric data from smartphones for potential use in numerical weather prediction data assimilation"

https://rmets.onlinelibrary.wiley.com/doi/10.1002/met.1805

https://github.com/dmidk/smaps

counters

> Is there a big Clearinghouse for this data?

The short answer is, "no." There are some projects like the "NNJA-AI" project at Brightband[1] which is attempting to create such a clearing house in order to focus research efforts across the community.

[1]: https://www.brightband.com/data/nnja-ai

TheJoeMan

It would also be nice to have a historical store of the weather predictions and not just instantaneous parameters. But I'm not aware of such record, perhaps because weathermen don't want a record of their (mis)predictions...

ano-ther

Oh they absolutely do track their predictions and how the different models perform. That’s how they keep improving.

This is a lay-person overview which cites some of the in-depth studies: https://ourworldindata.org/weather-forecasts

wewxjfq

There's climate reanalysis, which combines historical observations with weather models to get clean data of past weather conditions, which is then used by researchers for various purposes. Most notably is ERA5 by ECMWF.

[0] https://en.wikipedia.org/wiki/ECMWF_re-analysis

roelschroeven

What data is used to train this AI? The article doesn't say anything about that (tough I have to admit I didn't read it super carefully). My first thought would be exactly all this historical data, but then you can't use that same data to test the AI's performance. Are different subsets of the available historical data used for training vs testing?

axismundi

It's most likely ERA5: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-s...

scellus

No, they say end-to-end, meaning they use raw obsevations. Most or all other medium-range models start with ERA5.

There's a paper from Norway that tried end-to-end, but their results were not spectacular. That's the aim of many though, including ECMWF. Note that ECMWF already has their AIFS in production, so AI weather prediction is pretty mainstream nowadays.

Google has a local nowcast model that uses raw observations, in production, but that's a different genre of forecasting than the medium-range models of Aardvark.

lenerdenator

Hmmmmmmm.

I have a challenge for the model:

Accurate (within 3deg F) weather predictions for the Kansas City metro more than two days out. As of 2024 these were rarely accurate.[0]

[0]https://www.washingtonpost.com/climate-environment/interacti...

metaphor

> Accurate (within 3deg F) weather predictions

Sniff test: ASTM E230 standard tolerances for the venerable Type K thermocouple is +/- 2.2°C or +/- 0.75%, whichever is greater.

Expectations are in need of recalibration if anyone thinks a single number is going to meaningfully achieve that level of accuracy across a volume representing any metro area at any given point in time, let alone two days out.

kwertzzz

I am not sure if type K thermocouple are used for meteorological air temperature measurements.

These sensors (based on thermal resistance) for example have an accuracy of 0.2 °C under typical conditions [1].

[1] https://www.ti.com/lit/ds/symlink/tmp1826.pdf?ts=17427997638...

zipy124

That is the accuracy of one sensor. The law of large numbers comes into play when using forecasts such as this, since accross any reasonably sized metro area you will have hundreds if not thousands or even tens of thousands of weather stations, accross which you can average to bring down error thresholds.

genewitch

I think you misunderstood the assignment. The temp predictions in my area wildly swing around, until the day of, and even then I have to adjust for my actual location. And I'm not in Kansas City. The only place that has ~3° accuracy is San Diego. And maybe Antarctica.

IshKebab

The bigger the volume the more accurate temperature you should be able to get.

Do you think 2C global warming is irrelevant because a single thermocouple couldn't accurately measure it?

baq

What are your use cases for such level of accuracy on such long time frames?

genewitch

Is it safe to put my seedlings outside or is there going to be a soft freeze that kills them all in two days.

bongodongobob

Then use a thermometer. A city is a large area and can have variances of +/-10 degrees depending on where you are in the city.

Joker_vD

Well, take the actual weather observations for the past year. Take the actual weather observations for the last week. Overlay and slide this week of observations on top of the observations of the past year, until you find the window that matches "the best" — then take the day right after this window and predict that the weather will be just like that (or maybe try to tweak the values a bit).

I wonder how poorly this thing operates, and whether taking several years of history to look at would help much.

counters

Extremely poorly, because forecasting the weather is all about forecasting the deviations from the expected seasonal patterns - the "eddies" in the atmospheric flow which give rise to storm systems and interesting, impactful weather.

kubav027

According to paper model grid resolution is 1.5 degrees. I do not think it can predict accurate weather in any location. It show global weather trends.

scellus

It's lower than many other medium-range AI forecasts, but note that those other models get state-of-the-art with pretty coarse grids, 0.5° or so. The point is that upper atmosphere and broad patterns are smooth, so with ML/AI they don't require high resolution (while simulating them with physical models does require). And at the forecast lag of say 5-10 days, all local detail is lost anyway, so what skill remains comes from broad patterns, in all models. (Some extra skill can be gained by running local models initialized with the broad patterns, for there are clear cases like mountains where fine resolution is useful.)

counters

1.5 degrees is perfectly fine for predicting large-scale (synoptic) weather patterns. They're not just "global trends." But yes, typical global NWP models and their MLWP competitors are run at 0.25 degrees or finer. All forecasts are statistically post-processed and biased-corrected to create local forecasts.

lytedev

Indeed we have it tough out here. I like the excitement, though, I must say! Funny to find the sentiment here on HN, though!

bongodongobob

Where in the city?

lenerdenator

Eh, make it MCI or MKC or wherever they're getting the data now.

MostlyStable

I'm curious if some future, hypothetical AGI agent, which had been trained to have these kinds of abilities, would be akin to how most humans see a ball in flight and just instinctively know (within reason) where the ball is going to go? We don't understand, consciously and in the moment, how our brain is doing these calculations (although obviously we can get to similar results with other methods), but we still trust the outputs.

Would some hypothetical future AI just "know" that tomorrow it's going to be 79 with 7 mph winds, without understanding exactly how that knowledge was arrived at?

sebastiennight

I remember learning that humans trying to catch a ball are not actually able to predict where the ball will land, but rather, will move in a way that maintains the angle of movement constant.

As a result a human running to catch the ball over some distance (eg during a baseball game) runs along a curved path, not linearly to the point where the ball will drop (which would be evidence of having an intuition of the ball's destination).

eszed

This hypothesis could be tested, now that major league baseball tracks the positions of players in games. In the MLB app they show animations of good outfield plays with "catch difficulty" scores assigned, based (in part) on the straight-line distance from the fielder's initial position to the position of the catch. The "routes" on the best catches are always nearly-straight lines, which suggests that high-level players have developed exactly this intuitive sense.

Certainly what I was coached to do, what outfielders say they do, and what I see watching the game, is to "read" the ball, run towards where you think the ball is going, and then track the ball on the way. I was and am a shitty outfielder, in part because I never developed a fast-enough intuitive sense of where the ball is going (and because, well, I'm damn slow), but watch the most famous Catch[1] caught on film, and it sure looks like Mays knew right away that ball was hit over his head.

[1] https://m.youtube.com/watch?v=7bLt2xKaNH0

jampekka

There are a few that kind of theories. You are probably referring to the Optical Acceleration Cancellation theory[1]. There are some similar later so called "direct perception" theories too.

The problem with these is that they don't really work, often even in theory. People do seem to predict at least some aspects of the trajectory, although not necessarily the whole trajectory [2].

[1] https://pubs.aip.org/aapt/ajp/article-abstract/36/10/868/104...

[2] https://royalsocietypublishing.org/doi/10.1098/rsos.241291

spiderfarmer

Humans are definitely able to predict where a ball wil land. https://www.youtube.com/watch?v=aoScYO2osb0

pests

Agreed. Reminds me of juggling, while learning I noticed that as long as I could see each ball for at least a split second on its upwards trajectory I could "tell" if it would be a good throw or not. In order to keep both hands/paths in my view I would stare basically straight forward and not look at the top of the arc and could do it at any height. Now I can do it much more with feel and the motion is muscle memory but the visual cues were my main teacher.

aredox

It makes sense that there are several heuristics. After all, "Thinking: Fast and Slow" already makes the point that human brains have several layers of processing with different advantages and drawbacks depending on situations.

01HNNWZ0MV43FF

Sounds a bit like proportional navigation from missile guidance https://en.wikipedia.org/wiki/Proportional_navigation

defrost

> would be akin to how most humans see a ball in flight and just instinctively know (within reason) where the ball is going to go?

Up to a point .. and that point is more or less the same as the point where humans can no longer catch a spinning tennis raquet.

We understand the gravitional rainbow arc of the centre of mass, we fail at predicting the low order chaotic spin of tennis raquet mass distributions.

Other butterflies are more unpredictable, and the ones that land on a camels back breaking a dam of precariously balanced rocks are a particular problem.

* https://en.wikipedia.org/wiki/Tennis_racket_theorem

* Dzhanibekov effect demonstration in microgravity: https://www.youtube.com/watch?v=1x5UiwEEvpQ

* https://en.wikipedia.org/wiki/Horseshoe_map

MostlyStable

Yes, humans are obviously limited in the things we can instinctively, intuitively predict. That's not really the point. The point is whether something that has been trained to do more complicated predictions will have the a similar feeling when doing those predictions (of being intuitive and natural), or if it will feel more explicit, like when a human is doing the calculus necessary to predict where the same ball is going to go.

defrost

Humans have both intuition and explicit calculation, predictive calculation can be stable or inherently unstable.

The point is whether a LLM has any feelings ...

avianlyric

I think “chain-of-thought” LLMs, with access to other tools like Python, already demonstrate two types of “thinking”.

We can query an LLM a simple question like “how many ‘r’ are in the word strawberry”, and an LLM with know access to tools will quite confidently, and likely incorrectly, give you an answer. There’s no actual counting happening, and any kind of understanding of the problem, the LLM will just guess an answer based on its training data. But that answer tends to be wrong, because those types of queries don’t make up a large portion of its training set, and if they do, there’s a large body of similar queries with vastly different answers, which ultimately results in confidentiality incorrect outputs.

But provide an LLM tools like Python, and a “chain-of-thought” prompt that allows it to recursively re-prompt itself, while also executing external code and viewing the outputs, and an LLM can easily get the correct answer to query “how many ‘r’ are in the word strawberry”. By simply writing and executing some Python to compute the answer.

Those two approaches to problem solving are strikingly similar to intuitive vs analytical thinking in humans. One approach is driven entirely by pattern matching, and breaks down when dealing with problems that require close attention to specific details, the other is much more accurate, but also slower because directed computation is required.

As for your hypothetical “weather AI”, I think it’s pretty easy to imagine an AGI capable of confidently predicting the weather tomorrow, not be capable of understanding how it computed the prediction, beyond a high level hand wavy explanation. Again, that’s basically what LLM do today, confidently make predictions of the future, with zero understanding of how or why they made those predictions. But you can happily ask an LLM how and why it made a prediction, and it’ll give you a very convincing answer, that will also be a complete and total deception.

klabb3

> would be akin to how most humans see a ball in flight and just instinctively know (within reason) where the ball is going to go?

Generally no. If I show you a puddle of water, can you tell me what shape was the ice sculpture that it was melted from?

One is Newtonian motion and the other is a complex chaotic system with sparse measurements of ground truth. You can minimize error propagation but it’s a diminishing returns problem, (except in rare cases like for natural disasters where a 6h warning can make a difference).

mnky9800n

While generally correct there has been evidence that machine learning models can predict multiple Lyapunov times past traditional models.

[1] https://link.aps.org/doi/10.1103/PhysRevLett.120.024102

[2] https://link.aps.org/doi/10.1103/PhysRevResearch.5.043252

nsm

To quote Iain M. Banks, probably not :)

> "Sma," the ship said finally, with a hint of what might have been frustration in its voice, "I'm the smartest thing for a hundred light years radius, and by a factor of about a million ... but even I can't predict where a snooker ball's going to end up after more than six collisions." [GCU Arbitrary in "The State of the Art"]

Vecr

6? That can't be right. I don't know how big a GCU is, so the scale could be up to 1 OOM off, but a full redirection of all simulation capacity should let it integrate out further than that.

myrmidon

For ball-to-ball collisions, 6 is already a highly conservative estimate-- this is basically a chaotic system (outcome after a few iterations, while deterministic, is extremely sensitive to exact starting conditions).

The error scales up exponentially with the number of (ball-to-ball) collisions.

So if the initial ball position is off by "half a pixel" (=> always non-zero) this gets amplified extremely quickly.

Your intuition about the problem is probably distorted by considering/having experienced (less sensitive) ball/wall collisions.

See: https://www.lesswrong.com/posts/JehyrC6W3YTtdxw6S/a-primer-o...

Fuzzwah

The target precision wasn't specified.

mmazing

> Would some hypothetical future AI just "know" that tomorrow it's going to be 79 with 7 mph winds, without understanding exactly how that knowledge was arrived at?

I think a consciousness with access to a stream of information tends to drown out the noise to see signal, so in those terms, being able to "experience" real-time climate data and "instinctively know" what variable is headed in what direction by filtering out the noise would come naturally.

So, personally, I think the answer is yes. :)

To elaborate a little more - when you think of a typical LLM the answer is definitely no. But, if an AGI is likely comprised of something akin to "many component LLMs", then one part might very well likely have no idea how the information it is receiving was actually determined.

Our brains have MANY substructures in between neuron -> "I", and I think we're going to start seeing/studying a lot of similarities with how our brains are structured at a higher level and where we get real value out of multiple LLM systems working in concert.

namaria

That's assuming we are actually tracking all the relevant indicators.

interludead

I think that's actually a pretty good way to frame how these deep learning-based forecasting models might evolve

jedberg

It's a shame they just cut funding for launching the weather balloons and other equipment needed to collect this data.

https://apnews.com/article/weather-forecasts-worsen-doge-tru...

HumblyTossed

Let's name "they". It was Trump via DOGE.

kgwgk

Paper available here: https://www.nature.com/articles/s41586-025-08897-0

abdullahkhalids

Arxiv version here https://arxiv.org/pdf/2404.00411

benob

According to arxiv paper, code will be made available here: https://github.com/annavaughan/aardvark-weather-public (from 8 months ago)

sunshinesnacks

The Nature preprint references this zenodo archive with data and code in a big 13GB .tar file: https://doi.org/10.5281/zenodo.13158382

I haven’t downloaded it to see what’s in it.

foofoo55

Why are the Arxiv and Nature versions so different, even the text?

kkylin

As another comment mentioned, papers get revised during review, usually in response to reviewer comments. Also, some journals (not sure about Nature) do not allow authors to "backport" revisions made in response to reviewer comments to preprints; I guess they view the review process as part of their "value add".

abdullahkhalids

Its quite common to revise papers. For example, they might have uploaded to arxiv in order to submit to a conference. Later, they revised and submitted to Nature.

jamala1

Arxiv is mostly meant for preprints for peer review.

In a Nature paper in particular, the final layout is typically done by the journal's professional production team, not the authors.

Not all publishers grant permission for authors to upload the peer-reviewed and layouted postprints elsewhere.

bazzargh

When I saw this I thought... "The Turing Institute? Does that still exist?"

https://en.wikipedia.org/wiki/Turing_Institute

There was a previous Turing Institute in Glasgow doing AI research (meaning, back then rules-based systems, but IIRC my professor was doing some work with them on neural networks), which hit the end of the road in 1994. There was some interesting stuff spun out of there, but it's a whole different institute.

twak

The Turing had an interesting approach to naming, not only stealing the Glasgow group's name, but also choosing the initials 'ATI' (in 2015...).

It's recently struggling for relevance.

https://www.ft.com/content/6bfea441-e16c-499a-a887-69f735c29... (https://archive.ph/ujfhb)

I hope they turn it around because the UK need for AI academic coordination/leadership is so high.

herodoturtle

For a second I thought this was related to that old school FogCreek internship programme with the same name.

Fun documentary at the time. Something about 12 weeks with geeks, and jumping out of windows comes to mind.

politelemon

> replaces all of the steps

Unable to tell if this is an exaggeration or if I'm just missing nuance, how would the model replace the data gathering which is listed as step 1.

sunshinesnacks

I think it’s replacing data assimilation. Ingesting observation data from lots of sources. Not replacing observations themselves.

westoque

not to hijack this thread but my dad did extensive research in sea breeze and rainfall modeling and he would have loved to see these AI and machine learning advancements in weather prediction.

[0]: https://www.revistascca.unam.mx/atm/index.php/atm/article/do...

[1]: https://wmo.int/about-wmo/awards/international-meteorologica...

Onavo

How does it compare to Google's https://www.nature.com/articles/s41586-024-08252-9

scellus

Google's is initialized with a gridded dataset, ERA5, from ECMWF. Using ERA5 is the current standard here, and ECMWF themselves build on that mostly now. Meanwhile, Aardvark tries to do the same directly from observations.

Onavo

What do you mean directly from observations?

counters

GDM's GraphCast/GenCast require an "analysis state" of the atmosphere. This is a 3D, gridded dataset with key variables like temperature, humidity, and winds. Generally speaking, an analysis is produced by a physics-based weather model through "data assimilation", an optimization process which tries to create a 3D state that is consistent with observations over some window of time.

"Observations" is overloaded; it's really any "raw" weather data product, like satellite imagery, surface station measurements, or weather balloon traces. I'm handwaving away a lot of complexity here.

The AardvarkWeather model is a significant new development and paradigm shift - it's one of the first models of a new class which do not require an analysis, and can directly use the "raw" weather data observations that are typically used to perform data assimilation.

noiv

Model data here: https://zenodo.org/records/13158382?token=eyJhbGciOiJIUzUxMi...

~14GB, needs login, works with some federated accounts.

timthorn

Richard Turner (one of the authors) gave a talk on how AI weather prediction works to the Cambridge Philosophical Society around 18 months ago. A recording is available here: https://www.eng.cam.ac.uk/news/quiet-ai-revolution-weather-f...