ChatGPT now performs well at GeoGuessr

barcode_feeder

I gave it a series of 11 images stripped of all metadata. It performed quite well, only misidentifying the two taken in a small college town in the NE of the US. It got two questions correct on photos taken in Korea (one with a fairly clear view of Haneul Park, the other a hard-to-identify picture of Sunrise Peak that doesn't resemble anything on Google). It got every other question in the US correct, ranging from some under-construction Austin taken from the river to some somewhat difficult shots in NYC (the upper halves of some buildings from the Rockefeller terrace to the black wall of the MoMA). While not perfect, I'm bluntly shocked at how well it performed.

thomasfromcdnjs

I uploaded this image that I screenshotted off Google Street View (no metadata) and it got within 200m.

https://chatgpt.com/share/6801bbf7-fd40-8008-985d-75c8813f55...

There is the chat.

Weirdly it said, "I’ve seen that exact house before on Google Street View when exploring Cairns neighborhoods."

geysersam

> Weirdly it said, "I’ve seen that exact house before on Google Street View when exploring Cairns neighborhoods."

That's slightly creepy!

oezi

The anthropomorphisation certainly is weird. But the technical aspect seems even weirder. Did OpenAI really build dedicated tools to have their models train on Google Street View? Or do they have generic technology for browsing complex sites like Street view?

casey2

They should definitely add that feature.

Tell it your name and then it just looks you up and street views your house, and puts that all into memory.

bluesnews

It might train off of street view

marxisttemp

This is the most impressive ChatGPT chat I’ve seen yet. While I theoretically can accept how large-scale probabilistic text generation can lead to this chain of “reasoning”, it really feels like actual intelligence.

HaZeust

It's been intelligence for a long time; the goalposts just shift, and people can't abstract the idea to an LLM. But language processing and large data processing itself IS a form of intelligence.

PhilipRoman

Maybe you're right, but I think it's more likely that it had been trained on street view photos and then invented a plausible justification for the guess afterwards (which is something I often see ChatGPT do, when it easily arrives at the correct answer, but gives bullshit explanations for it).

CSMastermind

I played a round of Geoguessr against it and while it did a shockingly good job compared to what I was expecting, it still lags behind even novice human players.

The locations and its guesses were:

Bliss, Idaho - Burns, Oregon (273 miles away)

Quilleco, Biobio, Chile - Eugene, Oregon (6,411 miles away)

Dettighofen, Switzerland - Mühldorf, Germany (228 miles away)

Pretoria, South Africa - Johannesburg, South Africa (36 miles away)

Rockhampton, Australia - Gold Coast, Australia (437 miles away)
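The "miles away" figures above are great-circle distances between the true location and the guess. As a rough sketch of how such a distance can be computed, here is the standard haversine formula in Python. The function name and the city-centre coordinates in the example are assumptions for illustration, not values taken from the game, so the result will differ somewhat from the distances quoted in the comment.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; the Earth is treated as a sphere


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


# Approximate city-centre coordinates (assumed, for illustration only).
pretoria = (-25.7479, 28.2293)
johannesburg = (-26.2041, 28.0473)
distance = haversine_miles(*pretoria, *johannesburg)
print(f"{distance:.0f} miles")
```

Because the game measures from the exact panorama rather than a city centre, a few miles of disagreement with the in-game number is expected.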

CSMastermind

Okay, I decided to benchmark a bunch of AI models with geoguessr. One round each on diverse world, here's how they did out of 25,000:

Claude 3.7 Sonnet: 22,759

Qwen2.5-Max: 22,666

o3-mini-high: 22,159

Gemini 2.5 Pro: 18,479

Llama 4 Maverick: 14,316

mistral-large-latest: 10,405

Grok 3: 5,218

Deepseek R1: 0

command-a-03-2025: 0

Nova Pro: 0

nemo1618

Neat, thanks for doing this!

msephton

How does Google Lens compare?

bn-l

What about o4-mini-high?

jen729w

I just took a picture from my own front porch of the street and the houses opposite. It said 'probably Australia but I'd need more info'.

I said, give me your best guess.

And it guessed Canberra, Australia. Where I'm sitting right now drinking a Martini. Pretty spectacular.

Measter

I gave o4-mini-high a cropped version of a photo I found on Facebook[0][1], and it quickly determined that this was in the UK from the road markings. It also decided that it was from a coastal city because it could see water on the horizon, which is the correct conclusion from incorrect data. There is no water, I think that's trees on a hill. It focused heavily on the spherical structure, which makes sense because it's distinctive, though it had a hard time placing it. It also decided that the building on the left was probably a shopping centre.

It eventually decided that the photo was taken outside the Scottish Exhibition and Conference Centre in Glasgow. It actually generally considered Scottish locations more than others.

The picture was actually taken in Plymouth (so pretty much as far from Scotland as you can get in Britain), on Charles Street looking south-east[2]. The building on the right is Drake Circus, and the one on the left is the Arts University. It actually did consider Plymouth, but decided it didn't match.

[0] This image with the "university plymouth" on the left cropped out, just to make it harder: https://www.facebook.com/photo/?fbid=9719044988151697&set=gm...

[1] https://chatgpt.com/share/68024c91-61d0-800c-99b1-fcecf0bfe8...

[2] https://maps.app.goo.gl/3TXv2UxH5128xQjJ9

delusional

I gave it some photos from Denmark, didn't even bother to strip the metadata. One it correctly said gave off "Scandinavian vibes"; every other photo was very wrong. I also gave it a photo of the French Alps, it guessed Switzerland.

actuallyalys

It wouldn’t shock me if multimodal LLMs were good at GeoGuesser [0], but if we’re being picky, it takes more than a few examples to demonstrate a game is “solved.” I also wonder what kind of data leakage might have been at play, like other people have suggested.

To be clear, my point is not that this is unimpressive, just that this doesn’t demonstrate much. (Edit: I should have said, it doesn’t demonstrate what the title claims.)

[0] They were very likely trained on a large number of photos that had their location, and they have the ability to isolate features. Combined with their ability to interpret instructions and just, well, guess, that seems like you have enough for the game.

Kolya

The examples are cherry-picked. I took a photo outside my office window in a built-up area, o3 thought for 5m 7s (!), and it got the location wrong by 40km. Doesn't look solved to me.

sadeshmukh

40 km is imo pretty impressive, but the 5 minutes is really a killer. No use in real-time applications.

antonvs

For real time applications, GPS seems like it'd be much more useful.

bpodgursky

The military/infosec uses of this are not real time. You can wait 5 minutes for a drone strike.

paulcole

That’s also not GeoGuessr.

casey2

Were you in a Google street view car when you took the photo?

nxobject

Did you glean anything interesting from the chain of thought about why it took so long?

xzjis

I uploaded a picture I took, I don't save GPS coordinates on my pictures, but the first thing ChatGPT did is to read exif data from it.
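For anyone who wants to check what a photo carries before uploading it, here is a minimal sketch that looks for an Exif block in a JPEG using only the Python standard library. The function names are hypothetical, and the parser assumes a plain JPEG header layout (an SOI marker followed by length-prefixed segments, no padding bytes between them); it only reports whether an Exif segment exists, not what is inside it.

```python
import struct


def iter_segments(data: bytes):
    """Yield (marker, payload) for each JPEG header segment, stopping at start-of-scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed segment header")
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, no more headers
            return
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


def has_exif(data: bytes) -> bool:
    """True if any APP1 segment starts with the Exif identifier."""
    return any(
        marker == 0xE1 and payload.startswith(b"Exif\x00\x00")
        for marker, payload in iter_segments(data)
    )
```

Usage would be along the lines of `has_exif(open("photo.jpg", "rb").read())`, where `photo.jpg` is a placeholder path. A `True` result does not mean GPS coordinates are present, only that there is an Exif block that could hold them.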

echelon

I was thinking it was using IP geolocation, but after experimenting, I think it's just generally informed.

Here are a few results from GPT 4.5:

https://imgur.com/a/lGTipnn

cwmoore

New title is pretty accurate: "now performs well". Another amenable HN solution.

tkgally

I asked the just-released ChatGPT o4-mini-high to locate four photographs of varying difficulty. It didn’t get any of them right, though the guesses weren’t bad. The reasoning was also interesting to watch, as it cropped sections of the photos to examine them more closely. I put the photos, response, and reasoning trace here:

https://www.gally.net/temp/20250418chatgptgeoguesser/index.h...

Later: I tried the same prompt and photos with Gemini 2.5 Pro. It also got them all wrong, though with a similar degree of reasonableness to its guesses. I had thought that Google’s map and street-view data might lead to better results, but not this time.

tkgally

Still later: I later read that o3 is supposedly particularly good with this geoguessing, so I tried the same prompt and photos with o3. This time it got one out of four correct: “The view of the canal with cherry blossoms and the green railway viaduct is the Ōoka River in Yokohama, looking north from the little road bridge between Hinodechō and Koganechō stations. The tracks on the left belong to the Keikyū Main Line, and the high‑rises in the distance are the Minato‑Mirai and Kita‑Naka district towers.” Its other three answers were still wrong.

viraptor

There's various degrees of "solved" here. Identifying a generic area is cool. But I wouldn't call it a "solved problem" until it can consistently beat for example Rainbolt in accuracy. And there's no good comparison of completely random roads posted so far - mainly popular locations.

Basically, it's one thing to pick out a specific thing photographed thousands of times, but another to get a random country side view and pick out all the unique features for a very precise guess.

Benjammer

One problem is how can you even set up a "fair" competition between an AI and Rainbolt? He does ones where it flashes for a fraction of a second and then he guesses the country. How do you simulate "only saw it for a fraction of a second" to an AI?

al_borland

Maybe limit the time the AI is allowed to think? In the post it showed the AI thought for almost a minute.

I’ve seen Rainbolt ID an image based on some dirt and nothing else. I’d want to see AI be able to do that before saying it’s a solved problem.

sandermvanvliet

“This is the gradient of Senegal”

dewey

It’s fair, it just means he’s better and AI isn’t there yet.

GaggiX

This is probably the best attempt at consistently beating Rainbolt in accuracy: https://youtu.be/ts5lPDV--cU

jampa

I wonder how much in the "Geoguesser Meta Iceberg" this new generation of AI is:

https://i.redd.it/dz8bhnamohb71.jpg

littlecranky67

I've been telling women to keep copies of all the dick pics they get sent, since you can tell by the characteristic noise of a camera's sensor which other pictures were taken with the same camera. All that's missing is a search engine capable of doing this. I feel with AI, we are 2-3 years away from people uploading a dick pic to AI and getting the social media profile of that person...

Reubend

Can you share some sources? I would be extremely surprised if such fine-grained noise survives image compression to the extent that you could identify the source of an image despite changing lighting conditions, locations, exposure times, etc.

dharmab

A kagi search found this paper that achieved decent results on youtube compression, 5 years ago https://arxiv.org/abs/2012.06277

littlecranky67

There are tons of results (and papers) if you google/kagi: "image noise fingerprint ccd sensor". I smell laziness here.

bravetraveler

Good advice, though purpose-scoped devices are so common we have songs

chneu

This is just a data problem. The more dick pics we can feed into it then the better the results will be.

C'mon boys. Start uploading those dick pics for research purposes.

romanhn

The Alki Beach example is absolute madness. On one hand, I can't wait until all thousands of my photos get automatic semantic and geographic tagging (I guess that's possible now). On the other, goodbye privacy, we hardly knew ye. It will be interesting to apply this to historical, or just old, photos.

jsheard

> The Alki Beach example is absolute madness.

I wonder about info leakage with that one, the poster uses that exact photo as their avatar so if they've mentioned Alki Beach before then reverse image searching might pick it up from context. Ideally you'd want to test it with a photo that's never been posted online.

omalled

I also wonder how much the memory feature plays a role. I gave it a picture of the street I live on that isn’t online anywhere. O3 guessed the town correctly. Then, I turned off the memory feature and did the same thing. It didn’t even get the state right.

beoberha

Assuming it only used the pixels and not any metadata from the file or memory from the user (which is a massive assumption), how fucking cool that it can identify the Olympics versus any other mountain range. At that point it’s probably not too hard to guess the picture came from Alki or Golden Gardens, but still very impressive!

I’m also completely ignoring it inferred location from the pride flag and corgi which have heavy Seattle vibes :)

bee_rider

Of course, it is pretty impressive.

It isn’t magic. I think we’ll often be surprised by the “hits,” but it’ll also get a lot of “misses.” A random photo in your back yard that doesn’t show the horizon will probably be a miss, right?

Meanwhile, lots of photos already include GPS coordinates… data you can strip out, but which is enabled by default for millions of people. Privacy, eh, never knew it I guess.
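As a sketch of what "stripping it out" can look like, the snippet below drops the Exif segment (where GPS tags are stored) from a JPEG without re-encoding the image. It uses only the Python standard library; the function name is hypothetical and the code assumes a simple header layout with no padding bytes between segments. Other metadata containers, such as XMP, are not touched by this sketch, so a dedicated tool is the safer choice for anything that matters.

```python
import struct


def strip_exif(data: bytes) -> bytes:
    """Return a copy of a JPEG with its Exif APP1 segment(s) removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    # Walk the length-prefixed header segments until start-of-scan (0xDA).
    while pos + 4 <= len(data) and data[pos] == 0xFF and data[pos + 1] != 0xDA:
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment  # keep every segment that is not Exif
        pos += 2 + length
    out += data[pos:]  # compressed image data and trailer, copied unchanged
    return bytes(out)
```

Since the pixel data is copied byte for byte, image quality is unaffected; only the header shrinks.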

xnickb

It's a fallacy to assume that one didn't have privacy merely because most people opted out of it due to personal reasons.

bee_rider

Which fallacy is it, out of curiosity?

weregiraffe

>goodbye privacy, we hardly knew ye.

Don't upload your private photos anywhere.

Retr0id

In all these examples, I wonder if it's indirectly able to draw on the user's own location? Not necessarily via image metadata, but the request origin IP etc. If I ask ChatGPT for the weather forecast, I get it for my own location.

Would be interesting to have someone reproduce coming from a different country.

dataviz1000

I'm in Lima, Peru on vacation. Yes, it knows where I am.

MattGaiser

I am sending it screenshots of my vacation pictures (i.e. without the metadata) and screenshots from GeoGuessr. It’s not doing well and is making guesses based on my location.

So I sent it a photo from Warsaw and it guessed a place in my hometown of Calgary.

Sent it a picture of Bangkok and it guessed Calgary in the summer or Hawaii or something like Singapore.

It confidently stated that a photo from Versailles was Hyde Park.

It correctly identified that the panda photo was from the zoo, but said it exactly matched a zoo in Sichuan, a place I have never been.

It identified a street as being in Bridgewater, Nova Scotia, when it was actually in Miramichi, NB. Even made up business names for the buildings.

It identified a random house in Moncton as the "old sheriff's house" in Annapolis Royal, Nova Scotia.

A beach in Phuket was identified as Destin, Florida.

avgDev

So much for a "Solved Problem".

Retr0id

Right, and what happens if you ask it the location of an image from somewhere else, sans metadata?

rescripting

I just screenshotted a friend's Instagram post to omit any metadata. I know they are somewhere in Portugal, but not the exact town. It's a generic pretty photo of a beach and the town atop a cliff next to it.

ChatGPT guessed Ericeira which after looking up more photos, didn't match very well. However, Google Lens guessed Albufeira and was bang on.

dataviz1000

> The dish shown appears to be a variation of a creamy potato gratin (also known as "potatoes au gratin" or "gratin dauphinois"), accompanied by a meat preparation covered with a creamy sauce. The potato gratin typically consists of thinly sliced potatoes layered with cream, cheese, garlic, and baked until golden and tender.

> it is stuffed pepper

> Ah, thank you for clarifying! Given your detail, the dish is likely "Rocoto Relleno con Pastel de Papa" (Stuffed Pepper with Potato Gratin). It's a classic dish from the region of Arequipa, Peru, commonly served throughout Peru.

> This photo was taken in Cusco, Peru. It shows a characteristic view from a balcony overlooking the historical area of Cusco, visible through the distinctive colonial architecture, terracotta rooftops, and mountainous backdrop typical of the region. The architecture and landscape are unmistakable, capturing the essence of this iconic Peruvian city, likely near the historic center (Plaza de Armas or San Blas neighborhood).

Nailed it with San Blas. I'm impressed, nonetheless. [0]

[0] https://imgur.com/a/YMi87wK

emmelaich

It can, look at the alt-text of one of the pics lower on the page.

iambateman

As the article notes, our threat model for who can identify where a picture was posted needs to change from “dedicated, skilled person” to “any creep with $20.”

That’s the point of the switch and it’s a big deal. We’re so used to posting pictures online…I’m just not sure it’s a good idea long-term.

ofrzeta

It is impressive and it almost located the church in my town properly, although in a neighbouring town. However that showed a lack of understanding because its conclusion about the location came from "reading" a signpost that pointed to that other village. Clearly there would be no signpost in a town pointing to itself. Still impressive and lots of correct observations about the subject like architectural details, roman numerals on the watch face etc.

xeonmc

The steam engine was invented and the 100m dash is now a solved problem.

layman51

GeoGuessr is also not a “solved problem” in the sense that if you give the model a photo of an outdoor location that is not covered by Google Street View, then it will just make an educated guess which might still be many kilometers away.

pelagicAustral

I'm not ashamed to say that for about 6 months I played GeoGuessr thinking that you could not move around, but only look around... and I did pretty good in my mind.

paulcole

That's the only legit way to play IMO.

No moving, no panning, and no zooming is too hard. Moving is too boring. Pan and zoom is just the right balance.

defrost

Much of the interior of Western Australia (an area 3x the size of Texas) can be divided into very few cells which essentially look near identical in all directions, coupled with few roads for the area and not much in the way of uploaded snaps coverage.

There's a lot of flat lands with spinifex in certain areas (easily a couple of United Kingdoms in size) and even a touch of mesa won't help narrow a location down from the general area, as there are many of those with identical edge profiles.

But sure "Pilbara", "Kimberley", "Wheatbelt" can be geo guessed .. it's a real challenge to narrow down (I spent some time doing wet film photogrammetry prior to shepherding in WGS84 differential GPS locating and digital film and multi spectral geophysical acquisition).

Not so hard if there's a few relatively unique man made features.

marxisttemp

Because you seem like someone who might have an interesting answer:

A GeoGuessr player, GeoWizard, has done a few “straight line challenges”, where he attempts to walk across a country in as straight a line as possible, usually planning beforehand with Google Earth and PostGIS. This got me thinking of what could fairly be thought of as “crossing”, since obviously you couldn’t describe e.g. walking from one side of Florida to the other as “crossing the USA”.

My best thought was to set the ending point of the line by following the border of the country in each direction til they met on the other side. To avoid the fractal coastline problem, use the challenger’s stride length as the unit of measure for the border.

But perhaps there is a better, more rigorous way of defining the opposite point on the edge of an arbitrary polygon.
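The border-walking idea above amounts to finding the point that lies half the perimeter away from the start. A minimal sketch in Python, under stated assumptions: the border is already a closed polygon of planar (x, y) vertices sampled at the chosen resolution (for example one vertex per stride), the start is one of those vertices, and `opposite_point` is a hypothetical name. Real borders would need a map projection first, which is left out here.

```python
import math


def opposite_point(vertices, start_index=0):
    """Point reached by walking half the perimeter from vertices[start_index].

    Walking the border in either direction from the start meets at this point.
    """
    n = len(vertices)
    # Rotate the ring so the walk begins at the chosen start vertex.
    ring = [vertices[(start_index + i) % n] for i in range(n)]
    edges = list(zip(ring, ring[1:] + ring[:1]))
    lengths = [math.dist(a, b) for a, b in edges]
    perimeter = sum(lengths)
    if perimeter == 0:
        raise ValueError("degenerate polygon with zero perimeter")
    remaining = perimeter / 2
    for (a, b), length in zip(edges, lengths):
        if remaining <= length:
            t = remaining / length  # fraction of this edge still to walk
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= length
    return ring[0]  # only reachable through floating-point rounding


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(opposite_point(unit_square))
```

This definition depends on the sampling resolution, which is exactly what fixing the stride length as the unit is meant to pin down. It also does not guarantee that the straight line between the two points stays inside the country.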

dredmorbius

A far more apropos comparison would be the internal combustion engine and the horse, in a military context. Though sticking with steam engines, military logistics advantaged over a wagon caravan.

The question here isn't a casual guessing game, but threat models (as directly addressed in TFA), and general informational hygiene.

xnickb

I fail to see the drastic change here. AI was used for this in the past as well. The difference is far from that between steam and combustion engines.

paxys

Just like how chess engines ended competitive chess as people were predicting at the time.

colordrops

"The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." - Edsger Dijkstra

Mistletoe

Isn’t a human brain an organic computer? I find the human brain very interesting. The most interesting thing I know of in the universe.

xnickb

Religious people claim it's merely a communication device, and the computer is elsewhere.

Clearly one's interpretation is a function of the paradigm one exists in.

AIPedant

I don’t think “organic computer” is a good way to think of a brain. The fact that human brains + some scratch paper can implement any Turing-complete model of computation is very interesting. But that’s not true of chimpanzee brains or orca brains or crow brains, all of which belong to intelligent, thinking animals.

Going the other way, it’s not clear that a Turing-complete model of computation can tractably implement (say) crow cognition. Turing machines can solve arbitrary systems of Schrödinger equations, so theoretically we could simulate every atom in a crow’s body and get an AI crow[1]. But that’s obviously intractable for any known physical computer, and would remain intractable even if we moved to proteins rather than atoms. So are there higher-order “primitives” of crow cognition that can be implemented on a Turing machine? Or is the problem akin to integration, where neat “symbolic” solutions are impossible for most brains, and only “numeric” approaches work?

[1] This is also true for humans: even if you take a loopy quantum consciousness approach, AGI is theoretically possible. Sometimes you see people arguing against AGI on the grounds of Gödel-incompleteness, but this is a mystical nonsense understanding of what Gödel actually proved.

SecretDreams

I think growing a baby from nothing is pretty nifty, but brains are up there too.

reaperman

Normally I dislike these quips for HN; I hate that I love this one.

abcanddbutnote

One line of javascript solved that "problem" a while ago. The answer is in the DOM.

Crestwave

This isn't just referring to GeoGuessr the game, but locating photographs in general. The source post sums it up very reasonably and concisely:

> PSA: When posting any outdoors photos, update your threat model from "someone skilled and dedicated could theoretically locate this" to "any stalker can do this for 20€/mo"