Reverse geocoding is hard
98 comments
April 27, 2025 · Dachande663
jandrewrogers
Even accounting for tectonic drift, there is a concept of positioning reproducibility that is separate from precision. In general the precision of the measurements is much higher than the reproducibility of the same measurements. That is, you may be able to measure a fixed point on the Earth using an instrument with 1cm precision at a specific point in time but if you measure that same point every hour for a year with the same instrument, the disagreement across measurements will often be >10cm (sometimes much greater), which is much larger than e.g. tectonic drift effects.
For this reason, many people use the reproducibility rather than instrument precision as the noise floor. It doesn’t matter how precise an instrument you use if the “fixed point” you are measuring doesn’t sit still relative to any spatial reference system you care to use.
Robotbeat
The whole accuracy vs precision thing.
jandrewrogers
Related but slightly different. The accuracy is real but it is only valid at a point in time. Consequently, you can have both high precision and high accuracy that nonetheless give different measurements depending on when the measurements were made.
In most scientific and engineering domains, a high-precision, high-accuracy measurement is assumed to be reproducible.
AlotOfReading
GPS coordinates actually account for the motion of the Earth's tectonic plates. The problem is that it's a highly approximate model that doesn't reflect areas like Australia very well.
There's a great visualizer of the coordinate velocity from the Earthscope team:
https://www.unavco.org/software/visualization/GPS-Velocity-V...
jandrewrogers
GPS coordinates do not account for tectonic motion. It is a synthetic spheroidal model that is not fixed to any point on Earth. The meridians are derived from the average motion of many objects, some of which are not on the planetary surface.
The motion of tectonic plates can be calculated relative to this spatial reference system but they are not part of the spatial reference system and would kind of defeat the purpose if they were.
AlotOfReading
The corrections are incorporated into the datum. WGS84 is updated every 6 months to follow ITRF by changing the tracking station locations as the plates move around.
janzer
I'm pretty positive that is showing the reverse, i.e. how much a given "location" is moving using gps coordinates. Not adjusting the gps coordinates to refer to a constant "location".
xucheng
Can this be solved by storing a timestamp of the record along with precise GPS coordinates? Could we then utilize some database to compute the drift from then and now?
jandrewrogers
Yes, in fact it should essentially be mandatory because the spatial reference system for GPS is not fixed to a point on Earth. This has become a major issue for old geospatial data sets in the US where no one remembered to record when the coordinates were collected.
To correct for these cases you need to be able to separately attribute drift vectors due to the spatial reference system, plate tectonics, and other geophysical phenomena. Without a timestamp that allows you to precisely subtract out the spatial reference system drift vector, the magnitude of the uncertainty is quite large.
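The epoch-based correction this describes can be sketched as simple velocity propagation. This is a minimal illustration, assuming a constant plate-velocity vector; the velocity values and coordinates below are illustrative, not taken from any real plate-motion model or datum transformation.

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # approximate metres per degree of latitude

def propagate(lat, lon, obs_epoch, target_epoch, vel_north_m_yr, vel_east_m_yr):
    """Shift (lat, lon) from the epoch it was observed at to a target epoch,
    using a constant velocity vector in metres/year (north, east)."""
    years = target_epoch - obs_epoch
    dlat = (vel_north_m_yr * years) / METERS_PER_DEG_LAT
    # a degree of longitude shrinks with cos(latitude)
    dlon = (vel_east_m_yr * years) / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

# Example: a point observed in 1994 propagated to 2020 with an assumed
# ~7 cm/yr northward motion accumulates roughly 1.8 m of shift.
lat2, lon2 = propagate(-33.87, 151.21, 1994.0, 2020.0, 0.07, 0.0)
```

Without the observation epoch stored alongside the coordinates, the `obs_epoch` input to this kind of correction is simply unknown, which is the problem described above.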
omcnoe
You don’t need to store a timestamp, but the local coordinate reference system that the coordinates are in. When revisions like this are made, it’s by updating the specification of a specific local coordinate reference.
WGS84 is global, but for most precise local work more specific national coords are used instead.
haneefmubarak
I mean, certainly - if you store both GPS time and the derived coordinates from the same sampling, then you can always reinterpret them later as needed, whether relative to legal or geographical boundaries or however else you might want to interpret them in the future.
cameldrv
I think Australia has its own datum for this reason that can float against WGS84
RainyDayTmrw
This is one of many reasons why property surveying records use so many seemingly obscure or redundant points of reference. In case anyone wonders why modern property surveying isn't only recording lots of GPS coordinates.
pavel_lishin
Damn! 7cm per year feels blazing fast when you consider the fact that it's a whole continent.
niccl
A way to think about it I've seen a few times: continental drift is roughly the same order of magnitude as the rate your fingernails grow!
anotherevan
We're coming for you!
XorNot
I mean I'm still mind blown that the Three Gorges dam in China literally changed the rotational speed of the Earth, and thus the length of the day.
sleepy_keita
Japan publishes new CRSes after large earthquakes to account for drift. The M9 earthquake in 2011 recorded a maximum shift of 5 meters!
akst
My knowledge of geospatial data sets is fairly shallow, but I've worked a bit with Australian map data, and I'm assuming you're referring to the different CRSs, GDA2020 and GDA94?
I'd imagine older coordinates would work with the earlier CRS?
But I can understand that not all coordinates specify their CRS. This hasn't really been an issue for me personally, but I've mostly worked with NSW spatial data and the Australian Bureau of Statistics geodata.
andrew_eu
I have a memorable reverse geocoding story.
I was working with a team that was wrapping up a period of many different projects (including a reverse geocoding service) and adopting one major system to design and maintain. The handover was set to be after the new year holidays and the receiving teams had their own exciting rewrites planned. I was on call the last week of the year and got an alert that sales were halted in Taiwan due to some country code issue and our system seemed at fault. The customer facing application used an address to determine all sorts of personalization stuff: what products they're shown, regulatory links, etc. Our system was essentially a wrapper around Google Maps' reverse geocoding API, building in some business logic on top of the results.
That morning, at 3am, the API stopped serving the country code for queries of Kinmen County. It would keep the rest of the address the same, but just omit the country code, totally botching assumptions downstream. Google Maps seemingly realized all of a sudden what strait the island was in, and silently removed what some people dispute.
Everyone else on the team was on holiday and I couldn't feasibly get a review for any major mitigations (e.g. switching to OSM or some other provider). So I drew a simple polygon around the island, wrote a small function to check if the given coordinates were in the polygon, and shipped the hotfix. Happily, the whole reverse geocoding system was scrapped with a replacement by February.
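A hotfix like the one described can be sketched as a hard-coded polygon plus a standard ray-casting point-in-polygon test, patching the country code when the upstream geocoder omits it. The polygon below is a made-up rough box, not real Kinmen geometry, and `fix_country_code` is a hypothetical wrapper, not the actual production code.

```python
KINMEN_POLYGON = [  # (lon, lat) vertices, illustrative only
    (118.15, 24.35), (118.55, 24.35), (118.55, 24.55), (118.15, 24.55),
]

def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many edges a rightward ray from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            # longitude at which this edge crosses the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def fix_country_code(result, lon, lat):
    """Re-attach the missing country code for points inside the polygon."""
    if result.get("country_code") is None and point_in_polygon(lon, lat, KINMEN_POLYGON):
        result["country_code"] = "TW"
    return result
```

An odd number of crossings means the point is inside; for a small, convex island outline this is plenty robust for a one-person holiday hotfix.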
marc_abonce
I faced the same issue with locations inside Crimea and Kashmir. The Google Places API wouldn't return a country code for those regions. At the time I couldn't find any documentation from Google specifying which inhabited locations return a null country code, I assume they want to avoid any potential controversy. Unfortunately this lack of documentation makes it harder to work around this issue.
modeless
Wow, I had no idea that Taiwan controlled an island less than three miles from mainland China, essentially surrounded by China in a bay. (The main island is 80+ miles away.) I'm really surprised China has allowed that for 80 years. Unsurprisingly, the beach looks like this: https://www.google.com/maps/place/Shuang+Kou+Zhan+Dou+Cun/@2...
Also interesting that there's a Japanese island only 60 miles from Taiwan on the other side. I guess claims to small Pacific islands have been weird for a long time.
nradov
If the Chinese Communist Party decides to escalate the pressure on Taiwan then one likely scenario is some sort of blockade against those small islands close to the mainland.
jandrewrogers
Most people don’t have an intuitive sense of just how technically difficult mapping from real geospatial coordinates to feature spaces is. This is a great example of a relatively simple case. You are essentially doing inference on a sparse data model with complex local non-linearities throughout. If you add in dynamic relationships, like things that move in space, it becomes another order of magnitude worse. We frequently don’t have enough data to make a reliable inference even in theory and you need a way of reliably determining that.
This problem has been the subject of intense interest by the defense research community for decades. It has been conjectured to be an AI-complete type problem for at least ten years, i.e. solving it is equivalent to solving AGI. The current crop of LLM type AI persistently fails at this class of problems, which is one of the arguments for why LLM tech can’t lead to true AGI.
TimTheTinker
Just putting this out there. This is one area where Esri's software really shines. They have so many software offerings and so much is said about different things you can do with ArcGIS (and competing systems), but the capability of their projection engine and geocoding systems - the code that lies at its heart - is unmatched, by far, at least as of 5 years ago when I left for a different company.
I had long conversations with Esri's projection engine lead. Really remarkable guy - he's got graduate degrees in geography and math (including a PhD) and he's an excellent C/C++ developer. That kind of expertise trifecta is rare. I'd walk by his office and sometimes see him working out an integral of a massive equation on his whiteboard (not that he didn't also use a CAS). "Oh yeah, I'm adding support for a new projection this week."
jandrewrogers
Many people don’t appreciate the extent that building robust geospatial systems requires seriously hardcore mathematics and physics skills. All of the mapping companies have really smart PhDs wrangling with these problems. I’ve always enjoyed talking with them about the subtleties of the challenges. There are so many nuances that never occurred to me until they mentioned them.
sinuhe69
Not my area of expertise, but isn't this a kind of perfectionism problem? I mean, most places have a clear and simple address. For the rest, either a human can solve it, or we can make a few examples and let an AI do the work. We can go back and revise them later if we need to. Addresses don't change often, so I think things can stay the same for a long time.
Except for emergency dispatch and a few high-profile use cases, a good-enough address that lets the user find the neighbourhood will do. They still have the GPS coordinates or some other form of location encoding, so they can find the exact spot easily. I'd say 99.9% of cases are like that. The rest can be solved quickly by looking at the map!
ryandrake
You can call it perfectionism or you can call it "doing it right." I think this gets at a fundamental difference in philosophy among [software] engineers: We have a problem with a lot of edge cases, where a "good enough" solution can be done quickly. What do we do? There's a class of engineers who say 1. Do the "good enough" solution and ignore/error on the edge cases--we'll fix them later somehow (may or may not have an actual plan to do this). And there's a class of engineers who say 2. We cannot solve this problem correctly yet and need more research and better data.
Unfortunately (in my view), group #1 is making all the products and is responsible for the majority of applications of technology that get deployed. Obviously this is the case because they will take on projects that group #2 cannot, and have no compunction against shipping them. And we can see the results with our eyes. Terrible software that constantly underestimates the number and frequency of these "edge cases" and defects. Terrible software that still requires the user to do legwork in many cases because the developers made an incorrect assumption or had bad input data.
AI is making this problem even worse, because now we don't even know what the systems can and cannot do. LLMs nondeterministically fail in ways that sometimes can't even be directly corrected with code, and all engineering can do is stochastically fix defects by "training with better models."
I don't know how we get out of this: Every company is understandably biased towards "doing now" rather than "waiting" to research more and make a better product, and the doers outcompete the researchers.
sbarre
> Unfortunately (in my view), group #1 is making all the products and is responsible for the majority of applications of technology that get deployed.
This is an interesting take, and I think I see where you're coming from..
My first thought on "why" is that so many products today are free to the user, meaning the money is made elsewhere, and so the experience presented to the user can be a lot more imperfect or non-exhaustive than it would otherwise have to be if someone was paying to use that experience.
So edge cases can be ignored because really you're looking for a critical mass of eyeballs to sell to advertisers or to harvest usage data from, etc.. If a small portion of your users has a bad time or experiences errors, well, you get what you pay for as they say..
And does that kind of pervasiveness now mean that many engineers think this is just the way to go no matter what?
mootothemax
> most places have a clear and simple address
That depends on your definition of "clear and simple" and "address" :) While a lot boils down to use case - are you trying to navigate somewhere, or link a string to an address? - even figuring out what counts as an address can be hard work. Is an address the entrance to a building? Or a building that accepts postal deliveries? Is the "shell" of a building that contains a bunch of flats/apartments but doesn't itself have a postal delivery point or bills registered directly to it an address? How about the address a location was known by 1 year ago? 2 years ago? 10 years ago?
Parks and other public spaces can be fun; they may have many local names that are completely different to the "official" name - and it's a big "if" whether an official name exists at all. Heck, most _roads_ have a bunch of official names that are anything but the names people refer to them by. I have a screaming obsession with the road directly in front of Buckingham Palace that, despite what you see on Google Maps, is registered as "unnamed road" in all of the official sources.
> Addresses don't change often
At the individual level, perhaps. In aggregate? Addresses change all the time, sometimes unrecognisably so. City and town boundaries are forever expanding and contracting, and the borders between countries are hardly static either (and if you're ever near the Netherlands / Belgium border, make a quick trip to Baarle-Hertog and enjoy the full madness). Thanks to intercontinental relative movement, the coordinates we log against locations have a limited shelf life too. All of the things I used to think were certain...
If someone hasn't yet done "falsehoods programmers believe about addresses," I think its time might be now!
Edit: answering myself with https://www.mjt.me.uk/posts/falsehoods-programmers-believe-a...
jandrewrogers
The update rate for a global map data model, all of which are still woefully incomplete in many contexts, is surprisingly high. The territory underlying the map is a lot less static than people assume. Also, local reality is often much less “regular” than people assume such that a person really can’t figure it out reliably. Currently there are literally thousands of people tasked with incorporating these changes because it has proven to be resistant to automation thus far due to the pervasiveness of edge cases. For your basic global map data model, these are the edge cases that are left after several thousand heuristic and empirically derived rules have been applied.
It is a deeply complex data model that changes millions of times a day in unpredictable ways. Unfortunately, many applications are very sensitive to the local accuracy of the model, which is much higher variance than average accuracy. Only trying to be “good enough” in an 80/20 rule sense is the same as “broken”. The updates are also noisy and often contain errors, so the process has to be resilient to those errors.
The resistance of the problem to automation and the high rate of change have made it extremely expensive to asymptotically converge on a model with consistently acceptable accuracy for the vast majority of applications.
edent
I am deeply guilty of being a perfectionist!
Ultimately, I just want something which is a nice balance between being useful for a human and not so long that it is overwhelming.
curiousObject
You’re the author?
The final step in the process “Wait for complaints” seems like a smart acceptance of the “perfect is the enemy of good” challenge
Publish and be damned, or as we say now: Move fast and break things
smitty1e
I was going to take this tack.
80% of the problem is just transforming floating point coordinates into API calls.
Getting to something useful with it is the hard 20%, and it will be a diminishing returns problem after that.
While I'm no LLM proponent, that last mile might be a good AI application.
vintermann
Genealogy applications run into this a lot. The person of interest lived at Engeset. FamilySearch has geocoded a place called "Engeset, Møre og Romsdal, Norway". So that's it, right? Not so fast, [there are at least 3 Engesets in Møre og Romsdal](https://www.google.com/maps/search/Engeset/@62.3358577,6.225...).
But that's at least better than when it's some local place name which it's never heard of, and thinks sounds most similar to a place in Afghanistan (this happens all the time).
And to add to it, there are administrative regions, and ecclesiastical regions. Do you put them in the parish, or in the municipality? The birth in the parish and the baptism in the municipality, maybe? How about the burial then...
modeless
Converting from a name/address to coordinates is geocoding. Reverse geocoding is mapping from coordinates to a name/address.
AlotOfReading
I haven't found a better way to do this than the Google Maps solution [0]:
You write a query specifying all the different kinds of addresses you'd like to display. The result is a list of valid candidate addresses for the point, each matching at least one requested format, which you can rank based on whatever criteria you like.
[0] https://developers.google.com/maps/documentation/geocoding/r...
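As a rough illustration of the request this involves, a reverse-geocoding URL for the linked API can be assembled with `latlng` and a `result_type` filter. This is only a sketch: no network call is made here, and the parameter names should be verified against the current documentation before relying on them.

```python
from urllib.parse import urlencode

ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def reverse_geocode_url(lat, lon, api_key, result_types=("street_address", "locality")):
    """Build a reverse-geocoding request URL; candidates must match at
    least one of the requested result types."""
    params = {
        "latlng": f"{lat},{lon}",
        "result_type": "|".join(result_types),
        "key": api_key,
    }
    return ENDPOINT + "?" + urlencode(params)
```

The returned JSON's `results` array is the ranked candidate list the comment describes; you can re-rank it with your own criteria.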
mvdtnz
It sounds like the author is more interested in getting city or town names from a coordinate. Google maps is massively overkill and horrendously expensive for this use case. I mentioned in another comment I do this in a game I wrote and can complete queries in microseconds.
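For the "city or town from a coordinate" use case, a lookup can indeed be as simple as nearest-centroid search over an in-memory table. A toy sketch follows; the centroids are hand-picked approximations, and a real dataset (thousands of entries, likely behind a spatial index) would replace the tiny list.

```python
import math

CITIES = [  # (name, lat, lon), approximate centroids for illustration
    ("Sydney", -33.87, 151.21),
    ("Melbourne", -37.81, 144.96),
    ("Auckland", -36.85, 174.76),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_city(lat, lon):
    """Return the name of the city with the closest centroid."""
    return min(CITIES, key=lambda c: haversine_km(lat, lon, c[1], c[2]))[0]
```

With the centroid table sorted into a k-d tree or grid, each query is a handful of distance computations, which is how microsecond latencies become plausible.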
punnerud
I created this to solve my own need for reverse geocoding: https://github.com/punnerud/rgcosm (Saving me thousands of $ compared to Google API)
Uses OpenStreetmap file, Python and SQLite3.
First it finds all addresses within a +/- square around the lat/lon, then calculates the distance to each item in that smaller list (Pythagoras) and picks the closest. It expands the square up to a set maximum if no address is found in the first search.
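That expanding-square lookup can be sketched roughly as follows, assuming an in-memory SQLite table; the actual rgcosm schema and parameters may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addr (name TEXT, lat REAL, lon REAL)")
conn.execute("CREATE INDEX idx_addr ON addr (lat, lon)")  # B-tree index
conn.executemany(
    "INSERT INTO addr VALUES (?, ?, ?)",
    [("A", 59.91, 10.75), ("B", 59.95, 10.79), ("C", 60.40, 5.32)],
)

def reverse_geocode(lat, lon, start=0.001, max_delta=0.5):
    """Query an expanding lat/lon bounding box until candidates appear,
    then pick the closest by planar distance."""
    delta = start
    while delta <= max_delta:
        rows = conn.execute(
            "SELECT name, lat, lon FROM addr"
            " WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
            (lat - delta, lat + delta, lon - delta, lon + delta),
        ).fetchall()
        if rows:
            # Pythagoras on the shortlist; fine at this scale, though the
            # longitude axis is compressed by cos(lat) in reality
            return min(rows, key=lambda r: (r[1] - lat) ** 2 + (r[2] - lon) ** 2)[0]
        delta *= 2  # widen the square and retry
    return None
```

The index makes the bounding-box query cheap, and the cap on `delta` keeps a point in the middle of nowhere from scanning the whole table.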
davidmurdoch
Just curious if you looked into using S2 cells for this? It's what Pokemon Go uses for its coordinate system. http://s2geometry.io/devguide/s2cell_hierarchy.html
punnerud
Isn’t the main purpose of S2 to be able to scan from different “directions”? That seems more suited to Google Maps, which treats the world as a spherical object, compared to SQLite3 just using a simple B-tree index on lat+lon?
davidmurdoch
The individual cells having well-defined sizes, and neighboring cells being easy to compute, seem useful for the described algorithm. I haven't given its applicability here much thought, but it sounded somewhat similar to a search pattern I once implemented within pgsql to locate items on a map that were within proximity of a given latlong.
andrewaylett
It's a lot more expensive, but measuring navigation distance rather than straight line distance would avoid the "river" issue. Although depending on the routing engine and dataset it might well introduce more issues where points can be really close on foot but the only known route is a driving route.
edent
If you know of an API which does navigation distance to POI, I'd love to hear about it!
mootothemax
You can self-host or run locally Valhalla (https://github.com/valhalla/valhalla), reading in data from OSM as a starting point.
(For my purposes, I went with running it locally, generating walking-distance isochrones across pretty much the entire UK)
nerdralph
I've used OSRM and Arcgis for addresses in Canada. I think one or both of them have POI support in their APIs. https://route.arcgis.com/arcgis/
rahimnathwani
Google has Routes API: https://developers.google.com/maps/documentation/routes
petre
Check out GraphHopper. But if your POIs are from OSM, OSRM might be okay as well.
kylecazar
Good article. FWIW, some major cities offer seating data. New York, for example, returns bench locations as a Point (coordinates). They even have a column in the data for the nearest address of the "seating feature".
https://data.cityofnewyork.us/Transportation/Seating-Locatio...
nerdralph
Part of the problem is the different ways addresses are expressed throughout the world. I was born and grew up in Canada, and was confused when I started dealing with companies in China. Instead of street addresses, many are given by province, city, district, sub-district, and a building number.
Another problem is choosing which authority for the "correct" address. I've seen many cases where the official postal address city/town name is different than the 911 database. For example Canada Post will say some street addresses are in Dartmouth, while the official civic address is really Cole Harbour. https://www.canadapost-postescanada.ca/ac/ https://nsgi.novascotia.ca/civic-address-finder/
Even streets can have multiple official names/aliases. People who live on "East Bay Hwy", also live on "Highway 4", which is an alias.
johnlk
It’s almost more of a UX challenge than anything. The feedback widget idea at the end could offer a crowd-sourced solution, the same way Twitch solved translation via crowdsourcing.
Fun fact that was dredged up because the author mentions Australia: GPS points change. Their example coordinates give 6 decimal places, accurate to about 10-15cm. A few years back Australia shifted all locations by 1.8m because of continental drift (they're moving north at ~7cm/year). So even storing coordinates as a source of truth can be hazardous. We had to move several thousand points for a client when this happened.