You thought OpenStreetMap data uses the WGS84 datum? No it doesn't (openstreetmap.org)
196 points by Reventlov on July 17, 2019 | hide | past | favorite | 39 comments


I get why OSM does this, and it’s not necessarily a bad thing. The most important thing is being transparent about what format the data is in and always including that information in the metadata. The article touches on this point in this blurb:

We should inform data consumers, so that they can decide to convert the coordinates to WGS84 if they want to.

As someone who works in the GIS space: if we get data without this crucial information, it’s sometimes very difficult to recover the original datum to perform transformations. It usually amounts to some form of educated guesswork based on the warp from WGS84 and the origin of the data: “what datum/projection would I have used if I were the data creator?”


The trouble is, the documentation is just wrong: all the official documentation says that OSM is in WGS-84. When I realised that this couldn't actually work due to plate tectonics and tried to figure out what they were actually doing, all I found was an April Fools' joke from a couple of years ago: https://blog.openstreetmap.org/2017/03/31/osm-plate-tectonic...


Interpreting OSM data as if the datum mattered would be a mistake anyway. The data does not have consistent or reliable accuracy and precision.

Edit: Someone has downvoted this. It's an important point to remember when using OSM. Many GIS datasets are developed so that the accuracy and precision of the data are consistent across the entire dataset. OSM doesn't bother with this, because it's really hard to do, especially globally. That doesn't make OSM less useful for making consumer maps (there's no problem if data is off by 2 or 4 meters or whatever), but it can be a problem if, say, you were trying to make a legal determination about property ownership. OSM isn't a useful source for that task and would have to change fundamentally to become one.


To back this up, when I was making maps of the rural area where I live in Japan, there were a few places where aerial maps were obscured by trees and where GPS measurements were wildly inaccurate due to geographical features. So I just guessed where the road was. It's in there somewhere and if you are driving your car or riding your bike, it's more than good enough. If you needed to know the precise location of that road for another purpose? Totally useless. Maybe someone has gone in and improved my initial efforts, maybe not (although, I was looking at it the other day and I'm pretty sure they have).


I've contributed to OSM, fixing trail maps based on traces I recorded with my Garmin GPSMAP 64 which is configured to use WGS-84. If OSM wants something else they'd better let their contributors know how to deal with this.


It doesn't matter too much when using normal GPS receivers for mapping because the difference between datums is less than the position error. The main reason I was looking into it is because I was experimenting with a GPS setup that could theoretically achieve high enough accuracy for this to matter.


This is a problem all over data processing ... once metadata becomes separate from the data over time it can become hard or impossible to correct later.

I always push for detailed metadata, preferably in the same location/artifact. I'm regularly surprised by people who don't understand why it's important and just think "it's too much hassle".


One thing that might be nice to do is to include an entirely WGS84-transformed version of the data for people looking to perform worldwide geospatial analysis. If the datums/projections are available for the regional datasets, it’s a relatively trivial program to write as part of the ETL processing needed to make the data available (PROJ implementations such as proj4j have everything you need, provided the input metadata is available). It might need a decent amount of compute to perform all that transformation on a regular basis, but if it’s something like a weekly dump, it doesn’t seem unrealistic at all. I think an optimized Spark job using proj4j on worldwide OSM data could probably perform this task in a few hours.


In GIS we have a system designed specifically for storing data which we tell people to use. But when it comes to metadata people use separate systems which need their own interface. And usually the metadata standard is far more complicated than the spatial data format! This seems crazy to me and just demonstrates how limited a lot of our technology is.


We also suffer from there being hundreds of formats for geodata, and with that, hundreds of ways metadata is stored. And they don't always line up perfectly to allow seamless conversions.


I suspect this is common in lots of areas. At best you have a dedicated data catalog that is used consistently but still has a different interface. At worst you get a dog's breakfast.


For years we suffered at the hands of Oracle so developers refused to record time zones "for performance". This is no longer a technical concern but the cultural damage has been done.


It really is a nightmare. It's like time zones, only when the datum is wrong, it's not necessarily immediately obvious what the correct one should be.

People, even the ones who ought to know better, don't even seem to think it's necessary to specify their datum. It's bizarrely common to get a giant spreadsheet with coordinates in it and no datum. I've never seen data with its vertical datum specified.

Don't get me started on the situation in China. https://en.wikipedia.org/wiki/Restrictions_on_geographic_dat...


Absolutely this. Or I'll get a CAD export in "Wyoming State Plane" but no indication that the designer decided to work in inches, not (US survey) feet. I spend more time than I ever would have guessed playing datum detective.


The problem is that the errors are too small for most users to care about. Few care if a date-time value from 2015 is a few hours out or if a location in Australia from 2008 is a metre out. So hardly anyone cares about finding a solution.

The problem would be taken a lot more seriously if continental drift was 10m per year instead of a few cm.
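To put rough numbers on that comparison, here is a minimal sketch of how drift accumulates at different rates. The rates used are illustrative ballpark figures (Australia's ~7 cm/yr is a commonly cited approximation), not authoritative geodetic values.

```python
# Sketch: how far a plate-fixed point drifts away from its original
# WGS84 coordinates over time, at a steady plate velocity.
# Rates below are rough illustrative figures, not official values.

def drift_m(rate_cm_per_yr: float, years: float) -> float:
    """Total displacement in metres after `years` of steady drift."""
    return rate_cm_per_yr / 100.0 * years

# Australia moves roughly 7 cm/yr; over ~25 years that is already
# well over a metre, i.e. more than the width of a footpath.
assert round(drift_m(7, 25), 2) == 1.75

# At the hypothetical 10 m/yr from the comment above, a single
# year of drift would dwarf consumer-GPS error.
assert drift_m(1000, 1) == 10.0
```

At a few cm/yr the error stays inside consumer-GPS noise for years, which is exactly why so few people care.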


This is a nice little illustration of how hard it is, and how far you can end up going, when you try to solve what seem like simple problems properly. I use OSM because i just want to know where my nearest postbox is; under the hood it's modelling plate tectonics!


The post is saying that it isn't modeling plate tectonics.


It's modelling their movements, but not their distortion


The imagery sources all use "locked" datums that ignore the movement, so the OSM data derived from the imagery is also ignoring the movement.


Well, that’s not what the article says. Some of their image sources are using plate-locked frames of reference, and some are using global frames of reference. It’s actually inconsistent.


For those without a satnav lock yet: Datum is a defined 3D geometry model used to translate a physical coordinate given by satnav into a latitude, longitude and altitude.

Different datum -> different lat/long/alt.

Disclaimer: Trimble alum


You're the perfect person to ask about this: I have a Trimble I used to survey some points (using RTX) that shows a ~1 meter offset from data I've collected using a Piksi PPK'ed against a CORS station. I've checked the datum I export in on the Trimble (WGS84) and the datum of the CORS base station, and despite performing a conversion I can't get them to line up. Any ideas?


Are they using the same realization of the datum? https://vdatum.noaa.gov/docs/datums.html


In practice, GPS receivers are typically accurate to 3-4 meters at best under good conditions (open sky, no obstacles), so even in Australia the difference between WGS-84 and a datum locked to the plate isn't enough to matter for navigation. Specialized equipment can use GPS to produce much more accurate measurements, to the point where these things matter, particularly in Australia, which as far as I know is moving fastest relative to WGS-84.
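As a quick sanity check on that claim, here is a sketch that computes the ground distance between two nearby lat/lon pairs, standing in for the same physical point expressed in WGS84 versus a plate-fixed datum. The ~1.8 m offset baked into the example coordinates is a made-up illustrative figure, not an official datum shift.

```python
import math

# Sketch: metre-scale great-circle distance between two nearby
# points, to compare a datum offset against typical GPS error.

def haversine_m(lat1, lon1, lat2, lon2, r=6371000.0):
    """Approximate distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

gps_error_m = 3.5  # typical consumer receiver under open sky

# The same Sydney point, nudged ~1.8 m north-east (hypothetical
# WGS84-vs-plate-fixed offset):
d = haversine_m(-33.8688, 151.2093, -33.86881146, 151.20931386)
assert d < gps_error_m  # the datum offset hides inside GPS noise
```

So for navigation the offset disappears into receiver error; it only surfaces with RTK/PPK-grade equipment.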


"13 years ago ... drifted ~1m"

That's incredible


There's a couple of examples at https://teara.govt.nz/en/photograph/31154/greendale-fault of what might euphemistically be called an "internal distortion" near the edge of a tectonic plate.


Parts of Japan moved ~4 meters during a recent earthquake.

https://slate.com/news-and-politics/2011/03/japanese-earthqu...


A 2016 earthquake in New Zealand had a slip of 10m along one fault: https://watchers.news/2016/11/16/kaikoura-earthquake-new-zea...


> "parts of" japan.

I think this alone means the question in the blog post is answered.

OSM should pick a datum time, used for persistent storage, and then store a time-varying deformation model to adjust the way things are displayed. That deformation model needs to allow all kinds of warping etc.
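That idea can be sketched in a few lines: store coordinates at a fixed datum epoch plus a velocity, and propagate to the display epoch on the fly. Everything here is a simplified illustration; a real deformation model would be a gridded, time-varying field, not one velocity vector per point, and the velocity values are invented.

```python
import math
from dataclasses import dataclass

# Sketch: epoch-based storage with per-point plate velocities.
# A real model (e.g. a national deformation grid) is far richer.

@dataclass
class StoredPoint:
    lat: float          # position at the datum epoch (degrees)
    lon: float
    epoch: float        # decimal year the coordinates refer to
    v_north_m: float    # plate velocity, metres per year
    v_east_m: float

M_PER_DEG = 111320.0    # rough metres per degree of latitude

def at_epoch(p: StoredPoint, year: float) -> tuple[float, float]:
    """Propagate a stored point to another epoch using its velocity."""
    dt = year - p.epoch
    dlat = p.v_north_m * dt / M_PER_DEG
    dlon = p.v_east_m * dt / (M_PER_DEG * math.cos(math.radians(p.lat)))
    return p.lat + dlat, p.lon + dlon

# A point on the Australian plate, stored at epoch 1994.0
# (velocities are illustrative, not surveyed values):
pt = StoredPoint(lat=-33.87, lon=151.21, epoch=1994.0,
                 v_north_m=0.055, v_east_m=0.039)
lat2020, lon2020 = at_epoch(pt, 2020.0)
assert lat2020 > pt.lat and lon2020 > pt.lon  # drifted north-east
```

The persistent data never changes; only the display-time transform does, which is essentially how epoch-aware datums like GDA2020/ITRF handle this.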


California moves in dozens of places each year, in small amounts... you can rely on it.


Ridgecrest quakes shifted some area over a meter.


Great summary!

Any idea how the different plate motion models behave near plate boundaries? There are a few plate boundaries in areas that are interesting for human users. The East African Rift Valley is famous, but there are others as well.


The models you'd use for map making are often really simple. For example, the Australian plate motion model is a "seven parameter transformation" - three translations, three rotations and a scale factor. The deformation errors which can't be modeled with this are roughly centimeter level over decades; see https://www.researchgate.net/publication/258401581_ITRF_to_G...
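For concreteness, a seven-parameter (Helmert) transformation on ECEF coordinates looks like the sketch below: three translations, three small-angle rotations, and a scale factor. The sign convention differs between the "position vector" and "coordinate frame" formulations, and the parameter values here are placeholders, not real ITRF-to-GDA parameters.

```python
# Sketch of a seven-parameter Helmert transformation on ECEF
# coordinates: translations tx/ty/tz in metres, small rotations
# rx/ry/rz in radians, scale s_ppm in parts per million.
# Parameter values in any real use come from geodetic authorities.

def helmert7(x, y, z, tx, ty, tz, rx, ry, rz, s_ppm):
    """Apply a small-angle Helmert transform to one ECEF point."""
    s = 1.0 + s_ppm * 1e-6
    # Small-angle rotation matrix applied row by row
    x2 = s * (x - rz * y + ry * z) + tx
    y2 = s * (rz * x + y - rx * z) + ty
    z2 = s * (-ry * x + rx * y + z) + tz
    return x2, y2, z2

# Identity parameters leave the point unchanged:
p = helmert7(-4646000.0, 2553000.0, -3534000.0, 0, 0, 0, 0, 0, 0, 0)
assert p == (-4646000.0, 2553000.0, -3534000.0)
```

Fourteen-parameter variants add a rate of change for each parameter, which is how the time dependence mentioned above gets folded in.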

For anyone using these kind of transformations, the boundary between the preferred models for two countries may well be discontinuous!

Ultimately the correct way to deal with this is to use a global time dependent deformation model. Some countries with large internal deformation have already gone to the effort of producing local versions. For example New Zealand: https://www.linz.govt.nz/data/geodetic-system/datums-project....

I guess there will eventually be a high quality global deformation map available but I wasn't aware of any such thing in general use as of two years ago.


Most importantly, an object could, due to plate movements, become a malformed shape.

For example, a polygon that becomes self-intersecting because some of its points are on different plates.

That's bad for linters and database constraints...
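A minimal sketch of that failure mode: a plain O(n²) segment-intersection test, applied to a polygon before and after differential drift has pushed two of its vertices past each other. The coordinates are invented; real drift would of course be far smaller per year.

```python
# Sketch: detect a polygon made self-intersecting by vertices
# moving at different rates (e.g. sitting on different plates).

def segs_cross(a, b, c, d):
    """True if open segments ab and cd properly intersect."""
    def ccw(p, q, r):
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (ccw(a, b, c) * ccw(a, b, d) < 0 and
            ccw(c, d, a) * ccw(c, d, b) < 0)

def self_intersects(poly):
    """Check every pair of non-adjacent edges for a crossing."""
    n = len(poly)
    edges = [(poly[i], poly[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # these edges share the closing vertex
            if segs_cross(*edges[i], *edges[j]):
                return True
    return False

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
assert not self_intersects(square)

# Drift swaps the two top vertices: the ring becomes a bow-tie.
warped = [(0, 0), (1, 0), (0, 1), (1, 1)]
assert self_intersects(warped)
```

A validity constraint like PostGIS's `ST_IsValid` would reject the warped ring even though every vertex is individually "correct" in its own frame.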


Just to see if I understood: so one of these coordinate systems is fixed in place relative to the sky above the earth somehow, with the idea that as the tectonic plates move around below, coordinates may become inaccurate or need to be updated?

But then other coordinate systems are based on something on the ground? So it is what, the based on the vector between some reference object on the (moving) plate and a local point on it? But then there are also smaller distortions of the plate that can make the plate-local coordinates inaccurate?

And I'm guessing normal GPS coordinates are from the non-moving sky coordinates?


Reality has a pesky way of interfering with our simple, elegant models...

As for the timestamp, this is probably the easiest solution no? OSM keeps track of all modifications.


The timestamp will tell you that I added a new item in 2019, but it won't tell you whether I put in proper absolute 2019 coordinates, or drew it based on an aerophoto overlay that was made with 2016 coordinates, or added that item in its proper relative position with respect to surrounding items that were placed in 2013 and never adjusted.


The timestamp of most of the source imagery isn't very available. And then people don't always correctly state the source of a location.

Anybody who is paying attention already knows that OpenStreetMap isn't all that precise, which is all the post is really saying.


I've been dabbling with GIS data for many years without being even remotely aware of this issue. I now realize how much of a sandcastle GIS data is. So, thanks for posting this because I learned something.

There is a lot of data out there that is effectively user generated. What this means is that someone with a phone goes to some place and then posts some content (photo, tweet, restaurant check-in, etc.) with some kind of coordinate. All this stuff ends up in databases. This stuff was never very accurate. If you collect POI data from different sources about the same POI, the coordinates are going to vary by tens of meters. Worse if users manually enter the coordinates, because people lie. Also, real-world places are not points but have geometry, sometimes spanning tens to hundreds of meters. E.g. the coordinate of the Louvre in Paris is meaningless because the place is huge. And the whole of OpenStreetMap is basically many different users contributing what they think should be the coordinates of things.

Proximity is a good signal when de-duplicating POIs but not reliable by itself. E.g. in many Japanese cities, some bars are on the n-th floor of some skyscraper and there may be dozens of restaurants and bars in the same building. Also, they close and re-open as something new regularly, so there's a lot of stale geo data out there: reviews of restaurants that no longer exist or that just opened and get tagged to the wrong POI, buildings that got demolished, or new buildings and streets that were constructed recently in what was formerly a field.

So, GIS data is messy and the coordinate system is helpful for getting close enough that you can figure it out once you are near enough to see the signs but generally not very precise.

But then, there are more elaborate things like indoor maps that are geo-referenced, or 3D building models that are shown on a map. These things are usually modeled separately and then manually or automatically aligned with a map. Most datasets specifying coordinates have no notion of anything other than a pair of doubles that are generally referred to as the WGS 84 coordinates. This includes OpenStreetMap, which mostly consists of data generated by users with no clue about this issue, using inaccurate GPS signals and either some local knowledge of where stuff sits relative to other stuff in OSM or some satellite imagery.

So, the implication of this article is that most of that data is only valid in the context of whatever it was geo-referenced to. Which, in most cases is simply unknown since it is typically not recorded in datasets or even known by whoever records the data.

This explains a lot of things about some stuff I've been seeing in 3D scenery for flight simulators like x-plane where if you have e.g. some satellite imagery and 3D objects for real world buildings, airports, etc. it is quite common for them to be slightly misaligned since the data comes from lots of different sources, all lacking meta information about what their coordinate system actually is supposed to be, that are combined automatically.



