MongoDB, OpenStreetMap and a Little Demo
Jun 20, 2011
I was interested in finding worldwide points of interest, and I quickly found the OpenStreetMap database. The complete database, updated daily by generous contributors, is available as a 16GB compressed XML file (around 250GB uncompressed). Thankfully, you can find mirrors that have partitioned the data in some meaningful way (like by major cities).
For our needs, the data is made up of a few important elements. The first is a node, which has a longitude, latitude and an id. A node has zero or more tag child-elements, which are key-value pairs of metadata. There's also a way element which references multiple node elements. You see, in my naive mind a point of interest like a building would be represented by a single node. However, from a mapping point of view, it's really a polygon made up of multiple nodes. A way can also have zero or more tags.
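To make that structure concrete, here's a minimal Python sketch that reads nodes, ways and their tags out of a small hand-written sample. The sample ids, coordinates and tag values are made up, and the helper names (read_tags, parse_osm) are mine, not something from the importer.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written sample in the OSM XML layout (ids and tags are made up).
SAMPLE = """<osm version="0.6">
  <node id="1" lat="45.4210" lon="-75.6990"/>
  <node id="2" lat="45.4210" lon="-75.6980"/>
  <node id="3" lat="45.4220" lon="-75.6980"/>
  <node id="4" lat="45.4215" lon="-75.6900">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="building" v="yes"/>
  </way>
</osm>"""


def read_tags(element):
    """Collect the <tag k="..." v="..."/> children into a dict."""
    return {tag.get("k"): tag.get("v") for tag in element.findall("tag")}


def parse_osm(xml_text):
    """Return (nodes, ways), each keyed by the element's id attribute."""
    root = ET.fromstring(xml_text)
    nodes = {}
    for node in root.findall("node"):
        nodes[node.get("id")] = {
            "lat": float(node.get("lat")),
            "lon": float(node.get("lon")),
            "tags": read_tags(node),
        }
    ways = {}
    for way in root.findall("way"):
        ways[way.get("id")] = {
            # A way points at its nodes through <nd ref="..."/> children.
            "refs": [nd.get("ref") for nd in way.findall("nd")],
            "tags": read_tags(way),
        }
    return nodes, ways


nodes, ways = parse_osm(SAMPLE)
```

Loading the whole document with ET.fromstring is only reasonable for a small city extract. For anything close to the full dump you'd want a streaming approach such as ET.iterparse instead.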
Now, ever since I wrote the MongoDB Geospatial tutorial, I've had an itch to try more real-world stuff with MongoDB's geo capabilities. This database seemed like an ideal candidate. The first thing I did was download a bunch of city-dumps from a mirror and start writing a C# importer (GitHub). I wasn't actually interested in polygons, so I calculated the centroid of any way and converted it into a node. Most of the time the result was quite good. The importer's readme has more information.
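As a rough illustration of collapsing a way into a single point, here's a Python sketch that takes the plain average of the way's vertices. This is an assumption on my part about one simple way to do it, not necessarily what the C# importer does, and a vertex average is not the true area centroid of an irregular polygon. The function name way_centroid is hypothetical.

```python
def way_centroid(points):
    """Average the vertices of a way given as a list of (lon, lat) tuples.

    This is a simple vertex mean, not an area-weighted centroid.
    """
    if not points:
        raise ValueError("a way needs at least one node")
    # Closed ways repeat their first node at the end; drop the duplicate
    # so it isn't counted twice.
    if len(points) > 1 and points[0] == points[-1]:
        points = points[:-1]
    lon = sum(p[0] for p in points) / len(points)
    lat = sum(p[1] for p in points) / len(points)
    return lon, lat


# A closed 2x2 square, with coordinates chosen only to keep the math obvious.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]
centroid = way_centroid(square)
```

For a small, roughly convex shape like a building, the vertex average usually lands inside the footprint, which fits with results being good most of the time rather than always.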
Next, I wrote a little Sinatra app and did the obvious thing using the Google Maps API. You can also find the source for this on GitHub.
Different cities have different amounts of data. I left everything in, and you can see there's quite a bit of information. Given that MongoDB supports compound indexes, it'd be trivial to provide additional node filtering.
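Here's a hedged Python sketch of what that filtering could look like: a compound (composite) index that pairs a 2d geo field with a tag field, plus a query filter that uses both. The document shape and the field names loc and tags.amenity are assumptions for illustration, not the demo's actual schema. The code only builds the index spec and filter as plain data, so it runs without a database.

```python
# Hypothetical document shape:
#   {"loc": [lon, lat], "tags": {"amenity": "cafe", "name": "Example Cafe"}}

# Compound index spec: a 2d geo field first, then an ordinary ascending field.
GEO_INDEX = [("loc", "2d"), ("tags.amenity", 1)]


def nearby_filter(lon, lat, amenity=None):
    """Build a filter for points near (lon, lat), optionally of one amenity."""
    query = {"loc": {"$near": [lon, lat]}}
    if amenity is not None:
        query["tags.amenity"] = amenity
    return query


cafes_near_me = nearby_filter(-75.69, 45.42, "cafe")

# With pymongo and a running server, usage would look roughly like:
#   collection.create_index(GEO_INDEX)
#   collection.find(cafes_near_me).limit(25)
```

Keeping the geo field first lets the same index serve both the plain "what's near me" query and the filtered one.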
And that's why, when people ask me "What did you do this weekend?", I can say "I parsed a 250GB XML file" (because, yes, I did download it and I did *try* to import it).