Category Archives: GIS

maps are just one way of looking at it

osm vs maps.me

(Hey! I’m not a lawyer. Read a few paragraphs of this, and it’ll be clear I shouldn’t even try to give any kind of opinion. The screenshots I include with maps.me, booking.com and Google data are presented merely for review and critique of the situation.)

An interesting discussion showed up on the OSM Legal mailing list a couple of weeks back: MAPS.ME combining OSM data and non-OSM data? OSM has some restrictions on how its data can be used. At a grossly simple level, if you mix your proprietary data in with OSM data and present a single layer map, you must either:

  1. provide the proprietary data you mixed in under the same licence as OSM, or;
  2. stop doing that.

The OSMF Licence/Community Guidelines/Horizontal Map Layers – Guideline has some example scenarios that I will quote here:

Examples of where you DO NOT need to share your non-OpenStreetMap data

  1. You use OpenStreetMap as a base topographical map and make your best reasonable efforts to exclude ALL restaurants. You then add a layer of your own restaurant data.

Examples of where you DO need to share your non-OpenStreetMap data

  1. You add restaurants in one area from non-OpenStreetMap data based on comparison with OpenStreetMap data in other layers.

So for this article, please bear in mind that where the above example says restaurants, I’m looking for an analogous case with hotels.

MAPS.ME (which they want you to pronounce as “maps with me”, even though it says “maps dot me”; I guess “maps  me” wouldn’t make such a nice URI) is an offline map app that shows a mix of POIs over a custom-styled OSM basemap. Its publisher mixes OSM POIs with proprietary ones, and publishes a list (local copy: skipped_nodes-20160731.zip) of OSM nodes that it has skipped from its POI list. To satisfy my curiosity, I wanted to see if the required separation of data was in place with maps.me.

First, I used overpass turbo to return all the hotel and motel nodes in and around Toronto (the query I used was “type:node and (tourism=hotel or tourism=motel) in bbox”, saved as TorontoHotelNodes.zip). This list contains 50 hotels and motels.

Next, I used maps.me’s skipped nodes list to find the OSM hotels and motels that maps.me had omitted from their app. This list should closely match the one from booking.com, where the app company said they got their hotel data. For completeness, I queried the OSM hotels extract to see what hotels maps.me had possibly not skipped.

Finally, I overlaid those OSM nodes on a georeferenced screendump from booking.com’s Google map. Here’s what it looks like (click to enlarge):

to_hotels-OSM_vs_maps_me

The map uses two marker styles: one for OSM nodes not skipped by maps.me (which is not necessarily the same as used), and another for OSM nodes that maps.me excluded. At first glance, there’s clearly not a 100% correlation between the booking.com hotels and the skipped OSM nodes, but a few clues emerge that hint strongly that maps.me are mixing in some OSM nodes with some proprietary nodes. It’s therefore likely that maps.me has not abided by the OSMF community guideline.

Scrolling around and zooming in on the booking.com hotel map shows that they don’t show you all of the listed properties. There’s always some filter based on location, price or availability. This makes sense if you’re managing a valuable proprietary data set, as there will always be someone trying to scrape your data and take your business with slightly cheaper referrals. One example of a hotel available on booking.com is The Old Mill (booking.com link: Old Mill Toronto) even though it’s not shown with a booking.com pin on my screenshot above. The Old Mill is also shown as not skipped on maps.me, so let’s take a look at it in the app:

2016-08-01_12.09.00

Ah; there’s the booking.com property, with the OSM node off to the east under a slightly different name, and the OSM restaurant node somewhere in between. (When you search maps.me for restaurants, in Toronto at least, there are no referral links: missed opportunity there, folks!)

Since OSM is community-generated data, sometimes our spelling isn’t so great. One hotel node that leaps out from the maps.me node set is the (now fixed) “Sharaton”:

2016-07-31_23.35.19

aka Sheraton Toronto Airport Hotel to booking.com, or Sheraton Toronto Airport Hotel & Conference Centre to its friends:

2016-07-31_23.36.11

I wonder if maps.me will exclude the OSM node for this Sheraton from now on? Now that its name has been edited, if they do skip it, it means that they’re comparing OSM against a proprietary list, so would be breaking the guidelines. On the other hand, if they don’t exclude it, they really need to get a better fuzzy match algorithm, and post haste.

We can’t know the algorithm that maps.me uses to choose hotels to include referral links and skip the OSM data. The details of that are their trade secret, and no open licence can compel them to disclose it. But the cynic in me has found a possible clue in the budget motels that line Kingston Road on the eastern Toronto lake shore. Both the Days Inn – Toronto East and the Park Motel are listed on booking.com. Only the slightly more expensive Days Inn gets a referral link on maps.me. Could it be that the value of the cheaper motel’s referral isn’t worth including? (Update: this is wrong; I found another reason, which I will write about later.)

In addition, I wonder if maps.me’s skipped nodes list could be considered a “derived work” of booking.com? As the choice of nodes to skip is informed by booking.com’s data, it does give us a small insight into their database.

Some notes on the wartime “Modified British” map coordinate systems

Growing up in a small country, I assumed the whole world used metric grid map coordinates. I mean, why would anyone bother with those tedious latitudes and longitudes when you could have your location defined by something as neat as NS539555?

The tidy, militaristic Ordnance Survey came up with the National Grid reference system, where large grid squares were given letters, and the rest of the reference was given numerically. Wikipedia gives a better graphical explanation than I could:

Illustration of the Ordnance Survey National Grid coordinate system, with Trafalgar Square as an example [CC BY-SA 3.0]

During WW2, the UK war office extended this system across most of Europe. Since most European countries didn’t use exactly the same Transverse Mercator projection as the UK did, a number of existing mapping systems were pressed into use, but using the same interleaved alphanumeric format as the OS grid reference.

The reference site for these systems is Thierry Arsicaud’s excellent Notes on the “Modified British System” used on the European Theatre of Operations during the WWII. Thierry, however, wasn’t trying to use these historical map references in a GIS, so his work needs a little massage to get to be used with QGIS.

In this example, I’m going to concentrate on the South Italy zone, as that’s where I was asked to look at some war diaries from 1943. The system is similar to the OS grid, but the main difference is that the major grid reference is often given in lower case. So RN in OSGB would most often be denoted rN in the Modified British system. Both would refer to a 100 km square at (700, 700) km from the origin. (The exceedingly nitpicky might point out that RN is never used in the UK as it’s somewhere in the Atlantic, west of Ireland. To them, I say: Well, bless your heart.)

The key to both the OSGB grid and the Modified British system is a 2500 × 2500 km square, split into 25 squares of 500 km side, each given a letter from a–z with i excluded:

mod_brit-major

Each letter encodes both an easting and a northing; so r is (500000, 500000). About the easiest way to unpack this encoding is through a simple string lookup:

result = SEARCH(letter, "VWXYZQRSTULMNOPFGHJKABCDE")-1
easting = result MOD 5
northing = INT(result / 5)

where SEARCH is a function which returns the position of a letter in a string (so SEARCH(“V”, “VWXYZ…”) returns 1).
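For anyone not working in a spreadsheet, here is a minimal Python sketch of the same lookup. It assumes nothing beyond the letter string above; the function name `letter_to_offsets` is my own invention for illustration. Python's `str.index` is 0-based, so the `-1` goes away.

```python
# Letter layout from the lookup string above: five letters per row,
# rows running from south to north.
LETTERS = "VWXYZQRSTULMNOPFGHJKABCDE"


def letter_to_offsets(letter):
    """Return (easting, northing) in whole grid squares for one grid letter."""
    index = LETTERS.index(letter.upper())  # 0-based, so no "-1" is needed
    return index % 5, index // 5


# Major letters count 500 km squares; minor letters count 100 km squares.
major_e, major_n = letter_to_offsets("r")
minor_e, minor_n = letter_to_offsets("N")

# Offsets in km of the major square r, then of the full rN square.
print(major_e * 500, major_n * 500)
print(major_e * 500 + minor_e * 100, major_n * 500 + minor_n * 100)
```

The two printed pairs should agree with the (500, 500) km and (700, 700) km figures quoted in the text for r and rN.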

When applied as per the GSGS South Italy system, you get something like this:

italy_south-major-r

These major grid squares are in turn split into 25 minor squares of 100 km side:

mod_brit-minor

Or overlaid on a map:

italy_south-minor-r

For grid references of higher precision, a series of numbers is appended. There should always be an even count of these numbers, for reasons which should become clear soon. Here is the rC square, split into 10 km references:

mod_brit-rC

These two digit references are about the shortest/least precise you might ever see. Overlaid on an appropriate sheet from the McMaster archive WWII Topographic Map Series (which are CC BY-NC; for which, many thanks), you get:

chieti-rC_grid

The actual projection details are given on each map:

south_italy

We can turn this into a PROJ.4 definition:

Projection - Lambert Conical Orthomorphic  →    +proj=lcc
Ellipsoid: Bessel 1841                     →    +ellps=bessel
False Easting : 700000                     →    +x_0=700000
False Northing : 600000                    →    +y_0=600000
Central Meridian : 14.0°                   →    +lon_0=14 
Central Parallel : 39.5°                   →    +lat_0=39.5 +lat_1=39.5
Scale Factor : 0.99906                     →    +k_0=0.99906
                       (other proj.4 terms)     +units=m +no_defs  

or in one line,

+proj=lcc +lat_0=39.5 +lat_1=39.5 +lon_0=14 +k_0=0.99906 +x_0=700000 +y_0=600000 +ellps=bessel +units=m +no_defs
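If you end up doing this for several GSGS zones, it may be less error-prone to keep the sheet parameters in a list and assemble the string from it. This is only a sketch: the helper name `proj_string` is made up, and the values are just the South Italy ones from the table above.

```python
# Parameters read off the map sheet, in the order used in the one-line form.
SOUTH_ITALY = [
    ("proj", "lcc"),
    ("lat_0", 39.5),
    ("lat_1", 39.5),
    ("lon_0", 14),
    ("k_0", 0.99906),
    ("x_0", 700000),
    ("y_0", 600000),
    ("ellps", "bessel"),
    ("units", "m"),
]


def proj_string(pairs, flags=("no_defs",)):
    """Join key/value pairs and bare flags into a PROJ.4-style definition."""
    parts = ["+%s=%s" % (key, value) for key, value in pairs]
    parts += ["+%s" % flag for flag in flags]
    return " ".join(parts)


print(proj_string(SOUTH_ITALY))
```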

In QGIS, you can plug those values into the Custom CRS manager, and you will be able to work in these antiquated coordinate systems with ease:

Screenshot from 2015-01-09 07:28:53

I haven’t yet quite managed to work out some of the other GSGS coordinate systems. My work on North Italy is a stubborn 100 km off true, for no well-defined reason. I haven’t managed to work out unpicking alphanumeric grid references into geometries automatically, either. These will come later.

Finally, some of the coordinates you might see might not meet these specifications. In the limited survey I’ve done, I’ve noted:

  1. references with the major grid missing, so rCxy… was written as Cxy….
  2. references to ‘MR’ (map reference), with no alphanumeric part, such as MR 322142 (from here), which would be more correctly given as rC322142.
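As a first step towards unpicking references automatically, here is a rough Python sketch that turns a full reference such as rC322142 into grid metres. It is an assumption-laden sketch, not a tested converter: it assumes the letter layout described earlier, assumes the digits split evenly into an easting half and a northing half (as in the OS National Grid), and returns the south-west corner of the referenced square. The function name `parse_grid_ref` is hypothetical. The result is in the grid's own coordinates; turning it into a geometry still needs the custom CRS.

```python
# Letter layout shared by the major (500 km) and minor (100 km) squares.
LETTERS = "VWXYZQRSTULMNOPFGHJKABCDE"


def parse_grid_ref(ref):
    """Convert e.g. 'rC322142' to (easting, northing) in grid metres.

    Assumes two letters followed by an even count of digits (at most ten).
    """
    ref = ref.replace(" ", "")
    major, minor, digits = ref[0], ref[1], ref[2:]
    if len(digits) % 2 != 0 or len(digits) > 10:
        raise ValueError("expected an even count of up to 10 digits: %r" % ref)
    if digits and not digits.isdigit():
        raise ValueError("non-numeric grid reference: %r" % ref)

    major_i = LETTERS.index(major.upper())
    minor_i = LETTERS.index(minor.upper())
    easting = (major_i % 5) * 500000 + (minor_i % 5) * 100000
    northing = (major_i // 5) * 500000 + (minor_i // 5) * 100000

    half = len(digits) // 2
    if half:
        unit = 100000 // 10 ** half  # 3 digits per axis -> 100 m squares
        easting += int(digits[:half]) * unit
        northing += int(digits[half:]) * unit
    return easting, northing


print(parse_grid_ref("rC322142"))
```

References with the major letter missing (the first case in the list above) would need the r put back before being passed in.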

Huge thanks to Thierry Arsicaud, both for the great reference website, and also for the e-mail correspondence helping explain the parameters for the GSGS Italy South system. Props too to the Geographic Information Systems Stack Exchange folks for help with working out the proj.4 settings.

QGIS on Raspberry Pi

Hey! This is really old! The current version of Raspbian has QGIS 2.4 included in the repository. Just install that. It won’t run very fast on single-core Raspberry Pis, though.

qgis-on-pi

  1. Install Raspbian.
  2. Update Raspbian from its Debian wheezy base to Debian jessie:
    • sudo vi /etc/apt/sources.list # or use your favourite editor
    • change all references of wheezy to jessie
    • sudo apt-get update
    • sudo apt-get upgrade # this will take a long time, with occasional user prompts
    • sudo apt-get dist-upgrade # this will take a very long time
  3. Install qgis: sudo apt-get install gdal-bin qgis

This will install QGIS 2.2. It’s a bit slow for general use, but it does work …

(modified from my gis.stackexchange answer: linux – GDAL and QGIS Raspberry Pi. Mainly just so I would have a place to put the image.)

Accurate distance buffers over very large distances

Today, I’m going to describe how I get fairly accurate buffer distances over a really large area.

But first, I’m going to send a huge look of disapproval to Norway. It’s not for getting all of the oil and finding a really mature way of dealing with it, and it’s not for the anti-personnel foods (rancid fish in a can, salt liquorice, sticky brown cheese …) either. It’s for this:

norway I disapprove of what you did there

The rest of the world is perfectly fine with having their countries split across Universal Transverse Mercator zones, but not Norway. “Och, my wee fjords…” they whined, and we gave them a whole special wiggle in their UTM zone. Had it not been for Norway’s preciousness, GIS folks the world over could’ve just worked out their UTM zone from a simple calculation, as every other zone is just 6° of longitude wide.
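For the record, the "simple calculation" and the wiggle look something like this in Python. This is a sketch rather than a replacement for real zone polygons: the exception boundaries are the standard ones for south-west Norway and Svalbard as I understand them, and the function name `utm_zone` is my own.

```python
import math


def utm_zone(lon, lat):
    """Return the UTM zone number for a WGS84 longitude/latitude in degrees."""
    # The simple part: 60 zones, each 6 degrees wide, starting at 180°W.
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1

    # Norway: zone 32 is widened westwards between 56°N and 64°N.
    if 56.0 <= lat < 64.0 and 3.0 <= lon < 12.0:
        return 32

    # Svalbard: zones 32, 34 and 36 are not used between 72°N and 84°N.
    if 72.0 <= lat < 84.0:
        if 0.0 <= lon < 9.0:
            return 31
        if 9.0 <= lon < 21.0:
            return 33
        if 21.0 <= lon < 33.0:
            return 35
        if 33.0 <= lon < 42.0:
            return 37

    return zone


print(utm_zone(-79.3869585, 43.6425361))  # CN Tower
print(utm_zone(5.32, 60.39))              # Bergen, inside the widened zone
```

Without the two special cases, Bergen would land one zone too far west.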

Canada has no such qualms. In a big country (dreams stay with you …), we have a lot of UTM zones:

canada's UTM zones; *EAT* it, Norway…

We’re in zones 7–22, which is great if you’re working in geographic coordinates. If you’re unlucky enough to have to apply distance buffers over a long distance, the Earth is inconveniently un-flat, and accuracy falls apart.

What we can do, though, is transform a geographic coordinate into a projected one, apply a buffer distance, then transform back to geographic again. UTM zones are quite good for this, and if it weren’t for bloody Norway, it would be a trivial process. So first, we need a source of UTM grid data.

A Source of UTM Grid Data

Well, the Global UTM Zones Grid from EPDI looks right, and it’s CC BY-NC-SA licensed. But it’s a bit busy with all the grid squares:

canada-grid

What’s more, there’s no explicit way of getting the numeric zone out of the CODE field (used as labels above). We need to munge this a bit. In a piece of gross data-mangling, I’m using an awk (think: full beard and pork chops) script to process a GeoJSON (all ironic facial hair and artisanal charcuterie) dump of the shape file. I’m not content to just return the zone number; I’m turning it into the EPSG WGS84 SRID of the zone, a 5-digit number understood by proj.4:

32hzz

where:

  • h is the hemisphere: 6 for north, 7 for south.
  • zz is the zone number.

I live in Zone 17 North, so my SRID is 32617.
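The 32hzz pattern is just arithmetic, so a tiny Python sketch covers it. This is not the awk script, only an illustration of the numbering; the name `wgs84_utm_srid` is made up.

```python
def wgs84_utm_srid(zone, northern=True):
    """Return the EPSG code for WGS84 / UTM in the given zone and hemisphere."""
    if not 1 <= zone <= 60:
        raise ValueError("UTM zones run from 1 to 60, got %r" % zone)
    base = 32600 if northern else 32700  # h = 6 for north, 7 for south
    return base + zone


print(wgs84_utm_srid(17))         # Zone 17 North
print(wgs84_utm_srid(17, False))  # Zone 17 South
```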

Here’s the code to do it: zones_add_epsg-awk (which you’ll likely have to rename/fix permissions on). To use it:

  1. Unzip the Global UTM zones grid.
  2. Convert the shape file to GeoJSON, using ogr2ogr:
    ogr2ogr -f GeoJSON utm_zones_final.geojson utm_zones_final.shp
  3. Process it:
    ./zones_add_epsg.awk utm_zones_final.geojson >  utm_zones_final-srid.geojson
  4. Convert the modified GeoJSON back to a shapefile:
    ogr2ogr utm_zones_final-srid.shp utm_zones_final-srid.geojson
  5. Now some magic: create a simplified shapefile with entire UTM zones keyed against the (integer) SRID:
    ogr2ogr wgs84utm.shp utm_zones_final-srid.shp -dialect SQLITE -sql 'SELECT epsgsrid,ST_Union(Geometry) FROM "utm_zones_final-srid" GROUP BY epsgsrid;'

And, lo!

canada-srid

So we can now load this wgs84utm shapefile as a table in SpatiaLite. If you wanted to find the zone for the CN Tower (hint: it’s the same as me), you could run:

select EPSGSRID from wgs84utm where within(GeomFromText('POINT(-79.3869585 43.6425361)',4326), geom);

which returns ‘32617’, as expected.

Making the Transform

(I have to admit, I was amazed when this next bit worked.)

Let’s say we have to identify all the VOR stations in Canada, and draw a 25 km exclusion buffer around them (hey, someone might want to …). VOR stations can be queried from TAFL using the following criteria:

  • the licensee is Nav Canada, or similar,
  • the TX frequency is between 108 and 117.96 MHz,
  • the location contains ‘VOR’.

This can be coded as:

SELECT *
FROM tafl
WHERE licensee LIKE 'NAV CANADA%' AND tx >= 108
AND tx <= 117.96 AND location LIKE '%VOR%';

which returns a list of 67 stations, from VYT80 on Mount Macintyre, YT to YYT St. John’s, NL. We can use this, along with the UTM zone query above, to make beautiful, beautiful circles:


SELECT tafl.PK_ROWID, tafl.tx, tafl.location, tafl.callsign,
Transform (Buffer ( Transform ( tafl.geom, wgs84utm.epsgsrid ), 25000 )
, 4326 ) AS bgeom
FROM tafl, wgs84utm
WHERE
tafl.licensee LIKE 'NAV CANADA%'
AND tafl.tx >= 108
AND tafl.tx <= 117.96
AND tafl.location LIKE '%VOR%'
AND Within( tafl.geom, wgs84utm.geom );

Ta-da!

ont-vor-buffer-geo

Yes, they look oval; don’t forget that geographic coordinates don’t maintain rectilinearity. Transformed to UTM, they look much more circular:

ont-vor-buffer-utm
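To put a rough number on the ovalness: a degree of longitude shrinks with the cosine of the latitude, while a degree of latitude stays about the same length. The Python sketch below uses a spherical approximation (about 111.32 km per degree of latitude, which is my assumption, not a figure from the data) to estimate how far a 25 km buffer reaches in degrees at the CN Tower's latitude. The function name `buffer_extent_degrees` is hypothetical.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def buffer_extent_degrees(lat, radius_km):
    """Return the (longitude, latitude) half-widths of a buffer, in degrees."""
    dlat = radius_km / KM_PER_DEGREE
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return dlon, dlat


dlon, dlat = buffer_extent_degrees(43.6425361, 25.0)
print(round(dlon, 3), round(dlat, 3))
print(round(dlon / dlat, 2))  # how much wider than tall, in degrees
```

At Toronto's latitude the buffer comes out a bit more than a third wider than it is tall when plotted in plain degrees, and the stretch gets worse further north.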

TAFL — as a proper geodatabase

Update, 2017: TAFL now seems to be completely dead, and Spectrum Management System has replaced it. None of the records appear to be open data, and the search environment seems — if this is actually possible — slower and less feature-filled than in 2013.

Update, 2013-08-13: Looks like most of the summary pages for these data sets have been pulled from data.gc.ca; they’re 404ing. The data, current at the beginning of this month, can still be found at these URLs:

I build wind farms. You knew that, right? One of the things you have to take into account in planning a wind farm is existing radio infrastructure: cell towers, microwave links, the (now-increasingly-rare) terrestrial television reception.

I’ve previously written on how to make the oddly blobby shape files to avoid microwave links.  But finding the locations of radio transmitters in Canada is tricky, despite there being two ways of doing it:

  1. Wrestle with the Spectrum Direct website, which can’t handle the large search radii needed for comprehensive wind farm design. At best, it spits out weird fixed-width text data, which takes some effort to parse.
  2. Download the Technical and Administrative Frequency Lists (TAFL; see update above for URLs), and try to parse those (layout, fields). Unless you’re really patient, or have mad OpenRefine skillz, this is going to be unrewarding, as the files occasionally drop format bombs like
    never do this, okay?
    Yes, you just saw conditional different fixed-width fields in a fixed-width text file. In my best Malcolm Tucker (caution, swearies) voice I exhort you to never do this.

So searching for links is far from obvious, and it’s not like wireless operators do anything conventional like register their links on the title of the properties they cross … so these databases are it, and we must work with them.

The good thing is that TAFL is now Open Data, defined by a reasonable Open Government Licence, and available on the data.gc.ca website. Unfortunately, the official Industry Canada tool to process and query these files is a little, uh, behind the times:

tafl2dbf

Yes, it’s an MS-DOS exe. It spits out DBase III Files. It won’t run on Windows 7 or 8. It will run on DOSBox, but it’s rather slow, and fails on bigger files.

That’s why I wrote taflmunge. It currently does one thing properly, and another kinda-sorta:

  1. For all TAFL records fed to it, generates a SpatiaLite database containing these points and all their data; certainly all the fields that the old EXE produced. This process seems to work for all the data I’ve fed to it.
  2. Tries to calculate point-to-point links for microwave communications. This it does less well, but I can see where the SQL is going wrong, and will fix it soon.

taflmunge runs anywhere SpatiaLite does. I’ve tested it on Linux and Windows 7. It’s just a SQL script, so no additional glue language required. The database can be queried on anything that supports SQLite, but for real spatial cleverness, needs SpatiaLite loaded. Full instructions are in the taflmunge / README.md.

TAFL is clearly maintained by licensees, as the data can be a bit “vernacular”. Take, for example, a tower near me:

pharm_eg

The tower is near the top of the image, but the database entries are spread out by several hundred meters. It’s the best we’ve got to work with.

Ultimately, I’d like to keep this maintained (the Open Data TAFL files are updated monthly), and host it in a nice WebGIS that would allow querying by location, frequency, call sign, operator, … But that’s for later. For now, I’ll stick with refining it locally, and I hope that someone will find it useful.