My post “Diary of a Geonumpty”, on my main blog, is really what got me started thinking about abstract geographic data. In it, I (with a lot of external help) develop queries to count points in areas with the same owner, and to find points outside properties.
While most GIS applications work with delimited text inputs, sometimes you just have to have a shapefile. Amongst many other things, Frank Warmerdam wrote the Shapefile C Library, which comes with a few simple tools. I suspect Frank meant the little utilities to be code samples that wouldn’t see much use, but they do the job.
Let’s take the coordinates 43.73066°N, 79.26482°W from my first entry. I will make a single point shapefile with this coordinate.
First you have to make the SHP file and the DBF database:
dbfcreate junction -s Name 16
shpcreate junction point
This makes an empty shapefile for storing points, with one string field ‘Name’ of width 16 characters.
Now you have to add your point – this takes two stages, adding the database row, and then adding the geometry:
dbfadd junction.dbf 'Chevron/Kenmark'
shpadd junction.shp -79.26482 43.73066
And that’s it – you’ve made a trivial shapefile.
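There's not much inside that trivial shapefile. As a sketch of what shpcreate/shpadd actually write, here's the .shp part of a one-point shapefile produced by hand with Python's struct module, following the ESRI shapefile spec (a usable shapefile also needs the matching .shx and .dbf files, which I'm skipping here):

```python
import os
import struct
import tempfile

# Hand-rolled single-point .shp, per the ESRI shapefile spec: a 100-byte
# main header, then one record (record header + point geometry).
def write_point_shp(path, x, y):
    record = struct.pack('<idd', 1, x, y)        # shape type 1 (Point), x, y
    rec_header = struct.pack('>ii', 1, len(record) // 2)   # record 1, length in 16-bit words
    file_len = (100 + len(rec_header) + len(record)) // 2  # whole file, in 16-bit words
    header = (struct.pack('>7i', 9994, 0, 0, 0, 0, 0, file_len)  # magic, 5 unused ints, length
              + struct.pack('<2i', 1000, 1)                      # version, shape type
              + struct.pack('<8d', x, y, x, y, 0, 0, 0, 0))      # bounding box, Z and M ranges
    with open(path, 'wb') as f:
        f.write(header + rec_header + record)

path = os.path.join(tempfile.mkdtemp(), 'junction.shp')
write_point_shp(path, -79.26482, 43.73066)

# Read the point straight back out of the record to check it round-trips
with open(path, 'rb') as f:
    data = f.read()
shape_type, px, py = struct.unpack('<idd', data[108:128])
print(shape_type, px, py)   # 1 -79.26482 43.73066
```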
For reasons that are not particularly clear, the Toronto.ca|Open data is in two different coordinate reference systems (CRS), MTM 3 Degree Zone 10, NAD27 (EPSG 2019) and UTM 6 Degree Zone 17N NAD27 (EPSG 26717). This confuses QGIS even if you’ve input the proper SRIDs into SpatiaLite. The image above shows two apparent Torontos, one in each of the CRSs.
What you have to do is go to the Project Properties, select the Coordinate Reference System (CRS) tab, and “Enable ‘on the fly’ CRS transformation”. This will line those city layers right back up.
QGIS is only fixing the display, though; queries still work on the raw coordinates. For example, to find the distance from Corvette Park to the Kennedy Park neighbourhood:
select Distance(Parks.geometry, Neighbourhoods.geometry)/1000 as Distance_km from Parks, Neighbourhoods where Parks.name='CORVETTE PARK' and Neighbourhoods.hood='Kennedy Park'
which returns a distance of over 314 km. That’s not right.
So we need to transform the geometries to the same CRS.
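To see where a number that size comes from: Distance() just does planar arithmetic on whatever coordinates it is handed, and the park is stored in MTM zone 10 metres while the neighbourhood is in UTM zone 17N metres. The coordinates below are rough, hypothetical values for the same downtown Toronto spot in each system:

```python
import math

# The same location expressed in two different projected CRSs.
# These numbers are approximate, for illustration only.
mtm_zone10 = (313923.0, 4836666.0)   # EPSG 2019: false easting 304800, lon_0 -79.5
utm_zone17 = (630084.0, 4835544.0)   # EPSG 26717: false easting 500000, lon_0 -81.0

# Planar "distance" between coordinates in two different systems:
dx = utm_zone17[0] - mtm_zone10[0]
dy = utm_zone17[1] - mtm_zone10[1]
apparent_km = math.hypot(dx, dy) / 1000
print(round(apparent_km))   # 316 -- hundreds of km of pure bookkeeping error
```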
NB: I might be doing the next bit wrong. CRS transformation is subtle. I’m not, particularly.
The OGR Simple Feature Library is your friend. It can convert pretty much any geo format to another, and can transform coordinates between systems. In exchange for this power, it wants your soul (well, it is rather complex).
I’ve chosen to use NAD83(CSRS) / UTM zone 17N (EPSG 2958) for my Toronto maps. It’s fairly accurate over the whole city area. To convert the Parks and Neighbourhoods shape files:
ogr2ogr -s_srs EPSG:2019 -t_srs EPSG:2958 dest/2958/parks/TCL3_UPARK.shp src/2019/parks/TCL3_UPARK.shp
ogr2ogr -s_srs EPSG:26717 -t_srs EPSG:2958 dest/2958/neighbourhoods/Neighbourhoods.shp src/26717/neighbourhoods/Neighbourhoods.shp
Note that it wants the destination file first, then the source. I haven’t seen that order since PIP under CP/M 2.2. I was also a bit nerdly, and arranged the files in directories by SRID, as you can see in the paths.
If we load the transformed shapefiles into SpatiaLite and run that query again, it comes out with the correct distance: 0.0 km, as Corvette Park is in the Kennedy Park neighbourhood.
Now we can run a proper query: what parks are in Kennedy Park, and what are their areas?
select tp.name, round(Area(tp.geometry)/10000,1) as Area_ha from Parks as tp, Neighbourhoods as tn where tn.hood='Kennedy Park' and within(tp.geometry, tn.geometry) order by Area_ha
|MAYWOOD TOT LOT||0.7|
|GLEN SHEPPARD PARK||1.0|
|MID-SCARBOROUGH C.C. & ARENA||2.9|
(note how I sneakily used the round() function to avoid too many decimal places?)
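Incidentally, Area() on projected coordinates is planar geometry: for a simple (non-self-intersecting) polygon it amounts to the shoelace formula, and dividing by 10,000 turns square metres into hectares. A quick sketch with a made-up rectangular park:

```python
# Shoelace formula for the area of a simple polygon in projected metres.
# The park outline here is invented for illustration.
def polygon_area_m2(points):
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

park = [(0, 0), (100, 0), (100, 70), (0, 70)]   # a 100 m x 70 m rectangle
print(round(polygon_area_m2(park) / 10000, 1))  # 0.7 ha
```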
Catherine and I are quite partial to libraries. I’m going to use the address points database we made yesterday to find the libraries within 2 km of a given address. It’s not a very useful query, but it shows the very basics of searching by distance.
I’m going to use the address from yesterday, 789 Yonge St. The fields I’m interested in are:
- address – this is the street number (789)
- lf_name – the street name, in all-caps, with the customary abbreviations for rd/ave/blvd, etc (YONGE ST)
- fcode_desc – the type of the address. Most places don’t have this set, but here it’s ‘Library’.
- geometry – the description of the feature’s locus. This isn’t human readable, but can be viewed with the AsText() function.
I’m also going to use a calculated field for the distance to make the query shorter. Since my map units are metres, calculating Distance(…)/1000 will return kilometres. So:
select t2.name, t2.address, t2.lf_name, distance( t1.geometry, t2.geometry ) / 1000 as Distance_km from TCL3_ADDRESS_POINT as t1, TCL3_ADDRESS_POINT as t2 where t1.address = 789 and t1.lf_name = 'YONGE ST' and t2.fcode_desc = 'Library' and distance_km < 2 order by distance_km
Note I’m using two instances of the same table; one for the source address (t1), and the other for the destinations (t2). The results I get are:
|Toronto Reference||789||YONGE ST||0.0|
|130||ST GEORGE ST||1.2973836702297|
|Spadina Road||10||SPADINA RD||1.52482151385834|
|252||MC CAUL ST||1.58040842489387|
|40||ST GEORGE ST||1.59417399071161|
|Lillian H.Smith Library||239||COLLEGE ST||1.81606690760918|
|265||GERRARD ST E||1.86262658418202|
|Parliament||269||GERRARD ST E||1.87733631488281|
|Deer Park||40||ST CLAIR AVE E||1.9224871094566|
There’s one at zero distance because 789 Yonge St is a library, so the search finds itself. Try any other address and you won’t get the zero. I’m pretty sure the 14 decimal places are overkill.
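Plain SQLite has no Distance(), but the shape of that self-join can be mimicked by registering a small Python function. All the addresses and projected coordinates (metres) below are made up for illustration:

```python
import math
import sqlite3

# Stand-in for SpatiaLite's Distance(): planar distance in metres.
conn = sqlite3.connect(':memory:')
conn.create_function('distance_m', 4,
                     lambda x1, y1, x2, y2: math.hypot(x2 - x1, y2 - y1))

conn.execute("""create table address_point
                (address integer, lf_name text, fcode_desc text,
                 x real, y real)""")
conn.executemany('insert into address_point values (?, ?, ?, ?, ?)',
                 [(789, 'YONGE ST', 'Library', 313923.0, 4836666.0),
                  (10, 'SPADINA RD', 'Library', 312500.0, 4837200.0),
                  (40, 'ST CLAIR AVE E', 'Library', 314100.0, 4838580.0),
                  (1, 'FAR AWAY RD', 'Library', 350000.0, 4870000.0)])

# Same structure as the real query: t1 is the source address, t2 the
# candidate libraries; SQLite lets the WHERE clause reuse the alias.
rows = conn.execute("""
    select t2.address, t2.lf_name,
           distance_m(t1.x, t1.y, t2.x, t2.y) / 1000 as distance_km
    from address_point as t1, address_point as t2
    where t1.address = 789 and t1.lf_name = 'YONGE ST'
      and t2.fcode_desc = 'Library'
      and distance_km < 2
    order by distance_km""").fetchall()

for address, street, km in rows:
    print(address, street, round(km, 2))
# The source address finds itself at 0.0 km, just like the real query.
```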
I’m going to use SpatiaLite and the Toronto One Address Repository to try some simple geocoding. That is, given an address, spit out the real-world map coordinates. As it happens, the way the Toronto data is structured it doesn’t really need to use any GIS functions, just some SQL queries. There are faster and better ways to code this, but I’m just showing you how to load up data and run simple queries.
SpatiaLite is my definition of magic. It’s an extension to the lovely SQLite database that allows you to work with spatial data – instead of selecting data within tables, you can select within polygons, or intersections with lines, or within a distance of a point.
I’m going to try to avoid having too many maps here, as maps are a snapshot of a particular view of a GIS at a certain time. Maps I can make; GIS is what I’m trying to learn.
So, download the data and load up SpatiaLite GUI. Here I’ve created a new database file, addresses.sqlite. I’m all ready to load the shapefile.
Shapefiles are messy things, and are definitely glaikit. Firstly, they’re a misnomer; a shapefile is really a bunch of files which need to be kept together. They’re also a really old format; the main information store is actually a dBaseIII database. They also have rather dodgy ways of handling projection metadata. For all their shortcomings, no-one’s come up with anything better that people actually use.
Projection information is important, because the world is inconveniently unflat. If you think of a projected X-Y coordinate system as a graph paper Post-It note stuck to a globe, the grid squares depend on where you’ve decided to stick the note. Also, really only the tiny flat part that’s sticking to the globe closely approximates to real-world coordinates.
Thankfully, the EPSG had a handle on all this projection information (and, likely, Post-It notes). Rather than using proprietary metadata files, they have a catalogue of numbers that exactly identify map projections. SpatiaLite uses these Spatial Reference System Identifiers (SRIDs) to keep different projections lined up.
Toronto says its address data is in ‘MTM 3 Degree Zone 10, NAD27’. That’s not a SRID. You can list all the SRIDs that SpatiaLite knows with:
select * from spatial_ref_sys
which returns over 3500 results.
As we know there’s an MTM (Modified Transverse Mercator) and a 27 in the title, we can narrow things down:
select srid,ref_sys_name from spatial_ref_sys where ref_sys_name like '%MTM%' and ref_sys_name like '%27%'
The results are a bit more manageable:
|2017||NAD27(76) / MTM zone 8|
|2018||NAD27(76) / MTM zone 9|
|2019||NAD27(76) / MTM zone 10|
|2020||NAD27(76) / MTM zone 11|
|2021||NAD27(76) / MTM zone 12|
|2022||NAD27(76) / MTM zone 13|
|2023||NAD27(76) / MTM zone 14|
|2024||NAD27(76) / MTM zone 15|
|2025||NAD27(76) / MTM zone 16|
|2026||NAD27(76) / MTM zone 17|
|32081||NAD27 / MTM zone 1|
|32082||NAD27 / MTM zone 2|
|32083||NAD27 / MTM zone 3|
|32084||NAD27 / MTM zone 4|
|32085||NAD27 / MTM zone 5|
|32086||NAD27 / MTM zone 6|
So it looks like 2019 is our SRID. (spatialreference.org maintains a handy guide to projections and SRIDs.) Incidentally, Open Toronto seems to use two different projections for its data: the other is ‘UTM 6 Degree Zone 17N NAD27’, with an SRID of 26717.
So let’s load it:
This might take a while, as there are over 500,000 points in this data set.
If you want to use this data along with more complex geographic queries, add a Spatial Index by right-clicking on the Geometry table and ‘Build Spatial Index’. This will take a while again, and make the database file quite huge (128MB on my machine).
Update: there’s a much quicker way of doing this without messing with invproj in this comment.
Now we’re ready to geocode. I was at the Toronto Reference Library today, which is at 789 Yonge Street. Let’s find that location:
select easting, northing, address, lf_name, name, fcode_desc from TCL3_ADDRESS_POINT where lf_name like 'yonge%' and address=789
|313923.031||4836665.602||789||YONGE ST||Toronto Reference||Library|
(for most places, NAME and FCODE_DESC are blank.)
Ooooh … but those coordinates don’t look anything like the degrees we had yesterday. We have to convert back to unprojected decimal degrees with my old friend, proj. If we store the northing, easting and a label in a file, we can get the geographic coordinates with:
invproj -E -r -f "%.6f" +proj=tmerc +lat_0=0 +lon_0=-79.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=clrk66 +units=m +no_defs < file.txt
which gives us:
4836665.602000 313923.031000 -79.386869 43.671824 Library
Now that’s more like it: 43.671824°N, 79.386869°W. On a map, that’s:
Pretty close, eh?
Incidentally, I didn’t just magic up that weird invproj line. Most spatial databases use proj to convert between projections, and carry an extra column with the command line parameters. For our SRID of 2019, we can call it up with this:
select proj4text from spatial_ref_sys where srid=2019;
+proj=tmerc +lat_0=0 +lon_0=-79.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=clrk66 +units=m +no_defs
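Those proj4 parameters are all you need to reproduce the conversion. Here's a pure-Python sketch of the inverse transverse Mercator (the standard Snyder/USGS series, so only a sketch, not a substitute for proj) using the EPSG 2019 parameters above:

```python
import math

# Parameters straight from the proj4 string for EPSG 2019:
# +proj=tmerc +lat_0=0 +lon_0=-79.5 +k=0.9999 +x_0=304800 +ellps=clrk66
A = 6378206.4                 # Clarke 1866 semi-major axis, metres
B = 6356583.8                 # Clarke 1866 semi-minor axis, metres
E2 = 1 - (B / A) ** 2         # first eccentricity squared
K0 = 0.9999                   # central meridian scale factor
LON0 = math.radians(-79.5)    # central meridian
X0 = 304800.0                 # false easting (1,000,000 ft, in metres)

def tmerc_inverse(easting, northing):
    """Easting/northing in metres -> (lat, lon) in decimal degrees."""
    e1 = (1 - math.sqrt(1 - E2)) / (1 + math.sqrt(1 - E2))
    mu = (northing / K0) / (A * (1 - E2 / 4 - 3 * E2 ** 2 / 64 - 5 * E2 ** 3 / 256))
    # Footpoint latitude: where the central meridian reaches this northing
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1 ** 3 / 32) * math.sin(2 * mu)
            + (21 * e1 ** 2 / 16 - 55 * e1 ** 4 / 32) * math.sin(4 * mu)
            + (151 * e1 ** 3 / 96) * math.sin(6 * mu))
    ep2 = E2 / (1 - E2)                       # second eccentricity squared
    c1 = ep2 * math.cos(phi1) ** 2
    t1 = math.tan(phi1) ** 2
    n1 = A / math.sqrt(1 - E2 * math.sin(phi1) ** 2)       # prime vertical radius
    r1 = n1 * (1 - E2) / (1 - E2 * math.sin(phi1) ** 2)    # meridian radius
    d = (easting - X0) / (n1 * K0)
    lat = phi1 - (n1 * math.tan(phi1) / r1) * (
        d ** 2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1 ** 2 - 9 * ep2) * d ** 4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1 ** 2 - 252 * ep2 - 3 * c1 ** 2) * d ** 6 / 720)
    lon = LON0 + (d
                  - (1 + 2 * t1 + c1) * d ** 3 / 6
                  + (5 - 2 * c1 + 28 * t1 - 3 * c1 ** 2 + 8 * ep2 + 24 * t1 ** 2) * d ** 5 / 120
                  ) / math.cos(phi1)
    return math.degrees(lat), math.degrees(lon)

lat, lon = tmerc_inverse(313923.031, 4836665.602)
print(round(lat, 6), round(lon, 6))    # close to 43.671824 -79.386869
```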
Update: there’s a much better way of invoking invproj. It understands EPSG SRIDs, so we could have done:
invproj -E -r -f "%.6f" +init=EPSG:2019 < file.txt
So if I want to learn some GIS skills, it would be helpful to have some data to work with. Here are two data sources I have slight familiarity with. The first is the Toronto.ca|Open data catalogue:
- Municipal Address Points – Toronto One Address Repository (November 2009 – MTM 3 Degree Zone 10, NAD27)
- Business Improvement Areas (BIA) (2009 – MTM 3 Degree Zone 10, NAD27)
- Toronto Centreline (TCL) (2009 – MTM 3 Degree Zone 10, NAD27)
- Food Banks (October 2009 – UTM 6 Degree Zone 17N NAD27)
- Neighbourhoods (October 2009 – UTM 6 Degree Zone 17N NAD27)
- Parks (October 2009 – MTM 3 Degree Zone 10, NAD27)
- Priority Investment Neighbourhoods (October 2009 – UTM 6 Degree Zone 17N NAD27)
- Places of Worship (2006 – UTM 6 Degree Zone 17N NAD27)
- Rent Banks (2007 – UTM 6 Degree Zone 17N NAD27)
- Rent Bank Zones (2007 – UTM 6 Degree Zone 17N NAD27)
- Solid Waste Management Districts (October 2009 – MTM 3 Degree Zone 10, NAD27)
- Transit City (October 2009 – UTM 6 Degree Zone 17N NAD27)
- City Wards (2009 – MTM 3 Degree Zone 10, NAD27)
The mixed map projections are a bit of a pain, and there are reports that some of the data is skewed from the rest of the Canadian data, but there’s much to love about this data.
GeoGratis, from Natural Resources Canada
An absolute tonne of data, in vector and raster formats. Services I’ve used are CanVec (vector data covering almost every feature) and Toporama (raster topographic maps; it has an associated Toporama Web Map Service).
My GPS just told me that the middle of the nearest road intersection to my house has the following coordinates:
What does that mean? Since latitudes are positive north from the equator, and longitudes positive east of the prime meridian, that means I’m at 43.73066°N, 79.26482°W. This is what it looks like on a map:
If I wanted to put on airs, I’d quote the location in degrees, minutes and seconds of arc. These units are a bit fiddly to calculate (a minute being 1/60th of a degree, and a second 1/60th of a minute) but are traditional and compact. You can cheat, and put the location into Google Maps and it’ll spit out the DMS coordinates, or you can work it out:
|43.73066°N||43°||.73066 × 60 = 43.8396′ → 43′||.8396 × 60 = 50.376″ → 50″|
|79.26482°W||79°||.26482 × 60 = 15.8892′ → 15′||.8892 × 60 = 53.352″ → 53″|
So, 43° 43′ 50″ N, 79° 15′ 53″ W it is. (I’ve used prime and double-prime characters as I think it looks neater, and it shouldn’t confuse smart quotes).
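The arithmetic above is easy to wrap in a few lines of Python. A quick sketch, assuming a positive input; note the naive rounding can hand back 60 seconds right on a minute boundary, so don't trust it near exact minutes:

```python
# Decimal degrees -> (degrees, minutes, seconds).
def to_dms(dd):
    degrees = int(dd)
    minutes_full = (dd - degrees) * 60       # e.g. 0.73066 deg -> 43.8396 min
    minutes = int(minutes_full)
    seconds = round((minutes_full - minutes) * 60)
    return degrees, minutes, seconds

print(to_dms(43.73066))   # (43, 43, 50)
print(to_dms(79.26482))   # (79, 15, 53)
```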
So that’s my location, relative to somewhere else. I guess one could take a sextant, learn how to use it, and see (roughly) how far north I am. Longitude’s tricky, and when I first visited Greenwich I really thought that they’d discovered the physical 0° meridian there, and how convenient it was that it was so close to London’s docks! (See, this blog’s not called Numpty’s Progress for nothing.)
But degrees don’t really tell me how far I am from anything, and the surface of a not-quite sphere is inconvenient to map onto a flat surface. So from now on, I’m going to look around me as if my surroundings are flat.