Tag Archives: toronto

Toronto’s Human Centre, part 2: by neighbourhood

Beware throwaway comments; that way overanalysis lies. This was a challenge.

Taking the 2006 neighbourhoods population into account, the human centre of Toronto is at 43.717955°N, 79.389828°W …

… pretty close to the one I’ve already worked out by ward.

Scraping the neighbourhood populations was hard. For the 140 neighbourhoods, the data is stored in a pdf with the URL like http://www.toronto.ca/demographics/cns_profiles/2006/pdf1/cpa124.pdf (in this case, 124; Kennedy Park, represent!). The population number is stored in a table on page 2 of each pdf. I used pdf2xml to convert the files into something parseable.

Of course, the tables weren’t exactly in the same place in every file, so I took a sample of 10% of the files, and worked out the X & Y coordinates of the population box. pdf2xml spits out elements like

<TOKEN sid="p2_s427" id="p2_w417" font-name="arialmt" serif="yes" fixed-width="yes" bold="no" italic="no" font-size="7.47183" font-color="#000000" rotation="0" angle="0" x="299.739" y="122.117" base="129.634" width="22.7692" height="9.94501">17,050</TOKEN>

Yes, I should have used an XML parser, but a Small Matter of Perl got me 126 out of the 140 matching. The rest I keyed in by hand …

Table after the jump.

Continue reading

toronto’s human centre

Since it was trivial to calculate the centre of the city, I thought I’d do something a little more complex: calculate the centre of the city, weighted for population. I scraped the 2006 population data by ward from the City of Toronto: Ward Profiles pages; hooray for curl and regexes. Realising that I had no information on population distribution within wards, I made a good old engineering assumption: we could idealize the population as a point mass at the centroid of each ward, and then calculate the centre of mass by balancing moments around the X and Y axis. (I mean, c’mon – it’s the sort of thing we all think about daily … isn’t it? Guys … hello … anyone there?)

I’ll spare you the details until after the jump, but I calculate the human centre of Toronto to be at 43.717794°N, 79.390299°W – that’s in Blythwood Ravine, just south of Blythwood Road. We should have a picnic there …

Continue reading

Toronto’s Angles

As many Canadians will tell you, most Torontonians are a bit skewed. While others might have different explanations, I put it down to our road grid. What we call north in the city is actually some 16.7° west of north.

While we do have a perpendicular grid, it’s twisted to the left (steady, Edmontonians). I attempted to work out what the average angle is.

I eyeballed several long straight streets, and picked out representative intersections. Finidng the coordinates of these intersections took a surprisingly long time, as each street has several road sections. To find an intersection, each section on one road has to be queried against each section of the other. It gets there, eventually.

To save the tedium of showing the queries, here are the results. The angles are anticlockwise from the X-axis.

N-S	 Intersection1 Intersection2 Angle
======== ============= ============= ======
Bathurst King	       Dupont	     16.184
Yonge	 King	       Rosehill	     16.783
Woodbine Queen	       Plains	     17.049

E-W	 Intersection1 Intersection2 Angle
======== ============= ============= ======
Danforth Greenwood     Main	     16.736
Eglinton Dufferin      Bayview	     16.159
Steeles	 Dufferin      Kennedy	     17.194

I was surprised that SpatiaLite didn’t have any angle measurement functions built in, so I had to feed the output through geod. So for instance, for Steeles Avenue, the intersection of Steeles West and Dufferin is at 43.787229°N, 79.470285°W, and the intersection of Steeles East and Kennedy is 43.823636°N, 79.307265°W. Feeding these numbers to geod:

echo '43.787229 -79.470285 43.823636 -79.307265'| geod +ellps=WGS84 -f "%.3f" -p -I +units=m

This spits out three numbers: 72.806    252.918    13727.395. The first is the bearing from the first point to the second; this is the number we’re interested in, and it’s (90-82.806) = 17.194° clockwise from E. The second is the bearing from the second point back to the first. The last is the distance in metres; Steeles is a long, straight street.

Averaging the numbers for the streets I chose gets 16.684°; just the thing for Torontohenge. For a visual axis, maybe using Steeles’s 17.194° could be better, as it’s a line above the city. You can set the map rotation angle in QGIS’s map composer, and get a better aligned city:

Finding the exact geographic centre of Toronto

Spacing has alluded to it. The Ontario Science Centre makes bold (and incorrect) claims about it. But here’s the real deal.

  • If you consider Toronto to be defined by its city wards, the centre of Toronto lies at 43.725518°N, 79.390531°W.
  • If you consider Toronto to be defined by its neighbourhoods, the centre of Toronto lies at 43.726495°N, 79.390641°W.

You can work this out in one line of SQL. By combining all the wards or neighbourhoods into one union shape (SpatiaLite uses the GUnion() function), and then calculating the centroid, that’s the centre of the city:

select astext(transform(centroid(gunion(geometry)),4326)) from wards

To get the results in a more human-friendly format, I transformed it to WGS84 (EPSG SRID 4326), and used astext() to get it in something other than binary.

Update, 18 March: great minds, etc … Torontoist just posted the similar In Search of Toronto’s Geographic Centre.

closer to ward maps: scraping the data

Toronto publishes its candidates here  http://app.toronto.ca/vote2010/findByOffice.do?officeType=2&officeName=Councillor in a kind of tabular format. All I want to do is count the number of candidates per ward, remembering that some wards have no candidates yet.

Being lazy, I’d far rather have another program parse the HTML, so I work from the formatted output of W3M. It’s relatively easy to munge the output using Perl. From there, I hope to stick the additional data either into a new column in the shapefile, or use SpatiaLite. I’m undecided.

My dubious Perl script:

#!/usr/bin/perl -w
# ward_candidates - mimic mez ward map
# created by scruss on 02010/03/01
# RCS/CVS: $Id$

use strict;
my $URL =
my $stop = 1;

my %wards;
for ( 1 .. 44 ) {
 $wards{$_} = 0;    # initialise count to zero for each ward

open( IN, "w3m -dump \"$URL\" |" );
while (<IN>) {
 next if (/^$/);
 $stop = 1 if (/^Withdrawn Candidate/);
 unless ( 1 == $stop ) {
 my ($ward) = /(\d+)$/;
 $wards{$ward}++;    # increment candidate for this ward
 $stop = 0 if (/^City Councillor/);

foreach ( sort { $a <=> $b } ( keys(%wards) ) ) {
 printf( "%2d\t%2d\n", $_, $wards{$_} );


which outputs the following (header added for clarity):

Ward Candidates
==== ==========
 1     3
 2     1
 3     0
 4     0
 5     1
 6     1
 7     7
 8     3
 9     2
10     3
11     2
12     3
13     1
14     4
15     3
16     1
17     2
18     4
19     6
20     2
21     1
22     1
23     1
24     0
25     2
26     3
27    12
28     3
29     6
30     3
31     3
32     2
33     1
34     0
35     5
36     2
37     2
38     2
39     1
40     2
41     1
42     5
43     3
44     3

Labelling: harder than it looks

I’m rather taken with Mez’s rather neat Toronto ward candidate maps. I wonder if I could reproduce them (semi-)automatically?

As a start, here’s the Toronto Wards layer, rendered in QGIS with the ward number as a label:

You’ll notice that something is quite off. It looks like QGIS uses the centre of the minimum bounding rectangle of a polygon as the label point. While this is okay for nice regular shapes, weird glaikit shapes end up with the label outside the boundary. Not good.

I was about to give up on this completely, when I saw QGIS’s “Labeling” [sic] plugin. What it does is work out a variety of better visual positions for your labels. Here’s the setting I chose:

The result is much more pleasing:

Much better.

tale of two cities: coordinate reference systems, and what on earth is the maywood tot lot?

For reasons that are not particularly clear, the Toronto.ca|Open data is in two different coordinate reference systems (CRS), MTM 3 Degree Zone 10, NAD27 (EPSG 2019) and UTM 6 Degree Zone 17N NAD27 (EPSG 26717). This confuses QGIS even if you’ve input the proper SRIDs into SpatiaLite. The image above shows two apparent Torontos, one in each of the CRSs.

What you have to do is go to to the Project Properties, select the Coordinate Reference System (CRS) tab, and “Enable ‘on the fly’ CRS transformation”. This will line those city layers right back up.

Once we do that, things align as they should. Here’s my neighbourhood, with its parks

But things are still off if you’re querying the SQL directly:

select Distance(Parks.geometry, Neighbourhoods.geometry)/1000
 as Distance_km
 from Parks, Neighbourhoods
 where Parks.name='CORVETTE PARK'
 and Neighbourhoods.hood='Kennedy Park'

which returns a distance of over 314 km. That’s not right.

So we need to transform the geometries to the same CRS.

!!! NB: I might be doing the next bit wrong. CRS transformation is subtle. I’m not, particularly.

The OGR Simple Feature Library is your friend. It can convert pretty much any geo format to another, and can transform coordinates between systems. In exchange for this power, it wants your soul is rather complex.

I’ve chosen to use NAD83(CSRS) / UTM zone 17N (EPSG 2958) for my Toronto maps. It’s fairly accurate over the whole city area. To convert the Parks and Neighbourhoods shape files:

ogr2ogr -s_srs EPSG:2019 -t_srs EPSG:2958 dest/2958/parks/TCL3_UPARK.shp src/2019/parks/TCL3_UPARK.shp
ogr2ogr -s_srs EPSG:26717 -t_srs EPSG:2958 dest/2958/neighbourhoods/Neighbourhoods.shp src/26717/neighbourhoods/Neighbourhoods.shp

Note that it wants the destination file first, then the source. Haven’t seen that order since PIP under CPM/2.2. I was also a bit nerdly, and arranged the files in directories by SRID:


If we load the transformed shapefiles into Spatialite, and run that query again, it comes out with the correct distance: 0.0 km, as Corvette Park is in the Kennedy Park Neighbourhood.

Now we can run a proper query: what parks are in Kennedy Park, and what are their areas?

select tp.name, round(Area(tp.geometry)/10000,1) as Area_ha
 from Parks as tp, Neighbourhoods as tn
where tn.hood='Kennedy Park'
 and within(tp.geometry, tn.geometry)
order by Area_ha
NAME Area_ha

(note how I sneakily used the round() function to avoid too many decimal places?)

my first real spatial query: finding nearby libraries

Me and Catherine are quite partial to libraries. I’m going to use the address points database we made yesterday to find the libraries within 2km of a given address. It’s not a very useful query, but it shows the very basics of searching by distance.

I’m going to use the address from yesterday, 789 Yonge St. The fields I’m interested in are:

  • address – this is the street number (789)
  • lf_name – the street name, in all-caps, with the customary abbreviations for rd/ave/blvd, etc (YONGE ST)
  • fcode_desc – the type of the address. Most places don’t have this set, but here it’s ‘Library’.
  • geometry – the description of the feature’s locus. This isn’t human readable, but can be viewed with the AsText() function.

I’m also going to use a calculated field for the distance to make the query shorter. Since my map units are metres, calculating Distance(…)/1000 will return kilometres. So:

select t2.name, t2.address, t2.lf_name,
distance( t1.geometry, t2.geometry ) / 1000 as
from TCL3_ADDRESS_POINT as t1,
where t1.address = 789 and t1.lf_name =
 'YONGE ST' and t2.fcode_desc = 'Library' and
 distance_km < 2
 order by distance_km

Note I’m using two instances of the same table; one for the source address (t1), and the other for the destinations (t2). The results I get are:

Toronto Reference 789 YONGE ST 0.0
Yorkville 22 YORKVILLE AVE 0.161394052244849
130 ST GEORGE ST 1.2973836702297
Spadina Road 10 SPADINA RD 1.52482151385834
252 MC CAUL ST 1.58040842489387
40 ST GEORGE ST 1.59417399071161
Lillian H.Smith Library 239 COLLEGE ST 1.81606690760918
265 GERRARD ST E 1.86262658418202
Parliament 269 GERRARD ST E 1.87733631488281
Deer Park 40 ST CLAIR AVE E 1.9224871094566

There’s one at zero distance, because 789 Yonge St is a library, so the search finds itself. Try any other address, and you wouldn’t get the zero. I’m pretty sure the 14 decimal places is overkill.

I see that some of the libraries don’t have names. I’m pretty sure that the St George St ones are the UofT Library, and the McCaul St one is OCAD‘s, but the others, I can’t tell.

a simple geocoder for toronto

I’m going to use SpatiaLite and the Toronto One Address Repository to try some simple geocoding. That is, given an address, spit out the real-world map coordinates. As it happens, the way the Toronto data is structured it doesn’t really need to use any GIS functions, just some SQL queries. There are faster and better ways to code this, but I’m just showing you how to load up data and run simple queries.

SpatiaLite is my definition of magic. It’s an extension to the lovely SQLite database that allows you to work with spatial data – instead of selecting data within tables, you can select within polygons, or intersections with lines, or within a distance of a point.

I’m going to try to avoid having too many maps here, as maps are a snapshot of a particular view of a GIS at a certain time. Maps I can make; GIS is what I’m trying to learn.

So, download the data and load up SpatiaLite GUI. Here I’ve created a new database file. addresses.sqlite. I’m all ready to load the shapefile.

Shapefiles are messy things, and are definitely glaikit. Firstly, they’re a misnomer; a shapefile is really a bunch of files which need to be kept together. They’re also a really old format; the main information store is actually a dBaseIII database. They also have rather dodgy ways of handling projection metadata. For all their shortcomings, no-one’s come up with anything better that people actually use.

Projection information is important, because the world is inconveniently unflat. If you think of a projected X-Y coordinate system as a graph paper Post-It note stuck to a globe, the grid squares depend on where you’ve decided to stick the note. Also, really only the tiny flat part that’s sticking to the globe closely approximates to real-world coordinates.

Thankfully, the EPSG had a handle on all this projection information (and, likely, Post-It notes). Rather than using proprietary metadata files, they have a catalogue of numbers that exactly identify map projections. SpatiaLite uses these Spatial Reference System Identifiers (SRIDs) to keep different projections lined up.

Toronto says its address data is in ‘MTM 3 Degree Zone 10, NAD27′. That’s not a SRID. You can list all the SRIDs that SpatiaLite knows with:

select * from spatial_ref_sys

which returns over 3500 results.

As we know there’s an MTM (Modified Transverse Mercator) and a 27 in the title, we can narrow things down:

select srid,ref_sys_name from spatial_ref_sys where ref_sys_name like '%MTM%' and ref_sys_name like '%27%'

The results are a bit more manageable:

srid ref_sys_name
2017 NAD27(76) / MTM zone 8
2018 NAD27(76) / MTM zone 9
2019 NAD27(76) / MTM zone 10
2020 NAD27(76) / MTM zone 11
2021 NAD27(76) / MTM zone 12
2022 NAD27(76) / MTM zone 13
2023 NAD27(76) / MTM zone 14
2024 NAD27(76) / MTM zone 15
2025 NAD27(76) / MTM zone 16
2026 NAD27(76) / MTM zone 17
32081 NAD27 / MTM zone 1
32082 NAD27 / MTM zone 2
32083 NAD27 / MTM zone 3
32084 NAD27 / MTM zone 4
32085 NAD27 / MTM zone 5
32086 NAD27 / MTM zone 6

So it looks like 2019 is our SRID. That last link goes to spatialreference.org, who maintain a handy guide to projections and SRIDs. (Incidentally, Open Toronto seems to use two different projections for its data – the other is ‘UTM 6 Degree Zone 17N NAD27′ with a SRID of 26717.)

So let’s load it:

This might take a while, as there are over 500,000 points in this data set.

If you want to use this data along with more complex geographic queries, add a Spatial Index by right-clicking on the Geometry table and ‘Build Spatial Index’. This will take a while again, and make the database file quite huge (128MB on my machine).

Update: there’s a much quicker way of doing this without messing with invproj in this comment.

Now we’re ready to geocode. I was at the Toronto Reference Library today, which is at 789 Yonge Street. Let’s find that location:

select easting, northing, address, lf_name, name, fcode_desc from TCL3_ADDRESS_POINT where lf_name like 'yonge%' and address=789

which gives:

313923.031 4836665.602 789 YONGE ST Toronto Reference Library

(for most places, NAME and FCODE_DESC are blank.)

Ooooh … but those coordinates don’t look anything like the degrees we had yesterday. We have to convert back to unprojected decimal degrees with my old friend, proj. If we store the northing, easting and a label in a file, we can get the get the geographic coordinates with:

invproj -E -r -f "%.6f" +proj=tmerc +lat_0=0 +lon_0=-79.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=clrk66 +units=m +no_defs < file.txt

which gives us:

4836665.602000	313923.031000	-79.386869	43.671824	Library

Now that’s more like it:  43.671824°N, 79.386869°W. On a map, that’s:

Pretty close, eh?

Incidentally, I didn’t just magic up that weird invproj line. Most spatial databases use proj to convert between projections, and carry an extra column with the command line parameters. For our SRID of 2019, we can call it up with this:

select proj4text from spatial_ref_sys where srid=2019;

which returns:

+proj=tmerc +lat_0=0 +lon_0=-79.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=clrk66 +units=m +no_defs

Update: there’s a much better way of invoking invproj. It understands EPSG SRIDs, so we could have done:

invproj -E -r -f "%.6f" +init=EPSG:2019 < file.txt