Looks like hell has frozen over — there’s now an Ontario open buildings layer that’s compatible with the OSM licence: The Open Database of Buildings. Thanks go to StatCan for aggregating all the data sets into one huge database.
Update, late 2017: Looks like geocoder.ca’s database hasn’t been updated since late 2015. The download link is still active, though.
geocoder.ca provides a crowdsourced postal code geocoder under the ODbL. You can download the database as CSV directly. Here’s a bash script to convert that text file into a (very large) point shapefile:
#!/bin/bash
# geocoder2shp.sh - convert geocoder.ca CSV to a shape file
# NB: input CSV is UTF-8; it is passed through unchanged
# Needs >= v1.7 of GDAL
# scruss - 2012/04/15
if [ ! -f Canada.csv.gz ]
then
echo ""
echo " " Download \"Canada.csv.gz\" into the current directory from
echo " " http://geocoder.ca/onetimedownload/Canada.csv.gz
echo " " and try again.
echo ""
exit 1
fi
# make input file with header
echo PostalCode,Latitude,Longitude,City,Province > Canada2.csv
gunzip -c Canada.csv.gz >> Canada2.csv
# create GDAL VRT file
cat > Canada2.vrt <<EOF
<OGRVRTDataSource>
<!-- note that OGRVRTLayer name must be basename of source file -->
<OGRVRTLayer name="Canada2">
<SrcDataSource>Canada2.csv</SrcDataSource>
<GeometryType>wkbPoint</GeometryType>
<LayerSRS>EPSG:4326</LayerSRS>
<GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/>
<Field name="PostalCode" type="String" width="6" />
<Field name="Latitude" type="Real" />
<Field name="Longitude" type="Real" />
<Field name="City" type="String" width="60" />
<Field name="Province" type="String" width="2" />
</OGRVRTLayer>
</OGRVRTDataSource>
EOF
# create shapefile
ogr2ogr PostalCodes.shp Canada2.vrt
# clean up
rm -f Canada2.csv Canada2.vrt
Though the script is a bit Unix-centric, it’s just a simple list of instructions which could be run on any command line. What it does is add some headers to the geocoder.ca file, then sets up an OGR Virtual Format to convert the text into a fairly well-defined shapefile. When you use this shapefile, you should credit geocoder.ca as the ODbL requires.
When designing wind farms, you need to keep the turbines a certain distance apart from one another. If you don’t, the wakes from the turbines reduce efficiency, and the turbulence can reduce the warranted life of the machine. Typically, a manufacturer might specify a minimum downwind separation of 5 diameters, and a crosswind separation of 3 diameters. It’s an easy check with a buffer overlap, but these buffers are elliptical, which not all GIS packages can draw.
Take, for example, the following three points designated as (completely made-up) wind turbine locations:
These look quite far apart, even if you were using large, 100+m diameter wind turbines:
But if we have a wind direction of 210°, downwind/crosswind separation of 5D & 3D respectively, and a 101m diameter rotor, it’s not so good:
Turbines 2 & 3 are too close together; the ellipses shouldn’t touch.
As awk is the only scripting language I have on my work computer, I wrote the script that generates the buffer shapefile in awk. The script calls Frank’s Shapefile C Library utilities to actually make the shapefile. Here’s the code:
#!/bin/awk -f
# draw an ellipse based on turbine location to generate
# for WTG separation buffer
# scruss - 2011-09-27
# assumes that stdin has three columns:
# 1 - label
# 2 - x coordinate
# 3 - y coordinate
# variables:
# diameter = rotor diameter
# cross = crosswind separation, diameters
# down = downwind separation, diameters
# wind = prevailing wind direction
# base = base for shape file name
BEGIN {
OFMT="%.1f";
CONVFMT="%.1f";
OFS=" ";
if (diameter < 0) {
print "diameter must be set";
exit;
}
if (cross < 0) {
print "cross must be set";
exit;
}
if (down < 0) {
print "down must be set";
exit;
}
if (down < cross) {
print "down must be greater than cross";
exit;
}
if (wind < 0) {
print "wind must be set";
exit;
}
if (base ~ /^$/) {
print "base must be a string";
exit;
}
pi = 3.141592654; # I know, I know ...
# calc cartesian angle from wind bearing, in radians
beta = ((450 - wind)%360) * pi/180;
# output shapelib tools init commands
print "dbfcreate " base " -s name 40";
print "shpcreate " base " polygon";
}
# for every line
{
name=$1;
x=$2;
y=$3;
major = diameter * down/2;
minor = diameter * cross/2;
first="";
points="";
maxn=36;
for (i=0; i<maxn; i++) {
alpha = (i * (360/maxn)) * pi/180;
x1 = x + major * cos(alpha) * cos(beta) - minor * sin(alpha) * sin(beta);
y1 = y + major * cos(alpha) * sin(beta) + minor * sin(alpha) * cos(beta);
if (i == 0) { # store the first point
first= x1 " " y1;
}
points = points " " x1 " " y1;
}
points = points " " first;
print "dbfadd " base ".dbf " name;
print "shpadd " base, points;
}
awk is charmingly odd in that you can specify variable on the command line. Here’s how I called it, with the above coordinates as input:
After yesterday’s post, I went a bit nuts with working out the whole amateur radio grid locator thing (not that I’m currently likely to use it, though). I’d hoped to provide a shapefile of the entire world, but that would be too big for the format’s 2GB file size limit.
What I can give you, though, is:
A Perl program that will generate a shapefile of an entire Maidenhead grid field, down to the subsquare level: make_grid.pl. You’ll need Geo::Shapelib to make this work. 324 (= 182) of these files would cover the whole world, and at 8MB or so a pop, things get unwieldy quickly.
If anyone would like their grid square in Google Earth format, let me know, or read on …
Making KML Files
Several people have asked, so here’s how you convert to KML. You’ll need the OGR toolkit installed, which comes in several open-source geo software bundles: FWTools/osgeo4w/QGis. Let’s assume we want to make the grid square ‘EN’.
While most GIS applications work with delimited text inputs, sometimes you just have to have a shapefile. Amongst many other things, Frank Warmerdam wrote the Shapefile C Library, which comes with a few simple tools. I suspect Frank meant the little utilities to be code samples that wouldn’t see much use, but they do the job.
Let’s take the coordinates 43.73066°N, 79.26482°W from my first entry. I will make a single point shapefile with this coordinate.
First you have to make the SHP file and the DBF database:
dbfcreate junction -s Name 16
shpcreate junction point
This makes an empty shapefile for storing points, with one string field ‘Name’ of width 16 characters.
Now you have to add your point – this takes two stages, adding the database row, and then adding the geometry:
I’m going to use SpatiaLite and the Toronto One Address Repository to try some simple geocoding. That is, given an address, spit out the real-world map coordinates. As it happens, the way the Toronto data is structured it doesn’t really need to use any GIS functions, just some SQL queries. There are faster and better ways to code this, but I’m just showing you how to load up data and run simple queries.
SpatiaLite is my definition of magic. It’s an extension to the lovely SQLite database that allows you to work with spatial data – instead of selecting data within tables, you can select within polygons, or intersections with lines, or within a distance of a point.
I’m going to try to avoid having too many maps here, as maps are a snapshot of a particular view of a GIS at a certain time. Maps I can make; GIS is what I’m trying to learn.
So, download the data and load up SpatiaLite GUI. Here I’ve created a new database file. addresses.sqlite. I’m all ready to load the shapefile.
Shapefiles are messy things, and are definitely glaikit. Firstly, they’re a misnomer; a shapefile is really a bunch of files which need to be kept together. They’re also a really old format; the main information store is actually a dBaseIII database. They also have rather dodgy ways of handling projection metadata. For all their shortcomings, no-one’s come up with anything better that people actually use.
Projection information is important, because the world is inconveniently unflat. If you think of a projected X-Y coordinate system as a graph paper Post-It note stuck to a globe, the grid squares depend on where you’ve decided to stick the note. Also, really only the tiny flat part that’s sticking to the globe closely approximates to real-world coordinates.
Thankfully, the EPSG had a handle on all this projection information (and, likely, Post-It notes). Rather than using proprietary metadata files, they have a catalogue of numbers that exactly identify map projections. SpatiaLite uses these Spatial Reference System Identifiers (SRIDs) to keep different projections lined up.
Toronto says its address data is in ‘MTM 3 Degree Zone 10, NAD27’. That’s not a SRID. You can list all the SRIDs that SpatiaLite knows with:
select * from spatial_ref_sys
which returns over 3500 results.
As we know there’s an MTM (Modified Transverse Mercator) and a 27 in the title, we can narrow things down:
select srid,ref_sys_name from spatial_ref_sys where ref_sys_name like '%MTM%' and ref_sys_name like '%27%'
The results are a bit more manageable:
srid
ref_sys_name
2017
NAD27(76) / MTM zone 8
2018
NAD27(76) / MTM zone 9
2019
NAD27(76) / MTM zone 10
2020
NAD27(76) / MTM zone 11
2021
NAD27(76) / MTM zone 12
2022
NAD27(76) / MTM zone 13
2023
NAD27(76) / MTM zone 14
2024
NAD27(76) / MTM zone 15
2025
NAD27(76) / MTM zone 16
2026
NAD27(76) / MTM zone 17
32081
NAD27 / MTM zone 1
32082
NAD27 / MTM zone 2
32083
NAD27 / MTM zone 3
32084
NAD27 / MTM zone 4
32085
NAD27 / MTM zone 5
32086
NAD27 / MTM zone 6
So it looks like 2019 is our SRID. That last link goes to spatialreference.org, who maintain a handy guide to projections and SRIDs. (Incidentally, Open Toronto seems to use two different projections for its data – the other is ‘UTM 6 Degree Zone 17N NAD27’ with a SRID of 26717.)
So let’s load it:
This might take a while, as there are over 500,000 points in this data set.
If you want to use this data along with more complex geographic queries, add a Spatial Index by right-clicking on the Geometry table and ‘Build Spatial Index’. This will take a while again, and make the database file quite huge (128MB on my machine).
Update: there’s a much quicker way of doing this without messing with invproj in this comment.
Now we’re ready to geocode. I was at the Toronto Reference Library today, which is at 789 Yonge Street. Let’s find that location:
select easting, northing, address, lf_name, name, fcode_desc from TCL3_ADDRESS_POINT where lf_name like 'yonge%' and address=789
which gives:
EASTING
NORTHING
ADDRESS
LF_NAME
NAME
FCODE_DESC
313923.031
4836665.602
789
YONGE ST
Toronto Reference
Library
(for most places, NAME and FCODE_DESC are blank.)
Ooooh … but those coordinates don’t look anything like the degrees we had yesterday. We have to convert back to unprojected decimal degrees with my old friend, proj. If we store the northing, easting and a label in a file, we can get the get the geographic coordinates with:
Now that’s more like it: 43.671824°N, 79.386869°W. On a map, that’s:
Pretty close, eh?
Incidentally, I didn’t just magic up that weird invproj line. Most spatial databases use proj to convert between projections, and carry an extra column with the command line parameters. For our SRID of 2019, we can call it up with this:
select proj4text from spatial_ref_sys where srid=2019;
The mixed map projections are a bit of a pain, and there are reports that some of the data is skewed from the rest of the Canadian data, but there’s much to love about this data.
An absolute tonne of data, in vector and raster formats. Services I’ve used are CanVec (vector data covering almost every feature) and Toporama (raster topographic maps; it has an associated Toporama Web Map Service).