uMap is neat. It allows you to trace places, routes and areas on top of OpenStreetMap tiles, and then publish/share the results.
(Here’s a full-screen link to my work of cartographic genius.)
It was Doors Open Toronto last weekend, and the city published the locations as open data: Doors Open Toronto 2013. I thought I’d try to geocode it after Richard suggested we take a look. OpenStreetMap has the Nominatim geocoder, which you can use freely as long as you accept restrictions on bulk queries.
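On those bulk-query restrictions: here's a minimal sketch of spacing out requests, assuming a gap of one second between lookups is acceptable (check the usage policy of whichever Nominatim instance you point at). `make_throttle` is a made-up helper name, not something from Nominatim or from the script in this post, and the addresses are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);    # core module; fractional-second time/sleep

# hypothetical helper: returns a closure that blocks until at least
# $min_gap seconds have passed since the previous call
sub make_throttle {
    my ($min_gap) = @_;
    my $last = 0;
    return sub {
        my $wait = $last + $min_gap - time();
        sleep($wait) if $wait > 0;
        $last = time();
        return;
    };
}

# assumption: one second between requests; adjust to the service's policy
my $throttle = make_throttle(1.0);
for my $address ( '1 Example St', '2 Example Ave' ) {    # placeholder input
    $throttle->();
    # ... the geocoding request for $address would go here ...
    print "would look up: $address\n";
}
```

Calling `$throttle->()` just before each request keeps the delay in one place, so the rest of a lookup loop doesn't have to care about timing.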
As a good and lazy programmer, I first tried to find pre-built modules. Mistake #1; they weren’t up to snuff:
So I rolled my own, using nowt but the Nominatim Search Service Developer’s Guide, and good old simple modules like URI::Escape, LWP::Simple, and JSON::XS. Much to my surprise, it worked!
Much as I love XML, it’s a bit hard to read as a human, so I smashed the Doors Open data down to simple pipe-separated text: dot.txt. Here’s my code, ever so slightly specialized for searching in Toronto:
#!/usr/bin/perl -w
# geonom.pl - geocode pipe-separated addresses with nominatim
# created by scruss on 02013/05/28
use strict;
use URI::Escape;
use LWP::Simple;
use JSON::XS;

# the URL for OpenMapQuest's Nominatim service
use constant BASEURI => 'http://open.mapquestapi.com/nominatim/v1/search.php';

# read pipe-separated values from stdin
# two fields: Site Name, Street Address
while (<>) {
    chomp;
    my ( $name, $address ) = split( '\|', $_, 2 );
    my %query_hash = (
        format  => 'json',
        street  => cleanaddress($address),    # decruft address a bit
        # You'll want to change these ...
        city    => 'Toronto',                 # fixme
        state   => 'ON',                      # fixme
        country => 'Canada',                  # fixme
        addressdetails => 0,                  # just basic results
        limit          => 1,                  # only want first result
        # it's considered polite to put your e-mail address in to the query
        # just so the server admins can get in touch with you
        email => 'me@mydomain.com',           # fixme
        # limit the results to a box (quite a bit) bigger than Toronto
        bounded => 1,
        viewbox => '-81.0,45.0,-77.0,41.0'    # left,top,right,bottom - fixme
    );

    # get the result from Nominatim, and decode it to a ref to an array of hashes
    my $json = get( join( '?', BASEURI, escape_hash(%query_hash) ) );

    # get() returns undef if the request fails, and decode_json dies on
    # bad JSON, so treat either case as "no result"
    my $result = defined $json ? eval { decode_json($json) } : undef;
    if ( ref($result) eq 'ARRAY' && @$result > 0 ) {    # if there is a result
        print join(
            '|',    # print result as pipe separated values
            $name, $address,
            $result->[0]->{lat},
            $result->[0]->{lon},
            $result->[0]->{display_name}
          ),
          "\n";
    }
    else {          # no result; just echo input
        print join( '|', $name, $address ), "\n";
    }
}
exit;

sub escape_hash {
    # turn a hash into escaped string key1=val1&key2=val2...
    my %hash = @_;
    my @pairs;
    for my $key ( keys %hash ) {
        push @pairs, join( "=", map { uri_escape($_) } $key, $hash{$key} );
    }
    return join( "&", @pairs );
}

sub cleanaddress {
    # try to clean up street addresses a bit
    # doesn't understand proper 'Unit-Number' Canadian addresses tho.
    # (uses a named variable: lexical 'my $_' no longer compiles on Perl 5.24+)
    my $addr = shift;
    $addr =~ s/Unit.*//;     # shouldn't affect result
    $addr =~ s/Floor.*//;    # won't affect result
    $addr =~ s/\s+/ /g;      # remove extraneous whitespace
    $addr =~ s/ $//;
    $addr =~ s/^ //;
    return $addr;
}
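If you want to poke at the address clean-up on its own, here's a small standalone sketch. It copies the substitutions from cleanaddress into a sub with a named variable (so it runs on current Perls) and feeds it a few addresses. The sample addresses are invented for illustration and are not from the Doors Open data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# standalone copy of the cleanaddress substitutions, for experimenting
sub clean_address {
    my $addr = shift;
    $addr =~ s/Unit.*//;     # drop 'Unit' and everything after it
    $addr =~ s/Floor.*//;    # drop 'Floor' and everything after it
    $addr =~ s/\s+/ /g;      # collapse runs of whitespace
    $addr =~ s/ $//;         # trim one trailing space
    $addr =~ s/^ //;         # trim one leading space
    return $addr;
}

# made-up sample addresses
my @samples = (
    '  55 Example St   Floor 3',
    '100 Example Ave Unit 12',
    '12 Unity Rd',    # caution: 'Unit.*' also eats 'Unity Rd'
);
for my $sample (@samples) {
    printf "[%s] -> [%s]\n", $sample, clean_address($sample);
}
```

The last sample shows a limit of the simple pattern: any street name starting with "Unit" gets truncated. Anchoring it as `\bUnit\b` might be one way around that, though I haven't tried it against the real data.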
It quickly became apparent that the addresses had been entered by hand, and weren’t going to geocode neatly. Here are some examples of the bad ones:
Curiously, some (like the address for Black Creek Pioneer Village) were right, but just not found. Since the source was open data, I put the right address into OpenStreetMap, so for next year, typos aside, we should be able to find more events.
Now, how accurate were the results? Well, you decide: