Sometimes, you just have to roll your own…

It was Doors Open Toronto last weekend, and the city published the locations as open data: Doors Open Toronto 2013. I thought I’d try to geocode it after Richard suggested we take a look. OpenStreetMap has the Nominatim geocoder, which you can use freely as long as you accept restrictions on bulk queries.

As a good and lazy programmer, I first tried to find pre-built modules. Mistake #1; they weren’t up to snuff:

  • Perl’s Geo::Coder::Many::OSM would only read from OSM‘s server. MapQuest run their own mirror as part of their great MapQuest Open Platform Web Services suite, and they have almost no limitation on query volume. OSM runs their operation on a shoestring, and too many queries gets you the disapproval face, or worse.
  • Python’s geopy gave spurious results amid copious whiny error messages.
    (Standard operational procedure for python, then… ☺)

So I rolled my own, using nowt but the Nominatim Search Service Developer’s Guide, and good old simple modules like URI::Escape, LWP::Simple, and JSON::XS. Much to my surprise, it worked!

Much as I love XML, it’s a bit hard to read as a human, so I smashed the Doors Open data down to simple pipe-separated text: dot.txt. Here’s my code, ever so slightly specialized for searching in Toronto:

#!/usr/bin/perl -w
# geonom.pl - geocode pipe-separated addresses with nominatim
# created by scruss on 02013/05/28

use strict;
use URI::Escape;
use LWP::Simple;
use JSON::XS;

# the URL for OpenMapQuest's Nominatim service
use constant BASEURI => 'http://open.mapquestapi.com/nominatim/v1/search.php';

# read pipe-separated values from stdin
# two fields: Site Name, Street Address
while (<>) {
    chomp;
    my ( $name, $address ) = split( '\|', $_, 2 );
    my %query_hash = (
        format  => 'json',
        street  => cleanaddress($address),    # decruft address a bit
                                              # You'll want to change these ...
        city    => 'Toronto',                 # fixme
        state   => 'ON',                      # fixme
        country => 'Canada',                  # fixme
        addressdetails => 0,                  # just basic results
        limit          => 1,                  # only want first result
             # it's considered polite to put your e-mail address in to the query
             # just so the server admins can get in touch with you
        email => 'me@mydomain.com',    # fixme

        # limit the results to a box (quite a bit) bigger than Toronto
        bounded => 1,
        viewbox => '-81.0,45.0,-77.0,41.0'    # left,top,right,bottom - fixme
    );

    # get the result from Nominatim, and decode it to a hashref
    my $json = get( join( '?', BASEURI, escape_hash(%query_hash) ) );
    my $result = decode_json($json);
    if ( scalar(@$result) > 0 ) {             # if there is a result
        print join(
            '|',    # print result as pipe separated values
            $name, $address,
            $result->[0]->{lat},
            $result->[0]->{lon},
            $result->[0]->{display_name}
          ),
          "\n";
    }
    else {          # no result; just echo input
        print join( '|', $name, $address ), "\n";
    }
}
exit;

sub escape_hash {

    # turn a hash into escaped string key1=val1&key2=val2...
    my %hash = @_;
    my @pairs;
    for my $key ( keys %hash ) {
        push @pairs, join( "=", map { uri_escape($_) } $key, $hash{$key} );
    }
    return join( "&", @pairs );
}

sub cleanaddress {

    # try to clean up street addresses a bit
    # doesn't understand proper 'Unit-Number' Canadian addresses tho.
    my $_ = shift;
    s/Unit.*//;     # shouldn't affect result
    s/Floor.*//;    # won't affect result
    s/\s+/ /g;      # remove extraneous whitespace
    s/ $//;
    s/^ //;
    return $_;
}

It quickly became apparent that the addresses had been entered by hand, and weren’t going to geocode neatly. Here are some examples of the bad ones:

  • 200 University Ave St W — it’s an avenue, not a street, and it runs north-south
  • 2087 Davenport Road (Rear House) Rd Unit: Rear — too many rears
  • 21 Colonel Sameul Smith Park Dr — we can’t fix typos
  • 0 Construction Trailer:Lower Simcoe at Lakeshore Blvd — 0? Zero??? What are you, some kinda python programmer?

Curiously, some (like the address for Black Creek Pioneer Village) were right, but just not found. Since the source was open data, I put the right address into OpenStreetMap, so for next year, typos aside, we should be able to find more events.

Now, how accurate were the results? Well, you decide:

2 thoughts on “Sometimes, you just have to roll your own…”

  1. No, I did try it at the time, and it didn’t return useful results from OpenMapquest. It seems to do so now.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>