
Sometimes, you just have to roll your own…

It was Doors Open Toronto last weekend, and the city published the locations as open data: Doors Open Toronto 2013. I thought I’d try to geocode it after Richard suggested we take a look. OpenStreetMap has the Nominatim geocoder, which you can use freely as long as you accept restrictions on bulk queries.

As a good and lazy programmer, I first tried to find pre-built modules. Mistake #1; they weren’t up to snuff:

  • Perl’s Geo::Coder::Many::OSM would only read from OSM’s server. MapQuest run their own mirror as part of their great MapQuest Open Platform Web Services suite, and they have almost no limitation on query volume. OSM runs their operation on a shoestring, and too many queries gets you the disapproval face, or worse.
  • Python’s geopy gave spurious results amid copious whiny error messages.
    (Standard operational procedure for python, then… ☺)

So I rolled my own, using nowt but the Nominatim Search Service Developer’s Guide, and good old simple modules like URI::Escape, LWP::Simple, and JSON::XS. Much to my surprise, it worked!

Much as I love XML, it’s a bit hard to read as a human, so I smashed the Doors Open data down to simple pipe-separated text: dot.txt. Here’s my code, ever so slightly specialized for searching in Toronto:

#!/usr/bin/perl -w
# geonom.pl - geocode pipe-separated addresses with nominatim
# created by scruss on 02013/05/28

use strict;
use URI::Escape;
use LWP::Simple;
use JSON::XS;

# the URL for OpenMapQuest's Nominatim service
use constant BASEURI => 'http://open.mapquestapi.com/nominatim/v1/search.php';

# read pipe-separated values from stdin
# two fields: Site Name, Street Address
while (<>) {
    chomp;
    my ( $name, $address ) = split( '\|', $_, 2 );
    my %query_hash = (
        format  => 'json',
        street  => cleanaddress($address),    # decruft address a bit
                                              # You'll want to change these ...
        city    => 'Toronto',                 # fixme
        state   => 'ON',                      # fixme
        country => 'Canada',                  # fixme
        addressdetails => 0,                  # just basic results
        limit          => 1,                  # only want first result
             # it's considered polite to put your e-mail address into the query
             # just so the server admins can get in touch with you
        email => 'me@mydomain.com',    # fixme

        # limit the results to a box (quite a bit) bigger than Toronto
        bounded => 1,
        viewbox => '-81.0,45.0,-77.0,41.0'    # left,top,right,bottom - fixme
    );

    # get the result from Nominatim, and decode it to a hashref
    my $json = get( join( '?', BASEURI, escape_hash(%query_hash) ) );
    my $result = decode_json($json);
    if ( scalar(@$result) > 0 ) {             # if there is a result
        print join(
            '|',    # print result as pipe separated values
            $name, $address,
            $result->[0]->{lat},
            $result->[0]->{lon},
            $result->[0]->{display_name}
          ),
          "\n";
    }
    else {          # no result; just echo input
        print join( '|', $name, $address ), "\n";
    }
}
exit;

sub escape_hash {

    # turn a hash into escaped string key1=val1&key2=val2...
    my %hash = @_;
    my @pairs;
    for my $key ( keys %hash ) {
        push @pairs, join( "=", map { uri_escape($_) } $key, $hash{$key} );
    }
    return join( "&", @pairs );
}

sub cleanaddress {

    # try to clean up street addresses a bit
    # doesn't understand proper 'Unit-Number' Canadian addresses tho.
    local $_ = shift;    # ('my $_' is not allowed in recent Perls)
    s/Unit.*//;     # shouldn't affect result
    s/Floor.*//;    # won't affect result
    s/\s+/ /g;      # remove extraneous whitespace
    s/ $//;
    s/^ //;
    return $_;
}
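
The input is one venue per line, site name and street address separated by a pipe (these two lines are illustrative, not necessarily from the real dot.txt):

Art Gallery of Ontario|317 Dundas St W
R.C. Harris Water Treatment Plant|2701 Queen St E

A typical run is just:

./geonom.pl dot.txt > dot-geocoded.txt

Each output line carries the original two fields plus latitude, longitude and Nominatim’s display_name, still pipe-separated.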

It quickly became apparent that the addresses had been entered by hand, and weren’t going to geocode neatly. Here are some examples of the bad ones:

  • 200 University Ave St W — it’s an avenue, not a street, and it runs north-south
  • 2087 Davenport Road (Rear House) Rd Unit: Rear — too many rears
  • 21 Colonel Sameul Smith Park Dr — we can’t fix typos
  • 0 Construction Trailer:Lower Simcoe at Lakeshore Blvd — 0? Zero??? What are you, some kinda python programmer?

Curiously, some (like the address for Black Creek Pioneer Village) were right, but just not found. Since the source was open data, I put the right address into OpenStreetMap, so for next year, typos aside, we should be able to find more events.

Now, how accurate were the results? Well, you decide:


Georeferencing QGIS output rasters

Some of the design packages I rely on use very crude GIS facilities. In fact, all they can support is a georeferenced raster as a background image, so it’s more of a rough map than GIS. It helps if these rasters are at a decent resolution, typically 1m/pixel or better.

A while back, I asked on the QGIS forum if the package could output high resolution georeferenced rasters. I received a rather terse response that no, it couldn’t (and I inferred from the tone that the poster thought that it shouldn’t, and I was wrong to want such a thing). I shelved the idea at the time.

After having to fix a lot of paths in a QGIS project file I’d moved to a new system, I noticed that all the map composer attributes are rather neatly defined in the XML file structure. Some messing around with Perl, XML::Simple and Data::Dumper::Simple, and I had a little script that would spit out an ESRI World File for the map composer raster defined in the project.
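
For reference, a world file is just six numbers, one per line: the x pixel size, the two rotation terms, the (negative) y pixel size, and the map coordinates of the centre of the top-left pixel; that is exactly what the script prints at the end. A made-up example for a 1 m/pixel UTM raster:

1.000000000000
0.000000000000
0.000000000000
-1.000000000000
630000.500000000000
4843000.500000000000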

To run this, you have to create a project with just one Print Composer page, save the composed map as an image, save the project, then run the script like this:

./geoprint.pl project.qgs > image.pngw

There are some caveats:

  • This probably won’t work for projects with multiple print composers
  • It doesn’t quite get the scale right, but it’s within a pixel or so. I may not have corrected for image borders.

Though there’s some fairly hideous XML-mungeing in the code, what the script does is entirely trivial. If you feel you can use it, good; if you feel you can improve it, be my guest.

#!/usr/bin/perl -w
# geoprint - georef a QGIS output image by creating a world file
# one arg: qgis project file (xml)
# $Id: geoprint.pl,v 1.3 2012/04/06 03:32:01 scruss Exp $

use strict;
use XML::Simple;
use constant MM_PER_INCH => 25.4;

my $qgis = $ARGV[0];
die "$qgis must exist\n" unless ( -f $qgis );

my $q = XMLin($qgis) or die "$!\n";
my %composer = %{ $q->{Composer} };
my $image_width =
  int( $composer{Composition}->{paperWidth} *
    $composer{Composition}->{printResolution} /
    MM_PER_INCH );
my $image_height =
  int( $composer{Composition}->{paperHeight} *
    $composer{Composition}->{printResolution} /
    MM_PER_INCH );

# we need xpixelsize, ypixelsize, ulx and uly
my $xpixelsize =
  ( $composer{ComposerMap}->{Extent}->{xmax} -
    $composer{ComposerMap}->{Extent}->{xmin} ) /
  int( $composer{ComposerMap}->{ComposerItem}->{width} *
    $composer{Composition}->{printResolution} /
    MM_PER_INCH );
my $ypixelsize =
  -1.0 *
  ( $composer{ComposerMap}->{Extent}->{ymax} -
    $composer{ComposerMap}->{Extent}->{ymin} ) /
  int( $composer{ComposerMap}->{ComposerItem}->{height} *
    $composer{Composition}->{printResolution} /
    MM_PER_INCH );
my $ulx =
  $composer{ComposerMap}->{Extent}->{xmin} -
  $xpixelsize *
  int( $composer{ComposerMap}->{ComposerItem}->{x} *
    $composer{Composition}->{printResolution} /
    MM_PER_INCH ) - $xpixelsize;
my $uly =
  $composer{ComposerMap}->{Extent}->{ymax} -
  $ypixelsize *
  int( $composer{ComposerMap}->{ComposerItem}->{y} *
    $composer{Composition}->{printResolution} /
    MM_PER_INCH ) - $ypixelsize;

printf( "%.12f\n%.12f\n%.12f\n%.12f\n%.12f\n%.12f\n",
  $xpixelsize, 0.0, 0.0, $ypixelsize, $ulx, $uly );

# FIXME? pixel scale seems a tiny bit off - allow for border?
exit;

Ham Radio log to interactive OpenStreetMap

You might notice that there’s now a Ham Radio QSO Map lurking on the front page, thanks to the WordPress OpenStreetMap plugin (which I’ve slightly abused before). Here’s a small piece of Perl which will take your ADIF log and convert it to a WP-OSM marker file.

Note that this program assumes you’ve downloaded your log from QRZ.com, as it requires the locator field for both inbound and outbound stations.
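
For reference, the fields the script cares about look like this in a single ADIF record (the call sign and grid squares here are placeholders):

<call:6>VE3ABC<gridsquare:6>FN03hp<freq:5>7.030<band:3>40m<mode:2>CW<qso_date:8>20110619<time_on:4>1523<my_gridsquare:6>FN03ir<station_callsign:6>VA3PID<eor>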

#!/usr/bin/perl -w
# adif2osm - convert ADIF log to OSM map file
# scruss.com / VA3PID - 2011/06/19

use strict;
use constant MARKERDIR =>
  'https://glaikit.org/wp-content/plugins/osm/icons/';
use constant QRZURL => 'http://qrz.com/db/';
sub maidenhead2latlong;

my ( $temp, @results ) = '';

### Fast forward past header
while (<>) {
  last if m/<eoh>\s+$/i;
}

### While there are records remaining...
while (<>) {
  $temp .= $_;

  ### Process if end of record tag reached
  if (m/<eor>\s+$/i) {
    my %hash;
    $temp =~ s/\n//g;
    $temp =~ s/<eoh>.*//i;
    $temp =~ s/<eor>.*//i;
    my @arr = split( '<', $temp );
    foreach (@arr) {
      next if (/^$/);
      my ( $key, $val ) = split( '>', $_ );
      $key =~ s/:.*$//;
      $hash{ lc($key) } = $val unless ( $key eq '' );
    }
    push @results, \%hash;
    $temp = '';
  }
}

# generate OSM plugin file
my @data = ();
my ( $mygrid, $station_callsign ) = '';

# output header
print
  join( "\t", qw/lat lon title description icon iconSize iconOffset/ ),
  "\n";
foreach (@results) {
  next unless ( exists( $_->{gridsquare} ) && exists( $_->{call} ) );
  $mygrid = $_->{my_gridsquare}
    if ( exists( $_->{my_gridsquare} ) );
  $station_callsign = $_->{station_callsign}
    if ( exists( $_->{station_callsign} ) );

  push @data, $_->{freq} . ' MHz' if ( exists( $_->{freq} ) );
  $data[$#data] .= ' (' . $_->{band} . ')' if ( exists( $_->{band} ) );
  push @data, $_->{mode} if ( exists( $_->{mode} ) );
  push @data, $_->{qso_date} . ' ' . $_->{time_on} . 'Z'
    if ( exists( $_->{qso_date} ) && exists( $_->{time_on} ) );
  my ( $lat, $long ) = maidenhead2latlong( $_->{gridsquare} );
  print join( "\t",
    $lat,
    $long,
    '<a href="' . QRZURL . $_->{call} . '">' . $_->{call} . '</a>',
    join( ' - ', @data ),
    MARKERDIR . 'wpttemp-green.png',
    '0,-24' ),
    "\n";

  @data = ();
}

# show home station last, so it's on top
my ( $lat, $long ) = maidenhead2latlong($mygrid);
print join( "\t",
  $lat,
  $long,
  '<a href="'
    . QRZURL
    . $station_callsign . '">'
    . $station_callsign . '</a>',
  'Home Station',
  MARKERDIR . 'wpttemp-red.png',
  '0,-24' ),
  "\n";

exit;

sub maidenhead2latlong {

  # convert a Maidenhead Grid location (eg FN03ir)
  #  to decimal degrees
  # this code could be cleaner/shorter/clearer
  my @locator =
    split( //, uc(shift) );    # convert arg to upper case array
  my $lat      = 0;
  my $long     = 0;
  my $latdiv   = 0;
  my $longdiv  = 0;
  my @divisors = ( 72000, 36000, 7200, 3600, 300, 150 )
    ;                          # long,lat field size in seconds
  my $max = ( $#locator > $#divisors ) ? $#divisors : $#locator;

  for ( my $i = 0 ; $i <= $max ; $i++ ) {
    if ( int( $i / 2 ) % 2 ) {    # numeric
      if ( $i % 2 ) {             # lat
        $latdiv = $divisors[$i];    # save for later
        $lat += $locator[$i] * $latdiv;
      }
      else {                        # long
        $longdiv = $divisors[$i];
        $long += $locator[$i] * $longdiv;
      }
    }
    else {                          # alpha
      my $val = ord( $locator[$i] ) - ord('A');
      if ( $i % 2 ) {               # lat
        $latdiv = $divisors[$i];    # save for later
        $lat += $val * $latdiv;
      }
      else {                        # long
        $longdiv = $divisors[$i];
        $long += $val * $longdiv;
      }
    }
  }
  $lat  += ( $latdiv / 2 );         # location of centre of square
  $long += ( $longdiv / 2 );
  return ( ( $lat / 3600 ) - 90, ( $long / 3600 ) - 180 );
}
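
A typical run (the log file and output names are just examples) is:

./adif2osm mylog.adi > qso_markers.txt

and the resulting tab-separated file is what the plugin’s marker_file URL points at.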

You’ll need to update MARKERDIR to reflect your own WP-OSM installation. Mine might move, so if you don’t change it, and you don’t get markers, please don’t blame me.

The basic code to include a map is like this:
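
(The attribute names below, apart from marker_file, are indicative of the WP-OSM shortcode syntax and may differ between plugin versions; the coordinates, size and URL are placeholders.)

[osm_map lat="43.72" long="-79.29" zoom="9" width="600" height="450" marker_file="https://glaikit.org/path/to/qso_markers.txt"]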

You’ll need to change the marker_file URL, too.

Note that, while this script generates links into the QRZ callsign database, it doesn’t hit that site unless you click a link.


More radio amateur grid squares

Toronto, as understood by the Maidenhead Locator system

After yesterday’s post, I went a bit nuts with working out the whole amateur radio grid locator thing (not that I’m currently likely to use it, though). I’d hoped to provide a shapefile of the entire world, but that would be too big for the format’s 2GB file size limit.

What I can give you, though, is:

  • A Perl program that will generate a shapefile of an entire Maidenhead grid field, down to the subsquare level: make_grid.pl. You’ll need Geo::Shapelib to make this work. 324 (= 18²) of these files would cover the whole world, and at 8MB or so a pop, things get unwieldy quickly.
  • A Google Earth KML file covering the whole world in 20° by 10° grid fields: Maidenhead_Locator_World_Grid. (If you’re feeling nerdy, here it is in Shapefile format: Maidenhead_Locator_World_Grid-shp).

If anyone would like their grid square in Google Earth format, let me know, or read on …

Making KML Files

Several people have asked, so here’s how you convert to KML. You’ll need the OGR toolkit installed, which comes in several open-source geo software bundles: FWTools, OSGeo4W, or QGis. Let’s assume we want to make the grid field ‘EN’.

    1. Run make_grid.pl:
      make_grid.pl en
    2. Convert to KML using ogr2ogr:
      ogr2ogr -f KML EN-maidenhead_grid.kml EN-maidenhead_grid.shp
    3. Alternatively, if you just want to extract a square (say EN82), you can use ogr2ogr’s ‘where’ clause to select just the geometry you want:
      ogr2ogr -f KML -where "Square='82'" EN82-maidenhead_grid.kml EN-maidenhead_grid.shp

 


Maidenhead Grid Locator, in Perl

Amateur radio operators sometimes use the Maidenhead Grid Locator Square system to identify location. Here’s my attempt at a translator, written in slightly squirrelly Perl:

#!/usr/bin/perl -w
# maidenhead - 6 character locator
# created by scruss/VA3PID on 02011/04/01
# RCS/CVS: $Id: maidenhead,v 1.3 2011/04/03 11:04:38 scruss Exp $

use strict;
sub latlong2maidenhead;
sub maidenhead2latlong;

if ( $#ARGV >= 1 ) {    # two or more args
                        # convert lat/long to grid
  print latlong2maidenhead( $ARGV[0], $ARGV[1] ), "\n";
}
elsif ( $#ARGV == 0 ) {    # one arg
                           # convert grid to lat/long
  print join( ", ", maidenhead2latlong( $ARGV[0] ) ), "\n";
}
else {                     # no args

  # print usage
  print 'Usage: ', $0, ' dd.dddddd ddd.dddddd', "\n",
    ' or', "\n",
    $0, ' grid_location', "\n";
}
exit;

sub maidenhead2latlong {

  # convert a Maidenhead Grid location (eg FN03ir)
  #  to decimal degrees
  # this code could be cleaner/shorter/clearer
  my @locator =
    split( //, uc(shift) );    # convert arg to upper case array
  my $lat      = 0;
  my $long     = 0;
  my $latdiv   = 0;
  my $longdiv  = 0;
  my @divisors = ( 72000, 36000, 7200, 3600, 300, 150 )
    ;                          # long,lat field size in seconds
  my $max = ( $#locator > $#divisors ) ? $#divisors : $#locator;

  for ( my $i = 0 ; $i <= $max ; $i++ ) {
    if ( int( $i / 2 ) % 2 ) {    # numeric
      if ( $i % 2 ) {             # lat
        $latdiv = $divisors[$i];    # save for later
        $lat += $locator[$i] * $latdiv;
      }
      else {                        # long
        $longdiv = $divisors[$i];
        $long += $locator[$i] * $longdiv;
      }
    }
    else {                          # alpha
      my $val = ord( $locator[$i] ) - ord('A');
      if ( $i % 2 ) {               # lat
        $latdiv = $divisors[$i];    # save for later
        $lat += $val * $latdiv;
      }
      else {                        # long
        $longdiv = $divisors[$i];
        $long += $val * $longdiv;
      }
    }
  }
  $lat  += ( $latdiv / 2 );         # location of centre of square
  $long += ( $longdiv / 2 );
  return ( ( $lat / 3600 ) - 90, ( $long / 3600 ) - 180 );
}

sub latlong2maidenhead {

  # convert a WGS84 coordinate in decimal degrees
  #  to a Maidenhead grid location
  my ( $lat, $long ) = @_;
  my @divisors =
    ( 72000, 36000, 7200, 3600, 300, 150 );    # field size in seconds
  my @locator = ();

  # add false easting and northing, convert to seconds
  $lat  = ( $lat + 90 ) * 3600;
  $long = ( $long + 180 ) * 3600;
  for ( my $i = 0 ; $i < 3 ; $i++ ) {
    foreach ( $long, $lat ) {
      my $div  = shift(@divisors);
      my $part = int( $_ / $div );
      if ( $i == 1 ) {    # do the numeric thing for 2nd pair
        push @locator, $part;
      }
      else {              # character thing for 1st and 3rd pair
        push @locator,
          chr( ( ( $i < 1 ) ? ord('A') : ord('a') ) + $part );
      }
      $_ -= ( $part * $div );    # leaves remainder in $long or $lat
    }
  }
  return join( '', @locator );
}
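
It works in both directions; using the example square from the comments (output trimmed to five decimal places here, the script prints more):

$ ./maidenhead FN03ir
43.72917, -79.29167

$ ./maidenhead 43.73 -79.29
FN03ir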

Grid square to lat/long and grid square shapefiles to follow.


Got a receiver inside my head – writing Spectrum Direct data as CSV

Industry Canada publishes the locations of all licensed radio spectrum users on Spectrum Direct. You can find all the transmitters/receivers near you by using its Geographical Area Search. And there are a lot near me:

While Spectrum Direct’s a great service, it has three major usability strikes against it:

  1. You can’t search by address or postal code; you need to know your latitude and longitude. Not just that, it expects your coordinates as an integer of the format DDMMSS (there’s a small conversion sketch after this list).
  2. It’s very easy to overwhelm the system. Where I live, I can pretty much search for only 5km around me before the system times out.
  3. The output formats aren’t very useful. You can either get massively verbose XML, or very long lines of undelimited text, and neither of these is very easy to work with.
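
Going from decimal degrees to that DDMMSS integer is at least easy; a minimal sketch (positive coordinates only, seconds truncated):

#!/usr/bin/perl -w
# deg2ddmmss - decimal degrees to the DDMMSS integer Spectrum Direct wants
use strict;
my $deg = shift;    # e.g. 43.7292
my $d   = int($deg);
my $m   = int( ( $deg - $d ) * 60 );
my $s   = int( ( ( $deg - $d ) * 60 - $m ) * 60 );
printf "%d%02d%02d\n", $d, $m, $s;    # 43.7292 -> 434345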

Never fear, Perl is here! I wrote a tiny script that glues Dave O’Neill’s Parse::SpectrumDirect::RadioFrequency module (can you guess what it does?) to Robbie Bow’s Text::CSV::Slurp module. The latter is used to blort out the former’s results to a CSV file that you can load into any GIS/mapping system.

Here’s the code:

#!/usr/bin/perl -w
# spectest.pl - generate CSV from Industry Canada Spectrum Direct data
# created by scruss on 02010/10/29 - for https://glaikit.org/

# usage: spectest.pl geographical_area.txt > outfile.csv

use strict;
use Parse::SpectrumDirect::RadioFrequency;
use Text::CSV::Slurp;
use constant MINLAT => 40.0;    # all of Canada is >40 deg N, for checking

my $prefetched_output = '';

# get the whole file as a string
while (<>) {
    $prefetched_output .= $_;
}

my $parser = Parse::SpectrumDirect::RadioFrequency->new();

# magically parse Spectrum Direct file
$parser->parse($prefetched_output) or die "$!\n";
my $legend_hash = $parser->get_legend();    # get column descriptions
my @keys        = ();
foreach (@$legend_hash) {

    # retrieve column keys in order so the output will resemble input
    push @keys, $_->{key};
}

# get the data in a ref to an array of hashes
my $stations = $parser->get_stations();

my @good_stations = ();

# clean out bad values
foreach (@$stations) {
    next if ( $_->{Latitude} < MINLAT );
    push @good_stations, $_;
}

# create csv file in memory then print it
my $csv = Text::CSV::Slurp->create(
    input       => \@good_stations,
    field_order => \@keys
);
print $csv;
exit;

The results aren’t perfect; QGis boaked on a file it made where one of the records appeared to have line breaks in it. It could also stand to filter out multiple pieces of equipment at the same call sign location. But it works, mostly, which is good enough for me.


three norths to choose from

North is a slippery thing, a trickster. There are at least three norths to choose from:

  • Magnetic north – where the compass needle points. Currently way in the northwest of Canada, it’s scooting rapidly towards Siberia. Come baaack, magnetic north! Was it something we said?
  • True north – straight up and down on the axis of this planet.
  • Grid north – the vertical lines on your map. Due to the earth’s curvature, there’s only one line on a grid that exactly coincides with true north. Away from that line, there’s a correction that you need to apply.

That grid-to-true correction is what I’m looking at today. In my work, I need to work out where shadows of wind turbines fall. I work in grid coordinates, so I need to correct for true north sun angles. Here’s a shell script that uses the familiar PROJ programs invproj and geod to work out the local true north correction: it takes two points 1000 grid metres due north and south of where you are, inverse-projects them back to geographic coordinates, and asks geod for the true azimuth between them; that azimuth is how far grid north leans away from true north:

#!/bin/sh
# gridnorth.sh - show the grid shift angle at a certain point
# created by scruss on Sun 4 Apr 2010 18:19:29 EDT
# $Id: gridnorth.sh,v 1.2 2010/04/05 12:23:57 scruss Exp $

# you might want to change this
srid=2958

# two arguments mandatory, third optional
if
 [ $# -lt 2 ]
then
 echo "Usage: $0 x y [srid]"
 exit 1
fi

# truncate arguments to integers so the shell maths will work
x=`echo $1 | sed 's/\.[0-9]*//;'`
y=`echo $2 | sed 's/\.[0-9]*//;'`
# calculate points 1000 units (grid) south and north of the chosen point
lowy=$((y-1000))
highy=$((y+1000))

if
 [ $# -eq 3 ]
then
 srid=$3
fi

min=`echo $lowy  $x foo | invproj -r -f "%.6f" +init=EPSG:$srid`
max=`echo $highy $x foo | invproj -r -f "%.6f" +init=EPSG:$srid`
minlon=`echo $min | awk '{print $1;}'`
minlat=`echo $min | awk '{print $2;}'`
maxlon=`echo $max | awk '{print $1;}'`
maxlat=`echo $max | awk '{print $2;}'`
echo $minlat $minlon $maxlat $maxlon |\
 geod +ellps=WGS84 -f "%.3f" -p -I +units=m |\
 awk '{print $1;}'

The script is called with two arguments (the easting and the northing) and an optional third argument, the SRID for the current projection:

$ ./gridnorth.sh 639491 4843599 26717
1.198

The value returned is the angle (clockwise) that grid north differs from true north. Unless you’re using the same SRID that I’ve put in the script as default, you probably want to specify (or change) it.

So how do I know if this is even remotely correct? If you look at any CanMatrix scanned topographic map (either Print Ready or Georeferenced), there’s a declination indicator shown:

That’s the map eastern Toronto is in; its sheet reference is 030M11. We’ll ignore the magnetic declination for now, as 1984 is way out of date, and there is much to be said on geomagnetism that I don’t understand.  The map says its centre is 1° 12′ clockwise of true north. The centre of the map is approximately 641500E, 4831000N (UTM Zone 17N NAD27; EPSG 26717; or in real terms, in the lake, a few klicks SE of Ashbridges Bay) so:

$ ./gridnorth.sh 641500 4831000 26717
1.209

1° 12′ is 1.2, so I think I’m officially pretty durn close.

To show that grid skew changes across the map, the upper right hand corner of the map is 660000E, 4845000N, which is off by 1.375°.
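
Or, running the script for that corner:

$ ./gridnorth.sh 660000 4845000 26717
1.375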


Toronto’s Human Centre, part 2: by neighbourhood

Beware throwaway comments; that way overanalysis lies. This was a challenge.

Taking the 2006 neighbourhoods population into account, the human centre of Toronto is at 43.717955°N, 79.389828°W …

… pretty close to the one I’ve already worked out by ward.
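
The arithmetic behind it is nothing more than a population-weighted average of the neighbourhood positions; here is a minimal sketch, assuming a pipe-separated lat|long|population input (the real calculation may have differed in detail):

#!/usr/bin/perl -w
# human_centre - population-weighted mean of lat|long|population lines
use strict;
my ( $latsum, $longsum, $popsum ) = ( 0, 0, 0 );
while (<>) {
    chomp;
    my ( $lat, $long, $pop ) = split /\|/;
    $latsum  += $lat * $pop;
    $longsum += $long * $pop;
    $popsum  += $pop;
}
printf "%.6f, %.6f\n", $latsum / $popsum, $longsum / $popsum;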

Scraping the neighbourhood populations was hard. For each of the 140 neighbourhoods, the data is stored in a PDF with a URL like http://www.toronto.ca/demographics/cns_profiles/2006/pdf1/cpa124.pdf (in this case, 124; Kennedy Park, represent!). The population number is stored in a table on page 2 of each PDF. I used pdf2xml to convert the files into something parseable.

Of course, the tables weren’t exactly in the same place in every file, so I took a sample of 10% of the files, and worked out the X & Y coordinates of the population box. pdf2xml spits out elements like

<TOKEN sid="p2_s427" id="p2_w417" font-name="arialmt" serif="yes" fixed-width="yes" bold="no" italic="no" font-size="7.47183" font-color="#000000" rotation="0" angle="0" x="299.739" y="122.117" base="129.634" width="22.7692" height="9.94501">17,050</TOKEN>

Yes, I should have used an XML parser, but a Small Matter of Perl got me 126 out of the 140 matching. The rest I keyed in by hand …
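
The ‘Small Matter of Perl’ was essentially a grep over the pdf2xml output for a TOKEN in roughly the right place on page 2; something along these lines (the x/y window and file handling are illustrative, not the exact script):

#!/usr/bin/perl -w
# popgrab - pull the population TOKEN out of pdf2xml output
use strict;
while (<>) {
    next
      unless /<TOKEN[^>]*\bx="([\d.]+)"[^>]*\by="([\d.]+)"[^>]*>([\d,]+)<\/TOKEN>/;
    my ( $x, $y, $pop ) = ( $1, $2, $3 );
    next unless ( abs( $x - 299.7 ) < 5 && abs( $y - 122.1 ) < 5 );    # the box
    $pop =~ s/,//g;           # strip thousands separator
    print "$ARGV\t$pop\n";    # source file name and population
    close ARGV;               # first hit per file is enough
}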

Table after the jump.


closer to ward maps: scraping the data

Toronto publishes its candidates at http://app.toronto.ca/vote2010/findByOffice.do?officeType=2&officeName=Councillor in a kind of tabular format. All I want to do is count the number of candidates per ward, remembering that some wards have no candidates yet.

Being lazy, I’d far rather have another program parse the HTML, so I work from the formatted output of W3M. It’s relatively easy to munge the output using Perl. From there, I hope to stick the additional data either into a new column in the shapefile, or use SpatiaLite. I’m undecided.

My dubious Perl script:


#!/usr/bin/perl -w
# ward_candidates - mimic mez ward map
# created by scruss on 02010/03/01
# RCS/CVS: $Id$

use strict;
my $URL =
  'http://app.toronto.ca/vote2010/findByOffice.do?officeType=2&officeName=Councillor';
my $stop = 1;

my %wards;
for ( 1 .. 44 ) {
    $wards{$_} = 0;    # initialise count to zero for each ward
}

open( IN, "w3m -dump \"$URL\" |" );
while (<IN>) {
    chomp;
    s/^\s+//;
    next if (/^$/);
    $stop = 1 if (/^Withdrawn Candidate/);
    unless ( 1 == $stop ) {
        my ($ward) = /(\d+)$/;
        $wards{$ward}++;    # increment candidate for this ward
    }
    $stop = 0 if (/^City Councillor/);
}
close(IN);

foreach ( sort { $a <=> $b } ( keys(%wards) ) ) {
    printf( "%2d\t%2d\n", $_, $wards{$_} );
}

exit;

which outputs the following (header added for clarity):

Ward Candidates
==== ==========
 1     3
 2     1
 3     0
 4     0
 5     1
 6     1
 7     7
 8     3
 9     2
10     3
11     2
12     3
13     1
14     4
15     3
16     1
17     2
18     4
19     6
20     2
21     1
22     1
23     1
24     0
25     2
26     3
27    12
28     3
29     6
30     3
31     3
32     2
33     1
34     0
35     5
36     2
37     2
38     2
39     1
40     2
41     1
42     5
43     3
44     3