closer to ward maps: scraping the data

Toronto publishes its candidates here  http://app.toronto.ca/vote2010/findByOffice.do?officeType=2&officeName=Councillor in a kind of tabular format. All I want to do is count the number of candidates per ward, remembering that some wards have no candidates yet.

Being lazy, I’d far rather have another program parse the HTML, so I work from the formatted output of W3M. It’s relatively easy to munge the output using Perl. From there, I hope to stick the additional data either into a new column in the shapefile, or use SpatiaLite. I’m undecided.

My dubious Perl script:


#!/usr/bin/perl -w
# ward_candidates - mimic mez ward map
# created by scruss on 02010/03/01
# RCS/CVS: $Id$

use strict;
my $URL =
'http://app.toronto.ca/vote2010/findByOffice.do?officeType=2&officeName=Councillor';
my $stop = 1;

my %wards;
for ( 1 .. 44 ) {
 $wards{$_} = 0;    # initialise count to zero for each ward
}

open( IN, "w3m -dump \"$URL\" |" );
while (<IN>) {
 chomp;
 s/^\s+//;
 next if (/^$/);
 $stop = 1 if (/^Withdrawn Candidate/);
 unless ( 1 == $stop ) {
 my ($ward) = /(\d+)$/;
 $wards{$ward}++;    # increment candidate for this ward
 }
 $stop = 0 if (/^City Councillor/);
}
close(IN);

foreach ( sort { $a <=> $b } ( keys(%wards) ) ) {
 printf( "%2d\t%2d\n", $_, $wards{$_} );
}

exit;

which outputs the following (header added for clarity):

Ward Candidates
==== ==========
 1     3
 2     1
 3     0
 4     0
 5     1
 6     1
 7     7
 8     3
 9     2
10     3
11     2
12     3
13     1
14     4
15     3
16     1
17     2
18     4
19     6
20     2
21     1
22     1
23     1
24     0
25     2
26     3
27    12
28     3
29     6
30     3
31     3
32     2
33     1
34     0
35     5
36     2
37     2
38     2
39     1
40     2
41     1
42     5
43     3
44     3

One thought on “closer to ward maps: scraping the data”

Leave a Reply

Your email address will not be published. Required fields are marked *