Toronto publishes its council candidates at http://app.toronto.ca/vote2010/findByOffice.do?officeType=2&officeName=Councillor in a kind of tabular format. All I want to do is count the candidates in each ward, remembering that some wards have no candidates yet.
Being lazy, I’d far rather have another program parse the HTML, so I work from the formatted text output of w3m (w3m -dump). It’s relatively easy to munge that output using Perl. From there, I hope to either stick the additional data into a new column in the shapefile or load it into SpatiaLite. I’m undecided.
My dubious Perl script:
    #!/usr/bin/perl -w
    # ward_candidates - mimic mez ward map
    # created by scruss on 02010/03/01
    # RCS/CVS: $Id$

    use strict;

    my $URL =
      'http://app.toronto.ca/vote2010/findByOffice.do?officeType=2&officeName=Councillor';
    my $stop = 1;    # 1 = skip lines, 0 = inside the candidate listing
    my %wards;
    for ( 1 .. 44 ) {
        $wards{$_} = 0;    # initialise count to zero for each ward
    }

    # let w3m fetch the page and render it as plain text
    open( IN, "w3m -dump \"$URL\" |" ) or die "can't run w3m: $!";
    while (<IN>) {
        chomp;
        s/^\s+//;          # strip leading whitespace
        next if (/^$/);    # skip blank lines
        $stop = 1 if (/^Withdrawn Candidate/);    # listing ends here
        unless ( 1 == $stop ) {
            my ($ward) = /(\d+)$/;    # ward number is the last field
            $wards{$ward}++ if defined($ward);    # increment candidate for this ward
        }
        $stop = 0 if (/^City Councillor/);        # listing starts after this line
    }
    close(IN);

    foreach ( sort { $a <=> $b } ( keys(%wards) ) ) {
        printf( "%2d\t%2d\n", $_, $wards{$_} );
    }
    exit;
which outputs the following (header added for clarity):
    Ward  Candidates
    ====  ==========
       1           3
       2           1
       3           0
       4           0
       5           1
       6           1
       7           7
       8           3
       9           2
      10           3
      11           2
      12           3
      13           1
      14           4
      15           3
      16           1
      17           2
      18           4
      19           6
      20           2
      21           1
      22           1
      23           1
      24           0
      25           2
      26           3
      27          12
      28           3
      29           6
      30           3
      31           3
      32           2
      33           1
      34           0
      35           5
      36           2
      37           2
      38           2
      39           1
      40           2
      41           1
      42           5
      43           3
      44           3
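Whichever way I end up going with the shapefile or SpatiaLite, a plain CSV keyed on ward number is probably a handy intermediate form. What follows is only a sketch: the counts_to_csv helper and the "ward" and "candidates" column names are made up for illustration, and it assumes the ward layer has a numeric ward field to join against.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a hash of ward number => candidate count into CSV text.
# The column names "ward" and "candidates" are placeholders for
# illustration; they are not taken from any real shapefile.
sub counts_to_csv {
    my ($counts) = @_;
    my @rows = ("ward,candidates");
    foreach my $ward ( sort { $a <=> $b } keys %$counts ) {
        push @rows, sprintf( "%d,%d", $ward, $counts->{$ward} );
    }
    return join( "\n", @rows ) . "\n";
}

# A few counts copied from the output above, as a demonstration.
my %sample = ( 1 => 3, 2 => 1, 3 => 0, 27 => 12 );
print counts_to_csv( \%sample );
```

In the real script, the same helper could be handed \%wards in place of the final printf loop, so that wards with zero candidates still get a row.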