untangling CanVec

It’s almost as if NRCan doesn’t want anyone to use CanVec. I mean, it’s a free and comprehensive data set for the whole country; anyone who can type in a postal code and click a couple of times can download the CanVec map tile for where they live. But on the other hand, cracking open that download reveals an impenetrable mess of information that probably makes most users go away.

I’ve played with it before, and do occasionally drag out a layer at work, but have never got much further than that. GIS types must be very quiet, because Using CanVec – maphew and CanVec – OpenStreetMap Wiki are about the only public discussions of its content.

CanVec is delivered in two formats: Geography Markup Language (GML), and our friend, the Shapefile. While the GML version contains relatively few files, all the tools I have choke on the data. So shapefiles it is.

Opening up the archive for the Toronto area (it’s tile 030m11) I see 192 files. Four of those are (not very useful) metadata files. Realising that a shapefile ships as four files (the mandatory shp, dbf and shx files, plus the optional prj file) that’s 47 layers. The file names look a bit like this: 030m11_6_0_BS_1250009_0.shp, 030m11_6_0_BS_1370009_2.shp, 030m11_6_0_BS_2010009_0.shp. The names really do mean something:

|--+-| |+| |----+-----|
   |    |        \ Layer Code and Type
   |    \ Version
   \ Map tile

CanVec – Entity Names and Codes, Edition 1.1.0 (XLS) explains the layers, and how they relate to the shapefile names. Rather than relating unique layer codes to layer descriptions, the Entity Names & Codes document has it backwards. So I made the much simplified canvec_simple-20100523.csv which lists layer codes against attributes in a more sensible manner. I added a derived ‘name’ column, which I use for layer naming from these files. The layers use EPSG:4617 (NAD83 CSRS) coordinates.

Tip of the hat to maphew – Revision 123: /trunk/gis/canvec for providing a file that was the ‘Aha!’ moment.