<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US"> <head profile="http://gmpg.org/xfn/11"> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <title> hbus.ca: Blog</title> <link rel="icon" type="image/png" href="http://hbus.ca/site_media/images/favico_bus.png"/>
<link href="http://hbus.ca/site_media/stylesheets/scaffold.css" rel="stylesheet" type="text/css" /> <link href="http://hbus.ca/site_media/stylesheets/style.css" rel="stylesheet" type="text/css" /> <link rel="stylesheet" href="http://hbus.ca/blog/wp-content/themes/hbus/style.css" type="text/css" media="screen" /> <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/yahoo/yahoo-min.js"></script> <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/yahoo-dom-event/yahoo-dom-event.js"></script> <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/element/element-beta-min.js"></script> <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/layout/layout-min.js"></script> <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/event/event-min.js"></script> <link rel="alternate" type="application/rss+xml" title="hbus.ca: Blog RSS Feed" href="http://hbus.ca/blog/?feed=rss2" /> <link rel="alternate" type="application/atom+xml" title="hbus.ca: Blog Atom Feed" href="http://hbus.ca/blog/?feed=atom" /> <link rel="pingback" href="http://hbus.ca/blog/xmlrpc.php" /> <link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://hbus.ca/blog/xmlrpc.php?rsd" /> <link rel="wlwmanifest" type="application/wlwmanifest+xml" href="http://hbus.ca/blog/wp-includes/wlwmanifest.xml" /> <meta name="generator" content="WordPress 2.7.1" /> </head> <body id="nav_blog"> <div id="header"> <h1><a href="/">hbus<span>.ca</span></a> <span class="beta">beta</span></h1> <p>Plan trips on Halifax Metro Transit</p> <ul class="nav"> <li class="nav_home"><a href="http://hbus.ca/">Home</a></li> <li class="nav_about"><a href="http://hbus.ca/about">About</a></li> <li class="nav_help"><a href="http://hbus.ca/help">Help</a></li> <li class="nav_blog"><a href="http://hbus.ca/blog">Blog</a></li> <li class="nav_privacy"><a href="http://hbus.ca/privacy">Privacy</a></li> </ul> </div> <div class="container" 
id="content"> <div class="container"> <div id="sidebar"> <ul> <!-- Author information is disabled per default. Uncomment and fill in your details if you want to use it. <li><h2>Author</h2> <p>A little something about you, the author. Nothing lengthy, just an overview.</p> </li> --> <li><h2>Archives</h2> <ul> <li><a href='http://hbus.ca/blog/?m=200904' title='April 2009'>April 2009</a></li> </ul> </li> <!-- <li class="categories"><h2>Categories</h2><ul> <li class="cat-item cat-item-1"><a href="http://hbus.ca/blog/?cat=1" title="View all posts filed under Uncategorized">Uncategorized</a> (1) </li> </ul></li>--> <li id="linkcat-3" class="linkcat"><h2>Contributors</h2> <ul class='xoxo blogroll'> <li><a href="http://masalalabs.ca">Ginger Tea and Channa Masala » hbus</a></li> </ul> </li> <li id="linkcat-5" class="linkcat"><h2>Friends</h2> <ul class='xoxo blogroll'> <li><a href="http://carsharehfx.ca" title="Car sharing for Halifax">Car Share Halifax</a></li> <li><a href="http://thehubhalifax.ca" title="Coworking space in Halifax">The Hub Halifax</a></li> </ul> </li> <li id="linkcat-4" class="linkcat"><h2>Sponsors</h2> <ul class='xoxo blogroll'> <li><a href="http://navarra.ca" title="The Navarra Group">The Navarra Group</a></li> </ul> </li> <li><h2>Meta</h2> <ul> <li><a href="http://validator.w3.org/check/referer" title="This page validates as XHTML 1.0 Transitional">Valid <abbr title="eXtensible HyperText Markup Language">XHTML</abbr></a></li> <li><a href="http://gmpg.org/xfn/"><abbr title="XHTML Friends Network">XFN</abbr></a></li> <li><a href="http://wordpress.org/" title="Powered by WordPress, state-of-the-art semantic personal publishing platform.">WordPress</a></li> </ul> </li> </ul> </div> <div id="posts"> <div class="post hentry category-uncategorized" id="post-9"> <h2><a href="http://hbus.ca/blog/?p=9" rel="bookmark" title="Permanent Link to Creating a google transit feed for fun and profit">Creating 
a google transit feed for fun and profit</a></h2> <small>April 23rd, 2009 <!-- by admin --></small> <div class="entry"> <p>People frequently ask me how I manage to collect and input the data used by hbus.ca, given Metro Transit’s intransigence. The “bike and GPS” angle is well known <a href="http://www.thecoast.ca/halifax/beta-the-public-transit-day-tripper/Content?oid=1098826">by now</a>, but what about the rest of the process? How do I get the data into a format that hbus.ca can consume?</p> <p>The de facto standard for exchanging transit information is the <a href="http://code.google.com/p/googletransitdatafeed">Google Transit Feed Specification</a> (GTFS). This exceedingly simple format, a set of comma-separated value (CSV) files, is now supported by a plethora of software, including <a href="http://google.com/transit">Google Transit</a>, <a href="http://github.com/bmander/graphserver">graphserver</a>, and my very own <a href="http://github.com/wlach/libroutez">libroutez</a> (used by hbus.ca). It was obvious to me from the beginning that the first step in building hbus.ca would be to create one of these feeds.</p> <p>Manipulating a GTFS feed by hand is probably not a great idea. It’s basically a dump of a relational database, and it is pretty inscrutable to a human being. What I really want is to manipulate things at the level of stops, service periods, and routes, and let some kind of abstraction layer take care of the low-level details. Fortunately, the awesome engineers at Google created a Python library called <a href="http://code.google.com/p/googletransitdatafeed">Google Transit Data Feed</a>, which helps with creating these feeds by providing abstractions of their key elements (stops, service periods, etc.). You can then write a program that uses these abstractions to create and save a GTFS feed.</p> <p>Of course, providing the library with the appropriate information is easier said than done. 
Metro Transit’s PDF schedules are not readily machine-parseable (they are designed to be printed out, after all). I needed some kind of semi-automated way of converting a Metro Transit schedule into GTFS, or this whole project was going nowhere fast.</p> <p>As a first step, it turns out that it’s quite possible to extract the text from a PDF using the open source <a href="http://poppler.freedesktop.org/">Poppler</a> library, which ships a command-line tool called <code>pdftotext</code>. From there, it’s possible to extract the stopping times for an individual bus route. For example, take route 60 (the Portland Hills route), which I’m currently adding. All I had to do was download the PDF file from Metro Transit’s site and run the following on the command line:<br /> <code><br /> pdftotext -raw route60.pdf<br /> </code><br /> The <code>-raw</code> option keeps the text in the order it appears in the PDF’s content stream, with no attempt to preserve the page layout. The result is a text file (<code>route60.txt</code> by default) with content like this:<br /> <code><br /> 842a 847a 855a 858a 903a 906a 912a -<br /> 857a 902a 910a 913a 918a 921a - 925a<br /> 910a 915a 923a 926a 931a 934a 940a -<br /> 940a 945a 953a - 1000a 1003a 1009a -<br /> ...and every 30 minutes until<br /> 210p 215p 223p - 230p 233p 239p -<br /> </code><br /> This format can be parsed easily enough. To create a proper transit feed, though, schedule information isn’t enough: you also need the locations of the stops, the names of the routes, and so on. After some deliberation, I decided that I needed an intermediate format to hold both the schedule information above and this additional information, one readable by humans (to ease its creation) as well as by machines.</p> <p>The obvious markup for something like this is <a href="http://yaml.org">YAML</a> (if you’re still using XML to store structured information, run, don’t walk, and look at YAML: you can thank me later). Simple, clean, effective. 
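</p>
<p>Whatever the intermediate format, the times in the pdftotext output above eventually have to become GTFS times, which are written as HH:MM:SS and may go past 24:00:00 for trips that run after midnight. As a minimal sketch of the “parsed easily enough” step, assuming Python 3 and assuming every time token looks like the samples above (with a dash taken to mean the trip skips that timepoint), the hypothetical helpers below convert one row at a time. They are illustrative only and are not taken from createfeed.py.</p>

```python
import re

# Assumed token shape, based on the sample output: one or two hour digits,
# two minute digits, then "a" or "p" (e.g. "842a", "1009a", "210p").
TIME_TOKEN = re.compile(r"^(\d{1,2})(\d{2})([ap])$")


def is_schedule_row(line):
    """True if every token is a time token or a "-" placeholder."""
    tokens = line.split()
    return bool(tokens) and all(t == "-" or TIME_TOKEN.match(t) for t in tokens)


def to_minutes(token):
    """Convert a token such as "842a" or "210p" to minutes after midnight."""
    match = TIME_TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognised time token: {token!r}")
    hour, minute = int(match.group(1)), int(match.group(2))
    hour %= 12                  # 12 a.m. is hour 0; 12 p.m. becomes 12 below
    if match.group(3) == "p":
        hour += 12
    return hour * 60 + minute


def parse_row(line):
    """Turn one schedule row into GTFS-style HH:MM:SS strings.

    A "-" (assumed to mean the trip skips that timepoint) becomes None.
    A time earlier than the previous one is assumed to have crossed
    midnight, so it is pushed past 24:00:00, as GTFS allows.
    """
    times = []
    previous = None
    for token in line.split():
        if token == "-":
            times.append(None)
            continue
        minutes = to_minutes(token)
        while previous is not None and previous > minutes:
            minutes += 24 * 60
        previous = minutes
        times.append(f"{minutes // 60:02d}:{minutes % 60:02d}:00")
    return times


if __name__ == "__main__":
    sample = """842a 847a 855a 858a 903a 906a 912a -
...and every 30 minutes until
210p 215p 223p - 230p 233p 239p -"""
    for line in sample.splitlines():
        if is_schedule_row(line):
            print(parse_row(line))
```

<p>The “...and every 30 minutes until” line is simply skipped here; expanding it into individual trips (or a GTFS <code>frequencies.txt</code> entry) is outside the scope of this sketch.</p>
<p>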
GTFS is still the better choice for consuming the information in another application, since its representation is much more amenable to being stored in a graph. Here are a few examples of my YAML format in action:</p> <p><a href="http://github.com/wlach/halifax-transit-feed/blob/fef68c18928272670b3c57ae5530260deed85883/7-robie-to-gottingen.yml">7 (Robie to Gottingen)</a><br /> <a href="http://github.com/wlach/halifax-transit-feed/blob/fef68c18928272670b3c57ae5530260deed85883/10-to-westphal.yml">10 (Westphal)</a></p> <p>Besides the scheduling information, the other main component of a GTFS feed is the location of the stops. As anyone who has used a Metro Transit schedule will have noticed, the PDF schedules cover only the major timepoints. What about all the stops in between? This is where the bike and GPS come in.</p> <p>I took a standard GPS from Mountain Equipment Co-op (a Garmin GPSMap 60x), got on my bike, and recorded the GoTime number and position of each stop between the major timepoints. I then took the device back to my computer and, using a utility called <a href="http://gpsbabel.org">GPSBabel</a>, exported the stop information as comma-separated values (CSV). It looks like this:<br /> <code><br /> 44.65825, -63.59252, 6785-21-31-33-34-35-3-7<br /> 44.65982, -63.59452, 6768-21-31-33-35-86-3-7<br /> 44.66113, -63.59659, 6782-21-31-33-34-35-3-7<br /> </code><br /> The first two fields are the stop’s latitude and longitude. The third field holds the stop’s GoTime number followed by the routes that serve the stop, all separated by hyphens. 
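</p>
<p>The line format just described is simple enough to pick apart in code. The sketch below assumes Python 3 (chosen only because createfeed.py is written in Python); the class and function names are hypothetical and are not part of halifax-transit-feed. It splits each GPSBabel line into coordinates, GoTime number and routes, then renders a YAML entry with the same keys, and the same <code>xxx</code> placeholder name, as the regular expression that follows.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpsStop:
    """One waypoint from the GPSBabel CSV export."""
    lat: float
    lng: float
    stop_code: str      # GoTime number, e.g. "6785"
    routes: List[str]   # routes serving the stop, e.g. ["21", "31", "33"]


def parse_gps_line(line):
    """Split 'lat, lng, code-route-route-...' into a GpsStop."""
    lat_text, lng_text, label = (field.strip() for field in line.split(","))
    stop_code, *routes = label.split("-")
    return GpsStop(float(lat_text), float(lng_text), stop_code, routes)


def to_yaml_entry(stop, name="xxx"):
    """Render a stop as a YAML flow-mapping list item.

    "xxx" mirrors the placeholder in the post's regular expression; the
    real name is assigned later from the nearest intersection.
    """
    return (f"- {{ name: {name}, stop_code: {stop.stop_code}, "
            f"lat: {stop.lat}, lng: {stop.lng} }}")


def convert(lines):
    """Convert GPSBabel CSV lines to YAML entries, skipping blank lines."""
    return [to_yaml_entry(parse_gps_line(line)) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = """44.65825, -63.59252, 6785-21-31-33-34-35-3-7
44.65982, -63.59452, 6768-21-31-33-35-86-3-7"""
    for entry in convert(sample.splitlines()):
        print(entry)
    # - { name: xxx, stop_code: 6785, lat: 44.65825, lng: -63.59252 }
    # - { name: xxx, stop_code: 6768, lat: 44.65982, lng: -63.59452 }
```

<p>Parsing the fields rather than rewriting the whole line also keeps the list of routes serving each stop, which the regular expression discards; whether that list belongs in the YAML is a design choice this sketch leaves open.</p>
<p>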
Turning this into YAML is a matter of running a regular expression substitution over the input, for example with <code>sed</code>:<br /> <code><br /> sed -E 's/^([0-9]+\.[0-9]+), (-63\.[0-9]+), ([0-9]+)-.*$/- { name: xxx, stop_code: \3, lat: \1, lng: \2 }/'<br /> </code><br /> To get an actual name for each stop (e.g. “Gottingen and Young”), I wrote a simple script that finds the intersection nearest to the stop in the <a href="http://geobase.ca">GeoBase</a> dataset. Where necessary, I then corrected the name based on my on-the-street knowledge of Halifax and added details to help the user (e.g. stops on the way to the south end of Halifax are marked “southbound”).</p> <p>With these two elements in place (a format for writing human-readable transit information and a library for creating GTFS), the only thing left to do is write a program that bridges the gap. Behold, the magic of <a href="http://github.com/wlach/halifax-transit-feed/blob/fef68c18928272670b3c57ae5530260deed85883/createfeed.py">createfeed.py</a>. With all of this in place, creating a Google Transit feed for Halifax is a simple matter of typing “make”.</p> <p>Is this a ridiculous amount of work? I wouldn’t say so. The vast, vast majority of my work on hbus.ca has gone into the pathfinding code and geocoding functionality. That work carries over to many other municipalities, and can easily be extended and made more useful in a myriad of ways.</p> <p>What does seem a little intimidating to me is finishing what I started. Capturing bus stop information for the Halifax peninsula is one thing, but covering the outlying areas (Bayers Lake, Sackville, etc.) is quite another. There’s a lot of biking involved there, perhaps more than one person can reasonably be expected to do. 
I had hoped that the initial release of hbus would demonstrate the value of community-developed transit software to Metro Transit, and that they would see the benefit of releasing their internal copy of this data to the public. Unfortunately, that doesn’t seem to have happened.</p> <p>Solving that seems to be more a political problem than a technical one, and politics is not my specialty. It really does make me wonder whether I should reconsider crowdsourcing, an option I had <a href="http://masalalabs.ca/2009/03/hbusca-and-thoughts-about-crowdsourcing/">rejected</a> earlier.</p> </div> <p class="postmetadata"> Posted in <a href=" |