Incomplete or poorly understood data can lead to some pretty faulty conclusions.
On Wednesday, 570 News published an unbelievable headline: “Waterloo Region has the longest commute time in the country.” The article cites a new study from traffic app Waze, claiming the average commute time in Waterloo is 53 minutes, nearly double that of Toronto, where it was 31 minutes.
Don’t buy it.
It’s not even remotely plausible when compared against other data. Statistics Canada says the average commute duration in Waterloo Region was just under 16 minutes in the 2011 National Household Survey. The Transportation Tomorrow Survey also shows more than half of all trips by Waterloo Region residents were under 5 km, with only around 10% of trips (13% of driver trips) being greater than 20 km in length. Granted, all these data points come from 2011 (we should get more recent 2016 data by the end of the year), but it’s difficult to conceive how employment patterns and traffic could have been so radically altered in five or six years as to more than triple the average commute time.
What’s the reason for such an outrageous figure? We don’t have access to the specific details of the Waze study, but we can reasonably assume that it’s based on data collected from Waze’s own users. If so, the report is largely limited to people whose drives are long and unpredictable enough that they have opted to download and install the Waze app to help ease their commute.
You can probably see where this might be generating significant self-selection bias, especially in a community like Waterloo Region where most commutes are short and traffic is often light – there isn’t widespread need for an app to guide everyday trips. To get an average of 53 minutes, there’s also likely a significant share of Waterloo Region Waze app users who commute out of town. (If anything is to be concluded from these figures, it is perhaps that we need faster, more frequent transit options to Guelph and the GTHA.)
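To make the self-selection effect concrete, here is a minimal simulation sketch. Every number in it is an assumption chosen for illustration, not a figure from the Waze study or from Statistics Canada: a made-up population in which 90% of people have short local commutes and 10% commute out of town, plus a hypothetical rule that the longer your commute, the more likely you are to use a traffic app. The function name `simulate_commutes` is likewise invented for this example.

```python
import random


def simulate_commutes(n=100_000, seed=0):
    """Compare the true average commute with the average seen by a traffic app.

    All parameters are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)

    # Assumed population: 90% short local commutes, 10% long out-of-town ones.
    population = []
    for _ in range(n):
        if rng.random() < 0.10:
            minutes = rng.gauss(60, 15)   # hypothetical out-of-town commuter
        else:
            minutes = rng.gauss(12, 4)    # hypothetical local commuter
        population.append(max(1.0, minutes))

    # Assumed adoption rule: the chance of using the app grows with commute
    # length (1% per minute), capped at 60%.
    app_users = [m for m in population if rng.random() < min(0.6, m / 100)]

    population_mean = sum(population) / len(population)
    app_user_mean = sum(app_users) / len(app_users)
    return population_mean, app_user_mean


if __name__ == "__main__":
    pop_mean, app_mean = simulate_commutes()
    print(f"Average commute, everyone:  {pop_mean:.1f} min")
    print(f"Average commute, app users: {app_mean:.1f} min")
```

Under these assumptions, the average among app users comes out well above the average for the whole population, even though nobody's commute changed. The only thing that changed is who got counted.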
This report serves as a reminder that data alone isn’t sufficient to gain real insights into the transportation issues we face every day. Worse, incomplete data, or the wrong sort of data, can lead to faulty conclusions.
Joe Cortright writes, “Our use of data is subject to what we call the ‘drunk under the streetlamp’ problem: An obviously intoxicated man is on his hands and knees on the sidewalk, under a streetlamp. A passing cop asks him what he’s doing. ‘Looking for my keys,’ the man replies. ‘Well, where did you drop them?’ the cop inquires. ‘About a block away, but the light’s better here.’” (The limits of data-driven approaches to planning, City Observatory) We should be careful not to read too much into the ‘light’ of these kinds of slow-traffic reports, when the truth lies a block away.
It’s easy to point fingers at a radio station for naively re-broadcasting what really amounts to promotional material for a mobile app, but the truth is that the same problem underlies much of our transportation planning. For instance, our cities have a wealth of information on traffic volumes, which they use to optimize travel lanes and intersections for those traffic flows, but they lack useful metrics for optimizing pedestrian or cyclist comfort and safety. New data sources, like cycling app heat maps of heavily biked areas, may not be the most appropriate guide for where cities should invest in cycling infrastructure: not only are they not necessarily representative of the general population, they also don’t show where people want to bike but feel they can’t. Traffic engineers don’t feel justified in installing a new signalized crosswalk unless they can count 200 people crossing at that point in an 8-hour period – but the lack of a crosswalk may be exactly what is holding people back from crossing there.
When asked to prove that ped crossings or #bikelanes are needed, it's hard to justify a bridge by the # of people swimming across a river.
— Brent Toderian (@BrentToderian) May 16, 2016
The answer to these problems, of course, isn’t to eschew data, but to seek better data whose context and limits we more fully understand.