Toolbox – Probability and Statistics

    Mathematics has a reputation for being exact. Consider the Pythagorean Theorem, which exactly relates the lengths of the three sides of a right triangle. I didn’t know Pythagoras (c. 500 B.C.E.), but I like to imagine him scurrying back and forth around a giant marble triangle, measuring the sides with a piece of string and chiseling the values into his granite lab notebook. When he sat down to ponder his results, what did he make of his measurements? Specifically, what did he make of the fact that each time he went to measure side number one, he got a slightly different number? As it turns out, Pythagoras was something of an idealist. By ignoring the imperfections in his measurements, he gave us the beautiful and precise formula we love. However, another story lies in exactly that which Pythagoras overlooked: uncertainty. Probability and statistics are the branches of mathematics that embrace uncertainty and the fact that in the real world, we acquire all of our knowledge through imperfect measuring devices.

    Historically speaking, probability and statistics were a bit late to the party. It wasn’t until 1654 (for those keeping track, that’s a mind-blowing 2 millennia after Pythagoras) that all-star mathematicians Blaise Pascal and Pierre de Fermat teamed up to give uncertainty a mathematical treatment. This lag is reflected in the types of mathematics that are taught last and remembered least in American schools (a recent survey by the National Science Foundation showed that 43% of Americans lack an understanding of probability, a figure that only 57% of Americans grasp, unfortunately).

    Most current-generation traffic routing algorithms are Pythagoras-like, in that they ignore that measurements have variability: each time you drive down a road, it takes a different amount of time. The figure below shows what the full dataset might look like for two fictional roads, with the height of the curve at each point showing how likely it is for any one trip to last that particular amount of time. This chart is known as a probability distribution. A Pythagoras-like algorithm would assume that each road has a single travel time: its distribution’s expected value, which is simply the average of all the trips observed (the dotted green and black vertical lines). This simplification might speed up the calculations, but it throws away everything we knew about how dependable the estimate is, that is, how much individual trips stray from the average.
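    For readers who like to see the bookkeeping, here is a minimal Python sketch of how a probability distribution and its expected value could be built from a log of trips. The trip times are invented for illustration (they are not the data behind the figure), and the names `trip_times` and `empirical_distribution` are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical trip times in minutes for one road; invented for illustration.
trip_times = [4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 10.5, 10.5]


def empirical_distribution(times):
    """Map each whole minute to the fraction of trips that lasted
    that many minutes (rounded down)."""
    counts = Counter(int(t) for t in times)
    total = len(times)
    return {minute: counts[minute] / total for minute in sorted(counts)}


distribution = empirical_distribution(trip_times)

# The expected value is estimated here as the plain average of observed trips.
expected_time = mean(trip_times)

# Print a crude text histogram: one row per minute bin that has any trips.
for minute, share in distribution.items():
    bar = "#" * round(share * 20)
    print(f"{minute:2d}-{minute + 1:2d} min  {bar:<20} {share:.0%}")
print(f"average trip: {expected_time:.1f} min")
```

    The histogram rows play the role of the curve, and the last line is the single number a Pythagoras-like router would keep. Notice that in this made-up log no trip actually falls in the minute bin containing the average.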

    To better appreciate the devastating repercussions this might have, imagine you have to be home in 10 minutes to take a pie out of the oven. You can choose one of two routes, shown in the figure above. Option one is Bieber Speedway, a high-speed expressway that also happens to cut through the Justin Bieber Sculpture Park, so that the occasional walking tour crosses the street and holds up traffic. Option two is the tranquil and secluded Country Road, which has a lower speed limit, but no Bieber Park. A current-generation router would only know the expected values (dotted lines) of both distributions: on average, Bieber Speedway gets you home in 6 minutes, which is faster than Country Road, which takes 8 minutes. Accordingly, the algorithm would direct you to take Bieber Speedway. As it turns out, today just isn’t your day: a mob of fans crosses the street, adding 5 minutes to your trip and putting you a minute past your deadline. You get home to find that your pie is a sooty heap.

    Closer inspection of the two probability distributions reveals how understanding uncertainty could have saved your dessert. By collapsing the entire travel time distribution to one number, traditional routing algorithms lose sight of the fact that a considerable proportion of trips down Bieber Speedway take longer than 10 minutes. Based on historical traffic data, there is a zero percent chance of a trip down Country Road taking 10 minutes or more, so while Country Road is slower on average, it is more reliable. Next-generation algorithms like those employed in the Reliable Router (There’s a map for that, pg. 36 of this issue) keep track of the travel time probability distribution, allowing the reliability of each road to factor into your route selection.
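    The difference between the two kinds of router can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Reliable Router’s actual method: the trip logs are invented so that the averages match the 6 and 8 minutes above, and `on_time_probability`, `fastest_on_average`, and `most_reliable` are hypothetical names.

```python
from statistics import mean

# Hypothetical trip logs in minutes, invented so the averages are 6 and 8.
ROUTES = {
    "Bieber Speedway": [4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 10.5, 10.5],
    "Country Road": [7.5, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.5],
}
DEADLINE = 10  # minutes until the pie burns


def on_time_probability(times, deadline):
    """Fraction of observed trips that finished within the deadline."""
    return sum(t <= deadline for t in times) / len(times)


def fastest_on_average(routes):
    """Pythagoras-like choice: compare only the average travel times."""
    return min(routes, key=lambda name: mean(routes[name]))


def most_reliable(routes, deadline):
    """Uncertainty-aware choice: compare the chance of arriving on time."""
    return max(routes, key=lambda name: on_time_probability(routes[name], deadline))


for name, times in ROUTES.items():
    chance = on_time_probability(times, DEADLINE)
    print(f"{name}: average {mean(times):.1f} min, on time {chance:.0%}")

print("By average:", fastest_on_average(ROUTES))
print("By reliability:", most_reliable(ROUTES, DEADLINE))
```

    Both functions see the same trip logs; they differ only in which summary of the distribution they compare. A practical router would presumably weigh speed against reliability rather than maximize one alone, but that trade-off is beyond this sketch.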

    Humans intuitively evaluate the uncertainty in life without much prompting, which is why a local, when asked for directions, will probably send you along low-risk routes. Computers are blind to these subtleties–until we teach them. By endowing computers with the ability to appreciate uncertainty via the mathematics of probability and statistics, we come one step closer to automating good advice. Maybe a future generation of algorithms will also favor routes that take us right by the best slice of pie in town.
