Superforecasting: The Art and Science of Prediction (Book Review)

As an organic chemist, my day is filled with making predictions—or rather, that’s how I see it. Every reaction that I run has an implicit prediction: I predict that mixing such-and-such reagents together will produce the desired compound. And in trying to run as many successful reactions as possible—that is, to make as many successful predictions as possible—I marshal and apply many of the fundamental concepts of organic chemistry that I’ve learned over the years.

In this way, chemistry is like all other fields of science. In general, scientists predict events in the world using tools embedded within some theoretical framework. Astrophysicists predict the motions of stars using the laws of physics; evolutionary biologists predict how species respond to selection pressures based on Darwin’s theory of evolution by natural selection; economists predict the behavior of economies based on a few fundamental economic principles.

So it seems we have ample tools to predict chemical events, astronomical events, biological events, economic events—but what about global events?

Do we have any tools we can use to predict whether North Korea will deploy nuclear weapons in the next six months? Are we able to predict the exchange rate between the Euro and the dollar on December 31st of this year? Can we predict exactly when the next president of Venezuela will come to power? Every astrophysicist has to be comfortable with Newtonian mechanics to predict the motions of celestial bodies—what are the tools that forecasters need to master in order to forecast a diverse set of global events?

This is the key question that University of Pennsylvania Professor of Political Science and Psychology Philip Tetlock has devoted much of his career to, culminating in a series of research conclusions laid out in his and co-author Dan Gardner’s most recent book, Superforecasting: The Art and Science of Prediction. The punchline: some among us are “superforecasters”, able to predict global events with higher accuracy than even intelligence analysts with access to classified data, and the key to superforecaster status is not necessarily being super smart, nor being super caught-up-with-the-news, but rather being comfortable with a few mental habits that all of us can learn and train.

Image credit: Penguin Random House, LLC. Used with permission.

Tetlock and Gardner begin the book by investigating the intuitive answer to the question of how to predict global events: just ask the experts. After all, there is no shortage of pundits in the media, in business, or in government whose job it is to give you their projections of the future price of oil, or their prognosis of the Israel-Palestine conflict. If these people stake their livelihoods on their forecasting abilities, shouldn’t they be the first we ask?

Apparently not, according to Tetlock. In fact, he has twenty years of his own research to show just the opposite: experts don’t have a great track record when it comes to forecasting. First, when experts make public forecasts, they tend to hedge their bets, using statements like “may”, “might”, and “fair chance” in their forecasts, making it extremely difficult to judge a forecast as accurate or inaccurate after the fact. Second, when Tetlock did get experts to anonymously make concrete, probabilistic predictions of global events, following the predictions of 284 experts over 20 years and amassing over 25,000 predictions, he found that the average expert didn’t do much better than anyone would do with random guessing. As these results have now so often been summarized, the average expert was “roughly as accurate as a dart-throwing chimpanzee”.

In a sense, this is old news, having already been detailed in Tetlock’s previous book Expert Political Judgment: How Good Is It? How Can We Know? But the brief revisitation of this conclusion in Superforecasting helps set up the next logical question that Tetlock set out to answer: if expertise in and of itself can’t help us forecast global events, then what can?

The intelligence community had the same question in the aftermath of their 2002 assessment that Iraq had continued developing weapons of mass destruction. By the end of 2003, it was clear that this assessment had been made far too confidently, and the intelligence community was forced to make changes in the face of widespread public opprobrium. In 2006, the Intelligence Advanced Research Projects Activity (IARPA) was created to fund research with the potential to make the intelligence community’s predictions more accurate, and in 2010, IARPA decided to host a massive forecasting tournament. Top researchers from around the country, including Tetlock, were asked to create research teams and compete against one another, creating incentives for the teams to come up with novel methods of making accurate forecasts.

For the tournament, each individual on each research team was asked for his or her probability estimate for a variety of financial and geopolitical events, ranging from the granting of European Union status to Serbia to the price of gold exceeding a certain amount by a certain date. Initial forecasts could be updated an unlimited number of times until the closing date of the question, at which time the forecasts for a question would be scored for accuracy. The closer a group’s forecasts were to what actually happened, the better they scored.
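
The accuracy measure behind that scoring is the Brier score, which the book walks through: the squared distance between the probabilities you assigned and what actually happened, where 0 is a perfect forecast and 2 is maximally wrong. A minimal sketch in Python (the Serbia numbers below are purely for illustration):

```python
def brier_score(forecast, outcome_index):
    """Multi-category Brier score: sum of squared differences between the
    forecast probabilities and reality (1 for the category that occurred,
    0 for the rest). 0.0 is a perfect forecast; 2.0 is maximally wrong."""
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(forecast)
    )

# Illustration: a forecaster gives Serbia a 70% chance of being granted
# EU candidate status (and a 30% chance of not), and it happens.
print(brier_score([0.7, 0.3], outcome_index=0))  # 0.09 + 0.09 = 0.18
```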

Tetlock, along with his research partner and wife Barbara Mellers, advertised far and wide to recruit people, eventually bringing together thousands of laypeople in a team he called the Good Judgment Project (GJP). Cumulatively, the GJP involved over 20,000 non-experts making predictions of global events over the course of five years. By aggregating these predictions in a clever way detailed in the book, the GJP ended up outperforming all competitors by large margins. After two years, IARPA was so impressed by the GJP’s scores that it ended up dropping all the other teams.
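
Part of the trick, as the book recounts it, was to weight forecasters with better track records more heavily and then “extremize” the blended probability, pushing it toward 0 or 1 to recover the confidence that plain averaging tends to wash out. A rough sketch of that idea (the weights and exponent here are invented for the example, not the GJP’s actual parameters):

```python
def aggregate(probs, weights, a=2.0):
    """Weighted average of individual probabilities, followed by an
    'extremizing' transform that pushes the blend toward 0 or 1.
    The exponent a > 1 controls how aggressively to extremize."""
    avg = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
    return avg ** a / (avg ** a + (1 - avg) ** a)

# Three forecasters, the second with the strongest track record:
print(aggregate([0.70, 0.80, 0.65], weights=[1.0, 2.0, 1.0]))
# ~0.89, more confident than any individual forecast
```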

While the factors that led to the outstanding success of the GJP as a team were in and of themselves intriguing to IARPA, perhaps of even more interest was the treasure trove of data that the GJP had amassed. Upon detailed analysis of the thousands of predictions that the GJP had made, Tetlock found that certain individuals were consistently at the top of the forecasting leaderboard—and with some work, he was able to extract from the data exactly what made these “superforecasters” so good at forecasting.

Sprinkled throughout the book are mini-profiles of these superforecasters, who by all accounts are otherwise ordinary people. Bill Flack, one superforecaster, is a retiree from Nebraska who enjoys bird-watching. Elaine Rich, another, is a pharmacist who lives in Washington, DC. If they fit the profile of the typical superforecaster, they’re above average in intelligence, but not in “genius” territory. Neither has any professional reason to want to make good forecasts. So what’s their secret?

According to Tetlock and Gardner, what sets superforecasters apart from normal forecasters is, more than anything else, the set of mental habits they bring to bear when approaching forecasting questions. The authors clearly detail how superforecasters think through a forecasting question, and even compile a list of tips in an appendix titled “Ten Commandments for Aspiring Superforecasters”. Throughout, Superforecasting gives plentiful examples of how to apply these habits to your own probability estimates, facilitated by Tetlock and Gardner’s clear and easy-to-read writing.

In addition to explaining the importance of these mental habits, Tetlock and Gardner emphasize that you can’t reach superforecaster status without practicing forecasting yourself. So, to see whether I could make use of the lessons of Superforecasting, I signed up in early July on Good Judgment Open, a public forecasting site where anyone can make probabilistic predictions about the same kinds of events used in the IARPA tournament. The first question I decided to forecast read:

“What will Gallup report President Trump’s approval rating to be on 1 August 2017?”

I was asked to give probability estimates for four separate ranges: <30%, 31–40%, 41–50%, and >50%.

How would a superforecaster think through this question? As detailed in the book, the first thing a superforecaster would do is break the problem down into easier-to-answer sub-problems. I broke this question down into two sub-questions: “How much would Trump’s current approval rating have to change to fall into each of those ranges?” and “How likely would each of those changes be?”

At the time of my initial forecast, Trump’s approval rating was sitting at 40%. Since the question was asked in July, the rating would have to change by 10 points within a month to fall below 30% or above 50%. To determine how likely this would be, a superforecaster would first take what is known as the “outside view”, ignoring the details of the particular case at hand and instead focusing on statistical rates across similar situations. For example, to take the outside view for this question, I looked at the last three presidencies to see how often a president’s approval rating changed by 10 points within a month. Across a combined total of almost 300 months, so drastic a swing happened only a handful of times. This set my initial probability estimates for the <30% and >50% ranges at 1% apiece.
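
To make that outside-view tally concrete, here is a minimal sketch of the counting involved, with a short made-up approval series standing in for the roughly 300 months of real Gallup data:

```python
def base_rate_of_swing(monthly_approval, threshold=10):
    """Outside view: what fraction of month-to-month changes in a series
    of approval ratings were at least `threshold` points in either direction?"""
    changes = [abs(b - a) for a, b in zip(monthly_approval, monthly_approval[1:])]
    big_swings = sum(1 for c in changes if c >= threshold)
    return big_swings / len(changes)

# Hypothetical monthly approval ratings, for illustration only:
history = [55, 53, 54, 58, 61, 49, 48, 47, 45, 46, 44, 43]
print(base_rate_of_swing(history))  # 1 of 11 changes is >= 10 points: ~0.09
```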

What about the middling ranges of 31–40% and 41–50%? Naively, you might be tempted to assign roughly a 50% chance of Trump’s approval rating falling into each of those ranges, given a current approval rating of 40%. But this is where the outside view comes into play: when I looked at the past 30 days, Trump’s approval rating had risen above 40% just once, suggesting that my initial probability estimate for the 41–50% range should be a mere 3%.

But superforecasters don’t stop at the outside view. Once they have a base estimate, they then pivot to the inside view, taking into account pertinent details specific to the case at hand. For example, I knew that debate about the healthcare bill in the Senate would continue throughout July, and that attention on healthcare had correlated with drops in Trump’s approval rating in the past. This made me slightly more confident that Trump’s approval rating on August 1st would fall in the 31–40% range.

Once superforecasters have an estimate based on their outside and inside views, they then seek out other perspectives and synthesize those perspectives with their own. This could involve consulting experts or talking with other forecasters; I simply looked at the average forecast on the Good Judgment Open site, which happened to be slightly less confident in the 31–40% range than I was, and moved my forecast toward it. All things considered, my initial forecast for the four ranges ended up being:

<30%: 1%
31–40%: 86%
41–50%: 12%
>50%: 1%
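
For the blending step itself, a minimal sketch (the pre-blend and crowd numbers below are illustrative, not my exact figures) might look like this:

```python
def blend_with_crowd(mine, crowd, weight_on_crowd=0.3):
    """Nudge my probabilities toward the crowd's average, then renormalize
    so the four ranges still sum to 1."""
    blended = [(1 - weight_on_crowd) * m + weight_on_crowd * c
               for m, c in zip(mine, crowd)]
    total = sum(blended)
    return [round(b / total, 2) for b in blended]

# My outside-plus-inside-view estimate versus a hypothetical crowd average:
mine = [0.01, 0.90, 0.08, 0.01]
crowd = [0.02, 0.78, 0.18, 0.02]
print(blend_with_crowd(mine, crowd))  # [0.01, 0.86, 0.11, 0.01]
```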

After superforecasters make an initial estimate, they monitor the news for any important information that would have bearing on their forecasts, and then update their forecasts accordingly. Updating is a difficult component of the forecasting process, since you have to avoid both underreaction and overreaction to new information. I followed the Gallup polls and adjusted my forecast every few days, although I ended up never straying too far from my initial estimate.
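
As a toy illustration of what a deliberately restrained update might look like (this is not a formal Bayesian update, and the 0.05 step size is an arbitrary choice of mine):

```python
def update(current, evidence_direction, step=0.05):
    """Shift a small amount of probability mass between the two middle
    ranges when new polls point up or down, rather than lurching to a
    new forecast on every headline. `evidence_direction` is +1 if the
    latest polls look better for Trump, -1 if worse, 0 if unchanged."""
    low, mid_low, mid_high, high = current
    shift = step * evidence_direction
    return [low, round(mid_low - shift, 2), round(mid_high + shift, 2), high]

# A run of slightly better polls nudges mass from the 31-40% range
# toward the 41-50% range:
print(update([0.01, 0.86, 0.12, 0.01], evidence_direction=+1))
# [0.01, 0.81, 0.17, 0.01]
```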

This is essentially the superforecaster method in a nutshell: break a forecasting question down into more manageable parts; synthesize the outside view, the inside view, and other perspectives into an initial estimate; and, finally, update that estimate in a level-headed manner as new information arrives. In addition to forecasting Trump’s approval rating, I applied the same procedure to predict the July US civilian unemployment rate, the winner of the presidential election in Kenya, and whether or not China would declare an air-defense identification zone over any part of the South China Sea before August 1st.

So how did I end up doing? At first glance, I did pretty well; by the scoring system used on the site, I ended up doing quite a bit better than the average forecaster, placing relatively high probabilities on the eventual outcomes for each question. However, it would be a mistake to simply label this experiment a success. As emphasized throughout Superforecasting, the difference between a superforecaster and a normal forecaster can only be shown over a large number of forecasts. Just as an amateur poker player can get lucky and beat a pro for a few hands, average forecasters can do well on a few forecasting questions without making use of a forecasting procedure that would be more accurate in the long run. Only by aggregating data over thousands of forecasts can you really tell the difference.

Thankfully, Tetlock already did that job for us with his Good Judgment Project, and the work he and Gardner detail in Superforecasting is a fascinating demonstration that employing a “superforecaster mindset” can make forecasting complex global events as scientific as predicting chemical reactions in a flask. Superforecasting is an important read, and one that will leave you feeling like you’re sharing in privileged information that intelligence agencies worldwide will be mulling over in the future.

Featured image credit: CC0 public domain.
