In the last few years, machine learning (ML) has transformed from an esoteric abstraction to a mainstream phenomenon—and for good reason! From Wall Street floors to art gallery walls, the high performance, flexibility, and user-friendly tools associated with modern ML infrastructure seems perfect for anyone interested in making their fast-paced, techno-centric, and data-rich life a little less chaotic.
With all this emergent technology at our fingertips, how can we tell the difference between innovation and imitation? How do we know the route Google takes us is really the fastest way to get to work? Why do we trust a machine to spot a cancerous tumor that a doctor might have missed? And what makes ChatGPT any better than the dreaded customer service bots of years past? As ML’s popularity continues to grow, these and similar questions have risen to the forefront of academic deliberations around the world. The more people look to ML to solve their problems, the more opportunities they will have to misuse this deceptively complex resource, and without proper education, guidance, and collaboration, individual mistakes can quickly propagate into widespread, systemic issues.
So, how do we address this threat, commonly referred to as the “reproducibility crisis?” Dr. Bin Yu, a statistics professor at the University of California, Berkeley, has a simple answer: stability. As the final component of her group’s innovative “Predictability-Computability-Stability” (PCS) framework, stability plays an important role in what Dr. Yu refers to as “veridical data science,” a theory that prioritizes responsible, reliable, and transparent data analysis and decision making.
In work published in 2020, Dr. Yu argues that a model’s predictability (the ability to accurately estimate target parameters), and computability (the ability of the model to be created and applied), are necessary for a model to run. However, stability, or the replicability of a model’s results under shifting conditions, are necessary for a model to work. This distinction is often missing from traditional ML coursework and its omission, when combined with unrealistic expectations and external pressures, can lead many practitioners to become skeptical of well-designed models and readily accept faulty ones, especially since unstable models often appear to outperform their stable counterparts.
This may seem counterintuitive at first, but consider the following: you want to design a model that predicts the closing price of the S&P 500 using only the average price of gas in Alabama. This task may seem impossible, but you decide to randomly divide up your data and give it a whirl. And wouldn’t you know it, the model you create instantly returns an accuracy of 98%! On your first try! However, the second you start investing, you watch your portfolio take a nosedive. So, what happened?
No, the model doesn’t hate you. No, it wasn’t just bad luck. And no, the S&P didn’t fundamentally change overnight. Okay, so what did happen? Well, although your model seemed to run well, according to Dr. Yu, randomizing your training and test sets actually undermined its stability. By looking at the prices on Monday and Wednesday, your model learned to estimate the S&P’s close, not with Alabama gas prices, but by the date. In other words, you showed the model the answers before ever asking it a question, so it had no idea what to do with new information when it appeared! This phenomenon is colloquially referred to as “data leakage,” and is a common mistake among ML developers. Thankfully, as an avid proponent of the PCS framework, you realize your mistake before losing too much of your hard-earned money and arrive at the proper conclusion: Alabama gas prices have no influence over the S&P 500.
That being said, the time and energy it takes to question your assumptions, check your results, and properly test every aspect of your work are seldom insignificant. Even with popular ML tools and techniques at your fingertips, you may spend more time disproving your findings than you did creating the model in the first place. However, for the models that matter most, remember to always adhere to the PCS framework by remaining skeptical of success, inspect your biases, and respect that, at the end of the day, stability is king.