Snowy with a chance of data

Last winter, snow fell on the mountain ghost town of Gothic, Colorado, as always. That snow normally melts and flows roughly 500 miles down to the Colorado River Basin, but by springtime, it became clear that a lot of that water had mysteriously vanished along the way. By August, the US Bureau of Reclamation was forced to issue its first-ever declaration of a water shortage at Lake Mead, one of the Colorado River’s main reservoirs. This water supplies roughly 40 million people in the American West, so understanding where it comes from is crucial. “Yet,” says Dan Feldman, staff scientist at Lawrence Berkeley National Lab (LBL), “what happened to the water is still a mystery.”

How do we count snowflakes?

To find out what happened to the water, scientists first need to ask another deceptively simple question: how much water is there, hiding in snow? Measuring liquid rainfall is straightforward enough for an elementary schooler to get the job done with a bucket and a basic calculator. But measuring snowfall is not for the faint of heart. From a sheer logistical standpoint, Feldman explains, snow is difficult to measure. In the winter, there are avalanches. Year-round, there are areas of wilderness too dangerous to explore. “We can’t measure everything everywhere,” he says.

Snowfall also varies greatly over both space and time. Pay attention next time you drive up to Lake Tahoe or the Sierras, and you’ll notice how quickly snow appears as you approach the mountains. If you’re a skier, you may also spot differences in snowfall between neighboring resorts. Those numbers matter for more than just skiing—which side of Lake Tahoe a snowflake falls on determines its most likely future path, either as reservoir water providing for the East Bay or a drop in the Southern California watershed.

Finding the snow water equivalent, or how much water a snowpack contains once it's melted, is tricky. Feldman explains that, from a satellite, “you can see that there’s snow here, and no snow there, but it doesn’t tell you anything about how much water is in there,” which varies with the density of newly-fallen snow. Snow water equivalent is affected by a myriad of interacting factors, from the intensity of a snowstorm to the amount of energy the earth is absorbing from the Sun.
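For readers who want the arithmetic, snow water equivalent is commonly expressed as snow depth multiplied by the ratio of snow density to water density. The short sketch below shows why depth alone is not enough. The function name and the two example densities are illustrative assumptions, not measurements from Feldman's team.

```python
WATER_DENSITY_KG_M3 = 1000.0  # density of liquid water


def snow_water_equivalent(depth_m: float, snow_density_kg_m3: float) -> float:
    """Return the depth of liquid water (in meters) held in a snowpack."""
    if depth_m < 0 or snow_density_kg_m3 < 0:
        raise ValueError("depth and density must be non-negative")
    return depth_m * snow_density_kg_m3 / WATER_DENSITY_KG_M3


# Two snowpacks of equal depth but different (assumed) densities.
fresh_powder = snow_water_equivalent(1.0, 100.0)  # light, newly fallen snow
settled_pack = snow_water_equivalent(1.0, 400.0)  # older, compacted snow
print(fresh_powder, settled_pack)
```

Under these assumed densities, the same one-meter blanket of snow holds four times as much water once it has settled, which is exactly the information a satellite image of snow cover cannot provide.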

And where does it go once it melts? “For society’s use of snow, that water content is key,” Feldman says, taking a video call from the driver’s seat of his van. He’s parked at SAIL, the Surface Atmosphere Integrated Field Laboratory in Crested Butte, Colorado. A weather balloon floats through the background, one of over three dozen scientific instruments on site. All of these data provide slightly different insights into the snowpack, how it evolves over time, and how it ultimately flows into rivers and streams. But if you’re not careful, Feldman warns, “putting it together turns into a big fruit salad. These data don’t play nicely together.”

Traditionally, a team of many graduate students would take a few years to wrestle with all the data involved. The data come in many forms, from single numbers listing yardstick measurements to satellite videos and atmospheric readings. Some are collected in real time, while other numbers only come in once a year. Some data cover about a snowman’s worth of ground, while others blur across 100-kilometer patches. With a lot of effort, climate scientists can simplify this smorgasbord to make the datasets play nicely together, but this both throws away valuable information and wastes even more valuable time. “This is a big deal when we’re facing a rapidly changing snowpack,” says Feldman. “We need to be able to keep up with the speed of climate change.”
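To make that trade-off concrete, here is a minimal sketch of one common way to force a fine grid onto a coarser scale: averaging small cells into larger blocks. The grid values and the `coarsen` helper are hypothetical, and this is not necessarily how any particular research group processes its data.

```python
import numpy as np


def coarsen(grid: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of a 2D grid."""
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = grid.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# Hypothetical 4x4 grid of snow depths (meters) at fine resolution.
fine = np.array([
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 0.0, 2.0, 2.0],
    [1.0, 3.0, 0.5, 0.5],
    [3.0, 1.0, 0.5, 0.5],
])
coarse = coarsen(fine, 2)
print(coarse)
```

In this toy grid, the lower-left block ranges from one to three meters of snow while the upper-right block is a uniform two meters, yet both average to the same coarse value. That lost variability is the "valuable information" that simplification throws away.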

Cutting-edge artificial intelligence (AI) techniques could help climate researchers understand and act upon their data more effectively, but climate change has been historically understudied in the world of artificial intelligence. A group of UC Berkeley AI researchers, including computer science PhD candidate Colorado Reed, is working to change that.

Data and climate scientists put their heads together

At the end of 2020, Berkeley AI co-director Dr. Trevor Darrell challenged his trainees, including Reed, to a competition: put together a couple of slides sharing your biggest idea for 2021, all plausibility be damned. Reed proposed an initiative to bring together experts from AI and the natural sciences to tackle some of the biggest data-related challenges in climate change. After he presented, his lab mate, PhD candidate Medhini Narasimhan, said she was in. Following a spirited open debate on Slack, Reed and Narasimhan met up and decided to form a reading group. They met bi-weekly, reading papers at the intersection of climate and AI, garnering more and more interest over time. “That’s what got this whole thing going,” Narasimhan says.

After months of regular meetings, Ritwik Gupta, a first-year PhD student at the time, told Reed and Narasimhan that it was time to make it bigger. Together, they transformed Reed’s five-slide moonshot into the student-led Berkeley Artificial Intelligence Climate Initiative (BCI), with enthusiastic support from their advisor Dr. Darrell. “They had a great vision,” he says, “and I knew it would attract a lot of interest.” By the fall of 2021, Gupta, Narasimhan, and Reed were setting up meetings with scientists studying climate-related issues across departments at UC Berkeley, exploring potential avenues for collaboration. “You need someone in the climate space with in-depth knowledge about their field, and what’s missing,” Narasimhan explains.

But it took much more than scheduling one-off meetings: building bridges between AI and climate researchers is an ongoing, iterative process. When you’re first meeting with someone from another field, Gupta says, “It’s hard to even know what the right questions to ask are.” In each of their meetings, the AI team outlines what they can and can’t do in their field, with examples tailored to the interests of the scientist of the hour. Then, they ask the scientist to do the same in turn. After asking many, many questions, they eventually home in on a problem they have complementary tools for solving.

Last fall, Feldman met with Reed, who faced the opposite problem: Reed, an AI researcher, had the means to develop tools for handling difficult datasets, but was searching for data worth handling. In the world of AI, researchers typically use toy datasets like ImageNet to test how well their algorithms can do things like label pictures. “I love how fast AI moves,” says Reed, “but on a personal level, I wanted to develop AI methods while directly working on problems with an immediate impact and benefit.” Rather than celebrating theoretical wins measured by arbitrary benchmarks, Reed would rather celebrate real advances in our understanding of climate change. “I’d rather walk into the lab and say, ‘Yes! We did it! We have a greater understanding of how much snow is in the Sierras, and that’s going to help determine local water allocations!’ That’s something I could get excited about.” Together, Reed, Feldman, and an interdisciplinary team of AI researchers, data scientists, and hydrologists are now collaborating on the Fate of Snow project, where they hope to figure out how to distribute water across the American West by calculating how much water to expect from melting snow.

Getting the data to talk

As in hydrology (the study of the movement of water, including snow), many fields within the natural sciences face data-related problems. Dr. Aditi Krishnapriyan, professor in the Department of Chemical and Biomolecular Engineering, says that developing AI methods that transfer well to problems in the natural sciences, from chemical engineering to conservation biology, is “the essence of what BCI is about.” One problem common across the natural sciences is data that span multiple spatial and temporal scales. This special type of complicated data is called OMMS (pronounced like “om” in yoga): Observational, Multimodal, and Multi-Scale. In other words, Gupta explains, “the data comes in all different forms, and we don’t have a set framework that allows us to deal with it all at once.”

In a recent workshop paper, Fate of Snow researchers announced promising initial results. The big challenge: making five sources of data, ranging from daily satellite images capturing 500-meter patches of earth to spreadsheets of weather data covering eight times as much land, get along. To do this, all of the data, from RGB images to tabular text documents, need to be converted into the same format without stripping away too much information. The team fed the raw data into artificial neural networks, which convert that input into a single column of numbers, regardless of each source’s original form. Now, data from all five sources can be stacked together and sent, as a unit, through another artificial neural network that guesses the snow water equivalent for the one-kilometer square of land accounted for by the original data. The team was able to predict the snow water equivalent in Sierra Nevada water basins with better accuracy than researchers had been able to get from physical models or on-site measurements alone.
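The encode-stack-predict flow described above can be sketched in a few lines. This is a toy illustration under loud assumptions, not the Fate of Snow team's model: it uses two made-up sources instead of five, every name (`make_encoder`, `EMBED_DIM`, and so on) is hypothetical, and the weights are random and untrained, so the final number is meaningless. The point is only to show how differently shaped inputs end up as same-length columns of numbers that can be stacked.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8  # assumed length of each source's column of numbers


def make_encoder(input_size: int):
    """Build a toy one-layer encoder with random (untrained) weights."""
    weights = rng.normal(scale=0.1, size=(EMBED_DIM, input_size))

    def encode(x: np.ndarray) -> np.ndarray:
        # Flatten any input shape, project it, then apply a ReLU.
        return np.maximum(weights @ x.ravel(), 0.0)

    return encode


# Two hypothetical sources describing the same patch of land.
satellite_image = rng.random((16, 16, 3))  # an RGB image
weather_table = rng.random(5)              # one row of tabular readings

encode_image = make_encoder(satellite_image.size)
encode_weather = make_encoder(weather_table.size)

# Stack the per-source vectors so they can be processed as a unit.
fused = np.concatenate([encode_image(satellite_image),
                        encode_weather(weather_table)])

# A second toy network (here a single linear layer) maps the
# stacked vector to one number standing in for the estimate.
head_weights = rng.normal(scale=0.1, size=fused.size)
swe_estimate = float(head_weights @ fused)
print(fused.shape, swe_estimate)
```

Giving each source its own encoder lets an image and a spreadsheet row keep their native shapes on the way in, so nothing has to be resampled or discarded before the network sees it.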

One important goal of the Fate of Snow project, Reed says, is empowering AI researchers to more readily step into the world of hydrology. “There’s this huge barrier to entry if we want to work on intrinsically meaningful problems like this,” he says. When Reed began collaborating with scientists like Feldman, he quickly realized how much unglamorous groundwork was ahead of him: collecting data, aligning that data, processing it to make it usable, figuring out what software libraries to use, and so on.

The team hopes to lower that barrier by creating a package of standardized, easily accessible datasets, benchmark tests, and software libraries, called SnowBench. Many recent publications have used AI models that draw on what we know about physics to estimate snowpack from raw satellite data. But without standardized datasets, benchmarks, or data collection methods, comparing results across research groups is prohibitively challenging. SnowBench aims to serve as a ready-to-go package of everything an AI researcher needs to take a stab at snowpack estimation, using tools they’re already familiar with. “We’re doing all of that gritty groundwork for our community,” Reed says. “It’s not a sexy, front-page-of-the-New-York-Times result, but it’s a pragmatic thing that, over time, can really bring about change.”

Changing academic incentives for real-world progress

The BCI still faces another, more nebulous challenge: academia itself. “Everyone thinks that climate change is the most pressing problem facing us today,” says Gupta, “but the incentives in our field aren’t there to motivate us to study it.” He explains that climate change, and natural science in general, are seen as applications, “not the glorious, masterful theoretical work we should be focused on.” Applying AI to natural science problems also binds AI researchers to a slower timeline—several years, from experiment planning to publication—than the 3-6 month publication cycle they’re used to. While the slowed pace can be discouraging, especially in the face of a competitive academic job market, “the reward at the end is big,” Narasimhan says.

BCI is a student-run hub, and PhD students are famously overcommitted and underpaid, even without the pressures of running an interdisciplinary research program. In terms of administrative load, chuckles Gupta, “it’s really faculty-levels of responsibility.” “We’re putting the rails down while we’re moving,” Reed adds. The BCI is still a nascent work-in-progress, and finding researchers willing to commit fully to its vision will be key for growing into the future. Dr. Darrell and Dr. Krishnapriyan are currently the main faculty on the leadership team, with other faculty involved more peripherally. “Many current faculty have other commitments,” explains Krishnapriyan. “That’s challenging, even if their students are very interested.”

One solution, she proposes, is to hire new faculty who enter the scene already invested in the intersection of AI and climate change. “We need faculty who want to say, ‘The path to success is doing this type of work,’” says Reed. With faculty buy-in comes new graduate student projects. Krishnapriyan, whose lab develops machine learning methods tailored for challenges in the natural sciences, says, “I’m trying to have all of my students work on something that would be part of BCI by proxy.” Moving forward, the hope is that more faculty will do the same. Darrell says, “We need to continue to build bridges inside Berkeley, and between Berkeley and the outside world, to show the value of this interdisciplinary area.”

Celia Ford is a graduate student in neuroscience.

Design by Julia Torvi

This article is part of the Fall 2022 issue.
