In the first part of this blog post, I reported on the opening of the new Berkeley Institute for Data Science (BIDS). Today, I am going to share with you some of the ways that UC Berkeley scientists are using and analyzing “big data.” At the BIDS event, there were both talks and poster presentations highlighting recent projects. Here’s just a small taste of what was happening!
Solomon Hsiang, Assistant Professor of Public Policy
Hsiang’s work focuses on the effect of the environment on human society. Recently his lab reconstructed dozens of storms in the Philippines and linked that information to detailed household survey data. They found that in a given year after a storm, there was an increased risk of lacking basic assets such as walls, electricity, and plumbing. These storms cause a localized economic depression: people significantly reduce their spending on nutritious food, education, and medical care, while infant mortality significantly increases. Matching the data to the distribution of where people live in the Philippines, and accounting for hidden costs and economic losses, the lab found that these events tend to be “roughly 15 times more costly than what you see in the newspaper” — highlighting the importance of rebuilding efforts after storms.
Rosemary Gillespie, Professor of Environmental Sciences
The Berkeley Eco-informatics Engine will integrate biological and environmental data to learn more about how organisms respond to global change. Currently, the default is to use the physiological constraints of organisms, such as temperature and precipitation tolerance, to predict where organisms might go, but this method does not always predict behavior well. The Eco-informatics Engine will instead bring together huge amounts of diverse data through an open API, allowing the data to be integrated and analyzed. This will include looking at museum specimens, each of which is associated with a place and time, and using geo-spatial base layers such as land cover, climate, and the history of the landscape to predict responses to climate change.
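The core idea here, joining specimen records (each tagged with a place and time) to environmental base layers, can be sketched as follows. The record schema and the `climate_lookup` function are hypothetical illustrations, not the Engine’s actual API:

```python
def annotate_specimens(specimens, climate_lookup):
    """Attach environmental base-layer values (e.g. land cover, climate)
    to museum specimen records, each tagged with a place and time.
    Both the record schema and the lookup callable are hypothetical."""
    annotated = []
    for s in specimens:
        layer = climate_lookup(s["lat"], s["lon"], s["year"])
        annotated.append({**s, **layer})  # merge record with layer values
    return annotated

# Toy lookup standing in for a real geo-spatial base layer
def toy_lookup(lat, lon, year):
    return {"mean_temp_c": 15.0 + 0.02 * (year - 1900)}

records = [{"species": "Hyla regilla", "lat": 37.87, "lon": -122.26, "year": 1950}]
print(annotate_specimens(records, toy_lookup))
```

Once each specimen carries its environmental context, distributional models can be fit against those layers rather than physiological tolerances alone.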
Richard Allen, Professor of Earth and Planetary Science
The Berkeley Seismological Laboratory runs networks across all of California, with sensors around several active fault lines, a precise GPS network, and mobile instruments that can stream low-amplitude signals and tremors. Allen noted that “the potential of the future is most exciting” — one possibility is using the accelerometers in the approximately 16 million smartphones in California (compared to 1,000 seismic stations) to record motion and location in real time. That would mean a huge increase in information: how can this data be harnessed and made available? There are many challenges, Allen says, “but this is where BIDS comes in.”
K.Shankari, Graduate Student, AMP Lab
“I want to look at transportation problems.” With ongoing concerns about global warming, Shankari is looking at improving bike-share systems. Right now, it can be difficult to find or return a bike where you want to because some stations are more popular than others. She is approaching this as a distributed scheduling problem, a classic paradigm in computer science. In simulation, Shankari ran naïve algorithms that incorporate information about when stations are nearing their limits of either bikes or empty spots. Her results showed that such an algorithm could improve the redistribution of resources, and thus the likelihood of a customer finding a bike, or a spot to park one, when they wanted to.
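A minimal sketch of this kind of simulation, assuming a toy model of fixed-capacity stations and a naïve rebalancing rule (the station model, the threshold, and the rebalancing step are my illustration, not Shankari’s actual algorithm):

```python
import random

def simulate(n_stations=5, capacity=10, n_trips=1000, threshold=0.8, seed=42):
    """Toy bike-share simulation. Riders pick a random origin and
    destination; a trip fails when the origin has no bikes or the
    destination has no empty spots. When a station crosses the
    capacity threshold, a naive rebalancing step moves one bike
    to the currently emptiest station. Returns the failure rate."""
    rng = random.Random(seed)
    bikes = [capacity // 2] * n_stations
    failures = 0
    for _ in range(n_trips):
        src, dst = rng.randrange(n_stations), rng.randrange(n_stations)
        if bikes[src] == 0 or bikes[dst] == capacity:
            failures += 1
            continue
        bikes[src] -= 1
        bikes[dst] += 1
        # naive rebalancing: if the destination is nearly full,
        # shift one bike to the emptiest station
        if bikes[dst] >= threshold * capacity:
            emptiest = min(range(n_stations), key=bikes.__getitem__)
            bikes[dst] -= 1
            bikes[emptiest] += 1
    return failures / n_trips

print(simulate())
```

Comparing the failure rate with and without the rebalancing rule is the kind of experiment such a simulation supports.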
Qingkai Kong, Graduate Student, Berkeley Seismological Laboratory
The BSL has been using a shake table to simulate large earthquakes, collecting data from smartphones and training an algorithm to recognize the features of earthquakes, to see how well earthquake activity can be detected from phone data. The technology is exciting and promising because “99.8 percent of the time it can correctly recognize an earthquake.”
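The pipeline Kong describes — extract features from accelerometer traces, then train a classifier to separate earthquakes from everyday motion — might be sketched like this. The single energy feature and one-threshold “classifier” are a deliberately simplified stand-in for the lab’s real algorithm:

```python
import random

def energy(trace):
    """Mean squared amplitude of an accelerometer trace (list of floats):
    one simple feature that differs between shaking and stillness."""
    return sum(x * x for x in trace) / len(trace)

def train_threshold(quake_traces, noise_traces):
    """Pick the energy threshold that best separates labelled traces.
    A toy stand-in for a real learned classifier."""
    candidates = sorted(energy(t) for t in quake_traces + noise_traces)
    best_th, best_acc = candidates[0], 0.0
    for th in candidates:
        correct = sum(energy(t) >= th for t in quake_traces)
        correct += sum(energy(t) < th for t in noise_traces)
        acc = correct / (len(quake_traces) + len(noise_traces))
        if acc > best_acc:
            best_th, best_acc = th, acc
    return best_th, best_acc

# Synthetic data: strong shaking vs. low-amplitude background noise
rng = random.Random(0)
quakes = [[rng.gauss(0, 1.0) for _ in range(200)] for _ in range(20)]
stills = [[rng.gauss(0, 0.1) for _ in range(200)] for _ in range(20)]
th, acc = train_threshold(quakes, stills)
```

On real phone data the two classes overlap far more, which is why the reported 99.8 percent accuracy requires richer features and a real learning pipeline.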
Harrison Dekker, Head Librarian, Library Data Lab
“I help social science students … who need to work with real world data. I help them identify the resources, understand the formats they are in, and because I have a background in computer programming, I can help them use statistical programming languages.” This is a free service in the library that can offer support at all levels.
One goal is to see more class offerings for undergrads, and for faculty to teach with data: students would have to come up with a topic, find the resources, and analyze the data. Dekker wants to see undergraduates trained in statistics and programming: “If they learn this stuff as undergraduates, they get jobs.”
Jason Vytlacil, Graduate Student, Helen Wills Neuroscience Institute
“I don’t usually do anything like this!” Vytlacil was influenced by Griffiths et al.’s finding that Google and the mind face similar computational problems – responding rapidly to a query with the most likely target. He tested this using popular music, giving subjects the name of a musical artist and asking what song first came to mind. The question: is the answer predicted by raw exposure (the number of times you have listened to the song), or by general popularity, and how does this help us model the human mind? “I built a code base that takes in your iTunes library” and asked, “for every track in your library, get me all the similar tracks in last.fm.” He then ranked and sorted music both by number of listens (how familiar subjects were with the song) and by PageRank. This type of modeling could lead to the development of personalized stimuli for future studies of human memory organization.
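The comparison being tested, personal exposure versus global popularity, could be set up roughly like this. The data structures and scores below are hypothetical illustrations; the real code talks to the last.fm API and an actual iTunes library export:

```python
def rank_two_ways(candidate_tracks, play_counts, popularity):
    """Rank an artist's candidate songs two ways: by the subject's own
    play counts ('raw exposure') and by a global, PageRank-style
    popularity score. All inputs here are hypothetical."""
    by_exposure = sorted(candidate_tracks,
                         key=lambda t: play_counts.get(t, 0), reverse=True)
    by_popularity = sorted(candidate_tracks,
                           key=lambda t: popularity.get(t, 0.0), reverse=True)
    return by_exposure, by_popularity

tracks = ["Song A", "Song B", "Song C"]
plays = {"Song A": 3, "Song B": 40}                   # subject's listening history
pop = {"Song A": 0.9, "Song B": 0.2, "Song C": 0.5}   # global popularity score
exposure, popular = rank_two_ways(tracks, plays, pop)
```

Comparing which of the two rankings better predicts the song a subject names first is the modeling question at the heart of the study.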
Want to know more?
More information about the Berkeley Institute for Data Science: