The Berkeley Institute for Data Science: Part II

In the first part of this blog post, I reported on the opening of the new Berkeley Institute for Data Science (BIDS). Today, I am going to share with you some of the ways that UC Berkeley scientists are using and analyzing “big data.” At the BIDS event, there were both talks and poster presentations highlighting recent projects. Here’s just a small taste of what was happening!

Solomon Hsiang, Assistant Professor of Public Policy

Hsiang’s work focuses on the effect of the environment on human society. Recently his lab reconstructed dozens of storms in the Philippines and linked that information to detailed household survey data. They found that in the year following a storm, households faced an increased risk of lacking basic assets such as walls, electricity, and plumbing. These storms cause a localized economic depression: people significantly reduce spending on nutritious food, education, and medical care, while infant mortality significantly increases. Matching the data to the distribution of where people live in the Philippines, and accounting for hidden costs and economic losses, these events tend to be “roughly 15 times more costly than what you see in the newspaper,” highlighting the importance of rebuilding efforts after storms.

Rosemary Gillespie, Professor of Environmental Sciences

The Berkeley Eco-Informatics Engine will integrate biological and environmental data to learn more about how organisms respond to global change. Currently, the default is to use the physiological constraints of organisms, such as temperature and precipitation tolerance, to predict where organisms might go, but this method does not always predict behavior well. The engine will instead bring together huge amounts of diverse data through an open API, allowing integration and analysis at scale. This includes museum specimens, each associated with a place and time, combined with geo-spatial base layers such as land cover, climate, and the history of the landscape to predict responses to climate change.

The Berkeley Institute for Data Science: “Everyone is going to have to become a data scientist”

On Thursday, December 12th, 2013, hundreds of Berkeley scientists, students, and industry folk gathered to celebrate the opening of the Berkeley Institute for Data Science (BIDS), a multidisciplinary support system that will provide incentives, infrastructure, and support for scientists from all departments hoping to jump on the big data wagon. Funded to the tune of $37.8 million by the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation, BIDS represents the future of science and is part of a larger push to accelerate technological advances in the United States: the White House recently pledged $200 million to its own big data initiative, and the BIDS partnership was announced at its “Data to Knowledge to Action” event last month.

Nobel Laureate and Director of BIDS Saul Perlmutter on how programming environments should not be an obstacle to scientists: "I don't think God wrote in C."

BIDS is part of a collaborative project with New York University and the University of Washington, and is led by a team that includes Berkeley astrophysicist and Nobel Laureate Saul Perlmutter. The new center at Berkeley will be housed at 190 Doe Library, a national landmark known for its history; it is a space that will now be known for moving UC Berkeley into the future.

What is big data, and why do we need an institute for it? Big data refers to the massive amounts of information we can now collect thanks to advances in technology, from smartphones and web clicks to geographic positioning systems to genomic data and beyond! Harnessing this kind of data takes statistical knowledge and programming skills, it requires people to cross fields, and it challenges academia to stay relevant at a time when industry may lure tech-savvy individuals away. Recognizing all of this, BIDS is in a key position to address current and coming changes in how we think about science.

Nicholas Dirks kicked off the ceremony by pledging that innovation in research was a core focus for him as the new chancellor of UC Berkeley, and that big data will “bring departments and programs together in unprecedented ways.” Vicky Chandler, a geneticist and chief program officer from the Moore Foundation, stated that this project is a bold set of experiments that will change the culture at Berkeley and will revolutionize how science is practiced.

Saul Perlmutter, the new director of BIDS, underscored the importance of bridging the gap for underrepresented populations (important food for thought at a heavily Caucasian and male event) and discovering what is slowing down scientists who are less “data science savvy.”  Programming environments should not be an obstacle for scientists, he said, noting, “I don’t think God wrote in C.”

The 23andMe Shutdown

Confession: I had my DNA analyzed by 23andMe. And I have no regrets. I haven’t rushed to my doctor, jumped off a bridge, or engaged in any risky medical behaviors. If anything, I’ve become more interested in genetics, learned about analyzing risk and “odds ratios,” and discovered some interesting information about myself.
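For readers unfamiliar with the “odds ratios” mentioned above, here is a minimal sketch of how one is computed from a 2×2 table of counts. The numbers below are made up purely for illustration; they do not come from 23andMe or any real study.

```python
# Hypothetical counts: disease status among carriers and non-carriers
# of some risk variant. Illustrative numbers only, not real data.
carriers_with_disease = 30
carriers_without_disease = 970
noncarriers_with_disease = 10
noncarriers_without_disease = 990

# Odds of disease in each group, then their ratio.
odds_carriers = carriers_with_disease / carriers_without_disease
odds_noncarriers = noncarriers_with_disease / noncarriers_without_disease
odds_ratio = odds_carriers / odds_noncarriers

print(round(odds_ratio, 2))  # prints 3.06
```

An odds ratio above 1 means the variant is associated with higher odds of the condition; here, carriers have roughly three times the odds of non-carriers, which is a statement about relative risk, not a diagnosis.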

23andMe is a genomics company (backed in part by Google) that offers the general public access to genetic information. For $99, you can send in a saliva sample and get back a full suite of genetic details about yourself, including ancestry, physical traits, and risks of having certain medical conditions. Unless you have been hiding under a rock for the last few weeks, you likely have heard that 23andMe has been ordered to stop selling its product by the Food and Drug Administration (FDA).

So why is the FDA trying to shut 23andMe down? You can see the letter, dated November 22, 2013, from the FDA here:

The concerns outlined by the FDA include that the 23andMe test is “intended for use in the diagnosis of disease or other conditions or in the cure, mitigation, treatment, or prevention of disease” and carries “the potential health consequences that could result from false positive or false negative assessments.” The FDA suggests that someone with a false positive test could undergo serious medical procedures unnecessarily, or that someone with a false negative would neglect to do so.
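The false positive concern is worth unpacking with a quick calculation. When a condition is rare, even a fairly accurate test produces mostly false alarms among its positive results. The sketch below applies Bayes’ theorem with made-up numbers; the prevalence, sensitivity, and specificity are illustrative assumptions, not figures for any actual 23andMe report.

```python
# Illustrative numbers only: a rare condition and a fairly accurate test.
prevalence = 0.001   # 1 in 1,000 people actually have the condition
sensitivity = 0.99   # probability the test is positive if you have it
specificity = 0.99   # probability the test is negative if you don't

# Bayes' theorem: probability of having the condition given a positive test.
p_positive = (sensitivity * prevalence
              + (1 - specificity) * (1 - prevalence))
ppv = sensitivity * prevalence / p_positive

print(round(ppv, 3))  # roughly 0.09
```

In this hypothetical, a positive result means only about a 9% chance of actually having the condition, which is exactly why follow-up testing, rather than immediate drastic action, is the sensible response to any screening result.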

The shutdown has caused a bit of an uproar and a slew of opinions in the media, some of which imply that the tests may be inaccurate. But is the ensuing debate really about the quality of the information you receive from 23andMe? From wading through many science blogs and articles, it appears not: the consensus among many scientists is that the information you receive from 23andMe is reasonably accurate; in other words, the report can tell you which nucleotides are present at the parts of the genome that are analyzed.

Now, to be clear, 23andMe does not sequence the entire genome. Instead, it examines thousands of “SNPs” (single nucleotide polymorphisms), individual positions in the DNA where people tend to differ. While most of our genetic code is identical from person to person, there are an estimated 10 million places where we differ, and 23andMe has targeted a subset of those genome locations for analysis.
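In practice, the genotype data behind such a report is just a long table of SNP identifiers and the two letters you carry at each position. The sketch below parses a tab-separated file in the style of 23andMe’s raw-data export (rsid, chromosome, position, genotype, with “#” comment lines); the exact layout is an assumption here, and the sample SNP is only an example identifier.

```python
# Sketch: read a 23andMe-style raw data file into a dict of
# rsid -> genotype. Assumes tab-separated columns of
# rsid, chromosome, position, genotype, with '#' comment lines.
def load_snps(lines):
    genotypes = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        rsid, chromosome, position, genotype = line.rstrip("\n").split("\t")
        genotypes[rsid] = genotype
    return genotypes

sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs4988235\t2\t136608646\tAG",
]
print(load_snps(sample)["rs4988235"])  # prints AG
```

Each genotype is a pair of letters, one inherited from each parent, which is all the raw data says; any statement about disease risk comes from separately published associations between particular genotypes and conditions.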

So the crux of the argument is really how people will use the information they receive from the DNA test. Let’s face reality: we live in a world where information is everywhere. The internet alone has several “symptom interpreters” that could have a potential hypochondriac such as me convinced I have an entire spectrum of diseases. We also love collecting data on ourselves, as the popularity of personal fitness trackers and apps (e.g., Fitbit, Jawbone UP) suggests. Should these websites (such as WebMD and the Mayo Clinic’s symptom checker) and fitness trackers also be considered medical devices that need to be regulated by the FDA?

False positives and negatives are a risk with any medical test, even ones done by your doctor. Hopefully anyone who received a 23andMe result indicating risk for a serious medical condition, such as breast cancer, would get further diagnostics done before getting a double mastectomy!