Big data isn’t just changing the way we shop, date, and communicate. It’s also changing the way we do science. This April 2 to 5, molecular and computational biologists from across the country converged in San Francisco for the Joint Genome Institute (JGI) annual meeting. The JGI was created by the US Department of Energy (DOE) to help sequence the human genome. Since the completion of the human genome sequencing project, there have been enormous changes for both the JGI and genomics, the field of biology that grew out of DNA sequencing. After the human genome project, the JGI pivoted toward environmental genomics: the use of DNA sequencing to understand the biology of wild plants and microbes, with a focus on ecosystems and sustainability. The JGI annual meeting showcased the latest research in applied and fundamental environmental genomics. Microbial communities, synthetic biology, and plant genetics featured most heavily, and in all cases a single trend was clear: research projects are getting exponentially bigger and faster. Amid this celebration of scale, though, were many discussions about what the obsession with growth means for science as a field.
Microbes play a huge role in cycling nutrients and other molecules through ecosystems. As the cost of microbial genome sequencing plummets from cheap to cheaper, it has become increasingly common for scientists to study entire microbiomes, or microbial communities. Typically, these projects aim to uncover fundamental principles of microbial biology, characterize new species, or identify species or genes that may be useful to humans. Some projects did all three, such as the one presented by Aindrila Mukhopadhyay. Her team isolated microbes from a toxic waste site and sequenced their plasmids—DNA molecules that float around apart from the genome and can be easily transferred from one bug to another. In addition to identifying several novel microbial species, the team found the largest naturally occurring plasmid documented to date, as well as another plasmid that confers mercury resistance on the microbe carrying it.
Another exciting large-scale microbiome project came from Kevin Solomon, whose team has discovered new microbes that efficiently degrade plant waste, a major bottleneck for biofuel production. One particular fungus, which they isolated from the donkey gut microbiome, may even be able to fine-tune which plant-chewing enzyme it secretes into the animal’s digestive tract based on the composition of the plant material the donkey consumes. They are continuing to study it, in the hopes of discovering distinct enzyme cocktails that are each optimized to degrade corn, other grasses, or trees.
Large-scale microbiology is not limited to natural communities, and scientists at the meeting were also developing tools for synthetic biology. Synthetic biology uses genetic engineering to produce marketable products. Often, this involves screening thousands of microbial strains to find the one that’s most productive. Pamela Peralta-Yahya pointed out that “a key challenge in engineering microbes for chemical production is that we can only screen a fraction of the microbes we can build.” It’s fairly easy to produce thousands of mutant strains from one culture, but characterizing the molecules each strain produces is tedious, involving lengthy chemical extractions and mass spectrometry analysis by specially trained experts. To enable faster screening of, for example, drug-producing microbes, her group has been engineering yeast that contain modified human olfactory (smell) receptors, which glow fluorescently upon a successful drug-receptor interaction. With this technology, a researcher could simply measure the amount of light emitted by each strain to determine how much of the drug it’s producing. They have already had promising results from a pilot study, in which they isolated a strain of engineered yeast that produces high amounts of serotonin. Her lab’s goal is to use this technology to isolate productive strains faster, testing a million chemical-producing microbes in a single day.
Plant genetics, a notoriously slow field of biology, is also scaling and speeding up. Doreen Ware illustrated this when she remarked, “Ten years ago, my group was involved in sequencing the first maize genome, and that took a bunch of labs and about $50 million. This year, my lab is coming out with 27 high-quality maize genomes, and, well, let’s just say it’s a lot cheaper.” Michael Purugganan, one of the meeting’s keynote speakers, described a staggering effort, using data from thousands of rice varieties, to assign a fitness value to every genetic variant in the rice genome. In other words, if your breed of rice has a mutation in a particular gene, you could use Purugganan’s map to look up how advantageous, neutral, or harmful that mutation is predicted to be. Todd Mockler described a similarly massive project on sorghum, an up-and-coming biofuel crop. His team is using ground cameras, laser scanners, and satellites to collect many types of plant growth data for 400 sorghum varieties. They’ve already used this information to identify naturally occurring mutations that may contribute to robust growth and drought tolerance, and they aim to identify many more.
With all these impressively large efforts, the rapid rise of big data-driven biology was itself a hot topic at the meeting. One such discussion arose following a presentation by Sunil Chandran, the vice president of Amyris, who described the green bioproduct company’s latest efforts to streamline its operations. Their infrastructure is designed around a need for speed, from robotics that help them analyze 100,000 microbial strains per month to machine learning algorithms that help automate experimental design. During the Q&A, an audience member asked, “Isn’t it likely that with all these high-throughput pipelines, some of those strains you’re discarding are still valuable?” Heads in the audience bobbed in agreement.
This is an important question for science today. It can sometimes feel like biology has become a race not to make the most profound discovery, but to create the biggest dataset as fast as possible. It’s like finding a cove piled high with sunken treasure, making off with a fistful, and then embarking on another voyage to find a bigger cove without ever fully exploring the first. There was even one presentation at the meeting from a sociologist, Emma Frow, who discussed how biology is becoming less experiment-oriented and more focused on “keeping the robots and the computers busy.” Has big data made science more wasteful? And as blind screening techniques become the norm, have we lost some of the elegance of classical hypothesis-driven experiments?
Perhaps, but just as the JGI Meeting raised this issue, the scientific community’s response shone through as well. Collaboration and democratization permeated just about every project. For example, Iria Bernhardsgrütter described a project to engineer a synthetic pathway for biological sequestration of CO2, a major greenhouse gas. This massive effort to engineer carbon-capturing microbes has brought together biochemists, structural biologists, and bioinformaticians across four institutes in the U.S. and Germany. Debra Mohnen spoke eloquently about the need for collaborative science, which she illustrated with her own project to find genes that control the growth of poplar, a proposed biofuel crop. One team found candidate genes, another disrupted those genes with genetic engineering, and a third recorded the effects on plant traits in field trials. The eminent Geri Richmond, former president of the American Association for the Advancement of Science (AAAS) and a physical chemist by training, gave a talk about the need for diversity and collaboration in science. In it, she joked, “I feel like I don’t need to tell you about how diversity improves productivity; I wish my field were as collaborative as yours!”
To some extent, science is inherently wasteful: it’s our job to throw things at the wall until something sticks. These days, we have the capacity to throw unprecedented amounts of data at the wall. And it’s true that as a result, attitudes about what makes for a compelling scientific investigation are changing. High-profile studies are more commonly a mile wide and an inch deep, generating far more data than we are capable of analyzing.
The flip side is that big data can bring scientists together in a way that small data can’t. Researchers who never would have communicated with each other prior to the internet age are now doing so routinely. And if there’s one thing our planet needs right now, it’s diverse research teams working together on big problems. In spite of the sometimes uncomfortable speed with which computers are changing science, the atmosphere at the JGI Meeting was charged with optimism. The findings presented at the meeting demonstrate that when everyone puts their heads together, big projects and big data can yield big results.
Featured image by Ousa Chea.