BIF_1, a computationally designed protein.

Proteins are tiny machines within our cells that work tirelessly to keep us alive. Each is the result of millions or even billions of years of evolution by mutations (small, essentially random changes to its molecular composition) and has been fine-tuned to perform its specific function. Everything from breaking down sugars for energy, to using oxygen from the air to carry out chemical reactions, to identifying harmful pathogens inside the body is done by proteins. But what if we need a protein that can do something completely new? What if we need a protein to solve a problem that we never encountered in the past million years but that is urgent today? Can we afford to wait millennia for random chance to give us what we need? No, and we don’t need to. With recent scientific and technological advances, today we can do something unprecedented: design and build brand-new proteins from scratch using de novo protein design.

To understand how proteins can be designed, we must first understand what exactly proteins are. Proteins are one of the functional units of life. They are the brick and mortar that hold organisms together, the activators and regulators of chemical reactions inside cells, and the connectors between an organism and its environment. Proteins are an enormously diverse set of molecules, each precisely tuned to carry out individual functions in concert with many others. Proteins are chains built from an “alphabet” of 20 smaller molecules called amino acids, with each chain able to fold into a particular three-dimensional shape that suits it to its function. Even more fascinating is that proteins are largely products of chance. Mostly random mutations over the course of millions of years have altered protein chains one amino acid at a time, resulting in a wide range of different proteins with different capabilities. Collectively, these changes have given rise to proteins that are evolved for the functions they perform today.

Proteins are composed of chains of amino acids, which eventually fold into the three-dimensional native structure. The three-letter codes represent the 20 amino acids that make up the protein “alphabet.” RCSB Protein Data Bank ID: 1RIL.

But evolution can only go so far. The natural selection that has driven such diversity among proteins has also restricted nature to finding new combinations at an extremely slow rate. Under evolution, we only end up with certain proteins if they happen to have a favorable effect and promote the fitness of an organism, which may take millions of years. Naturally occurring proteins evolved to solve problems that existed under specific environmental conditions in the past. But to solve new biological problems, we need new proteins, and waiting millions of years is not an option. For example, vaccines are a relatively recent human invention that artificially produces an immune response against pathogens; no protein has naturally evolved to become a vaccine. Current vaccines often use modified microbial proteins to evoke an immune response, but this approach is problematic (manufacturing and unpredictability being two major issues). If we could bypass evolutionary constraints and control the design of proteins, we could create brand-new proteins that are capable of brand-new functions. This is where de novo protein design comes in.

De novo (Latin for “from new”) protein design, also called computational protein design, is an emerging field in the biological sciences. The basis for de novo protein design is a postulate by Christian Anfinsen in 1973 about how proteins naturally fold into their characteristic three-dimensional shapes, which influence their functions. Proteins are chains of amino acids that can fold into many conformations, and some conformations are more stable than others, meaning they are more likely to stay folded and functional. According to Anfinsen’s hypothesis, which turns out to hold true in most cases, the most stable conformation represents the protein’s functional form and is known as the “native state.” Once they’re manufactured in the cell, proteins sample multiple conformations until they end up in their stable native state.

If an average protein is 200 amino acids long, then there are theoretically 20^200 possible amino acid sequences of which a protein could be made, but nature has only sampled about 10^12. De novo protein design uses all of nature’s building blocks, but goes beyond nature’s limits by computationally searching parts of this sequence space that evolution never reached, giving rise to brand-new proteins. Using de novo design, we can create artificial (or “synthetic”) proteins that act as vaccines, for example. Another recent breakthrough in de novo protein design is the development of a synthetic enzyme that can break down gluten, which could make a huge improvement in the lives of patients suffering from celiac disease. Both of the above are real-world examples that originated in Dr. David Baker’s laboratory.
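To get a feel for the scale of that estimate (20 choices at each of 200 positions, or 20^200, against roughly 10^12 sequences sampled by nature), the arithmetic can be sketched in a few lines of Python. The constant and function names below are illustrative, and the figures are the ones quoted in the text rather than new data.

```python
import math

ALPHABET_SIZE = 20          # amino acids in the protein "alphabet"
CHAIN_LENGTH = 200          # length of an "average" protein, as quoted in the text
NATURE_SAMPLED_LOG10 = 12   # nature has sampled about 10^12 sequences (figure from the text)


def log10_sequence_count(alphabet_size: int, chain_length: int) -> float:
    """Return log10 of alphabet_size ** chain_length without building the huge number."""
    return chain_length * math.log10(alphabet_size)


total_log10 = log10_sequence_count(ALPHABET_SIZE, CHAIN_LENGTH)
gap_log10 = total_log10 - NATURE_SAMPLED_LOG10

print(f"Possible sequences: roughly 10^{total_log10:.0f}")
print(f"Nature has sampled about 1 in 10^{gap_log10:.0f} of them")
```

Working with logarithms keeps the numbers readable: 20^200 is a 261-digit number, so almost all of sequence space remains unexplored by evolution.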

Dr. David Baker, a professor of biochemistry and Director of the Institute for Protein Design at the University of Washington in Seattle, is considered the pioneer of de novo protein design. The work started with the development of Rosetta, a computer program that uses algorithms to predict protein structures based on the stability of combinations of amino acids.

“Up until recently, the only proteins we knew were the ones that existed in nature,” says Baker. “It was a foreign idea to think of brand new proteins.” The Rosetta program works backward from how proteins normally fold: instead of finding the most stable conformation of a particular sequence of amino acids, it takes a pre-determined structure and intended function and scans multiple amino acid combinations to find the sequence with the most stable native state within the structural constraints. This novel approach allows scientists to start by specifying what they want the new protein to do, and then carefully build it piece by piece. In a few months, new, never-before-existent proteins can be produced in the lab.
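The “working backward” idea can be illustrated with a deliberately simplified toy. This is not Rosetta’s algorithm or scoring function. It assumes a hypothetical target described only by which positions are buried (`B`) or exposed (`E`), a made-up energy that counts residues sitting in the wrong environment, and a simple random-mutation search that keeps any change that does not raise that energy. All names here (`toy_energy`, `design_sequence`, the hydrophobic grouping) are assumptions for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # one-letter codes for the 20 amino acids
HYDROPHOBIC = set("AFILMVWY")          # simplified grouping, an assumption for this toy


def toy_energy(sequence, burial_pattern):
    """Count positions whose residue type mismatches the target environment.

    'B' marks a buried position (a hydrophobic residue is preferred);
    'E' marks an exposed position (a polar residue is preferred).
    Lower is better; 0 means every position matches.
    """
    return sum(
        (residue in HYDROPHOBIC) != (environment == "B")
        for residue, environment in zip(sequence, burial_pattern)
    )


def design_sequence(burial_pattern, steps=5000, seed=0):
    """Search for a low-energy sequence for a fixed target pattern."""
    rng = random.Random(seed)
    sequence = [rng.choice(AMINO_ACIDS) for _ in burial_pattern]
    energy = toy_energy(sequence, burial_pattern)
    for _ in range(steps):
        if energy == 0:
            break
        position = rng.randrange(len(sequence))
        previous = sequence[position]
        sequence[position] = rng.choice(AMINO_ACIDS)   # propose a point mutation
        new_energy = toy_energy(sequence, burial_pattern)
        if new_energy <= energy:
            energy = new_energy                        # keep neutral or improving changes
        else:
            sequence[position] = previous              # revert changes that make it worse
    return "".join(sequence), energy


designed, final_energy = design_sequence("BEEBBEEBBEEBBEEB")
print(designed, final_energy)
```

The point of the sketch is the direction of the search: the target is fixed first, and sequences are scored against it. Real design software evaluates detailed physical models of a three-dimensional structure rather than a two-letter pattern.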

Some examples of designed proteins from the Baker lab. B, C, D, and F are examples of self-assembling protein “cages” that can potentially be used to deliver cargo (such as drugs) to specific targets inside the cell. RCSB Protein Data Bank IDs: 4YXY, 4DDF, 3VCD, 4NWO, 4DCL, 5IM4.

Manipulating proteins to carry out new functions is not a new idea: for decades, the field of protein engineering has been mutating single amino acids within proteins to alter their chemistry, adding or deleting entire parts of proteins, or even fusing different proteins together (making so-called “chimeras”). These modifications can improve protein stability or reactivity, or enable them to operate in new conditions, such as high temperatures. But these methods rely on changing or optimizing already existing proteins. The commonly used Taq polymerase enzyme, used to amplify small amounts of DNA, is an example. Researchers have made multiple significant modifications to improve the protein’s stability and accuracy over the years, but its fundamental properties remain the same. This approach is relatively simple for scientists, as it relies on the natural protein’s native stability and activity, and only alters the composition of the protein to improve its performance. For a more diverse range of problems, however, altering or modifying proteins isn’t enough: this approach is not only limited by the size and shape of the original protein, but can also mislead us to believe that nature’s way is the only way.

De novo protein design has been popularized over the past several years due to a few major factors. According to Baker, the leaps in computing power, the improved understanding of the physics that drive protein folding, and the desire to actually use de novo design have accelerated the field to its current popularity.

Until recently, computer processing power was a persistent technical hurdle for computational design. Even with the huge advancements made in modern computing, Baker’s lab realized that just buying, storing, and running more computers nonstop would not provide enough power. The solution: crowdsourced computing, and the birth of Baker’s Rosetta@home program. Rosetta@home began as a project on the Berkeley Open Infrastructure for Network Computing, or BOINC, which is a distributed computing network that allows personal computers all over the world to contribute idle processing power to tackle small pieces of large problems.

Another huge advance made in the field has been scientists’ improved understanding of the physics that govern protein folding. Scientists are continuing to develop better tools to observe biomolecules experimentally and to understand their dynamic properties, using improved computer simulations to better predict their behavior. “We are improving our basic physical model of proteins and making predictions more and more accurately. This is the main bottleneck,” says Baker. While there have been significant leaps made in this area, there are still some major concepts missing, including a complete understanding of enzyme mechanisms and molecular machines that use chemical energy to generate mechanical forces. Nature still has the edge on these more sophisticated systems that have been refined by millions of years of trial-and-error, but we are definitely hot on its trail.

The emerging desire to use de novo protein design is particularly interesting because it involves re-thinking proteins in general. “In some ways the richness of the natural protein world was what prevented people from looking elsewhere,” says Baker. “It was easier to start from proteins already in nature, rather than to start from scratch…it’s like when people thought Eurasia was the entirety of the world, and then explorers discovered the New World.” This New World of de novo protein design offers many advantages that are not feasible if we restrict ourselves to the tools evolution has provided. “Proteins in nature solve problems that existed during evolution, and they are very good at carrying out their functions. But we now live in a world that is very different from the one we evolved in,” notes Baker.

One of the “new” challenges facing humanity today is combating disease. An important way synthetic proteins are influencing scientific thought is in the treatment of currently untreatable serious diseases. In the near future, scientists may be able to build proteins that are capable of recognizing fluctuations in chemical signals and performing actions in response to the detected level of signals—a previously inconceivable idea that is now within our grasp. The potential for de novo proteins appears to be endless.

The allure of these novel applications of de novo design has given rise to a wave of scientists using and developing this approach. Yang Zhang from the University of Michigan has developed QUARK and I-TASSER, which are protein structure prediction and design algorithms similar to Rosetta, but use different computational approaches. Rhiju Das of Stanford University is researching de novo design of RNA, another type of biomolecule that can not only perform reactions, as proteins can, but also plays a critical role in the production of proteins from the instructions in an organism’s genome. All these innovative applications help improve our constantly evolving knowledge of biomolecules, and with our understanding of the principles underlying the molecules of life, we are poised to enter a new chapter in synthetic biology: one in which we are no longer limited by the rules set by evolution.

Featured image: BIF_1, a computationally designed protein. For more information, see Pearson and Mills et al.