How Computers Are Starting to Think Like Chemists

The artificial intelligence (AI) revolution is upon us. Self-driving cars seem to be on the cusp of regular use. AlphaGo Zero, the newest iteration of Google’s Go-playing computer program, has surpassed its world-champion-defeating predecessor AlphaGo by leaps and bounds a mere two years after AlphaGo first made headlines. And the Kiwi food delivery robots often seen roaming across the UC Berkeley campus serve as a daily, local reminder of these advances in machine learning that are revolutionizing industries as disparate as manufacturing and healthcare.

But for all its achievements, the AI revolution seems to have left at least one realm of human activity (relatively) untouched: conducting scientific research. Scientific research today is still driven primarily by human efforts, just as it was 100 years ago––humans are designing and conducting experiments to test hypotheses that humans have come up with. That’s not to say that AI hasn’t impacted science at all––machine learning has greatly assisted researchers in making sense of the large amount of data generated in particle physics, for example––but a team of postdoctoral researchers and graduate students, rather than IBM Watson, is still the prototypical unit driving scientific advances. While self-driving cars might soon cause truck drivers to go the way of elevator operators, scientists don’t yet need to worry about their job security.

Or so it may seem.

While I can’t speak for other fields, all is not quiet within my own field of synthetic organic chemistry. People are working hard to bring the AI revolution to the synthetic chemist’s front door. It seems only a matter of time before much of what a synthetic chemist does day to day is no longer a uniquely human capability, and how synthetic chemists respond to this impending reality will play a large part in shaping the future of the field.

Synthetic chemistry primarily involves two tasks: first, thinking up a sequence of chemical reactions to construct a particular synthetic target, and second, actually doing it. Dr. Bartosz Grzybowski, a professor at the Ulsan National Institute of Science and Technology in South Korea and at the Polish Academy of Sciences, is most concerned with the “thinking” part.

Ever since graduate school, Grzybowski has wondered whether the cognitive part of a chemist’s job could be outsourced to a computer algorithm. Soon after beginning his professorship in 2003, he and his team began working on a computer program that would autonomously design a synthetic route to any target molecule it was given. By 2016, they had a program ready to show the world.

The computer program, named Chematica (whose inner workings were revealed in a 2016 publication), can design a synthetic route to a given target molecule, optimized for cost, step count, and feasibility, in just a few minutes. Planning that used to take chemists months of fine-tuning and experimentation can now be done with Chematica during a coffee break. Often, the routes that Chematica produces are quicker to carry out, more efficient, and more cost-effective than those chemists could have devised on their own.

Chematica’s performance was most visibly demonstrated in a recent publication in the journal Chem, in which Grzybowski and his team collaborated with the chemical supplier MilliporeSigma to choose eight targets for which Chematica would design synthetic routes. In the ultimate practical test, chemists would then carry out these routes in the lab. Each target was chosen to challenge Chematica in some way. Six of the eight were medicinally relevant compounds that MilliporeSigma chemists had previously tried and failed to synthesize. Routes to the seventh target, the blockbuster drug dronedarone, were protected by dozens of patents, so Chematica would have to find alternative routes that avoided them. Finally, the eighth target was a recently discovered compound that had never before been synthesized.

Chematica passed this test with flying colors. All eight compounds were successfully synthesized in the lab using Chematica’s recommended pathways, often with significant cost savings, higher yields, and fewer steps than existing syntheses. Furthermore, carrying out the procedures did not require years of laboratory expertise; four of the syntheses were performed by chemists who lacked significant experience in multistep organic synthesis.

Targets synthesized using Chematica-designed routes, and improvements over literature routes. Image source: Brian Wang

How does Chematica work? It doesn’t perform a simple brute-force search of all possible reaction pathways; given how many chemical reactions are possible on any one structure, such a strategy would be computationally intractable. Instead, it must intelligently navigate the tree of possible reaction sequences, using a strategy inspired by chess-playing programs. When playing chess, a computer program chooses its moves by evaluating the legal moves available to it under the rules of the game; each move is assigned a score based in part on the desirability of the resulting board position (e.g., do I now have more pieces than my opponent?). By repeatedly choosing the move with the best score, chess-playing programs steer the board toward the desired end state (i.e., checkmate).
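To make the chess half of the analogy concrete, here is a minimal Python sketch of one-move greedy selection with a material-count scoring function. The piece values, the position format, the `capture` helper, and the move names are all illustrative assumptions; real chess engines search many moves ahead rather than looking only one move out.

```python
# Conventional piece values for a simple material count (the king is omitted
# because it is never captured). Illustrative only, not any engine's evaluation.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material_score(position):
    """Score a position as my material minus my opponent's; higher is better."""
    mine = sum(PIECE_VALUES[p] for p in position["mine"])
    theirs = sum(PIECE_VALUES[p] for p in position["theirs"])
    return mine - theirs


def capture(piece):
    """Hypothetical move: remove one opponent piece of the given type."""
    def apply(position):
        theirs = list(position["theirs"])  # copy so the original is untouched
        theirs.remove(piece)
        return {"mine": list(position["mine"]), "theirs": theirs}
    return apply


def choose_move(position, legal_moves):
    """Greedy one-move lookahead: try every legal move, keep the best-scoring one."""
    best_move, best_score = None, float("-inf")
    for name, apply_move in legal_moves.items():
        score = material_score(apply_move(position))
        if score > best_score:
            best_move, best_score = name, score
    return best_move


# Toy position and move list, made up for illustration.
position = {"mine": ["Q", "R", "P"], "theirs": ["R", "N", "P", "P"]}
legal_moves = {
    "take_rook": capture("R"),
    "take_pawn": capture("P"),
    "quiet_move": lambda pos: pos,
}
best = choose_move(position, legal_moves)  # "take_rook" (+10 vs. +6 and +5)
```

The scoring function is the whole "judgment" here: swap in a different `material_score` and the same loop will prefer different moves.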

Grzybowski realized that retrosynthetic planning (the most common method chemists use to design synthetic routes, which works backwards from a complex target to simple, commercially available starting materials) is a lot like playing chess. Each retrosynthetic step is a “move” that can be evaluated against other possible moves with a scoring function that rewards, among other things, the generation of structural complexity. The “rules of the game” constraining the possible moves are the reactions currently known to organic chemists. These have to be pre-programmed into Chematica, which now contains more than 50,000 reaction rules hand-coded by Grzybowski’s team through years of hard work. By searching through the possible moves and repeatedly choosing the one with the best score, Chematica builds up a synthetic route that efficiently reaches the desired end state (i.e., commercially available starting materials).
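The same idea can be sketched for retrosynthesis. The Python below runs a simple best-first search: each state is the set of molecules that still have to be made, each “move” applies a reaction rule in the backward direction, and the score adds the steps taken so far to the total “complexity” still left to remove. Unlike a purely greedy chooser, it keeps lower-scoring alternatives in a queue so it can back up when a promising branch dead-ends. The molecule names, complexity numbers, rules, and purchasable set are all invented for illustration; Chematica’s real rules and scoring functions are far richer, and this is not its actual algorithm.

```python
import heapq
from itertools import count

# Toy data, invented for illustration: an abstract "complexity" per molecule,
# the molecules assumed to be commercially available, and retrosynthetic rules
# mapping a product to (rule name, precursors) pairs.
COMPLEXITY = {"target": 10, "A": 6, "B": 5, "C": 3, "G": 4, "D": 2, "E": 1}
BUYABLE = {"D", "E"}
RULES = {
    "target": [("coupling", ("A", "B")), ("cyclization", ("G",))],
    "A": [("reduction", ("C",))],
    "B": [("amide_formation", ("D", "E"))],
    "C": [("oxidation", ("D",))],
    # No rule disconnects G, so that branch is a dead end.
}


def score(pending, steps):
    """Lower is better: steps so far plus complexity of what can't be bought yet."""
    return steps + sum(COMPLEXITY[m] for m in pending if m not in BUYABLE)


def plan(target, max_expansions=1000):
    """Best-first retrosynthetic search; returns a list of (product, rule, precursors)."""
    tie = count()  # unique tiebreaker so heapq never compares sets or lists
    start = frozenset({target})
    frontier = [(score(start, 0), next(tie), start, [])]
    seen = set()
    while frontier and max_expansions > 0:
        _, _, pending, route = heapq.heappop(frontier)
        unsolved = [m for m in pending if m not in BUYABLE]
        if not unsolved:
            return route  # everything left can be bought: the route is complete
        if pending in seen:
            continue
        seen.add(pending)
        max_expansions -= 1
        # Disconnect the most complex unsolved molecule first.
        mol = max(unsolved, key=lambda m: COMPLEXITY[m])
        for rule, precursors in RULES.get(mol, []):
            new_pending = (pending - {mol}) | frozenset(precursors)
            new_route = route + [(mol, rule, precursors)]
            heapq.heappush(
                frontier,
                (score(new_pending, len(new_route)), next(tie), new_pending, new_route),
            )
    return None  # no route found within the expansion budget


route = plan("target")
for product, rule, precursors in reversed(route):
    print(f"{' + '.join(precursors)} -> {product} ({rule})")
# D -> C (oxidation)
# D + E -> B (amide_formation)
# C -> A (reduction)
# A + B -> target (coupling)
```

Note that the search tries the “cyclization” branch first because it scores best, finds that G cannot be disconnected further, and falls back to the coupling route. Read in reverse, the returned route is the forward synthesis.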

To keep Chematica up to date with the most recently published literature, chemists will have to continually write newly discovered reactions into its code. More desirable would be an algorithmic synthetic planner with greater autonomy: one that could teach itself reaction rules from the published literature without human intervention. In March 2018, Dr. Mark Waller and coworkers published just such a planner. The authors describe training a set of three deep neural networks on an enormous database of chemical reactions (or, as the authors put it, on “essentially all reactions ever published in organic chemistry”), so that reaction rules were extracted from the data rather than hand-coded. Combined with a search technique (Monte Carlo tree search) that guides the program through possible reaction pathways, this new planner generated synthetic plans for more molecules, in less time, than Chematica-like planners relying on hand-coded reaction rules.

In addition, when graduate students were asked to choose between a route generated by this new planner and a literature-reported route to each of a number of compounds, without knowing which route was which, they showed no statistically significant preference for the literature-reported route. Even in the eyes of people with synthetic chemistry expertise, the planner’s routes looked no less plausible than routes already proven to work in the lab.

The evolution of computer-aided retrosynthetic planning is reminiscent of the evolution from AlphaGo to AlphaGo Zero: both involved a transition from algorithms relying on human expert input to ones that learned far more on their own (AlphaGo Zero taught itself to play by playing against itself, without studying human games). The analogy isn’t perfect, however. While both AlphaGo and AlphaGo Zero developed Go-playing abilities that far surpass those of even the most talented humans, neither Chematica nor the newest neural network-based planner can yet claim to beat the most experienced synthetic chemists at retrosynthetic planning for every molecule. Algorithmic synthetic planners aren’t yet very good at accounting for how a molecule’s idiosyncrasies, such as its three-dimensional shape or whether it exists as an equilibrium between two forms, affect its reactivity, and such idiosyncrasies abound in structurally complex molecules.

The recent history of AI, however, is full of examples of computer algorithms conquering territory once thought to be uniquely human, and there’s no reason to believe that humans will retain primacy over synthetic planning forever. This doesn’t mean that synthetic chemists are doomed to a future of grunt work in which AIs get all the interesting cognitive tasks. Rather, chemists will have to redefine their mission and shift to tasks that AI won’t be able to perform for the foreseeable future. For example, while algorithmic synthetic planners can train themselves on the entire current chemical literature, they can’t discover new reactions. They also know only how to make a molecule they are given, not which molecule would be best to make in the first place. These limitations leave synthetic chemists of the future ample room for intellectual contribution and may lead chemists and AIs to serve complementary roles: as human chemists discover new reactions, algorithmic planners will learn them to extend their planning capabilities, and as human chemists decide which molecules to make based on their predicted or desired properties, AIs will provide the routes to make them.

This reinvention of the synthetic chemist’s role won’t necessarily come easily to those who have seen synthetic planning as the most important part of the job, but it may be necessary. “Thinking about how to make molecules has long been seen as one of the vital parts of organic chemistry,” medicinal chemist and popular blogger Derek Lowe writes, commenting on the recent advances in algorithmic synthetic planning, “but knowing how to handle horses was long seen as a vital part of raising crops for food, too. It’ll be an adjustment.” It’s an adjustment that members of many other professions have had to make over the years, and one that synthetic chemists will have to embrace, too.

Featured image: Max Pixel
