Every synthetic chemist knows the endless struggle behind the development of a new methodology, especially when the reaction is difficult to handle. Human intelligence, experience, the literature, and extremely hard work have remained the four pillars behind every successful synthetic methodology. In the 21st century, almost every field of science has adopted artificial intelligence to accelerate progress, yet automation and machine learning are still comparatively rare in synthetic chemistry.
However, the new optimization algorithm introduced by Prof. Doyle and co-workers may help synthetic chemists enormously; it is probably the tool that synthetic chemists have been looking for over many years.
Two US scientists, Prof. Abigail G. Doyle from the Department of Chemistry, Princeton University, and Prof. Ryan P. Adams from the Department of Computer Science, Princeton University, collaborated to develop this emerging software to help synthetic chemists.
After years of continuous effort, Prof. A. G. Doyle, Prof. R. P. Adams, and their groups developed a state-of-the-art global optimization algorithm built on the key principles of Bayesian optimization (BO), which allows faster and more efficient optimization of reactions for making diverse chemicals in the laboratory. They published the research report in February 2021.
After designing the algorithm, they set up several competitions between the AI and human experts and found that artificial intelligence could deliver unexpectedly better results than the manual effort of human chemists.
Before discussing the software in depth, I'd like to give a brief background to show the real picture in the drug industry and in synthetic laboratories. This will make clear why this new machine-learning tool for chemistry is so urgently needed.
The Dependency of the Drug Industry on Synthetic Chemistry
Synthetic organic chemistry is at the core of drug discovery and of medicine as a whole. A medicine reaches the market only after several stages, which in brief are:
- Designing target molecules
- Synthesis of the target molecules and related compounds
- In vitro and in vivo tests
- Toxicity profiling
- Quality control
- Several stages of clinical trials
A medicine can be approved by the FDA only after it passes all of the above stages. Remarkably, it takes 10-15 years and billions of dollars to bring a medicine to market from scratch.
However, the idea for a new drug spends a long time in the research and development sections of synthetic laboratories. Designing new compounds, developing methods and syntheses, thousands of failures, modifications, redesigns, and repeated syntheses: these processes go on day and night in chemical laboratories. Moreover, the synthesis of a drug candidate passes through multiple steps to reach its final form, and every step of the sequence needs to be optimized carefully to obtain maximum atom economy as well as the highest yields.
Although chemists in industrial laboratories sometimes develop their own methodologies for certain steps, to be honest, the drug industry does not have enough time to develop a new methodology for every step. Consequently, it relies heavily on literature methods established by academic researchers worldwide.
Reaction Optimization: One of the Toughest Jobs in the Synthetic Laboratory
Most conventional reactions are well established and often offer high atom economy, but they frequently fail in complex systems (when the target molecule is complex). Even when they can deliver a complex molecule, the route becomes so long that much of the overall atom economy is lost.
As a result, projects overrun their budgets, and the price of the final medicine rises sharply, making it inaccessible to ordinary people.
Recent developments in synthetic organic chemistry have produced several techniques that work well on complex molecules and remove many steps on the way to the final molecule. Cross-coupling reactions, C-H activation, asymmetric synthesis, photoredox and electrochemical redox chemistry, carboxylation, direct amino acid synthesis, etc., are some of the most sought-after technologies that many companies are adopting for the efficient production of complex drug-like molecules.
Notably, none of these techniques is easy to optimize. Developing a practical methodology for such demanding reactions with potential industrial applicability requires massive optimization that takes several months, sometimes even years.
I personally remember running around 1,000 reactions to optimize a ligand-enabled C(sp3)-H activation reaction that gives direct access to unnatural amino acids, and I still could not get above 90% yield. Continuous failures in yield or selectivity frustrated me, and at one point I thought of quitting. I even dreamed of a machine, a technology, or anything else that could resolve these horrible optimization issues. Unfortunately, no such machine-learning tool for chemistry existed in those days.
Discovery of Novel Machine Learning in Chemistry
Reaction optimization is ubiquitous in chemical synthesis, both in academia and across the chemical industry.
How can this new artificial intelligence (AI) tool improve the optimization process? Let's see.
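By definition, Bayesian optimization is a sequential strategy for optimizing black-box functions. It is especially useful when the function f(x) is expensive to evaluate and has no known analytical form or gradient, that is, when f(x) is a black box with unknown structure. In reaction optimization, x is a set of reaction conditions and f(x) is the yield, which can only be measured by actually running the reaction. Bayesian optimization fits a probabilistic surrogate model to the results collected so far and uses it to decide which experiment to run next.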
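To make the definition concrete, here is a minimal sketch of a Bayesian optimization loop in plain Python (standard library only). It assumes a single, already-scaled condition (for example, a temperature mapped to [0, 1]), a Gaussian-process surrogate with a fixed squared-exponential kernel (length scale 0.2, chosen arbitrarily), and expected improvement as the acquisition function. The `toy_yield` function is a hypothetical stand-in for running a real reaction. This is not the authors' implementation, which handles many categorical and continuous variables at once and proposes experiments in batches.

```python
import math
import random

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two 1-D inputs (fixed, assumed hyperparameters)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(matrix):
    """Lower-triangular L with L @ L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(matrix[i][i] - s) if i == j else (matrix[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def back_sub_transpose(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query.
    The small noise term on the diagonal keeps the Cholesky factorization stable."""
    K = [[rbf_kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = back_sub_transpose(L, forward_sub(L, y_train))  # K^-1 y
    k_star = [rbf_kernel(x_query, a) for a in x_train]
    mean = sum(k * al for k, al in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = rbf_kernel(x_query, x_query) - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """EI for maximization: E[max(f - best, 0)] when f ~ N(mean, std^2)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(black_box, candidates, n_init=3, n_iter=10, seed=0):
    """Start from a few random experiments, then repeatedly run the untested
    candidate with the highest expected improvement under the GP surrogate."""
    rng = random.Random(seed)
    x_obs = rng.sample(candidates, n_init)
    y_obs = [black_box(x) for x in x_obs]
    for _ in range(n_iter):
        untested = [c for c in candidates if c not in x_obs]
        if not untested:
            break
        best = max(y_obs)
        scores = [expected_improvement(*gp_posterior(x_obs, y_obs, c), best) for c in untested]
        x_next = untested[scores.index(max(scores))]
        x_obs.append(x_next)
        y_obs.append(black_box(x_next))
    i_best = y_obs.index(max(y_obs))
    return x_obs[i_best], y_obs[i_best]

def toy_yield(t):
    """Hypothetical 'yield' as a function of one scaled condition t, peaking at t = 0.7."""
    return math.exp(-((t - 0.7) / 0.15) ** 2)

if __name__ == "__main__":
    grid = [i / 20 for i in range(21)]
    x_best, y_best = bayes_opt(toy_yield, grid, n_init=3, n_iter=10)
    print(f"best condition {x_best:.2f}, yield {y_best:.3f}")
```

Expected improvement is what lets the optimizer try "unusual" conditions: a candidate scores highly either because the model predicts a high yield there or because the model is still very uncertain about it.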
Prof. Doyle and co-workers utilized the Bayesian Optimization strategy to build the tool for optimization of chemical synthesis.
Undoubtedly, synthetic chemists are skilled at picking the right strategy for optimizing a reaction, but most of the time they follow literature precedents that report similar conditions with slight variations. As a result, they often overlook unusual reagents or conditions that might improve the efficiency of the reaction.
Bayesian optimization, on the other hand, searches a completely unknown function by following a sequential algorithm, so it is free from the literature-driven bias that steers human choices (although it can only explore the options the chemist includes in the search space). It can therefore surface unexpected reaction conditions that a human chemist would never have thought of.
Prof. Doyle reported observing exactly such results during the study.
Artificial Intelligence Outperformed the Human Experts
After completing the coding, the researchers set out to benchmark the performance of Bayesian optimization against human experts, which is essentially a statistical comparison. They selected a specific C-H functionalization reaction and built a game that could track the decisions made by the chemists.
To launch the game they selected a subspace consisting of 1,728 reactions that included 12 ligands, 4 bases, 4 solvents, 3 temperatures, and 3 concentrations (based on experts’ analysis).
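The size of this subspace is simply the product of the number of options for each parameter. The short Python sketch below enumerates it with `itertools.product`; the labels are hypothetical placeholders, not the actual ligands, bases, solvents, temperatures, and concentrations used in the study.

```python
from itertools import product

# Hypothetical placeholder labels; only the counts match the benchmark.
ligands = [f"ligand_{i}" for i in range(1, 13)]     # 12 ligands
bases = [f"base_{i}" for i in range(1, 5)]          # 4 bases
solvents = [f"solvent_{i}" for i in range(1, 5)]    # 4 solvents
temperatures = ["T1", "T2", "T3"]                   # 3 temperatures
concentrations = ["C1", "C2", "C3"]                 # 3 concentrations

# Each combination of one choice per parameter is one candidate reaction.
search_space = list(product(ligands, bases, solvents, temperatures, concentrations))

if __name__ == "__main__":
    print(len(search_space))  # 1728 = 12 * 4 * 4 * 3 * 3
    print(search_space[0])    # ('ligand_1', 'base_1', 'solvent_1', 'T1', 'C1')
```

Running all 1,728 reactions is exactly what an optimizer is meant to avoid; the benchmark asks how few of them must be run to find a top-yielding condition.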
Fifty expert chemists from academia and industry took part in this reaction optimization game and were given one month to find the optimal reaction conditions.
According to the results, the human experts initially did better than the AI, but after only 3 batches of 5 experiments the AI had clearly surpassed human performance.
Each human participant could run up to 20 batches of reactions, for a total of 100 experiments, to find the best experimental conditions. Notably, most participants stopped before completing 20 batches, believing they had already found the optimal conditions (although they had not found the best conditions the AI identified). To give the humans the benefit of the doubt, the researchers assumed that each participant would have found the best result in the batch right after quitting, and used this assumption to set an upper bound, as in figure…
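The budget figures above can be checked with a little arithmetic, and the optimistic upper bound can be sketched as a running-best trace that jumps to the global optimum in the batch right after a participant quits. This is a hedged Python sketch of the idea as described here, not the authors' analysis code; the participant's yields and the 100% global optimum are hypothetical numbers chosen only for illustration.

```python
SPACE_SIZE = 1728   # reactions in the benchmark subspace
BATCH_SIZE = 5      # experiments per batch
MAX_BATCHES = 20    # batches allowed per participant

def best_so_far(batch_yields):
    """Best yield seen so far after each completed batch (running maximum)."""
    trace, best = [], float("-inf")
    for batch in batch_yields:
        best = max(best, max(batch))
        trace.append(best)
    return trace

def upper_bound_trace(batch_yields, global_best, total_batches=MAX_BATCHES):
    """Optimistic trace for a participant who quit early: assume the global
    optimum would have been found in the very next batch and kept afterwards."""
    trace = best_so_far(batch_yields)
    if len(trace) < total_batches:
        trace.append(global_best)
        trace.extend([global_best] * (total_batches - len(trace)))
    return trace

if __name__ == "__main__":
    max_experiments = BATCH_SIZE * MAX_BATCHES
    print(max_experiments)                               # 100
    print(round(100 * max_experiments / SPACE_SIZE, 1))  # 5.8 (% of the space)
    print(round(100 * 3 * BATCH_SIZE / SPACE_SIZE, 1))   # 0.9 (% after 3 batches of 5)

    # Hypothetical participant who stopped after 3 batches (yields in %).
    participant = [[10, 35, 20, 5, 0], [40, 62, 55, 30, 12], [71, 66, 50, 48, 20]]
    print(upper_bound_trace(participant, global_best=100)[:5])  # [35, 62, 71, 100, 100]
```

Even the full human budget of 100 experiments covers under 6% of the 1,728-reaction space, which is why the order in which experiments are chosen matters so much.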
Bayesian optimization, on the other hand, found conditions giving >99% yield, including completely new and unexpected ones, within the first 50 experiments. The conditions identified by the AI were: CgMe-PPh (ligand), CsOPiv or CsOAc (base), DMAc (solvent), 0.153 M, and 105 °C.
If you are an organic chemist, especially one working on C-H activation, you would almost certainly NOT think of such conditions for this specific reaction, because CgMe-PPh is a very uncommon ligand for C-H arylation.
In this way, the AI clearly outperformed the human experts in this benchmark.
How Does The Software Work?
A reaction is controlled by several parameters, such as catalysts, reagents, ligands, solvents, temperatures, and concentrations.
After selecting a reaction for optimization, the user defines a search space from these parameters; suitable candidates can be chosen through a literature search. Next, the user sets the number of batches and the number of experiments per batch. The software itself then chooses the experimental conditions to evaluate.
Once the user runs the suggested experiments and enters the results, the software updates its model with every data point and proposes the next batch, iterating until the best conditions are found or the experimental budget runs out.
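The workflow above (define the space, let the software choose a batch, run it, feed the results back, repeat) can be sketched as a simple loop. In this hypothetical Python sketch, `build_space`, `propose_batch`, `optimize`, and `run_experiment` are illustrative names, not the API of the authors' software, and `propose_batch` is a deliberately naive stand-in that picks untested conditions at random; a Bayesian optimizer would instead rank the candidates with its surrogate model.

```python
import random
from itertools import product

def build_space(parameters):
    """Expand {parameter: [options]} into a list of condition dicts."""
    names = list(parameters)
    return [dict(zip(names, combo)) for combo in product(*parameters.values())]

def propose_batch(space, results, batch_size, rng):
    """Placeholder proposer: random untested conditions (NOT Bayesian optimization)."""
    tested = [conditions for conditions, _ in results]
    untested = [c for c in space if c not in tested]
    return rng.sample(untested, min(batch_size, len(untested)))

def optimize(parameters, run_experiment, n_batches, batch_size, seed=0):
    """Run up to n_batches batches; return the best (conditions, yield) and all results."""
    rng = random.Random(seed)
    space = build_space(parameters)
    results = []  # list of (conditions, yield) pairs
    for _ in range(n_batches):
        batch = propose_batch(space, results, batch_size, rng)
        if not batch:  # search space exhausted
            break
        for conditions in batch:
            results.append((conditions, run_experiment(conditions)))
    return max(results, key=lambda r: r[1]), results

# Hypothetical parameters and a made-up deterministic "experiment" for illustration.
PARAMETERS = {"ligand": ["L1", "L2", "L3"], "base": ["B1", "B2"], "temperature": ["low", "high"]}

def fake_experiment(c):
    return ({"L1": 20, "L2": 50, "L3": 70}[c["ligand"]]
            + (10 if c["base"] == "B2" else 0)
            + (5 if c["temperature"] == "high" else 0))

if __name__ == "__main__":
    best, results = optimize(PARAMETERS, fake_experiment, n_batches=2, batch_size=5)
    print(len(results), best)
```

Keeping the proposer as a separate function is the design point: swapping the random placeholder for a model-based one changes which experiments are suggested, but not the batch, run, and record cycle the user follows.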
The scientists designed the software so that it can suggest good reaction conditions from minimal information supplied by the user. Rather than simulating the reaction mechanism or the energies of intermediates, it represents each reaction component (catalyst, ligand, base, solvent, and so on) with numerical descriptors, such as quantum-chemically (DFT) computed properties, and learns a statistical model that predicts the yield from those descriptors and the experiments run so far.
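To make "numerical representation" concrete, the sketch below turns one set of reaction conditions into a feature vector that a surrogate model could use. As a simplifying assumption, categorical choices are one-hot encoded and continuous settings are min-max scaled; the study itself used richer chemical descriptors (such as DFT-computed properties), and the option names and ranges below are hypothetical.

```python
def one_hot(value, options):
    """Indicator vector: 1.0 at the position of `value` in `options`, else 0.0."""
    if value not in options:
        raise ValueError(f"{value!r} is not one of {options}")
    return [1.0 if value == option else 0.0 for option in options]

def min_max(value, low, high):
    """Scale a continuous setting linearly from [low, high] to [0, 1]."""
    return (value - low) / (high - low)

def encode(conditions, categorical, continuous):
    """Concatenate one-hot blocks for categorical choices with scaled numbers."""
    vector = []
    for name, options in categorical.items():
        vector.extend(one_hot(conditions[name], options))
    for name, (low, high) in continuous.items():
        vector.append(min_max(conditions[name], low, high))
    return vector

if __name__ == "__main__":
    # Hypothetical options and ranges, for illustration only.
    categorical = {"ligand": ["L1", "L2", "L3"], "base": ["B1", "B2"]}
    continuous = {"temperature_C": (90, 120), "concentration_M": (0.0, 0.2)}
    conditions = {"ligand": "L2", "base": "B1", "temperature_C": 105, "concentration_M": 0.1}
    print(encode(conditions, categorical, continuous))
    # [0.0, 1.0, 0.0, 1.0, 0.0, 0.5, 0.5]
```

One-hot encoding treats every ligand as equally different from every other; physically meaningful descriptors let the model generalize from a tested ligand to a similar, untested one, which is one motivation for the richer representations used in the study.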
Of course, the more information you put into the system, the better the analysis and the results will be, so human expertise remains valuable even when you use the software. Still, machine learning can save much of the time and resources you would normally spend on manual optimization. As a result, a demanding reaction can be optimized within a fixed budget, or even below it.
Novel Machine learning in Chemistry: The Conclusive Remarks
Prof. Adams, co-lead of this work, stated that this is just the beginning of applying AI in chemistry, specifically to the optimization of chemical reactions. This state-of-the-art Bayesian optimization technique has already shown that it can find better conditions faster than human experts typically can.
Reaction optimization is a basic step in drug discovery, yet the optimization of demanding reactions is often expensive and time-consuming. The release of this software is a landmark in synthetic chemistry, and it is likely to improve considerably over the next few years as data from many different methodologies accumulate. Perhaps, in the future, AI will be able to identify the best reaction conditions without any parameter input from users.
The software and examples are openly available; the following GitHub repositories provide access to the software and related information:
- The software that represents the chemicals
- Software for reaction optimization
- The game used to test the efficiency of the AI
Reference: Benjamin J. Shields, Jason Stevens, Jun Li, Marvin Parasram, Farhan Damani, Jesus I. Martinez Alvarado, Jacob M. Janey, Ryan P. Adams, and Abigail G. Doyle, “Bayesian reaction optimization as a tool for chemical synthesis,” Nature 590, 89–96 (2021) [DOI: 10.1038/s41586-021-03213-y].