Building AI models that understand chemical principles | MIT News

Among all possible chemical compounds, an estimated 10^20 to 10^60 may have potential as small-molecule drugs.
Testing each of those compounds experimentally would be very time-consuming for chemists. Therefore, in recent years, researchers have begun to use artificial intelligence to help identify compounds that could make suitable drug candidates.
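To give a rough sense of that scale, the short calculation below estimates how long exhaustive testing would take even at the low end of the estimate. The throughput of one million compounds per second is a purely hypothetical figure chosen for illustration, not a number from any real screening platform.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_screen(num_compounds: int, compounds_per_second: float) -> float:
    """Years needed to test every compound at a fixed, constant throughput."""
    return num_compounds / compounds_per_second / SECONDS_PER_YEAR


# Hypothetical throughput (an assumption for illustration only).
ASSUMED_RATE = 1e6  # compounds tested per second

low_estimate = years_to_screen(10**20, ASSUMED_RATE)
high_estimate = years_to_screen(10**60, ASSUMED_RATE)

print(f"10^20 compounds: about {low_estimate:.1e} years")
print(f"10^60 compounds: about {high_estimate:.1e} years")
```

Even under that generous assumption, the smaller estimate works out to roughly three million years of continuous testing.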
One of those researchers is Connor Coley PhD ’19, the Class of 1957 Career Development Associate Professor at MIT, who holds joint appointments in the departments of Chemical Engineering and Electrical Engineering and Computer Science and in the MIT Schwarzman College of Computing. His research straddles the line between chemical engineering and computer science: he develops and uses computational models to analyze large numbers of possible chemical compounds, design new compounds, and predict the reaction pathways that could produce them.
“It’s a very general method that can be used for any application of organic molecules, but the main application we’re thinking about is small-molecule drug discovery,” he says.
The intersection of AI and science
Coley’s interest in science runs in the family. In fact, he says, his family includes more scientists than non-scientists, including his father, a radiologist; his mother, who majored in molecular biophysics and biochemistry before attending the MIT Sloan School of Management; and his grandmother, a professor of mathematics.
As a high school student in Dublin, Ohio, Coley participated in Science Olympiad competitions and graduated from high school at the age of 16. He then headed to Caltech, where he chose chemical engineering as a major because it offered a way to combine his interests in science and mathematics.
During his undergraduate years, he also pursued an interest in computer science, working in a structural biology lab where he used the Fortran programming language to help solve protein crystal structures. After graduating from Caltech, he decided to pursue a career in chemical engineering and came to MIT in 2014 to begin a PhD.
Advised by professors Klavs Jensen and William Green, Coley worked on ways to optimize automated chemical reactions. His work focused on combining machine learning and cheminformatics – the use of computational methods to analyze chemical data – to plan reaction pathways that could create new drug molecules. He also worked on designing hardware that could be used to automate those reactions.
Part of that work was done through a DARPA-funded program called Make-It, which focused on using machine learning and data science to develop the synthesis of drugs and other useful compounds from simple building blocks.
“That was my real entry point into thinking about cheminformatics, thinking about machine learning, and thinking about how we can use models to understand how different chemicals can be made and what reactions are possible,” Coley said.
Coley began applying for faculty jobs while still a graduate student, and accepted an offer from MIT at age 25. He received mixed advice about taking a job at the same school where he had done his graduate work, but ultimately decided that the position at MIT was too appealing to turn down.
“MIT is a very special place in terms of resources and flexibility across departments. MIT seems to be doing a great job of supporting the intersection of AI and science, and it was a vibrant ecosystem to live in,” he said. “The quality of the students, the enthusiasm of the students, and the amazing ability to work together definitely outweighed any concerns about staying in one place.”
Chemistry intuition
Coley deferred a tenure-track position for a year to do a postdoc at the Broad Institute, where he sought more experience in chemical biology and drug discovery. There, he worked on methods to identify small molecules, from billions of candidates in DNA-encoded libraries, that might have binding interactions with mutated proteins associated with diseases.
After returning to MIT in 2020, he established his own lab with the goal of deploying AI not only to synthesize existing compounds with therapeutic potential, but also to design new molecules with desirable properties and new ways to make them. Over the past few years, his lab has developed a variety of methods to address those goals.
“We’re trying to think about how best to match a challenge in chemistry with a possible solution in computing. And often that pairing drives the development of new methods,” Coley said. One model his lab developed, known as SHEPhERD, was trained to evaluate candidate drug molecules by how they would interact with target proteins, using the molecules’ three-dimensional structure. This model is now being used by pharmaceutical companies to help them find new drugs.
“We’re trying to give medicinal chemistry intuition to a generative model, so the model knows the right methods and considerations,” Coley said.
In another project, Coley’s lab developed a generative AI model called FlowER, which can be used to predict reaction products that will result from combining different chemical inputs.
In building that model, the researchers incorporated basic physical principles, such as the law of conservation of mass. They also forced the model to consider the intermediate steps that must occur on the way from reactants to products. These constraints, the researchers found, improved the accuracy of the model’s predictions.
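The conservation-of-mass constraint can be illustrated with a minimal sketch that checks whether a proposed reaction keeps the same atoms on both sides. This is not how FlowER works internally; it is only a simple bookkeeping check, and the helper names `atom_counts` and `conserves_atoms` are hypothetical. It assumes plain molecular formulas without parentheses or charges.

```python
import re
from collections import Counter

# Matches an element symbol (e.g. "C", "Cl") followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O'."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def conserves_atoms(reactants: list[str], products: list[str]) -> bool:
    """Return True if both sides of the reaction contain the same atoms."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right


# Esterification: acetic acid + ethanol -> ethyl acetate + water
balanced = conserves_atoms(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"])

# The same prediction with the water by-product left out loses atoms.
unbalanced = conserves_atoms(["C2H4O2", "C2H6O"], ["C4H8O2"])

print(balanced, unbalanced)
```

A prediction that drops the water by-product fails the check, which is the kind of physically impossible output a conservation constraint is meant to rule out.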
“Thinking about those intermediate steps, the processes involved, and how the reaction occurs is something that chemists do naturally. It’s how chemistry is taught, but it’s not something that models naturally think about,” Coley said. “We’ve spent a lot of time thinking about how we can make sure our machine learning models are based on understanding reaction pathways, in the same way as a chemist.”
Students in his lab also work in many different areas related to chemical process optimization, including computer-aided structure elucidation, laboratory automation, and comprehensive experimental design.
“With these many different research threads, we hope to push the frontier of AI in chemistry,” Coley said.



