Computational Chemistry: Data Science in the Lab

Corey J Sinnott
2 min read · Dec 23, 2020

Any chemist who has ever fit curve after curve to calibration data, trying to get the coveted 1.0000 r-squared value, will instantly see the value machine learning can bring to the field. In fact, despite what we see watching CSI and our favorite police procedurals, there is no magic instrument that can deduce molecular structures, give exact concentrations, or even say, “hey, I think you’ve got some analyte here!”

Just about everything an analytical chemist does is a “guess,” based on a combination of what the chemist knows and what the instrument knows. Most often, for quantitative analysis, a software package does most of the math: it takes the instrument’s raw responses and fits a regression to them. If the chemist is skilled, they have prepared near-perfect concentration standards from which the software builds its calibration curve. The more the software knows, from both the analyst and the instrument, the more confident we can be in the result. With machine learning, an analyst can gain insight that would otherwise be limited by their own personal knowledge base, one that may have been degrading since they took their last organic chemistry exam. A great example of this was recently published from Beijing, where researchers used an algorithm to predict interference while performing mass spectrometry.
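To make the calibration step concrete, here is a minimal sketch in plain Python of what that software is doing behind the scenes: fit a straight line to the standards, check the r-squared, then read an unknown off the curve. The concentrations, responses, and function names are all made up for illustration, and a real package would also handle weighting, blanks, and non-linear detector response.

```python
# Hypothetical calibration standards (illustrative values, not real data):
# known concentrations in mg/L and the instrument's response in arbitrary units.
concentrations = [0.0, 1.0, 2.0, 5.0, 10.0]
responses = [0.02, 0.21, 0.39, 1.01, 1.98]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination for the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def predict_concentration(response, slope, intercept):
    """Invert the calibration line to estimate an unknown's concentration."""
    return (response - intercept) / slope


slope, intercept = fit_line(concentrations, responses)
r2 = r_squared(concentrations, responses, slope, intercept)
unknown = predict_concentration(0.75, slope, intercept)

print(f"slope={slope:.4f}, intercept={intercept:.4f}, r^2={r2:.5f}")
print(f"estimated concentration for a response of 0.75: {unknown:.2f} mg/L")
```

The quality of the answer depends entirely on the standards going in, which is exactly the “what the chemist knows” half of the guess.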

Another limitation in chemistry, one that every researcher has experienced, is the budget. Unless you are working on a vaccine during a pandemic, there’s simply no way your lab manager is going to buy you every reagent you ask for, let alone approve the overtime your lab techs need to run all of your experiments. These budget and time constraints have led to an entirely new field to get your PhD in: computational chemistry. Computational chemistry has become a must-have skill in pharmaceuticals, with even this year’s flu vaccine using the technique.

Computational chemistry allows researchers in every field to run simulated experiments, passing off every chemist’s nightmare, math (physical chemists notwithstanding), to computers. Machines are able to learn the physical properties of molecules and work through probability equations far more efficiently than the kids who ruined the curves in your differential equations class. Using machine learning, researchers have even gotten to the quantum mechanical origins of the covalent bond.
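For a taste of what a “simulated experiment” looks like, here is a toy sketch that scans the bond length of a diatomic molecule and finds the lowest-energy geometry. It uses the Morse potential, a simple textbook model, and not the machine learning or quantum mechanical methods from the papers in the sources. The parameters are approximate literature-style values for H2 used only for illustration, and the function names are my own.

```python
import math

# Approximate Morse parameters for H2 (illustrative assumptions only).
D_E = 4.75   # well depth in eV
A = 1.94     # width parameter in 1/angstrom
R_E = 0.741  # equilibrium bond length in angstrom


def morse_energy(r, d_e=D_E, a=A, r_e=R_E):
    """Morse potential energy (eV) at bond length r, zero at full dissociation."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2 - d_e


def scan_bond_lengths(start, stop, steps):
    """Evaluate the energy on an evenly spaced grid of bond lengths."""
    step = (stop - start) / (steps - 1)
    return [(start + i * step, morse_energy(start + i * step)) for i in range(steps)]


# "Run the experiment": stretch the bond from 0.4 to 3.0 angstrom in 0.01 steps.
scan = scan_bond_lengths(0.4, 3.0, 261)
best_r, best_e = min(scan, key=lambda point: point[1])

print(f"lowest-energy bond length on the grid: {best_r:.2f} angstrom")
print(f"energy at that geometry: {best_e:.3f} eV")
```

No reagents, no overtime. Real computational chemistry replaces the one-line energy function with a far more expensive calculation, but the loop of proposing a geometry, computing its energy, and repeating has the same shape.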

With machines taking on more and more experiments, and learning more about molecules and physical chemistry every day, our knowledge in subjects like drug discovery and materials science will continue to grow exponentially. It’s an exciting time to be a chemist with some Python skills!

Sources:

https://www.nature.com/articles/s41467-020-18670-8

Theory and Applications of Computational Chemistry: The First Forty Years by Clifford Dykstra, Gernot Frenking, Kwang Kim, and Gustavo Scuseria

https://www.mdpi.com/2075-163X/9/5/259

https://www.sciencedirect.com/science/article/abs/pii/S1359644620304803?via%3Dihub

Corey J Sinnott

Aspiring Data Scientist and student at General Assembly.