Working with Data Scientists to Improve On-line Chemical Extraction and Analysis


In the past, I have openly lamented the seeming lack of National Science Foundation (NSF) support for research into fundamental separation science. I have not been alone in that sentiment. This past fall, I was heartened to receive an NSF grant (CHE-2108767) to support explorations into how data science techniques could be used to advance complex on-line extraction and analysis systems. I am willing to eat my past words to some degree. While I think more federal research support could always be provided, with this project support, we are investigating fundamental relationships between the structures of molecules and their interactions with different materials, in the context of on-line supercritical fluid extraction–supercritical fluid chromatography (SFE–SFC).

We are working together with the Center on Stochastic Modeling, Optimization, and Statistics (COSMOS) in the Department of Industrial Engineering at the University of Texas at Arlington. We approached this group, led by Profs. Victoria Chen and Jay Rosenberger, when we realized our attempts at using standard multivariate optimization for SFE–SFC (1) might fall short in the long term. SFE–SFC is an extremely powerful analytical system, but it is quite complex in the number of variables that need to be considered and optimized for new applications.

While I do not intend to drill down into the nuts and bolts of the full project details, our intent is to combine machine learning and surrogate optimization to efficiently reach optimal SFE–SFC conditions for a wide range of applications. A big challenge is being able to generate a methodology that could work across a wide range of molecule types and materials. SFE and SFC are known to be applicable across much of the space that both gas chromatography and liquid chromatography can analyze. The materials under consideration include the sample materials from which the molecules need to be extracted, as well as the SFC stationary phases, which are used to both trap and separate the molecules.

Deriving parameters associated with a particular molecule that are predictive of its physicochemical properties, as well as its interactions with a wide variety of materials, is also challenging. On the one hand, much can be done by determining linear solvation energy relationships for solutes that have detailed property descriptors, but the sets of molecules for which these descriptors are known are limited, and the descriptors are not easy to determine for other molecules of interest. On the other hand, properties that are easily calculable, such as pKa or log P, provide limited predictive capacity when the molecules are present in complex systems.
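For readers less familiar with linear solvation energy relationships, one widely used form is the Abraham solvation parameter model, shown here only as an illustration; the text above does not say which formulation is used in this project. The capital letters are the solute descriptors (excess molar refraction E, dipolarity/polarizability S, hydrogen-bond acidity A, hydrogen-bond basicity B, and McGowan characteristic volume V), and the lowercase letters are system constants obtained by regression for a given phase system.

```latex
\log SP = c + eE + sS + aA + bB + vV
```

Here SP stands for a measured solute property in a fixed system, such as a retention factor or a partition coefficient. The difficulty described above is that the descriptors E, S, A, B, and V are available for only a limited set of compounds.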

With that in mind, we have decided to pursue machine learning methods that can encode the actual chemical structure of the compound and correlate it with measured properties. This type of work is being pursued in drug discovery and synthesis, but as of yet, only to a very limited degree in analytical chemistry.
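To give a feel for what "encoding a chemical structure" can mean, here is a minimal sketch of one of the simplest possible encodings: a character-level one-hot matrix built from a SMILES string. This is an assumption chosen purely for illustration, not the encoding used in the project, and the function names (`tokenize`, `build_vocab`, `one_hot`) are hypothetical. The tokenizer is deliberately simplified and does not handle bracket atoms or stereochemistry.

```python
def tokenize(smiles):
    """Split a SMILES string into tokens, keeping Cl and Br as single tokens.

    Simplified on purpose: bracket atoms such as [NH4+] are not handled.
    """
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def build_vocab(smiles_list):
    """Map every token seen in the data set to a column index."""
    seen = {tok for s in smiles_list for tok in tokenize(s)}
    return {tok: idx for idx, tok in enumerate(sorted(seen))}


def one_hot(smiles, vocab, max_len):
    """Return a max_len x len(vocab) matrix of 0/1; unused rows stay all zero."""
    rows = [[0] * len(vocab) for _ in range(max_len)]
    for pos, tok in enumerate(tokenize(smiles)[:max_len]):
        rows[pos][vocab[tok]] = 1
    return rows


if __name__ == "__main__":
    # Ethanol, benzene, dichloromethane (illustrative molecules only).
    molecules = ["CCO", "c1ccccc1", "ClCCl"]
    vocab = build_vocab(molecules)
    for row in one_hot("CCO", vocab, max_len=8):
        print(row)
```

A matrix like this can be fed to a neural network alongside a measured property for each molecule. Richer encodings, such as molecular graphs or three-dimensional coordinates, carry more structural information but follow the same idea of turning a structure into numbers a model can learn from.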

In order to determine the best encoding strategy for the two- and three-dimensional chemical structures, we have been exploring the potential to predict vacuum ultraviolet (VUV) absorption spectra. Previously, we have investigated the use of computational chemistry methods for predicting VUV spectra (2–4). Techniques that use, for example, time-dependent density functional theory do an okay job, but they do not often produce the fine spectral structure we can observe in experimental gas-phase VUV spectra. Using a variety of deep learning methods, we have now had good success in predicting VUV spectra from chemical structures, and even vice versa. This should create a powerful tool to aid VUV detection for gas chromatography, and it should provide a framework for us to advance our work to optimize SFE–SFC.

We will also use a surrogate optimization technique to study and optimize the on-line extraction and variable trapping. Surrogate optimization is an enhanced form of response surface methodology. It incorporates a wider range of functions in order to handle more complex response surfaces. Our team has been working on the code. For that initial work, we have focused on modeling the electrospray response of different analytes, which is a bit simpler and less instrument intensive than jumping into the SFE–SFC optimization. This also builds on related research we have reported previously (5).
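The general idea of surrogate optimization can be sketched in a few lines of Python. The following is a toy illustration under stated assumptions, not the COSMOS team's code: the objective `toy_response` is a made-up function standing in for an expensive experiment, the two variables are scaled to the range 0 to 1, and the surrogate is a Gaussian radial basis function interpolant, which is only one of many possible choices. All function names are hypothetical.

```python
import math
import random


def toy_response(x):
    """Made-up stand-in for an expensive experiment (lower is better)."""
    a, b = x
    return (a - 0.3) ** 2 + (b - 0.7) ** 2 + 0.1 * math.sin(8 * a) * math.cos(6 * b)


def solve(A, y):
    """Solve A w = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def kernel(p, q, width=0.3):
    """Gaussian radial basis function of the distance between two points."""
    return math.exp(-math.dist(p, q) ** 2 / (2 * width ** 2))


def fit_surrogate(points, values, ridge=1e-6):
    """Fit an RBF model through the evaluated points; returns a callable."""
    centers = list(points)
    n = len(centers)
    mean = sum(values) / n
    A = [[kernel(centers[i], centers[j]) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    w = solve(A, [v - mean for v in values])
    return lambda x: mean + sum(wi * kernel(x, c) for wi, c in zip(w, centers))


def surrogate_optimize(objective, n_init=8, n_iter=20, n_candidates=500, seed=0):
    """Alternate between fitting the cheap surrogate and running one new 'experiment'."""
    rng = random.Random(seed)
    sample = lambda: (rng.random(), rng.random())
    points = [sample() for _ in range(n_init)]
    values = [objective(p) for p in points]
    for _ in range(n_iter):
        model = fit_surrogate(points, values)
        candidates = [sample() for _ in range(n_candidates)]
        # Skip candidates nearly on top of points that were already evaluated.
        candidates = [c for c in candidates
                      if min(math.dist(c, p) for p in points) > 0.02]
        best = min(candidates, key=model)  # cheap: uses the surrogate only
        points.append(best)
        values.append(objective(best))     # expensive: one new evaluation
    i = min(range(len(values)), key=values.__getitem__)
    return points[i], values[i]


if __name__ == "__main__":
    best_x, best_y = surrogate_optimize(toy_response)
    print("best conditions:", best_x, "response:", best_y)
```

The design point is that the expensive objective is evaluated only once per iteration, while the inexpensive surrogate is queried hundreds of times to decide where to look next. Practical implementations usually add an explicit exploration term so the search does not settle too early on a local optimum.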

Working with the students and faculty from COSMOS has been a great experience. While I am far from able to code or decode any Python, it has been extremely enlightening to gain a better appreciation for cutting-edge data science. The biggest challenge with such a collaboration is communication. As we are each experts in our domains, trying to bridge the gap requires discussions that often drop down to very basic fundamentals in each of our fields. But, as we begin to be able to better speak each other’s language, the potential for advancement in both of our fields has become clear.

Now, as some students graduate from the team, it will be interesting to see what kind of employment they can find. I have no doubt that analytical chemists with some data science background will be snatched up. I wonder more about the data scientists with some analytical chemistry background. I performed a cursory search on some job sites but did not find a glut of opportunities. Are analytical chemistry companies considering how they might utilize a hard-core data scientist? I would think that this should be quite an active area of hiring. I can attest to the exceptional level of data science expertise these industrial engineers possess as they leave the COSMOS program. I would like to, with this blog, promote them to the analytical chemistry community. Let me know if you are hiring in this area, or if you know someone or some company who is.

References

  1. A.P. Wicker, K. Tanaka, M. Nishimura, V. Chen, T. Ogura, W. Hedgepeth, and K.A. Schug, Anal. Chim. Acta 1127, 282–294 (2020).
  2. K.A. Schug, The LCGC Blog, Nov. 6, 2019. http://www.chromatographyonline.com/lcgc-blog-theoretical-computations-aid-vacuum-ultraviolet-spectroscopic-gas-chromatography-detection
  3. J.X. Mao, P. Walsh, P. Kroll, and K.A. Schug, Appl. Spectrosc. 74, 72–80 (2020).
  4. J.X. Mao, P. Kroll, and K.A. Schug, Structural Chem. 30, 2217–2224 (2019).
  5. M.A. Raji and K.A. Schug, Int. J. Mass Spectrom. 279, 100–106 (2009).

Kevin A. Schug is a Full Professor and the Shimadzu Distinguished Professor of Analytical Chemistry in the Department of Chemistry & Biochemistry at The University of Texas (UT) at Arlington. He is also a Partner in Medusa Analytical, LLC. Research in the Schug group at UT Arlington spans fundamental and applied areas of separation science, spectroscopy, and mass spectrometry. Schug was named the LCGC Emerging Leader in Chromatography in 2009 and the 2012 American Chemical Society Division of Analytical Chemistry Young Investigator in Separation Science. He is a fellow of both the UT Arlington and UT System-Wide Academies of Distinguished Teachers.
