Machine learning prediction of empirical polarity using SMILES encoding of organic solvents

Mol Divers. 2022 Nov 5. doi: 10.1007/s11030-022-10559-6. Online ahead of print.

ABSTRACT

Machine learning based statistical models have significantly increased the speed and accuracy with which the chemical and physical properties of compounds can be predicted, compared with experimental measurement and with traditional ab initio and quantum mechanical approaches. The transformative impact of these techniques on the chemical sciences has changed the way experiments are designed. The last decade has seen the rise of computer-aided molecular design based on machine learning algorithms. A major challenge has been generating machine-readable data, in the form of descriptors and observations, for training the model; this can itself be time-consuming and computationally expensive if a molecular encoding based on atomic coordinates is used. In this study, we attempt to solve this problem by using the SMILES representation of molecules to generate various topological, physicochemical, electronic and steric descriptors with open-source cheminformatics packages. Using the data generated with these packages, we developed a simple and explainable quantitative structure-property relationship model, an artificial neural network (ANN) based on 7 numerical descriptors and 1 categorical descriptor, for predicting the empirical polarity of a wide variety of organic solvents. Since polarity represents the various solute-solvent and solvent-solvent interactions taking place in an organic transformation, knowing it beforehand can help a chemist design better experiments. The ANN based on these 8 descriptors was successfully employed to predict the E_T(30) values of organic solvents.
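
The workflow described in the abstract, turning a SMILES string into a fixed-length vector of numerical descriptors plus one encoded categorical descriptor, can be illustrated with a minimal sketch. The abstract does not name the 8 descriptors or the packages used, so everything here is an assumption: the seven counts are crude stand-ins read directly from the SMILES text (a real study would use a cheminformatics toolkit), the `SOLVENT_CLASSES` categories are hypothetical, and the tokenizer handles only simple, bracket-free SMILES.

```python
import re

# Toy tokenizer for bracket-free SMILES: two-letter halogens first,
# then single-letter organic-subset atoms (aromatic lowercase included).
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnosp]")

# Hypothetical categories for the single categorical descriptor.
SOLVENT_CLASSES = ("protic", "aprotic")


def simple_descriptors(smiles):
    """Return 7 crude numerical descriptors counted from a SMILES string."""
    atoms = [tok.capitalize() for tok in ATOM_PATTERN.findall(smiles)]
    ring_digits = sum(ch.isdigit() for ch in smiles)
    return [
        float(len(atoms)),                                     # heavy atoms
        float(atoms.count("O")),                               # oxygen atoms
        float(atoms.count("N")),                               # nitrogen atoms
        float(sum(a in ("F", "Cl", "Br", "I") for a in atoms)),  # halogens
        float(smiles.count("=")),                              # double bonds
        float(smiles.count("#")),                              # triple bonds
        float(ring_digits // 2),                               # ring closures
    ]


def encode(smiles, solvent_class):
    """Concatenate numerical descriptors with a one-hot categorical descriptor."""
    if solvent_class not in SOLVENT_CLASSES:
        raise ValueError(f"unknown solvent class: {solvent_class!r}")
    one_hot = [1.0 if solvent_class == c else 0.0 for c in SOLVENT_CLASSES]
    return simple_descriptors(smiles) + one_hot


if __name__ == "__main__":
    # Ethanol, acetone, and benzene as example inputs.
    for smi, cls in [("CCO", "protic"), ("CC(=O)C", "aprotic"), ("c1ccccc1", "aprotic")]:
        print(smi, encode(smi, cls))
```

A vector built this way is the kind of input a small feed-forward network would take when regressing against measured E_T(30) values; the network and the training data are outside the scope of this sketch.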

PMID:36334165 | DOI:10.1007/s11030-022-10559-6

By Nevin Manimala