Recent applications of liquid chromatography-based QSRR models for pharmaceutically relevant small molecules: A review

J Pharm Sci. 2025 Oct 30:104047. doi: 10.1016/j.xphs.2025.104047. Online ahead of print.

ABSTRACT

In recent years, quantitative structure-retention relationship (QSRR) models have been used not only to predict the retention time (RT) of newly identified molecules but also to screen their physicochemical properties, predict multi-target properties, elucidate the molecular mechanism of separation, the mechanism of small molecules’ affinity to phospholipids, and the retention mechanism of isomeric separation, and optimize chromatographic methods. In addition, researchers are exploring how to integrate Analytical Quality by Design (AQbD), QSRR modeling, and Design of Experiments (DoE) to assess the operable design region of validated chromatographic methods, rather than relying solely on empirical optimization of virtual method setups. Researchers have also been exploring transfer learning. Because in-house, project-based datasets are typically small and make high accuracy difficult to achieve, transfer learning from large datasets has shown considerable promise. In short, a model is first pre-trained on an established database, such as METLIN-SMRT or CMRT, then fine-tuned on the in-house dataset, and finally used to predict the RTs of the target molecules. QSRR studies can also follow OECD (Q)SAR guidance during model development to ensure clearly defined endpoints, transparent algorithms, defined applicability domains, and reproducible validation processes. Adoption of these principles would strengthen model reliability. Two of the most crucial factors in achieving better model performance are the structural diversity of the selected molecules and the relevance of the chosen molecular descriptors. Careful descriptor selection, guided by mechanistic interpretability and OECD-recommended transparency, ensures robust and reproducible predictions across chemically diverse datasets.
For stereoisomers, either the use of 3D descriptors in conventional machine learning models that rely on predefined molecular features, such as Random Forest, Support Vector Machines, or Partial Least Squares, or the application of graph neural network (GNN)-based models is necessary to capture subtle structural differences and enable mechanistically interpretable predictions, consistent with the green and white analytical chemistry (GAC-WAC) principles of analytical performance, sustainability, and cost-efficiency.
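
The pre-train, fine-tune, predict workflow described above can be illustrated with a minimal sketch. It is not the method of any study covered by the review: it assumes a plain linear model trained by gradient descent, three made-up descriptors, and synthetic data standing in for a large public database and a small in-house dataset. All function names, coefficients, and dataset sizes are hypothetical.

```python
import random


def predict(weights, bias, x):
    """Predicted retention time for one descriptor vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def fit(data, weights, bias, lr, epochs):
    """Full-batch gradient descent on mean squared error,
    starting from the supplied parameters."""
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in data:
            err = predict(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2 * err * xj / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def mae(data, weights, bias):
    """Mean absolute error of predicted vs. observed retention times."""
    return sum(abs(predict(weights, bias, x) - y) for x, y in data) / len(data)


def make_data(rng, n, true_w, true_b, noise):
    """Synthetic (descriptors, RT) pairs; purely illustrative."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        y = sum(w * xi for w, xi in zip(true_w, x)) + true_b + rng.gauss(0, noise)
        data.append((x, y))
    return data


rng = random.Random(0)

# Hypothetical stand-in for a large public RT database.
source = make_data(rng, 500, [3.0, -1.5, 0.8], 5.0, 0.1)

# Hypothetical small in-house dataset measured on a related but different
# chromatographic system (slightly shifted coefficients and offset).
inhouse_train = make_data(rng, 15, [3.2, -1.2, 0.8], 6.5, 0.1)
inhouse_test = make_data(rng, 50, [3.2, -1.2, 0.8], 6.5, 0.1)

# Step 1: pre-train on the large source dataset from zero-initialized parameters.
pre_w, pre_b = fit(source, [0.0, 0.0, 0.0], 0.0, lr=0.1, epochs=300)

# Step 2: fine-tune on the in-house data, starting from the pre-trained parameters.
ft_w, ft_b = fit(inhouse_train, pre_w, pre_b, lr=0.05, epochs=50)

# Step 3: predict RTs of held-out in-house molecules and compare errors.
mae_pre = mae(inhouse_test, pre_w, pre_b)
mae_ft = mae(inhouse_test, ft_w, ft_b)
print(f"MAE with pre-trained model only: {mae_pre:.3f}")
print(f"MAE after fine-tuning:           {mae_ft:.3f}")
```

In this toy setting the fine-tuned model adapts to the in-house system with only 15 training molecules because it starts from parameters already close to the target relationship. Published transfer-learning QSRR models typically use neural networks rather than a linear model, so the sketch conveys only the order of the steps, not realistic performance.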

PMID:41176062 | DOI:10.1016/j.xphs.2025.104047

By Nevin Manimala