Automated machine-learning framework for predicting drug solubility in supercritical CO2 for sustainable process development

Sci Rep. 2026 Jun 25. doi: 10.1038/s41598-026-59449-z. Online ahead of print.

ABSTRACT

Reliable prediction of drug solubility in supercritical carbon dioxide (SC-CO2) is central to designing environmentally conscious pharmaceutical processes, yet experimental solubility measurements remain slow, resource-intensive, and often restricted to limited operating ranges. These constraints hinder the development of green technologies such as particle formation, controlled micronization, and solvent-free formulation strategies. This study introduces an automated computational framework that couples modern regression algorithms with bio-inspired optimization to deliver scalable solubility prediction across varying conditions. The approach employs Adaptive Boosting Regression and Light Gradient Boosting Regression as core learners, which are combined through hybrid ensemble schemes and tuned using two recent metaheuristic algorithms: the Osprey Optimization Algorithm and the Artificial Protozoa Optimizer (APO). Model behavior was assessed using repeated cross-validation, a suite of accuracy metrics (RMSE, R2, MDAPE, SI, NSE), non-parametric statistical comparison, and ANOVA-based sensitivity evaluation. A multi-criteria ranking using the TOPSIS method identified the APO-driven ensemble (ALAP) as the most reliable configuration, achieving RMSE = 0.191, R2 = 0.982, and MDAPE = 15.6% on the test set. Beyond its algorithmic design, the framework offers a practical computational alternative to extensive laboratory experimentation, providing a transferable, data-driven tool for efficient and eco-friendly pharmaceutical process design.
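
For readers unfamiliar with the evaluation steps named in the abstract, the following is a minimal Python sketch of three of the listed metrics (RMSE, R2, MDAPE) and of a textbook TOPSIS ranking. It is not the authors' code. The abstract does not give the paper's exact metric definitions, criterion weights, or normalization, so the common definitions, vector normalization, and all function names here are assumptions made for illustration. NSE is commonly computed with the same formula as the R2 shown here.

```python
import math
from statistics import median


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mdape(y_true, y_pred):
    """Median absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * median(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def topsis(matrix, weights, benefit):
    """Textbook TOPSIS closeness scores (assumed variant: vector normalization).

    matrix  : one row per candidate model, one column per criterion
    weights : one weight per criterion
    benefit : True where larger is better (e.g. R2), False where smaller
              is better (e.g. RMSE, MDAPE)
    Returns one score per row in [0, 1]; higher means closer to the ideal.
    Assumes the candidates are not all identical (nonzero distances).
    """
    n_crit = len(weights)
    # Normalize each column by its Euclidean norm, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    # Ideal and anti-ideal points, respecting the direction of each criterion.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Toy values for three hypothetical models (columns: RMSE, R2, MDAPE).
# These numbers are made up for illustration and are not from the paper.
toy_matrix = [
    [0.2, 0.98, 15.0],
    [0.4, 0.90, 30.0],
    [0.3, 0.95, 20.0],
]
toy_scores = topsis(toy_matrix, weights=[1 / 3, 1 / 3, 1 / 3],
                    benefit=[False, True, False])
```

In this toy setup the first row is best on every criterion, so it coincides with the ideal point and receives the highest closeness score. In a real comparison the criteria usually conflict, and the ranking then depends on the chosen weights.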

PMID:42342798 | DOI:10.1038/s41598-026-59449-z

By Nevin Manimala