TY - JOUR
T1 - Machine learning estimating paracetamol solubility in supercritical CO2 by utilization of K-nearest neighbor regression and metaheuristic algorithms
AU - Thajudeen, Kamal Y.
AU - Alshehri, Saad Ali
AU - Rahamathulla, Mohamed
AU - Ahmed, Mohammed Muqtader
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
AB - Because paracetamol is so widely used by patients, improving its solubility would have a major impact on wellbeing. Supercritical processing can be used for nanonization of drug particles, which in turn increases their solubility and consequently lowers the drug dosage required by patients. This study presents the results of neighbor-based ensemble models for predicting the mole fraction of paracetamol in a supercritical solvent, as well as the solvent density, under different conditions. The models were trained and evaluated on a dataset of 40 instances. The K-nearest neighbor regression algorithm was selected as the base model, and the ensemble methods of bagging and AdaBoost were employed to improve it. Additionally, two metaheuristic algorithms, BAT and GWO, were applied to tune the hyperparameters of the models. Each model’s performance was assessed using three metrics: the R-squared score, MSE, and AARD percentage. The GWO-ADA-KNN model performed best in predicting both mole fraction and density, with R-squared scores of 0.98105 and 0.96719, respectively. These findings indicate that the proposed optimizer and models can accurately predict drug mole fraction and density under different conditions.
KW - Machine learning
KW - Metaheuristic algorithms
KW - Solubility
KW - Supercritical CO2
UR - https://www.scopus.com/pages/publications/105022851791
U2 - 10.1038/s41598-025-22903-5
DO - 10.1038/s41598-025-22903-5
M3 - Article
C2 - 41285973
AN - SCOPUS:105022851791
SN - 2045-2322
VL - 15
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 41761
ER -