Machine learning derived correlations for scale-up and technology transfer of primary nucleation kinetics

Yerdelen, Stephanie and Yang, Yihui and Quon, Justin L. and Papageorgiou, Charles D. and Mitchell, Chris and Houson, Ian and Sefcik, Jan and ter Horst, Joop H. and Florence, Alastair J. and Brown, Cameron J. (2023) Machine learning derived correlations for scale-up and technology transfer of primary nucleation kinetics. Crystal Growth and Design, 23 (2). pp. 681-693. ISSN 1528-7483

Final Published Version
License: Creative Commons Attribution 4.0

Scaling up and technology transfer of crystallization processes have been and continue to be a challenge. This is often due to the stochastic nature of primary nucleation, the various scale dependencies of nucleation mechanisms, and the multitude of scale-up approaches. To better understand these dependencies, a series of isothermal induction time studies was performed across a range of vessel volumes, impeller types, and impeller speeds. From these measurements, the nucleation rate and growth time were estimated as parameters of an induction time distribution model. Machine learning techniques were then used to analyze correlations between the vessel hydrodynamic features, calculated from computational fluid dynamics (CFD) simulations, and the nucleation kinetic parameters. Of the 18 machine learning models trained, two models for the nucleation rate were found to have the best performance (in terms of the percentage of predictions within experimental variance): a nonlinear random forest model and a nonlinear gradient boosting model. For growth time, a nonlinear gradient boosting model was found to outperform the other models tested. These models were then ensembled to directly predict the probability of nucleation at a given time, solely from hydrodynamic features, with an overall root mean square error of 0.16. This work shows how machine learning approaches can be used to analyze limited datasets of induction times to provide insights into which hydrodynamic parameters should be considered in the scale-up of an unseeded crystallization process.
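
To make the idea of an induction time distribution model concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the functional form of the model, so the sketch assumes a shifted exponential that is commonly used for induction time data, P(t) = 1 - exp(-J V (t - t_g)) for t >= t_g, where J is the nucleation rate, V the vessel volume, and t_g the growth time. All function names, the maximum-likelihood fitting shortcut, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def nucleation_probability(t, J, V, t_g):
    """Assumed model: P(t) = 1 - exp(-J*V*(t - t_g)) for t >= t_g, else 0."""
    t = np.asarray(t, dtype=float)
    elapsed = np.clip(t - t_g, 0.0, None)  # no nucleation is detectable before t_g
    return np.where(t >= t_g, 1.0 - np.exp(-J * V * elapsed), 0.0)


def fit_induction_times(times, V):
    """Estimate (J, t_g) from measured induction times.

    Uses the closed-form maximum-likelihood estimates for a shifted
    exponential: t_g is the shortest observed time and J*V is the
    reciprocal of the mean excess over t_g. This is an illustrative
    choice, not necessarily the fitting procedure used in the paper.
    """
    times = np.asarray(times, dtype=float)
    t_g = times.min()
    J = 1.0 / (V * np.mean(times - t_g))
    return J, t_g


def empirical_probability(times):
    """Sorted induction times and the fraction of vials nucleated by each."""
    t_sorted = np.sort(np.asarray(times, dtype=float))
    p = np.arange(1, t_sorted.size + 1) / t_sorted.size
    return t_sorted, p


def rmse(predicted, observed):
    """Root mean square error between predicted and observed probabilities."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic example with hypothetical values (arbitrary consistent units).
rng = np.random.default_rng(0)
V_demo, J_demo, t_g_demo = 0.5, 2.0, 10.0
demo_times = t_g_demo + rng.exponential(1.0 / (J_demo * V_demo), size=80)

J_fit, t_g_fit = fit_induction_times(demo_times, V_demo)
t_obs, p_obs = empirical_probability(demo_times)
fit_error = rmse(nucleation_probability(t_obs, J_fit, V_demo, t_g_fit), p_obs)
print(f"J = {J_fit:.3f}, t_g = {t_g_fit:.3f}, RMSE = {fit_error:.3f}")
```

In the workflow described above, the fitted J and t_g for each vessel configuration would become regression targets for models trained on CFD-derived hydrodynamic features. The RMSE computed here only measures how well the assumed distribution fits one synthetic dataset and is not comparable to the 0.16 reported for the ensemble.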