Data mining crystallization kinetics

Maldonado, Diego A. and Vassileiou, Antony and Johnston, Blair and Florence, Alastair J. and Brown, Cameron J. (2022) Data mining crystallization kinetics. Digital Discovery, 1 (5). pp. 621-635. ISSN 2635-098X

Final Published Version
License: Creative Commons Attribution-NonCommercial 3.0

The population balance model is a valuable modelling tool that facilitates the understanding and optimization of crystallization processes. However, using it requires prior knowledge of the crystallization kinetics, specifically crystal growth and nucleation. Most approaches to estimating kinetic parameters reliably require experimental data. Over time, a vast literature on the estimation of kinetic parameters and on population balances has been published. Given this availability of data, in this work a database was built with information on solute, solvent, kinetic expression, parameters, crystallization method and seeding. Correlations were assessed and cluster structures identified by hierarchical cluster analysis. The final database contains 336 datapoints of kinetic parameters from 185 different sources. The data were analysed using the kinetic parameters of the most common expressions, and clusters were then identified for each kinetic model. With these clusters, random forest classification models were built using solute descriptors, seeding, solvent and crystallization method as predictors. The random forest models had an overall classification accuracy higher than 70%, which made them useful for providing rough estimates of kinetic parameters, although these methods have some limitations.
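
The two-stage workflow described in the abstract (hierarchical clustering of kinetic parameters, then a random forest that predicts the cluster from system descriptors) can be illustrated with a minimal sketch. This is not the authors' code or data. The power-law growth expression, the synthetic parameter values, the descriptor columns (`mol_weight`, `seeded`, `method_code`), the Ward linkage and the choice of two clusters are all assumptions made for illustration, using SciPy and scikit-learn.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the database (NOT the published data): two groups of
# (log10 k_g, g) pairs for an assumed power-law growth law G = k_g * S**g.
n_per_group = 40
group_a = rng.normal(loc=[-7.0, 1.0], scale=0.2, size=(n_per_group, 2))
group_b = rng.normal(loc=[-4.0, 2.0], scale=0.2, size=(n_per_group, 2))
kinetic_params = np.vstack([group_a, group_b])

# Stage 1: hierarchical cluster analysis on the kinetic parameters.
# Ward linkage and a cut at two clusters are illustrative choices.
tree = linkage(kinetic_params, method="ward")
cluster_labels = fcluster(tree, t=2, criterion="maxclust")

# Stage 2: hypothetical descriptors of each system. Only mol_weight is made
# informative here; seeding and method are random placeholders.
true_group = np.repeat([0, 1], n_per_group)
mol_weight = rng.normal(loc=150 + 100 * true_group, scale=20)
seeded = rng.integers(0, 2, size=2 * n_per_group)
method_code = rng.integers(0, 3, size=2 * n_per_group)  # integer-coded for brevity
descriptors = np.column_stack([mol_weight, seeded, method_code])

# Random forest classifier that predicts the kinetic cluster from descriptors,
# scored by 5-fold cross-validated accuracy.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, descriptors, cluster_labels, cv=5)
mean_accuracy = scores.mean()
print(f"mean cross-validated accuracy: {mean_accuracy:.2f}")
```

In a setup like this, a predicted cluster would be translated back into a rough parameter estimate, for example the median of the kinetic parameters within that cluster. The accuracy obtained on this synthetic data says nothing about the accuracy reported for the real database.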