OpenCrystalData: an open-access particle image database to facilitate learning, experimentation, and development of image analysis models for crystallization processes

Barhate, Yash and Boyle, Christopher and Salami, Hossein and Wu, Wei-Lee and Taherimakhsousi, Nina and Rabinowitz, Charlie and Bommarius, Andreas and Cardona, Javier and Nagy, Zoltan K. and Rousseau, Ronald and Grover, Martha (2024) OpenCrystalData: an open-access particle image database to facilitate learning, experimentation, and development of image analysis models for crystallization processes. Digital Chemical Engineering, 11. 100150. ISSN 2772-5081 (https://doi.org/10.1016/j.dche.2024.100150)

Text. Filename: Barhate-etal-DCE-2024-OpenCrystalData-an-open-access-particle-image-database.pdf
Final Published Version
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0

Abstract

Imaging and image-based process analytical technologies (PAT) have revolutionized the design, development, and operation of crystallization processes, providing greater process understanding through the characterization of particle size, shape, and crystallization mechanisms in real time. The performance of the corresponding PAT models, including machine learning/artificial intelligence (ML/AI)-based approaches, is highly reliant on the quality of the data used for training or validation. However, acquiring high-quality data is often time-consuming and a major roadblock in developing image analysis models for crystallization processes. To address the lack of diverse, high-quality, and publicly available particle image datasets, this paper presents an initiative to create an open-access crystallization-related image database: OpenCrystalData (OCD, at www.kaggle.com/opencrystaldata/datasets). The datasets consist of images from different crystallization systems with different particle sizes and shapes captured under various conditions. The initial release consists of four different datasets, addressing the estimation of particle size distribution using in-situ images for different categories of particles and the detection of anomalous particles for process monitoring purposes. Images are collected using various instruments, followed by case-specific processing steps, such as ground-truth labeling and particle size characterization using offline microscopy. Datasets are released on the online collaborative platform Kaggle, along with specific guidelines for each dataset. These datasets are intended to serve as a resource for researchers to enable learning, experimentation, development, and the evaluation and comparison of different analytical approaches and algorithms. Another goal of this initiative is to encourage researchers to contribute new datasets focusing on various systems and problem statements. Ultimately, OpenCrystalData is intended to facilitate and inspire new developments in imaging-based PAT for crystallization processes, encouraging a shift from time-consuming offline analysis towards comprehensive real-time process insights that drive product quality.