Counterfactual medical images generation for lung disease diagnosis using probabilistic causal models and active learning

Zhu, Yifei and Zhang, Lei and Sainsbury, Chris and Dong, Feng and MacLay, John and Lowe, David J. and Ye, Xujiong (2025) Counterfactual medical images generation for lung disease diagnosis using probabilistic causal models and active learning. IEEE Access, 13. pp. 170817-170826. ISSN 2169-3536 (https://doi.org/10.1109/access.2025.3615683)

Final Published Version
License: Creative Commons Attribution 4.0

Abstract

Recent advances in deep learning have shown promise for diagnosing lung diseases from medical images, but these methods often lack causal inference capabilities, which limits their applicability in clinical decision-making. This study leverages causal generative modelling for counterfactual analysis to improve the understanding and diagnosis of lung diseases. We developed a Structured Causal Model designed to generate clinically meaningful counterfactual images of lung diseases. Our framework integrates active learning with uncertainty measurement to address data quality issues in clinical datasets and to refine the training set distribution. Inspired by the human-in-the-loop concept, we incorporated expert feedback into the training pipeline to ensure that the generated images align with clinical expectations. We evaluated the generated counterfactuals using model accuracy and a specialized expert model that calculates disease probabilities from the images. The proposed model achieved 93.27% accuracy in generating counterfactual images representative of the corresponding clinical conditions, as confirmed by medical experts. Active learning with uncertainty measurement improved the data distribution while maintaining a heavy-tailed structure that better reflects real-world clinical data. The integration of expert knowledge further ensured the clinical validity and relevance of the counterfactuals, supporting more informed diagnostic and prognostic decisions. Our study highlights the potential of causal generative modelling, supported by active learning and expert feedback, to improve lung disease diagnosis and prognosis through clinically meaningful counterfactual images.
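
The two mechanisms named in the abstract, uncertainty-driven active learning and counterfactual generation from a causal model, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which uncertainty measure is used, so predictive entropy is assumed here; the one-variable linear model `x = weight * d + u`, the function names, and the sample probabilities are all hypothetical. The counterfactual follows the standard abduction, action, prediction recipe.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_most_uncertain(pool_probs, k):
    """Return indices of the k pool samples with the highest entropy.

    Assumption: entropy stands in for the paper's uncertainty measure.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]

def counterfactual(x_obs, d_obs, d_cf, weight=2.0):
    """Counterfactual in a toy linear SCM x = weight * d + u (hypothetical)."""
    u = x_obs - weight * d_obs   # abduction: recover the exogenous noise
    return weight * d_cf + u     # action and prediction: set d := d_cf, reuse u

# Hypothetical classifier outputs for three unlabeled images (two classes).
pool = [[0.98, 0.02], [0.55, 0.45], [0.80, 0.20]]
picked = select_most_uncertain(pool, k=2)

# Observed feature 5.0 with disease present (d=1); ask what it would be for d=0.
x_cf = counterfactual(5.0, d_obs=1, d_cf=0)
```

In an active learning loop of this kind, the selected samples would be the ones sent for expert review or relabeling. In the paper the single scalar feature is replaced by a full image and the linear mechanism by a learned generative model.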