Uncertainty quantification for deep learning in ultrasonic crack characterization

Pyle, Richard J. and Hughes, Robert R. and Ali, Amine Ait Si and Wilcox, Paul D. (2022) Uncertainty quantification for deep learning in ultrasonic crack characterization. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 69 (7). pp. 2339-2351. ISSN 0885-3010 (https://doi.org/10.1109/TUFFC.2022.3176926)


Abstract

Deep learning for nondestructive evaluation (NDE) has received much attention in recent years for its potential to provide human-level data analysis. However, little research has been done into quantifying the uncertainty of its predictions. Uncertainty quantification (UQ) is essential for qualifying NDE inspections and building trust in their predictions. Therefore, this article aims to demonstrate how UQ can best be achieved for deep learning in the context of crack sizing for inline pipe inspection. A convolutional neural network architecture is used to size surface-breaking defects from plane wave imaging (PWI) images with two modern UQ methods: deep ensembles and Monte Carlo dropout. The network is trained using PWI images of surface-breaking defects simulated with a hybrid finite-element/ray-based model. Successful UQ is judged by calibration and anomaly detection: calibration refers to whether in-domain model error is proportional to the predicted uncertainty, and anomaly detection to whether data from outside the training domain are assigned high uncertainty. Calibration is tested using simulated and experimental images of surface-breaking cracks, while anomaly detection is tested using experimental side-drilled holes and simulated embedded cracks. Monte Carlo dropout demonstrates poor UQ, with little separation between in- and out-of-distribution data and a weak linear fit (R = 0.84) between experimental root-mean-square error (RMSE) and uncertainty. Deep ensembles improve upon Monte Carlo dropout in both calibration (R = 0.95) and anomaly detection. Adding spectral normalization and residual connections to deep ensembles slightly improves calibration (R = 0.98) and significantly improves the reliability of assigning high uncertainty to out-of-distribution samples.
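
To make the Monte Carlo dropout procedure concrete, a minimal sketch in PyTorch follows. The paper does not publish its implementation, so the network, layer sizes, dropout rate, and number of forward passes below are purely illustrative assumptions. The essential step is keeping dropout active at inference time and treating the spread of repeated stochastic forward passes as the uncertainty estimate.

    import torch
    import torch.nn as nn

    class DropoutRegressor(nn.Module):
        """Toy regressor with dropout; the architecture is hypothetical,
        not the paper's CNN."""
        def __init__(self, in_features=128, p=0.2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_features, 64),
                nn.ReLU(),
                nn.Dropout(p),      # kept active at test time for MC dropout
                nn.Linear(64, 1),   # predicted crack size
            )

        def forward(self, x):
            return self.net(x)

    def mc_dropout_predict(model, x, n_samples=50):
        """Mean and standard deviation over repeated stochastic passes;
        the standard deviation serves as the uncertainty estimate."""
        model.train()  # train() mode keeps dropout layers active
        with torch.no_grad():
            preds = torch.stack([model(x) for _ in range(n_samples)])
        return preds.mean(dim=0), preds.std(dim=0)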
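
Deep ensembles, the better-performing method in the study, can be sketched in the same spirit: several identically structured networks are trained from independent random initializations, and the disagreement between their predictions is taken as the uncertainty. The training loop, loss, learning rate, and ensemble size below are assumptions for illustration, not the authors' configuration.

    def train_ensemble(make_model, train_loader, n_members=5, epochs=10):
        """Train n_members independently initialized copies of a network."""
        ensemble = []
        for _ in range(n_members):
            model = make_model()  # fresh random initialization per member
            opt = torch.optim.Adam(model.parameters(), lr=1e-3)
            loss_fn = nn.MSELoss()
            for _ in range(epochs):
                for x, y in train_loader:
                    opt.zero_grad()
                    loss_fn(model(x), y).backward()
                    opt.step()
            ensemble.append(model)
        return ensemble

    def ensemble_predict(ensemble, x):
        """Mean prediction and member disagreement (uncertainty)."""
        for m in ensemble:
            m.eval()  # dropout off: uncertainty comes from the ensemble
        with torch.no_grad():
            preds = torch.stack([m(x) for m in ensemble])
        return preds.mean(dim=0), preds.std(dim=0)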
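
The best-performing variant in the abstract adds spectral normalization and residual connections to the ensemble members. One way such a member block might be built, again with hypothetical layer widths, is to wrap each weight layer in PyTorch's spectral_norm utility, which constrains the layer's largest singular value, and to add the block input back to its output:

    from torch.nn.utils import spectral_norm

    class ResidualSNBlock(nn.Module):
        """Residual block whose linear layers are spectrally normalized,
        bounding each layer's Lipschitz constant (widths are illustrative)."""
        def __init__(self, width=64):
            super().__init__()
            self.fc1 = spectral_norm(nn.Linear(width, width))
            self.fc2 = spectral_norm(nn.Linear(width, width))
            self.act = nn.ReLU()

        def forward(self, x):
            return x + self.fc2(self.act(self.fc1(x)))  # residual connection

Constraining ensemble members in this way is often motivated by keeping feature-space distances faithful to input-space distances, a property commonly cited as helping out-of-distribution inputs register as uncertain, consistent with the improved anomaly detection reported in the abstract.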