Two-stage GAN for field-of-view standardisation and tongue region enhancement in ultrasound for cleft palate speech pattern analysis

Al Ani, Saja and Cleland, Joanne and Zoha, Ahmed (2025) Two-stage GAN for field-of-view standardisation and tongue region enhancement in ultrasound for cleft palate speech pattern analysis. In: Proceedings of 2025 International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD 2025). Lecture Notes in Electrical Engineering. Springer, GBR. (In Press)

Text. Filename: Ani-etal-MICAD-2025-Two-Stage-GAN-for-Field-of-View-Standardisation-and-Tongue-Region.pdf
Accepted Author Manuscript
Restricted to Repository staff only until 1 January 2099.

Abstract

Ultrasound tongue imaging (UTI) provides non-invasive, real-time insight into tongue motion. However, variability in the field of view (FoV) and distracting background anatomy limit automated analysis. We propose a two-stage generative pipeline using a Pix2Pix conditional generative adversarial network (cGAN) to address these issues. Stage 1 standardises the FoV, converting wide-angle frames to a standardised 97° view, while Stage 2 refines the region of interest (ROI). Evaluation showed near-lossless fidelity in Stage 1 (SSIM = 0.999996, PSNR = 88.5 dB) and high structural preservation in Stage 2 (SSIM = 0.973, PSNR = 33.3 dB). For binary classification of children with cleft palate ± lip versus typically developing (TD) peers, FoV standardisation improved generalisability and achieved 98.8% accuracy with mixed real and synthetic training. ROI refinement boosted recall to 97.8%, reducing false negatives in screening. Cross-domain testing confirmed that Stage 1 outputs are indistinguishable from real data, while the domain shifts introduced by Stage 2 were mitigated by mixed training. Beyond classification, the framework supports data harmonisation, interpretability, and augmentation of rare phoneme classes, highlighting the potential of generative AI to enhance the robustness and clinical utility of UTI in speech disorder assessment.
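The fidelity and screening figures quoted in the abstract (PSNR, accuracy, recall) follow standard definitions. The following is a minimal sketch of those definitions only, not the authors' evaluation code. It assumes 8-bit pixel intensities (peak value 255) supplied as flat sequences, and the function names `psnr`, `recall` and `accuracy` are illustrative choices rather than names from the paper. SSIM is omitted because it relies on windowed local statistics that do not fit in a few lines.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities.
    max_value is the assumed peak intensity (255 for 8-bit images).
    """
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def recall(true_positives, false_negatives):
    """Fraction of actual positive cases that the classifier detected."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no positive cases to evaluate")
    return true_positives / total


def accuracy(true_positives, true_negatives, false_positives, false_negatives):
    """Fraction of all cases that were classified correctly."""
    total = true_positives + true_negatives + false_positives + false_negatives
    if total == 0:
        raise ValueError("no cases to evaluate")
    return (true_positives + true_negatives) / total
```

Under these definitions, a higher PSNR means a smaller mean squared error relative to the peak intensity, and a higher recall means fewer missed positive cases, which is why the abstract links the recall gain to fewer false negatives in screening.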

ORCID iDs

Al Ani, Saja ORCID: https://orcid.org/0009-0001-3703-8040, Cleland, Joanne ORCID: https://orcid.org/0000-0002-0660-1646 and Zoha, Ahmed