Two-stage GAN for field-of-view standardisation and tongue region enhancement in ultrasound for cleft palate speech pattern analysis
Al Ani, Saja and Cleland, Joanne and Zoha, Ahmed (2025) Two-stage GAN for field-of-view standardisation and tongue region enhancement in ultrasound for cleft palate speech pattern analysis. In: Proceedings of 2025 International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD 2025). Lecture Notes in Electrical Engineering. Springer, GBR. (In Press)
Text. Filename: Ani-etal-MICAD-2025-Two-Stage-GAN-for-Field-of-View-Standardisation-and-Tongue-Region.pdf (1MB)
Accepted Author Manuscript. Restricted to repository staff only until 1 January 2099.
Abstract
Ultrasound tongue imaging (UTI) provides non-invasive, real-time insight into tongue motion. However, variability in the field of view (FoV) and distracting background anatomy limit automated analysis. We propose a two-stage generative pipeline using a Pix2Pix conditional generative adversarial network (cGAN) to address these issues. Stage 1 standardises FoV, converting wide-angle frames to a standardised 97° view, while Stage 2 refines the region of interest (ROI). Evaluation showed near-lossless fidelity in Stage 1 (SSIM = 0.999996, PSNR = 88.5 dB) and high structural preservation in Stage 2 (SSIM = 0.973, PSNR = 33.3 dB). For binary classification of children with cleft palate ± lip versus typically developing (TD) peers, FoV standardisation improved generalisability and achieved 98.8% accuracy with mixed real and synthetic training. ROI refinement boosted recall to 97.8%, reducing false negatives in screening. Cross-domain testing confirmed that Stage 1 outputs are indistinguishable from real data, while Stage 2 shifts were mitigated by mixed training. Beyond classification, the framework supports data harmonisation, interpretability, and augmentation of rare phoneme classes, highlighting the potential of generative AI to enhance the robustness and clinical utility of UTI in speech disorder assessment.
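To give a sense of scale for the PSNR figures quoted in the abstract, the following is a minimal sketch of the standard PSNR formula and its inverse. It is not the authors' evaluation code. The function names `psnr` and `mse_for_psnr` are hypothetical, and the 8-bit pixel range (`max_value=255`) is an assumption, since the abstract does not state the data range used.

```python
import math

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value is the assumed peak pixel value (255 for 8-bit images).
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_for_psnr(psnr_db, max_value=255.0):
    """Invert the PSNR formula: the mean squared error implied by a PSNR in dB."""
    return max_value ** 2 / 10 ** (psnr_db / 10.0)

# Implied mean squared errors for the two reported PSNR values,
# under the assumed 8-bit range.
stage1_mse = mse_for_psnr(88.5)
stage2_mse = mse_for_psnr(33.3)
```

Under that assumption, 88.5 dB corresponds to a mean squared error far below one grey level squared, consistent with the "near-lossless" description of Stage 1, while 33.3 dB corresponds to a visibly larger but still moderate per-pixel error for Stage 2.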
ORCID iDs
Al Ani, Saja: https://orcid.org/0009-0001-3703-8040
Cleland, Joanne: https://orcid.org/0000-0002-0660-1646
Zoha, Ahmed
Item type: Book Section
ID code: 94692
Dates: 4 November 2025 (Published); 4 November 2025 (Accepted)
Subjects: Medicine > Internal medicine > Neuroscience. Biological psychiatry. Neuropsychiatry > Communicative disorders. Speech and language disorders
Department: Faculty of Humanities and Social Sciences (HaSS) > Psychological Sciences and Health > Speech and Language Therapy
Strategic Research Themes > Health and Wellbeing
Faculty of Engineering > Electronic and Electrical Engineering
Depositing user: Pure Administrator
Date deposited: 10 Nov 2025 12:09
Last modified: 22 Jan 2026 10:44
URI: https://strathprints.strath.ac.uk/id/eprint/94692