Deep learning in ultrasound tongue imaging : a systematic review toward automated detection of speech sound disorders
Al Ani, Saja and Cleland, Joanne and Zoha, Ahmed (2025) Deep learning in ultrasound tongue imaging : a systematic review toward automated detection of speech sound disorders. Frontiers in Artificial Intelligence, 8. 1631134. ISSN 2624-8212 (https://doi.org/10.3389/frai.2025.1631134)
Abstract
Background: Speech sound disorders (SSD) in children can significantly impact communication and development. Ultrasound tongue imaging (UTI) is a non-invasive method for visualising tongue motion during speech, offering a promising alternative for diagnosis and therapy. Deep learning (DL) techniques have shown great promise in automating the analysis of UTI data, although their clinical application for SSD remains underexplored.
Objective: This review aims to synthesise how DL has been utilised in UTI to support automated SSD detection, highlighting the advancement of techniques, key challenges, and future directions.
Methods: A comprehensive search of IEEE Xplore, PubMed, ScienceDirect, Scopus, Taylor & Francis, and arXiv identified studies from 2010 through 2025. Inclusion criteria focused on studies using DL to analyse UTI data with relevance to SSD classification, feature extraction, or speech assessment. Eleven studies met the criteria: three directly tackled disordered speech classification tasks, while four addressed supporting tasks like tongue contour segmentation and tongue motion modelling. Promising results were reported in each category, but limitations such as small datasets, inconsistent evaluation, and limited generalisability were common.
Results: DL models demonstrate effectiveness in analysing UTI for articulatory assessment and show early potential in identifying SSD-related patterns. The included studies collectively outline a developmental pipeline, from foundational pre-processing to phoneme-level classification in typically developing speakers, and finally to preliminary attempts at classifying speech errors in children with SSD. This progression illustrates significant technological advances; however, it also emphasises gaps such as the lack of large, disorder-focused datasets and the need for integrated end-to-end systems.
Conclusions: The field of DL-driven UTI assessment for speech disorders is developing. Current studies provide a strong technical foundation and proof-of-concept for automatic SSD detection using ultrasound, but clinical translation remains limited. Future research should prioritise the creation of larger annotated UTI datasets of disordered speech, developing generalisable and interpretable models, and validating fully integrated DL-UTI pipelines in real-world speech therapy settings. With these advances, DL-based UTI systems have the potential to transform SSD diagnosis and treatment by providing objective, real-time articulatory feedback in a child-friendly manner.
ORCID iDs
Al Ani, Saja ORCID: https://orcid.org/0009-0001-3703-8040
Cleland, Joanne ORCID: https://orcid.org/0000-0002-0660-1646
Zoha, Ahmed
Item type: Article
ID code: 94120
Dates:
24 September 2025: Published
9 September 2025: Accepted
Subjects: Medicine > Internal medicine > Neuroscience. Biological psychiatry. Neuropsychiatry > Communicative disorders. Speech and language disorders
Department: Faculty of Humanities and Social Sciences (HaSS) > Psychological Sciences and Health > Speech and Language Therapy
Strategic Research Themes > Health and Wellbeing
Faculty of Engineering > Electronic and Electrical Engineering
Depositing user: Pure Administrator
Date deposited: 10 Sep 2025 14:37
Last modified: 27 Oct 2025 08:34
URI: https://strathprints.strath.ac.uk/id/eprint/94120