A biopsy/non-biopsy approach to voice disorder classification using deep learning
Conway, Frank and Perry, Ross and Di Caterina, Gaetano and Cohen, Wendy and Wynne, David M. (2026) A biopsy/non-biopsy approach to voice disorder classification using deep learning. In: 2025 IEEE Symposium on Computers and Communications (ISCC). IEEE Symposium on Computers and Communications (ISCC). IEEE, ITA. ISBN 979-8-3315-2420-3 (https://doi.org/10.1109/ISCC65549.2025.11325982)
Filename: Conway-etal-IEEE-ISCC-2025-A-biopsy-non-biopsy-approach-to-voice-disorder-classification-using-deep-learning.pdf
Accepted Author Manuscript License:
Abstract
Systems to detect voice pathologies have gained increasing attention owing to advances in machine learning and the positive impact such systems could have on the healthcare industry. However, many existing methods share a common challenge: voice pathology datasets contain few samples per pathology. Many methods have therefore grouped these samples together and compared them with healthy ones in a binary 'has voice pathology/healthy' approach, which is of limited use in real-world applications such as clinical settings. This research proposes a novel, practical method of grouping voice pathologies for feature learning, which showed promising results on the Saarbrücken Voice Database (SVD) and a local Recurrent Respiratory Papillomatosis (RRP) dataset. Mel-frequency coefficients were used as input to various Recurrent Neural Networks for feature learning, and these models were compared using a multi-stage approach. The first stage, classifying all classes available in the SVD, predictably produced the worst results, likely because features are hard to distinguish when samples are few and classes are many. The second stage investigated the impact of grouping the SVD into Functional, Structural or Neurological classes, which increased the F1-score to 41.04%. In the last stage, each voice pathology was grouped according to whether or not the clinician would require a biopsy, which increased the F1-score to 69.81% on the SVD and 64.25% on the local RRP dataset. Although this novel approach shows promising results, further research using more sophisticated deep learning models is needed to confirm its reliability.
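The grouping stages described above amount to relabelling each fine-grained diagnosis with a coarser class before training and scoring. The sketch below illustrates only that relabelling and its effect on the F1-score. It assumes macro-averaged F1 (the abstract does not state which averaging was used), and the labels `pathology_a` to `pathology_d` and the `BIOPSY_GROUP` mapping are hypothetical placeholders, not the paper's actual assignment of SVD pathologies to biopsy/non-biopsy groups.

```python
# Hypothetical mapping from fine-grained diagnosis labels to the two
# clinically motivated groups. Placeholder names only; the paper's real
# biopsy/non-biopsy assignment is not given in the abstract.
BIOPSY_GROUP = {
    "pathology_a": "biopsy",
    "pathology_b": "biopsy",
    "pathology_c": "non-biopsy",
    "pathology_d": "non-biopsy",
}


def regroup(labels, mapping):
    """Replace each fine-grained label with its coarse group."""
    return [mapping[label] for label in labels]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy predictions: every error confuses two pathologies from the same group.
fine_true = ["pathology_a", "pathology_b", "pathology_c", "pathology_d"]
fine_pred = ["pathology_b", "pathology_a", "pathology_d", "pathology_c"]

print(macro_f1(fine_true, fine_pred))  # 0.0
print(macro_f1(regroup(fine_true, BIOPSY_GROUP),
               regroup(fine_pred, BIOPSY_GROUP)))  # 1.0
```

The toy example shows one mechanism by which coarser groupings can raise F1: confusions between pathologies that fall in the same group are no longer counted as errors. It is an illustration of the metric, not a reproduction of the paper's models or results.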
ORCID iDs
Conway, Frank
Perry, Ross (ORCID: https://orcid.org/0009-0008-5315-2987)
Di Caterina, Gaetano (ORCID: https://orcid.org/0000-0002-7256-0897)
Cohen, Wendy (ORCID: https://orcid.org/0000-0002-1271-9229)
Wynne, David M.
Item type: Book Section
ID code: 93785
Dates: 13 January 2026 (Published); 25 March 2025 (Accepted)
Subjects: Medicine > Internal medicine > Neuroscience. Biological psychiatry. Neuropsychiatry > Communicative disorders. Speech and language disorders; Science > Mathematics > Electronic computers. Computer science > Other topics, A-Z > Human-computer interaction
Department: Faculty of Engineering > Electronic and Electrical Engineering; Strategic Research Themes > Health and Wellbeing; Faculty of Humanities and Social Sciences (HaSS) > Psychological Sciences and Health > Speech and Language Therapy
Depositing user: Pure Administrator
Date deposited: 11 Aug 2025 14:22
Last modified: 14 May 2026 17:58
URI: https://strathprints.strath.ac.uk/id/eprint/93785