A biopsy/non-biopsy approach to voice disorder classification using deep learning

Conway, Frank and Perry, Ross and Di Caterina, Gaetano and Cohen, Wendy and Wynne, David M. (2026) A biopsy/non-biopsy approach to voice disorder classification using deep learning. In: 2025 IEEE Symposium on Computers and Communications (ISCC). IEEE, ITA. ISBN 979-8-3315-2420-3 (https://doi.org/10.1109/ISCC65549.2025.11325982)

Text. Filename: Conway-etal-IEEE-ISCC-2025-A-biopsy-non-biopsy-approach-to-voice-disorder-classification-using-deep-learning.pdf
Accepted Author Manuscript
License: Creative Commons Attribution 4.0

Abstract

Systems for detecting voice pathologies have gained increasing attention due to advances in machine learning and the positive impact such systems could have on the healthcare industry. However, many existing methods share the challenge of small sample sizes within voice pathology datasets. Many methods group all pathological samples together and compare them with healthy ones in a binary 'has voice pathology/healthy' approach, which does not prove useful in real-world applications such as clinical settings. This research proposes a novel, practical method of grouping voice pathologies for feature learning, which showed promising results on the Saarbrucken Voice Database (SVD) and a local Recurrent Respiratory Papillomatosis (RRP) dataset. Mel-frequency coefficients were used with various Recurrent Neural Networks for feature learning. These models were compared using a multi-stage approach. The first stage, classifying all classes available in the SVD, predictably produced the worst results, likely because features are hard to distinguish when samples are few and classes are many. The second stage grouped the SVD into Functional, Structural or Neurological classes, which increased the F1-Score to 41.04%. In the last stage, each voice pathology was grouped according to whether or not a clinician would require a biopsy, which increased the F1-Score to 69.81% on the SVD and 64.25% on a local RRP dataset. Although this novel approach shows promising results, further research using more sophisticated deep learning models is needed to confirm its reliability.
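
The label-grouping idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the placeholder labels (`pathology_a` and so on), the `BIOPSY_GROUP` mapping, the helper functions, and the choice of macro-averaged F1 are not taken from the paper, which does not state its mapping or averaging scheme in the abstract.

```python
from collections import Counter

# Hypothetical label-to-group map; these placeholder names are NOT from the paper.
BIOPSY_GROUP = {
    "pathology_a": "biopsy",
    "pathology_b": "biopsy",
    "pathology_c": "non-biopsy",
    "pathology_d": "non-biopsy",
}


def regroup(labels, mapping):
    """Replace each fine-grained label with its coarse group."""
    return [mapping[label] for label in labels]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (one common averaging choice, assumed here)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy predictions, invented purely to exercise the functions.
true_fine = ["pathology_a", "pathology_b", "pathology_c", "pathology_d"]
pred_fine = ["pathology_b", "pathology_b", "pathology_d", "pathology_a"]

fine_f1 = macro_f1(true_fine, pred_fine)
coarse_f1 = macro_f1(regroup(true_fine, BIOPSY_GROUP),
                     regroup(pred_fine, BIOPSY_GROUP))
print(f"fine-grained macro F1: {fine_f1:.4f}")
print(f"grouped macro F1:      {coarse_f1:.4f}")
```

On this toy data the grouped score is higher only because confusions between labels in the same group no longer count as errors. It does not reproduce or validate the figures reported above.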

ORCID iDs

Conway, Frank, Perry, Ross ORCID: https://orcid.org/0009-0008-5315-2987, Di Caterina, Gaetano ORCID: https://orcid.org/0000-0002-7256-0897, Cohen, Wendy ORCID: https://orcid.org/0000-0002-1271-9229 and Wynne, David M.