Polynomial eigenvalue decomposition-based target speaker voice activity detection in the presence of competing talkers

Neo, Vincent W. and Weiss, Stephan and McKnight, Simon W. and Hogg, Aidan O. T. and Naylor, Patrick A. (2022) Polynomial eigenvalue decomposition-based target speaker voice activity detection in the presence of competing talkers. In: 17th International Workshop on Acoustic Signal Enhancement, 2022-09-05 - 2022-09-08. (https://doi.org/10.1109/IWAENC53105.2022.9914796)

Text. Filename: Neo_etal_IWAENC2022_Polynomial_eigenvalue_decomposition_based_target_speaker_voice_activity_detection.pdf
Accepted Author Manuscript
License: Strathprints license 1.0

Abstract

Voice activity detection (VAD) algorithms are essential for many speech processing applications, such as speaker diarization, automatic speech recognition, speech enhancement, and speech coding. With a good VAD algorithm, non-speech segments can be excluded to improve the performance and reduce the computational cost of these applications. In this paper, we propose a polynomial eigenvalue decomposition-based target-speaker VAD algorithm to detect unseen target speakers in the presence of competing talkers. The proposed approach uses frame-based processing across multiple microphones to compute the syndrome energy, which is used to test for the presence or absence of a target speaker. The proposed approach is consistently among the best in F1 and balanced accuracy scores over the investigated range of signal-to-interference ratio (SIR) from -10 dB to 20 dB.
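
For readers unfamiliar with the two metrics named in the abstract, the following minimal sketch shows how F1 and balanced accuracy are conventionally computed from per-frame binary VAD decisions. It is not the authors' evaluation code: the function name `vad_scores`, the label convention (1 = target speaker active, 0 = inactive), and the example labels are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple


def vad_scores(reference: Sequence[int], decision: Sequence[int]) -> Tuple[float, float]:
    """Return (F1, balanced accuracy) for per-frame binary labels.

    Hypothetical helper: 1 marks a frame with the target speaker active,
    0 marks a frame without the target speaker.
    """
    if len(reference) != len(decision):
        raise ValueError("label sequences must have the same length")

    # Count the four outcomes of the per-frame confusion matrix.
    pairs = list(zip(reference, decision))
    tp = sum(1 for r, d in pairs if r == 1 and d == 1)
    tn = sum(1 for r, d in pairs if r == 0 and d == 0)
    fp = sum(1 for r, d in pairs if r == 0 and d == 1)
    fn = sum(1 for r, d in pairs if r == 1 and d == 0)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Balanced accuracy averages the hit rates on active and inactive frames.
    balanced_accuracy = 0.5 * (recall + specificity)
    return f1, balanced_accuracy


# Illustrative labels only, not data from the paper.
reference = [1, 1, 1, 0, 0, 0, 0, 1]
decision = [1, 1, 0, 0, 0, 1, 0, 1]
f1, bal_acc = vad_scores(reference, decision)
```

Balanced accuracy weights active and inactive frames equally, so it stays informative when one class dominates the recording, whereas F1 ignores true negatives and focuses on how well the active frames are found.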

ORCID iDs

Neo, Vincent W., Weiss, Stephan (ORCID: https://orcid.org/0000-0002-3486-7206), McKnight, Simon W., Hogg, Aidan O. T. and Naylor, Patrick A.