Maximum Gaussianality training for deep speaker vector normalization

Cai, Yunqi and Li, Lantian and Abel, Andrew and Zhu, Xiaoyan and Wang, Dong (2024) Maximum Gaussianality training for deep speaker vector normalization. Pattern Recognition, 145. 109977. ISSN 0031-3203 (https://doi.org/10.1016/j.patcog.2023.109977)

Text. Filename: Cai_etal_PR_2023_Maximum_Gaussianality_training_for_deep_speaker_vector_normalization.pdf
Accepted Author Manuscript
Restricted to Repository staff only until 25 September 2024.
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0

Abstract

Automatic Speaker Verification (ASV) is a critical task in pattern recognition and has been applied to various security-sensitive scenarios. The current state-of-the-art technique for ASV is based on deep embedding. However, a significant challenge with this approach is that the resulting deep speaker vectors tend to be irregularly distributed. To address this issue, this paper proposes a novel training method called Maximum Gaussianality (MG), which regulates the distribution of the speaker vectors. Compared to the conventional normalization approach based on maximum likelihood (ML), the new approach directly maximizes the Gaussianality of the latent codes. It can therefore normalize both the between-class and within-class distributions in a controlled and reliable way, and it eliminates the unbounded likelihood problem associated with the conventional ML approach. Our experiments on several datasets demonstrate that our MG-based normalization can deliver much better performance than the baseline systems without normalization and outperform discriminative normalization flow (DNF), an ML-based normalization method, particularly when the training data is limited. In theory, the MG criterion can be applied to any task in any research domain where Gaussian distributions are needed, making MG training a versatile tool.