Gabor-based audiovisual fusion for Mandarin Chinese speech recognition
Xu, Yan and Wang, Hongce and Dong, Zhongping and Li, Yuexuan and Abel, Andrew (2022) Gabor-based audiovisual fusion for Mandarin Chinese speech recognition. In: 2022 30th European Signal Processing Conference (EUSIPCO). IEEE, SRB, pp. 603-607. ISBN 9789082797091 (https://doi.org/10.23919/eusipco55093.2022.9909634)
Text. Filename: Xu_etal_EUSIPCO_2022_Gabor_based_audiovisual_fusion_for_Mandarin_Chinese_speech_recognition.pdf
Accepted Author Manuscript
License: Strathprints license 1.0
Abstract
Audiovisual Speech Recognition (AVSR) is a popular research topic, and incorporating visual features into speech recognition systems has been found to deliver good results. In recent years, end-to-end Convolutional Neural Network (CNN) based deep learning has been widely utilized. However, these often require big data and can be time consuming to train. A lot of speech research also tends to focus on English language datasets. In this paper, we propose a lightweight optimized and automated speech recognition system using Gabor based feature extraction, combined with our Audiovisual Mandarin Chinese (AVMC) corpus. This combines Mel-frequency Cepstral Coefficients (MFCCs) + CNN_Bidirectional Long Short-term Memory (BiLSTM)_Attention (CLA) model for Audio Speech Recognition, and low dimension Gabor visual features + CLA model for Visual Speech Recognition. As we are focusing on Chinese language recognition, we individually analyse initials, finals, and tones, as part of pinyin speech production. The proposed low dimensionality system achieves 88.6%, 87.5% and 93.6% accuracy for tones, initials and finals respectively at char-level, 84.8% for pinyin at word-level.
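The abstract reports separate results for initials, finals and tones. As a rough illustration of what those three label sets are, the sketch below splits a tone-numbered pinyin syllable into its parts. It is not the authors' labelling code: the function name `split_pinyin`, the tone-numbered input format (for example `zhong1`), and the choice to treat `y` and `w` as initials are all assumptions made for this example.

```python
import re

# Pinyin initials; two-letter initials come first so "zh" is matched before "z".
# Assumption: "y" and "w" are treated as initials here for simplicity.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_pinyin(syllable):
    """Split a tone-numbered pinyin syllable into (initial, final, tone).

    The tone is an int from 1 to 5, or None when no tone digit is given.
    The initial is an empty string for syllables such as "an" or "er".
    """
    match = re.fullmatch(r"([a-zü]+)([1-5])?", syllable.strip().lower())
    if match is None:
        raise ValueError(f"not a tone-numbered pinyin syllable: {syllable!r}")
    letters, tone = match.group(1), match.group(2)
    # Take the first initial that prefixes the syllable, if any.
    initial = next((i for i in INITIALS if letters.startswith(i)), "")
    final = letters[len(initial):]
    return initial, final, int(tone) if tone else None


if __name__ == "__main__":
    for s in ("zhong1", "wen2", "an1", "shi4"):
        print(s, split_pinyin(s))
```

A per-syllable split like this is only meant to show how one pinyin label decomposes into the three components that the abstract evaluates individually.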
ORCID iDs
Xu, Yan, Wang, Hongce, Dong, Zhongping, Li, Yuexuan and Abel, Andrew ORCID: https://orcid.org/0000-0002-3631-8753
Item type: Book Section
ID code: 83858
Dates:
18 October 2022: Published
29 August 2022: Published Online
15 May 2022: Accepted
Notes: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subjects: Technology > Electrical engineering. Electronics Nuclear engineering; Language and Literature > Oriental languages and literatures
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Pure Administrator
Date deposited: 26 Jan 2023 12:14
Last modified: 15 Oct 2024 00:16
URI: https://strathprints.strath.ac.uk/id/eprint/83858