Gabor-based audiovisual fusion for Mandarin Chinese speech recognition

Xu, Yan and Wang, Hongce and Dong, Zhongping and Li, Yuexuan and Abel, Andrew; (2022) Gabor-based audiovisual fusion for Mandarin Chinese speech recognition. In: 2022 30th European Signal Processing Conference (EUSIPCO). IEEE, Belgrade, Serbia, pp. 603-607. ISBN 9789082797091 (https://doi.org/10.23919/eusipco55093.2022.9909634)


Abstract

Audiovisual Speech Recognition (AVSR) is a popular research topic, and incorporating visual features into speech recognition systems has been found to deliver good results. In recent years, end-to-end Convolutional Neural Network (CNN) based deep learning has been widely utilized. However, such systems often require large amounts of training data and can be time-consuming to train. Much speech research also focuses on English-language datasets. In this paper, we propose a lightweight, optimized, and automated speech recognition system using Gabor-based feature extraction, combined with our Audiovisual Mandarin Chinese (AVMC) corpus. This combines Mel-frequency Cepstral Coefficients (MFCCs) with a CNN-Bidirectional Long Short-Term Memory (BiLSTM)-Attention (CLA) model for audio speech recognition, and low-dimensional Gabor visual features with a CLA model for visual speech recognition. As we focus on Chinese language recognition, we individually analyse initials, finals, and tones as components of pinyin speech production. The proposed low-dimensionality system achieves 88.6%, 87.5%, and 93.6% accuracy for tones, initials, and finals respectively at the character level, and 84.8% for pinyin at the word level.
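
The abstract describes a CLA (CNN-BiLSTM-Attention) classifier applied to MFCC audio features and low-dimensional Gabor visual features. The sketch below is a minimal illustration of that kind of architecture in PyTorch; the layer sizes, the additive attention formulation, the 39-dimensional MFCC input, and the class count are assumptions chosen for illustration, not the configuration reported in the paper.

```python
# A minimal sketch of a CNN-BiLSTM-Attention (CLA) classifier in PyTorch.
# All hyperparameters here are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn


class CLA(nn.Module):
    """1D CNN over feature frames -> BiLSTM -> additive attention -> class logits."""

    def __init__(self, n_features=39, n_classes=23, cnn_channels=64, lstm_hidden=128):
        super().__init__()
        # CNN front end: convolve over time, with feature dimensions as input channels.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, cnn_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # Bidirectional LSTM over the CNN output sequence.
        self.bilstm = nn.LSTM(cnn_channels, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Additive attention: score each time step, pool with softmax weights.
        self.attn = nn.Linear(2 * lstm_hidden, 1)
        self.classifier = nn.Linear(2 * lstm_hidden, n_classes)

    def forward(self, x):                        # x: (batch, time, n_features)
        h = self.cnn(x.transpose(1, 2))          # (batch, channels, time')
        h, _ = self.bilstm(h.transpose(1, 2))    # (batch, time', 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)   # (batch, time', 1) attention weights
        ctx = (w * h).sum(dim=1)                 # attention-weighted sequence summary
        return self.classifier(ctx)              # (batch, n_classes) logits


# Example: a batch of 4 utterances, each 100 frames of 39-dim MFCC-style features.
logits = CLA()(torch.randn(4, 100, 39))
print(logits.shape)  # torch.Size([4, 23])
```

The same classifier shape could in principle be fed either acoustic MFCC frames or a sequence of low-dimensional Gabor-based visual features, with separate models trained per target (tones, initials, finals), which is consistent with the per-category results quoted in the abstract.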