A data driven approach to audiovisual speech mapping
Abel, Andrew and Marxer, Ricard and Barker, Jon and Watt, Roger and Whitmer, Bill and Derleth, Peter and Hussain, Amir; Liu, Cheng-Lin and Hussain, Amir and Luo, Bin and Tan, Kay Chen and Zeng, Yi and Zhang, Zhaoxiang, eds. (2016) A data driven approach to audiovisual speech mapping. In: Advances in Brain Inspired Cognitive Systems. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer-Verlag, CHN, pp. 331-342. ISBN 9783319496856 (https://doi.org/10.1007/978-3-319-49685-6_30)
Text.
Filename: Abel_etal_BICS2016_A_data_driven_approach_to_audiovisual_speech_mapping.pdf
Accepted Author Manuscript. License: Strathprints license 1.0. Size: 165kB
Abstract
The use of visual information in audio speech processing has attracted significant recent interest. This paper presents a data driven approach that estimates audio speech acoustics using only temporal visual information, without considering linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various multilayer perceptron (MLP) configurations and datasets are evaluated to identify optimal results, showing that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be produced.
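The mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the DCT normalisation, the number of coefficients kept, the context length, or the network size, so all of those are assumptions here, as are the function names `dct2`, `visual_features`, `stack_prior_frames`, and `mlp_forward`. The sketch also assumes that the window of "prior" frames ends at the current frame, and it uses untrained placeholder weights supplied by the caller.

```python
import math


def dct2(block):
    """Unnormalised 2D DCT-II of a rectangular list-of-lists image block."""
    rows, cols = len(block), len(block[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                for y in range(cols):
                    total += (block[x][y]
                              * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                              * math.cos(math.pi * (2 * y + 1) * v / (2 * cols)))
            out[u][v] = total
    return out


def visual_features(block, keep=2):
    """Flatten the top-left keep x keep low-frequency DCT coefficients (assumed choice)."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


def stack_prior_frames(features, t, context):
    """Concatenate the `context` visual feature vectors ending at frame t (inclusive)."""
    if t - context + 1 < 0:
        raise ValueError("not enough prior frames for this context length")
    stacked = []
    for i in range(t - context + 1, t + 1):
        stacked.extend(features[i])
    return stacked


def mlp_forward(x, w1, b1, w2, b2):
    """One-hidden-layer MLP with tanh hidden units and a linear output layer.

    The output would stand in for one frame of log filterbank values;
    the weights here are whatever the caller passes in, not trained values.
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]
```

In use, each mouth-region image would pass through `visual_features`, the resulting vectors would be windowed with `stack_prior_frames`, and the stacked vector would be fed to `mlp_forward` to produce an estimated audio feature frame. Training the weights against real log filterbank targets is outside the scope of this sketch.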
ORCID iDs
Abel, Andrew ORCID: https://orcid.org/0000-0002-3631-8753, Marxer, Ricard, Barker, Jon, Watt, Roger, Whitmer, Bill, Derleth, Peter and Hussain, Amir; Liu, Cheng-Lin, Hussain, Amir, Luo, Bin, Tan, Kay Chen, Zeng, Yi and Zhang, Zhaoxiang
Item type: Book Section
ID code: 86686
Dates: 13 November 2016 (Published)
Subjects: Science > Mathematics > Electronic computers. Computer science
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Pure Administrator
Date deposited: 06 Sep 2023 10:33
Last modified: 04 Dec 2024 01:09
URI: https://strathprints.strath.ac.uk/id/eprint/86686