Signal compaction using polynomial EVD for spherical array processing with applications

Neo, Vincent W. and Evers, Christine and Weiss, Stephan and Naylor, Patrick A. (2023) Signal compaction using polynomial EVD for spherical array processing with applications. IEEE/ACM Transactions on Audio Speech and Language Processing, 31. 3537 - 3549. ISSN 2329-9304 (https://doi.org/10.1109/TASLP.2023.3313441)

Text. Filename: Neo_etal_IEEE_TASLP_2023_Signal_compaction_using_polynomial_EVD_for_spherical.pdf
Accepted Author Manuscript
License: Strathprints license 1.0

Abstract

Multi-channel signals captured by spatially separated sensors often contain a high level of data redundancy. A compact signal representation enables more efficient storage and processing, which has been exploited for data compression, noise reduction, and speech and image coding. This article focuses on the compact representation of speech signals acquired by spherical microphone arrays. A polynomial matrix eigenvalue decomposition (PEVD) can spatially decorrelate signals over a range of time lags and is known to achieve optimum multi-channel data compaction. However, the complexity of PEVD algorithms scales at best cubically with the number of channel signals, e.g., the number of microphones comprised in a spherical array used for processing. In contrast, the spherical harmonic transform (SHT) provides a compact spatial representation of the 3-dimensional sound field measured by spherical microphone arrays, referred to as eigenbeam signals, at a cost that rises only quadratically with the number of microphones. Yet, the SHT's spatially orthogonal basis functions cannot completely decorrelate sound field components over a range of time lags. In this work, we propose to exploit the compact representation offered by the SHT to reduce the number of channels used for subsequent PEVD processing. In the proposed framework for signal representation, we show that the diagonality factor improves by up to 7 dB over the microphone signal representation with a significantly lower computation cost. Moreover, when applying this framework to speech enhancement and source separation, the proposed method improves metrics known as short-time objective intelligibility (STOI) and source-to-distortion ratio (SDR) by up to 0.2 and 20 dB, respectively.
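The abstract's point that decorrelation must hold "over a range of time lags" can be illustrated with a small NumPy sketch. This is not the authors' implementation and performs neither a PEVD nor an SHT. It uses a synthetic two-channel signal in which one channel is a one-sample delayed copy of the other, and the helper names `space_time_covariance` and `off_diagonal_fraction` are hypothetical. The off-diagonal energy fraction used here is an assumed, simplified measure and is not necessarily the diagonality factor reported in the article.

```python
import numpy as np


def space_time_covariance(x, max_lag):
    """Estimate R[tau] = E{x[n] x[n - tau]^T} for tau = -max_lag..max_lag.

    x has shape (channels, samples); the result has shape
    (2 * max_lag + 1, channels, channels).
    """
    num_channels, num_samples = x.shape
    lags = range(-max_lag, max_lag + 1)
    R = np.zeros((len(lags), num_channels, num_channels))
    for i, tau in enumerate(lags):
        if tau >= 0:
            a, b = x[:, tau:], x[:, :num_samples - tau]
        else:
            a, b = x[:, :num_samples + tau], x[:, -tau:]
        R[i] = a @ b.T / num_samples
    return R


def off_diagonal_fraction(R):
    """Fraction of total energy in off-diagonal entries (assumed measure)."""
    total = np.sum(R ** 2)
    diag = sum(np.sum(np.diag(R_tau) ** 2) for R_tau in R)
    return (total - diag) / total


# Synthetic example: channel 2 is channel 1 delayed by one sample.
rng = np.random.default_rng(0)
s = rng.standard_normal(20000)
x = np.vstack([s[1:], s[:-1]])

max_lag = 2
R = space_time_covariance(x, max_lag)

# Lag zero only: the channels look uncorrelated.
lag_zero_fraction = off_diagonal_fraction(R[max_lag:max_lag + 1])
# All lags: the cross-correlation at lag +/-1 becomes visible.
all_lags_fraction = off_diagonal_fraction(R)

print(f"off-diagonal fraction, lag zero only: {lag_zero_fraction:.4f}")
print(f"off-diagonal fraction, all lags:      {all_lags_fraction:.4f}")
```

In this toy case the lag-zero covariance is already close to diagonal, so an ordinary EVD would find nothing to decorrelate, while roughly half of the energy across all lags sits off the diagonal. Removing that remaining correlation is the task a polynomial EVD addresses.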

Item metadata

Item type:	Article
ID code:	86641
Dates:	20 October 2023 (Published); 8 September 2023 (Published Online); 17 August 2023 (Accepted)
Notes:	© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subjects:	Technology > Electrical engineering. Electronics. Nuclear engineering > Electrical apparatus and materials
Department:	Faculty of Engineering > Electronic and Electrical Engineering; Technology and Innovation Centre > Sensors and Asset Management
Depositing user:	Pure Administrator
Date deposited:	31 Aug 2023 15:05
Last modified:	26 Apr 2024 01:03
Related URLs:	Journal or Publication
URI:	https://strathprints.strath.ac.uk/id/eprint/86641
