Towards high-performance deep learning architecture and hardware accelerator design for robust parameters analysis in diffuse correlation spectroscopy
Zang, Zhenya and Wang, Quan and Pan, Mingliang and Zhang, Yuanzhe and Chen, Xi and Li, Xingda and Li, David Day Uei (2025) Towards high-performance deep learning architecture and hardware accelerator design for robust parameters analysis in diffuse correlation spectroscopy. Computer Methods and Programs in Biomedicine, 258. 108471. ISSN 0169-2607 (https://doi.org/10.1016/j.cmpb.2024.108471)
Filename: Zhang-etal-CMPB-2024-Towards-high-performance-deep-learning-architecture-and-hardware.pdf
Final Published Version (10MB)
Abstract
This study proposes a compact deep learning (DL) architecture and a highly parallelized computing hardware platform to reconstruct the blood flow index (BFi) in diffuse correlation spectroscopy (DCS). We used a rigorous analytical model to generate autocorrelation functions (ACFs) for training the DL network, and assessed the accuracy of the proposed network using simulated and milk phantom data. Compared to convolutional neural networks (CNNs), our lightweight DL architecture achieves 66.7% and 18.5% improvements in mean squared error (MSE) for BFi and the coherence factor β, respectively, in synthetic data evaluation. We also investigated the accuracy of the relative BFi (rBFi) across different algorithms. With hardware implementation in mind, we further simplified the DL computing primitives by using subtraction for feature extraction. We extensively explored computing parallelism and fixed-point quantization within the DL architecture. Because the DL model is compact, we applied unrolling and pipelining optimizations to its computation-intensive for-loops while storing all learned parameters in on-chip block RAMs (BRAMs). We also achieved pixel-wise parallelism, enabling simultaneous, real-time processing of 10 and 15 ACFs on Zynq-7000 and Zynq UltraScale+ field-programmable gate arrays (FPGAs), respectively. Unlike existing FPGA accelerators that produce BFi and β from ACFs on standalone hardware, our approach is an encapsulated, end-to-end on-chip process that converts photon intensity data to the temporal intensity ACF and then reconstructs BFi and β. This hardware platform provides an on-chip solution that replaces post-processing and helps miniaturize modern DCS systems that use single-photon cameras. We also comprehensively compared the computational efficiency of our FPGA accelerator with CPU and GPU solutions.
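As a rough illustration of two steps the abstract mentions, the sketch below computes a normalized temporal intensity ACF, g2(τ) = ⟨I(t)I(t+τ)⟩ / ⟨I⟩², from a photon-count trace and then rounds values to signed fixed point. It is not the authors' implementation: the function names `intensity_acf` and `to_fixed_point`, the parameters `max_lag`, `frac_bits`, and `total_bits`, and the 16-bit word with 12 fractional bits are all assumptions chosen for illustration, and the abstract does not state the word lengths actually used.

```python
import numpy as np


def intensity_acf(counts, max_lag):
    """Normalized intensity autocorrelation g2 for lags 1..max_lag (in time bins).

    Hypothetical helper: a direct software estimator, not the on-chip design.
    """
    counts = np.asarray(counts, dtype=np.float64)
    mean_sq = counts.mean() ** 2
    g2 = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        # Average the product of the trace with a copy of itself shifted by `lag` bins.
        g2[lag - 1] = np.mean(counts[:-lag] * counts[lag:]) / mean_sq
    return g2


def to_fixed_point(x, frac_bits=12, total_bits=16):
    """Round to signed fixed point with saturation and return the dequantized values.

    Word length and fractional bits are assumed values, not taken from the paper.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = np.clip(np.round(np.asarray(x) * scale), lo, hi)
    return q / scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Uncorrelated Poisson counts: g2 should stay close to 1 at every lag.
    trace = rng.poisson(lam=5.0, size=20000)
    g2 = intensity_acf(trace, max_lag=8)
    print(g2)
    print(to_fixed_point(g2))
```

With 12 fractional bits the rounding error is at most 2^-13 per value as long as the input stays inside the representable range; how much precision a given network tolerates would have to be checked against its accuracy on the target data.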
ORCID iDs
Zang, Zhenya, Wang, Quan, Pan, Mingliang, Zhang, Yuanzhe, Chen, Xi, Li, Xingda and Li, David Day Uei ORCID: https://orcid.org/0000-0002-6401-4263
Item type: Article
ID code: 91026
Dates:
1 January 2025: Published
28 October 2024: Published Online
20 October 2024: Accepted
Subjects: Medicine > Biomedical engineering. Electronics. Instrumentation
Department: Faculty of Science > Strathclyde Institute of Pharmacy and Biomedical Sciences
Faculty of Engineering > Biomedical Engineering
Strategic Research Themes > Health and Wellbeing
Depositing user: Pure Administrator
Date deposited: 31 Oct 2024 15:42
Last modified: 15 Nov 2024 08:45
URI: https://strathprints.strath.ac.uk/id/eprint/91026