Towards high-performance deep learning architecture and hardware accelerator design for robust parameters analysis in diffuse correlation spectroscopy


Zang, Zhenya and Wang, Quan and Pan, Mingliang and Zhang, Yuanzhe and Chen, Xi and Li, Xingda and Li, David Day Uei (2025) Towards high-performance deep learning architecture and hardware accelerator design for robust parameters analysis in diffuse correlation spectroscopy. Computer Methods and Programs in Biomedicine, 258. 108471. ISSN 0169-2607 (https://doi.org/10.1016/j.cmpb.2024.108471)


Text. Filename: Zhang-etal-CMPB-2024-Towards-high-performance-deep-learning-architecture-and-hardware.pdf
Final Published Version

Abstract

This study proposes a compact deep learning (DL) architecture and a highly parallelized computing hardware platform to reconstruct the blood flow index (BFi) in diffuse correlation spectroscopy (DCS). We used a rigorous analytical model to generate autocorrelation functions (ACFs) for training the DL network, and assessed the accuracy of the proposed network on simulated and milk phantom data. Compared to convolutional neural networks (CNNs), our lightweight DL architecture achieves 66.7% and 18.5% improvements in mean squared error (MSE) for BFi and the coherence factor β, respectively, when evaluated on synthetic data. The accuracy of the relative BFi (rBFi) across different algorithms was also investigated. With hardware implementation in mind, we further simplified the DL computing primitives by using subtraction for feature extraction. We extensively explored computing parallelism and fixed-point quantization within the DL architecture. Because the DL model is compact, we applied unrolling and pipelining optimizations to its computation-intensive for-loops while storing all learned parameters in on-chip block RAMs (BRAMs). We also achieved pixel-wise parallelism, enabling simultaneous, real-time processing of 10 and 15 ACFs on Zynq-7000 and Zynq UltraScale+ field-programmable gate arrays (FPGAs), respectively. Unlike existing FPGA accelerators that produce BFi and β from ACFs on standalone hardware, our approach is an encapsulated, end-to-end on-chip process that converts intensity photon data to the temporal intensity ACF and subsequently reconstructs BFi and β. This hardware platform provides an on-chip solution that replaces post-processing and helps miniaturize modern DCS systems that use single-photon cameras. We also comprehensively compared the computational efficiency of our FPGA accelerator with CPU and GPU solutions.
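
As a rough illustration of the first stage of the pipeline described in the abstract (intensity photon data to temporal intensity ACF), the sketch below estimates the normalized intensity autocorrelation g2(τ) = ⟨I(t)·I(t+τ)⟩ / ⟨I⟩² from a sequence of photon counts. It is not the authors' implementation: the function name `intensity_acf`, the linear lag spacing, and the normalization by the global mean are assumptions made for this example, and the code is plain floating-point Python rather than fixed-point FPGA logic.

```python
def intensity_acf(counts, max_lag):
    """Estimate the normalized intensity autocorrelation g2 at lags 1..max_lag.

    counts  : sequence of photon counts per time bin (hypothetical input format)
    max_lag : largest lag, in time bins, to evaluate (must be < len(counts))
    """
    n = len(counts)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must satisfy 0 < max_lag < len(counts)")

    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("mean intensity is zero; g2 is undefined")

    g2 = []
    for lag in range(1, max_lag + 1):
        m = n - lag  # number of overlapping sample pairs at this lag
        # Average of I(t) * I(t + lag) over all overlapping pairs.
        corr = sum(counts[i] * counts[i + lag] for i in range(m)) / m
        # Normalize by the squared global mean intensity (an assumed choice).
        g2.append(corr / (mean * mean))
    return g2


# A perfectly constant intensity carries no fluctuations at any lag.
flat = intensity_acf([3] * 100, max_lag=4)
```

For a constant intensity the estimate equals 1 at every lag. In DCS, the decay of g2(τ) toward 1 is what carries the flow information from which BFi and β are recovered.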

ORCID iDs

Zang, Zhenya, Wang, Quan, Pan, Mingliang, Zhang, Yuanzhe, Chen, Xi, Li, Xingda and Li, David Day Uei


Item metadata

Item type:	Article
ID code:	91026
Dates:	1 January 2025 (Published); 28 October 2024 (Published Online); 20 October 2024 (Accepted)
Subjects:	Medicine > Biomedical engineering. Electronics. Instrumentation
Department:	Faculty of Science > Strathclyde Institute of Pharmacy and Biomedical Sciences; Faculty of Engineering > Biomedical Engineering; Strategic Research Themes > Health and Wellbeing
Depositing user:	Pure Administrator
Date deposited:	31 Oct 2024 15:42
Last modified:	11 May 2025 01:03
URI:	https://strathprints.strath.ac.uk/id/eprint/91026
