Augmentation of FTIR spectral datasets using Wasserstein generative adversarial networks for cancer liquid biopsies

McHardy, Rose G. and Antoniou, Georgios and Conn, Justin J. A. and Baker, Matthew J. and Palmer, David S. (2023) Augmentation of FTIR spectral datasets using Wasserstein generative adversarial networks for cancer liquid biopsies. Analyst, 148 (16). pp. 3860-3869. ISSN 0003-2654 (https://doi.org/10.1039/D3AN00669G)

Full text: McHardy_etal_Analyst_2023_Augmentation_of_FTIR_spectral_datasets_using_Wasserstein_generative_adversarial_networks.pdf (Final Published Version, 750kB)

Abstract

In recent years, deep learning (DL) has become more widely used in cancer diagnostics. However, DL often requires large training datasets to prevent overfitting, and such datasets can be difficult and expensive to acquire. Data augmentation generates new data points with which to train DL models. In this study, we use attenuated total reflectance Fourier-transform infrared (ATR-FTIR) spectra of dried patient serum samples and compare non-generative data augmentation methods with Wasserstein generative adversarial networks (WGANs) in their ability to improve the performance of a convolutional neural network (CNN) in differentiating between pancreatic cancer and non-cancer samples in a total cohort of 625 patients. The results show that WGAN-augmented spectra improve CNN performance more than non-generatively augmented spectra. Compared with a model that used no augmented spectra, adding WGAN-augmented spectra to a CNN with the same architecture and parameters increased the area under the receiver operating characteristic curve (AUC) from 0.661 to 0.757, a relative increase of about 15% in diagnostic performance. In a separate test on a colorectal cancer dataset, data augmentation using a WGAN increased the AUC from 0.905 to 0.955. This demonstrates the impact that data augmentation can have on DL performance for cancer diagnosis when the amount of real data available for model training is limited.
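
The 15% figure quoted in the abstract appears to be a relative, not absolute, change in AUC. The short sketch below reproduces that arithmetic from the AUC values given above, assuming the usual relative-change formula. The function name `relative_increase` is hypothetical and does not come from the paper, and the colorectal percentage is derived here, not stated by the authors.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# AUC values as reported in the abstract.
pancreatic = relative_increase(0.661, 0.757)
colorectal = relative_increase(0.905, 0.955)

print(f"Pancreatic cancer: {pancreatic:.1f}% relative AUC increase")  # 14.5%
print(f"Colorectal cancer: {colorectal:.1f}% relative AUC increase")  # 5.5%
```

The pancreatic result rounds to the 15% reported. In absolute terms the same change is 0.096 AUC units.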

Item metadata

Item type:	Article
ID code:	86124
Dates:	6 July 2023 (Published); 6 July 2023 (Published Online); 21 June 2023 (Accepted); 28 April 2023 (Submitted)
Subjects:	Science > Chemistry > Analytical chemistry; Medicine > Internal medicine > Neoplasms. Tumors. Oncology (including Cancer)
Department:	Faculty of Science > Pure and Applied Chemistry
Depositing user:	Pure Administrator
Date deposited:	12 Jul 2023 11:06
Last modified:	27 Apr 2024 00:36
URI:	https://strathprints.strath.ac.uk/id/eprint/86124