Augmentation of FTIR spectral datasets using Wasserstein generative adversarial networks for cancer liquid biopsies
McHardy, Rose G. and Antoniou, Georgios and Conn, Justin J. A. and Baker, Matthew J. and Palmer, David S. (2023) Augmentation of FTIR spectral datasets using Wasserstein generative adversarial networks for cancer liquid biopsies. Analyst, 148 (16). pp. 3860-3869. ISSN 0003-2654 (https://doi.org/10.1039/D3AN00669G)
Text. Filename: McHardy_etal_Analyst_2023_Augmentation_of_FTIR_spectral_datasets_using_Wasserstein_generative_adversarial_networks.pdf
Final Published Version (750kB)
Abstract
Over recent years, deep learning (DL) has become more widely used within the field of cancer diagnostics. However, DL often requires large training datasets to prevent overfitting, and such datasets can be difficult and expensive to acquire. Data augmentation is a method that can be used to generate new data points to train DL models. In this study, we use attenuated total reflectance Fourier-transform infrared (ATR-FTIR) spectra of patient dried serum samples and compare non-generative data augmentation methods with Wasserstein generative adversarial networks (WGANs) in their ability to improve the performance of a convolutional neural network (CNN) in differentiating between pancreatic cancer and non-cancer samples in a total cohort of 625 patients. The results show that WGAN-augmented spectra improve CNN performance more than non-generative augmented spectra. Compared with a model that used no augmented spectra, adding WGAN-augmented spectra to a CNN with the same architecture and the same parameters increased the area under the receiver operating characteristic curve (AUC) from 0.661 to 0.757, a relative increase of about 15% in diagnostic performance. In a separate test on a colorectal cancer dataset, data augmentation using a WGAN led to an increase in AUC from 0.905 to 0.955. This demonstrates the impact data augmentation can have on DL performance for cancer diagnosis when the amount of real data available for model training is limited.
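The percentage quoted in the abstract appears to be a relative change in AUC rather than a difference in percentage points. The short Python sketch below checks that arithmetic using only the four AUC values given in the abstract. The function name `relative_increase` is a hypothetical helper introduced for illustration and is not taken from the paper.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# AUC values as reported in the abstract (no other data is assumed).
pancreatic = relative_increase(0.661, 0.757)
colorectal = relative_increase(0.905, 0.955)

print(f"Pancreatic cancer dataset: {pancreatic:.1f}% relative increase in AUC")
print(f"Colorectal cancer dataset: {colorectal:.1f}% relative increase in AUC")
```

Under this reading, the pancreatic result works out to roughly 14.5%, consistent with the rounded 15% figure, and the colorectal result to roughly 5.5%.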
ORCID iDs
McHardy, Rose G., Antoniou, Georgios, Conn, Justin J. A., Baker, Matthew J. and Palmer, David S. ORCID: https://orcid.org/0000-0003-4356-9144
Item type: Article
ID code: 86124
Dates:
  6 July 2023: Published
  6 July 2023: Published Online
  21 June 2023: Accepted
  28 April 2023: Submitted
Subjects:
  Science > Chemistry > Analytical chemistry
  Medicine > Internal medicine > Neoplasms. Tumors. Oncology (including Cancer)
Department: Faculty of Science > Pure and Applied Chemistry
Depositing user: Pure Administrator
Date deposited: 12 Jul 2023 11:06
Last modified: 20 Jan 2025 02:23
URI: https://strathprints.strath.ac.uk/id/eprint/86124