Prescriptive method for optimizing cost of data collection and annotation in machine learning of clinical ultrasound
Lawley, Alistair and Hampson, Rory and Worrall, Kevin and Dobie, Gordon; (2023) Prescriptive method for optimizing cost of data collection and annotation in machine learning of clinical ultrasound. In: 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, AUS. ISBN 9798350324471 (https://doi.org/10.1109/EMBC40787.2023.10340858)
Filename: Lawley_etal_EMBC_2023_data_collection_and_annotation_in_machine_learning_of_clinical_ultrasound.pdf
Version: Accepted Author Manuscript. License: Strathprints license 1.0
Abstract
Machine learning in medical ultrasound faces a major challenge: the prohibitive cost of producing and annotating clinical data. Optimizing data collection and annotation improves model training efficiency, reducing project costs and timelines. This paper prescribes a two-phase method for cost optimization based on iterative accuracy/sample-size predictions and on active learning for annotation optimization.
Methods: Using public breast, fetal, and lung ultrasound datasets, we (1) optimize data collection by statistically predicting the accuracy achievable for a desired dataset size, and (2) optimize labelling efficiency using active learning, in which the samples whose predictions have the lowest certainty are labelled manually and fed back into training. A practical case study on the BUSI dataset demonstrates the prescribed method.
Results: Using small data subsets (~10% of the data), the relationship between dataset size and final accuracy can be predicted, with diminishing returns beyond 50% data usage. Using active learning to focus annotation reduced manual annotation by ~10%.
Conclusion: On the BUSI dataset, this led to cost reductions of 50-66%, depending on requirements and the initial cost model, with a negligible accuracy drop of 3.75% from the theoretical maximum.
Clinical Relevance: This work provides a methodology for optimizing dataset size and manual data labelling, enabling the generation of cost-effective datasets. This is of general interest but particularly relevant to financially limited trials and feasibility studies, and it reduces the time burden on annotating clinicians.
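The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not say which statistical model is used to predict accuracy. Phase 1 here assumes an inverse power-law learning curve, error(n) = e_inf + b * n^(-c), fitted by a grid search over the asymptote e_inf combined with log-log least squares. Phase 2 assumes least-confidence sampling, which ranks unlabelled samples by their lowest top-class probability. The names fit_learning_curve, LearningCurve, size_for and least_confident are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


def _fit_power_law(sizes: Sequence[float], residual_errors: Sequence[float]):
    """Least-squares fit of r(n) = b * n**(-c) in log-log space; returns (b, c)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(r) for r in residual_errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)  # needs at least two distinct sizes
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


@dataclass
class LearningCurve:
    """Assumed (hypothetical) model: error(n) = e_inf + b * n**(-c)."""
    e_inf: float  # irreducible error; 1 - e_inf is the predicted accuracy ceiling
    b: float
    c: float

    def predict(self, n: float) -> float:
        """Predicted accuracy when training on n labelled samples."""
        return 1.0 - (self.e_inf + self.b * n ** (-self.c))

    def size_for(self, target_accuracy: float) -> Optional[int]:
        """Smallest n predicted to reach target_accuracy, or None if unreachable."""
        gap = 1.0 - target_accuracy - self.e_inf
        if gap <= 0 or self.c <= 0:  # target above the ceiling, or curve not improving
            return None
        return math.ceil((self.b / gap) ** (1.0 / self.c))


def fit_learning_curve(sizes: Sequence[float], accuracies: Sequence[float],
                       step: float = 0.001) -> LearningCurve:
    """Phase 1: fit the curve to accuracies measured on small training subsets.

    e_inf is grid-searched over [0, min error) in increments of `step`; for each
    candidate, b and c are fitted in log-log space, and the candidate with the
    lowest squared error (in accuracy units) is kept. Accuracies must be < 1.
    """
    errors = [1.0 - a for a in accuracies]
    best = None
    i = 0
    while i * step < min(errors):
        e_inf = i * step
        b, c = _fit_power_law(sizes, [e - e_inf for e in errors])
        sse = sum((e_inf + b * n ** (-c) - e) ** 2 for n, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, LearningCurve(e_inf, b, c))
        i += 1
    return best[1]


def least_confident(probabilities: Sequence[Sequence[float]], k: int) -> List[int]:
    """Phase 2: indices of the k unlabelled samples with the lowest top-class probability."""
    order = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return order[:k]
```

In a workflow along the lines of the abstract, one might train on a few small subsets (around 10% of the data), record validation accuracy for each, fit the curve, and use size_for to estimate how much data must be collected for a target accuracy. least_confident would then choose which unlabelled images a clinician annotates next. The grid search keeps the sketch dependency-free. A nonlinear least-squares routine could fit all three parameters jointly instead.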
ORCID iDs
Lawley, Alistair ORCID: https://orcid.org/0000-0002-0903-1116
Hampson, Rory ORCID: https://orcid.org/0000-0001-7903-7460
Worrall, Kevin
Dobie, Gordon ORCID: https://orcid.org/0000-0003-3972-5917
Item type: Book Section
ID code: 86116
Dates: 11 December 2023 (Published); 24 July 2023 (Published Online); 30 June 2023 (Accepted)
Notes: © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subjects: Medicine > Biomedical engineering. Electronics. Instrumentation
Department: Faculty of Engineering > Electronic and Electrical Engineering
Depositing user: Pure Administrator
Date deposited: 11 Jul 2023 15:06
Last modified: 11 Nov 2024 15:34
URI: https://strathprints.strath.ac.uk/id/eprint/86116