A cost focused framework for optimizing collection and annotation of ultrasound datasets
Lawley, Alistair and Hampson, Rory and Worrall, Kevin and Dobie, Gordon (2024) A cost focused framework for optimizing collection and annotation of ultrasound datasets. Biomedical Signal Processing and Control, 92. 106048. ISSN 1746-8094 (https://doi.org/10.1016/j.bspc.2024.106048)
Text (Final Published Version, PDF, 6MB)
Filename: Lawley-etal-BSPC-2024-A-cost-focused-framework-for-optimizing-collection-and-annotation-of-ultrasound-datasets.pdf
Abstract
Machine learning for medical ultrasound imaging faces a major challenge: the prohibitive cost of producing and annotating clinical data. The trade-off between cost and sample size is well understood in the context of clinical trials, and the same methods can be applied to optimize data collection and annotation, ultimately reducing the cost and duration of machine learning projects in feasibility studies. This paper presents a two-phase framework for medical ultrasound imaging that quantifies the cost of data collection using iterative accuracy/sample-size predictions, then uses active learning to guide and optimize full human annotation for machine learning purposes. The paper demonstrates potential cost reductions on public breast, fetal, and lung ultrasound datasets and in a practical case study on breast ultrasound. The results show that, just as with clinical trials, the relationship between dataset size and final accuracy can be predicted, with the majority of accuracy improvements achieved using only 40-50% of the data, depending on the tolerance measure. Manual annotation can be reduced further using active learning, giving a representative cost reduction of 66% at a tolerance of around a 4% drop in accuracy from the theoretical maximum. The significance of this work lies in its ability to quantify how much additional data and annotation will be required to achieve a specific research objective. Because these methods are already well understood by clinical funders, they provide a valuable and effective framework for feasibility and pilot studies in which machine learning is applied within a fixed budget to maximize predictive gains, informing resourcing and further clinical study.
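The two phases described in the abstract can be illustrated with a short sketch. The abstract does not state which learning-curve model or active learning query strategy the authors use, so this is a minimal illustration under explicit assumptions, not the paper's implementation: it assumes an inverse power law learning curve, acc(n) = a - b·n^(-c), fitted to accuracies measured on growing training subsets; it treats the tolerance as an absolute drop from the fitted asymptote a (taken as the theoretical maximum); and it uses least-confidence sampling as a stand-in query strategy. The functions `fit_inverse_power_law`, `size_for_tolerance` and `least_confident` are hypothetical names introduced here.

```python
import math

# Hypothetical helpers illustrating the two phases; NOT the paper's code.
# Assumed model: acc(n) = a - b * n**(-c), with a the plateau accuracy.


def fit_inverse_power_law(sizes, accs, c_grid=None):
    """Fit the assumed inverse power law learning curve.

    For each candidate exponent c the model is linear in x = n**(-c),
    so a and b follow from ordinary least squares; the c giving the
    smallest squared error is kept.
    """
    if c_grid is None:
        c_grid = [i / 100 for i in range(5, 201)]  # c from 0.05 to 2.00
    best = None
    for c in c_grid:
        x = [n ** (-c) for n in sizes]
        k = len(x)
        mx, my = sum(x) / k, sum(accs) / k
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:
            continue
        slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, accs)) / sxx
        a = my - slope * mx
        sse = sum((a + slope * xi - yi) ** 2 for xi, yi in zip(x, accs))
        if best is None or sse < best[0]:
            best = (sse, a, -slope, c)  # b = -slope
    _, a, b, c = best
    return a, b, c


def size_for_tolerance(a, b, c, tolerance):
    """Smallest n whose predicted accuracy is within `tolerance` (absolute)
    of the asymptote a, i.e. b * n**(-c) <= tolerance. Assumes b, c > 0."""
    return math.ceil((b / tolerance) ** (1.0 / c))


def least_confident(probabilities, k):
    """Least-confidence sampling (a stand-in query strategy): return the
    indices of the k unlabeled samples whose highest class probability
    is lowest, i.e. the ones to send for manual annotation next."""
    ranked = sorted((max(p), i) for i, p in enumerate(probabilities))
    return [i for _, i in ranked[:k]]


# Synthetic (made-up) learning curve generated from a=0.90, b=2.0, c=0.5.
sizes = [50, 100, 200, 400, 800]
accs = [0.90 - 2.0 * n ** -0.5 for n in sizes]
a, b, c = fit_inverse_power_law(sizes, accs)
n_needed = size_for_tolerance(a, b, c, tolerance=0.04)
print(f"asymptote={a:.3f}, samples within 0.04 of plateau={n_needed}")

# Made-up model outputs on an unlabeled pool of four images.
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
print(least_confident(pool_probs, k=2))  # -> [3, 1]
```

Dividing `n_needed` by the size of the available image pool gives the fraction of the data worth collecting under this assumed model; the active learning step then ranks which of those images to annotate first. A grid search over c is used only to keep the sketch dependency-free; a nonlinear least-squares fitter would be the more usual choice in practice.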
ORCID iDs
Lawley, Alistair ORCID: https://orcid.org/0000-0002-0903-1116, Hampson, Rory ORCID: https://orcid.org/0000-0001-7903-7460, Worrall, Kevin and Dobie, Gordon ORCID: https://orcid.org/0000-0003-3972-5917
Item type: Article
ID code: 88092
Dates: 30 June 2024 (Published); 7 February 2024 (Published Online); 29 January 2024 (Accepted)
Subjects: Technology > Electrical engineering. Electronics. Nuclear engineering
Department: Faculty of Engineering > Electronic and Electrical Engineering
Depositing user: Pure Administrator
Date deposited: 06 Feb 2024 12:00
Last modified: 22 Nov 2024 01:21
URI: https://strathprints.strath.ac.uk/id/eprint/88092