
Quantifying the specificity of near-duplicate image classification functions

Connor, Richard and Cardillo, Franco Alberto (2016) Quantifying the specificity of near-duplicate image classification functions. In: 11th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2016-02-27 - 2016-02-29.

Connor_Cardillo_VISAPP_2016_quantifying_the_specificity_of_near_duplicate_image_classification_functions.pdf - Accepted Author Manuscript


Abstract

There are many published methods for detecting similar and near-duplicate images. Here, we consider their use in the context of unsupervised near-duplicate detection, where the task is to find a (relatively small) near-duplicate intersection of two large candidate sets. Such scenarios are of particular importance in forensic near-duplicate detection. The essential properties of such a function are: performance, sensitivity, and specificity. We show that, as collection sizes increase, specificity becomes the most important of these, as without very high specificity huge numbers of false positive matches will be identified. This makes even very fast, highly sensitive methods completely useless. Until now, to our knowledge, no attempt has been made to measure the specificity of near-duplicate finders, or even to compare them with each other. Recently, a benchmark set of near-duplicate images has been established which allows such assessment by giving a near-duplicate ground truth over a large general image collection. Using this we establish a methodology for calculating specificity. A number of the most likely candidate functions are compared with each other and accurate measurements of sensitivity vs. specificity are given. We believe these are the first such figures to be calculated for any such function.
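
The claim that specificity dominates as collections grow can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes the classifier is applied to every cross pair of two collections, uses the standard definitions of sensitivity (true positives over actual positives) and specificity (true negatives over actual negatives), and all function names and numbers are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: names and figures are assumptions, not the paper's.

def expected_matches(n_a, n_b, true_pairs, sensitivity, specificity):
    """Expected true and false positive counts when every image in a
    collection of size n_a is compared with every image in one of size n_b.

    true_pairs  -- assumed number of genuine near-duplicate pairs
    sensitivity -- fraction of genuine pairs the function reports
    specificity -- fraction of non-duplicate pairs the function rejects
    """
    total_pairs = n_a * n_b
    negative_pairs = total_pairs - true_pairs
    true_positives = sensitivity * true_pairs
    false_positives = (1.0 - specificity) * negative_pairs
    return true_positives, false_positives


def precision(true_positives, false_positives):
    """Fraction of reported matches that are genuine near-duplicates."""
    reported = true_positives + false_positives
    return true_positives / reported if reported > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical scenario: two collections of one million images each,
    # sharing 1,000 genuine near-duplicate pairs.
    for spec in (0.99, 0.9999, 0.999999, 0.99999999):
        tp, fp = expected_matches(10**6, 10**6, 1000, 0.99, spec)
        print(f"specificity={spec}: TP~{tp:.0f}, FP~{fp:.0f}, "
              f"precision~{precision(tp, fp):.6f}")
```

Because the number of candidate pairs grows with the product of the two collection sizes while the genuine intersection stays small, even a very small false positive rate produces far more false matches than true ones in this hypothetical setting.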