Quantifying the specificity of near-duplicate image classification functions

Connor, Richard and Cardillo, Franco Alberto (2016) Quantifying the specificity of near-duplicate image classification functions. In: 11th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2016-02-27 - 2016-02-29.

Abstract

There are many published methods for detecting similar and near-duplicate images. Here, we consider their use in the context of unsupervised near-duplicate detection, where the task is to find a (relatively small) near-duplicate intersection of two large candidate sets. Such scenarios are of particular importance in forensic near-duplicate detection. The essential properties of such a function are: performance, sensitivity, and specificity. We show that, as collection sizes increase, specificity becomes the most important of these, since without very high specificity huge numbers of false-positive matches will be identified, rendering even very fast, highly sensitive methods completely useless. Until now, to our knowledge, no attempt has been made to measure the specificity of near-duplicate finders, or even to compare them with each other. Recently, a benchmark set of near-duplicate images has been established which allows such assessment by giving a near-duplicate ground truth over a large general image collection. Using this, we establish a methodology for calculating specificity. A number of the most likely candidate functions are compared with each other, and accurate measurements of sensitivity versus specificity are given. We believe these are the first such figures to be calculated for any such function.
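To make the scaling argument concrete, the following is a minimal sketch, not taken from the paper: the function name, collection sizes, and specificity value are illustrative assumptions. It shows why specificity dominates as collections grow: the number of candidate pairs scales with the product of the two collection sizes, so even a near-perfect specificity yields an overwhelming number of false positives.

```python
def expected_false_positives(n_a, n_b, specificity, true_pairs=0):
    """Expected false-positive matches when every pair drawn from two
    collections (sizes n_a and n_b) is tested by a classifier with the
    given specificity (true-negative rate on non-duplicate pairs)."""
    non_duplicate_pairs = n_a * n_b - true_pairs
    return non_duplicate_pairs * (1.0 - specificity)

# Two collections of one million images each, with a thousand true
# near-duplicate pairs: a seemingly excellent specificity of 0.9999
# still yields on the order of 10^8 false positives.
print(expected_false_positives(10**6, 10**6, 0.9999, true_pairs=1000))
```

Under these assumed numbers, roughly 100 million false positives would swamp the thousand true matches, which is the sense in which high sensitivity alone is useless at scale.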