Evaluation of Jensen-Shannon distance over sparse data

Connor, Richard; Cardillo, Franco Alberto; Moss, Robert; Rabitti, Fausto (2013) Evaluation of Jensen-Shannon distance over sparse data. In: Brisaboa, Nieves; Pedreira, Oscar; Zezula, Pavel (eds.) Similarity Search and Applications. Lecture Notes in Computer Science, vol. 8199. Springer, pp. 163-168. ISBN 9783642410611. https://doi.org/10.1007/978-3-642-41062-8_16

Abstract

Jensen-Shannon divergence is a symmetrised, smoothed version of the Kullback-Leibler divergence. It has been shown to be the square of a proper distance metric, and it has other properties that make it an excellent choice for many high-dimensional spaces in R^n. As defined, however, the metric is expensive to evaluate. In sparse spaces over many dimensions, the intrinsic dimensionality of the metric space is typically very high, making similarity-based indexing ineffectual, and exhaustive search over large data collections may be infeasible. Using a property that allows the distance to be evaluated from only those dimensions which are non-zero in both arguments, and through the identification of a threshold function, we show that the cost of the function can be dramatically reduced.
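
The property the abstract relies on can be made concrete. Writing h(x) = -x log2(x), one standard formulation of the divergence for probability distributions p and q is

    JSD(p, q) = 1 - (1/2) * sum_i [ h(p_i) + h(q_i) - h(p_i + q_i) ]

and every term in which p_i = 0 or q_i = 0 vanishes, since h(0) = 0 and h(p_i + 0) = h(p_i). The sum therefore needs only the dimensions that are non-zero in both arguments. What follows is a minimal Python sketch of this evaluation over sparse vectors; the representation of distributions as {dimension: weight} dicts and the names js_distance and h are illustrative assumptions, not the authors' implementation.

    import math

    def js_distance(p, q):
        """Jensen-Shannon distance between two sparse probability
        distributions, each given as a {dimension: weight} dict whose
        weights sum to 1 (an assumed representation)."""
        def h(x):
            # h(x) = -x * log2(x), with the usual convention h(0) = 0
            return -x * math.log2(x) if x > 0 else 0.0

        # Only the intersection of non-zero dimensions contributes;
        # this is the property that cuts the evaluation cost on sparse data.
        shared = p.keys() & q.keys()
        acc = sum(h(p[k]) + h(q[k]) - h(p[k] + q[k]) for k in shared)

        jsd = 1.0 - 0.5 * acc            # divergence, in [0, 1] with base-2 logs
        return math.sqrt(max(jsd, 0.0))  # its square root is a proper metric

For example, js_distance({"a": 0.5, "b": 0.5}, {"b": 0.5, "c": 0.5}) touches only the shared dimension "b" and returns sqrt(0.5), approximately 0.707, which matches the direct definition via the mean distribution. The threshold function mentioned in the abstract is developed in the paper itself and is not reproduced here.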