Evaluation of Jensen-Shannon distance over sparse data

Connor, Richard and Cardillo, Franco Alberto and Moss, Robert and Rabitti, Fausto (2013) Evaluation of Jensen-Shannon distance over sparse data. In: Similarity Search and Applications. Lecture Notes in Computer Science, 8199. Springer, Berlin, pp. 163-168. ISBN 9783642410611


Abstract

Jensen-Shannon divergence is a symmetrised, smoothed version of Kullback-Leibler divergence. It has been shown to be the square of a proper distance metric, and has other properties which make it an excellent choice for many high-dimensional spaces in R*. As defined, however, the metric is expensive to evaluate. In sparse spaces over many dimensions the Intrinsic Dimensionality of the metric space is typically very high, making similarity-based indexing ineffectual, and exhaustive search over large data collections may be infeasible. Using a property that allows the distance to be evaluated from only those dimensions which are non-zero in both arguments, and through the identification of a threshold function, we show that the cost of evaluation can be dramatically reduced.
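To make the shared-dimension property concrete, below is a minimal Python sketch of Jensen-Shannon distance over sparse vectors stored as dictionaries. The rearrangement that folds every non-shared dimension into a constant follows directly from the standard base-2 definition of the divergence; the optional `threshold` early-exit is a simple illustration in the spirit of the abstract's threshold function, not the exact function identified in the paper. The function name and the dictionary representation are assumptions made here for illustration.

```python
import math

def js_distance_sparse(x, y, threshold=None):
    """Jensen-Shannon distance (log base 2, range [0, 1]) between two
    sparse probability distributions given as {dimension: weight}
    dicts whose non-zero weights each sum to 1.

    Only dimensions non-zero in BOTH vectors are visited: every
    disjoint dimension contributes a fixed amount, which folds into
    the leading constant of the rearranged formula.

    If `threshold` is given, the scan is abandoned (returning None)
    as soon as the distance provably exceeds it -- a simple early
    exit in the spirit of the paper's threshold function, not the
    exact function defined there.
    """
    # Iterate over the smaller dict, probing the larger one.
    small, large = (x, y) if len(x) <= len(y) else (y, x)
    acc = 0.0              # sum of per-dimension terms, shared dims only
    seen_s = seen_l = 0.0  # shared mass processed so far in each vector
    # distance <= threshold  iff  the final acc >= 2 * (1 - threshold^2)
    needed = None if threshold is None else 2.0 * (1.0 - threshold * threshold)
    for dim, a in small.items():
        b = large.get(dim)
        if b is None:
            continue
        m = a + b
        acc += m - a * math.log2(a) - b * math.log2(b) + m * math.log2(m / 2.0)
        seen_s += a
        seen_l += b
        # Each term is at most a + b (convexity of x log x), so acc can
        # grow by at most the unseen shared mass, 2 - seen_s - seen_l.
        if needed is not None and acc + (2.0 - seen_s - seen_l) < needed:
            return None  # cannot be within threshold: abandon early
    divergence = 1.0 - 0.5 * acc            # JS divergence, in [0, 1]
    return math.sqrt(max(divergence, 0.0))  # the metric is its square root
```

As a sanity check on the rearrangement: for identical distributions the shared-term sum reaches its maximum of 2, giving distance 0, while for fully disjoint distributions the sum is 0, giving distance 1, matching the bounds of the base-2 metric.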