Clustering methods based on variational analysis in the space of measures

Van Lieshout, M.N.M. and Molchanov, I.S. and Zuev, S. (2001) Clustering methods based on variational analysis in the space of measures. Biometrika, 88 (4). pp. 1021-1033. ISSN 1464-3510

Abstract

We formulate clustering as a minimisation problem in the space of measures by modelling the cluster centres as a Poisson process with unknown intensity function. We derive a Ward-type clustering criterion which, under the Poisson assumption, can easily be evaluated explicitly in terms of the intensity function. We show that asymptotically, i.e. for increasing total intensity, the optimal intensity function is proportional to a dimension-dependent power of the density of the observations. For fixed finite total intensity, no explicit solution seems available. However, the Ward-type criterion to be minimised is convex in the intensity function, so that the steepest descent method of Molchanov and Zuyev (2001) can be used to approximate the global minimum. It turns out that the gradient is similar in form to the functional to be optimised. If we discretise over a grid, the steepest descent algorithm at each iteration step increases the current intensity function at those points where the gradient is minimal at the expense of regions with a large gradient value. The algorithm is applied to a toy one-dimensional example, a simulation from a popular spatial cluster model and a real-life dataset from Strauss (1975) concerning the positions of redwood seedlings. Finally, we discuss the relative merits of our approach compared to classical hierarchical and partition clustering techniques as well as to modern model-based clustering methods using Markov point processes and mixture distributions.
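
The grid-based descent step described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation. It assumes that the Ward-type criterion is the expected sum of squared distances from each observation to its nearest Poisson cluster centre, with distances truncated at a hypothetical cut-off `r_max`, so that each observation contributes the integral of 2r exp(-mu(B_r(x))) over r in [0, r_max]. The function names, the midpoint quadrature, the single-pair mass transfer and the step-halving rule are all illustrative choices.

```python
import math


def criterion_and_gradient(data, grid, mass, r_max=1.0, n_r=50):
    """Discretised criterion and its gradient with respect to the grid masses.

    Assumed form: sum over observations x of the integral over r in [0, r_max]
    of 2 r exp(-mu(B_r(x))), where mu(B_r(x)) is the intensity mass within
    distance r of x. The integral is approximated by the midpoint rule.
    """
    dr = r_max / n_r
    radii = [(k + 0.5) * dr for k in range(n_r)]
    value = 0.0
    grad = [0.0] * len(grid)
    for x in data:
        dist = [abs(g - x) for g in grid]
        for r in radii:
            inside = [j for j, d in enumerate(dist) if d <= r]
            mu_ball = sum(mass[j] for j in inside)
            w = 2.0 * r * math.exp(-mu_ball) * dr
            value += w
            # The gradient has the same form as the criterion, restricted
            # to the radii whose ball around x contains grid point j.
            for j in inside:
                grad[j] -= w
    return value, grad


def steepest_descent(data, grid, total=3.0, n_iter=200, step=0.5):
    """Move mass from the largest-gradient donor to the minimal-gradient point."""
    mass = [total / len(grid)] * len(grid)  # uniform start, fixed total intensity
    value, grad = criterion_and_gradient(data, grid, mass)
    for _ in range(n_iter):
        j_to = min(range(len(grid)), key=lambda j: grad[j])
        donors = [j for j in range(len(grid)) if mass[j] > 0.0]
        j_from = max(donors, key=lambda j: grad[j])
        if grad[j_from] - grad[j_to] <= 1e-12:
            break  # no transfer can lower the criterion to first order
        delta = min(step, mass[j_from])
        while delta > 1e-9:
            trial = list(mass)
            trial[j_from] -= delta
            trial[j_to] += delta
            new_value, new_grad = criterion_and_gradient(data, grid, trial)
            if new_value < value:  # accept only strict improvements
                mass, value, grad = trial, new_value, new_grad
                break
            delta /= 2  # otherwise halve the transferred mass and retry
        else:
            break  # no improving transfer found at any step size
    return mass, value


# Toy data: two clusters on the unit interval, 21 grid points.
data = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
grid = [i / 20 for i in range(21)]
mass, value = steepest_descent(data, grid)
print([round(m, 2) for m in mass], round(value, 4))
```

Because the discretised criterion is a positively weighted sum of exponentials of linear functions of the masses, it is convex in the masses, which is the property the abstract relies on. Each accepted transfer keeps the total intensity fixed and the masses non-negative.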