Imaging and Computing Seminar

Facundo Memoli, Mathematics, Stanford

Characterization, stability and convergence of Hierarchical Clustering Schemes

Clustering is one of the most basic data analysis techniques, yet, not much is known about the theoretical properties of most clustering schemes. Jon Kleinberg has proved that there exists no clustering scheme that simultaneously satisfies three very natural axioms that he identifies. We prove that in the more relaxed context of hierarchical clustering methods, a set of axioms very similar to those of Kleinberg's yield existence and uniqueness instead of non-existence.

Other properties of this unique scheme can be established, namely stability and convergence. In order to do this, we represent dendrograms as ultrametric spaces and use tools from metric geometry to quantify the degree to which perturbations in the input metric space affect the result of hierarchical methods.

The unique scheme that we characterize turns out to be single linkage hierarchical clustering which is commonly criticized for exhibiting the so called 'chaining effect'. As a refinement of a classical observation by Hartigan, we prove that in the case when the input to our unique scheme comes from sampling an underlying measure metric space in an i.i.d. fashion, our convergence results imply that one asymptotically recovers a certain multiscale decomposition of the support of the underlying probability measure.

In a related vein, Hierarchical clustering can be seen as the 0-dimensional version of persistent Homology. We show how our stability results extend to k-dimensional persistence Homology invariants coming from Vietoris-Rips simplicial constructions.