Imaging and Computing Seminar
Facundo Memoli, Mathematics, Stanford
Title:
Characterization, stability and convergence of Hierarchical Clustering
Schemes
Abstract:
Clustering is one of the most basic data analysis techniques, yet, not
much is known about the theoretical properties of most clustering
schemes. Jon Kleinberg has proved that there exists no clustering
scheme that simultaneously satisfies three very natural axioms that he
identifies. We prove that in the more relaxed context of hierarchical
clustering methods, a set of axioms very similar to those of
Kleinberg's yield existence and uniqueness instead of non-existence.
Other properties of this unique scheme can be established, namely
stability and convergence. In order to do this, we represent
dendrograms as ultrametric spaces and use tools from metric geometry
to quantify the degree to which perturbations in the input metric
space affect the result of hierarchical methods.
The unique scheme that we characterize turns out to be single linkage
hierarchical clustering which is commonly criticized for exhibiting
the so called 'chaining effect'. As a refinement of a classical
observation by Hartigan, we prove that in the case when the input to
our unique scheme comes from sampling an underlying measure metric
space in an i.i.d. fashion, our convergence results imply that one
asymptotically recovers a certain multiscale decomposition of the
support of the underlying probability measure.
In a related vein, Hierarchical clustering can be seen as the
0-dimensional version of persistent Homology. We show how our
stability results extend to k-dimensional persistence Homology
invariants coming from Vietoris-Rips simplicial constructions.