Imaging and Computing Seminar
Mauro Maggioni, Mathematics, Duke University
Title:
Intrinsic dimensionality estimation and multiscale geometry of data sets
Abstract:
The analysis of large data sets, modeled as point clouds in high-dimensional spaces, is needed in a wide variety of applications such as
recommendation systems, search engines, molecular dynamics, machine learning, and statistical modeling, to name just a few. It is often
claimed or assumed that many data sets, while lying in high-dimensional spaces, in fact have low-dimensional structure. It may come
as a surprise that only a few, rather sample-inefficient, algorithms exist to estimate the intrinsic dimensionality of these point
clouds. We present a recent multiscale algorithm for estimating the intrinsic dimensionality of data sets, under the assumption that they are
sampled from a rather tame low-dimensional object, such as a manifold, and perturbed by high-dimensional noise. Under natural assumptions,
this algorithm provably estimates the correct dimensionality from a number of points that is merely linear in the intrinsic dimension.
Experiments on synthetic and real data will be discussed. Furthermore, this algorithm opens the way to novel algorithms for exploring,
visualizing, compressing and manipulating certain classes of high-dimensional point clouds.
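
As a rough illustration of the multiscale idea, the sketch below runs local principal component analysis in balls of several radii around a point and reads off a dimension from the largest gap in the local singular values. This is an assumption-laden toy and not the algorithm presented in the talk: the function names, the largest-gap rule, the majority vote across radii, and the synthetic noisy-plane data are all hypothetical choices made for this example.

```python
import numpy as np


def local_singular_values(points, center, radius):
    """Singular values of the centered neighbors of `center` within `radius`.

    Values are scaled by 1/sqrt(n), so they estimate standard deviations
    along the local principal directions. Returns None when the ball holds
    no more points than the ambient dimension.
    """
    dists = np.linalg.norm(points - center, axis=1)
    nbrs = points[dists <= radius]
    if len(nbrs) <= points.shape[1]:
        return None
    centered = nbrs - nbrs.mean(axis=0)
    return np.linalg.svd(centered, compute_uv=False) / np.sqrt(len(nbrs))


def estimate_dimension(points, center, radii):
    """Toy multiscale estimate: at each radius, guess the dimension from the
    largest gap between consecutive singular values, then return the guess
    that occurs most often across radii (hypothetical rule)."""
    votes = []
    for r in radii:
        sv = local_singular_values(points, center, r)
        if sv is None:
            continue  # too few points at this scale
        gaps = sv[:-1] - sv[1:]
        votes.append(int(np.argmax(gaps)) + 1)
    if not votes:
        raise ValueError("no radius contained enough points")
    return int(np.bincount(votes).argmax())


def sample_noisy_plane(n=2000, ambient_dim=10, noise=0.01, seed=0):
    """Synthetic test data: a 2-D square embedded in `ambient_dim` dimensions,
    perturbed by isotropic Gaussian noise in every coordinate."""
    rng = np.random.default_rng(seed)
    pts = np.zeros((n, ambient_dim))
    pts[:, :2] = rng.uniform(-1.0, 1.0, size=(n, 2))
    return pts + noise * rng.normal(size=(n, ambient_dim))
```

Calling `estimate_dimension(sample_noisy_plane(), np.zeros(10), np.linspace(0.3, 1.0, 8))` applies the sketch to the synthetic plane. The radii are chosen well above the noise level; at scales comparable to the noise the singular values no longer separate, which is the reason for looking across a range of scales.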