Synthetic maps for navigating high-dimensional data spaces – Professor Alessandro Laio, SISSA Trieste

December 4, 2019 @ 2:15 pm – 3:15 pm
Department of Chemistry
Unilever lecture theatre
Lisa Masters

The analysis of large databases aims to obtain a synthetic description of a system that reveals its salient features.
We will describe an approach for charting complex and heterogeneous data spaces, providing a topography of the high-dimensional probability distribution from which the data are drawn. This topography includes information on the number and height of the probability peaks, the depth of the “valleys” separating them, the relative location of the peaks, and their hierarchical organization. The topography is reconstructed using an unsupervised variant of Density Peak clustering [Science 344, 1492 (2014)] that exploits a non-parametric density estimator [JCTC 14, 1206 (2018)], which automatically measures the density in the manifold containing the data [Sci. Rep. 7, 12140 (2017)]. Importantly, the density estimator also provides an estimate of its own error. This is a key feature, because it allows genuine probability peaks to be distinguished from density fluctuations due to finite sampling.
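For readers unfamiliar with Density Peak clustering, the basic idea can be illustrated with a minimal Python sketch. This is not the unsupervised variant presented in the talk: it assumes a crude k-nearest-neighbour density proxy instead of the non-parametric estimator, it requires the number of peaks to be given by hand, and it has no error estimate. The function names (`density_peaks`, `assign_clusters`) and the parameter `k` are hypothetical, chosen only for this illustration.

```python
import math


def density_peaks(points, k=2):
    """Return (rho, delta, parent) for each point.

    rho    : crude density proxy, 1 / distance to the k-th neighbour
             (an assumption for this sketch, not the talk's estimator)
    delta  : distance to the nearest point of higher density
    parent : index of that point, or -1 for the densest point
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # sorted(dist[i])[0] is the point itself, so index k is the k-th neighbour
    rho = [1.0 / sorted(dist[i])[k] for i in range(n)]

    delta = [0.0] * n
    parent = [-1] * n
    for i in range(n):
        # Points of higher density; ties are broken by index
        higher = [j for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        if higher:
            j = min(higher, key=lambda m: dist[i][m])
            delta[i], parent[i] = dist[i][j], j
        else:
            # Densest point: by convention, use its largest distance
            delta[i] = max(dist[i])
    return rho, delta, parent


def assign_clusters(rho, delta, parent, n_peaks):
    """Pick the n_peaks points with the largest rho * delta as peaks,
    then let every other point inherit the label of its parent."""
    n = len(rho)
    by_gamma = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = by_gamma[:n_peaks]
    labels = [-1] * n
    # Visit points in decreasing density so parents are labelled first
    for i in sorted(range(n), key=lambda i: (-rho[i], i)):
        if i in centers:
            labels[i] = centers.index(i)
        else:
            labels[i] = labels[parent[i]]
    return labels


# Two well-separated clumps of five points each
offsets = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1)]
points = [(x, y) for x, y in offsets] + [(5 + x, 5 + y) for x, y in offsets]

rho, delta, parent = density_peaks(points, k=2)
labels = assign_clusters(rho, delta, parent, n_peaks=2)
print(labels)
```

In this toy example the two clump centres combine high density with a large distance to any denser point, so they are selected as peaks and the remaining points follow their nearest denser neighbour. The approach described in the abstract goes further by using the estimated error on the density to decide automatically which peaks are genuine.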