Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Research Unit 1735

Multi-scale analysis of graphs

In modern applications of data analysis and statistics, data often has a complex structure that goes beyond Euclidean spaces. Instead of providing information about individual entities (“data points”), data often consists of information about relationships between such entities. Such data is represented by a graph. As examples consider interactions between neurons in the brain, dependencies of linked web pages in the world wide web, social interactions between people, relations between proteins in a metabolic network, the structure of complex chemical compounds, or interactions between sensors in a sensor network. It is a vast modern challenge to analyse graph-based, empirical data in a statistically sound way because graphs are discrete objects with little “mathematical structure” (such as distances, norms, etc), but complex dependency patterns. The high level goal of this project is to elucidate the latent structure of graph-based data and subsequently adapt statistical procedures to this latent structure. 
In the new funding period we considerably extend our focus from random geometric graphs (last funding period) to general graphs with finite doubling dimension, which can exhibit interesting multi-scale structure.
Our first line of work is devoted to the study of general mathematical properties of graphs with small doubling dimension. Our second line of work (“wavelets on graphs”) is devoted to further developments of the tools of statistical multiscale analysis on graphs. In the first funding period, we successfully constructed tight wavelet-type frames when the underlying sampling space of the vertices is a manifold. We will extend this framework to graphs whose underlying sampling space is a metric measure space with finite doubling dimension and local Poincaré inequality, then to more general models of random graphs (e.g. graphs with polynomial growth).
Applications to adaptive regression, density estimation (when appropriate in the context) and testing (see next point) will be precisely studied. The third line of work (``two sample tests on graphs'') considers the case where we are given two populations of random graphs. The task is to test whether these graphs have been generated according to the same underlying mechanism (for example, whether they come from the same metric measure space). Such tests are of prime importance in applied areas such as neuroscience.


The principal investigators are


Scientific staff is

  • Franziska Göbel (email).