# Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Research Unit 1735

Multi-scale analysis of graphs

## Multi-scale analysis of graphs

In modern applications of data analysis and statistics, data often has a complex structure that goes beyond Euclidean spaces. Instead of providing information about individual entities  (“data points”), data often consists of information about relationships between such entities. Such data is represented by a graph. As examples consider interactions between neurons in the brain, dependencies of linked web pages in the world wide web, social interactions between people, relations between proteins in a metabolic network, the structure of complex chemical compounds, or interactions between sensors in a sensor network. It is a vast modern challenge to analyse graph-based, empirical data in a statistically sound way because graphs are discrete objects with little “mathematical structure” (such as distances, norms, etc), but complex dependency patterns. The high level goal of this project is to elucidate the latent structure of graph-based data and subsequently adapt statistical procedures to this latent structure. In particular, we want to develop tools to analyse graphs on many scales. Our first line of work (“Scale-adapted random walks”) deals with unsupervised techniques to adaptively infer the multi-scale structure (hierarchical cluster structure) of a random graph. The second line (“Wavelet frames on graphs”) is complementary to the first and aims at constructing a set of localised basis functions that is adapted to the multi-scale structure of the graph.

The common goal of these two lines of work is to improve significantly the efficiency of statistical methods on graph data by making them adaptive to the inferred structure. In particular, one prominent application is classification or regression on graphs. We are given a graph, and some of the vertices in the graph come with labels. The goal is to extend the labelling to all other vertices in such a way that the cluster and smoothness properties of the graph are taken into account. One can solve this problem by representing the target function as a weighted sum of certain basis functions, where the weights have to be estimated from the data. Here it is crucial that the basis functions are adapted to the cluster structure of the graph.

The principal investigators are

Scientific staff are