# Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Mathematik

## Research Seminar on Mathematical Statistics (Forschungsseminar Mathematische Statistik)

### A. Carpentier, S. Greven, W. Härdle, M. Reiß, V. Spokoiny

#### Location

Weierstrass-Institut für Angewandte Analysis und Stochastik
Erhard-Schmidt-Raum
Mohrenstrasse 39
10117 Berlin

#### Time

Wednesdays, 10:00 - 12:00

#### Program

Attention!
The seminar will be held in hybrid format, with talks streamed via Zoom. In line with hygiene recommendations, our lecture room ESH has a capacity of only 16 people. If you intend to attend some of the talks in person, you must register for our mailing list with Andrea Fiebig (fiebig@math.hu-berlin.de). Before each talk, a Doodle poll will be created and its link sent by e-mail; putting your name on this list is mandatory for anyone who wants to attend in person. If 16 guests have already registered, please follow the streamed talk via the Zoom link (available on request from fiebig@math.hu-berlin.de).
The so-called "3G rule" applies at the Weierstrass Institute.

20 October 2021
tba
27 October 2021
tba
03 November 2021
Eugene Stepanov (Steklov Mathematical Institute of the Russian Academy of Sciences, St. Petersburg)
The story of a fish in a turbulent ocean: how to survive and how to return home
Abstract: Can a fish with limited velocity capabilities reach any point in the (possibly unbounded) ocean? In a recent paper by D. Burago, S. Ivanov and A. Novikov, "A survival guide for feeble fish", an affirmative answer has been given under the condition that the fluid velocity field is incompressible, bounded and has vanishing mean drift. This brilliant result extends some known point-to-point global controllability theorems, though it is substantially non-constructive. We will give a fish a different recipe for how to survive in a turbulent ocean, and show how this is related to structural stability of dynamical systems by providing a constructive way to slightly change a divergence-free vector field with vanishing mean drift so as to produce non-dissipative dynamics. This immediately leads to closing lemmas for dynamical systems, in particular to C. Pugh's closing lemma, which also says that the fish can eventually return home. Joint work with Sergey Kryzhevich (St. Petersburg).
10 November 2021
Marc Hoffmann (Université Paris-Dauphine)
Some statistical inference results for interacting particle models in a mean-field limit
Abstract: We propose a systematic (theoretical) statistical analysis for systems of interacting diffusions, possibly with common noise and/or degenerate diffusion components, in a mean-field regime. These models are more or less widely used in finance, mean-field games (MFG), systemic risk analysis, behavioural sociology or ecology. We consider several inference issues such as: i) nonparametric estimation of the solution of the underlying Fokker-Planck type equation or the drift of the system; ii) testing for the interaction between components; iii) estimation of the interaction range between particles. This talk is based on joint results with C. Fonte, L. Della Maestra and R. Maillet.
Note: there will be a second seminar in the afternoon (2 pm, Dorotheenstraße 1, R.005) by:
Nikita Zhivotovsky (Google Research)
Distribution-Free Robust Linear Regression
Abstract: We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. When learning without assumptions on the covariates, we establish boundedness of the conditional second moment of the response variable as a necessary and sufficient condition for achieving deviation-optimal excess risk bounds. First, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Györfi, Kohler, Krzyzak, and Walk. However, in spite of its optimal in-expectation performance, we show that this procedure fails with constant probability for some distributions. Combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order O(d/n) with the optimal sub-exponential tail. Joint work with Jaouad Mourtada (CREST, ENSAE) and Tomas Vaškevičius (University of Oxford).
17 November 2021
Christophe Giraud (Institut de Mathématiques d'Orsay, Université Paris-Saclay)
A Geometric Approach to Fair Online Learning
Abstract: Machine learning is ubiquitous in daily decisions, and producing fair and non-discriminatory predictions is a major societal concern. Various criteria of fairness have been proposed in the literature, and we will start with a (biased!) tour of fairness concepts in machine learning. Many decision problems are of a sequential nature, and efforts are needed to better handle such settings. We consider a general setting of fair online learning with stochastic sensitive and non-sensitive contexts. We propose a unified approach for fair learning in this setting, by interpreting this problem as an approachability problem. This point of view offers a generic way to produce algorithms and theoretical results. Adapting Blackwell's approachability theory, we exhibit a general necessary and sufficient condition for some learning objectives to be compatible with some fairness constraints, and we characterize the optimal trade-off between the two when they are not compatible.
(Joint work with E. Chzhen and G. Stoltz.)
24 November 2021
Matthew Reimherr (Penn State University) (online, 2-4 pm)
Pure Differential Privacy in Functional Data Analysis
Abstract: We consider the problem of achieving pure differential privacy in the context of functional data analysis, or more general nonparametric statistics, where the summary of interest can naturally be viewed as an element of a function space. In this talk I will give a brief overview of and motivation for differential privacy before delving into the challenges that arise in the sanitization of an infinite-dimensional summary. I will present a new mechanism for achieving privacy, called the Independent Component Laplace Process, followed by several applications and examples.
01 December 2021
Nikita Puchkin (Higher School of Economics, Moscow) (online talk)
Rates of convergence for density estimation with GANs
Abstract: We undertake a thorough study of the non-asymptotic properties of vanilla generative adversarial networks (GANs). We derive theoretical guarantees for density estimation with GANs under a proper choice of the deep neural network classes representing generators and discriminators. In particular, we prove that the resulting estimate converges to the true density $p^*$ in terms of Jensen-Shannon (JS) divergence at the rate $(\log n/n)^{2\beta/(2\beta+d)}$, where $n$ is the sample size and $\beta$ determines the smoothness of $p^*$. Moreover, we show that the obtained rate is minimax optimal (up to logarithmic factors) for the considered class of densities.
08 December 2021
Davy Paindaveine (Université libre de Bruxelles)
Hypothesis testing on high-dimensional spheres: the Le Cam approach
Abstract: Hypothesis testing in high dimensions has been one of the most active research topics in the last decade. Both theoretical and practical considerations make it natural to restrict to sign tests, that is, to tests that use observations only through their directions from a given center. This obviously maps the original Euclidean problem to a spherical one, still in high dimensions. With this motivation in mind, we tackle two testing problems on high-dimensional spheres, both under a symmetry assumption that specifies that the distribution at hand is invariant under rotations with respect to a given axis. More precisely, we consider the problem of testing the null hypothesis of uniformity ("detecting the signal") and the problem of testing the null hypothesis that the symmetry axis coincides with a given direction ("learning the signal direction"). We solve both problems by exploiting Le Cam's asymptotic theory of statistical experiments, in a double- or triple-asymptotic framework. Interestingly, contiguity rates depend in a subtle way on how well the parameters involved are identified as well as on a possible further antipodally-symmetric nature of the distribution. In many cases, strong optimality results are obtained from local asymptotic normality. When this cannot be achieved, it is still possible to establish minimax rate optimality.
05 January 2022
Alexandra Suvorikova (WIAS Berlin) (online talk)
Robust k-means in metric spaces and spaces of probability measures
12 January 2022
Martin Wahl (HU Berlin)
Lower bounds for invariant statistical models with applications to PCA
Abstract: This talk will be concerned with nonasymptotic lower bounds for the estimation of principal subspaces. I will start by reviewing some previous methods, including the local asymptotic minimax theorem and the Grassmann approach. Then I will present a new approach based on a van Trees inequality (i.e. a Bayesian version of the Cramér-Rao inequality) tailored for invariant statistical models. As applications, I will provide nonasymptotic lower bounds for principal component analysis, the matrix denoising problem and the phase synchronization problem.
19 January 2022
Denis Belomestny (Universität Duisburg-Essen)
Minimax bounds on the sample complexity of reinforcement learning with a generative model
Abstract: We consider the problem of learning the optimal value function in discounted-reward Markov decision processes (MDPs). We analyze the sample complexity of a new upper-value iteration procedure in the presence of a generative model of the MDP. The main result indicates that for an MDP with $N$ state-action pairs, only $O(N \log(N)/\epsilon)$ state-transition samples are required to find an $\epsilon$-optimal estimate of the corresponding value function with high probability. This bound should be contrasted with the $O(N \log(N)/\epsilon^2)$ complexity bound for estimating the action-value function. We also discuss the optimality of the obtained complexity bound.
26 January 2022
Pierre Jacob (ESSEC Paris) (online talk)
Some methods based on couplings of Markov chain Monte Carlo algorithms
Abstract: Markov chain Monte Carlo algorithms are commonly used to approximate a variety of probability distributions, such as posterior distributions arising in Bayesian analysis. I will review the idea of coupling in the context of Markov chains, and how this idea not only leads to theoretical analyses of Markov chains but also to new Monte Carlo methods. In particular, the talk will describe how coupled Markov chains can be used to obtain 1) unbiased estimators of expectations and of normalizing constants, 2) non-asymptotic convergence diagnostics for Markov chains, and 3) unbiased estimators of the asymptotic variance of MCMC ergodic averages.
02 February 2022
Alessandra Menafoglio (MOX - Dept. of Mathematics, Politecnico di Milano)
Object Oriented Data Analysis in Bayes spaces: from distributional data to the analysis of complex shapes
Abstract: In the presence of increasingly massive and heterogeneous data, the statistical modeling of distributional observations plays a key role. Choosing the 'right' embedding space for these data is of paramount importance for their statistical processing, to account for their nature and inherent constraints. Bayes space theory provides a natural embedding space for (spatial) distributional data, and has been successfully applied in varied settings. In this presentation, I will discuss the state-of-the-art methods for the modelling, analysis, and prediction of distributional data, with particular attention to cases when their spatial dependence cannot be neglected. I will embrace the viewpoint of object-oriented spatial statistics (O2S2), a system of ideas for the analysis of complex data with spatial dependence. All the theoretical developments will be illustrated through their application to real data, highlighting the intrinsic challenges of a statistical analysis which follows the Bayes spaces approach. Applications will cover a varied range of fields, from the assessment of COVID-19 on mortality data to the analysis of complex shapes produced in additive manufacturing.
09 February 2022
Erwan Scornet (France)
Variable importance in random forests
Abstract: Nowadays, machine learning procedures are used in many fields with the notable exception of so-called sensitive areas (health, justice, defense, to name a few) in which the decisions to be taken are fraught with consequences. In these fields, it is necessary to obtain a precise decision but, to be effectively applied, these algorithms must provide an explanation of the mechanisms that lead to the decision and, in this sense, be interpretable. Unfortunately, the most accurate algorithms today are often the most complex. A classic technique to try to explain their predictions is to calculate indicators corresponding to the strength of the dependence between each input variable and the output to be predicted. In this talk, we will focus on variable importances designed for the original random forest algorithm: the Mean Decrease Impurity (MDI) and the Mean Decrease Accuracy (MDA). We will see how theoretical results provide guidance for their practical uses.
16 February 2022
Celine Duval
tba


All interested parties are cordially invited.

For inquiries, please contact:

##### Ms. Andrea Fiebig

Email: fiebig@mathematik.hu-berlin.de
Phone: +49-30-2093-45460
Fax: +49-30-2093-45451
Humboldt-Universität zu Berlin
Institut für Mathematik
Unter den Linden 6
10099 Berlin, Germany