Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Mathematik

Research Seminar on Mathematical Statistics

For the Statistics group

A. Carpentier, S. Greven, W. Härdle, M. Reiß, V. Spokoiny



Weierstrass-Institut für Angewandte Analysis und Stochastik
Mohrenstrasse 39
10117 Berlin



Wednesdays, 10:00 - 12:00


17 April 2024
Gil Kur (ETH Zurich)
Connections between Minimum Norm Interpolation and Local Theory of Banach Spaces
Abstract: We investigate the statistical performance of "minimum norm" interpolators in non-linear regression under additive Gaussian noise. Specifically, we focus on norms that satisfy either 2-uniform convexity or the cotype 2 property; these include inner-product spaces, ℓ_p norms, and W_p Sobolev spaces for 1 ≤ p ≤ 2. Our main result demonstrates that under 2-uniform convexity, the bias of the minimal norm solution is bounded by the Gaussian complexity of the class. We then prove a "reverse" Efron-Stein type estimate for the variance of the minimal norm solution under cotype 2, which provides an optimal bound for over-parametrized linear regression. Our approach leverages tools from the local theory of finite-dimensional Banach spaces, and, to the best of our knowledge, it is the first to study non-linear models that are "far" from Hilbert spaces.
24 April 2024
Nicolas Verzelen (INRAE Montpellier)
Computational Trade-offs in High-dimensional Clustering
Abstract: In this talk, I will discuss the fundamental problem of clustering a mixture of isotropic Gaussians. After reviewing some results on K-means-type procedures and on some of their relaxations, I will investigate the existence of a fundamental computation-information gap for this problem in the high-dimensional regime, where the ambient dimension p is larger than the number n of points. The existence of a computation-information gap in a specific Bayesian high-dimensional asymptotic regime has been conjectured by Lesieur et al. (2016) based on the replica heuristic from statistical physics. We provide evidence of the existence of such a gap generically in the high-dimensional regime p ≥ n, by proving a non-asymptotic low-degree polynomial computational barrier for clustering in high dimension, matching the performance of the best known polynomial-time algorithms.
8 May 2024
Georg Keilbar, Ratmir Miftachov (Humboldt-Universität zu Berlin) 
Shapley Curves: A Smoothing Perspective
Abstract: This paper addresses the limited statistical understanding of Shapley values as a variable importance measure from a nonparametric (or smoothing) perspective. We introduce population-level Shapley curves to measure the true variable importance, determined by the conditional expectation function and the distribution of covariates. Having defined the estimand, we derive minimax convergence rates and asymptotic normality under general conditions for the two leading estimation strategies. For finite-sample inference, we propose a novel version of the wild bootstrap procedure tailored for capturing lower-order terms in the estimation of Shapley curves. Numerical studies confirm our theoretical findings, and an empirical application analyzes the determining factors of vehicle prices.
15 May 2024
Fabian Telschow (Humboldt-Universität zu Berlin)
Estimation of the Expected Euler Characteristic of Excursion Sets of Random Fields and Applications to Simultaneous Confidence Bands
Abstract: The expected Euler characteristic (EEC) of excursion sets of a smooth Gaussian-related random field over a compact manifold can be used to approximate the distribution of its supremum for high thresholds. Viewed as a function of the excursion threshold, the EEC of a Gaussian-related field is expressed by the Gaussian kinematic formula (GKF) as a finite sum of known functions multiplied by the Lipschitz–Killing curvatures (LKCs) of the generating Gaussian field.
In the first part of this talk we present consistent estimators of the LKCs as linear projections of “pinned” Euler characteristic (EC) curves obtained from realizations of zero-mean, unit-variance Gaussian processes. As observed data are seldom Gaussian, we generalize these LKC estimators by an unusual use of the Gaussian multiplier bootstrap to obtain consistent estimates of the LKCs of Gaussian limiting fields of non-stationary statistics. In the second part, we explain applications of LKC estimation and the GKF to simultaneous familywise error rate inference, for example, by constructing simultaneous confidence bands and CoPE sets for spatial functional data over complex domains such as fMRI and climate data, and discuss their benefits and drawbacks compared to other methodologies.
22 May 2024
Vladimir Spokoiny (WIAS/HU)
Gaussian Variational Inference in high dimension
Abstract: We consider the problem of approximating a high-dimensional distribution by a Gaussian one by minimizing the Kullback-Leibler divergence. The main result extends Katsevich and Rigollet (2023) and claims that the minimiser can be well approximated by the Gaussian distribution with the same mean and variance as the underlying measure. We also describe the accuracy of the approximation and its range of applicability in terms of efficient dimension. The obtained results can be used for the analysis of various sampling schemes in optimization.

29 May 2024

Tailen Hsing (University of Michigan)

A functional-data perspective in spatial data analysis 


Abstract: More and more spatiotemporal data nowadays can be viewed as functional data. The first part of the talk focuses on the Argo data, a modern oceanography dataset that provides unprecedented global coverage of temperature and salinity measurements in the upper 2,000 meters of the ocean. I will discuss a functional kriging approach to predict temperature and salinity as smooth functions of depth, as well as a co-kriging approach for predicting oxygen concentration based on temperature and salinity data. In the second part of the talk, I will give an overview of some related topics, including spectral density estimation and variable selection for functional data.

5 June 2024
Jia-Jie Zhu (WIAS Berlin)
Wasserstein and Beyond: Optimal Transport and Gradient Flows for Machine Learning and Optimization
Abstract: In the first part of the talk, I will provide an overview of gradient flows over non-negative and probability measures and their application in modern machine learning tasks, such as variational inference, sampling, training of over-parameterized models, and robust optimization. Then, I will present our recent results on the analysis of a couple of particularly relevant gradient flows, including the settings of Wasserstein, Hellinger/Fisher-Rao, and reproducing kernel Hilbert space. The focus is on the global exponential decay of the entropy functionals along gradient flows such as Hellinger-Kantorovich (a.k.a. Wasserstein-Fisher-Rao), and on a new type of gradient flow geometry that guarantees convergence when minimizing a maximum mean discrepancy, which we term interaction-force transport.


The talk is based on joint work with Alexander Mielke, Pavel Dvurechensky, and Egor Gladin.

12 June 2024
Marc Hallin (Université libre de Bruxelles)

The long quest for quantiles and ranks in R^d and on manifolds


Abstract: Quantiles are a fundamental concept in probability, and an essential tool in statistics, from descriptive to inferential. Still, despite half a century of attempts, no satisfactory and fully agreed-upon definition of the concept, and the “dual” notion of ranks, is available beyond the well-understood case of univariate variables and distributions. The need for such a definition is particularly critical for variables taking values in R^d, for directional variables (values on the hypersphere), and, more generally, for variables with values on manifolds. Unlike the real line, indeed, no canonical ordering is available on these domains. We show how measure transportation brings a solution to this problem by characterizing distribution-specific (data-driven, in the empirical case) orderings and center-outward distribution and quantile functions (ranks and signs in the empirical case) that satisfy all the properties expected from such concepts while reducing, in the case of real-valued variables, to the classical univariate notion.

19 June 2024 - WIAS Evaluation
26 June 2024    Note the different room and building: Room 3.13, HVP 11a!
Clement Berenfeld (Universität Potsdam)
A theory of stratification learning
Abstract: Given an i.i.d. sample from a stratified mixture of immersed manifolds of different dimensions, we study the minimax estimation of the underlying stratified structure. We provide a constructive algorithm that adaptively estimates each mixture component at its optimal dimension-specific rate. The method is based on an ascending hierarchical co-detection of points belonging to different layers, which also identifies the number of layers and their dimensions, assigns each data point to a layer accurately, and estimates tangent spaces optimally. These results hold regardless of any ambient assumption on the manifolds or on their intersection configurations. They open the way to a broad clustering framework, where each mixture component models a cluster emanating from a specific nonlinear correlation phenomenon.
3 July 2024
Celine Duval (Université de Lille)
Geometry of excursion sets: computing the surface area from discretized points
Abstract: The excursion sets of a smooth random field carry relevant information in their various geometric measures. After an introduction to these geometric quantities showing how they are related to the parameters of the field, we focus on the problem of discretization. From a computational viewpoint, one never has access to the continuous observation of the excursion set, but rather to observations at discrete points in space. It has been reported that for specific regular lattices of points in dimensions 2 and 3, the usual estimate of the surface area of the excursions remains biased even when the lattice becomes dense in the domain of observation. We show that this limiting bias is invariant to the locations of the observation points and that it only depends on the ambient dimension. (Based on joint work with H. Biermé, R. Cotsakis, E. Di Bernardino and A. Estrade.)
10 July 2024
Anya Katsevich (MIT)
Laplace asymptotics in high-dimensional Bayesian inference
Abstract: Computing integrals against a high-dimensional posterior is the major computational bottleneck in Bayesian inference. A popular technique to reduce this computational burden is to use the Laplace approximation (LA), a Gaussian distribution, in place of the true posterior. We derive a new, leading-order asymptotic decomposition of integrals against a high-dimensional Laplace-type posterior which provides valuable insight into the accuracy of the LA in high dimensions. In particular, we determine the tight dimension dependence of the approximation error, leading to the tightest known Bernstein-von Mises result on the asymptotic normality of the posterior. The decomposition also leads to a simple modification to the LA which yields a higher-order accurate approximation to the posterior. Finally, we prove the validity of the high-dimensional Laplace asymptotic expansion to arbitrary order, which opens the door to approximating the partition function, of use in high-dimensional model selection and many other applications beyond statistics.
17 July 2024
No seminar



All interested are cordially invited.

For inquiries, please contact:

Ms. Marina Filatova

Phone: +49-30-2093-45460
Humboldt-Universität zu Berlin
Institut für Mathematik
Unter den Linden 6
10099 Berlin, Germany