## Research Seminar Mathematical Statistics

### For the Statistics Section

S. Greven, W. Härdle, M. Reiß, V. Spokoiny

*Location*

Weierstrass-Institut für Angewandte Analysis und Stochastik

Erhard-Schmidt-Raum

Mohrenstrasse 39

10117 Berlin

*Time*

Wednesdays, 10 a.m. - 12 noon

*Program*

- Note!
**Due to the current situation, all talks will take place online until further notice at: https://zoom.us/j/159082384**

- 4 November 2020
**NOTE! Start: 11:30 a.m.** **Pierre E. Jacob** (Harvard University, Cambridge, USA) - Unbiased Markov chain Monte Carlo methods with couplings
- Abstract: Various tasks in statistics involve numerical integration, for which Markov chain Monte Carlo (MCMC) methods are state-of-the-art. MCMC methods yield estimators that converge to the integrals of interest in the limit of the number of iterations. This iterative asymptotic justification is not ideal: first, it stands at odds with current trends in computing hardware, with increasingly parallel architectures; second, the choices of "burn-in" and of the total number of iterations are difficult. This talk will describe recently proposed estimators that are exactly unbiased for the expectations of interest, while having an almost surely finite computing cost and a finite variance. They can thus be generated independently in parallel and averaged over. We argue that the removal of the "burn-in" bias can be done for many MCMC algorithms of practical interest without inflating the variance by much. The core idea is to use coupled pairs of Markov chains, following the pathbreaking work of Peter Glynn & Chang-han Rhee. The method also provides practical upper bounds on some distances between the marginal distribution of the chain at a finite step and its invariant distribution. This talk will provide an overview of this line of research, joint work with John O'Leary, Yves Atchadé and many others.

The main reference for this presentation is available at https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12336

- 11 November 2020
**N.N.**
- 18 November 2020
**Matthias Löffler** (ETH Zürich) - Computationally efficient sparse clustering
- Abstract: We study statistical and computational limits of clustering when the cluster centres are sparse and their dimension is possibly much larger than the sample size. For our analysis, we consider the model X_i = z_i\theta + \varepsilon_i, z_i \in \{-1,1\}, \varepsilon_i \sim \mathcal{N}(0, I_p), which has two clusters with centres \theta and -\theta. We provide a finite-sample analysis of a new sparse clustering algorithm based on sparse PCA and show that it achieves the minimax optimal misclustering rate in the regime \|\theta\| \rightarrow \infty, matching the Bayes error asymptotically.

Our results require the sparsity to grow slower than the square root of the sample size. Using a recent framework for computational lower bounds, the low-degree likelihood ratio, we give evidence that this condition is necessary for any polynomial-time clustering algorithm to succeed below the BBP threshold. This complements existing evidence based on reductions and statistical query lower bounds. Compared to these existing results, we cover a wider set of parameter regimes and give a more precise understanding of the runtime required and the misclustering error achievable.

This is joint work with A.S. Wein (NYU) and A.S. Bandeira (ETH Zürich).
- 25 November 2020
**Alexander Kreiss** (KU Leuven) - Correlation bounds, mixing and m-dependence under random time-varying network distances with an application to Cox processes
- Abstract: We consider multivariate stochastic processes indexed either by vertices or by pairs of vertices of a dynamic network, that is, a network with a fixed vertex set and an edge set which changes randomly over time. We assume that, conditional on the network, the spatial dependence structure of the processes behaves as follows: close vertices (or pairs of vertices) are dependent, while the dependence decreases as the network distance increases. We make this intuition mathematically precise through three concepts based on correlation, on beta-mixing with time-varying beta-coefficients, and on conditional independence. These concepts allow us to prove weak-dependence results, e.g. an exponential inequality. Such results are useful for studying asymptotic regimes in which the network grows (i.e. the number of vertices tends to infinity), so that a growing network results in a larger distance between most vertices. Since the network is dynamic, vertices may move around the network and come close to different vertices. To demonstrate the use of these concepts in an application, we study the asymptotics (for growing networks) of a goodness-of-fit test in a dynamic interaction network model based on a Cox-type model for counting processes. This model is then applied to bike-sharing data.
- 2 December 2020
**Egor Klochkov** (U Cambridge) - Robust k-means clustering under two moments
- Abstract: We consider robust algorithms for the k-means clustering problem, where a quantizer is constructed based on N independent observations. Our main results are median-of-means-based non-asymptotic excess distortion bounds that hold under a two bounded moments assumption in a general separable Hilbert space. In particular, our results extend the renowned asymptotic result of Pollard (1981), who showed that the existence of two moments is sufficient for strong consistency of an empirically optimal quantizer in R^d. In the special case of clustering in R^d, under two bounded moments, we prove matching (up to constant factors) non-asymptotic upper and lower bounds on the excess distortion, which depend on the probability mass of the lightest cluster of an optimal quantizer. Our bounds have sub-Gaussian form, and the proofs are based on versions of uniform bounds for robust mean estimators. This is joint work with Alexei Kroshnin and Nikita Zhivotovskiy.
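The median-of-means idea can be illustrated by a Lloyd-type algorithm whose centre updates use a median-of-means estimator in place of the empirical mean. The block structure, the two-cluster data model with heavy-tailed contamination, and the hand-picked initialisation below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def median_of_means(points, n_blocks, rng):
    """Coordinate-wise median-of-means estimator of the mean of `points`:
    split into random blocks, average each block, take the median."""
    idx = rng.permutation(len(points))
    blocks = np.array_split(points[idx], n_blocks)
    return np.median([b.mean(axis=0) for b in blocks], axis=0)

def robust_kmeans(X, init, n_blocks=25, n_iter=30, rng=None):
    """Lloyd-style k-means in which each centre update is a
    median-of-means estimate, robust to heavy-tailed points."""
    rng = rng or np.random.default_rng()
    centers = np.array(init, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts) >= n_blocks:
                centers[j] = median_of_means(pts, n_blocks, rng)
    return centers, labels

# Two clusters at +/- theta, plus a few heavy-tailed outliers.
rng = np.random.default_rng(1)
theta = np.array([3.0, 0.0])
X = np.vstack([theta + rng.normal(size=(100, 2)),
               -theta + rng.normal(size=(100, 2)),
               rng.standard_t(df=2, size=(10, 2)) * 50.0])
# initialise from one data point per true cluster (for the demo only)
centers, labels = robust_kmeans(X, init=X[[0, 100]], rng=rng)
```

With more blocks than outliers, a majority of block means stays uncontaminated, so the median-based centre update is not dragged away by the heavy-tailed points.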
- 11 December 2020
**Judith Rousseau** (Oxford) - Bayesian nonparametric estimation in multivariate non-linear Hawkes processes **(Kolloquium SFB 1294, exceptionally Friday 10 a.m.!)**
- Abstract: Hawkes processes form a class of point processes describing self- and inter-exciting/inhibiting processes. There is now renewed interest in such processes in applied domains and in machine learning, but only limited theory exists about inference in such models. To be more precise, the intensity function of a univariate Hawkes process has the form λ(t) = ∫_0^{t⁻} h(t−s) dN_s + ν, where N is the Hawkes process and ν > 0. Multivariate Hawkes processes have a similar intensity function which involves the interaction functions between the different components of the process. To allow for negative interaction functions h, one considers non-linear Hawkes processes with λ(t) = φ_θ( ∫_0^{t⁻} h(t−s) dN_s + ν ). In this work we propose a generic Bayesian nonparametric procedure for such models and study its theoretical properties, both in terms of the estimation of the parameters, namely the baseline rates ν and the interaction functions, together with possibly the parameters θ of the non-linear function φ, and in terms of the graph of interactions. As a consequence of these results we also obtain theoretical guarantees for Bayesian tests on the existence of an interaction function.
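The linear intensity above can be simulated by Ogata's thinning algorithm. The exponential interaction function h(t) = αβe^{−βt} and the parameter values below are illustrative assumptions, not the talk's model.

```python
import numpy as np

def simulate_hawkes(nu, alpha, beta, t_max, rng):
    """Ogata thinning for a univariate linear Hawkes process with
    intensity lambda(t) = nu + sum_{t_i < t} alpha*beta*exp(-beta*(t - t_i)).
    Between events the intensity decays, so its value just after the
    current time is a valid upper bound for thinning."""
    events, t = [], 0.0
    while True:
        lam_bar = nu + sum(alpha * beta * np.exp(-beta * (t - s)) for s in events)
        t += rng.exponential(1.0 / lam_bar)   # candidate at the bounding rate
        if t >= t_max:
            return np.array(events)
        lam_t = nu + sum(alpha * beta * np.exp(-beta * (t - s)) for s in events)
        if rng.uniform() <= lam_t / lam_bar:  # accept with prob lam(t)/lam_bar
            events.append(t)

# Stationarity requires the branching ratio alpha < 1; the expected
# number of events on [0, t_max] is then nu * t_max / (1 - alpha).
rng = np.random.default_rng(0)
events = simulate_hawkes(nu=0.5, alpha=0.5, beta=1.0, t_max=200.0, rng=rng)
```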
- 16 December 2020
**Howard Bondell** (University of Melbourne) - Bayesian regression for high-dimensional data using a prior on the model fit
- Abstract: We propose a new class of prior distributions for shrinkage regularization in sparse linear regression, particularly in the high-dimensional case. Instead of placing a prior on the coefficients themselves, we place a prior on the regression R-squared, which is then distributed to the coefficients in a suitable manner. We show that this framework induces a shrinkage prior with desirable theoretical properties. Compared to existing shrinkage priors, it can simultaneously achieve both high prior concentration at zero and heavier tails. These two properties combine to provide a higher degree of shrinkage on the irrelevant coefficients, along with less bias in the estimation of the larger signals. Both theoretical and practical aspects will be discussed.
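One way such a prior can be realised, sketched here as an assumption rather than the talk's exact construction, is to draw R² from a Beta distribution, convert it to a global signal level, and split that level across coefficients with a Dirichlet vector:

```python
import numpy as np

def r2_shrinkage_prior_draws(p, a=1.0, b=10.0, sigma2=1.0,
                             n_draws=50_000, rng=None):
    """Prior draws for regression coefficients obtained by placing a
    Beta(a, b) prior on R^2, converting it to a global signal level
    w = R^2 / (1 - R^2), and distributing w across the p coefficients
    with a Dirichlet vector; coefficients are conditionally Gaussian."""
    rng = rng or np.random.default_rng(0)
    r2 = rng.beta(a, b, size=n_draws)
    w = r2 / (1.0 - r2)                                  # global variance budget
    psi = rng.dirichlet(np.full(p, 0.5), size=n_draws)   # local allocation
    return rng.normal(size=(n_draws, p)) * np.sqrt(sigma2 * w[:, None] * psi)

draws = r2_shrinkage_prior_draws(p=5)
```

The small Dirichlet concentration pushes most of the signal budget onto few coordinates per draw, which is what produces the combination of spike-like concentration at zero and heavy tails.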
- 6 January 2021
**Alexey Naumov** (HSE Moscow) - Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise
- Abstract: Linear two-timescale stochastic approximation (SA) schemes are an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing a finite-time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this work, we provide a finite-time analysis for linear two-timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise; only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is o(1/k^c) and the steady-state term is O(1/k), where c > 1 and k is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of \Omega(1/k). A simple numerical experiment is presented to support our theory. The talk is based on the recent paper by Maxim Kaledin, Eric Moulines, Alexey Naumov, Vladislav Tadic, Hoi-To Wai; Proceedings of the Thirty-Third Conference on Learning Theory, PMLR 125:2144-2203, 2020.
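A linear two-timescale SA iteration can be sketched on a toy coupled linear system. The system coefficients, step-size schedules, and i.i.d. Gaussian noise (rather than the Markovian noise treated in the talk) are illustrative assumptions.

```python
import numpy as np

# Toy linear system: the fixed point solves
#   A11*theta + A12*w = b1  and  A21*theta + A22*w = b2.
A11, A12, b1 = 1.0, 0.5, 1.0
A21, A22, b2 = 0.3, 1.0, 0.5
theta_star = (b1 - A12 * b2 / A22) / (A11 - A12 * A21 / A22)
w_star = (b2 - A21 * theta_star) / A22

rng = np.random.default_rng(0)
theta, w = 0.0, 0.0
for k in range(100_000):
    a_k = 1.0 / (k + 100)              # slow step size for theta
    b_k = 1.0 / (k + 100) ** (2 / 3)   # fast step size, b_k >> a_k
    noise1, noise2 = rng.normal(0.0, 0.1, size=2)
    # fast variable w tracks its equilibrium given the current theta,
    # while theta drifts on the slow timescale
    theta += a_k * (b1 - A11 * theta - A12 * w + noise1)
    w += b_k * (b2 - A21 * theta - A22 * w + noise2)
```

The step-size separation b_k / a_k → ∞ is what lets the fast iterate be treated as "equilibrated" in the analysis of the slow one.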
- 13 January 2021
**Vladimir Spokoiny** (WIAS Berlin) - Bayesian inference in machine learning
- Abstract: Statistical inference for binary data is one of the central problems in machine learning, which can be treated within a nonparametric Bernoulli model. In many situations the underlying Bernoulli parameter may approach the edges zero or one, which leads to inference for non-regular models. Properties like asymptotic normality of the MLE or of the posterior distribution are not valid in such irregular cases. The approach of this work is to impose a Gaussian prior on the canonical Bernoulli parameter, or, analogously, to consider a penalized MLE with a quadratic penalty. A Gaussian prior can be viewed as a kind of barrier preventing the canonical parameter from approaching infinity and forcing it to stay within the region of regularity. The main results state that the classical properties, such as asymptotic normality of the posterior distribution or of the penalized MLE, continue to hold for any configuration of Bernoulli parameters, under the assumption that the prior is strong enough in each direction. We also demonstrate how the results can be applied to the analysis of random graphs, ranking problems, and nonparametric binary classification.
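The barrier effect of the quadratic penalty is easy to see in a toy one-parameter model. The all-ones data, the prior variance, and the Newton solver below are illustrative assumptions.

```python
import numpy as np

def penalized_bernoulli_mle(y, prior_var, n_steps=100):
    """Penalized MLE of the canonical (logit) Bernoulli parameter t:
    maximise sum_i [y_i * t - log(1 + e^t)] - t^2 / (2 * prior_var)
    by Newton's method. The quadratic penalty (a Gaussian prior) keeps
    the estimate finite even when all observations are 0 or 1."""
    n, s = len(y), float(np.sum(y))
    t = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-t))            # mean parameter
        grad = s - n * p - t / prior_var
        hess = -n * p * (1.0 - p) - 1.0 / prior_var
        t -= grad / hess
    return t

# With all-one observations the unpenalized MLE is +infinity; the
# penalized estimate stays finite and grows with the prior variance.
t_wide = penalized_bernoulli_mle(np.ones(20), prior_var=4.0)
t_narrow = penalized_bernoulli_mle(np.ones(20), prior_var=1.0)
```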
- 20 January 2021
**Sven Wang** (U Cambridge) - On polynomial-time computation of high-dimensional posterior measures by Langevin-type algorithms
- Abstract: In this talk, we consider the task of generating random samples of high-dimensional posterior distributions. The main results consist of non-asymptotic computational guarantees for Langevin-type MCMC algorithms which scale polynomially in key quantities such as the dimension of the model, the desired precision level, and the number of statistical measurements. We also show that posterior mean vectors and maximum a posteriori (MAP) estimates are computable in polynomial time, with high probability under the distribution of the data. Those results are derived in a general non-linear regression setting where posterior measures are not necessarily log-concave. The main example is a non-linear PDE model involving the steady-state Schrödinger equation. The talk is based on the preprint https://arxiv.org/abs/2009.05298
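A basic Langevin-type sampler of the kind analysed can be sketched as follows; the unadjusted Langevin algorithm, the toy Gaussian target, and the step size below are illustrative assumptions, not the paper's PDE setting.

```python
import numpy as np

def ula_sample(grad_logpi, x0, step, n_iter, rng):
    """Unadjusted Langevin algorithm:
    x_{k+1} = x_k + step * grad log pi(x_k) + sqrt(2*step) * xi_k.
    For smooth, strongly log-concave targets the law of x_k approaches
    pi up to a bias controlled by the step size."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iter, x.size))
    for k in range(n_iter):
        x = x + step * grad_logpi(x) + np.sqrt(2 * step) * rng.normal(size=x.size)
        samples[k] = x
    return samples

# Toy posterior pi = N(mu, I), so grad log pi(x) = -(x - mu).
mu = np.array([1.0, -2.0])
rng = np.random.default_rng(0)
samples = ula_sample(lambda x: -(x - mu), np.zeros(2),
                     step=0.05, n_iter=20_000, rng=rng)
est_mean = samples[5_000:].mean(axis=0)   # discard a short warm-up
```

The polynomial-time guarantees in the talk concern how the number of such iterations must scale with dimension, precision, and sample size.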
- 27 January 2021
**N.N.**
- 3 February 2021
**N.N.**
- 10 February 2021
**Eftychia Solea** (ENSAI) - Nonparametric and high-dimensional functional graphical models
- Abstract: We consider the problem of constructing nonparametric undirected graphical models for high-dimensional functional data. Most existing statistical methods in this context assume either a Gaussian distribution on the vertices or linear conditional means. In this work we provide a more flexible model which relaxes the linearity assumption, replacing it with an arbitrary additive form. The use of functional principal components leads to an estimation strategy based on a group lasso penalty for the relevant edges of the graph. We establish concentration inequalities for the resulting estimators, allowing both the number of predictors and the number of functional principal components to diverge to infinity with increasing sample size. We also investigate the empirical performance of our method through simulation studies and a real data application.
- 17 February 2021
**Georg Keilbar** (IRTG 1792 / BRC, HU Berlin) - Finite sample bounds for quantiles using SGD
- Abstract: Stochastic gradient descent (SGD) is a computationally and memory efficient algorithm for parameter estimation in online settings. In situations in which the data can be contaminated with noise, robust location parameters such as quantiles may be preferred to the mean. This talk aims at studying the non-asymptotic behavior of the SGD estimate for quantiles with Ruppert-Polyak averaging. In particular, we derive exponential probability bounds under general assumptions.
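The SGD quantile recursion with Ruppert-Polyak averaging can be sketched directly; the step-size schedule, starting point, and heavy-tailed Student-t data below are illustrative assumptions.

```python
import numpy as np

def sgd_quantile(stream, tau, c0=1.0, gamma=0.6):
    """Online quantile estimation: SGD on the check loss with step sizes
    c0 * k^(-gamma), together with the Ruppert-Polyak average of the
    iterates (a running mean of theta_1, ..., theta_k)."""
    theta, theta_bar = 0.0, 0.0
    for k, x in enumerate(stream, start=1):
        step = c0 * k ** (-gamma)
        theta -= step * ((x <= theta) - tau)   # subgradient of the check loss
        theta_bar += (theta - theta_bar) / k   # running Polyak average
    return theta, theta_bar

# Median (tau = 0.5) of heavy-tailed Student-t(2) data; true median is 0.
rng = np.random.default_rng(0)
data = rng.standard_t(df=2, size=200_000)
theta_last, theta_avg = sgd_quantile(data, tau=0.5)
```

Each update costs O(1) time and memory, which is what makes the procedure attractive in online settings; the averaged iterate is typically much more stable than the last one.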
- 24 February 2021
**Torsten Hothorn** (Universität Zürich) - Understanding and Applying Transformation Models
- Abstract: The mlt package implements maximum likelihood estimation in the class of conditional transformation models. Based on a suitable explicit parameterization of the unconditional or conditional transformation function using infrastructure from package basefun, we show how one can define, estimate, and compare a cascade of increasingly complex transformation models in the maximum likelihood framework. Models for the unconditional or conditional distribution function of any univariate response variable are set up and estimated in the same computational framework simply by choosing an appropriate transformation function and parameterization thereof. As it is computationally cheap to evaluate the distribution function, models can be estimated by maximization of the exact likelihood, especially in the presence of random censoring or truncation. The relatively dense high-level implementation in the R system for statistical computing allows generalization of many established implementations of linear transformation models, such as the Cox model or other parametric models for the analysis of survival or ordered categorical data, to the more complex situations illustrated in this talk.

Slides: http://user.math.uzh.ch/hothorn/talks/mlt_Berlin_2021.pdf

Interested parties are cordially invited.

**For questions, please contact:**

**Ms Andrea Fiebig**

**Mail: fiebig@mathematik.hu-berlin.de
Phone: +49-30-2093-45460
Fax: +49-30-2093-45451
Humboldt-Universität zu Berlin
Institut für Mathematik
Unter den Linden 6
10099 Berlin, Germany**