## Research Seminar in Mathematical Statistics

### For the Statistics Division

S. Greven, W. Härdle, M. Reiß, V. Spokoiny

*Location*

Weierstrass-Institut für Angewandte Analysis und Stochastik

Erhard-Schmidt-Raum

Mohrenstrasse 39

10117 Berlin

*Time*

Wednesdays, 10:00 - 12:30

*Program*

- 16 October 2019
**Alexey Onatskiy** (Cambridge University) - Spurious Factor Analysis
- Abstract: This paper draws parallels between the Principal Components Analysis of factorless high-dimensional nonstationary data and classical spurious regression. We show that a few of the principal components of such data absorb nearly all the data variation. The corresponding scree plot suggests that the data contain a few factors, which is corroborated by the standard panel information criteria. Furthermore, the Dickey-Fuller tests of the unit root hypothesis applied to the estimated 'idiosyncratic terms' often reject, creating an impression that a few factors are responsible for most of the non-stationarity in the data. We warn empirical researchers of these peculiar effects and suggest always comparing the analysis in levels with that in differences.
- 23 October 2019
**Vladimir Spokoiny** (WIAS and HU Berlin) - Bayesian inference for nonlinear inverse problems
- Abstract: We discuss the properties of the posterior for a wide class of statistical models including nonlinear generalised regression and deep neural networks, nonlinear inverse problems, nonparametric diffusion, error-in-operator and IV models. The new calming approach helps to treat all such problems in a unified manner and to obtain tight finite sample results about Gaussian approximation of the posterior with an explicit error bound in terms of the so-called effective dimension.
- 30 October 2019
**N.N.**
- 06 November 2019
**Charles Manski** (Northwestern University, USA) - **This is the Hermann Otto Hirschfeld Lecture 2019**
- Patient Care under Uncertainty
- Abstract: https://press.princeton.edu/titles/30223.html
- 13 November 2019
**Merle Behr** (University of California, Berkeley) - Learning compositional structures
- Abstract: Many data problems, in particular in biogenetics, come with a highly complex underlying structure. This often makes it difficult to extract interpretable information. In this talk we want to demonstrate that these complex structures are often well approximated by a composition of a few simple parts, which provides very descriptive insights into the underlying data generating process. We demonstrate this with two examples.
- In the first example, the single components are finite alphabet vectors (e.g., binary components), which encode some discrete information. For instance, in genetics a binary vector of length n can encode whether or not a mutation (e.g., a SNP) is present at location i = 1,…,n in the genome. On the population level, studying genetic variations is often highly complex, as various groups of mutations are present simultaneously. However, in many settings a population might be well approximated by a composition of a few dominant groups. Examples are Evolve&Resequence experiments where the outer supply of genetic variation is limited and thus, over time, only a few haplotypes survive. Similarly, in a cancer tumor, often only a few competing groups of cancer cells (clones) come out on top.
- In the second example, the single components relate to separate branches of a tree structure. Tree structures, showing hierarchical relationships between samples, are ubiquitous in genomic and biomedical sciences. A common question in many studies is whether there is an association between a response variable and the latent group structure represented by the tree. Such a relation can be highly complex, in general. However, often it is well approximated by a simple composition of relations associated with a few branches of the tree.
- For both of these examples we first study theoretical aspects of the underlying compositional structure, such as identifiability of single components and optimal statistical procedures under a probabilistic data model. Based on this, we gain insights into practical aspects of the problem, namely how to actually recover such components from data.
- 20 November 2019
**Nikita Zhivotowskii** (Google Zürich) - Robust covariance estimation for vectors with bounded kurtosis
- Abstract: Let X be a centered random vector and assume that we want to estimate its covariance matrix. In this talk I will discuss the following result: if the random vector X satisfies the bounded kurtosis assumption, there is a covariance matrix estimator that, given a sequence of n independent random vectors distributed according to X, exhibits the optimal performance one would expect had X been a Gaussian vector. The procedure also improves the current state-of-the-art regarding high probability bounds in the sub-Gaussian case (sharp results were only known in expectation or with constant probability). In both scenarios the new bound does not depend explicitly on the dimension, but rather on the effective rank of the covariance matrix of X. The talk is based on joint work with S. Mendelson, "Robust covariance estimation under L4-L2 moment equivalence", to appear in AoS 2019.
- 27 November 2019
**Alain Celisse** (U Lille) - Kernelized change-points detection procedure
- Abstract: In this talk we discuss the multiple change-points detection problem when dealing with complex data. Our goal is to present a new procedure involving reproducing kernels and allowing us to detect abrupt changes arising in the full distribution of the observations over time (and not only in their means).
- The two-stage procedure we introduce is based first on dynamic programming, and second on an $\ell_0$-type penalty derived from a non-asymptotic model selection result. We will see that a key ingredient in the derivation is a new concentration inequality applying to vectors in a reproducing kernel Hilbert space.
- We will illustrate the practical behavior of our kernel change-point procedure on a wide range of simulated/real time-series. In particular we empirically validate our penalty since the resulting penalized criterion recovers the true (number of) change-points with high probability. We also infer the influence of the kernel on the final results in practice.
- 04 December 2019
**Nils Bertschinger** (U Frankfurt) - Systemic Greeks: Measuring risk in financial networks
- Abstract: Since the latest financial crisis, the idea of systemic risk has received considerable interest. In particular, contagion effects arising from cross-holdings between interconnected financial firms have been studied extensively. Drawing inspiration from the field of complex networks, these attempts are largely unaware of models and theories for credit risk of individual firms. Here, we note that recent network valuation models extend the seminal structural risk model of Merton (1974). Furthermore, we formally compute sensitivities to various risk factors -- commonly known as Greeks -- in a network context. In the end, we present some numerical illustrations and discuss possible implications for measuring systemic risk as well as insurance pricing.
- 11 December 2019
**N.N.**
- 18 December 2019
**N.N.**
- 08 January 2020
**Dominik Liebl** (University of Bonn) - Fast and Fair Simultaneous Confidence Bands for Functional Parameters
- Abstract: Quantifying uncertainty using confidence regions is a central goal of statistical inference. Despite this, methodologies for confidence bands in Functional Data Analysis are underdeveloped compared to estimation and hypothesis testing. This work represents a major leap forward in this area by presenting a new methodology for constructing simultaneous confidence bands for functional parameter estimates. These bands possess a number of striking qualities: (1) they have a nearly closed-form expression, (2) they give nearly exact coverage, (3) they have a finite sample correction, (4) they do not require an estimate of the full covariance of the parameter estimate, and (5) they can be constructed adaptively according to a desired criterion. One option for choosing bands we find especially interesting is the concept of fair bands, which allows us to do fair (or equitable) inference over subintervals and could be especially useful in longitudinal studies over long time scales. Our bands are constructed by integrating and extending tools from Random Field Theory, an area that has yet to overlap with Functional Data Analysis. Authors: Dominik Liebl (University of Bonn) and Matthew Reimherr (Penn State University)
- 15 January 2020
**Sven Wang** (U Cambridge) - Convergence rates for penalised least squares estimators in PDE-constrained regression problems
- Abstract: The main topic of the talk is convergence rates for penalised least squares (PLS) estimators in non-linear statistical inverse problems. Under some general conditions on the forward map, we prove convergence rates for PLS estimators. In our main example, the parameter $f$ is an unknown heat conductivity function in a steady-state heat equation (a second-order elliptic PDE). The observations consist of a noisy version of the solution $u[f]$ to the boundary value problem corresponding to $f$. The PDE-constrained regression problem is shown to be solved in a minimax-optimal way. This is joint work with S. van de Geer and R. Nickl. If time permits, we will mention some related work, e.g. on the non-parametric Bayesian approach.
- 22 January 2020
**Jorge Matteu** (Jaume I University) - Complex Spatial and Spatio-temporal Point Process Dependencies: Linear ANOVA-type Models, Metrics and Barycenters and Predictive Stochastic Models of Crime
- Abstract: This talk overviews several lines of research showing a range of statistical methods for the analysis of complex spatial and spatio-temporal point process dependencies.
- Several methods to analyse structural differences among groups of replicated spatial, spatio-temporal and possibly marked point patterns are presented. We calculate a number of functional descriptors of each pattern to investigate departures from completely random patterns, both among subjects and groups. We also develop strategies for analysing the effects of several factors marginally within each factor level, and the effects due to interaction between factors. We consider the $K$-function and its mark-weighted version as particular descriptors of each pattern in our sample, and develop a set of statistics based on classical analysis of variance statistics and their analogues in functional data analysis. The statistical distributions of our functional descriptors and of our proposed tests are unknown, and thus we use bootstrap and permutation procedures to estimate the null distribution of our statistical test. A simulation study provides evidence of the validity and power of our procedures. Several applications in environmental and engineering problems will be presented.
- We also introduce the transport-transform (TT) metric between finite point patterns on a general space, which provides a unified framework for earlier point pattern metrics. Our main focus is on barycenters, i.e. minimizers of a q-th order Fréchet functional with respect to these metrics. We present applications to geocoded data of crimes in Euclidean space and on a street network, illustrating that barycenters serve as informative summary statistics.
- Finally, statistical models to predict the risk of future crime across space and time have become widely used by police departments. Predictive policing methods are now widely employed, and policing programs that take them into account can result in statistically significant crime decreases. Crime has both varying patterns in space, related to features of the environment, economy, and policing, and patterns in time arising from criminal behavior. Serious crimes may also be presaged by minor crimes of disorder. Thus, these spatial and temporal patterns are generally confounded, requiring analyses to take both into account, and we propose considering spatio-temporal point process models that incorporate spatial features, near-repeat and retaliation effects, and triggering.
- 29 January 2020
**Nadja Klein** (HU Berlin) - Bayesian Regression Copulas
- Abstract: We propose a new semi-parametric distributional regression model based on a copula decomposition of the joint distribution of the vector of response values. The copula is high-dimensional and constructed by inversion of a pseudo regression with semi-parametric functions of covariates modeled using problem-specific regularized basis functions. By integrating out the basis coefficients, an implicit copula process on the covariate space is obtained, which we call a 'regression copula'. We combine this with a non-parametric margin to define a copula model, where the entire distribution (including the mean and variance) of the response is a smooth semi-parametric function of the covariates. By construction we obtain marginally calibrated predictive densities. The copula can be estimated using exact or approximate Bayesian inference, the latter of which is scalable to highly parameterized models. Using three real data examples and a simulation study we illustrate the efficacy of these estimators and the copula model. The first example deals with half-hourly electricity spot prices as a function of demand and two time covariates using radial bases and horseshoe regularization. The second example constructs a spatial variable selection copula applied to a visual experiment using fMRI data. The third one adopts deep learning architectures to conduct a regression approach as an alternative to likelihood-free inference methods such as ABC or synthetic likelihood, which we illustrate with a predator-prey model from ecology.
- 05 February 2020
**Tim Sullivan** (FU Berlin) - tba
- 12 February 2020
**Alexandra Carpentier** (University of Magdeburg) - tba
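
The effect described in the "Spurious Factor Analysis" abstract above can be illustrated with a small simulation. The following is only a minimal sketch, not the speaker's method: it assumes independent Gaussian random walks as the factorless nonstationary data, and the sample sizes, the seed, and the helper name `top_variance_shares` are hypothetical choices made for this illustration. It compares the share of variance captured by the three largest principal components in levels and in first differences.

```python
import numpy as np


def top_variance_shares(data, k=3):
    """Share of total variance captured by the k largest principal components.

    `data` has shape (T, N): T time points, N series.
    """
    centered = data - data.mean(axis=0)            # demean each series over time
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2               # proportional to PCA eigenvalues
    return variances[:k].sum() / variances.sum()


rng = np.random.default_rng(0)                     # fixed seed, chosen arbitrarily
T, N = 500, 100                                    # illustrative sizes (assumption)

shocks = rng.standard_normal((T, N))               # independent noise, no common factors
levels = np.cumsum(shocks, axis=0)                 # N independent random walks

share_levels = top_variance_shares(levels)
share_diffs = top_variance_shares(np.diff(levels, axis=0))

print(f"top-3 variance share, levels:      {share_levels:.2f}")
print(f"top-3 variance share, differences: {share_diffs:.2f}")
```

In levels, a handful of components should dominate even though the series share no factors, while in differences the variance is spread across many components. This is the comparison between levels and differences that the abstract recommends.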

Anyone interested is cordially invited.

**For inquiries, please contact:**

**Ms. Andrea Fiebig**

**Email: fiebig@mathematik.hu-berlin.de
Phone: +49-30-2093-5860
Fax: +49-30-2093-5848
Humboldt-Universität zu Berlin
Institut für Mathematik
Unter den Linden 6
10099 Berlin, Germany**