
# Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Mathematik

## Research Seminar in Mathematical Statistics

### S. Greven, W. Härdle, M. Reiß, V. Spokoiny

#### Location

Weierstrass-Institut für Angewandte Analysis und Stochastik
Erhard-Schmidt-Raum
Mohrenstrasse 39
10117 Berlin

#### Time

Wednesdays, 10:00 - 12:00

#### Program

Attention!
Due to the current situation, the talks will take place online until further notice at: https://zoom.us/j/159082384

22 April 2020
Gabriel Peyre (ENS Paris)
Scaling Optimal Transport for High dimensional Learning
Optimal transport (OT) has recently gained a lot of interest in machine learning. It is a natural tool for comparing probability distributions in a geometrically faithful way. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is, however, plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will review entropic regularization methods which define geometric loss functions approximating OT with a better sample complexity. More information and references can be found on the website of our book "Computational Optimal Transport": https://optimaltransport.github.io/
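As a rough illustration of the entropic regularization mentioned in this abstract, the following minimal sketch runs Sinkhorn iterations between two small point clouds. It is not taken from the talk: the function name `sinkhorn`, the regularization strength `eps`, the iteration count, and the rescaling of the cost matrix are all assumptions chosen for the example.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.5, n_iter=200):
    """Entropic OT between histograms a and b (hypothetical helper).

    Returns the regularized transport plan and its transport cost.
    """
    K = np.exp(-cost / eps)           # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)               # rescale rows to match marginal a
        v = b / (K.T @ u)             # rescale columns to match marginal b
    plan = u[:, None] * K * v[None, :]
    return plan, float(np.sum(plan * cost))

# Two small point clouds in the plane with uniform weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(loc=1.0, size=(6, 2))
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()              # assumed rescaling so eps is scale-free
a = np.full(5, 1 / 5)
b = np.full(6, 1 / 6)

plan, value = sinkhorn(a, b, cost)
print(plan.sum(axis=1), plan.sum(axis=0), value)
```

A larger `eps` gives a smoother plan and faster convergence, while a smaller `eps` approximates unregularized OT more closely but usually calls for log-domain updates to avoid numerical underflow.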
29 April 2020
Thorsten Dickhaus (University of Bremen)
How many null hypotheses are false?
Abstract: Under the multiple testing framework, estimating the proportion $\pi_0$ of true null hypotheses is informative for various reasons. On the one hand, in applications like quality control or anomaly detection, the presence of a certain number of untypical data points already indicates the necessity for an intervention, no matter which of the data points are responsible for that. On the other hand, data-adaptive multiple test procedures incorporate an estimate of $\pi_0$ into their decision rules in order to optimize power. Many classical estimators of $\pi_0$ rely on the empirical cumulative distribution function (ecdf) of all marginal $p$-values, and implicitly require that the ecdf of those $p$-values which correspond to true null hypotheses is in some sense close to the main diagonal in the unit square. We will discuss three sources of violation of the latter requirement, namely (i) discreteness of the statistical model under investigation (the expected ecdf has jumps), (ii) dependencies among the marginal $p$-values (leading to clustering effects), and (iii) testing composite null hypotheses ($p$-values are super-uniform under the null). Modifications of classical estimators of $\pi_0$ will be discussed to tackle these issues. Applications include multiple testing for replicability of scientific discoveries, particularly in the context of biomarker identification. The presentation is based on [1] - [4].
References:
[1] Thorsten Dickhaus, Klaus Straßburger, Daniel Schunk, Carlos Morcillo-Suarez, Thomas Illig, Arcadi Navarro (2012). How to analyze many contingency tables simultaneously in genetic association studies. Statistical Applications in Genetics and Molecular Biology, Vol. 11, No. 4, Article 12.
[2] Thorsten Dickhaus (2013). Randomized p-values for multiple testing of composite null hypotheses. Journal of Statistical Planning and Inference, Vol. 143, No. 11, 1968-1979.
[3] André Neumann, Taras Bodnar, Thorsten Dickhaus (2017). Estimating the Proportion of True Null Hypotheses under Copula Dependency. Research Report 2017:09, Mathematical Statistics, Stockholm University.
[4] Anh-Tuan Hoang, Thorsten Dickhaus (2019). Randomized p-values for multiple testing and their application in replicability analysis. Preprint, arXiv:1912.06982.
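To make the ecdf-based idea from the abstract concrete, here is a minimal sketch of one classical estimator of $\pi_0$, a Storey-type estimator that compares the ecdf of the $p$-values at a point `lam` with the main diagonal. It illustrates the classical approach only, not the modified estimators discussed in the talk. The function name `storey_pi0`, the tuning parameter `lam`, and the simulated data are assumptions made for the example.

```python
import numpy as np

def storey_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true nulls (hypothetical helper).

    If null p-values are uniform, about pi0 * (1 - lam) of all p-values
    exceed lam, so (1 - ecdf(lam)) / (1 - lam) estimates pi0.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    p = np.asarray(p_values, dtype=float)
    ecdf_at_lam = np.mean(p <= lam)   # ecdf of all marginal p-values at lam
    return min(1.0, (1.0 - ecdf_at_lam) / (1.0 - lam))

# Simulated example: 80% true nulls (uniform), 20% alternatives (near zero).
rng = np.random.default_rng(0)
m, true_pi0 = 10_000, 0.8
n_null = int(m * true_pi0)
p_null = rng.uniform(size=n_null)
p_alt = rng.beta(0.2, 5.0, size=m - n_null)
estimate = storey_pi0(np.concatenate([p_null, p_alt]), lam=0.5)
print(estimate)
```

The estimator is reliable only when the ecdf of the null $p$-values stays close to the diagonal, which is exactly the requirement that discreteness, dependence, and composite nulls can violate.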
6 May 2020
Julia Schaumburg (FU Amsterdam)
Dynamic clustering of multivariate panel data
Abstract: We propose a dynamic clustering model for studying time-varying group structures in multivariate panel data. The model is dynamic in three ways: First, the cluster means and covariance matrices are time-varying to track gradual changes in cluster characteristics over time. Second, the units of interest can transition between clusters over time based on a Hidden Markov model (HMM). Finally, the HMM's transition matrix can depend on lagged cluster distances as well as economic covariates. Monte Carlo experiments suggest that the units can be classified reliably in a variety of settings. An empirical study of 299 European banks between 2008Q1 and 2018Q2 suggests that banks have become less diverse over time in key characteristics. On average, approximately 3% of banks transition each quarter. Transitions across clusters are related to cluster dissimilarity and differences in bank profitability.
13 May 2020
TBA

20 May 2020
Chiara Amorino (Université d'Evry Paris-Saclay)
Invariant adaptive density estimation for ergodic SDE with jumps over anisotropic classes
Abstract: We consider the solution $X = (X_t)_{t \geq 0}$ of a multivariate stochastic differential equation with Lévy-type jumps and with unique invariant probability measure with density $\mu$. We assume that a continuous record of observations $X^T = (X_t)_{0 \leq t \leq T}$ is available. In the case without jumps, Dalalyan and Reiss [1] and Strauch [3] have found convergence rates of invariant density estimators, under isotropic and anisotropic Hölder smoothness constraints respectively, which are considerably faster than those known from standard multivariate density estimation. We extend these works by obtaining, in the presence of jumps, estimators which achieve the same convergence rates as in the case without jumps for $d \geq 2$, and a rate which depends on the degree of the jumps in the one-dimensional setting. Moreover, we propose a data-driven bandwidth selection procedure based on the Goldenshluger and Lepski method [2], which leads to an adaptive nonparametric kernel estimator of the stationary density $\mu$ of the jump diffusion $X$.
References:
[1] Dalalyan, A. and Reiss, M. (2007). Asymptotic statistical equivalence for ergodic diffusions: the multidimensional case. Probab. Theory Relat. Fields, 137(1), 25–47.
[2] Goldenshluger, A., Lepski, O. (2011). Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality. The Annals of Statistics, 39(3), 1608-1632.
[3] Strauch, C. (2018). Adaptive invariant density estimation for ergodic diffusions over anisotropic classes. The Annals of Statistics, 46(6B), 3451-3480.
27 May 2020
TBA

3 June 2020
Ingrid van Keilegom (KU Leuven)
On a Semiparametric Estimation Method for AFT Mixture Cure Models
Abstract: When studying survival data in the presence of right censoring, it often happens that a certain proportion of the individuals under study do not experience the event of interest and are considered as cured. The mixture cure model is one of the common models that take this feature into account. It depends on a model for the conditional probability of being cured (called the incidence) and a model for the conditional survival function of the uncured individuals (called the latency). This work considers a logistic model for the incidence and a semiparametric accelerated failure time model for the latency part. The estimation of this model is obtained via the maximization of the semiparametric likelihood, in which the unknown error density is replaced by a kernel estimator based on the Kaplan-Meier estimator of the error distribution. Asymptotic theory for consistency and asymptotic normality of the parameter estimators is provided. Moreover, the proposed estimation method is compared with a method proposed by Lu (2010), which uses a kernel approach based on the EM algorithm to estimate the model parameters. Finally, the new method is applied to data coming from a cancer clinical trial.
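In symbols, a mixture cure model of the kind described in this abstract can be sketched as follows. The notation is assumed here for illustration and is not taken from the talk: $\pi(x)$ is the conditional probability of being cured, $S_u$ is the conditional survival function of the uncured individuals, $x$ and $z$ are covariate vectors, and the sign convention in the logistic model is one common choice among several.

$$
S(t \mid x, z) = \pi(x) + \bigl(1 - \pi(x)\bigr)\, S_u(t \mid z), \qquad
\pi(x) = \frac{1}{1 + \exp(-\beta^\top x)}, \qquad
\log T = \gamma^\top z + \varepsilon \quad \text{(uncured individuals)}.
$$

Because cured individuals never experience the event, the population survival function $S(t \mid x, z)$ levels off at $\pi(x)$ rather than decaying to zero. The density of the error $\varepsilon$ is the unknown nonparametric component that the abstract replaces with a kernel estimator.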
10 June 2020 - talk starts at 15:00!
Jonathan Niles-Weed (New York University)
Minimax estimation of smooth densities in Wasserstein distance
Abstract: We study nonparametric density estimation problems where error is measured in the Wasserstein distance, a metric on probability distributions popular in many areas of statistics and machine learning. We give the first minimax-optimal rates for this problem for general Wasserstein distances, and show that, unlike classical nonparametric density estimation, these rates depend on whether the densities in question are bounded below. Motivated by variational problems involving the Wasserstein distance, we also show how to construct discretely supported measures, suitable for computational purposes, which achieve the minimax rates. Our main technical tool is an inequality giving a nearly tight dual characterization of the Wasserstein distances in terms of Besov norms.
Joint work with Q. Berthet.
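For readers unfamiliar with the Wasserstein distance used as the error measure in this abstract, the following minimal sketch computes it in the simplest case: two one-dimensional empirical measures with the same number of equally weighted atoms, where the optimal coupling matches sorted samples. The function name `wasserstein_1d` and the equal-sample-size restriction are assumptions of this example, and it says nothing about the multivariate estimators of the talk.

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """W_p distance between two 1D empirical measures (hypothetical helper).

    Assumes equal sample sizes and uniform weights, so that the optimal
    transport plan pairs the i-th smallest x with the i-th smallest y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

# Shifting every atom by 1 moves the measure by exactly 1 in any W_p.
print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0], p=1))
print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], p=2))
```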
17 June 2020
Francois Bachoc (Toulouse)
Valid confidence intervals post-model-selection
Abstract: In this talk, I will first introduce the post-model-selection inference setting, which has recently been the subject of intensive investigation. In the case of Gaussian linear regression, I will review the post-model-selection confidence intervals suggested by Berk et al. (2013). These intervals are meant to cover model-dependent regression coefficients, which depend on the selected set of variables. I will present some personal contributions on an adaptation of these confidence intervals to the case where the targets of inference are linear predictors. Then, I will present an extension of these confidence intervals to non-Gaussian and non-linear settings. The suggested more general intervals will be supported by asymptotic results and numerical comparisons with other intervals recently suggested in the literature.
24 June 2020
Ingo Steinwart (Stuttgart)
Some thoughts and questions towards a statistical understanding of DNNs
Abstract: So far, our statistical understanding of the learning mechanisms of deep neural networks (DNNs) is rather limited. Part of the reason for this lack of understanding is that in many cases the tools of classical statistical learning theory can no longer be applied. In this talk, I will present some thoughts and possible questions that may be relevant for a successful end-to-end analysis of DNNs. In particular, I will discuss the role of initialization, over-parameterization, global minima, and optimization procedures.
1 July 2020
TBA

8 July 2020
TBA

15 July 2020
TBA

All interested parties are cordially invited.

For questions, please contact:

##### Ms. Andrea Fiebig

Email: fiebig@mathematik.hu-berlin.de
Phone: +49-30-2093-45460
Fax: +49-30-2093-45451
Humboldt-Universität zu Berlin
Institut für Mathematik
Unter den Linden 6
10099 Berlin, Germany