Zhao Ren
University of Pittsburgh
Caldwell Hall Room 102

Robust Estimation under Huber's Contamination Model

This talk describes some new challenges and results in high-dimensional and nonparametric statistics under the celebrated Huber’s contamination model. We particularly focus on the influence of contamination on the minimax rates and the corresponding rate-optimal procedures.

The first part of the talk focuses on robust covariance matrix estimation. To deal with modern complex data sets, estimation procedures must not only take advantage of structural assumptions on the covariance matrix but also resist arbitrary sources of outliers. To this end, we define a new concept called matrix depth and propose maximizing the empirical matrix depth function to obtain a robust covariance matrix estimator. Under Huber’s contamination model, the proposed estimator is shown to achieve the minimax optimal rate under the spectral norm loss for estimating covariance/scatter matrices with various structures such as bandedness and sparsity.

We then revisit the classical nonparametric density estimation problem under Huber’s contamination model and consider various ℓ_p losses (1 ≤ p < ∞). We carefully study the effect of contamination on estimation through the following model indices: contamination proportion, smoothness of the target density, smoothness of the contamination density, and the choice of the loss function.

Finally, following the above framework, we further establish a general decision theory for robust statistics under Huber’s contamination model. When the loss is equivalent to the total variation distance, we propose a solution to a robust two-point testing problem, based on the Scheffé estimate, that leads to the construction of robust estimators adaptive to the proportion of contamination. Applying the general theory, we construct robust estimators for nonparametric density estimation, sparse linear regression and low-rank trace regression. We show that these new estimators achieve the minimax rate with optimal dependence on the contamination proportion. This testing procedure, the Scheffé estimate, also enjoys an optimal rate in the exponent of the testing error, which may be of independent interest.