Corelab Seminar

Argyris Mouzakis
Advances in Private Statistics: The Case for Private Covariance Estimation

In recent years, the evolution of machine learning models and their frequent dependence on sensitive data has resulted in increasing concerns regarding how user privacy is ensured. This has led to the development of private statistics, a research area that aims to design accurate estimators for all common statistical estimation problems, while also ensuring the privacy of individuals contributing their data. The notion of privacy commonly involved in this setting is Differential Privacy (DP). Roughly speaking, an estimator is differentially private when its output is not significantly affected by the perturbation of a single datapoint. This talk will focus on the problem of private covariance estimation, surveying recent works I have contributed to. The talk will be split into two parts. First, we will present differentially private estimators under concentrated DP and pure DP for distributions that may have only a finite number of bounded moments (e.g., heavy-tailed distributions). To achieve these privacy guarantees, it is known to be necessary to have a priori bounds on the parameters of the underlying distribution. However, that is not the case with approximate DP, which is a weaker notion of privacy than the previous two. Thus, in the second part of the talk, we will present the first private and computationally efficient estimator for Gaussian distributions that requires no prior bounds on the covariance matrix.
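To make the DP notion above concrete, here is a minimal sketch (not the estimator from the papers) of how a basic Gaussian mechanism can privatize an empirical covariance: each datapoint is clipped to a norm bound so that one individual's influence is limited, and symmetric Gaussian noise calibrated to that sensitivity is added. The function name, the clipping bound, and the parameter choices are illustrative assumptions.

```python
import numpy as np

def private_covariance(X, epsilon, delta, bound, rng=None):
    """Illustrative Gaussian-mechanism sketch for (epsilon, delta)-DP
    covariance estimation; `bound` is an assumed a priori norm bound."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Clip each row to L2 norm at most `bound`, limiting any single
    # datapoint's influence on the empirical covariance.
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    X = X * np.minimum(1.0, bound / norms)
    cov = X.T @ X / n
    # Replacing one clipped row changes X^T X / n by at most
    # 2 * bound^2 / n in Frobenius norm.
    sensitivity = 2 * bound**2 / n
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noise = rng.normal(0.0, sigma, size=(d, d))
    noise = (noise + noise.T) / 2  # symmetrize so the output is a valid estimate
    return cov + noise
```

Note the role of `bound`: under pure and concentrated DP such an a priori bound is provably necessary, which is exactly the limitation the second part of the talk removes for approximate DP.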
The talk is intended to be as self-contained as possible and will include pointers to the broader private statistics literature. It is based on two papers, entitled "Private Covariance Estimation of Heavy-Tailed Distributions" (joint work with Gautam Kamath, soon to appear on arXiv) and "A Private and Computationally-Efficient Estimator for Unbounded Gaussians" (joint work with Gautam Kamath, Vikrant Singhal, Thomas Steinke and Jon Ullman, see