Topic: | Statistical Exploitation of Unlabeled Data under High Dimensionality |
Date: | 16/12/2021 |
Time: | 9:00 am - 10:00 am |
Venue: | Zoom Meeting (please refer to seminar PDF) |
Category: | Seminars |
Speaker: | Dr. Jiwei Zhao |
Details: | ABSTRACT: We consider the benefits of unlabeled data in the semi-supervised learning setting under high dimensionality, for both parameter estimation and statistical inference. In particular, we address two questions. First, can we use the labeled data together with the unlabeled data to construct a semi-supervised estimator whose convergence rate is faster than that of the supervised estimator? Second, can we construct confidence intervals or hypothesis tests that are guaranteed to be more efficient or more powerful than those based on the supervised estimator? We show that a semi-supervised estimator with a faster convergence rate exists under some conditions, and that implementing this optimal estimator requires a reasonably good estimate of the conditional mean function. For statistical inference, we mainly propose a safe approach that is guaranteed to be no worse than the supervised estimator in terms of statistical efficiency. Not surprisingly, if the conditional mean function is well estimated, our safe approach becomes semi-parametrically efficient. After developing the theory, I will also present some simulation results as well as a real data analysis. This is based on joint work with Siyi Deng (Cornell), Yang Ning (Cornell) and Heping Zhang (Yale). |