|Topic:||Statistical Exploitation of Unlabeled Data under High Dimensionality|
|Time:||9:00 am - 10:00 am|
|Venue:||Zoom Meeting (please refer to seminar PDF)|
|Speaker:||Dr. Jiwei Zhao|
We consider the benefits of unlabeled data in the semi-supervised learning setting under high dimensionality, for both parameter estimation and statistical inference. In particular, we address two important questions. First, can we use the labeled and unlabeled data together to construct a semi-supervised estimator whose convergence rate is faster than that of the supervised estimator? Second, can we construct confidence intervals or hypothesis tests that are guaranteed to be more efficient or more powerful than those based on the supervised estimator? We show that a semi-supervised estimator with a faster convergence rate exists under some conditions, and that implementing this optimal estimator requires a reasonably good estimate of the conditional mean function. For statistical inference, we mainly propose a safe approach that is guaranteed to be no worse than the supervised estimator in terms of statistical efficiency. Not surprisingly, if the conditional mean function is well estimated, our safe approach becomes semi-parametrically efficient. After developing the theory, I will also present some simulation results as well as a real data analysis. This is based on joint work with Siyi Deng (Cornell), Yang Ning (Cornell) and Heping Zhang (Yale).