Factor Augmented Sparse Throughput Deep ReLU Neural Networks for High Dimensional Regression
We introduce a Factor Augmented Sparse Throughput (FAST) model that utilizes both latent factors and sparse idiosyncratic components for nonparametric regression. The FAST model bridges factor models on one end and sparse nonparametric models on the other. It encompasses structured nonparametric models such as the factor augmented additive model and sparse low-dimensional nonparametric interaction models, and it covers the cases where the covariates do not admit factor structures. Using diversified projections to estimate the latent factor space, we apply truncated deep ReLU networks to nonparametric factor regression without regularization and to the more general FAST model with nonconvex regularization, resulting in the factor augmented regression neural network (FAR-NN) and FAST-NN estimators, respectively. We show that the FAR-NN and FAST-NN estimators adapt to the unknown low-dimensional structure of hierarchical composition models and attain nonasymptotic minimax rates. We also study statistical learning for the factor augmented sparse additive model using a more specific neural network architecture. Our results are applicable to weakly dependent cases without factor structures. In proving the main technical result for FAST-NN, we establish a new deep ReLU network approximation result that contributes to the foundation of neural network theory. Our theory and methods are further supported by simulation studies and an application to macroeconomic data.
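The two-step estimation idea described above can be sketched numerically. The sketch below is illustrative only: the projection matrix, simulated factor model, and the plain linear second stage are assumptions for brevity, standing in for the truncated deep ReLU networks used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 500, 100, 3          # samples, covariate dimension, latent factors

# Simulate a factor model: x = B f + u, with y depending on f nonlinearly.
B = rng.normal(size=(p, r))
F = rng.normal(size=(n, r))
X = F @ B.T + 0.3 * rng.normal(size=(n, p))
y = np.sin(F[:, 0]) + F[:, 1] ** 2 + 0.1 * rng.normal(size=n)

# Step 1: diversified projection -- project X onto a fixed sign matrix W with
# more columns than the (unknown) number of factors; X W / p approximately
# spans the latent factor space without estimating r exactly.
W = rng.choice([-1.0, 1.0], size=(p, 2 * r))
F_hat = X @ W / p

# Step 2: regress y on the estimated factors.  The talk uses truncated deep
# ReLU networks here; a linear fit keeps the sketch short.
D = np.column_stack([np.ones(n), F_hat])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
y_hat = D @ coef
print("in-sample R^2:", 1 - np.mean((y - y_hat) ** 2) / np.var(y))
```

The second stage would be replaced by a (possibly regularized) neural network in the actual FAR-NN/FAST-NN procedures; only the factor-augmentation pipeline is shown here.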
This is joint work with Yihong Gu.
The Ohio State University
Predictive Model Degrees of Freedom in Linear Regression
Overparametrized interpolating models have drawn increasing attention in machine learning. Some recent studies suggest that regularized interpolating models can generalize well. This phenomenon seemingly contradicts the conventional wisdom that interpolation tends to overfit the data and may perform poorly on test data. Further, it appears to defy the bias-variance trade-off. As one shortcoming of the existing theory, the classical notion of model degrees of freedom fails to explain the intrinsic differences among interpolating models, since it focuses on estimating the in-sample prediction error. This motivates an alternative measure of model complexity that can differentiate those interpolating models and take different test points into account. In particular, we propose a measure with a proper adjustment based on the squared covariance between the predictions and observations. Our analysis of the least squares method reveals some interesting properties of the measure, which can reconcile the "double descent" phenomenon with the classical theory. This opens doors to an extended definition of model degrees of freedom in modern predictive settings.
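For context, the classical covariance-based notion of degrees of freedom that the abstract contrasts with can be checked numerically: for in-sample least squares prediction it equals the trace of the hat matrix. The sketch below is a Monte Carlo illustration of that classical definition only; the talk's adjusted measure based on the squared covariance is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 50, 5, 1.0
X = rng.normal(size=(n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T     # least-squares hat matrix
mu = X @ rng.normal(size=p)              # true mean vector

# Classical (covariance-based) degrees of freedom:
#   df = (1/sigma^2) * sum_i Cov(yhat_i, y_i),
# estimated by Monte Carlo over repeated draws of the noise.
reps = 20000
Y = mu + sigma * rng.normal(size=(reps, n))   # reps independent response vectors
Yhat = Y @ H                                   # in-sample OLS fits (H is symmetric)
cov_i = ((Y - mu) * (Yhat - Yhat.mean(axis=0))).mean(axis=0)
df_mc = cov_i.sum() / sigma ** 2
print("Monte Carlo df:", df_mc, " trace(H):", np.trace(H))  # both close to p = 5
```

Because this quantity is tied to in-sample prediction, it cannot distinguish among models that all interpolate the training data, which is the gap the proposed measure addresses.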
This is joint work with Bo Luan and Yunzhang Zhu.
Statistical Learning with Low-resolution Information: There is No Free Lunch
Imprecise probabilities alleviate the need for high-resolution and unwarranted assumptions in statistical modeling and risk assessment. They present an alternative strategy for reducing irreplicable findings. However, updating imprecise models requires the user to choose among alternative updating rules. Competing rules can result in incompatible inferences and exhibit dilation, contraction, and sure loss, unsettling phenomena that cannot occur with precise probabilities and the regular Bayes rule. We revisit some famous statistical paradoxes and show that the logical fallacy stems from a set of marginally plausible yet jointly incommensurable model assumptions, akin to the trio of phenomena above. Discrepancies among the generalized Bayes (B) rule, Dempster's (D) rule, and the Geometric (G) rule as competing updating rules are discussed. We note that 1) the B-rule can neither contract nor induce sure loss, but is the most prone to dilation due to "overfitting" in a certain sense; 2) in the absence of prior information, both the B-rule and the G-rule are incapable of learning from data, however informative the data may be; 3) the D-rule and the G-rule can mathematically contradict each other, with one contracting while the other dilates. These findings highlight the invaluable role of judicious judgment in handling low-resolution information, and the care that needs to be taken when applying updating rules to imprecise probability models.
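Dilation under the generalized Bayes (B) rule is easy to demonstrate numerically. The toy credal set below is a standard textbook-style construction, not an example from the talk: two binary variables with precisely known marginals but completely unknown dependence.

```python
import numpy as np

# Credal set: all joint pmfs for binary (X, Y) with both marginals fixed at
# 1/2 but the dependence left free, indexed by q = P(X=1, Y=1) in [0, 1/2].
qs = np.linspace(0.0, 0.5, 501)

def joint(q):
    # Rows index X in {0, 1}; columns index Y in {0, 1}.
    return np.array([[q, 0.5 - q],
                     [0.5 - q, q]])

# Before updating, every element of the set gives P(X=1) = 1/2 exactly.
prior = np.array([joint(q)[1, :].sum() for q in qs])

# Generalized Bayes rule: condition each element of the set on Y = 1 and
# report the envelope (inf, sup) of the resulting posteriors.
post = np.array([joint(q)[1, 1] / joint(q)[:, 1].sum() for q in qs])
print("P(X=1) envelope before:", (prior.min(), prior.max()))
print("P(X=1 | Y=1) envelope after:", (post.min(), post.max()))
```

Observing Y = 1 (or, symmetrically, Y = 0) stretches the precise prior value 1/2 into the vacuous interval [0, 1]: the data strictly dilate the inference, which cannot happen with a single precise prior and the regular Bayes rule.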
[This talk is based on the discussion article: Gong and Meng, "Judicious Judgment Meets Unsettling Updating: Dilation, Sure Loss, and Simpson's Paradox," Statistical Science 36(2): 169-190, May 2021. DOI: 10.1214/19-STS765.]
University of California Irvine
Query-augmented Active Metric Learning
We propose an active metric learning method for clustering with pairwise constraints. The proposed method actively queries the labels of informative instance pairs, while estimating the underlying metrics by incorporating unlabeled instance pairs, leading to a more accurate and efficient clustering process. In particular, we augment the queried constraints by generating more pairwise labels to provide additional information for learning a metric that enhances clustering performance. Furthermore, we increase the robustness of metric learning by updating the learned metric sequentially and penalizing irrelevant features adaptively. Specifically, we propose a new active query strategy that evaluates the information gain of instance pairs more accurately by incorporating the neighborhood structure, which improves clustering efficiency without extra labeling cost. In theory, we provide a tighter error bound for the proposed metric learning method, which utilizes augmented queries, compared with methods using existing constraints only. We also investigate the improvement gained by using the active query strategy instead of random selection. Numerical studies on simulation settings and real datasets indicate that the proposed method is especially advantageous when the signal-to-noise ratio between significant features and irrelevant features is low.
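The core active-query loop can be caricatured in a few lines. The sketch below is a deliberately simplified stand-in: it scores candidate pairs by the entropy of a logistic must-link probability under the current metric and queries the most uncertain pair, whereas the talk's criterion is a more refined information gain that also incorporates neighborhood structure. The logistic link and the temperature `tau` are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))   # 30 instances, 4 features
w = np.ones(4)                 # current diagonal metric weights

def pair_entropy(xi, xj, w, tau=1.0):
    # Must-link probability for (xi, xj) under the current metric, via a
    # logistic link on the weighted squared distance, and its entropy.
    d2 = np.sum(w * (xi - xj) ** 2)
    prob = 1.0 / (1.0 + np.exp(d2 - tau))
    prob = np.clip(prob, 1e-12, 1 - 1e-12)
    return -(prob * np.log(prob) + (1 - prob) * np.log(1 - prob))

# Active query: ask the oracle about the most uncertain (highest-entropy) pair.
scores = {(i, j): pair_entropy(X[i], X[j], w)
          for i, j in combinations(range(len(X)), 2)}
query = max(scores, key=scores.get)
print("query pair:", query, "entropy:", scores[query])
```

In the full method, the answered query (plus the augmented pairwise labels it generates) would be fed back into the metric update before the next query is selected.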
Grace Y. YI
University of Western Ontario
Boosting Learning of Censored Survival Data
Survival data frequently arise from cancer research, biomedical studies, and clinical trials. Survival analysis has attracted extensive research interest over the past five decades, and numerous modeling strategies and inferential procedures have been developed in the literature. In this talk, I will start with a brief introductory overview of classical survival analysis, which centers on statistical inference, and then discuss a boosting method that focuses on prediction. While boosting methods are well known in the field of machine learning, they have also been broadly discussed in the statistical community for various settings, especially for cases with complete data. This talk concerns survival data, which typically involve censored responses. Three adjusted loss functions are proposed to address the effects of right-censored responses without imposing a specific model, and an unbiased boosting estimation method is developed. Theoretical results, including consistency and convergence, are established. Numerical studies demonstrate the promising finite sample performance of the proposed method.
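To make the censoring adjustment concrete, the sketch below runs L2 boosting with stumps on an inverse-probability-of-censoring-weighted (IPCW) squared loss, one standard way to obtain an unbiased surrogate loss under right censoring. This is an illustrative assumption, not necessarily one of the three adjusted losses proposed in the talk; the censoring distribution is taken as known because the data are simulated, whereas in practice it would be estimated (e.g., by Kaplan-Meier).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(0.0, 2.0, size=n)
t = np.exp(0.5 * x) + 0.1 * rng.normal(size=n)   # true survival times
c = rng.uniform(0.5, 6.0, size=n)                # censoring times
y = np.minimum(t, c)                             # observed (possibly censored) time
delta = (t <= c).astype(float)                   # 1 = event observed, 0 = censored

def G(u):
    # Censoring survival function P(C > u); known here by construction.
    return np.clip((6.0 - u) / 5.5, 1e-3, 1.0)

# IPCW weights: censored responses get weight 0, uncensored ones are
# up-weighted so the weighted squared loss is unbiased for the full-data loss.
w = delta / G(y)

def fit_stump(x, r, w, n_splits=20):
    # Best single-split regressor for the weighted residuals.
    best = None
    for s in np.quantile(x, np.linspace(0.05, 0.95, n_splits)):
        left = x <= s
        if w[left].sum() <= 0 or w[~left].sum() <= 0:
            continue
        ml = np.average(r[left], weights=w[left])
        mr = np.average(r[~left], weights=w[~left])
        sse = np.sum(w * (r - np.where(left, ml, mr)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, s, ml, mr)
    return best[1:]

# L2 boosting on the IPCW-adjusted squared loss, with shrinkage 0.1.
f = np.full(n, np.average(y, weights=w))
for _ in range(100):
    s, ml, mr = fit_stump(x, y - f, w)
    f += 0.1 * np.where(x <= s, ml, mr)

print("weighted training MSE:", np.average((y - f) ** 2, weights=w))
```

Each boosting step fits a stump to the current residuals under the adjusted loss, so censored observations never contribute a biased residual while uncensored ones are reweighted to compensate.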