Abstract
In a linear regression model, parameters of predictor variables represent the individual
effects of the variables. Estimating these parameters has always been an essential
part of a regression analysis. When there are strongly correlated variables in the
model, they generate multicollinearity, which makes the least squares estimates of
their parameters unreliable. In this case, a commonly used solution is to abandon
least squares regression in favor of ridge regression or principal component
regression, but these alternative methods are more complicated to implement and
to interpret. The sampling distributions of their estimators are also usually
unavailable.
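
As a quick illustration of the problem (my own sketch, not material from the talk), the following Python simulation repeatedly fits least squares to two predictors with correlation 0.99; the true parameter values and all settings are hypothetical. Across repeated samples, the individual estimates swing widely around the true values.

import numpy as np

rng = np.random.default_rng(0)
n, reps, rho = 50, 2000, 0.99     # sample size, replications, correlation
beta1, beta2 = 1.0, 1.0           # true parameters (hypothetical values)

fits = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(x1, x2) ~ rho
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    b, *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)
    fits.append(b)

fits = np.array(fits)
# Large spread: the standard deviations are about as large as the true values
# themselves, so the individual estimates are unreliable.
print("std. dev. of b1, b2 across samples:", fits.std(axis=0))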
In this talk, I argue that instead of abandoning least squares regression, we should
abandon the attempt to estimate the parameters of strongly correlated variables,
because (i) these parameters are not meaningful and (ii) they cannot be accurately
estimated by any regression method unless the sample size is very
large. How, then, do we analyze such variables? I propose that we analyze their
collective impact on the response variable. To this end, I introduce group effects, a
class of linear combinations of their parameters, to represent their collective impact,
and show that there are group effects that can be accurately estimated by their
minimum-variance unbiased linear estimators. These group effects also have clear
interpretations, and they characterize the region of the predictor variable space where
the least squares estimated model makes accurate predictions. They provide a means to
study strongly correlated variables through simple least squares regression, which is
useful for analyzing observational data from social science, environmental science,
and medical research, where strongly correlated variables are common.
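
To make the contrast concrete, here is a minimal sketch (my illustration; the particular group effect and the correlation value are assumptions, not taken from the talk). For any linear combination c'b of the least squares estimates, the sampling variance is proportional to c'(X'X)^(-1)c, so comparing this factor for an individual parameter and for a group effect shows where the precision comes from.

import numpy as np

rho = 0.99
XtX = np.array([[1.0, rho],
                [rho, 1.0]])              # X'X / n for two standardized predictors
XtX_inv = np.linalg.inv(XtX)

c_single = np.array([1.0, 0.0])              # picks out the individual effect beta_1
c_group = np.array([1.0, 1.0]) / np.sqrt(2)  # a group effect along the correlation direction

# Var(c'b_hat) = (sigma^2 / n) * c' (X'X/n)^(-1) c; compare the c-dependent factors.
print("variance factor for beta_1 alone    :", c_single @ XtX_inv @ c_single)  # ~50.3
print("variance factor for the group effect:", c_group @ XtX_inv @ c_group)    # ~0.50

In this configuration the group effect is estimated with roughly one hundredth the variance of either individual parameter, from the same least squares fit.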