Background The problem of collinearity due to high correlations between explanatory variables in multiple regression is often overlooked in epidemiological research. The assumption that covariates are independent implies that all pair-wise covariate associations should be negligible—an unlikely scenario for biological and epidemiological data. Small but significant departures from the assumption of independence can severely distort the interpretation of a model and the role of each covariate. If the relative impact of collinearity on the estimates is not understood, these effects can potentially obscure the conclusions of the study.
Methods The impact of collinearity must be assessed in relation to the model environment. Factors such as the relation of the response to the predictors, the sample size and the variation of the covariates can each exacerbate or relieve the symptoms of collinearity. We present a novel approach to assessing the overall uncertainty in the model estimates, which adjusts in relation to these factors. The index will help the researcher decide whether a result is biologically relevant or a consequence of the uncertainty generated by collinearity.
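To illustrate how these factors interact, the sketch below uses the standard textbook expression for the standard error of a single ordinary least squares slope, written in terms of the model R², the sample size, the spread of the response and covariate, and the VIF. It is not the index proposed here, whose form is not given in this abstract; the function name `coef_standard_error` and its parameter names are hypothetical.

```python
import math


def coef_standard_error(r2_model, vif, n, p, sd_y, sd_x):
    """Standard error of one OLS slope from summary quantities.

    Textbook identity (not the index proposed in this abstract):
        SE(b_j) = sqrt((1 - R^2) / (n - p - 1)) * (s_y / s_j) * sqrt(VIF_j)

    r2_model : R^2 of the full model
    vif      : variance inflation factor of predictor j
    n, p     : sample size and number of predictors (intercept excluded)
    sd_y     : sample standard deviation of the response
    sd_x     : sample standard deviation of predictor j
    """
    if not 0.0 <= r2_model <= 1.0:
        raise ValueError("r2_model must lie in [0, 1]")
    if vif < 1.0:
        raise ValueError("vif cannot be smaller than 1")
    dof = n - p - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need n > p + 1")
    if sd_x <= 0:
        raise ValueError("sd_x must be positive")
    return math.sqrt((1.0 - r2_model) / dof) * (sd_y / sd_x) * math.sqrt(vif)


# Illustrative inputs only; these are not values from the study.
baseline = coef_standard_error(r2_model=0.5, vif=1.0, n=50, p=2, sd_y=1.0, sd_x=1.0)
inflated = coef_standard_error(r2_model=0.5, vif=4.51, n=50, p=2, sd_y=1.0, sd_x=1.0)
print(inflated / baseline)  # equals sqrt(4.51), a little over 2
```

Under this identity the same VIF can matter more or less depending on the other terms: a larger sample, a higher model R² or a more variable covariate each shrink the standard error and so offset part of the inflation.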
Results We consider data from a paper by Lipkin (1988) in the American Journal of Clinical Nutrition. The study examines the role of factors associated with substantial calciuresis. A hypothetical model is proposed involving measures of calcium and potassium in the diet, two highly correlated predictors. Both produce positive coefficients when entered individually, but the sign of the coefficient for diet protein becomes negative when both are entered simultaneously. The variance inflation factor (VIF) of 4.51 suggests that the collinearity is not considerable (Belsley, 1991). However, when the VIF is adjusted using the model R², the impact appears more substantial than first thought. We propose an alternative diagnostic that uses these additional influences as a basis for assessing the impact of collinearity on the model estimates.
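For readers who want to relate the reported VIF to a correlation, the sketch below applies the standard definition VIF = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. It assumes a model with exactly two predictors, in which case R_j² is the squared pairwise correlation; the function names are hypothetical and only the figure 4.51 is taken from the text.

```python
import math


def vif_from_r2(r2_j):
    """VIF of predictor j, given R^2 from regressing it on the other predictors."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


def implied_r2(vif):
    """Invert the VIF definition to recover R_j^2."""
    if vif < 1.0:
        raise ValueError("vif cannot be smaller than 1")
    return 1.0 - 1.0 / vif


# Assumption: two predictors only, so R_j^2 is the squared pairwise correlation.
r2_j = implied_r2(4.51)
abs_corr = math.sqrt(r2_j)  # roughly 0.88 in absolute value
print(round(r2_j, 3), round(abs_corr, 2))
```

On this reading, a VIF that falls below commonly quoted cut-offs can still correspond to a strong pairwise correlation, which is consistent with the caution urged here against relying on the VIF alone.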
Conclusions The results of significance testing for collinear variables within multiple regression should not be the only criterion by which we judge whether collinearity is a problem. The role of collinearity must be carefully assessed and understood using an appropriate index. Measuring the impact of collinearity using overly simplistic diagnostics, such as the VIF, may lure a researcher into a false sense of assurance about the results. Conversely, a model consisting of highly collinear predictors may be relatively unaffected when considered in relation to other factors in the model.