Application of Shrinkage Techniques in Logistic Regression Analysis: A Case Study
In: Statistica Neerlandica: journal of the Netherlands Society for Statistics and Operations Research, Volume 55, Issue 1, pp. 76-88
ISSN: 1467-9574
Logistic regression analysis can be used to develop a predictive model for a dichotomous medical outcome, such as short‐term mortality. When the data set is small compared to the number of covariables studied, shrinkage techniques may improve predictions. We compared the performance of three variants of shrinkage techniques: 1) a linear shrinkage factor, which shrinks all coefficients by the same factor; 2) penalized maximum likelihood (or ridge regression), where a penalty factor is added to the likelihood function such that coefficients are shrunk individually according to the variance of each covariable; 3) the Lasso, which shrinks some coefficients to zero by setting a constraint on the sum of the absolute values of the coefficients of standardized covariables.

Logistic regression models were constructed to predict 30‐day mortality after acute myocardial infarction. Small data sets were created from a large randomized controlled trial, half of which provided independent validation data. We found that all three shrinkage techniques improved the calibration of predictions compared to the standard maximum likelihood estimates. This study illustrates that shrinkage is a valuable tool to overcome some of the problems of overfitting in medical data.
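The three shrinkage variants named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The data are synthetic, the fitting routine (proximal gradient descent) is an assumption chosen for brevity, and the shrinkage factor `s = 0.8` and the penalties `l2 = 0.1` and `l1 = 0.05` are arbitrary illustrative values rather than estimates from the study. All function and variable names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal step for the L1 penalty: shrink towards zero, clip at zero.
    return math.copysign(max(abs(v) - t, 0.0), v)

def standardize(X):
    # Centre each covariable and scale it to unit standard deviation.
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
        cols.append([(v - m) / sd for v in col])
    return [list(row) for row in zip(*cols)]

def fit_logistic(X, y, l2=0.0, l1=0.0, lr=0.5, n_iter=2000):
    # Minimise mean negative log-likelihood + (l2/2)*sum(b^2) + l1*sum(|b|).
    # The intercept b0 is left unpenalized. l2 = l1 = 0 gives plain ML.
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= lr * g0
        b = [soft_threshold(bj - lr * (gj + l2 * bj), lr * l1)
             for bj, gj in zip(b, g)]
    return b0, b

def refit_intercept(lp, y):
    # Bisection for the intercept that makes the mean predicted risk equal
    # the observed event rate, keeping the shrunken linear predictor fixed.
    target = sum(y)
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if sum(sigmoid(mid + v) for v in lp) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Synthetic small data set (assumption: 60 patients, 4 covariables).
random.seed(0)
n, p = 60, 4
true_b0, true_b = -1.0, [1.0, -0.5, 0.0, 0.0]
X_raw = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if random.random() < sigmoid(true_b0 + sum(b * x for b, x in zip(true_b, row)))
     else 0 for row in X_raw]
X = standardize(X_raw)

# Standard maximum likelihood estimates.
b0_ml, b_ml = fit_logistic(X, y)

# 1) Linear shrinkage: one common factor s, intercept re-estimated.
s = 0.8  # arbitrary illustrative value
b_lin = [s * b for b in b_ml]
lp = [sum(b * x for b, x in zip(b_lin, row)) for row in X]
b0_lin = refit_intercept(lp, y)

# 2) Penalized maximum likelihood (ridge): quadratic penalty.
b0_ridge, b_ridge = fit_logistic(X, y, l2=0.1)

# 3) Lasso: absolute-value penalty, which may set coefficients exactly to zero.
b0_lasso, b_lasso = fit_logistic(X, y, l1=0.05)

for name, coefs in [("ML", b_ml), ("linear", b_lin),
                    ("ridge", b_ridge), ("lasso", b_lasso)]:
    print(name, [round(c, 3) for c in coefs])
```

In practice the shrinkage factor and the penalty weights would be estimated from the data, for example by bootstrapping or cross-validation, rather than fixed in advance as they are here.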