Machine Learning Toolbox for Social Scientists: Applied Predictive Analytics with R
Cover -- Half Title -- Title Page -- Copyright Page -- Table of Contents -- Preface
1 How We Define Machine Learning
2 Preliminaries -- 2.1 Data and Dataset Types -- 2.1.1 Cross-Sectional -- 2.1.2 Time-Series -- 2.1.3 Panel -- 2.2 Plots -- 2.3 Probability Distributions with R -- 2.4 Regressions -- 2.4.1 Ordinary Least Squares (OLS) -- 2.4.2 Maximum Likelihood Estimators -- 2.4.3 Estimating MLE with R -- 2.5 BLUE -- 2.6 Modeling the Data -- 2.7 Causal vs. Predictive Models -- 2.7.1 Causal Models -- 2.7.2 Predictive Models -- 2.8 Simulation
Part 1 Formal Look at Prediction
3 Bias-Variance Tradeoff -- 3.1 Estimator and MSE -- 3.2 Prediction - MSPE -- 3.3 Biased Estimator as a Predictor -- 3.4 Dropping a Variable in a Regression -- 3.5 Uncertainty in Estimations and Predictions -- 3.6 Prediction Interval for Unbiased OLS Predictor
4 Overfitting
Part 2 Nonparametric Estimations
5 Parametric Estimations -- 5.1 Linear Probability Models (LPM) -- 5.2 Logistic Regression -- 5.2.1 Estimating Logistic Regression -- 5.2.2 Cost Functions -- 5.2.3 Deviance -- 5.2.4 Predictive Accuracy
6 Nonparametric Estimations - Basics -- 6.1 Density Estimations -- 6.2 Kernel Regressions -- 6.3 Regression Splines -- 6.4 MARS - Multivariate Adaptive Regression Splines -- 6.5 GAM - Generalized Additive Model
7 Smoothing -- 7.1 Using Bins -- 7.2 Kernel Smoothing -- 7.3 Locally Weighted Regression loess() -- 7.4 Smooth Spline Regression -- 7.5 Multivariate Loess
8 Nonparametric Classifier - kNN -- 8.1 mnist Dataset -- 8.2 Linear Classifiers (again) -- 8.3 k-Nearest Neighbors -- 8.4 kNN with Caret -- 8.4.1 mnist_27 -- 8.4.2 Adult Dataset
Part 3 Self-Learning
9 Hyperparameter Tuning -- 9.1 Training, Validation, and Test Datasets -- 9.2 Splitting the Data Randomly -- 9.3 k-Fold Cross-Validation -- 9.4 Cross-Validated Grid Search.