class: center, middle, inverse, title-slide

# IDS 702: Module 2.4

## Model assessment and validation - binned residuals and ROC curves

### Dr. Olanrewaju Michael Akande

---

## Model assessment and validation

There are various types of residuals when working with generalized linear models (GLMs). For logistic regression in particular, we have

- .hlight[Response residuals]
.block[
.small[
`$$e_i = y_i - \hat{\pi}_i.$$`
]
]

--

- .hlight[Pearson residuals]
.block[
.small[
`$$e_i^P = \dfrac{y_i - \hat{\pi}_i}{\sqrt{\hat{\pi}_i(1-\hat{\pi}_i)}},$$`
]
]

  which are obtained by "normalizing" the response residuals by the estimated Bernoulli standard deviation.

--

- .hlight[Deviance residuals]
.block[
.small[
`$$e_i^D = \textrm{sign}(y_i - \hat{\pi}_i) \sqrt{2\left[y_i \textrm{log}\dfrac{1}{\hat{\pi}_i} + (1-y_i) \textrm{log}\dfrac{1}{1-\hat{\pi}_i} \right]},$$`
]
]

  which are the default in R when using the .hlight[residuals()] function. We will talk a bit more about deviance later, but deviance residuals represent the contributions of individual observations to the deviance (the squared deviance residuals sum to the deviance).

---

## Model assessment and validation

- Deviance residuals are usually the most appropriate for residual plots when working with GLMs.

--

- However, unlike what we had for linear regression, simply looking at the residuals does not work well here.

  + They are always positive when `\(Y=1\)` and always negative when `\(Y=0\)`.

--

  + Also, constant variance is not an assumption of logistic regression.

    <div class="question">
    Why is that the case?
    </div>

    Think about the properties of the Bernoulli distribution when we write `\(y_i | x_i \sim \textrm{Bernoulli}(\pi_i)\)`.

--

  + We also do not have normality of residuals to work with either.

---

## Model assessment and validation

- What we can do is check whether the function of predictors is well specified, using .hlight[binned residuals].

--

- We can assess the overall fit of our model using .hlight[deviance] and .hlight[change in deviance].

--

- We can also see how well our model predicts (model validation) using

  + Confusion matrix

  + ROC curves

---

## Binned residuals

- Compute raw (response) residuals from the fitted logistic regression.

- Order observations by the values of the predicted probabilities (or predictor values) from the fitted regression.

- Using the ordered data, form `\(g\)` bins of (approximately) equal size. Default: `\(g = \sqrt{n}\)`.

- Compute the average residual in each bin.

- Plot the average residual versus the average predicted probability (or average predictor value) for each bin.

- Use the .hlight[arm] package in R; a manual sketch of the same computation follows on the next slide.
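---

## Binned residuals: a manual sketch

To make the recipe above concrete, here is a minimal sketch of the binned residual computation done by hand; it assumes a fitted .hlight[glm] object named `fit` (a hypothetical name). In practice, .hlight[binnedplot()] from the .hlight[arm] package does all of this, and adds the SE bands, for you.

```r
# Binned residuals by hand, assuming a fitted logistic regression `fit`
pred      <- fitted(fit)                        # predicted probabilities
resid_raw <- residuals(fit, type = "response")  # raw residuals: y - pi_hat

ord  <- order(pred)                    # order observations by predicted probability
g    <- floor(sqrt(length(pred)))      # default number of bins
bins <- cut(seq_along(ord), breaks = g, labels = FALSE)  # ~equal-size bins

# Average predicted probability and average residual within each bin
avg_pred  <- tapply(pred[ord], bins, mean)
avg_resid <- tapply(resid_raw[ord], bins, mean)

plot(avg_pred, avg_resid, pch = 19,
     xlab = "Average predicted probability",
     ylab = "Average residual", main = "Binned residuals (by hand)")
abline(h = 0, lty = 2)
```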
---

## NBA analysis

Recall the NBA data

```r
nba <- read.csv("data/nba_games_stats_reduced.csv", header = T, stringsAsFactors = T)
nba <- nba[nba$Team == "SAS", ]
colnames(nba)[3] <- "Opp"

nba$win <- rep(0, nrow(nba))
nba$win[nba$WINorLOSS == "W"] <- 1
nba$win <- as.factor(nba$win)

nba$Opp_cent <- nba$Opp - mean(nba$Opp)

nbareg <- glm(win ~ Opp_cent, family = binomial(link = logit), data = nba)
```

---

## NBA analysis

```r
plot(nbareg, which = 1)
```

<img src="2-4-logistic-model-assessment_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

The residuals are the deviance residuals, while the predicted values are on the linear (logit) scale, that is, `\(\beta_0 + \beta_1 x_i\)`. Look for cases with large absolute residuals, which the model does not fit well; the plot is not too useful otherwise.

---

## NBA analysis

Plot binned raw residuals versus predicted probabilities (.hlight[arm] package).

```r
binnedplot(fitted(nbareg), residuals(nbareg, "resp"), xlab = "Pred. probabilities",
           col.int = "red4", ylab = "Avg. residuals",
           main = "Binned residual plot", col.pts = "navy")
```

<img src="2-4-logistic-model-assessment_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

Look for "randomness", with almost all points within the red lines.

---

## NBA analysis

- Useful as a "one-stop shopping" plot, especially when there are many predictors and you want an initial look at model adequacy.

--

- What we have is mostly good, although the model seems to struggle for fitted values over 0.95 or so.

--

- The red lines represent `\(\pm 2\)` SE bands, which we would expect to contain about 95% of the observations.

--

- Too few points here to draw any conclusions!

--

- You usually want many more data points before these plots start being useful.

---

## NBA analysis

Plot binned raw residuals versus individual predictors.

```r
binnedplot(nba$Opp_cent, residuals(nbareg, "resp"), xlab = "Opponent's points (centered)",
           col.int = "red4", ylab = "Avg. residuals",
           main = "Binned residual plot", col.pts = "navy")
```

<img src="2-4-logistic-model-assessment_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---

## NBA analysis

- Mostly good, although the model seems to struggle for low values of opponent's points.

--

- Also, too many points (16.7%) fall outside the bands.

--

- However, there are still too few points here for any conclusive takeaways.

--

- We also know some important predictors are missing by construction...

---

## Deviance

- To assess overall model fit, we can also look at .hlight[deviance].

--

- Deviance measures how well the model fits the data when compared to the .hlight[saturated model], that is, an abstract model that fits the sample perfectly.

--

- Precisely, deviance is defined as the difference of log-likelihoods between the fitted model and the saturated model:
  .block[
.small[
`$$D = - 2 \left[ \text{ Log Likelihood}(\text{Fitted Model}) - \text{ Log Likelihood}(\text{Saturated Model}) \right].$$`
]
]

--

- However, for binary data this "abstract saturated model" has likelihood equal to one (log-likelihood zero), so that deviance is simply
  .block[
.small[
`$$D = - 2\text{ Log Likelihood}(\text{Fitted Model}) = - 2 \sum_{i=1}^n \left[y_i \textrm{log}(\hat{\pi}_{i}) + (1-y_i) \textrm{log}(1-\hat{\pi}_{i})\right].$$`
]
]

--

- Note that .hlight[deviance is always greater than or equal to zero], and will only be zero if the fit is "perfect".

--

- Overall, deviance is a measure of error, so that .hlight[lower values of deviance mean a better fit to the data].

---

## Deviance

- Like the metrics used under MLR, it is often useful to compare the deviance of one model to that of another model. We will revisit this soon.

--

- For now, a model we can use for this comparison is the .hlight[null model], that is, the model with only the intercept.

--

- Intuitively, this gives us a sense of how much the model improves on the "worst model" through the addition of the predictors.

--

- The deviance of the null model, denoted `\(D_0\)`, is thus referred to as the .hlight[null deviance].

--

- To get a general sense of how much better the fitted model is than the null model, compare `\(D\)` to `\(D_0\)`, usually through the difference `\(D_0 - D\)`.

--

- The "larger" this .hlight[change in deviance] `\(D_0 - D\)` is, the more confident we are that the predictors we have included improve model fit.

--

- In large samples, `\(D_0 - D\)` has approximately a chi-squared distribution with degrees of freedom equal to the difference in the number of predictors between the two models; a by-hand sketch follows on the next slide.
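---

## Deviance: a by-hand sketch

As a minimal sketch connecting the formula to R output, the change in deviance and its chi-squared p-value can be computed directly from the `nbareg` fit. The .hlight[anova()] comparison shown shortly performs exactly this test.

```r
# Change-in-deviance test by hand, using the `nbareg` fit from earlier
D0 <- nbareg$null.deviance                 # deviance of the intercept-only model
D  <- deviance(nbareg)                     # deviance of the fitted model

change <- D0 - D                           # change in deviance
df <- nbareg$df.null - nbareg$df.residual  # difference in number of predictors

change
pchisq(change, df = df, lower.tail = FALSE)  # p-value from the chi-squared approximation
```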
---

## NBA analysis

For the NBA data, for example, we see what looks like a meaningful difference between the two deviance values.

```r
summary(nbareg)
```

```
## 
## Call:
## glm(formula = win ~ Opp_cent, family = binomial(link = logit), 
##     data = nba)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2760  -0.7073   0.4454   0.7902   1.9593  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.13387    0.15145   7.487 7.06e-14
## Opp_cent    -0.12567    0.01655  -7.594 3.11e-14
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 400.05  on 327  degrees of freedom
## Residual deviance: 313.42  on 326  degrees of freedom
## AIC: 317.42
## 
## Number of Fisher Scoring iterations: 5
```

---

## NBA analysis

- We can formalize this with a chi-squared test comparing the null model to our fitted model. That is,

```r
nbareg_null <- glm(win ~ 1, family = binomial(link = logit), data = nba)
anova(nbareg_null, nbareg, test = "Chisq")
```

```
## Analysis of Deviance Table
## 
## Model 1: win ~ 1
## Model 2: win ~ Opp_cent
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
## 1       327     400.05                      
## 2       326     313.42  1    86.63 < 2.2e-16
```

--

- The low p-value then confirms our previous statement.

--

- We will revisit this again when we look at logistic regression with multiple predictors.

--

- We will be able to use deviance for model comparison and selection by looking at the change in deviance `\(D_{M_1} - D_{M_2}\)`, for two models `\(M_1\)` and `\(M_2\)`, where `\(M_1\)` is nested within `\(M_2\)`.

---

## Confusion matrix

- We can use the estimated probabilities from our fitted model to predict outcomes, and then compare those predictions to the observed values.

--

- For example, we could decide to predict `\(Y=1\)` when the predicted probability exceeds `\(0.5\)` and predict `\(Y=0\)` otherwise.

--

- We can then determine how many cases we classify correctly and incorrectly.

--

- The resulting `\(2 \times 2\)` table is called the .hlight[confusion matrix].

--

- When misclassification rates are high, the model may not be an especially good fit to the data.

---

## Confusion matrix

<table>
<tr> <th> </th> <th> </th> <th colspan="2">Observed</th> </tr>
<tr> <th colspan="2"></th> <td style="text-align:center">Y=1</td> <td style="text-align:center">Y=0</td> </tr>
<tr> <th rowspan="2">Predicted</th> <td height="50px">Y=1</td> <td style="text-align:center">TP (True Positives)</td> <td style="text-align:center">FP (False Positives)</td> </tr>
<tr> <td height="50px">Y=0</td> <td style="text-align:center">FN (False Negatives)</td> <td style="text-align:center">TN (True Negatives)</td> </tr>
</table>

--

- .hlight[True positive rate (TPR)] = `\(\dfrac{TP}{TP+FN}\)` (also known as .hlight[sensitivity])

--

- .hlight[False negative rate (FNR)] = `\(\dfrac{FN}{TP+FN}\)`

--

- .hlight[True negative rate (TNR)] = `\(\dfrac{TN}{FP+TN}\)` (also known as .hlight[specificity])

--

- .hlight[False positive rate (FPR)] = `\(\dfrac{FP}{FP+TN}\)` (that is, 1 - .hlight[specificity])

---

## ROC Curves

- .block[We want high values of sensitivity and low values of (1 - specificity)!]

--

- The .hlight[receiver operating characteristic (ROC)] curve plots

  + Sensitivity on the Y axis

  + 1 - specificity on the X axis

--

- The curve is traced out by evaluating these at many different threshold values (not just 0.5).

--

- A good-fitting logistic regression has an ROC curve that bends toward the upper left corner, with .hlight[area under the curve (AUC)] near one.

--

- Make ROC curves in R using the .hlight[pROC] package; a hand-rolled sketch of what the curve traces out appears after this slide.

--

- By the way, we also often define .hlight[accuracy] as `\(\dfrac{TP + TN}{TP+FN+FP+TN}\)`. This estimates how well the model predicts correctly overall.
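---

## ROC curves: a manual sketch

To see what the ROC curve traces out, here is a minimal sketch that computes sensitivity and 1 - specificity over a grid of thresholds for the `nbareg` fit. In practice, .hlight[roc()] from the .hlight[pROC] package handles this (and the AUC) for you, as on the next slides.

```r
# Tracing out an ROC curve by hand for the `nbareg` fit
pred <- fitted(nbareg)        # predicted probabilities
obs  <- nba$win == "1"        # observed outcomes as TRUE/FALSE

thresholds <- seq(0, 1, by = 0.01)
sens <- fpr <- numeric(length(thresholds))

for (j in seq_along(thresholds)) {
  pred_pos <- pred >= thresholds[j]            # predict Y = 1 at this threshold
  sens[j]  <- sum(pred_pos & obs) / sum(obs)   # TP / (TP + FN)
  fpr[j]   <- sum(pred_pos & !obs) / sum(!obs) # FP / (FP + TN) = 1 - specificity
}

plot(fpr, sens, type = "l", xlab = "1 - Specificity", ylab = "Sensitivity")
abline(0, 1, lty = 2)         # reference line: a classifier with no skill
```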
---

## NBA analysis

Let's look at the confusion matrix for the NBA data. Load the .hlight[arm], .hlight[e1071], .hlight[caret], and .hlight[pROC] packages.

```r
Conf_mat <- confusionMatrix(as.factor(ifelse(fitted(nbareg) >= 0.5, "W", "L")),
                            nba$WINorLOSS, positive = "W")
Conf_mat$table
```

```
##           Reference
## Prediction   L   W
##          L  44  19
##          W  54 211
```

```r
Conf_mat$overall["Accuracy"]
```

```
## Accuracy 
## 0.777439 
```

```r
Conf_mat$byClass[c("Sensitivity", "Specificity")]
```

```
## Sensitivity Specificity 
##   0.9173913   0.4489796 
```

.hlight[confusionMatrix] produces a lot of output. Print the .hlight[Conf_mat] object to see all of it.

---

## NBA analysis

```r
invisible(roc(nba$win, fitted(nbareg), plot = T, print.thres = c(0.3, 0.5, 0.7),
              legacy.axes = T, print.auc = T, col = "red3"))
```

<img src="2-4-logistic-model-assessment_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

## NBA analysis

```r
invisible(roc(nba$win, fitted(nbareg), plot = T, print.thres = "best",
              legacy.axes = T, print.auc = T, col = "red3"))
```

<img src="2-4-logistic-model-assessment_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

class: center, middle

# What's next?

### Move on to the readings for the next module!