class: center, middle, inverse, title-slide # IDS 702: Module 4.2 ## Multilevel/hierarchical linear models ### Dr. Olanrewaju Michael Akande --- ## Mativating example: the radon analysis - As a motivating example, we will look at data on radon levels of houses within each of 85 counties in Minnesota. -- - The data is in the file `Radon.txt` on Sakai. -- - The full data actually includes data for more states but we will focus on just Minnesota like the textbook. -- - The U.S. Environmental Protection Agency and the Surgeon General’s Office have estimated that as many as 20,000 lung cancer deaths are caused each year by exposure to radon (reference [here](https://www.radon.com/radon_facts/)). -- - Radon is a cancer-causing radioactive gas and is the second leading cause of lung cancer. Unfortunately, you cannot see, smell or taste it. The most commonly used device for making short-term radon measurements in homes is the charcoal canister -- - Radon occurs naturally as an indirect decay product of uranium. -- - Given that counties are nested within states, thinking about a hierarchical model here makes sense. --- ## Mativating example: the radon analysis Variable | Description :------------- | :------------ radon | radon levels for each house log_radon | log(radon) state | state floor | lowest living area of each house: 0 for basement, 1 for first floor countyname | county names countyID | ID for the county names (1-85) fips | state + county fips code uranium | county-level soil uranium log_uranium | log(uranium) -- The response variable, radon (or log_radon) is continuous, so we need a (hierarchical/multilevel) regression framework. -- To ascertain that we need a multilevel model here, we should check for differences across counties during EDA. --- ## Hierarchical linear models - Hierarchical models (like the model for the school data) can be applied to regression contexts where observations are grouped -- - First we will only focus on models for linear regression. -- - However, the same ideas apply to logistic regression (as we will see soon), Poisson regression, etc. -- - Recall that a standard linear model with one predictor can be written as .block[ .small[ `$$y_i = \beta_0 + \beta_1 x_{i1} + \epsilon_i; \ \ \epsilon_i \sim N(0, \sigma^2); \ \ \ i = 1, \ldots, n.$$` ] ] - Now suppose that the observations fall into `\(J\)` groups, indexed by `\(j\)`. -- - Then there are several ways to take advantage of the group within the context of hierarchical models. --- ## Random intercepts model - First, we can let the intercept alone vary by group, if we think that the predictor has the same effect on each group, but the overall intercept (grand mean of the response) is different for each group. -- - This is known as the .hlight[random intercepts model] or the .hlight[varying-intercept model], and is often written as .block[ .small[ $$ `\begin{split} y_{ij} & = \beta_{0j} + \beta_1 x_{1ij} + \epsilon_{ij}; \ \ \ i = 1, \ldots, n_j; \ \ \ j = 1, \ldots, J \\ \epsilon_{ij} & \sim N(0, \sigma^2) \\ \beta_{0j} & \sim N(\beta_{0}, \tau_0^2). \end{split}` $$ ] ] where `\(i\)` indexes observations and `\(j\)` indexes groups. -- - The model can also be written as .block[ .small[ $$ `\begin{split} y_{ij} & = (\beta_{0} + \gamma_{0j}) + \beta_1 x_{1ij} + \epsilon_{ij}; \ \ \ i = 1, \ldots, n_j; \ \ \ j = 1, \ldots, J \\ \epsilon_{ij} & \sim N(0, \sigma^2) \\ \gamma_{0j} & \sim N(0, \tau_0^2). \end{split}` $$ ] ] --- ## Random intercepts model - Allows separate intercepts for each group, but shrinks estimates towards common value. <img src="4-2-hierarchical-linear-models_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" /> -- - Useful for repeated measurements, when the "groups" are individuals, e.g., we take a subject's blood pressure three times and include all three measurements in the model). -- - Also useful when some groups have small sample sizes, so that estimation of individual group means is highly variable. --- ## Random slopes model - We may want to let only the slopes vary by group, if we think that the predictor has a different effect on each group, but the overall intercept is the same for each group. -- - This is known as the .hlight[random slopes model] or the .hlight[varying-slope model], and is often written as .block[ .small[ $$ `\begin{split} y_{ij} & = \beta_{0} + \beta_{1j} x_{1ij} + \epsilon_{ij}; \ \ \ i = 1, \ldots, n_j; \ \ \ j = 1, \ldots, J \\ \epsilon_{ij} & \sim N(0, \sigma^2) \\ \beta_{1j} & \sim N(\beta_{1}, \tau_1^2). \end{split}` $$ ] ] -- - The model can also be written as .block[ .small[ $$ `\begin{split} y_{ij} & = \beta_{0} + (\beta_1 + \gamma_{1j}) x_{1ij} + \epsilon_{ij}; \ \ \ i = 1, \ldots, n_j; \ \ \ j = 1, \ldots, J \\ \epsilon_{ij} & \sim N(0, \sigma^2) \\ \gamma_{1j} & \sim N(0, \tau_1^2). \end{split}` $$ ] ] --- ## Random slopes model -- - Allows separate slopes for each group, but shrinks estimates towards common value. -- - Also useful when some groups have small sample sizes, so that estimation of slopes is highly variable. -- - The model implies the same intercept for each group. <img src="4-2-hierarchical-linear-models_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> --- ## Random slopes and intercepts model - We can also combine both ideas, that is, allow for the slopes and intercepts to both vary by group, if we think that the predictor has a different effect on each group, and the overall intercept is also different for each group. -- - This is known as the .hlight[random slopes and intercepts model] or the .hlight[varying-slope, varying-intercept model], and is often written as .block[ .small[ $$ `\begin{split} y_{ij} & = \beta_{0j} + \beta_{1j} x_{1ij} + \epsilon_{ij}; \ \ \ i = 1, \ldots, n_j; \ \ \ j = 1, \ldots, J \\ \epsilon_{ij} & \sim N(0, \sigma^2) \\ (\beta_{0j},\beta_{1j}) & \sim N_2((\beta_{0},\beta_{1}), \Sigma). \end{split}` $$ ] ] where `\(N_2(\boldsymbol{\mu},\Sigma)\)` is the bivariate normal distribution with mean `\(\boldsymbol{\mu}\)` and covariance matrix `\(\Sigma\)`. -- - The model can also be written as .block[ .small[ $$ `\begin{split} y_{ij} & = (\beta_{0} + \gamma_{0j}) + (\beta_1 + \gamma_{1j}) x_{1ij} + \epsilon_{ij}; \ \ \ i = 1, \ldots, n_j; \ \ \ j = 1, \ldots, J \\ \epsilon_{ij} & \sim N(0, \sigma^2) \\ (\gamma_{0j},\gamma_{1j}) & \sim N_2(\boldsymbol{0}, \Sigma). \end{split}` $$ ] ] --- ## Random slopes and intercepts model - Allows for separate slopes and intercepts for each group, but shrinks estimates towards common value -- - Useful when some groups have small sample sizes, so that estimation of slopes and intercepts is highly variable -- - `\((\gamma_{0j},\gamma_{1j})\)` are called .hlight[random effects] while `\((\beta_{0},\beta_{1})\)` are called .hlight[fixed effects]. Models with fixed and random effects are often called .hlight[mixed effects models]. <img src="4-2-hierarchical-linear-models_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- ## Mixed effects model - Use the .hlight[lmer] command in the .hlight[lme4] package in R to estimate the parameters using maximum likelihood (ML) or restricted maximum likelihood (REML) estimation. -- - Take STA 601/602 and/or STA 610 for information on fitting these models using Bayesian methods. -- - Also, note that the terms fixed effects, random effects, and mixed effects can have (very) different meanings in different fields. -- - So, we will not get too carried away with the terminology. -- - For us, the important thing will be to be able to distinguish between parameters that vary by group and those that do not. --- ## Model assessment and validation - Model assessment and validation from linear regression carries over. -- - You should still have linearity (by each group for varying slopes), independence of the errors (and also of the varying effects for each predictor), equal variance, and normality. -- - You should still look out for outliers and check for multicollinearity. -- - Model comparison between two multi-level models does not quite work the same way. -- - We will not dive deeply into estimation but basically, ML produces unbiased estimates for the fixed effects but not the random effects whereas REML produces unbiased estimates for the random effects. -- - When using the `anova` function in R, keep the random effects part the same when comparing two models (so you'll be comparing fixed effects). Use AIC or BIC to decide the form of the random effects. --- class: center, middle # What's next? ### Move on to the readings for the next module!