Due: 1 hour after class ends
You will work in your pre-assigned teams. Each team should submit ONLY ONE report for this exercise. You must write the names of all team members at the top of the report containing your responses. You all must do the work using one student’s computer and R/RStudio.
Have one team member open R/RStudio on their computer and share their screen with the other team members within the breakout room. At the top of the team report, write “host” in parenthesis besides this student’s name. Have another team member be responsible for documenting the responses. At the top of the team report, write “writer” in parenthesis besides this student’s name.
NOTE: Generally, you will not be penalized for not taking on these roles many times during the semester. This is to simply ensure that you do switch the roles around a “decent number” of times within each team throughout the semester. That said, I will penalize any student who obviously dominates these roles over everyone else, so be sure to give other students an opportunity to do them.
You all should have R and RStudio installed on your computers by now. If you do not, first install the latest version of R here: https://cran.rstudio.com (remember to select the right installer for your operating system). Next, install the latest version of RStudio here: https://www.rstudio.com/products/rstudio/download/. Scroll down to the “Installers for Supported Platforms” section and find the right installer for your operating system.
Gradescope will let you select your team mates when submitting, so make sure to do so. Only one person needs to submit the sheet on Gradescope. You can submit your document in the most common formats, but pdf files are preferred. Submit on Gradescope here: https://www.gradescope.com/courses/280392/assignments. Be sure to submit under the right assignment entry.
The purpose of this exercise is to help you gain more experience working with multilevel linear regression models. The exercise is based on Exercise 6 of Section 12.11 of Data Analysis Using Regression and Multilevel/Hierarchical Models by Gelman A., and Hill, J. The data is from the following article:
Hamermesh, D. S. and Parker, A. (2005), “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity”, Economics of Education Review, v. 24 (4), pp. 369-376.
The data contains information about student evaluations of instructor’s beauty and teaching quality for several courses at the University of Texas from 2000 to 2002. Evaluations were carried out during the last 3 weeks of the 15-week semester. Students administer the evaluation instrument while the instructor is absent from the classroom. The beauty judgements were made later using each instructor’s pictures by six undergraduate students (3 women and 3 men) who had not attended the classes and were not aware of the course evaluations. The sample contains a total of 94 professors across 463 classes, with the number of classes taught by each professor ranging from 1 to 13. Underlying the 463 observations are 16,957 completed evaluations from 25,547 registered students. The data you will use for this exercise excludes some variables in the original dataset.
Read the article (available via Duke library) for more information about the problem.
Download the data (named Beauty.txt
) from Sakai and save it locally to the same directory as your R markdown file. To find the data file on Sakai, go to Resources \(\rightarrow\) Datasets \(\rightarrow\) In-Class Analyses. Once you have downloaded the data file into the SAME folder as your R markdown file, load and clean the data by using the following R code.
Remember to double-check the dimensions and first few rows of the data to confirm you have the right file.
<- read.table ("Beauty.txt", header=T, sep=" ") Beauty
Variable | Description |
---|---|
profnumber | Id for each professor |
beauty | Average of 6 standardized beauty ratings |
eval | Average course rating |
CourseID | Id for 30 individual courses. The remaining classes were not identified in the original data, so that they have value 0. |
tenured | Is the professor tenured? 1 = yes, 0 = no |
minority | Is the professor from a minority group? 1 = yes, 0 = no |
age | Professor’s age |
didevaluation | Number of students filling out evaluations |
female | 1 = female, 0 = male |
formal | Did the instructor dress formally (that is, wears tie–jacket/blouse) in the pictures used for the beauty ratings? 1 = yes, 0 = no |
lower | Lower division course? 1 = yes, 0 = no |
multipleclass | 1 = more than one professor teaching sections in course in sample, 0 = otherwise |
nonenglish | Did the Professor receive an undergraduate education from a non-English speaking country? 1 = yes, 0 = no |
onecredit | 1 = one-credit course, 0 = otherwise |
percentevaluating float | didevaluation/students |
profevaluation | Average instructor rating |
students | Class enrollment |
tenuretrack | Is the professor tenure track faculty? 1 = yes, 0 = no |
Treat the variable eval
as the response variable and the other variables as potential predictors.
Is the distribution of eval
normal? If not, try the log transformation. Does that look more “normal”? If you think the log transformation does not help, what other transformation(s) do you think might work? Examine and describe the distribution(s) for the transformation(s).
Describe the overall relationship between eval
and beauty
.
Examine the relationship between eval
and beauty
, by CourseID
(think EDA for interactions). Are there any courses for which the relationship between eval
and beauty
looks potentially different than others?
There are 31 levels of CourseID
in all, which might be tough to explore graphically, so you should probably take a look at a random sample of say 5 classes plus class CourseID
== 0, making a total of 10 classes. Note that level CourseID
== 0 actually includes so many other classes which were not identified in the data. For the purposes of this lab, we will treat that class as one single huge class. You can use the following skeleton code to take a look at a reduced version of the data containing a random sample of 6 classes plus class CourseID
== 0.
set.seed(011020) #set your own seed to be able to replicate results
<- c(0,sample(1:30,6,replace=F))
sample_courseID <- Data[is.element(Data$CourseID,sample_courseID),] Data_reduced
Do you think it is meaningful to fit a multilevel linear model that includes varying slopes for beauty
by profnumber
? Why or why not?
Now, explore the relationship between eval
and the other potential predictors, excluding profnumber
, profevaluation
, and CourseID
. Don’t include any of the plots, just briefly describe the most interesting relationships. We should not include profevaluation
as a predictor for eval
. Why?