For the final project, you will apply the knowledge and skills that you have learned throughout this course to analyze a dataset that interests you.
The project should be an in-depth statistical analysis of a well-defined question. Many students base the final project on their own research interests or on topics and questions from another course. Nearly every discipline has questions that are amenable to statistical analysis, including economics, engineering, environmental studies, history, the natural sciences, psychology, and even sports, so there are many options to choose from.
If you do not have a dataset or an inferential problem in mind, please come talk to me AT LEAST TWO WEEKS before the proposal due date so we can explore ideas.
Each student MUST use GitHub. A blank repository will be created for this project. The link will be made available to you later. Do not worry if you are not very familiar with working with GitHub. More instructions will be provided later.
The final report should be concise and well written, and it MUST NOT exceed 5 pages (excluding references and the appendix). Your report should be written according to the following outline:
A few sentences describing the inferential question(s), the method used and the most important results.
A more in-depth introduction to the inferential question(s) of interest.
You should describe the data in this section: how you obtained the data, which variables are included, how you dealt with missing or erroneous values, what your exploratory data analysis showed, etc.
A detailed description of the model used, how you selected the model, how you selected the variables, model assessment, model validation, and presentation of the model results. What are your overall conclusions in the context of the inferential problem(s)?
In this section, you should present the importance of your findings, and describe any limitations of the study. You can also address future work here if there are extensions of your analysis you find interesting, especially those that may address some of the limitations already mentioned.
Grading will take into account the following:
Is it easy for your reader to understand what you did and the arguments you made?
Did you answer your question(s) of interest? You must be clear about what the questions are and how your model results directly answer the questions.
How good is the research question, and how relevant are the selected data to it? Did you tackle a challenging, interesting question (good), or did you just collect and publish descriptive statistics (bad)?
Are the statistical methods carried out and explained correctly?
Did you use statistical techniques wisely when addressing your question? That is, did you use an appropriate statistical method for the question(s) and data, or did you just select a very complicated model even though it clearly cannot answer the question(s) posed?
Quality of writing and explanation.
Some suggestions for scoring highly on these criteria, and suggestions to keep in mind whenever you write anything, include the following:
Each person should work individually. However, I do encourage you to discuss what you are doing with classmates, as this will improve your final product.
Can you use data that you are already using for a project in another class? That depends on the particulars of your proposed project, so come talk to me.
Any additional details will be communicated later.