An intermediate course in applied statistics, covering a range of topics in modeling and analysis of data including: review of simple linear regression, two-sample problems, one-way analysis of variance; multiple linear regression, diagnostics and model selection; two-way analysis of variance, multiple comparisons, and other selected topics. (4 Credits)
Advisory Pre-requisites: MATH 115 and one of STATS 180, STATS 250, STATS 280, STATS 412 or ECON 451. No credit granted to those who have completed or are enrolled in STATS 413.
We will build on material in STATS 250. Notes and labs for 250 are available online. STATS 280, STATS 412 and ECON 451 are more advanced and therefore entirely adequate preparation. STATS 180 is a sufficient background but will require some extra catch-up.
Homework will be graded on completeness. To get these points, the homework must include two statements titled “Sources” and “Please explain”.
The statement of sources must list all books, internet resources or other people consulted while completing the homework. For example, “Sources: notes only,” or, “Sources: collaboration with X and Y; web site Z,” or, “Sources: notes and office hours.” Any material taken from any source, such as the internet, must be properly acknowledged. Directly copied text must be in quotation marks. Directly copied equations must be explicitly referenced to the source. Unattributed copying from any source is plagiarism, and has potentially serious consequences.
The explanation request provides useful feedback for the GSI and instructor. For example, “Please explain: nothing, everything is clear,” or, “Please explain: I’m confused by summation notation. What is \(\sum_{i=1}^n j\)?” or, “Please explain: I wanted to add a label to a point on a graph. How do you do that in R?”
Homework will not be graded on correctness, to encourage independent work. The GSI may provide some feedback on correctness, but students are responsible for checking their work against posted solutions.
Fitting a linear model to a sample by least squares. (R script).
Toward a population version of the linear model. Random variables (from 401W17). Bivariate random variables (from 401W17).
Hypothesis testing and confidence intervals. Log transformations. Reading Sheather and other texts.
Homework 0. Setting up your laptop. Due in class on 1/3.
Homework 1. Introduction to R. Due in your lab on 1/11 or 1/12. Solutions, as a text file of R commands.
Homework 2. More swirl and matrix exercises. Due in your lab on 1/18 or 1/19. Solutions, as a PDF file or R script.
Homework 3. More swirl and fitting a linear model via matrix computations. Due in your lab on 1/25 or 1/26. Solutions.
Homework 4. Summation exercises and a historic data analysis. Due in your lab on 2/1 or 2/2. Solutions.
Homework 5. Probability exercises and an application to a randomized experiment. Due in your lab on 2/8 or 2/9. Solutions.
Homework 6. Variance, covariance and standard errors for fitting a linear model. Due in your lab on 2/15 or 2/16. Solutions.
Homework 7. Using a linear model for investigation of two populations. Due in your lab on 3/15 or 3/16. Solutions.
Homework 8. Making an F test, and exploring the t and F distributions. Due in your lab on 3/22 or 3/23. Solutions.
Homework 9. Analyzing a dataset: Investigating variables associated with hospital-acquired infection. Due in your lab on 4/12 or 4/13. Solutions.
Lab 1. Introduction to R. Slides for 1/4 and 1/5.
Lab 2. Practicing matrix operations. Slides for 1/11 and 1/12.
Lab 3. Using matrices for fitting a linear model by least squares. Slides for 1/18 and 1/19. Solutions.
Lab 4. Summation and quiz review. Slides for 1/25 and 1/26. Sample quiz.
Lab 5. Quiz 1.
Lab 6. Variance, covariance and standard errors for the linear model. Slides for 2/8 and 2/9.
Lab 7. Midterm review. Slides for 2/15 and 2/16.
Lab 8. Confidence intervals and hypothesis tests. Slides for 3/8 and 3/9. Solutions.
Lab 9. F tests for model selection and ANOVA. Slides for 3/15 and 3/16. Solutions.
Lab 10. Review for Quiz 2. Slides for 3/22 and 3/23. Solutions.
Lab 11. Quiz 2.
Lab 12. Data analysis. Code.
Lab 13. Final review. Slides and exercises for 4/12 and 4/13. Solutions for the exercises.
Quiz 2 will have 4 questions, with one question of each type (normal approximations; prediction; estimates and confidence intervals; F-tests) drawn randomly.
A list of questions in the quiz 2 generator with some solutions.
F tests.
Prediction intervals.
Confidence intervals for linear model coefficients.
Normal approximations and mean/variance calculations.
Linear model diagnostics.
Collinearity and its consequences.
Building and interpreting design matrices for models with interactions and factors.
Observational studies and causation.
The style and format of the final exam will be based on the practice final exam below. The exam will work through a data analysis, asking questions that require understanding of the statistical techniques used, how they are computed, and how they are interpreted.
Practice final exam, with solutions for questions not discussed in class.