The quiz will test the skills covered in homeworks 1 to 4. You will have 50 minutes, though the quiz may take you much less time, and you can leave lab once you are done. The quiz will be closed book. Any electronic devices in your possession must be turned off and remain in a bag on the floor. Technical skills tested will include matrix multiplication and transposition, inversion of 2x2 matrices, and use of sigma notation for sums. R coding will be tested by multiple choice questions. There will be questions on setting up equations for fitting a linear model, writing these equations in matrix form, and obtaining and interpreting the least squares fit to data.
This document generates different random quizzes each time it is compiled by Rmarkdown. The actual quiz will be a realization generated by this random process, or something very similar.
M1. Evaluate \({\mathbb{A}}{\mathbb{B}}\) when \[ {\mathbb{A}}= \begin{bmatrix} -2 & -1 \\ 2 & -1 \\ 3 & -2 \\ \end{bmatrix} , \quad {\mathbb{B}} = \begin{bmatrix} -1 & -2 \\ 1 & -2 \\ \end{bmatrix} \]
M2. For \({\mathbb{A}}\) as above, write down \({\mathbb{A}}^{{\scriptscriptstyle \mathrm{T}}}\).
M3. For \({\mathbb{B}}\) as above, find \({\mathbb{B}}^{-1}\) if it exists. If \({\mathbb{B}}^{-1}\) doesn’t exist, explain how you know this.
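Hand calculations like M1 to M3 can be checked in R. The following is a minimal sketch using hypothetical matrices chosen only for illustration (they are not the quiz values); it relies on the base R functions `%*%`, `t()`, `det()` and `solve()`.

```r
# Hypothetical matrices, for illustration only (not the quiz values)
A <- matrix(c(1, 2,
              0, 1,
              3, 1), nrow = 3, byrow = TRUE)   # 3 x 2
B <- matrix(c(2, 1,
              1, 1), nrow = 2, byrow = TRUE)   # 2 x 2

AB <- A %*% B      # matrix product: (3 x 2) times (2 x 2) gives 3 x 2
At <- t(A)         # transpose: 2 x 3

# A 2 x 2 matrix is invertible exactly when its determinant is non-zero
detB <- det(B)
Binv <- solve(B)   # solve() with a single argument returns the inverse

print(AB)
print(At)
print(Binv)
```

Multiplying `B %*% Binv` should return the 2x2 identity matrix, up to rounding error, which gives a quick check on an inverse computed by hand.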
S1. A basic exercise.
Evaluate \(\sum_{a=b}^c d\), where \(b\) and \(c\) are whole numbers with \(c\ge b\).
Calculate \(\sum_{i=0}^{20} (10-i)\).
Calculate \(\sum_{i=k}^{k+4} (i+3)\), where \(k\) is a whole number. Your answer should depend on \(k\).
Evaluate \(\sum_{i=1}^{30} 10 - \sum_{i=10}^{20} 20\).
Evaluate \(\sum_{i=1}^{24} \sqrt{j} - \sum_{i=3}^{26} \big(\sqrt{j} - 1\big)\), where \(j\) is a non-negative number.
Calculate \(\sum_{a=b}^c xa\), where \(b\) and \(c\) are whole numbers with \(c\ge b\). You can use the formula \(\sum_{i=1}^n i = n(n+1)/2\).
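Answers to sigma notation exercises can be checked numerically by building the index sequence and calling `sum()`. The sketch below uses hypothetical limits and values (`lo`, `hi`, `d` and `n` are illustrative names, not part of the quiz).

```r
# Hypothetical limits and constant, for illustration only
lo <- 3
hi <- 7
d  <- 2.5

# When the summand does not depend on the index, the same value is added
# once for each index from lo to hi, that is, (hi - lo + 1) times.
n_terms <- length(lo:hi)
s_const <- sum(rep(d, times = n_terms))

# Check the formula sum_{i=1}^n i = n(n+1)/2 for one value of n
n <- 10
s_direct  <- sum(1:n)
s_formula <- n * (n + 1) / 2

print(c(s_const, s_direct, s_formula))
```

A numerical check for a few values of the limits does not prove a general formula, but it catches most algebra slips.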
S2. An example involving sums of squares and products.
Show that \(\frac{1}{n}\sum_{i=1}^n \big(x_i-\bar x\big)\big(y_i - \bar y\big) = \frac{1}{n}\Big(\sum_{i=1}^n x_iy_i\Big) - \bar x\bar y\), where \(\bar x = \frac{1}{n}\sum_{i=1}^n x_i\) and \(\bar y = \frac{1}{n}\sum_{i=1}^n y_i\).
Let \({\mathbf{1}}=(1,1,\dots,1)\) and \({\mathbf{x}}=(x_1,x_2,\dots,x_n)\) be two vectors treated as \(n\times 1\) matrices. Use \(\Sigma\) notation to evaluate the matrix product \({\mathbf{1}}^{{\scriptscriptstyle \mathrm{T}}}{\mathbf{x}}\).
Evaluate \(\sum_{i=1}^n \big(x_i - \bar x\big)\), where \(\bar x = \frac{1}{n}\sum_{i=1}^n x_i\).
Calculate \(\frac{d}{dm}\sum_{i=1}^{n} (y_i -m x_i)^2\). (This will not be on the quiz, since the notes said differentiation to get the linear model will not be tested.)
Two useful formulas for summations are \(\sum_{i=1}^n i = \frac{n(n+1)}{2}\) and \(\sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6}\). Using these formulas, evaluate \(\sum_{i=1}^{10} (1-2i)^2\). It may be helpful to start by multiplying out the square.
Show that \(\frac{1}{n} \sum_{i=1}^n \big(x_i - \bar x\big)^2 = \Big(\frac{1}{n}\sum_{i=1}^n x_i^2\Big) -\bar x^2\), where \(\bar x = \frac{1}{n}\sum_{i=1}^n x_i\).
Let \(\mathbb{A} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}\) and let \({\mathbf{b}}= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}\). Use \(\sum\) notation to evaluate the matrix product \(\mathbb{A}{\mathbf{b}}\). Your solution should be a matrix for which each term is written as a summation.
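The two "show that" identities in S2 can be sanity-checked on simulated data before attempting the algebra. This is a sketch under the assumption that any numeric vectors will do; the data and the seed are arbitrary and do not come from the course.

```r
set.seed(1)    # arbitrary seed, for reproducibility
n <- 20
x <- rnorm(n)  # simulated data, for illustration only
y <- rnorm(n)

# (1/n) sum (x_i - xbar)(y_i - ybar)  versus  (1/n) sum x_i y_i - xbar * ybar
lhs_cov <- mean((x - mean(x)) * (y - mean(y)))
rhs_cov <- mean(x * y) - mean(x) * mean(y)

# (1/n) sum (x_i - xbar)^2  versus  (1/n) sum x_i^2 - xbar^2
lhs_var <- mean((x - mean(x))^2)
rhs_var <- mean(x^2) - mean(x)^2

print(all.equal(lhs_cov, rhs_cov))
print(all.equal(lhs_var, rhs_var))
```

`all.equal()` is used rather than `==` because the two sides agree only up to floating point rounding.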
R1. Using rep() and matrix().
Which of the following is the output of matrix(c(rep(0,times=4),rep(1,times=4)),ncol=2)
\[ (a) \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ \end{bmatrix} ; \quad (b) \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ \end{bmatrix} ; \quad (c) \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 0 & 0 \\ 1 & 1 \\ \end{bmatrix} ; \quad (d) \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ \end{bmatrix} \]
Which of the following successfully constructs the matrix \(\mathbb{A} = \begin{bmatrix}1 & 1\\2 & 2\\3 & 3\end{bmatrix}\)?
(a). \(\quad\) A <- matrix(c(1,1,2,2,3,3) ,nrow=3)
(b). \(\quad\) A <- cbind(c(1,1),c(2,2),c(3,3))
(c). \(\quad\) A <- t(matrix(c(1,1,2,2,3,3) ,nrow=2))
(d). \(\quad\) A <- c(c(1:3),c(1:3))
Which of the following successfully select the diagonal elements of the matrix \(\mathbb{A} = \begin{bmatrix}1 & 0\\2 & 3\end{bmatrix}\) represented in R by A<-matrix(c(1,2,0,3),2,2)?
(a). \(\quad\) A[c(1,1),c(2,2)]
(b). \(\quad\) A[rbind(c(1,1),c(2,2))]
(c). \(\quad\) A[cbind(c(1,1),c(2,2))]
(d). \(\quad\) A[matrix(c(TRUE,FALSE,FALSE,TRUE),2)]
(e). \(\quad\) all of (a,b,c,d)
(f). \(\quad\) none of (a,b,c,d)
(g). \(\quad\) (b) and (d) only
(h). \(\quad\) (a) and (b) only
Which of the following is the matrix \({\mathbb{A}}\) generated by
A <- t(matrix(c(rep(1,times=2),rep(3,times=2), 6, 4),ncol=3))
(a). \(\quad \mathbb{A} = \begin{bmatrix} 1 & 1 \\ 3 & 3 \\ 6 & 4 \end{bmatrix}\)
(b). \(\quad \mathbb{A} = \begin{bmatrix} 1 & 3 & 6 \\ 1 & 3 & 4 \end{bmatrix}\)
(c). \(\quad \mathbb{A} = \begin{bmatrix} 1 & 3 \\ 1 & 6 \\ 1 & 3 \end{bmatrix}\)
(d). \(\quad \mathbb{A} = \begin{bmatrix} 1 & 1 & 3 \\ 3 & 6 & 4 \end{bmatrix}\)
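The R1 questions turn on the fact that `matrix()` fills its entries column by column unless `byrow = TRUE` is given. The sketch below illustrates this with a hypothetical vector `v` (not one of the quiz inputs).

```r
# Hypothetical input vector, for illustration only
v <- c(rep(5, times = 2), rep(7, times = 3), 9)   # 5 5 7 7 7 9

M_col <- matrix(v, ncol = 2)                 # fills down the columns (3 x 2)
M_row <- matrix(v, ncol = 2, byrow = TRUE)   # fills across the rows (3 x 2)
M_t   <- t(matrix(v, nrow = 2))              # column fill, then transpose

print(M_col)
print(M_row)
print(identical(M_row, M_t))
```

Filling a matrix with 2 rows by column and then transposing gives the same result as filling a matrix with 2 columns by row.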
R2. Manipulating vectors and matrices in R.
Suppose X is a matrix in R. Which of the following is NOT equivalent to X?
(a). t(t(X))
(b). X %*% matrix(1,ncol(X))
(c). X*1
(d). X%*%diag(ncol(X))
Suppose we define an R vector by y <- c(3,NA,-1,4,NA,-2). What will y[y>0] give you?
(a). A vector of the positive elements and NA values of y.
(b). A vector of the negative elements of y.
(c). A vector of all NAs.
(d). A vector of TRUEs and FALSEs.
(e). A vector of TRUEs and FALSEs and NAs.
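Comparisons involving NA return NA, and this carries through to logical subsetting. The sketch below shows the behavior on a hypothetical vector `z` (not the quiz vector), alongside `which()`, which returns only the positions where the condition is TRUE.

```r
# Hypothetical vector, for illustration only
z <- c(2, NA, -5, 1)

cond     <- z > 0           # TRUE NA FALSE TRUE
by_logic <- z[cond]         # an NA in the index yields an NA in the result
by_which <- z[which(cond)]  # which() keeps only positions where cond is TRUE

print(cond)
print(by_logic)
print(by_which)
```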
Which of the following successfully select the first five odd elements of the vector \(x = c(1,2,3,4,5,6,7,8,9,10,11)\)? (Check all that apply. Do not check commands that will give an error.)
\(\square\) \(\quad\) x[rep(c(TRUE,FALSE),each=5)]
\(\square\) \(\quad\) x[rep(c(TRUE,FALSE),times=5)]
\(\square\) \(\quad\) x[rep(c(TRUE,FALSE),length=9)]
\(\square\) \(\quad\) x[rep(c(TRUE,FALSE)][1:5]
\(\square\) \(\quad\) x[rep(c("TRUE","FALSE"),5)]
\(\square\) \(\quad\) None of the above
\(\square\) \(\quad\) All of the above
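The options above differ mainly in how `rep()` builds the logical index. The sketch below contrasts the `each` and `times` arguments on a hypothetical vector `w` (not the quiz vector), and shows that a logical index shorter than the vector is recycled.

```r
# Hypothetical vector, for illustration only
w <- 1:6

idx_each  <- rep(c(TRUE, FALSE), each = 3)    # TRUE TRUE TRUE FALSE FALSE FALSE
idx_times <- rep(c(TRUE, FALSE), times = 3)   # TRUE FALSE TRUE FALSE TRUE FALSE

sel_each     <- w[idx_each]          # first three elements
sel_times    <- w[idx_times]         # every other element
sel_recycled <- w[c(TRUE, FALSE)]    # short logical index is recycled

print(sel_each)
print(sel_times)
print(sel_recycled)
```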
Suppose we define a vector x <- c(3,0,-1,4,0,-2). What will which(x==0) give you?
TRUE’s and FALSE’s.
F1. Recall the dataset uswages containing ten variables on 2000 subjects from the 1988 Current Population Survey.
head(uswages, n=4)
## wage educ exper race smsa ne mw so we pt
## 6085 771.60 18 18 0 1 1 0 0 0 0
## 23701 617.28 15 20 0 1 0 0 0 1 0
## 16208 957.83 16 9 0 1 0 0 1 0 0
## 2720 617.28 12 24 0 1 1 0 0 0 0
Suppose we want to fit a linear model using wage as response, with years of education and years of experience as predictors. Which of the following successfully constructs the matrix \(\mathbb{X}\) for a representation \({\mathbf{y}}={\mathbb{X}}{\mathbf{b}}+{\mathbf{e}}\)?
(a). X <- matrix(uswages$educ, uswages$exper)
(b). X <- matrix(rep(1,nrow(uswages)), uswages$educ, uswages$exper)
(c). X <- cbind(rep(1,nrow(uswages)), uswages$educ, uswages$exper)
(d). X <- cbind(uswages$educ, uswages$exper)
F2. If we want to fit the model using R function lm(), which of the following calls is correct?
(a). lm(wage ~ ., data = uswages)
(b). lm(y ~ x, data = uswages)
(c). lm(wage = educ + exper, data = uswages)
Explain briefly how you would check whether your proposed solution is correct.
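One possible check on a proposed design matrix is to compare it with the matrix that R builds from the model formula via `model.matrix()`. The sketch below uses a small hypothetical data frame `d` with made-up columns `y`, `u` and `v`, not the uswages data.

```r
# Hypothetical data frame, for illustration only
d <- data.frame(y = c(3, 5, 4, 8),
                u = c(1, 2, 3, 4),
                v = c(0, 1, 0, 2))

# Design matrix built by hand: a column of ones, then the two predictors
X <- cbind(rep(1, nrow(d)), d$u, d$v)

# Design matrix that lm() would use for the formula y ~ u + v
X_mm <- model.matrix(y ~ u + v, data = d)

same <- all.equal(X, unname(X_mm), check.attributes = FALSE)
print(same)
```

`unname()` and `check.attributes = FALSE` are needed because `model.matrix()` attaches column names and extra attributes that the hand-built matrix lacks.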
F1. Consider the linear model in the notes where detrended life expectancy is explained by detrended unemployment. We fitted the model lm1 <- lm(L_detrended~U_detrended) where L_detrended and U_detrended are vectors of length 68. Writing L_detrended as \(y_1,\dots,y_n\) and U_detrended as \(x_1,\dots,x_n\) with \(n=68\), give the equations that define this model mathematically both by writing an equation for a generic year \(i\) and by using matrix notation.
F2. For the data analysis above, explain how R computes the quantity coef(lm1).
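The least squares coefficients satisfy the normal equations \({\mathbb{X}}^{{\scriptscriptstyle \mathrm{T}}}{\mathbb{X}}{\mathbf{b}} = {\mathbb{X}}^{{\scriptscriptstyle \mathrm{T}}}{\mathbf{y}}\). The sketch below solves them directly on simulated data (the life expectancy data are not reproduced here, so `x`, `y` and the coefficients used to simulate them are hypothetical) and compares the result with `coef()` from `lm()`. Internally `lm()` uses a QR decomposition rather than forming the inverse, but the two agree up to rounding.

```r
set.seed(2)                                # arbitrary seed
n <- 68
x <- rnorm(n)                              # simulated predictor, illustration only
y <- 0.5 - 0.3 * x + rnorm(n, sd = 0.2)    # simulated response

X <- cbind(1, x)                           # design matrix with intercept column

# Solve the normal equations (X^T X) b = X^T y
b_normal <- solve(t(X) %*% X, t(X) %*% y)

fit  <- lm(y ~ x)
b_lm <- coef(fit)

print(cbind(normal = as.numeric(b_normal), lm = as.numeric(b_lm)))
```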
F1. Fitting a linear model by least squares.
library(faraway)
data("sat")
head(sat)
## expend ratio salary takers verbal math total
## Alabama 4.405 17.2 31.144 8 491 538 1029
## Alaska 8.963 17.6 47.951 47 445 489 934
## Arizona 4.778 19.3 32.175 27 448 496 944
## Arkansas 4.459 17.1 28.934 6 482 523 1005
## California 4.992 24.0 41.078 45 417 485 902
## Colorado 5.443 18.4 34.571 29 462 518 980
Which of the following would produce the design matrix \(\mathbb{X}\) for the model lm(total ~ ratio + expend, data = sat)?
(a). X <- matrix(rep(1, length(ratio)), ratio, expend)
(b). X <- matrix(1, ratio, expend)
(c). X <- cbind(rep(1, length(ratio)), ratio, expend)
(d). X <- cbind(1, ratio, expend)
(e). X <- cbind(ratio, expend)
F2. Consider our kicker data from homework 3.
data_nfl <- read.csv("FieldGoals2003to2006.csv",header = TRUE,skip=5)
head(data_nfl)
## Name Yeart Teamt FGAt FGt Team.t.1. FGAtM1 FGtM1 FGAtM2 FGtM2
## 1 Adam Vinatieri 2003 NE 34 73.5 NE 30 90.0 NA NA
## 2 Adam Vinatieri 2004 NE 33 93.9 NE 34 73.5 30 90.0
## 3 Adam Vinatieri 2005 NE 25 80.0 NE 33 93.9 34 73.5
## 4 Adam Vinatieri 2006 IND 19 89.4 NE 25 80.0 33 93.9
## 5 David Akers 2003 PHI 29 82.7 PHI 34 88.2 NA NA
## 6 David Akers 2004 PHI 32 84.3 PHI 29 82.7 34 88.2
Recall that we built the model \(y_i=mx_i+c_1z_{i,1}+c_2z_{i,2}+ \dots + c_{19}z_{i,19}+e_i\) where \(x_i\) is FGtM1 and \(z_{i,1}\) takes the value 1 when row \(i\) of the data corresponds to kicker 1 (i.e., for \(i=1,2,3,4\)) and 0 otherwise. Write the design matrix of the model. (You do not need to include specific values for \(x_i\).)
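The structure of such a design matrix, one numeric column followed by one 0/1 indicator column per kicker, can be sketched on a toy example. The data below are hypothetical (two kickers labeled "A" and "B" with made-up values), not the homework data.

```r
# Hypothetical toy data: two kickers, five rows in total
kicker <- c("A", "A", "B", "B", "B")
x <- c(80, 75, 90, 85, 70)          # stand-in for the previous-year value

# One indicator column per kicker: entry is 1 when the row belongs to that kicker
Z <- outer(kicker, unique(kicker), "==") * 1

# Design matrix: the x column followed by the indicator columns, no intercept
X <- cbind(x, Z)

# The same matrix from a formula with the intercept removed
X_mm <- model.matrix(~ 0 + x + kicker)

print(X)
print(all.equal(unname(X), unname(X_mm), check.attributes = FALSE))
```

Each row has exactly one 1 among the indicator columns, so the indicator columns sum to a column of ones; this is why the model has no separate intercept.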
Acknowledgements: Some questions are derived from https://genomicsclass.github.io/book. Some are derived from http://swirlstats.com/.
License: This material is provided under an MIT license.