Quiz 1, STATS 401 W18

The quiz will test the skills covered in homeworks 1 to 4. You will have 50 minutes, though the quiz may take you much less time, and you can leave lab once you are done. The quiz will be closed book. Any electronic devices in your possession must be turned off and remain in a bag on the floor. Technical skills tested will include matrix multiplication and transposition, inversion of 2x2 matrices, and use of sigma notation for sums. R coding will be tested by multiple choice questions. There will be questions on setting up equations for fitting a linear model, writing these equations in matrix form, and obtaining and interpreting the least squares fit to data.

This document generates different random quizzes each time it is compiled by Rmarkdown. The actual quiz will be a realization generated by this random process, or something very similar.

Matrix exercises

M1. Evaluate ${\mathbb{A}}{\mathbb{B}}$ when \[ {\mathbb{A}}= \begin{bmatrix} -2 & -1 \\ 2 & -1 \\ 3 & -2 \\ \end{bmatrix} , \quad {\mathbb{B}} = \begin{bmatrix} -1 & -2 \\ 1 & -2 \\ \end{bmatrix} \]

M2. For ${\mathbb{A}}$ as above, write down ${\mathbb{A}}^{{\scriptscriptstyle \mathrm{T}}}$.

M3. For ${\mathbb{B}}$ as above, find ${\mathbb{B}}^{-1}$ if it exists. If ${\mathbb{B}}^{-1}$ doesn’t exist, explain how you know this.
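
Since the quiz is closed book, these exercises are meant to be done by hand, but when practicing you may want to check your work in R. The following is a minimal sketch using the matrices printed in this realization of the quiz; the variable names and the tolerance `1e-12` are choices made for this illustration only.

```r
# Matrices from M1, entered row by row (values as printed above)
A <- matrix(c(-2, -1,
               2, -1,
               3, -2), nrow = 3, byrow = TRUE)
B <- matrix(c(-1, -2,
               1, -2), nrow = 2, byrow = TRUE)

AB    <- A %*% B   # M1: a 3x2 matrix times a 2x2 matrix is 3x2
At    <- t(A)      # M2: the transpose of a 3x2 matrix is 2x3
det_B <- det(B)    # M3: a 2x2 matrix is invertible exactly when this is nonzero

# Only attempt the inverse when the determinant is clearly nonzero
B_inv <- if (abs(det_B) > 1e-12) solve(B) else NULL

print(AB)
print(At)
print(B_inv)
```

If `B_inv` is not `NULL`, then `B_inv %*% B` should print the 2x2 identity matrix up to rounding error.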

Summation exercises

S1. A basic exercise.

Evaluate $\sum_{a=b}^c d$, where $b$ and $c$ are whole numbers with $c\ge b$.

Calculate $\sum_{i=0}^{20} (10-i)$.

Calculate $\sum_{i=k}^{k+4} (i+3)$, where $k$ is a whole number. Your answer should depend on $k$.

Evaluate $\sum_{i=1}^{30} 10 - \sum_{i=10}^{20} 20$.

Evaluate $\sum_{i=1}^{24} \sqrt{j} - \sum_{i=3}^{26} \big(\sqrt{j} - 1\big)$, where $j$ is a non-negative number.

Calculate $\sum_{a=b}^c xa$, where $b$ and $c$ are whole numbers with $c\ge b$. You can use the formula $\sum_{i=1}^n i = n(n+1)/2$.
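
One way to check a closed-form answer to a summation exercise is to compare it with a brute-force sum for several values of the limit. The sketch below does this for the formula $\sum_{i=1}^n i = n(n+1)/2$ quoted above; the function names `brute_force` and `closed_form` are hypothetical, and you would replace both bodies with the sum and the candidate answer you want to test.

```r
# Brute-force evaluation of the sum for one value of the upper limit n
brute_force <- function(n) sum(1:n)

# Candidate closed form: here, the formula quoted in S1
closed_form <- function(n) n * (n + 1) / 2

# Compare the two over a range of limits
n_values <- 1:25
agree <- all(sapply(n_values, brute_force) == sapply(n_values, closed_form))
print(agree)

# A sum with fixed numerical limits can be evaluated directly
print(sum(10 - (0:20)))
```

Agreement over a range of values does not prove a formula, but a disagreement reliably signals an algebra mistake.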

S2. An example involving sums of squares and products.

Show that $\frac{1}{n}\sum_{i=1}^n \big(x_i-\bar x\big)\big(y_i - \bar y\big) = \frac{1}{n}\Big(\sum_{i=1}^n x_iy_i\Big) - \bar x\bar y$, where $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$ and $\bar y = \frac{1}{n}\sum_{i=1}^n y_i$.

Let ${\mathbf{1}}=(1,1,\dots,1)$ and ${\mathbf{x}}=(x_1,x_2,\dots,x_n)$ be two vectors treated as $n\times 1$ matrices. Use $\Sigma$ notation to evaluate the matrix product ${\mathbf{1}}^{{\scriptscriptstyle \mathrm{T}}}{\mathbf{x}}$.

Evaluate $\sum_{i=1}^n \big(x_i - \bar x\big)$, where $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$.

Calculate $\frac{d}{dm}\sum_{i=1}^{n} (y_i -m x_i)^2$. (This will not be on the quiz, since the notes said differentiation to get the linear model will not be tested.)

Two useful formulas for summations are $\sum_{i=1}^n i = \frac{n(n+1)}{2}$ and $\sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6}$. Using these formulas, evaluate $\sum_{i=1}^{10} (1-2i)^2$. It may be helpful to start by multiplying out the square.

Show that $\frac{1}{n} \sum_{i=1}^n \big(x_i - \bar x\big)^2 = \Big(\frac{1}{n}\sum_{i=1}^n x_i^2\Big) -\bar x^2$, where $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$.

Let $\mathbb{A} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}$ and let ${\mathbf{b}}= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$. Use $\sum$ notation to evaluate the matrix product $\mathbb{A}{\mathbf{b}}$. Your solution should be a matrix for which each term is written as a summation.
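
The identities in S2 should be shown algebraically, but a numerical check on arbitrary data can catch a sign or index error before you start. The following is a sketch under the assumption that simulated values are an acceptable stand-in for $x_i$, $y_i$, $\mathbb{A}$ and ${\mathbf{b}}$; the seed and sample size are arbitrary.

```r
set.seed(401)          # arbitrary seed, for reproducibility only
n <- 12
x <- rnorm(n)
y <- rnorm(n)

# Sum-of-products identity: both sides computed separately
cov_lhs <- mean((x - mean(x)) * (y - mean(y)))
cov_rhs <- mean(x * y) - mean(x) * mean(y)

# Sum-of-squares identity: both sides computed separately
var_lhs <- mean((x - mean(x))^2)
var_rhs <- mean(x^2) - mean(x)^2

# The product 1^T x, compared with the plain sum of x
ones  <- rep(1, n)
inner <- as.numeric(t(ones) %*% x)

# A 2x3 matrix times a 3x1 vector, with each entry written as a sum over j
A <- matrix(rnorm(6), nrow = 2)
b <- rnorm(3)
Ab_sums <- sapply(1:2, function(i) sum(A[i, ] * b))

print(all.equal(cov_lhs, cov_rhs))
print(all.equal(var_lhs, var_rhs))
print(all.equal(inner, sum(x)))
print(all.equal(as.numeric(A %*% b), Ab_sums))
```

`all.equal()` is used instead of `==` because the two sides of each identity are computed in floating point and may differ by rounding error.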

R exercises

R1. Using rep() and matrix().

Which of the following is the output of matrix(c(rep(0,times=4),rep(1,times=4)),ncol=2)?

\[ (a) \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ \end{bmatrix} ; \quad (b) \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ \end{bmatrix} ; \quad (c) \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 0 & 0 \\ 1 & 1 \\ \end{bmatrix} ; \quad (d) \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ \end{bmatrix} \]

Which of the following lines of code successfully construct the matrix $\mathbb{A} = \begin{bmatrix}1 & 1\\2 & 2\\3 & 3\end{bmatrix}$?

(a). $\quad$ A <- matrix(c(1,1,2,2,3,3) ,nrow=3)

(b). $\quad$ A <- cbind(c(1,1),c(2,2),c(3,3))

(c). $\quad$ A <- t(matrix(c(1,1,2,2,3,3) ,nrow=2))

(d). $\quad$ A <- c(c(1:3),c(1:3))

Which of the following successfully select the diagonal elements of the matrix $\mathbb{A} = \begin{bmatrix}1 & 0\\2 & 3\end{bmatrix}$, represented in R by A<-matrix(c(1,2,0,3),2,2)?

(a). $\quad$ A[c(1,1),c(2,2)]

(b). $\quad$ A[rbind(c(1,1),c(2,2))]

(c). $\quad$ A[cbind(c(1,1),c(2,2))]

(d). $\quad$ A[matrix(c(TRUE,FALSE,FALSE,TRUE),2)]

(e). $\quad$ all of (a,b,c,d)

(f). $\quad$ none of (a,b,c,d)

(g). $\quad$ (b) and (d) only

(h). $\quad$ (a) and (b) only

Which of the following is the matrix ${\mathbb{A}}$ generated by

A <- t(matrix(c(rep(1,times=2),rep(3,times=2), 6, 4),ncol=3))

(a). $\quad \mathbb{A} = \begin{bmatrix} 1 & 1 \\ 3 & 3 \\ 6 & 4 \end{bmatrix}$

(b). $\quad \mathbb{A} = \begin{bmatrix} 1 & 3 & 6 \\ 1 & 3 & 4 \end{bmatrix}$

(c). $\quad \mathbb{A} = \begin{bmatrix} 1 & 3 \\ 1 & 6 \\ 1 & 3 \end{bmatrix}$

(d). $\quad \mathbb{A} = \begin{bmatrix} 1 & 1 & 3 \\ 3 & 6 & 4 \end{bmatrix}$
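
The R1 questions rely on two facts about R: matrix() fills column by column unless byrow=TRUE is given, and square-bracket indexing behaves differently for a pair of index vectors, a two-column index matrix, and a logical matrix. The sketch below illustrates both on small matrices that are made up for this example and are deliberately different from the ones in the questions.

```r
v <- 1:6

# Default fill is column by column: the columns are 1:3 and 4:6
M_col <- matrix(v, ncol = 2)

# byrow = TRUE fills row by row: the rows are (1,2), (3,4), (5,6)
M_row <- matrix(v, ncol = 2, byrow = TRUE)

# Filling a 2x3 matrix by column and transposing gives the same 3x2 matrix
same <- all(M_row == t(matrix(v, nrow = 2)))

# A 3x3 example for indexing; its columns are 11:13, 14:16, 17:19
M <- matrix(11:19, nrow = 3)

sub_block <- M[c(1, 3), c(1, 3)]         # 2x2 submatrix: rows 1,3 and columns 1,3
entries   <- M[cbind(c(1, 3), c(1, 3))]  # each ROW of the index matrix is one (row, col) pair
large     <- M[M > 15]                   # logical matrix: selected entries, in column order
diagonal  <- diag(M)                     # built-in way to extract the diagonal

print(M_col); print(M_row); print(same)
print(sub_block); print(entries); print(large); print(diagonal)
```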

R2. Manipulating vectors and matrices in R.

Suppose X is a matrix in R. Which of the following is NOT equivalent to X?

(a). t(t(X))

(b). X %*% matrix(1,ncol(X))

(c). X*1

(d). X%*%diag(ncol(X))

Suppose we define an R vector by y <- c(3,NA,-1,4,NA,-2). What will y[y>0] give you?

(a). A vector of the positive elements and NA values of y.

(b). A vector of the negative elements of y.

(c). A vector of all NAs.

(d). A vector of TRUEs and FALSEs.

(e). A vector of TRUEs and FALSEs and NAs.

Which of the following successfully select the first five odd elements of the vector $x = c(1,2,3,4,5,6,7,8,9,10,11)$? (Check all that apply. Do not check commands that will give an error.)

$\square$ $\quad$ x[rep(c(TRUE,FALSE),each=5)]

$\square$ $\quad$ x[rep(c(TRUE,FALSE),times=5)]

$\square$ $\quad$ x[rep(c(TRUE,FALSE),length=9)]

$\square$ $\quad$ x[rep(c(TRUE,FALSE)][1:5]

$\square$ $\quad$ x[rep(c("TRUE","FALSE"),5)]

$\square$ $\quad$ None of the above

$\square$ $\quad$ All of the above

Suppose we define a vector x <- c(3,0,-1,4,0,-2). What will which(x==0) give you?

(a). A vector of the 0 elements of x.

(b). A vector of 0’s.

(c). A vector of TRUE’s and FALSE’s.

(d). The vector of the indices of the 0 values.
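
The R2 questions turn on how R treats logical indices, NA values, recycling, and which(). The following sketch shows each behavior on small vectors invented for this illustration; they are not the vectors used in the questions.

```r
v <- c(5, NA, -2, 7)

cmp  <- v > 0         # comparison with NA gives NA: TRUE NA FALSE TRUE
kept <- v[v > 0]      # an NA in a logical index produces an NA in the result
idx  <- which(v > 0)  # which() returns positions of TRUE and skips NA

# times repeats the whole vector; each repeats every element in turn
alt_times <- rep(c(TRUE, FALSE), times = 2)  # TRUE FALSE TRUE FALSE
alt_each  <- rep(c(TRUE, FALSE), each = 2)   # TRUE TRUE FALSE FALSE

w <- 101:106
every_other <- w[c(TRUE, FALSE)]  # a short logical index is recycled along w
by_name     <- w["TRUE"]          # a character index looks up names, so this is NA

print(cmp); print(kept); print(idx)
print(alt_times); print(alt_each)
print(every_other); print(by_name)
```

Note that the character string "TRUE" is not the logical value TRUE, so indexing an unnamed vector with it returns NA rather than selecting elements.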

Fitting a linear model by least squares

F1. Recall the dataset uswages containing ten variables on 2000 subjects from the 1988 Current Population Survey.

head(uswages, n=4)

##         wage educ exper race smsa ne mw so we pt
## 6085  771.60   18    18    0    1  1  0  0  0  0
## 23701 617.28   15    20    0    1  0  0  0  1  0
## 16208 957.83   16     9    0    1  0  0  1  0  0
## 2720  617.28   12    24    0    1  1  0  0  0  0

Suppose we want to fit a linear model using wage as response, with years of education and years of experience as predictors. Which of the following lines of code successfully constructs the matrix $\mathbb{X}$ for a representation ${\mathbf{y}}={\mathbb{X}}{\mathbf{b}}+{\mathbf{e}}$?

(a). X <- matrix(uswages$educ, uswages$exper)

(b). X <- matrix(rep(1,nrow(uswages)), uswages$educ, uswages$exper)

(c). X <- cbind(rep(1,nrow(uswages)), uswages$educ, uswages$exper)

(d). X <- cbind(uswages$educ, uswages$exper)

F2. If we want to fit the model using the R function lm(), which of the following calls is correct?

(a). lm(wage ~ ., data = uswages)

(b). lm(y ~ x, data = uswages)

(c). lm(wage = educ + exper, data = uswages)

Explain briefly how you would check whether your proposed solution is correct.
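
One possible check, sketched here, is to compare a hand-built design matrix with the one R constructs via model.matrix(). To keep the example self-contained it uses a small made-up data frame `toy` with the same column names as uswages; the values are hypothetical and are not taken from the survey.

```r
# Hypothetical stand-in for uswages (made-up values, same column names)
toy <- data.frame(
  wage  = c(500, 620, 710, 450, 830, 560),
  educ  = c(12, 14, 16, 10, 18, 12),
  exper = c(10, 8, 5, 20, 12, 3)
)

# Hand-built design matrix: a column of ones, then the two predictors
X <- cbind(rep(1, nrow(toy)), toy$educ, toy$exper)

# Design matrix that lm() uses for the corresponding model
fit  <- lm(wage ~ educ + exper, data = toy)
X_lm <- model.matrix(fit)

# Compare the entries only, ignoring column names and other attributes
matches <- isTRUE(all.equal(unname(X), unname(X_lm), check.attributes = FALSE))
print(dim(X))
print(matches)
```

Checking dim(X) is a quick first test: the matrix should have one row per subject and one column per coefficient, including the intercept.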

F1. Consider the linear model in the notes where detrended life expectancy is explained by detrended unemployment. We fitted the model lm1 <- lm(L_detrended~U_detrended) where L_detrended and U_detrended are vectors of length 68. Writing L_detrended as $y_1,\dots,y_n$ and U_detrended as $x_1,\dots,x_n$ with $n=68$, give the equations that define this model mathematically both by writing an equation for a generic year $i$ and by using matrix notation.

F2. For the data analysis above, explain how R computes the quantity coef(lm1).
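
As a numerical illustration of the least squares calculation, the sketch below compares coef() from lm() with the solution of the normal equations $\mathbb{X}^{\scriptscriptstyle \mathrm{T}}\mathbb{X}{\mathbf{b}} = \mathbb{X}^{\scriptscriptstyle \mathrm{T}}{\mathbf{y}}$. The vectors `x` and `y` are simulated stand-ins for U_detrended and L_detrended, which are not reproduced here, and the columns of `X` are ordered to match R's output (intercept first), which may differ from the ordering used in the notes.

```r
set.seed(68)               # arbitrary seed
n <- 68
x <- rnorm(n)              # simulated stand-in for U_detrended
y <- 0.5 * x + rnorm(n)    # simulated stand-in for L_detrended

lm_toy <- lm(y ~ x)

# Design matrix with the intercept column first, matching the order of coef()
X <- cbind(1, x)

# Solve the normal equations (X^T X) b = X^T y for b
b <- solve(t(X) %*% X, t(X) %*% y)

print(as.numeric(b))
print(as.numeric(coef(lm_toy)))
print(all.equal(as.numeric(b), as.numeric(coef(lm_toy))))
```

Internally, lm() obtains the same coefficients through a QR decomposition of the design matrix rather than by forming $\mathbb{X}^{\scriptscriptstyle \mathrm{T}}\mathbb{X}$ explicitly; the two approaches agree up to rounding error.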

F1. Fitting a linear model by least squares.

library(faraway)
data("sat")
head(sat)

##            expend ratio salary takers verbal math total
## Alabama     4.405  17.2 31.144      8    491  538  1029
## Alaska      8.963  17.6 47.951     47    445  489   934
## Arizona     4.778  19.3 32.175     27    448  496   944
## Arkansas    4.459  17.1 28.934      6    482  523  1005
## California  4.992  24.0 41.078     45    417  485   902
## Colorado    5.443  18.4 34.571     29    462  518   980

Which of the following would produce the design matrix $\mathbb{X}$ for the model lm(sat ~ ratio + expend, data = sat)?

(a). X <- matrix(rep(1, length(ratio)), ratio, expend)

(b). X <- matrix(1, ratio, expend)

(c). X <- cbind(rep(1, length(ratio)), ratio, expend)

(d). X <- cbind(1, ratio, expend)

(e). X <- cbind(ratio, expend)

F2. Consider our kicker data from homework 3.

data_nfl <- read.csv("FieldGoals2003to2006.csv",header = TRUE,skip=5)
head(data_nfl)

##             Name Yeart Teamt FGAt  FGt Team.t.1. FGAtM1 FGtM1 FGAtM2 FGtM2
## 1 Adam Vinatieri  2003    NE   34 73.5        NE     30  90.0     NA    NA
## 2 Adam Vinatieri  2004    NE   33 93.9        NE     34  73.5     30  90.0
## 3 Adam Vinatieri  2005    NE   25 80.0        NE     33  93.9     34  73.5
## 4 Adam Vinatieri  2006   IND   19 89.4        NE     25  80.0     33  93.9
## 5    David Akers  2003   PHI   29 82.7       PHI     34  88.2     NA    NA
## 6    David Akers  2004   PHI   32 84.3       PHI     29  82.7     34  88.2

Recall that we built the model $y_i=mx_i+c_1z_{i,1}+c_2z_{i,2}+ \dots + c_{19}z_{i,19}+e_i$, where $x_i$ is FGtM1 and $z_{i,1}$ takes the value 1 when row $i$ of the data corresponds to kicker 1 (i.e., for $i=1,2,3,4$) and 0 otherwise. Write the design matrix of the model. (You do not need to include specific values for $x_i$.)
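
To see the structure of such a design matrix on a smaller scale, the following sketch builds one for a made-up dataset with three kickers instead of 19. The labels K1, K2, K3 and the numerical values are hypothetical; only the layout (one column for $x_i$, then one indicator column per kicker, and no column of ones) follows the model above.

```r
# Hypothetical data: 4 rows for kicker K1, 2 for K2, 3 for K3
kicker <- c("K1", "K1", "K1", "K1", "K2", "K2", "K3", "K3", "K3")
x      <- c(90.0, 73.5, 93.9, 80.0, 88.2, 82.7, 75.0, 81.0, 79.5)

# One indicator column per kicker: 1 if the row belongs to that kicker, else 0
kickers <- unique(kicker)
Z <- sapply(kickers, function(k) as.numeric(kicker == k))

# Design matrix: the predictor column followed by the indicator columns
X <- cbind(x, Z)
print(X)
```

Each row of the indicator block contains exactly one 1, which is why the model does not need a separate intercept column.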

Acknowledgements: Some questions are derived from https://genomicsclass.github.io/book. Some are derived from http://swirlstats.com/.

License: This material is provided under an MIT license

In lab on 2/1 or 2/2