Write solutions to the exercises in the first half of the homework. For the data analysis, you do not have to report anything for questions 1–4. For questions 5 and 6, report your code together with a brief explanation. Question 7 asks you to carefully write out the probability model you have used for the standard errors. Recall that you are permitted to collaborate, or to use any internet resources, but you must list all sources that make a substantial contribution to your report. As usual, following the syllabus, you are also requested to give some feedback in a “Please explain” statement.


Exercises on variance, covariance and the normal distribution

  1. Suppose \(X\) and \(Y\) are bivariate random variables with means \(\mu_X\) and \(\mu_Y\) respectively. Use the linearity of expectation, together with the formulas \[ \mathrm{Var}(X)=\mathrm{E}[X^2]-(\mathrm{E}[X])^2, \quad \mathrm{Cov}(X,Y)=\mathrm{E}[XY]-\mathrm{E}[X]\, \mathrm{E}[Y], \] to show that \[ \mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y) +2\mathrm{Cov}(X,Y). \] This is similar to a calculation we did in class, but here it is a little easier since you can start from the above formulas for \(\mathrm{Var}(X)\) and \(\mathrm{Cov}(X,Y)\) rather than going all the way back to the basic definition \[ \mathrm{Var}(X)=\mathrm{E}\big[ (X- \mathrm{E}[X])^2\big], \quad \mathrm{Cov}(X,Y)=\mathrm{E}\big[ (X-\mathrm{E}[X])(Y-\mathrm{E}[Y]) \big]. \]
Applying the formula for the variance to the random variable \(X+Y\), \[ Var(X+Y)=E[(X+Y)^2] - [E(X+Y)]^2. \]

Now, \[\begin{align*} E[(X+Y)^2] &= E(X^2 + Y^2 + 2XY)\\ &= E(X^2) + E(Y^2) + 2E(XY)\\ \end{align*}\]

Also, \[ [E(X+Y)]^2 = [E(X)+E(Y)]^2 = [E(X)]^2 + [E(Y)]^2 +2 E(X)E(Y) \]

Substituting both of these into the first display above, we have \[\begin{align*} Var(X+Y) &= E(X^2) + E(Y^2) + 2E(XY) - [E(X)]^2 - [E(Y)]^2 - 2 E(X)E(Y) \\ &= \underbrace{E(X^2) - [E(X)]^2}_{Var(X)} + \underbrace{E(Y^2) - [E(Y)]^2}_{Var(Y)} + \underbrace{2E(XY)- 2 E(X)E(Y)}_{2Cov(X,Y)}\\ &= Var(X) + Var(Y) + 2 Cov(X,Y) \end{align*}\]
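As an optional numerical sanity check (not part of the derivation): sample variances and covariances satisfy the same algebra, so the identity holds exactly for them too. In R, with any simulated x and y,

set.seed(1)
x <- rnorm(1000)
y <- 0.5*x + rnorm(1000)  # a variable correlated with x
# the sample statistics satisfy the identity exactly, up to floating point
all.equal(var(x + y), var(x) + var(y) + 2*cov(x, y))
## [1] TRUE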

  2. (a)

    Define \(A\) such that \(Y=AZ\) where \(Z=\begin{bmatrix}Z_1 \\ Z_2 \\ Z_3\end{bmatrix}\). So we have here that \[ Y = \begin{bmatrix}Y_1 \\ Y_2 \\ Y_3\end{bmatrix} = \begin{bmatrix}6 & 3 & 1 \\ 0 & 5 & 2 \\ 0 & 0 & 4\end{bmatrix}\begin{bmatrix}Z_1 \\ Z_2 \\ Z_3\end{bmatrix} = \begin{bmatrix}6Z_1 +3Z_2 + Z_3 \\ 5Z_2 + 2Z_3 \\ 4Z_3\end{bmatrix}. \] Hence we see that \(Y\) is a vector containing 3 random variables, where \(Y_1 = 6Z_1 +3Z_2 + Z_3\), \(Y_2 = 5Z_2 + 2Z_3\) and \(Y_3 = 4Z_3\).

We then know that \(Var(Y) = Var(AZ) = A\, Var(Z)\, A^T = A\mathbb{I}A^T = AA^T\), using the fact that \(Var(Z) = \mathbb{I}\) because the \(Z_i\) are independent with unit variance.

Hence \(Var(Y) = AA^T\), which we compute in R:

A <- matrix(c(6,0,0,3,5,0,1,2,4), nrow=3)  # matrix() fills column by column
VarY <- A %*% t(A)
VarY
##      [,1] [,2] [,3]
## [1,]   46   17    4
## [2,]   17   29    8
## [3,]    4    8   16
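As an optional check, we can simulate many draws of \(Z\) and verify that the sample variance matrix of \(Y = AZ\) comes out close to \(AA^T\); the seed and sample size below are arbitrary choices for the check.

set.seed(2)
Z <- matrix(rnorm(3*100000), nrow=3)  # each column is one draw of Z
round(cov(t(A %*% Z)), 1)             # should be close to VarY above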

(b)

Var(\(Y_1 - Y_2\)) = Var(\(Y_1\)) + Var(\(Y_2\)) - 2Cov(\(Y_1\),\(Y_2\)).

From \(Var(Y)\) as obtained in part (a), we know that \(Var(Y_1) = 46\), \(Var(Y_2) = 29\) and \(Cov(Y_1,Y_2) = 17\), so \(2Cov(Y_1,Y_2) = 2 \times 17 = 34\).

So, Var(\(Y_1 - Y_2\)) = 46 + 29 - 34 = 41.

Equivalently, in matrix form, \(\mathrm{Var}(\mathbb{A}\boldsymbol{\mathrm{Y}})= \mathbb{A}\mathrm{Var}(\boldsymbol{\mathrm{Y}})\mathbb{A}^{\scriptscriptstyle \mathrm{T}}\) where \(\mathbb{A} = \begin{bmatrix} 1 & -1 & 0\end{bmatrix}\):

A <- matrix(c(1,-1,0),nrow=1)
VarAY <- A %*% VarY %*% t(A)
VarAY
##      [,1]
## [1,]   41

(c)

To use pnorm, we need the mean and variance of \(Y_1-Y_2\).

From part (b), we know that Var(\(Y_1 - Y_2\)) = 41.

E(\(Y_1 - Y_2\)) = E(\(Y_1\)) - E(\(Y_2\)).

Recall from (a) that \(Y_1 = 6Z_1 +3Z_2 + Z_3\) and \(Y_2 = 5Z_2 + 2Z_3\).

So, E(\(Y_1 - Y_2\)) = E(\(Y_1\)) - E(\(Y_2\)) = E(\(6Z_1 +3Z_2 + Z_3\)) - E(\(5Z_2 + 2Z_3\)) = 6E(\(Z_1\)) + 3E(\(Z_2\)) + E(\(Z_3\)) - 5E(\(Z_2\)) - 2E(\(Z_3\)) = 0. Since \(Y_1 - Y_2\) is a linear combination of independent normal random variables, it is itself normal, so \(Y_1 - Y_2 \sim N(0, 41)\) and we want \(P(Y_1 - Y_2 > 3)\).

pnorm(3, mean = 0, sd = sqrt(41), lower.tail = FALSE)
## [1] 0.319706
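As an optional check on this answer: \(Y_1 - Y_2 = 6Z_1 - 2Z_2 - Z_3\), so the probability can also be estimated by simulation.

set.seed(3)
Z1 <- rnorm(1e6); Z2 <- rnorm(1e6); Z3 <- rnorm(1e6)
D <- 6*Z1 - 2*Z2 - Z3  # Y1 - Y2, with mean 0 and variance 36 + 4 + 1 = 41
mean(D > 3)            # should be close to 0.3197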
  3. Let \(\boldsymbol{\mathrm{X}}=(X_1,\dots,X_n)\) be a random vector with expectation \(\mathrm{E}[\boldsymbol{\mathrm{X}}]=\boldsymbol{\mathrm{\mu}}\) for \(\boldsymbol{\mathrm{\mu}}=(\mu_1,\dots,\mu_n)\). Let \(\mathbb{V}\) be the \(n\times n\) variance matrix of \(\boldsymbol{\mathrm{X}}\), so \(\mathbb{V}_{ij}=\mathrm{Cov}(X_i,X_j)\) for \(i=1,\dots,n\) and \(j=1,\dots,n\).
    1. Interpreting \(\boldsymbol{\mathrm{X}}\) and \(\boldsymbol{\mathrm{\mu}}\) as column vectors, show that

      \([\mathrm{Eq.~1}] \hspace{1.5cm} \mathbb{V}= \mathrm{E}[\boldsymbol{\mathrm{X}}\boldsymbol{\mathrm{X}}^{\scriptscriptstyle \mathrm{T}}]- \boldsymbol{\mathrm{\mu}}\boldsymbol{\mathrm{\mu}}^{\scriptscriptstyle \mathrm{T}}.\)

    2. Now let \(\boldsymbol{\mathrm{Y}}=(Y_1,\dots,Y_m)\) be a random vector with expectation \(\mathrm{E}[\boldsymbol{\mathrm{Y}}]=\boldsymbol{\mathrm{\nu}}\) for \(\boldsymbol{\mathrm{\nu}}=(\nu_1,\dots,\nu_m)\). Let \(\mathbb{A}=[a_{ij}]_{n\times m}\) be an arbitrary constant \(n\times m\) matrix. Apply [Eq. 1] with \(\boldsymbol{\mathrm{X}}=\mathbb{A}\boldsymbol{\mathrm{Y}}\) and use the matrix linearity of expectation (i.e., summation and multiplication by a constant matrix can be moved through \(\mathrm{E}\)) to show that

      \([\mathrm{Eq.~2}] \hspace{1.5cm} \mathrm{Var}(\mathbb{A}\boldsymbol{\mathrm{Y}}) = \mathbb{A}\mathrm{Var}(\boldsymbol{\mathrm{Y}})\mathbb{A}^{\scriptscriptstyle \mathrm{T}}.\)

      You will need the formula we gave earlier in the notes for the transpose of a matrix product, \(\big(\mathbb{A}\boldsymbol{\mathrm{Y}}\big)^{\scriptscriptstyle \mathrm{T}}= \boldsymbol{\mathrm{Y}}^{\scriptscriptstyle \mathrm{T}}\mathbb{A}^{\scriptscriptstyle \mathrm{T}}\).

(a)

\[ \mathbb{V}_{ij}=\mathrm{Cov}(X_i,X_j) \] \[\begin{align*} Cov(X_i,X_j) &= E(X_iX_j) - E(X_i)E(X_j)\\ &= E(X_iX_j) - \mu_i\mu_j \\ &= [\mathrm{E}(\mathbf{X} \mathbf{X}^T)]_{ij} - [\boldsymbol{\mu} \boldsymbol{\mu}^T]_{ij} \end{align*}\]

Note that since \(\boldsymbol{\mathrm{X}}\) and \(\boldsymbol{\mathrm{\mu}}\) are column vectors,



\[ \boldsymbol{\mathrm{X}} \boldsymbol{\mathrm{X}}^T =\begin{bmatrix}X_1 \\ X_2 \\ \vdots \\ X_n\end{bmatrix} \begin{bmatrix}X_1 & X_2 \dots X_n\end{bmatrix} = \begin{bmatrix}X_1^2 & X_1 X_2 & X_1X_3& \dots& X_1X_n \\ X_2X_1 & X_2^2 & X_2X_3 & \dots & X_2X_n \\ \vdots & \vdots & \vdots & & \vdots \\ X_nX_1 & X_nX_2 & X_nX_3 & \dots & X_n^2 \end{bmatrix} \]

\[ \boldsymbol{\mathrm{\mu}} \boldsymbol{\mathrm{\mu}}^T =\begin{bmatrix}\mu_1 \\ \mu_2 \\ \vdots \\ \mu_n\end{bmatrix} \begin{bmatrix}\mu_1 & \mu_2 \dots \mu_n\end{bmatrix} = \begin{bmatrix} \mu_1^2 & \mu_1 \mu_2 & \mu_1\mu_3& \dots& \mu_1\mu_n \\ \mu_2\mu_1 & \mu_2^2 & \mu_2\mu_3 & \dots & \mu_2\mu_n \\ \vdots & \vdots & \vdots & & \vdots \\ \mu_n\mu_1 & \mu_n\mu_2 & \mu_n\mu_3 & \dots & \mu_n^2 \end{bmatrix} \]

Thus, since the \((i,j)\) entry of \(\mathbf{X}\mathbf{X}^T\) is \(X_iX_j\) and the expectation of a random matrix is taken entrywise, \(\mathbb{V}_{ij} = [\mathrm{E}(\mathbf{X} \mathbf{X}^T)]_{ij} - [\boldsymbol{\mu} \boldsymbol{\mu}^T]_{ij}\) for all \(i,j\).

So, \(\mathbb{V}= \mathrm{E}[\boldsymbol{\mathrm{X}}\boldsymbol{\mathrm{X}}^{\scriptscriptstyle \mathrm{T}}]- \boldsymbol{\mathrm{\mu}}\boldsymbol{\mathrm{\mu}}^{\scriptscriptstyle \mathrm{T}}.\)

(b)

Applying [Eq. 1] with \(\boldsymbol{\mathrm{X}}=\mathbb{A}\boldsymbol{\mathrm{Y}}\), whose expectation is \(\mathrm{E}[\mathbb{A}\boldsymbol{\mathrm{Y}}]=\mathbb{A}\boldsymbol{\mathrm{\nu}}\) by matrix linearity of expectation, we get

\(\mathrm{Var}(\mathbb{A}\boldsymbol{\mathrm{Y}}) = \mathrm{E}(\mathbb{A}\boldsymbol{\mathrm{Y}} [\mathbb{A}\boldsymbol{\mathrm{Y}}]^{\scriptscriptstyle \mathrm{T}}) - \mathbb{A}\boldsymbol{\mathrm{\nu}} [\mathbb{A}\boldsymbol{\mathrm{\nu}}]^{\scriptscriptstyle \mathrm{T}}\)

\(= \mathrm{E}( \mathbb{A}\boldsymbol{\mathrm{Y}} \boldsymbol{\mathrm{Y}}^{\scriptscriptstyle \mathrm{T}}\mathbb{A}^{\scriptscriptstyle \mathrm{T}}) - \mathbb{A}\boldsymbol{\mathrm{\nu}} \boldsymbol{\mathrm{\nu}}^{\scriptscriptstyle \mathrm{T}}\mathbb{A}^{\scriptscriptstyle \mathrm{T}}\) (using \((\mathbb{A}\boldsymbol{\mathrm{Y}})^{\scriptscriptstyle \mathrm{T}} = \boldsymbol{\mathrm{Y}}^{\scriptscriptstyle \mathrm{T}}\mathbb{A}^{\scriptscriptstyle \mathrm{T}}\))

\(= \mathbb{A}\,\mathrm{E}(\boldsymbol{\mathrm{Y}} \boldsymbol{\mathrm{Y}}^{\scriptscriptstyle \mathrm{T}})\, \mathbb{A}^{\scriptscriptstyle \mathrm{T}} - \mathbb{A}\boldsymbol{\mathrm{\nu}} \boldsymbol{\mathrm{\nu}}^{\scriptscriptstyle \mathrm{T}}\mathbb{A}^{\scriptscriptstyle \mathrm{T}}\) (moving the constant matrices \(\mathbb{A}\) and \(\mathbb{A}^{\scriptscriptstyle \mathrm{T}}\) through \(\mathrm{E}\))

\(= \mathbb{A} \big( \mathrm{E}(\boldsymbol{\mathrm{Y}} \boldsymbol{\mathrm{Y}}^{\scriptscriptstyle \mathrm{T}}) - \nu \nu^{\scriptscriptstyle \mathrm{T}}\big) \mathbb{A}^{\scriptscriptstyle \mathrm{T}}\)

\(= \mathbb{A}\mathrm{Var}(\boldsymbol{\mathrm{Y}})\mathbb{A}^{\scriptscriptstyle \mathrm{T}}\)
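As an optional numerical illustration of [Eq. 2]: sample variance matrices obey the same identity, so the two sides agree exactly (up to floating point) for simulated data. The matrix B below is an arbitrary \(2\times 3\) matrix chosen just for this check.

set.seed(4)
Ymat <- matrix(rnorm(200*3), ncol=3)       # 200 draws of a 3-vector Y, one per row
B <- matrix(c(1, 0, 1, -1, 2, 0), nrow=2)  # an arbitrary 2 x 3 constant matrix
# the rows of Ymat %*% t(B) are the transformed draws B Y
all.equal(cov(Ymat %*% t(B)), B %*% cov(Ymat) %*% t(B))
## [1] TRUE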

Understanding how lm() obtains standard errors

Read the analysis of newspaper circulation data in Section 1.2.2 of Sheather. This example is continued in Section 6.2.2 of Sheather. You are now in a position to read most of this section too, but you are not required to do so at this point.

  1. Make a local copy of the data.
circulation <- read.table("circulation.txt", sep="\t", header=TRUE)
  2. Transform the data. Add two new columns to the dataframe called log_Sunday and log_Weekday containing the natural logarithm of the corresponding columns. The R command log() gives this natural logarithm, also known as log to the base \(e\). We’ll discuss later in class how and why to choose a suitable transformation of the data, which is an important decision for data analysis.
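The new columns can be added as follows (a sketch, assuming the raw circulation columns in the file are named Sunday and Weekday):

# column names Sunday and Weekday are assumed here
circulation$log_Sunday <- log(circulation$Sunday)
circulation$log_Weekday <- log(circulation$Weekday)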

  3. Build the model in R. Create a linear model called lm1 by regressing the logarithm of Sunday circulation on two explanatory variables: the logarithm of weekday circulation and the binary variable for a tabloid competitor. Your code may look something like

lm1 <- lm(log_Sunday~log_Weekday+Tabloid_with_serious_competitor,data=circulation)
  4. Set X to be the design matrix using the model.matrix() command by typing
X <- model.matrix(lm1)
  5. Use the design matrix and the response variable to compute the least squares coefficients and their standard errors by matrix calculations in R. It may be helpful to set
y <- circulation$log_Sunday
b <- solve(t(X) %*% X) %*% t(X) %*% y; b
##                                       [,1]
## (Intercept)                     -0.4472999
## log_Weekday                      1.0613296
## Tabloid_with_serious_competitor -0.5313725
# residual standard error: divide the residual sum of squares by n - p = 89 - 3
y_hat <- X %*% b
s <- sqrt(sum((y - y_hat)^2)/(nrow(X) - ncol(X))); s
## [1] 0.1391926
# find standard errors for b
b_se <- s*sqrt(diag(solve(t(X) %*% X))); b_se
##                     (Intercept)                     log_Weekday 
##                      0.35138427                      0.02847685 
## Tabloid_with_serious_competitor 
##                      0.06800383
  6. Check that these match the output of summary(lm1). Also, check that your calculation of the estimated standard deviation of the measurement error matches the residual standard error offered by summary(lm1). Why do you think summary(lm1) says that this is computed on 86 degrees of freedom?
summary(lm1)
## 
## Call:
## lm(formula = log_Sunday ~ log_Weekday + Tabloid_with_serious_competitor, 
##     data = circulation)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27550 -0.10175 -0.00198  0.08758  0.34881 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -0.44730    0.35138  -1.273    0.206    
## log_Weekday                      1.06133    0.02848  37.270  < 2e-16 ***
## Tabloid_with_serious_competitor -0.53137    0.06800  -7.814 1.26e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1392 on 86 degrees of freedom
## Multiple R-squared:  0.9427, Adjusted R-squared:  0.9413 
## F-statistic: 706.8 on 2 and 86 DF,  p-value: < 2.2e-16

They do match: the coefficient estimates and standard errors agree with the summary output (which displays fewer decimal places), and \(s\) matches the reported residual standard error of 0.1392.

summary(lm1) reports 86 degrees of freedom because the residual degrees of freedom are \(n - p = 89 - 3 = 86\), where \(n = 89\) is the number of newspapers and \(p = 3\) is the number of estimated coefficients.
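This can be confirmed directly from the design matrix, or with R's df.residual():

nrow(X) - ncol(X)  # n - p
## [1] 86
df.residual(lm1)
## [1] 86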

  7. Write out in mathematical notation the probability model used to construct these standard errors. Be careful to define the notation you use. Specify a letter for each quantity - you can use words to help define the quantities in your equation, but you should usually avoid words in an equation. You can write your equation either using vectors & matrices or by using subscripts to denote each unit \(i\) and specifying the range of values of \(i\) for which the equation holds. Be explicit about what quantities are random vectors. If you define a measurement error model, be sure to specify all means, variances and covariances for the error random variables.

Let \(Y_i\) be the natural logarithm of the Sunday circulation of newspaper \(i\). The design matrix is \(\mathbb{X} = \begin{bmatrix}X_1 & X_2 & X_3\end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ x_{n1} & x_{n2} & x_{n3} \end{bmatrix} = \begin{bmatrix} 1 & x_{12} & x_{13} \\ 1 & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ 1 & x_{n2} & x_{n3} \end{bmatrix}\) where \(x_{i1} = 1\) for all \(i\) (which corresponds to the intercept), \(x_{i2}\) is the natural logarithm of the weekday circulation of newspaper \(i\), and \(x_{i3}\) is the indicator variable which takes value 1 if newspaper \(i\) has a competing tabloid, and 0 otherwise.

The model in matrix form is

\[ \boldsymbol{\mathrm{Y}} = \mathbb{X} \boldsymbol{\mathrm{\beta}} + \boldsymbol{\mathrm{\epsilon}} \]

where \(\boldsymbol{\mathrm{Y}} = \begin{bmatrix}Y_1 \\ Y_2 \\ \vdots \\ Y_n\end{bmatrix}\) is the vector of random variables \(Y_i\), the log Sunday circulation of newspaper \(i\) as defined above.

\(\mathbb{X}\) is the design matrix (as defined above), treated as a known constant matrix obtained purely from the data.

\(\boldsymbol{\mathrm{\beta}} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3\end{bmatrix}\) is the vector containing the true parameter values.

\(\boldsymbol{\mathrm{\epsilon}} = \begin{bmatrix}\epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n\end{bmatrix}\) is the vector of random variables \(\epsilon_i\), the measurement error for newspaper \(i\).

In subscript form, the model is, for \(i=1,2,\dots,n\),

\[ Y_i = \beta_1 + x_{i2}\beta_2 + x_{i3}\beta_3 + \epsilon_i\]

where \(\mathrm{E}(\epsilon_i) = 0\), \(\mathrm{Var}(\epsilon_i) = \sigma^2\) and \(\mathrm{Cov}(\epsilon_i,\epsilon_j)=0\) for \(i\neq j\); that is, the measurement errors are mean-zero, uncorrelated random variables with common variance \(\sigma^2\).

We model our data \(\mathbf{y}= \begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ y_n\end{bmatrix}\) as a draw from this probability model.
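As a final (optional) check that this is the model behind the reported standard errors: the estimated variance matrix of the coefficient estimates under this model, \(s^2(\mathbb{X}^{\scriptscriptstyle \mathrm{T}}\mathbb{X})^{-1}\), is what R's vcov() returns, so the square roots of its diagonal should reproduce b_se from above.

sqrt(diag(vcov(lm1)))  # should agree with b_se computed by hand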