Write solutions to the exercises in the first half of the homework. For the data analysis, you do not have to report anything for questions 1–4. For questions 5 and 6, report your code together with a brief explanation. Question 7 asks you to carefully write out the probability model you have used for the standard errors. Recall that you are permitted to collaborate, or to use any internet resources, but you must list all sources that make a substantial contribution to your report. As usual, following the syllabus, you are also requested to give some feedback in a “Please explain” statement.
Now, \[\begin{align*} E[(X+Y)^2] &= E(X^2 + Y^2 + 2XY)\\ &= E(X^2) + E(Y^2) + 2E(XY)\\ \end{align*}\]
Also, \[ [E(X+Y)]^2 = [E(X)+E(Y)]^2 = [E(X)]^2 + [E(Y)]^2 +2 E(X)E(Y) \]
Substituting both of these into \(Var(X+Y) = E[(X+Y)^2] - [E(X+Y)]^2\), we have \[\begin{align*} Var(X+Y) &= E(X^2) + E(Y^2) + 2E(XY) - [E(X)]^2 - [E(Y)]^2 - 2 E(X)E(Y) \\ &= \underbrace{E(X^2) - [E(X)]^2}_{Var(X)} + \underbrace{E(Y^2) - [E(Y)]^2}_{Var(Y)} + \underbrace{2E(XY)- 2 E(X)E(Y)}_{2Cov(X,Y)}\\ &= Var(X) + Var(Y) + 2 Cov(X,Y) \end{align*}\]
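This identity also holds exactly for sample variances and covariances (they share the same centering and denominator), so it can be checked numerically in R; the simulated variables below are purely illustrative.

```r
# Numerical check of Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y),
# using sample moments, which satisfy the same algebraic identity
set.seed(1)
x <- rnorm(1000)
y <- 0.5 * x + rnorm(1000)   # made correlated with x on purpose
lhs <- var(x + y)
rhs <- var(x) + var(y) + 2 * cov(x, y)
all.equal(lhs, rhs)          # TRUE, up to floating-point rounding
```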
We then know that \(\mathrm{Var}(Y) = \mathrm{Var}(\mathrm{A}Z) = \mathrm{A}\,\mathrm{Var}(Z)\,\mathrm{A}^T = \mathrm{A}\mathbb{I}\mathrm{A}^T = \mathrm{A}\mathrm{A}^T\).
Hence, Var(Y) = \(\mathrm{A}\mathrm{A}^T\).
A <- matrix(c(6,0,0,3,5,0,1,2,4),nrow=3)
VarY <- A%*%t(A)
VarY
## [,1] [,2] [,3]
## [1,] 46 17 4
## [2,] 17 29 8
## [3,] 4 8 16
Var(\(Y_1 - Y_2\)) = Var(\(Y_1\)) + Var(\(Y_2\)) - 2Cov(\(Y_1\),\(Y_2\)).
From the Var(Y) as obtained in the previous part, we know that Var(\(Y_1\)) = 46, Var(\(Y_2\)) = 29 and 2Cov(\(Y_1\),\(Y_2\)) = 2*17 = 34.
So, Var(\(Y_1 - Y_2\)) = 46 + 29 - 34 = 41.
Doing it in matrix form, \(\mathrm{Var}(\mathbb{A}\boldsymbol{\mathrm{Y}})= \mathbb{A}\mathrm{Var}(\boldsymbol{\mathrm{Y}})\mathbb{A}^{\scriptscriptstyle \mathrm{T}}\) where \(\mathbb{A} = \begin{bmatrix} 1 & -1 & 0\end{bmatrix}\)
A <- matrix(c(1,-1,0),nrow=1)
VarAY <- A %*% VarY %*% t(A)
VarAY
## [,1]
## [1,] 41
To use pnorm, we need the mean and variance of \(Y_1-Y_2\).
We know from the previous part that Var(\(Y_1 - Y_2\)) = 41.
E(\(Y_1 - Y_2\)) = E(\(Y_1\)) - E(\(Y_2\)).
Recall from (a) that \(Y_1 = 6Z_1 +3Z_2 + Z_3\) and \(Y_2 = 5Z_2 + 2Z_3\).
So, E(\(Y_1 - Y_2\)) = E(\(Y_1\)) - E(\(Y_2\)) = E(\(6Z_1 +3Z_2 + Z_3\)) - E(\(5Z_2 + 2Z_3\)) = 6E(\(Z_1\)) + 3E(\(Z_2\)) + E(\(Z_3\)) - 5E(\(Z_2\)) - 2E(\(Z_3\)) = 0.
pnorm(3, mean = 0, sd = sqrt(41), lower.tail = FALSE)
## [1] 0.3197057
Ans (a)
\[ \mathbb{V}_{ij}=\mathrm{Cov}(X_i,X_j) \] \[\begin{align*} Cov(X_i,X_j) &= E(X_iX_j) - E(X_i)E(X_j)\\ &= E(X_iX_j) - \mu_i\mu_j \\ &= [\mathrm{E}(\mathbf{X} \mathbf{X}^T)]_{ij} - [\boldsymbol{\mu} \boldsymbol{\mu}^T]_{ij} \end{align*}\] where the last step uses \((\mathbf{X}\mathbf{X}^T)_{ij} = X_iX_j\), as the matrices below make explicit.
Note that since \(\boldsymbol{\mathrm{X}}\) and \(\boldsymbol{\mathrm{\mu}}\) are column vectors,
\[ \boldsymbol{\mathrm{X}} \boldsymbol{\mathrm{X}}^T =\begin{bmatrix}X_1 \\ X_2 \\ \vdots \\ X_n\end{bmatrix} \begin{bmatrix}X_1 & X_2 & \dots & X_n\end{bmatrix} = \begin{bmatrix}X_1^2 & X_1 X_2 & X_1X_3& \dots& X_1X_n \\ X_2X_1 & X_2^2 & X_2X_3 & \dots & X_2X_n \\ \vdots & \vdots & \vdots & & \vdots \\ X_nX_1 & X_nX_2 & X_nX_3 & \dots & X_n^2 \end{bmatrix} \]
\[ \boldsymbol{\mathrm{\mu}} \boldsymbol{\mathrm{\mu}}^T =\begin{bmatrix}\mu_1 \\ \mu_2 \\ \vdots \\ \mu_n\end{bmatrix} \begin{bmatrix}\mu_1 & \mu_2 & \dots & \mu_n\end{bmatrix} = \begin{bmatrix} \mu_1^2 & \mu_1 \mu_2 & \mu_1\mu_3& \dots& \mu_1\mu_n \\ \mu_2\mu_1 & \mu_2^2 & \mu_2\mu_3 & \dots & \mu_2\mu_n \\ \vdots & \vdots & \vdots & & \vdots \\ \mu_n\mu_1 & \mu_n\mu_2 & \mu_n\mu_3 & \dots & \mu_n^2 \end{bmatrix} \]
Thus, \(\mathbb{V}_{ij} = [\mathrm{E}(\mathbf{X} \mathbf{X}^T)]_{ij} - [\boldsymbol{\mu} \boldsymbol{\mu}^T]_{ij}\) for all \(i,j\).
So, \(\mathbb{V}= \mathrm{E}[\boldsymbol{\mathrm{X}}\boldsymbol{\mathrm{X}}^{\scriptscriptstyle \mathrm{T}}]- \boldsymbol{\mathrm{\mu}}\boldsymbol{\mathrm{\mu}}^{\scriptscriptstyle \mathrm{T}}.\)
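A numerical sketch of this identity, replacing the expectations by sample averages (for which the same algebra goes through exactly); the simulated data and names here are illustrative.

```r
# Check V = E[X X^T] - mu mu^T with sample analogues:
# the average of outer products, minus the outer product of the mean vector
set.seed(2)
X <- matrix(rnorm(200), ncol = 2)    # 100 draws of a 2-dimensional X
n <- nrow(X)
mu_hat <- colMeans(X)
Exx    <- t(X) %*% X / n             # sample analogue of E[X X^T]
V      <- Exx - mu_hat %*% t(mu_hat)
# agrees with the (1/n)-denominator sample covariance matrix
all.equal(V, cov(X) * (n - 1) / n)
```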
\(\mathrm{Var}(\mathbb{A}\boldsymbol{\mathrm{Y}}) = \mathrm{E}(\mathbb{A}\boldsymbol{\mathrm{Y}} [\mathbb{A}\boldsymbol{\mathrm{Y}}]^{\scriptscriptstyle \mathrm{T}}) - \mathbb{A}\nu [\mathbb{A}\nu]^{\scriptscriptstyle \mathrm{T}}\)
\(= \mathrm{E}( \mathbb{A}\boldsymbol{\mathrm{Y}} {\boldsymbol{\mathrm{Y}}}^{\scriptscriptstyle \mathrm{T}}{\mathbb{A}}^{\scriptscriptstyle \mathrm{T}}) - \mathbb{A}\nu {\nu}^{\scriptscriptstyle \mathrm{T}}[\mathbb{A}]^{\scriptscriptstyle \mathrm{T}}\)
\(= \mathbb{A}\ E(\boldsymbol{\mathrm{Y}} \boldsymbol{\mathrm{Y}}^{\scriptscriptstyle \mathrm{T}}) \mathbb{A}^{\scriptscriptstyle \mathrm{T}}- \mathbb{A}\nu \nu^{\scriptscriptstyle \mathrm{T}}\mathbb{A}^{\scriptscriptstyle \mathrm{T}}\)
\(= \mathbb{A} \big( \mathrm{E}(\boldsymbol{\mathrm{Y}} \boldsymbol{\mathrm{Y}}^{\scriptscriptstyle \mathrm{T}}) - \nu \nu^{\scriptscriptstyle \mathrm{T}}\big) \mathbb{A}^{\scriptscriptstyle \mathrm{T}}\)
\(= \mathbb{A}\mathrm{Var}(\boldsymbol{\mathrm{Y}})\mathbb{A}^{\scriptscriptstyle \mathrm{T}}\)
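Since sample covariances obey the same algebra, this sandwich formula can also be checked numerically; the simulated Y below is illustrative and self-contained.

```r
# Check Var(AY) = A Var(Y) A^T using sample covariances
set.seed(3)
Y <- matrix(rnorm(300), ncol = 3)        # 100 draws of a 3-dimensional Y
A <- matrix(c(1, -1, 0), nrow = 1)       # the contrast Y_1 - Y_2 from before
lhs <- var(as.vector(Y %*% t(A)))        # variance of AY, computed directly
rhs <- A %*% cov(Y) %*% t(A)             # A Var(Y) A^T
all.equal(lhs, as.vector(rhs))           # TRUE, up to floating point
```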
This identity is what lm() uses to obtain standard errors.
Read the analysis of newspaper circulation data in Section 1.2.2 of Sheather. This example is continued in Section 6.2.2 of Sheather. You are now in a position to read most of this section too, but you are not required to do so at this point.
circulation <- read.table("circulation.txt",sep="\t",header=T)
Transform the data. Add two new columns to the dataframe called log_Sunday and log_Weekday containing the natural logarithm of the corresponding columns. The R command log() gives this natural logarithm, also known as log to the base \(e\). We’ll discuss later in class how and why to choose a suitable transformation of the data, which is an important decision for data analysis.
Build the model in R. Create a linear model called lm1 by fitting the logarithm of weekday circulation and the binary variable for tabloid competitor as explanatory variables for the logarithm of Sunday circulation. Your code may look something like
lm1 <- lm(log_Sunday~log_Weekday+Tabloid_with_serious_competitor,data=circulation)
Define X to be the design matrix using the model.matrix() command by typing
X <- model.matrix(lm1)
y <- circulation$log_Sunday
b <- solve(t(X) %*% X) %*% t(X) %*% y; b
## [,1]
## (Intercept) -0.4472999
## log_Weekday 1.0613296
## Tabloid_with_serious_competitor -0.5313725
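As a sanity check (not required by the question), the normal-equations solution should agree with the coefficients that lm() reports:

```r
# b from solve(t(X) %*% X) %*% t(X) %*% y should match coef(lm1)
cbind(manual = as.vector(b), lm = coef(lm1))
all.equal(as.vector(b), unname(coef(lm1)))   # TRUE
```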
# residual standard error, on n - p degrees of freedom
y_hat <- X %*% b
s <- sqrt(sum((y - y_hat)^2)/(nrow(X)-ncol(X))); s
## [1] 0.1391926
# find standard errors for b
b_se <- s*sqrt(diag(solve(t(X) %*% X))); b_se
## (Intercept) log_Weekday
## 0.35138427 0.02847685
## Tabloid_with_serious_competitor
## 0.06800383
Compare your standard errors with those reported by summary(lm1). Also, check that your calculation of the estimated standard deviation of the measurement error matches the residual standard error offered by summary(lm1). Why do you think summary(lm1) says that this is computed on 86 degrees of freedom?
summary(lm1)
##
## Call:
## lm(formula = log_Sunday ~ log_Weekday + Tabloid_with_serious_competitor,
## data = circulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.27550 -0.10175 -0.00198 0.08758 0.34881
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.44730 0.35138 -1.273 0.206
## log_Weekday 1.06133 0.02848 37.270 < 2e-16 ***
## Tabloid_with_serious_competitor -0.53137 0.06800 -7.814 1.26e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1392 on 86 degrees of freedom
## Multiple R-squared: 0.9427, Adjusted R-squared: 0.9413
## F-statistic: 706.8 on 2 and 86 DF, p-value: < 2.2e-16
Clearly, they do match.
This is computed on 86 degrees of freedom because the residual degrees of freedom are \(n - p = 89 - 3 = 86\), where \(n = 89\) is the number of newspapers and \(p = 3\) is the number of estimated coefficients.
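The same number can be read off directly from the fitted object; n and p here refer to the quantities above:

```r
df.residual(lm1)                        # 86
nrow(circulation) - length(coef(lm1))   # n - p = 89 - 3 = 86
```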
Let \(Y_i\) be the natural logarithm of the Sunday circulation of newspaper \(i\). The design matrix \(\mathbb{X} = \begin{bmatrix}X_1 & X_2 & X_3\end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ x_{i1} & x_{i2} & x_{i3} \\ \vdots & \vdots & \vdots \\ x_{n1} & x_{n2} & x_{n3} \end{bmatrix} = \begin{bmatrix} 1 & x_{12} & x_{13} \\ 1 & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ 1 & x_{i2} & x_{i3} \\ \vdots & \vdots & \vdots \\ 1 & x_{n2} & x_{n3} \end{bmatrix}\) where \(x_{i1} = 1\) (which corresponds to the intercept), \(x_{i2}\) is the natural logarithm of the weekday circulation of newspaper \(i\), and \(x_{i3}\) is the indicator variable which takes value 1 if there is a competing tabloid, and 0 otherwise.
The model in matrix form is
\[ \boldsymbol{\mathrm{Y}} = \mathbb{X} \boldsymbol{\mathrm{\beta}} + \boldsymbol{\mathrm{\epsilon}} \]
where \(\boldsymbol{\mathrm{Y}} = \begin{bmatrix}Y_1 \\ Y_2 \\ \vdots \\ Y_n\end{bmatrix}\) is the vector of random variables \(Y_i\) which measures the Sunday circulation of newspaper i.
\(\mathbb{X}\) is the design matrix (as defined above). We obtain this purely from the data.
\(\boldsymbol{\mathrm{\beta}} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3\end{bmatrix}\) is the vector containing the true parameter values.
\(\boldsymbol{\mathrm{\epsilon}} = \begin{bmatrix}\epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n\end{bmatrix}\) is the vetor of random variables \(\epsilon_i\) which measures the error corresponding to newspaper i.
The model in subscript form, for \(i = 1, 2, \dots, n\), is
\[ Y_i = \beta_1 + x_{i2}\beta_2 + x_{i3}\beta_3 + \epsilon_i\]
where \(\mathrm{E}(\epsilon_i) = 0\), \(\mathrm{Var}(\epsilon_i) = \sigma^2\), and \(\mathrm{Cov}(\epsilon_i,\epsilon_j)=0\) for \(i\neq j\).
We model our data \(\mathbf{y}= \begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ y_n\end{bmatrix}\) as a draw from this probability model.
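To make the last sentence concrete, here is a sketch of drawing one synthetic dataset from this model. It additionally assumes normally distributed errors (the model above only requires mean zero, variance \(\sigma^2\), and zero covariance) and plugs in the estimates from earlier as stand-ins for the unknown true parameters.

```r
# Simulate one draw y_sim from Y = X beta + eps, with eps iid N(0, sigma^2);
# normality is an extra assumption here, and X, b, s come from the fit above,
# standing in for the true beta and sigma purely for illustration
set.seed(4)
y_sim <- X %*% b + rnorm(nrow(X), mean = 0, sd = s)
b_sim <- solve(t(X) %*% X) %*% t(X) %*% y_sim   # refit on the simulated data
b_sim   # varies around b from draw to draw
```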