Write solutions to the exercises in the first half of the homework. For the data analysis, you do not have to report anything for questions 1–4. For questions 5 and 6, report your code together with a brief explanation. Question 7 asks you to carefully write out the probability model you have used for the standard errors. Recall that you are permitted to collaborate, or to use any internet resources, but you must list all sources that make a substantial contribution to your report. As usual, following the syllabus, you are also requested to give some feedback in a “Please explain” statement.
Suppose \(X\) and \(Y\) are bivariate random variables with means \(\mu_X\) and \(\mu_Y\) respectively. Use the linearity of expectation, together with the formulas \[ {\mathrm{Var}}(X)={\mathrm{E}}[X^2]-({\mathrm{E}}[X])^2, \quad {\mathrm{Cov}}(X,Y)={\mathrm{E}}[XY]-{\mathrm{E}}[X]\, {\mathrm{E}}[Y], \] to show that \[ {\mathrm{Var}}(X+Y)={\mathrm{Var}}(X)+{\mathrm{Var}}(Y) +2{\mathrm{Cov}}(X,Y). \] This is similar to a calculation we did in class, but here it is a little easier since you can start from the above formulas for \({\mathrm{Var}}(X)\) and \({\mathrm{Cov}}(X,Y)\) rather than going all the way back to the basic definition \[ {\mathrm{Var}}(X)={\mathrm{E}}\big[ (X- {\mathrm{E}}[X])^2\big], \quad {\mathrm{Cov}}(X,Y)={\mathrm{E}}\big[ (X-{\mathrm{E}}[X])(Y-{\mathrm{E}}[Y]) \big]. \]
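Before writing the derivation, it can help to confirm the identity numerically. The following is a minimal sketch, not a proof: the simulated variables `x` and `y`, the seed, and the sample size are all hypothetical choices made for illustration. It relies on the fact that the same identity holds for the sample variance and sample covariance computed by `var()` and `cov()`.

```r
# Numerical sanity check (not a proof) of
#   Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y)
# using sample variances and covariances of simulated data.
set.seed(401)                  # arbitrary seed, for reproducibility only
n <- 1000                      # hypothetical sample size
x <- rnorm(n)                  # hypothetical simulated X
y <- 0.5 * x + rnorm(n)        # hypothetical Y, constructed to be correlated with X

lhs <- var(x + y)
rhs <- var(x) + var(y) + 2 * cov(x, y)

# The two sides should agree up to floating point rounding error.
print(all.equal(lhs, rhs))
```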
Use R to find the variance matrix \({\mathrm{Var}}({\boldsymbol{\mathrm{Y}}})\), when \({\boldsymbol{\mathrm{Y}}}\) is defined by \[ {\boldsymbol{\mathrm{Y}}} = \begin{bmatrix} 6 & 3 & 1 \\ 0 & 5 & 2 \\ 0 & 0 & 4 \\ \end{bmatrix} {\boldsymbol{\mathrm{Z}}} \]
Supposing that \(Y_1-Y_2\) is normally distributed, use pnorm() to find \({\mathrm{P}}[ Y_1-Y_2>3]\).
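One way to organize this calculation in R is sketched below. The exercise as shown here does not state the mean or variance of \({\boldsymbol{\mathrm{Z}}}\), so the sketch assumes, purely for illustration, that \({\boldsymbol{\mathrm{Z}}}\) has mean zero and identity variance matrix; substitute the values given for your problem. The names `A`, `var_Z`, `mean_Z` and `d` are hypothetical. The sketch uses the rule \({\mathrm{Var}}(\mathbb{A}{\boldsymbol{\mathrm{Z}}})=\mathbb{A}\,{\mathrm{Var}}({\boldsymbol{\mathrm{Z}}})\,\mathbb{A}^{\mathrm{T}}\) and writes \(Y_1-Y_2\) as a linear combination of \({\boldsymbol{\mathrm{Y}}}\).

```r
# Matrix from the exercise, entered row by row.
A <- matrix(c(6, 3, 1,
              0, 5, 2,
              0, 0, 4), nrow = 3, byrow = TRUE)

# ASSUMPTION (not stated in the exercise text shown here):
# Z has mean zero and identity variance matrix. Replace with the given values.
var_Z  <- diag(3)
mean_Z <- rep(0, 3)

# Var(AZ) = A Var(Z) A^T
var_Y <- A %*% var_Z %*% t(A)

# Y1 - Y2 = d^T Y with d = (1, -1, 0)
d <- c(1, -1, 0)
mean_diff <- as.numeric(t(d) %*% A %*% mean_Z)
var_diff  <- as.numeric(t(d) %*% var_Y %*% d)

# P[Y1 - Y2 > 3] under the normality supposition; pnorm() takes the
# standard deviation, not the variance.
p <- pnorm(3, mean = mean_diff, sd = sqrt(var_diff), lower.tail = FALSE)
print(var_Y)
print(p)
```

Note that `lower.tail = FALSE` requests the upper tail probability directly, which is equivalent to `1 - pnorm(...)`.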
lm() obtains standard errors
Read the analysis of newspaper circulation data in Section 1.2.2 of Sheather. This example is continued in Section 6.2.2 of Sheather. You are now in a position to read most of this section too, but you are not required to do so at this point.
sep="\t". Since there are spaces within some newspaper names, read.table(....,sep=" ") does not work. Instead, use
circulation <- read.table("circulation.txt",sep="\t",header=TRUE)
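The effect of the separator can be seen on a small example. This is a sketch only: the file contents, newspaper names and numbers below are invented for illustration and are not the real circulation data. It uses `count.fields()` to show how many fields each line is split into under each choice of separator.

```r
# Hypothetical three-line, tab-separated file; names and numbers are invented.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Newspaper\tSunday\tWeekday",
             "Example Daily News\t100\t80",
             "Sample Times\t50\t40"), tmp)

# With a tab separator, every line has the same number of fields.
fields_tab <- count.fields(tmp, sep = "\t")

# With a space separator, names containing spaces are broken apart,
# so the number of fields varies from line to line.
fields_space <- count.fields(tmp, sep = " ")

demo <- read.table(tmp, sep = "\t", header = TRUE)
print(fields_tab)
print(fields_space)
print(demo)
```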
Transform the data. Add two new columns to the dataframe called log_Sunday and log_Weekday containing the natural logarithm of the corresponding columns. The R command log() gives this natural logarithm, also known as log to the base \(e\). We’ll discuss later in class how and why to choose a suitable transformation of the data, which is an important decision for data analysis.
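Adding the transformed columns might look like the following sketch. It uses a small invented dataframe, `circulation_demo`, in place of the real data, and it assumes the untransformed columns are named `Sunday` and `Weekday`; check the column names in your own dataframe with `names(circulation)`.

```r
# Hypothetical stand-in for the circulation data; values are invented and
# the column names Sunday and Weekday are assumed.
circulation_demo <- data.frame(Sunday  = c(100, 250, 400),
                               Weekday = c(80, 200, 300))

# log() is the natural logarithm (base e) by default.
circulation_demo$log_Sunday  <- log(circulation_demo$Sunday)
circulation_demo$log_Weekday <- log(circulation_demo$Weekday)

print(circulation_demo)
```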
Build the model in R. Create a linear model called lm1 that uses the logarithm of weekday circulation and the binary variable for tabloid competitor as explanatory variables for the logarithm of Sunday circulation. Your code may look something like
lm1 <- lm(log_Sunday~log_Weekday+Tabloid_with_serious_competitor,data=circulation)
Define X to be the design matrix using the model.matrix() command by typing
X <- model.matrix(lm1)
y <- circulation$log_Sunday
Check that these match the output of summary(lm1). Also, check that your calculation of the estimated standard deviation of the measurement error matches the residual standard error reported by summary(lm1). Why do you think summary(lm1) says that this is computed on 86 degrees of freedom?
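The kind of check described here can be sketched on synthetic data. This is an illustration under stated assumptions rather than a solution for the circulation data: the dataframe `demo`, its columns `x1`, `x2`, `y`, and the simulated coefficients are all hypothetical. The sketch assumes the usual least squares formulas, with coefficient vector \((\mathbb{X}^{\mathrm{T}}\mathbb{X})^{-1}\mathbb{X}^{\mathrm{T}}{\boldsymbol{\mathrm{y}}}\), residual standard error computed with \(n-p\) in the denominator, and standard errors given by that residual standard error times the square roots of the diagonal entries of \((\mathbb{X}^{\mathrm{T}}\mathbb{X})^{-1}\).

```r
# Synthetic data, invented for illustration only.
set.seed(1)
n <- 20
demo <- data.frame(x1 = rnorm(n),
                   x2 = rep(c(0, 1), length.out = n))  # binary explanatory variable
demo$y <- 1 + 2 * demo$x1 - demo$x2 + rnorm(n, sd = 0.5)

fit <- lm(y ~ x1 + x2, data = demo)
X <- model.matrix(fit)      # design matrix, including the intercept column
y <- demo$y

# Least squares coefficients from the matrix formula.
XtX_inv <- solve(t(X) %*% X)
b <- XtX_inv %*% t(X) %*% y

# Residual standard error, dividing by n - p where p = number of columns of X.
p <- ncol(X)
resid <- y - X %*% b
s <- sqrt(sum(resid^2) / (n - p))

# Standard errors of the coefficients.
se <- s * sqrt(diag(XtX_inv))

# Compare with the values reported by lm() and summary().
print(all.equal(as.numeric(b), unname(coef(fit))))
print(all.equal(s, summary(fit)$sigma))
print(all.equal(unname(se), unname(summary(fit)$coefficients[, "Std. Error"])))
print(c(n - p, fit$df.residual))   # degrees of freedom used for s
```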
Write out in mathematical notation the probability model used to construct these standard errors. Be careful to define the notation you use. Specify a letter for each quantity. You can use words to help define the quantities in your equation, but you should usually avoid words in an equation. You can write your equation either using vectors and matrices or by using subscripts to denote each unit \(i\) and specifying the range of values of \(i\) for which the equation holds. Be explicit about which quantities are random vectors. If you define a measurement error model, be sure to specify all means, variances and covariances for the error random variables.
License: This material is provided under an MIT license
Acknowledgement: The linear model fitting problem develops an example from S. J. Sheather (2009) “A Modern Approach to Regression with R.”