Question 1

x1 = c(rep(1,12),rep(0,12))
x2 = c(rep(0,12),rep(1,12))

X = cbind(x1,x2)
X
##       x1 x2
##  [1,]  1  0
##  [2,]  1  0
##  [3,]  1  0
##  [4,]  1  0
##  [5,]  1  0
##  [6,]  1  0
##  [7,]  1  0
##  [8,]  1  0
##  [9,]  1  0
## [10,]  1  0
## [11,]  1  0
## [12,]  1  0
## [13,]  0  1
## [14,]  0  1
## [15,]  0  1
## [16,]  0  1
## [17,]  0  1
## [18,]  0  1
## [19,]  0  1
## [20,]  0  1
## [21,]  0  1
## [22,]  0  1
## [23,]  0  1
## [24,]  0  1

Question 2

mice = read.csv("https://ionides.github.io/401w18/hw/hw01/femaleMiceWeights.csv")

mice$mu1 = (mice$Diet == "chow")*1
mice$mu2 = (mice$Diet == "hf")*1

lm1 <- lm(Bodyweight~mu1+mu2-1,data=mice);summary(lm1)
## 
## Call:
## lm(formula = Bodyweight ~ mu1 + mu2 - 1, data = mice)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.1042 -2.4358 -0.4138  2.8335  7.1858 
## 
## Coefficients:
##     Estimate Std. Error t value Pr(>|t|)    
## mu1   23.813      1.039   22.91   <2e-16 ***
## mu2   26.834      1.039   25.82   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.6 on 22 degrees of freedom
## Multiple R-squared:  0.9819, Adjusted R-squared:  0.9802 
## F-statistic: 595.8 on 2 and 22 DF,  p-value: < 2.2e-16

We do not want to include the intercept in this model. Thus we need -1 in the lm call since otherwise R will automatically include the intercept.

lm <- lm(Bodyweight~mu1+mu2,data=mice);summary(lm)
## 
## Call:
## lm(formula = Bodyweight ~ mu1 + mu2, data = mice)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.1042 -2.4358 -0.4138  2.8335  7.1858 
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   26.834      1.039  25.818   <2e-16 ***
## mu1           -3.021      1.470  -2.055   0.0519 .  
## mu2               NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.6 on 22 degrees of freedom
## Multiple R-squared:  0.1611, Adjusted R-squared:  0.1229 
## F-statistic: 4.224 on 1 and 22 DF,  p-value: 0.05192

If we do not write -1, R produce NA for estimates associated with mu2. This is because there is parameter redundancy when trying to estimate an intercept and a mean for both groups.

95% confidence interval for \(\mu_1\)

c(23.813-1.96*1.039,23.813+1.96*1.039)
## [1] 21.77656 25.84944

95% confidence interval for \(\mu_2\)

c(26.834-1.96*1.039,26.834+1.96*1.039)
## [1] 24.79756 28.87044

Question 3

mice$mu1 = 1
mice$mu_diff = (mice$Diet == "hf")*1
lm2 <- lm(Bodyweight~mu1+mu_diff-1,data=mice);summary(lm2)
## 
## Call:
## lm(formula = Bodyweight ~ mu1 + mu_diff - 1, data = mice)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.1042 -2.4358 -0.4138  2.8335  7.1858 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## mu1       23.813      1.039  22.912   <2e-16 ***
## mu_diff    3.021      1.470   2.055   0.0519 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.6 on 22 degrees of freedom
## Multiple R-squared:  0.9819, Adjusted R-squared:  0.9802 
## F-statistic: 595.8 on 2 and 22 DF,  p-value: < 2.2e-16

Comparing summart(lm1) and summary(lm2), we notice that mu1 in lm2 has same estimate and standard error as mu1 in lm1. mu_diff in lm2 is equal to mu2 - mu1 in lm1. Also notice that lm1 and lm2 have same residual standard error 3.6.

all.equal(lm1$fitted.values, lm2$fitted.values)
## [1] TRUE

Thus fitted value in lm1 is (approximately) equal to fitted value in lm2 (they might be slightly different due to rounding errors).This again indicates that lm1 and lm 2 are essentially the same model.

95% confidence interval for \(\mu_2-\mu_1\)

# use normal approximation
c(3.021-1.96*1.470,3.021+1.96*1.470)
## [1] 0.1398 5.9022
c(3.021-qt(0.975,df=22)*1.470,3.021+qt(0.975,df=22)*1.470)
## [1] -0.02759341  6.06959341

To see this as a hythothesis problem, we want to test \(H_0:\mu_1 = \mu_2\) against \(H_1: \mu_1 \neq \mu_2\). From summary(lm2), the associated test statistics is 2.055 with p-value 0.0519. Thus we fail to reject \(H_0\) at 0.05 level. Namely we do not have enough evidence to conclude the mean of bodyweight under diet chow and hf is different. (If we choose significant level to be 0.1, we can reject \(H_0\))

Question 4

V <- summary(lm1)$cov.unscaled

a = c(1,-1)
V_diff = (a %*% V %*% a)
sd_diff = sqrt(V_diff)*summary(lm1)$sigma # we need to time sigma here since cov.unscaled does not include sigma
sd_diff
##          [,1]
## [1,] 1.469867

This is consistent with the standard error given in summary(lm2).

summary(lm1)$cov.unscaled
##            mu1        mu2
## mu1 0.08333333 0.00000000
## mu2 0.00000000 0.08333333

The correlation between \(\hat{\mu}_1\) and \(\hat{\mu}_2\) is 0. This is because \(\mu_1\) is estimated using \(y_{1j}\),\(j=1,...,12\) while \(\mu_2\) is estimated using \(y_{2j}\),\(j=1,...,12\). Thus \(\hat{\mu}_1\) and \(\hat{\mu}_2\) are independent since \(y_{ij}\) are assumed to be independent.

License: This material is provided under an MIT license
Acknowledgement: The randomized experiment draws on material from from https://genomicsclass.github.io/book