Q1

The answer is $n-m+1$. For the case where $m=1$, we can see that the formula correctly gives us $\sum_{i=1}^{n} 1 = n-1+1 = n$.
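As a quick numerical check of the count, the following sketch sums the constant 1 over $i = m, \dots, n$. The values of `m` and `n` are illustrative choices made here, not part of the problem.

```r
# Illustrative values only (assumptions for this check)
m <- 4
n <- 10

# Add up a 1 for every index i = m, ..., n
total <- sum(sapply(m:n, function(i) 1))

total        # number of terms in the sum
n - m + 1    # value given by the formula
```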

Q2

  1. These are not necessarily equal. For example, suppose $m<n$. Then $\sum_{i=1}^{n} x_i = \sum_{i=1}^{m} x_i + \sum_{i=m+1}^{n} x_i$, and so the two sides are not equal unless $\sum_{i=m+1}^{n} x_i = 0$.
  2. These are equal. $i$ and $k$ are arbitrary indexing variables.
  3. These are not necessarily equal. For example, if $x_i=1$ for all $i$, and $z_i=2$ for all $i$, then the two values will be different.
  4. These are not necessarily equal. For example, if $x_i=1$ for all $i=1,\dots,n$, and $x_i=2$ for all $i=n+1,\dots,2n$, then the two values will be different.
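The decomposition used in the first item can be checked numerically. This is a minimal sketch with a made-up vector `x` and made-up limits `m` and `n`; none of these values come from the problem.

```r
# Made-up data and limits (assumptions for illustration)
x <- c(2, 5, 1, 4, 3)
m <- 2
n <- 5

full      <- sum(x[1:n])         # sum over i = 1, ..., n
partial   <- sum(x[1:m])         # sum over i = 1, ..., m
remainder <- sum(x[(m + 1):n])   # sum over i = m + 1, ..., n

full == partial + remainder      # the decomposition holds
full == partial                  # equal only if the remainder is zero
```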

Q3

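Because $\bar{x}$ can be treated as a constant, we have $\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} = \sum_{i=1}^{n} x_i - n\bar{x}$.

However, $n\bar{x} = n \cdot \frac{1}{n} \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i$, so the above quantity is zero.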
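The result can be illustrated numerically. The sketch below uses a made-up vector `x` (an assumption for illustration). Because of floating-point rounding, the computed sum may be a tiny nonzero number rather than exactly zero.

```r
# Made-up data (assumption for illustration)
x <- c(3.2, 1.5, 4.8, 2.0, 6.1)

# Sum of deviations from the sample mean
deviation_sum <- sum(x - mean(x))

# Zero up to floating-point error
abs(deviation_sum) < 1e-12
```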

Q4

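$$\mathbf{1}^{\mathrm{T}}\mathbf{x} = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$$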
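The same identity can be sketched in R using matrix multiplication. The vector `x` below is made up for illustration.

```r
# Made-up data (assumption for illustration)
x <- c(2, 7, 1, 8)

# Vector of ones with the same length as x
ones <- rep(1, length(x))

# t(ones) is a 1 x n row vector; %*% treats x as an n x 1 column
inner <- as.numeric(t(ones) %*% x)

inner == sum(x)
```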

Q5

First, load the data

mice <- read.csv("https://ionides.github.io/401f18/hw/hw01/femaleMiceWeights.csv")
  1. Let $x_i$, $i=1,2,\dots,12$, be the weights for the 12 mice on the high fat diet.

The sample mean is given by $\bar{x} = \frac{1}{12}\sum_{i=1}^{12} x_i$. The sample standard deviation is given by $s = \sqrt{\frac{1}{11}\sum_{i=1}^{12} (x_i - \bar{x})^2}$.

To obtain these values in R, we first subset to the high fat diet:

data_hf = mice[mice$Diet == "hf",2] 

We then use the mean and sd function to obtain the desired values:

sample_mean = mean(data_hf)
sample_sd = sd(data_hf)
sample_mean
## [1] 26.83417
sample_sd
## [1] 4.097606
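
The built-in `mean` and `sd` functions implement the sample mean and the sample standard deviation (with divisor $n-1$). The sketch below checks this on a made-up vector `w` of 12 values, which stands in for the mice data so that the example runs without downloading the file.

```r
# Made-up weights (NOT the mice data; an assumption for illustration)
w <- c(21.5, 28.1, 24.0, 23.4, 23.7, 19.8,
       28.4, 21.0, 22.5, 20.1, 26.9, 25.7)
n <- length(w)

# Sample mean and sample standard deviation computed from the formulas
xbar_by_hand <- sum(w) / n
s_by_hand    <- sqrt(sum((w - xbar_by_hand)^2) / (n - 1))

# Both comparisons should agree up to floating-point error
all.equal(xbar_by_hand, mean(w))
all.equal(s_by_hand, sd(w))
```
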
hist(data_hf, freq = FALSE)
normal.x = seq(from = min(data_hf), to = max(data_hf),length = 100)
normal.y = dnorm(normal.x, mean = sample_mean, sd = sample_sd)
lines(normal.x,normal.y,col = "blue")

The data do not appear to fit a normal distribution model well: the histogram is slightly skewed and has a much thicker tail than the fitted normal density.

normal_draws = rnorm(12,sample_mean,sample_sd)

hist(normal_draws, freq = FALSE)
normal.x = seq(from = min(normal_draws), to = max(normal_draws),length = 100)
normal.y = dnorm(normal.x, mean = sample_mean, sd = sample_sd)
lines(normal.x,normal.y,col = "blue")

  1. Even though the data in (c) were actually drawn from a normal distribution, they do not appear to fit a normal distribution model any better than the mice data. If we were to repeat this procedure, we would notice a similar pattern quite often. This tells us that 12 data points are not enough to conclude that a normal distribution model is inappropriate for the mice data.

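  2. The percentage is given by: $\int_{1}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx + \int_{-\infty}^{-1} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx$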

We can evaluate this in R using:

1-pnorm(1) + pnorm(-1)
## [1] 0.3173105

Alternatively, we can obtain the same result by symmetry:

2*pnorm(-1)
## [1] 0.3173105
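
As a cross-check on `pnorm`, the two tail integrals of the standard normal density can also be evaluated by numerical integration. This is a sketch using `integrate` from base R; the function name `std_normal_density` is introduced here for illustration.

```r
# Standard normal density written out explicitly
std_normal_density <- function(x) exp(-x^2 / 2) / sqrt(2 * pi)

# Numerically integrate each tail
upper_tail <- integrate(std_normal_density, lower = 1,    upper = Inf)$value
lower_tail <- integrate(std_normal_density, lower = -Inf, upper = -1)$value

# Should agree closely with 2 * pnorm(-1)
upper_tail + lower_tail
```
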
  1. We can count the observations that lie more than 1 sample standard deviation above or below the sample mean as follows:

above = sum(data_hf > sample_mean + sample_sd)
below = sum(data_hf < sample_mean - sample_sd)

Then we calculate the percentage more than 1 sample standard deviation away from the sample mean:

(above+below)/12
## [1] 0.3333333

This value (about 0.333) is fairly close to the value of about 0.317 given under a normal approximation.
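
To get a rough sense of how much this proportion varies for samples of size 12, one can simulate it for data that really are normal. This is a sketch only: the helper `prop_outside`, the seed, and the number of replications are choices made here, not part of the original solution.

```r
set.seed(401)  # arbitrary seed, for reproducibility only

# Proportion of a normal sample of size n lying more than one
# sample standard deviation away from its sample mean
prop_outside <- function(n) {
  z <- rnorm(n)
  mean(abs(z - mean(z)) > sd(z))
}

props <- replicate(10000, prop_outside(12))

mean(props)                    # average proportion across simulated samples
mean(round(props * 12) == 4)   # how often exactly 4 of 12 fall outside
```

If the outcome 4 out of 12 occurs often in this simulation, the mice result is not surprising under a normal model.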

Q6

my_t <- function(n){
  rnorm(1) / sqrt(sum(rnorm(n)^2/n))
}

(a)-(c)

t_draws = replicate(10000,my_t(6))
hist(t_draws, breaks = 15, freq = FALSE)

x = seq(from = min(t_draws), to = max(t_draws),length = 100)
t.y = dt(x, df = 6)
normal.y = dnorm(x, 0, 1)

lines(x,t.y, col = "blue")
lines(x,normal.y, col = "blue",lty = "dashed")

Both densities are unimodal and symmetric; however, the t distribution has thicker tails than the normal distribution.
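
The difference in the tails can be quantified by comparing two-sided tail probabilities under each distribution. The cutoffs below are illustrative choices made here.

```r
# Illustrative cutoffs (assumptions)
cutoffs <- c(2, 3, 4)

# P(|T| > cutoff) for the t distribution with 6 degrees of freedom
t_tail <- 2 * pt(-cutoffs, df = 6)

# P(|Z| > cutoff) for the standard normal distribution
normal_tail <- 2 * pnorm(-cutoffs)

data.frame(cutoff = cutoffs, t_tail, normal_tail,
           ratio = t_tail / normal_tail)
```

The ratio grows with the cutoff, so extreme values are relatively much more likely under the t distribution.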

Q7

my_F <- function(m,n){
   sum(rnorm(m)^2/m) / sum(rnorm(n)^2/n)
}

(a)-(b)

f_draws = replicate(10000,my_F(5,10))
hist(f_draws, breaks = 15, freq = FALSE)

x = seq(from = min(f_draws), to = max(f_draws),length = 100)
f.y = df(x, df1 = 5, df2 = 10)

lines(x,f.y, col = "blue")

  1. When $m=1$, we can write Eq 2 as $V = \frac{\sum_{i=1}^{1} Y_i^2/1}{\sum_{j=1}^{n} Z_j^2/n} = \frac{Y_1^2}{\sum_{j=1}^{n} Z_j^2/n}$, where $Y_1, Z_1, \dots, Z_n$ are draws from a standard normal distribution. This corresponds exactly to the square of Eq 1. We can therefore conclude that $V$ has the same distribution as $U^2$.
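
This conclusion can be checked against R's distribution functions, assuming (as in the plots above) that Eq 1 defines a t random variable with $n$ degrees of freedom and Eq 2 an F random variable with $m$ and $n$ degrees of freedom. Since $P(U^2 \le q) = P(-\sqrt{q} \le U \le \sqrt{q})$, the two cumulative probabilities below should agree. The values of `n` and `q` are illustrative.

```r
# Illustrative degrees of freedom and evaluation points (assumptions)
n <- 10
q <- c(0.5, 1, 2, 5)

# P(V <= q) when V has an F distribution with 1 and n degrees of freedom
p_V <- pf(q, df1 = 1, df2 = n)

# P(U^2 <= q) = P(-sqrt(q) <= U <= sqrt(q)) when U is t with n df
p_U2 <- pt(sqrt(q), df = n) - pt(-sqrt(q), df = n)

all.equal(p_V, p_U2)
```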