The answer is $n-m+1$. For the case where $m=1$, we can see that the formula correctly gives us $\sum_{i=1}^{n} 1 = n-1+1 = n$.
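As a quick numerical sanity check of the counting formula, the sketch below uses arbitrary illustrative values of `m` and `n` (they are not part of the problem) and counts the terms directly:

```r
# Illustrative values only; any integers with m <= n would do.
m <- 4
n <- 10

# Summing 1 over i = m, ..., n simply counts the terms in the sum.
n_terms <- sum(rep(1, length(m:n)))

# Compare with the closed-form answer n - m + 1.
n_terms == n - m + 1
```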
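Because $\bar{x}$ can be treated as a constant, we have $\sum_{i=1}^{n}(x_i-\bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n}\bar{x} = \sum_{i=1}^{n} x_i - n\bar{x}$.
However, $n\bar{x} = n \cdot \frac{1}{n}\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i$, so the above quantity is zero.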
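The identity can also be checked numerically. The following is a minimal sketch on simulated data; the vector `x`, its length, and the seed are assumptions made for illustration. Because of floating-point rounding, the sum is compared to zero within a small tolerance rather than exactly.

```r
set.seed(1)                        # arbitrary seed, for reproducibility
x <- rnorm(20, mean = 5, sd = 2)   # hypothetical data, not from the homework

# Sum of deviations from the sample mean.
dev_sum <- sum(x - mean(x))

# Zero up to floating-point rounding error.
abs(dev_sum) < 1e-10
```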
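$$\mathbf{1}^\top \mathbf{x} = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$$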
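The same product can be formed in R with the matrix multiplication operator `%*%`. This is a small sketch with a made-up vector `x`; the names `ones` and `inner` are introduced here for illustration.

```r
x <- c(2, 5, 7, 11)          # hypothetical vector
ones <- rep(1, length(x))    # the vector of ones

# t(ones) is a 1 x n row vector; %*% treats x as an n x 1 column.
inner <- as.numeric(t(ones) %*% x)

# The inner product with the ones vector equals the plain sum.
inner == sum(x)
```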
First, load the data:
mice <- read.csv("https://ionides.github.io/401f18/hw/hw01/femaleMiceWeights.csv")
The sample mean is given by $\bar{x} = \frac{1}{12}\sum_{i=1}^{12} x_i$. The sample standard deviation is given by $s = \sqrt{\frac{1}{11}\sum_{i=1}^{12}(x_i-\bar{x})^2}$.
To obtain these values in R, we first subset to the high fat diet:
data_hf = mice[mice$Diet == "hf",2]
We then use the mean and sd functions to obtain the desired values:
sample_mean = mean(data_hf)
sample_sd = sd(data_hf)
sample_mean
## [1] 26.83417
sample_sd
## [1] 4.097606
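To connect the formulas to the built-in functions, the sketch below computes both quantities by hand and compares them with `mean` and `sd`. It uses a short made-up vector rather than the mice data so that it runs without downloading anything; the divisor `n - 1` plays the role of the 11 in the formula for $s$.

```r
# Hypothetical weights, NOT the mice data.
x <- c(20.1, 23.4, 25.0, 26.2, 28.9, 31.5)
n <- length(x)

# Sample mean: sum divided by n.
manual_mean <- sum(x) / n

# Sample standard deviation: note the n - 1 divisor.
manual_sd <- sqrt(sum((x - manual_mean)^2) / (n - 1))

# Both should agree with R's built-in functions.
all.equal(manual_mean, mean(x))
all.equal(manual_sd, sd(x))
```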
hist(data_hf, freq = FALSE)
normal.x = seq(from = min(data_hf), to = max(data_hf),length = 100)
normal.y = dnorm(normal.x, mean = sample_mean, sd = sample_sd)
lines(normal.x,normal.y,col = "blue")
It does not appear that the data fit a normal distribution model well. The data are slightly skewed and have a much thicker tail.
normal_draws = rnorm(12,sample_mean,sample_sd)
hist(normal_draws, freq = FALSE)
normal.x = seq(from = min(normal_draws), to = max(normal_draws),length = 100)
normal.y = dnorm(normal.x, mean = sample_mean, sd = sample_sd)
lines(normal.x,normal.y,col = "blue")
Even though the data in (c) were actually drawn from a normal distribution, they do not appear to fit a normal distribution model any better than the mice data do. If we were to repeat this procedure, we would notice a similar pattern quite often. This tells us that, based on only the 12 data points we collected, we cannot reasonably conclude that a normal distribution model is inappropriate.
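One way to see how often this happens is to repeat the simulation several times and look at the histograms side by side. The following is only a sketch: the number of replicates (6), the seed, and the panel layout are arbitrary choices, and the mean and standard deviation are typed in from the output reported earlier.

```r
set.seed(401)                # arbitrary seed, for reproducibility
sample_mean <- 26.83417      # values copied from the output above
sample_sd <- 4.097606

old_par <- par(mfrow = c(2, 3))   # 2 x 3 grid of plots
for (k in 1:6) {
  # Draw a fresh sample of 12 from the fitted normal model.
  draws <- rnorm(12, sample_mean, sample_sd)
  hist(draws, freq = FALSE, main = paste("Replicate", k))
  # Overlay the normal density the sample was drawn from.
  grid_x <- seq(from = min(draws), to = max(draws), length.out = 100)
  lines(grid_x, dnorm(grid_x, mean = sample_mean, sd = sample_sd), col = "blue")
}
par(old_par)                      # restore the previous plot settings
```

With only 12 points per panel, the histograms typically vary a good deal from one replicate to the next.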
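The percentage is given by: $$\int_{1}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx + \int_{-\infty}^{-1} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx$$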
We can evaluate this in R using:
1-pnorm(1) + pnorm(-1)
## [1] 0.3173105
Alternatively, we can obtain the same result by symmetry:
2*pnorm(-1)
## [1] 0.3173105
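The two tail integrals can also be evaluated by numerical quadrature, which gives a check that does not rely on `pnorm`. This sketch uses base R's `integrate` with the standard normal density `dnorm`; the variable names are introduced here for illustration.

```r
# Numerically integrate the standard normal density over each tail.
upper_tail <- integrate(dnorm, lower = 1, upper = Inf)$value
lower_tail <- integrate(dnorm, lower = -Inf, upper = -1)$value
tail_prob <- upper_tail + lower_tail

# Should agree with the pnorm-based answer up to quadrature error.
all.equal(tail_prob, 2 * pnorm(-1), tolerance = 1e-6)
```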
above = sum(data_hf > sample_mean + sample_sd)
below = sum(data_hf < sample_mean - sample_sd)
Then we calculate the percentage more than 1 sample standard deviation away from the sample mean:
(above+below)/12
## [1] 0.3333333
This value is fairly close to the value given under a normal approximation.
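To get a feel for how much this proportion varies in samples of size 12, one can simulate it for data that really are normal. This is a rough sketch under that assumption; the helper `prop_outside`, the seed, and the number of replicates are choices made here, not part of the original solution.

```r
set.seed(12)   # arbitrary seed, for reproducibility

# Proportion of a normal sample of size n lying more than one
# sample standard deviation away from the sample mean.
prop_outside <- function(n) {
  x <- rnorm(n)
  mean(abs(x - mean(x)) > sd(x))
}

props <- replicate(10000, prop_outside(12))

mean(props)                          # average proportion across samples
quantile(props, c(0.025, 0.975))     # typical range for n = 12
```

Since the proportion can only take values that are multiples of 1/12, an observed value of 4/12 is one of a small number of possible outcomes.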
my_t <- function(n){
  rnorm(1) / sqrt(sum(rnorm(n)^2/n))
}
(a)-(c)
t_draws = replicate(10000,my_t(6))
hist(t_draws, breaks = 15, freq = FALSE)
x = seq(from = min(t_draws), to = max(t_draws),length = 100)
t.y = dt(x, df = 6)
normal.y = dnorm(x, 0, 1)
lines(x,t.y, col = "blue")
lines(x,normal.y, col = "blue",lty = "dashed")
Both densities are unimodal and symmetric; however, the t distribution has thicker tails than the normal distribution.
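The difference in tail thickness can be put in numbers by comparing the probability of falling more than 2 units from zero under each distribution. The sketch below repeats the definition of `my_t` so that it runs on its own; the cutoff of 2 and the seed are arbitrary choices made for illustration.

```r
set.seed(6)   # arbitrary seed, for reproducibility

# Same construction as my_t above, repeated so this block is self-contained.
my_t <- function(n) {
  rnorm(1) / sqrt(sum(rnorm(n)^2 / n))
}
t_draws <- replicate(10000, my_t(6))

# Two-sided tail probabilities beyond +/- 2.
tail_sim <- mean(abs(t_draws) > 2)     # from the simulated draws
tail_t <- 2 * pt(-2, df = 6)           # exact, t with 6 degrees of freedom
tail_normal <- 2 * pnorm(-2)           # exact, standard normal

c(simulated = tail_sim, t = tail_t, normal = tail_normal)
```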
my_F <- function(m,n){
  sum(rnorm(m)^2/m) / sum(rnorm(n)^2/n)
}
(a)-(b)
f_draws = replicate(10000,my_F(5,10))
hist(f_draws, breaks = 15, freq = FALSE)
x = seq(from = min(f_draws), to = max(f_draws),length = 100)
f.y = df(x, df1 = 5, df2 = 10)
lines(x,f.y, col = "blue")
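As a non-graphical check, the simulated quantiles can be compared with those of the F distribution from `qf`. This is a sketch that repeats the definition of `my_F` so it runs on its own; the seed and the particular quantile levels in `probs` are arbitrary choices.

```r
set.seed(510)   # arbitrary seed, for reproducibility

# Same construction as my_F above, repeated so this block is self-contained.
my_F <- function(m, n) {
  sum(rnorm(m)^2 / m) / sum(rnorm(n)^2 / n)
}
f_draws <- replicate(10000, my_F(5, 10))

# Compare simulated and theoretical quantiles of F(5, 10).
probs <- c(0.25, 0.5, 0.75, 0.95)
rbind(simulated = quantile(f_draws, probs),
      theoretical = qf(probs, df1 = 5, df2 = 10))
```

If the construction is right, the two rows should agree closely, with larger discrepancies expected at the more extreme quantiles.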