Question 1.1.
\[\begin{eqnarray} \mathrm{Var}\left(\hat{\mu}\left(Y_{1:N}\right)\right)&=&\mathrm{Var}\left(\frac{1}{N}\sum_{n=1}^{N}Y_{n}\right) \\ &=&\frac{1}{N^{2}}\mathrm{Cov}\left(\sum_{m=1}^{N}Y_{m},\sum_{n=1}^{N}Y_{n}\right) \mbox{ using P1 and P3} \\ &=&\frac{1}{N^{2}}\sum_{m=1}^{N}\sum_{n=1}^{N}\mathrm{Cov}\left(Y_{m},Y_{n}\right) \mbox{ using P4} \\ &=&\frac{1}{N^{2}}\left(N\gamma_{0}+2\left(N-1\right)\gamma_{1}+\ldots+2\gamma_{N-1}\right) \mbox{ using P2 to give $\gamma_h=\gamma_{-h}$} \\ &=&\frac{1}{N}\gamma_{0}+\frac{2}{N^{2}}\sum_{h=1}^{N-1}\left(N-h\right)\gamma_{h} \end{eqnarray}\]
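As a sanity check, the formula above can be verified numerically. The sketch below assumes a stationary AR(1) model with coefficient \(\phi\) and innovation standard deviation \(\sigma\), for which \(\gamma_h=\sigma^2\phi^{|h|}/(1-\phi^2)\); the model, parameter values, and variable names are illustrative choices, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
N, phi, sigma = 50, 0.6, 1.0

# Theoretical autocovariance of a stationary AR(1):
# gamma_h = sigma^2 * phi^|h| / (1 - phi^2)
gamma = sigma**2 * phi**np.arange(N) / (1 - phi**2)

# Variance of the sample mean from the formula above:
# gamma_0/N + (2/N^2) * sum_{h=1}^{N-1} (N - h) gamma_h
var_formula = gamma[0] / N + (2 / N**2) * np.sum((N - np.arange(1, N)) * gamma[1:])

# Monte Carlo check: simulate many AR(1) paths started from stationarity
reps = 10000
means = np.empty(reps)
for r in range(reps):
    y = np.empty(N)
    y[0] = rng.normal(scale=sigma / np.sqrt(1 - phi**2))  # stationary start
    for n in range(1, N):
        y[n] = phi * y[n - 1] + rng.normal(scale=sigma)
    means[r] = y.mean()

print(var_formula, means.var())  # the two values should agree closely
```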
Question 1.2. By definition, \[ \hat{\gamma}_{h}\left(y_{1:N}\right)=\frac{1}{N}\sum_{n=1}^{N-h}\left(y_{n}-\hat{\mu}_{n}\right)\left(y_{n+h}-\hat{\mu}_{n+h}\right). \] Here, we consider the null hypothesis where \(Y_{1:N}\) is IID with mean \(0\) and standard deviation \(\sigma\). We therefore use the estimator \(\hat\mu_n=0\), and the autocovariance function estimator becomes \[\begin{eqnarray} \hat{\gamma}_{h}\left(y_{1:N}\right) &=& \frac{1}{N}\sum_{n=1}^{N-h}y_{n}y_{n+h}. \end{eqnarray}\] We let \(\sum_{n=1}^{N-h}Y_{n}Y_{n+h}=U\) and \(\sum_{n=1}^{N}Y_{n}^{2}=V\), and carry out a first order Taylor expansion of \[\hat\rho_h(Y_{1:N}) = \frac{\hat\gamma_h(Y_{1:N})}{\hat\gamma_0(Y_{1:N})} = \frac{U}{V}\] about \((\mathbb{E}[U],\mathbb{E}[V])\). This gives \[ \hat{\rho}_{h}(Y_{1:N}) \approx\frac{\mathbb{E}\left(U\right)}{\mathbb{E}\left(V\right)}+\left(U-\mathbb{E}\left(U\right)\right)\left.\frac{\partial}{\partial U}\left(\frac{U}{V}\right)\right|_{\left(\mathbb{E}\left(U\right),\mathbb{E}\left(V\right)\right)}+\left(V-\mathbb{E}\left(V\right)\right)\left.\frac{\partial}{\partial V}\left(\frac{U}{V}\right)\right|_{\left(\mathbb{E}\left(U\right),\mathbb{E}\left(V\right)\right)}. \] We have \[ \mathbb{E}\left(U\right)=\sum_{n=1}^{N-h}\mathbb{E}\left(Y_{n}\, Y_{n+h}\right)=0, \] \[ \mathbb{E}\left(V\right)=\sum_{n=1}^{N}\mathbb{E}\left(Y_{n}^{2}\right)=N\sigma^{2}, \] \[ \frac{\partial}{\partial U}\left(\frac{U}{V}\right)=\frac{1}{V}, \] \[ \frac{\partial}{\partial V}\left(\frac{U}{V}\right)=\frac{-U}{V^{2}}. \] Putting this together, we have \[\begin{eqnarray} \hat{\rho}_{h}(Y_{1:N})&\approx&\frac{\mathbb{E}\left(U\right)}{\mathbb{E}\left(V\right)}+\frac{U}{\mathbb{E}\left(V\right)}-\frac{\left(V-\mathbb{E}\left(V\right)\right)\mathbb{E}(U)}{\mathbb{E}(V)^{2}} \\ &=&\frac{U}{N\sigma^{2}}. \end{eqnarray}\] This gives the approximation \[ \mathrm{Var}\left(\hat{\rho}_{h}(Y_{1:N})\right)\approx\frac{\mathrm{Var}\left(U\right)}{N^{2}\sigma^{4}}. 
\] We now look to compute \[ \mathrm{Var}\left(U\right)= \mathrm{Var}\left(\sum_{n=1}^{N-h}Y_{n}Y_{n+h}\right). \] Since \(Y_{1:N}\) are independent and mean zero, we have \(\mathbb{E}[Y_{n}Y_{n+h}] = 0\) for \(h\neq 0\). Therefore, for \(m\neq n\), \[ \mathrm{Cov}\left(Y_{m}Y_{m+h},Y_nY_{n+h}\right) = \mathbb{E}\left[ Y_{m}Y_{m+h}\, Y_nY_{n+h}\right] = 0, \] since at least one of the four indices then appears exactly once in the product. Thus, the terms in the sum for \(\mathrm{Var}\left(U\right)\) are uncorrelated for \(m\neq n\) and we have \[\begin{eqnarray} \mathrm{Var}\left(U\right) &=& \sum_{n=1}^{N-h} \mathrm{Var}\left(Y_nY_{n+h}\right) \\ &=& (N-h) \, \mathbb{E}\left[Y_n^2Y_{n+h}^2\right] \\ &=& (N-h) \, \sigma^4. \end{eqnarray}\] Therefore, \[ \mathrm{Var}\left(\hat{\rho}_{h}(Y_{1:N})\right)\approx\frac{N-h}{N^{2}}. \] For fixed \(h\), as \(N\rightarrow\infty\) we have \(\frac{N-h}{N^2}\approx\frac{1}{N}\), justifying a standard deviation under the null hypothesis of \(1/\sqrt{N}\).
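The \(1/\sqrt{N}\) approximation is easy to check by simulation. Below is a minimal sketch (sample size, lag, and variable names are arbitrary choices) that computes \(\hat\rho_h=U/V\) for many IID Gaussian datasets and compares the empirical standard deviation to \(1/\sqrt{N}\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, h, reps = 200, 3, 10000

# rho_hat_h under the null: sum_n y_n y_{n+h} / sum_n y_n^2, with mu_hat = 0
y = rng.normal(size=(reps, N))
U = np.sum(y[:, : N - h] * y[:, h:], axis=1)
V = np.sum(y**2, axis=1)
rho_hat = U / V

print(rho_hat.std(), 1 / np.sqrt(N))  # both values should be close
```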
B. A 95% confidence interval is a function of the data that constructs a set which (under a specified model) covers the true parameter with probability 0.95.
Here, the interval \(\big[-1.96/\sqrt{N},1.96/\sqrt{N}\big]\) does not depend on the data. For any given model, it therefore covers \(\rho_h\) either with probability 1 or 0.
The interval \(\big[\hat\rho_h(y_{1:N})-1.96/\sqrt{N},\hat\rho_h(y_{1:N})+1.96/\sqrt{N}\big]\) covers zero if and only if \(\hat\rho_h(y_{1:N})\) falls between the dashed lines. In this sense, the dashed lines are relevant to constructing a confidence interval with local coverage of 95% at \(\rho_h=0\) for \(N\) large. However, the lines really represent an acceptance region for a test under a null hypothesis of independence, which is a different thing from a confidence interval.
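The acceptance-region interpretation can be made concrete by estimating the fraction of IID datasets for which \(\hat\rho_h\) falls outside the dashed lines \(\pm 1.96/\sqrt{N}\); under the null hypothesis this fraction should be close to 5%. A minimal sketch, with arbitrary simulation settings:

```python
import numpy as np

rng = np.random.default_rng(2)
N, h, reps = 100, 1, 10000

# Fraction of IID datasets whose sample ACF at lag h falls outside
# the dashed lines +/- 1.96/sqrt(N); this should be close to 0.05
y = rng.normal(size=(reps, N))
rho_hat = np.sum(y[:, : N - h] * y[:, h:], axis=1) / np.sum(y**2, axis=1)
outside = np.mean(np.abs(rho_hat) > 1.96 / np.sqrt(N))
print(outside)
```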
Question 1.3.
Credit for sources was awarded following the policy in the syllabus. Full credit was given if sources were listed, with clear attribution at the point where the source was used.