Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.
CC-BY_NC



Objectives

  1. Set up general notation for working with time series data and time series models.

  2. Define the trend function for a time series model, and discuss its estimation from data. In particular, discuss the properties of least square estimation of the trend.

  3. Define the autocovariance and autocorrelation functions and their standard estimators.




2.1 Definition: Time series data and time series models




2.1.1 Review questions

  1. What is a random variable?

  2. What is a collection of jointly defined random variables?

  3. What is a probability density function? What is a joint density function? What is a conditional density function?

  4. What does it mean to say that ``\(\theta\) is a vector of parameters?’’

(There are different answers to these questions, but you should be able to write down an answer that you are satisfied with.)




2.2 Definition: The mean function, or trend

2.3 Definition: The autocovariance and autocorrelation functions, and some estimators of them




2.4 Estimating a trend by least squares

Let’s analyze a time series of global mean annual temperature downloaded from http://climate.nasa.gov/system/internal_resources/details/original/647_Global_Temperature_Data_File.txt. These data are in degrees Celsius measured as an anomaly from a 1951-1980 base. This is climatology jargon for saying that the sample mean of the temperature over the interval 1951-1980 was subtracted from all time points.

global_temp <- read.table("Global_Temperature.txt",header=TRUE)
str(global_temp)
## 'data.frame':    136 obs. of  3 variables:
##  $ Year            : int  1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 ...
##  $ Annual          : num  -0.2 -0.11 -0.1 -0.2 -0.28 -0.31 -0.31 -0.33 -0.2 -0.12 ...
##  $ Moving5yrAverage: num  NA NA -0.18 -0.2 -0.24 -0.29 -0.29 -0.25 -0.27 -0.25 ...
plot(Annual~Year,data=global_temp,ty="l")

2.4.1 Fitting a least squares model with a quadratic trend

Perhaps the simplest trend model that makes sense looking at these data is a quadratic trend, \[\mu(t)= \beta_0 + \beta_1 t + \beta_2 t^2.\] To write the least squares estimate of \(\beta_0\), \(\beta_1\) and \(\beta_2\), we set up matrix notation. Write \[ \mu = (\mu_1,\mu_2,\dots,\mu_N)^{\scriptsize{T}}\] for the column vector describing the mean function, and similarly, \[ \beta = (\beta_0,\beta_1,\beta_2)^{\scriptsize{T}}.\] Then, defining \[ Z = \left(\begin{array}{ccc} 1 & 1880 & 1880^2 \\ 1 & 1881 & 1881^2 \\ 1 & 1882 & 1882^2 \\ \vdots & \vdots & \vdots \end{array}\right),\] we have \[ \mu = Z\beta.\] We also write a generic potential dataset \(y_{1:N}\) and the data \({y_{1:N}^*}\) as column vectors, \[\begin{eqnarray} y &=& (y_1,y_2,\dots,y_N)^{\scriptsize{T}},\\ {y^*} & =& ({y_1^*},{y_2^*},\dots,{y_N^*})^{\scriptsize{T}}. \end{eqnarray}\] The ordinary least squares (OLS) estimator of \(\beta\) is \[\hat\beta_{OLS}(y_{1:N}) = (Z^{\scriptsize{T}}Z)^{-1}Z^{\scriptsize{T}}y,\] with corresponding OLS estimate \[{\beta^*}=\hat\beta_{OLS}({y_{1:N}^*}) = (Z^{\scriptsize{T}}Z)^{-1}Z^{\scriptsize{T}}{y^*}.\] We can carry out this computation in R by

lm_fit <- lm(Annual~Year+I(Year^2),data=global_temp)
summary(lm_fit)
## 
## Call:
## lm(formula = Annual ~ Year + I(Year^2), data = global_temp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26880 -0.08655  0.00215  0.07388  0.37582 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.818e+02  2.753e+01   10.23   <2e-16 ***
## Year        -2.964e-01  2.829e-02  -10.48   <2e-16 ***
## I(Year^2)    7.789e-05  7.264e-06   10.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1146 on 132 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.8657, Adjusted R-squared:  0.8637 
## F-statistic: 425.6 on 2 and 132 DF,  p-value: < 2.2e-16

where I() is a function that tells R to construct Year^2 as a variable, and inhibits interpretation in the R model formula notation.

We can check visually how well this model fits the data.

yr <- 1880:2026
Z <- cbind(1,yr,yr^2)
beta <- coef(lm_fit)
prediction <- Z%*%beta
plot(Annual~Year,data=global_temp,ty="l",xlim=range(yr),ylim=range(c(global_temp$Annual,prediction),na.rm=TRUE),lty="dashed")
lines(x=yr,y=prediction,col="red")

  • The overall estimated trend seems a reasonable fit for the data.

  • If we want to attach uncertainty to our parameter estimates, and consequently to our forecast, we need a time series model \(Y_{1:N}\), which we write in column vector form as \[Y = (Y_1,Y_2,\dots,Y_N)^{\scriptsize{T}}.\]

  • The standard OLS model is

    [L1] \(\quad\quad\quad\quad\quad Y = Z\beta + \epsilon,\)

    where \(\epsilon=\epsilon_{1:N}\) is a vector of independent, identically distributed random variables with mean zero and constant variance, \[{\mathbb{E}}[\epsilon_n]=0, \quad\quad {\mathrm{Var}}[\epsilon_n] = \sigma^2.\] Standard linear model software, such as lm in R, provides confidence intervals based on this model.

  • Under model L1, the estimator \(\hat\beta_{OLS}(y_{1:N})\) is unbiased. This can be checked: \[\begin{eqnarray} {\mathbb{E}}\big[\hat\beta_{OLS}(Y_{1:N})\big] &=&{\mathbb{E}}\big[ (Z^{\scriptsize{T}}Z)^{-1}Z^{\scriptsize{T}}Y \big]\\ &=& {\mathbb{E}}\big[ (Z^{\scriptsize{T}}Z)^{-1}Z^{\scriptsize{T}}\{Z\beta + \epsilon \}\big]\\ &=& (Z^{\scriptsize{T}}Z)^{-1}Z^{\scriptsize{T}}\{Z\beta + {\mathbb{E}}[\epsilon]\} \\ &=& (Z^{\scriptsize{T}}Z)^{-1}(Z^{\scriptsize{T}}Z)\beta \\ &=& \beta \end{eqnarray}\]
  • A result for linear models is that \(\hat\beta_{OLS}(y_{1:N})\) is the minimum variance unbiased estimator for model L1.

  • The variance/covariance matrix of \(\hat\beta_{OLS}(Y_{1:N})\) under this model is \[{\mathrm{Cov}}[\hat\beta_{OLS}(Y_{1:N})] = \sigma^2 \big( Z^{\scriptsize{T}}Z\big)^{-1},\] which is estimated using an estimator for \(\sigma\) of \[\hat\sigma_{OLS}(y_{1:N})= \sqrt{\frac{1}{N-d} \big(y-Z\hat\beta_{OLS}\big)^{\scriptsize{T}}\big(y-Z\hat\beta_{OLS}\big)},\] where \(d\) is the number of covariates, i.e., the number of columns of \(Z\).

Let’s look at the residuals to assess how appropriate this model is here.

acf(resid(lm_fit))

  • The horizontal dashed lines on the graph of the sample autocorrelation function (ACF) give a measure of chance variation under the null hypothesis that the residuals are IID.

  • At each lag \(h\), the chance that the estimated ACF falls within this band is approximately 95%, under the null hypothesis.

  • Thus, under the null hypothesis, one expects a fraction of \(1/20\) of the lags of the sample ACF to fall outside this band.

  • Here, the sample ACF confirms what we can probably see from the plot of the fitted model: the variation around the fitted model is clustered in time, so the sample ACF of the residuals is not consistent with a model having independent error terms.




2.4.2 Question: How does R construct these horizontal dashed lines?

  • How would you check what R actually does when it constructs these dashed lines? What approximation is being made? When is that approximation appropriate?

Hint: If you type acf in R, you get the source code for the acf function. You’ll see that the plotting is done by a service function plot.acf. This service function is part of the package, and is not immediately accessible to you. Nevertheless, you can check the source code as follows

  1. Notice, either from the help documentation ?acf or the last line of the source code acf that this function resides in the package stats.

  2. Now, you can access this namespace directly, to list the source code, by

    stats:::plot.acf
  3. Finally, relate this source code to the task of testing for lack of correlation, a standard topic in undergrad introductory statistics courses. The critical line of code seems to be

    clim0 <- if (with.ci) qnorm((1 + ci)/2)/sqrt(x$n.used)

    This appears to use a normal distribution approximation for the sample autocorrelation estimator, with mean zero and standard deviation \(1/\sqrt{N}\). Derive this approximation, for a sequence of IID mean zero random variables. You can ignore the issue of estimating the mean: suppose that the mean is known to be zero.




2.4.3 Generalized least squares

  • Suppose for the moment that we knew the covariance matrix, \(\Gamma\), for a model with dependent errors,

    [L2] \(\quad\quad\quad\quad Y = Z\beta + \zeta, \quad \quad \zeta \sim N[0,\Gamma].\)

    We read “\(\zeta \sim N[0,\Gamma]\)” as “\(\zeta\) follows a multivariate normal distribution with mean zero and covariance matrix \(\Gamma\).”

  • The minimum variance unbiased estimator of \(\beta\) for model L2 is the generalized least square (GLS) estimator, \[\hat \beta_{GLS}(y_{1:N}) = \big( Z^{\scriptsize{T}}\Gamma^{-1} Z \big)^{-1} \, Z^{\scriptsize{T}}\Gamma^{-1} y.\]

  • The OLS estimator remains unbiased for L2 (you can check this as an exercise). In this sense it remains a reasonable estimator. It is often a practical solution to use the OLS estimator, expecially for preliminary data analysis. We don’t know \(\Gamma\) so can’t necessarily make a good estimator based on the GLS model. It might be easier to get an estimate of \(\Gamma\) once we have a reasonable estimate of the trend.

  • For model L2, the variance of the OLS estimator is \[{\mathrm{Var}}\big[\hat \beta_{OLS}(Y_{1:N})\big] = (Z^{\scriptsize{T}}Z)^{-1} \, Z^{\scriptsize{T}}\Gamma Z\, (Z^{\scriptsize{T}}Z)^{-1}.\] This is different from the variance under model L1.

  • CONCLUSION. It is okay to do ordinary linear regression for data which are not well modeled with uncorrelated errors. However, if we do so, we should not trust the error estimates coming from L1.

  • This is an example of a situation where some parts of the output from statistical software are reasonable (here, the parameter estimates from lm) and other parts are unreasonable (the corresponding standard errors and any tests based on them). The theory helps us decide which bits of computer output to use and which to ignore.