Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.
CC-BY_NC



Objectives

  1. Set up general notation for working with time series data and time series models.

  2. Define the trend function for a time series model, and discuss its estimation from data. In particular, discuss the properties of least square estimation of the trend.

  3. Define the autocovariance and autocorrelation functions and their standard estimators.




2.1 Definition: Time series data and time series models




2.1.1 Review questions

  1. What is a random variable?

  2. What is a collection of jointly defined random variables?

  3. What is a probability density function? What is a joint density function? What is a conditional density function?

  4. What does it mean to say that “\(\theta\) is a vector of parameters?”

(There are different answers to these questions, but you should be able to write down an answer that you are satisfied with.)




2.2 Definition: The mean function, or trend


2.2.1 Review question

What is expected value? How is it defined? How can it fail to exist for a properly defined random variable?


  • We define the mean function by \[ \mu_n = {\mathbb{E}}[Y_n] = \int_{-\infty}^\infty y_n \, f^{}_{Y_n}(y_n)\, dy_n\] for \(n\in 1{{\,{:}\,}}N\).

  • We use the words “mean function” and “trend” interchangeably.

  • We say “function” since we are thinking of \(\mu_n\) as a function of \(n\).

    • Sometimes, it makes sense to think of time as continuous. Then, we can write \[\mu(t)\] for the expected value of an observation at time \(t\). We only make observations at the discrete collection of times \(t_{1:N}\) and so we require \[ \mu(t_n)= \mu_n.\]

    • A time series may have measurements evenly spaced in time, but our notation does not insist on this. In practice, time series data may contain missing values or unequally spaced observations.

  • \(\mu_n\) may depend on \(\theta\), the parameter vector. We can write \(\mu_n(\theta)\) to make this explicit.

  • We write \(\hat\mu_n(y_{1:N})\) to be some estimator of \(\mu_n\), i.e., a map which is applied to the data to give an estimate of \(\mu_n\). An appropriate choice of \(\hat\mu_n\) will depend on the data and the model.

  • Usually, applied statistics courses do not distinguish between estimators (functions that can be applied to any dataset) and estimates (an estimator evaluated on the actual data). In this course, we will want to be quite careful when thinking about model specification and diagnosing model misspecification. Therefore, we are going to try to preserve this distinction. Properly, this distinction is always there, but is often ignored.

  • For example, the estimate of the mean function is the value of the estimator when applied to the data. Here, we write this as \[ {\hat\mu_n^*} = \hat\mu_n({y_{1:N}^*}).\]

  • We call \({\hat\mu_n^*}\) an estimated mean function or estimated trend.

  • For example, sometimes we suppose a model with \(\mu_n=\mu\), so the mean is assumed constant. In this case, the model is called mean stationary. Then, we might estimate \(\mu\) using the mean estimator, \[\hat\mu(y_{1:N})=\frac{1}{N}\sum_{k=1}^N y_k.\] In this case, the corresponding estimate \({\hat\mu^*}=\hat\mu({y_{1:N}^*})\) is the sample mean.

    • We can compute the sample mean, \({\hat\mu^*}\), for any dataset. It is only a reasonable estimator of the mean function when a mean stationary model is appropriate.

    • Notice that trend is a property of the model, and the estimated trend is a function of the data.

    • Formally, we should not talk about the trend of the data. People do, but we should try not to.

    • Similarly, data cannot be mean stationary. A model can be mean stationary.


2.2.2 Question: properties of models vs properties of data

Consider these two statements. Does is matter which we use?

    1. ``The data look mean stationary.''

    2. ``A mean stationary model looks appropriate for these data.''




2.3 Definition: The autocovariance and autocorrelation functions, and some estimators of them




2.4 Estimating a trend by least squares

Let’s analyze a time series of global mean annual temperature downloaded from http://climate.nasa.gov/system/internal_resources/details/original/647_Global_Temperature_Data_File.txt. These data are in degrees Celsius measured as an anomaly from a 1951-1980 base. This is climatology jargon for saying that the sample mean of the temperature over the interval 1951-1980 was subtracted from all time points.

global_temp <- read.table("Global_Temperature.txt",header=TRUE)
str(global_temp)
## 'data.frame':    137 obs. of  3 variables:
##  $ Year            : int  1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 ...
##  $ Annual          : num  -0.2 -0.12 -0.1 -0.21 -0.28 -0.32 -0.31 -0.33 -0.2 -0.12 ...
##  $ Moving5yrAverage: num  -0.13 -0.16 -0.19 -0.21 -0.24 -0.26 -0.27 -0.27 -0.27 -0.26 ...
plot(Annual~Year,data=global_temp,ty="l")

2.4.1 Fitting a least squares model with a quadratic trend

Perhaps the simplest trend model that makes sense looking at these data is a quadratic trend, \[\mu(t)= \beta_0 + \beta_1 t + \beta_2 t^2.\] To write the least squares estimate of \(\beta_0\), \(\beta_1\) and \(\beta_2\), we set up matrix notation. Write \[ \mu = (\mu_1,\mu_2,\dots,\mu_N)^{\scriptsize{T}}\] for the column vector describing the mean function, and similarly, \[ \beta = (\beta_0,\beta_1,\beta_2)^{\scriptsize{T}}.\] Then, defining \[ Z = \left(\begin{array}{ccc} 1 & 1880 & 1880^2 \\ 1 & 1881 & 1881^2 \\ 1 & 1882 & 1882^2 \\ \vdots & \vdots & \vdots \end{array}\right),\] we have \[ \mu = Z\beta.\] We also write a generic potential dataset \(y_{1:N}\) and the data \({y_{1:N}^*}\) as column vectors, \[\begin{eqnarray} y &=& (y_1,y_2,\dots,y_N)^{\scriptsize{T}},\\ {y^*} & =& ({y_1^*},{y_2^*},\dots,{y_N^*})^{\scriptsize{T}}. \end{eqnarray}\] The ordinary least squares (OLS) estimator of \(\beta\) is \[\hat\beta_{OLS}(y_{1:N}) = (Z^{\scriptsize{T}}Z)^{-1}Z^{\scriptsize{T}}y,\] with corresponding OLS estimate \[{{\hat\beta^*}}=\hat\beta_{OLS}({y_{1:N}^*}) = (Z^{\scriptsize{T}}Z)^{-1}Z^{\scriptsize{T}}{y^*}.\] We can carry out this computation in R by

lm_fit <- lm(Annual~Year+I(Year^2),data=global_temp)
summary(lm_fit)
## 
## Call:
## lm(formula = Annual ~ Year + I(Year^2), data = global_temp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27092 -0.08638  0.00613  0.07481  0.38067 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.959e+02  2.681e+01   11.04   <2e-16 ***
## Year        -3.111e-01  2.753e-02  -11.30   <2e-16 ***
## I(Year^2)    8.168e-05  7.067e-06   11.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1157 on 134 degrees of freedom
## Multiple R-squared:  0.8766, Adjusted R-squared:  0.8748 
## F-statistic: 476.1 on 2 and 134 DF,  p-value: < 2.2e-16

where I() is a function that tells R to construct Year^2 as a variable, and inhibits interpretation in the R model formula notation.

We can check visually how well this model fits the data.

yr <- 1880:2026
Z <- cbind(1,yr,yr^2)
beta <- coef(lm_fit)
prediction <- Z%*%beta
plot(Annual~Year,data=global_temp,ty="l",xlim=range(yr),ylim=range(c(global_temp$Annual,prediction),na.rm=TRUE),lty="dashed")
lines(x=yr,y=prediction,col="red")

  • The overall estimated trend seems a reasonable fit for the data.

  • If we want to attach uncertainty to our parameter estimates, and consequently to our forecast, we need a time series model \(Y_{1:N}\), which we write in column vector form as \[Y = (Y_1,Y_2,\dots,Y_N)^{\scriptsize{T}}.\]

  • The standard OLS model is

    [L1] \(\quad\quad\quad\quad\quad Y = Z\beta + \epsilon,\)

    where \(\epsilon=\epsilon_{1:N}\) is a vector of independent, identically distributed random variables with mean zero and constant variance, \[{\mathbb{E}}[\epsilon_n]=0, \quad\quad {\mathrm{Var}}[\epsilon_n] = \sigma^2.\] Standard linear model software, such as lm in R, provides confidence intervals based on this model.

  • Under model L1, the estimator \(\hat\beta_{OLS}(y_{1:N})\) is unbiased. This can be checked: \[\begin{eqnarray} {\mathbb{E}}\big[\hat\beta_{OLS}(Y_{1:N})\big] &=&{\mathbb{E}}\big[ (Z^{\scriptsize{T}}Z)^{-1}Z^{\scriptsize{T}}Y \big]\\ &=& {\mathbb{E}}\big[ (Z^{\scriptsize{T}}Z)^{-1}Z^{\scriptsize{T}}\{Z\beta + \epsilon \}\big]\\ &=& (Z^{\scriptsize{T}}Z)^{-1}Z^{\scriptsize{T}}\{Z\beta + {\mathbb{E}}[\epsilon]\} \\ &=& (Z^{\scriptsize{T}}Z)^{-1}(Z^{\scriptsize{T}}Z)\beta \\ &=& \beta \end{eqnarray}\]
  • A result for linear models is that \(\hat\beta_{OLS}(y_{1:N})\) is the minimum variance unbiased estimator for model L1.

  • The variance/covariance matrix of \(\hat\beta_{OLS}(Y_{1:N})\) under this model is \[{\mathrm{Cov}}[\hat\beta_{OLS}(Y_{1:N})] = \sigma^2 \big( Z^{\scriptsize{T}}Z\big)^{-1},\] which is estimated using an estimator for \(\sigma\) of \[\hat\sigma_{OLS}(y_{1:N})= \sqrt{\frac{1}{N-d} \big(y-Z\hat\beta_{OLS}\big)^{\scriptsize{T}}\big(y-Z\hat\beta_{OLS}\big)},\] where \(d\) is the number of covariates, i.e., the number of columns of \(Z\).

Let’s look at the residuals to assess how appropriate this model is here.

acf(resid(lm_fit))

  • The horizontal dashed lines on the graph of the sample autocorrelation function (ACF) give a measure of chance variation under the null hypothesis that the residuals are IID.

  • At each lag \(h\), the chance that the estimated ACF falls within this band is approximately 95%, under the null hypothesis.

  • Thus, under the null hypothesis, one expects a fraction of \(1/20\) of the lags of the sample ACF to fall outside this band.

  • Here, the sample ACF confirms what we can probably see from the plot of the fitted model: the variation around the fitted model is clustered in time, so the sample ACF of the residuals is not consistent with a model having independent error terms.




2.4.2 Question: How does R construct these horizontal dashed lines?

  • Solving this question is in Homework 1.

  • How would you check what R actually does when it constructs these dashed lines? What approximation is being made? When is that approximation appropriate?

Hint: If you type acf in R, you get the source code for the acf function. You’ll see that the plotting is done by a service function plot.acf. This service function is part of the package, and is not immediately accessible to you. Nevertheless, you can check the source code as follows

  1. Notice, either from the help documentation ?acf or the last line of the source code acf that this function resides in the package stats.

  2. Now, you can access this namespace directly, to list the source code, by

    stats:::plot.acf
  3. Finally, relate this source code to the task of testing for lack of correlation, a standard topic in undergrad introductory statistics courses. The critical line of code seems to be

    clim0 <- if (with.ci) qnorm((1 + ci)/2)/sqrt(x$n.used)

    This appears to use a normal distribution approximation for the sample autocorrelation estimator, with mean zero and standard deviation \(1/\sqrt{N}\). Derive this approximation, for a sequence of IID mean zero random variables. You can ignore the issue of estimating the mean: suppose that the mean is known to be zero.




2.4.3 Generalized least squares

  • Suppose for the moment that we knew the covariance matrix, \(\Gamma\), for a model with dependent errors,

    [L2] \(\quad\quad\quad\quad Y = Z\beta + \zeta, \quad \quad \zeta \sim N[0,\Gamma].\)

    We read “\(\zeta \sim N[0,\Gamma]\)” as “\(\zeta\) follows a multivariate normal distribution with mean zero and covariance matrix \(\Gamma\).”

  • The minimum variance unbiased estimator of \(\beta\) for model L2 is the generalized least square (GLS) estimator, \[\hat \beta_{GLS}(y_{1:N}) = \big( Z^{\scriptsize{T}}\Gamma^{-1} Z \big)^{-1} \, Z^{\scriptsize{T}}\Gamma^{-1} y.\]

  • The OLS estimator remains unbiased for L2 (you can check this as an exercise). In this sense it remains a reasonable estimator. It is often a practical solution to use the OLS estimator, expecially for preliminary data analysis. We don’t know \(\Gamma\) so can’t necessarily make a good estimator based on the GLS model. It might be easier to get an estimate of \(\Gamma\) once we have a reasonable estimate of the trend.

  • For model L2, the variance of the OLS estimator is \[{\mathrm{Var}}\big[\hat \beta_{OLS}(Y_{1:N})\big] = (Z^{\scriptsize{T}}Z)^{-1} \, Z^{\scriptsize{T}}\Gamma Z\, (Z^{\scriptsize{T}}Z)^{-1}.\] This is different from the variance under model L1.

  • CONCLUSION. It is okay to do ordinary linear regression for data which are not well modeled with uncorrelated errors. However, if we do so, we should not trust the error estimates coming from L1.

  • This is an example of a situation where some parts of the output from statistical software are reasonable (here, the parameter estimates from lm) and other parts are unreasonable (the corresponding standard errors and any tests based on them). The theory helps us decide which bits of computer output to use and which to ignore.