Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.
Objectives
Define different concepts for stationarity.
Define white noise.
Use white noise to construct some basic time series models.
A time series model which is both mean stationary and covariance stationary is called weakly stationary.
A time series model for which all joint distributions are invariant to shifts in time is called strictly stationary.
Formally, this means that for any collection of times \((t_1, t_2,\dots,t_K)\), the joint distribution of observations at these times should be the same as the joint distribution at \((t_1+\tau, t_2+\tau,\dots,t_K+\tau)\) for any \(\tau\).
For equally spaced observations, this becomes: for any collection of timepoints \(n_1,\dots,n_K\), and for any lag \(h\), the joint density function of \((Y_{n_1},Y_{n_2},\dots, Y_{n_K})\) is the same as the joint density function of \((Y_{n_1+h},Y_{n_2+h},\dots, Y_{n_K+h})\).
In our general notation for densities, this strict stationarity requirement can be written as \[\begin{eqnarray}&&f_{Y_{n_1},Y_{n_2},\dots, Y_{n_K}}(y_1,y_2,\dots,y_K)\\ &&\quad\quad = f_{Y_{n_1+h},Y_{n_2+h},\dots, Y_{n_K+h}}(y_1,y_2,\dots,y_K). \end{eqnarray}\]
Strict stationarity implies weak stationarity, provided the means and covariances exist (check this). Note that we only defined weak stationarity for equally spaced observations.
Supposing the observations are equally spaced, we can 1. calculate the sample mean in chunks of the data (to assess mean stationarity) 2. calculate the sample ACF in chunks (to assess covariance stationarity).
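As a minimal sketch of this chunk-based check (the placeholder data y and the choice of four chunks are assumptions for illustration only):
y <- rnorm(400) # placeholder data; in practice y would be the observed time series
chunk <- rep(1:4,each=100) # label each observation with one of four consecutive chunks
sapply(split(y,chunk),mean) # similar chunk means suggest mean stationarity
sapply(split(y,chunk),function(x) acf(x,plot=FALSE)$acf[2]) # similar lag-1 sample ACFs suggest covariance stationarity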
ANSWER: It sometimes happens. However, systems often change over time, and that may be one of the things we are interested in.
A time series model \(\epsilon_{1:N}\) which is weakly stationary with \[\begin{eqnarray} {\mathbb{E}}[\epsilon_n]&=& 0 \\ {\mathrm{Cov}}(\epsilon_m,\epsilon_n) &=& \left\{\begin{array}{ll} \sigma^2, & \mbox{if $m=n$} \\ 0, & \mbox{if $m\neq n$} \end{array}\right. , \end{eqnarray}\] is said to be white noise with variance \(\sigma^2\).
The “noise” is because there’s no pattern, just random variation. If you listened to a realization of white noise as an audio file, you would hear a static sound.
The “white” is because all frequencies are equally represented. This will become clear when we do frequency domain analysis of time series.
Signal processing—sending and receiving signals on noisy channels—was a motivation for early time series analysis.
In time series analysis, a sequence of independent identically distributed (IID) Normal random variables with mean zero and variance \(\sigma^2\) is known as Gaussian white noise. We write this model as \[ \epsilon_{1:N} \sim \mbox{IID } N[0,\sigma^2].\]
Let \(\epsilon_{1:N}\) be IID with \[\begin{eqnarray} \epsilon_n = \left\{\begin{array}{ll} 1, & \mbox{with probability $1/2$} \\ -1, & \mbox{with probability $1/2$} \end{array}\right. . \end{eqnarray}\] We can check that \({\mathbb{E}}[\epsilon_n]=0\), \({\mathrm{Var}}(\epsilon_n)=1\) and \({\mathrm{Cov}}(\epsilon_m,\epsilon_n)=0\) for \(m\neq n\). Therefore, \(\epsilon_{1:N}\) is white noise.
Similarly, for any \(p\in (0,1)\), we could have \[\begin{eqnarray} \epsilon_n = \left\{\begin{array}{ll} (1-p)/p, & \mbox{with probability $p$} \\ -1, & \mbox{with probability $1-p$} \end{array}\right. . \end{eqnarray}\] This again has \({\mathbb{E}}[\epsilon_n]=0\), so it is white noise, now with variance \(\sigma^2=(1-p)/p\).
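A quick simulation check (a sketch; the value p = 0.3 and the sample size are arbitrary choices for illustration):
p <- 0.3
set.seed(73)
epsilon <- sample(c((1-p)/p,-1),size=5000,replace=TRUE,prob=c(p,1-p)) # IID draws from the two-point distribution
mean(epsilon) # close to 0
var(epsilon) # close to (1-p)/p
acf(epsilon,plot=FALSE)$acf[2] # lag-1 sample autocorrelation, close to 0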
Let \(\epsilon_n = \sin(2\pi n U)\), with a single draw \(U\sim\mathrm{Uniform}[0,1]\) determining the time series model for all \(n\in 1:N\).
A. Show that \(\epsilon_{1:N}\) is weakly stationary, and is white noise!
B. Show that \(\epsilon_{1:N}\) is NOT strictly stationary.
These are exercises in working with sines and cosines.
As well as providing a concrete example of a weakly stationary time series that is not strictly stationary, practice working with sines and cosines will come in handy later when we work in the frequency domain.
As a hint for B, consider the following plot of \(\epsilon_{1:3}\) as a function of \(U\). \(\epsilon_1\) is shown as the black solid line; \(\epsilon_2\) is the red dashed line; \(\epsilon_3\) is the blue dot-dash line.
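A sketch of code that draws a plot like the one described (line types and colors chosen to match the description above):
U <- seq(0,1,length.out=500)
plot(U,sin(2*pi*1*U),type="l",lty=1,col="black",xlab="U",ylab="epsilon_n") # epsilon_1: black solid
lines(U,sin(2*pi*2*U),lty=2,col="red") # epsilon_2: red dashed
lines(U,sin(2*pi*3*U),lty=4,col="blue") # epsilon_3: blue dot-dash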
All statistical tests (i.e., whenever we use data to answer a question) rely on having a model for the data. The model is sometimes called the assumptions for the test.
If our model is wrong, then any conclusions drawn from it may be wrong. Our error could be small and insignificant, or disastrous.
Time series data collected close in time are often more similar than a model with IID variation would predict. We need models that have this property, and we must work out how to test interesting hypotheses for these models.
The order \(p\) autoregressive model, abbreviated to AR(p), is
[M1] \(\quad\quad \quad Y_n = \phi_1 Y_{n-1}+\phi_2Y_{n-2}+\dots+\phi_pY_{n-p} + \epsilon_n\),
where \(\{\epsilon_n\}\) is a white noise process.
Often, we consider the Gaussian AR(p) model, where \(\{\epsilon_n\}\) is a Gaussian white noise process.
M1 is a stochastic difference equation. It is a difference equation (also known as a recurrence relation) since each time point is specified recursively in terms of previous time points. Stochastic just means random.
To complete the model, we need to initialize the solution to the stochastic difference equation. Supposing we want to specify a distribution for \(Y_{1:N}\), we have some choices in how to set up the initial values.
We can specify \(Y_{1:p}\) explicitly, to get the recursion started.
We can specify \(Y_{1-p:0}\) explicitly.
For either of these choices, we can define these initial values either to be additional parameters in the model (i.e., not random) or to be specified random variables.
If we want our model to be strictly stationary, we must initialize so that \(Y_{1:p}\) have the proper joint distribution for this stationary model.
Assuming the initialization has mean zero, M1 implies that \({\mathbb{E}}[Y_n]=0\) for all \(n\). For additional generality, we could add a constant mean \(\mu\).
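For instance, one standard way to write an AR(p) model with constant mean \(\mu\) (not part of M1 itself, just a common convention) is \[ (Y_n-\mu) = \phi_1 (Y_{n-1}-\mu)+\phi_2(Y_{n-2}-\mu)+\dots+\phi_p(Y_{n-p}-\mu) + \epsilon_n, \] so that \({\mathbb{E}}[Y_n]=\mu\) for all \(n\) when the initialization has mean \(\mu\).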
Let’s investigate a particular Gaussian AR(1) process, as an exercise.
[M2] \(\quad\quad \quad Y_n = 0.6 Y_{n-1}+ \epsilon_n\),
where \(\epsilon_n\sim \mathrm{IID} N[0,1]\). We will initialize with \(Y_1\sim N[0,1.56^2]\).
Looking at simulated sample paths is a good way to get some intuition about a random process model.
We will do this for the AR(1) model M2. One approach is to use the arima.sim function in R.
set.seed(123456789)
ar1 <- arima.sim(list(ar=0.6),n=100,sd=1)
plot(ar1,type="l")
Does your intuition tell you that these data are evidence for a model with a linear trend?
The eye looks for patterns in data, and often finds them even when there is no strong statistical evidence.
That is why we need statistical tests!
It is easy to see patterns even in white noise. Dependent models produce spurious patterns even more often.
Play with simulating different models with different seeds to train your intuition.
A direct approach to simulating model M2 is to write out the model equation explicitly.
set.seed(123456789)
N <- 100
X <- numeric(N)
X[1] <- rnorm(1,sd=1.56)
for(n in 2:N) X[n] <- 0.6 * X[n-1] + rnorm(1)
plot(X,type="l")
This looks like the arima.sim simulation, except for a difference near the start. Can you explain this? Hint: How does arima.sim initialize the simulation?
What are the advantages and disadvantages of using arima.sim over the direct simulation method?
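As a hedged sketch for exploring the hint (n.start is an argument documented in ?arima.sim; the value 1000 is an arbitrary choice): arima.sim discards a burn-in period before returning the n requested observations, so the returned values start close to the stationary distribution.
set.seed(123456789)
ar1_long_burnin <- arima.sim(list(ar=0.6),n=100,sd=1,n.start=1000) # longer burn-in before the 100 returned values
plot(ar1_long_burnin,type="l")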
The order \(q\) moving average model, abbreviated to MA(q), is
[M3] \(\quad\quad \quad Y_n = \epsilon_n +\theta_1 \epsilon_{n-1} +\dots+\theta_q\epsilon_{n-q}\),
where \(\{\epsilon_n\}\) is a white noise process.
To fully specify \(Y_{1:N}\) we must specify the joint distribution of \(\epsilon_{1-q:N}\).
Often, we consider the Gaussian MA(q) model, where \(\{\epsilon_n\}\) is a Gaussian white noise process.
In M3, we’ve defined a zero mean process. We could add a mean \(\mu\).
Let’s investigate a particular Gaussian MA(2) process, as an exercise.
[M4] \(\quad\quad \quad Y_n = \epsilon_n + 1.5\epsilon_{n-1}+\epsilon_{n-2}\),
where \(\epsilon_n\sim \mathrm{IID} N[0,1]\).
N <- 100
set.seed(123456789)
X1 <- arima.sim(list(ma=c(1.5,1)),n=N,sd=1)
set.seed(123456789)
epsilon <- rnorm(N+2)
X2 <- numeric(N)
for(n in 1:N) X2[n] <- epsilon[n+2]+1.5*epsilon[n+1]+epsilon[n]
oldpars <- par(mfrow=c(1,2))
plot(X1,type="l")
plot(X2,type="l")
par(oldpars)
X1 and X2 look identical. We can check this:
all(X1==X2)
## [1] TRUE
Perhaps you agree that the spurious evidence for a trend that we saw for the AR(1) model is still somewhat present for the MA(2) simulation.
We could be curious about what the underlying white noise process looks like:
N <- 100
set.seed(123456789)
epsilon <- rnorm(N)
plot(epsilon,type="l")
The random walk model is
[M5] \(\quad\quad\quad Y_n = Y_{n-1} + \epsilon_n\),
where \(\{\epsilon_n\}\) is white noise. Unless otherwise specified, we usually initialize with \(Y_0=0\).
If \(\{\epsilon_n\}\) is Gaussian white noise, then we have a Gaussian random walk.
The random walk model is a special case of AR(1) with \(\phi_1=1\).
The stochastic difference equation in M5 has an exact solution, \[ Y_n = \sum_{k=1}^n\epsilon_k.\]
We can also call \(Y_{0:N}\) an integrated white noise process. We think of summation as a discrete version of integration.
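A minimal simulation sketch of this construction (Gaussian noise and N = 100 are arbitrary choices for illustration):
set.seed(32)
N <- 100
epsilon <- rnorm(N) # Gaussian white noise
Y <- cumsum(epsilon) # random walk: Y_n is the sum of epsilon_1,...,epsilon_n, with Y_0=0
plot(Y,type="l")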
If data \({y_{1:N}^*}\) are modeled as a random walk, the value of \(Y_0\) is usually an unknown. Rather than introducing an unknown parameter to our model, we may initialize our model at time \(t_1\) with \(Y_1={y_1^*}\).
The first difference time series \({z_{2:N}^*}\) is defined by \[ {z_n^*}= \Delta {y_n^*} = {y_{n}^*}-{y_{n-1}^*}\]
From a time series of length \(N\), we only get \(N-1\) first differences.
A random walk model for \({y_{1:N}^*}\) is essentially equivalent to a white noise model for \({z_{2:N}^*}= \Delta {y_{2:N}^*}\), apart from the issue of initialization.
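We can check this equivalence directly in a sketch like the one above: differencing the simulated random walk recovers the driving white noise, apart from the first value.
set.seed(32)
epsilon <- rnorm(100)
Y <- cumsum(epsilon) # the simulated random walk, as above
z <- diff(Y) # its first differences
all.equal(z,epsilon[-1]) # TRUE: the first differences are exactly the white noise increments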
The random walk with drift model is given by the difference equation
[M6] \(\quad\quad\quad Y_n = Y_{n-1} + \mu + \epsilon_n\),
driven by a white noise process \(\{\epsilon_n\}\). This has solution \[ Y_n = Y_0 + n\mu + \sum_{k=1}^n\epsilon_k.\]
Here, \(\mu\) is the mean of the increments, \(Y_n - Y_{n-1}\), rather than the mean of the random walk process itself.
As for the random walk without drift, we must define \(Y_0\) to initialize the model and complete the model specification. Unless otherwise specified, we usually initialize with \(Y_0=0\).
The theory of efficient financial markets suggests that the logarithm of a stock market index (or, for that matter, the value of an individual stock or other investment) might behave like a random walk with drift.
Let’s test that out on daily S&P 500 data, downloaded from yahoo.com.
dat <- read.table("sp500.csv",sep=",",header=TRUE)
N <- nrow(dat)
sp500 <- dat$Close[N:1] # data are in reverse order in sp500.csv
par(mfrow=c(1,2))
plot(sp500,type="l")
plot(log(sp500),type="l")
mu <- mean(diff(log(sp500)))
sigma <- sd(diff(log(sp500)))
set.seed(95483123)
X1 <- log(sp500[1])+cumsum(c(0,rnorm(N-1,mean=mu,sd=sigma)))
set.seed(324324587)
X2 <- log(sp500[1])+cumsum(c(0,rnorm(N-1,mean=mu,sd=sigma)))
par(mfrow=c(1,2))
plot(X1,type="l")
plot(X2,type="l")
Seems reasonable enough so far. Let's plot the sample autocorrelation function (sample ACF) of diff(log(sp500)).
It is bad style to refer to quantities using computer code notation. We should set up mathematical notation in the text. Let's try again…
Let \({y_{1:N}^*}\) be the time series of S&P 500 daily closing values downloaded from yahoo.com. Let \({z_n^*}= \Delta \log {y_n^*} = \log {y_{n}^*}-\log {y_{n-1}^*}\).
The temporal difference of the log of the value of an investment is called the return on the investment. Let’s plot the sample autocorrelation function of the time series of S&P 500 returns, \({z_{2:N}^*}\).
z <- diff(log(sp500))
acf(z)
This looks pretty close to the ACF of white noise. There is some evidence for a small nonzero autocorrelation at short lags, such as lag 1.
Here, we have a long time series (\(N=16616\)). For such a long time series, statistically significant effects may be practically insignificant.
It seems like the S&P 500 returns (centered, by subtracting the sample mean) may be a real-life time series well modeled by white noise.
However, things become less clear when we look at the absolute value of the centered returns.
acf(abs(z-mean(z)),lag.max=200)
The absolute returns show noticeable positive autocorrelation even at long lags, despite the returns themselves looking close to uncorrelated. Nowadays, nobody is surprised that the sample ACF of a financial return time series shows little or no evidence for autocorrelation.
Deviations from the efficient market hypothesis, if you can find them, are of interest.
Also, it remains a challenge to find good models for volatility, the conditional variance process of a financial return model.