Licensed under the Creative Commons Attribution-NonCommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.
Objectives
Looking at the frequency components present in our data can help to identify appropriate models.
Looking at the frequency components present in our models can help to assess whether they are doing a good job of describing our data.
We’re going to start by reviewing eigenvectors and eigenvalues of covariance matrices. This eigen decomposition also arises elsewhere in statistics, such as in the principal component analysis technique in multivariate analysis.
A univariate time series model is a vector-valued random variable \(Y_{1:N}\), which we suppose has covariance matrix \(V\), an \(N\times N\) matrix with entries \(V_{mn}={\mathrm{Cov}}(Y_m,Y_n)\).
\(V\) is a non-negative definite symmetric matrix, and therefore has \(N\) non-negative eigenvalues \(\lambda_1,\dots,\lambda_N\) with corresponding eigenvectors \({u}_1,\dots,{u}_N\) such that \[ V {u}_n = \lambda_n {u}_n.\]
A basic property of these eigenvectors is that they are orthogonal, i.e., \[ {u}_m^{\scriptsize{T}}{u}_n = 0 \mbox{ if $m\neq n$}.\]
We may work with normalized eigenvectors that are scaled such that \({u}_n^{\scriptsize{T}}{u}_n = 1\).
We can also check that the components of \(Y\) in the directions of different eigenvectors are uncorrelated. Since \({\mathrm{Cov}}(AY,BY)=A{\mathrm{Cov}}(Y,Y)B^{\scriptsize{T}}\), we have \[\begin{eqnarray} {\mathrm{Cov}}({u}_m^{\scriptsize{T}}Y, {u}_n^{\scriptsize{T}}Y) &=& {u}_m^{\scriptsize{T}}{\mathrm{Cov}}(Y,Y) {u}_n \\ &=& {u}_m^{\scriptsize{T}}V {u}_n \\ &=& \lambda_n {u}_m^{\scriptsize{T}}{u}_n \\ &=& \left\{\begin{array}{ll} \lambda_n & \mbox{if $m=n$} \\ 0 & \mbox{if $m\neq n$} \end{array}\right. \end{eqnarray}\] For the last equality, we have supposed that the eigenvectors are normalized.
Thus, if we knew \(V\), we could convert the model to a representation where the observable random variables are uncorrelated.
Specifically, we could transform the data into its components in the directions of the eigenvectors of the model. An uncorrelated (or, in the Gaussian case, independent) model would then become appropriate for this transformation of the data.
Let’s see how to do that for a stationary time series model, say 100 observations from an AR(1) model with autoregressive coefficient 0.8.
N <- 100
phi <- 0.8
sigma <- 1
V <- matrix(NA,N,N)
# autocovariance of a stationary AR(1): gamma_h = sigma^2 * phi^|h| / (1-phi^2)
for(m in 1:N) for(n in 1:N) V[m,n] <- sigma^2 * phi^abs(m-n) / (1-phi^2)
V_eigen <- eigen(V,symmetric=TRUE)
oldpars <- par(mfrow=c(1,2))
# plot the leading eigenvectors as functions of time
matplot(V_eigen$vectors[,1:5],type="l")
matplot(V_eigen$vectors[,6:9],type="l")
par(oldpars)
We see that the eigenvectors, plotted as functions of time, look like sine wave oscillations.
The eigenvalues are
round(V_eigen$values[1:9],2)
## [1] 24.59 23.44 21.73 19.70 17.57 15.51 13.61 11.91 10.42
We see that the eigenvalues are decreasing. For this model, the components of \(Y_{1:N}\) with highest variance correspond to long-period oscillations.
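As a quick numerical check of the earlier claim that the components of \(Y\) in different eigenvector directions are uncorrelated, we can verify that the eigenvectors computed above diagonalize \(V\), with the eigenvalues on the diagonal. This is a small sketch reusing V and V_eigen from the chunk above.
U <- V_eigen$vectors
D <- t(U) %*% V %*% U                # covariance matrix of the components u_n^T Y
max(abs(D - diag(V_eigen$values)))   # numerically negligible, so D is diagonal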
Are the sinusoidal eigenvectors a special feature of this particular time series model, or something more general?
Suppose \(\{Y_n,-\infty<n<\infty\}\) has a stationary autocovariance function \(\gamma_h\).
We write \(\Gamma\) for the infinite matrix with entries \[ \Gamma_{m,n} = \gamma_{m-n} \quad \mbox{for all integers $m$ and $n$}.\]
An infinite eigenvector is a sequence \({u}=\{{u}_n, -\infty<n<\infty\}\) with corresponding eigenvalue \(\lambda\) such that \[\Gamma {u}= \lambda {u},\] or, writing out the matrix multiplication explicitly, \[\sum_{n=-\infty}^\infty \Gamma_{m,n} {u}_n = \lambda {u}_m\quad \mbox{for all $m$}.\]
Now, let’s look for a sinusoidal solution, \({u}_n = e^{i\omega n}\). Then, \[\begin{eqnarray} \sum_{n=-\infty}^\infty \Gamma_{m,n} {u}_n &=& \sum_{n=-\infty}^\infty \gamma_{m-n} {u}_n \\ &=& \sum_{h=-\infty}^\infty \gamma_{h} {u}_{m-h} \quad \mbox{setting $h=m-n$} \\ &=& \sum_{h=-\infty}^\infty \gamma_{h} e^{i\omega(m-h)} \\ &=& e^{i\omega m} \sum_{h=-\infty}^\infty \gamma_{h} e^{-i\omega h} \\ &=& {u}_m \lambda \mbox{ for } \lambda= \sum_{h=-\infty}^\infty \gamma_{h} e^{-i\omega h} \end{eqnarray}\]
This calculation shows that \[{u}_n(\omega) = e^{i\omega n}\] is an eigenvector for \(\Gamma\) for any choice of \(\omega\). The corresponding eigenvalue function, \[\lambda(\omega)= \sum_{h=-\infty}^\infty \gamma_{h} e^{-i\omega h},\] is called the spectral density function. It is calculated as the Fourier transform of \(\gamma_h\) at frequency \(\omega\).
It was convenient to do this calculation with complex exponentials. However, writing \[ e^{i\omega n} = \cos(\omega n) + i \sin(\omega n)\] we see that the real and imaginary parts of this calculation in fact give us two real eigenvectors, \(\cos(\omega n)\) and \(\sin(\omega n)\).
Assuming that this computation for an infinite sum represents a limit of increasing dimension for finite matrices, we have found that the eigenfunctions for any long, stationary time series model are approximately sinusoidal.
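We can check this against the AR(1) example above. Summing the series \(\sum_h \gamma_h e^{-i\omega h}\) for that model gives the closed form \(\lambda(\omega)=\sigma^2\big/\big(1-2\phi\cos(\omega)+\phi^2\big)\). The sketch below (reusing N, phi, sigma and V_eigen from the earlier chunk) compares this function at the lowest frequencies with the leading eigenvalues of the finite matrix \(V\); the agreement is rough rather than exact, since the eigenvectors of a finite matrix are only approximately sinusoidal.
lambda_ar1 <- function(omega) sigma^2/(1-2*phi*cos(omega)+phi^2)
lambda_ar1(0)                        # 25, close to the largest eigenvalue, 24.59
round(lambda_ar1(2*pi*(1:3)/N),2)    # each value roughly matches a pair of eigenvalues
round(V_eigen$values[1:7],2)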
For the finite time series situation, we only expect \(N\) eigenvectors for a time series of length \(N\). We have one eigenvector for \(\omega=0\), two eigenvectors corresponding to sine and cosine functions with frequency \[\omega_{n} = 2\pi n/N, \mbox{ for $0<n<N/2$},\] and, if \(N\) is even, a final eigenvector with frequency \[\omega_{N/2} = \pi.\]
These sine and cosine vectors are called the Fourier basis.
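The following sketch (not part of the analysis in these notes) constructs this Fourier basis for a short even-length series, with each vector scaled to unit length, and checks numerically that the vectors are orthonormal.
Nf <- 8                                    # a small even length, to keep the output readable
k <- 1:Nf
n_seq <- 1:(Nf/2-1)
fourier_basis <- cbind(
  rep(1/sqrt(Nf),Nf),                                      # frequency zero
  sapply(n_seq, function(n) cos(2*pi*n*k/Nf))*sqrt(2/Nf),  # cosine vectors
  sapply(n_seq, function(n) sin(2*pi*n*k/Nf))*sqrt(2/Nf),  # sine vectors
  cos(pi*k)/sqrt(Nf)                                       # frequency pi, since Nf is even
)
round(crossprod(fourier_basis),10)         # the identity matrix: the basis is orthonormal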
The frequency components of \(Y_{1:N}\) are the components in the directions of these eigenvectors. Equivalently, we could say they are the representation of \(Y_{1:N}\) in the Fourier basis. Specifically, we write \[\begin{eqnarray} C_n &=& \frac{1}{\sqrt{N}}\sum_{k=1}^N Y_k\cos(\omega_n k) \mbox{ for $0\le n\le N/2$}, \\ S_n &=& \frac{1}{\sqrt{N}}\sum_{k=1}^N Y_k\sin(\omega_n k) \mbox{ for $1\le n\le N/2$}. \end{eqnarray}\]
Similarly, the frequency components of data \({y_{1:N}^*}\) are \[\begin{eqnarray} {c_n^*} &=& \frac{1}{\sqrt{N}}\sum_{k=1}^N {y_k^*}\cos(\omega_n k) \mbox{ for $0\le n\le N/2$}, \\ {s_n^*} &=& \frac{1}{\sqrt{N}}\sum_{k=1}^N {y_k^*}\sin(\omega_n k) \mbox{ for $1\le n\le N/2$}. \end{eqnarray}\]
The frequency components of the data are often written as real and imaginary parts of the discrete Fourier transform, \[\begin{eqnarray} {d_n^*} &=& \frac{1}{\sqrt{N}} \sum_{k=1}^N {y_k^*} e^{2\pi i nk/N} \\ &=&{c_n^*} + i{s_n^*} \end{eqnarray}\]
Here, we have made a decision to introduce a normalizing constant of \(1/\sqrt{N}\). There are various choices about signs and factors of \(2\pi\), \(\sqrt{2\pi}\) and \(\sqrt{N}\) that can be, and are, made in the definition of the Fourier transform in various situations. One should try to be consistent, and also be careful: the fft command in R, for example, doesn’t include this constant. fft is an implementation of the fast Fourier transform algorithm, which enables computation of all the frequency components with order \(N\log(N)\) computation. At first consideration, computing the frequency components appears to require a matrix multiplication involving order \(N^2\) additions and multiplications. When \(N=10^5\) or \(N=10^6\), this difference becomes important!
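As a small check on these conventions, the following sketch (not part of the analysis in these notes) computes one frequency component directly from the definitions and compares it with the output of fft, remembering that fft omits the \(1/\sqrt{N}\) factor.
set.seed(1)
y_sim <- rnorm(100)                  # any series will do for this check
N_sim <- length(y_sim)
n <- 3                               # a single Fourier frequency, omega_n = 2*pi*n/N
omega_n <- 2*pi*n/N_sim
c_n <- sum(y_sim*cos(omega_n*(1:N_sim)))/sqrt(N_sim)
s_n <- sum(y_sim*sin(omega_n*(1:N_sim)))/sqrt(N_sim)
c_n^2 + s_n^2                        # the periodogram ordinate, c_n^2 + s_n^2
Mod(fft(y_sim)[n+1])^2 / N_sim       # fft agrees once we divide by N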
The first frequency component, \(C_0\), is something of a special case, since it has nonzero mean \({\mathbb{E}}[C_0]=\sqrt{N}\mu\), where \(\mu={\mathbb{E}}[Y_n]\), whereas the other components have mean zero.
In practice, we subtract a mean before computing the periodogram, which is equivalent to removing the frequency component for frequency zero.
The frequency components \((C_{0:N/2},S_{1:N/2})\) are asymptotically uncorrelated. They are constructed as a sum of a large number of terms, with the usual \(1/\sqrt{N}\) scaling for a central limit theorem. So, it may not be surprising that a central limit theorem applies, giving asymptotic justification for the following normal approximation.
\((C_{1:N/2},S_{1:N/2})\) are approximately independent, mean zero, Normal random variables with \[ {\mathrm{Var}}(C_n) = {\mathrm{Var}}(S_n) \approx \tfrac{1}{2} \lambda(\omega_n).\]
\(C_0\big/ \sqrt{N}\) is approximately Normal, mean \(\mu\), independent of \((C_{1:N/2},S_{1:N/2})\), with \[{\mathrm{Var}}(C_0\big/ \sqrt{N}) \approx \lambda(0)\big/ N.\]
Moving to the frequency domain (i.e., transforming the data to its frequency components) has decorrelated the data. Statistical techniques based on assumptions of independence become applicable.
It follows from the normal approximation that, for \(1\le n\le N/2\), \[ C_n^2 + S_n^2 \approx \lambda(\omega_n) \frac{\chi^2_2}{2},\] where \(\chi^2_2\) is a chi-squared random variable on two degrees of freedom.
Taking logs, we have \[ \log\big(C_n^2 + S_n^2 \big) \approx \log \lambda(\omega_n) + \log(\chi^2_2/2).\]
These results motivate consideration of the periodogram, \[ I_n = {c_n^*}^2 + {s_n^*}^2 = \big| {d_n^*}\big|^2\] as an estimator of the spectral density.
\(\log I_n\) can be modeled as an estimator of the log spectral density with independent, identically distributed errors.
We see from the normal approximation that a signal-plus-white-noise model is appropriate for estimating the log spectral density using the log periodogram.
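To see these approximations at work, here is a simulation sketch (not from the original analysis; it reuses the AR(1) parameters from the opening example, but is self-contained). The raw periodogram of a simulated AR(1) series scatters around the spectral density \(\lambda(\omega)\), with the scatter reflecting the \(\lambda(\omega_n)\chi^2_2/2\) variation.
set.seed(2)
N <- 100; phi <- 0.8; sigma <- 1
y_sim <- arima.sim(list(ar=phi), n=N, sd=sigma)
y_star <- y_sim - mean(y_sim)               # remove the frequency-zero component
pgram <- Mod(fft(y_star))^2 / N             # periodogram ordinates, I_n = c_n^2 + s_n^2
n_seq <- 1:floor(N/2)
omega <- 2*pi*n_seq/N
plot(omega, pgram[n_seq+1], log="y",
  xlab="frequency (radians per observation)", ylab="periodogram")
lines(omega, sigma^2/(1-2*phi*cos(omega)+phi^2), col="red")  # lambda(omega) for this AR(1)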
The power of a wave is proportional to the square of its amplitude.
The spectral density gives the mean square amplitude of the components at each frequency, and therefore gives the expected power.
The spectral density function can therefore be called the power spectrum.
As an exercise, compute the spectral density function for the moving average model \[ Y_n = \frac{1}{q+1} \sum_{k=0}^q \epsilon_{n-k},\] where \(\{\epsilon_n\}\) is white noise with variance \(\sigma^2\). It may help to recall Euler’s formula, \[ e^{i\omega} = \cos(\omega)+i\,\sin(\omega).\]
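A numerical check can complement whatever closed form you obtain. For a moving average \(Y_n=\sum_k \psi_k \epsilon_{n-k}\), the spectral density in the convention above is \(\sigma^2\big|\sum_k \psi_k e^{-i\omega k}\big|^2\). The sketch below (with an arbitrary illustrative choice \(q=4\) and \(\sigma^2=1\)) evaluates and plots this for the equally weighted case.
q <- 4                                       # an arbitrary choice, for illustration only
omega <- seq(0, pi, length.out=500)
transfer <- sapply(omega, function(w) sum(exp(-1i*w*(0:q)))/(q+1))
plot(omega, Mod(transfer)^2, type="l",
  xlab="frequency (radians per observation)",
  ylab="spectral density of the MA(q) mean (sigma^2 = 1)")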
system("head ann_arbor_weather.csv",intern=TRUE)
## [1] "## https://weather-warehouse.com/WeatherHistory/PastWeatherData_AnnArborUnivOfMi_AnnArbor_MI_January.html"
## [2] "## accessed by ELI on Nov 2, 2015"
## [3] "##"
## [4] "## Full column descriptions:"
## [5] "## Lowest Temperature (F)"
## [6] "## Highest Temperature (F)"
## [7] "## Warmest Minimum Temperature (F)"
## [8] "## Coldest Maximum Temperature (F)"
## [9] "## Average Minimum Temperature (F)"
## [10] "## Average Maximum Temperature (F)"
y <- read.table(file="ann_arbor_weather.csv",header=TRUE)
head(y)
## Year Low High Hi_min Lo_max Avg_min Avg_max Mean Precip Snow Hi_Pricip
## 1 1900 -7 50 36 12 18.0 34.7 26.3 1.06 4.0 0.28
## 2 1901 -7 48 37 20 17.0 31.8 24.4 1.45 10.1 0.40
## 3 1902 -4 41 27 11 15.0 30.4 22.7 0.60 6.0 0.25
## 4 1903 -7 50 36 12 15.1 29.6 22.4 1.27 7.3 0.40
## 5 1904 -11 38 31 6 8.2 22.9 15.3 2.51 11.0 0.67
## 6 1905 -3 47 32 14 10.9 25.9 18.4 1.64 7.9 0.84
## Hi_Snow
## 1 1.1
## 2 3.2
## 3 2.5
## 4 3.2
## 5 2.1
## 6 2.5
low <- y$Low
We have to deal with the NA measurement for 1955. A simple approach is to replace the NA by the mean.
What other approaches can you think of for dealing with this missing observation?
What are the strengths and weaknesses of these approaches?
low[is.na(low)] <- mean(low, na.rm=TRUE)
spectrum(low, main="Unsmoothed periodogram")
What is the default periodogram smoother in R?
How should we use it?
Why is that default chosen?
spectrum(low,spans=c(3,5,3), main="Smoothed periodogram",ylim=c(15,100))
To see what R actually does to compute and smooth the periodogram, type ?spectrum. This will lead you to type ?spec.pgram.
You will see that, by default, R removes a linear trend, fitted by least squares. This may often be a sensible thing to do. Why?
You will see that R then multiplies the data by a quantity called a taper, computed by spec.taper.
The taper smooths the ends of the time series and removes high-frequency artifacts arising from an abrupt start and end to the time series.
Formally, from the perspective of the Fourier transform, the time series takes the value zero outside the observed time points \(1:N\). The sudden jump to and from zero at the start and end produces unwanted effects in the frequency domain.
The default taper in R smooths the first and last \(p=0.1\) fraction of the time points, by modifying the detrended data \({y_{1:N}^*}\) to a tapered version \({z_{1:N}^*}\) defined by \[ {z_n^*} = \left\{ \begin{array}{ll} {y_n^*} \big(1-\cos(\pi n/(Np))\big)/2 & \mbox{ if $1\le n< Np$ } \\ {y_n^*} & \mbox{ if $Np \le n \le N(1-p)$ } \\ {y_n^*} \big(1-\cos(\pi [N+1-n]/(Np))\big)/2 & \mbox{ if $N(1-p)<n\le N$ } \end{array}\right. \]
plot(spec.taper(rep(1,100)),type="l",
main="Default taper in R for a time series of length 100")
abline(v=c(10,90),lty="dotted",col="red")
spectrum(low,method="ar", main="Spectrum estimated via AR model picked by AIC")
Since this time series is well modeled by white noise, we could fit a signal plus white noise model. This might be a more sensitive way to look for a trend.
Let’s try some low-order polynomial trend specifications.
lm0 <- lm(Low~1,data=y)
lm1 <- lm(Low~Year,data=y)
lm2 <- lm(Low~Year+I(Year^2),data=y)
lm3 <- lm(Low~Year+I(Year^2)+I(Year^3),data=y)
poly_aic <- matrix( c(AIC(lm0),AIC(lm1),AIC(lm2),AIC(lm3)), nrow=1,
dimnames=list("<b>AIC</b>", paste("order",0:3)))
require(knitr)
kable(poly_aic,digits=1)
|     | order 0 | order 1 | order 2 | order 3 |
|-----|---------|---------|---------|---------|
| AIC | 780.9   | 781.6   | 783.6   | 783.8   |
Still no evidence suggesting anything other than a white noise model.
Now, let’s remind ourselves what the data look like.
plot(Low~Year,data=y,type="l")
Our eye may see a trend, and it looks a bit like the global temperature trend.
Let’s try fitting global temperature as an explanatory variable.
Z <- read.table("Global_Temperature.txt",header=TRUE)
global_temp <- Z$Annual[Z$Year %in% y$Year]
lm_global <- lm(Low~global_temp,data=y)
AIC(lm_global)
## [1] 779.8877
summary(lm_global)
##
## Call:
## lm(formula = Low ~ global_temp, data = y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.1554 -4.5079 -0.1595 4.1068 20.3959
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.0413 0.6916 -4.397 2.51e-05 ***
## global_temp 3.7396 2.1530 1.737 0.0851 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.273 on 112 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.02623, Adjusted R-squared: 0.01754
## F-statistic: 3.017 on 1 and 112 DF, p-value: 0.08515
Perhaps we now have a satisfactory understanding of Ann Arbor January low temperatures: white noise variation around the global mean temperature trend.
With noisy data, uncovering what is going on can take careful data analysis together with scientific reasoning.
What could we do to improve the signal to noise ratio in this analysis?
Hint: consider the standard error of the coefficient, together with the values the coefficient could plausibly take in a scientifically reasonable model.
It is always good practice to be explicit about the units of quantities. For frequency domain analysis we can consider various options for units of frequency.
For a frequency component corresponding to \(\sin(2\pi\omega n)\) or \(\cos(2\pi\omega n)\), we say that the frequency is \(\omega\) cycles per unit time.
Suppose the time series consists of equally spaced observations, with \(t_{n}-t_{n-1}=\Delta\) years. Then we say that the frequency is \(\omega/\Delta\) cycles per year.
For a frequency component corresponding to \(\sin(\nu t)\) or \(\cos(\nu t)\), we say that the frequency is \(\nu\) radians per unit time.
The period of an oscillation is the time for one cycle. So, when frequency is measured in cycles per time, we have \[ \mbox{period} = \frac{1}{\mbox{frequency}}.\] Thus, for a frequency component corresponding to \(\sin(2\pi\omega n)\) or \(\cos(2\pi\omega n)\), the period is \(1/\omega\) observation intervals.
When the observation intervals have a constant time length (years, seconds, etc.), we should use those units for the period.
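As a small illustration of keeping track of units, consider the Ann Arbor series above, with one observation per year so \(\Delta=1\) year. The sketch below reads a frequency off the smoothed periodogram (here, simply the frequency at which it is largest) and reports it as a period in years.
s <- spectrum(low, spans=c(3,5,3), plot=FALSE)
freq_peak <- s$freq[which.max(s$spec)]   # cycles per observation, i.e., cycles per year here
freq_peak
1/freq_peak                              # the corresponding period, in years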