Most people noticed that it is hard to get a model fitting better than white noise for these data. Trying to do better can lead to unstable models with weakly identified parameters.

The data consist of the low temperature from each January, ranging from 1900 to 2019, in Ann Arbor, MI. By looking only at temperatures from the same month each year, we have simplified our problem by eliminating seasonal fluctuation. However, this also reduces our available data. Additionally, the low temperature is not available for January 1955 (seen in the plot below). To deal with this issue, I will use the 1954 January temperature as a proxy for the missing 1955 value. This seems like a better option than simply removing the year 1955 from the data because that could interfere with the yearly variation we are interested in.

This time series plot of the observed data shows wide variation in the January low temperatures across the years, ranging from below \(-20^\circ F\) to almost \(20^\circ F\). The data appear to fluctuate around the mean of \(-2.87^\circ F\) without any obvious long-term patterns. Based on this, it seems reasonable to begin with a null hypothesis that a model with no trend is appropriate for the data. This analysis won’t look at any models with trend, but that would be a logical next step.

From the time series plot, it also seems possible that the variation in the data is increasing over time, especially from about 1980 to present. The change in variation, however, does not seem pronounced enough to preclude the use of a stationary autoregressive-moving average (ARMA) model.

Therefore, I will start the analysis by fitting an ARMA(p,q) model of the form:

\[ Y_n = \mu + \phi_1(Y_{n-1} - \mu) + \dots + \phi_p(Y_{n-p} - \mu) + \varepsilon_n + \psi_1\varepsilon_{n-1} + \dots + \psi_q\varepsilon_{n-q}\] where \({\{\varepsilon_n}\}\) is a white noise process with distribution \(\mathcal{N}(0,\sigma^2)\). The parameters for this model are \(\theta = (\phi_1, \dots, \phi_p, \psi_1, \dots, \psi_q, \mu, \sigma^2)\), representing the coefficients for the autoregressive part of the model, the coefficients for the moving average part of the model, the population mean, and the error variance. In this model, \(\mu\) does not depend on time because we are assuming a model without trend. To determine the best values of p and q for this data, I will fit multiple ARMA(p,q) models with various values of p and q (shown below ranging from 0 to 4).

As an initial method to compare these various ARMA models, I will consider their Akaike information criteria (AIC) values. Models with low values of the AIC indicate higher prediction precision, and therefore, better models in terms of predictive power. Though this is a somewhat informal method of model selection, it can be effective at eliminating models with very bad fits.

MA0 MA1 MA2 MA3 MA4
AR0 825.07 826.75 828.72 830.27 832.24
AR1 826.74 828.16 830.14 831.72 833.71
AR2 828.69 830.14 832.09 833.73 835.70
AR3 830.15 831.83 833.76 830.64 831.95
AR4 832.14 833.83 835.76 832.00 824.47

In the AIC table above, the lowest value is associated with an ARMA(0,0) model. This is a white noise model that assumes no temperature dependence between subsequent years. The ARMA(0,0) model is of the form \(Y_n = \mu + \varepsilon_n\), where the \({\{\varepsilon_n}\}\) are as described above. Although the AIC table identifies this model as the most appropriate for the data, there is some climatological intuition indicating that there is dependence in temperature from one year to the next. Therefore, I will not restrict my analysis to the ARMA(0,0) model. In addition, I will look at some models that have a higher AIC, but that allow for the dependence we are interested in modeling, including the ARMA(0,1), ARMA(1,0), and ARMA(1,1).

I fit these four models to the data and the results are listed in the table below. The first thing to notice is that all four models give similar estimates for the intercept, around \(-2.87\), but that their standard error estimates differ. The first model has standard error \(0.68\), and next two models all have standard errors around \(0.71\), but the ARMA(1,1) model has a higher standard error of about \(0.84\).

Intercept SE(Intercept) AR Coef. MA Coef.
ARMA(0, 0) -2.867 0.676
ARMA(0, 1) -2.875 0.71 0.052
ARMA(1, 0) -2.876 0.713 0.053
ARMA(1, 1) -2.925 0.842 0.78 -0.724

This seems to indicate that the ARMA(1,1) is more accurately capturing the dependence in the data than the other three models. Inadequately modeling dependence can result in artificially low standard errors for parameter estimates. These results indicate that the ARMA(1,1) is the better model to use, which is in opposition to the results of the AIC table above.

Due to the results of the AIC table and these fitted values, I will continue to consider the ARMA(0,0) model and the ARMA(1,1). The other two models, ARMA(0,1) and ARMA(1,0) have coefficient estimates very close to zero and don’t seem to be doing anything significantly different from the ARMA(0,0). The ARMA(0,0) model can be written as \(Y_n = -2.867 + \varepsilon_n\), and the ARMA(1,1) model can be written as follows:

\[\phi(B)(Y_n + 2.925) = \psi(B)\varepsilon_n\]

where \(B\) is the backshift operator, and \(\phi(x)\) and \(\psi(x)\) are the AR and MA polynomials, respectively. For this fitted model, these polynomials are defined as follows: \[\phi(x) = 1 - 0.78x \hspace{3cm} \psi(x) = 1 - 0.724x\]

Something to consider with the ARMA(1,1) model are the roots of the AR and MA polynomials, which can be used to check for causality and invertibility. The AR root is \(1.28\) and the MA root is \(1.38\), both outside the unit circle. This indicates that the fitted model is both causal and invertible, two attractive qualities for a time series model. However, these roots are also relatively close in magnitude, which indicates the possibility of reducing the model to the ARMA(0,0). It’s difficult to tell if these roots are close enough to approximately cancel, but it definitely seems like a possibility. This is another argument for using the ARMA(0,0) model over the ARMA(1,1).

A final test that I will do is a more formal hypothesis test using Wilks’ approximation. For this test, my null hypothesis corresponds to the ARMA(0,0) model, while my alternative corresponds to the ARMA(1,1). The approximation tells us:

\[\Lambda = 2(\mathcal{l}_1 - \mathcal{l}_0) \approx \chi^2_{D_1-D_0}\] where \(\mathcal{l}_i\) is the maximum log likelihood under hypothesis \(H_i\) and \(D_i\) is the number of parameters estimated under hypothesis \(H_i\). We will reject the null hypothesis if \(\Lambda\) is larger than the \(\chi^2\) cutoff. When comparing ARMA(0,0) and ARMA(1,1), \(\Lambda = 0.916\), which we can compare to the cutoff value of \(5.99\) for a 95% significance level and 2 degrees of freedom. This tells us that we cannot reject our null hypothesis – the ARMA(0,0) model is more appropriate for the data. Since this conclusion is supported both here, with the Wilks’ approximate \(\chi^2\) test, with the approximately canceling roots, and with the AIC, I will move forward with the white noise model.

Since I have identified the ARMA(0,0) as the best model for the data, I need to check that the model assumptions are valid. First, I will look at the residuals of the fitted ARMA(0,0) model as a time series plot:

The time series plot shows no striking patterns in the residuals, so I don’t think there is anything too worrisome here. Next, we can look at the autocorrelation plot of the residuals. This will allow us to check our assumption that the errors \(\{\varepsilon_n\}\) are uncorrelated. There is only one lag with significant autocorrelation (lag 15), while the rest may be considered sufficiently close to zero. While this may be the case, there are also some potentially non-negligible fluctuations in the autocorrelation that might be interesting to look into more carefully. Perhaps this indicates that a model with trend could be appropriate for this data.

Finally, in fitting an ARMA model, we make the assumption that \(\{\varepsilon_n\} \sim \mathcal{N}(0,\sigma^2)\) and we can check the normality assumption with a QQ-plot of the residuals. With the exception of a few points that deviate from the line, the residuals seem to be sufficiently normal to make this assumption valid.

Since the model fit seems to meet the assumptions, I can consider doing inference on the parameter estimate for \(\mu\). The \(\texttt{arima()}\) function in R uses the observed Fisher information to calculate standard errors for the coefficients. Those standard errors can then be used to construct approximate 95% confidence intervals for the parameter estimates. The confidence interval for the mean does not contain zero:

\[[-2.867 - (1.96)(0.68), -2.867 + (1.96)(0.68)] = \textbf{[-4.1998, -1.5342]}\]

As noted above, however, there was a possibility that the standard error from the ARMA(0,0) model (\(0.68\)) was artificially small. Therefore, I can check this confidence interval through simulations. Here is the distribution of the estimate for \(\mu\) from 5,00 simulations:

In this plot, the dashed vertical lines correspond to the upper and lower limits of the Fisher information confidence interval calculated above. From looking at this plot, the coverage of the confidence interval seems accurate, indicating that there are no problems with the Fisher information standard errors. I can further check the validity of the above confidence interval by looking at the profile log likelihood. Though not included here, this method also gives a confidence interval comparable to the one constructed using the Fisher information standard errors. This lends more credibility to the above analysis.

From this data exploration, it appears that the ARMA(0,0) model, a Gaussian white noise model, is most appropriate for the January low temperature data for Ann Arbor, MI. This is somewhat surprising, given the intuition that temperature might vary systematically from year to year. Further interesting work would be to consider models with trend to see if we can capture some gradual warming. It seems possible, however, that small changes (increases, fluctuations, etc.) could be difficult to detect with such little data on such a long time frame.

Looking for a trend model

Since this time series is well modeled by white noise, we could fit a signal plus white noise model. This might be a more sensitive way to look for a trend. First, we try some low-order polynomial trend specifications,

\[ Y_n=\sum_{k=0}^K \beta_k n^k + \epsilon_n\]

where \(K\) is the order of the fitted trend model. We compare AIC for \(K\) up to 5.

K=0 K=1 K=2 K=3 K=4 K=5
AIC 825.1 826.4 827.9 829.5 829.5 831

There is still no evidence suggesting anything other than a white noise model. Now, As one more attempt, we can compare the Michigan data to the global temperature series.

The red dashed line shows 10 times the global annual temperature anomaly (multiplied by 9/5 to move from Celcius to Fahrenheit) compared to the Michigan January low (in Fahrenheit). The trends appear similar until 1970 or so, after which the global temperature increases while the Michigan January low does not. However, caution is needed because of the relative scales: A scientifically plausible model probably can’t have a coefficient much bigger than 1 degree centigrade in Michigan per degree of global mean temperature. Given the size of the standard error resulting from year-to-year fluctuations, an effect of this size will be hard to see even if the model is good. The blue dotted line shows the global climate anomaly in degrees Fahrenheit. From this perspective, we can be skeptical about whether the apparent pattern (with warm January in the 1940s, cooler in the 1980, and currently reentering a cool phase) could be related to global temperature fluctations and trend. Interpreting the Michigan data as trend may well be just reading patterns into noise.


Acknowledgements

Parts of this solution are adapted from a previous homework submission by Zoe Rehnberg.