Introduction

Winter in Ann Arbor is not easy, especially because of the heavy snow. It would be nice if we could fit a model to historical snowfall data and maybe use it for prediction. The analysis may provide some insight for a better understanding of the climate change overall; forecast may even help us get more prepared for a harsh winter.

Data Exploration

Unfortunately I couldn’t find data for Ann Arbor snowfall. The closest one I could find is the monthly precipitation in Michigan. Because in winter most of the precipitation comes from snowfall, especially in January, we choose the average precipitation(inches) in Michigan from 1895 to 2018 as our data[1], which should be a good measure of snowfall.

Plots of data in both time and frequency domain are shown below:

We almost couldn’t tell any trend from the time domain plot, and even though it seems like there is some periodicity corresponding to the peak in periodogram around frequency \(0.1 \text{cycles/year}\), we notice that the confidence interval shown by the verticle bar in the plot is very large compared to the peak so it is not really reliable. This is confirmed by the plot of autocorrelation function(ACF) below, which shows no statistically significant correlation for any lag.

All of these suggest a stationary non-seasonal ARMA model could be a good fit to our data.

Fit ARMA models

Under assumption of no trend

We first try to fit a stationary Gaussian \(ARMA(p,q)\) model to our data under the null hypothesis of no trend, which seems reasonable from previous plot.

In order to determine appropriate \(p\) and \(q\), we use Akaike’s information criterion(AIC) as a reference and get following AIC table:

AIC table with no trend
MA0 MA1 MA2 MA3 MA4 MA5
AR0 266.10 264.27 265.72 266.90 268.26 268.12
AR1 263.86 265.71 266.65 267.63 267.82 268.50
AR2 265.65 266.21 267.94 269.52 269.33 270.04
AR3 267.64 267.65 258.80 259.47 260.59 261.86
AR4 266.78 267.22 259.08 259.61 259.96 261.32
The AIC table seems to suggest an \(ARMA(3,2)\) model as it gives lowest AIC value, but if we actually fit the model and check the roots of its MA polynomial, we will find:

The MA roots are really just on the unit circle, which means the model is practically not invertible(we even checked higher order models, but up to \(ARMA(4,5)\) as shown in the AIC table they all have the same non-invertible problem). Besides, the difference between its AIC value and that of \(AR(1)\) model couldn’t justify our choice of such a complex model as it usually has issues like numerical unstabilitiy, etc.

Based on the argument above, we finally choose an \(AR(1)\) model to fit our data, which generates following result:

## 
## Call:
## arima(x = pcpn, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.1834     1.8588
## s.e.  0.0882     0.0751
## 
## sigma^2 estimated as 0.4683:  log likelihood = -128.93,  aic = 263.86

Under assumption of trend being monthly mean temperature

It seems like a common sense that a colder winter will have more snow, so we decided to check whether an ARMA error model with monthly average temperature as signal will give better fit.(we also checked both min and max tempature as well, but they have almost linear relationship with mean temperature so we don’t test them further)

We plot the average temperature(F) of Janaury in Michigan[2] together with precipitation as below:

It looks like there is no apparent dependence. But if we decompose both series into trend + noise + cycles as below, both trends seem to follow the same oscillation pattern. The cycles plot agrees with our former claim of no apparent periodicity.

We again generate AIC table, this time using the temperature as trend:

AIC table with tempareture as trend
MA0 MA1 MA2 MA3 MA4 MA5
AR0 264.72 263.79 265.09 266.28 267.56 266.12
AR1 263.41 265.20 266.93 267.25 265.72 265.88
AR2 265.08 267.08 267.71 269.13 267.99 261.23
AR3 267.06 265.68 259.48 260.95 260.65 262.12
AR4 265.79 265.32 260.28 260.04 261.37 263.66

We basically find the same pattern as no trend: higher order models have smaller AIC values, but we can again check that they are not invertible(not shown here). And based on the same argument, we choose \(AR(1)\) again as our choice of ARMA model. The result is:

## 
## Call:
## arima(x = pcpn, order = c(1, 0, 0), xreg = tmpc)
## 
## Coefficients:
##          ar1  intercept    tmpc
##       0.1649     1.4865  0.0201
## s.e.  0.0898     0.2487  0.0128
## 
## sigma^2 estimated as 0.4592:  log likelihood = -127.71,  aic = 263.41

Determine which model to use

We want to know if we can statistically justify the choice of temperature as trend. This is done by a likelihood ratio test on whether the coefficient before temperature is \(0\), which gives p-value to be

## [1] 0.1175506

Since the p-value is greater than \(10\%\), we can’t reject the null hypothesis that the precipitation doesn’t depend on temperature. Thus we will just use the non-trend \(AR(1)\) model.

Model Diagnostics

AR coefficent

A likelihood ratio test on the AR coefficent \(\phi\) of our \(AR(1)\) model gives p-value to be

## [1] 0.03941323

which is even below \(5\%\) and justifies the \(AR(1)\) coefficient being non-zero, supporting our choice of an \(AR(1)\) model instead of a white noise process.

Correlation of residuals

The ACF plot of residuals of our fitted model as shown below supports the IID error assumption in ARMA model.

Conclusion

We find out an \(AR(1)\) model fits reasonably well to our data of average January precipitation in Michigan. On the other hand, we don’t have enough evidence for its dependence on temperature. We could try to use this model to predict snowfall in Michigan or Ann Arbor in the future.

References:

[1]. Precipitation data: ftp://ftp.ncdc.noaa.gov/pub/data/cirs/climdiv/climdiv-pcpnst-v1.0.0-20180205

[2]. Temperature data: ftp://ftp.ncdc.noaa.gov/pub/data/cirs/climdiv/climdiv-tmpcst-v1.0.0-20180205