Introduction

Since 1850, the average temperature of the Earth has increased by approximately 2°F, about 0.11°F per decade. However, from 1982, the global warming rate has accelerated to 0.36°F per decade. The year 2023 set a record as the hottest year since 1850, 2.12°F higher than the 20th-century average and 2.43°F higher than the pre-industrial average [1]. Current climate models indicate that rising temperatures will intensify water cycle of the Earth [2], and increased evaporation will result in more frequent and intense storms. In this report, we want to study whether the precipitations in Detroit are affected by global warming, and if so, what is the magnitude of the impact.

According to the government report of Detroit [3], Detroit regularly faces the hazards of flooding, often caused by rainstorms. The frequency and intensity of severe storms have increased, and this trend will likely continue as the effects of climate change becomes more pronounced. Rainstorms and floods can lead to significant economic and societal disruptions, including flooding, infrastructure damage, and soil erosion. As such, monitoring daily precipitation levels, particularly extreme values, detecting periodic patterns and predicting the likelihood of heavy rainfall in the future are crucial. We collect daily precipitation data from January 1st, 1970, February 14th, 2024, observed at Detroit Metro Airport, Michigan, US [4] and use some time series techniques to analyze the data. After analyzing the time series, we can have a basic understanding of what level of preparedness and preventive actions to take.

Date Exploration and Processing

First we read the data from National Centers for Environmental Information [4] and check the time series plot. We select the precipitation in Detroit, MI from January 1st 1970 to February 15th 2024, drop the missing values and convert it to millimeters. It is obvious from the plot that there is a periodic pattern of the peaks and we cannot deny that the time series is stationary. We will use the ACF plot to get a more conclusive answer later.

The maximum of daily precipitation over the past 54 years is:

## [1] 116.1

back in 2014, marginally exceeded 100mm daily and can scarcely be categorized into violent shower according to [5], which will have catastrophic effects.

ARMA Modeling

We want to fit an ARMA model for our time series data to examine the autoregressive, moving average patterns and forecast the future precipitation. The model can be defined as: \[\begin{align*} &\phi(B)(Y_n-\mu)=\psi(B)\mu_n, \\ &\mu=\mathbb E[Y_n], \\ &\phi(x) = 1-\phi_1x-\cdots-\phi_px^p, \\ &\psi(x) = 1+\psi_1x+\cdots+\psi_qx^q, \\ &\epsilon_n\sim\text{i.i.d.}\ \text{N}(0,\sigma^2) \end{align*}\] where \(B\) is the backshift operator, i.e. \(BY_n=Y_{n-1}\) from [6].

Model Selection

To find the optimal model, we will experiment a set of models with different \(p\) and \(q\), then select the one with the least AIC value, which is defined as:

\[\begin{equation} \text{AIC}=-2\times\ell(\theta^*)+2D, \end{equation}\]

where \(\ell(\theta^*)\) is the maximized log likelihood and \(D\) is the number of parameters according to [7]. The code below to tabulate the AIC values of models with different \(p\) and \(q\) is from [7]:

MA0 MA1 MA2 MA3 MA4 MA5
AR0 128586.1 128408.9 128409.4 128409.6 128411.3 128413.0
AR1 128413.9 128409.6 128411.2 128411.7 128413.3 128415.0
AR2 128408.5 128410.7 128411.4 128413.7 128415.3 128417.0
AR3 128409.8 128411.7 128413.8 128415.5 128417.3 128418.9
AR4 128411.1 128413.2 128415.2 128417.2 128418.9 128420.9
AR5 128412.7 128414.8 128416.8 128418.8 128420.8 128420.5

The best model is AR(2), and the coefficients are:

## 
## Call:
## arima(x = prcp, order = c(2, 0, 0))
## 
## Coefficients:
##          ar1      ar2  intercept
##       0.0955  -0.0194     2.3469
## s.e.  0.0071   0.0071     0.0479
## 
## sigma^2 estimated as 38.69:  log likelihood = -64200.25,  aic = 128408.5

Thus our AR(2) model is: \[\begin{align*} Y_n=2.347+0.0955Y_{n-1}-0.0194Y_{n-2}+ \epsilon_n \end{align*}\] We also want to check the AR roots of our AR(2) model based on the methods from [6]. The modulus of both roots are greater than 1, indicating that they are both outside the complex unit circle. Hence our model is causal.

## [1] 7.179582 7.179582

Residual Analysis

For the AR(2) model we fit in the previous section, we want to examine if the residuals of the model have zero means, if they are uncorrelated and if they are normal. To check whether the residuals have zero means, we can plot the local average. We use the LOESS smoothing to do the local regression in the following code based on [8]. The local average plot is nearly a straight line with constant value 0, therefore the residuals exhibit good adherence to the zero-mean assumption.

Next we want to check the uncorrelation using two methods. The first method is the Ljung-Box tests in R to test the hypotheses [9]: \[\begin{equation} H_0:\text{The residuals are uncorrelated}\ \text{v.s.}\ H_1:\text{The residuals are not uncorrelated} \end{equation}\]

## 
##  Box-Ljung test
## 
## data:  resids
## X-squared = 0.00024182, df = 1, p-value = 0.9876

The p value is 0.9876, much larger than the common significance level 0.05. Hence we fail to reject \(H_0\), meaning that the residuals are uncorrelated. We can also plot the ACF of residuals [10]. We find that the ACF values for almost all the lags fall inside the blue bands, which indicates that the model has independent error terms.

At last we use QQ-plot to test the normaility of residuals [11]. The QQ-plot we obtained is not a straight line, so the residuals do not satisfy the normality assumption.

Spectrum Analysis

It is not difficult to see from the time series plot of precipitation that this time series data shows certain kind of periodicity. In order to look into its periodic behavior. We can use spectral analysis to discover underlying periodicities.

To perform spectral analysis, we must first transform data from time domain to frequency domain. The following is the spectral density which is the squared correlation between our time series and sine/cosine waves at the different frequencies spanned by the series [12]. We use periodogram to show the spectral densities.

As for the method we use, Suppose \(X_t\) is a linear process, so it can be written as \[ X_t = \sum_{i=0}^{\infty} \psi_i W_{t-i} = \psi(B) W_t. \] Consider the autocovariance sequence, \[ \gamma_h = \text{Cov}(X_t, X_{t+h}) \] \[ = \mathbb{E} \left[ \sum_{i=0}^{\infty} \psi_i W_{t-i} \sum_{j=0}^{\infty} \psi_j W_{t+h-j} \right] \] \[ = \sigma_w^2 \sum_{i=0}^{\infty} \psi_i \psi_{i+h}. \]

Define the autocovariance generating function as \[ \gamma(B) = \sum_{h=-\infty}^{\infty} \gamma_h B^h. \] Then, \[ \gamma(B) = \sigma_w^2 \sum_{h=-\infty}^{\infty} \sum_{i=0}^{\infty} \psi_i \psi_{i+h} B^h \] \[ = \sigma_w^2 \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \psi_i \psi_j B^{j-i} \] \[ = \sigma_w^2 \sum_{i=0}^{\infty} \psi_i B^{-i} \sum_{j=0}^{\infty} \psi_j B^j \] \[ = \sigma_w^2 \psi(B^{-1}) \psi(B). \]

Note that \[ \gamma(B) = \sum_{h=-\infty}^{\infty} \gamma_h B^h. \] \[ f(\nu) = \sum_{h=-\infty}^{\infty} \gamma_h e^{-2\pi i \nu h} \] \[ = \gamma(e^{-2\pi i \nu}) \] \[ = \sigma_w^2 \psi(e^{-2\pi i \nu}) \psi(e^{2\pi i \nu}) \]

For an AR(p) model, we have \(\psi(B) = \frac{1}{\phi(B)}\), so \[ f(\nu) = \frac{\sigma_w^2}{\phi(e^{-2\pi i \nu}) \phi(e^{2\pi i \nu})} \] \[ = \frac{\sigma_w^2}{|\phi(e^{-2\pi i \nu})|^2}. \] \[ = \sigma_w^2 |\psi(e^{2\pi i \nu})|^2. \]

Unsmoothed Periodgram

As we can see from the raw periodogram, it is really hard to define a dominant frequency since we cannot observe an obvious peak in the plot. Though we can compare the raw periodogram with using repeated rectangular smoothing windows to obtain a non-parametrically smoothed periodogram and the spectrum estimation via AR model picked by AIC [13].

Smoothing

Frequency and Period

From the smoothed estimation, we can observe a remarkable peak that is very close to 0, and the estimation via AR model displays downward trend. In the smoothed case, we can get the magnitude of the dominant frequency and the period of our precipitation data. The calculation results go as follows.

## Dominant frequency for unsmoothed method is  0.0028
## Dominant frequency for smoothing method is  0.0027
## Dominant frequency for AR method is  0

After we get the dominant frequency, we can calculate the dominant period for each method.

## Dominant period for unsmoothed method is  363.636
## Dominant period for smoothing method is  370.37
## Dominant period for AR method is  Inf

From the calculation, we can see that the dominant frequency obtained using unsmoothed method is almost the same as the smoothed one, while the dominant period obtained using AR method is exactly 0. This indicates that under AR method, the data set does not display any periodicity. So we are going to focus on the other two method.

After we calculate the dominant period using each method, the result is quite interesting. The period is very close to 370 days, which is a bit more than a year.

SARMA Modeling

Prepare Monthly Data and Check for Seasonality

It is common for precipitation to be affected by months and seasons. In order to check whether there is seasonality pattern in this precipitation data, here we derive a monthly dataset that represents the average precipitation for each month from year 1970 to year 2023. We plot it out to observe potential patterns.

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

As illustrated in the plot, the monthly precipitation seems to have both stationary mean and stationary variance. Additionally, frequent oscillations may indicate that there tend to be higher precipitation in certain months and lower in others.

We then use a autocorrelation plot to further investigate the seasonality pattern.

Although the autocorrelation coefficients are mostly around the boundary of significance, we may still observe a possible period of 12 months. This indicates that fitting a seasonal autoregressive moving average model may be appropriate.

Seasonal Autoregressive Moving Average (SARMA) Model

A \(SARMA(p,q) \times (P,Q)_k\) model [14] for a certain period \(k\) refers to

\[ \phi(B) \Phi(B^k)(Y_n - \mu) = \psi(B) \Psi(B^k) \epsilon_n, \]

where \(B\) is the backshift operator, \(\{\epsilon_n\}\) is a white noise process and

\[ \begin{aligned} && \mu & = \mathbb{E} [Y_n] \\ && \phi(x) & = 1 - \phi_1 x - \cdots - \phi_p x^p \\ && \psi(x) & = 1 + \psi_1 x + \cdots + \psi_q x^q \\ && \Phi(x) & = 1 - \Phi_1 x - \cdots - \Phi_P x^P \\ && \Psi(x) & = 1 + \Psi_1 x + \cdots + \Psi_Q x^Q. \end{aligned} \]

According to what is observed in the ACF plot, it is appropriate to choose the period in our SARMA model to be 12 months.

Fitting ARMA Model for Monthly Precipitation Data

We start with determining the value of \(p\) and \(q\) by comparing the AIC result of fitting different \(ARMA(p,q)\) model (\(p, q = 0,1,2,3,4\)) on monthly precipitation data.

MA0 MA1 MA2 MA3 MA4
AR0 2088.05 2085.59 2082.97 2084.71 2085.55
AR1 2084.90 2085.35 2084.85 2081.66 2079.29
AR2 2083.42 2085.33 2084.29 2039.81 2041.39
AR3 2085.18 2078.10 2078.73 2037.12 2039.02
AR4 2084.24 2076.28 2040.78 2042.72 2043.58

Based on the AIC criteria, ARMA(3,3), ARMA(3,4) and ARMA(4,2) may be candidate models for fitting monthly precipitation data. We will then check their AR roots and MA roots.

## [1] "AR roots for ARMA(3,3)"
## [1]  0.8661256+0.5004687i -1.0277266-0.0000000i  0.8661256-0.5004687i
## [1] "MA roots for ARMA(3,3)"
## [1]  0.8668065+0.5056225i -1.0079977-0.0000000i  0.8668065-0.5056225i
## [1] "AR roots for ARMA(3,4)"
## [1]  0.8660853+0.5004442i -1.0292950-0.0000000i  0.8660853-0.5004442i
## [1] "MA roots for ARMA(3,4)"
## [1]   0.8665915+0.5053194i  -1.0086216-0.0000000i   0.8665915-0.5053194i
## [4] 159.2051936+0.0000000i
## [1] "AR roots for ARMA(4,2)"
## [1]  0.8659706+0.5002967i  0.8659706-0.5002967i -5.7252439+0.0000000i
## [4]  5.5633362+0.0000000i
## [1] "MA roots for ARMA(4,2)"
## [1] 0.8657287+0.5034543i 0.8657287-0.5034543i

We can see that the AR roots and MA roots of ARMA(3,3) model fall outside the unit circle, and this model has relatively simpler structure than the other two. Therefore, we will choose \(p = 3\) and \(q = 3\), and further build SARMA model based on this result.

Choosing Q for SARMA Model

Now that we get the best ARMA model for monthly precipitation data, we can also use AIC criteria to choose \(P\) and \(Q\) in our SARMA model. However, since the parameter estimation during model fitting fails to converge when \(P > 1\), we can only choose \(P = 1\). By fitting several SARMA models based on ARMA(3,3) for \(P=1\) and a range of different values of \(Q\) (from 0 to 2), we can choose the one that corresponds to the lowest AIC as the best SARMA model for our data.

sarma303100$aic
## [1] 2043.064
sarma303101$aic
## [1] 2044.25
sarma303102$aic
## [1] 2042.203

Based on the AIC results of SARMA models above, \(SARMA(3,0,3) \times (1,0,0)_{12}\) has the lowest AIC, thus can be our optimal SARMA model. The estimated coefficients of this model are shown as follows.

## 
## Call:
## arima(x = prcp_dt, order = c(3, 0, 3), seasonal = list(order = c(1, 0, 0), period = 12))
## 
## Coefficients:
##          ar1     ar2      ar3      ma1      ma2     ma3     sar1  intercept
##       0.9129  0.4171  -0.8174  -0.9129  -0.3956  0.7989  -0.0210     2.3466
## s.e.     NaN     NaN      NaN      NaN      NaN     NaN   0.0384     0.0446
## 
## sigma^2 estimated as 1.326:  log likelihood = -1012.53,  aic = 2043.06

Diagnostic Analysis of SARMA Model

Next we need to perform some diagnostic analyses of this SARMA model. First, we check the correlation coefficients between model residuals to see whether they are uncorrelated.

As illustrated in the ACF plot, autocovariance coefficients of the residuals are all insignificant and do not show notable periodic patterns. We can draw the conclusion that residuals are almost uncorrelated.

Then we use QQ-plot to check whether the residuals follow normal distribution.

The QQ-plot shows that the distribution of the residuals has slightly lighter left and right tails than normal distribution. However, since most of the points fall on the qqline, we conclude that residuals are nearly normally distributed.

Conclusion

In this project, we intend to assess the potential shifts in Detroit’s precipitation patterns after 1970, within the context of escalating global warming and the rise in extreme weather events in the Great Lake region. Using publicly available precipitation datasets, we conducted a comprehensive analysis of both daily and monthly time series data related to Detroit’s precipitation to uncover notable patterns.

Our analysis yields several findings. Firstly, the precipitation in Detroit after 1970 exhibits mean stationarity and variance stationarity. This pattern appears to be independent of the increasing precipitation and storm events observed in the Great Lake regions. With this feature, we suppose that an Autoregressive Moving Average (ARMA) model is suitable for modeling the daily precipitation data. After parameter selection based on AIC criteria, we identify ARMA(2,0) model as optimal. Using the results from [15], we conclude that the precipitation in Detroit appears to have a pseudo-cyclic behavior with a cycle of \(1/f_0\), where \[\begin{equation} f_0=\dfrac{1}{2\pi}\cos^{-1}\left(\dfrac{\phi_1}{2\sqrt{-\phi_2}}\right)=0.1943, \end{equation}\] and the autocorrelations oscillate in a sinusoidal envelope with a frequency of \(f_0\). In a nutshell, the data indicates a consistent level of precipitation throughout all observed years, also suggesting a continuation of this pattern in the future.

Furthermore, our spectrum analysis shows that the dominant frequency for unsmoothed method is 370.28 days per cycle and 378.78 days per cycle for smoothing method, both a little more than a year. The cycle that represents precipitation is one year, which is not exactly the same as what we expected. This is out of expectation because with global warming and climate change, heavy rainfall in Detroit will become more frequent and the cycle is expected to be shortened. This annual pattern is consistent with seasonal changes and reflects the climate characteristics of the region, including the combined effects of factors such as seasonal temperature changes, humidity, and atmospheric circulation patterns [16].

This spectrum investigation paves a way for fitting a seasonal model based on monthly averaged precipitation data. Using AIC criteria once again, we find the ARMA(3,3) model as the most suitable for monthly precipitation data. Additionally, incorporating seasonality, the best SARMA model appears to be SARIMA\((3,0,3)\times(1,0,0)_{12}\), which is both causal and invertible while ensuring the validity of residual assumptions. The good performance of the SARMA model fitting suggests that there does not exist notable shifts in precipitation patterns in Detroit since 1970.

In conclusion, our analysis indicates that the precipitation dynamics in Detroit remain stable despite the escalating global warming and increasing regional weather events.

Our study has some limitations:

  1. The residuals of our AR(2) model picked by AIC criterion are not normally distributed, and the cycle of our AR(2) model is much shorter than the result obtained from spectrum analysis. There will be some limitations when we want to forecast extreme weathers in the future.

  2. In the spectrum analysis, we divided the data into two parts with the year 2000 as the dividing line. After obtaining their dominant period and dominant frequency by performing spectral analysis on the two periods respectively. However, the results are not good as we expected, and there is no obvious peak in the periodogram. Therefore, there is a lack of comparison in the description of frequency and period, which makes it less convincing.

  3. In the SARMA modeling part, we use averaging to obtain monthly data. However, there may be more appropriate methods to process the data for analyzing precipitation pattern change and extreme weathers.

References

[1] https://www.climate.gov/news-features/understanding-climate/climate-change-global-temperature

[2] https://gpm.nasa.gov/resources/faq/how-does-climate-change-affect-precipitation

[3] https://detroitmi.gov/departments/homeland-security-emergency-management-detroit/severe-weather-warnings-safety-tips-and-resources/severe-weather-hazards-safety-tips/floods

[4] https://www.ncei.noaa.gov/data/daily-summaries/access

[5] https://water.usgs.gov/edu/activity-howmuchrain-metric.html(https://water.usgs.gov/edu/activity-howmuchrain-metric.html#){.uri}#:~:text=Slight%20rain%3A%20Less%20than%200.5,than%208%20mm%20per%20hour.

[6] https://ionides.github.io/531w24/04/notes.pdf

[7] https://ionides.github.io/531w24/05/notes.pdf

[8] http://r-statistics.co/Loess-Regression-With-R.html

[9] https://stat.ethz.ch/R-manual/R-devel/library/stats/html/box.test.html

[10] https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/acf

[11] https://www.rdocumentation.org/packages/EnvStats/versions/2.3.1/topics/qqPlot

[12] Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics with S. Springer, New York, 271-300. https://doi.org/10.1007/978-0-387-21706-2

[13] https://ionides.github.io/531w24/07/notes.pdf

[14] Lecture slides chapter 6. https://ionides.github.io/531w24/06/slides.pdf

[15] https://sjmiller8182.github.io/LearningTS/build/autoregressive_models.html

[16] Mallakpour, I., Villarini, G. Analysis of changes in the magnitude, frequency, and seasonality of heavy precipitation over the contiguous USA. Theor Appl Climatol 130, 345–363 (2017). https://doi.org/10.1007/s00704-016-1881-z