stats531 midterm project: A Study on Accumulated Cyclone Energy of North Atlantic Hurricane

1 Introduction
- Background
- objective
2 Explore the data
3 Modeling
4 Diagnosis
5 Prediction
6 Conclusion
7 Reference

1 Introduction

Background

A hurricane is a kind of tropical cyclone that occurs in the Atlantic Ocean and northeastern Pacific Ocean, which is a rapidly rotating storm system characterized by a low-pressure center, a closed low-level atmospheric circulation, strong winds, and a spiral arrangement of thunderstorms that produce heavy rain or squalls.\(^{[1]}\) As a kind of natural disaster, hurricane seriously threatens people’s lives and property, and causes great impact on people’s livelihood, agriculture and economy in coastal areas. As a typical hurricane-plagued country, the United States has suffered many hurricanes that caused heavy casualties in its history. Thus, people have been studying the causes and nature of hurricanes for years, hoping to get a handle on them.

Accumulated cyclone energy (ACE) is a measure to express the activity of individual tropical cyclones and entire tropical cyclone seasons. It uses an approximation of the wind energy used by a tropical system over its lifetime and is calculated every six hours. The ACE of a season is the sum of the ACEs for each storm and takes into account the number, strength, and duration of all the tropical storms in the season.\(^{[2]}\) The greater the ACE value, the more active and energetic a hurricane is. What’s more, the primary energy source for these storms is warm ocean waters. So there are reasons to suspect that changes in sea surface temperature(SST) can affect hurricane formation and its energy levels.

Here, we consider the measured north Atlantic hurricane ACE values over the past 150 years, hoping to find some rules of hurricane phenomena and the relationships between the factors that influence it.

objective

Finding a better time series model that can fit historic annual ACE values of hurricanes effectively.
Hoping to find the relationship between the ACE value of hurricane and SST. Can we use the change of SST to estimate the change of ACE?
Using the well-fitted model above to estimate ACE values of North Atlantic hurricanes over the next few years.

2 Explore the data

Loading in the data

Our dataset of ACE of North Atlantic hurricanes is from Our World in Data. And the dataset of SST is from KNMI Climate Explorer , which is operating on Ensemble-median sea-surface temperature anomalies from the HadSST.4.0.0.0 data set, averaging anomalies over region lon=(-360.000,0.000), lat=(-90.000,90.000), tos [K] Sea water temperature anomaly at a depth of 20cm.

After simply processing the data, we got the ACE values of North Atlantic hurricanes and the corresponding SST values for the 142 years from 1876 to 2017.

##           Entity Year ACE
## 1 North Atlantic 1876  57
## 2 North Atlantic 1877  73
## 3 North Atlantic 1878 181
## 4 North Atlantic 1879  64
## 5 North Atlantic 1880 131
## 6 North Atlantic 1881  59

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   55.00   83.00   93.78  125.00  259.00

From the summary of the data, the mean of ACE is 93.78, which is in the classification of ‘Near Normal’.

Then, we look at the time plot and the acf plot to get a overall state of the data.

Stationarity analysis

From the time plot, we cannot see obvious trend and instability directly. It seems that the time series is stationary. But in order to obtain more accurate statistical results, we now use Augmented Dickey-Fuller test\(^{[3]}\) to judge the stability of time series.

adf.test(ACE)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ACE
## Dickey-Fuller = -3.6096, Lag order = 5, p-value = 0.03497
## alternative hypothesis: stationary

The p-value = 0.03 < 0.05, so we reject the null hypothesis and conclude that the time series is stationary. Therefore, we can continue to analyze the data with ARMA models under the assumption that the time series is stationary.

Frequency domain analysis

Let’s briefly analyze the frequency domain of the time series to see if there is any significant periodicity.

spec = spectrum(ACE,span=c(5,3))

Actually, we cannot find the dominant frequency from the periodogram, which means that the time series doesn’t have obvious seasonal oscillation. While we can still find that the largest peak occurs at a frequency of 0.021, which corresponds to a period of 48 years. In the subsequent model analysis, we can try to add this seasonal variation to see if it can improve the model.

Trend relationship with SST

In order to find if the values of SST have influence on the values of ACE, we decompose the trend of ACE and SST by loess.

ace_low <- ts(loess(ACE~ace$Year,span = 0.5)$fitted,start = 1876,frequency = 1)
sst_low <- ts(loess(sst~ace$Year,span = 0.5)$fitted,start = 1876,frequency = 1)
plot(ts.union(ACE,ace_low,sst,sst_low))

The plot shows that these two dataset have similar trend behaviors. So in the next analysis we will consider that if SST can help us fit the ACE model well.

3 Modeling

ARIMA model of ACE without SST

From the analysis and assumption above, we can first try to fit a a stationary Gaussian ARIMA(p,0,q) model with no trend:

\[ \phi(B)(Y_n-\mu)=\psi(B)\epsilon_n, \] where \[\mu=\mathbb{E}[Y_n]\] \[\phi(x)=1-\phi_1x-\phi_2x^2-\cdots-\phi_px^p\] \[\psi(x)=1+\psi_1x+\psi_2x^2+\cdots+\psi_qx^q\] \[\epsilon_n\sim iid N(0,\sigma^2)\] and this model has the parameter vector\(\theta=(\phi_{1:p},\psi_{1:q},\mu,\sigma^2)\).

In order to find a proper parameters \(p\) and \(q\) for ARIMA model, we use AIC as a reference and get the following AIC table.

## Warning in arima(data, order = c(p, 0, q)): possible convergence problem:
## optim gave code = 1

	MA0	MA1	MA2	MA3	MA4
AR0	1546.57	1546.76	1548.76	1550.26	1549.64
AR1	1546.78	1545.18	1550.76	1549.05	1550.59
AR2	1548.77	1546.56	1547.11	1550.33	1551.67
AR3	1550.52	1549.13	1549.10	1551.11	1550.49
AR4	1551.79	1550.81	1552.79	1550.46	1551.27

The lowest AIC is from ARIMA(1,0,1) model. And we also consider about ARIMA(2,0,1) models with smaller AIC values. Then using likelihood ratio test to check these two models. We consider ARIMA(1,0,1) model as null hypothesis \(H_0\) and ARIMA(2,0,1) model as alternative hypothesis \(H_1\). Remember that under the hypothesis \(H_0\), the formula of Wilks Approximation is \(2(\ell^{(1)}-\ell^{(0)})\approx \chi^2_{D^1-D^0}\), where \(D^1\) and \(D^0\) is the number of parameters of hypothesis \(H^1\) and \(H^0\), respectively.

2*(arma21$loglik-arma11$loglik)

## [1] 0.6115904

qchisq(0.95,1)

## [1] 3.841459

0.61 < 3.84, so we can’t reject \(H_0\) and continue to choose ARIMA(1,0,1) model for ACE values.

ARIMA model of ACE with SST

Let’s fit a model using linear regression first, and fit an ARMA error model.

linear <- lm(ACE~sst)
summary(linear)

## 
## Call:
## lm(formula = ACE ~ sst)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -86.20 -39.85 -10.40  30.27 173.12 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   94.394      4.586  20.584   <2e-16 ***
## sst           38.221     15.830   2.414   0.0171 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 54.56 on 140 degrees of freedom
## Multiple R-squared:  0.03997,    Adjusted R-squared:  0.03312 
## F-statistic: 5.829 on 1 and 140 DF,  p-value: 0.01705

The coefficient is significant. Thus, the univariate linear regression applies to this model.

Also, we use AIC as a reference and get the following AIC table.

	MA0	MA1	MA2	MA3	MA4
AR0	1540.77	1541.77	1543.48	1545.35	1544.58
AR1	1541.85	1539.44	1541.31	1543.31	1545.46
AR2	1543.59	1541.31	1543.31	1545.31	1546.68
AR3	1545.58	1543.31	1545.25	1542.16	1545.12
AR4	1546.24	1545.40	1547.40	1545.11	1545.32

The table also shows that ARIMA(1,0,1) model has the smallest AIC. And comparing these two models, ARMA error model is better with lower AIC.

a <- Arima(ACE,order = c(1,0,1),xreg = sst)
a

## Series: ACE 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##           ar1     ma1  intercept     xreg
##       -0.8983  1.0000    94.4522  38.6118
## s.e.   0.0405  0.0253     4.6716  15.9720
## 
## sigma^2 estimated as 2865:  log likelihood=-765.72
## AIC=1541.44   AICc=1541.88   BIC=1556.22

Also, we can use likelihood ratio test to check it.

2*(a$loglik-arma11$loglik)

## [1] 5.739249

5.74 > 3.84, so we reject null hypothesis and choose the linear regression with ARMA error model.

SARIMA model of ACE with SST

We now consider to add seasonality property into the model.

as <- arima(ACE,order = c(1,0,1),xreg = sst,seasonal = list(order = c(1,0,0),period = 48))
as

## 
## Call:
## arima(x = ACE, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 0), period = 48), 
##     xreg = sst)
## 
## Coefficients:
##           ar1     ma1    sar1  intercept      sst
##       -0.9032  1.0000  0.0610    94.3288  38.0178
## s.e.   0.0404  0.0239  0.1034     4.8533  15.9580
## 
## sigma^2 estimated as 2777:  log likelihood = -765.55,  aic = 1543.09

\(AIC_a < AIC_{as}\), so the seasonality cannot improve the model and we still to choose ARIMA(1,0,1) error model.

4 Diagnosis

checkresiduals(a)

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,1) errors
## Q* = 7.3379, df = 6, p-value = 0.2907
## 
## Model df: 4.   Total lags used: 10

From the Ljung-Box test, the p-value = 0.2907 > 0.05, so the series is uncorrelated and we can think the residuals as a white noise.

So far, we get the final fitted model:

\[Y_n=94.45+38.61X_n+\eta_n\] \[\eta_n=-0.90\eta_{n-1}+\epsilon_n+\epsilon_{n-1}\]

\[\epsilon_n \thicksim iid N(0,2777)\]

where \(Y_n\) is ACE value time series of north Atlantic hurricane, and \(X_n\) is the time series of corresponding SST values.

5 Prediction

When we use the regression model with ARIMA error terms for prediction, we need to predict both the linear regression model part and the ARIMA model part, and then combine the two results to get the final model prediction result. Here, the future value of predictive variable is needed. So we try to make prediction of ACE value in 2018 since we have SST value of year 2018, which is 0.64.

fcast <- forecast(a, xreg=c(0.64))
autoplot(fcast) + xlab("Year") + ylab("ACE")+
  theme(text = element_text(family = "STHeiti"))+
  theme(plot.title = element_text(hjust = 0.5))

fcast

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2018       118.4444 49.63616 187.2527 13.21126 223.6776

6 Conclusion

After the analysis above, we get the final model of the ACE values of north Atlantic hurricanes. It follows a linear regression with ARMA(1,1) error model.
The value of SST can help us to assess and predict the value of ACE, which means that the increase of hurricane activity is closely related to the increase of sea surface temperature. It is worth reflecting that the global warming caused by human activities in recent years, and the harm of which is also fed back to human society through natural disasters.
This model also has some disadvantages. Notice that the MA root of this model is at the edge of the unit circle, which will cause numerical instability. So the model is needed to improve more.