A hurricane is a kind of tropical cyclone that occurs in the Atlantic Ocean and northeastern Pacific Ocean, which is a rapidly rotating storm system characterized by a low-pressure center, a closed low-level atmospheric circulation, strong winds, and a spiral arrangement of thunderstorms that produce heavy rain or squalls.\(^{[1]}\) As a kind of natural disaster, hurricane seriously threatens people’s lives and property, and causes great impact on people’s livelihood, agriculture and economy in coastal areas. As a typical hurricane-plagued country, the United States has suffered many hurricanes that caused heavy casualties in its history. Thus, people have been studying the causes and nature of hurricanes for years, hoping to get a handle on them.
Accumulated cyclone energy (ACE) is a measure to express the activity of individual tropical cyclones and entire tropical cyclone seasons. It uses an approximation of the wind energy used by a tropical system over its lifetime and is calculated every six hours. The ACE of a season is the sum of the ACEs for each storm and takes into account the number, strength, and duration of all the tropical storms in the season.\(^{[2]}\) The greater the ACE value, the more active and energetic a hurricane is. What’s more, the primary energy source for these storms is warm ocean waters. So there are reasons to suspect that changes in sea surface temperature(SST) can affect hurricane formation and its energy levels.
Here, we consider the measured north Atlantic hurricane ACE values over the past 150 years, hoping to find some rules of hurricane phenomena and the relationships between the factors that influence it.
Finding a better time series model that can fit historic annual ACE values of hurricanes effectively.
Hoping to find the relationship between the ACE value of hurricane and SST. Can we use the change of SST to estimate the change of ACE?
Using the well-fitted model above to estimate ACE values of North Atlantic hurricanes over the next few years.
Our dataset of ACE of North Atlantic hurricanes is from Our World in Data. And the dataset of SST is from KNMI Climate Explorer , which is operating on Ensemble-median sea-surface temperature anomalies from the HadSST.4.0.0.0 data set, averaging anomalies over region lon=(-360.000,0.000), lat=(-90.000,90.000), tos [K] Sea water temperature anomaly at a depth of 20cm.
After simply processing the data, we got the ACE values of North Atlantic hurricanes and the corresponding SST values for the 142 years from 1876 to 2017.
## Entity Year ACE
## 1 North Atlantic 1876 57
## 2 North Atlantic 1877 73
## 3 North Atlantic 1878 181
## 4 North Atlantic 1879 64
## 5 North Atlantic 1880 131
## 6 North Atlantic 1881 59
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 55.00 83.00 93.78 125.00 259.00
From the summary of the data, the mean of ACE is 93.78, which is in the classification of ‘Near Normal’.
Then, we look at the time plot and the acf plot to get a overall state of the data.
From the time plot, we cannot see obvious trend and instability directly. It seems that the time series is stationary. But in order to obtain more accurate statistical results, we now use Augmented Dickey-Fuller test\(^{[3]}\) to judge the stability of time series.
adf.test(ACE)
##
## Augmented Dickey-Fuller Test
##
## data: ACE
## Dickey-Fuller = -3.6096, Lag order = 5, p-value = 0.03497
## alternative hypothesis: stationary
The p-value = 0.03 < 0.05, so we reject the null hypothesis and conclude that the time series is stationary. Therefore, we can continue to analyze the data with ARMA models under the assumption that the time series is stationary.
Let’s briefly analyze the frequency domain of the time series to see if there is any significant periodicity.
spec = spectrum(ACE,span=c(5,3))
Actually, we cannot find the dominant frequency from the periodogram, which means that the time series doesn’t have obvious seasonal oscillation. While we can still find that the largest peak occurs at a frequency of 0.021, which corresponds to a period of 48 years. In the subsequent model analysis, we can try to add this seasonal variation to see if it can improve the model.
In order to find if the values of SST have influence on the values of ACE, we decompose the trend of ACE and SST by loess.
ace_low <- ts(loess(ACE~ace$Year,span = 0.5)$fitted,start = 1876,frequency = 1)
sst_low <- ts(loess(sst~ace$Year,span = 0.5)$fitted,start = 1876,frequency = 1)
plot(ts.union(ACE,ace_low,sst,sst_low))
The plot shows that these two dataset have similar trend behaviors. So in the next analysis we will consider that if SST can help us fit the ACE model well.
From the analysis and assumption above, we can first try to fit a a stationary Gaussian ARIMA(p,0,q) model with no trend:
\[ \phi(B)(Y_n-\mu)=\psi(B)\epsilon_n, \] where \[\mu=\mathbb{E}[Y_n]\] \[\phi(x)=1-\phi_1x-\phi_2x^2-\cdots-\phi_px^p\] \[\psi(x)=1+\psi_1x+\psi_2x^2+\cdots+\psi_qx^q\] \[\epsilon_n\sim iid N(0,\sigma^2)\] and this model has the parameter vector\(\theta=(\phi_{1:p},\psi_{1:q},\mu,\sigma^2)\).
In order to find a proper parameters \(p\) and \(q\) for ARIMA model, we use AIC as a reference and get the following AIC table.
## Warning in arima(data, order = c(p, 0, q)): possible convergence problem:
## optim gave code = 1
MA0 | MA1 | MA2 | MA3 | MA4 | |
---|---|---|---|---|---|
AR0 | 1546.57 | 1546.76 | 1548.76 | 1550.26 | 1549.64 |
AR1 | 1546.78 | 1545.18 | 1550.76 | 1549.05 | 1550.59 |
AR2 | 1548.77 | 1546.56 | 1547.11 | 1550.33 | 1551.67 |
AR3 | 1550.52 | 1549.13 | 1549.10 | 1551.11 | 1550.49 |
AR4 | 1551.79 | 1550.81 | 1552.79 | 1550.46 | 1551.27 |
The lowest AIC is from ARIMA(1,0,1) model. And we also consider about ARIMA(2,0,1) models with smaller AIC values. Then using likelihood ratio test to check these two models. We consider ARIMA(1,0,1) model as null hypothesis \(H_0\) and ARIMA(2,0,1) model as alternative hypothesis \(H_1\). Remember that under the hypothesis \(H_0\), the formula of Wilks Approximation is \(2(\ell^{(1)}-\ell^{(0)})\approx \chi^2_{D^1-D^0}\), where \(D^1\) and \(D^0\) is the number of parameters of hypothesis \(H^1\) and \(H^0\), respectively.
2*(arma21$loglik-arma11$loglik)
## [1] 0.6115904
qchisq(0.95,1)
## [1] 3.841459
0.61 < 3.84, so we can’t reject \(H_0\) and continue to choose ARIMA(1,0,1) model for ACE values.
Let’s fit a model using linear regression first, and fit an ARMA error model.
linear <- lm(ACE~sst)
summary(linear)
##
## Call:
## lm(formula = ACE ~ sst)
##
## Residuals:
## Min 1Q Median 3Q Max
## -86.20 -39.85 -10.40 30.27 173.12
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 94.394 4.586 20.584 <2e-16 ***
## sst 38.221 15.830 2.414 0.0171 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.56 on 140 degrees of freedom
## Multiple R-squared: 0.03997, Adjusted R-squared: 0.03312
## F-statistic: 5.829 on 1 and 140 DF, p-value: 0.01705
The coefficient is significant. Thus, the univariate linear regression applies to this model.
Also, we use AIC as a reference and get the following AIC table.
MA0 | MA1 | MA2 | MA3 | MA4 | |
---|---|---|---|---|---|
AR0 | 1540.77 | 1541.77 | 1543.48 | 1545.35 | 1544.58 |
AR1 | 1541.85 | 1539.44 | 1541.31 | 1543.31 | 1545.46 |
AR2 | 1543.59 | 1541.31 | 1543.31 | 1545.31 | 1546.68 |
AR3 | 1545.58 | 1543.31 | 1545.25 | 1542.16 | 1545.12 |
AR4 | 1546.24 | 1545.40 | 1547.40 | 1545.11 | 1545.32 |
The table also shows that ARIMA(1,0,1) model has the smallest AIC. And comparing these two models, ARMA error model is better with lower AIC.
a <- Arima(ACE,order = c(1,0,1),xreg = sst)
a
## Series: ACE
## Regression with ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 intercept xreg
## -0.8983 1.0000 94.4522 38.6118
## s.e. 0.0405 0.0253 4.6716 15.9720
##
## sigma^2 estimated as 2865: log likelihood=-765.72
## AIC=1541.44 AICc=1541.88 BIC=1556.22
Also, we can use likelihood ratio test to check it.
2*(a$loglik-arma11$loglik)
## [1] 5.739249
5.74 > 3.84, so we reject null hypothesis and choose the linear regression with ARMA error model.
We now consider to add seasonality property into the model.
as <- arima(ACE,order = c(1,0,1),xreg = sst,seasonal = list(order = c(1,0,0),period = 48))
as
##
## Call:
## arima(x = ACE, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 0), period = 48),
## xreg = sst)
##
## Coefficients:
## ar1 ma1 sar1 intercept sst
## -0.9032 1.0000 0.0610 94.3288 38.0178
## s.e. 0.0404 0.0239 0.1034 4.8533 15.9580
##
## sigma^2 estimated as 2777: log likelihood = -765.55, aic = 1543.09
\(AIC_a < AIC_{as}\), so the seasonality cannot improve the model and we still to choose ARIMA(1,0,1) error model.
checkresiduals(a)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,1) errors
## Q* = 7.3379, df = 6, p-value = 0.2907
##
## Model df: 4. Total lags used: 10
From the Ljung-Box test, the p-value = 0.2907 > 0.05, so the series is uncorrelated and we can think the residuals as a white noise.
So far, we get the final fitted model:
\[Y_n=94.45+38.61X_n+\eta_n\] \[\eta_n=-0.90\eta_{n-1}+\epsilon_n+\epsilon_{n-1}\]
\[\epsilon_n \thicksim iid N(0,2777)\]
where \(Y_n\) is ACE value time series of north Atlantic hurricane, and \(X_n\) is the time series of corresponding SST values.
When we use the regression model with ARIMA error terms for prediction, we need to predict both the linear regression model part and the ARIMA model part, and then combine the two results to get the final model prediction result. Here, the future value of predictive variable is needed. So we try to make prediction of ACE value in 2018 since we have SST value of year 2018, which is 0.64.
fcast <- forecast(a, xreg=c(0.64))
autoplot(fcast) + xlab("Year") + ylab("ACE")+
theme(text = element_text(family = "STHeiti"))+
theme(plot.title = element_text(hjust = 0.5))
fcast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2018 118.4444 49.63616 187.2527 13.21126 223.6776
After the analysis above, we get the final model of the ACE values of north Atlantic hurricanes. It follows a linear regression with ARMA(1,1) error model.
The value of SST can help us to assess and predict the value of ACE, which means that the increase of hurricane activity is closely related to the increase of sea surface temperature. It is worth reflecting that the global warming caused by human activities in recent years, and the harm of which is also fed back to human society through natural disasters.
This model also has some disadvantages. Notice that the MA root of this model is at the edge of the unit circle, which will cause numerical instability. So the model is needed to improve more.
[1] Wikipedia:Hurricane. https://en.wikipedia.org/wiki/Tropical_cyclone
[2] Wikipedia:Accumulated cyclone energy. https://en.wikipedia.org/wiki/Accumulated_cyclone_energy
[3] Knowledge resource: Augmented Dickey-Fuller test. https://nwfsc-timeseries.github.io/atsa-labs/sec-boxjenkins-aug-dickey-fuller.html
Knowledge resource: Dynamic regression models. https://blog.csdn.net/bea_tree/article/details/51228721
Knowledge resource: Forecasting. https://otexts.com/fppcn/forecasting.html
Knowledge resource: Box-Ljung test. https://nwfsc-timeseries.github.io/atsa-labs/sec-boxjenkins-check-resids.html