Earthquakes occur all over the world multiple times a year. Some are so small that they cannot be felt; others are violent enough to destroy buildings. Scientists study earthquakes to prepare for future events, to understand the movement of tectonic plates, and to learn why earthquakes happen.
One common measure of an earthquake is its magnitude, which describes the relative size of the event. By looking at a time series of earthquake magnitudes from the last one hundred years, we can search for appropriate models for the magnitude data and assess when another large earthquake might occur.
The goal of this analysis of world earthquake magnitude data is to identify models that fit the data well, determine whether there is a trend, and produce a 5 year forecast.
Earthquake Magnitude (values range from 0.0 to 9.9): “The value in this column contains the primary earthquake magnitude. Magnitude measures the energy released at the source of the earthquake. Magnitude is determined from measurements on seismographs. For pre-instrumental events, the magnitudes are derived from intensities. There are several different scales for measuring earthquake magnitudes. The primary magnitude is chosen from the available magnitude scales.” (NOAA)
First, the data on earthquake magnitudes from 1917 to 2017 is read in from the National Oceanic and Atmospheric Administration (NOAA). The series used here is the average magnitude for each year, giving one value per year for the last 100 years. Yearly averages are used because there are not enough earthquakes in each month to compute meaningful monthly averages. Here is a look at the first few data points to get an idea of what the data look like:
## # A tibble: 6 x 2
## Year Mag
## <int> <dbl>
## 1 1917 6.986364
## 2 1918 7.528571
## 3 1919 7.040000
## 4 1920 6.945455
## 5 1921 6.944444
## 6 1922 7.018750
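The code that builds this yearly series is not shown; a minimal sketch of how it could be constructed is given below, assuming the raw NOAA extract has been read into a data frame eq with one row per earthquake and columns Year and Mag (names borrowed from the clipboard-reading code at the end of this report), and that dplyr is available.
library(dplyr)
# Average magnitude per year (sketch; object and column names are assumptions)
eq_2 <- eq %>%
  group_by(Year) %>%
  summarise(Mag = mean(Mag, na.rm = TRUE))
head(eq_2)   # prints the first few rows, as shown above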
plot(eq_2$Mag~eq_2$Year,type="l", main = "Earthquake Magnitude from 1917 to 2017", xlab = "Years", ylab ="Magnitude")
This plot shows how the average earthquake magnitude around the world fluctuates over time. Just by looking at the data, there appears to be a slight decreasing trend. However, modeling the data is the only way to know whether the trend is decreasing or stable. We first assume the mean is constant and later test whether that assumption holds.
We begin with a stationary ARMA(p,q) model under the null hypothesis that there is no trend, i.e. that average earthquake magnitude across the globe has not changed over the last 100 years. (To the naked eye the series appears to decrease slightly; this is tested later.) The stationary Gaussian ARMA(p,q) model is \[\phi(B)(Y_{n}-\mu)=\psi(B)\epsilon_{n}\] where \[\mu=E[Y_n]\] \[\psi(x)=1+\psi_1x+\psi_2x^2+\dots+\psi_qx^q\] \[\phi(x)=1-\phi_1x-\phi_2x^2-\dots-\phi_px^p\] and \(\epsilon_n \sim N[0,\sigma^2]\) is Gaussian white noise. To decide which values of p and q to use, an Akaike Information Criterion (AIC) table is computed, where \[AIC = -2\,\ell(\theta)+2D\] and D is the number of parameters.
eq_mag <- eq_2$Mag    # yearly average magnitude
eq_year <- eq_2$Year  # corresponding years
# AIC table for ARMA(p,q) models with p = 0..P and q = 0..Q, fit by maximum likelihood
aic_table <- function(data, P, Q){
  table <- matrix(NA, (P+1), (Q+1))
  for(p in 0:P) {
    for(q in 0:Q) {
      table[p+1, q+1] <- arima(data, order = c(p,0,q), method = 'ML')$aic
    }
  }
  dimnames(table) <- list(paste("<b> AR", 0:P, "</b>", sep=""), paste("MA", 0:Q, sep=""))
  table
}
weather_aic_table <- aic_table(eq_mag, 4, 5)
## Warning in arima(data, order = c(p, 0, q), method = "ML"): possible
## convergence problem: optim gave code = 1
## (the same warning is repeated for several other (p, q) combinations)
require(knitr)
## Loading required package: knitr
kable(weather_aic_table,digits=2)
| | MA0 | MA1 | MA2 | MA3 | MA4 | MA5 |
|---|---|---|---|---|---|---|
| AR0 | 118.93 | 79.15 | 56.69 | 48.53 | 43.83 | 38.80 |
| AR1 | 36.34 | 19.09 | 20.73 | 22.38 | 24.37 | 26.36 |
| AR2 | 21.14 | 20.64 | 22.55 | 24.56 | 24.95 | 26.78 |
| AR3 | 21.34 | 22.34 | 23.42 | 23.94 | 26.19 | 28.55 |
| AR4 | 22.95 | 23.91 | 24.27 | 26.34 | 28.66 | 29.37 |
From the AIC table, ARMA(1,1), ARMA(2,1), and ARMA(1,2) look like reasonable models to assess. None of them is large enough for overfitting to be a major concern.
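The calls that produced the three fits summarized below are not shown in the report; a minimal sketch of what they presumably look like is given here, with the object names arma21, arma12, and arma11_0 taken from the root calculations that follow.
# Fit the three candidate models (sketch; object names match the later root calculations)
arma21   <- arima(eq_mag, order = c(2, 0, 1))
arma12   <- arima(eq_mag, order = c(1, 0, 2))
arma11_0 <- arima(eq_mag, order = c(1, 0, 1))   # "_0" marks the no-trend ARMA(1,1) fit
arma21; arma12; arma11_0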
##
## Call:
## arima(x = eq_mag, order = c(2, 0, 1))
##
## Coefficients:
## ar1 ar2 ma1 intercept
## 0.8001 0.1408 -0.4177 6.5496
## s.e. 0.2360 0.2057 0.2264 0.2158
##
## sigma^2 estimated as 0.06418: log likelihood = -5.32, aic = 20.64
##
## Call:
## arima(x = eq_mag, order = c(1, 0, 2))
##
## Coefficients:
## ar1 ma1 ma2 intercept
## 0.9491 -0.5612 0.0646 6.5491
## s.e. 0.0396 0.1069 0.1059 0.2165
##
## sigma^2 estimated as 0.06424: log likelihood = -5.37, aic = 20.73
##
## Call:
## arima(x = eq_mag, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## 0.9574 -0.5442 6.5511
## s.e. 0.0331 0.0966 0.2284
##
## sigma^2 estimated as 0.06447: log likelihood = -5.55, aic = 19.09
Looking at the models individually, all of the coefficient estimates are larger in magnitude than their standard errors. However, the roots computed below lie inside the unit circle, which might be an issue later with stability. ARMA(1,1) appears to perform best, with the lowest AIC of the three, small standard errors, and coefficient estimates well above them. We will therefore analyze the data set using ARMA(1,1).
# Modulus of the roots of 1 - c1*x - c2*x^2 - ..., built from all fitted coefficients of each model
arma21_roots <- polyroot(c(1,-coef(arma21)))
abs(arma21_roots)
## [1] 0.6435125 0.6734758 0.6435125 0.5474557
arma12_roots <- polyroot(c(1,-coef(arma12)))
abs(arma12_roots)
## [1] 0.6049179 0.7480889 0.6049179 0.5577909
arma11_roots <- polyroot(c(1,-coef(arma11_0)))
abs(arma11_roots)
## [1] 0.4679272 0.5711566 0.5711566
All of the models have roots inside the unit circle, which is a concern for the stability of the model. However, picking a model whose roots fall inside the unit circle sometimes cannot be avoided, so we will stick with the ARMA(1,1) model for now.
Since the ARMA(1,1) model was chosen above, we now test an ARMA(1,1) model with trend against the model with no trend. Here is the model with a linear trend:
arma11_1 <- arima(eq_mag, c(1,0,1),xreg = eq_year)
arma11_1
##
## Call:
## arima(x = eq_mag, order = c(1, 0, 1), xreg = eq_year)
##
## Coefficients:
## ar1 ma1 intercept eq_year
## 0.7543 -0.4185 28.1751 -0.0110
## s.e. 0.1146 0.1464 3.6220 0.0018
##
## sigma^2 estimated as 0.05909: log likelihood = -0.6, aic = 11.2
The model above is \[(1-\phi B)(Y_n-\mu-\beta t_n)=\psi(B)\epsilon_n\] where \(\epsilon_n\) is Gaussian white noise, iid \(N[0,\sigma^2]\). The null hypothesis is \[H_0: \beta=0\] and the alternative hypothesis is \[H_1: \beta\neq0.\]
# Hypothesis test: likelihood ratio test for the trend coefficient
# LRT statistic = 2 * (log likelihood with trend - log likelihood without trend)
#               = 2 * (-0.60 - (-5.55))
LRT <- 2*(-.6+5.55)
1-pchisq(LRT,1)   # p-value from the chi-squared distribution with 1 degree of freedom
## [1] 0.001652788
We use the likelihood ratio test to decide whether it is appropriate to model the data with a trend. At the \(\alpha=.05\) level, the null hypothesis that \(\beta=0\) is rejected with a p-value of .00165, so a model with trend is the better way to model the data.
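The same statistic can also be computed directly from the fitted model objects instead of the rounded log likelihoods printed above; a minimal sketch, using the trend fit arma11_1 and assuming the no-trend ARMA(1,1) fit is stored as arma11_0 (the object name used in the root calculation earlier):
# Likelihood ratio test computed from the model objects (sketch)
LRT <- as.numeric(2 * (logLik(arma11_1) - logLik(arma11_0)))
1 - pchisq(LRT, df = 1)   # nested models differing by the single parameter beta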
# Same AIC table, but for ARIMA(p,1,q) models, i.e. ARMA(p,q) fit to the once-differenced data
aic_table <- function(data, P, Q){
  table <- matrix(NA, (P+1), (Q+1))
  for(p in 0:P) {
    for(q in 0:Q) {
      table[p+1, q+1] <- arima(data, order = c(p,1,q), method = 'ML')$aic
    }
  }
  dimnames(table) <- list(paste("<b> AR", 0:P, "</b>", sep=""), paste("MA", 0:Q, sep=""))
  table
}
weather_aic_table <- aic_table(eq_mag, 4, 5)
## Warning in arima(data, order = c(p, 1, q), method = "ML"): possible
## convergence problem: optim gave code = 1
## (the same warning is repeated for several other (p, q) combinations)
require(knitr)
kable(weather_aic_table,digits=2)
| | MA0 | MA1 | MA2 | MA3 | MA4 | MA5 |
|---|---|---|---|---|---|---|
| AR0 | 44.14 | 15.84 | 17.73 | 18.85 | 20.58 | 22.28 |
| AR1 | 20.59 | 17.68 | 19.52 | 19.64 | 21.61 | 23.54 |
| AR2 | 19.46 | 17.82 | 19.60 | 21.60 | 22.47 | 24.29 |
| AR3 | 20.57 | 19.59 | 19.94 | 21.93 | 17.63 | 27.44 |
| AR4 | 21.38 | 21.59 | 21.94 | 23.94 | 19.72 | 25.53 |
arima111 <- arima(eq_mag, c(1,1,1))   # ARIMA(1,1,1): ARMA(1,1) fit to the differenced series
arima111
##
## Call:
## arima(x = eq_mag, order = c(1, 1, 1))
##
## Coefficients:
## ar1 ma1
## -0.0858 -0.5133
## s.e. 0.2132 0.1944
##
## sigma^2 estimated as 0.06554: log likelihood = -5.84, aic = 17.68
Because the fitted model has a decreasing trend, it is worth considering a different class of models. The transformed model works with the differenced data \(y_n-y_{n-1}\), written as \[\phi(B)\left((1-B)^dY_n-\mu\right)=\psi(B)\epsilon_n\] with \(d=1\), where \(\epsilon_n\) is Gaussian white noise. The reason for using this transformation is to make the data look more stationary.
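A quick visual check of the differenced series, not part of the original code, can help judge whether differencing makes the data look more stationary; a minimal sketch:
# Plot the once-differenced series y_n - y_{n-1} (sketch)
plot(diff(eq_mag), type = "l",
     xlab = "Year index", ylab = "Differenced magnitude",
     main = "First Difference of Yearly Average Magnitude")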
# Unsmoothed and smoothed periodograms of the yearly average magnitude series
spectrum(eq_mag, main = "Unsmoothed Periodogram of Earthquake Magnitude")
smoothed <- spectrum(eq_mag, spans = c(3,3), main = "Smoothed Periodogram of Earthquake Magnitude")
highest_freq <- smoothed$freq[which.max(smoothed$spec)]   # frequency (cycles per year) of the largest peak
highest_freq
## [1] 0.0462963
1/highest_freq   # corresponding period in years
## [1] 21.6
The smoothed periodogram frequency is in cycles per data point (year). The highest peak is at a frequency of about 0.046, which corresponds to a period of about 21.6 years and appears to be significant. None of the other peaks seem to be significant, judging by the confidence interval bar on the right hand side of the plot.
Looking at the ACF (autocorrelation function) of the residuals from the ARMA(1,1) model with trend, there do not appear to be any major issues with autocorrelation. Lag 10 falls slightly outside the 95% band within which Gaussian white noise would be expected, but this is not concerning. The residuals themselves show a lot of variability, with peaks roughly every 20 years. The scatter plot of the residuals against the fitted values shows no clear pattern, which suggests that a further log transformation of the data is not necessary.
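The diagnostic plots described above are not shown in the code; a minimal sketch of how they could be produced from the trend model arma11_1:
# Residual diagnostics for the ARMA(1,1)-with-trend fit (sketch)
res <- resid(arma11_1)
acf(res, main = "ACF of Residuals")                  # autocorrelation of the residuals
plot(eq_year, res, type = "l",
     xlab = "Year", ylab = "Residual",
     main = "Residuals over Time")
plot(eq_mag - res, res,
     xlab = "Fitted value", ylab = "Residual",
     main = "Residuals vs Fitted Values")            # fitted value = observed - residual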
qqnorm(eq_2$Mag)       # QQ plot of the yearly average magnitudes
qqline(eq_2$Mag)
shapiro.test(eq_2$Mag) # Shapiro-Wilk test of normality
##
## Shapiro-Wilk normality test
##
## data: eq_2$Mag
## W = 0.9828, p-value = 0.2129
It is also important to check that the magnitudes are in fact normally distributed, which we first assess with a qqplot. The qqplot suggests the data are approximately normal, although with fairly long tails at each end. To further test normality we use the Shapiro-Wilk test, with \(H_0:\) the data are normally distributed versus \(H_1:\) the data are not normally distributed. With a p-value of .2129 we cannot reject the null hypothesis, so we conclude that the data are consistent with a normal distribution.
library(forecast)                            # provides the forecast() function
arma_forecast <- forecast(arma11_0, h = 5)   # 5-year-ahead forecast from the no-trend ARMA(1,1) fit
plot(arma_forecast, xlab = "Time since 1917", ylab = "Magnitude")
This is a 5-year forecast produced with the forecast() function from the forecast package. The analysis suggests that the average magnitude will decrease slightly and then stay roughly constant over the next 5 years.
For further analysis, it would be a good idea to take a look at the time series of the count of earthquakes during the same years.
# Read the raw earthquake data as tab-separated text from the clipboard
read.excel <- function(header = TRUE, ...) {
  read.table("clipboard", sep = "\t", header = header, ...)
}
eq <- read.excel()
eq_1 <- as.data.frame(eq)
library(dplyr)   # for %>%, group_by(), summarise()
# Count of earthquakes per year (note: this overwrites the earlier eq_2 of yearly averages)
eq_2 <- eq_1 %>% group_by(Year) %>% summarise(count = n())
plot(eq_2$count~eq_2$Year,type="l", ylab="Count of Earthquakes", xlab="Year", main="Time Series Plot of the Count of Earthquakes")
There is a dramatic increase in the number of earthquakes after 2001-2002. However, the average magnitude after approximately 2001-2002 drops to the lowest point it reaches in the past 100 years. An interesting next step would be a joint time series analysis of how earthquake frequency around the globe relates to earthquake magnitude. One hypothesis to examine with this added information is whether increased earthquake frequency decreases earthquake magnitude. Earthquake magnitude is a measure of intensity; perhaps having a multitude of small earthquakes allows the tension in the earth's tectonic plates to be released gradually, whereas fewer earthquakes allow more tension to build up, resulting in fewer but more intense earthquakes.
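As a rough first step toward such a joint analysis, one could look at the cross-correlation between the yearly counts and the yearly average magnitudes; a minimal sketch, assuming the yearly average series is still available as eq_mag and that the two series cover the same, aligned set of years:
# Cross-correlation between yearly earthquake counts and yearly average magnitude (sketch)
ccf(eq_2$count, eq_mag,
    main = "Cross-correlation: Yearly Count vs Average Magnitude")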
Another hypothesis is that the instruments used to detect earthquakes have improved considerably, so they now detect very small earthquakes that previously went unrecorded.