Introduction

In this project, I would like to explore the time series pattern of the thickness of the total Ozone in the Earth. The data is obtained from Earth System Research Laboratory. The data is collected by Dobson Ozone Spectrophotometer at various locations. I selected the observation site that closest to the pole area in order to study the Ozone thickness trend that are not directly influenced by local air pollution. The South Pole station is located at Latitude -89.98 and Longitude -24.80. Due to the severe weather condition in the South Pole, there are some missing data.

The questions we would like to answer in this project include as follow:

Data manipulation

  1. Read in the daily Ozone thickness data from 1963 to 2015.
x <- read.table(file="amsTOTo3.txt",sep=',',header=TRUE)
t = x$YYYY+x$MM/12+x$DD/365
plot(x$Total_Ozone~t,type='l', xlab='Year', ylab='Total Ozone')

By just examining the time series plot, we can find several interesting features:

First let us reduce the daily data to monthly averaged data.

N_year = length(unique((x$YYYY)))
tmp = rep(unique(x$YYYY),12)
dim(tmp) = c(N_year,12)
Year = as.vector(t(tmp))
Month = rep(1:12, N_year)
ozone_ave = rep(0, N_year*12)
for (i in 1:N_year)
{
  for (j in 1:12)
  {
    y = unique(x$YYYY)[i]
    m = j
    ozone_ave[12*(i-1)+j] = mean(x[(x$YYYY==y & x$MM==m), ]$Total_Ozone)
  }
}
data = data.frame(Year, Month, ozone_ave)
data$time = data$Year+data$Month/12
plot(data$ozone_ave~data$time,type='l', xlab='Year', ylab='Monthly Average Ozone')

It should be noticed that, although we have reduced data to monthly average, there are still some missing data, especially in Sep. when it is winter in the South Pole.

Finding period

I would like to examine all possible periods that appear in this Ozone time series data.

  1. The auto-correlation function for each lag is calculated and plotted below.
abc = data$ozone_ave
abc[abc=='NaN']=0
acf(abc, main='Ozone auto-correlation function')

From the auto-correlation function plot, we can see that the ACF peaks specifically at 12 and 24 month lag. This suggests that there exists a apparent seasonal variation.

  1. Spectrum density plot
spectrum(abc, spans=c(3,5,3))
abline(v=1/3.,lty=2)
abline(v=1/6.,lty=2)
abline(v=1/12.,lty=2)
abline(v=1/24.,lty=2)

The above spectrum density plot demonstrates this seasonal variation clearer. The unit of the x-axis is per month. The dashed lines from left to right represent the periods of 24, 12, 6, and 3 months. We can see that this seasonal variation is so strong that we have to take great care of it in order to draw reasonable conclusions.

Seasonal features

I will fit this time series with ARMA models with seasonality. Also there exists 3 and 6 months periods in the spectrum, they can be considered as the harmonic of the 12 months period. The general \(\rm SARMA(p,q)\times(P,Q)_{12}\) model is give by \(\phi(B)\Phi(B^{12})(X_n-\mu)=\psi(B)\Psi(B^{12})\epsilon_n\). For simplicity, I try \(p=q=P=Q=1\) to fit this SARMA model.

sarma = arima(data$ozone_ave, order=c(1,0,1), seasonal = list(order=c(1,0,1), period=12))
sarma
## 
## Call:
## arima(x = data$ozone_ave, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 
##     1), period = 12))
## 
## Coefficients:
##          ar1     ma1    sar1     sma1  intercept
##       0.5065  0.0792  0.9817  -0.7632   262.9043
## s.e.  0.0731  0.0878  0.0069   0.0295    14.5909
## 
## sigma^2 estimated as 483.7:  log likelihood = -2185.79,  aic = 4383.59

From the result above, the parameters for both SAR1 and SMA1 is significant. The confidence interval of these two parameters lies well outside the 0. This indicates that the seasonal variation is apparent. We can also see that the parameter of MA1 has mean 0.0792 and standard deviation 0.0878. This means that the MA1 model may not be necessary. I then let \(q=0\) and refit the data.

sarma = arima(data$ozone_ave, order=c(1,0,0), seasonal = list(order=c(1,0,1), period=12))
sarma
## 
## Call:
## arima(x = data$ozone_ave, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 
##     1), period = 12))
## 
## Coefficients:
##          ar1    sar1     sma1  intercept
##       0.5594  0.9822  -0.7638   262.8162
## s.e.  0.0383  0.0067   0.0294    15.3795
## 
## sigma^2 estimated as 485.9:  log likelihood = -2186.21,  aic = 4382.43
acf(resid(sarma), na.action = na.pass)

The AIC value of \(q=0\) model is 4382.43, lower than 4383.59 for \(q=1\) model. This AIC value together with the low value of the MA1 parameter in \(q=1\) model may suggest that the MA1 model is not needed, and AR(1) model is good for the fit. By checking the auto-correlation function of the residual of the fit, we find that there is no significant peaks appears in the plot and the ACF value is under the \(1/\sqrt{N}\) CI. This suggests that the residual is basically white noise and the AR(1) model with seasonality is a good representative of this time series data.

Trend

  1. Data smoothing

I extract the Ozone thickness data for each Jan. data from 1963 to 2015. I use loess smoothing to fit the trend of the data. This smoothing is a non-parametric method and is good for first check of the data.

Jan = data[data$Month==1,]
ozone_smooth <- loess(Jan$ozone_ave~Jan$Year,span=0.5)
plot(Jan$Year, Jan$ozone_ave,type = 'l',col='red')
lines(ozone_smooth$x, ozone_smooth$fitted,type='l')

The Jan. Ozone data together with the smoothed curve suggests that there is a decrease of Ozone thickness from 1980 to 2000, and a slightly rise from 2000 to 2015. This is consistent with that was reported by news.

  1. Model selection with and without trend by AIC table

I fit the Jan. data with ARMA(p,q) model with and without trend. I also extract the AIC value for each p and q from 0 to 4. The result is listed below.

aic_table <- function(data,P,Q,xreg=NULL){
  table <- matrix(NA,(P+1),(Q+1))
  for(p in 0:P) {
    for(q in 0:Q) {
       table[p+1,q+1] <- arima(data,order=c(p,0,q),xreg=xreg)$aic
    }
  }
  dimnames(table) <- list(paste("<b> AR",0:P, "</b>", sep=""),paste("MA",0:Q,sep=""))
  table
}

e_aic_table <- aic_table(Jan$ozone_ave,4,4)
## Warning in arima(data, order = c(p, 0, q), xreg = xreg): possible
## convergence problem: optim gave code = 1

## Warning in arima(data, order = c(p, 0, q), xreg = xreg): possible
## convergence problem: optim gave code = 1

## Warning in arima(data, order = c(p, 0, q), xreg = xreg): possible
## convergence problem: optim gave code = 1
require(knitr)
## Loading required package: knitr
kable(e_aic_table,digits=2)
MA0 MA1 MA2 MA3 MA4
AR0 436.72 420.85 404.18 403.07 401.89
AR1 398.09 385.17 383.23 385.52 385.60
AR2 384.21 383.93 383.83 387.09 387.55
AR3 383.51 385.51 385.63 386.69 388.78
AR4 385.51 385.79 387.63 389.34 392.00
e_aic_table <- aic_table(Jan$ozone_ave,4,4,xreg=Jan$Year)
require(knitr)
kable(e_aic_table,digits=2)
MA0 MA1 MA2 MA3 MA4
AR0 404.31 398.79 388.97 389.44 390.74
AR1 391.89 384.69 382.76 384.59 383.86
AR2 382.80 383.43 380.31 385.07 387.06
AR3 382.96 383.55 385.03 387.00 388.94
AR4 384.87 385.05 387.02 384.67 384.61

From the AIC table above, we prefer the ARMA(1,2) model in the cases both with and without trend. Also from the AIC value, we can see that the model with trend has better fit to the data. The slope of the linear trend is -0.59+-0.29. First, this CI does not contain 0, which means that the trend is significant. Second, the slope has negative value and it suggests that there exists a general decrease of the Ozone thickness.

Hypothesis test

In the above section, I used the AIC value to distinguish between models with and without trend, and concluded that the trend is apparent. Here I would like to use the likelihood ratio test to make inference again, since these two models are nested. Again, we fit the time series data with ARMA(1,2) model with and without trend, and get the log likelihood.

fit0 = arima(Jan$ozone_ave, order=c(1,0,2))
fit0
## 
## Call:
## arima(x = Jan$ozone_ave, order = c(1, 0, 2))
## 
## Coefficients:
##          ar1      ma1     ma2  intercept
##       0.9438  -0.7398  0.2695   287.3202
## s.e.  0.0434   0.1511  0.1271     9.4996
## 
## sigma^2 estimated as 85.19:  log likelihood = -186.61,  aic = 383.23
fit1 = arima(Jan$ozone_ave, order=c(1,0,2), xreg = Jan$Year)
fit1
## 
## Call:
## arima(x = Jan$ozone_ave, order = c(1, 0, 2), xreg = Jan$Year)
## 
## Coefficients:
##          ar1      ma1     ma2  intercept  Jan$Year
##       0.8749  -0.6992  0.2667  1466.9420   -0.5934
## s.e.  0.0873   0.1687  0.1240   576.9052    0.2901
## 
## sigma^2 estimated as 82.26:  log likelihood = -185.38,  aic = 382.76

\(D=2(\log{L_{alt}}-\log{L_0})=2.46\). The p-value of the \(\chi^2_1\) in one degree of freedom is 0.12. If we use \(\alpha=0.05\) as the statistical significance, we cannot reject the null hypothesis that the time series has no trend. This may possibly due to the fact that there exist a rise from 2000 to 2015, so that the trend is not linearly dependent on the time. What we can do in the future is to fit the trend with higher order polynomials rather than single linear term.

Conclusions