Atmospheric CO2 Level Trends and Seasonality [1958-2022]

Introduction

Understanding the impact of atmospheric concentration of carbon dioxide is of great importance in the ongoing effects to address global climate change. Carbon dioxide (CO₂) is generated by human activity (e.g. burning of fossil fuels) and natural processes (e.g. volcanic eruptions) and its level impacts the amount of heat that is “trapped” in the atmosphere.₁ CO₂ levels have increased 49% since the beginning of the industrial revolution₁ and are higher than they have been at any point of the last 400,000 years.₂ Research by the National Oceanic and Atmospheric Administration (NOAA) shows a very clear link between temperature change and atmospheric CO₂ level over time.₃ When the levels of CO₂ continue to climb, so does the global temperature change. This is supported by concurrent research that shows an corresponding and continual rise in oceanic surface temperature.₄

Through this investigation we seek to corroborate the reported trend of atmospheric CO₂ level, understand the underlying structure such as seasonality, and use modeling in both the time and frequency domains to quantify the associated trends and cycles of atmospheric CO₂ levels. This information will provide scientists and policy makers critical information about which policies, and the timing by which those policies, can be most impactful.

Data

Atmospheric CO₂ levels are measured each hour at the NASA Mauna Loa Observatory (Hawaii). This data is publicly available via the NOAA Global Monitoring Laboratory and by FTP transfer from the NASA Global Climate Change Website._1,5 Atmospheric CO₂ level data through 2020 from the Mauna Loa Observatory (as of the writing of this report) were also available through the NOAA Global Monitoring Laboratory Data Finder.₆

Our variables of interest from the NOAA CO₂ data set include Year, Month, Date (decimal), Monthly Average, and Deseasonalized Monthly Average. The NOAA dataset also includes variables for Number of Days, Standard Deviation of Number of Days, and Uncertainty Estimate of Monthly Average which are maintained and described in the linked dataset but are not utilized in this analysis.

## 'data.frame':    767 obs. of  8 variables:
##  $ Year             : int  1958 1958 1958 1958 1958 1958 1958 1958 1958 1958 ...
##  $ Month            : int  3 4 5 6 7 8 9 10 11 12 ...
##  $ Date             : num  1958 1958 1958 1958 1959 ...
##  $ MonthAvg         : num  316 317 318 317 316 ...
##  $ Deseason_MonthAvg: num  314 315 315 315 315 ...
##  $ NumDay           : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ sdDay            : num  -9.99 -9.99 -9.99 -9.99 -9.99 -9.99 -9.99 -9.99 -9.99 -9.99 ...
##  $ Uncert_MonthAvg  : num  -0.99 -0.99 -0.99 -0.99 -0.99 -0.99 -0.99 -0.99 -0.99 -0.99 ...

Variables of Interest

Year / Month (Year, Month) The Year and Month variables contain the calendar year and month respectively in which the CO₂ data was collected for the monthly average. The dataset contains information from 1958 to 2022, for a total of 767 observations.

Decimal Date (Date) The Date variable is a transformation of the Year/Month variables to a continuous decimal date. For example, March 1958 is converted to 1958.2027 in the Date variable. This makes plotting and analysis easier in statistical software.

Monthly Average (MonthAvg) Monthly Average is the variable of greatest interest, as it is the level of atmospheric CO₂ recorded in parts per million (ppm). Per the NOAA, monthly average values are computed from daily mean values. The monthly averages range from 312.43 to 419.13, with a mean of 356.87 \(\pm\) 29.94 and a median of 353.86.

Deseasonalized Monthly Average (Deseason_MonthAvg) NOAA scientists have previously determined a seasonal cycle for CO₂ levels, which is used to center the monthly averages average seasonal cycle. In this analysis we will conduct our own review of seasonality/cycle, but this variable has been retained for comparison purposes. The monthly deseasonalized averages range from 314.43 to 417.87, with a mean of 356.8691656 \(\pm\) 29.8907465 and a median of 353.84.

Missing Data

Per the data description provided by the NOAA, missing months have been interpolated and are indicated by negative “Standard Deviation of Number of Days” and “Uncertainty Estimate of Monthly Average” variable values. Given this imputation, there is no additional treatment needed in this analysis for missing values.

Unknown values are represented as negative numbers throughout the dataset.

Statistical Software and Packages

The analysis described in this report is conducted in R (Vienna, Austria)₇. Packages used for the analysis and report are ggplot2, pander, knitr, readr, dplyr, readxl, tidyverse, astsa, aTSA, and forecast.

Exploratory Data Analysis

An exploratory data analysis was conducted to understand the apparent patterns and structure of the data distribution, as well as inform the methods best suited to those characteristics.

Timeplot

A timeplot of the monthly atmospheric CO₂ level is shown in Figure 1. A quadratic linear model was fitted using the data and is shown overlying the timeplot in red:

\[ CO_2 = \beta_0 + \beta_1 * Date + \beta_2 * Date^2\]

Figure 1.

An alternative timeplot (Figure 2) provides the standard timeplot, but with a loess fitted line shown in red. This smoother further demonstrates a trend expected in the atmospheric CO₂ data.

Figure 2.

The both timeplots reveal several important characteristics about the CO₂ data: (1) there appears to be a clear trend in the data as CO₂ levels increase over time but that variance appears to be stable, (2) there is strong evidence of seasonality that will need to be addressed in the methods selected/analysis, and (3) a quadratic linear model is a viable candidate to describe the change in atmospheric CO₂ levels over the last 60 years.

Stationarity and Seasonality

Stationarity is an important computational and theoretical assumption in any time series analysis._8,9 As noted on the timeplots in Figures 1 and 2, we anticipate the atmospheric CO₂ levels are not well modeled as mean stationary - there is an apparent upward trend. The timeplots also reveal strong evidence of seasonality in atmospheric CO₂ levels. This makes sense intuitively as the use of fossil fuels will vary at different times of the year (with weather changes, increased travel periods, manufacturing cycles, etc.) and the cycles of natural CO₂ producing system (e.g. volcanoes and photosynthesis).

The auto-correlation (ACF) plot in Figure 3 confirms these observations. The sample ACF is not consistent with a model having independent error terms, so we reject a null hypothesis that the residuals of the quadratic linear model are iid.

Figure 3.

Detrending

To appropriately model the CO₂ atmospheric levels, given the notable non-stationarity, we must first detrend the data. This is achieved by considering differenced data, rather than the raw data.₈ Specifically, we transform data \(y^{*}_{1:N}\ to \ z_{2:N}\),

\[Z_n = \Delta y_n^{*} = y_n^{*} - y^{*}_{n-1}\]

We perform an augmented Dickey-Fuller test (ADF) to formally test for stationarity. The null hypothesis of ADF test is the time series is not stationary and the alternative hypothesis is stationary. From ADF test, we got a p-value which is smaller than 0.01, thus, we can reject the null hypothesis and accept the alternative hypothesis, the differenced series is stationary and it can be used to conduct an ARMA model.

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag   ADF p.value
## [1,]   0 -11.5    0.01
## [2,]   1 -18.1    0.01
## [3,]   2 -20.6    0.01
## [4,]   3 -17.2    0.01
## [5,]   4 -16.4    0.01
## [6,]   5 -20.6    0.01
## [7,]   6 -28.7    0.01
## Type 2: with drift no trend 
##      lag   ADF p.value
## [1,]   0 -11.6    0.01
## [2,]   1 -18.2    0.01
## [3,]   2 -20.9    0.01
## [4,]   3 -17.5    0.01
## [5,]   4 -16.9    0.01
## [6,]   5 -21.6    0.01
## [7,]   6 -31.9    0.01
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -11.6    0.01
## [2,]   1 -18.2    0.01
## [3,]   2 -21.0    0.01
## [4,]   3 -17.6    0.01
## [5,]   4 -16.9    0.01
## [6,]   5 -21.7    0.01
## [7,]   6 -32.3    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

The updated time series plot for the differenced data (Figure 4) shows that the data becomes more stationary. It can also be verified that the slope of a linear regression is almost horizontal, shown in red on Figure 4.

Figure 4.

Frequency Domain Analysis

With strong evidence for seasonality/cycles within the atmospheric CO₂ data, we begin with Frequency Domain Analysis, which will allow us to quantify those cycles. Understanding the cycles, again, is useful in implementing policies and initiatives at the right time to be most effective, but also in overall model fitting.

Spectral Analysis

We can quantify the period of the atmospheric CO₂ level cycle by investigating its periodogram. The highest peak on the periodogram corresponds with the frequency we use to calculate the cycle.₁₀

\[ Cycle = \frac{1}{frequency} \] Figure 5 provides a periodogram smoothed with span (10,10).

Figure 5.

We can see that the spectrum have a peak at around 0.08, which corresponds to period of 12 months.

Using the calculated frequency, we can decompose the atmospheric CO₂ levels into its components - trend, noise, cycle - as shown in Figure 6.

Figure 6

The decomposition in Figure 6 further supports the trend we found in exploratory data analysis and the cycle calculated from the smoothed periodogram.

Model Fitting

With appropriately detrended data, we can begin to fit a time series model to the atmospheric CO₂ level data. Auto-regressive Moving Averages (ARMA) modeling is common in time series analysis so is investigated, though Seasonal Auto-regressive Moving Averages (SARMA) modeling is anticipated to be most appropriate given the evidence of seasonality observed in exploratory data analysis.

Fitting ARMA model

The CO₂ data is fit to an ARMA(p,q) model with the following form₈:

\[Y_{n}=\mu+\phi_{1}\left(Y_{n-1}-\mu\right)+\cdots+\phi_{p}\left(Y_{n-p}-\mu\right)+\varepsilon_{n}+\psi_{1} \varepsilon_{n-1}+\cdots+\psi_{q} \varepsilon_{n-q},\] where the error terms are Gaussian white noise, \(\varepsilon_n \sim N\left(0, \sigma^{2}\right)\). The parameters \((\phi_{1:p})\) is the coefficients for the autoregressive part of the model, \(\psi_{1:p}\) is the coefficients for the moving average part of the model, and \(\mu\) is the population mean.

We select the best (p,q) using the Akaike’s information criterion (AIC)₁₁ which is defined as \[AIC = -2 \times \ell(\theta) + 2D\]

AIC is an approach to minimize prediction error and mitigate overfitting of models. Using AIC criterion, we select the model with the lowest AIC value, or a competitively low AIC value when less complex models can offer similar results.₁₁

Table 1. AIC values for ARMA(p,q) models of the CO₂ data.

## Warning in log(s2): NaNs produced

Based on the table above, we observe that two competetively low AIC values occur for ARMA(3,3) with an AIC value of 579.98 and ARMA (3,4) with and AIC value of 496.92. With similar AIC values, we favor the slightly less complex ARMA(3,3) model.

The ARMA(3,3) model coefficients, standard errors, and other characteristic statistics are shown below:

## Series: CO2new$diff 
## ARIMA(3,0,3) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ma1     ma2     ma3    mean
##       1.4949  -1.4340  0.4948  -0.5866  0.8259  0.1242  0.1356
## s.e.  0.0636   0.0755  0.0591   0.0645  0.0349  0.0452  0.0774
## 
## sigma^2 = 0.4941:  log likelihood = -816.19
## AIC=1648.39   AICc=1648.58   BIC=1685.53

We next use the AR and MA polynomial roots (\(\phi\) and \(\psi\), respectively) to investigate the causality and invertibility of the ARMA(3,3) model. Causality is an important feature of a model that will be used for forecasting - it requires that future values can be determined only using previously observed values. Invertibility is also a critical feature of a good ARMA model, as it suggests the stability of that model. Both are investigated, again, by reviewing the polynomial roots of the model coefficients. Inverse AR roots outside the unit circle suggest causality and inverse MA roots outside the unit circle suggest invertibility.₁₂

Figure 7. AR and MA polynomial roots for ARMA(3,3) model

The three dots in the left hand plot correspond to the roots of the AR polynomials \((\phi)\), while the three dots in the right hand plot corresponds to the root of MA polynomials \((\psi)\). The AR roots and MA roots are close to the border of unit circle. It suggests that the fitted model might not be causal and is at the threshold of non-invertibility. This suggests that the ARMA(3,3) model may become unstable and might not be good for forecasting.

Thus, we fit the slightly more complex ARMA(3,4) model which had the lowest AIC value. But, again we find that the model is potentially unstable by review of the inverse polynomial roots. This suggests the added complexity does not provide notable value toward the causality and invertibility.

## Series: CO2new$diff 
## ARIMA(3,0,4) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2      ar3      ma1      ma2     ma3     ma4    mean
##       1.5194  -0.6679  -0.1660  -1.1672  -0.0319  0.1185  0.2662  0.1327
## s.e.  0.1312   0.2246   0.1292   0.1228   0.1902  0.0809  0.0455  0.0120
## 
## sigma^2 = 0.3197:  log likelihood = -650.18
## AIC=1318.36   AICc=1318.6   BIC=1360.14

Figure 8.

Finally, we fit the ARMA (3,2) that is simpler and have the third lowest AIC values among the candidate models to determine if we can achieve a more stable model. We observe that roots of AR polynomials and MA polynomials are further inside the unit circle.

## Series: CO2new$diff 
## ARIMA(3,0,2) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2      ar3      ma1      ma2    mean
##       0.8472  0.2376  -0.5998  -0.2282  -0.5969  0.1334
## s.e.  0.2705  0.4152   0.2263   0.2795   0.2517  0.0081
## 
## sigma^2 = 0.4247:  log likelihood = -758.35
## AIC=1530.7   AICc=1530.84   BIC=1563.19

Figure 9.

Wilk’s approximation

As described in the above ARMA model fitting section, we have 3 candidate ARMA models - ARMA(3,2), ARMA(3,3), and ARMA(3,4) - without a clear indication about which model is preferred based on causality or invertibility. We use Wilk’s approximation to determine whether these models are significantly different from one another. If the models are not significantly differnt, we would choose ARMA(3,2) as it is a simplest model. The null and alternate hypotheses under Wilk’s approximation are:

\[H_{0} \ \ : \theta \in \Theta_0 = (\phi_{1:3},\psi_{1:2} ,\mu, \sigma^2)\] \[H_{1} \ \ : \theta \in \Theta_1 = (\phi_{1:3},\psi_{1:3} ,\mu, \sigma^2)\]

Under the null hypothesis, the Wilk’s approximation is the following₁₁,

\[\Lambda = (\mathcal{l}^{(1)} - \mathcal{l}^{(0)}) \approx \frac{1}{2}\chi^2_{D^{(1)}-D^{(0)}}, \ \ \ \ \]

where \(D^{(0)}\) and \(D^{(1)}\) are the number of estimated parameters for null and alternative hypothesis, \(l_{(0)}\) and \(l_{(1)}\) are the maximum log-likelihood under null and and alternative hypotheses.

ARMA(3,2) vs. ARMA (3,3)

The log-likelihood difference is -115.69, and the cutoff value is 3.84. Since the log-likelihood difference is smaller than the cutoff value, we fail to reject the null hypothesis and conclude that the ARMA(3,2) is not significantly different than ARMA (3,3) and thus we choose ARMA(3,2).

ARMA (3,2) vs. ARMA (3,4)

The log-likelihood difference is 216.34, and the cutoff value is 3.84. Again, since the log-likelihood difference is smaller than the cutoff value, we fail to reject the null hypothesis and conclude that the ARMA(3,2) is not significantly different than ARMA (3,4) and thus we choose ARMA(3,2).

ARMA Model Residual Analysis

After finding the best ARMA model, we check if the model assumptions are met. First, we need to verify if the residuals follow the normal distribution. Based on the QQ plot below, the errors terms are roughly normally distributed. Figure 10.

Figure 11.

From the ACF plot of residuals (Figure 11), we can see that there is an seasonal pattern which span across around 12 lags. This makes sense as the use of fossil fuels, industry cycles, and photosynthesis of plants can all reasonably be considered on an annual cycle.

SARIMA

Throughout the analysis there has been strongly supported evidence for a 12 month cycle in atmospheric CO₂ levels. SARMA modeling is a special case of ARMA where the AR and MA polynomials are factored into monthly and annual polynomial. This makes SARMA modeling very effective when there is known seasonality in the data of interest.₉

Model Selection

The CO₂ data is fit to a SARMA(p,q) x (P,Q)₁₂ model with the following form₉:

\[ \phi(B)\Phi(B^{12})(Y_n - \mu) = \psi(B)\Psi(B^{12})\varepsilon_n\,\] where the error terms are a white noise process and

\[\begin{aligned} \mu &= E[Y_n] \\ \phi(x) &= 1 - \phi_1x - \dots - \phi_px^p, \\ \psi(x) &= 1 + \psi_1x - \dots + \psi_qx^q, \\ \Phi(x) &= 1 - \Phi_1x - \dots - \Phi_px^p, \\ \Psi(x) &= 1 + \Psi_1x - \dots + \Psi_qx^q.\\ \end{aligned}\]

Like ARMA modeling described in the above section, the parameters \((\phi_{1:p})\) are the coefficients for the autoregressive part of the model, \(\psi_{1:p}\) are the coefficients for the moving average part of the model, and \(\mu\) is the population mean. SARMA models further include the seasonal period which is captured by \(\Phi\) and \(\Psi\).

We select the best parameters for the SARMA model using AIC. The models with the lowest/competetive AICs are shown in Table 2, along with the parameter settings, log likelihoods, and degrees of freedom.

Table 2. AIC table for SARIMA Modeling of CO₂ data

	p	d	q	q.1	AIC	LogLikelihood	dof
64	3	1	3	1	1.49	-560.97	7
62	3	1	2	1	1.49	-564.41	6
60	3	1	1	1	1.71	-647.11	5
44	2	1	1	1	1.76	-666.41	4
56	3	0	3	1	1.94	-733.88	7
54	3	0	2	1	1.94	-736.06	6

From the AIC table we find the most favorable candidates to be:

SARIMA(3, 1, 3, 0, 0, 1), with an AIC of 1.49
SARIMA(3, 1, 2, 0, 0, 1), with an AIC of 1.49
SARIMA(3, 1, 1, 0, 0, 1), with an AIC of 1.71
SARIMA(2, 1, 1, 0, 0, 1), with an AIC of 1.76

Candidate Model 1 vs Candidate Model 2

From the candidate models above, we select between model 1 and 2 using the log likelihood test:

\[ \ell^{<1>} - \ell^{<0>} \approx (1/2) \chi^2_{D^{<1>}-D^{<0>},} \] where \(\chi^2_d\) is a chi-squared random variable on d degrees of freedom.

H₀: SARIMA(3, 1, 2, 0, 0, 1)

H₁: SARIMA(3, 1, 3, 0, 0, 1)

We find p=0.0086588, so we H₀ reject the null hypothesis and accept SARIMA(3, 1, 3, 0, 0, 1).

Candidate Model 1 vs Candidate Model 3

We now compare the SARIMA(3, 1, 3, 0, 0, 1) with model candidate 3 - SARIMA(3, 1, 1, 0, 0, 1).

H₀: SARIMA(3, 1, 1, 0, 0, 1)

H₁: SARIMA(3, 1, 3, 0, 0, 1)

We find p=0, so we reject the H₀ hypothesis and accept SARIMA(3, 1, 3, 0, 0, 1).

Candidate Model 1 vs Candidate Model 4

We now compare the SARIMA(3, 1, 3, 0, 0, 1) with model candidate 4 - SARIMA(2, 1, 1, 0, 0, 1) - to aid in making our final selection.

H₀: SARIMA(2, 1, 1, 0, 0, 1)

H₁: SARIMA(3, 1, 3, 0, 0, 1)

We find p=0, so we reject the H₀ hypothesis and accept SARIMA(3, 1, 3, 0, 0, 1) by AIC.

Model Summary

The SARIMA model selected by AIC is SARIMA(3, 1, 3, 0, 0, 1), however when model diagnostics were performed include polynomial root review, Rolling Test of coefficients, and simulations for confidence intervals. The SARIMA(3, 1, 3, 0, 0, 1) was very unstable, so despite the preference from AIC, functionally it was not an acceptable model.

The unit roots (shown below) are within the unit circle which, along with the computations errors found in computing the model, indicate the instability of the SARIMA(3, 1, 3, 0, 0, 1) model.

SARIMA(3, 1, 3, 0, 0, 1) AR polynomial roots:

## [1]   0.8689107+0.5033621i   0.8689107-0.5033621i -10.8490967+0.0000000i

SARIMA(3, 1, 3, 0, 0, 1) MA polynomial roots:

## [1]  0.9126207+0.4292086i  0.9126207-0.4292086i -3.5918308-0.0000000i

To address the issues of stability we opt for the less complex model of SARIMA(2, 1, 1, 0, 0, 1), which had a competitive AIC value and allow for much better stability overall.

The SARIMA(2, 1, 1, 0, 0, 1) model coefficients, standard errors, and diagnostics are presented in the Model Diagnostics section.

Simulations for Confidence Intervals

A simulation study provides confidence intervals for the AR, MA, and SMA model parameters. Figure 12 shows density plots for each parameter along with lines for the 95% confidence interval.

Figure 12.

From the bootstrap, we can see that the fisher confidence interval for AR1, AR2 and SMA1 is more reliable, but for MA1, it’s not that reliable because the root of MA1 is close to the edge of unit circle.

Model Diagnostics

Residuals

The model residuals and q-q plot are shown in the figure below:

## initial  value 0.209343 
## iter   2 value -0.124892
## iter   3 value -0.272185
## iter   4 value -0.335150
## iter   5 value -0.361222
## iter   6 value -0.365201
## iter   7 value -0.366096
## iter   8 value -0.367419
## iter   9 value -0.373556
## iter  10 value -0.376098
## iter  11 value -0.386334
## iter  12 value -0.405790
## iter  13 value -0.427040
## iter  14 value -0.430834
## iter  15 value -0.494836
## iter  16 value -0.513242
## iter  17 value -0.527859
## iter  18 value -0.544320
## iter  19 value -0.544974
## iter  20 value -0.550234
## iter  21 value -0.550376
## iter  22 value -0.550435
## iter  23 value -0.550439
## iter  24 value -0.550442
## iter  25 value -0.550442
## iter  26 value -0.550443
## iter  26 value -0.550443
## final  value -0.550443 
## converged
## initial  value -0.548856 
## iter   2 value -0.548885
## iter   3 value -0.548916
## iter   4 value -0.548932
## iter   5 value -0.548947
## iter   6 value -0.548949
## iter   7 value -0.548949
## iter   8 value -0.548949
## iter   8 value -0.548949
## iter   8 value -0.548949
## final  value -0.548949 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xreg, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##     REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ar2      ma1    sma1    xreg
##       1.4732  -0.7713  -0.9132  0.4087  1.6115
## s.e.  0.0245   0.0245   0.0109  0.0287  0.1034
## 
## sigma^2 estimated as 0.3316:  log likelihood = -666.41,  aic = 1344.82
## 
## $degrees_of_freedom
## [1] 761
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1    1.4732 0.0245  60.0316       0
## ar2   -0.7713 0.0245 -31.5236       0
## ma1   -0.9132 0.0109 -83.9615       0
## sma1   0.4087 0.0287  14.2385       0
## xreg   1.6115 0.1034  15.5818       0
## 
## $AIC
## [1] 1.755644
## 
## $AICc
## [1] 1.755747
## 
## $BIC
## [1] 1.791998

The residuals still show some autocorrelation that we should be aware of as we interpret the final results from the SARIMA model. This does, however, show improvement from the ARMA model.

The Q-Q plot supports the assumption that the residuals are generally normally distributed.

Polynomial Root Review for Causality and Invertibility

As described in the SAIRMA model fitting section, we review the AR and MA polynomial roots to review causality and invertibility of the model.

The AR roots are:

## [1] 0.9549914+0.6200299i 0.9549914-0.6200299i

This are within the unit circle, which suggests our model is not necessarily causal and will have limitations where it comes to prediction. However, this again represents an improvement from the ARMA model.

The MA root is:

## [1] 1.09508+0i

The MA root is outside the unit circle, which means our model is invertible and will remain more stable. This is preferable to both the ARMA model and the initial AIC-selected SARIMA(3,1,3,0,0,1) which had MA roots within the unit circle.

Rolling Test for coefficients

Cross-validation is a valuable tool in model fitting, but is not possible within time dependent structure, as in time series. Instead, we conduct a rolling test of coefficients. A good model should have relatively stable coefficients over rolling time periods.

Figure 13 shows the model coefficients on rolling time bases. The plots show generally stable coefficients, though the MA component fluctuates notably. In future work, additional attention on the MA component of the model may provide more opportunity to fine tune for slight improvements.

Figure 13.

Prediction

Finally, we perform a final test of our model accuracy by applying the model for prediction. We apply a rolling prediction patter and use a 100 time point window to attempt to predict beyond the window. Figure 14 shows the predictive performance compared with the actual data from the atmospheric CO₂ data. The performance looks quite good, but additional testing and comfirming future forecasts can increase the confidence in this model.

Figure 14.

Conclusion

Through this investigation we sought to corroborate the reported trend of atmospheric CO₂ level, understand the underlying structure such as seasonality, and use modeling in both the time and frequency domains to quantify the associated trends and cycles of atmospheric CO₂ levels.

We were, indeed, able to confirm the reported trend and 12 month cycle in CO₂ atmospheric levels. CO₂ levels in the atmosphere are rising which is, in turn, contributing to the rising global temperature. This pattern indicates an urgent need to implement policies that will begin to reverse or mitigate CO₂ released by human activity.

References

NASA. (2021, November 16). Carbon dioxide concentration. NASA. Retrieved February 21, 2022, from https://climate.nasa.gov/vital-signs/carbon-dioxide/
NASA. (2020, February 5). Graphic: The relentless rise of carbon dioxide – climate change: Vital signs of the planet. NASA. Retrieved February 21, 2022, from https://climate.nasa.gov/climate_resources/24/graphic-the-relentless-rise-of-carbon-dioxide/
Temperature change and carbon dioxide … - NCEI offers acces. (n.d.). Retrieved February 21, 2022, from https://www.ncei.noaa.gov/sites/default/files/2021-11/8%20-%20Temperature%20Change%20and%20Carbon%20Dioxide%20Change%20-%20FINAL%20OCT%202021.pdf
NASA. (2022, January 19). Global surface temperature. NASA. Retrieved February 21, 2022, from https://climate.nasa.gov/vital-signs/global-temperature/
US Department of Commerce, N. O. A. A. (2005, October 1). Global Monitoring Laboratory - Carbon Cycle Greenhouse Gases. GML. Retrieved February 21, 2022, from https://gml.noaa.gov/ccgg/trends/trends_log.html [ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt]
US Department of Commerce, N. O. A. A. (2005, October 1). ESRL Global Monitoring Laboratory - FTP navigator. GML. Retrieved February 21, 2022, from https://gml.noaa.gov/dv/data/index.php?category=Greenhouse%2BGases¶meter_name=Carbon%2BDioxide&site=MLO
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Ionides, E. L. (n.d.). Modeling and analysis of time series Datastats/datasci 531, winter 2022Chapter 3: Stationarity, White Noise, and some basic time series models. Modeling and Analysis of Time Series Data STATS/DATASCI 531, Winter 2022 Chapter 3: Stationarity, white noise, and some basic time series models. Retrieved February 21, 2022, from https://ionides.github.io/531w22/03/index.html
Ionides, E. L. (n.d.). Modeling and analysis of time series Datastats/datasci 531, winter 2022 Chapter 6: Extending the ARMA model: Seasonality, integration and trend. Modeling and Analysis of Time Series Data STATS/DATASCI 531, Winter 2022 Chapter 6: Extending thh ARMA model: Seasonality, integration and trend. Retrieved February 21, 2022, from https://ionides.github.io/531w22/06/slides.pdf
Ionides, E. L. (n.d.). Modeling and analysis of time series Datastats/datasci 531, Winter 2022Chapter 7: Introduction to time series analysis in the frequency domain. Modeling and Analysis of Time Series Data STATS/DATASCI 531, Winter 2022 Chapter 7: Introduction to time series analysis in the frequency domain. Retrieved February 21, 2022, from https://ionides.github.io/531w22/07/index.html
2mm modeling and analysis of time … - ionides.github.io. (n.d.). Retrieved February 21, 2022, from https://ionides.github.io/531w22/05/slides.pdf
Ionides, E. L. (n.d.). Modeling and analysis of time series Datastats/datasci 531, winter 2022Chapter 4: Linear Time Series models and the algebra of Arma Models. Modeling and Analysis of Time Series Data STATS/DATASCI 531, Winter 2022 Chapter 4: Linear time series models and the algebra of ARMA models. Retrieved February 21, 2022, from https://ionides.github.io/531w22/04/index.html

Atmospheric CO2 Level Trends and Seasonality [1958-2022]

STATS 531 Midterm Project

2/20/2022

Introduction

Data

Variables of Interest

Missing Data

Statistical Software and Packages

Exploratory Data Analysis

Timeplot

Stationarity and Seasonality

Detrending

Frequency Domain Analysis

Spectral Analysis

Model Fitting

Fitting ARMA model

Wilk’s approximation

ARMA(3,2) vs. ARMA (3,3)

ARMA (3,2) vs. ARMA (3,4)

ARMA Model Residual Analysis

SARIMA

Model Selection

Candidate Model 1 vs Candidate Model 2

Candidate Model 1 vs Candidate Model 3

Candidate Model 1 vs Candidate Model 4

Model Summary

Simulations for Confidence Intervals

Model Diagnostics

Residuals

Polynomial Root Review for Causality and Invertibility

Rolling Test for coefficients

Prediction

Conclusion

References