This final project is a further analysis of the midterm project. The Candy Production Index (CPI) dataset comes from FRED economic research and contains the monthly candy production index from January 1972 to March 2020. Both projects analyze the candy production index dataset, but with different focuses. The midterm project focused on the seasonal effect of the candy production index and mainly used ARIMA and SARIMA models. In this final project, the main goal is to fit different models to the CPI and select the best one, preferring models with a larger log likelihood. As the focus has changed, the data used will be the demeaned, seasonally adjusted series instead of the raw data.
From the summary below, we can see that the CPI ranges from 55.99 to 127.5 (indexed so that 2012 = 100) and the dataset contains 579 data points.
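A minimal sketch of how the data could be read in and the columns below constructed (the file name candy_production.csv is an assumption; FRED downloads provide the DATE and IPG3113S columns):

# Read the FRED candy production series and add year/month/time columns
candy <- read.csv("candy_production.csv")          # file name is an assumption
candy$DATE <- as.Date(candy$DATE)
candy$year <- as.numeric(format(candy$DATE, "%Y"))
candy$month <- as.numeric(format(candy$DATE, "%m"))
candy$time <- candy$year + candy$month / 12
head(candy, 5)
summary(candy$IPG3113S)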
## DATE IPG3113S year month time
## 1 1972-01-01 80.9025 1972 1 1972.083
## 2 1972-02-01 76.0587 1972 2 1972.167
## 3 1972-03-01 77.0023 1972 3 1972.250
## 4 1972-04-01 76.4226 1972 4 1972.333
## 5 1972-05-01 74.9647 1972 5 1972.417
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 55.99 89.82 103.10 101.82 113.77 127.50
Plotting the raw CPI against time together with its ACF, we can see a clear trend and large autocorrelation, which means a data transformation is needed. After the data is demeaned, the plot looks much closer to stationary and the series is nearly uncorrelated.
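A sketch of these plots, assuming candy holds the data frame above and dmcandy holds the demeaned series used throughout the report (its exact construction is not shown here):

# Compare the raw seasonally adjusted index with the demeaned series
par(mfrow = c(2, 2))
plot(candy$time, candy$IPG3113S, type = "l", xlab = "Year", ylab = "CPI")
acf(candy$IPG3113S, main = "ACF of raw CPI")
plot(dmcandy, type = "l", xlab = "Time", ylab = "Demeaned CPI")
acf(dmcandy, main = "ACF of demeaned CPI")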
From the spectrum plot, the red dashed line marks the peak of the smoothed periodogram at a frequency of about 0.33 cycles per month, which suggests a seasonality with a period of 3 months.
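A sketch of the smoothed periodogram with the dominant frequency marked (the smoothing spans are assumptions):

# Smoothed periodogram of the demeaned series, peak marked with a dashed line
spec <- spectrum(dmcandy, spans = c(3, 5, 3), main = "Smoothed periodogram")
abline(v = spec$freq[which.max(spec$spec)], lty = "dashed", col = "red")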
In the decomposition plot, the candy_low component, which represents the trend and low-frequency cycles, varies within quite a small range, while the noise component looks similar to the original demeaned series. This indicates that the series has been successfully detrended and what remains is mostly noise.
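A sketch of a loess-based decomposition into low-frequency (trend), high-frequency (noise), and mid-frequency (cycle) components; the span values are assumptions:

# Decompose the demeaned series with loess smoothers at two bandwidths
tt <- seq_along(dmcandy)
candy_low    <- loess(dmcandy ~ tt, span = 0.5)$fitted
candy_high   <- dmcandy - loess(dmcandy ~ tt, span = 0.1)$fitted
candy_cycles <- dmcandy - candy_low - candy_high
plot(ts.union(dmcandy = ts(dmcandy), candy_low = ts(candy_low),
              candy_high = ts(candy_high), candy_cycles = ts(candy_cycles)),
     main = "Decomposition of demeaned CPI as trend + noise + cycles")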
In the midterm project, we fitted ARIMA and SARIMA models to the CPI data, and the simulation output showed that they worked well. In this final project, we start from the ARIMA and SARIMA models selected by the midterm analysis and then refit, looking for models with a lower AIC and a larger log likelihood.
The ARIMA and SARIMA models selected in the midterm were fitted without seasonal adjustment and without removing the trend. The ARIMA model selected is ARIMA(0,1,1) and the SARIMA model selected is \(SARIMA(4,0,3)\times(0,1,1)_{12}\).
##
## Call:
## arima(x = dmcandy, order = c(0, 1, 1))
##
## Coefficients:
## ma1
## -1.0000
## s.e. 0.0045
##
## sigma^2 estimated as 0.001197: log likelihood = 1119.05, aic = -2234.1
##
## Call:
## arima(x = dmcandy, order = c(4, 0, 3), seasonal = list(order = c(0, 1, 1), period = 12))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 sma1
## -0.4470 -0.1930 0.6928 -0.0075 0.2747 -0.0202 -0.8427 -1.0000
## s.e. 0.0903 0.1296 0.1225 0.0552 0.0811 0.0934 0.0807 0.0302
##
## sigma^2 estimated as 0.00103: log likelihood = 1118.79, aic = -2219.58
First, we fit ARIMA models over a grid of AR and MA orders and obtain a table of AIC values. In the table, all of the values are smaller than -2234.1, the AIC of the ARIMA(0,1,1) model shown above.
## MA0 MA1 MA2 MA3 MA4 MA5
## AR0 -2245.355 -2265.974 -2275.923 -2274.064 -2282.602 -2281.824
## AR1 -2259.815 -2281.892 -2280.605 -2280.603 -2282.744 -2280.136
## AR2 -2271.535 -2280.398 -2279.190 -2281.365 -2280.875 -2288.071
## AR3 -2269.547 -2281.345 -2282.467 -2307.746 -2306.120 -2304.231
## AR4 -2276.601 -2283.556 -2281.905 -2305.886 -2305.207 -2303.256
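One way such an AIC table can be computed is sketched below, following the usual aic_table helper; the differencing order d is an assumption, since the report does not show the call:

# Compute an AIC table over ARIMA(p,d,q) fits for p = 0..P, q = 0..Q
aic_table <- function(data, P, Q, d = 0) {
  table <- matrix(NA, P + 1, Q + 1)
  for (p in 0:P) {
    for (q in 0:Q) {
      table[p + 1, q + 1] <- arima(data, order = c(p, d, q))$aic
    }
  }
  dimnames(table) <- list(paste0("AR", 0:P), paste0("MA", 0:Q))
  table
}
aic_table(dmcandy, 4, 5)

The seasonal table in the next step can be produced in the same way by adding seasonal = list(order = c(0, 1, 1), period = 3) to the arima() call.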
Then we fit SARIMA models with a seasonal period of 3 months and compare \(SARIMA(0,0,3)\times(0,1,1)_{3}\) with the ARIMA(0,1,1) model selected above by AIC.
## MA0 MA1 MA2 MA3 MA4 MA5
## AR0 -2215.226 -2235.656 -2245.296 -2214.812 -2230.824 -2240.767
## AR1 -2229.564 -2250.319 -2248.821 -2248.669 -2252.728 -2250.956
## AR2 -2241.337 -2240.171 -2247.789 -2237.881 -2242.486 -2248.936
## AR3 -2239.388 -2249.488 -2250.823 -2257.608 -2256.928 -2247.023
## AR4 -2246.332 -2252.604 -2250.828 -2256.565 -2254.783 -2245.364
##
## Call:
## arima(x = dmcandy, order = c(0, 0, 3), seasonal = list(order = c(0, 1, 1), period = 3))
##
## Coefficients:
## ma1 ma2 ma3 sma1
## -0.0465 0.0091 -0.9627 0.0249
## s.e. 0.0194 0.0107 0.0200 0.0453
##
## sigma^2 estimated as 0.001202: log likelihood = 1112.41, aic = -2214.81
From the output above, we can see that the ARIMA(0,1,1) model has a log likelihood of 1119.05, which is greater than that of the SARIMA model.
From the ACF plot, we can see that the residuals show little autocorrelation, and the QQ plot indicates that the residuals are approximately normally distributed, with light tails on both sides.
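A sketch of these residual diagnostics, assuming the residuals come from the ARIMA(0,1,1) fit discussed above:

# Residual diagnostics for the selected ARIMA(0,1,1) fit
fit011 <- arima(dmcandy, order = c(0, 1, 1))
r <- resid(fit011)
par(mfrow = c(1, 2))
acf(r, main = "ACF of residuals")
qqnorm(r); qqline(r)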
In this part, we fit a GARCH(1,1) model to the CPI dataset. The output shows a log likelihood of 1151.887, which is greater than that of the ARIMA(0,1,1) model selected in Part 2. This indicates that the GARCH(1,1) model fits this CPI dataset better than the ARIMA(0,1,1) model.
The GARCH(1,1) model used is shown below:
\[
Y_n = \epsilon_n\sqrt{V_n}
\] where
\[ V_n = \alpha_0+\alpha_1Y_{n-1}^2 + \beta_1V_{n-1} \] and \(\epsilon_{1:N}\) is white noise.
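A minimal sketch of this fit with the tseries package, matching the call shown in the output below:

# Fit GARCH(1,1); garch() defaults to order c(1, 1)
library(tseries)
fit_garch <- garch(dmcandy, grad = "numerical", trace = FALSE)
summary(fit_garch)   # coefficients and diagnostic tests
logLik(fit_garch)    # log likelihood used for model comparison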
##
## Call:
## garch(x = dmcandy, grad = "numerical", trace = FALSE)
##
## Model:
## GARCH(1,1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.72927 -0.56529 -0.01359 0.60788 3.16261
##
## Coefficient(s):
## Estimate Std. Error t value Pr(>|t|)
## a0 2.621e-05 1.450e-05 1.807 0.070717 .
## a1 5.907e-02 1.592e-02 3.710 0.000207 ***
## b1 9.179e-01 2.296e-02 39.981 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Diagnostic Tests:
## Jarque Bera Test
##
## data: Residuals
## X-squared = 16.822, df = 2, p-value = 0.0002224
##
##
## Box-Ljung test
##
## data: Squared.Residuals
## X-squared = 0.36558, df = 1, p-value = 0.5454
## 'log Lik.' 1151.887 (df=3)
The fitted GARCH model is:
\[
Y_n = \epsilon_n\sqrt{V_n}
\]
where
\[ V_n = 2.621\times10^{-5}+0.059Y_{n-1}^2 + 0.918V_{n-1} \]
In this part, we fit a POMP model to the CPI dataset. Since a production index behaves somewhat like a price index, we fit a stochastic volatility POMP model and compare its output with the models fitted above.
The POMP model used in this part is from Bretó (2014).
\[ R_n = \frac{\exp(2G_n)-1}{\exp(2G_n)+1} \] where \({G_n}\) is the usual, Gaussian random walk.
\[ Y_n = \exp\{H_n/2\}\epsilon_n \]
\[ H_n = \mu_h(1-\phi)+\phi H_{n-1} + \beta_{n-1}R_n\exp\{-H_{n-1}/2\}+\omega_n \] \[ G_n = G_{n-1}+\upsilon_n \]
where
\[
\beta_n = Y_n\sigma_\eta\sqrt{1-\phi^2}
\]
\[
\epsilon_n \sim \text{i.i.d. } N(0,1)
\]
\[ \upsilon_n \sim \text{i.i.d. } N(0,\sigma_\upsilon^2) \]
\[ \omega_n \sim \text{i.i.d. } N(0,\sigma_\omega^2) \]
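Since \(R_n = \tanh(G_n)\), the model above can be implemented directly with pomp C snippets. The sketch below follows the style of standard pomp implementations of the Bretó (2014) model; the object names (e.g. candy.filt), the parameter transformations, and the use of the observed series as a covariate for filtering are assumptions about details not shown in the report:

library(pomp)

candy_statenames <- c("H", "G", "Y_state")
candy_paramnames <- c("sigma_nu", "sigma_omega", "mu_h", "phi", "sigma_eta",
                      "G_0", "H_0")

# One step of the latent process: update G, then H (Y_state still holds Y_{n-1})
rproc1 <- "
  double beta, omega, nu;
  omega = rnorm(0, sigma_omega);
  nu = rnorm(0, sigma_nu);
  G += nu;
  beta = Y_state * sigma_eta * sqrt(1 - phi*phi);
  H = mu_h*(1 - phi) + phi*H + beta * tanh(G) * exp(-H/2) + omega;
"
rproc2_sim  <- "Y_state = rnorm(0, exp(H/2));"   # draw Y_n when simulating
rproc2_filt <- "Y_state = covaryt;"              # plug in observed Y_n when filtering

candy_rinit <- "
  G = G_0;
  H = H_0;
  Y_state = rnorm(0, exp(H/2));
"
candy_rmeasure <- "y = Y_state;"
candy_dmeasure <- "lik = dnorm(y, 0, exp(H/2), give_log);"

candy.filt <- pomp(
  data = data.frame(y = dmcandy, time = 1:length(dmcandy)),
  times = "time", t0 = 0,
  statenames = candy_statenames,
  paramnames = candy_paramnames,
  covar = covariate_table(covaryt = c(0, dmcandy), times = 0:length(dmcandy)),
  rprocess = discrete_time(step.fun = Csnippet(paste(rproc1, rproc2_filt)),
                           delta.t = 1),
  rinit = Csnippet(candy_rinit),
  rmeasure = Csnippet(candy_rmeasure),
  dmeasure = Csnippet(candy_dmeasure),
  partrans = parameter_trans(log = c("sigma_nu", "sigma_omega", "sigma_eta"),
                             logit = "phi")
)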
The run level settings used for fitting the POMP model are shown below:
run_level <- 3                                        # 1 = debugging run, 3 = full-scale run
candy_Np <- switch(run_level, 100, 1e3, 2e3)          # number of particles
candy_Nmif <- switch(run_level, 10, 100, 200)         # iterated filtering (mif2) iterations
candy_Nreps_eval <- switch(run_level, 4, 10, 20)      # replications for likelihood evaluation
candy_Nreps_local <- switch(run_level, 10, 20, 20)    # replications for the local search
candy_Nreps_global <- switch(run_level, 10, 20, 100)  # starting points for the global search
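Assuming the candy.filt object sketched above, a quick particle-filter check of the log likelihood can be run as follows (the values in params_test are placeholders, not the report's starting values):

# Evaluate the log likelihood at a test parameter vector with the particle filter
params_test <- c(sigma_nu = 0.01, sigma_omega = 0.02, mu_h = -7, phi = 0.98,
                 sigma_eta = 0.9, G_0 = 0, H_0 = 0)
L_pf <- replicate(candy_Nreps_eval,
                  logLik(pfilter(candy.filt, params = params_test,
                                 Np = candy_Np)))
logmeanexp(L_pf, se = TRUE)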
This test of the log likelihood runs successfully, returning an estimate together with a small Monte Carlo standard error.
## se
## -808.2704481 0.1611517
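A sketch of the iterated filtering (mif2) local search whose evaluated log likelihoods are summarized below; the random-walk standard deviations and the cooling fraction are assumptions:

# Local search: run mif2 from the test parameters, then re-evaluate each result
library(foreach)
library(doParallel)
registerDoParallel()

if_local <- foreach(i = 1:candy_Nreps_local, .packages = "pomp") %dopar%
  mif2(candy.filt,
       params = params_test,
       Np = candy_Np, Nmif = candy_Nmif,
       cooling.fraction.50 = 0.5,
       rw.sd = rw_sd(sigma_nu = 0.02, sigma_omega = 0.02, mu_h = 0.02,
                     phi = 0.02, sigma_eta = 0.02,
                     G_0 = ivp(0.1), H_0 = ivp(0.1)))

lik_local <- foreach(i = 1:candy_Nreps_local, .combine = rbind,
                     .packages = "pomp") %dopar%
  logmeanexp(replicate(candy_Nreps_eval,
                       logLik(pfilter(candy.filt, params = coef(if_local[[i]]),
                                      Np = candy_Np))), se = TRUE)
summary(lik_local[, 1])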
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1126 1143 1148 1146 1154 1155
The local search reaches log likelihoods of up to about 1155, as shown in the summary above. The accompanying plot indicates, for each parameter, the range of values that tends to give a larger log likelihood.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1114 1130 1143 1142 1154 1159
The best log likelihood from the global search is 1159.
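A sketch of the global search, re-running mif2 from random starting points drawn from a parameter box (the box limits are assumptions); the resulting parameter vectors are then re-evaluated with the particle filter exactly as in the local search:

# Global search: random starting values drawn from a box of plausible parameters
candy_box <- rbind(
  sigma_nu = c(0.005, 0.05), sigma_omega = c(0.005, 0.05),
  mu_h = c(-9, -5), phi = c(0.9, 0.99), sigma_eta = c(0.5, 1),
  G_0 = c(-2, 2), H_0 = c(-1, 1)
)
if_global <- foreach(i = 1:candy_Nreps_global, .packages = "pomp") %dopar%
  mif2(if_local[[1]],
       params = apply(candy_box, 1, function(x) runif(1, x[1], x[2])))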
From the plot above, we can see that most parameters have roughly converged after a sufficient number of iterations.
From the analysis above, we find that the POMP model fits best among the ARIMA/SARIMA, GARCH, and POMP models, with the largest log likelihood of 1159. The POMP model also fits efficiently, converging within a fairly small amount of runtime.