This final project is a further analysis of the midterm project. The Candy Production Index (CPI) dataset comes from FRED economic research and contains the monthly candy production index from January 1972 to March 2020. Both projects analyze the candy production index dataset, but with different focuses. The midterm project focused on the seasonal effect of the candy production index and mainly used ARIMA and SARIMA models. In this final project, the main goal is to fit different models to the CPI and select the best one, preferring models with a larger log likelihood. As the focus has changed, the data used will be the demeaned, seasonally adjusted series instead of the raw data.
From the summary below, we can see that the CPI ranges from 55.99 to 127.5 (indexed so that 2012 = 100) and the dataset contains 579 data points.
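A minimal sketch of how the data could be read in and the columns below constructed (the file name candy_production.csv is an assumption; FRED downloads provide the DATE and IPG3113S columns):

# Read the FRED candy production series and add year/month/time columns
candy <- read.csv("candy_production.csv")          # file name is an assumption
candy$DATE <- as.Date(candy$DATE)
candy$year <- as.numeric(format(candy$DATE, "%Y"))
candy$month <- as.numeric(format(candy$DATE, "%m"))
candy$time <- candy$year + candy$month / 12
head(candy, 5)
summary(candy$IPG3113S)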
## DATE IPG3113S year month time
## 1 1972-01-01 80.9025 1972 1 1972.083
## 2 1972-02-01 76.0587 1972 2 1972.167
## 3 1972-03-01 77.0023 1972 3 1972.250
## 4 1972-04-01 76.4226 1972 4 1972.333
## 5 1972-05-01 74.9647 1972 5 1972.417
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 55.99 89.82 103.10 101.82 113.77 127.50
Plotting the raw CPI against time together with its ACF, we can see a clear trend and large autocorrelation, which means a data transformation is needed. After the data is demeaned, the plot looks much closer to stationary and the series is nearly uncorrelated.
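A sketch of these plots, assuming candy holds the data frame above and dmcandy holds the demeaned series used throughout the report (its exact construction is not shown here):

# Compare the raw seasonally adjusted index with the demeaned series
par(mfrow = c(2, 2))
plot(candy$time, candy$IPG3113S, type = "l", xlab = "Year", ylab = "CPI")
acf(candy$IPG3113S, main = "ACF of raw CPI")
plot(dmcandy, type = "l", xlab = "Time", ylab = "Demeaned CPI")
acf(dmcandy, main = "ACF of demeaned CPI")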
From the spectrum plot, the red dashed line marks the peak of the smoothed periodogram at a frequency of about 0.33 cycles per month, which suggests a seasonality with a period of 3 months.
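A sketch of the smoothed periodogram with the dominant frequency marked (the smoothing spans are assumptions):

# Smoothed periodogram of the demeaned series, peak marked with a dashed line
spec <- spectrum(dmcandy, spans = c(3, 5, 3), main = "Smoothed periodogram")
abline(v = spec$freq[which.max(spec$spec)], lty = "dashed", col = "red")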
In the decomposition plot, the candy_low component, which represents the trend and low-frequency cycles, varies within quite a small range, while the noise component looks similar to the original demeaned series. This indicates that the series has been successfully detrended and what remains is mostly noise.
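A sketch of a loess-based decomposition into low-frequency (trend), high-frequency (noise), and mid-frequency (cycle) components; the span values are assumptions:

# Decompose the demeaned series with loess smoothers at two bandwidths
tt <- seq_along(dmcandy)
candy_low    <- loess(dmcandy ~ tt, span = 0.5)$fitted
candy_high   <- dmcandy - loess(dmcandy ~ tt, span = 0.1)$fitted
candy_cycles <- dmcandy - candy_low - candy_high
plot(ts.union(dmcandy = ts(dmcandy), candy_low = ts(candy_low),
              candy_high = ts(candy_high), candy_cycles = ts(candy_cycles)),
     main = "Decomposition of demeaned CPI as trend + noise + cycles")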
In the midterm project, we fitted ARIMA and SARIMA models to the CPI data, and the simulation output showed that they worked well. In this final project, we start from the ARIMA and SARIMA models selected by the midterm analysis and then refit, looking for models with a lower AIC and a larger log likelihood.
The ARIMA and SARIMA models selected in the midterm were fitted without seasonal adjustment and without removing the trend. The ARIMA model selected is ARIMA(0,1,1) and the SARIMA model selected is \(SARIMA(4,0,3)\times(0,1,1)_{12}\).
##
## Call:
## arima(x = dmcandy, order = c(0, 1, 1))
##
## Coefficients:
## ma1
## -1.0000
## s.e. 0.0045
##
## sigma^2 estimated as 0.001197: log likelihood = 1119.05, aic = -2234.1
##
## Call:
## arima(x = dmcandy, order = c(4, 0, 3), seasonal = list(order = c(0, 1, 1), period = 12))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 sma1
## -0.4470 -0.1930 0.6928 -0.0075 0.2747 -0.0202 -0.8427 -1.0000
## s.e. 0.0903 0.1296 0.1225 0.0552 0.0811 0.0934 0.0807 0.0302
##
## sigma^2 estimated as 0.00103: log likelihood = 1118.79, aic = -2219.58
First, we fit ARIMA models over a grid of AR and MA orders and obtain a table of AIC values. In the table, all of the values are smaller than -2234.1, the AIC of the ARIMA(0,1,1) model shown above.
## MA0 MA1 MA2 MA3 MA4 MA5
## AR0 -2245.355 -2265.974 -2275.923 -2274.064 -2282.602 -2281.824
## AR1 -2259.815 -2281.892 -2280.605 -2280.603 -2282.744 -2280.136
## AR2 -2271.535 -2280.398 -2279.190 -2281.365 -2280.875 -2288.071
## AR3 -2269.547 -2281.345 -2282.467 -2307.746 -2306.120 -2304.231
## AR4 -2276.601 -2283.556 -2281.905 -2305.886 -2305.207 -2303.256
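One way such an AIC table can be computed is sketched below, following the usual aic_table helper; the differencing order d is an assumption, since the report does not show the call:

# Compute an AIC table over ARIMA(p,d,q) fits for p = 0..P, q = 0..Q
aic_table <- function(data, P, Q, d = 0) {
  table <- matrix(NA, P + 1, Q + 1)
  for (p in 0:P) {
    for (q in 0:Q) {
      table[p + 1, q + 1] <- arima(data, order = c(p, d, q))$aic
    }
  }
  dimnames(table) <- list(paste0("AR", 0:P), paste0("MA", 0:Q))
  table
}
aic_table(dmcandy, 4, 5)

The seasonal table in the next step can be produced in the same way by adding seasonal = list(order = c(0, 1, 1), period = 3) to the arima() call.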
Then we fit SARIMA models with a seasonal period of 3 months and compare \(SARIMA(0,0,3)\times(0,1,1)_{3}\) with the ARIMA(0,1,1) model selected above by AIC.
## MA0 MA1 MA2 MA3 MA4 MA5
## AR0 -2215.226 -2235.656 -2245.296 -2214.812 -2230.824 -2240.767
## AR1 -2229.564 -2250.319 -2248.821 -2248.669 -2252.728 -2250.956
## AR2 -2241.337 -2240.171 -2247.789 -2237.881 -2242.486 -2248.936
## AR3 -2239.388 -2249.488 -2250.823 -2257.608 -2256.928 -2247.023
## AR4 -2246.332 -2252.604 -2250.828 -2256.565 -2254.783 -2245.364
##
## Call:
## arima(x = dmcandy, order = c(0, 0, 3), seasonal = list(order = c(0, 1, 1), period = 3))
##
## Coefficients:
## ma1 ma2 ma3 sma1
## -0.0465 0.0091 -0.9627 0.0249
## s.e. 0.0194 0.0107 0.0200 0.0453
##
## sigma^2 estimated as 0.001202: log likelihood = 1112.41, aic = -2214.81
From the output above, we can see that the ARIMA(0,1,1) model has a log likelihood of 1119.05, which is greater than that of the SARIMA model.
From the ACF plot, we can see that the residuals show little autocorrelation, and the QQ plot indicates that the residuals are approximately normally distributed, with light tails on both sides.
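A sketch of these residual diagnostics, assuming the residuals come from the ARIMA(0,1,1) fit discussed above:

# Residual diagnostics for the selected ARIMA(0,1,1) fit
fit011 <- arima(dmcandy, order = c(0, 1, 1))
r <- resid(fit011)
par(mfrow = c(1, 2))
acf(r, main = "ACF of residuals")
qqnorm(r); qqline(r)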
In this part, we fit a GARCH(1,1) model to the CPI dataset. The output shows a log likelihood of 1151.887, which is greater than that of the ARIMA(0,1,1) model selected in Part 2. This indicates that the GARCH(1,1) model fits this CPI dataset better than the ARIMA(0,1,1) model.
The GARCH(1,1) model used is shown below:
\[
Y_n = \epsilon_n\sqrt{V_n}
\] where
\[ V_n = \alpha_0+\alpha_1Y_{n-1}^2 + \beta_1V_{n-1} \] and \(\epsilon_{1:N}\) is white noise.
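A minimal sketch of this fit with the tseries package, matching the call shown in the output below:

# Fit GARCH(1,1); garch() defaults to order c(1, 1)
library(tseries)
fit_garch <- garch(dmcandy, grad = "numerical", trace = FALSE)
summary(fit_garch)   # coefficients and diagnostic tests
logLik(fit_garch)    # log likelihood used for model comparison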
##
## Call:
## garch(x = dmcandy, grad = "numerical", trace = FALSE)
##
## Model:
## GARCH(1,1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.72927 -0.56529 -0.01359 0.60788 3.16261
##
## Coefficient(s):
## Estimate Std. Error t value Pr(>|t|)
## a0 2.621e-05 1.450e-05 1.807 0.070717 .
## a1 5.907e-02 1.592e-02 3.710 0.000207 ***
## b1 9.179e-01 2.296e-02 39.981 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Diagnostic Tests:
## Jarque Bera Test
##
## data: Residuals
## X-squared = 16.822, df = 2, p-value = 0.0002224
##
##
## Box-Ljung test
##
## data: Squared.Residuals
## X-squared = 0.36558, df = 1, p-value = 0.5454
## 'log Lik.' 1151.887 (df=3)
The fitted GARCH model is:
\[
Y_n = \epsilon_n\sqrt{V_n}
\]
where
\[ V_n = 2.621\times10^{-5}+0.059Y_{n-1}^2 + 0.918V_{n-1} \]
In this part, we fit a POMP model to the CPI dataset. Since a production index behaves somewhat like a price index, we fit a stochastic volatility POMP model and compare its output with the models fitted above.
The POMP model used in this part is from Bretó (2014).
\[ R_n = \frac{\exp(2G_n)-1}{\exp(2G_n)+1} \] where \({G_n}\) is the usual, Gaussian random walk.
\[ Y_n = \exp\{H_n/2\}\epsilon_n \]
\[ H_n = \mu_h(1-\phi)+\phi H_{n-1} + \beta_{n-1}R_n\exp\{-H_{n-1}/2\}+\omega_n \] \[ G_n = G_{n-1}+\upsilon_n \]
where
\[
\beta_n = Y_n\sigma_\eta\sqrt{1-\phi^2}
\]
\[
\epsilon_n \sim \text{i.i.d. } N(0,1)
\]
\[ \upsilon_n \sim \text{i.i.d. } N(0,\sigma_\upsilon^2) \]
\[ \omega_n \sim \text{i.i.d. } N(0,\sigma_\omega^2) \]
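Since \(R_n = \tanh(G_n)\), the model above can be implemented directly with pomp C snippets. The sketch below follows the style of standard pomp implementations of the Bretó (2014) model; the object names (e.g. candy.filt), the parameter transformations, and the use of the observed series as a covariate for filtering are assumptions about details not shown in the report:

library(pomp)

candy_statenames <- c("H", "G", "Y_state")
candy_paramnames <- c("sigma_nu", "sigma_omega", "mu_h", "phi", "sigma_eta",
                      "G_0", "H_0")

# One step of the latent process: update G, then H (Y_state still holds Y_{n-1})
rproc1 <- "
  double beta, omega, nu;
  omega = rnorm(0, sigma_omega);
  nu = rnorm(0, sigma_nu);
  G += nu;
  beta = Y_state * sigma_eta * sqrt(1 - phi*phi);
  H = mu_h*(1 - phi) + phi*H + beta * tanh(G) * exp(-H/2) + omega;
"
rproc2_sim  <- "Y_state = rnorm(0, exp(H/2));"   # draw Y_n when simulating
rproc2_filt <- "Y_state = covaryt;"              # plug in observed Y_n when filtering

candy_rinit <- "
  G = G_0;
  H = H_0;
  Y_state = rnorm(0, exp(H/2));
"
candy_rmeasure <- "y = Y_state;"
candy_dmeasure <- "lik = dnorm(y, 0, exp(H/2), give_log);"

candy.filt <- pomp(
  data = data.frame(y = dmcandy, time = 1:length(dmcandy)),
  times = "time", t0 = 0,
  statenames = candy_statenames,
  paramnames = candy_paramnames,
  covar = covariate_table(covaryt = c(0, dmcandy), times = 0:length(dmcandy)),
  rprocess = discrete_time(step.fun = Csnippet(paste(rproc1, rproc2_filt)),
                           delta.t = 1),
  rinit = Csnippet(candy_rinit),
  rmeasure = Csnippet(candy_rmeasure),
  dmeasure = Csnippet(candy_dmeasure),
  partrans = parameter_trans(log = c("sigma_nu", "sigma_omega", "sigma_eta"),
                             logit = "phi")
)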
The run level settings used for fitting the POMP model are shown below:
run_level <- 3                                        # 1 = debugging run, 3 = full-scale run
candy_Np <- switch(run_level, 100, 1e3, 2e3)          # number of particles
candy_Nmif <- switch(run_level, 10, 100, 200)         # iterated filtering (mif2) iterations
candy_Nreps_eval <- switch(run_level, 4, 10, 20)      # replications for likelihood evaluation
candy_Nreps_local <- switch(run_level, 10, 20, 20)    # replications for the local search
candy_Nreps_global <- switch(run_level, 10, 20, 100)  # starting points for the global search
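Assuming the candy.filt object sketched above, a quick particle-filter check of the log likelihood can be run as follows (the values in params_test are placeholders, not the report's starting values):

# Evaluate the log likelihood at a test parameter vector with the particle filter
params_test <- c(sigma_nu = 0.01, sigma_omega = 0.02, mu_h = -7, phi = 0.98,
                 sigma_eta = 0.9, G_0 = 0, H_0 = 0)
L_pf <- replicate(candy_Nreps_eval,
                  logLik(pfilter(candy.filt, params = params_test,
                                 Np = candy_Np)))
logmeanexp(L_pf, se = TRUE)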
This test of the log likelihood runs successfully, returning an estimate together with a small Monte Carlo standard error.
## se
## -808.2704481 0.1611517
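A sketch of the iterated filtering (mif2) local search whose evaluated log likelihoods are summarized below; the random-walk standard deviations and the cooling fraction are assumptions:

# Local search: run mif2 from the test parameters, then re-evaluate each result
library(foreach)
library(doParallel)
registerDoParallel()

if_local <- foreach(i = 1:candy_Nreps_local, .packages = "pomp") %dopar%
  mif2(candy.filt,
       params = params_test,
       Np = candy_Np, Nmif = candy_Nmif,
       cooling.fraction.50 = 0.5,
       rw.sd = rw_sd(sigma_nu = 0.02, sigma_omega = 0.02, mu_h = 0.02,
                     phi = 0.02, sigma_eta = 0.02,
                     G_0 = ivp(0.1), H_0 = ivp(0.1)))

lik_local <- foreach(i = 1:candy_Nreps_local, .combine = rbind,
                     .packages = "pomp") %dopar%
  logmeanexp(replicate(candy_Nreps_eval,
                       logLik(pfilter(candy.filt, params = coef(if_local[[i]]),
                                      Np = candy_Np))), se = TRUE)
summary(lik_local[, 1])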
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1126 1143 1148 1146 1154 1155
The local search reaches log likelihoods of up to about 1155, as shown in the summary above. The accompanying plot indicates, for each parameter, the range of values that tends to give a larger log likelihood.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1114 1130 1143 1142 1154 1159
The best log likelihood from the global search is 1159.
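A sketch of the global search, re-running mif2 from random starting points drawn from a parameter box (the box limits are assumptions); the resulting parameter vectors are then re-evaluated with the particle filter exactly as in the local search:

# Global search: random starting values drawn from a box of plausible parameters
candy_box <- rbind(
  sigma_nu = c(0.005, 0.05), sigma_omega = c(0.005, 0.05),
  mu_h = c(-9, -5), phi = c(0.9, 0.99), sigma_eta = c(0.5, 1),
  G_0 = c(-2, 2), H_0 = c(-1, 1)
)
if_global <- foreach(i = 1:candy_Nreps_global, .packages = "pomp") %dopar%
  mif2(if_local[[1]],
       params = apply(candy_box, 1, function(x) runif(1, x[1], x[2])))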
From the plot above, we can see that most parameters have roughly converged after a sufficient number of iterations.
From the analysis above, we find that the POMP model fits best among the ARIMA/SARIMA, GARCH, and POMP models, with the largest log likelihood of 1159. The POMP model also fits efficiently, converging within a fairly small amount of runtime.