The CSI 300 stock market index (CSI 300) has been compiled by the China Securities Index Company since April 8, 2005, and is intended to replicate the performance of the top 300 stocks traded on the Shanghai Stock Exchange and the Shenzhen Stock Exchange. It is considered a blue-chip index for the Chinese stock exchanges and one of the most representative indices for emerging markets worldwide.
This project investigates the financial volatility of the CSI 300 index and the implementation of a POMP model for it. A benchmark model, \(ARMA(p_a,q_a)-GARCH(p_g,q_g)\), selected by diagnostic tests and plots, will be compared with the POMP model, and the results will be discussed.
Since log-returns are preferred in financial data analysis, the weekly close prices will be transformed into log-returns to eliminate trends in the data and (hopefully) achieve stationarity. Daily close prices are not used because they contain too much noise and would greatly increase the sample size. Note that this is the raw close price, not the adjusted close price, which also takes splits and dividends into account.
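As a quick sketch of this transformation (shown here in Python with hypothetical prices, purely for illustration), the log-return series is simply the first difference of the log prices:

```python
import numpy as np

# Hypothetical weekly close prices, for illustration only.
close = np.array([3500.0, 3550.0, 3490.0, 3602.0, 3575.0])

# Log-return: r_n = log(P_n) - log(P_{n-1}); differencing on the log scale
# removes multiplicative trends and (hopefully) yields a stationary series.
log_return = np.diff(np.log(close))
```

Each log-return is also approximately the percentage change for small moves, which is why it is the standard scale for volatility modeling.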
Here we simply take \(ARMA(p_a,q_a)-GARCH(p_g,q_g)\) as our benchmark model. This model captures both the ARMA-type trend and the local volatility dependence. The \(ARMA(p_a,q_a)-GARCH(p_g,q_g)\) model can be expressed as:
\[ \phi(B)X_n = \psi(B)\epsilon_n, \] where \(\phi(B) = 1-\phi_1B-\phi_2B^2 - \dots - \phi_{p_a}B^{p_a}\), \(\psi(B) = 1+ \psi_1B + \psi_2B^2 + \dots + \psi_{q_a}B^{q_a}\), and \[ \epsilon_n = \sigma_n\delta_n, \] where \(\sigma^2_n = \alpha_0 + \sum_{i=1}^{p_g} \alpha_i\epsilon_{n-i}^2 + \sum_{j=1}^{q_g} \beta_j\sigma^2_{n-j}\). Here \(\delta_n\) is IID white noise. This project covers both Gaussian white noise, \(\delta_n \overset{iid}\sim N(0,1)\), and t-distributed white noise, \(\delta_n \overset{iid}\sim t_\nu\) standardized to unit variance, for our ARMA-GARCH model.
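To make the two recursions concrete, here is a minimal simulation sketch of an \(ARMA(1,1)-GARCH(1,1)\) process with Gaussian noise (the parameter values below are illustrative assumptions, not the fitted ones):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values (assumptions, not fitted estimates):
mu, phi1, psi1 = 0.0, 0.2, 0.3           # ARMA(1,1) part
alpha0, alpha1, beta1 = 1e-5, 0.1, 0.85  # GARCH(1,1) part

n = 500
X = np.zeros(n)
eps = np.zeros(n)
sigma2 = np.full(n, alpha0 / (1 - alpha1 - beta1))  # start at unconditional variance

for t in range(1, n):
    # GARCH(1,1): sigma_n^2 = alpha0 + alpha1*eps_{n-1}^2 + beta1*sigma_{n-1}^2
    sigma2[t] = alpha0 + alpha1 * eps[t-1]**2 + beta1 * sigma2[t-1]
    # eps_n = sigma_n * delta_n with delta_n ~ iid N(0,1)
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    # ARMA(1,1): X_n = mu + phi1*X_{n-1} + eps_n + psi1*eps_{n-1}
    X[t] = mu + phi1 * X[t-1] + eps[t] + psi1 * eps[t-1]
```

Note that \(\alpha_1+\beta_1<1\) is required for the unconditional variance \(\alpha_0/(1-\alpha_1-\beta_1)\) to exist.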
In general, there are two ways to fit an ARMA-GARCH model. The first is to fit the ARMA part and then use its residuals to fit the GARCH part; the second is to fit both parts jointly. The joint fit is usually preferred since it is statistically more coherent, but it costs more computation, while the two-step approach tends to be more robust. Thus, we use the two-step method for simplicity. Let us check the AIC table below first.
| | MA0 | MA1 | MA2 | MA3 |
|---|---|---|---|---|
| AR0 | -2750.491 | -2750.059 | -2754.433 | -2755.913 |
| AR1 | -2750.365 | -2762.168 | -2760.491 | -2759.730 |
| AR2 | -2755.679 | -2760.521 | -2759.158 | -2757.206 |
| AR3 | -2756.775 | -2759.671 | -2757.207 | -2755.865 |
As we can see, the AIC criterion prefers the \(ARMA(1,1)\) model, and the table shows few numerical inconsistencies or convergence problems. Thus, we pick \(ARMA(1,1)\) for the ARMA part.
For the GARCH part, a similar process could be applied, but it is common practice not to go beyond \(GARCH(1,1)\), so we simply choose \(GARCH(1,1)\). The t-distributed and Gaussian white noise versions are then compared in the table below. The t-distributed white noise clearly outperforms the Gaussian white noise in terms of log-likelihood, AIC and BIC. Our benchmark model is therefore \(ARMA(1,1)-GARCH(1,1)\) with t-distributed white noise, with parameter set \((\mu, \phi_1, \psi_1, \omega,\alpha_1,\beta_1,\nu)=(0.00014,-0.82740^*,0.78740^*,0.00002,0.09694^*,0.89620^*, 8.15800^*)\), where \(^*\) marks coefficients that pass their t-tests.
| Diagnosis | ARMA_GARCH_norm | ARMA_GARCH_t |
|---|---|---|
| T-test for mu | ✕ | ✕ |
| T-test for ar1 | ✓ | ✓ |
| T-test for ma1 | ✓ | ✓ |
| T-test for omega | ✓ | ✕ |
| T-test for alpha1 | ✓ | ✓ |
| T-test for beta1 | ✓ | ✓ |
| T-test for shape | NA | ✓ |
| Jarque-Bera | ✕ | ✓ |
| Shapiro-Wilk | ✕ | ✓ |
| Ljung-Box-R | ✓ | ✓ |
| Ljung-Box-R^2 | ✓ | ✓ |
| LM Arch | ✓ | ✓ |
| Log Likelihood | 1458.998 | 1469.034 |
| AIC | -3.9323 | -3.9568 |
| BIC | -3.8949 | -3.9132 |
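The likelihood comparison between Gaussian and t innovations can also be sketched directly. The function below (a simplified stand-in for the full ARMA-GARCH fit, with assumed parameter values and toy residuals) evaluates the conditional GARCH(1,1) log-likelihood under both distributions; on heavy-tailed residuals the standardized-t version typically scores higher, as in the table above:

```python
import numpy as np
from math import lgamma, log, pi

def garch11_loglik(eps, alpha0, alpha1, beta1, nu=None):
    """Conditional log-likelihood of GARCH(1,1) residuals eps.
    nu=None gives Gaussian delta_n; otherwise a standardized t with nu > 2 dof."""
    n = len(eps)
    sigma2 = np.empty(n)
    sigma2[0] = eps.var()  # initialize the recursion at the sample variance
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * eps[t-1]**2 + beta1 * sigma2[t-1]
    z = eps / np.sqrt(sigma2)
    if nu is None:
        # Gaussian log-density of eps_t ~ N(0, sigma2_t)
        ll = -0.5 * (log(2 * pi) + np.log(sigma2) + z**2)
    else:
        # standardized t log-density (unit variance), valid for nu > 2
        c = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(pi * (nu - 2))
        ll = c - 0.5 * np.log(sigma2) - (nu + 1) / 2 * np.log1p(z**2 / (nu - 2))
    return ll.sum()

# Toy heavy-tailed residuals (assumed scale and parameters, for illustration).
rng = np.random.default_rng(1)
eps = 0.03 * np.sqrt(3 / 5) * rng.standard_t(5, size=1000)
ll_norm = garch11_loglik(eps, 1e-5, 0.1, 0.85)
ll_t = garch11_loglik(eps, 1e-5, 0.1, 0.85, nu=8)
```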
Here, we will use the same model from Breto (2014) as shown in the lecture slides. The model states that \(R_n\) is a random walk on a transformed scale: \[ R_n=\frac{\exp\{2G_n\}-1}{\exp\{2G_n\}+1}, \] where \(\{G_n\}\) is a Gaussian random walk. The model follows the formulation in the lecture: \[ \begin{align} Y_n &= \exp\{H_n/2\}\epsilon_n, \\ H_n &= \mu_h(1-\phi) + \phi H_{n-1} + \beta_{n-1}R_n\exp\{-H_{n-1}/2\} + \omega_n, \\ G_n &= G_{n-1}+\nu_n, \end{align} \] where \(\beta_n = Y_n \sigma_\eta \sqrt{1-\phi^2}\), \(\epsilon_n \overset{iid}\sim N(0,1)\), \(\nu_n \overset{iid}\sim N(0,\sigma^2_\nu)\), and \(\omega_n \overset{iid}\sim N(0,\sigma^2_\omega)\). Here, \(H_n\) is the log volatility. Moreover, we use the state variable \(X_n=(G_n,H_n,Y_n)\), and filter particle \(j\) at time \(n-1\) is denoted by \[ X_{n-1,j}^F=(G_{n-1,j}^F,H_{n-1,j}^F,y_{n-1}). \] The prediction particles at time \(n\) follow \[ (G_{n,j}^P,H_{n,j}^P)\sim f_{G_n,H_n|G_{n-1},H_{n-1},Y_{n-1}}(g_n,h_n|G_{n-1,j}^F,H_{n-1,j}^F,y_{n-1}), \] with corresponding weight \(w_{n,j}=f_{Y_n|G_n,H_n}(y_n|G_{n,j}^P,H_{n,j}^P)\).
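The propagate-weight-resample recursion above can be sketched as a bootstrap particle filter. The sketch below is plain Python rather than the pomp package used in the project; it assumes, following the lecture notes, that \(\sigma_\omega = \sigma_\eta\sqrt{1-\phi^2}\sqrt{1-R_n^2}\), and all parameter defaults are placeholders rather than fitted values:

```python
import numpy as np

def pfilter_sv(y, J=500, sigma_nu=0.01, mu_h=-7.0, phi=0.98,
               sigma_eta=0.9, G0=0.0, H0=-7.0, seed=0):
    """Bootstrap particle filter log-likelihood for the leverage SV model."""
    rng = np.random.default_rng(seed)
    G = np.full(J, float(G0))
    H = np.full(J, float(H0))
    loglik = 0.0
    y_prev = 0.0  # stand-in for y_0; the real analysis conditions on the data
    for y_n in y:
        # propagate: G_n = G_{n-1} + nu_n, R_n = tanh(G_n)
        G = G + rng.normal(0.0, sigma_nu, J)
        R = np.tanh(G)
        # H_n given (G_n, H_{n-1}, y_{n-1})
        beta = y_prev * sigma_eta * np.sqrt(1.0 - phi**2)
        sigma_om = sigma_eta * np.sqrt((1.0 - phi**2) * (1.0 - R**2))
        H = (mu_h * (1.0 - phi) + phi * H
             + beta * R * np.exp(-H / 2.0) + rng.normal(0.0, sigma_om))
        # weight by the measurement density: Y_n | H_n ~ N(0, exp(H_n))
        w = np.exp(-0.5 * (np.log(2.0 * np.pi) + H + y_n**2 * np.exp(-H)))
        loglik += np.log(w.mean())
        # multinomial resampling of the particles
        idx = rng.choice(J, size=J, p=w / w.sum())
        G, H = G[idx], H[idx]
        y_prev = y_n
    return loglik

# toy data: white noise of roughly the right scale for weekly log-returns
rng = np.random.default_rng(42)
y = 0.03 * rng.standard_normal(100)
ll = pfilter_sv(y, J=300)
```

The IF2 algorithm wraps this filter in iterated parameter perturbations; the filter alone already yields the log-likelihood estimates tabulated later.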
The code and simulations were run on the Great Lakes cluster. The results can be reproduced using finalp.r, finalp.sbat and CSI500w.rda, although this may take some time. The diagnostic plots produced by the IF2 algorithm are shown below:
The diagnostic plots produced by the global likelihood maximization are as follows:
As these plots show, the log-likelihood, \(\sigma_\nu\), \(\phi\) and \(G_0\) exhibit convergence, but there are problems with the convergence of \(\sigma_\eta\), \(\mu_h\) and \(H_0\). In general, I would attribute these to problems with this POMP model itself, which might be resolved by a different POMP model. The log-likelihood values may suggest the same conclusion; the log-likelihood table is given below. Moreover, extracting the parameters found by the global likelihood maximization, the parameter set for the POMP model is \((\sigma_\nu,\mu_h,\phi,\sigma_\eta,G_0,H_0)=(0.0000155, -6.959008, 0.9831476, 0.9128984, 0.2697264, -7.109635)\), corresponding to a log-likelihood of 1467 with standard error 0.06871633.
| Methods | Min | 1st Quartile | Median | Mean | 3rd Quartile | Max |
|---|---|---|---|---|---|---|
| IF2 Algorithm | 1429 | 1438 | 1443 | 1443 | 1449 | 1459 |
| Global Maximization | 1442 | 1462 | 1466 | 1463 | 1466 | 1467 |
Based on the new parameter set, we can present simulated values produced by our model. Here we simply provide 8 simulation paths.
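A sketch of how such simulation paths can be generated (plain Python instead of pomp's simulate, plugging in the fitted parameter values quoted above; the path length of 300 is an arbitrary choice, and \(\sigma_\omega = \sigma_\eta\sqrt{1-\phi^2}\sqrt{1-R_n^2}\) is taken from the lecture notes):

```python
import numpy as np

# Fitted parameters from the global likelihood maximization (quoted above).
sigma_nu, mu_h, phi = 1.55e-5, -6.959008, 0.9831476
sigma_eta, G0, H0 = 0.9128984, 0.2697264, -7.109635

def simulate_sv(N, seed):
    """Simulate one path of observations Y_n from the fitted SV model."""
    rng = np.random.default_rng(seed)
    G, H, y = G0, H0, 0.0
    path = np.empty(N)
    for n in range(N):
        G += rng.normal(0.0, sigma_nu)          # Gaussian random walk G_n
        R = np.tanh(G)                          # leverage R_n on [-1, 1]
        beta = y * sigma_eta * np.sqrt(1.0 - phi**2)
        sigma_om = sigma_eta * np.sqrt((1.0 - phi**2) * (1.0 - R**2))
        H = (mu_h * (1.0 - phi) + phi * H
             + beta * R * np.exp(-H / 2.0) + rng.normal(0.0, sigma_om))
        y = np.exp(H / 2.0) * rng.standard_normal()  # Y_n = exp(H_n/2) eps_n
        path[n] = y
    return path

# 8 simulation paths, one per seed.
paths = np.array([simulate_sv(300, seed=s) for s in range(8)])
```

Because \(\phi\) is close to 1, the log volatility \(H_n\) is highly persistent, which is what produces the volatility clustering visible in the simulated paths.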
Based on log-likelihood and AIC, there is not much difference between the benchmark model and the POMP model. Their log-likelihoods are similar, and since the two models have similar numbers of parameters, their AIC values are also close (ARMA-GARCH: -2924; POMP: -2922). In practice, this may suggest that they are both good models, or both bad ones.
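As a quick sanity check on those AIC figures, using \(AIC = 2k - 2\ell\) with \(k=7\) parameters for the ARMA-GARCH fit and \(k=6\) for the POMP fit:

```python
# AIC = 2k - 2*loglik, using the log-likelihoods reported above.
k_garch, ll_garch = 7, 1469.034  # (mu, phi1, psi1, omega, alpha1, beta1, nu)
k_pomp, ll_pomp = 6, 1467.0      # (sigma_nu, mu_h, phi, sigma_eta, G0, H0)

aic_garch = 2 * k_garch - 2 * ll_garch  # about -2924
aic_pomp = 2 * k_pomp - 2 * ll_pomp     # -2922
```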
There are still some ways to improve the ARMA-GARCH model. For example, we could fit an ARMA-APGARCH model, since the left and right tails of the residuals appear asymmetric, and alternative residual distributions, such as the generalized error distribution, could also be used. In summary, all these models explain, in different ways, the self-exciting and locally dependent volatility of financial data.
For the POMP model, aside from the convergence problems with some parameters, I think prediction of future values remains problematic, since the simulated paths differ across random seeds. But the model does provide some novel ideas about modeling volatility and financial data. As we can see from the final simulation, the volatility does cluster, much like real-world financial data. To improve the fit, we could also try other POMP models on these data.
[1] CSI 300 stock market index data. Retrieved from https://www.investing.com/indices/csi300-historical-data.
[2] CSI 300 Index. Retrieved from https://en.wikipedia.org/wiki/CSI_300_Index.
[3] Ionides, E. L. (2020). Case study: POMP modeling to investigate financial volatility. Retrieved from https://ionides.github.io/531w20/.
[4] Ionides, E. L. (2020). Extending the ARMA model: Seasonality and trend. Retrieved from https://ionides.github.io/531w20/.
[5] Breto, C. (2014). On idiosyncratic stochasticity of financial leverage effects. Statistics & Probability Letters 91: 20-26.
[6] Ionides, E. L., Nguyen, D., Atchadé, Y., Stoev, S. and King, A. A. (2015). Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. Proceedings of the National Academy of Sciences of the U.S.A. 112(3): 719-724.