Volatility is a statistical measure of the dispersion of returns for a given market index (Investopedia). In the stock market, volatility refers to the swing of prices around their mean; it measures the riskiness of an asset. Volatility is usually measured by the standard deviation of logarithmic returns (Wikipedia). To manage market risk, we need to investigate the volatility of a market before investing. In this project, we focus on the volatility of the Chinese stock market.
After the financial crisis of 2007–2008, Chinese growth remained quite robust despite the damage to many of the nation's main export markets. It appears that the Chinese stock market supported bank business and helped stabilize inflation during the recession (Burdekin, 2012). This suggests that the main index of the Chinese stock market, the Shanghai Composite Index, might be a good indicator of China's economy. The Shanghai Composite Index is the most commonly used indicator of Shanghai stock market performance (Wikipedia).
In this project, we build two different models for the volatility of the Shanghai Composite Index. We first investigate the index using a GARCH model as a benchmark model. Then, we implement the POMP model to characterize the volatility of the Shanghai Composite Index and compare the result to the GARCH model.
The index data are downloaded from https://www.investing.com/indices/shanghai-composite-historical-data. The data consist of 570 weekly average closing prices from 03/21/2010 to 04/11/2021. The prices are shown below: the left plot shows the time series of original prices, while the right one shows the time series of the logarithm of the prices. The red lines indicate the mean price and the mean log price of the index over this period, respectively.
To investigate the volatility, we need to calculate the return, which is the difference of the log of the index. Mathematically, \(y^{*}_n=\log(z_n)-\log(z_{n-1})\), where \(z_n\) refers to the index value in week \(n\). The demeaned return, obtained by subtracting the mean return from each observation, is plotted below.
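The return calculation can be sketched as below. Since the downloaded data are not reproduced here, a synthetic price series stands in for the real one; the column name `Price` is an assumption about the downloaded file's format.

```r
set.seed(531)
# Synthetic stand-in for the 570 downloaded weekly closing prices;
# the real analysis uses the Investing.com series (column name assumed).
dat <- data.frame(Price = 3000 * exp(cumsum(rnorm(570, sd = 0.02))))

log_price <- log(dat$Price)
ret       <- diff(log_price)      # y*_n = log(z_n) - log(z_{n-1})
demeaned  <- ret - mean(ret)      # demeaned return used in all later models

plot(demeaned, type = "l", main = "Demeaned weekly log-return")
```

The vector `demeaned` is the series modeled in the rest of the report.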
The demeaned log-return looks appropriate for fitting a stationary model with white noise: there is no apparent trend and no dominant peak in the plot. We can then check that this time series has negligible sample autocorrelation by plotting the ACF.
The plot above shows no significant autocorrelation at any lag greater than 0. Therefore, treating the returns as uncorrelated appears reasonable (note that uncorrelated returns may still exhibit dependence in their squares, which is exactly what GARCH-type models capture).
The generalized autoregressive conditional heteroskedasticity model, also known as the GARCH(p, q) model, is widely used in time series analysis in finance (Ionides, 2021). The GARCH(p, q) model has the form \[Y_n=\epsilon_n \sqrt{V_n}\] where \(V_n=\alpha_0 + \sum_{j=1}^{p} \alpha_j Y^2_{n-j} + \sum_{k=1}^{q} \beta_k V_{n-k}\) and \(\epsilon_{1:N}\) is white noise. Although the GARCH(1, 1) model is a popular choice for time series analysis (Cowpertwait and Metcalfe, 2009), we still want to evaluate several different GARCH models and choose the best-fitting one.
To decide the values of \(p\) and \(q\) for the GARCH(p, q) model, we start by tabulating the Akaike Information Criterion (AIC). A lower AIC indicates a better trade-off between goodness of fit and model complexity.
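The tabulation can be sketched along the following lines; the use of `tseries::garch` and the penalty \(2(p+q+1)\) are assumptions about how the table was produced, and `demeaned` denotes the demeaned log-return. Note that `tseries::garch` takes `order = c(GARCH order, ARCH order)`, i.e. `c(q, p)` in the notation above.

```r
# Sketch of the GARCH(p, q) AIC table; `demeaned` is the demeaned log-return.
# tseries::garch uses order = c(q, p) relative to the V_n formula above;
# AIC = -2*logLik + 2*(p + q + 1), counting alpha_0, alphas, and betas.
library(tseries)
library(knitr)

garch_aic <- function(data, p, q) {
  fit <- tryCatch(garch(data, order = c(q, p), trace = FALSE),
                  error = function(e) NULL)
  if (is.null(fit)) return(NA)
  -2 * as.numeric(logLik(fit)) + 2 * (p + q + 1)
}

aic_table <- outer(1:5, 1:5,
                   Vectorize(function(p, q) garch_aic(demeaned, p, q)))
dimnames(aic_table) <- list(paste0("p", 1:5), paste0("q", 1:5))
kable(round(aic_table, 2))
```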
|    | q1       | q2       | q3       | q4       | q5       |
|----|----------|----------|----------|----------|----------|
| p1 | -2528.11 | -2457.28 | -2512.66 | -2503.48 | -2483.75 |
| p2 | -2520.71 | -2457.67 | -2462.48 | -2483.95 | -2462.72 |
| p3 | -2488.26 | -2452.10 | -2463.09 | -2482.35 | -2462.77 |
| p4 | -2481.85 | -2446.72 | -2458.70 | -2469.56 | -2462.96 |
| p5 | -2433.96 | -2456.40 | -2454.19 | -2464.62 | -2477.80 |
The lowest AIC value in the table above is -2528.11, obtained by the GARCH(1, 1) model. This coincidentally agrees with the most popular choice of GARCH model. Therefore, we choose the GARCH(1, 1) model without further formal statistical tests.
We evaluate the performance of our model by checking the model summary, a QQ-plot of the residuals, and the residual autocorrelation plot.
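The fit itself uses `fGarch::garchFit`, as the Call line in the summary output below confirms; `demeaned` denotes the demeaned weekly log-return. A sketch of the fit and the diagnostic plots:

```r
# Fit GARCH(1, 1) to the demeaned log-return (matches the Call in the summary)
library(fGarch)
fit <- garchFit(~ garch(1, 1), data = demeaned, trace = FALSE)
summary(fit)                         # coefficients, residual tests, AIC/BIC

res <- residuals(fit, standardize = TRUE)
qqnorm(res); qqline(res)             # normality check on residuals
acf(res)                             # residual autocorrelation check
```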
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(1, 1), data = demeaned, trace = F)
##
## Mean and Variance Equation:
## data ~ garch(1, 1)
## [data = demeaned]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 beta1
## -1.3457e-18 3.2484e-05 1.4342e-01 8.2193e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu -1.346e-18 9.707e-04 0.000 1.0000
## omega 3.248e-05 1.831e-05 1.774 0.0761 .
## alpha1 1.434e-01 3.857e-02 3.719 0.0002 ***
## beta1 8.219e-01 4.994e-02 16.458 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 1269.577 normalized: 2.231243
##
## Description:
## Wed Apr 21 17:40:08 2021 by user: HaoRong
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 29.46482 3.997562e-07
## Shapiro-Wilk Test R W 0.9882871 0.0001608002
## Ljung-Box Test R Q(10) 6.734039 0.750292
## Ljung-Box Test R Q(15) 8.039864 0.9221745
## Ljung-Box Test R Q(20) 10.70246 0.9535767
## Ljung-Box Test R^2 Q(10) 3.186122 0.9766983
## Ljung-Box Test R^2 Q(15) 5.322779 0.9890227
## Ljung-Box Test R^2 Q(20) 10.05697 0.9671272
## LM Arch Test R TR^2 4.300299 0.9773878
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -4.448425 -4.417889 -4.448523 -4.436510
The summary above gives the fitted GARCH(1, 1) model \(V_n = 3.25\times 10^{-5} + 0.143\, Y^2_{n-1} + 0.822\, V_{n-1}\). The summary also shows that the log-likelihood of this model is 1269.58. This value serves as the benchmark of our analysis: we will compare the log-likelihood of the POMP model against it later.
Then, we draw the QQ-plot for residuals as below:
The QQ-plot suggests that the residuals of the GARCH(1, 1) model have a heavy-tailed distribution, which violates the Gaussian assumption on the conditional distribution of the GARCH model. A conditional distribution with heavier tails, such as a Student-t, might describe the residuals better.
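One way to probe this, sketched below under the assumption that `demeaned` holds the demeaned log-return, is to refit the model with a Student-t conditional distribution, which `fGarch` supports via `cond.dist = "std"`, and compare log-likelihoods.

```r
# Possible remedy for the heavy tails: refit GARCH(1, 1) with Student-t
# innovations and compare against the Gaussian fit via log-likelihood.
library(fGarch)
fit_t <- garchFit(~ garch(1, 1), data = demeaned,
                  cond.dist = "std", trace = FALSE)
summary(fit_t)   # the estimated `shape` parameter is the t degrees of freedom
```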
Finally, we check the autocorrelation plot to determine whether the residuals are uncorrelated.
Since there is no significant correlation other than lag 0, we conclude that residuals are uncorrelated.
Then we use the POMP framework presented in the lectures to analyze the volatility of the SSE Composite Index. The equations and notation for this POMP model are adopted from Breto (2014). We denote \(H_n=\log(\sigma^2_n)=2\log(\sigma_n)\), and the model is as follows:
\[\begin{align} Y_n &= \exp(H_n/2)\, \epsilon_n \\ H_n &= \mu_h (1-\phi) + \phi H_{n-1} + \beta_{n-1} R_n \exp(-H_{n-1}/2) + \omega_n \\ G_n &= G_{n-1}+\nu_n \end{align}\] where \[\begin{align} \beta_n &= Y_n \sigma_{\eta} \sqrt{1-\phi^2} \\ \sigma_{\omega} &= \sigma_{\eta} \sqrt{1-\phi^2} \sqrt{1-R_n^2} \\ \epsilon_n &\overset{i.i.d.}{\sim} N(0, 1) \\ \nu_n &\overset{i.i.d.}{\sim} N(0, \sigma_{\nu}^2) \\ \omega_n &\overset{i.i.d.}{\sim} N(0, \sigma_{\omega}^2) \\ R_n &= \frac{\exp(2G_n)-1}{\exp(2G_n)+1} \end{align}\] The motivation for applying this POMP model to the index is financial leverage: the correlation between the return on day \(n-1\) and the change in log volatility from day \(n-1\) to day \(n\), captured here by \(R_n\).
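The model above can be encoded as a pomp object roughly as follows. This is a sketch in the style of the course notes; the object and variable names (`sse.filt`, `demeaned`, etc.) are assumptions, and \(R_n = \tanh(G_n)\) lets the Csnippets use `tanh` directly.

```r
# Sketch of the stochastic-volatility POMP model (Breto 2014);
# `demeaned` is the demeaned weekly log-return. Note R_n = tanh(G_n).
library(pomp)

sse_statenames <- c("H", "G", "Y_state")
sse_paramnames <- c("sigma_nu", "mu_h", "phi", "sigma_eta", "G_0", "H_0")

rproc1 <- "
  double beta, omega, nu;
  omega = rnorm(0, sigma_eta * sqrt(1 - phi*phi) * sqrt(1 - tanh(G)*tanh(G)));
  nu    = rnorm(0, sigma_nu);
  G    += nu;
  beta  = Y_state * sigma_eta * sqrt(1 - phi*phi);
  H     = mu_h*(1 - phi) + phi*H + beta*tanh(G)*exp(-H/2) + omega;
"
rproc_filt <- paste(rproc1, "Y_state = covaryt;")   # feed data when filtering

sse.filt <- pomp(
  data       = data.frame(y = demeaned, time = 1:length(demeaned)),
  statenames = sse_statenames, paramnames = sse_paramnames,
  times = "time", t0 = 0,
  covar = covariate_table(time = 0:length(demeaned),
                          covaryt = c(0, demeaned), times = "time"),
  rmeasure = Csnippet("y = Y_state;"),
  dmeasure = Csnippet("lik = dnorm(y, 0, exp(H/2), give_log);"),
  rprocess = discrete_time(step.fun = Csnippet(rproc_filt), delta.t = 1),
  rinit    = Csnippet("G = G_0; H = H_0; Y_state = rnorm(0, exp(H/2));"),
  partrans = parameter_trans(log = c("sigma_eta", "sigma_nu"), logit = "phi")
)
```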
Let's run the particle filter (`pfilter`) to obtain an initial log-likelihood estimate before any parameter search.
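A filtering pass can be sketched as below; `sse.filt` is the assumed name of the pomp object encoding the model above, and the parameter values are illustrative guesses rather than fitted values.

```r
# Initial log-likelihood estimate from the particle filter at guessed
# parameter values (all values below are illustrative, not fitted).
library(pomp)
params_test <- c(sigma_nu = 0.01, mu_h = -0.25, phi = 0.98,
                 sigma_eta = 0.93, G_0 = 0, H_0 = 0)
pf <- replicate(10, pfilter(sse.filt, params = params_test, Np = 1000))
logmeanexp(sapply(pf, logLik), se = TRUE)  # averaged estimate + standard error
```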
## user system elapsed
## 0.0 0.0 16.2
## se
## -800.60730780 0.03561261
Then, we perform a local search for the maximum log-likelihood using the `mif2` function in the pomp package. A CSV file is created to store the likelihood results.
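The local search can be sketched with iterated filtering as below. The object `sse.filt`, the starting vector `params_test`, the run sizes, and the perturbation scales are all illustrative assumptions; a real run would use larger `Np` and `Nmif`.

```r
# Sketch of the local search via iterated filtering (mif2).
library(pomp)
library(foreach)
library(doParallel)
registerDoParallel()      # register a backend so %dopar% runs in parallel

r.if1 <- foreach(i = 1:20, .packages = "pomp", .combine = rbind) %dopar% {
  mf <- mif2(sse.filt, params = params_test,
             Np = 1000, Nmif = 100,
             cooling.fraction.50 = 0.5,
             rw.sd = rw.sd(sigma_nu = 0.02, mu_h = 0.02, phi = 0.02,
                           sigma_eta = 0.02,
                           G_0 = ivp(0.1), H_0 = ivp(0.1)))
  ll <- logmeanexp(replicate(10, logLik(pfilter(mf, Np = 1000))), se = TRUE)
  c(logLik = ll[1], logLik_se = ll[2], coef(mf))
}
write.table(r.if1, file = "Shanghai_params.csv", append = TRUE,
            col.names = FALSE, row.names = FALSE)   # accumulate results
```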
## user system elapsed
## 1031.38 0.81 1034.03
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1213 1220 1226 1227 1233 1244
From the plot above, we see that the parameter \(\phi\) is stable around 1, and the maximum log-likelihood from the local search is 1244. The other parameters fluctuate within certain ranges. Next, we conduct a global search with random starting values to find the maximized likelihood for this POMP model.
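The global search can be sketched as below: random starting values are drawn from a box of plausible parameter ranges (the box limits are illustrative guesses), and each start seeds a fresh `mif2` run as in the local search.

```r
# Sketch of the global search: random starts from a parameter box.
sse_box <- rbind(
  sigma_nu  = c(0.005, 0.05),
  mu_h      = c(-1, 0),
  phi       = c(0.95, 0.99),
  sigma_eta = c(0.5, 1),
  G_0       = c(-2, 2),
  H_0       = c(-1, 1)
)
starts <- apply(sse_box, 1, function(rng) runif(20, rng[1], rng[2]))
# each row starts[i, ] is then passed as `params` to mif2, and the resulting
# log-likelihoods are evaluated with pfilter, exactly as in the local search
```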
## user system elapsed
## 2561.27 1.11 2566.47
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1187 1228 1236 1239 1260 1264
From the summary, we see that the maximum log-likelihood using the global search is 1264, which improves on the result of the local parameter search.
Since the parameter \(\phi\) appears in several terms of our POMP model, we further investigate the profile likelihood of \(\phi\) to see whether \(\phi\) should be close to 1, as the local search suggested. Another reason for profiling \(\phi\) is the apparent strong positive relationship between \(\phi\) and the log-likelihood: as \(\phi\) increases toward 1, the log-likelihood also increases.
The plot above contradicts our assumption: it suggests that the maximum log-likelihood is attained at a value of \(\phi\) strictly below 1. As \(\phi\) approaches 1, the likelihood estimates become unstable.
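The design points behind such a profile can be generated with pomp's `profile_design`, fixing \(\phi\) on a grid while the remaining parameters are drawn at random; the ranges below are illustrative assumptions.

```r
# Sketch of the phi profile design: phi is fixed on a grid, the other
# parameters get random starts (ranges are illustrative) and are re-maximized
# by mif2 runs whose rw.sd omits phi, so phi stays fixed during the search.
library(pomp)
profile_pts <- profile_design(
  phi   = seq(0.90, 0.999, length.out = 20),
  lower = c(sigma_nu = 0.005, mu_h = -1, sigma_eta = 0.5, G_0 = -2, H_0 = -1),
  upper = c(sigma_nu = 0.05,  mu_h = 0,  sigma_eta = 1,   G_0 = 2,  H_0 = 1),
  nprof = 10
)
# the profile log-likelihood at each grid value of phi is the maximum over
# the nprof mif2 runs started from the corresponding rows of profile_pts
```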
In this project, we applied two statistical models to analyze the volatility of the SSE Composite Index. First, we fit a GARCH(1, 1) model as our benchmark; its estimated coefficients (\(\alpha_1 + \beta_1 \approx 0.97\)) indicate highly persistent volatility. Then, we performed a simulation study using the POMP model. Comparing the two, the POMP model attains a lower log-likelihood than the GARCH benchmark. This could be due to limits on time and computational resources: the final POMP model in this project may not have optimal parameters.
For future work, these preliminary POMP results could serve as starting points for longer runs with more computational resources, which should yield a better fit. Another possible improvement is to add more features to the POMP model to describe the process in a more sophisticated way.
Description of individual contributions removed for anonymity.