Volatility is a statistical measure of the dispersion of returns for a given market index (Investopedia). In the stock market, volatility refers to the swing of prices around their mean; it measures the riskiness of an asset. Volatility is usually measured by the standard deviation of logarithmic returns (Wikipedia). To manage market risk, we need to investigate the volatility of a market before investing. In this project, we focus on the volatility of the Chinese stock market.
After the financial crisis of 2007–2008, Chinese growth remained quite robust despite the damage to many of the nation's main export markets. It appears that the Chinese stock market supported bank business and helped stabilize inflation during the recession (Burdekin, 2012). This suggests that the main index of the Chinese stock market, the Shanghai Composite Index, might be a good indicator of China's economy. The Shanghai Composite Index is the most commonly used indicator of Shanghai stock market performance (Wikipedia).
In this project, we build two different models for the volatility of the Shanghai Composite Index. We first investigate the index using a GARCH model as a benchmark model. Then, we implement the POMP model to characterize the volatility of the Shanghai Composite Index and compare the result to the GARCH model.
The index data are downloaded from https://www.investing.com/indices/shanghai-composite-historical-data. The data consist of 570 weekly average closing prices from 03/21/2010 to 04/11/2021. The prices are shown below: the left plot shows the time series of original prices, while the right one shows the time series of the logarithm of the prices. The red lines indicate the mean price and the mean log price of the index over this period, respectively.
To investigate the volatility, we need to calculate the return, which is the difference of the log of the index. Mathematically, \(y^{*}_n=\log(z_n)-\log(z_{n-1})\), where \(z_n\) refers to the index value in week \(n\). The demeaned return, obtained by subtracting the mean return from each observation, is plotted below.
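The return calculation can be sketched as below. Since the downloaded data are not reproduced here, a synthetic price series stands in for the real one; the column name `Price` is an assumption about the downloaded file's format.

```r
set.seed(531)
# Synthetic stand-in for the 570 downloaded weekly closing prices;
# the real analysis uses the Investing.com series (column name assumed).
dat <- data.frame(Price = 3000 * exp(cumsum(rnorm(570, sd = 0.02))))

log_price <- log(dat$Price)
ret       <- diff(log_price)      # y*_n = log(z_n) - log(z_{n-1})
demeaned  <- ret - mean(ret)      # demeaned return used in all later models

plot(demeaned, type = "l", main = "Demeaned weekly log-return")
```

The vector `demeaned` is the series modeled in the rest of the report.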
The demeaned log-return looks appropriate for fitting a stationary model with white noise: there is no apparent trend and no dominant peak in the plot. We can then check that this time series has negligible sample autocorrelation by plotting the ACF.
The plot above shows no significant autocorrelation at any lag greater than 0. Therefore, treating the returns as uncorrelated appears reasonable (note that uncorrelated returns may still exhibit dependence in their squares, which is exactly what GARCH-type models capture).
The generalized autoregressive conditional heteroskedasticity model, also known as the GARCH(p, q) model, is widely used in time series analysis in finance (Ionides, 2021). The GARCH(p, q) model has the form \[Y_n=\epsilon_n \sqrt{V_n}\] where \(V_n=\alpha_0 + \sum_{j=1}^{p} \alpha_j Y^2_{n-j} + \sum_{k=1}^{q} \beta_k V_{n-k}\) and \(\epsilon_{1:N}\) is white noise. Although the GARCH(1, 1) model is a popular choice for time series analysis (Cowpertwait and Metcalfe, 2009), we still want to evaluate several different GARCH models and choose the best-fitting one.
To decide the values of \(p\) and \(q\) for the GARCH(p, q) model, we start by tabulating the Akaike Information Criterion (AIC). A lower AIC indicates a better trade-off between goodness of fit and model complexity.
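The tabulation can be sketched along the following lines; the use of `tseries::garch` and the penalty \(2(p+q+1)\) are assumptions about how the table was produced, and `demeaned` denotes the demeaned log-return. Note that `tseries::garch` takes `order = c(GARCH order, ARCH order)`, i.e. `c(q, p)` in the notation above.

```r
# Sketch of the GARCH(p, q) AIC table; `demeaned` is the demeaned log-return.
# tseries::garch uses order = c(q, p) relative to the V_n formula above;
# AIC = -2*logLik + 2*(p + q + 1), counting alpha_0, alphas, and betas.
library(tseries)
library(knitr)

garch_aic <- function(data, p, q) {
  fit <- tryCatch(garch(data, order = c(q, p), trace = FALSE),
                  error = function(e) NULL)
  if (is.null(fit)) return(NA)
  -2 * as.numeric(logLik(fit)) + 2 * (p + q + 1)
}

aic_table <- outer(1:5, 1:5,
                   Vectorize(function(p, q) garch_aic(demeaned, p, q)))
dimnames(aic_table) <- list(paste0("p", 1:5), paste0("q", 1:5))
kable(round(aic_table, 2))
```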
|    | q1       | q2       | q3       | q4       | q5       |
|----|----------|----------|----------|----------|----------|
| p1 | -2528.11 | -2457.28 | -2512.66 | -2503.48 | -2483.75 |
| p2 | -2520.71 | -2457.67 | -2462.48 | -2483.95 | -2462.72 |
| p3 | -2488.26 | -2452.10 | -2463.09 | -2482.35 | -2462.77 |
| p4 | -2481.85 | -2446.72 | -2458.70 | -2469.56 | -2462.96 |
| p5 | -2433.96 | -2456.40 | -2454.19 | -2464.62 | -2477.80 |
The lowest AIC value in the table above is -2528.11, obtained by the GARCH(1, 1) model. This coincidentally agrees with the most popular choice of GARCH model. Therefore, we choose the GARCH(1, 1) model without further formal statistical tests.
We evaluate the performance of our model by checking the model summary, a QQ-plot of the residuals, and the residual autocorrelation plot.
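The fit itself uses `fGarch::garchFit`, as the Call line in the summary output below confirms; `demeaned` denotes the demeaned weekly log-return. A sketch of the fit and the diagnostic plots:

```r
# Fit GARCH(1, 1) to the demeaned log-return (matches the Call in the summary)
library(fGarch)
fit <- garchFit(~ garch(1, 1), data = demeaned, trace = FALSE)
summary(fit)                         # coefficients, residual tests, AIC/BIC

res <- residuals(fit, standardize = TRUE)
qqnorm(res); qqline(res)             # normality check on residuals
acf(res)                             # residual autocorrelation check
```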
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(1, 1), data = demeaned, trace = F)
##
## Mean and Variance Equation:
## data ~ garch(1, 1)
## [data = demeaned]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 beta1
## -1.3457e-18 3.2484e-05 1.4342e-01 8.2193e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu -1.346e-18 9.707e-04 0.000 1.0000
## omega 3.248e-05 1.831e-05 1.774 0.0761 .
## alpha1 1.434e-01 3.857e-02 3.719 0.0002 ***
## beta1 8.219e-01 4.994e-02 16.458 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 1269.577 normalized: 2.231243
##
## Description:
## Wed Apr 21 17:40:08 2021 by user: HaoRong
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 29.46482 3.997562e-07
## Shapiro-Wilk Test R W 0.9882871 0.0001608002
## Ljung-Box Test R Q(10) 6.734039 0.750292
## Ljung-Box Test R Q(15) 8.039864 0.9221745
## Ljung-Box Test R Q(20) 10.70246 0.9535767
## Ljung-Box Test R^2 Q(10) 3.186122 0.9766983
## Ljung-Box Test R^2 Q(15) 5.322779 0.9890227
## Ljung-Box Test R^2 Q(20) 10.05697 0.9671272
## LM Arch Test R TR^2 4.300299 0.9773878
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -4.448425 -4.417889 -4.448523 -4.436510
The summary above gives the fitted GARCH(1, 1) model \(V_n = 3.25\times 10^{-5} + 0.143\, Y^2_{n-1} + 0.822\, V_{n-1}\). The summary also shows that the log-likelihood of this model is 1269.58. This value serves as the benchmark of our analysis: we will compare the log-likelihood of the POMP model against it later.
Then, we draw the QQ-plot for residuals as below:
The QQ-plot suggests that the residuals of the GARCH(1, 1) model have a heavy-tailed distribution, which violates the Gaussian assumption on the conditional distribution of the GARCH model. A conditional distribution with heavier tails, such as a Student-t, might describe the residuals better.
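One way to probe this, sketched below under the assumption that `demeaned` holds the demeaned log-return, is to refit the model with a Student-t conditional distribution, which `fGarch` supports via `cond.dist = "std"`, and compare log-likelihoods.

```r
# Possible remedy for the heavy tails: refit GARCH(1, 1) with Student-t
# innovations and compare against the Gaussian fit via log-likelihood.
library(fGarch)
fit_t <- garchFit(~ garch(1, 1), data = demeaned,
                  cond.dist = "std", trace = FALSE)
summary(fit_t)   # the estimated `shape` parameter is the t degrees of freedom
```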
Finally, we check the autocorrelation plot to determine whether the residuals are uncorrelated.
Since there is no significant correlation other than lag 0, we conclude that residuals are uncorrelated.
Then we use the POMP framework presented in the lectures to analyze the volatility of the SSE Composite Index. The equations and notation for this POMP model are adopted from Breto (2014). We denote \(H_n=\log(\sigma^2_n)=2\log(\sigma_n)\), and the model is as follows:
\[\begin{align} Y_n &= \exp(H_n/2)\, \epsilon_n \\ H_n &= \mu_h (1-\phi) + \phi H_{n-1} + \beta_{n-1} R_n \exp(-H_{n-1}/2) + \omega_n \\ G_n &= G_{n-1}+\nu_n \end{align}\] where \[\begin{align} \beta_n &= Y_n \sigma_{\eta} \sqrt{1-\phi^2} \\ \sigma_{\omega} &= \sigma_{\eta} \sqrt{1-\phi^2} \sqrt{1-R_n^2} \\ \epsilon_n &\overset{i.i.d.}{\sim} N(0, 1) \\ \nu_n &\overset{i.i.d.}{\sim} N(0, \sigma_{\nu}^2) \\ \omega_n &\overset{i.i.d.}{\sim} N(0, \sigma_{\omega}^2) \\ R_n &= \frac{\exp(2G_n)-1}{\exp(2G_n)+1} \end{align}\] The motivation for applying this POMP model to the index is financial leverage: the correlation between the return on day \(n-1\) and the change in log volatility from day \(n-1\) to day \(n\), captured here by \(R_n\).
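The model above can be encoded as a pomp object roughly as follows. This is a sketch in the style of the course notes; the object and variable names (`sse.filt`, `demeaned`, etc.) are assumptions, and \(R_n = \tanh(G_n)\) lets the Csnippets use `tanh` directly.

```r
# Sketch of the stochastic-volatility POMP model (Breto 2014);
# `demeaned` is the demeaned weekly log-return. Note R_n = tanh(G_n).
library(pomp)

sse_statenames <- c("H", "G", "Y_state")
sse_paramnames <- c("sigma_nu", "mu_h", "phi", "sigma_eta", "G_0", "H_0")

rproc1 <- "
  double beta, omega, nu;
  omega = rnorm(0, sigma_eta * sqrt(1 - phi*phi) * sqrt(1 - tanh(G)*tanh(G)));
  nu    = rnorm(0, sigma_nu);
  G    += nu;
  beta  = Y_state * sigma_eta * sqrt(1 - phi*phi);
  H     = mu_h*(1 - phi) + phi*H + beta*tanh(G)*exp(-H/2) + omega;
"
rproc_filt <- paste(rproc1, "Y_state = covaryt;")   # feed data when filtering

sse.filt <- pomp(
  data       = data.frame(y = demeaned, time = 1:length(demeaned)),
  statenames = sse_statenames, paramnames = sse_paramnames,
  times = "time", t0 = 0,
  covar = covariate_table(time = 0:length(demeaned),
                          covaryt = c(0, demeaned), times = "time"),
  rmeasure = Csnippet("y = Y_state;"),
  dmeasure = Csnippet("lik = dnorm(y, 0, exp(H/2), give_log);"),
  rprocess = discrete_time(step.fun = Csnippet(rproc_filt), delta.t = 1),
  rinit    = Csnippet("G = G_0; H = H_0; Y_state = rnorm(0, exp(H/2));"),
  partrans = parameter_trans(log = c("sigma_eta", "sigma_nu"), logit = "phi")
)
```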
Let's run the particle filter (`pfilter`) to obtain an initial log-likelihood estimate before any parameter search.
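A filtering pass can be sketched as below; `sse.filt` is the assumed name of the pomp object encoding the model above, and the parameter values are illustrative guesses rather than fitted values.

```r
# Initial log-likelihood estimate from the particle filter at guessed
# parameter values (all values below are illustrative, not fitted).
library(pomp)
params_test <- c(sigma_nu = 0.01, mu_h = -0.25, phi = 0.98,
                 sigma_eta = 0.93, G_0 = 0, H_0 = 0)
pf <- replicate(10, pfilter(sse.filt, params = params_test, Np = 1000))
logmeanexp(sapply(pf, logLik), se = TRUE)  # averaged estimate + standard error
```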
## user system elapsed
## 0.0 0.0 16.2
## se
## -800.60730780 0.03561261
Then, we perform a local search for the maximum log-likelihood using the `mif2` function in the pomp package. A CSV file is created to store the likelihood results.
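The local search can be sketched with iterated filtering as below. The object `sse.filt`, the starting vector `params_test`, the run sizes, and the perturbation scales are all illustrative assumptions; a real run would use larger `Np` and `Nmif`.

```r
# Sketch of the local search via iterated filtering (mif2).
library(pomp)
library(foreach)
library(doParallel)
registerDoParallel()      # register a backend so %dopar% runs in parallel

r.if1 <- foreach(i = 1:20, .packages = "pomp", .combine = rbind) %dopar% {
  mf <- mif2(sse.filt, params = params_test,
             Np = 1000, Nmif = 100,
             cooling.fraction.50 = 0.5,
             rw.sd = rw.sd(sigma_nu = 0.02, mu_h = 0.02, phi = 0.02,
                           sigma_eta = 0.02,
                           G_0 = ivp(0.1), H_0 = ivp(0.1)))
  ll <- logmeanexp(replicate(10, logLik(pfilter(mf, Np = 1000))), se = TRUE)
  c(logLik = ll[1], logLik_se = ll[2], coef(mf))
}
write.table(r.if1, file = "Shanghai_params.csv", append = TRUE,
            col.names = FALSE, row.names = FALSE)   # accumulate results
```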
## user system elapsed
## 1031.38 0.81 1034.03
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1213 1220 1226 1227 1233 1244
From the plot above, we see that the parameter \(\phi\) is stable around 1, and the maximum log-likelihood from the local search is 1244. The other parameters fluctuate within certain ranges. Next, we conduct a global search with random starting values to find the maximized likelihood for this POMP model.
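The global search can be sketched as below: random starting values are drawn from a box of plausible parameter ranges (the box limits are illustrative guesses), and each start seeds a fresh `mif2` run as in the local search.

```r
# Sketch of the global search: random starts from a parameter box.
sse_box <- rbind(
  sigma_nu  = c(0.005, 0.05),
  mu_h      = c(-1, 0),
  phi       = c(0.95, 0.99),
  sigma_eta = c(0.5, 1),
  G_0       = c(-2, 2),
  H_0       = c(-1, 1)
)
starts <- apply(sse_box, 1, function(rng) runif(20, rng[1], rng[2]))
# each row starts[i, ] is then passed as `params` to mif2, and the resulting
# log-likelihoods are evaluated with pfilter, exactly as in the local search
```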
## user system elapsed
## 2561.27 1.11 2566.47
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1187 1228 1236 1239 1260 1264
From the summary, we see that the maximum log-likelihood using the global search is 1264, which improves on the result of the local parameter search.
Since the parameter \(\phi\) appears in several terms of our POMP model, we further investigate the profile likelihood of \(\phi\) to see whether \(\phi\) should be close to 1, as the local search suggested. Another reason for profiling \(\phi\) is the apparent strong positive relationship between \(\phi\) and the log-likelihood: as \(\phi\) increases toward 1, the log-likelihood also increases.
The plot above contradicts our assumption: it suggests that the maximum log-likelihood is attained at a value of \(\phi\) strictly below 1. As \(\phi\) approaches 1, the likelihood estimates become unstable.
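The design points behind such a profile can be generated with pomp's `profile_design`, fixing \(\phi\) on a grid while the remaining parameters are drawn at random; the ranges below are illustrative assumptions.

```r
# Sketch of the phi profile design: phi is fixed on a grid, the other
# parameters get random starts (ranges are illustrative) and are re-maximized
# by mif2 runs whose rw.sd omits phi, so phi stays fixed during the search.
library(pomp)
profile_pts <- profile_design(
  phi   = seq(0.90, 0.999, length.out = 20),
  lower = c(sigma_nu = 0.005, mu_h = -1, sigma_eta = 0.5, G_0 = -2, H_0 = -1),
  upper = c(sigma_nu = 0.05,  mu_h = 0,  sigma_eta = 1,   G_0 = 2,  H_0 = 1),
  nprof = 10
)
# the profile log-likelihood at each grid value of phi is the maximum over
# the nprof mif2 runs started from the corresponding rows of profile_pts
```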
In this project, we applied two statistical models to analyze the volatility of the SSE Composite Index. First, we fit a GARCH(1, 1) model as our benchmark; its estimated coefficients (\(\alpha_1 + \beta_1 \approx 0.97\)) indicate highly persistent volatility. Then, we performed a simulation study using the POMP model. Comparing the two, the POMP model attains a lower log-likelihood than the GARCH benchmark. This could be due to limits on time and computational resources: the final POMP model in this project may not have optimal parameters.
For future work, these preliminary POMP results could serve as starting points for longer runs with more computational resources, which should yield a better fit. Another possible improvement is to add more features to the POMP model to describe the process in a more sophisticated way.
Description of individual contributions removed for anonymity.