In financial markets, investors are interested in studying financial volatility in order to predict the price trend of a stock. “Volatility is the degree of variation of a trading price series over time as measured by the standard deviation of logarithmic returns.” [1] [2]
In this project, we study the stock data of Google (Alphabet Inc.), one of the largest information technology companies in the world. For a company as large as Google, large fluctuations in its stock price can have a profound influence on the entire financial market. Thus, it is interesting and meaningful to study the trend of the Google stock price.
The historical financial data of Google can be downloaded from Yahoo Finance (https://finance.yahoo.com/quote/GOOG/history?p=GOOG). This dataset consists of 7 variables and 2669 observations. In this project, we use the adjusted close price from 2007 to 2017 to investigate the financial volatility of Google stock.
Adjusted close price: \(\{z_n^*, n=1,\dots,N\}\); log price: \(\log(z_n^*)\)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 127.9 262.1 331.6 440.8 575.6 1077.1
In order to obtain a demeaned stationary dataset, we take the difference of the log prices (i.e., the log returns) and then remove the mean. [3]
Difference of log price (log return): \(y_n^*=\log(z_n^*)-\log(z_{n-1}^*)\)
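As a rough sketch, this data preparation might be carried out as follows; the file name GOOG.csv and the column name Adj.Close (how read.csv renders the Yahoo Finance header “Adj Close”) are assumptions about the downloaded file.

# Sketch of the data preparation, assuming the Yahoo Finance history
# has been saved locally as "GOOG.csv".
dat <- read.csv("GOOG.csv")
goog_log <- log(dat$Adj.Close)                  # log of the adjusted close price
goog_dlog <- diff(goog_log)                     # difference of the log series (log returns)
goog_demeaned <- goog_dlog - mean(goog_dlog)    # demeaned series used in the rest of the analysis
plot(goog_demeaned, type = "l",
     xlab = "Index", ylab = "Demeaned difference of log price")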
We can see high volatility around index 480, when the 2008 financial crisis took place.
The GARCH models have become “widely used for financial time series modeling.” [7] Here, we introduce the GARCH(p,q) model, which has the form \[Y_n=\epsilon_n\sqrt{V_n}\]
where \[V_n=a_0+\sum_{j=1}^{p}a_jY_{n-j}^2+\sum_{k=1}^{q}b_kV_{n-k}\] and \(\epsilon_{1:N}\) is white noise.
We use the GARCH model as a benchmark, since GARCH is a simpler model than POMP. In practice, the GARCH(1,1) model is a popular choice (Cowpertwait and Metcalfe 2009 [5]), and it can be fitted as follows.
GARCH(1,1) model
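A minimal sketch of how this benchmark might be fitted with the tseries garch function, assuming the demeaned series goog_demeaned from the data-preparation sketch above:

library(tseries)
# Fit GARCH(1,1) by quasi-maximum likelihood
fit.garch <- garch(goog_demeaned, order = c(1, 1),
                   grad = "numerical", trace = FALSE)
L.garch <- logLik(fit.garch)   # log-likelihood of the fitted GARCH(1,1) model
L.garch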
From the result above, the log-likelihood of the GARCH(1,1) model is 7222.91 with 3 fitted parameters.
Previously, we fitted a GARCH model to describe the financial volatility. However, the parameters of the GARCH model are not directly interpretable. To better understand the correlation between daily return and volatility, we now present a stochastic POMP model.
“\(R_n\) is formally defined as the leverage on day \(n\), the correlation between the index return on day \(n-1\) and the increase in the log volatility from day \(n-1\) to day \(n\).” [7] Here, we introduce a pomp implementation of Bretó (2014) [4], which models \(R_n\) as a random walk on a transformed scale, \[R_n=\frac{\exp(2G_n)-1}{\exp(2G_n)+1}\] where \(G_n\) is the usual Gaussian random walk.
Then we continue to build the POMP model following the notation of Bretó (2014) [4].
(Denote \(H_n=\log(\sigma_n^2)=2\log(\sigma_n)\).) \[Y_n=\exp(H_n/2)\epsilon_n\] \[H_n=\mu_h(1-\phi)+\phi H_{n-1}+\beta_{n-1}R_n\exp(-H_{n-1}/2)+\omega_n\] \[G_n=G_{n-1}+\nu_n\] where \[\beta_n=Y_n\sigma_{\eta}\sqrt{1-\phi^2}\] \[\sigma_{\omega}=\sigma_{\eta}\sqrt{1-\phi^2}\sqrt{1-R_n^2}\] \[\epsilon_n \sim \text{i.i.d. } N(0,1)\] \[\nu_n \sim \text{i.i.d. } N(0,\sigma_{\nu}^2)\] \[\omega_n \sim \text{i.i.d. } N(0,\sigma_{\omega}^2)\]
Here, we choose the iterated filtering algorithm (IF2) [6] to converge toward the region of parameter space that maximizes the likelihood. In this case, we use the state variable \(X_n=(G_n,H_n,Y_n)\). [7]
The filter particle \(j\) at time \(n-1\) is denoted as \[X_{n-1,j}^F=(G_{n-1,j}^F,H_{n-1,j}^F,y_{n-1}^*).\]
The prediction particles at time \(n\) are drawn as \[(G_{n,j}^P,H_{n,j}^P)\sim f_{G_n,H_n|G_{n-1},H_{n-1},Y_{n-1}}(g_n,h_n|G_{n-1,j}^F,H_{n-1,j}^F,y_{n-1}^*)\] with corresponding weight \(w_{n,j}=f_{Y_n|G_n,H_n}(y_n^*|G_{n,j}^P,H_{n,j}^P)\).
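Below is a sketch of how this model can be coded as a pomp object, following the construction in the course notes [7] and using the pomp (version 1.x) interface; the object and variable names (goog.filt, goog_demeaned, covaryt, etc.) are assumptions made for illustration rather than the exact code used here.

library(pomp)
library(doParallel)

goog_statenames <- c("H", "G", "Y_state")
goog_rp_names <- c("sigma_nu", "mu_h", "phi", "sigma_eta")   # regular parameters
goog_ivp_names <- c("G_0", "H_0")                            # initial-value parameters
goog_paramnames <- c(goog_rp_names, goog_ivp_names)
goog_covarnames <- "covaryt"

# One step of the latent process: update G by a random walk, then update H
rproc1 <- "
  double beta, omega, nu;
  omega = rnorm(0, sigma_eta * sqrt(1 - phi*phi) * sqrt(1 - tanh(G)*tanh(G)));
  nu = rnorm(0, sigma_nu);
  G += nu;
  beta = Y_state * sigma_eta * sqrt(1 - phi*phi);
  H = mu_h*(1 - phi) + phi*H + beta * tanh(G) * exp(-H/2) + omega;
"
rproc2.filt <- "
  Y_state = covaryt;   // for filtering, the observed return enters as a covariate
"
goog_rproc.filt <- paste(rproc1, rproc2.filt)

goog_initializer <- "
  G = G_0;
  H = H_0;
  Y_state = rnorm(0, exp(H/2));
"
goog_rmeasure <- "
  y = Y_state;
"
goog_dmeasure <- "
  lik = dnorm(y, 0, exp(H/2), give_log);
"

# Transform parameters to an unconstrained scale for estimation
goog_toEstimationScale <- "
  Tsigma_eta = log(sigma_eta);
  Tsigma_nu = log(sigma_nu);
  Tphi = logit(phi);
"
goog_fromEstimationScale <- "
  Tsigma_eta = exp(sigma_eta);
  Tsigma_nu = exp(sigma_nu);
  Tphi = expit(phi);
"

goog.filt <- pomp(
  data = data.frame(y = goog_demeaned, time = 1:length(goog_demeaned)),
  statenames = goog_statenames,
  paramnames = goog_paramnames,
  covarnames = goog_covarnames,
  times = "time",
  t0 = 0,
  covar = data.frame(covaryt = c(0, goog_demeaned), time = 0:length(goog_demeaned)),
  tcovar = "time",
  rmeasure = Csnippet(goog_rmeasure),
  dmeasure = Csnippet(goog_dmeasure),
  rprocess = discrete.time.sim(step.fun = Csnippet(goog_rproc.filt), delta.t = 1),
  initializer = Csnippet(goog_initializer),
  toEstimationScale = Csnippet(goog_toEstimationScale),
  fromEstimationScale = Csnippet(goog_fromEstimationScale)
)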
The initial values and the starting values of the parameters, together with the random-walk standard deviations for IF2, are set at the first step.
params_test <- c(
sigma_nu = 0.10,
mu_h = -9.0,
phi = 0.02,
sigma_eta = 0.02,
G_0=0,
H_0=0
)
goog_rw.sd_rp <- 0.02
goog_rw.sd_ivp <- 0.01
goog_cooling.fraction.50 <- 0.5
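A sketch of the local IF2 search from these fixed starting values follows, assuming the goog.filt object from the model-construction sketch above; the numbers of particles, iterations, and replications are illustrative run sizes, not necessarily those used for the reported results.

registerDoParallel()

goog_Np <- 1000            # number of particles
goog_Nmif <- 100           # number of IF2 iterations
goog_Nreps_local <- 20     # number of replicated IF2 searches
goog_Nreps_eval <- 10      # number of particle filters for likelihood evaluation

if1 <- foreach(i = 1:goog_Nreps_local, .packages = "pomp", .combine = c) %dopar%
  mif2(goog.filt,
       start = params_test,
       Np = goog_Np, Nmif = goog_Nmif,
       cooling.type = "geometric",
       cooling.fraction.50 = goog_cooling.fraction.50,
       transform = TRUE,
       rw.sd = rw.sd(sigma_nu  = goog_rw.sd_rp,
                     mu_h      = goog_rw.sd_rp,
                     phi       = goog_rw.sd_rp,
                     sigma_eta = goog_rw.sd_rp,
                     G_0       = ivp(goog_rw.sd_ivp),
                     H_0       = ivp(goog_rw.sd_ivp)))

# Evaluate the likelihood at each IF2 endpoint with replicated particle filters
L.if1 <- foreach(i = 1:goog_Nreps_local, .packages = "pomp", .combine = rbind) %dopar%
  logmeanexp(replicate(goog_Nreps_eval,
                       logLik(pfilter(goog.filt, params = coef(if1[[i]]), Np = goog_Np))),
             se = TRUE)
summary(L.if1[, 1])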
We can see that the maximum log-likelihood (\(=7746\)) shown below is clearly larger than the value obtained from the GARCH(1,1) model (\(\approx 7223\)).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7162 7732 7735 7706 7738 7746
From the plots, we can see that the maximum log-likelihood is most likely to be obtained when \(\sigma_\nu\) is close to 0 and \(\phi\) is close to 0.92. The other parameters do not show obvious patterns in their relationship with the log-likelihood.
Instead of setting a fixed group of starting values, as we did in the previous step, we now randomly select the starting values inside a box of parameter vectors. The box is determined based on the patterns summarized from the parameter plots above.
Here is the box of parameter vectors.
goog_box <- rbind(
sigma_nu=c(0.0002,0.002),
mu_h =c(-8.7,-8.6),
phi = c(0.90,0.92),
sigma_eta = c(0.95,1.05),
G_0 = c(-2,2),
H_0 = c(-1,1)
)
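A sketch of the corresponding global search, drawing random starting values from goog_box and re-running IF2 with the same settings as the local search above (again assuming the objects if1, goog.filt, goog_Np, and goog_Nreps_eval from the previous sketches):

goog_Nreps_global <- 20    # number of random starting points drawn from the box

if.box <- foreach(i = 1:goog_Nreps_global, .packages = "pomp", .combine = c) %dopar%
  mif2(if1[[1]],
       start = apply(goog_box, 1, function(x) runif(1, x[1], x[2])))

L.box <- foreach(i = 1:goog_Nreps_global, .packages = "pomp", .combine = rbind) %dopar%
  logmeanexp(replicate(goog_Nreps_eval,
                       logLik(pfilter(goog.filt, params = coef(if.box[[i]]), Np = goog_Np))),
             se = TRUE)
summary(L.box[, 1])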
Although the maximum log-likelihood (\(=7744\)) is slightly smaller than that of the POMP model with fixed starting values (\(=7746\)), the log-likelihood values are much more concentrated, with a minimum of 7581 compared to the minimum of 7162 in the previous POMP model.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7581 7708 7729 7723 7738 7744
Since we randomly select the starting values, the parameter estimates here look more diverse than those from the model with fixed starting values. In this plot, \(\mu_h\) is quite densely concentrated around 0. Moreover, the pattern of the relationship between \(\phi\) and the log-likelihood still seems to exist, but remains to be checked later.
The likelihood does not converge very fast in this case. As we can see, the log-likelihood begins to converge after 150 iterations. We should increase the sample size and the number of iterations in a future study. In addition, the outlier appearing in \(\sigma_\eta\) has a strong influence on this plot: because the extreme value is so large, we are not able to clearly identify the true pattern of \(\sigma_\eta\). Since occasional numerical failures in mif2 like this are not uncommon, and since the starting values are randomized at this step, a sensible next step is to refine the parameter box and re-run the algorithm.
From the diagnostic plots, we observe that the log-likelihood increases as \(\phi\) increases toward 1. Therefore, we investigate \(\phi\) by constructing a profile likelihood. [8] The approximate 95% confidence interval for \(\phi\) is
\[\{\phi:\max_{\phi}\ell^{\mathrm{profile}}(\phi)-\ell^{\mathrm{profile}}(\phi)<1.92\}\]
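A minimal sketch of how this interval can be read off from the profile-search results, assuming a data frame prof_results with columns phi and logLik collected from the profile runs (the object and column names are assumptions):

library(dplyr)

prof_ci <- prof_results %>%
  group_by(phi) %>%
  summarise(profile_loglik = max(logLik)) %>%                # profile out the other parameters
  ungroup() %>%
  filter(profile_loglik > max(profile_loglik) - 1.92) %>%    # keep points within the cutoff
  summarise(lower = min(phi), upper = max(phi))
prof_ci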
## lower upper
## 0.9012433 0.9012433
From the result above, we discover a problem: both the lower bound and the upper bound of the 95% confidence interval equal 0.9012433. This probably arises because the cutoff of 1.92 log units is very small compared with the variation of the profile log-likelihood (whose values are around 7700), so only a single value of \(\phi\) in our grid falls inside the cutoff. We can only conclude that the estimate of \(\phi\) is around 0.90 for now. At the next step, we need to narrow the range of \(\phi\) to a much smaller interval, such as from 0.895 to 0.905, in order to locate the confidence interval precisely.
After comparing the GARCH model and the POMP models, we conclude that the random-walk leverage POMP model with randomized starting values is generally the best choice for investigating the financial volatility of Google stock. Moreover, by implementing a POMP model, we can estimate the parameters that appear directly in the financial model, which is remarkably beneficial for the study of volatility.
Due to the limited time and the considerable amount of computation, we are unable to provide an optimal presentation of our models. In the future, apart from refining the algorithms by increasing the sample size and the number of iterations, we can also provide the best estimates for all parameters, not only \(\phi\). Finally, we also need to find a proper method to handle the outlier problem observed in this case.
[1] https://en.wikipedia.org/wiki/Volatility_(finance)
[3] Edward Ionides, “6.2 ARMA models for differenced data” from class notes, https://ionides.github.io/531w18/06/notes06.html
[4] Bretó, C. 2014. On idiosyncratic stochasticity of financial leverage effects. Statistics & Probability Letters 91:20–26.
[5] Cowpertwait, P.S., and A.V. Metcalfe. 2009. Introductory time series with R. Springer Science & Business Media.
[6] Ionides, E.L., D.Nguyen, Y.Atchadé, S.Stoev, and A.A. King. 2015. Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. Proceedings of the National Academy of Sciences of the U.S.A. 112:719–724.
[7] Edward Ionides, “14. Case study: POMP modeling to investigate financial volatility”, https://ionides.github.io/531w18/14/notes14.html#arch-and-garch-models
[8] Yitong Chen, https://ionides.github.io/531w16/final_project/Project11/final.html