In financial markets, data often exhibit volatility clustering: time-varying volatility is far more common than constant volatility, so modeling it accurately is a central task in financial engineering. The purpose of investing is to make a profit, and investors expect revenues higher than their initial investments. Returns measure the change in price relative to the asset being held: \[R_t = \frac{P_t - P_{t-1}}{P_{t-1}}\] where \(P_t\) is the price of the asset at time \(t\) and \(R_t\) is the return at time \(t\).
Log returns (\(r_t\)) are approximately equal to simple returns when returns are close to zero, and they simplify computation by turning division into subtraction:
\[r_t = \log(P_t) - \log(P_{t-1})\] Throughout this project, "returns" denotes log returns.
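As a quick numerical check (with made-up prices), the two definitions agree closely when price changes are small:

```r
# Hypothetical prices to illustrate that log returns approximate simple returns
P <- c(100, 101, 99.5)
R <- diff(P) / head(P, -1)   # simple returns R_t
r <- diff(log(P))            # log returns r_t
round(cbind(R, r), 5)        # the two columns agree to about 3 decimal places
```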
Our goal is to choose a model that best describes the data. In this project, we apply time series models, namely ARMA, GARCH, and ARMA-GARCH, as well as a nonlinear POMP model obtained by adding stochastically time-varying parameters to a time series model.
The Facebook stock dataset comes from the Yahoo Finance website. It contains 1259 observations of daily closing prices from April 15th, 2015 to April 15th, 2020. The plots below show the closing prices and log closing prices over these five years.
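A minimal sketch of how the data can be read and plotted, assuming the Yahoo Finance download is saved as "FB.csv" (the file name and column name are assumptions; adjust as needed):

```r
# Read the daily price history downloaded from Yahoo Finance
fb <- read.csv("FB.csv")
fb_close <- fb$Close   # daily closing prices
plot(fb_close, type = "l", xlab = "Day", ylab = "Closing price")
plot(log(fb_close), type = "l", xlab = "Day", ylab = "Log closing price")
```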
Taking differences of the log prices gives log returns scattered around 0. The variance varies across time periods, and periods of high volatility tend to cluster together.
The ACF plots show that some sizable autocorrelations exist in both the log returns and the squared log returns.
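These log returns and their autocorrelations can be computed as in this sketch (continuing from `fb_close` above):

```r
# Log returns and their sample autocorrelations
fb_diff_return <- diff(log(fb_close))
plot(fb_diff_return, type = "l", xlab = "Day", ylab = "Log return")
acf(fb_diff_return, main = "ACF of log returns")
acf(fb_diff_return^2, main = "ACF of squared log returns")
```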
Let us start with the ARMA model. The first step is to decide where to start in terms of the values of p and q, so we tabulate an AIC table below.
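The table can be produced along the lines of the AIC-table function from the course notes; this is a sketch, assuming `fb_diff_return` holds the log returns:

```r
# Tabulate AIC over ARMA(p,q) orders
aic_table <- function(data, P, Q) {
  table <- matrix(NA, P + 1, Q + 1)
  for (p in 0:P) {
    for (q in 0:Q) {
      table[p + 1, q + 1] <- arima(data, order = c(p, 0, q),
                                   optim.control = list(maxit = 1000))$aic
    }
  }
  dimnames(table) <- list(paste0("AR", 0:P), paste0("MA", 0:Q))
  table
}
fb_aic_table <- aic_table(fb_diff_return, 3, 3)
```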
|     | MA0       | MA1       | MA2       | MA3       |
|-----|-----------|-----------|-----------|-----------|
| AR0 | -6301.588 | -6307.332 | -6306.112 | -6305.499 |
| AR1 | -6307.699 | -6306.557 | -6304.575 | -6305.703 |
| AR2 | -6306.352 | -6304.571 | -6324.953 | -6322.903 |
| AR3 | -6305.147 | -6305.738 | -6303.714 | -6324.393 |
The AIC table favors ARMA(2,2), which attains the lowest AIC value, -6324.953. We refit ARMA(2,2) to the data and obtain the parameter estimates. The output is as follows:
##
## Call:
## arima(x = fb_diff_return, order = c(2, 0, 2), optim.control = list(maxit = 1000))
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept
## -1.7588 -0.9266 1.6947 0.8627 6e-04
## s.e. 0.0451 0.0560 0.0587 0.0732 5e-04
##
## sigma^2 estimated as 0.00038: log likelihood = 3168.48, aic = -6324.95
From the ACF plot of the residuals, the ARMA(2,2) model does succeed in reducing autocorrelations at lags greater than 1. However, the second plot shows that sizable autocorrelations remain in the squared residuals. Moreover, the residuals have heavier tails than the normal distribution.
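A sketch of these residual diagnostics:

```r
# Refit ARMA(2,2) and inspect its residuals
fit_arma22 <- arima(fb_diff_return, order = c(2, 0, 2),
                    optim.control = list(maxit = 1000))
acf(resid(fit_arma22), main = "ACF of residuals")
acf(resid(fit_arma22)^2, main = "ACF of squared residuals")
qqnorm(resid(fit_arma22)); qqline(resid(fit_arma22))
```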
To reduce the autocorrelations of the squared residuals, we consider a GARCH model, which lets the volatility depend on time. The GARCH(P,Q) model for \(Y_n\) is as follows:
\[Y_n = \epsilon_n \sqrt{V_n}\] \[V_n = \alpha_0 + \sum_{j=1}^P\alpha_j Y_{n-j}^2 + \sum_{k=1}^Q\beta_k V_{n-k}\] where \(\{\epsilon_n\}\) is a mean-zero weak white-noise process (we still apply the normal-distribution assumption in this section).
Now we select the (P, Q) pair that gives the GARCH model the lowest AIC. The table is shown below.
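A sketch of how such a table can be generated with `tseries::garch` (the same function whose fit appears below); this assumes `AIC()` can use the `logLik` method that tseries provides for fitted garch objects:

```r
library(tseries)

# Tabulate AIC over GARCH(p,q) orders
garch_aic_table <- function(data, P, Q) {
  table <- matrix(NA, P, Q)
  for (p in 1:P) {
    for (q in 1:Q) {
      fit <- garch(data, order = c(p, q), grad = "analytical", trace = FALSE)
      table[p, q] <- AIC(fit)
    }
  }
  dimnames(table) <- list(paste("p =", 1:P), paste("q =", 1:Q))
  table
}
fb_garch_aic <- garch_aic_table(fb_diff_return, 4, 4)
```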
|       | q = 1    | q = 2    | q = 3    | q = 4    |
|-------|----------|----------|----------|----------|
| p = 1 | -6519.24 | -6507.67 | -6500.67 | -6488.84 |
| p = 2 | -6504.91 | -6510.30 | -6483.68 | -6496.18 |
| p = 3 | -6490.40 | -6501.11 | -6494.29 | -6497.01 |
| p = 4 | -6484.60 | -6488.94 | -6480.60 | -6493.31 |
The table suggests that GARCH(1,1) has the smallest AIC. Given its simplicity as well, GARCH(1,1) is the best choice. Its fit is shown below. All the parameters in the GARCH(1,1) model are significant, and the Jarque-Bera test on the residuals rejects the assumption of normally distributed errors.
Call:
garch(x = fb_diff_return, order = c(1, 1), grad = "analytical", trace = FALSE)
Coefficient(s):
Estimate Std. Error t value Pr(>|t|)
a0 7.217e-05 6.727e-06 10.73 <2e-16 ***
a1 3.319e-01 2.556e-02 12.98 <2e-16 ***
b1 5.439e-01 3.192e-02 17.04 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Diagnostic Tests:
Jarque Bera Test
data: Residuals
X-squared = 26192, df = 2, p-value < 2.2e-16
Three plots are shown below. From left to right, they are the ACF plot of the residuals, the ACF plot of the squared residuals, and the QQ-plot against the normal distribution. The first two ACF plots show the better performance of the GARCH(1,1) model, but the problem of heavy tails is still not solved.
The heavy-tail problem shown in the last section still occurs, so we have to take it into consideration. We also hope to further refine our current GARCH model by adding an ARMA component. The ARMA(p,q) model with GARCH(1,1) errors for \(Y_n\) corresponds to \[Y_n = \sum_{i=1}^p\phi_i Y_{n-i} + \sum_{j=1}^q\psi_j\epsilon_{n-j} + \epsilon_n\] where the noise process \(\{\epsilon_n\}\) is a GARCH(P,Q) process
\[\epsilon_n = \sigma_n\delta_n\] \[\sigma_n^2 = \alpha_0 + \sum_{i=1}^P\alpha_i \epsilon_{n-i}^2 + \sum_{j=1}^Q \beta_j \sigma_{n-j}^2\] where the \(\delta_n\) are iid t-distributed, in order to address the heavy tails.
From course note 14, we know that GARCH is a black-box model, in the sense that the parameters don't have a clear interpretation. An ARMA-GARCH model is more complicated than a GARCH model, so in practice we prefer to keep the ARMA part simple. Since GARCH(1,1) performed relatively well on the data, we retain this part in the mixed model. Thus, we end up fitting four different mixed models with t-distributed errors: ARMA(0,0)-GARCH(1,1), ARMA(0,1)-GARCH(1,1), ARMA(1,0)-GARCH(1,1), and ARMA(1,1)-GARCH(1,1). Their log-likelihoods and AIC values are shown below.
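The four fits can be obtained with `fGarch::garchFit` (the same function whose output appears later); a minimal sketch, where the slot names `llh` and `par` follow fGarch's fitted-object structure and `llh` is the negative log-likelihood that `garchFit` minimizes (adjust the sign if your fGarch version differs):

```r
library(fGarch)

# Fit each candidate ARMA(p,q)-GARCH(1,1) model with t-distributed errors
for (o in list(c(0, 0), c(1, 0), c(0, 1), c(1, 1))) {
  f <- as.formula(sprintf("~arma(%d,%d)+garch(1,1)", o[1], o[2]))
  fit <- garchFit(formula = f, data = fb_diff_return,
                  cond.dist = "std", trace = FALSE)
  ll <- -fit@fit$llh                          # log-likelihood
  aic <- -2 * ll + 2 * length(fit@fit$par)    # AIC from the usual formula
  cat(sprintf("ARMA(%d,%d)-GARCH(1,1): logLik = %.2f, AIC = %.2f\n",
              o[1], o[2], ll, aic))
}
```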
|                      | LogLik  | AIC      |
|----------------------|---------|----------|
| ARMA(0,0)-GARCH(1,1) | 3431.70 | -6858.85 |
| ARMA(1,0)-GARCH(1,1) | 3453.36 | -6882.18 |
| ARMA(0,1)-GARCH(1,1) | 3434.06 | -6861.56 |
| ARMA(1,1)-GARCH(1,1) | 3456.57 | -6886.61 |
We notice that as the ARMA part becomes more complex, the log-likelihood gets larger and the AIC becomes smaller. In addition, the parameter estimates for the ARMA(1,1)-GARCH(1,1) model are listed below. We find that \(\phi_1\) (ar1), \(\psi_1\) (ma1), \(\alpha_1\), and \(\beta_1\) are all significant in the model.
Call:
garchFit(formula = ~arma(1, 1) + garch(1, 1), data = fb_diff_return, cond.dist = "std", trace = F)
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu 2.295e-04 1.175e-04 1.954 0.0508 .
ar1 8.366e-01 8.010e-02 10.445 <2e-16 ***
ma1 -8.890e-01 6.482e-02 -13.716 <2e-16 ***
omega 1.346e-05 9.120e-06 1.475 0.1401
alpha1 1.253e-01 4.974e-02 2.520 0.0117 *
beta1 8.571e-01 6.112e-02 14.024 <2e-16 ***
shape 3.448e+00 3.626e-01 9.510 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log Likelihood:
3456.57 normalized: 2.733888
Most importantly, we see that most points in the QQ-plot lie close to the red line, which suggests that the t-distribution is a reasonable error assumption.
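A sketch of such a QQ-plot, assuming the fitted model is stored in `fit11` (a hypothetical name) and using the estimated shape 3.448 from the table above; the t quantiles are rescaled to unit variance to match the standardized residuals:

```r
# QQ-plot of standardized residuals against the fitted t distribution
z <- residuals(fit11, standardize = TRUE)
nu <- 3.448                                                    # estimated shape parameter
q_t <- qt(ppoints(length(z)), df = nu) * sqrt((nu - 2) / nu)   # unit-variance t quantiles
qqplot(q_t, z, xlab = "t quantiles", ylab = "Standardized residuals")
abline(0, 1, col = "red")                                      # reference line
```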
Finally, we present a pomp implementation of Bretó (2014), which models the leverage \(R_n\) as a random walk on a transformed scale, \[R_n = \frac{e^{2G_n}-1}{e^{2G_n}+1},\] where \(\{G_n\}\) is the usual Gaussian random walk.
Following the notation and model representation in equation (4) of Bretó (2014), we propose a model, \[ \begin{align} Y_n &= e^{H_n/2} \epsilon_n, \\ H_n &= \mu_h(1-\phi) + \phi H_{n-1} + \beta_{n-1}R_ne^{-H_{n-1}/2} + \omega_n,\\ G_n &= G_{n-1}+\nu_n, \end{align} \] where \(\beta_n=Y_n\sigma_\eta\sqrt{1-\phi^2}\), \(\{\epsilon_n\}\) is an iid \(N(0,1)\) sequence, \(\{\nu_n\}\) is an iid \(N(0,\sigma_{\nu}^2)\) sequence, and \(\{\omega_n\}\) is an iid \(N(0,\sigma_\omega^2)\) sequence.
Here, \(H_n\) is the log volatility.
library(pomp)
fb_statenames = c("H","G","Y_state")                  # log volatility, transformed leverage, return
fb_rp_names = c("sigma_nu","mu_h","phi","sigma_eta")  # regular parameters
fb_ivp_names = c("G_0","H_0")                         # initial-value parameters
fb_paramnames = c(fb_rp_names, fb_ivp_names)
# One step of the latent process: draw the noises, update G, then H
rproc1 = "
double beta,omega,nu;
omega = rnorm(0,sigma_eta * sqrt( 1- phi*phi ) *
sqrt(1-tanh(G)*tanh(G)));
nu = rnorm(0, sigma_nu);
G += nu;
beta = Y_state * sigma_eta * sqrt( 1- phi*phi );
H = mu_h*(1 - phi) + phi*H + beta * tanh( G )
* exp(-H/2) + omega;
"
rproc2.sim = "Y_state = rnorm( 0,exp(H/2) );"  # simulation: draw a new return
rproc2.filt = "Y_state = covaryt;"             # filtering: use the observed return
fb_rproc.sim = paste(rproc1, rproc2.sim)
fb_rproc.filt = paste(rproc1, rproc2.filt)
fb_rinit = "
G = G_0;
H = H_0;
Y_state = rnorm( 0,exp(H/2) );
"
fb_rmeasure = "y=Y_state;"                         # the return is observed directly
fb_dmeasure = "lik=dnorm(y,0,exp(H/2),give_log);"  # measurement density
fb_partrans = parameter_trans(
log=c("sigma_eta","sigma_nu"),
logit="phi"
)
# pomp object for filtering: the observed returns also enter as the covariate covaryt
fb.filt = pomp(data=data.frame(
y=fb_diff_return, time=1:length(fb_diff_return)),
statenames=fb_statenames,
paramnames=fb_paramnames,
times="time",
t0=0,
covar=covariate_table(
time=0:length(fb_diff_return),
covaryt=c(0,fb_diff_return),
times="time"),
rmeasure=Csnippet(fb_rmeasure),
dmeasure=Csnippet(fb_dmeasure),
rprocess=discrete_time(step.fun=Csnippet(fb_rproc.filt),
delta.t=1),
rinit=Csnippet(fb_rinit),
partrans=fb_partrans
)
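Before filtering, it can help to check the model by simulation; below is a sketch that builds the simulation version of the process (using `fb_rproc.sim`) and simulates at illustrative, not fitted, parameter values:

```r
# Simulation version of the pomp object
fb.sim = pomp(fb.filt,
  statenames = fb_statenames,
  paramnames = fb_paramnames,
  rprocess = discrete_time(step.fun = Csnippet(fb_rproc.sim), delta.t = 1))
sim_params = c(sigma_nu = 0.01, mu_h = -9, phi = 0.98,
               sigma_eta = 0.9, G_0 = 0, H_0 = 0)  # illustrative guesses only
sim1 = simulate(fb.sim, params = sim_params, seed = 1)
plot(Y_state ~ time, data = as.data.frame(sim1), type = "l",
     ylab = "simulated log return")
```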
Next, we use the IF2 algorithm to obtain the maximum likelihood, which lets us compare the POMP model with the time series models. Some diagnostics are also shown below, covering the convergence of the parameters and of the log-likelihood.
library(doParallel)
registerDoParallel()
library(doRNG)
registerDoRNG(34118892)
run_level = 3                                    # 1 = debug, 2 = intermediate, 3 = full run
fb_Np = switch(run_level, 100, 1e3, 2e3)         # number of particles
fb_Nmif = switch(run_level, 10, 100, 200)        # number of IF2 iterations
fb_Nreps_eval = switch(run_level, 4, 10, 20)     # replications for likelihood evaluation
fb_Nreps_local = switch(run_level, 10, 20, 20)   # replications of the local search
fb_Nreps_global = switch(run_level, 10, 20, 100) # starting points for the global search
params_test = c(
sigma_nu = exp(-4.5),
mu_h = -0.25,
phi = expit(4),
sigma_eta = exp(-0.07),
G_0 = 0,
H_0=0
)
fb_rw.sd_rp = 0.02            # random-walk sd for regular parameters
fb_rw.sd_ivp = 0.1            # random-walk sd for initial-value parameters
fb_cooling.fraction.50 = 0.5  # perturbation sd shrinks to 50% after 50 iterations
fb_rw.sd = rw.sd(
sigma_nu = fb_rw.sd_rp,
mu_h = fb_rw.sd_rp,
phi = fb_rw.sd_rp,
sigma_eta = fb_rw.sd_rp,
G_0 = ivp(fb_rw.sd_ivp),
H_0 = ivp(fb_rw.sd_ivp)
)
stew(file=sprintf("mif1-%d.rda",run_level),{
t.if1 = system.time({
if1 = foreach(i=1:fb_Nreps_local, .packages= 'pomp', .combine=c) %dopar%
mif2( fb.filt,
params=params_test,
Np=fb_Np,
Nmif=fb_Nmif,
cooling.fraction.50=fb_cooling.fraction.50,
rw.sd = fb_rw.sd)
L.if1 = foreach(i=1:fb_Nreps_local, .packages= 'pomp', .combine=rbind) %dopar%
logmeanexp(
replicate(fb_Nreps_eval,
logLik(pfilter(fb.filt, params=coef(if1[[i]]),Np=fb_Np))), se=TRUE)
})
},seed=318817883,kind="L'Ecuyer")
r.if1 = data.frame(logLik=L.if1[,1],logLik_se=L.if1[,2], t(sapply(if1,coef)))
summary(r.if1$logLik, digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3366 3369 3370 3380 3376 3419
plot(if1)
After running the search, we see that the maximum log-likelihood is 3419, a little less than that of the ARMA(1,1)-GARCH(1,1) model (3456). As for the convergence diagnostics, the largest log-likelihoods occur in only some of the replications, and in a few replications some parameters do not converge. This is not sufficient to conclude that the POMP model is stable and appropriate for the data.
To check the stability of the POMP model, we randomize the initial parameter values and then rerun the search.
# Box from which starting values for the global search are drawn
fb_box = rbind(
sigma_nu=c(0.005,0.05),
mu_h =c(-0.5,0),
phi = c(0.95,0.99),
sigma_eta = c(0.5,1),
G_0 = c(-1,1),
H_0 = c(-0.5,0.5)
)
stew(file=sprintf("box_eval-%d.rda",run_level),{
t.box = system.time({
if.box = foreach(i=1:fb_Nreps_global, .packages= "pomp", .combine=c) %dopar%
mif2(if1[[1]],
params=apply(fb_box,1,function(x)runif(1,x[1],x[2])))
L.box = foreach(i=1:fb_Nreps_global, .packages= "pomp", .combine=rbind) %dopar% {
logmeanexp(replicate(fb_Nreps_eval,
logLik(pfilter(fb.filt,params=coef(if.box[[i]]),Np=fb_Np))), se=TRUE)}
})
}, seed=290860873,kind="L'Ecuyer")
r.box = data.frame(logLik=L.box[,1],logLik_se=L.box[,2], t(sapply(if.box,coef)))
summary(r.box$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3358 3366 3372 3383 3415 3421
plot(if.box)
From the diagnostics plots above, we see that:

- \(\sigma_{\nu}\), \(\mu_h\), \(G_0\), and \(H_0\) converge after about 40 MIF iterations;
- the log-likelihood also converges, at a larger value and in more replications than the local search, with a maximum of around 3421;
- \(\phi\) and \(\sigma_{\eta}\) are not as stable as the other parameters, although most runs eventually settle into a certain range.
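To see the geometry of the likelihood surface around the best fits, a scatterplot matrix over the global-search results can be drawn, as in this sketch:

```r
# Log-likelihood against each parameter, restricted to searches
# within 20 log units of the maximum
pairs(~logLik + sigma_nu + mu_h + phi + sigma_eta,
      data = subset(r.box, logLik > max(logLik) - 20))
```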
In terms of maximum log-likelihood among these models, I would conclude that ARMA(1,1)-GARCH(1,1) with t-distributed errors is a good choice for fitting the financial volatility of Facebook stock, because it attains the largest log-likelihood. However, as the instructor says, 'GARCH is a black-box model, in the sense that the parameters don't have clear interpretation'. Since the POMP model is defined explicitly with financial and time series meaning, it is much easier to interpret, and the two log-likelihoods are not obviously different. Thus, the POMP model is better with regard to interpretation.
However, we still notice the instability of the POMP model, whether the initial values are fixed or randomly sampled. This may be because the model we chose is not fully adequate. Modifying the model might be a good idea in the future.
Facebook stock data: https://finance.yahoo.com/quote/FB/history?period1=1429056000&period2=1586908800&interval=1d&filter=history&frequency=1d
STATS 531 Winter 2020, course note 14: https://ionides.github.io/531w20/14/notes14.pdf
STATS 531 Winter 2018, final project 2: https://ionides.github.io/531w18/final_project/2/final.html
STATS 531 Winter 2018, final project 19: https://ionides.github.io/531w18/final_project/19/final.html
STATS 531 Winter 2018, final project 40: https://ionides.github.io/531w18/final_project/40/final.html
Carles Bretó (2014), On idiosyncratic stochasticity of financial leverage effects, Statistics & Probability Letters.
David Ruppert, Statistics and Data Analysis for Financial Engineering, Chapter 18.