Introduction

This research project fits three different volatility models to daily stock returns: a stochastic volatility (SV) model with financial leverage, an SV model with jump components, and a GARCH model. We also analyze whether the daily returns of our stock are significantly affected by certain financial effects over this short period of time; these effects are built into the model assumptions and construction. Finally, we select the better model, judged mainly by log-likelihood. The data are GOOGLE stock prices, a strong representative of the technology sector.

Reading Data

The historical data are daily GOOGLE stock prices from August 2014 to April 2016, roughly 1.5 years of daily data, downloaded from the Yahoo Finance website (http://finance.yahoo.com/q?s=GOOG). The series contains daily Open, High, Low, Close, Volume, and Adjusted close prices. The most reasonable column to use is usually the adjusted close price, because it is amended for corporate actions such as dividends before the next trading day. For this reason it is commonly used for analyzing historical returns. We also used this price for research in another class.

goog <- read.csv("goog.csv")
head(goog,3)
##         Date   Open   High     Low  Close  Volume Adj.Close
## 1 2016-04-14 754.01 757.31 752.705 753.20 1131000    753.20
## 2 2016-04-13 749.16 754.38 744.261 751.72 1707100    751.72
## 3 2016-04-12 738.00 743.83 731.010 743.09 1349700    743.09
tail(goog,3)
##           Date     Open     High      Low    Close  Volume Adj.Close
## 415 2014-08-21 583.8226 584.5027 581.1426 583.3727  914800  583.3727
## 416 2014-08-20 585.8827 586.7027 582.5726 584.4926 1036700  584.4926
## 417 2014-08-19 585.0026 587.3427 584.0026 586.8626  978600  586.8626
goog$Adj.Close <- as.numeric(goog$Adj.Close)
goog$Date <- as.Date(goog$Date, format = "%Y-%m-%d")
goog <- goog[,c(1,7)]
par(mfrow = c(1,2))
plot(goog,type = "l", main = "Google stock price", xlab = "Date", ylab = "Price")
plot(log(goog$Adj.Close)~ goog$Date, type = "l", main = "Log-Transformed price", xlab = "Date", ylab = "log(Price)")

By convention, we work with log-transformed stock prices.

We use the mean and standard deviation of the log-transformed stock prices to generate two Gaussian random walk simulations. These simulations show some randomness, but they resemble the log-transformed price series closely enough to suggest that a Gaussian random walk has some connection with the changes in our log-transformed stock price.
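The code for these simulations is not shown in the report; below is a minimal sketch of one plausible construction (assumed here), which starts two Gaussian random walks at the mean of the log prices and reads the "standard deviation" above as the scale of the increments.

# Sketch only: one plausible way to generate the two random walk simulations
set.seed(531)
logp <- log(goog$Adj.Close)
rw1 <- mean(logp) + cumsum(rnorm(length(logp), 0, sd(logp)))
rw2 <- mean(logp) + cumsum(rnorm(length(logp), 0, sd(logp)))
matplot(cbind(rw1, rw2), type = "l", lty = 1,
        main = "Two Gaussian random walk simulations",
        xlab = "Day", ylab = "Simulated log price")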

Therefore, we further transform the data into a log-return series. Let the stock prices be \(\{s_n^*, n=1,\dots,N\}\); then the log-return series is generated by \[{z_n^*}= \Delta \log {s_n^*} = \log {s_{n}^*}-\log {s_{n-1}^*},\] where \(\{z^*_n,n=2,\dots,N\}\) is the day-to-day log return. We can now plot the demeaned series and its ACF, as sketched below.
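The construction of the demeaned series (called z in all the model-fitting code below) is not shown in the original report; a minimal sketch, assuming the data are first sorted into chronological order, is:

# Sketch (assumed): build the demeaned log-return series z used below
goog <- goog[order(goog$Date), ]        # chronological order
lret <- diff(log(goog$Adj.Close))       # z*_n = log s*_n - log s*_{n-1}
z <- lret - mean(lret)                  # demeaned log returns
par(mfrow = c(1, 2))
plot(z, type = "l", main = "Demeaned log returns", xlab = "Day", ylab = "z")
acf(z, main = "ACF of demeaned log returns")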

From the ACF plot, the series looks like white noise: only a small number of bars cross the dashed confidence lines, and only slightly.

We take this demeaned log-return series as the data for modeling the stochastic volatility of the stock. Now we can build our models.

Models & Analysis

In this part, we fit three different stochastic models.

The SV Model with Financial Leverage

From the case study notes in our class, we know a little about financial leverage: it is commonly observed that relatively large negative shocks to a stock market index are correlated with subsequent increases in volatility. According to the Investopedia website, the definition of financial leverage is:

“Financial leverage is the degree to which a company uses fixed-income securities such as debt and preferred equity. The more debt financing a company uses, the higher its financial leverage. A high degree of financial leverage means high interest payments, which negatively affect the company’s bottom-line earnings per share.” (http://www.investopedia.com/)

In our mathematical formulation, the leverage \(R_n\) on day \(n\) is defined as the correlation between the log return on day \(n-1\) and the increase in log volatility from day \(n-1\) to day \(n\).

A model of this type was proposed by Bretó (2014).

In that model, \(R_n\) is a Gaussian random walk on a transformed scale: \[R_n= \frac{\exp\{2G_n\} -1}{\exp\{2G_n\}+1},\] where \(G_n\) is an ordinary Gaussian random walk (Bretó 2014).
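Note that this transform is just the hyperbolic tangent, which is why the C snippet below writes the leverage as tanh(G). A one-line check (not from the original report):

# Quick check: the leverage transform equals tanh(G)
G <- seq(-3, 3, by = 0.5)
all.equal((exp(2 * G) - 1) / (exp(2 * G) + 1), tanh(G))   # TRUE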

Now the stochastic model is: \[Z_n = \exp(H_n/2)\epsilon_n,\]\[H_n=\mu_h(1-\phi)+{\phi}H_{n-1}+\beta_{n-1}R_{n}\exp(-H_{n-1}/2)+\omega_n,\]\[G_n = G_{n-1} + \nu_n,\]\[\beta_n = {Z_n}\sigma_{\eta}\sqrt{1-\phi^2},\]\[\epsilon_n \sim iid~N(0,1),~\nu_n \sim iid~N(0,\sigma_\nu^2),~\omega_n \sim iid~N(0,\sigma_\omega^2),\] where \(H_n\) is the log volatility. (Bretó 2014)

Now we can build our POMP model. The observed variable is \(Z_n\), which interacts with the two latent processes as \((G_n,H_n)\) evolves to \((G_{n+1},H_{n+1})\). Following the class notes, we use the state variable \(X_n = (G_n,H_n,Z_n)\) and treat \(Z_n\) as the measurement variable. We then define the filter particle \(j\) at time \(n-1\) as \[X_{n-1,j}^F = (G_{n-1,j}^F,H_{n-1,j}^F,z_{n-1}^*),\] so that the prediction particles at time \(n\) are \[(G_{n,j}^P,H_{n,j}^P) \sim f_{G_n,H_n|G_{n-1},H_{n-1},Z_{n-1}}(g_n,h_n|G_{n-1,j}^F,H_{n-1,j}^F,z_{n-1}^*),\] with corresponding weight \[w_{n,j} = f_{Z_n|G_n,H_n}(z_n^*|G_{n,j}^P,H_{n,j}^P).\] The details of the filter and the derivation of the weights are in the appendix of the class notes (http://ionides.github.io/531w16/notes15/notes15.html).

Looking back at the financial leverage part, \(R_n\) is fixed over time when the standard deviation of \(G_n\) is set to 0. We can therefore get a general idea of whether the leverage is effectively fixed by looking at the fitted value of \(\sigma_\nu\); we return to this point later.

With the theoretical part set, we can now code the model in R. The code below is adapted from the class notes.

library(pomp)
library(plyr)
library(ggplot2)
require(reshape2)
require(doParallel)
registerDoParallel()
mcopts <- list(set.seed = TRUE)
goog_1_statenames <- c("H", "G", "Y_state")
goog_1_rp_names <- c("sigma_nu", "mu_h", "phi", "sigma_eta")
goog_1_ivp_names <- c("G_0", "H_0")
goog_1_paramnames <- c(goog_1_rp_names, goog_1_ivp_names)
goog_1_covarnames <- "covaryt"
rproc1_1 <- "
  double beta,omega,nu;
  omega = rnorm(0,sigma_eta * sqrt( 1- phi*phi ) * sqrt(1-tanh(G)*tanh(G)));
  nu = rnorm(0, sigma_nu);
  G += nu;
  beta = Y_state * sigma_eta * sqrt( 1- phi*phi );
  H = mu_h*(1 - phi) + phi*H + beta * tanh( G ) * exp(-H/2) + omega;
  "
rproc2_1.sim <- "
  Y_state = rnorm( 0,exp(H/2) );
  "
rproc2_1.filt <- "
  Y_state = covaryt;
  "
goog_1_rproc.sim <- paste(rproc1_1, rproc2_1.sim)
goog_1_rproc.filt <- paste(rproc1_1, rproc2_1.filt)
goog_1_initializer <- "
  G = G_0;
  H = H_0;
  Y_state = rnorm( 0,exp(H/2) );
"
goog_1_rmeasure <- "
  y=Y_state;
"
goog_1_dmeasure <- "
  lik=dnorm(y,0,exp(H/2),give_log);
"
  • As we learned, the iterated filtering process is more convenient when parameters are transformed onto the whole real line. Since \(\sigma_\eta\) and \(\sigma_\nu\) must be positive and \(\phi\) lies in \((0,1)\), we log-transform \(\sigma_\eta\) and \(\sigma_\nu\) and logit-transform \(\phi\) for estimation, then transform back (exp and expit) when presenting results. Note that Y_state in the code represents \(Z_n\) in the model.
goog_1_toEstimationScale <- "
  Tsigma_eta = log(sigma_eta);
  Tsigma_nu = log(sigma_nu);
  Tphi = logit(phi);
"
goog_1_fromEstimationScale <- "
  Tsigma_eta = exp(sigma_eta);
  Tsigma_nu = exp(sigma_nu);
  Tphi = expit(phi);
"

We could build our POMP object for filtering as below.

goog_1.filt <- pomp(
  data = data.frame(y = z,
  time = 1:length(z)),
  statenames = goog_1_statenames,
  paramnames = goog_1_paramnames,
  covarnames = goog_1_covarnames,
  times = "time",
  t0 = 0,
  covar = data.frame(covaryt = c(0, z),
  time = 0:length(z)),
  tcovar = "time",
  rmeasure = Csnippet(goog_1_rmeasure),
  dmeasure = Csnippet(goog_1_dmeasure),
  rprocess = discrete.time.sim(step.fun = Csnippet(goog_1_rproc.filt), delta.t = 1),
  initializer = Csnippet(goog_1_initializer),
  toEstimationScale = Csnippet(goog_1_toEstimationScale),
  fromEstimationScale = Csnippet(goog_1_fromEstimationScale)
  )
expit<-function(real){1/(1+exp(-real))}
logit<-function(p.arg){log(p.arg/(1-p.arg))}

The model implementation is now almost done; next we test that the code works by supplying a set of test parameter values. These numbers are taken from the class notes.

params_test_1 <- c(
  sigma_nu = exp(-4.5),
  mu_h = -0.25,
  phi = expit(4),
  sigma_eta = exp(-0.07),
  G_0 = 0,
  H_0 = 0
  )
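Before the long runs, a quick sanity check can be done along the lines of the class notes (this check is not shown in the original report): simulate one path from the model at the test parameters and run a cheap particle filter to confirm that the code executes and returns a finite log-likelihood.

# Sanity-check sketch: simulate from the model and run a small particle filter
sim1.sim <- pomp(goog_1.filt,
                 statenames = goog_1_statenames,
                 paramnames = goog_1_paramnames,
                 covarnames = goog_1_covarnames,
                 rprocess = discrete.time.sim(step.fun = Csnippet(goog_1_rproc.sim),
                                              delta.t = 1))
sim1.sim <- simulate(sim1.sim, seed = 1, params = params_test_1)
plot(Y_state ~ time, data = as.data.frame(sim1.sim), type = "l",
     main = "Simulated returns at the test parameters", ylab = "Y_state")
pf1 <- pfilter(goog_1.filt, params = params_test_1, Np = 1000)
logLik(pf1)   # should be finite if the model code is working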

Now we can run iterated filtering on our GOOGLE volatility data, using the IF2 algorithm of Ionides et al. (2015).

We start by fixing each coefficient's initial value to the same test values used above.

run_level <- 3
goog_1_Np <-          c(100, 1000, 30000)
goog_1_Nmif <-        c(10,  100,   300)
goog_1_Nreps_eval <-  c(10,   10,    20)
goog_1_Nreps_local <- c(10,   20,    40)
goog_1_Nreps_global <- c(10,   40,    40)
goog_1_rw.sd_rp <- 0.01
goog_1_rw.sd_ivp <- 0.01
goog_1_cooling.fraction.50 <- 0.5
stew("mif1_svf_lvl3.rda",{
  t.if1_1 <- system.time({
    if1_1 <- foreach(i=1:goog_1_Nreps_global[run_level],
                   .packages='pomp', .combine=c,
                   .options.multicore=list(set.seed=TRUE)) %dopar% try(
                     mif2(goog_1.filt,
                          start= params_test_1,
                          Np=goog_1_Np[run_level],
                          Nmif=goog_1_Nmif[run_level],
                          cooling.type="geometric",
                          cooling.fraction.50=goog_1_cooling.fraction.50,
                          transform=TRUE,
                          rw.sd = rw.sd(
                            sigma_nu  = goog_1_rw.sd_rp,
                            mu_h      = goog_1_rw.sd_rp,
                            phi       = goog_1_rw.sd_rp,
                            sigma_eta = goog_1_rw.sd_rp,
                            G_0       = ivp(goog_1_rw.sd_ivp),
                            H_0       = ivp(goog_1_rw.sd_ivp)
                          )
                     )
                   )
    L.if1_1 <- foreach(i=1:goog_1_Nreps_global[run_level],.packages='pomp',
                     .combine=rbind,.options.multicore=list(set.seed=TRUE)) %dopar%
                     {
                       logmeanexp(
                         replicate(goog_1_Nreps_eval[run_level],
                                   logLik(pfilter(goog_1.filt,params=coef(if1_1[[i]]),Np=goog_1_Np[run_level]))
                         ),
                         se=TRUE)
                     }
  })
},seed=3176,kind="L'Ecuyer")
t.if1_1
##       user     system    elapsed 
## 341759.436   1626.405  26822.410

It took about 7.5 hours to finish the computation. Now let's take a look at the result:

r.if1_1 <- data.frame(logLik=L.if1_1[,1],logLik_se=L.if1_1[,2],t(sapply(if1_1,coef)))
summary(r.if1_1$logLik,digits=5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  1131.6  1132.3  1132.5  1132.6  1132.8  1133.3

For this particular set of initial parameter values, the log-likelihood of this model is generally around \(1132.5\), with a standard error of about \(0.56\). We will compare this value with the other models later.

From the log-likelihood pair plots, we can get a general idea of the interval in which each coefficient is likely to lie: \(\sigma_\nu\) is in \((0,0.01)\), \(\mu_h\) in \((-2,0)\), \(\phi\) in \((0.998,1)\), which is very close to 1, and \(\sigma_\eta\) is located around \(10\) with a very wide interval.

pairs(~logLik+sigma_nu+mu_h+phi+sigma_eta,data=subset(r.if1_1,logLik>max(logLik)-20))

plot(if1_1)

Next we check the convergence plot and the diagnostics of the last iteration. The log-likelihood converges very quickly, within about 50 iterations. The coefficients other than \(\phi\) show less obvious convergence, probably because of the scale of the y-axis; even so, we can identify suitable intervals for the next search. In the diagnostics plot, the effective sample size drops sharply at times when Y_state fluctuates strongly. \(H_n\) and Y_state are fitted quite well, while the \(G_n\) trajectories share a common trend but differ in the magnitude of their fluctuations, which reflects the value of \(\sigma_\nu^2\) in \(\nu_n\).

Now we search for the maximum likelihood using starting values drawn at random from the selected intervals. First we specify the box of coefficient values based on the previous plots.

goog_1_box_1 <- rbind(
  sigma_nu=c(0.001,0.01),
  mu_h    =c(-2,0),
  phi = c(0.998,0.999),
  sigma_eta = c(7,13),
  G_0 = c(-2,2),
  H_0 = c(-1,1)
)
stew(file="box_eval_svf_lvl3.rda",{
  t.box_1 <- system.time({
    if.box_1 <- foreach(i=1:goog_1_Nreps_global[run_level],.packages='pomp',.combine=c,
                      .options.multicore=list(set.seed=TRUE)) %dopar%  
      mif2(
        if1_1[[1]],
        start=apply(goog_1_box_1,1,function(x)runif(1,x[1],x[2]))
      )
    
    L.box_1 <- foreach(i=1:goog_1_Nreps_global[run_level],.packages='pomp',.combine=rbind,
                     .options.multicore=list(set.seed=TRUE)) %dopar% {
                       set.seed(8793+i)
                       logmeanexp(
                         replicate(goog_1_Nreps_eval[run_level],
                                   logLik(pfilter(goog_1.filt,params=coef(if.box_1[[i]]),Np=goog_1_Np[run_level]))
                         ), 
                         se=TRUE)
                     }
  })
},seed=2900873,kind="L'Ecuyer")
t.box_1
##       user     system    elapsed 
## 216710.519    511.266  16603.362

It took about 4.6 hours to finish the computation.

r.box_1 <- data.frame(logLik=L.box_1[,1],logLik_se=L.box_1[,2],t(sapply(if.box_1,coef)))
summary(r.box_1$logLik,digits=5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  1042.0  1114.3  1121.0  1119.3  1127.5  1134.7

Here we see that the randomized starting values cause a drop in likelihood for some runs. Comparing these results with the first set of iterations, we may tentatively conclude that the test set of parameters remains useful across different financial settings and time intervals.

pairs(~logLik+sigma_nu+mu_h+phi+sigma_eta,data=subset(r.box_1,logLik>max(logLik)-30))

plot(if.box_1)

The iteration plots tell the same story: the convergence quality is generally worse than in the previous plots.

To further assess whether the financial leverage changes significantly over time, the usual approach is to construct a profile likelihood confidence interval for \(\sigma_\nu\) and check whether it contains 0. However, due to time limitations, I could not finish constructing this confidence interval. We can still look at the fitted values of \(\sigma_\nu\) to get a rough idea of whether the leverage changes over time.

summary(r.if1_1$sigma_nu)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0003519 0.0032270 0.0065090 0.0102900 0.0129100 0.0372600

From this very informal analysis of \(\sigma_\nu\), we cannot confidently say that the leverage changes over time, since the fitted values tend to lie close to 0. Hence, even if the financial leverage does change over time, the change is likely very small. This result may differ from the S&P 500 analysis of Bretó (2014), but that is plausible: we chose a relatively short 1.5-year period during which the U.S. economy was stable, and we analyzed a single stock rather than the S&P 500 index, so the results could well differ.

The SV Model with Jump Components

The financial jump model is motivated by the difficulty of fitting jumps in returns with the GARCH(1,1) and SVf models. Such jumps usually occur when news about the market, the company, or anything else related to the stock is released; the published information can put the market under stress and reduce liquidity. This is why Hautsch and Ou (2008) use an SV framework with jumps to incorporate this jump information naturally into the model. The model is: \[Z_n = K_nQ_n + \exp({H_n/2})\epsilon_n,\]\[\epsilon_n \sim N(0,1),\] \[H_n = \mu + \phi(H_{n-1}-\mu) + \eta_n,\]\[\eta_n \sim N(0,\sigma_\eta^2),\]\[K_n \sim N(\alpha_k,\beta_k),~Q_n \sim Bernoulli(\kappa).\] (Hautsch and Ou 2008)

Similarly, we build our POMP model with state variable \(X_n = (H_n, Q_n, K_n)\) and \(Z_n\) as the measurement variable. The recursive particle filter and prediction particles are defined analogously.

Then we build our second pomp model. In the code, I declare every variable as a state variable purely to avoid bugs; in fact there is only one state process, \((H_n, Q_n, K_n)\).

goog_statenames <- c("Q","K","H","Y","Y_state")
goog_rp_names <- c("k","alpha_k","beta_k","mu","phi","sigma_Y")
goog_ivp_names <- c("H_0")
goog_paramnames <- c(goog_rp_names,goog_ivp_names)
goog_covarnames <- "covaryt"
rproc1 <- "
Y = rnorm(0,sigma_Y);
H = mu + phi*(H - mu) + Y;
K = rnorm(alpha_k,sqrt(beta_k));
Q = rbinom(1,k);
"
rproc2.sim <- "
  Y_state = rnorm( K*Q , exp(H/2));
"
rproc2.filt <- "
  Y_state = covaryt;
"
goog_rproc.sim  <- paste(rproc1,rproc2.sim)
goog_rproc.filt <- paste(rproc1,rproc2.filt)
goog_initializer <- "
  H = H_0;
  Y_state = rnorm( K*Q,exp(H/2));
"
goog_rmeasure <- "
  y = Y_state;
"
goog_dmeasure <- "
  lik = dnorm(y,K*Q,exp(H/2),give_log);
"
  • Since \(\kappa\) and \(\phi\) lie in \((0,1)\) while \(\beta_k\) and \(\sigma_\eta\) must be positive, we logit-transform \(\kappa\) and \(\phi\) and log-transform \(\beta_k\) and \(\sigma_\eta\) during the iterations.
goog_fromEstimationScale <- "
Tk = expit(k);
Talpha_k = alpha_k;
Tbeta_k = exp(beta_k);
Tmu = mu;
Tphi = expit(phi);
Tsigma_Y = exp(sigma_Y);
"
goog_toEstimationScale <- "
Tk = logit(k);
Talpha_k = alpha_k;
Tbeta_k = log(beta_k);
Tmu = mu;
Tphi = logit(phi);
Tsigma_Y = log(sigma_Y);
"
goog.filt <- pomp(data=data.frame(y=z, time=1:length(z)),
                  statenames=goog_statenames,
                  paramnames=goog_paramnames,
                  covarnames=goog_covarnames,
                  times="time",
                  t0=0,
                  covar=data.frame(covaryt=c(0,z),time=0:length(z)),
                  tcovar="time",
                  rmeasure=Csnippet(goog_rmeasure),
                  dmeasure=Csnippet(goog_dmeasure),
                  rprocess=discrete.time.sim(step.fun=Csnippet(goog_rproc.filt),delta.t=1),
                  initializer=Csnippet(goog_initializer),
                  toEstimationScale=Csnippet(goog_toEstimationScale),
                  fromEstimationScale=Csnippet(goog_fromEstimationScale)
)

Fortunately, our reference also provides a suggested set of coefficients, which we can use directly as starting values for fitting our GOOGLE series.

params_test <- c(
  k = 0.01,
  alpha_k = -0.005,
  beta_k =  (0.03)^2,
  mu = -9.1,
  phi = 0.990,
  sigma_Y = 0.125,
  H_0 = 0.1
)
run_level <- 3
goog_Np <-          c(100, 1000, 30000)
goog_Nmif <-        c( 10,  100,   300)
goog_Nreps_eval <-  c( 10,   10,    20)
goog_Nreps_local <- c( 10,   20,    40)
goog_Nreps_global <-c( 10,   40,    40)
goog_rw.sd_rp <- 0.01
goog_rw.sd_ivp <- 0.01
goog_cooling.fraction.50 <- 0.5
stew("mif1_svj_lvl3.rda",{
  t.if1 <- system.time({
    if1 <- foreach(i=1:goog_Nreps_global[run_level],
                   .packages='pomp', .combine=c,
                   .options.multicore=list(set.seed=TRUE)) %dopar% try(
                     mif2(goog.filt,
                          start= params_test,
                          Np=goog_Np[run_level],
                          Nmif=goog_Nmif[run_level],
                          cooling.type="geometric",
                          cooling.fraction.50=goog_cooling.fraction.50,
                          transform=TRUE,
                          rw.sd = rw.sd(
                            k = goog_rw.sd_rp,
                            alpha_k = goog_rw.sd_rp,
                            beta_k =  goog_rw.sd_rp,
                            mu = goog_rw.sd_rp,
                            phi = goog_rw.sd_rp,
                            sigma_Y = goog_rw.sd_rp,
                            H_0 = ivp(goog_rw.sd_ivp)
                          )
                     )
                   )
    
    L.if1 <- foreach(i=1:goog_Nreps_global[run_level],.packages='pomp',
                     .combine=rbind,.options.multicore=list(set.seed=TRUE)) %dopar%
                     {
                       logmeanexp(
                         replicate(goog_Nreps_eval[run_level],
                                   logLik(pfilter(goog.filt,params=coef(if1[[i]]),Np=goog_Np[run_level]))
                         ),
                         se=TRUE)
                     }
  })
},seed=3176,kind="L'Ecuyer")
t.if1
##       user     system    elapsed 
## 311540.059   1629.368  12963.331

This run took about 3.6 hours. Let's check the log-likelihood below.

r.if1 <- data.frame(logLik=L.if1[,1],logLik_se=L.if1[,2],t(sapply(if1,coef)))
summary(r.if1$logLik,digits=5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  1174.8  1175.2  1175.4  1175.3  1175.5  1175.6

The log-likelihood for this model is around 1175.4, with a standard error of about 0.28.

This model appears to fit somewhat better than the SVf model: its log-likelihood is about 40 units higher.

pairs(~logLik+k+alpha_k+beta_k+mu+phi+sigma_Y,data=subset(r.if1,logLik>max(logLik)-10))

plot(if1)

From the pairwise plot, \(\kappa\) lies between \((0.002,0.010)\), \(\alpha_k\) in \((-0.10,-0.06)\), \(\beta_k\) in \((0,0.004)\), \(\mu\) in \((-8.80,-8.78)\), \(\phi\) in \((0.35,0.45)\), which is a rather wide interval, and \(\sigma_\eta\) is located around \(0.65\).

Comparing with the coefficients in Discrete-Time Stochastic Volatility Models and MCMC-Based Statistical Inference, the intervals we obtain here are quite close to the confidence intervals computed in that paper. Despite the differences in stock and time interval between the two studies, this suggests that the model tends to produce similar levels for the coefficients; we may tentatively say that the financial conditions we consider have not changed much within the broader financial environment over a relatively long period. To investigate this further, we would need to fit many different stocks over various time periods and check the consistency of the coefficients. At this stage, we can only say that these two sets of parameters show no clear sign against this interpretation.

The convergence plots show clear convergence of all the coefficients, including the log-likelihood. As in the SVf model, the one notable feature is that the effective sample size drops sharply when Y_state fluctuates severely; these drops certainly affect the fit of all the state variables.

I also ran a local search to try to find more precise values for each coefficient. As expected, the result did not show a significant change in the converged coefficient values, so for conciseness the computation results for this part are given in the Appendix at the end.

Testing the significance of \(\kappa\) by profile likelihood

In this part, we test whether the financial jump in this model is statistically significant. As the model specifies, \[Q_n \sim Bernoulli(\kappa),\] where \(\kappa\) is the Bernoulli probability, so we want to find out whether this probability is 0. Before constructing the confidence interval, we note that the fitted \(\beta_k\) is almost 0, which probably means that the effect of the financial jumps does not change dramatically during this time period.

We use the profile likelihood to construct a confidence interval for \(\kappa\). If the interval does not include 0, we conclude that the jump component is statistically significant. This approach is inspired by (http://kingaa.github.io/sbied/measles/measles-profile.html).

First we fix all the coefficients except \(\kappa\) at the values from the iteration with the maximum log-likelihood. Then \(\kappa\) is varied over a grid from 0.005 to 0.009 in steps of 0.0001, as in the code below.

load("box_eval_svj_lvl3.rda")
r.box <- data.frame(logLik=L.box[,1],logLik_se=L.box[,2],t(sapply(if.box,coef)))
goog_mle <- coef(if.box[[which.max(r.box$logLik)]])
fixed_paras <- goog_mle[-1]
names(fixed_paras) <- names(goog_mle[-1])
stew(file= "profile_like_lvl3_1.rda" ,{
  LOGLIK=c()
  for (k in seq(0.005,0.009,1e-04)){
    j=round(((k-0.005)/1e-04)+1,0)
    (fixed_params <- c("k" = k,fixed_paras))
    t.box_pf <- system.time({
      if.box_pf <- foreach(i=1:goog_Nreps_local[run_level],.packages='pomp',.combine=c,
                           .options.multicore=list(set.seed=TRUE)) %dopar%  
        mif2(
          if.box[[which.max(r.box$logLik)]],
          Np=goog_Np[run_level],
          Nmif=goog_Nmif[run_level],
          start=fixed_params
        )
      
      L.box_pf <- foreach(i=1:goog_Nreps_local[run_level],.packages='pomp',.combine=rbind,
                          .options.multicore=list(set.seed=TRUE)) %dopar% {
                            set.seed(8799+i)
                            logmeanexp(
                              replicate(goog_Nreps_eval[run_level],
                                        logLik(pfilter(goog.filt,params=coef(if.box_pf[[i]]),Np=goog_Np[run_level]))
                              ), 
                              se=TRUE)
                          }
    })
      t_global_eval <- system.time({
        liks_global <- foreach(i=1:goog_Nreps_local[run_level],.packages='pomp',.combine=rbind, .options.multicore=mcopts) %dopar% {
          evals <- replicate(goog_Nreps_eval[run_level], logLik(pfilter(goog.filt,params=coef(if.box_pf[[i]]),Np=goog_Np[run_level])))
          logmeanexp(evals, se=TRUE)
        }
      })
    print(j)
    print(k)
    LOGLIK[j]=mean(liks_global[,1])
    print(LOGLIK[j])
  }
})
Index=which(LOGLIK>=(max(LOGLIK)-1.92))
plot(seq(0.005,0.009,1e-04),LOGLIK,xlab="k",ylab="profile-likelihood",type="l")
lines(loess(LOGLIK~seq(0.005,0.009,1e-04))$fitted~seq(0.005,0.009,1e-04))
abline(h = max(LOGLIK)-1.92,lty=2, lwd=1, col = "red")
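For reference, the cutoff 1.92 is qchisq(0.95, df = 1)/2 from Wilks' approximation. A short follow-up (not in the original report) reports which values of \(\kappa\) on the grid fall inside the approximate 95% profile confidence interval:

# Grid values of kappa inside the approximate 95% profile cutoff
kappa_grid <- seq(0.005, 0.009, 1e-04)
range(kappa_grid[LOGLIK >= max(LOGLIK) - 1.92])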

From this profile likelihood we only observe the lower bound of the confidence interval, approximately 0.007. Even though we do not have the upper bound, this answer is still acceptable: because the confidence interval does not include 0, we can conclude that over this period the GOOGLE stock return is indeed affected by the financial jump component under our model assumptions.

GARCH model

Fitting a GARCH model is a classical and fast alternative to the two SV models above, and GARCH(1,1) is the most widely used form (Cowpertwait and Metcalfe 2009). The GARCH(1,1) model is: \[Z_n = \epsilon_n \sqrt{V_n},\]\[V_n = \alpha_0 + \alpha_1 Z_{n-1}^2 + \beta_1 V_{n-1},\] where \(\epsilon_{1:N}\) is white noise.

Here is the result of simply fitting the GARCH(1,1) model.

library(tseries)
garch(z,grad = "numerical", trace = FALSE)
## 
## Call:
## garch(x = z, grad = "numerical", trace = FALSE)
## 
## Coefficient(s):
##        a0         a1         b1  
## 0.0002422  0.0500000  0.0500000
logLik(garch(z,grad = "numerical", trace = FALSE))
## 'log Lik.' 1116.697 (df=3)

The Log-likelihood of GARCH(1,1) is 1116.697.

Conclusion

The log-likelihood for GARCH(1,1) is a bit lower than that of the SVf model, so we might roughly say that the SVf model is better; the log-likelihood of SVf is in turn lower than that of the SVj model. Note, however, that GARCH(1,1) takes only about a second to estimate while SVj and SVf each take hours of computation, and that SVf and SVj have more parameters to fit, so some increase in likelihood is to be expected. If we just need a fast model with a relatively high log-likelihood, GARCH(1,1) is a very good choice. On the other hand, if we need a more accurate model that incorporates more financial effects, SVj and SVf provide good directions; in particular, for this GOOGLE stock over this fairly short period, the SVj model works somewhat better.
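To make the "more parameters, higher likelihood" point concrete, a rough back-of-the-envelope comparison can use AIC with the maximized log-likelihoods reported above (1116.697 for GARCH(1,1), 1134.7 for SVf, 1175.6 for SVj) and the parameter counts (3, 6, and 7):

# Rough AIC comparison; log-likelihoods taken from the summaries above
loglik <- c(GARCH = 1116.697, SVf = 1134.7, SVj = 1175.6)
npar   <- c(GARCH = 3,        SVf = 6,      SVj = 7)
(aic <- -2 * loglik + 2 * npar)

Even after the parameter penalty, the SVj model still has the lowest AIC, which is consistent with the conclusion above.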

Reference

  1. Bretó, C. (2014). “On idiosyncratic stochasticity of financial leverage effects”, Statistics & Probability Letters 91:20-26.
  2. Hautsch, N. and Ou, Y. (2008). “Discrete-Time Stochastic Volatility Models and MCMC-Based Statistical Inference”, SFB 649 Discussion Paper 2008-063.
  3. Yahoo Finance, GOOG historical prices: http://finance.yahoo.com/q?s=GOOG
  4. Ionides, E., “Case study: POMP modeling to investigate financial volatility”, http://ionides.github.io/531w16/notes11/notes11.html
  5. Ionides, E., “Statistical methodology for nonlinear partially observed Markov process models”, http://ionides.github.io/531w16/notes15/notes15.html
  6. “Complete Guide To Corporate Finance”, http://www.investopedia.com/walkthrough/corporate-finance/5/capital-structure/financial-leverage.aspx
  7. King, A., “Measles profile computation”, http://kingaa.github.io/sbied/measles/measles-profile.html

Appendix

Local search for the SV Jump model

##       user     system    elapsed 
## 204747.967    478.094   8435.193
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  1128.2  1132.9  1171.9  1160.9  1172.5  1175.3