myround<- function (x, digits = 1) {
  # taken from the broman package
  if (digits < 1) 
    stop("This is intended for the case digits >= 1.")
  if (length(digits) > 1) {
    digits <- digits[1]
    warning("Using only digits[1]")
  }
  tmp <- sprintf(paste("%.", digits, "f", sep = ""), x)
  zero <- paste0("0.", paste(rep("0", digits), collapse = ""))
  tmp[tmp == paste0("-", zero)] <- zero
  tmp
}

set.seed(2050320976)
stopifnot(packageVersion("pomp")>="2.0")

1 Question Description

Stock Price is of a big interest to many people. When analysing its property, the returns are often used instead of the original prices. This is because stock returns are often found to be uncorrelated. Moreover, because of one characteristic of market index called volatility, which is the degree of variation of returns, a simple time series model such as ARMA model may not be good enough to capture this property. Thus, more advanced models may be needed.

In this project, I will study the time series of APPLE stock price. I will first fit a ARIMA model, then follow with a POMP (partially observed Markov process) model and a GARCH model.

2 Exploratory Data Analysis

The data is from Yahoo. It is the weekly data contains 7 variables and 785 records from 4/23/2005 to 4/23/2020.

dat=read.csv(file="AAPL.csv",header=TRUE)
head(dat)
##         Date     Open     High      Low    Close Adj.Close     Volume
## 1 2005-04-18 5.262857 5.285714 4.985714 5.071429  4.402208  209782300
## 2 2005-04-25 5.212857 5.358572 5.031428 5.151429  4.471652  854398300
## 3 2005-05-02 5.172857 5.332857 5.145714 5.320000  4.617978  531112400
## 4 2005-05-09 5.325714 5.350000 4.730000 4.967143  4.311685 1127723800
## 5 2005-05-16 4.937143 5.382857 4.932857 5.364286  4.656419  736293600
## 6 2005-05-23 5.407143 5.848571 5.407143 5.794286  5.029676  718392500

Here I will use the adjusted close price. Let look at the plot of the data:

aapl=dat$Adj.Close
aapl_ts=ts(aapl,start=2005.17,frequency = 52)
plot(aapl_ts,type='l',ylab='Apple Stock Price',main='Apple Stock Price')

We can see with the spread use of smart phone, the stock price of Apple Inc. witnesses dramatic increase in the last 15 years. It is clear that there are periodical rapid increase in the stock price, which corresponds with the Apple press with product announcement.

We use \(\{z_n^*,n=1,\cdots,n\}\) to denote the data. We can calculate the log return \(y_n^*\) as \[y_n^*=log(z_n^*)-log(z_{n-1}^*)\]

The time series plot of the log return is shown as follows:

aapl_df=diff(log(aapl))
aapl_df = aapl_df-mean(aapl_df)
plot(aapl_df,type='l',xlab='Time',main='Apple Log Return')

Also, we can plot the auto-correlation plot of the log returns:

acf(aapl_df, main='Acf of Log Returns')

From the ACF, we see the log returns are uncorrelated since they all lie in the confidence interval.

And we are ready to fit models.

3 GARCH Model

The GARCH models have become “widely used for financial time series modeling.” Here, we introduce the GARCH(p,q) model. The GARCH(p,q) has the form: \[Y_n = \epsilon_n\sqrt{V_n}\] Where \[V_n = a_0+\sum_{j=1}^pa_jY_{n-1}^2+\sum_{k=1}^qb_kY_{n-k}^2\] and \(\epsilon_{1:N}\) is white noise.

We use the GARCH model as a benchmark since GARCH is a simpler model than POMP. In practice, the GARCH(1,1) model is a popular choice , which can fitted as follows.

fit.garch.benchmark <- garch(aapl_df,grad = "numerical", trace = FALSE)
tseries:::logLik.garch(fit.garch.benchmark)
## 'log Lik.' 1345.298 (df=3)

From the result above, the logLikelihood of GARCH(1,1) model is 1345.298 with 3 parameters.

4 ARMA Model

spectrum(aapl_df, main = "Unsmoothed periodogram")

smoothed_r = spectrum(aapl_df, spans=c(30,30), main = "Smoothed periodogram")
abline(v = min(smoothed_r$freq[findPeaks(smoothed_r$spec)]),col = 'red',lty=2)

1/min(smoothed_r$freq[findPeaks(smoothed_r$spec)])
## [1] 36.36364

The period is around 36 weeks which is half a year. Since Apple press with product announcement usually holds twice a year, the period is quite resonable.

The trend is not clear to observe, so let’s decompress the returns to investigate

de=decompose(ts(aapl_df,start=2005.17,frequency = 52))
plot(de)

The trend is still not so obvious but the seasonality is easy to see.

So let’s look at the random part and try to fit it into an ARMA model based on AIC values:

aic_table <- function(data,P,Q){
  table <- matrix(NA,(P+1),(Q+1))
  for(p in 0:P) {
    for(q in 0:Q) {
      table[p+1,q+1] <- arima(data,order=c(p,0,q))$aic
    }
  }
  dimnames(table) <- list(paste("AR",0:P, sep=""),paste("MA",0:Q,sep=""))
  table
}

temp_aic_table <- aic_table(de$random,5,5)
require(knitr)
## Loading required package: knitr
kable(temp_aic_table,digits=2)
MA0 MA1 MA2 MA3 MA4 MA5
AR0 -2571.94 -2569.98 -2568.06 -2566.61 -2564.70 -2563.07
AR1 -2569.98 -2567.98 -2566.12 -2564.63 -2562.75 -2562.94
AR2 -2568.06 -2566.16 -2577.90 -2568.28 -2575.05 -2561.03
AR3 -2566.64 -2564.67 -2567.43 -2568.68 -2593.97 -2597.00
AR4 -2564.72 -2562.72 -2575.22 -2593.58 -2618.78 -2615.88
AR5 -2563.22 -2562.86 -2560.88 -2597.67 -2592.19 -2618.50

It is easy to see that the ARMA(0,0) model has the smallest AIC. Thus we fit the ARMA(0,0) model.

ar=arima(de$random,order = c(0,0,0))
ar
## 
## Call:
## arima(x = de$random, order = c(0, 0, 0))
## 
## Coefficients:
##       intercept
##          0.0000
## s.e.     0.0015
## 
## sigma^2 estimated as 0.001735:  log likelihood = 1287.97,  aic = -2571.94

We can see the log liklihood for ARMA(0,0) model is 1287.97, which is much smaller than the benchmark GARCH(1,1) log liklihood 1345.298. Then we analyze the residuals of ARMA(0,0) model.

qqnorm(ar$residuals)
qqline(ar$residuals)

The residuals still present a long-tailed distribution which disagrees with the normal assumption. It can also be checked by Shapiro-Wilk test.

shapiro.test(ar$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  ar$residuals
## W = 0.98445, p-value = 5.102e-07

The p-value is very small, which means we should reject the null hypothesis that the residuals follow normal distribution.

5 POMP Model

The phenomenon that negative shocks to a stockmarket index are associated with a subsequent increase in volatility is called leverage. Here, we formally define leverage, \(R_n\) on day \(n\) as the correlation between index return on day \(n − 1\) and the increase in the log volatility from day \(n − 1\) to day \(n\).

\[R_n=\frac{\exp(2G_n)-1}{\exp(2G_n)+1}\]

Where \(\{G_n\}\) is Gaussian random walk.

Then the model is: \[Y_n=\exp(\frac{H_n}{2})\] \[H_n=\mu_n(1-\phi)+\phi H_{n-1}+\beta_{n-1}R_n\exp\{-\frac{H_{n-1}}{2}\}+\omega_n\] \[G_n=G_{n-1}+\nu_n\] Where \(\beta=Y_n\sigma_{\eta}\sqrt{1-\phi^2}\), \(\{\epsilon_n\}\) is an iid \(\mathcal{N}(0,1)\) sequence, \(\{\nu_n\}\) is an iid \(\mathcal{N}(0,\sigma_{\nu}^2)\) sequence, and \(\{\omega_n\}\) is an iid \(\mathcal{N}(0,\sigma_{\omega}^2)\) sequence, \(H_n\) is the log volatility.

5.1 Building a POMP object

## ----names--------------------------------------------------------------------
aapl_statenames <- c("H","G","Y_state")
aapl_rp_names <- c("sigma_nu","mu_h","phi","sigma_eta")
aapl_ivp_names <- c("G_0","H_0")
aapl_paramnames <- c(aapl_rp_names,aapl_ivp_names)
aapl_covarnames <- "covaryt"

## ----rproc--------------------------------------------------------------------
rproc1 <- "
double beta,omega,nu;
omega = rnorm(0,sigma_eta * sqrt( 1- phi*phi ) * 
sqrt(1-tanh(G)*tanh(G)));
nu = rnorm(0, sigma_nu);
G += nu;
beta = Y_state * sigma_eta * sqrt( 1- phi*phi );
H = mu_h*(1 - phi) + phi*H + beta * tanh( G ) 
* exp(-H/2) + omega;
"
rproc2.sim <- "
Y_state = rnorm( 0,exp(H/2) );
"

rproc2.filt <- "
Y_state = covaryt;
"
aapl_rproc.sim <- paste(rproc1,rproc2.sim)
aapl_rproc.filt <- paste(rproc1,rproc2.filt)


## ----rinit--------------------------------------------------------------------
aapl_rinit <- "
G = G_0;
H = H_0;
Y_state = rnorm( 0,exp(H/2) );
"


## ----measure------------------------------------------------------------------
aapl_rmeasure <- "
y=Y_state;
"

aapl_dmeasure <- "
lik=dnorm(y,0,exp(H/2),give_log);
"


## ----transforms---------------------------------------------------------------
aapl_partrans <- parameter_trans(
  log=c("sigma_eta","sigma_nu"),
  logit="phi"
)

Then I simulate with an arbitrary parameters.

aapl.filt <- pomp(data=data.frame(
  y=aapl_df,time=1:length(aapl_df)),
  statenames=aapl_statenames,
  paramnames=aapl_paramnames,
  times="time",
  t0=0,
  covar=covariate_table(
    time=0:length(aapl_df),
    covaryt=c(0,aapl_df),
    times="time"),
  rmeasure=Csnippet(aapl_rmeasure),
  dmeasure=Csnippet(aapl_dmeasure),
  rprocess=discrete_time(step.fun=Csnippet(aapl_rproc.filt),
                         delta.t=1),
  rinit=Csnippet(aapl_rinit),
  partrans=aapl_partrans
)


params_test <- c(
  sigma_nu = exp(-4.5),  
  mu_h = -0.25,      
  phi = expit(4),    
  sigma_eta = exp(-0.07),
  G_0 = 0,
  H_0=0
)

sim1.sim <- pomp(aapl.filt, 
                 statenames=aapl_statenames,
                 paramnames=aapl_paramnames,
                 rprocess=discrete_time(
                   step.fun=Csnippet(aapl_rproc.sim),delta.t=1)
)

sim1.sim <- simulate(sim1.sim,seed=1,params=params_test)
plot(Y_state~time,data=sim1.sim,type='l',col='red',main='Oringinal vs Simulated',ylab='Log returns')
lines(aapl_df)
legend('topright',legend=c("Original","Simulated"),col=c("black","red"),lty = c(1,1))

We can see this is a poor simulation, but we will use this parameter set as a start to make a local search later.

sim1.filt <- pomp(sim1.sim, 
                  covar=covariate_table(
                    time=c(timezero(sim1.sim),time(sim1.sim)),
                    covaryt=c(obs(sim1.sim),NA),
                    times="time"),
                  statenames=aapl_statenames,
                  paramnames=aapl_paramnames,
                  rprocess=discrete_time(
                    step.fun=Csnippet(aapl_rproc.filt),delta.t=1)
)

5.2 Filtering on simulated data

## ----run_level----------------------------------------------------------------
run_level <- 3
aapl_Np <-           switch(run_level, 100, 1e3, 2e3)
aapl_Nmif <-         switch(run_level,  10, 100, 200)
aapl_Nreps_eval <-   switch(run_level,   4,  10,  20)
aapl_Nreps_local <-  switch(run_level,  10,  20,  20)
aapl_Nreps_global <- switch(run_level,  10,  20, 100)


## ----parallel-setup,cache=FALSE-----------------------------------------------
library(doParallel)
## Loading required package: iterators
## Loading required package: parallel
registerDoParallel()
library(doRNG)
## Loading required package: rngtools
registerDoRNG(34118892)

## ----pf1----------------------------------------------------------------------
stew(file=sprintf("pf1-%d.rda",run_level),{
  t.pf1 <- system.time(
    pf1 <- foreach(i=1:aapl_Nreps_eval,
      .packages='pomp') %dopar% pfilter(sim1.filt,Np=aapl_Np))
},seed=493536993,kind="L'Ecuyer")
(L.pf1 <- logmeanexp(sapply(pf1,logLik),se=TRUE))
##                          se 
## -1.093044e+03  4.743215e-02
## ----mif_setup----------------------------------------------------------------
aapl_rw.sd_rp <- 0.02
aapl_rw.sd_ivp <- 0.1
aapl_cooling.fraction.50 <- 0.5
aapl_rw.sd <- rw.sd(
  sigma_nu  = aapl_rw.sd_rp,
  mu_h      = aapl_rw.sd_rp,
  phi       = aapl_rw.sd_rp,
  sigma_eta = aapl_rw.sd_rp,
  G_0       = ivp(aapl_rw.sd_ivp),
  H_0       = ivp(aapl_rw.sd_ivp)
)    


## ----mif----------------------------------------------------------------------
stew(file=sprintf("mif1-%d.rda",run_level),{
  t.if1 <- system.time({
  if1 <- foreach(i=1:aapl_Nreps_local,
    .packages='pomp', .combine=c) %dopar% mif2(aapl.filt,
      params=params_test,
      Np=aapl_Np,
      Nmif=aapl_Nmif,
      cooling.fraction.50=aapl_cooling.fraction.50,
      rw.sd = aapl_rw.sd)
  L.if1 <- foreach(i=1:aapl_Nreps_local,
    .packages='pomp', .combine=rbind) %dopar% logmeanexp(
      replicate(aapl_Nreps_eval, logLik(pfilter(aapl.filt,
        params=coef(if1[[i]]),Np=aapl_Np))), se=TRUE)
  })
},seed=318817883,kind="L'Ecuyer")

r.if1 <- data.frame(logLik=L.if1[,1],logLik_se=L.if1[,2],
  t(sapply(if1,coef)))
if (run_level>1) write.table(r.if1,file="aapl_params.csv",
  append=TRUE,col.names=FALSE,row.names=FALSE)

pairs(~logLik+sigma_nu+mu_h+phi+sigma_eta,
   data=subset(r.if1,logLik>max(logLik)-20))

## ----box----------------------------------------------------------------------
aapl_box <- rbind(
 sigma_nu=c(0.005,0.05),
 mu_h    =c(-1,0),
 phi = c(0.95,0.99),
 sigma_eta = c(0.5,1),
 G_0 = c(-2,2),
 H_0 = c(-1,1)
)


## ----box_eval-----------------------------------------------------------------
stew(file=sprintf("box_eval-%d.rda",run_level),{
  t.box <- system.time({
    if.box <- foreach(i=1:aapl_Nreps_global,
      .packages='pomp',.combine=c) %dopar% mif2(if1[[1]],
        params=apply(aapl_box,1,function(x)runif(1,x)))
    L.box <- foreach(i=1:aapl_Nreps_global,
      .packages='pomp',.combine=rbind) %dopar% {
         logmeanexp(replicate(aapl_Nreps_eval, logLik(pfilter(
         aapl.filt,params=coef(if.box[[i]]),Np=aapl_Np))), 
           se=TRUE)
       }
  })
},seed=290860873,kind="L'Ecuyer")

r.box <- data.frame(logLik=L.box[,1],logLik_se=L.box[,2],
  t(sapply(if.box,coef)))
if(run_level>1) write.table(r.box,file="aapl_params.csv",
  append=TRUE,col.names=FALSE,row.names=FALSE)
summary(r.box$logLik,digits=5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1314    1335    1342    1340    1348    1352
pairs(~logLik+log(sigma_nu)+mu_h+phi+sigma_eta+H_0,
  data=subset(r.box,logLik>max(logLik)-10))

lik_max=subset(r.box,logLik==max(logLik))
lik_max
##             logLik  logLik_se    sigma_nu      mu_h       phi sigma_eta
## result.57 1351.834 0.06209907 0.001581687 -6.285642 0.9085966 0.6231066
##                  G_0       H_0
## result.57 -0.3072754 -4.768942

We can see the maximum log likihood is 1351.834, which is larger than the benchmark GARCH likihood 1345.298.

params_test <- c(
  sigma_nu = exp(log(lik_max$sigma_nu)),  
  mu_h = lik_max$mu_h,       
  phi = expit(logit(lik_max$phi)),     
  sigma_eta = exp(log(lik_max$sigma_eta)),
  G_0 = lik_max$G_0,
  H_0=lik_max$H_0
)
sim1.sim <- pomp(aapl.filt, 
                 statenames=aapl_statenames,
                 paramnames=aapl_paramnames,
                 covarnames=aapl_covarnames,
                 rprocess=discrete.time.sim(step.fun=Csnippet(aapl_rproc.sim),delta.t=1)
)
## Warning: 'discrete.time.sim' is deprecated and will be removed in a
## forthcoming release. Use 'discrete_time' instead.
sim1.sim <- simulate(sim1.sim,seed=8,params=params_test)

plot(Y_state~time,data=sim1.sim,type='l',col='red',main='Oringinal vs Simulated',ylab='Log returns')
lines(aapl_df)
legend('topright',legend=c("Original","Simulated"),col=c("black","red"),lty = c(1,1))

We can see this simulation is quite good. Although there are some big fluctuation it does not capture, it fits pretty well.

6 Conclusion

After comparing the ARMA model, the GARCH model and the POMP models, we conclude that the random walk leverage POMP model with randomized starting values is generally the best choice to investigate the financial volatility of Apple stock and GARCH model is a better model than ARMA model when analyzing financial volatility. Moreover, by implementing a POMP model, we can estimate the parameters denoted in the financial model which is remarkbly beneficial for financial study of volatility.

Due to the limited time and the considerable amount of computations, we are unable to provide an optimal presentation of our models. In the future, apart from refining the algorithms by increasing the sample size and the amount of iterations, we can also provide the best estimates for all parameters.

7 Reference