myround <- function(x, digits = 1) {
  # taken from the broman package
  if (digits < 1)
    stop("This is intended for the case digits >= 1.")
  if (length(digits) > 1) {
    digits <- digits[1]
    warning("Using only digits[1]")
  }
  tmp <- sprintf(paste("%.", digits, "f", sep = ""), x)
  zero <- paste0("0.", paste(rep("0", digits), collapse = ""))
  tmp[tmp == paste0("-", zero)] <- zero
  tmp
}
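A quick usage illustration (arbitrary values): `myround` formats a number to a fixed number of decimals and maps a formatted `-0.00` to `0.00`.
myround(3.14159, 2)   # "3.14"
myround(-0.0001, 2)   # "0.00" rather than "-0.00"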
library(pomp)
library(tseries)    # for garch()
library(quantmod)   # for findPeaks()
set.seed(2050320976)
stopifnot(packageVersion("pomp")>="2.0")
Stock prices are of great interest to many people. When analyzing their properties, returns are often used instead of the original prices, because stock returns are typically found to be nearly uncorrelated. Moreover, returns exhibit volatility, that is, their degree of variation changes over time, so a simple time series model such as an ARMA model may not capture this property; more advanced models may be needed.
In this project, I will study the time series of the Apple (AAPL) stock price. I will fit a GARCH model as a benchmark and an ARMA model, then a POMP (partially observed Markov process) model.
The data are from Yahoo Finance: weekly data containing 7 variables and 785 records from 4/23/2005 to 4/23/2020.
dat=read.csv(file="AAPL.csv",header=TRUE)
head(dat)
## Date Open High Low Close Adj.Close Volume
## 1 2005-04-18 5.262857 5.285714 4.985714 5.071429 4.402208 209782300
## 2 2005-04-25 5.212857 5.358572 5.031428 5.151429 4.471652 854398300
## 3 2005-05-02 5.172857 5.332857 5.145714 5.320000 4.617978 531112400
## 4 2005-05-09 5.325714 5.350000 4.730000 4.967143 4.311685 1127723800
## 5 2005-05-16 4.937143 5.382857 4.932857 5.364286 4.656419 736293600
## 6 2005-05-23 5.407143 5.848571 5.407143 5.794286 5.029676 718392500
Here I will use the adjusted close price. Let us look at a plot of the data:
aapl=dat$Adj.Close
aapl_ts=ts(aapl,start=2005.17,frequency = 52)
plot(aapl_ts,type='l',ylab='Apple Stock Price',main='Apple Stock Price')
We can see that, with the widespread adoption of smartphones, the Apple Inc. stock price has increased dramatically over the last 15 years. There are also clear episodes of rapid increase, which correspond to Apple press events announcing new products.
We use \(\{z_n^*,n=1,\cdots,N\}\) to denote the data. We can calculate the log return \(y_n^*\) as \[y_n^*=\log(z_n^*)-\log(z_{n-1}^*)\]
The time series plot of the log return is shown as follows:
aapl_df = diff(log(aapl))           # log returns
aapl_df = aapl_df - mean(aapl_df)   # center the returns at zero
plot(aapl_df,type='l',xlab='Time',main='Apple Log Return')
Also, we can plot the auto-correlation plot of the log returns:
acf(aapl_df, main='Acf of Log Returns')
From the ACF plot, the log returns appear uncorrelated, since nearly all sample autocorrelations lie within the dashed confidence band.
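As a complementary check, a Ljung-Box test assesses the joint significance of the first several autocorrelations (a minimal sketch; the lag of 20 is an arbitrary choice):
Box.test(aapl_df, lag = 20, type = "Ljung-Box")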
We are now ready to fit models.
The GARCH models have become "widely used for financial time series modeling." Here, we introduce the GARCH(p,q) model, which has the form \[Y_n = \epsilon_n\sqrt{V_n},\] where \[V_n = a_0+\sum_{j=1}^{p}a_jY_{n-j}^2+\sum_{k=1}^{q}b_kV_{n-k}\] and \(\epsilon_{1:N}\) is white noise.
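To make the variance recursion concrete, here is a minimal sketch of the GARCH(1,1) update; the coefficients a0, a1, b1 are illustrative values, not fitted estimates.
# Sketch: GARCH(1,1) conditional variance recursion with illustrative coefficients
garch11_var <- function(y, a0 = 1e-5, a1 = 0.1, b1 = 0.85) {
  V <- numeric(length(y))
  V[1] <- var(y)   # initialize at the sample variance
  for (n in 2:length(y)) {
    V[n] <- a0 + a1 * y[n - 1]^2 + b1 * V[n - 1]
  }
  V
}
plot(sqrt(garch11_var(aapl_df)), type = 'l', ylab = 'Conditional sd')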
We use the GARCH model as a benchmark, since GARCH is a simpler model than POMP. In practice, the GARCH(1,1) model is a popular choice, and it can be fitted as follows.
fit.garch.benchmark <- garch(aapl_df,grad = "numerical", trace = FALSE)
logLik(fit.garch.benchmark)
## 'log Lik.' 1345.298 (df=3)
From the result above, the log likelihood of the GARCH(1,1) model is 1345.298, with 3 fitted parameters.
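For later comparison with the ARMA results, the same fit can be summarized by AIC; a quick sketch using the log likelihood and parameter count reported above:
# AIC = -2*logLik + 2*df; here df = 3 for GARCH(1,1)
-2 * as.numeric(logLik(fit.garch.benchmark)) + 2 * 3   # about -2684.6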
spectrum(aapl_df, main = "Unsmoothed periodogram")
smoothed_r = spectrum(aapl_df, spans=c(30,30), main = "Smoothed periodogram")
abline(v = min(smoothed_r$freq[findPeaks(smoothed_r$spec)]),col = 'red',lty=2)
1/min(smoothed_r$freq[findPeaks(smoothed_r$spec)])
## [1] 36.36364
The dominant period is around 36 weeks, i.e., roughly eight months. Since Apple typically holds major product announcement events about twice a year, a periodicity on the scale of half a year to a year is plausible.
The trend is not clearly visible, so let us decompose the returns to investigate.
de=decompose(ts(aapl_df,start=2005.17,frequency = 52))
plot(de)
The trend is still not obvious, but the seasonality is easy to see.
So let us look at the random component and try to fit an ARMA model to it, with the order chosen by AIC:
# Tabulate AIC values for ARMA(p,q) models with 0 <= p <= P, 0 <= q <= Q
aic_table <- function(data, P, Q){
  table <- matrix(NA, (P+1), (Q+1))
  for(p in 0:P) {
    for(q in 0:Q) {
      table[p+1, q+1] <- arima(data, order = c(p, 0, q))$aic
    }
  }
  dimnames(table) <- list(paste("AR", 0:P, sep=""), paste("MA", 0:Q, sep=""))
  table
}
temp_aic_table <- aic_table(de$random,5,5)
require(knitr)
## Loading required package: knitr
kable(temp_aic_table,digits=2)
| | MA0 | MA1 | MA2 | MA3 | MA4 | MA5 |
|---|---|---|---|---|---|---|
| AR0 | -2571.94 | -2569.98 | -2568.06 | -2566.61 | -2564.70 | -2563.07 |
| AR1 | -2569.98 | -2567.98 | -2566.12 | -2564.63 | -2562.75 | -2562.94 |
| AR2 | -2568.06 | -2566.16 | -2577.90 | -2568.28 | -2575.05 | -2561.03 |
| AR3 | -2566.64 | -2564.67 | -2567.43 | -2568.68 | -2593.97 | -2597.00 |
| AR4 | -2564.72 | -2562.72 | -2575.22 | -2593.58 | -2618.78 | -2615.88 |
| AR5 | -2563.22 | -2562.86 | -2560.88 | -2597.67 | -2592.19 | -2618.50 |
The global minimum of the table is attained by a large model, ARMA(4,4) at -2618.78, but these large-model values look numerically unreliable: adding one parameter can raise the AIC by far more than the mathematically possible 2 (e.g., from ARMA(4,4) to ARMA(5,4) the AIC rises by more than 26), which indicates optimization problems. Among the simple models, ARMA(0,0) has the smallest AIC, so we fit the ARMA(0,0) model.
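As a quick programmatic check (a minimal sketch), the position of the smallest AIC in the table can be located with:
which(temp_aic_table == min(temp_aic_table), arr.ind = TRUE)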
ar=arima(de$random,order = c(0,0,0))
ar
##
## Call:
## arima(x = de$random, order = c(0, 0, 0))
##
## Coefficients:
## intercept
## 0.0000
## s.e. 0.0015
##
## sigma^2 estimated as 0.001735: log likelihood = 1287.97, aic = -2571.94
We can see the log likelihood of the ARMA(0,0) model is 1287.97, considerably smaller than the benchmark GARCH(1,1) log likelihood of 1345.298 (note that the ARMA model is fitted to the random component of the decomposition, so this comparison is only indicative). We then analyze the residuals of the ARMA(0,0) model.
qqnorm(ar$residuals)
qqline(ar$residuals)
The residuals show a heavy-tailed distribution, which disagrees with the normality assumption. This can also be checked with the Shapiro-Wilk test.
shapiro.test(ar$residuals)
##
## Shapiro-Wilk normality test
##
## data: ar$residuals
## W = 0.98445, p-value = 5.102e-07
The p-value is very small, so we reject the null hypothesis that the residuals follow a normal distribution.
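To quantify the heavy tails, here is a minimal base-R sketch comparing the sample excess kurtosis of the residuals to the value of 0 expected under normality:
z <- (ar$residuals - mean(ar$residuals, na.rm = TRUE)) / sd(ar$residuals, na.rm = TRUE)
mean(z^4, na.rm = TRUE) - 3   # excess kurtosis; a positive value indicates heavy tails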
The phenomenon that negative shocks to a stock market index are associated with a subsequent increase in volatility is called leverage. Here, we formally define leverage \(R_n\) on day \(n\) as the correlation between the index return on day \(n-1\) and the increase in log volatility from day \(n-1\) to day \(n\), and we model it as a transformed Gaussian random walk:
\[R_n=\frac{\exp(2G_n)-1}{\exp(2G_n)+1}=\tanh(G_n),\]
where \(\{G_n\}\) is a Gaussian random walk.
Then the model is: \[Y_n=\exp\left(\frac{H_n}{2}\right)\epsilon_n\] \[H_n=\mu_h(1-\phi)+\phi H_{n-1}+\beta_{n-1}R_n\exp\left\{-\frac{H_{n-1}}{2}\right\}+\omega_n\] \[G_n=G_{n-1}+\nu_n\] where \(\beta_n=Y_n\sigma_{\eta}\sqrt{1-\phi^2}\), \(H_n\) is the log volatility, \(\{\epsilon_n\}\) is an iid \(\mathcal{N}(0,1)\) sequence, \(\{\nu_n\}\) is an iid \(\mathcal{N}(0,\sigma_{\nu}^2)\) sequence, and \(\omega_n\) is \(\mathcal{N}(0,\sigma_{\omega,n}^2)\) with \(\sigma_{\omega,n}^2=\sigma_{\eta}^2(1-\phi^2)(1-R_n^2)\), matching the draw of `omega` in the process snippet below.
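Since \(R_n=\tanh(G_n)\), the C snippets below compute the leverage as `tanh(G)`; a one-line numerical check of this identity (with arbitrary illustrative values of \(G\)):
g <- seq(-2, 2, by = 0.5)
all.equal((exp(2*g) - 1)/(exp(2*g) + 1), tanh(g))   # TRUE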
## ----names--------------------------------------------------------------------
aapl_statenames <- c("H","G","Y_state")
aapl_rp_names <- c("sigma_nu","mu_h","phi","sigma_eta")
aapl_ivp_names <- c("G_0","H_0")
aapl_paramnames <- c(aapl_rp_names,aapl_ivp_names)
aapl_covarnames <- "covaryt"
## ----rproc--------------------------------------------------------------------
rproc1 <- "
  double beta, omega, nu;
  // omega ~ N(0, sigma_eta^2 (1-phi^2) (1 - tanh(G)^2)), i.e. sigma_{omega,n}^2
  omega = rnorm(0, sigma_eta * sqrt(1 - phi*phi) *
                   sqrt(1 - tanh(G)*tanh(G)));
  nu = rnorm(0, sigma_nu);   // random-walk innovation for G
  G += nu;
  beta = Y_state * sigma_eta * sqrt(1 - phi*phi);
  H = mu_h*(1 - phi) + phi*H + beta * tanh(G) * exp(-H/2) + omega;
"
rproc2.sim <- "
Y_state = rnorm( 0,exp(H/2) );
"
rproc2.filt <- "
Y_state = covaryt;
"
aapl_rproc.sim <- paste(rproc1,rproc2.sim)
aapl_rproc.filt <- paste(rproc1,rproc2.filt)
## ----rinit--------------------------------------------------------------------
aapl_rinit <- "
G = G_0;
H = H_0;
Y_state = rnorm( 0,exp(H/2) );
"
## ----measure------------------------------------------------------------------
aapl_rmeasure <- "
y=Y_state;
"
aapl_dmeasure <- "
lik=dnorm(y,0,exp(H/2),give_log);
"
## ----transforms---------------------------------------------------------------
aapl_partrans <- parameter_trans(
log=c("sigma_eta","sigma_nu"),
logit="phi"
)
Then I simulate from the model with an arbitrary set of parameters.
aapl.filt <- pomp(data=data.frame(
y=aapl_df,time=1:length(aapl_df)),
statenames=aapl_statenames,
paramnames=aapl_paramnames,
times="time",
t0=0,
covar=covariate_table(
time=0:length(aapl_df),
covaryt=c(0,aapl_df),
times="time"),
rmeasure=Csnippet(aapl_rmeasure),
dmeasure=Csnippet(aapl_dmeasure),
rprocess=discrete_time(step.fun=Csnippet(aapl_rproc.filt),
delta.t=1),
rinit=Csnippet(aapl_rinit),
partrans=aapl_partrans
)
params_test <- c(
sigma_nu = exp(-4.5),
mu_h = -0.25,
phi = expit(4),
sigma_eta = exp(-0.07),
G_0 = 0,
H_0=0
)
sim1.sim <- pomp(aapl.filt,
statenames=aapl_statenames,
paramnames=aapl_paramnames,
rprocess=discrete_time(
step.fun=Csnippet(aapl_rproc.sim),delta.t=1)
)
sim1.sim <- simulate(sim1.sim,seed=1,params=params_test)
plot(Y_state~time,data=sim1.sim,type='l',col='red',main='Original vs Simulated',ylab='Log returns')
lines(aapl_df)
legend('topright',legend=c("Original","Simulated"),col=c("black","red"),lty = c(1,1))
We can see this is a poor simulation, but we will use this parameter set as a starting point for the local search below.
sim1.filt <- pomp(sim1.sim,
covar=covariate_table(
time=c(timezero(sim1.sim),time(sim1.sim)),
covaryt=c(obs(sim1.sim),NA),
times="time"),
statenames=aapl_statenames,
paramnames=aapl_paramnames,
rprocess=discrete_time(
step.fun=Csnippet(aapl_rproc.filt),delta.t=1)
)
## ----run_level----------------------------------------------------------------
run_level <- 3
aapl_Np           <- switch(run_level, 100, 1e3, 2e3)  # number of particles
aapl_Nmif         <- switch(run_level, 10, 100, 200)   # mif2 iterations
aapl_Nreps_eval   <- switch(run_level, 4, 10, 20)      # replicates for likelihood evaluation
aapl_Nreps_local  <- switch(run_level, 10, 20, 20)     # starting points for the local search
aapl_Nreps_global <- switch(run_level, 10, 20, 100)    # starting points for the global search
## ----parallel-setup,cache=FALSE-----------------------------------------------
library(doParallel)
## Loading required package: iterators
## Loading required package: parallel
registerDoParallel()
library(doRNG)
## Loading required package: rngtools
registerDoRNG(34118892)
## ----pf1----------------------------------------------------------------------
stew(file=sprintf("pf1-%d.rda",run_level),{
t.pf1 <- system.time(
pf1 <- foreach(i=1:aapl_Nreps_eval,
.packages='pomp') %dopar% pfilter(sim1.filt,Np=aapl_Np))
},seed=493536993,kind="L'Ecuyer")
(L.pf1 <- logmeanexp(sapply(pf1,logLik),se=TRUE))
## se
## -1.093044e+03 4.743215e-02
## ----mif_setup----------------------------------------------------------------
aapl_rw.sd_rp <- 0.02
aapl_rw.sd_ivp <- 0.1
aapl_cooling.fraction.50 <- 0.5
aapl_rw.sd <- rw.sd(
sigma_nu = aapl_rw.sd_rp,
mu_h = aapl_rw.sd_rp,
phi = aapl_rw.sd_rp,
sigma_eta = aapl_rw.sd_rp,
G_0 = ivp(aapl_rw.sd_ivp),
H_0 = ivp(aapl_rw.sd_ivp)
)
## ----mif----------------------------------------------------------------------
stew(file=sprintf("mif1-%d.rda",run_level),{
t.if1 <- system.time({
if1 <- foreach(i=1:aapl_Nreps_local,
.packages='pomp', .combine=c) %dopar% mif2(aapl.filt,
params=params_test,
Np=aapl_Np,
Nmif=aapl_Nmif,
cooling.fraction.50=aapl_cooling.fraction.50,
rw.sd = aapl_rw.sd)
L.if1 <- foreach(i=1:aapl_Nreps_local,
.packages='pomp', .combine=rbind) %dopar% logmeanexp(
replicate(aapl_Nreps_eval, logLik(pfilter(aapl.filt,
params=coef(if1[[i]]),Np=aapl_Np))), se=TRUE)
})
},seed=318817883,kind="L'Ecuyer")
r.if1 <- data.frame(logLik=L.if1[,1],logLik_se=L.if1[,2],
t(sapply(if1,coef)))
if (run_level>1) write.table(r.if1,file="aapl_params.csv",
append=TRUE,col.names=FALSE,row.names=FALSE)
pairs(~logLik+sigma_nu+mu_h+phi+sigma_eta,
data=subset(r.if1,logLik>max(logLik)-20))
## ----box----------------------------------------------------------------------
aapl_box <- rbind(
sigma_nu=c(0.005,0.05),
mu_h =c(-1,0),
phi = c(0.95,0.99),
sigma_eta = c(0.5,1),
G_0 = c(-2,2),
H_0 = c(-1,1)
)
## ----box_eval-----------------------------------------------------------------
stew(file=sprintf("box_eval-%d.rda",run_level),{
t.box <- system.time({
if.box <- foreach(i=1:aapl_Nreps_global,
  .packages='pomp',.combine=c) %dopar% mif2(if1[[1]],
  params=apply(aapl_box,1,function(x)runif(1,x[1],x[2])))
L.box <- foreach(i=1:aapl_Nreps_global,
.packages='pomp',.combine=rbind) %dopar% {
logmeanexp(replicate(aapl_Nreps_eval, logLik(pfilter(
aapl.filt,params=coef(if.box[[i]]),Np=aapl_Np))),
se=TRUE)
}
})
},seed=290860873,kind="L'Ecuyer")
r.box <- data.frame(logLik=L.box[,1],logLik_se=L.box[,2],
t(sapply(if.box,coef)))
if(run_level>1) write.table(r.box,file="aapl_params.csv",
append=TRUE,col.names=FALSE,row.names=FALSE)
summary(r.box$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1314 1335 1342 1340 1348 1352
pairs(~logLik+log(sigma_nu)+mu_h+phi+sigma_eta+H_0,
data=subset(r.box,logLik>max(logLik)-10))
lik_max=subset(r.box,logLik==max(logLik))
lik_max
## logLik logLik_se sigma_nu mu_h phi sigma_eta
## result.57 1351.834 0.06209907 0.001581687 -6.285642 0.9085966 0.6231066
## G_0 H_0
## result.57 -0.3072754 -4.768942
We can see the maximum log likelihood is 1351.834, which is larger than the benchmark GARCH(1,1) log likelihood of 1345.298.
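Since the two models have different numbers of parameters (6 for this POMP model, 3 for GARCH(1,1)), a quick AIC comparison using the numbers above is more even-handed:
# AIC = -2*logLik + 2*df (smaller is better)
-2 * 1351.834 + 2 * 6   # POMP:  about -2691.7
-2 * 1345.298 + 2 * 3   # GARCH: about -2684.6
By either measure, the POMP model is preferred.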
params_test <- c(
  sigma_nu = lik_max$sigma_nu,
  mu_h = lik_max$mu_h,
  phi = lik_max$phi,
  sigma_eta = lik_max$sigma_eta,
  G_0 = lik_max$G_0,
  H_0 = lik_max$H_0
)
sim1.sim <- pomp(aapl.filt,
  statenames=aapl_statenames,
  paramnames=aapl_paramnames,
  rprocess=discrete_time(step.fun=Csnippet(aapl_rproc.sim),delta.t=1)
)
sim1.sim <- simulate(sim1.sim,seed=8,params=params_test)
plot(Y_state~time,data=sim1.sim,type='l',col='red',main='Original vs Simulated',ylab='Log returns')
lines(aapl_df)
legend('topright',legend=c("Original","Simulated"),col=c("black","red"),lty = c(1,1))
We can see this simulation is quite good. Although it does not capture some of the largest fluctuations, it fits the data well overall.
After comparing the ARMA model, the GARCH model, and the POMP model, we conclude that the random walk leverage POMP model with randomized starting values is the best of the three for investigating the financial volatility of Apple stock, and that the GARCH model performs better than the ARMA model for this purpose. Moreover, by implementing a POMP model we can estimate the parameters of the underlying financial model, which is remarkably beneficial for the study of volatility.
Due to limited time and the considerable amount of computation required, we are unable to present fully optimized models. In the future, apart from refining the search by increasing the number of particles and the number of iterations, we could also provide the best estimates for all parameters.