library(pomp)
## Loading required package: methods
library(plyr)
library(ggplot2)
library(foreach)
library(tseries)
Stock prices are of great interest to many people. When analyzing their properties, returns are often used instead of the original prices, because stock returns are typically found to be uncorrelated. Moreover, market indices exhibit volatility, the degree of variation of the returns over time, which a simple time series model such as an ARMA model may not be able to capture. Thus, more advanced models may be needed.
In this project, I study the time series of the Nintendo stock price. I first fit an ARMA model, then a POMP (partially observed Markov process) model and a GARCH model.
The data are from Yahoo Finance: weekly prices, 523 records in total, from 4/22/2008 to 4/22/2018.
dat=read.csv(file="NTDOY.csv",header=TRUE)
head(dat)
## Date Open High Low Close Adj.Close Volume
## 1 2008-04-21 68.50 71.15 67.20 68.55 57.72731 1897000
## 2 2008-04-28 69.85 70.45 67.00 67.61 56.93572 1468600
## 3 2008-05-05 68.05 70.25 59.50 69.49 58.51891 514800
## 4 2008-05-12 69.81 72.50 69.00 72.50 61.05368 1110300
## 5 2008-05-19 73.75 74.12 69.22 70.25 59.15892 991100
## 6 2008-05-26 71.50 71.50 63.50 68.95 58.06416 895100
Here I will use the adjusted close price. Let us look at a plot of the data:
ntd=dat$Adj.Close
ntd_ts=ts(ntd,start=2008.17,frequency = 52)
plot(ntd_ts,type='l',ylab='Nintendo Stock Price',main='Nintendo Stock Price')
As we can see, Nintendo's stock price peaked around 2008, mainly due to the success of its console, the Wii, released in 2006. It declined after that and reached a trough between 2012 and 2016, corresponding to the failure of the Wii's successor, the Wii U, released in 2012, and the popularity of the PS4 and Xbox One, released in 2013. After 2017, the stock price generally kept increasing, driven by the huge success of Nintendo's new console, the Switch, launched in 2017.
Let \(\{z^*_n, n=1,\dots,N\}\) denote the data. We then write the log return \(y^*_n\) as \[y^*_n=\log(z^*_n)-\log(z^*_{n-1}).\] The plot is below:
ntd_df=diff(log(ntd))
plot(ntd_df,type='l',xlab='Time',main='Nintendo Log Return')
acf(ntd_df, main='Acf of Log Returns')
From the ACF, we see that the log returns are essentially uncorrelated, since the sample autocorrelations all lie within the confidence bounds.
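As an additional check (not part of the original analysis; the lag of 20 is an arbitrary choice), a Ljung-Box test could be used to test formally for autocorrelation in the log returns:
# Ljung-Box test for joint significance of the first 20 sample autocorrelations
Box.test(ntd_df, lag = 20, type = "Ljung-Box")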
And we are ready to fit models.
From the periodogram, there is a high peak at frequency 0.44 cycles per week, which suggests a period of about 2.3 weeks. Thus there is a seasonality that we may take into consideration.
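The periodogram itself is not displayed above; the following is a minimal sketch of how a smoothed periodogram could be produced (the smoothing spans are an assumption):
# Smoothed periodogram of the weekly log returns; frequency is in cycles per week,
# so a peak near 0.44 corresponds to a period of about 1/0.44, roughly 2.3 weeks
spectrum(ntd_df, spans = c(3, 5, 3), main = 'Smoothed Periodogram of Log Returns')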
The trend is not easy to observe directly, so let's decompose the returns to investigate.
de=decompose(ts(ntd_df,start=2008.17,frequency = 52))
plot(de)
The decomposition shows that there is a slight positive trend in the returns, and the seasonal behavior is also apparent.
So let's look at the random component and fit an ARMA model to it, selecting the order based on the AIC values in the table below.
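The code that produced the AIC table is not shown in the document; here is a minimal sketch of how such a table could be computed (the `aic_table` helper is an assumption, not part of the original code; `method = "ML"` is used so that the NA values left by decompose at the ends of the series are handled):
require(knitr)
# Hypothetical helper: fit ARMA(p,q) models to a series and collect their AIC values
aic_table <- function(data, P, Q){
  table <- matrix(NA, P + 1, Q + 1)
  for (p in 0:P) {
    for (q in 0:Q) {
      table[p + 1, q + 1] <- arima(data, order = c(p, 0, q), method = "ML")$aic
    }
  }
  dimnames(table) <- list(paste0("AR", 0:P), paste0("MA", 0:Q))
  table
}
kable(aic_table(de$random, 5, 5), digits = 2)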
| | MA0 | MA1 | MA2 | MA3 | MA4 | MA5 |
|---|---|---|---|---|---|---|
| AR0 | -1418.76 | -1418.33 | -1417.28 | -1415.48 | -1415.48 | -1413.76 |
| AR1 | -1418.20 | -1436.06 | -1434.06 | -1413.99 | -1413.72 | -1411.99 |
| AR2 | -1417.03 | -1434.06 | -1432.43 | -1430.58 | -1411.73 | -1412.37 |
| AR3 | -1415.21 | -1414.01 | -1430.56 | -1435.54 | -1414.31 | -1413.40 |
| AR4 | -1415.91 | -1413.94 | -1411.94 | -1414.32 | -1418.65 | -1417.36 |
| AR5 | -1413.94 | -1411.94 | -1409.94 | -1412.28 | -1417.11 | -1413.15 |
Although ARMA(1,1) attains the lowest AIC (-1436.06), we choose the simpler ARMA(0,0) model, whose AIC (-1418.76) is still low. The summary of the fit is below:
ar=arima(de$random,order = c(0,0,0))
ar
##
## Call:
## arima(x = de$random, order = c(0, 0, 0))
##
## Coefficients:
## intercept
## 0.0008
## s.e. 0.0025
##
## sigma^2 estimated as 0.002837: log likelihood = 711.38, aic = -1418.76
Next we look at the residuals of the model:
qqnorm(ar$residuals)
qqline(ar$residuals)
shapiro.test(ar$residuals)
##
## Shapiro-Wilk normality test
##
## data: ar$residuals
## W = 0.94531, p-value = 3.764e-12
The normal QQ-plot shows heavy tails on both ends. The Shapiro-Wilk normality test also rejects the hypothesis that the residuals are normally distributed.
Thus, this is not a good model.
The phenomenon that negative shocks to a stock market index are associated with a subsequent increase in volatility is called leverage.
We model \(R_n\), the leverage at time \(n\), as \[R_n=\frac{\exp\{2G_n\}-1}{\exp\{2G_n\}+1},\] where \(\{G_n\}\) is a Gaussian random walk.
Then the model is: \[Y_n=\exp\{H_n/2\}\epsilon_n,\] \[H_n=\mu_h(1-\phi)+\phi H_{n-1}+\beta_{n-1}R_n \exp\{-H_{n-1}/2\}+\omega_n,\] \[G_n=G_{n-1}+\nu_n,\] where \(\beta_n=Y_n\sigma_\eta\sqrt{1-\phi^2}\), \(\{\epsilon_n\}\) is iid \(N(0,1)\), \(\{\nu_n\}\) is iid \(N(0,\sigma^2_\nu)\), \(\{\omega_n\}\) is \(N(0,\sigma^2_\omega)\), and \(H_n\) is the log volatility.
Then I build the POMP model:
ntd_statenames <- c("H","G","Y_state")
ntd_rp_names <- c("sigma_nu","mu_h","phi","sigma_eta")
ntd_ivp_names <- c("G_0","H_0")
ntd_paramnames <- c(ntd_rp_names,ntd_ivp_names)
ntd_covarnames <- "covaryt"
rproc1 <- "
double beta,omega,nu;
omega = rnorm(0,sigma_eta * sqrt( 1- phi*phi ) * sqrt(1-tanh(G)*tanh(G)));
nu = rnorm(0, sigma_nu);
G += nu;
beta = Y_state * sigma_eta * sqrt( 1- phi*phi );
H = mu_h*(1 - phi) + phi*H + beta * tanh( G ) * exp(-H/2) + omega;
"
rproc2.sim <- "
Y_state = rnorm( 0,exp(H/2) );
"
rproc2.filt <- "
Y_state = covaryt;
"
ntd_rproc.sim <- paste(rproc1,rproc2.sim)
ntd_rproc.filt <- paste(rproc1,rproc2.filt)
ntd_initializer <- "
G = G_0;
H = H_0;
Y_state = rnorm( 0,exp(H/2) );
"
ntd_rmeasure <- "
y=Y_state;
"
ntd_dmeasure <- "
lik=dnorm(y,0,exp(H/2),give_log);
"
ntd_toEstimationScale <- "
Tsigma_eta = log(sigma_eta);
Tsigma_nu = log(sigma_nu);
Tphi = logit(phi);
"
ntd_fromEstimationScale <- "
Tsigma_eta = exp(sigma_eta);
Tsigma_nu = exp(sigma_nu);
Tphi = expit(phi);
"
Then I simulate with an arbitrary set of parameters.
ntd.filt <- pomp(data=data.frame(y=ntd_df,
time=1:length(ntd_df)),
statenames=ntd_statenames,
paramnames=ntd_paramnames,
covarnames=ntd_covarnames,
times="time",
t0=0,
covar=data.frame(covaryt=c(0,ntd_df),
time=0:length(ntd_df)),
tcovar="time",
rmeasure=Csnippet(ntd_rmeasure),
dmeasure=Csnippet(ntd_dmeasure),
rprocess=discrete.time.sim(step.fun=Csnippet(ntd_rproc.filt),delta.t=1),
initializer=Csnippet(ntd_initializer),
toEstimationScale=Csnippet(ntd_toEstimationScale),
fromEstimationScale=Csnippet(ntd_fromEstimationScale)
)
expit<-function(real){1/(1+exp(-real))}
logit<-function(p.arg){log(p.arg/(1-p.arg))}
params_test <- c(
sigma_nu = exp(-4.5),
mu_h = -0.25,
phi = expit(4),
sigma_eta = exp(-0.07),
G_0 = 0,
H_0=0
)
sim1.sim <- pomp(ntd.filt,
statenames=ntd_statenames,
paramnames=ntd_paramnames,
covarnames=ntd_covarnames,
rprocess=discrete.time.sim(step.fun=Csnippet(ntd_rproc.sim),delta.t=1)
)
sim1.sim <- simulate(sim1.sim,seed=1,params=params_test)
plot(Y_state~time,data=sim1.sim,type='l',col='red',main='Original vs Simulated',ylab='Log returns')
lines(ntd_df)
legend('topright',legend=c("Original","Simulated"),col=c("black","red"),lty = c(1,1))
We can see this is a poor simulation, but we will use this parameter set as a starting point for the local search later.
sim1.filt <- pomp(sim1.sim,
covar=data.frame(
covaryt=c(obs(sim1.sim),NA),
time=c(timezero(sim1.sim),time(sim1.sim))),
tcovar="time",
statenames=ntd_statenames,
paramnames=ntd_paramnames,
covarnames=ntd_covarnames,
rprocess=discrete.time.sim(step.fun=Csnippet(ntd_rproc.filt),delta.t=1)
)
I then use this simulated data to estimate a likelihood.
run_level <- 3
ntd_Np <- c(100,1000,5000)
ntd_Nmif <- c(10, 100,200)
ntd_Nreps_eval <- c(4, 10, 20)
ntd_Nreps_local <- c(10, 20, 20)
ntd_Nreps_global <-c(10, 20, 100)
require(doParallel)
## Loading required package: doParallel
## Loading required package: iterators
## Loading required package: parallel
registerDoParallel(20)
stew("pf1.rda",{
t.pf1 <- system.time(
pf1 <- foreach(i=1:ntd_Nreps_eval[run_level],.packages='pomp',
.options.multicore=list(set.seed=TRUE)) %dopar% try(
pfilter(sim1.filt,Np=ntd_Np[run_level])
)
)
},seed=493536993,kind="L'Ecuyer")
(L.pf1 <- logmeanexp(sapply(pf1,logLik),se=TRUE))
## se
## -732.70008730 0.02841198
We can see this gives us a log likelihood estimate of -732.7 (the particle filter provides an unbiased estimate of the likelihood itself) with a Monte Carlo standard error of 0.03.
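For reference, `logmeanexp` averages the replicated particle-filter likelihoods on the natural scale rather than averaging the log likelihoods; a hand-computed version of the point estimate (a minimal sketch, omitting the jackknife standard error that `se=TRUE` adds) is:
# Numerically stable log-mean-exp of the replicated log likelihoods
ll <- sapply(pf1, logLik)
max(ll) + log(mean(exp(ll - max(ll))))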
I now use the IF2 algorithm of Ionides et al. to perform iterated filtering on the data.
ntd_rw.sd_rp <- 0.02
ntd_rw.sd_ivp <- 0.1
ntd_cooling.fraction.50 <- 0.5
stew("mif1.rda",{
t.if1 <- system.time({
if1 <- foreach(i=1:ntd_Nreps_local[run_level],
.packages='pomp', .combine=c,
.options.multicore=list(set.seed=TRUE)) %dopar% try(
mif2(ntd.filt,
start=params_test,
Np=ntd_Np[run_level],
Nmif=ntd_Nmif[run_level],
cooling.type="geometric",
cooling.fraction.50=ntd_cooling.fraction.50,
transform=TRUE,
rw.sd = rw.sd(
sigma_nu = ntd_rw.sd_rp,
mu_h = ntd_rw.sd_rp,
phi = ntd_rw.sd_rp,
sigma_eta = ntd_rw.sd_rp,
G_0 = ivp(ntd_rw.sd_ivp),
H_0 = ivp(ntd_rw.sd_ivp)
)
)
)
L.if1 <- foreach(i=1:ntd_Nreps_local[run_level],.packages='pomp',
.combine=rbind,.options.multicore=list(set.seed=TRUE)) %dopar%
{
logmeanexp(
replicate(ntd_Nreps_eval[run_level],
logLik(pfilter(ntd.filt,params=coef(if1[[i]]),Np=ntd_Np[run_level]))
),
se=TRUE)
}
})
},seed=318817883,kind="L'Ecuyer")
r.if1 <- data.frame(logLik=L.if1[,1],logLik_se=L.if1[,2],t(sapply(if1,coef)))
if (run_level>1)
write.table(r.if1,file="ntd_params.csv",append=TRUE,col.names=FALSE,row.names=FALSE)
summary(r.if1$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 771.1 775.3 787.4 786.8 798.9 799.0
As we can see, this gives us a maximum log likelihood of 799.0.
pairs(~logLik+sigma_nu+mu_h+phi+sigma_eta,data=subset(r.if1,logLik>max(logLik)-20))
From the pairs plot of the parameters against the log likelihood, we can get a rough idea of the parameter values associated with the highest likelihoods.
We are now ready to estimate the parameters based on global search.
We first have to set initial values for the parameter search.
Based on the results from the local search in the previous section, we could construct a box containing reasonable parameter values:
ntd_box <- rbind(
sigma_nu=c(0,0.05),
mu_h =c(-8,-5),
phi = c(0,0.5),
sigma_eta = c(0.8,1),
G_0 = c(-2,2),
H_0 = c(-1,1)
)
stew(file="box_eval.rda",{
t.box <- system.time({
if.box <- foreach(i=1:ntd_Nreps_global[run_level],.packages='pomp',.combine=c,
.options.multicore=list(set.seed=TRUE)) %dopar%
mif2(
if1[[1]],
        start=apply(ntd_box,1,function(x)runif(1,min=x[1],max=x[2]))
)
L.box <- foreach(i=1:ntd_Nreps_global[run_level],.packages='pomp',.combine=rbind,
.options.multicore=list(set.seed=TRUE)) %dopar% {
set.seed(87932+i)
logmeanexp(
replicate(ntd_Nreps_eval[run_level],
logLik(pfilter(ntd.filt,params=coef(if.box[[i]]),Np=ntd_Np[run_level]))
),
se=TRUE)
}
})
},seed=290860873,kind="L'Ecuyer")
r.box <- data.frame(logLik=L.box[,1],logLik_se=L.box[,2],t(sapply(if.box,coef)))
if(run_level>1) write.table(r.box,file="ntd_params.csv",append=TRUE,col.names=FALSE,row.names=FALSE)
summary(r.box$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 762.4 798.5 798.7 796.8 798.8 799.4
The best log likelihood was 799.4, which is slightly larger than the one from the local search.
pairs(~logLik+log(sigma_nu)+mu_h+phi+sigma_eta+H_0,data=subset(r.box,logLik>max(logLik)-10))
The pairs plot shows the parameter values that achieve the highest likelihoods; the maximum-likelihood estimates are printed below:
lik_max=subset(r.box,logLik==max(logLik))
lik_max
## logLik logLik_se sigma_nu mu_h phi sigma_eta
## result.44 799.3466 0.04245424 0.0004430871 -6.199666 0.3438113 0.8897086
## G_0 H_0
## result.44 0.02502883 -3.482849
Now let's plug those parameters into the model and run a simulation:
params_test <- c(
sigma_nu = exp(log(lik_max$sigma_nu)),
mu_h = lik_max$mu_h,
phi = expit(logit(lik_max$phi)),
sigma_eta = exp(log(lik_max$sigma_eta)),
G_0 = lik_max$G_0,
H_0=lik_max$H_0
)
sim1.sim <- pomp(ntd.filt,
statenames=ntd_statenames,
paramnames=ntd_paramnames,
covarnames=ntd_covarnames,
rprocess=discrete.time.sim(step.fun=Csnippet(ntd_rproc.sim),delta.t=1)
)
sim1.sim <- simulate(sim1.sim,seed=8,params=params_test)
plot(Y_state~time,data=sim1.sim,type='l',col='red',main='Original vs Simulated',ylab='Log returns')
lines(ntd_df)
legend('topright',legend=c("Original","Simulated"),col=c("black","red"),lty = c(1,1))
We can see this simulation is quite good. Although there are some big fluctuations it does not capture, it fits the data pretty well.
We then fit a GARCH(1,1) model to the data as a benchmark:
ntd_garch=garch(ntd_df,grad='numerical',trace=FALSE)
logLik(ntd_garch)
## 'log Lik.' 763.8883 (df=3)
We see the GARCH(1,1) model gives a maximized log likelihood of 763.9, which is smaller than that of the POMP model, so the POMP model provides a better fit.
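Since the two models have different numbers of parameters (3 for GARCH(1,1) versus 6 estimated parameters for the POMP model), a rough AIC comparison may be more even-handed; a minimal sketch using the fitted objects above:
# Rough AIC comparison: AIC = 2k - 2*logLik; smaller is better
aic_garch <- 2 * 3 - 2 * as.numeric(logLik(ntd_garch))
aic_pomp  <- 2 * 6 - 2 * max(r.box$logLik)
c(GARCH = aic_garch, POMP = aic_pomp)
By this measure (roughly -1522 versus -1587), the POMP model is still favored despite its additional parameters.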
In this project, I fit ARMA(0,0), POMP, and GARCH(1,1) models to the Nintendo stock price returns.
ARMA is not a good option, since it is not able to capture the volatility of the stock returns. Comparing GARCH with POMP, we see that POMP achieves a larger likelihood. Moreover, since the GARCH parameters do not have clear real-world interpretations, we should prefer the POMP model over the GARCH model.