library(pomp)
## Loading required package: methods
library(plyr)
library(ggplot2)
library(foreach)
library(tseries)
Stock prices are of great interest to many people. When analyzing their properties, returns are often used instead of the original prices, because stock returns are typically found to be uncorrelated. Moreover, market indices exhibit volatility, the degree of variation of the returns over time, which a simple time series model such as an ARMA model may not be able to capture. Thus, more advanced models may be needed.
In this project, I study the time series of the Nintendo stock price. I first fit an ARMA model, then a POMP (partially observed Markov process) model and a GARCH model.
The data are from Yahoo Finance: weekly prices, 523 records in total, from 4/22/2008 to 4/22/2018.
dat=read.csv(file="NTDOY.csv",header=TRUE)
head(dat)
## Date Open High Low Close Adj.Close Volume
## 1 2008-04-21 68.50 71.15 67.20 68.55 57.72731 1897000
## 2 2008-04-28 69.85 70.45 67.00 67.61 56.93572 1468600
## 3 2008-05-05 68.05 70.25 59.50 69.49 58.51891 514800
## 4 2008-05-12 69.81 72.50 69.00 72.50 61.05368 1110300
## 5 2008-05-19 73.75 74.12 69.22 70.25 59.15892 991100
## 6 2008-05-26 71.50 71.50 63.50 68.95 58.06416 895100
Here I will use the adjusted close price. Let us look at a plot of the data:
ntd=dat$Adj.Close
ntd_ts=ts(ntd,start=2008.17,frequency = 52)
plot(ntd_ts,type='l',ylab='Nintendo Stock Price',main='Nintendo Stock Price')
As we can see, Nintendo's stock price peaked around 2008, mainly due to the success of its console, the Wii, released in 2006. It declined after that and reached a trough between 2012 and 2016, corresponding to the failure of the Wii's successor, the Wii U, released in 2012, and the popularity of the PS4 and Xbox One, released in 2013. After 2017, the stock price generally kept increasing, driven by the huge success of Nintendo's new console, the Switch, launched in 2017.
Let \(\{z^*_n, n=1,\dots,N\}\) denote the data. We then write the log return \(y^*_n\) as \[y^*_n=\log(z^*_n)-\log(z^*_{n-1}).\] The plot is below:
ntd_df=diff(log(ntd))
plot(ntd_df,type='l',xlab='Time',main='Nintendo Log Return')
acf(ntd_df, main='Acf of Log Returns')
From the ACF, we see that the log returns are essentially uncorrelated, since the sample autocorrelations all lie within the confidence bounds.
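As an additional check (not part of the original analysis; the lag of 20 is an arbitrary choice), a Ljung-Box test could be used to test formally for autocorrelation in the log returns:
# Ljung-Box test for joint significance of the first 20 sample autocorrelations
Box.test(ntd_df, lag = 20, type = "Ljung-Box")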
And we are ready to fit models.
From the periodogram, there is a high peak at frequency 0.44 cycles per week, which suggests a period of about 2.3 weeks. Thus there is a seasonality that we may take into consideration.
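The periodogram itself is not displayed above; the following is a minimal sketch of how a smoothed periodogram could be produced (the smoothing spans are an assumption):
# Smoothed periodogram of the weekly log returns; frequency is in cycles per week,
# so a peak near 0.44 corresponds to a period of about 1/0.44, roughly 2.3 weeks
spectrum(ntd_df, spans = c(3, 5, 3), main = 'Smoothed Periodogram of Log Returns')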
The trend is not easy to observe directly, so let's decompose the returns to investigate.
de=decompose(ts(ntd_df,start=2008.17,frequency = 52))
plot(de)
The decomposition shows that there is a slight positive trend in the returns, and the seasonal behavior is also apparent.
So let's look at the random component and fit an ARMA model to it, selecting the order based on the AIC values in the table below.
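The code that produced the AIC table is not shown in the document; here is a minimal sketch of how such a table could be computed (the `aic_table` helper is an assumption, not part of the original code; `method = "ML"` is used so that the NA values left by decompose at the ends of the series are handled):
require(knitr)
# Hypothetical helper: fit ARMA(p,q) models to a series and collect their AIC values
aic_table <- function(data, P, Q){
  table <- matrix(NA, P + 1, Q + 1)
  for (p in 0:P) {
    for (q in 0:Q) {
      table[p + 1, q + 1] <- arima(data, order = c(p, 0, q), method = "ML")$aic
    }
  }
  dimnames(table) <- list(paste0("AR", 0:P), paste0("MA", 0:Q))
  table
}
kable(aic_table(de$random, 5, 5), digits = 2)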
| | MA0 | MA1 | MA2 | MA3 | MA4 | MA5 |
|---|---|---|---|---|---|---|
| AR0 | -1418.76 | -1418.33 | -1417.28 | -1415.48 | -1415.48 | -1413.76 |
| AR1 | -1418.20 | -1436.06 | -1434.06 | -1413.99 | -1413.72 | -1411.99 |
| AR2 | -1417.03 | -1434.06 | -1432.43 | -1430.58 | -1411.73 | -1412.37 |
| AR3 | -1415.21 | -1414.01 | -1430.56 | -1435.54 | -1414.31 | -1413.40 |
| AR4 | -1415.91 | -1413.94 | -1411.94 | -1414.32 | -1418.65 | -1417.36 |
| AR5 | -1413.94 | -1411.94 | -1409.94 | -1412.28 | -1417.11 | -1413.15 |
Although ARMA(1,1) attains the lowest AIC (-1436.06), we choose the simpler ARMA(0,0) model, whose AIC (-1418.76) is still low. The summary of the fit is below:
ar=arima(de$random,order = c(0,0,0))
ar
##
## Call:
## arima(x = de$random, order = c(0, 0, 0))
##
## Coefficients:
## intercept
## 0.0008
## s.e. 0.0025
##
## sigma^2 estimated as 0.002837: log likelihood = 711.38, aic = -1418.76
Next we look at the residuals of the model:
qqnorm(ar$residuals)
qqline(ar$residuals)
shapiro.test(ar$residuals)
##
## Shapiro-Wilk normality test
##
## data: ar$residuals
## W = 0.94531, p-value = 3.764e-12
The normal QQ-plot shows heavy tails on both ends. The Shapiro-Wilk normality test also rejects the hypothesis that the residuals are normally distributed.
Thus, this is not a good model.
The phenomenon that negative shocks to a stock market index are associated with a subsequent increase in volatility is called leverage.
We model \(R_n\), the leverage at time \(n\), as \[R_n=\frac{\exp\{2G_n\}-1}{\exp\{2G_n\}+1},\] where \(\{G_n\}\) is a Gaussian random walk.
Then the model is: \[Y_n=\exp\{H_n/2\}\epsilon_n,\] \[H_n=\mu_h(1-\phi)+\phi H_{n-1}+\beta_{n-1}R_n \exp\{-H_{n-1}/2\}+\omega_n,\] \[G_n=G_{n-1}+\nu_n,\] where \(\beta_n=Y_n\sigma_\eta\sqrt{1-\phi^2}\), \(\{\epsilon_n\}\) is iid \(N(0,1)\), \(\{\nu_n\}\) is iid \(N(0,\sigma^2_\nu)\), \(\{\omega_n\}\) is \(N(0,\sigma^2_\omega)\), and \(H_n\) is the log volatility.
Then I build the POMP model:
ntd_statenames <- c("H","G","Y_state")
ntd_rp_names <- c("sigma_nu","mu_h","phi","sigma_eta")
ntd_ivp_names <- c("G_0","H_0")
ntd_paramnames <- c(ntd_rp_names,ntd_ivp_names)
ntd_covarnames <- "covaryt"
rproc1 <- "
double beta,omega,nu;
omega = rnorm(0,sigma_eta * sqrt( 1- phi*phi ) * sqrt(1-tanh(G)*tanh(G)));
nu = rnorm(0, sigma_nu);
G += nu;
beta = Y_state * sigma_eta * sqrt( 1- phi*phi );
H = mu_h*(1 - phi) + phi*H + beta * tanh( G ) * exp(-H/2) + omega;
"
rproc2.sim <- "
Y_state = rnorm( 0,exp(H/2) );
"
rproc2.filt <- "
Y_state = covaryt;
"
ntd_rproc.sim <- paste(rproc1,rproc2.sim)
ntd_rproc.filt <- paste(rproc1,rproc2.filt)
ntd_initializer <- "
G = G_0;
H = H_0;
Y_state = rnorm( 0,exp(H/2) );
"
ntd_rmeasure <- "
y=Y_state;
"
ntd_dmeasure <- "
lik=dnorm(y,0,exp(H/2),give_log);
"
ntd_toEstimationScale <- "
Tsigma_eta = log(sigma_eta);
Tsigma_nu = log(sigma_nu);
Tphi = logit(phi);
"
ntd_fromEstimationScale <- "
Tsigma_eta = exp(sigma_eta);
Tsigma_nu = exp(sigma_nu);
Tphi = expit(phi);
"
Then I simulate with an arbitrary set of parameters.
ntd.filt <- pomp(data=data.frame(y=ntd_df,
time=1:length(ntd_df)),
statenames=ntd_statenames,
paramnames=ntd_paramnames,
covarnames=ntd_covarnames,
times="time",
t0=0,
covar=data.frame(covaryt=c(0,ntd_df),
time=0:length(ntd_df)),
tcovar="time",
rmeasure=Csnippet(ntd_rmeasure),
dmeasure=Csnippet(ntd_dmeasure),
rprocess=discrete.time.sim(step.fun=Csnippet(ntd_rproc.filt),delta.t=1),
initializer=Csnippet(ntd_initializer),
toEstimationScale=Csnippet(ntd_toEstimationScale),
fromEstimationScale=Csnippet(ntd_fromEstimationScale)
)
expit<-function(real){1/(1+exp(-real))}
logit<-function(p.arg){log(p.arg/(1-p.arg))}
params_test <- c(
sigma_nu = exp(-4.5),
mu_h = -0.25,
phi = expit(4),
sigma_eta = exp(-0.07),
G_0 = 0,
H_0=0
)
sim1.sim <- pomp(ntd.filt,
statenames=ntd_statenames,
paramnames=ntd_paramnames,
covarnames=ntd_covarnames,
rprocess=discrete.time.sim(step.fun=Csnippet(ntd_rproc.sim),delta.t=1)
)
sim1.sim <- simulate(sim1.sim,seed=1,params=params_test)
plot(Y_state~time,data=sim1.sim,type='l',col='red',main='Original vs Simulated',ylab='Log returns')
lines(ntd_df)
legend('topright',legend=c("Original","Simulated"),col=c("black","red"),lty = c(1,1))
We can see this is a poor simulation, but we will use this parameter set as a starting point for the local search later.
sim1.filt <- pomp(sim1.sim,
covar=data.frame(
covaryt=c(obs(sim1.sim),NA),
time=c(timezero(sim1.sim),time(sim1.sim))),
tcovar="time",
statenames=ntd_statenames,
paramnames=ntd_paramnames,
covarnames=ntd_covarnames,
rprocess=discrete.time.sim(step.fun=Csnippet(ntd_rproc.filt),delta.t=1)
)
I then use this simulated data to estimate a likelihood.
run_level <- 3
ntd_Np <- c(100,1000,5000)
ntd_Nmif <- c(10, 100,200)
ntd_Nreps_eval <- c(4, 10, 20)
ntd_Nreps_local <- c(10, 20, 20)
ntd_Nreps_global <-c(10, 20, 100)
require(doParallel)
## Loading required package: doParallel
## Loading required package: iterators
## Loading required package: parallel
registerDoParallel(20)
stew("pf1.rda",{
t.pf1 <- system.time(
pf1 <- foreach(i=1:ntd_Nreps_eval[run_level],.packages='pomp',
.options.multicore=list(set.seed=TRUE)) %dopar% try(
pfilter(sim1.filt,Np=ntd_Np[run_level])
)
)
},seed=493536993,kind="L'Ecuyer")
(L.pf1 <- logmeanexp(sapply(pf1,logLik),se=TRUE))
## se
## -732.70008730 0.02841198
We can see this gives us a log likelihood estimate of -732.7 (the particle filter provides an unbiased estimate of the likelihood itself) with a Monte Carlo standard error of 0.03.
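For reference, `logmeanexp` averages the replicated particle-filter likelihoods on the natural scale rather than averaging the log likelihoods; a hand-computed version of the point estimate (a minimal sketch, omitting the jackknife standard error that `se=TRUE` adds) is:
# Numerically stable log-mean-exp of the replicated log likelihoods
ll <- sapply(pf1, logLik)
max(ll) + log(mean(exp(ll - max(ll))))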
I now use the IF2 algorithm of Ionides et al. to perform iterated filtering on the data.
ntd_rw.sd_rp <- 0.02
ntd_rw.sd_ivp <- 0.1
ntd_cooling.fraction.50 <- 0.5
stew("mif1.rda",{
t.if1 <- system.time({
if1 <- foreach(i=1:ntd_Nreps_local[run_level],
.packages='pomp', .combine=c,
.options.multicore=list(set.seed=TRUE)) %dopar% try(
mif2(ntd.filt,
start=params_test,
Np=ntd_Np[run_level],
Nmif=ntd_Nmif[run_level],
cooling.type="geometric",
cooling.fraction.50=ntd_cooling.fraction.50,
transform=TRUE,
rw.sd = rw.sd(
sigma_nu = ntd_rw.sd_rp,
mu_h = ntd_rw.sd_rp,
phi = ntd_rw.sd_rp,
sigma_eta = ntd_rw.sd_rp,
G_0 = ivp(ntd_rw.sd_ivp),
H_0 = ivp(ntd_rw.sd_ivp)
)
)
)
L.if1 <- foreach(i=1:ntd_Nreps_local[run_level],.packages='pomp',
.combine=rbind,.options.multicore=list(set.seed=TRUE)) %dopar%
{
logmeanexp(
replicate(ntd_Nreps_eval[run_level],
logLik(pfilter(ntd.filt,params=coef(if1[[i]]),Np=ntd_Np[run_level]))
),
se=TRUE)
}
})
},seed=318817883,kind="L'Ecuyer")
r.if1 <- data.frame(logLik=L.if1[,1],logLik_se=L.if1[,2],t(sapply(if1,coef)))
if (run_level>1)
write.table(r.if1,file="ntd_params.csv",append=TRUE,col.names=FALSE,row.names=FALSE)
summary(r.if1$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 771.1 775.3 787.4 786.8 798.9 799.0
As we can see, this gives us a maximum log likelihood of 799.0.
pairs(~logLik+sigma_nu+mu_h+phi+sigma_eta,data=subset(r.if1,logLik>max(logLik)-20))
From the pairs plot of the parameters against the log likelihood, we can get a rough idea of the parameter values associated with the highest likelihoods.
We are now ready to estimate the parameters based on global search.
We first have to set initial values for the parameter search.
Based on the results from the local search in the previous section, we could construct a box containing reasonable parameter values:
ntd_box <- rbind(
sigma_nu=c(0,0.05),
mu_h =c(-8,-5),
phi = c(0,0.5),
sigma_eta = c(0.8,1),
G_0 = c(-2,2),
H_0 = c(-1,1)
)
stew(file="box_eval.rda",{
t.box <- system.time({
if.box <- foreach(i=1:ntd_Nreps_global[run_level],.packages='pomp',.combine=c,
.options.multicore=list(set.seed=TRUE)) %dopar%
mif2(
if1[[1]],
        start=apply(ntd_box,1,function(x)runif(1,min=x[1],max=x[2]))
)
L.box <- foreach(i=1:ntd_Nreps_global[run_level],.packages='pomp',.combine=rbind,
.options.multicore=list(set.seed=TRUE)) %dopar% {
set.seed(87932+i)
logmeanexp(
replicate(ntd_Nreps_eval[run_level],
logLik(pfilter(ntd.filt,params=coef(if.box[[i]]),Np=ntd_Np[run_level]))
),
se=TRUE)
}
})
},seed=290860873,kind="L'Ecuyer")
r.box <- data.frame(logLik=L.box[,1],logLik_se=L.box[,2],t(sapply(if.box,coef)))
if(run_level>1) write.table(r.box,file="ntd_params.csv",append=TRUE,col.names=FALSE,row.names=FALSE)
summary(r.box$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 762.4 798.5 798.7 796.8 798.8 799.4
The best log likelihood was 799.4, which is slightly larger than the one from the local search.
pairs(~logLik+log(sigma_nu)+mu_h+phi+sigma_eta+H_0,data=subset(r.box,logLik>max(logLik)-10))
The pairs plot shows the parameter values that achieve the highest likelihoods; the maximum-likelihood estimates are printed below:
lik_max=subset(r.box,logLik==max(logLik))
lik_max
## logLik logLik_se sigma_nu mu_h phi sigma_eta
## result.44 799.3466 0.04245424 0.0004430871 -6.199666 0.3438113 0.8897086
## G_0 H_0
## result.44 0.02502883 -3.482849
Now let's plug those parameters into the model and run a simulation:
params_test <- c(
sigma_nu = exp(log(lik_max$sigma_nu)),
mu_h = lik_max$mu_h,
phi = expit(logit(lik_max$phi)),
sigma_eta = exp(log(lik_max$sigma_eta)),
G_0 = lik_max$G_0,
H_0=lik_max$H_0
)
sim1.sim <- pomp(ntd.filt,
statenames=ntd_statenames,
paramnames=ntd_paramnames,
covarnames=ntd_covarnames,
rprocess=discrete.time.sim(step.fun=Csnippet(ntd_rproc.sim),delta.t=1)
)
sim1.sim <- simulate(sim1.sim,seed=8,params=params_test)
plot(Y_state~time,data=sim1.sim,type='l',col='red',main='Original vs Simulated',ylab='Log returns')
lines(ntd_df)
legend('topright',legend=c("Original","Simulated"),col=c("black","red"),lty = c(1,1))
We can see this simulation is quite good. Although there are some big fluctuations it does not capture, it fits the data pretty well.
We then fit a GARCH(1,1) model to the data as a benchmark:
ntd_garch=garch(ntd_df,grad='numerical',trace=FALSE)
logLik(ntd_garch)
## 'log Lik.' 763.8883 (df=3)
We see the GARCH(1,1) model gives a maximized log likelihood of 763.9, which is smaller than that of the POMP model, so the POMP model provides a better fit.
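Since the two models have different numbers of parameters (3 for GARCH(1,1) versus 6 estimated parameters for the POMP model), a rough AIC comparison may be more even-handed; a minimal sketch using the fitted objects above:
# Rough AIC comparison: AIC = 2k - 2*logLik; smaller is better
aic_garch <- 2 * 3 - 2 * as.numeric(logLik(ntd_garch))
aic_pomp  <- 2 * 6 - 2 * max(r.box$logLik)
c(GARCH = aic_garch, POMP = aic_pomp)
By this measure (roughly -1522 versus -1587), the POMP model is still favored despite its additional parameters.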
In this project, I fit ARMA(0,0), POMP, and GARCH(1,1) models to the Nintendo stock price returns.
ARMA is not a good option, since it is not able to capture the volatility of the stock returns. Comparing GARCH with POMP, we see that POMP achieves a larger likelihood. Moreover, since the GARCH parameters do not have clear real-world interpretations, we should prefer the POMP model over the GARCH model.