Introduction

As China's GDP has grown, its export and import values have increased dramatically. This naturally raises the questions of how Chinese international trade has grown and what it may look like in the future. I downloaded monthly export data from 1993 to 2018 and fit POMP models to it. From the midterm project, I found that the growth rate of the trade value is rather smooth and more convenient to model. However, the ARMA model did not fit the series very well; in particular, it could not capture the volatility accurately. So in this project I apply GARCH and stochastic volatility models to study the volatility of the data and see whether they provide a better fit.

Data

First, load the export data downloaded from the UN Comtrade database [1]. To clarify, the data are the combined export values of mainland China and Hong Kong. There is a large amount of trade between the two, which could substantially distort the results if they were treated separately, so I add their export values together for this analysis.
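The chunk that actually reads and combines the data is not echoed in this report, so below is a minimal sketch of how it could be done. It assumes the two Comtrade extracts are saved as china_export.csv and hongkong_export.csv with aligned monthly rows, a Period column (yyyymm) and a Trade.Value column; the file names, column names, and the use of the log difference as the growth rate are assumptions, not the exact original code.

library(tidyverse)
library(pomp)      # used below for the state-space (POMP) models

# Read the two export series and combine them (file/column names are assumed)
china <- read_csv("china_export.csv")
hk    <- read_csv("hongkong_export.csv")
dat <- data.frame(
  Time  = china$Period,                        # months, Jan 1993 - Dec 2018
  Value = china$Trade.Value + hk$Trade.Value   # combined export value
)
# Growth rate as the log difference; the leading NA is dropped before fitting
dat$Growth.Rate <- c(NA, diff(log(dat$Value)))

plot(dat$Growth.Rate, type = "l", xlab = "month (from Jan 1993)",
     ylab = "growth rate", main = "Growth rate of combined exports")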


From the plot above and the analysis in the midterm project, we can see that the growth rate is rather stable. So in this project I again fit the models to the growth rate series.

Model Fitting

I now fit the models to the data. The first is the GARCH(1,1) model, as implemented in the tseries package.

The generalized ARCH model, GARCH(p,q), has the form: \[ Y_{n}=\epsilon_{n}\sqrt{V_{n}} \]

where \[ V_{n}=\alpha_{0}+\sum_{j=1}^{p}\alpha_{j}Y_{n-j}^{2}+\sum_{k=1}^{q}\beta_{k}V_{n-k} \]

So for the GARCH(1,1) model, \(V_{n}\) equals: \[ V_{n}=\alpha_{0}+\alpha_{1}Y_{n-1}^{2}+\beta_{1}V_{n-1} \]

where \(\epsilon_{1:N}\) is a white noise process. Thus the GARCH(1,1) model has only three parameters to fit: \(\alpha_{0}\), \(\alpha_{1}\), and \(\beta_{1}\).

require(tseries)
gr <- dat$Growth.Rate
gr <- gr[-which(is.na(gr))]          # drop the leading NA
fit.garch <- garch(gr, grad = "numerical",
  trace = FALSE)                     # GARCH(1,1) is the default order
L.garch <- tseries:::logLik.garch(fit.garch)
AIC(L.garch)
## [1] -339.5199

The fitted GARCH(1,1) model has an AIC of -339.52; since \(AIC = 2k - 2\log L\) with \(k=3\) parameters here, this corresponds to a maximized log likelihood of about 172.8. We next build a pomp framework and fit the stochastic volatility model.
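As a quick check of that relation, the implied log likelihood can be backed out from the reported AIC; the short sketch below assumes the three estimated GARCH(1,1) coefficients \(\alpha_{0}\), \(\alpha_{1}\), \(\beta_{1}\).

k <- 3                             # alpha_0, alpha_1, beta_1
aic.garch <- AIC(L.garch)          # -339.52 as reported above
(loglik.garch <- k - aic.garch/2)  # roughly 172.8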

The model allows for leverage: \(R_{n}\), the leverage at time \(n\), is the correlation between the growth rate at time \(n-1\) and the increase in log volatility from time \(n-1\) to \(n\). Since this kind of model is widely used for financial and economic series, I expect it to work well for our data. Following the class notes [2], the pomp implementation is built as follows:

\[ Y_{n}=\exp\left \{ H_{n}/2 \right \}\epsilon_{n}, \\ H_{n}=\mu_{h}(1-\phi)+\phi H_{n-1}+\beta_{n-1}R_{n}\exp\left \{ -H_{n-1} /2 \right \}+\omega_{n},\\ G_{n}=G_{n-1}+\nu_{n} \]

where \(\beta_{n}=Y_{n}\sigma_{\eta}\sqrt{1-\phi^{2}}\), \(R_{n}=\tanh(G_{n})=\frac{e^{2G_{n}}-1}{e^{2G_{n}}+1}\) is the leverage, {\(\epsilon_{n}\)} is an iid \(N(0,1)\) sequence, {\(\nu_{n}\)} is an iid \(N(0,\sigma_{\nu}^{2})\) sequence, and {\(\omega_{n}\)} is an iid \(N(0,\sigma_{\omega}^{2})\) sequence. Here \(H_{n}\) is the log volatility of the growth rate, and \(G_{n}\) is the transformed (unconstrained) leverage.

gr_statenames <- c("H","G","Y_state")
gr_rp_names <- c("sigma_nu","mu_h","phi","sigma_eta")
gr_ivp_names <- c("G_0","H_0")
gr_paramnames <- c(gr_rp_names,gr_ivp_names)
rproc1 <- "
  double beta,omega,nu;
  omega = rnorm(0,sigma_eta * sqrt( 1- phi*phi ) * 
    sqrt(1-tanh(G)*tanh(G)));
  nu = rnorm(0, sigma_nu);
  G += nu;
  beta = Y_state * sigma_eta * sqrt( 1- phi*phi );
  H = mu_h*(1 - phi) + phi*H + beta * tanh( G ) 
    * exp(-H/2) + omega;
"
rproc2.sim <- "
  Y_state = rnorm( 0,exp(H/2) );
 "

rproc2.filt <- "
  Y_state = covaryt;
 "
gr_rproc.sim <- paste(rproc1,rproc2.sim)
gr_rproc.filt <- paste(rproc1,rproc2.filt)
gr_rinit <- "
  G = G_0;
  H = H_0;
  Y_state = rnorm( 0,exp(H/2) );
"
gr_rmeasure <- "
   y=Y_state;
"

gr_dmeasure <- "
   lik=dnorm(y,0,exp(H/2),give_log);
"
gr_partrans <- parameter_trans(
  log=c("sigma_eta","sigma_nu"),
  logit="phi"
)

Then I simulate from the model at some initial parameter values.

gr.filt <- pomp(data=data.frame(
    y=gr,time=1:length(gr)),
  statenames=gr_statenames,
  paramnames=gr_paramnames,
  times="time",
  t0=0,
  covar=covariate_table(
    time=0:length(gr),
    covaryt=c(0,gr),
    times="time"),
  rmeasure=Csnippet(gr_rmeasure),
  dmeasure=Csnippet(gr_dmeasure),
  rprocess=discrete_time(step.fun=Csnippet(gr_rproc.filt),
    delta.t=1),
  rinit=Csnippet(gr_rinit),
  partrans=gr_partrans
)
params_test <- c(
  sigma_nu = exp(-4.5),  
  mu_h = -0.25,      
  phi = expit(4),    
  sigma_eta = exp(-0.07),
  G_0 = 0,
  H_0=0
)
  
sim1.sim <- pomp(gr.filt, 
  statenames=gr_statenames,
  paramnames=gr_paramnames,
  rprocess=discrete_time(
    step.fun=Csnippet(gr_rproc.sim),delta.t=1)
)

sim1.sim <- simulate(sim1.sim,seed=1,params=params_test)
plot(sim1.sim)

We can see the simulation in the plot above. To some extent it captures the pattern of the growth rate when compared with the actual values. The choice of initial parameters also seems to strongly affect the simulation, which tells us to be careful about the range of the parameter box later. We can now filter the simulated data.
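To make the comparison with the actual values explicit, a minimal overlay of the simulated series against the observed growth rate could be drawn as below; this is a sketch using the sim1.sim and gr objects defined above.

sim.df <- as.data.frame(sim1.sim)            # simulated states and observations
plot(sim.df$Y_state, type = "l", col = "red",
     xlab = "month", ylab = "growth rate")
lines(gr, col = "blue")                      # observed growth rate
legend("topright", c("simulated", "observed"),
       col = c("red", "blue"), lty = 1, bty = "n")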

sim1.filt <- pomp(sim1.sim, 
  covar=covariate_table(
    time=c(timezero(sim1.sim),time(sim1.sim)),
    covaryt=c(obs(sim1.sim),NA),
    times="time"),
  statenames=gr_statenames,
  paramnames=gr_paramnames,
  rprocess=discrete_time(
    step.fun=Csnippet(gr_rproc.filt),delta.t=1)
)
run_level <- 3
gr_Np <-           switch(run_level, 200, 1e3, 2e3)
gr_Nmif <-         switch(run_level,  10, 100, 200)
gr_Nreps_eval <-   switch(run_level,   4,  10,  20)
gr_Nreps_local <-  switch(run_level,  10,  20,  20)
gr_Nreps_global <- switch(run_level,  10,  20, 100)
library(doParallel)
registerDoParallel()
library(doRNG)
registerDoRNG(34118892)

stew(file=sprintf("pf1-%d.rda",run_level),{
  t.pf1 <- system.time(
    pf1 <- foreach(i=1:gr_Nreps_eval,
      .packages='pomp') %dopar% pfilter(sim1.filt,Np=gr_Np))
},seed=493536993,kind="L'Ecuyer")
(L.pf1 <- logmeanexp(sapply(pf1,logLik),se=TRUE))
##                          se 
## -445.53632816    0.02803178

Filtering the simulated data at the test parameters gives a log likelihood estimate of about -445.5. We then use the IF2 algorithm to carry out a local and a global maximization of the log likelihood:

gr_rw.sd_rp <- 0.02
gr_rw.sd_ivp <- 0.1
gr_cooling.fraction.50 <- 0.5
gr_rw.sd <- rw.sd(
  sigma_nu  = gr_rw.sd_rp,
  mu_h      = gr_rw.sd_rp,
  phi       = gr_rw.sd_rp,
  sigma_eta = gr_rw.sd_rp,
  G_0       = ivp(gr_rw.sd_ivp),
  H_0       = ivp(gr_rw.sd_ivp)
)    
stew(file=sprintf("mif1-%d.rda",run_level),{
  t.if1 <- system.time({
  if1 <- foreach(i=1:gr_Nreps_local,
    .packages='pomp', .combine=c) %dopar% mif2(gr.filt,
      params=params_test,
      Np=gr_Np,
      Nmif=gr_Nmif,
      cooling.fraction.50=gr_cooling.fraction.50,
      rw.sd = gr_rw.sd)
  L.if1 <- foreach(i=1:gr_Nreps_local,
    .packages='pomp', .combine=rbind) %dopar% logmeanexp(
      replicate(gr_Nreps_eval, logLik(pfilter(gr.filt,
        params=coef(if1[[i]]),Np=gr_Np))), se=TRUE)
  })
},seed=318817883,kind="L'Ecuyer")

r.if1 <- data.frame(logLik=L.if1[,1],logLik_se=L.if1[,2],
  t(sapply(if1,coef)))
if (run_level>1) write.table(r.if1,file="gr_params.csv",
  append=TRUE,col.names=FALSE,row.names=FALSE)

summary(r.if1$logLik,digits=5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   205.8   206.3   206.6   206.5   206.8   206.9

The above is a local search of the log likelihood surface, and the largest value found is 206.9. The parameter values at which the log likelihood is maximized are shown below; each has moved somewhat from its starting value.

r.if1[which.max(r.if1$logLik),]
##            logLik  logLik_se    sigma_nu      mu_h       phi sigma_eta
## result.6 206.8567 0.04795096 0.002906927 -4.497791 0.4878593  1.303851
##                 G_0       H_0
## result.6 -0.3358953 -2.057262
pairs(~logLik+sigma_nu+mu_h+phi+sigma_eta,
  data=subset(r.if1,logLik>max(logLik)-20))

The pairs plot above still does not show clear peaks in the log likelihood for any of the parameters, so we next run a global search to see whether the estimates converge.

gr_box <- rbind(
 sigma_nu=c(0.002,0.06),
 mu_h    =c(-1,0),
 phi = c(0.90,0.99),
 sigma_eta = c(0.5,1),
 G_0 = c(-2,2),
 H_0 = c(-1,1)
)
stew(file=sprintf("box_eval-%d.rda",run_level),{
  t.box <- system.time({
    if.box <- foreach(i=1:gr_Nreps_global,
      .packages='pomp',.combine=c) %dopar% mif2(if1[[1]],
        params=apply(gr_box,1,function(x)runif(1,x[1],x[2]))) # draw each start uniformly within its box range
    L.box <- foreach(i=1:gr_Nreps_global,
      .packages='pomp',.combine=rbind) %dopar% {
         logmeanexp(replicate(gr_Nreps_eval, logLik(pfilter(
         gr.filt,params=coef(if.box[[i]]),Np=gr_Np))), 
           se=TRUE)
       }
  })
},seed=290860873,kind="L'Ecuyer")

r.box <- data.frame(logLik=L.box[,1],logLik_se=L.box[,2],
  t(sapply(if.box,coef)))
if(run_level>1) write.table(r.box,file="gr_params.csv",
  append=TRUE,col.names=FALSE,row.names=FALSE)
summary(r.box$logLik,digits=5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   205.8   206.4   206.5   206.5   206.7   207.1

The maximum log likelihood from the global search is 207.1, slightly larger than that of the local search above. The corresponding parameter estimates are shown below; they have shifted somewhat from the local-search values.

r.box[which.max(r.box$logLik),]
##             logLik  logLik_se    sigma_nu      mu_h       phi sigma_eta
## result.46 207.0861 0.07888484 0.003009881 -4.433394 0.4968869  1.336849
##                  G_0       H_0
## result.46 -0.4193286 -3.104808
pairs(~logLik+log(sigma_nu)+mu_h+phi+sigma_eta+H_0,
  data=subset(r.box,logLik>max(logLik)-10))

plot(if.box[r.box$logLik>max(r.box$logLik)-10])

From the diagnostic plots, we can see that most of the parameters converge to a fairly stable level. We then check the log likelihood convergence in the maximum likelihood region to see whether this model is applicable.

loglik_convergence <- do.call(cbind,
  traces(if.box[r.box$logLik>max(r.box$logLik)-10],"loglik"))
matplot(loglik_convergence,type="l",lty=1,
  ylim=max(loglik_convergence,na.rm=T)+c(-10,0))

Conclusion

From the diagnostic results, the global search of the likelihood surface shows that the log likelihood converges to a stable range as the number of iterations increases. Thus the stochastic leverage model can be used to describe the dynamics of the growth rate of trade values and its volatility.

Comparing the maximized log likelihoods of the GARCH(1,1) model (about 172.8) and the stochastic volatility model (207.1), the latter fits better. The maximum log likelihood of the ARMA model from my midterm project was 137.1, also lower than that of the stochastic volatility model. In conclusion, the stochastic volatility model fits the data better than the ARMA and GARCH models. Moreover, most of its parameters converge to fairly stable values and the log likelihood ends up in a narrow interval. Therefore, I think this model does a good job of describing the trade data.
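As a rough check that this conclusion also holds after penalizing for model size, the AIC values can be compared directly; the sketch below assumes six estimated parameters (\(\sigma_{\nu}\), \(\mu_{h}\), \(\phi\), \(\sigma_{\eta}\), \(G_{0}\), \(H_{0}\)) for the stochastic volatility model.

# AIC = 2k - 2*logLik; a lower value is better
aic.sv    <- 2*6 - 2*max(r.box$logLik)   # about -402
aic.garch <- AIC(L.garch)                # about -340
c(SV = aic.sv, GARCH = aic.garch)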

Reference

[1] UN Comtrade Database https://comtrade.un.org/data

[2] Lecture notes https://ionides.github.io/531w18/#class-notes