The CNH/USD pair represents the offshore Chinese Yuan against the US Dollar; the onshore Yuan traded inside mainland China is denoted CNY. The Yuan used to be pegged to the US Dollar but is now allowed to trade within a limited daily band against the reserve currency. China has used its control over the exchange rate to help ward off global financial crises.
This project investigates CNH/USD exchange rate prediction and, further, which model best describes this relationship. If any relationship is discovered or confirmed, it can be used as an effective proxy for exchange rate prediction. It can also have a significant impact on the macroeconomic decisions of central banks, on hedging and risk management for investors, and on any cross-border business that requires more than one currency for clearing and settlement purposes.
This project is also a continuation of the midterm project, where we decided to adopt ARIMA(4,1,4) to model the 2017 CNH/USD exchange rate. Based on the analysis of the midterm project, we can improve our model by:
a larger data set covering a longer time span (2013-2018);
nonlinear models, such as GARCH and POMP.
Please note that this file is executed under level 1; the level 3 output is stored in the separate files Rplot.pdf and final.out.
In this project, we look at the CNH to USD exchange rate from 18-Jul-13 to 24-Apr-18 (1244 trading days). This longer piece of historical data is downloaded from Investing.com \(^{[2]}\). A time plot of the exchange rate series gives a general overview of its behavior.
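Below is a minimal sketch of the data preparation; the file name and the Price column are assumptions based on the usual Investing.com CSV export, and the rows keep the downloaded order (newest first), matching the printout that follows.
# Sketch of data preparation (assumed file and column names from the Investing.com export)
raw <- read.csv("CNH_USD_Historical_Data.csv", stringsAsFactors = FALSE)
ex_data <- data.frame(Day = seq_len(nrow(raw)), Rate = raw$Price,
                      row.names = raw$Date)   # keep the downloaded order: Day 1 is the latest date
head(ex_data)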
## Day Rate
## 24-Apr-18 1 6.3203
## 23-Apr-18 2 6.3129
## 22-Apr-18 3 6.2873
## 20-Apr-18 4 6.2791
## 19-Apr-18 5 6.2774
## 18-Apr-18 6 6.2709
After simulating an ARIMA(4,1,4) process and comparing it with the original time series, we can say that the spectrum captures the important features of the data set. As a time series, CNH/USD itself is a decent predictor if ARIMA(4,1,4) is used as the prediction model.
##
## Call:
## arima(x = dc, order = c(4, 0, 4))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## 0.2985 0.7875 -0.4946 -0.3157 -0.2096 -0.8861 0.3925 0.4392
## s.e. 0.2068 0.1597 0.1369 0.1835 0.1973 0.1510 0.1363 0.1711
## intercept
## -1e-04
## s.e. 4e-04
##
## sigma^2 estimated as 0.0002302: log likelihood = 3442.17, aic = -6864.33
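For reference, here is a minimal sketch of how this fit and the simulation-based spectrum comparison could be reproduced; dc is assumed to be the differenced exchange rate series, and the smoothing spans are illustrative choices.
# Sketch: fit ARIMA(4,0,4) to the differenced series (equivalent to ARIMA(4,1,4) on the levels)
dc  <- diff(ex_data$Rate)                     # assumption: dc is the first difference of the rate
fit <- arima(dc, order = c(4, 0, 4))
fit
# Simulate a series from the fitted model and compare smoothed periodograms
sim <- arima.sim(model = list(ar = coef(fit)[1:4], ma = coef(fit)[5:8]),
                 n = length(dc), sd = sqrt(fit$sigma2))
spectrum(dc, spans = c(3, 5, 3), main = "differenced data")
spectrum(as.numeric(sim), spans = c(3, 5, 3), main = "ARIMA(4,1,4) simulation")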
So we still have two more tasks, GARCH modeling and POMP modeling, which are presented in the next two sections. We would like to fit our data in a more complex, nonlinear setting, to see whether a more complicated model generates better predictions for the CNH/USD exchange rate.
Our first alternative is to fit a GARCH(p,q) model, which is widely used for financial time series. The GARCH(p,q) model can be expressed as \[Y_t=\epsilon_t \sqrt{V_t}\] where \(V_t=\alpha_0+\sum_{j=1}^{p} \alpha_j Y_{t-j}^2+\sum_{k=1}^{q} \beta_k V_{t-k}\) and \(\epsilon_t\) is white noise.
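A minimal sketch of how the GARCH(1,1) fit reported below can be obtained with the tseries package; the order argument is written out explicitly here, while the call shown in the output relies on the default.
library(tseries)
# Fit GARCH(1,1) to the exchange rate series and inspect the result
fit_garch <- garch(x = ex_data$Rate, order = c(1, 1), grad = "numerical", trace = FALSE)
summary(fit_garch)   # coefficients, Jarque-Bera and Box-Ljung diagnostics
logLik(fit_garch)    # maximized log likelihood (3 d.f.)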
##
## Call:
## garch(x = ex_data$Rate, grad = "numerical", trace = FALSE)
##
## Model:
## GARCH(1,1)
##
## Residuals:
## Min 1Q Median 3Q Max
## 0.9526 0.9782 0.9792 0.9802 0.9941
##
## Coefficient(s):
## Estimate Std. Error t value Pr(>|t|)
## a0 8.947e-02 7.173e+02 0.000 1.000
## a1 1.041e+00 3.006e+02 0.003 0.997
## b1 1.115e-10 2.887e+02 0.000 1.000
##
## Diagnostic Tests:
## Jarque Bera Test
##
## data: Residuals
## X-squared = 18097, df = 2, p-value < 2.2e-16
##
##
## Box-Ljung test
##
## data: Squared.Residuals
## X-squared = 8.5703, df = 1, p-value = 0.003417
The maximized log likelihood of the GARCH model is -4075.241 with d.f. = 3, which is much smaller than that of ARIMA(4,1,4), 3442.17. Therefore, the GARCH model is not as good as the ARIMA model for modeling the CNH/USD exchange rate. We decided to move on to POMP modeling.
A geometric Brownian motion (GBM) is a continuous-time stochastic process in which the logarithm of the randomly varying quantity follows a Brownian motion with drift \(^{[3]}\). The stochastic differential equation is \[dN=\mu N dt+\delta N dz\] Solving it gives \[N_t=N_0e^{(\mu-\frac{\delta^2}{2})t+\delta\sqrt{t} \epsilon}\] Discretizing by single trading days, \[N_{t+1}=N_te^{(\mu-\frac{\delta^2}{2})+\delta \epsilon},\quad \epsilon \sim N(0,1)\] We describe our state variables based on the GBM model as \(N\) and \(\epsilon\).
There are two parameters in this model:
the drift parameter for the constant trend, \(\mu\);
the volatility parameter for deviations, \(\delta\).
After taking the difference of the log of the original time series, we get \(log(N_t)-log(N_{t-1})\), which under the discretization above follows a normal distribution with mean \(\mu-\frac{\delta^2}{2}\approx\mu\) and standard deviation \(\delta\). We use our historical data to obtain the empirical sample mean and sample standard deviation as follows.
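A minimal sketch of how these empirical estimates can be computed, assuming the exchange rate series is stored in ex_data$Rate; the resulting values mu0 and delta0 are used as the starting parameters below.
# Empirical GBM parameter estimates from the differenced log series
dlog_rate <- diff(log(ex_data$Rate))
mu0    <- mean(dlog_rate)   # drift estimate
delta0 <- sd(dlog_rate)     # volatility estimate
mu0
delta0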
## [1] -2.321843e-05
## [1] 0.002357479
The initial value of \(N\) is drawn from a normal distribution with mean 6.5 and standard deviation 0.3. The initial value of \(e\) is drawn from a normal distribution with mean 1 and standard deviation 1. The rmeasure draws Rate from a normal distribution with mean 0 and standard deviation equal to the state variable \(N\). Implementation details:
ex_statenames  <- c("N","e")                       # state variables: rate level N and noise e
ex_paramnames  <- c("mu","delta")                  # parameters: drift mu, volatility delta
ex_dmeasure    <- Csnippet("lik = dnorm(Rate,0,N,give_log);")   # measurement density
ex_rmeasure    <- Csnippet("Rate = rnorm(0,N);")                # measurement simulator
ex_rprocess    <- Csnippet("e = rnorm(0,1); N = N*exp((mu-delta*delta/2)+delta*e);")  # one-step GBM update
ex_initializer <- "N = rnorm(6.5,0.3); e = rnorm(1,1);"         # initial state draws
stopifnot(packageVersion("pomp") >= "0.75-1")
ex_fromEstimationScale <- "Tmu = exp(mu); Tdelta = expit(delta);"  # estimation scale -> natural scale
ex_toEstimationScale   <- "Tmu = log(mu); Tdelta = logit(delta);"  # natural scale -> estimation scale
Please note that this file is executed under level 1; the level 3 output is stored in the separate file Rplot.pdf. We first set up a pomp object as follows, and generate a set of simulated trajectories at a particular point in parameter space.
ex <<- pomp(
  data        = ex_data,
  times       = "Day",
  t0          = 0,
  rprocess    = discrete.time.sim(step.fun = ex_rprocess, delta.t = 1),
  rmeasure    = ex_rmeasure,
  dmeasure    = ex_dmeasure,
  obsnames    = "Rate",
  statenames  = ex_statenames,
  paramnames  = ex_paramnames,
  initializer = Csnippet(ex_initializer))
# Simulate 20 state trajectories at (mu0, delta0) and overlay the observed series
simulate(ex, params = c(mu = mu0, delta = delta0), nsim = 20, states = TRUE) -> x
matplot(time(ex), t(x["N", 1:20, ]), type = 'l', lty = 1,
        xlab = "time", ylab = "Rate", bty = 'l', col = 'blue')
lines(time(ex), obs(ex, "Rate"), lwd = 2, col = 'black')
The basic particle filter is implemented in the command pfilter in pomp. A theoretical property of the particle filter is that it gives us an unbiased Monte Carlo estimate of the likelihood.
This theoretical property, combined with Jensen’s inequality and the observation that the logarithm is a concave function, ensures that the average of the log likelihoods from many particle filter replications has a negative bias as a Monte Carlo estimator of the log likelihood. We therefore use logmeanexp to average the likelihood estimates on the natural scale rather than on the logarithmic scale.
Here we evaluate the log likelihood of the data given the model and the parameters:
pf <- pfilter(ex,Np=ex_Np,params=c(mu=mu0,delta=delta0))
logLik(pf)
## [1] -4081.127
We can also repeat the particle filter several times to obtain an estimate with a Monte Carlo standard error:
pf <- replicate(10,pfilter(ex,Np=ex_Np,params=c(mu=-0.0001,delta=0.002)))
ll <- sapply(pf,logLik)
logmeanexp(ll,se=TRUE)
## se
## -4.080718e+03 8.466601e-02
However, we would like to know the log likelihood at other parameter values to better calibrate the model. Intuitively, it is helpful to think of the geometric surface defined by the likelihood function as the parameters vary.
Now we can plot two likelihood slices (a sketch of the computation is given after this list):
fix \(\mu=-2.321843e-05\) and vary \(\delta\) over the interval \([0,0.5]\);
fix \(\delta=0.002357479\) and vary \(\mu\) over the interval \([-0.5,0.5]\).
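The sketch below assumes the pomp object ex, the starting values mu0 and delta0, and the particle count ex_Np defined above; the grid resolution and the plain for loop (the level-3 run uses parallel evaluation) are illustrative choices.
# Design two one-dimensional slices through (mu0, delta0)
p <- sliceDesign(
  center = c(mu = mu0, delta = delta0),
  mu     = rep(seq(from = -0.5, to = 0.5, length = 40), each = 3),
  delta  = rep(seq(from = 0,    to = 0.5, length = 40), each = 3))
# Evaluate the particle filter log likelihood at each point on the slices
p$loglik <- NA
for (i in seq_len(nrow(p))) {
  pf <- pfilter(ex, params = unlist(p[i, c("mu", "delta")]), Np = ex_Np)
  p$loglik[i] <- logLik(pf)
}
# Plot each slice
for (v in c("mu", "delta")) {
  x <- subset(p, slice == v)
  plot(x[[v]], x$loglik, xlab = v, ylab = "loglik")
}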
Slices offer only a limited perspective on the geometry of the likelihood surface; many parameter combinations are left untested. With just two parameters, \(\mu\) and \(\delta\), we can evaluate the likelihood on a grid of points and visualize the surface directly.
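A sketch of the grid evaluation and contour plot is given below, again assuming the objects ex and ex_Np from above; the grid ranges and resolution are illustrative, and the level-3 run may use a finer grid and parallel evaluation.
library(ggplot2)
# Evaluate the particle filter log likelihood on a (mu, delta) grid
grid <- expand.grid(mu    = seq(from = -0.5, to = 0.5, length = 30),
                    delta = seq(from = 1e-4, to = 0.5, length = 30))
grid$loglik <- apply(grid[, c("mu", "delta")], 1, function(theta) {
  logLik(pfilter(ex, params = theta, Np = ex_Np))
})
# Visualize the likelihood surface; non-finite values are dropped by stat_contour
ggplot(grid, aes(x = mu, y = delta, z = loglik)) +
  geom_tile(aes(fill = loglik)) +
  geom_contour(colour = "white")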
To use iterated filtering with parameter transformations, we build a new pomp object as follows.
ex2 <<- pomp(
  data        = ex_data,
  times       = "Day",
  t0          = 0,
  rprocess    = euler.sim(step.fun = ex_rprocess, delta.t = 1),
  rmeasure    = ex_rmeasure,
  dmeasure    = ex_dmeasure,
  fromEstimationScale = Csnippet(ex_fromEstimationScale),  # parameter transformations for mif2
  toEstimationScale   = Csnippet(ex_toEstimationScale),
  obsnames    = "Rate",
  statenames  = ex_statenames,
  paramnames  = ex_paramnames,
  initializer = Csnippet(ex_initializer))
The local maximization gives the following maximized log likelihood with a standard error.
## se
## -6548.76900 35.27813
Let’s carry out a local search using mif2 around this previously identified MLE. We set the rw.sd=0.002 and cooling.fraction.50=0.5 algorithmic parameters. We evaluate the likelihood, together with a standard error, using replicated particle filters at each point estimate.
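Below is a minimal sketch of such a local search under the pomp 1.x conventions used above; ex_Nmif (the number of mif2 iterations) and the number of replications are placeholders for the level-3 settings, and the summary reported next comes from repeating the search from several starting points.
# Sketch: one mif2 local search started from the empirical estimates (mu0, delta0)
mf <- mif2(ex2,
           start = c(mu = mu0, delta = delta0),
           Np = ex_Np, Nmif = ex_Nmif,
           cooling.type = "geometric", cooling.fraction.50 = 0.5,
           rw.sd = rw.sd(mu = 0.002, delta = 0.002))
# transform = TRUE would additionally apply the estimation-scale transformations defined for ex2
# Evaluate the likelihood at the resulting point estimate with replicated particle filters
ll <- replicate(10, logLik(pfilter(ex2, params = coef(mf), Np = ex_Np)))
logmeanexp(ll, se = TRUE)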
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -6544 -6536 -6521 -6497 -6478 -6406
These repeated stochastic maximizations can also show us the geometry of the likelihood surface in a neighborhood of this point estimate.
So far, the maximum log likelihood of the POMP model is -6406, which is far smaller than that of ARIMA(4,1,4), 3442.17. Therefore, the POMP model is not as good as the ARIMA model for modeling the CNH/USD exchange rate.
We decided to use ARIMA(4,1,4) to model the CNH/USD exchange rate. As discussed, it captured the most important features of the data set compared with the GARCH and POMP models. As a time series, CNH/USD itself is a decent predictor if ARIMA(4,1,4) is used as the prediction model.
To further improve the analysis, we could add other predictive factors, extend the time window to include more data points, or apply nonlinear variable transformations.
This data analysis report is intended to be intuitive and easy to follow. We first gave an overview of the CNH/USD exchange rate data. We then evaluated our originally proposed ARIMA(4,1,4) model and estimated two other nonlinear models, GARCH(1,1) and POMP. By comparing log likelihood values, we concluded that ARIMA(4,1,4) has the best performance.
[1] Edward L. Ionides, STATS 531 Class Notes 4-12.
[2] Ross, Sheldon M. (2014). "Variations on Brownian Motion". Introduction to Probability Models (11th ed.). Amsterdam: Elsevier. pp. 612-614.
[3] S&P 500 Index, https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC
[4] R. H. Shumway and D. S. Stoffer, Time Series Analysis and Its Applications, Chapter 4.