After the breakout of Coronavirus (COVID-19) in Wuhan, China in December 2019, strict quarantine policies are implemented in China. US then released travel restrictions to and from China in January, 2020 to stop the spread of COVID-19. Quantifying the impact of COVID-19 on the U.S. tourism demand is the interest of this report. This report is a continue work on the impact analysis of US tourist demand using POMP and GARCH modeling and compare it to the Seasonal Autoregressive Integrated Moving Average (SARIMA) Model.
The data of overseas non-resident arrivals to the U.S. was extracted from the U.S. Government website: National Travel & Tourism Office. The data spans from January, 2003 to March, 2020 and its measurement includes all countries except Canada and Mexico. Notice the data set updated on the February and March overseas non-resident arrivals
Section 2 is the data overview and recap of midterm project. Section 3 includes: Pomp modeling and its likelihood estimation.
The trend plot is shown below:
the SARIMA model for non-stationary monthly data \((p, d, q)\times (P, D, Q)_12\) is defined as \[\phi(B)\Phi(B^{12})((1-B)^d(1-B)^{12})^D Y_n-\mu)=\psi(B)\Psi(B^{12})\epsilon_n,\] where \(\phi(B)\) and \(\Phi(B)\) are polynomials in \(B\) of order \(p\) and \(q\), \(psi(B)\) and\(\Psi(B)\) polynomials in \(B^{12}\) of order \(P\) and \(Q\), \(Y_t\) is the observation at time \(t\), \(\mu=E[(1-B)^d(1-B)^{12})^D]\) is the mean of the differenced process, and \(\epsilon_t\) is the white noise with mean \(0\) and constant variance \(\sigma^2.\)
The SARIMA model on the updated data is SARIMA\((3, 0, 4)\times (1, 1, 1)_{12}\), the summary results is shown as follows
## Series: log(Visits_ts)
## ARIMA(1,0,3)(1,1,1)[12] with drift
##
## Coefficients:
## ar1 ma1 ma2 ma3 sar1 sma1 drift
## 0.7815 -0.4519 0.1894 0.3075 -0.1417 -0.6706 0.0042
## s.e. 0.0632 0.0816 0.0722 0.0887 0.1150 0.0979 0.0004
##
## sigma^2 estimated as 0.002016: log likelihood=322.03
## AIC=-628.05 AICc=-627.27 BIC=-601.99
We can see the diagnostic plots shows the SARIMA model is reasonable for the data set. The log-likelihood of it is 323.5.
The prediction of SARIMA\((3, 0, 4)\times (1, 1, 1)_{12}\) for January, February and March, 2020 are the red points on the plot. As we can see the oversea non-resident arrivals for March is far below expectation.
Rprocess is based on geometric brownian motion (GRM) model of which the logarithm of randomly varying quantity a Brownian motion. It has two state variables: \(N\) and \(\epsilon\) and three parameters: \(\mu\) (drift), \(\sigma\). and \(\delta\) (volatility) (Brewer et al. 2016). The differential equation is
\[dN=\mu Ndt+\delta N \epsilon \sqrt{dt}\] \[d log(N)=(\mu-\frac{\delta^2}{2}) dt+\delta \epsilon \sqrt{dt}\] where \(N\) is the arrivals in each month and \(\epsilon\) is a random draw from \(N(0,\sigma)\). Solving the above equation, we can get \[N_{t+\Delta t}=N_t \exp((\mu-\frac{\delta^2}{2}) \Delta t+\delta \epsilon \sqrt{\Delta t})\], \[N_{t+1}=N_t \exp((\mu-\frac{\delta^2}{2}) \frac{1}{n}+ \frac{\delta}{\sqrt{n}} \epsilon)\], where \(n\) represents the number of months in a year.
We first to take log to detrend data for pomp modeling.
We first define variables in the pomp object and the initial value of \(N\) and \(\epsilon=0\) are set to be 10 and rpois(1). Rmeasure is defined as random draw from a normal distribution with mean 0 and variance \(N\).
pop_statenames = c("N", "e")
pop_paramnames = c("mu","delta","sigma")
stochStep <- Csnippet("e = rnorm(0,sigma);
N = N*exp((mu-delta*delta/2)/12+delta/sqrt(12)*e);")
initlz <- Csnippet("N=10;e=rpois(1);")
dmeas <- Csnippet("lik = dnorm(Arrivals,0,N,give_log);")
rmeas <- Csnippet("Arrivals = rnorm(0,N);")
pomp(data = tour[,1:2],
times="Month",
t0=0,
rprocess=discrete_time(step.fun=stochStep,delta.t=1),
rinit=initlz,
rmeasure = rmeas, dmeasure = dmeas,
statenames=pop_statenames,
paramnames=pop_paramnames,
partrans=parameter_trans(
log=c("mu","sigma"),
logit="delta")
) -> pop_pomp
Next, we would like to investigate the likelihood surface of our model by construct a geometric surface and see how the log-likelihood changes with parameters.
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
From log-likelihood plots, we can see log-likelihood is maximized at around \(\mu=0.4\) along \(\mu\) direction, \(\delta=1\) along \(\delta\) direction, and \(\sigma=0.3\) along \(\sigma\) direction.
## [1] -885.0366 -892.9376 -892.6783 -888.0243 -898.2768 -886.6809 -884.8339
## [8] -894.2333 -888.8317 -888.1283
## se
## -886.4081522 0.6887519
The Particle Filter gives an unbiased Monte Carlo estimate of the log-likelihood. Given 1000 particles, the model, and the parameters, we get an estimate of log-likelihood -884.80 with standard error of 0.57.
We need to build new pomp object before conducting a local maximum likelihood search around previously identified MLE.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -856.8 -856.5 -856.4 -856.5 -856.4 -856.2
The maximum loh-likelihood result from local search is -856.2 and the scatter plots shows the likelihood surface in the neighborhood of given point estimates.
Next, we would like to conduct a global search of log-likelihood. The parameter box is given below:
pop_box <- rbind(mu=c(0, 1),
delta=c(0,3),
sigma=c(0,3))
Both local search and global search give similar max log-likelihood. From the POMP diagnostics plot, we can see overall parameters converges but there is some Monte Carlo variation. \(\hat \mu\), \(\hat \delta\), and \(\hat \sigma\) converge to their maximum-likelihood estimates.
In this project, For the US overseas visits data from 2003 to 2020, SARIMA \((3, 0, 4)\times (1, 1, 1)_{12}\) gives quite good prediction and proves a large abnormal decrease of oversea visits after 2020. Especially for March, the number of visits below estimation more than 2 standard deviation. Although SARIMA model can capture almost all validation of data with clear trend and seasonality, it is quite hard to interpret. Pomp modeling, on the other hand, is easier to explain. Despite it under performs SARIMA model in terms of log-likelihood, it is more adaptable to data with high volatility.
[1] Data: https://travel.trade.gov/view/m-2017-I-001/index.asp
[2] Coronavirus: https://www.cdc.gov/coronavirus/2019-ncov/about/transmission.html
[3] Avdiu, K. Brownian Motion & Geometric Brownian Motion. Retrieved April 26, 2018, from http://homepage.univie.ac.at/kujtim.avdiu/dateien/BrownianMotion.pdf.
[4] Previous Project: https://ionides.github.io/531w18/final_project/3/final.html#geometric-brownian-motion
[5] Lecture notes.