Stock investment is one of the most popular ways people choose to grow their wealth. According to data from Yahoo Finance, Amazon went public on May 15th, 1997, at an IPO price of $18 per share. Since then there have been two 2-for-1 splits and one 3-for-1 split. So if someone had bought 100 shares of Amazon stock for $1,800 in 1997 and still held them today, that investment would have grown to \(100 \times 2 \times 2 \times 3 \times \$2{,}366 = \$2{,}839{,}200\) (calculated from the Amazon stock price on 4/27/2020). That is an overall return of roughly 157,733% in 23 years, without adjusting for inflation. This massive growth in the stock price also made Jeff Bezos the richest man in the world.

However, as an average investor (and a student who has taken statistics courses), how could someone like me benefit from this kind of stock growth? In this report, I will analyze the performance of Amazon stock over the last 10 years with several techniques, including ARMA/ARIMA models, the GARCH model, and a POMP model fitted with the pomp package. The ultimate goal is to provide readers with some information about, and predictions for, Amazon stock, and in doing so to offer some investment insight.

Read in Data

The data are daily, consisting of a total of 2,518 data points ranging from 4/26/2010 to 4/24/2020. We will first plot the data to see what they look like. The variable we are interested in is the adjusted close price.

AMZN <- read.csv("amzn.csv", sep = ",", header = TRUE)
AMZN$log_close <- log(AMZN$Adj.Close)   # adjusted close price on the log scale
diff <- diff(AMZN$log_close)            # difference of the log price (log return)
demean <- diff - mean(diff)             # demeaned differences

Plot the data

I will show three different plots for this dataset. First, I will plot the adjusted close price over time. Then I will plot the log-transformed adjusted close price over time. Finally, I will plot the difference of the log return over time.

The difference of log return is defined as follows:

Let the Amazon stock price on day \(t\) be \(Z_t\); then the difference of log return \(D_t\) is defined as \[ D_t = \log(Z_t) - \log(Z_{t-1})\]

The reason we want to look at the difference of the log return is that differenced data are typically mean stationary. We will see this characteristic in the third plot, sketched below.
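
Here is a minimal sketch of the three plots using base R graphics; the exact plotting code behind the original figures is an assumption.

# Three exploratory plots: price, log price, and demeaned differenced log price
par(mfrow = c(3, 1))
plot(AMZN$Adj.Close, type = "l", xlab = "Time (days)", ylab = "Adjusted close ($)",
     main = "Adjusted close price")
plot(AMZN$log_close, type = "l", xlab = "Time (days)", ylab = "log price",
     main = "Log adjusted close price")
plot(demean, type = "l", xlab = "Time (days)", ylab = "demeaned difference",
     main = "Demeaned difference of log return")
par(mfrow = c(1, 1))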

ARMA? ARIMA? Auto-ARIMA?

ARMA with demeaned difference data

Here is how I set up the ARIMA model. The way I fit the ARIMA model is actually to fit an ARMA model to the demeaned, differenced data. In other words, I have already taken the difference and then fit the processed data with an ARMA model. The setup for the ARMA model is as follows.

In mathematical notation, an ARMA(p,q) model is defined by \[Y_n = \mu + \phi_1(Y_{n-1} - \mu) + \phi_2(Y_{n-2} - \mu) + \cdots + \phi_p(Y_{n-p} - \mu) + \epsilon_n + \psi_1\epsilon_{n-1} + \cdots + \psi_q\epsilon_{n-q}\]

The assumptions of the ARMA model are that the process is stationary and that the error terms \(\epsilon_n\) are i.i.d., drawn from \(N(0, \sigma^2)\).
To choose the best ARMA(p,q) model, we can use Akaike information criterion (AIC) values to help make that decision. Recall that lower AIC values indicate better fits. Below are the AIC values from ARMA(0,0) to ARMA(5,5).

         MA0        MA1        MA2        MA3        MA4        MA5
AR0  -12602.60  -12601.59  -12599.79  -12598.16  -12598.47  -12597.26
AR1  -12601.57  -12600.34  -12600.41  -12598.64  -12601.50  -12599.51
AR2  -12599.76  -12597.76  -12601.10  -12595.86  -12599.51  -12597.49
AR3  -12598.09  -12596.86  -12604.17  -12607.32  -12605.28  -12604.65
AR4  -12598.33  -12601.84  -12599.82  -12605.31  -12606.20  -12601.30
AR5  -12596.97  -12594.96  -12603.85  -12602.33  -12604.22  -12600.90
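
For reference, here is a minimal sketch of how such an AIC table can be generated (adapted from the usual course-notes pattern; the exact code used above is an assumption).

# Fit ARMA(p,q) to the demeaned differences, i.e. ARIMA(p,1,q) on the log price,
# and collect the AIC of each fit into a table
aic_table <- function(data, P, Q) {
  table <- matrix(NA, P + 1, Q + 1)
  for (p in 0:P) {
    for (q in 0:Q) {
      table[p + 1, q + 1] <- arima(data, order = c(p, 0, q))$aic
    }
  }
  dimnames(table) <- list(paste0("AR", 0:P), paste0("MA", 0:Q))
  table
}
require(knitr)
kable(aic_table(demean, 5, 5), digits = 2)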

As we can see from the AIC table, the model with the lowest AIC value, -12607.32, is ARMA(3,3) on the differenced data, i.e. ARIMA(3,1,3) for the log price. This model has moderate complexity and the lowest AIC value. Therefore, if we want to model Amazon stock with an ARIMA model, ARIMA(3,1,3) looks like the right choice.

ARIMA by auto.arima

However, the forecast package provides a function called auto.arima. According to the documentation \(^{[3]}\), auto.arima returns the best ARIMA model according to either AIC, AICc, or BIC, searching over possible models within the order constraints provided. Therefore, I will fit the adjusted close price on the log scale with auto.arima to see how the result differs from what I did above.

Also, for those interested, more information on the ARIMA model can be found in reference \(^{[6]}\).
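
A minimal sketch of this step follows; the object name modelfit is an assumption (it is used in the forecasting code below), and lambda = "auto" asks auto.arima to estimate a Box-Cox transformation, matching the lambda reported in the output.

require(forecast)
# Fit the log adjusted close price; auto.arima searches over (p,d,q) and drift terms
modelfit <- auto.arima(AMZN$log_close, lambda = "auto")
summary(modelfit)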

## Series: AMZN$log_close 
## ARIMA(1,1,1) with drift 
## Box Cox transformation: lambda= 1.174328 
## 
## Coefficients:
##          ar1      ma1   drift
##       0.9017  -0.9189  0.0015
## s.e.  0.1104   0.1026  0.0004
## 
## sigma^2 estimated as 0.0007349:  log likelihood=5511.08
## AIC=-11014.16   AICc=-11014.14   BIC=-10990.84
## 
## Training set error measures:
##                         ME       RMSE        MAE         MPE      MAPE
## Training set -1.115879e-05 0.01975801 0.01356845 -0.00144157 0.2241459
##                   MASE         ACF1
## Training set 0.9960286 -0.003931504

We can see that auto.arima makes a different choice: it suggests ARIMA(1,1,1) instead of ARIMA(3,1,3). Note that the two AIC values are not directly comparable: the -12607.32 above comes from a model fit to the demeaned differences, while auto.arima's AIC of -11014.16 is for a model fit to the Box-Cox-transformed log price. In the next step, I will forecast the price of Amazon stock for the next 30 days and construct a confidence interval with ARIMA(1,1,1). Let's see what happens.

Forecasting 30 days of stock price with ARIMA(1,1,1) and Confidence Interval

price_forecast <- forecast(modelfit, h=30)   # modelfit is the ARIMA(1,1,1) fit from auto.arima
plot(price_forecast, main="Forecast plot with ARIMA(1,1,1)")

As we can see in this plot, ARIMA(1,1,1) predicts an increasing trend. We can identify this by the solid blue line, which points slightly upward. Moreover, one nice thing about the forecast package is that it also provides confidence intervals; the two intervals shown by default are the 80% and 95% intervals. Now let's take a look at those numbers. One thing to keep in mind is that our values are currently on the log scale, so to make the results interpretable we first apply an exponential transformation.

# Confidence interval
CI <- as.data.frame(cbind(exp(price_forecast$lower), exp(price_forecast$upper)))
CI
##    exp(price_forecast$lower).80% exp(price_forecast$lower).95%
## 1                       2350.712                      2320.656
## 2                       2326.620                      2285.006
## 3                       2308.784                      2258.702
## 4                       2294.343                      2237.391
## 5                       2282.156                      2219.349
## 6                       2271.617                      2203.668
## 7                       2262.356                      2189.796
## 8                       2254.122                      2177.365
## 9                       2246.736                      2166.115
## 10                      2240.064                      2155.851
## 11                      2234.000                      2146.423
## 12                      2228.464                      2137.716
## 13                      2223.387                      2129.634
## 14                      2218.713                      2122.099
## 15                      2214.396                      2115.048
## 16                      2210.397                      2108.425
## 17                      2206.683                      2102.186
## 18                      2203.224                      2096.292
## 19                      2199.995                      2090.707
## 20                      2196.974                      2085.404
## 21                      2194.143                      2080.355
## 22                      2191.485                      2075.540
## 23                      2188.984                      2070.938
## 24                      2186.628                      2066.531
## 25                      2184.405                      2062.305
## 26                      2182.304                      2058.244
## 27                      2180.317                      2054.338
## 28                      2178.434                      2050.575
## 29                      2176.649                      2046.944
## 30                      2174.954                      2043.438
##    exp(price_forecast$upper).80% exp(price_forecast$upper).95%
## 1                       2467.739                      2499.655
## 2                       2490.633                      2535.904
## 3                       2507.986                      2563.463
## 4                       2522.586                      2586.623
## 5                       2535.515                      2607.053
## 6                       2547.320                      2625.608
## 7                       2558.322                      2642.794
## 8                       2568.724                      2658.937
## 9                       2578.665                      2674.260
## 10                      2588.242                      2688.925
## 11                      2597.526                      2703.048
## 12                      2606.569                      2716.720
## 13                      2615.412                      2730.011
## 14                      2624.085                      2742.974
## 15                      2632.613                      2755.655
## 16                      2641.015                      2768.089
## 17                      2649.307                      2780.304
## 18                      2657.501                      2792.325
## 19                      2665.608                      2804.174
## 20                      2673.637                      2815.865
## 21                      2681.594                      2827.415
## 22                      2689.486                      2838.835
## 23                      2697.317                      2850.137
## 24                      2705.092                      2861.329
## 25                      2712.815                      2872.421
## 26                      2720.490                      2883.418
## 27                      2728.118                      2894.328
## 28                      2735.704                      2905.155
## 29                      2743.248                      2915.906
## 30                      2750.753                      2926.585

Keep in mind that the adjusted close price on 4/24/2020 was $2,410.22. According to the ARIMA(1,1,1) model, using a 95% confidence interval, after 30 days the price could be as low as $2,043.44 or as high as $2,926.59. Another way to think about this: if you bought one share of Amazon on 4/24/2020, then according to the ARIMA(1,1,1) model, in the most extreme cases after one month you could either lose $366.78 or gain a whopping $516.37. Most likely, though, the outcome will fall somewhere in between. That is the beauty of the stock market: higher risk, higher return. And that is why so many people, including me, are so into it.

Training / Testing Split and Model Validation

One final thing: we cannot validate the model without a training/testing split. So let's split the original data into the first 70% as the training set and the last 30% as the testing set. We will still work with the adjusted close price on the log scale. Let's see how our model fits the data; again, since the data are on the log scale, we should remember to apply the exponential transformation to make the result easier to interpret. A sketch of this step follows.
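
A minimal sketch of the split and out-of-sample forecast, assuming ARIMA(1,1,1) with drift is refit on the training set; the exact code behind the validation plot is an assumption.

n <- length(AMZN$log_close)
n_train <- floor(0.7 * n)                  # first 70% for training
train <- AMZN$log_close[1:n_train]
test  <- AMZN$log_close[(n_train + 1):n]   # last 30% for testing

fit_train <- Arima(train, order = c(1, 1, 1), include.drift = TRUE)
fc <- forecast(fit_train, h = length(test))

# Back-transform to the price scale before comparing with the held-out data
plot(exp(test), type = "l", xlab = "Time (test set)", ylab = "Adjusted close ($)")
lines(as.numeric(exp(fc$mean)), col = "blue")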

As we can see in the validation plot, ARIMA(1,1,1) successfully caught the peak at the end. However, if we look near time 350, the Amazon stock price was relatively high then too, and ARIMA(1,1,1) was not able to catch that. This suggests we should try other models.

GARCH \(^{[1]}\)

The generalized ARCH model, known as GARCH(p,q), has the form \[Y_n=\epsilon_n\sqrt{V_n}\]

where \[V_n=a_0+\sum_{j=1}^p a_jY_{n-j}^2+\sum_{k=1}^q b_kV_{n-k}\] and \(\epsilon_{1:N}\) is white noise.

The GARCH(1,1) model is a popular choice (Cowpertwait and Metcalfe, 2009), which can be fitted using garch() in the tseries R package.

## ----garch--------------------------------------------------------------------
require(tseries)
fit.garch = garch(demean, order = c(1,1), grad = "numerical", trace = FALSE)
L.garch <- tseries:::logLik.garch(fit.garch)
L.garch
## 'log Lik.' 6338.532 (df=3)

This 3-parameter model has a maximized log likelihood of 6338.532 with 3 degrees of freedom.

GARCH take two

Having explained GARCH(1,1) above, it's time to let a prebuilt package do some of the work. Here, I first search for a suitable ARMA mean model (up to ARMA(2,2)) and then apply a standard GARCH(1,1) model on top of it, to check whether any improvement is made compared with our ARIMA model. A sketch of the search step follows.
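
A minimal sketch of the mean-model search, assuming the $fit / $rank.matrix output below comes from rugarch's autoarfima() run as a full search up to ARMA(2,2); the data series passed in and the search settings are assumptions.

require(rugarch)
# Exhaustive search over ARMA mean models up to order (2,2), ranked by AIC
arfima_search <- autoarfima(data = AMZN$log_close, ar.max = 2, ma.max = 2,
                            criterion = "AIC", method = "full")
arfima_search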

## $fit
## 
## *----------------------------------*
## *          ARFIMA Model Fit        *
## *----------------------------------*
## Mean Model   : ARFIMA(2,0,1)
## Distribution : norm 
## 
## Optimal Parameters
## ------------------------------------
##        Estimate  Std. Error    t value Pr(>|t|)
## mu     4.994316    0.012562    397.572        0
## ar1    1.931343    0.000033  59354.179        0
## ar2   -0.932522    0.000046 -20259.137        0
## ma1   -0.701365    0.004720   -148.600        0
## sigma  0.021772    0.000278     78.274        0
## 
## Robust Standard Errors:
##        Estimate  Std. Error    t value Pr(>|t|)
## mu     4.994316    0.016680    299.420        0
## ar1    1.931343    0.000047  41330.944        0
## ar2   -0.932522    0.000081 -11564.163        0
## ma1   -0.701365    0.013129    -53.419        0
## sigma  0.021772    0.000739     29.463        0
## 
## LogLikelihood : 6063.859 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -4.8124
## Bayes        -4.8009
## Shibata      -4.8124
## Hannan-Quinn -4.8082
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                          statistic   p-value
## Lag[1]                       38.68 4.998e-10
## Lag[2*(p+q)+(p+q)-1][8]      58.09 0.000e+00
## Lag[4*(p+q)+(p+q)-1][14]     66.74 0.000e+00
## 
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      32.74 1.055e-08
## Lag[2*(p+q)+(p+q)-1][2]     43.12 3.913e-12
## Lag[4*(p+q)+(p+q)-1][5]     56.32 9.992e-16
## 
## 
## ARCH LM Tests
## ------------------------------------
##              Statistic DoF   P-Value
## ARCH Lag[2]      48.16   2 3.491e-11
## ARCH Lag[5]      58.19   5 2.876e-11
## ARCH Lag[10]     64.20  10 5.758e-10
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  62.2004
## Individual Statistics:             
## mu     0.7487
## ar1   24.2444
## ar2   24.5326
## ma1    8.5185
## sigma  0.2535
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## 
## Elapsed time : 7.529865 
## 
## 
## $rank.matrix
##    ar1 ar2 ma1 ma2 im arf          AIC converged
## 1    1   1   1   0  1   0  -4.81243747         1
## 2    1   0   0   1  1   0  -4.80697034         1
## 3    1   1   0   1  1   0  -4.70244584         1
## 4    1   0   1   1  1   0  -4.68780214         1
## 5    1   1   0   0  1   0  -4.67221264         1
## 6    1   0   1   0  1   0  -4.63836389         1
## 7    1   0   0   0  1   0  -4.56531310         1
## 8    0   1   1   0  1   0  -4.51038869         1
## 9    0   1   0   0  1   0  -3.63307665         1
## 10   1   0   1   1  0   0  -2.87826368         1
## 11   1   1   0   1  0   0  -2.75301312         1
## 12   1   0   0   1  0   0  -2.74721652         1
## 13   1   0   1   0  0   0  -2.74287644         1
## 14   1   0   0   0  0   0  -2.54141535         1
## 15   1   1   1   1  0   0  -2.09794656         1
## 16   0   1   1   1  0   0  -0.67397216         1
## 17   0   1   1   0  0   0  -0.66145572         1
## 18   0   1   0   1  0   0  -0.61035969         1
## 19   0   1   0   0  0   0  -0.47549639         1
## 20   0   0   1   1  1   0   0.09041063         1
## 21   0   0   1   0  1   0   1.20805556         1
## 22   0   0   0   1  1   0   1.23094917         1
## 23   0   0   0   0  1   0   2.54121339         1
## 24   0   0   1   1  0   0   3.93613037         1
## 25   0   0   1   0  0   0   5.15543541         1
## 26   0   0   0   1  0   0   5.16492137         1
## 27   0   1   1   1  1   0 101.03540216         1
## 28   1   1   0   0  0   0           NA         0
## 29   1   1   1   0  0   0           NA         0
## 30   0   1   0   1  1   0           NA         0
## 31   1   1   1   1  1   0           NA         0

From the summary above, we see that we should use the ARFIMA(2,0,1) mean model. ARFIMA stands for autoregressive fractionally integrated moving average; a detailed explanation of the ARFIMA model is given in reference \(^{[5]}\).

With the parameters collected, we choose ARFIMA(2,0,1) as the mean model and incorporate it into a GARCH(1,1) model, as sketched below.
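
A minimal sketch of that step using rugarch; the data series, error distribution, and forecast horizon here are assumptions.

require(rugarch)
garch_spec <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),   # standard GARCH(1,1)
  mean.model     = list(armaOrder = c(2, 1), arfima = TRUE,        # ARFIMA(2,d,1) mean model
                        include.mean = TRUE),
  distribution.model = "norm")
garch_fit <- ugarchfit(spec = garch_spec, data = AMZN$log_close)
garch_fc  <- ugarchforecast(garch_fit, n.ahead = 30)
plot(garch_fc, which = 1)   # forecast of the (log-scale) series with prediction bands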

As we can see, the volatility is roughly stationary with respect to time. Now let's see how the combined model performs and what the prediction looks like.

As we can see in the plot, the combined model gives a much narrower confidence interval, and the trend is still upward. One thing to keep in mind is that this plot is different from all the plots above: it is on the log scale. Since I did not work out a way to transform it back to the original scale, I left it as is; however, if it were transformed back to the price scale, we should expect the upward slope to be steeper.

Pomp model with financial leverage \(^{[1]}\) \(^{[2]}\)

When we fitted the GARCH model above, we noticed that GARCH is more of a black-box model: its parameters do not have clear interpretations. To develop and test hypotheses, we turn to a pomp model.

\(R_n\) is formally defined as the leverage on day \(n\), namely the correlation between the index return on day \(n-1\) and the increase in the log volatility from day \(n-1\) to day \(n\). Here, we follow the pomp implementation of Breto (2014), which models \(R_n\) as a random walk on a transformed scale, \[R_n=\frac{\exp(2G_n)-1}{\exp(2G_n)+1},\] where \(G_n\) is a Gaussian random walk. Note that this transformation is exactly \(\tanh(G_n)\), which is how it appears in the C snippets below.

\[Y_n=\exp(H_n/2)\epsilon_n\] \[H_n=\mu_h(1-\phi)+\phi H_{n-1}+\beta_{n-1}R_n\exp(-H_{n-1}/2)+\omega_n\] \[G_n=G_{n-1}+\nu_n\] where \(\beta_n=Y_n\sigma_{\eta}\sqrt{1-\phi^2}\), \(\epsilon_n \sim \text{i.i.d. } N(0,1)\), \(\nu_n \sim \text{i.i.d. } N(0,\sigma_{\nu}^2)\), and \(\omega_n \sim \text{i.i.d. } N(0,\sigma_{\omega}^2)\).

Here, we use the iterated filtering algorithm (IF2) to converge toward the region of parameter space that maximizes the likelihood. In this case, the state variable is \(X_n=(G_n,H_n,Y_n)\).

To determine a recursive particle filter, we write the filtered particle \(j\) at time \(n-1\) as

\[X_{n-1,j}^F=(G_{n-1,j}^F,H_{n-1,j}^F,y_{n-1}^*)\]

Now we can construct prediction particles at time \(n\), \[(G_{n,j}^P,H_{n,j}^P)\sim f_{G_n,H_n|G_{n-1},H_{n-1},Y_{n-1}}(g_n,h_n|G_{n-1,j}^F,H_{n-1,j}^F,y_{n-1}^*)\] with corresponding weight \(w_{n,j}=f_{Y_n|G_n,H_n}(y_n^*|G_{n,j}^P,H_{n,j}^P)\)

## ----names--------------------------------------------------------------------
amzn_statenames <- c("H","G","Y_state")
amzn_rp_names <- c("sigma_nu","mu_h","phi","sigma_eta")
amzn_ivp_names <- c("G_0","H_0")
amzn_paramnames <- c(amzn_rp_names,amzn_ivp_names)
amzn_covarnames <- "covaryt"

## ----rproc--------------------------------------------------------------------
rproc1 <- "
  double beta,omega,nu;
  omega = rnorm(0,sigma_eta * sqrt( 1- phi*phi ) * 
    sqrt(1-tanh(G)*tanh(G)));
  nu = rnorm(0, sigma_nu);
  G += nu;
  beta = Y_state * sigma_eta * sqrt( 1- phi*phi );
  H = mu_h*(1 - phi) + phi*H + beta * tanh( G ) 
    * exp(-H/2) + omega;
"
rproc2.sim <- "
  Y_state = rnorm( 0,exp(H/2) );
 "

rproc2.filt <- "
  Y_state = covaryt;
 "
amzn_rproc.sim <- paste(rproc1,rproc2.sim)
amzn_rproc.filt <- paste(rproc1,rproc2.filt)


## ----rinit--------------------------------------------------------------------
amzn_rinit <- "
  G = G_0;
  H = H_0;
  Y_state = rnorm( 0,exp(H/2) );
"


## ----measure------------------------------------------------------------------
amzn_rmeasure <- "
   y=Y_state;
"

amzn_dmeasure <- "
   lik=dnorm(y,0,exp(H/2),give_log);
"


## ----transforms---------------------------------------------------------------
amzn_partrans <- parameter_trans(
  log=c("sigma_eta","sigma_nu"),
  logit="phi"
)


## ----sp_pomp------------------------------------------------------------------
amzn.filt <- pomp(data=data.frame(
    y=demean,time=1:length(demean)),
  statenames=amzn_statenames,
  paramnames=amzn_paramnames,
  times="time",
  t0=0,
  covar=covariate_table(
    time=0:length(demean),
    covaryt=c(0,demean),
    times="time"),
  rmeasure=Csnippet(amzn_rmeasure),
  dmeasure=Csnippet(amzn_dmeasure),
  rprocess=discrete_time(step.fun=Csnippet(amzn_rproc.filt),
    delta.t=1),
  rinit=Csnippet(amzn_rinit),
  partrans=amzn_partrans
)

## ----run_level----------------------------------------------------------------
run_level <- 3
amzn_Np <-           switch(run_level, 100, 1e3, 2e3)
amzn_Nmif <-         switch(run_level,  10, 100, 200)
amzn_Nreps_eval <-   switch(run_level,   4,  10,  20)
amzn_Nreps_local <-  switch(run_level,  10,  20,  20)
amzn_Nreps_global <- switch(run_level,  10,  20, 100)

## ----sim_pomp-----------------------------------------------------------------
params_test <- c(
  sigma_nu = exp(-4.5),  
  mu_h = -0.25,      
  phi = expit(4),    
  sigma_eta = exp(-0.07),
  G_0 = 0,
  H_0=0
)



## ----parallel-setup,cache=FALSE-----------------------------------------------
library(doParallel)   # parallel backend needed for the %dopar% loops below
registerDoParallel()
library(doRNG)        # reproducible parallel random-number streams
registerDoRNG(34118892)


## ----mif_setup----------------------------------------------------------------
amzn_rw.sd_rp <- 0.02
amzn_rw.sd_ivp <- 0.1
amzn_cooling.fraction.50 <- 0.5
amzn_rw.sd <- rw.sd(
  sigma_nu  = 0.02,
  mu_h      = 0.02,
  phi       = 0.02,
  sigma_eta = 0.02,
  G_0       = ivp(0.1),
  H_0       = ivp(0.1)
)    

## ----box----------------------------------------------------------------------
amzn_box <- rbind(
 sigma_nu=c(0.005,0.05),
 mu_h    =c(-1,0),
 phi = c(0.95,0.99),
 sigma_eta = c(0.5,1),
 G_0 = c(-2,2),
 H_0 = c(-1,1)
)
varName <- c("amzn_rw.sd", "amzn.filt", "params_test", "amzn_Np", "amzn_Nmif", "amzn_cooling.fraction.50", "amzn_Nreps_eval", "amzn_box")
stew(file=sprintf("mif1-%d.rda",run_level),{
  t.if1 <- system.time({
  if1 <- foreach(i=1:amzn_Nreps_local,
    .packages='pomp', .combine=c, .export=varName) %dopar% mif2(amzn.filt,
      params=params_test,
      Np=amzn_Np,
      Nmif=amzn_Nmif,
      cooling.fraction.50=amzn_cooling.fraction.50,
      rw.sd = amzn_rw.sd)
  L.if1 <- foreach(i=1:amzn_Nreps_local,
    .packages='pomp', .combine=rbind, .export=varName) %dopar% logmeanexp(
      replicate(amzn_Nreps_eval, logLik(pfilter(amzn.filt,
        params=coef(if1[[i]]),Np=amzn_Np))), se=TRUE)
  })
},seed=318817883,kind="L'Ecuyer")


r.if1 <- data.frame(logLik=L.if1[,1],logLik_se=L.if1[,2],
  t(sapply(if1,coef)))
if (run_level>1) write.table(r.if1,file="amzn_params.csv",
  append=TRUE,col.names=FALSE,row.names=FALSE)

## ----pairs_plot,echo=F,eval=T,out.width="11cm"--------------------------------
pairs(~logLik+sigma_nu+mu_h+phi+sigma_eta,
  data=subset(r.if1,logLik>max(logLik)-20))

## ----box_eval-----------------------------------------------------------------
stew(file=sprintf("box_eval-%d.rda",run_level),{
  t.box <- system.time({
    if.box <- foreach(i=1:amzn_Nreps_global,
      .packages='pomp',.combine=c, .export = c("if1", "amzn_box")) %dopar% mif2(if1[[1]],
        params=apply(amzn_box,1,function(x)runif(1,x)))
    L.box <- foreach(i=1:amzn_Nreps_global,
      .packages='pomp',.combine=rbind, .export = varName) %dopar% {
         logmeanexp(replicate(amzn_Nreps_eval, logLik(pfilter(
         amzn.filt,params=coef(if.box[[i]]),Np=amzn_Np))), 
           se=TRUE)
       }
  })
},seed=290860873,kind="L'Ecuyer")

r.box <- data.frame(logLik=L.box[,1],logLik_se=L.box[,2],
  t(sapply(if.box,coef)))
if(run_level>1) write.table(r.box,file="amzn_params.csv",
  append=TRUE,col.names=FALSE,row.names=FALSE)
summary(r.box$logLik,digits=5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6585    6600    6608    6629    6670    6673
## ----pairs_global_plot,eval=T,echo=F,out.width="11cm"-------------------------
pairs(~logLik+log(sigma_nu)+mu_h+phi+sigma_eta+H_0,
  data=subset(r.box,logLik>max(logLik)-10))

plot(if.box)

Looking at the diagnostics plot, I found something very interesting. With run level 3, the log-likelihood converges very fast: it converges after around ten iterations. The other parameters also converge after about 150 iterations. This suggests we could save some computation by reducing the number of MIF iterations to 150.

Compare GARCH with Pomp model

## ----garch_benchmark,eval=F,echo=F--------------------------------------------
library(tseries)
fit.garch.benchmark <- garch(demean,grad = "numerical", trace = FALSE)
L.garch.benchmark <- tseries:::logLik.garch(fit.garch.benchmark)
L.garch.benchmark
## 'log Lik.' 6338.532 (df=3)
# Log likelihood with Pomp
summary(r.box$logLik,digits=5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6585    6600    6608    6629    6670    6673

As we can see, the benchmark GARCH model has a log-likelihood of 6338.532, while even the minimum log-likelihood across the pomp replications is 6585. That makes the case for trying the pomp model. A rough AIC comparison is sketched below.
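
As a rough sanity check, we can also compare on the AIC scale; counting the pomp model as having 6 estimated parameters (the four random-walk parameters plus the two initial values) is an assumption about its effective dimension.

aic_garch <- 2 * 3 - 2 * 6338.532            # GARCH(1,1): 3 parameters
aic_pomp  <- 2 * 6 - 2 * max(r.box$logLik)   # pomp model: 6 parameters, best log-likelihood
c(GARCH = aic_garch, pomp = aic_pomp)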

Summary

In short, based on all the analysis above, I personally think it is still a good time to hold Amazon stock, since according to our models the price should increase in the future. However, this is just a report for a class project, and none of these results should be treated as financial advice. I hope everyone reading this report achieves financial success.

References

  1. Lecture Notes 14

  2. https://ionides.github.io/531w18/final_project/1/final.html#conclu

  3. https://www.rdocumentation.org/packages/forecast/versions/8.12/topics/auto.arima

  4. https://rpubs.com/kapage/523169

  5. ARFIMA model: https://en.wikipedia.org/wiki/Autoregressive_fractionally_integrated_moving_average

  6. ARIMA model: https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average