Stock investment is one of the most popular ways for people to grow their wealth. According to data from Yahoo Finance, Amazon went public on May 15th, 1997, at an IPO price of $18 per share. Since then there have been two 2-for-1 splits and one 3-for-1 split. So if someone had bought 100 shares of Amazon stock for $1,800 in 1997 and were still holding those shares today, that investment would have grown to 100 × 2 × 2 × 3 × $2,366 = $2,839,200 (calculated with the Amazon stock price on 4/27/2020). That is an overall return of 157,733% in 23 years, without adjusting for inflation. The massive growth in the stock price also made Jeff Bezos the richest man in the world.
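As a quick sanity check of that arithmetic (the split factors and the roughly $2,366 price on 4/27/2020 come from the numbers quoted above):
# Quick check of the buy-and-hold arithmetic quoted above.
shares_1997 <- 100                         # shares bought at the $18 IPO price
cost_basis  <- shares_1997 * 18            # $1,800 invested in 1997
shares_2020 <- shares_1997 * 2 * 2 * 3     # after two 2-for-1 splits and one 3-for-1 split
price_2020  <- 2366                        # approximate share price on 4/27/2020
value_2020  <- shares_2020 * price_2020    # 2,839,200
value_2020 / cost_basis * 100              # overall return of roughly 157,733%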
However, as an average investor (and a student who has taken statistics courses), how could someone like me benefit from this growth? In this report, I analyze the performance of Amazon stock over the last 10 years using several techniques, including ARMA/ARIMA models, a GARCH model, and a stochastic volatility model built with the pomp package. The ultimate goal is to give readers some information about, and predictions for, Amazon stock, and in doing so to offer some investment insight.
The data consist of daily observations, 2,518 in total, ranging from 4/26/2010 to 4/24/2020. We will first plot the data to see what they look like. The variable we are interested in is the adjusted close price.
AMZN <- read.csv("amzn.csv", sep = ",", header = TRUE)
AMZN$log_close <- log(AMZN$Adj.Close)   # adjusted close price on the log scale
diff <- diff(AMZN$log_close)            # difference of the log price (daily log return)
demean <- diff - mean(diff)             # demeaned log returns
I will show three different plots for this dataset. First, I will plot the relationship between time and the adjusted close price. Then I will plot the relationship between time and log-transformed adjusted close price. Finally, I will plot the relationship between time and difference of log return.
The difference of log return is defined as following:
Let \(Z_t\) denote the Amazon stock price on day \(t\); then the difference of log return \(D_t\) is defined as \[ D_t = \log(Z_t) - \log(Z_{t-1})\]
The reason we want to see the difference in log return is that differenced data is typically mean stationary. We will see such characteristics in the following third plot.
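The plotting code is not shown in this write-up; a minimal sketch of how the three plots could be produced in base R (using the AMZN data frame and demean vector defined above) is:
# Sketch of the three plots described above.
plot(AMZN$Adj.Close, type = "l", xlab = "Time (trading days)",
     ylab = "Adjusted close price", main = "Amazon adjusted close price")
plot(AMZN$log_close, type = "l", xlab = "Time (trading days)",
     ylab = "Log adjusted close price", main = "Log adjusted close price")
plot(demean, type = "l", xlab = "Time (trading days)",
     ylab = "Difference of log price (demeaned)", main = "Demeaned log returns")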
Here is some background on the ARIMA model. The way I fit the ARIMA model here is actually to fit an ARMA model to the demeaned, differenced data; in other words, I have already taken the difference of the log prices, and I then fit the processed data with an ARMA model. Below is the setup for the ARMA model.
In mathematical notation, an ARMA(p,q) model is defined by \[Y_n = \mu + \phi_1(Y_{n-1} - \mu) + \phi_2(Y_{n-2} - \mu) + \dots + \phi_p(Y_{n-p} - \mu) + \epsilon_n + \psi_1\epsilon_{n-1} + \dots + \psi_q\epsilon_{n-q}\]
The ARMA model assumes stationarity, and the error terms \(\epsilon_n\) are i.i.d., drawn from \(N(0, \sigma^2)\).
To choose the best ARMA(p,q) model, we can use Akaike information criterion (AIC) values to help make that decision. Recall that lower AIC values indicate better fits. Below are the AIC values from ARMA(0,0) to ARMA(5,5) fit to the demeaned, differenced data.
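The table below was generated with code not shown here; a sketch of the standard approach (fitting ARMA(p,q) to the demeaned log returns and tabulating the AIC values) is:
# Sketch: AIC table for ARMA(p,q) models fit to the demeaned log returns.
aic_table <- function(data, P, Q) {
  table <- matrix(NA, (P + 1), (Q + 1))
  for (p in 0:P) {
    for (q in 0:Q) {
      table[p + 1, q + 1] <- arima(data, order = c(p, 0, q))$aic
    }
  }
  dimnames(table) <- list(paste0("AR", 0:P), paste0("MA", 0:Q))
  table
}
require(knitr)
kable(aic_table(demean, 5, 5), digits = 2)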
|     | MA0 | MA1 | MA2 | MA3 | MA4 | MA5 |
|---|---|---|---|---|---|---|
| AR0 | -12602.60 | -12601.59 | -12599.79 | -12598.16 | -12598.47 | -12597.26 |
| AR1 | -12601.57 | -12600.34 | -12600.41 | -12598.64 | -12601.50 | -12599.51 |
| AR2 | -12599.76 | -12597.76 | -12601.10 | -12595.86 | -12599.51 | -12597.49 |
| AR3 | -12598.09 | -12596.86 | -12604.17 | -12607.32 | -12605.28 | -12604.65 |
| AR4 | -12598.33 | -12601.84 | -12599.82 | -12605.31 | -12606.20 | -12601.30 |
| AR5 | -12596.97 | -12594.96 | -12603.85 | -12602.33 | -12604.22 | -12600.90 |
As we can see from the AIC table, the lowest AIC, -12607.32, is achieved by the ARMA(3,3) fit to the differenced data, which corresponds to ARIMA(3,1,3) on the log price. This model has moderate complexity and the lowest AIC value, so if we want to model Amazon stock with an ARIMA model, ARIMA(3,1,3) looks like the right choice.
However, the forecast package provides a function called auto.arima. According to its documentation \(^{[3]}\), auto.arima returns the best ARIMA model according to either the AIC, AICc, or BIC value, searching over possible models within the order constraints provided. Therefore, I will fit the adjusted close price on the log scale with auto.arima to see how the result differs from what I did above.
For readers who are interested, more information about the ARIMA model can be found in reference \(^{[6]}\).
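The auto.arima call itself is not shown in the output; a minimal sketch of how the fit reported below could be obtained (the object name modelfit and the lambda = "auto" Box-Cox option are assumptions inferred from the printed summary) is:
library(forecast)
# Fit the log adjusted close price; lambda = "auto" asks auto.arima to choose a Box-Cox transformation.
modelfit <- auto.arima(AMZN$log_close, lambda = "auto")
summary(modelfit)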
## Series: AMZN$log_close
## ARIMA(1,1,1) with drift
## Box Cox transformation: lambda= 1.174328
##
## Coefficients:
## ar1 ma1 drift
## 0.9017 -0.9189 0.0015
## s.e. 0.1104 0.1026 0.0004
##
## sigma^2 estimated as 0.0007349: log likelihood=5511.08
## AIC=-11014.16 AICc=-11014.14 BIC=-10990.84
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -1.115879e-05 0.01975801 0.01356845 -0.00144157 0.2241459
## MASE ACF1
## Training set 0.9960286 -0.003931504
We can see that auto.arima makes a different choice: it suggests ARIMA(1,1,1) rather than ARIMA(3,1,3), even though the earlier fit reached an AIC of -12607.32 while this ARIMA(1,1,1) reports an AIC of only -11014.16. (The two AIC values are not directly comparable, since auto.arima applies a Box-Cox transformation here, so the likelihoods are computed on different scales.) In the next step, I will forecast the price of Amazon stock for the next 30 days and construct a confidence interval using ARIMA(1,1,1). Let's see what happens.
price_forecast <- forecast(modelfit, h=30)
plot(price_forecast, main="Forecast plot with ARIMA(1,1,1)")
As we can see in this plot, ARIMA(1,1,1) predicts an increasing trend; we can identify this from the solid blue line, which points slightly upward. Moreover, one nice feature of the forecast package is that it also provides confidence intervals, by default the 80% and 95% intervals. Now let's take a look at those numbers. One thing to keep in mind is that our values are currently on the log scale, so to make the result interpretable we first apply an exponential transformation.
# Confidence interval
CI <- as.data.frame(cbind(exp(price_forecast$lower), exp(price_forecast$upper)))
CI
## exp(price_forecast$lower).80% exp(price_forecast$lower).95%
## 1 2350.712 2320.656
## 2 2326.620 2285.006
## 3 2308.784 2258.702
## 4 2294.343 2237.391
## 5 2282.156 2219.349
## 6 2271.617 2203.668
## 7 2262.356 2189.796
## 8 2254.122 2177.365
## 9 2246.736 2166.115
## 10 2240.064 2155.851
## 11 2234.000 2146.423
## 12 2228.464 2137.716
## 13 2223.387 2129.634
## 14 2218.713 2122.099
## 15 2214.396 2115.048
## 16 2210.397 2108.425
## 17 2206.683 2102.186
## 18 2203.224 2096.292
## 19 2199.995 2090.707
## 20 2196.974 2085.404
## 21 2194.143 2080.355
## 22 2191.485 2075.540
## 23 2188.984 2070.938
## 24 2186.628 2066.531
## 25 2184.405 2062.305
## 26 2182.304 2058.244
## 27 2180.317 2054.338
## 28 2178.434 2050.575
## 29 2176.649 2046.944
## 30 2174.954 2043.438
## exp(price_forecast$upper).80% exp(price_forecast$upper).95%
## 1 2467.739 2499.655
## 2 2490.633 2535.904
## 3 2507.986 2563.463
## 4 2522.586 2586.623
## 5 2535.515 2607.053
## 6 2547.320 2625.608
## 7 2558.322 2642.794
## 8 2568.724 2658.937
## 9 2578.665 2674.260
## 10 2588.242 2688.925
## 11 2597.526 2703.048
## 12 2606.569 2716.720
## 13 2615.412 2730.011
## 14 2624.085 2742.974
## 15 2632.613 2755.655
## 16 2641.015 2768.089
## 17 2649.307 2780.304
## 18 2657.501 2792.325
## 19 2665.608 2804.174
## 20 2673.637 2815.865
## 21 2681.594 2827.415
## 22 2689.486 2838.835
## 23 2697.317 2850.137
## 24 2705.092 2861.329
## 25 2712.815 2872.421
## 26 2720.490 2883.418
## 27 2728.118 2894.328
## 28 2735.704 2905.155
## 29 2743.248 2915.906
## 30 2750.753 2926.585
Keep in mind that the adjusted close price on 4/24/2020 was $2,410.22. According to the ARIMA(1,1,1) model, at the 95% confidence level the price after 30 days could be as low as $2,043.44 or as high as $2,926.59. Another way to think about this: if you bought one share of Amazon on 4/24/2020, then according to the ARIMA(1,1,1) model, after one month in the most extreme cases you could either lose $366.78 or gain a whopping $516.37. Most likely, however, the outcome will be somewhere in between. That is the beauty of the stock market: higher risk, higher return. And that is why so many people, including me, are so into it.
One final thing: we cannot validate the model without a training/testing split. Let's split the original data into the first 70% as the training set and the last 30% as the testing set. We will still work with the adjusted close price on the log scale and see how our model fits the data. Again, since the data are on the log scale, we should remember to apply the exponential transformation to make the result easier to interpret.
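A minimal sketch of this 70/30 evaluation (object names are illustrative; the code actually used for the plot is not shown) is:
# 70/30 train/test split on the log adjusted close price.
n_total <- nrow(AMZN)
n_train <- floor(0.7 * n_total)
train <- AMZN$log_close[1:n_train]
test  <- AMZN$log_close[(n_train + 1):n_total]
fit_train <- Arima(train, order = c(1, 1, 1), include.drift = TRUE)  # forecast::Arima
fc_test   <- forecast(fit_train, h = length(test))
# Back-transform to the price scale before plotting.
plot(exp(test), type = "l", ylab = "Adjusted close price",
     main = "ARIMA(1,1,1): test-set forecast vs. actual")
lines(as.numeric(exp(fc_test$mean)), col = "blue")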
As we can see above, ARIMA(1,1,1) successfully caught the peak at the end. However, if we look near time 350, the Amazon stock price was relatively high then too, and ARIMA(1,1,1) was not able to catch that. This suggests we should try other models.
The generalized ARCH model, known as GARCH(p,q), has the form \[Y_n=\epsilon_n\sqrt{V_n}\]
where \[V_n=a_0+\sum_{j=1}^pa_jY_{n-j}^2+\sum_{k=1}^qb_kV_{n-k}\] and \(\epsilon_{1:N}\) is white noise.
The GARCH(1,1) model is a popular choice (Cowpertwait and Metcalfe, 2009), and it can be fitted using garch() in the tseries R package.
## ----garch--------------------------------------------------------------------
require(tseries)
fit.garch <- garch(demean, order = c(1,1), grad = "numerical", trace = FALSE)  # fit GARCH(1,1) to the demeaned log returns
L.garch <- tseries:::logLik.garch(fit.garch)  # maximized log likelihood of the fitted model
L.garch
## 'log Lik.' 6338.532 (df=3)
This 3-parameter model has a maximized log likelihood of 6338.53 with 3 degrees of freedom.
Having explained GARCH(1,1) above, it is time to use some pre-built packages to do the work. Here, I search over ARMA mean models up to order (2,2) and then apply a standard GARCH(1,1) model on top of the selected mean model, checking whether any improvement is made compared with our ARIMA model.
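The summary below looks like the output of rugarch's automatic ARFIMA order search; a minimal sketch of such a call (the data argument and search limits are assumptions, not the exact call used) is:
library(rugarch)
# Assumed: search ARFIMA mean models up to AR and MA order 2, ranked by AIC.
arfima_search <- autoarfima(data = AMZN$log_close,
                            ar.max = 2, ma.max = 2,
                            criterion = "AIC", method = "full")
arfima_search   # list with $fit (best model) and $rank.matrix (all candidates)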
## $fit
##
## *----------------------------------*
## * ARFIMA Model Fit *
## *----------------------------------*
## Mean Model : ARFIMA(2,0,1)
## Distribution : norm
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 4.994316 0.012562 397.572 0
## ar1 1.931343 0.000033 59354.179 0
## ar2 -0.932522 0.000046 -20259.137 0
## ma1 -0.701365 0.004720 -148.600 0
## sigma 0.021772 0.000278 78.274 0
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 4.994316 0.016680 299.420 0
## ar1 1.931343 0.000047 41330.944 0
## ar2 -0.932522 0.000081 -11564.163 0
## ma1 -0.701365 0.013129 -53.419 0
## sigma 0.021772 0.000739 29.463 0
##
## LogLikelihood : 6063.859
##
## Information Criteria
## ------------------------------------
##
## Akaike -4.8124
## Bayes -4.8009
## Shibata -4.8124
## Hannan-Quinn -4.8082
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 38.68 4.998e-10
## Lag[2*(p+q)+(p+q)-1][8] 58.09 0.000e+00
## Lag[4*(p+q)+(p+q)-1][14] 66.74 0.000e+00
##
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 32.74 1.055e-08
## Lag[2*(p+q)+(p+q)-1][2] 43.12 3.913e-12
## Lag[4*(p+q)+(p+q)-1][5] 56.32 9.992e-16
##
##
## ARCH LM Tests
## ------------------------------------
## Statistic DoF P-Value
## ARCH Lag[2] 48.16 2 3.491e-11
## ARCH Lag[5] 58.19 5 2.876e-11
## ARCH Lag[10] 64.20 10 5.758e-10
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 62.2004
## Individual Statistics:
## mu 0.7487
## ar1 24.2444
## ar2 24.5326
## ma1 8.5185
## sigma 0.2535
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
##
## Elapsed time : 7.529865
##
##
## $rank.matrix
## ar1 ar2 ma1 ma2 im arf AIC converged
## 1 1 1 1 0 1 0 -4.81243747 1
## 2 1 0 0 1 1 0 -4.80697034 1
## 3 1 1 0 1 1 0 -4.70244584 1
## 4 1 0 1 1 1 0 -4.68780214 1
## 5 1 1 0 0 1 0 -4.67221264 1
## 6 1 0 1 0 1 0 -4.63836389 1
## 7 1 0 0 0 1 0 -4.56531310 1
## 8 0 1 1 0 1 0 -4.51038869 1
## 9 0 1 0 0 1 0 -3.63307665 1
## 10 1 0 1 1 0 0 -2.87826368 1
## 11 1 1 0 1 0 0 -2.75301312 1
## 12 1 0 0 1 0 0 -2.74721652 1
## 13 1 0 1 0 0 0 -2.74287644 1
## 14 1 0 0 0 0 0 -2.54141535 1
## 15 1 1 1 1 0 0 -2.09794656 1
## 16 0 1 1 1 0 0 -0.67397216 1
## 17 0 1 1 0 0 0 -0.66145572 1
## 18 0 1 0 1 0 0 -0.61035969 1
## 19 0 1 0 0 0 0 -0.47549639 1
## 20 0 0 1 1 1 0 0.09041063 1
## 21 0 0 1 0 1 0 1.20805556 1
## 22 0 0 0 1 1 0 1.23094917 1
## 23 0 0 0 0 1 0 2.54121339 1
## 24 0 0 1 1 0 0 3.93613037 1
## 25 0 0 1 0 0 0 5.15543541 1
## 26 0 0 0 1 0 0 5.16492137 1
## 27 0 1 1 1 1 0 101.03540216 1
## 28 1 1 0 0 0 0 NA 0
## 29 1 1 1 0 0 0 NA 0
## 30 0 1 0 1 1 0 NA 0
## 31 1 1 1 1 1 0 NA 0
From the summary above, we can see that we should use the ARFIMA(2,0,1) model. ARFIMA stands for autoregressive fractionally integrated moving average; a detailed explanation of the ARFIMA model can be found in reference \(^{[5]}\).
With the parameters collected, we choose ARFIMA(2,0,1) and incorporate it as the mean model of a GARCH model.
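A minimal sketch of such a combined fit with rugarch (the specification and object names here are assumptions; the exact specification behind the plots is not shown) is:
library(rugarch)
garch_spec <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),    # standard GARCH(1,1)
  mean.model     = list(armaOrder = c(2, 1), include.mean = TRUE),  # ARMA(2,1) mean, as selected above
  distribution.model = "norm"
)
garch_fit      <- ugarchfit(spec = garch_spec, data = AMZN$log_close)
garch_forecast <- ugarchforecast(garch_fit, n.ahead = 30)
plot(garch_forecast, which = 1)   # forecast of the (log-scale) series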
As we can see, the volatility is roughly stationary with respect to time. Now let's see how the combined model performs and what the prediction looks like.
As we can see in the plot, the combined model gives a much narrower confidence interval, and the trend is still upward. One thing to keep in mind is that this plot, unlike all the plots above, is on the log scale. Since I did not work out a convenient way to transform it back to the original scale, I leave it as is; if it were transformed back, the upward slope would look steeper.
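For readers who want the combined-model forecast on the price scale, one possible back-transformation (using the assumed garch_forecast object from the sketch above and a simple normal approximation for the band) is:
mu_hat  <- as.numeric(fitted(garch_forecast))   # forecast of the log price
sig_hat <- as.numeric(sigma(garch_forecast))    # forecast of the conditional sd
price_band <- cbind(lower = exp(mu_hat - 1.96 * sig_hat),
                    mid   = exp(mu_hat),
                    upper = exp(mu_hat + 1.96 * sig_hat))
head(price_band)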
Having fitted the GARCH model, we notice that GARCH behaves rather like a black box: its parameters do not have a clear interpretation. To develop and test hypotheses, we turn to a pomp model.
\(R_n\) is formally defined as the leverage on day \(n\): the correlation between the index return on day \(n-1\) and the increase in log volatility from day \(n-1\) to day \(n\). Here, we introduce a pomp implementation of Breto (2014), which models \(R_n\) as a random walk on a transformed scale, \[R_n=\frac{\exp(2G_n)-1}{\exp(2G_n)+1}\] where \(G_n\) is a Gaussian random walk.
\[Y_n=\exp(H_n/2)\epsilon_n\] \[H_n=\mu_h(1-\phi)+\phi H_{n-1}+\beta_{n-1}R_n\exp(-H_{n-1}/2)+\omega_n\] \[G_n=G_{n-1}+\nu_n\] where \(\beta_n=Y_n\sigma_{\eta}\sqrt{1-\phi^2}\), and \(\epsilon_n\), \(\nu_n\), and \(\omega_n\) are i.i.d. \(N(0,1)\), \(N(0,\sigma_{\nu}^2)\), and \(N(0,\sigma_{\omega}^2)\) sequences, respectively.
Here, we use the iterated filtering algorithm (IF2) to converge toward the region of parameter space that maximizes the likelihood. In this case, the state variable is \(X_n=(G_n,H_n,Y_n)\).
To build a recursive particle filter, we write the filtered particle \(j\) at time \(n-1\) as
\[X_{n-1,j}^F=(G_{n-1,j}^F,H_{n-1,j}^F,y_{n-1}^*)\]
Now we can construct prediction particles at time \(n\), \[(G_{n,j}^P,H_{n,j}^P)\sim f_{G_n,H_n|G_{n-1},H_{n-1},Y_{n-1}}(g_n,h_n|G_{n-1,j}^F,H_{n-1,j}^F,y_{n-1}^*)\] with corresponding weight \(w_{n,j}=f_{Y_n|G_n,H_n}(y_n^*|G_{n,j}^P,H_{n,j}^P)\).
## ----names--------------------------------------------------------------------
amzn_statenames <- c("H","G","Y_state")
amzn_rp_names <- c("sigma_nu","mu_h","phi","sigma_eta")
amzn_ivp_names <- c("G_0","H_0")
amzn_paramnames <- c(amzn_rp_names,amzn_ivp_names)
amzn_covarnames <- "covaryt"
## ----rproc--------------------------------------------------------------------
rproc1 <- "
double beta,omega,nu;
omega = rnorm(0,sigma_eta * sqrt( 1- phi*phi ) *
sqrt(1-tanh(G)*tanh(G)));
nu = rnorm(0, sigma_nu);
G += nu;
beta = Y_state * sigma_eta * sqrt( 1- phi*phi );
H = mu_h*(1 - phi) + phi*H + beta * tanh( G )
* exp(-H/2) + omega;
"
rproc2.sim <- "
Y_state = rnorm( 0,exp(H/2) );
"
rproc2.filt <- "
Y_state = covaryt;
"
amzn_rproc.sim <- paste(rproc1,rproc2.sim)
amzn_rproc.filt <- paste(rproc1,rproc2.filt)
## ----rinit--------------------------------------------------------------------
amzn_rinit <- "
G = G_0;
H = H_0;
Y_state = rnorm( 0,exp(H/2) );
"
## ----measure------------------------------------------------------------------
amzn_rmeasure <- "
y=Y_state;
"
amzn_dmeasure <- "
lik=dnorm(y,0,exp(H/2),give_log);
"
## ----transforms---------------------------------------------------------------
amzn_partrans <- parameter_trans(
log=c("sigma_eta","sigma_nu"),
logit="phi"
)
## ----sp_pomp------------------------------------------------------------------
amzn.filt <- pomp(data=data.frame(
y=demean,time=1:length(demean)),
statenames=amzn_statenames,
paramnames=amzn_paramnames,
times="time",
t0=0,
covar=covariate_table(
time=0:length(demean),
covaryt=c(0,demean),
times="time"),
rmeasure=Csnippet(amzn_rmeasure),
dmeasure=Csnippet(amzn_dmeasure),
rprocess=discrete_time(step.fun=Csnippet(amzn_rproc.filt),
delta.t=1),
rinit=Csnippet(amzn_rinit),
partrans=amzn_partrans
)
## ----run_level----------------------------------------------------------------
run_level <- 3
amzn_Np <- switch(run_level, 100, 1e3, 2e3)
amzn_Nmif <- switch(run_level, 10, 100, 200)
amzn_Nreps_eval <- switch(run_level, 4, 10, 20)
amzn_Nreps_local <- switch(run_level, 10, 20, 20)
amzn_Nreps_global <- switch(run_level, 10, 20, 100)
## ----sim_pomp-----------------------------------------------------------------
params_test <- c(
sigma_nu = exp(-4.5),
mu_h = -0.25,
phi = expit(4),
sigma_eta = exp(-0.07),
G_0 = 0,
H_0=0
)
## ----parallel-setup,cache=FALSE-----------------------------------------------
library(doParallel)   # parallel backend for the %dopar% loops below (assumed; not shown in the original output)
registerDoParallel()
library(doRNG)        # reproducible parallel random numbers
registerDoRNG(34118892)
## ----mif_setup----------------------------------------------------------------
amzn_rw.sd_rp <- 0.02
amzn_rw.sd_ivp <- 0.1
amzn_cooling.fraction.50 <- 0.5
amzn_rw.sd <- rw.sd(
sigma_nu = 0.02,
mu_h = 0.02,
phi = 0.02,
sigma_eta = 0.02,
G_0 = ivp(0.1),
H_0 = ivp(0.1)
)
## ----box----------------------------------------------------------------------
amzn_box <- rbind(
sigma_nu=c(0.005,0.05),
mu_h =c(-1,0),
phi = c(0.95,0.99),
sigma_eta = c(0.5,1),
G_0 = c(-2,2),
H_0 = c(-1,1)
)
varName <- c("amzn_rw.sd", "amzn.filt", "params_test", "amzn_Np", "amzn_Nmif", "amzn_cooling.fraction.50", "amzn_Nreps_eval", "amzn_box")
stew(file=sprintf("mif1-%d.rda",run_level),{
t.if1 <- system.time({
if1 <- foreach(i=1:amzn_Nreps_local,
.packages='pomp', .combine=c, .export=varName) %dopar% mif2(amzn.filt,
params=params_test,
Np=amzn_Np,
Nmif=amzn_Nmif,
cooling.fraction.50=amzn_cooling.fraction.50,
rw.sd = amzn_rw.sd)
L.if1 <- foreach(i=1:amzn_Nreps_local,
.packages='pomp', .combine=rbind, .export=varName) %dopar% logmeanexp(
replicate(amzn_Nreps_eval, logLik(pfilter(amzn.filt,
params=coef(if1[[i]]),Np=amzn_Np))), se=TRUE)
})
},seed=318817883,kind="L'Ecuyer")
r.if1 <- data.frame(logLik=L.if1[,1],logLik_se=L.if1[,2],
t(sapply(if1,coef)))
if (run_level>1) write.table(r.if1,file="amzn_params.csv",
append=TRUE,col.names=FALSE,row.names=FALSE)
## ----pairs_plot,echo=F,eval=T,out.width="11cm"--------------------------------
pairs(~logLik+sigma_nu+mu_h+phi+sigma_eta,
data=subset(r.if1,logLik>max(logLik)-20))
## ----box_eval-----------------------------------------------------------------
stew(file=sprintf("box_eval-%d.rda",run_level),{
t.box <- system.time({
if.box <- foreach(i=1:amzn_Nreps_global,
.packages='pomp',.combine=c, .export = c("if1", "amzn_box")) %dopar% mif2(if1[[1]],
params=apply(amzn_box,1,function(x) runif(1, x[1], x[2])))  # draw each starting parameter uniformly from its box
L.box <- foreach(i=1:amzn_Nreps_global,
.packages='pomp',.combine=rbind, .export = varName) %dopar% {
logmeanexp(replicate(amzn_Nreps_eval, logLik(pfilter(
amzn.filt,params=coef(if.box[[i]]),Np=amzn_Np))),
se=TRUE)
}
})
},seed=290860873,kind="L'Ecuyer")
r.box <- data.frame(logLik=L.box[,1],logLik_se=L.box[,2],
t(sapply(if.box,coef)))
if(run_level>1) write.table(r.box,file="amzn_params.csv",
append=TRUE,col.names=FALSE,row.names=FALSE)
summary(r.box$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6585 6600 6608 6629 6670 6673
## ----pairs_global_plot,eval=T,echo=F,out.width="11cm"-------------------------
pairs(~logLik+log(sigma_nu)+mu_h+phi+sigma_eta+H_0,
data=subset(r.box,logLik>max(logLik)-10))
plot(if.box)
Looking at the diagnostic plots, I found something interesting: with run level 3, the log likelihood converges very quickly, after around ten iterations. The other parameters converge after roughly 150 iterations. That suggests we could save some computation by reducing the number of iterations to 150.
## ----garch_benchmark,eval=F,echo=F--------------------------------------------
library(tseries)
fit.garch.benchmark <- garch(demean,grad = "numerical", trace = FALSE)
L.garch.benchmark <- tseries:::logLik.garch(fit.garch.benchmark)
L.garch.benchmark
## 'log Lik.' 6338.532 (df=3)
# Log likelihood with Pomp
summary(r.box$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6585 6600 6608 6629 6670 6673
As we can see, the benchmark GARCH model has a log likelihood of 6338.53, while for the pomp model even the minimum log likelihood across replications is 6585. That is exactly why the pomp model is worth trying.
In summary, based on all the analysis above, I personally think it is still a good time to hold Amazon stock, since according to our models the price should increase in the future. However, this is just a report for a class project, and none of its results should be treated as financial advice. I hope everyone reading this report achieves financial success in the future.
Lecture Notes 14
Previous final project: https://ionides.github.io/531w18/final_project/1/final.html#conclu
auto.arima documentation: https://www.rdocumentation.org/packages/forecast/versions/8.12/topics/auto.arima
ARFIMA model: https://en.wikipedia.org/wiki/Autoregressive_fractionally_integrated_moving_average
ARIMA model: https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average