As one of the most successful and famous tech companies in the world, Apple has been a focus of financial practitioners ever since its stock began trading publicly on the NASDAQ in 1980. What makes it more interesting is that Apple (AAPL) went through a strong boom for tech stocks last year and three major market meltdowns recently. This period is a great case study, especially for Apple, a representative company involved in both high-end hardware and software. Aside from market-environment effects, Apple's stock performance is also affected by the performance of its products. For example, the launches of the iPhone 11 and iPad Pro were hits in the market and boosted Apple's stock performance. Given that the US stock market just went through three meltdowns, this instability has become a hot topic that people would like to investigate. Volatility, as a measure of market instability, is therefore an interesting quantity that we would like to model with statistical tools. In this paper, we model Apple stock prices with both a classic statistical model and a Markov-property-based model, and compare their results. We hope this project can provide some insight for investors in understanding stock volatility in a time series sense.
Following the discussion above, I would like to raise one question that I want to answer in this report: can a POMP model describe the volatility of Apple's stock better than a classic GARCH model?
The data were collected from the open-source Yahoo! Finance. They cover Apple's daily stock information from Nov. 13, 2018 to Apr. 28, 2020, and include the timestamp, open price, daily high/low prices, close price, adjusted close price, and volume. Since we are interested in volatility, we will use the adjusted close price, a financial concept in which prices are adjusted for splits and dividends; more detailed information can be found on Investopedia. Let's first take a look at the first few rows of the original data:
##            Date       Open       High        Low      Close  Adj.Close   Volume
## 9564 2018-11-13 191.630005 197.179993 191.449997 192.229996 188.936081 46882900
## 9565 2018-11-14 193.899994 194.479996 185.929993 186.800003 183.599152 60801000
## 9566 2018-11-15 188.389999 191.970001 186.899994 191.410004 188.130142 46478800
## 9567 2018-11-16 190.500000 194.970001 189.460007 193.529999 190.213806 36928300
## 9568 2018-11-19 190.000000 190.699997 184.990005 185.860001 182.675247 41925300
## 9569 2018-11-20 178.369995 181.470001 175.509995 176.979996 173.947418 67825200
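For reference, a minimal sketch of how such data can be read in; the file name and the subsetting step are assumptions on my part, since the download code is not shown in this report:
AAPL <- read.csv("AAPL.csv", header = TRUE, stringsAsFactors = FALSE)  # Yahoo! Finance export (assumed file name)
AAPL <- subset(AAPL, Date >= "2018-11-13" & Date <= "2020-04-28")      # keep the study window
closingPrice <- AAPL$Adj.Close                                         # adjusted close prices used below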
First we can take a look at the general distribution via the summary statistics and the plot of the time series, along with the time series on the log scale (Figure 1):
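A sketch of how this summary and the accompanying plots can be produced; the fBasics package is an assumption on my part, chosen because its basicStats output matches the table below:
library(fBasics)
basicStats(closingPrice)            # summary statistics shown below
par(mfrow = c(1, 2))
plot(closingPrice, type = "l", ylab = "Adjusted close")           # original scale
plot(log(closingPrice), type = "l", ylab = "log adjusted close")  # log scale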
## closingPrice
## nobs 365.000000
## NAs 0.000000
## Minimum 139.753540
## Maximum 327.200012
## 1. Quartile 183.715332
## 3. Quartile 262.470001
## Mean 221.352705
## Median 205.394424
## Sum 80793.737187
## SE Mean 2.582177
## LCL Mean 216.274847
## UCL Mean 226.430562
## Variance 2433.687616
## Stdev 49.332420
## Skewness 0.512615
## Kurtosis -0.847412
From the statistics summary, we can see that Apple's daily adjusted closing price has a very wide distribution over the sample period (from $139 to $327), with a large variance of 2433.69. From the plots on the right, we can see that the variance is large across the whole series. In general, the stock shows an obvious increasing trend, with some sudden spikes and dips during the gradual increase in adjusted price. After taking the log of the adjusted closing prices, the overall pattern does not change; only the magnitude becomes smaller.
However, since we are dealing with a volatility-related problem, we need to introduce the concept of return, as defined in class note 14:
\(Return_{n}=\log \left(Price_{n}\right)-\log \left(Price_{n-1}\right)\)
where \(n\) indexes the trading day.
In order to make the series closer to stationary, we de-mean \(Return_{n}\):
\(DemeanedReturn_{n}=Return_{n} - \frac{1}{N}\sum_{k=1}^{N} Return_{k}\)
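In code, the de-meaned return (named LogDiffAppleDmean, the series used in all the model fitting below) can be computed directly from the adjusted close prices:
LogDiffApple      <- diff(log(closingPrice))             # Return_n
LogDiffAppleDmean <- LogDiffApple - mean(LogDiffApple)   # de-meaned return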
Then we could look at the plot of the time series data (Figure 2):
We can see that, after taking differences of the log adjusted closing prices and de-meaning the data, the time series is much closer to weakly stationary. We can clearly observe the volatility varying across different sections of the series, with a heavier tail due to the effects of the market meltdowns and the coronavirus. Still, the variance looks fairly random while the mean stays roughly constant around 0, which supports the weak stationarity assumption.
Before we move on to model the data, we would also like to check the ACF (Figure 3) of the de-meaned return.
Based on class note 2, the ACF is defined as
\(\rho_{h}=\frac{\gamma_{h}}{\gamma_{0}}\)
where \(\gamma_h\) and \(\gamma_0\) are shorthand for the autocovariances \(\gamma_{n,n+h}\) and \(\gamma_{n,n}\) respectively, and \(h\) is the lag between time points.
For random variables \(Y_{1:N}\), the autocovariance is \(\gamma_{m, n}=\mathbb{E}\left[\left(Y_{m}-\mu_{m}\right)\left(Y_{n}-\mu_{n}\right)\right]\).
By plotting the autocorrelation against the lag, we obtain the ACF plot for the de-meaned return:
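Figure 3 is a standard ACF plot, which can be produced with base R's acf function:
acf(LogDiffAppleDmean, main = "ACF of de-meaned return")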
We can observe several spikes, indicating autocorrelation between some lags, which we need to pay attention to in later modeling. Although some time points appear correlated, this could come from the relatively short sampling period, or from the recent frequent emergencies happening around the world. It is hard to conclude that the autocorrelation is robust and valid from a simple ACF plot, but knowing the potential relationships between data points helps us draw careful conclusions later on.
One way to assess this is to compare our series against a mature, off-the-shelf method. For example, the decompose function fits an additive model that splits a time series into a seasonal component (seasonal pattern), a trend (long-term tendency), and a short-term remainder, using moving averages. The additive model used is
\(Y_{t}=T_{t}+S_{t}+R_{t}\)
where \(T_{t}\) is the trend, \(S_{t}\) the seasonal component, and \(R_{t}\) the random remainder.
I assumed a weekly seasonal pattern (inferred from the ACF above) for the stock and extracted the long-term upward trend, in order to examine the remaining errors (Figure 4). The "random" panel shows the residuals, which we interpret as random volatility. Around the beginning of 2020 there are stretches of high variance, and the fluctuation at the end of 2018 is higher than in the middle of the series. These patterns match the de-meaned return shown in Figure 2.
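A sketch of the decomposition behind Figure 4; the frequency of 5 observations per cycle (trading days per week) is my assumption for encoding the weekly seasonal pattern:
return_ts  <- ts(LogDiffAppleDmean, frequency = 5)    # 5 trading days per week
decomposed <- decompose(return_ts, type = "additive")
plot(decomposed)   # panels: observed, trend, seasonal, random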
From class note 14 we learned the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model; a GARCH(p,q) model has the form
\(Y_{n}=\epsilon_{n} \sqrt{V_{n}}\), with \(V_{n}=\alpha_{0}+\sum_{j=1}^{p} \alpha_{j} Y_{n-j}^{2}+\sum_{k=1}^{q} \beta_{k} V_{n-k}\),
where \(\epsilon_{1:N}\) is white noise.
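The fit below uses the garch function from the tseries package with its default GARCH(1,1) order; the call mirrors the Call line shown in the output:
library(tseries)
fit.garch <- garch(LogDiffAppleDmean, grad = "numerical", trace = FALSE)
L.garch   <- logLik(fit.garch)   # maximized log likelihood reported below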
##
## Call:
## garch(x = LogDiffAppleDmean, grad = "numerical", trace = FALSE)
##
## Coefficient(s):
## a0 a1 b1
## 1.612e-05 1.831e-01 7.954e-01
## 'log Lik.' 916.0265 (df=3)
The 3-parameter GARCH(1,1) model has a maximized log likelihood of 916.0265. The fitted parameters are \(\alpha_{0} = 1.612 \times 10^{-5}\), \(\alpha_{1} = 1.831 \times 10^{-1}\), and \(\beta_{1} = 7.954 \times 10^{-1}\). This GARCH likelihood will be used as a benchmark for later comparison with the POMP model. However, we need to keep in mind that GARCH is hard to interpret: its parameters do not have clear meanings, so it is hard to draw conclusions directly from the GARCH output.
The POMP model first needs a volatility model. Using the concept of leverage introduced in class note 14, we define \(R_{n}\) as the leverage on day \(n\), the correlation between the return on day \(n-1\) and the increase in log volatility from day \(n-1\) to day \(n\). Following Bretó (2014), we model it on a transformed scale as a random walk:
\(R_{n}=\frac{\exp \left\{2 G_{n}\right\}-1}{\exp \left\{2 G_{n}\right\}+1}\)
where \(G_{n}\) is a Gaussian random walk.
Given \(G_{n}\), which defines \(R_{n}\), the model for the observed return and the log volatility is
\(Y_{n}=\exp \left\{H_{n} / 2\right\} \epsilon_{n}\)
\(H_{n}=\mu_{h}(1-\phi)+\phi H_{n-1}+\beta_{n-1} R_{n} \exp \left\{-H_{n-1} / 2\right\}+\omega_{n}\)
\(G_{n}=G_{n-1}+\nu_{n}\)
where:
\(H_{n}\) is the log volatility of the return we are modeling;
\(\beta_{n}=Y_{n} \sigma_{\eta} \sqrt{1-\phi^{2}}\), which indicates that \(\beta_{n}\) depends on \(Y_{n}\);
\(\epsilon_{n}\) is an iid \(N(0,1)\) sequence;
\(\nu_{n}\) is an iid \(N\left(0, \sigma_{\nu}^{2}\right)\) sequence;
\(\left\{\omega_{n}\right\}\) is an iid \(N\left(0, \sigma_{\omega}^{2}\right)\) sequence.
Given the background above, we have constructed our volatility model. Then, by transforming the latent variables from \((G_{n},H_{n})\) to \((G_{n+1},H_{n+1})\), we can use the state variable \(X_{n}=\left(G_{n}, H_{n}, Y_{n}\right)\) to model the process, with \(Y_{n}\) as the observation. The derivation can be found in the appendix of class note 14. In this way, we build a pomp object that can perform filtering and parameter estimation.
In other words, we define a recursive particle filter, with particle \(j\) at time \(n-1\) given by:
\(X_{n-1, j}^{F}=\left(G_{n-1, j}^{F}, H_{n-1, j}^{F}, y_{n-1}\right)\)
Then we can build prediction particles:
\(\left(G_{n, j}^{P}, H_{n, j}^{P}\right) \sim f_{G_{n}, H_{n} | G_{n-1}, H_{n-1}, Y_{n-1}}\left(g_{n}, h_{n} | G_{n-1, j}^{F}, H_{n-1, j}^{F}, y_{n-1}\right)\) with weight \(w_{n, j}=f_{Y_{n} | G_{n}, H_{n}}\left(y_{n} | G_{n, j}^{P}, H_{n, j}^{P}\right)\)
The notation follows class note 12. To combine the above-mentioned idea with Monte Carlo methods, we start by specifying the POMP model parameters and the volatility model, which gives us two pomp objects (one for filtering and one for simulation). Per the report's requirements, I show the model structure below so readers do not need to review the code themselves. The model is built from the lecture notes, with my own parameter choices for fine-tuning the results.
Apple_ivp_names <- c("G_0","H_0") ##initial values
Apple_statenames <- c("H","G","Y_state")
Apple_rp_names <- c("sigma_nu","mu_h","phi","sigma_eta") ##regular parameters
Apple_paramnames <- c(Apple_rp_names,Apple_ivp_names)
rproc1 <- "
double beta;
double omega;
double nu;
omega = rnorm(0,sigma_eta * sqrt( 1- phi*phi ) *
sqrt(1-tanh(G)*tanh(G)));
nu = rnorm(0, sigma_nu);
G += nu;
beta = Y_state * sigma_eta * sqrt( 1- phi*phi );
H = mu_h*(1 - phi) + phi*H + beta * tanh( G )
* exp(-H * 0.5) + omega;
"
rproc2.filt <- "
Y_state = covaryt;
"
rproc2.sim <- "
Y_state = rnorm( 0,exp(H/2) );
"
Apple_rproc.filt <- paste(rproc1,rproc2.filt)
Apple_rproc.sim <- paste(rproc1,rproc2.sim)
Moreover, we need to initialize Y, G, and H for the pomp model, and then define the rmeasure and dmeasure components.
##The two objects share the common initializer
Apple_rinit <- "
G = G_0;
H = H_0;
Y_state = rnorm( 0,exp(H/2) );
"
#d and r measures
Apple_rmeasure <- "
y=Y_state;
"
Apple_dmeasure <- "
lik=dnorm(y,0,exp(H/2),give_log);
"
#Do log transform and logit transform
Apple_partrans <- parameter_trans(
log=c("sigma_eta","sigma_nu"),
logit="phi"
)
Here we build the main body of the POMP model by plugging in all the components defined above:
Apple.filt <- pomp(data=data.frame(
y=LogDiffAppleDmean,time=1:length(LogDiffAppleDmean)),
statenames=Apple_statenames,
paramnames=Apple_paramnames,
times="time",
t0=0,
covar=covariate_table(
time=0:length(LogDiffAppleDmean),
covaryt=c(0,LogDiffAppleDmean),
times="time"),
rmeasure=Csnippet(Apple_rmeasure),
dmeasure=Csnippet(Apple_dmeasure),
rprocess=discrete_time(step.fun=Csnippet(Apple_rproc.filt),
delta.t=1),
rinit=Csnippet(Apple_rinit),
partrans=Apple_partrans
)
Here we show our own fixed parameter choices and build the simulation pomp object:
params_test <- c(
sigma_nu = exp(-3),
mu_h = -0.3,
phi = expit(3.5),
sigma_eta = exp(-0.06),
G_0 = 0,
H_0=0
)
sim1.sim <- pomp(Apple.filt,
statenames=Apple_statenames,
paramnames=Apple_paramnames,
rprocess=discrete_time(
step.fun=Csnippet(Apple_rproc.sim),delta.t=1)
)
sim1.sim <- simulate(sim1.sim,seed=1,params=params_test)
Similarly, we build the filtering pomp object based on the same parameter setup:
sim1.filt <- pomp(sim1.sim,
covar=covariate_table(
time=c(timezero(sim1.sim),time(sim1.sim)),
covaryt=c(obs(sim1.sim),NA),
times="time"),
statenames=Apple_statenames,
paramnames=Apple_paramnames,
rprocess=discrete_time(
step.fun=Csnippet(Apple_rproc.filt),delta.t=1)
)
First we choose run level 3, which corresponds to the size parameters listed below.
run_level <- 3
Apple_Np <- switch(run_level, 100, 1e3, 2e3, 1000)
Apple_Nmif <- switch(run_level, 10, 100, 200, 100)
Apple_Nreps_eval <- switch(run_level, 4, 10, 20, 10)
Apple_Nreps_local <- switch(run_level, 10, 20, 20, 20)
Apple_Nreps_global <- switch(run_level, 10, 20, 100, 75)
stew(file=sprintf("p1f1-%d.rda",run_level),{
t.pf1 <- system.time(
pf1 <- foreach(i=1:Apple_Nreps_eval,
.packages='pomp') %dopar% pfilter(sim1.filt,Np=Apple_Np))
},seed = 4292020, kind = "L'Ecuyer")
# (L.pf1 <- logmeanexp(sapply(pf1,logLik),se=TRUE))
Next, we run iterated filtering (a local search) starting from the fixed parameters above:
Apple_rw.sd_rp <- 0.02
Apple_rw.sd_ivp <- 0.1
Apple_cooling.fraction.50 <- 0.7
Apple_rw.sd <- rw.sd(
sigma_nu = Apple_rw.sd_rp,
mu_h = Apple_rw.sd_rp,
phi = Apple_rw.sd_rp,
sigma_eta = Apple_rw.sd_rp,
G_0 = ivp(Apple_rw.sd_ivp),
H_0 = ivp(Apple_rw.sd_ivp)
)
stew(file=sprintf("m1if1-%d.rda",run_level),{
t.if1 <- system.time({
if1 <- foreach(i=1:Apple_Nreps_local,
.packages='pomp', .combine=c) %dopar% mif2(Apple.filt,
params=params_test,
Np=Apple_Np,
Nmif=Apple_Nmif,
cooling.fraction.50=Apple_cooling.fraction.50,
rw.sd = Apple_rw.sd)
L.if1 <- foreach(i=1:Apple_Nreps_local,
.packages='pomp', .combine=rbind) %dopar% logmeanexp(
replicate(Apple_Nreps_eval, logLik(pfilter(Apple.filt,
params=coef(if1[[i]]),Np=Apple_Np))), se=TRUE)
})
},seed = 4292020, kind = "L'Ecuyer")
r.if1 <- data.frame(logLik=L.if1[,1],logLik_se=L.if1[,2],
t(sapply(if1,coef)))
if (run_level>1) write.table(r.if1,file="Apple_params.csv",
append=TRUE,col.names=FALSE,row.names=FALSE)
Let’s evaluate the fixed parameter models with diagnostic plots
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 918.9 924.7 926.0 926.8 931.1 932.6
From the summary above, we can see that the maximum log likelihood is 932.6.
From Figure 5, we can see that \(\sigma_{\nu}\) is concentrated around 0.005, which is away from its preset value of \(\exp(-3)\); \(\mu_{h}\) is distributed around our preset value of \(-0.3\); \(\phi\) is spread sparsely into two groups, which is not a good sign when compared with our preset value; and \(\sigma_{\eta}\) is distributed around 1, which is close to our preset value of \(\exp(-0.06)\).
From Figure 6, we can see that the convergence of the parameters is decent. \(\sigma_{\nu}\), \(\sigma_{\eta}\), \(G_{0}\), and \(H_{0}\) roughly converge, apart from some deviations. Contrary to our expectation, \(\mu_{h}\) did not converge well as the number of iterations grew. \(\phi\)'s trajectories split into two groups, which matches our findings from the pairs plot but is not a good sign for the parameter. Moreover, we can see some random dips in the effective sample size along the iterations, but the average level seems good.
In order to address the potential problem and double-check our model's validity, we need to run a global likelihood search with randomized starting values. Here are the ranges we set for each parameter:
Apple_box <- rbind(
sigma_nu=c(0.001,0.1),
mu_h =c(-10,0),
phi = c(0.5,0.99),
sigma_eta = c(0.5,1),
G_0 = c(-2,2),
H_0 = c(-1,1)
)
stew(file=sprintf("box1_eval-%d.rda",run_level),{
t.box<- system.time({
if.box <- foreach(i=1:Apple_Nreps_global,
.packages='pomp',.combine=c) %dopar% mif2(if1[[1]],
  params=apply(Apple_box,1,function(x) runif(1, x[1], x[2])))  # draw each start uniformly within its box
L.box <- foreach(i=1:Apple_Nreps_global,
.packages='pomp',.combine=rbind) %dopar% {
logmeanexp(replicate(Apple_Nreps_eval, logLik(pfilter(
Apple.filt,params=coef(if.box[[i]]),Np=Apple_Np))),
se=TRUE)
}
})
},seed = 4292020, kind = "L'Ecuyer")
r.box <- data.frame(logLik=L.box[,1],logLik_se=L.box[,2],
t(sapply(if.box,coef)))
write.table(r.box,file="Apple_params1.csv", append=TRUE,col.names=FALSE,row.names=FALSE)
summary(r.box$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 904.7 924.4 925.8 926.0 930.3 932.8
From the summary, we can see that the maximum log likelihood improves slightly, reaching 932.8, a small improvement over the local-search fit of the stochastic leverage model.
However, from Figure 7 (the pairs plot), we can see that convergence might be problematic, since parameters like \(\mu_{h}\) and \(\phi\) show non-convergent patterns.
Contrary to what the pairs plot suggested, Figure 8 shows that the parameters converge reasonably well as the number of iterations grows. \(\sigma_{\nu}\), \(G_{0}\), \(H_{0}\), and \(\sigma_{\eta}\) show good convergence trends, while \(\mu_{h}\) and \(\phi\) behave similarly to the local search.
Moreover, we can see some random dips in the effective sample size along the iterations, but the average level seems good.
Recall from our output above that the GARCH model yields a maximized log likelihood of 916.0265, which is lower than both the local search (932.6) and the global search (932.8) of the POMP model. Hence, based on the likelihood and AIC reasoning introduced in class note 5, we would pick the POMP model over the GARCH model for modeling Apple's volatility.
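As a quick sanity check, we can translate these log likelihoods into AIC values, assuming 3 fitted parameters for GARCH(1,1) and 6 for the POMP model:
\(\mathrm{AIC}_{GARCH}=2 \times 3-2 \times 916.0 \approx -1826\) and \(\mathrm{AIC}_{POMP}=2 \times 6-2 \times 932.8 \approx -1854\),
so the POMP model remains preferred even after penalizing its larger number of parameters.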
The global search has a slightly higher maximum likelihood than the local search, but the difference is not guaranteed to be statistically significant.
Both POMP fits show stable effective sample sizes, but they both suffer from convergence problems for some parameters (\(\mu_{h}\) and \(\phi\)).
In a nutshell, we can draw the following conclusions from our analysis and answer the question raised in Section 1.1:
By conducting a likelihood comparison based on AIC, we see that the maximum likelihood of the POMP model does beat the GARCH model. In other words, the POMP model does a better job of modeling Apple stock's volatility in our case study. To be specific, the random-walk-leverage volatility POMP model with randomized starting parameters (global search) does the best job of modeling Apple's stock volatility. However, the convergence of the model parameters needs more attention, which is discussed in the following section.
Apple's stock is subject to market effects, government actions, and COVID-19, with many interesting chain reactions. The volatility model we chose might need some adaptations to better describe the stock. Given that this is a case study of a single stock, we could introduce more stock-specific parameters into the model to incorporate domain knowledge. Limited by the project period and my current depth of learning, I did not explore more detailed models that might improve the POMP model's performance.
Given that several parameters did not converge as expected, it is not statistically reliable to conclude a "best" model. Even though I tried many different combinations of parameter tuning, I was not able to balance time and space complexity, which left my jobs using too much runtime. If I could find a better way to monitor the dynamic relationships between parameters, the search for the "best" model parameters would be much more efficient.
Given that the code uses many data frames, I would try a more fundamental data structure to carry out the computation in higher dimensions and save some runtime.
I would like to try some optimization methods to help the POMP model parameters converge. I am thinking about adding penalties to some parameters to balance the performance across parameters, which could lead to a more robust POMP model.
Instead of only modeling volatility, I want to leverage the POMP model for portfolio building in order to maximize investment returns. Learning POMP in the time series domain opened up a different angle for handling temporal information. I want to use pomp to generate more features from time series, which could benefit my Data Science studies.
The format of the RMD layout is learned from https://github.com/ionides/531w18/blob/master/final_project/21/final.Rmd
Bretó, Carles. 2014. "On Idiosyncratic Stochasticity of Financial Leverage Effects." Statistics & Probability Letters 91 (August). Elsevier BV: 20–26. https://doi.org/10.1016/j.spl.2014.04.003.