Volatility is a statistical measure of the dispersion of returns for a given security or market index. In most cases, the higher the volatility, the riskier the security, and in financial markets high volatility often comes with big price swings.[1] Modeling financial volatility is therefore important and worth investigating.
The NASDAQ-100 is a stock market index made up of 103 equity securities issued by 100 of the largest non-financial companies listed on the Nasdaq stock market.[2] Many of its constituents are industry leaders, such as Apple, Costco, and Intel. It is therefore interesting to examine how the NASDAQ-100, and with it these industry-leading companies, has behaved over the past few years.
To analyze the volatility of the NASDAQ-100 index time series, this project fits both a GARCH model and a POMP model and compares which one better describes the financial volatility of the NASDAQ-100 index.
The data are acquired from Federal Reserve Economic Data (FRED)[5], a database maintained by the research division of the Federal Reserve Bank of St. Louis.[6]
The series under study is the weekly NASDAQ-100 index: 463 data points ranging from 06-11-2001 to 04-26-2010. Each data point is the average of that week's NASDAQ-100 index price, for the week ending Friday.
First, let’s have a quick look at the data:
## DATE NASDAQ100
## 1 2001-06-15 1780.332
## 2 2001-06-22 1708.528
## 3 2001-06-29 1778.134
## 4 2001-07-06 1767.205
## 5 2001-07-13 1692.960
## 6 2001-07-20 1690.902
There are two columns in the data set: "DATE", the date of the observation, and "NASDAQ100", the NASDAQ-100 price, which is the average price of the week containing that date, ending Friday.
Let’s look at a brief summary of the data below:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 832.7 1374.9 1547.7 1539.9 1759.1 2212.2
We can see that the maximum of the NASDAQ-100 index during this time period is 2212.2, while the minimum is 832.7, and it has an average of 1539.9.
To get a more thorough idea of the data, let's make a time plot and see what it looks like.
As we can see from the plots above, the index increases over the period both before and after log transformation, with two sudden drops, around 2003 and 2008, which can be explained by the two major financial crises (economic recessions). Both plots show a clear trend, and the average of the log-transformed data is around 7.3. I therefore take the difference of the log-transformed data, which still has a small nonzero mean, and finally subtract that mean; the demeaned log returns are used as the final data set.
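A minimal sketch of this transformation, assuming the data have been read into a data frame named `nsdq` (an illustrative name) with the columns DATE and NASDAQ100 shown above:
log_price <- log(nsdq$NASDAQ100)                 # log-transformed index
ret <- diff(log_price)                           # weekly log returns
nsdq_demeaned <- ret - mean(ret)                 # demeaned returns used as the final data set
plot(as.Date(nsdq$DATE[-1]), nsdq_demeaned, type = "l",
     xlab = "Date", ylab = "Demeaned log return")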
From the plots, we notice that the demeaned returns fluctuate randomly around 0, while the variance around 2003 and 2008 is higher than in other years. In other words, volatility changes over time, and periods of high volatility tend to cluster together.
The generalized autoregressive conditional heteroskedasticity model (GARCH) is widely used for modeling financial data.
A GARCH(p,q) model has the form \[y_n = \epsilon_n \sqrt{V_n}\] where \(y_n\) is the return at time n, \(V_n\) is the volatility at time n, and \[ V_n = \alpha_0 + \sum_{j=1}^p \alpha_j y_{n-j}^2 + \sum_{k=1}^q \beta_k V_{n-k} \] and \(\epsilon_{1:n}\) is white noise.
Here, we fit a GARCH(1,1) model, a popular choice (Cowpertwait and Metcalfe 2009), which can be fitted using garch() in the tseries R package.[4]
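A sketch of how such a fit can be obtained, assuming the demeaned return series `nsdq_demeaned` from the earlier sketch:
library(tseries)
fit.garch <- garch(nsdq_demeaned, grad = "numerical", trace = FALSE)  # GARCH(1,1) by default
L.garch <- tseries:::logLik.garch(fit.garch)   # conditional log-likelihood of the fit
L.garch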
## 'log Lik.' 1045.844 (df=3)
This fits a 3-parameter GARCH(1,1) model with a maximized conditional log-likelihood of 1045.844, conditional on the first max(p,q) observations.
This seems promising, but the GARCH model is something of a black box: its three parameters have no clear interpretation. The fit may be helpful for forecasting, but if we want to develop and test hypotheses that go beyond the class of GARCH models, it is useful to have the POMP framework available.[4]
“\(R_n\) is formally defined as the leverage on day n: the correlation between the index return on day (n-1) and the increase in the log volatility from day (n-1) to day n.”[4]
Here, we use the pomp implementation of Bretó (2014) and model \(R_n\) as a random walk on a transformed scale, \[R_n=\frac{\exp\{2G_n\}-1}{\exp\{2G_n\}+1},\] where \(\{G_n\}\) is a Gaussian random walk.[4]
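To make the model concrete, here is a sketch of how this stochastic volatility model with leverage can be coded in the pomp package, following the construction in [4]. The object and variable names (e.g. `nsdq.filt`, `nsdq_demeaned`, `covaryt`) are illustrative assumptions rather than the exact code behind the results below.
library(pomp)
nsdq_statenames <- c("H", "G", "Y_state")
nsdq_rp_names <- c("sigma_nu", "mu_h", "phi", "sigma_eta")
nsdq_ivp_names <- c("G_0", "H_0")
nsdq_paramnames <- c(nsdq_rp_names, nsdq_ivp_names)
# One step of the latent process: G is the Gaussian random walk driving the
# leverage R_n = tanh(G_n), and H is the log volatility
rproc1 <- "
  double beta, omega, nu;
  omega = rnorm(0, sigma_eta * sqrt(1 - phi*phi) * sqrt(1 - tanh(G)*tanh(G)));
  nu = rnorm(0, sigma_nu);
  G += nu;
  beta = Y_state * sigma_eta * sqrt(1 - phi*phi);
  H = mu_h*(1 - phi) + phi*H + beta * tanh(G) * exp(-H/2) + omega;
"
# For filtering, the return is fed in as a covariate
rproc2.filt <- "
  Y_state = covaryt;
"
nsdq_rproc.filt <- paste(rproc1, rproc2.filt)
nsdq_rinit <- "
  G = G_0;
  H = H_0;
  Y_state = rnorm(0, exp(H/2));
"
nsdq_rmeasure <- "
  y = Y_state;
"
nsdq_dmeasure <- "
  lik = dnorm(y, 0, exp(H/2), give_log);
"
# Transform parameters to an unconstrained scale for optimization
nsdq_partrans <- parameter_trans(log = c("sigma_eta", "sigma_nu"), logit = "phi")
nsdq.filt <- pomp(
  data = data.frame(y = nsdq_demeaned, time = 1:length(nsdq_demeaned)),
  statenames = nsdq_statenames,
  paramnames = nsdq_paramnames,
  times = "time", t0 = 0,
  covar = covariate_table(time = 0:length(nsdq_demeaned),
                          covaryt = c(0, nsdq_demeaned), times = "time"),
  rmeasure = Csnippet(nsdq_rmeasure),
  dmeasure = Csnippet(nsdq_dmeasure),
  rprocess = discrete_time(step.fun = Csnippet(nsdq_rproc.filt), delta.t = 1),
  rinit = Csnippet(nsdq_rinit),
  partrans = nsdq_partrans
)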
Here, I use the iterated filtering algorithm (IF2)[8] to search the parameter space for values that maximize the log-likelihood.[4]
The filter particle j at time (n-1) is denoted \[X_{n-1,j}^F=(G_{n-1,j}^F,H_{n-1,j}^F,y_{n-1}^*).\] Prediction particles at time n are drawn as \[(G_{n,j}^P,H_{n,j}^P)\sim f_{G_n,H_n|G_{n-1},H_{n-1},Y_{n-1}}(g_n,h_n|G_{n-1,j}^F,H_{n-1,j}^F,y_{n-1}^*)\] with corresponding weight \(w_{n,j}=f_{Y_n|G_n,H_n}(y_n^*|G_{n,j}^P,H_{n,j}^P)\).
Initial values and starting values of the parameters are given below. We will use these parameters to fit the first model.
params_test <- c(
  sigma_nu  = exp(-4.5),
  mu_h      = -0.25,
  phi       = expit(4),
  sigma_eta = exp(-0.07),
  G_0       = 0,
  H_0       = 0
)
nsdq_rw.sd_rp <- 0.02
nsdq_rw.sd_ivp <- 0.1
nsdq_cooling.fraction.50 <- 0.5
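As a concrete illustration of the particle filter described above, the log-likelihood at these test parameters can be checked before running the full search; a minimal sketch, assuming the pomp object `nsdq.filt` from the earlier sketch:
pf_check <- replicate(4, logLik(pfilter(nsdq.filt, params = params_test, Np = 1000)))
logmeanexp(pf_check, se = TRUE)   # rough log-likelihood estimate with a standard error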
Here, we will use the level-3 run settings to fit the model:
run_level <- 3
nsdq_Np <- switch(run_level, 100, 1e3, 2e3)
nsdq_Nmif <- switch(run_level, 10, 100, 200)
nsdq_Nreps_eval <- switch(run_level, 4, 10, 20)
nsdq_Nreps_local <- switch(run_level, 10, 20, 20)
nsdq_Nreps_global <- switch(run_level, 10, 20, 100)
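A sketch of the local IF2 search that produces the results summarized below, assuming the pomp object `nsdq.filt` from the earlier sketch; the parallel setup and seed are illustrative choices:
library(doParallel); library(doRNG)
registerDoParallel(); registerDoRNG(34118892)
if1 <- foreach(i = 1:nsdq_Nreps_local, .packages = 'pomp', .combine = c) %dopar%
  mif2(nsdq.filt,
       params = params_test,
       Np = nsdq_Np,
       Nmif = nsdq_Nmif,
       cooling.fraction.50 = nsdq_cooling.fraction.50,
       rw.sd = rw.sd(
         sigma_nu  = nsdq_rw.sd_rp,
         mu_h      = nsdq_rw.sd_rp,
         phi       = nsdq_rw.sd_rp,
         sigma_eta = nsdq_rw.sd_rp,
         G_0       = ivp(nsdq_rw.sd_ivp),
         H_0       = ivp(nsdq_rw.sd_ivp)))
# Evaluate the likelihood at each search result by replicated particle filters
L.if1 <- foreach(i = 1:nsdq_Nreps_local, .packages = 'pomp', .combine = rbind) %dopar%
  logmeanexp(replicate(nsdq_Nreps_eval,
    logLik(pfilter(nsdq.filt, params = coef(if1[[i]]), Np = nsdq_Np))), se = TRUE)
r.if1 <- data.frame(logLik = L.if1[,1], logLik_se = L.if1[,2], t(sapply(if1, coef)))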
Below is the summary of the log-likelihood we got from this model.
summary(r.if1$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1032 1040 1046 1045 1051 1053
As we can see, even the POMP model fitted from a single set of starting values already beats the GARCH model. This is promising, so let's dig deeper by making diagnostic plots of the model.
pairs(~logLik + sigma_nu + mu_h + phi + sigma_eta,
      data = subset(r.if1, logLik > max(logLik) - 20))
As we can see from the pairs plot, the log-likelihood is maximized when \(\sigma_{\nu}\) lies roughly in (0, 0.03), \(\mu_h\) in (-8, 4), \(\phi\) in (0.95, 1), and \(\sigma_\eta\) in (0, 20). We therefore next optimize the parameters of the POMP model by trying many different starting values. Carrying out searches starting randomly throughout a large box can provide reasonable evidence for successful global maximization.[4]
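A sketch of such a global search from random starting values, following [4]. The box below is an illustrative choice roughly matching the ranges read off the pairs plot (the ranges for \(G_0\) and \(H_0\) are assumptions), so the exact box used for the reported results may differ.
nsdq_box <- rbind(
  sigma_nu  = c(0.005, 0.03),
  mu_h      = c(-8, 4),
  phi       = c(0.95, 0.99),
  sigma_eta = c(0.5, 10),
  G_0       = c(-2, 2),     # assumed range
  H_0       = c(-1, 1)      # assumed range
)
# Rerun mif2 from random draws inside the box, reusing the settings of the local search
if.box <- foreach(i = 1:nsdq_Nreps_global, .packages = 'pomp', .combine = c) %dopar%
  mif2(if1[[1]], params = apply(nsdq_box, 1, function(x) runif(1, x[1], x[2])))
L.box <- foreach(i = 1:nsdq_Nreps_global, .packages = 'pomp', .combine = rbind) %dopar%
  logmeanexp(replicate(nsdq_Nreps_eval,
    logLik(pfilter(nsdq.filt, params = coef(if.box[[i]]), Np = nsdq_Np))), se = TRUE)
r.box <- data.frame(logLik = L.box[,1], logLik_se = L.box[,2], t(sapply(if.box, coef)))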
Now, let’s look at the summary of the globally optimized model!
summary(r.box$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1016 1046 1052 1049 1056 1059
As we can see, the globally optimized model performs better than the local IF2 search: the maximum log-likelihood of 1059 is higher than 1053 and 1045.8, the maximum log-likelihoods of the local IF2 search and the GARCH model, respectively.
Now, let’s have a look at the pairs graph of the parameters.
As we can see above, there are more points than in the pairs plot for the local search, since the global optimization draws starting values from a wider range of parameters.
From the plots, values of \(\phi\) around 0.98, \(H_0\) around -6, \(\mu_h\) around -7.5, and \(\sigma_\eta\) near 0 appear to give the highest log-likelihoods, while the other parameters show no clear pattern.
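The convergence diagnostics discussed next can be reproduced from the mif2 objects; a minimal sketch, assuming the global-search results are kept in the list `if.box` as in the sketch above:
plot(if.box)   # trace plots of the log-likelihood and parameter estimates across iterations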
From the convergence diagnostics, the log-likelihood, \(G_0\), and \(\phi\) converge fairly well over the iterations, while the other parameters do not converge well in this experiment. There are several things we could try in the future: increasing the number of iterations and refining the parameter ranges of the global box search.
After running all the experiments above and examining the diagnostics, I conclude that the POMP model outperforms the GARCH model in two respects:
In terms of the maximized log-likelihood, the POMP model gives a higher value than the GARCH model, which is what we want.
In terms of interpretation, the GARCH model is a black box whose parameters are hard to interpret, whereas each parameter of the POMP model has a specific meaning and is easier to interpret.
Therefore, the POMP model does a better job of modeling the NASDAQ-100 time series. In the future, increasing the number of iterations and refining the global box optimization might improve its performance further.
[1] https://www.investopedia.com/terms/v/volatility.asp
[2] https://en.wikipedia.org/wiki/NASDAQ-100
[3] https://ionides.github.io/531w18/final_project/2/final.html
[4] https://ionides.github.io/531w20/14/notes14.pdf
[5] https://fred.stlouisfed.org/series/NASDAQ100#0
[6] https://en.wikipedia.org/wiki/Federal_Reserve_Economic_Data
[7] https://ionides.github.io/531w18/final_project/1/final.html
[8] Ionides, E. L., D. Nguyen, Y. Atchadé, S. Stoev, and A. A. King. 2015. Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. Proceedings of the National Academy of Sciences of the U.S.A. 112:719-724.