Introduction

In financial markets, the foresight to anticipate upcoming trends can give a considerable edge. This project analyzes NVIDIA Corporation’s stock prices using time series methodologies such as ARMA and GARCH models, along with the POMP framework. Our dataset consists of daily stock market data beginning January 18, 2022, focusing on the ‘Adjusted Close’ price, which reflects NVIDIA’s valuation after accounting for corporate actions. The ARMA model dissects the series’ linear dynamics, while the GARCH model addresses its inherent volatility. We further employ the POMP framework to capture the dynamics of processes influenced by latent variables. By integrating these models, our aim is not only to improve predictive accuracy for future stock prices but also to contribute to a deeper understanding of financial time series modeling. The insights gleaned from this analysis hold promise for crafting effective risk management and trading strategies.

We divide our project into four main segments: exploratory data analysis, ARMA model selection, GARCH model selection, and implementation of the POMP framework. Each model undergoes diagnostic checks, and we choose the best model based on likelihood and interpretability.

Exploratory Data Analysis

Daily Stock Price and Summary

In a candlestick chart representing stock prices, a green candlestick signifies that the closing price is higher than the opening price, indicating an increase in price. Conversely, a red candlestick indicates that the closing price is lower than the opening price, signaling a decrease in price. The stock declined until October 14, 2022, when it reached its lowest adjusted close of $112.19; since then, prices have followed a consistent upward trajectory, peaking at $950.02 on March 25, 2024.

##                              Statistic            Value
## 1                                 Mean 333.889637163993
## 2                   Standard Deviation 200.114347273665
## 3                              Maximum        950.02002
## 4                              Minimum       112.186241
## 5 Date of maximum adjusted close price       2024-03-25
## 6 Date of minimum adjusted close price       2022-10-14

Definition of Daily Log-return and Summary Statistics

We define the daily log-return as follows:

Suppose \(\{X_t\}\) is the daily closing stock price of an asset; then the daily log-returns are defined as \[r_t = \log\left(\frac{X_t}{X_{t-1}}\right)\]

Also, the \(k\)-period log-return is additive over the past \(k\) days’ log-returns, which provides a convenient way to aggregate the data: \[r_t(k) = \log\left(\frac{X_t}{X_{t-k}}\right) = r_t + r_{t-1} + \cdots + r_{t-k+1}\]

After obtaining the daily log-returns, the data appear to exhibit signs of stationarity, though there are notable peaks indicating periods of high volatility. The standard deviation of the daily log-returns, about 0.035, suggests considerable fluctuation. The highest daily log-return, 0.218, occurred on May 25, 2023, while the lowest, -0.0995, was recorded on September 13, 2022. With these observations in mind, we proceed to fitting an ARMA model to the daily log-return data.

##                          Statistic               Value
## 1                             Mean 0.00223475447180245
## 2               Standard Deviation  0.0346966995707835
## 3                          Maximum   0.218087750501639
## 4                          Minimum -0.0995175788079692
## 5 Date of maximum daily log-return          2023-05-25
## 6 Date of minimum daily log-return          2022-09-13

ARMA Model Selection

Before ARMA model selection, let’s review the definition of an ARMA process [2]:

\(\textbf{Definition:}\) A time series \(\{X_t: t = 0, \pm 1, \pm 2, \cdots \}\) is ARMA(\(p, q\)) if it is stationary and \[X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}\] with \(\phi_p \neq 0, \theta_q \neq 0\). Here \(\epsilon_t\) is a weak white noise. We define \(\epsilon_t\) to be weak white noise if it has mean 0, constant variance \(\sigma^2\), and \(Cov(\epsilon_m, \epsilon_n) = 0\) for \(m \neq n\).

ADF Test for Stationarity

To begin, we apply an Augmented Dickey-Fuller (ADF) test [6] to our time series. The ADF test has the null hypothesis that a unit root is present in the series, against the alternative of stationarity (or trend-stationarity). The more negative the test statistic, the stronger the rejection of the unit-root hypothesis at a given confidence level. Our test statistic is about -7.16 with a p-value below 0.01, so we reject the null hypothesis of a unit root and conclude that our time series is stationary.

Ljung-Box test for Independent Distribution

We then conducted the Ljung-Box test [7], wherein the null hypothesis is that the data are independently distributed, against the alternative that they exhibit serial correlation. The test yields a p-value of approximately 0.325, which exceeds the 0.05 significance level, so we fail to reject the null hypothesis; the data are consistent with being independently distributed.

Having verified both the stationarity and independence of the data, we can now proceed to the next stage: selecting an appropriate ARMA model.

## 
##  Box-Ljung test
## 
## data:  diff_data
## X-squared = 22.283, df = 20, p-value = 0.3254

Model Selection by AIC

We calculated the Akaike Information Criterion (AIC) for each ARMA(\(p, q\)) model with \(0 \leq p, q \leq 4\). Our analysis identified ARMA(0, 0), ARMA(0, 1), and ARMA(1, 0) as feasible candidates, with AIC values of -2171.24, -2169.36, and -2169.35, respectively. Although ARMA(2, 2) and ARMA(3, 3) report even lower AIC values, the table contains inconsistencies: for instance, the AIC rises by more than 2 from ARMA(2, 2) to ARMA(2, 3) (-2173.11 to -2164.29), which cannot happen under exact likelihood maximization. This suggests numerical optimization problems for the larger models, so we focus on the simpler candidates.

### AIC table: fit ARMA(p, q) for 0 <= p <= P and 0 <= q <= Q, storing each AIC
aic_table <- function(data, P, Q){
  table <- matrix(NA, (P + 1), (Q + 1))
  for (p in 0:P){
    for (q in 0:Q){
      table[p + 1, q + 1] <- arima(data, order = c(p, 0, q))$aic
    }
  }
  dimnames(table) <- list(paste("AR", 0:P, sep = ""),
                          paste("MA", 0:Q, sep = ""))
  table
}

nvda_aic_table <- aic_table(diff_data, 4, 4)
require(knitr)
## Loading required package: knitr
kable(nvda_aic_table, digits = 2)
|     | MA0      | MA1      | MA2      | MA3      | MA4      |
|-----|----------|----------|----------|----------|----------|
| AR0 | -2171.24 | -2169.36 | -2168.15 | -2167.70 | -2165.85 |
| AR1 | -2169.35 | -2167.33 | -2166.66 | -2165.75 | -2166.30 |
| AR2 | -2168.22 | -2166.85 | -2173.11 | -2164.29 | -2165.08 |
| AR3 | -2167.95 | -2165.96 | -2164.36 | -2171.01 | -2163.10 |
| AR4 | -2165.98 | -2166.00 | -2164.88 | -2162.97 | -2167.07 |

Examining the plot of the inverse roots of the autoregressive (AR) and moving average (MA) polynomials, both ARMA(0, 1) and ARMA(1, 0) appear promising, with all inverse roots lying inside the unit circle. This indicates that ARMA(0, 1) is invertible and ARMA(1, 0) is causal.

Likelihood Ratio test

To further refine our model selection, we conducted likelihood ratio tests comparing the null hypothesis of ARMA(0, 0) against the alternatives ARMA(0, 1) and ARMA(1, 0). Since the hypotheses are nested, the test is valid. By Wilks’ approximation, the test statistic \(2(\ell_1 - \ell_0)\) follows a \(\chi^2_d\) distribution, where \(d\) is the difference in the number of parameters between the null and alternative hypotheses.

Both nested hypothesis tests retain the null hypothesis, indicating that ARMA(0, 0) is sufficient. Considering both the AIC values and the likelihood ratio tests, we select ARMA(0, 0) as our ARMA model.

Also, the intercept of the fitted ARMA(0, 0) model is 0.00224, so we can write the model as \[Y_n = 0.00224 + \epsilon_n\] where \(\epsilon_n\) is a weak white noise process.

##   intercept 
## 0.002238745

Residual Analysis For ARMA(0, 0)

We now proceed to residual analysis for the ARMA(0, 0) model. Checking the ACF plot of the residuals, all lags greater than 2 lie inside the confidence band, indicating no substantial residual autocorrelation, consistent with white noise.

We then estimated the sample mean and variance of the residuals, finding \(E(\epsilon_n) \approx 0\) and \(Var(\epsilon_n) \approx 0.0012\). We also checked the QQ-plot of the residuals and conducted a Shapiro-Wilk test for normality. The residuals appear heavier-tailed than normal, although they remain consistent with weak white noise. Normal errors therefore cannot model the outliers and the periods of high volatility visible in the plot above; the residuals may be better fitted by a t-distribution, and a GARCH model may better capture the time-varying volatility.

## [1] "mean of epsilon: -6.51642678309702e-15 Variance of epsilon: 0.00120600560853897"

For later comparison, we computed the log-likelihood of the ARMA(0, 0) model, approximately 1087.6. We now proceed to fit a GARCH model and compute the corresponding log-likelihood.

## [1] "log-likelihood for ARMA(0, 0): 1087.61849681041"

GARCH Model Selection

Before model selection, we first motivate the GARCH model. In many financial datasets the variance of returns changes over time and may depend on recent history. The ARMA model assumes constant variance, so it can fail to capture periods of high volatility. The GARCH model addresses this limitation by modeling time-varying volatility directly. We begin by reviewing the definition of the GARCH model [3].

\(\textbf{Definition}\): We say a process \(X = \{X_n\}\) is GARCH(\(p, q\)) if \[X_n = \mu_n + \sigma_n\epsilon_n\] where \(\{\epsilon_n\}\) is an iid white noise process with mean 0 and variance 1, and the model for \(\sigma_n\) is \[\sigma_n^2 = \alpha_0 + \sum\limits_{i=1}^{p}\beta_i\sigma^2_{n-i} + \sum\limits_{j=1}^{q}\alpha_j \tilde{X}_{n-j}^2\] with \[\tilde{X}_n = X_n - \mu_n.\] A popular choice is the GARCH(1, 1) model, \[X_n = \mu_n + \sigma_n\epsilon_n\] where \[\sigma_n^2 = \alpha_0 + \beta_1\sigma^2_{n-1} + \alpha_1 \tilde{X}^2_{n-1}.\]

GARCH(1,1)

We now fit a GARCH(1, 1) model to our time series. The coefficients of the fitted model are not statistically significant: we find \[\alpha_0 = 0.0011, \quad \alpha_1 = 0.0499, \quad \beta_1 = 0.05\] with p-values above 0.1 for all three coefficients. Although the log-likelihood of this model, about 1596, far exceeds that of the ARMA(0, 0) model, we discard it because of these insignificant coefficients.

ARMA(0, 0) + GARCH(1, 1) + Normal Errors

Having failed to find a satisfactory GARCH(1, 1) fit, we consider an ARMA/GARCH model [4]. This decision stems from our observation that the residuals of the ARMA(0, 0) model may not follow a normal distribution; we therefore model the noise as a GARCH process.

\(\textbf{Definition}\): The ARMA(\(p, q\)) model for \(X_n\) corresponds to \[X_n = \sum\limits_{i=1}^{p}\alpha_iX_{n-i} + \sum\limits_{j=1}^{q}\phi_j\epsilon_{n-j} + \epsilon_n\] where \(\{\epsilon_n\}\) is a mean-zero white noise process. A more general model allows the noise process \(\epsilon_n\) to be a GARCH(\(p, q\)) process, \[\epsilon_n = \sigma_n\delta_n\] where \(\delta_n\) is iid \(N(0, 1)\) and \[\sigma_n^2 = \alpha_{g,0} + \sum\limits_{j=1}^{p}\beta_{g, j}\sigma^2_{n-j} + \sum\limits_{i=1}^{q}\alpha_{g, i}\epsilon_{n-i}^2\]

After fitting this model, we obtained a log-likelihood of 1092. However, both the Shapiro-Wilk and Jarque-Bera test statistics indicate that the residuals do not follow a normal distribution, violating the model assumption. Consequently, we again discard this model.

ARMA(0, 0) + GARCH(1, 1) + t-Errors

Instead of normal errors, we fit the residuals with a t-distribution. All residual diagnostics now perform well, and the estimated shape parameter (the degrees of freedom of the t-distribution) is 6.78 and statistically significant. The model attains a log-likelihood of 1120. We therefore select ARMA(0, 0) + GARCH(1, 1) with t-distributed errors as the candidate model from this section.

POMP Model

We now consider a stochastic volatility model, in which volatility is modeled as a latent stochastic process, partially observed via the returns. Assuming a Markovian property for volatility leads to a POMP (Partially Observed Markov Process) model. We define the leverage \(R_n\) on day \(n\) as the correlation between the return on day \(n-1\) and the increase in log volatility from day \(n-1\) to day \(n\), and model \(R_n\) as a random walk on a transformed scale [1], \[R_n = \frac{\exp{(2G_n)} - 1}{\exp{(2G_n)} + 1}\] where \(\{G_n\}\) is a Gaussian random walk.

The proposed model is as follows [8]: \[Y_n = \exp{(H_n/2)}\epsilon_n\] \[H_n = \mu_h(1-\phi) + \phi H_{n-1} + \beta_{n-1}R_n\exp{(-H_{n-1}/2)} + \omega_n\] \[G_n = G_{n-1} + \nu_n\] where \(\beta_n = Y_n\sigma_{\eta}\sqrt{1 - \phi^2}\), \(\{\epsilon_n\}\) is an iid \(N(0, 1)\) sequence, \(\{\nu_n\}\) is an iid \(N(0, \sigma^2_{\nu})\) sequence, and \(\omega_n\) is \(N(0, \sigma^2_{\omega, n})\) with \[\sigma^2_{\omega, n} = \sigma^2_{\eta}(1 - \phi^2)(1-R_n^2).\] Here, \(H_n\) is the log volatility, and the latent state is \(X_n = (G_n, H_n)\).

To build a POMP model, the POMP representation has state variable \(X_n = (G_n, H_n, Y_n)\). We write the filtered particle \(j\) at time \(n-1\) as \[X_{n-1, j}^{F} = (G_{n-1, j}^{F}, H_{n-1, j}^{F}, y^{*}_{n-1})\] and construct prediction particles at time \(n\), \[(G_{n, j}^{P}, H_{n, j}^{P}) \sim f_{G_n, H_n|G_{n-1}, H_{n-1}, Y_{n-1}}(g_n, h_n|G_{n-1, j}^F, H_{n-1, j}^F, y^{*}_{n-1})\] with corresponding weight \[w_{n, j} = f_{Y_n|G_n, H_n}(y^{*}_n|G_{n, j}^{P}, H^{P}_{n, j}).\] We can now fit the POMP model, estimating \(\sigma_{\nu}, \mu_h, \phi, \sigma_{\eta}, G_0, H_0\), where \(G_0\) and \(H_0\) are initial values.

Conclusion

Fitting all three models, ARMA(0, 0) gives a log-likelihood of about 1088, the ARMA(0, 0)-GARCH(1, 1) model gives a log-likelihood of 1120, and the POMP model gives a log-likelihood of about 1110 under both local and global searches.

Indeed, there are compelling reasons to consider each model. The ARMA(0, 0) model is the simplest option: it implies that daily log-returns are white noise around a constant mean, so the log price follows a random walk with drift, highlighting the inherent unpredictability of financial markets. This aligns with the efficient market hypothesis, under which market unpredictability prevents arbitrage opportunities. The ARMA-GARCH model offers a more nuanced approach, better capturing the volatility of daily log-returns and acknowledging that future volatility may depend on past observations. Although it provides the highest likelihood among the models considered, it is more complex and does not explicitly elucidate the underlying causes of volatility; rather, it serves as a superior fitting model for our data. Meanwhile, the POMP model introduces a latent process to interpret the observed measurements, offering meaningful insight into the time-series dynamics, but it attains a lower likelihood than the ARMA-GARCH model. In conclusion, we choose the ARMA-GARCH model as our best model, while recognizing the need for further validation and research to improve both fitting performance and interpretability.

References

[1] https://ionides.github.io/531w24/16/index.html

[2] Shumway RH, Stoffer DS (2017). Time Series Analysis and its Applications: With R Examples. 4th edition. Springer.

[3] Ruppert, D. and Matteson, D. (2015) Statistics and Data Analysis for Financial Engineering with R Examples, Springer.

[4] Tsay RS (2005). Analysis of Financial Time Series. 2nd edition. Wiley.

[5] Stats 509 Lecture Notes Chapter 10

[6] https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test

[7] https://en.wikipedia.org/wiki/Ljung%E2%80%93Box_test

[8] Bretó C (2014). “On idiosyncratic stochasticity of financial leverage effects.” Statistics & Probability Letters, 91, 20–26.

[9] https://ionides.github.io/531w24/midterm_project/project15/comments.html

[10] https://ionides.github.io/531w20/final_project/Project5/final.html