Understanding and forecasting stock price volatility is vital for navigating today’s increasingly turbulent financial markets. From the COVID-19 shock to ongoing macroeconomic uncertainty, accurately modeling asset return dynamics plays a central role in risk management, portfolio optimization, and derivative pricing (1). This study compares two prominent time series frameworks, ARMA-GARCH and partially observed Markov process (POMP) models, in capturing the volatility behavior of Apple Inc. (AAPL) log returns between 2020 and 2025. While ARMA-GARCH is a benchmark in financial econometrics, combining linear return dynamics with conditional heteroskedasticity, POMP models offer a state-space approach capable of handling nonlinearities and latent processes (2). By evaluating both models on a dataset encompassing major market regime changes, this research explores their effectiveness in addressing stylized facts such as volatility clustering, leverage effects, and fat tails. The goal is to shed light on the trade-offs between traditional and more flexible modeling strategies in the context of evolving financial environments.
The dataset we used was obtained from Kaggle, a popular platform for sharing datasets and data science resources. It contains historical stock price information for Apple Inc. (AAPL) starting from 1980. For the purpose of this analysis, we focused on data from January 2020 onward, a period that captures significant market events such as the COVID-19 pandemic, monetary policy shifts, and post-pandemic market adjustments. This time frame provides a rich environment for evaluating volatility modeling techniques under various economic regimes.
First, we plot Apple Inc.’s (AAPL) daily Close prices from January 2020 to January 2025, sourced from the dataset (3). While the dataset includes high/low prices and trading volume, we focus on Close prices to isolate foundational daily price movements.
Figure 2.1: Close Price of Apple Stock Over Time
Apple’s stock exhibits distinct structural phases. A major shift occurred in 2020, marked by a rapid surge in prices and increased volatility—likely triggered by the COVID-19 pandemic and a global acceleration in digital adoption. The stock reached its peak in late 2022 amid strong product demand, followed by a corrective phase. As of 2024, the price has rebounded significantly, nearing previous highs. If this trajectory continues into 2025, Apple may surpass prior records, although this outlook depends on broader macroeconomic conditions and innovation cycles.
Figure 3.1: Density Plot of Apple Stock Price
Figure 3.2: STL Decomposition of Apple Stock Price
According to the density plot, the distribution appears fairly symmetrical with two peaks, but it does not follow a normal distribution. Next, we apply a logarithmic transformation to the original price series and then use the STL (Seasonal-Trend decomposition using Loess) algorithm to decompose the series; the results are shown in Figure 3.2. Between 2020 and 2025 the stock price exhibits an upward trend with a generally steady overall movement, while the seasonal pattern is not very pronounced. Over the five-year period the algorithm identifies five cycles: there is a slight drop in price at the beginning of each year, but no clear seasonal or monthly cyclical pattern is observed.
The heightened volatility post-2020 introduces non-stationarity into the price series, violating core assumptions required for ARMA modeling. To address this, we compute log returns as the first difference of the log-transformed daily closing prices. Log returns closely approximate continuously compounded returns, making them more suitable for long-term financial analysis (4). We then subtract the mean from the log returns to obtain the demeaned daily log return series (5).
\[\begin{align} y^*_n &= \log(z_n) - \log(z_{n-1}) \\ a^*_n &= y^*_n - \frac{1}{N- 1}\sum^N_{k=2}y^*_k \end{align}\]
where \(z_n\) is the original price, \(y^*_n\) is the log return, and \(a^*_n\) is the demeaned daily log return. \(N\) is the sample size.
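A minimal sketch of this transformation in R, assuming `df` is the January-2020-onward subset of the Kaggle data with a numeric `Close` column; the object name `demeaned` is our own:

```r
# Log returns: y*_n = log(z_n) - log(z_{n-1})
df$log_return <- c(NA, diff(log(df$Close)))
df <- df[!is.na(df$log_return), ]

# Demeaned daily log returns: a*_n = y*_n - mean(y*)
demeaned <- df$log_return - mean(df$log_return)
```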
Figure 3.3: Apple Log Returns Over Time
The plot of Apple’s log returns from 2020 to 2024 exhibits volatility clustering, with pronounced fluctuations during 2020, likely reflecting pandemic-driven market turbulence, and during 2022, potentially tied to macroeconomic uncertainty. Returns oscillate around zero, indicating mean reversion, but display heavy-tailed behavior, evidenced by extreme spikes. Post-2022, volatility moderates slightly, though persistent fluctuations suggest ongoing market sensitivity. The series highlights non-stationary variance, a hallmark of financial returns, necessitating models that account for time-dependent volatility.
Later, we will also use different models to analyze volatility and explore which model best captures the trend and the volatility of Apple’s stock price.
Figure 4.1: ACF plots of Close Prices
Figure 4.2: ACF plots of Log Returns
The autocorrelation function plots for Apple’s stock reveal stark contrasts between raw close prices and log-transformed returns. The ACF of the raw prices displays strong, slowly decaying autocorrelations—persisting up to lag 30 with values around 0.6–0.8—indicating pronounced non-stationarity and trend persistence, a common trait in price-level series.
In contrast, the ACF of log returns rapidly decays to near zero after just a few lags. Most spikes fall within the 95% confidence bounds, suggesting no statistically significant autocorrelation. This behavior is consistent with stationarity and supports the weak-form Efficient Market Hypothesis (EMH)(6), which posits that past return information holds little to no predictive power. A minor spike at lag 1 may hint at weak momentum or mean-reversion effects, but the overall lack of structure validates the use of returns for time series modeling.
Therefore, the requirement of stationarity for ARMA and related models has been satisfied.
We start our modeling process with the autoregressive moving average (ARMA) model, which is given by:
\[ Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \epsilon_t + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} \]
where \(\{\epsilon_t\}\) is a white noise process following \(N(0, \sigma^2)\) (7). Terms with \(\phi\) are autoregressive terms, and terms with \(\theta\) are moving average terms. From this general form of the ARMA(p,q) model, we can write the ARMA(1,1) model as
\[ \text{ARMA(1,1): } Y_t = c + \phi_1 Y_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1} \]
The AICs of the fitted ARMA models, together with their log-likelihoods, are used to determine which model performs best (8):
\[ \mathrm{AIC} = -2\,\ell(\theta^*) + 2D, \qquad \ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(X_i \mid \theta) \] where \(\ell(\theta^*)\) represents the maximized log-likelihood and \(D\) represents the number of parameters (9). The formula shows that AIC penalizes models for their increasing number of parameters, discouraging overfitting. The AICs of models from ARMA(0, 0) to ARMA(4, 4) are summarized in the table below.
##
## Call:
## arima(x = df$log_return, order = c(p, 0, q))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 ma4 intercept
## -0.8632 0.8639 0.9679 0.7750 -0.9446 -0.8908 0.0604 8e-04
## s.e. 0.0181 0.0075 0.0190 0.0335 0.0279 0.0334 0.0304 1e-04
##
## sigma^2 estimated as 0.0003793: log likelihood = 3171.08, aic = -6324.15
| | MA0 | MA1 | MA2 | MA3 | MA4 |
|---|---|---|---|---|---|
AR0 | -6282.04 | -6295.25 | -6293.64 | -6292.29 | -6290.29 |
AR1 | -6295.67 | -6293.81 | -6291.84 | -6290.28 | -6288.30 |
AR2 | -6293.78 | -6291.76 | -6289.94 | -6288.63 | -6286.60 |
AR3 | -6292.10 | -6290.13 | -6291.14 | -6321.59 | -6324.15 |
AR4 | -6290.45 | -6304.77 | -6324.03 | -6321.80 | -6320.64 |
To identify the most appropriate ARMA model, we compared the AIC table and selected the ARMA(1,1) model due to its relatively low AIC value and simple structure. Although higher-order models such as ARMA(3,4) and ARMA(4,2) yielded even lower AICs, indicating statistically comparable or better in-sample performance, following the principle of parsimony we chose the simpler ARMA(1,1) specification to reduce the risk of overfitting.
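A minimal sketch of how such an AIC grid can be produced, assuming `df$log_return` holds the demeaned log returns; `aic_table` and `best_model` are our own object names (the latter is the ARMA(1,1) fit used in the residual tests below):

```r
# Fit ARMA(p, q) for p, q = 0..4 and collect AIC values
aic_table <- function(x, P, Q) {
  out <- matrix(NA, P + 1, Q + 1,
                dimnames = list(paste0("AR", 0:P), paste0("MA", 0:Q)))
  for (p in 0:P) for (q in 0:Q) {
    out[p + 1, q + 1] <- tryCatch(arima(x, order = c(p, 0, q))$aic,
                                  error = function(e) NA)
  }
  out
}
aic_table(df$log_return, 4, 4)

# Selected model: ARMA(1,1) with an intercept
best_model <- arima(df$log_return, order = c(1, 0, 1))
```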
ARMA models capture the conditional mean dynamics of a time series but do not account for time-varying volatility (heteroskedasticity), which is a common feature in financial return data(10). This motivates the use of GARCH models to model conditional variance.
Before introducing GARCH, we conducted the Ljung-Box and ARCH-LM tests on the residuals from the fitted ARMA(1,1) model to check whether they are indeed white noise.
##
## Box-Ljung test
##
## data: residuals(best_model)
## X-squared = 19.521, df = 20, p-value = 0.4882
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: residuals(best_model)
## Chi-squared = 220.47, df = 12, p-value < 2.2e-16
The Ljung-Box test p-value is 0.4882 > 0.05, indicating that the residual sequence has no significant autocorrelation; after the ARMA(1,1) fit, the residuals are essentially white noise in the mean. However, the ARCH-LM p-value is far below 0.05, so we reject the null hypothesis of no ARCH effects in the residuals. There is therefore significant volatility clustering, motivating the use of a GARCH model (10).
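A minimal sketch of these residual checks, assuming `best_model` is the fitted ARMA(1,1) above and that the ARCH test comes from the FinTS package, as suggested by the output header:

```r
library(FinTS)  # provides ArchTest()

resid_arma <- residuals(best_model)

# Ljung-Box test: any remaining autocorrelation in the residuals?
Box.test(resid_arma, lag = 20, type = "Ljung-Box")

# ARCH LM test: any remaining conditional heteroskedasticity?
ArchTest(resid_arma, lags = 12)
```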
Figure 4.3: ARMA Diagnostics Plots
Observations: the residuals fluctuate around 0 with no obvious trend structure, but show signs of slight heteroskedasticity, suggesting potential ARCH effects.
All lags of the ACF lie within the blue confidence interval and there is no significant autocorrelation, showing that ARMA(1,1) captures the temporal structure of the mean well.
The middle section of the QQ plot lies roughly along the diagonal line, but the tails deviate noticeably, so the residuals may have fat tails (i.e., they do not follow a normal distribution).
The standard GARCH model is defined as: \[ \sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2 \]
where \(\omega > 0\), \(\alpha_i \geq 0\), and \(\beta_j \geq 0\) to ensure that the conditional variance remains positive. The model captures volatility clustering—a characteristic feature of financial time series where large (or small) shocks tend to be followed by similarly large (or small) shocks(10).
The stationarity condition, \(\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1\), ensures that the effect of shocks gradually dissipates over time, preventing the variance from exploding.
However, the standard GARCH model assumes symmetric volatility response, meaning that both positive and negative shocks \(\epsilon_{t-i}\) have the same effect on future volatility. This assumption can be limiting in financial applications like Apple stock, which often display asymmetric volatility behavior—where negative news tends to increase volatility more than positive news of the same magnitude. This limitation motivates the use of asymmetric GARCH models such as EGARCH and GJR-GARCH.
## == sGARCH with norm distribution ==
##
## Akaike -5.215397
## Bayes -5.190911
## Shibata -5.215442
## Hannan-Quinn -5.206195
## == sGARCH with std distribution ==
##
## Akaike -5.271509
## Bayes -5.242941
## Shibata -5.271570
## Hannan-Quinn -5.260773
## == sGARCH with ged distribution ==
##
## Akaike -5.265615
## Bayes -5.237047
## Shibata -5.265676
## Hannan-Quinn -5.254879
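A minimal sketch of these fits using the rugarch package, assuming `demeaned` holds the demeaned log returns; the ARMA(1,1)-GARCH(1,1) orders are our assumption:

```r
library(rugarch)

fit_sgarch <- function(dist) {
  spec <- ugarchspec(
    variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
    mean.model     = list(armaOrder = c(1, 1), include.mean = TRUE),
    distribution.model = dist  # "norm", "std" (Student t), or "ged"
  )
  ugarchfit(spec, data = demeaned)
}

fits <- lapply(c("norm", "std", "ged"), fit_sgarch)
lapply(fits, infocriteria)   # Akaike, Bayes, Shibata, Hannan-Quinn
sapply(fits, likelihood)     # log-likelihoods
```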
The EGARCH (Exponential GARCH) model addresses asymmetry in volatility dynamics through its logarithmic specification(11):
\[ \log(\sigma_t^2) = \omega + \sum_{i=1}^{p} \alpha_i \left( \frac{|\epsilon_{t-i}|}{\sigma_{t-i}} - \mathbb{E} \left[\frac{|\epsilon_{t-i}|}{\sigma_{t-i}}\right] \right) + \sum_{j=1}^{q} \beta_j \log(\sigma_{t-j}^2) + \sum_{k=1}^{r} \gamma_k \frac{\epsilon_{t-k}}{\sigma_{t-k}} \] Key features include:
Asymmetry: The term \(\gamma_k\) captures the “leverage effect.” If \(\gamma_k < 0\), negative shocks \((\epsilon_{t-k} < 0)\) increase future volatility more than positive shocks of equal magnitude.
No Parameter Constraints: Due to the log specification, the model inherently ensures \(\sigma_t^2 > 0\) without requiring constraints such as \(\omega > 0\), \(\alpha_i \geq 0\), or \(\beta_j \geq 0\).
Fat-Tailed Errors: By assuming \(\epsilon_t \sim t_\nu\) (Student’s t-distribution), the model accommodates the heavy-tailed nature of financial return distributions.
## == eGARCH with t-distribution ==
##
## Akaike -5.273282
## Bayes -5.240634
## Shibata -5.273362
## Hannan-Quinn -5.261013
The GJR-GARCH (Glosten-Jagannathan-Runkle) model also introduces asymmetry in volatility using an indicator function(12): \[ \sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2 + \sum_{k=1}^{r} \gamma_k \mathbb{I}(\epsilon_{t-k} < 0) \epsilon_{t-k}^2 \]
Here, \(\mathbb{I}(\epsilon_{t-k} < 0)\) is an indicator function that equals 1 if the past shock is negative and 0 otherwise. When \(\gamma_k > 0\), negative shocks have a larger impact on volatility than positive shocks of the same magnitude.
Key features:
Explicit Leverage Effects: The model captures the asymmetry in financial markets where bad news tends to increase volatility more than good news—consistent with behavioral finance literature.
Fat-Tailed Distributions: Like EGARCH, GJR-GARCH can be fitted with Student’s t-distributed errors \((\epsilon_t \sim t_\nu)\), allowing the model to accommodate heavy tails in return distributions.
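Both asymmetric specifications can be fitted in the same way as the symmetric GARCH above; a minimal sketch with assumed (1,1) orders and Student-t errors:

```r
spec_egarch <- ugarchspec(
  variance.model = list(model = "eGARCH", garchOrder = c(1, 1)),
  mean.model     = list(armaOrder = c(1, 1), include.mean = TRUE),
  distribution.model = "std"
)
spec_gjr <- ugarchspec(
  variance.model = list(model = "gjrGARCH", garchOrder = c(1, 1)),
  mean.model     = list(armaOrder = c(1, 1), include.mean = TRUE),
  distribution.model = "std"
)

fit_egarch <- ugarchfit(spec_egarch, data = demeaned)
fit_gjr    <- ugarchfit(spec_gjr,    data = demeaned)

infocriteria(fit_gjr)
likelihood(fit_gjr)
```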
## == GJR-GARCH with t-distribution ==
##
## Akaike -5.274614
## Bayes -5.241965
## Shibata -5.274694
## Hannan-Quinn -5.262345
## Log-Likelihood values for different models:
## sGARCH_norm: 3289.0925851674
## sGARCH_std: 3325.41473162354
## sGARCH_ged: 3321.70436912145
## eGARCH_std: 3327.53125064373
## gjrGARCH_std: 3328.3695631054
The gjrGARCH_std model provides the highest log-likelihood (3328.37), suggesting it fits the data best; this is likely because it captures both the leverage effect and fat-tailed errors. It also attains the best (most negative) information criteria among the candidates, so the improvement in fit outweighs the AIC penalty for its additional asymmetry and degrees-of-freedom parameters. The simpler sGARCH model with normal errors has both the lowest log-likelihood and the weakest information criteria; its assumptions of symmetric volatility response and normally distributed errors understate tail risk and asymmetric volatility.
A parsimonious sGARCH specification might still be attractive when simplicity is the priority, but since our goal in this report is to capture financial volatility dynamics, we selected gjrGARCH for further analysis.
##
## === Standardized Residuals: Basic Statistics ===
## Index res
## Min. :1970-01-02 00:00:00 Min. :-4.165185
## 1st Qu.:1970-11-12 12:00:00 1st Qu.:-0.585127
## Median :1971-09-23 00:00:00 Median :-0.000317
## Mean :1971-09-23 00:00:00 Mean :-0.008243
## 3rd Qu.:1972-08-02 12:00:00 3rd Qu.: 0.601796
## Max. :1973-06-13 00:00:00 Max. : 5.215588
##
## Skewness: 0.006715734
## Kurtosis: 4.972334
## Jarque-Bera test p-value: 0
##
## === ARCH Effect Test ===
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: res
## Chi-squared = 4.4602, df = 12, p-value = 0.9736
##
##
## === Ljung-Box Test ===
## Ljung-Box test p-value for residuals: 0.8672254
## Ljung-Box test p-value for squared residuals: 0.9748365
Figure 5.1: gjrGARCH Diagnostics Plots
Skewness (0.0067): near 0, indicating the residuals are essentially symmetric. Kurtosis (4.97): far exceeds 3 (the normal benchmark), confirming heavy-tailed residuals, common in financial returns. Jarque-Bera test (p = 0): rejects normality, justifying the use of the t-distribution in gjrGARCH.
ARCH-LM Test (p = 0.9736): fails to reject the null hypothesis of no ARCH effects at the 5% significance level. This indicates the gjrGARCH model has successfully captured volatility clustering, leaving no residual heteroskedasticity.
Ljung-Box Tests: Residuals (p=0.87): No autocorrelation in residuals. Squared Residuals (p=0.97): No autocorrelation in squared residuals.
Standardized Residual Series: Fluctuates around zero with sporadic extreme values (-5 to 5), consistent with heavy tails. No visible trends or structural breaks.
Q-Q Plot: Deviations from the reference line at the tails confirm slight non-normality (heavy-tailed).
ACF/PACF of Residuals: All autocorrelations lie within confidence bands, indicating no significant lagged correlations.
Residual Distribution: Peaked center and fat tails, corroborating high kurtosis.
Based on all these observations, gjrGARCH successfully captures volatility clustering.
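These diagnostics can be reproduced from the fitted object; a minimal sketch, assuming `fit_gjr` is the ugarchfit object for the gjrGARCH model and that the moments, tseries, and FinTS packages are available:

```r
res <- residuals(fit_gjr, standardize = TRUE)
res_num <- as.numeric(res)

moments::skewness(res_num)
moments::kurtosis(res_num)
tseries::jarque.bera.test(res_num)

FinTS::ArchTest(res_num, lags = 12)                 # remaining ARCH effects?
Box.test(res_num,   lag = 20, type = "Ljung-Box")   # autocorrelation in residuals
Box.test(res_num^2, lag = 20, type = "Ljung-Box")   # autocorrelation in squared residuals
```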
We propose a Partially Observed Markov Process (POMP) model for stochastic volatility with leverage effects to capture the time-varying volatility and asymmetric return-volatility relationship of Apple stock returns.
Let the latent state be \(X_n = (G_n, H_n)\), where \(G_n\) is the transformed leverage parameter and \(H_n\) is the log volatility at time \(n\); the observation \(Y_n\) represents the log return of the stock at time \(n\).
According to Bretó (13) and course slides (5), we propose the model:
\[\begin{align} Y_n &= e^{H_n/2} \cdot \epsilon_n, \quad \epsilon_n \sim N(0, 1) \\ H_n &= \mu_h(1 - \phi) + \phi H_{n-1} + \beta_{n-1} R_n e^{-H_{n-1}/2} + \omega_n\\ G_n &= G_{n-1} + \nu_n, \quad \nu_n \sim N(0, \sigma_\nu^2) \end{align}\]
where \[ \beta_{n-1} = Y_{n-1} \cdot \sigma_\eta \cdot \sqrt{1 - \phi^2} \] and
\[ \omega_n \sim N\left(0, \sigma_\eta^2 (1 - \phi^2)(1 - R_n^2)\right) \]
\(\{G_n\}\) is a Gaussian random walk, and \(\{\nu_n\}\) is an iid \(N(0,\sigma_{\nu}^2)\) sequence.
And \(R_n\) is defined as: \[ R_n = \frac{e^{2G_n} - 1}{e^{2G_n} + 1} = \tanh(G_n), \]
which is a transformed random walk bounded between \(-1\) and \(1\) (5).
The parameters and descriptions of the model are listed in the table below.
Parameter | Description | Type / Notes |
---|---|---|
\(\mu_h\) | Long-term mean of log volatility | |
\(\phi\) | Volatility persistence coefficient | \(\in (0, 1)\) |
\(\sigma_\nu\) | Innovation SD of leverage process | \(> 0\) |
\(\sigma_\eta\) | Volatility shock magnitude | \(> 0\) |
\(G_0\) | Initial leverage state | |
\(H_0\) | Initial log volatility state |
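This model can be encoded with the pomp package. Below is a minimal sketch that closely follows the implementation pattern in the course notes (5); the data vector `demeaned`, the object names, and details such as the parameter transformations are our own assumptions.

```r
library(pomp)

sv_statenames <- c("H", "G", "Y_state")
sv_paramnames <- c("sigma_nu", "mu_h", "phi", "sigma_eta", "G_0", "H_0")

# One step of the latent dynamics; R_n = tanh(G_n)
rproc1 <- "
  double beta, omega, nu;
  omega = rnorm(0, sigma_eta * sqrt(1 - phi * phi) * sqrt(1 - tanh(G) * tanh(G)));
  nu = rnorm(0, sigma_nu);
  G += nu;
  beta = Y_state * sigma_eta * sqrt(1 - phi * phi);
  H = mu_h * (1 - phi) + phi * H + beta * tanh(G) * exp(-H / 2) + omega;
"
rproc2_sim  <- "Y_state = rnorm(0, exp(H / 2));"   # simulation: draw a new return
rproc2_filt <- "Y_state = covaryt;"                # filtering: feed in the observed return

sv_rinit <- "
  G = G_0;
  H = H_0;
  Y_state = rnorm(0, exp(H / 2));
"
sv_rmeasure <- "y = Y_state;"
sv_dmeasure <- "lik = dnorm(y, 0, exp(H / 2), give_log);"

sv.filt <- pomp(
  data  = data.frame(y = demeaned, time = seq_along(demeaned)),
  times = "time", t0 = 0,
  covar = covariate_table(covaryt = c(0, demeaned),
                          time = 0:length(demeaned), times = "time"),
  statenames = sv_statenames,
  paramnames = sv_paramnames,
  rprocess = discrete_time(step.fun = Csnippet(paste(rproc1, rproc2_filt)), delta.t = 1),
  rinit    = Csnippet(sv_rinit),
  rmeasure = Csnippet(sv_rmeasure),
  dmeasure = Csnippet(sv_dmeasure),
  partrans = parameter_trans(log = c("sigma_eta", "sigma_nu"), logit = "phi")
)
```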
Figure 6.1: Local Search Convergence Diagnostics
We conducted a local search. For the parameter settings, we first performed a low-computation test; after observing poor convergence in the log-likelihood, we moderately increased the number of particles and iterations, ultimately settling on 50 iterations and 1,000 particles across 50 search runs. The random walk perturbations for \(\mu_h,\sigma_{\nu},\sigma_{\eta},\phi\) were set to 0.02, while the perturbations for \(G_0\) and \(H_0\) were set to 0.1; a sketch of the corresponding search code is given below, and the diagnostic plot of the model is shown in Figure 6.1.
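A minimal sketch of this local search via iterated filtering, assuming `sv.filt` is the pomp object above, `params_guess` is a named starting vector, and pomp (>= 2) with doParallel/doRNG is available; the cooling fraction is our own choice:

```r
library(doParallel)
library(doRNG)
registerDoParallel()
registerDoRNG(531)

local_results <- foreach(i = 1:50, .combine = rbind, .packages = "pomp") %dopar% {
  mf <- mif2(
    sv.filt,
    params = params_guess,
    Np = 1000, Nmif = 50,
    cooling.fraction.50 = 0.5,
    rw.sd = rw_sd(sigma_nu = 0.02, mu_h = 0.02, phi = 0.02, sigma_eta = 0.02,
                  G_0 = ivp(0.1), H_0 = ivp(0.1))
  )
  # Evaluate the likelihood at the final estimate with replicated particle filters
  ll <- logmeanexp(replicate(10, logLik(pfilter(mf, Np = 1000))), se = TRUE)
  data.frame(t(coef(mf)), logLik = ll[1], logLik_se = ll[2])
}
```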
From the diagnostic plot, we can observe that the log-likelihood shows good convergence, with the final values dispersed over a range of approximately 100 log units. The parameter values, however, vary significantly across runs, indicating that multiple different parameter sets yield similar fitting results.
Figure 6.2: Filter Diagnostics for Local Search
By examining the diagnostic plots, we can see that most points are well-fitted, with high conditional log-likelihood values. However, at certain time points, the effective sample size drops below 50, and the conditional log-likelihood also falls into negative values. This indicates that the model does not fit the data well during those periods, suggesting potential model misspecification.
It is difficult to construct a model that can fit all data points perfectly. We think such situations may arise due to external factors that the model cannot capture, leading to a sharp decline in the weights of most particles. Due to time constraints, we were not able to fine-tune the pomp model. However, we believe that appropriately modifying the parameter distributions in the model assumptions could potentially improve the model’s performance.
| | logLik | logLik_se | sigma_nu | mu_h | phi | sigma_eta | G_0 | H_0 |
|---|---|---|---|---|---|---|---|---|
result.29 | 3283.007 | 0.5462860 | 0.0065610 | -8.6873949 | 0.8734031 | 0.6873725 | -0.4007429 | -1.1654855 |
result.7 | 3279.962 | 0.2208197 | 0.0005905 | -8.6461464 | 0.8219497 | 0.6776427 | -0.3110682 | -1.3052328 |
result.38 | 3261.522 | 0.4136096 | 0.0001788 | -7.5020717 | 0.9947883 | 1.4034772 | -0.0947381 | -1.8152287 |
result.37 | 3255.956 | 2.8697454 | 0.0005560 | 1.4425003 | 0.9999979 | 60.0630438 | 0.0710530 | -2.1581278 |
result.16 | 3252.806 | 0.4491770 | 0.0321223 | -8.6878223 | 0.8402068 | 0.5243586 | -0.4629967 | -0.4126838 |
result.11 | 3251.952 | 2.7077747 | 0.0084731 | 0.1908663 | 0.9998699 | 6.7011176 | -0.1747451 | -2.3405970 |
result.41 | 3249.780 | 3.7597925 | 0.0110770 | -0.8142286 | 0.9999973 | 56.6661496 | 0.4033691 | -2.0865288 |
result.24 | 3245.812 | 1.6768036 | 0.0023181 | -1.4773851 | 0.9998414 | 6.6308174 | -0.2617641 | -1.7367870 |
result.40 | 3244.617 | 1.4133396 | 0.0083705 | -3.7044816 | 0.9999936 | 31.9497960 | -0.1781560 | -1.8077511 |
result.20 | 3243.914 | 0.9915116 | 0.0019669 | -0.0939205 | 0.9999118 | 9.6540054 | -0.1269780 | -1.5405166 |
Figure 6.3: Pair Plot of the Parameters and Log-likelihood
We plotted the pair plot for samples with relatively high log-likelihood, selecting the 10 parameter sets with the highest log-likelihoods. Upon inspection, it is difficult to identify clear signs of convergence among the parameters. Notably, while most values of \(\phi\) are close to 1, the \(\phi\) values corresponding to the two highest log-likelihoods are below 0.9. We will revisit this phenomenon during the global search to see if it persists.
Next, we conducted a global search, keeping the parameters consistent with those from the local search. The search range is as follows:
\[ \left\{ \begin{array}{l} \sigma_{\nu} \in [0.005,0.05] \\ \mu_h \in [-1,0] \\ \phi \in [0.5,0.99] \\ \sigma_{\eta} \in [0.5,1] \\ G_0 \in [-2,2] \\ H_0 \in [-1,1] \end{array} \right. \]
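A minimal sketch of drawing the global-search starting points from this box, assuming pomp's `runif_design()` and the same number of runs as the local search:

```r
guesses <- runif_design(
  lower = c(sigma_nu = 0.005, mu_h = -1, phi = 0.50, sigma_eta = 0.5, G_0 = -2, H_0 = -1),
  upper = c(sigma_nu = 0.050, mu_h =  0, phi = 0.99, sigma_eta = 1.0, G_0 =  2, H_0 =  1),
  nseq  = 50
)
# Each row of guesses serves as the starting value of an independent mif2 run,
# using the same Np, Nmif, and rw.sd settings as in the local search.
```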
Figure 6.4: Global Search Convergence Diagnostics
Figure 6.5: Filter Diagnostics for Global Search
| | logLik | logLik_se | sigma_nu | mu_h | phi | sigma_eta | G_0 | H_0 |
|---|---|---|---|---|---|---|---|---|
result.15 | 3288.956 | 0.7130946 | 0.0025264 | -8.5799814 | 0.9101560 | 0.6188779 | -0.3682157 | -1.968463 |
result.39 | 3262.690 | 1.3304826 | 0.0033095 | 0.5561058 | 0.9999921 | 24.7525710 | -0.2669330 | -3.159501 |
result.5 | 3261.215 | 0.3133688 | 0.0254157 | -2.8728991 | 0.9999828 | 20.6414034 | -0.5986508 | -3.037613 |
result.49 | 3259.463 | 0.5830393 | 0.0002602 | 0.8721089 | 0.9999161 | 9.7018577 | -0.1681563 | -2.478351 |
result.27 | 3257.131 | 0.6102780 | 0.0032753 | 1.4464212 | 0.9999297 | 10.0371273 | -0.0151745 | -2.500388 |
result.4 | 3254.306 | 0.6584938 | 0.0167594 | 4.8844261 | 0.9998831 | 6.6134495 | -0.1249585 | -2.754614 |
result.32 | 3253.787 | 2.6341751 | 0.0114151 | -4.1671874 | 0.9999796 | 20.5372645 | -0.1467752 | -1.846094 |
result.2 | 3253.103 | 0.4186339 | 0.0076439 | -3.1828183 | 0.9999858 | 29.8621149 | -0.1392136 | -1.557006 |
result.42 | 3252.165 | 2.3804367 | 0.0055121 | -3.3094258 | 0.9991214 | 3.4924973 | -0.2924637 | -1.717574 |
result.16 | 3249.391 | 0.3831847 | 0.0038078 | -7.2005864 | 0.9961747 | 1.2169771 | -0.4647831 | -2.211803 |
From the results, we can see that the highest log-likelihood was still achieved with a phi value around 0.9, reaching up to 3289. The corresponding \(\mu_h\) is also approximately -8.6. However, we still cannot rule out the possibility that this parameter set reached the optimal value due to random perturbations. Therefore, further validation is needed. A \(\phi\) value closer to 1 might provide a more stable parameter combination.
Figure 6.6: Pair Plot of the Parameters and Log-likelihood for Global Search
| | index | logLik | logLik_se |
|---|---|---|---|
est | 1 | 3288.548 | 0.4377735 |
est1 | 2 | 3261.143 | 0.8500078 |
est2 | 3 | 3258.327 | 1.1275658 |
est3 | 4 | 3257.876 | 1.1428478 |
est4 | 5 | 3258.583 | 3.1848767 |
Therefore, we selected the 5 parameter sets with the highest log-likelihoods and simulated each set 50 times, examining the mean log-likelihood to determine whether the improved performance was due to random perturbation. Based on these repeated evaluations, the best parameter set demonstrates consistent log-likelihood performance, with a mean value of 3288.548 across 50 replications, closely aligning with the result obtained from the global search. Hence, we decided to use it as the final model parameters.
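A minimal sketch of this stability check for one candidate, assuming `sv.filt` is the pomp object above and `best_params` is the named parameter vector of result.15 from the global search:

```r
ll_reps <- replicate(50, logLik(pfilter(sv.filt, params = best_params, Np = 1000)))
mean(ll_reps)                  # average log-likelihood across replications
logmeanexp(ll_reps, se = TRUE) # combined estimate with Monte Carlo standard error
```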
| | logLik | logLik_se | sigma_nu | mu_h | phi | sigma_eta | G_0 | H_0 |
|---|---|---|---|---|---|---|---|---|
result.15 | 3288.956 | 0.7130946 | 0.0025264 | -8.579981 | 0.910156 | 0.6188779 | -0.3682157 | -1.968463 |
Figure 6.7: Profile Likelihood Over \(\phi\)
Figure 6.8: Profile Trace for \(\phi\) over \(\mu_h\)
To further investigate the identifiability of the model, we performed a profile likelihood analysis over \(\phi\). The results show that the confidence interval is (0.959, 0.99), with only a few points falling within this interval. This suggests that the model exhibits weak identifiability, indicating considerable room for improvement. Additionally, this result helps explain why parameter convergence issues occurred in both the local and global searches. The data points are not sufficient to accurately identify these parameters. More data or improvements in the model structure are needed to further refine the model.
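A minimal sketch of the design for this profile, assuming pomp's `profile_design()`; the grid over \(\phi\), the box for the other parameters (mirroring the global search), and `nprof` are our own choices:

```r
prof_design <- profile_design(
  phi   = seq(0.50, 0.99, length.out = 20),
  lower = c(sigma_nu = 0.005, mu_h = -1, sigma_eta = 0.5, G_0 = -2, H_0 = -1),
  upper = c(sigma_nu = 0.050, mu_h =  0, sigma_eta = 1.0, G_0 =  2, H_0 =  1),
  nprof = 10
)
# Each row is used to start a mif2 run in which phi receives no random-walk
# perturbation; the maximized log-likelihood at each fixed phi traces the profile.
```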
| | sGARCH_norm | gjrGARCH | Discrete-time Stochastic Volatility Model (POMP) |
|---|---|---|---|
Log-likelihood | 3289.09 | 3328.37 | 3288.55 |
We selected the gjrGARCH model as our final choice from the ARMA + GARCH family, and used the global search to select an optimal set of parameters for the POMP model. Upon evaluation, the log-likelihood of the gjrGARCH model on the dataset was 3328.37, higher than that of the POMP model at 3288.55. This result is reasonable: the GARCH model is essentially a relatively simple regression-type model and is thus capable of achieving a tighter fit, whereas the stochastic volatility model offers better interpretability and a richer dynamic structure.
During the diagnostic process, we also found that the Pomp model may suffer from potential misspecification and weak identification issues, which require further improvement. At the same time, we observed that the performance of the standard GARCH model and the Pomp model was very similar. This may be because the gjrGARCH model accounts for fat tails, while we did not implement corresponding enhancements in the original Pomp model. This provides a direction for future improvements to the model.
In this project, we used ARMA + GARCH models and a discrete-time stochastic volatility model (POMP) to fit the log return data of Apple stock. Through model construction, fitting, and diagnostics, we found that the gjrGARCH model demonstrated better fitting performance than the POMP model. On the other hand, the POMP model showed potential issues with misspecification and weak identification, which require further investigation.
Our work also has several limitations. Due to time constraints and limited computational resources, we were unable to use a sufficient number of guesses when computing the profile log-likelihood over \(\phi\) in the Pomp Model. Additionally, we did not carry out further improvements to the Pomp model. We believe that with more time to refine the model, there is significant room for improvement.
For this assignment, we referred to previous coursework and peer comments. We primarily built upon the methodology from (14), but addressed some of its limitations. Notably, we observed that Project 11 placed limited emphasis on ARMA modeling, which is crucial for capturing autocorrelation structures in financial time series. In our approach, we placed greater focus on residual diagnostics, ensuring that the standardized residuals from our GARCH models behaved like white noise. To this end, we employed the Ljung-Box test and ARCH-LM test to verify the absence of serial correlation and remaining ARCH effects. Furthermore, we incorporated EGARCH and GJR-GARCH models to account for asymmetries in volatility, an essential feature when modeling highly volatile assets such as Apple Inc. (AAPL). Unlike previous work that prioritized model selection solely based on AIC, we emphasized the explanatory power and interpretability of the ARMA-GARCH family in capturing key stylized facts of financial data. We also adopted the comparison approach from (15) and added an analysis of potential reasons why ARMA-GARCH models outperform POMP models.
We also noticed that (16) used basically the same dataset as we do. Compared to their work and the original POMP model, we added more extensive analysis and diagnostics regarding model performance. Although we did not have enough time to modify the model itself, we still conducted a thorough evaluation. Additionally, we identified certain limitations of the POMP model when applied to this particular dataset.
ChatGPT (17) was used to assist with debugging, formatting and consultation. However, all ideas and methods were independently developed by our team members.