The National Association of Securities Dealers Automated Quotations, also called Nasdaq or the Nasdaq exchange, is a global electronic marketplace for buying and selling securities. The name Nasdaq is also used to refer to the Nasdaq Composite, an index of more than 3,000 stocks listed on the Nasdaq exchange.
Gold is the most popular investment among the precious metals. Its price volatility and the ways in which it differs from stocks have made the gold price an important indicator of global economic trends.
The Nasdaq index and the gold price are both important indicators of the global economy. Historically, people have treated gold as a safe-haven asset that protects purchasing power against inflation. The Nasdaq index, because it is composed of almost all stocks listed on the Nasdaq stock exchange, has also been regarded as offering better hedging ability than other financial products. It is therefore reasonable to expect some relationship and associated behavior pattern between the two.
In this project, we investigate the relationship between the Nasdaq index and the gold price. We fit a suitable model relating the two data sets and look for patterns in them.
We used about 10 years of monthly Nasdaq index data and gold price data in this project. Both data sets cover the same period, March 2012 to January 2022, for a total of 119 months. The historical data for the Nasdaq index and the gold price were downloaded from Yahoo Finance (https://finance.yahoo.com/).
data=read.csv("data.csv", header = TRUE)
head(data)
## Date NASDAQ_Index Gold_Price
## 1 2012/3/1 3091.57 1721.1
## 2 2012/4/1 3046.36 1677.5
## 3 2012/5/1 2827.34 1653.4
## 4 2012/6/1 2935.05 1612.2
## 5 2012/7/1 2939.52 1597.2
## 6 2012/8/1 3066.96 1587.4
summary(data)
## Date NASDAQ_Index Gold_Price
## Length:119 Min. : 2827 Min. :1064
## Class :character 1st Qu.: 4526 1st Qu.:1237
## Mode :character Median : 5825 Median :1315
## Mean : 6861 Mean :1424
## 3rd Qu.: 8071 3rd Qu.:1640
## Max. :15645 Max. :1966
From the summary, the Nasdaq index ranged from 2827 to 15645, while the gold price ranged from 1064 to 1966. The Nasdaq index therefore moved over a much wider range than the gold price during this period.
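A raw range depends on the price level, so one hedged way to compare how strongly two series fluctuate is to look at their returns instead. The sketch below uses only the six rows printed by `head(data)`, so it illustrates the calculation rather than giving a result for the full 119 months. The variable names are hypothetical.

```r
# First six monthly values, copied from the head(data) output
nasdaq <- c(3091.57, 3046.36, 2827.34, 2935.05, 2939.52, 3066.96)
gold   <- c(1721.1, 1677.5, 1653.4, 1612.2, 1597.2, 1587.4)

# Monthly log returns put both series on a common, unit-free scale
ret_nasdaq <- diff(log(nasdaq))
ret_gold   <- diff(log(gold))

# Spread in original units versus variability of returns
spread <- c(nasdaq = diff(range(nasdaq)), gold = diff(range(gold)))
ret_sd <- c(nasdaq = sd(ret_nasdaq), gold = sd(ret_gold))
print(spread)
print(ret_sd)
```

On the full data set the same two lines could be applied to `data$NASDAQ_Index` and `data$Gold_Price`.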
# Parse the date column and pull out the two series
data$date = as.Date(data$Date)
date=data$date
Nasdaq=data$NASDAQ_Index
Gold=data$Gold_Price
# Widen the right margin to leave room for a second y-axis
par(mar=c(5, 4, 4, 5))
# NASDAQ index against the left axis (red)
plot(date, Nasdaq, col = "red", xlim = c(as.Date("2012-03-01"), as.Date("2022-01-01")),main = "Time Plot of NASDAQ Index and Gold Price", xlab = "", ylab = "NASDAQ Index", col.lab = "red", type = 'l')
# Overlay the gold price on its own scale, labelled on the right axis (blue)
par(new = TRUE)
plot(date, Gold, col = "blue", xlim = c(as.Date("2012-03-01"), as.Date("2022-01-01")), axes = FALSE, xlab="Time", ylab = "", type = 'l')
axis(side=4, col ="blue")
mtext("Gold Price", col = "blue", side = 4, line = 4)
We use a blue line for the gold price and a red line for the Nasdaq index over the time window from 2012 to 2022. Over this period, the Nasdaq index increased steadily, while the gold price tended to decrease until 2016 and to increase from 2016 to 2022. After 2016, the Nasdaq index and the gold price show a similar pattern. From this overview, a relationship between the Nasdaq index and the gold price appears plausible. However, both series also fluctuate considerably, which may be caused by other factors in the market.
Following the lecture, we need to find out whether the fluctuations of the two series are related. We use Loess smoothing to extract the trend, noise, and cycle components.
The low-frequency component can be considered the trend, and the high-frequency component can be considered noise. The trend may be affected by long-term economic and financial conditions, and the noise could be attributed to many different factors. Therefore, we do not include these parts when building models.
The mid-range frequency component can be considered the cycle component. It is what we use to build the relevant models.
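The decomposition described above can be sketched with two `loess` fits. This is a minimal illustration on simulated data, not the code used for the report: the series `y`, the index `t_idx`, and the spans 0.5 and 0.1 are assumptions chosen for the example.

```r
set.seed(531)
n <- 119
t_idx <- seq_len(n)
# Hypothetical monthly series: linear trend + multi-year cycle + noise
y <- 0.05 * t_idx + 2 * sin(2 * pi * t_idx / 40) + rnorm(n, sd = 0.5)

# Low frequency (trend): loess with a wide span (assumed value)
y_low <- loess(y ~ t_idx, span = 0.5)$fitted
# High frequency (noise): what a narrow-span loess does not capture
y_high <- y - loess(y ~ t_idx, span = 0.1)$fitted
# Mid-range frequency (cycle): the remainder
y_cycle <- y - y_low - y_high

plot(ts(cbind(y, trend = y_low, noise = y_high, cycle = y_cycle)),
     main = "Loess decomposition (simulated data)")
```

By construction the three components add back up to the original series, so the choice of spans only moves variation between trend, cycle, and noise.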
We now put the cycle components of the two data sets into one plot. The fluctuations of the gold price cycle are much larger than those of the NASDAQ index cycle. However, the two series still tend to fluctuate in a similar pattern, and the similarity is clearer than in the earlier plot of the raw data. Therefore, eliminating the trend and noise appears to be a good choice.
In the plot, the gold price tends to increase when the Nasdaq index increases, and to decrease otherwise. In other words, there is a strong tendency for the two data sets to fluctuate in a similar pattern.
A general ARMA(p,q) model is \[\phi(B)(Y_n-\mu)=\psi(B)\epsilon_n\]
where \[\mu=E[Y_n]\] \[\phi(x)=1-\phi_1x-\phi_2x^2-\dots-\phi_px^p\]
\[\psi(x)=1+\psi_1x+\psi_2x^2+\dots+\psi_qx^q\]
Here {\(\epsilon_n\)} is a white noise process and \(B\) is the backshift operator.
In our problem, we use the following regression model with ARMA errors
\[I_n^c=\alpha+\beta P_n^c+w_n\]
where {\(w_n\)} is a Gaussian ARMA process, and \(I_n^c\) and \(P_n^c\) are the cycle components of the Nasdaq index and the gold price, respectively.
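To make the model concrete, the following sketch simulates a response of this form and recovers the coefficients with `arima` and its `xreg` argument. The data are simulated; `x`, `w`, `y`, and the true values \(\alpha = 1\), \(\beta = 0.5\) are assumptions for illustration only.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                                            # plays the role of P_n^c
w <- arima.sim(model = list(ar = 0.6, ma = 0.4), n = n)  # ARMA(1,1) errors w_n
y <- 1 + 0.5 * x + w                                     # alpha = 1, beta = 0.5

# Regression with ARMA(1,1) errors; the intercept estimates alpha,
# and the coefficient named "x" estimates beta
fit <- arima(y, order = c(1, 0, 1), xreg = x)
print(round(coef(fit), 2))
```

The ARMA part models the correlation in the errors, so the standard error reported for the regression coefficient accounts for that dependence.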
We use AIC to choose a suitable ARMA model for the errors.
| | MA0 | MA1 | MA2 | MA3 | MA4 |
|---|---|---|---|---|---|
| AR0 | 1730.86 | 1599.01 | 1496.51 | 1472.14 | 1403.91 |
| AR1 | 1553.66 | 1440.25 | 1411.56 | 1398.42 | 1368.37 |
| AR2 | 1439.94 | 1394.01 | 1398.21 | 1380.89 | 1372.40 |
| AR3 | 1415.51 | 1394.66 | 1396.27 | 1377.67 | 1372.17 |
| AR4 | 1415.76 | 1394.99 | 1396.73 | 1374.83 | 1375.35 |
Based on the AIC table, the ARMA(1,4) model has the minimum AIC value. Because large models may suffer from problems such as parameter redundancy, non-causality, and non-invertibility, we should check whether this model is suitable.
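A table of this kind can be produced by looping over the AR and MA orders. The sketch below is one possible way to do it, shown on simulated data. The helper `aic_table` and the series `x_sim` and `y_sim` are hypothetical names, and this is not necessarily the code behind the table above.

```r
aic_table <- function(y, x, P, Q) {
  table <- matrix(NA_real_, P + 1, Q + 1)
  for (p in 0:P) {
    for (q in 0:Q) {
      # tryCatch keeps the loop going if a particular fit fails
      table[p + 1, q + 1] <- tryCatch(
        arima(y, order = c(p, 0, q), xreg = x)$aic,
        error = function(e) NA_real_
      )
    }
  }
  dimnames(table) <- list(paste0("AR", 0:P), paste0("MA", 0:Q))
  table
}

set.seed(2)
x_sim <- rnorm(150)
y_sim <- 0.5 * x_sim + arima.sim(list(ar = 0.5), n = 150)
tab <- aic_table(y_sim, x_sim, P = 2, Q = 2)
print(round(tab, 2))
# Row and column of the smallest AIC
print(which(tab == min(tab, na.rm = TRUE), arr.ind = TRUE))
```

For the report's data the call would presumably be `aic_table(nas_cycles, gold_cycles, 4, 4)`. Adding one parameter to a nested model should not raise the AIC by more than 2, so larger jumps in such a table hint at numerical optimization problems.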
arma14=arima(nas_cycles,xreg=gold_cycles,order=c(1,0,4))
abs(polyroot(c(1,-arma14$coef[1])))
## [1] 1.475903
abs(polyroot(c(1,arma14$coef[2:5])))
## [1] 1.028068 1.107269 1.028068 1.107269
Based on the results above, all roots lie outside the unit circle, so the ARMA(1,4) model is causal and invertible, although the MA roots near 1.03 are fairly close to the unit circle. Therefore, ARMA(1,4) is a candidate for our model. However, since the AIC table shows that ARMA(1,4), ARMA(2,4), and ARMA(3,4) have close AIC values, we should perform a diagnostic check before choosing the final model.
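We first perform a likelihood ratio test of ARMA(1,4) against ARMA(3,4), which has 2 more parameters and hence 2 degrees of freedom. The statistic is twice the log-likelihood of the larger model minus that of the smaller model.
# Larger model minus smaller model, so that the statistic is non-negative
loglikratio=as.numeric(logLik(arima(nas_cycles,xreg=gold_cycles,order=c(3,0,4)))
-logLik(arima(nas_cycles,xreg=gold_cycles,order=c(1,0,4))))
p_value=1-pchisq(2*loglikratio,df=2)
p_value
Assuming the AIC table above comes from these same fits, the statistic is about 1368.37 - 1372.17 + 2(2) = 0.20, which gives a p-value of about 0.90. Since the p-value is large, we cannot reject the null hypothesis that the smaller model is adequate, so we prefer ARMA(1,4) over ARMA(3,4). Next, we perform the same test for ARMA(1,4) against ARMA(2,4).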
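# Both models must use the same regressor for the comparison to be nested
loglikratio=as.numeric(logLik(arima(nas_cycles,xreg=gold_cycles,order=c(2,0,4)))
-logLik(arima(nas_cycles,xreg=gold_cycles,order=c(1,0,4))))
p_value=1-pchisq(2*loglikratio,df=1)
p_value
Assuming again that the AIC table comes from these fits, the AIC of ARMA(2,4) exceeds that of ARMA(1,4) by 4.03, more than the penalty of 2 for one extra parameter. The larger model therefore reports a slightly lower maximized log-likelihood, which is impossible for nested models under exact maximization and points to imperfect numerical optimization. Either way, there is no evidence in favor of ARMA(2,4), so we keep ARMA(1,4) as our final model.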
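The comparison of nested models can be wrapped in a small helper. The following is a sketch on simulated data, not the fits above; `lrt_nested`, `x_sim`, and `y_sim` are hypothetical names. It takes the difference as larger model minus smaller model so that the statistic is non-negative, and both models are fitted to the same response and regressor.

```r
lrt_nested <- function(fit_small, fit_large) {
  # Extra coefficients in the larger model give the degrees of freedom
  df <- length(coef(fit_large)) - length(coef(fit_small))
  # Larger minus smaller; truncate at 0 in case of imperfect optimization
  stat <- max(0, 2 * as.numeric(logLik(fit_large) - logLik(fit_small)))
  c(statistic = stat, df = df, p_value = 1 - pchisq(stat, df = df))
}

set.seed(3)
x_sim <- rnorm(150)
y_sim <- 0.5 * x_sim + arima.sim(list(ar = 0.5), n = 150)
small <- arima(y_sim, order = c(1, 0, 0), xreg = x_sim)
large <- arima(y_sim, order = c(2, 0, 0), xreg = x_sim)
res <- lrt_nested(small, large)
print(res)
```

A large p-value means the extra parameters do not improve the fit enough to justify the larger model.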
Last but not least, we need to check the model assumptions. The residual plot shows that, although the residuals vary considerably, they fluctuate mainly around 0.
The ACF plot shows that most lags do not have significant autocorrelation. Overall, we conclude that the errors are uncorrelated.
The QQ plot shows that the residuals are close to normally distributed, with slightly heavy right and left tails, indicating that the normality assumption is mostly met.
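The three checks described above (residuals over time, residual ACF, and normal QQ plot) can be drawn with base R graphics. This is a minimal sketch on a simulated fit; `x_sim`, `y_sim`, and `fit` are hypothetical stand-ins for the report's data and model.

```r
set.seed(4)
x_sim <- rnorm(150)
y_sim <- 0.5 * x_sim + arima.sim(list(ar = 0.6, ma = 0.3), n = 150)
fit <- arima(y_sim, order = c(1, 0, 1), xreg = x_sim)
res <- residuals(fit)

op <- par(mfrow = c(1, 3))
# Residuals should scatter around zero without visible structure
plot(res, ylab = "Residuals", main = "Residuals over time")
abline(h = 0, lty = 2)
# Bars inside the dashed bands are consistent with uncorrelated errors
acf(res, main = "ACF of residuals")
# Points close to the line are consistent with normality
qqnorm(res)
qqline(res)
par(op)
```

For the report's model, `fit` would presumably be replaced by the `arma14` object fitted earlier.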
To test our model, we fit it to a different time window, from February 2007 to February 2012.
## Date NASDAQ_Index Gold_Price
## 1 2012-02-01 2966.89 1737.8
## 2 2012-01-01 2813.84 1565.8
## 3 2011-12-01 2605.15 1745.5
## 4 2011-11-01 2620.34 1711.0
## 5 2011-10-01 2684.41 1620.4
## 6 2011-09-01 2415.40 1826.0
modtest1=arima(nas_cycles.test,xreg=gold_cycles.test,order=c(1,0,4),include.mean = FALSE)
modtest1
##
## Call:
## arima(x = nas_cycles.test, order = c(1, 0, 4), xreg = gold_cycles.test, include.mean = FALSE)
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4 gold_cycles.test
## 0.6422 1.3062 0.0000 -1.3062 -1.0000 -0.1324
## s.e. 0.1064 0.1378 0.1733 0.1597 0.1534 0.1277
##
## sigma^2 estimated as 1692: log likelihood = -315.08, aic = 644.17
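According to the results above, ar1, ma1, ma3, and ma4 are large relative to their standard errors. However, ma2 is estimated as zero, and the coefficient on the gold cycle (-0.1324, s.e. 0.1277) lies about one standard error from zero, so it is not significant at the usual 5% level in this test window.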
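One way to make "significant" concrete is to divide each estimate by its standard error and compare the ratio with 1.96, the usual normal-approximation threshold at the 5% level. The sketch below does this for the values printed in the `arima` output above; the vector names `est`, `se`, and `z` are hypothetical.

```r
# Estimates and standard errors copied from the printed arima output
est <- c(ar1 = 0.6422, ma1 = 1.3062, ma2 = 0.0000,
         ma3 = -1.3062, ma4 = -1.0000, gold_cycles.test = -0.1324)
se  <- c(0.1064, 0.1378, 0.1733, 0.1597, 0.1534, 0.1277)

# Approximate z-ratios; |z| > 1.96 suggests significance at the 5% level
z <- est / se
print(data.frame(estimate = est, se = se, z = round(z, 2),
                 significant = abs(z) > 1.96))
```

With the fitted object at hand, the standard errors could be taken from `sqrt(diag(modtest1$var.coef))` instead of being typed in. These ratios rely on a normal approximation, which may be unreliable when MA roots sit close to the unit circle.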
abs(polyroot(c(1,-modtest1$coef[1])))
## [1] 1.557061
abs(polyroot(c(1,modtest1$coef[2:5])))
## [1] 1.000014 1.000006 1.000006 1.000004
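The model is causal, since the root of \(\phi(x)\) lies outside the unit circle. The roots of \(\psi(x)\) have modulus about 1.00001, so they sit essentially on the unit circle and the fitted model is at the boundary of invertibility.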
We can check the Gaussian white noise assumption using the residuals.
According to the ACF, fitting our model to the data from 2007 to 2012 seems reasonable, and the Gaussian white noise assumption is not violated. We conclude that the model is well suited to describing the relationship between the two data sets.
As mentioned in the data overview, the gold price decreased before 2016 while the Nasdaq index increased over the same window. Since our model also fitted the earlier test data set well, this divergence does not seem to matter much for the relationship between the two data sets. However, other factors may still play a role when looking for relationships between composite stock indices and precious metals.
The gold price can underperform when people are less confident in long-standing safe assets. This may result from continued currency depreciation, which in turn may be driven by central bank activity, rampant inflation, or potential interest rate changes. We could account for these effects by including related variables in our models.
Specific events can also influence the Nasdaq index and the gold price. For instance, the Covid-19 pandemic, which began in late 2019, had a huge impact on global financial markets. Although new critical events always arise in the market, we could account for them by encoding those events as numeric variables.
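As a hedged illustration of turning an event into a numeric variable, the sketch below adds a 0/1 indicator as a second column of `xreg`. Everything here is simulated: the event window, the effect size, and the names `event`, `gold_sim`, and `nas_sim` are assumptions for illustration, not estimates from the report's data.

```r
set.seed(6)
dates <- seq(as.Date("2012-03-01"), by = "month", length.out = 119)
# Hypothetical event window (an assumption for illustration)
event <- as.numeric(dates >= as.Date("2020-02-01") &
                    dates <= as.Date("2020-04-01"))

gold_sim <- rnorm(119)
nas_sim <- 0.5 * gold_sim - 2 * event + arima.sim(list(ar = 0.5), n = 119)

# Regression with AR(1) errors and two regressors: gold and the event dummy
xreg <- cbind(gold = gold_sim, event = event)
fit <- arima(nas_sim, order = c(1, 0, 0), xreg = xreg)
print(round(coef(fit), 2))
```

The coefficient on `event` then measures the average shift of the response during the event window, after accounting for the other regressor and the ARMA errors.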
The fitted model for the Nasdaq index and the gold price is \[(1 - 0.68B)(I_n^c - 0.47P_n^c) = (1 + 1.77B + 1.79B^2 + 1.67B^3 + 0.77B^4)\epsilon_n\]
Our model fitted well not only the training data from 2012 to 2022 but also the data from a different time window, 2007 to 2012. However, other factors and variables could still be considered to improve the model, since it only evaluates the cycle components.
Class notes of Stats 531 (Winter 2022) ‘Analysis of Time Series’, instructor: Edward L. Ionides (https://ionides.github.io/531w22/)
Investopedia (https://www.investopedia.com/terms/n/nasdaq.asp)
Investopedia (https://www.investopedia.com/articles/stocks/11/how-to-invest-in-the-periodic-table.asp)
The Relationship between the NASDAQ Index and Crude Oil Price (https://ionides.github.io/531w18/midterm_project/project42/Midterm_Project.html)
USnews (https://money.usnews.com/investing/investing-101/articles/should-you-invest-in-gold)