In this project, we investigate the relationship between the NASDAQ Index and the Crude Oil Price, seeking patterns that can explain the behavior of, and the association between, the two series.
We look at monthly data for both the Crude Oil Price and the NASDAQ Index over the last 10 years.
data=read.csv("data.csv", header = TRUE)
head(data)
## date Crude_Oil_Price NASDAQ_Index
## 1 2008/1/1 92.97 2418.09
## 2 2008/2/1 95.39 2325.83
## 3 2008/3/1 105.45 2254.82
## 4 2008/4/1 112.58 2368.10
## 5 2008/5/1 125.40 2483.24
## 6 2008/6/1 133.88 2427.45
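As a quick sanity check, and as an addition beyond the original write-up, we can confirm that the sample consists of monthly observations covering the intended ten-year window (output not shown):
range(as.Date(data$date))   # expected to span 2008-01-01 to 2018-01-01
nrow(data)                  # expected number of monthly observations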
We first look at the time plots of the two series to get a general idea of their behavior and relationship.
data$date = as.Date(data$date)
date=data$date
Nasdaq=data$NASDAQ_Index
Crudeoil=data$Crude_Oil_Price
par(mar=c(5, 4, 4, 5))
plot(date, Nasdaq, col = "red", xlim = c(as.Date("2008-01-01"), as.Date("2018-01-01")),main = "Time Plot of NASDAQ Index and Crude Oil Price", xlab = "", ylab = "NASDAQ Index", col.lab = "red", type = 'l')
par(new = TRUE)
plot(date, Crudeoil, col = "blue", xlim = c(as.Date("2008-01-01"), as.Date("2018-01-01")), axes = FALSE, xlab="Time", ylab = "", type = 'l')
axis(side=4, col ="blue")
mtext("Crude Oil Price", col = "blue", side = 4, line = 4)
In the time plot above, the red line shows the NASDAQ Index and the blue line shows the Crude Oil Price over the window from 2008 to 2018. The plot displays some common features that are of interest.
We can therefore see that the two series are reasonably associated with each other, and we seek the common pattern behind their behavior over the whole time window. The large fluctuations across the full time scale are attributable to many other factors, such as global financial markets, political events, and conditions in the crude oil market. We briefly discuss these factors later in the project, since they may suggest ways to improve the model used here.
We want to determine whether the fluctuations of the two series are related in some way, which requires removing the trend first. Here we use Loess smoothing to decompose each series into trend, noise, and cycle components.
# Represent time as a decimal year
Year=as.numeric(format(date,format="%Y"))
Month=as.numeric(format(date,format="%m"))
time=Year+(Month-1)/12
# NASDAQ: low-frequency component (trend), high-frequency component (noise),
# and the remaining mid-frequency component (cycles)
nas_low=ts(loess(Nasdaq~time,span=0.5)$fitted,start=time[1],frequency=12)
nas_high=ts(Nasdaq-loess(Nasdaq~time,span=0.1)$fitted,start=time[1],frequency=12)
nas_cycles=ts(Nasdaq-nas_low-nas_high,start=time[1],frequency=12)
ts.nas=ts.union(Nasdaq,nas_low,nas_high,nas_cycles)
colnames(ts.nas)=c("Index","trend","noise","cycles")
# The same decomposition for the Crude Oil Price
cru_low=ts(loess(Crudeoil~time,span=0.5)$fitted,start=time[1],frequency=12)
cru_high=ts(Crudeoil-loess(Crudeoil~time,span=0.1)$fitted,start=time[1],frequency=12)
cru_cycles=ts(Crudeoil-cru_low-cru_high,start=time[1],frequency=12)
ts.cru=ts.union(Crudeoil,cru_low,cru_high,cru_cycles)
colnames(ts.cru)=c("Index","trend","noise","cycles")
plot(ts.nas,main="")
plot(ts.cru,main="")
Using the Loess smoothing, we extract the cycle components of the two series and plot them together.
par(mar=c(5, 4, 4, 5))
plot(time,nas_cycles,type="l",xlab="",ylab="NASDAQ Index",
col="red",col.lab="red",main="Cycle components of NASDAQ Index and Crude Oil Price")
par(new=TRUE)
plot(time,cru_cycles,type="l",col="blue",xlab="",ylab="",axes=FALSE)
axis(side=4,col="blue")
mtext("Crude Oil Price", col = "blue", side = 4, line = 4)
Once we eliminate the trend and focus on the cycle components, there is a strong tendency for the two series to fluctuate in a similar pattern.
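One way to quantify this visual impression, offered here as an addition beyond the original analysis, is the sample cross-correlation function of the two cycle components; a large correlation near lag zero would support the pattern seen in the plot.
# Sample cross-correlation of the two cycle components (hedged addition)
ccf(nas_cycles,cru_cycles,main="Cross-correlation of cycle components")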
We denote the cycle component of the NASDAQ Index at time \(t_n\) by \(I_n^c\), and the cycle component of the Crude Oil Price by \(P_n^c\).
A general ARMA(p,q) model is \(\phi(B)(Y_n-\mu)=\psi(B)\epsilon_n\)
where \(\{\epsilon_n\}\) is a white noise process and \(B\) is the backshift operator,
\(\mu=E[Y_n]\)
\(\phi(x)=1-\phi_1x-\phi_2x^2-...-\phi_px^p\)
\(\psi(x)=1+\psi_1x+\psi_2x^2+...+\psi_qx^q\)
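For concreteness, the ARMA(1,1) case of this general model, expanded from the definitions above, is \((1-\phi_1B)(Y_n-\mu)=(1+\psi_1B)\epsilon_n\), i.e., \(Y_n=\mu+\phi_1(Y_{n-1}-\mu)+\epsilon_n+\psi_1\epsilon_{n-1}\).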
We consider the following linear regression with ARMA errors model,
\(I_n^c=\alpha+\beta P_n^c+w_n\)
where \(\{w_n\}\) is a Gaussian ARMA process.
We use an AIC table to choose a suitable ARMA model for the errors.
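Recall that \(AIC=-2\ell(\hat\theta)+2D\), where \(\ell(\hat\theta)\) is the maximized log likelihood and \(D\) is the number of estimated parameters; models with lower AIC are preferred.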
# Tabulate the AIC of ARMA(p,q) errors models for 0<=p<=P and 0<=q<=Q
aic_table <- function(data,P,Q,xreg=NULL){
table <- matrix(NA,(P+1),(Q+1))
for(p in 0:P) {
for(q in 0:Q) {
table[p+1,q+1] <- arima(data,order=c(p,0,q),xreg=xreg)$aic
}
}
dimnames(table) <- list(paste("<b> AR",0:P, "</b>", sep=""),paste("MA",0:Q,sep=""))
table
}
nas_aic_table <- aic_table(nas_cycles,3,3,xreg=cru_cycles)
require(knitr)
## Loading required package: knitr
kable(nas_aic_table,digits=2)
|     | MA0     | MA1     | MA2     | MA3     |
|-----|---------|---------|---------|---------|
| AR0 | 1545.27 | 1394.08 | 1293.08 | 1190.90 |
| AR1 | 1262.89 | 1155.37 | 1108.11 | 1074.40 |
| AR2 | 1074.47 | 1062.22 | 1054.37 | 1051.68 |
| AR3 | 1055.21 | 1056.14 | 1052.38 | 1050.29 |
From the AIC table, ARMA(2,2), ARMA(3,1), and ARMA(3,2) are worth further discussion. Larger models can suffer from problems such as parameter redundancy, non-causality, and non-invertibility, so we do not choose ARMA(3,3) even though it has a small AIC. In addition, we find no numerical inconsistencies in the AIC table, indicating that the results are reasonable.
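As a sketch of this consistency check, offered as an addition beyond the original analysis: adding one parameter to a nested model cannot decrease the maximized log likelihood, so the AIC should never increase by more than 2 when p or q grows by one; any larger jump signals a numerical optimization problem.
# Flag AIC increases greater than 2 between nested models (hedged sketch)
check_aic <- function(table){
  incr_p <- diff(table)         # changes as the AR order p increases
  incr_q <- t(diff(t(table)))   # changes as the MA order q increases
  any(incr_p > 2, incr_q > 2)
}
check_aic(nas_aic_table)        # FALSE indicates no inconsistency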
For further analysis, we check the causality, invertibility, and parameter redundancy of the three candidate models by examining the roots of their AR and MA polynomials.
arma22=arima(nas_cycles,xreg=cru_cycles,order=c(2,0,2))
# Moduli of the AR polynomial roots (all > 1 implies causality)
abs(polyroot(c(1,-arma22$coef[1:2])))
## [1] 1.109655 1.109655
# Moduli of the MA polynomial roots (all > 1 implies invertibility)
abs(polyroot(c(1,arma22$coef[3:4])))
## [1] 1.720917 1.720917
arma31=arima(nas_cycles,xreg=cru_cycles,order=c(3,0,1))
abs(polyroot(c(1,-arma31$coef[1:3])))
## [1] 1.173721 1.173721 1.342288
abs(polyroot(c(1,arma31$coef[4])))
## [1] 6.487626
arma32=arima(nas_cycles,xreg=cru_cycles,order=c(3,0,2))
abs(polyroot(c(1,-arma32$coef[1:3])))
## [1] 1.243168 1.243168 1.398649
abs(polyroot(c(1,arma32$coef[4:5])))
## [1] 1.865209 1.865209
According to the results above, ARMA(2,2), ARMA(3,1), and ARMA(3,2) are all causal and invertible, since every root lies outside the unit circle. We choose ARMA(2,2), as it combines a relatively small AIC with relatively few parameters.
arma22
##
## Call:
## arima(x = nas_cycles, order = c(2, 0, 2), xreg = cru_cycles)
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept cru_cycles
## 1.7282 -0.8121 0.3792 0.3377 4.0471 6.9664
## s.e. 0.0635 0.0641 0.0953 0.0927 32.4965 1.0758
##
## sigma^2 estimated as 300.3: log likelihood = -520.18, aic = 1054.37
From the output above, all the ARMA coefficients are significant, each being more than two standard errors away from zero. However, the standard error of the intercept is very large relative to its estimate, suggesting the intercept may be insignificant.
We use likelihood ratio tests for the coefficients of the ARMA errors model; that is, we test whether \(\alpha\) and \(\beta\) are zero.
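Under \(H_0\), Wilks' theorem gives the approximation \(2(\ell_1-\ell_0)\approx\chi^2_{D_1-D_0}\), where \(\ell_i\) is the maximized log likelihood and \(D_i\) the number of parameters under hypothesis \(H_i\); in both tests below \(D_1-D_0=1\), which is the degrees of freedom used in the chi-squared quantile.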
Test 1: \(H_0:\alpha=0\) vs.\(H_a:\alpha\neq0\)
loglikratio=as.numeric(logLik(arima(nas_cycles,xreg=cru_cycles,order=c(2,0,2)))
-logLik(arima(nas_cycles,xreg=cru_cycles,order=c(2,0,2),include.mean = FALSE)))
p_value=1-pchisq(2*loglikratio,df=1)
p_value
## [1] 0.9004051
Test 2: \(H_0:\beta=0\) vs.\(H_a:\beta\neq0\)
loglikratio=as.numeric(logLik(arima(nas_cycles,xreg=cru_cycles,order=c(2,0,2)))
-logLik(arima(nas_cycles,order=c(2,0,2))))
p_value=1-pchisq(2*loglikratio,df=1)
p_value
## [1] 1.093445e-08
Based on the tests above, at the 5% significance level we fail to reject the null hypothesis of Test 1 and reject that of Test 2. In other words, the likelihood ratio tests indicate that \(\alpha=0\) while \(\beta\neq0\).
We adjust our model accordingly to
\(I_n^c=\beta P_n^c+w_n\)
where \(\{w_n\}\) is a Gaussian ARMA(2,2) process.
mod2=arima(nas_cycles,xreg=cru_cycles,order=c(2,0,2),include.mean = FALSE)
mod2
##
## Call:
## arima(x = nas_cycles, order = c(2, 0, 2), xreg = cru_cycles, include.mean = FALSE)
##
## Coefficients:
## ar1 ar2 ma1 ma2 cru_cycles
## 1.7287 -0.8130 0.3786 0.3374 6.9534
## s.e. 0.0633 0.0637 0.0952 0.0927 1.0720
##
## sigma^2 estimated as 300.3: log likelihood = -520.19, aic = 1052.38
abs(polyroot(c(1,-mod2$coef[1:2])))
## [1] 1.109058 1.109058
abs(polyroot(c(1,mod2$coef[3:4])))
## [1] 1.721576 1.721576
According to the results above, all the coefficients are significant, and we no longer include an intercept term. Furthermore, the model is causal and invertible, since the roots of \(\phi(x)\) and \(\psi(x)\) all lie outside the unit circle.
We now check the residuals of the fitted model and their sample autocorrelation.
plot(mod2$residuals,ylab="Residuals",main="Residuals for the ARMA(2,2) Errors Model")
The plot of the residuals against time shows no strong evidence of heteroskedasticity.
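As a complementary informal check, offered as an addition beyond the original analysis, the ACF of the squared residuals can reveal conditional heteroskedasticity that a time plot might miss; spikes outside the confidence band would suggest ARCH-type effects.
# ACF of squared residuals as an informal heteroskedasticity check (hedged addition)
acf(as.numeric(mod2$residuals)^2,main="ACF of Squared Residuals")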
acf(mod2$residuals, main="ACF of Residuals")
In the ACF plot, all the sample autocorrelations lie within the dashed 95% confidence band, so the white noise hypothesis for the residuals is not rejected at the 5% significance level.
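As a formal complement to the visual check, and as an addition beyond the original analysis, a Ljung-Box test aggregates the residual autocorrelations; a large p-value is consistent with white noise. The choices of lag = 24 and fitdf = 4 (for the four estimated ARMA coefficients) are our assumptions.
# Ljung-Box portmanteau test on the residuals (hedged addition)
Box.test(mod2$residuals,lag=24,type="Ljung-Box",fitdf=4)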
We use a QQ plot to check the normality of the residuals.
qqnorm(mod2$resid)
qqline(mod2$resid,probs = c(0.25,0.75))
The QQ plot shows that the residuals are approximately normally distributed, with a slightly heavy right tail, so the Gaussian white noise assumption should not be rejected.
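A formal normality test can supplement the QQ plot; as an addition beyond the original analysis, a Shapiro-Wilk test of the residuals would quantify the visual impression.
# Shapiro-Wilk normality test on the residuals (hedged addition)
shapiro.test(as.numeric(mod2$residuals))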
We use a bootstrap simulation for further assessment of the parameter uncertainty.
J=1000                                # number of bootstrap replicates
params=coef(mod2)
ar=params[grep("^ar",names(params))]  # AR coefficients of the fitted model
ma=params[grep("^ma",names(params))]  # MA coefficients of the fitted model
xreg.coef=params["cru_cycles"]        # regression coefficient on the oil cycles
sigma=sqrt(mod2$sigma2)
theta=matrix(NA,nrow=J,ncol=length(params),dimnames=list(NULL,names(params)))
sgm=rep(NA,length.out=J)
for(j in 1:J){
  # Simulate a series from the fitted ARMA errors model plus the regression term
  X_j=ts(arima.sim(
    list(ar=ar,ma=ma),
    n=length(nas_cycles),
    sd=sigma),start=time[1],frequency=12)+xreg.coef*cru_cycles
  # Refit the model to the simulated series and store the parameter estimates
  mod=arima(X_j,order=c(2,0,2),xreg=cru_cycles,include.mean = FALSE)
  theta[j,]=coef(mod)
  sgm[j]=var(mod$residuals)
}
sqrt(diag(var(theta)))
## ar1 ar2 ma1 ma2 cru_cycles
## 0.07036342 0.07068230 0.10477034 0.10685726 1.08605859
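For a direct comparison, offered as an addition beyond the original analysis, we can place the bootstrap standard errors next to the Fisher information standard errors reported by arima():
# Bootstrap vs. Fisher information standard errors (hedged addition)
rbind(bootstrap=sqrt(diag(var(theta))),
      fisher=sqrt(diag(mod2$var.coef)))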
We use the bootstrap estimates \(\hat\theta_{1:J}\) to compute confidence intervals and compare them with the confidence intervals computed from the Fisher information.
Bootstrap=t(apply(theta,2,quantile,c(0.025,0.975)))
Bootstrap
## 2.5% 97.5%
## ar1 1.5399913 1.8189961
## ar2 -0.9039153 -0.6210666
## ma1 0.1813852 0.5898489
## ma2 0.1337354 0.5380253
## cru_cycles 4.8524883 9.1154515
FisherInformation= cbind(mod2$coef-1.96*sqrt(diag(mod2$var.coef)),mod2$coef+1.96*sqrt(diag(mod2$var.coef)))
colnames(FisherInformation)=c("2.5%","97.5%")
FisherInformation
## 2.5% 97.5%
## ar1 1.6045257 1.8528442
## ar2 -0.9377866 -0.6882166
## ma1 0.1920576 0.5652050
## ma2 0.1557849 0.5190188
## cru_cycles 4.8523747 9.0545031
According to the results above, the 95% confidence intervals computed by the bootstrap simulation are consistent with the 95% confidence intervals computed from the Fisher information. Therefore, we conclude that the Fisher information results are trustworthy, and that our model captures the pattern shared by the cycle components of the NASDAQ Index and the Crude Oil Price over the window from Jan 2008 to Jan 2018.
We now use the model discussed above to fit the data from Jan 2001 to Jan 2007.
data=read.csv("datatest.csv", header = T)
data$date = as.Date(data$date)
date=data$date
Nasdaq.test=data$NASDAQ
Crudeoil.test=data$CrudeOil
Year=as.numeric(format(date,format="%Y"))
Month=as.numeric(format(date,format="%m"))
t=Year+(Month-1)/12
nas_low=ts(loess(Nasdaq.test~t,span=0.5)$fitted,start=t[1],frequency=12)
nas_high=ts(Nasdaq.test-loess(Nasdaq.test~t,span=0.1)$fitted,start=t[1],frequency=12)
nas_cycles.test=ts(Nasdaq.test-nas_low-nas_high,start=t[1],frequency=12)
ts.nas.test=ts.union(Nasdaq.test,nas_low,nas_high,nas_cycles.test)
colnames(ts.nas.test)=c("Index","trend","noise","cycles")
cru_low=ts(loess(Crudeoil.test~t,span=0.5)$fitted,start=t[1],frequency=12)
cru_high=ts(Crudeoil.test-loess(Crudeoil.test~t,span=0.1)$fitted,start=t[1],frequency=12)
cru_cycles.test=ts(Crudeoil.test-cru_low-cru_high,start=t[1],frequency=12)
ts.cru.test=ts.union(Crudeoil.test,cru_low,cru_high,cru_cycles.test)
colnames(ts.cru.test)=c("Index","trend","noise","cycles")
par(mar=c(8, 7, 7, 8))
plot(t,nas_cycles.test,type="l",xlab="",ylab="NASDAQ Index",
col="red",col.lab="red",main="Cycle components of NASDAQ Index and Crude Oil Price")
par(new=TRUE)
plot(t,cru_cycles.test,type="l",col="blue",xlab="",ylab="",axes=FALSE)
axis(side=4,col="blue")
mtext("Crude Oil Price", col = "blue", side = 4, line = 4)
modtest1=arima(nas_cycles.test,xreg=cru_cycles.test,order=c(2,0,2),include.mean = FALSE)
modtest1
##
## Call:
## arima(x = nas_cycles.test, order = c(2, 0, 2), xreg = cru_cycles.test, include.mean = FALSE)
##
## Coefficients:
## ar1 ar2 ma1 ma2 cru_cycles.test
## 1.3499 -0.7420 1.2787 0.3776 -6.2842
## s.e. 0.1161 0.1116 0.1529 0.1394 3.6701
##
## sigma^2 estimated as 643.4: log likelihood = -394.73, aic = 801.45
According to the results above, the ARMA(2,2) coefficients are significant, although the regression coefficient on cru_cycles.test is not (its estimate is within two standard errors of zero). We check the roots of \(\phi(x)\) and \(\psi(x)\) to determine whether the model is causal and invertible.
abs(polyroot(c(1,-modtest1$coef[1:2])))
## [1] 1.16088 1.16088
abs(polyroot(c(1,modtest1$coef[3:4])))
## [1] 1.225671 2.160552
According to the results above, the model is causal and invertible, since the roots of \(\phi(x)\) and \(\psi(x)\) all lie outside the unit circle.
We check the residuals to determine whether the Gaussian white noise assumption should be rejected.
acf(modtest1$residuals, main="ACF of Residuals")
According to the ACF plot, fitting our model to the data from 2001 to 2007 is not a good choice: the Gaussian white noise assumption is violated. A more sophisticated model than the ARMA(2,2) errors model is needed to analyze the 2001 to 2007 series.
We can point to some evidence for why our model fails to fit the 2001 to 2007 window.
As mentioned in the data overview, the Crude Oil Price decreased sharply from 2014 to 2015 while the NASDAQ Index continued to increase. Although this mismatch between the two series does not appear to strongly affect our analysis, it suggests that other factors should be considered when relating a stock index to commodity futures.
Based on all the analysis above, we draw the following conclusions: