1. Introduction


2. WTI and Brent visuliazation

summary(WTI$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.82   19.71   28.67   44.16   67.97  145.30
summary(BRT$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.10   18.34   27.30   44.92   67.65  144.00

3. Modeling

1. Data manipulation

  • Since WTI and Brent crude oil price are quite similar, we only take WTI for prediction and modeling. To make the time series smoother and easier for analysis, we compute the monthly mean of the WTI. We select monthly data from May 1987 to December 2014 for modeling, and data from January 2015 until now for prediction.
wti_month=aggregate(WTI[,2], list(WTI[,3],WTI[,4]), mean)
names(wti_month)[names(wti_month)=='Group.1']='year'
names(wti_month)[names(wti_month)=='Group.2']='month'
names(wti_month)[names(wti_month)=='x']='price'
wti_month=wti_month[order(wti_month$year,wti_month$month),]
wti_month$time <- wti_month$year + wti_month$month/12
#train and predict
wti=wti_month[(wti_month$year<=2014)&(wti_month$month<=12),]
wti_test=wti_month[wti_month$year>2014,]
plot(wti$price,type='l',xlab='Time',ylab='WTI monthly price')

2. Detrend

  • Seen from the WTI monthly price, it’s not at all stationary. We can see obvious fluctuation in crude oil price. So we can not fit the ARMA model directly. We need detrend the monthly data as first.

1. hp-filter

  • Hp-filter is a smoothing spline with a particular choice of smoothing parameter. We try various parameters and select the one which separates the trend and cycle for this specific dataset.
wti_hp=hpfilter(wti$price, freq=100,type="lambda",drift=F)
trend=ts(wti_hp$trend)
cycle=ts(wti_hp$cycle)
plot(ts.union(trend,cycle),type="l",xlab="Time",ylab="", main='Decomposition of WTI monthly price as trend and cycle')

  • Since there is a sudden decrease in 2008, the shock change in the trend can not be removed. It shows in both trend part and cylce part.
  • In addition, the fluation is more intensive after 2005. The cycling pattern is quite different before and after 2005.
  • Therefore, even after hp smoothing, the new time series is not appropriate for arma modeling.

2. loess-filter

  • We turn to another filter, loess filter to remove the trend of the data, since hp filter does not work well in this problem.
  • Loess filter is a local linear regression approach. The basic idea is quite simple: at each point in time, we carry out a linear regression using only points close in time. Thus, it is just like a moving window of points included in the regression.
  • In this time series, high frequency variation might be considered “noise” and low frequency variation might be considered trend. A band of mid-range frequencies might be considered to correspond to the cycle. Thus, we can set different frequency to extract the trend, cylce and noise.
wti_low <- ts(loess(wti$price~wti$time,span=0.4)$fitted)
wti_hi <- ts(wti$price - loess(wti$price~wti$time,span=0.07)$fitted)
wti_cycles <- wti$price - wti_low - wti_hi
plot(ts.union(wti$price, wti_low,wti_hi,wti_cycles),
     main="Decomposition of WTI monthly price as trend + noise + cycles")

  • After this decomposition, seen from the plot above, trend part extracts the main trend of the oil price
  • But noise part seems extracting more, it even includes some cycle pattern. It is partly because the monthly data don not show very intensive noise process, since it has been smoothed through average.
  • Therefore, it seems that loess filter fails in this problem.

3. Log transformation and Difference

  • In order to eliminate the effects of the high fluctuation pattern, we do the log transformation of the monthly WTI price time series. Compared with untransfromed data, the fluctuation and the difference of the time period decrease a lot.
plot(wti$price,type='l',ylab='wti price')

plot(log(wti$price),type='l',ylab='log(wti)')

  • However, there is still trend in the data. We eliminate this by taking difference. Seen from the plot below, the time series are generally stationary except for two obivious increas and decrease. It is owing to the sudden crude oil price increase and decrease in 1997 and 2008.

  • We can also do difference of 2. The plot is as follows. It actually increase the stationary, but not brings large improvement. Thus, for the simplicity of the model, we take difference as 1.

plot(diff(log(wti$price),differences = 1),type='l',ylab='difference of log(wti)')

plot(diff(log(wti$price),differences = 2),type='l',ylab='difference of log(wti)')

3. Model Selection

1. Frequency domain

  • After the transformation and difference of the WTI price time series, the time series are generally stationary. Before we fit the arima model, we first explore the data in frequency domain.
  • From the plot below, it shows many peaks, implying periodic pattern. The highest peak has the frequence of 0.094, equvilantly period of 10.6, which approaches 12 months. Therefore, we should add seasonality parameter in the model.
diff_logprice=diff(log(wti$price),differences = 1)
f=spectrum(diff_logprice,spans=c(2,2), main="Smoothed periodogram")

1/f$freq[which.max(f$spec)]
## [1] 10.58824

2. ARMA model

  • Now, Let’s start fitting a stationary ARMA(p,q) model under the null hypothesis that the time series are stationary.
  • We seek to fit a SARIMA\((p,1,q)\times(1,0,1)_{12}\) model for nonstationary monthly data, given by \[ {\phi}(B){\Phi}(B^{12}) \big((1-B)X_n-\mu\big)={\psi}(B){\Psi}(B^{12})\epsilon_n \] where \({\epsilon_n}\) is a Gaussian white noise process, the intercept \(\mu\) is the mean of the differenced process \((1-B)X_n-\mu\), and we have \[ {\phi}(B) = 1-{\phi}_1 B - {\phi}_2 B^2 - \dots - {\phi}_p B^p \] \[ {\Phi}(B^{12}) = 1-{\phi}_1 B^{12} \] \[ {\psi}(B) = 1+{\psi}_1 B + {\psi}_2 B^2 + \dots + {\psi}_q B^q \] \[ {\Psi}(B^{12}) = 1+{\psi}_1 B^{12} \]
  • We We need to decide what to choose in terms of values of p and q. AIC is viewed as a way to select a model with reasonable predictive skill from a range of possibilities.
  • From the AIC table, SARIMA\((2,1,2)\times(1,0,1)_{12}\) is selected because of the low AIC. There are larger models with smaller AIC values, but it does not decrease a lot.

    MA0 MA1 MA2 MA3 MA4
    AR0 -715.42 -742.12 -743.88 -742.69 -741.01
    AR1 -745.88 -743.95 -742.27 -740.89 -745.24
    AR2 -743.97 -741.89 -747.77 -738.89 -744.25
    AR3 -742.49 -747.58 -745.78 -743.89 -742.20
    AR4 -743.19 -745.95 -744.16 -749.29 -742.93
  • The coefficients of the model are shown as below. It is invertible and casual model SARMA model.

sarima=arima(log(wti$price),order=c(2,1,2),seasonal=list(order=c(1,0,1),period=12))
sarima
## 
## Call:
## arima(x = log(wti$price), order = c(2, 1, 2), seasonal = list(order = c(1, 0, 
##     1), period = 12))
## 
## Coefficients:
##          ar1      ar2      ma1     ma2     sar1    sma1
##       1.4390  -0.5399  -1.1556  0.2293  -0.8234  0.8904
## s.e.  0.1288   0.1244   0.1471  0.1416   0.1598  0.1372
## 
## sigma^2 estimated as 0.005843:  log likelihood = 380.89,  aic = -747.77

4. Model Diagnosis

5. Model Evaluation

1. Fit

  • We plot the fitted value and the real value of WTI monthly price below. It shows very high consistency, which implies good performance of our SARMA model.
  • However, there is some translation between fitted and real data. Fitted data seems one month lag from the real data. It may be owing to the algortihm of predict.Arima in R. Since we model the SARMA model with d=1, the initialization of the data may make a difference.

2. Predict

  • We then use the model above to predict the monthly WTI price from January 2015 to February 2016, 14 months in total. It captures the main trend of the monthly WTI price fluctuation.
  • Same as the fitted data, it also shows one month lag. Thus we need to explore the function in R in order to tackle this problem.
  • In general, our model shows good performance both in terms of modeling and prediction.

6. Conclusion

7. Reference