1 Question Description

Real estate investing may take a variety of forms. Real estate investments may be classified along two dimensions: debt or equity based, and in private or public markets. Equity investments in real estate that occur in the private markets are often referred to as direct investments in real estate. The money to finance real estate property purchases comes from many sources. A well-known form of debt financing of real estate purchases is mortgages. Private investors (institutional and individual), real estate corporations, and REITs may provide the equity financing for the purchase.

REITs sell shares to raise funds to make property purchases. REIT shares are typically publicly traded and represent an indirect investment in real estate property. REIT indices are constructed from the prices of publicly traded REIT shares. A REIT index is therefore a convenient way to evaluate the value of real estate, because REIT shares trade far more regularly than the underlying properties do.

The observations for the Wilshire US Real Estate Investment Trust Total Market Index (Wilshire US REIT) represent the daily index value at market close. The total market indices are total market returns, which include reinvested dividends. The historical Wilshire US REIT index data used in this project come from the Federal Reserve Bank of St. Louis. I access the data via the R package quantmod.

I am interested in the short-term trend of the index, and I hope to predict its future values from the data of past years with a suitable time series model.

2 Data Analysis

2.1 Exploratory Data Analysis

## [1] "WILLREITIND"

This is the REIT index from 1977. Because the early years contain too many missing values and the financial crisis around 2009 affected the data dramatically, I study the data starting from 2010 and remove the few remaining missing values. After deleting the data before 2010 and the missing values, 2054 observations are left.
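The data preparation described above could be sketched as follows. This is only an illustration: it assumes the series is retrieved from FRED through `quantmod::getSymbols` under the identifier `WILLREITIND` (the name printed in the output above), and that FRED still serves this series. The helper name `load_reit` and its `start` argument are hypothetical and not taken from the original analysis.

```r
# Hypothetical helper: download the index from FRED, keep 2010 onwards,
# and drop missing values. Needs the quantmod package and network access.
load_reit <- function(start = "2010-01-01") {
  if (!requireNamespace("quantmod", quietly = TRUE)) {
    stop("package 'quantmod' is required")
  }
  # auto.assign = FALSE returns the xts object instead of assigning it
  reit <- quantmod::getSymbols("WILLREITIND", src = "FRED", auto.assign = FALSE)
  reit <- reit[paste0(start, "/")]   # xts date-range subsetting: from `start` on
  na.omit(reit)                      # remove the remaining missing values
}

# reit <- load_reit()
# length(reit)   # number of observations kept
```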

From the time series plot, we can see that the REIT index shows an increasing trend over the years. However, there are oscillations around the trend at each time point. I therefore study the detrended REIT index instead. Its time series plot and sample autocorrelation plot are shown as follows:
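The detrending step is not spelled out at this point; the conclusion mentions that the trend disappears after differencing once, so a minimal sketch under that assumption is given here. The simulated random walk with drift is only a stand-in for the real index, and the name `detrend` is borrowed from the `arima` calls shown later.

```r
# Simulated stand-in for the index: a random walk with drift (assumption).
set.seed(1)
n <- 500
index <- 1000 + cumsum(rnorm(n, mean = 1, sd = 10))

# First differencing removes the trend: d_t = x_t - x_{t-1}
detrend <- diff(index)

length(detrend)   # one observation shorter than the input
# plot(detrend, type = "l"); acf(detrend)   # the two plots described in the text
```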

We can see from the time plot that the sequence is mean stationary. In the sample autocorrelation plot, the autocorrelations at almost all lags fall between the two dashed lines. This could be an indication of covariance stationarity. We then examine the periodogram to find out whether there is a cycle in the sequence.
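The dashed lines in an R autocorrelation plot are approximate 95% bounds under the hypothesis of white noise. The sketch below shows how the lags falling outside those bounds can be listed directly; it uses simulated white noise rather than the REIT data, and the variable names are illustrative.

```r
set.seed(2)
x <- rnorm(500)                          # simulated white noise (assumption)

a <- acf(x, lag.max = 30, plot = FALSE)  # sample autocorrelations
rho <- drop(a$acf)[-1]                   # drop lag 0, which is always 1
bound <- qnorm(0.975) / sqrt(length(x))  # approximate 95% band under white noise

outside <- which(abs(rho) > bound)       # lags whose autocorrelation leaves the band
outside
```

Under white noise roughly one lag in twenty is expected to fall outside the band by chance, so a few isolated exceedances are not by themselves evidence of dependence.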

## [1] 51.42857

We see that the dominant frequency is 0.01944444 cycles per day, which corresponds to a period of 51.42857 days. Days are counted in trading days, since the days on which REIT shares are not traded have been removed.
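One way to obtain a dominant frequency and convert it to a period is sketched below with `stats::spectrum`. The series is simulated with a known 50-step cycle, so this illustrates the technique and does not reproduce the original computation (whose smoothing settings are not shown).

```r
set.seed(3)
n <- 2000
tt <- seq_len(n)
x <- sin(2 * pi * tt / 50) + rnorm(n)    # simulated series with a 50-step cycle

sp <- spectrum(x, plot = FALSE)          # raw periodogram
dominant_freq <- sp$freq[which.max(sp$spec)]
period <- 1 / dominant_freq              # cycles per step -> steps per cycle
period

# Same conversion for the frequency reported in the text:
1 / 0.01944444                           # about 51.43 trading days
```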

2.2 Model Selection

Next, we fit stationary Gaussian ARIMA\((p,0,q)\) models. It is natural to choose \((p,q)\) by AIC: a lower AIC reflects a higher log-likelihood, fewer parameters, or both.
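With \(\ell\) the maximized log-likelihood and \(D\) the number of estimated parameters, \(AIC = -2\ell + 2D\). A table of AIC values over a grid of \((p,q)\) can be produced along the following lines. This is a sketch, not necessarily the code used for the table below; the function name `aic_table` is illustrative, and the simulated AR(1) series stands in for the detrended index.

```r
# AIC = -2 * (maximized log-likelihood) + 2 * (number of parameters)
aic_table <- function(data, P, Q) {
  table <- matrix(NA_real_, P + 1, Q + 1,
                  dimnames = list(paste0("AR", 0:P), paste0("MA", 0:Q)))
  for (p in 0:P) {
    for (q in 0:Q) {
      # arima() can fail for some orders; leave those cells as NA
      fit <- tryCatch(arima(data, order = c(p, 0, q)), error = function(e) NULL)
      if (!is.null(fit)) table[p + 1, q + 1] <- fit$aic
    }
  }
  table
}

# Usage on a simulated AR(1) series (assumption, not the REIT data):
set.seed(4)
x <- arima.sim(model = list(ar = 0.5), n = 300)
tab <- aic_table(x, P = 2, Q = 2)
round(tab, 2)
```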

          MA0       MA1       MA2       MA3       MA4       MA5
AR0  24944.55  24913.24  24853.57  24855.42  24854.32  24829.97
AR1  24903.02  24890.29  24855.47  24828.00  24822.29  24816.36
AR2  24866.80  24859.56  24853.65  24815.17  24805.38  24813.34
AR3  24852.33  24810.55  24811.35  24808.65  24808.96  24803.54
AR4  24842.30  24811.64  24814.54  24808.68  24810.70  24812.90
AR5  24824.77  24812.44  24808.05  24811.62  24811.51  24808.64

We can see that the ARIMA(3,0,5) model gives the smallest AIC (24803.54). However, I prefer to work with the ARIMA(4,0,1) and ARIMA(4,0,0) models, whose AIC values are not too far from the minimum. Although AIC penalizes model complexity, it does so only as far as complexity leads to poor prediction from overfitting. Other considerations are that smaller models reduce problems with parameter identifiability, invertibility, and numerical stability, which are common when fitting larger ARMA models. Simplicity may be particularly valuable if we want to interpret parameters. Redundant (or nearly redundant) models are undesirable, whatever the AIC.

It is also noticeable that the table is inconsistent. Adding a parameter cannot decrease the maximized log-likelihood, so the AIC can increase by at most \(2\) when one parameter is added. Several entries in the table violate this rule, for example the step from MA4 to MA5 in row AR2 (24805.38 to 24813.34). This can only come about through imperfect likelihood calculation and/or maximization, so we may suspect a problem with likelihood maximization.
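The consistency rule can be checked mechanically on the table above. The sketch below re-enters the printed AIC values by hand and flags every pair of neighbouring models where adding one AR or one MA term raises the AIC by more than 2; the object names are illustrative.

```r
aic <- matrix(c(
  24944.55, 24913.24, 24853.57, 24855.42, 24854.32, 24829.97,
  24903.02, 24890.29, 24855.47, 24828.00, 24822.29, 24816.36,
  24866.80, 24859.56, 24853.65, 24815.17, 24805.38, 24813.34,
  24852.33, 24810.55, 24811.35, 24808.65, 24808.96, 24803.54,
  24842.30, 24811.64, 24814.54, 24808.68, 24810.70, 24812.90,
  24824.77, 24812.44, 24808.05, 24811.62, 24811.51, 24808.64
), nrow = 6, byrow = TRUE,
dimnames = list(paste0("AR", 0:5), paste0("MA", 0:5)))

# Change in AIC when one MA term (move right) or one AR term (move down) is added
add_ma <- aic[, -1] - aic[, -6]
add_ar <- aic[-1, ] - aic[-6, ]

# For nested models the log-likelihood cannot drop, so the change should be <= 2
sum(add_ma > 2) + sum(add_ar > 2)   # number of inconsistent neighbouring pairs
max(add_ma, add_ar)                 # largest increase observed
```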

2.3 Fit a Model

We first fit the ARIMA(4,0,0) model:

## 
## Call:
## arima(x = detrend, order = c(4, 0, 0))
## 
## Coefficients:
##           ar1     ar2     ar3      ar4  intercept
##       -0.1274  0.1522  0.0961  -0.0853     1.1725
## s.e.   0.0222  0.0222  0.0241   0.0245     2.2632
## 
## sigma^2 estimated as 9830:  log likelihood = -12416.15,  aic = 24842.3

Thus the fitted model is: \[(1+0.1274B-0.1522B^2-0.0961B^3+0.0853B^4)(X_n-1.1725)=\epsilon_n\]

where \(B\) is the backshift operator and \(\{\epsilon_n\}\) is white noise with standard deviation \(\sqrt{9830}\approx 99.1\). The likelihood calculation also assumes that \(\{\epsilon_n\}\) is Gaussian, i.e. an independent sequence with \(\epsilon_n \sim N(0, 99.1^2)\).

Then we fit the ARIMA(4,0,1) model:

## 
## Call:
## arima(x = detrend, order = c(4, 0, 1))
## 
## Coefficients:
##           ar1     ar2     ar3      ar4     ma1  intercept
##       -0.8948  0.0489  0.2087  -0.0276  0.7864     1.1776
## s.e.   0.0525  0.0303  0.0312   0.0284  0.0472     2.3232
## 
## sigma^2 estimated as 9675:  log likelihood = -12399.82,  aic = 24811.64

Thus the fitted model is: \[(1+0.8948B-0.0489B^2-0.2087B^3+0.0276B^4)(X_n-1.1776)=(1+0.7864B)\epsilon_n\] where \(B\) is the backshift operator and \(\{\epsilon_n\}\) is white noise with standard deviation \(\sqrt{9675}\approx 98.4\). The likelihood calculation also assumes that \(\{\epsilon_n\}\) is Gaussian, i.e. an independent sequence with \(\epsilon_n \sim N(0, 98.4^2)\).
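The concerns about invertibility and near-redundancy raised in Section 2.2 can be examined through the roots of the AR and MA polynomials. The sketch below uses the ARIMA(4,0,1) coefficients printed above. It relies on R's `arima` convention, in which the AR polynomial is \(1-\phi_1 z-\dots-\phi_4 z^4\) and the MA polynomial is \(1+\theta_1 z\); the variable names are illustrative.

```r
ar <- c(-0.8948, 0.0489, 0.2087, -0.0276)   # ar1..ar4 from the ARIMA(4,0,1) output
ma <- 0.7864                                # ma1 from the same output

ar_roots <- polyroot(c(1, -ar))   # roots of 1 - ar1*z - ... - ar4*z^4
ma_roots <- polyroot(c(1, ma))    # root of 1 + ma1*z

Mod(ar_roots)   # all moduli > 1 indicates a causal (stationary) AR part
Mod(ma_roots)   # modulus > 1 indicates an invertible MA part

sqrt(9675)      # innovation standard deviation implied by sigma^2
```

An AR root lying close to an MA root would hint at a nearly redundant pair of terms, which is one reason to compare the two sets of roots side by side.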

We then want to decide which model is better.

## Likelihood ratio test
## 
## Model 1: arima(x = detrend, order = c(4, 0, 0))
## Model 2: arima(x = detrend, order = c(4, 0, 1))
##   #Df LogLik Df  Chisq Pr(>Chisq)    
## 1   6 -12416                         
## 2   7 -12400  1 32.668  1.093e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Likelihood ratio test
## 
## Model 1: arima(x = detrend, order = c(0, 0, 0))
## Model 2: arima(x = detrend, order = c(4, 0, 0))
##   #Df LogLik Df  Chisq Pr(>Chisq)    
## 1   2 -12471                         
## 2   6 -12416  4 110.25  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The null hypothesis is that the smaller model is adequate; it is rejected when the test statistic is large. In other words, if the null hypothesis is rejected, the larger model is a significant improvement over the smaller one. For the first test, the p-value (1.093e-08) is very small, so the null hypothesis is rejected: by this test, ARIMA(4,0,1) fits significantly better than ARIMA(4,0,0). The analysis below nevertheless proceeds with the simpler ARIMA(4,0,0) model, in line with the preference for smaller models discussed in Section 2.2. The second likelihood ratio test confirms that ARIMA(4,0,0) is a significant improvement over the reduced ARIMA(0,0,0) model, which indicates that the detrended REIT index is not an independent sequence.
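The first likelihood ratio test can be reproduced by hand from the log-likelihoods printed in the two model summaries. A minimal sketch, using the rounded values from the output (so the statistic differs slightly from the 32.668 shown above); the variable names are illustrative.

```r
loglik_small <- -12416.15   # ARIMA(4,0,0), from the output above
loglik_large <- -12399.82   # ARIMA(4,0,1), from the output above

# Likelihood ratio statistic: twice the gain in maximized log-likelihood
lr_stat <- 2 * (loglik_large - loglik_small)

# One extra parameter (ma1), so compare with a chi-squared on 1 degree of freedom
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

lr_stat   # about 32.66
p_value
```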

3 Diagnostics

Having chosen the ARIMA(4,0,0) model above, we now run some diagnostics on the fitted model.

From the autocorrelation plot of the residuals, we can see that the autocorrelations at almost all lags fall between the two dashed lines, the only exception being lag 9. The residuals therefore appear to be uncorrelated.

From the residuals vs. fitted values plot and the density plot of the residuals, the distribution of the residuals does not appear to be skewed.

From the Q-Q plot, we can see that the distribution of the residuals is long-tailed at both ends, meaning that extreme values occur more often than they would under a normal distribution.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fit0)
## W = 0.83175, p-value < 2.2e-16

I then conduct a Shapiro-Wilk test on the residuals. The null hypothesis of the Shapiro-Wilk test is that the sequence follows a normal distribution. The p-value of the test is very small, so we reject the null hypothesis. As the normality assumption for the residuals is violated, the estimated standard errors and confidence intervals are not dependable, while the point estimates can still be valid. Further work may focus on transforming the sequence and looking for predictors to add to the model.
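To illustrate how heavy tails drive this kind of rejection, the sketch below applies `shapiro.test` to simulated draws from a t distribution with 3 degrees of freedom. These draws are an assumption standing in for the model residuals (`residuals(fit0)` in the output above), which are not reproduced here.

```r
set.seed(5)
res <- rt(2000, df = 3)        # simulated heavy-tailed "residuals" (assumption)

sw <- shapiro.test(res)        # H0: the sample comes from a normal distribution
sw$statistic                   # W; values well below 1 point away from normality
sw$p.value                     # a small value leads to rejecting normality

# qqnorm(res); qqline(res)     # heavy tails bend away from the line at both ends
```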

4 Conclusion

Using the AIC table and a comparison with ARIMA(4,0,1), I select an ARIMA(4,0,0) model for the REIT index time series. Although the residual autocorrelations suggest that the ARIMA(4,0,0) model fits the data well, the Q-Q plot shows that the residuals have long tails at both ends, deviating from the assumption of independent normal residuals, which indicates that a more complex model is needed.

Though the ARIMA(4,0,0) model is not a good fit to the real data, it does show some main characteristics of the data. First, the REIT index has a clear increasing trend, and the trend disappears after differencing the series once. Second, the REIT index does not have a seasonal effect, but there is a cycle of about 51 trading days (a little over two months) in the sequence. Therefore, the model could serve as a reference in predicting the future.

Finally, the study shows that the REIT index series is not an independent sequence: today's index is related to its history. However, the ARIMA(4,0,0) model may not be the most accurate model, and further analysis with a more complex model is needed.

5 Reference