Licensed under the Creative Commons Attribution-NonCommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.



Objectives




6.1 Seasonal autoregressive moving average (SARMA) models



6.1.1 Question: Which of [S2] and/or [S3] is a SARMA model?




6.1.2 Question: Why do we assume a multiplicative structure in [S1]?

  • What theoretical and practical advantages (or disadvantages) arise from requiring that an ARMA model for seasonal behavior has polynomials that can be factored as a product of a monthly polynomial and an annual polynomial?
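  • One way to see what is at stake: multiplying out the polynomials for, say, a SARMA\((1,0)\times(1,0)\) model gives \[ (1-{\Phi}_1 B^{12})(1-{\phi}_1 B) = 1 - {\phi}_1 B - {\Phi}_1 B^{12} + {\phi}_1{\Phi}_1 B^{13}, \] so this SARMA model is an AR(13) model whose thirteen coefficients are constrained to depend on only two parameters.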




6.1.3 Fitting a SARMA model

  • Let’s do this for the full monthly version of the Lake Huron data described in Section 5.5.

  • First, we’ll revisit reading in the data.

system("head huron_depth.csv",intern=TRUE)
##  [1] "# downloaded on 1/24/16 from\r"                                                    
##  [2] "# http://www.glerl.noaa.gov/data/dashboard/data/levels/mGauge/miHuronMog.csv\r"    
##  [3] "# Lake Michigan-Huron:, Monthly Average Master Gauge Water Levels (1860-Present)\r"
##  [4] "# Source:, NOAA/NOS\r"                                                             
##  [5] "Date, Average\r"                                                                   
##  [6] "01/01/1860,177.285\r"                                                              
##  [7] "02/01/1860,177.339\r"                                                              
##  [8] "03/01/1860,177.349\r"                                                              
##  [9] "04/01/1860,177.388\r"                                                              
## [10] "05/01/1860,177.425\r"
dat <- read.table(file="huron_depth.csv",sep=",",header=TRUE)
dat$Date <- strptime(dat$Date,"%m/%d/%Y")
dat$year <- as.numeric(format(dat$Date, format="%Y"))
dat$month <- as.numeric(format(dat$Date, format="%m"))
head(dat)
##         Date Average year month
## 1 1860-01-01 177.285 1860     1
## 2 1860-02-01 177.339 1860     2
## 3 1860-03-01 177.349 1860     3
## 4 1860-04-01 177.388 1860     4
## 5 1860-05-01 177.425 1860     5
## 6 1860-06-01 177.461 1860     6
huron_depth <- dat$Average
time <- dat$year + dat$month/12 # Note: we treat December 2011 as time 2012.0, etc
plot(huron_depth~time,type="l")

  • Now, we get to fit a model. Based on our previous analysis, we’ll go with AR(1) for the annual polynomial. Let’s try ARMA(1,1) for the monthly part. In other words, we seek to fit the model \[ (1-{\Phi}_1 B^{12})(1-{\phi}_1 B)(Y_n-\mu) = (1+{\psi}_1 B)\epsilon_n,\] where \(\mu\) is the process mean, reported by arima as the intercept.
huron_sarma11x10 <- arima(huron_depth,
   order=c(1,0,1),
   seasonal=list(order=c(1,0,0),period=12)
)
huron_sarma11x10
## 
## Call:
## arima(x = huron_depth, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 0), 
##     period = 12))
## 
## Coefficients:
##          ar1     ma1    sar1  intercept
##       0.9641  0.3782  0.5104   176.5714
## s.e.  0.0063  0.0203  0.0218     0.0909
## 
## sigma^2 estimated as 0.002592:  log likelihood = 2884.36,  aic = -5758.72
  • Residual analysis is similar to what we’ve seen for non-seasonal ARMA models.

  • We look for residual correlations at lags corresponding to multiples of the period (here, 12, 24, 36, …), which would indicate misspecified annual dependence.

acf(resid(huron_sarma11x10))
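  • The default number of lags plotted by acf may not cover several full periods; one optional variation (not part of the analysis above) is to extend it so that lags 12, 24, 36 and 48 are all visible:

acf(resid(huron_sarma11x10),lag.max=48)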



6.1.4 Question: What do you conclude from this residual analysis? What would you do next?




6.2 ARMA models for differenced data
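  • The ARIMA\((p,1,q)\) model for \(Y_{1:N}\) specifies that the differences \(Z_n = (1-B)Y_n\) follow a stationary, causal, invertible ARMA\((p,q)\) model with mean \(\mu\), \[ {\phi}(B)\big[(1-B)Y_n - \mu\big] = {\psi}(B)\epsilon_n, \] where \({\phi}(x) = 1-{\phi}_1 x - \dots - {\phi}_p x^p\) and \({\psi}(x) = 1+{\psi}_1 x + \dots + {\psi}_q x^q\).

  • Thus, saying that \(Y_{1:N}\) follows an ARIMA\((p,1,q)\) model is almost the same as saying that the differenced data follow an ARMA\((p,q)\) model.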




6.2.1 Question: Why “almost” not “exactly” in the previous statement?




6.2.2 Why fit an ARIMA model?

  • There are two reasons to fit an ARIMA(p,1,q) model:
  1. You may really think that modeling the differences is a natural approach for your data. The S&P 500 stock market index analysis in Section 3.5 is an example of this, as long as you remember to first apply a logarithmic transform to the data (see the sketch after this list).

  2. Differencing often makes data look “more stationary” and perhaps it will then look stationary enough to justify applying the ARMA machinery.
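  • As a concrete sketch of the first reason, with y standing for a hypothetical positive-valued series such as the S&P 500 index, fitting an ARIMA(1,1,1) model to the log data amounts to fitting an ARMA(1,1) model to the log returns:

fit_arima <- arima(log(y),order=c(1,1,1))
# should give essentially the same fit, since arima with d=1
# models the differenced series with mean fixed at zero
fit_arma <- arima(diff(log(y)),order=c(1,0,1),include.mean=FALSE)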

  • We should be cautious about this second reason. It can lead to poor model specifications and hence poor forecasts or other conclusions.

  • The second reason was more compelling in the 1970s and 1980s. With limited computing power and the availability of computationally convenient (but statistically inefficient) ARMA algorithms, it made sense to force as many data analyses as possible into the ARMA framework.

  • ARIMA analysis is relatively simple to do. It has been a foundational component of time series analysis since the publication of the influential book “Time Series Analysis” by Box and Jenkins (1st edition, 1970), which developed and popularized ARIMA modeling. A practical approach is:

  1. Do a competent ARIMA analysis.

  2. Identify potential limitations in this analysis and remedy them using more advanced methods.

  3. Assess whether you have in fact learned anything from (2) that goes beyond (1).




6.2.3 Question: What is the trend of the ARIMA(p,1,q) model?

  • Hint: recall that the ARIMA(p,1,q) model specification for \(Y_{1:N}\) implies that \(Z_n = (1-B)Y_n\) is a stationary, causal, invertible ARMA(p,q) process with mean \(\mu\). Now take expectations of both sides of the difference equation.
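  • A sketch of the calculation: taking expectations of \(Z_n = Y_n - Y_{n-1}\) gives \(E[Y_n] - E[Y_{n-1}] = \mu\) for \(2\le n\le N\), and summing this telescoping identity gives \[ E[Y_n] = E[Y_1] + (n-1)\mu, \] a mean function that is linear in \(n\) with slope \(\mu\).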




6.2.4 Question: What is the trend of the ARIMA(p,d,q) model, for general \(d\)?




6.3 The SARIMA\((p,d,q)\times(P,D,Q)\) model
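  • For monthly data, the general SARIMA\((p,d,q)\times(P,D,Q)\) model for \(Y_{1:N}\) is \[ {\phi}(B){\Phi}(B^{12})\big[(1-B)^d(1-B^{12})^D Y_n - \mu\big] = {\psi}(B){\Psi}(B^{12})\epsilon_n, \] where \(\{\epsilon_n\}\) is a white noise process, \(\mu\) is the mean of the differenced process \((1-B)^d(1-B^{12})^D Y_n\), and \({\phi}(x)\), \({\Phi}(x)\), \({\psi}(x)\), \({\Psi}(x)\) are the monthly AR, annual AR, monthly MA and annual MA polynomials, respectively.

  • Such a model can be fit with arima by supplying both the order and seasonal arguments. As a sketch (the orders below are illustrative, not a recommendation for these data):

huron_sarima111x011 <- arima(huron_depth,
   order=c(1,1,1),
   seasonal=list(order=c(0,1,1),period=12)
)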




6.4 Modeling trend with ARMA noise




6.4.1 Linear regression with ARMA errors

  • When the trend function has a linear specification, \[\mu_n = \sum_{k=1}^K Z_{n,k}\beta_k,\] the signal plus ARMA noise model is known as linear regression with ARMA errors. For example, a linear trend in time corresponds to \(K=2\) with \(Z_{n,1}=1\) and \(Z_{n,2}=t_n\).

  • Writing \(Y\) for a column vector of \(Y_{1:N}\), \(\mu\) for a column vector of \(\mu_{1:N}\), \(\eta\) for a column vector of \(\eta_{1:N}\), and \(Z\) for the \(N\times K\) matrix with \((n,k)\) entry \(Z_{n,k}\), we have a general linear regression model with correlated ARMA errors, \[ Y = Z\beta + \eta.\]

  • Maximum likelihood estimation of \(\theta = ({\phi}_{1:p},{\psi}_{1:q},\sigma^2,\beta)\) is a nonlinear optimization problem. Fortunately, arima in R can do it for us, via its xreg argument, though as usual we should look out for signs of numerical problems.

  • Data analysis for a linear regression with ARMA errors model, using the framework of likelihood-based inference, is therefore procedurally similar to fitting an ARMA model.

  • This is a powerful technique, since the covariate matrix \(Z\) can include other time series. We can evaluate associations between different time series. With appropriate care (since association is not causation) we can draw inferences about mechanistic relationships between dynamic processes.




6.4.2 Example: Looking for evidence of systematic trend in the depth of Lake Huron

  • Let’s restrict ourselves to annual data, say the January depth.
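# the January depths form an annual series, with one observation per year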
monthly_dat <- subset(dat, month==1)
huron <- monthly_dat$Average
year <- monthly_dat$year
plot(x=year,y=huron,type="l")

  • Visually, there seems to be some evidence of a decreasing trend, but there are also considerable fluctuations.

  • Let’s test for a trend, using a regression model with Gaussian AR(1) errors. We have previously found that this is a reasonable model for these data.

  • First, let’s fit a null model.

fit0 <- arima(huron,order=c(1,0,0))
fit0
## 
## Call:
## arima(x = huron, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.8694   176.4588
## s.e.  0.0407     0.1234
## 
## sigma^2 estimated as 0.04368:  log likelihood = 22,  aic = -38
  • Now, we can compare with a linear trend model.
fit1 <- arima(huron,order=c(1,0,0),xreg=year)
fit1
## 
## Call:
## arima(x = huron, order = c(1, 0, 0), xreg = year)
## 
## Coefficients:
##          ar1  intercept     year
##       0.8240   186.0146  -0.0049
## s.e.  0.0451     3.7417   0.0019
## 
## sigma^2 estimated as 0.0423:  log likelihood = 24.62,  aic = -41.25
  • To talk formally about these results, we’d better write down a model and some hypotheses. Writing the data as \({y_{1:N}^*}\), collected at years \(t_{1:N}\), the model we have fitted is \[ (1-{\phi}_1 B)(Y_n - \mu - \beta t_n) = \epsilon_n,\] where \(\{\epsilon_n\}\) is Gaussian white noise with variance \(\sigma^2\). Our null model is \[ H^{\langle 0\rangle}: \beta=0,\] and our alternative hypothesis is \[ H^{\langle 1\rangle}: \beta\neq 0.\]




6.4.3 Question: How do we test \(H^{\langle 0\rangle}\) against \(H^{\langle 1\rangle}\)?

  • Construct two different tests using the R output above.

  • Which test do you prefer, and why?

  • How would you check whether your preferred test is indeed better?
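  • For reference, here is a minimal sketch of two standard candidates, computed from quantities stored in the fitted models: a Wald-type z-test based on the reported standard error, and a likelihood ratio test comparing the nested models fit0 and fit1.

# z-test: compare the estimated coefficient on year to its standard error
z <- fit1$coef["year"] / sqrt(fit1$var.coef["year","year"])
2*pnorm(-abs(z)) # two-sided p-value from a normal approximation

# likelihood ratio test: twice the gain in log likelihood is
# approximately chi-squared with 1 degree of freedom under the null
lrt <- 2*(fit1$loglik - fit0$loglik)
1-pchisq(lrt,df=1)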




6.4.4 Question: What other supplementary analysis could you do to strengthen your conclusions?