Association between Unemployment Rate and CPI of Alcoholic Beverages

Introduction
Data Exploration
- Time Plot
- Decomposition
Model
Diagnostic
Bootstrap Simulation
Conclusion
Reference

Introduction

The unemployment rate is the number of people who are unemployed as a percentage of the labour force.
The Consumer Price Index(CPI) is a frequently used statistics for identifying periods of inflation or deflation and the changes in the CPI can be used to assess price changes associated with the cost of living.
I’m curious to know whether people tend to buy more alcohol when being unemployed or just the opposite? The question motivated me to look for the two common economic indicators – the unemployment rate and the CPI for alcoholic beverage. Are these two indicators having a positive association or a negative association?

Data Exploration

Both data are downloaded from the U.S. Bureau of Larbor Statistics.

Time Plot

We first look at the time plot of two series from January 1, 1978 to January 1, 2020.

The blue line shows the CPI for all urban consumers on alcoholic beverages increases with time.
Compared to CPI, the black line, unemployment rate, is more wiggly and exhibiting several spikes. We can know that the two spikes in 2001 and 2008 indicate the U.S. recession periods.
To examine the relationship between these two series, we need to decompose the trend, noise and business cycles and assess if the cycles match. The movement of business cycles may comtain low-frequency trends and high-frequency noise, here we use Loess smoothing to extract the trends, noise, and cycles component.

Decomposition

# Function to decompose
decomp_loess = function(response, time, start_time, freq = 12){
  low = ts(loess(response ~ time, span = .5)$fitted, start = start_time,
           frequency = freq)
  high = ts(response - loess(response ~ time, span = .1)$fitted, 
            start = start_time, frequency = freq)
  cycles = ts(response - low - high, start = start_time, frequency = freq)
  ts = ts.union(response, low, high, cycles)
  colnames(ts) = c("Observed", "Trend", "Noise", "Cycles")
  return(list(cycles, ts))
}

# Decompose CPI for alcoholic beverages
decomp_alc = decomp_loess(alc, time, 1978, freq = 12)
plot(decomp_alc[[2]], main = "Decomposition of CPI as trend + noise + cycles")

# Decompose unemployment rate
decomp_unemp = decomp_loess(unemp, time, 1978, freq = 12)
plot(decomp_unemp[[2]], main = "Decomposition of unemployment as trend + noise + cycles")

We combine the cycle components of CPI for alcoholic beverages and unemployment rate into the same plot.

The figure suggests that he detrended unemployment rate and the detrended CPI for alcoholic beverages cycle together.

Model

To confirm this hypothesis, we further analyze the \(A_{1:N}\) using a regression with ARMA errors model, \[A_n = \alpha + \beta u_n + \epsilon_n\], where {\(\epsilon_n\)} is a Gausian ARMA process.

AIC Table

We will use the AIC table to choose the parameters \(p\), \(q\) of ARMA(p,q) model.

# Function for AIC table
aic_table = function(data, p, q, xreg=NULL){
  table = matrix(NA, p+1, q+1)
  for(p in 0:p) {
    for(q in 0:q) {
      table[p+1,q+1] = arima(data, order = c(p,0,q), xreg = xreg)$aic
    }
  }
  dimnames(table) = list(paste("AR", 0:p, sep = ""), paste("MA", 0:q, sep = ""))
  return(table)
}

# AIC
aic_tbl = aic_table(decomp_alc[[1]], 3, 3, xreg = decomp_unemp[[1]])

##            MA0       MA1        MA2       MA3
## AR0  1907.6641  1218.982   536.1389  -125.888
## AR1  -544.4689 -1229.396 -1875.7121 -2345.931
## AR2 -2563.2860 -3137.285 -3346.6649 -3368.524
## AR3 -3073.7108 -3367.321 -3374.3532 -3374.362

The AIC table shows that ARMA(3,3) has the lowest AIC value followed by ARMA(3,2). Since smaller model is less likely to encounter issues like parameter redundancy, causality, and invertibility. We will leave out ARMA(3,3) and consider ARMA(3,2) for further discussion.

ARMA(3,2)

alcohol_cycles = decomp_alc[[1]]
unemploy_cycles = decomp_unemp[[1]]
arma32 = arima(alcohol_cycles, xreg = unemploy_cycles, order = c(3,0,2))
cat("Polyroot for AR Coefficient: ", abs(polyroot(c(1,-arma32$coef[1:3]))),
    "\nPolyroot for MA Coefficient: ", abs(polyroot(c(1,arma32$coef[4:5]))))

## Polyroot for AR Coefficient:  1.022995 1.022995 1.808804 
## Polyroot for MA Coefficient:  1.241376 3.804177

The result shows that ARMA(3,2) is causality and invertibility.

## 
## Call:
## arima(x = alcohol_cycles, order = c(3, 0, 2), xreg = unemploy_cycles)
## 
## Coefficients:
##          ar1      ar2     ar3     ma1     ma2  intercept  unemploy_cycles
##       2.5017  -2.0330  0.5283  1.0684  0.2118    -0.0063           0.8671
## s.e.  0.0595   0.1179  0.0590  0.0686  0.0699     0.2831           0.0903
## 
## sigma^2 estimated as 6.977e-05:  log likelihood = 1695.18,  aic = -3374.35

The standard errors suggest a statistically significant association between cyclical variation in unemployment and CPI for alcoholic beverages.

Likelihood Ratio Test

We can also test the significance of coefficients from the likelihood ratio test. Suppose we have two nested hypotheses \[ \begin{eqnarray*} H^{\langle 0\rangle} &:& \theta\in \Theta^{\langle 0\rangle}, \\ H^{\langle 1\rangle} &:& \theta\in \Theta^{\langle 1\rangle}, \end{eqnarray*} \] defined via two nested parameter subspaces, \(\Theta^{\langle 0\rangle}\subset \Theta^{\langle 1\rangle}\), with respective dimensions \(D^{\langle 0\rangle}< D^{\langle 1\rangle}\le D\).

We consider the log likelihood maximized over each of the hypotheses, \[ \begin{eqnarray*} \ell^{\langle 0\rangle} &=& \sup_{\theta\in \Theta^{\langle 0\rangle}} \ell(\theta), \\ \ell^{\langle 1\rangle} &=& \sup_{\theta\in \Theta^{\langle 1\rangle}} \ell(\theta). \end{eqnarray*} \] A useful approximation asserts that, under the hypothesis \(H^{\langle 0\rangle}\), \[ \ell^{\langle 1\rangle} - \ell^{\langle 0\rangle} \approx (1/2) \chi^2_{D^{\langle 1\rangle}- D^{\langle 0\rangle}}, \] where \(\chi^2_d\) is a chi-squared random variable on \(d\) degrees of freedom and \(d\) = 1 in our case.

We now test whether \(\alpha\) = 0 or not.

# test alpha
loglik = as.numeric(
  logLik(arima(decomp_alc[[1]], xreg = decomp_unemp[[1]], order = c(3,0,2))) 
  - logLik(arima(decomp_alc[[1]], xreg = decomp_unemp[[1]], order = c(3,0,2),
                 include.mean = FALSE))
)
cat("P-value: ", 1 - pchisq(2*loglik, df = 1))

## P-value:  0.9817706

The p-value of 0.9817706 suggests that we can not reject the null hypothesis. Hence, \(\alpha\) should be 0.

We also test whether \(\beta\) = 0 or not.

# test beta
loglik = as.numeric(
  logLik(arima(decomp_alc[[1]], xreg = decomp_unemp[[1]], order = c(3,0,2))) 
  - logLik(arima(decomp_alc[[1]], order = c(3,0,2))))

# p-value
cat("P-value: ", 1 - pchisq(2*loglik, df = 1))

## P-value:  0

The p-value of 0 suggests that we can reject the null hypothesis at level \(\alpha=.05\) and suggest that \(\beta \neq 0\).

ARMA(3,2) with α = 0

Based on the result, we adjust our model to \(A_n = \alpha + \beta u_n + \epsilon_n\), where {\(\epsilon_n\)} is a Gausian ARMA(3,2) process.

arma32_adj = arima(alcohol_cycles, xreg = unemploy_cycles, order = c(3,0,2),
                   include.mean = FALSE)

## 
## Call:
## arima(x = alcohol_cycles, order = c(3, 0, 2), xreg = unemploy_cycles, include.mean = FALSE)
## 
## Coefficients:
##          ar1      ar2     ar3     ma1     ma2  unemploy_cycles
##       2.5016  -2.0329  0.5282  1.0684  0.2117           0.8670
## s.e.  0.0595   0.1179  0.0590  0.0686  0.0699           0.0902
## 
## sigma^2 estimated as 6.977e-05:  log likelihood = 1695.18,  aic = -3376.35

## Polyroot for AR Coefficient:  1.022995 1.022995 1.808804 
## Polyroot for MA Coefficient:  1.241376 3.804177

All the coefficients are significant and the model remains causal and invertible.

Diagnostic

We proceed to check the residuals for the fitted model and inspect their sample autocorrelation.

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,0,2) with zero mean
## Q* = 319.99, df = 18, p-value < 2.2e-16
## 
## Model df: 6.   Total lags used: 24

The residuals appear to have the issue of heteroskedasticity due to the large fluctuations around 1990.
The ACF plot indicates sample autocorrelation since several lags are outside the dashed line.
According to Ljung-Box test, the residuals are significantly different from white noise at level \(\alpha = .05\).

Bootstrap Simulation

# set up
B = 500
ar = c(2.5016252, -2.0328500, 0.5282242)
ma = c(1.0684262, 0.2117450)
xreg_coef = 0.8669930
sigma = sqrt(arma32_adj$sigma2)

theta = matrix(NA, nrow = B, ncol = 6, 
               dimnames = list(NULL, names(coef(arma32_adj))))
s = rep(NA, B)

# Bootstrap B times
for(b in 1:B){
  x = ts(arima.sim(list(ar, ma), n = length(alcohol_cycles), sd = sigma),
         start = start_tm, frequency = 12) + xreg_coef*unemploy_cycles
  mod = arima(x, order = c(3, 0, 2), xreg = unemploy_cycles, include.mean = F)
  theta[b, ] = coef(mod)
  s[b] = var(mod$residuals)
}

# s.e.
sqrt(diag(var(theta)))

##             ar1             ar2             ar3             ma1 
##    0.7397657951    0.6299757690    0.0580636189    0.7482964335 
##             ma2 unemploy_cycles 
##    0.6432908230    0.0003844257

## Bootstrap Confidence Interval:

##                       2.5%     97.5%
## ar1             -1.5317233 1.5621234
## ar2             -1.0155861 0.8179436
## ar3             -0.1135062 0.1066840
## ma1             -1.6047708 1.5429964
## ma2             -0.8580770 0.9999945
## unemploy_cycles  0.8662586 0.8677405

## Fisher Information:

##                        2.5%      97.5%
## ar1              2.38509992  2.6181504
## ar2             -2.26396832 -1.8017317
## ar3              0.41260831  0.6438400
## ma1              0.93398091  1.2028714
## ma2              0.07469427  0.3487957
## unemploy_cycles  0.69026225  1.0437238

Unfortunately, the result is not robust.

Conclusion

Based on the summary table of the model, we know that the unemployment rate and the CPI of alcoholic beverages have a positive relationship. People tend to purchase more alcohols when being unemployed.
However, we have to be cautious about the exsitence of confounding variables, meaning that the unemployment rate may not directly cause the CPI of alcoholic beverages to increase since it is likely to have a third variable such as social status or mental health that is omitted from this analysis.
We only look at the association of two indicators from the mid-range frequency, the cycle components, of the data. At this point, the model doesnot pass the Gaussian white noise assumption. I think the possible reasons include large variation from the economic recession period and the remaining seasonality. Thus, other filters or more complicated model should be considered in the future analysis to improve the goodness of fit.

Reference

Ionides, E. (n.d.). Stats 531 (Winter 2020) “Analysis of Time Series”. Retrieved from https://ionides.github.io/531w20/
Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. OTexts.com/fpp2. https://otexts.com/fpp2/
Ganong, P., & Noel, P. (2015). How does unemployment affect consumer spending?. Unpublished manuscript, Harvard Univ., Cambridge, MA. http://scholar. harvard. edu/files/ganong/files/ganongmpnemploymentpending. pdf Article Location.
Estrella, A. (2007). Extracting business cycle fluctuations: what do time series filters really do?. FRB of New York Staff Report, (289).
The Association Between Home Purchase Prices and Effective Interest Rates. Retrieved from https://ionides.github.io/531w18/midterm_project/project21/midterm_project.html#forecasted-value-versus-the-original-value
Midterm Project. Retrieved from https://ionides.github.io/531w18/midterm_project/project42/Midterm_Project.html
Case study: An association between unemployment and mortality? Retrieved from http://ionides.github.io/531w16/notes10/notes10.html