Introduction


Dataset

s_data = read.csv(file="us_suicide.csv", header=T, sep=",")
head(s_data)
##    Year      U.S.
## 1 X1950 11.221060
## 2 X1951 10.248620
## 3 X1952  9.890234
## 4 X1953  9.985363
## 5 X1954 10.051450
## 6 X1955 10.125830
u_data <- read.table(file="adjusted_unemployment.csv",sep=",",header=TRUE)
head(u_data)
##   Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 1948 3.4 3.8 4.0 3.9 3.5 3.6 3.6 3.9 3.8 3.7 3.8 4.0
## 2 1949 4.3 4.7 5.0 5.3 6.1 6.2 6.7 6.8 6.6 7.9 6.4 6.6
## 3 1950 6.5 6.4 6.3 5.8 5.5 5.4 5.0 4.5 4.4 4.2 4.2 4.3
## 4 1951 3.7 3.4 3.4 3.1 3.0 3.2 3.1 3.1 3.3 3.5 3.5 3.1
## 5 1952 3.2 3.1 2.9 2.9 3.0 3.0 3.2 3.4 3.1 3.0 2.8 2.7
## 6 1953 2.9 2.6 2.6 2.7 2.5 2.5 2.6 2.7 2.9 3.1 3.5 4.5
t <- c(1950:2005)
s <- s_data$U.S.
u <- apply(u_data[u_data$Year %in% t, 2:13],1,mean)
plot(ts(cbind(s,u),start=1950),main="Percent unemployment(u) and suicide rate(s) for U.S.",xlab="Year")


Detrending the data using the Hodrick-Prescott (HP) filter

\[{s_{1:N}^*} = \arg\min_{s_{1:N}} \left\{ \sum^{N}_{n=1}\big(y_n-s_{n}\big)^2 + \lambda\sum^{N-1}_{n=2}\big(s_{n+1}-2s_{n}+s_{n-1}\big)^2 \right\},\] where \(y_{1:N}\) is the observed series and \(s_{1:N}^*\) is the extracted trend. The cycle (detrended) component is \(y_n - s_n^*\). For annual data the standard smoothing parameter is \(\lambda = 100\), set via freq=100 with type="lambda" below.
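Since the objective above is quadratic in \(s_{1:N}\), the HP trend has a closed-form solution. A minimal base-R sketch (the helper names hp_trend and hp_cycle are illustrative; the analysis below uses mFilter::hpfilter):

```r
# Direct solution of the HP filter: the minimizer satisfies
# (I + lambda * D'D) s = y, where D is the second-difference matrix.
hp_trend <- function(y, lambda = 100) {
  N <- length(y)
  D <- diff(diag(N), differences = 2)         # (N-2) x N second-difference matrix
  solve(diag(N) + lambda * crossprod(D), y)   # closed-form minimizer s*
}

# The cycle (detrended) component is the residual from the trend.
hp_cycle <- function(y, lambda = 100) y - hp_trend(y, lambda)
```

As a sanity check, a linear series has zero second differences, so its HP trend reproduces the series exactly and its cycle is zero.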

require(mFilter)
s_hp <- hpfilter(s, freq=100,type="lambda",drift=F)$cycle
u_hp <- hpfilter(u, freq=100,type="lambda",drift=F)$cycle
plot(t,u_hp,type="l",xlab="Year",ylab="", main="Detrended unemployment and detrended suicide rate")
par(new=TRUE)
plot(t,s_hp,col="red",type="l",axes=FALSE,xlab="",ylab="")
axis(side=4, col="red")
legend(2000,0.85, c("u_hp","s_hp"),lwd=c(1,1),col=c("black","red"),cex=0.5)


Model Selection

AIC Table

  • We can carry out a formal test of whether the detrended suicide rate and detrended unemployment share the same cycles.

  • For example, we can analyze \(s^{HP}_{1:N}\) using a linear regression with ARMA errors, \[ s^{HP}_n = \alpha + \beta u^{HP}_n + \epsilon_n,\] where \(\{\epsilon_n\}\) is a Gaussian ARMA process.

  • To choose the orders \(p\) and \(q\) of the ARMA(p,q) error model, we tabulate the AIC values below.
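Recall that AIC \(= -2 \times\) (maximized log likelihood) \(+\, 2 \times\) (number of parameters). A minimal check on simulated ARMA data (the simulated series and orders are illustrative; for an ARMA(1,1) fit with an intercept, arima counts 4 parameters, including \(\sigma^2\)):

```r
set.seed(531)
x <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 200)  # illustrative series
fit <- arima(x, order = c(1, 0, 1))
# k = ar1 + ma1 + intercept + sigma^2 = 4 free parameters
k <- length(coef(fit)) + 1
fit$aic                                  # AIC as reported in the table
-2 * as.numeric(logLik(fit)) + 2 * k     # same value, computed by hand
```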

aic_table <- function(data,P,Q,xreg=NULL){
  table <- matrix(NA,(P+1),(Q+1))
  for(p in 0:P) {
    for(q in 0:Q) {
       table[p+1,q+1] <- arima(data,order=c(p,0,q),xreg=xreg)$aic
    }
  }
  dimnames(table) <- list(paste("AR",0:P,sep=""),paste("MA",0:Q,sep=""))
  table
}
e_aic_table <- aic_table(s_hp,4,4,xreg=u_hp)
## Warning in arima(data, order = c(p, 0, q), xreg = xreg): possible
## convergence problem: optim gave code = 1
require(knitr)
kable(e_aic_table,digits=2)
         MA0    MA1    MA2    MA3    MA4
AR0    13.73   9.32  10.67  11.45  13.35
AR1     8.62  10.61  12.56  13.38  15.35
AR2    10.61  12.62  -1.40   9.89   8.24
AR3    12.34   2.30  -1.00  10.12   0.71
AR4    12.17   1.92   2.51   0.56   2.07
  • The warning messages indicate numerical difficulties in maximizing the log likelihood for some of the fits.

  • The AIC table suggests that the model with ARMA(2,2) errors is the best small model.

arima(s_hp,xreg=u_hp,order=c(2,0,2))
## 
## Call:
## arima(x = s_hp, order = c(2, 0, 2), xreg = u_hp)
## 
## Coefficients:
##          ar1      ar2      ma1     ma2  intercept    u_hp
##       1.7669  -0.9033  -1.9978  0.9997     0.0056  0.0920
## s.e.  0.0525   0.0519   0.1444  0.1445     0.0014  0.0427
## 
## sigma^2 estimated as 0.03703:  log likelihood = 7.7,  aic = -1.4
  • To check the causality and invertibility of the ARMA(2,2) model, we compute the roots of the AR and MA polynomials using the polyroot function.
AR_root = polyroot(c(1,-1.7669,0.9033))
abs(AR_root)
## [1] 1.052165 1.052165
MA_root = polyroot(c(1,-1.9978,0.9997))
abs(MA_root)
## [1] 1.00015 1.00015
  • The absolute values show that all roots lie outside the unit circle, so the fitted ARMA(2,2) model is causal and invertible. Note, however, that the MA roots are only just outside the unit circle, hinting at possible parameter redundancy.

Residual Analysis

  • Now, let’s check the residuals for the fitted model, and look at their sample autocorrelation.
r = resid(arima(s_hp, xreg=u_hp, order=c(2,0,2)))
plot(r, type ="p")
abline(h=0, lty=2)

  • We can see two extreme outliers, in 1950 and 1977.

  • There is slight evidence that the fluctuations decrease in amplitude over time, but it is not a serious concern here.

qqnorm(r)

  • The QQ-plot shows a somewhat heavy right tail.

  • We may want to check what happened in 1950 and 1977.

acf(r)

  • The residuals appear to be uncorrelated, since the ACF at each lag falls between the dotted confidence lines.

  • We may conclude the residuals are Gaussian white noise.
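The visual ACF check can be backed up by a portmanteau test. A minimal sketch using base R's Box.test, run here on simulated Gaussian noise standing in for the model residuals, with an illustrative lag of 10; on the real residuals one would pass resid(arima(s_hp, xreg=u_hp, order=c(2,0,2))) and set fitdf = p + q = 4:

```r
set.seed(531)
r_sim <- rnorm(56)   # placeholder for the 56 residuals from 1950-2005
# Ljung-Box test of H0: no autocorrelation up to lag 10;
# a large p-value is consistent with white-noise residuals.
Box.test(r_sim, lag = 10, type = "Ljung-Box")
```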

Analysis of Temporal Differences

  • Alternatively, we may fit an ARIMA(p,1,q) model to the annual changes in the suicide rate, rather than to deviations from an estimated trend.

  • In this case, we consider the variable \[ \Delta {s_n^*} = {s_n^*} - {s_{n-1}^*}.\]

delta_s <- diff(s)
delta_t <- 1951:2005
plot(t,u,type="l",xlab="Year",ylab="")
par(new=TRUE)
plot(delta_t,delta_s,col="red",type="l",axes=FALSE,xlab="",ylab="")
axis(side=4,col="red")

  • The relationship between unemployment and the differenced suicide rate is harder to see than in the HP-detrended series.

  • The relationship is also harder to find by statistical methods.

arima(s, xreg=u, order=c(2,1,2))
## 
## Call:
## arima(x = s, order = c(2, 1, 2), xreg = u)
## 
## Coefficients:
##          ar1      ar2      ma1     ma2       u
##       0.4928  -0.6413  -0.7707  1.0000  0.1624
## s.e.  0.1515   0.1264   0.0762  0.0955  0.0202
## 
## sigma^2 estimated as 0.07786:  log likelihood = -9.9,  aic = 31.8
  • Here, we see no evidence for the relationship.

  • An experiment that finds no evidence of an effect may have been a poor choice of experiment, or may have been carried out poorly.

  • It appears that temporal differencing has destroyed too much of the evidence we were hoping it would clarify.


Fitted Model

arima(s_hp, xreg=u_hp, order=c(2,0,2))
## 
## Call:
## arima(x = s_hp, order = c(2, 0, 2), xreg = u_hp)
## 
## Coefficients:
##          ar1      ar2      ma1     ma2  intercept    u_hp
##       1.7669  -0.9033  -1.9978  0.9997     0.0056  0.0920
## s.e.  0.0525   0.0519   0.1444  0.1445     0.0014  0.0427
## 
## sigma^2 estimated as 0.03703:  log likelihood = 7.7,  aic = -1.4
  • A likelihood ratio test for \(\beta = 0\) (1 degree of freedom) gives a very small p-value, supporting an association between detrended unemployment and the detrended suicide rate.

log_lik_ratio <- as.numeric(
   logLik(arima(s_hp,xreg=u_hp,order=c(2,0,2))) -
   logLik(arima(s_hp,order=c(2,0,2)))
)
LRT_pval <- 1-pchisq(2*log_lik_ratio,df=1)
cat(LRT_pval)
## 3.506149e-06
  • To check the stability of the fit, we refit the model using only the first 30 years of data. The coefficients change substantially (the ar1 coefficient even flips sign), which casts some doubt on the ARMA(2,2) specification.

tt = 1:30
arima(s_hp[tt], xreg=u_hp[tt], order=c(2,0,2))
## 
## Call:
## arima(x = s_hp[tt], order = c(2, 0, 2), xreg = u_hp[tt])
## 
## Coefficients:
##           ar1      ar2     ma1     ma2  intercept  u_hp[tt]
##       -1.4539  -0.6748  1.9948  0.9999     0.0439    0.2554
## s.e.   0.1496   0.1349  0.2926  0.2926     0.0470    0.0420
## 
## sigma^2 estimated as 0.04065:  log likelihood = 1.82,  aic = 10.37
  • Since the MA roots of the ARMA(2,2) fit lie almost on the unit circle, suggesting possible parameter redundancy, we also fit the simpler model with AR(1) errors.

arima(s_hp, xreg=u_hp, order=c(1,0,0))
## 
## Call:
## arima(x = s_hp, order = c(1, 0, 0), xreg = u_hp)
## 
## Coefficients:
##          ar1  intercept    u_hp
##       0.3707     0.0097  0.1266
## s.e.  0.1340     0.0514  0.0386
## 
## sigma^2 estimated as 0.05905:  log likelihood = -0.31,  aic = 8.62
  • The likelihood ratio test for \(\beta = 0\) under the AR(1) error model also gives a small p-value, so the conclusion does not hinge on the ARMA(2,2) specification.

log_lik_ratio <- as.numeric(
   logLik(arima(s_hp,xreg=u_hp,order=c(1,0,0))) -
   logLik(arima(s_hp,order=c(1,0,0)))
)
LRT_pval <- 1-pchisq(2*log_lik_ratio,df=1)
cat(LRT_pval)
## 0.00173612

Conclusions



References

Lindsay Lee and Max Roser (2015) - ‘Suicide’. Published online at OurWorldInData.org. Retrieved from: http://ourworldindata.org/data/health/suicide/ [Online Resource]