There were approximately 804,000 suicides globally in 2012, which is an age-standardized suicide rate of 11.4 per 100,000 people. The WHO estimates every 40 seconds someone dies of suicide. It is the second leading cause of death for people aged 15-29 globally.
Pinning down the exact causes of suicide is extremely difficult and controversial. It’s seems that the suicide rate has strong correlations with many social factors like economics, education, Drug and Alcohol use, gun prevalence and divorce law.
Here, we are interested in finding the correlation between the suicide rate and unemployment in U.S..
s_data = read.csv(file="us_suicide.csv", header=T, sep=",")
head(s_data)
## Year U.S.
## 1 X1950 11.221060
## 2 X1951 10.248620
## 3 X1952 9.890234
## 4 X1953 9.985363
## 5 X1954 10.051450
## 6 X1955 10.125830
The suicide rate is 100,000 per year. The rate is adjusted for the changing age structure of the population.
One can also break down suicide rate by gender, race, geography, etc. The data can be found online. It is interesting and relevant to investigate the consistency of any established pattern across sub-populations, but here we’ll focus on a single, national analysis, combining both genders.
u_data <- read.table(file="adjusted_unemployment.csv",sep=",",header=TRUE)
head(u_data)
## Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 1948 3.4 3.8 4.0 3.9 3.5 3.6 3.6 3.9 3.8 3.7 3.8 4.0
## 2 1949 4.3 4.7 5.0 5.3 6.1 6.2 6.7 6.8 6.6 7.9 6.4 6.6
## 3 1950 6.5 6.4 6.3 5.8 5.5 5.4 5.0 4.5 4.4 4.2 4.2 4.3
## 4 1951 3.7 3.4 3.4 3.1 3.0 3.2 3.1 3.1 3.3 3.5 3.5 3.1
## 5 1952 3.2 3.1 2.9 2.9 3.0 3.0 3.2 3.4 3.1 3.0 2.8 2.7
## 6 1953 2.9 2.6 2.6 2.7 2.5 2.5 2.6 2.7 2.9 3.1 3.5 4.5
These are the unemployment data that we looked at in Chapter 8.
Denote the time \(t_n = 1949+n\), \(t_n=1,...,56\).
Write \({s_n^*}\) for suicide rate in year \(t_n\).
Write \({u_n^*}\) for mean unemployment in year \(t_n\).
t <- c(1950:2005)
s <- s_data$U.S.
u <- apply(u_data[u_data$Year %in% t, 2:13],1,mean)
plot(ts(cbind(s,u),start=1950),main="Percent unemployment(u) and suicide rate(s) for U.S.",xlab="Year")
We are interested in changes over business cycle timescales, once trends have been removed.
To extract the cyclical component, we use an econometric method called the Hodrick-Prescott(HP) filter.
Specifically, for a time series \({y_{1:N}^*}\), the HP filter is the time series \({s_{1:N}^*}\) constructed as
\[{s_{1:N}^*} = \arg\min_{s_{1:N}} \left\{ \sum^{N}_{n=1}\big({y_n^*}-s_{n}\big)^2 + \lambda\sum^{N-1}_{n=2}\big(s_{n+1}-2s_{n}+s_{n-1}\big)^2 \right\}.\]
A standard econometric choice of \(\lambda\) for removing nonlinear trend, and therefore extracting the business cycle component, in annual data is \(\lambda=100\).
We use this to define the HP-detrended suicide rate, \(s^{HP*}_{1:N}\), and unemployment, \(u^{HP*}_{1:N}\).
require(mFilter)
s_hp <- hpfilter(s, freq=100,type="lambda",drift=F)$cycle
u_hp <- hpfilter(u, freq=100,type="lambda",drift=F)$cycle
plot(t,u_hp,type="l",xlab="Year",ylab="", main="Detrended unemployment and detrended suicide rate")
par(new=TRUE)
plot(t,s_hp,col="red",type="l",axes=FALSE,xlab="",ylab="")
axis(side=4, col="red")
legend(2000,0.85, c("u_hp","s_hp"),lwd=c(1,1),col=c("black","red"),cex=0.5)
We can make a test to check that the detrended suicide rate and unemployment are with the same cycle.
For example, we can analyze \(s^{HP*}_{1:N}\) using a regression with ARMA errors model, \[ S^{HP}_n = \alpha + \beta u^{HP}_n + \epsilon_n,\] where \(\{\epsilon_n\}\) is a Gaussian ARMA process.
To choose the parameters \(p\), \(q\) of ARMA(p,q) model, we give out the AIC table as below.
aic_table <- function(data,P,Q,xreg=NULL){
table <- matrix(NA,(P+1),(Q+1))
for(p in 0:P) {
for(q in 0:Q) {
table[p+1,q+1] <- arima(data,order=c(p,0,q),xreg=xreg)$aic
}
}
dimnames(table) <- list(paste("<b> AR",0:P, "</b>", sep=""),paste("MA",0:Q,sep=""))
table
}
e_aic_table <- aic_table(s_hp,4,4,xreg=u_hp)
## Warning in arima(data, order = c(p, 0, q), xreg = xreg): possible
## convergence problem: optim gave code = 1
require(knitr)
kable(e_aic_table,digits=2)
MA0 | MA1 | MA2 | MA3 | MA4 | |
---|---|---|---|---|---|
AR0 | 13.73 | 9.32 | 10.67 | 11.45 | 13.35 |
AR1 | 8.62 | 10.61 | 12.56 | 13.38 | 15.35 |
AR2 | 10.61 | 12.62 | -1.40 | 9.89 | 8.24 |
AR3 | 12.34 | 2.30 | -1.00 | 10.12 | 0.71 |
AR4 | 12.17 | 1.92 | 2.51 | 0.56 | 2.07 |
We can see from the warning messages that we have numerical difficulties in optimizing the logliklihood function.
The AIC table suggests that the model with ARMA(2,2) erroes is the best small model.
arima(s_hp,xreg=u_hp,order=c(2,0,2))
##
## Call:
## arima(x = s_hp, order = c(2, 0, 2), xreg = u_hp)
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept u_hp
## 1.7669 -0.9033 -1.9978 0.9997 0.0056 0.0920
## s.e. 0.0525 0.0519 0.1444 0.1445 0.0014 0.0427
##
## sigma^2 estimated as 0.03703: log likelihood = 7.7, aic = -1.4
polyroot
function.AR_root = polyroot(c(1,-1.7669,0.9033))
abs(AR_root)
## [1] 1.052165 1.052165
MA_root = polyroot(c(1,-1.9978,0.9997))
abs(MA_root)
## [1] 1.00015 1.00015
r = resid(arima(s_hp, xreg=u_hp, order=c(2,0,2)))
plot(r, type ="p")
abline(h=0, lty=2)
We can see two extreme outliers in year 1950 and 1977.
There is a little evidence for fluctuations decreasing in amplitude over time. But not serious here.
qqnorm(r)
The QQ-plot shows a few heavy right tails.
We may want to check what happened in 1950 and 1977.
acf(r)
The residuals appear to be uncorrealated, since ACF at each lag falls betwwen the dotted lines.
We may conclude the residuals are Gaussian white noise.
We may think of an ARIMA(p,1,q) model to fit the annual changes in suicide rate, rather than difference from a trend.
In this case, we consider the variable \[ \Delta {s_n^*} = {s_n^*} - {s_{n-1}^*}.\]
delta_s <- s[2:56] - s_data$U.S.[1:55]
delta_t <- c(1951:2005)
plot(t,u,type="l",xlab="Year",ylab="")
par(new=TRUE)
plot(delta_t,delta_s,col="red",type="l",axes=FALSE,xlab="",ylab="")
axis(side=4,col="red")
The relationship between unemployment and differenced life expectancy is harder to see than when HP-detrended.
The relationship is also harder to find by statistical methods.
arima(s, xreg=u, order=c(2,1,2))
##
## Call:
## arima(x = s, order = c(2, 1, 2), xreg = u)
##
## Coefficients:
## ar1 ar2 ma1 ma2 u
## 0.4928 -0.6413 -0.7707 1.0000 0.1624
## s.e. 0.1515 0.1264 0.0762 0.0955 0.0202
##
## sigma^2 estimated as 0.07786: log likelihood = -9.9, aic = 31.8
Here, we see no evidence for the relationship.
An experiment found no evidence of an effect might have been a bad choice of experiment, or might have been carried out poorly.
It appears that temporal differencing has destroyed too much of the evidence we were hoping it would clarify.
arima(s_hp, xreg=u_hp, order=c(2,0,2))
##
## Call:
## arima(x = s_hp, order = c(2, 0, 2), xreg = u_hp)
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept u_hp
## 1.7669 -0.9033 -1.9978 0.9997 0.0056 0.0920
## s.e. 0.0525 0.0519 0.1444 0.1445 0.0014 0.0427
##
## sigma^2 estimated as 0.03703: log likelihood = 7.7, aic = -1.4
The fitted model here is \[s^{HP}_n-1.7669s^{HP}_{n-1}+0.9033s^{HP}_{n-2} = 0.0056 + 0.0920 u^{HP}_n + \epsilon_n-1.9978\epsilon_{n-1}+0.9997\epsilon_{n-2}\]
Although this model fits good, the coefficients are hard to interpret.
The standard errors, computed from the observed Fisher information approximation, suggset a statistically significant association between cyclical variation in unemployment and suicide rate.
We can also compute a p-value from a likelihood ratio test.
log_lik_ratio <- as.numeric(
logLik(arima(s_hp,xreg=u_hp,order=c(2,0,2))) -
logLik(arima(s_hp,order=c(2,0,2)))
)
LRT_pval <- 1-pchisq(2*log_lik_ratio,df=1)
cat(LRT_pval)
## 3.506149e-06
tt = 1:30
arima(s_hp[tt], xreg=u_hp[tt], order=c(2,0,2))
##
## Call:
## arima(x = s_hp[tt], order = c(2, 0, 2), xreg = u_hp[tt])
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept u_hp[tt]
## -1.4539 -0.6748 1.9948 0.9999 0.0439 0.2554
## s.e. 0.1496 0.1349 0.2926 0.2926 0.0470 0.0420
##
## sigma^2 estimated as 0.04065: log likelihood = 1.82, aic = 10.37
There is some suggestion that the association is stronger in the time period 1950-1980, but difference is not large compared to the standard error on the coefficient.
It seems reasonable that the relationship between suicide rate and the economy is changing over time because of the changes of human beings’ values and life styles.
Since the ARMA(2,2) model is hard to interpret and seems to be unstable, we may want to choose another simpler model say ARMA(1,0) from the AIC table.
arima(s_hp, xreg=u_hp, order=c(1,0,0))
##
## Call:
## arima(x = s_hp, order = c(1, 0, 0), xreg = u_hp)
##
## Coefficients:
## ar1 intercept u_hp
## 0.3707 0.0097 0.1266
## s.e. 0.1340 0.0514 0.0386
##
## sigma^2 estimated as 0.05905: log likelihood = -0.31, aic = 8.62
The fitted model is \[s^{HP}_n-0.3707s^{HP}_{n-1} = 0.0097 + 0.1266 u^{HP}_n\]
We can compute a p-value from a likelihood ratio test.
log_lik_ratio <- as.numeric(
logLik(arima(s_hp,xreg=u_hp,order=c(1,0,0))) -
logLik(arima(s_hp,order=c(1,0,0)))
)
LRT_pval <- 1-pchisq(2*log_lik_ratio,df=1)
cat(LRT_pval)
## 0.00173612
We check that detrended suicide rate and detrended unemployment cycle together.
There is clear evidence of pro-cyclical suicide rate at a notional level in the US. For example, in 1981-1983 and 1991-1992, we have high unemployment but below-trend suicide rate.
More data, perhaps a state-level or panel analysisi combining many time series, might be able to improve the signal to noise ration and lead to clearer results.
To talk about the association, it is plausible that economic fluctuations drive the suicide fluctuations. (the reverse causation can not be true)
We regard unemployment as a proxy variable for economic fluctuations. We do not claim that increased unemployment itself is necessarily directly causing reduced mortality.
The association we have found is clearest with no time lag, and (as we have seen previously) economic fluctuations between periods of boom and bust have historically had quite variable duration. A stable lagged relationship between economic activity and suicide rate has not yet been discovered.
Beside economic recession, the suicide rate may has association with education, Drug and Alcohol use, gun prevalence and divorce law, which needs further discussions.
Lindsay Lee and Max Roser (2015) - ‘Suicide’. Published online at OurWorldInData.org. Retrieved from: http://ourworldindata.org/data/health/suicide/ [Online Resource]