Cholera was introduced to Haiti in 2010 the first time of its history likely by UN peacekeepers from Nepal. Started in central Haiti, the cholera outbreak in 2010 rapidly triggered the world’s largest epidemic of the seventh cholera pandemic (Chin et al. 2011; Eppinger et al. 2014). Five years later, people now in Haiti are still suffering from cholera, and more than 9000 people have therefore died since the outbreak in 2010 (Kirpich et al. 2015). Even though many studies gave profound insights into the mechanism initiating the epidemic in 2010 (Rinaldo et al. 2012; Gaudart et al. 2013; Eisenberg et al. 2013), little attention has been paid to the recent epidemics. However, since the cholera in Haiti may have become endemic, the transmission dynamics can be very different from the initial outbreak. Moreover, followed by withdrawal of fund and aid, the situation in Haiti is getting even more difficult. Therefore, in order to develop a timely and sustainable countermeasures agains cholera in Haiti, better understanding for the underlying mechanisms of recent cholera outbreaks is crucial.
There are plenty of potential factors will affect cholera transmission dynamics, such as rainfall, temperature, waning of population immunity, etc (Koelle 2009). I start with the most widely studied one–the relationship between rainfall and cholera incidence. Lots of related studies have been done, and people found out both negative and positive effect of rainfall on cholera incidence through different mechanisms. For example, flooding can lead to contamination of drinking water supplies; whereas, lack of water will increase the chance for people to use contaminated water source. Either way can increase the risk of cholera transmission. In this project, I tried some preliminary time series analysis for cholera incidence and rainfall data from January 2012 to December 2015 in Artibonite, the biggest department in Haiti (Fig 1). I hope to get some basic ideas about the association between rainfall and cholera incidence as well as whether rainfall would be a potential driver for recent cholera epidemic in Haiti.
Case Data Cholera daily incidence by departments is available on Ministere de la Sante Publique et de la Population (MSPP) website. Report is published everyday in pdf formate online. From 2012 to 2015, 120 dates are missing in the database, and there are data points that appear to be missing at random, with some of the data missing throughout large segments in time, particularly in October 2013 and June 2014. These missing data may bias the outcomes given the non-systematic distribution of the size of the missing data in time. Data crawling and parsing are managed using python 2.7.10
.
Rainfall Data Area averaged rainfall estimates from the National Aeronautics and Space Administration (NASA) Tropical Rainfall Measuring Mission (TRMM) are used in the project. Geographical coverage is selected using longitude [-73.120633, -72.226589] and latitude [18.838958, 19.805066] with resolutions of 0.25° x 0.25°.
(Usually, weekly aggregated data is better for visualization and sesonality analysis, but clustered missing data makes it hard to do sensible aggregation. Therefore, the following analysis is done with daily data, unless specified.)
To better define correlation between rainfall and cholera incidence, both raw cross-correlation function (CCF) and pre-whitening CCF are plotted.
ccf
function is used to examine the raw cross-correlation between rainfall and cholera incidence. Sinusoidal pattern is observed in Fig 5, which suggests complex (likely non-linear) correlation between rainfall and cholera incidence. It seems that rainfall and cholera incidence has most strong (positive) correlation when rainfall leads cholera incidence ( \(h < = 0\).) However, it is hard to say if the pattern of lags is meaningful from Fig4. Since both rainfall and cholera cases show annual seasonality, it could be just that there is a common (seasonal) trend in rainfall data and cholera incidence data.
Pre-whitening approach is thus used to better identify the pattern of CCF. By taking out the common trend between two time serieses and making one of the serieses into white noise , pre-whitening makes it easier to determine specific lags. I used prewhiten
function in TSA
package (Chan & Ripley 2012), which first fit an AR model to rainfall data, and then use the model to filter cholera incidence data, removing potential common trend from the two time serieses. We then compare the residuals from fitted rainfall model and the filtered cholera incidence using CCF (Fig 6).
After taking out possible common trend from both time serieses, the sinusoidal pattern remains in Fig 6 (yet less obvious), meaning the sinusoidal correlation pattern between rainfall and cholera cases is more convincing. The ACF of residual after fitting and AR model to rainfall data is also shown below (Fig 7). The ACF plot is not so bad since less than 5% of ACFs are outside the horizontal lines. However, we can still see slight sinusoidal pattern in the figure, indicating more appropriate model should be considered for pre-whitening process.
Generally, I do not think ARIMA (or even sARIMA) model is a good way to describe daily data with annual seasonality given the fact that high order seasonality is often very noisy. To better illustrate time series with high order seasonality, other models such as Fourier series approach or B-spline might be better choices.
##
## Call:
## arima(x = dep1_case[1:1406], order = c(4, 0, 2), optim.control = list(maxit = 200))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 intercept
## 0.5493 0.8896 -0.3008 -0.1456 -0.0414 -0.7755 23.1682
## s.e. 0.1416 0.1971 0.0748 0.0327 0.1423 0.1216 5.0019
##
## sigma^2 estimated as 68.74: log likelihood = -4550.98, aic = 9115.96
where \(\{\epsilon_n\}\) is a Gaussian ARMA process. I used an ARMA(4,2) model selected by AIC.
##
## Call:
## arima(x = dep1_case[1:1406], order = c(4, 0, 2), xreg = dep1_rain[1:1406], optim.control = list(maxit = 200))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 intercept
## 0.5508 0.8879 -0.2999 -0.1463 -0.0436 -0.7742 23.2637
## s.e. 0.1478 0.2061 0.0770 0.0327 0.1488 0.1274 5.0350
## xreg
## -0.0244
## s.e. 0.0382
##
## sigma^2 estimated as 68.71: log likelihood = -4550.78, aic = 9117.55
## [1] 0.5235408
##
## Call:
## arima(x = dep1_case[1:1406], order = c(4, 0, 2), xreg = dep1_rain[48:1500][1:1406],
## optim.control = list(maxit = 200))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 intercept
## 0.6031 0.8159 -0.2789 -0.1474 -0.0976 -0.7262 22.9428
## s.e. 0.1659 0.2314 0.0791 0.0318 0.1670 0.1440 4.9898
## xreg
## 0.0826
## s.e. 0.0370
##
## sigma^2 estimated as 68.47: log likelihood = -4548.48, aic = 9112.97
## [1] 0.02552198
Nevertheless, the diagnostic plots still show violation of model assumption (ie. residual should be randomly distributed). Acf plot suggests that the model is not sufficient to capture the association between rainfall and cholera incidence.
R version 3.1.2 (2014-10-31)
(R Core Team 2014)ggmap
(Boettiger 2015)RCurl
(Boettiger 2015)xts
(Boettiger 2015)TSA
(Boettiger 2015)fields
(Boettiger 2015)akima
(Boettiger 2015)Boettiger, C. (2015). Knitcitations: Citations for ’knitr’ markdown files. Retrieved from http://CRAN.R-project.org/package=knitcitations
Chan, K.-S. & Ripley, B. (2012). TSA: Time series analysis. Retrieved from http://CRAN.R-project.org/package=TSA
Chin, C.-S., Sorenson, J., Harris, J.B., Robins, W.P., Charles, R.C., Jean-Charles, R.R., Bullard, J., Webster, D.R., Kasarskis, A., Peluso, P., Paxinos, E.E., Yamaichi, Y., Calderwood, S.B., Mekalanos, J.J., Schadt, E.E. & Waldor, M.K. (2011). The origin of the haitian cholera outbreak strain. New England Journal of Medicine, 364, 33–42. Retrieved from http://dx.doi.org/10.1056/NEJMoa1012928
Eisenberg, M.C., Kujbida, G., Tuite, A.R., Fisman, D.N. & Tien, J.H. (2013). Examining rainfall and cholera dynamics in haiti using statistical and dynamic modeling approaches. Epidemics, 5, 197–207. Retrieved from http://dx.doi.org/10.1016/j.epidem.2013.09.004
Eppinger, M., Pearson, T., Koenig, S.S.K., Pearson, O., Hicks, N., Agrawal, S., Sanjar, F., Galens, K., Daugherty, S., Crabtree, J., Hendriksen, R.S., Price, L.B., Upadhyay, B.P., Shakya, G., Fraser, C.M., Ravel, J. & Keim, P.S. (2014). Genomic epidemiology of the haitian cholera outbreak: A single introduction followed by rapid, extensive, and continued spread characterized the onset of the epidemic. mBio, 5, e01721–14. Retrieved from http://dx.doi.org/10.1128/mbio.01721-14
Gasparrini, A. (2011). Distributed lag linear and non-linear models in R: The package dlnm. Journal of Statistical Software, 43, 1–20. Retrieved from http://www.jstatsoft.org/v43/i08/
Gaudart, J., Rebaudet, S., Barrais, R., Boncy, J., Faucher, B., Piarroux, M., Magloire, R., Thimothe, G. & Piarroux, R. (2013). Spatio-temporal dynamics of cholera during the first year of the epidemic in haiti (H. Carabin, Ed.). PLoS Negl Trop Dis, 7, e2145. Retrieved from http://dx.doi.org/10.1371/journal.pntd.0002145
Kirpich, A., Weppelmann, T.A., Yang, Y., Ali, A., Morris, J.G. & Longini, I.M. (2015). Cholera transmission in ouest department of haiti: Dynamic modeling and the future of the epidemic (C. Munoz-Zanzi, Ed.). PLoS Negl Trop Dis, 9, e0004153. Retrieved from http://dx.doi.org/10.1371/journal.pntd.0004153
Koelle, K. (2009). The impact of climate on the disease dynamics of cholera. Clinical Microbiology and Infection, 15, 29–31. Retrieved from http://dx.doi.org/10.1111/j.1469-0691.2008.02686.x
R Core Team. (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from http://www.R-project.org/
Rinaldo, A., Bertuzzo, E., Mari, L., Righetto, L., Blokesch, M., Gatto, M., Casagrandi, R., Murray, M., Vesenbeckh, S.M. & Rodriguez-Iturbe, I. (2012). Reassessment of the 2010-2011 haiti cholera outbreak and rainfall-driven multiseason projections. Proceedings of the National Academy of Sciences, 109, 6602–6607. Retrieved from http://dx.doi.org/10.1073/pnas.1203333109