In the previous project, I found that rainfall could be a potential driver for recent cholera epidemics in Hati using preliminary cross-correlation analysis. In order to obtain better insights into the interaction between rainfall and cholera transmission, in this study, I further examine the cross-correlation between transmission (using difference of log incidence as an approximate) and rainfall (smoothened by different functions). As follow-up, I implement a mechanistic model in an effort to explain the possible mechanism regarding how rainfall could drive cholera epidemics in Haiti.
Case Data Cholera daily incidence by departments is available on Ministere de la Sante Publique et de la Population (MSPP) website. Report is published everyday in pdf formate. From 2012 to 2015, There are 120 dates missing in the database. Half of the data points appear to be missing at random; however, some of the data missing throughout large segments in time, particularly in October 2013 and June 2014. These missing data may bias the outcomes given the non-systematic distribution of the size of the missing data in time and the fact that the missing points located when the cholera outbreak usually occurs. Data crawling and parsing are managed using python 2.7.10
.
Rainfall Data Corresponding daily precipitation estimates are pulled from National Aeronautics and Space Administration (NASA) Tropical Rainfall Measuring Mission (TRMM). The coordinate of geographical coverage is longitude [-73.120633, -72.226589] and latitude [18.838958, 19.805066] with resolutions of 0.25° x 0.25°. Fig 1 below shows rainfall and incidence data in recent outbreaks (01/01/2012-12/31/2015).
Transmission approximate
Rainfall data smoothing
I have tried three smoothing methods here:
fft
in order to see the magnitude of frequency in the rainfall data (Re(fft_rain)
, Fig 3). We can sort of see that there are some relative stronger signals before y = 20; however, it is very hard to determine the cutoff of noise. Therefore, I just decide the range by visualizing various cutoff, and here is the smoothing I come out with (Fig4) which will be used in the following analysis.approxfun
to interpolate the daily points (Fig5).sp_fun
to interpolate instead. (Fig6)CCF and pre-whitening CCF
ccf
on all three rainfall-transmission pairs (Fig 8). The CCFs of all three pairs look very similar, and all show very obvious sinusoidal pattern. However the strongest signals from the three plots all suggest that future rainfall (~45 days) is positively correlated with cholera transmission, which is contradicted against what we thought earlier.SIRW model overview
\[\frac{dS}{dt}=-\beta _ISW-\beta _WSW\] \[\frac{dI}{dt}=\beta _ISW+\beta _WSW-\gamma I\] \[\frac{dR}{dt}=\gamma I\] \[\frac{dW}{dt}=\xi I-\xi W\]
The SIRW model is then implemented into a stochastic model using pomp
with daily incidence as output. In order to account for measurement error, we assume each data point is drawn based on poisson random variables with mean = \(Ik\), where \(k\) is reporting rate.
The initial condition of each state is: \[ S(0)=1000000 \] \[ I(0)=100\] \[W=5\]
\(S(0)\) is about 2/3 of Artibonite’s total population (~1,727,524), and I(0) is based on the incidence data with the estimated reporting rate from WHO (5%~10%)(‘WHO | the global burden of cholera’ 2016). W(0) is rather arbitrary, which is decided by fitting the model several times. I do not take R into account here since I assume there is no lose of immunity. \(/gamma\) is the recovery rate which is fixed at 0.25 given the estimates from WHO (~4 days)(2016).
I play with other parameter values using deterministic model to have better picture of the model’s behavior. The SIRW model with rainfall forcing function appears to be able to capture the seasonality of choler incidence with proper \(\beta_I\), \(\beta_W\) ratio and low \(\xi\) (to make sure the rainfall effect from waterborne pathway plays a role) as long as the early epidemics do not burn out the susceptible populations (therefore low \(\beta_I\) and \(\beta_W\) too). I then fit the deterministic model to daily incidence data from MSPP using Nelder-Mead method with traj.match
, Using the fitted parameter values (\(\beta_I=2.28e-07\)s,\(\beta_W=3.23e-07\), \(\xi=6.30e-07\),$ k= 3.02e-02$), Fig 11, and Fig 12 show the results of trajectory and simulation produced by deterministic and stochastic process respectively.
pfilter
with the same fitted parameter set. The result of likelihood is shown below:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 52.833 52.977 53.018 53.068 53.171 53.411
Likelihood estimation
{sirw_Np=200; sirw_Nmif=5; sirw_Neval=10; sirw_Nglobal=10; sirw_Nlocal=10}
to just get a sense what might look like.Optimization and parameter estimation
mifs
(Ionides et al. 2015). However, while look at the likelihood plot, it is hard to tell if the optimization is successful or the likelihood surface given the small number of iterations.## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 52.988 54.905 55.793 55.416 56.293 56.808
The cross-correlation analysis shows that rainfall somewhat associates with cholera transmission with certain time dependence. It does not tell us more than that, especially it does not suggest rainfall can be a precedent driver for cholera transmission. However, as we can see from the diagnostic plots for pre-whitening CCF, those AR models do not describe rainfall pattern very well. The pre-whitening CCF might not be a very tool to look at the correlation for rainfall and cholera transmission.
Even though difference of \(log(incidence)\) could be a good way to approximate disease transmission, it could still be biased by some measurement errors. The sudden sharp peaks could either be from large transmission or simply the delay of reporting.
Also, the way I implement rainfall function in this model is very naive (by simply multiply rainfall and force of infection together). I do not consider any effects of lags or any other why rainfall could interaction with cholera transmission. Moreover, as I mentioned in previous project, increasing rainfall could have opposite influence on cholera transmission (eg. excess amount of water can also dilute the pathogen in the water). Therefore, it is important to think about more detailed method to refine the model, so the model can better explain the data and also help us better understand the effect of rainfall on disease transmission process.
Furthermore, I need to think more carefully about the interpretation and scaling of the SIWR model. For example, it is hard to explain the parameter and state values, especially even though the value of W can be seen as a abstract representation of pathogen concentration, I cannot explain what the value really means in practice, which might make it hard to measure it in the real life (if we need to).
The last but not the least. I had a hard time to find reasonable ranges for parameter without exploding iterated filtering optimizer (usually it happens when some of the states go to negative..). I will need to find a better way to set up the optimizer for future use.
R version 3.2.4 (2016-03-10)
(R Core Team 2016)knitcitations
(Boettiger 2015)RCurl
(Temple Lang & CRAN team 2016)xts
(Ryan & Ulrich 2014)TSA
(Chan & Ripley 2012)bibtex
(Francois 2014)pomp
(King et al. 2015, 2016)doMC
(Analytics & Weston 2015a)foreach
(Analytics & Weston 2015b)ggplot2
(Wickham 2009)Analytics, R. & Weston, S. (2015a). DoMC: Foreach parallel adaptor for ’parallel’. Retrieved from https://CRAN.R-project.org/package=doMC
Analytics, R. & Weston, S. (2015b). Foreach: Provides foreach looping construct for r. Retrieved from https://CRAN.R-project.org/package=foreach
Boettiger, C. (2015). Knitcitations: Citations for ’knitr’ markdown files. Retrieved from https://CRAN.R-project.org/package=knitcitations
Chan, K.-S. & Ripley, B. (2012). TSA: Time series analysis. Retrieved from https://CRAN.R-project.org/package=TSA
Eisenberg, M.C., Kujbida, G., Tuite, A.R., Fisman, D.N. & Tien, J.H. (2013). Examining rainfall and cholera dynamics in haiti using statistical and dynamic modeling approaches. Epidemics, 5, 197–207. Retrieved from http://dx.doi.org/10.1016/j.epidem.2013.09.004
Francois, R. (2014). Bibtex: Bibtex parser. Retrieved from https://CRAN.R-project.org/package=bibtex
Ionides, E.L., Nguyen, D., ’e, Y.A., Stoev, S. & King, A.A. (2015). Inference for dynamic and latent variable models via iterated, perturbed bayes maps. Proceedings of the National Academy of Sciences, 112, 719–724. Retrieved from http://dx.doi.org/10.1073/pnas.1410597112
King, A.A., Ionides, E.L., Bretó, C.M., Ellner, S.P., Ferrari, M.J., Kendall, B.E., Lavine, M., Nguyen, D., Reuman, D.C., Wearing, H. & Wood, S.N. (2016). pomp: Statistical inference for partially observed Markov processes. Retrieved from http://kingaa.github.io/pomp
King, A.A., Nguyen, D. & Ionides, E.L. (2015). Statistical inference for partially observed Markov processes via the R package pomp. Journal of Statistical Software, in press.
R Core Team. (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org/
Ryan, J.A. & Ulrich, J.M. (2014). Xts: EXtensible time series. Retrieved from https://CRAN.R-project.org/package=xts
Temple Lang, D. & CRAN team. (2016). RCurl: General network (hTTP/FTP/.) client interface for r. Retrieved from https://CRAN.R-project.org/package=RCurl
Tien, J.H. & Earn, D.J.D. (2010). Multiple transmission pathways and disease dynamics in a waterborne pathogen model. Bulletin of Mathematical Biology, 72, 1506–1533. Retrieved from http://dx.doi.org/10.1007/s11538-010-9507-6
WHO | the global burden of cholera. (2016).WHO. Retrieved from http://www.who.int/bulletin/volumes/90/3/11-093427/en/
Wickham, H. (2009). Ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Retrieved from http://ggplot2.org
(2016). Retrieved from http://www.who.int/mediacentre/factsheets/fs107/en/