Dengue is a viral infection commonly occur in tropical regions. Symptoms include high fever, headache, vomiting, muscle and joint pains, and a characteristic skin rash.
The viruses are transmitted through the bite of infected mosquitoes (Aedes aegypti and Aedes albopictus) that feed both indoors and outdoors during the daytime. These mosquitoes usually thrive in areas with standing water.
The virus has multiple types. Infection with one type usually gives lifelong immunity to that type, but only short-term immunity to the others. Although Dengue vaccine has been developed and launched in many countries, its usage is still limited. The main approach of disease prevention is limiting exposure to mosquito bites.
Dengue has become a global problem since the Second World War and is common in more than 110 countries. As one of tropical countries, Malaysia has suffered multiple Dengue outbreak in recent years. In 2016, Malaysia reported more than 100,000 dengue fever cases, and the outbreak was ongoing in 2017.
This project aims to analyse historic Dengue incidence data to understand the epidemiologic dynamic behind the spread of this disease. The foundation of modeling is SEIR pomp model. Since the disease transmission requires the participation of mosquito, mosquito dynamic and its interaction with human Dengue spreading is also studied in this project. This improved model would help better explain Dengue incidence pattern especially when multiple outbreaks are overlapped.
This project leveraged two data sources:
Weekly Dengue incidence data from June,2012 to May,2015. The original data is from Malaysia Open Data Portal. Github user @ shikin2015 cleaned and combined the data, the detail can be found here.
Precipitation data is from World Bank Climate Change Knowledge Portal. Since only country level monthly data is available, I used Malaysia national monthly rainfall data as a surrogate of district level rainfall data in this analysis. Considering the climate cycle is similar across the country, this assumption is acceptable.
Further data cleaning and combining performed by me can be found in R script submitted along with this report.
Each district in Malaysia has different outbreak progress. For the purpose of this project, I narrowed down the analysis to one district, Petaling. This district was selected because its data is complete, and the dataset managed to captured two full cycle of Dengue outbreak from 2013 to 2015.
## count pr
## Min. : 28 Min. : 94.73
## 1st Qu.: 742 1st Qu.:179.47
## Median :1032 Median :249.37
## Mean :1274 Mean :250.20
## 3rd Qu.:1811 3rd Qu.:317.32
## Max. :4154 Max. :462.98
## <ScaleContinuousPosition>
## Range:
## Limits: 0 -- 1
## <ScaleContinuousPosition>
## Range:
## Limits: 0 -- 1
Plotting both weekly Dengue incidence and precipitation, we can observe some correlation between both trends. The peak of precipitation precedes the peak of Dengue incidence for ~5 weeks. The pattern is reasonable, since the reproduction of mosquito has been observed to be associated with precipitaiton. As the vector of this infectious disease, larger mosquito population indicates higher possibility of disease spreading.
The incidence cases is generally continous across time axis, however, there is a significant drop during the peak period around week 85, possibly caused by missing data. This outlier may influence our analysis.
The data captured two outbreak in two consecutive years. We can start our analysis with SEIR model on single outbreak.
Dengue symptoms usually appears 4-7 days after mosquito bites, and then symptoms present for 3-10 days. The patients is only infective (through mosquitoes) during symptom phase. Therefore, a 4-compartment SEIR model is a reasonable choice to start.
\(S = \text{susceptible}\) \(E = \text{exposed and latent}\) \(I = \text{infected and infectious}\) \(R = \text{recovered and/or removed}\)The moving count from compartment to compartment can be simply modeled as binomial distribution over short time.
The number moving from S to E is: \[\Delta{N_{SE}} \sim Binomial\ (S,1-e^{-\lambda I\Delta{t}})\] The number moving from E to I is: \[\Delta{N_{EI}} \sim Binomial\ (E,1-e^{-\beta\Delta{t}})\]
and the number moving from E to I is: \[\Delta{N_{IR}} \sim Binomial\ (I,1-e^{-\gamma\Delta{t}})\]
After setting up the pomp model, I tried out different sets of parameters and conducted the simulation. It seems that the simulation has quite large variance and the model fitness is not very well: Incidence peak simulated by this model is sharper and narrower than the actual trend. SEIR model might be more suitable for infectious diseases transmitted directly between human. For Dengue fever, the dynamic of mosquitoes may play an important role, rendering the model more complicated.
Log likelihood:
## se
## -1805.202458 3.595296
Mosquito is the vector of Dengue virus transmission. Mosquitoes lay eggs on still water, such as puddles after rain. The eggs take about one week to develop into adult mosquito, which can feed on human. The life span of an adult mosquito is about 3-4 weeks.
As observed earlier, an increase in precipitation does lead the increase of Dengue incidence, which is aligned with theory. We can quantitatively examine this correlation through cross-correlation function (ccf) in R.
A sinusoidal pattern indicates a non-linear relation between precipitation and Dengue incidence. A significant negative lag with positive ACF indicates a strong positive correlation that precipitation leads the incidence of Dengue. However, the association may not be causal, it is possible that a confounder impact both trends.
To better identify the pattern of CCF, we can use prewhiten function to remove the potential common trend. The sinusoidal pattern almost disappear. Precipitation alone may not be adequate to explain the trend of Dengue incidence.
Since we don’t have direct data on other climatic factors or population size of mosquitoes, we can try to incorporate mosquito dynamic through precipitation data. This incorporation may help us better modeling the Dengue incidence trend across multiple outbreaks.
To simplify the model, we assume the birth of new mosquito eggs is linearly associate with precipitation, ignoring the scale. Number of mosquitos that will interact with human community follows a poisson distribution with mean \(R\), where R is the precipitation(mm) at that time point.
\[M_{0,n} \tilde{} Poisson(R)\]
It takes about one week for mosquito eggs to grow into adult mosquito which are capable of feeding on human. After mosquito acquiring Dengue virus from human patients, it takes about one week for the virus to reproduce in mosquito’s blood stream before it becomes infective to other human. Mosquito carried Dengue virus can spread the disease to other human throughout its entire adult life span, which is about four weeks.
Therefore, the compartmental model of mosquito can be constructed as:
For human side of the dynamic model, we keep the SEIR model structure, with following improvements:
\[p_n = exp\{-\lambda_n\}\]
\[\Delta N_{SE} = (1-p_n) \times S_n\]
\[\Delta N_{EI} = E_n\]
\[N_{death} = I_n \times \delta\]
\[\Delta N_{II_m} \sim Poisson(k_{i\_im} \times(1-\delta) \times I_n)\]
\[\Delta N_{II_m} \leq I_n \]
Parameter \(k_{i\_im}\) is unknown.
\[\Delta N_{ImS} \sim Poisson(k_{im\_s} \times Im_n) \]
\[\Delta N_{ImS} \leq Im_n\] Parameter \(k_{im\_s}\) is unknown.
In summary, the change of states at each time step is:
\(\Delta S = \Delta N_{ImS} - \Delta N_{SE}\)
\(\Delta E = \Delta N_{SE} - \Delta N_{EI}\)
\(\Delta I = \Delta N_{EI} - \Delta N_{IIm} - \Delta N_{death}\)
\(\Delta Im = \Delta N_{IIm} - \Delta N_{ImS}\)
\[\bar{\lambda_n}= \frac{(M_{2,n} + M_{3,n} + M_{4,n})\times I_n \times \beta}{P_n}+\psi\]
\[\lambda_n = \frac{\bar{\lambda_n} \times \epsilon_n}{1000}\]
\(\epsilon_n\) follows Gamma distribution with mean 1 and variance:
\[\sigma^2=\sigma_{env}^2+\sigma_{dem}^2/\bar{\lambda_n}\]
For initial state, the proportion of latency, infected, and immuned are set to be unknown parameters.
For observations, it is a discretized normal distribution truncated at zero, conditional on the state, with both environmental and Poisson-scale contributions to the variance.
\[Y_n= \max{\mathrm{round}(Z_n),0}, \quad Z_n\sim\mathrm{normal}\left(\rho I_n, \big(\tau I_n\big)^2 + \rho I_n\right).\]
\(\rho\) and \(\tau\) are unknown parameters.
After we set up the pomp model, mif2 algorithm is implemented to find the local maximum likelihood. Since there are many unknown parameters, the landscape of likelihood surface might be quite complex. The range of log likelihood is broad, we managed to get a loglikelihood of -629.5.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1947.0 -1067.7 -960.8 -977.7 -859.8 -629.5
From the pair plots, we can observe that the value of \(\beta\) concentrates slight above zero. Besides, value of \(\rho\) and \(\tau\) are positively associated. Considering the likelihood surface is complex, there might be multiple combinations of parameters that all achieve similar likelihood.
On top of the local search result we obtained earlier, we further conduct a global search to confirm the likelihood maximization. The best likelihood obtained is still around -620, which gives us some confidence about the output.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -898.5 -629.3 -628.6 -632.9 -626.8 -626.0
We can observe that optimal parameters values are clustered in smaller region and the trend we observed from local search remains.
Using the MLE we obtained from global search, we can simulate the dynamic process to inspect the pattern of each state.
As shown in the simulation plots, the incorporation of mosquito dynamic helped us better capture the trend of Dengue incidence across outbreaks. The simulated incidence peaks lagged behind precipitation as expected. The duration of outbreaks becomes longer, which is no longer a sharp peak as shown in simple SEIR mode. Besides, the movement of immuned people back to susceptible pool help us explain the overlap of two consecutive Dengue outbreaks.
The model has many unknown parameters and different combinations of values to achieve optimal likelihood. Therefore, the convergence of each single paramter is not guaranteed/expected. The bottom line is to achieve convergen on log likelihood. Plotting the top log likelihoods, we can see they all converge as expected.
Dengue transmission requires mosquito as vector. Standard SEIR model may not able to capture this dynamic.
Incorporation of mosquito compartments through precipitation could help us improve the model. Although it is possible that some unknown confounders are behind both the precipitation and Dengue incidence pattern, this model is still reasonable for non-causal analysis.
The moving from immuned pool back to susceptible pool and the seasonal change of precipitation jointly help explain the overlapped, seasonal outbreaks of Dengue in Petaling, Malaysia.
Compare the performance of this model with non-dynamic models.
Add vaccine factors to study how vaccination is going to impact Dengue incidence dynamic.
Detail modeling of Mosquito dynamics based on environmental covariates other than precipitation: Since mosquitoes are the vectors of many infectious diseases, some ecological studies have been conducted on mosquito population dynamics (example). Those research can be leveraged to further improve our model.
Apply the model to broader scope: We can test the performance of this model on Dengue incidence data of other districts in Malaysia. We can also expand the time frame of data to examine the seasonality of outbreaks in a broader view.
Conduct inference on parameters (e.g profile likelihood).
[1] “Dengue and severe dengue Fact sheet N°117”. WHO. May 2015
[2] Ranjit S, Kissoon N (January 2011). “Dengue hemorrhagic fever and shock syndromes”. Pediatric Critical Care Medicine. 12 (1): 90-100
[3] CDC Dengue Epidemiology: https://www.cdc.gov/dengue/epidemiology/index.html
[4] Nature Scitable Dengue Introduction: https://www.nature.com/scitable/topicpage/dengue-transmission-22399758
[5] Stats 531 Winter 2016 Final Projects: http://ionides.github.io/531w16/final_project/
[6] Stats 531 Winter 2018 Lecture Notes: https://ionides.github.io/531w18/