Measles is a highly contagious disease with many potential complications. While the fatality rate is generally not very high in developed countries (~0.3%), the serious complications that can occur and the high rate of infection make prevention a priority for any government or public health organization.
Given that there does exist a rather effective measles vaccine with about a 95% rate of immunity for children of 12 months, it may seem like a waste of resources to focus on modelling the disease’s epidemiology. This is not the case. A simple google search reveals that outbreaks in developed countries where vaccines are readily available are not uncommon. Herd immunity, the percentage of a population that must be immune to a disease to protect those who cannot be vaccinated and against outbreaks, of measles is established at around 95%. Given the numerous controversies surrounding vaccination, it’s not unreasonable to imagine that the proportion of the population immune to the disease could dip below that. Coupling this fact with possible connections between socioeconomic status and vaccination, it feels as if establishing herd immunity permanently could be a long way off.
With this in mind, an analysis of the epidemiology of measles should have many benefits. If an outbreak is identified, resources could be appropriated with care. This would result in money, and likely even lives, saved. However, analyses of the spread of measles in post-vaccination society is not widely available, hence a motivation for this project.
I will be analyzing an outbreak of measles in late 80s and early 90s from California. The data was pulled from Project Tycho and is freely available. Much of this analysis will be based on an analysis by Aaron King and He et al.
To begin, we will look at the data available from Project Tycho. We will view the cases from 1989-1992. Cases were recorded weekly, with some data missing.
An initial look reveals more than just the aforementioned missing data, we see that cases recorded during the peak of the epidemic vary greatly, from 100s one week to close to 0 the next. This begs the question: “Could measles really spread in such a varying fashion?” Intuition, and advice from those more experienced in epidemiology than myself, leads to one answer: No. In order to better model the spread of the disease, the wild variations were removed. Specifically, any week with \(\leq 1\) case recorded sandwiched in between two weeks with \(>100\) cases recorded has been removed.
With the specified data removed, we see a somewhat less varying plot of the data below with the removed data in black and the data being used for our analysis in red:
Prior analysis by Aaron King used two main covariates for modeling measles: birth rates and population sizes. The population data is available from the United States Census website and the birth rates from the California Department of Public Health. Similar to the King analysis, the covarites were obtained yearly, and smoothed over each year to get a weekly value. Below we can see the data as points and the smoothed values as the red line. Birthrate is in number of Births per year and population is for children under 10 years old.
The selection of the population of children under 10 years should be addressed. There were two main motivations here. Firstly, a post-epidemic analysis (Dales et al 1993) shows that over 60% of the cases recorded during the peak epidemic years, 1989 and 1990, were in children under 10 years old. This is markedly higher than earlier years in the decade, where children of that age accounted for ~40% of all cases. Given this, it is reasonable to focus on this group as the main demographic of susceptibles in our analysis. Secondly, using the entirety of California’s population leads to a very small proportion of susceptibles, which makes it more difficult to identify the effect changes in our covariates and parameters have on our model. With this in mind, finding a way to lessen the population, given sufficient reason, is a logical next step.
Here we see both smoothed lines do not perfectly capture the yearly estimates, but they do sufficiently well to able to measure the effect that changes may have on our model. It also should be noted that in our analysis, we will be working with lagged birthrate; both at the 6 month and 1 year lag. Reasons for this selection will be specified in the next section.
To model the disease, a slight variation on the standard SEIR compartment model was used.
\(b = \text{births}\)
\(S = \text{susceptibles}\)
\(E = \text{exposed, non-infectious, incubating}\)
\(I = \text{Infectious}\)
\(R = \text{Recovered}\)
(King, A. 2015)
Our model has the earmarks of a susceptible-exposed-infectious-recovered(SEIR) model. Births migrate to our susceptible pool at each time step at a specified rate. The effect of measles on the death rate is negligible, so our death rate at each department is kept equal. The main deviation here from the standard SEIR model is the arrow from \(S \rightarrow R\). This is to model vaccinations, where someone in compartment \(S\) may be vaccinated and go straight to compartment \(R\), where they are modeled as immune.
Analysis was conducted using the POMP framework. This is useful in that it allows for modeling of non-linear processes, which is a trait of epidemiological systems (King, A. 2015). It also will allow much more flexibility in the creation of a model, which is helpful when modeling a complex process such as the spread of a disease, possibly including numerous parameters.
Assuming the reader’s familiarity with non-linear POMP models, we will begin by introducing the process model below. This model borrowed heavily from Aaron King’s implementation of the model put forth by He et al. 2010. Any parameters not specified in this study can be found at either of the two above resources.
rproc <- Csnippet("
double beta, br, seas, foi, dw, births, vac;
double rate[6], trans[6];
// term-time seasonality
t = (t-floor(t))*365.25;
if ((t>=7&&t<=100) || (t>=115&&t<=199) || (t>=252&&t<=300) || (t>=308&&t<=356))
seas = 1.0+amplitude*0.2411/0.7589;
else
seas = 1.0-amplitude;
// transmission rate
beta = R0*(gamma+mu)*seas;
// expected force of infection
foi = beta*pow(I+iota,alpha)/pop;
// white noise (extrademographic stochasticity)
dw = rgammawn(sigmaSE,dt);
rate[0] = foi*dw/dt; // stochastic force of infection
rate[1] = mu; // natural S death
rate[2] = sigma; // rate of ending of latent stage
rate[3] = mu; // natural E death
rate[4] = gamma; // recovery
rate[5] = mu; // natural I death
// Poisson births
births = rpois(birthrate_lag05*dt);
// Vaccination
vac = nearbyint(vr*birthrate_lag1*.95*dt);
// transitions between classes
reulermultinom(2,S,&rate[0],dt,&trans[0]);
reulermultinom(2,E,&rate[2],dt,&trans[2]);
reulermultinom(2,I,&rate[4],dt,&trans[4]);
if (vac > S - trans[0] - trans[1]){
vac = S - tran[0] - tran[1];
}
S += births - trans[0] - trans[1] - vac;
printf(\" births are: %f, S is: %f \",births, S);
E += trans[0] - trans[2] - trans[3];
I += trans[2] - trans[4] - trans[5];
R = pop - S - E - I + vac;
W += (dw - dt)/sigmaSE; // standardized i.i.d. white noise
C += trans[4]; // true incidence
")
initlz <- Csnippet("
double m = pop/(S_0+E_0+I_0+R_0);
S = nearbyint(m*S_0);
E = nearbyint(m*E_0);
I = nearbyint(m*I_0);
R = nearbyint(m*R_0);
W = 0;
C = 0;
")
## Measurement Model ##
dmeas <- Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 1.0e-18;
if (cases > 0.0) {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)-pnorm(cases-0.5,m,sqrt(v)+tol,1,0)+tol;
} else {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)+tol;
}
")
rmeas <- Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 1.0e-18;
cases = rnorm(m,sqrt(v)+tol);
if (cases > 0.0) {
cases = nearbyint(cases);
} else {
cases = 0.0;
}
")
To briefly review the process model, we’ll start at the beginning. Major departures from the original model will be made clear and decisions to keep what were there will be explained.
To investigate a decent fit for our model, we perform two analyses. First, we carry out a local search of model parameters based on prior estimates given in He et al. Our second analysis is based on a global search where our initial parameters are varying and compare them to our local search. We will specify the parameters in question below before moving on to our analysis.
Below are the parameters we want to estimate with brief definitions:
In order to perform a local search, initial values for our parameters must be selected before using the iterated filtering algorithm (mif2) in the pomp package to find local maximums for these parameters. These initial values were based on an initial investigation performed by He et al.(2010). Their paper modeled the epidemiology of measles in 20 different cities around England prior to the vaccinations availability, and found parameter estimates for their very similar model; the basis by which we’ve established the model above. The parameter estimates for each city were evaluated with the model above using the California data.
Our initial step was to perform particle filtering on these maximum likelihood estimates in order to find a suitable set of initial parameters. A few alterations were made to these parameters before running the particle filter. Firstly, a \(vr\) parameter needed to be assigned - it was set to 0.80, based on an 0.82 US vaccination rate estimate by the World Health Organization in 1991. Another parameter that was changed was \(\mu\), our death rate. This was set at 0.007, the value given by the California Department of Publich Health for 1990.
After performing particle filter, we can see the maximum likelihood estimates below for the California data and the model above with the parameters from each of the 20 cities given by He et al(2010). An interval of \(\pm 1.96\) standard errors of the estimate are given by the error bars.
We see many low-varying estimates with log-likelihood values > -1000. Moving ahead, we’ll use those of Bedwellty in order to initialize our local search.
town | loglik | loglik.sd | mu | delay | sigma | gamma | rho | R0 | amplitude | alpha | iota | cohort | psi | sigmaSE | vr |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bedwellty | -792.2213 | 1.559706 | 0.007 | 4 | 57.9 | 146 | 0.311 | 24.7 | 0.16 | 0.937 | 0.0396 | 0.351 | 0.951 | 0.0611 | 0.8 |
With our initial parameters, we can now begin our local search using the iterated filtering algorithm specified above. Below we can look at the likelihood results from our local search and a pairs plot of the resulting parameter estimates.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -727.07 -724.52 -722.11 -722.47 -720.07 -719.43
Next we will conduct a global search using the same iterated filtering algorithm used above. The difference being that we randomly and uniformly select initial parameters from a specified range and run the mif2 algorithm. Below are the boxes we let the parameters vary over - the limits were created by using approximate results from He et al. and intuition regarding reasonable values for the parameters.
measles_box <- rbind(
R0=c(29,40),
gamma=c(60,169),
alpha=c(.7,1),
iota=c(0,.4),
rho=c(.15,.65),
psi=c(.15,.5),
sigma=c(41,56),
sigmaSE=c(.03,.09),
vr=c(0.85,.95),
amplitude=c(0.2, 0.5)
)
After iterated filtering was run, particle filtering was used and likelihood estimates obtained. We can see the results below:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -724.26 -721.78 -721.49 -721.49 -720.92 -720.47
We can see that the maximum likelihood obtained slightly lower than what was found in our local search. This is disappointing and the reason behind this is worth further investigation.
Firstly, it should be noted that because of computational issues, only 20 mifs were ran for both the local and global searches with 200 mif iterations. All conclusions made ought to be taken in this context. Moreover, this could be the reason behind our finding a greater likelihood in our local search than our global search.
Initially we observe a few interesting results from our local and global searches. As mentioned, the local search reached a slightly higher likelihood than our global search. This is unexpected, as a search initializing at many different parameter values would seem as if it should reach a higher maximum. We also see a smaller variance in the likelihood of our global search; this further contradicts general intuition, as one would expect varying initial values would result in a larger range of likelihoods estimates from our mifs.
Looking at our pairs plot for both local and global searches, we see what looks like a trade off between \(\sigma_{\text{SE}}\) and \(\alpha\). We can look more closely at this result below, from our global results:
We \(\sigma_{\text{SE}}\) and \(\alpha\) seem to trade off while maintaining likelihood.
Moving on, we can look at our diagnostic plots for our mif2 maximization procedure.
There are a number of specific parameters that stand out. With the exception of one mif, we see that \(\rho\) converges to a very low value, suggesting a low reporting rate. We see \(\sigma_{\text{SE}}\) jumping to ~0.3 for most mifs. This is notably higher than the parameter estimates obtained by He et al. This may not be surprising, as one could imagine in a diverse and populous state such as California, extra demographic stochasticity may be quite large. We also see estimates for \(\alpha\) generally sink quite low, suggesting heterogeneous mixing in our population. Analysis by Dales et al. seems to corroborate this finding, suggesting that cases of measles may have been concentrated in minority communities and, as previously noted, young children.
Depending on the interest of the reader or future researcher, any number of analysis may prove helpful. Generally speaking, running more mifs at more iterations would be helpful. A number of computational issues came about during this project which lead to this shortcoming; future analysis improving upon this alone would be useful as it could lead to different results.
Firstly, given additional computational power and epidemiological knowledge, a different compartment model could be used. It may be useful to separate infants after 6 months into a completely different susceptible pool. Given their larger susceptibility to the disease relative to older susceptibles, a separate compartment may be able to capture some disease dynamics that this compartment model was unable to investigate. The proposed model may look as follows:
Here, \(\text{S1}\) would model young children who are at a much higher level of susceptibility than older children and adults. These young children could move to either an exposed compartment, \(\text{E}\), or, upon successful vaccination, to a recovered compartment, \(\text{R}\). This type of model would need the inclusion of more of the population (i.e., not just children under ten years old), but may be worth investigating in the future.
Depending on the researchers parameter of interest, creating a likelihood profile may be quite useful. Developing some certainty in the estimate of parameters like \(R0\) would help quantify the force of a disease. It’s quite possible that a post-vaccination measles may have different infective properties than the disease prior to the widespread use of vaccinations. If the force of infection, reflected in a change in \(R0\), were to change, knowledge of this and the degree to which it has changed would help better prepare any public health institution for a future outbreak. Similarly, knowledge of our vaccination rate, \(vr\), would help determine whether or not additional resources may need to be allocated to getting the population better vaccinated. If a likelihood profile for vaccination rates revealed that the rate was significantly lower than desired - that would flag a need for these additional resources.
Many assumptions were made in this report regarding fixed model specifications/rates that could be estimated in a future model. A couple proposals are below:
King, A. (2015, June 7). Case study: Measles in large and small towns. Retrieved from http://kingaa.github.io/sbied/measles/measles.html
He, D., Ionides, E. L., & King, A. A. (2010). Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. Journal of the Royal Society Interface, 7(43), 271-283. http://doi.org/10.1098/rsif.2009.0151
Dales, L. G., Kizer, K. W., Rutherford, G. W., Pertowski, C. A., Waterman, S. H., & Woodford, G. (1993). Measles epidemic from failure to immunize. Western Journal of Medicine, 159(4), 455-464.
Drucker, R. (2010). How Long Does Passive Immunity to Measles Last in Infants. NEJM Journal Watch. Retrieved from http://www.jwatch.org/pa201006020000002/2010/06/02/how-long-does-passive-immunity-measles-last
Population Estimates. (n.d.). Retrieved April 26, 2016, from https://www.census.gov/popest/data/historical/index.html
Statistics. (n.d.). Retrieved April 26, 2016, from http://www.cdph.ca.gov/data/statistics/Pages/default.aspx
Ahmad, S., Zahid, S., & Jan, A. (2013). The impact of parental education and socioeconomic status on routine childhood vaccination: An obsevational study. Journal Of Postgraduate Medical Institute (Peshawar - Pakistan), 27(3). Retrieved from http://www.jpmi.org.pk/index.php/jpmi/article/view/1391/1367
Measles vaccine. (n.d.). Retrieved April 26, 2016, from https://en.wikipedia.org/wiki/Measles_vaccine
Measles. (n.d.). Retrieved April 26, 2016, from https://en.wikipedia.org/wiki/Measles