California Measles - Late 1980s / Early 1990s

Measles in Contemporary Society / Motivation

Measles is a highly contagious disease with many potential complications. While the fatality rate is generally not very high in developed countries (~0.3%), the serious complications that can occur and the high rate of infection make prevention a priority for any government or public health organization.

Given that there does exist a rather effective measles vaccine with about a 95% rate of immunity for children of 12 months, it may seem like a waste of resources to focus on modelling the disease’s epidemiology. This is not the case. A simple google search reveals that outbreaks in developed countries where vaccines are readily available are not uncommon. Herd immunity, the percentage of a population that must be immune to a disease to protect those who cannot be vaccinated and against outbreaks, of measles is established at around 95%. Given the numerous controversies surrounding vaccination, it’s not unreasonable to imagine that the proportion of the population immune to the disease could dip below that. Coupling this fact with possible connections between socioeconomic status and vaccination, it feels as if establishing herd immunity permanently could be a long way off.

With this in mind, an analysis of the epidemiology of measles should have many benefits. If an outbreak is identified, resources could be appropriated with care. This would result in money, and likely even lives, saved. However, analyses of the spread of measles in post-vaccination society is not widely available, hence a motivation for this project.

I will be analyzing an outbreak of measles in late 80s and early 90s from California. The data was pulled from Project Tycho and is freely available. Much of this analysis will be based on an analysis by Aaron King and He et al.

Data Exploration

To begin, we will look at the data available from Project Tycho. We will view the cases from 1989-1992. Cases were recorded weekly, with some data missing.

plot of chunk unnamed-chunk-1

An initial look reveals more than just the aforementioned missing data, we see that cases recorded during the peak of the epidemic vary greatly, from 100s one week to close to 0 the next. This begs the question: “Could measles really spread in such a varying fashion?” Intuition, and advice from those more experienced in epidemiology than myself, leads to one answer: No. In order to better model the spread of the disease, the wild variations were removed. Specifically, any week with \(\leq 1\) case recorded sandwiched in between two weeks with \(>100\) cases recorded has been removed.

With the specified data removed, we see a somewhat less varying plot of the data below with the removed data in black and the data being used for our analysis in red:

Covariates

Prior analysis by Aaron King used two main covariates for modeling measles: birth rates and population sizes. The population data is available from the United States Census website and the birth rates from the California Department of Public Health. Similar to the King analysis, the covarites were obtained yearly, and smoothed over each year to get a weekly value. Below we can see the data as points and the smoothed values as the red line. Birthrate is in number of Births per year and population is for children under 10 years old.

The selection of the population of children under 10 years should be addressed. There were two main motivations here. Firstly, a post-epidemic analysis (Dales et al 1993) shows that over 60% of the cases recorded during the peak epidemic years, 1989 and 1990, were in children under 10 years old. This is markedly higher than earlier years in the decade, where children of that age accounted for ~40% of all cases. Given this, it is reasonable to focus on this group as the main demographic of susceptibles in our analysis. Secondly, using the entirety of California’s population leads to a very small proportion of susceptibles, which makes it more difficult to identify the effect changes in our covariates and parameters have on our model. With this in mind, finding a way to lessen the population, given sufficient reason, is a logical next step.

plot of chunk unnamed-chunk-3

Here we see both smoothed lines do not perfectly capture the yearly estimates, but they do sufficiently well to able to measure the effect that changes may have on our model. It also should be noted that in our analysis, we will be working with lagged birthrate; both at the 6 month and 1 year lag. Reasons for this selection will be specified in the next section.

Model Creation

To model the disease, a slight variation on the standard SEIR compartment model was used.

\(b = \text{births}\)
\(S = \text{susceptibles}\)
\(E = \text{exposed, non-infectious, incubating}\)
\(I = \text{Infectious}\)
\(R = \text{Recovered}\)
(King, A. 2015)

Our model has the earmarks of a susceptible-exposed-infectious-recovered(SEIR) model. Births migrate to our susceptible pool at each time step at a specified rate. The effect of measles on the death rate is negligible, so our death rate at each department is kept equal. The main deviation here from the standard SEIR model is the arrow from \(S \rightarrow R\). This is to model vaccinations, where someone in compartment \(S\) may be vaccinated and go straight to compartment \(R\), where they are modeled as immune.

Analysis was conducted using the POMP framework. This is useful in that it allows for modeling of non-linear processes, which is a trait of epidemiological systems (King, A. 2015). It also will allow much more flexibility in the creation of a model, which is helpful when modeling a complex process such as the spread of a disease, possibly including numerous parameters.

Assuming the reader’s familiarity with non-linear POMP models, we will begin by introducing the process model below. This model borrowed heavily from Aaron King’s implementation of the model put forth by He et al. 2010. Any parameters not specified in this study can be found at either of the two above resources.

rproc <- Csnippet("
                  double beta, br, seas, foi, dw, births, vac;
                  double rate[6], trans[6];
                  
                  // term-time seasonality
                  t = (t-floor(t))*365.25;
                  if ((t>=7&&t<=100) || (t>=115&&t<=199) || (t>=252&&t<=300) || (t>=308&&t<=356))
                  seas = 1.0+amplitude*0.2411/0.7589;
                  else
                  seas = 1.0-amplitude;
                  
                  // transmission rate
                  beta = R0*(gamma+mu)*seas;
                  // expected force of infection
                  foi = beta*pow(I+iota,alpha)/pop;
                  // white noise (extrademographic stochasticity)
                  dw = rgammawn(sigmaSE,dt);
                  
                  rate[0] = foi*dw/dt;  // stochastic force of infection
                  rate[1] = mu;             // natural S death
                  rate[2] = sigma;        // rate of ending of latent stage
                  rate[3] = mu;             // natural E death
                  rate[4] = gamma;        // recovery
                  rate[5] = mu;             // natural I death
                  
                  // Poisson births
                  births = rpois(birthrate_lag05*dt);

                  // Vaccination
                  vac = nearbyint(vr*birthrate_lag1*.95*dt);

                  // transitions between classes
                  reulermultinom(2,S,&rate[0],dt,&trans[0]);
                  reulermultinom(2,E,&rate[2],dt,&trans[2]);
                  reulermultinom(2,I,&rate[4],dt,&trans[4]);

                  if (vac > S - trans[0] - trans[1]){
                    vac = S - tran[0] - tran[1];
                  }

                  S += births   - trans[0] - trans[1] - vac;
                  printf(\" births are: %f,  S is: %f \",births, S);
                  E += trans[0] - trans[2] - trans[3];
                  I += trans[2] - trans[4] - trans[5];
                  R = pop - S - E - I + vac;
                  W += (dw - dt)/sigmaSE;  // standardized i.i.d. white noise
                  C += trans[4];           // true incidence
                  ")

initlz <- Csnippet("
  double m = pop/(S_0+E_0+I_0+R_0);
                   S = nearbyint(m*S_0);
                   E = nearbyint(m*E_0);
                   I = nearbyint(m*I_0);
                   R = nearbyint(m*R_0);
                   W = 0;
                   C = 0;
                   ")

##  Measurement Model  ##
dmeas <- Csnippet("
  double m = rho*C;
                  double v = m*(1.0-rho+psi*psi*m);
                  double tol = 1.0e-18;
                  if (cases > 0.0) {
                  lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)-pnorm(cases-0.5,m,sqrt(v)+tol,1,0)+tol;
                  } else {
                  lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)+tol;
                  }
                  ")

rmeas <- Csnippet("
  double m = rho*C;
                  double v = m*(1.0-rho+psi*psi*m);
                  double tol = 1.0e-18;
                  cases = rnorm(m,sqrt(v)+tol);
                  if (cases > 0.0) {
                  cases = nearbyint(cases);
                  } else {
                  cases = 0.0;
                  }
                  ")

To briefly review the process model, we’ll start at the beginning. Major departures from the original model will be made clear and decisions to keep what were there will be explained.

The seasonality component of the model is kept, as the inclusion of vaccination in the modelling of a disease spreading presents no obvious reason to remove seasonality for the model.
A discrepancy between King’s model and the above is the exclusion of a cohort-entry component. The cohort component was originally included to model the beginning of a child’s school year as the introduction of a large number of new susceptible (He et al. 2010). Analysis of the outbreak has shown that nearly 20% of the cases during peak years were in children less than 1 year old, and over 50% of the cases were in children less than 5. However, the percentage of cases in older children (5-9 years) during this time is less than 9%. With the assumption that children under 5 years attend school at a low rate combined with the concentration of cases in this age group - a cohort effect may not be appropriate to include in the model. Moreover, of the cases that had the source of infection identified, only 14% of them were from schools. If schools do not play a large role in the spread of infection, a cohort effect may be inappropriate for the model (Dales et al. 1993).
The modelling of transmission rate, force of infection, and extrademographic stochasticity is exactly as specified in King’s original model.
Births here are calculated using the birthrate from a half year prior. This is to model maternal immunity, which generally lasts about six months (Drucker, R. 2010).
\(vac\) is the number of vaccinated individuals that would move from \(S \rightarrow R\). The vaccination rate, \(vr\), is a parameter of our model. This is multiplied by a vaccination effectiveness of 95%. We then multiply both of these by the birthrate lagged one year, as that is when children are typically vaccinated for measles. An inelegant bit of the model is the last if-statement before the updates to the compartments are made. Given the variation allowed in our parameters during maximization (shown later in this analysis), it’s possible to imagine a scenario where the parameters have been created such that our \(vac\) count at a specific time step is higher than the count in \(S\) less those individuals transitioning to \(I\) and \(\text{dead}\) compartments. This statement is a safe-guard against this, ensuring that our \(S\) compartment does not become negative. This is, admittedly, a shortcoming of the model that could be improved upon in the future.

Model Fitting

To investigate a decent fit for our model, we perform two analyses. First, we carry out a local search of model parameters based on prior estimates given in He et al. Our second analysis is based on a global search where our initial parameters are varying and compare them to our local search. We will specify the parameters in question below before moving on to our analysis.

Parameters

Below are the parameters we want to estimate with brief definitions:

\(R0\): “the expected number of secondary infections engendered by an infective introduced into a fully susceptible population” (He et al. 2010). This term helps parameterize \(\beta\), the transmission rate, in our rprocess model. A higher \(R0\) would signify a higher transmission rate.
\(\gamma\): Our rate of recovery
\(\alpha\): A mixing parameter; the closer to one, the more homogeneous mixing is in the population
\(\iota\): “the mean number of infectives visiting the population at any given time” (He et al. 2010)
\(\rho\): Our reporting rate
\(\psi\): Overdispersion parameter in the reporting process, found in the measurement model above.
\(\sigma\): Our rate of ending the latent stage
\(\sigma_{\text{SE}}\): Extrademographic stochasticity affecting the force of infection
\(vr\): Our vaccination rate
\(amplitude\): Our seasonality parameter

Local Search

In order to perform a local search, initial values for our parameters must be selected before using the iterated filtering algorithm (mif2) in the pomp package to find local maximums for these parameters. These initial values were based on an initial investigation performed by He et al.(2010). Their paper modeled the epidemiology of measles in 20 different cities around England prior to the vaccinations availability, and found parameter estimates for their very similar model; the basis by which we’ve established the model above. The parameter estimates for each city were evaluated with the model above using the California data.

Our initial step was to perform particle filtering on these maximum likelihood estimates in order to find a suitable set of initial parameters. A few alterations were made to these parameters before running the particle filter. Firstly, a \(vr\) parameter needed to be assigned - it was set to 0.80, based on an 0.82 US vaccination rate estimate by the World Health Organization in 1991. Another parameter that was changed was \(\mu\), our death rate. This was set at 0.007, the value given by the California Department of Publich Health for 1990.

After performing particle filter, we can see the maximum likelihood estimates below for the California data and the model above with the parameters from each of the 20 cities given by He et al(2010). An interval of \(\pm 1.96\) standard errors of the estimate are given by the error bars.

We see many low-varying estimates with log-likelihood values > -1000. Moving ahead, we’ll use those of Bedwellty in order to initialize our local search.

town	loglik	loglik.sd	mu	delay	sigma	gamma	rho	R0	amplitude	alpha	iota	cohort	psi	sigmaSE	vr
Bedwellty	-792.2213	1.559706	0.007	4	57.9	146	0.311	24.7	0.16	0.937	0.0396	0.351	0.951	0.0611	0.8

With our initial parameters, we can now begin our local search using the iterated filtering algorithm specified above. Below we can look at the likelihood results from our local search and a pairs plot of the resulting parameter estimates.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -727.07 -724.52 -722.11 -722.47 -720.07 -719.43

Global Search

Next we will conduct a global search using the same iterated filtering algorithm used above. The difference being that we randomly and uniformly select initial parameters from a specified range and run the mif2 algorithm. Below are the boxes we let the parameters vary over - the limits were created by using approximate results from He et al. and intuition regarding reasonable values for the parameters.

measles_box <- rbind(
  R0=c(29,40),
  gamma=c(60,169),
  alpha=c(.7,1),
  iota=c(0,.4),
  rho=c(.15,.65),
  psi=c(.15,.5),
  sigma=c(41,56),
  sigmaSE=c(.03,.09),
  vr=c(0.85,.95),
  amplitude=c(0.2, 0.5)
)

After iterated filtering was run, particle filtering was used and likelihood estimates obtained. We can see the results below:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -724.26 -721.78 -721.49 -721.49 -720.92 -720.47

We can see that the maximum likelihood obtained slightly lower than what was found in our local search. This is disappointing and the reason behind this is worth further investigation.

Remarks and Conclusions

Firstly, it should be noted that because of computational issues, only 20 mifs were ran for both the local and global searches with 200 mif iterations. All conclusions made ought to be taken in this context. Moreover, this could be the reason behind our finding a greater likelihood in our local search than our global search.

Initially we observe a few interesting results from our local and global searches. As mentioned, the local search reached a slightly higher likelihood than our global search. This is unexpected, as a search initializing at many different parameter values would seem as if it should reach a higher maximum. We also see a smaller variance in the likelihood of our global search; this further contradicts general intuition, as one would expect varying initial values would result in a larger range of likelihoods estimates from our mifs.

Looking at our pairs plot for both local and global searches, we see what looks like a trade off between \(\sigma_{\text{SE}}\) and \(\alpha\). We can look more closely at this result below, from our global results:

plot of chunk unnamed-chunk-11

We \(\sigma_{\text{SE}}\) and \(\alpha\) seem to trade off while maintaining likelihood.

Moving on, we can look at our diagnostic plots for our mif2 maximization procedure.

plot of chunk unnamed-chunk-12

There are a number of specific parameters that stand out. With the exception of one mif, we see that \(\rho\) converges to a very low value, suggesting a low reporting rate. We see \(\sigma_{\text{SE}}\) jumping to ~0.3 for most mifs. This is notably higher than the parameter estimates obtained by He et al. This may not be surprising, as one could imagine in a diverse and populous state such as California, extra demographic stochasticity may be quite large. We also see estimates for \(\alpha\) generally sink quite low, suggesting heterogeneous mixing in our population. Analysis by Dales et al. seems to corroborate this finding, suggesting that cases of measles may have been concentrated in minority communities and, as previously noted, young children.

Further Investigation

Depending on the interest of the reader or future researcher, any number of analysis may prove helpful. Generally speaking, running more mifs at more iterations would be helpful. A number of computational issues came about during this project which lead to this shortcoming; future analysis improving upon this alone would be useful as it could lead to different results.

Updated Compartment Models

Firstly, given additional computational power and epidemiological knowledge, a different compartment model could be used. It may be useful to separate infants after 6 months into a completely different susceptible pool. Given their larger susceptibility to the disease relative to older susceptibles, a separate compartment may be able to capture some disease dynamics that this compartment model was unable to investigate. The proposed model may look as follows:

Here, \(\text{S1}\) would model young children who are at a much higher level of susceptibility than older children and adults. These young children could move to either an exposed compartment, \(\text{E}\), or, upon successful vaccination, to a recovered compartment, \(\text{R}\). This type of model would need the inclusion of more of the population (i.e., not just children under ten years old), but may be worth investigating in the future.

Likelihood Profiles

Depending on the researchers parameter of interest, creating a likelihood profile may be quite useful. Developing some certainty in the estimate of parameters like \(R0\) would help quantify the force of a disease. It’s quite possible that a post-vaccination measles may have different infective properties than the disease prior to the widespread use of vaccinations. If the force of infection, reflected in a change in \(R0\), were to change, knowledge of this and the degree to which it has changed would help better prepare any public health institution for a future outbreak. Similarly, knowledge of our vaccination rate, \(vr\), would help determine whether or not additional resources may need to be allocated to getting the population better vaccinated. If a likelihood profile for vaccination rates revealed that the rate was significantly lower than desired - that would flag a need for these additional resources.

Additional Parameter Estimation

Many assumptions were made in this report regarding fixed model specifications/rates that could be estimated in a future model. A couple proposals are below:

Vaccination Effectiveness: In this study, vaccination effectiveness was held at 95%. It is possible to imagine a virus mutation or a batch of faulty vaccinations that would result in a low vaccination effectiveness. If this were the case, modelling vaccination effectiveness during an outbreak would be useful. Identification of a low effectiveness rate and, consequently, its cause could help prevent future outbreaks.
Susceptible Pool Entrance: A 6 month lag into the susceptible pool after birth was based on prior research on maternal immunity (Drucker 2010). A number of factors may contribute to this lag being different: not all mothers breastfeed, some mothers may not have measles antibodies, and maternal immunity length may not be uniform for all people. With this in mind, varying this lag to find a better fitting model may be reasonable.

References

King, A. (2015, June 7). Case study: Measles in large and small towns. Retrieved from http://kingaa.github.io/sbied/measles/measles.html

He, D., Ionides, E. L., & King, A. A. (2010). Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. Journal of the Royal Society Interface, 7(43), 271-283. http://doi.org/10.1098/rsif.2009.0151

Dales, L. G., Kizer, K. W., Rutherford, G. W., Pertowski, C. A., Waterman, S. H., & Woodford, G. (1993). Measles epidemic from failure to immunize. Western Journal of Medicine, 159(4), 455-464.

Drucker, R. (2010). How Long Does Passive Immunity to Measles Last in Infants. NEJM Journal Watch. Retrieved from http://www.jwatch.org/pa201006020000002/2010/06/02/how-long-does-passive-immunity-measles-last

Population Estimates. (n.d.). Retrieved April 26, 2016, from https://www.census.gov/popest/data/historical/index.html

Statistics. (n.d.). Retrieved April 26, 2016, from http://www.cdph.ca.gov/data/statistics/Pages/default.aspx

Ahmad, S., Zahid, S., & Jan, A. (2013). The impact of parental education and socioeconomic status on routine childhood vaccination: An obsevational study. Journal Of Postgraduate Medical Institute (Peshawar - Pakistan), 27(3). Retrieved from http://www.jpmi.org.pk/index.php/jpmi/article/view/1391/1367

Measles vaccine. (n.d.). Retrieved April 26, 2016, from https://en.wikipedia.org/wiki/Measles_vaccine

Measles. (n.d.). Retrieved April 26, 2016, from https://en.wikipedia.org/wiki/Measles