Chickenpox (Varicella) is a very contagious disease caused by the varicella-zoster virus (VZV). It causes a blister-like rash, itching, tiredness, and fever. Chickenpox used to be very common in the United States. Each year, chickenpox caused about 4 million cases, about 10,600 hospitalizations and 100 to 150 deaths (From CDC website).
SIR model is widely used for disease transmission dynamics. Based on that we also have SEIR model that includes a period of latency before becoming infectious. Like what we learnt from class, SIR model has very good performance in boarding school flu data (Anonymous 1978). Also SEIR model performed well in Measles data (He, Ionides, & King, J. R. Soc. Interface). I want to compare the performance of those two model and see whether it appropriate to include an latency period in transmission of Varicella.
The data of Varicella are from Project Tycho http://www.tycho.pitt.edu. Actually the data include all the states in US. However, the environment as well as population are different from state to state, so the coefficients may also be different. Therefore, it’s better to focus on a smaller scale. Therefore, I focused on Maryland for my study. The data of population in Maryland from 1902 to 2015 is from http://www.data.gov, with birth rate of each year.
I will use two pomp models, SIR and SEIR, in my study. The pomp representation for SIR and SEIR model already exist (Class note 12, EL Ionides & Case study: Measles in large and small towns, AA King). For SEIR model I will do slightly modification based on original model. For SIR model I will basically follow the class note.
In both models I will first perform local search to get the range of parameters, then perform global search and analyze the convergence diagnostic. The purpose is to see whether SEIR model works better in large population cases, with variation in the total population.
The plot is the time plot of Varicella cases in each week on different time scale. The upper one is plot of original data and there are obviously lots of missing values before 1973. And there is no evidence of missing value in time plot between 1973 to 1981. Also with limited computational resource, a smaller dataset would be easier to run. Therefore, it’s more reasonable to choose range from 1973 to 1981.
As we can see from the plot below, both population and number of birth increased between 1973 and 1981. There is no peak in the population and the population and birth shared the same pattern. This suggesting that Varicella make no significant impact on the populatin of Maryland. It is intuitively correct as Varicella is not a deadly disease.
Compare to normal SEIR model, we also include birth and death into account, since they are important new entrances to susceptible compartment. According to CDC website, patients of Varicella may have latency period, when they are infected but have no symptoms. In the original model in King’s case study, Measles patients in this latent period (E node in the diagram) are not infectous. And this is true for Varicella too, the patients of Varicella are not infectous until symptoms show up.
One special thing that my model is different from AA king’s model in the case study of Measles is that I also include an arrow from S to R. There are some children got vaccination when they are young, I want to show this process. According to CDC, 91% of children 19 to 35 months old in the United States had received one dose of varicella vaccine, varying from 83% to 95% by state. I didn’t find the exactly vaccination percentage for Maryland. So take (0.83, 0.95) as the range of \(vr\) in global search and take 0.85 as start in local search.
Before define process model, I want to add a little more explanation of my model to make it clear. The sturucture of my model is from King’s model, so the feature like cohort effect, seasonaility and transmission rate are the same as his model. So I won’t spend time describing his model and explain the details. \(vr\) is the vaccination rate, \(vac\) is the number of person received vaccination before infected. \(vr\) is the key parameter I would focus on. \(vr\) is multiplied by the vaccination effectiveness, which is 0.9 for Varicella. Then multiplied by birthrate lagged one year because new birth children would receive vaccination before one year old. My model haven’t take those who is infected and got vaccination after that into account. Below is how I defined the process model.
## ----rprocess------------------------------------------------------------
rproc <- Csnippet("
double beta, br, seas, foi, dw, births, vac;
double rate[6], trans[6];
// cohort effect
if (fabs(t-floor(t)-251.0/365.0) < 0.5*dt)
br = cohort*birthrate/dt + (1-cohort)*birthrate;
else
br = (1.0-cohort)*birthrate;
// term-time seasonality
t = (t-floor(t))*365.25;
if ((t>=7&&t<=100) || (t>=115&&t<=199) || (t>=252&&t<=300) || (t>=308&&t<=356))
seas = 1.0+amplitude*0.2411/0.7589;
else
seas = 1.0-amplitude;
// transmission rate
beta = R0*(gamma+mu)*seas;
// expected force of infection
foi = beta*pow(I+iota,alpha)/pop;
// white noise (extrademographic stochasticity)
dw = rgammawn(sigmaSE,dt);
rate[0] = foi*dw/dt; // stochastic force of infection
rate[1] = mu; // natural S death
rate[2] = sigma; // rate of ending of latent stage
rate[3] = mu; // natural E death
rate[4] = gamma; // recovery
rate[5] = mu; // natural I death
// Poisson births
births = rpois(br*dt);
// Vaccination
vac = nearbyint(vr*br*.9*dt);
// transitions between classes
reulermultinom(2,S,&rate[0],dt,&trans[0]);
reulermultinom(2,E,&rate[2],dt,&trans[2]);
reulermultinom(2,I,&rate[4],dt,&trans[4]);
S += births - trans[0] - trans[1] - vac;
E += trans[0] - trans[2] - trans[3];
I += trans[2] - trans[4] - trans[5];
R = pop - S - E - I + vac;
W += (dw - dt)/sigmaSE; // standardized i.i.d. white noise
C += trans[4]; // true incidence
")
## ----initializer------------------------------------------------------------
initlz <- Csnippet("
double m = pop/(S_0+E_0+I_0+R_0);
S = nearbyint(m*S_0);
E = nearbyint(m*E_0);
I = nearbyint(m*I_0);
R = nearbyint(m*R_0);
W = 0;
C = 0;
")
## ----dmeasure------------------------------------------------------------
dmeas <- Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 1.0e-18;
if (cases > 0.0) {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)-pnorm(cases-0.5,m,sqrt(v)+tol,1,0)+tol;
} else {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)+tol;
}
")
## ----rmeasure------------------------------------------------------------
rmeas <- Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 1.0e-18;
cases = rnorm(m,sqrt(v)+tol);
if (cases > 0.0) {
cases = nearbyint(cases);
} else {
cases = 0.0;
}
")
Below are the parameters I want to estimate:
\(R0\): “the expected number of secondary infections engendered by an infective introduced into a fully susceptible population” (He et al. 2010).
\(\gamma\): Rate of recovery
\(cohort\): A specified fraction (cohort) of the cohort enter the susceptible pool all at once. Actually refer to the beginning of a child’s school year as the introduction of a large number of new susceptible
\(\alpha\): A mixing parameter; the closer to one, the more homogeneous mixing is in the population
\(\iota\): “the mean number of infectives visiting the population at any given time” (He et al. 2010)
\(\rho\): Reporting rate
\(\psi\): Overdispersion parameter in the reporting process.
\(\sigma\): Rate of ending the latent stage
\(\sigma_{\text{SE}}\): Extrademographic stochasticity affecting the force of infection
\(vr\): Vaccination rate
\(amplitude\): Seasonality parameter
\(S_0\), \(E_0\), \(I_0\), \(R_0\): Fraction of the population in each of the four compartments
In order to carry out local search, we need to define inital values for parameters. Here I take the inital value of King’s case study, and 0.85 for \(vr\). Once we run the global search for the first time, we can use the result to update the MLEs and run the local and global search again to get better result. Due to limitation of time, here I just run it once.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1340 -1339 -1337 -1337 -1336 -1333
Global search is actually based on the the reasonable range for parameters, and randomly select initial value for parameters from it and run the mif2 algorithm. As disucssed above, I first define the range box for parameters based on the pair plot.
abox = rbind(R0=c(5,50),mu=c(0.001,0.5),sigma=c(1,300),gamma=c(1,50),alpha=c(0.05,1),
iota=c(1,10),rho=c(0.5,1),sigmaSE=c(0.1,1),psi=c(0.1,1),cohort=c(0.1,1),
amplitude=c(0.1,1),S_0=c(0.01,0.03),E_0=c(0.00001,0.0001),I_0=c(5e-5,7e-5),R_0=c(0.5,1),vr=c(0.83,1.5))
The max likelihood of global search is significantly larger than local search.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -724.3 -721.8 -721.5 -721.5 -720.9 -720.5
The likelihood doesn’t converge or stable to a certain level, suggesting it may need more iterations to converge. The \(\rho\) converge to approximately 0.9, which means a high reorting rate. Also the range for \(S_0\), \(E_0\), \(I_0\), \(R_0\) are a bit small and none of them converged. This suggests that S, E, I, R initial parameters are not well identified by the model. For the most important parameter \(vr\), it converged to a value greater than 1, which is unexpected. There are two interpretation for this. The actual vaccination effectiveness may be greater than 0.9, so it lead to \(vr\) converge to a value greater than 1. Or this may result from the birth lag.
With the parameters estimations from global search, we can simulate cases and compare with original data. Simulation data caputures the seasonality pattern of the original data. As it covered the peaks of each outbreak, but it tend to over estimate the number of patient. And the first peak is slightly delay compare to original data.
SIR model is simpler and less computational consuming. Basic SIR model doesn’t have stage death or birth, in other word it doesn’t consider change in the whole population. Well it may be suitable for analysis epidemic in a close environment with fixed population, like what we did in class, the flu in boarding school. It may not be suitable for analysis of the whole state with variation in population each year. However, SIR can serve as a control group for SEIR model.In this part I basically used the code from class note 12.
The SIR model we used in class have two stages, \(R_1\) and \(R_2\), which refer to different period of recovering in \(R\) stage. The data of varicella just have one column of number of patients, not like the bsflu data that have two columns. So I assume those are two stages and both of them are not infectous. \(\mu_{R_1}\) and \(\mu_{R_2}\) are the transfer rate from \(R_1\) to \(R_2\) and from \(R_2\) to \(R_3\). Most of people get Varicella may recover in one week, and the time difference of my data is one week, so I fix \(\mu_{R_1}\) and \(\mu_{R_2}\) to 0.9. Also as I mentioned above, SIR model doesn’t take birth and death into account. So I fix the population at \(P=4000\), which is the 10% of the population of Marland at 1973. Because if we assume the population age follow normal distribution, there should be 10% of population age between 2 to 14. Since a person would be immune once he/she got infected and recovered. So population age between 2 to 14 are most likely to be infected.
bsflu_dmeasure <- "
lik = dpois(B,rho*R1+1e-6,give_log);
"
bsflu_rmeasure <- "
B = rpois(rho*R1+1e-6);
C = rpois(rho*R2);
"
bsflu_rprocess <- "
double t1 = rbinom(S,1-exp(-Beta*I*dt));
double t2 = rbinom(I,1-exp(-dt*mu_I));
double t3 = rbinom(R1,1-exp(-dt*mu_R1));
double t4 = rbinom(R2,1-exp(-dt*mu_R2));
S -= t1;
I += t1 - t2;
R1 += t2 - t3;
R2 += t3 - t4;
"
bsflu_fromEstimationScale <- "
TBeta = exp(Beta);
Tmu_I = exp(mu_I);
Trho = expit(rho);
"
bsflu_toEstimationScale <- "
TBeta = log(Beta);
Tmu_I = log(mu_I);
Trho = logit(rho);
"
bsflu_initializer <- "
S=4000;
I=1;
R1=0;
R2=0;
"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5422 -4584 -4500 -4501 -4308 -4258
With the relationship between parameters and likelihood from local search, we get a reasonable range for parameters. As shown in plot, the range of \(Beta\) is (0,0.5), \(\mu_I\) is (0.5,1), \(rho\) is (0.5,1). The max likelihood is -4258, we will compare this with the result of global search.
First, from the filter diagnostics we can see the likelihood dosen’t converge, so it may need more iterations to converge. \(\rho\) converge to 1, which is similar to SEIR model. \(\mu\) and \(\beta\) are not converge. The max likelihood of global search is -4247, slightly higher than the local search. So the improvement is not significant. Also from the simulation based on mle from global search, we can see the SIR model failed to capature the pattern of original data. Overall, SIR model performed badly on the Varicella data.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5423 -4871 -4723 -4746 -4590 -4247
Obviously SEIR model performed better than SIR model. And the modification on SEIR model with vaccination process is successful. Also from the estimation of \(vr\), I would say the vaccination rate is higher in Maryland than the national wide averge, and the vaccination is very effective.
From the diagnostic of convergence of SEIR model, the cohort effect is not significant here. School year children may not be an large enter of susceptible. People from all age could be susceptible unless he/she received vaccination. Therefore, in future study the cohort effect should be remove from the model.
[1] A. A. King, Case study: Measles in large and small towns
[2] E. L. Ionides, Class Note 12, Stats 531
[3] He, D., E. L. Ionides, and A. A. King. 2010. Plug-and-play inference for disease dynamics: Measles in large and small populations as a case study. Journal of the Royal Society, Interface 7:271–283.
[4] Centers for Disease Control and prevention, Chickenpox/Varicella Vaccination: https://www.cdc.gov/vaccines/vpd/varicella/index.html