For most of us, COVID-19 is already the defining feature of 2020, if not of our lives thus far. COVID-19, the disease caused by the SARS-CoV-2 virus, was first identified in December 2019 in Wuhan, China, and has since spread worldwide, with over 3 million confirmed cases and 208,000 deaths as of April 26, 2020. The exponential spread of COVID-19 can be attributed both to the virus's ability to be transmitted between individuals before any symptoms emerge and to the lack of a vaccine with which to develop herd immunity in the population. Without other means of controlling the spread of SARS-CoV-2, government officials around the world have resorted to stay-at-home orders, mandatory shutdowns of non-essential businesses, and strict social distancing during essential activities in public, in an effort to “flatten the curve”, that is, to slow the spread of the disease and avoid overwhelming the healthcare system. These orders have been in effect for about 6-7 weeks throughout the United States and extend through the end of May in some parts of the country. They have had a significant economic and social impact, leading many citizens and politicians to push to lift the shelter-in-place (SIP) ordinances and return to normal daily life.
Historical data from the 1918 Spanish Flu pandemic serve as an interesting case study of what can happen when SIP orders are lifted. The following figures, showing time series of Spanish Flu deaths in Philadelphia, New York, San Francisco and St. Louis, illustrate how vastly outcomes can differ even among large U.S. cities depending on the timing and extent of social distancing mandates. If social distancing measures were put in place early enough in the timeline of COVID-19 spread, some cities may have “successfully flattened their curve”, as New York and St. Louis did during the Spanish Flu pandemic. But how flat is flat enough? Lifting these measures too early threatens a second resurgence peak of cases, as was seen in San Francisco and even in St. Louis, where things started off well and officials thought they had flattened their curve for good. As America plans to open its doors again amid this pandemic, we must ask ourselves: is this history bound to repeat itself in the upcoming weeks?
In this case study, we develop a Susceptible-Exposed-Infected-Removed (SEIR) compartment model of COVID-19 spread in the Seattle, WA metropolitan area. We simulate the effects of various social distancing scenarios based on the work of Matrajt and Leung 2020, and attempt to fit the model to the real-world COVID-19 case count time series for Seattle.
Our COVID-19 case count dataset was obtained from the Johns Hopkins University Center for Systems Science and Engineering. While full county-level data are available for the U.S. (and country-level data are available worldwide), we limit our case study to the Seattle Metropolitan Area, comprised of King, Pierce and Snohomish counties in Washington state. The reasoning is two-fold: first, Matrajt and Leung focus on this area specifically, so it makes sense for a replication study to use the same population; second, the Seattle Metro Area was home to the first confirmed COVID-19 case in the U.S., dating back to January 21, 2020, so this area provides the longest available U.S. time series of COVID-19 activity, which is desirable for our analysis since even this time frame is relatively short for POMP modeling.
The daily time series we use here consists of confirmed COVID-19 case counts for the Seattle Metropolitan Area, obtained by summing the rows corresponding to King, Pierce and Snohomish counties. It comprises 95 observations from January 22nd to April 25th of 2020. Plotting the series below, we see a slow rise in COVID-19 cases over the first 50 days after its arrival in Seattle and exponential growth thereafter, even though stay-at-home orders have been in effect in Washington state since March 12th. Note that these are cumulative case counts to date rather than active case counts, as full reporting of recoveries and deaths is not available at this time.
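For reference, the county aggregation can be sketched as follows; the file and column names (Admin2, Province_State, and the date-stamped count columns) reflect our understanding of the JHU CSSE US time series layout at the time of writing and should be verified against the repository.
# Sketch of the aggregation step; the file and column names assume the layout of
# the JHU CSSE US confirmed-case time series and should be checked against it.
jhu <- read.csv("time_series_covid19_confirmed_US.csv", check.names = FALSE)
seattle_rows <- jhu$Province_State == "Washington" &
  jhu$Admin2 %in% c("King", "Pierce", "Snohomish")
date_cols <- grep("^[0-9]+/[0-9]+/[0-9]+$", names(jhu))  # daily cumulative count columns
B <- colSums(jhu[seattle_rows, date_cols])               # sum over the three counties
data <- data.frame(day = seq_along(B), B = as.numeric(B))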
We use a framework based on the one proposed by Matrajt and Leung 2020 to model the spread of COVID-19 in the Seattle Metro Area. In their analysis, Matrajt and Leung calibrated ten SEIR models for individual age groups: 0–5, 6–9, 10–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70–79, and > 80 years old. The force of infection (\(\lambda\) or \(\mu_{SE}\)) for each age group model was based on the average inter-group contact matrix (estimated by Wallinga et al. 2006), normalized by the proportion of the population in each contact age group. The role of these covariates in the model is described in more detail in Section 3. The covariates are visualized below.
(Groups for the contact matrix are ordered such that Group 1 = 0-5 years, …, Group 10 = 80+ years)
We develop a POMP compartment model to describe the spread of COVID-19 in the Seattle population. The model consists of four distinct stages:
\[\frac{dS}{dt}= -\lambda S(t)\] \[\frac{dE}{dt}=\lambda S(t) - \sigma E(t)\] \[\frac{dI}{dt}=\sigma E(t) - \gamma I(t)\] \[\frac{dR}{dt}=\gamma I(t)\]
Matrajt and Leung originally define the force of infection (\(\lambda_i\)) (i.e. the rate of moving from state \(S\) to state \(E\)) for each age class \(i\) as:
\[ \lambda_i = \mu_{SE_{i}} = \beta \sum_{j=1}^{10} \frac{c_{i,j}}{N_j}I_j \] where \(c_{i,j}\) is the estimated number of contacts per day between age class \(i\) and \(j\) in a total population size \(N\). In addition to our goal of replicating a version of Matrajt and Leung’s simulation study, we also wish to fit our POMP model to our available COVID-19 confirmed case time series, which, unfortunately, does not include any age group information and therefore precludes stratification of our data in this way. Instead, we computed a weighted average contact rate \(C\) for the population as a whole as follows:
\[ C = \sum_{i=1}^{10} \frac{N_i}{N} \sum_{j=1}^{10} \frac{c_{i,j}}{N_j} = 0.0000493 \]
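In code, this is just a weighted sum over the rows of the contact matrix. The sketch below assumes a hypothetical 10 x 10 matrix contact_matrix of daily contacts \(c_{i,j}\) and a hypothetical vector age_pop of age-group population sizes \(N_i\); the intermediate quantities correspond to the lambdas and pop objects used in the covariate construction later on.
# Sketch: weighted average contact rate. contact_matrix (10 x 10, entries c_{i,j})
# and age_pop (age-group sizes N_i) are hypothetical placeholders for the values
# taken from Wallinga et al. 2006 and the Seattle Metro Area population.
weighted_contact_rate <- function(contact_matrix, age_pop) {
  N <- sum(age_pop)
  lambdas <- rowSums(sweep(contact_matrix, 2, age_pop, "/"))  # sum_j c_{i,j} / N_j
  pop <- age_pop / N                                          # population fractions N_i / N
  sum(pop * lambdas)                                          # = C
}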
Based on this weighted average contact rate, we define the force of infection \(\lambda\) in our model as:
\[ \lambda = \mu_{SE} = \beta C I \] We further define our parameters as follows:
\[ \sigma = \mu_{EI} \]
\[ \gamma = \mu_{IR} \] Following Matrajt and Leung, we model \(\frac{1}{\sigma}\) as the mean latent period of the disease (range: 4.5-5.8 days, mean: 5.16 days) [Lauer et al. 2020] and \(\frac{1}{\gamma}\) as the mean infectious period of the disease (range: 3-9 days, mean: 5.02 days) [Bi et al. 2020].
Lastly, the \(\beta\) parameter represents the transmission coefficient of COVID-19 in the population; it depends on both \(\gamma\) and the basic reproduction number \(R_0\), the expected number of secondary cases stemming from a single primary case. We vary \(\beta\) to fit the confirmed case time series, and can use the final \(\beta\) value to estimate the COVID-19 \(R_0\) in this population.
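To make the link between \(\beta\) and \(R_0\) explicit: while nearly the whole population is susceptible (\(S \approx N\)), a single infectious individual generates new infections at rate \(\beta C N\) for an average duration of \(1/\gamma\), so under this model
\[ R_0 \approx \frac{\beta C N}{\gamma}. \]
As a rough check, the value \(\beta = 0.0034\) used in the simulations below, together with \(C = 0.0000493\), \(N = 3.5\) million, and \(1/\gamma = 5.02\) days, corresponds to \(R_0 \approx 2.9\), broadly in line with early published estimates for SARS-CoV-2.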
As in Matrajt and Leung, we fix the total population at \(N\) = 3.5 million and the reporting rate at \(\rho\) = 0.2. This low value of \(\rho\) is chosen because early research suggests as many as 80% of COVID-19 cases are mild and likely to go unreported in the United States, where widespread testing is currently unavailable. Ideally we would estimate \(\rho\) along with the other parameters, but for simplicity, and to follow the framework given by Matrajt and Leung, we hold it fixed here.
We model our cumulative case count data as a Poisson process with mean \(\rho R(t)\):
\[ cases_t \sim \mathrm{Pois}(\rho R(t))\]
cov_statenames <- c("S","E","I","R")
cov_paramnames <- c("Beta","N","rho","mu_EI","mu_IR")
cov_obsnames <- "B"
# measurement density: reported cases B are Poisson with mean rho*R
# (the small constant guards against a zero Poisson mean)
cov_dmeasure <- "
lik = dpois(B,rho*R+1e-6,give_log);
"
# measurement simulator
cov_rmeasure <- "
B = rpois(rho*R+1e-6);
"
# Euler-binomial approximation of the S -> E -> I -> R transitions
cov_rprocess <- "
double dN_SE = rbinom(S,1-exp(-Beta*C*I*dt));
double dN_EI = rbinom(E,1-exp(-dt*mu_EI));
double dN_IR = rbinom(I,1-exp(-dt*mu_IR));
S -= dN_SE;
E += dN_SE - dN_EI;
I += dN_EI - dN_IR;
R += dN_IR;
"
# initial conditions: one infectious individual in an otherwise susceptible population
cov_rinit <- "
S = nearbyint(N)-1;
E = 0;
I = 1;
R = 0;
"
We begin our simulation study by creating covariates that model the reduction in the average contact rate \(C\) within our population. We model four scenarios: 0% reduction (no change), 50% reduction, 75% reduction, and 90% reduction in the average contact rate beginning at Day 51 of our time series, corresponding to the start of the stay-at-home mandate on March 12. This stay-at-home order currently expires May 4th, potentially leaving the population of Seattle vulnerable to a resurgence peak in COVID-19 cases. To simulate what may happen upon re-opening, we extend our time series from 95 to 155 time points, allowing us to simulate about 7 weeks of reduced contact rates under the social distancing mandate followed by about 7 weeks at normal contact rates after the mandate expires. We recognize this is an extreme case, since re-opening is likely to be gradual and most individuals will still try to reduce non-essential contacts after the ordinance is lifted; nevertheless, we want to simulate an immediate return to “normalcy”, which is what some people are vehemently advocating for.
# extend TS to forecast through 7 weeks after stay at home ordinance lifted
day = 96:155
B = rep(NA, length(day))
added = data.frame(day, B)
data=rbind(data, added)
# setting up contact rate covariates for various contact reduction percentages
# weighted avg contacts by age group
# no change in contact rate before March 12
data$C1 = sum(pop*lambdas)
data$C50 = sum(pop*lambdas)
data$C75 = sum(pop*lambdas)
data$C90 = sum(pop*lambdas)
# stay at home order begins March 12
data$C50[data$day > 50] = sum(pop*lambdas)*0.5
data$C75[data$day > 50] = sum(pop*lambdas)*0.25
data$C90[data$day > 50] = sum(pop*lambdas)*0.1
# stay at home order lifts May 4, back to previous contact rate
data$C50[data$day > 103] = sum(pop*lambdas)
data$C75[data$day > 103] = sum(pop*lambdas)
data$C90[data$day > 103] = sum(pop*lambdas)
cov_covar1 <- covariate_table(
C=data$C1,
t=data$day,
times="t"
)
cov_covar50 <- covariate_table(
C=data$C50,
t=data$day,
times="t"
)
cov_covar75 <- covariate_table(
C=data$C75,
t=data$day,
times="t"
)
cov_covar90 <- covariate_table(
C=data$C90,
t=data$day,
times="t"
)
cov_pomp1 <- pomp(
data=subset(data,select=c(day,B)),
times="day",
t0=0,
rprocess=euler(
step.fun=Csnippet(cov_rprocess),
delta.t=1/12
),
rmeasure=Csnippet(cov_rmeasure),
dmeasure=Csnippet(cov_dmeasure),
partrans=parameter_trans(
log=c("Beta","N","mu_EI","mu_IR"),
logit="rho"),
covar=cov_covar1,
obsnames = cov_obsnames,
statenames=cov_statenames,
paramnames=cov_paramnames,
rinit=Csnippet(cov_rinit)
)
cov_pomp50 <- pomp(
data=subset(data,select=c(day,B)),
times="day",
t0=0,
rprocess=euler(
step.fun=Csnippet(cov_rprocess),
delta.t=1/12
),
rmeasure=Csnippet(cov_rmeasure),
dmeasure=Csnippet(cov_dmeasure),
partrans=parameter_trans(
log=c("Beta","N","mu_EI","mu_IR"),
logit="rho"),
covar=cov_covar50,
obsnames = cov_obsnames,
statenames=cov_statenames,
paramnames=cov_paramnames,
rinit=Csnippet(cov_rinit)
)
cov_pomp75 <- pomp(
data=subset(data,select=c(day,B)),
times="day",
t0=0,
rprocess=euler(
step.fun=Csnippet(cov_rprocess),
delta.t=1/12
),
rmeasure=Csnippet(cov_rmeasure),
dmeasure=Csnippet(cov_dmeasure),
partrans=parameter_trans(
log=c("Beta","N","mu_EI","mu_IR"),
logit="rho"),
covar=cov_covar75,
obsnames = cov_obsnames,
statenames=cov_statenames,
paramnames=cov_paramnames,
rinit=Csnippet(cov_rinit)
)
cov_pomp90 <- pomp(
data=subset(data,select=c(day,B)),
times="day",
t0=0,
rprocess=euler(
step.fun=Csnippet(cov_rprocess),
delta.t=1/12
),
rmeasure=Csnippet(cov_rmeasure),
dmeasure=Csnippet(cov_dmeasure),
partrans=parameter_trans(
log=c("Beta","N","mu_EI","mu_IR"),
logit="rho"),
covar=cov_covar90,
obsnames = cov_obsnames,
statenames=cov_statenames,
paramnames=cov_paramnames,
rinit=Csnippet(cov_rinit)
)
We set the following parameter values according to Matrajt and Leung for the simulations:
\[ \mu_{EI} = \frac{1}{5.16} \] \[ \mu_{IR} = \frac{1}{5.02} \] \[ \rho = 0.2 \] \[ N = 3,500,000 \] We vary \(\beta\) until the distancing simulations approximate the real data, then hold the \(\beta\) value constant across all four distancing scenarios to generate the simulations. We plot the full scale of the simulations, as well as a zoomed version to better visualize the trajectories of the various simulations against the true time series.
sims1 <- simulate(cov_pomp1,params=c(Beta=0.0034,mu_IR=1/5.02,mu_EI=1/5.16,rho=0.2,N=3500000),
nsim=50,format="data.frame",include=TRUE)
sims50 <- simulate(cov_pomp50,params=c(Beta=0.0034,mu_IR=1/5.02,mu_EI=1/5.16,rho=0.2,N=3500000),
nsim=50,format="data.frame",include=TRUE)
sims75 <- simulate(cov_pomp75,params=c(Beta=0.0034,mu_IR=1/5.02,mu_EI=1/5.16,rho=0.2,N=3500000),
nsim=50,format="data.frame",include=TRUE)
sims90 <- simulate(cov_pomp90,params=c(Beta=0.0034,mu_IR=1/5.02,mu_EI=1/5.16,rho=0.2,N=3500000),
nsim=50,format="data.frame",include=TRUE)
sims1$distancing = 0; sims50$distancing = .50; sims75$distancing = 0.75; sims90$distancing = .90
sims1 = sims1[156:length(sims1$day),]
sims50 = sims50[156:length(sims50$day),]
sims75 = sims75[156:length(sims75$day),]
sims90 = sims90[156:length(sims90$day),]
plot_data = data[,c(1,2)]; plot_data$.id = NA
plot_data$distancing = 'Real Data'; plot_data$S = NA; plot_data$E = NA;
plot_data$I= NA; plot_data$R = NA; plot_data$C = NA
all_sims =do.call("rbind", list(plot_data, sims1, sims50, sims75, sims90))
ggplot(all_sims,mapping=aes(x=day,y=B,group=interaction(.id, distancing),color=distancing))+
geom_rect(xmin=96, xmax=155, ymin=-25000, ymax=10, fill = "grey90", color="grey90")+
geom_line( size = 1.5, alpha = 0.5)+
geom_line(data=plot_data, aes(x=day, y=B), color = "#386cb0", size=2)+
geom_vline(xintercept = 51, color = 'red', linetype='dashed') +
annotate("text", x=51, y=500000, label="\nStay at Home Order Begins", angle = 90, size=4) +
geom_vline(xintercept = 104, color = 'red', linetype='dashed') +
annotate("text", x=101, y=500000, label="Stay at Home Order Ends", angle = 90, size=4) +
annotate("text", x=125, y=-12500, label="Future Projection", size=4, fontface = 'italic') +
scale_color_brewer(palette = "Accent") +
ggtitle("SEIR Simulations of COVID-19 Spread in Seattle, WA") +
xlab("Days Since First Case") + ylab("COVID-19 Cases")+
ylim(-25000, NA) + labs(color='Contact Reduction (%)')
ggplot(all_sims,mapping=aes(x=day,y=B,group=interaction(.id, distancing),color=distancing))+
geom_rect(xmin=96, xmax=155, ymin=-1000, ymax=10, fill = "grey90", color="grey90")+
geom_line( size = 1.5, alpha = 0.5)+
geom_line(data=plot_data, aes(x=day, y=B), color = "#386cb0", size=2)+
geom_vline(xintercept = 51, color = 'red', linetype='dashed') +
annotate("text", x=51, y=10000, label="\nStay at Home Order Begins", angle = 90, size=4) +
geom_vline(xintercept = 104, color = 'red', linetype='dashed') +
annotate("text", x=101, y=10000, label="Stay at Home Order Ends", angle = 90, size=4) +
annotate("text", x=125, y=-700, label="Future Projection", size=4, fontface = 'italic') +
scale_color_brewer(palette = "Accent") +
ggtitle("SEIR Simulations of COVID-19 Spread in Seattle, WA (Zoomed)") +
xlab("Days Since First Case") + ylab("COVID-19 Cases")+
ylim(-1000,20000) + labs(color='Contact Reduction (%)')
We can learn a great deal about the relationship between social distancing and the dynamics of COVID-19 spread from this simulation study. First, we clearly see the effect of contact reduction in slowing the spread of disease, in other words “flattening the curve”. Furthermore, no matter how effectively the population reduced its contact rates during the 7-week stay-at-home order, we find evidence of a large resurgence of cases if “normal” contact rates resume after the order is lifted on May 4th. These results suggest that 1) the stay-at-home order should be extended beyond the current May 4th date, and 2) non-essential contacts should still be minimized once the order is lifted, and increased only gradually rather than immediately returned to pre-pandemic levels. Without these precautions, our simulations indicate that infections would skyrocket and overwhelm the healthcare system in the weeks to come.
We also notice that the real data lie between the 50% and 75% contact reduction simulation trajectories. Based on this, we estimate the true contact reduction to be roughly 60%, and use this setting when calibrating the model to the observed data.
data = data[1:95,]
data$C60 = sum(pop*lambdas)
data$C60[data$day > 50] = sum(pop*lambdas)*0.4
cov_covar60 <- covariate_table(
C=data$C60,
t=data$day,
times="t"
)
cov_pomp60 <- pomp(
data=data,
times="day",
t0=0,
rprocess=euler(
step.fun=Csnippet(cov_rprocess),
delta.t=1/12
),
rmeasure=Csnippet(cov_rmeasure),
dmeasure=Csnippet(cov_dmeasure),
partrans=parameter_trans(
log=c("Beta","N","mu_EI","mu_IR"),
logit="rho"),
covar=cov_covar60,
obsnames = cov_obsnames,
statenames=cov_statenames,
paramnames=cov_paramnames,
rinit=Csnippet(cov_rinit)
)
For both local and global maximization, we use the iterated filtering (IF2) algorithm implemented in the pomp package. Our starting values are those given by Matrajt and Leung for all parameters except \(\beta\), whose starting value we determined via the simulation study above. The local search also serves to make the subsequent global search more efficient.
We carry out the parallelized local maximization on our local machine at run_level = 2 to get a picture of the likelihood over the parameter space in a relatively short amount of time. We later run the global search locally at run_level = 2, as well as on the Great Lakes cluster at run_level = 3, for a more robust search of the parameter space.
run_level <- 2
cov_Np <- switch(run_level, 100, 20000, 50000)
cov_Nmif <- switch(run_level, 10, 100, 250)
cov_Neval <- switch(run_level, 10, 10, 10)
cov_Nglobal <- switch(run_level, 10, 10, 50)
cov_Nlocal <- switch(run_level, 10, 10, 20)
cov_mle = c(Beta=0.0034,mu_IR=1/5.02,mu_EI=1/5.16,rho=0.2,N=3500000)
cov_rw.sd <- 0.02; cov_cooling.fraction.50 <- 0.5
stew(file=sprintf("anotherlocal_search-%d.rda",run_level),{
t_local <- system.time({
mifs_local <- foreach(i=1:cov_Nlocal,
.packages='pomp', .combine=c) %dopar% {
mif2(cov_pomp60,
params=cov_mle,
Np=cov_Np,
Nmif=cov_Nmif,
cooling.fraction.50=cov_cooling.fraction.50,
rw.sd=rw.sd(
Beta=cov_rw.sd,
mu_IR=cov_rw.sd,
mu_EI=cov_rw.sd)
)
}
})
},seed=900242057,kind="L'Ecuyer")
stew(file=sprintf("anotherlik_local-%d.rda",run_level),{
t_local_eval <- system.time({
liks_local <- foreach(i=1:cov_Nlocal,
.combine=rbind,.packages='pomp')%dopar% {
evals <- replicate(cov_Neval, logLik(
pfilter(cov_pomp60,params=coef(mifs_local[[i]]),Np=cov_Np)))
logmeanexp(evals, se=TRUE)
}
})
},seed=900242057,kind="L'Ecuyer")
results_local <- data.frame(logLik=liks_local[,1],
logLik_se=liks_local[,2],t(sapply(mifs_local,coef)))
summary(results_local$logLik,digits=5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1723.6 -1653.2 -814.5 -1118.1 -734.9 -685.3
pairs(~logLik+Beta+mu_IR+mu_EI,
data=results_local)
We determined the box for the global search based on 1) the results of the local search and 2) the acceptable parameter ranges suggested by the prior scientific work in Matrajt and Leung. The starting values for the global search are randomly sampled from these ranges and perturbed over the cov_Nmif MIF iterations.
run_level <- 2
cov_Np <- switch(run_level, 100, 20000, 50000)
cov_Nmif <- switch(run_level, 10, 100, 250)
cov_Neval <- switch(run_level, 10, 10, 10)
cov_Nglobal <- switch(run_level, 10, 10, 50)
cov_Nlocal <- switch(run_level, 10, 10, 20)
cov_fixed_params <- c(rho=0.2, N=3500000)
cov_box <- rbind(
Beta=c(0.001, 0.01),
mu_IR=c(1/9,1/3),
mu_EI = c(1/5.8, 1/4.5)
)
cov_rw.sd <- 0.02
cov_cooling.fraction.50 <- 0.5
stew(file=sprintf("andyetanother_box_eval-%d.rda",run_level),{
t_global <- system.time({
mifs_global <- foreach(i=1:cov_Nglobal,
.packages='pomp', .combine=c) %dopar% mif2(
cov_pomp60,
params=c(apply(cov_box,1,function(x)runif(1,x[1],x[2])),
cov_fixed_params),
Np=cov_Np,
Nmif=cov_Nmif,
cooling.fraction.50=cov_cooling.fraction.50,
rw.sd=rw.sd(
Beta=cov_rw.sd,
mu_IR=cov_rw.sd,
mu_EI=cov_rw.sd
)
)
})
})
stew(file = sprintf("andyetanother_lik_global_eval-%d.rda", run_level), {
t_global_eval <- system.time({
liks_global <- foreach(i = 1:cov_Nglobal,
.packages = 'pomp',
.combine = rbind) %dopar% {
evals <- replicate(cov_Neval,
logLik(pfilter(
cov_pomp60, params = coef(mifs_global[[i]]), Np = cov_Np
)))
logmeanexp(evals, se = TRUE)}
})
})
results_global <- data.frame(logLik = liks_global[, 1],
logLik_se = liks_global[, 2], t(sapply(mifs_global, coef)))
summary(results_global$logLik, digits = 5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1741.3 -1715.7 -1668.1 -1347.4 -828.0 -749.5
plot(mifs_global)
results_global[which.max(results_global[,"logLik"]),]
## logLik logLik_se Beta mu_IR mu_EI rho N
## result.3 -749.4675 1.227771 0.002153942 0.0004243604 1.959094 0.2 3500000
run_level <- 3
cov_Np <- switch(run_level, 100, 20000, 50000)
cov_Nmif <- switch(run_level, 10, 100, 250)
cov_Neval <- switch(run_level, 10, 10, 10)
cov_Nglobal <- switch(run_level, 10, 10, 50)
cov_Nlocal <- switch(run_level, 10, 10, 20)
load('anotherbox_eval-3.rda')
load('anotherlik_global_eval-3.rda')
results_global <- data.frame(logLik = liks_global[, 1],
logLik_se = liks_global[, 2], t(sapply(mifs_global, coef)))
summary(results_global$logLik, digits = 5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1865.2 -858.1 -766.8 -858.6 -702.1 -656.7
results_global[which.max(results_global[,"logLik"]),]
## logLik logLik_se Beta mu_IR mu_EI rho
## result.10 -656.742 0.6325675 0.005230596 0.0003667853 0.1674002 0.2
## N
## result.10 3500000
#plot(mifs_global)
We observe that convergence across the board at run_level = 2 is quite poor, suggesting that a more robust search is needed, or that the model does not capture the true dynamic process, or, very likely, both. Unfortunately, the mifs_global object pulled down from our run_level = 3 global maximization on the Great Lakes cluster is missing the parameter-specific diagnostic plots, and due to time constraints we were unable to re-run the global search at this level to fix this. However, the convergence of the plots that did render at this run level is much better than at run_level = 2, suggesting that a more robust search is indeed needed to properly identify the MLE of this dynamic system. The effective sample size is good (> 100) across the board, though it does exhibit several dips at time points > 40.
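As a sketch of how these diagnostics can be inspected directly (the particle count is illustrative, and we assume pomp's eff_sample_size() accessor for pfilter objects), one can run a single particle filter at the best global-search parameter vector and plot its conditional log likelihood and effective sample size:
# Sketch: one particle filter at the best global-search parameter vector; plot()
# shows the data, effective sample size and conditional log likelihood over time.
pf60 <- pfilter(cov_pomp60,
                params = coef(mifs_global[[which.max(results_global$logLik)]]),
                Np = 2000)
plot(pf60)
eff_sample_size(pf60)  # per-time-point effective sample sizes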
When we run the global maximization at run_level = 2 locally on a laptop, we find max(logLik) = -749.5; running it on the cluster at run_level = 3 yields a better maximum of max(logLik) = -656.7. Taking a closer look at the parameter values at this maximum, we find the estimate for \(\mu_{EI}\) to be very reasonable, corresponding to a mean latent period of roughly 6 days; however, the estimate for \(\mu_{IR}\) is completely unreasonable, suggesting a mean infectious period of about 2726 days. We comment on this further in Section 7.
Simulations at the parameter vector corresponding to the global MLE from run_level = 3 do indeed match the true data quite well.
gsims60 <- simulate(cov_pomp60,params=c(Beta=0.005230596,mu_IR=0.0003667853,mu_EI=0.1674002,rho=0.2,N=3500000),
nsim=20,format="data.frame",include=TRUE)
gsims60 =gsims60[96:length(gsims60$day),]
gsims60$distancing = "Simulated at MLE";
gsims60 = gsims60[,c("day","B",".id","distancing")]
plot_data = data[,c(1,2)]; plot_data$.id = NA
plot_data$distancing = 'Real Data'
all_sims =do.call("rbind", list(plot_data, gsims60))
ggplot(all_sims,mapping=aes(x=day,y=B,group=interaction(.id, distancing),color=distancing))+
geom_line( size = 1.5, alpha = 0.5)+
geom_line(data=plot_data, aes(x=day, y=B), color = "#e41a1c", size=2)+
geom_vline(xintercept = 51, color = 'red', linetype='dashed') +
annotate("text", x=51, y=8000, label="\nStay at Home Order Begins", angle = 90, size=4) +
geom_vline(xintercept = 104, color = 'grey', linetype='dashed') +
scale_color_brewer(palette = "Set1") +
ggtitle("SEIR Simulations of COVID-19 Spread in Seattle, WA") +
xlab("Days Since First Case") + ylab("COVID-19 Cases")+
labs(color='Data Set')
We investigate an SEIR framework for modeling the dynamics of COVID-19 spread in the Seattle Metro area from January 22 to April 25, 2020. We simulate four different social distancing scenarios and show that lifting the stay-at-home order as planned on May 4th is likely to result in a strong resurgence of COVID-19, regardless of the level of adherence to social distancing mandates over the last 7 weeks.
We also fit this SEIR model to the confirmed case-count data over the same time span, using a baseline contact reduction of 60% during the social distancing ordinance. Performing global maximization via iterated filtering, we identify an MLE parameter vector that reproduces the time series quite well in a simulation study but yields an unreasonable estimate of the mean infectious period of the disease. This may happen for several reasons.
Future directions for this work would be to repeat this analysis once more data are available, and to develop a more robust framework that 1) models the reduction in contact rates on a more gradual scale, rather than the simplistic “all or nothing” approach taken here (a sketch of such a covariate is given below), and 2) more adequately captures the latent exposure state and its relation to both the susceptible state (individuals can be highly contagious even while pre-symptomatic) and the removed state (some individuals may bypass infection altogether). Taken together with an improved range of plausible \(\beta\) values, these changes may improve the results.
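As a sketch of the first point, the contact covariate could ramp linearly back to the baseline contact rate over several weeks after the order lifts rather than jumping back on day 104; the six-week ramp length below is arbitrary, and pop and lambdas are the objects used in the covariate construction earlier.
# Sketch: contact-rate covariate with a linear six-week return to baseline after
# the stay-at-home order lifts; the ramp length (42 days) is chosen arbitrarily.
day <- 1:155                              # extended time index used in the simulations
C_base <- sum(pop*lambdas)                # baseline weighted average contact rate
C_ramp <- rep(C_base, length(day))
C_ramp[day > 50] <- 0.4*C_base            # 60% reduction while the order is in effect
ramp <- day > 103 & day <= 145
C_ramp[ramp] <- 0.4*C_base + 0.6*C_base*(day[ramp] - 103)/42  # gradual re-opening
C_ramp[day > 145] <- C_base               # back to baseline after the ramp
cov_covar_ramp <- covariate_table(C = C_ramp, t = day, times = "t")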