1. Introduction

The coronavirus COVID-19 is undoubtedly already the defining feature of 2020, if not our lives thus far, for most of us. COVID-19, the disease caused by the SARS-CoV-2 virus, was first identified in December 2019 in Wuhan, China and since then has spread worldwide, with a total of over 3 million confirmed cases and 208,000 deaths as of April 26, 2020. The exponential spread of COVID-19 can be attributed to both the ability of the virus to be transmitted between individuals even before any symptoms emerge, as well as the lack of a vaccine to develop heard immunity in the population. Without any means of controlling the spread of the SARS-CoV-2 virus, government officials around the world have resorted to implementing stay at home orders, mandatory shut downs of non-essential businesses and advocating for strict social distancing practices during essential activities in public in an effort to “flatten the curve”, or slow the spread of the disease and avoid overwhelming the healthcare system. These orders have been in effect for about 6-7 weeks throughout the United States, and extend through the end of May in some parts of the country. These stay at home orders have had a significant impact on our society, both economically and socially, causing many citizens and politicians to push to lift the shelter-in-place (SIP) ordinances and return to normal daily life.

Historical data from the Spanish Flu pandemic of 1918 can serve as an interesting case study into what can happen upon lifting SIP orders. The following figures show time series of Spanish flu-caused deaths from Philadelphia, New York, San Francisco and St. Louis show how vastly outcomes can differ even within our large U.S. cities based on the timing and extent of social distancing mandates. If social distancing measures were put into place early enough in the timeline of COVID-19 spread, some cities may have “successfully flattened their curve”, as was done in New York and St. Louis during the Spanish Flu pandemic. But how flat is flat enough? By lifting these measures too early, though, we are threatened with a second resurgence peak of cases, as was seen in San Francisco and even in St. Louis, where things started off well and they thought they had done enough to flatten their curve for good. As America begins its plans to open its doors again amidst this pandemic, we must investigate and ask ourselves: is this history bound to repeat itself in the upcoming weeks?

Source: National Geographic

In this case study, we will develop an Susceptible-Exposed-Infected-Removed (SEIR) compartment model of COVID-19 spread in Seattle, WA metropolitan area. We will simulate the effects of various social distancing scenarios based on work done by Matrajt and Leung 2020, and attempt to fit the model to the real-world COVID-19 case count time series data for Seattle.

2. Data

Our COVID-19 case count dataset was obtained from the Johns Hopkins University Center for Systems Science and Engineering. While full county-level data is available for the U.S (and country-level data is avaialble worldwide), we choose to limit our case study to the Seattle Metropolitan Area, comprised of King, Pierce and Snohomish counties in Washington state. The reasoning for this is two-fold: firstly, Matrajt and Leung focus on this area specificially, so it makes sesne for a replication study to use the same population, and secondly, the Seattle Metro Area was home to the first confirmed COVID-19 case in the U.S. dating back to January 21, 2020, meaning this area can provide the longest time series of active COVID-19, which is desirable for our analysis since even this time frame is relatively short for POMP modeling.

The daily time series we utilize here is comprised of confirmed COVID-19 case counts from the Seattle Metropolitan Area, obtained by summing the rows corresponding to King, Pierce and Snohomish counties. This time series consists of 95 observations from January 22nd to April 25th of 2020. Plotting this time series below, we see a slow rise of COVID-19 cases in the first 50 days of its arrival in Seattle, and exponential growth thereafter, even though stay at home orders have been in effect in the state of Washington since March 12th. It is important to note these are cumulative case counts to date rather than only active case counts, as full reporting of recoveries and deaths are not available at this time.

2.1 Covariates: Contact Rate and Population Stratification by age

We will use a framework based on the one proposed by Matrajt and Leung 2020 to model the spread of COVID-19 in the Seattle Metro Area. In their analysis, Matrajt and Leung calibrated ten SEIR models for individual age groups: 0–5, 6–9, 10–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70–79, and > 80 years old. The force of infection (\(\lambda\) or \(\mu_{IR}\)) for each age group model was based on the average inter-group contact matrix (estimated by Wallinga et al. 2006) normalized by the percentage of the population in each contact age group. The details of the role of the covariates in the model will be described in more detail in Section 3. These covariates are visualized below.

(Groups for the contact matrix are ordered such that Group 1 = 0-5 years, …, Group 10 = 80+ years)

3. SEIR Model Design

We develop an POMP compartment framework to model the spread of COVID-19 in the Seattle population. This model consists of four distinct stages:

S: susceptible (all individuals)
E: exposed (asymptomatic)
I: infected (symptomatic)
R: removed (recovered or deceased)

Flow Diagram

3.1 ODE Representation

\[\frac{dS}{dt}= -\lambda S(t)\] \[\frac{dE}{dt}=\lambda S(t) - \sigma E(t)\] \[\frac{dI}{dt}=\sigma E(t) - \gamma I(t)\] \[\frac{dR}{dt}=\gamma I(t)\]

3.2 Defining Transmission Processes and Parameter values

Matrajt and Leung originally define the force of infection (\(\lambda_i\)) (i.e. the rate of moving from state \(S\) to state \(E\)) for each age class \(i\) as:

\[ \lambda_i = \mu_{SE_{i}} = \beta \sum_{j=1}^{10} \frac{c_{i,j}}{N_j}I_j \] where \(c_{i,j}\) is the estimated number of contacts per day between age class \(i\) and \(j\) in a total population size \(N\). In addition to our goal of replicating a version of Matrajt and Leung’s simulation study, we also wish to fit our POMP model to our available COVID-19 confirmed case time series, which, unfortunately, does not include any age group information and therefore precludes stratification of our data in this way. Instead, we computed a weighted average contact rate \(C\) for the population as a whole as follows:

\[ C = \sum_{i=1}^{10} \frac{N_i}{N} \sum_{j=1}^{10} \frac{c_{i,j}}{N_j} = 0.0000493 \]

Based on this weighted average contact rate, we define the force of infection \(\lambda\) in our model as:

\[ \lambda = \mu_{SE} = \beta C I_j \] We further define our parameters as follows:

\[ \sigma = \mu_{EI} \]

\[ \gamma = \mu_{IR} \] Following Matrajt and Leung, we model \(\frac{1}{\sigma}\) as the mean latent period of the disease (range: 4.5-5.8 days, mean: 5.16 days) [Lauer et al. 2020] and \(\frac{1}{\gamma}\) as the mean infectious period of the disease (range: 3-9 days, mean: 5.02 days) [Bi et al. 2020].

Lastly, the \(\beta\) parameter represents the transmission coefficient of COVID-19 in the population, and depends both on \(\gamma\) and the basic reproduction number \(R_0\), which translates to the number of secondary cases stemming from a primary case of disease. We vary \(\beta\) to fit the confirmed case time series, and can use our final \(\beta\) value to estimate the COVID-19 \(R_0\) in this popultion.

As in Matrajt and Leung, we fix parameter values for the total population \(N\) = 3.5 million and for the reporting rate \(\rho\) at 0.2. This low value of \(\rho\) is chosen as early research shows as many as 80% of COVID-19 cases are mild and are likely to go unreported in the United States, where widespread testing is currently unavailable. Ideally it would be preferrable to vary \(\rho\) along with the other parameters, but for simplicity and to follow the framework given by Matrajt and Leung, we will hold it fixed here.

3.3 Measurement Model

We model our case count data as a Poisson process:

\[ cases_t \sim Pois (\rho)\]

3.4 Model Setup

cov_statenames <- c("S","E","I","R")
cov_paramnames <- c("Beta","N","rho","mu_EI","mu_IR")
cov_obsnames <- "B"

cov_dmeasure <- "
  lik = dpois(B,rho*R+1e-6,give_log);
"

cov_rmeasure <- "
  B = rpois(rho*R+1e-6);
"

cov_rprocess <- "

  double dN_SE = rbinom(S,1-exp(-Beta*C*I*dt));
  double dN_EI = rbinom(E,1-exp(-dt*mu_EI));
  double dN_IR = rbinom(I,1-exp(-dt*mu_IR));

  S -= dN_SE;
  E += dN_SE - dN_EI;
  I += dN_EI - dN_IR;
  R += dN_IR;
"
cov_rinit <- "
 S = nearbyint(N)-1;
 E = 0;
 I = 1;
 R = 0;
 "

4. Simulation Study

4.1 Simulation setup

We begin our simulation study by creating covariates that model the reduction in average contact rate \(C\) within our population. We model 4 scenarios here: 0% reduction (no change), 50% reduction, 75% reduction and 90% reduction in average contact rate beginning at Day 51 in our time series, corresponding to the start of the stay at home mandate on March 12. Currently, this stay at home order expires May 4th, possibly leaving the population of Seattle vulnerable to a resurgence peak in COVID-19 cases. To simulate what may happen upon this re-opening, we extend the length of our time series from 95 time points to 155 time points, enabling simulation of 52 days of reduced contact rate under the social distancing mandate and 52 days at normal contact rates after the mandate expires. While we understand that this is an extreme case, as it is likely that the re-opening will be gradual and most individuals still try to reduce their non-essential contacts upon lifting of the social distancing ordinance, we wanted to simulate the case of an immediate return to “normalcy”, which is what some people are vehemently advocating for.

# extend TS to forecast through 7 weeks after stay at home ordinance lifted
day = 96:155
B = rep(NA, length(day))
added = data.frame(day, B)
data=rbind(data, added)

# setting up contact rate covariates for various contact reduction percentages
# weighted avg contacts by age group

# no change in contact rate before March 12
data$C1 = sum(pop*lambdas)
data$C50 = sum(pop*lambdas)
data$C75 = sum(pop*lambdas)
data$C90 = sum(pop*lambdas)
# stay at home order begins March 12
data$C50[data$day > 50] = sum(pop*lambdas)*0.5
data$C75[data$day > 50] = sum(pop*lambdas)*0.25
data$C90[data$day > 50] = sum(pop*lambdas)*0.1
# stay at home order lifts May 4, back to previous contact rate
data$C50[data$day > 103] = sum(pop*lambdas)
data$C75[data$day > 103] = sum(pop*lambdas)
data$C90[data$day > 103] = sum(pop*lambdas)

cov_covar1 <- covariate_table(
  C=data$C1,
  t=data$day,
  times="t"
)

cov_covar50 <- covariate_table(
  C=data$C50,
  t=data$day,
  times="t"
)
cov_covar75 <- covariate_table(
  C=data$C75,
  t=data$day,
  times="t"
)
cov_covar90 <- covariate_table(
  C=data$C90,
  t=data$day,
  times="t"
)

4.2 Building POMP objects

cov_pomp1 <- pomp(
  data=subset(data,select=c(day,B)),
  times="day",
  t0=0,
  rprocess=euler(
    step.fun=Csnippet(cov_rprocess),
    delta.t=1/12
  ),
  rmeasure=Csnippet(cov_rmeasure),
  dmeasure=Csnippet(cov_dmeasure),
  partrans=parameter_trans(
    log=c("Beta","N","mu_EI","mu_IR"),
    logit="rho"),
  covar=cov_covar1,
  obsnames = cov_obsnames,
  statenames=cov_statenames,
  paramnames=cov_paramnames,
  rinit=Csnippet(cov_rinit)
)

cov_pomp50 <- pomp(
  data=subset(data,select=c(day,B)),
  times="day",
  t0=0,
  rprocess=euler(
    step.fun=Csnippet(cov_rprocess),
    delta.t=1/12
  ),
  rmeasure=Csnippet(cov_rmeasure),
  dmeasure=Csnippet(cov_dmeasure),
  partrans=parameter_trans(
    log=c("Beta","N","mu_EI","mu_IR"),
    logit="rho"),
  covar=cov_covar50,
  obsnames = cov_obsnames,
  statenames=cov_statenames,
  paramnames=cov_paramnames,
  rinit=Csnippet(cov_rinit)
)

cov_pomp75 <- pomp(
  data=subset(data,select=c(day,B)),
  times="day",
  t0=0,
  rprocess=euler(
    step.fun=Csnippet(cov_rprocess),
    delta.t=1/12
  ),
  rmeasure=Csnippet(cov_rmeasure),
  dmeasure=Csnippet(cov_dmeasure),
  partrans=parameter_trans(
    log=c("Beta","N","mu_EI","mu_IR"),
    logit="rho"),
  covar=cov_covar75,
  obsnames = cov_obsnames,
  statenames=cov_statenames,
  paramnames=cov_paramnames,
  rinit=Csnippet(cov_rinit)
)

cov_pomp90 <- pomp(
  data=subset(data,select=c(day,B)),
  times="day",
  t0=0,
  rprocess=euler(
    step.fun=Csnippet(cov_rprocess),
    delta.t=1/12
  ),
  rmeasure=Csnippet(cov_rmeasure),
  dmeasure=Csnippet(cov_dmeasure),
  partrans=parameter_trans(
    log=c("Beta","N","mu_EI","mu_IR"),
    logit="rho"),
  covar=cov_covar90,
  obsnames = cov_obsnames,
  statenames=cov_statenames,
  paramnames=cov_paramnames,
  rinit=Csnippet(cov_rinit)
)

4.3 Running and Plotting Simulations

We set the following parameter values according to Matrajt and Leung for the simulations:

\[ \mu_{EI} = \frac{1}{5.16} \] \[ \mu_{IR} = \frac{1}{5.02} \] \[ \rho = 0.2 \] \[ N = 3,500,000 \] We vary \(\beta\) until the distancing simulations approximate the real data, then hold the \(\beta\) value constant across all four distancing scenarios to generate the simulations. We plot the full scale of the simulations, as well as a zoomed version to better visualize the trajectories of the various simulations against the true time series.

sims1 <- simulate(cov_pomp1,params=c(Beta=0.0034,mu_IR=1/5.02,mu_EI=1/5.16,rho=0.2,N=3500000),
                  nsim=50,format="data.frame",include=TRUE)
sims50 <- simulate(cov_pomp50,params=c(Beta=0.0034,mu_IR=1/5.02,mu_EI=1/5.16,rho=0.2,N=3500000),
                   nsim=50,format="data.frame",include=TRUE)
sims75 <- simulate(cov_pomp75,params=c(Beta=0.0034,mu_IR=1/5.02,mu_EI=1/5.16,rho=0.2,N=3500000),
                   nsim=50,format="data.frame",include=TRUE)
sims90 <- simulate(cov_pomp90,params=c(Beta=0.0034,mu_IR=1/5.02,mu_EI=1/5.16,rho=0.2,N=3500000),
                   nsim=50,format="data.frame",include=TRUE)
sims1$distancing = 0; sims50$distancing = .50; sims75$distancing = 0.75; sims90$distancing = .90
sims1 = sims1[156:length(sims1$day),]
sims50 = sims50[156:length(sims50$day),]
sims75 = sims75[156:length(sims75$day),]
sims90 = sims90[156:length(sims90$day),]
plot_data = data[,c(1,2)]; plot_data$.id = NA
plot_data$distancing = 'Real Data'; plot_data$S = NA; plot_data$E = NA; 
plot_data$I= NA; plot_data$R = NA; plot_data$C = NA

all_sims =do.call("rbind", list(plot_data, sims1, sims50, sims75, sims90))

ggplot(all_sims,mapping=aes(x=day,y=B,group=interaction(.id, distancing),color=distancing))+
  geom_rect(xmin=96, xmax=155, ymin=-25000, ymax=10, fill = "grey90", color="grey90")+
  geom_line( size = 1.5, alpha = 0.5)+ 
  geom_line(data=plot_data, aes(x=day, y=B), color = "#386cb0", size=2)+
  geom_vline(xintercept = 51, color = 'red', linetype='dashed') +
  annotate("text", x=51, y=500000, label="\nStay at Home Order Begins", angle = 90, size=4) +
  geom_vline(xintercept = 104, color = 'red', linetype='dashed') +
  annotate("text", x=101, y=500000, label="Stay at Home Order Ends", angle = 90, size=4) +
  annotate("text", x=125, y=-12500, label="Future Projection", size=4, fontface = 'italic') +
  scale_color_brewer(palette = "Accent") +
  ggtitle("SEIR Simulations of COVID-19 Spread in Seattle, WA") +
  xlab("Days Since First Case") + ylab("COVID-19 Cases")+
  ylim(-25000, NA) + labs(color='Contact Reduction (%)')

ggplot(all_sims,mapping=aes(x=day,y=B,group=interaction(.id, distancing),color=distancing))+
  geom_rect(xmin=96, xmax=155, ymin=-1000, ymax=10, fill = "grey90", color="grey90")+
  geom_line( size = 1.5, alpha = 0.5)+ 
  geom_line(data=plot_data, aes(x=day, y=B), color = "#386cb0", size=2)+
  geom_vline(xintercept = 51, color = 'red', linetype='dashed') +
  annotate("text", x=51, y=10000, label="\nStay at Home Order Begins", angle = 90, size=4) +
  geom_vline(xintercept = 104, color = 'red', linetype='dashed') +
  annotate("text", x=101, y=10000, label="Stay at Home Order Ends", angle = 90, size=4) +
  annotate("text", x=125, y=-700, label="Future Projection", size=4, fontface = 'italic') +
  scale_color_brewer(palette = "Accent") +
  ggtitle("SEIR Simulations of COVID-19 Spread in Seattle, WA (Zoomed)") +
  xlab("Days Since First Case") + ylab("COVID-19 Cases")+
  ylim(-1000,20000) + labs(color='Contact Reduction (%)')

4.4 Simulation Results

We can learn a great deal about the relationship between social distancing and the dynamics of COVID-19 spread via this simulation study. Firstly, we very clearly see the effects of contact reduction on reducing the spread of disease, or in other words “flattening the curve”. Furthermore, no matter how effectively the population reduced their contact rates during the 7-week stay at home order, we find evidence for a large resurgence of cases if we were to resume “normal” contact rates after the stay at home order is lifted on May 4th. These results suggest that 1) the stay at home order should be extended beyond the current May 4th date and 2) rates of non-essential contacts should still be minimized whenever the stay at home orders are lifted, and only gradually increased rather than immediately returning to pre-pandemic levels. Without these precautions, it is clear that the rates of infection will undoubtedly skyrocket and overwhelm the healthcare system in the weeks to come.

Furthermore, we notice that the real data seems to lie between the 50% and 75% contact reduction simulation trajectories. Based on this, we estimate the true amount of contact reduction to be 60%, and use this setting for calibration of the model to the true data.

data = data[1:95,]
data$C60 = sum(pop*lambdas)
data$C60[data$day > 50] = sum(pop*lambdas)*0.4

cov_covar60 <- covariate_table(
  C=data$C60,
  t=data$day,
  times="t"
)

cov_pomp60 <- pomp(
  data=data,
  times="day",
  t0=0,
  rprocess=euler(
    step.fun=Csnippet(cov_rprocess),
    delta.t=1/12
  ),
  rmeasure=Csnippet(cov_rmeasure),
  dmeasure=Csnippet(cov_dmeasure),
  partrans=parameter_trans(
    log=c("Beta","N","mu_EI","mu_IR"),
    logit="rho"),
  covar=cov_covar60,
  obsnames = cov_obsnames,
  statenames=cov_statenames,
  paramnames=cov_paramnames,
  rinit=Csnippet(cov_rinit)
)

5. Local Maximization

For both local and global maximization, we will be using the iterative filtering (IF2) algorithm implemented in the pomp package. Our starting values will be those given by Matrajt and Leung for all parameters but \(\beta\), which we determined the start value for via the simulation study above. The purpose of the local search is for a more efficient global search.

We will carry out the parallelized local maximization using our local machine at run_level = 2 to obtain an understanding of the likelihood over the parameter space in a relatively short amount of time. We will later run the global search locally at run_level = 2, as well as on the Great Lakes cluster at run_level = 3 for a more robust search of the parameter space.

run_level <- 2
cov_Np <-      switch(run_level, 100, 20000, 50000)
cov_Nmif <-    switch(run_level,  10,   100,   250)
cov_Neval <-   switch(run_level,  10,    10,    10)
cov_Nglobal <- switch(run_level,  10,    10,    50)
cov_Nlocal <-  switch(run_level,  10,    10,    20)

cov_mle = c(Beta=0.0034,mu_IR=1/5.02,mu_EI=1/5.16,rho=0.2,N=3500000)

cov_rw.sd <- 0.02; cov_cooling.fraction.50 <- 0.5
stew(file=sprintf("anotherlocal_search-%d.rda",run_level),{
  t_local <- system.time({
    mifs_local <- foreach(i=1:cov_Nlocal,
                          .packages='pomp', .combine=c) %dopar%  {
                            mif2(cov_pomp60,
                                 params=cov_mle,
                                 Np=cov_Np,
                                 Nmif=cov_Nmif,
                                 cooling.fraction.50=cov_cooling.fraction.50,
                                 rw.sd=rw.sd(
                                   Beta=cov_rw.sd,
                                   mu_IR=cov_rw.sd,
                                   mu_EI=cov_rw.sd)
                            )
                          }
  })
},seed=900242057,kind="L'Ecuyer")


## ----lik_local_eval,cache=FALSE------------------------------------------
stew(file=sprintf("anotherlik_local-%d.rda",run_level),{
  t_local_eval <- system.time({
    liks_local <- foreach(i=1:cov_Nlocal,
                          .combine=rbind,.packages='pomp')%dopar% {
                            evals <- replicate(cov_Neval, logLik(
                              pfilter(cov_pomp60,params=coef(mifs_local[[i]]),Np=cov_Np)))
                            logmeanexp(evals, se=TRUE)
                          }
  })
},seed=900242057,kind="L'Ecuyer")

results_local <- data.frame(logLik=liks_local[,1],
                            logLik_se=liks_local[,2],t(sapply(mifs_local,coef)))


## ----lik_local_summary---------------------------------------------------
summary(results_local$logLik,digits=5)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1723.6 -1653.2  -814.5 -1118.1  -734.9  -685.3

pairs(~logLik+Beta+mu_IR+mu_EI,
      data=results_local)

6. Global Maximization

We determined the box for the global search based on 1) the results of the local search and 2) the acceptable value ranges suggested by prior scientific work in Matrajt and Leung. The starting values in the global search will be randomly sampled from these ranges and preturbed over the the number of MIF iterations cov_Nmif.

6.1 Global Search at run_level=2

run_level <- 2
cov_Np <-      switch(run_level, 100, 20000, 50000)
cov_Nmif <-    switch(run_level,  10,   100,   250)
cov_Neval <-   switch(run_level,  10,    10,    10)
cov_Nglobal <- switch(run_level,  10,    10,    50)
cov_Nlocal <-  switch(run_level,  10,    10,    20)

cov_fixed_params <- c(rho=0.2, N=3500000)

cov_box <- rbind(
  Beta=c(0.001, 0.01),
  mu_IR=c(1/9,1/3),
  mu_EI = c(1/5.8, 1/4.5)
)

cov_rw.sd <- 0.02
cov_cooling.fraction.50 <- 0.5
stew(file=sprintf("andyetanother_box_eval-%d.rda",run_level),{
  t_global <- system.time({
    mifs_global <- foreach(i=1:cov_Nglobal,
                           .packages='pomp', .combine=c) %dopar% mif2(     
                             cov_pomp60,
                             params=c(apply(cov_box,1,function(x)runif(1,x[1],x[2])),
                                      cov_fixed_params),
                             Np=cov_Np,
                             Nmif=cov_Nmif,
                             cooling.fraction.50=cov_cooling.fraction.50,
                             rw.sd=rw.sd(
                               Beta=cov_rw.sd,
                               mu_IR=cov_rw.sd,
                               mu_EI=cov_rw.sd
                             )
                           )
  })
})

stew(file = sprintf("andyetanother_lik_global_eval-%d.rda", run_level), {
  t_global_eval <- system.time({
    liks_global <- foreach(i = 1:cov_Nglobal,
                           .packages = 'pomp',
                           .combine = rbind) %dopar% {
                             evals <- replicate(cov_Neval,
                                                logLik(pfilter(
                                                  cov_pomp60, params = coef(mifs_global[[i]]), Np = cov_Np
                                                )))
                             logmeanexp(evals, se = TRUE)}
  })
})

results_global <- data.frame(logLik = liks_global[, 1],
                             logLik_se = liks_global[, 2], t(sapply(mifs_global, coef)))
summary(results_global$logLik, digits = 5)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1741.3 -1715.7 -1668.1 -1347.4  -828.0  -749.5

plot(mifs_global)

results_global[which.max(results_global[,"logLik"]),]

##             logLik logLik_se        Beta        mu_IR    mu_EI rho       N
## result.3 -749.4675  1.227771 0.002153942 0.0004243604 1.959094 0.2 3500000

6.2 Global Search at run_level=3 (GreatLakes Cluster)

run_level <- 3
cov_Np <-      switch(run_level, 100, 20000, 50000)
cov_Nmif <-    switch(run_level,  10,   100,   250)
cov_Neval <-   switch(run_level,  10,    10,    10)
cov_Nglobal <- switch(run_level,  10,    10,    50)
cov_Nlocal <-  switch(run_level,  10,    10,    20)

load('anotherbox_eval-3.rda')
load('anotherlik_global_eval-3.rda')

results_global <- data.frame(logLik = liks_global[, 1],
                             logLik_se = liks_global[, 2], t(sapply(mifs_global, coef)))
summary(results_global$logLik, digits = 5)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1865.2  -858.1  -766.8  -858.6  -702.1  -656.7

results_global[which.max(results_global[,"logLik"]),]

##             logLik logLik_se        Beta        mu_IR     mu_EI rho
## result.10 -656.742 0.6325675 0.005230596 0.0003667853 0.1674002 0.2
##                 N
## result.10 3500000

#plot(mifs_global)

6.3 Comparing Global Results from run_levels 2 and 3

We observe that the convergence across the board at run_level=2 is quite bad, suggesting that a more robust search is needed, or perhaps the model does not capture the true dynamic process, or, very likely, both of these are true. Unfortunately, the mifs_global object pulled down from our run_level=3 global maximization on the Great Lakes cluster are missing parameter-specific plots and due to time constraints we were unable to re-run the global search at this level to fix this. However, we do notice that the convergence of the plots that did render at this run level is much better than those at run_level=2, suggesting a more robust search is needed to properly identify the MLE of this dynamic system. Effective sample size is good (> 100) across the board, though it does exhibit several dips at time points > 40.

When we run the global maximization at run_level=2 locally on the laptop we find the max(logLik)= -749.5, and when we run it on the cluster at run_level=3 we find a better maximization at max(logLik)=-656.7. Taking a deeper look at the parameter values at the max(logLik), we find the estimate for \(\mu_{EI}\) to be very reasonable, corresponding to a mean latent period of ~5.9 days, however the estimate for \(\mu_{IR}\) is completely unreasonable, suggesting a mean infectious period of 2726.391 days. We will comment on this more in Section 7.

Simulating at the parameter vector corresponding to the global MLE from run_level = 3 does indeed match the true data quite well.

gsims60 <- simulate(cov_pomp60,params=c(Beta=0.005230596,mu_IR=0.0003667853,mu_EI=0.1674002,rho=0.2,N=3500000),
                   nsim=20,format="data.frame",include=TRUE)
gsims60 =gsims60[96:length(gsims60$day),]
gsims60$distancing = "Simulated at MLE";
gsims60 = gsims60[,c("day","B",".id","distancing")]
plot_data = data[,c(1,2)]; plot_data$.id = NA
plot_data$distancing = 'Real Data'


all_sims =do.call("rbind", list(plot_data, gsims60))

ggplot(all_sims,mapping=aes(x=day,y=B,group=interaction(.id, distancing),color=distancing))+
  geom_line( size = 1.5, alpha = 0.5)+
  geom_line(data=plot_data, aes(x=day, y=B), color = "#e41a1c", size=2)+
  geom_vline(xintercept = 51, color = 'red', linetype='dashed') +
  annotate("text", x=51, y=8000, label="\nStay at Home Order Begins", angle = 90, size=4) +
  geom_vline(xintercept = 104, color = 'grey', linetype='dashed') +
  scale_color_brewer(palette = "Set1") +
  ggtitle("SEIR Simulations of COVID-19 Spread in Seattle, WA") +
  xlab("Days Since First Case") + ylab("COVID-19 Cases")+
  labs(color='Data Set')

7. Conclusions and future directions

We investigate an SEIR framework for modeling the dynamics of COVID-19 spread in the Seattle Metro area from January 22 - April 25, 2020. We simulate 4 different social distancing scenarios and show that lifting stay at home orders as planned on May 4th is likely to result in strong resurgence of COVID-19, regardless of the level of aherence to social distancing mandates over the last 7 weeks.

We also fit this SEIR model to confirmed case-count data over the same time span using a baseline contact reduction of 60% during the social distancing ordinance. By performing global maximization via iterative filtering, we identify the MLE parameter vector that models the time series quite well in a simulation study, but identifies an unreasonable estimate of the mean infectious period of the disease. This may happen for several reasons:

Weak Identifiability: weak identifiability means trying to fit a large model to a small amount of data. Since the COVID-19 pandemic is ongoing and has only arrived in the U.S. on January 21, we are limited to a shorter time series than is ideal, especially when dealing with an \(N\) on the scale of 3.5 million as we are here.
Model Design Flaws: we have designed a fairly simple POMP framework to model the dynamics of COVID-19, and perhaps the model could be improved. For example, we discussed how COVID-19 is famously highly transmissible between individuals in the pre-symptomatic Exposed stage, but many of the individuals that do become exposed will not move into the Infected compartment and will instead move straight to the Recovered compartment, so accounting for this with a \(\mu_{ER}\) parameter may be beneficial. Furthermore, it is important to note that this simple SEIR framework is meant to model those with active infections, but our data contains cumulative case counts. If better data becomes available that accurately tracks recoveries and deaths from confirmed cases this SEIR model could be applied more appropriately.
Starting Parameter Values: we found that our MLE estimate for \(\mu_{IR}\) is highly implausible, so we can also expect that our \(\beta\) parameter is also implausible as we know \(beta\) and \(\mu_{IR}\) are related through \(R_0\), the basic reproductive number. \(R_0\) for COVID-19 is estimated somewhere between 1.6-3, with the highest current estimate at 5.7 (which is considered exceptionally high). By our MLE parameters, we estimate \(R_0\) to be 14.3, which is highly implausible considering these prior estimates. Based on these estimates of \(R_0\) and \(\mu_{IR}\), a better range to test for \(beta\) may be 0.17 - 1.9 (\(\frac{1.6}{9}\) to \(\frac{5.7}{3}\)). We did not start in this box because in our simulation studies these values did not yield results close to the true time series.

Future directions for this work would be to repeat this analysis once more data is available, and to develop a more robust framework that 1) models reduction in contact rates on a more gradual scale rather than the simplistic “all or nothing” approach take here and 2) more adequately captures the latent exposure state and its relation to both the susceptible (highly contagious even in the pre-symptomatic state) and recovered states (bypassing infection altogether). Taking this together with the improved range for plausible \(\beta\) values may improve the results.

8. References

Google COVID-19 Statistics: Total Cases Worldwide (Accessed April 26, 2020). https://g.co/kgs/RbvsQs
How some cities ‘flattened the curve’ during the 1918 flu pandemic. (National Geographic). https://www.nationalgeographic.com/history/2020/03/how-cities-flattened-curve-1918-spanish-flu-pandemic-coronavirus/#close
Matrajt and Leung 2020. https://www.medrxiv.org/content/10.1101/2020.03.27.20044891v2.full.pdf
Data Source: Johns Hopkins University Center for Systems Science and Engineering (Last pulled April 26, 2020). (https://github.com/CSSEGISandData/COVID-19)
Washington State stay at home order. https://www.governor.wa.gov/sites/default/files/20-07%20Coronavirus%20%28tmp%29.pdf
Wallinga et al. 2006. https://www.ncbi.nlm.nih.gov/pubmed/16968863
Seattle Metro Area Census Data. https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml
Lauer et al. 2020 http://www.ncbi.nlm.nih.gov/pubmed/32150748.
Bi et al. 2020 https://doi.org/10.1101/2020.03.03.20028423.
The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) – China, 2020. China CDC Weekly. 2020;2(8):113–122. http://weekly.chinacdc.cn/en/ article/id/e53946e2-c6c4-41e9-9a9b-fea8db1a8f51.
Washington State COVID-19 Response. https://coronavirus.wa.gov/whats-open-and-closed
High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2 https://wwwnc.cdc.gov/eid/article/26/7/20-0282_article

Flattening the Curve: A Case Study of COVID-19 Spread in Seattle, WA