SEIR Modeling on Interventation Strategies for COVID-19 in Hubei

1. Introduction
2. Data
- 2.1. Hubei Data
- 2.2. Data Plot
3. SEIR Model
4. Implementation
5. Results
6. Limitation of the Model
7. Reference

1. Introduction

1.1. COVID-19

The Coronavirus has spread out over the world and caused a lot of panic and stress to people in different countries. As an infectious disease, according to the description of WHO, this virus can be transmitted through droplets generated when an infected person coughs, sneezes, or exhales. Therefore the infectivity of this virus is extermely serious, and up to this moment there are more than two million confirmed cases in the whole world, and more than 200 thousand people die from this disease. The main goal of this project, is to learn from the published method raised by an academic team led by Childs and focus more on the behavior of the simulation on cases in Hubei, China, the first place where this disease outbreaks and raises the attention of people.

COVID-19 Prevention. Source: https://www.paramounthealthcare.com/news/what-you-should-know-about-coronavirus-disease-covid-19

1.2. COVID-19 in Hubei, China

The outbreak of CVOID-19 in Hubei, China starts from the end of January, 2020. Many people fell sick but could not go to hospitals for treatment at first due to the limitation of the medical resources. Therefore many lifes were taken by this virus and the situation went serious initially. The capital of Hubei, Wuhan government declared a lockdown policy on January 23, 2020 and many medical supports are gathered and provided to Hubei from other provinces in China since then. Now the situation turned much better and the most serious city, Wuhan has just declared there was no new hospitalization the other day. A series of powerful intervention strategies brought Hubei back to life.

1.3. Intervention Strategies

Intervention strategies actually play a vital role in this battle with COVID-19, and these interventions ican mpact dynamics and resultant number of cases especially fatalities through time series. Different invention strategies will have different resulted performances. According to Childs’ team, they consider three major invention strategies that might impace greatly on the COVID-19 cases. They include social distancing for a set duration, which might greatly affect the ability of transmission rate of this disease. The second invention strategy is the “social distancing triggered by the number of hospitalized individuals crossing a threshold”, in which case it considers the difference for the capacity of hospitals. The third one is “isolation of symptomatic individuals”, including policies issued by governments like lockdown and 14-day isolations, etc.

2. Data

2.1. Hubei Data

The data used for this simulation is the trace of complete official data in Hubei starting from December, 2019 up to now, April 26, 2020. The data set is actually from an embedded R package called from Github Link where the authors use this package to record the updated data in the world.

## Loading required package: nCov2019

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

##          time country province cum_confirm cum_heal cum_dead suspected
## 32 2020-01-01   China    Hubei          43        0        0         0
## 33 2020-01-02   China    Hubei          44        0        0         0
## 34 2020-01-03   China    Hubei          44        0        0         0
## 35 2020-01-04   China    Hubei          44        0        0         0
## 36 2020-01-05   China    Hubei          44        0        0         0
##    fetality_rate curr_confirm
## 32             0           43
## 33             0           44
## 34             0           44
## 35             0           44
## 36             0           44

2.2. Data Plot

A plot of the confirmed cases in Hubei shows as follows:

3. SEIR Model

3.1. Basic SEIR Model

Learned from lecture note 11, we know that SEIR has an additional Exposed stage on the basis of SIR model. The Exposed statge refers to a period of latency before becoming truly infectious. Since many diseases have a latent phase during which the individual is infected but not yet infectious, so this delay makes the SEIR model more applicable.

SEIR in lecture note. Source: https://ionides.github.io/531w20/11/notes11.pdf

3.2. Varient Interventions on SEIR Model

To add the three mentioned intervention strategies, we can make a modification on the basic SEIR model. Seen in the model diagram by Childs’s team below, we can see that several stages are added in order to make careful choices for each transition of stages.

SEIR in Childs’ team research. Source: https://covid-measures.github.io/

As we can see, there are totally nine compartments in this SEIR model. After the Exposed (E) stage, there are two directions for different groups of people.For people that are actually infected but remain asymptomatic for the entire infection, the stage is given as \(I_a\). And people who go on to become symptomatic first go through a pre-symptomatic and then into infectious state, the stage is called \(I_p\). People showing symptons are also divided into two groups, \(I_m\) for mild symptoms and \(I_s\) for severe symptons. For the people in the \(I_s\), they need to go to the hospital, and the hospitalization is represented in another stage \(H\). And the R stage is the recovered stage, people who have mild symptoms or after being treated might go to this stage. And lastly, the D represents the death stage.

3.3. Important Parameters

Since this model is rather complex, there are many parameters. \(\beta\), similar to the same parameter used in the SIR model, it is here used for measuring the transmission rate, and here it is closely related to the intervention level, which is an important parameter representing the three different types of intervention strategies. And soc_dist_level is used to regulate the level of social distancing. If no intervention measurement is taken, the \(\beta\) will be equal to the initial value \(\beta_0\). And the contact rates iso_m and iso_s are determined in the case whether isolation of symptomatic cases is in place. rateH is the rate of hospitalization, which can indicate the proportion of people who are in severe stage and able to go to the hospitals for treatment. rateE contains the rate of going to asymtomatic stage and presymptomatic stage correspondingly.

alpha is the asymptomatic fraction, while mu denotes the fraction of mild cases, delta is the fraction of hospitalized cases that are fatal. gamma is the rate at which exposed cases become infectious and four lambdas represent the rate of movements out of infectious stages. rho is the rate of movement out of hospitals. N is still the population size and E0 being the number of people who are initially exposed.

4. Implementation

covid_rprocess <- Csnippet("
  double betat;
  if(intervention  == 2 & thresh_crossed == 1){
      betat =  beta0*thresh_int_level; 
  }
  else if(intervention == 1) betat = beta0*soc_dist_level;
  else betat = beta0; 
  
  // contact rates:
  double iso_m = 1;
  double iso_s = 1;
  if(isolation == 1){
      iso_m = iso_mild_level;
      iso_s = iso_severe_level;
  }
  
  // transitions:
  double dSE = rbinom(S, 1-exp(-betat*(Ca*Ia/N + Cp*Ip/N + iso_m*Cm*Im/N + iso_s*Cs*Is/N)*dt)); 
  double rateE[2];
  double dE_all[2];
  rateE[0] = alpha*gamma; // going to asymtomatic
  rateE[1] = (1-alpha)*gamma; // going to presymptomatic
  reulermultinom(2, E, rateE, dt, &dE_all);
  double dEIa = dE_all[0];
  double dEIp = dE_all[1];
  double dIaR = rbinom(Ia, 1 - exp(-lambda_a*dt));
  double rateIp[2];
  double dIp_all[2];
  rateIp[0] = mu*lambda_p; // going to minor symptomatic
  rateIp[1] = (1-mu)*lambda_p; // going to sever symptomatic
  reulermultinom(2, Ip, rateIp, dt, &dIp_all);
  double dIpIm = dIp_all[0];
  double dIpIs = dIp_all[1];
  double dIsH = rbinom(Is, 1 - exp(-lambda_s*dt));
  double dImR = rbinom(Im, 1 - exp(-lambda_m*dt));
  double rateH[2];
  double dH_all[2];
  rateH[0] = delta*rho;
  rateH[1] = (1-delta)*rho;
  reulermultinom(2, H, rateH, dt, &dH_all);
  double dHD = dH_all[0];
  double dHR = dH_all[1];
                     
  // compartment updation:
  S  -= dSE; // susceptible 
  E  += dSE - dEIa - dEIp; // exposed
  Ia += dEIa - dIaR; // infectious and asymptomatic
  Ip += dEIp - dIpIs - dIpIm; // infectious and pre-symptomatic
  Is += dIpIs - dIsH; // infectious and severe symptoms (that will be hospitalized)
  Im += dIpIm - dImR; // infectious and minor symptoms
  H  += dIsH - dHD - dHR; // hospitalized
  R  += dHR + dImR + dIaR; // recovered
  D  += dHD; // fatalities
  sympt_new  +=  dIpIs + dIpIm;
  H_new += dIsH;
  if(intervention == 2 & H >= thresh_H_start) thresh_crossed = 1;
  else if(intervention == 2 & thresh_crossed == 1 & H < thresh_H_end) thresh_crossed = 0;
")

Specially, for initial set up, we can assume that every individual is in susceptible stage except the exposed people. And the simulation range follows the exact starting date in the data set, which starts from January 1, 2019. But I would like to simulate for a longer range, so the end date I selected was May 31, 2020.

covid_init <- Csnippet("
  S = N-E0;
  E = E0;
  Ia = 0;
  Ip = 0;
  Is = 0;
  Im = 0;
  H = 0;
  R = 0;
  D = 0;
  sympt_new = 0;
  H_new = 0;
  thresh_crossed = 0;
")

sim_start = as.Date("2020-01-01")
sim_end = as.Date("2020-05-30")
sim_length =  sim_end - sim_start
dat = data.frame(day= 1:sim_length, 
                 B = rep(0, sim_length))

We can roughly say that the start date of intervention in Hubei is around January 23, as we mentioned, the lockdown policy in Wuhan started. So there are 22 days before the intervention. Follow such strategy, create a covariate table for intervention. The starting threshhold in hospital selected should be 0 and I set the ending threshold for H stage to be number 1.

intervention = covariate_table(day= 1:(sim_length),
                               intervention = c(rep(1, as.Date("2020-01-23") - sim_start - 1),
                                                rep(0, 26),  
                                                rep(2, 60),
                                                rep(1, sim_length - (as.Date("2020-01-23")- sim_start - 1) - 86)), 
                               isolation = c(rep(0, as.Date("2020-01-23") - sim_start - 1),
                                             rep(1, sim_length - (as.Date("2020-01-23") - sim_start - 1))),
                               iso_severe_level = rep(0, sim_length),
                               iso_mild_level = rep(0.1, sim_length), 
                               soc_dist_level = rep(0.2, sim_length),
                               thresh_H_start = rep(40, sim_length),
                               thresh_H_end = rep(1, sim_length),
                               thresh_int_level = rep(0.2, sim_length),
                               order = "constant",
                               times = "day")

Since for this complex model, there are two many parameters to tune. Using log likelihood to find estimated values like what we did in the lecture notes remains a good idea, but it took extremely long time for finding all the values for these parameters. So for sack of simplicity, I follow the way in Childs’ codes, that give a fixed number based on some literature references, indicated in Childs’ method.

covid <- dat %>%
  pomp(
    time= "day",
    t0=1,
    covar = intervention,
    rprocess = euler(covid_rprocess, delta.t=1/6),
    rinit = covid_init,
    accumvars = c("sympt_new", "H_new"),
    paramnames=c("beta0", "alpha", "mu", "delta", "gamma", "Ca", "Cp", "Cs", "Cm",
                 "lambda_a", "lambda_s","lambda_m", "lambda_p", 
                 "rho", "N", "E0"), 
    statenames=c("S","E","Ia", 
                 "Ip","Is","Im",
                 "R", "H","D", 
                 "sympt_new", "H_new",
                 "thresh_crossed")
  )

covid_params <- c(
  beta0            = 1, 
  Ca               = 1, 
  Cp               = 1, 
  Cs               = 1, 
  Cm               = 1, 
  alpha            = 1/3, 
  gamma            = 1/5.2, 
  lambda_a         = 1/7, 
  lambda_s         = 1/4, 
  lambda_m         = 1/7, 
  lambda_p         = 1/0.5, 
  rho              = 1/10.7, 
  delta            = 0.2, 
  mu               = 19/20, 
  N                = 58500000, # Hubei population
  E0               = 43*20
)

By doing the simulation, we can get the first several columns of the simulated data, where most of stage numbers are given 0:

##       day    .id        S      E    Ia   Ip Is     Im        R     H
## 15141 141 median 58298523 1092.0 646.0 82.0 32 1231.5 196989.5 135.0
## 15142 142 median 58298359 1057.5 617.0 78.5 32 1192.0 197261.0 130.5
## 15143 143 median 58298192 1015.5 593.5 77.5 31 1150.0 197537.0 125.5
## 15144 144 median 58298047  971.5 575.5 74.5 29 1118.0 197786.0 120.0
## 15145 145 median 58297889  949.0 555.5 70.0 29 1087.5 198029.5 118.5
## 15146 146 median 58297744  923.0 539.5 70.0 27 1042.0 198265.5 116.0
## 15147 147 median 58297608  887.0 520.0 65.5 26 1013.0 198495.5 110.0
## 15148 148 median 58297462  854.5 500.0 63.0 25  979.5 198714.5 106.0
## 15149 149 median 58297342  826.0 479.5 62.5 26  952.0 198924.5 101.0
## 15150 150 median 58297210  803.0 459.5 59.0 25  915.0 199133.0  99.0
##            D sympt_new H_new thresh_crossed  B
## 15141 1295.0     142.0     8              1 NA
## 15142 1299.5     137.5     8              1 NA
## 15143 1302.0     131.0     8              1 NA
## 15144 1303.5     129.0     7              1 NA
## 15145 1305.5     124.5     7              1 NA
## 15146 1308.0     116.5     6              1 NA
## 15147 1310.0     117.0     6              1 NA
## 15148 1312.5     111.5     6              1 NA
## 15149 1317.0     107.5     6              1 NA
## 15150 1318.5     105.5     6              1 NA

5. Results

The results are shown by plotting the simulated results:

We can see the modeling has somewhere that gives a good fit. It well captures the peak point on around February 20 or so, which actually the true peak date of Hubei data is on February 18, and at the beginning part, the simulated curve can go perfectly with the similar trend with the true curve. However, there are two drawbacks for the fitting, the first one is the incorrect modeling after the peak since the confirm cases in Hubei decreases very fast and before April, the number of cases can be seen as a very low value. But the fitted model still gives about 10,000 cases at the beginning of April, so the modeling of after-peak is not correct. Also the peak values estimated is around 40,000, where the true value is beyond 50,000.

6. Limitation of the Model

There are several obvious limitations of this model, first, this model over-stresses the impact of interventions, but only sets three levels for distinguishing the interventions, which is far not enough. Actually, there are far more interventions when the COVID-19 cases happened after the peak date, since more factors should be measured and considered, for example, the consideration of medical support from other provinces. Another limitation is the measurement of N for this province, since there existed people flowing before the lockdown of Wuhan on January 23. And since the parameters are valued based on some existing researches in Childs’ project, some of the values are out-of-dated since the situation are changing all the time and the values raised by the articles might not be suitable for some countries. For later improvement, other measurement of intervention strategies should be considered and some initial values should be reconsidered.