1.Introduction

A novel coronavirus (CoV) named 2019-nCoV or 2019 novel coronavirus or COVID-19 by the World Health Organization (WHO) is in charge of the current outbreak of pneumonia that began at the beginning of December 2019 near in Wuhan City, Hubei Province, China. COVID-19 is a pathogenic virus. From the phylogenetic analysis carried out with obtainable full genome sequences, bats occur to be the COVID-19 virus reservoir, but the intermediate host(s) has not been detected till now. Right now, there are 141 millions cumulative comfirmed cases (April 19th) over all round world. Especially for United States, the number is more than 31.7 millions now (April 19th). More detailed report cases: {https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/} However, as the Covid-19 vaccines are taken by more and more people, the spreading of Covid-19 is reducing. The situation becomes better and better and one day this world will recover from the Covid-19.

In this project, I will analyse COVID-19 data of United States from 01/22/2020 to 04/10/2021. I will try to capture some properties of this disease and describe the process of transmission by deriving and applying different models for daily infected cases. The data set comes from IHME(http://www.healthdata.org/). The model used in this model includes SEIR, SEIQR and SECSDR.

2.Data Overview

3.SEIR

Learned from lecture note 11, we know that SEIR has an additional Exposed stage on the basis of SIR model. The Exposed statge refers to a period of latency before becoming truly infectious. Since many diseases have a latent phase during which the individual is infected but not yet infectious, so this delay makes the SEIR model more applicable.

According to what we learned from Lecture note 12, we know that SEIR has an additional exposed stage compared with the basis of SIR model. The exposed stage refers to the period of latency before being truly infected. Since many diseases have a latent phase, I wanna try to use the SEIR model to simulate the daily infected cases happening in US.

This model consists of four distinct stages:
S: susceptible (all individuals)
E: exposed (asymptomatic)
I: infected (symptomatic)
R: removed (recovered or deceased)

3.1 Constructing SEIR model below:

seir_step <- Csnippet("
  double dN_SE = rbinom(S,1-exp(-Beta*I/N*dt));
  double dN_EI = rbinom(E,1-exp(-mu_EI*dt));
  double dN_IR = rbinom(I,1-exp(-mu_IR*dt));
  S -= dN_SE;
  E += dN_SE - dN_EI;
  I += dN_EI - dN_IR;
  R += dN_IR;
  H += dN_IR;
")

seir_rinit <- Csnippet("
  S = nearbyint(eta*N);
  E = 0;
  I = 1;
  R = nearbyint((1-eta)*N);
  H = 0;
  ")

dmeas <- Csnippet("
double tol = 1.0e-25;
double mean_cases = rho*H;
double sd_cases = sqrt(mean_cases*mean_cases);

if(Infected > 0.0){
lik = pnorm(Infected,mean_cases,sd_cases,1,0)- pnorm(Infected-0.5,mean_cases,sd_cases,1,0)
+ tol;
} else{
lik = pnorm(Infected+0.5,mean_cases,sd_cases,1,0) +tol;
}
if (give_log) lik = log(lik);

")
rmeas <- Csnippet("

Infected = rnorm(rho*H, sqrt(rho*H ) );
if (Infected > 0.0) {
Infected = nearbyint(Infected);
} else {
Infected = 0.0;
}

")

measSEIR = dat %>%
  pomp(
    times="Day",t0=0,
    rprocess=euler(seir_step,delta.t=1),
    rinit=seir_rinit,
    rmeasure=rmeas,
    dmeasure=dmeas,
    accumvars="H",
    partrans=parameter_trans(
      log=c("Beta","tau","mu_EI","mu_IR"),
      logit=c("rho","eta")
    ),
    paramnames=c("N","Beta","mu_EI","mu_IR","rho","eta","tau"),
    statenames=c("S","E","I","R","H")
  )

3.4 Simulated result

4.SECSDR

The SECSDR model designed by the others are shown in the above pictures. There are 6 compartment and 1 measurement:
The S compartment stands for the susceptible population.
The E compartment stands for the exposed population.
The Ca compartment stands for the virus carriers, which means they have already been infected, but there is no symptom yet.
The Sy compartment stands for the symptomed patients.
The Di compartment stands for the Diagnosed patients for one specific day.
The R compartment, as in the SEIR model, stands for the recovered population containing both dead and recovered patients (generally the patients who losed infectiousness).
The Obs stands for observation.

4.1 Constructing SECSDR model below:

covid_rprocess <- "
  double dN_SE = rbinom(S,1-exp((-Beta1*Ca-Beta2*Sy)*dt));
  double dN_ECa = rbinom(dN_SE,1-exp(-dt*mu_ECa));
  double dN_CaR = rbinom(Ca,1-exp(-dt*mu_CaR));
  double dN_CaSy = rbinom(Ca-dN_CaR,1-exp(-dt*mu_CaSy));
  double dN_SyR = rbinom(Sy,1-exp(-dt*mu_SyR));
  double dN_SyDi = rbinom(Sy-dN_SyR,1-exp(-dt*mu_SyDi));
  S -= dN_ECa;
  Ca += dN_ECa - dN_CaSy - dN_CaR;
  Sy += dN_CaSy - dN_SyR-dN_SyDi;
  R += dN_CaR+dN_SyR+dN_SyDi;
  Di = dN_SyDi;
"

covid_dmeasure <- "
  lik = dnorm(Infected,Di,rho*Di+1e-10,give_log);
"
covid_rmeasure <- "
  Infected = rnorm(Di,rho*Di+1e-10);
"
covid_rinit <- "
 S=328000000;
 Ca=10;
 Sy=10;
 Di=1;
 R=0;
"

covid_statenames <- c("S","Ca","Sy","R","Di")
covid_paramnames <- c("Beta1","Beta2","mu_ECa","mu_CaSy","mu_CaR","mu_SyR","mu_SyDi","rho")

covid2 <- pomp(
  data=subset(n1,select=c(Infected,Day)),
  times="Day",
  t0=0,
  rprocess=euler(
    step.fun=Csnippet(covid_rprocess),
    delta.t=1/12),
  rmeasure=Csnippet(covid_rmeasure),
  dmeasure=Csnippet(covid_dmeasure),
  partrans=parameter_trans(
    log=c("Beta1","Beta2","mu_ECa","mu_CaSy","mu_CaR","mu_SyR","mu_SyDi","rho")),
  statenames=covid_statenames,
  paramnames=covid_paramnames,
  rinit=Csnippet(covid_rinit)
)

4.2 Global search

## The top five log likelihood:
## result.30 result.33  result.4  result.9 result.44 
## -3508.882 -3858.863 -3954.423 -3996.610 -4029.691
## The corresponding optimized parameters:
##          .id
## parameter         [,1]         [,2]         [,3]         [,4]         [,5]
##   Beta1   0.0002597906 0.0009496918 0.0002994381 5.389932e-04 9.992436e-05
##   Beta2   0.0009261728 0.0005120977 0.0009540954 5.880744e-05 5.314571e-04
##   mu_ECa  0.0037479141 0.0025738955 0.0008613584 6.150075e-03 6.977596e-03
##   mu_CaSy 0.0018706253 0.0023598408 0.0083074660 8.760092e-03 1.987399e-03
##   mu_CaR  0.0056091869 0.0012327049 0.0050358881 3.590565e-03 5.832977e-03
##   mu_SyR  0.0078188836 0.0097305889 0.0030203769 1.469972e-03 1.330731e-03
##   mu_SyDi 0.0035883509 0.0083800862 0.0059770086 4.989204e-04 4.613800e-03
##   rho     0.4751177113 0.4508850702 0.3452309076 3.090944e-01 4.374100e-01

4.3 Simulated result

5.SEIQR

SEIQR model: Susceptible-Exposed-Infectious-Quarantine-Removed and/or Recovered. S,I,R are defined like the above SIR model, E refers to the state that individuals are infected but not infectious and Q refers to quarantine state. The plot is shown below:

Most definitions of parameters are similar as before:
SI(t)= \(\beta\) * I(t): transmiting rate from S to E, where \(\beta\) is the contact rate.
I: transmiting rate from E to I.
R1: transmiting rate from I to Q.
R2: transmiting rate from Q to R.
Á: reporting rate.
N is population size with N = S + E + I + Q + R and we consider it is fixed.

5.1 Constructing SEIQR model below:

covid_rprocess <- "
  double t1 = rbinom(S,1-exp(-Beta*I*dt));
  double t2 = rbinom(E,1-exp(-dt*mu_I));
  double t3 = rbinom(I,1-exp(-dt*mu_R1));
  double t4 = rbinom(Q,1-exp(-dt*mu_R2));
  S -= t1;
  E += t1 - t2;
  I += t2 - t3;
  Q += t3 - t4;
  R += t4;
"

covid_dmeasure <- "
  lik = dnorm(Infected,Q,rho*Q+1e-10,give_log);
"
covid_rmeasure <- "
  Infected= rnorm(Q,rho*Q+1e-10);
"
covid_rinit <- "
  S=N-2;
  E=1;
  I=1;
  Q=0;
  R=0;
"

covid_statenames <- c("S","E","I","Q","R")
covid_paramnames <- c("Beta","rho","mu_I","mu_R1","mu_R2","N")
fixed <- c(N=32000000)
covid2 <- pomp(
  data=subset(n1,select=c(Infected,Day)),
  times="Day",
  t0=0,
  rprocess=euler(
    step.fun=Csnippet(covid_rprocess),
    delta.t=1/12),
  rmeasure=Csnippet(covid_rmeasure),
  dmeasure=Csnippet(covid_dmeasure),
  partrans=parameter_trans(
    log=c("Beta","mu_I","mu_R1","mu_R2","rho")),
  statenames=covid_statenames,
  paramnames=covid_paramnames,
  rinit=Csnippet(covid_rinit)
)

I tried to simulate the Infected case by myself. It is hard to simulate the trend after many times experiments.

5.2 Global search

After global search results, we cannot still build a good model with appropriate parameter to simulate the US Infected cases number.

## The top five log likelihood:
## result.46  result.5  result.4  result.9 result.35 
## -7125.013 -7269.135 -7282.524 -7524.856 -8146.788
## The corresponding optimized parameters:
##          .id
## parameter         [,1]         [,2]         [,3]         [,4]         [,5]
##     Beta  3.780813e+00 5.759627e+00 2.994381e+00 5.389932e+00 9.294188e+00
##     mu_I  1.045713e-02 1.760666e-02 9.540954e-02 5.880743e-03 8.147762e-03
##     mu_R1 7.114426e-02 9.244751e-03 8.613584e-03 6.150076e-02 7.184530e-02
##     rho   2.959514e-01 2.402352e-01 2.492240e-01 2.628027e-01 2.586384e-01
##     mu_R2 1.456446e-01 2.281865e-01 1.510766e-01 1.077169e-01 5.221663e-02
##     N     3.200000e+07 3.200000e+07 3.200000e+07 3.200000e+07 3.200000e+07

5.3 MIF2 convergence diagnostics

5.4 Simulated result

6.Conclusion

From the analysis above we can see that the original data plot of Covid-19 Infected cases have unstable trend. It is impossible to use SEIR, SEIQR and SECSDR to simulate the daily Infected case no matter how to change the parameters. The simulated Infected populations from three models all did not show a similar trend as the real situation and the likelihood analysis seems unreasonable. I think there are three main reasons:
1. Daily confirmed coronavirus cases number in US over long time not only depends on so-called parameters. We cannot simulate a long term unstable situation using unstable parameters, like susceptible population, exposed population, symptomed patients, etc. They may change over the time. So, in the future, I hope to include the temporal condition into model. The first phase is first 70th day. The second phase is from 70th to 100th day. The third is from 100th to 200th day. The last phase is from 200th to 400th day. Maybe from the 400th day to now, there exits another phase.
2. The US is too large, different states has different restriction rules during pandemic. It is hard to simulate the Infected cases number in the whole country directly. If we wanna study the Infected cases in US, we can begin simulate the Covid-19 Infected cases in different states and then ensemble the models.
3. The Covid-19 has a extreme long incubation period. So only one or two parameters about incubation rate included in these three models cannot simulate the real situation.

Future works: I need to figure out how to imcorperate temporal parameter into the models. Then in this ways, I can divide the whole period into different phases.

Reference

[1] Theory basis: STATS531 lecture notes from https://ionides.github.io/531w21/.
[2] When developing SEIQR, I refer to SEICR model in this project https://ionides.github.io/531w20/final_project/Project37/final.html
[3] When developing SECSDR, I refer to SECSDR model in this project https://ionides.github.io/531w20/final_project/Project22/final.html