As countries begin to see the light at the end of the tunnel in the battle against the COVID-19 pandemic, we set out to model daily case counts using a series of POMP models, each one adjusted to reflect what we now know about the ground situation during its "segment" of time. Our data come from Our World in Data[1] and contain daily counts of new cases for the US from 2020-01-23 to 2022-04-01. After some initial exploratory data analysis, we divided the time series into three distinct groups, each characterised by the variant that was the main driver of cases at the time: a pre-Delta variant domain, a Delta variant domain, and an Omicron variant domain. The graphs, alongside the full view of the entire dataset, are presented below.
## cases avg_7 date
## 1 1 0 2020-01-23
## 2 1 0 2020-01-24
## 3 0 0 2020-01-25
## 4 1 0 2020-01-26
## 5 0 0 2020-01-27
## 6 0 0 2020-01-28
## cases avg_7 date
## Min. : 0 Min. : 0 Min. :2020-01-23
## 1st Qu.: 28183 1st Qu.: 28000 1st Qu.:2020-08-09
## Median : 56516 Median : 60380 Median :2021-02-25
## Mean : 99342 Mean : 99223 Mean :2021-02-25
## 3rd Qu.: 120985 3rd Qu.:120164 3rd Qu.:2021-09-13
## Max. :1259946 Max. :806571 Max. :2022-04-01
## date cases avg_7
## Min. :2020-01-23 Min. : 0 Min. : 0
## 1st Qu.:2020-05-17 1st Qu.: 26386 1st Qu.: 26706
## Median :2020-09-10 Median : 51150 Median : 52849
## Mean :2020-09-10 Mean : 69266 Mean : 68924
## 3rd Qu.:2021-01-04 3rd Qu.: 75440 3rd Qu.: 70492
## Max. :2021-04-30 Max. :293325 Max. :250323
## date cases avg_7
## Min. :2021-05-01 Min. : 8364 Min. : 11596
## 1st Qu.:2021-06-23 1st Qu.: 27602 1st Qu.: 26413
## Median :2021-08-15 Median : 72618 Median : 73728
## Mean :2021-08-15 Mean : 74987 Mean : 74518
## 3rd Qu.:2021-10-07 3rd Qu.:113520 3rd Qu.:110890
## Max. :2021-11-30 Max. :198857 Max. :164521
## date cases avg_7
## Min. :2021-12-01 Min. : 9124 Min. : 25252
## 1st Qu.:2021-12-31 1st Qu.: 42566 1st Qu.: 52433
## Median :2022-01-30 Median : 133912 Median :127134
## Mean :2022-01-30 Mean : 256449 Mean :257795
## 3rd Qu.:2022-03-01 3rd Qu.: 388731 3rd Qu.:419001
## Max. :2022-04-01 Max. :1259946 Max. :806571
We start our analysis by fitting a benchmark ARMA model. We produce an AIC table to determine the best model and find that the best fit is the ARMA(4,4).
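For reference, the ARMA(p,q) benchmark for the case counts \(Y_n\) takes the standard form

\(Y_n = \mu + \sum_{i=1}^{p}\phi_i (Y_{n-i}-\mu) + \epsilon_n + \sum_{j=1}^{q}\psi_j \epsilon_{n-j}\)

where \(\{\epsilon_n\}\) is Gaussian white noise, and the AIC penalises the maximised log-likelihood by the number of estimated parameters \(k\): \(AIC = -2\ell(\hat\theta) + 2k\).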
aic_table <- function(data,P,Q){
  table <- matrix(NA,(P+1),(Q+1))
  for(p in 0:P) {
    for(q in 0:Q) {
      table[p+1,q+1] <- arima(data,order=c(p,0,q), method="ML")$aic
    }
  }
  dimnames(table) <- list(paste("AR",0:P,sep=""),paste("MA",0:Q,sep=""))
  table
}
cnt_aic_table <- aic_table(data$cases,4,4)
knitr::kable(cnt_aic_table,digits=2)
|     | MA0      | MA1      | MA2      | MA3      | MA4      |
|-----|----------|----------|----------|----------|----------|
| AR0 | 21245.16 | 20535.77 | 20452.18 | 20059.06 | 19941.91 |
| AR1 | 20007.74 | 19834.31 | 19811.78 | 19872.01 | 19918.81 |
| AR2 | 19965.71 | 19965.97 | 19733.63 | 19589.33 | 19919.91 |
| AR3 | 19772.79 | 19847.63 | 19754.16 | 19611.95 | 19549.93 |
| AR4 | 19764.28 | 19746.47 | 19764.69 | 19589.80 | 19541.82 |
arima44 <- arima(x = data$cases, order = c(4, 0, 4))
arima44
##
## Call:
## arima(x = data$cases, order = c(4, 0, 4))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## 0.2438 -0.4574 0.7302 0.1946 0.5139 1.3157 0.3885 0.5522
## s.e. 0.0634 0.0405 0.0381 0.0656 0.0513 0.0381 0.0394 0.0494
## intercept
## 97526.36
## s.e. 21858.28
##
## sigma^2 estimated as 2.302e+09: log likelihood = -9760.91, aic = 19541.82
par(mfrow=c(1,2))
acf(arima44$residuals)
qqnorm(arima44$residuals)
qqline(arima44$residuals)
From the ACF and QQ-plot, we can tell that the residuals do not appear to be IID and normally distributed. We therefore turn to POMP modelling of the COVID-19 data.
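As a supplementary check (not part of the original analysis), the Ljung-Box test quantifies the residual autocorrelation that the ACF plot shows visually. A minimal sketch on simulated ARMA data, since the report's `data$cases` series is not reproduced here; in the report it would be applied directly to `arima44$residuals`.

```r
# Ljung-Box portmanteau test for residual autocorrelation.
# Sketched on simulated ARMA(1,1) data for self-containedness.
set.seed(531)
y   <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 500)
fit <- arima(y, order = c(1, 0, 1))
bt  <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box")
bt  # a large p-value is consistent with uncorrelated residuals
```

A small p-value here would formally confirm what the ACF and QQ-plot suggest for the ARMA(4,4) fit: that the residuals retain structure the model does not capture.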
The pre-Delta data was defined as the timeframe from 2020-01-23 to 2021-04-30. Before the Delta variant, vaccinations had not yet been widely introduced to the US public, so we implemented an SEIR model, which tracks transitions from susceptible to exposed, to infected, and finally to recovered. We found the SEIR model more appropriate than the SIR model because of the exposed state, which we observe in many diseases. More specifically, the concept of "being exposed" to COVID-19 was explicitly addressed by local, state, and federal governments: the way people were expected to behave changed depending on whether they had been exposed, so we believed this was an important element of our modelling process. The combination of parameters we settled on for this model was inspired by a previous project[2]. The parameters in the SEIR model are summarised as:
And the model’s differential equations are given by following expressions:
\(\frac{dS_t}{dt} = -\beta/N S_t I_t\)
\(\frac{dE_t}{dt} = \beta/N S_t I_t -\mu_{EI} E_t\)
\(\frac{dI_t}{dt} = \mu_{EI} E_t -\mu_{IR} I_t\)
\(\frac{dR_t}{dt} = \mu_{IR} I_t\)
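In the stochastic implementation below, these flows are discretised with an Euler scheme in which each transition over a step \(dt\) is a binomial draw with an exponential transition probability:

\(\Delta N_{SE} \sim \mathrm{Binomial}\left(S,\ 1-e^{-\beta I/N\,dt}\right)\)

\(\Delta N_{EI} \sim \mathrm{Binomial}\left(E,\ 1-e^{-\mu_{EI}\,dt}\right)\)

\(\Delta N_{IR} \sim \mathrm{Binomial}\left(I,\ 1-e^{-\mu_{IR}\,dt}\right)\)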
library(pomp)
library(doParallel)
library(doRNG)
library(iterators)
registerDoParallel()
registerDoRNG(5312022)
seir_step <- Csnippet("
  double dN_SE = rbinom(S,1-exp(-Beta*I/N*dt));
  double dN_EI = rbinom(E,1-exp(-mu_EI*dt));
  double dN_IR = rbinom(I,1-exp(-mu_IR*dt));
  S -= dN_SE;
  E += dN_SE - dN_EI;
  I += dN_EI - dN_IR;
  R += dN_IR;
  H += dN_IR;
")
seir_rinit <- Csnippet("
  S = nearbyint(eta*N);
  E = 0;
  I = 1;
  R = nearbyint((1-eta)*N);
  H = 0;
")
dmeas <- Csnippet("
  double tol = 1.0e-25;
  double mean_cases = rho*H;
  double sd_cases = sqrt(mean_cases*mean_cases);
  if(reports > 0.0){
    lik = pnorm(reports,mean_cases,sd_cases,1,0) - pnorm(reports-0.5,mean_cases,sd_cases,1,0) + tol;
  } else{
    lik = pnorm(reports+0.5,mean_cases,sd_cases,1,0) + tol;
  }
  if (give_log) lik = log(lik);
")
rmeas <- Csnippet("
  reports = rnorm(rho*H, sqrt(rho*H));
  if (reports > 0.0) {
    reports = nearbyint(reports);
  } else {
    reports = 0.0;
  }
")
previous %>% mutate(days = c(1:464), reports=avg_7) -> df1

covid1 = df1 %>%
  pomp(
    times="days",t0=0,
    rprocess=euler(seir_step,delta.t=1),
    rinit=seir_rinit,
    rmeasure=rmeas,
    dmeasure=dmeas,
    accumvars="H",
    partrans=parameter_trans(
      log=c("Beta","tau","mu_EI","mu_IR"),
      logit=c("rho","eta")
    ),
    paramnames=c("N","Beta","mu_EI","mu_IR","rho","eta","tau"),
    statenames=c("S","E","I","R","H")
  )
We first simulate from our initial parameter guesses: Beta=12.9, mu_IR=1.15, mu_EI=0.08, rho=0.8, tau=1, eta=0.1, N=3e8.
set.seed(531)
params <- c(Beta=12.9,mu_IR=1.15,mu_EI=0.08,rho=0.8,tau=1,eta=0.1,N=3e8)

covid1 %>%
  simulate(
    params=params,
    nsim=10,format="data.frame",include.data=TRUE
  ) -> sims

sims %>%
  ggplot(aes(x=days,y=reports,group=.id,color=.id=="data"))+
  geom_line()+
  guides(color="none")+labs(title="Simulation for data")
params <- c(Beta=12.9,mu_IR=1.15,mu_EI=0.08,rho=0.8,tau=1,eta=0.1,N=3e8)

foreach(i=1:10,.combine=c) %dopar% {
  library(pomp)
  library(tidyverse)
  covid1 %>% pfilter(params=params,Np=100)
} -> pf
pf %>% logLik() %>% logmeanexp(se=TRUE) -> L_pf
L_pf
## se
## -16744.2904 195.8473
Although the initial simulation appears to fit the data reasonably well, the log-likelihood at the initial parameter guesses is poor, so we proceed with a local search.
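For reference, the replicated particle-filter estimates above are combined by pomp's `logmeanexp`, which averages on the natural scale before taking logs (with a jackknife standard error when `se=TRUE`):

\(\hat{\ell} = \log\left(\frac{1}{R}\sum_{r=1}^{R}\exp(\hat{\ell}_r)\right)\)

where \(\hat{\ell}_r\) is the log-likelihood estimate from replicate \(r\).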
library(doParallel)
registerDoParallel()
registerDoRNG(5312022)
# set the random walk parameters
covid_cooling.fraction.50 <- 0.5
covid_rw.sd <- rw.sd(
  Beta=0.002, rho=0.002, eta=ivp(0.002)
)
params <- c(Beta=12.9,mu_IR=1.15,mu_EI=0.08,rho=0.8,tau=1,eta=0.1,N=3e8)
bake(file="lik_local_1.rds",{
  foreach(i=1:20,.combine=c) %dopar% {
    library(pomp)
    library(tidyverse)
    mif2(covid1,
         params = params,
         Np=2000,
         Nmif=50,
         cooling.fraction.50=covid_cooling.fraction.50,
         rw.sd=covid_rw.sd)
  } -> mifs_local
  mifs_local
}) -> mifs_local
coefs_local <- coef(mifs_local)
max_coefs_local <- coefs_local[,which.max(logLik(mifs_local))]
bake(file="local_results_1.rds",{
  foreach(mf=mifs_local, .combine=rbind) %dopar% {
    library(pomp)
    library(tidyverse)
    evals <- replicate(10,logLik(pfilter(mf,Np=20000)))
    ll <- logmeanexp(evals,se=TRUE)
    mf %>% coef() %>% bind_rows() %>%
      bind_cols(loglik=ll[1],loglik.se=ll[2])
  } -> local_results
  local_results
}) -> local_results
The five best local searches:
bind_rows(local_results) %>%
filter(is.finite(loglik)) %>%
filter(loglik.se < 8) %>%
arrange(-loglik) -> best_local_searches
head(best_local_searches,5)
## # A tibble: 5 × 9
## Beta mu_IR mu_EI rho tau eta N loglik loglik.se
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 14.8 1.15 0.08 0.872 1 0.100 300000000 -16474. 5.07
## 2 15.5 1.15 0.08 0.807 1 0.102 300000000 -16678. 4.83
## 3 17.1 1.15 0.08 0.820 1 0.0983 300000000 -16898. 2.67
## 4 17.6 1.15 0.08 0.887 1 0.0994 300000000 -16940. 2.94
## 5 17.5 1.15 0.08 0.864 1 0.101 300000000 -16987. 5.07
bind_rows(local_results) %>%
filter(is.finite(loglik)) %>%
filter(loglik.se < 8) %>%
filter(loglik>max(loglik)-10000) -> tmp1
pairs(~loglik+Beta+eta+rho,data=tmp1,pch=16)
We now simulate from the best parameters found by the local search:
covid1 %>%
  simulate(
    params=unlist(best_local_searches[1,]),
    nsim=10,format="data.frame",include.data=TRUE
  ) -> sims

sims %>%
  ggplot(aes(x=days,y=reports,group=.id,color=.id=="data"))+
  geom_line()+
  guides(color="none")+labs(title="Simulation for data with local parameters searching")
There is a discernible trend in the log-likelihood with respect to parameters such as Beta, but the plots are too sparse to give a clear picture of a ridge in the likelihood surface. We therefore move on to a global search.
We use the five best local search results, together with our initial parameter guesses, to construct the parameter box for the global search[3].
covid_box1 <- rbind(
  Beta=c(10,20),
  mu_EI=c(0.07,0.09),
  rho=c(0.7,1),
  eta=c(0,0.09),
  N=c(3e8,3e8),
  mu_IR=c(1, 1.25),
  tau=c(0.85,1.1)
)
bake(file="mifs_global_1.rds",{
  foreach(i=1:10,.combine=c) %dopar% {
    library(pomp)
    library(tidyverse)
    mif2(covid1,
         params = c(apply(covid_box1,1,function(x)runif(1,x[1],x[2]))),
         Np=2500,
         Nmif=250,
         cooling.fraction.50=covid_cooling.fraction.50,
         rw.sd=covid_rw.sd)
  } -> mifs_global
  mifs_global
}) -> mifs_global
bake(file="global_search_1.rds",{
  foreach(mf=mifs_global, .combine=rbind) %dopar% {
    library(pomp)
    library(tidyverse)
    evals <- replicate(10,logLik(pfilter(mf,Np=50000)))
    ll <- logmeanexp(evals,se=TRUE)
    mf %>% coef() %>% bind_rows() %>%
      bind_cols(loglik=ll[1],loglik.se=ll[2])
  } -> global_results
  global_results
}) -> global_results1
bind_rows(global_results1) %>%
filter(loglik.se < 10 ) %>%
arrange(-loglik) -> best_global_results1
head(as.data.frame(best_global_results1),5)
## Beta mu_EI rho eta N mu_IR tau loglik
## 1 12.30763 0.07945842 0.9998831 0.08948002 3e+08 1.053337 0.9708503 -14148.69
## 2 17.28959 0.08106070 0.9093095 0.06954621 3e+08 1.245942 0.9445085 -16174.48
## 3 16.25388 0.07476923 0.8893382 0.07349088 3e+08 1.129485 1.0554523 -16199.03
## 4 16.09867 0.08932817 0.7614243 0.03955038 3e+08 1.233796 1.0900555 -24341.41
## loglik.se
## 1 9.7205506
## 2 5.9764233
## 3 5.9866589
## 4 0.8067284
The best global search has a likelihood of -14148, which is significantly better than the log-likelihood from the local search. The pairs plot for the global search is as below.
bind_rows(global_results1) %>%
bind_rows(local_results) %>%
filter(is.finite(loglik)) %>%
filter(loglik.se < 10) -> tmp
pairs(~loglik+Beta+rho+mu_IR+mu_EI+eta+tau,data=tmp,pch=16, col="red")
set.seed(531)

covid1 %>%
  simulate(
    params=unlist(best_global_results1[1,]),
    nsim=10,format="data.frame",include.data=TRUE
  ) -> sims

sims %>%
  ggplot(aes(x=days,y=reports,group=.id,color=.id=="data"))+
  geom_line()+
  guides(color="none")+labs(title="Simulation for data with global parameters searching")
We observe reasonable, but not ideal fits for our data in both the global and local searches. In particular, they both appear to miss the peak of cases, in timing and extent, which appears to occur roughly on day 355 at 250,000 cases. The local search overestimates the peak, whilst the global search underestimates it.
Vaccinations were a few months into their rollout, and their effects were just beginning to become observable, by the time the Delta variant started spreading in the US. The period we define for the Delta variant is 2021-05-01 to 2021-11-30. For this second segment of our data, we use an SEIRV[4] model, with the additional compartment V accounting for those who have been fully vaccinated.
And the model’s differential equations are given by following expressions:
\(\frac{dS_t}{dt} = -\beta/N S_t I_t -\alpha/N S_t\)
\(\frac{dE_t}{dt} = \beta/N S_t I_t -\mu_{EI} E_t\)
\(\frac{dI_t}{dt} = \mu_{EI} E_t -\mu_{IR} I_t\)
\(\frac{dV_t}{dt} = \alpha/N S_t\)
\(\frac{dR_t}{dt} = \mu_{IR} I_t\)
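In the discretised implementation below, the vaccination flow out of \(S\) over a step \(dt\) follows the same binomial-with-exponential-hazard construction as the other transitions, with per-capita hazard \(\alpha/N\):

\(\Delta N_{SV} \sim \mathrm{Binomial}\left(S,\ 1-e^{-(\alpha/N)\,dt}\right)\)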
To justify the initialisation, H was set to the reported cases on 2021-5-1 and the exposed compartment was set to the reported cases on 2021-5-2. We set the infected compartment to double the reported cases on 2021-5-1. As a point of reference for initialising the vaccinated compartment, the percentage of those vaccinated in the US was 30.54% on 2021-3-30 and 31.15% on 2021-5-1[1].
seirv_step <- Csnippet("
  double dN_SE = rbinom(S,1-exp(-Beta*I/N*dt));
  double dN_EI = rbinom(E,1-exp(-mu_EI*dt));
  double dN_IR = rbinom(I,1-exp(-mu_IR*dt));
  double dN_SV = rbinom(S, 1-exp(-alpha/N*dt));
  S -= dN_SE + dN_SV;
  E += dN_SE - dN_EI;
  I += dN_EI - dN_IR;
  R += dN_IR;
  V += dN_SV;
  H += dN_IR;
")
seirv_rinit <- Csnippet("
  S = nearbyint(eta*N);
  E = 49087;
  I = 2*49743;
  R = nearbyint(N*(1-eta)-(49087+2*49743 + round(N*0.3*0.01) + 49743));
  V = nearbyint(N*0.3*0.01);
  H = 49743;
")
dmeas <- Csnippet("
  double tol = 1.0e-25;
  double mean_cases = rho*H;
  double sd_cases = sqrt(mean_cases*mean_cases);
  if(reports > 0.0){
    lik = pnorm(reports,mean_cases,sd_cases,1,0) - pnorm(reports-0.5,mean_cases,sd_cases,1,0) + tol;
  } else{
    lik = pnorm(reports+0.5,mean_cases,sd_cases,1,0) + tol;
  }
  if (give_log) lik = log(lik);
")
rmeas <- Csnippet("
  reports = rnorm(rho*H, sqrt(rho*H));
  if (reports > 0.0) {
    reports = nearbyint(reports);
  } else {
    reports = 0.0;
  }
")
delta %>% mutate(days = c(1:214), reports=avg_7) -> df2

covid2 = df2 %>%
  pomp(
    times="days",t0=0,
    rprocess=euler(seirv_step,delta.t=1),
    rinit=seirv_rinit,
    rmeasure=rmeas,
    dmeasure=dmeas,
    accumvars="H",
    partrans=parameter_trans(
      log=c("Beta","alpha","tau","mu_EI","mu_IR"),
      logit=c("rho","eta")
    ),
    paramnames=c("N","Beta","alpha","mu_EI","mu_IR","rho","eta","tau"),
    statenames=c("S","E","I","R","V","H")
  )
set.seed(531)
params <- c(Beta=13,alpha=0.05,mu_IR=1.8,mu_EI=0.09,rho=0.8,tau=1,eta=0.1,N=3e8)

covid2 %>%
  simulate(
    params=params,
    nsim=10,format="data.frame",include.data=TRUE
  ) -> sims

sims %>%
  ggplot(aes(x=days,y=reports,group=.id,color=.id=="data"))+
  geom_line()+
  guides(color="none")+labs(title="Simulation for data")
library(doParallel)
library(doRNG)
library(iterators)
registerDoParallel()
registerDoRNG(5312022)

params <- c(Beta=13,alpha=0.05,mu_IR=1.8,mu_EI=0.09,rho=0.8,tau=1,eta=0.1,N=3e8)

foreach(i=1:10,.combine=c) %dopar% {
  library(pomp)
  library(tidyverse)
  covid2 %>% pfilter(params=params,Np=100)
} -> pf
pf %>% logLik() %>% logmeanexp(se=TRUE) -> L_pf
L_pf
## se
## -3240.0106209 0.6828551
library(doParallel)
library(doRNG)
registerDoParallel()
registerDoRNG(5312022)
# set the random walk parameters
covid_cooling.fraction.50 <- 0.5
covid_rw.sd <- rw.sd(
  Beta=0.02, alpha=0.01, rho=0.02, mu_IR=0.01, mu_EI=0.01, eta=ivp(0.002)
)
params <- c(Beta=13,alpha=0.05,mu_IR=1.8,mu_EI=0.09,rho=0.8,tau=1,eta=0.1,N=3e8)
bake(file="lik_local_2.rds",{
  foreach(i=1:20,.combine=c) %dopar% {
    library(pomp)
    library(tidyverse)
    mif2(covid2,
         params = params,
         Np=2000,
         Nmif=50,
         cooling.fraction.50=covid_cooling.fraction.50,
         rw.sd=covid_rw.sd)
  } -> mifs_local
  mifs_local
}) -> mifs_local
coefs_local <- coef(mifs_local)
max_coefs_local <- coefs_local[,which.max(logLik(mifs_local))]
bake(file="local_results_2.rds",{
  foreach(mf=mifs_local, .combine=rbind) %dopar% {
    library(pomp)
    library(tidyverse)
    evals <- replicate(5,logLik(pfilter(mf,Np=2000)))
    ll <- logmeanexp(evals,se=TRUE)
    mf %>% coef() %>% bind_rows() %>%
      bind_cols(loglik=ll[1],loglik.se=ll[2])
  } -> local_results
  local_results
}) -> local_results
bind_rows(local_results) %>%
filter(is.finite(loglik)) %>%
filter(loglik.se < 10) %>%
arrange(-loglik) -> best_local_searches
head(best_local_searches,5)
## # A tibble: 5 × 10
## Beta alpha mu_IR mu_EI rho tau eta N loglik loglik.se
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 11.7 0.0674 1.75 0.170 0.898 1 0.0992 300000000 -5759. 1.00
## 2 9.06 0.107 0.855 0.152 0.789 1 0.0989 300000000 -6159. 0.896
## 3 8.41 0.190 0.880 0.199 0.929 1 0.101 300000000 -6454. 9.95
## 4 13.7 0.0658 2.63 0.215 0.959 1 0.0994 300000000 -7291. 0.164
## 5 10.6 0.0592 1.37 0.274 0.947 1 0.0999 300000000 -7481. 0.138
set.seed(531)

covid2 %>%
  simulate(
    params=unlist(best_local_searches[1,]),
    nsim=10,format="data.frame",include.data=TRUE
  ) -> sims

sims %>%
  ggplot(aes(x=days,y=reports,group=.id,color=.id=="data"))+
  geom_line()+
  guides(color="none")+labs(title="Simulation for data with local parameters searching")
covid_box2 <- rbind(
  Beta=c(7,15),
  mu_EI=c(0.07,0.09),
  alpha=c(0,0.2),
  rho=c(0.5,1),
  eta=c(0,0.5),
  N=c(3e8,3e8),
  mu_IR=c(0.7, 1.5),
  tau=c(0,2)
)
bake(file="mifs_global_2.rds",{
  foreach(i=1:20,.combine=c) %dopar% {
    library(pomp)
    library(tidyverse)
    mif2(covid2,
         params = c(apply(covid_box2,1,function(x)runif(1,x[1],x[2]))),
         Np=3000,
         Nmif=250,
         cooling.fraction.50=covid_cooling.fraction.50,
         rw.sd=covid_rw.sd)
  } -> mifs_global2
  mifs_global2
}) -> mifs_global2
bake(file="global_search_2.rds",{
  foreach(mf=mifs_global2, .combine=rbind) %dopar% {
    library(pomp)
    library(tidyverse)
    evals <- replicate(10,logLik(pfilter(mf,Np=50000)))
    ll <- logmeanexp(evals,se=TRUE)
    mf %>% coef() %>% bind_rows() %>%
      bind_cols(loglik=ll[1],loglik.se=ll[2])
  } -> global_results2
  global_results2
}) -> global_results2
The best global search produced the following coefficients, log-likelihoods, and simulated results:
bind_rows(global_results2) %>%
filter(is.finite(loglik)) %>%
filter(loglik.se < 5 ) %>%
arrange(-loglik) -> best_global_results2
head(as.data.frame(best_global_results2),5)
## Beta mu_EI alpha rho eta N mu_IR tau
## 1 6.478088 0.09201846 0.10899263 0.9887020 0.11280416 3e+08 0.8590916 1.1137165
## 2 6.627273 0.07277982 0.06059614 0.9964879 0.07452394 3e+08 0.4207374 0.7006898
## 3 1.693379 0.75722150 0.06664166 0.9017059 0.43622504 3e+08 1.1279543 1.0546965
## 4 2.146755 1.24823145 0.22935993 0.7937392 0.34884011 3e+08 1.0262129 1.0916003
## 5 2.064472 1.83016286 0.01830383 0.8183197 0.36125617 3e+08 1.0295068 0.9732440
## loglik loglik.se
## 1 -2707.706 0.0007600655
## 2 -2709.161 0.0008548933
## 3 -2919.093 2.6375954870
## 4 -6378.334 3.4227820985
## 5 -6605.254 0.3478611999
The best global search has a log-likelihood of -2707 with a standard error of 0.00076, significantly better than the log-likelihood from the local search. The pairs plot and simulation plot for the global search are shown below.
bind_rows(global_results2) %>%
bind_rows(local_results) %>%
filter(is.finite(loglik)) %>%
filter(loglik.se < 5) -> temp
pairs(~loglik+Beta+rho+mu_IR+mu_EI+alpha,data=temp,pch=16, col="red")
covid2 %>%
  simulate(
    params=unlist(best_global_results2[1,]),
    nsim=10,format="data.frame",include.data=TRUE
  ) -> sims

sims %>%
  ggplot(aes(x=days,y=reports,group=.id,color=.id=="data"))+
  geom_line()+
  guides(color="none")+labs(title="Simulation for data with global parameters searching")
Our global and local searches return distinct simulation curves, with the local search providing the more accurate estimate. Neither simulation aligns well with the data in terms of the extent of the peak, although the global search captures its timing (around day 125) more accurately.
We continue with an SEIRV[4] model to fit the Omicron data, defined as the daily cases between 2021-12-01 and 2022-04-01. Situationally, little changed in the US that could be translated into our parameters other than the vaccination rate; here, however, we additionally allow fully vaccinated people to be infected by the virus.
And the model’s differential equations are given by following expressions:
\(\frac{dS_t}{dt} = -\beta/N S_t I_t -\alpha/N S_t\)
\(\frac{dE_t}{dt} = \beta/N S_t I_t -\mu_{EI} E_t + \sigma \beta/N V_t I_t\)
\(\frac{dI_t}{dt} = \mu_{EI} E_t -\mu_{IR} I_t\)
\(\frac{dV_t}{dt} = \alpha/N S_t -\sigma \beta/N V_t I_t\)
\(\frac{dR_t}{dt} = \mu_{IR} I_t\)
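In the discretised implementation, the breakthrough-infection flow from \(V\) to \(E\) follows the same binomial-with-exponential-hazard construction as the other transitions, with the vaccinated hazard scaled by \(\sigma\):

\(\Delta N_{VE} \sim \mathrm{Binomial}\left(V,\ 1-e^{-\sigma\beta I/N\,dt}\right)\)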
Parameters were initialised using the same principles as for the Delta variant. The reported cases on 2021-12-1 were set as the initial value of H, and the reported cases on 2021-12-2 as the initial value of E. We set the infected compartment to double the reported cases on 2021-12-01. We also observed that the percentage of the vaccinated population stood at 59.35% on 2021-11-30 and 59.45% on 2021-12-1.
seirv2_step <- Csnippet("
  double dN_SE = rbinom(S,1-exp(-Beta*I/N*dt));
  double dN_EI = rbinom(E,1-exp(-mu_EI*dt));
  double dN_IR = rbinom(I,1-exp(-mu_IR*dt));
  double dN_SV = rbinom(S, 1-exp(-alpha/N*dt));
  double dN_VE = rbinom(V, 1-exp(-sigma*Beta*I/N*dt));
  S -= dN_SE + dN_SV;
  E += dN_SE + dN_VE - dN_EI;
  I += dN_EI - dN_IR;
  R += dN_IR;
  V += dN_SV - dN_VE;
  H += dN_IR;
")
seirv2_rinit <- Csnippet("
  S = round(N*eta);
  E = 97550;
  I = 2*80513;
  R = round(N*(1-eta)-(97550+2*80513 + round(N*(0.5945-0.5935)) + 87248));
  V = round(N*(0.5945-0.5935));
  H = 87248;
")
dmeas <- Csnippet("
  double tol = 1.0e-25;
  double mean_cases = rho*H;
  double sd_cases = sqrt(mean_cases*mean_cases);
  if(reports > 0.0){
    lik = pnorm(reports,mean_cases,sd_cases,1,0) - pnorm(reports-0.5,mean_cases,sd_cases,1,0) + tol;
  } else{
    lik = pnorm(reports+0.5,mean_cases,sd_cases,1,0) + tol;
  }
  if (give_log) lik = log(lik);
")
rmeas <- Csnippet("
  reports = rnorm(rho*H, sqrt(rho*H));
  if (reports > 0.0) {
    reports = nearbyint(reports);
  } else {
    reports = 0.0;
  }
")
omicron %>% mutate(days = c(1:122), reports=avg_7) -> df3

covid3 = df3 %>%
  pomp(
    times="days",t0=0,
    rprocess=euler(seirv2_step,delta.t=1),
    rinit=seirv2_rinit,
    rmeasure=rmeas,
    dmeasure=dmeas,
    accumvars="H",
    partrans=parameter_trans(
      log=c("Beta","alpha","sigma","tau","mu_EI","mu_IR"),
      logit=c("rho","eta")
    ),
    paramnames=c("N","Beta","alpha","sigma","mu_EI","mu_IR","rho","eta","tau"),
    statenames=c("S","E","I","R","V","H")
  )
set.seed(531)
params <- c(Beta=30,alpha=0.05,sigma=0.02,mu_IR=1.5,mu_EI=0.1,rho=0.8,tau=1,eta=0.1,N=3e8)

covid3 %>%
  simulate(
    params=params,
    nsim=10,format="data.frame",include.data=TRUE
  ) -> sims

sims %>%
  ggplot(aes(x=days,y=reports,group=.id,color=.id=="data"))+
  geom_line()+
  guides(color="none")+labs(title="Simulation for data")
library(doParallel)
library(doRNG)
library(iterators)
registerDoParallel()
registerDoRNG(5312022)

params <- c(Beta=30,alpha=0.05,sigma=0.02,mu_IR=1.5,mu_EI=0.1,rho=0.8,tau=1,eta=0.1,N=3e8)

foreach(i=1:10,.combine=c) %dopar% {
  library(pomp)
  library(tidyverse)
  covid3 %>% pfilter(params=params,Np=100)
} -> pf
pf %>% logLik() %>% logmeanexp(se=TRUE) -> L_pf
L_pf
## se
## -3603.1566874 0.1330475
library(doParallel)
registerDoParallel()
registerDoRNG(5312022)
# set the random walk parameters
covid_cooling.fraction.50 <- 0.5
covid_rw.sd <- rw.sd(
  Beta=0.002, alpha=0.001, sigma=0.002, rho=0.002, mu_IR=0.01, mu_EI=0.01, eta=ivp(0.002)
)
params <- c(Beta=30,alpha=0.05,sigma=0.02,mu_IR=1.5,mu_EI=0.1,rho=0.8,tau=1,eta=0.1,N=3e8)
bake(file="lik_local_3.rds",{
  foreach(i=1:20,.combine=c) %dopar% {
    library(pomp)
    library(tidyverse)
    mif2(covid3,
         params = params,
         Np=2000,
         Nmif=50,
         cooling.fraction.50=covid_cooling.fraction.50,
         rw.sd=covid_rw.sd)
  } -> mifs_local
  mifs_local
}) -> mifs_local
coefs_local <- coef(mifs_local)
max_coefs_local <- coefs_local[,which.max(logLik(mifs_local))]
bake(file="local_results_3.rds",{
  foreach(mf=mifs_local, .combine=rbind) %dopar% {
    library(pomp)
    library(tidyverse)
    evals <- replicate(5,logLik(pfilter(mf,Np=2000)))
    ll <- logmeanexp(evals,se=TRUE)
    mf %>% coef() %>% bind_rows() %>%
      bind_cols(loglik=ll[1],loglik.se=ll[2])
  } -> local_results
  local_results
}) -> local_results
bind_rows(local_results) %>%
filter(is.finite(loglik)) %>%
filter(loglik.se < 0.5) %>%
arrange(-loglik) -> best_local_searches
head(best_local_searches,5)
## # A tibble: 5 × 11
## Beta alpha sigma mu_IR mu_EI rho tau eta N loglik loglik.se
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 24.6 0.0507 0.0166 1.23 0.0693 0.815 1 0.0975 300000000 -1645. 0.00127
## 2 23.5 0.0460 0.0234 1.27 0.0723 0.817 1 0.0979 300000000 -1645. 0.00106
## 3 24.5 0.0530 0.0163 1.11 0.0668 0.827 1 0.0964 300000000 -1645. 0.00161
## 4 24.5 0.0517 0.0199 1.23 0.0691 0.800 1 0.0957 300000000 -1645. 0.00148
## 5 24.8 0.0465 0.0224 1.16 0.0667 0.801 1 0.0983 300000000 -1645. 0.000788
covid3 %>%
  simulate(
    params=unlist(best_local_searches[1,]),
    nsim=10,format="data.frame",include.data=TRUE
  ) -> sims

sims %>%
  ggplot(aes(x=days,y=reports,group=.id,color=.id=="data"))+
  geom_line()+
  guides(color="none")+labs(title="Simulation for data with local parameters searching")
covid_box <- rbind(
  Beta=c(25,35),
  mu_EI=c(0,0.5),
  alpha=c(0,0.1),
  rho=c(0,1),
  eta=c(0,0.5),
  N=c(3e8,3e8),
  mu_IR=c(0, 5),
  tau=c(0,2),
  sigma=c(0,0.5)
) # Beta=30,alpha=0.05,sigma=0.02,mu_IR=1.5,mu_EI=0.1,rho=0.8,tau=1,eta=0.1,N=3e8
bake(file="mifs_global3.rds",{
  foreach(i=1:20,.combine=c) %dopar% {
    library(pomp)
    library(tidyverse)
    mif2(covid3,
         params = c(apply(covid_box,1,function(x)runif(1,x[1],x[2]))),
         Np=20000,
         Nmif=250,
         cooling.fraction.50=covid_cooling.fraction.50,
         rw.sd=covid_rw.sd)
  } -> mifs_global
  mifs_global
}) -> mifs_global
bake(file="global_search3.rds",{
  foreach(mf=mifs_global, .combine=rbind) %dopar% {
    library(pomp)
    library(tidyverse)
    evals <- replicate(10,logLik(pfilter(mf,Np=50000)))
    ll <- logmeanexp(evals,se=TRUE)
    mf %>% coef() %>% bind_rows() %>%
      bind_cols(loglik=ll[1],loglik.se=ll[2])
  } -> global_results
  global_results
}) -> global_results3
bind_rows(global_results3) %>%
filter(is.finite(loglik)) %>%
filter(loglik.se < .5 ) %>%
arrange(-loglik) -> best_global_results3
head(as.data.frame(best_global_results3),5)
## Beta mu_EI alpha rho eta N mu_IR
## 1 14.49075 0.15716263 0.05318494 0.9235025 0.10309679 3e+08 2.0406757
## 2 22.96002 0.06955564 0.06941834 0.7861368 0.10874120 3e+08 1.2238263
## 3 33.97512 0.06085419 0.05191722 0.8679845 0.06227463 3e+08 0.9314555
## 4 25.25646 0.06298270 0.03346174 0.7359994 0.11114879 3e+08 1.1917544
## 5 23.59684 0.05886329 0.01069119 0.6823240 0.13707199 3e+08 1.0721675
## tau sigma loglik loglik.se
## 1 0.81494483 0.27376243 -1640.246 0.0003151663
## 2 0.62841287 0.30899137 -1648.572 0.0002551736
## 3 0.05934004 0.61653863 -1648.919 0.0003069671
## 4 0.77488153 0.32587294 -1649.572 0.0002221989
## 5 1.40495745 0.00282577 -1657.235 0.0001244582
bind_rows(global_results3) %>%
bind_rows(local_results) %>%
filter(is.finite(loglik)) %>%
filter(loglik.se < .5)-> temp
pairs(~loglik+Beta+rho+mu_IR+mu_EI+alpha,data=temp,pch=16, col="red")
set.seed(531)

covid3 %>%
  simulate(
    params=unlist(best_global_results3[1,]),
    nsim=10,format="data.frame",include.data=TRUE
  ) -> sims

sims %>%
  ggplot(aes(x=days,y=reports,group=.id,color=.id=="data"))+
  geom_line()+
  guides(color="none")+labs(title="Simulation for data with global parameters searching")
Our most accurate simulations were for the SEIR model on our pre-Delta variant data. We tried a variety of parameters and iterations, but our simulations on our Delta and Omicron data did not return accurate fits. There are a few reasons as to why we might expect this to occur.
Our pre-Delta data is, relatively speaking, the easiest segment to model. We do not have to incorporate any vaccination data, since we are in the early stages of the disease's spread, and so we do not need to worry about changing vaccination rates. The US government response was minimal beyond encouraging people to stay at home and limit contact, and this proved largely ineffective given the contagiousness of the disease.
The US as a whole is a difficult country to model. State government mandates essentially ensured that each state operated as its own “mini-country” and attempting to model these as one, single entity might not make the most logical sense. The laws implemented to counteract COVID in states such as New York for example, were vastly different and far stricter than those implemented in Texas.
Vaccination rates aren’t uniform. At the beginning of the vaccine rollout the US did run into the issue of supply restrictions across the country. Differing attitudes towards the vaccine in different states also contributes towards varying rates across the country.
[1] https://ourworldindata.org/coronavirus
[2] https://ionides.github.io/531w21/final_project/project02/blinded.html
[3] https://ionides.github.io/531w21/final_project/project13/blinded.html
[4] https://ionides.github.io/531w21/final_project/project03/blinded.html
Most of the POMP-based analysis methods applied to the US COVID-19 dataset here are referenced from previous final projects from 2021; we combined the strengths of different projects and added our own contributions based on our data. As mentioned above, we were inspired by [2] to use the reported US case counts, but over a longer window (2020-01-23 to 2022-04-01), and we used the smoother 7-day-average series for the analysis, which mitigates the effect of noise. After the EDA and ARMA modelling, we split the data not simply by time but according to the period during which each variant dominated, estimating different parameters for different models. For the SEIRV model we refer to [4], but we distinguish the Delta and Omicron domains by whether fully vaccinated people can be infected. For the structure of the local and global searches, we refer to [3].