Introduction

In the landscape of the COVID-19 pandemic, people all over the world have been impacted in terms of their work and life. Many variants of COVID-19 have developed since its breakthrough in March 2020, which has resulted in the large number of infections and quick spread among close contacts of infected people. This pandemic has lasted for about 2 years, and it interrupted people’s normal life including travel and work.

In our analysis, we focus on the COVID-19 data in Washtenaw county, Michigan from July 1st, 2021 to April 6th, 2022. As University of Michigan students, we were particularly interested in how COVID-19 has spread in our county (Washtenaw). To get an understanding of how COVID-19 cases change over the time, we built a Susceptible-Exposed-Infection-Recovered (SEIR) compartment model. Through local search and global search, we aim to find how the SEIR model can adapt to the real COVID-19 data in Washtenaw County, MI. We also compare to a likelihood benchmark in the context of non-mechanistic models, including a negative binomial model and an ARMA model. This allows us to put the log-likelihoods we obtain through global search in the context of these non-mechanistic benchmarks.

Exploratory Data Analysis

The data was collected from the website of the Michigan state government [1], where the dataset has a time frame of March 2020 to April 2022. We chose to focus on just the time frame from July 2021 to April 2022, as a different variant (Omicron) has become the most prevalent and much previous work has been done using the data from March 2020 to 2021 [2]. We saw in particular that another STATS 531 final project from Winter 2021 used COVID-19 data from Washtenaw County [3], so we are curious if using a different time frame of more recent data will lead to different results than this previous work.

Figure 1 shows a time series plot smoothed by the Loess method, where the blue smoothed line displays an increasing trend before January 2022 and a decreasing trend after that. Figure 2 shows the average COVID-19 cases by month, where a peak in January 2022 is very obvious.

cap_fig1 =
  "**Figure 1.** *Cases of COVID-19 in Washtenaw County from July 2021 to April 2022 (smoothed by loess method)*"
data = washtenaw
data %>% ggplot(aes(x = Date, y = Cases)) + geom_line()  +
  geom_smooth(method='loess') +
  theme_bw() + labs(x = "Date", y = "Cases")
**Figure 1.** *Cases of COVID-19 in Washtenaw County from July 2021 to April 2022 (smoothed by loess method)*

Figure 1. Cases of COVID-19 in Washtenaw County from July 2021 to April 2022 (smoothed by loess method)

# monthly average cases
cap_fig2 = 
  "**Figure 2.** *The average COVID-19 cases by month*"
data %>% mutate(year_month = as.yearmon(Date, "%Y %m")) %>% 
  group_by(year_month) %>% summarize(avg_cases = mean(Cases)) %>% ungroup()%>%
  ggplot + 
  geom_line(aes(x =  year_month, y =avg_cases))  +
  theme_bw()  +
  labs(x = "Month Year", y = "Monthly Average Cases")
**Figure 2.** *The average COVID-19 cases by month*

Figure 2. The average COVID-19 cases by month

Figure 1 shows a time series plot smoothed by the Loess method, where the blue smoothed line displays an increasing trend before January 2022 and a decreasing trend after that. Figure 2 shows the average COVID-19 cases by month, where a peak in January 2022 is very obvious.

cap_fig3 = 
  "**Figure 3.** *The ACF plot of COVID-19 cases in Washtenaw County*"

acf(data$Cases, main="ACF: Covid-19 Cases")
**Figure 3.** *The ACF plot of COVID-19 cases in Washtenaw County*

Figure 3. The ACF plot of COVID-19 cases in Washtenaw County

Figure 3 displays the auto-correlation plot of COVID-19 cases in Washtenaw County, MI. This plot motivates our exploration of the time-dependent relationship of the number of COVID-19 cases.

Fitting SEIR Model

Model Assumptions

We chose to use the SEIR model which consists of four compartments including “S”, “E”, “I” and “R”. The “S” compartment represents the susceptible people; the “E” compartment represents the number of exposed individuals in a period of latency before becoming infectious; the “I” compartment represents the infectious people; the “R” represents those recovered or removed from the susceptible population.

Suppose the number of people in each compartment at time \(t\) is \(S(t), E(t), I(t), R(t)\), respectively. The model can be specified as follows:

\[ \begin{split} S(t)&=S(0)-N_{SE}(t)\\ E(t)&=E(0)+N_{SE}(t)-N_{EI}(t)\\ I(t)&=I(0)+N_{EI}(t)-N_{IR}(t)\\ R(t)&=R(0)+N_{IR}(t) \end{split} \]

where the number of people transiting from one compartment to another is given by:

\[ \begin{split} \Delta N_{SE}&\sim \mathrm{Binomial}(S,1-e^{\beta\frac{I}{N}\Delta t})\\ \Delta N_{EI}&\sim \mathrm{Binomial}(E,1-e^{-\mu_{EI}\Delta t})\\ \Delta N_{IR}&\sim \mathrm{Binomial}(I,1-e^{^{-\mu_{IR}\Delta t}}) \end{split} \]

As for the parameters, \(\beta\) is the contact rate with \(b_1\) when time is in the first half of time period and \(b_2\) when time is in the second half of time period ,and \(\mu_{SI}=\beta I(t)\) denotes the rate at which individuals in S transition to E, \(\mu_{EI}\) is the rate at which individuals in E transition to I and \(\mu_{IR}\) denotes the transition rate from I to R. The probability of a case being reported is \(\rho\).

Initial Guesses

Let’s get the initial values based on some statistics:

  1. The population of Washtenaw county on the start date: 372,258 (Estimated, retrieved from [5])

  2. Initial infected people: 30 (An intuitive value based on the confirmed cases on July 1st.)

We are going to divide the time series into two stages. According to the CDC, the time when Omicron became the primary variant in the US is around the beginning of December [2]. So, we set December 1st as the division of the two stages. Since the speeds of spreading of the Delta and Omicron variants are considered different, we set the parameters of \(\beta\) to be \(b_1\) and \(b_2\) respectively. We expect to see that the estimate of \(b_2\) will be larger than \(b_1\).

The code we have written for our project is based on/inspired by code we have written for previous homework assignments, course notes, and previous semesters’ projects for STATS 531 but modified for the purposes of our project [3] [4].

seir_step <- Csnippet("
  double Beta = intervention ? b2 : b1;

  double dN_SE = rbinom(S,1-exp(-Beta*I/N*dt));
  double dN_EI = rbinom(E,1-exp(-mu_EI*dt));
  double dN_IR = rbinom(I,1-exp(-mu_IR*dt));
  S -= dN_SE;
  E += dN_SE - dN_EI;
  I += dN_EI - dN_IR;
  R += dN_IR;
  H += dN_IR;

")

seir_init <- Csnippet("
  S = nearbyint(eta*N);
  E = 30;
  I = 30;
  H = 0;
  R = nearbyint((1-eta)*N-E-I) + 1;
")

seir_dmeas <- Csnippet("
  double tol=1.0e-25;
  double mean =rho*H;
  double sd =sqrt(pow(tau*H,2)+rho*H);
  if(Cases>0.0){
    lik=pnorm(Cases+0.5,mean,sd,1,0)-pnorm(Cases-0.5,mean,sd,1,0)+tol;
  } else {
    lik=pnorm(Cases+0.5,mean,sd,1,0)+tol;
  }
  if(give_log) lik=log(lik);
")

seir_rmeas <- Csnippet("
  Cases = rnorm(rho*H, sqrt(pow(tau*H,2)+rho*H));
  if(Cases>0.0){
    Cases=nearbyint(Cases);
  } else {
    Cases=0.0;
  }
")


seir_covar <- covariate_table(
  t = washtenaw$Time,
  intervention= c(rep(0, 154),
                  rep(1, 125)),
  times = "t")

washtenaw %>%
  select(Time, Cases) %>% 
  pomp(
    times="Time", t0=1,
    rmeasure=seir_rmeas,
    dmeasure=seir_dmeas,
    accumvars="H",
    rprocess=euler(seir_step,delta.t=1),
    rinit=seir_init,
    partrans=parameter_trans(
      log = c("mu_EI", "mu_IR", "b1", "b2", "tau"),
      logit = c("rho", "eta") 
    ),
    paramnames=c("N","b1", "b2", "rho", "mu_EI","mu_IR","eta", "tau"),
    statenames=c("S","E","I","R","H"),
    covar = seir_covar
  ) -> washtenawSEIR

First, we carry out replicated particle filters at our initial guess. We find the estimated log-likelihood below:

foreach(i=1:10,.combine=c) %dopar% {
  library(pomp)
  washtenawSEIR %>% pfilter(Np=5000)
} -> pf
pf %>% logLik() %>% logmeanexp(se=TRUE) -> L_pf
L_pf
##                  se 
## -7213.183  1510.358

Then we can use simulation and visualization to see how good our initial guesses can fit:

Simulations based on our initial starting points are able to capture the general trend of our data as shown below, but improvements can still be made.

Likelihood Benchmark

We have shown both local and global search of the mechanistic model above, but many of these parameters do not have meaningful scientific interpretation. In this section, we calculate the log-likelihood in the non-mechanistic models, including a negative binomial model and an ARMA model. This allows us to compare our global search results to a benchmark.

nb_lik = function(theta) {- sum(dnbinom(
  data$Cases, size = exp(theta[1]), prob = exp(theta[2]), log = TRUE
))}
nb_mle = optim(c(0, -5), nb_lik)
-nb_mle$value
## [1] -1652.252

The negative binomial model gives a log-likelihood of \(-1652.252\). The log-likelihood of \(-1547\) attained by global search beats this non-mechanistic model.

In addition, a Gaussian auto-regressive moving-average (ARMA) model as a non-mechanistic model may also provide us with another perspective of log-likelihood. We made use of transformed cases \(log(y_n +1)\) in our ARMA model and then calculate the original log-likelihood in an appropriate way, where \(y_n\) is the number of cases.

log_cases = log(1 + data$Cases)
smoothed = spectrum(log_cases, spans=c(5, 5, 5), main="Periodogram (Smoothed)")
idx_max = which.max(smoothed$spec)
idx_max2 = which.max(smoothed$spec[41:length(smoothed$spec)]) + 40
abline(v=smoothed$freq[idx_max], lty=2, col='red') # freq = 0.003472222, 288 days
abline(v=smoothed$freq[idx_max2], lty=2, col='red') # freq = 0.1423611, 1 week

The smoothed periodogram of transformed cases above shows us that the first dominant frequency is around \(w_1 = 0.003472222\) which corresponds to the cycle of \(288\) days, and the second dominant frequency is around \(w_2 = 0.1423611\), which corresponds to a cycle of 1 week. Therefore, we explore the best possible seasonal ARMA models \(SARIMA(p, 0, q) \times (1, 0, 1)_7\) with the period of 7 days in terms of Akaike Information Criterion (AIC), where \(0\leq p\leq 5\), \(0\leq q\leq 5.\)

aic_table=function(data, P, Q){
    table=matrix(NA, (P+1), (Q+1))
    for(p in 0:P) {
        for(q in 0:Q) {
          model_aic = try(
            arima(data, order = c(p, 0, q), seasonal = list(order = c(1, 0, 1), period = 7))$aic, 
            silent = TRUE
          )
          table[p + 1,q + 1] = ifelse(
            inherits(model_aic, "try-error"), NA, model_aic
          )
        }
    }
    dimnames(table) = list(paste("AR", 0:P, sep=""), paste("MA", 0:Q, sep=""))
  table
}
knitr::kable(aic_table(data = log_cases, P = 5, Q = 5))
MA0 MA1 MA2 MA3 MA4 MA5
AR0 346.4929 275.1273 260.0006 251.1119 193.8270 194.1448
AR1 197.5747 177.2757 170.7478 163.6753 162.0773 163.0357
AR2 192.9134 176.3333 169.6617 161.8660 160.9721 162.7601
AR3 159.1812 159.8479 161.5929 163.3872 162.6681 NA
AR4 159.5692 160.0918 161.9805 163.8329 162.8044 167.3686
AR5 161.2516 161.9459 163.9343 165.9595 160.4336 NA
arma30= arima(log_cases, order = c(3, 0, 0), 
                   seasonal = list(order = c(1, 0, 1), period = 7))
arma30$loglik - sum(log_cases)
## [1] -1308.984

The AIC table above shows that \(SARIMA(3, 0, 0) \times (1, 0, 1)_7\) is the best model to move forward with based on this metric. The corrected log-likelihood for the original data is \(-1308.984\), which is slightly higher than that of global search. The SARIMA model captures the periodic phenomenon which the SEIR model may not take into account, since our SEIR model mainly considers the flow among the compartments.

Conclusion

In this report, we conduct a time series analysis using an SEIR pomp model on COVID-19 cases from July 2021 to April 2022 in Washtenaw County, MI.

Due to the Omicron variant of the COVID-19 virus, we divide the time series into two stages, where the contact rate \(\beta\) has two different parameter values during these two time stages. The results of our model show that the contact rate in the second stage is larger than that in the first stage, which suggests the power of Omicron variant.

Although there is still room to improve the SEIR model, such as using the SEQIR model that takes a quarantine state into consideration, our SEIR model fits the data of COVID-19 cases in Washtenaw County well.

Reference

[1] Michigan Coronavirus Data here

[2] CDC Omicron Variant here

[3] Projects from previous semesters of STATS 531 here, especially project 15 from W21.

[4] STATS 531 Course Notes here

[5] Population in Michigan here