531 Final Project: Analysis on Covid-19 Cases in Japan

Introduction

The outbreak of COVID-19 since late 2019 has posed unprecedented challenges to global public health systems and societal well-being. Among the affected regions, Japan stands as a notable case study due to its unique socio-demographic characteristics, healthcare infrastructure, and governmental response strategies. According to JHU [1], by far the COVID-19 has caused total cases of 33,329,551 and total deaths of 73,046.

In this study, we aim to explore the temporal dynamics of COVID-19 report cases time series in Japan using a combination of two distinct yet complementary modeling approaches: Autoregressive Moving Average (ARMA [2]) models and Susceptible-Exposed-Infectious-Recovered (SIR/SEIR [3]) compartmental models.

Data

The dataset comes from this github website [4], whose cases and deaths are collected from World Health Organization Coronavirus Dashboard [5]. This dataset is weekly based and we filtered out the data from Japan sorted by time. In the end, we got a time series of 222 observations from 2020-01-05 to 2024-03-31.

Exploratory Data Analysis

Code
# plot the time-series
ggplot(covid_japan_week, aes(x = date, y = new_cases)) +
  xlab("Date")+
  ylab("Cases")+
  ggtitle("Japan Covid Reports")+
  geom_line(col = "aquamarine3")

The time series plot clearly illustrates three distinct outbreaks of COVID-19 cases: one in early 2022, another in mid to late 2022, and a third towards the end of the same year. It is important to note that this time series reflects a complex interplay of societal, political, and behavioral factors, which are challenging to fully incorporate into our analysis. Therefore, this study focuses solely on utilizing time series models to fit the original data and optimize numerical procedures to achieve maximum likelihood and optimal simulation outcomes. The emphasis lies on numerical accuracy rather than delving deeply into the underlying factors driving the data.

ARMA

Data Analysis

Firstly, we explore this time series with ARMA model.

Code
covid.ts <- ts(data=covid_japan_week$new_cases,start=c(2020,1),frequency=52,names=c('cases'))
Code
par(mfrow = c(1,2))
# ACF and PACF
acf(covid.ts, lag = 48, main = "ACF Plot of Covid in Japan")
pacf(covid.ts, lag = 48, main = "PACF Plot of Covid in Japan")

Looking at the ACF and PACF [6] plot of the time series, we can observe a significant oscillation, which might suggest a periodicity pattern in it. To further investigate it, we do the spectrum analysis [7] below:

Code
# spectrum analysis

# smoothed periodogram
spectrum_s <- spectrum(covid.ts, spans = c(4,6,4), main = "Smoothed periodogram")
abline(v = spectrum_s$freq[which.max(spectrum_s$spec)], lty = "dotted")

Code
spectrum_s$freq[which.max(spectrum_s$spec)]
[1] 0.2311111

Since the frequency is \(0.2311111\text{ week}^{-1}\), thus the period is \(4.326923\text{ week}\). This means that a seasonal pattern with the period to be a month is appropriate. Besides, from the time series there seems not to be a clear trend so we do not use integration in ARMA fitting.

Model Specification

We define the SARIMA [8] model equation below:

  • Define \(Y_n\) as reported cases at time \(n\).
  • Define \(\epsilon_n\) as Gaussian white noise at time \(n\), i.e. \(\epsilon_n \overset{iid}{\sim}N(0. \sigma^2)\).
  • Define \(\mu\) as the expectation value of \(Y_t\).
  • Define \(\varphi(x) = 1- \varphi_1x - \varphi_2 x^2 - \cdots - \varphi_px^p\).
  • Define \(\Phi(x) = 1- \Phi_1x - \Phi_2 x^2 - \cdots - \Phi_px^p\).
  • Define \(\psi(x) = 1+ \psi_1x + \psi_2 x^2 + \cdots + \psi_qx^q\).
  • Define \(\Psi(x) = 1+ \Psi_1x + \Psi_2 x^2 + \cdots + \Psi_qx^q\).
  • Define \(B\) as the backshift operator, \(BY_n = Y_{n-1}\).

Then SARIMA model equation can be represented as:

\[ \varphi(B)\Phi(B^{12})(Y_n - u) = \psi(B)\Psi(B^{12})\epsilon_{n} \]

Choosing ARMA model

Now our final setting is SARIMA(p,0,q)\(\times\)(1,0,1) where \(0\leq p,q \leq 5\). We calculate the AIC tables as follows:(the code is from slides 5 in class [9])

Code
# AIC table

aic_table1 <- function(data,P,Q,sp,sq){
  table <- matrix(NA,(P+1),(Q+1))
  for(p in 0:P) {
    for(q in 0:Q) {
    table[p+1,q+1] <- tryCatch( {Arima(data, order=c(p,0,q), seasonal = list(order=c(sp,0,sq), period = 4),method = "ML")$aic}, error = function(e){return(NA)})
    }
  }
  dimnames(table) <- list(paste("AR",0:P, sep=""),
  paste("MA",0:Q,sep=""))
  table
}
covid_aic_table <- aic_table1(covid.ts,5,5,1,1)
kable(covid_aic_table, caption = paste("AIC of SARIMA(p,0,q),(1,0,1),  where p,q are from 0 to 5"), digits=2)
AIC of SARIMA(p,0,q),(1,0,1), where p,q are from 0 to 5
MA0 MA1 MA2 MA3 MA4 MA5
AR0 6075.74 5846.13 5784.56 5648.49 5628.62 5606.50
AR1 5661.20 5631.18 5611.10 5595.27 5595.67 5572.36
AR2 5603.93 5611.29 5584.22 5581.51 5581.85 5574.00
AR3 5585.94 5589.80 5590.99 5578.80 5589.95 5574.53
AR4 5585.34 5584.49 5592.51 5588.73 5582.41 5568.65
AR5 5587.03 5586.25 5584.17 5571.77 5585.04 NA

To achieve lower AIC value as well as keep the model relatively simple, we finally choose the model to be SARIMA(1,0,5)\(\times\)(1,0,1). Then we check the stationarity of the model and do some diagnostics.

Model Diagnostics

Code
# model diagnostics
sarima_model <- Arima(covid.ts, order=c(1, 0, 5), seasonal=list(order=c(1, 0, 1), period=4))

# plot roots
autoplot(sarima_model)

Code
# check residuals
checkresiduals(sarima_model)


    Ljung-Box test

data:  Residuals from ARIMA(1,0,5)(1,0,1)[4] with non-zero mean
Q* = 54.688, df = 36, p-value = 0.02372

Model df: 8.   Total lags used: 44

As we can see, the inverse AR and MA roots are all inside the unit circle, which means the ARMA is both stationary and causal. The Ljung-Box test and residual ACF plot show that there’s correlation between the residual in the model, which kind of violates the model assumption. However, the residual histogram shows a nearly normal distribution. Finally we show plots of how well ARMA model fits and also do some forecast.

Code
# fitted and prediction

# fitted
p1 <- ggplot(covid_japan_week, aes(x = date, y = new_cases)) +
  xlab("Date")+
  ylab("cases")+
  ggtitle("fitted vs. original")+
  geom_line(data = covid_japan_week, aes(colour = "Original data"))+
  geom_line(y = fitted(sarima_model), linetype = 2, aes(colour = "SARMA fit"))+
  scale_colour_manual(name="Legend", values=c("aquamarine3","coral1"))

# forecast
p2 <- autoplot(forecast::forecast(sarima_model, h=10), main = "forecast", ylab = "Cases", xlab = "Date")

p1 / p2

SEIR Model

Model Description

We have chosen to implement the SEIR model to simulate the COVID-19 outbreak because it accounts for the incubation period that is known to occur with COVID-19. This model introduces an ‘E’ stage, standing for ‘exposed’, indicating that individuals go through a latency period before they can infect others. This makes the SEIR model more practical in comparison to the SIR model, which does not consider this latency period.

SEIR Model Flowchart

In the flowchart [10], each arrow represents a specific rate of transition between the stages of the SEIR model. The stages are defined as:

  • S for the susceptible group
  • E for those exposed but not yet infectious
  • I for the infectious individuals
  • R for those who have recovered or been removed from the population.

The contact rate is denoted by \(\beta\), and the rate at which susceptible individuals become exposed is given by \(\mu_{SI} = \beta I(t)\). The transition from exposed to infectious is represented by \(\mu_{EI}\), and the rate at which infectious individuals recover or are removed is \(\mu_{IR}\). The likelihood of a case being officially reported is represented by \(\rho\).

The calculations for the number of individuals in each compartment are as follows:

  • \(S(t) = S(0) - \Delta N_{SE}(t)\)
  • \(E(t) = E(0) + \Delta N_{SE}(t) - \Delta N_{EI}(t)\)
  • \(I(t) = I(0) + \Delta N_{EI}(t) - \Delta N_{IR}(t)\)
  • \(R(t) = R(0) + \Delta N_{IR}(t)\)

The transitions \(\Delta N_{SE}\), \(\Delta N_{EI}\), and \(\Delta N_{IR}\) are modeled using binomial distributions with the respective probabilities being calculated as follows:

  • \(\Delta N_{SE} \sim \text{Binomial}(S, 1 - e^{-\beta \frac{I}{N} \Delta t})\)
  • \(\Delta N_{EI} \sim \text{Binomial}(E, 1 - e^{-\mu_{EI} \Delta t})\)
  • \(\Delta N_{IR} \sim \text{Binomial}(I, 1 - e^{-\mu_{IR} \Delta t})\)

For our measurement model, we use a discretized normal distribution truncated at zero for the reported cases:

\(\text{Cases} = \max\{\text{round}(C_n), 0\}\)

where \(C_n\) is drawn from \(N(\rho H_n, (\tau H_n)^2 + \rho H_n)\), with \(H\) tracking the number of individuals moving from I to R.

Model Assumption

To simplify the demographic factors in our model, we assume the birth rate, death rate, and population movement to be zero, thereby considering the total population \(N = S + E + I + R\) as constant. Furthermore, our data, visualized in the previous section, show several outbreaks, whereas traditional SEIR models typically describe scenarios with only one peak.

Consequently, we introduce a time-dependent contact rate (\(\beta\)) [11], allowing it to vary throughout the epidemic’s course. This variation is implemented as a step function, reflecting our exploratory data analysis (EDA) findings that the contact rate changes as the epidemic progresses.

To reflect the dynamic nature of public behavior towards the COVID-19 virus in Japan, our model incorporates a step function for the contact rate (\(\beta\)), aligned with significant events that likely influenced social interactions and mobility. These events are used as the breakpoints for the changes in \(\beta\), and each period represents a distinct phase in the epidemic’s timeline as related to these key events:

\[ \beta = \begin{cases} b_1, & \text{01/01/2020 - 25/05/2020} \\ & \text{(Start of the pandemic to the lifting of the state of emergency)} \\ b_2, & \text{26/05/2020 - 17/02/2021} \\ & \text{(Post-emergency period to the beginning of vaccination)} \\ b_3, & \text{18/02/2021 - 23/07/2021} \\ & \text{(Vaccination rollout to the day before the Tokyo Olympics)} \\ b_4, & \text{24/07/2021 - 31/12/2021} \\ & \text{(During and after the Tokyo Olympics)} \\ \end{cases} \]

  1. State of Emergency Fully Lifted [12]: On May 25, 2020, Japan fully lifted the state of emergency initially declared due to the COVID-19 pandemic. This marked a significant change in public behavior and restrictions, likely increasing social contact rates.
  2. First Vaccine Administered [13]: On February 17, 2021, Japan administered its first COVID-19 vaccines, introducing a new phase in the pandemic response, which may have affected public perception of risk and altered behavior patterns.
  3. Tokyo Olympics [14]: Starting July 23, 2021, the Tokyo Olympics were held, a period associated with increased international attention and domestic mobility, potentially leading to higher contact rates.

Due to the emergence of the Omicron variant on Nov 30, 2021 [15], which has different transmission characteristics, we will only use data up to 31/12/2021 to avoid confounding the analysis with a different pattern of infection.

These breakpoints were chosen based on the assumption that public health directives and major societal events significantly impact the population’s contact patterns, which are crucial for modeling the spread of the virus.

Code
suppressPackageStartupMessages({
  library(pomp)
})

seir_step = Csnippet("
  double Beta;
  if(intervention == 1) Beta = b1;
  else if(intervention == 2) Beta = b2;
  else if(intervention == 3) Beta = b3;
  else Beta = b4;
  
  double dN_SE = rbinom(S, 1 - exp(-Beta * I / N * dt));
  double dN_EI = rbinom(E, 1 - exp(-mu_EI * dt));
  double dN_IR = rbinom(I, 1 - exp(-mu_IR * dt));
  S -= dN_SE;
  E += dN_SE - dN_EI;
  I += dN_EI - dN_IR;
  H += dN_IR;
")

seir_rinit = Csnippet("
  S = nearbyint(eta * N);
  E = 100;
  I = 200;
  H = 0;
")

dmeas <- Csnippet("
  double tol=1.0e-25;
  double mean =rho*H;
  double sd =sqrt(pow(tau*H,2)+rho*H);
  if(new_cases>0.0){
    lik=pnorm(new_cases+0.5,mean,sd,1,0)-pnorm(new_cases-0.5,mean,sd,1,0)+tol;
  } else {
    lik=pnorm(new_cases+0.5,mean,sd,1,0)+tol;
  }
  if(give_log) lik=log(lik);
")

rmeas <- Csnippet("
  new_cases = rnorm(rho*H, sqrt(pow(tau*H,2)+rho*H));
  if(new_cases>0.0){
    new_cases=nearbyint(new_cases);
  } else {
    new_cases=0.0;
  }
")
covid_japan_week_20_21 <- covid_japan_week %>% filter(date<"2021-12-31") %>% mutate(Time = row_number())

seir_covar <- covariate_table(
  t = covid_japan_week_20_21$Time,
  intervention = c(rep(1, 21),
                   rep(2, 38),
                   rep(3, 23),
                   rep(4, 22)),
  times = "t")

covidSEIR = covid_japan_week_20_21 %>% select(Time, new_cases) %>%
  pomp(
    times = "Time", t0 = 1,
    rprocess = euler(seir_step, delta.t = 1),
    rinit = seir_rinit,
    rmeasure = rmeas,
    dmeasure = dmeas,
    accumvars = "H",
    partrans=parameter_trans(
      log = c("mu_EI", "mu_IR", "tau", "b1", "b2", "b3", "b4"),
      logit = c("rho", "eta")
    ),
    statenames = c("S", "E", "I", "H"),
    paramnames = c("b1", "b2", "b3", "b4", "mu_EI", "mu_IR", 
                   "eta", "rho", "N", "tau"),
    covar = seir_covar
  )

Model Initialization

Japan’s population for the year 2020 was reported to be 126,226,568 [16]. Consistent with our prior assumptions, we shall maintain \(N\) as a constant value at 126,226,568. According to the Centers for Disease Control and Prevention (CDC) [17], the incubation period for corona viruses is at 6.5 days by average. Therefore, we anticipate that the transition rate from exposed to infectious (\(\mu_{EI}\)) will be \(\frac{1}{6.5 \text{ days}} = 0.15 \text{ day}^{-1}\).

Recovery from COVID-19 typically takes a minimum of two weeks, with emerging research indicating that most adults with mild to moderate symptoms cease to be infectious approximately 10 days after symptoms manifest [18]. Consequently, the rate at which infectious individuals recover or are removed (\(\mu_{IR}\)) is estimated to be around \(0.1 \text{ day}^{-1}\). We have decided to set both \(\mu_{EI}\) and \(\mu_{IR}\) at 0.1, fixing these values for both local and global searches.

Utilizing this information, we conducted several simulations with varied parameters. The results suggest the following set of parameters as suitable initial values for our local search:

\[ \begin{cases} b_1 = 1, b_2 = 10, b_3 = 20, b_4 = 18 \\ \mu_{EI} = 0.1, \mu_{IR} = 0.1 \\ \rho = 0.4 \\ \eta = 0.09 \\ \tau = 0.05 \\ N = 126,226,568 \\ \end{cases} \]

Code
params = c(b1 = 1, b2 = 10, b3 = 20, b4 = 18, 
            mu_EI = 0.1, mu_IR = 0.1, rho = 0.4, eta = 0.09, 
            tau = 0.05, N = 126226568)
fixed_params = params[c("N", "mu_EI", "mu_IR")]
params_rw.sd = rw_sd(b1 = 0.02, b2 = 0.02, b3 = 0.02, b4 = 0.02, 
                     rho = 0.02, tau = 0.0001, eta = ivp(0.02))
Code
registerDoRNG(2289432)
bake(file = "lik_starting_vals.rds", {
  foreach(i=1:10, .combine = c) %dopar% {
    covidSEIR %>% pfilter(params=params,  Np=500)
  }
}) -> pf
pf %>% logLik() %>% logmeanexp(se = TRUE)
         est           se 
-4219.645321     2.084666 

we obtain an unbiased likelihood estimate of -4219.65 with a Monte Carlo standard error of 2.08.

Code
plot_simulation = function(sim_dat) {
  sim_dat %>%
    ggplot() +
    theme_bw() +
    geom_line(aes(Time, new_cases, group = .id, 
                  color = (.id == "data"), alpha = (.id == "data"), 
                  linetype = (.id == "data"))) +
    scale_color_manual(values = c("aquamarine3", "coral1")) +
    scale_alpha_manual(values = c(0.5, 1)) +
    scale_linetype_manual(values = c(5, 1)) +
    guides(color = FALSE, linetype = FALSE, alpha = FALSE)
} # plot_simulation()

covidSEIR %>%
  simulate(params = params, nsim = 20, format = "data.frame", include.data = TRUE) %>% 
  plot_simulation()

Based on the initial parameters, the simulations detect the trend of the biggest peak, but still be not able to capture the other major outbreaks.

Reference

[1] COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University(JHU).

[2] ARMA.

[3] SIR/SEIR.

[4] COVID-19 dataset.

[5] World Health Organization Coronavirus Dashboard.

[6] ACF and PACF.

[7] Spectrum Analysis.

[8] Seasonality.

[9] AIC table code

[10] Project 2, STATS531W21

[11] Project 15, STATS531W21

[12] Chronology of major events related to coronavirus and Japan

[13] COVID-19 vaccination in Japan

[14] COVID-19 in Japan during 2020-2022: Characteristics, responses, and implications for the health care system

[15] First omicron variant in Japan

[16] Demographics of Japan

[17] Incubation period for Covid-19

[18] Isolation and Precautions for People with COVID-19

[19] Japan – COVID-19: Re-Shutting of Borders for One Month Due to Omicron Variant