Nova Scotia’s Influenza Cases

  • Influenza Case Trends in Nova Scotia: Capturing the Seasonal Behavior
    • Introduction
    • Data Preparation and EDA
    • POMP Model analysis
      • SIR Model
      • SIRS model
      • SEIRS model
      • Conclusion
      • Previous Project Comparisons

Influenza Case Trends in Nova Scotia: Capturing the Seasonal Behavior

Influenza Virus (Hospital CMQ, 2024)
Influenza Virus (Hospital CMQ, 2024)

Introduction

Influenza, a contagious respiratory illness, imposes recurring seasonal health burdens on populations across the globe. In this study, we examine lab-confirmed influenza cases in Nova Scotia[1] from 2014 to 2019, aiming to uncover the temporal dynamics and structural patterns underlying observed trends. By leveraging time series decomposition, autoregressive modeling, and partially observed Markov process (POMP) frameworks—including SIR, SIRS and SEIRS models—we seek to capture both the seasonal and stochastic behavior of influenza transmission in the region. This modeling effort helps quantify the seasonal peaks and infection dynamics and provides a basis for forecasting and evaluating public health interventions tailored to the Nova Scotia context.

Data Preparation and EDA

To begin our analysis, we obtained weekly lab-confirmed influenza case data for Nova Scotia from 2014 to 2019. The dataset was cleaned by converting date fields into standardized formats and filtering for the relevant timeframe. The dataset contains 262 observations. We then constructed a time series of weekly reported cases, revealing distinct seasonal peaks corresponding to annual influenza outbreaks.The time series plot shows clear seasonal peaks in confirmed influenza cases in Nova Scotia, with sharp outbreaks occurring annually between 2014 and 2019. Each season is followed by periods of low to near-zero case counts, indicating a strong seasonal trend in influenza activity. We have 261 observations in our dataset.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    1.00   12.07   17.75   87.00

We can notice that the distribution of Influenza cases in Nova Scotia is skewed to the right with a median being 1 cases per day and the range being from 0 to 87 cases per day.

The ACF plot shows a strong autocorrelation at lag one that decays gradually across subsequent lags, indicating a non-stationary time series. The PACF plot displays a significant spike at lag one, followed by a sharp drop-off, which is characteristic of an autoregressive process. Together, these patterns suggest that the series requires first-order differencing to achieve stationarity, after which an ARIMA model may be appropriate for forecasting.

The ACF plot shows a sharp drop after lag 1, with all subsequent autocorrelations within the confidence bounds, indicating that the series is now stationary. The PACF shows significant spikes at early lags (especially lag 1–2), suggesting that an AR(1) or AR(2) component may be appropriate. This supports the use of an ARIMA(p,1,q) model with d = 1 due to first-order differencing.

The decomposition also confirms that the observed time series consists of a relatively stable trend component, a recurring seasonal pattern, and irregular random fluctuations. The seasonal component shows consistent cyclical behavior across years, while the random component captures short-term noise not explained by the trend or seasonality. We are going to check our yearly cycle by spectral analysis

## The peak frequency:  0.01851852
## The dominant period is: 54 weeks.

There is a clear seasonal trend in the influenza case data that recurs every ~54 weeks. Since the time period (54 weeks) is approximately one year (52 weeks), we will set the frequency to be 52 weeks for our following analysis.

aic_table <- function(data, P = 3, Q = 3, seasonal = FALSE, per = 52) {
  table <- matrix(NA, nrow = P + 1, ncol = Q + 1)
  
  for (p in 0:P) {
    for (q in 0:Q) {
      model <- tryCatch({
        if (seasonal) {
          arima(data, order = c(p, 0, q),
                seasonal = list(order = c(1, 0, 1), period = per))
        } else {
          arima(data, order = c(p, 0, q))
        }
      }, error = function(e) NULL)
      
      if (!is.null(model)) {
        table[p + 1, q + 1] <- model$aic
      }
    }
  }
  
  dimnames(table) <- list(paste0("AR", 0:P), paste0("MA", 0:Q))
  return(table)
}


aic_results <- aic_table(data = diff_series, P = 5, Q = 5, seasonal = FALSE)
# View as a table
print(aic_results)
##          MA0      MA1      MA2      MA3      MA4      MA5
## AR0 1829.717 1831.505 1833.304 1833.459 1833.433 1825.505
## AR1 1831.493 1832.313 1834.227 1835.077 1831.307 1827.454
## AR2 1833.325 1834.460 1827.025 1827.399 1830.062 1827.488
## AR3 1834.179 1835.441 1827.272 1829.255 1827.447 1828.115
## AR4 1832.375 1831.121 1831.262 1828.779 1827.497 1829.497
## AR5 1828.092 1830.083 1829.217 1829.026 1829.498 1821.560
min_aic <- min(aic_results, na.rm = TRUE)
print(min_aic)
## [1] 1821.56
## Best model: ARIMA(5,0,5)

The AIC table shows that ARIMA(5,0,5) has the lowest AIC (1821.56), indicating the best fit among tested models. However, ARIMA(2,0,2) has a similar AIC (1827.03) with far fewer parameters. Given the small AIC difference, ARIMA(2,0,2) may offer a better balance between accuracy and simplicity, helping to avoid overfitting.

POMP Model analysis

SIR Model

The SIR model is one of the most fundamental compartmental models in infectious disease epidemiology. It segments the population into three compartments: Susceptible (S), Infected (I), and Recovered (R). Individuals progress from susceptible to infected upon exposure and from infected to recovered after resolving the illness. In this model, recovered individuals gain permanent immunity and do not return to the susceptible class.

The population dynamics of the SIR model are governed by the following system of difference equations: \[ \begin{align*} S(t) &= S(0) - N_{SI}(t) \\ I(t) &= I(0) + N_{SI}(t) - N_{IR}(t) \\ R(t) &= R(0) + N_{IR}(t) \end{align*} \] Here, \(N_{SI}(t)\) denotes the number of individuals who transition from susceptible to infected by time t, and \(N_{IR}(t)\) represents those moving from infected to recovered.

The flow between compartments is governed by the following rate equations:

\[ \begin{aligned} \frac{dN_{SI}}{dt} &= \beta(t) \cdot \frac{S(t) I(t)}{N} \\ \frac{dN_{IR}}{dt} &= \mu_{IR} \cdot I(t) \end{aligned} \] To account for seasonality in influenza transmission, we modulate the transmission rate using a sinusoidal function: \[ \beta(t) = \beta_0 \cdot \left(1 + \text{amp} \cdot \sin\left(\frac{2\pi(t + \text{phase})}{52}\right)\right) \]

The model is simulated using Euler’s method. Transition counts are generated stochastically using binomial draws: \[ \begin{aligned} N_{SI}(t + \Delta t) &= N_{SI}(t) + \text{Binomial}\left(S(t), 1 - \exp\left(-\beta(t) \cdot \frac{I(t)}{N} \cdot \Delta t\right)\right) \\ N_{IR}(t + \Delta t) &= N_{IR}(t) + \text{Binomial}\left(I(t), 1 - \exp\left(-\mu_{IR} \cdot \Delta t\right)\right) \end{aligned} \]

The initial state is parameterized by setting a fraction \(\eta\) of the population N as initially susceptible: \[ \begin{aligned} S(0) &= \eta \cdot N \\ I(0) &= 10 \\ R(0) &= (1 - \eta) \cdot N \end{aligned} \]

Reported cases are modeled as a noisy observation of the newly recovered individuals: \[ \text{cases}_{\text{obs}} \sim \text{NegativeBinomial}(\mu = \rho \cdot \text{cases},\ k) \]

This measurement model allows for underreporting (\(\rho\)) and overdispersion (\(k\)), helping the model accommodate real-world fluctuations in reported case counts.

sir_step <- Csnippet("
  double pi = 3.141593;
  double beta_seasonal = Beta * (1 + amp * sin(2 * pi * (t + phase) / 52));
  double dN_SI = rbinom(S, 1 - exp(-beta_seasonal * I / N * dt));
  double dN_IR = rbinom(I, 1 - exp(-mu_IR * dt));
  S -= dN_SI;
  I += dN_SI - dN_IR;
  R += dN_IR;
  cases = dN_IR;
")

rinit_sir <- Csnippet("
  S = nearbyint(eta * N);
  I = 10;
  R = nearbyint((1 - eta) * N);
  cases = 0;
")

dmeas <- Csnippet("
  lik = dnbinom_mu(cases_obs, k, rho * cases + 1e-6, give_log);
")

rmeas <- Csnippet("
  cases_obs = rnbinom_mu(k, rho * cases + 1e-6);
")

sir_pomp <- pomp(
  data = data,
  times = "time",
  t0 = 0,
  rprocess = euler(sir_step, delta.t = 1/7),
  rinit = rinit_sir,
  rmeasure = rmeas,
  dmeasure = dmeas,
  statenames = c("S", "I", "R", "cases"),
  paramnames = c("Beta", "mu_IR", "eta", "amp", "phase", "rho", "k", "N"),
  accumvars = "cases",
  partrans = parameter_trans(
    log = c("Beta", "mu_IR", "k"),
    logit = c("eta", "amp", "rho")
  ),
  params = c(Beta = 2, mu_IR = 0.2, eta = 0.2, amp = 0.2,
             phase = 0, rho = 0.1, k = 10, N = 100000)
)
registerDoParallel(cores = detectCores())
registerDoRNG(1234)

local_fits <- foreach(i = 1:5, .combine = c, .packages = "pomp") %dopar% {
  coef(sir_pomp) <- c(
    Beta = runif(1, 1, 3),
    mu_IR = runif(1, 0.1, 0.3),
    eta = runif(1, 0.1, 0.4),
    amp = runif(1, 0.1, 0.4),
    phase = runif(1, -13, 13),
    rho = runif(1, 0.01, 0.2),
    k = runif(1, 5, 50),
    N = 100000
  )

  mif2(
    sir_pomp,
    Nmif = 100,
    Np = 1000,
    cooling.fraction.50 = 0.5,
    rw.sd = rw_sd(
      Beta = 0.02, mu_IR = 0.02, eta = ivp(0.02),
      amp = 0.01, phase = 0.5,
      rho = 0.02, k = 0.02
    )
  )
}
# Evaluate local search results
local_ll <- foreach(mf = local_fits, .combine = rbind) %dopar% {
  ll_vals <- replicate(20, logLik(pfilter(mf, Np = 2000)))
  pomp::logmeanexp(ll_vals, se = TRUE)
}

print(local_ll)
##                est          se
## result.1 -765.3246 0.013108871
## result.2 -765.1753 0.018272800
## result.3 -765.9662 0.015715937
## result.4 -761.9383 0.008596403
## result.5 -766.6991 0.013875038
## attr(,"rng")
## attr(,"rng")[[1]]
## [1]       10407  -448109862  -965723501  -714383688 -1182650023  1528564038
## [7]  1798437647
## 
## attr(,"rng")[[2]]
## [1]       10407  1676527055   -47175865 -1376448368  1649226989  -706417026
## [7]   817940116
## 
## attr(,"rng")[[3]]
## [1]       10407  -619054052 -1236775269  1757470479  -426323619  -513425190
## [7] -1775510666
## 
## attr(,"rng")[[4]]
## [1]       10407  1082473876  -724750406 -1303696142    48633096  1367296037
## [7]  -341296503
## 
## attr(,"rng")[[5]]
## [1]       10407 -1534799655 -1084703472  1020818566  -727952162   473212099
## [7]  1016833306
## 
## attr(,"doRNG_version")
## [1] "1.7.4"

The local search yielded several comparable log-likelihood estimates, with the best result being -761.92 (result.4), indicating a slightly better fit than the others, which clustered around -765. The small standard errors (all < 0.02) suggest consistent and stable estimates across different initializations.

##          Beta         mu_IR           eta           amp         phase 
##  1.790624e+00  3.659259e-03  4.001746e-01  1.028887e-01 -1.132254e+01 
##           rho             k             N 
##  9.908567e-01  2.225067e-01  1.000000e+05

The final best local fit of the SIR model, with a transmission rate (1.79) and a very low recovery rate (0.00366), suggests prolonged infectious periods that result in high amplitude and frequent epidemic spikes. The amplitude of seasonal forcing (0.10) and a moderate initial susceptibility level (0.40) allow for a reasonable reproduction of seasonal peaks, but the near-perfect reporting rate (0.99) may unrealistically overestimate case visibility. As seen in the Final Best Local Fit plot, the simulated cases overestimate the true peaks and exhibit excessive volatility, especially during non-epidemic periods—indicating poor calibration to real-world underreporting and noise.

#global search space
sir_guesses <- data.frame(
  Beta = runif(20, 1, 3),
  mu_IR = runif(20, 0.1, 0.3),
  eta = runif(20, 0.1, 0.4),
  amp = runif(20, 0.1, 0.4),
  rho = runif(20, 0.01, 0.2),
  phase = runif(20, -13, 13),
  k = runif(20, 5, 50),
  N = 100000
)


#running a global search
global_results_sir <- foreach(i = 1:nrow(sir_guesses), .combine = rbind) %dopar% {
  start_params <- as.list(sir_guesses[i, ])
  mf <- sir_pomp
  coef(mf) <- unlist(start_params)

  mf <- mif2(
    mf,
    Nmif = 50,
    Np = 1000,
    cooling.fraction.50 = 0.5,
    rw.sd = rw_sd(
      Beta = 0.02,
      mu_IR = 0.02,
      eta = ivp(0.02),
      amp = 0.01,
      phase = 0.5,
      rho = 0.02,
      k = 0.02
    )
  )

  mf <- mif2(mf)

  ll_vals <- replicate(5, logLik(pfilter(mf, Np = 2000)))
  ll <- logmeanexp(ll_vals, se = TRUE)

  tibble(
    loglik = ll[1],
    se = ll[2],
    Beta = coef(mf)["Beta"],
    mu_IR = coef(mf)["mu_IR"],
    eta = coef(mf)["eta"],
    amp = coef(mf)["amp"],
    phase = coef(mf)["phase"],
    rho = coef(mf)["rho"],
    k = coef(mf)["k"],
    N = coef(mf)["N"]
  )
}
global_results_df <- as.data.frame(global_results_sir)

#Best fit based on log-likelihood
best_global_fit <- global_results_df %>%
  arrange(desc(loglik)) %>%
  head(1)

print(best_global_fit)
##      loglik        se     Beta       mu_IR       eta        amp     phase
## 1 -761.4179 0.0128743 1.614922 0.003293449 0.4398909 0.07381085 -17.03899
##         rho        k     N
## 1 0.9972932 0.222513 1e+05

In contrast, the best global fit achieves a slightly more conservative transmission estimate (1.61) and recovery rate (0.00329), with similar levels of seasonal modulation (0.074) and a slightly higher initial susceptibility (0.44). While the global simulation (shown in the Best Global Fit plot) captures the general timing of seasonal peaks more subtly than the local fit, it still fails to replicate the magnitude and width of actual outbreaks. Over time, excessive low-level simulated spikes reflect overdispersion and unrealistic epidemic cycling. These limitations in both local and global fits underscore that the basic SIR framework—even when enhanced with seasonal forcing and negative binomial observation noise—is insufficient to model Nova Scotia’s influenza dynamics. This motivates transitioning to more realistic models, such as SEIR or SIRS, incorporating latent exposure periods and immunity waning to better reflect the complex mechanisms driving flu transmission.

SIRS model

This section generalizes the SIR model to the SIRS model, which includes Susceptible (S), Infected (I), Recovered (R), and Reported (H). In terms of this model, the conservation of individuals can be expressed as the following[2]:

\[ S(t) = S(0) - N_{SI}(t) + N_{RS}(t) \] \[ I(t) = I(0) + N_{SI}(t) - N_{IR}(t) \] \[ R(t) = R(0) +N_{IR}(t) - N_{RS}(t) \]

and its counting process can be described with the ordinary differential equations (ODE)[2]:

\[ \frac{dN_{SI}}{dt} = \mu_{SI}(t)S(t) \] \[ \frac{dN_{IR}}{dt} = \mu_{IR}(t)I(t) \] \[ \frac{dN_{RS}}{dt} = \mu_{RS}(t)R(t) \] The ODEs can be solved by Euler’s numerical method. Specifically, the RHS can be expressed as a binomial approximation with exponential transition probability[2]:

\[ N_{SI}(t + \Delta t) \approx N_{SI}(t) + \text{Binomial}\left[S(t), 1 - \exp(-\mu_{SI}(t)\Delta t)\right] \]

\[ N_{IR}(t + \Delta t) \approx N_{IR}(t) + \text{Binomial}\left[I(t), 1 - \exp(-\mu_{IR} \Delta t)\right] \]

\[ N_{RS}(t + \Delta t) \approx N_{RS}(t) + \text{Binomial}\left[R(t), 1 - \exp(-\mu_{RS} \Delta t)\right] \]

We consider \(\mu_{IR}\) as the constant recovery rate and \(\mu_{RS}\) as the reinfection rate. The rate at which individuals move from \(S\) to \(I\) is the force of infection. We consider the infection rate as a time-variable function with periodic \(\beta\) to incorporate seasonality with a period of 52 weeks (a year), where \(a\), \(b\) are average transmission rates for pre-pandemic years and pandemic seasons, respectively:

\[ \mu_{SI}(t) = \beta(t) \frac{I(t)}{N}, \]

\[ \beta(t) = \beta_0(t)(1 + c \cos(2\pi(t + d)/52)) \]

\[ \beta_0(t) = a, \quad \text{in prepandemic years} \]

\[ \beta_0(t) = b = a - \Delta a, \quad \text{during pandemic year} \]

We first define a set of parameters and the SIRS model using the POMP package:

data <- read.csv("Lab-confirmed_Influenza_Cases.csv")
data$Date <- as.Date(data$Week.Ending, format = "%b %d, %Y")


# Filter and create the time variable (in weeks since first date)
data <- data %>%
  filter(Date >= as.Date("2014-09-06") & Date <= as.Date("2019-09-07")) %>%
  arrange(Date) %>%
  mutate(
    time = as.numeric(difftime(Date, min(Date), units = "weeks")),
    cases = Lab.Confirmed.Influenza.Cases
  ) %>%
  select(time, cases)
# SIRS step function (Euler method)

# Step function
sirs_step <- Csnippet("
  double beta, dN_SI, dN_IR, dN_RS;

  if (t < 261) {
    beta = a;
  } else {
    beta = b;
  }

  beta *= (1 + c * cos(2.0 * 3.141592653589793 * (t + d) / 52.0));

  dN_SI = rbinom(S, 1 - exp(-beta * I / N * dt));
  dN_IR = rbinom(I, 1 - exp(-mu_IR * dt));
  dN_RS = rbinom(R, 1 - exp(-mu_RS * dt));

  S -= dN_SI;
  I += dN_SI - dN_IR;
  R += dN_IR - dN_RS;
  S += dN_RS;
  H += dN_IR;
")

# Initial state
sirs_init <- Csnippet("
  double m = N / (S_0 + I_0 + R_0);
  S = nearbyint(m * S_0);
  I = nearbyint(m * I_0);
  R = nearbyint(m * R_0);
  H = 0;
")

# Measurement model
sirs_rmeas <- Csnippet("cases = rpois(rho * H + 1e-6);")
sirs_dmeas <- Csnippet("lik = dpois(cases, rho * H + 1e-6, give_log);")

# Build pomp object using your actual `data`
sirs_pomp <- pomp(
  data = data,
  times = "time",
  t0 = 0,
  rprocess = euler(sirs_step, delta.t = 1/7),
  rinit = sirs_init,
  rmeasure = sirs_rmeas,
  dmeasure = sirs_dmeas,
  accumvars = "H",
  statenames = c("S", "I", "R", "H"),
  paramnames = c("a", "b", "c", "d", "mu_IR", "mu_RS", "N", "S_0", "I_0", "R_0", "rho")
)

The parameters were chosen based on epidemiological data and literature. The simulation was run 500 times to reflect stochasticity. While the model successfully captures the seasonal pattern and general magnitude of flu incidence, there is a noticeable phase offset, suggesting a possible timing mismatch between real-world data and simulated peaks. This comparison implies the need for further refinement on model parameters, such as adjusting transmission or reinfection rates to better align simulated flu cycles with observed data.

Particle Filter and Likelihood for Initial Guess

To proceed with iterative simulations for local and global parameter searches, we apply particle filtering to the SIRS model. According to the resulting plot, it appears that the ESS tends to drop significantly at the start of each seasonal cycle, indicating difficulty in maintaining particle diversity during those periods, but remains reasonably stable otherwise. The initial likelihood estimate is relatively low, suggesting that there is room for improvement in parameter tuning in subsequent steps.

## Log-likelihood estimate: -5557.442
## Standard error: 0.4262953

Local Search

We then run the local search using the mif2 algorithm to optimize the SIRS model fit based on particle filtering. The random walk perturbation is chosen to be 0.01 or 0.02 for the parameters.

##             a             b             c             d         mu_IR 
##  5.324669e+00  6.626007e-01 -1.041969e-01 -2.037835e+00  8.002134e+00 
##         mu_RS             N           S_0           I_0           R_0 
##  7.104311e-02  3.250000e+08  8.109961e-01  9.401535e-03  1.521880e-01 
##           rho 
##  4.332221e-06

The trace plots from running 50 iterations show how each parameter adjusted over the iterations, with the log-likelihood values converging relatively quickly, indicating stable estimates.

The log-likelihood pair plot shows that most parameter combinations have little influence on improving model likelihood, indicating a relatively flat likelihood surface. Parameters such as S_0 and mu_RS seen to have minimal impact, while others show weak nonlinear patterns, which suggest the need for global search to further improve model fit.

The best-fit parameter set achieved a log-likelihood of -2470.286 with a standard error of approximately 7.37. The parameter set with the highest log-likelihood includes increased values for the transmission rate (a ≈ 4.81) and recovery rate (mu_IR ≈ 6.44), indicating faster disease spread and resolution than what we initially assumed. This observation is consistent with the plot, which displays significantly overly high amplitudes in the simulations despite of improved timing match. These findings also encourage to further explore the global parameter space to seek potential improvements beyond the local search.

## # A tibble: 1 × 13
##       a     b      c      d mu_IR  mu_RS          N   S_0     I_0   R_0      rho
##   <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>      <dbl> <dbl>   <dbl> <dbl>    <dbl>
## 1  4.81  1.76 -0.109 -0.247  6.44 0.0767 325000000. 0.801 0.00923 0.157  3.81e-6
## # ℹ 2 more variables: loglik <dbl>, loglik.se <dbl>

Global Search

Before running the global search, we first use the parameter guesses generated from the defined safe box to simulate the SIRS model to ensure numerical stability. The simulation outputs for the S, I, R, H, and reported cases summarized in the table below appear no negative or non-finite values, which is essential for a valid starting point in the following global search procedure.

Through running 50 mif2 chains, the output shows a wide range of log-likelihood values, indicating that some guesses led to much better model fits than others. The best chains achieved much higher log-likelihoods than in the local search phase, suggesting that the global search does help improve the model fit.

This pair plot compares the initial random parameter guesses (grey) against the optimized results from mif2 (red). The concentration and shift of red points indicate how certain parameters, such as a and mu_IR, tend to increase during optimization. The clear separation between guesses and fitted values supports that mif2 successfully refined parameters toward a higher-likelihood.

The plot below compares simulated flu reports to observed data using the best parameter set from the global search. While the simulation captures the first two peaks relatively well, it significantly underestimates later peaks. This failure to capture the true trend across the entire time frame indicates that the SIRS is not the optimal model for this flu data.

SEIRS model

We are also going to fit our SEIRS model. This model is a variation of the SIRS model that incorporates an unobserved latent period and modifies the dynamics so that recovered individuals return to the susceptible class. Such a modification is important because, in the context of influenza, individuals experience a latent period after becoming infected [anderson_may1991]. We now construct our SEIRS model.

SEIRS diagram

SEIRS diagram

Inspired by a previous project[3], we use a different implementation and parameter set for this report.

The number of individuals in each compartment is described by the following system of equations. \[ \begin{align*} S(t) &= S_0 + N_{RS}(t) - N_{SE}(t) \\ E(t) &= E_0 + N_{SE}(t) - N_{EI}(t) \\ I(t) &= I_0 + N_{EI}(t) - N_{IR}(t) \\ R(t) &= R_0 + N_{IR}(t) - N_{RS}(t) \end{align*} \]

The system of ordinary differential equations below governs the dynamics of the counting process; these equations have been adapted from the ODEs shown in the lecture[2]. In our formulation, we allow \(\mu_{SE}\) to vary with time, whereas \(\mu_{EI},\mu_{IR}\) and \(\mu_{RS}\) are held fixed. \[ \begin{align*} \frac{dN_{SE}}{dt} &= \mu_{SE}(t) S(t) \\ \frac{dN_{EI}}{dt} &= \mu_{EI} E(t) \\ \frac{dN_{IR}}{dt} &= \mu_{IR} I(t) \\ \frac{dN_{RS}}{dt} &= \mu_{RS} R(t) \end{align*} \] We can solve the differential equations by using Euler’s method[2]. \[ \begin{align*} N_{SE}(t + \Delta t) &= N_{SE}(t) +\text{Binomial}[S(t), 1 - \exp(-\mu_{SE}(t) \Delta t)] \\ N_{EI}(t + \Delta t) &= N_{EI}(t) + \text{Binomial}[E(t), 1 - \exp(-\mu_{EI} \Delta t)] \\ N_{IR}(t + \Delta t) &= N_{IR}(t) + \text{Binomial}[I(t), 1 - \exp(-\mu_{IR} \Delta t)] \\ N_{RS}(t + \Delta t) &= N_{RS}(t) + \text{Binomial}[R(t), 1 - \exp(-\mu_{RS} \Delta t)] \end{align*} \]

Because the actual population is distributed across different compartments, we introduced new parameters representing the fraction of the population in each compartment, which also require estimation. Our approach mirrors the formulation from the lecture notes[4], as shown in the following equations. \[ \begin{align*} \text{Pop} &= N / (S_0 + E_0 + I_0 + R_0) \\ S &= S_0 \cdot \text{Pop} \\ E &= E_0 \cdot \text{Pop} \\ I &= I_0 \cdot \text{Pop} \\ R &= R_0 \cdot \text{Pop} \end{align*} \]

To keep the model straightforward, we omitted demographic turnover (births and deaths) and held the population fixed at 969400, corresponding to Nova Scotia’s estimated population in 2021[5].

Reffering to the previous project[3], in order to accurately capture the natural seasonal variations (the cycle of 52 weeks) in influenza cases, we adapted our transmission rate parameter. \[ \beta = \beta_0 \cdot (1 + \text{amp} \cdot \sin(2 \pi \cdot (t + \text{phase}) / 52)) \]

In this formulation, \(\beta_0\) denotes the baseline transmission rate, \(\text{amp}\) specifies the amplitude of the seasonal forcing (the sine wave), and \(\text{phase}\) is its phase offset. These three quantities are treated as additional parameters to be estimated from the data. We also updated the measurement component to use a negative binomial distribution, following the lecture notes[2]. To confirm that this SEIRS structure is reasonable for influenza incidence, we conducted several trial simulations with hand-picked parameter values.

##      Beta0        amp      phase      mu_EI      mu_IR      mu_RS          N 
##  2.000e+00  3.000e-01 -4.500e+00  8.000e-01  2.000e+00  1.000e+00  9.694e+05 
##         S0         E0         I0         R0        rho          k 
##  1.000e-01  1.000e-02  1.000e-02  3.000e-01  5.000e-04  1.000e+01

It seems that our modified SEIRS model succeeds to capture the seasonality in the influenza data. Although some peaks are not perfectly overlapped and the original data seem to decay more rapidly than the simulation, we will adopt this set of parameters to carry out our further analysis.

Let us first check that the basic particle filter is working.

##           est            se 
## -1.346771e+03  5.054138e-02

Our simulation with hand-picked parameter values obtains log-likelihood of -1346.7. And in the particle-filter diagnostics, ESS collapses at every start of an influenza cycle, but even the lowest ESS is far above 0.

Now we construct our local search based on the manually-determined parameters.

Local Search

In order to control the growth rates of \(\mu_{EI}\) and \(\mu_{IR}\) (to make \(\mu_{EI}\) and \(\mu_{IR}\) more sensible[6]), we set mu_EI = 0.002, mu_IR = 0.002 in our rw.sd specification[7]. And we also fix cooling.fraction.50=0.5, so that after 50 mif2 iterations, the perturbations are reduced to half their original magnitudes.

From the trace plot, the “mu” rates (including the paramters that constructing \(\mu_{SE}\)) and to a lesser extent \(\rho\) remain unstable, but that is not a main issue since most core parameters and loglik converge nicely. And we reach a log-likelihood of -587.5 in our local search, which is hundreds of log-units higher than the initial guess.

Moreover, we can view a pair-plot of the model parameters, omitting \(S_0, E_0, I_0\) and \(R_0\)– which all converged quickly–to make the plot easier to read.

Global Search

Referring to the previous project[3], we adopted the parameter ranges from its last global search as our own–except that we narrowed the mu_IR range to \((0,5)\) to make it more sensible[6] and also added \(S_0, E_0, I_0, R_0\) and \(k\) to the design matrix for a more thorough global search. We then reduced the sequence size to 100 so that the run can complete within the time limits on greatlakes. The following plots show a pair-plot, parameters with highest log-likelihood, and a simulation with best paramters.

Beta0 amp phase mu_EI mu_IR mu_RS rho k loglik
2.234572 0.5078962 -0.4161336 1.210331 2.233532 0.1266514 0.0011323 2.382173 -586.5958

We find that the highest log-likelihood (-586.6,which is also the highest among the models in this report) in our global search is a little bit higher than the local search (-587.5). And the simulation using the best parameters appears to fit the original data better than the initial simulation. Even though the magnitude of our simulation peak is too large, the peaks overlap, and once the original data has passed its peak and begun to decline, our simulation fits the data more closely.

Furthermore, we also want to check the effective sample size of our final simulation.

Although the ESS still plunges at the start of each influenza cycle, its minimum value remains well above zero, indicating our best parameters fit the data well.

Profile Likelihood over \(\rho\)

For each fixed value of \(\rho\), we maximize the likelihood over all remaining parameters; plotting those maximized values against \(\rho\) produces the profile-likelihood curve.

The following plots and table show a pair-plot, a scatter plot of profile likelihood over \(\rho\) and its 95% confidence interval.

## # A tibble: 1 × 2
##       min     max
##     <dbl>   <dbl>
## 1 0.00177 0.00177

The reporting rate with the highest log-likelihood is pretty close to 0.0018, while its 95% confidence interval is quite narrow and around 0.001769433. It reveals that the only value of \(\rho\) that keeps the log-likelihood within 1.92 log-units of its maximum is approximately 0.1769% and only about 0.1769% of true infections are actually reported in our data–roughly 1-2 reported cases for every 1,000 infections that occur.

And we also see that the reporting rate from our best-fit parameters (0.001132336) lies outside the 95% confidence interval, and that the peak log-likelihood reached in the \(\rho\)-profile is higher than the maximum log-likelihood (–586.6) we obtained in our earlier global search. If we want to take this project further, one option would be to fix \(\rho\) at 0.001769433 and then-in the global search-either widen the parameter bounds or increase the number of starting points (i.e. the size of the random design) in order to get higher log-likelihood[8].

Conclusion

loglik_sir  <- logLik(pfilter(mif_sir, Np = 2000))
loglik_seir <- logLik(pfilter(mif_seirs, Np = 2000))
loglik_sirs <- logLik(pfilter(mif_sirs, Np = 2000))

print(c(SIR = loglik_sir,  SIRS = loglik_sirs,SEIRS = loglik_seir))
##         SIR        SIRS       SEIRS 
##   -761.9695 -19747.2718   -591.7108

Based on the log-likelihood values, the SEIRS model performs best, with the lowest value of -590.46, followed by the SIR model at -761.97. The SIRS model has a much higher value of 19821.71.

Through applying POMP-based compartmental models, we explored the stochastic and seasonal dynamics of influenza transmission in Nova Scotia. While the basic SIR model captured general outbreak patterns, it overestimated peaks and lacked flexibility in representing immunity and exposure delays. The SEIRS model, incorporating a latent period and waning immunity, provided the best fit based on log‑likelihood and visual comparison with observed case data. It more accurately reflected the timing and shape of seasonal outbreaks, though it still faced challenges in reproducing case magnitude due to underreporting and overdispersion. Because the SIRS model lacks a latent period, it performed poorly relative to SEIRS. Overall, the SEIRS model emerged as the most effective framework for simulating influenza dynamics in this setting, demonstrating the utility of POMP modeling in capturing complex epidemic processes and informing disease surveillance strategies.

Previous Project Comparisons

For the SEIRS model, we primarily built upon the framework of Project 5 (2024)[3]. We adopted their seasonal modification of \(\beta\) to capture influenza seasonality better and used the same parameter ranges from their final global search. However, in our rw.sd specification we set mu_EI = 0.002 and mu_IR = 0.002 to control the growth rates of \(\mu_{EI}\) and \(\mu_{IR}\) making these values more epidemiologically plausible[6]. We also restricted the search range for \(\mu_{IR}\) to \((0,5)\) for greater realism[6], and we expanded the design matrix to include the initial state proportions (\(S_0, E_0, I_0, R_0\)) and the dispersion parameter \(k\) for a more comprehensive global search. Finally, we conducted a profile-likelihood analysis over the reporting rate \(\rho\), which was not performed in the prior project.

For the SIRS model, we refer to the structure of Project 20 (2022)[9]. Since we also noticed significant seasonality in earlier analytical work, we retained the seasonality in transmission rate \(\beta(t)\). Rather than using a fixed \(\mu_{RS}\), we estimated it in our local search to better capture the true waning immunity. Additionally, we used full local optimization via particle filtering (mif2) with parameter sets with broader ranges, in contrast to the simple profile likelihood used in the earlier project. We also included the plot “SIRS Simulation vs Observed Data” with the best local and global fits, which provide visualizations of how well the model fits.

References.

1. Nova Scotia, G. of. (2024). Lab-confirmed influenza cases. https://ouvert.canada.ca/data/dataset/b9af5b9c-d2b1-41ce-4dc6-8cbb3ca449dc
2. Ionides, E. L. (2025). Chapter 12: Introduction to simulation-based inference for epidemiological dynamics via the pomp r package. https://ionides.github.io/531w25/12/index.html.
3. Modeling flu cases in oklahoma. (2024). https://ionides.github.io/531w24/final_project/project05/blinded.html
4. King, A., & Ionides, E. (2024). Notes for STATS/DATASCI 531, Modeling and Analysis of Time Series Data: Chapter 18. https://kingaa.github.io/sbied/measles/notes.pdf
5. Canada, S. (2021). Statistics canada: 2021 census of population. https://www12.statcan.gc.ca/census-recensement/2021/dp-pd/prof/details/page.cfm?Lang=E&DGUIDlist=2021A000212&GENDERlist=1&STATISTIClist=1&HEADERlist=0
6. Anderson, R. M., & May, R. M. (1991). Infectious diseases of humans. Oxford University Press.
7. OpenAI. (2025). chatGPT. Note: chatGPT was asked "how to control the growth rates of mu_EI and mu_IR". https://chatgpt.com/
8. OpenAI. (2025). chatGPT. Note: chatGPT was asked "why the peak log-likelihood reached in the rho-profile is higher than the maximum log-likelihood we obtained in our earlier global search". https://chatgpt.com/
9. Ionides, E. et al. (2022). Final project 20: Blinded analysis. https://ionides.github.io/531w22/final_project/project20/blinded.html#43_Particle_filter_and_likelihood_for_intial_guess.