Introduction

Dengue fever, a mosquito-borne viral disease, continues to pose a public health concern amid rising international mobility and climate shifts that expand vector habitats (1). While the United States reports few locally transmitted cases, imported infections remain persistent due to global travel, presenting ongoing health risk to our communities.

This analysis leverages weekly reported data from the denguedatahub package in R. Using this time series data that spans 53 weeks per year with no missing observations, we fit both compartmental models — SIRS and SEIR — to explore the disease’s temporal dynamics. These models allow us to simulate susceptible–infected–recovered trajectories and assess how latent exposure (as introduced in SEIR) alters the transmission dynamics. To benchmark performance and guard against model misspeciification or parameter noncovergence, we fit a statistical SARIMA model on the same data. This method provides a grounded basis for evaluating epidemiological interpretability relative to empirical forecasting accuracy.

Data

The data provides travel-associated dengue case counts in the U.S. and territories from 2010 to 2023. For our analysis, we specifically focus on the most recent two-year period, from 2022 to 2023 (106 weeks), to ensure alignment with current travel patterns post-pandemic. This subset offers a dense, high-frequency time series with consistent weekly reporting, making it well-suited for evaluating dynamic models over a complete annual cycle.

The time series plot of weekly reported dengue cases from 2022 to 2023 exhibits a clear seasonal pattern. Case counts remain low through the winter and spring months (Weeks 1–25), begin to rise sharply in early summer (around Weeks 26–30), and reach their peak between Weeks 35 and 45. A rapid decline follows toward the end of the year. This seasonal pattern aligns with increased international travels during the summer and with the expected seasonal dynamics of vector-borne diseases, which thrives in warmer and rainy climates.

Explanatory Data Analysis (EDA)

We plot three autocorrelation function (ACF) at different lags, 25,53, and 104, to understand the short-term and/or long-term dependency. The acf plots show a sinusoidal pattern. The oscillating pattern displayed in the plots supports that the data is non-stationary. The lack of significant partial autocorrelation at higher lags suggests that higher-order AR terms are unnecessary.

When the ACF is extended to a lag of 53, a dip into negative autocorrelation between lags 20 and 40 is followed by an increase near lag 53. This pattern suggests a seasonal structure with a periodicity of one year, consistent with 53 weeks in the dataset. At a lag maximum of 104, the ACF exhibits a repeating wave pattern every 50–55 lags, reinforcing the presence of strong annual seasonality. The gradually damped oscillations are characteristic of a seasonal ARMA process, supporting the use of a SARIMA model to capture both short-term and seasonal dependencies.

With this evidence, we proceed to model selection by comparing AIC values across a grid of candidate SARIMA models.

Since we have evidence of annual seasonality, we consider SARIMA models with period = 53.

\[ \phi(B)\Phi(B^{53}) \left[ (1 - B)^d (1 - B^{53})^D Y_n - \mu \right] = \psi(B)\Psi(B^{53})\varepsilon_n \]

The NA values are likely due to optimization problem.

We also compare models with only seasonal AR(1) and only seasonal MA(1).

By comparing the AIC values and model complexity, the most appropriate model is:

\[ \text{SARIMA}(2, 0, 0) \times (0, 0, 1)_{[53]} \]

\[ \begin{aligned} (1 - \phi_1 B - \phi_2 B^2) \left[ Y_n - \mu \right] = (1 + \Theta_1 B^{53}) \varepsilon_n\\ \varepsilon_n &\sim \text{white noise}, \\ \mu &\text{ is the mean of the stationary process } Y_n. \end{aligned} \]

Based on this model, we fit our benchmark model.

SARIMA Model Fitting

## 
## Call:
## arima(x = ts_df, order = c(2, 0, 0), seasonal = list(order = c(0, 0, 1), period = 53))
## 
## Coefficients:
##          ar1     ar2    sma1  intercept
##       0.5301  0.3737  0.7898    49.0243
## s.e.  0.1087  0.1060  0.8720    19.1205
## 
## sigma^2 estimated as 179:  log likelihood = -444.63,  aic = 899.26

It is stable, well-identified, and provides the best fit among all candidate models based on the lowest AIC. To confirm, we manually calculate AR and MA roots.

## AR Roots: 1.073792-5.169879e-26i -2.492376+5.169879e-26i
## SMA Roots: 1.26612+0i
## Absolute values of AR roots: 1.073792 2.492376
## Absolute values of SMA roots: 1.26612

All AR roots lie outside the unit circle, indicating that the fitted SARIMA model is stationary and causal. While one AR root is relatively close to the threshold, it still satisfies the condition for stationarity. To confirm model adequacy, we look at the residuals to examine any signs of autocorrelation or model misfit.

Residual Analysis

The residuals from the SARIMA model appear to be white noise overall. They are centered around zero, with consistent variance throughout most of the series. The histogram of residuals shows an approximately symmetric, bell-shaped distribution with a mild left skew. The Q-Q plot further supports approximate normality, as most residuals fall along the theoretical line. Slight deviations in the tails, particularly on the left, are consistent with the observed outliers in the time series plot.

Taken together, these diagnostics indicate that model assumptions are reasonably satisfied. The residuals appear stationary, show no strong autocorrelation or skewness, and approximate normality holds aside from a few moderate outliers.

Given its ability to capture both short-term and seasonal dynamics, the SARIMA(2,0,0)(0,0,1)[53] model is able to serve as a reasonable benchmark model (with a log likelihood of approximately -445).

SIRS Model

We will first present a the classical SIRS (Susceptible–Infected–Recovered–Susceptible) model from the epidemiological lecture presented in class (3). The idea of using this model comes from final project 20 in 2022 (4). The model accounts for temporary immunity by allowing recovered individuals to return to the susceptible compartment. We state the governing ordinary differential equations, derive the disease‑free equilibrium and the basic reproduction number \(R_0\), and compute the endemic equilibrium.

The SIRS model extends the classical SIR framework in the lecture by incorporating temporary immunity: recovered individuals lose immunity at rate \(\xi\) and re‑enter the susceptible pool (5). This is motivated by infections such as influenza, where immunity wanes over time (6).

Model Formulation

Let the total population \(N\) be constant and partitioned into three compartments: \[ S(t),\quad I(t),\quad R(t), \qquad \text{s.t.} \quad S(t) + I(t) + R(t) = N. \] Define parameters:

  • \(\beta\): transmission rate
  • \(\gamma\): recovery rate
  • \(\xi\): rate of loss of immunity

Under the mass‑action assumption, new infections occur at rate \(\frac{\beta SI}{N}\). Recovered individuals lose immunity and return to \(S\) at rate \(\xi R\). The system of ordinary differential equations is:

\[ \begin{aligned} \frac{dS}{dt} &= \xi\,R \;-\;\frac{\beta\,S\,I}{N},\\[6pt] \frac{dI}{dt} &= \frac{\beta\,S\,I}{N} \;-\;\gamma\,I,\\[6pt] \frac{dR}{dt} &= \gamma\,I \;-\;\xi\,R. \end{aligned} \]

Disease‑Free Equilibrium and \(R_0\)

Set \(I=0\) and steady state (\(dS/dt = dR/dt = 0\)). The disease‑free equilibrium is:

\[ E_0:\quad (S^*, I^*, R^*) = (N, 0, 0). \]

Then we can linearize around \(E_0\) or apply the Next Generation Matrix method which yields the basic reproduction number, which is:

\[ R_0 \;=\;\frac{\beta}{\gamma}. \] If \(R_0 < 1\), the DFE is locally asymptotically stable and an outbreak cannot occur.
If \(R_0 > 1\), the disease can invade and produce an epidemic.

Endemic Equilibrium

When \(R_0 > 1\), there is a non‑trivial steady state \(E^*=(S^*,I^*,R^*)\). Then, solving \(\frac{dS}{dt}=\frac{dI}{dt}=\frac{dR}{dt}=0\) under \(I^*>0\) gives:

\[ \begin{aligned} S^* &= \frac{\gamma}{\beta}\,N,\\[6pt] I^* &= \frac{\xi}{\gamma}\,R^*,\\[6pt] R^* &= \frac{\gamma}{\xi}\,\bigl(R_0 - 1\bigr)\,N. \end{aligned} \] Hence the endemic prevalence \(I^*/N\) increases with the immunity‑loss rate \(\xi\) and with \(R_0-1\).

Model Construction

We include both seasonality and a “pandemic switch” at week 29. Specifically, this means that we allow case-based transmission; transmission that depends on if we are looking at before or after week 29:

\[ \beta_0 = \begin{cases} b, & \text{if } t > \text{pandemic week} \\ a, & \text{otherwise} \end{cases} \]

Then, the full transmission rate is given by:

\[ \beta(t) = \beta_0 \left(1 + c \cdot \sin\left(\frac{2\pi(t + d)}{52}\right)\right) \] where \(c\) is the amplitude of seasonality and \(d\) is the phase shift (in weeks); the week of each year corresponding to a peak in transmission (with a sign component).

We now choose a plausible initial parameter vector based on the knowledge of dengue dynamics and then run a quick set of simulations to verify that our SIRS model reproduces the rough timing and magnitude of the observed case counts (8).

The chosen values for a and b reflect a modest drop in transmission after the “pandemic” threshold at week 29. We set a >b because we noticed that the second peak after week 29 is larger than the first peak. Also, by testing, we found that when a, b < 1, we had bad performance for the model’s placement of peaks and valleys. The seasonal sine term with amplitude \(c=0.4\) and phase shift \(d=–25\) aligns our simulated peaks roughly with observed peaks. Recovery and immunity‐recovery rates (\(mu_IR=0.8\),\(mu_RS=0.25\)) correspond to an average duration of 1.25 weeks and 4 weeks, respectively (9). Finally, we fixed \(N\) and \(\rho\) based on surveillance coverage; all others will be estimated.

From the plot, we can see a moderately good performance, but this can be improved. We will do more objective measures such as a local search and global search.

We leverage particle filtering to check the health of the filter via its effective sample size (ESS).

The top panel shows the observed dengue reports over time. The middle panel plots the effective sample size (ESS) at each week — we see that ESS occasionally dips, especially at the start of each seasonal cycle, but generally remains very close to the total particle count, indicating adequate particle diversity. The bottom panel displays the conditional log‐likelihood of each data point given the filter’s history. After initial transients (log‐lik as low as –20 per week), the conditional log‐lik stabilizes around –10 to –5, showing that the filter begins to track the data more reliably as time progresses.

SEIR Model

We also implemented a stochastic SEIR model to describe the transmission dynamics of dengue fever in a fixed population. A diagram of this type of compartmental model is displayed below (10).

Model Formulation

The population is divided into four latent and one observed compartment(s):

  • \(S(t)\): number of susceptible individuals (latent)
  • \(E(t)\): number of exposed individuals (latent)
  • \(I(t)\): number of infectious individuals (latent)
  • \(R(t)\): number of recovered individuals (latent)
  • \(C(t)\): number of reported cases (observed)

We state the following as assumptions of our model:

  • Susceptible individuals become exposed at a time-varying transmission rate \(\beta(t)\) governed by the force of infection. Transmission is controlled by seasonal variation in the force of infection in order to capture the patterns found concerning the connection between the climate (specifically, rainfall) and increased mosquito abundance which leads to a higher chance of contracting dengue fever (11).
  • Exposed individuals become infectious at rate \(\mu_{EI}\).
  • Infectious individuals recover at rate \(\mu_{IR}\).
  • Cases are reported at rate \(\rho\).
  • Individuals who recover gain full immunity and can not be re-infected.

The force of infection is defined as (13):

\[ \mu_{SE}(t) = \frac{\beta(t)}{N} I(t) \]

where \(N\) is the total population size (in this case, the population size of the U.S. states and territories) and \(\beta(t)\) is the seasonal transmission rate, modeled as:

\[ \beta(t) = \beta_0 \left[ 1 + a \cdot \cos\left( \frac{2\pi(t - \phi)}{T} \right) \right] \]

with:

  • \(\beta_0\): baseline transmission rate
  • \(a \in [0, 1)\): amplitude of seasonality
  • \(\phi\): phase shift (in weeks); the week of each year corresponding to a peak in transmission
  • \(T = 52\): period of 1 year
  • \(t\): time (in weeks)

This cosine function introduces periodic variation in \(\beta(t)\), leading to annual peaks in transmission. The phase parameter \(\phi\) determines the week of peak transmissibility, which we expect should correspond to the week of the year associated with high precipitation levels.

Additionally, the model is implemented stochastically using binomial transitions where (5):

  • \(\Delta N_{SE} \sim \text{Binomial}(S, 1 - e^{-\beta I/N \Delta t})\)
  • \(\Delta N_{EI} \sim \text{Binomial}(E, 1 - e^{-\mu_{EI} \Delta t})\)
  • \(\Delta N_{IR} \sim \text{Binomial}(I, 1 - e^{-\mu_{IR} \Delta t})\)
  • \(S_{t+\Delta t} = S_t - \Delta N_{SE}\)
  • \(E_{t+\Delta t} = E_t - \Delta N_{EI} + \Delta N_{SE}\)
  • \(I_{t+\Delta t} = I_t - \Delta N_{IR} + \Delta N_{EI}\)
  • \(R_{t+\Delta t} = R_t + \Delta N_{IR}\)

To start our initial experimentation, we propose the following parameter values: \(\beta = 1.5, mu_{EI} = 1/4, mu_{IR} = 1/9, \textit{rho} = 0.9, a = 0.7, \phi = 25, N = 3200000, \eta = 0.1, k = 10\) using the information provided by the World Health Organization on the incubation and recovery period (15).

The simulations suggest that this is a reasonable choice to start, as it shows promising signs of the desired seasonality. To better visualize the shape of the simulations, we have included a plot below which visualizes the original data along with the median of the 20 simulations (14).

As we can see, while the size of the peaks for the simulations’ median are lower than the original data, the timing of the peaks is relatively accurate. Now, using these initial parameter values, we performed a local search (12).

Local Search

The log likelihood converged to a maximum of -447.83 when using the following parameters:

## # A tibble: 1 × 13
##    Beta     a   phi mu_EI mu_IR       N   eta   rho     k loglik loglik.se    Np
##   <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>  <dbl>     <dbl> <dbl>
## 1  5.91 0.349  25.3 0.601 0.684  3.20e6 0.117 0.743  10.0  -448.    0.0474  2000
## # ℹ 1 more variable: nfilt <dbl>

Using these parameters, we can run simulations to see how well it allows the model to fit the data.

Looking at the plots, we can see that it does well to incorporate the seasonality and the the magnitude of the peaks is more accurate.

Global Search

We move on to conduct a global search for optimal parameters by using 200 random initial parameter sets (12). The box of initial parameter values was: \(\beta \in (1,80)\), \(\rho \in (0.1, 1)\), \(\eta \in (0.01,1)\), \(mu_{EI} \in (0.1, 6)\), \(mu_{IR} \in (0.1, 6)\)

## # A tibble: 1 × 11
##    Beta   rho   eta mu_EI mu_IR     a   phi        N     k loglik loglik.se
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>  <dbl>     <dbl>
## 1  8.05 0.287 0.881 0.988  35.6 0.135  25.8 3200000.  10.0  -447.    0.0435

We achieved a max log likelihood of -446.79 using the parameters above which were found when conducting our global search. However, this is a very small increase in the loglikelihood compared to our local search. Once again, we present simulations of the model using these parameters.

Overall, we can see that the model fits the data relatively well using these parameters. So, the local and global search has allowed us to determine parameter values for the proposed SEIR model which maximize the log likelihood and allows for a seemingly nice fit for the data (in terms of recognizing seasonal patterns and the scale of the changing transmission dynamics). This is also emphasized by the fact the loglikelihood of the models found through the local and global search (-447.83 and -446.79, respectively) were very close to the benchmark SARIMA model which had a loglikelihood of approximately -445.

Conclusion

In this project, we thoroughly investigated possible modeling techniques for data on dengue fever in the U.S. states and territories in 2022-2023. We began by constructing a SARIMA model to fit the data. During model comparison using AIC, we choose the SARIMA(2,0,0)×(0,0,1)[53] model, which has a log likelihood of approximately -445. This was used as a benchmark for our SIRS and SEIR models. We developed an SIRS model that incorporateed both seasonality and a pandemic‐phase switch when fitting. After a search of parameters to maximize the loglikelihood, this model achieved a loglikelihood of -440. We also consturcted an SEIR model, estimating parameters via particle filtering coupled with the MIF2 algorithm. Through local search and a subsequent global search over 2,000 particle‐filtering replicates, the model’s log‐likelihood was around -446.79. The log likelihoods of both the SIRS and SEIR model were close to the baseline of the SARIMA model, which is a desirable feature. Both od the fitted models seem to accurately reproduce the seasonal patterns and magnitudes of dengue case peaks and troughs observed in the data. By incorporating both seasonality and dynamic transmission parameters, theses approaches appear to capture realistic epidemic waves without overfitting.

References

[1] World Health Organization. Dengue and severe dengue. Retrieved from https://www.who.int/news-room/fact-sheets/detail/dengue-and-severe-dengue (Accessed April 22, 2025).

[2] aic_table_seasonal_ar, aic_table_seasonal_ma functions were generated with help of chatGPT.

[3] Chengjun Sun, Wei Yang, Global results for an SIRS model with vaccination and isolation, Nonlinear Analysis: Real World Applications, Volume 11, Issue 5, 2010, Pages 4223-4237, https://doi.org/10.1016/j.nonrwa.2010.05.009.

[4] Stats531 Final Project - Flu Cases Study, 2022, https://ionides.github.io/531w22/final_project/project20/blinded.html

[5] Edward L. Ionides, DATASCI/STATS 531. Chapter 13: Simulation of stochastic dynamic models https://ionides.github.io/531w25/13/index.html

[6] Herbert W. Hethcote, The Mathematics of Infectious Diseases, SIAM REVIEW Vol. 42, No. 4, 2000, pp. 599–653, https://math.uchicago.edu/~shmuel/Modeling/hethcote2000%20Mathematics%20of%20Infectious%20diseases.pdf

[7] Edward L. Ionides, DATASCI/STATS 531. Chapter 14: Likelihood for POMP models: Theory and practice https://ionides.github.io/531w25/14/index.html

[8] Liliana Sánchez-González, Laura Adams, Gabriela Paz-Bailey, Dengue, CDC Yellow Book 2024, Travel-Associated Infections & Diseases, https://wwwnc.cdc.gov/travel/yellowbook/2024/infections-diseases/dengue

[9] Xavier Rodó, Better Understanding Dengue Dynamics: How Long Does Immunity Last, ISGlobal, 2023, https://www.isglobal.org/en/healthisglobal/-/custom-blog-portlet/immunity-dengue-might-not-be-long-lasting

[10] “Study of daily COVID-19 Infected cases in the United States.” 2021 Final Project. url: https://ionides.github.io/531w21/final_project/project02/blinded.html

[11] Abdullah, Nur Athen Mohd Hardy et al. “The association between dengue case and climate: A systematic review and meta-analysis.” One health (Amsterdam, Netherlands) vol. 15 100452. 31 Oct. 2022, doi:10.1016/j.onehlt.2022.100452

[12] Edward L. Ionides, DATASCI/STATS 531. Chapter 15: Likelihood maximization for POMP models https://kingaa.github.io/sbied/mif/slides.pdf

[13] Edward L. Ionides, DATASCI/STATS 531. Chapter 18: A case study of measles: Dynamics revealed in long time series https://kingaa.github.io/sbied/measles/slides.pdf

[14] “Analysis of Middle-East Respiratory Syndrome coronavirus in Saudi Arabia.” 2024 Final Project. url: https://ionides.github.io/531w24/final_project/project15/blinded.html

[15] “Dengue and Severe Dengue.” World Health Organization, World Health Organization, www.who.int/news-room/questions-and-answers/item/dengue-and-severe-dengue.