Ireland Covid Cases

COVID-19 Case Trends in Ireland: Vaccine Impact and Seasonal Behavior

Covid-19 Testing Site at an Airport (Burns, 2020)

Introduction

The COVID-19 pandemic has highlighted the critical need for effective public health strategies to manage infectious diseases, both in the short term and for future resilience. As countries, including Ireland, continue to respond to the ongoing effects of the pandemic, understanding trends in COVID-19 case data is essential for guiding public health interventions and long-term policy development. This report focuses on two key aspects of COVID-19 dynamics in Ireland: the seasonal behavior of the virus, with separate analyses for the pre- and post-peak periods, and the impact of vaccine rollouts on case trends.

Analyzing the seasonal behavior of COVID-19 provides valuable insights into the virus’s patterns and potential periodic outbreaks, which can help inform anticipatory strategies for future waves. Following this, the examination of vaccine rollouts offers a closer look at their effectiveness in curbing case numbers and guiding policy decisions aimed at mitigating the pandemic’s impact. Together, these analyses inform both immediate response strategies and long-term planning in the management of infectious diseases.

Data preparation and EDA

This study utilizes the COVID-19 SDU Acute Hospital Time Series Summary dataset, which provides a comprehensive range of COVID-19 related indicators for acute hospitals in Ireland. The dataset includes information on confirmed COVID-19 cases, new admissions, and discharges across 29 acute hospitals, offering valuable insights into the pandemic’s impact on hospital services.

The data spans from mid-March 2020, capturing the progression of the pandemic and its effects on hospital operations over time. This temporal coverage allows for an in-depth analysis of trends, seasonal variations, and the impact of interventions such as vaccination rollouts. The dataset is publicly accessible through Ireland’s COVID-19 Data Hub, a resource managed by the Health Service Executive (HSE). This platform consolidates various COVID-19 related data, facilitating transparency and supporting research efforts.

Based on the dataset from the COVID-19 SDU Acute Hospital Time Series Summary, an initial exploration of the data reveals significant variations in the number of confirmed COVID-19 cases over time. The time series plot illustrates notable spikes in case numbers, particularly during early 2021, reflecting the impact of successive waves of the virus. The histogram of daily case counts further highlights the distribution of the data, with the majority of daily counts concentrated below 500 but with a few extreme outliers, where cases surge above 2,000. This suggests a right-skewed distribution, with occasional outbreaks leading to sharp increases in confirmed cases.

Building on the initial analysis, we further investigate the temporal dependencies in the COVID-19 case data using the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. The ACF plot shows a gradual decay, indicating long-range dependence in the data, typical of time series with trends or seasonality. The PACF plot reveals a significant spike at lag 1, suggesting an autoregressive component, with the influence of earlier lags diminishing. These patterns highlight the need for advanced models like ARIMA or SARIMA to capture both short-term dependencies and long-term trends in the data.
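As a minimal sketch of how such plots are read, the R code below simulates an AR(1) series and computes its sample ACF and PACF with base R's `acf()` and `pacf()`. The series `x_sim` and its coefficient of 0.8 are illustrative assumptions, not the hospital data.

```r
# Illustrative only: a simulated AR(1) series stands in for the case counts.
set.seed(531)
x_sim <- arima.sim(model = list(ar = 0.8), n = 1000)

# plot = FALSE returns the estimates instead of drawing the correlogram.
acf_est  <- acf(x_sim, lag.max = 30, plot = FALSE)
pacf_est <- pacf(x_sim, lag.max = 30, plot = FALSE)

# Approximate 95% bound for a white-noise series of this length.
bound <- qnorm(0.975) / sqrt(length(x_sim))

# acf() stores lag 0 first; pacf() starts at lag 1.
acf_lag1  <- acf_est$acf[2]
pacf_lag1 <- pacf_est$acf[1]
pacf_sig  <- which(abs(pacf_est$acf) > bound)  # lags outside the bound

print(round(c(acf_lag1 = acf_lag1, pacf_lag1 = pacf_lag1, bound = bound), 3))
print(pacf_sig)
```

For an autoregressive series like this one, the ACF decays gradually while the PACF is dominated by the spike at lag 1, which is the same qualitative pattern described for the case data.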

## 
##  Augmented Dickey-Fuller Test
## 
## data:  covid$Num_cases
## Dickey-Fuller = -4.9882, Lag order = 12, p-value = 0.01
## alternative hypothesis: stationary

The Augmented Dickey-Fuller (ADF) test rejects the null hypothesis of a unit root (p-value = 0.01), which suggests that the COVID-19 case series is stationary in the unit-root sense. The test does not, however, rule out other departures from stationarity, and the seasonal decomposition of the time series offers a more nuanced view of the data: it still reveals a notable trend component, pointing to underlying long-term patterns that the test does not capture.
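The sketch below shows how an ADF test of this kind can be run, assuming the output above came from `adf.test()` in the tseries package (an inference from the output format, not something the report states). Both series are simulated for illustration. Note that `adf.test()` interpolates its p-value from a table, so a printed value of 0.01 means "0.01 or smaller".

```r
# Illustrative only: simulated series, not the hospital data.
set.seed(531)
stationary_sim  <- arima.sim(model = list(ar = 0.5), n = 500)  # no unit root
random_walk_sim <- cumsum(rnorm(500))                          # has a unit root

if (requireNamespace("tseries", quietly = TRUE)) {
  # adf.test() warns when the p-value falls outside its table range,
  # so the warnings are suppressed in this sketch.
  adf_stationary  <- suppressWarnings(tseries::adf.test(stationary_sim))
  adf_random_walk <- suppressWarnings(tseries::adf.test(random_walk_sim))
  print(c(stationary  = adf_stationary$p.value,
          random_walk = adf_random_walk$p.value))
} else {
  message("Package 'tseries' is not installed; skipping the ADF sketch.")
}
```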

The seasonal decomposition breaks down the data into observed, trend, seasonal, and random components. The trend component shows a significant rise in case numbers through early 2022, followed by a sharp decline. This pattern likely reflects an initial period of exponential growth in cases, driven by the spread of the virus and more contagious variants, followed by a reduction as vaccination campaigns and public health measures took effect. The seasonal component shows recurring fluctuations, with consistent peaks during winter months, indicating a strong seasonal effect, potentially linked to environmental factors such as colder weather or increased indoor gatherings. The random component captures unexplained variability, highlighting irregular spikes in case numbers, which could be attributed to sudden outbreaks, reporting changes, or shifts in pandemic dynamics.

These insights emphasize the complex interplay between long-term trends, seasonal cycles, and random fluctuations, underscoring the need for sophisticated modeling techniques that can account for both predictable patterns and unexpected disruptions in the data.

Seasonal Analysis

From experience, we know that respiratory illnesses such as influenza follow a clear seasonal pattern, peaking in the winter months and declining over the spring and summer (CDC 2024). A natural question is whether COVID-19 also follows a seasonal pattern and whether that pattern differs between the pandemic years and the post-pandemic period.

Specifically, we split the dataset at February 13, 2023, with points before this date forming our pandemic time frame and points after it forming our post-pandemic time frame. Although the end of the pandemic was officially announced in May 2023, we set the split in February because we need a minimum of 730 data points (two full years of daily observations) to run the seasonal decomposition on the post-pandemic time frame. The plot above shows where this split falls in our time series data.

The sharp peaks in the plot suggest some seasonal cycle in both time frames. The plot also hints that the seasonal pattern has shifted between the two time frames, with the post-pandemic series looking more stable. The ACF and PACF plots give a clearer picture.

Although it is hard to tell from the figure above, the PACF plots suggest that every 7th lag is significant during the pandemic, while roughly the 12th lag is significant in the post-pandemic era. This might indicate a weekly seasonal pattern during the pandemic and a longer, roughly biweekly pattern post-pandemic. However, the post-pandemic lags are not nearly as significant as their pandemic counterparts and could be due to random chance alone rather than a signal of seasonality.

Going one step further, we look at the seasonal decomposition of these two time frames using the decompose function in R as another sanity check for a seasonal presence in the data.

In the decomposition plots, the seasonal component shows a clear cyclical pattern in both cases. The trend components also match our intuition: the trend increases during the pandemic and declines post-pandemic.
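A minimal sketch of the split and decomposition follows, assuming a yearly period of 365 days (which is what the 730-point minimum implies). The data are simulated, and the data frame `covid_sim`, its columns `date` and `cases`, and the series end date are hypothetical stand-ins.

```r
# Illustrative only: simulated daily counts with a yearly cycle stand in for
# the hospital series; column names and the end date are hypothetical.
set.seed(531)
dates <- seq(as.Date("2020-03-15"), as.Date("2025-02-12"), by = "day")
cases <- 300 + 150 * cos(2 * pi * as.numeric(dates) / 365) +
  rnorm(length(dates), sd = 40)
covid_sim <- data.frame(date = dates, cases = cases)

# Split at the date used in the report.
split_date <- as.Date("2023-02-13")
pandemic_sim      <- covid_sim[covid_sim$date <  split_date, ]
post_pandemic_sim <- covid_sim[covid_sim$date >= split_date, ]

# decompose() needs at least two full periods: 2 * 365 = 730 daily points.
n_post   <- nrow(post_pandemic_sim)
post_ts  <- ts(post_pandemic_sim$cases, frequency = 365)
post_dec <- decompose(post_ts)

# One observation fewer than two periods is rejected by decompose().
too_short    <- ts(post_pandemic_sim$cases[seq_len(729)], frequency = 365)
short_result <- tryCatch(decompose(too_short),
                         error = function(e) conditionMessage(e))

print(n_post)
print(names(post_dec))
print(short_result)
```

The `seasonal`, `trend`, and `random` elements of the result correspond to the panels of the decomposition plot.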

Using our findings from the PACF plots above, we can try a seasonal difference on our data at these two lags (7 for the pandemic period and 12 for the post-pandemic period).
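A seasonal difference subtracts from each observation the value one period earlier. The sketch below illustrates this with base R's `diff()` on a simulated series with an exact 7-day pattern; the data and the name `weekly_pattern` are assumptions for illustration. The post-pandemic case would use `lag = 12` in the same call.

```r
# Illustrative only: a simulated series with an exact 7-day pattern plus noise.
set.seed(531)
weekly_pattern <- c(10, 12, 15, 14, 13, 6, 4)
x_sim <- rep(weekly_pattern, times = 52) + rnorm(7 * 52, sd = 0.5)

# Seasonal difference at lag 7: each value minus the value one week earlier.
x_diff7 <- diff(x_sim, lag = 7)

# The differenced series is 7 observations shorter, and the weekly pattern
# cancels, leaving only the noise.
print(length(x_sim) - length(x_diff7))
print(round(c(sd_original = sd(x_sim), sd_differenced = sd(x_diff7)), 2))
```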

We use a spectral analysis to further check the change in seasonality between these two time frames.

Table 1.1
Pan   Post.Pan
360   250

Table 1.1 shows that the strongest frequency does not lie at \(\frac{1}{7}\) or \(\frac{1}{12}\) for the pandemic and post-pandemic series, respectively. These frequencies do, however, coincide with the second peak in each spectral plot, as highlighted by the red vertical line. This further suggests that these lags are relevant when analyzing the two series.
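As a sketch of how a dominant frequency can be read off a periodogram, the code below uses base R's `spec.pgram()` on a simulated series with a 7-day cycle. The data are illustrative assumptions, and the report's own spectral plots may have used different smoothing settings.

```r
# Illustrative only: simulated daily series with a 7-day cycle plus noise.
set.seed(531)
n <- 700
t_idx <- seq_len(n)
x_sim <- 5 * sin(2 * pi * t_idx / 7) + rnorm(n)

# Raw periodogram; plot = FALSE returns frequencies (cycles per day) and power.
pgram <- spec.pgram(x_sim, taper = 0, detrend = TRUE, plot = FALSE)

# Frequency with the largest power and the corresponding period in days.
peak_freq   <- pgram$freq[which.max(pgram$spec)]
peak_period <- 1 / peak_freq

print(round(c(peak_freq = peak_freq, peak_period = peak_period), 3))
```

A peak near frequency 1/7 corresponds to a period of about 7 days. Secondary peaks can be located the same way after masking out the strongest one.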

Model Selection

Since a seasonal pattern appears to be present in the data, we look for the best SARMA model for each time frame using a grid search based on AIC.

A SARMA model is created by adding an additional seasonal term to the ARMA model. This is defined as the following:

\[ SARMA(p,q) \times (P,Q)_{m} \]

The subscript \(m\) specifies the period at which the seasonal terms are applied. Since the PACF plot for the pandemic period in Figure 1.2 showed that every 7th lag is significant, we set \(m = 7\), which gives a weekly seasonal polynomial:

\[ SARMA(p,q) \times (P,Q)_{7} \]

\[ \phi(B)\Phi(B^{7})(Y_{n}-\mu)=\psi(B)\Psi(B^{7})\epsilon_{n} \]

with \(\epsilon_{n}\) being white noise and

\[ \mu = \mathbb{E}[Y_{n}] \]

\[ \phi(x) = 1-\phi_{1}x - \cdots - \phi_{p}x^{p} \]

\[ \psi(x) = 1 + \psi_{1}x + \cdots + \psi_{q}x^{q} \]

\[ \Phi(x) = 1-\Phi_{1}x - \cdots - \Phi_{P}x^{P} \]

\[ \Psi(x) = 1 + \Psi_{1}x + \cdots + \Psi_{Q}x^{Q} \]
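For concreteness, consider the case \(p = q = P = Q = 1\) with \(m = 7\) (orders chosen here only for illustration). The model equation becomes

\[ (1-\phi_{1}B)(1-\Phi_{1}B^{7})(Y_{n}-\mu) = (1+\psi_{1}B)(1+\Psi_{1}B^{7})\epsilon_{n} \]

and multiplying out the polynomials gives

\[ Y_{n}-\mu = \phi_{1}(Y_{n-1}-\mu) + \Phi_{1}(Y_{n-7}-\mu) - \phi_{1}\Phi_{1}(Y_{n-8}-\mu) + \epsilon_{n} + \psi_{1}\epsilon_{n-1} + \Psi_{1}\epsilon_{n-7} + \psi_{1}\Psi_{1}\epsilon_{n-8} \]

so each observation depends on the previous day, the same day one week earlier, and the interaction of the two at lag 8.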

Table 1.2: P=1,Q=0
           MA0       MA1       MA2       MA3       MA4       MA5
AR0   12885.71  11834.57  11163.29  10599.86  10440.14  10386.15
AR1    9935.54   9807.52   9768.71   9689.14   9651.59   9573.04
AR2    9729.72   9522.84   9517.36   9497.22   9498.21   9497.30
AR3    9639.82   9519.89   9522.22   9702.74   9653.10   9500.75
AR4    9530.81   9502.02   9501.57   9796.09   9449.82   9439.38

Table 1.3: P=1,Q=0
           MA0       MA1       MA2       MA3       MA4       MA5
AR0    8241.87   7746.90   7497.96   7365.58   7300.71   7217.19
AR1    7107.12   7076.83   7078.51   7080.50   7081.49   7076.73
AR2    7077.04   7078.50   7080.49   7065.49   7066.92   7065.46
AR3    7078.56   7080.70   7064.86   7051.14   7052.28   7053.48
AR4    7080.40   7082.40   7071.33   7052.51   7059.75   7036.62

Table 1.4: P=1,Q=1
           MA0       MA1       MA2       MA3       MA4       MA5
AR0   12709.34  11774.10  11247.19  10601.38  10441.99  10386.41
AR1    9735.91   9621.20   9590.13   9516.89   9491.65   9449.30
AR2    9559.36   9365.60   9364.66   9353.59   9355.11   9348.45
AR3    9479.96   9365.42   9365.71   9352.67   9354.36   9350.37
AR4    9385.12   9356.07   9353.49   9351.04   9352.61   9346.35

Table 1.5: P=1,Q=1
           MA0       MA1
AR0    8243.58   7747.87
AR1    6866.59   6826.32

Table 1.6: P=0,Q=1
           MA0       MA1       MA2       MA3       MA4       MA5
AR0   12882.37  11890.75  11354.34  10650.94  10441.10  10407.19
AR1    9917.22   9753.40   9691.22   9586.46   9559.04   9513.61
AR2    9614.35   9372.62   9371.18   9359.89   9361.88   9351.24
AR3    9500.20   9372.06   9371.17   9358.81   9359.68   9352.59
AR4    9400.09   9363.27   9360.30   9363.22   9362.79   9352.85

Table 1.7: P=,Q=1
           MA0       MA1       MA2       MA3       MA4       MA5
AR0    8242.69   7745.96   7490.85   7347.06   7261.61   7159.14
AR1    6866.72   6827.41   6829.41   6831.19   6832.31   6831.40
AR2    6830.34   6829.41   6831.37   6829.84   6829.79   6830.04
AR3    6829.39   6831.30   6833.36   6833.79   6833.76   6836.24
AR4    6831.25   6828.97   6833.18   6836.00   6834.73   6837.60

From our grid search, the best model is SARMA(4,5)x(1,1) for the pandemic time frame (AIC 9346.35, Table 1.4) and SARMA(1,1)x(1,1) for the post-pandemic time frame (AIC 6826.32, Table 1.5), as these had the lowest AICs across all tables for their respective periods.
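The grid search can be sketched as below, using base R's `arima()` with a seasonal component. The helper `aic_table()`, the simulated series, and the small 3 x 3 grid are assumptions made for illustration; they are not the report's actual code or data.

```r
# Illustrative only: simulate a SARMA(1,0)x(1,0)_7 process by expanding
# (1 - 0.6 B)(1 - 0.5 B^7) = 1 - 0.6 B - 0.5 B^7 + 0.3 B^8.
set.seed(531)
x_sim <- arima.sim(model = list(ar = c(0.6, rep(0, 5), 0.5, -0.3)), n = 400)

# AIC for every (p, q) in the grid, with fixed seasonal orders (P, Q), period m.
aic_table <- function(x, max_p, max_q, P, Q, m) {
  tab <- matrix(NA_real_, nrow = max_p + 1, ncol = max_q + 1,
                dimnames = list(paste0("AR", 0:max_p), paste0("MA", 0:max_q)))
  for (p in 0:max_p) {
    for (q in 0:max_q) {
      fit <- tryCatch(
        arima(x, order = c(p, 0, q),
              seasonal = list(order = c(P, 0, Q), period = m)),
        error = function(e) NULL  # leave NA where the optimizer fails
      )
      if (!is.null(fit)) tab[p + 1, q + 1] <- fit$aic
    }
  }
  tab
}

tab  <- aic_table(x_sim, max_p = 2, max_q = 2, P = 1, Q = 0, m = 7)
best <- which(tab == min(tab, na.rm = TRUE), arr.ind = TRUE)

print(round(tab, 2))
print(c(p = unname(best[1, "row"]) - 1, q = unname(best[1, "col"]) - 1))
```

Because AIC adds a penalty of 2 per estimated parameter, small AIC differences between neighboring cells of such a table are weak evidence, and a simpler model with a nearly equal AIC may be preferable.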

Model Diagnostics

Assessing the adequacy of the fitted SARMA models for both the pandemic and post-pandemic periods is essential to ensure they accurately capture the underlying patterns in the COVID-19 case data. A well-fitting model should produce residuals that exhibit no significant autocorrelation, approximate normality and lack discernible trends or patterns. To evaluate these aspects, we conducted a series of diagnostic checks, including residual analysis, ACF/PACF analysis, normality testing, and autocorrelation assessments.

Residual Analysis

The residuals from both models exhibit the expected characteristics, showing no clear patterns or trends, which suggests the models are capturing the main data features effectively. The pandemic model, in particular, shows stable residuals with no significant outliers, indicating a good fit with minimal unexplained variation. In contrast, the post-pandemic model displays more frequent and larger outliers, signaling a weaker fit. These larger deviations suggest the model struggles to capture the underlying dynamics of the post-pandemic data.

ACF/PACF Analysis

The ACF and PACF plots of the residuals were analyzed to assess autocorrelation. For the pandemic model, both plots show minimal autocorrelation, suggesting the model effectively captures the data’s dependencies. In contrast, the post-pandemic model exhibits significant autocorrelation with several values exceeding the confidence interval, indicating that the model has not fully accounted for the dependencies in the post-pandemic data and needs further improvement.

Normality Test

## Shapiro-Wilk Test for Pandemic Residuals: p-value = 1.817198e-15
## Shapiro-Wilk Test for Post-Pandemic Residuals: p-value = 2.112151e-37

Both the pandemic and post-pandemic models show clear deviations from normality, as indicated by the Shapiro-Wilk test and the QQ plots. In both models, the residuals deviate notably from the theoretical line, especially in the tails, suggesting that the models may not fully capture the underlying distribution of the data. These deviations could indicate potential model misspecification, and addressing them might improve the fit. Non-normal residuals do not by themselves invalidate the fitted autocorrelation structure, but with p-values this small, Gaussian-based prediction intervals from these models should be interpreted with caution.
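A minimal sketch of the normality check follows, using base R's `shapiro.test()` and `qqnorm()`. The two residual vectors are simulated for illustration and are not the residuals of the fitted models.

```r
# Illustrative only: simulated residuals, one Gaussian and one heavy-tailed.
set.seed(531)
resid_normal <- rnorm(1000)
resid_heavy  <- rt(1000, df = 3)  # Student t with 3 df has heavy tails

# shapiro.test() accepts between 3 and 5000 observations.
p_normal <- shapiro.test(resid_normal)$p.value
p_heavy  <- shapiro.test(resid_heavy)$p.value

print(signif(c(normal = p_normal, heavy_tailed = p_heavy), 3))

# Quantiles for a QQ plot without drawing it; heavy tails bend away from the line.
qq <- qqnorm(resid_heavy, plot.it = FALSE)
print(round(range(qq$y), 2))
```

With samples this large, the Shapiro-Wilk test detects even modest tail departures, which is why the QQ plot is a useful companion for judging how severe the deviation is.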

Box-Ljung Test for Autocorrelation

## Box-Ljung Test for Pandemic Residuals: p-value = 0.9437145
## Box-Ljung Test for Post-Pandemic Residuals: p-value = 0.1654987

The Box-Ljung test shows no significant autocorrelation in the residuals of either model at the 5% level. The pandemic model has a p-value of 0.9437 and the post-pandemic model a p-value of 0.1655, so in both cases we fail to reject the null hypothesis that the residuals are white noise. The much lower post-pandemic p-value is consistent with the residual ACF spikes noted above, so the post-pandemic fit remains the weaker of the two.
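The test can be sketched with base R's `Box.test()`. The simulated ARMA(1,1) series, the lag of 20, and the `fitdf` adjustment are assumptions for illustration; the report does not state which settings it used.

```r
# Illustrative only: fit an ARMA(1,1) to a simulated series and test residuals.
set.seed(531)
x_sim <- arima.sim(model = list(ar = 0.7, ma = 0.3), n = 600)
fit <- arima(x_sim, order = c(1, 0, 1))

# fitdf = p + q adjusts the degrees of freedom for the estimated coefficients;
# the lag of 20 is an arbitrary choice for this sketch.
lb_resid <- Box.test(residuals(fit), lag = 20, type = "Ljung-Box", fitdf = 2)

# For contrast, the raw (unmodeled) series is strongly autocorrelated.
lb_raw <- Box.test(x_sim, lag = 20, type = "Ljung-Box")

print(signif(c(residuals = lb_resid$p.value, raw_series = lb_raw$p.value), 3))
```

A large p-value for the residuals means the test found no evidence against white noise up to the chosen lag; it does not prove that no dependence remains.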

Impact of Vaccination on COVID-19 Cases in Ireland

The introduction of COVID-19 vaccines was pivotal in the global effort to control the pandemic. However, a key question remains: Did the nationwide deployment of vaccines significantly alter the trajectory of confirmed COVID-19 cases in Ireland? In this section, we aim to analyze the potential causal effect of vaccination rollout on the COVID-19 case trends in Ireland using time-series analysis.

To investigate this, we selected three critical vaccination dates, each marking a significant expansion in eligibility. These dates are derived from the “COVID-19 Vaccination in the Republic of Ireland” timeline, where each phase introduced vaccines to broader population segments (Wikipedia, 2024). By examining the trends before and after these dates, we can assess whether vaccine availability had a measurable impact on case numbers.

Selected Vaccination Dates:

1. December 29, 2020 – Phase 1: The first COVID-19 vaccine was administered in Ireland, marking the beginning of Phase 1. Eligible groups: healthcare workers, people aged 70+, and residents in long-term care facilities.
2. March 1, 2021 – Phase 2: Expansion to high-risk groups and older adults. Eligible groups: Phase 1 individuals, individuals aged 65-69, and people aged 16-69 with high-risk medical conditions.
3. July 27, 2021 – Phase 3: Vaccination extended to younger populations. Eligible groups: Phase 2 individuals and individuals aged 12-15.

The time-series plot above reveals distinct trends following each vaccination phase in Ireland. After the first rollout (2020-12-29), cases spiked, possibly because older citizens were prioritized and mass distribution was delayed. The second phase (2021-03-01) saw a sharp decline, but cases later rebounded, possibly due to new variants such as the Alpha variant (Reynolds et al., 2022). Following the third phase (2021-07-27), cases showed a gradual and sustained decline, suggesting that broader vaccine coverage contributed to long-term case reduction. Because these explanations are speculative, we first conduct breakpoint detection to identify the key breakpoint dates in the COVID-19 time series.

Generate the Confirmed COVID-19 Time Series Breakpoints:

We conducted a breakpoint detection analysis using the strucchange package to identify key moments where the pattern of COVID-19 cases in Ireland shifted significantly (Breakpoint, n.d.). These breakpoints represent structural changes in the time series, indicating potential turning points in the pandemic’s trajectory. If the detected breakpoints align with the vaccine rollout dates, this would be consistent with vaccination influencing case trends, although alignment alone does not establish causation. If they do not align, other factors, such as lockdown measures, new variants, or shifts in public behavior, may have played a more dominant role in shaping the observed trends.
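A minimal sketch of breakpoint detection with `strucchange::breakpoints()` follows. The intercept-only formula, the minimum segment fraction `h = 0.15`, and the simulated level-shift series are assumptions for illustration; the report does not state the model specification it used.

```r
# Illustrative only: a simulated series with one level shift after observation 100.
set.seed(531)
y_sim <- c(rnorm(100, mean = 10), rnorm(100, mean = 15))
dates_sim <- seq(as.Date("2021-01-01"), by = "day", length.out = length(y_sim))

if (requireNamespace("strucchange", quietly = TRUE)) {
  # Intercept-only model: breaks are shifts in the mean level.
  # h = 0.15 is the minimum segment size as a fraction of the sample.
  bp <- strucchange::breakpoints(y_sim ~ 1, h = 0.15)
  bp_index <- bp$breakpoints  # last observation of each segment; NA if none
  print(bp_index)
  print(dates_sim[bp_index])
} else {
  message("Package 'strucchange' is not installed; skipping the breakpoint sketch.")
}
```

Mapping the break indices back onto the date vector, as in the last line, is how observation numbers become calendar dates that can be compared with the rollout timeline.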

## [1] "Detected Breakpoints:"
## [1] "2020-12-27" "2021-11-01" "2022-07-29" "2024-02-02"
## [1] "Original Vaccine Release Dates:"
## [1] "2020-12-29" "2021-03-01" "2022-07-27"

The breakpoint detection analysis identified four major structural shifts in the COVID-19 time series: 2020-12-27, 2021-11-01, 2022-07-29, and 2024-02-02. Notably, two of these breakpoints (2020-12-27 and 2022-07-29) fall within two days of the rollout dates supplied to the analysis (2020-12-29 and 2022-07-27), which suggests an association between vaccine deployment and changes in COVID-19 trends. Note, however, that the Phase 3 date supplied here is 2022-07-27, whereas the timeline above lists July 27, 2021, so the apparent Phase 3 alignment should be read with caution. With that caveat, the alignment indicates that Phase 1 and Phase 3 of the vaccination program may have influenced the trajectory of confirmed cases. To move beyond correlation and assess causal significance, we conduct a Causal Impact analysis, which quantifies the effect of these vaccine rollouts on case numbers and tests whether the observed shifts are statistically significant.
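The alignment can be checked directly with date arithmetic. The sketch below uses the dates from the output above; the variable names are illustrative.

```r
# Dates copied from the output above.
detected <- as.Date(c("2020-12-27", "2021-11-01", "2022-07-29", "2024-02-02"))
rollout  <- as.Date(c("2020-12-29", "2021-03-01", "2022-07-27"))

# Index of, and gap in days to, the nearest detected breakpoint per rollout date.
nearest  <- sapply(rollout, function(d) which.min(abs(as.numeric(detected - d))))
gap_days <- sapply(rollout, function(d) min(abs(as.numeric(detected - d))))

alignment <- data.frame(rollout = rollout,
                        nearest_breakpoint = detected[nearest],
                        gap_days = gap_days)
print(alignment)
```

The first and third rollout dates each sit 2 days from a detected breakpoint, while the second is 64 days from its nearest one.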

Causal Impact Analysis

Causal Impact analysis, developed by Google, is a Bayesian structural time series approach designed to estimate the effect of an intervention when a clear counterfactual is unavailable (Causal Impact, n.d.). The method fits a model to the pre-intervention data, forecasts how the series would have evolved without the intervention while accounting for trends and seasonality, and compares that forecast with the observed post-intervention data. This helps quantify the causal effect of an intervention, such as a policy change or public health measure. In this study, we apply Causal Impact analysis to assess whether the rollout of COVID-19 vaccines significantly affected case trends in Ireland.

##         Date P_Value Posterior_Probability_of_a_causal_effect
## 1 2020-12-29 0.48646                                      51%
## 2 2021-03-01 0.38386                                      62%
## 3 2022-07-27 0.26263                                      74%

The Causal Impact results indicate varying levels of evidence for a causal effect of vaccine rollouts on COVID-19 case trends. The Phase 1 rollout (2020-12-29) shows a 51% posterior probability, suggesting weak evidence of an impact. The Phase 2 rollout (2021-03-01) has a 62% probability, indicating a moderate likelihood of influence. The Phase 3 rollout (2022-07-27) has the strongest evidence, with a 74% probability, suggesting somewhat higher confidence that the vaccine expansion contributed to the observed trend change. However, none of the three p-values (0.49, 0.38, and 0.26) falls below the conventional 0.05 threshold, so the results do not provide statistically significant evidence of a causal relationship. This warrants further analysis and may indicate the presence of confounding variables influencing the trend.
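The two columns of the table above are linked: the reported posterior probability equals one minus the p-value, expressed as a percentage. The sketch below reproduces that relationship from the table's p-values; the data frame `impact_tab` and its column names are illustrative.

```r
# P-values copied from the table above.
impact_tab <- data.frame(
  date    = as.Date(c("2020-12-29", "2021-03-01", "2022-07-27")),
  p_value = c(0.48646, 0.38386, 0.26263)
)

# Posterior probability of a causal effect, reported as 1 - p in percent.
impact_tab$posterior_prob_pct <- round((1 - impact_tab$p_value) * 100)

# Flag which rollouts would pass a conventional 0.05 threshold.
impact_tab$below_threshold <- impact_tab$p_value < 0.05

print(impact_tab)
```

Because the two quantities carry the same information, a 74% posterior probability should not be read as stronger evidence than a p-value of 0.26.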

Conclusion

Our analysis demonstrates that both SARMA models effectively capture COVID-19 case trends in Ireland, with the pandemic model showing a slightly better fit. The residual analysis indicates minimal autocorrelation, suggesting that both models are reasonable approximations of the data. However, the post-pandemic model exhibits some irregularities in its residuals, implying that it may not fully account for all underlying patterns. While the pandemic model appears more reliable, both models could benefit from further refinement, such as incorporating additional covariates or adjusting seasonal components to improve accuracy. Our seasonal analysis further supports this conclusion, showing clear signs of seasonality during the pandemic but a weaker pattern post-pandemic. The PACF plots confirm that while a weekly cycle existed during the pandemic, post-pandemic trends exhibit more variability, suggesting a shift in COVID-19 dynamics over time.

Beyond seasonality, our vaccine impact analysis reveals a complex relationship between vaccination rollout and COVID-19 case trends. While breakpoints in the time series align closely with vaccine rollout dates, causal impact analysis suggests that vaccination alone did not significantly drive these changes. Other confounding factors, such as government-imposed restrictions, social behavior adjustments, and the emergence of new COVID-19 variants, likely played a crucial role in shaping infection trends. This underscores the challenge of isolating single interventions in public health data and highlights the need for multifactorial models to understand disease dynamics better. Future research could integrate policy shifts and mobility data to refine our understanding of how vaccines, alongside other measures, contributed to pandemic control.

Sources

Previous Project Comparisons

Our analysis shares similarities with the Utah source regarding time series modeling and intervention analysis, but we focus specifically on the impact of vaccine rollouts on COVID-19 cases in Ireland. We employ stationarity tests, spectral analysis, and causal impact modeling to assess changes in seasonality and long-term trends. Unlike the Utah project, which emphasizes ARMA model selection, our approach incorporates breakpoint detection and compares distinct pandemic phases to understand structural shifts in the data better.

The seasonality analysis drew heavily on the Sunspot analysis produced by a group in the winter of 2024 as a guide to thoroughly investigating a seasonal trend. However, our report goes one step further by offering a possible explanation for the trend through breakpoint analysis using our prior knowledge. Our model diagnostics section is also more fleshed out, adding a Box-Ljung test for autocorrelation, ACF/PACF analysis, and a normality test on top of a check of the residuals.

From previous peer reviews, we have learned the importance of clarifying assumptions, improving model selection rationale, and refining visualization techniques. The feedback has suggested considering alternative transformations, such as log or square root scaling, to stabilize variance and comparing different lag models using AIC or likelihood ratio tests. By incorporating these insights, we strengthen our methodology and ensure that our findings are robust and interpretable.

Past Midterm Projects:
https://ionides.github.io/531w21/midterm_project/project16/project.html
https://ionides.github.io/531w24/midterm_project/project09/blinded.html