Introduction

Coronavirus disease (colloquilly known as COVID-19) is respiratory illness caused by the SARS-CoV-2 virus [1]. First observed in mid-December, 2019, in Wuhan, China, COVID-19 has spread across the globe, being declared a pandemic by World Health Organization in March 11,2020 [2]. Over the course of March and April of 2020, numerous countries around the world closed their borders and issued lockdowns or shelter in place orders [3].

Although much of 2020 was marked by frequent lockdowns and shelter in place orders, the year ended on a high note as Pfizer-BioNTech and Moderna, later on joined by Johnson&Johhnson, obtained emergency authorization for their COVID-19, vaccines using cutting-edge mRNA technology. In United States, as well as in many other countries across the world these vaccines were first distributed to doctors and nurses, high-risk population groups, and the essential front-line workers due to shortages in production and rollout [3]. 4 months after their emergency authorization, COVID-19 vaccines were made available to all adults over the age of 18 in United States [4]. Despite a the initial effort, COVID-19 vaccination rate soon stagnated due to hesitancy against the vaccine in communities across United States[5].

As of today, just a year after the vaccine first became available to general public in the United States, COVID-19 related lock-downs seem a distant memory despite the increasing infectiousness of the COVID-19 variants. As we celebrate the one year anniversery of the COVID-19 vaccine being made available to the general public, the question of what “what if the vaccine rollout post April 19 was faster or slower?” remains a relevant discussion point. In this project we attempt to answer this question by modelling the COVID-19 pandemic following April 19,2021 using the compartment model framework in epidemiology and simulating different vaccine adoption scenarios from our model.

Data

In order to answer this question we utilize COVID-19 case and death count data from the New York Times COVID-19 Case Tracker [6] and the COVID-19 vaccination data from the Center for Disease Control in United States (CDC) [7].

The COVID-19 data from New York Times COVID-19 Case Tracker tracks the COVID-19 cases and COVID-19 related deaths going back all the way to 21st of January, 2020, containing 815 observations 21st of January, 2020 and 15th of April, 2022, the day we last update the data. The data is represented in terms of cumulative case and death counts, so we take the difference in case counts to obtain daily case and death counts.

CDC’s COVID-19 Vaccination Data in United States contains information regarding vaccination rates in United States dating back to 13th of December, 2020, broken up by numerous categories including state, vaccine manufacturer, and major age brackets. After combining the data from different states, we obtain 489 observations between 13th of December, 2020 and 15th of April, 2022, the day we last update the data. As of April 2022 there are 3 major COVID-19 vaccines distributed across United States: Janssen (colloquially known as Johnson&Johnson Vaccine), Moderna, and Pfizer-BioNTech. While Janssen vaccine requires one shot to provide full protection against COVID-19 while Moderna and Pfizer-BioNTech requires 2 shots spaced 4 and 3 weeks apart respectively [8]. As such, in order to simplify our analysis, we only consider the doses administered and the number of individuals who have reached “full vaccination”. Once again, the data is represented in terms of cumulative numbers, so we take the difference to obtain daily rates.

As afformentioned, given that COVID-19 vaccines were not readily available to general public due to supply constraints prior to April 19, 2021, we consider both time series after 19th of April, 2021. Overall we have 361 observations between 19th of April, 2021 and 15th of April, 2022, the day we last update the data.

Exploratory Data Analysis

COVID-19 Cases

The time series plot for the daily COVID-19 cases can be seen below

Looking at the time series of Daily COVID-19 cases, we observe the gradual decrease in cases over time as vaccines become more available. Around July of 2021, we see increase in daily case numbers which can be attributed to the mass gatherings during 4th of July celebrations in United States as well as the spread of the new Delta variant, which was highly infectious to the non-vaccinated [9]. The Delta wave continues till November 2021, when we began to see a rapid increase in case numbers. This increase can be attributed to the gatherings during Thanksgiving and the holiday season (Christmas, Hannukah, New Year, and similar other cultural celebrations based on Winter Solstice) with the emergence of the new Omicron variant [10] which is even more infectious than Delta variant and can result in breakthrough infections in the vaccinated.

As expected, the daily COVID-19 cases are highly time dependent, as can be seen in the sample auto-correlation function plot above. This time dependence seems to by cyclic, which is consistent with the wave patterns we have grown accustomed to in COVID-19 data.

Since the time series of daily COVID-19 cases clearly neither mean stationary nor covariance stationary, we look at the differenced data, the change in daily COVID-19 Cases

Looking at the change in daily COVID-19 cases, we observe that the time series looks to be mean stationary with the mean around 0 but a non-constant variance that changes over time (in our case corresponding to the waves caused by different variants). It can be noted that the time series of change in daily COVID-19 cases, the integrated time series, looks similar to time series of returns of a financial asset as such we can use techniques in financial statistics, such as ARMA-GARCH models, to model it.

As suspected, the sample auto-correlation function plot shows us that the integrated time series still has a strong time dependence; however, given our observations, it is reasonable to assume that this dependency is partially due to heteroskadasticity.

COVID-19 Vaccinations

As noted, given that different vaccines have different times to provide full protection against COVID-19, instead of considering the number people getting vaccinated, we consider number of people reaching full vaccination on a given day. We plot the time series of people reaching full vaccination status on a given day as well as the cumulative number of people that have reached full vaccination on that day below.

Looking at the time series plot of the number of people reaching full vaccination in United States as well as the cumulative people that have reached full vaccination by that date, we observe that following 19th of April, 2021, when the vaccines first became available to general public, number of people reaching full immunity increased at first, but then tapered off towards 0. It is important to note that we have 2 0 with negative number of people reaching full vaccination negative vaccination which is implausible. We attribute this to backlogging of revisions, and thus set these values to 0.

ARIMA Model

We will start by modeling the daily COVID-19 cases using stationary Integrated, AutoRegressive MovingAverage (ARIMA) model. As indicated by our exploratory data analysis, the daily COVID-19 cases cannot be modelled as trend and covariance stationary; however, the change in daily cases seem to better suited towards a stationary model despite showing a clear sign of heteroskadasticity. We first proceed fitting an ARMA(p,q) model on the change in daily COVID-19 cases, which constitutes a ARIMA(p,1,q) model on the daily COVID-19 cases, comparing the model fits using Akaike’s Information Criterion (AIC).

Since our exploratory data analysis highlights that the Omicron wave occured despite the high number of people with full vaccination status, we expect that utilizing cumulative number of people that has reached full vaccination to obscure our results. This hypothesis is confirmed by our preliminary results where including number of people that has reached full vaccination causes convergence issues and using its log transform decreases the model fit as measured by AIC, as such, despite our goal of understanding the effects of vaccination on COVID-19 cases, we have to proceed by not considering the effect of vaccination as we are not able properly model the non-linear effects of vaccination and variants within the ARIMA framework.

Our preleminary results indicated that incorporation of seasonal terms to the model were not necessary and caused convergence issues as well as the roots of the AR/MA and/or SAR/MAR polynomials cancelling and yielding smaller models. As such we limited our model space to the non-seasonal ARIMA, fitting \(ARIMA(p,1,q)\) models for each combination of p and q ranging from 0 to 4, resulting in \(5\times6=25\) different models. The fit of these models as indicated by AIC is presented in the table above.

Model Fit for ARIMA(p,1,q) Measured by AIC
MA0 MA1 MA2 MA3 MA4
AR0 9565.345 9432.528 9432.037 9368.336 9338.203
AR1 9520.496 9433.213 9427.784 9344.230 9340.153
AR2 9455.614 9418.655 9334.482 9358.546 9278.076
AR3 9441.084 9420.643 9422.638 9337.529 9311.151
AR4 9442.649 9420.693 9424.589 9359.871 9315.569

Although ARIMA(2,4) model has the lowest AIC, the AR and MA polynomials have cancelling roots, simplifying the model into a smaller models. After going through our list of candidate models, we observe that the best fitting ARIMA model without any issues regarding to cancelling roots is the ARIMA(1,1,2) model, for which we plot the fitted values below

Looking at the fitted values of the ARIMA(1,1,2) model above, we observe that model fails to model the Delta and Omicron waves due to the heteroskadasticity. This pattern is more appearent when we look into the time series of the ARIMA(1,1,2) model residuals, where the Delta and Omicron waves stand out as outliers, with the model severly underpredicting the change in daily COVID-19 cases in both waves

The time series plot for ARIMA(1,1,2) model shows us that as expected, the heteroskadasticity in the number of COVID-19 cases due to the Delta and Omicron variants renders the ARIMA framework inadequate.

Looking at the ACF of the residuals of the ARIMA(1,1,2) model we observe that there is unaccounted time dependence in our model, which we attribute to the underfitting of the model.

Finally, looking at the normal quantile-quantile plot of the ARIMA(1,1,2) residuals, we observe that the residuals are clearly non-normal with extremely heavy tails, which constitutes a violation of Gaussian white noise assumption of the ARIMA framework. This is expected as the time series for change in daily COVID-19 cases exhibited clear signs of heteroskadasticity. As indicated in our exploratoryrortoy data analysis, given the similarities between the time series of hange in daily COVID-19 cases and the time series of the returns of a tpyical financial asset, we can utilize the ARMA-GARCH framework in the financial statistics literature to account for the hetersokadasticity.

ARMA-GARCH Model

As noted in the financial literature, ARMA models perform sufficiently well when the time series have serial correlation; however, the underlying independent identically distributed white noise assumption fails when there is volatility clustering (periods of high/low variance), which is frequently seen in financial data [11]. In order to address this shortcoming, an ARMA-GARCH model replaces the independent and identically distributed white noise process from an ARMA model with a weak white noise process from a GARCH model, yielding a ARMA-GARCH model. Further details about the ARMA-GARCH model family can be found in Chapter 14 of Ruppert and Matteson [11].

While we expect an ARMA-GARCH model to alleviate the issues arising from heteroskadasticity, these models can already be unstable in relatively well behaving financial data and in our case the ARMA-GARCH models we attempt to fit have computational issues regarding non-invertible Hessian matrix.

Given the poor performance of the ARIMA(1,1,2) model and the numerical convergence issues in the ARMA-GARCH models we try, We conclude this section by pointing out the elementary ARIMA framework is inadequate in modelling COVID-19 cases as the COVID-19 cases exhibit heteroskadasticity due to the variants that randomly emerge and change the infectiousness of the virus. Given the non-linear effects due to variants, vaccinations and dynamic nature of viruses such as COVID-19. As such, we proceed by modelling the daily COVID-19 cases utilizing a modified SEIR model by utilizing the Partially Observed Markov Chain (POMP) framework.

POMP Model: SVEIQRD

To model the daily COVID-19 cases, we modify the SEIR (Susceptible, Exposed, Infected, Recovered) compartment model from epidemiology, which we name the SVEIQRD (Susceptible, Vaccinated,Exposed, Infected, Quarantined, Recovered, Dead) model. A similar model have been used in the epidemiology literature by Ghostine et al. (2021) [12]; however, their model focuses on Saudi Arabia and had limitations with regards to avilability of vaccine data, as COVID-19 vaccines were authorized relatively recently when the paper was published.

Although we have data for the number of people achieving full vacccination status every day, we choose to model the vaccinations within our model as our end goal is to simulate different vaccination rates in order to explore its effects on case numbers.

Model:

We present a diagram for the SVEIQRD Model below:

Each arrow demotes the associated rate for which individuals in one compartment move from one to the other. We assume that there are 6 mutually exclusive stages. Invidiuals can move from being succeptible (S) to being Vaccinated (V) or Exposed (E). The vaccinated individuals have some protection against infection (denoted by \(\gamma\)). Once Exposed (E), individuals move from Exposed (E) to Infected (I), where they can transmit the virus, to Quarantined (Q) where they no longer are able to infect others. After the quarantine stage individuals move to either Recovered (R) or Dead (D) stages, with *denoting the fraction of individuals that die. The number of cases is given when individuals move from Infected (I) to Quarantined (Q) stage with respect to discretized normal distribution truncuated at 0, given by

\[Cases = (round(C_N))^+ \text{ where } C_n\sim N(\chi\cdot H_n,(\rho H_n)^2+\chi\cdot H_n)\] where \(a^+=max\{a,0\}\) and H is the variable tracking the individuals moving from I to Q. Although we do not consider the asymptomatic cases, these can be thought of as being part of the measurement error.

The number of people in each compartment is given by:

\[\begin{eqnarray} \\ &S(t)&= S(0) - N_{SE}(t)-N_{SV}(t) \\ &V(t)&= V(0) + N_{SV}(t) \\ &E(t)&= E(0) + N_{SE}(t)+N_{SV}(t)-N_{EI}(t) \\ &I(t)&= I(0) + N_{EI}(t) - N_{IQ}(t) \\ &Q(t)&= N(0) + N_{IQ}(t) - N_{QR}(t) - N_{QD}(t) \\ &R(t)&= R(0) + N_{QR}(t) \\ &D(t)&= D(0) + N_{QD}(t) \end{eqnarray}\]

where the the movement between compartment is given by:

\[\begin{eqnarray} \\ &\Delta N_{SV}(t)&\sim Binomial(S,1-e^{-\nu\Delta t}) \\ &\Delta N_{SE}(t)&\sim Binomial(S,1-e^{-\beta_t(I/N)\Delta t}) \\ &\Delta N_{VE}(t)&\sim Binomial(V,1-e^{-(1-\gamma)\beta_t(I/N)\Delta t}) \\ &\Delta N_{EI}(t)&\sim Binomial(E,1-e^{-\mu_{EI}\Delta t}) \\ &\Delta N_{IQ}(t)&\sim Binomial(I,1-e^{-\mu_{IQ}\Delta t}) \\ &\Delta N_{QR}(t)&\sim Binomial(Q,1-e^{-\kappa\cdot\mu_{QR}\Delta t}) \\ &\Delta N_{QD}(t)&\sim Binomial(Q,1-e^{-(1-\kappa)\cdot\mu_{QD}\Delta t}) \end{eqnarray}\]

Note that in order to incorporate different infectiousness of variants, we define \(\beta\), the force of infection parameter, as follows.

\[ \beta_t= \left\{ \begin{array}{ll} b_o & t \geq \text{ July 1 2021} \\ b_1 & \text{ 1 July 2021} < t\leq \text{ December 1 2021} \\ b_2 & > t \text{ December 1 2021} \\ \end{array} \right. \]

where \(b_2\) and \(b_3\) correspond to the force of infection for Delta and Omicron variant respetively. The dates were determined in combination with our exploratory data analysis and consulted sources [9], [10].

In order to initialize the compartments, we utilize the our data. We the initial number of vaccinated \(V(0)\) to be the cumulative number of people vaccinated at our start date, 19th of April, 2021. Likewise, we define the initial number of quarantined people \(Q(0)\) to be the cumulative number of cases in the last two weeks before our start date, 19th of April, 2021. We define the initial number of dead individuals \(D(0)\) to be the cumulative number of deaths since 2021, and the recovered individuals \(R(0)\) to be the difference be the difference between the cumulative number of cases and deaths in the 6 months before our start date, 19th of April, 2021. Finally, we define \(C_{-7\leq t\leq0}\) to be the cumulative number of cases starting from 1 week before our start date, 19th of April, 2021 and \(N\) to be the estimated U.S. population at the time of 19th of April, 2021, using these define the remaining compartments as follows:

\[\begin{eqnarray} \\ &S(0)=& N- V(0)-S(0)-E(0)-I(0)-Q(0)-R(0)-D(0) \\ &E(0)=& \phi \cdot C_{-7\leq t\leq0} \\ &I(0)=& \psi \cdot C_{-7\leq t\leq0} \end{eqnarray}\]

where \(\phi\) and \(\psi\) are model parameters.

Apart from \(N\) and \(C_{-7\leq t\leq0}\), which we determine using the data, all other variables are estimated using IF2 filter implemented in R package POMP.

Model Assumptions

The model assumes that United States population is constant across the April 2021-April 2022 year when COVID-19 deaths are accounted and there is no migration/travel in and out of the system. This has the implication that the birth rates and travel from other countries are to be neglected. Neglecting birth rates over the course of a year is expected given the short duration while neglecting travel/migration reasonable given that the United States population is large enough that traveling/migrating population would be negligible with respect to the overall population. Furthermore, not only non-essential travel has been scrutinized over the past year due the COVID-19 pandemic, but most countries have required proof of a negative COVID-19 test within a set period of time before admitting people in, making increase in transmission rates due to travel less likely.

Another model assumption is that in each period people reach full vaccination status at a fixed rate proportional to the size of the eligible population. This is a simplifying assumption given that, as afformentioned, different vaccines have different timelines to provide full effect. Modelling all three of the main vaccines offered in the United States could have unnecessarily complicated our discussion. A alternative model featuring different vaccines and the vaccination timeline is provided in the Discussion section.

We further assume that the COVID-19 vaccine provides a fixed protection against infections. Although it has become clear that the protection offered by COVID-19 vaccines declines over time, requiring booster shots after 5 months of reaching full vaccination [8], we assume that the protection of COVID-19 vaccine is fixed in order to simplify our model.

Finally, we assume that recovered inviduals cannot get infected again. This is once again a simplifying assumption since the choice of how to model re-infections can have potential ramifications with regards to the model. For example, it is not clear whether recovered individuals should be modelled as going back to the succeptible compartment, vaccinated compartment, or even a seperate **RS* compartment spesifically for individuals who have recovered but have become susceptible again after a certain amount of time.

Estimation of Model Parameters

We first guess the parameters and get our first simulated results as the plot shown below. Besides, we set the different run level so we can do different level search in the Great Lakes and our own computers. Then we set the random walk parameters and do the local search.

In the diagnostic plot, the likelihood does not look very stable, waving between -7000 and -5000. Meanwhile, as the iteration process, the other parameters also have large variability.

We can see the log likelihood and coefficients of the best local search as shown below.

##          b1        b2       b3         nu     gamma     mu_EI     mu_IQ
## 1 0.7079926 0.9215825 1.185933 0.10475625 0.6601804 0.2411641 0.2920051
## 2 0.6018194 0.9392144 1.039068 0.10657766 0.7209042 0.2755555 0.2069859
## 3 0.6109461 0.7887199 3.396597 0.08576116 0.5692385 0.2622542 0.3340951
## 4 0.4200233 0.6578362 1.997842 0.06892497 0.7748838 0.4709846 0.2323127
##       kappa      mu_QR      mu_QD       rho       chi         N initial_V
## 1 0.9394876 0.05103900 0.06583730 0.7034657 0.7912499 333106527  88465154
## 2 0.9095020 0.07150920 0.04280260 0.7865378 0.8557000 333106527  88465154
## 3 0.9445689 0.04421171 0.07675571 0.7241015 0.8835148 333106527  88465154
## 4 0.8712129 0.04773455 0.04846517 0.7042682 0.8981909 333106527  88465154
##   last_week_cases       phi       psi initial_Q initial_R initial_D    loglik
## 1          470652 0.1992993 0.4948032    952421  23192505    221264 -4796.801
## 2          470652 0.1980386 0.5048655    952421  23192505    221264 -4851.499
## 3          470652 0.1955222 0.4933718    952421  23192505    221264 -5546.685
## 4          470652 0.1980085 0.4991703    952421  23192505    221264 -7165.325
##    loglik.se
## 1 0.03898374
## 2 0.02373012
## 3 0.20907080
## 4 0.43851710

We can see the simulated results as shown below. It seems that some parametere are too high.

In the plot of the likelihood surface, the plots of loglik look so sparse that it does not give us a clear picture or hint of the ridge in likelihood surface. Thus, we move on to do global search.

Like what we do in the local search, we set the range of parameters, which are based on the simulation result from the local search.

We refer some of the codes from the project 13 of STATS 531 WN21 here.[19]

The best global search had the following coefficients and log likelihood and simulated results. We can see that the best likelihood is -5677.496, which is also in the range of (-7000, -5000).

##           b1        b2        b3         nu         N initial_V last_week_cases
## 1 0.35151838 0.4181022 3.2114206 0.15211719 333106527  88465154          470652
## 2 0.37533335 3.4925660 1.0938091 0.04317137 333106527  88465154          470652
## 3 0.54311207 0.7560287 3.5287085 0.40072775 333106527  88465154          470652
## 4 0.01452329 0.5291114 0.5568778 0.34408179 333106527  88465154          470652
##   initial_Q initial_R initial_D     gamma     mu_EI     mu_IQ     kappa
## 1    952421  23192505    221264 0.5169875 0.3667463 0.1249131 0.9568890
## 2    952421  23192505    221264 0.9173188 0.2797092 0.1537462 0.9156526
## 3    952421  23192505    221264 0.6617313 0.6562538 0.2819042 0.7346736
## 4    952421  23192505    221264 0.9597756 0.2556017 0.1598194 0.8935775
##         mu_QR      mu_QD       rho       chi       phi       psi    loglik
## 1 0.077049518 0.12156670 0.7706217 0.9613344 0.2289540 0.4932489 -5677.496
## 2 0.088327228 0.01315930 0.7533657 0.9879172 0.1706816 0.5378966 -6143.697
## 3 0.002185558 0.01480645 0.6199138 0.8855258 0.2117590 0.5360395 -7125.443
## 4 0.193284539 0.07555806 0.7914039 0.8618219 0.2493511 0.4764857 -8174.010
##    loglik.se
## 1 0.10119092
## 2 0.27684886
## 3 0.43521629
## 4 0.02272337

Then we can see the simulation result, it looks a little better than that in the previous simulation. However, it also does not match the data quite well.

Then we can see the pair plot for the global search, but it also looks sparse

Conclusion

The goal of this report was to modelling the COVID-19 virus, focusing on the effect of the adoption of the COVID-19 vaccine on the daily cases and potentially investigate the effects of changing vaccine adoption rates to case counts using simulation based methods.

As expected, our attempts in modelling daily COVID-19 cases using the Autoregressive Moving Average model framework yielded unsatisfactory results with the ARIMA(1,1,2) model failing to account for heteroskadasticity in the data and having a non-Gaussian noise process underlying it. Although we suspect an ARMA-GARCH model might be useful in modelling the daily COVID-19 cases by taking into account the periods of rappid rise and fall of cases, we suspect a more sophisticated estimation approach might be needed as our models were not able to estimated due to a non-invertible Hessian matrix arising in the estimation process.

Convergence issues were not limited to the conventional mechanistic time series model framework. Using the Partially Observed Markov Chain (POMP) framework, we developed a modified SEIR model to model the effects of vaccination adoption rates and variants in COVID-19 cases.Our SVEIQRD model, which adds additional compartments for Vaccinated, Quarantined, and Dead to the conventional SEIR model, had large variations in the model likelihood and the simulated cases were regularly overestimating the actual cases. Given the failure of our model to converge, we are unable to make a definitive simulation study into the effects of changing COVID-19 cases.

We expect the difficulty in estimation for all of our potential models to be the result of the increase in case numbers due to the Omicron variant being difficult to be accounted without modelling the decreasing effectiveness of the vaccine and the potential for reinfection.

Discussion on the Assumptions and Improvements on the Model

As noted throughout our analysis, COVID-19 is a complex virus and modelling it requires some simplifying assumptions. First of all, it is important to highlight that we modelled United States as a whole, which implicitly allows for a person in California to infect a person from North Dakota. This assumption can be relaxed by modelling United States as a population of populations, with each state having their own individual compartment model and the United States consisting of a compartment of states. Such a model would be much more complex, require many more parameters to estimate, and would be more difficult to interpret.

Second, as noted in the model assumptions section, we assume that is no migration or births within the year we are modelling. This assumption reduces the number of parameters that we have to estimate. This assumption can be held as long as the time horizon we are considering is short (1-2 years) and the travel in and out of the country is small compared to the overall population (which was expected given the ongoing pandemic).

Third, we make a couple simplifying assumptions in our modelling of the vaccines. We assume that there is a steady stream of individuals flowing from Susceptible (S) compartment to the Vaccinated (V) compartment and that there are no individuals who refuses to take the vaccine due to preferences (vaccine hesitancy). Furthermore, we assume that there is only two states of vaccination: vaccinated and un-vaccinated, disregarding the intermediate step of single dose vaccinations (for Pfizer-BioNTech and Moderna). This assumption is in tandem with our implicit assumption that there is only one vaccine with a fixed protection against infection. In other words, we are assuming once vaccinated everyone has the same protection against COVID-19 no matter which vaccine they received and the how long it has been since full vaccination. We present the following model, which takes into account different vaccines, intermediate state between vaccinations for Pfizer-BioNTech and Moderna vaccines, as well as the different duration between doses.

Note that such a model would have significantly more parameters to estimate, though this number can be reduced by taking certain parameter fixed. Particularly, the rate of movement between 1st and 2nd doses can be modeled as fixed by taking the suggested time between vaccines.

Fourth, as noted in the model section, since our goal was to look into effect of vaccination rates on the COVID-19 cases, we chose not to use actual vaccination data, instead modelling the vaccination rate using the parameter \(\nu\). One can take the alternative approach of estimating a functional form for number of people reaching complete vaccination daily, and use that functional form to define the daily movements between the Susceptible (S) and Vaccinated (V) compartments. If this method is used, inference on vaccination rates would require to change the paramters of the function defining the movement between Susceptible (S) and Vaccinated (V) compartments.

Our preliminary investigation shows that the number of people moving people reaching full vaccination every day can be modeled as quadratic polynomial of form \(E[{N}_{SV}]=\) 1.7031021^{6} \(+\) -2.8312807^{4} \(t+\) 202.6987773 \(t^2+\) -0.6149252 \(t^3+\) 202.6987773 \(t^4\). where t is the number days since 19th of April, 2021, the day we start our model. The estimated functional form and the actual data can be seen below.

Moreover, as noted in the model assumptions, we assume that the recovered individuals are unable to get reinfected. Possible re-infections against COVID-19 has been known since the early days of the pandemic [3]. CDC is still studying reinfections in order to come with a definitive explanation [13]. Given that we are only looking for a single year, it is reasonable to assume people infected with COVID-19 are not at all likely to get reinfected; however, future research can relax this assumption, allowing for reinfections to occur. As noted in the model assumptions section, modelling reinfections becomes a modelling choice as there is no clear indication on whether the recovered should flow back to the Susceptible (S) compartment, the Vaccinated (V) compartment (as they presumably have some immunity against the virus), or their own seperate compartment, which we can refer to as Recovered Succeptible (RS).

Finally, it is important to once again highligt that we are assuming that the protection provided by the vaccines is constant over time. As noted, research currently indicates that the effectiveness of the vaccines decline over time, requiring people to take boosters once every 5-6 months [8]. Given that the Omicron wave coincided with the time most people got their boosters, it might be benificial to model the protection provided by the vaccines as a decreasing function of elapsed time or adding a new compartment(s) to reflect this waning protection

Suggestions for Future Studies

The main improvement we suggest for future research is to modelling re-infections and waning protection provided by the COVID-19 vaccine using additional compartments or time-varying parameters through the use of covariates. These would complicate the model, but could help explain the unprecedented spike caused by the omicron wave.

Another different approach would be to disregard the shortages in vaccine distribution and the vaccines not being available to general public, and utilizing data between February 2021 and December 2021, circumventing the omicron spike and the potential issues that can be caused by it.

As noted in the discussion section, future research can also look into modelling the vaccines and their intermediate steps seperately using a modified version of the model we propose in the discussion section. It is important to note that without adressing the main underlying issues with decreasing protection by the COVID-19 vaccine and the potential for re-infection, these models can suffer from similar issues in large variations of model likelihood and overesimation of the data.

Finally, another possibility is to utilize real data or a functional estimate of it to model the flow between the Succeptible (S) and Vaccinated (V) compartments. While the former approach of using real data would help reduce the number of parameters estimated and help turn the attention on fine tuning other parameters, it would not allow for the use of a changing vaccine adoption parameter for the testing the main hypothesis of this project. The latter approach could be useful to study effects of changing vaccine adoption rates similar by varying the parameters of the underlying functional form; however, could introduce issues in prediction.

Works Cited

1: https://www.who.int/health-topics/coronavirus#tab=tab_1

2: https://www.cdc.gov/museum/timeline/covid19.html

3: https://www.yalemedicine.org/news/covid-timeline

4: https://www.reuters.com/article/us-health-coronavirus-usa/all-american-adults-to-be-eligible-for-covid-19-vaccine-by-april-19-biden-idUSKBN2BT1IF

5: https://www.nytimes.com/2021/07/24/us/covid-vaccine-hesitant.html

6: https://github.com/nytimes/covid-19-data

7: https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-Jurisdi/unsk-b7fc/data

8: https://www.cdc.gov/coronavirus/2019-ncov/vaccines/stay-up-to-date.html?s_cid=11304:covid%20vaccine%20types:sem.ga:p:RG:GM:gen:PTN:FY21

9: https://www.yalemedicine.org/news/5-things-to-know-delta-variant-covid#:~:text=First%20identified%20in%20India%20in,overwhelming%20increase%20in%20hospitalizations%20in

10: https://www.cdc.gov/coronavirus/2019-ncov/variants/omicron-variant.html#:~:text=Emergence%20of%20Omicron&text=November%2024%2C%202021%3A%20A%20new,14%2C%202021%20in%20South%20Africa

11: Ruppert, David and Matteson, David S., “Statistics and Data Analysis for Financial Engineering with R examples.” Springer New York: Imprint: Springer, New York, NY:, 2015.

12: Ghostine, Rabih, Mohamad Gharamti, Sally Hassrouny, and Ibrahim Hoteit. 2021. “An Extended SEIR Model with Vaccination for Forecasting the COVID-19 Pandemic in Saudi Arabia Using an Ensemble Kalman Filter” Mathematics 9, no. 6: 636. https://doi.org/10.3390/math9060636 Available at: https://www.mdpi.com/2227-7390/9/6/636?type=check_update&version=1#cite

13: https://www.cdc.gov/coronavirus/2019-ncov/your-health/reinfection.html

14: https://www.census.gov/quickfacts/fact/table/US/PST045221

15: Ionides, E.L. (2022). Modeling and Analysis of Time Series Data. Available at: https://ionides.github.io/531w22/.

16: https://stackoverflow.com/questions/53147450/is-there-a-build-in-ordinal-sequence-vector-in-r

17: https://ionides.github.io/531w22/midterm_project/project11/blinded.html

18: https://ionides.github.io/531w21/final_project/project13/blinded.html

19: https://github.com/ionides/531w21/blob/main/final_project/project13/blinded.Rmd

20: https://ionides.github.io/531w21/final_project/project03/blinded.html

21: https://github.com/ionides/531w21/blob/main/final_project/project03/blinded.Rmd

22: https://ionides.github.io/531w21/final_project/project15/blinded.html

23: https://github.com/ionides/531w21/blob/main/final_project/project15/blinded.Rmd