Coronavirus disease 2019 (COVID-19) is a contagious disease caused by the virus with the name, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first known case was identified in Wuhan, China, in December 2019 [1]. The disease has since spread worldwide, leading to the ongoing COVID-19 pandemic. By April 16, 2022 the total cases reached more than 500 million and resulted more than 6 million deaths worldwide [2]. In the United States, it has resulted in more than 80 million confirmed cases with around 1 million all-time deaths, the most of any country, and the nineteenth-highest per capita worldwide.[2] In our project we are going to search a dynamic model for the number of sequenced human COVID-19 viruses in the United States. Because modeling the whole pandemic is quite challenging and requires a wide range of data, from genetic sequences of the virus to the density of the population, from lockdown times to the vaccination rate. Performing a complete and thorough analysis of the pandemic beyond the scope of this class. Therefore we limit ourselves to the specific covid variants in the United States. Comparing dynamic model parameters of subsequent variants would give insight into how the virus adapts to human interventions such as lockdowns, vaccinations, and social distancing [3]. And understanding the trends potentially prepare us for future prevention.
We took the data from the GISAID, it is a global science initiative and primary source established in 2008 that provides open access to genomic data of influenza viruses and the coronavirus. They provide the variant information for each case. We focus on the cases of Delta and Omicron variants [4]. Our data consist the total number of people infected by delta and omicron variants each week from Jan. 2021 to March 2022 in the United States.
From the plot, we can see that the number of reports for Delta variant and Omicron variant are not similar to each other. Delta variant’s reports number increased sharply around June 2021 and fluctuated for 6 months and decreased sharply when Omicron emerges. Omicron increased just after Delta variant, which increased in new year and then it decreased in couple of weeks. Omicron variant started its distribution later than Delta, but had higher reports number at peak than the Delta variant did at peak, which is consistent with the report that the first Delta variant was detected on 23 February 2021 in the United States [5] and the first Omicron variant was detected on 1 December 2021 in the United States [6].
First, we fit an SEIR model for the delta variant cases in the United States. Which is known as the most deadly variant of the virus [7]. Then we compare the estimated parameters with the Omicron variant.
SEIR model is a fundamental class of models for disease transmission dynamics. Here \(S\) means the number of susceptible people, \(E\) means the number of exposed people, \(I\) means the number of people infected, \(R\) means the number of people recovered. Some people in stage \(S\) may become the people in stage \(E\), and people in stage \(E\) may become the people in stage \(I\), then the people in stage \(I\) may become people in stage R.
In this model, we have the following parameters: \(\beta\) = Exposure rate, \(\mu_{EI}\) = Incubation rate, \(\mu_{IR}\) = Recovery rate, \(\eta\) = Initial Susceptible rate, \(k\) = Initial infecteds, \(N\) = Susceptible size and \(\rho\) = Reporting rate. In our model, the value of \(N\) will be fixed at \(300,000,000\) according to the population of the US, \(k\) will be fixed at \(10\) and reporting rate will be fixed at \(0.1\). This is because roughly \(10\%\) of all cases are sequenced. And rest of the parameters will be estimated.
To find the best parameters of our models, we will estimate the value of the parameters by ourselves first, using the graph to find the approximate values of the parameters, and then using a local search method with the initial value find by our previous method to find more accurate values of the parameters, then we will use a global search method to find the value of the parameters.
For our SEIR models we use the construction in the class notes[8].
We started the local search with 100 iterations keeping the reporting rate, k and total number of population to be fixed and other parameters to start with our initial guesses. The likelihood seems to increase in general while some of the runs seems to get stuck in the local maxima. The plot for mu_EI suggests that higher incubation rate could probably suggests lower likelihood and the lines seems to shrink together as the iterations increase. The convergence of parameters like mu_IR, eta, Beta could be a problem that could be induced by not well-identified parameters, more exploration is required in the global search part to verify whether those are the problems that need to be solved.
Here are the estimated parameters from the local search that give the maximum likelihood (\(-919.8814657\)): \(\beta = 76.2387697\), \(\mu_{EI}= 0.3091796\), \(\mu_{IR}=0.699869\) and \(\eta=0.0329375\) with fixed \(N=3\times 10^{8}\) and \(k=10\), \(\rho=0.1\) values.
Above plot is the relationship of the paremetrs and likelihood during the global search. Here are the estimated parameters from the global search that give the maximum likelihood (\(-753.1738795\)): \(\beta = 73.4151881\), \(\mu_{EI}= 0.9130642\), \(\mu_{IR}=5.4640851\) and \(\eta=0.08235\) with fixed \(N=3\times 10^{8}\) and \(k=10\), \(\rho=0.1\) values. The likelihood of model from global search is much higher than the likelihood of model from local search, which shows the significant improvement on goodness of fit from global search. The result could also be supported by the a paper from National library of Medicine: the incubation rate for delta variant is about 4 days [9], which is close to the value in the model.
According to the simulation plot above, the simulated data from model captures the start time with some delay. Simulated reports start to increase around week 27 but this is a bit later than the actual report. The number of reports of delta variants increases sharply from 20 week and fluctuated from 30-50 week and decrease til after 60. The simulation somehow covers the data but quite different. Data have weeks of fluctuations after they reach their peak. But simulations have sharper peaks.
Now we repeat the process for Omicron cases.
The plot for loglikelihood shows that the likelihood increases and show perfect convergence as iteration goes to 100. mu_EI and mu_IR shows some rough convergence with some exceptions. eta does not show convergence well. it could be caused by either non well-defined parameter or inapppropriate search and require more exploration in global search
Here are the estimated parameters from the local search that give the maximum likelihood (\(-253.9265948\)): \(\beta = 224.0399294\), \(\mu_{EI}= 0.6410342\), \(\mu_{IR}=0.4322746\) and \(\eta=0.0361834\) with fixed \(N=3\times 10^{8}\) and \(k=10\), \(\rho=0.1\) values.
Above plot is the relationship of the parameters and likelihood during the global search. Here are the estimated parameters from the global search that give the maximum likelihood (\(-252.5641881\)): \(\beta = 389.3941958\), \(\mu_{EI}= 0.3995001\), \(\mu_{IR}=0.572261\) and \(\eta=0.0293225\) with fixed \(N=3\times 10^{8}\) and \(k=10\) \(\rho=0.1\) values.
The pair plots show that as mu_IR being close to 0.5, the likelihood gets larger. The closer mu_EI gets to 0.3 likelihood gets larger. The likelihood of model from global search is a little bit higher than the likelihood of model from local search, which suggests a little improvement.
The plot shows that the model from global search captures the trend and time of peak of data. The simulated data sharply increased from week 7 and got to peak about week 13 and decreased after that, which similar to the collected data. The number of reports at peak is not captured well from the model, which probably means more complex model are required to develope to make improvement.
For estimated 4 parameters, \(\beta\), \(\mu_{EI}\), \(\mu_{IR}\), \(\eta\), we look at which one of them differs greatly in global optimzation plots. By looking at those plots, for both datasets, \(\mu_{EI}\), \(\mu_{IR}\) and \(\eta\) circled around similar ranges in likelihood optmization. But on the other hand \(\beta\) differed greatly.For global optimization of Delta variant, \(\beta\) ranged between 50, 200. But for Omicron range was between (150, 600). Therefore we took a step forward and compared profile likelihood of \(\beta\) for two datasets.
We set the box for doing profile likelihood by making the boundary for non fixed variables other than beta and drew the 95% confidence interval. From this plot, we can find that the likelihood peak when \(\beta\) is about 130 for Delta variant. And by the confidence interval, we roughly expect the maximum likelihood be in somewhere between 100 and 150. This does not cover our maximum likelihood estimate in global search which was 73.4151881. This might happen when there is high correlation between model parameters, which should be further investigated. For Omicron, profile likelihood of \(\beta\) suggests a 95% interval roughly between 320 and 440, which covers the MLE in the global search. Nonetheless, it is clear that, \(\beta\) is significantly larger for the Omicron variant.
In this report, we construct SEIR models to explain the spread of Delta and Omicron variant in the United States. For our model, we fix the parameter \(N = 300,000,000, k = 10, \rho = 0.1\) By comparing the log likelihood, we get the estimated value of the parameters of the SEIR model for each variant, the estimated parameters have been shown in the previous part.
By comparing the estimated parameters for the two variants, we can see that the value of \(\beta\) of the Omicron variant is much larger than that of the Delta variant, the value of \(\mu_{EI}\) of the Omicron variant is less than that of the Delta variant but the product of \(\beta\) and \(\mu_{EI}\) of the Omicron variant is larger than that of the Delta variant. The comparison means that the Omicron variant can spread faster than the Delta variant, which corresponds with the fact that the Omicron cases increasing much faster than the Delta cases, which could be supported with report from the article Omicron variant spreads about 70 times faster than delta, expert says [10]. The value of \(\mu_{IR}\) of the Omicron variant is much less than that of the Delta variant, which means that it will take more time for a people infected by Omicron variant to recover than the people infected by Delta variant, which corresponds to the higher peak value of the infected number of Omicron variant. The value of \(\eta\) of the Omicron variant is less than that of the Delta variant, showing that less people are facing the risk of infected by the Omicron variant than infected by the Delta variant, which explains why the number of people infected by the Omicron variant decreases much faster than the number of people infected by the Delta variant. However, our findings for parameters other than \(\beta\) should be interpreted causiosly since we could not find time to run profile likelihood for those and get a statisticly significcant comparison. Rather we only comment on point estimates.
According to our simulation, the SEIR model we get can explain the spread of the two variants, while our estimated model still have some limitations. For example, due to the influence of different prevention and control policy in different time, the spread speed of the virus may be different. Besides, the rate of sequenced cases may be different in different time. To get better models later, more information is needed and also more complex models may be needed.
References.