Introduction

The United States now leads the world in confirmed cases of Covid-19. On April 28, 2020, the United States had 989,357 confirmed cases, according to [1]. This is far beyond the runner-up, Spain, with its 232,128 confirmed cases. After one month of lockdown in the United States, some protesters and some politicians are calling for an end to the lockdown, and some states have already begun easing restrictions.

Did the lockdown have any effect? What would the confirmed case count look like had there been no lockdown? Did the lockdown flatten the curve enough to warrant lifting restrictions? In this final project I attempt to answer these questions and compare the results with those for South Korea from my midterm project [2]. I use some coding patterns from [2] for least squares regression to fit a quadratic function to pre-lockdown US confirmed cases and a linear function to US confirmed cases during lockdown. Then I use the SIR model for the flu from Professor Ionides’ Chapter 11 notes to build a pomp object for confirmed cases of Covid-19 and to simulate from it. The main findings are: before lockdown growth was quadratic, during lockdown growth was very close to linear, and during lockdown there were weekly peaks on Fridays. Finally, the pomp simulations did not fit the overall data very well, despite much experimentation with the parameters before simulating.

Data Description, Data Transformation, and Visualization

The data source is the Johns Hopkins Center for Systems Science and Engineering, linked in [3] below. I used the time series csv files labelled global rather than the US-specific files, and extracted the US row from each of the three global data sets. The data are cumulative counts of confirmed cases, recovered cases, and deaths on each day from 1/22/2020 to 4/22/2020. Because the website formatting might change in the future, I saved the data to several csv files, and this Rmd file reads from those csv files rather than from the website.

I prepared two data frames for analysis. The first data frame combines cumulative counts of US confirmed cases, recovered cases, and deaths, on each day from 1/22/2020 to 4/22/2020, as well as the number of days since 1/22/2020 (taking 1/22/2020 as day one). I then turned this first data frame into a zoo object, together with the dates. This first data frame was used for all the regression analysis. Below is a plot of this first data frame.

The second data frame was prepared as input for Professor Ionides’ SIR model in pomp. This data frame had two columns: the number of days since 1/22/2020 (taking 1/22/2020 as day one), and the number of people in the infected state on day \(n\). This corresponds to the variable \(B\) (boys with the flu) in the flu model example in Notes 11. In order to obtain the number of people \(B\) in the infected state on day \(n\), I subtracted from the cumulative confirmed count the cumulative recovered and cumulative dead (shifted by 1). A plot of the variable \(B\) is below.
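The transformation just described can be sketched on a toy data set. This is only an illustration on made-up numbers: the column names (`confirmed`, `recovered`, `deaths`) and the helper `lag1` are hypothetical, and it assumes that "shifted by 1" means that day \(n\) uses the recovered and dead totals from day \(n-1\).

```r
# Toy cumulative counts (made-up numbers, not the JHU data).
us <- data.frame(
  day       = 1:6,
  confirmed = c(1, 2, 5, 9, 15, 24),
  recovered = c(0, 0, 1, 2, 4, 7),
  deaths    = c(0, 0, 0, 1, 1, 2)
)

# Assumption: "shifted by 1" means removals are lagged by one day,
# so day n uses the recovered and dead totals from day n - 1
# (taken as 0 before day 1).
lag1 <- function(x) c(0, head(x, -1))

# People currently in the infected state on each day.
us$B <- us$confirmed - lag1(us$recovered) - lag1(us$deaths)

# Two-column input for the SIR model: day number and infected count.
sir_input <- us[, c("day", "B")]
print(sir_input)
```

If the shift in the actual analysis runs the other way, only the definition of `lag1` would change.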

Quadratic Fit before Lockdown, Linear Fit during Lockdown

Lockdowns in many states began around March 20, 2020, and the effects became visible about 10 days later, on April 1, 2020 (see the first graph above). Testing capacity in the US has been very limited since the first US case on January 22, 2020. Thus, looking at the first graph above, I think a reasonable time span on which to fit a quadratic model is March 27 to April 1. This balances as-reliable-as-possible testing with pre-lockdown data. Since the effects of the lockdown began showing on April 1, 2020, I fit a linear model to the time span April 1 to April 22.
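A minimal sketch of the two least squares fits, using `lm` on synthetic counts rather than the JHU data. The day numbers follow the convention above (1/22/2020 is day 1, so March 27 is day 66, April 1 is day 71, and April 22 is day 92); the variable names and the made-up curve are assumptions for illustration only.

```r
# Synthetic cumulative counts: quadratic up to day 71, then a straight line.
day <- 60:92
cases <- ifelse(day <= 71,
                1000 * (day - 55)^2,
                1000 * 16^2 + 28000 * (day - 71))
df <- data.frame(day = day, cases = cases)

# Split into the two fitting windows.
pre  <- subset(df, day >= 66 & day <= 71)   # March 27 to April 1
lock <- subset(df, day >= 71 & day <= 92)   # April 1 to April 22

# Quadratic model before lockdown, linear model during lockdown.
quad_fit <- lm(cases ~ day + I(day^2), data = pre)
lin_fit  <- lm(cases ~ day, data = lock)

print(coef(quad_fit))
print(coef(lin_fit))
```

With only six points in the pre-lockdown window, the quadratic coefficients in a real fit would be sensitive to small changes in the data, which is worth keeping in mind when extrapolating.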

A visual inspection of the plots below shows a good fit. The confirmed case count is in red, and the blue curve is the quadratic fit. The purple line is the linear fit, which almost coincides with the confirmed count during lockdown. The plots of the quadratic model and linear model begin at the first day I used to fit them, March 27 and April 1, respectively.

We can make a few comments about the graph. Linear growth is a very good model for confirmed cases under lockdown, while quadratic growth looks good pre-lockdown (though of course very few points were used to fit the quadratic model). Thus, it appears that the lockdown did slow the spread of Covid-19. A very distinctive feature of the good fit of the linear model is that the confirmed case curve during lockdown did not begin to level off at all. The lockdown did not flatten the curve enough to warrant lifting restrictions. Another distinctive feature of the graph is the wide gap between the quadratic prediction on April 22 and the actual confirmed case count on April 22. This difference is 366,606. In other words, had the lockdown not been in place, I estimate there would have been 366,606 more cases on April 22, which is 43.7% more than the actual count.
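The counterfactual estimate amounts to extrapolating the pre-lockdown quadratic to April 22 (day 92) and comparing it with the observed count. A minimal sketch on made-up numbers follows; `actual_day92` and the toy pre-lockdown points are hypothetical and are not the values behind the 366,606 figure.

```r
# Made-up pre-lockdown points (days 66 to 71) that follow a quadratic.
pre <- data.frame(day = 66:71)
pre$cases <- 1000 * (pre$day - 55)^2
quad_fit <- lm(cases ~ day + I(day^2), data = pre)

# Hypothetical observed cumulative count on day 92 (April 22).
actual_day92 <- 950000

# Extrapolate the quadratic to day 92 and compare with the observed count.
predicted_day92 <- unname(predict(quad_fit, newdata = data.frame(day = 92)))
excess     <- predicted_day92 - actual_day92
pct_excess <- 100 * excess / actual_day92

cat("Excess cases without lockdown:", round(excess), "\n")
cat("Percent more cases:", round(pct_excess, 1), "\n")
```

Read the same way, the figures reported above (an excess of 366,606 at 43.7%) imply an observed count of roughly 839,000 on April 22.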

The US numbers are even more dramatic when we view them in comparison to the analogous numbers for South Korea that I found in my midterm project [2]. See the South Korean graph further below, which is taken from my midterm project [2]. There I found that South Korea would have had 43.7% more cases on March 9, had they not implemented their lockdown measures some weeks earlier (the South Korean measures appear to start having an effect on February 29th). The South Korean percentage 43.7% is the same as the US percentage, but the South Korean time span is 10 days and the US time span is 22 days (counting from the visible start of the effect). Most dramatically, the South Korean curve quickly levelled off after 10 days. The US line shows no sign of levelling off after 22 days.

Caption: South Korean Quadratic Fit before Lockdown from [2].

Friday Peaks during Lockdown

I detrended the lockdown time series and found peaks on Fridays. The tick marks are Sundays (April 6, 13, and 20), and the peaks are two days earlier. I think three successive peaks on the same day of the week are unlikely to be due to chance, so I think the pattern is real. One theory might be that people are working despite symptoms and decide to go to the hospital as soon as the work week is over. On the other hand, it is unclear how many people are actually working under lockdown.
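One way to look for a weekly pattern of this kind is to subtract a fitted linear trend and average the residuals by day of the week. The sketch below does this on a synthetic series with an artificial bump planted on Fridays; the series, the bump size, and all variable names are assumptions for illustration, and this is not necessarily how the detrending in this project was carried out.

```r
dates <- seq(as.Date("2020-04-01"), as.Date("2020-04-22"), by = "day")
day   <- seq_along(dates)

# ISO weekday number: 1 = Monday, ..., 5 = Friday, 7 = Sunday.
wday <- as.integer(format(dates, "%u"))

# Made-up cumulative series: straight line plus a bump on Fridays.
cases <- 200000 + 28000 * day + ifelse(wday == 5, 3000, 0)

# Detrend by removing the least squares line.
fit <- lm(cases ~ day)
detrended <- residuals(fit)

# Average residual by weekday; the largest mean marks the weekly peak.
mean_by_wday <- tapply(detrended, wday, mean)
peak_wday <- as.integer(names(which.max(mean_by_wday)))

print(round(mean_by_wday))
cat("Weekday with the largest mean residual (ISO number):", peak_wday, "\n")
```

With only three weeks of data each weekday mean rests on three or four points, so a pattern found this way is suggestive rather than conclusive.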

Fitting a pomp Object to Confirmed Cases Using the SIR Flu Model in pomp from Professor Ionides’ Chapter 11 Notes

Lastly, I fit Professor Ionides’ SIR flu model [4] to the confirmed cases. The transformation of the data to the count of people in the infected state on day \(n\) was described above in the data section. An SIR model is a compartment model with three states: susceptible, infected, and recovered. For the US data, I defined the “recovered” compartment to include both the recovered and the dead. From Chapter 11 Notes slides 23, 24, and 28, the mathematical specification of the model is:

\[N=S+I+R,\] \[\Delta{N_{SI}} \sim \text{Binomial}\big(S,1-e^{-\beta\frac{I}{N}\Delta{t}}\big),\] \[\Delta{N_{IR}} \sim \text{Binomial}\big(I,1-e^{-\gamma\Delta{t}}\big),\] \[B_t \sim \text{Binomial}\big(H(t)-H(t-1),\rho\big).\] Here \(\Delta N_{XY}\) is the number of people who move from compartment \(X\) to compartment \(Y\) in a time interval of length \(\Delta t\), \(H\) is the accumulator state whose daily increment enters the measurement model, \(\rho\) is the reporting probability, and \(B\) is the reported case count.

For the simulation I used the parameters \(\beta=1.2\), \(\gamma=1.18\), \(\rho=0.6\), \(N=3\times 10^8\) (approximate US population), and initial values of \(S=N-1\), \(I=1\), \(R=0\), and \(H=0\). I do not display the code for building the pomp object here, because it is taken directly from Chapter 11. I only display the simulation call to show that it is working. I simulated 40 times. It is strange that of the 40 simulations, most stay at 0 the entire time, and all are below the data curve (though some may eventually surpass it). In fact, I had to try many different parameters to find some that gave non-zero simulations at all.

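## ----sir_sim------------------------------------------------------------------
library(pomp)
library(ggplot2)
# Simulate 40 trajectories from the pomp object `sir` (built as in Chapter 11),
# keeping the observed data alongside them as the rows with .id == "data".
sims <- simulate(sir,params=c(Beta=1.2,
                              gamma=1.18,
                              rho=.6,
                              N=3*10^8),
                 nsim=40,format="data.frame",include.data=TRUE)
# Plot B against day, one line per run, colored by data versus simulation.
ggplot(sims,mapping=aes(x=day,y=B,group=.id,color=.id=="data")) +
  geom_line() +
  guides(color="none") +
  ggtitle("40 Simulations from the Flu Pomp Model, \nFit to Confirmed Coronavirus Cases in the US\nBlue=Data, Red=Simulation")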
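To see why so many runs can stay at 0, it may help to simulate the process part of the model in plain R, without pomp. The sketch below is an illustration under stated assumptions, not the Chapter 11 code: it assumes the standard SIR force of infection \(\beta I/N\) for the S to I transition, an Euler step of \(\Delta t = 1/6\) day, and it omits the measurement model. The function names `sir_step` and `simulate_final_I` are hypothetical.

```r
set.seed(531)

# One Euler step of a stochastic SIR model with binomial transitions.
# Assumption: the S -> I hazard is Beta * I / N (standard SIR force of infection).
sir_step <- function(state, Beta, gamma, N, dt) {
  dN_SI <- rbinom(1, state[["S"]], 1 - exp(-Beta * state[["I"]] / N * dt))
  dN_IR <- rbinom(1, state[["I"]], 1 - exp(-gamma * dt))
  c(S = state[["S"]] - dN_SI,
    I = state[["I"]] + dN_SI - dN_IR,
    R = state[["R"]] + dN_IR)
}

# Run the process for `days` days from a single infection and
# return the number of infected people at the end.
# Assumption: dt = 1/6 day; the step used in the actual analysis may differ.
simulate_final_I <- function(days, Beta, gamma, N, dt = 1/6) {
  state <- c(S = N - 1, I = 1, R = 0)
  for (k in seq_len(round(days / dt))) {
    state <- sir_step(state, Beta, gamma, N, dt)
  }
  state[["I"]]
}

# 40 runs over 92 days with the parameters used above.
final_I <- replicate(40, simulate_final_I(92, Beta = 1.2, gamma = 1.18, N = 3e8))
extinct <- sum(final_I == 0)
cat("Runs with no infections left on day 92:", extinct, "of 40\n")
```

With these parameters the ratio \(\beta/\gamma\) is about 1.017, only just above 1, so an outbreak that starts from a single infection tends to die out by chance before it can grow. This may explain why most of the 40 simulations stay at 0, and it suggests that a larger initial \(I\) or a larger \(\beta/\gamma\) would be needed to track the data.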

Discussion

Using least squares regression, I found that before lockdown US confirmed-case growth was quadratic, but during lockdown the growth was linear. I detrended the lockdown time series and discovered weekly peaks on Fridays. The pomp simulations did not fit the data very well. The major omission, of course, is likelihood-based methods. It would have been interesting to see pomp simulations using maximum likelihood estimates, but unfortunately I could not get the code to work, despite many hours of trying.

Another critique of this project is that the data are inaccurate. The testing capacity of the US is still very limited, so there are many unrecorded cases of Covid-19. This is an appropriate situation in which to apply a partially observed Markov process, because the measurement model is distinct from the process model.

The main takeaway of this project is that the US lockdown did have an effect. By my estimates, on April 22, 2020, there would have been 43.7% more cases, had there not been a US lockdown. The coming months will likely see many more cases, as several US states have already begun re-opening in late April 2020, although the lockdown line has not levelled off at all.

Acknowledgements

For the regression model fitting part I used coding patterns and the main idea from 2020 Midterm Project 43 “Assessing South Korean Measures to Halt Coronavirus Spread” [2], but adapted here to the US case instead of South Korea. [2] was my midterm project. In the present project I found a new way to plot the fitted models on the graph using NAs for the preceding days.

The R code from Professor Ionides’ Chapter 11 Notes [4] was used verbatim in the pomp section. The only modifications I made were a different data set, different parameters, a different population total, and a slight modification of the simulation call.

I did read the final projects [5] and [6] cited below from Winter 2018, but I could not use them because I was unable to get the code working on my computer or RStudio Cloud, unfortunately.

Chapter 14 of Paul Teetor’s cookbook [1C] was useful for my coding. The other references in the coding bibliography were useful for my midterm project, so also useful here for plotting. I used the knitr default set up options from 2018 Project 11 [5C] plus my own.

References

[1] COVID-19 Dashboard by the Center for Systems Science and Engineering at Johns Hopkins University, https://coronavirus.jhu.edu/map.html

[2] Midterm Project 43, Stats 531 Winter 2020, “Assessing South Korean Measures to Halt Coronavirus Spread”, https://ionides.github.io/531w20/midterm_project/project43/Midterm_Project1.html

[3] Github Site: Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE, https://github.com/CSSEGISandData/COVID-19, in particular I used the time series data called global at https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

[4] Edward Ionides, Notes Chapter 11: Dynamic Models and their Simulation by Euler’s Method, https://ionides.github.io/531w20/11/notes11.pdf

[5] Miao Wang, Final Project 15, Stats 531 Winter 2018, “A Case Study on Smallpox Transmission Dynamic in California 1928-1935”, https://ionides.github.io/531w18/final_project/15/final.html

[6] Final Project 42, Stats 531 Winter 2018, “Modelling of SARS in Beijing April-June, 2003”, https://ionides.github.io/531w18/final_project/42/final.html

References for Coding Tools

[1C] Paul Teetor, R Cookbook, O’Reilly Media Company, 2014

[2C] https://blog.revolutionanalytics.com/2014/01/quantitative-finance-applications-in-r-plotting-xts-time-series.html

[3C] https://joshuaulrich.github.io/xts/plotting_basics.html

[4C] Edward Ionides, Lecture notes, https://ionides.github.io/531w20/#class-notes

[5C] 531 Project 11 from 2018: Human Duodenal MMC Phase 3 Motility Model Based on Manometry Readings, https://ionides.github.io/531w18/midterm_project/project11/midterm_project.html