Study of COVID-19 data

Introduction

The data are COVID-19 case counts for Hubei, China, from 22/01/2020 to 21/04/2020. They come from a dataset on the humdata website (https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases). The original dataset contains global counts of cases, recoveries and deaths. Here I use only the infection and recovery counts.

However, COVID-19 is a serious and highly contagious infectious disease, and during its spread in Hubei many other factors affected the number of cases, such as the supply of testing reagents and the shutdown of public transportation in the city.

So my question is: is the traditional SIR model we learned in lecture suitable for COVID-19? If not, what may be the reason? In this project I apply the SIR model to my data to see how well it fits the coronavirus situation.

Data overview

The original data from the website are cumulative cases, recoveries and deaths for many countries worldwide. Here I select only the data for Hubei, China. I then obtain the number of infections as cumulative cases minus recoveries.
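As a minimal sketch of this step, the Python snippet below computes the number of infections as cumulative cases minus cumulative recoveries. The record type, the field names and the three example rows are hypothetical placeholders, not the real Hubei data, and Python is used only for illustration. Following the text above, deaths are not subtracted.

```python
from dataclasses import dataclass


@dataclass
class DailyCount:
    """One day of cumulative counts (hypothetical field names)."""
    date: str
    cumulative_cases: int
    cumulative_recovered: int


def current_infections(rows):
    """Return (date, infected) pairs, with infected = cumulative cases - recoveries."""
    result = []
    for row in rows:
        infected = row.cumulative_cases - row.cumulative_recovered
        if infected < 0:
            # Recoveries can never exceed cases; this signals a data problem.
            raise ValueError(f"recoveries exceed cases on {row.date}")
        result.append((row.date, infected))
    return result


# Made-up placeholder numbers, NOT the Hubei data.
example_rows = [
    DailyCount("day 1", 100, 0),
    DailyCount("day 2", 150, 10),
    DailyCount("day 3", 210, 35),
]
infections = current_infections(example_rows)
```

The check for negative values is a small safeguard against rows where the recovery count was revised independently of the case count.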

The plot of infections is shown below.

The plot of recoveries is shown below.

From the two graphs above, we can see that the rough trends of the infection and recovery curves for COVID-19 are similar to those of a traditional infectious disease. So we first try to fit the data with a simple SIR model.

Model fit

SIR

We can first use a simple SIR model with three states, susceptible, infected and recovered, which is suitable for our data.

To apply the SIR model to our data, we use the pomp framework (pomp version 2).

First, we compare a simulation of a simple SIR model with the original data, using the parameters:

Beta=1.5, gamma=1, rho=0.9, N=1000000.
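To make the structure of this model concrete, here is a minimal Python sketch of the deterministic SIR skeleton with the parameter values above. It is not the pomp simulation used in this project. The time unit (days), the initial number of infected (100), the Euler step size and the measurement rule (expected report = rho times the infected state) are all assumptions made for illustration.

```python
def simulate_sir(beta, gamma, rho, population, initial_infected, days,
                 steps_per_day=20):
    """Euler integration of the deterministic SIR equations.

    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I.
    Returns one (S, I, R, expected_report) tuple per day, day 0 included.
    expected_report = rho * I is an ASSUMED measurement model.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    trajectory = [(s, i, r, rho * i)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r, rho * i))
    return trajectory


# Parameter values from the text; initial_infected and days are assumptions.
trajectory = simulate_sir(beta=1.5, gamma=1.0, rho=0.9, population=1_000_000,
                          initial_infected=100, days=90)
# Day on which the infected compartment is largest.
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
```

With Beta=1.5 and gamma=1 the ratio Beta/gamma is 1.5, so the skeleton produces a single epidemic wave whose timing and height are fixed entirely by these few numbers, which is one way to see why it has little freedom to follow the observed curve.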

We can see from the simulation plot that the simulation is not close to the data. This is not surprising, since a plain SIR model is too simple for such a complex situation. So I consider adding the extra states R1, R2 and R3, as taught in lecture.

SIR with extra states R1, R2 and R3

After adding these states there are five states: S, I, R1, R2 and R3. There are also two extra parameters, \(\mu_{R1}\) and \(\mu_{R2}\).

In my case, I assume R1 to be the state in which patients are isolated and under treatment, R2 the state in which patients are still convalescing, and R3 the state in which patients are regarded as healthy.
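The flow between these five states can be sketched as follows. This is a deterministic Python illustration, not the pomp model fitted in this project. It assumes that \(\mu_{R1}\) is the rate of moving from R1 to R2 and \(\mu_{R2}\) the rate of moving from R2 to R3, that time is measured in days, and that the population size and initial number of infected are 1000000 and 100. The rate values in the usage lines are the ones reported from the local search.

```python
def simulate_chain(beta, mu_ir, mu_r1, mu_r2, population,
                   initial_infected, days, steps_per_day=20):
    """Euler integration of the flow S -> I -> R1 -> R2 -> R3.

    Returns one dict of compartment sizes per day, day 0 included.
    """
    state = {"S": float(population - initial_infected),
             "I": float(initial_infected),
             "R1": 0.0, "R2": 0.0, "R3": 0.0}
    dt = 1.0 / steps_per_day
    history = [dict(state)]
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = beta * state["S"] * state["I"] / population * dt
            to_r1 = mu_ir * state["I"] * dt    # I  -> R1 (under treatment)
            to_r2 = mu_r1 * state["R1"] * dt   # R1 -> R2 (convalescing)
            to_r3 = mu_r2 * state["R2"] * dt   # R2 -> R3 (healthy)
            state["S"] -= infections
            state["I"] += infections - to_r1
            state["R1"] += to_r1 - to_r2
            state["R2"] += to_r2 - to_r3
            state["R3"] += to_r3
        history.append(dict(state))
    return history


# Rates as reported from the local search; population, initial_infected
# and days are assumptions for this sketch.
history = simulate_chain(beta=2.0, mu_ir=0.5, mu_r1=0.1, mu_r2=0.3,
                         population=1_000_000, initial_infected=100, days=90)
```

The extra states only delay the passage from infected to fully recovered. They do not change how new infections arise, so the shape of the infection curve is still driven by Beta and mu_IR alone.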

We can first apply a local search to our data to find suitable parameters.

From the local search, the parameter set with the largest likelihood is: Beta=2, mu_IR=0.5, rho=1, mu_R1=0.1, mu_R2=0.3.

We can then look at the simulation with these parameters. The simulation is still not close to our real data.

Then we can look at the mif2 filter diagnostic plots.

The first plot is strange: many of the sample points are zero, which may mean that this model is not well suited to the COVID-19 data. The second plot shows that neither the log-likelihood nor the parameters show a clear convergence trend. So it appears that neither the simple SIR model nor the modified SIR model is suitable for the COVID-19 data.
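One quantity often shown in particle filter diagnostics is the effective sample size. Whether the zeros in the first plot refer to this quantity is an assumption on my part. The Python sketch below uses the standard formula, the squared sum of the particle weights divided by the sum of squared weights, and the function name is hypothetical rather than a pomp call.

```python
def effective_sample_size(weights):
    """Effective sample size (sum w)^2 / sum(w^2) for non-negative weights.

    Returns 0.0 when every weight is zero, i.e. no particle is
    compatible with the observation.
    """
    total = sum(weights)
    if total <= 0:
        return 0.0
    sum_of_squares = sum(w * w for w in weights)
    return total * total / sum_of_squares


equal = effective_sample_size([0.25, 0.25, 0.25, 0.25])   # all particles useful
degenerate = effective_sample_size([1.0, 0.0, 0.0, 0.0])  # one particle carries everything
failed = effective_sample_size([0.0, 0.0, 0.0, 0.0])      # no particle matches the data
```

When the simulated states are far from the observations, most particles receive weights near zero and this number collapses, which would be consistent with a model that cannot reproduce the data.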

Conclusion

From the analysis above, we can see that although the plots of the original COVID-19 data have a pattern similar to that of an ordinary infectious disease, the simple SIR model is not suitable for the COVID-19 situation. My experiments show that the model parameters did not converge, and the likelihood analysis also does not look correct.

Possible reasons:

The number of confirmed coronavirus cases depends on the number of testing reagents available. So the supply of reagents can be a latent constraint on the observed coronavirus data.
Some political measures also need to be considered. For instance, the control of public transportation can have a great influence on the number of COVID-19 cases.
Another reason may be that COVID-19 has a long incubation period. So a model with only infection and recovery states may be too weak to simulate the spread of COVID-19.

Future work: A more complex model such as the SEIR model may fit the COVID-19 data better, since the SEIR model covers a more complex situation and adds an exposed state, which is appropriate for COVID-19.

Also, running the model with more iterations might reveal clearer trends. However, I currently have a very poor network connection, and this will last until May, so the plots and the project are kept simple.

Reference

[1] Dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/00235/

[2] STATS 531 course notes.