registerDoRNG(123294940)
foreach(i=1:30,.combine=c) %dopar% {
  library(pomp)
  measSIR %>% pfilter(params=params,Np=50000)
} -> pf
pf %>% logLik() %>% logmeanexp(se=TRUE) -> L_pf
L_pf
## se
## -289.375149 5.027848
registerDoRNG(482947940)
bake(file="local_search.rds",{
foreach(i=1:20,.combine=c) %do% {
library(pomp)
library(tidyverse)
measSIR %>%
mif2(
params=params,
Np=2000, Nmif=50,
cooling.fraction.50=0.5,
rw.sd=rw.sd(Beta=0.02, rho=0.02, mu_EI=0.02, eta=ivp(0.02))
)
} -> mifs_local
mifs_local
}) -> mifs_local
We perform a local search via iterated filtering around the initial guess in parameter space. We choose a common perturbation size of 0.02 for all parameters and set cooling.fraction.50=0.5 in mif2, so the perturbation size is halved after 50 iterations. We see the likelihood increase over iterations, so iterated filtering is working. We find that higher likelihood is usually associated with lower values of \(\rho\) and higher values of \(\mu_{EI}\). There is no clear trend for \(\beta\) and \(\eta\).
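As a quick sanity check on the cooling schedule (a sketch, not part of the analysis code): under mif2's default geometric cooling, the random-walk standard deviation after \(n\) iterations is scaled by cooling.fraction.50^(n/50), so a perturbation of 0.02 with cooling.fraction.50=0.5 shrinks as follows.

```r
## geometric cooling: perturbation sd is multiplied by cooling.fraction.50^(n/50)
rw_sd_at <- function(n, sd0 = 0.02, frac50 = 0.5) sd0 * frac50^(n / 50)
rw_sd_at(0)    # 0.02  (initial size)
rw_sd_at(50)   # 0.01  (halved after 50 iterations)
rw_sd_at(100)  # 0.005
```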
mifs_local %>%
  traces() %>%
  melt() %>%
  ggplot(aes(x=iteration,y=value,group=L1,color=factor(L1)))+
  geom_line()+
  guides(color=FALSE)+
  facet_wrap(~variable,scales="free_y")
registerDoRNG(900242057)
bake(file="lik_local.rds",{
foreach(mf=mifs_local,.combine=rbind) %do% {
library(pomp)
library(tidyverse)
evals <- replicate(10, logLik(pfilter(mf,Np=20000)))
ll <- logmeanexp(evals,se=TRUE)
mf %>% coef() %>% bind_rows() %>%
bind_cols(loglik=ll[1],loglik.se=ll[2])
} -> results
results
}) -> results
Because of the perturbations applied during iterated filtering, we run the particle filter again to evaluate the likelihood at each point estimate. According to the pairwise scatter plot below, there may be a ridge in the likelihood surface between \(\beta\) and \(\mu_{EI}\), and between \(\beta\) and \(\eta\).
pairs(~loglik+Beta+mu_EI+eta+rho,data=results,pch=16)
set.seed(2062379496)
runif_design(
  lower=c(Beta=5,mu_EI=0, rho=0.2,eta=0),
  upper=c(Beta=80,mu_EI=5, rho=0.9,eta=0.4),
  nseq=100
) -> guesses
mf1 <- mifs_local[[1]]
bake(file="global_search.rds",{
registerDoRNG(1270401374)
foreach(guess=iter(guesses,"row"), .combine=rbind) %do% {
library(pomp)
library(tidyverse)
mf1 %>%
mif2(params=c(unlist(guess),fixed_params)) %>%
mif2(Nmif=100) -> mf
replicate(
10,
mf %>% pfilter(Np=10000) %>% logLik()
) %>%
logmeanexp(se=TRUE) -> ll
mf %>% coef() %>% bind_rows() %>%
bind_cols(loglik=ll[1],loglik.se=ll[2])
} -> results
results
}) %>%
filter(is.finite(loglik)) -> results
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Beta = col_double(),
## mu_IR = col_double(),
## mu_EI = col_double(),
## rho = col_double(),
## eta = col_double(),
## N = col_double(),
## loglik = col_double(),
## loglik.se = col_double()
## )
Next, we perform a global search of the likelihood surface. We use a box of reasonable parameter values: \(\beta \in (5, 80)\), \(\mu_{EI} \in (0, 5)\), \(\rho \in (0.2, 0.9)\), \(\eta \in (0, 0.4)\). We draw starting points uniformly from this box and re-run iterated filtering from each of them. Once all runs complete, we use the particle filter to evaluate the likelihood at each endpoint. We can see that the parameter estimates converge from the uniformly sampled box to regions of higher likelihood. To get a closer look, we examine each parameter's profile likelihood.
read_csv("measles_params.csv") %>%
filter(loglik>max(loglik)-20,loglik.se<2) %>%
sapply(range) -> box
set.seed(1196696958)
profile_design(
  eta=seq(0.01,0.85,length=30),
  lower=box[1,c("Beta","mu_EI","rho")],
  upper=box[2,c("Beta","mu_EI","rho")],
  nprof=15, type="runif"
) -> guesses
mf1 <- mifs_local[[1]]
registerDoRNG(830007657)
bake(file="eta_profile.rds",{
foreach(guess=iter(guesses,"row"), .combine=rbind) %do% {
library(pomp)
library(tidyverse)
mf1 %>%
mif2(params=c(unlist(guess),fixed_params),
rw.sd=rw.sd(Beta=0.02,mu_EI=0.02, rho=0.02)) %>%
mif2(Nmif=100,cooling.fraction.50=0.3) -> mf
replicate(
10,
mf %>% pfilter(Np=1000) %>% logLik()) %>%
logmeanexp(se=TRUE) -> ll
mf %>% coef() %>% bind_rows() %>%
bind_cols(loglik=ll[1],loglik.se=ll[2])
} -> results
results
}) -> results
read_csv("measles_params.csv") %>%
bind_rows(results) %>%
filter(is.finite(loglik)) %>%
arrange(-loglik) %>%
write_csv("measles_params.csv")
read_csv("measles_params.csv") %>%
filter(loglik>max(loglik)-10) -> all
We start by checking the profile likelihood at different values of \(\eta\). Because of limited computational resources, we evaluate only a few values of \(\eta\) across its plausible range, which causes the gaps in the likelihood plot. Even so, we can see that the parameter estimates converge to regions of high likelihood.
pairs(~loglik+Beta+mu_EI+eta+rho,data=all,pch=16)
Here we construct the profile likelihood of \(\eta\) and apply Wilks' theorem to obtain an approximate 95% confidence interval. According to the plot below, the 95% confidence interval for \(\eta\) is roughly (0.1, 0.3).
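By Wilks' theorem, twice the log likelihood ratio is asymptotically \(\chi^2_1\), so the approximate 95% confidence interval is the set \[ \left\{\eta : \ell_{\mathrm{profile}}(\eta) > \ell_{\max} - \tfrac{1}{2}\chi^2_{1,\,0.95}\right\}, \] where \(\chi^2_{1,\,0.95} \approx 3.84\); that is, all values of \(\eta\) whose profile log likelihood lies within about 1.92 units of the maximum. This is the cutoff computed as ci.cutoff below.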
maxloglik <- max(results$loglik,na.rm=TRUE)
ci.cutoff <- maxloglik-0.5*qchisq(df=1,p=0.95)
results %>%
  filter(is.finite(loglik)) %>%
  group_by(round(eta,5)) %>%
  filter(rank(-loglik)<3) %>%
  ungroup() %>%
  ggplot(aes(x=eta,y=loglik))+
  geom_point()+
  geom_smooth(method="loess",span=0.25)+
  geom_hline(color="red",yintercept=ci.cutoff)+
  lims(y=maxloglik-c(5,0))
read_csv("measles_params.csv") %>%
group_by(cut=round(rho,2)) %>%
filter(rank(-loglik)<=10) %>%
ungroup() %>%
select(-cut,-loglik,-loglik.se) -> guesses
mf1 <- mifs_local[[1]]
registerDoRNG(2105684752)
bake(file="rho_profile.rds",{
foreach(guess=iter(guesses,"row"), .combine=rbind) %do% {
library(pomp)
library(tidyverse)
mf1 %>%
mif2(params=guess,
rw.sd=rw.sd(Beta=0.02,mu_EI=0.02, eta=ivp(0.02))) %>%
mif2(Nmif=100,cooling.fraction.50=0.3) %>%
mif2(Nmif=100,cooling.fraction.50=0.1) -> mf
replicate(
10,
mf %>% pfilter(Np=10000) %>% logLik()) %>%
logmeanexp(se=TRUE) -> ll
mf %>% coef() %>% bind_rows() %>%
bind_cols(loglik=ll[1],loglik.se=ll[2])
} -> results
results
}) -> results
read_csv("measles_params.csv") %>%
bind_rows(results) %>%
filter(is.finite(loglik)) %>%
arrange(-loglik) %>%
write_csv("measles_params.csv")
results %>%
  filter(is.finite(loglik)) -> results
Again, we can do the same for the reporting rate \(\rho\). We find that the parameter estimates converge to regions of higher likelihood.
pairs(~loglik+Beta+mu_EI+eta+rho,data=results,pch=16)
The best SEIR result from the global search has log likelihood -120 with a standard error of 0.13. From the lecture notes, the best SIR log likelihood is also -120. These comparable values suggest there may not be enough evidence to justify a more complicated model than SIR.
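One way to make this comparison concrete is AIC. The following is a hedged sketch: the log likelihoods are the values quoted above, and the parameter counts k are assumptions about how many parameters each model estimates, with SEIR adding \(\mu_{EI}\).

```r
## AIC = 2k - 2*loglik; lower is better
loglik_sir  <- -120  # best SIR log likelihood (lecture notes)
loglik_seir <- -120  # best SEIR log likelihood (our global search)
k_sir  <- 3          # assumed: Beta, rho, eta estimated
k_seir <- 4          # assumed: adds mu_EI
aic_sir  <- 2 * k_sir  - 2 * loglik_sir   # 246
aic_seir <- 2 * k_seir - 2 * loglik_seir  # 248
```

With essentially equal log likelihoods, the extra parameter penalizes SEIR slightly, consistent with the conclusion that the simpler SIR model suffices.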
According to the results, we achieve the highest likelihood with the parameter values \(\beta=18\), \(\mu_{IR}=2\), \(\mu_{EI}=20\), \(\rho=0.2\), \(\eta=0.14\), \(N=38000\). From the pairwise likelihood plot above, we find that \(\beta\) converges to a region between 0 and 50. From the profile likelihood confidence intervals, the 95% confidence interval for \(\eta\) is roughly (0.1, 0.3), and for \(\rho\) roughly (0.1, 0.4).
On the other hand, we know from the lecture notes that for the SIR model, \(\beta\) converges to a region around 20, the 95% confidence interval for \(\eta\) is roughly (0.1, 0.5), and for \(\rho\) roughly (0.1, 0.3). Thus, we conclude that the parameter estimates are not significantly different.
results %>%
  filter(loglik>max(loglik)-10,loglik.se<1) %>%
  group_by(round(rho,2)) %>%
  filter(rank(-loglik)<3) %>%
  ungroup() %>%
  ggplot(aes(x=rho,y=loglik))+
  geom_point()+
  geom_hline(
    color="red",
    yintercept=max(results$loglik)-0.5*qchisq(df=1,p=0.95)
  )
This solution is adapted from a homework submission by Xingwen Wei.