registerDoRNG(123294940)
foreach(i=1:30,.combine=c) %dopar% {
  library(pomp)
  measSIR %>% pfilter(params=params,Np=50000)
} -> pf
pf %>% logLik() %>% logmeanexp(se=TRUE) -> L_pf
L_pf
## se
## -289.375149 5.027848
registerDoRNG(482947940)
bake(file="local_search.rds",{
foreach(i=1:20,.combine=c) %do% {
library(pomp)
library(tidyverse)
measSIR %>%
mif2(
params=params,
Np=2000, Nmif=50,
cooling.fraction.50=0.5,
rw.sd=rw.sd(Beta=0.02, rho=0.02, mu_EI=0.02, eta=ivp(0.02))
)
} -> mifs_local
mifs_local
}) -> mifs_local
We perform a local search via iterated filtering around the initial guess in parameter space. We choose a common perturbation size of 0.02 for all parameters and set cooling.fraction.50=0.5 in mif2, so the perturbation size is halved after 50 iterations. We see the likelihood increase over iterations, so iterated filtering is working. We find that higher likelihood is usually associated with lower values of \(\rho\) and higher values of \(\mu_{EI}\). There is no clear trend for \(\beta\) and \(\eta\).
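As a quick sanity check on the cooling schedule (a sketch, not part of the analysis code): under mif2's default geometric cooling, the random-walk standard deviation after \(n\) iterations is scaled by cooling.fraction.50^(n/50), so a perturbation of 0.02 with cooling.fraction.50=0.5 shrinks as follows.

```r
## geometric cooling: perturbation sd is multiplied by cooling.fraction.50^(n/50)
rw_sd_at <- function(n, sd0 = 0.02, frac50 = 0.5) sd0 * frac50^(n / 50)
rw_sd_at(0)    # 0.02  (initial size)
rw_sd_at(50)   # 0.01  (halved after 50 iterations)
rw_sd_at(100)  # 0.005
```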
mifs_local %>%
  traces() %>%
  melt() %>%
  ggplot(aes(x=iteration,y=value,group=L1,color=factor(L1)))+
  geom_line()+
  guides(color=FALSE)+
  facet_wrap(~variable,scales="free_y")
registerDoRNG(900242057)
bake(file="lik_local.rds",{
foreach(mf=mifs_local,.combine=rbind) %do% {
library(pomp)
library(tidyverse)
evals <- replicate(10, logLik(pfilter(mf,Np=20000)))
ll <- logmeanexp(evals,se=TRUE)
mf %>% coef() %>% bind_rows() %>%
bind_cols(loglik=ll[1],loglik.se=ll[2])
} -> results
results
}) -> results
Because of the perturbations applied during iterated filtering, we run the particle filter again to evaluate the likelihood at each point estimate. According to the pairwise scatter plot below, there may be a ridge in the likelihood surface between \(\beta\) and \(\mu_{EI}\), and between \(\beta\) and \(\eta\).
pairs(~loglik+Beta+mu_EI+eta+rho,data=results,pch=16)
set.seed(2062379496)
runif_design(
  lower=c(Beta=5,mu_EI=0, rho=0.2,eta=0),
  upper=c(Beta=80,mu_EI=5, rho=0.9,eta=0.4),
  nseq=100
) -> guesses
mf1 <- mifs_local[[1]]
bake(file="global_search.rds",{
registerDoRNG(1270401374)
foreach(guess=iter(guesses,"row"), .combine=rbind) %do% {
library(pomp)
library(tidyverse)
mf1 %>%
mif2(params=c(unlist(guess),fixed_params)) %>%
mif2(Nmif=100) -> mf
replicate(
10,
mf %>% pfilter(Np=10000) %>% logLik()
) %>%
logmeanexp(se=TRUE) -> ll
mf %>% coef() %>% bind_rows() %>%
bind_cols(loglik=ll[1],loglik.se=ll[2])
} -> results
results
}) %>%
filter(is.finite(loglik)) -> results
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Beta = col_double(),
## mu_IR = col_double(),
## mu_EI = col_double(),
## rho = col_double(),
## eta = col_double(),
## N = col_double(),
## loglik = col_double(),
## loglik.se = col_double()
## )
Next, we perform a global search of the likelihood surface. We use a box of reasonable parameter values: \(\beta \in (5, 80)\), \(\mu_{EI} \in (0, 5)\), \(\rho \in (0.2, 0.9)\), \(\eta \in (0, 0.4)\). We draw starting points uniformly from this box and re-run iterated filtering from each of them. Once all runs complete, we use the particle filter to evaluate the likelihood at each endpoint. We can see that the parameter estimates converge from the uniformly sampled box to regions of higher likelihood. To get a closer look, we examine each parameter's profile likelihood.
read_csv("measles_params.csv") %>%
filter(loglik>max(loglik)-20,loglik.se<2) %>%
sapply(range) -> box
set.seed(1196696958)
profile_design(
  eta=seq(0.01,0.85,length=30),
  lower=box[1,c("Beta","mu_EI","rho")],
  upper=box[2,c("Beta","mu_EI","rho")],
  nprof=15, type="runif"
) -> guesses
mf1 <- mifs_local[[1]]
registerDoRNG(830007657)
bake(file="eta_profile.rds",{
foreach(guess=iter(guesses,"row"), .combine=rbind) %do% {
library(pomp)
library(tidyverse)
mf1 %>%
mif2(params=c(unlist(guess),fixed_params),
rw.sd=rw.sd(Beta=0.02,mu_EI=0.02, rho=0.02)) %>%
mif2(Nmif=100,cooling.fraction.50=0.3) -> mf
replicate(
10,
mf %>% pfilter(Np=1000) %>% logLik()) %>%
logmeanexp(se=TRUE) -> ll
mf %>% coef() %>% bind_rows() %>%
bind_cols(loglik=ll[1],loglik.se=ll[2])
} -> results
results
}) -> results
read_csv("measles_params.csv") %>%
bind_rows(results) %>%
filter(is.finite(loglik)) %>%
arrange(-loglik) %>%
write_csv("measles_params.csv")
read_csv("measles_params.csv") %>%
filter(loglik>max(loglik)-10) -> all
We start by checking the profile likelihood at different values of \(\eta\). Because of limited computational resources, we evaluate only a few values of \(\eta\) across its plausible range, which causes the gaps in the likelihood plot. Even so, we can see that the parameter estimates converge to regions of high likelihood.
pairs(~loglik+Beta+mu_EI+eta+rho,data=all,pch=16)
Here we construct the profile likelihood of \(\eta\) and apply Wilks' theorem to obtain an approximate 95% confidence interval. According to the plot below, the 95% confidence interval for \(\eta\) is roughly (0.1, 0.3).
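By Wilks' theorem, twice the log likelihood ratio is asymptotically \(\chi^2_1\), so the approximate 95% confidence interval is the set \[ \left\{\eta : \ell_{\mathrm{profile}}(\eta) > \ell_{\max} - \tfrac{1}{2}\chi^2_{1,\,0.95}\right\}, \] where \(\chi^2_{1,\,0.95} \approx 3.84\); that is, all values of \(\eta\) whose profile log likelihood lies within about 1.92 units of the maximum. This is the cutoff computed as ci.cutoff below.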
maxloglik <- max(results$loglik,na.rm=TRUE)
ci.cutoff <- maxloglik-0.5*qchisq(df=1,p=0.95)
results %>%
  filter(is.finite(loglik)) %>%
  group_by(round(eta,5)) %>%
  filter(rank(-loglik)<3) %>%
  ungroup() %>%
  ggplot(aes(x=eta,y=loglik))+
  geom_point()+
  geom_smooth(method="loess",span=0.25)+
  geom_hline(color="red",yintercept=ci.cutoff)+
  lims(y=maxloglik-c(5,0))
read_csv("measles_params.csv") %>%
group_by(cut=round(rho,2)) %>%
filter(rank(-loglik)<=10) %>%
ungroup() %>%
select(-cut,-loglik,-loglik.se) -> guesses
mf1 <- mifs_local[[1]]
registerDoRNG(2105684752)
bake(file="rho_profile.rds",{
foreach(guess=iter(guesses,"row"), .combine=rbind) %do% {
library(pomp)
library(tidyverse)
mf1 %>%
mif2(params=guess,
rw.sd=rw.sd(Beta=0.02,mu_EI=0.02, eta=ivp(0.02))) %>%
mif2(Nmif=100,cooling.fraction.50=0.3) %>%
mif2(Nmif=100,cooling.fraction.50=0.1) -> mf
replicate(
10,
mf %>% pfilter(Np=10000) %>% logLik()) %>%
logmeanexp(se=TRUE) -> ll
mf %>% coef() %>% bind_rows() %>%
bind_cols(loglik=ll[1],loglik.se=ll[2])
} -> results
results
}) -> results
read_csv("measles_params.csv") %>%
bind_rows(results) %>%
filter(is.finite(loglik)) %>%
arrange(-loglik) %>%
write_csv("measles_params.csv")
results %>%
  filter(is.finite(loglik)) -> results
Again, we can do the same for the reporting rate \(\rho\). We find that the parameter estimates converge to regions of higher likelihood.
pairs(~loglik+Beta+mu_EI+eta+rho,data=results,pch=16)
The best SEIR result from the global search has log likelihood -120 with a standard error of 0.13. From the lecture notes, the best SIR log likelihood is also -120. These comparable values suggest there may not be enough evidence to justify a more complicated model than SIR.
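One way to make this comparison concrete is AIC. The following is a hedged sketch: the log likelihoods are the values quoted above, and the parameter counts k are assumptions about how many parameters each model estimates, with SEIR adding \(\mu_{EI}\).

```r
## AIC = 2k - 2*loglik; lower is better
loglik_sir  <- -120  # best SIR log likelihood (lecture notes)
loglik_seir <- -120  # best SEIR log likelihood (our global search)
k_sir  <- 3          # assumed: Beta, rho, eta estimated
k_seir <- 4          # assumed: adds mu_EI
aic_sir  <- 2 * k_sir  - 2 * loglik_sir   # 246
aic_seir <- 2 * k_seir - 2 * loglik_seir  # 248
```

With essentially equal log likelihoods, the extra parameter penalizes SEIR slightly, consistent with the conclusion that the simpler SIR model suffices.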
According to the results, we achieve the highest likelihood with the parameter values \(\beta=18\), \(\mu_{IR}=2\), \(\mu_{EI}=20\), \(\rho=0.2\), \(\eta=0.14\), \(N=38000\). From the pairwise likelihood plot above, we find that \(\beta\) converges to a region between 0 and 50. From the profile likelihood confidence intervals, the 95% confidence interval for \(\eta\) is roughly (0.1, 0.3), and for \(\rho\) roughly (0.1, 0.4).
On the other hand, we know from the lecture notes that for the SIR model, \(\beta\) converges to a region around 20, the 95% confidence interval for \(\eta\) is roughly (0.1, 0.5), and for \(\rho\) roughly (0.1, 0.3). Thus, we conclude that the parameter estimates are not significantly different.
results %>%
  filter(loglik>max(loglik)-10,loglik.se<1) %>%
  group_by(round(rho,2)) %>%
  filter(rank(-loglik)<3) %>%
  ungroup() %>%
  ggplot(aes(x=rho,y=loglik))+
  geom_point()+
  geom_hline(
    color="red",
    yintercept=max(results$loglik)-0.5*qchisq(df=1,p=0.95)
  )
This solution is adapted from a homework submission by Xingwen Wei.