1 Introduction

\[{{\quad\quad}\displaystyle P}_{n+1} = r\,P_{n}\,\exp(-P_{n}+\varepsilon_{n}), \qquad \varepsilon_{n}\;\sim\;\mathrm{Normal}(0,\sigma^2)\]


2 Summary of the data

blowflies <- read.csv("http://ionides.github.io/531w16/final_project/blowfly4.csv",skip=3)
head(blowflies)
##   day   y
## 1   0 948
## 2   2 942
## 3   4 911
## 4   6 858
## 5   8 801
## 6  10 676


3 Fitting an SARMA model

3.1 Selecting SARMA model using AIC

  • First, let’s fit the data with an SARMA model to show the “seasonalty”" we see from the data. From the spectrum we set the seasonal period to be 18 bidays.

\[ (1-{\Phi}_p B^{18})(1-{\phi}_p B) X_n = (1+{\psi}_q B)\epsilon_n.\]

MA0 MA1 MA2 MA3 MA4 MA5
AR0 3550.94 3426.52 3371.20 3348.41 3338.24 3336.03
AR1 3341.66 3334.28 3329.34 3329.90 3330.91 3331.66
AR2 3329.06 3297.07 3298.87 3300.09 3293.95 3290.09
AR3 3320.99 3298.89 3300.85 3290.91 3288.77 3290.38
AR4 3319.67 3300.64 3289.16 3303.47 3304.93 3305.11
  • The AIC table suggests that SARMA(3,4) with a period of 18 bidays fits the data best.
arima(blowflies$pop,order=c(3,0,4),seasonal=list(order=c(1,0,0),period=18))
## 
## Call:
## arima(x = blowflies$pop, order = c(3, 0, 4), seasonal = list(order = c(1, 0, 
##     0), period = 18))
## 
## Coefficients:
##          ar1      ar2     ar3      ma1     ma2      ma3     ma4     sar1
##       2.3853  -1.9321  0.4925  -1.5527  0.5791  -0.0631  0.2098  -0.0006
## s.e.  0.0676   0.1281  0.0677   0.0258  0.1215   0.1239  0.0463   0.0829
##       intercept
##        2666.604
## s.e.    179.926
## 
## sigma^2 estimated as 647408:  log likelihood = -1634.39,  aic = 3288.77
  • The log-likelihood of the SARMA(3,4) with a period of 18 bidays is -1634.39.

3.2 Residual analysis

  • The plot of residuals shows that this model fits bad on the data, with some high peaks in the residual plot. However the acf of the residuals shows that the residual can be seen as independent. The qq-plot shows that the redisuals are not Gaussian distributed.


4 Fitting a POMP model

4.1 Beverton-Holt model

  • Now we want to fit a POMP model for the data.
  • Let’s try the stochastic Beverton-Holt model. \[P_{n+1} = \frac{a\,P_n}{1+b\,P_n}\,\varepsilon_n,\] where \(a\) and \(b\) are parameters and \[\varepsilon_t \sim \mathrm{Lognormal}(-\tfrac{1}{2}\sigma^2,\sigma^2).\]

  • The measurement model is

\[{{\quad\quad}\displaystyle Y}_{n}|P_n\;\sim\;\mathrm{Poisson}(\phi\,P_{n})\].

  • The parameter \(\phi\) is proportional to the sampling effort. Here in this dataset, \(\phi\) can be set as 1.

4.2 Constructing the pomp object

  • Now we can construct a pomp object according to the stochastic Beverton-Holt model.
require(pomp)
skel <- Csnippet("
                 DN = (a*N)/(1+(b*N));
                 ")
stochStep <- Csnippet("
                      e = rlnorm(-0.5*sigma*sigma,sigma);
                      N = ((a*N)/(1+(b*N)))*e;
                      ")
rmeas <- Csnippet("pop = rpois(phi*N);")
dmeas <- Csnippet("lik = dpois(pop,phi*N,give_log);")

pomp(blowflies,
     times="day",t0=0,
     skeleton=skel,
     rprocess=discrete.time.sim(step.fun=stochStep,delta.t=2),
     paramnames=c("a","b","sigma","phi"),
     statenames=c("N","e"),
     rmeasure=rmeas,
     dmeasure=dmeas
     ) -> blowfly
  • By setting the parameters we can give one naive simulation of the model.
coef(blowfly) <- c(N.0=948,e.0=0,a=3,b=0.0007,sigma=0.6,phi=1)
sims <- simulate(blowfly,nsim=1,as.data.frame=TRUE,include.data=TRUE)

ggplot(sims,mapping=aes(x=time,y=pop,group=sim,color=sim=="data"))+
  geom_line()+guides(color=FALSE)

4.3 Likelihood slice

  • Now we can construct a likelihood slice and see how the likelihood changes with the parameters in the POMP model. Here we will use parallel computing.The plots of slices are below.It is convenient to do some parallelization when generating replications.
sliceDesign(
  c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1),
  a=rep(seq(from=2.5,to=4,length=40),each=3),
  b=rep(seq(from=0.0004,to=0.0009,length=40),each=3),
  sigma=rep(seq(from=0.5,to=1.1,length=40),each=3)) -> p

blowfly_mle=unlist(p[which.max(p$loglik),])

require(foreach)
require(doMC)

registerDoMC(cores=5)   

set.seed(998468235L,kind="L'Ecuyer")
mcopts <- list(preschedule=FALSE,set.seed=TRUE)

foreach (theta=iter(p,"row"),.combine=rbind,
         .inorder=FALSE,.options.multicore=mcopts) %dopar% 
         {
           pfilter(blowfly,params=unlist(theta),Np=5000) -> pf
           theta$loglik <- logLik(pf)
           theta
         } -> p

foreach (v=c("a","b","sigma")) %do% 
{
  x <- subset(p,slice==v)
  plot(x[[v]],x$loglik,xlab=v,ylab="loglik")
}

4.4 Filtering on simulated data

  • Let’s check that we can indeed filter and re-estimate parameters successfully for this simulated data.We proceed to carry out replicated particle filters at this tentative MLE:
simulate(blowfly,params=c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1),
         nsim=10000,states=TRUE) -> x

ell <- dmeasure(blowfly,y=obs(blowfly),x=x,times=time(blowfly),log=TRUE,
                params=c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1))
dim(ell)
## [1] 10000   201
ell <- apply(ell,1,sum); summary(exp(ell)); logmeanexp(ell,se=TRUE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0       0       0       0
##                    se 
## -213729.92     563.92
pf <- pfilter(blowfly,Np=5000,params=c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1))
logLik(pf)
## [1] -1713.241
pf <- replicate(10,pfilter(blowfly,Np=5000,params=c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1)))
ll <- sapply(pf,logLik); ll
##  [1] -1713.400 -1711.800 -1711.691 -1711.378 -1725.610 -1714.222 -1711.897
##  [8] -1709.419 -1712.100 -1715.233
L_pf<-logmeanexp(ll,se=TRUE)
  • We obtain an unbiased likelihood estimate of -1711.3 with a Monte standard error of 0.94. From the slicing we can get a tentative MLE of \(a=3.3\), \(b=0.0006\), \(\sigma=0.7\).

4.5 Setting run-levels

  • Then we set the run levels and prepare for searching for the MLE of the parameters using particle filter.
run_level <- 3
switch(run_level,
       {blowfly_Np=100; blowfly_Nmif=10; blowfly_Neval=4; blowfly_Nglobal=10; blowfly_Nlocal=10}, 
       {blowfly_Np=1000; blowfly_Nmif=100; blowfly_Neval=10; blowfly_Nglobal=20; blowfly_Nlocal=20}, 
       {blowfly_Np=10000; blowfly_Nmif=300; blowfly_Neval=20; blowfly_Nglobal=100; blowfly_Nlocal=20}
)

4.6 Local search for MLE

  • First we can do a local search based on the information from the likelihood surface slice.
stew(file=sprintf("blowfly_local_search-%d.rda",run_level),{
  
  t_local <- system.time({
    mifs_local <- foreach(i=1:blowfly_Nlocal,.packages='pomp', .combine=c, .options.multicore=mcopts) %dopar%  {
      mif2(
        blowfly,
        start=c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1),
        Np=blowfly_Np,
        Nmif=blowfly_Nmif,
        cooling.type="geometric",
        cooling.fraction.50=0.5,
        transform=TRUE,
        rw.sd=rw.sd(
          a=0.002,
          b=0.000001,
          sigma=0.001
        )
      )
      
    }
  })
  
},seed=900242057,kind="L'Ecuyer")


stew(file=sprintf("blowfly_lik_local-%d.rda",run_level),{
  t_local_eval <- system.time({
    liks_local <- foreach(i=1:blowfly_Nlocal,.packages='pomp',.combine=rbind) %dopar% {
      evals <- replicate(blowfly_Neval, logLik(pfilter(blowfly,params=coef(mifs_local[[i]]),Np=blowfly_Np)))
      logmeanexp(evals, se=TRUE)
    }
  })
},seed=900242057,kind="L'Ecuyer")

blowfly_results_local <- data.frame(logLik=liks_local[,1],logLik_se=liks_local[,2],t(sapply(mifs_local,coef)))
summary(blowfly_results_local$logLik,digits=5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1710.8 -1708.9 -1707.0 -1705.0 -1704.3 -1681.8
pairs(~logLik+a+b+sigma,data=subset(blowfly_results_local,logLik>max(logLik)-50))

  • This investigation took 46.2 minutes for the maximization and 1.5 minutes for the likelihood evaluation.
  • Evaluation of the best result of this search gives a likelihood of -1681.8 with a standard error of 25.9.

  • These repeated stochastic maximizations can also show us the geometry of the likelihood surface in a neighborhood of this point estimate:

4.7 Global likelihood maximization

  • Finally we will perform the likelihood maximization on a global scale. It took a lot of time.
blowfly_box <- rbind(
  a=c(2.5,4),
  b=c(0.0004,0.0009),
  sigma = c(0.7,1.1)
)

blowfly_fixed_params<-c(N.0=948,e.0=0,phi=1)

stew(file=sprintf("box_eval-%d.rda",run_level),{
  
  t_global <- system.time({
    mifs_global <- foreach(i=1:blowfly_Nglobal,.packages='pomp', .combine=c, .options.multicore=mcopts) %dopar%  mif2(
      mifs_local[[1]],
      start=c(apply(blowfly_box,1,function(x)runif(1,x[1],x[2])),blowfly_fixed_params)
    )
  })
},seed=1270401374,kind="L'Ecuyer")


stew(file=sprintf("lik_global_eval-%d.rda",run_level),{
  t_global_eval <- system.time({
    liks_global <- foreach(i=1:blowfly_Nglobal,.packages='pomp',.combine=rbind, .options.multicore=mcopts) %dopar% {
      evals <- replicate(blowfly_Neval, logLik(pfilter(blowfly,params=coef(mifs_global[[i]]),Np=blowfly_Np)))
      logmeanexp(evals, se=TRUE)
    }
  })
},seed=442141592,kind="L'Ecuyer")

blowfly_results_global <- data.frame(logLik=liks_global[,1],logLik_se=liks_global[,2],t(sapply(mifs_global,coef)))
summary(blowfly_results_global$logLik,digits=5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1722.8 -1711.4 -1703.8 -1703.8 -1696.7 -1678.5
  • Evaluation of the best result of this search gives a likelihood of -1678.5 with a standard error of 24.6. This took in 231.2 minutes for the maximization and 7.3 minutes for the evaluation.

  • We can have a feel of the global geometry of the likelihood surface from the following plot.

if (run_level>2) 
  write.table(rbind(blowfly_results_local,blowfly_results_global),
              file="mif_blowfly_params.csv",append=TRUE,col.names=FALSE,row.names=FALSE)

pairs(~logLik+a+b+sigma,data=subset(blowfly_results_global,logLik>max(logLik)-250))

  • We see that optimization attemps from diverse remote starting points end up with comparable likelihoods. This gives us some confidence in our maximization procedure.

4.8 Diagnostics

  • Now let’s have a look at the diagnostics.
plot(mifs_global)

  • From the diagnostics we can see that the log-likelihood converges slowly with some fluctuation. The parmameters also seem a little unstable, it might be a result of such a small parameter box. For the filtering part, something might catch our attention at around \(t=95\), where there is a sharp peak for e and a drop for the effective sample size, as long as seperation of evaluations for N.
  • Because the parameter estimates seem unstable, it could be a weakly identified parameter subspace. The search for maximization of log-likelihood seems not so successful, indicating that this model might not be a good choice for the blowflies population.

5 Compare with the Nicholson’s equation model and a discrete version

5.1 Time-delayed model

  • The Nicholson’s equation was proposed three decades after the experiments.

\[\frac{dN}{dt}=P\,N(t-\tau)\,exp(-N(t-\tau)/N_0)-\delta\,N(t)\]

  • Here \(N\) is the adult population and \(P\),\(N_0\),\(\delta\), and \(\tau\) are parameters. *This model take in consideration of a time delay regarding the relationship between future adult recruitment and current adult population(Gurney, W. S. C., Blythe, S. P., & Nisbet, R. M. (1980)).

  • The discrete version (Wood, S. N. (2010)) is \[ N_{t+1} = R_{t} + S_{t}\]

where \(R_t\;\sim\;\mathrm{Poi}(P\,N_{t-\tau}\,exp(-N_{t-\tau}/N_0)\,e_t)\) denotes recruiment and \(S_t\;\sim\;\mathrm{binom}(exp(-\delta\varepsilon_t),N_t)\) denotes survival. Assuming timestep for the stochastic process is 1, environmental stochasticity terms \(e_t\) and \(\varepsilon_t\) are Gamma-distributed i.i.d. random variables with mean 1 and variances \(\sigma_{p}^2\) , \(\sigma_{d}^2\). The measurement model is \(Y_t\;\sim\;\mathrm{negbin}(N_t,1/\sigma_y^2)\). The egg production is an independent Poisson process for each female and each adult has an independent probability of \(exp(\delta\,\varepsilon_t)\).

pompExample(blowflies)
## newly created object(s):
##  blowflies1 blowflies2
pf<-replicate(10,pfilter(blowflies1,Np=5000))
ll<-sapply(pf,logLik)
ll
##  [1] -1466.326 -1467.320 -1466.640 -1465.344 -1466.516 -1465.234 -1466.732
##  [8] -1466.951 -1465.803 -1466.118
L_pf<-logmeanexp(ll,se=TRUE)
L_pf
##                          se 
## -1466.0870936     0.2344848

5.2 Comparision using estimated log-likelihood

  • The best estimated log-likelihood of the Beverton-Holt model we used in this project is around -1678.5. The estimated log-likelihood of the SARMA model is -1634.39. The estimated log-likelihood of the discrete time-delayed model from 5000 simulations is -1466.09. And see from simulations, the log-likelihood of the time-delayed model is far more stable than the Beverton-Holt model we used.

6 Reference

Brillinger, D. R. (2012). The Nicholson blowfly experiments: some history and EDA. Journal of Time Series Analysis, 33(5), 718-723. http://www.stat.berkeley.edu/~brill/Papers/jtsa2012.pdf Berezansky, L., Braverman, E., & Idels, L. (2010). Nicholson’s blowflies differential equations revisited: main results and open problems. Applied Mathematical Modelling, 34(6), 1405-1417. http://www.sciencedirect.com/science/article/pii/S0307904X09002674 STATS 531 Winter2016, University of Michigan, Class Notes: 6,11,13,15. http://ionides.github.io/531w16/ Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310), 1102-1104. http://www.nature.com/nature/journal/v466/n7310/abs/nature09319.html Beverton-Holt model on Wikipedia https://en.wikipedia.org/wiki/Beverton%E2%80%93Holt_model