1 Introduction
2 Summary of the data
3 Fitting an SARMA model
- 3.1 Selecting SARMA model using AIC
- 3.2 Residual analysis
4 Fitting a POMP model
5 Compare with the Nicholson’s equation model and a discrete version
- 5.1 Time-delayed model
- 5.2 Comparision using estimated log-likelihood
6 Reference

1 Introduction

From the Nicholson Blowfly Experiment, we can have some idea of population dynamics of artificial population of blowflies. In the experiment, unlimited quantities of larval food were provided; the adult food supply (ground liver) was constant at 0.4g per day, the adult number were monitored, as well as the numbers of eggs and larvae. The result of population of adult blowflies has been shown as an classic example of dynamic nature of fluctuations in insect populations(Brillinger, D. R. (2012).
The prototyic model in population biology is the Ricker model, modeling population growth and resource depletion (Class note 11).

\[{{\quad\quad}\displaystyle P}_{n+1} = r\,P_{n}\,\exp(-P_{n}+\varepsilon_{n}), \qquad \varepsilon_{n}\;\sim\;\mathrm{Normal}(0,\sigma^2)\]

Here, \(P_{n}\) is the population density at time \(t_{n}=n\) and r is a parameter, relating to the population’s intrinsic capacity to increase. And the intrinsic growth rate is assumed to be log-normally distributed. Parameter \(\sigma\) is the standard deviation of the noise process \(\varepsilon\).
Another classic discrete-time population model is the Beverton-Holt model (See Beverton-Holt model on Wikipedia, and Class note 11). \[P_{n+1} = \frac{a\,P_n}{1+b\,P_n}\,\varepsilon_n,\] Here parameter \(a\) can be interpreted as the proliferation rate per generation and \(\frac{a-1}{b}\) is the carring capacity of the environment. The noise process \(\varepsilon\) may have the log-normal distribution with mean \(0\) and variance \(\sigma^2\).
There are also other population model such as the Hassell model and the Maynard Smith-Slatkin model.
What models explain the fluctuations in the experimental population of blowflies? In this project we will fit an SARMA model and a POMP model and compare them with another plausible model.

2 Summary of the data

The data comes from the 4th set of Nicholson Blowfly Experiment. Because the pattern seems to change after \(t=400\), we mainly analyse in the time window befor \(t=400\).

blowflies <- read.csv("http://ionides.github.io/531w16/final_project/blowfly4.csv",skip=3)
head(blowflies)

##   day   y
## 1   0 948
## 2   2 942
## 3   4 911
## 4   6 858
## 5   8 801
## 6  10 676

Let’s have a look at the fluctuations.
We can see some periodic patterns. Check them in the spectrum plot.

Looking into the spectrum of the data, we found that there is a periodic feature with frequency around 0.05~0.06, which is a cycle of around 18 bidays.
Next we will fit models to try to explain these fluctuations.

3 Fitting an SARMA model

3.1 Selecting SARMA model using AIC

First, let’s fit the data with an SARMA model to show the “seasonalty”" we see from the data. From the spectrum we set the seasonal period to be 18 bidays.

\[ (1-{\Phi}_p B^{18})(1-{\phi}_p B) X_n = (1+{\psi}_q B)\epsilon_n.\]

	MA0	MA1	MA2	MA3	MA4	MA5
AR0	3550.94	3426.52	3371.20	3348.41	3338.24	3336.03
AR1	3341.66	3334.28	3329.34	3329.90	3330.91	3331.66
AR2	3329.06	3297.07	3298.87	3300.09	3293.95	3290.09
AR3	3320.99	3298.89	3300.85	3290.91	3288.77	3290.38
AR4	3319.67	3300.64	3289.16	3303.47	3304.93	3305.11

The AIC table suggests that SARMA(3,4) with a period of 18 bidays fits the data best.

arima(blowflies$pop,order=c(3,0,4),seasonal=list(order=c(1,0,0),period=18))

## 
## Call:
## arima(x = blowflies$pop, order = c(3, 0, 4), seasonal = list(order = c(1, 0, 
##     0), period = 18))
## 
## Coefficients:
##          ar1      ar2     ar3      ma1     ma2      ma3     ma4     sar1
##       2.3853  -1.9321  0.4925  -1.5527  0.5791  -0.0631  0.2098  -0.0006
## s.e.  0.0676   0.1281  0.0677   0.0258  0.1215   0.1239  0.0463   0.0829
##       intercept
##        2666.604
## s.e.    179.926
## 
## sigma^2 estimated as 647408:  log likelihood = -1634.39,  aic = 3288.77

The log-likelihood of the SARMA(3,4) with a period of 18 bidays is -1634.39.

3.2 Residual analysis

The plot of residuals shows that this model fits bad on the data, with some high peaks in the residual plot. However the acf of the residuals shows that the residual can be seen as independent. The qq-plot shows that the redisuals are not Gaussian distributed.

4 Fitting a POMP model

4.1 Beverton-Holt model

Now we want to fit a POMP model for the data.
Let’s try the stochastic Beverton-Holt model. \[P_{n+1} = \frac{a\,P_n}{1+b\,P_n}\,\varepsilon_n,\] where \(a\) and \(b\) are parameters and \[\varepsilon_t \sim \mathrm{Lognormal}(-\tfrac{1}{2}\sigma^2,\sigma^2).\]
The measurement model is

\[{{\quad\quad}\displaystyle Y}_{n}|P_n\;\sim\;\mathrm{Poisson}(\phi\,P_{n})\].

The parameter \(\phi\) is proportional to the sampling effort. Here in this dataset, \(\phi\) can be set as 1.

4.2 Constructing the pomp object

Now we can construct a pomp object according to the stochastic Beverton-Holt model.

require(pomp)
skel <- Csnippet("
                 DN = (a*N)/(1+(b*N));
                 ")
stochStep <- Csnippet("
                      e = rlnorm(-0.5*sigma*sigma,sigma);
                      N = ((a*N)/(1+(b*N)))*e;
                      ")
rmeas <- Csnippet("pop = rpois(phi*N);")
dmeas <- Csnippet("lik = dpois(pop,phi*N,give_log);")

pomp(blowflies,
     times="day",t0=0,
     skeleton=skel,
     rprocess=discrete.time.sim(step.fun=stochStep,delta.t=2),
     paramnames=c("a","b","sigma","phi"),
     statenames=c("N","e"),
     rmeasure=rmeas,
     dmeasure=dmeas
     ) -> blowfly

By setting the parameters we can give one naive simulation of the model.

coef(blowfly) <- c(N.0=948,e.0=0,a=3,b=0.0007,sigma=0.6,phi=1)
sims <- simulate(blowfly,nsim=1,as.data.frame=TRUE,include.data=TRUE)

ggplot(sims,mapping=aes(x=time,y=pop,group=sim,color=sim=="data"))+
  geom_line()+guides(color=FALSE)

4.3 Likelihood slice

Now we can construct a likelihood slice and see how the likelihood changes with the parameters in the POMP model. Here we will use parallel computing.The plots of slices are below.It is convenient to do some parallelization when generating replications.

sliceDesign(
  c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1),
  a=rep(seq(from=2.5,to=4,length=40),each=3),
  b=rep(seq(from=0.0004,to=0.0009,length=40),each=3),
  sigma=rep(seq(from=0.5,to=1.1,length=40),each=3)) -> p

blowfly_mle=unlist(p[which.max(p$loglik),])

require(foreach)
require(doMC)

registerDoMC(cores=5)   

set.seed(998468235L,kind="L'Ecuyer")
mcopts <- list(preschedule=FALSE,set.seed=TRUE)

foreach (theta=iter(p,"row"),.combine=rbind,
         .inorder=FALSE,.options.multicore=mcopts) %dopar% 
         {
           pfilter(blowfly,params=unlist(theta),Np=5000) -> pf
           theta$loglik <- logLik(pf)
           theta
         } -> p

foreach (v=c("a","b","sigma")) %do% 
{
  x <- subset(p,slice==v)
  plot(x[[v]],x$loglik,xlab=v,ylab="loglik")
}

4.4 Filtering on simulated data

Let’s check that we can indeed filter and re-estimate parameters successfully for this simulated data.We proceed to carry out replicated particle filters at this tentative MLE:

simulate(blowfly,params=c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1),
         nsim=10000,states=TRUE) -> x

ell <- dmeasure(blowfly,y=obs(blowfly),x=x,times=time(blowfly),log=TRUE,
                params=c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1))
dim(ell)

## [1] 10000   201

ell <- apply(ell,1,sum); summary(exp(ell)); logmeanexp(ell,se=TRUE)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0       0       0       0

##                    se 
## -213729.92     563.92

pf <- pfilter(blowfly,Np=5000,params=c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1))
logLik(pf)

## [1] -1713.241

pf <- replicate(10,pfilter(blowfly,Np=5000,params=c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1)))
ll <- sapply(pf,logLik); ll

##  [1] -1713.400 -1711.800 -1711.691 -1711.378 -1725.610 -1714.222 -1711.897
##  [8] -1709.419 -1712.100 -1715.233

L_pf<-logmeanexp(ll,se=TRUE)

We obtain an unbiased likelihood estimate of -1711.3 with a Monte standard error of 0.94. From the slicing we can get a tentative MLE of \(a=3.3\), \(b=0.0006\), \(\sigma=0.7\).

4.5 Setting run-levels

Then we set the run levels and prepare for searching for the MLE of the parameters using particle filter.

run_level <- 3
switch(run_level,
       {blowfly_Np=100; blowfly_Nmif=10; blowfly_Neval=4; blowfly_Nglobal=10; blowfly_Nlocal=10}, 
       {blowfly_Np=1000; blowfly_Nmif=100; blowfly_Neval=10; blowfly_Nglobal=20; blowfly_Nlocal=20}, 
       {blowfly_Np=10000; blowfly_Nmif=300; blowfly_Neval=20; blowfly_Nglobal=100; blowfly_Nlocal=20}
)

4.6 Local search for MLE

First we can do a local search based on the information from the likelihood surface slice.

stew(file=sprintf("blowfly_local_search-%d.rda",run_level),{
  
  t_local <- system.time({
    mifs_local <- foreach(i=1:blowfly_Nlocal,.packages='pomp', .combine=c, .options.multicore=mcopts) %dopar%  {
      mif2(
        blowfly,
        start=c(N.0=948,e.0=0,a=3.3,b=0.0006,sigma=0.7,phi=1),
        Np=blowfly_Np,
        Nmif=blowfly_Nmif,
        cooling.type="geometric",
        cooling.fraction.50=0.5,
        transform=TRUE,
        rw.sd=rw.sd(
          a=0.002,
          b=0.000001,
          sigma=0.001
        )
      )
      
    }
  })
  
},seed=900242057,kind="L'Ecuyer")


stew(file=sprintf("blowfly_lik_local-%d.rda",run_level),{
  t_local_eval <- system.time({
    liks_local <- foreach(i=1:blowfly_Nlocal,.packages='pomp',.combine=rbind) %dopar% {
      evals <- replicate(blowfly_Neval, logLik(pfilter(blowfly,params=coef(mifs_local[[i]]),Np=blowfly_Np)))
      logmeanexp(evals, se=TRUE)
    }
  })
},seed=900242057,kind="L'Ecuyer")

blowfly_results_local <- data.frame(logLik=liks_local[,1],logLik_se=liks_local[,2],t(sapply(mifs_local,coef)))
summary(blowfly_results_local$logLik,digits=5)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1710.8 -1708.9 -1707.0 -1705.0 -1704.3 -1681.8

pairs(~logLik+a+b+sigma,data=subset(blowfly_results_local,logLik>max(logLik)-50))

This investigation took 46.2 minutes for the maximization and 1.5 minutes for the likelihood evaluation.
Evaluation of the best result of this search gives a likelihood of -1681.8 with a standard error of 25.9.
These repeated stochastic maximizations can also show us the geometry of the likelihood surface in a neighborhood of this point estimate:

4.7 Global likelihood maximization

Finally we will perform the likelihood maximization on a global scale. It took a lot of time.

blowfly_box <- rbind(
  a=c(2.5,4),
  b=c(0.0004,0.0009),
  sigma = c(0.7,1.1)
)

blowfly_fixed_params<-c(N.0=948,e.0=0,phi=1)

stew(file=sprintf("box_eval-%d.rda",run_level),{
  
  t_global <- system.time({
    mifs_global <- foreach(i=1:blowfly_Nglobal,.packages='pomp', .combine=c, .options.multicore=mcopts) %dopar%  mif2(
      mifs_local[[1]],
      start=c(apply(blowfly_box,1,function(x)runif(1,x[1],x[2])),blowfly_fixed_params)
    )
  })
},seed=1270401374,kind="L'Ecuyer")


stew(file=sprintf("lik_global_eval-%d.rda",run_level),{
  t_global_eval <- system.time({
    liks_global <- foreach(i=1:blowfly_Nglobal,.packages='pomp',.combine=rbind, .options.multicore=mcopts) %dopar% {
      evals <- replicate(blowfly_Neval, logLik(pfilter(blowfly,params=coef(mifs_global[[i]]),Np=blowfly_Np)))
      logmeanexp(evals, se=TRUE)
    }
  })
},seed=442141592,kind="L'Ecuyer")

blowfly_results_global <- data.frame(logLik=liks_global[,1],logLik_se=liks_global[,2],t(sapply(mifs_global,coef)))
summary(blowfly_results_global$logLik,digits=5)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1722.8 -1711.4 -1703.8 -1703.8 -1696.7 -1678.5

Evaluation of the best result of this search gives a likelihood of -1678.5 with a standard error of 24.6. This took in 231.2 minutes for the maximization and 7.3 minutes for the evaluation.
We can have a feel of the global geometry of the likelihood surface from the following plot.

if (run_level>2) 
  write.table(rbind(blowfly_results_local,blowfly_results_global),
              file="mif_blowfly_params.csv",append=TRUE,col.names=FALSE,row.names=FALSE)

pairs(~logLik+a+b+sigma,data=subset(blowfly_results_global,logLik>max(logLik)-250))

We see that optimization attemps from diverse remote starting points end up with comparable likelihoods. This gives us some confidence in our maximization procedure.

4.8 Diagnostics

Now let’s have a look at the diagnostics.

plot(mifs_global)

From the diagnostics we can see that the log-likelihood converges slowly with some fluctuation. The parmameters also seem a little unstable, it might be a result of such a small parameter box. For the filtering part, something might catch our attention at around \(t=95\), where there is a sharp peak for e and a drop for the effective sample size, as long as seperation of evaluations for N.
Because the parameter estimates seem unstable, it could be a weakly identified parameter subspace. The search for maximization of log-likelihood seems not so successful, indicating that this model might not be a good choice for the blowflies population.

5 Compare with the Nicholson’s equation model and a discrete version

5.1 Time-delayed model

The Nicholson’s equation was proposed three decades after the experiments.

\[\frac{dN}{dt}=P\,N(t-\tau)\,exp(-N(t-\tau)/N_0)-\delta\,N(t)\]

Here \(N\) is the adult population and \(P\),\(N_0\),\(\delta\), and \(\tau\) are parameters. *This model take in consideration of a time delay regarding the relationship between future adult recruitment and current adult population(Gurney, W. S. C., Blythe, S. P., & Nisbet, R. M. (1980)).
The discrete version (Wood, S. N. (2010)) is \[ N_{t+1} = R_{t} + S_{t}\]

where \(R_t\;\sim\;\mathrm{Poi}(P\,N_{t-\tau}\,exp(-N_{t-\tau}/N_0)\,e_t)\) denotes recruiment and \(S_t\;\sim\;\mathrm{binom}(exp(-\delta\varepsilon_t),N_t)\) denotes survival. Assuming timestep for the stochastic process is 1, environmental stochasticity terms \(e_t\) and \(\varepsilon_t\) are Gamma-distributed i.i.d. random variables with mean 1 and variances \(\sigma_{p}^2\) , \(\sigma_{d}^2\). The measurement model is \(Y_t\;\sim\;\mathrm{negbin}(N_t,1/\sigma_y^2)\). The egg production is an independent Poisson process for each female and each adult has an independent probability of \(exp(\delta\,\varepsilon_t)\).

pompExample(blowflies)

## newly created object(s):
##  blowflies1 blowflies2

pf<-replicate(10,pfilter(blowflies1,Np=5000))
ll<-sapply(pf,logLik)
ll

##  [1] -1466.326 -1467.320 -1466.640 -1465.344 -1466.516 -1465.234 -1466.732
##  [8] -1466.951 -1465.803 -1466.118

L_pf<-logmeanexp(ll,se=TRUE)
L_pf

##                          se 
## -1466.0870936     0.2344848

5.2 Comparision using estimated log-likelihood

The best estimated log-likelihood of the Beverton-Holt model we used in this project is around -1678.5. The estimated log-likelihood of the SARMA model is -1634.39. The estimated log-likelihood of the discrete time-delayed model from 5000 simulations is -1466.09. And see from simulations, the log-likelihood of the time-delayed model is far more stable than the Beverton-Holt model we used.

6 Reference

Brillinger, D. R. (2012). The Nicholson blowfly experiments: some history and EDA. Journal of Time Series Analysis, 33(5), 718-723. http://www.stat.berkeley.edu/~brill/Papers/jtsa2012.pdf Berezansky, L., Braverman, E., & Idels, L. (2010). Nicholson’s blowflies differential equations revisited: main results and open problems. Applied Mathematical Modelling, 34(6), 1405-1417. http://www.sciencedirect.com/science/article/pii/S0307904X09002674 STATS 531 Winter2016, University of Michigan, Class Notes: 6,11,13,15. http://ionides.github.io/531w16/ Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310), 1102-1104. http://www.nature.com/nature/journal/v466/n7310/abs/nature09319.html Beverton-Holt model on Wikipedia https://en.wikipedia.org/wiki/Beverton%E2%80%93Holt_model

STATS 531 Final Project: Models for Population of Nicholson’s Blowflies

April 20, 2016