1 Indroduction

1.1 Background

Parasitoid-host systems are prevalent in nature. Parasitoidism is one of six major evolutionary strategies within parasitism[1], and is used as a major biological control agent in agriculture and conservation[2]. It involves three stages in the most common scenario: the parasitoid lay egg in the body of their host; hatched larva feed on host until killing them; the larva keep feeding on host until they become adult and finally emerge. Thus in some sense parasitoidism is similar to both parasitism and predation.

The parasitoid-host system studied here is pea aphids and their most common parasitoid, Aphidius ervi. Number of living aphids(both unparasitized and parasitized, they are not differentiable from look) and dead aphids(“mummies”) containing parasitoid larvas are documented in different set-up of lab experiments.

1.2 Objectives

Temperature can be a key factor in many types of ecology systems. Here we try to build a POMP model to study its effects on a specific parasitoid-host system, which is done in this paper[3] but using a different statistical method. More specifically, we want to test whether the POMP method will give comparable results to it affects several demographic rates in the system.

2 Data selection and preprocess

All the data are from this paper:Meisner MH, Harmon JP, Ives AR (2014) Temperature effects on long-term population dynamics in a parasitoid-host system. The two time-series are obtained in long-term lab experiments, while the values of some factors in the model are fitted from previous short-term experiments. The original paper used data from 8 set-ups of experiments: 2 different temperatures(\(20^\circ C\), and \(27^\circ C\)), 2 different initial host densities(low and high), then repeat each combination 2 times. They fit all 8 sets of data together using a Maximum Likelihood Estimation(MLE) method, with additional parameters corresponding to changes in demographic rates with temperature. But this is beyond our computational resource and it would be awkward to do so with POMP since different data are of different length. So instead, we pick two experiments with the same initial host density, one in low temperature and one in high temperature. We fit our POMP model to them separately, then compare the fitted parameters. This result won’t be directly comparable to results in the original paper but hopefully it can give us some intuitions at least. Moreover, the observations are made irregularly in the lab experiments, varying from consecutive days to once in several days. We can either let POMP omit those days with no data when it’s calculating conditional likelihood, but this makes many plots ugly since more than half of the days don’t have data; or as we did in this case, interpolating missing data by nearby ones. The resulting data that are used in fitting our POMP model are plotted below.

3 Study with POMP model

3.1 POMP representation

The POMP model we use is slightly modified from the original state-space model proposed in the paper and can be summarized as below(equations extracted from the original paper):

  • Hidden states:
    • \({\boldsymbol{X}}\): a multivariate time series of dimension \(n\); \(X_i, 1\le i\le n\) denotes the number of unparasitized aphids(host) of age \(i\)(days), \(n\) being the max lifespan.
    • \({\boldsymbol{Y}}\): a multivariate time series of dimension \(m\); \(Y_j, 1\le j\le m-1\) denotes the number of A.ervi(parasitoid) of age \(j\)(days). All A.ervi of age less than \(m\) is within their hosts’ body, then emerge on day \(m\) as they become adults. \(Y_m\) denotes the number of female adults of age \(m\) or more.
  • Hidden Markov chain:

\[ \begin{split} {\boldsymbol{X}}(t+1) &= S(z(t))\cdot{\boldsymbol{A}}(x(t),Y_m(t)|T)\cdot({\boldsymbol{L}}(T){\boldsymbol{X}}(t))\cdot e^{{\boldsymbol{\epsilon}}(t)} \\ Y_1(t+1) &= S_y(z(t))\cdot(1-{\boldsymbol{A}}(x(t),Y_m(t)|T))'{\boldsymbol{L}}(T){\boldsymbol{X}}(t) \\ Y_i(t+1) &= s_iS_y(z(t))Y_{i-1}(t)e^{\epsilon_y(t)} \quad \text{for}(i=1,...,m_1) \\ Y_i(t+1) &= Y_{i-1}(t) \quad \text{for}(i=m_1+1,...,m-1) \\ Y_m(t+1) &= \Big(s_yY_m(t)+\frac{1}{2}Y_{m-1}(t)\Big)e^{\epsilon_y(t)} \end{split} \]

where \(m_1\) is the number of days parasitoids remain in the still-living hosts, dot means element-by-element multiplication, and prime denotes matrix transpose. \({\boldsymbol{L}}(T)\) encodes the survival and reproduction rates as

\[ {\boldsymbol{L}}(T)= \begin{pmatrix} 0 & f_1 & f_2 & \cdots & f_{n-1} \\ s_1 & 0 & 0 & & 0 \\ 0 & s_2 & 0 & & 0 \\ \vdots & & &\ddots & 0 \\ 0 & 0 & \cdots & s_{n-1} & 0 \end{pmatrix} \] where \(s_i\) and \(f_i\) denotes survival rates and fecundities of age class \(i\), respectively. They are either estimated by short-term experiments for both cold and hot environments, or cited from other sources, as all other parameters not presented in the “Fitting parameters” list below. The parasitoid-attack survival rate \({\boldsymbol{A}}(x(t),Y_m(t)|T)\) depends on total number of unparasitized aphids \(x(t)\), number of attackers \(Y_m(t)\) and temperature \(T\), in the form of \[ A_i = \Big(1+\frac{a\alpha(T)p_iY_m}{k(hx+1)}\Big)^{-k} \] where \(p_i\) is the age-specific relative attack rates, \(\alpha\) is the explicit temperature dependence, and \(h\) is the “handling time” that makes the whole thing depend on host density. The “intraspecific density dependence[3]” makes survival diminish with increasing density of living aphids for both hosts and parasitoid: \[ \begin{split} S(z) &= (1+\frac{z}{K})^{-1} \\ S_y(z) &= (1+\frac{z}{K_y})^{-1} \end{split} \] where \[ z = \sum_{i=1}^nX_i+\sum_{i=1}^{m_1}Y_i \] is the total living aphids. The random terms \({\boldsymbol{\epsilon}}(t)\) and \(\epsilon_y(t)\) describes the demographic and environmental stochasticity on population dynamics. \({\boldsymbol{\epsilon}}(t)={\boldsymbol{\epsilon}}_d(t)+\epsilon_e(t)\), where \({\boldsymbol{\epsilon}}_d(t)\sim iid\mathcal{N}(0,min((1+z(t))^{-1},1/2))\), and \({\boldsymbol{\epsilon}}_e(t)\sim \mathcal{N}(0,\sigma_x^2)\). Here is the only modification we made to the model presented in the paper[3], as they modeled \(\epsilon_e\) as a multivariate random variable as well, and assumed a correlation \(\rho\) between all its elements. But later they fitted \(\rho\) to be \(1\) and justified it as different age classes react to environment almost the same way. So to use this information and simplify, we just assumed one environmental random term to all age classes. The exact same type of equations apply to \(\epsilon_y\), contributing another fitting parameter \(\sigma_y\). Finally, \(s_y\) is the survival rate of adult A.ervi and 1/2 appeared in the final equation because the female sex ratio of A.ervi is 50%.

As was done in the original paper, we set this POMP model as a discrete model with \(\delta_t\) being 1 day.

  • Observables:
    • \(x^*\): the number of observed living aphids, modeled as \[ x^*(t) = z(t) e^{\gamma_x(t)} \] where \(\gamma_x\sim\mathcal{N}(0, \sigma_{\gamma_x})\).
    • \(y^*\): the number of observed mummies, as \[ y^*(t) = \Big(\sum_{i=m_1+1}^{m-1}Y_i(t)\Big) e^{\gamma_y(t)} \] where \(\gamma_y\sim\mathcal{N}(M, \sigma_m^2)\). The parameter \(M\) is set because mummies are constantly under-counted as some of them are hidden, under leaves for example.
  • Fitting parameters(total of 10):
    • \(a\): basic parasitoid attach rate
    • \(K\): aphid density dependence
    • \(K_yc\): ratio of parasitized aphid density dependence to aphid density dependence(\(K_y/K\))
    • \(k\): parasitoid aggregation
    • \(h\): handling time
    • \(s_y\): parasitoid adult daily survival
    • \(sigma_x\): environmental standard deviation for aphids
    • \(sigma_y\): environmental standard deviation for parasitoids
    • \(M\): mummy observation error offset
    • \(\sigma_m\): mummy observation error standard variation

3.2 Evaluation of estimated parameters by original paper in our model

After building our POMP model according to the previously stated equations. We first run some simulations with parameters estimated by the paper(they can be viewed as prior estimates for us). The values of parameters are
a K K_yc k h s_y sigma_x sigma_y M sigma_m
Cold 2.32 0.000467 1.57 0.35 0.008 0.69 0.44 0.7 -0.22 16.8
Hot 2.32 0.000467 1.57 0.35 0.029 0.69 0.44 0.7 -0.22 16.8

The simulations are shown below:

Both simulations under low and high temperatures are pretty bad: we run 3 simulations for both cases but they are all essentially 0 for aphids numbers. This seems to suggest either the model is just bad or this set of parameters are not fitted well for a single set of data.

We next run particle filters on these estimated parameters. This gives very small but accurate log-likelihood(standard error is very small), confirming that these parameters are not good fit.

3.3 Fit the POMP model

We use the iterated filtering algorithm IF2[4] to fit our POMP model. We first do a local search with prior estimated parameters(estimation by paper using a global MLE method). This gives summaries of log-likelihood of 20 local searches for low temperature to be:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -25.28  -24.15  -23.51  -23.73  -23.29  -22.79

and for high temperature to be:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -51.76  -51.00  -50.62  -50.61  -50.22  -49.28

The range of both results are small(several log units), indicating our choice of hyper parameters such as number of particle filters and iteration times are big enough for convergence. But more importantly, we see these log-likelihoods are much higher than those obtained with prior estimates. This is because even we choose the prior estimates as initial values of parameters, some of the parameters ended up very different values. This becomes clearer as we analyze global search results, obtained by choosing initial parameters from a large range. This gives log-likelihood under low temperature to be:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -40.59  -24.48  -23.76  -24.57  -23.19  -22.35

and under high temperature:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -197.37  -50.74  -50.16  -55.00  -49.80  -48.81

Notice that in each case, the maximum is only slightly larger than that of the local search, and even minimum is larger than those with prior estimates. We present here the pair plots among most relevant parameters:

There seems to be no apparent correlations among them. The plots between log-likelihood and other parameters also show good convergence behavior, especially for \(a\) and \(K\). We can further check which parameters are estimated most differently than their prior estimates by comparing our globally fitted parameters(post estimate) to the prior estimate:
a K K_yc k h s_y sigma_x sigma_y M sigma_m
Cold Prior 2.320000 0.000467 1.570000 0.3500000 0.0080000 0.6900000 0.4400000 0.7000000 -0.2200000 16.800000
Hot Prior 2.320000 0.000467 1.570000 0.3500000 0.0290000 0.6900000 0.4400000 0.7000000 -0.2200000 16.800000
Cold Post 3.243639 1670.843020 7.741425 0.1406575 0.0059707 0.8939531 0.2987710 0.1385662 -0.4750868 9.072897
Hot Post 19.501354 2167.522488 34.196312 0.0264173 0.0030396 0.9708879 0.5715646 0.3518159 -1.3068605 6.730345

We can see all parameters have comparable post and prior estimates, they differ at most by one order, except for \(K\), for which our model gives \(10^7\) larger fit! It is very hard to believe both of them are reasonable in a natural system, and we believe our estimate is better. The reasoning lies in the meaning of this parameter: look at the equation where \(K\) appears, that is the intraspecific density dependence: \[ S(z) = (1+\frac{z}{K})^{-1} \] \(S(z)\rightarrow 0\) as \(K\rightarrow 0\), this means any finite population will suddenly be suppressed to 0! And this is exactly the case of our above simulations! On the other hand, large \(K\) means \(S(z)\rightarrow 1\), just denoting neglectable intraspecific density dependence, which is totally reasonable. This is clearer by simulating with our own estimated parameters, which shows reasonable similarity with data.

3.4 Diagnostics of fitted model

We present diagnostics of fitted POMP model for both low and high temperatures below.

#low temperature diagnostic plot
plot(mifs_global.c)

#high temperature diagnostic plot
plot(mifs_global.h)

A couple of statements we can make for both fits: firstly, the effective sample size and conditional log-likelihood of the last iteration is high during the whole time, besides, the number of fails of MIF2 iteration is always 0, these mean generally speaking our model fits pretty well; secondly, most of the parameters and log-likelihood show good-enough convergence, especially \(k\), \(h\), \(s_y\), \(\sigma_x\) and \(\sigma_y\); finally, \(K\) and \(M\) have poorest convergence behavior, we should definitely try to improve them in the future.

3.5 Profile likelihood study on temperature effects

The original paper concluded \(h\) shows significant temperature dependence under their model, while \(K\) and \(k\) don’t, as opposed to the initial guess. As limited by time and computational resources, we decided only to test for \(h\) and \(K\), but unfortunately only managed to accomplish that for \(K\) in time.

Due to the difference of estimation methods, we don’t have a single statistic telling us whether the dependence on temperature is significant, so instead we use a heuristic method: we construct a profile likelihood and then calculate the 95% confidence intervals of \(K\) for both temperatures, and we draw our conclusions based on their overlap. The less they overlap, the higher probability that \(K\) does dependent on temperature. The code used is modified from homework solution of previous class[4].

The plots of profile likelihood test are shown below:

The first plot tells us the range of \(K\) we choose is too small, the confidence interval seems to be much larger than ~[300, 3500]. The second plot also shows we didn’t set lower boundary small enough, but it should be close to ~1000. In short, the confidence interval under low temperature seems to include the one under high temperature completely, and their means are relatively close. So based on this information, we certainly didn’t see any evidence supporting a significant temperature dependence of \(K\).

4 Conclusion

Firstly, the POMP model we build fits reasonably well with data; the value of one of the parameters we fit makes more sense than what is estimated in the original paper. Secondly, we didn’t find significant temperature dependence of one of the parameters, which agrees with the original paper’s claim.

5 Possible extensions

There are a lot of limits of this project due to restrictions of time and computational resources and even more for us to do in the future. For example, we could build a POMP model including both low and high temperature data together, making the results directly comparable with those made by the paper. In terms of computation itself, we could also try different and larger global search ranges for the parameters for better convergence. More CPU time would also give us a complete and more accurate estimate of confidence intervals by profile likelihood methods.

Reference:

[1]https://en.wikipedia.org/wiki/Parasitoid

[2]Dominic C. Henri, F.J. Frank Van Veen, in Advances in Ecological Research, 2011

[3]Meisner MH, Harmon JP, Ives AR (2014) Temperature effects on long-term population dynamics in a parasitoid-host system

[4]Previous homework solution: https://ionides.github.io/531w16/hw/sol09.html

[5]Course materials: https://ionides.github.io/531w18/