Objectives
To gain practical experience working with POMP models, we start with installing the pomp package and proceed to two introductory exercises.
Computation time is an unavoidable consideration when working with simulation-based inference, for all but small datasets and simple models.
The pomp package therefore allows you to specify the most computationally intensive steps—usually, simulation of the stochastic dynamic system, and evaluation of the measurement density—as snippets of C code.
Consequently, to use pomp, your R program must have access to a C compiler. In addition, pomp takes advantage of some Fortran code and therefore requires a Fortran compiler.
Installing the necessary compilers should be fairly routine, but does involve an extra step beyond the usual installation of an R package, unless you are running the Linux operating system for which they are usually installed by default. Given how fundamental C and Fortran are to scientific computing, it is unfortunate that Mac and Windows do not provide them by default.
The following advice is an updated version of http://kingaa.github.io/sbied/prep/preparation.html.
Check your R version. For our purposes, make sure you have at least 3.2.0. You can run getRversion()
to see what is currently installed.
Install and/or update relevant R packages. The file packages.R contains a list of relevant R packages. This list does not include pomp itself, but all will be helpful if you want to re-run or modify the R code in the notes. Run the following, in R:
update.packages()
source("http://ionides.github.io/531w16/hw/pomp_prep/packages.R",echo=TRUE)
Install and test the pomp package by running pompTest.R
source("http://ionides.github.io/531w16/hw/pomp_prep/pompTest.R",echo=TRUE)
If you get the output PompTest successful!
then pomp is working on your system. If not, re-run this step after trying the following operating-system-dependent advice.
Usually, Linux and Unix distributions have the necessary compilers installed by default. Indeed, the simplest way to run pomp may be to use SSH to access a Linux server. We can discuss in class how to do this. You should be able to do the following:
Note for those without previous Linux experience: Some familiarity with Linux is a basic skill for modern applied statistics, since Linux is currently the dominant environment for scientific and high-performance computing.
For example, 95% of the fastest 500 supercomputers run Linux. The University of Michigan computing cluster, Flux, also runs Linux.
You don’t need to learn Linux just for this course, but if you view improving your Linux skills as a good investment, working in this environment is encouraged. One of many introductory tutorials online is http://www.ee.surrey.ac.uk/Teaching/Unix.
Access scs.itd.umich.edu
, via SSH or using the umich virtual private network (VPN). You can use bayes.stat.lsa.umich.edu
if you have access to the Statistics department machines. You should have this access if you are a Statistics Masters or PhD student. I can get access for others, upon request.
In R, run install.package("pomp")
and select 22 (HTTP mirrors)
. For some technical reason, the default HTTPS CRAN mirrors don’t seem to be working on either the scs or bayes machines.
If you run Linux on your own machine and have trouble with either script above, make sure you have the GNU compiler collection (GCC) installed. Linux distributions typically include this by default but it is not impossible that you have somehow avoided this.
So that you can compile C code and dynamically link it into an R session, you will need to make sure you have the Xcode app installed before running the second script above. This is free and can be installed via the App Store or downloaded from [https://developer.apple.com/xcode/downloads/].
If you have trouble with the first command trying to install pomp from source, receiving the error,
make: gfortran-4.8: No such file or directory
then it is likely that you do not have the necessary version of gfortran installed. Have a look at these instructions and contact me if these don’t work for you.
You will need the ability to compile C code and dynamically link it into an R session. To do this, you’ll need to install the Rtools suite. Download the latest frozen version (http://cran.r-project.org/bin/windows/Rtools) and install it. During installation of Rtools, click all the additional boxes–the default installation doesn’t include everything you need.
Some problems have been reported with the unfrozen version (Rtools33) but none with the last frozen version (Rtools32).
Please report if you have difficulties with the interaction between Rtools and Rstudio; the code should run on current releases of Rstudio.
It may also be a good idea to upgrade your version of R. It has been reported that difficulties getting this working on Windows with R3.2.1 disappeared after installation of R3.2.4.
Please submit to Canvas an Rmd file addressing the following questions. Your Rmd file can read in the Parus major data from the internet, e.g., by
dat <- read.csv("http://ionides.github.io/531w16/notes11/parus.csv")
Question 8.1. Reformulating the Ricker model.
The Ricker equation can be reparameterized so that the scaling of \(P_n\) is explicit: \[P_{n+1} = r\,P_{n}\,\exp\left(-\frac{P_{n}}{k}\right).\] Modify the pomp
object created in the notes to reflect this reparameterization. Also, Modify the measurement model so that the data \({y_n^*}\) is modeled as \[Y_n |P_n \sim \mathrm{Negbin}(\phi\,P_n,\psi).\] Here, \(\mathrm{Negbin}(\mu,\psi)\) is the negative binomial distribution with mean \(\mu\) and probability parameter \(\psi\), and therefore variance \(\mu/\psi\). This parameterization corresponds in R to rbinom(...,mu,prob)
. See ?rnbinom
for documentation on the negative binomial distribution and the R Extensions Manual section on distribution functions for information on how to access these in C.
Try simulating from a few choices of the parameters, and present one simulation from a set of parameters that shows oscillatory behavior.
Question 8.2. Coding a new model.
Construct a pomp object for the Parus major data and the stochastic Beverton-Holt model, \[P_{n+1} = \frac{a\,P_n}{1+b\,P_n}\,\varepsilon_n,\] where \(a\) and \(b\) are parameters and \[\varepsilon_t \sim \mathrm{Lognormal}(-\tfrac{1}{2}\sigma^2,\sigma^2).\] Assume the same measurement model as we used for the Ricker model. Try simulating from a few choices of the parameters. What are the similarities and differences between simulations you obtain from the Beverton-Holt model and those from the Ricker model? Present one simulation to support your comments.
Question 8.3. How long did this homework take?
This homework is conceptually quite simple, but involves overcoming various technical hurdles. The hurdles could be overcome quite quickly, or could turn into a longer battle.
To make progress on statistical inference for POMP models, we have to solve these underlying computational issues.
If you get stuck, ask for help from your peers and/or Dao and/or me. Please report how much time this homework ends up taking, to help me monitor how many difficulties are encountered.