Objectives
To gain practical experience working with POMP models, we start with installing the pomp package and proceed to two introductory exercises.
Computation time is an unavoidable consideration when working with simulation-based inference, for all but small datasets and simple models.
The pomp package therefore allows you to specify the most computationally intensive steps—usually, simulation of the stochastic dynamic system, and evaluation of the measurement density—as snippets of C code.
Consequently, to use pomp, your R program must have access to a C compiler. In addition, pomp takes advantage of some Fortran code and therefore requires a Fortran compiler.
Installing the necessary compilers should be fairly routine, but does involve an extra step beyond the usual installation of an R package, unless you are running the Linux operating system for which they are usually installed by default. Given how fundamental C and Fortran are to scientific computing, it is unfortunate that Mac and Windows do not provide these compilers by default.
If your machine runs Windows, you must install Rtools. This will give you the ability to compile C code and dynamically link it into an R session.
Download Rtools from CRAN and install it. When installing Rtools, choose the “Package authoring installation” option. Also during the installation, you must tick the “edit system PATH” box.
If, having installed the latest version of Rtools compatible with your R, the scripts below fail, try installing a “frozen” version of Rtools.
So that you can compile C code and dynamically link it into an R session, you will need to make sure you have the Xcode app installed before running the second script above. This is free and can be installed via the App Store or downloaded from https://developer.apple.com/xcode/downloads/.
Check your R version. If you run into any difficulties, please sure you have at least version 3.4.0 of R. There are some substantial changes from R3.3 to R3.4, and it’s best not to spend time solving issues with outdated software. The latest version is R3.4.4. You can run getRversion()
to see what you have currently installed.
Install and/or update relevant R packages. The file packages.R contains a list of relevant R packages. This list does not include pomp itself, but all will be helpful if you want to re-run or modify the R code in the notes. Run the following, in R:
update.packages()
source("https://ionides.github.io/531w18/hw07/packages.R",echo=TRUE)
Install and test the pomp package by running pompTest.R
source("https://ionides.github.io/531w18/hw07/pompTest.R",echo=TRUE)
If you get the output PompTest successful!
then pomp is working on your system. If not, a likely possibility is that you do not have the necessary C compiler. Try running helloC.R:
source("https://ionides.github.io/531w18/hw07/helloC.R",echo=TRUE)
If this fails to give the “Hello!” message, you will need to follow the instructions below that correspond to your operating system. Re-run this step after trying the following operating-system-dependent advice.
Sys.info()
and sessionInfo()
to get printouts of the operating system and software version numbers.Usually, Linux and Unix distributions have the necessary compilers installed by default. Indeed, the simplest way to run pomp may be to use SSH to access a Linux server. We can discuss in class how to do this. You should be able to do the following:
Note for those without previous Linux experience: Some familiarity with Linux is a basic skill for modern applied statistics, since Linux is currently the dominant environment for scientific and high-performance computing.
For example, 99.4% of the fastest 500 supercomputers run Linux. The University of Michigan computing cluster, Flux, also runs Linux.
You don’t need to learn Linux just for this course, but if you view improving your Linux skills as a good investment, working in this environment is encouraged. One of many introductory tutorials online is http://www.ee.surrey.ac.uk/Teaching/Unix.
Access scs.itd.umich.edu
, via SSH or using the umich virtual private network (VPN). These machines run R3.4.2.
The bayes computation servers are available to you if you have access to the Statistics department machines. You should have this access if you are a Statistics Masters or PhD student. I can get access for others, upon request. The bayes machines run R3.3.2, but that doesn’t appear to be a problem.
If you run Linux on your own machine and have trouble with either script above, make sure you have the GNU compiler collection (GCC) installed. Linux distributions typically include this by default but it is not impossible that you have somehow avoided this.
If the pompTest.R
script fails because you cannot load pomp, try installing it from source. The easiest way to do this is to use the devtools package. Do
install.packages("devtools")
library(devtools)
install_github("kingaa/pomp")
If, while trying to install from source, you receive the error
make: gfortran-4.8: No such file or directory
or some other error mentioning gfortran
then it is likely that you do not have the necessary version of gfortran installed. Have a look at these instructions and contact me if these don’t work for you.
You have probably failed to install the Rtools correctly.
Revisit the instructions above.
Ask for help if problems persist.
Please submit to Canvas an Rmd file addressing the following questions. Your Rmd file can read in the Parus major data from the internet, e.g., by
dat <- read.csv("https://ionides.github.io/531w18/10/parus.csv")
The Ricker equation can be reparameterized so that the scaling of \(P_n\) is explicit: \[
P_{n+1} = r\,P_{n}\,\exp\left(-\frac{P_{n}}{k}\right).
\] Modify the pomp
object created in the notes to reflect this reparameterization. Also, Modify the measurement model so that the data \({y_n^*}\) is modeled as \[
Y_n |P_n \sim \mathrm{Negbin}(\phi\,P_n,\psi).
\] Here, \(\mathrm{Negbin}(\mu,\psi)\) is the negative binomial distribution with mean \(\mu\) and probability parameter \(\psi\), and therefore variance \(\mu/\psi\). This parameterization corresponds in R to rbinom(...,mu,prob)
. See ?rnbinom
for documentation on the negative binomial distribution and the R Extensions Manual section on distribution functions for information on how to access these in C.
Try simulating from a few choices of the parameters, and present one simulation from a set of parameters that shows oscillatory behavior.
Construct a pomp object for the Parus major data modeled using the stochastic Beverton-Holt model, \[ P_{n+1} = \frac{a\,P_n}{1+b\,P_n}\,\varepsilon_n, \] where \(a\) and \(b\) are parameters and \[ \varepsilon_t \sim \mathrm{Lognormal}(-\tfrac{1}{2}\sigma^2,\sigma^2). \] Assume the same measurement model as we used for the Ricker model. Try simulating from a few choices of the parameters. What are the similarities and differences between simulations you obtain from the Beverton-Holt model and those from the Ricker model? Present one simulation to support your comments.
This homework is conceptually quite simple, but involves overcoming various technical hurdles. The hurdles may be overcome quite quickly, or could turn into a longer battle.
To make progress on statistical inference for POMP models, we have to solve these underlying computational issues.
If you get stuck, ask for help from your peers and/or Joonha and/or me. Please report how much time this homework ends up taking, to help me monitor how many difficulties are encountered.
Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.
Acknowledgments: The installation advice is based on http://kingaa.github.io/sbied/prep/preparation.html.