Objectives
To run a likelihood-based analysis for a nonlinear POMP model using pomp on a Linux server.
To discuss the possibility of running the analysis on the University of Michigan Flux cluster, a collection of Linux servers.
This material is optional. However, it may be useful for your final project. More broadly, doing statistical computing on Linux is useful training for modern statisticians who also aspire to be data scientists, working with complex models and/or big data.
You only need minimal Linux experience to run R. One of many introductory tutorials online is http://www.ee.surrey.ac.uk/Teaching/Unix.
Log on to a Linux machine. You can access scs.itd.umich.edu
, via SSH. From a Mac or Linux machine, you should be able to run the following from a terminal:
ssh scs.itd.umich.edu
From a Windows machine, you can use the UM-supported SSH software. Off campus, you may need to use the umich virtual private network (VPN). You can log in to bayes.stat.lsa.umich.edu
if you have access to the Statistics department machines. Bayes is actually a collection of Linux machines, and when you log in you are automatically placed on the machine which currently has the most availability. You should have access to Bayes if you are a Statistics Masters or PhD student. I can get access for others, upon request.
Typing
R
at a terminal prompt starts R.
Install pomp. In R, run install.packages("pomp")
.
When prompted to choose a mirror site, select 22 (HTTP mirrors)
. For some technical reason, the default HTTPS CRAN mirrors don’t seem to be working on either the scs or bayes machines.
You may have to create a personal R library as follows:
> install.packages("pomp")
Warning in install.packages("pomp") :
'lib = "/usr/lib64/R/library"' is not writable
Would you like to use a personal library instead? (y/n) y
Would you like to create a personal library
~/R/x86_64-unknown-linux-gnu-library/3.2.3
to install packages into? (y/n) y
For converting Rmarkdown to HTML, there is a technical consideration.
The package rmarkdown uses a program called pandoc.
The easiest way to install the necessary version of pandoc is by installing Rstudio.
However, Linux servers often do not have Rstudio installed.
The simplest way to deal with this issue is to bypass it. Think of the Linux server as the place you will do computations and cache the results. Move these results back to your personal machine for text editing.
According to this workflow, we don’t need pandoc on the Linux server.
We just need to carry out the first stage of processing, which is to knit
the .Rmd file to turn it into an .md file. The .md file is a markdown file which contains processed R output.
Install the R package knitr.
Moving files to and from the Linux server. If your personal machine is running Mac or Linux, you can use scp from a terminal. If your personal machine runs Windows, you can use WinSCP.
Within R, you can convert an Rmd file to md by, for example,
knit("notes15.Rmd")
This will produce a file notes15.md
which (together with the subdirectory figure
containing the figures) can be moved back to your personal machine, where you can run Rstudio to convert .md to .html. If you also move back the cache subdirectory and any other files of cached output, you can make further edits on your personal machine without having to re-run computationally intensive results.
The Statistics department has access to 200 cores on the University of Michigan Flux cluster. This cluster of Linux machines is a more powerful computational environment than the Bayes servers.
Using the Flux cluster has an extra step compared to a Linux server: we have to learn how to submit jobs. This is not necessary material for Stats 531, but it is a powerful computationaly approach.
There is an online tutorial, “Flux in 10 easy steps” which may be useful if you want to try this out. Let me or Dao know if you need more information and assistance.