Tips, tricks, advice, and examples.

Pro-tip: Never use whitespace in file or directory names.

A perpetual source of annoying errors! See, e.g., Issue 115.

Pro-tip: When using RStudio, turn off automatic saving and restoring of the global workspace.

By default, when you quit a session, R asks whether to save the global workspace to a hidden file, .RData, in the current working directory. Presumably because this behavior is annoying, RStudio by default answers the "Do you want to save to .RData?" question for you, in the affirmative. Also by default, and with only an easily overlooked message in the startup banner, R loads such a file at the start of a session, re-establishing the old workspace. Because the file is hidden and the behavior is easy to forget about, this can lead to errors that are difficult to track down: for example, one can obtain different results on different machines in a large parallel computation, despite the code being precisely the same! For these reasons, it is best to put a stop to all of this skulduggery.

To do so, go to the "Tools" menu in RStudio and select "Global Options". Make sure the "Restore .RData into workspace at startup" box is unticked. For good measure, set "Save workspace to .RData on exit" to "Never".
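If you prefer to make these changes from the console, the usethis package (not part of pomp) provides a helper intended to set the same RStudio preferences; this is offered only as a suggestion, so check the documentation of your installed usethis version:

library(usethis)
use_blank_slate("user")   ## ask RStudio never to save or restore .RData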

If you ever do want to save your workspace, it is as easy as save.image(file="<filename>.rda"); restoring it is a matter of load("<filename>.rda"). The file you create will be visible, as of course it should be: you gain nothing by hiding things from yourself!
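For example, with a file name of your own choosing (my-session.rda below is purely illustrative):

save.image(file="my-session.rda")   ## save every object in the global workspace
load("my-session.rda")              ## later, restore those objects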

Potential solution for compilation errors on Windows

  1. Install Rtools (not an R package). The installer can be downloaded from CRAN (https://cran.r-project.org/bin/windows/Rtools/). There is also a YouTube video showing how to install Rtools.

  2. If somehow Rtools was not added to the system PATH, you can add the Rtools directory to the PATH manually in the Windows control panel. The Rtools installation instructions (and the video) show how to do this from within an R session; a minimal sketch of the relevant commands also appears at the end of this section.

  3. To test whether R is linked with Rtools, run the following commands in R:

library(pkgbuild)
setup_rtools(cache = TRUE, debug = FALSE)

  4. If the code above returns TRUE but the compilation error persists, run the following code to link R with Rtools manually. You will need to run these commands each time you restart R; a better solution has yet to be found.

library(pkgbuild)
find_rtools()   ## locate Rtools and cache its location for this session
has_devel()     ## verify that a trivial package with compiled code can be built

(This solution is known to work with Windows 7, R 4.0.0, and Rtools 4.0.)
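As mentioned in step 2, the Rtools directory can also be added to the PATH for the current R session only. The following is a minimal sketch, assuming a default Rtools40 installation in C:/rtools40; adjust the path to match your own system:

## prepend the Rtools build utilities to the PATH for this session only
Sys.setenv(PATH=paste("C:/rtools40/usr/bin",Sys.getenv("PATH"),sep=";"))
Sys.which("make")   ## should now point to Rtools' copy of make

Because the change lasts only until R is restarted, this belongs near the top of your scripts; the Rtools installation instructions describe how to make it permanent via a .Renviron file.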

Reproducibility on a multicore machine via bake and stew

It is often the case that heavy pomp computations are best performed in parallel on a cluster or multi-core machine. This poses some challenges in trying to ensure reproducibility and avoiding repetition of expensive calculations. The bake, stew, and freeze functions provide some useful facilities in this regard.

For example:

library(pomp)
library(foreach)
library(doMC)
library(doRNG)

bake(file="pfilter1.rds",{
  registerDoMC(5)
  registerDoRNG(459983011)

  ricker() -> rick

  foreach (i=1:10, .combine=c, .packages="pomp") %dopar% {
    pfilter(rick,Np=1000)
  }
}) -> pfs

In the above, bake first checks whether the file pfilter1.rds exists. If it does, bake loads it (using readRDS) and stores the result in pfs. If it does not, bake evaluates the expression enclosed in the braces, stores the result in pfilter1.rds, and returns it.

The bake function handles a single R object, which it either computes and stores or else retrieves and returns. To produce several objects in a reproducible way, use stew. For example:

stew(file="pfilter2.rda",{
  ricker() -> rick
  te <- system.time(
    foreach (i=1:10, .combine=c, .packages="pomp") %dopar% {
      pf <- pfilter(rick,Np=1000)
      logLik(pf)
    } -> ll
  )
})

In the above, stew again checks whether pfilter2.rda exists and evaluates the braced expression only if it does not. The objects rick, te, and ll are created during this evaluation and are stored in pfilter2.rda, to be retrieved if the snippet is run a second time.
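The freeze function mentioned above serves a related purpose but caches nothing to disk: it evaluates an expression with the RNG seeded as you specify and then restores the pre-existing RNG state, so surrounding stochastic code is unaffected. A minimal sketch (the seed is arbitrary):

library(pomp)
freeze(rnorm(3),seed=1913851059)   ## reproducible draws; the RNG state is restored afterward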

Keeping a database of parameter-space explorations

Likelihood surfaces for dynamic models can be very complex, and the computations needed to explore them can be expensive. Keeping a record of every parameter point visited, along with the likelihood computed at each point, is a good way to ensure that your picture of the likelihood surface continually improves.

Doing this can be as simple as maintaining a CSV file with one column for each parameter, plus columns for the likelihood and its standard error. It can be useful to supplement these with the name of the model and any other qualifying information.
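As a concrete illustration, the following sketch appends a new batch of estimates to such a file, creating the file if it does not yet exist. The file name params.csv, the function name, and the loglik and loglik.se columns are illustrative choices, not pomp conventions:

## 'new' is a data frame with one row per parameter point and columns for the
## parameters plus 'loglik' and 'loglik.se'; its columns must match the file's
append_results <- function (new, file="params.csv") {
  if (file.exists(file)) {
    new <- rbind(read.csv(file),new)
  }
  new <- new[order(-new$loglik),]   ## keep the best fits at the top
  write.csv(new,file=file,row.names=FALSE)
  invisible(new)
}

Calling a function like this at the end of each estimation run keeps the database current, and sorting by likelihood makes it easy to see at a glance where the best fits lie.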