Evaluation of NUTS — more comments on the paper by Hoffman and Gelman
Here is my second post on the paper by Matthew Hoffman and Andrew Gelman on “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo”, available from arxiv.org. In my first post, I discussed how well the two main innovations in this “NUTS’” method — ending a trajectory when a “U-Turn” is encountered, and adaptively setting the stepsize — can be expected to work. In this post, I will discuss the empirical evaluations in the NUTS paper, and report on an evaluation of my own, made possible by the authors having kindly made available the NUTS software, concluding that the paper’s claims for NUTS are somewhat overstated. The issues I discuss are also of more general interest for other evaluations of HMC. (more…)
No U-Turns for Hamiltonian Monte Carlo – comments on a paper by Hoffman and Gelman
Matthew Hoffman and Andrew Gelman recently posted a paper called “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” on arxiv.org. It has been discussed on Andrew’s blog.
It’s a good paper, which addresses two big barriers to wider use of Hamiltonian Monte Carlo — the difficulties of tuning the trajectory length and tuning the stepsize to use when simulating a trajectory. The name “No-U-Turn Sampler” (NUTS) comes from their way of addressing the problem of tuning the trajectory length — repeatedly double the length of the current trajectory, until (simplifying a bit) there is a part of the trajectory that makes a “U-Turn”, heading back towards its starting point. This doubling method is clever, and (as I discuss below) one aspect of it seems useful even apart from any attempt to adaptively set the trajectory length. They also introduce a method of adapting the stepsize during the burn-in period, so as to achieve some desired acceptance probability.
However, I don’t think these are completely satisfactory ways of setting trajectory lengths and stepsizes. As I’ll discuss below, these problems are more complicated than they might at first appear. (more…)
GRIMS — General R Interface for Markov Sampling
I have released a (very) preliminary version of my new MCMC software in R, which I’m calling GRIMS, for General R Interface for Markov Sampling. You can get it here.
This software differs from other more-or-less general MCMC packages in several respects, all but one of which make it, I think, a much better tool for serious MCMC applications. Here are some highlights: (more…)
Innumeracy at the Globe and Mail
In the June 23 print edition of the Globe and Mail (billed as “Canada’s National Newspaper”), there’s an article on data centres (“Hewers of wood, storers of data”), in which, on page B4, one can read the following:
Greenpeace recently released a report that said if the Internet were a country, it would be the fifth-largest consumer of energy, largely because of the massive data centres that run unseen in the background. The group estimated that the centres will use 1.9 billion kilowatt hours of electricity by 2020 — more than the amount currently used by Canada, France, Germany and Brazil combined. (The average US home uses 8,000 kilowatt hours a year.)
An exercise for the reader: How many logical fallacies, arithmetic errors, or contradictions of common knowledge can you find in this passage?
I haven’t tried to determine whether these fallacies originate in the (unidentified) Greenpeace report, or are original to the Globe and Mail.
Two textbooks on probability using R
This fall, I’ll be teaching a second-year course on Probability with Computer Applications, which is required for Computer Science majors. I’ve taught this before, but that was five years ago, so I’ve been looking to see what new textbooks would be suitable. The course aims not just to use computer science applications as examples, but also to reinforce concepts of probability with programs, and to show how simulation can be used to solve problems that aren’t easily solved analytically. I’ve used R for the programming part, and plan to again, so I was naturally interested in two recent textbooks that seemed to have similar aims:
Introduction to Probability with R, Kenneth Baclawski, Chapman & Hall / CRC.
Probability with R: An Introduction with Computer Science Applications, Jane M. Horgan, Wiley.
I’ve now had a look at both of these textbooks. Unfortunately, they are both seriously flawed. Even more unfortunately, although some of the flaws in these books are particularly striking, I’ve seen similar, if usually less serious, problems in many other textbooks. (more…)
New patches to speed up R 2.13.0
I have now released a new collection of 30 patches to speed up R version 2.13.0. You can get them here
Assessing how much these patches speed up R is difficult. First of all, the speedup varies tremendously with the type of program. It also varies quite a bit with the machine and compiler used to run R. Finally, it varies in apparently random ways — changing code in one part of the R interpreter can change the speed of operations that never use the modified code by plus or minus 5% or more. This is presumably due to the change altering the exact addresses of other code segments, with consequent effects on alignment of memory fetches or on cache behaviour.
Nevertheless, here is a comparison of R 2.13.0 without modification and with all my patches applied, with and without compilation of R functions. The tests were done with an Intel X5680 processor running at 3.33GHz in 64-bit mode using gcc 4.4.4 under Red Hat Linux with default R configuration parameters. The tests use my suite of speed tests for R.
Here are some highlights: (more…)
Slowing down matrix multiplication in R
After I realized that some aspects of R’s implementation are rather inefficient, one of the first things I looked at was matrix multiplication. There I found a huge performance penalty for many matrix multiplies, a penalty which remains in the current version, 2.13.0. As discussed below, eliminating this penalty speeds up long vector dot products by a factor of 9.5 (on my new machine), and other operations where the result matrix has at least one small dimension are sped up by factors that are somewhat smaller, but still substantial. There’s a long story behind why R’s matrix multiplies are so slow… (more…)
Speed tests for R — and a look at the compiler
I’ve gotten back to work on speeding up R, starting with improving my suite of speed tests. Among other new features, this suite allows one to easily try out the “byte-code” compiler that is now a standard part of the latest release of R, version 2.13.0. You can get the suite here.
I’ve been running these tests on my new workstation, which has a six-core Intel X5680 processor, running at 3.33GHz. Unfortunately, it’s clear that thing runs somewhat slower when you use all the cores at once, so for consistency one needs to do the speed tests using just one core. (Or one needs some more elaborate, and unclear, protocol for testing the speed of R in a muticore environment.) I haven’t figured out how to get Red Hat Linux to compile 32-bit applications yet, so all the tests are in a 64-bit environment.
I’ve started with comparing the speed of R-2.13.0 with and without functions being compiled, and with comparing R-2.13.0 (without the compiler) to R-2.11.1, which was the last release before some of my speed improvements were incorporated. A plot of the results is here. (more…)
Ensemble MCMC
I’m glad to have managed, before teaching starts again, to have finished a Technical Report (available here or at arxiv.org) with what may be my most unwieldy title ever:
MCMC Using Ensembles of States for Problems with Fast and Slow Variables such as Gaussian Process Regression
I wanted the title to mention all three of the nested ideas in the paper. Actually, I wasn’t able to fit in a fourth, most general, idea, of MCMC methods based on caching and mapping (see here). Here is the abstract:
I introduce a Markov chain Monte Carlo (MCMC) scheme in which sampling from a distribution with density π(x) is done using updates operating on an “ensemble” of states. The current state x is first stochastically mapped to an ensemble, x(1),…,x(K). This ensemble is then updated using MCMC updates that leave invariant a suitable ensemble density, ρ(x(1),…,x(K)), defined in terms of π(x(i)) for i=1,…,K. Finally a single state is stochastically selected from the ensemble after these updates. Such ensemble MCMC updates can be useful when characteristics of π and the ensemble permit π(x(i)) for all i in {1,…,K} to be computed in less than K times the amount of computation time needed to compute π(x) for a single x. One common situation of this type is when changes to some “fast” variables allow for quick re-computation of the density, whereas changes to other “slow” variables do not. Gaussian process regression models are an example of this sort of problem, with an overall scaling factor for covariances and the noise variance being fast variables. I show that ensemble MCMC for Gaussian process regression models can indeed substantially improve sampling performance. Finally, I discuss other possible applications of ensemble MCMC, and its relationship to the “multiple-try Metropolis” method of Liu, Liang, and Wong and the “multiset sampler” of Leman, Chen, and Lavine.
I’ve also posted the programs used to produce the results. These haven’t been tested much beyond their use for the paper, but I hope to incorporate them into a general MCMC package in R (also including programs accompanying my review of Hamiltonian Monte Carlo). That’s my next project to do in whatever time I have available after teaching, administration, and a three-year-old daughter, along with more efforts to speed up R, so the that this MCMC package won’t be too slow.
