Posts filed under ‘Statistics’
The University of Toronto Department of Computer Science is recruiting for three positions at the Assistant Professor level — in Machine Learning, Computational Biology, and Systems.
The Machine Learning group in CS currently consists of Geoffrey Hinton, Richard Zemel, and myself (or at least, 25% of myself, the rest being in Statistics), along with a number of other associated faculty in CS and other departments, such as Ruslan Salakhutdinov and Brendan Frey.
The position in Computational Biology is joint with the Donnelly Centre for Cellular and Biomolecular Research. Many other research groups at the University of Toronto also work on computational biology, and there is significant interest within the Machine Learning group.
The Systems position is at the suburban campus in Mississauga, though with substantial research and graduate teaching activity at the downtown campus.
Two papers involving Hamiltonian Monte Carlo (HMC) have recently appeared on arxiv.org — Jascha Sohl-Dickstein’s Hamiltonian Monte Carlo with reduced momentum flips, and Jascha Sohl-Dickstein and Benjamin Culpepper’s Hamiltonian annealed importance sampling for partition function estimation.
These papers both relate to the variant of HMC in which the momentum is only partially refreshed after each trajectory, which allows random-walk behaviour to be suppressed even when trajectories are short (perhaps just a single leapfrog step). This variant is described in Section 5.3 of my HMC review. The method described in the first paper by Sohl-Dickstein could presumably be applied in the context of the second paper by Sohl-Dickstein and Culpepper, but it seems they haven’t tried that yet (or haven’t yet written it up).
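To make this concrete, here is a minimal sketch in R (my illustration, not code from either paper) of the partial momentum refreshment step: rather than drawing a completely fresh momentum after each trajectory, the old momentum p is mixed with Gaussian noise. The persistence parameter alpha is my notation; alpha = 0 recovers the standard full refresh, while alpha near 1 lets the momentum persist across many short trajectories.

    # Partial momentum refreshment: mix the old momentum with fresh noise.
    # The N(0,1) marginal distribution of each momentum component is
    # preserved, since alpha^2 + (1 - alpha^2) = 1.
    refresh_momentum <- function (p, alpha)
      alpha * p + sqrt(1 - alpha^2) * rnorm(length(p))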
In my post on MCMC simulation as a random permutation (paper available at arxiv.org here), I mentioned that this view of MCMC also has implications for the role of randomness in MCMC. This has also been discussed in a recent paper by Iain Murray and Lloyd Elliott on Driving Markov chain Monte Carlo with a dependent random stream.
For the simple case of Gibbs sampling for a continuous distribution, Murray and Elliott’s procedure is the same as mine, except that they do not have the updates of extra variables needed to produce a volume-preserving map. These extra variables are relevant for my importance sampling application, but not for what I’ll discuss here. The method is a simple modification of the usual Gibbs sampling procedure, assuming that sampling from conditional distributions is done by inverting their CDFs (a common method for many standard distributions). It turns out that after this modification, one can often eliminate the random aspect of the simulation and still get good results! (more…)
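To illustrate, here is a minimal sketch in R (mine, not code from either paper) of such an inverse-CDF Gibbs update, for one coordinate of a bivariate normal with correlation rho. The update is driven by a single uniform variate u; the deterministic schemes discussed above amount to replacing the call to runif with values from a non-random (or dependent) stream.

    # Gibbs update for x1 given x2, done by inverting the conditional CDF.
    # For a standard bivariate normal, x1 | x2 ~ N(rho*x2, 1-rho^2).
    gibbs_update_x1 <- function (x2, rho, u = runif(1))
      qnorm(u, mean = rho * x2, sd = sqrt(1 - rho^2))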
I’ve just finished a new paper. Continuing my recent use of unwieldy titles, I call it “How to view an MCMC simulation as a permutation, with applications to parallel simulation and improved importance sampling”.
The paper may look a bit technical in places, but the basic idea is fairly simple. I show that, after extending the state space a bit, it’s possible to view an MCMC simulation (done for some number of iterations) as a randomly selected map from an initial state to a final state that is either a permutation, if the extended state space is finite, or more generally a one-to-one map that preserves volume.
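A toy illustration (mine, not taken from the paper) of the starting point: once the random inputs used in an iteration are fixed, a Metropolis update becomes a deterministic map from the current state to the next. Note that this map need not be one-to-one, since a state that rejects a proposal and a state that accepts a move to the same point can both end up there, which is why the state space must be extended.

    # One random-walk Metropolis update, with the randomness (proposal
    # noise z and acceptance uniform u) supplied as fixed arguments, so
    # that x determines the next state exactly.
    metropolis_map <- function (x, z, u, log_density)
      if (log(u) < log_density(x + z) - log_density(x)) x + z else x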
Why is this interesting? I think it’s a useful mathematical fact — sort of the opposite of how one can “couple” MCMC simulations in a way that promotes coalescence of states. It may turn out to be applicable in many contexts. I present two of these in the paper. (more…)
Here is my second post on the paper by Matthew Hoffman and Andrew Gelman on “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo”, available from arxiv.org. In my first post, I discussed how well the two main innovations in this “NUTS” method — ending a trajectory when a “U-Turn” is encountered, and adaptively setting the stepsize — can be expected to work. In this post, I will discuss the empirical evaluations in the NUTS paper, and report on an evaluation of my own, made possible by the authors having kindly made the NUTS software available. I conclude that the paper’s claims for NUTS are somewhat overstated. The issues I discuss are also of more general interest for other evaluations of HMC. (more…)
Matthew Hoffman and Andrew Gelman recently posted a paper called “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” on arxiv.org. It has been discussed on Andrew’s blog.
It’s a good paper, which addresses two big barriers to wider use of Hamiltonian Monte Carlo — the difficulties of tuning the trajectory length and tuning the stepsize to use when simulating a trajectory. The name “No-U-Turn Sampler” (NUTS) comes from their way of addressing the problem of tuning the trajectory length — repeatedly double the length of the current trajectory, until (simplifying a bit) there is a part of the trajectory that makes a “U-Turn”, heading back towards its starting point. This doubling method is clever, and (as I discuss below) one aspect of it seems useful even apart from any attempt to adaptively set the trajectory length. They also introduce a method of adapting the stepsize during the burn-in period, so as to achieve some desired acceptance probability.
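To make the stopping rule concrete, here is a sketch in R of the U-Turn check as I read it from the paper. The trajectory segment has endpoints theta_minus and theta_plus, with corresponding momenta p_minus and p_plus; the criterion fires when continuing in either direction would begin to shrink the distance between the two ends.

    # U-Turn criterion: stop when the displacement between the trajectory's
    # endpoints has a negative projection on the momentum at either end.
    u_turn <- function (theta_minus, theta_plus, p_minus, p_plus) {
      d <- theta_plus - theta_minus
      sum(d * p_minus) < 0 || sum(d * p_plus) < 0
    }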
However, I don’t think these are completely satisfactory ways of setting trajectory lengths and stepsizes. As I’ll discuss below, these problems are more complicated than they might at first appear. (more…)
I have released a (very) preliminary version of my new MCMC software in R, which I’m calling GRIMS, for General R Interface for Markov Sampling. You can get it here.
This software differs from other more-or-less general MCMC packages in several respects, all but one of which make it, I think, a much better tool for serious MCMC applications. Here are some highlights: (more…)
In the June 23 print edition of the Globe and Mail (billed as “Canada’s National Newspaper”), there’s an article on data centres (“Hewers of wood, storers of data”), in which, on page B4, one can read the following:
Greenpeace recently released a report that said if the Internet were a country, it would be the fifth-largest consumer of energy, largely because of the massive data centres that run unseen in the background. The group estimated that the centres will use 1.9 billion kilowatt hours of electricity by 2020 — more than the amount currently used by Canada, France, Germany and Brazil combined. (The average US home uses 8,000 kilowatt hours a year.)
An exercise for the reader: How many logical fallacies, arithmetic errors, or contradictions of common knowledge can you find in this passage?
I haven’t tried to determine whether these fallacies originate in the (unidentified) Greenpeace report, or are original to the Globe and Mail.
This fall, I’ll be teaching a second-year course on Probability with Computer Applications, which is required for Computer Science majors. I’ve taught this before, but that was five years ago, so I’ve been looking to see what new textbooks would be suitable. The course aims not just to use computer science applications as examples, but also to reinforce concepts of probability with programs, and to show how simulation can be used to solve problems that aren’t easily solved analytically. I’ve used R for the programming part, and plan to again, so I was naturally interested in two recent textbooks that seemed to have similar aims:
Introduction to Probability with R, Kenneth Baclawski, Chapman & Hall / CRC.
Probability with R: An Introduction with Computer Science Applications, Jane M. Horgan, Wiley.
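To give a flavour of the kind of simulation the course emphasizes, here is a toy example in R (mine, not taken from either book): estimating the probability that at least two of 23 people share a birthday, a problem where the simulation can be checked against the exact answer.

    # Estimate P(at least two of 23 people share a birthday) by simulation.
    sim_birthday <- function (n_people = 23, n_sims = 100000)
      mean(replicate(n_sims,
             any(duplicated(sample(365, n_people, replace = TRUE)))))
    sim_birthday()              # simulation estimate, about 0.507
    1 - prod((365:343) / 365)   # exact answer, 0.5073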
I’ve now had a look at both of these textbooks. Unfortunately, they are both seriously flawed. Even more unfortunately, although some of the flaws in these books are particularly striking, I’ve seen similar, if usually less serious, problems in many other textbooks. (more…)
I have now released a new collection of 30 patches to speed up R version 2.13.0. You can get them here.
Assessing how much these patches speed up R is difficult. First of all, the speedup varies tremendously with the type of program. It also varies quite a bit with the machine and compiler used to run R. Finally, it varies in apparently random ways — changing code in one part of the R interpreter can change the speed of operations that never use the modified code by plus or minus 5% or more. This is presumably due to the change altering the exact addresses of other code segments, with consequent effects on alignment of memory fetches or on cache behaviour.
Nevertheless, here is a comparison of R 2.13.0 without modification and with all my patches applied, with and without compilation of R functions. The tests were done with an Intel X5680 processor running at 3.33GHz in 64-bit mode using gcc 4.4.4 under Red Hat Linux with default R configuration parameters. The tests use my suite of speed tests for R.
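For a rough sense of how such comparisons are made, here is a minimal sketch (not the actual test suite) that times an interpreted R function against its byte-compiled version, using the compiler package introduced in R 2.13.0:

    library(compiler)
    f  <- function (n) { s <- 0; for (i in 1:n) s <- s + i/2; s }
    fc <- cmpfun(f)          # byte-compiled version of f
    system.time(f(1e6))      # interpreted
    system.time(fc(1e6))     # compiled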
Here are some highlights: (more…)