MCMC simulation as a random permutation
I’ve just finished a new paper. Continuing my recent use of unwieldy titles, I call it “How to view an MCMC simulation as a permutation, with applications to parallel simulation and improved importance sampling”.
The paper may look a bit technical in places, but the basic idea is fairly simple. I show that, after extending the state space a bit, an MCMC simulation (run for some number of iterations) can be viewed as a randomly selected map from an initial state to a final state: a permutation, if the extended state space is finite, or, more generally, a one-to-one map that preserves volume.
Why is this interesting? I think it’s a useful mathematical fact — sort of the opposite of how one can “couple” MCMC simulations in a way that promotes coalescence of states. It may turn out to be applicable in many contexts. I present two of these in the paper.
One application is to parallel simulation from many initial states. The obvious way to run many MCMC simulations is to do them completely independently, but this isn't as easy as it seems, since you need to make sure that the pseudo-random number streams you use are actually independent. Even setting that aside, you might prefer to use a single random stream, since this might speed up all the simulations (if they can all access the stream efficiently). With standard MCMC, however, sharing a stream could be disastrous: the chains will often coalesce to the same state, eliminating all the benefit of multiple parallel simulations. This can't happen with permutation MCMC, because a permutation never maps two points to one. Empirically, the results with permutation MCMC are indistinguishable from those of independent simulations, though this isn't quite true in theory.
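The coalescence problem is easy to see in a toy setting. The sketch below (my own hypothetical example, not taken from the paper) runs three chains on a small finite state space, all driven by the same random stream: with the usual inverse-CDF update they soon collapse to a single state, while a bijective (permutation-style) update, shown only to illustrate the no-coalescence property, keeps them distinct.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small 3-state Markov chain with an arbitrary transition matrix
# (hypothetical example for illustration).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
cdf = np.cumsum(P, axis=1)

def step(state, u):
    """Standard inverse-CDF update: for a fixed u, several states can map
    to the same next state, so chains sharing u can coalesce."""
    return int(np.searchsorted(cdf[state], u))

def perm_step(state, u):
    """A bijective update: for a fixed u, distinct states map to distinct
    states.  (This toy shift does not leave P's stationary distribution
    invariant; it only illustrates that a permutation cannot coalesce.)"""
    return (state + int(3 * u)) % 3

chains = [0, 1, 2]        # three chains with distinct starting states
perm_chains = [0, 1, 2]
for _ in range(200):
    u = rng.random()      # one shared pseudo-random number per iteration
    chains = [step(s, u) for s in chains]
    perm_chains = [perm_step(s, u) for s in perm_chains]

print(chains)       # shared-stream chains have (almost surely) coalesced
print(perm_chains)  # permutation chains are still three distinct states
```

Once two inverse-CDF chains land on the same state they agree forever, since they see identical random numbers; the permutation update rules this out by construction.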
A more far-reaching application is to importance sampling. This was my original motivation, as I'd like to develop a more general method along the lines of Hamiltonian importance sampling. The crucial issue is that, to use a distribution for importance sampling, it must be possible to compute the probability density (at least up to a constant factor) of each point sampled, since that density is needed to find the point's importance weight. If the distribution is defined in terms of some complex map, this will generally be impossible: in general you have to consider all points that map to the sampled point, and even when the map is known to be one-to-one, you need the absolute value of the determinant of its Jacobian matrix, which would often be hopeless to compute. It's not hopeless, though, if you happen to know that the map preserves volume, in which case the Jacobian factor must be one.
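In symbols (the standard change-of-variables formula, with notation chosen here rather than taken from the paper): if $x$ has density $q_0$ and $y = T(x)$ for a one-to-one differentiable map $T$, then the density of $y$ is

```latex
q(y) \;=\; \frac{q_0\bigl(T^{-1}(y)\bigr)}{\bigl|\det J_T\bigl(T^{-1}(y)\bigr)\bigr|}
```

so when $T$ preserves volume, $|\det J_T| \equiv 1$, giving $q(y) = q_0(T^{-1}(y))$, and the importance weight for a (possibly unnormalized) target density $\pi$ is simply

```latex
w(y) \;=\; \frac{\pi(y)}{q_0\bigl(T^{-1}(y)\bigr)}
```

with no Jacobian computation required.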
In the paper, I describe a relatively simple method that uses permutation MCMC to improve on an original importance sampling distribution that isn’t adequate, but that does have substantial overlap with the distribution of interest. However, I see this as just an initial illustration of the concept.
I also discuss the implications of permutation MCMC for whether randomness is actually required for MCMC. This relates to a recent paper by Iain Murray and Lloyd Elliott on “Driving Markov chain Monte Carlo with a dependent random stream”, which describes methods similar to some in my paper, though without the concern with volume preservation. I’ll talk about this aspect soon in another blog post.