Avoiding Self Transitions in Gibbs Sampling

2024-03-28 at 10:11 pm

I have a new paper on Modifying Gibbs sampling to avoid self transitions. The idea is that an ordinary Gibbs sampling update for a variable will often choose a new value that is the same as the old value. That seems like a waste of time, and would be better avoided.
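
To make the waste concrete, here is a minimal illustrative sketch (in Python, not code from the paper) of an ordinary Gibbs update for one discrete variable: the new value is drawn from the conditional distribution without any reference to the old value, so a self transition happens with probability equal to the conditional probability of the old value.

```python
import numpy as np

def gibbs_update(cond_probs, rng):
    """Ordinary Gibbs update for one discrete variable: draw a new value from
    its conditional distribution, ignoring the current value entirely."""
    return rng.choice(len(cond_probs), p=cond_probs)

rng = np.random.default_rng(1)
cond_probs = np.array([0.6, 0.3, 0.1])   # a hypothetical conditional distribution
old_value = 0
draws = [gibbs_update(cond_probs, rng) for _ in range(100_000)]
print(np.mean([d == old_value for d in draws]))   # about 0.6: a self transition 60% of the time
```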

It’s not a new idea. I have long been aware of a method from 1996 due to Jun Liu, which replaces the Gibbs sampling update with a Metropolis-Hastings update whose proposal is always to a different value. This reduces self transitions, though not to the minimum possible, since the proposal may be rejected.
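
Liu’s proposal picks one of the other values with probability proportional to its conditional probability, with a standard Metropolis-Hastings acceptance test keeping the conditional distribution invariant. A rough sketch of an update of this sort (illustrative only, not code from the paper), continuing the setup above:

```python
import numpy as np

def metropolized_gibbs_update(cond_probs, current, rng):
    """Update in the style of Liu's Metropolized Gibbs sampler: propose j != current
    with probability cond_probs[j] / (1 - cond_probs[current]), then accept with
    probability min(1, (1 - cond_probs[current]) / (1 - cond_probs[j])).
    Assumes cond_probs[current] < 1, so that some other value can be proposed."""
    p = np.asarray(cond_probs, dtype=float)
    proposal = p.copy()
    proposal[current] = 0.0
    proposal /= proposal.sum()              # proportional to p[j] over j != current
    j = rng.choice(len(p), p=proposal)
    if rng.random() < min(1.0, (1.0 - p[current]) / (1.0 - p[j])):
        return j                            # accepted: move to the proposed value
    return current                          # rejected: a self transition after all
```

The rejection branch at the end is exactly the residual self transition that this method does not eliminate.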

Then, a while ago, I came across a method described in 1992 by Frigessi, Hwang, and Younes that reduces self transitions further than Liu’s method. This was independently rediscovered by Tjelmeland in 2004. But it seems it’s not well known, partly because neither of their papers really promotes it. It occurred to me that it might be good to write a short note, maybe five pages long, describing these methods and comparing them.

Then I came across some methods by Suwa and Todo, from 2010 and 2022, that reduce self transitions to the theoretical minimum, using an entirely different approach.
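
Here the “theoretical minimum” refers to the fact that, for a single conditional distribution, no update that leaves that distribution invariant can push the average probability of a self transition below the floor forced by a value with conditional probability greater than one half. A quick sketch of the bound:

```latex
% For an update P leaving the conditional probabilities p_1, ..., p_K invariant,
% the flow out of value i equals the flow in from the other values, which is at
% most their total probability:
%   p_i (1 - P(i,i)) = \sum_{j \ne i} p_j P(j,i) \le 1 - p_i,
% so p_i P(i,i) \ge 2 p_i - 1 for each i, and hence
\sum_i p_i \, P(i,i) \;\ge\; \max\Bigl(0,\; 2\max_i p_i - 1\Bigr).
```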

I also thought of a new method, sort of the reverse of the Frigessi et al. method. And then I thought of a modification of this new method that also reduces self transitions to the theoretical minimum. And then I thought of another completely different class of methods, including one that also achieves the theoretical minimum.

Along the way, I wondered what one can prove about these methods. Liu’s method, and that of Frigessi et al. and Tjelmeland, can be shown to be better than Gibbs sampling (in the context of random selection of a variable to update) using Peskun’s Theorem, because they are reversible methods that always increase every probability of moving to a different value. But Peskun’s Theorem doesn’t apply to the methods of Suwa and Todo, or to any of my new methods. Some of these methods aren’t reversible, and those that are reversible can decrease some of the probabilities of moving to different values.
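
For readers who haven’t seen it, Peskun’s Theorem says, roughly, that among transition matrices reversible with respect to the same distribution, one that assigns at least as much probability to every off-diagonal move gives estimates with asymptotic variance at least as small, for every function being estimated:

```latex
% Peskun ordering (roughly stated): if P_1 and P_2 are both reversible with
% respect to \pi and P_1(x,y) \ge P_2(x,y) for all x \ne y, then for every
% function f, the asymptotic variances of the MCMC estimates of E_\pi[f] satisfy
v(f, P_1) \;\le\; v(f, P_2).
```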

There are actually two theoretical issues here. The first is whether, when seen as a local update for a single variable, one of these new methods is superior to Gibbs sampling. The second is whether such superiority carries over to a full MCMC scheme, in which all the variables are updated in some fashion.

These questions inspired a paper by myself and Jeffrey Rosenthal, on Efficiency of reversible MCMC methods: elementary derivations and applications to composite methods. As the title indicates, we only deal with reversible update methods, with the application also limited to full schemes in which the variable to update is selected randomly. But in that context we do show how variable updates can be shown to improve on Gibbs sampling, and how such improvements can carry over to methods that combine such updates. We also provide a self-contained and accessible derivation of the needed theory.

So, with the theoretical issues dealt with, I could get back to my original paper. It turned out to be a bit longer than the five pages I’d planned. About seventy-nine pages longer.

Some of these pages are devoted to details of efficient implementations of all the methods. There are also extensive empirical assessments of performance on four different problems. In addition to showing how well the different methods perform, these experiments reveal some interesting phenomena regarding the order in which variables are updated and the use of “thinned” estimates. There’s a GitHub repository with the code and results.

Though several of the methods I tested performed well, I think one of my new methods is likely to be the most robust, and it also has the best theoretical guarantees. I call this method ZDNAM, which stands for Zero-self Downward Nested Antithetic Modification (maybe a bit of a mouthful). If you’ve been using Gibbs sampling for discrete variables, you might give it a try!

Entry filed under: Monte Carlo Methods, Statistics, Statistics - Computing.
