<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Non-random MCMC</title>
	<atom:link href="http://radfordneal.wordpress.com/2012/05/03/non-random-mcmc/feed/" rel="self" type="application/rss+xml" />
	<link>http://radfordneal.wordpress.com/2012/05/03/non-random-mcmc/</link>
	<description></description>
	<lastBuildDate>Mon, 10 Jun 2013 16:04:09 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Nathan Bishop</title>
		<link>http://radfordneal.wordpress.com/2012/05/03/non-random-mcmc/#comment-831</link>
		<dc:creator><![CDATA[Nathan Bishop]]></dc:creator>
		<pubDate>Tue, 18 Sep 2012 00:17:34 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=1161#comment-831</guid>
		<description><![CDATA[Thank you for your answer. My background is in physics, so my questions might sound trivial, but what is the insight behind setting u equal to F^{-1}(x_j^{old})? Can I use a linear combination of F^{-1}(x_j^{old}) and F^{-1}(x_j^{older})?]]></description>
		<content:encoded><![CDATA[<p>Thank you for your answer. My background is in physics, so my questions might sound trivial, but what is the insight behind setting u equal to F^{-1}(x_j^{old})? Can I use a linear combination of F^{-1}(x_j^{old}) and F^{-1}(x_j^{older})?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Radford Neal</title>
		<link>http://radfordneal.wordpress.com/2012/05/03/non-random-mcmc/#comment-830</link>
		<dc:creator><![CDATA[Radford Neal]]></dc:creator>
		<pubDate>Mon, 17 Sep 2012 13:40:45 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=1161#comment-830</guid>
		<description><![CDATA[No.  That wouldn&#039;t make sense, since x needn&#039;t be bounded by 0 and 1, and so may not be a valid argument of F inverse.  Plus u is supposed to be in [0,1).]]></description>
		<content:encoded><![CDATA[<p>No.  That wouldn&#8217;t make sense, since x needn&#8217;t be bounded by 0 and 1, and so may not be a valid argument of F inverse.  Plus u is supposed to be in [0,1).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nathan Bishop</title>
		<link>http://radfordneal.wordpress.com/2012/05/03/non-random-mcmc/#comment-829</link>
		<dc:creator><![CDATA[Nathan Bishop]]></dc:creator>
		<pubDate>Mon, 17 Sep 2012 02:47:12 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=1161#comment-829</guid>
		<description><![CDATA[Do you mean setting u to F^{-1}(x_j^{old}), where x is the previous value?]]></description>
		<content:encoded><![CDATA[<p>Do you mean setting u to F^{-1}(x_j^{old}), where x is the previous value?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Radford Neal</title>
		<link>http://radfordneal.wordpress.com/2012/05/03/non-random-mcmc/#comment-776</link>
		<dc:creator><![CDATA[Radford Neal]]></dc:creator>
		<pubDate>Thu, 17 May 2012 16:48:14 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=1161#comment-776</guid>
		<description><![CDATA[That&#039;s an interesting question.  I&#039;ve just tried a 10-dimensional truncated multivariate normal, with mean vector zero and a covariance matrix built as follows (in R):
&lt;code&gt;
X =  rbind( c(1, 3, 3, 2, 1, 0, 1),
            c(2, 1, 3, 2, 1, 1, 1),
            c(2, 1, 2, 2, 1, 3, 1),
            c(1,-1,-2,-3,-1,-2,-1),
            c(0, 3, 0, 3, 1, 1, 0),
            c(2, 0,-2,-3, 0,-2,-1),
            c(2, 0,-2,-3, 2, 2,-1),
            c(1, 3, 2,-3, 2, 2,-1),
            c(0, 2, 0, 0, 5, 0, 0),
            c(0, 0, 3, 0, 0,-4, 0))
cov = X %*% t(X) + diag(10)
&lt;/code&gt;
I truncated this distribution to minimum values of -10 for all coordinates and maximum values of 5, 6, ..., 14.

I simulated 250 parallel Gibbs sampling chains, started from points drawn uniformly from [-1,1]&lt;sup&gt;10&lt;/sup&gt;, in several ways. First, using standard Gibbs sampling, with the 250 chains being independent. Second, using permutation Gibbs sampling, with the same random values for s&lt;sub&gt;0&lt;/sub&gt;, s&lt;sub&gt;1&lt;/sub&gt;, ... for all chains (but different initial states).  Finally, I tried fixing all the s&lt;sub&gt;i&lt;/sub&gt; to a single value, either 0.0073, 0.0187, 0.0518, or 0.1377.  I simulated 400 burn-in iterations for each chain, followed by 10000 iterations taken to be from equilibrium.

Using these 250x10000 states, I computed estimates of the expected values of each coordinate (which aren&#039;t zero, due to truncation) and of the expected values of the squares of the coordinates, along with estimated standard errors for these estimates, based on the variation over the 250 chains.  The results are plotted &lt;a href=&quot;http://radfordneal.files.wordpress.com/2012/05/tmvnorm4lb.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.  As you can see from the first and third plots, all the methods produce very consistent results.  The standard errors of the methods differ, however. Standard Gibbs sampling and permutation Gibbs sampling with random s&lt;sub&gt;i&lt;/sub&gt; are very similar, but the standard errors for the coordinates are smaller by roughly a factor of two when the s&lt;sub&gt;i&lt;/sub&gt; are fixed at 0.0073, corresponding to about a fourfold efficiency advantage over standard Gibbs sampling.  Fixing the s&lt;sub&gt;i&lt;/sub&gt; at 0.0187 gives a smaller advantage, and the advantage is smaller still when the s&lt;sub&gt;i&lt;/sub&gt; are fixed at 0.0518.  When the s&lt;sub&gt;i&lt;/sub&gt; are fixed at 0.1377, the results are very close to standard Gibbs sampling and permutation Gibbs sampling with random s&lt;sub&gt;i&lt;/sub&gt;.

The standard errors for the estimates of the expected values of the squares of the coordinates are all about the same, except that they are sometimes bigger when the s&lt;sub&gt;i&lt;/sub&gt; are fixed at 0.0073.

So it seems that fixing all the s&lt;sub&gt;i&lt;/sub&gt; to 0.0187 gives clearly better results than standard Gibbs sampling.]]></description>
		<content:encoded><![CDATA[<p>That&#8217;s an interesting question.  I&#8217;ve just tried a 10-dimensional truncated multivariate normal, with mean vector zero and a covariance matrix built as follows (in R):<br />
<code><br />
X =  rbind( c(1, 3, 3, 2, 1, 0, 1),<br />
            c(2, 1, 3, 2, 1, 1, 1),<br />
            c(2, 1, 2, 2, 1, 3, 1),<br />
            c(1,-1,-2,-3,-1,-2,-1),<br />
            c(0, 3, 0, 3, 1, 1, 0),<br />
            c(2, 0,-2,-3, 0,-2,-1),<br />
            c(2, 0,-2,-3, 2, 2,-1),<br />
            c(1, 3, 2,-3, 2, 2,-1),<br />
            c(0, 2, 0, 0, 5, 0, 0),<br />
            c(0, 0, 3, 0, 0,-4, 0))<br />
cov = X %*% t(X) + diag(10)<br />
</code><br />
I truncated this distribution to minimum values of -10 for all coordinates and maximum values of 5, 6, &#8230;, 14.</p>
<p>I simulated 250 parallel Gibbs sampling chains, started from points drawn uniformly from [-1,1]<sup>10</sup>, in several ways. First, using standard Gibbs sampling, with the 250 chains being independent. Second, using permutation Gibbs sampling, with the same random values for s<sub>0</sub>, s<sub>1</sub>, &#8230; for all chains (but different initial states).  Finally, I tried fixing all the s<sub>i</sub> to a single value, either 0.0073, 0.0187, 0.0518, or 0.1377.  I simulated 400 burn-in iterations for each chain, followed by 10000 iterations taken to be from equilibrium.</p>
<p>Using these 250&#215;10000 states, I computed estimates of the expected values of each coordinate (which aren&#8217;t zero, due to truncation) and of the expected values of the squares of the coordinates, along with estimated standard errors for these estimates, based on the variation over the 250 chains.  The results are plotted <a href="http://radfordneal.files.wordpress.com/2012/05/tmvnorm4lb.pdf" target="_blank" rel="nofollow">here</a>.  As you can see from the first and third plots, all the methods produce very consistent results.  The standard errors of the methods differ, however. Standard Gibbs sampling and permutation Gibbs sampling with random s<sub>i</sub> are very similar, but the standard errors for the coordinates are smaller by roughly a factor of two when the s<sub>i</sub> are fixed at 0.0073, corresponding to about a fourfold efficiency advantage over standard Gibbs sampling.  Fixing the s<sub>i</sub> at 0.0187 gives a smaller advantage, and the advantage is smaller still when the s<sub>i</sub> are fixed at 0.0518.  When the s<sub>i</sub> are fixed at 0.1377, the results are very close to standard Gibbs sampling and permutation Gibbs sampling with random s<sub>i</sub>.</p>
<p>The standard errors for the estimates of the expected values of the squares of the coordinates are all about the same, except that they are sometimes bigger when the s<sub>i</sub> are fixed at 0.0073.</p>
<p>So it seems that fixing all the s<sub>i</sub> to 0.0187 gives clearly better results than standard Gibbs sampling.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Johnson</title>
		<link>http://radfordneal.wordpress.com/2012/05/03/non-random-mcmc/#comment-774</link>
		<dc:creator><![CDATA[Mark Johnson]]></dc:creator>
		<pubDate>Wed, 16 May 2012 12:02:42 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=1161#comment-774</guid>
		<description><![CDATA[It&#039;s very interesting that this deterministic modification of Gibbs sampling works at all, let alone so well, on this small example.  But deterministic quadrature rules can be efficient in low dimensions.  How well do these methods work in higher dimensions?]]></description>
		<content:encoded><![CDATA[<p>It&#8217;s very interesting that this deterministic modification of Gibbs sampling works at all, let alone so well, on this small example.  But deterministic quadrature rules can be efficient in low dimensions.  How well do these methods work in higher dimensions?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
