<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Radford Neal's blog</title>
	<atom:link href="http://radfordneal.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://radfordneal.wordpress.com</link>
	<description></description>
	<lastBuildDate>Sat, 07 Mar 2009 22:42:53 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='radfordneal.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/b961c000a5c200dd0da5115d93ee91c3?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>Radford Neal's blog</title>
		<link>http://radfordneal.wordpress.com</link>
	</image>
			<item>
		<title>Does coverage matter?</title>
		<link>http://radfordneal.wordpress.com/2009/03/07/does-coverage-matter/</link>
		<comments>http://radfordneal.wordpress.com/2009/03/07/does-coverage-matter/#comments</comments>
		<pubDate>Sat, 07 Mar 2009 22:42:53 +0000</pubDate>
		<dc:creator>Radford Neal</dc:creator>
				<category><![CDATA[Statistics - Nontechnical]]></category>

		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=549</guid>
		<description><![CDATA[In response to Andrew Gelman&#8217;s extended April Fool&#8217;s diatribe on Objections to Bayesian Statistics, Larry Wasserman commented regarding physicists who want  guaranteed frequentist coverage for their confidence intervals that  &#8220;Their desire for frequentist coverage seems well justified. Someday, we can count how many of their intervals trapped the true parameter values and assess the coverage. [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>In response to Andrew Gelman&#8217;s extended April Fool&#8217;s diatribe on <a href="http://ba.stat.cmu.edu/journal/2008/vol03/issue03/gelman.pdf">Objections to Bayesian Statistics</a>, Larry Wasserman <a href="http://ba.stat.cmu.edu/journal/2008/vol03/issue03/wasserman.pdf">commented</a>, regarding physicists who want guaranteed frequentist coverage for their confidence intervals, that &#8220;Their desire for frequentist coverage seems well justified. Someday, we can count how many of their intervals trapped the true parameter values and assess the coverage. The 95 percent frequentist intervals will live up to their advertised coverage claims. A trail of Bayesian intervals will, in general, not have this property&#8221;.</p>
<p>One thing to note about this statement is that it&#8217;s just not true.  Confidence intervals produced in actual scientific research are notorious for not covering the true value, even when they are produced using frequentist recipes.  This is why high-energy physicists insist on such absurdly high confidence levels (or absurdly low p-values) before declaring discoveries: what they call &#8220;five sigma&#8221; evidence, which corresponds to a p-value of less than 10<sup>-6</sup>. Taken at face value, such a small p-value would be pointless to quote, since any reader would surely assign a higher probability than that to the possibility that the &#8220;discovery&#8221; results from fraud or gross incompetence. The high confidence levels demanded are just an ad hoc way of trying to compensate for possible inadequacies in the statistical model used, which can easily make the true coverage probability much lower than advertised (or the true Type I error rate much higher than advertised).</p>
<p>Let&#8217;s ignore this, though, since discussions of theory omitting messy practical issues can be valuable.  The next thing to ask, then, is how it is possible that a 90% Bayesian probability interval — which purports to contain the true value with 90% probability —  can contain the true value less than 90% of the time.  A simple example will show how this can happen, and provide insight into whether we should care.<span id="more-549"></span></p>
<p>One can trivially get an example of low or zero coverage using a Bayesian method in which some parameter values are assigned low or zero prior probability, but that&#8217;s not very interesting, since if you use such a prior, you presumably want such values to be considered low probability (and hence excluded from posterior intervals) even if they&#8217;re not especially disfavoured by the data.  But here I&#8217;ll give an example in which the coverage is zero for a parameter value that has just as high prior probability as all the other parameter values.</p>
<p>Suppose we have an unknown parameter, θ, with ten possible values, 0, 1, &#8230;, 9.  We obtain data, x, which has nine possible values, 1, &#8230;, 9.  If θ=0, the observation x is equally likely to be any of the values 1, &#8230;, 9; that is, the probability of each of these values is 1/9.  If θ&gt;0, we will observe x=θ with probability one.  Suppose we use a uniform prior for θ, so each of the values 0, 1, &#8230;, 9 has prior probability 1/10.</p>
<p>Once we observe x, only two values of θ will remain possible — θ=x and θ=0.  By assumption, the ratio of prior probabilities for these two values is one.  The ratio of likelihoods is 1 over 1/9, or 9, in favour of θ=x.  The posterior odds are the product of the prior odds and the likelihood ratio, so we find that the posterior probabilities are 9/10 for θ=x, 1/10 for θ=0, and zero for any other value of θ.  A 90% Bayesian probability region is therefore easy to define — it&#8217;s just the set consisting only of the observed value of x.</p>
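<p>The arithmetic above can be checked with a few lines of Python.  This is only an illustrative sketch, not part of the original analysis; the function names <code>likelihood</code> and <code>posterior</code> are chosen here for convenience, and exact fractions are used so that no rounding is involved.</p>

```python
from fractions import Fraction

THETAS = range(10)    # possible parameter values 0..9
XS = range(1, 10)     # possible observations 1..9

def likelihood(x, theta):
    """P(x | theta) under the model described in the text."""
    if theta == 0:
        return Fraction(1, 9)          # theta = 0: x is uniform on 1..9
    return Fraction(1) if x == theta else Fraction(0)

def posterior(x):
    """Posterior over theta given x, using the uniform prior of 1/10 each."""
    prior = Fraction(1, 10)
    joint = {t: prior * likelihood(x, t) for t in THETAS}
    total = sum(joint.values())        # marginal probability of x
    return {t: p / total for t, p in joint.items()}

post = posterior(9)
print(post[9], post[0])
```

<p>For an observation of x=9 this gives posterior probability 9/10 for θ=9 and 1/10 for θ=0, matching the odds calculation in the text.</p>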
<p>What&#8217;s the coverage of this Bayesian probability region?  Well, if θ&gt;0, the coverage is 100%, since when θ&gt;0 we are guaranteed to observe x=θ.  But when θ=0, the coverage is 0%, since we never produce a posterior probability region that includes 0.  Frequentist coverage is the minimum, over all possible true values of θ, of the probability that the region will include the true θ.  So the coverage for these Bayesian probability regions is zero.</p>
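<p>The coverage calculation can also be written out directly.  The following Python sketch (an illustration only, with hypothetical helper names <code>region</code> and <code>coverage</code>) computes, for each possible true θ, the probability that the region {x} contains it, and then takes the minimum.</p>

```python
from fractions import Fraction

THETAS = range(10)    # possible parameter values 0..9
XS = range(1, 10)     # possible observations 1..9

def likelihood(x, theta):
    """P(x | theta) under the model described in the text."""
    if theta == 0:
        return Fraction(1, 9)
    return Fraction(1) if x == theta else Fraction(0)

def region(x):
    """The 90% posterior region derived in the text: the observed value only."""
    return {x}

def coverage(theta):
    """Probability, when theta is the true value, that region(x) contains theta."""
    return sum((likelihood(x, theta) for x in XS if theta in region(x)),
               Fraction(0))

per_theta = {t: coverage(t) for t in THETAS}
print(per_theta[0], per_theta[5], min(per_theta.values()))
```

<p>The per-value coverage is 0 when θ=0 and 1 for every other θ, so the minimum over θ is 0.</p>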
<p>Should zero coverage be cause for worry?  That depends first of all on whether we actually believe the model and the prior that were used to obtain the Bayesian regions, and secondly on whether a posterior probability region actually provides the information that we need.  It&#8217;s certainly possible that the answers to both these questions are &#8220;yes&#8221;, in which case the zero coverage should not be a cause for worry.  But I think that often Bayesian posterior regions are not what we want.</p>
<p>Recall that one way of looking at a 90% frequentist confidence region is as the set of all parameter values for which a hypothesis test would produce a p-value greater than 0.1; that is, the set of parameter values that are consistent with the data, according to a hypothesis test using a 10% significance level.  From this point of view, the confidence region is a way of telling theorists what theories to discard: namely, all those theories that predict a value for the parameter outside the confidence region.</p>
<p>Does this work for the Bayesian posterior regions in the example above?  If we observe x=9, we construct the 90% posterior region {9}, and conclude that all theories that predict any value for the parameter other than 9 should be discarded (at least if we think 90% is high enough).  This is certainly the right thing to do for theories that predict that the parameter is 1, 2, &#8230;, 8, since those values are excluded by the data with certainty.  But what about parameter value 0?  It&#8217;s outside the 90% posterior region, but note that its posterior probability of 1/10 is <em>exactly the same</em> as its prior probability.  The observation of x=9 has not reduced the probability that θ=0 at all, so it certainly seems strange to say that θ=0 is now excluded by the data!</p>
<p>I think part of the problem is that reports of experimental results should not be aimed at presenting conclusions, as may seem most natural from a Bayesian viewpoint, but rather at providing the information with which the readers may draw conclusions.  This may be the source of some objections to the prior distribution in Bayesian analysis, which can be seen as corrupting the objective presentation of the experimental results, even though frequentist methods like p-values are not suitable presentations either.  In simple examples like the one above, the experimental results can be communicated fully and objectively (assuming the model is uncontroversial) by reporting the likelihood function, which in this example is 1 for θ equal to the observed x, 1/9 for θ equal to 0, and zero for any other θ.  There is no need for either frequentist confidence regions or Bayesian posterior regions.   Readers can use the likelihood function to produce posterior regions based on whatever priors they like.  (Frequentists using methods that violate the Likelihood Principle are out of luck, but you can&#8217;t please everyone.)</p>
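<p>To illustrate how a reader might use a reported likelihood function with a prior of their own, here is a small Python sketch.  The names <code>reported_likelihood</code> and <code>posterior_from</code>, and the &#8220;sceptic&#8221; prior that puts probability 1/2 on θ=0, are assumptions introduced purely for this example.</p>

```python
from fractions import Fraction

THETAS = range(10)    # possible parameter values 0..9

def reported_likelihood(x):
    """Likelihood function for observed x: 1 at theta = x, 1/9 at theta = 0, else 0."""
    lik = {t: Fraction(0) for t in THETAS}
    lik[0] = Fraction(1, 9)
    lik[x] = Fraction(1)
    return lik

def posterior_from(lik, prior):
    """Combine a reported likelihood with any prior the reader prefers."""
    joint = {t: prior[t] * lik[t] for t in lik}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()}

lik = reported_likelihood(9)

# The uniform prior used in the text.
uniform = {t: Fraction(1, 10) for t in THETAS}

# A hypothetical reader who gives theta = 0 prior probability 1/2
# and spreads the remaining 1/2 evenly over 1..9.
sceptic = {t: Fraction(1, 18) for t in THETAS}
sceptic[0] = Fraction(1, 2)

print(posterior_from(lik, uniform)[9], posterior_from(lik, sceptic)[9])
```

<p>With the uniform prior the posterior probability of θ=9 is 9/10, as before; with the hypothetical sceptical prior it is only 1/2, yet both readers started from the same reported likelihood.</p>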
<p>In more complex problems with a high-dimensional parameter, however, just presenting the likelihood function is both infeasible and unenlightening.  One solution is to present a marginal likelihood function for just a smaller set of parameters of interest, integrating with respect to a prior distribution for the other &#8220;nuisance&#8221; parameters.  This only works if the prior for nuisance parameters is relatively uncontroversial, but that may often be the case (or at least, the experimenters may be the best people to formulate this prior).</p>
<p>If you think you must instead produce something like a &#8220;Bayesian confidence region&#8221;, perhaps the best method is to report the region for which the ratio of posterior probability (or density) to prior probability (or density) is not less than 0.1 (for an analogue of a 90% region).  This can also be thought of as the region containing all θ values for which the Bayes Factor is at least 0.1 for a comparison of the model in which this θ has prior probability one to the model in which θ has whatever prior you have set up as the &#8220;default&#8221;.  It can equally be seen as the region of θ where a theorist who assigns prior probability 1/2 to their pet theory predicting that value of θ would not abandon their theory (using a 90% confidence level) if they use your &#8220;default&#8221; prior for parameter values conditional on their theory being false.  For the example above, the regions produced in this way (with uniform default prior) would consist of the observed x plus 0.  Coverage would be 100%.</p>
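<p>As a rough check of this proposal on the ten-value example, the Python sketch below builds the region of θ values whose posterior-to-prior ratio is at least 0.1 and then computes its coverage.  The helper names (<code>ratio_region</code>, <code>coverage</code>) are hypothetical, and the uniform prior is taken as the &#8220;default&#8221;.</p>

```python
from fractions import Fraction

THETAS = range(10)         # possible parameter values 0..9
XS = range(1, 10)          # possible observations 1..9
PRIOR = Fraction(1, 10)    # uniform "default" prior from the text

def likelihood(x, theta):
    """P(x | theta) under the model described in the text."""
    if theta == 0:
        return Fraction(1, 9)
    return Fraction(1) if x == theta else Fraction(0)

def posterior(x):
    """Posterior over theta given x under the uniform prior."""
    joint = {t: PRIOR * likelihood(x, t) for t in THETAS}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()}

def ratio_region(x, threshold=Fraction(1, 10)):
    """All theta whose posterior/prior ratio is not less than the threshold."""
    post = posterior(x)
    return {t for t in THETAS if post[t] / PRIOR >= threshold}

def coverage(theta):
    """Probability, when theta is true, that ratio_region(x) contains theta."""
    return sum((likelihood(x, theta) for x in XS if theta in ratio_region(x)),
               Fraction(0))

print(sorted(ratio_region(9)))
print(min(coverage(t) for t in THETAS))
```

<p>For x=9 the ratio is 9 at θ=9 and exactly 1 at θ=0, so the region is {0, 9}, and the minimum coverage over all θ is 1.</p>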
<p>I hasten to add that this suggestion assumes a context in which various theories predict specific values for the parameter, and our interest is in which of these theories is true.  This is quite unlike many applications of confidence intervals, in which the parameters are things such as the average daily caloric intake of Canadian adults, or the regression coefficient of income on years of education.  For such parameters, the likelihood function (or marginal likelihood function) should be reported.  If this likelihood happens to be approximately normal, and concentrated in a region where any reasonable prior would be nearly uniform, you could go ahead and present a 90% posterior region if you&#8217;re so inclined.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://radfordneal.wordpress.com/2009/03/07/does-coverage-matter/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">radfordneal</media:title>
		</media:content>
	</item>
		<item>
		<title>Downtown with Sky</title>
		<link>http://radfordneal.wordpress.com/2008/10/13/downtown-with-sky/</link>
		<comments>http://radfordneal.wordpress.com/2008/10/13/downtown-with-sky/#comments</comments>
		<pubDate>Mon, 13 Oct 2008 23:55:44 +0000</pubDate>
		<dc:creator>Radford Neal</dc:creator>
				<category><![CDATA[Photography]]></category>

		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=533</guid>
		<description><![CDATA[
Click on image for larger version.
Downtown Toronto, August 2008. Mamiya C220 TLR, 80mm 1:2.8 lens, Fuji Reala 100 film, Nikon Coolscan 9000.
]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p style="text-align:center;"><a href="http://radfordneal.files.wordpress.com/2008/10/05-building-small.jpg"><img class="size-full wp-image-325 aligncenter" src="http://radfordneal.files.wordpress.com/2008/10/05-building-tiny.jpg" alt="" /></a><span id="more-533"></span></p>
<p style="text-align:center;">Click on image for larger version.</p>
<p style="text-align:left;">Downtown Toronto, August 2008. Mamiya C220 TLR, 80mm 1:2.8 lens, Fuji Reala 100 film, Nikon Coolscan 9000.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://radfordneal.wordpress.com/2008/10/13/downtown-with-sky/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">radfordneal</media:title>
		</media:content>

		<media:content url="http://radfordneal.files.wordpress.com/2008/10/05-building-tiny.jpg" medium="image" />
	</item>
		<item>
		<title>Answers to Applied PhD Comprehensive Question #2</title>
		<link>http://radfordneal.wordpress.com/2008/10/13/answers-to-applied-phd-comprehensive-question-2/</link>
		<comments>http://radfordneal.wordpress.com/2008/10/13/answers-to-applied-phd-comprehensive-question-2/#comments</comments>
		<pubDate>Mon, 13 Oct 2008 16:21:37 +0000</pubDate>
		<dc:creator>Radford Neal</dc:creator>
				<category><![CDATA[Statistics - Nontechnical]]></category>

		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=525</guid>
		<description><![CDATA[I&#8217;ve been busy with teaching, so I&#8217;m only now getting around to posting the answers to the second applied statistics comprehensive exam question that I posted (here).
Originally, I&#8217;d thought that posting the answer would be simply a matter of extracting the answer I&#8217;d already written before.  But looking it over before posting, I noticed that [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I&#8217;ve been busy with teaching, so I&#8217;m only now getting around to posting the answers to the second applied statistics comprehensive exam question that I posted (<a href="http://radfordneal.wordpress.com/2008/09/13/applied-statistics-phd-comprehensive-question-2/">here</a>).</p>
<p>Originally, I&#8217;d thought that posting the answer would be simply a matter of extracting the answer I&#8217;d already written before.  But looking it over before posting, I noticed that I&#8217;d made an additional unintentional error in the analysis that the question asks you to critique! Of course, one could say that the more errors the better, but I did need to update my answer to reflect this.</p>
<p>The error is that in the first analysis presented, I had intended to include interaction terms between treatment and covariates such as sex in the regression shown.  Due to some sort of momentary brain failure, I instead just put these covariates in by themselves.  When writing up my answer later (which students who wrote the exam got), my mind was set on the idea that I&#8217;d put in interaction terms, so this error wasn&#8217;t reflected in that answer.  I don&#8217;t think this had any significant effect on the marking, fortunately, since the comments on the later analyses aren&#8217;t really affected.  It does show how easy it is to keep seeing what you expect to see.</p>
<p>Here&#8217;s the <a href="http://radfordneal.files.wordpress.com/2008/10/q6-ans.pdf">PDF file with the answers</a>.  The questions are included as well, so no need to refer back to them.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://radfordneal.wordpress.com/2008/10/13/answers-to-applied-phd-comprehensive-question-2/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">radfordneal</media:title>
		</media:content>
	</item>
		<item>
		<title>Design Flaws in R #3 — Zero Subscripts</title>
		<link>http://radfordneal.wordpress.com/2008/09/21/design-flaws-in-r-3-%e2%80%94-zero-subscripts/</link>
		<comments>http://radfordneal.wordpress.com/2008/09/21/design-flaws-in-r-3-%e2%80%94-zero-subscripts/#comments</comments>
		<pubDate>Sun, 21 Sep 2008 18:34:52 +0000</pubDate>
		<dc:creator>Radford Neal</dc:creator>
				<category><![CDATA[Statistics - Computing]]></category>

		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=503</guid>
		<description><![CDATA[Unlike the two design flaws I posted about before (here, here, and also here), where one could at least see a reason for the design decision, even if it was unwise, this design flaw is just  incomprehensible.  For no reason at all that I can see, R allows one to use zero as a subscript [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Unlike the two design flaws I posted about before (<a href="http://radfordneal.wordpress.com/2008/08/06/design-flaws-in-r-1-reversing-sequences/">here</a>, <a href="http://radfordneal.wordpress.com/2008/08/20/design-flaws-in-r-2-%E2%80%94-dropped-dimensions/">here</a>, and also <a href="http://radfordneal.wordpress.com/2008/08/25/r-design-flaws-1-and-2-a-solution-to-both/">here</a>), where one could at least see a reason for the design decision, even if it was unwise, this design flaw is just  incomprehensible.  For no reason at all that I can see, R allows one to use zero as a subscript without triggering an error.  (Remember that in R, indexes for vectors and matrices start at one, not zero.)</p>
<p>This is of course a terrible decision, because it makes debugging harder, and makes it more likely that bugs will exist that have never been noticed.<span id="more-503"></span></p>
<p>So what does R do with a zero subscript, seeing as it&#8217;s meaningless?  It just ignores it, which is possible because it views all numeric subscripts as vectors that extract or replace a set of elements, not necessarily just one.   So R simply removes all zeros from a vector used as a subscript, producing a shorter vector.</p>
<p>Here&#8217;s what happens (with the current version of R, 2.7.2):</p>
<pre>   &gt; a
   [1] 10 20 30 40 50
   &gt; a[0]
   numeric(0)
   &gt; a[c(4,2)]
   [1] 40 20
   &gt; a[c(4,0,2,0)]
   [1] 40 20
   &gt; a[0] &lt;- 7
   &gt; a
   [1] 10 20 30 40 50
   &gt; a[c(4,0,2,0)] &lt;- 7
   &gt; a
   [1] 10  7 30  7 50</pre>
<p>Contrast this with what happens when you use a subscript that is too large:</p>
<pre>   &gt; a
   [1] 10 20 30 40 50
   &gt; a[7]
   [1] NA
   &gt; a[c(4,7,2)]
   [1] 40 NA 20
   &gt; a[7] &lt;- 7
   &gt; a
   [1] 10 20 30 40 50 NA  7</pre>
<p>Extending vectors automatically when an assignment is made beyond the end can obviously be useful (though it might be wiser not to).  Returning NA when extracting an element beyond the end is also a sensible action (though signalling an error immediately might be more useful for debugging). And negative subscripts are usefully defined as referring to their complement. But what possible use is there for ignoring zero subscripts rather than signalling an error?</p>
<p>It&#8217;s perhaps belabouring the obvious, but let me explain that signalling an error when a zero subscript is used is desirable because this is a very common sort of program bug.  It can easily arise when a program is scanning backwards through the vector elements, and goes one step too far.  It can also easily arise when data is initialized to zeros, with the intent to replace the zeros with something sensible later, but actually some zeros are never replaced. The way R behaves when zero is used as a subscript when replacing elements is particularly bad, since doing nothing at all can easily lead to an apparently working program that produces wrong answers.  (The behaviour of returning an empty vector when zero is used as a subscript when extracting an element is more likely to produce an error later on, so that at least the problem will be evident.)</p>
<p>So what should be done?  That&#8217;s easy — change R so that use of zero as a subscript produces an immediate error.  That&#8217;s trivial to do (mixing positive and negative subscripts produces an immediate error now, so the apparatus for it must be there).  Might that break some existing programs?  Yes, it will.  But 99.9% of those programs are <em>already broken</em>.  The users just don&#8217;t know it, thinking that the answers they get are correct when they&#8217;re not.  The remaining 0.1% of these broken programs were written by really stupid programmers who thought that exploiting an obscure and unwise feature in order to produce a really hard-to-understand program was a good idea.  It wasn&#8217;t.</p>
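<p>To make the difference between the two behaviours concrete, here is a small sketch written in Python rather than R.  It is only an emulation under stated assumptions: the function <code>r_style_replace</code> and its <code>strict</code> flag are hypothetical, <code>None</code> stands in for NA, and negative subscripts are deliberately left out.</p>

```python
def r_style_replace(vec, subscripts, value, strict=False):
    """Return a copy of vec with element s set to value for each 1-based subscript s.

    strict=False emulates the behaviour described in the post: zero
    subscripts are silently dropped.  strict=True emulates the proposed
    change: a zero subscript raises an error immediately.
    """
    out = list(vec)
    for s in subscripts:
        if s == 0:
            if strict:
                raise IndexError("zero subscript used in replacement")
            continue                      # silently ignored
        if s < 0:
            raise ValueError("negative subscripts are outside this sketch")
        if s > len(out):
            # Assignment beyond the end extends the vector, padding with None.
            out.extend([None] * (s - len(out)))
        out[s - 1] = value                # convert 1-based to 0-based
    return out

a = [10, 20, 30, 40, 50]
print(r_style_replace(a, [4, 0, 2, 0], 7))    # zeros ignored without complaint
try:
    r_style_replace(a, [4, 0, 2, 0], 7, strict=True)
except IndexError as err:
    print("error:", err)                   # the bug is reported at once
```

<p>In the lenient mode the call completes and the stray zeros leave no trace; in the strict mode the same call fails at the point where the bad subscript is used, which is where one would want to learn about it.</p>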
<p>Along with this, R should be changed so that using NA as a subscript when replacing elements in a vector also produces an error.  What to do with NA subscripts used to extract elements is a little bit harder to decide, but it seems to me that something about the following is a bit funny:</p>
<pre>   &gt; a
   [1] 10 20 30 40 50
   &gt; a[NA]
   [1] NA NA NA NA NA
   &gt; a[NA+0]
   [1] NA</pre>
</div>]]></content:encoded>
			<wfw:commentRss>http://radfordneal.wordpress.com/2008/09/21/design-flaws-in-r-3-%e2%80%94-zero-subscripts/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">radfordneal</media:title>
		</media:content>
	</item>
		<item>
		<title>Applied Statistics PhD Comprehensive Question #2</title>
		<link>http://radfordneal.wordpress.com/2008/09/13/applied-statistics-phd-comprehensive-question-2/</link>
		<comments>http://radfordneal.wordpress.com/2008/09/13/applied-statistics-phd-comprehensive-question-2/#comments</comments>
		<pubDate>Sat, 13 Sep 2008 21:39:36 +0000</pubDate>
		<dc:creator>Radford Neal</dc:creator>
				<category><![CDATA[Statistics - Nontechnical]]></category>

		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=490</guid>
		<description><![CDATA[PhD students in the Dept. of Statistics at the University of Toronto normally write three comprehensive exams at the end of their first year, in Probability, Theoretical Statistics, and Applied Statistics.  Below is a question I set for the 2007 exam in Applied Statistics.  It may be an interesting exercise for others too. [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>PhD students in the Dept. of Statistics at the University of Toronto normally write three comprehensive exams at the end of their first year, in Probability, Theoretical Statistics, and Applied Statistics.  Below is a question I set for the 2007 exam in Applied Statistics.  It may be an interesting exercise for others too.  It should in theory be doable by someone with just a good introductory undergraduate course in statistics, including multiple regression.  However, many PhD students had difficulty with it, so I wouldn&#8217;t say it&#8217;s easy.</p>
<p>The question is <a href="http://radfordneal.files.wordpress.com/2008/09/q6.pdf">here</a>. I&#8217;ll post my answer in a week or so.</p>
<p>My previous post with a question from the 2008 exam is <a href="http://radfordneal.wordpress.com/2008/08/11/applied-statistics-phd-comprehensive-question-1/">here</a>.</p>
<p><strong>Update:</strong> <a href="http://radfordneal.wordpress.com/2008/10/13/answers-to-applied-phd-comprehensive-question-2/">Here</a> is the post with the answers.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://radfordneal.wordpress.com/2008/09/13/applied-statistics-phd-comprehensive-question-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">radfordneal</media:title>
		</media:content>
	</item>
		<item>
		<title>Amazement</title>
		<link>http://radfordneal.wordpress.com/2008/09/07/amazement/</link>
		<comments>http://radfordneal.wordpress.com/2008/09/07/amazement/#comments</comments>
		<pubDate>Sun, 07 Sep 2008 17:22:09 +0000</pubDate>
		<dc:creator>Radford Neal</dc:creator>
				<category><![CDATA[Photography]]></category>

		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=475</guid>
		<description><![CDATA[
Click on image for larger version.
Nikon FG, Nikkor AIS 1:1.4 50mm, Black&#8217;s ISO 200 film, Nikon Coolscan V.
]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p style="text-align:center;"><a href="http://radfordneal.files.wordpress.com/2008/09/21-eleanor-defocus-5-cropped-small.jpg"><img class="size-full wp-image-325 aligncenter" src="http://radfordneal.files.wordpress.com/2008/09/21-eleanor-defocus-5-cropped-tiny.jpg" alt=""></a><span id="more-475"></span></p>
<p style="text-align:center;">Click on image for larger version.</p>
<p style="text-align:left;">Nikon FG, Nikkor AIS 1:1.4 50mm, Black&#8217;s ISO 200 film, Nikon Coolscan V.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://radfordneal.wordpress.com/2008/09/07/amazement/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">radfordneal</media:title>
		</media:content>

		<media:content url="http://radfordneal.files.wordpress.com/2008/09/21-eleanor-defocus-5-cropped-tiny.jpg" medium="image" />
	</item>
		<item>
		<title>Down Syndrome and Decision Theory</title>
		<link>http://radfordneal.wordpress.com/2008/09/07/down-syndrome-and-decision-theory/</link>
		<comments>http://radfordneal.wordpress.com/2008/09/07/down-syndrome-and-decision-theory/#comments</comments>
		<pubDate>Sun, 07 Sep 2008 17:06:18 +0000</pubDate>
		<dc:creator>Radford Neal</dc:creator>
				<category><![CDATA[Statistics - Nontechnical]]></category>

		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=381</guid>
		<description><![CDATA[I have a wonderful 11-month-old daughter, who thankfully is entirely healthy.   During the pregnancy, my wife and I were of course worried about the possibility of a congenital defect, of which the most prominent is Down Syndrome.  Today, couples must make a series of complex decisions — whether to have a screening [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=radfordneal.wordpress.com&blog=4390751&post=381&subd=radfordneal&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I have a wonderful 11-month-old daughter, who thankfully is entirely healthy.   During the pregnancy, my wife and I were of course worried about the possibility of a congenital defect, of which the most prominent is Down Syndrome.  Today, couples must make a series of complex decisions — whether to have a screening test for Down Syndrome, whether (based on its result) to have a more risky diagnostic test, and of course, what to do if the final result is that the fetus has Down Syndrome.  These decisions depend on moral judgements, on various facts regarding the nature of the fetus at various ages, regarding the nature of Down Syndrome, and regarding the reliability and dangers of the tests, and finally, on the proper way to use this information to make a decision.</p>
<p>This last aspect is in the domain of decision theory, and will be the main focus of this post.  Decision theory purports to show how a decision-maker should use the probabilities of the various possible outcomes along with their personal &#8220;utilities&#8221; for these outcomes to make a rational decision, which maximizes their expected utility. The validity of decision theory as a guide to rational action has often been challenged. The <a href="http://www.overcomingbias.com/2008/01/allais-paradox.html">Allais Paradox</a> describes one situation where decision theory does not accord with the judgements of many people, and some argue that the fault is not with these people, but rather with decision theory.  Interestingly, Down Syndrome testing involves an analogue of the Allais Paradox.<span id="more-381"></span></p>
<p>For readers who aren&#8217;t familiar with pre-natal tests for Down Syndrome, I&#8217;ll first present the main facts:</p>
<ul>
<li>Down syndrome results from a chromosomal abnormality, which causes cognitive impairment and a wide range of physical problems.  Both the degree of cognitive impairment and the severity of the physical problems are highly variable.</li>
<li>The risk of Down Syndrome increases with maternal age, from less than 1 in 1000 for young women, to 1 in 100 for women age 40, to 1 in 30 for women age 45, and to 1 in 12 for the rare birth to a woman age 49.</li>
<li>Whether a fetus has Down Syndrome can be determined to high accuracy between 16 and 22 weeks into the pregnancy by performing amniocentesis, which is an invasive procedure that has about a 1 in 200  chance of causing the fetus to miscarry. Results are available after 2 to 4 weeks.</li>
<li>Several non-invasive screening procedures (blood tests and ultrasound examination) that are performed at various times from 11 weeks to 16 weeks into the pregnancy can provide information on how likely the fetus is to have Down Syndrome.  Combined with information on maternal age, these can give a probability that the fetus has Down Syndrome.</li>
<li>The results of these screening tests never definitely indicate Down Syndrome.  They are used only to decide whether amniocentesis should be done.</li>
</ul>
<p>Pretty much the only point of testing for Down Syndrome is to provide the option of terminating the pregnancy if the fetus is confirmed to have Down Syndrome.  (The only other purpose I can see would be to prepare mentally or financially for having a child with Down Syndrome, but given the risk of amniocentesis, I think few well-informed couples would regard this as a sensible reason to have it done.)  The tests available in Ontario are described at <a href="http://www.lhsc.on.ca/programs/rmgc/mss/what_mms.htm">this web site</a>, which gives all the facts above (plus more; I&#8217;ve slightly simplified the available options), but which strangely makes no mention of pregnancy termination, except buried in the PDF files of some pamphlets.  <a href="http://www.acog.org/publications/patient_education/bp164.cfm">This page</a> by the American College of Obstetricians and Gynecologists also manages to avoid any mention of pregnancy termination.  One has to wonder what the rationale might be for this stunning failure to inform people of the purpose of the tests they discuss.</p>
<p>Clearly, if you are certain that you wouldn&#8217;t want to terminate the pregnancy even if you were sure the fetus has Down Syndrome, then you shouldn&#8217;t do any of these tests.  At least, that&#8217;s so if we ignore the complication that some of these tests can also detect some rarer conditions, such as Trisomy 18, which are much more severe than Down Syndrome, as well as some rarer conditions that might be treatable.</p>
<p>I suspect that few people are so certain about what they would actually do if faced with a decision of whether to terminate a Down Syndrome pregnancy.   On the question of when a fetus becomes &#8220;human&#8221; (possessing human rights), most people are not extremists — they neither believe that a fertilized egg is fully human, nor believe that a fetus just about to be born has no moral standing.  Instead, most people are unsure when a fetus becomes human, and think in any case that the process is a gradual one.  Furthermore, where their views fall within this non-extreme range may well be altered by the experience of pregnancy (especially the first time).  Also, on learning that their unborn child has Down Syndrome, most couples are likely to learn more about Down Syndrome before making a decision. I do assume that few people regard Down Syndrome as a desirable condition, and indeed, that most people would consider it unethical to conceive a child if they knew for certain (before conception) that the child would have Down Syndrome. (Note that this is a hypothetical situation which never arises in practice.)</p>
<p>Most people should therefore have the non-invasive tests, and then think about the issue more once they have the results.  The first question is whether to have amniocentesis done, which would provide an accurate diagnosis of whether the fetus has Down Syndrome, but which has a 1 in 200 chance of causing a miscarriage.</p>
<p>It&#8217;s at this point that decision theory has something to say.  Suppose that the non-invasive tests (along with maternal age) give a 1 in 1000 chance of Down Syndrome.  You could try to directly weigh the benefit (given this chance) of using amniocentesis to confirm whether the fetus really has Down Syndrome against the 1 in 200 chance that amniocentesis would cause a miscarriage, and come to a decision.   But applying the &#8220;Independence Axiom&#8221; of decision theory may clarify the situation.</p>
<p>Let&#8217;s ignore the 1 in 200,000 chance that the fetus has Down Syndrome <em>and </em>would be lost to a miscarriage if amniocentesis were done.  We can then visualize the decision in terms of 1000 hypothetical pregnancies.  In one of these pregnancies, but not the others, the fetus has Down Syndrome.  In 5 of these pregnancies, a miscarriage will occur if amniocentesis is done.  In 994 of these pregnancies, the fetus does not have Down Syndrome, and amniocentesis would not cause a miscarriage.  For these last 994 pregnancies, it <em>makes no difference</em> whether amniocentesis is done or not.  These pregnancies can therefore be ignored when making a decision.</p>
<p>Pretending, therefore, that one of the remaining 6 pregnancies is the real one, the decision looks like this:   If you do nothing, there&#8217;s a 1 in 6 chance you&#8217;ll have a child with Down Syndrome.  If you have amniocentesis done, that&#8217;s effectively like terminating the pregnancy, since for 5 of the 6 pregnancies miscarriage will result, and for the other (with Down Syndrome) you would likely decide to terminate the pregnancy.  The decision whether or not to have amniocentesis when there&#8217;s a 1 in 1000 chance of Down Syndrome is therefore equivalent to a hypothetical decision whether or not to terminate a pregnancy when there is a 1 in 6 chance of Down Syndrome and no further diagnostic test is available.  Of course, if the non-invasive tests produced a different probability, we&#8217;d have a different equivalent problem — for example, with a 1 in 2000 probability of Down Syndrome, the equivalent decision is whether to terminate a pregnancy when there&#8217;s a 1 in 11 chance of Down Syndrome.</p>
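<p>The arithmetic behind this equivalence can be sketched in a few lines of R.  This is only an illustration of the reasoning above: the function name equivalent_chance is hypothetical, and it uses the same approximation as the text, ignoring the small chance that a fetus both has Down Syndrome and would be lost to miscarriage.</p>
<pre>   # Chance of Down Syndrome among the pregnancies where the decision matters:
   # those with Down Syndrome plus those that amniocentesis would miscarry.
   equivalent_chance = function (p_down, p_miscarriage = 1/200) {
     p_down / (p_down + p_miscarriage)
   }

   1 / equivalent_chance(1/1000)   # 6, that is, a 1 in 6 chance
   1 / equivalent_chance(1/2000)   # 11, that is, a 1 in 11 chance</pre>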
<p>Is making a hypothetical equivalent decision of this sort any easier than making the original decision?  I think so, because the probabilities are less extreme, and the decision is more similar to other decisions that you may have already considered.</p>
<p>One highly relevant comparison is that about 1 in 30 children are born with some sort of serious congenital defect, most of which can&#8217;t be diagnosed before birth.  If you don&#8217;t find this level of risk acceptable, you shouldn&#8217;t be thinking of having children. The chance of Down Syndrome in the hypothetical decision described above will often be close enough to this that you can think about the risk in the same way, and decide whether you regard it as also being acceptable.</p>
<p>A few years ago, before any prenatal tests were available, a pregnant woman age 49 would have faced the choice of whether to terminate her pregnancy based simply on the overall risk of 1 in 12 of Down Syndrome at her age.  This is about equivalent to the choice of whether to have amniocentesis done when the non-invasive tests (plus age) give a 1 in 2000 probability of Down Syndrome.</p>
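<p>As a rough check on this comparison, the same approximation can be run in reverse: given a chance q of Down Syndrome in the hypothetical terminate-or-not decision, find the screening probability that corresponds to it.  The function name below is hypothetical, and the 1 in 200 miscarriage risk is the figure assumed throughout this post.</p>
<pre>   # Solve q = p / (p + m) for p, where m is the miscarriage risk.
   screening_probability = function (q, p_miscarriage = 1/200) {
     p_miscarriage * q / (1 - q)
   }

   1 / screening_probability(1/12)   # 2200, close to the 1 in 2000 quoted above
   1 / screening_probability(1/11)   # 2000</pre>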
<p>It may also be helpful to compare with the current world-wide infant mortality rate (deaths in the first year) of around 1 in 20. Infant mortality is around 1 in 200 in developed countries, and around 1 in 6 in a few countries (such as Angola and Afghanistan).  Infant mortality in the United States in 1950 was about 1 in 30.  These figures tell you something about what other people  have regarded as acceptable risks.</p>
<p>Ultimately, of course, your decision will still depend on your personal risk tolerance, your view of Down Syndrome, your view of the humanity of a fetus age 16 weeks or more,  and the probability of another pregnancy if you terminate this one.</p>
<p>This last point would seem to be the dominant consideration for anyone who does not view a 16-week (or possibly up to 24-week) fetus as human.  Its implication is that older women who want a child should be less inclined to have amniocentesis done than younger women, for a given probability of carrying a Down Syndrome fetus.  (Of course, maternal age is one factor on which this probability is based.) <a href="http://freakonomics.blogs.nytimes.com/2008/09/04/the-economics-of-the-amniocentesis">This recent post</a> (which I found <a href="http://www.marginalrevolution.com/marginalrevolution/2008/09/assorted-link-2.html">via Marginal Revolution</a>) discusses this issue.</p>
<p>The equivalence I present above depends, of course, on the validity of the Independence Axiom. In my view, it is obviously correct, even if its implications are not all obvious.  The situation with Down Syndrome testing is closely analogous to that in the <a href="http://www.overcomingbias.com/2008/01/allais-paradox.html">Allais Paradox</a>, which has been taken by some as a refutation of the Independence Axiom, though such a conclusion seems unjustified to me.  I think the Down Syndrome testing situation is more interesting, and more real, than the gambling scenario in the typical presentation of the Allais Paradox. (Down Syndrome and decision theory has also been discussed in a comment by A. P. Dawid in <em>Statistical Science</em>, November 1986.)</p>
<p>You will have to judge for yourself whether your intuition satisfies the Independence Axiom, or whether the decision you would intuitively favour for the original problem — whether to have amniocentesis with a 1 in 200 chance of miscarriage if there&#8217;s a 1 in 1000 chance of Down Syndrome — differs from the decision you would intuitively favour in the &#8220;equivalent&#8221; problem — whether to terminate a pregnancy due to a 1 in 6 chance of Down Syndrome.</p>
<p>Suppose that you do find a difference in your intuitions.  Those who argue against decision theory (and the Independence Axiom in particular) assume that such differences cast doubt on its validity.  But what would be the purpose of developing a theory of decision making if <em>everything it told you was intuitively clear to you anyway?</em></p>
<p>It&#8217;s actually a great benefit of decision theory that it demonstrates that problems which seem different to some people are actually equivalent.  This provides an opportunity for further thought, leading to a more satisfactory decision.  Analogous situations arise with probability theory, where inconsistent subjective probability judgements indicate that more thought is needed.  One should note, however, that sometimes the resolution of such inconsistencies is not that one or both judgements are wrong, but that both are right, with the apparent conflict being due to a failure when formalizing the problem to include some non-obvious but relevant aspects (such as the effect of the decision made on the decision-maker&#8217;s reputation for good decision-making).</p>
<p>Some recent studies have shown much lower risk of miscarriage from amniocentesis than 1 in 200 (1 in 1000 or lower), but caution in interpreting these results is needed, since they may not apply to the particular facility where you would have amniocentesis done. Better non-invasive tests, such as <a href="http://www.newscientist.com/article.ns?id=dn11095">ones that look at fetal cells in the mother&#8217;s blood</a>, may also become available.  We can hope that this interesting decision theory problem will soon cease to be real.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://radfordneal.wordpress.com/2008/09/07/down-syndrome-and-decision-theory/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">radfordneal</media:title>
		</media:content>
	</item>
		<item>
		<title>R Design Flaws #1 and #2:  A Solution to Both?</title>
		<link>http://radfordneal.wordpress.com/2008/08/25/r-design-flaws-1-and-2-a-solution-to-both/</link>
		<comments>http://radfordneal.wordpress.com/2008/08/25/r-design-flaws-1-and-2-a-solution-to-both/#comments</comments>
		<pubDate>Tue, 26 Aug 2008 03:42:10 +0000</pubDate>
		<dc:creator>Radford Neal</dc:creator>
				<category><![CDATA[Statistics - Computing]]></category>

		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=351</guid>
		<description><![CDATA[I&#8217;ve previously posted about two design flaws in R. The first post was about how R produces reversed sequences from a:b when a&#62;b, with bad consequences in &#8220;for&#8221; statements (and elsewhere).  The second post was about how R by default drops dimensions in expressions like M[i:j,] when i:j is a sequence only one long [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=radfordneal.wordpress.com&blog=4390751&post=351&subd=radfordneal&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I&#8217;ve previously posted about two design flaws in R. The <a href="http://radfordneal.wordpress.com/2008/08/06/design-flaws-in-r-1-reversing-sequences/">first post</a> was about how R produces reversed sequences from a:b when a&gt;b, with bad consequences in &#8220;for&#8221; statements (and elsewhere).  The <a href="http://radfordneal.wordpress.com/2008/08/20/design-flaws-in-r-2-%e2%80%94-dropped-dimensions/">second post</a> was about how R by default drops dimensions in expressions like M[i:j,] when i:j is a sequence only one long (i.e., when i equals j).</p>
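<p>For readers who have not seen the earlier posts, both behaviours are easy to reproduce in standard R.  The variable names here are only for illustration.</p>
<pre>   n = 0
   1:n                      # 1 0, not an empty sequence
   for (i in 1:n) print(i)  # prints 1 and then 0, though zero iterations were wanted

   M = matrix(1:6, nrow=3)
   i = 2; j = 2
   dim(M[1:2,])             # 2 2, still a matrix
   dim(M[i:j,])             # NULL, since the single row became a plain vector</pre>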
<p>In both posts, I suggested ways of extending R to try to solve these problems.   I now think there is a better way, however, which solves both problems with one simple extension to R.  This extension would also make R programs run faster and use less memory.<span id="more-351"></span></p>
<p>Recall the problems, and the solutions I proposed previously.  To stop sequences from reversing, we need a new operator to use in place of 1:n, which suffers from the reversal problem when n is zero (behaviour that can&#8217;t be changed, for compatibility reasons).  I suggested 1:&gt;:n.  To stop dimensions from being dropped, I suggested using semicolons rather than commas to separate subscripts. My new suggestion is that in the most common case, where the indexing vector is a sequence, we could use the same new operator introduced to solve the reversing problem; that is, we write M[1:&gt;:n,] to get the first n rows, and define it so that the result isn&#8217;t converted to a vector when n is one.</p>
<p>Now, this may seem like it won&#8217;t work.  If 1:&gt;:n returns a vector, then when n is one, it&#8217;s a vector of length one, which has to (by default) lead to the dimension being dropped if ordinary subscripting by scalars is to work (since scalars in R are really vectors of length one).</p>
<p>The solution is for 1:&gt;:n to <strong>not </strong>return a vector, but instead return a new data type that just records its two operands.  This new data type — perhaps it could be called an indexing pair — would have to be recognized by &#8220;for&#8221; statements and by the subscripting operator.  A dimension indexed by such an indexing pair would never be dropped.  One could add an operator :&lt;: as well, for iterating or subscripting in descending sequence, though this is much less common.  If this descending operator is omitted, I think .. (two periods) would be a better name than :&gt;: for the ascending indexing pair operator (but it unfortunately has no obviously mnemonic descending counterpart).</p>
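<p>Nothing like this exists in R at present, but the intended semantics can be roughly emulated with ordinary functions.  The sketch below is hypothetical throughout: index_pair, as_indices and rows are made-up names, and unlike a built-in implementation it still constructs the sequence when subscripting.</p>
<pre>   # Record the two operands; no sequence is created here.
   index_pair = function (from, to) {
     structure(list(from=from, to=to), class="index_pair")
   }

   # Ascending indices only; empty (never reversed) when to is below from.
   as_indices = function (ip) {
     seq_len(max(0, ip$to - ip$from + 1)) + (ip$from - 1)
   }

   # Select rows without ever dropping the row dimension.
   rows = function (M, ip) {
     M[as_indices(ip), , drop=FALSE]
   }

   M = matrix(1:6, nrow=3)
   length(as_indices(index_pair(1,0)))   # 0
   dim(rows(M, index_pair(1,1)))         # 1 2
   dim(rows(M, index_pair(1,0)))         # 0 2</pre>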
<p>One disadvantage of this solution is that it doesn&#8217;t address the dropped dimension problem in the general case where the vector index may not be a sequence.  But indexing by an ascending sequence is by far the most common case, so having to write drop=FALSE for the others may be OK.  (However, perhaps knowledge of the drop=FALSE option would be less common if it is needed less often.) Similarly, this solution doesn&#8217;t address the problem of reversing sequences outside the context of for loops and subscripting, but those are the most common uses.</p>
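<p>To illustrate the general case that would remain, here is an index vector that is not written as a sequence but happens to have length one.  The matrix and the variable name keep are only for illustration.</p>
<pre>   M = matrix(1:6, nrow=3)
   keep = which(M[,1] == 2)      # a single matching row, number 2
   dim(M[keep,])                 # NULL, the row dimension was dropped
   dim(M[keep,,drop=FALSE])      # 1 2, still a matrix</pre>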
<p>One extra advantage of introducing indexing pairs is that they will take up a trivial amount of storage.  In contrast, if you use 1:1000000 to iterate a million times in a for loop, R will allocate 4 Megabytes of storage to hold this sequence.  Producing this sequence also takes time, of course.  It would be possible for R to avoid this cost when using 1:1000000 in a for loop, by treating this combination specially, but it doesn&#8217;t (in version 2.4.1 at least):</p>
<pre>   &gt; gc()
            used (Mb) gc trigger (Mb) max used (Mb)
   Ncells 234834  6.3     467875 12.5   407500 10.9
   Vcells 104319  0.8     786432  6.0   690698  5.3
   &gt; for (i in 1:1000000) if (i==1000000) print(gc())
            used (Mb) gc trigger (Mb) max used (Mb)
   Ncells 234855  6.3     467875 12.5   467875 12.5
   Vcells 604321  4.7     905753  7.0   720903  5.5
   &gt; gc()
            used (Mb) gc trigger (Mb) max used (Mb)
   Ncells 234904  6.3     467875 12.5   467875 12.5
   Vcells 104328  0.8     786432  6.0   720903  5.5</pre>
<p>Notice how the memory usage goes up by 3.9 Megabytes during the for loop.</p>
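<p>The increase can be checked against the Vcells counts in the output above, given that R&#8217;s documentation for gc describes Vcells as 8 bytes each.  The variable names are only for illustration.</p>
<pre>   vcells_before = 104319
   vcells_during = 604321
   extra_bytes = (vcells_during - vcells_before) * 8
   extra_bytes           # 4000016, a million 4-byte integers plus a small header
   extra_bytes / 2^20    # about 3.81; the table shows 3.9 after rounding each entry</pre>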
<p>Another extra advantage of introducing :&gt;: (or ..) is that the precedence of this operator could be made lower than that of any other operator (with no loss, since an indexing pair wouldn&#8217;t be a valid operand for any other operator).   Expressions like 1:&gt;:n-1 would then do what the programmer meant (unlike 1:n-1).</p>
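<p>The existing precedence problem is easy to see in standard R, where the colon operator binds more tightly than subtraction:</p>
<pre>   n = 5
   1:n-1      # 0 1 2 3 4, parsed as (1:n)-1
   1:(n-1)    # 1 2 3 4, which is usually what was meant</pre>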
<p>So, does anyone see any flaws in this solution?</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://radfordneal.wordpress.com/2008/08/25/r-design-flaws-1-and-2-a-solution-to-both/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">radfordneal</media:title>
		</media:content>
	</item>
		<item>
		<title>Answers to Applied PhD Comprehensive Question #1</title>
		<link>http://radfordneal.wordpress.com/2008/08/23/answers-to-applied-phd-comprehensive-question-1/</link>
		<comments>http://radfordneal.wordpress.com/2008/08/23/answers-to-applied-phd-comprehensive-question-1/#comments</comments>
		<pubDate>Sat, 23 Aug 2008 16:57:33 +0000</pubDate>
		<dc:creator>Radford Neal</dc:creator>
				<category><![CDATA[Statistics - Nontechnical]]></category>

		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=340</guid>
		<description><![CDATA[This post links to a Question I set for an applied statistics PhD comprehensive exam.  My answers for this question are here (the question is repeated there, so no need to look at the old post).
Note that my answers are more elaborate than I would expect a student to write on an exam.  [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=radfordneal.wordpress.com&blog=4390751&post=340&subd=radfordneal&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p><a href="http://radfordneal.wordpress.com/2008/08/11/applied-statistics-phd-comprehensive-question-1/">This post</a> links to a Question I set for an applied statistics PhD comprehensive exam.  My answers for this question are <a href="http://radfordneal.files.wordpress.com/2008/08/myq-ans-2008.pdf">here</a> (the question is repeated there, so no need to look at the old post).</p>
<p>Note that my answers are more elaborate than I would expect a student to write on an exam.  I also gave credit for discussions that showed some insight, even if the final answer wasn&#8217;t completely correct.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://radfordneal.wordpress.com/2008/08/23/answers-to-applied-phd-comprehensive-question-1/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">radfordneal</media:title>
		</media:content>
	</item>
		<item>
		<title>Young Explorer</title>
		<link>http://radfordneal.wordpress.com/2008/08/20/young-explorer/</link>
		<comments>http://radfordneal.wordpress.com/2008/08/20/young-explorer/#comments</comments>
		<pubDate>Thu, 21 Aug 2008 03:13:38 +0000</pubDate>
		<dc:creator>Radford Neal</dc:creator>
				<category><![CDATA[Photography]]></category>

		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=327</guid>
		<description><![CDATA[
Click on image for larger version.
Nikon FG, Nikon Series E 1:1.8 50mm, Black&#8217;s ISO 200 film, Nikon Coolscan V.
       <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=radfordneal.wordpress.com&blog=4390751&post=327&subd=radfordneal&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p style="text-align:center;"><a href="http://radfordneal.files.wordpress.com/2008/08/08-eleanor-grass-small.jpg"><img class="size-full wp-image-325 aligncenter" src="http://radfordneal.files.wordpress.com/2008/08/08-eleanor-grass-tiny.jpg?w=435&#038;h=293" alt="" width="435" height="293" /></a><span id="more-327"></span></p>
<p style="text-align:center;">Click on image for larger version.</p>
<p style="text-align:left;">Nikon FG, Nikon Series E 1:1.8 50mm, Black&#8217;s ISO 200 film, Nikon Coolscan V.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://radfordneal.wordpress.com/2008/08/20/young-explorer/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">radfordneal</media:title>
		</media:content>

		<media:content url="http://radfordneal.files.wordpress.com/2008/08/08-eleanor-grass-tiny.jpg" medium="image" />
	</item>
	</channel>
</rss>