<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Inconsistent Maximum Likelihood Estimation: An &#8220;Ordinary&#8221; Example</title>
	<atom:link href="http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/feed/" rel="self" type="application/rss+xml" />
	<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/</link>
	<description></description>
	<lastBuildDate>Sat, 03 Oct 2009 22:21:37 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: ekzept</title>
		<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/#comment-251</link>
		<dc:creator>ekzept</dc:creator>
		<pubDate>Fri, 10 Jul 2009 05:32:36 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=91#comment-251</guid>
		<description>Regarding &quot;The big difference between Bayesian and non-Bayesian methods is that Bayesian methods integrate over the parameter space, and non-Bayesian methods don’t&quot;, frequentist methods are not the only non-Bayesian method. There are also Kullback-Leibler methods, means of inference based upon divergence measures between likelihood functions. See for instance Burnham and Anderson, Wildlife Research, 2001, 28, 111–119, &quot;KL information as a basis for strong inference in ecological studies.&quot;</description>
		<content:encoded><![CDATA[<p>Regarding &#8220;The big difference between Bayesian and non-Bayesian methods is that Bayesian methods integrate over the parameter space, and non-Bayesian methods don’t&#8221;, frequentist methods are not the only non-Bayesian method. There are also Kullback-Leibler methods, means of inference based upon divergence measures between likelihood functions. See for instance Burnham and Anderson, Wildlife Research, 2001, 28, 111–119, &#8220;KL information as a basis for strong inference in ecological studies.&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Four Years Remaining &#187; Blog Archive &#187; Maximum Likelihood for Incomplete Data</title>
		<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/#comment-193</link>
		<dc:creator>Four Years Remaining &#187; Blog Archive &#187; Maximum Likelihood for Incomplete Data</dc:creator>
		<pubDate>Thu, 27 Nov 2008 18:11:17 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=91#comment-193</guid>
		<description>[...] turns out to be quite a reasonable suggestion, which leads to good estimates (although, there are some rare exceptions). Here, for example, it turns out that to estimate the mean you should simply compute the average [...]</description>
		<content:encoded><![CDATA[<p>[...] turns out to be quite a reasonable suggestion, which leads to good estimates (although, there are some rare exceptions). Here, for example, it turns out that to estimate the mean you should simply compute the average [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Radford Neal</title>
		<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/#comment-75</link>
		<dc:creator>Radford Neal</dc:creator>
		<pubDate>Wed, 27 Aug 2008 01:35:36 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=91#comment-75</guid>
		<description>Keith,

Here are the &lt;a href=&quot;http://radfordneal.files.wordpress.com/2008/08/keith-mlr.doc&quot; rel=&quot;nofollow&quot;&gt;R program&lt;/a&gt; and &lt;a href=&quot;http://radfordneal.files.wordpress.com/2008/08/keith-mlp.pdf&quot; rel=&quot;nofollow&quot;&gt;plots produced by it&lt;/a&gt; that you sent by email.  I haven&#039;t absorbed the plots entirely, but it&#039;s certainly the case that the inconsistency of the MLE is due to a few (one, typically) data points dominating.</description>
		<content:encoded><![CDATA[<p>Keith,</p>
<p>Here are the <a href="http://radfordneal.files.wordpress.com/2008/08/keith-mlr.doc" rel="nofollow">R program</a> and <a href="http://radfordneal.files.wordpress.com/2008/08/keith-mlp.pdf" rel="nofollow">plots produced by it</a> that you sent by email.  I haven&#8217;t absorbed the plots entirely, but it&#8217;s certainly the case that the inconsistency of the MLE is due to a few (one, typically) data points dominating.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Radford Neal</title>
		<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/#comment-71</link>
		<dc:creator>Radford Neal</dc:creator>
		<pubDate>Tue, 26 Aug 2008 18:47:15 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=91#comment-71</guid>
		<description>&lt;i&gt;if one took account of observations being actually discrete rather than truly continuous and replaced the density with the integral from obs - e to obs + e the inconsistency would go away - if e was big enough?&lt;/i&gt;

Yes. In that case, the data space would be finite (we can ignore the infinity in the big direction), and with enough data, the probability of each of these possible data values would be very well estimated. Values for the single model parameter map to distributions over this finite number of data values, in a continuous and probably one-to-one fashion, allowing the parameter to be well estimated by maximum likelihood.
However, you shouldn’t conclude that results about inconsistencies of MLEs aren’t relevant in practice, since data is always rounded. The practical effect of an inconsistent MLE is that the MLE also isn’t very good for finite amounts of data. Even if the data is rounded, so the MLE is consistent, this bad performance for finite amounts of data may remain. In problems with higher-dimensional data, even fairly coarse rounding will still produce a huge number of possible data values, so that the finiteness may not be of any practical relevance.</description>
		<content:encoded><![CDATA[<p><i>if one took account of observations being actually discrete rather than truly continuous and replaced the density with the integral from obs &#8211; e to obs + e the inconsistency would go away &#8211; if e was big enough?</i></p>
<p>Yes. In that case, the data space would be finite (we can ignore the infinity in the big direction), and with enough data, the probability of each of these possible data values would be very well estimated. Values for the single model parameter map to distributions over this finite number of data values, in a continuous and probably one-to-one fashion, allowing the parameter to be well estimated by maximum likelihood.<br />
However, you shouldn’t conclude that results about inconsistencies of MLEs aren’t relevant in practice, since data is always rounded. The practical effect of an inconsistent MLE is that the MLE also isn’t very good for finite amounts of data. Even if the data is rounded, so the MLE is consistent, this bad performance for finite amounts of data may remain. In problems with higher-dimensional data, even fairly coarse rounding will still produce a huge number of possible data values, so that the finiteness may not be of any practical relevance.</p>
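<p>A minimal R sketch of the rounding idea, under stated assumptions: it uses a hypothetical stand-in model, an equal mixture of N(0, 1) and N(theta, s^2) with s = exp(-1/theta^2), chosen only because its second component becomes increasingly peaked as theta approaches zero (it may differ in detail from the model in the post). The names <code>comp.sd</code>, <code>ll.density</code> and <code>ll.rounded</code> are hypothetical. The rounded log-likelihood replaces each density value with the mixture probability of the interval from obs &#8211; e to obs + e, as in Keith&#8217;s question.</p>

```r
# Hypothetical stand-in model (an assumption for illustration): an equal
# mixture of N(0, 1) and N(theta, s^2) with s = exp(-1/theta^2), so the
# second component becomes extremely peaked as theta approaches zero.
comp.sd = function (theta) exp(-1/theta^2)

# Log-likelihood from densities, treating the data as exactly continuous.
ll.density = function (x, theta)
  sum(log(0.5*dnorm(x) + 0.5*dnorm(x, theta, comp.sd(theta))))

# Log-likelihood treating each observation as rounded: its density is
# replaced by the mixture probability of the interval [obs - e, obs + e].
ll.rounded = function (x, theta, e)
{ mix.cdf = function (v) 0.5*pnorm(v) + 0.5*pnorm(v, theta, comp.sd(theta))
  sum(log(mix.cdf(x + e) - mix.cdf(x - e)))
}

# Simulate data with theta = 0.6 (sample size and seed are arbitrary).
set.seed(1)
n = 1000
theta.true = 0.6
peaked = rbinom(n, 1, 0.5) == 1
x = ifelse(peaked, rnorm(n, theta.true, comp.sd(theta.true)), rnorm(n))

# Search a grid that includes the positive observations, where spikes can sit.
grid = sort(c(x[x > 0], seq(0.01, 3, by = 0.01)))
ll.d = sapply(grid, function (t) ll.density(x, t))
ll.r = sapply(grid, function (t) ll.rounded(x, t, e = 0.01))
c(mle.density = grid[which.max(ll.d)], mle.rounded = grid[which.max(ll.r)])
```

<p>With exact densities, the grid maximum is typically pulled to a spike at a small positive observation, while the rounded version should land near 0.6; the exact numbers depend on the seed, the sample size and e.</p>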
]]></content:encoded>
	</item>
	<item>
		<title>By: Keith O'Rourke</title>
		<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/#comment-66</link>
		<dc:creator>Keith O'Rourke</dc:creator>
		<pubDate>Tue, 26 Aug 2008 15:07:21 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=91#comment-66</guid>
		<description>Last comment was from me - forgot to enter email and name ;-)

Keith</description>
		<content:encoded><![CDATA[<p>Last comment was from me &#8211; forgot to enter email and name <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>Keith</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/#comment-65</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Tue, 26 Aug 2008 15:05:53 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=91#comment-65</guid>
		<description>I&#039;ll resend when I am back at my usual computer

Occurred to me this morning if one took account of observations being actually discrete rather than truly continuous and replaced the density with the integral from obs - e to obs + e the inconsistency would go away - if e was big enough?

cheers
Keith</description>
		<content:encoded><![CDATA[<p>I&#8217;ll resend when I am back at my usual computer</p>
<p>Occurred to me this morning if one took account of observations being actually discrete rather than truly continuous and replaced the density with the integral from obs &#8211; e to obs + e the inconsistency would go away &#8211; if e was big enough?</p>
<p>cheers<br />
Keith</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Radford Neal</title>
		<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/#comment-63</link>
		<dc:creator>Radford Neal</dc:creator>
		<pubDate>Tue, 26 Aug 2008 01:23:54 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=91#comment-63</guid>
		<description>Keith,

I think the code got sort of destroyed in your comment by the blog software interpreting less-than and greater-than signs as HTML commands.  To get a less-than through,  you have to use ampersand, &quot;lt&quot;, semicolon.  I think... 

Here&#039;s a try: &lt;  Worked?</description>
		<content:encoded><![CDATA[<p>Keith,</p>
<p>I think the code got sort of destroyed in your comment by the blog software interpreting less-than and greater-than signs as HTML commands.  To get a less-than through,  you have to use ampersand, &#8220;lt&#8221;, semicolon.  I think&#8230; </p>
<p>Here&#8217;s a try: &lt;  Worked?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Keith O'Rourke</title>
		<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/#comment-60</link>
		<dc:creator>Keith O'Rourke</dc:creator>
		<pubDate>Mon, 25 Aug 2008 19:07:04 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=91#comment-60</guid>
		<description>Looking at how the individual observation likelihood components combine (and contrast with each other) can be useful. 

This example is interesting to me because, even though we know the observations were generated with a single (common) parameter, most individual observation likelihoods support .6, but a few support (different?) small values around zero - and these few outweigh the majority... so the parameter value under which this set of observations is most likely is &quot;misleading&quot;.

Simple modifications to the plot.lik code to get these plots are given below (on the log scale).

cheers
Keith

function (x,...)
{
  # log.lik(x, theta) is the log-likelihood function from the original post.
  grid &lt;- sort(c(x[x&gt;0 &amp; x&lt;3],seq(0.01,3,by=0.01)))

  ll &lt;- rep(NA,length(grid))
  for (i in 1:length(grid))
  { ll[i] &lt;- log.lik(x,grid[i])
  }

  mlv &lt;- max(ll)
  mle &lt;- grid[ll==mlv][1]
  max.ll &lt;- round(mlv/log(10))

  # Plot the log-likelihood, shifted so its maximum is at 2, rather than
  # the rescaled likelihood:
  #ll &lt;- ll - max.ll*log(10)
  #lik &lt;- exp(ll)
  mlvi &lt;- which.max(ll)
  ll &lt;- ll - mlv + 2
  lik &lt;- ll

  plot(c(-0,2),c(-8,max(lik) + 2),type=&quot;n&quot;,
    xlab=&quot;data / parameter&quot;,
    ylab=&quot;log likelihood (shifted)&quot;,
    ...)
  points(x,rep(0,length(x)),pch=19)
  lines(grid,lik)

  title(paste(&quot;Likelihood -&quot;,length(x),&quot;data points,&quot;,
              &quot;MLE approximately&quot;,round(mle,4)))
  # Overlay each observation&#039;s own log-likelihood curve, shifted to 2/n at the MLE.
  iol &lt;- rep(NA,length(grid))
  for(k in 1:length(x)){
    for(ki in 1:length(grid))
      iol[ki] &lt;- log.lik(x[k],grid[ki])
    lines(grid,iol - iol[mlvi] + 2/length(x),col=2)
  }
}</description>
		<content:encoded><![CDATA[<p>Looking at how the individual observation likelihood components combine (and contrast with each other) can be useful. </p>
<p>This example is interesting to me because, even though we know the observations were generated with a single (common) parameter, most individual observation likelihoods support .6, but a few support (different?) small values around zero &#8211; and these few outweigh the majority&#8230; so the parameter value under which this set of observations is most likely is &#8220;misleading&#8221;.</p>
<p>Simple modifications to the plot.lik code to get these plots are given below (on the log scale).</p>
<p>cheers<br />
Keith</p>
<p>function (x,...)<br />
{<br />
  # log.lik(x, theta) is the log-likelihood function from the original post.<br />
  grid &lt;- sort(c(x[x&gt;0 &amp; x&lt;3],seq(0.01,3,by=0.01)))</p>
<p>  ll &lt;- rep(NA,length(grid))<br />
  for (i in 1:length(grid))<br />
  { ll[i] &lt;- log.lik(x,grid[i])<br />
  }</p>
<p>  mlv &lt;- max(ll)<br />
  mle &lt;- grid[ll==mlv][1]<br />
  max.ll &lt;- round(mlv/log(10))</p>
<p>  # Plot the log-likelihood, shifted so its maximum is at 2, rather than<br />
  # the rescaled likelihood:<br />
  #ll &lt;- ll - max.ll*log(10)<br />
  #lik &lt;- exp(ll)<br />
  mlvi &lt;- which.max(ll)<br />
  ll &lt;- ll - mlv + 2<br />
  lik &lt;- ll</p>
<p>  plot(c(-0,2),c(-8,max(lik) + 2),type="n",<br />
    xlab="data / parameter",<br />
    ylab="log likelihood (shifted)",<br />
    ...)<br />
  points(x,rep(0,length(x)),pch=19)<br />
  lines(grid,lik)</p>
<p>  title(paste("Likelihood -",length(x),"data points,",<br />
              "MLE approximately",round(mle,4)))<br />
  # Overlay each observation's own log-likelihood curve, shifted to 2/n at the MLE.<br />
  iol &lt;- rep(NA,length(grid))<br />
  for(k in 1:length(x)){<br />
    for(ki in 1:length(grid))<br />
      iol[ki] &lt;- log.lik(x[k],grid[ki])<br />
    lines(grid,iol - iol[mlvi] + 2/length(x),col=2)<br />
  }<br />
}</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Radford Neal</title>
		<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/#comment-53</link>
		<dc:creator>Radford Neal</dc:creator>
		<pubDate>Mon, 25 Aug 2008 01:58:25 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=91#comment-53</guid>
		<description>Meng,

In this example, parameter values just above zero are the source of the problem, but of course other values might be the problem for other models.  By &quot;finite parameter space&quot; I meant that there are only a finite number of possible values for the parameter (so the parameter space isn&#039;t an interval of the real line, for instance).  An approximation with a finite parameter space will be good for all values of the parameter if slight changes in parameter values don&#039;t have huge effects.  In the model of this post, that&#039;s not true, because even though the density never becomes singular (so there aren&#039;t any infinite density values), it does become increasingly peaked as the parameter gets closer to zero.</description>
		<content:encoded><![CDATA[<p>Meng,</p>
<p>In this example, parameter values just above zero are the source of the problem, but of course other values might be the problem for other models.  By &#8220;finite parameter space&#8221; I meant that there are only a finite number of possible values for the parameter (so the parameter space isn&#8217;t an interval of the real line, for instance).  An approximation with a finite parameter space will be good for all values of the parameter if slight changes in parameter values don&#8217;t have huge effects.  In the model of this post, that&#8217;s not true, because even though the density never becomes singular (so there aren&#8217;t any infinite density values), it does become increasingly peaked as the parameter gets closer to zero.</p>
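<p>A small R sketch of that sensitivity, assuming a hypothetical stand-in model (an equal mixture of N(0, 1) and N(theta, s^2) with s = exp(-1/theta^2)) whose second component becomes increasingly peaked near zero; <code>comp.sd</code> and <code>log.dens</code> are hypothetical names. It shows that moving theta by 1e-9, far less than the spacing of any practical finite grid, changes the log density of a single observation by about 25.</p>

```r
# Hypothetical stand-in model (an assumption): an equal mixture of N(0, 1)
# and N(theta, s^2), where s = exp(-1/theta^2) shrinks very fast near zero.
comp.sd = function (theta) exp(-1/theta^2)

# Log density of one observation; plain dnorm is adequate here because the
# theta values used below keep s well clear of underflow.
log.dens = function (x, theta)
  log(0.5*dnorm(x) + 0.5*dnorm(x, theta, comp.sd(theta)))

# The peaked component's standard deviation at a few parameter values.
theta = c(1, 0.5, 0.3, 0.2)
comp.sd(theta)

# Moving theta by 1e-9, far finer than any practical finite grid, changes the
# log density of the single observation x = 0.2 by about 25.
v = sapply(c(0.2, 0.2 + 1e-9, 0.21), function (t) log.dens(0.2, t))
v
```

<p>So a finite grid, unless one of its points happens to fall within a few multiples of s of an observation, sees almost none of the height of such a spike.</p>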
]]></content:encoded>
	</item>
	<item>
		<title>By: Meng</title>
		<link>http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/#comment-52</link>
		<dc:creator>Meng</dc:creator>
		<pubDate>Mon, 25 Aug 2008 01:27:14 +0000</pubDate>
		<guid isPermaLink="false">http://radfordneal.wordpress.com/?p=91#comment-52</guid>
		<description>Re: Adam

A sample size of 30 can be large enough to represent the true density well. You may draw a histogram to see this.  Actually, as Radford illustrated, it gets even worse with more samples.

&quot;the infinity induced by the density concentrated at zero&quot; is an (infinite) value of the likelihood function at a parameter point (close to zero). The value of the likelihood function at a parameter value equals the product (or, on the log scale, the sum) of the values of the density function at each data point for that parameter value. Therefore, for a given parameter value, the size of the likelihood is a combined effect of all the data points through their density values. However, if one data point makes a dominant contribution (e.g. an infinite density value), even averaging over a large number of values may not remove atypical phenomena like &quot;infinity&quot;.

My opinion, though.</description>
		<content:encoded><![CDATA[<p>Re: Adam</p>
<p>A sample size of 30 can be large enough to represent the true density well. You may draw a histogram to see this.  Actually, as Radford illustrated, it gets even worse with more samples.</p>
<p>&#8220;the infinity induced by the density concentrated at zero&#8221; is an (infinite) value of the likelihood function at a parameter point (close to zero). The value of the likelihood function at a parameter value equals the product (or, on the log scale, the sum) of the values of the density function at each data point for that parameter value. Therefore, for a given parameter value, the size of the likelihood is a combined effect of all the data points through their density values. However, if one data point makes a dominant contribution (e.g. an infinite density value), even averaging over a large number of values may not remove atypical phenomena like &#8220;infinity&#8221;.</p>
<p>My opinion, though.</p>
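<p>To see the sum-of-logs point concretely, here is a minimal R sketch under stated assumptions: the model is a hypothetical stand-in (an equal mixture of N(0, 1) and N(theta, s^2) with s = exp(-1/theta^2)), n = 30 and the seed are arbitrary, and <code>log.dens</code> is a hypothetical helper that works on the log scale so that a tiny s does not overflow. Setting theta to the smallest positive observation makes that one observation&#8217;s term large, while the other terms stay modest.</p>

```r
# Hypothetical stand-in model (an assumption): an equal mixture of N(0, 1)
# and N(theta, s^2) with s = exp(-1/theta^2).
# Per-observation log density, evaluated on the log scale so that a tiny s
# neither overflows nor underflows; intended for theta > 0.
log.dens = function (x, theta)
{ a = log(0.5) + dnorm(x, log = TRUE)          # log of 0.5 * N(x; 0, 1)
  # log of 0.5 * N(x; theta, s^2), using log(s) = -1/theta^2
  b = log(0.5) - 0.5*log(2*pi) + 1/theta^2 -
      0.5*exp(2*log(abs(x - theta)) + 2/theta^2)
  m = pmax(a, b)
  m + log(exp(a - m) + exp(b - m))            # log(exp(a) + exp(b))
}

set.seed(2)
n = 30
theta.true = 0.6
peaked = rbinom(n, 1, 0.5) == 1
x = ifelse(peaked, rnorm(n, theta.true, exp(-1/theta.true^2)), rnorm(n))

# The log-likelihood is the sum of these per-observation terms.
theta.spike = min(x[x > 0])   # hypothetical choice: smallest positive observation
terms.spike = log.dens(x, theta.spike)
terms.true = log.dens(x, theta.true)

j = which.max(terms.spike)
c(total.at.spike = sum(terms.spike), total.at.0.6 = sum(terms.true),
  largest.term = terms.spike[j], sum.of.others = sum(terms.spike[-j]))
```

<p>Whether the total at the spike actually exceeds the total at theta = 0.6 depends on how close the smallest positive observation is to zero, which is the kind of single-point effect described above.</p>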
]]></content:encoded>
	</item>
</channel>
</rss>
