At the other extreme, for models in which the parameter space is a finite set, the MLE will certainly be consistent.

Looking at both of these extremes may give you more insight into what to expect from an MLE.
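As a sketch of the finite-parameter-space extreme (the unit-variance Gaussian model and the three-point candidate set here are my own toy choices, not anything from the post): with only finitely many candidates, the MLE simply picks the one with the highest likelihood, and with enough data it settles on the truth.

```python
import random

def log_lik(data, mu):
    """Gaussian log-likelihood of the data for mean mu (unit variance, constants dropped)."""
    return -0.5 * sum((x - mu) ** 2 for x in data)

random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(10_000)]  # true mean is 1.0

candidates = [0.0, 1.0, 2.0]  # a finite parameter space
mle = max(candidates, key=lambda mu: log_lik(data, mu))
print(mle)  # with this much data the MLE recovers the true candidate, 1.0
```

Nothing pathological can happen here: the likelihood is evaluated at finitely many points, so there is no sequence of parameter values along which it can diverge.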

]]>Example: my ‘family’ is a function of t; at one end (say t0) it does not include a delta function. (The delta function is just an extreme case, showing how absurdly narrow the distributions in my/your ‘family’ can be.)

Such a ‘family’ will never converge to the right original distribution, because as soon as it hits any data point, the MLE will diverge for t < 0.
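To sketch what a divergent likelihood looks like here (the Gaussian family below, with width simply equal to t, is a hypothetical stand-in for the commenter's family, not the one from the post): once a member of the family can center itself on a data point, its density at that point grows without bound as the width shrinks.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

data_point = 0.7  # made-up observation

# Hypothetical family: centered on the data point, with width t.
for t in [1.0, 0.1, 0.01, 0.001]:
    print(t, gauss_pdf(data_point, data_point, t))
# The density at the data point grows like 1/t, so no finite t maximizes
# the likelihood: the MLE runs off toward ever-narrower distributions.
```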

I do want to point out that it is a great lesson and amazing work nevertheless; I simply disagree that it is an ORDINARY example. Boy, it's anything but: it includes divergent likelihoods.

Am I getting something wrong here? It's always good to know if you are wrong, that's where we learn the most.

]]>Trying to think what you might be thinking, however, my guess is that you think that all real data is discrete, not continuous, and hence all real likelihoods are based on probability mass functions, not probability density functions. There’s something to be said for this view, and it does formally eliminate the inconsistency of the MLE in this example if you assume that the data has limited precision.

However, in high-dimensional problems the finite space of possible data sets is extremely large, even assuming individual values are rounded to not-too-much precision. It may then be more enlightening to consider continuous data (even if that’s an unrealizable idealization) than to trust that the MLE is guaranteed to be consistent in finite settings, when convergence to the correct value may in practice occur extremely slowly.

]]>L(Data) = P(Data|Model). For normally distributed IID data, this likelihood, represented by the product of point-sampled Gaussians, CAN be a good approximation of an un-normalized probability, but isn't always. Here it is not, for the peaked distribution surrounding the point closest to zero (singling out that point as sigma → 0).
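To make this concrete, here is a toy version (the four-point sample and the half-and-half two-component mixture are my own stand-ins for the post's family): the product of point-sampled densities blows up as the narrow component's sigma shrinks onto the point closest to zero, so it stops behaving like any kind of probability.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_lik(data, mu, sigma):
    """Product over the data of 0.5*N(mu, 1) + 0.5*N(mu, sigma^2) densities."""
    return math.prod(0.5 * gauss_pdf(x, mu, 1.0) + 0.5 * gauss_pdf(x, mu, sigma)
                     for x in data)

data = [0.05, -1.3, 0.8, 2.1]  # made-up sample; 0.05 is the point closest to zero
peak = min(data, key=abs)      # center the peaked component on that point

for sigma in [1.0, 1e-3, 1e-6]:
    print(sigma, mixture_lik(data, peak, sigma))
# As sigma -> 0 the product grows without bound: the 1/sigma spike at the
# singled-out point beats the bounded factors from the other points, so the
# "likelihood" exceeds 1 and cannot be read as a probability.
```

The wide component is what keeps the other points' factors bounded away from zero; a single narrowing Gaussian would instead send the product to zero.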

As likelihood is a probability, L(D) is never > 1. I believe the issue here is a problematic estimate of the likelihood, not inconsistency of the MLE.

]]>You claim that consistency of this MLE is “a mathematical fact”, demonstrated by some theorem, which you don’t quote. There are lots of theorems about consistency of the MLE. They all have premises. The conclusion only holds if the premises hold. They don’t in this case.

]]>The ML estimator for the considered likelihood function is perfectly consistent and asymptotically normal (provided t > 0). This is a mathematical fact. Consistency and asymptotic normality are theorems, not simulations. In this case, both hold. Naturally, the second step is to obtain a bound on the error committed in approximating the distribution of the estimator by a normal variable, and that is provided by the Berry-Esseen Theorem, from which you will know that a large n is required; alternatively, if your sample is small (30 or 100), you should at least use an Edgeworth expansion approximation instead of a normal. So the theory works well, but you need to be more careful.

The post shows clearly how misleading it is to use simulations as a poor substitute for mathematics, and therefore it does not prove inconsistency, just that sample sizes must be larger. But to show it you need a full simulation: run a Monte Carlo with 5,000 sample draws, take n = 10,000, and then take a look at the results. There is nothing in the model that affects consistency or asymptotic normality.
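A skeleton of the kind of Monte Carlo check being suggested (the exponential-rate model below is a stand-in, since the post's model isn't reproduced in this thread, and the replication counts are scaled down from the suggested 5,000 × 10,000 so it runs quickly):

```python
import random
import statistics

def mle_rate(sample):
    """MLE of an exponential rate: 1 / sample mean (stand-in model, not the post's)."""
    return len(sample) / sum(sample)

random.seed(1)
true_rate = 2.0
n = 1_000     # per-draw sample size (the comment suggests n = 10,000)
draws = 500   # Monte Carlo replications (the comment suggests 5,000)

estimates = [mle_rate([random.expovariate(true_rate) for _ in range(n)])
             for _ in range(draws)]

# If the estimator is consistent and asymptotically normal, the estimates
# should cluster tightly around the true value, with spread shrinking
# like 1/sqrt(n).
print(statistics.mean(estimates), statistics.stdev(estimates))
```

For a well-behaved model like this one, the histogram of estimates does look normal around the truth; the dispute in this thread is precisely whether the post's model is well-behaved in that sense.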

]]>http://econ.ucsb.edu/~doug/researchpapers/Testing%20for%20Regime%20Switching%20A%20Comment.pdf

]]>Integration is always with respect to some measure, which one might call a “prior”, and if you think having a “prior” means you’re “Bayesian”, then my statement is true. You might not think that, of course, but my point was really that you have to add something like a prior to the MLE framework before you can sensibly talk about integration over points near the MLE, as in the first comment.

]]>Just dropped by to refresh my memory of this model. In point of fact, Thomas Severini had a paper on the subject at the time the above comment was written. There’s been another one since.
