Trying to work out what you might be thinking, however, my guess is that you regard all real data as discrete, not continuous, and hence all real likelihoods as based on probability mass functions, not probability density functions. There’s something to be said for this view, and it does formally eliminate the inconsistency of the MLE in this example if you assume that the data has limited precision.

However, in high-dimensional problems the finite space of possible data sets is extremely large, even assuming individual values are rounded to not-too-much precision. It may then be more enlightening to consider continuous data (even if that’s an unrealizable idealization) than to trust that the MLE is guaranteed to be consistent in finite settings, when convergence to the correct value may in practice occur extremely slowly.
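
As a rough illustration of the density-versus-mass distinction above, here is a minimal Python sketch. It uses a plain normal distribution centred on an observed value, not the model from the post, and the observation `0.3` and rounding precision `0.01` are arbitrary values chosen for illustration. The density at the observation grows without bound as the standard deviation shrinks, while the probability mass of the rounding interval never exceeds 1.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def rounded_mass(x, mu, sigma, precision):
    """Probability that a N(mu, sigma^2) draw rounds to x at the given precision."""
    half = precision / 2
    return normal_cdf(x + half, mu, sigma) - normal_cdf(x - half, mu, sigma)

x = 0.3            # illustrative observation (assumed value)
precision = 0.01   # illustrative rounding precision (assumed value)
for sigma in (1.0, 0.1, 0.001, 1e-6):
    # The density keeps growing as sigma shrinks; the mass stays within [0, 1].
    print(sigma, normal_pdf(x, x, sigma), rounded_mass(x, x, sigma, precision))
```

With a mass-based likelihood each factor is at most 1, so a narrow component sitting on a single data point can no longer contribute an arbitrarily large factor.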

]]>L(Data)=P(Data|Model). For normally distributed IID data, a likelihood represented by a product of point-sampled Gaussians CAN be a good approximation of un-normalized probability, but it isn’t always. Here it is not, because of the peaked distribution surrounding the point closest to zero (which singles out that point as sigma -> 0).

As likelihood is a probability, L(D) is never >1. I believe the issue here is in a problematic estimate of likelihood, not inconsistency of the MLE.
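
To make the spike under discussion concrete, here is a sketch in Python. The model is not restated in this thread, so the code ASSUMES an equal-weight mixture of N(0,1) and N(t, s^2) with s = exp(-1/t^2) and t > 0; that form, the function names, and the six data values are all assumptions made for illustration. The computation stays in log space because s underflows for small t.

```python
import math

HALF_LOG_2PI = 0.5 * math.log(2 * math.pi)

def log_density(x, t):
    """Log density of the ASSUMED mixture 0.5*N(0,1) + 0.5*N(t, s^2), s = exp(-1/t^2)."""
    a = -0.5 * x * x - HALF_LOG_2PI      # log N(x; 0, 1)
    d = x - t
    b = 1.0 / t**2 - HALF_LOG_2PI        # log N(t; t, s^2), using -log(s) = 1/t^2
    if d != 0.0:
        # log of the quadratic term (x - t)^2 / (2 s^2), to avoid overflow
        log_q = 2 * math.log(abs(d)) + 2.0 / t**2 - math.log(2)
        b = b - math.exp(log_q) if log_q < 700 else -math.inf
    m = max(a, b)
    # log(0.5 * (exp(a) + exp(b))), computed stably
    return math.log(0.5) + m + math.log(math.exp(a - m) + math.exp(b - m))

def log_likelihood(data, t):
    """Density-based log-likelihood of t (t > 0) for IID data."""
    return sum(log_density(x, t) for x in data)

# Hand-picked illustrative values, not simulated data.
data = [0.01, 0.55, 0.6, 0.65, -1.0, 1.0]
print(log_likelihood(data, 0.6))    # t near the cluster of points around 0.6
print(log_likelihood(data, 0.01))   # t placed on the point closest to zero
```

Under this density-based definition, placing t on the point closest to zero gives by far the larger value, and that value exceeds 1 on the likelihood scale. Whether the density-based definition is the appropriate one is exactly what the comments here dispute.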

]]>You claim that consistency of this MLE is “a mathematical fact”, demonstrated by some theorem, which you don’t quote. There are lots of theorems about consistency of the MLE. They all have premises. The conclusion only holds if the premises hold. They don’t in this case.

]]>The ML estimator for the considered likelihood function is perfectly consistent and asymptotically normal (provided t>0). This is a mathematical fact. Consistency and asymptotic normality are theorems, not simulations. In this case, they hold. Naturally, the second step is to obtain a bound on the error committed in approximating the distribution of the estimator by a normal variable, and that is provided by the Berry-Esseen Theorem, from which you will know that a large n is required, or alternatively, if your sample is small (30 or 100), you should at least use an Edgeworth expansion approximation instead of a normal. So theory works well, but you need to be more careful.
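
For reference, the classical Berry-Esseen bound mentioned above can be written as follows. It is stated for standardized means of IID variables with a finite third absolute moment; carrying it over to an MLE needs further argument that is not given in this thread. The symbols are notation introduced here: mean \mu, variance \sigma^2 > 0, \rho = E|X_1 - \mu|^3, standard normal CDF \Phi, and an absolute constant C.

```latex
\sup_{x \in \mathbb{R}}
\left| P\!\left( \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le x \right) - \Phi(x) \right|
\;\le\; \frac{C\,\rho}{\sigma^{3}\sqrt{n}}
```

The bound shrinks only at rate n^{-1/2}, and it is large whenever \rho / \sigma^3 is large, which is one way to read the remark that a large n may be required.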

The post shows clearly how misleading it is to use simulations as a poor substitute for mathematics, and therefore it does not prove inconsistency, just that sample sizes must be larger. But to show it you need a full simulation. Run a Monte Carlo with 5,000 sample draws, take n=10,000, and then take a look at the results. There is nothing in the model that affects consistency or asymptotic normality.

]]>http://econ.ucsb.edu/~doug/researchpapers/Testing%20for%20Regime%20Switching%20A%20Comment.pdf

]]>Integration is always with respect to some measure, which one might call a “prior”, and if you think having a “prior” means you’re “Bayesian”, then my statement is true. You might not think that, of course, but my point was really that you have to add something like a prior to the MLE framework before you can sensibly talk about integration over points near the MLE, as in the first comment.

]]>Just dropped by to refresh my memory of this model. In point of fact Thomas Severini had a paper on the subject at the time the above comment was written. There’s been another one since.

]]>In general, if you know estimator [math]T[/math] has lower MSE than estimator [math]U[/math] for all possible values of the unknown parameters, you should choose [math]T[/math] over [math]U[/math]. (If squared error loss is not appropriate, substitute …

]]>