Does coverage matter?

In response to Andrew Gelman’s extended April Fool’s diatribe on Objections to Bayesian Statistics, Larry Wasserman commented, regarding physicists who want guaranteed frequentist coverage for their confidence intervals, that “Their desire for frequentist coverage seems well justified. Someday, we can count how many of their intervals trapped the true parameter values and assess the coverage. The 95 percent frequentist intervals will live up to their advertised coverage claims. A trail of Bayesian intervals will, in general, not have this property”.

One thing to note about this statement is that it’s just not true. Confidence intervals produced in actual scientific research are notorious for not covering the true value, even when they are produced using frequentist recipes. This is why high-energy physicists insist on such absurdly high confidence levels (or absurdly low p-values) before declaring discoveries: what they call “five sigma” evidence, which corresponds to a p-value of less than 10⁻⁶. If taken seriously, quoting such a small p-value would be pointless, since any reader would surely assign a higher probability than that to the possibility that the “discovery” results from fraud or gross incompetence. The high confidence levels demanded are just an ad hoc way of trying to compensate for possible inadequacies in the statistical model used, which can easily make the true coverage probability much less than advertised (or the true Type I error rate much higher than advertised).
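For anyone who wants to check the correspondence between “five sigma” and the p-value, here is a small Python sketch. It assumes the one-sided convention usual in particle physics (the probability that a standard normal variable exceeds 5) and uses only the standard library’s math.erfc.

```python
import math

def one_sided_p_value(sigmas):
    """P(Z > sigmas) for a standard normal Z (the one-sided tail area)."""
    # For Z ~ N(0, 1), P(Z > z) = erfc(z / sqrt(2)) / 2.
    return 0.5 * math.erfc(sigmas / math.sqrt(2))

for k in (2, 3, 5):
    print(f"{k} sigma: one-sided p = {one_sided_p_value(k):.2g}")
```

Under that convention, five sigma gives a p-value of roughly 3 × 10⁻⁷, comfortably below 10⁻⁶; a two-sided convention would double it, which is still under 10⁻⁶.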

Let’s ignore this, though, since discussions of theory that omit messy practical issues can be valuable. The next thing to ask, then, is how a 90% Bayesian probability interval, which purports to contain the true value with 90% probability, can contain the true value less than 90% of the time. A simple example will show how this can happen, and provide insight into whether we should care. (more…)
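The phenomenon is easy to reproduce by simulation. The setup below is my own illustrative choice and may differ from the example in the full post: it assumes a N(0, 1) prior for θ and a single observation x ~ N(θ, 1), so the posterior is N(x/2, 1/2) and the central 90% posterior interval is x/2 ± 1.645·√(1/2). The sketch estimates how often that interval covers a fixed θ, and how often it covers a θ drawn from the prior.

```python
import math
import random
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # multiplier for a central 90% interval

def posterior_interval(x):
    """Central 90% posterior interval for theta, assuming prior N(0,1) and x ~ N(theta,1)."""
    # Conjugate update: posterior is N(x / 2, 1 / 2).
    mean, sd = x / 2, math.sqrt(0.5)
    return mean - Z90 * sd, mean + Z90 * sd

def coverage_at(theta, n=50_000, seed=1):
    """Fraction of simulated intervals that contain a FIXED true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        lo, hi = posterior_interval(rng.gauss(theta, 1.0))
        hits += lo <= theta <= hi
    return hits / n

def coverage_averaged_over_prior(n=50_000, seed=2):
    """Coverage when each true theta is itself drawn from the N(0, 1) prior."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        lo, hi = posterior_interval(rng.gauss(theta, 1.0))
        hits += lo <= theta <= hi
    return hits / n

for theta in (0.0, 1.0, 2.0, 3.0):
    print(f"theta = {theta}: coverage ~ {coverage_at(theta):.3f}")
print(f"averaged over the prior: {coverage_averaged_over_prior():.3f}")
```

Analytically, the coverage in this setup is about 98% when θ = 0 but only about 25% when θ = 3, while the average over θ drawn from the prior is exactly 90%. The Bayesian “90%” is a statement averaged over the prior, not a guarantee for every fixed θ.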

March 7, 2009

Downtown with Sky

October 13, 2008

Answers to Applied PhD Comprehensive Question #2

I’ve been busy with teaching, so I’m only now getting around to posting the answers to the second applied statistics comprehensive exam question that I posted (here).

Originally, I’d thought that posting the answer would simply be a matter of extracting the answer I’d already written. But looking it over before posting, I noticed that I’d made an additional, unintentional error in the analysis that the question asks you to critique! Of course, one could say that the more errors the better, but I did need to update my answer to reflect this.

The error is that in the first analysis presented, I had intended to include interaction terms between treatment and covariates such as sex in the regression shown. Due to some sort of momentary brain failure, I instead put these covariates in by themselves, as main effects only. When writing up my answer later (which was given to the students who wrote the exam), my mind was set on the idea that I’d put in interaction terms, so this error wasn’t reflected in that answer. Fortunately, I don’t think this had any significant effect on the marking, since the comments on the later analyses aren’t really affected. It does show how easy it is to keep seeing what you expect to see.
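To make the difference between the two regressions concrete, here is a small sketch with made-up, noise-free data (not the data from the exam question), fitting both models by ordinary least squares in pure Python. The hypothetical treatment effect is 2 for one sex and 6 for the other. Without a treatment × sex interaction, the regression can only report a single treatment effect; with it, the effect is allowed to differ by sex.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical balanced data: 5 subjects per treatment-by-sex cell, no noise.
# True model: y = 10 + 1*male + 2*treat + 4*treat*male
cells = [(t, m) for t in (0, 1) for m in (0, 1) for _ in range(5)]
y = [10 + 1 * m + 2 * t + 4 * t * m for t, m in cells]

X_main = [[1, t, m] for t, m in cells]           # main effects only
X_inter = [[1, t, m, t * m] for t, m in cells]   # adds treatment x sex interaction

b_main = ols(X_main, y)
b_inter = ols(X_inter, y)
print("main effects only: treat =", round(b_main[1], 6))
print("with interaction:  treat (female) =", round(b_inter[1], 6),
      " treat (male) =", round(b_inter[1] + b_inter[3], 6))
```

For this balanced design, the main-effects model reports a single treatment coefficient of 4 (the average of the two effects), while the interaction model recovers 2 and 2 + 4 = 6. In R, the two models correspond to the lm formulas y ~ treat + sex and y ~ treat * sex.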

Here’s the PDF file with the answers. The questions are included as well, so there’s no need to refer back to them.

October 13, 2008

Design Flaws in R #3 — Zero Subscripts

Unlike the two design flaws I posted about before (here, here, and also here), where one could at least see a reason for the design decision, even if it was unwise, this design flaw is just incomprehensible. For no reason at all that I can see, R allows one to use zero as a subscript without triggering an error; the zero is simply ignored, so x[0] returns a vector of length zero. (Remember that in R, indexes for vectors and matrices start at one, not zero.)

This is of course a terrible decision, because it makes debugging harder and makes it more likely that programs contain bugs that have never been noticed. (more…)
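To show concretely why silently ignoring zeros hides bugs, here is a Python sketch that emulates R’s rule for non-negative subscripts (1-based positions, zeros dropped, NA past the end, shown here as None), next to a hypothetical stricter rule that treats zero as an error. Both function names are made up for illustration, and R’s negative (exclusion) subscripts are not emulated.

```python
def r_subset(x, indices):
    """Emulate R's x[indices] for non-negative integer subscripts (a sketch)."""
    out = []
    for i in indices:
        if i < 0:
            raise NotImplementedError("negative (exclusion) subscripts not emulated")
        if i == 0:
            continue  # R silently drops zero subscripts
        # 1-based position; an index past the end gives NA (None here)
        out.append(x[i - 1] if i <= len(x) else None)
    return out

def strict_subset(x, indices):
    """Hypothetical stricter rule: a zero subscript is an error."""
    if any(i == 0 for i in indices):
        raise IndexError("zero subscript (indices start at 1)")
    return r_subset(x, indices)

x = [10, 20, 30]
i = 1
print(r_subset(x, [i - 1]))   # off-by-one bug: silently returns []
try:
    strict_subset(x, [i - 1])
except IndexError as e:
    print("caught:", e)       # the strict rule fails at the point of the mistake
```

With the permissive rule, the off-by-one error produces an empty result that may only cause trouble much later, or never be noticed at all; the strict rule reports it immediately.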

September 21, 2008

Applied Statistics PhD Comprehensive Question #2

PhD students in the Dept. of Statistics at the University of Toronto normally write three comprehensive exams at the end of their first year, in Probability, Theoretical Statistics, and Applied Statistics. Below is a question I set for the 2007 exam in Applied Statistics. It may be an interesting exercise for others too. It should in theory be doable by someone with just a good introductory undergraduate course in statistics, including multiple regression. However, many PhD students had difficulty with it, so I wouldn’t say it’s easy.

The question is here. I’ll post my answer in a week or so.

My previous post with a question from the 2008 exam is here.

Update: Here is the post with the answers.

September 13, 2008

Amazement

September 7, 2008

Down Syndrome and Decision Theory

I have a wonderful 11-month-old daughter, who thankfully is entirely healthy. During the pregnancy, my wife and I were of course worried about the possibility of a congenital defect, of which the most prominent is Down Syndrome. Today, couples must make a series of complex decisions: whether to have a screening test for Down Syndrome, whether (based on its result) to have a riskier diagnostic test, and of course, what to do if the final result is that the fetus has Down Syndrome. These decisions depend on moral judgements; on facts about the nature of the fetus at various ages, about the nature of Down Syndrome, and about the reliability and dangers of the tests; and finally, on the proper way to use this information to make a decision.

This last aspect is in the domain of decision theory, and will be the main focus of this post. Decision theory purports to show how a decision-maker should combine the probabilities of the various possible outcomes with their personal “utilities” for these outcomes to make a rational decision, namely the one that maximizes their expected utility. The validity of decision theory as a guide to rational action has often been challenged. The Allais Paradox describes one situation where decision theory does not accord with the judgements of many people, and some argue that the fault lies not with these people but with decision theory. Interestingly, Down Syndrome testing involves an analogue of the Allais Paradox. (more…)
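For readers who haven’t seen it, here is the standard textbook version of the Allais Paradox (not the Down Syndrome analogue discussed in the full post), checked with a short Python sketch using exact rational arithmetic. Prizes are in millions of dollars. Many people prefer 1A to 1B but 2B to 2A; the code checks that, for every increasing utility function it tries, the expected-utility difference between 1A and 1B equals that between 2A and 2B, so an expected-utility maximizer cannot hold that pair of preferences.

```python
import random
from fractions import Fraction as F

# Each gamble is a list of (probability, prize); prizes in millions of dollars.
GAMBLE_1A = [(F(1), 1)]
GAMBLE_1B = [(F(89, 100), 1), (F(10, 100), 5), (F(1, 100), 0)]
GAMBLE_2A = [(F(11, 100), 1), (F(89, 100), 0)]
GAMBLE_2B = [(F(10, 100), 5), (F(90, 100), 0)]

def expected_utility(gamble, u):
    """Sum of probability times utility of each prize; u maps prize -> utility."""
    return sum(p * u[prize] for p, prize in gamble)

def allais_differences(u):
    """Return (EU(1A) - EU(1B), EU(2A) - EU(2B)) for the utility function u."""
    d1 = expected_utility(GAMBLE_1A, u) - expected_utility(GAMBLE_1B, u)
    d2 = expected_utility(GAMBLE_2A, u) - expected_utility(GAMBLE_2B, u)
    return d1, d2

# Try many arbitrary increasing utility functions.
rng = random.Random(0)
for _ in range(1000):
    u0 = F(rng.randint(0, 100))
    u1 = u0 + rng.randint(1, 100)
    u5 = u1 + rng.randint(1, 100)
    d1, d2 = allais_differences({0: u0, 1: u1, 5: u5})
    assert d1 == d2  # preferring 1A over 1B forces preferring 2A over 2B
print("EU(1A) - EU(1B) == EU(2A) - EU(2B) for every utility function tried")
```

Algebraically, both differences reduce to 0.11·u(1) − 0.10·u(5) − 0.01·u(0): the second pair of gambles is the first pair with a common 89% chance of $1 million replaced by an 89% chance of nothing.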

September 7, 2008

R Design Flaws #1 and #2: A Solution to Both?

I’ve previously posted about two design flaws in R. The first post was about how R produces reversed sequences from a:b when a>b, with bad consequences in “for” statements (and elsewhere). The second post was about how R by default drops dimensions in expressions like M[i:j,] when i:j is a sequence of length one (i.e., when i equals j).

In both posts, I suggested ways of extending R to try to solve these problems. I now think there is a better way, however, which solves both problems with one simple extension to R. This extension would also make R programs run faster and use less memory. (more…)
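To illustrate the first flaw, here is a Python emulation of R’s integer a:b (a sketch of its behaviour, not R’s implementation), next to a hypothetical increasing-only sequence that is simply empty when a > b. The names r_colon and seq_up_to are made up for illustration.

```python
def r_colon(a, b):
    """Emulate R's integer a:b, which counts DOWN when a > b."""
    step = 1 if a <= b else -1
    return list(range(a, b + step, step))

def seq_up_to(a, b):
    """Hypothetical alternative: increasing sequence, empty when a > b."""
    return list(range(a, b + 1))

n = 0
print(r_colon(1, n))    # [1, 0]: like R's for (i in 1:n), the loop body runs twice
print(seq_up_to(1, n))  # []: the loop body never runs
```

Within R as it stands, a common defensive idiom is for (i in seq_len(n)), since seq_len(0) is empty; the extension proposed in the full post is a different approach.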

August 25, 2008

Answers to Applied PhD Comprehensive Question #1

This post links to a question I set for an applied statistics PhD comprehensive exam. My answers for this question are here (the question is repeated there, so no need to look at the old post).

Note that my answers are more elaborate than I would expect a student to write on an exam. I also gave credit for discussions that showed some insight, even if the final answer wasn’t completely correct.

August 23, 2008

Young Explorer

August 20, 2008
