## Answers to Applied PhD Comprehensive Question #1

This post links to a question I set for an applied statistics PhD comprehensive exam. My answers to this question are here (the question is repeated there, so there is no need to look at the old post).

Note that my answers are more elaborate than I would expect a student to write on an exam. I also gave credit for discussions that showed some insight, even if the final answer wasn’t completely correct.

Entry filed under: Statistics - Nontechnical.

• 1. Bob O'H  |  2008-08-23 at 1:46 pm

For the last part, why not suggest giving the test before the coffee as well? That would be cheaper and easier than testing with both treatments, and would provide at least one type of baseline measure.

• 2. Ryan J. Parker  |  2008-08-23 at 3:42 pm

Thanks for this. The answers are very informative.

• 3. rif  |  2008-08-23 at 10:09 pm

So in the experimental study, if Fred hadn’t bothered to measure the age, then we should have been more willing to accept his conclusion that caffeine improves reaction time?

• 4. govstats  |  2008-08-23 at 10:31 pm

That was a lot of fun to read through. Can you possibly reveal the simulating distribution for question 1?

• 5. Radford Neal  |  2008-08-24 at 12:07 am

> So in the experimental study, if Fred hadn’t bothered to measure the age, then we should have been more willing to accept his conclusion that caffeine improves reaction time?

Yes, that’s right. If age hadn’t been measured, the best analysis would be the simple t test, which might well have led one to a conclusion that seems like it might be wrong. There’s nothing strange about that, though. It’s similar to how conclusions based on the first 75 subjects might differ from what one would conclude based on all 100 subjects. More information often changes the conclusions, and the earlier conclusions are sometimes wrong. (For that matter, of course, the later conclusions could be wrong, but they’re a better bet than the ones based on less data.)
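To make the point about the simple t test versus an age-adjusted analysis concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the data are simulated rather than taken from the exam question, the true caffeine effect is set to zero, and the caffeine group is deliberately made about ten years younger than the control group. The helper names `unadjusted_effect` and `adjusted_effect` are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def unadjusted_effect(treat, y):
    """Difference in mean outcome, treated minus control (what a simple t test looks at)."""
    treated = [yi for ti, yi in zip(treat, y) if ti == 1]
    control = [yi for ti, yi in zip(treat, y) if ti == 0]
    return mean(treated) - mean(control)

def adjusted_effect(treat, age, y):
    """Coefficient on treat from least squares of y on treat and age, with intercept."""
    mt, ma, my = mean(treat), mean(age), mean(y)
    t = [v - mt for v in treat]
    a = [v - ma for v in age]
    r = [v - my for v in y]
    s_tt = sum(v * v for v in t)
    s_aa = sum(v * v for v in a)
    s_ta = sum(u * v for u, v in zip(t, a))
    s_ty = sum(u * v for u, v in zip(t, r))
    s_ay = sum(u * v for u, v in zip(a, r))
    # Solve the 2x2 normal equations for the treatment coefficient (Cramer's rule).
    return (s_ty * s_aa - s_ta * s_ay) / (s_tt * s_aa - s_ta ** 2)

random.seed(1)
n = 50
treat = [1] * n + [0] * n
# Assumed imbalance: caffeine group aged 20-39, control group aged 30-49.
age = [20 + i % 20 for i in range(n)] + [30 + i % 20 for i in range(n)]
# Assumed model: reaction time rises with age; the true caffeine effect is 0.
y = [300 + 2 * a + 0.0 * t + random.gauss(0, 5) for t, a in zip(treat, age)]

unadjusted = unadjusted_effect(treat, y)
adjusted = adjusted_effect(treat, age, y)
print(f"unadjusted difference: {unadjusted:.1f}")
print(f"age-adjusted estimate: {adjusted:.1f}")
```

Under these assumptions the unadjusted difference mostly reflects the age gap between the groups, while the adjusted estimate should sit close to the true value of zero.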

• 6. Aniko Szabo  |  2008-08-25 at 10:03 am

Problem 2 was a really great question: this issue comes up all the time. But I have to disagree with your statement that View 1 is _wrong_. Ignoring age might not be the _optimal_ analysis method, but it is certainly _valid_ in the sense that the stated type I error rate is preserved. Just as when using only the first 100 observations out of 1000.
The main problem in practice is that clinicians don’t just collect one additional covariate; once they have the patient they will ask him/her everything they can think of “just in case”. Some of those things will be unbalanced between the groups, so what do you do? Adjust for all 20 covariates? You’ll lose any power you ever had (not to mention having to worry whether you adjusted appropriately, or perhaps some non-linear term is needed somewhere). Adjust for those that differ significantly among the groups? You are in the middle of View 2 which does not preserve type I error rates. I regularly see statistical analysis plans that state that they will do exactly this. Ignore all of them? But Dr Neal subtracted points on my test for suggesting that! :)
I know that the real solution is careful design, balancing and explicitly planning to adjust for known predictors, etc, but View 1 – especially if stated in advance of the data collection – is the right way to go quite often.
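The type I error claim can be checked with a rough simulation. The sketch below is only illustrative: the age distribution, the outcome model, the group size, and the number of trials are all assumptions, and the critical value 1.984 is an approximation to the two-sided 5% point of a t distribution with 98 degrees of freedom. There is no treatment effect, the outcome depends on age, and the test ignores age entirely.

```python
import random
import statistics

T_CRIT = 1.984  # approximate two-sided 5% critical value for t with 98 df

def one_trial(n_per_group):
    """Simulate one randomized experiment with no treatment effect.

    Returns True if the pooled two-sample t test, ignoring age, rejects at 5%.
    """
    outcomes = []
    for _ in range(2 * n_per_group):
        age = random.gauss(40, 10)                      # assumed age distribution
        outcomes.append(300 + 2 * age + random.gauss(0, 5))  # outcome depends on age only
    random.shuffle(outcomes)                            # random assignment to groups
    treated = outcomes[:n_per_group]
    control = outcomes[n_per_group:]
    pooled_var = (statistics.variance(treated) + statistics.variance(control)) / 2
    se = (pooled_var * 2 / n_per_group) ** 0.5
    t = (statistics.mean(treated) - statistics.mean(control)) / se
    return abs(t) > T_CRIT

random.seed(2008)
n_trials = 2000
rate = sum(one_trial(50) for _ in range(n_trials)) / n_trials
print(f"rejection rate ignoring age: {rate:.3f}")
```

The rejection rate should come out near 0.05. Note that this checks only the rate averaged over all random assignments; it says nothing about what to conclude after seeing that a particular assignment was unbalanced for age.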

• 7. Andrew Gelman  |  2008-08-25 at 10:48 am

Hi, Radford. I hope that next time you demonstrate good statistical practice to your students by rounding the numbers and not displaying everything to a zillion decimal places. (But maybe you’re demonstrating good research practice by not wasting time worrying about rounding!)

• 8. Radford Neal  |  2008-08-25 at 11:08 am

Hi Andrew,

I’m not sure what you’re referring to here. I of course didn’t rewrite standard R procedures to round to fewer decimal places (which wouldn’t actually be a good idea in any case). And when I referred to numbers from the R output in my answers, I of course gave the full, unrounded number, since if I didn’t students might not be sure that they had correctly identified which number in the R output I was referring to.

• 9. Radford Neal  |  2008-08-25 at 11:36 am

Aniko,

I’d think that reporting p-values based on only the first 100 observations, when you really have 1000 observations, is not only wrong, it would constitute scientific misconduct! And the same would be true if you had measured age, and knew that age was important, and knew that the groups were unbalanced for age, but reported a p-value that didn’t adjust for age anyway.

Now, if you thought that age wasn’t important, then as I say in my answer, you might not adjust for it, since including it would reduce power. This is obviously a bigger concern if you measured 20 things rather than one. A Bayesian could just include all these as additional predictors, with a prior that expressed how likely each is to be relevant. A non-Bayesian could decide ahead of time which are important enough to include, but if they didn’t start thinking about the issue until after they’d looked at the data, appealing to View 1 doesn’t justify ignoring variables that are obviously relevant.

• 10. Andrew Gelman  |  2008-08-25 at 12:18 pm

Hi, Radford. I’m not sure what you mean by “of course” when you say you didn’t rewrite standard R procedures. I think you can set the number of decimal places in R so that by default it doesn’t spew out so many digits. (Actually, I did rewrite the standard functions; I display regression output using display() rather than summary(), which gives more relevant results, I think.)

If a student handed in a result to me referring to a p-value of “0.2284,” I’d make a comment that this sort of precision is meaningless. But, as noted above, I can certainly respect the argument that such niceties aren’t a high priority when writing an exam answer. It’s more disturbing when I see such things in JASA and the like.

• 11. Radford Neal  |  2008-08-25 at 3:56 pm

Andrew, I think you’re taking a bit of an overly religious view of rounding. As I said, even if I could change the number of decimal places displayed by R, I wouldn’t. First of all, I’m not going to confront students on an exam with R output that’s different from what they’re used to seeing. Furthermore, it’s not generally a good idea to round the output to a small number of decimal places.

One reason for this is that the number of decimal places that are desirable for regression coefficients depends on the associated standard errors. I don’t think we want the output to have a different number of decimal places for every coefficient. Beyond that, suppressing all but a few significant figures conceals useful information.

For example, you might fit a regression model with all the data, then fit it with one observation that looks like an outlier removed. With output rounded to few decimal places, you may see that the results are the same with and without this observation, and conclude that the observation doesn’t matter. But with a larger number of decimal places displayed, you may see that the results are still exactly identical with and without the observation. That’s implausible, and tells you that you made a mistake, and didn’t really delete the observation at all.
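A small sketch of this debugging scenario, in Python rather than R and with made-up data: the points, the suspect observation, and the deliberately mistaken filter are all hypothetical, and `slope` is just a hand-written least-squares slope for a single predictor.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx

# Made-up data: points on the line y = 2x, except one suspect observation.
data = [(x, 2.0 * x) for x in range(11)]
suspect = (6, 12.5)
data[6] = suspect

# A correct deletion of the suspect observation.
really_removed = [p for p in data if p != suspect]

# A mistaken deletion: the filter names the wrong value, so nothing is dropped.
not_removed = [p for p in data if p != (6, 12.0)]

for label, pts in [("all data", data),
                   ("correct deletion", really_removed),
                   ("mistaken deletion", not_removed)]:
    s = slope(pts)
    print(f"{label:18s} rounded: {s:.2f}   full: {s!r}")
```

At two decimal places all three fits look the same. At full precision the correct deletion differs from the full-data fit, while the mistaken deletion matches it digit for digit, which is the signal that nothing was actually removed.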

When it comes to reporting results, one should indeed report that a p-value was 0.015, not 0.0151138, and that a regression coefficient was 0.457 with standard error 0.0017, not 0.457134 with standard error 0.00174332. However, when in doubt, adding one more decimal place is the way to go, since one too few is worse than one too many. (Rounding the p-value above to 0.02 is probably wrong, for instance.) Maybe you would like my 2007 comprehensive exam question better, as it contains lots of rounded results in paragraphs that report results. (I’ll post it sometime.) In this question, you’ll note that the one place where a number is presented in a context of a result report, I did round it.
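One way to get roundings like those above is to round to a number of significant figures rather than a fixed number of decimal places. The sketch below is in Python, not R, and `round_sig` is a hypothetical helper; the choice of two or three significant figures simply mirrors the numbers quoted above and is not a general rule.

```python
def round_sig(x, sig):
    """Format x with `sig` significant figures, using Python's general ('g') format.

    Note that 'g' drops trailing zeros, and switches to scientific notation
    for very small or very large values.
    """
    return format(x, f".{sig}g")

p_value = 0.0151138
coef = 0.457134
std_err = 0.00174332

print("p-value, 2 significant figures:", round_sig(p_value, 2))
print("p-value, 2 decimal places:     ", f"{p_value:.2f}")
print("coefficient, 3 sig. figures:   ", round_sig(coef, 3))
print("std. error, 2 sig. figures:    ", round_sig(std_err, 2))
```

Rounding the p-value to two decimal places gives 0.02, which loses the distinction that two significant figures (0.015) keeps.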

The places you object to are not reports of results, but arguments that refer to R output. The crucial thing there is to make sure the reader knows which place in the R output is being referenced. Rounding would be counterproductive in this context.

• 12. Andrew Gelman  |  2008-08-26 at 8:22 pm

No, I would not report a p-value as 0.015. I’d just say it’s less than 5% and leave it at that! In any case, I agree that it’s a matter of focus. If you’re doing a lot of programming and debugging, then that 6th significant digit can be useful in revealing changes in an algorithm. If you’re reporting social science research, it’s rare that that third decimal place will ever make a difference. There are costs either way–too few decimals can mean that you’ll miss a key change in a program, whereas too many decimals can reduce the amount of output that you can focus on, so that you can miss a substantive result.

• 13. ZBicyclist  |  2008-08-27 at 10:09 pm

Aniko wrote: “The main problem in practice is that clinicians don’t just collect one additional covariate; once they have the patient they will ask him/her everything they can think of “just in case”. Some of those things will be unbalanced between the groups, so what do you do? Adjust for all 20 covariates?”

I’m with Aniko. I’m involved in a study in which the researcher has about 250 potential covariates. There are more covariates available, but Excel was used to stage the data, and mercifully that version of Excel only allows 256 columns.

So, in the question as written Mary’s analysis is better, but I’m not sure adding many covariates just because they are handy is always good practice.

• 14. Keith O'Rourke  |  2008-08-31 at 10:41 am

> No, I would not report a p-value as 0.015. I’d just say it’s less than 5%

Andrew, we need to deal with reporting appropriate for WHAT?

For reporting a study as a source of evidence to be contrasted and combined with other sources (i.e. other studies), your rounding above would be far from ideal.

For reproducible research (i.e. if you obtained the data and redid the analyses, would you get the same results?) it’s also far from ideal, as even very small differences could indicate serious underlying errors.

But I am also not sure what the downside is (to the research community) if a naive reader takes the extra digits seriously (other than identifying themselves as possibly being quantitatively naive).

Keith
