## Archive for August, 2008

### R Design Flaws #1 and #2: A Solution to Both?

I’ve previously posted about two design flaws in R. The first post was about how R produces reversed sequences from a:b when a>b, with bad consequences in “for” statements (and elsewhere). The second post was about how R by default drops dimensions in expressions like M[i:j,] when i:j is a sequence only one long (ie, when i equals j).

In both posts, I suggested ways of extending R to try to solve these problems. I now think there is a better way, however, which solves both problems with one simple extension to R. This extension would also make R programs run faster and use less memory. (more…)

### Answers to Applied PhD Comprehensive Question #1

This post links to a Question I set for an applied statistics PhD comprehensive exam. My answers for this question are here (the question is repeated there, so no need to look at the old post).

Note that my answers are more elaborate than I would expect a student to write on an exam. I also gave credit for discussions that showed some insight, even if the final answer wasn’t completely correct.

### Design Flaws in R #2 — Dropped Dimensions

In a comment on my first post on design flaws in the R language, Longhai remarked that he has encountered problems as a result of R’s default behaviour of dropping a dimension of a matrix when you select only one row/column from that dimension. This was indeed the design flaw that I was going to get to next! I think it also points to what is perhaps a deeper design flaw. (more…)

### The Harmonic Mean of the Likelihood: Worst Monte Carlo Method Ever

Many Bayesian statisticians decide which of several models is most appropriate for a given dataset by computing the *marginal likelihood* of each model (also called the *integrated likelihood* or the *evidence*). The marginal likelihood is the probability that the model gives to the observed data, averaging over values of its parameters with respect to their prior distribution. If *x* is the entire dataset and *t* is the entire set of parameters, then the marginal likelihood is

Here, I use *P(x)* and so forth to represent a probability density or mass function, as appropriate. After P(x) has been computed for all models (which may well have different sorts of parameters), the model with largest marginal likelihood is chosen, or more elaborately, predictions might be made by averaging over all models, with weights proportional to their marginal likelihoods (perhaps multiplied by a prior factor for each model).

Use of the marginal likelihood to evaluate models is often rather dubious, because it is very sensitive to any flaws in the specifications of the priors for the parameters of each model. That may be a topic for a future post, but I’ll ignore that issue here, and talk instead about how people try to compute the marginal likelihood.

Computing *P(x)* is difficult, because it’s an integral of what is usually a highly variable function over what is typically a high dimensional parameter space. Of course, for some models the integral is analytically tractable (typically ones with conjugate priors), and if the parameter space is low-dimensional, brute-force numerical integration may work. For most interesting models, however, the only known way of accurately computing P(x) is by applying quite sophisticated Monte Carlo techniques, which have long been researched by physicists in order to solve the essentially identical problem of computing “free energies”. See, for example, Section 6.2 of my review of Probabilistic Inference Using Markov Chain Monte Carlo Methods, and my paper on Annealed Importance Sampling.

These sophisticated techniques aren’t too well-known, however. They also take substantial effort to implement and substantial amounts of computer time. So it’s maybe not surprising that many people have been tempted by what *seems *like a much easier method — compute the harmonic mean of the likelihood with respect to the posterior distribution, using the same Markov Chain Monte Carlo (MCMC) runs that they have usually already done to estimate parameters or make predictions based on each model.

### Applied Statistics PhD Comprehensive Question #1

PhD students in the Dept. of Statistics at the University of Toronto normally write three comprehensive exams at the end of their first year, in Probability, Theoretical Statistics, and Applied Statistics. Below is a question I set for the 2008 exam in Applied Statistics. It may be an interesting exercise for others too. It should in theory be doable by someone with just a good introductory undergraduate course in statistics, including multiple regression. However, many PhD students had difficulty with it, so I wouldn’t say it’s easy.

The question is here. I’ll post my answer in a week or so. I’ll also post a question from 2007 sometime.

**Update:** Here is the post with the answers.