Posts filed under ‘Statistics – Technical’
Critique of “Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period” — Part 4: Modelling R, seasonality, immunity
In this post, fourth in a series (previous posts: Part 1, Part 2, Part 3), I’ll finally talk about some substantive conclusions of the following paper:
Kissler, Tedijanto, Goldstein, Grad, and Lipsitch, Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period, Science, vol. 368, pp. 860-868, 22 May 2020 (released online 14 April 2020). The paper is also available here, with supplemental materials here.
In my previous post, I talked about how the authors estimate the reproduction numbers (R) over time for the four common cold coronaviruses, and how these estimates could be improved. In this post, I’ll talk about how Kissler et al. use these R estimates to model immunity and cross-immunity for these viruses, and the seasonal effects on their transmission. These modelling results inform the later parts of the paper, in which they consider various scenarios for future transmission of SARS-CoV-2 (the coronavirus responsible for COVID-19), whose characteristics may resemble those of these other coronaviruses.
The conclusions that Kissler et al. draw from their model do not seem to me to be well supported. The problems start with the artifacts and noise in the proxy data and R estimates, which I discussed in Part 2 and Part 3. These issues with the R estimates lead Kissler et al. to model smoothed R estimates, which results in autocorrelated errors that invalidate their assessments of uncertainty. The noise in the R estimates also leads them to limit their model to the 33 weeks of “flu season”; consequently, their model cannot possibly provide a full assessment of the degree of seasonal variation in R, which is a matter of vital importance. The conclusions Kissler et al. draw from their model regarding immunity and cross-immunity for the betacoronaviruses are also flawed, because they ignore the effects of aggregation over the whole US, and because their model is unrealistic and inconsistent in its treatment of immunity during a season and at the start of a season. A side effect of this unrealistic immunity model is that the partial information on seasonality that their model produces is biased.
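To see concretely what goes wrong with the uncertainty assessments, here is a small simulation of my own, not drawn from the paper or from my reanalysis of it: regress a smoothed noise series on an arbitrary covariate, and compare the standard errors that ordinary least squares reports (which assume independent errors) with the actual spread of the slope estimates across simulated seasons.

```r
# Illustrative simulation (not Kissler et al.'s analysis): smoothing a noise
# series induces autocorrelation, after which ordinary least squares reports
# standard errors that are much too small.

set.seed(1)

n    <- 33      # length comparable to a 33-week "flu season"
reps <- 2000    # number of simulated seasons
x    <- seq(0, 1, length.out = n)   # an arbitrary covariate

est <- se <- numeric(reps)
for (r in 1:reps) {
  eps <- rnorm(n)                                                # independent noise
  sm  <- as.numeric(stats::filter(eps, rep(1/5, 5), sides = 2))  # 5-point moving average
  ok  <- !is.na(sm)
  fit <- lm(sm[ok] ~ x[ok])                                      # true slope is zero
  est[r] <- coef(fit)[2]
  se[r]  <- summary(fit)$coefficients[2, 2]
}

# Average standard error reported by lm versus the actual spread of the slope
# estimates across simulations: the nominal value is far too small.
c(nominal = mean(se), actual = sd(est))
```

The nominal standard errors understate the true variability of the estimates considerably, which is the general hazard of fitting a model to smoothed values as if their errors were independent.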
After justifying these criticisms of Kissler et al.’s results, I will explore what can be learned using better incidence proxies and R estimates, and better models of seasonality and immunity.
The code I use (written in R) is available here, with GPLv2 licence.
The Harmonic Mean of the Likelihood: Worst Monte Carlo Method Ever
Many Bayesian statisticians decide which of several models is most appropriate for a given dataset by computing the marginal likelihood of each model (also called the integrated likelihood or the evidence). The marginal likelihood is the probability that the model gives to the observed data, averaging over values of its parameters with respect to their prior distribution. If x is the entire dataset and t is the entire set of parameters, then the marginal likelihood is

P(x) = ∫ P(x|t) P(t) dt
Here, I use P(x) and so forth to represent a probability density or mass function, as appropriate. After P(x) has been computed for all models (which may well have different sorts of parameters), the model with the largest marginal likelihood is chosen, or, more elaborately, predictions might be made by averaging over all models, with weights proportional to their marginal likelihoods (perhaps multiplied by a prior factor for each model).
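As a small illustration of that last step, here is how marginal likelihoods and prior model probabilities combine into posterior model weights. The model names and numbers below are made up, and the arithmetic is done on the log scale only to avoid underflow; nothing here is specific to any particular models.

```r
# Hypothetical log marginal likelihoods for three models, with equal prior
# probabilities; each model's posterior weight is proportional to P(x|model) P(model).

log_marg_lik <- c(M1 = -1052.3, M2 = -1048.7, M3 = -1050.1)   # made-up values
log_prior    <- log(c(M1 = 1, M2 = 1, M3 = 1) / 3)            # equal prior probabilities

log_post     <- log_marg_lik + log_prior
post_weights <- exp(log_post - max(log_post))
post_weights <- post_weights / sum(post_weights)

round(post_weights, 3)   # weights for model selection or model averaging
```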
Use of the marginal likelihood to evaluate models is often rather dubious, because it is very sensitive to any flaws in the specifications of the priors for the parameters of each model. That may be a topic for a future post, but I’ll ignore that issue here, and talk instead about how people try to compute the marginal likelihood.
Computing P(x) is difficult, because it’s an integral of what is usually a highly variable function over what is typically a high dimensional parameter space. Of course, for some models the integral is analytically tractable (typically ones with conjugate priors), and if the parameter space is low-dimensional, brute-force numerical integration may work. For most interesting models, however, the only known way of accurately computing P(x) is by applying quite sophisticated Monte Carlo techniques, which have long been researched by physicists in order to solve the essentially identical problem of computing “free energies”. See, for example, Section 6.2 of my review of Probabilistic Inference Using Markov Chain Monte Carlo Methods, and my paper on Annealed Importance Sampling.
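As a point of reference, here is a toy case of my own (not from any of the papers or reviews cited) where both of the easy routes work: each observation is N(t, 1) with a N(0, 1) prior on t. The marginal likelihood has a closed form, and since the parameter space is one-dimensional, brute-force numerical integration reproduces it.

```r
# Toy model: x_i ~ N(t, 1) with prior t ~ N(0, 1). Marginally, x is
# multivariate normal with mean 0 and covariance I + J, where J is the
# all-ones matrix, which gives the closed form used below.

set.seed(1)
x <- rnorm(20, mean = 0.5, sd = 1)   # some data
n <- length(x)

# Closed form: det(I + J) = n + 1 and (I + J)^{-1} = I - J/(n+1), so the
# quadratic form in the multivariate normal density is sum(x^2) - sum(x)^2/(n+1).
analytic <- (2*pi)^(-n/2) * (n + 1)^(-1/2) *
              exp(-0.5 * (sum(x^2) - sum(x)^2 / (n + 1)))

# Brute-force numerical integration over the single parameter t.
integrand <- function(t)
  sapply(t, function(tt) prod(dnorm(x, mean = tt, sd = 1)) * dnorm(tt))
numerical <- integrate(integrand, -Inf, Inf)$value

c(analytic = analytic, numerical = numerical)   # should agree closely
```

For most models of real interest neither of these routes is available, which is where the more sophisticated methods mentioned above come in.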
These sophisticated techniques aren’t too well-known, however. They also take substantial effort to implement and substantial amounts of computer time. So it’s maybe not surprising that many people have been tempted by what seems like a much easier method — compute the harmonic mean of the likelihood with respect to the posterior distribution, using the same Markov Chain Monte Carlo (MCMC) runs that they have usually already done to estimate parameters or make predictions based on each model.
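To make the temptation concrete, here is a minimal sketch of the harmonic mean estimator, applied to the toy model in the sketch above, with direct draws from the (known) posterior standing in for MCMC output. The estimate is the reciprocal of the posterior average of the reciprocal likelihood, computed on the log scale to avoid underflow.

```r
# Harmonic mean estimator for the toy model above (reuses x, n, and analytic
# from the previous sketch). The posterior for t is N(sum(x)/(n+1), 1/(n+1)),
# so we can draw from it directly in place of MCMC samples.

set.seed(2)
m <- 10000
t_post <- rnorm(m, mean = sum(x)/(n + 1), sd = sqrt(1/(n + 1)))

log_lik <- sapply(t_post, function(tt) sum(dnorm(x, mean = tt, sd = 1, log = TRUE)))

# Estimate of P(x) is 1 / mean(1/likelihood), computed via log-sum-exp.
neg_ll  <- -log_lik
lse     <- max(neg_ll) + log(sum(exp(neg_ll - max(neg_ll))))
log_hme <- -(lse - log(m))

c(log_true = log(analytic), log_harmonic_mean = log_hme)
```

Repeated runs of this sketch give estimates that are typically too high and that jump around from run to run, because the average is dominated by the rare posterior draws with unusually low likelihood.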
Inconsistent Maximum Likelihood Estimation: An “Ordinary” Example
The widespread use of the Maximum Likelihood Estimate (MLE) is partly based on an intuition that the value of the model parameter that best explains the observed data must be the best estimate, and partly on the fact that for a wide class of models the MLE has good asymptotic properties. These properties include “consistency” — that as the amount of data increases, the estimate will, with higher and higher probability, become closer and closer to the true value — and, moreover, that the MLE converges as quickly to this true value as any other estimator. These asymptotic properties might be seen as validating the intuition that the MLE must be good, except that these good properties of the MLE do not hold for some models.
This is well known, but the common examples where the MLE is inconsistent aren’t too satisfying. Some involve models where the number of parameters increases with the number of data points, which I think is cheating, since these ought to be seen as “latent variables”, not parameters. Others involve singular probability densities, or cases where the MLE is at infinity or at the boundary of the parameter space. Normal (Gaussian) mixture models fall in this category — the likelihood becomes infinite as the variance of one of the mixture components goes to zero, while the mean is set to one of the data points. One might think that such examples are “pathological”, and do not really invalidate the intuition behind the MLE.
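To see why the mixture example counts as pathological rather than “ordinary”, here is a quick check of my own (not from the post): fix one component’s mean at a data point and shrink its standard deviation, and the log likelihood of an equal-weight two-component normal mixture grows without bound.

```r
# Likelihood blow-up for a two-component normal mixture: one component's mean
# is fixed at a data point, its standard deviation sd1 shrinks toward zero,
# and the other component is held at N(0, 1), with equal mixing weights.

set.seed(3)
y <- rnorm(30)   # data from a plain N(0, 1)

mix_loglik <- function(sd1)
  sum(log(0.5 * dnorm(y, mean = y[1], sd = sd1) + 0.5 * dnorm(y, mean = 0, sd = 1)))

sapply(c(1, 0.1, 0.01, 0.001, 1e-6), mix_loglik)   # increases without bound as sd1 -> 0
```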
Here, I’ll present a simple “ordinary” model where the MLE is inconsistent. The probability density defined by this model is free of singularities (or any other pathologies), for any value of the parameter. The MLE is always well defined (apart from ties, which occur with probability zero), and the MLE is always in the interior of the parameter space. Moreover, the problem is one-dimensional, allowing easy visualization.