The Puzzling Linearity of COVID-19
2020-04-23 at 3:08 pm 10 comments
We all understand how the total number of cases of COVID-19 and the total number of deaths due to COVID-19 are expected to grow exponentially during the early phase of the pandemic — every infected individual is in contact with others, who are unlikely to themselves be infected, and on average infects more than one of them, leading to the number of cases growing by a fixed percentage every day. We also know that this can’t go on forever — at some point, many of the people in contact with an infected individual have already been infected, so they aren’t a source of new infections. Or alternatively, people start to take measures to avoid infection.
So we expect that on a logarithmic plot of the cumulative number of cases or deaths over time, the curve will initially be a straight line, but later start to level off, approaching a horizontal line when there are no more new cases or deaths (assuming the disease is ultimately eliminated). And that’s what we mostly see in the data, except that we haven’t achieved a horizontal line yet.
On a linear plot of cases or deaths over time, we expect an exponentially rising curve, which also levels off eventually, ultimately becoming a horizontal line when there are no more cases or deaths. But that’s not what we see in much of the data.
Instead, for many countries, the linear plots of total cases or total deaths go up exponentially at first, and then approach a straight line that is not horizontal. What’s going on?
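For reference, the expected shape described above can be sketched with a simple logistic growth model. This is only an illustration: the growth rate, population size, and initial case count are assumed values, not fitted to any country's data.

```python
def logistic_cumulative(days, growth=0.2, population=1_000_000, initial=10.0):
    """Cumulative cases under simple logistic growth (all parameters are assumed)."""
    totals = [initial]
    for _ in range(days - 1):
        current = totals[-1]
        # New cases are proportional to the current total and to the
        # fraction of the population that has not yet been infected.
        new_cases = growth * current * (1.0 - current / population)
        totals.append(current + new_cases)
    return totals


totals = logistic_cumulative(days=150)

# Early on, the total grows by a nearly fixed ratio each day,
# which is a straight line on a logarithmic plot.
early_ratio = totals[10] / totals[9]

# Much later, the daily increase shrinks towards zero,
# which is a horizontal line on either kind of plot.
late_increase = totals[-1] - totals[-2]

print(f"early day-over-day ratio: {early_ratio:.3f}")
print(f"late daily increase: {late_increase:.3f}")
```

Nothing in a model of this kind produces a long stretch of constant daily increases, which is why the straight, non-horizontal lines in the data are surprising.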
Before trying to answer this question, I’ll first illustrate the issue with some plots taken from https://www.worldometers.info/coronavirus/.
Here’s the linear plot of cases in the UK:
We see how the initial exponential rise changes into a linear rise from about April 3. We can also look at the daily number of cases:
From around April 3, the initial exponential rise in new cases changes to an approximately constant number of new cases each day.
It’s the same for deaths, with a small time lag:
Browsing around worldometers shows that the plots for many (though not all) countries are similar. For instance, Canada:
And the United States:
It’s not too hard to come up with a reason for the number of cases to rise linearly. One need only assume that the country has a limited, and fixed, testing capacity. Once the number of probable cases reaches this capacity, the number of confirmed cases can’t rise any faster than they can be tested, so even if the true number of cases is still rising exponentially, the number of reported cases rises only linearly. Now, one might instead think that testing capacity is being increased, or that cases are being reported as COVID-19 based on symptoms alone, without a confirming test, so this isn’t a completely convincing reason for linearity. But considering that the relationship of number of reported cases to number of actual cases may be rather distant, it’s maybe not too interesting to think about this further.
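The testing-capacity argument can be made concrete with a small sketch. The growth rate, starting count, and daily capacity below are hypothetical, and the model assumes every untested case simply waits in a backlog until it can be confirmed.

```python
def reported_cases(days, growth=0.2, initial=10.0, capacity=500):
    """Cumulative reported cases when at most `capacity` cases are confirmed per day."""
    true_new = initial      # true new cases today (assumed to grow exponentially)
    backlog = 0.0           # cases that exist but have not yet been confirmed
    reported_total = 0.0
    totals = []
    for _ in range(days):
        backlog += true_new
        confirmed_today = min(backlog, capacity)
        backlog -= confirmed_today
        reported_total += confirmed_today
        totals.append(reported_total)
        true_new *= 1.0 + growth
    return totals


totals = reported_cases(days=60)
daily = [later - earlier for earlier, later in zip(totals, totals[1:])]

print(daily[:5])    # still below capacity: reported cases track true cases
print(daily[-5:])   # pinned at capacity: the cumulative curve is a straight line
```

Once the true daily count passes the capacity, the reported daily count stays at the cap no matter how fast the true epidemic is growing.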
A linear rise in the number of deaths seems harder to explain.
The analogue of limited testing capacity would be limited hospital capacity. But for that to produce a linear rise in reported deaths, we’d need to assume not only that hospital capacity has been exceeded (which seems not to be true in most places) but also that only the deaths from COVID-19 that occur in hospitals are reported.
It’s possible to imagine situations where the true number of deaths rises exponentially at first, but then linearly. For example, people could live along a river. Infections start among people at the river’s mouth, and expand exponentially amongst them, until most of them are infected. It also spreads up the river, but only by local contagion, so the number of deaths (and cases) grows linearly according to how far up-river it has spread. This scenario, however, seems nothing like what we would expect in almost all countries.
Most countries have taken various measures to slow the spread of COVID-19. We might expect that in some countries, these measures are insufficient, and that the growth in total deaths (and daily deaths) is still exponential, just with a longer doubling time. We might expect that in some other countries, the measures are quite effective, so that the number of new deaths is now declining exponentially, with the plot of total deaths levelling off to a horizontal line. To get a linear growth in number of deaths, the measures taken would need to be just effective enough, but no more, that they lead to a constant number of deaths per day, neither growing nor shrinking. Considering that various disparate measures have been taken, it seems like an unlikely coincidence for their net effect to just happen to be a growth rate of zero.
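To see how narrow the "just effective enough" window is, here is a rough sketch in which daily deaths are multiplied by a reproduction factor once per generation of infection. The 5-day generation time, the starting level of 100 deaths per day, and the values of the factor are all assumptions chosen for illustration.

```python
def daily_deaths_after(days, r, generation_days=5.0, start=100.0):
    """Daily deaths after `days`, if each generation multiplies them by `r`."""
    generations = days / generation_days
    return start * r ** generations


for r in (0.8, 0.9, 1.0, 1.1, 1.2):
    print(f"r = {r:.1f}: {daily_deaths_after(30, r):6.1f} deaths per day after 30 days")
```

Under these assumptions, a factor of 0.9 roughly halves daily deaths within a month, and a factor of 1.1 nearly doubles them. Only values very close to 1 leave the daily count looking constant for weeks.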
What’s left? Well, it could be that the growth in total deaths is not really linear, but is instead now growing exponentially with a very long doubling time, or is instead levelling off, but very slowly.
Certainly there are some countries where total deaths seem to be levelling off, such as Spain:
Maybe if we only looked at these plots up until about April 7, we’d see the growth in total deaths as being nearly linear (after about March 26), and the number of daily deaths as being almost constant.
But the number of countries where growth in total deaths is almost linear seems more than I’d expect. Could these countries have very precisely calibrated the magnitude of their infection control measures to keep the number of new cases constant, thereby avoiding overwhelming their health care systems at minimal social and economic cost? This seems unlikely to me. I’m puzzled.
1. Aaron Galloway | 2020-04-24 at 8:19 am
Dear Dr. Neal,
I’m curious whether the apparent linear rise in cases (actually confirmed cases, as per testing) may be an artifact of the lag between the moment individuals are infected and the time they exhibit symptoms or are tested, given an incubation period estimated at about 5-14 days.
If so, and given the inconsistent timing of government interventions across the globe, we might expect to see a “bending” of the rise in cases into a more horizontal line in, say, around May.
2. Radford Neal | 2020-04-24 at 9:02 am
Yes, it could be that the curve is bending down from exponential growth to horizontal, but slowly enough that for a while it seems to be going up linearly. This might be more likely given the variable incubation time (and time to death when fatal), which would have the effect of smoothing out the impact of a sudden “lockdown”.
But if you look right now at the world totals for cases and deaths at https://www.worldometers.info/coronavirus/ it’s really hard to think that the strikingly linear growth since about March 30 can be explained so simply. Perhaps the staggered timing of interventions that you mention could explain this, if they just by coincidence have the combined effect of producing a linear curve (for a while), but this seems a bit unlikely.
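The smoothing effect of a variable time to death can be sketched as a convolution of daily infections with a delay distribution. Everything numeric here is assumed for illustration: the growth and decline rates, the lockdown on day 30, the uniform 14 to 27 day delay, and the fatality ratio.

```python
def days_near_peak(series, tolerance=0.10):
    """Count the days on which a series is within `tolerance` of its maximum."""
    peak = max(series)
    return sum(1 for value in series if value >= (1.0 - tolerance) * peak)


# Hypothetical infection curve: 20% daily growth, then an abrupt lockdown
# on day 30, after which new infections shrink by 7% per day.
infections = [100.0]
for day in range(1, 90):
    factor = 1.20 if day < 30 else 0.93
    infections.append(infections[-1] * factor)

# Assumed delay from infection to death: uniform over 14 to 27 days.
delays = range(14, 28)
weight = 1.0 / len(delays)
fatality = 0.01  # assumed infection fatality ratio, for illustration only

deaths = []
for day in range(len(infections)):
    expected = sum(infections[day - d] * weight for d in delays if day - d >= 0)
    deaths.append(fatality * expected)

print("days within 10% of peak, infections:", days_near_peak(infections))
print("days within 10% of peak, deaths:    ", days_near_peak(deaths))
```

In this sketch the sharp corner in infections becomes a broader plateau in deaths, so daily deaths look roughly constant for longer. Whether that is enough to account for several weeks of linear growth is a separate question.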
3. Radford Neal | 2020-04-24 at 9:07 pm
There’s interesting discussion of this post at https://www.lesswrong.com/posts/QTXvG3MxrZqafT4ir/the-puzzling-linearity-of-covid-19
4. Ken | 2020-04-26 at 9:41 pm
One difficulty that some countries may be having is pushing their transmission rate much below one, so they tend to have fairly flat case rates at the peak. The death rates will then be delayed and spread out, looking even flatter. The UK and USA look like this.
Another interesting feature of the data is that often case rate curves will rise very rapidly early, as a country realises that they have an epidemic and will put resources into finding cases, so will find existing as well as new cases.
5. Ken | 2020-04-28 at 6:15 am
Saw today that Germany had their transmission rate down to 0.7 but then relaxed their restrictions and it is now 1.0. It does seem that it is quite difficult to get it below 1.0. In Australia we are seeing that people make their own decisions about what they think are appropriate levels of restriction, as the rate of new cases declines to low levels. The good news is that if we go to a transmission rate of 1 then it won’t be a problem, as we are at under 20 cases per day.
6. Radford Neal | 2020-04-28 at 9:34 am
Peter McCluskey makes an interesting comment on this post at lesswrong pointing out that after restrictions start, there will still be transmission within households for a while. Since most households have only two adults (and children will probably not be noted as having covid-19), this would for a while lead to R seeming to be 1. So we can hope that that’s why getting it below 1 seems to be difficult at the moment.
7. Yves Moreau | 2020-05-27 at 9:22 pm
I have also been puzzled that the effective reproduction number has seemed to be close to 1 after measures were taken. I have not assumed linearity (R=1), but simply something close to it. In particular, one might have expected that the drastic measures taken would have brought R much lower. R is usually qualitatively described as C x P x D (number of contacts per day x probability that a contact is infectious x number of days that a patient is infectious). Given the multiple interventions tackling each of these factors, it is surprising that R went from between 2 and 4 pre-lockdown to only between 0.7 and 1 post-lockdown. At least one factor that may explain this is that by confining people in their homes they will infect family members more effectively than pre-lockdown. The idea of the lockdown is that intrafamilial contamination should stop at the family, so that after lockdown we should only see a fixed multiplicative factor but no exponential growth. This may have happened in Wuhan because of the severity of the measures taken (e.g., only one person allowed to leave the home for shopping every other day). But in other countries, some people still went to work, and some people still went out despite the lockdown. If extrafamilial contaminations go down but intrafamilial contaminations go up, there will be a “buffering effect” that counterbalances the decrease in R. Such an effect might partly explain why it has been hard to bring R closer to zero.
8. ecoquant | 2020-05-27 at 10:12 pm
@Yves Moreau,
Some of your puzzlement may be explained by the attendant simplified model of transmission. Reproduction number is essentially a Poisson lambda mean. In fact, disease transmission is characterized by that parameter, and a variance, variously called a concentration or a dispersion parameter, and the distribution is the Negative Binomial. So it can be overdispersed.
See:
J. O. Lloyd-Smith, S. J. Schreiber, P. E. Kopp & W. M. Getz, “Superspreading and the effect of individual variation on disease emergence”, Nature, vol. 438, 17 November 2005, doi:10.1038/nature04153.
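As a rough illustration of what overdispersion means here, the sketch below compares a Poisson and a Negative Binomial model for the number of secondary cases per infected person, both with the same mean. The mean of 2.5 and the dispersion parameter of 0.16 are assumed values chosen only to show a strongly overdispersed case.

```python
import math


def prob_no_secondary_cases_poisson(r):
    """P(0 secondary cases) when the count is Poisson with mean r."""
    return math.exp(-r)


def prob_no_secondary_cases_negbin(r, k):
    """P(0 secondary cases) when the count is Negative Binomial with mean r, dispersion k."""
    return (1.0 + r / k) ** (-k)


def negbin_variance(r, k):
    """Variance of a Negative Binomial with mean r and dispersion k (Poisson variance is r)."""
    return r + r * r / k


R = 2.5   # assumed mean number of secondary cases
K = 0.16  # assumed dispersion parameter; smaller means more overdispersed

print(f"Poisson:  P(no secondary cases) = {prob_no_secondary_cases_poisson(R):.3f}, variance = {R:.2f}")
print(f"Neg.Bin.: P(no secondary cases) = {prob_no_secondary_cases_negbin(R, K):.3f}, variance = {negbin_variance(R, K):.2f}")
```

With the same mean, the overdispersed model has most infected people infecting nobody, and a small number of them infecting many.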
9. Radford Neal | 2020-07-01 at 9:40 pm
A comment by “SoerenMind” on the lesswrong link to this post (https://www.lesswrong.com/posts/QTXvG3MxrZqafT4ir/the-puzzling-linearity-of-covid-19) points to the following highly relevant paper:
https://www.medrxiv.org/content/10.1101/2020.05.22.20110403v1
10. ecoquant | 2020-07-01 at 10:26 pm
I don’t understand the continuing emphasis upon counts of positive cases. As indicated, there are many problems with believing these to be actual measures of infection prevalence. Ideally, we’d like to do something like random survey sampling to estimate prevalence in the population. Without the investment in that, there are proxies which, while much cruder, could be indicative, such as the proportion of positive results among the tests conducted. Then, there are fused estimates which consider linear combinations of such proportions, proportions of antibody tests showing positives, and deaths. There’s a suite of network sampling techniques like NSUM which, to my knowledge, have yet to be applied to ascertain covert populations of infected people.
It’s not like the literature hasn’t discussed these:
Russell Timothy W, Hellewell Joel, Jarvis Christopher I, van Zandvoort Kevin, Abbott Sam, Ratnayake Ruwan, CMMID COVID-19 working group, Flasche Stefan, Eggo Rosalind M, Edmunds W John, Kucharski Adam J. “Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship”, February 2020. Euro Surveill. 2020;25(12):pii=2000256. https://doi.org/10.2807/1560-7917.ES.2020.25.12.2000256.
T. Jombart, et al, “Inferring the number of COVID-19 cases from recently reported deaths” [version 1; peer review: 2 approved], Wellcome Open Research 2020, 5:78 Last updated: 26 MAY 2020.
T. W. Russell, et al, “Using a delay-adjusted case fatality ratio to estimate under-reporting”, https://fondazionecerm.it/wp-content/uploads/2020/03/Using-a-delay-adjusted-case-fatality-ratio-to-estimate-under-reporting-_-CMMID-Repository.pdf.
Given that infections are far from Poisson events, being overdispersed because of the superspreader phenomenon, it seems a good deal more investigation of those long tails would be warranted. After all, the nice thing about Poisson statistics is that they imply a certain stability and predictability in outcome. Forcing a Poisson model on top of an actually Negative Binomial model with a big variance means the
of the Poisson is going to be exaggerated. Sure, it looks like the Poisson is exaggerating. But, in fact, there’s bigger latent risk: Can’t know how the big tail events are going to behave.
Indeed, if there’s anything specific to be criticized about Imperial College, it is that they did not acknowledge this feature of epidemics in their analysis. The superspreader phenomenon has been known for a while, since 2000 at least. See
J. O. Lloyd-Smith, S. J. Schreiber, P. E. Kopp & W. M. Getz, “Superspreading and the effect of individual variation on disease emergence”, Nature, vol. 438, 17 November 2005, doi:10.1038/nature04153.
And see its references.