## Critique of ‘Debunking the climate hiatus’, by Rajaratnam, Romano, Tsiang, and Diffenbaugh

*2016-01-10 at 11:43 pm* *
31 comments *

Records of global temperatures over the last few decades figure prominently in the debate over the climate effects of CO2 emitted by burning fossil fuels, as I discussed in my first post in this series, on What can global temperature data tell us? One recent controversy has been whether or not there has been a `pause’ (also referred to as a `hiatus’) in global warming over the last 15 to 20 years, or at least a `slowdown’ in the rate of warming, a question that I considered in my second post, on Has there been a `pause’ in global warming?

As I discussed in that post, the significance of a pause in warming since around 2000, after a period of warming from about 1970 to 2000, would be to show that whatever the warming effect of CO2, other factors influencing temperatures can be large enough to counteract its effect, and hence, conversely, that such factors could also be capable of enhancing a warming trend (eg, from 1970 to 2000), perhaps giving a misleading impression that the effect of CO2 is larger than it actually is. To phrase this more technically, a pause, or substantial slowdown, in global warming would be evidence that there is a substantial degree of positive autocorrelation in global temperatures, which has the effect of rendering conclusions from apparent temperature trends more uncertain.

Whether you see a pause in global temperatures may depend on which series of temperature measurements you look at, and there is controversy about which temperature series is most reliable. In my previous post, I concluded that even when looking at the satellite temperature data, for which a pause seems most visually evident, one can’t conclude definitely that the trend in yearly average temperature actually slowed (ignoring short-term variation) in 2001 through 2014 compared to the period 1979 to 2000, though there is also no definite indication that the trend has not been zero in recent years.

Of course, I’m not the only one to have looked at the evidence for a pause. In this post, I’ll critique a paper on this topic by Bala Rajaratnam, Joseph Romano, Michael Tsiang, and Noah S. Diffenbaugh, Debunking the climate hiatus, published 17 September 2015 in the journal Climatic Change. Since my first post in this series, I’ve become aware that `tamino’ has also commented on this paper, here, making some of the same points that I will make. I’ll have more to say, however, some of which is of general interest, apart from the debate on the `pause’ or `hiatus’.

First, a bit about the authors of the paper, and the journal it is published in. The authors are all at Stanford University, one of the world’s most prestigious academic institutions. Rajaratnam is an Assistant Professor of Statistics and of Environmental Earth System Science. Romano is a Professor of Statistics and of Economics. Diffenbaugh is an Associate Professor of Earth System Science. Tsiang is a PhD student. Climatic Change appears to be a reputable refereed journal, which is published by Springer, and which is cited in the latest IPCC report. The paper was touted in popular accounts as showing that the whole hiatus thing was mistaken — for instance, by Stanford University itself.

You might therefore be surprised that, as I will discuss below, this paper is completely wrong. Nothing in it is correct. It fails in every imaginable respect.

To start, here is the data they analyse, taken from plots in their Figure 1:

The second plot is a closeup of data from the first plot, for years from 1998 to 2013.

Rajaratnam, et al. describe this data as “the NASA-GISS global mean land-ocean temperature index”, which is a commonly used data set, discussed in my first post in this series. However, the data plotted above, and which they use, is not actually the GISS land-ocean temperature data set. It is the GISS land-only data set, which is less widely used, since as GISS says, it “overestimates trends, since it disregards most of the dampening effects of the oceans”. They appear to have mistakenly downloaded the wrong data set, and not noticed that the vertical scale on their plot doesn’t match plots in other papers showing the GISS land-ocean temperature anomalies. (They also apply their methods to various other data sets, claiming similar results, but only results from this data are shown in the paper.)

GISS data sets continually change (even for past years), and I can’t locate the exact version used in this paper. For the 1998 to 2013 data, I manually digitized the plot above, obtaining the following values:

`1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
0.84 0.59 0.57 0.68 0.80 0.78 0.69 0.88 0.78 0.86 0.65 0.79 0.93 0.78 0.76 0.82
`

In my analyses below, I will use these values when only post-1998 data is relevant, and otherwise use the closest matching GISS land-only data I can find.

Before getting into details, we need to examine what Rajaratnam, et al. think are the questions that need to be addressed. In my previous post, I interpreted debate about a `pause’, or `hiatus’, or `slowdown’ as really being about the degree of long-term autocorrelation in the temperature series. If various processes unrelated to CO2 affect temperatures for periods of decades or more, judging the magnitude of the effect of CO2 will be more difficult, and in particular, the amount of recent warming since 1970 might give a misleadingly high impression of how much effect CO2 has on temperature. A hiatus of over a decade, following a period of warming, while CO2 continues to increase, would be evidence that such other effects, operating on long time scales, can be large enough to temporarily cancel the effects of CO2.

Rajaratnam, et al. seem to have some awareness that this is the real issue, since they say that the perceived hiatus has “inspired valuable scientific insight into the processes that regulate decadal-scale variations of the climate system”. But they then ignore this insight when formulating and testing four hypotheses, each of which they see as one possible way of formalizing a claim of a hiatus. In particular, they emphasize that they “attempt to properly account for temporal dependence”, since failure to do so can “lead to erroneous scientific conclusions”. This would make sense if they were accounting only for short-term auto-correlations. Accounting for long-term autocorrelation makes no sense, however, since the point of looking for a pause is that it is a way of looking for evidence of long-term autocorrelation. To declare that there is no pause on the grounds that it is not significant once long-term autocorrelation is accounted for would miss the entire point of the exercise.

So Rajaratnam, Romano, Tsiang, and Diffenbaugh are asking the wrong questions, and trying to answer them using the wrong data. Let’s go on to the details, however.

The first hypothesis that Rajaratnam, et al. test is that temperature anomalies from 1998 to 2013 have zero or negative trend. Here are the anomalies for this period (data shown above), along with a trend line fit by least-squares, and a horizontal line fit to the mean:

Do you think one can confidently reject the hypothesis that the true trend during this period is zero (or negative) based on these 16 data points? I think one can get a good idea with only a few seconds of thought. The least-squares fit must be heavily influenced by the two low points at 1999 and 2000. If there is actually no trend, the fact that both these points are in the first quarter of the series would have to be ascribed to chance. How likely is that? The answer is 1/4 times 1/4, which is 1/16, which produces a one-sided p-value over 5%. As Rajaratnam, et al. note, the standard regression t-test of the hypothesis that the slope is zero gives a two-sided p-value of 0.102, and a one-sided p-value of 0.051, in rough agreement.

Yet Rajaratnam, et al. conclude that there is “overwhelming evidence” that the slope is positive — in particular, they reject the null hypothesis based on a two-sided p-value of 0.019 (though personally I would regard this as somewhat less than overwhelming evidence). How do they get the p-value to change by more than a factor of five, from 0.102 to 0.019? By accounting for autocorrelation.

Now, you may not see much evidence of autocorrelation in these 16 data points. And based on the appearance of the rest of the series, you may think that any autocorrelation that is present will be positive, and hence make any conclusions *less* certain, not *more* certain, a point that the authors themselves note on page 24 of the supplemental information for the paper. What matters, however, is autocorrelation in the residuals — the data points minus the trend line. Of course, we don’t know what the true trend is. But suppose we look at the residuals from the least-squares fit. We get the following autocorrelations (from Fig. 8 in the supplemental information for the paper):

As you can see, the autocorrelation estimates at lags 1, 2, and 3 are negative (though the dotted blue lines show that the estimates are entirely consistent with the null hypothesis that the true autocorrelations at lags 1 and up are all zero).

Rajaratnam, et al. first try to account for such autocorrelation by fitting a regression line using an AR(1) model for the residuals. I might have thought that one would do this by finding the maximum likelihood parameter estimates, and then employing some standard method for estimating uncertainty such as looking at the observed information matrix. But Rajaratham, et al. instead find estimates using a method devised by Cochrane and Orcutt in 1949, and then estimate uncertainty by using a block bootstrap procedure. They report a two-sided p-value of 0.075, which of course makes the one-sided p-value significant at the conventional 5% level.

In the supplemental information (page 24), one can see that the reported p-value of 0.075 was obtained using a block size of 5. Oddly, however, a smaller p-value of 0.005 was obtained with a block size of 4. One might suspect that the procedure becomes more dubious with larger block sizes, so why didn’t they report the more significant result with the smaller block size (or perhaps report the p-value of 0.071 obtained with a block size of 3)?

The paper isn’t accompanied by code that would allow one to replicate its results, and I haven’t tried to reproduce this method myself, partly because they seem to regard it as inferior to their final method.

This final method of testing the null hypothesis of zero slope uses a circular block bootstrap without an AR(1) model, and yields the two-sided p-value of 0.019 mentioned above, which they regard as overwhelming evidence against the slope being zero (or negative). I haven’t tried reproducing their result exactly. There’s no code provided that would establish exactly how their method works, and exact reproduction would anyway be impossible without knowing what random number seed they used. But I have implemented my interpretation of their circular bootstrap method, as well as a non-circular block bootstrap, which seems better to me, since considering this series to be circular seems crazy.

I get a two-sided p-value of 0.023 with a circular bootstrap, and 0.029 with a non-circular bootstrap, both using a blocksize of 3, which is the same as Rajaratnam, et al. used to get their p-value of 0.019 (supplementary information, page 26). My circular bootstrap result, which I obtained using 20000 bootstrap replications, is consistent with their result, obtained with 1000 bootstrap replications, because 1000 replications is not really enough to obtain an accurate p-value (the probability of getting 0.019 or less if the true p-value is 0.023 is 23%).

So I’ve more-or-less confirmed their result that using a circular bootstrap on residuals of a least-squares fit produces a p-value indicative of a significant positive trend in this data. But you shouldn’t believe that this result is actually *correct*. As I noted at the start, just looking at the data one can see that there is no significant trend, and that there is no noticeable autocorrelation that might lead one to revise this conclusion. Rajaratam, et al. seem devoid of such statistical intuitions, however, concluding instead that “applying progressively more general statistical techniques, the scientific conclusions have progressively strengthened from “not significant,” to “significant at the 10 % level,” and then to “significant at the 5 % level.” It is therefore clear that naive statistical approaches can possibly lead to erroneous scientific conclusions.”

One problem with abandoning such “naive” approaches in favour of complex methods is that there are many complex methods, not all of which will lead to the same conclusion. Here, for example, are the results of testing the hypothesis of zero slope using a simple permutation test, and permutation tests based on permuting blocks of two successive observations (with two possible phases):

The simple permutation test gives the same p-value of 0.102 as the simple t-test, and looking at blocks of size two makes little difference. And here are the results of a simple bootstrap of the original observations, not residuals, and the same for block bootstraps of size two and three:

We again fail to see the significant results that Rajaratnam, et al. obtain with a circular bootstrap on residuals.

I also tried three Bayesian models, with zero slope, unknown slope, and unknown slope and AR(1) residuals. The log marginal likelihoods for these models were 11.2, 11.6, and 10.7, so there is no strong evidence as to which is better. The two models with unknown slope gave posterior probabilities of the slope being zero or negative of 0.077 and 0.134. Interestingly, the posterior distribution of the autoregressive coefficient in the model with AR(1) residuals showed a (slight) preference for a positive rather than a negative coefficient.

So how do Rajaratnam, et al. get a significant non-zero slope when all these other methods don’t? To find out, I tested my implementations of the circular and non-circular residual bootstrap methods on data sets obtained by randomly permuting the observations in the actual data set. I generated 200 permuted data sets, and applied these methods (with 20000 bootstrap samples) to each, then plotted a histogram of the 200 p-values produced. I also plotted the lag 1 autocorrelations for each series of permuted observations and each series of residuals from the least-squares fit to those permuted observations. Here are the results:

Since the observations are randomly permuted, the true trend is of course zero in all these data sets, so the distribution of p-values for a test of a zero trend ought to be uniform over (0,1). It actually has a pronounced peak between 0 and 0.05, which amplifies the probability of a `significant’ result by a factor of more than 3.5. If one adjusts the p-values obtained on the real data for this non-uniform distribution, the adjusted p-values are 0.145 and 0.130 for the non-circular and circular residual bootstrap methods. The `significant’ results of Rajaratnam, et al. are a result of the method they use being flawed.

One possible source of the wrong results can be seen in the rightmost plot above. On randomly permuted data, the lag-1 autocorrelations are biased towards negative values, more so for the residuals than for the observations themselves. This effect should be negligible for larger data sets. Using a block bootstrap method with only 16 observations may be asking for trouble.

Finally, if one looks at the context of these 16 data points, it becomes clear why there are two low temperature anomaly values in 1999 and 2000, and consequently a positive (albeit non-significant) trend in the data. Here are the values of the Multivariate ENSO Index over the last few decades:

Note the peak index value in the El Nino year of 1998, which explains the relatively high temperature anomaly that year, and the drop for the La Nina years thereafter, which explains the low temperature anomalies in 1999 and 2000. The ENSO index shows no long-term trend, and is probably not related to any warming from CO2. Any trend based on these two low data points being near the beginning of some time period is therefore meaningless for any discussion of warming due to CO2.

The second hypothesis that Rajaratnam, et al. test is that the trend from 1998 to 2013 is at least as great as the trend from 1950 to 1997. Rejection of this hypothesis would support the claim of a `slowdown’ in global warming in recent years. They find estimates and standard errors (assuming independent residuals) for the slopes in separate regressions for 1950-1997 and for 1998-2013, which they then combine to obtain a p-value for this test. The (one-sided) p-value they obtain is 0.210. They therefore claim that there is no statistically signficant slowdown in the recent trend (since 1998).

I can’t replicate this result exactly, since I can’t find the exact data set they used, but on what seems to be a similar version of the GISS land-only data, I get a one-sided p-value of 0.173, which is similar to their result.

This result is meaningless, however. As one can see in the plot from their paper shown above, there was a `pause’ in the temperature from 1950 to 1970, which makes the slope from 1950 to 1997 be less than it was in the decades immediately before 1998. Applying the same procedure but with the first period being from 1970 to 1997, I obtain a one-sided p-value of 0.027, which one might regard as statistically significant evidence of a slowdown in the trend.

Rajaratnam, et al. are aware of the sensitivity of their p-value to the start date of the first period, since in a footnote on page 27 of the supplemental information for the paper they say

Changing the reference period from 1950-1997 to 1880-1997 only strengthens the null hypothesis of no difference between the hiatus period and before. This follows from the fact that the trend during 1880-1997 is more similar to the trend in the hiatus period. Thus the selected period 1950-1997 can be regarded as a lower bound on p-values for tests of difference in slopes.

Obviously, it is not actually a lower bound, since the 1970-1997 period produces a lower value.

Perhaps they chose to start in 1950 because that is often considered the year by which CO2 levels had increased enough that one might expect to see a warming effect. But clearly (at least in this data set) there was no warming until around 1970. One might consider that only in 1970 did warming caused by CO2 actually start, in which case comparing to the trend from 1950 to 1997 (let alone 1880 to 1997) is deceptive. Alternatively, one might think that CO2 did have an effect starting in 1950, in which case the lack of any increase in temperatures from 1950 to 1970 is an early instance of a `pause’, which *strengthens*, rather than weakens, the claim that the effect of CO2 is weak enough that other factors can sometimes counteract it, and at other times produce a misleadingly large warming trend that should not all be attributed to CO2.

Rajaratnam, et al. also test this second hypothesis another way, by seeing how many 16-year periods starting after 1950 and ending before 1998 have a trend at least as small as the trend over 1998-2013. The results are summarized in these plots, from their Fig. 2:

Since 13 of these 33 periods have lower trends than that of 1998-2013, they arrive at a p-value of 13/33=0.3939, and conclude that “the observed trend during 1998–2013 does not appear to be anomalous in a historical context”.

However, in the supplemental information (page 29) they say

Having said this, from Figure 2 there is a clear pattern in the distribution of 16 year linear trends over time: all 16 year trends starting at 1950 all the way to 1961 are lower than the trend during hiatus period, and all 16 year linear trends starting at years 1962 all the way to 1982 are higher than the trend during the hiatus period, with the exception of the 1979-1994 trend.

That is the end of their discussion. It does not appear to have occurred to them that they should use 1970 as the start date, which would produce a p-value of 1/13=0.077, which would be at least some weak evidence of a slowdown starting in 1998.

The third hypothesis tested by Rajaratnam, et al. is that the expected global temperature is the same at every year from 1998 on — that global warming has `stalled’. Of course, as they briefly discuss in the supplemental information (page 31), this is the same as their first hypothesis — that the global temperature has a linear trend with slope zero from 1998 on. Their discussion of this hypothesis, and why it isn’t really the same as the first, makes no sense to me, so I will say no more about it, except to note that strangely no numerical p-values are given for the tests of this hypothesis, only `reject’ or `retain’ descriptions.

The fourth, and final, hypothesis tested in the paper is that the distribution of year-to-year differences in temperature is the same after 1998 as during the period from 1950 to 1998. Here also, their discussion of their methods is confusing and incomplete, and again no numerical p-values are given. It is clear, however, that their methods suffer from the same unwise choice of 1950 as the start of the comparison period as was the case for their test of the second hypothesis. Their failure to find a difference in the distribution of differences in the 1998-2013 period compared to the 1950-1997 period is therefore meaningless.

Rajaratnam, Romano, Tsiang, and Diffenbaugh conclude by summarizing their results as follows:

Our rigorous statistical framework yields strong evidence against the presence of a global warming hiatus. Accounting for temporal dependence and selection effects rejects — with overwhelming evidence — the hypothesis that there has been no trend in global surface temperature over the past ≈15 years. This analysis also highlights the potential for improper statistical assumptions to yield improper scientific conclusions. Our statistical framework also clearly rejects the hypothesis that the trend in global surface temperature has been smaller over the recent ≈ 15 year period than over the prior period. Further, our framework also rejects the hypothesis that there has been no change in global mean surface temperature over the recent ≈15 years, and the hypothesis that the distribution of annual changes in global surface temperature has been different in the past ≈15 years than earlier in the record.

This is all wrong. There is not “overwhelming evidence” of a positive trend in the last 15 years of the data — they conclude that only because they used a flawed method. They do not actually reject “the hypothesis that the trend in global surface temperature has been smaller over the recent ≈ 15 year period than over the prior period”. Rather, after an incorrect choice of start year, they fail to reject the hypothesis that the trend in the recent period has been equal to or greater than the trend in the prior period. Failure to reject a null hypothesis is not the same as rejecting the alternative hypothesis, as we try to teach students in introductory statistics courses, sometimes unsuccessfully. Similarly, they do not actually reject “the hypothesis that the distribution of annual changes in global surface temperature has been different in the past ≈15 years than earlier in the record”. To anyone who understands the null hypothesis testing framework, it is obvious that one could not possibly reject such a hypothesis using any finite amount of data.

Those familiar with the scientific literature will realize that completely wrong papers are published regularly, even in peer-reviewed journals, and even when (as for this paper) many of the flaws ought to have been obvious to the reviewers. So perhaps there’s nothing too notable about the publication of this paper. On the other hand, one may wonder whether the stringency of the review process was affected by how congenial the paper’s conclusions were to the editor and reviewers. One may also wonder whether a paper reaching the opposite conclusion would have been touted as a great achievement by Stanford University. Certainly this paper should be seen as a reminder that the reverence for “peer-reviewed scientific studies” sometimes seen in popular expositions is unfounded.

The results above can be reproduced by first downloading the data using this shell script (which downloads other data too, that I use in other blog posts), or manually download from the URLs it lists if you don’t have wget. You then need to download my R script for the above analysis and this R source file (renaming them to .r from the .doc that wordpress requires), and then running the script in R as described in its opening comments (which will take quite a long time).

Entry filed under: R Programming, Science, Society, Statistics, Statistics - Nontechnical.

## 31 Comments Add your own

### Leave a Reply to ...and Then There's Physics Cancel reply

Trackback this post | Subscribe to the comments via RSS Feed

1.Phil Chapman | 2016-01-11 at 2:32 amNice critique. Do you plan to submit this to a journal as a paper?

2.Radford Neal | 2016-01-11 at 7:50 amThanks. I don’t intend to submit this anywhere. I’m planning to move on to further posts in this series.

3.Phil Chapman | 2016-01-11 at 12:53 pmThat’s a shame (but I understand why). Journals such as F1000Research are open access and use open review and are cheap (and quick) to publish in. Although a ‘letter to the editor’ at the original journal would also be useful.

4.Larry Wasserman | 2016-01-11 at 9:50 amGreat post Radford.

Like Phil, I think you should submit this to a journal.

I think critiques like this are important for science

and need to reach a lot of people.

–Larry Wasserman

5.Bruce McCullough | 2016-01-11 at 2:30 pmThe below indicates that what they have done is not science: science by definition is reproducible. I strongly encourage you to write this up and submit it somewhere.

“I haven’t tried reproducing their result exactly. There’s no code provided that would establish exactly how their method works, and exact reproduction would anyway be impossible without knowing what random number seed they used. “

6.anspiess | 2016-01-11 at 3:27 pmI second the opinion that you should publish this! The best would be a “Letter to Editor” that is a direct response to the paper.

This is actually a topic that is perfectly suited for my “Detection of influential data points in Linear Regression” Shinyapp at

https://anspiess.shinyapps.io/influence

I used your values above and got the results at

http://www.dr-spiess.de/temp/infl_plot.pdf

for the first 3 highest influential data points and the corresponding regression lines when removed.

The shown data for R-square, p-value, intercept, slope and confidence interval of slope is always all observations/first/second/third highest removed. The highest influencer (in terms of Cook’d distance) is actually 1998! When removed, p-value of slope gets significant. When removing 1999/2000 (third and second highest, respectively), p-value increases and R-square drops.

Cheers, Andrej

7.Radford Neal | 2016-01-11 at 3:44 pmRajaratnam, et al. note that the slope is significant when the 1998 point is removed, and also (though less so, p-value 0.090 in a simple t-test) when both 1998 and 1999 are removed. All of 1998, 1999, and 2000 are influenced by El Nino / La Nina, however, so this is rather meaningless. If you start in 2001, the p-value for a simple t-test is 0.350. On the other hand, going back to 1997 does also produce a significant positive slope. There is certainly a cherry picking issue here, but I think moving the cherries around within a problematic El Nino / La Nina period doesn’t accomplish anything.

8.Andy | 2016-01-11 at 6:56 pmExcellent post. To the other commenters: I don’t understand why it would help for him to publish this in a journal. It is likely that this blog post will receive a higher readership anyway, and moreover he released the code to reproduce it, so it’s not like we have much reason to doubt whether it is accurate or not. We are in the middle of the move towards the post-journal era, and for important questions like climate change, we don’t always have time to wait for the sluggish journal process. So I would urge Dr. Neal to spend his time on the next blog post rather than wasting his time submitting this to a journal, although it is clearly journal-level quality work.

9.douglasproctor | 2016-01-12 at 10:55 amI disagree, Andy. Anything correcting an incorrect paper needs publication in a mainstream, peer-reviewed publication. The rules of the science game require it for serious consideration as a rebuttal. It can’t be recognized by the IPCC etc if it is in just a blog.

10.gyges01 (@gyges01) | 2016-02-07 at 5:18 pmI agree with Andy. We’re looking at the death of scientific journals, why should the blog author try and inject any life into a corpse?

11.gyges01 (@gyges01) | 2016-02-07 at 5:21 pmThe ‘rules of the science game’ are changing such that ‘peer-review publication’ is required for anything. What you’re looking at _is_ peer review with the exception that the peers are open about their review comments. We’re currently looking at the wikipediation of science, this is just an example of it.

12.Richard Tol (@RichardTol) | 2016-01-12 at 12:04 pmRadford: Beware. You can find any amount of statistical nonsense about climate change published in the peer-reviewed literature. While it is tempting — and I have been tempted — to counter this, it is not for your humour and it is not good for your career.

13.Editor of the Fabius Maximus website | 2016-01-12 at 1:41 pmI agree with the others that this brilliant work deserves to be published, even more so given the importance of the associated public policy debate.

Also, have you communicated this to the authors?

14.Radford Neal | 2016-01-12 at 2:05 pmI emailed the authors to let them know that I’d posted this critique.

15.hidethedecline (@hidethedecline) | 2016-01-13 at 4:33 pmI think that’s all you need to do, RN.

Chemist In Langley has a great post on just this issue – the irrelevance of peer review.

https://achemistinlangley.wordpress.com/2016/01/13/on-blogging-and-the-irrelevance-of-academic-peer-review-in-multi-disciplinary-fields/

16.Lou | 2016-04-28 at 3:04 amI’m curious — what, if any, was their response?

17.Radford Neal | 2016-04-28 at 10:31 amNone.

18.Editor of the Fabius Maximus website | 2016-01-12 at 2:15 pmProfessor Neal,

Thanks for the reply. I hope this starts a useful dialog with the authors.

As for publication, if you consider formal pre-reviewed publication not worthwhile (for understandable reasons), you might consider posting a paper on arXiv or an open-review website.

The climate science debate needs an intervention from statisticians. Edward Wegman made a rocky entry to this debate, but I hope more follow his lead. My experience in drug testing has convinced me that your profession’s contribution would make a large contribution.

19.Catalin C | 2016-01-12 at 7:06 pmYou say among other “It is the GISS land-only data set, which is less widely used, since as GISS says, it “overestimates trends, since it disregards most of the dampening effects of the oceans”.”

You are probably missing (willfully or ignorantly) the fact that also GISS land+ocean disregards about 90% or more of the physics involved, here is one graph with all that data directly from IPCC AR5:

http://www.easterbrook.ca/steve/wp-content/IPCC-AR5-WG1-Box-3.1-Fig-1.png

Do you think this one shows any pause?

20.Radford Neal | 2016-01-12 at 7:31 pmThis is dominated by ocean heat content measurements. As I said in a reply to a comment on one of my previous posts, I haven’t investigated whether or not the ocean heat content measurements are actually as accurate as claimed (though off-hand the claim seems rather ambitious to me).

But in any case, it’s irrelevant to the correctness or otherwise of the Rajaratnam, et al. paper. This post doesn’t pretend to be a comprehensive assessment of all claims regarding pauses or lack of pauses.

21.Catalin C | 2016-01-13 at 2:55 amI can not understand how anybody even with just a very basic understanding of physics is trying to dismiss the importance of the total energy anomaly and trying to replace it with a talk about surface temperature anomalies – ask around your university in the physics department if there is a law of conservation of temperatures or a law of conservation of energy.

Regarding your observations about the Rajaratnam paper – the fact they cherry-pick 3% of the energy imbalance and you cherry-pick 10% of the energy imbalance does not make your analysis correct about the topic itself, it just makes it marginally less wrong.

22.Scientists explore causes of the pause in warming, perhaps the most important research of the decade | Fabius Maximus website | 2016-01-12 at 9:14 pm[…] See this brutal and comprehensive rebuttal by Radford Neal, Professor of Statistics at the University of Toronto. […]

23....and Then There's Physics | 2016-01-13 at 4:24 amIt really does seem as though the Rajaratnam paper has a number of pretty fundamental blunders. On the other hand, if you interested in papers about the “hiatus” this one may be worth looking at.

I’m also interested in why you seem to think that claims about the ocean heat content are ambitious. It’s not just in-situ measurements that are used to determine this, there is also sea level rise, and satellite measurements of the planetary energy imbalance.

The key point, in my view, is that anthropogenic global warming is really about energy, not just about surface temperatures. They’re an important indicator, partly because of the length of the data series, but also partly it’s where we live, but they’re not the only indicator. Focusing on them specifically is ignoring a great deal of other, extremely relevant, evidence.

24.Radford Neal | 2016-01-13 at 8:00 amThanks for the link.

Regarding ocean heat content, like I say, I don’t have a well-founded opinion. But I can see many issues, which to me seem difficult to handle. Obtaining trends in ocean heat content from temperature measurements requires measuring temperature trends to very high precision, with good spatial coverage. The floats used presumably have a power source, producing waste heat. Does the way this waste heat affect measurements change as the float gets older? Does stuff growing on the outside of the float affect measurements? The floats mostly just drift with currents, so actual spatial coverage may not be what it seems, since it’s drifting in the same body of water as it moves. Regarding sea-level rise, the ocean heat content inference will require first subtracting out other influences, such as change in continental ground water volumes, and will in any case depend on not just the ocean heat content but on how it is distributed. Theoretically, it’s possible for an increase in ocean volume to be the result of a decrease in heat content, not an increase, if the change is at a low temperature (though I’m not suggesting this is the actual situation).

25.Catalin C | 2016-01-13 at 8:44 amThe “issues” that you think you see are not something new for hundreds of experts in that specific field that were researching exactly that for many decades (some of them even before you and I were born), I really don’t have now the time to link you all the papers on the subject but here is a primer:

http://www.seabird.com/technical_references/LongtermTSstabilityAGUDec08Handout2Pages.pdf

26....and Then There's Physics | 2016-01-13 at 9:56 amPartly what Catalin said, but I’m not sufficiently expert to really delve into whether or not your concerns are valid or not. My point was a little subtler. There are many lines of evidence that all point in the same general direction. Focusing on only one that is actually only associated with a small fraction of the overall process, runs the risk of encountering the same problem as the blind men and an elephant.

27.Bart | 2016-01-13 at 12:41 pm“There are many lines of evidence that all point in the same general direction.”There are

alwaysmany lines of evidence that all point in the same general direction. But, the scientific method demands you look at the evidences which point in the opposite direction, as well. It is “the truth, thewholetruth, and nothing but the truth” standard.28.Bart | 2016-01-13 at 12:37 pm“Rajaratnam, et al. first try to account for such autocorrelation by fitting a regression line using an AR(1) model for the residuals.”This always drives me crazy. They always use an AR(1) model, arbitrarily, and without any apparent justification. Why? I guess because it looks sciencey. How about a series of AR(2) models to capture the oscillations? That’s what the ACF estimate appears to indicate. I expect because it would blow away any claims to significant AGW.

29.Igor | 2016-01-18 at 11:25 pmI have a question. Is it possible to account for known standard error of each year measurement?

Assuming normal distribution (I am not familiar with the domain and don’t know whether this assumption is valid!) one might simulate, say, 100 random values for each year (from 1950 to now), then run change-point analysis to see if there is a significant trend change at around 1972 and 1998.

Thank you

30.Radford Neal | 2016-01-19 at 9:39 amThe HadCRUT data attempts to provide uncertainty indications. I haven’t looked at the details, but I would guess that these are dominated by some allowance for systematic errors, rather than random measurement error in thermometer readings. I’d also guess that the year-to-year variation in the temperature anomaly is dominated by real changes in temperatures, not by measurement error, though systematic errors could affect the long-term trend (if the systematic errors themselves have a trend).

Though discussions of the `pause’ or `hiatus’ are often phrased in terms of the trend changing abruptly in 1998 (or whenever), I think it is unlikely that this is a good model of the actual situation, unless perhaps there was an abrupt change in systematic measurement errors. I suppose I could imagine some reason for an abrupt change in the trend in actual temperatures, but it seems more likely that the appearance of a change in trend would be the result of long-lag autocorrelations. See my post on whether there has been a pause for examples of what series with a constant trend but high autocorrelation can look like, and in particular examples “E”, “G” and “O” there, which have apparent drastic and abrupt trend changes that are really just a consequence of high autocorrelation.

31.Scottish Sceptic | 2016-01-28 at 1:36 pmI would be interested to see a statistical evaluation of the CET data.

Of course, at the very simplest level the problem of global temperature is deciding whether what we see is “abnormal” … but in order to do that, one must first determine was is “normal”.

And (as one of the first people to talk about the pause) – its importance has always been that it shows a failure to warm as predicted. As such it shows that natural variation of at least the same scale of the warming must exist in the climate. But if massive natural variation exists, that undermines the argument that the 1970-2000 warming MUST be explained by an arbitrary scaling up of the CO2 effect by presumed “feedbacks”.

So, the pause removes the claim that the 1970-2000 warming must be due to CO2 because much of it can be due to natural variation. So now natural variation is at least as large as the initial prediction warming over the “pause” – but that it turn means the predicted warming is smaller (as much is now expected to be from natural variation) – so the pause means natural variation is greater than manmade contributions.

Proof: recent temperature trends are not abnormal

http://scottishsceptic.co.uk/2015/03/06/proof-recent-temperature-trends-are-not-abnormal/