Comparing the speed of pqR with R-2.15.0 and R-3.0.1
As part of developing pqR, I wrote a suite of speed tests for R. Some of these tests were used to show how pqR speeds up simple real programs in my post announcing pqR, and to show the speed-up obtained with helper threads in pqR on systems with multiple processor cores.
However, most tests in the suite are designed to measure the speed of more specific operations. These tests provide insight into how much various modifications in pqR have improved speed, compared to R-2.15.0 on which it was based, or to the current R Core release, R-3.0.1. These tests may also be useful in judging how much you would expect your favourite R program to be sped up using pqR, based on what sort of operations the program does.
Below, I’ll present the results of these tests, briefly discuss what some of the tests are doing, and explain some of the run time differences. I’ll also look at the effect of “byte-code” compilation, in both pqR and the R Core versions of R.
There are a lot of tests, so you’ll want to look at each result plot in a separate window. Here are links for all the comparison plots I’ll discuss below:
In these plots, each row in one of the three panels shows, for the named test, the ratio of the run time using one version of R to the run time using another version of R. Each version is run either with or without byte-code compilation; when compilation is used, it applies both to the test functions and to the packages used. Note that most named tests actually consist of several sub-tests, so there are varying numbers of results in each named section.
The first comparison, of pqR against R-2.15.0, with neither using byte-code compilation, shows most clearly the improvements I’ve made in creating pqR from R-2.15.0 (given that I haven’t changed the byte-code compiler in any significant way). The first group of tests (top left, before “alloc”) are based on small real programs. They mostly show how interpretive and other general overhead has been reduced in pqR, speeding up such programs by roughly a factor of two. The “hlp-” tests at the end are discussed in my post on helper threads in pqR. In between are a large number of tests targeting specific operations, for many of which I have made specific modifications to R-2.15.0, although these tests are also sped up by modifications reducing general overhead.
The “any-all.relop-is” test (left panel) is of operations such as x <- any(a>0.2), where a is a vector of length 1000. The large (over 10 times) improvement on this test results from pqR not actually storing the result of a>0.2, but just computing the “or” of the comparisons as they are done, stopping once a TRUE result is obtained. This is faster even when the whole vector has to be looked at (e.g., when all comparisons are FALSE), and very much faster when a TRUE comparison is encountered early. This is implemented with pqR’s “variant result” internal mechanism, which is also crucial to the implementation of helper threads, and which I will write about in a future post.
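The difference between the two strategies can be sketched at the R level. This is only an illustration of the logic: pqR does the fused computation in C inside the interpreter, whereas an explicit R loop like the one below will usually be slower than the vectorised form. The function names are hypothetical, and the sketch assumes the vector contains no NA values.

```r
# Baseline strategy: materialise the whole logical vector, then reduce it.
any_gt_full <- function(a, threshold) {
  cmp <- a > threshold   # allocates a logical vector as long as 'a'
  any(cmp)
}

# Fused strategy: compare one element at a time and stop at the first TRUE.
# (Hypothetical helper, for illustration only; assumes no NA values in 'a'.)
any_gt_fused <- function(a, threshold) {
  for (v in a) {
    if (v > threshold) return(TRUE)   # early exit, nothing is stored
  }
  FALSE
}

set.seed(1)
a <- runif(1000)
x1 <- any_gt_full(a, 0.2)
x2 <- any_gt_fused(a, 0.2)
```

Both functions return the same answer; the fused one never allocates the intermediate logical vector and can stop after looking at a single element.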
I will also post in future on the reason for the large speed improvements in the “matprod” tests (middle panel) of matrix multiplication. The large speed-ups for “pow.vec-to-scalar” (also middle panel) come from handling powers of 2, 1, 0, -1, and 0.5 specially. The large gains in “vec-subset.vec-by-neg” (right panel) come from more efficient handling of deletion of elements by negative subscripting.
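For readers less familiar with these operations, here is a small sketch of what the special-cased powers and negative subscripting compute. The variable names are made up for illustration, and the cheaper equivalents shown for each power are my assumption about what "handling specially" could mean, not a description of pqR's C code.

```r
x <- c(0.5, 1, 2, 4, 9)

# Each of these exponents has a cheaper equivalent than a general power routine.
sq    <- x * x               # same values as x^2
same  <- x                   # same values as x^1
ones  <- rep(1, length(x))   # same values as x^0
recip <- 1 / x               # same values as x^-1
root  <- sqrt(x)             # same values as x^0.5

# Negative subscripts delete elements rather than select them.
v <- c(10, 20, 30, 40, 50)
drop_one <- v[-2]            # 10 30 40 50
drop_two <- v[-c(1, 5)]      # 20 30 40
```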
Turning to the tests showing more modest speed gains (by factors of two to five), the “dollar” tests (left panel) are of significance because access to list elements with expressions such as L$e is very common in some R coding styles. This speed-up comes from detailed improvements to the C code implementing “$”. The speed-up in the random number generation functions (the “rand” tests, middle panel) also comes from detailed improvements in the C code, and from not copying the 2500 bytes of the random seed (for the default random generator) on every call of runif. Many other operations have also been sped up by more than a factor of two, but I won’t discuss them all here.
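The two operations mentioned here are easy to inspect from R itself. The sketch below shows a “$” access and the size of the seed for the default Mersenne-Twister generator, which is where the figure of roughly 2500 bytes comes from. The list L and its contents are made up for illustration.

```r
# Access to a list element by name, as exercised by the "dollar" tests.
L <- list(a = 1, e = "value", z = 3)
elem <- L$e                          # same element as L[["e"]]

# The default generator keeps its state in the integer vector .Random.seed.
RNGkind("Mersenne-Twister")
set.seed(42)
seed <- get(".Random.seed", envir = globalenv())
seed_len   <- length(seed)           # 626 integers for Mersenne-Twister
seed_bytes <- seed_len * 4           # 4 bytes per R integer: 2504 bytes

u <- runif(3)                        # each call reads and updates the seed
```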
It’s also of obvious practical interest to see how pqR compares with R-3.0.1, the current R Core release. Not too much has changed from R-2.15.0. There’s been an improvement in rep, for which there is not yet a corresponding improvement in pqR, but random number generation is even slower in R-3.0.1 (perhaps because it is done with .External rather than .Internal calls).
However, in R-3.0.1, byte-compiling functions almost always speeds them up, so let’s look at how R-3.0.1 compares to pqR, both with compilation (of the test functions and all packages). For the simple programs in the top of the left panel, the speed gain from using pqR is reduced, to roughly a factor of 1.4. But many of the speed-ups for particular operations are just as large as when no compilation is done. Not all, however — speed improvements in pqR that are implemented using “variant results” aren’t effective in byte-compiled code, since the byte-code interpreter knows nothing about this mechanism. So byte compiling functions doesn’t always speed them up in pqR. This can be seen in the comparison of pqR without compilation and with compilation. Use of helper threads is also ineffective in compiled code.
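A comparison of the same function with and without byte-code compilation can be set up along the following lines, using cmpfun from the standard compiler package. This is a minimal sketch, not the test suite used for the plots: the function sum_squares is a made-up example, and the call to enableJIT(0) is there only because later R releases compile functions automatically, which would otherwise blur the comparison.

```r
library(compiler)

old_jit <- enableJIT(0)   # turn off automatic compilation; returns the old level

# Hypothetical test function: a scalar loop dominated by interpretive overhead.
sum_squares <- function(n) {
  s <- 0
  for (i in seq_len(n)) {
    x <- as.numeric(i)    # avoid integer overflow in x * x
    s <- s + x * x
  }
  s
}

sum_squares_bc <- cmpfun(sum_squares)   # byte-compiled copy of the same function

n <- 200000
t_interp <- system.time(r1 <- sum_squares(n))[["elapsed"]]
t_bc     <- system.time(r2 <- sum_squares_bc(n))[["elapsed"]]

enableJIT(old_jit)        # restore the previous JIT setting

cat("interpreted:", t_interp, "s  byte-compiled:", t_bc, "s\n")
```

The timings will vary by machine and R version, so a single run like this only gives a rough indication; the results should be identical either way.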
This makes it hard to decide whether to compile functions for use with pqR or not. One long-term solution would be to change the byte-code compiler so it can use variant results. Another would be to further improve the interpreter in pqR until there is no longer any substantial advantage to compilation. For example, I expect that the large advantage of compilation in the “assign” tests (left panel) can be eliminated by simply changing the interpreter to use the same technique for complex assignments as is presently used in compiled code (or perhaps to use a better technique).
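For context, a “complex assignment” is one whose left-hand side is more than a plain variable name. The sketch below shows such an assignment and a rough step-by-step equivalent, following the semantics described in the R language documentation. It is meant only to show the work involved, and makes no claim about the technique used by either the interpreter or the byte-code compiler.

```r
x <- c(a = 1, b = 2, c = 3)
y <- x

# A complex assignment as it is usually written.
names(x)[2] <- "B"

# Roughly the steps it stands for: extract, modify, store back.
tmp <- names(y)                    # fetch the names
tmp[2] <- "B"                      # modify one element of them
y <- `names<-`(y, value = tmp)     # store the modified names back
```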