## Posts filed under ‘Computing’

### Faster garbage collection in pqR

The latest version of pqR and the version before as well use a new garbage collector, and new memory layouts for R objects, which both reduce memory usage and considerably speed up garbage collection. |

### The new pqR parser, and R’s “else” problem

The latest version of pqR (mostly) solves R’s “else” problem, by modifying the new parser previously introduced in pqR. I’ll explain the “else” problem and solution here, and also present other advantages of pqR’s parser over the R Core parser, including a big speed advantage in one context. |

### New version of pqR, with major speed improvements

I’ve released pqR-2018-11-18, a new version of my variant implementation of R. You can install it on Linux, Windows, or Mac as described at pqR-project.org. Installation must currently be from source, similarly to source installs of R Core versions of R. |

This version has some major speed improvements, as well as some new features. I’ll details some of these improvements in future posts. Here, I’ll just mention a few things to show the flavour of the improvements in this release, and why you might be interested in pqR as an alternative to the R Core implementation.

### Performance improvements

One landmark reached in this release is that it is no longer advisable to use the byte-code compiler in pqR. The speed of direct interpretation of R code has now been improved to the point where it is about as fast at executing simple scalar code as the byte-code interpreter. Eliminating the byte-code compiler simplifies the overall implementation, and avoids possible semantic differences between interpreted and byte-compiled code. It is also important for pqR because some pqR optimizations and some new pqR features are not implemented in byte-code. For example, only the interpreter does optimizations such as deferring vector operations so that they may automatically be merged with other operations or be done in parallel when multiple cores are available.

Some vector operations have been substantially sped up compared to the previous release of pqR-2017-06-09. The improvement compared to R-3.5.1 can be even greater. Here is an example of replacing a subset of vector elements, benchmarked on an Intel “Skylake” processor, with both pqR-2018-11-18 and R-3.5.1 compiled from source with gcc 8.2.0 at optimization level -O3:

Here’s R-3.5.1:

> a <- numeric(20000) > system.time(for (i in 1:100000) a[2:19999] <- 3.1) user system elapsed 4.211 0.148 4.360

And here’s pqR-2018-11-18:

> a <- numeric(20000) > system.time(for (i in 1:100000) a[2:19999] <- 3.1) user system elapsed 0.256 0.000 0.257

So the current R Core implementation is 17 times slower than pqR for this replacement operation.

The advantage of pqR isn’t always this large, but many vector operations are sped up by smaller but still significant factors. An example:

With R-3.5.1:

> a <- seq(0,1,length=2000); b <- seq(1,0,length=2000) > system.time (for (i in 1:100000) { + d <- abs(a-b); r <- sum (d>0.4 & d<0.7) }) user system elapsed 1.215 0.015 1.231

With pqR-2018-11-18:

> a <- seq(0,1,length=2000); b <- seq(1,0,length=2000) > system.time (for (i in 1:100000) { + d <- abs(a-b); r <- sum (d>0.4 & d<0.7) }) user system elapsed 0.654 0.008 0.662

So for this example, pqR is almost twice as fast.

For some operations, pqR’s implementation has lower asymptotic time complexity, and so can be enormously faster. An example is the following convenient coding pattern that R programmers are currently warned to avoid:

With R-3.5.1:

> n <- 200000; a <- numeric(0); > system.time (for (i in 1:n) a <- c(a,(i+1)^2)) user system elapsed 30.387 0.223 30.612

With pqR-2018-11-18:

> n <- 200000; a <- numeric(0); > system.time (for (i in 1:n) a <- c(a,(i+1)^2)) user system elapsed 0.040 0.004 0.045

In R-3.5.1, extending a vector one element at a time with “c” takes time growing as n^{2}, as a new vector is allocated when each element is appended. With the latest version of pqR, the time grows only as n log n. In this example, that leads to pqR being 680 times faster, but the ratio could be made arbitrarily large by increasing n.

It’s still faster in pqR to preallocate a vector of length n, but only by about a factor of three, which would often be tolerable when writing one-off code if using “c” is more convenient.

### New features

The latest version of pqR has some new features. As for earlier pqR versions, some new features are aimed at addressing design flaws in R that lead to unreliable code, and others are aimed at making R more convenient for programming and scripting.

One new convenience feature is that the paste and paste0 operations can now be written with new `!!` and `!` operators. For example,

> city <- "Toronto"; province <- "Ontario" > city ! "," !! province [1] "Toronto, Ontario"

The `!!` operator pastes strings together with space separation; the `!` operator pastes with no separation. Of course, `!` retains its meaning of “not” when used as a unary operator; there is no ambiguity.

### What next?

I’ll be writing some more blog posts regarding improvements in pqR-2018-11-18, and regarding some improvements in earlier pqR versions that I haven’t already blogged about. Of course, you can read about these now in the pqR NEWS file.

The main disadvantage of pqR is that it is not fully compatible with the current R Core version. It is a fork of R-2.15.0, with many, but not all, later changes incorporated. This affects what packages will work with pqR.

Addressing this compatibility issue is one thing that needs to be done going forward. I’ll discuss this and other plans — notably implementing automatic differentiation — in another future blog post.

I’m open to other people getting involved in this project. Of course, you can contribute now by trying out pqR and reporting any problems in the comments here or at the pqR issues page. Performance comparisons, especially on real-world applications, are also welcome.

Finally, for the paranoid, here are the shasum values for the compressed and uncompressed tar files that you can download from pqR-project.org:

89216dc76be23b3928c26561acc155b6e5ad32f3 pqR-2018-11-18.tar.gz f0ee8a37198b7e078fa1aec7dd5cda762f1a7799 pqR-2018-11-18.tar

### Fixing R’s design flaws in a new version of pqR

In particular, the extensions fix the problems that `1:n` doesn’t work as intended when `n` is zero, and that `M[1:n,]` is a vector rather than a matrix when `n` is one, or when `M` has only one column. Since changing the “`:`” operator would cause too many problems with existing programs, pqR introduces a new “`..`” operator for generating increasing sequences. Unwanted dimension dropping is also addressed in ways that have minimal effects on existing code.

The new release, pqR-2016-06-24, is available at pqR-project-org. The NEWS file for this release also documents some other language extensions, as well as fixes for various bugs (some of which are also in R-3.3.1).

(more…)

### Exact computation of sums and means

A while ago, I came across a mention of the Python math.fsum function, which sums a set of floating-point values *exactly*, then rounds to the closest floating point value. This seemed useful. In particular, I thought that if it’s fast enough it could be used instead of R’s rather primitive two-pass approach to trying to compute the sample mean more accurately (but still not exactly). My initial thought was to just implement the algorithm Python uses in pqR. But I soon discovered that there were newer (and faster) algorithms. And then I thought that I might be able to do even better…

The result is a new paper of mine on Fast exact summation using small and large superaccumulators (also available from arxiv.org).

### How large vectors in R might be stored compactly

Vectors in R can currently have elements of two sizes — 8-byte double-precision floating-point elements for `numeric’ vectors, or 4-byte elements for `integer’ or `logical’ vectors. You can also have vectors whose elements are 1-byte `raw’ values, but these raw vectors don’t support negative numbers, or NA values, so they aren’t suitable for general use.

It seems that lots of actual data vectors could be stored more compactly than at present. Many integer vectors consist solely of elements that would fit in one or two bytes. Logical vectors could be stored using two bits per element (allowing TRUE, FALSE, and NA), which would use only one-sixteenth as much memory as at present. It’s likely that many operations would also be faster on such compact vectors, so there’s not even necessarily a time-space tradeoff.

For integer and logical types, the possible compact representations, and how to work with them, are fairly obvious. The challenge is how to start using such compact representations while retaining compatibility with existing R code, including functions written in C, Fortran, or whatever. Of course, one could use the S3 or S4 class facilities to define new classes for data stored compactly, with suitable redefinitions of standard operators such as `+’, but this would have substantial overhead, and would in any case not completely duplicate the behaviour of non-compact numeric, integer, or logical vectors. Below, I discuss how to implement compact representations in a way that is completely invisible to R programs. I hope to try this out in my pqR implementation of R sometime, though other improvements to pqR have higher priority at the moment.

How to compactly represent floating-point data (of R’s `numeric’ type) is not so obvious. If the use of a compact representation is to have no effect on the results, one cannot just use single-precision floating point. I describe a different approach in a new paper on Representing numeric data in 32 bits while preserving 64-bit precision (also on arxiv). I’ll present the idea of this paper next, before returning to the question of how one might put compact representations of any sort into an R interpreter, invisibly to R programs. (more…)

### Subset replacement in pqR: Now faster and better

The latest version of pqR that I just released uses a new way of implementing subset replacement operations — such as a[i]<-1 or L$M[1:100,i]<-v. The new approach is much faster, and eliminates some strange behaviour of the previous approach. |

This change affects only interpreted code. The bytecode compiler (available since R-2.13.0) introduced a different mechanism, which is also faster than the previous approach used by the interpreter (though it still has some of the strange behaviour). This faster mechanism was one of the main reasons for byte-compiled code to be faster than interpreted code (although it would have been possible to use the new mechanism in the interpreter as well). With pqR’s new implementation of subset replacement, this advantage of byte-compiled over interpreted code is much reduced.

In addition to being faster, pqR’s new approach is also more coherent than the previous approach (still current in the interpreter for R Core releases to at least R-3.1.1), which despite its gross inefficiency and confused semantics has remained essentially unchanged for 18 years. Unfortunately, the new approach in pqR is not as coherent as it might be, because past confusion has resulted in some packages doing “wrong” things, which have to be accommodated, as least in the short term.