Archive for 2013-07-02

Fixing R’s NAMED problems in pqR

In R, objects of most types are supposed to be treated as “values” that do not change when other objects change. For instance, after doing the following:

  a <- c(1,2,3)
  b <- a
  a[2] <- 0

b[2] is supposed to have the value 2, not 0. Similarly, a vector passed as an argument to a function is not normally changed by the function. For example, with b as above, calling f(b) will not change b, even if f is defined as f <- function (x) x[2] <- 0.
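The two claims above can be checked directly. This is just a small sketch that reuses the names a, b, and f from the text; the stopifnot calls are added here for illustration.

```r
a <- c(1, 2, 3)
b <- a
a[2] <- 0                      # changes a only; b keeps its old value

f <- function (x) x[2] <- 0    # modifies only the local x inside f
f(b)                           # b in the caller is left alone

stopifnot(identical(a, c(1, 0, 3)))
stopifnot(identical(b, c(1, 2, 3)))
```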

This semantics would be easy to implement by simply copying an object whenever it is assigned or evaluated as the argument to a function. Unfortunately, this would be unacceptably slow. Think, for example, of passing a 10000 by 10000 matrix as an argument to a little function that just accesses a few elements of the matrix and returns a value computed from them. The copying would take far longer than the computation within the function, and the extra 800 megabytes of memory required (for a matrix of 8-byte numeric values) might also be a problem.
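The 800 megabyte figure follows from simple arithmetic, assuming the matrix holds ordinary double-precision numbers at 8 bytes each. The sketch below does the calculation and then checks the same per-element cost on a much smaller matrix, so it can be run without allocating the full-size one. The variable names are only illustrative.

```r
n <- 10000
bytes_per_double <- 8                       # assumption: a numeric (double) matrix
total_bytes <- n * n * bytes_per_double
total_megabytes <- total_bytes / 1e6        # 800

# Same check on a 100 by 100 matrix: at least 100 * 100 * 8 bytes of data,
# plus a small amount of per-object overhead.
small <- matrix(0, 100, 100)
small_bytes <- as.numeric(object.size(small))
```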

So R doesn’t copy all the time. Instead, it maintains a count, called NAMED, of how many “names” refer to an object, and copies only when an object that needs to be modified is also referred to by another name. Unfortunately, this scheme works rather poorly. Many unnecessary copies are still made, while many bugs have arisen in which copies aren’t made when necessary. I’ll talk about this more below, and discuss how pqR has made a start at solving these problems.
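One way to watch a copy being made when it is needed is R's tracemem function, which reports when a traced object is duplicated. This is only a sketch: tracemem is available only in builds of R with memory profiling enabled (hence the capabilities check), and exactly when copies happen depends on the R version and on the internal bookkeeping it uses.

```r
x <- c(1, 2, 3)
y <- x                 # no copy is needed yet; x and y can share storage

if (capabilities("profmem")) {
  tracemem(x)          # ask R to report any duplication of this object
}

y[2] <- 0              # y must now get its own copy, since x still refers
                       # to the original values

if (capabilities("profmem")) {
  untracemem(x)
}

stopifnot(identical(x, c(1, 2, 3)))
stopifnot(identical(y, c(1, 0, 3)))
```

Where tracing is available, a message beginning with tracemem should appear at the assignment to y[2], which is the point at which the shared vector has to be duplicated.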

2013-07-02 at 9:44 pm 3 comments

