## Two textbooks on probability using R

*2011-06-18 at 11:34 pm* | *23 comments*

This fall, I’ll be teaching a second-year course on Probability with Computer Applications, which is required for Computer Science majors. I’ve taught this before, but that was five years ago, so I’ve been looking to see what new textbooks would be suitable. The course aims not just to use computer science applications as examples, but also to reinforce concepts of probability with programs, and to show how simulation can be used to solve problems that aren’t easily solved analytically. I’ve used R for the programming part, and plan to again, so I was naturally interested in two recent textbooks that seemed to have similar aims:

*Introduction to Probability with R*, Kenneth Baclawski, Chapman & Hall / CRC.

*Probability with R: An Introduction with Computer Science Applications*, Jane M. Horgan, Wiley.

I’ve now had a look at both of these textbooks. Unfortunately, they are both seriously flawed. Even more unfortunately, although some of the flaws in these books are particularly striking, I’ve seen similar, if usually less serious, problems in many other textbooks.

*Introduction to Probability with R* seemed from the title and blurb to be quite promising. I began looking at it by just flipping to a few random pages to see what it read like. That wasn’t supposed to be a serious evaluation yet, but in only a couple minutes, I’d found two passages that pretty much eliminated it from further consideration. Here is the introduction to parametric families of distributions on pages 56 and 57:

… The distributions within a family are distinguished from one another by “parameters”… Because of the dependence on parameters, a family of distributions is also called a *random* or *stochastic function*… Be careful not to think of a random function as a “randomly chosen function” any more than a random variable is a “randomly chosen variable.”

Now, I’ve *never* before seen a parametric family of distributions called a “random function” or “stochastic function”. And I’ve quite frequently seen “random function” used to mean exactly a “randomly chosen function”. Where the author got his terminology, I’ve no idea. For good measure, this same passage refers to the *p* parameter of a binomial distribution as the “bias”, and has a totally pointless illustration of a bunch of gears grinding a value for *n* and a “bias” into a distribution.

So, maybe he uses non-standard notation, but could the content be good? Here’s what’s on page 131:

**Main Rule of Statistics**. In any statistical measurement we may assume that the individual measurements are distributed according to the normal distribution, *N*(*m*,σ^{2}).

To use this rule, we first find the mean *m* and variance σ^{2} from information given in our problem or by using the sample mean… and/or sample variance… defined below. We then compute using either the pnorm or the qnorm function.

As stated, the main rule says only that our results will be “reasonable” if we assume that the measurements are normally distributed. We can actually assert more. In the absence of better information, we *must* assume that a measurement is normally distributed. In other words, if several models are possible, we must use the normal model unless there is a significant reason for rejecting it.

(The author continues his avoidance of standard terminology in the passage above by denoting the sample mean of *x*_{1}, …, *x*_{n} by *m* with a bar over it and the sample variance by σ^{2} with a bar over it, which I haven’t tried to reproduce.)

Sometimes, an occasional incorrect passage in a textbook does no harm, if corrected, and can even make for an interesting example in lecture, but when a textbook emphatically states completely erroneous nonsense it would be a disservice to students to force them to buy it. The passage above reads like a parody of a statistics textbook — too many of which say how important it is to think carefully about what model is appropriate for your problem, but then proceed to use a normal distribution in all the examples without any discussion. Still, lip-service to good practice is better than nothing, and much better than insistent advocacy of bad practice.
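A quick simulation makes the danger concrete. The sketch below is in Python rather than the R used in the course, and the exponential "measurements", the threshold of 4, and the function name are illustrative assumptions, not taken from the book. It follows the quoted recipe (estimate the mean and standard deviation, then compute with a normal distribution) and compares the resulting tail probability with the empirical and exact ones.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def tail_probabilities(threshold=4.0, n=50_000, seed=1):
    """Compare P(X > threshold) under a fitted normal model with the truth."""
    rng = random.Random(seed)
    # Hypothetical measurements: exponential with rate 1 (mean 1, sd 1).
    xs = [rng.expovariate(1.0) for _ in range(n)]
    # The quoted recipe: plug the sample mean and sd into a normal model.
    fitted = NormalDist(mean(xs), stdev(xs))
    normal_tail = 1.0 - fitted.cdf(threshold)
    # What the data actually show, and the exact exponential tail exp(-threshold).
    empirical_tail = sum(x > threshold for x in xs) / n
    exact_tail = math.exp(-threshold)
    return normal_tail, empirical_tail, exact_tail

normal_tail, empirical_tail, exact_tail = tail_probabilities()
print(f"normal model:   {normal_tail:.5f}")
print(f"empirical:      {empirical_tail:.5f}")
print(f"exact exp tail: {exact_tail:.5f}")
```

Under these assumptions the normal model understates the tail probability by roughly an order of magnitude, which is the sort of wrong conclusion a reader who takes the rule seriously would reach.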

*Probability with R: An Introduction with Computer Science Applications* seemed from its title and blurb to be even closer to what I need for my course. For this book, two minutes of flipping through it did not provide sufficient grounds for rejection, so I started looking at it more closely, including reading it systematically from the beginning. I found lots of careless errors (some corrected in the on-line errata), and lots to quibble about, along with nothing that was particularly impressive, but it wasn’t until page 83 that I encountered something seriously wrong:

Returning to the birthday problem …, instead of using permutations and counting, we could view it as a series of *k* events and apply the multiplication law of probability.

*B*_{i} is the birthday of the *i*th student.

*E* is the event that all students have different birthdays.

For example with two students, the probability of different birthdays is that the second student has a birthday different from that of the first,

that is, the second student can have a birthday on any of the days of the year except the birthday of the first student.

With three students, the probability that the third is different from the previous two is

that is, the third student can have a birthday on any of the days of the year, except the two of the previous two students.

The example continues like this, with equations that on the right side have numerical values that are correct (in terms of the subsequent explanation in words), while on the left side are probability statements that are complete nonsense, since of course *B*_{i}, “the birthday of the *i*th student”, is *not* an event. Nor can I see any alternative definition of *B*_{i} that would lead to these probability statements making sense.
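For comparison, one way to make the multiplication-law argument well-formed is to work with genuine events. Let *A*_{i} be the event that the *i*th student's birthday differs from those of students 1 to *i*−1 (this notation is mine, not the book's), so that the probability of all *k* birthdays being different is the product over *i* = 2, …, *k* of (365 − (*i* − 1))/365. A minimal Python sketch of that calculation, assuming 365 equally likely birthdays, with a simulation as a sanity check:

```python
import random

DAYS = 365  # assumption: 365 equally likely birthdays, no leap years

def prob_all_different(k, days=DAYS):
    """P(all k birthdays differ), built up by the multiplication law."""
    p = 1.0
    for i in range(2, k + 1):
        # P(A_i | A_2, ..., A_{i-1}): i-1 distinct days are already taken.
        p *= (days - (i - 1)) / days
    return p

def simulate_all_different(k, trials=50_000, days=DAYS, seed=1):
    """Estimate the same probability by drawing k birthdays at random."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(k)]
        hits += len(set(birthdays)) == k
    return hits / trials

print(prob_all_different(2))   # equals 364/365
print(prob_all_different(3))   # equals (364/365) * (363/365)
print(prob_all_different(23), simulate_all_different(23))
```

The two-student and three-student values match the verbal explanations in the quoted passage; only the events on the left-hand side needed to be defined properly.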

Maybe “anyone” could make a mistake like this, but maybe not; I do wonder whether the author actually understands elementary probability theory. I lost all confidence in her ability to apply it in practice on reading the following on page 192, in the section on “Machine learning and the binomial distribution”:

Suppose there are three classifiers used to classify a new example. The probability that any of these classifiers correctly classifies the new case is 0.7, and therefore 0.3 of making an error. If a majority decision is made, what is the probability that the new case will be correctly classified?

Let *X* be the number of correct classifications made by the three classifiers. For a majority vote we need *X*≥2. Because the classifiers are independent, *X* follows a binomial distribution with parameters *n*=3 and *p*=0.7…

We have improved the probability of a correct classification from 0.7 with one classifier to 0.784 with three…

Obviously, by increasing the number of classifiers, we can improve classification accuracy further…

With 21 classifiers let us calculate the probability that a majority decision will be in error for various values of *p*, the probability that any one classifier will be in error…

Thus the key to successful ensemble methods is to construct individual classifiers with error rates below 0.5.

No, I haven’t omitted any phrase like “assuming the classifiers make independent errors”. The author just says “because the classifiers are independent” as if this was totally obvious. Of course, in any real ensemble of classifiers, the mistakes they make are *not* independent, and it is *not* enough to just produce lots of classifiers with error rates below 0.5.

Many, many textbooks give examples in which they say things like “assume that whether one patient dies after surgery is independent of whether another patient dies” without considering the many reasons why this might not be so. But at least they do say they are making an assumption, and at least it is possible to imagine situations in which the assumption is approximately correct. There is no realistic machine learning situation in which multiple classifiers in an ensemble will make errors independently, even approximately. The example in the book is totally misleading.
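To see how much hangs on the independence assumption, here is a small Python sketch. The first function reproduces the binomial calculation from the quoted passage. The second simulates a toy dependence model that is purely my own illustrative assumption: with probability `shared`, all classifiers give the same answer (as if the case were uniformly easy or hard for them), and otherwise they err independently. Each classifier is still correct with probability *p* on its own, so only the dependence changes.

```python
import random
from math import comb

def majority_correct_independent(n, p):
    """P(more than half of n independent classifiers are correct)."""
    need = n // 2 + 1
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(need, n + 1))

def majority_correct_correlated(n, p, shared, trials=20_000, seed=1):
    """Simulated majority-vote accuracy under the toy dependence model.

    With probability `shared` all n classifiers agree (all correct with
    probability p, else all wrong); otherwise they are independent.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        if rng.random() < shared:
            correct = n if rng.random() < p else 0
        else:
            correct = sum(rng.random() < p for _ in range(n))
        wins += correct > n / 2
    return wins / trials

print(majority_correct_independent(3, 0.7))   # the book's three-classifier case
print(majority_correct_independent(21, 0.7))
for shared in (0.0, 0.5, 0.9):
    print(shared, majority_correct_correlated(21, 0.7, shared))
```

In this toy model the majority-vote accuracy is `shared * p + (1 - shared) * majority_correct_independent(n, p)`, so as the errors become more strongly shared the ensemble's advantage over a single classifier shrinks towards nothing, however many classifiers are added.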

As well as being seriously flawed, neither of these books makes particularly good use of R to explain probability concepts or to demonstrate the use of simulation. The fragments of R code they contain are very short, basically using it as a calculator and plot program.

So, that’s it for these books. If any readers know of good books on probability that either have good computer science applications, or use R for simulations and to clarify probability concepts, or preferably both, please let me know!

Entry filed under: R Programming, Statistics, Statistics - Computing, Statistics - Nontechnical.

1. Chunyi | 2011-06-19 at 12:00 am

Have you ever considered writing one yourself?

2. Radford Neal | 2011-06-19 at 12:03 am

I’ve considered writing various textbooks (before concluding I don’t have time), but not so far one involving probability and R. I expect that if I ever do anything like that, it would have a bit wider scope.

3. robjhyndman | 2011-06-19 at 2:03 am

Have a look at Jones, Maillardet and Robinson:

Owen Jones, Robert Maillardet, Andrew Robinson, *Introduction to Scientific Programming and Simulation Using R*, 1st edition.

4. MasterG | 2011-06-19 at 6:34 pm

A more complete description can be found here:

http://www.crcpress.com/product/isbn/9781420068726

5. Anthony | 2011-06-19 at 5:20 am

Hi,

Would you have time to make a comment under the review section on Amazon about these texts, as I think newbies wouldn’t realise the problems with them and the current reviews for “Introduction to Probability with R” have 5 and 4 stars!

“Probability with R: An Introduction with Computer Science Applications” has no reviews.

Shows how important it is to have technically knowledgeable editors when publishing technical books :) I guess there should be some kind of peer review mechanism to weed out poorly written textbooks!

Thanks for the reviews,

Anthony Meagher

6. michaeleriksson | 2011-06-19 at 9:46 am

My advice would be to simply pick a good textbook on probability without giving consideration to R or computer applications. Complement this with a reference for R and some good examples of your own.

(Generally, I am suspicious towards classes and books that want to teach more than one thing. They usually botch at least one of the topics—sometimes both. Further, in the long run, knowing the theory is more important than knowing a specific tool.)

7. Adel | 2011-06-19 at 9:54 am

I have taken a class with you Dr. Neal and I found your introduction to R a perfect way to learn a new language environment along with new statistical concepts. I’m sure if you just create “lecture notes” and use them as a reference for students, it could be a kind of textbook (like what Dr. Rosenthal has done over the years with his eventual Probability text I suppose). Looking at the course outline, there isn’t much you couldn’t do yourself with respect to reference material and/or online sources (like some of the CS courses!).

8. Randall Thomas | 2011-06-19 at 11:20 am

Thanks for providing the overview here. I’ve been sorting through a huge mound of material and have found problems with quite a few of the published books out there. It seems that good introductory texts for probability theory are few and far between – frequently, something that implies “introductory” assumes that you’re a third year graduate student with years of practice behind you.

Mostly, I’ve been forced to cull the programming basics from multiple sources and published papers. Amazingly enough, the statistics portal on Wikipedia has collected some excellent material, as has the wikibook for probability (http://en.wikibooks.org/wiki/Probability). Perhaps you can use the wikibook as a basis for the course and commit any new notes or corrections back to the wiki. Any increase in the quality of the material currently available would be a great help to persons like myself who are trying to identify resources to get a better understanding of probability.

9. dsh | 2011-06-19 at 3:19 pm

The link below is to a free (quite) introductory probability textbook. It certainly doesn’t meet all your requirements, but I would be curious if you thought it might be useful. It puts a reasonably high level of emphasis on simulating probabilistic scenarios, but the programs that come with the book are in Maple or Mathematica. The text itself discusses algorithms at a higher level of abstraction than the actual code, and it should be easy for undergrad computer science students to just write their own code to do simple simulations in R without looking at the Maple code. It does not really have any direct computer science applications; the simulations are only used to illustrate basic probability concepts.

So typical exercises might say something like “estimate, by simulation, the probability of…”

Although the book is not exactly what you are looking for, at least it is free online and thus could potentially be used to supplement another textbook.

http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/book.html

http://www.math.dartmouth.edu/~prob/prob/prob.pdf

10. David W. Hogg | 2011-06-20 at 6:50 am

Isn’t part of the issue that you are considering only books that include in their title the word “R”? Why handicap your choice of books by making them talk about R? Isn’t statistics the point, and R just one of many places they could do the work? If your class is primarily about R that’s one thing, but then you can’t complain that the statistics isn’t up to snuff! Another way of putting it: R is a tool, not the point, right? And if R *is* the point, make the text an R user manual and teach the statistics on the side.

11. Carlos Ortega | 2011-06-20 at 7:17 am

Hi,

Have you checked IPSUR (Introduction to Probability and Statistics using R), which is freely available and included in CRAN?

http://cran.at.r-project.org/web/packages/IPSUR/index.html

http://ipsur.r-forge.r-project.org/book/

12. Radford Neal | 2011-06-20 at 8:22 am

Thanks to those above who have linked to other texts, online or not. I’ll have a look at them.

Regarding whether a book that’s about not just probability, but also R and computer science applications, is a good thing, or whether I should just go for a book on probability: the latter is of course an option, which I may have to go for. But if there is actually a good book that combines probability, R, and computer applications, I’d prefer it. Why wouldn’t I, since I plan on combining them in the course?

One general argument for why there might not be a good book with both probability and R (or with both probability and X, for any X) is that someone good who is thinking of writing a probability text will want to target as wide an audience as possible, so why would they combine it with R, thereby limiting their market? The counter-argument to this is that if you assume that the reader can write (not too complicated) programs, which I can for the course, then one can present the material on probability in a different way, using programs to illustrate concepts and applications. And to do this, one needs to choose some programming language, which might as well be R.

13. Ken Baclawski | 2011-07-08 at 12:45 am

Thank you for your comments on my textbook, Introduction to Probability with R.

The term “stochastic function” is from the literature on Bayesian networks. See D. Koller, D. McAllester and A. Pfeffer, Effective Bayesian Inference for Stochastic Programs in Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), pages 740-747, Providence, Rhode Island, August 1997.

The Main Rule of Statistics is just the Central Limit Theorem restated as a rule of thumb. Many examples and exercises are given to make it clear what is intended by this rule. The commentary following the rule of thumb is based on Boltzmann’s result that the normal distribution has the maximum entropy. This result is formally stated and proved in Section 9.3 of the textbook.

14. The foundations of Statistics: a simulation-based approach « Xi'an's Og | 2011-07-11 at 6:12 pm

[...] and most sadly, I find the book does not live up to expectations. As in Radford Neal’s recent coverage of introductory probability books with R, there are statements there that show a deep [...]

15. Keith O'Rourke | 2011-07-28 at 10:49 am

Radford:

Perhaps a few readings might help.

I would conjecture that this – “Stigler, S. M. (2010), Darwin, Galton and the Statistical Enlightenment. JRSSA” – would allow students to grasp formal Bayes inference as the workings of a pin-ball machine (see fig 5).

Jeff’s animation of virtual quincunx and paper at http://probability.ca/jeff/java/uncunx.html would help students grasp the CLT and failures of CLT.

I also have a simple R animation for a two-parameter, two-stage virtual quincunx to demonstrate a Bayesian analysis of a two-group binary outcome that focuses on the log odds ratio after “eliminating” the nuisance parameter. (Let me know if you wish me to email it to you.)

Keith

16. Bull | 2011-09-21 at 8:48 am

Granted criticism may be considered a principal cog in the wheel of research/academia. But just like in any form of communication, mode and tone of delivery often masks the good intent of the critic. You don’t just tear into a book that took time and resources by REPUTABLE (Ken and his late co-teacher) authors. True books often have flaws, some fundamental, but to fault both CRC and Wiley and their editors amounts to intellectual sour grapes of some sort.

You admit that you have considered writing book/s but cannot due to lack of time. Well please read the old adage: CRITICS WHERE AT THOU WHEN WRITERS BURN THE MIDNIGHT OIL ONLY FOR YOU TO EMERGE FROM THE SHADOWS TO SHRED THEIR WORK.

17. Radford Neal | 2011-09-21 at 2:03 pm

I’m sure the authors of the two books spent a lot of time and effort writing them. So it might seem mean for me to criticize them so severely. But you’re forgetting some other people involved here: All the people who might read these books, and think they are obtaining correct information. If I had written that these books cover important topics, but could be improved, blah, blah, blah, I’m not sure that potential readers would understand that they are better off with no book at all than with one of these books.

18. Bull | 2013-09-19 at 12:11 pm

Talk is cheap and playing the intellectual (statistics) gate keeper is sadism albeit in disguise. You are a smart professor, show us your mettle and write the book. No need to e-mourn.

19. Radford Neal | 2013-09-19 at 2:26 pm

Bull: Your attitude is not compatible with how a healthy intellectual community operates. I’ve seen it before in other contexts – for example, someone points out flaws in someone’s experimental study, and they receive a reply that if they think they’re so smart, they should do their own study! That’s not the way it works. I don’t have to write my own book to point out flaws in other people’s books.

20. John Nash | 2012-03-27 at 9:44 pm

I have read the “Introduction to Probability with R”. It’s well-written and very organized and I don’t think some terminology flaws are such big deals. Can anyone point out how these “errors” affect the general readers?

21. Radford Neal | 2012-03-27 at 9:56 pm

I’m not sure what you mean by a “general reader”, but a reader who wants to go on and read other books on probability and statistics is certainly going to be harmed by having gotten various peculiar ideas about terminology and notation.

And of course my objection to the “Main Rule of Statistics” is not terminological. It’s that it’s completely wrong, in a way that certainly will lead readers who take it seriously to come to very wrong conclusions when they analyse data.

22. Washington S. Silva (@twssecn) | 2012-12-19 at 8:13 pm

The book *Think Stats: Probability and Statistics for Programmers*, by Allen B. Downey, deserves analysis. (In English now.)

23. Understanding Statistics? | Musings on Using and Misusing Statistics | 2014-03-12 at 10:39 pm

[…] (Added 3/12/14: See also Radford Neal’s 6/18/11 blog entry Two textbooks on probability using R.) […]