R : a coercive programming language

I recently got a great answer to a question I asked on economics.stackexchange.com which lead me to explore the matchingMarket R-library, by Thilo Kleins. The library contains very handy functions to compute the outcome of some of the most famous matching algorithm in the college admission literature.

I wanted to tweak the functions a little to use them in a research project but I quickly realized that my knowledge of R was to limited to do that. So I painfully started learning a little more about R and I figured I might as well document some of my quest for R-mastery here. Hopefully, I won’t be completely off all the time and what I write may help someone.

Here is the first important thing  that I learned the hard way and which is not always emphasized in introductions to R : R is a coercive programming language. This means that, from time to time, R will “coerce” variables of a certain type  (logical, numeric,…) into another type. Suppose for instance that we ask

1 == TRUE

This seems to make no sense : 1 is a numeric, TRUE is a logical, what does it mean to ask whether they are equal? Intuitively, one may expect equality comparisons to only be applicable between numbers, and one might expect

1 == TRUE 

to yield an error. Well, it might be so in other languages, but not in R. It is not that R can “really” answer questions like

"apple" == 3.14

But if you ask R to do so, it will still try to see if it cannot, in one way or another, transform (i.e. coerce) the type of “apple” and “3.14” to make them comparable. Back to the first query

1 == TRUE 

R knows that TRUE is a logical and that 1 is a numeric. But instead of giving up and returning an error, R will ask “what if I tried to turn ‘TRUE’ into a numeric and performed the equality check anyways”? The R function in charge of trying to turn any argument into a numeric is as.numeric. Thus when sending R the former query, you are in fact asking

1 == as.numeric(TRUE) 

Now, somewhat unsurprisingly, R people have decided that

as.numeric(TRUE)=1

Other variables have a defined numeric equivalent. For example, as you might expect,

as.numeric(FALSE)) = 0

Thus to sum up

1 == TRUE 

is equivalent to

1 == as.numeric(TRUE) 

and therefore in the console

> 1== TRUE
[1] TRUE

The last piece of code might seem like nothing, but for someone who starts to learn R, it can be a huge pain to figure out. Now, there is much more to coercion and object  types than the little example above.  One could for instance ask : “How do we know if R will coerce ‘TRUE’ into a numeric or ‘1’ into a logical”? The short answer can be found in the help of the == operator (” ?`==` ” in the console).

” If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.”

Also, if == was the only operator to coerce variables, it would not be so much of an issue. Things are much more complex however : many other operators and functions do coerce object types.

As a matter of fact, there would be a lot to say (and I still have a lot to learn) about R types and the way R manages coercion. It is not the place — nor am I the right author — to do this. My only goal here was to stress this following simple point :

When dealing with R, one must learn to live with the fact that object types will not always match and that this will not always lead to an error. In an ideal world (?), code would not rely too much on automatic coercion by R and types would be explicitly matched before performing operations. In practice however, code does rely on automatic coercion and if one wants to survive the R world, one must learn about and get used to coercion. At the very top of the list of question one should ask him or herself when one does not understand a piece of code is : “is there any coercion happening which I am not completely comfortable with?”.

To dig deeper, here are a couple of relevant questions on http://stackoverflow.com/: