dplyr masks GGally and breaks ggparcoord

In a fresh session, running the small ggparcoord(.) example from the function's documentation

library(GGally)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in the following plot:

[image: the resulting parallel coordinate plot]

Starting again in a fresh session and running the same script with dplyr loaded

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in:

Error: (list) object cannot be coerced to type 'double'

Note that the order of the library(.) statements does not matter.

Questions

  • Is there something wrong with the code samples?
  • Is there a way to overcome the problem (over some namespace functions)?
  • Or is this a bug?
  • I need both dplyr and ggparcoord(.) in a bigger analysis, but this minimal example reflects the problem I am facing.

    Versions

  • R @ 3.2.3
  • dplyr @ 0.4.3
  • GGally @ 1.0.1
  • ggplot2 @ 2.0.0

    UPDATE

    To wrap up the excellent answer given by Joran:

    Answers

  • The code samples are in fact wrong, as ggparcoord(.) expects a data.frame, not a tbl_df as provided by the diamonds data set (if dplyr is loaded).
  • The problem is solved by coercing the tbl_df to a data.frame.
  • No, it is not a bug.
  • Working code sample:

    library(GGally)
    library(dplyr)
    
    data(diamonds, package="ggplot2")
    diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
    ggparcoord(data = as.data.frame(diamonds.samp), columns = c(1, 5:10))
    

    Converting my comments to an answer...

    The GGally package here is making the reasonable assumption that using [ on a data frame should behave the way it always does and always has. However, this all being in the Hadley-verse, the diamonds data set is a tbl_df as well as a data.frame.

    When dplyr is loaded, the behavior of [ is overridden such that drop = FALSE is always the default for a tbl_df. So there's a place in GGally where data[,"cut"] is expected to return a vector, but instead it returns another data frame.

    ...specifically, the error is thrown in your example while attempting to execute:

    data[, fact.var] <- as.numeric(data[, fact.var])
    

    Since data[,fact.var] remains a data frame, and hence a list, as.numeric won't work.
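To see the difference in isolation, here is a minimal sketch. It assumes a current version of the tibble package, which is where the tbl_df class lives nowadays; the toy data frame and the variable names are made up for illustration and are not taken from GGally.

```r
library(tibble)  # assumption: a current tibble; it provides the tbl_df class

# Toy data standing in for the diamonds sample (hypothetical values)
df <- data.frame(
  cut   = factor(c("Fair", "Good", "Ideal")),
  price = c(326, 327, 334)
)
tb <- as_tibble(df)

plain_col <- df[, "cut"]  # plain data.frame: the single column is dropped to a factor
tbl_col   <- tb[, "cut"]  # tbl_df: still a one-column data frame

is.factor(plain_col)
is.data.frame(tbl_col)

# Works: returns the integer codes of the factor levels
codes <- as.numeric(plain_col)

# Fails: a data frame is a list, so as.numeric() signals an error,
# which is captured here instead of stopping the script
tbl_result <- tryCatch(as.numeric(tbl_col), error = function(e) e)
inherits(tbl_result, "error")
```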

    As for your conclusion that this isn't a bug, I'd say... maybe. Probably. At least there probably isn't anything the GGally package author ought to do to address it. You just have to be aware that using tbl_df's with packages not written by Hadley may break things.

    As you noted, removing the extra class attributes fixes the problem, as it returns R to using the normal [ method.
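A small sketch of what "removing the extra class attributes" amounts to, again assuming a current tibble package and using made-up toy data:

```r
library(tibble)  # assumption: a current tibble; it provides the tbl_df class

# Toy tibble (hypothetical values)
tb <- tibble(
  cut   = factor(c("Fair", "Good", "Ideal")),
  price = c(326, 327, 334)
)

tbl_classes <- class(tb)  # "tbl_df" "tbl" "data.frame"

# as.data.frame() strips the tbl_df and tbl classes
plain <- as.data.frame(tb)
plain_classes <- class(plain)  # "data.frame"

# With only the data.frame class left, [ drops a single column to a vector again
col <- plain[, "cut"]
is.factor(col)
```

The data itself is unchanged; only the class vector differs, which is what decides which [ method R dispatches to.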


    Workaround: coerce the data you pass to ggparcoord with as.data.table(...), or with as.data.table(..., keep.rownames=TRUE) if you do not want to lose all your rownames.

    Cause: as per @joran's investigation, when dplyr is loaded, tbl_df overrides [ so that drop = FALSE is the default.

    Solution: file a pull-request on GGally.
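For what it's worth, here is a rough idea of what such a change could look like. This is not GGally's actual code: factor_to_numeric is a hypothetical helper, and the only assumption is that the conversion step loops over the factor columns named in fact.var. The point is that [[ always extracts the column itself, so it does not depend on the drop default of whatever class the data has.

```r
# Hypothetical helper, NOT taken from GGally: convert the named factor
# columns to their numeric codes without relying on the drop behavior of [
factor_to_numeric <- function(data, fact.var) {
  for (v in fact.var) {
    # [[ returns the column vector for a data.frame and for a tbl_df alike
    data[[v]] <- as.numeric(data[[v]])
  }
  data
}

# Toy data (hypothetical values)
df <- data.frame(
  cut   = factor(c("Fair", "Good", "Ideal")),
  price = c(326, 327, 334)
)
converted <- factor_to_numeric(df, "cut")
converted$cut
```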
