Controlling alpha in ggparcoord (from GGally package)

I am building on a question similar to mine (from which I borrowed the self-contained example and the title). I am trying to apply transparency individually to each line of a ggparcoord plot, or somehow overlay two ggparcoord layers on top of each other. A detailed description of the problem and the format of my data is provided below.


I have a dataset with thousands of rows; let's call it x.

library(GGally)
x = data.frame(a=runif(100,0,1),b=runif(100,0,1),c=runif(100,0,1),d=runif(100,0,1))

After clustering this data I also get a set of 5 lines (the centroids); let's call this dataset y.

y = data.frame(a=runif(5,0,1),b=runif(5,0,1),c=runif(5,0,1),d=runif(5,0,1))

To see the centroids y overlaid on x, I use the following code. First I append y to x so that the 5 rows are at the bottom of the final data frame. This ensures ggparcoord draws them last, so they stay on top of all the data:

df <- rbind(x,y)

Next I create a new column in df, following the advice in the question I referred to, so that I can color the centroids differently and tell them apart from the data:

df$cluster = "data"
df$cluster[(nrow(df)-4):(nrow(df))] <- "centroids"

Finally I plot it:

p <- ggparcoord(df, columns=1:4, groupColumn=5, scale="globalminmax", alphaLines = 0.99) + xlab("Sample") + ylab("log(Count)")
p + scale_colour_manual(values = c("data" = "grey","centroids" = "#94003C"))

This is where I am stuck. On my original data, plotting only x doesn't give much insight because there are so many lines (with this example data, that is equivalent to calling ggparcoord as above on x instead of df):

[Figure: ggparcoord plot of x alone with a high alphaLines value]

By reducing alphaLines considerably (to 0.05), some clusters naturally become visible where the lines overlap (this is again running ggparcoord on x, with a reduced alphaLines):

[Figure: ggparcoord plot of x alone with alphaLines = 0.05]

It makes more sense to observe the centroids added to df on top of the second plot, not the first.

However, since everything is in a single data frame, applying such a low value for alphaLines makes the centroid lines disappear as well. My only option is then to use ggparcoord (as provided above) on df without decreasing alphaLines:

[Figure: ggparcoord plot of df (data plus centroids) with alphaLines = 0.99]

My goal is to have the red lines (the centroid lines) on top of the second figure, where the data lines have very low alpha. I have thought of two approaches so far but couldn't get either working:

(1) Is there any way to create a column on the dataframe, similar to what is done for the color, such that I can specify the alpha value for each line?

(2) I originally attempted to create two different ggparcoords and "sum them up" hoping to overlay but an error was raised.

The question may contain too much detail, but I thought this would better motivate the answer and make it more useful to other readers.

The answer I am looking for would use the provided data variables in their current format and generate the plot I described. Better ways to restructure the data are also welcome, but using the current structure is preferred.


In this case I think it is easier to just use ggplot and build the graph yourself. We make a slight adjustment to how the data is represented (we put it in long format), and then we make the parallel coordinates plot. We can now map any aesthetic we like to cluster.

library(dplyr)
library(tidyr)
library(ggplot2)

# I start the same as you
x <- data.frame(a=runif(100,0,1),b=runif(100,0,1),c=runif(100,0,1),d=runif(100,0,1))
y <- data.frame(a=runif(5,0,1),b=runif(5,0,1),c=runif(5,0,1),d=runif(5,0,1))

# I find this an easier way to combine the two data.frames, and have an id column
df <- bind_rows(data = x, centroids = y, .id = 'cluster')
# We need to add id's, so we know which points to connect with a line
df$id <- 1:nrow(df)

# Put the data into long format
df2 <- gather(df, 'column', 'value', a:d)

# And plot:
ggplot(df2, aes(column, value, alpha = cluster, color = cluster, group = id)) +
  geom_line() +
  scale_colour_manual(values = c("data" = "grey", "centroids" = "#94003C")) +
  scale_alpha_manual(values = c("data" = 0.2, "centroids" = 1)) +
  theme_minimal()

[Figure: resulting parallel coordinates plot, with faint grey data lines and opaque red centroid lines]
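Regarding point (1) in the question, a per-row alpha column should also be possible with this approach. The following is only a sketch under some assumptions: the column name line_alpha and the values 1 and 0.05 are made up for illustration, the seed is there only for reproducibility, and pivot_longer (the newer tidyr replacement for gather, available from tidyr 1.0.0) is used to reshape. scale_alpha_identity() tells ggplot to use the numbers in the column directly as alpha values.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
x <- data.frame(a = runif(100), b = runif(100), c = runif(100), d = runif(100))
y <- data.frame(a = runif(5), b = runif(5), c = runif(5), d = runif(5))

# Same combination step as above; centroids end up in the last rows
df <- bind_rows(data = x, centroids = y, .id = "cluster")
df$id <- seq_len(nrow(df))

# Hypothetical column: one alpha value per row, i.e. per line in the plot
df$line_alpha <- ifelse(df$cluster == "centroids", 1, 0.05)

# Long format: one row per (line, axis) pair
df_long <- pivot_longer(df, cols = a:d, names_to = "column", values_to = "value")

p <- ggplot(df_long, aes(column, value, group = id,
                         colour = cluster, alpha = line_alpha)) +
  geom_line() +
  scale_colour_manual(values = c("data" = "grey", "centroids" = "#94003C")) +
  scale_alpha_identity() +  # use the values in line_alpha as-is
  theme_minimal()
print(p)
```

Because line_alpha is an ordinary column, it could just as well hold a different value for every row, for example one derived from a cluster membership score.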
