Specify the colour of a ggpairs plot using a variable without plotting that variable

I have a dataset from the World Bank with some continuous and categorical variables.

> head(nationsCombImputed)
  iso3c iso2c              country year.x life_expect population birth_rate neonat_mortal_rate                     region
1   ABW    AW                Aruba   2014       75.45     103441       10.1                2.4  Latin America & Caribbean
2   AFG    AF          Afghanistan   2014       60.37   31627506       34.2               36.1                 South Asia
3   AGO    AO               Angola   2014       52.27   24227524       45.5               49.6         Sub-Saharan Africa
4   ALB    AL              Albania   2014       77.83    2893654       13.4                6.5      Europe & Central Asia
5   AND    AD              Andorra   2014       70.07      72786       20.9                1.5      Europe & Central Asia
6   ARE    AE United Arab Emirates   2014       77.37    9086139       10.8                3.6 Middle East & North Africa
               income gdp_percap.x  log_pop
1         High income     47008.83 5.014693
2          Low income      1942.48 7.500065
3 Lower middle income      7327.38 7.384309
4 Upper middle income     11307.55 6.461447
5         High income     30482.64 4.862048
6         High income     67239.00 6.958379

I wish to use ggpairs to plot some of the continuous variables (life_expect, birth_rate, neonat_mortal_rate, gdp_percap.x) as a scatter plot matrix, coloured by the categorical variable region. I have tried a number of different approaches, but I cannot colour the continuous variables without also plotting the categorical variable.

ggpairs(nationsCombImputed[,c(2,5,7,8,9,11)],
        title="Scatterplot of Variables",
        mapping = ggplot2::aes(color = region),
        labeller = "iso2c")

But I get this error:

Error in stop_if_high_cardinality(data, columns, cardinality_threshold) : Column 'iso2c' has more levels (211) than the threshold (15) allowed. Please remove the column or increase the 'cardinality_threshold' parameter. Increasing the cardinality_threshold may produce long processing times

Ultimately I would just like a 4x4 scatter plot matrix of the continuous variables, coloured by region, with the data points labelled using the iso2c code in column 2.

Is this possible in ggpairs?

Well, yes, it is possible! Following @Robin Gertenbach's suggestion, I added the columns argument to my code and this worked great; see below.

ggpairs(nationsCombImputed,
        title="Scatterplot of Variables",
        columns = c(5,7,8,11),
        mapping=ggplot2::aes(colour = region))

I would still like to add data point labels to the scatter plots using the iso2c column, but I am struggling with this. Any pointers would be greatly appreciated.


As mentioned in the comment, you can get ggpairs to colour by a variable without plotting it by passing the numeric indices of the columns you do want to plot, i.e. columns = c(5,7,8,11). The mapping can still refer to any other column of the data frame.

To get a text scatter plot, you need to define a function (e.g. textscatter) that draws the points with geom_text, supply it to ggpairs via lower = list(continuous = textscatter), and specify the labels in the aesthetics.

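library(GGally)
library(ggplot2)

# Draw each observation as text; the label comes from the aesthetics passed by ggpairs
textscatter <- function(data, mapping, ...) {
  ggplot(data, mapping, ...) + geom_text()
}

ggpairs(
  nationsCombImputed,
  title = "Scatterplot of Variables",
  columns = c(5, 7, 8, 11),
  # region and iso2c are used for colour and labels only; they are not plotted
  mapping = ggplot2::aes(colour = region, label = iso2c),
  lower = list(continuous = textscatter)
)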

Of course, you can also define the label aesthetic inside textscatter itself instead of in the ggpairs call.
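
A minimal sketch of that variant, assuming the data frame has an iso2c column as in the question. The helper name textscatter_labelled is my own choice for illustration, and the small demo data frame just reuses three rows from the head() output in the question so the helper can be tried on its own.

```r
library(GGally)
library(ggplot2)

# Hypothetical variant of textscatter: the label aesthetic is set inside the
# helper, so the ggpairs() call only needs to map colour.
textscatter_labelled <- function(data, mapping, ...) {
  ggplot(data, mapping) +
    geom_text(aes(label = iso2c), ...)
}

# Three rows copied from the head() output in the question, for a quick check
demo <- data.frame(
  iso2c       = c("AW", "AF", "AO"),
  life_expect = c(75.45, 60.37, 52.27),
  birth_rate  = c(10.1, 34.2, 45.5),
  region      = c("Latin America & Caribbean", "South Asia", "Sub-Saharan Africa")
)

# Calling the helper directly gives a single labelled scatter panel
panel <- textscatter_labelled(
  demo,
  aes(x = life_expect, y = birth_rate, colour = region)
)

# With the full data from the question (assumed to be loaded already)
if (exists("nationsCombImputed")) {
  ggpairs(
    nationsCombImputed,
    title = "Scatterplot of Variables",
    columns = c(5, 7, 8, 11),
    mapping = aes(colour = region),
    lower = list(continuous = textscatter_labelled)
  )
}
```

Because the label is mapped only inside the lower-panel helper, the diagonal and upper panels never receive a label aesthetic they do not use.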
