Remove an entire column from a data.frame in R

Does anyone know how to remove an entire column from a data.frame in R? For example if I am given this data.frame:

> head(data)
   chr       genome region
1 chr1 hg19_refGene    CDS
2 chr1 hg19_refGene   exon
3 chr1 hg19_refGene    CDS
4 chr1 hg19_refGene   exon
5 chr1 hg19_refGene    CDS
6 chr1 hg19_refGene   exon

and I want to remove the 2nd column.


You can set it to NULL .

> Data$genome <- NULL
> head(Data)
   chr region
1 chr1    CDS
2 chr1   exon
3 chr1    CDS
4 chr1   exon
5 chr1    CDS
6 chr1   exon

As pointed out in the comments, here are some other possibilities:

Data[2] <- NULL    # Wojciech Sobala
Data[[2]] <- NULL  # same as above
Data <- Data[,-2]  # Ian Fellows
Data <- Data[-2]   # same as above

You can remove multiple columns via:

Data[1:2] <- list(NULL)  # Marek
Data[1:2] <- NULL        # does not work!

Be careful with matrix-subsetting though, as you can end up with a vector:

Data <- Data[,-(2:3)]             # vector
Data <- Data[,-(2:3),drop=FALSE]  # still a data.frame

To remove one or more columns by name, when the column names are known (as opposed to being determined at run-time), I like the subset() syntax. Eg for the data-frame

df <- data.frame(a=1:3, d=2:4, c=3:5, b=4:6)

to remove just the a column you could do

Data <- subset( Data, select = -a )

and to remove the b and d columns you could do

Data <- subset( Data, select = -c(d, b ) )

You can remove all columns between d and b with:

Data <- subset( Data, select = -c( d : b )

As I said above, this syntax works only when the column names are known. It won't work when say the column names are determined programmatically (ie assigned to a variable). I'll reproduce this Warning from the ?subset documentation:

Warning:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like '[', and in particular the non-standard evaluation of argument 'subset' can have unanticipated consequences.


The posted answers are very good when working with data.frame s. However, these tasks can be pretty inefficient from a memory perspective. With large data, removing a column can take an unusually long amount of time and/or fail due to out of memory errors. Package data.table helps address this problem with the := operator:

library(data.table)
> dt <- data.table(a = 1, b = 1, c = 1)
> dt[,a:=NULL]
     b c
[1,] 1 1

I should put together a bigger example to show the differences. I'll update this answer at some point with that.

链接地址: http://www.djcxy.com/p/70894.html

上一篇: 将data.frame列格式从字符转换为factor

下一篇: 从R中的data.frame中删除整列