How to read data when some numbers contain commas as thousand separator?

I have a csv file where some of the numerical values are expressed as strings with commas as thousand separator, eg "1,513" instead of 1513 . What is the simplest way to read the data into R?

I can use read.csv(..., colClasses="character") , but then I have to strip out the commas from the relevant elements before converting those columns to numeric, and I can't find a neat way to do that.


Not sure about how to have read.csv interpret it properly, but you can use gsub to replace "," with "" , and then convert the string to numeric using as.numeric :

y <- c("1,200","20,000","100","12,111")
as.numeric(gsub(",", "", y))
# [1]  1200 20000 100 12111

This was also answered previously on R-Help (and in Q2 here).

Alternatively, you can pre-process the file, for instance with sed in unix.


You can have read.table or read.csv do this conversion for you semi-automatically. First create a new class definition, then create a conversion function and set it as an "as" method using the setAs function like so:

setClass("num.with.commas")
setAs("character", "num.with.commas", 
        function(from) as.numeric(gsub(",", "", from) ) )

Then run read.csv like:

DF <- read.csv('your.file.here', 
   colClasses=c('num.with.commas','factor','character','numeric','num.with.commas'))

I want to use R rather than pre-processing the data as it makes it easier when the data are revised. Following Shane's suggestion of using gsub , I think this is about as neat as I can do:

x <- read.csv("file.csv",header=TRUE,colClasses="character")
col2cvt <- 15:41
x[,col2cvt] <- lapply(x[,col2cvt],function(x){as.numeric(gsub(",", "", x))})
链接地址: http://www.djcxy.com/p/60620.html

上一篇: Android版本是否有最小的堆大小?

下一篇: 当一些数字包含逗号作为千位分隔符时如何读取数据?