When should I use setDT() instead of data.table() to create a data.table?

2018-06-24 13:26:51

I am having difficulty grasping the essence of the setDT() function. As I read code on SO, I frequently come across the use of setDT() to create a data.table. Of course the use of data.table() is ubiquitous. I feel like I solidly comprehend the nature of data.table() yet the relevance of setDT() eludes me. ?setDT tells me this:

setDT converts lists (both named and unnamed) and data.frames to data.tables by reference.

as well as:

In data.table parlance, all set* functions change their input by reference. That is, no copy is made at all, other than temporary working memory, which is as large as one column.

So this makes me think I should only use setDT() to make a data.table, right? Is setDT() simply a list to data.table converter?

library(data.table)

a <- letters[c(19,20,1,3,11,15,22,5,18,6,12,15,23)]
b <- seq(1,41,pi)
ab <- data.frame(a,b)
d <- data.table(ab)
e <- setDT(ab)

str(d)
#Classes ‘data.table’ and 'data.frame': 13 obs. of  2 variables:
# $ a: Factor w/ 12 levels "a","c","e","f",..: 9 10 1 2 5 7 11 3 8 4 ...
# $ b: num  1 4.14 7.28 10.42 13.57 ...
# - attr(*, ".internal.selfref")=<externalptr>

str(e)
#Classes ‘data.table’ and 'data.frame': 13 obs. of  2 variables:
# $ a: Factor w/ 12 levels "a","c","e","f",..: 9 10 1 2 5 7 11 3 8 4 ...
# $ b: num  1 4.14 7.28 10.42 13.57 ...
# - attr(*, ".internal.selfref")=<externalptr>

Seemingly no difference in this instance. In another instance the difference is evident:

ba <- list(a,b)
f <- data.table(ba)
g <- setDT(ba)

str(f)
#Classes ‘data.table’ and 'data.frame': 2 obs. of  1 variable:
# $ ba:List of 2
#  ..$ : chr  "s" "t" "a" "c" ...
#  ..$ : num  1 4.14 7.28 10.42 13.57 ...
# - attr(*, ".internal.selfref")=<externalptr>

str(g)
#Classes ‘data.table’ and 'data.frame': 13 obs. of  2 variables:
# $ V1: chr  "s" "t" "a" "c" ...
# $ V2: num  1 4.14 7.28 10.42 13.57 ...
# - attr(*, ".internal.selfref")=<externalptr>

When should I use setDT()? What makes setDT() relevant? Why not just make the original data.table() function capable of doing what setDT() does?

setDT() is not a replacement for data.table(). It is a more efficient replacement for as.data.table(), and it only works on certain types of objects: lists, data.frames and data.tables.
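To make that comparison concrete, here is a minimal sketch that converts the same unnamed list both ways. The short vectors and the variable names (ba_copy, ba_ref, via_as) are made up for illustration and are not from the question.

```r
library(data.table)

# Hypothetical inputs, shorter than the vectors in the question
a <- c("s", "t", "a")
b <- c(1, 4.14, 7.28)

ba_copy <- list(a, b)
ba_ref  <- list(a, b)

via_as <- as.data.table(ba_copy)  # returns a new data.table; ba_copy stays a list
setDT(ba_ref)                     # ba_ref itself is turned into a data.table

names(via_as)    # column names produced by as.data.table()
names(ba_ref)    # column names produced by setDT()
class(ba_copy)   # still a plain list
```

Both routes should give the same shape and column names; the difference is only in whether the input object itself is changed.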

mydata <- as.data.table(mydata) will copy the object behind mydata, convert the copy to a data.table, then change the mydata symbol to point to the copy.

setDT(mydata) will change the object behind mydata to a data.table. No copying is done.
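One way to check the "no copying" claim is to compare the memory address of a column before and after the conversion, using data.table's address() helper. This is only a sketch on a small made-up data.frame; the names df, col_before, col_after and alias are illustrative.

```r
library(data.table)

df <- data.frame(id = 1:3, value = c(1.5, 2.5, 3.5))
col_before <- address(df$value)   # where the column vector lives now

setDT(df)                         # convert in place, no assignment needed
col_after <- address(df$value)

identical(col_before, col_after)  # same address means the column was not copied
class(df)

# Assigning the result of setDT() gives a second name for the same object,
# not an independent copy, so changes by reference show up under both names.
alias <- setDT(df)
alias[, flag := TRUE]
"flag" %in% names(df)
```

This also bears on e <- setDT(ab) in the question: after that line ab is itself a data.table, and e refers to the same object rather than to a copy. Use copy() if an independent data.table is needed.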

So what's a realistic situation for setDT()? When you can't control the class of the original data. For instance, most packages for working with databases return a data.frame. In that case, your code would be something like

mydata <- dbGetQuery(conn, "SELECT * FROM mytable")  # Returns a data.frame
setDT(mydata)                                        # Make it a data.table

When should you use as.data.table(x)? Whenever x isn't a list or data.frame. The most common use is for matrices.
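A small sketch of the matrix case, using a made-up 2 x 3 matrix. The tryCatch() wrapper is only there so the script keeps running when setDT() rejects the input.

```r
library(data.table)

m <- matrix(1:6, nrow = 2)

dt <- as.data.table(m)   # works: builds a new data.table from the matrix columns
dim(dt)

# setDT() accepts only lists, data.frames and data.tables, so a matrix is rejected.
res <- tryCatch(setDT(m), error = function(e) conditionMessage(e))
res                      # the error message from setDT()
is.matrix(m)             # m is left unchanged
```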
