Errors related to data frame columns during merging

The following piece of code is supposed to load and prepare datasets from a specified directory for further data analysis. The problem is that the code generates the following errors when it attempts to merge the data (one for each merging option). I'm confused about what is going on here. However, my gut feeling tells me that the errors might be due to the absence of column names in some data frames. I would appreciate clarification. Also, please advise on the preferred merging option (between #1 and #2). Thanks! UPDATE 2 (rewritten as a minimal reproducible example, previous versions deleted): Current error (with merging option 1 enabled): Error in fix.by ...
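A minimal sketch of how this kind of error can arise, assuming (as the asker suspects) that one data frame lacks the expected column names. The data frames `left` and `right` and their columns are hypothetical and are not taken from the original question.

```r
# Hypothetical data: `left` has proper names, `right` was built without them
left  <- data.frame(id = 1:3, score = c(10, 20, 30))
right <- data.frame(V1 = c(1, 2, 4), V2 = c("a", "b", "d"))

# merge() cannot find a column called "id" in `right`, so it stops
# inside its internal fix.by() helper; capture the message instead of failing
res <- tryCatch(
  merge(left, right, by = "id"),
  error = function(e) conditionMessage(e)
)
print(res)

# Give `right` the expected names, then merge again
names(right) <- c("id", "label")
merged <- merge(left, right, by = "id")
print(merged)
```

Checking `names()` on every data frame before merging is a cheap way to rule this cause in or out.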

Raise exception in de

I'm currently writing an R script to anonymize data gathered from social media. There's one column that contains the (digital) name of the author, and I'm trying to anonymize this column. I found this script on Stack Overflow already: anonymiseColumns <- function(df, colIDs) { id <- if(is.character(colIDs)) match(colIDs, names(df)) else colIDs for(id in colIDs) { prefix <- sample(LETTERS, 1) suffix <- as.char ...
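The quoted script is cut off, so the following is not a reconstruction of it. It is a simpler sketch of one way to anonymize a single column, with a guard that raises an error when the column does not exist. The function name `anonymise_column`, the `prefix` argument, and the `posts` data are all hypothetical.

```r
# Replace each distinct value in `col` with a stable pseudonym
anonymise_column <- function(df, col, prefix = "user_") {
  # Raise an error early if the requested column is missing
  stopifnot(col %in% names(df))
  # The same original value always maps to the same number
  ids <- match(df[[col]], unique(df[[col]]))
  df[[col]] <- paste0(prefix, ids)
  df
}

posts <- data.frame(
  author = c("alice", "bob", "alice"),
  text   = c("first", "second", "third"),
  stringsAsFactors = FALSE
)
anon <- anonymise_column(posts, "author")
print(anon)
```

Because pseudonyms are numbered in order of first appearance, shuffling the rows beforehand may be worth considering if that order is itself sensitive.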

Replacing NAs with latest non

In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using vectors (instead of a data.frame) is the following: > y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA) I would like a function fill.NAs() that allows me to construct yy such that: > yy [1] NA 2 2 2 2 3 3 4 4 4 I need to repeat this operation for many small data.frames, where a row is NA if all of its entries are NA ...
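One possible base R sketch of the requested `fill.NAs()` for a plain vector. It is only an illustration of the idea (carry the index of the last non-NA value forward), not the accepted answer to the question.

```r
fill.NAs <- function(x) {
  # Position of the most recent non-NA value seen so far (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  # Prepend NA so that index 0 maps to NA after shifting by one
  c(NA, x)[idx + 1]
}

y  <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
yy <- fill.NAs(y)
print(yy)
```

Leading NAs stay NA because there is no earlier value to carry forward.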

fread data.table in R doesn't read in column names

When reading a data file into R, I can read it in either as a data.frame or as a data.table using the data.table package. I would prefer to use data.table in the future since it deals with large data better. However, there are issues with both methods (read.table for data.frames, fread for data.tables) and I'm wondering if there's a simple fix out there. When I use read.table to produce a data.frame, if my column names contain colons or spaces, they are replaced by periods, which I do not want. I want the column names to be read in "as is". Alternatively, when I use fread ...
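For the read.table half of the problem, a short sketch using the `check.names` argument, which controls whether column names are passed through `make.names()`. The inline CSV text and its column names are made up for illustration.

```r
# Hypothetical file contents with a colon and a space in the header
raw <- "id:code,first name\n1,Ann\n2,Bob"

# Default behaviour: names are made syntactically valid
default_df <- read.table(text = raw, sep = ",", header = TRUE)
print(names(default_df))

# check.names = FALSE keeps the header exactly as written
asis_df <- read.table(text = raw, sep = ",", header = TRUE, check.names = FALSE)
print(names(asis_df))
```

Columns with such names then need backticks or `[[` to be accessed, for example ``asis_df$`first name` ``.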

Calculating percentages of a factor variable with dplyr

I am trying to calculate percentages/counts of each level of a factor variable in a data frame within dplyr, kind of like using table, and while I can do this manually, this becomes tedious if I have many factor variables or if the factor variable has many levels. Example: set.seed(100) data <- data.frame(groupbyvar = LETTERS[1:4], var1 = letters[1:4], var2 = as.factor(sample(1:4,12,TRUE))) data %>% group_by(groupbyvar) %>% summarise(var1_a = mean(var1 = ...
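As a point of comparison, here is a base R sketch (deliberately not dplyr) that produces counts and within-group shares for every level at once, using the example data from the question. The names `counts` and `shares` are introduced here for illustration.

```r
set.seed(100)
data <- data.frame(groupbyvar = LETTERS[1:4],
                   var1 = letters[1:4],
                   var2 = as.factor(sample(1:4, 12, TRUE)))

# Cross-tabulate group by factor level, one column per level
counts <- table(data$groupbyvar, data$var2)

# Convert counts to proportions within each group (rows sum to 1)
shares <- prop.table(counts, margin = 1)
print(counts)
print(round(shares, 2))
```

This avoids writing one `mean(var == level)` expression per level by hand.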

by followed by factor in mutate

In the data.frame at the bottom of this question, I want to compute the number of individuals by year and age, create a year-class variable that is the year minus the age, and then create factor versions of the age and year-class variables. Using group_by(), summarize(), and mutate() from dplyr and factor() from base R, I get the following result: d <- group_by(d,year,age) %>% summarize(catch=n()) d1 <- mutate(d,yrclass=year-age,fage=as.factor(age),fyrclass=as.factor(yrclass)) d1 Source: local data frame [27 x 6] Groups: ye ...

Dynamically select data frame columns using $ and a vector of column names

I wish to order a data frame based on different columns, one at a time. I have a character vector with the relevant column names on which the order should be based: parameter <- c("market_value_LOCAL", "ep", "book_price", "sales_price", "dividend_yield", "beta", "TOTAL_RATING_SCORE", "ENVIRONMENT", "SOCIAL", "GOVERNANCE") I wish to loop over the names in parameter and dynamically select the column to be used to order my data: Q1_R1000_parameter <- Q1_R1000[order(Q1_R1000$parameter[X]), ] X ...
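The `$` operator takes the name to its right literally, so `Q1_R1000$parameter[X]` looks for a column called `parameter`. The `[[` operator evaluates its argument, which makes it suitable for names stored in a vector. A minimal sketch, with a small made-up `Q1_R1000` and a shortened `parameter` vector standing in for the real data:

```r
# Hypothetical stand-in for the real data frame
Q1_R1000 <- data.frame(ep = c(0.3, 0.1, 0.2), beta = c(1.2, 0.8, 1.0))
parameter <- c("ep", "beta")

# One sorted copy of the data frame per column name
sorted <- lapply(parameter, function(p) Q1_R1000[order(Q1_R1000[[p]]), ])
names(sorted) <- parameter

print(sorted$ep)
print(sorted$beta)
```

The same `Q1_R1000[[p]]` expression works inside a `for` loop over `parameter` if a list of results is not wanted.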

How to create an example (anonymous) R script to provide a reproducible example?

This is related to: How to create example data set from private data (replacing variable names and levels with uninformative place holders)? Which stems from: How to make a great R reproducible example? I have come to realize that having an anonymous data set (where the data and labels are uninformative but consistent with the original data) is half the battle of producing a reproducible example (for a question or bug report) from a script and data that you cannot share (e.g. proprietary information, unpublished findings, etc.). Any suggestions on how to automate the translation of a script so that it matches an anonymous data set created using one of the answers provided in this ...

Location and value for consecutive values above threshold

I need to find where my data reach a threshold on consecutive days. I'm looking for 4 consecutive observations above the threshold. I want to return the location of the first observation of the series that meets these criteria. Here is an example data set: eg = structure(list(t.date = structure(c(1L, 2L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("4/30/11", "5/1/11", "5/10/11", "5/11/11", "5/12/11", "5/13/11", "5/14/11", "5/15/11", "5/16/11", "5/17/ ...
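A sketch of one common approach, run-length encoding with `rle()`, applied to a made-up numeric vector because the example data set above is cut off. The function name `first_run_start` and its arguments are hypothetical.

```r
# Index of the first observation in the first run of at least `n`
# consecutive values strictly above `threshold`; NA if no such run exists
first_run_start <- function(x, threshold, n = 4) {
  above  <- !is.na(x) & x > threshold
  runs   <- rle(above)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  hit    <- which(runs$values & runs$lengths >= n)
  if (length(hit) == 0) NA_integer_ else starts[hit[1]]
}

x   <- c(1, 5, 6, 2, 7, 8, 9, 10, 3)
loc <- first_run_start(x, threshold = 4)
print(loc)     # location of the first qualifying observation
print(x[loc])  # its value
```

Missing values are treated as not exceeding the threshold, so they break a run; that choice is an assumption and may not suit every data set.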

Wrapper functions for data.table

I have a project that has already been written in terms of data.frame. To improve calculation times, I'm trying to leverage the speed of data.table instead. My approach has been to construct wrapper functions that read in frames, convert them to tables, do the calculations, and then convert back to frames. Here's one of the simple examples... FastAgg<-function(x, FUN, aggFields, byFields = NULL, ...){ require('data.table') y<-setDT(x) y<-y[,lapply(X=.SD,FUN=FUN,...),.SDcols = aggFields,by=byFields] y<-data.frame(y ...