loop to something more R

I've been using SO as a resource constantly for my work. Thanks for holding together such a great community. I'm trying to do something fairly complex, and the only way I can think to do it right now is with a pair of nested for-loops (I know that's frowned upon in R)... I have records of three million-odd course enrollments: student UserIDs paired with CourseIDs. In each row, there is a bunch of data, including start/end dates, scores, and so on. What I need to do is, for each enrollment, calculate the average score of the courses that user took before enrolling in that course. My code with the for-loops looks like this: data$Me
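One possible vectorized alternative to the nested loops, as a minimal sketch. It assumes hypothetical column names (UserID, StartDate, Score), toy data, and that "courses taken before" means every earlier row for the same user once rows are sorted by start date. None of this comes from the original code, which is cut off above.

```r
# Toy data; column names are assumptions for illustration only.
enrollments <- data.frame(
  UserID    = c(1, 1, 1, 2, 2),
  CourseID  = c("a", "b", "c", "a", "d"),
  StartDate = as.Date(c("2020-01-01", "2020-03-01", "2020-06-01",
                        "2020-02-01", "2020-05-01")),
  Score     = c(80, 90, 70, 60, 100)
)

# Sort so that, within each user, earlier courses come first.
enrollments <- enrollments[order(enrollments$UserID, enrollments$StartDate), ]

# Mean of all earlier scores: running sum minus the current score,
# divided by the number of earlier rows (NA when there are none).
prior_mean <- function(score) {
  n_prior   <- seq_along(score) - 1
  prior_sum <- cumsum(score) - score
  ifelse(n_prior > 0, prior_sum / n_prior, NA_real_)
}

# ave() applies prior_mean separately within each UserID group.
enrollments$MeanPriorScore <- ave(enrollments$Score, enrollments$UserID,
                                  FUN = prior_mean)
print(enrollments)
```

If "before" has to be judged by end dates rather than row order, this running-mean shortcut no longer applies directly and a join on dates would be needed instead.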

Error when uploading package to CRAN incoming: 550 access denied

I'm trying to upload a package to CRAN for its first release, but I can't get past the FTP upload. It seems I do not have write access to ftp://cran.r-project.org/incoming: 550 Access is denied. Could not download /home/roudierp/Documents/CODE/lhs/fresh_meat/clhs_0.4-2.tar.gz from local filesystem There were 1 files or directories that could not be transferred. Check the log for which items were not properly transferred. I tried two file browsers (

Numeric comparison difficulty in R

I'm trying to compare two numbers in R as part of an if-statement condition: (a - b) >= 0.5 In this particular instance, a = 0.58 and b = 0.08... and yet (a - b) >= 0.5 is false. I'm aware of the dangers of using == for exact number comparisons, and this seems related: (a - b) == 0.5 is false, while all.equal((a - b), 0.5) is true. The only solution I can think of is to have two conditions: (a - b) > 0.5 | all.equal((a - b), 0.5). This works, but is this really
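A small sketch of two ways to make the comparison tolerant of floating-point rounding. The tolerance value tol is an arbitrary assumption chosen for illustration. Note that all.equal() returns a character description rather than FALSE when the values differ, so it is wrapped in isTRUE() before being combined with other conditions.

```r
a <- 0.58
b <- 0.08

# The raw comparison; the question reports that this is FALSE.
raw <- (a - b) >= 0.5

# Option 1: allow a small absolute tolerance (tol is a hypothetical choice).
tol <- 1e-8
ok_tol <- (a - b) >= 0.5 - tol

# Option 2: "greater than, or equal within all.equal()'s tolerance".
# isTRUE() guards against all.equal() returning a character string.
ok_all_equal <- (a - b) > 0.5 || isTRUE(all.equal(a - b, 0.5))

print(c(raw = raw, ok_tol = ok_tol, ok_all_equal = ok_all_equal))
```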

zero null hypothesis using R

I'm testing the correlation between two variables: set.seed(123) x <- rnorm(20) y <- x + x * 1:20 cor.test(x, y, method = c("spearman")) which gives: Spearman's rank correlation rho data: x and y S = 54, p-value = 6.442e-06 alternative hypothesis: true rho is not equal to 0 sample estimates: rho 0.9594 The p-value is testing the null hypothesis that the correlation is zero. Is there an R function that would let me test a different null hypothesis, say that the correlation is less than or equal to 0.3?
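One way to test a non-zero null by hand, as a rough sketch rather than a definitive answer: apply the Fisher z-transform to the estimated correlation and compare it with the transformed null value. The variance 1.06 / (n - 3) is a commonly cited approximation for Spearman's rho and is an assumption of this sketch; with only 20 observations the normal approximation should be read with caution.

```r
set.seed(123)
x <- rnorm(20)
y <- x + x * 1:20

rho_hat  <- cor(x, y, method = "spearman")
rho_null <- 0.3   # boundary of the null hypothesis rho <= 0.3
n        <- length(x)

# Fisher z-transform of estimate and null value. The 1.06 / (n - 3)
# variance is an approximation for Spearman's rho (assumption).
z_stat <- (atanh(rho_hat) - atanh(rho_null)) / sqrt(1.06 / (n - 3))

# One-sided p-value for the alternative rho > 0.3.
p_value <- pnorm(z_stat, lower.tail = FALSE)
print(c(rho = rho_hat, z = z_stat, p = p_value))
```

A bootstrap confidence interval for rho would be another option that avoids the normal approximation.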

Using Maven for R projects

I am beginning work on a project that makes heavy use of R. I've used R in the past, but only in a casual mode, whereas I'm now interested in following a more rigid practice of test/source control/continuous integration. I'm hoping to use Maven with this project if possible (having been pleased with how it manages packages with Java), but I can't find any evidence that it is possible to use Maven with R. Is it possible to set up an R project to work with Maven, and if so, where can I find steps to help me get started? I found this question and this question, but they don't mention R. Well, you can leverage the Maven Exec

How does setTimeLimit work in R?

I am trying to master setTimeLimit() in R and my experience has led to several related questions, so maybe the fundamental question is: how does this really work? (I have been looking at evalWithTimeout() from R.utils as well, and it may suit my purposes slightly better, but it's built on this function.) Here are the key things I am trying to figure out: How does it monitor the elapsed time? That is, it seems to be inserted into the flow control, so how does it do that? Being able to have "background" processes like this is very cool and could be used for reporting status, checkpointing, and so on. Can I determine how much time
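A minimal sketch of using setTimeLimit() directly. According to its documentation, the limit is checked at the points where a user interrupt could occur (frequently in R code and during Sys.sleep()), so there is no separate background process watching the clock. The wrapper name run_with_limit is hypothetical, and its handler catches any error, not only timeouts.

```r
# Hypothetical helper: evaluate expr, giving up after `seconds` of elapsed time.
run_with_limit <- function(expr, seconds) {
  # Always clear the limit again, whether or not it fired.
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  setTimeLimit(elapsed = seconds, transient = TRUE)
  # expr is a promise, so it is only evaluated here, after the limit is set.
  tryCatch(expr, error = function(e) {
    paste("stopped:", conditionMessage(e))
  })
}

# Sys.sleep() is interruptible, so the limit fires well before 3 seconds.
result <- run_with_limit({
  Sys.sleep(3)
  "finished"
}, seconds = 0.5)
print(result)
```

Long-running compiled code that never checks for interrupts would not be stopped this way until it returns to R.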

The best way to import R object into Python?

I know that there is a Python package that imports RData files. But I was wondering if that is the best option for me. I have data frames in R that I want to use in Python. I was wondering if I should save them as json or csv and then read them with pandas in Python, or if I should just save them as RData and use the rpy2 package. All I need is to turn these R data frames into Python data frames, so that I can manipulate them and combine them with other results I've computed in Python... You can use feather. It is a data format for data frames (by @Wes McKinne
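For the CSV route mentioned in the question, the R side can be sketched with base functions only. This is an illustration with made-up data, not the feather approach suggested in the answer, and it is chosen here simply because it needs no extra packages. On the Python side, the file could then be read with pandas.read_csv().

```r
# Made-up data frame standing in for the real one.
df <- data.frame(
  id    = 1:3,
  score = c(0.5, 1.25, 2),
  label = c("a", "b", "c")
)

# Write without row names so pandas does not see an extra unnamed column.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Read it back in R as a quick sanity check of the round trip.
check <- read.csv(path)
print(check)
```

CSV drops type details such as factor levels and date classes, which is one reason a binary data-frame format may be preferable for larger or more complex tables.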

Huge size in model output from train function in R caret package

I am training a bagFDA model using the train() function in the R caret package, and saving the model output as an .Rdata file. The input file has about 300k records with 26 variables, but the output .Rdata file is 3 GB. I simply run the following: modelout <- train(x, y, method = "bagFDA") save(file = "myout.Rdata", modelout) on a Windows system. Questions: (1) Why is myout.Rdata so large? (2) How can I reduce the size of the file? Thanks in advance! JT For starters, in trainControl set ret
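To see why a saved model is large, it can help to check which components of the fitted object take up the most memory. The sketch below uses a small lm() fit as a stand-in, since the caret model from the question is not available here. Applying the same sapply() call to modelout is an assumption that should work for any list-like model object.

```r
set.seed(1)
d <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Stand-in model; in the question this would be the object returned by train().
fit <- lm(y ~ x, data = d)

# Size in bytes of each top-level component, largest first.
sizes <- sapply(fit, function(part) as.numeric(object.size(part)))
print(sort(sizes, decreasing = TRUE))
```

Components that hold copies of the training data or per-resample fits are typical candidates to drop or disable before saving.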

should .RData files be used to store functions?

I use .RData files to store objects (e.g. lists, vectors, etc.) and then load them into other scripts, but I'm wondering whether they should also be used to store functions (most likely user-defined functions)? I know source() is generally recommended for this purpose (and creating packages even more so), but an advantage as I see it is that a single .RData file can contain multiple objects: a list, a data frame, and a function, for example. This saves having to load() the objects and then separately source() the functions. Is there any reason to be cautious about this approach that I'm not seeing? Thanks
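The approach described above can be sketched as follows, with made-up object names. Loading into a separate environment (restored, a hypothetical name) avoids silently overwriting objects of the same name in the workspace, which is one of the usual cautions with load().

```r
# A data frame and a user-defined function to store together.
scores <- data.frame(id = 1:3, score = c(80, 90, 70))
mean_score <- function(df) mean(df$score)

# Save both objects into a single .RData file.
path <- tempfile(fileext = ".RData")
save(scores, mean_score, file = path)

# Load into a fresh environment instead of the global workspace.
restored <- new.env()
load(path, envir = restored)

print(ls(restored))
print(restored$mean_score(restored$scores))
```

Unlike a sourced script, the saved file holds the function as a binary object, so its code cannot be diffed or reviewed in version control.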

Searching a function's source code

In R, you can view the source of a function, as a function is simply another object. I am looking for a way to search through this source code without knowing the file that the source is saved in. For example, I might want to know if the function shapiro.test contains the function sort (it does). If shapiro.test were a string or a vector of strings I would use grep('sort', shapiro.test) But since shapiro.test is a function, this gives the error "Error in as.character(x): cannot coerce type 'closure' to vector of type 'character'
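One way around the coercion error, as a short sketch: deparse() turns the function into a character vector of source lines that grep() can search, and all.names() lists the symbols used in the function body. Both are base R functions; the variable names are arbitrary.

```r
# Convert the function to lines of text, then search them.
src  <- deparse(shapiro.test)
hits <- grep("sort", src, fixed = TRUE, value = TRUE)
print(hits)

# Alternative: check the symbols used in the body. Unlike grep(), this
# does not match "sort" inside longer names or inside strings.
calls_sort <- "sort" %in% all.names(body(shapiro.test))
print(calls_sort)
```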