How to Correctly Use Lists in R?

Brief background: Many (most?) contemporary programming languages in widespread use have at least a handful of ADTs (abstract data types) in common, in particular: string (a sequence of characters), list (an ordered collection of values), and a map-based type (an unordered structure that maps keys to values). In the R programming language, the first two are implemented as character and vector, respectively. When I began learning R, two things were apparent almost from the start: first, that list is the most important data type in R (it is the parent class of the R data.frame), and second, that I could not understand how lists work, at least not well enough
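Since the excerpt stops just as lists come up, a minimal sketch of how an R list differs from an atomic vector (and why data.frame builds on it) may help; all names here are illustrative:

```r
# An atomic vector coerces all elements to a single type; a list does not.
v <- c(1, "a", TRUE)     # every element coerced to character
l <- list(1, "a", TRUE)  # each element keeps its own type

# Single brackets return a sub-list; double brackets extract the element itself.
sub_l <- l[1]    # a list of length 1
elem  <- l[[1]]  # the numeric value 1

# A data.frame is a list of equal-length columns, which is why list is
# often described as its parent type.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
is.list(df)  # TRUE
```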

R foreach: from single

The following (simplified) script works fine on the master node of a unix cluster (4 virtual cores). library(foreach) library(doParallel) nc = detectCores() cl = makeCluster(nc) registerDoParallel(cl) foreach(i = 1:nrow(data_frame_1), .packages = c("package_1","package_2"), .export = c("variable_1","variable_2")) %dopar% { row_temp = data_frame_1[i,] function(argument_1 = row_temp, argument_2 = variable_1, argument_3
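The truncated snippet above can be completed into a small self-contained run; data_frame_1 and variable_1 are stand-ins for the question's objects, and the worker count is fixed at 2 rather than taken from detectCores():

```r
library(foreach)
library(doParallel)

# Illustrative stand-ins for the question's data.
data_frame_1 <- data.frame(a = 1:4, b = 5:8)
variable_1 <- 10

cl <- makeCluster(2)  # a fixed small cluster; detectCores() can oversubscribe
registerDoParallel(cl)

# %dopar% evaluates each iteration on a worker; variables referenced in the
# body, such as variable_1, are shipped to the workers by foreach.
res <- foreach(i = 1:nrow(data_frame_1), .combine = c) %dopar% {
  row_temp <- data_frame_1[i, ]
  row_temp$a + row_temp$b + variable_1  # the per-row work goes here
}
stopCluster(cl)
res  # 16 18 20 22
```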

parallel regression in R (maybe with snowfall)

I'm trying to run R in parallel to speed up a regression. I'm trying to use the snowfall library (but am open to any approach). Currently I'm running the following regression, which is taking an extremely long time to run. Can someone show me how to do this? sales_day_region_ctgry_lm <- lm(log(sales_out+1) ~ factor(region_out) + date_vector_out + factor(date_vector_out) + factor(category_out) + mean_temp_out) I have started down the following path: library(snowfall) sfInit(parallel =
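One common answer is that a single lm() call cannot itself be parallelised, but independent per-group fits can be farmed out. Below is a minimal sketch with base R's parallel package (snowfall's sfLapply() is analogous); the toy data and formula only mimic the question's variables:

```r
library(parallel)

# Toy data mimicking the question's variables.
set.seed(1)
dat <- data.frame(
  sales_out     = rpois(200, 20),
  region_out    = sample(c("N", "S", "E", "W"), 200, replace = TRUE),
  mean_temp_out = rnorm(200, 15, 5)
)

# Fit one model per region on separate workers instead of one huge lm().
cl <- makeCluster(2)
fits <- parLapply(cl, split(dat, dat$region_out), function(d)
  lm(log(sales_out + 1) ~ mean_temp_out, data = d)
)
stopCluster(cl)

names(fits)  # "E" "N" "S" "W": one fitted model per region
```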

How to see how many nodes a process is using on a cluster with Sun grid engine?

I am (trying to) run R on a multicore computing cluster with a Sun grid engine. I would like to run R in parallel using the MPI environment and the snow / snowfall parLapply() functions. My code works, at least on my laptop, but to be sure it also does what it is supposed to on the cluster, I have the following questions. If I request a number of slots / nodes, say 4, how can I check whether a running process actually uses the full requested number of CPUs? Is there a command that shows details about CPU usage on the requested nodes for a given process? To verify that the cluster workers are really on the appropriate
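For the first question, a quick sanity check from inside R is to ask every worker where it is actually running; on the SGE side, `qstat -g t` is the usual way to list which hosts and slots a job was granted. A base-R sketch (snow's clusterCall() works the same way):

```r
library(parallel)

cl <- makeCluster(2)

# Ask each worker for its hostname and process id.  On a real cluster the
# hostnames show which of the granted nodes the workers actually landed on.
info <- clusterCall(cl, function()
  c(host = Sys.info()[["nodename"]], pid = Sys.getpid())
)
stopCluster(cl)

info  # two entries with distinct pids; identical hosts on a single machine
```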

parallel processing in R using snow

I have 1000's of lists and each list has multiple time series. I would like to apply forecasting to each element in the list. This has become an intractable problem in terms of computing resources. I don't have a background in parallel computing or advanced R programming. Any help would be greatly appreciated. I have created a dummy list. Basically, dat.list is similar to what I'm working with. library("snow") library("plyr") library("forecast") ## Create Dummy Data z <- ts(matrix(rnorm(30,10,10), 100, 3), start = c(1961, 1), frequency = 12) lam <- 0.8
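The pattern the question is after can be sketched with base R; here HoltWinters() stands in for forecast::forecast() so the example has no package dependencies, and the dummy list is made up:

```r
library(parallel)

# A dummy list of monthly series, standing in for the question's dat.list.
set.seed(42)
dat.list <- lapply(1:6, function(i)
  ts(rnorm(36, 10, 2), start = c(1961, 1), frequency = 12))

cl <- makeCluster(2)
# Each worker smooths one series and predicts a year ahead.  With the
# forecast package you would first load it on the workers, e.g. via
# clusterEvalQ(cl, library(forecast)), and call forecast() instead.
fc <- parLapply(cl, dat.list, function(x) {
  fit <- HoltWinters(x, beta = FALSE, gamma = FALSE)  # simple smoothing
  predict(fit, n.ahead = 12)
})
stopCluster(cl)

length(fc)  # one 12-step-ahead prediction per series
```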

How can I directly pass a process from local R to an Amazon EC2 instance

I've been looking into running R on EC2, but I'm wondering how parallel/cluster computing works with this setup. I've had a look around but haven't been able to find a tutorial for this. Basically, what I'm looking to do is have R (RStudio) running on my laptop and do most of the work there, but when I have a big operation to run, explicitly pass it to an AWS slave instance to do all the heavy lifting. As far as I can tell, the snow / snowfall packages seem to be the answer... but I'm not sure how. I am using http://bioconductor.org/help/bioconductor-cloud-ami

R connecting to EC2 instance for parallel processing

I am having trouble initialising a connection to an AWS EC2 instance from R, as I keep getting the error: Permission denied (publickey). I am currently using Mac OS X 10.6.8 as my OS. The code that I try to run in the terminal ($) and then in R (>) is as follows: $ R --vanilla > require(snowfall) > sfInit(parallel=TRUE, socketHosts = list("ec2-xx-xxx-xx-xx.zone.compute.amazonaws.com")) Permission denied (publickey) Oddly, though, when I try to ssh into the instance I don't need a password, because

Using AWS for parallel processing with R

I want to take a shot at the Kaggle Dunnhumby challenge by building a model for each customer. I want to split the data into ten groups and use Amazon Web Services (AWS) to build models with R on the ten groups in parallel. Some relevant links I have come across are: the segue package; a presentation on parallel web services using Amazon. What I don't understand is: How do I get the data onto the ten nodes? How do I send and execute R functions on the nodes? I would be very grateful if you could share suggestions and hints to point me in the right direction. PS I'm using a free usage account on AWS, but
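The "get the data onto ten nodes" step usually starts with splitting by customer in R. This sketch uses made-up data and column names; the closing note about segue's emrlapply() refers to that package's advertised entry point, not to anything in the question:

```r
# Made-up transaction data standing in for the Kaggle set.
set.seed(7)
purchases <- data.frame(
  customer_id = sample(1:100, 1000, replace = TRUE),
  spend       = runif(1000, 0, 50)
)

# Assign each *customer* (not each row) to one of ten groups, so that a
# customer's whole history ends up on a single node.
ids    <- unique(purchases$customer_id)
grp    <- setNames(sample(rep(1:10, length.out = length(ids))), ids)
chunks <- split(purchases, grp[as.character(purchases$customer_id)])

length(chunks)  # 10 data frames, one per node
# Each chunk can then be written out with saveRDS() and copied to a node;
# segue's emrlapply() automates roughly this distribution step on EMR.
```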

Run code the first time a package is installed or used

I am busy writing a package for a customer with little knowledge of R. Given their complex data structure, I need to set up a "database" within R containing tons of information obtained from a set of spreadsheets they get from another company. As they can't install SQL or the like on their computers (ICT has some power control issues...), I've written an emulation in R based on a specific directory structure. Now I want it to run automatically, but only the first time the package is loaded. Something like .First.lib, but then .VeryFirst. Any ideas on how to run a piece of code the first time a package is loaded?
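Current R uses .onLoad() rather than .First.lib(), and a "run once, ever" effect can be faked with a sentinel file. This is only a sketch: build_database() is a hypothetical stand-in for the question's setup routine, and the marker location via tools::R_user_dir() assumes R >= 4.0:

```r
# Runs at every package load, but does the expensive setup only on the very
# first load ever, guarded by a sentinel file in the user data directory.
.onLoad <- function(libname, pkgname) {
  marker <- file.path(tools::R_user_dir(pkgname, which = "data"),
                      ".initialized")
  if (!file.exists(marker)) {
    build_database()  # hypothetical one-time setup from the question
    dir.create(dirname(marker), recursive = TRUE, showWarnings = FALSE)
    file.create(marker)
  }
}
```

Deleting the sentinel file resets the package to its "never loaded" state, which is convenient while testing.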

Extract information from conditional formula

I'd like to write an R function that accepts a formula as its first argument, similar to lm() or glm() and friends. In this case, it's a function that takes a data frame and writes out a file in SVMLight format, which has this general form: &lt;line&gt; .=. &lt;target&gt; &lt;feature&gt;:&lt;value&gt; &lt;feature&gt;:&lt;value&gt; ... &lt;feature&gt;:&lt;value&gt; # &lt;info&gt; &lt;target&gt; .=. +1 | -1 | 0 | &lt;float&gt; &lt;feature&gt; .=. &lt;integer&gt; | "qid" &lt;va
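The standard trick is the same one lm() uses internally: model.frame() plus model.matrix(). A sketch under that assumption; write_svmlight and its output handling are made up for illustration:

```r
# Pull the response and features out of a formula the way lm() does,
# then emit SVMLight-style "<target> <feature>:<value> ..." lines.
write_svmlight <- function(formula, data) {
  mf <- model.frame(formula, data)    # resolves the formula against data
  y  <- model.response(mf)            # the <target> column
  X  <- model.matrix(attr(mf, "terms"), mf)[, -1, drop = FALSE]  # no intercept
  vapply(seq_len(nrow(X)), function(i) {
    feats <- paste0(seq_len(ncol(X)), ":", X[i, ])
    paste(y[i], paste(feats, collapse = " "))
  }, character(1))
}

out <- write_svmlight(y ~ x1 + x2,
                      data.frame(y = c(1, -1), x1 = c(0.5, 0.25), x2 = c(2, 3)))
out  # "1 1:0.5 2:2"  "-1 1:0.25 2:3"
```

In a full implementation these lines would go to a file via writeLines(), and feature indices would come from a stable column mapping rather than column position.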