r - Top Programmers

Plot 3 categorical variables with dependencies on previous column

I have the following data that I want to plot using R (ggplot2 or any other package). The question is: what is the best way to visualize it in R, to understand the relationship between those who are Happy = Yes, Pay = No and Donate = Yes (and other combinations)?
AreYouHappy | WouldYouPay | WouldYouDonate
Yes | No | Yes
Yes | No | No
Yes | No | Yes
Yes | Yes | Yes
Yes | No | No
No | Yes

2018-06-08 05:17:59

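Plot 3 categorical variables with dependencies on previous column

I have the following data that I want to plot using R (ggplot2 or any other package). The question is: what is the best way to visualize it in R, to understand the relationship between those who are Happy = Yes, Pay = No and Donate = Yes (and other combinations)?
AreYouHappy | WouldYouPay | WouldYouDonate
Yes | No | Yes
Yes | No | No
Yes | No | Yes
Yes | Yes | Yes
Yes | No | No
No | Yes | Yes
Yes | No | No
Yes | No | Yes
...[58 data points]
[Reproducible code] df <- data.frame( cbind(AreYouH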

2018-06-08 05:17:59
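
For the three-categorical-variable question above, one possible starting point is a three-way contingency table drawn as a mosaic plot. This is only a sketch in base R: it uses the eight rows visible in the excerpt rather than the full 58-point data set, and the plot title is made up for illustration.

```r
# The eight rows visible in the excerpt (the full data set has 58 points).
df <- data.frame(
  AreYouHappy    = c("Yes", "Yes", "Yes", "Yes", "Yes", "No",  "Yes", "Yes"),
  WouldYouPay    = c("No",  "No",  "No",  "Yes", "No",  "Yes", "No",  "No"),
  WouldYouDonate = c("Yes", "No",  "Yes", "Yes", "No",  "Yes", "No",  "Yes")
)

# Three-way contingency table: one cell per Yes/No combination.
counts <- table(df)

# Flattened view that is easier to read than the raw 3-d array.
print(ftable(counts))

# Mosaic plot: tile area is proportional to the count of each combination.
mosaicplot(counts, main = "Happy / Pay / Donate", color = TRUE)
```

The count for one combination can be read directly, for example `counts["Yes", "No", "Yes"]` for Happy = Yes, Pay = No, Donate = Yes.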

extend limit for qualitative variable in ggplot

Hi everyone, and thanks for your consideration. The goal is to provide more space in situations where there are a great many x-axis labels. Note that I don't care whether the labels themselves are shown on the plot (I've excluded them in the code below). What I want to change is the fact that when there are ~1000 x-axis labels and 1000 data points in a typical geom_point plot, the

2018-06-08 05:16:57

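extend limit for qualitative variable in ggplot

Hi everyone, and thanks for your consideration. The goal is to provide more space in situations where there are a great many x-axis labels. Note that I don't care whether the labels themselves are shown on the plot (I've excluded them in the code below). What I want to change is the fact that when there are ~1000 x-axis labels and 1000 data points in a typical geom_point plot, the points on the left and right that belong to the first and last few x-axis labels are squashed against the edges of the plot area. I'd like to pad some space so these points aren't squashed. I'd like to know whether there is a way, when xi and xii are not numeric, to change scale-x-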

2018-06-08 05:16:57
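
For the discrete-axis padding question above, one option that may help is the `expand` argument of `scale_x_discrete()`. This is a sketch only: the data frame `d`, its column names and the padding value of 25 are assumptions made for illustration, not taken from the original post.

```r
library(ggplot2)

# Hypothetical data: 1000 categorical x values, one point each.
set.seed(1)
d <- data.frame(
  id = factor(sprintf("item%04d", 1:1000)),
  y  = rnorm(1000)
)

p <- ggplot(d, aes(x = id, y = y)) +
  geom_point() +
  # expand = c(multiplicative, additive): pad 25 category widths on each side.
  scale_x_discrete(expand = c(0, 25)) +
  # Hide the x labels, as the question says they are not needed.
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

print(p)
```

The additive part of `expand` is measured in category units on a discrete scale, so a larger second value leaves more empty room at both ends.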

Optimal/efficient plotting of survival/regression analysis results

I perform regression analyses on a daily basis. In my case this typically means estimating the effect of continuous and categorical predictors on various outcomes. Survival analysis is probably the most common analysis that I perform. Such analyses are often presented in a very convenient way in journals. Here is an example: I wonder if anyone has come across any publicly available functi

2018-06-08 05:15:55

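Optimal/efficient plotting of survival/regression analysis results

I perform regression analyses on a daily basis. In my case this typically means estimating the effect of continuous and categorical predictors on various outcomes. Survival analysis is probably the most common analysis that I perform. Such analyses are often presented in a very convenient way in journals. Here is an example: I wonder if anyone has come across any publicly available function or package that: directly uses the regression object (coxph, lm, lmer, glm or whatever object you have); plots the effect of each predictor on a forest plot, or even allows plotting a selection of the predictors; for categorical predictors also displays the reference category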

2018-06-08 05:15:55
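
For the forest-plot question above, a minimal base-R sketch is to pull estimates and confidence intervals out of the fitted object with `coef()` and `confint()` and draw them by hand. The `mtcars` model below is a stand-in chosen for illustration, and the sketch does not display reference categories, which the question also asks for.

```r
# Fit an example model on a built-in data set (mtcars is a stand-in).
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Point estimates and 95% confidence intervals, intercept dropped.
est <- coef(fit)[-1]
ci  <- confint(fit)[-1, , drop = FALSE]
k   <- length(est)

# One row per coefficient: a square for the estimate, a line for the interval.
op <- par(mar = c(4, 8, 2, 1))
plot(est, seq_len(k), xlim = range(ci), pch = 15, yaxt = "n",
     xlab = "Estimate (95% CI)", ylab = "")
segments(ci[, 1], seq_len(k), ci[, 2], seq_len(k))
axis(2, at = seq_len(k), labels = names(est), las = 1)
abline(v = 0, lty = 2)  # reference line at "no effect"
par(op)
```

The same two extractor calls should work for other model classes that provide `coef()` and `confint()` methods; for a survival model the estimates would typically be exponentiated before plotting.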

R: How to visualize large and clumped scatter plot

status = sample(c(0, 1), 500, replace = TRUE)
value = rnorm(500)
plot(value)
smoothScatter(value)
I'm trying to make a scatterplot of value, but if I just plot it, the data are all clumped together and not very presentable. I've tried smoothScatter(), which makes the plot look a bit nicer, but I am wondering if there's a way to color code the values based on the

2018-06-08 05:14:53

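R: How to visualize large and clumped scatter plot

status = sample(c(0, 1), 500, replace = TRUE)
value = rnorm(500)
plot(value)
smoothScatter(value)
I'm trying to make a scatterplot of value, but if I just plot it, the data are all clumped together and not very presentable. I've tried smoothScatter(), which makes the plot look a bit nicer, but I am wondering if there's a way to color code the values based on the corresponding status? I'm trying to see whether there is a relationship between status and value. What is another way to present the data nicely? I tried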

2018-06-08 05:14:49
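
For the clumped-scatterplot question above, one simple approach is to map `status` to a semi-transparent colour and, as a second view, compare the two groups with a boxplot. This is a sketch; the seed, colours and axis labels are choices made here, not part of the original post.

```r
set.seed(42)
status <- sample(c(0, 1), 500, replace = TRUE)
value  <- rnorm(500)

# Colour each point by its status; semi-transparent colours reduce overplotting.
cols <- c(rgb(0, 0, 1, 0.4), rgb(1, 0, 0, 0.4))[status + 1]
plot(value, col = cols, pch = 16, xlab = "Index", ylab = "value")
legend("topright", legend = c("status 0", "status 1"),
       col = c("blue", "red"), pch = 16)

# Side-by-side distributions are often a clearer view of status vs value.
boxplot(value ~ status, xlab = "status", ylab = "value")
```

Because `value` is plotted against its index, the scatterplot mostly shows spread; the boxplot is usually the easier place to see whether the two status groups differ.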

Understanding dates and plotting a histogram with ggplot2 in R

Main question: I'm having trouble understanding why the handling of dates, labels and breaks does not work as I would have expected in R when trying to make a histogram with ggplot2. I'm looking for: a histogram of the frequency of my dates; tick marks centered under the matching bars; date labels in %Yb format; appropriate limits, with minimized empty space between edge of grid

2018-06-08 05:13:43

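Understanding dates and plotting a histogram with ggplot2 in R

Main question: I'm having trouble understanding why the handling of dates, labels and breaks does not work as I would have expected in R when trying to make a histogram with ggplot2. I'm looking for: a histogram of the frequency of my dates; tick marks centered under the matching bars; date labels in %Yb format; appropriate limits, with minimized empty space between the edge of the grid space and the outermost bars. I have uploaded my data to pastebin to make it reproducible. I have created several columns because I wasn't sure of the best way to do this: > dates <- read.cs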

2018-06-08 05:13:42
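
For the date-histogram question above, a base-R sketch that sidesteps the ggplot2 break handling is to bin the dates by month first and then draw the counts as bars, which puts each label under its bar. This swaps in `cut()` and `barplot()` for the ggplot2 histogram the question asks about; the dates are simulated, and the `"%Y-%b"` label format is an assumption about what the post's format string was meant to be.

```r
# Simulated dates standing in for the pastebin data, which is not available here.
set.seed(7)
dates <- as.Date("2012-01-01") + sample(0:364, 200, replace = TRUE)

# Bin by calendar month; factor levels stay in chronological order.
counts <- table(cut(dates, breaks = "month"))

# Level names are the first day of each month; reformat them for the axis.
labels <- format(as.Date(names(counts)), "%Y-%b")

# barplot() centres each label under its bar.
barplot(counts, names.arg = labels, las = 2, ylab = "Frequency")
```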

Functions available for Tufte boxplots in R?

I have some data that I've divided into enough groupings that standard boxplots look very crowded. Tufte has his own boxplots in which you basically drop all or half of the box, like this: Some sample data:
cw <- transform(ChickWeight, Time = cut(ChickWeight$Time,4) )
cw$Chick <- as.factor( sample(LETTERS[seq(3)], nrow(cw), replace=TRUE) )
levels(cw$Diet) <- c("Low Fat","Hi Fat"

2018-06-08 05:10:36

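Functions available for Tufte boxplots in R?

I have some data that I've divided into enough groupings that standard boxplots look very crowded. Tufte has his own boxplots in which you basically drop all or half of the box, like this: Some sample data:
cw <- transform(ChickWeight, Time = cut(ChickWeight$Time,4) )
cw$Chick <- as.factor( sample(LETTERS[seq(3)], nrow(cw), replace=TRUE) )
levels(cw$Diet) <- c("Low Fat","Hi Fat","Low Prot.","Hi Prot.")
I want, for each Diet * Time * Chick grouping, a box of weigh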

2018-06-08 05:10:34
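
For the Tufte-boxplot question above, one way to approximate the style in base R is to compute the usual boxplot statistics without plotting and then draw only the whiskers and a median dot. This is a sketch under simplifying assumptions: it groups `ChickWeight` by `Diet` alone rather than by Diet * Time * Chick, and it ignores outliers.

```r
# Five-number summaries per diet; rows of $stats are
# lower whisker, lower hinge, median, upper hinge, upper whisker.
b <- boxplot(weight ~ Diet, data = ChickWeight, plot = FALSE)
s <- b$stats
x <- seq_len(ncol(s))

# Empty frame, then draw only whiskers and a median dot: no box.
plot(NA, xlim = c(0.5, ncol(s) + 0.5), ylim = range(s), xaxt = "n",
     xlab = "Diet", ylab = "weight", bty = "n")
axis(1, at = x, labels = b$names, tick = FALSE)
segments(x, s[1, ], x, s[2, ])  # lower whisker
segments(x, s[4, ], x, s[5, ])  # upper whisker
points(x, s[3, ], pch = 16)     # median
```

The gap between the two line segments stands in for the box, which keeps the plot readable when there are many groups side by side.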

Why does X[Y] join of data.tables not allow a full outer join, or a left join?

This is a bit of a philosophical question about data.table join syntax. I am finding more and more uses for data.tables, but still learning... The join format X[Y] for data.tables is very concise, handy and efficient, but as far as I can tell, it only supports inner joins and right outer joins. To get a left or full outer join, I need to use merge : X[Y, nomatch = NA] -- all rows in Y -- ri

2018-06-08 05:08:29

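Why does X[Y] join of data.tables not allow a full outer join, or a left join?

This is a bit of a philosophical question about data.table join syntax. I am finding more and more uses for data.tables, but still learning... The join format X[Y] for data.tables is very concise, handy and efficient, but as far as I can tell, it only supports inner joins and right outer joins. To get a left or full outer join, I need to use merge:
X[Y, nomatch = NA] -- all rows in Y -- right outer join (default)
X[Y, nomatch = 0] -- only rows with matches in both X and Y -- inner join
merge(X, Y, all = TRUE) -- all rows from both X and Y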

2018-06-08 05:08:28
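
The join variants listed in the data.table question above can be tried on two small keyed tables. This is a sketch: the tables, column names and values are invented for illustration, and writing the left join as `Y[X]` is one common idiom rather than something stated in the excerpt.

```r
library(data.table)

# Two small keyed tables that overlap on ids 2 and 3 (hypothetical data).
X <- data.table(id = 1:3, x = c("a", "b", "c"), key = "id")
Y <- data.table(id = 2:4, y = c("B", "C", "D"), key = "id")

right_join <- X[Y]                     # all rows of Y (ids 2, 3, 4)
inner_join <- X[Y, nomatch = 0]        # only ids present in both (2, 3)
left_join  <- Y[X]                     # all rows of X (ids 1, 2, 3)
full_join  <- merge(X, Y, all = TRUE)  # every id from either table (1 to 4)

print(full_join)
```

Swapping the roles of the two tables turns the right outer join into a left one, which is why only the full outer join needs `merge()` here.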

Combine two data frames with different number of rows in R

This question already has an answer here: How to join (merge) data frames (inner, outer, left, right)? (12 answers)
df1 <- data.frame(wpt = c(1, "meditate", "meditate", 2,3,"meditate"), ID = c(1235, 4562, 0928,6351,3826,0835))
df1$wpt <- as.character(df1$wpt)
df2 <- data.frame(wpt = c(1,2,3), fuel = c(1235, 4562, 0928), distance = c(2,3,4)

2018-06-08 05:05:23

Combine two data frames with different number of rows in R

This question already has an answer here: How to join (merge) data frames (inner, outer, left, right)? (12 answers)
df1 <- data.frame(wpt = c(1, "meditate", "meditate", 2,3,"meditate"), ID = c(1235, 4562, 0928,6351,3826,0835))
df1$wpt <- as.character(df1$wpt)
df2 <- data.frame(wpt = c(1,2,3), fuel = c(1235, 4562, 0928), distance = c(2,3,4))
df2$wpt <- as.char

2018-06-08 05:05:22
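
For the two data frames in the question above, a left join with base `merge()` keeps every row of `df1` and fills the unmatched rows with `NA`. This is a sketch that assumes the goal is to keep all six rows of `df1`; the excerpt is cut off before the desired output is shown. Note that R reads a literal such as 0928 as the number 928, so the leading zeros are dropped below.

```r
# Data from the question; stringsAsFactors = FALSE keeps wpt as character.
df1 <- data.frame(wpt = c(1, "meditate", "meditate", 2, 3, "meditate"),
                  ID  = c(1235, 4562, 928, 6351, 3826, 835),
                  stringsAsFactors = FALSE)
df2 <- data.frame(wpt = as.character(c(1, 2, 3)),
                  fuel = c(1235, 4562, 928),
                  distance = c(2, 3, 4),
                  stringsAsFactors = FALSE)

# all.x = TRUE: keep every row of df1, with NA where df2 has no matching wpt.
combined <- merge(df1, df2, by = "wpt", all.x = TRUE)
print(combined)
```

`merge()` sorts the result by the key column, so the original row order of `df1` is not preserved.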

match two columns with two other columns

I have several rows of data (tab separated). I want to find the row which matches elements from two columns (3rd & 4th) in each row with two other columns (10th & 11th) . For example, in row 1 , 95428891 & 95443771 in column 3 & 4 matches elements in columns 10 & 11 in row 19 . Similarly the reciprocal is also true. Elements in columns 3 & 4 in the 19th row also match

2018-06-08 05:04:21

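match two columns with two other columns

I have several rows of data (tab separated). I want to find the row which matches elements from two columns (3rd & 4th) in each row with two other columns (10th & 11th). For example, in row 1, 95428891 & 95443771 in columns 3 & 4 match the elements in columns 10 & 11 in row 19. Similarly the reciprocal is also true. Elements in columns 3 & 4 in the 19th row also match the elements in columns 10 & 11 in row 1. I need to be able to iterate over each row and output the row index of the corresponding match. Sometimes only one column may match rather than both (because there are sometimes duplicate numbers), but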

2018-06-08 05:04:21
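
For the column-pair matching question above, one vectorised approach is to paste each pair into a single key and look the keys up with `match()`, which avoids an explicit loop over rows. This is a sketch on a made-up three-row table: only the pair 95428891 / 95443771 comes from the excerpt, and the column names `c3`, `c4`, `c10`, `c11` are placeholders for the 3rd, 4th, 10th and 11th columns.

```r
# Hypothetical three-row table; only the first pair of numbers is from the excerpt.
d <- data.frame(
  c3  = c(95428891, 100, 200),
  c4  = c(95443771, 150, 250),
  c10 = c(200, 999, 95428891),
  c11 = c(250, 999, 95443771)
)

# Build one key per row for each pair of columns.
key_a <- paste(d$c3, d$c4)
key_b <- paste(d$c10, d$c11)

# For each row, the index of the row whose (c10, c11) equals its (c3, c4);
# NA means no row matches on both columns.
d$match_row <- match(key_a, key_b)
print(d)
```

Because both columns go into the key, a row that matches on only one of the two columns is reported as `NA` rather than as a match.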

Efficient alternatives to merge for larger data.frames R

I am looking for an efficient (both computer resource wise and learning/implementation wise) method to merge two larger (size>1 million / 300 KB RData file) data frames. "merge" in base R and "join" in plyr appear to use up all my memory, effectively crashing my system. Example: load test data frame and try test.merged<-merge(test, test) or test.merged<-join(

2018-06-08 05:02:17

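Efficient alternatives to merge for larger data.frames R

I am looking for an efficient (both computer resource wise and learning/implementation wise) method to merge two larger (size>1 million / 300 KB RData file) data frames. "merge" in base R and "join" in plyr appear to use up all my memory, effectively crashing my system. Example: load test data frame and try test.merged<-merge(test, test) or test.merged<-join(test, test, type="all"). The following post provides a list of merge and alternatives: How to join (merge) data frames (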

2018-06-08 05:02:17