Plotting Data with Multiple Conditions on a Single Chart

I am attempting to make a plot using ggplot2 with side by side bars generated from certain conditions that can be calculated from the data. I suspect the problem is formatting my data properly so that ggplot will give me what I want. I can't for the life of me get it right though.

What I have is a data frame with one row for each time a student takes a course at a school. The variables of interest are Student.ID, Course.ID, Session, Fiscal.Year, and Facility. Each row is an occurrence of a student taking a course and tells what course they took, where they took it, etc. As far as I know, this is what's required for the data to be in long form (correct me if I'm wrong). The only field with possible NA values is Facility, but I plan to exclude those rows from the plot anyway, so you can treat the data frame as being completely filled.

What I want to do is produce a plot showing, by fiscal year, how many courses had <= 2 students, how many had < 4 students, how many had <= 4 students, and how many courses were offered in total. (Note: When I talk about how many courses were offered, I'm taking into account that each course may be offered multiple times, and each time it's offered it has a session number associated with it. The tricky part is that the session numbers are not unique across courses, so an offering is identified by its course and session together. I hope that makes sense, and I can try to clarify more if needed.)

I envision the final product being multiple charts faceted on the locations, with Fiscal.Year on the x-axis and the number of courses/sessions on the y-axis. For each FY in the chart, I want different colored bars placed side by side showing the numbers of <=2, <4, <=4, and total courses offered for that FY at that location. Think of a grouped bar chart with bars for "Income, Expense, Loans", only instead I want "<=2, <4, <=4, Total" (they would also be ascending from left to right, since each category includes the previous one).

Here is some sample data to work with (typed as CSV since I can't just copy the head of the file). I've excluded the Facility column because faceting by that is easy and we can just assume one FY for a test example I think. For reference, it should have 3 courses with <=2 students, 5 courses with < 4, and 6 with <= 4. The total number of courses offered in this sample set is 6.

ID,CourseID,Session,Fiscal.Year
101,1,1,FY13
102,1,1,FY13
103,1,1,FY13
104,1,1,FY13
101,2,1,FY13
102,2,1,FY13
103,2,1,FY13
101,2,2,FY13
102,2,2,FY13
103,2,2,FY13
101,3,1,FY13
102,3,1,FY13
101,3,2,FY13
102,3,2,FY13
101,3,3,FY13
102,3,3,FY13
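To check the expected numbers quoted above, the sample can be read straight from text and counted with base R alone. This is only a rough sketch: the names `csv_text` and `counts` are made up for illustration, and it assumes the first data row was meant to read `101,1,1,FY13`.

```r
# Sample data from the question, one record per line
# (assumption: the first row is 101,1,1,FY13).
csv_text <- "ID,CourseID,Session,Fiscal.Year
101,1,1,FY13
102,1,1,FY13
103,1,1,FY13
104,1,1,FY13
101,2,1,FY13
102,2,1,FY13
103,2,1,FY13
101,2,2,FY13
102,2,2,FY13
103,2,2,FY13
101,3,1,FY13
102,3,1,FY13
101,3,2,FY13
102,3,2,FY13
101,3,3,FY13
102,3,3,FY13"

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Number of students in each offering, i.e. each
# (course, session, fiscal year) combination.
counts <- aggregate(ID ~ CourseID + Session + Fiscal.Year, data = df, FUN = length)
names(counts)[names(counts) == "ID"] <- "n"

sum(counts$n <= 2)  # offerings with <= 2 students
sum(counts$n < 4)   # offerings with <  4 students
sum(counts$n <= 4)  # offerings with <= 4 students
nrow(counts)        # total offerings
```

Grouping on CourseID and Session together is what handles the non-unique session numbers: session 1 of course 1 and session 1 of course 2 are counted as separate offerings.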

I have tried:

  • Creating a new data frame using ddply with columns Course.ID, Session, FY, Facility, and Count of Students. Then I created a new column called "TwoLess", which has a 1 if the count is <=2 and 0 otherwise. (I repeated this process for the other conditions, creating similar new columns for each.) Using the ggplot code below I was able to get a faceted plot for only one of the conditions (i.e. only <=2 students), but wasn't able to get them to combine. I believe the following is equivalent to the code I used, changed to reflect my test set above:

        ggplot(na.omit(df), aes(y = TwoLess, x = Fiscal.Year)) + geom_bar(stat = 'identity') + facet_wrap(~Facility)

    I am thinking this approach is heavily flawed and I'm missing out on some of the "niceness" of having data in long form, since that's what ggplot wants as I understand it.

    What is the best way to approach plotting this in ggplot?

    It's also worth mentioning that while I have access to some of the more popular packages like ggplot2, plyr, reshape2, I do not have the ability to load all packages so I would prefer a solution that uses the above packages (or any of their dependencies). It shouldn't be that large of a restriction, I don't think.


    Would something like this help?

    Extending your data

    > dput(df)
    structure(list(ID = c(101L, 102L, 103L, 104L, 101L, 102L, 103L, 
    101L, 102L, 103L, 101L, 102L, 101L, 102L, 101L, 102L, 101L, 102L, 
    103L, 104L, 101L, 102L, 103L, 101L, 102L, 103L, 101L, 102L, 101L, 
    102L, 101L, 102L, 101L, 102L, 103L, 104L, 101L, 102L, 103L, 101L, 
    102L, 103L, 101L, 102L, 101L, 102L, 101L, 102L), CourseID = c(1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), 
        Session = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
        2L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 
        1L, 2L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
        1L, 1L, 2L, 2L, 3L, 3L), Fiscal.Year = c("FY13", "FY13", 
        "FY13", "FY13", "FY13", "FY13", "FY13", "FY13", "FY13", "FY13", 
        "FY13", "FY13", "FY13", "FY13", "FY13", "FY13", "FY14", "FY14", 
        "FY14", "FY14", "FY14", "FY14", "FY14", "FY14", "FY14", "FY14", 
        "FY14", "FY14", "FY14", "FY14", "FY14", "FY14", "FY15", "FY15", 
        "FY15", "FY15", "FY15", "FY15", "FY15", "FY15", "FY15", "FY15", 
        "FY15", "FY15", "FY15", "FY15", "FY15", "FY15")), .Names = c("ID", 
    "CourseID", "Session", "Fiscal.Year"), class = "data.frame", row.names = c(NA, 
    -48L))
    
    df
        ID CourseID Session Fiscal.Year
    1  101        1       1        FY13
    2  102        1       1        FY13
    3  103        1       1        FY13
    4  104        1       1        FY13
    5  101        2       1        FY13
    6  102        2       1        FY13
    7  103        2       1        FY13
    8  101        2       2        FY13
    9  102        2       2        FY13
    10 103        2       2        FY13
    11 101        3       1        FY13
    12 102        3       1        FY13
    13 101        3       2        FY13
    14 102        3       2        FY13
    15 101        3       3        FY13
    16 102        3       3        FY13
    17 101        1       1        FY14
    18 102        1       1        FY14
    19 103        1       1        FY14
    20 104        1       1        FY14
    21 101        2       1        FY14
    22 102        2       1        FY14
    23 103        2       1        FY14
    24 101        2       2        FY14
    25 102        2       2        FY14
    26 103        2       2        FY14
    27 101        3       1        FY14
    28 102        3       1        FY14
    29 101        3       2        FY14
    30 102        3       2        FY14
    31 101        3       3        FY14
    32 102        3       3        FY14
    33 101        1       1        FY15
    34 102        1       1        FY15
    35 103        1       1        FY15
    36 104        1       1        FY15
    37 101        2       1        FY15
    38 102        2       1        FY15
    39 103        2       1        FY15
    40 101        2       2        FY15
    41 102        2       2        FY15
    42 103        2       2        FY15
    43 101        3       1        FY15
    44 102        3       1        FY15
    45 101        3       2        FY15
    46 102        3       2        FY15
    47 101        3       3        FY15
    48 102        3       3        FY15
    

    Summarise it with dplyr

    library(dplyr)
    # count the students in each course session
    d1 <- df %>%
      group_by(CourseID, Session, Fiscal.Year) %>%
      summarise(n = n())
    

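    And again, this time counting the sessions per fiscal year that meet each condition

    d2 <- d1 %>%
      group_by(Fiscal.Year) %>%
      summarise(d1 = sum(n <= 2),  # sessions with <= 2 students
                d2 = sum(n <  4),  # sessions with <  4 students
                d3 = sum(n <= 4)   # sessions with <= 4 students
      )
    

    Then melt the result to long form to plot it with ggplot2

    library(reshape2)
    library(ggplot2)
    d3 <- melt(as.data.frame(d2), id.vars = "Fiscal.Year")
    ggplot(d3, aes(Fiscal.Year, value, fill = variable)) +
      geom_bar(stat = 'identity', position = 'dodge')
    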

    Someone will surely provide a cleverer option. I'm tired and going to bed now.
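Since the question mentions that only ggplot2, plyr, and reshape2 are available, here is a rough sketch of the same idea without dplyr, with a Total bar added. It is an untested-on-real-data illustration, not a definitive solution: the names `sizes`, `one_year`, `offerings`, `by_year`, `long`, `le2`, `lt4`, `le4`, and `total` are made up for this example, and the data are rebuilt from the sample so that the block runs on its own.

```r
library(plyr)
library(reshape2)
library(ggplot2)

# Rebuild the extended sample: six offerings per fiscal year with
# 4, 3, 3, 2, 2, 2 students (assumed to match the dput output above).
sizes <- data.frame(CourseID = c(1, 2, 2, 3, 3, 3),
                    Session  = c(1, 1, 2, 1, 2, 3),
                    students = c(4, 3, 3, 2, 2, 2))
one_year <- do.call(rbind, lapply(seq_len(nrow(sizes)), function(i) {
  data.frame(ID = 100 + seq_len(sizes$students[i]),
             CourseID = sizes$CourseID[i],
             Session = sizes$Session[i])
}))
df <- do.call(rbind, lapply(c("FY13", "FY14", "FY15"), function(fy) {
  cbind(one_year, Fiscal.Year = fy)
}))

# One row per offering (course + session + fiscal year) with its student count.
offerings <- ddply(df, .(Fiscal.Year, CourseID, Session),
                   plyr::summarise, n = length(ID))

# Per fiscal year: how many offerings satisfy each condition, plus the total.
by_year <- ddply(offerings, .(Fiscal.Year), plyr::summarise,
                 le2   = sum(n <= 2),
                 lt4   = sum(n < 4),
                 le4   = sum(n <= 4),
                 total = length(n))

# Long form: one row per (fiscal year, condition) pair, which is what
# ggplot needs in order to map the condition to the fill colour.
long <- melt(by_year, id.vars = "Fiscal.Year",
             variable.name = "condition", value.name = "courses")
long$condition <- factor(long$condition,
                         levels = c("le2", "lt4", "le4", "total"),
                         labels = c("<=2", "<4", "<=4", "Total"))

p <- ggplot(long, aes(x = Fiscal.Year, y = courses, fill = condition)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

With the real data, Facility would presumably be added to both ddply groupings and to `id.vars`, followed by `+ facet_wrap(~ Facility)` on the plot. The factor levels fix the left-to-right order of the bars within each fiscal year.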
