Creating a Pareto chart with ggplot2 and R

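I have been struggling with how to make a Pareto chart in R using the ggplot2 package. In many cases when making a bar chart or histogram, we want the items ordered along the X axis. In a Pareto chart we want the items sorted in descending order of their value on the Y axis. Is there a way to get ggplot to plot items ordered by their Y-axis value? I tried sorting the data frame first, but ggplot seems to reorder them.

Example: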

val <- read.csv("http://www.cerebralmastication.com/wp-content/uploads/2009/11/val.txt")
val<-with(val, val[order(-Value), ])
p <- ggplot(val)
p + geom_bar(aes(State, Value, fill=variable), stat = "identity", position="dodge") + scale_fill_brewer(palette = "Set1")

The data frame val is sorted, but the output looks like this:

(image: http://www.cerebralmastication.com/wp-content/uploads/2009/11/exp.png)

Hadley correctly pointed out that this produces a much better graphic for showing actual versus predicted values:

ggplot(val, aes(State, Value)) + geom_bar(data = subset(val, variable == "estimate"), stat = "identity", fill = "grey70") +
  geom_crossbar(data = subset(val, variable == "actual"), aes(ymin = Value, ymax = Value))

which returns:

(image: http://www.cerebralmastication.com/wp-content/uploads/2009/11/exp1.png)

But it is still not a Pareto chart. Any tips?

7 Answers

  • 1

    The bars in ggplot2 are ordered by the order of the levels in the factor.

    # unique() keeps each State once, since every State has one row per variable
    val$State <- with(val, factor(State, levels = unique(State[order(-Value)])))
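
A minimal sketch of the level-ordering idea, using made-up data rather than the val.txt file from the question (the state names and values below are assumptions for illustration only). It takes the level order from the "actual" rows alone, so each state appears exactly once in the levels:

```r
library(ggplot2)

# Hypothetical data: two rows per state, mimicking the layout in the question.
val <- data.frame(
  State    = rep(c("AL", "CA", "NY", "TX"), each = 2),
  variable = rep(c("actual", "estimate"), times = 4),
  Value    = c(0.20, 0.25, 0.90, 0.80, 0.60, 0.65, 0.40, 0.35)
)

# Use only the "actual" rows to decide the order, largest Value first.
actual <- val[val$variable == "actual", ]
val$State <- factor(val$State, levels = actual$State[order(-actual$Value)])

# Bars are drawn in level order, not in row order or alphabetical order.
p <- ggplot(val, aes(State, Value, fill = variable)) +
  geom_col(position = "dodge")
p
```

Sorting the rows of the data frame has no effect on the plot; only the order of `levels(val$State)` does.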
    
  • 23

    Subset and sort your data:

    valact <- subset(val, variable=='actual')
    valsort <- valact[ order(-valact[,"Value"]),]
    

    From there it is just a standard barplot() with a very manual cumulative function on top:

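    op <- par(mar=c(3,3,3,3))   # set new margins, remembering the old ones
    bp <- barplot(valsort[ , "Value"], ylab="", xlab="", ylim=c(0,1),
                  names.arg=as.character(valsort[,"State"]), main="How's that?")
    # bp holds the bar midpoints; overlay the cumulative share of the total
    lines(bp, cumsum(valsort[,"Value"])/sum(valsort[,"Value"]), col='red')
    axis(4)
    box()
    par(op)                     # restore the original graphical parameters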
    

    It should look like this:

    (image: http://dirk.eddelbuettel.com/misc/jdlong_pareto.png)

    It does not even need an overplotting trick, because lines() happily annotates the initial plot.

  • 3

    A traditional Pareto chart in ggplot2.

    Developed after reading Cano, E. L., Moguerza, J. M., & Redchuk, A. (2012). Six Sigma with R. (G. Robert, K. Hornik, & G. Parmigiani, Eds.) Springer.

    library(ggplot2);library(grid)
    
    counts  <- c(80, 27, 66, 94, 33)
    defects <- c("price code", "schedule date", "supplier code", "contact num.", "part num.")
    dat <- data.frame(count = counts, defect = defects, stringsAsFactors=FALSE )
    dat <- dat[order(dat$count, decreasing=TRUE),]
    dat$defect <- factor(dat$defect, levels=dat$defect)
    dat$cum <- cumsum(dat$count)
    count.sum<-sum(dat$count)
    dat$cum_perc<-100*dat$cum/count.sum
    
    p1<-ggplot(dat, aes(x=defect, y=cum_perc, group=1))
    p1<-p1 + geom_point(aes(colour=defect), size=4) + geom_path()
    
    p1<-p1+ ggtitle('Pareto Chart')+ theme(axis.ticks.x = element_blank(), axis.title.x = element_blank(),axis.text.x = element_blank())
    p1<-p1+theme(legend.position="none")
    
    p2<-ggplot(dat, aes(x=defect, y=count,colour=defect, fill=defect))
    p2<- p2 + geom_bar(stat="identity")  # y is already a count, so plot it as-is
    
    p2<-p2+theme(legend.position="none")
    
    plot.new()
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(2, 1)))
    print(p1, vp = viewport(layout.pos.row = 1,layout.pos.col = 1))
    print(p2, vp = viewport(layout.pos.row = 2,layout.pos.col = 1))
    
  • 15

    Take a simple example:

    > data
        PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9    PC10 
    0.29056 0.23833 0.11003 0.05549 0.04678 0.03788 0.02770 0.02323 0.02211 0.01925
    

    barplot(data) gets it right.

    The ggplot equivalent "should be": qplot(x=names(data), y=data, geom='bar')

    However, this incorrectly reorders the bars alphabetically, because that is how levels(factor(names(data))) is sorted.

    Solution: qplot(x=factor(names(data), levels=names(data)), y=data, geom='bar')

    Phew!
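
qplot() is deprecated in recent ggplot2 releases, so the same fix can be sketched with ggplot() directly. This is only an illustration: the data frame and its column names (`component`, `share`) are assumptions, while the values are the ones printed in the example above.

```r
library(ggplot2)

# Named vector with the values shown above.
data <- c(PC1 = 0.29056, PC2 = 0.23833, PC3 = 0.11003, PC4 = 0.05549,
          PC5 = 0.04678, PC6 = 0.03788, PC7 = 0.02770, PC8 = 0.02323,
          PC9 = 0.02211, PC10 = 0.01925)

# Fix the level order to the order of the vector, so PC10 stays last
# instead of being sorted between PC1 and PC2.
df <- data.frame(
  component = factor(names(data), levels = names(data)),
  share     = unname(data)
)

# geom_col() plots the y values as given (no counting step).
p <- ggplot(df, aes(component, share)) + geom_col()
p
```

The key step is unchanged from the qplot() version: the factor is built with an explicit `levels` argument.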

  • 7

    Also, see the package qcc, which has a function pareto.chart(). It looks like it uses base graphics too, so go ahead and start your bounty for a ggplot2 solution :-)

  • 4

    To simplify, let's consider only the estimates.

    estimates <- subset(val, variable == "estimate")
    

    First, we reorder the factor levels so that State is plotted in decreasing order of Value.

    estimates$State <- with(estimates, reorder(State, -Value))
    

    Likewise, we reorder the dataset and calculate the cumulative value.

    estimates <- estimates[order(estimates$Value, decreasing = TRUE),]
    estimates$cumulative <- cumsum(estimates$Value)
    

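    Now we are ready to draw the plot. The trick to getting the line and the bars on the same axis is to convert the State variable (a factor) to numeric.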

    p <- ggplot(estimates, aes(State, Value)) + 
      geom_bar(stat = "identity") +
      geom_line(aes(as.numeric(State), cumulative))
    p
    

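    As stated in the question, trying to draw two Pareto plots of the variable groups right next to each other is not very easy. If you want multiple Pareto plots, you would probably be better off using faceting.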
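
The steps in this answer can be combined into one chart with a percentage scale on the right. The following is a sketch under stated assumptions, not the answerer's code: the defect categories and counts are invented, and it uses `group = 1` in place of the as.numeric() conversion to draw the line across a discrete axis, plus `sec_axis()` for the second scale.

```r
library(ggplot2)

# Hypothetical defect counts, for illustration only.
dat <- data.frame(
  category = c("scratch", "dent", "stain", "crack", "other"),
  count    = c(10, 50, 3, 30, 7)
)

# Sort descending, freeze that order in the factor levels,
# then accumulate the counts.
dat <- dat[order(dat$count, decreasing = TRUE), ]
dat$category  <- factor(dat$category, levels = dat$category)
dat$cum_count <- cumsum(dat$count)
total <- sum(dat$count)

p <- ggplot(dat, aes(x = category)) +
  geom_col(aes(y = count), fill = "grey70") +
  # group = 1 joins the points into a single line on a discrete x axis
  geom_line(aes(y = cum_count, group = 1), colour = "red") +
  geom_point(aes(y = cum_count), colour = "red") +
  # The right-hand axis rescales cumulative counts to a percentage of the total.
  scale_y_continuous(
    name = "Count",
    sec.axis = sec_axis(~ 100 * . / total, name = "Cumulative %")
  )
p
```

Because the secondary axis is only a rescaling of the primary one, the bars and the cumulative line share the same underlying count scale.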

  • 0

    freqplot = function(x, by = NULL, right = FALSE)
    {
    if(is.null(by)) stop('A value for "by" must be specified.')
    breaks = seq(min(x), max(x), by = by )
    ecd = ecdf(x)
    den = ecd(breaks)
    table = table(cut(x, breaks = breaks, right = right))
    table = table/sum(table)
    
    intervs = factor(names(table), levels = names(table))
    freq = as.numeric(table/sum(table))
    acum = as.numeric(cumsum(table))
    
    normalize.vec = function(x){
      (x - min(x))/(max(x) - min(x))
    }
    
    dados = data.frame(classe = intervs, freq = freq, acum = acum, acum_norm = normalize.vec(acum))
    p = ggplot(dados) + 
      geom_bar(aes(classe, freq, fill = classe), stat = 'identity') +
      geom_point(aes(classe, acum_norm, group = '1'), shape = I(1), size = I(3), colour = 'gray20') +
      geom_line(aes(classe, acum_norm, group = '1'), colour = I('gray20'))
    
    p
    }
    
