Overlaying histograms with ggplot2 in R

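I'm new to R and I'm trying to plot 3 histograms onto the same graph. Everything works fine, but my problem is that you can't see where 2 histograms overlap; they look rather cut off: (image: Histogram)

When I make a density plot, it looks perfect: each curve is surrounded by a black outline, and the colors look different where the curves overlap: (image: Density Plot)

Can someone tell me whether something similar can be achieved with the histograms in the first picture? This is the code I'm using: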

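```r
library(ggplot2)

# Read the three data sets (paths elided in the original question)
lowf0 <- read.csv(....)
mediumf0 <- read.csv(....)
highf0 <- read.csv(....)

# Label each data set, then stack them into one data frame
lowf0$utt <- 'low f0'
mediumf0$utt <- 'medium f0'
highf0$utt <- 'high f0'
histogram <- rbind(lowf0, mediumf0, highf0)

ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
```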

Thanks in advance for any helpful tips!

3 Answers

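Your current code:

```r
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
```

tells ggplot to construct one histogram from all the values in f0 and then color the bars of that single histogram according to the variable utt.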

What you want is three separate histograms, with alpha blending so that they are visible through each other. So you probably want three separate calls to geom_histogram, each with its own data frame and fill:

```r
ggplot(histogram, aes(f0)) +
    geom_histogram(data = lowf0, fill = "red", alpha = 0.2) +
    geom_histogram(data = mediumf0, fill = "blue", alpha = 0.2) +
    geom_histogram(data = highf0, fill = "green", alpha = 0.2)
```

Here's a concrete example with output:

```r
library(ggplot2)

dat <- data.frame(xx = c(runif(100, 20, 50), runif(100, 40, 80), runif(100, 0, 30)),
                  yy = rep(letters[1:3], each = 100))

ggplot(dat, aes(x = xx)) +
    geom_histogram(data = subset(dat, yy == 'a'), fill = "red", alpha = 0.2) +
    geom_histogram(data = subset(dat, yy == 'b'), fill = "blue", alpha = 0.2) +
    geom_histogram(data = subset(dat, yy == 'c'), fill = "green", alpha = 0.2)
```

This produces three semi-transparent histograms overlaid on the same axes (the output image was not preserved).

Edit: fixed a typo; you want fill, not colour.
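With more groups, writing one geom_histogram call per group gets repetitive. A minimal sketch, assuming the same simulated dat as above and a hypothetical fills lookup vector (not part of the original answer), builds the per-group layers with lapply(); ggplot2 accepts a list of layers on the right-hand side of +:

```r
library(ggplot2)

# Same simulated data as the example above
dat <- data.frame(xx = c(runif(100, 20, 50), runif(100, 40, 80), runif(100, 0, 30)),
                  yy = rep(letters[1:3], each = 100))

# Hypothetical lookup table: one fill colour per group level
fills <- c(a = "red", b = "blue", c = "green")

# One geom_histogram layer per group, each with its own subset and fill
hist_layers <- lapply(names(fills), function(g) {
    geom_histogram(data = dat[dat$yy == g, ], fill = fills[[g]], alpha = 0.2, bins = 30)
})

# Adding a list of layers adds each layer in turn
p <- ggplot(dat, aes(x = xx)) + hist_layers
p
```

Passing bins = 30 explicitly only silences the "using bins = 30" message; adjust it to suit the data.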

Answered 2024-05-02T14:27:42+08:00

100
Using @joran's sample data:
```
ggplot(dat, aes(x=xx, fill=yy)) + geom_histogram(alpha=0.2, position="identity")
```
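Note that the default position for geom_histogram is "stack", which piles the groups on top of each other; position = "identity" draws each group from zero so they overlap.

See "Position adjustments" on this page: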

docs.ggplot2.org/current/geom_histogram.html
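One way to confirm the difference between the two positions is to inspect the computed bar extents with ggplot2's layer_data(). The sketch below reuses the simulated data from the first answer (the fixed seed is an addition for reproducibility): with the default stacking, some bars start above zero because they sit on another group's bars, while with position = "identity" every bar starts at zero.

```r
library(ggplot2)

set.seed(42)
dat <- data.frame(xx = c(runif(100, 20, 50), runif(100, 40, 80), runif(100, 0, 30)),
                  yy = rep(letters[1:3], each = 100))

base <- ggplot(dat, aes(x = xx, fill = yy))

# layer_data() returns the data computed for a layer, including ymin/ymax of each bar
stacked  <- layer_data(base + geom_histogram(bins = 30))
overlaid <- layer_data(base + geom_histogram(bins = 30, position = "identity"))

any(stacked$ymin > 0)    # TRUE: stacked groups start on top of lower groups
all(overlaid$ymin == 0)  # TRUE: every group starts at zero, so they overlap
```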

187

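Although plotting multiple/overlapping histograms in ggplot2 takes only a few lines, the results are not always satisfactory. Proper use of borders and coloring is needed so that the eye can differentiate between histograms.

The following functions balance border colors, opacities, and superimposed density plots so that the viewer can tell the distributions apart.

Single histogram:

```r
library(ggplot2)  # after_stat() and linewidth need ggplot2 >= 3.4.0

plot_histogram <- function(df, feature) {
    # .data[[feature]] looks up the column by name (replaces eval(parse(text = ...)))
    plt <- ggplot(df, aes(x = .data[[feature]])) +
        geom_histogram(aes(y = after_stat(density)), alpha = 0.7, fill = "#33AADE", color = "black") +
        geom_density(alpha = 0.3, fill = "red") +
        # dashed line at the mean of the feature
        geom_vline(xintercept = mean(df[[feature]]), color = "black", linetype = "dashed", linewidth = 1) +
        labs(x = feature, y = "Density")
    print(plt)
}
```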
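Mapping y to density rather than count is what lets the histogram share a y axis with geom_density(): the bar areas then sum to 1, just like the area under a density curve. A quick check on iris, assuming ggplot2 3.3.0 or later for after_stat() (the current spelling of ..density..):

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Width)) +
    geom_histogram(aes(y = after_stat(density)), bins = 30)

# layer_data() exposes the values stat_bin computed for each bin
d <- layer_data(p, 1)

# Each bar's area is density * bin width; together they should sum to 1
total_area <- sum(d$density * (d$xmax - d$xmin))
total_area
```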

Multiple histograms:

```r
plot_multi_histogram <- function(df, feature, label_column) {
    # .data[[...]] looks up columns by name (replaces eval(parse(text = ...)))
    plt <- ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
        # position = "identity" overlays the groups instead of stacking them
        geom_histogram(aes(y = after_stat(density)), alpha = 0.7, position = "identity", color = "black") +
        geom_density(alpha = 0.7) +
        # dashed line at the overall mean of the feature (across all categories)
        geom_vline(xintercept = mean(df[[feature]]), color = "black", linetype = "dashed", linewidth = 1) +
        labs(x = feature, y = "Density")
    plt + guides(fill = guide_legend(title = label_column))
}
```

Usage:

Simply pass your data frame into the functions above along with the required arguments:

```r
plot_histogram(iris, 'Sepal.Width')
plot_multi_histogram(iris, 'Sepal.Width', 'Species')
```

The extra parameter in plot_multi_histogram is the name of the column that holds the category labels.

The benefit is more obvious with a data frame built from distributions with many different means:

```r
# Six samples of 1000 normal draws with means 1 to 6, labelled A to F
# (built in a loop so that no variable shadows base functions such as c())
many_distros <- do.call(rbind, lapply(1:6, function(i) {
    data.frame(n = rnorm(1000, mean = i), category = LETTERS[i])
}))
```

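Pass the data frame in as before, widening the chart with options (repr.plot.width and repr.plot.height take effect in a Jupyter notebook running the IRkernel):

```r
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, 'n', 'category')
```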
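Note that the dashed line in plot_multi_histogram marks the mean of the feature over the whole data frame, not one mean per category. If per-category means are wanted, one possible sketch (an extension, not part of the functions above) precomputes them with aggregate() and hands them to geom_vline() as separate data. It assumes ggplot2 3.4.0 or later for linewidth:

```r
library(ggplot2)

# One row per species with its mean sepal width
means <- aggregate(Sepal.Width ~ Species, data = iris, FUN = mean)

p <- ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7, position = "identity",
                   color = "black", bins = 30) +
    geom_density(alpha = 0.7) +
    # geom_vline() does not inherit the plot's aes, so map only what it needs
    geom_vline(data = means, aes(xintercept = Sepal.Width, color = Species),
               linetype = "dashed", linewidth = 1) +
    labs(y = "Density")
p
```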
