
Stable color mapping for a categorical variable with ggplot2 (rpy2) in Python


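My problem is similar to, but not solved by, the top answer to this question. In addition, I am using ggplot2 from Python via the rpy2 package, which adds a further complication.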

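I have a number of different time series (with varying names) that I want to plot alongside a data series (which is constant). I want the data series to be the same color in every plot, and I don't care about the colors of the other series. However, if I let ggplot2 assign colors automatically, it assigns them in alphabetical order of the series names, so the color mapping is unstable: it depends on whether a series name sorts before or after "data". (See the code below.)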

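Note that the series names ('a_model' and 'e_model' in the code example) are not all known in advance, so I can't simply create a manual color scale containing every possible series name. A plot may also include the data series plus several other series. I only care about keeping the color of the data series constant.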

from rpy2 import robjects
from rpy2.robjects.lib import grid
from rpy2.robjects.packages import importr
import rpy2.robjects.lib.ggplot2 as ggplot2
from rpy2.robjects import pandas2ri
import pandas as pd
pandas2ri.activate()             

###Input data###
plot_data={}  
plot_data.update({'a_model':[0.217,0.226,0.238,0.253,0.272,0.278,0.283,0.29,0.296,0.298]})
plot_data.update({'data':[0.255,0.226,0.241,0.19,0.264,0.302,0.291,0.26,0.218,0.221]})
plot_data.update({'mos_since_start':[1,2,3,4,5,6,7,8,9,10]})

###Plotting Function###
def plot(plot_data, filename):
    # Build a wide DataFrame, then melt it to long form: one row per (month, series).
    df = pd.DataFrame(plot_data)
    fig = pd.melt(df, id_vars=['mos_since_start'])
    pp = ggplot2.ggplot(fig) + \
         ggplot2.aes_string(x='mos_since_start',
         y='value', group='variable', colour='variable', shape='variable', linetype='variable') + \
         ggplot2.geom_line() + ggplot2.geom_point()
    # Save the plot through R's ggsave.
    robjects.r.ggsave(filename=filename, plot=pp, width=12, height=8)

###Plots###
plot(plot_data,"./testplot.pdf")
plot_data.update({'e_model':plot_data.pop('a_model')})
plot(plot_data,"./testplot2.pdf")

1 Answer

  • 0

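    This isn't written in Python, but it should show the options for making the data series the first value in the legend, which should give it a consistent color across plots.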

    library(ggplot2)
    library(reshape2)
    
    df1 <- data.frame(a_model = c(0.217,0.226,0.238,0.253,0.272,0.278,0.283,0.29,0.296,0.298),
                      e_model = c(0.217,0.226,0.238,0.253,0.272,0.278,0.283,0.29,0.296,0.298),
                      data = c(0.255,0.226,0.241,0.19,0.264,0.302,0.291,0.26,0.218,0.221),
                      b_model = c(0.217,0.226,0.238,0.253,0.272,0.278,0.283,0.29,0.296,0.298),
                      mos_since_start = c(1,2,3,4,5,6,7,8,9,10))
    dfm <- melt(df1, id.vars = "mos_since_start")
    
    ggplot(dfm,
           aes(x = mos_since_start,
               y = value,
               group = variable,
               colour = variable,
               shape = variable,
               linetype = variable)) +
             geom_line() +
             geom_point() +
      scale_shape_discrete(name = "legend",
                           breaks = union("data", dfm$variable)) +
      scale_colour_discrete(name = "legend",
                            breaks = union("data", dfm$variable)) +
      scale_linetype_discrete(name = "legend",
                              breaks = union("data", dfm$variable))
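
A related variant, sketched here in Python, pins the color of the data series explicitly instead of relying on ordering. The mapping is built at plot time from whatever series names are present, so the names need not be known in advance. `build_colour_map`, `OTHER_COLOURS`, and the choice of black for the data series are hypothetical names and values introduced for illustration only.

```python
from itertools import cycle

# Hypothetical palette for the non-data series; any R color names or hex codes would do.
OTHER_COLOURS = ("#E69F00", "#56B4E9", "#009E73", "#D55E00", "#CC79A7")


def build_colour_map(series_names, fixed="data", fixed_colour="black",
                     palette=OTHER_COLOURS):
    """Map `fixed` to `fixed_colour` and every other series to a palette color."""
    names = set(series_names)
    # The fixed series goes first so it always receives the same color.
    colour_map = {fixed: fixed_colour} if fixed in names else {}
    # Remaining series take palette colors in alphabetical order, cycling if needed.
    others = sorted(names - {fixed})
    colour_map.update(zip(others, cycle(palette)))
    return colour_map


# Possible use inside plot(), assuming rpy2's StrVector accepts names this way
# and that rpy2.robjects.lib.ggplot2 exposes scale_colour_manual:
#     cmap = build_colour_map(fig['variable'].unique())
#     values = robjects.StrVector(list(cmap.values()))
#     values.names = robjects.StrVector(list(cmap.keys()))
#     pp = pp + ggplot2.scale_colour_manual(values=values)
```

This only fixes the color; shape and linetype would need the same treatment with their own manual scales if they should stay constant too.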
    

    A possibly simpler second approach is to change the factor level order of variable

    dfm$variable <- relevel(dfm$variable, "data")
    
    ggplot(dfm,
           aes(x = mos_since_start,
               y = value,
               group = variable,
               colour = variable,
               shape = variable,
               linetype = variable)) +
      geom_line() +
      geom_point()
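
The `relevel` idea above can probably be carried over to the Python side before the data frame reaches R. The sketch below makes `variable` a pandas categorical whose first category is the data series. It assumes that `pandas2ri` converts a categorical column to an R factor with the same level order; `levels_with_first` and `melt_data_first` are hypothetical helper names, not part of rpy2 or pandas.

```python
def levels_with_first(names, first="data"):
    """Return the unique names with `first` leading and the rest sorted."""
    unique = set(names)
    rest = sorted(unique - {first})
    return ([first] if first in unique else []) + rest


def melt_data_first(df, id_var="mos_since_start", first="data"):
    """Melt `df` to long form and order the `variable` categories with `first` leading."""
    import pandas as pd  # imported here so levels_with_first stays dependency-free

    long_df = pd.melt(df, id_vars=[id_var])
    order = levels_with_first(long_df["variable"], first)
    # A categorical column keeps an explicit category order, unlike plain strings.
    long_df["variable"] = pd.Categorical(long_df["variable"], categories=order)
    return long_df
```

In the question's `plot` function, `fig = melt_data_first(df)` would replace the `pd.melt` call. If the conversion behaves as assumed, the data series becomes the first factor level and should receive the first color, shape, and linetype of the default scales in every plot.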
    
