
Python: distplot with multiple distributions

15

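I am using seaborn to plot a distribution plot. I would like to plot multiple distributions on the same plot, each in a different color.

Here is how I create the distribution plot so far: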

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

# Build a DataFrame with the four feature columns plus a 'target' column
iris = load_iris()
iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])

sns.distplot(iris[['sepal length (cm)']], hist=False, rug=True)

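The 'target' column contains 3 values: 0, 1, and 2.

I would like to see one distribution plot of sepal length for each of target == 0, target == 1, and target == 2, for a total of 3 plots.

Does anyone know how I can do that?

Thank you.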

2 Answers

  • 16

    A more common approach to this type of problem is to recast the data into long format with melt, and then let FacetGrid's map do the rest.

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_iris
    import seaborn as sns
    
    iris = load_iris()
    iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']], 
                        columns=iris['feature_names'] + ['target'])
    
    # recast into long format 
    df = iris.melt(['target'], var_name='cols',  value_name='vals')
    
    df.head()
    
       target               cols  vals
    0     0.0  sepal length (cm)   5.1
    1     0.0  sepal length (cm)   4.9
    2     0.0  sepal length (cm)   4.7
    3     0.0  sepal length (cm)   4.6
    4     0.0  sepal length (cm)   5.0
    

    You can now plot by creating a FacetGrid and calling map:

    g = sns.FacetGrid(df, col='cols', hue="target", palette="Set1")
    g = (g.map(sns.distplot, "vals", hist=False, rug=True))
    

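
To make the melt step in the answer above more concrete, here is a minimal sketch on a small made-up frame. The three rows and the names `wide` and `long_df` are illustrative assumptions, not part of the answer; only `DataFrame.melt` and its `id_vars`, `var_name`, and `value_name` parameters come from pandas itself.

```python
import pandas as pd

# Hypothetical stand-in for the iris frame: two feature columns plus 'target'
wide = pd.DataFrame({
    "sepal length (cm)": [5.1, 7.0, 6.3],
    "sepal width (cm)": [3.5, 3.2, 3.3],
    "target": [0, 1, 2],
})

# Keep 'target' as the identifier column and stack every other column
# into (cols, vals) pairs: one row per original cell
long_df = wide.melt(id_vars=["target"], var_name="cols", value_name="vals")

print(long_df)
```

Each feature column becomes a block of rows tagged with its name in `cols`. That is what lets `FacetGrid(df, col='cols', hue='target')` draw one panel per feature and, within each panel, one curve per target value.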

  • 8

    The important thing is to select the rows of the dataframe for each target value (0, 1, and 2).

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_iris
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    iris = load_iris()
    iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                        columns=iris['feature_names'] + ['target'])
    
    # Select the rows belonging to each target value
    target_0 = iris.loc[iris['target'] == 0]
    target_1 = iris.loc[iris['target'] == 1]
    target_2 = iris.loc[iris['target'] == 2]
    
    # Successive calls draw onto the same axes
    sns.distplot(target_0[['sepal length (cm)']], hist=False, rug=True)
    sns.distplot(target_1[['sepal length (cm)']], hist=False, rug=True)
    sns.distplot(target_2[['sepal length (cm)']], hist=False, rug=True)
    
    plt.show()
    

    This draws the three density curves on the same axes, each in a different color.

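    If you don't know how many values target may have, find the unique values in the target column, then slice the dataframe for each one and add each slice to the plot.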

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_iris
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    iris = load_iris()
    iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                        columns=iris['feature_names'] + ['target'])
    
    unique_vals = iris['target'].unique()  # [0, 1, 2]
    
    # Slice the dataframe by target:
    # use a list comprehension to build one sliced dataframe per unique value
    targets = [iris.loc[iris['target'] == val] for val in unique_vals]
    
    # Iterate through the list and plot each sliced dataframe on the same axes
    for target in targets:
        sns.distplot(target[['sepal length (cm)']], hist=False, rug=True)
    
    plt.show()
    
