
Adding columns to a groupby DataFrame


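How can I add a 'sum' column to a pandas groupby DataFrame? I want a 'sum' of the 'Bearish' and 'Bullish' inner columns of the groupby DataFrame below.

Then I want to add two more columns:

%Bearish = Bearish / sum * 100

%Bullish = Bullish / sum * 100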

group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='H'),'sentiment']).count()
group_df = group_df.unstack()

                    message        
sentiment           Bearish Bullish
created                            
2017-08-01 23:00:00     2.0     2.0
2017-08-02 00:00:00     1.0     3.0
2017-08-02 01:00:00     NaN     4.0

1 Answer


    You can use concat with a new DataFrame:

    import pandas as pd

    idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12min')
    df = pd.DataFrame({'message':[1,1,2,2,2,2,2,2,3,3,3,3],
                       'sentiment':['Bearish'] * 5 + ['Bullish'] * 7 }, index=idx)
    print (df)
                         message sentiment
    2017-08-01 23:13:00        1   Bearish
    2017-08-01 23:25:00        1   Bearish
    2017-08-01 23:37:00        2   Bearish
    2017-08-01 23:49:00        2   Bearish
    2017-08-02 00:01:00        2   Bearish
    2017-08-02 00:13:00        2   Bullish
    2017-08-02 00:25:00        2   Bullish
    2017-08-02 00:37:00        2   Bullish
    2017-08-02 00:49:00        3   Bullish
    2017-08-02 01:01:00        3   Bullish
    2017-08-02 01:13:00        3   Bullish
    2017-08-02 01:25:00        3   Bullish
    

    # pd.TimeGrouper was removed in pandas 1.0; pd.Grouper is its replacement
    group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='H'),'sentiment']).count()
    # select ['message'] to avoid a MultiIndex in the columns
    group_df = group_df['message'].unstack()
    
    # divide each column by its total, then convert to percent
    # add a '%' prefix to the column names - https://stackoverflow.com/q/45453508/2901002
    df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
    print (df1)
                         %Bearish   %Bullish
    2017-08-01 23:00:00      80.0        NaN
    2017-08-02 00:00:00      20.0  57.142857
    2017-08-02 01:00:00       NaN  42.857143
    
    # use a new name so the original df stays available for the next snippet
    result = pd.concat([group_df, df1], axis=1)
    print (result)
                         Bearish  Bullish  %Bearish   %Bullish
    2017-08-01 23:00:00      4.0      NaN      80.0        NaN
    2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
    2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
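
The percentages above divide each count by its column total (all Bearish messages, all Bullish messages). If the 'sum' in the question instead means the per-hour total of Bearish plus Bullish, a minimal sketch could look like the one below. The counts table is rebuilt by hand from the output above so the snippet runs on its own. The names `counts`, `total`, `pct` and the `Sum` column are illustrative, and treating a missing count as 0 with `fillna(0)` is an assumption about the desired result.

```python
import numpy as np
import pandas as pd

# Hourly counts copied from the output above (NaN = no messages that hour).
idx = pd.to_datetime(['2017-08-01 23:00:00',
                      '2017-08-02 00:00:00',
                      '2017-08-02 01:00:00'])
counts = pd.DataFrame({'Bearish': [4.0, 1.0, np.nan],
                       'Bullish': [np.nan, 4.0, 3.0]}, index=idx)

# Assumption: a missing count means zero messages in that hour.
counts = counts.fillna(0)

# Row-wise total of both sentiment columns.
total = counts.sum(axis=1)

# Divide every row by its own total, then scale to percent.
pct = counts.div(total, axis=0).mul(100).add_prefix('%')

result = pd.concat([counts, total.rename('Sum'), pct], axis=1)
print(result)
```

With `axis=0` in `div`, the Series `total` is aligned on the row index, so each hour is divided by its own total rather than by a column total.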
    

    If you need GroupBy.size instead (it counts every row in a group, while count skips NaN values):

    group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='H'),'sentiment']).size()
    group_df = group_df.unstack()
    
    df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
    print (df1)
                         %Bearish   %Bullish
    2017-08-01 23:00:00      80.0        NaN
    2017-08-02 00:00:00      20.0  57.142857
    2017-08-02 01:00:00       NaN  42.857143
    
    result = pd.concat([group_df, df1], axis=1)
    print (result)
                         Bearish  Bullish  %Bearish   %Bullish
    2017-08-01 23:00:00      4.0      NaN      80.0        NaN
    2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
    2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
    

    What is the difference between size and count in pandas?
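
The two methods differ only when the counted column contains missing values. A small sketch with made-up data (the frame `demo` and its values are illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd

# Illustrative data: one 'message' value is missing.
demo = pd.DataFrame({'sentiment': ['Bearish', 'Bearish', 'Bullish'],
                     'message': [1.0, np.nan, 2.0]})

grouped = demo.groupby('sentiment')['message']

# count() skips NaN values; size() counts every row in the group.
counted = grouped.count()
sized = grouped.size()

print(counted)
print(sized)
```

Here the Bearish group has two rows but only one non-missing message, so the two results differ for that group only.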
