Pandas: sum with groupby, but exclude certain columns

What is the best way to do a groupby on a Pandas dataframe while excluding some columns from that groupby? For example, I have this dataframe:

Code    Country Item_Code   Item    Ele_Code    Unit    Y1961   Y1962   Y1963
2   Afghanistan 15          Wheat   5312        Ha      10       20      30
2   Afghanistan 25          Maize   5312        Ha      10       20      30
4   Angola      15          Wheat   7312        Ha      30       40      50
4   Angola      25          Maize   7312        Ha      30       40      50

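I want to group by the columns Country and Item_Code and compute the sum only for the rows falling under the columns Y1961, Y1962 and Y1963. The resulting dataframe should look like this: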

Code    Country Item_Code   Item    Ele_Code    Unit    Y1961   Y1962   Y1963
    2   Afghanistan 15      C3      5312        Ha      20       40      60
    4   Angola      25      C4      7312        Ha      60       80      100

Right now, I am doing this:

df.groupby('Country').sum()

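However, this also adds up the values in the Item_Code column. Is there a way to specify which columns to include in the sum() operation and which to exclude?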

3 Answers

74
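If you are looking for a more general way to apply this to many columns, you can build a list of column names and pass it as the index of the grouped dataframe. In your case, for example: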
```
# Build the names 'Y1967' ... 'Y2010'; the loop variable is `year`.
columns = ['Y' + str(year) for year in range(1967, 2011)]

df.groupby('Country')[columns].agg('sum')
```
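As a variation on the same idea, the list can be derived from the frame itself rather than from a hard-coded range. This is only a sketch: it rebuilds the sample data from the question and assumes that every column to be summed is named "Y" followed by digits. The names `year_columns` and `totals` are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Code": [2, 2, 4, 4],
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Item": ["Wheat", "Maize", "Wheat", "Maize"],
    "Ele_Code": [5312, 5312, 7312, 7312],
    "Unit": ["Ha", "Ha", "Ha", "Ha"],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

# Assumption: the columns to sum are exactly those named "Y" + digits.
year_columns = [c for c in df.columns if c.startswith("Y") and c[1:].isdigit()]

# Only the selected columns are summed; Item_Code and the rest are ignored.
totals = df.groupby("Country")[year_columns].sum()
print(totals)
```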
Answered on 2024-04-29T06:20:20+08:00

You can select the columns of a groupby:

In [11]: df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum()
Out[11]:
                       Y1961  Y1962  Y1963
Country     Item_Code
Afghanistan 15            10     20     30
            25            10     20     30
Angola      15            30     40     50
            25            30     40     50

Note that the list passed must be a subset of the columns, otherwise you will see a KeyError.
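
To illustrate the KeyError and one possible way around it, here is a small sketch. The frame is a cut-down version of the question's data in which "Y1963" is deliberately missing, and the names `wanted`, `present` and `raised` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Angola"],
    "Item_Code": [15, 25, 15],
    "Y1961": [10, 10, 30],
    "Y1962": [20, 20, 40],
})

wanted = ["Y1961", "Y1962", "Y1963"]  # "Y1963" is not a column of df

try:
    df.groupby("Country")[wanted].sum()
    raised = False
except KeyError:
    # Selecting a column that does not exist fails before any summing happens.
    raised = True

# Keep only the requested names that exist, preserving their order.
present = [c for c in wanted if c in df.columns]
totals = df.groupby("Country")[present].sum()
print(raised)
print(totals)
```

Whether silently dropping a missing name is acceptable depends on the data; failing loudly is often the safer default.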

Answered on 2024-04-29T06:20:20+08:00

9
The agg function will do this for you. Pass a dict that maps each column to the function (or list of functions) to apply to it:
```
# String names such as 'sum' and 'mean' need no NumPy import.
df.groupby(['Country', 'Item_Code']).agg({'Y1961': 'sum', 'Y1962': ['sum', 'mean']})  # two output columns from a single input column
```
This displays only the group-by columns and the specified aggregate columns. In this example, I included two agg functions applied to 'Y1962'.
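
Passing a list of functions for one column produces two-level column labels. If flat names are preferred, named aggregation is one option. The sketch below rebuilds part of the question's data; the output names `Y1961_sum`, `Y1962_sum` and `Y1962_mean` are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
})

# Each keyword is an output column name; each value is (input column, function).
out = df.groupby(["Country", "Item_Code"]).agg(
    Y1961_sum=("Y1961", "sum"),
    Y1962_sum=("Y1962", "sum"),
    Y1962_mean=("Y1962", "mean"),
)
print(out)
```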

To get what you were hoping to see, include the other columns in the group by, and apply sums to the Y variables in the frame:
```
# Group on every descriptive column so that all of them appear in the result.
df.groupby(['Code', 'Country', 'Item_Code', 'Item', 'Ele_Code', 'Unit']).agg({'Y1961': 'sum', 'Y1962': 'sum', 'Y1963': 'sum'})
```
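The call above puts the grouping keys into the index. If a flat table with the same column layout as the question is wanted, passing `as_index=False` is one way to get it. This is a sketch on the question's sample data, with `keys`, `years` and `flat` as illustrative names. Because every key combination in the sample is unique, each group here contains a single row.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Code": [2, 2, 4, 4],
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Item": ["Wheat", "Maize", "Wheat", "Maize"],
    "Ele_Code": [5312, 5312, 7312, 7312],
    "Unit": ["Ha", "Ha", "Ha", "Ha"],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

keys = ["Code", "Country", "Item_Code", "Item", "Ele_Code", "Unit"]
years = ["Y1961", "Y1962", "Y1963"]

# as_index=False keeps the grouping keys as ordinary columns.
flat = df.groupby(keys, as_index=False)[years].sum()
print(flat)
```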
Answered on 2024-04-29T06:20:20+08:00
