What is the difference between the pandas agg and apply functions?

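I can't figure out the difference between the pandas .aggregate and .apply functions.
Take the following example: I load a dataset, do a groupby, define a simple function, and use either .agg or .apply.

As you can see, the print statements inside my function produce the same output whether I use .agg or .apply. The results, on the other hand, are different. Why is that?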

import pandas as pd

iris = pd.read_csv('iris.csv')
by_species = iris.groupby('Species')

# Print what each call receives, then return a constant
def f(x):
    print(type(x))
    print(x.head(3))
    return 1

Using apply:

by_species.apply(f)
#<class 'pandas.core.frame.DataFrame'>
#   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
#0           5.1          3.5           1.4          0.2  setosa
#1           4.9          3.0           1.4          0.2  setosa
#2           4.7          3.2           1.3          0.2  setosa
#<class 'pandas.core.frame.DataFrame'>
#   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
#0           5.1          3.5           1.4          0.2  setosa
#1           4.9          3.0           1.4          0.2  setosa
#2           4.7          3.2           1.3          0.2  setosa
#<class 'pandas.core.frame.DataFrame'>
#    Sepal.Length  Sepal.Width  Petal.Length  Petal.Width     Species
#50           7.0          3.2           4.7          1.4  versicolor
#51           6.4          3.2           4.5          1.5  versicolor
#52           6.9          3.1           4.9          1.5  versicolor
#<class 'pandas.core.frame.DataFrame'>
#     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
#100           6.3          3.3           6.0          2.5  virginica
#101           5.8          2.7           5.1          1.9  virginica
#102           7.1          3.0           5.9          2.1  virginica
#Out[33]: 
#Species
#setosa        1
#versicolor    1
#virginica     1
#dtype: int64

Using agg:

by_species.agg(f)
#<class 'pandas.core.frame.DataFrame'>
#   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
#0           5.1          3.5           1.4          0.2  setosa
#1           4.9          3.0           1.4          0.2  setosa
#2           4.7          3.2           1.3          0.2  setosa
#<class 'pandas.core.frame.DataFrame'>
#    Sepal.Length  Sepal.Width  Petal.Length  Petal.Width     Species
#50           7.0          3.2           4.7          1.4  versicolor
#51           6.4          3.2           4.5          1.5  versicolor
#52           6.9          3.1           4.9          1.5  versicolor
#<class 'pandas.core.frame.DataFrame'>
#     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
#100           6.3          3.3           6.0          2.5  virginica
#101           5.8          2.7           5.1          1.9  virginica
#102           7.1          3.0           5.9          2.1  virginica
#Out[34]: 
#           Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
#Species                                                         
#setosa                 1            1             1            1
#versicolor             1            1             1            1
#virginica              1            1             1            1

3 Answers

21

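apply applies the function to each group (your Species). Your function returns 1, so you end up with 1 value for each of the 3 groups.

agg aggregates each column (feature) for each group, so you end up with one value per column per group.

Do read the groupby docs, they're quite helpful. There are also a bunch of tutorials floating around the web.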
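The distinction described in this answer can be sketched on a tiny frame. This is only an illustration: the frame, its column names, and the helper `always_one` are made up for the example, and the value columns are selected explicitly so that the result does not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

# Hypothetical stand-in for the iris data: one grouping column, two value columns
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "length": [1.0, 2.0, 3.0, 4.0],
    "width": [0.1, 0.2, 0.3, 0.4],
})
grouped = df.groupby("species")[["length", "width"]]

def always_one(x):
    # Ignores its input and returns a constant, like f in the question
    return 1

# apply: the function is called once per group, so there is one value per group
applied = grouped.apply(always_one)

# agg: the function reduces each column of each group,
# so there is one value per group and per column
aggregated = grouped.agg(always_one)

print(applied)
print(aggregated)
```

The first result is a Series with one entry per group, the second a DataFrame with one row per group and one column per value column.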

Answered on 2024-05-02T20:13:14+08:00

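(Note: these comparisons are relevant for DataFrameGroupBy objects.)

Compared with .apply(), some plausible advantages of using .agg() on DataFrameGroupBy objects are:

1) .agg() gives you the flexibility of applying multiple functions at once, by passing a list of functions that is applied to every column.

2) It also lets you apply different functions at once to different columns of the dataframe.

This means you have control over each column for each operation.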
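A minimal sketch of the two points above, assuming a small made-up frame. It passes function names as strings, which pandas accepts as an alternative to NumPy callables; the variable names `multi` and `per_column` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Baar", "Foo", "Baar"],
    "score_2": [10, 15, 10, 25],
    "score_3": [10, 20, 30, 40],
})
grouped = df.groupby("name")

# 1) A list of functions: every function is applied to every column
multi = grouped.agg(["sum", "mean"])

# 2) A dict: a different function for each column
per_column = grouped.agg({"score_2": "mean", "score_3": "max"})

print(multi)
print(per_column)
```

With a list, the result has a two-level column index (column name, function name); with a dict, it has one column per key.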

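Here is a link with more details: http://pandas.pydata.org/pandas-docs/version/0.13.1/groupby.html

The apply function, however, can be limited to applying one function at a time to each column of the dataframe. So you may have to call apply repeatedly to perform different operations on the same column.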

Here are some example comparisons of .apply() vs .agg() for DataFrameGroupBy objects:

Let's first see the operations using .apply():

In [261]: df = pd.DataFrame({"name":["Foo", "Baar", "Foo", "Baar"], "score_1":[5,10,15,10], "score_2" :[10,15,10,25], "score_3" : [10,20,30,40]})

In [262]: df
Out[262]: 
   name  score_1  score_2  score_3
0   Foo        5       10       10
1  Baar       10       15       20
2   Foo       15       10       30
3  Baar       10       25       40

In [263]: df.groupby(["name", "score_1"])["score_2"].apply(lambda x : x.sum())
Out[263]: 
name  score_1
Baar  10         40
Foo   5          10
      15         10
Name: score_2, dtype: int64

In [264]: df.groupby(["name", "score_1"])["score_2"].apply(lambda x : x.min())
Out[264]: 
name  score_1
Baar  10         15
Foo   5          10
      15         10
Name: score_2, dtype: int64

In [265]: df.groupby(["name", "score_1"])["score_2"].apply(lambda x : x.mean())
Out[265]: 
name  score_1
Baar  10         20.0
Foo   5          10.0
      15         10.0
Name: score_2, dtype: float64

Now, see the same operations done effortlessly with .agg():

In [274]: df = pd.DataFrame({"name":["Foo", "Baar", "Foo", "Baar"], "score_1":[5,10,15,10], "score_2" :[10,15,10,25], "score_3" : [10,20,30,40]})

In [275]: df
Out[275]: 
   name  score_1  score_2  score_3
0   Foo        5       10       10
1  Baar       10       15       20
2   Foo       15       10       30
3  Baar       10       25       40

In [276]: df.groupby(["name", "score_1"]).agg({"score_3" :[np.sum, np.min, np.mean, np.max], "score_2":lambda x : x.mean()})
Out[276]: 
              score_2 score_3               
             <lambda>     sum amin mean amax
name score_1                                
Baar 10            20      60   20   30   40
Foo  5             10      10   10   10   10
     15            10      30   30   30   30

So .agg() can be really handy when handling DataFrameGroupBy objects, compared with .apply(). But if you are handling only plain DataFrame objects, and not DataFrameGroupBy objects, then apply() can be very useful, because it can apply a function along either axis of the dataframe.

(For example, axis=0, which is the default, applies the function to each column; axis=1 applies it to each row of a plain DataFrame.)
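The axis behaviour on a plain DataFrame can be sketched as follows. The frame and the names `col_totals` and `row_totals` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"score_1": [5, 10, 15], "score_2": [10, 15, 10]})

# axis=0 (the default): the function receives each column as a Series
col_totals = df.apply(lambda col: col.sum())

# axis=1: the function receives each row as a Series
row_totals = df.apply(lambda row: row.sum(), axis=1)

print(col_totals)
print(row_totals)
```

The first result is indexed by column name, the second by the original row index.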

Answered on 2024-05-02T20:13:14+08:00

8

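When using apply on a groupby, I have run into .apply returning the grouped columns. There is a note in the documentation (pandas.pydata.org/pandas-docs/stable/groupby.html):

"... thus the grouped column(s) may be included in the output as well as set the indices."

.aggregate will not return the grouped columns.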
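A small sketch for checking this on your own installation, with a made-up frame. Note the assumption: whether .apply keeps the grouping column as a regular column has changed across pandas releases (newer versions warn about it or exclude it), so the second printout may differ from what this answer describes.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Baar", "Foo", "Baar"],
    "score_1": [5, 10, 15, 10],
})
grouped = df.groupby("name")

# agg: the grouping column becomes the index and is not a regular column
aggregated = grouped.agg("sum")
print(aggregated.columns.tolist())

# apply: the function receives each group's sub-frame; whether "name"
# is still among the result's columns depends on the pandas version
applied = grouped.apply(lambda g: g.head(1))
print(applied.columns.tolist())
```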

Answered on 2024-05-02T20:13:14+08:00
