How do I get the row count of a Pandas DataFrame? - Java 学习之路


I'm trying to get the number of rows of dataframe df with Pandas, and here is my code.

Method 1:

```
total_rows = df.count
print total_rows +1
```

Method 2:

```
total_rows = df['First_columnn_label'].count
print total_rows +1
```

Both code snippets give me this error:

TypeError: unsupported operand type(s) for +: 'instancemethod' and 'int'

What am I doing wrong?
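A minimal sketch of what goes wrong, written in Python 3 and assuming a small illustrative DataFrame (the column names are made up): without parentheses, df.count is the method object itself, so adding 1 to it fails (Python 3 reports the type as 'method' rather than 'instancemethod'). Calling df.count() does run, but it returns a Series of non-NA counts per column, not a single row count.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the column names are hypothetical.
df = pd.DataFrame({"a": [1, 2, np.nan], "b": [4, 5, 6]})

# Without parentheses this is the bound method, not a number.
print(callable(df.count))  # True
try:
    df.count + 1
except TypeError as exc:
    print("TypeError:", exc)

# Calling it gives the non-NA count of each column, as a Series.
per_column = df.count()
print(per_column["a"], per_column["b"])  # 2 3

# The number of rows, NaN rows included:
total_rows = len(df)
print(total_rows + 1)  # 4
```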

According to the answer given by @root, the best (fastest) way to check the length of df is to call:

```
df.shape[0]
```

11 Answers

If you want to get the row count in the middle of a chained operation, you can use:
```
df.pipe(len)
```
Example:
```
import numpy as np
import pandas as pd

row_count = (
    pd.DataFrame(np.random.rand(3, 4))
    .reset_index()
    .pipe(len)
)
```
This can be useful if you don't want to wrap a long statement inside a len() call.

You could use __len__() instead, but __len__() looks a bit odd.
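A short sketch, using a random frame like the one in the example, checking that pipe(len) agrees with len() and with calling __len__() directly:

```python
import numpy as np
import pandas as pd

chained = pd.DataFrame(np.random.rand(3, 4)).reset_index()

via_pipe = chained.pipe(len)    # pipe(len) calls len(chained)
via_len = len(chained)
via_dunder = chained.__len__()  # what len() delegates to

print(via_pipe, via_len, via_dunder)  # 3 3 3
```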
Answered 2024-04-29T04:09:03+08:00

Assuming df is your dataframe:

```
count_row = df.shape[0]  # number of rows
count_col = df.shape[1]  # number of columns
```
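Since df.shape is a plain tuple, both values can also be unpacked in one step; a minimal sketch with a made-up 4×3 frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3))

count_row, count_col = df.shape  # tuple unpacking: (rows, columns)
print(count_row, count_col)  # 4 3
```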

Answered 2024-04-29T04:09:03+08:00

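I came to pandas from an R background, and I found pandas more complicated when it comes to selecting rows or columns. I had to wrestle with it for a while before I found some ways to deal with it: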

Getting the number of columns:
```
len(df.columns)
## Here:
# df is your DataFrame.
# df.columns returns an Index containing the column labels of df.
# len() then gives its length.
```
Getting the number of rows:
```
len(df.index) #It's similar.
```
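A small sketch of the two calls above, assuming an illustrative frame with a non-default string index; df.columns and df.index are both pd.Index objects, so len() works on them directly:

```python
import pandas as pd

# Illustrative data; labels are made up.
df = pd.DataFrame(
    {"height": [1.7, 1.8], "weight": [65, 80], "age": [30, 41]},
    index=["alice", "bob"],
)

n_cols = len(df.columns)  # column labels: height, weight, age
n_rows = len(df.index)    # row labels: alice, bob
print(n_rows, n_cols)  # 2 3
print(isinstance(df.columns, pd.Index))  # True
```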
Answered 2024-04-29T04:09:03+08:00
For a dataframe df, to print a comma-formatted row count while exploring the data:
```
def nrow(df):
    print("{:,}".format(df.shape[0]))
```
Example:
```
nrow(my_df)
12,456,789
```
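If the formatted count is needed as a value rather than printed, a variant could return the string instead; format_row_count is a hypothetical helper name, and it uses len(df) in place of df.shape[0] (both give the row count):

```python
import numpy as np
import pandas as pd

def format_row_count(df: pd.DataFrame) -> str:
    """Hypothetical helper: row count with thousands separators."""
    return f"{len(df):,}"

big = pd.DataFrame({"x": np.zeros(1_234_567)})
print(format_row_count(big))  # 1,234,567
```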
Answered 2024-04-29T04:09:03+08:00

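len() is your friend; the short answer for the row count is len(df).

Alternatively, you can access all rows through df.index and all columns through df.columns, and since len(anyList) gives you the length of a list, you can use len(df.index) for the row count and len(df.columns) for the column count.

Alternatively, df.shape returns the numbers of rows and columns together; use df.shape[0] if you only want the row count and df.shape[1] if you only want the column count.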
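One case worth checking with these approaches is a frame that has columns but no rows; a minimal sketch (column names are illustrative):

```python
import pandas as pd

empty = pd.DataFrame(columns=["a", "b", "c"])

print(len(empty))          # 0
print(len(empty.columns))  # 3
print(empty.shape)         # (0, 3)
print(empty.empty)         # True
```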

Answered 2024-04-29T04:09:03+08:00

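df.shape returns the shape of the data frame as a tuple (number of rows, number of columns).

You can access the number of rows or columns with df.shape[0] or df.shape[1] respectively, the same way you access the values of any tuple.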
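Because shape is an ordinary tuple, its length depends on the object: for a Series it has a single element, so only shape[0] exists. A short sketch with an illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
s = df["a"]  # selecting one column gives a Series

print(df.shape)    # (3, 2)
print(s.shape)     # (3,)
print(s.shape[0])  # 3, same as len(s)
```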

Answered 2024-04-29T04:09:03+08:00
Number of rows (use either one):
```
df.shape[0]
len(df)
```
Answered 2024-04-29T04:09:03+08:00
Use len(df). This works as of pandas 0.11, or maybe even earlier.

__len__() is currently (0.12) documented with "Returns length of index". Timing info, set up the same way as in root's answer:
```
In [7]: timeit len(df.index)
1000000 loops, best of 3: 248 ns per loop

In [8]: timeit len(df)
1000000 loops, best of 3: 573 ns per loop
```
It's slightly slower than calling len(df.index) directly because of one extra function call, but that should not matter in most use cases.
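To reproduce a comparison like this outside IPython, the standard-library timeit module can be used; a sketch assuming a small 4×3 frame like the one in root's answer. Absolute numbers depend on the machine and the pandas version, so none are shown here:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3))

# Each candidate is wrapped in a lambda, which adds the same small
# call overhead to every measurement.
candidates = {
    "len(df.index)": lambda: len(df.index),
    "len(df)": lambda: len(df),
    "df.shape[0]": lambda: df.shape[0],
}

loops = 10_000
results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of `loops` calls each, converted to ns per call.
    best = min(timeit.repeat(fn, number=loops, repeat=3))
    results[name] = best / loops * 1e9
    print(f"{name:>14}: {results[name]:.0f} ns per call")
```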
Answered 2024-04-29T04:09:03+08:00
In addition to the answers above, you can use df.axes to get a list containing the row and column indexes, and then apply len():
```
total_rows=len(df.axes[0])
total_cols=len(df.axes[1])
```
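For reference, df.axes returns a Python list holding the row index and the column index, in that order; a small sketch with an illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

axes = df.axes
print(type(axes).__name__)  # list
total_rows = len(axes[0])   # axes[0] is df.index
total_cols = len(axes[1])   # axes[1] is df.columns
print(total_rows, total_cols)  # 3 2
```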
Answered 2024-04-29T04:09:03+08:00

...building on Jan-Philip Gehrcke's answer.

Here's why len(df) or len(df.index) is faster than df.shape[0]. Look at the code: df.shape is a @property that calls len twice, once on the index and once on the columns.

```
df.shape??
Type:        property
String form: <property object at 0x1127b33c0>
Source:
# df.shape.fget
@property
def shape(self):
    """
    Return a tuple representing the dimensionality of the DataFrame.
    """
    return len(self.index), len(self.columns)
```

Under the hood of len(df):

```
df.__len__??
Signature: df.__len__()
Source:
    def __len__(self):
        """Returns length of info axis, but here we use the index """
        return len(self.index)
File:      ~/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py
Type:      instancemethod
```

len(df.index) will be slightly faster than len(df) because it involves one fewer function call, but both will always be faster than df.shape[0].
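The relationship in the source shown above can be checked directly: shape is exposed as a property on the DataFrame class, and its value matches the two len calls it wraps. A small sketch with an illustrative frame (exact internals may differ between pandas versions):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(isinstance(pd.DataFrame.shape, property))      # True
print(df.shape == (len(df.index), len(df.columns)))  # True
print(len(df) == len(df.index))                      # True
```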

Answered 2024-04-29T04:09:03+08:00


You can use the .shape attribute or just len(DataFrame.index). However, there are notable performance differences (len(DataFrame.index) is fastest):

```
In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.arange(12).reshape(4,3))

In [4]: df
Out[4]:
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11

In [5]: df.shape
Out[5]: (4, 3)

In [6]: timeit df.shape
2.77 µs ± 644 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: timeit df[0].count()
348 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [8]: len(df.index)
Out[8]: 4

In [9]: timeit len(df.index)
990 ns ± 4.97 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

Edit: As @Dan Allen pointed out in the comments, len(df.index) and df[0].count() are not interchangeable, because count excludes NaNs.
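A minimal sketch of that difference, using an illustrative frame with one missing value in column 0: count() skips the NaN, while len() and shape count every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1.0, np.nan, 3.0, 4.0], 1: [5, 6, 7, 8]})

print(len(df.index))  # 4: every row
print(df[0].count())  # 3: the NaN is excluded
print(df.shape[0])    # 4
```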

Answered 2024-04-29T04:09:03+08:00
