Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()


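I am having trouble filtering my result dataframe with an `or` condition. I want my result `df` to keep all values of the column `var` that are above 0.25 and those that are below -0.25. The logic below gives me an ambiguous truth value, yet it works when I split the filtering into two separate operations. What is happening here? I am not sure where to use the suggested `a.empty`, `a.bool()`, `a.item()`, `a.any()` or `a.all()`.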

result = result[(result['var']>0.25) or (result['var']<-0.25)]

4 Answers

The `or` and `and` Python statements require truth values. For pandas objects these are considered ambiguous, so you should use the "bitwise" `|` (or) or `&` (and) operations instead:
```
result = result[(result['var']>0.25) | (result['var']<-0.25)]
```
These operators are overloaded for these kinds of data structures to yield the element-wise `or` (or `and`).
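
A minimal sketch of the difference, assuming a small made-up DataFrame (only the column name `var` comes from the question; the sample values are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data; only the column name `var` is from the question.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, -0.2, 0.26]})

# `or` asks Python for bool(Series), which pandas refuses to answer.
try:
    result[(result["var"] > 0.25) or (result["var"] < -0.25)]
    raised = False
except ValueError:
    raised = True

# `|` is applied element by element and yields a boolean mask.
filtered = result[(result["var"] > 0.25) | (result["var"] < -0.25)]

print(raised)
print(filtered["var"].tolist())
```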

Just to add some more explanation to this statement:

The exception is thrown when you try to get the `bool` of a `pandas.Series`:
```
>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
What you hit was a place where the operator implicitly converted the operands to `bool` (you used `or`, but the same happens with `and`, `if` and `while`):
```
>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
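Besides these 4 statements there are several Python functions that hide some `bool` calls (like `any`, `all`, `filter`, ...). These are normally not problematic with `pandas.Series`, but I wanted to mention them for completeness.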
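
As a rough illustration of why those built-ins are usually harmless (a sketch with invented sample values, not part of the original answer): they iterate over the Series and call `bool()` on each scalar element, never on the Series as a whole.

```python
import pandas as pd

x = pd.Series([0, 1, 2])

# The built-ins iterate over the elements and test each scalar,
# so bool(Series) is never called and no ValueError is raised.
builtin_any = any(x)               # at least one element is truthy
builtin_all = all(x)               # fails because of the 0
non_zero = list(filter(None, x))   # keeps only the truthy elements

print(builtin_any, builtin_all, non_zero)
```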

In your case the exception isn't really helpful, because it doesn't mention the right alternatives. For `and` and `or` you can use (if you want element-wise comparisons):
- numpy.logical_or:
```
>>> import numpy as np
>>> np.logical_or(x, y)
```
or simply the `|` operator:
```
>>> x | y
```
- numpy.logical_and:
```
>>> np.logical_and(x, y)
```
or simply the `&` operator:
```
>>> x & y
```
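If you're using the operators, make sure you set your parentheses correctly because of operator precedence: `|` and `&` bind more tightly than comparisons such as `>` and `<`.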
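
A small sketch of the precedence pitfall, using an invented integer Series. The exact exception type may depend on the dtype and pandas version, so the example catches both `ValueError` and `TypeError`:

```python
import pandas as pd

s = pd.Series([-3, 0, 3])

# With parentheses: each comparison is evaluated first, then combined.
mask = (s > 1) | (s < -1)

# Without parentheses, `|` binds tighter than `>` and `<`, so Python reads
# this as the chained comparison `s > (1 | s) < -1`, which fails.
try:
    s > 1 | s < -1
    failed = False
except (ValueError, TypeError):
    failed = True

print(mask.tolist(), failed)
```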

There are several logical NumPy functions that should work on `pandas.Series`.

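The alternatives mentioned in the exception are more suitable if you encountered it in an `if` or `while` statement. I'll briefly explain each of them:
- If you want to check whether your Series is empty: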
```
>>> x = pd.Series([], dtype="float64")
>>> x.empty
True
>>> x = pd.Series([1])
>>> x.empty
False
```
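Python normally interprets the length (`len`) of containers (like `list`, `tuple`, ...) as their truth value if they have no explicit boolean interpretation. So if you want the Python-like check, you can write `if x.size` or `if not x.empty` instead of `if x`.
- If your Series contains one and only one boolean value: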
```
>>> x = pd.Series([100])
>>> (x > 50).bool()
True
>>> (x < 50).bool()
False
```
- If you want to check the first and only item of your Series (like `.bool()`, but it also works for non-boolean contents):
```
>>> x = pd.Series([100])
>>> x.item()
100
```
- If you want to check whether all or any of the items are non-zero, non-empty or not `False`:
```
>>> x = pd.Series([0, 1, 2])
>>> x.all()   # because one element is zero
False
>>> x.any()   # because one (or more) elements are non-zero
True
```
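
Put together, these alternatives might be used in `if` statements roughly as follows. The helper name `describe` and its return strings are hypothetical and exist only for illustration:

```python
import pandas as pd

def describe(x: pd.Series) -> str:
    """Hypothetical helper showing explicit truth tests on a Series."""
    if x.empty:          # no elements at all
        return "empty"
    if x.size == 1:      # exactly one element: unwrap it with .item()
        return f"single value {x.item()}"
    if x.all():          # every element is truthy
        return "all truthy"
    if x.any():          # at least one element is truthy
        return "some truthy"
    return "none truthy"

print(describe(pd.Series([0, 1, 2])))
```

Each branch reduces the Series to a single boolean before `if` sees it, which is what removes the ambiguity.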
Answered 2024-05-19T12:49:48+08:00

For boolean logic, use `&` and `|`.

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))

>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

To see what is happening, note that each comparison gives you a column of booleans, e.g.:

df.C > 0.25
0     True
1    False
2    False
3     True
4     True
Name: C, dtype: bool

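When you have multiple criteria, you get multiple columns back. This is why the join logic is ambiguous. Using `and` or `or` treats each column as a whole, so you first need to reduce each column to a single boolean value. For example, to see whether any value or all values in each of the columns are True: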

# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()
True

# All values in either column is True?
(df.C > 0.25).all() or (df.C < -0.25).all()
False

One convoluted way to achieve the same thing is to zip all of these columns together and perform the appropriate logic.

>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

For more details, refer to Boolean Indexing in the docs.
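
As a quick cross-check (a sketch that reuses the seeded example frame from this answer; the variable names are illustrative), the `|` operator, `np.logical_or` and the zip-based construction should all yield the same mask:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Three ways to build the same element-wise "or" mask.
mask_operator = (df.C > 0.25) | (df.C < -0.25)
mask_numpy = np.logical_or(df.C > 0.25, df.C < -0.25)
mask_zip = [any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]

print(mask_operator.tolist() == list(mask_numpy) == mask_zip)
```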

Answered 2024-05-19T12:49:48+08:00

Alternatively, you could use the `operator` module. More detailed information is available in the Python docs.

import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.4438
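
One situation where the function form can be handy is combining an arbitrary number of conditions. The sketch below assumes the conditions are collected in a list (the name `conditions` is illustrative) and folds them with `functools.reduce`:

```python
import operator
from functools import reduce

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Any number of boolean masks, collected in a list (illustrative choice).
conditions = [df.C > 0.25, df.C < -0.25]

# reduce applies operator.or_ pairwise: ((c0 | c1) | c2) | ...
mask = reduce(operator.or_, conditions)
selected = df.loc[mask]

print(selected.index.tolist())
```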

Answered 2024-05-19T12:49:48+08:00

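This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the `query` method: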
```
result = result.query("(var > 0.25) or (var < -0.25)")
```
See also http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query.

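(Some tests with a dataframe I'm currently working with suggest that this method is a bit slower than using the bitwise operators on series of booleans: 2 ms vs. 870 µs.)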
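
A minimal sketch of the `query` approach next to the bitwise mask, with invented sample data. The helper `filter_outside` and its parameters `low` and `high` are hypothetical; the `@name` syntax refers to a local Python variable rather than a column:

```python
import pandas as pd

def filter_outside(frame: pd.DataFrame, low: float, high: float) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose `var` is above `high` or below `low`."""
    # `@high` and `@low` are looked up among the local variables of this function.
    return frame.query("(var > @high) or (var < @low)")

# Made-up values; only the column name `var` comes from the question.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, -0.2, 0.26]})

by_query = filter_outside(result, low=-0.25, high=0.25)
by_mask = result[(result["var"] > 0.25) | (result["var"] < -0.25)]

print(by_query.equals(by_mask))
```

Inside the query string, `or` is interpreted by the pandas expression parser rather than by Python's own boolean machinery, which is why it does not raise the ambiguity error there.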

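A word of warning: at least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named `WT_38hph_IP_2`, `WT_38hph_input_2` and `log2(WT_38hph_IP_2/WT_38hph_input_2)`, and wanted to perform the following query: `"(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"`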

I obtained the following exception cascade:
- KeyError: 'log2'
- UndefinedVariableError: name 'log2' is not defined
- ValueError: "log2" is not a supported function
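I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.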

A possible workaround is proposed here.
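
Independently of whatever that workaround proposes, one route that sidesteps the query parser entirely is plain boolean indexing with bracket column access. The sketch below uses the column names from this answer with made-up numbers:

```python
import numpy as np
import pandas as pd

# Made-up numbers; only the column names come from the text above.
df = pd.DataFrame({
    "WT_38hph_IP_2": [10.0, 40.0, 80.0],
    "WT_38hph_input_2": [10.0, 10.0, 10.0],
})
ratio_col = "log2(WT_38hph_IP_2/WT_38hph_input_2)"
df[ratio_col] = np.log2(df["WT_38hph_IP_2"] / df["WT_38hph_input_2"])

# Bracket access treats the name as an opaque string, so nothing is parsed.
selected = df[(df[ratio_col] > 1) & (df["WT_38hph_IP_2"] > 20)]

print(selected.index.tolist())
```
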
Answered 2024-05-19T12:49:48+08:00
