NumPy or Pandas: Keeping array type as integer while having a NaN value

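Is there a preferred way to keep the data type of a numpy array fixed as int (or int64 or whatever), while still having an element inside listed as numpy.NaN?

In particular, I am converting an in-house data structure to a Pandas DataFrame. In our structure, we have integer-type columns that still have NaN's (but the dtype of the column is int). It seems to recast everything as a float if we make this a DataFrame, but we'd really like it to stay int.

Thoughts?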

Things tried:

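I tried using the from_records() function under pandas.DataFrame with coerce_float=False, and this did not help. I also tried using NumPy masked arrays with a NaN fill_value, which also did not work. All of these caused the column data type to become a float.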
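For readers who want to reproduce the behaviour described above, here is a minimal sketch. The column name `col` and the sample values are made up for illustration; only the upcast to float is the point.

```python
import numpy as np
import pandas as pd

# Hypothetical data: integer values with one missing entry.
values = [1, 2, np.nan]

# With the default NumPy-backed dtypes, pandas has no integer
# representation for NaN, so the column is stored as float64.
df = pd.DataFrame({"col": values})
print(df["col"].dtype)

# Casting back to a plain NumPy integer dtype fails while NaN is present.
try:
    df["col"].astype("int64")
    cast_failed = False
except (ValueError, TypeError):
    cast_failed = True
print(cast_failed)
```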

4 Answers

7
If performance is not the main issue, you can store strings instead.
```
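# Convert non-missing values to integer strings; the dropped rows realign as NaN (dtype becomes object).
df["col"] = df["col"].dropna().apply(lambda x: str(int(x)))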
```
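Then you can mix them with NaN as much as you want. If you really want integers, then depending on your application you can use -1, or 0, or 1234567890, or some other dedicated value to represent NaN.

You can also temporarily duplicate the column: one as you have it, with floats; the other experimental, with ints or strings. Then insert asserts in every reasonable place, checking that the two are in sync. After enough testing you can drop the floats.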
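A rough sketch of the sentinel-plus-shadow-column idea follows. The sentinel value `-1`, the column names `col` and `col_int`, and the helper `columns_in_sync` are all hypothetical choices for illustration, and the sentinel only works if it can never occur as real data.

```python
import numpy as np
import pandas as pd

SENTINEL = -1  # hypothetical dedicated value standing in for NaN

# Original float column, as pandas produces it when NaN is present.
df = pd.DataFrame({"col": [1.0, np.nan, 3.0]})

# Experimental integer copy: missing entries become the sentinel.
df["col_int"] = df["col"].fillna(SENTINEL).astype("int64")


def columns_in_sync(frame):
    """Check that the float column and its integer shadow agree."""
    missing = frame["col"].isna().to_numpy()
    sentinel_mask = (frame["col_int"] == SENTINEL).to_numpy()
    # Missing positions must match exactly.
    if not np.array_equal(missing, sentinel_mask):
        return False
    # Non-missing values must be identical after an integer cast.
    present = ~missing
    floats_as_int = frame["col"].to_numpy()[present].astype("int64")
    return np.array_equal(floats_as_int, frame["col_int"].to_numpy()[present])


# The kind of assert that could be sprinkled through the code while testing.
assert columns_in_sync(df)
```

Once the asserts have held up under enough testing, the float column can be dropped with `df = df.drop(columns="col")`.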
Answered on 2024-04-18T05:33:32+08:00
90
This is not a solution for all cases, but in mine (genomic coordinates) I've resorted to using 0 as NaN:
```
a3['MapInfo'] = a3['MapInfo'].fillna(0).astype(int)
```
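This at least lets the column use the proper "native" type, so operations like subtraction, comparison, etc. work as expected.
Answered on 2024-04-18T05:33:32+08:00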
6

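NaN can't be stored in an integer array. This is a known limitation of pandas at the moment; I have been waiting for progress to be made with NA values in NumPy (similar to NAs in R), but it will be at least 6 months to a year before NumPy gets these features, it seems: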

http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na

(Note that it has since been added, but so far only as a new feature in the development version: http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#optional-integer-na-support)

Answered on 2024-04-18T05:33:32+08:00
3

This capability has been added to the latest beta version of pandas: http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#optional-integer-na-support
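As a sketch of what that optional integer NA support looks like in practice, assuming pandas 0.24 or later: the nullable extension dtype is spelled `"Int64"` with a capital I, distinct from NumPy's `int64`. The sample values and column name are illustrative only.

```python
import numpy as np
import pandas as pd

# Nullable integer dtype: note the capital "I" in "Int64".
s = pd.Series([1, 2, np.nan], dtype="Int64")
print(s.dtype)

# The missing entry stays missing; the other values stay integers.
print(s.isna().tolist())

# Arithmetic keeps the integer dtype and propagates the missing value.
shifted = s + 1
print(shifted.dtype)

# An existing float column holding whole numbers and NaN can be converted too.
df = pd.DataFrame({"col": [10.0, np.nan, 30.0]})
df["col"] = df["col"].astype("Int64")
print(df["col"].dtype)
```

This avoids both the string workaround and sentinel values, at the cost of using an extension dtype rather than a plain NumPy integer array.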

Answered on 2024-04-18T05:33:32+08:00
