Replacing NaN returns ValueError: Array conditional must be same shape as self

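My goal is to impute erroneous values (zeros and negatives) using 'ffill' if they occur before 7 am and 'interpolate' if they occur at or after 7 am. My text file contains thousands of days and hundreds of columns. Below is a small portion of it, showing errors both before and after 7 am across three days.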

date                 a    b    c        
2016-03-02 06:55:00  0.0  1.0  0.0
2016-03-02 07:00:00  2.0  2.0  0.0
2016-03-02 07:55:00  3.0  0.0  3.0
2016-03-03 06:10:00 -4.0  4.0  0.0
2016-03-03 07:00:00  5.0  5.0  5.0
2016-03-03 07:05:00  6.0  0.0  6.0
2016-03-03 08:05:00  7.0  0.0  7.0
2016-03-03 17:40:00  8.0  8.0 -8.0
2016-03-04 05:55:00  0.0  9.0  0.0
2016-03-04 06:00:00  0.0  0.0  10.0

The short snippet below, taken from another post, works perfectly with other DataFrames when 'date' is a column.

import numpy as np
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Change zeros and negatives to NaN
df.replace(0, np.nan, inplace=True)  
df[df < 0] = np.nan                  

# construct Boolean switch series
switch = (df.index - df.index.normalize()) > pd.to_timedelta('07:00:00')

# use DataFrame.where to choose between the two scenarios
df.iloc[:, 0:] = df.iloc[:, 0:].interpolate().where(switch, df.iloc[:, 0:].ffill())

However, when 'date' becomes the index, the code raises ValueError: Array conditional must be same shape as self. Any help is appreciated.
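
A minimal sketch of where the error most likely comes from, assuming a recent pandas version: arithmetic and comparison on a DatetimeIndex return a plain one-dimensional NumPy boolean array, which `DataFrame.where` compares against the frame's two-dimensional shape. The three-row frame below is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame, used only to show the shapes involved
df = pd.DataFrame(
    {"a": [0.0, 2.0, 3.0], "b": [1.0, 2.0, 0.0]},
    index=pd.to_datetime(
        ["2016-03-02 06:55:00", "2016-03-02 07:00:00", "2016-03-02 07:55:00"]
    ),
)

# Comparing on the index yields a NumPy array, not a pandas Series
switch = (df.index - df.index.normalize()) > pd.to_timedelta("07:00:00")
print(type(switch), switch.shape, df.shape)

# A 1-D array condition does not match the 2-D frame, so where() refuses it
try:
    df.where(switch, np.nan)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

When the same comparison is made on a 'date' column instead, the result is a Series, which pandas can align row by row.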

1 Answer

The following finally solved my problem.

df['date'] = pd.to_datetime(df['date'])
# don't set column 'date' to index

# Change zeros and negatives to NaN
df.replace(0, np.nan, inplace=True)  
df[df.loc[:, df.columns != 'date'] < 0] = np.nan # change negatives to NaN,   
                                                 # but exclude column 'date',   
                                                 # otherwise, column 'date' will be   
                                                 # converted to NaT  

# construct Boolean switch series
switch = (df['date'] - df['date'].dt.normalize()) > pd.to_timedelta('07:00:00')

# use DataFrame.where to choose between the two scenarios
df.iloc[:, 0:] = df.iloc[:, 0:].interpolate().where(switch, df.iloc[:, 0:].ffill())

Thanks to @jpp for suggesting the most important last two lines here.
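
For readers who would rather keep 'date' as the index, here is one possible sketch rather than the method used above: the one-dimensional condition is expanded to the frame's shape before being passed to `DataFrame.where`. It uses the sample rows from the question. The `>=` comparison (so that 07:00:00 itself counts as "at or after 7 am") and the use of `DataFrame.mask` for the NaN step are assumptions based on the stated goal.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame(
    {
        "date": [
            "2016-03-02 06:55:00", "2016-03-02 07:00:00", "2016-03-02 07:55:00",
            "2016-03-03 06:10:00", "2016-03-03 07:00:00", "2016-03-03 07:05:00",
            "2016-03-03 08:05:00", "2016-03-03 17:40:00", "2016-03-04 05:55:00",
            "2016-03-04 06:00:00",
        ],
        "a": [0.0, 2.0, 3.0, -4.0, 5.0, 6.0, 7.0, 8.0, 0.0, 0.0],
        "b": [1.0, 2.0, 0.0, 4.0, 5.0, 0.0, 0.0, 8.0, 9.0, 0.0],
        "c": [0.0, 0.0, 3.0, 0.0, 5.0, 6.0, 7.0, -8.0, 0.0, 10.0],
    }
)
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# Turn zeros and negatives into NaN in one step
df = df.mask(df <= 0)

# 1-D boolean array: True for rows at or after 07:00 (assumed reading of the goal)
time_of_day = df.index - df.index.normalize()
switch = np.asarray(time_of_day >= pd.Timedelta(hours=7))

# Repeat the row condition across all columns so it matches df.shape
mask = np.tile(switch[:, None], (1, df.shape[1]))

# Interpolate where mask is True, forward-fill elsewhere
result = df.interpolate().where(mask, df.ffill())
print(result)
```

Leading values with nothing before them (for example the first row of column a) stay NaN under both methods, and the default `interpolate()` is linear over row positions rather than over elapsed time.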

Answered on 2024-04-27T11:07:56+08:00
