我试图在数据帧的特定列中填充相同类型的非空值的均值(基于数据帧中另一列的值) . 以下是重现我的问题的代码:
import numpy as np
import pandas as pd
df = pd.DataFrame()
#Create the DateFrame with a column of floats
#And a column of labels (str)
np.random.seed(seed=6)
df['col0']=np.random.randn(100)
lett=['a','b','c','d']
df['col1']=np.random.choice(lett,100)
#Set some of the floats to NaN for the test.
toz = np.random.randint(0,100,25)
df.loc[toz,'col0']=np.NaN
df[df['col0'].isnull()==False].count()
#Create a DF with mean for each label.
w_series = df.loc[(~df['col0'].isnull())].groupby('col1').mean()
col0
col1
a 0.057199
b 0.363899
c -0.068074
d 0.251979
#This dataframe has our label (a,b,c,d) as the index. Doesn't seem
#to work when I try to df.fillna(w_series). So I try to reindex such
#that the labels (a,b,c,d) become a column again.
#
#For some reason I cannot just do a set_index and expect the
#old index to become column. So I append the new index and
#then reset it.
w_series['col2'] = list(range(w_series.size))
w_frame = w_series.set_index('col2',append=True)
w_frame.reset_index('col1',inplace=True)
#I try fillna() with the new dataframe.
df.fillna(w_frame)
仍然没有运气:
col0 col1
0 0.057199 b
1 0.729004 a
2 0.217821 d
3 0.251979 c
4 -2.486781 a
5 0.913252 b
6 NaN a
7 NaN b
我究竟做错了什么?
如何使用与缺失信息匹配的特定行的平均值来填充数据框?
填充的数据框的大小(df)和填充数据框(w_frame)是否必须匹配?
谢谢
1 回答
fillna
基于索引,因此,您需要为目标数据框和流程数据框提供相同的索引我去
fillna
的路