
Unable to fill a column in a DataFrame with values from a Series


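I am trying to fill the NaN values in a specific column of a DataFrame with the mean of the non-null values of the same type (where the "type" is given by the value of another column in the DataFrame). Here is code that reproduces my problem: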

import numpy as np
import pandas as pd

df = pd.DataFrame()
#Create the DataFrame with a column of floats
#And a column of labels (str)
np.random.seed(seed=6)
df['col0']=np.random.randn(100)    
lett=['a','b','c','d']
df['col1']=np.random.choice(lett,100)

#Set some of the floats to NaN for the test.
toz = np.random.randint(0,100,25)
df.loc[toz,'col0']=np.nan
df[df['col0'].isnull()==False].count()

#Create a DF with mean for each label.
w_series = df.loc[(~df['col0'].isnull())].groupby('col1').mean()



          col0
col1          
a     0.057199
b     0.363899
c    -0.068074
d     0.251979

#This dataframe has our label (a,b,c,d) as the index. Doesn't seem
#to work when I try to df.fillna(w_series). So I try to reindex such
#that the labels (a,b,c,d) become a column again.
#
#For some reason I cannot just do a set_index and expect the
#old index to become column. So I append the new index and 
#then reset it.
w_series['col2'] = list(range(w_series.size))
w_frame = w_series.set_index('col2',append=True)
w_frame.reset_index('col1',inplace=True)

#I try fillna() with the new dataframe.
df.fillna(w_frame)

Still no luck:

       col0 col1
0  0.057199    b
1  0.729004    a
2  0.217821    d
3  0.251979    c
4 -2.486781    a
5  0.913252    b
6       NaN    a
7       NaN    b
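A small sketch of what goes wrong above, using hypothetical toy data rather than the random data from the question: `DataFrame.fillna` with a DataFrame argument matches cells by row index label and column name, not by the values in `col1`. Since `w_frame` ends up with index 0..3 and a `col0` column, each NaN in rows 0 to 3 receives whichever label's mean happens to sit at that row number, and NaNs in later rows are left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: NaNs in rows 0 (label "b") and 3 (label "a").
df = pd.DataFrame({
    "col0": [np.nan, 2.0, 4.0, np.nan, 6.0],
    "col1": ["b", "a", "b", "a", "b"],
})

# Per-label means with the labels moved back into a column, so the
# index is 0..1, mimicking w_frame from the question:
#   col1  col0
# 0    a   2.0
# 1    b   5.0
w_frame = df.groupby("col1").mean().reset_index()

# fillna with a DataFrame matches on row label and column name:
# row 0 (a "b" row) gets w_frame.loc[0, "col0"], which is the mean of "a",
# and row 3 has no counterpart in w_frame, so it stays NaN.
filled = df.fillna(w_frame)
print(filled)
```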

What am I doing wrong?

How can I fill each missing value with the mean of the rows that share its label?

Do the sizes of the DataFrame being filled (df) and the DataFrame supplying the fill values (w_frame) have to match?

Thanks

1 Answer


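    fillna aligns on the index, so the target and the object supplying the fill values need to share the same index (here, the labels in col1):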

    df.set_index('col1')['col0'].fillna(w_frame.set_index('col1').col0).reset_index()
    
    # Only the first 11 rows are shown
    Out[74]: 
       col1      col0
    0     b  0.363899
    1     a  0.729004
    2     d  0.217821
    3     c -0.068074
    4     a -2.486781
    5     b  0.913252
    6     a  0.057199
    7     b  0.363899
    8     c -0.068074
    9     b -0.429894
    10    a  2.631281
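A minimal, self-contained version of the answer's approach on hypothetical toy data. It also bears on the size question: the Series of means has only two entries while the frame has five rows, and that is fine because `fillna` only needs the labels to line up. The last step swaps in `Series.map` as an alternative (not part of the answer) that keeps the original index and column order.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the question's random data).
df = pd.DataFrame({
    "col0": [np.nan, 2.0, 4.0, np.nan, 6.0],
    "col1": ["b", "a", "b", "a", "b"],
})

# One mean per label, indexed by label: a 2-entry Series for a 5-row frame.
# GroupBy.mean skips NaN values by default.
means = df.groupby("col1")["col0"].mean()

# The answer's approach: make col1 the index of the target too, so each
# NaN is matched to the mean with the same label; then restore col1.
by_label = df.set_index("col1")["col0"].fillna(means).reset_index()

# Alternative (not from the answer): look up each row's group mean with
# map and fill in place, keeping df's original index.
df["col0"] = df["col0"].fillna(df["col1"].map(means))

print(by_label)
print(df)
```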
    

    My go-to way to do this fillna:

    # Assign the result back to col0; col1 holds the labels
    df['col0']=df.groupby("col1")['col0'].transform(lambda x: x.fillna(x.mean()))
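The groupby/transform approach, run on hypothetical toy data. `transform` returns a result aligned to the original index, which is why it can be assigned straight back to `col0`. The variant using `transform("mean")` is an addition, not from the answer: it broadcasts each group's mean to every row and then does one plain `fillna`, avoiding a Python lambda call per group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
raw = pd.DataFrame({
    "col0": [np.nan, 2.0, 4.0, np.nan, 6.0],
    "col1": ["b", "a", "b", "a", "b"],
})

# Answer's approach: fill each group's NaNs with that group's own mean.
df = raw.copy()
df["col0"] = df.groupby("col1")["col0"].transform(lambda x: x.fillna(x.mean()))

# Variant: transform("mean") gives a same-length Series holding each
# row's group mean, which fillna then aligns on the shared index.
alt = raw.copy()
alt["col0"] = alt["col0"].fillna(alt.groupby("col1")["col0"].transform("mean"))

print(df)
print(alt)
```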
    
