Unable to fill a column in a DataFrame with values from a Series

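I am trying to fill the missing values in a specific column of a DataFrame with the mean of the non-null values of the same type (where the type is given by the value in another column of the DataFrame). Here is code that reproduces my problem: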

import numpy as np
import pandas as pd

df = pd.DataFrame()
#Create the DataFrame with a column of floats
#And a column of labels (str)
np.random.seed(seed=6)
df['col0']=np.random.randn(100)    
lett=['a','b','c','d']
df['col1']=np.random.choice(lett,100)

#Set some of the floats to NaN for the test.
toz = np.random.randint(0,100,25)
df.loc[toz,'col0']=np.nan
df[df['col0'].notnull()].count()

#Create a DF with mean for each label.
w_series = df.loc[(~df['col0'].isnull())].groupby('col1').mean()

          col0
col1
a     0.057199
b     0.363899
c    -0.068074
d     0.251979

#This dataframe has our label (a,b,c,d) as the index. Doesn't seem
#to work when I try to df.fillna(w_series). So I try to reindex such
#that the labels (a,b,c,d) become a column again.
#
#For some reason I cannot just do a set_index and expect the
#old index to become column. So I append the new index and 
#then reset it.
w_series['col2'] = list(range(w_series.size))
w_frame = w_series.set_index('col2',append=True)
w_frame.reset_index('col1',inplace=True)

#I try fillna() with the new dataframe.
df.fillna(w_frame)

Still no luck:

       col0 col1
0  0.057199    b
1  0.729004    a
2  0.217821    d
3  0.251979    c
4 -2.486781    a
5  0.913252    b
6       NaN    a
7       NaN    b

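What am I doing wrong?

How can I fill the DataFrame with the mean of the rows that have the same label as the row with the missing value?

Do the DataFrame being filled (df) and the DataFrame supplying the fill values (w_frame) have to be the same size?

Thanks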

1 Answer

fillna aligns on the index, so the target DataFrame and the object supplying the fill values need to share the same index:

df.set_index('col1')['col0'].fillna(w_frame.set_index('col1').col0).reset_index()

# Only the first 11 rows are shown
Out[74]: 
   col1      col0
0     b  0.363899
1     a  0.729004
2     d  0.217821
3     c -0.068074
4     a -2.486781
5     b  0.913252
6     a  0.057199
7     b  0.363899
8     c -0.068074
9     b -0.429894
10    a  2.631281
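
To make the index alignment concrete, here is a minimal sketch on a small made-up frame (the data and the names means, unchanged, fill_values and filled are illustrative assumptions, not taken from the question). It shows that a Series indexed by label fills nothing, and that mapping each row's label to its mean first gives fill values on the same index as df:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, not the data from the question.
df = pd.DataFrame({
    "col0": [1.0, np.nan, 3.0, np.nan, 5.0, 7.0],
    "col1": ["a", "a", "a", "b", "b", "b"],
})

# Per-label means; mean() skips NaN by default. The index is the label.
means = df.groupby("col1")["col0"].mean()

# Passing `means` directly fills nothing: its index is ('a', 'b'),
# which never matches the row labels 0..5 of df.
unchanged = df["col0"].fillna(means)

# Map each row's label to its mean so the fill values share df's index.
fill_values = df["col1"].map(means)
filled = df["col0"].fillna(fill_values)

print(unchanged.isna().sum())
print(filled.tolist())
```

The two objects do not need to be the same size. What matters is that the labels of the rows to be filled appear in the index of the fill values.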

My way to do the fillna:

# Assign back to col0 (the float column); assigning to col1 would overwrite the labels
df['col0']=df.groupby("col1")['col0'].transform(lambda x: x.fillna(x.mean()))
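
The groupby/transform idea can also be sketched end to end on a small made-up frame (the data and the name group_means are assumptions for illustration). This variant uses transform("mean"), which returns the group mean for every row on the original index, and writes the result back to col0 so that the labels in col1 are left intact:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, not the data from the question.
df = pd.DataFrame({
    "col0": [2.0, np.nan, 4.0, 10.0, np.nan],
    "col1": ["a", "a", "a", "b", "b"],
})

# One mean per row, aligned with df's own index; NaN is skipped.
group_means = df.groupby("col1")["col0"].transform("mean")

# Fill only the missing floats; existing values are untouched.
df["col0"] = df["col0"].fillna(group_means)

print(df)
```

Because group_means already has the same index as df, no set_index or reset_index round trip is needed.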

Answered on 2024-05-03T18:22:10+08:00
