
NaN values when resampling a pandas DataFrame


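I have a pandas DataFrame with two different columns:

  • a datetime index column;

  • a column containing dicts.

If I run a custom resampler that returns a new dict as its result, I get a NaN value in the resampled DataFrame.

Is it not possible to run a resample that doesn't return numbers?

Thanks, FB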

EDIT1: Here is a sample of the data:

2017-10-15 06:55:14.237039000,"{'SMA120C': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA115_L': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA121_CT': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA110_4L': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA111': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}}"
2017-10-15 06:55:18.584042000,"{'SMA120C': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA115_L': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA121_CT': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA110_4L': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA111': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}}"
2017-10-15 06:55:22.881817000,"{'SMA120C': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA115_L': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA121_CT': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA110_4L': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA111': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}}"
2017-10-15 06:55:27.234606000,"{'SMA120C': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA115_L': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA121_CT': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA110_4L': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA111': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}}"
2017-10-15 06:55:31.593890000,
2017-10-15 06:55:35.978696000,"{'SMA120C': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA115_L': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA121_CT': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA110_4L': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA111': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}}"
2017-10-15 06:55:40.296786000,"{'SMA120C': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA115_L': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA121_CT': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA110_4L': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA111': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}}"
2017-10-15 06:55:44.655286000,"{'SMA120C': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA115_L': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA121_CT': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA110_4L': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA111': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}}"
2017-10-15 06:55:48.957150000,"{'SMA120C': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA115_L': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA121_CT': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA110_4L': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA111': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}}"
2017-10-15 06:55:53.299944000,

I just filter out the rows whose second column does not contain a string-based dict.
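A minimal sketch of that filtering step with pandas, assuming the file has no header row. The column names `timestamp` and `fetched_data` are assumptions (the second matches the name used in the answer), and the dicts are shortened for brevity.

```python
import io

import pandas as pd

# Two rows in the same shape as the sample above (dicts shortened);
# the second row has an empty data field, like 06:55:31 in the sample.
raw = (
    "2017-10-15 06:55:14.237039000,\"{'SMA111': {'status': 9, 'totalProduction': 1488}}\"\n"
    "2017-10-15 06:55:31.593890000,\n"
)

# Assumed column names: the CSV itself has no header row.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["timestamp", "fetched_data"],
    index_col="timestamp",
    parse_dates=True,
)

# Empty fields are read as NaN; na=False treats them as "not a dict string".
mask = df["fetched_data"].str.startswith("{", na=False)
df = df[mask]
print(df)
```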

EDIT2:

The resampler function:

def custom_resampler(array_like):
    ref_el = {}
    data = {}
    for element in filter(lambda item: item is not None, array_like):
        for machine in element.keys():
            if not ref_el.get(machine, None):
                ref_el[machine] = element[machine].get('totalProduction', 0) if isinstance(element[machine], dict) else 0
                # one list per status code '0'..'10'
                data[machine] = {str(status): [] for status in range(11)}
            else:
                status = str(element[machine]['status'])
                total_prod_diff = element[machine].get('totalProduction', 0) - ref_el[machine]
                data[machine][status].append(total_prod_diff)
                ref_el[machine] = element[machine].get('totalProduction', 0)

1 Answer


    You first need to convert the column from strings to dictionaries:

    import ast

    # parse each string into a dict; missing rows become empty dicts
    df['fetched_data'] = df['fetched_data'].fillna('{}').apply(ast.literal_eval)
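A small self-contained illustration of this conversion step on a two-row sample. The column name `fetched_data` is assumed to match the function in this answer. After `fillna('{}')`, a row with no data becomes an empty dict instead of NaN, so the resampler only ever sees dicts.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical two-row frame: one dict-like string and one missing value.
df = pd.DataFrame(
    {"fetched_data": ["{'SMA111': {'status': 4, 'totalProduction': 5332}}", np.nan]},
    index=pd.to_datetime(["2017-10-15 06:55:27.234606", "2017-10-15 06:55:31.593890"]),
)

# Fill missing values with an empty-dict literal, then parse each string.
# ast.literal_eval only evaluates Python literals, not arbitrary code.
df["fetched_data"] = df["fetched_data"].fillna("{}").apply(ast.literal_eval)

print(df["fetched_data"].tolist())
# [{'SMA111': {'status': 4, 'totalProduction': 5332}}, {}]
```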
    

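    Then add a return statement at the end of the function that returns the aggregated output. Without it the function returns None, which is why the resampled cells come out as NaN: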

    def custom_resampler(array_like):
        ref_el = {}
        data = {}
        for element in filter(lambda item: item is not None, array_like['fetched_data']):
            for machine in element.keys():
                if not ref_el.get(machine, None):
                    ref_el[machine] = element[machine].get('totalProduction', 0) if isinstance(element[machine], dict) else 0
                    # one list per status code '0'..'10'
                    data[machine] = {str(status): [] for status in range(11)}
                else:
                    status = str(element[machine]['status'])
                    total_prod_diff = element[machine].get('totalProduction', 0) - ref_el[machine]
                    data[machine][status].append(total_prod_diff)
                    ref_el[machine] = element[machine].get('totalProduction', 0)
        # return the output dict, wrapped in a one-element list
        return [ref_el]
    

    df1 = df.resample('min').apply(custom_resampler)
    print(df1)
                                                              fetched_data
    2017-10-15 06:55:00  {'SMA111': 2809, 'SMA121_CT': 2809, 'SMA110_4L...
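An alternative sketch, assuming a reasonably recent pandas: resample the dict column as a Series and pass a function that explicitly returns its result. `per_status_diffs` is a hypothetical, simplified rewrite of `custom_resampler` (it checks `machine not in last_total` instead of relying on a truthy reference value). Storing a returned dict as a single cell relies on pandas treating it as an opaque object, so it is worth checking on your pandas version.

```python
import ast

import pandas as pd

# Four readings for one machine, taken from the question's sample (dicts shortened).
raw = [
    "{'SMA111': {'status': 9, 'totalProduction': 1488}}",
    "{'SMA111': {'status': 7, 'totalProduction': 6661}}",
    "{'SMA111': {'status': 3, 'totalProduction': 8210}}",
    "{'SMA111': {'status': 4, 'totalProduction': 5332}}",
]
index = pd.to_datetime([
    "2017-10-15 06:55:14",
    "2017-10-15 06:55:18",
    "2017-10-15 06:55:22",
    "2017-10-15 06:55:27",
])
# Parse the strings into dicts up front, as in the answer.
readings = pd.Series(raw, index=index, name="fetched_data").apply(ast.literal_eval)


def per_status_diffs(records):
    """Hypothetical rewrite of custom_resampler that returns the per-status deltas."""
    last_total = {}  # machine -> most recent totalProduction
    data = {}        # machine -> {status as string: [totalProduction deltas]}
    for element in records:
        for machine, info in element.items():
            total = info.get("totalProduction", 0)
            if machine not in last_total:
                # First reading for this machine: only store the reference value.
                data[machine] = {str(code): [] for code in range(11)}
            else:
                data[machine][str(info["status"])].append(total - last_total[machine])
            last_total[machine] = total
    # Without an explicit return the function would return None.
    return data


# All four readings fall into the same one-minute bin, so `result` has one row.
result = readings.resample("min").agg(per_status_diffs)
print(result.iloc[0]["SMA111"]["7"])
```

Returning `data` rather than `[ref_el]` is a deliberate choice in this sketch: `ref_el` only keeps the last `totalProduction` per machine (which is what the printed output above shows), while `data` keeps the per-status deltas. If a minute contains no rows, pandas may call the function with an empty Series, in which case this sketch simply returns an empty dict.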
    
