Pandas DataFrame automatically uses the wrong values as the index

I am trying to create DataFrames from a JSON file.

I have a list named "Series_participants" that contains part of this JSON file. When I print it, my list looks like this:

```
participantId                                                                1
championId                                                                  76
stats                        {'item0': 3265, 'item2': 3143, 'totalUnitsHeal...
teamId                                                                     100
timeline                     {'participantId': 1, 'csDiffPerMinDeltas': {'1...
spell1Id                                                                     4
spell2Id                                                                    12
highestAchievedSeasonTier                                               SILVER
dtype: object
<class 'list'>
```

I then convert this list to a DataFrame like this:

```
pd.DataFrame(Series_participants)
```

However, pandas uses the values of "stats" and "timeline" as the DataFrame index. I expected an automatic range index (0, ..., n).

Edit 1:

```
   participantId  championId  stats  teamId  timeline  spell1Id  spell2Id highestAchievedSeasonTier
0              1          76   3265     100       NaN         4        12                    SILVER
```

I would like a DataFrame whose "stats" and "timeline" columns each contain the full dict, as shown in the Series display above.

What is my mistake?

Edit 2:

I tried to create the DataFrame manually, but pandas ignored my choice and still ended up using the keys of the Series "stats" entry as the index.

Here is my code:

```
for j in range(0,len(df.participants[0])):
    for i in range(0,len(df.participants[0][0])):
        Series_participants = pd.Series(df.participants[0][i])
        test = {'participantId':Series_participants.values[0],'championId':Series_participants.values[1],'stats':Series_participants.values[2],'teamId':Series_participants.values[3],'timeline':Series_participants.values[4],'spell1Id':Series_participants.values[5],'spell2Id':Series_participants.values[6],'highestAchievedSeasonTier':Series_participants.values[7]}

        if j == 0:
            df_participants = pd.DataFrame(test)
        else:
            df_participants.append(test, ignore_index=True)
```

The double loop is there to parse all the "participants" in my JSON file.

Final edit:

I achieved what I wanted with the following code:

```
for i in range(0,len(df.participants[0])):
    Series_participants = pd.Series(df.participants[0][i])

    df_test = pd.DataFrame(data=[Series_participants.values], columns=['participantId','championId','stats','teamId','timeline','spell1Id','spell2Id','highestAchievedSeasonTier'])

    if i == 0:
        df_participants = pd.DataFrame(df_test)
    else:
        df_participants = df_participants.append(df_test, ignore_index=True)

print(df_participants)
```
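
A note for readers on current pandas: `DataFrame.append` was removed in pandas 2.0, so the loop above only runs on older versions. The following is a minimal sketch of the same idea using `pd.concat`, assuming each participant is a plain dict as in the printout earlier. The `participants` list and its values are made up for illustration and stand in for `df.participants[0]`.

```python
import pandas as pd

# Hypothetical sample standing in for df.participants[0]: a list of dicts
# whose "stats" and "timeline" entries are themselves nested dicts.
participants = [
    {"participantId": 1, "championId": 76,
     "stats": {"item0": 3265, "item2": 3143},
     "timeline": {"participantId": 1}},
    {"participantId": 2, "championId": 11,
     "stats": {"item0": 1400, "item2": 3047},
     "timeline": {"participantId": 2}},
]

frames = []
for participant in participants:
    # Wrapping the dict in a list makes pandas read it as one row, so the
    # nested dicts stay inside single cells instead of becoming the index.
    frames.append(pd.DataFrame([participant]))

# Stack the one-row frames and renumber the index as 0..n-1.
df_participants = pd.concat(frames, ignore_index=True)

print(df_participants)
```

Since every row here comes from a dict, `pd.DataFrame(participants)` should give the same frame in a single call; the loop is kept only to mirror the code above.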

Thanks, everyone, for your help!

3 Answers

1
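If you pass a list, Series, or array that contains dicts into an object constructor, pandas does not recognize what you are trying to do. One way around this is to set the value manually: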
```
df.at['a', 'b'] = {'x':value}
```
Note that this only works if the column and the index label already exist in your DataFrame.
Answered on 2024-04-30T15:59:08+08:00
1
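Update per the comments: a pandas DataFrame can hold dictionaries, but this is not recommended.

Pandas interprets each dictionary key as needing its own index entry, then broadcasts the single-item columns across those entries.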
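
The behaviour described here can be reproduced with a small sketch. The record below is a made-up, shortened version of the participant data from the question. Passing it to the constructor as a bare dict triggers the broadcasting, while wrapping it in a list makes pandas treat it as a single row.

```python
import pandas as pd

# Hypothetical, shortened participant record (values are illustrative).
record = {
    "participantId": 1,
    "stats": {"item0": 3265, "item2": 3143},
    "timeline": {"participantId": 1},
}

# Bare dict: the keys of the nested dicts become the index, and the scalar
# "participantId" is broadcast to every row. Keys missing from a nested
# dict show up as NaN in that column.
broadcast = pd.DataFrame(record)
print(broadcast)

# Same dict wrapped in a list: one row, default integer index, and the
# nested dicts stay intact inside their cells.
single_row = pd.DataFrame([record])
print(single_row)
```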

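So, to help with what you are trying to do, I suggest reading the dictionary items in as columns. That is what DataFrames are normally used for, and what they are very good at.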
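
One way to read dictionary items in as columns is `pandas.json_normalize`. It is not mentioned in this answer and is shown here only as a sketch of the idea, with made-up sample records. Nested keys come out as dotted column names such as `stats.item0`.

```python
import pandas as pd

# Hypothetical sample records; "stats" is a nested dict as in the question.
participants = [
    {"participantId": 1, "stats": {"item0": 3265, "item2": 3143}},
    {"participantId": 2, "stats": {"item0": 1400, "item2": 3047}},
]

# Flatten each record: top-level keys and nested keys all become columns,
# and the rows get the default 0..n-1 index.
flat = pd.json_normalize(participants)

print(flat)
```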

Example of the error caused by pandas trying to read the dictionary in as key/value pairs:
```
df = pd.DataFrame(columns= ['a', 'b'], index=['a', 'b'])
df.loc['a','a'] = {'apple': 2}
```
This raises:
```
ValueError: Incompatible indexer with Series
```
Per jpp in the comments below (when using the constructor method):

“They can hold arbitrary types, for example:
```
df.iat[0, 0] = {'apple': 2}
```
However, using pandas this way is not recommended.”
Answered on 2024-04-30T15:59:08+08:00

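For efficiency, you should try to manipulate your data while building the DataFrame rather than as a separate step afterwards.

However, to split the dictionary keys and values, you can use a combination of numpy.repeat and itertools.chain. Here is a minimal example: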

```
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2],
                   'B': [{'key1': 'val0', 'key2': 'val9'},
                         {'key1': 'val1', 'key2': 'val2'}],
                   'C': [{'key3': 'val10', 'key4': 'val8'},
                         {'key3': 'val3', 'key4': 'val4'}]})

chainer = chain.from_iterable

# Number of key/value pairs in each dict of column B
lens = df['B'].map(len)

# Repeat each A value once per key and flatten the dict values of B
res = pd.DataFrame({'A': np.repeat(df['A'], lens),
                    'B': list(chainer(df['B'].map(lambda x: x.values())))})

# Use the dict keys of B as the index
res.index = list(chainer(df['B'].map(lambda x: x.keys())))

print(res)
```
Output:
```
      A     B
key1  1  val0
key2  1  val9
key1  2  val1
key2  2  val2
```

Answered on 2024-04-30T15:59:08+08:00
