Creating an empty Pandas DataFrame, then filling it?

252

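I'm starting from the pandas DataFrame docs here: http://pandas.pydata.org/pandas-docs/stable/dsintro.html

I'd like to iteratively fill the DataFrame with values from a time-series kind of calculation. So basically, I'd like to initialize the DataFrame with columns A, B and timestamp rows, all 0 or all NaN.

I'd then add initial values and go over this data, calculating each new row from the row before, say row[A][t] = row[A][t-1]+1 or so.

I'm currently using the code below, but I feel it's kind of ugly and there must be a way to do this with a DataFrame directly, or just a better way in general. Note: I'm using Python 2.7.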

import datetime as dt
import pandas as pd
import scipy as s

if __name__ == '__main__':
    base = dt.datetime.today().date()
    dates = [ base - dt.timedelta(days=x) for x in range(0,10) ]
    dates.sort()

    valdict = {}
    symbols = ['A','B', 'C']
    for symb in symbols:
        valdict[symb] = pd.Series( s.zeros( len(dates)), dates )

    for thedate in dates:
        if thedate > dates[0]:
            for symb in valdict:
                valdict[symb][thedate] = 1+valdict[symb][thedate - dt.timedelta(days=1)]

    print valdict

4 Answers

-1

Here are a couple of suggestions:

Use date_range for the index:

import datetime
import pandas as pd
import numpy as np

todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')

columns = ['A','B', 'C']

Note: we could create an empty DataFrame (filled with NaNs) simply by writing:

df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs

But to do these kinds of calculations on the data, use a numpy array:

data = np.array([np.arange(10)]*3).T

Hence we can create the DataFrame:

In [10]: df = pd.DataFrame(data, index=index, columns=columns)

In [11]: df
Out[11]: 
            A  B  C
2012-11-29  0  0  0
2012-11-30  1  1  1
2012-12-01  2  2  2
2012-12-02  3  3  3
2012-12-03  4  4  4
2012-12-04  5  5  5
2012-12-05  6  6  6
2012-12-06  7  7  7
2012-12-07  8  8  8
2012-12-08  9  9  9
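
The recurrence from the question (each row equals the previous row plus 1) can also be filled into a NumPy array first and wrapped in a DataFrame once at the end. A minimal sketch, assuming a fixed start date and an initial value of 0 in every column; both are illustrative choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Assumed for illustration: a fixed start date and three columns.
index = pd.date_range("2012-11-29", periods=10, freq="D")
columns = ["A", "B", "C"]

# Fill a plain NumPy array row by row; this is typically cheaper than
# writing into a DataFrame one cell at a time.
data = np.zeros((len(index), len(columns)))
for t in range(1, len(index)):
    data[t] = data[t - 1] + 1  # row[t] = row[t-1] + 1

# Wrap the finished array in a DataFrame once.
df = pd.DataFrame(data, index=index, columns=columns)
```

The loop body is where a real time-series update would go; the `+ 1` step simply mirrors the example in the question.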

Answered on 2024-05-11T17:18:57+08:00

93
If you simply want to create an empty data frame and fill it with some incoming data frames later, try this:

In this example I am using this pandas doc to create a new data frame, and then using append to write the data from oldDF into newDF.

Have a look at this
```
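import pandas as pd

newDF = pd.DataFrame()  # creates a new dataframe that's empty
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call.
# oldDF is assumed to be an existing DataFrame.
newDF = pd.concat([newDF, oldDF], ignore_index=True)  # ignoring index is optional
# try printing some data from newDF
print(newDF.head())  # again optional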
```
- If I have to keep appending new data into this newDF from more than one oldDF, I just use a for loop to iterate over pandas.DataFrame.append()
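
`DataFrame.append` was removed in pandas 2.0, so on current versions a common pattern for several incoming frames is to collect them in a list and call `pd.concat` once. A minimal sketch; the `old_dfs` list and its contents are made up for illustration:

```python
import pandas as pd

# Hypothetical incoming frames; in practice these come from elsewhere.
old_dfs = [
    pd.DataFrame({"A": [1, 2], "B": [3, 4]}),
    pd.DataFrame({"A": [5], "B": [6]}),
    pd.DataFrame({"A": [7, 8], "B": [9, 10]}),
]

# One concat at the end copies the data once, instead of once per frame.
newDF = pd.concat(old_dfs, ignore_index=True)
print(newDF)
```

Concatenating once avoids re-copying the accumulated rows on every iteration, which is what makes appending inside a loop slow for large inputs.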
Answered on 2024-05-11T17:18:57+08:00
216
Assume a dataframe with 19 rows
```
index=range(0,19)
index

columns=['A']
test = pd.DataFrame(index=index, columns=columns)
```
Keeping column A as a constant
```
test['A']=10
```
Keeping column b as a variable given by a loop
```
for x in range(0,19):
    test.loc[[x], 'b'] = pd.Series([x], index = [x])
```
You can replace the first x in pd.Series([x], index = [x]) with any value
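
If the loop values can be computed up front, the same column can also be assigned in one step instead of cell by cell. A minimal sketch, assuming the values are simply 0 through 18 as in the loop above:

```python
import pandas as pd

index = range(0, 19)
test = pd.DataFrame(index=index, columns=["A"])
test["A"] = 10  # constant column, as above

# Build the values first, then assign the whole column at once.
values = [x for x in range(0, 19)]
test["b"] = values
```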
Answered on 2024-05-11T17:18:57+08:00
40
If you want to have your column names in place from the start, use this approach:
```
import pandas as pd

col_names =  ['A', 'B', 'C']
my_df  = pd.DataFrame(columns = col_names)
my_df
```
If you want to add a record to the dataframe it would be better to use:
```
my_df.loc[len(my_df)] = [2, 4, 5]
```
You might also want to pass a dictionary:
```
my_dic = {'A':2, 'B':4, 'C':5}
my_df.loc[len(my_df)] = my_dic
```
However if you want to add another dataframe to my_df do as follows:
```
col_names =  ['A', 'B', 'C']
my_df2  = pd.DataFrame(columns = col_names)
my_df = pd.concat([my_df, my_df2])  # DataFrame.append was removed in pandas 2.0
```
If you are adding rows inside a loop, consider the performance issues:
for roughly the first 1000 records, "my_df.loc" performs better, but it gradually becomes slower as the number of records in the loop grows.

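If you plan to do things inside a big loop (say 10M records or so),
you are better off using a mixture of these two: fill a temporary dataframe with iloc until its size reaches around 1000, then append it to the original dataframe and empty the temporary dataframe. This would boost your performance by around 10 times.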
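
The batching idea above could be sketched roughly as follows. Note that this variant buffers rows in a plain Python list instead of a temporary DataFrame, which is a swapped-in simplification rather than the exact method described. The chunk size of 1000 comes from the answer; `make_row` and the record count are hypothetical.

```python
import pandas as pd

col_names = ["A", "B", "C"]
CHUNK_SIZE = 1000   # batch size suggested above
N_RECORDS = 5500    # hypothetical; kept small so the example runs quickly


def make_row(i):
    """Hypothetical row producer; stands in for the real computation."""
    return [i, i * 2, i * 3]


chunks = []   # completed DataFrame chunks
buffer = []   # rows waiting to be flushed

for i in range(N_RECORDS):
    buffer.append(make_row(i))
    if len(buffer) == CHUNK_SIZE:
        chunks.append(pd.DataFrame(buffer, columns=col_names))
        buffer = []  # empty the temporary buffer

# Flush whatever is left over, then combine everything once.
if buffer:
    chunks.append(pd.DataFrame(buffer, columns=col_names))
my_df = pd.concat(chunks, ignore_index=True)
```

Whether this gives the tenfold speedup mentioned above depends on the data and the pandas version, so it is worth timing on your own workload.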
Answered on 2024-05-11T17:18:57+08:00
