Reshaping a Series into a DataFrame matrix in Python

I have a Series of 305 entries with a DatetimeIndex. The data looks like this:

1992-01-31     1.123077 
1992-02-28    -2.174845 
1992-03-31    -3.884848
1992-04-30     8.682919
1992-05-29     1.312976
1992-06-30     7.851080
1992-07-31    -3.192788
1992-08-31    -7.351976
1992-09-30    -6.782217
1992-10-30   -17.182738
1992-11-30     3.898782
1992-12-31   -26.190414
1993-01-29     2.233359
1993-02-26     6.709006
continues with monthly data till December 2017

I want to reshape the data into a DataFrame that has the months as columns and the years as rows, with the values filled in accordingly:

        January  February  March   ...  December
2017    values   values    values  ...  values
2016    values   values    values  ...  values
2015    values   values    values  ...  values
...
1992    values

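I have looked at other posts and tried reshape and asmatrix, but because the series has an uneven length I keep getting this error:

ValueError: total size of new array must be unchanged

What I really want is for NaN to be inserted as the missing value wherever the matrix is oddly shaped. So if 2017 has no value for November or December, those cells would be NaN.

Let me know if anyone can help.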
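
The error can be reproduced with a short sketch. It assumes the 305 observations are held in a plain NumPy array (the `values` array below is a stand-in, not the real data): reshape needs the element count to be a multiple of the row width, and 305 is not divisible by 12.

```python
import numpy as np

# Stand-in for the 305 monthly observations (hypothetical values).
values = np.arange(305, dtype=float)

# 305 is not a multiple of 12, so a 12-column reshape raises ValueError.
try:
    values.reshape(-1, 12)
    reshape_failed = False
except ValueError:
    reshape_failed = True

# Padding the tail with NaN up to the next multiple of 12 makes reshape work,
# but only if the data starts in January and has no gaps in the middle.
pad = (-len(values)) % 12
padded = np.concatenate([values, np.full(pad, np.nan)]).reshape(-1, 12)
```

Padding like this places values by position rather than by date, so a single missing month shifts everything after it. Pivoting on the year and month of the index avoids that.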

3 Answers

  • 0
    s
    
    1992-01-31     1.123077
    1992-02-28    -2.174845
    1992-03-31    -3.884848
    1992-04-30     8.682919
    1992-05-29     1.312976
    1992-06-30     7.851080
    1992-07-31    -3.192788
    1992-08-31    -7.351976
    1992-09-30    -6.782217
    1992-10-30   -17.182738
    1992-11-30     3.898782
    1992-12-31   -26.190414
    1993-01-29     2.233359
    1993-02-26     6.709006
    Name: 1, dtype: float64
    
    type(s)
    pandas.core.series.Series
    

    If necessary, convert the index to datetime -

    s.index = pd.to_datetime(s.index, errors='coerce')
    

    Now, use pd.pivot -

    x = pd.Series(s.index.strftime('%Y %B')).str.split()
    y, m = x.str[0], x.str[1]
    
    pd.pivot(y, m, s)
    
             April    August   December  February   January      July     June  \
    1992  8.682919 -7.351976 -26.190414 -2.174845  1.123077 -3.192788  7.85108   
    1993       NaN       NaN        NaN  6.709006  2.233359       NaN      NaN   
    
             March       May  November    October  September  
    1992 -3.884848  1.312976  3.898782 -17.182738  -6.782217  
    1993       NaN       NaN       NaN        NaN        NaN
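
In recent pandas versions `pd.pivot` expects a DataFrame as its first argument, so the array-based call `pd.pivot(y, m, s)` above may not run as written. A minimal sketch of the same idea follows, assuming a small sample of the question's data. The names `frame`, `month_order` and `wide` are illustrative, and the final `reindex` is an addition that puts the months in calendar order and creates all-NaN columns for months that never occur.

```python
import calendar
import pandas as pd

# A few values copied from the question; the real series is longer.
s = pd.Series(
    [1.123077, -2.174845, -3.884848, 2.233359, 6.709006],
    index=pd.to_datetime(
        ["1992-01-31", "1992-02-28", "1992-03-31", "1993-01-29", "1993-02-26"]
    ),
)

# Build a long-format frame with one row per observation.
frame = pd.DataFrame(
    {
        "year": s.index.year,
        "month": s.index.month_name(),
        "value": s.to_numpy(),
    }
)

# January..December; calendar.month_name[0] is an empty string, so skip it.
month_order = list(calendar.month_name)[1:]

# Pivot to years x months, then force calendar order and all 12 columns.
wide = frame.pivot(index="year", columns="month", values="value").reindex(
    columns=month_order
)
```

Without the `reindex`, the columns come out in alphabetical order, as in the output shown above.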
    
  • 2

    Try something like this

    # Give your series index a name so that we can reset the index and get a new column
    your_series.index = your_series.index.rename('Time')
    df = your_series.to_frame('Values').reset_index()
    
    # Create variables for month and year
    df['Month'] = df.Time.dt.month
    df['Year'] = df.Time.dt.year
    
    # Assuming the (Year, Month) pairs are unique, create a pivot table
    df.pivot(index='Year', columns='Month', values='Values')
    

    The months will be numeric. If you want month names, you have to do

    # Format each timestamp as its full month name, e.g. 'January'
    df['Month'] = df.Time.dt.strftime('%B')
    

    If your month/year pairs are not unique, then do something like this

    df.groupby(['Year','Month']).Values.sum().unstack()
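
The difference between the two cases can be sketched end to end. The example assumes a made-up series in which January 1992 appears twice; the values and the `pivot_failed` flag are illustrative only. `pivot` raises a `ValueError` on the duplicate pair, while the `groupby` plus `unstack` route aggregates it.

```python
import pandas as pd

# Hypothetical data: two observations fall in January 1992.
your_series = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(["1992-01-30", "1992-01-31", "1992-02-28", "1993-01-29"]),
)

# Name the index, then turn it into a regular column.
df = your_series.rename_axis("Time").to_frame("Values").reset_index()
df["Year"] = df["Time"].dt.year
df["Month"] = df["Time"].dt.month

# pivot refuses to reshape when an (index, column) pair occurs more than once.
try:
    df.pivot(index="Year", columns="Month", values="Values")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Aggregating first resolves the duplicates; absent pairs become NaN.
summed = df.groupby(["Year", "Month"])["Values"].sum().unstack()
```

Whether `sum` is the right aggregation depends on the data. For values such as returns or prices, `mean` or `last` may be more appropriate.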
    
  • 2

    Source DF:

    In [159]: df
    Out[159]:
                      val
    date
    1992-01-31   1.123077
    1992-02-28  -2.174845
    1992-03-31  -3.884848
    1992-04-30   8.682919
    1992-05-29   1.312976
    1992-06-30   7.851080
    1992-07-31  -3.192788
    1992-08-31  -7.351976
    1992-09-30  -6.782217
    1992-10-30 -17.182738
    1992-11-30   3.898782
    1992-12-31 -26.190414
    1993-01-29   2.233359
    1993-02-26   6.709006
    

    Solution:

    import calendar
    
    In [158]: (df.assign(year=df.index.year, mon=df.index.month)
                 .pivot(index='year', columns='mon', values='val')
                 .rename(columns=dict(zip(range(13), calendar.month_name))))
    Out[158]:
    mon    January  February     March     April       May     June      July    August  September    October  November   December
    year
    1992  1.123077 -2.174845 -3.884848  8.682919  1.312976  7.85108 -3.192788 -7.351976  -6.782217 -17.182738  3.898782 -26.190414
    1993  2.233359  6.709006       NaN       NaN       NaN      NaN       NaN       NaN        NaN        NaN       NaN        NaN
    

    UPDATE: or an even better and shorter version from @COLDSPEED

    In [164]: pd.pivot(df.index.year, df.index.month, df['val']) \
                .rename(columns=calendar.month_name.__getitem__)
    Out[164]:
    date   January  February     March     April       May     June      July    August  September    October  November   December
    date
    1992  1.123077 -2.174845 -3.884848  8.682919  1.312976  7.85108 -3.192788 -7.351976  -6.782217 -17.182738  3.898782 -26.190414
    1993  2.233359  6.709006       NaN       NaN       NaN      NaN       NaN       NaN        NaN        NaN       NaN        NaN
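
The question asks for the most recent year in the first row, which neither output above does. A minimal sketch of that last step follows, assuming a three-row sample of the source DF. It passes the frame to `pd.pivot` with keyword arguments, the form recent pandas versions accept, and adds `sort_index(ascending=False)`. The name `wide` is illustrative.

```python
import calendar
import pandas as pd

# Three rows copied from the source DF above.
df = pd.DataFrame(
    {"val": [1.123077, -2.174845, 2.233359]},
    index=pd.to_datetime(["1992-01-31", "1992-02-28", "1993-01-29"]).rename("date"),
)

wide = (
    pd.pivot(
        df.assign(year=df.index.year, mon=df.index.month),
        index="year",
        columns="mon",
        values="val",
    )
    # Map month numbers 1..12 to their names.
    .rename(columns=calendar.month_name.__getitem__)
    # Newest year first, as requested in the question.
    .sort_index(ascending=False)
)
```

Only months present in the data become columns here. Reindexing the columns over `range(1, 13)` before the rename would add the missing ones as NaN.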
    
