
Best practice for xarray MultiIndex concat


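I have a set of 1000 (2D) pd.DataFrame objects (say, index: time, columns: run_id), each of which has 3 attributes (say temperature, pressure, location). Ideally, I would like to hold everything in a single xr.DataArray with 5 dimensions (or in an xr.Dataset with 4 dimensions, with the last dimension split into separate data variables).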

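I created DataArrays with two dims and five coords (2 dimension coords plus 3 scalar coords), but xr.concat does not seem to work across multiple dimensions. (I followed the approach described in "Add 'constant' dimension to xarray Dataset".)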

Example: I build the DataArrays from the individual DataFrames and a list of attributes.

import numpy as np
import pandas as pd
import xarray as xr

# Mock data: 500 DataFrames indexed by time, one column per run
data = {}
for i in np.arange(500):
    data[i] = pd.DataFrame(np.random.randn(1000, 8),
                           index=pd.date_range(start='2013-01-01', periods=1000, freq='h'),
                           columns=list('ABCDEFGH'))
df_catalogue = pd.DataFrame(np.random.choice(10,(500, 3)), columns=['temp','pre','zon'])

#Build DataArrays adding scalar coords
res_da = []
for i,v in df_catalogue.iterrows():
    i_df = data[i] # data is a dictionary of properly indexed dataframes

    da = xr.DataArray(i_df.values,
                   coords={'time':i_df.index.values,'runs':i_df.columns.values,
                           'temp':v['temp'], 'pre':v['pre'],'zon':v['zon']},
                   dims=['time','runs'])
    res_da.append(da)

But when I try all_da = xr.concat(res_da, dim=['temp','pre','zon']), I get strange results. What is the best way to achieve something like this:

<xarray.DataArray (time: 8000, runs: 50, temp:8, pre:10, zon: 5)>
array([[[ 4545.453613,  4545.453613, ...,  4545.453613,  4545.453613],
        [ 4545.453613,  4545.453613, ...,  4545.453613,  4545.453613],
        ..., 
        [ 4177.425781,  4177.425781, ...,  4177.425781,  4177.425781]]], dtype=float32)
Coordinates:
  * runs  (runs) object 'A' 'B' ...
  * time  (time) datetime64[ns] 2013-12-31T23:00:00 2014-01-01 ...
  * zon   (zon) 'zon1', 'zon2', 'zon3', ......
  * temp  (temp)  'XX' 'YY', 'ZZ' .....
  * pre   (pre) 'AAA', 'BBB', 'CCC' ....

1 Answer


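    xarray.concat only supports concatenating along a single dimension. We can work around this by concatenating along one temporary dimension, setting a MultiIndex on it, and then unstacking.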
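
    The three steps named above can be sketched on a toy input. This is a minimal illustration, not the setup from the question: the array sizes, the coordinate values, and the temporary dimension name 'prop' are all assumptions chosen to keep the example small.

```python
import numpy as np
import xarray as xr

# Four small 1-D arrays, each tagged with two scalar coordinates
# ('temp' and 'zon'); names and values are illustrative only.
pieces = []
for temp in [10, 20]:
    for zon in ['A', 'B']:
        da = xr.DataArray(
            np.full(3, float(temp)),
            coords={'time': [0, 1, 2], 'temp': temp, 'zon': zon},
            dims=['time'],
        )
        pieces.append(da)

# 1. Concatenate along one new, temporary dimension.
stacked = xr.concat(pieces, dim='prop')
# 2. Combine the per-piece coordinates into a MultiIndex on that dimension.
indexed = stacked.set_index(prop=['temp', 'zon'])
# 3. Unstack the MultiIndex into separate dimensions.
result = indexed.unstack('prop')

print(result.dims)
```

    Each (temp, zon) pair appears exactly once here, which is the condition the recipe relies on.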

    I'm altering your setup code because this only works if each combination of the new coordinates you are creating (['temp','pre','zon']) is unique:

    import numpy as np
    import pandas as pd
    import xarray as xr
    
    # Mock data: 500 DataFrames indexed by time, one column per run
    data = {}
    for i in np.arange(500):
        data[i] = pd.DataFrame(np.random.randn(1000, 8),
                               index=pd.date_range(start='2013-01-01', periods=1000, freq='h'),
                               columns=list('ABCDEFGH'))
    cat_data = [(x, y, z)
                for x in range(20)
                for y in ['a', 'b', 'c', 'd', 'e']
                for z in ['A', 'B', 'C', 'D', 'E']]
    df_catalogue = pd.DataFrame(cat_data, columns=['temp','pre','zon'])
    
    #Build DataArrays adding scalar coords
    res_da = []
    for i,v in df_catalogue.iterrows():
        i_df = data[i] # data is a dictionary of properly indexed dataframes
    
        da = xr.DataArray(i_df.values,
                       coords={'time':i_df.index.values,'runs':i_df.columns.values,
                               'temp':v['temp'], 'pre':v['pre'],'zon':v['zon']},
                       dims=['time','runs'])
        res_da.append(da)
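
    Because the unstack step needs every coordinate combination to be unique, it may be worth checking the catalogue for repeats before concatenating. The following is a minimal sketch using pandas only; the sample rows, including the deliberate duplicate, are made up for illustration.

```python
import pandas as pd

# Hypothetical catalogue with one accidental duplicate combination
# (rows 0 and 3 share the same temp/pre/zon values).
df_catalogue = pd.DataFrame(
    [(0, 'a', 'A'), (0, 'a', 'B'), (1, 'a', 'A'), (0, 'a', 'A')],
    columns=['temp', 'pre', 'zon'],
)

keys = ['temp', 'pre', 'zon']
# keep=False marks every row that takes part in a duplicated combination.
dup_mask = df_catalogue.duplicated(subset=keys, keep=False)
duplicates = df_catalogue[dup_mask]
has_duplicates = bool(dup_mask.any())

print(duplicates)
```

    If has_duplicates is True, the offending rows need to be resolved (for example by adding a further distinguishing coordinate) before the MultiIndex can be unstacked.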
    

    Then we can simply write:

    xr.concat(res_da, dim='prop').set_index(prop=['temp', 'pre', 'zon']).unstack('prop')
    

    This gives the 5D array you wanted:

    <xarray.DataArray (time: 1000, runs: 8, temp: 20, pre: 5, zon: 5)>
    array([[[[[-0.690557, ..., -1.526415],
              ...,
              [ 0.737887, ...,  1.585335]],
    
             ...,
    
             [[ 0.99557 , ...,  0.256517],
              ...,
              [ 0.179632, ..., -1.236502]]],
    
    
            ...,
    
    
            [[[ 0.234426, ..., -0.149901],
              ...,
              [ 1.492255, ..., -0.380909]],
    
             ...,
    
             [[-0.36111 , ..., -0.451571],
              ...,
              [ 0.10457 , ...,  0.722738]]]]])
    Coordinates:
      * time     (time) datetime64[ns] 2013-01-01 2013-01-01T01:00:00 ...
      * runs     (runs) object 'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H'
      * temp     (temp) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
      * pre      (pre) object 'a' 'b' 'c' 'd' 'e'
      * zon      (zon) object 'A' 'B' 'C' 'D' 'E'
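
    Once the data is in this shape, individual runs can be pulled out by label, and the Dataset variant mentioned in the question can be derived from the array. The sketch below uses a small stand-in array with assumed sizes and values rather than the real result; splitting along 'zon' is only one possible choice of dimension.

```python
import numpy as np
import xarray as xr

# Small stand-in for the 5-D result (sizes are illustrative, not the real ones).
arr = xr.DataArray(
    np.arange(2 * 3 * 2 * 2 * 2, dtype=float).reshape(2, 3, 2, 2, 2),
    coords={
        'time': [0, 1],
        'runs': ['A', 'B', 'C'],
        'temp': [0, 1],
        'pre': ['a', 'b'],
        'zon': ['A', 'B'],
    },
    dims=['time', 'runs', 'temp', 'pre', 'zon'],
)

# Label-based selection of one (temp, pre, zon) combination gives back
# the 2-D (time, runs) slice that came from a single DataFrame.
one_run = arr.sel(temp=1, pre='a', zon='B')

# Split the 'zon' dimension into one data variable per zone,
# leaving a 4-D Dataset.
ds = arr.to_dataset(dim='zon')

print(one_run.dims, list(ds.data_vars))
```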
    
