
Writing netCDF from Pandas with xarray - dimension issues


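I am learning how to use xarray to generate a netCDF file from a Pandas DataFrame. I have followed several tutorials and SO questions, such as "Add 'constant' dimension to xarray Dataset", but I am still having trouble: I cannot get Date_Time, lat and lon as dimensions. When I do an nc dump, they are not correct.

My initial approach reads the txt file into a pandas DataFrame, converts it to an xarray Dataset, and writes it to netCDF: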

import pandas as pd
import xarray

# Import data from the whitespace-delimited text file
colnames1 = ['Date', 'Time', 'latitude', 'longitude', 'Status', 'depth']
# on_bad_lines='skip' replaces error_bad_lines=False (removed in pandas 2.0);
# sep=r'\s+' replaces the deprecated delim_whitespace=True
df2 = pd.read_csv('test.txt', header=0, names=colnames1, sep=r'\s+', on_bad_lines='skip')

# create xarray Dataset from Pandas DataFrame
xr = xarray.Dataset.from_dataframe(df2)

# add variable attribute metadata
xr['latitude'].attrs = {'units': 'degrees', 'long_name': 'Latitude'}
xr['longitude'].attrs = {'units': 'degrees', 'long_name': 'Longitude'}
xr['depth'].attrs = {'units': 'm', 'long_name': 'depth'}


# add global attribute metadata
xr.attrs = {'Conventions': 'CF-1.6', 'title': 'Data', 'summary': 'Data generated'}
# print the Dataset summary
print(xr)
# save to netCDF
xr.to_netcdf('test.nc')

where df2 =

Date            Time  grid_latitude  grid_longitude  Status  depth                                                                   
2017-09-05  13:01:59     -29.034083       31.068567     2.0    0.0   
2017-09-05  13:01:59     -29.039367       31.059150     2.0    0.0   
2017-09-05  13:01:59     -29.036650       31.059200     3.0    0.0   
2017-09-05  13:01:59     -29.036750       31.065417     7.0  100.0   
2017-09-05  13:01:59     -29.039317       31.056050     7.0  100.0   
2017-09-05  13:01:59     -29.034000       31.062367     3.0    0.0   
2017-09-05  13:01:59     -29.036517       31.049900     3.0    0.0   
2017-09-05  13:01:59     -29.031100       31.050000     3.0    0.0

This works, but the dimensions are not correct (see below):

<xarray.Dataset>
Dimensions:    (index: 8)
Coordinates:
  * index      (index) int64 0 1 2 3 4 5 6 7
Data variables:
    Date       (index) object '2017-09-05' '2017-09-05' '2017-09-05' ...
    Time       (index) object '13:01:59' '13:01:59' '13:01:59' '13:01:59' ...
    latitude   (index) float64 -29.03 -29.04 -29.04 -29.04 -29.04 -29.03 ...
    longitude  (index) float64 31.07 31.06 31.06 31.07 31.06 31.06 31.05 31.05
    Status     (index) float64 2.0 2.0 3.0 7.0 7.0 3.0 3.0 3.0
    depth      (index) float64 0.0 0.0 0.0 100.0 100.0 0.0 0.0 0.0
Attributes:
    title: Data
    summary: Data generated
    Conventions: CF-1.6

If I set Date, or a combined Date_Time, as the DataFrame index, the date/time is handled correctly and is treated as a dimension:

<xarray.Dataset>
Dimensions:    (Date: 8)
Coordinates:
  * Date       (Date) object '2017-09-05' '2017-09-05' '2017-09-05' ...
Data variables:
    Time       (Date) object '13:01:59' '13:01:59' '13:01:59' '13:01:59' ...
    latitude   (Date) float64 -29.03 -29.04 -29.04 -29.04 -29.04 -29.03 ...
    longitude  (Date) float64 31.07 31.06 31.06 31.07 31.06 31.06 31.05 31.05
    Status     (Date) float64 2.0 2.0 3.0 7.0 7.0 3.0 3.0 3.0
    depth      (Date) float64 0.0 0.0 0.0 100.0 100.0 0.0 0.0 0.0
Attributes:
    title: Data
    summary: Data generated
    Conventions: CF-1.6
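
As a minimal sketch of the "combined Date_Time" variant described above: the snippet below uses the first three rows of the sample data, joins the Date and Time strings into a single datetime column, and uses that column as the index. The column name Date_Time follows the question's wording; parsing with pd.to_datetime is an assumption about how the columns were merged.

```python
import pandas as pd

# First three rows of the sample data from the question
df = pd.DataFrame({
    "Date": ["2017-09-05"] * 3,
    "Time": ["13:01:59"] * 3,
    "latitude": [-29.034083, -29.039367, -29.036650],
    "longitude": [31.068567, 31.059150, 31.059200],
    "depth": [0.0, 0.0, 0.0],
})

# Combine the two string columns into one datetime64 column
df["Date_Time"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Index on the combined column; to_xarray() turns the index into a dimension
ds = df.drop(columns=["Date", "Time"]).set_index("Date_Time").to_xarray()
print(ds)
```

With a real datetime index the coordinate is stored as datetime64 rather than object, which is generally what netCDF writers need in order to encode a time axis.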

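But if I set df.index on Date_Time, Lat and Lon, it reverts to the blank (index) dimension. I am hoping to get a pointer on setting the dimensions. With the netCDF module, one can create a dimension using the following syntax: lat = dataset.createDimension('lat', 73). The SO example "add dimension to an xarray DataArray" did not quite get me there, or I am missing something. I would like to get to the point where an nc dump produces something similar to this: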

NetCDF dimension information:
        Name: lat
                size: 73
                type: dtype('float32')
                units: u'degrees_north'
                actual_range: array([ 90., -90.], dtype=float32)
                long_name: u'Latitude'
                standard_name: u'latitude'
                axis: u'Y'
        Name: lon
                size: 144
                type: dtype('float32')
                units: u'degrees_east'
                long_name: u'Longitude'
                actual_range: array([   0. ,  357.5], dtype=float32)
                standard_name: u'longitude'
                axis: u'X'
        Name: time
                size: 366
                type: dtype('float64')
                units: u'hours since 1-1-1 00:00:0.0'
                long_name: u'Time'
                actual_range: array([ 17628096.,  17636856.])
                delta_t: u'0000-00-01 00:00:00'
                standard_name: u'time'
                axis: u'T'
                avg_period: u'0000-00-01 00:00:00'

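Otherwise, could I convert the DataFrame columns to np arrays and use the netCDF module? Many thanks in advance. I took a stab at something like this, but I doubt it is on the right track: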

#add dimensions
#d = {}
#d['time'] = ('time',df2.Time)
#d['latitude'] = ('latitude',df2.latitude)
#d['longitude'] = ('longitude', df2.longitude)
#d['var'] = (['time','latitude','longitude','Depth'], xr)
#xr = xarray.Dataset(d)
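
For comparison, here is one hedged way to make the dictionary-style attempt above run. It is a sketch, not the method the answer recommends: it keeps the data as a list of points along a single dimension and attaches time, latitude and longitude as coordinates of that dimension. The dimension name "obs" is hypothetical, and the three rows are taken from the sample data.

```python
import pandas as pd
import xarray

# First three rows of the sample data from the question
df = pd.DataFrame({
    "Date": ["2017-09-05"] * 3,
    "Time": ["13:01:59"] * 3,
    "latitude": [-29.034083, -29.039367, -29.036650],
    "longitude": [31.068567, 31.059150, 31.059200],
    "Status": [2.0, 2.0, 3.0],
    "depth": [0.0, 0.0, 0.0],
})

# Parse the two string columns into numpy datetime64 values
time = pd.to_datetime(df["Date"] + " " + df["Time"]).to_numpy()

# Each entry is (dimension name, values); "obs" is a hypothetical name
# for the single per-observation dimension.
ds = xarray.Dataset(
    data_vars={
        "depth": ("obs", df["depth"].to_numpy()),
        "Status": ("obs", df["Status"].to_numpy()),
    },
    coords={
        "time": ("obs", time),
        "latitude": ("obs", df["latitude"].to_numpy()),
        "longitude": ("obs", df["longitude"].to_numpy()),
    },
)
print(ds)
```

This keeps one value per observation and avoids padding, but time, latitude and longitude are then coordinates along "obs" rather than dimensions in their own right, so it does not produce the gridded layout shown in the nc dump above.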

1 Answer

  • 1

    This is most easily achieved by combining Time, grid_latitude and grid_longitude into a pandas.MultiIndex on the DataFrame with set_index() before converting it into an xarray Dataset.

    For example:

    # note that pandas.DataFrame's to_xarray() method is equivalent to
    # xarray.Dataset.from_dataframe()
    ds = df.set_index(['Time', 'grid_latitude', 'grid_longitude']).to_xarray()
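
To see what this produces, the sketch below rebuilds the eight sample rows from the question in memory and applies the same set_index() call. The Date column is left out for brevity, and the DataFrame construction is illustrative rather than part of the original answer.

```python
import pandas as pd

# The eight sample rows from the question (Date column omitted for brevity)
df = pd.DataFrame({
    "Time": ["13:01:59"] * 8,
    "grid_latitude": [-29.034083, -29.039367, -29.036650, -29.036750,
                      -29.039317, -29.034000, -29.036517, -29.031100],
    "grid_longitude": [31.068567, 31.059150, 31.059200, 31.065417,
                       31.056050, 31.062367, 31.049900, 31.050000],
    "Status": [2.0, 2.0, 3.0, 7.0, 7.0, 3.0, 3.0, 3.0],
    "depth": [0.0, 0.0, 0.0, 100.0, 100.0, 0.0, 0.0, 0.0],
})

# Each level of the MultiIndex becomes one dimension of the Dataset
ds = df.set_index(["Time", "grid_latitude", "grid_longitude"]).to_xarray()

print(dict(ds.sizes))              # size of each dimension
print(int(ds["depth"].count()))    # number of non-NaN depth values
```

Because xarray builds the full product of the index levels, scattered points end up on a 1 x 8 x 8 grid in which only 8 of the 64 cells hold data and the rest are NaN. The combinations in the MultiIndex must also be unique, otherwise the conversion raises an error. The resulting Dataset can then be written with ds.to_netcdf(), as in the question.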
    
