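I am learning how to use xarray to generate a netCDF file from a pandas DataFrame. I have followed several tutorials and the SO question "Add 'constant' dimension to xarray Dataset", but I still have problems because I cannot get Date_Time, lat and lon as dimensions. When I do an nc dump, they are not correct.
My initial approach, importing the txt file into a pandas DataFrame and then going through xarray to netCDF: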
import pandas as pd
import xarray
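# import data from the whitespace-delimited text file
colnames1 = ['Date','Time','latitude','longitude','Status','depth']
# on_bad_lines='skip' and sep=r'\s+' replace error_bad_lines=False and delim_whitespace=True in recent pandas
df2 = pd.read_csv('test.txt', header=0, on_bad_lines='skip', names=colnames1, sep=r'\s+')
# create xarray Dataset from pandas DataFrame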
xr = xarray.Dataset.from_dataframe(df2)
# add variable attribute metadata
xr['latitude'].attrs={'units':'degrees', 'long_name':'Latitude'}
xr['longitude'].attrs={'units':'degrees', 'long_name':'Longitude'}
xr['depth'].attrs={'units':'m', 'long_name':'depth'}
# add global attribute metadata
xr.attrs={'Conventions':'CF-1.6', 'title':'Data', 'summary':'Data generated'}
# print the Dataset
print(xr)
# save to netCDF
xr.to_netcdf('test.nc')
where df2 =
Date Time grid_latitude grid_longitude Status depth
2017-09-05 13:01:59 -29.034083 31.068567 2.0 0.0
2017-09-05 13:01:59 -29.039367 31.059150 2.0 0.0
2017-09-05 13:01:59 -29.036650 31.059200 3.0 0.0
2017-09-05 13:01:59 -29.036750 31.065417 7.0 100.0
2017-09-05 13:01:59 -29.039317 31.056050 7.0 100.0
2017-09-05 13:01:59 -29.034000 31.062367 3.0 0.0
2017-09-05 13:01:59 -29.036517 31.049900 3.0 0.0
2017-09-05 13:01:59 -29.031100 31.050000 3.0 0.0
This works, but the dimensions are not correct (see below):
<xarray.Dataset>
Dimensions: (index: 8)
Coordinates:
* index (index) int64 0 1 2 3 4 5 6 7
Data variables:
Date (index) object '2017-09-05' '2017-09-05' '2017-09-05' ...
Time (index) object '13:01:59' '13:01:59' '13:01:59' '13:01:59' ...
latitude (index) float64 -29.03 -29.04 -29.04 -29.04 -29.04 -29.03 ...
longitude (index) float64 31.07 31.06 31.06 31.07 31.06 31.06 31.05 31.05
Status (index) float64 2.0 2.0 3.0 7.0 7.0 3.0 3.0 3.0
depth (index) float64 0.0 0.0 0.0 100.0 100.0 0.0 0.0 0.0
Attributes:
title: Data
summary: Data generated
Conventions: CF-1.6
If I set Date, or a merged Date_Time column, as the DataFrame index, the date/time comes through fine and is treated as a dimension:
<xarray.Dataset>
Dimensions: (Date: 8)
Coordinates:
* Date (Date) object '2017-09-05' '2017-09-05' '2017-09-05' ...
Data variables:
Time (Date) object '13:01:59' '13:01:59' '13:01:59' '13:01:59' ...
latitude (Date) float64 -29.03 -29.04 -29.04 -29.04 -29.04 -29.03 ...
longitude (Date) float64 31.07 31.06 31.06 31.07 31.06 31.06 31.05 31.05
Status (Date) float64 2.0 2.0 3.0 7.0 7.0 3.0 3.0 3.0
depth (Date) float64 0.0 0.0 0.0 100.0 100.0 0.0 0.0 0.0
Attributes:
title: Data
summary: Data generated
Conventions: CF-1.6
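For reference, a minimal sketch of the single-index case just described. It assumes the first three rows of the table above typed in by hand rather than read from test.txt, and the column name Date_Time is an illustrative choice, not something xarray requires:

```python
import pandas as pd
import xarray

# Stand-in for the parsed file: first three rows of the table above.
df = pd.DataFrame({
    "Date": ["2017-09-05"] * 3,
    "Time": ["13:01:59"] * 3,
    "latitude": [-29.034083, -29.039367, -29.036650],
    "longitude": [31.068567, 31.059150, 31.059200],
    "Status": [2.0, 2.0, 3.0],
    "depth": [0.0, 0.0, 0.0],
})

# Merge the two string columns into one datetime column (the name is an assumption).
df["Date_Time"] = pd.to_datetime(df["Date"] + " " + df["Time"])
df = df.drop(columns=["Date", "Time"]).set_index("Date_Time")

# A single named index becomes the only dimension of the Dataset.
ds = xarray.Dataset.from_dataframe(df)
print(ds)
```

Latitude and longitude remain data variables along Date_Time here, which matches the second output above.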
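But if I set df.index on Date_Time, Lat and Lon, it reverts to a bare (index) dimension. I would appreciate pointers on how to get the dimensions set. With the netCDF module, a dimension can be created with syntax such as lat = dataset.createDimension('lat', 73). The SO example "add dimension to an xarray DataArray" does not get me there, or I am missing something. I want to reach the point where an nc dump produces something similar to this: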
NetCDF dimension information:
Name: lat
size: 73
type: dtype('float32')
units: u'degrees_north'
actual_range: array([ 90., -90.], dtype=float32)
long_name: u'Latitude'
standard_name: u'latitude'
axis: u'Y'
Name: lon
size: 144
type: dtype('float32')
units: u'degrees_east'
long_name: u'Longitude'
actual_range: array([ 0. , 357.5], dtype=float32)
standard_name: u'longitude'
axis: u'X'
Name: time
size: 366
type: dtype('float64')
units: u'hours since 1-1-1 00:00:0.0'
long_name: u'Time'
actual_range: array([ 17628096., 17636856.])
delta_t: u'0000-00-01 00:00:00'
standard_name: u'time'
axis: u'T'
avg_period: u'0000-00-01 00:00:00'
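Otherwise, could I convert the DataFrame columns to NumPy arrays and use the netCDF module? Many thanks in advance. I ventured to try something like this, but I doubt it is on the right path:
#add dimensions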
#d = {}
#d['time'] = ('time',df2.Time)
#d['latitude'] = ('latitude',df2.latitude)
#d['longitude'] = ('longitude', df2.longitude)
#d['var'] = (['time','latitude','longitude','Depth'], xr)
#xr = xray.Dataset(d)
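A hedged sketch of what a valid version of the commented-out attempt could look like, building the Dataset directly from NumPy arrays with explicit dimensions. The 1 x 2 x 2 grid, the variable names and the attribute values are assumptions for illustration (the attributes mirror the dump shown earlier); only two of the grid cells carry values taken from the table:

```python
import numpy as np
import xarray

# Coordinate values: one timestamp and two lat/lon values from the table.
time = np.array(["2017-09-05T13:01:59"], dtype="datetime64[ns]")
lat = np.array([-29.039367, -29.034083])
lon = np.array([31.059150, 31.068567])

# Depth on the full (time, latitude, longitude) grid; unobserved cells stay NaN.
depth = np.full((1, 2, 2), np.nan)
depth[0, 0, 0] = 0.0  # observation at (-29.039367, 31.059150)
depth[0, 1, 1] = 0.0  # observation at (-29.034083, 31.068567)

# Each entry is a (dims, data, attrs) tuple; coords with the same name as
# their dimension become the dimension coordinates.
ds = xarray.Dataset(
    data_vars={
        "depth": (("time", "latitude", "longitude"), depth,
                  {"units": "m", "long_name": "depth"}),
    },
    coords={
        "time": ("time", time, {"standard_name": "time", "axis": "T"}),
        "latitude": ("latitude", lat,
                     {"units": "degrees_north", "standard_name": "latitude", "axis": "Y"}),
        "longitude": ("longitude", lon,
                      {"units": "degrees_east", "standard_name": "longitude", "axis": "X"}),
    },
    attrs={"Conventions": "CF-1.6", "title": "Data"},
)
print(ds)
```

Calling ds.to_netcdf('test.nc') on such a Dataset should then write time, latitude and longitude as netCDF dimensions, provided a netCDF backend is installed.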
1 Answer
This is most easily accomplished by combining Time, grid_latitude and grid_longitude into a pandas.MultiIndex on the DataFrame with set_index() before converting to an xarray Dataset. For example:
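A minimal sketch of that approach, assuming the column names from the question's table and a hand-built DataFrame (its first four rows) standing in for test.txt:

```python
import pandas as pd
import xarray

# Stand-in for the question's data: first four rows of the table.
df = pd.DataFrame({
    "Date": ["2017-09-05"] * 4,
    "Time": ["13:01:59"] * 4,
    "grid_latitude": [-29.034083, -29.039367, -29.036650, -29.036750],
    "grid_longitude": [31.068567, 31.059150, 31.059200, 31.065417],
    "Status": [2.0, 2.0, 3.0, 7.0],
    "depth": [0.0, 0.0, 0.0, 100.0],
})

# Build a MultiIndex; each (Time, lat, lon) combination must be unique.
indexed = df.set_index(["Time", "grid_latitude", "grid_longitude"])

# Every index level becomes a dimension. The result covers the full
# Time x lat x lon grid, so combinations not in the data are NaN.
ds = xarray.Dataset.from_dataframe(indexed)
print(ds)
```

Because the points are scattered rather than gridded, most of the resulting 1 x 4 x 4 array is NaN. That is the price of having latitude and longitude as separate dimensions.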