ICDC Data for Python users
Intake YAML catalog
In a first test phase, we want to make it easier for Python users on mistral to read our ICDC data. To this end, we created a YAML catalog, initially for ocean data only, which you can read with the Python library intake. Here are some short examples of how to use this catalog on mistralpp after loading the anaconda3 module:
module unload netcdf_c
module load anaconda3
If you find this new feature useful, we would also create catalogs for all other spheres and for the CEN network (under /data/icdc). Please give us feedback!
Example 1
Which data sets are available?
import intake
# load yaml file as a catalogue containing almost all ocean data of ICDC
cat = intake.open_catalog("/pool/data/ICDC/ocean/ocean_mistral.yml")
# show set of entries
list(cat)
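Since list(cat) returns plain entry names, the list can be searched with ordinary Python. The following is a minimal sketch, not part of intake: the helper find_entries is hypothetical, and the name list stands in for list(cat). Only "hadisst" and "oscar_surface_current_velocity" are taken from this page; "example_sst_product" is made up for illustration.

```python
def find_entries(names, keyword):
    """Return all entry names that contain keyword (case-insensitive), sorted."""
    keyword = keyword.lower()
    return sorted(name for name in names if keyword in name.lower())


# Stand-in for list(cat); "example_sst_product" is a hypothetical entry name.
example_names = ["hadisst", "oscar_surface_current_velocity", "example_sst_product"]

print(find_entries(example_names, "sst"))
# → ['example_sst_product', 'hadisst']
```

With the real catalog, find_entries(list(cat), "sst") would be called in the same way.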
Example 2
Load a specific data set (HadISST):
import intake
# load yaml file as a catalogue containing almost all ocean data of ICDC
cat = intake.open_catalog("/pool/data/ICDC/ocean/ocean_mistral.yml")
# load data set hadisst into an xarray dataset
ds = cat["hadisst"].to_dask()
# from here on use it as an xarray dataset
ds.info()
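To illustrate what "use it as an xarray dataset" can look like, here is a minimal sketch of a time selection followed by a time mean. It runs on a small synthetic dataset instead of the real HadISST data, so the variable name "sst" and the coordinate names "time", "lat" and "lon" are assumptions. Check the output of ds.info() for the actual names.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for ds = cat["hadisst"].to_dask().
# The variable name "sst" and the coordinate names are assumptions.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.array([-10.0, 0.0, 10.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
values = np.full((time.size, lat.size, lon.size), 20.0)
ds = xr.Dataset(
    {"sst": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# select one year by label; both end points of the slice are included
sst_2000 = ds["sst"].sel(time=slice("2000-01-01", "2000-12-31"))

# average over time to get one value per grid cell
sst_mean = sst_2000.mean(dim="time")

print(sst_2000.sizes["time"], sst_mean.dims)
```

The same .sel() and .mean() calls should work on a dataset loaded from the catalog once the names are adapted.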
Example 3
Loading data sets with additional parameters: Some data sets offer additional options called "parameters". For example, OSCAR Surface Current Velocity can be loaded at either 1 degree or 0.33 degree resolution. It is therefore worth looking at the YAML file /pool/data/ICDC/ocean/ocean_mistral.yml before you load a data set.
Here is the OSCAR excerpt from the YAML file:
oscar_surface_current_velocity:
  driver: netcdf
  description: global near-surface current estimates
  metadata:
    url_origin: icdc.cen.uni-hamburg.de/en/oscar-oceansurfacecurrent.html
  parameters:
    s_res:
      description: spatial resolution
      type: str
      default: 1degree
      allowed: [1degree, 1third_degree]
  args:
    urlpath: /pool/data/ICDC/ocean/oscar_surface_current_velocity/DATA/{{s_res}}/*oscar_vel*.nc
    xarray_kwargs:
      concat_dim: time
      combine: nested
      drop_variables: depth
You can use the parameter s_res in Python like this:
import intake
# load yaml file as a catalogue containing almost all ocean data of ICDC
cat = intake.open_catalog("/pool/data/ICDC/ocean/ocean_mistral.yml")
# load data set oscar_surface_current_velocity in 0.33 degrees resolution
ds = cat.oscar_surface_current_velocity(s_res="1third_degree").to_dask()
ds.info()
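To make the effect of s_res more concrete, the following sketch mimics how the {{s_res}} placeholder in urlpath is filled in and checked against the allowed values. It is a plain-Python illustration under stated assumptions, not intake's actual implementation: the helper expand_urlpath is hypothetical, and the path, default and allowed values are copied from the OSCAR excerpt of the YAML file.

```python
# Values copied from the OSCAR entry of the YAML catalog
URLPATH = "/pool/data/ICDC/ocean/oscar_surface_current_velocity/DATA/{{s_res}}/*oscar_vel*.nc"
ALLOWED = ["1degree", "1third_degree"]
DEFAULT = "1degree"


def expand_urlpath(s_res=DEFAULT):
    """Hypothetical helper: insert s_res into the urlpath template."""
    # reject values that are not listed under "allowed"
    if s_res not in ALLOWED:
        raise ValueError(f"s_res must be one of {ALLOWED}, got {s_res!r}")
    return URLPATH.replace("{{s_res}}", s_res)


print(expand_urlpath())                 # default resolution
print(expand_urlpath("1third_degree"))  # 0.33 degree resolution
```

If s_res is omitted, the default 1degree is used, so cat.oscar_surface_current_velocity.to_dask() without arguments should load the 1 degree files.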