ICDC Data management
Data management at ICDC
- About ICDC
- Datasets and data sources at ICDC
- Quality management
- Metadata concept
- Storage of data sets, backup and archiving
- Access options and restrictions
- Data set publication
- Communication channels
1. About ICDC
The Integrated Climate Data Center (ICDC) was founded in 2008 as part of the CliSAP Cluster of Excellence as a climate database for in-situ and satellite data. Its aims were to provide easy access to high-quality Earth observation data and to publish data created in CliSAP. After CliSAP ended in 2018, ICDC became an institution of the Center for Earth System Research and Sustainability (CEN). Further data sets, including some from the humanities, that were created as part of research in the Clusters of Excellence and at CEN have since enriched the ICDC portfolio. Personal scientific advice on the use and publication of data is one of ICDC's core tasks.
2. Datasets and data sources at ICDC
ICDC's expertise focuses on observation data of the Earth system. The majority of the data sets come from in-situ measurements and remote sensing, taken on the ground and from satellites, and cover the atmosphere, ocean, ice and land surface. Much of this data is available on regular grids, but there are also data sets with spatially irregularly distributed point measurements.
There are also data products that are based on calculations and evaluations, including climatologies, climate indices and reanalysis data sets. What these data have in common is the spatial reference to geographical coordinates and a time coordinate that relates to a point in time or a period of time.
A large number of data sets are processed by ICDC in such a way that they are easier to use. This includes, for example, the translation of bit strings into a set of information that is easier to read, file format changes and the creation of global data products from tiled satellite remote sensing data sets.
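As an illustration of the bit-string translation mentioned above, the following minimal sketch decodes an integer flag word into named boolean values. The flag names and bit positions are hypothetical and chosen only for this example; real products define their own flag layout in their documentation, and this is not ICDC's actual processing code.

```python
from typing import Dict

# Hypothetical flag layout (bit position -> meaning), for illustration only.
FLAG_BITS = {
    0: "cloud_contaminated",
    1: "over_land",
    2: "sea_ice_present",
    3: "low_quality_retrieval",
}


def decode_flags(value: int) -> Dict[str, bool]:
    """Map each documented bit of an integer flag word to a named boolean."""
    return {name: bool((value >> bit) & 1) for bit, name in FLAG_BITS.items()}


# Bits 0 and 2 are set in 0b0101.
decoded = decode_flags(0b0101)
for name, is_set in decoded.items():
    print(f"{name}: {is_set}")
```

Storing the decoded values under readable names means users do not need to consult the bit layout each time they filter the data.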
The data in the Society category, which ICDC also offers, has a completely different structure. It includes, for example, population surveys and media analyses.
In some cases, associated documentation and programs have been stored together with the actual data records.
ICDC obtains its data sets from various sources. Some come from research at the University of Hamburg, at CEN, or at the institutions and members involved in the Clusters of Excellence.
ICDC also offers data from external sources, some of which is available exclusively for internal use at CEN and for members of the Clusters of Excellence.
Table 1: Total number of ICDC data records, subdivided by category and data source.
Category | Internal sources: number (percent) | External sources: number (percent) | Total
Atmosphere (without SAMD Archive) | 5 (14.3%) | 30 (85.7%) | 35
SAMD Archive | 9 (4.8%) | 178 (95.2%) | 187
Ice and Snow | 10 (47.6%) | 11 (52.4%) | 21
Land | 1 (4%) | 24 (96%) | 25
Ocean | 16 (53.3%) | 14 (46.7%) | 30
Society | 5 (71.4%) | 2 (28.6%) | 7
Reanalyses Atmosphere | 0 (0%) | 6 (100%) | 6
Reanalyses Ocean | 2 (3.7%) | 52 (96.3%) | 54
Climate Indices | 3 (60%) | 2 (40%) | 5
All | 51 (13.8%) | 319 (86.2%) | 370
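The percentages in Table 1 are each category's internal and external counts relative to that category's total. The short sketch below recomputes them from the counts in the table; the variable and function names are chosen here for illustration only.

```python
# Counts copied from Table 1: category -> (internal, external).
COUNTS = {
    "Atmosphere (without SAMD Archive)": (5, 30),
    "SAMD Archive": (9, 178),
    "Ice and Snow": (10, 11),
    "Land": (1, 24),
    "Ocean": (16, 14),
    "Society": (5, 2),
    "Reanalyses Atmosphere": (0, 6),
    "Reanalyses Ocean": (2, 52),
    "Climate Indices": (3, 2),
}


def shares(internal, external):
    """Return (total, internal percent, external percent), rounded to 0.1."""
    total = internal + external
    return (
        total,
        round(100 * internal / total, 1),
        round(100 * external / total, 1),
    )


# Column sums for the "All" row.
all_internal = sum(i for i, _ in COUNTS.values())
all_external = sum(e for _, e in COUNTS.values())

for name, (i, e) in COUNTS.items():
    total, pct_int, pct_ext = shares(i, e)
    print(f"{name}: {i} ({pct_int}%) | {e} ({pct_ext}%) | {total}")

total, pct_int, pct_ext = shares(all_internal, all_external)
print(f"All: {all_internal} ({pct_int}%) | {all_external} ({pct_ext}%) | {total}")
```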
3. Quality management
All data sets offered were subjected to a careful quality check before publication at ICDC and the results were described on the associated data sheet. The ICDC scientists decide how the data will be checked for each data set. The following are common:
- Checking whether the contents of the files are complete and readable and whether they correspond to the description
- Plausibility check, e.g. by comparison with other data records
- For measured data, consultation with the scientist or literature research on the magnitude of the measurement errors
For the SAMD archive, the data was quality-checked using standardized procedures in the context of the HD(CP)² project.
In addition to the quality assurance of already finished products, ICDC is actively involved in the evaluation and validation of earth observation data products.
4. Metadata concept
A data record with incomplete or no metadata description at all cannot be used in the long term. ICDC therefore collects all relevant information about the data. A uniform scheme is possible for the earth system data; only some of the information applies to the humanities data sets.
The following information is recorded (items that also apply to the humanities data sets are marked with *):
- Access to the data via various channels, currently mainly via FTP, HTTP, LAS, OPeNDAP and internally via the file system *
- Detailed description of the data set *
- Last update of the data set at ICDC *
- Parameters of the data record with name, unit and comment
- Period and temporal resolution
- Spatial coverage and resolution
- Data format *
- Data quality
- Contact persons *
- References *
- Data citation *
- License *
- Acknowledgments *
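The scheme above can be pictured as a simple record with one field per item. The following is a minimal sketch, not ICDC's actual data model: the class name, field names and the `missing_fields` helper are hypothetical, and only the list of items and the * markers are taken from the text.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Items marked with * above, i.e. those that also apply to humanities data.
HUMANITIES_FIELDS = {
    "access", "description", "last_update", "data_format",
    "contact_persons", "references", "data_citation",
    "license", "acknowledgments",
}


@dataclass
class DataSheet:
    """Hypothetical metadata record mirroring the items listed above."""
    access: Optional[str] = None
    description: Optional[str] = None
    last_update: Optional[str] = None
    parameters: Optional[str] = None
    period_and_temporal_resolution: Optional[str] = None
    spatial_coverage_and_resolution: Optional[str] = None
    data_format: Optional[str] = None
    data_quality: Optional[str] = None
    contact_persons: Optional[str] = None
    references: Optional[str] = None
    data_citation: Optional[str] = None
    license: Optional[str] = None
    acknowledgments: Optional[str] = None


def missing_fields(sheet: DataSheet, humanities: bool = False) -> List[str]:
    """List the applicable fields that are still empty, in schema order."""
    names = [f.name for f in fields(sheet)]
    required = HUMANITIES_FIELDS if humanities else set(names)
    return [n for n in names if n in required and not getattr(sheet, n)]


# Example: a survey data set that fills only the starred items.
survey = DataSheet(
    access="HTTP", description="Population survey", last_update="2020-01-01",
    data_format="CSV", contact_persons="N.N.", references="N.N.",
    data_citation="N.N.", license="CC BY 4.0", acknowledgments="N.N.",
)
print(missing_fields(survey, humanities=True))
print(missing_fields(survey))
```

Checking a humanities data set against only the starred items reflects the statement that just part of the scheme applies to such data.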
5. Storage of data sets, backup and archiving
ICDC stores the data sets as files in folders located in the CEN network. In addition, most of the data is mirrored on drives that are directly accessible to users of the DKRZ. When a new version of a data record becomes available, it is added. If the old version is no longer required, it is moved to the ICDC archive and, provided the data is stored long-term elsewhere, deleted after a reasonable period of time. The same procedure is used for obsolete data.
All drives that ICDC uses for data storage are backed up regularly. Each backup is kept for 3 months and then discarded.
The data sheets that contain the metadata are currently created as web pages in a content management system. Although this system has version management, changes to the metadata are not explicitly saved. Important changes to the data, e.g. a new version, are included in the data record description on the data sheet.
For the permanent archiving of scientific data, ICDC cooperates with the archives of the University of Hamburg and the DKRZ, as only these have sufficient storage space and can guarantee the necessary retention periods. In this case, ICDC is available as a consultant for the archiving process.
6. Access options and restrictions
ICDC has several options for controlling access to the data sets. In general, ICDC prefers that data be published under Creative Commons licenses, which allow full access across all systems. In addition to file access via the CEN and DKRZ networks, this also includes availability via the web technologies used by ICDC such as FTP, THREDDS and OPeNDAP; the latter also enable interactive data access.
For data sets from external sources that are only to be made available to the Hamburg scientists, access via the WWW is restricted in such a way that the data can only be used within the Hamburg community. This is controlled via the network affiliation of the computers as well as via the user accounts, which enable access to the WWW resources from outside the network.
Within the file systems in the CEN and DKRZ networks, the data records can only be accessed with a corresponding user account. Access to directories can be further restricted by setting up user groups. Users can then apply to ICDC via a form to be added to a user group. For various reasons, some data is accessible only to ICDC employees and is made available by arrangement.
The access options and contact persons are described for each data record on the data sheet.
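The access rules in this section can be summarized as a small decision function. This is only a sketch under simplifying assumptions: the level names, the `User` attributes and the `may_access` function are hypothetical, and ICDC's actual enforcement happens in the network, web server and file system configuration rather than in code like this.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Level(Enum):
    OPEN = "open"            # e.g. published under a Creative Commons license
    HAMBURG = "hamburg"      # restricted to the Hamburg community
    GROUP = "group"          # restricted to members of a user group
    ICDC_ONLY = "icdc_only"  # ICDC employees only, shared by arrangement


@dataclass
class User:
    in_hamburg_network: bool = False
    has_account: bool = False
    groups: Set[str] = field(default_factory=set)
    is_icdc_staff: bool = False


def may_access(user: User, level: Level, group: str = "") -> bool:
    """Decide whether a user may access a data set at the given level."""
    if user.is_icdc_staff or level is Level.OPEN:
        return True
    if level is Level.HAMBURG:
        # Either the computer is in the network, or a user account
        # grants access from outside.
        return user.in_hamburg_network or user.has_account
    if level is Level.GROUP:
        return user.has_account and group in user.groups
    return False  # ICDC_ONLY for non-staff


guest = User()
member = User(has_account=True, groups={"example_group"})
print(may_access(guest, Level.OPEN))
print(may_access(guest, Level.HAMBURG))
print(may_access(member, Level.GROUP, "example_group"))
```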
7. Data set publication
Before publication, ICDC can advise scientists on how the data should be processed and determine the required metadata. As soon as the scientist has prepared the data, ICDC enters the files and metadata into the ICDC system.
ICDC publishes all data sheets as web pages on the ICDC website.
The web-based system ensures high data visibility, e.g. in search engines. The data records can either be accessed there directly and without a login, or the data sheet describes who can access them and how. This procedure enables ICDC to make data available to the public quickly and without rigid guidelines. However, assigning a DOI is recommended for better citability.
ICDC can publish data sets via the FDR of the University of Hamburg, provide them with a DOI and maintain the data over a long period of time. Data sets are archived there for at least 10 years. ICDC currently manages the following communities there:
- CEN - Center for Earth System Research and Sustainability
- Cluster of Excellence Climate, Climatic Change, and Society (CLICCS)
- Integrated Climate Data Center - ICDC
An overview of the data from these communities is here:
https://tools.fdm.uni-hamburg.de/fdm/uhh-fdr.html
DOIs have also been assigned in cooperation with the DKRZ, with the ICDC only having an advisory role; this also applies to all other repositories.
In addition, various web services at ICDC enable interactive two-dimensional visualization of the data sets.
8. Communication channels
The ICDC staff will be happy to advise you on data usage and processing and can be reached via the email address icdc.cen@lists.uni-hamburg.de. Further information on how to contact the staff, e.g. by telephone, can be found on the ICDC website.
Changes to the data sets are summarized in a message roughly once a week and announced via the ICDC website, as a weekly newsletter to all employees of CEN and the Cluster of Excellence, and via the ICDC Twitter channel https://twitter.com/icdc_hamburg/.
ICDC has its own section in the CEN Confluence Collaboration System under:
https://collaboration.cen.uni-hamburg.de/display/ICDC/CEN+Integrated+Climate+Data+Center