2 changes: 1 addition & 1 deletion doc/develop/fixing_data.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
Fixing data
***********

The baseline case for ESMValCore input data is CMOR fully compliant
The baseline case for ESMValCore input data is fully :ref:`CMOR compliant <cmor_tables>`
data that is read using Iris' :func:`iris:iris.load_raw`.
ESMValCore also allows for some departures from compliance (see
:ref:`cmor_check_strictness`). Beyond that situation, some datasets
Expand Down
1 change: 1 addition & 0 deletions doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,7 @@ Contact information is available :ref:`here <Support-and-Contact>`.
Development <develop/index>
Contributing <contributing>
How-to guides <how-to/index>
Reference guides <reference/index>
ESMValCore API Reference <api/esmvalcore>
Changelog <changelog>

Expand Down
13 changes: 6 additions & 7 deletions doc/quickstart/configure.rst
Original file line number Diff line number Diff line change
Expand Up @@ -851,12 +851,11 @@ The keyword arguments specified in the list items are directly passed to
Extra Facets
------------

It can be useful to automatically add extra key-value pairs to variables or
datasets without explicitly specifying them in the recipe.
These key-value pairs can be used for :ref:`finding data
<extra-facets-data-finder>` or for providing extra information to the functions
that :ref:`fix data <extra-facets-fixes>` before passing it on to the
preprocessor.
It can be useful to automatically add extra :ref:`facets <facets>` to variables
or datasets without explicitly specifying them in the recipe.
These facets can be used for :ref:`finding data <extra-facets-data-finder>` or
for providing extra information to the functions that
:ref:`fix data <extra-facets-fixes>` before passing it on to the preprocessor.

To support this, we provide the **extra facets** facilities.
Facets are the key-value pairs described in :ref:`Datasets`.
Expand Down Expand Up @@ -1149,7 +1148,7 @@ Example of the CMIP6 project configuration:
Project CMOR table configuration
--------------------------------

ESMValCore comes bundled with several CMOR tables, which are stored in the directory
ESMValCore comes bundled with several :ref:`CMOR tables <cmor_tables>`, which are stored in the directory
`esmvalcore/cmor/tables <https://github.com/ESMValGroup/ESMValCore/tree/main/esmvalcore/cmor/tables>`_.
These are copies of the tables available from `PCMDI <https://github.com/PCMDI>`_.

Expand Down
8 changes: 4 additions & 4 deletions doc/recipe/overview.rst
Original file line number Diff line number Diff line change
Expand Up @@ -73,8 +73,8 @@ the following:
Recipe section: ``datasets``
============================

The ``datasets`` section includes dictionaries that, via key-value pairs or
"facets", define standardized data specifications:
The ``datasets`` section includes dictionaries that, via :ref:`facets <facets>`,
define standardized data specifications:

- dataset name (key ``dataset``, value e.g. ``MPI-ESM-LR`` or ``UKESM1-0-LL``).
- project (key ``project``, value ``CMIP5`` or ``CMIP6`` for CMIP data,
Expand Down Expand Up @@ -435,7 +435,7 @@ Recipe section: ``diagnostics``
The diagnostics section includes one or more diagnostics. Each diagnostic
section will include:

- the variable(s) to preprocess, including the preprocessor to be applied to each variable;
- the :ref:`variable(s) <cmor_tables>` to preprocess, including the preprocessor to be applied to each variable;
- the diagnostic script(s) to be run;
- a description of the diagnostic and lists of themes and realms that it applies to;
- an optional ``additional_datasets`` section.
Expand Down Expand Up @@ -563,7 +563,7 @@ running the tool (a lower number means higher priority).

Variable and dataset definitions
--------------------------------
To define a variable/dataset combination that corresponds to an actual
To define a :ref:`variable <cmor_tables>`/dataset combination that corresponds to an actual
variable from a dataset, the keys in each variable section
are combined with the keys of each dataset definition. If two versions of the same
key are provided, then the key in the datasets section will take precedence
Expand Down
66 changes: 66 additions & 0 deletions doc/reference/cmor_tables.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
.. _cmor_tables:

Variables and CMOR Tables
=========================

ESMValCore has been designed to facilitate working with
`Earth System Model <https://www.climateurope.eu/earth-system-modeling-a-definition/>`__
data, also known as climate model data.
To make it easy to compare and combine data from different climate models,
reanalysis datasets, and observational datasets, ESMValCore uses the standardized
variables from the
`CMOR tables <https://github.com/ESMValGroup/ESMValCore/tree/main/esmvalcore/cmor/tables>`_
provided by the projects it supports. `CMOR <https://github.com/PCMDI/cmor>`__
(Climate Model Output Rewriter) is a tool commonly used by climate modelling
centers to format their model output according to community standards.
The CMOR tables define the standardized variable names, units,
coordinates, and other metadata for various climate variables and are typically
compiled from a Data Request and a Controlled Vocabulary, e.g. the
`CMIP7 CMOR tables <https://github.com/WCRP-CMIP/cmip7-cmor-tables/>`__ are
based on the
`CMIP7 Data Request <https://wcrp-cmip.org/cmip-phases/cmip7/cmip7-data-request/>`__
and the
`CMIP7 Controlled Vocabulary <https://github.com/WCRP-CMIP/CMIP7-CVs>`__.
ESMValCore comes bundled with several CMOR tables, which are stored in the directory
`esmvalcore/cmor/tables <https://github.com/ESMValGroup/ESMValCore/tree/main/esmvalcore/cmor/tables>`_.
It is possible to :ref:`configure which CMOR tables are used by ESMValCore <cmor_table_configuration>`.

The :ref:`facets <facets>` ``project``, ``mip``, ``short_name``, and optionally
``branding_suffix``, uniquely determine the variable to use. These facets are
used to look up the variable in the CMOR table for the project.
Compliance with the variable definition from the CMOR table is checked when data is
loaded, to avoid unexpected results or errors during data processing. The strictness
of these checks can be :ref:`configured <cmor_check_strictness>`.
For example, the facets ``project: CMIP6, mip: Amon, short_name: tas``
define the near-surface air temperature variable in the CMIP6 Amon table:

.. literalinclude:: ../../esmvalcore/cmor/tables/cmip6/Tables/CMIP6_Amon.json
    :start-at: "tas": {
    :end-at: },
    :caption: The ``tas`` variable definition in the CMIP6 Amon table at `esmvalcore/cmor/tables/cmip6/Tables/CMIP6_Amon.json <https://github.com/ESMValGroup/ESMValCore/tree/main/esmvalcore/cmor/tables/cmip6/Tables/CMIP6_Amon.json>`__.
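
The lookup described above can be pictured as a nested dictionary access.
The following is only a minimal sketch, not the ESMValCore implementation:
``CMOR_TABLES`` and ``lookup_variable`` are hypothetical names, and the table
is a hand-written, heavily trimmed stand-in for the bundled JSON files.

```python
# Hypothetical, heavily trimmed stand-in for the bundled CMOR tables.
# The real tables contain many more variables and attributes.
CMOR_TABLES = {
    ("CMIP6", "Amon"): {
        "tas": {
            "out_name": "tas",
            "standard_name": "air_temperature",
            "units": "K",
        },
    },
}


def lookup_variable(project, mip, short_name):
    """Return the variable definition selected by the three facets."""
    try:
        return CMOR_TABLES[(project, mip)][short_name]
    except KeyError:
        msg = (
            f"No variable {short_name!r} in table {mip!r} "
            f"of project {project!r}"
        )
        raise KeyError(msg) from None


# The facets `project: CMIP6, mip: Amon, short_name: tas` select one entry.
tas = lookup_variable("CMIP6", "Amon", "tas")
print(tas["standard_name"], tas["units"])
```

Raising an error for an unknown combination of facets mirrors the idea that a
variable must be defined in the CMOR table before its data can be checked.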

In some cases the ``short_name`` (called ``out_name`` in the CMOR tables) of a
variable may differ from the name used as a key in the CMOR table.
This is always the case for CMIP7, where the
`branded variable name <https://wcrp-cmip.github.io/cmip7-guidance/CMIP7/branded_variables/>`__
is used, which is composed of the ``short_name`` followed
by an underscore and the ``branding_suffix``. For example, the facets
``project: CMIP7, mip: atmos, short_name: tas, branding_suffix: tavg-h2m-hxy-u``
select one of the near-surface air temperature variables in the CMIP7 atmos table:

.. literalinclude:: ../../esmvalcore/cmor/tables/cmip7/Tables/CMIP7_atmos.json
    :start-at: "tas_tavg-h2m-hxy-u": {
    :end-at: },
    :caption: One of the ``tas`` variable definitions in the CMIP7 atmos table at `esmvalcore/cmor/tables/cmip7/Tables/CMIP7_atmos.json <https://github.com/ESMValGroup/ESMValCore/tree/main/esmvalcore/cmor/tables/cmip7/Tables/CMIP7_atmos.json>`__.
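
As an illustration of how the branded variable name is composed, here is a
small sketch. The helper names ``branded_variable_name`` and
``split_branded_variable_name`` are hypothetical and not part of ESMValCore,
and the split assumes that the ``short_name`` itself contains no underscore.

```python
def branded_variable_name(short_name: str, branding_suffix: str) -> str:
    """Join short_name and branding_suffix with an underscore."""
    return f"{short_name}_{branding_suffix}"


def split_branded_variable_name(name: str) -> tuple[str, str]:
    """Split at the first underscore (assumes none in the short_name)."""
    short_name, _, branding_suffix = name.partition("_")
    return short_name, branding_suffix


name = branded_variable_name("tas", "tavg-h2m-hxy-u")
print(name)
print(split_branded_variable_name(name))
```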

For other projects, the facet ``branding_suffix`` can also be used to distinguish
between variables from the same CMOR table that share the same ``short_name``,
but differ in other aspects, even though these projects do not use branded variables.
For example, the ``ch4Clim`` entry in the CMIP6 Amon table can be selected in
the recipe by specifying ``project: CMIP6, mip: Amon, short_name: ch4, branding_suffix: Clim``:

.. literalinclude:: ../../esmvalcore/cmor/tables/cmip6/Tables/CMIP6_Amon.json
    :start-at: "ch4Clim": {
    :end-at: },
    :caption: One of the ``ch4`` variable definitions in the CMIP6 Amon table at `esmvalcore/cmor/tables/cmip6/Tables/CMIP6_Amon.json <https://github.com/ESMValGroup/ESMValCore/tree/main/esmvalcore/cmor/tables/cmip6/Tables/CMIP6_Amon.json>`__.
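
To see why an extra facet is needed here, the sketch below lists every table
key that shares a given ``out_name``. ``AMON_TABLE`` is a hand-written
stand-in that keeps only the ``out_name`` attribute, and ``entries_for`` is a
hypothetical helper rather than an ESMValCore function.

```python
# Hand-written stand-in for a few entries of the CMIP6 Amon table.
# Only the `out_name` attribute is kept; real entries have many more.
AMON_TABLE = {
    "ch4": {"out_name": "ch4"},
    "ch4Clim": {"out_name": "ch4"},
    "tas": {"out_name": "tas"},
}


def entries_for(table, short_name):
    """Return all table keys whose out_name equals the given short_name."""
    return sorted(
        key
        for key, definition in table.items()
        if definition["out_name"] == short_name
    )


# Two entries share the short_name `ch4`, so `short_name` alone is ambiguous.
print(entries_for(AMON_TABLE, "ch4"))
# Only one entry uses the short_name `tas`.
print(entries_for(AMON_TABLE, "tas"))
```

In this toy table, ``short_name: ch4`` matches two keys, and a facet such as
``branding_suffix: Clim`` is what tells them apart.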
179 changes: 179 additions & 0 deletions doc/reference/facets.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
.. _facets:

Facets
======

A facet is a key-value pair that describes a certain property of a dataset and
enables `faceted search <https://en.wikipedia.org/wiki/Faceted_search>`_, for
example as provided by `ESGF <https://esgf-node.ornl.gov/search>`__.
The facets used on ESGF are closely related to the global attributes defined by
the `controlled vocabulary <https://en.wikipedia.org/wiki/Controlled_vocabulary>`__
used by the various "projects" hosted on ESGF. A "project" is a collection of
datasets that share certain properties, e.g.
`CMIP7 <https://wcrp-cmip.org/cmip-phases/cmip7/>`__ is a project.
Each project has its own set of facets that are relevant for that project.
The documents linked below provide an overview of the official facets for
various projects. They also provide a reference directory structure and file naming
convention based on facets, which is used to organise data on local filesystems.

ESMValCore uses "facets" to search for and define input data, both in the
:ref:`recipe <recipe>` and in the :class:`~esmvalcore.dataset.Dataset` object.
This allows specifying data without relying on e.g. file names or directory
structures, which may vary between computers. ESMValCore uses its own set of
facets, which is consistent across all projects it supports.

Here is a mapping from the facet names used in ESMValCore to the corresponding
project-specific facet names used on ESGF.

CMIP7
-----

`Official CMIP7 facets <https://wcrp-cmip.github.io/cmip7-guidance/CMIP7/global_attributes/>`__.

.. note::
    This mapping is preliminary as no CMIP7 data has been published on ESGF yet.

+----------------------+---------------------------+
| ESMValCore facet | ESGF facet |
+======================+===========================+
| ``activity`` | ``activity_id`` |
+----------------------+---------------------------+
| ``branding_suffix`` | ``branding_suffix`` |
+----------------------+---------------------------+
| ``dataset`` | ``source_id`` |
+----------------------+---------------------------+
| ``ensemble`` | ``variant_label`` |
+----------------------+---------------------------+
| ``exp`` | ``experiment_id`` |
+----------------------+---------------------------+
| ``frequency`` | ``frequency`` |
+----------------------+---------------------------+
| ``grid`` | ``grid_label`` |
+----------------------+---------------------------+
| ``institute`` | ``institution_id`` |
+----------------------+---------------------------+
| ``realm`` | ``realm`` |
+----------------------+---------------------------+
| ``region`` | ``region`` |
+----------------------+---------------------------+
| ``project`` | ``project`` / ``mip_era`` |
+----------------------+---------------------------+
| ``short_name`` | ``variable_id`` |
+----------------------+---------------------------+
| ``version`` | ``version`` |
+----------------------+---------------------------+

CMIP6
-----

`Official CMIP6 facets <https://wcrp-cmip.github.io/WGCM_Infrastructure_Panel/Papers/CMIP6_global_attributes_filenames_CVs_v6.2.7.pdf>`__.

+----------------------+---------------------------+
| ESMValCore facet | ESGF facet |
+======================+===========================+
| ``activity`` | ``activity_id`` |
+----------------------+---------------------------+
| ``dataset`` | ``source_id`` |
+----------------------+---------------------------+
| ``ensemble`` | ``member_id`` |
+----------------------+---------------------------+
| ``exp`` | ``experiment_id`` |
+----------------------+---------------------------+
| ``frequency`` | ``frequency`` |
+----------------------+---------------------------+
| ``grid`` | ``grid_label`` |
+----------------------+---------------------------+
| ``institute`` | ``institution_id`` |
+----------------------+---------------------------+
| ``mip`` | ``table_id`` |
+----------------------+---------------------------+
| ``realm`` | ``realm`` |
+----------------------+---------------------------+
| ``project`` | ``project`` / ``mip_era`` |
+----------------------+---------------------------+
| ``short_name`` | ``variable_id`` |
+----------------------+---------------------------+
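
The CMIP6 mapping in the table above can be written down as a plain
dictionary. This is a minimal sketch for illustration only:
``CMIP6_FACET_MAP`` and ``to_esgf_facets`` are hypothetical names, not
ESMValCore API, and ``project`` is mapped to ``project`` although the table
also lists ``mip_era``.

```python
# ESMValCore facet name -> ESGF facet name, copied from the CMIP6 table.
# `project` could equally be mapped to `mip_era` (both are listed).
CMIP6_FACET_MAP = {
    "activity": "activity_id",
    "dataset": "source_id",
    "ensemble": "member_id",
    "exp": "experiment_id",
    "frequency": "frequency",
    "grid": "grid_label",
    "institute": "institution_id",
    "mip": "table_id",
    "realm": "realm",
    "project": "project",
    "short_name": "variable_id",
}


def to_esgf_facets(facets):
    """Rename the facets that have an ESGF counterpart; drop the rest."""
    return {
        CMIP6_FACET_MAP[name]: value
        for name, value in facets.items()
        if name in CMIP6_FACET_MAP
    }


esgf_facets = to_esgf_facets(
    {"dataset": "UKESM1-0-LL", "mip": "Amon", "short_name": "tas"}
)
print(esgf_facets)
```

Facets without an entry in the mapping are silently dropped in this sketch; a
stricter variant could raise an error instead.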

CMIP5
-----

`Official CMIP5 facets <https://pcmdi.github.io/mips/cmip5/docs/CMIP5_output_metadata_requirements.pdf>`__.
Note that there appear to be differences between the official facets and those
used on ESGF. Below we present the facets used on ESGF.

+----------------------+-----------------------+
| ESMValCore facet | ESGF facet |
+======================+=======================+
| ``dataset`` | ``model`` |
+----------------------+-----------------------+
| ``ensemble`` | ``ensemble`` |
+----------------------+-----------------------+
| ``exp`` | ``experiment`` |
+----------------------+-----------------------+
| ``frequency`` | ``time_frequency`` |
+----------------------+-----------------------+
| ``institute`` | ``institute`` |
+----------------------+-----------------------+
| ``mip`` | ``cmor_table`` |
+----------------------+-----------------------+
| ``realm`` | ``realm`` |
+----------------------+-----------------------+
| ``product`` | ``product`` |
+----------------------+-----------------------+
| ``project`` | ``project`` |
+----------------------+-----------------------+
| ``short_name`` | ``variable`` |
+----------------------+-----------------------+

CMIP3
-----

+----------------------+-----------------------+
| ESMValCore facet | ESGF facet |
+======================+=======================+
| ``dataset`` | ``model`` |
+----------------------+-----------------------+
| ``ensemble`` | ``ensemble`` |
+----------------------+-----------------------+
| ``exp`` | ``experiment`` |
+----------------------+-----------------------+
| ``frequency`` | ``time_frequency`` |
+----------------------+-----------------------+
| ``short_name`` | ``variable`` |
+----------------------+-----------------------+

CORDEX
-------

`Official CORDEX-CMIP5 facets <https://zenodo.org/records/15223120>`__.
Note that there appear to be differences between the official facets and those
used on ESGF. Below we present the facets used on ESGF.

+----------------------+-----------------------+
| ESMValCore facet | ESGF facet |
+======================+=======================+
| ``dataset`` | ``rcm_name`` |
+----------------------+-----------------------+
| ``driver`` | ``driving_model`` |
+----------------------+-----------------------+
| ``domain`` | ``domain`` |
+----------------------+-----------------------+
| ``ensemble`` | ``ensemble`` |
+----------------------+-----------------------+
| ``exp`` | ``experiment`` |
+----------------------+-----------------------+
| ``frequency`` | ``time_frequency`` |
+----------------------+-----------------------+
| ``institute`` | ``institute`` |
+----------------------+-----------------------+
| ``short_name`` | ``variable`` |
+----------------------+-----------------------+

obs4MIPs
--------

`Official obs4MIPs facets <https://doi.org/10.5281/zenodo.11500473>`__.
Note that obs4MIPs first followed the CMIP5 conventions before switching to
the CMIP6 conventions. That means that both conventions are in use depending on
when a particular dataset was published. See the CMIP5 and CMIP6 tables above
for the mappings.
17 changes: 17 additions & 0 deletions doc/reference/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
.. _reference:

Reference guides
================

.. note::
    This section is a work in progress: more reference guides are being
    added.

For a more detailed overview of what reference guides should cover,
you can head to the `Diataxis page <https://diataxis.fr/reference/>`__.

.. toctree::
    :maxdepth: 1

    Variables and CMOR Tables <cmor_tables>
    Facets <facets>
7 changes: 5 additions & 2 deletions esmvalcore/_recipe/to_datasets.py
Original file line number Diff line number Diff line change
Expand Up @@ -255,8 +255,11 @@ def _append_missing_supplementaries(
    for facet in FACETS.get(project, ["mip"])
    if facet not in _CMOR_KEYS + tuple(INHERITED_FACETS)
}
if "version" in facets:
    supplementary_facets["version"] = "*"
for key in ("frequency", "version"):
    # Do not inherit these facets as they tend to differ from the
    # main variable.
    if key in facets:
        supplementary_facets[key] = "*"
supplementary_facets["short_name"] = short_name
supplementaries.append(supplementary_facets)

Expand Down
13 changes: 9 additions & 4 deletions esmvalcore/cmor/_fixes/fix.py
Original file line number Diff line number Diff line change
Expand Up @@ -252,15 +252,20 @@ def get_fixes(
Fixes to apply for the given data.

"""
vardef = get_var_info(project, mip, short_name)
if extra_facets is None:
    extra_facets = {}

vardef = get_var_info(
    project,
    mip,
    short_name,
    branding_suffix=extra_facets.get("branding_suffix"),
)

project = project.replace("-", "_").lower()
dataset = dataset.replace("-", "_").lower()
short_name = short_name.replace("-", "_").lower()

if extra_facets is None:
    extra_facets = {}

fixes = []

fixes_modules = []
Expand Down