@bouweandela bouweandela commented Jan 5, 2026

Description

Makes a start with #2874. Also closes #333. Support for downloading from ESGF has not been implemented because no CMIP7 data has been published to ESGF yet. The first CMIP7 data is expected sometime in the next few months.

Works with examples/recipe_python.yml and tas and areacella data created with Simple_recmorise_cmip6-cmip7.ipynb after some updates to that notebook: Simple_recmorise_cmip6-cmip7.ipynb. Data generated with the notebook is available on Levante in /work/bd0854/b381141/simulated_cmip7_data.

Example datasets recipe entry:

datasets:
  - dataset: PCMDI-test-1-0
    project: CMIP7
    exp: historical
    ensemble: r1i1p1f3
    grid: gn
    activity: CMIP
    institute: PCMDI
    mip: atmos
    frequency: mon
    region: glb
    branding_suffix: tavg-h2m-hxy-u

A copy of examples/recipe_python.yml with the fake CMIP7 dataset generated by the notebook is available under Details.

Details
# ESMValTool
# recipe_python.yml
#
# See https://docs.esmvaltool.org/en/latest/recipes/recipe_examples.html
# for a description of this recipe.
#
# See https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/overview.html
# for a description of the recipe format.
---
documentation:
  description: |
    Example recipe that plots a map and timeseries of temperature.

  title: Recipe that runs an example diagnostic written in Python.

  authors:
    - andela_bouwe
    - righi_mattia

  maintainer:
    - schlund_manuel

  references:
    - acknow_project

  projects:
    - esmval
    - c3s-magic

datasets:
  - dataset: PCMDI-test-1-0
    project: CMIP7
    exp: historical
    ensemble: r1i1p1f3
    grid: gn
    activity: CMIP
    institute: PCMDI
    mip: atmos
    frequency: mon
    region: "*"
    branding_suffix: tavg-h2m-hxy-u
  - dataset: BCC-ESM1
    project: CMIP6
    exp: historical
    ensemble: r1i1p1f1
    grid: gn
  - dataset: bcc-csm1-1
    version: v1
    project: CMIP5
    exp: historical
    ensemble: r1i1p1

preprocessors:
  # See https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/preprocessor.html
  # for a description of the preprocessor functions.

  to_degrees_c:
    convert_units:
      units: degrees_C

  annual_mean_amsterdam:
    extract_location:
      location: Amsterdam
      scheme: linear
    annual_statistics:
      operator: mean
    multi_model_statistics:
      statistics:
        - mean
      span: overlap
    convert_units:
      units: degrees_C

  annual_mean_global:
    area_statistics:
      operator: mean
    annual_statistics:
      operator: mean
    convert_units:
      units: degrees_C

diagnostics:
  map:
    description: Global map of temperature in January 2000.
    themes:
      - phys
    realms:
      - atmos
    variables:
      tas:
        mip: Amon
        preprocessor: to_degrees_c
        timerange: 2000/P1M
        caption: |
          Global map of {long_name} in January 2000 according to {dataset}.
    scripts:
      script1:
        script: examples/diagnostic.py
        quickplot:
          plot_type: pcolormesh
          cmap: Reds

  timeseries:
    description: Annual mean temperature in Amsterdam and global mean since 1850.
    themes:
      - phys
    realms:
      - atmos
    variables:
      tas_amsterdam:
        short_name: tas
        mip: Amon
        preprocessor: annual_mean_amsterdam
        timerange: 1850/2000
        caption: Annual mean {long_name} in Amsterdam according to {dataset}.
      tas_global:
        short_name: tas
        mip: Amon
        preprocessor: annual_mean_global
        timerange: 1850/2000
        caption: Annual global mean {long_name} according to {dataset}.
    scripts:
      script1:
        script: examples/diagnostic.py
        quickplot:
          plot_type: plot

activity and institute are automatically added from the CMIP6_CV.json file for CMIP6, but for CMIP7 this file does not appear to be available yet. I found ESGF/esgf-vocab#150 (comment) describing some initial work to generate it from the controlled vocabulary.

Link to documentation:

Backward incompatible change

Most users will not be affected by these changes, which were introduced to keep some function signatures easy to read.

  1. It is no longer possible to pass derive as a positional argument to the esmvalcore.cmor.table.CMIP6Info.get_variable method. Please use derive=True or derive=False instead of True or False respectively.
  2. Similarly, frequency and check_level are now keyword only arguments for the functions esmvalcore.cmor.check.cmor_check_metadata, esmvalcore.cmor.check.cmor_check_data, and esmvalcore.cmor.check.cmor_check.
  3. The argument table to the method esmvalcore.cmor.table.CustomInfo.get_variable has been renamed to table_name so the signature of this method matches with the same method on the parent class esmvalcore.cmor.table.InfoBase.get_variable.
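
A minimal sketch of what items 1 and 2 mean for calling code. `TableInfoSketch` is a hypothetical stand-in written for illustration, not the real ESMValCore class, and its return value is made up; only the keyword-only behaviour of `derive` mirrors the change described above.

```python
class TableInfoSketch:
    """Hypothetical stand-in for a CMOR table class (not the ESMValCore API)."""

    def get_variable(
        self,
        table_name: str,
        short_name: str,
        *,  # everything after the bare ``*`` must be passed by keyword
        derive: bool = False,
    ) -> str:
        # Illustrative return value only; the real method returns variable info.
        suffix = " (derived)" if derive else ""
        return f"{table_name}/{short_name}{suffix}"


info = TableInfoSketch()

# Keyword form: accepted.
print(info.get_variable("Amon", "tas", derive=True))

# Positional form: rejected with a TypeError.
try:
    info.get_variable("Amon", "tas", True)
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Code that already passes these arguments by keyword needs no changes.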

Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number of pull requests:

Allow use of branding_suffix in other projects than CMIP7 to select the right variable from the CMOR table

codecov bot commented Jan 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.59%. Comparing base (f0b9561) to head (2c5139c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2935      +/-   ##
==========================================
- Coverage   95.59%   95.59%   -0.01%     
==========================================
  Files         266      266              
  Lines       15573    15593      +20     
==========================================
+ Hits        14887    14906      +19     
- Misses        686      687       +1     


"rcm_version",
"driver",
"domain",
"activity",

Member Author

The activity is uniquely determined by the exp facet, so this only adds noise and no new information. To keep it readable, I would propose to remove this.

@bouweandela bouweandela changed the title from "Add CMIP7 support" to "Add preliminary CMIP7 support" Jan 8, 2026
@bouweandela bouweandela added this to the v2.14.0 milestone Jan 9, 2026
@bouweandela bouweandela marked this pull request as ready for review January 9, 2026 20:36
@bouweandela bouweandela requested a review from schlunma January 9, 2026 20:37

Contributor

@schlunma schlunma left a comment

Thanks Bouwe, looks great! Just got a couple of minor questions/comments.

I tested this with the fake CMIP7 data, everything worked as expected 🚀

- It can be useful to automatically add extra key-value pairs to variables or
- datasets without explicitly specifying them in the recipe.
+ It can be useful to automatically add extra key-value pairs or
+ :ref:`facets <facets>` to variables or datasets without explicitly specifying

Contributor

"Key-value pairs" and "facets" mean the same thing here, right? In that case it would probably be easier to just call them "facets" here. A more detailed description (including the mention of "key-value pairs") is given in the link.

Member Author

Changed in 5e13989

It can be useful to automatically add extra key-value pairs or
:ref:`facets <facets>` to variables or datasets without explicitly specifying
them in the recipe.
These key-value pairs can be used for :ref:`finding data

Contributor

Suggested change
- These key-value pairs can be used for :ref:`finding data
+ These facets can be used for :ref:`finding data

Member Author

Changed in 5e13989


  The ``datasets`` section includes dictionaries that, via key-value pairs or
- "facets", define standardized data specifications:
+ :ref:`facets <facets>`, define standardized data specifications:

Contributor

Also just call them "facets" here?

Member Author

Done in 5e13989

This is always the case for CMIP7, where the branded name of the variable is
used, which is composed of the ``short_name`` followed
by an underscore and the ``branding_suffix``. For example, the facets
``project: CMIP7, mip: atmos, short_name: tas, branding_suffix: tavg-h2m-hxy-u``

Contributor

Would it be possible to briefly describe the different elements of the branding suffix or at least provide a link where this is described?

Member Author

Added a link in 5e13989
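
For reference, the composition rule quoted in this thread (``short_name``, an underscore, then the ``branding_suffix``) can be sketched as follows. Both helper names are hypothetical and not part of ESMValCore. The sketch also assumes that the suffix itself never contains an underscore and that its labels are separated by hyphens, as in the example from the pull request description.

```python
def branded_name(short_name: str, branding_suffix: str) -> str:
    """Join a short name and a branding suffix with an underscore."""
    return f"{short_name}_{branding_suffix}"


def split_branded_name(name: str) -> tuple[str, list[str]]:
    """Split a branded name into its short name and suffix labels.

    Assumption: the suffix contains no underscore, so splitting on the
    last underscore recovers the short name.
    """
    short_name, _, suffix = name.rpartition("_")
    return short_name, suffix.split("-")


name = branded_name("tas", "tavg-h2m-hxy-u")
print(name)
print(split_branded_name(name))
```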

To make it easy to compare and combine data from different climate models,
reanalysis datasets, and observational datasets, ESMValCore uses the standardized
variables from the
`CMOR tables <https://github.com/ESMValGroup/ESMValCore/tree/main/esmvalcore/cmor/tables>`_

Contributor

Link to the "official" CMOR tables here? A link to our copy is provided below.

Member Author

The official CMOR tables aren't kept in one place, so I can't link them in the same way. Would you prefer it if I removed the link? Or some other solution?

+------------------+-----------------------+
| ESMValCore facet | ESGF facet            |
+==================+=======================+
| activity         | activity_id           |

Contributor

I think it would be nice to use monospace font for all the facets.

Member Author

Done in 5e13989

  def get_variable(
      self,
-     table: str,  # noqa: ARG002
+     table_name: str,  # noqa: ARG002

Contributor

Strictly speaking, this is also a backwards-incompatible change, right?

Member Author

@bouweandela bouweandela Jan 14, 2026

Yes, I changed it so the arguments to this method match with the method of the same name on the parent class.

Member Author

Added this to the pull request description

CMIP7:
  data:
    local:
      type: "esmvalcore.io.local.LocalDataSource"

Contributor

Suggested change
- type: "esmvalcore.io.local.LocalDataSource"
+ type: esmvalcore.io.local.LocalDataSource

I don't think it's necessary to use quotation marks here. Applies to all appearances of this, also in other config files.

Member Author

Removed in 5e13989


  start_date, end_date = _parse_period(timerange)
- start, end = _get_start_end_date(filename)
+ start, end = filename.facets["timerange"].split("/")  # type: ignore[union-attr]

Contributor

Does the timerange facet point to the actual time range of a single file? As far as I am aware, this points to the requested time range given in the recipe (but maybe that changed).

Member Author

Yes
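
To illustrate what splitting a per-file ``timerange`` facet on ``/`` yields, here is a rough sketch of an overlap check between a file and a requested period. The helper `file_overlaps_request` is hypothetical and is not the ESMValCore implementation. It assumes that all four dates use the same zero-padded format (for example `YYYYMM`), so that string comparison follows chronological order; real code would also have to handle mixed date precisions.

```python
def file_overlaps_request(file_timerange: str, requested: str) -> bool:
    """Return True if two 'start/end' ranges overlap (hypothetical helper)."""
    file_start, file_end = file_timerange.split("/")
    req_start, req_end = requested.split("/")
    # Assumption: identical zero-padded date formats, so comparing the
    # strings is equivalent to comparing the dates.
    return file_start <= req_end and req_start <= file_end


# A file covering 1850-2014 overlaps a request for the year 2000.
print(file_overlaps_request("185001/201412", "200001/200012"))
# A file ending in 1899 does not.
print(file_overlaps_request("185001/189912", "200001/200012"))
```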

Comment on lines 30 to 31
cmor_default_table_prefix: "CMIP7_"
cmor_path: "cmip7"

Contributor

Suggested change
- cmor_default_table_prefix: "CMIP7_"
- cmor_path: "cmip7"

I guess those are technically not necessary.

Member Author

Indeed, the cmor_default_table_prefix appears to be unnecessary. cmor_path defaults to a lowercase version of cmor_type:

default_path = os.path.join(install_dir, "tables", cmor_type.lower())
table_path = project.get("cmor_path", default_path)

so cmor_path is needed: without it, the CMIP6 CMOR tables would be loaded instead.
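
The fallback described here can be sketched in isolation. `resolve_table_path` and the `install_dir` value are hypothetical names used for illustration; only the two lines computing `default_path` and `table_path` come from the snippet above. The `cmor_type` value of `"CMIP6"` for the CMIP7 project is inferred from this comment rather than taken from the configuration file.

```python
import os


def resolve_table_path(project: dict, install_dir: str) -> str:
    """Pick the CMOR table path for a project entry (hypothetical helper)."""
    cmor_type = project["cmor_type"]
    # Same two lines as in the snippet quoted in this thread.
    default_path = os.path.join(install_dir, "tables", cmor_type.lower())
    table_path = project.get("cmor_path", default_path)
    return table_path


install_dir = "cmor"  # placeholder directory, for illustration only

# Without cmor_path, the lowercase cmor_type decides the directory.
print(resolve_table_path({"cmor_type": "CMIP6"}, install_dir))

# With cmor_path set, the explicit value wins.
print(resolve_table_path({"cmor_type": "CMIP6", "cmor_path": "cmip7"}, install_dir))
```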

Member Author

I removed cmor_default_table_prefix in 5e13989

@schlunma
Contributor

Ah, one more thing: I think this actually properly closes #333, right?

@valeriupredoi
Contributor

I see Manu has picked up my slack and is reviewing - many thanks, Manu! And I can have a last looksee when Bouwe and Manu are happy (phew!) 😁

@bouweandela
Member Author

Thank you for reviewing @schlunma! I've addressed all your comments except #2935 (comment) because I wasn't sure what to do there.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Problem loading zg for 6hrPlevPt

4 participants