Intermediate-file generation utility for WRF/MPAS

Intermediate-file generation utility (IntGen) for converting meteorological analysis (and model forecast) data such as ERA5, IFS, GFS, etc. into WPS-compatible intermediate files.

This repository contains lightweight utilities and a small processing framework for extracting variables, writing intermediate fields, and computing diagnostics. Much of the functionality is modeled on the WPS intermediate-file generation code era5_to_int.

Main Features

  • Lightweight IntProcessor orchestration for reading ERA5 / IFS / GFS inputs and writing intermediate files.
  • GFS GRIB2 support — reads NCEP GFS 0.25° pressure-level GRIB2 files via pygrib, with direct field mapping (RH, geopotential height, snow, soil height provided natively).
  • Diagnostic hooks to compute derived fields (e.g., relative humidity, snow diagnostics) for ECMWF sources.
  • ECMWF pressure interpolation for model-level data (analogous to WPS's calc_ecmwf_p.exe).
  • Automatic conversion from hybrid sigma-pressure coordinates to isobaric levels.
  • Proper vertical coordinate handling for MPAS init_atmosphere compatibility.
  • Unified CLI with download, validate, and process subcommands.
  • Centralized FileManager with a canonical ~/intgenData directory layout.

Requirements

  • Python 3.8+
  • numpy, netCDF4, pygrib (installed automatically with pip install .)
  • cdsapi — for ERA5 downloads from the Copernicus Climate Data Store
  • ecmwf-api-client — for IFS downloads from ECMWF MARS
  • GFS downloads use Python stdlib only (urllib) — no extra package needed
  • Optional: pytest for the test suite

Note: pygrib is required for GFS GRIB2 processing. It is included as a dependency and installed automatically.

Installation

Recommended — conda

# Create a new conda environment with Python 3.11
conda create -n intgen python=3.11 -y

# Activate the environment
conda activate intgen

# Install core dependency packages from conda-forge
conda install -c conda-forge numpy netCDF4 pygrib cdsapi ecmwf-api-client -y

# Install additional dependencies for development (optional)
conda install -c conda-forge pytest black flake8 pytest-cov -y

# Install IntGen in editable mode for development
pip install -e .  

Production / CI

pip install .

Download dependencies (optional)

To use the intgen download subcommand for ERA5 and IFS data, install the respective API clients, cdsapi and ecmwf-api-client. They are already present if you followed the conda installation above; after pip install . in a clean environment, you may need to install them separately.

# Install CDS API client to download ERA5 data
pip install cdsapi

# Install ECMWF API client to download IFS data
pip install ecmwf-api-client

Besides the API clients, you also need credential files to authenticate with the respective data services. By default each client looks for its file in the user's home directory (~/.cdsapirc for cdsapi, ~/.ecmwfapirc for ecmwf-api-client); the file contains your API key or token.
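Both prerequisites can be checked before attempting a download. The sketch below is not part of IntGen; check_download_prereqs is a hypothetical helper, and it assumes the clients' usual import names (cdsapi, ecmwfapi) and default credential locations (~/.cdsapirc, ~/.ecmwfapirc).

```python
import importlib.util
from pathlib import Path

# Assumed defaults of the two API clients (import name, credential file).
# These are not taken from IntGen itself.
CLIENTS = {
    "era5": ("cdsapi", ".cdsapirc"),
    "ifs": ("ecmwfapi", ".ecmwfapirc"),
}


def check_download_prereqs(home=None):
    """Report, per source, whether the client imports and a credential file exists."""
    home = Path(home) if home is not None else Path.home()
    report = {}
    for source, (module, rc_name) in CLIENTS.items():
        report[source] = {
            # find_spec returns None when the top-level module is not installed
            "client_installed": importlib.util.find_spec(module) is not None,
            "credentials_found": (home / rc_name).is_file(),
        }
    return report
```

GFS is left out on purpose, since its downloads need neither a client package nor credentials.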

Testing

# Install test dependencies
pip install pytest

# Run the test suite
pytest -q

# Run a specific test file
pytest -q tests/core/test_era5_locate.py

# Test a specific test function
pytest -q tests/core/test_era5_locate.py::test_locate_era5

# Test with coverage report
pytest --cov=intgen tests/

# Test with coverage report and missing statements
pytest --cov=intgen --cov-report=term-missing tests/

CLI Overview

IntGen exposes three subcommands through the intgen console entry point:

Subcommand        Purpose
intgen download   Download ERA5, IFS, or GFS data from CDS / MARS / AWS S3
intgen validate   Verify that all required input files exist and are readable
intgen process    Build WPS intermediate files from ERA5, IFS, or GFS data

intgen --help                  # show top-level help
intgen <command> --help        # show help for a specific subcommand

For development (without installing), use python -m intgen.cli instead of intgen.


Data Directory Layout

All subcommands share a single base directory (default ~/intgenData) managed by the FileManager:

~/intgenData/
├── input/
│   ├── era5/
│   │   ├── ml/
│   │   │   └── YYYYMMDD/
│   │   │       └── HHz/    ← era5.oper.an.ml.{code}.{name}.regn320sc.YYYYMMDDThh.nc
│   │   └── sfc/
│   │       └── YYYYMMDD/
│   │           └── HHz/    ← era5.oper.an.sfc.{code}.{name}.regn320sc.YYYYMMDDThh.nc
│   ├── ifs/
│   │   ├── ml/
│   │   │   └── YYYYMMDD/
│   │   │       └── HHz/    ← ec.oper.an.ml.{code}.{name}.regn1280sc.YYYYMMDDThh.nc
│   │   ├── pl/
│   │   │   └── YYYYMMDD/
│   │   │       └── HHz/    ← ec.oper.an.pl.{code}.{name}.regn1280sc.YYYYMMDDThh.nc
│   │   └── sfc/
│   │       └── YYYYMMDD/
│   │           └── HHz/    ← ec.oper.an.sfc.{code}.{name}.regn1280sc.YYYYMMDDThh.nc
│   └── gfs/
│       └── YYYYMMDD/
│           └── HHz/        ← gfs.0p25.YYYYMMDDHH.fFFF.grib2  (all sfc + pl vars combined)
└── output/                  ← Intermediate-file output

GFS bundles all variables (isobaric, surface, and soil) into a single GRIB2 file per forecast step, so no separate sfc/ or pl/ subdirectory is needed. Files downloaded via intgen download --source gfs --type pl are saved directly to input/gfs/YYYYMMDD/{HH}z/. The file locator also accepts a flat input/gfs/ or input/gfs/YYYYMMDD/ layout for manually placed files.

Override the base directory with --base-dir on any subcommand.
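To make the GFS branch of the layout concrete, here is a minimal sketch that builds the expected location of one file from its init time and forecast step. gfs_input_path is a hypothetical helper written for illustration; it does not reproduce FileManager's actual API.

```python
from datetime import datetime
from pathlib import Path


def gfs_input_path(init, step_hours, base_dir="~/intgenData"):
    """Expected path of a combined GFS GRIB2 file under the documented layout."""
    day = init.strftime("%Y%m%d")   # YYYYMMDD directory
    hour = init.strftime("%H")      # init hour, used for the HHz directory
    name = f"gfs.0p25.{day}{hour}.f{step_hours:03d}.grib2"
    return Path(base_dir).expanduser() / "input" / "gfs" / day / f"{hour}z" / name


# Example: 00Z init on 2024-09-17, 24-hour forecast step
print(gfs_input_path(datetime(2024, 9, 17, 0), 24, base_dir="/data"))
```

Passing a different base_dir mirrors what --base-dir does on the command line.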


1. Downloading Data (intgen download)

The download subcommand provides a unified interface to all download scripts. It automatically organizes files into the FileManager directory layout.

ERA5 Downloads

ERA5 data is fetched from the Copernicus Climate Data Store (CDS). Access goes through the cdsapi client and requires a CDS user account, which is free to create.

Download all ERA5 model-level and surface data for a date:

intgen download --source era5 --type ml sfc \
  --start-date 2024-09-17 --grid 0.25/0.25 \
  --format netcdf --yes

Download selected ERA5 model-level data only for a specified date range:

intgen download --source era5 --type ml \
  --start-date 2024-09-17 --end-date 2024-09-19 \
  --time 00:00:00 --params u v t q --grid 0.25/0.25 --format netcdf --yes

Download selected ERA5 surface data only for a specified date range:

intgen download --source era5 --type sfc \
  --start-date 2024-09-17 --end-date 2024-09-19 \
  --time 00:00:00 --params sp msl t2m d2m skt u10 v10 sst ci sd rsn stl1 stl2 stl3 stl4 swvl1 swvl2 swvl3 swvl4 lsm z \
  --grid 0.25/0.25 --format netcdf --yes

GFS Downloads

GFS data is fetched from the public NOAA AWS S3 bucket (noaa-gfs-bdp-pds). No credentials or API client are needed. The download script retrieves both the pgrb2 and pgrb2b GRIB2 files for each requested forecast step and concatenates them into a single output file.

Download f000 (analysis time) for a single date:

intgen download --source gfs --type pl \
  --start-date 2024-09-17 --time 00 --step 0 --yes

Download multiple forecast steps:

intgen download --source gfs --type pl \
  --start-date 2024-09-17 --time 00 --step 0/6/12/24 --yes

Download a date range:

intgen download --source gfs --type pl \
  --start-date 2024-09-17 --end-date 2024-09-19 --time 00 --step 0 --yes

Output files are saved as gfs.0p25.{YYYYMMDDHH}.f{FFF}.grib2 under ~/intgenData/input/gfs/YYYYMMDD/{HH}z/.
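The concatenation of the pgrb2 and pgrb2b parts works because GRIB2 messages are self-delimiting, so appending one file to another yields a valid multi-message file. A minimal sketch of that step follows; concat_grib_parts is a hypothetical name, and the real download script may handle it differently.

```python
import shutil
from pathlib import Path


def concat_grib_parts(parts, dest):
    """Append the given GRIB2 files, in order, into a single output file."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted run never leaves
    # a truncated file under the final name.
    tmp = dest.with_name(dest.name + ".part")
    with open(tmp, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    tmp.replace(dest)
    return dest
```

The temporary-file-then-rename pattern is a design choice of this sketch, useful when a later validate step only checks that the final file exists.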


IFS Downloads

IFS data is fetched from ECMWF's MARS archive. Access goes through the ecmwf-api-client package and requires a subscription. Check with ECMWF about your MARS access if you want to initialize MPAS with ECMWF's high-resolution analysis or forecast data.

Download all IFS model-level and surface data for a date:

intgen download --source ifs --type ml sfc \
  --start-date 2024-09-17 --grid 0.25/0.25 \
  --format netcdf --yes

Download IFS model-level data only:

intgen download --source ifs --type ml \
  --start-date 2024-09-17 --end-date 2024-09-19 \
  --time 00:00:00 --params u v t q \
  --grid 0.07/0.07 --format netcdf --yes

Download IFS surface data only:

intgen download --source ifs --type sfc \
  --start-date 2024-09-17 --end-date 2024-09-19 \
  --time 00:00:00 --params sp msl t2m d2m skt u10 v10 sst ci sd rsn stl1 stl2 stl3 stl4 swvl1 swvl2 swvl3 swvl4 lsm z \
  --grid 0.07/0.07 --format netcdf --yes

Download IFS pressure-level data:

intgen download --source ifs --type pl \
  --start-date 2024-09-17 --grid 0.25/0.25 \
  --format netcdf --params u v t q z --yes

Download Options Reference

Option          Description
--source        era5, ifs, or gfs (required)
--type          ml, sfc, and/or pl; multiple allowed (required)
--start-date    Start date YYYY-MM-DD (required)
--end-date      End date YYYY-MM-DD (defaults to start-date)
--time          Init hour HH or HH:MM:SS (default: 00:00:00)
--step          Forecast step(s) in hours, e.g. 0, 0/6/12/24 (GFS only; default: 0)
--grid          Grid resolution lat/lon (e.g. 0.25/0.25; ERA5/IFS only)
--format        netcdf or grib (ERA5/IFS only)
--params        Space-separated parameter short names or codes (ERA5/IFS only)
--base-dir      Override data tree root (default: ~/intgenData)
--yes           Skip confirmation prompt
--list-params   Print available parameters and exit (ERA5/IFS only)
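The --time and --step values accept more than one spelling (HH or HH:MM:SS, and slash-separated hours). How such values might be normalized is sketched below; parse_steps and normalize_time are hypothetical helpers, not IntGen's own parsing code.

```python
def parse_steps(spec):
    """Turn a --step value such as "0/6/12/24" into a list of forecast hours."""
    steps = []
    for token in str(spec).split("/"):
        hours = int(token)
        if hours < 0:
            raise ValueError(f"negative forecast step: {token}")
        if hours not in steps:  # drop duplicates, keep the given order
            steps.append(hours)
    return steps


def normalize_time(value):
    """Accept "HH" or "HH:MM:SS" and return the "HH:MM:SS" form."""
    parts = str(value).split(":")
    if len(parts) not in (1, 3):
        raise ValueError(f"expected HH or HH:MM:SS, got {value!r}")
    hour, minute, second = (int(p) for p in parts + ["0"] * (3 - len(parts)))
    if not (0 <= hour < 24 and 0 <= minute < 60 and 0 <= second < 60):
        raise ValueError(f"time out of range: {value!r}")
    return f"{hour:02d}:{minute:02d}:{second:02d}"
```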

2. Validating Data (intgen validate)

After downloading, use validate to check that every required file exists and is readable before starting a processing run.

Validate ERA5 Data

Single timestep:

intgen validate --source era5 \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00

Multi-day range at 6-hourly intervals:

intgen validate --source era5 \
  --start-date 2024-09-17_00 --end-date 2024-09-19_18 \
  --interval 6

Validate specific variables only:

intgen validate --source era5 \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
  --params PSFC,TT,SPECHUMD

Validate IFS Data

Single timestep:

intgen validate --source ifs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00

Multi-day range:

intgen validate --source ifs \
  --start-date 2024-09-17_00 --end-date 2024-09-19_18 \
  --interval 6

Validate isobaric (pressure-level) templates:

intgen validate --source ifs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
  --isobaric

Validate GFS Data

Single timestep:

intgen validate --source gfs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00

Multi-day range at 6-hourly intervals:

intgen validate --source gfs \
  --start-date 2024-09-17_00 --end-date 2024-09-19_18 \
  --interval 6

Validate Options Reference

Option        Description
--source      era5, ifs, or gfs (required)
--start-date  Start timestamp YYYY-MM-DD_HH (required)
--end-date    End timestamp YYYY-MM-DD_HH (required)
--interval    Hours between timesteps (default: 6)
--params      Comma-separated WPS variable names to check (default: all)
--isobaric    Use pressure-level templates instead of model-level
--base-dir    Override data tree root (default: ~/intgenData)
--debug       Enable verbose debug logging

Example Output

Validating ERA5 data
  Variables  : 25
  Timesteps  : 1 (2024-09-17_00 → 2024-09-17_00, every 6h)
  Data dir   : /Users/you/intgenData

  2024-09-17_00:
    ✓ PSFC           OK  (era5.oper.an.sfc.134.128.sp.regn320sc.20240917T00.nc)
    ✓ SST            OK  (era5.oper.an.sfc.34.128.sst.regn320sc.20240917T00.nc)
    ...
    ✓ VV             OK  (era5.oper.an.ml.132.128.v.regn320sc.20240917T00.nc)

Validation Summary
  Total checks : 25
  OK           : 25
  Missing      : 0
  Bad          : 0

All files present and readable.
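The "Timesteps" count in the report follows from the start timestamp, the end timestamp, and the interval. A minimal sketch of that enumeration, assuming an inclusive end and the YYYY-MM-DD_HH format shown above (timesteps is a hypothetical helper, not IntGen's function):

```python
from datetime import datetime, timedelta

STAMP = "%Y-%m-%d_%H"  # same timestamp format as --start-date / --end-date


def timesteps(start, end, interval_hours=6):
    """List every timestamp from start to end (inclusive) at the given spacing."""
    if interval_hours <= 0:
        raise ValueError("interval must be a positive number of hours")
    current = datetime.strptime(start, STAMP)
    last = datetime.strptime(end, STAMP)
    stamps = []
    while current <= last:
        stamps.append(current.strftime(STAMP))
        current += timedelta(hours=interval_hours)
    return stamps
```

Under these assumptions, 2024-09-17_00 to 2024-09-19_18 at 6-hour spacing gives 12 timesteps, so a full ERA5 check would cover 12 times the number of variables.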

3. Processing Data (intgen process)

The process subcommand reads downloaded ERA5 / IFS NetCDF files or GFS GRIB2 files and writes WPS-compatible intermediate files.

Process ERA5 Data

Model-level processing (default):

intgen process --source era5 \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00

Isobaric (pressure-level) processing:

intgen process --source era5 \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
  --isobaric

Process a multi-day range:

intgen process --source era5 \
  --start-date 2024-09-17_00 --end-date 2024-09-19_18 \
  --interval 6

Dry-run (validate configuration only):

intgen process --source era5 \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
  --dry-run

Process specific variables with debug output:

intgen process --source era5 \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
  --params TT,SPECHUMD,PSFC --debug

Process IFS Data

Model-level processing (native hybrid sigma-pressure coordinates):

intgen process --source ifs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00

Isobaric (pressure-level) processing:

intgen process --source ifs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_06 \
  --isobaric

With --isobaric, the tool either reads pre-interpolated pressure-level data from the ec.oper.an.pl files or automatically interpolates model-level data to isobaric levels (analogous to WPS's calc_ecmwf_p.exe). The resulting intermediate files can therefore be ingested by MPAS init_atmosphere without running calc_ecmwf_p.exe separately.
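The model-level to isobaric conversion can be pictured in two steps: compute pressure on the hybrid levels from the a/b coefficients and surface pressure, then interpolate each column to the target pressure. The following is a generic single-column sketch, not IntGen's code. The coefficients are made-up toy values rather than the L137 tables in ecmwf_coeffs/, and interpolating linearly in ln(p) is an assumption.

```python
import math


def half_level_pressure(a, b, ps):
    """Half-level pressure p = a + b * ps (Pa), ordered from model top to surface."""
    return [ak + bk * ps for ak, bk in zip(a, b)]


def full_level_pressure(p_half):
    """Full-level pressure as the mean of the two bounding half levels."""
    return [0.5 * (p_half[k] + p_half[k + 1]) for k in range(len(p_half) - 1)]


def interp_to_pressure(p_full, values, target):
    """Interpolate a column linearly in ln(p); return None outside the column."""
    if not (p_full[0] <= target <= p_full[-1]):
        return None
    for k in range(len(p_full) - 1):
        p0, p1 = p_full[k], p_full[k + 1]
        if p0 <= target <= p1:
            w = (math.log(target) - math.log(p0)) / (math.log(p1) - math.log(p0))
            return values[k] + w * (values[k + 1] - values[k])
    return None


# Toy column: 4 half levels -> 3 full levels (illustrative numbers only)
a = [0.0, 5000.0, 10000.0, 0.0]
b = [0.0, 0.125, 0.5, 1.0]
p_full = full_level_pressure(half_level_pressure(a, b, ps=100000.0))
temperature = [220.0, 250.0, 280.0]
print(interp_to_pressure(p_full, temperature, 50000.0))
```

Returning None for targets outside the column keeps the sketch honest about extrapolation, which a real implementation has to treat explicitly (for example below the model surface).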

Custom output directory:

intgen process --source ifs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
  --outdir ./my_output

Generate SST intermediate files:

intgen process --source ifs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
  --params LANDSEA,SEAICE,SKINTEMP --outdir ./output
cd output && for v in IFS*; do mv "$v" "SST${v#IFS}"; done  

Process GFS Data

GFS processing reads NCEP GFS 0.25° GRIB2 files and writes WPS intermediate files. GFS data is always isobaric (pressure-level), so the --isobaric flag is set automatically.

Process a single timestep:

intgen process --source gfs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00

Process a multi-day range:

intgen process --source gfs \
  --start-date 2024-09-17_00 --end-date 2024-09-19_18 \
  --interval 6

Dry-run:

intgen process --source gfs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
  --dry-run

Process specific variables:

intgen process --source gfs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
  --params TT,GHT,SPECHUMD,PSFC

Generate SST intermediate files from GFS:

intgen process --source gfs \
  --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
  --params LANDSEA,SEAICE,SKINTEMP --outdir ./output
cd output && for v in GFS*; do mv "$v" "SST${v#GFS}"; done
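The shell loops above rename each output file by swapping its source prefix for SST. A Python equivalent is sketched below for environments where a shell loop is inconvenient. rename_prefix is a hypothetical helper; it assumes output names of the form PREFIX:YYYY-MM-DD_HH and a filesystem that allows colons in file names.

```python
from pathlib import Path


def rename_prefix(directory, old_prefix, new_prefix="SST"):
    """Rename files such as 'GFS:2024-09-17_00' to 'SST:2024-09-17_00'."""
    renamed = []
    # Match only names that start with the prefix followed by a colon,
    # which is slightly stricter than the shell glob GFS*.
    for path in sorted(Path(directory).glob(f"{old_prefix}:*")):
        target = path.with_name(new_prefix + path.name[len(old_prefix):])
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Calling rename_prefix("./output", "IFS") or rename_prefix("./output", "GFS") covers the two cases shown above.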

Process Options Reference

Option             Description
--source           era5, ifs, or gfs (default: era5)
--start-date       Start timestamp YYYY-MM-DD_HH (required)
--end-date         End timestamp YYYY-MM-DD_HH (required)
--interval         Processing interval in hours (default: 6)
--params           Comma-separated WPS variable names (default: all)
--isobaric         Process as isobaric (pressure-level) data
--outdir           Output directory (default: {base-dir}/output/)
--base-dir         Override data tree root (default: ~/intgenData)
--dry-run          Print configuration and exit without processing
--fail-on-missing  Exit with error if data files are missing
--verify-io        Read back intermediate files and compare slabs
--debug            Enable verbose debug logging
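For a sense of what reading an intermediate file back involves, the sketch below parses the first field header of a file. It assumes the standard WPS intermediate format, version 5: big-endian Fortran sequential records, each wrapped in 4-byte length markers. It is an independent illustration, not the code behind --verify-io, and the function names are hypothetical.

```python
import struct

# Version-5 header record: HDATE(24) XFCST MAP_SOURCE(32) FIELD(9) UNITS(25)
# DESC(46) XLVL NX NY IPROJ, big-endian, no padding (156 bytes in total).
HEADER = struct.Struct(">24sf32s9s25s46sfiii")


def read_record(stream):
    """Read one Fortran sequential record and return its payload bytes."""
    marker = stream.read(4)
    if len(marker) < 4:
        raise EOFError("no further records")
    (size,) = struct.unpack(">i", marker)
    payload = stream.read(size)
    trailer = stream.read(4)
    if len(payload) != size or len(trailer) < 4 or struct.unpack(">i", trailer)[0] != size:
        raise ValueError("corrupt or truncated record")
    return payload


def read_first_header(stream):
    """Return the metadata of the first field in an intermediate file."""
    (version,) = struct.unpack(">i", read_record(stream))
    if version != 5:
        raise ValueError(f"unsupported intermediate format version: {version}")
    hdate, xfcst, source, field, units, desc, xlvl, nx, ny, iproj = HEADER.unpack(
        read_record(stream)
    )
    return {
        "hdate": hdate.decode("ascii").strip(),
        "field": field.decode("ascii").strip(),
        "units": units.decode("ascii").strip(),
        "xlvl": xlvl,
        "nx": nx,
        "ny": ny,
        "iproj": iproj,
    }
```

A file opened with open(path, "rb") can be passed directly as stream. The projection record and the data slab that follow the header are not handled here.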

Typical End-to-End Workflow

ERA5 Workflow

# 1. Download ERA5 model-level + surface data
intgen download --source era5 --type ml sfc \
  --start-date 2024-09-17 --end-date 2024-09-19 \
  --grid 0.25/0.25 --format netcdf --yes

# 2. Verify all files are present
intgen validate --source era5 \
  --start-date 2024-09-17_00 --end-date 2024-09-19_18

# 3. Generate intermediate files
intgen process --source era5 \
  --start-date 2024-09-17_00 --end-date 2024-09-19_18

The same workflow applies to IFS — just replace --source era5 with --source ifs.

GFS Workflow

GFS data is downloaded directly from the public NOAA AWS S3 bucket — no credentials needed.

# 1. Download GFS 0.25° pressure-level data (pgrb2 + pgrb2b combined per step)
intgen download --source gfs --type pl \
  --start-date 2024-09-17 --end-date 2024-09-19 \
  --time 00 --step 0 --yes

# 2. Validate all files are present (step 1 fetched only the 00Z cycle of each day)
intgen validate --source gfs \
  --start-date 2024-09-17_00 --end-date 2024-09-19_00 \
  --interval 24

# 3. Generate intermediate files
intgen process --source gfs \
  --start-date 2024-09-17_00 --end-date 2024-09-19_00 \
  --interval 24

File Naming Conventions

ERA5 files (downloaded via CDS):

era5.oper.an.{type}.{param_id}.128.{short_name}.regn320sc.{YYYYMMDDThh}.nc

Examples:

  • era5.oper.an.ml.130.128.t.regn320sc.20240917T00.nc (temperature, model-level)
  • era5.oper.an.sfc.134.128.sp.regn320sc.20240917T00.nc (surface pressure)

IFS files (downloaded via MARS):

ec.oper.an.{type}.{param_id}.128.{short_name}.regn1280sc.{YYYYMMDDThh}.nc

Examples:

  • ec.oper.an.ml.130.128.t.regn1280sc.20240917T00.nc (temperature, model-level)
  • ec.oper.an.sfc.167.128.t2m.regn1280sc.20240917T00.nc (2m temperature)

GFS files (downloaded from AWS S3 via intgen download):

gfs.0p25.{YYYYMMDDHH}.f{FFF}.grib2

Examples:

  • gfs.0p25.2024091700.f000.grib2 (all fields, 00Z init, f000)
  • gfs.0p25.2024091700.f024.grib2 (all fields, 00Z init, f024)

Each file is the concatenation of the NOAA pgrb2 and pgrb2b GRIB2 files, giving a complete set of isobaric, surface, and soil fields in a single file per forecast step.

Output intermediate files:

{PREFIX}:{YYYY-MM-DD_HH}

Examples:

  • ERA5:2024-09-17_00, IFS:2024-09-17_00, GFS:2024-09-17_00
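The ERA5 and IFS input names above follow a regular pattern, so their parts can be recovered with a single regular expression. The sketch below is an illustration based only on the templates and examples in this section; parse_input_name is hypothetical and not IntGen's file locator.

```python
import re

# Mirrors the documented templates:
#   era5.oper.an.{type}.{param_id}.128.{short_name}.regn320sc.{YYYYMMDDThh}.nc
#   ec.oper.an.{type}.{param_id}.128.{short_name}.regn1280sc.{YYYYMMDDThh}.nc
PATTERN = re.compile(
    r"^(?P<prefix>era5|ec)\.oper\.an\.(?P<type>ml|pl|sfc)\."
    r"(?P<param_id>\d+)\.(?P<table>\d+)\.(?P<short_name>[A-Za-z0-9_]+)\."
    r"(?P<grid>regn\d+sc)\.(?P<date>\d{8})T(?P<hour>\d{2})\.nc$"
)


def parse_input_name(name):
    """Split an ERA5/IFS input file name into its parts, or return None."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


print(parse_input_name("era5.oper.an.sfc.134.128.sp.regn320sc.20240917T00.nc"))
```

GFS names use a different template and are deliberately not matched, so the function returns None for them.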

Project Layout

intgen/                — the main Python package
├── cli.py             — CLI entry point (download / validate / process)
├── utils.py           — utility helpers
├── core/              — processing primitives, specs, and file manager
│   ├── era5_spec.py   — ERA5 input specification and templates
│   ├── ifs_spec.py    — IFS input specification and templates
│   ├── gfs_spec.py    — GFS input specification and templates
│   ├── grib_adapter.py — GRIB2 adapter (pygrib → netCDF4-like interface)
│   ├── file_manager.py — centralized directory layout manager
│   ├── processor.py   — IntProcessor orchestration
│   ├── spec_vars.py   — variable descriptor classes (VarDesc, GfsVar)
│   └── ...
├── diagnostics/       — diagnostic calculators (RH, HGT, PRES, etc.)
├── download/          — per-source download scripts
│   ├── era5_ml.py              — ERA5 model-level (CDS)
│   ├── era5_sfc.py             — ERA5 surface/single-level (CDS)
│   ├── gfs_fc_pl.py            — GFS forecast pressure-level (AWS S3, stdlib only)
│   ├── ifs_an_ml.py            — IFS analysis model-level (MARS)
│   ├── ifs_an_pl.py            — IFS analysis pressure-level (MARS)
│   ├── ifs_an_sfc.py           — IFS analysis surface (MARS)
│   ├── ifs_fc_ml.py            — IFS forecast model-level (MARS)
│   ├── ifs_fc_pl.py            — IFS forecast pressure-level (MARS)
│   └── ifs_fc_sfc.py           — IFS forecast surface (MARS)
├── grid/              — grid handling
└── projection/        — map projections
tests/                 — unit tests
ecmwf_coeffs/          — model-level coefficients (L137)
docs/                  — design and implementation notes
setup.cfg              — packaging metadata
pyproject.toml         — build configuration

Development

pip install -e .          # editable install
pytest -q                 # run tests

When working from the repository without installing, use python -m intgen.cli so imports resolve correctly.

Contributing

  • Fork + branch
  • Add tests for new features
  • Run pytest locally
  • Open a PR describing your changes

License

This project is licensed under the MIT License — see LICENSE for details.

Contact

For questions or issues, open an issue in the repository or contact the maintainer.
