Intermediate-file generation utility (IntGen) for converting meteorological analysis (and model forecast) data such as ERA5, IFS, and GFS into WPS-compatible intermediate files.

This repository contains lightweight utilities and a small processing framework used to extract variables, write intermediate fields, and compute diagnostics. Much of the capability is inspired by (and modeled on) the WPS intermediate-file generation code `era5_to_int`.
- Lightweight `IntProcessor` orchestration for reading ERA5 / IFS / GFS inputs and writing intermediate files.
- GFS GRIB2 support — reads NCEP GFS 0.25° pressure-level GRIB2 files via `pygrib`, with direct field mapping (RH, geopotential height, snow, soil height provided natively).
- Diagnostic hooks to compute derived fields (e.g., relative humidity, snow diagnostics) for ECMWF sources.
- ECMWF pressure interpolation for model-level data (analogous to WPS's `calc_ecmwf_p.exe`).
- Automatic conversion from hybrid sigma-pressure coordinates to isobaric levels.
- Proper vertical coordinate handling for MPAS `init_atmosphere` compatibility.
- Unified CLI with `download`, `validate`, and `process` subcommands.
- Centralized `FileManager` with a canonical `~/intgenData` directory layout.
- Python 3.8+
- `numpy`, `netCDF4`, `pygrib` (installed automatically with `pip install .`)
- `cdsapi` — for ERA5 downloads from the Copernicus Climate Data Store
- `ecmwf-api-client` — for IFS downloads from ECMWF MARS
- GFS downloads use the Python stdlib only (`urllib`) — no extra package needed
- Optional: `pytest` for the test suite

Note: `pygrib` is required for GFS GRIB2 processing. It is included as a dependency and installed automatically.
```bash
# Create a new conda environment with Python 3.11
conda create -n intgen python=3.11 -y

# Activate the environment
conda activate intgen

# Install core dependency packages from conda-forge
conda install -c conda-forge numpy netCDF4 pygrib cdsapi ecmwf-api-client -y

# Install additional dependencies for development (optional)
conda install -c conda-forge pytest black flake8 pytest-cov -y

# Install IntGen in editable mode for development
pip install -e .

# Or do a regular install
pip install .
```

To use the `intgen download` subcommand to fetch ERA5 and IFS data, the respective API clients `cdsapi` and `ecmwf-api-client` must be installed. These should already be present if you used the conda installation above, but if you used `pip install .` in a clean environment, you may need to install them separately.
```bash
# Install CDS API client to download ERA5 data
pip install cdsapi

# Install ECMWF API client to download IFS data
pip install ecmwf-api-client
```

Besides the API clients, you will also need credential files to authenticate with the respective data services. These files are typically placed in the user's home directory and contain API keys or tokens:

- ERA5 (CDS) — `~/.cdsapirc` (see https://cds.climate.copernicus.eu/how-to-api)
- IFS (MARS) — `~/.ecmwfapirc` (see https://api.ecmwf.int/v1/key/)
- GFS — no credentials required; data is downloaded from the public AWS S3 bucket `noaa-gfs-bdp-pds`
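For reference, the credential files follow the formats below. These are illustrative templates; the exact `url` and key fields depend on your account and API version, so copy the real values from your CDS / ECMWF account pages.

`~/.cdsapirc` (YAML-style key/value):

```
url: https://cds.climate.copernicus.eu/api
key: <your-cds-api-token>
```

`~/.ecmwfapirc` (JSON):

```
{
  "url":   "https://api.ecmwf.int/v1",
  "key":   "<your-api-key>",
  "email": "<your-email>"
}
```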
```bash
# Install test dependencies
pip install pytest

# Run the test suite
pytest -q

# Run a specific test file
pytest -q tests/core/test_era5_locate.py

# Run a specific test function
pytest -q tests/core/test_era5_locate.py::test_locate_era5

# Run with coverage report
pytest --cov=intgen tests/

# Run with coverage report and missing statements
pytest --cov=intgen --cov-report=term-missing tests/
```

IntGen exposes three subcommands through the `intgen` console entry point:
| Subcommand | Purpose |
|---|---|
| `intgen download` | Download ERA5, IFS, or GFS data from CDS / MARS / AWS S3 |
| `intgen validate` | Verify that all required input files exist and are readable |
| `intgen process` | Build WPS intermediate files from ERA5, IFS, or GFS data |
```bash
intgen --help             # show top-level help
intgen <command> --help   # show help for a specific subcommand
```

For development (without installing), use `python -m intgen.cli` instead of `intgen`.
All subcommands share a single base directory (default `~/intgenData`) managed by the `FileManager`:
```
~/intgenData/
├── input/
│   ├── era5/
│   │   ├── ml/
│   │   │   └── YYYYMMDD/
│   │   │       └── HHz/ ← era5.oper.an.ml.{code}.{name}.regn320sc.YYYYMMDDThh.nc
│   │   └── sfc/
│   │       └── YYYYMMDD/
│   │           └── HHz/ ← era5.oper.an.sfc.{code}.{name}.regn320sc.YYYYMMDDThh.nc
│   ├── ifs/
│   │   ├── ml/
│   │   │   └── YYYYMMDD/
│   │   │       └── HHz/ ← ec.oper.an.ml.{code}.{name}.regn1280sc.YYYYMMDDThh.nc
│   │   ├── pl/
│   │   │   └── YYYYMMDD/
│   │   │       └── HHz/ ← ec.oper.an.pl.{code}.{name}.regn1280sc.YYYYMMDDThh.nc
│   │   └── sfc/
│   │       └── YYYYMMDD/
│   │           └── HHz/ ← ec.oper.an.sfc.{code}.{name}.regn1280sc.YYYYMMDDThh.nc
│   └── gfs/
│       └── YYYYMMDD/
│           └── HHz/ ← gfs.0p25.YYYYMMDDHH.fFFF.grib2 (all sfc + pl vars combined)
└── output/ ← Intermediate-file output
```
GFS bundles all variables (isobaric, surface, and soil) into a single GRIB2 file per forecast step, so no separate `sfc/` or `pl/` subdirectory is needed. Files downloaded via `intgen download --source gfs --type pl` are saved directly to `input/gfs/YYYYMMDD/{HH}z/`. The file locator also accepts a flat `input/gfs/` or `input/gfs/YYYYMMDD/` layout for manually placed files.

Override the base directory with `--base-dir` on any subcommand.
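As a quick mental model of this layout, the expected directory for one timestep can be sketched as a small helper. This is illustrative only; the real logic lives in `intgen/core/file_manager.py`, and the `input_dir` name here is hypothetical:

```python
from pathlib import Path
from typing import Optional

def input_dir(base: Path, source: str, date: str, hour: int,
              level_type: Optional[str] = None) -> Path:
    """Expected input directory for one timestep, per the documented layout.

    source: 'era5', 'ifs', or 'gfs'; date: 'YYYYMMDD'; hour: 0-23.
    ERA5/IFS nest a level-type subdirectory (ml/pl/sfc); GFS does not.
    """
    stamp = Path(date) / f"{hour:02d}z"
    if source == "gfs":
        return base / "input" / "gfs" / stamp
    return base / "input" / source / level_type / stamp

# input_dir(Path.home() / "intgenData", "era5", "20240917", 0, "ml")
# -> ~/intgenData/input/era5/ml/20240917/00z
```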
The `download` subcommand provides a unified interface to all download scripts. It automatically organizes files into the `FileManager` directory layout.

ERA5 data is fetched from the Copernicus Climate Data Store (CDS). Access to the CDS is provided via ECMWF's `cdsapi` client, which requires a user account (free to create).
Download all ERA5 model-level and surface data for a date:

```bash
intgen download --source era5 --type ml sfc \
    --start-date 2024-09-17 --grid 0.25/0.25 \
    --format netcdf --yes
```

Download selected ERA5 model-level data only for a specified date range:

```bash
intgen download --source era5 --type ml \
    --start-date 2024-09-17 --end-date 2024-09-19 \
    --time 00:00:00 --params u v t q --grid 0.25/0.25 --format netcdf --yes
```

Download selected ERA5 surface data only for a specified date range:

```bash
intgen download --source era5 --type sfc \
    --start-date 2024-09-17 --end-date 2024-09-19 \
    --time 00:00:00 --params sp msl t2m d2m skt u10 v10 sst ci sd rsn stl1 stl2 stl3 stl4 swvl1 swvl2 swvl3 swvl4 lsm z \
    --grid 0.25/0.25 --format netcdf --yes
```

GFS data is fetched from the public NOAA AWS S3 bucket (`noaa-gfs-bdp-pds`). No credentials or API client are needed. The download script retrieves both the pgrb2 and pgrb2b GRIB2 files for each requested forecast step and concatenates them into a single output file.
Download f000 (analysis time) for a single date:

```bash
intgen download --source gfs --type pl \
    --start-date 2024-09-17 --time 00 --step 0 --yes
```

Download multiple forecast steps:

```bash
intgen download --source gfs --type pl \
    --start-date 2024-09-17 --time 00 --step 0/6/12/24 --yes
```

Download a date range:

```bash
intgen download --source gfs --type pl \
    --start-date 2024-09-17 --end-date 2024-09-19 --time 00 --step 0 --yes
```

Output files are saved as `gfs.0p25.{YYYYMMDDHH}.f{FFF}.grib2` under `~/intgenData/input/gfs/YYYYMMDD/{HH}z/`.
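For manual fetching or debugging, the source URLs and local names can be reconstructed from these patterns. A stdlib-only sketch follows; the `gfs_urls` and `local_name` helpers are hypothetical, and the bucket's `atmos/` path layout is an assumption based on the current public bucket organization, not taken from IntGen's code:

```python
def gfs_urls(date: str, cycle: str, step: int):
    """Object URLs for the pgrb2 + pgrb2b files of one forecast step.

    date: 'YYYYMMDD', cycle: 'HH', step: forecast hour. The path layout
    (gfs.{date}/{HH}/atmos/...) reflects the current noaa-gfs-bdp-pds
    bucket organization and may change; treat it as an assumption.
    """
    base = "https://noaa-gfs-bdp-pds.s3.amazonaws.com"
    prefix = f"{base}/gfs.{date}/{cycle}/atmos/gfs.t{cycle}z"
    return [f"{prefix}.pgrb2.0p25.f{step:03d}",
            f"{prefix}.pgrb2b.0p25.f{step:03d}"]

def local_name(date: str, cycle: str, step: int) -> str:
    """Local filename per the documented gfs.0p25.{YYYYMMDDHH}.f{FFF}.grib2 pattern."""
    return f"gfs.0p25.{date}{cycle}.f{step:03d}.grib2"
```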
IFS data is fetched from ECMWF's MARS archive. Access to the MARS archive is provided via ECMWF's `ecmwf-api-client`, which requires a subscription. Please check with ECMWF regarding your access to the MARS archive if you want to initialize MPAS with ECMWF's high-resolution analysis or forecast data.
Download all IFS model-level and surface data for a date:

```bash
intgen download --source ifs --type ml sfc \
    --start-date 2024-09-17 --grid 0.25/0.25 \
    --format netcdf --yes
```

Download IFS model-level data only:

```bash
intgen download --source ifs --type ml \
    --start-date 2024-09-17 --end-date 2024-09-19 \
    --time 00:00:00 --params u v t q \
    --grid 0.07/0.07 --format netcdf --yes
```

Download IFS surface data only:

```bash
intgen download --source ifs --type sfc \
    --start-date 2024-09-17 --end-date 2024-09-19 \
    --time 00:00:00 --params sp msl t2m d2m skt u10 v10 sst ci sd rsn stl1 stl2 stl3 stl4 swvl1 swvl2 swvl3 swvl4 lsm z \
    --grid 0.07/0.07 --format netcdf --yes
```

Download IFS pressure-level data:

```bash
intgen download --source ifs --type pl \
    --start-date 2024-09-17 --grid 0.25/0.25 \
    --format netcdf --params u v t q z --yes
```

| Option | Description |
|---|---|
| `--source` | `era5`, `ifs`, or `gfs` (required) |
| `--type` | `ml`, `sfc`, and/or `pl` — multiple allowed (required) |
| `--start-date` | Start date `YYYY-MM-DD` (required) |
| `--end-date` | End date `YYYY-MM-DD` (defaults to start date) |
| `--time` | Init hour `HH` or `HH:MM:SS` (default: `00:00:00`) |
| `--step` | Forecast step(s) in hours, e.g. `0`, `0/6/12/24` (GFS only; default: `0`) |
| `--grid` | Grid resolution `lat/lon` (e.g. `0.25/0.25`; ERA5/IFS only) |
| `--format` | `netcdf` or `grib` (ERA5/IFS only) |
| `--params` | Space-separated parameter short names or codes (ERA5/IFS only) |
| `--base-dir` | Override data tree root (default: `~/intgenData`) |
| `--yes` | Skip confirmation prompt |
| `--list-params` | Print available parameters and exit (ERA5/IFS only) |
After downloading, use `validate` to check that every required file exists and is readable before starting a processing run.

Single timestep:

```bash
intgen validate --source era5 \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00
```

Multi-day range at 6-hourly intervals:

```bash
intgen validate --source era5 \
    --start-date 2024-09-17_00 --end-date 2024-09-19_18 \
    --interval 6
```

Validate specific variables only:

```bash
intgen validate --source era5 \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
    --params PSFC,TT,SPECHUMD
```

Single timestep:

```bash
intgen validate --source ifs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00
```

Multi-day range:

```bash
intgen validate --source ifs \
    --start-date 2024-09-17_00 --end-date 2024-09-19_18 \
    --interval 6
```

Validate isobaric (pressure-level) templates:

```bash
intgen validate --source ifs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
    --isobaric
```

Single timestep:

```bash
intgen validate --source gfs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00
```

Multi-day range at 6-hourly intervals:

```bash
intgen validate --source gfs \
    --start-date 2024-09-17_00 --end-date 2024-09-19_18 \
    --interval 6
```

| Option | Description |
|---|---|
| `--source` | `era5`, `ifs`, or `gfs` (required) |
| `--start-date` | Start timestamp `YYYY-MM-DD_HH` (required) |
| `--end-date` | End timestamp `YYYY-MM-DD_HH` (required) |
| `--interval` | Hours between timesteps (default: `6`) |
| `--params` | Comma-separated WPS variable names to check (default: all) |
| `--isobaric` | Use pressure-level templates instead of model-level |
| `--base-dir` | Override data tree root (default: `~/intgenData`) |
| `--debug` | Enable verbose debug logging |
Example `validate` output:

```
Validating ERA5 data
Variables : 25
Timesteps : 1 (2024-09-17_00 → 2024-09-17_00, every 6h)
Data dir  : /Users/you/intgenData

2024-09-17_00:
  ✓ PSFC OK (era5.oper.an.sfc.134.128.sp.regn320sc.20240917T00.nc)
  ✓ SST  OK (era5.oper.an.sfc.34.128.sst.regn320sc.20240917T00.nc)
  ...
  ✓ VV   OK (era5.oper.an.ml.132.128.v.regn320sc.20240917T00.nc)

Validation Summary
Total checks : 25
OK           : 25
Missing      : 0
Bad          : 0

All files present and readable.
```
The `process` subcommand reads downloaded ERA5 / IFS NetCDF files or GFS GRIB2 files and writes WPS-compatible intermediate files.

Model-level processing (default):

```bash
intgen process --source era5 \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00
```

Isobaric (pressure-level) processing:

```bash
intgen process --source era5 \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
    --isobaric
```

Process a multi-day range:

```bash
intgen process --source era5 \
    --start-date 2024-09-17_00 --end-date 2024-09-19_18 \
    --interval 6
```

Dry-run (validate configuration only):

```bash
intgen process --source era5 \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
    --dry-run
```

Process specific variables with debug output:

```bash
intgen process --source era5 \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
    --params TT,SPECHUMD,PSFC --debug
```

Model-level processing (native hybrid sigma-pressure coordinates):

```bash
intgen process --source ifs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00
```

Isobaric (pressure-level) processing:

```bash
intgen process --source ifs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_06 \
    --isobaric
```

With `--isobaric`, the tool either reads pre-interpolated pressure-level data from `ec.oper.an.pl` directories or automatically interpolates model-level data to isobaric levels (analogous to WPS's `calc_ecmwf_p.exe`). The resulting intermediate files are therefore ready to be ingested by MPAS init-atmosphere without running `calc_ecmwf_p.exe` externally.
Custom output directory:

```bash
intgen process --source ifs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
    --outdir ./my_output
```

Generate SST intermediate files:

```bash
intgen process --source ifs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
    --params LANDSEA,SEAICE,SKINTEMP --outdir ./output
cd output && for v in IFS*; do mv "$v" "SST${v#IFS}"; done
```

GFS processing reads NCEP GFS 0.25° GRIB2 files and writes WPS intermediate files. GFS data is always isobaric (pressure-level), so the `--isobaric` flag is set automatically.
Process a single timestep:

```bash
intgen process --source gfs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00
```

Process a multi-day range:

```bash
intgen process --source gfs \
    --start-date 2024-09-17_00 --end-date 2024-09-19_18 \
    --interval 6
```

Dry-run:

```bash
intgen process --source gfs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
    --dry-run
```

Process specific variables:

```bash
intgen process --source gfs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
    --params TT,GHT,SPECHUMD,PSFC
```

Generate SST intermediate files from GFS:

```bash
intgen process --source gfs \
    --start-date 2024-09-17_00 --end-date 2024-09-17_00 \
    --params LANDSEA,SEAICE,SKINTEMP --outdir ./output
cd output && for v in GFS*; do mv "$v" "SST${v#GFS}"; done
```

| Option | Description |
|---|---|
| `--source` | `era5`, `ifs`, or `gfs` (default: `era5`) |
| `--start-date` | Start timestamp `YYYY-MM-DD_HH` (required) |
| `--end-date` | End timestamp `YYYY-MM-DD_HH` (required) |
| `--interval` | Processing interval in hours (default: `6`) |
| `--params` | Comma-separated WPS variable names (default: all) |
| `--isobaric` | Process as isobaric (pressure-level) data |
| `--outdir` | Output directory (default: `{base-dir}/output/`) |
| `--base-dir` | Override data tree root (default: `~/intgenData`) |
| `--dry-run` | Print configuration and exit without processing |
| `--fail-on-missing` | Exit with error if data files are missing |
| `--verify-io` | Read back intermediate files and compare slabs |
| `--debug` | Enable verbose debug logging |
```bash
# 1. Download ERA5 model-level + surface data
intgen download --source era5 --type ml sfc \
    --start-date 2024-09-17 --end-date 2024-09-19 \
    --grid 0.25/0.25 --format netcdf --yes

# 2. Verify all files are present
intgen validate --source era5 \
    --start-date 2024-09-17_00 --end-date 2024-09-19_18

# 3. Generate intermediate files
intgen process --source era5 \
    --start-date 2024-09-17_00 --end-date 2024-09-19_18
```

The same workflow applies to IFS — just replace `--source era5` with `--source ifs`.
GFS data is downloaded directly from the public NOAA AWS S3 bucket — no credentials needed.

```bash
# 1. Download GFS 0.25° pressure-level data (pgrb2 + pgrb2b combined per step)
intgen download --source gfs --type pl \
    --start-date 2024-09-17 --end-date 2024-09-19 \
    --time 00 --step 0 --yes

# 2. Validate all files are present
intgen validate --source gfs \
    --start-date 2024-09-17_00 --end-date 2024-09-19_18

# 3. Generate intermediate files
intgen process --source gfs \
    --start-date 2024-09-17_00 --end-date 2024-09-19_18
```

ERA5 files (downloaded via CDS):
```
era5.oper.an.{type}.{param_id}.128.{short_name}.regn320sc.{YYYYMMDDThh}.nc
```

Examples:

- `era5.oper.an.ml.130.128.t.regn320sc.20240917T00.nc` (temperature, model-level)
- `era5.oper.an.sfc.134.128.sp.regn320sc.20240917T00.nc` (surface pressure)
IFS files (downloaded via MARS):

```
ec.oper.an.{type}.{param_id}.128.{short_name}.regn1280sc.{YYYYMMDDThh}.nc
```

Examples:

- `ec.oper.an.ml.130.128.t.regn1280sc.20240917T00.nc` (temperature, model-level)
- `ec.oper.an.sfc.167.128.t2m.regn1280sc.20240917T00.nc` (2 m temperature)
GFS files (downloaded from AWS S3 via `intgen download`):

```
gfs.0p25.{YYYYMMDDHH}.f{FFF}.grib2
```

Examples:

- `gfs.0p25.2024091700.f000.grib2` (all fields, 00Z init, f000)
- `gfs.0p25.2024091700.f024.grib2` (all fields, 00Z init, f024)
Each file is the concatenation of the NOAA pgrb2 and pgrb2b GRIB2 files, giving a complete set of isobaric, surface, and soil fields in a single file per forecast step.
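The concatenation itself needs no GRIB library: GRIB2 messages are self-delimiting (each runs from `GRIB` to `7777`), so the files can be joined byte-for-byte. A stdlib-only sketch of the idea (the `concat_grib2` helper is hypothetical, not IntGen's code):

```python
from pathlib import Path

def concat_grib2(parts, dest):
    """Join GRIB2 files by plain byte-level append. Because GRIB2 messages
    are self-delimiting, readers such as pygrib handle the combined file
    transparently."""
    with open(dest, "wb") as out:
        for part in parts:
            out.write(Path(part).read_bytes())
```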
Output intermediate files:
```
{PREFIX}:{YYYY-MM-DD_HH}
```

Examples: `ERA5:2024-09-17_00`, `IFS:2024-09-17_00`, `GFS:2024-09-17_00`
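The timestamp suffix follows the usual WPS convention and can be produced with the stdlib; a small illustrative helper (the `intermediate_name` function is hypothetical, not part of IntGen's API):

```python
from datetime import datetime

def intermediate_name(prefix: str, when: datetime) -> str:
    """WPS-style intermediate filename: {PREFIX}:{YYYY-MM-DD_HH}."""
    return f"{prefix}:{when:%Y-%m-%d_%H}"
```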
```
intgen/ — the main Python package
├── cli.py — CLI entry point (download / validate / process)
├── utils.py — utility helpers
├── core/ — processing primitives, specs, and file manager
│   ├── era5_spec.py — ERA5 input specification and templates
│   ├── ifs_spec.py — IFS input specification and templates
│   ├── gfs_spec.py — GFS input specification and templates
│   ├── grib_adapter.py — GRIB2 adapter (pygrib → netCDF4-like interface)
│   ├── file_manager.py — centralized directory layout manager
│   ├── processor.py — IntProcessor orchestration
│   ├── spec_vars.py — variable descriptor classes (VarDesc, GfsVar)
│   └── ...
├── diagnostics/ — diagnostic calculators (RH, HGT, PRES, etc.)
├── download/ — per-source download scripts
│   ├── era5_ml.py — ERA5 model-level (CDS)
│   ├── era5_sfc.py — ERA5 surface/single-level (CDS)
│   ├── gfs_fc_pl.py — GFS forecast pressure-level (AWS S3, stdlib only)
│   ├── ifs_an_ml.py — IFS analysis model-level (MARS)
│   ├── ifs_an_pl.py — IFS analysis pressure-level (MARS)
│   ├── ifs_an_sfc.py — IFS analysis surface (MARS)
│   ├── ifs_fc_ml.py — IFS forecast model-level (MARS)
│   ├── ifs_fc_pl.py — IFS forecast pressure-level (MARS)
│   └── ifs_fc_sfc.py — IFS forecast surface (MARS)
├── grid/ — grid handling
└── projection/ — map projections
tests/ — unit tests
ecmwf_coeffs/ — model-level coefficients (L137)
docs/ — design and implementation notes
setup.cfg — packaging metadata
pyproject.toml — build configuration
```
```bash
pip install -e .   # editable install
pytest -q          # run tests
```

When working from the repository without installing, use `python -m intgen.cli` so imports resolve correctly.
- Fork + branch
- Add tests for new features
- Run `pytest` locally
- Open a PR describing your changes
This project is licensed under the MIT License — see LICENSE for details.
For questions or issues, open an issue in the repository or contact the maintainer.