Fixes to RMG Cantera yaml writers to better match ck2yaml#2947

Open
djlucey wants to merge 15 commits into main from yaml_fix

Conversation

@djlucey
Contributor

@djlucey djlucey commented May 12, 2026

Motivation or Problem

The Cantera yaml output files written in RMG had some errors when loaded into Cantera.

  • The cantera1 writer defines a Cantera reaction object by its reactants and products. The equation that is then written to the yaml and used to load back into Cantera ends up duplicating third-body-type surface species for certain reactions. This is described in Cantera issue 2115.
  • The cantera2 writer is missing X as an element, which gives the error "CanteraError thrown by Phase::addSpecies: Species 'vacantX(3)' contains an undefined element 'X'." It also does not specify the sites correctly, which gives the error "Number of surface sites not balanced in reaction HOCOXX(73) <=> OX(6) + XCOH(54)."

Once fixes were applied to load mechanisms into Cantera without errors, the comparison file generated at the end of the RMG run revealed some differences.

  • It identified both methods as missing the transport note entry in species definition (which is present in ck2yaml)
  • Both methods were missing "state" for surface phase (and for gas phase in cantera2 only).
  • The ordering of the cantera2 reactions was not consistent with ck2yaml, which separates them into surface and gas reactions. Because the comparison method depends on the order of the reactions in the file, it is of limited use until the same separation is applied.

Description of Changes

The "state" field was populated in each place it was missing for the rmg writer but present in ck2yaml. The species transport note had been turned off for the non-annotated yaml files, so this condition was removed to be consistent with ck2yaml. The gas and surface reactions were separated for surface mechanisms for the cantera2 method, which enabled the comparison method. The cantera1 logic was modified for use in cantera2 to specify custom elements. For surface species, "sites" was populated in cantera2 by counting the X atoms. To combat the duplication of species from Cantera's reaction definition, the equation was composed manually from reactants and products.
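
The gas/surface separation described above can be illustrated with a minimal sketch. It is not the code in this PR: the function name `split_gas_and_surface`, the plain-dict reaction representation, and the rule "a reaction is a surface reaction if any participant is a surface species" are all assumptions made for illustration.

```python
def split_gas_and_surface(reactions, surface_species):
    """Partition reactions into (gas, surface) lists, preserving input order.

    Hypothetical helper: `reactions` is a list of dicts with 'reactants' and
    'products' mappings (species name -> stoichiometric coefficient), and
    `surface_species` is a set of surface species names.
    """
    gas, surface = [], []
    for rxn in reactions:
        involved = set(rxn["reactants"]) | set(rxn["products"])
        # Assumed rule: any surface participant makes it a surface reaction.
        if involved & surface_species:
            surface.append(rxn)
        else:
            gas.append(rxn)
    return gas, surface


# Made-up example data, using species names that appear in this PR.
reactions = [
    {"reactants": {"HOCOXX(73)": 1}, "products": {"OX(6)": 1, "XCOH(54)": 1}},
    {"reactants": {"A": 1}, "products": {"B": 1}},
]
gas, surface = split_gas_and_surface(reactions, {"HOCOXX(73)", "OX(6)", "XCOH(54)"})
```

Keeping the relative order within each group matters here because the comparison step depends on reaction order in the file.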

Testing

I used this input file for methane partial oxidation, input.py, which writes Cantera yaml with both methods. The generated comparison text files were used to identify differences between the ck2yaml method and the RMG yaml writers. Claude was used to help write tests checking that the previously missing fields are present and that the generated yamls agree with the ck2yaml method. As far as I can tell, the new and existing tests are working for me.

@djlucey djlucey requested a review from rwest May 12, 2026 01:38
rwest and others added 12 commits May 13, 2026 14:41
…misidentification

Cantera's Python API, when constructing ct.Reaction from reactants/products
dicts, silently re-interprets any species with net-zero stoichiometry (equal
count on both sides — a spectator or surface catalyst) as a third-body
collider.  This mutates input_data: the equation string has the spectator's
stoichiometry doubled and a spurious 'efficiencies' entry is added.  The
corrupted YAML cannot be round-tripped.
This affects all rate types (ArrheniusRate, InterfaceArrheniusRate,
StickingArrheniusRate, PlogRate, ChebyshevRate) and is reproducible on
Cantera 3.1.0 and 3.2.0.  A bug report has been filed with the Cantera
project.

Fix: replace the reactants/products dict form of ct.Reaction with the
equation-string form for all non-third-body reaction types.  Passing an
equation string avoids the misidentification entirely.  ThirdBody, Troe, and
Lindemann reactions are left using the dict form because they require the
third_body= keyword parameter, and their species do not appear on both sides
so the bug does not affect them.

A nested helper _ct_equation() is added inside to_cantera() to build the
equation string from the already-computed ct_reactants / ct_products dicts.
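
As a rough illustration of what such a helper does, here is a standalone sketch. The real _ct_equation() is nested inside to_cantera() and may differ; the name `ct_equation`, its signature, and the `reversible` flag below are assumptions, and no Cantera call is made.

```python
def ct_equation(reactants, products, reversible=True):
    """Build an equation string from two {species name: coefficient} dicts.

    Hypothetical standalone version of the nested helper; coefficients of 1
    are omitted and other coefficients are written before the species name.
    """
    def side(species):
        terms = []
        for name, coeff in species.items():
            terms.append(name if coeff == 1 else f"{coeff:g} {name}")
        return " + ".join(terms)

    arrow = " <=> " if reversible else " => "
    return side(reactants) + arrow + side(products)


# Reaction taken from the error message quoted in the PR description.
eq = ct_equation({"HOCOXX(73)": 1}, {"OX(6)": 1, "XCOH(54)": 1})
# A species that appears on both sides keeps its coefficient on each side.
eq_spectator = ct_equation({"A": 1, "X(1)": 1}, {"B": 1, "X(1)": 1})
```

Because the string is assembled directly from the dicts, a species that appears on both sides is written once per side with its own coefficient, which is the property the regression test checks.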

Also adds a regression test (test_reaction_to_dicts_surface_spectator_species) that
verifies no 'efficiencies' key and no doubled stoichiometry appear for a SurfaceArrhenius
reaction where the same surface species appears on both sides.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
defines base elements from existing list, but also includes custom elements 
and surface site as done in cantera1 method
Replaces the ad-hoc sum over atoms with the existing Molecule API method,
and only emits the 'sites' field when a species occupies more than one site
(monodentate adsorbates with sites=1 don't need the field).
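
A minimal sketch of the "emit sites only when greater than one" rule follows. It counts X symbols in a plain list as a stand-in for the Molecule API method used in the commit; the function name `surface_species_entry` and the list-of-symbols input are assumptions for illustration only.

```python
def surface_species_entry(name, atom_symbols):
    """Return a species dict, adding 'sites' only for multidentate adsorbates.

    Hypothetical helper: `atom_symbols` is a list of element symbols, where
    'X' marks a surface site. The real writer uses the Molecule API instead.
    """
    n_sites = sum(1 for symbol in atom_symbols if symbol == "X")
    entry = {"name": name}
    if n_sites > 1:
        # Cantera defaults to one site, so the field is needed only here.
        entry["sites"] = n_sites
    return entry


bidentate = surface_species_entry("HOCOXX(73)", ["H", "O", "C", "O", "X", "X"])
monodentate = surface_species_entry("OX(6)", ["O", "X"])
```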

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…annotated file

The way things are set up currently, transport notes belong in annotated output only.
The previous commit removed the verbose guards, to match the ck2yaml version.
Yes, we want them consistent, but for now make it so the
non-annotated yaml files have no comments/notes, in any version.

Instead: restore if-verbose gating in both yaml_cantera1 and yaml_cantera2, and
strip transport notes from cantera_from_ck/chem.yaml after ck2yaml generates it
when verbose_comments is False, so all three non-annotated files are consistent.

To remove them from the ck2yaml-generated file, we can use a regex to
remove 'note:' lines and their indented continuations.
This avoids yaml.safe_load + yaml.dump (which reformats the whole file).
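
One way such a regex could look is sketched below. This is not necessarily the pattern used in the commit: the name `strip_notes`, the pattern itself, and the sample text are assumptions. It treats any following line indented more deeply than the 'note:' key as a continuation.

```python
import re

# A 'note:' key at any indent, plus following lines indented deeper than it.
NOTE_RE = re.compile(
    r"^(?P<indent>[ \t]*)note:.*\n?"      # the 'note:' line itself
    r"(?:(?P=indent)[ \t]+.*\n?)*",       # more-indented continuation lines
    re.MULTILINE,
)


def strip_notes(yaml_text):
    """Remove 'note:' entries from YAML text without reformatting the rest."""
    return NOTE_RE.sub("", yaml_text)


# Made-up sample resembling a species block; values are illustrative only.
sample = (
    "- name: A\n"
    "  transport:\n"
    "    model: gas\n"
    "    note: |-\n"
    "      first line\n"
    "      second line\n"
    "- name: B\n"
    "  note: short note\n"
    "  sites: 2\n"
)
cleaned = strip_notes(sample)
```

Editing the text directly leaves every other line byte-for-byte unchanged, which is the stated reason for not round-tripping through a YAML parser.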

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…res cantera2 writer to ck2yaml as is done for cantera1
…haviour

Two tests were written against the original (ungated) transport note behaviour
and against sites=1 being emitted for monodentate species:
- Rename and invert the transport note tests: note must be absent when
  verbose=False and present when verbose=True.
- Single-site surface species should have no 'sites' key (Cantera defaults to 1);
  only bidentate+ species emit the field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
These files are generated on the fly by running the mainTest functional test.
Committing them bakes in machine-specific paths and database-version-dependent
thermo coefficients that drift over time. The comparison test already skips
gracefully when the files are absent, so CI is unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rwest and others added 3 commits May 13, 2026 16:17
…d compares cantera2 writer to ck2yaml as is done for cantera1
Both Cantera YAML writers and the Chemkin writer were emitting a
hardcoded periodic-table grab-bag (H, C, O, N, Ne, Ar, He, Si, S, F,
Cl, Br, I plus D/T/CI/OI isotopes plus X) in every output file,
regardless of what the model actually contained. This polluted the
files and caused diffs against ck2yaml-converted reference outputs.

Add ReactionModel.get_elements() that walks each species' molecule[0]
atoms and returns the set of Element singletons in use. All three
writers now derive their elements block from that set: built-in
elements appear only when present; isotopes (D, T, CI, OI) appear only
when an isotope atom is on some species; X appears only for surface
models. Writer 2's is_plasma path still adds the E pseudo-element
without iterating atoms.
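
The idea can be sketched with plain data structures. This is an illustration, not the ReactionModel.get_elements() implementation: the real method walks molecule[0] atoms and returns Element singletons, whereas the hypothetical `elements_in_use` below works on lists of element symbols.

```python
def elements_in_use(species_atoms):
    """Return the set of element symbols present in any species.

    Hypothetical stand-in: `species_atoms` maps a species name to the list
    of element symbols of its atoms.
    """
    used = set()
    for symbols in species_atoms.values():
        used.update(symbols)
    return used


# Made-up models: X shows up only when a surface species is present.
gas_model = {"CH4": ["C", "H", "H", "H", "H"], "O2": ["O", "O"]}
surface_model = dict(gas_model, **{"OX(6)": ["O", "X"]})
gas_elements = elements_in_use(gas_model)
surface_elements = elements_in_use(surface_model)
```

Deriving the block from the species means a gas-only model never lists X, and isotopes appear only if some species carries one.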

In chemkin.pyx save_chemkin, the union is computed once across all
species before the surface/gas split so that the gas-only Chemkin
file in a surface run still lists X (required for downstream ck2yaml
conversion to recognize the surface site element).

We could cache this list of elements, either updated once per save_all()
call, or even just once after model initiation (since no chemistry can 
create an element that wasn't already there). But that is not done yet.
(For simplicity)

Side cleanups: drop the unused search_for_additional_elements branch
in writer 2 (the new design subsumes it); replace writer 2's inline
MixedModel with a real ReactionModel so .get_elements() works without
duck typing; remove the module-import-time ELEMENTS_BLOCK/ELEMENTS_LINE
globals in writer 1 in favor of per-call computation. Mock containers
in the writer 2 tests now inherit ReactionModel for the same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The destination directories under test/rmgpy/test_data/yaml_writer_data/
(cantera1, cantera2, ck2yaml) are not guaranteed to exist on a fresh
checkout: we recently removed the committed golden YAML files, leaving the
subdirectories empty. Git does not track empty directories, so CI failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>