Declarative checkpoint config conversion (Llama pilot) by jlamypoirier · Pull Request #508 · ServiceNow/Fast-LLM

jlamypoirier · 2026-05-05T18:02:39Z

Summary

First step of the conversion-simplification refactor. Reintroduces declarative config-conversion primitives, applied within the post-#362 modular per-section structure, and migrates Llama as the pilot to validate the design.

Three sequential commits:

Reclassify architecture-impacting fields under FieldHint.architecture: eight entries covering sixteen fields (attention dense_layer / softmax_scale_power, MLP activation, MoE router, four Llama3 / five Yarn rotary scaling fields, StochasticMixer main_mixer_name, vision patch height/width). These drive the new coverage check.
Add declarative ConfigConverter primitives and section-converter ABC in fast_llm/engine/checkpoint/external.py. Nine primitives (Rename, ConstantExport, ConstantImport, Default, Optional, Ignored, Custom, Nested, Dispatch) plus ConfigSectionConverter. The walker is implicit: NestedConfigConverter and DispatchConfigConverter call the public import_config/export_config so that subclass overrides participate. The coverage check fires only when type(config) exactly matches the converter's declared fast_llm_config_class, so unmigrated subclasses (Mixtral on Llama, Qwen2's _check_config override, etc.) keep working through super().
Migrate Llama config converters to declarative primitives. Eight section converters cover normalization/MLP/attention/block/embeddings/head/decoder/base-model; seven of them are now declarative, and the weight side is unchanged. LlamaDecoderConverter stays imperative (Fixed/Pattern block-sequence dispatch doesn't fit cleanly). _check_config is retained as an overridable hook. Non-default PEFT values now fail loudly on export instead of being silently dropped.
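To make the declarative shape concrete, here is a minimal sketch of a rename primitive, a constant-export primitive, and a section converter that walks its declarations. It is not the code in fast_llm/engine/checkpoint/external.py: the class names, the dict-based configs, and the `epsilon` / `rms_norm_eps` / `norm_kind` keys are all illustrative assumptions.

```python
from typing import Any


class Rename:
    """1:1 rename between a Fast-LLM field and an HF key (hypothetical stand-in)."""

    def __init__(self, fast_llm_name: str, hf_name: str):
        self.fast_llm_name, self.hf_name = fast_llm_name, hf_name

    def export_to(self, config: dict[str, Any], out: dict[str, Any]) -> None:
        out[self.hf_name] = config[self.fast_llm_name]

    def import_to(self, hf: dict[str, Any], out: dict[str, Any]) -> None:
        out[self.fast_llm_name] = hf[self.hf_name]


class ConstantExport:
    """Write a constant on export, assert it on import (hypothetical stand-in)."""

    def __init__(self, hf_name: str, value: Any):
        self.hf_name, self.value = hf_name, value

    def export_to(self, config: dict[str, Any], out: dict[str, Any]) -> None:
        out[self.hf_name] = self.value

    def import_to(self, hf: dict[str, Any], out: dict[str, Any]) -> None:
        if hf.get(self.hf_name) != self.value:
            raise ValueError(f"Expected {self.hf_name}={self.value!r}, got {hf.get(self.hf_name)!r}")


class SectionConverterSketch:
    """Base class: subclasses only declare `converters`; both directions are inherited."""

    converters: tuple = ()

    @classmethod
    def export_config(cls, config: dict[str, Any]) -> dict[str, Any]:
        out: dict[str, Any] = {}
        for converter in cls.converters:
            converter.export_to(config, out)
        return out

    @classmethod
    def import_config(cls, hf: dict[str, Any]) -> dict[str, Any]:
        out: dict[str, Any] = {}
        for converter in cls.converters:
            converter.import_to(hf, out)
        return out


class NormalizationSketch(SectionConverterSketch):
    # Key names are illustrative, not taken from the PR.
    converters = (Rename("epsilon", "rms_norm_eps"), ConstantExport("norm_kind", "rms"))


hf_config = NormalizationSketch.export_config({"epsilon": 1e-5})
round_tripped = NormalizationSketch.import_config(hf_config)
```

The point of the design is that a section's two conversion directions come from one declaration, so they cannot drift apart the way two hand-written method bodies can.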

Notable shape decisions (open to course-correction)

Coverage check is type-strict (type(config) is cls.fast_llm_config_class). Strict subclasses defer to a more specific converter. This was needed to keep Mixtral working through super().export_config() on MoEMLPConfig while only Llama is migrated.
NestedConfigConverter is flat-merge only. The transformer side is assumed flat. Non-flat HF cases (Apriel2 mixers) will use DispatchConfigConverter with an hf_path, or CustomConfigConverter.
No global type-keyed registry. Sub-converter dispatch is local: parents declare NestedConfigConverter(field, converter_class) for fixed types, DispatchConfigConverter(field, registry) for polymorphic ones. Subclasses override sub-converter classes the same way as today's ClassVar[type] pattern.
parent_context plumbing is dropped for now (was speculative, unused in Llama). Will re-introduce as an explicit kwarg when Apriel migration needs it for mamba sibling-field defaults.
IgnoredConfigConverter is permissive — silently passes architecture fields through without check. Used for ParameterConfig sub-fields (init/lr_scale only, no architecture sub-fields) and for fields where Llama HF format genuinely has no representation. PEFT (which IS architecture-significant when configured) uses CustomConfigConverter with an explicit Assert.custom(isinstance, config.peft, NoPeftConfig) instead.
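Two of the decisions above, the type-strict coverage gate and the explicit PEFT assertion, can be sketched as follows. This is a simplified illustration under stated assumptions: every dataclass field is treated as architecture-significant, all class names are hypothetical stand-ins, and a plain ValueError replaces the Assert helper used in the PR.

```python
from dataclasses import dataclass, fields


@dataclass
class MLPSketch:
    intermediate_size: int = 4096
    activation: str = "silu"


@dataclass
class MoEMLPSketch(MLPSketch):
    # Field the parent converter knows nothing about.
    experts: int = 8


class MLPConverterSketch:
    fast_llm_config_class = MLPSketch
    declared_fields = frozenset({"intermediate_size", "activation"})

    @classmethod
    def check_coverage(cls, config) -> bool:
        """Return True if the check ran, False if it deferred to a more specific converter."""
        # Exact type match only: a strict subclass instance skips the check.
        if type(config) is not cls.fast_llm_config_class:
            return False
        missing = {f.name for f in fields(config)} - cls.declared_fields
        if missing:
            raise ValueError(f"Undeclared architecture fields: {sorted(missing)}")
        return True


class IncompleteMLPConverterSketch(MLPConverterSketch):
    # Simulates a forgotten declaration.
    declared_fields = frozenset({"intermediate_size"})


@dataclass
class NoPeftSketch:
    pass


@dataclass
class LoRASketch:
    rank: int = 8


def export_peft(peft) -> dict:
    """Custom-converter style export: the target format cannot represent PEFT."""
    if not isinstance(peft, NoPeftSketch):
        raise ValueError(f"Cannot export configured PEFT: {type(peft).__name__}")
    return {}


ran_on_exact_type = MLPConverterSketch.check_coverage(MLPSketch())
ran_on_subclass = MLPConverterSketch.check_coverage(MoEMLPSketch())
```

With an `isinstance` gate instead of `is`, the subclass instance would trip the parent's check on `experts`, which is the situation the PR avoids while only Llama is migrated.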

Verification

Live round-trip parity for Llama-3, Qwen2, Mistral, Mixtral, MTP-Llama with realistic HF configs.
Coverage check fires on missing declarations (verified by removing head_size).
Constant assertions fire on non-default softmax_scale_power and on configured PEFT.
pytest tests/models/test_checkpoint.py --models gpt: 139 passed, 0 failed across llama / qwen_2 / mistral / mixtral / mtp_llama / apriel2_attn / llava / diffusion_llama.

Test plan

pytest -v -n 6 tests/models/test_checkpoint.py 2>&1 | tee /tmp/fast_llm_tests/pytest_out.txt
pytest -v -n 6 tests/models/test_hf_roundtrip.py
pytest -v -n 6 --models gpt tests/
pytest -v -n 6 fast_llm_external_models/tests/ (separate invocation per CLAUDE.md)
Manual smoke: fast-llm convert --input.format llama --input.path <ref> --output.format llama --output.path <tmp>; reload both and compare configs.
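For the "reload both and compare configs" step of the manual smoke test, a small helper along these lines may be enough. It is a sketch, not part of the PR: it assumes both checkpoint directories contain a `config.json` file, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

MISSING = "<missing>"


def load_config(directory: str) -> dict[str, Any]:
    """Read config.json from a checkpoint directory (file name is an assumption)."""
    return json.loads((Path(directory) / "config.json").read_text())


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted paths so differences are easy to read."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(reference: dict[str, Any], converted: dict[str, Any]) -> dict[str, tuple]:
    """Map each differing path to (reference value, converted value)."""
    a, b = flatten(reference), flatten(converted)
    return {
        path: (a.get(path, MISSING), b.get(path, MISSING))
        for path in sorted(a.keys() | b.keys())
        if a.get(path, MISSING) != b.get(path, MISSING)
    }


example_diff = diff_configs(
    {"hidden_size": 4096, "rope": {"theta": 10000.0}},
    {"hidden_size": 4096, "rope": {"theta": 500000.0}, "extra": 1},
)
```

An empty result from `diff_configs(load_config(ref_dir), load_config(tmp_dir))` would indicate that the round trip preserved the config.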

What's not in this PR

Phase 2 steps 3–8 of the plan (apriel2 / mistral / qwen2 / mtp_llama / mixtral / diffusion / apriel / multimodal migrations + cleanup) and the weight-converter declarative refactor are deferred. The framework is built so they can land incrementally on top of this.

🤖 Generated with Claude Code

Sixteen config fields (eight entries below) whose values directly affect model architecture were tagged as feature/core/(none). They drive the upcoming declarative-converter coverage check, which uses FieldHint.architecture as the source of truth for "must be handled by every checkpoint format".

- AttentionConfig.dense_layer (output projection presence)
- AttentionConfig.softmax_scale_power (attention scaling)
- MLPConfig.activation (forward-pass activation type)
- MoEMLPConfig.router (routing weights drive token assignment)
- Llama3RotaryConfig: scale_factor, low_frequency_factor, high_frequency_factor, original_context_length
- YarnRotaryConfig: scale_factor, attention_factor, beta_fast, beta_slow, original_context_length
- StochasticMixerConfig.main_mixer_name (selects inference mixer)
- PatchEmbeddingsConfig.patch_height/patch_width (input tokenization)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
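As an illustration of how a hint can serve as the source of truth for coverage, the sketch below tags fields through standard dataclass metadata and collects the architecture-tagged ones. Fast-LLM's own Field/FieldHint machinery is not reproduced here: `FieldHintSketch`, `hinted`, and the example config are hypothetical stand-ins.

```python
import enum
from dataclasses import dataclass, field, fields


class FieldHintSketch(enum.Enum):
    architecture = "architecture"
    core = "core"
    feature = "feature"


def hinted(default, hint: FieldHintSketch):
    """Attach a hint to a dataclass field via its metadata mapping."""
    return field(default=default, metadata={"hint": hint})


@dataclass
class AttentionConfigSketch:
    heads: int = hinted(32, FieldHintSketch.architecture)
    softmax_scale_power: float = hinted(0.5, FieldHintSketch.architecture)
    dropout: float = hinted(0.0, FieldHintSketch.feature)


def architecture_fields(config_class) -> set[str]:
    """Names of all fields tagged as architecture on the given dataclass."""
    return {
        f.name
        for f in fields(config_class)
        if f.metadata.get("hint") is FieldHintSketch.architecture
    }


def uncovered(config_class, declared: set[str]) -> set[str]:
    """Architecture fields that a converter has not declared."""
    return architecture_fields(config_class) - declared


# A converter that only declares `heads` leaves one architecture field uncovered.
gap = uncovered(AttentionConfigSketch, {"heads"})
```

Under this scheme, a field tagged feature or core (like `dropout` here) never shows up in the gap, which is why the reclassification matters for the check.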

Reintroduces the declarative config-conversion shape that pre-dated PR #362, applied within the post-#362 modular per-section structure. Replaces the imperative import_config/export_config bodies with a small set of named primitives and a recursive walker driven by per-section declarations.

Primitives in fast_llm.engine.checkpoint.external:

- RenameConfigConverter: 1:1 path rename
- ConstantExportConfigConverter: write constant on export, assert on import
- ConstantImportConfigConverter: assert on export, inject on import
- DefaultConfigConverter: rename with HF-side fallback
- OptionalConfigConverter: emit/import only when non-sentinel
- IgnoredConfigConverter: declare a field as intentionally not converted
- CustomConfigConverter: escape hatch for cross-field transforms
- NestedConfigConverter: recurse into a fixed-typed sub-config; flat-merges HF output into the parent (transformer side is assumed flat)
- DispatchConfigConverter: runtime type dispatch for polymorphic sub-configs

ConfigSectionConverter is the per-Fast-LLM-class converter base. Subclasses declare their conversion via _create_config_converters() and inherit import_config/export_config concretely.

The architecture-coverage check fires only when type(config) exactly matches the converter's declared fast_llm_config_class. Strict subclass types defer to a more specific converter, allowing yet-to-be-migrated subclasses (e.g., Mixtral on Llama) to call super().export_config() without tripping the parent's check on fields the parent doesn't know about.

The walker is implicit: NestedConfigConverter / DispatchConfigConverter call the public import_config/export_config on the sub-converter class so subclass overrides participate, rather than a private path that bypasses them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
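The flat-merge and dispatch behaviour described in this commit message can be sketched roughly as below. The sketch makes several assumptions that the source does not confirm: configs are plain dicts, the polymorphic sub-config carries a `type` entry, and the HF side stores a discriminator under a hypothetical `norm_type` key. All class and key names are illustrative.

```python
from typing import Any


class RMSNormConverterSketch:
    @classmethod
    def export_config(cls, config: dict[str, Any]) -> dict[str, Any]:
        return {"rms_norm_eps": config["epsilon"]}

    @classmethod
    def import_config(cls, hf: dict[str, Any]) -> dict[str, Any]:
        return {"epsilon": hf["rms_norm_eps"]}


class LayerNormConverterSketch:
    @classmethod
    def export_config(cls, config: dict[str, Any]) -> dict[str, Any]:
        return {"layer_norm_eps": config["epsilon"]}

    @classmethod
    def import_config(cls, hf: dict[str, Any]) -> dict[str, Any]:
        return {"epsilon": hf["layer_norm_eps"]}


class Nested:
    """Recurse into a fixed-typed sub-config and flat-merge its HF output."""

    def __init__(self, field: str, converter_class: type):
        self.field, self.converter_class = field, converter_class

    def export_to(self, config: dict[str, Any], out: dict[str, Any]) -> None:
        # Calls the public classmethod, so subclass overrides take part.
        out.update(self.converter_class.export_config(config[self.field]))

    def import_to(self, hf: dict[str, Any], out: dict[str, Any]) -> None:
        # The HF side is flat, so the sub-converter reads from the same dict.
        out[self.field] = self.converter_class.import_config(hf)


class Dispatch:
    """Pick the sub-converter at runtime from a local registry."""

    def __init__(self, field: str, registry: dict[str, type], hf_type_key: str):
        self.field, self.registry, self.hf_type_key = field, registry, hf_type_key

    def export_to(self, config: dict[str, Any], out: dict[str, Any]) -> None:
        sub_config = config[self.field]
        out[self.hf_type_key] = sub_config["type"]
        out.update(self.registry[sub_config["type"]].export_config(sub_config))

    def import_to(self, hf: dict[str, Any], out: dict[str, Any]) -> None:
        type_name = hf[self.hf_type_key]
        out[self.field] = {"type": type_name, **self.registry[type_name].import_config(hf)}


REGISTRY = {"rms_norm": RMSNormConverterSketch, "layer_norm": LayerNormConverterSketch}

nested_out: dict[str, Any] = {}
Nested("normalization", RMSNormConverterSketch).export_to(
    {"normalization": {"epsilon": 1e-6}}, nested_out
)

dispatch_out: dict[str, Any] = {}
Dispatch("normalization", REGISTRY, "norm_type").export_to(
    {"normalization": {"type": "layer_norm", "epsilon": 1e-5}}, dispatch_out
)
```

Keeping the registry as an argument of the parent's declaration, rather than a global type-keyed table, is what lets a subclass swap in a different sub-converter locally.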

Pilot of the new ConfigSectionConverter framework. Each Llama section converter (Normalization/MLP/Attention/Block/Embeddings/Head/BaseModel) now declares its conversion via _create_config_converters() instead of imperative import_config/export_config bodies. Weight side is unchanged.

Notable shape decisions:

- LlamaDecoderConverter stays as a regular (imperative) class because Fixed/Pattern block-sequence dispatch doesn't lend itself to the declarative shape. LlamaBaseModelConverter wires it in via a small CustomConfigConverter; subclasses (Mistral, Qwen2, MTP-Llama, ...) continue to plug in different block converters via block_converter_class.
- _check_config is retained as an overridable classmethod and called from the linear_layers CustomConfigConverter, so Qwen2 can keep its asymmetric Q/K/V bias rule without re-implementing the export.
- IgnoredConfigConverter is used for ParameterConfig sub-fields with no architecture-significant content (weight, output_weight, word_embeddings), and for prediction_heads (which Llama HF doesn't expose; subclass MTP-Llama adds it imperatively).
- peft uses CustomConfigConverter to assert NoPeftConfig on export. Llama HF format cannot represent PEFT, so a configured LoRA now fails loudly rather than being silently dropped.
- Rotary remains in CustomConfigConverter: the v4/v5 transformers split (rope_theta/rope_scaling vs. rope_parameters) and three rope_type variants don't fit pure rename primitives.

Verified with live round-trips of Llama-3, Qwen2, Mistral, Mixtral, and MTP-Llama HF configs, plus tests/models/test_checkpoint.py for all GPT formats (139 passed, 0 failed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
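The `_check_config` hook pattern mentioned above can be illustrated with a toy example: the shared export calls an overridable classmethod, so a subclass changes the validation rule without touching the export itself. The concrete bias rules, the dict layout, and the emitted `attention_bias` key are assumptions made for illustration, not the rules implemented in Fast-LLM.

```python
class LlamaAttentionSketch:
    """Hypothetical stand-in for a section converter with an overridable check."""

    @classmethod
    def _check_config(cls, linear_bias: dict[str, bool]) -> None:
        # Assumed base rule: all four projections share one bias setting.
        if len(set(linear_bias.values())) != 1:
            raise ValueError(f"Inconsistent bias settings: {linear_bias}")

    @classmethod
    def export_linear_layers(cls, linear_bias: dict[str, bool]) -> dict[str, bool]:
        # The shared export validates through the hook, then emits one flag.
        cls._check_config(linear_bias)
        return {"attention_bias": linear_bias["query"]}


class Qwen2AttentionSketch(LlamaAttentionSketch):
    @classmethod
    def _check_config(cls, linear_bias: dict[str, bool]) -> None:
        # Assumed asymmetric rule: bias on Q/K/V, none on the output projection.
        expected = {"query": True, "key": True, "value": True, "dense": False}
        if linear_bias != expected:
            raise ValueError(f"Expected {expected}, got {linear_bias}")


uniform = {"query": False, "key": False, "value": False, "dense": False}
asymmetric = {"query": True, "key": True, "value": True, "dense": False}

llama_export = LlamaAttentionSketch.export_linear_layers(uniform)
qwen2_export = Qwen2AttentionSketch.export_linear_layers(asymmetric)
```

Because `export_linear_layers` resolves `cls._check_config` at call time, the subclass inherits the export unchanged and only supplies its own rule.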

jlamypoirier and others added 3 commits May 5, 2026 18:33

jlamypoirier force-pushed the jlp_simplify_conversion branch from 5567a71 to 0c406db Compare May 5, 2026 22:33

jlamypoirier wants to merge 3 commits into main from jlp_simplify_conversion
