Conversation

@pstjohn (Collaborator) commented Feb 11, 2026

Adds the ESM-C 300M model from https://huggingface.co/EvolutionaryScale/esmc-300m-2024-12 to models/esmc

Summary by CodeRabbit

Release Notes

  • New Features

    • Added the ESMC TransformerEngine-optimized protein language model.
    • Added model conversion utilities between ESMC and TransformerEngine formats.
    • Added export functionality for the ESMC-300M checkpoint to HuggingFace format.
    • Added support for FP8 quantization, multiple attention backends, and input format options (BSHD, THD).
  • Tests

    • Added comprehensive test infrastructure for model validation and conversion fidelity.
    • Added test fixtures for recipe parameterization, attention backends, and distributed testing.

@copy-pr-bot bot commented Feb 11, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai bot commented Feb 11, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.



Comment on lines 193 to 200
```python
# TE's _create_qk_norm_modules doesn't respect params_dtype, so QK norm
# weights default to float32. Cast them to match the model dtype to avoid
# Q/K vs V dtype mismatch during FP8 attention.
if config.dtype is not None:
    for layer in self.layers:
        for norm in (layer.self_attention.q_norm, layer.self_attention.k_norm):
            if norm is not None:
                norm.to(dtype=config.dtype)
```
@pstjohn (Collaborator, Author) commented:

this can't be right. We use layernorm elsewhere in recipes

@pstjohn (Collaborator, Author) commented:
oh maybe we don't use qk_norm_type elsewhere

@coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `bionemo-recipes/models/esmc/export.py`:
- Around lines 68-69: The copy uses a relative path, which breaks when the script is run from another working directory. Derive the source from the module file instead (e.g., `Path(__file__).resolve().parent / "modeling_esmc_te.py"`) when calling `shutil.copy`, adding `from pathlib import Path` if it is not already present; keep `export_path / "modeling_esmc_te.py"` as the destination to preserve existing behavior (see the sketch below).
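A minimal sketch of that change, assuming `export_path` is a `pathlib.Path` and the copy lives in `export.py` (the helper name here is hypothetical):

```python
import shutil
from pathlib import Path

# Resolve the packaged modeling file relative to this module, not the CWD,
# so the export works regardless of where the script is launched from.
_MODULE_DIR = Path(__file__).resolve().parent


def copy_modeling_file(export_path: Path) -> None:
    """Hypothetical helper: copy modeling_esmc_te.py next to the exported checkpoint."""
    shutil.copy(_MODULE_DIR / "modeling_esmc_te.py", export_path / "modeling_esmc_te.py")
```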

In `bionemo-recipes/models/esmc/state.py`:
- Around lines 66-73: `apply_transforms` uses mutable default arguments for `transforms` and `state_dict_ignored_entries`, which can lead to shared-state bugs. Change both defaults to `None` (e.g., `transforms: Optional[List[...]] = None`, `state_dict_ignored_entries: Optional[List] = None`) and, near the top of the function, fall back to fresh empty lists (`if transforms is None: transforms = []`, and likewise for `state_dict_ignored_entries`). Keep the rest of the logic unchanged and make sure subsequent code references the local lists (see the sketch below).
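A sketch of the suggested signature change; the first two parameters and their types are illustrative, only the default-handling pattern is the point:

```python
from typing import Callable, List, Optional


def apply_transforms(
    source_state: dict,
    target_state: dict,
    transforms: Optional[List[Callable]] = None,
    state_dict_ignored_entries: Optional[List[str]] = None,
) -> dict:
    """Apply state-dict transforms without sharing mutable defaults across calls."""
    # Defensive defaults: bind fresh lists per call instead of a shared module-level default.
    if transforms is None:
        transforms = []
    if state_dict_ignored_entries is None:
        state_dict_ignored_entries = []
    ...  # rest of the conversion logic unchanged, referencing the local lists
    return target_state
```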

In `bionemo-recipes/models/esmc/tests/common/__init__.py`:
- Around lines 44-51: The docstring example imports a non-existent symbol (`BioNeMoModelTester`), but the module actually exports `BaseModelTest`. Update the example to import the real exports (`BaseModelTest`, plus `TestTolerances` if referenced), have the example tester inherit from `BaseModelTest` (e.g., `class ESM2ModelTester(BaseModelTest):`), and adjust any abstract method names to match `BaseModelTest`'s API so the snippet reflects the real public types (see the sketch after this list).
- Around lines 1-30: The file contains two license header blocks; keep only the canonical 2025 Apache-2.0 header. Delete the block containing "SPDX-FileCopyrightText: Copyright (c) 2026" and "SPDX-License-Identifier: LicenseRef-Apache2", and keep the block containing "SPDX-FileCopyrightText: Copyright (c) 2025" and "SPDX-License-Identifier: Apache-2.0", so that a single Apache-2.0 header remains in `__init__.py`.
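For the docstring fix, the corrected example could look roughly like this; the import path and the abstract method shown are assumptions based on the directory layout and method names mentioned elsewhere in this review (`TestTolerances` is also exported and could be referenced similarly):

```python
from tests.common import BaseModelTest


class ESM2ModelTester(BaseModelTest):
    """Docstring example rewritten against the real public exports."""

    def get_upstream_model_id(self) -> str:
        # Abstract hook assumed from BaseModelTest's API; value is illustrative.
        return "EvolutionaryScale/esmc-300m-2024-12"
```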
🧹 Nitpick comments (15)
bionemo-recipes/models/esmc/requirements.txt (1)

1-6: Consider pinning versions and separating dev dependencies.

For reproducibility and to avoid breaking changes, consider:

  1. Pinning versions for critical dependencies (transformer_engine, transformers, torch, accelerate)
  2. Separating pytest into a dev-requirements.txt or using extras in setup.py/pyproject.toml
bionemo-recipes/models/esmc/state.py (1)

160-162: Use logger instead of print for consistency.

Line 161 uses print() while the rest of the file uses logger. This should be consistent for proper log level control.

♻️ Suggested fix
-            print(f"Unexpected key: {name} not in target model but is in source model.")
+            logger.warning(f"Unexpected key: {name} not in target model but is in source model.")
bionemo-recipes/models/esmc/tests/common/README.md (1)

7-13: Add language identifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified. For directory structures, use text or plaintext.

📝 Suggested fix
-```
+```text
 tests/common/
 ├── __init__.py             # Public API exports
 ├── test_modeling_common.py # BaseModelTest, TestTolerances
 ├── fixtures.py             # input_format, fp8_recipe, te_attn_backend, etc.
 └── README.md
bionemo-recipes/models/esmc/export.py (1)

28-29: Use explicit relative imports for robustness.

The imports `import convert` and `from modeling_esmc_te import ...` rely on implicit relative imports, which can fail depending on how the module is invoked.

♻️ Suggested fix

```diff
-import convert
-from modeling_esmc_te import AUTO_MAP, NVEsmcConfig
+from . import convert
+from .modeling_esmc_te import AUTO_MAP, NVEsmcConfig
```

Note: If this script is intended to be run directly as `__main__`, you may need to add the parent directory to `sys.path` or use a different approach.

bionemo-recipes/models/esmc/convert.py (3)

37-68: Remove or use the mapping dictionary.

The mapping dictionary is defined but never used in convert_esmc_to_te or convert_esmc_te_to_ref. The actual key mapping is done inline in the conversion functions.

If this is intended as documentation, consider moving it to a docstring or comment. Otherwise, consider removing it to avoid confusion.


195-197: Consider using strict=True or logging unmatched keys.

Using strict=False in load_state_dict will silently ignore missing or unexpected keys, which could mask conversion bugs. Consider either:

  1. Using strict=True (may need to handle _extra_state keys explicitly)
  2. Logging the result of load_state_dict to surface any mismatches
```python
missing, unexpected = model_te.load_state_dict(target_state, strict=False, assign=True)
if missing or unexpected:
    logger.warning(f"State dict mismatch - missing: {missing}, unexpected: {unexpected}")
```
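If the first option is preferred instead, a middle-ground sketch, assuming the only benign mismatches are TransformerEngine's `_extra_state` buffers:

```python
# Load non-strictly, then fail loudly on anything other than TE's _extra_state buffers,
# so genuine conversion bugs can't hide behind strict=False.
missing, unexpected = model_te.load_state_dict(target_state, strict=False, assign=True)
real_missing = [k for k in missing if not k.endswith("_extra_state")]
real_unexpected = [k for k in unexpected if not k.endswith("_extra_state")]
if real_missing or real_unexpected:
    raise ValueError(f"State dict mismatch - missing: {real_missing}, unexpected: {real_unexpected}")
```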

34-34: Use explicit relative import.

Same issue as in export.py - the implicit import may fail depending on how the module is invoked.

♻️ Suggested fix
-from modeling_esmc_te import NVEsmcConfig, NVEsmcForMaskedLM
+from .modeling_esmc_te import NVEsmcConfig, NVEsmcForMaskedLM
bionemo-recipes/models/esmc/tests/common/fixtures.py (3)

1-30: Duplicate license headers should be consolidated.

The file contains two separate license headers (lines 1-14 for 2026 and lines 16-29 for 2025). This appears to be a copy-paste artifact. Keep only one license header with the appropriate year.

🛠️ Proposed fix
 # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: LicenseRef-Apache2
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
 """Shared test fixtures for BioNeMo models."""

55-66: Remove redundant import os inside fixture.

os is already imported at line 33; the local import at line 62 is unnecessary.

🧹 Proposed fix
 @pytest.fixture(autouse=True)
 def use_te_debug():
     """Auto-use fixture to enable TransformerEngine debugging.

     This fixture automatically enables debug mode for TransformerEngine
     in all tests for better error messages.
     """
-    import os
-
     os.environ["NVTE_DEBUG"] = "1"
     yield
     del os.environ["NVTE_DEBUG"]

126-143: Use os.environ.pop() for safer cleanup.

If the fixture setup fails before setting the environment variables, the cleanup code will raise KeyError. Using pop() with a default is more defensive.

🛡️ Proposed fix
     yield request.param
 
-    del os.environ["NVTE_FUSED_ATTN"]
-    del os.environ["NVTE_FLASH_ATTN"]
+    os.environ.pop("NVTE_FUSED_ATTN", None)
+    os.environ.pop("NVTE_FLASH_ATTN", None)
     _attention_backends["backend_selection_requires_update"] = True
bionemo-recipes/models/esmc/modeling_esmc_te.py (3)

112-112: Type annotation mismatch for _tied_weights_keys.

The type is annotated as ClassVar[dict[str, str]] but HuggingFace's convention expects ClassVar[list[str]] (a list of weight key patterns). The empty value {} currently has no effect, but the type annotation should match the expected usage.

🧹 Proposed fix
-    _tied_weights_keys: ClassVar[dict[str, str]] = {}
+    _tied_weights_keys: ClassVar[list[str]] = []

Apply the same fix at line 300 for NVEsmcForMaskedLM.


263-264: Hardcoded max_seq_len=4096 for rotary embeddings.

The maximum sequence length is hardcoded. Consider making this configurable via NVEsmcConfig (e.g., max_position_embeddings) or at minimum document this limitation.

🔧 Proposed fix

Add to NVEsmcConfig.__init__:

max_position_embeddings: int = 4096,

Then use it:

         with torch.autocast(device_type="cuda", enabled=False):
-            te_rope_emb = self.rotary_emb(max_seq_len=4096)
+            te_rope_emb = self.rotary_emb(max_seq_len=self.config.max_position_embeddings)

404-404: Module-level dynamo config modification affects global state.

torch._dynamo.config.capture_scalar_outputs = True modifies global state at import time, which could affect other code. Consider adding a comment explaining why this is necessary for the @torch.compile decorators on the packing functions.

📝 Proposed documentation
 # ===================== Utility Functions for THD Packing =====================
 
+# Enable scalar output capture for torch.compile to support variable-length
+# sequence packing/unpacking operations in _pad_input and _unpad_input.
 torch._dynamo.config.capture_scalar_outputs = True
bionemo-recipes/models/esmc/tests/common/test_modeling_common.py (2)

598-604: Remove redundant AutoConfig import.

AutoConfig is already imported at line 30; the local import at line 599 is unnecessary.

🧹 Proposed fix
     def test_convert_config(self):
         """Test that config can be converted between HF and TE formats."""
         upstream_id = self.get_upstream_model_id()
         revision = self.get_upstream_model_revision()
 
         # Load HF config
-        from transformers import AutoConfig
-
         kwargs = {}

544-558: Base class assumes get_te_to_hf_converter returns a model, but ESMC returns state_dict.

The base class test_convert_te_to_hf assumes the converter returns a model instance (line 558 asserts isinstance(model_hf_converted, self.get_upstream_model_class())). However, ESMC's get_te_to_hf_converter returns convert_esmc_te_to_ref which produces a state_dict.

ESMC correctly overrides this test, but the docstring for get_te_to_hf_converter (lines 192-199) should clarify the expected return type contract, or the base class should document that subclasses may need to override these tests for non-HF models.
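A sketch of how that contract could be documented on the base class; the method name comes from the comment above and the body is a placeholder:

```python
def get_te_to_hf_converter(self):
    """Return a callable that converts a TE model back to the upstream format.

    For HF-style models the callable is expected to return a model instance
    (checked by test_convert_te_to_hf). Subclasses whose converter returns a
    plain state_dict, such as ESMC's convert_esmc_te_to_ref, should override
    test_convert_te_to_hf accordingly.
    """
    raise NotImplementedError
```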

@pstjohn (Collaborator, Author) commented Feb 12, 2026

@coderabbitai resolve

@pstjohn marked this pull request as ready for review on February 12, 2026 at 14:48
@coderabbitai bot commented Feb 12, 2026

✅ Actions performed

Comments resolved.

@pstjohn force-pushed the pstjohn/bio-236-add-esm-c-model branch from 2bd275f to f1157f7 on February 12, 2026 at 14:48
@pstjohn marked this pull request as draft on February 12, 2026 at 14:50