Conversation

@pstjohn (Collaborator) commented Feb 11, 2026

Adds the ESM-C 300M model from https://huggingface.co/EvolutionaryScale/esmc-300m-2024-12 to models/esmc

Summary by CodeRabbit

Release Notes

  • New Features

    • Added the ESMC TransformerEngine-optimized protein language model.
    • Added model conversion utilities between ESMC and TransformerEngine formats.
    • Added export functionality for the ESMC-300M checkpoint to HuggingFace format.
    • Added support for FP8 quantization, multiple attention backends, and input format options (BSHD, THD).
  • Tests

    • Added comprehensive test infrastructure for model validation and conversion fidelity.
    • Added test fixtures for recipe parameterization, attention backends, and distributed testing.

@copy-pr-bot bot commented Feb 11, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai bot commented Feb 11, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.



Comment on lines 193 to 200
```python
# TE's _create_qk_norm_modules doesn't respect params_dtype, so QK norm
# weights default to float32. Cast them to match the model dtype to avoid
# Q/K vs V dtype mismatch during FP8 attention.
if config.dtype is not None:
    for layer in self.layers:
        for norm in (layer.self_attention.q_norm, layer.self_attention.k_norm):
            if norm is not None:
                norm.to(dtype=config.dtype)
```
@pstjohn (Collaborator, Author) commented:

this can't be right. We use layernorm elsewhere in recipes

@pstjohn (Collaborator, Author) commented:
oh maybe we don't use qk_norm_type elsewhere

@coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `bionemo-recipes/models/esmc/export.py`:
- Around lines 68-69: The copy uses a relative path, which breaks when the script is run from another working directory. Derive the source from the module file instead (e.g., `Path(__file__).resolve().parent / "modeling_esmc_te.py"`) when calling `shutil.copy`, adding `from pathlib import Path` if it is not already present; keep `export_path / "modeling_esmc_te.py"` as the destination to preserve existing behavior (see the sketch below).
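A minimal sketch of that change, assuming `export_path` is a `pathlib.Path` and the copy lives in `export.py` (the helper name here is hypothetical):

```python
import shutil
from pathlib import Path

# Resolve the packaged modeling file relative to this module, not the CWD,
# so the export works regardless of where the script is launched from.
_MODULE_DIR = Path(__file__).resolve().parent


def copy_modeling_file(export_path: Path) -> None:
    """Hypothetical helper: copy modeling_esmc_te.py next to the exported checkpoint."""
    shutil.copy(_MODULE_DIR / "modeling_esmc_te.py", export_path / "modeling_esmc_te.py")
```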

In `bionemo-recipes/models/esmc/state.py`:
- Around lines 66-73: `apply_transforms` uses mutable default arguments for `transforms` and `state_dict_ignored_entries`, which can lead to shared-state bugs. Change both defaults to `None` (e.g., `transforms: Optional[List[...]] = None`, `state_dict_ignored_entries: Optional[List] = None`) and, near the top of the function, fall back to fresh empty lists (`if transforms is None: transforms = []`, and likewise for `state_dict_ignored_entries`). Keep the rest of the logic unchanged and make sure subsequent code references the local lists (see the sketch below).
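A sketch of the suggested signature change; the first two parameters and their types are illustrative, only the default-handling pattern is the point:

```python
from typing import Callable, List, Optional


def apply_transforms(
    source_state: dict,
    target_state: dict,
    transforms: Optional[List[Callable]] = None,
    state_dict_ignored_entries: Optional[List[str]] = None,
) -> dict:
    """Apply state-dict transforms without sharing mutable defaults across calls."""
    # Defensive defaults: bind fresh lists per call instead of a shared module-level default.
    if transforms is None:
        transforms = []
    if state_dict_ignored_entries is None:
        state_dict_ignored_entries = []
    ...  # rest of the conversion logic unchanged, referencing the local lists
    return target_state
```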

In `bionemo-recipes/models/esmc/tests/common/__init__.py`:
- Around lines 44-51: The docstring example imports a non-existent symbol (`BioNeMoModelTester`), but the module actually exports `BaseModelTest`. Update the example to import the real exports (`BaseModelTest`, plus `TestTolerances` if referenced), have the example tester inherit from `BaseModelTest` (e.g., `class ESM2ModelTester(BaseModelTest):`), and adjust any abstract method names to match `BaseModelTest`'s API so the snippet reflects the real public types (see the sketch after this list).
- Around lines 1-30: The file contains two license header blocks; keep only the canonical 2025 Apache-2.0 header. Delete the block containing "SPDX-FileCopyrightText: Copyright (c) 2026" and "SPDX-License-Identifier: LicenseRef-Apache2", and keep the block containing "SPDX-FileCopyrightText: Copyright (c) 2025" and "SPDX-License-Identifier: Apache-2.0", so that a single Apache-2.0 header remains in `__init__.py`.
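For the docstring fix, the corrected example could look roughly like this; the import path and the abstract method shown are assumptions based on the directory layout and method names mentioned elsewhere in this review (`TestTolerances` is also exported and could be referenced similarly):

```python
from tests.common import BaseModelTest


class ESM2ModelTester(BaseModelTest):
    """Docstring example rewritten against the real public exports."""

    def get_upstream_model_id(self) -> str:
        # Abstract hook assumed from BaseModelTest's API; value is illustrative.
        return "EvolutionaryScale/esmc-300m-2024-12"
```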
🧹 Nitpick comments (15)
bionemo-recipes/models/esmc/requirements.txt (1)

1-6: Consider pinning versions and separating dev dependencies.

For reproducibility and to avoid breaking changes, consider:

  1. Pinning versions for critical dependencies (transformer_engine, transformers, torch, accelerate)
  2. Separating pytest into a dev-requirements.txt or using extras in setup.py/pyproject.toml
bionemo-recipes/models/esmc/state.py (1)

160-162: Use logger instead of print for consistency.

Line 161 uses print() while the rest of the file uses logger. This should be consistent for proper log level control.

♻️ Suggested fix
-            print(f"Unexpected key: {name} not in target model but is in source model.")
+            logger.warning(f"Unexpected key: {name} not in target model but is in source model.")
bionemo-recipes/models/esmc/tests/common/README.md (1)

7-13: Add language identifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified. For directory structures, use text or plaintext.

📝 Suggested fix
-```
+```text
 tests/common/
 ├── __init__.py             # Public API exports
 ├── test_modeling_common.py # BaseModelTest, TestTolerances
 ├── fixtures.py             # input_format, fp8_recipe, te_attn_backend, etc.
 └── README.md
bionemo-recipes/models/esmc/export.py (1)

28-29: Use explicit relative imports for robustness.

The imports `import convert` and `from modeling_esmc_te import ...` rely on implicit relative imports, which can fail depending on how the module is invoked.

♻️ Suggested fix

```diff
-import convert
-from modeling_esmc_te import AUTO_MAP, NVEsmcConfig
+from . import convert
+from .modeling_esmc_te import AUTO_MAP, NVEsmcConfig
```

Note: If this script is intended to be run directly as `__main__`, you may need to add the parent directory to `sys.path` or use a different approach.

bionemo-recipes/models/esmc/convert.py (3)

37-68: Remove or use the mapping dictionary.

The mapping dictionary is defined but never used in convert_esmc_to_te or convert_esmc_te_to_ref. The actual key mapping is done inline in the conversion functions.

If this is intended as documentation, consider moving it to a docstring or comment. Otherwise, consider removing it to avoid confusion.


195-197: Consider using strict=True or logging unmatched keys.

Using strict=False in load_state_dict will silently ignore missing or unexpected keys, which could mask conversion bugs. Consider either:

  1. Using strict=True (may need to handle _extra_state keys explicitly)
  2. Logging the result of load_state_dict to surface any mismatches
```python
missing, unexpected = model_te.load_state_dict(target_state, strict=False, assign=True)
if missing or unexpected:
    logger.warning(f"State dict mismatch - missing: {missing}, unexpected: {unexpected}")
```
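If the first option is preferred instead, a middle-ground sketch, assuming the only benign mismatches are TransformerEngine's `_extra_state` buffers:

```python
# Load non-strictly, then fail loudly on anything other than TE's _extra_state buffers,
# so genuine conversion bugs can't hide behind strict=False.
missing, unexpected = model_te.load_state_dict(target_state, strict=False, assign=True)
real_missing = [k for k in missing if not k.endswith("_extra_state")]
real_unexpected = [k for k in unexpected if not k.endswith("_extra_state")]
if real_missing or real_unexpected:
    raise ValueError(f"State dict mismatch - missing: {real_missing}, unexpected: {real_unexpected}")
```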

34-34: Use explicit relative import.

Same issue as in export.py - the implicit import may fail depending on how the module is invoked.

♻️ Suggested fix
-from modeling_esmc_te import NVEsmcConfig, NVEsmcForMaskedLM
+from .modeling_esmc_te import NVEsmcConfig, NVEsmcForMaskedLM
bionemo-recipes/models/esmc/tests/common/fixtures.py (3)

1-30: Duplicate license headers should be consolidated.

The file contains two separate license headers (lines 1-14 for 2026 and lines 16-29 for 2025). This appears to be a copy-paste artifact. Keep only one license header with the appropriate year.

🛠️ Proposed fix
 # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: LicenseRef-Apache2
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
 """Shared test fixtures for BioNeMo models."""

55-66: Remove redundant import os inside fixture.

os is already imported at line 33; the local import at line 62 is unnecessary.

🧹 Proposed fix
 @pytest.fixture(autouse=True)
 def use_te_debug():
     """Auto-use fixture to enable TransformerEngine debugging.

     This fixture automatically enables debug mode for TransformerEngine
     in all tests for better error messages.
     """
-    import os
-
     os.environ["NVTE_DEBUG"] = "1"
     yield
     del os.environ["NVTE_DEBUG"]

126-143: Use os.environ.pop() for safer cleanup.

If the fixture setup fails before setting the environment variables, the cleanup code will raise KeyError. Using pop() with a default is more defensive.

🛡️ Proposed fix
     yield request.param
 
-    del os.environ["NVTE_FUSED_ATTN"]
-    del os.environ["NVTE_FLASH_ATTN"]
+    os.environ.pop("NVTE_FUSED_ATTN", None)
+    os.environ.pop("NVTE_FLASH_ATTN", None)
     _attention_backends["backend_selection_requires_update"] = True
bionemo-recipes/models/esmc/modeling_esmc_te.py (3)

112-112: Type annotation mismatch for _tied_weights_keys.

The type is annotated as ClassVar[dict[str, str]] but HuggingFace's convention expects ClassVar[list[str]] (a list of weight key patterns). The empty value {} currently has no effect, but the type annotation should match the expected usage.

🧹 Proposed fix
-    _tied_weights_keys: ClassVar[dict[str, str]] = {}
+    _tied_weights_keys: ClassVar[list[str]] = []

Apply the same fix at line 300 for NVEsmcForMaskedLM.


263-264: Hardcoded max_seq_len=4096 for rotary embeddings.

The maximum sequence length is hardcoded. Consider making this configurable via NVEsmcConfig (e.g., max_position_embeddings) or at minimum document this limitation.

🔧 Proposed fix

Add to NVEsmcConfig.__init__:

max_position_embeddings: int = 4096,

Then use it:

         with torch.autocast(device_type="cuda", enabled=False):
-            te_rope_emb = self.rotary_emb(max_seq_len=4096)
+            te_rope_emb = self.rotary_emb(max_seq_len=self.config.max_position_embeddings)

404-404: Module-level dynamo config modification affects global state.

torch._dynamo.config.capture_scalar_outputs = True modifies global state at import time, which could affect other code. Consider adding a comment explaining why this is necessary for the @torch.compile decorators on the packing functions.

📝 Proposed documentation
 # ===================== Utility Functions for THD Packing =====================
 
+# Enable scalar output capture for torch.compile to support variable-length
+# sequence packing/unpacking operations in _pad_input and _unpad_input.
 torch._dynamo.config.capture_scalar_outputs = True
bionemo-recipes/models/esmc/tests/common/test_modeling_common.py (2)

598-604: Remove redundant AutoConfig import.

AutoConfig is already imported at line 30; the local import at line 599 is unnecessary.

🧹 Proposed fix
     def test_convert_config(self):
         """Test that config can be converted between HF and TE formats."""
         upstream_id = self.get_upstream_model_id()
         revision = self.get_upstream_model_revision()
 
         # Load HF config
-        from transformers import AutoConfig
-
         kwargs = {}

544-558: Base class assumes get_te_to_hf_converter returns a model, but ESMC returns state_dict.

The base class test_convert_te_to_hf assumes the converter returns a model instance (line 558 asserts isinstance(model_hf_converted, self.get_upstream_model_class())). However, ESMC's get_te_to_hf_converter returns convert_esmc_te_to_ref which produces a state_dict.

ESMC correctly overrides this test, but the docstring for get_te_to_hf_converter (lines 192-199) should clarify the expected return type contract, or the base class should document that subclasses may need to override these tests for non-HF models.
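A sketch of how that contract could be documented on the base class; the method name comes from the comment above and the body is a placeholder:

```python
def get_te_to_hf_converter(self):
    """Return a callable that converts a TE model back to the upstream format.

    For HF-style models the callable is expected to return a model instance
    (checked by test_convert_te_to_hf). Subclasses whose converter returns a
    plain state_dict, such as ESMC's convert_esmc_te_to_ref, should override
    test_convert_te_to_hf accordingly.
    """
    raise NotImplementedError
```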

@pstjohn (Collaborator, Author) commented Feb 12, 2026

@coderabbitai resolve

@pstjohn marked this pull request as ready for review on February 12, 2026 at 14:48
@coderabbitai bot commented Feb 12, 2026

✅ Actions performed

Comments resolved.

@pstjohn force-pushed the pstjohn/bio-236-add-esm-c-model branch from 2bd275f to f1157f7 on February 12, 2026 at 14:48
@pstjohn marked this pull request as draft on February 12, 2026 at 14:50