initial commit for esm-c model code #1463
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Important: Review skipped. Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
```python
# TE's _create_qk_norm_modules doesn't respect params_dtype, so QK norm
# weights default to float32. Cast them to match the model dtype to avoid
# Q/K vs V dtype mismatch during FP8 attention.
if config.dtype is not None:
    for layer in self.layers:
        for norm in (layer.self_attention.q_norm, layer.self_attention.k_norm):
            if norm is not None:
                norm.to(dtype=config.dtype)
```
this can't be right. We use layernorm elsewhere in recipes
oh maybe we don't use qk_norm_type elsewhere
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@bionemo-recipes/models/esmc/export.py`:
- Around line 68-69: The copy uses a relative path which breaks when run from
other working directories; change the source to an absolute path derived from
the module file (e.g., Path(__file__).resolve().parent / "modeling_esmc_te.py")
when calling shutil.copy so that export_path / "modeling_esmc_te.py" remains the
destination, and add the necessary "from pathlib import Path" import if not
present; keep the target as export_path to preserve existing behavior.
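A minimal sketch of the suggested fix, assuming `export_path` is a `Path` already defined in `export.py`; the `copy_modeling_file` wrapper is hypothetical (the real code may perform the copy inline):

```python
import shutil
from pathlib import Path


def copy_modeling_file(export_path: Path) -> None:
    """Copy modeling_esmc_te.py into the export directory.

    Resolving the source relative to this module's own file keeps the copy
    working regardless of the current working directory.
    """
    source = Path(__file__).resolve().parent / "modeling_esmc_te.py"
    shutil.copy(source, export_path / "modeling_esmc_te.py")
```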
In `@bionemo-recipes/models/esmc/state.py`:
- Around line 66-73: The function apply_transforms currently uses mutable
default arguments for transforms and state_dict_ignored_entries which can lead
to unexpected shared-state bugs; change their defaults to None (i.e.,
transforms: Optional[List[...]] = None and state_dict_ignored_entries:
Optional[List] = None) and inside apply_transforms (near the top of the
function) add defensive handling: if transforms is None: transforms = [] and if
state_dict_ignored_entries is None: state_dict_ignored_entries = []; keep the
rest of the logic unchanged and ensure any subsequent code references the local
lists.
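A minimal sketch of the None-default pattern described above; the first parameter name and the loop body are placeholders for the real `apply_transforms` logic in `state.py`:

```python
from typing import Callable, List, Optional


def apply_transforms(
    state_dict: dict,
    transforms: Optional[List[Callable]] = None,
    state_dict_ignored_entries: Optional[List[str]] = None,
) -> dict:
    # Bind fresh lists on every call so no state leaks across invocations,
    # which is the bug that mutable default arguments can introduce.
    if transforms is None:
        transforms = []
    if state_dict_ignored_entries is None:
        state_dict_ignored_entries = []

    # Stand-in for the function's existing logic, which should reference
    # the local lists from here on.
    for transform in transforms:
        state_dict = transform(state_dict)
    return state_dict
```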
In `@bionemo-recipes/models/esmc/tests/common/__init__.py`:
- Around line 44-51: The docstring example imports a non-existent symbol
(BioNeMoModelTester) but the module actually exports BaseModelTest; update the
example to import BaseModelTest (and any other real exports like TestTolerances)
and have the example tester class inherit from BaseModelTest (e.g., class
ESM2ModelTester(BaseModelTest):) and adjust any referenced abstract method names
to match BaseModelTest's API so the snippet reflects the real public types (see the sketch after this list).
- Around line 1-30: Remove the duplicate license header block that appears twice
and retain only the canonical 2025 Apache-2.0 header; specifically delete the
first header containing "SPDX-FileCopyrightText: Copyright (c) 2026" and
"SPDX-License-Identifier: LicenseRef-Apache2" and keep the second header
containing "SPDX-FileCopyrightText: Copyright (c) 2025" and
"SPDX-License-Identifier: Apache-2.0" so only one Apache-2.0 license header
(with those SPDX tags) remains in __init__.py.
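For the first item above, a sketch of what the corrected docstring example could look like. The import path `tests.common` and the `get_upstream_model_id` override are assumptions based on names used elsewhere in this review, not confirmed API:

```python
from tests.common import BaseModelTest, TestTolerances


class ESM2ModelTester(BaseModelTest):
    """Example tester inheriting from the real exported base class."""

    def get_upstream_model_id(self) -> str:
        # Illustrative value; BaseModelTest's real abstract API may differ.
        return "facebook/esm2_t6_8M_UR50D"
```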
🧹 Nitpick comments (15)
bionemo-recipes/models/esmc/requirements.txt (1)
1-6: Consider pinning versions and separating dev dependencies. For reproducibility and to avoid breaking changes, consider:

- Pinning versions for critical dependencies (`transformer_engine`, `transformers`, `torch`, `accelerate`)
- Separating `pytest` into a dev-requirements.txt or using extras in setup.py/pyproject.toml

bionemo-recipes/models/esmc/state.py (1)
160-162: Use logger instead of print for consistency. Line 161 uses `print()` while the rest of the file uses `logger`. This should be consistent for proper log level control.

♻️ Suggested fix

```diff
-    print(f"Unexpected key: {name} not in target model but is in source model.")
+    logger.warning(f"Unexpected key: {name} not in target model but is in source model.")
```

bionemo-recipes/models/esmc/tests/common/README.md (1)
7-13: Add language identifier to fenced code block. Per markdownlint, fenced code blocks should have a language specified. For directory structures, use `text` or `plaintext`.

📝 Suggested fix

````diff
-```
+```text
 tests/common/
 ├── __init__.py               # Public API exports
 ├── test_modeling_common.py   # BaseModelTest, TestTolerances
 ├── fixtures.py               # input_format, fp8_recipe, te_attn_backend, etc.
 └── README.md
 ```
````

bionemo-recipes/models/esmc/export.py (1)

28-29: Use explicit relative imports for robustness. The imports `import convert` and `from modeling_esmc_te import ...` rely on implicit relative imports, which can fail depending on how the module is invoked.

♻️ Suggested fix

```diff
-import convert
-from modeling_esmc_te import AUTO_MAP, NVEsmcConfig
+from . import convert
+from .modeling_esmc_te import AUTO_MAP, NVEsmcConfig
```

Note: If this script is intended to be run directly as `__main__`, you may need to add the parent directory to `sys.path` or use a different approach.

bionemo-recipes/models/esmc/convert.py (3)
37-68: Remove or use the `mapping` dictionary. The `mapping` dictionary is defined but never used in `convert_esmc_to_te` or `convert_esmc_te_to_ref`; the actual key mapping is done inline in the conversion functions. If this is intended as documentation, consider moving it to a docstring or comment. Otherwise, consider removing it to avoid confusion.
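If the table is kept, one option is to make it load-bearing by driving the key renaming from it. A sketch under that assumption; the helper name is hypothetical, and the mapping contents would be whatever `convert.py` already defines:

```python
def rename_state_dict_keys(state_dict: dict, mapping: dict[str, str]) -> dict:
    """Return a copy of state_dict with source keys renamed per `mapping`.

    Keys absent from the mapping pass through unchanged, so the table only
    needs to list names that actually differ between the two models.
    """
    return {mapping.get(key, key): value for key, value in state_dict.items()}
```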
195-197: Consider using `strict=True` or logging unmatched keys. Using `strict=False` in `load_state_dict` will silently ignore missing or unexpected keys, which could mask conversion bugs. Consider either:

- Using `strict=True` (may need to handle `_extra_state` keys explicitly)
- Logging the result of `load_state_dict` to surface any mismatches

```python
missing, unexpected = model_te.load_state_dict(target_state, strict=False, assign=True)
if missing or unexpected:
    logger.warning(f"State dict mismatch - missing: {missing}, unexpected: {unexpected}")
```
34-34: Use explicit relative import. Same issue as in `export.py`: the implicit import may fail depending on how the module is invoked.

♻️ Suggested fix

```diff
-from modeling_esmc_te import NVEsmcConfig, NVEsmcForMaskedLM
+from .modeling_esmc_te import NVEsmcConfig, NVEsmcForMaskedLM
```

bionemo-recipes/models/esmc/tests/common/fixtures.py (3)
1-30: Duplicate license headers should be consolidated. The file contains two separate license headers (lines 1-14 for 2026 and lines 16-29 for 2025). This appears to be a copy-paste artifact. Keep only one license header with the appropriate year.

🛠️ Proposed fix

```diff
 # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: LicenseRef-Apache2
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
 """Shared test fixtures for BioNeMo models."""
```
55-66: Remove redundant `import os` inside fixture. `os` is already imported at line 33; the local import at line 62 is unnecessary.

🧹 Proposed fix

```diff
 @pytest.fixture(autouse=True)
 def use_te_debug():
     """Auto-use fixture to enable TransformerEngine debugging.

     This fixture automatically enables debug mode for TransformerEngine
     in all tests for better error messages.
     """
-    import os
-
     os.environ["NVTE_DEBUG"] = "1"
     yield
     del os.environ["NVTE_DEBUG"]
```
126-143: Use `os.environ.pop()` for safer cleanup. If the fixture setup fails before setting the environment variables, the cleanup code will raise `KeyError`. Using `pop()` with a default is more defensive.

🛡️ Proposed fix

```diff
     yield request.param

-    del os.environ["NVTE_FUSED_ATTN"]
-    del os.environ["NVTE_FLASH_ATTN"]
+    os.environ.pop("NVTE_FUSED_ATTN", None)
+    os.environ.pop("NVTE_FLASH_ATTN", None)
     _attention_backends["backend_selection_requires_update"] = True
```
112-112: Type annotation mismatch for `_tied_weights_keys`. The type is annotated as `ClassVar[dict[str, str]]` but HuggingFace's convention expects `ClassVar[list[str]]` (a list of weight key patterns). The empty value `{}` currently has no effect, but the type annotation should match the expected usage.

🧹 Proposed fix

```diff
-    _tied_weights_keys: ClassVar[dict[str, str]] = {}
+    _tied_weights_keys: ClassVar[list[str]] = []
```

Apply the same fix at line 300 for `NVEsmcForMaskedLM`.
263-264: Hardcoded `max_seq_len=4096` for rotary embeddings. The maximum sequence length is hardcoded. Consider making this configurable via `NVEsmcConfig` (e.g., `max_position_embeddings`) or at minimum document this limitation.

🔧 Proposed fix

Add to `NVEsmcConfig.__init__`:

```python
max_position_embeddings: int = 4096,
```

Then use it:

```diff
 with torch.autocast(device_type="cuda", enabled=False):
-    te_rope_emb = self.rotary_emb(max_seq_len=4096)
+    te_rope_emb = self.rotary_emb(max_seq_len=self.config.max_position_embeddings)
```
404-404: Module-level dynamo config modification affects global state. `torch._dynamo.config.capture_scalar_outputs = True` modifies global state at import time, which could affect other code. Consider adding a comment explaining why this is necessary for the `@torch.compile` decorators on the packing functions.

📝 Proposed documentation

```diff
 # ===================== Utility Functions for THD Packing =====================
+# Enable scalar output capture for torch.compile to support variable-length
+# sequence packing/unpacking operations in _pad_input and _unpad_input.
 torch._dynamo.config.capture_scalar_outputs = True
```

bionemo-recipes/models/esmc/tests/common/test_modeling_common.py (2)
598-604: Remove redundant `AutoConfig` import. `AutoConfig` is already imported at line 30; the local import at line 599 is unnecessary.

🧹 Proposed fix

```diff
 def test_convert_config(self):
     """Test that config can be converted between HF and TE formats."""
     upstream_id = self.get_upstream_model_id()
     revision = self.get_upstream_model_revision()

     # Load HF config
-    from transformers import AutoConfig
-
     kwargs = {}
```
544-558: Base class assumes `get_te_to_hf_converter` returns a model, but ESMC returns a state_dict. The base class `test_convert_te_to_hf` assumes the converter returns a model instance (line 558 asserts `isinstance(model_hf_converted, self.get_upstream_model_class())`). However, ESMC's `get_te_to_hf_converter` returns `convert_esmc_te_to_ref`, which produces a state_dict. ESMC correctly overrides this test, but the docstring for `get_te_to_hf_converter` (lines 192-199) should clarify the expected return type contract, or the base class should document that subclasses may need to override these tests for non-HF models.
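A hedged sketch of the docstring clarification suggested above, assuming `get_te_to_hf_converter` is a hook on `BaseModelTest` (its real signature may differ):

```python
from typing import Callable


class BaseModelTest:  # excerpt; the real class defines many more hooks
    def get_te_to_hf_converter(self) -> Callable:
        """Return a converter from the TE model back to the upstream format.

        Return-type contract: for most models the converter returns an
        upstream HF model instance, but subclasses whose converter returns a
        plain state_dict (e.g. ESMC's convert_esmc_te_to_ref) should override
        the conversion tests that assert on a model instance.
        """
        raise NotImplementedError
```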
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
@coderabbitai resolve
✅ Actions performed: Comments resolved.
Force-pushed from 2bd275f to f1157f7
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Adds the ESM-C 300M model from https://huggingface.co/EvolutionaryScale/esmc-300m-2024-12 to models/esmc
Summary by CodeRabbit
Release Notes
New Features
Tests