Add ESM2 PEFT recipe #1446
Conversation
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In `@bionemo-recipes/models/esm2/pyproject.toml`:
- Line 17: The git-based peft dependency string in pyproject.toml currently pins
to a branch ("peft @
git+https://github.com/balvisio/peft.git@dev/ba/support-te-lora"); replace the
branch ref with an immutable identifier (a specific commit SHA or an official
release tag) so the requirement becomes pinned to that commit/tag, update the
dependency entry accordingly, and verify the chosen SHA/tag exists in the peft
repo and builds correctly (look for the dependency line containing "peft @
git+https://github.com/balvisio/peft.git@...").
In `@bionemo-recipes/recipes/esm2_peft_te/checkpoint.py`:
- Line 1: The current checkpoint.py is a cross-recipe symlink to another
recipe's checkpoint logic; replace it by copying the checkpoint implementation
into this recipe (or extract the shared logic into a new common module and
import from there), add the proper per-file license header to the copied file,
remove any imports that reference other recipes, and update all references to
use the local checkpoint implementation (i.e., the functions/classes originally
provided by the external checkpoint module) so the recipe is fully
self-contained.
In `@bionemo-recipes/recipes/esm2_peft_te/dataset.py`:
- Around line 20-28: The file imports DataCollatorWithFlattening from
transformers but this recipe expects the local, recipe-specific implementation
in collator.py; change the import to pull DataCollatorWithFlattening from the
local collator module (the same place TokenPackingDataset is imported from) so
that the DataCollatorWithFlattening used by Dataset code matches the
recipe-specific signature and Flash Attention / THD-format parameters; update
the import list to reference collator.DataCollatorWithFlattening instead of
transformers.DataCollatorWithFlattening.
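A minimal sketch of the suggested import change, assuming collator.py in this recipe exposes both symbols as described above:

```python
# dataset.py (sketch)
# Before (generic transformers collator, lacking the THD-format / Flash
# Attention parameters this recipe relies on):
# from transformers import DataCollatorWithFlattening

# After (recipe-local implementation, same module as TokenPackingDataset):
from collator import DataCollatorWithFlattening, TokenPackingDataset
```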
In `@bionemo-recipes/recipes/esm2_peft_te/distributed_config.py`:
- Line 1: The file distributed_config.py currently points to
../esm2_native_te/distributed_config.py (cross-recipe symlink); replace this
pointer by inlining the distributed_config implementation into this recipe (copy
the code from esm2_native_te/distributed_config.py into this recipe's
distributed_config.py) and add the required per-file license header at the top,
or alternatively extract any truly shared utilities into a new common module
outside recipes (e.g., a shared package) and import that instead; ensure the new
distributed_config.py in this recipe contains no imports from other recipes and
includes the proper license header.
In `@bionemo-recipes/recipes/esm2_peft_te/Dockerfile`:
- Around line 1-12: The Dockerfile is copying files from esm2_native_te
(checkpoint.py, collator.py, distributed_config.py, scheduler.py) which violates
the self-contained recipe rule; remove those COPY lines and either vendor those
helper modules into esm2_peft_te (add them under esm2_peft_te/ and update any
imports) or move them into a shared non-recipe package, then update the
Dockerfile to only COPY esm2_peft_te/ and its requirements; ensure any import
paths in the code reference the vendored modules (e.g., esm2_peft_te.checkpoint,
esm2_peft_te.collator, etc.) so the image no longer depends on esm2_native_te.
In `@bionemo-recipes/recipes/esm2_peft_te/infer.py`:
- Around line 27-36: Add a Google-style docstring to the _batched_inference
function: document the function purpose in one line, then an Args section
listing model, tokenizer, records, batch_size (int), max_seq_length (int),
stride (int), infer_overflowing_aas (bool), and device (str) with short
descriptions and types, and a Returns section describing the tuple of list[str]
(predicted sequences) and list[int] (corresponding lengths/ids) following
pydocstyle conventions; place the docstring immediately below the def
_batched_inference(...) signature and ensure proper triple-quote formatting and
punctuation.
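A sketch of what the requested docstring could look like; the parameter descriptions are assumptions inferred from the names listed above, and the existing signature in infer.py should be left unchanged:

```python
def _batched_inference(
    model, tokenizer, records, batch_size, max_seq_length, stride, infer_overflowing_aas, device
):
    """Run batched inference over input records and collect per-sequence predictions.

    Args:
        model: Fine-tuned (PEFT) model used for prediction.
        tokenizer: Tokenizer matching the base model checkpoint.
        records (list[dict]): Parsed input records to predict on.
        batch_size (int): Number of sequences per forward pass.
        max_seq_length (int): Maximum tokenized sequence length.
        stride (int): Tokenizer stride used when sequences overflow.
        infer_overflowing_aas (bool): Whether to also run inference on overflowing chunks.
        device (str): Device to run inference on, e.g. "cuda" or "cpu".

    Returns:
        tuple[list[str], list[int]]: Predicted sequences and the index of the
        input sample each prediction maps to.
    """
    ...
```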
- Around line 46-76: The code assumes inputs contains "overflow_to_sample_mapping", but when the tokenizer is called without return_overflowing_tokens (i.e., infer_overflowing_aas=False) that key is missing. Before using overflow_map in the inner loop (the block that constructs sub_inputs and iterates over preds), guard the access with overflow_map = inputs.pop("overflow_to_sample_mapping", None). When overflow_map is None, compute original_idx from the outer sample index i directly (or j + k) without indexing overflow_map, and keep appending to sequences_to_sample_mapping and predictions as before so no KeyError is raised. Update the assignment original_idx = i + overflow_map[j + k].item() to handle both cases (a sketch follows below).
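One possible shape for that guard, as a sketch only; the loop variables i, j, k and the surrounding bookkeeping follow the description above rather than verified source:

```python
# Present only when the tokenizer was called with return_overflowing_tokens=True.
overflow_map = inputs.pop("overflow_to_sample_mapping", None)

for k, pred in enumerate(preds):
    if overflow_map is not None:
        # Map each overflowing chunk back to the sample it came from.
        original_idx = i + overflow_map[j + k].item()
    else:
        # No overflow: chunks correspond one-to-one with the batched samples.
        original_idx = i + j + k
    sequences_to_sample_mapping.append(original_idx)
    predictions.append(pred)
```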
- Line 108: Replace the hardcoded tokenizer checkpoint with the model-derived
tokenizer: modify the AutoTokenizer.from_pretrained call (where tokenizer =
AutoTokenizer.from_pretrained("nvidia/esm2_t48_15B_UR50D")) to load from the
runtime/config value (e.g., args.model_tag or a config field tokenizer_name) so
the tokenizer matches the model loaded elsewhere in infer.py; also add
tokenizer_name: ${model_tag} to hydra_config/defaults_infer.yaml so the
tokenizer checkpoint can be overridden via config. Ensure you reference the same
symbol used to load the model (args.model_tag or the config object) when calling
AutoTokenizer.from_pretrained.
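A sketch of the config-driven tokenizer load; tokenizer_name is the proposed new Hydra field, and args stands in for the resolved config object used elsewhere in infer.py:

```python
from transformers import AutoTokenizer

# tokenizer_name is assumed to default to ${model_tag} in defaults_infer.yaml,
# so loading it keeps the tokenizer in sync with the model checkpoint.
tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name)
```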
In `@bionemo-recipes/recipes/esm2_peft_te/requirements.txt`:
- Around line 2-6: The git-based peft dependency is pinned to a mutable branch;
update the requirement line "peft @
git+https://github.com/balvisio/peft.git@dev/ba/support-te-lora" to reference an
immutable identifier by replacing the branch name with a specific commit SHA or
released tag (e.g., @<commit-sha> or `@vX.Y.Z`) so builds are reproducible and
supply-chain safe; ensure the updated string remains in requirements.txt as
"peft @ git+https://github.com/balvisio/peft.git@<commit-or-tag>".
In `@bionemo-recipes/recipes/esm2_peft_te/scheduler.py`:
- Line 1: This file currently redirects to or imports from
esm2_native_te.scheduler which violates recipe isolation; replace that
cross-recipe dependency by copying the required scheduler logic into this recipe
or implementing a local equivalent. Identify the exported symbols you depend on
(e.g., Scheduler class, create_scheduler or get_scheduler factory function, and
any helper functions like schedule_task or init_scheduler) from the
esm2_native_te implementation, reproduce their behavior locally inside
bionemo-recipes/recipes/esm2_peft_te/scheduler.py, update any local imports to
use the new local implementations, and remove any import lines that reference
esm2_native_te.scheduler so the recipe is fully self-contained.
- Line 1: The file currently contains a bare path string, which raises a SyntaxError; replace it with a valid Python implementation or a re-export import. Remove the literal path and either add an import that re-exports the native scheduler (e.g., a from esm2_native_te.scheduler import of the needed symbols) or implement a minimal wrapper function/class matching this package's expected API (i.e., the functions/classes used elsewhere that reference scheduler.py) so the module can be imported without error; if you implement a wrapper, update its symbols to match the callers.
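If the recipe instead goes the vendoring route from the previous comment, a minimal local scheduler.py could look like the following; the function name and warmup behavior are illustrative assumptions, not the actual esm2_native_te API:

```python
"""Learning-rate scheduler utilities for the esm2_peft_te recipe."""

from torch.optim.lr_scheduler import LambdaLR


def get_linear_warmup_scheduler(optimizer, num_warmup_steps: int) -> LambdaLR:
    """Linearly ramp the learning rate over num_warmup_steps, then hold it constant.

    Args:
        optimizer: Optimizer whose learning rate is scheduled.
        num_warmup_steps: Number of steps over which to ramp up.

    Returns:
        A LambdaLR implementing the warmup schedule.
    """

    def lr_lambda(step: int) -> float:
        if step < num_warmup_steps:
            return float(step + 1) / float(max(1, num_warmup_steps))
        return 1.0

    return LambdaLR(optimizer, lr_lambda)
```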
In `@bionemo-recipes/recipes/esm2_peft_te/train_lora_convnet.py`:
- Line 25: The recipe imports NVEsmForConvTokenClassification from
modeling_esm_te in train_lora_convnet.py which pulls code from another recipe
and breaks self-containment; either vendor the required module into this recipe
(add modeling_esm_te.py with NVEsmForConvTokenClassification implementation
alongside train_lora_convnet.py and update the package/module path) or declare
and install the external esm package from models/esm2 (add it to
requirements.txt and update the Dockerfile to copy/install that package) so that
the import in train_lora_convnet.py resolves without referencing code outside
this recipe.
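Two ways the import could resolve without reaching outside the recipe, sketched under the assumption that the esm package ships modeling_esm_te (its path under models/esm2/src/esm suggests so):

```python
# Option A: vendor modeling_esm_te.py into this recipe directory so the
# existing import resolves against a local copy.
from modeling_esm_te import NVEsmForConvTokenClassification

# Option B: declare the esm package from models/esm2 in requirements.txt and
# the Dockerfile, then import it as an installed dependency instead:
# from esm.modeling_esm_te import NVEsmForConvTokenClassification
```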
🟡 Minor comments (15)
bionemo-recipes/recipes/esm2_peft_te/scheduler.py-1-1 (1)
1-1: ⚠️ Potential issue | 🟡 Minor
Add license header and Google-style module docstring.
The file is missing the required license header and module docstring. Pre-commit hooks will likely fail.
As per coding guidelines: “Ensure license headers are present in all files…” and “Use Google-style docstrings (pydocstyle).”
.github/workflows/unit-tests-recipes.yml-158-160 (1)
158-160: ⚠️ Potential issue | 🟡 Minor
Avoid hard-coded safe.directory path to prevent CI breakage in forks.
Using a fixed /__w/bionemo-framework/bionemo-framework path can fail when the repo name changes (forks or mirrors), triggering "dubious ownership" errors. Prefer $GITHUB_WORKSPACE so the command is resilient.
🛠️ Proposed fix
```diff
- run: git -c safe.directory=/__w/bionemo-framework/bionemo-framework sparse-checkout add bionemo-recipes/recipes/esm2_native_te
+ run: git -c safe.directory="$GITHUB_WORKSPACE" sparse-checkout add bionemo-recipes/recipes/esm2_native_te
```
bionemo-recipes/recipes/esm2_peft_te/example_nv_esm2_t6_8M_UR50D_peft_checkpoint/config.json-31-42 (1)
31-42: ⚠️ Potential issue | 🟡 Minor
Add a comment explaining the "L" label in label2id.
The label2id mapping includes "L": 2, which is not a standard DSSP secondary structure code. While this label is used consistently throughout the recipe (mapped to the coil class), it's undocumented what "L" represents. Add an inline comment clarifying whether "L" represents "Loop" or another designation, and explain why it's included in the label scheme alongside standard DSSP codes.
bionemo-recipes/recipes/esm2_peft_te/example_nv_esm2_t6_8M_UR50D_peft_checkpoint/README.md-1-200 (1)
1-200: ⚠️ Potential issue | 🟡 Minor
Replace placeholder fields before shipping the model card.
The model card is entirely "[More Information Needed]" placeholders. If this accompanies a published checkpoint, please fill in at least core metadata (license, training data, intended use, evaluation) or clearly mark it as a template to avoid shipping incomplete documentation.
bionemo-recipes/recipes/esm2_peft_te/README.md-101-107 (1)
101-107: ⚠️ Potential issue | 🟡 Minor
Use descriptive link text for the esm2_native_te README link.
Line 107 uses "here", which is not descriptive and triggers MD059. Suggest updating the anchor text.
Proposed fix
```diff
-For more information see [here](../esm2_native_te/README.md).
+For more information see the [esm2_native_te README](../esm2_native_te/README.md).
```
bionemo-recipes/models/esm2/src/esm/modeling_esm_te.py-684-707 (1)
684-707: ⚠️ Potential issue | 🟡 Minor
Add Google-style Args/Returns to the new conv head docstrings.
NVConvNetHead and NVEsmForConvTokenClassification docstrings are minimal and missing Args/Returns sections, and the __init__ docstring name is off. Please update them to Google-style to satisfy pydocstyle.
Example docstring updates
```diff
 class NVConvNetHead(nn.Module):
-    """Convolution based head for token classification."""
+    """Convolution-based head for token classification.
+
+    Args:
+        config (NVEsmConfig): Model configuration.
+    """
@@
-    def forward(self, features, **kwargs):
-        """Forward pass for the convolutional token classification head."""
+    def forward(self, features, **kwargs):
+        """Forward pass for the convolutional token classification head.
+
+        Args:
+            features (torch.Tensor): Input features of shape (batch, hidden, seq_len).
+            **kwargs: Unused keyword arguments.
+
+        Returns:
+            torch.Tensor: Logits of shape (batch, seq_len, num_labels).
+        """
@@
 class NVEsmForConvTokenClassification(NVEsmPreTrainedModel):
@@
-    def __init__(self, config):
-        """Initialize NVEsmForTokenClassification."""
+    def __init__(self, config):
+        """Initialize NVEsmForConvTokenClassification.
+
+        Args:
+            config (NVEsmConfig): Model configuration.
+        """
```
As per coding guidelines: Use Google-style docstrings following pydocstyle conventions.
bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py-705-714 (1)
705-714: ⚠️ Potential issue | 🟡 Minor
Fix docstring mismatch and redundant init_weights() call.
Two issues in the constructor:
- Line 706: Docstring says "Initialize NVEsmForTokenClassification" but this is NVEsmForConvTokenClassification.
- Lines 713-714: Calling both init_weights() and post_init() is redundant; post_init() already invokes init_weights() internally (see HuggingFace PreTrainedModel.post_init()). Other classes in this file (e.g., NVEsmForTokenClassification at line 640) only call post_init().
Proposed fix
```diff
     def __init__(self, config):
-        """Initialize NVEsmForTokenClassification."""
+        """Initialize NVEsmForConvTokenClassification."""
         super().__init__(config)
         self.num_labels = config.num_labels
         self.esm = NVEsmModel(config, add_pooling_layer=False)
         self.classifier = NVConvNetHead(config)
-        self.init_weights()
         self.post_init()
```
bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py-705-714 (1)
705-714: ⚠️ Potential issue | 🟡 Minor
Fix docstring mismatch and redundant init_weights() call.
Same issues as in other esm_nv.py files:
- Docstring says "Initialize NVEsmForTokenClassification" instead of "Initialize NVEsmForConvTokenClassification".
- Redundant init_weights() call before post_init().
Proposed fix
```diff
     def __init__(self, config):
-        """Initialize NVEsmForTokenClassification."""
+        """Initialize NVEsmForConvTokenClassification."""
         super().__init__(config)
         self.num_labels = config.num_labels
         self.esm = NVEsmModel(config, add_pooling_layer=False)
         self.classifier = NVConvNetHead(config)
-        self.init_weights()
         self.post_init()
```
bionemo-recipes/recipes/esm2_peft_te/example_nv_esm2_t6_8M_UR50D_peft_checkpoint/esm_nv.py-705-714 (1)
705-714: ⚠️ Potential issue | 🟡 Minor
Fix docstring mismatch and redundant init_weights() call.
Same issues as in esm2_peft_te/example_8m_checkpoint/esm_nv.py:
- Docstring says "Initialize NVEsmForTokenClassification" instead of "Initialize NVEsmForConvTokenClassification".
- Redundant init_weights() call before post_init().
Proposed fix
```diff
     def __init__(self, config):
-        """Initialize NVEsmForTokenClassification."""
+        """Initialize NVEsmForConvTokenClassification."""
         super().__init__(config)
         self.num_labels = config.num_labels
         self.esm = NVEsmModel(config, add_pooling_layer=False)
         self.classifier = NVConvNetHead(config)
-        self.init_weights()
         self.post_init()
```
bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py-705-714 (1)
705-714: ⚠️ Potential issue | 🟡 Minor
Fix docstring mismatch and redundant init_weights() call.
Same issues as in other esm_nv.py files:
- Docstring says "Initialize NVEsmForTokenClassification" instead of "Initialize NVEsmForConvTokenClassification".
- Redundant init_weights() call before post_init().
Proposed fix
```diff
     def __init__(self, config):
-        """Initialize NVEsmForTokenClassification."""
+        """Initialize NVEsmForConvTokenClassification."""
         super().__init__(config)
         self.num_labels = config.num_labels
         self.esm = NVEsmModel(config, add_pooling_layer=False)
         self.classifier = NVConvNetHead(config)
-        self.init_weights()
         self.post_init()
```
bionemo-recipes/recipes/esm2_peft_te/infer.py-104-106 (1)
104-106: ⚠️ Potential issue | 🟡 Minor
Handle CPU-only environments or allow a device override.
The script hardcodes CUDA at line 106, which crashes on CPU-only systems. Additionally, the _batched_inference() call (lines 112-117) doesn't pass the device parameter despite the function supporting it.
🛠️ Suggested device handling
```diff
     # Load PEFT adapters on top
     peft_model = PeftModel.from_pretrained(base_model, args.peft_model_config_dir)
-    peft_model = peft_model.to("cuda").eval()
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    peft_model = peft_model.to(device).eval()
@@
-    predictions, sequences_to_sample_mapping = _batched_inference(
-        peft_model,
-        tokenizer,
-        records,
-        **args.inference,
-    )
+    inference_kwargs = dict(args.inference)
+    inference_kwargs.setdefault("device", device)
+    predictions, sequences_to_sample_mapping = _batched_inference(
+        peft_model,
+        tokenizer,
+        records,
+        **inference_kwargs,
+    )
```
bionemo-recipes/recipes/esm2_peft_te/train_lora_convnet.py-170-196 (1)
170-196: ⚠️ Potential issue | 🟡 Minor
Guard against empty validation dataloader.
If validation yields zero batches, val_steps stays 0 and the averaging will raise. Add a guard to handle empty validation splits.
💡 Suggested fix
```diff
-    avg_val_loss = val_loss_total / val_steps
-    avg_val_acc = val_correct_total / val_tokens_total if val_tokens_total > 0 else 0.0
+    if val_steps == 0:
+        avg_val_loss = 0.0
+        avg_val_acc = 0.0
+    else:
+        avg_val_loss = val_loss_total / val_steps
+        avg_val_acc = val_correct_total / val_tokens_total if val_tokens_total > 0 else 0.0
```
bionemo-recipes/recipes/esm2_peft_te/train_lora_ddp.py-171-197 (1)
171-197: ⚠️ Potential issue | 🟡 Minor
Guard against empty validation dataloader.
If the validation dataloader yields zero batches, val_steps remains 0 and this division will raise. Add a zero-batch guard.
💡 Suggested fix
```diff
-    avg_val_loss = val_loss_total / val_steps
-    avg_val_acc = val_correct_total / val_tokens_total if val_tokens_total > 0 else 0.0
+    if val_steps == 0:
+        avg_val_loss = 0.0
+        avg_val_acc = 0.0
+    else:
+        avg_val_loss = val_loss_total / val_steps
+        avg_val_acc = val_correct_total / val_tokens_total if val_tokens_total > 0 else 0.0
```
bionemo-recipes/recipes/esm2_peft_te/utils.py-106-123 (1)
106-123: ⚠️ Potential issue | 🟡 Minor
Validate CSV headers and handle empty files.
The current code fails on empty CSV files (line 115: "pdb_id" in reader.fieldnames raises TypeError when fieldnames is None), and a missing sequence column causes an unhandled KeyError on line 120. Add upfront validation to provide clear error messages.
Suggested fix
```diff
 def load_csv(path: Path) -> list[dict]:
     """Read input CSV file for inference.
@@
-    with open(path) as f:
-        reader = csv.DictReader(f)
-        has_pdb_id = "pdb_id" in reader.fieldnames
+    with open(path, newline="") as f:
+        reader = csv.DictReader(f)
+        if reader.fieldnames is None:
+            raise ValueError("CSV must include a header with a 'sequence' column.")
+        if "sequence" not in reader.fieldnames:
+            raise ValueError("CSV header must include a 'sequence' column.")
+        has_pdb_id = "pdb_id" in reader.fieldnames
```
bionemo-recipes/recipes/esm2_peft_te/tests/test_train_lora.py-16-18 (1)
16-18: ⚠️ Potential issue | 🟡 Minor
Guard test_sanity_ddp_thd with a CUDA availability check to prevent failures in CPU-only environments.
torch.cuda.get_device_capability() raises if CUDA is unavailable. Add the skip guard at the start of the function to avoid test failures in CPU-only runs.
💡 Suggested fix
```diff
 import torch
+import pytest
 from hydra import compose, initialize_config_dir
 from train_lora_ddp import main as main_ddp


 def test_sanity_ddp_thd(tmp_path, monkeypatch, recipe_path):
+    if not torch.cuda.is_available():
+        pytest.skip("CUDA is required for DDP THD sanity test")
     if torch.cuda.get_device_capability() == (12, 0):
         # TODO(BIONEMO-2840): On sm120, we need to set NVTE_FUSED_ATTN to 0 since TE will choose fused attn by default,
```
🧹 Nitpick comments (7)
bionemo-recipes/recipes/esm2_peft_te/perf_logger.py (2)
139-142: Avoid mutating the caller's outputs object.
Directly modifying outputs.logits with unsqueeze(0) mutates the object passed by the caller, which could cause unexpected side effects in the training loop if the outputs are used elsewhere after this call.
Proposed fix: use a local variable
```diff
     # Handle sequence packing for torchmetrics calculation.
+    logits_for_perplexity = outputs.logits
     if outputs.logits.dim() < 3:
-        outputs.logits = outputs.logits.unsqueeze(0)
+        logits_for_perplexity = outputs.logits.unsqueeze(0)

-    self.metrics["train/perplexity"].update(outputs.logits, batch["labels"])
+    self.metrics["train/perplexity"].update(logits_for_perplexity, batch["labels"])
```
153-166: Inconsistent rank checks for logging.
Line 153 uses is_main_process() for wandb logging, but line 165 uses local_rank == 0 for logger output. In multi-node distributed setups, these may not be equivalent (e.g., local_rank == 0 is true on every node, while is_main_process() is typically true only on global rank 0). Consider using consistent rank checks throughout.
Proposed fix
```diff
-    if self._dist_config.local_rank == 0:
+    if self._dist_config.is_main_process():
         logger.info(", ".join([f"{k.split('/')[1]}: {v:.3g}" for k, v in metrics.items()]))
```
bionemo-recipes/recipes/esm2_peft_te/tests/test_train_lora_two_gpus.py (1)
49-52: Add a Google-style docstring to the test.
Keeps docstring linting consistent even under relaxed test rules.
♻️ Suggested update
```diff
 @requires_multi_gpu
 def test_multi_gpu_train_te_ddp(tmp_path, recipe_path):
+    """Smoke-test multi-GPU DDP training for the recipe."""
     # Run 'accelerate launch train.py' as a subprocess
```
As per coding guidelines: Use Google-style docstrings following pydocstyle conventions.
bionemo-recipes/recipes/esm2_peft_te/train_lora_ddp.py (1)
111-112: Gate verbose prints to the main process.
These prints will fire on every rank and can spam logs; consider restricting them to the main process.
💡 Suggested fix
```diff
-    print("----- PEFT Model --------")
-    peft_model.print_trainable_parameters()
+    if dist_config.is_main_process():
+        print("----- PEFT Model --------")
+        peft_model.print_trainable_parameters()
@@
-    print(f"\nStep: {step}: Validation Loss = {avg_val_loss:.4f}, Accuracy: {avg_val_acc:.4f}\n")
+    if dist_config.is_main_process():
+        print(f"\nStep: {step}: Validation Loss = {avg_val_loss:.4f}, Accuracy: {avg_val_acc:.4f}\n")
```
Also applies to: 198-198
bionemo-recipes/recipes/esm2_peft_te/train_lora_convnet.py (1)
110-111: Gate verbose prints to the main process.
These prints execute on every rank; restricting to the main process avoids noisy multi-rank output.
💡 Suggested fix
```diff
-    print("----- PEFT Model --------")
-    peft_model.print_trainable_parameters()
+    if dist_config.is_main_process():
+        print("----- PEFT Model --------")
+        peft_model.print_trainable_parameters()
@@
-    print(f"\nStep: {step}: Validation Loss = {avg_val_loss:.4f}, Accuracy: {avg_val_acc:.4f}\n")
+    if dist_config.is_main_process():
+        print(f"\nStep: {step}: Validation Loss = {avg_val_loss:.4f}, Accuracy: {avg_val_acc:.4f}\n")
```
Also applies to: 197-197
bionemo-recipes/recipes/esm2_peft_te/dataset.py (1)
31-44: Use Google-style docstring for create_dataloader.
The current one-line docstring doesn't meet the Google-style requirement. Add Args:/Returns: sections.
As per coding guidelines, ensure all Python files follow Google-style docstrings (pydocstyle convention).
💡 Suggested fix
```diff
-    """Create a dataloader for the secondary structure dataset."""
+    """Create dataloaders for the secondary structure dataset.
+
+    Args:
+        distributed_config: Distributed training configuration.
+        use_sequence_packing: Whether to enable sequence packing.
+        tokenizer_name: Tokenizer identifier.
+        micro_batch_size: Training micro-batch size.
+        val_micro_batch_size: Validation micro-batch size.
+        num_workers: DataLoader worker count.
+        max_seq_length: Maximum sequence length.
+        stride: Tokenizer stride for overflow.
+        seed: RNG seed for shuffling.
+        ss3_classification: Whether to use SS3 labels (else SS8).
+        load_dataset_kwargs: Keyword arguments for datasets.load_dataset.
+
+    Returns:
+        Tuple of (train_dataloader, val_dataloader, train_dataset_or_sampler).
+    """
```
55-61: Use Google-style docstrings for public helpers.
These public functions should include Args:/Returns: sections to meet the docstring standard. Consider updating all helper docstrings similarly.
As per coding guidelines, ensure all Python files follow Google-style docstrings (pydocstyle convention).
💡 Example for one function
```diff
 def compute_accuracy(preds, labels, ignore_index=-100) -> tuple[int, int]:
-    """Calculate the accuracy."""
+    """Calculate the accuracy.
+
+    Args:
+        preds: Model logits or scores per token.
+        labels: Ground-truth label tensor.
+        ignore_index: Label value to exclude from accuracy.
+
+    Returns:
+        Tuple of (correct_count, total_count).
+    """
```
Also applies to: 64-82, 84-104, 106-135, 138-161, 163-170
Description
This PR adds a recipe to perform LoRA fine-tuning of the ESM2 model. It provides support for DDP and sequence packing. It also contains a file, infer.py, that shows how to run inference from a fine-tuned checkpoint. The PR contains the datasets that were used to train the models; eventually we can convert them into HF datasets. It does not add support for FSDP or FP8 yet.
Usage
```
cd bionemo-recipes/recipes/esm2_peft_te/
python train_lora_ddp.py
```
For more information on usage see the README.
Type of changes
CI Pipeline Configuration
Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.
Unit tests marked as @pytest.mark.multi_gpu or @pytest.mark.distributed are not run in the PR pipeline. For more details, see CONTRIBUTING.
Note
By default, only basic unit tests are run. Add the appropriate labels to enable additional test coverage.
Authorizing CI Runs
We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.
The PR contents will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g., pull-request/123).
An /ok to test comment on the pull request is required to trigger CI. This will need to be done for each new commit.
Triggering Code Rabbit AI Review
To trigger a code review from CodeRabbit, comment on the pull request with one of the supported review commands.
See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.
Pre-submit Checklist
Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores