Layer-wise selective fine-tuning of protein language models for antibodies.
AbTune is a user-friendly framework for sequence-specific and computationally efficient fine-tuning of protein language models (pLMs) on antibody datasets.
AbTune implements a layer-wise selective fine-tuning strategy in which only a subset of transformer layers is updated during adaptation. This substantially reduces computational cost while improving performance on antibody-related downstream tasks.
The framework currently supports:
- Sequence representation fine-tuning with ESM2
- Structure prediction with ESMFold
- Conservation and mutation-effect analysis through sequence scanning
The tool accompanies the preprint:
Xu et al. AbTune: Layer-wise Selective Fine-Tuning of Protein Language Models for Antibodies. bioRxiv, 2025.
https://www.biorxiv.org/content/10.1101/2025.10.17.682998v1
Key features:
- Layer-wise selective fine-tuning for reduced GPU memory usage
- Efficient adaptation of large protein language models
- Supports both sequence- and structure-level workflows
- Compatible with ESM2 and ESMFold backbones
- YAML-based configuration system
- Automatic heavy/light chain handling
- FASTA-based input pipelines
- Conservation and mutation-scanning utilities
| Dependency | Version |
|---|---|
| Python | >= 3.10 |
| biopython | 1.85 |
| fair-esm | 2.0.0 |
| numpy | 1.26.4 |
| omegaconf | 2.3.0 |
| pandas | 2.2.3 |
| torch | 2.1.2 |
All required dependencies are installed automatically during installation.
If you intend to use AbTune in ESMFold mode, ESMFold must be installed separately before installing AbTune.
ESMFold requires additional dependencies, including OpenFold, which are not bundled with this package.
Follow the official installation instructions:
git clone https://github.com/facebookresearch/esm
More details:
https://github.com/facebookresearch/esm
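Before switching to ESMFold mode, it can help to confirm that the extra packages are importable. The following is a minimal sketch and not part of AbTune: the helper name `missing_modules` is hypothetical, and it assumes that fair-esm and OpenFold are importable under the top-level module names `esm` and `openfold`.

```python
import importlib.util


def missing_modules(required=("esm", "openfold")):
    """Return the top-level modules from `required` that cannot be found."""
    return [name for name in required if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("ESMFold mode is not ready; missing modules:", ", ".join(missing))
    else:
        print("esm and openfold are importable.")
```

This only checks that the modules can be located; it does not verify versions or that OpenFold's compiled extensions load correctly.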
Install with pip:
pip install Ab-Tune
Or install from source:
git clone https://github.com/haddocking/Finetune-Ab
cd Finetune-Ab
pip install .
This installs the command-line entry point:
ab-tune
AbTune is configured entirely through YAML configuration files.
Run a job using:
ab-tune --config configs/ESM2.yaml
You can switch running modes, datasets, and hyperparameters without modifying the source code.
AbTune currently supports three operating modes: ESM2, ESMFold, and Conservation.
ESM2 mode fine-tunes ESM2 models directly on antibody sequences.
Only the linear projection layers inside Multi-Head Attention (MHA) modules are updated during training, while the remainder of the model remains frozen.
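The selection rule above can be illustrated as a filter over parameter names. This is a sketch under stated assumptions, not AbTune's implementation: the function `is_trainable` is hypothetical, and the naming scheme `layers.<index>.self_attn.<q_proj|k_proj|v_proj|out_proj>` is assumed for the ESM2 backbone. An empty layer list is treated as "all layers", mirroring the `inject_layers` convention in the configuration file.

```python
import re

# Assumed parameter naming, e.g. "layers.3.self_attn.q_proj.weight".
_PROJ = re.compile(
    r"^layers\.(\d+)\.self_attn\.(q_proj|k_proj|v_proj|out_proj)\.(weight|bias)$"
)


def is_trainable(param_name, inject_layers):
    """True if the parameter is an attention projection in a selected layer.

    An empty `inject_layers` selects every layer.
    """
    match = _PROJ.match(param_name)
    if match is None:
        return False  # embeddings, feed-forward blocks, norms stay frozen
    return not inject_layers or int(match.group(1)) in inject_layers


names = [
    "embed_tokens.weight",
    "layers.0.self_attn.q_proj.weight",
    "layers.0.fc1.weight",
    "layers.20.self_attn.out_proj.bias",
]
selected = [n for n in names if is_trainable(n, inject_layers=[0, 1, 2])]
print(selected)
```

With a PyTorch model, the same predicate could be applied while iterating over `model.named_parameters()` and setting `requires_grad` on each parameter. Note that the configuration file describes the update as LoRA injection, so the actual mechanism inside AbTune may differ from plain freezing.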
This mode is useful when:
- Improved antibody embeddings are required
- Structure prediction is not needed
- Minimal computational overhead is desired
| Output | Description |
|---|---|
| Embeddings | Fine-tuned antibody sequence representations |
| Training logs | Optimization and loss metrics |
| Validation metrics | Task-specific evaluation results |
Typical applications:
- Embedding extraction
- Similarity analysis
- Downstream machine learning pipelines
- Antibody property prediction
ESMFold mode extends ESM2 mode by also running ESMFold structure prediction with the fine-tuned backbone.
Heavy and light chain sequences are concatenated internally using a 25-residue polyglycine linker before being passed to the model.
In our experiments, inclusion of the linker consistently improved model performance.
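The concatenation step can be sketched as follows. The helper names `link_chains` and `split_chains` are hypothetical, and the heavy-then-light ordering is an assumption; AbTune performs this step internally and may order or validate the chains differently.

```python
GLY_LINKER = "G" * 25  # 25-residue polyglycine linker


def link_chains(heavy, light, linker=GLY_LINKER):
    """Join heavy and light chain sequences into one linked sequence."""
    heavy, light = heavy.strip().upper(), light.strip().upper()
    if not heavy or not light:
        raise ValueError("both heavy and light chain sequences are required")
    return heavy + linker + light


def split_chains(linked, heavy_len, linker_len=len(GLY_LINKER)):
    """Recover both chains from a linked sequence, given the heavy chain length."""
    return linked[:heavy_len], linked[heavy_len + linker_len:]
```

Keeping the heavy chain length alongside the linked sequence makes it straightforward to map per-residue outputs back to the original chains and to discard linker positions.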
This mode requires ESMFold to be installed separately.
| Output | Description |
|---|---|
| Predicted PDB structures | Antibody structural models |
| Per-residue confidence scores | pLDDT-style confidence estimates |
| Structure inference logs | Prediction metadata |
| Sequence embeddings | Fine-tuned latent representations |
| Training logs | Optimization and loss metrics |
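ESMFold writes per-residue pLDDT values into the B-factor column of the PDB files it produces. Assuming the structures written by AbTune follow the same convention (not confirmed here), the confidence scores could be read back with a small parser such as the one below. The function names are hypothetical.

```python
def ca_bfactors(pdb_text):
    """Return {(chain_id, residue_number): B-factor} for CA atoms in ATOM records."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            res_num = int(line[22:26])
            scores[(chain_id, res_num)] = float(line[60:66])
    return scores


def mean_score(scores):
    """Average the per-residue scores; NaN if no CA atoms were found."""
    return sum(scores.values()) / len(scores) if scores else float("nan")
```

When heavy and light chains are joined by the glycine linker, linker residues would typically be excluded before averaging, since their confidence is not informative about the antibody itself.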
- Heavy and light chains should be linked using a 25-residue glycine linker (GGGGGGGGGGGGGGGGGGGGGGGGG)
- The linker can be added automatically during preprocessing
- ESMFold installation is required
Typical applications:
- Antibody structure prediction
- Structural downstream analysis
- Docking preparation
- Conformation-sensitive modeling
Conservation mode is a conservation-aware fine-tuning mode that scans protein sequences for mutation effects and residue preferences during adaptation.
This mode estimates the probability of observing each amino acid at every sequence position across fine-tuning steps.
It is useful for studying sequence conservation, mutation tolerance, and evolutionary constraints.
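To make the idea of a per-position amino acid distribution concrete, the sketch below converts one position's logits into probabilities with a softmax and derives a simple mutation score. The scoring rule is an assumption: a log-odds ratio between mutant and wild-type residues is a commonly used heuristic for protein language models, but this README does not state how AbTune computes its scores. All names here are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def position_profile(logits):
    """Map one position's 20 logits to {amino_acid: probability}."""
    if len(logits) != len(AMINO_ACIDS):
        raise ValueError("expected one logit per standard amino acid")
    return dict(zip(AMINO_ACIDS, softmax(logits)))


def mutation_score(profile, wild_type, mutant):
    """Log-odds of mutant vs wild type; negative means the mutant is less likely."""
    return math.log(profile[mutant]) - math.log(profile[wild_type])
```

Collecting such profiles at each step listed in `score_seq_steps_list` would show how residue preferences shift as fine-tuning proceeds.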
| Output | Description |
|---|---|
| Conservation profiles | Probability distribution over amino acids at each residue position |
| Mutation effect scores | Predicted impact of sequence mutations |
| Training logs | Optimization and loss metrics |
Typical applications:
- Mutation-effect prediction
- Conservation analysis
- Functional residue identification
- Protein engineering studies
All experiments are controlled through YAML configuration files.
Example configuration:
# ==============================================
# ⚙️ ESM2 Fine-Tuning Configuration
# ==============================================
# Name of pretrained ESM model
# All models from esm.pretrained are supported
esm_model_name: esm2_t33_650M_UR50D
# Running mode
# Choose one of [ESM2, ESMFold, Conservation]
running_mode: ESM2
# Directory for saving outputs
save_path: ./outputs
# Number of fine-tuning steps
steps: 50
# Layer indices for LoRA injection
# [] means all layers
inject_layers: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
# Target module for LoRA injection
lora_target_replace_module: MultiheadAttention
# Fine-tuning steps used for scoring
score_seq_steps_list: [1,2,3,4,5]
# Target protein sequence
seq: VVKFMDVYQRSYCHPIETLVDIFQEYPDEIEYIFKPSCVPLMRCGGCCNDEGLECVPTEESNITMQIMRIKPHQGQHIGEMSFLQHNKCECRPK
# PDB identifier and chain ID
# Format: pdbid_chainid
pdbid_chainid: 1a2c_A
| Parameter | Description |
|---|---|
| esm_model_name | Name of pretrained ESM model |
| running_mode | Running mode (ESM2, ESMFold, Conservation) |
| save_path | Directory for output files |
| steps | Number of fine-tuning steps |
| inject_layers | Transformer layers used for selective fine-tuning |
| score_seq_steps_list | Steps used for sequence scoring |
| seq | Input amino acid sequence |
| pdbid_chainid | PDB identifier and chain |
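A quick sanity check on a configuration before launching a job can catch simple mistakes early. The sketch below operates on a plain dictionary, as would be obtained after loading the YAML file. The function `check_config` and the specific rules it enforces are illustrative assumptions, not AbTune's own validation logic.

```python
VALID_MODES = {"ESM2", "ESMFold", "Conservation"}


def check_config(cfg):
    """Return a list of human-readable problems found in a config mapping."""
    problems = []
    if cfg.get("running_mode") not in VALID_MODES:
        problems.append("running_mode must be one of " + ", ".join(sorted(VALID_MODES)))
    steps = cfg.get("steps")
    if not isinstance(steps, int) or steps <= 0:
        problems.append("steps must be a positive integer")
    layers = cfg.get("inject_layers", [])
    if any(not isinstance(i, int) or i < 0 for i in layers):
        problems.append("inject_layers must contain non-negative integers")
    if not cfg.get("save_path"):
        problems.append("save_path must be set")
    return problems


example = {
    "running_mode": "ESM2",
    "save_path": "./outputs",
    "steps": 50,
    "inject_layers": list(range(16)),
}
print(check_config(example) or "no problems found")
```

An empty result means none of these illustrative checks failed; it does not guarantee that the job will run.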
Input sequences should be provided in FASTA format.
Example:
>antibody_1_heavy
EVQLVESGGGLVQPGGSLRLSCAAS...
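For readers who want to inspect FASTA input outside of AbTune, a minimal dependency-free parser is sketched below. The function name `read_fasta` is hypothetical; Biopython, which is already listed as a dependency, offers `SeqIO.parse(handle, "fasta")` for the same purpose.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Sequence lines that appear before the first header are ignored in this sketch.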
For paired heavy/light chain data:
- Manual linker insertion is needed
Finetune-Ab/
├── AbTune/ Core Python package
├── configs/ Example configuration files
├── .github/workflows/ CI configuration
├── pyproject.toml Package metadata
├── README.md
└── LICENSE