
haddocking/AbTune

AbTune

Layer-wise selective fine-tuning of protein language models for antibodies.

AbTune is a user-friendly framework for sequence-specific and computationally efficient fine-tuning of protein language models (pLMs) on antibody datasets.

AbTune implements a layer-wise selective fine-tuning strategy, in which only a subset of transformer layers is updated during adaptation. This substantially reduces computational cost while improving performance on antibody-related downstream tasks.

The framework currently supports:

  • Sequence representation fine-tuning with ESM2
  • Structure prediction with ESMFold
  • Conservation and mutation-effect analysis through sequence scanning

The tool accompanies the preprint:

Xu et al. AbTune: Layer-wise Selective Fine-Tuning of Protein Language Models for Antibodies. bioRxiv, 2025.

https://www.biorxiv.org/content/10.1101/2025.10.17.682998v1


Features

  • Layer-wise selective fine-tuning for reduced GPU memory usage
  • Efficient adaptation of large protein language models
  • Supports both sequence- and structure-level workflows
  • Compatible with ESM2 and ESMFold backbones
  • YAML-based configuration system
  • Automatic heavy/light chain handling
  • FASTA-based input pipelines
  • Conservation and mutation-scanning utilities

Installation

Requirements

Dependency    Version
Python        >= 3.10
biopython     1.85
fair-esm      2.0.0
numpy         1.26.4
omegaconf     2.3.0
pandas        2.2.3
torch         2.1.2

All required dependencies are installed automatically when AbTune is installed.
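To confirm that an environment matches the table above, installed versions can be queried with the standard library. This is a minimal sketch and not part of AbTune; it assumes the distribution names are the PyPI names listed in the table, and it only flags mismatches for review, since the table does not say whether the versions are exact pins or minimums.

```python
import importlib.metadata
import sys

# Hypothetical helper, not part of AbTune. Names/versions copied from the table above.
REQUIRED = {
    "biopython": "1.85",
    "fair-esm": "2.0.0",
    "numpy": "1.26.4",
    "omegaconf": "2.3.0",
    "pandas": "2.2.3",
    "torch": "2.1.2",
}


def python_ok(minimum=(3, 10)):
    """Return True if the running interpreter satisfies the Python >= 3.10 requirement."""
    return sys.version_info[:2] >= minimum


def installed_versions(required=REQUIRED):
    """Map each distribution to (expected, installed); installed is None if missing."""
    report = {}
    for name, expected in required.items():
        try:
            installed = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    for name, (expected, installed) in installed_versions().items():
        # "CHECK" only means the versions differ; it may still be compatible.
        status = "OK" if installed == expected else "CHECK"
        print(f"{name:<10} expected {expected:<8} installed {installed}  [{status}]")
```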


Step 1 (Optional): Install ESMFold

If you intend to use AbTune in ESMFold mode, ESMFold must be installed separately before installing AbTune.

ESMFold requires additional dependencies, including OpenFold, which are not bundled with this package.

Follow the official installation instructions in the ESM repository:

https://github.com/facebookresearch/esm

To clone the repository:

git clone https://github.com/facebookresearch/esm


Step 2: Install AbTune

Install from PyPI (recommended)

pip install Ab-Tune

Install from source

git clone https://github.com/haddocking/Finetune-Ab
cd Finetune-Ab
pip install .

This installs the command-line entry point:

ab-tune

Quick Start

AbTune is configured entirely through YAML configuration files.

Run a job using:

ab-tune --config configs/ESM2.yaml

You can switch running modes, datasets, and hyperparameters by editing the configuration file, without modifying the source code.
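Because every run is driven by a configuration file, a small sweep can be scripted outside AbTune. The sketch below is an illustration, not an AbTune API: make_configs and run_all are hypothetical helpers that write one config per running mode and build the documented ab-tune --config call for each. JSON is used for serialization because JSON documents are also valid YAML, which avoids a YAML dependency. The base values follow the example configuration; input-specific keys such as seq and pdbid_chainid are left out and would need to be added.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical base settings following the example configuration.
BASE_CONFIG = {
    "esm_model_name": "esm2_t33_650M_UR50D",
    "save_path": "./outputs",
    "steps": 50,
    "inject_layers": list(range(16)),
    "lora_target_replace_module": "MultiheadAttention",
    "score_seq_steps_list": [1, 2, 3, 4, 5],
}

MODES = ("ESM2", "ESMFold", "Conservation")


def make_configs(base, out_dir, modes=MODES):
    """Write one config file per running mode and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for mode in modes:
        # Give each mode its own output directory so runs do not overwrite each other.
        cfg = dict(base, running_mode=mode, save_path=f"{base['save_path']}/{mode}")
        path = out_dir / f"{mode}.yaml"
        path.write_text(json.dumps(cfg, indent=2))  # JSON is valid YAML
        paths.append(path)
    return paths


def run_all(paths, dry_run=True):
    """Build (and, if dry_run is False, execute) one `ab-tune --config <file>` call per file."""
    commands = [["ab-tune", "--config", str(p)] for p in paths]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Pass dry_run=False to actually launch the jobs; ESMFold mode still requires ESMFold to be installed.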


Running Modes

AbTune currently supports three running modes, selected with the running_mode configuration parameter.


1. ESM2 Mode

Fine-tunes ESM2 models directly on antibody sequences.

Only the linear projection layers inside the Multi-Head Attention (MHA) modules of the selected transformer layers (set by inject_layers) are updated during training; the rest of the model remains frozen.
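The selection rule can be illustrated with parameter names alone. In fair-esm's ESM2 implementation, attention projections appear to be named like layers.<i>.self_attn.q_proj.weight (with k_proj, v_proj, and out_proj alongside); that pattern is an assumption taken from fair-esm's module layout. The hypothetical helpers below decide which parameters belong to attention projections in the selected layers, treating an empty list as "all layers" to match the inject_layers convention. This is not AbTune's own code; in a plain selective fine-tuning setup without LoRA adapters, the same predicate could drive param.requires_grad_(...) over model.named_parameters().

```python
import re

# Assumed fair-esm ESM2 naming, e.g. "layers.3.self_attn.q_proj.weight".
_ATTN_PROJ = re.compile(
    r"^layers\.(\d+)\.self_attn\.(q_proj|k_proj|v_proj|out_proj)\.(weight|bias)$"
)


def is_trainable(param_name, inject_layers):
    """Return True if the parameter is an attention projection in a selected layer.

    An empty `inject_layers` collection means every layer is selected.
    """
    match = _ATTN_PROJ.match(param_name)
    if match is None:
        return False  # embeddings, FFN, layer norms, output head, etc. stay frozen
    layer = int(match.group(1))
    return not inject_layers or layer in inject_layers


def select_trainable(param_names, inject_layers):
    """Filter parameter names down to the ones that would be updated."""
    selected = set(inject_layers)
    return [n for n in param_names if is_trainable(n, selected)]
```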

This mode is useful when:

  • Improved antibody embeddings are required
  • Structure prediction is not needed
  • Minimal computational overhead is desired

Typical Outputs

Output               Description
Embeddings           Fine-tuned antibody sequence representations
Training logs        Optimization and loss metrics
Validation metrics   Task-specific evaluation results

Example Use Cases

  • Embedding extraction
  • Similarity analysis
  • Downstream machine learning pipelines
  • Antibody property prediction
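For embedding extraction and similarity analysis, per-residue representations are commonly reduced to one vector per sequence. The sketch below mean-pools residue embeddings, skipping the beginning-of-sequence token that ESM models place before the first residue and anything after the last residue, then compares two sequences by cosine similarity. Mean pooling is an assumption here, not necessarily what AbTune writes to disk, and the inputs are plain nested lists so the example runs without PyTorch.

```python
import math


def mean_pool(token_embeddings, seq_len):
    """Average residue embeddings, skipping the leading BOS token and trailing tokens.

    token_embeddings: one vector per token, laid out as [BOS, res_1..res_L, EOS, padding...].
    Mean pooling is an assumed reduction, not a documented AbTune output.
    """
    residues = token_embeddings[1 : seq_len + 1]
    dim = len(residues[0])
    return [sum(vec[d] for vec in residues) / seq_len for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```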

2. ESMFold Mode

Extends the ESM2 mode to additionally perform ESMFold structure prediction using the fine-tuned backbone.

Heavy and light chain sequences are concatenated internally using a 25-residue polyglycine linker before being passed to the model.

In our experiments, inclusion of the linker consistently improved model performance.
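The concatenation described above can be sketched as follows. It assumes heavy-then-light order, and link_chains and split_per_residue are hypothetical helpers rather than AbTune functions. Besides joining the chains, the sketch returns each chain's residue range so that per-residue outputs, such as pLDDT-style confidence scores, can be split back by chain with the linker positions dropped.

```python
LINKER = "G" * 25  # 25-residue polyglycine linker


def link_chains(heavy, light, linker=LINKER):
    """Join heavy and light chains (assumed heavy first) and return per-chain slices."""
    linked = heavy + linker + light
    heavy_span = slice(0, len(heavy))
    light_start = len(heavy) + len(linker)
    light_span = slice(light_start, light_start + len(light))
    return linked, heavy_span, light_span


def split_per_residue(values, heavy_span, light_span):
    """Split a per-residue list into heavy and light parts, dropping linker positions."""
    return values[heavy_span], values[light_span]
```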

This mode requires ESMFold to be installed separately.

Typical Outputs

Output                          Description
Predicted PDB structures        Antibody structural models
Per-residue confidence scores   pLDDT-style confidence estimates
Structure inference logs        Prediction metadata
Sequence embeddings             Fine-tuned latent representations
Training logs                   Optimization and loss metrics

Notes

  • Heavy and light chains should be linked using a 25-residue glycine linker (GGGGGGGGGGGGGGGGGGGGGGGGG)
  • The linker can be added automatically during preprocessing
  • ESMFold installation is required

Example Use Cases

  • Antibody structure prediction
  • Structural downstream analysis
  • Docking preparation
  • Conformation-sensitive modeling

3. Conservation Mode

A conservation-aware fine-tuning mode that scans protein sequences for mutation effects and residue preferences during adaptation.

This mode estimates the probability of observing each amino acid at every sequence position across fine-tuning steps.

It is useful for studying sequence conservation, mutation tolerance, and evolutionary constraints.
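One way to picture a per-position profile: at each scored fine-tuning step, the model yields logits over amino acids at every position, a softmax turns them into probabilities, and averaging across steps gives one distribution per position. The sketch below assumes exactly that averaging scheme, a 20-letter alphabet, and that the scored steps correspond to score_seq_steps_list; AbTune's own aggregation may differ. Shannon entropy is added as a common (assumed) summary, where lower entropy marks a more conserved position.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet order


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conservation_profile(logits_by_step):
    """Average per-position amino-acid probabilities over fine-tuning steps.

    logits_by_step: {step: [[one logit per amino acid] for each position]}
    Averaging across steps is an assumption about how steps are combined.
    """
    steps = list(logits_by_step.values())
    n_positions = len(steps[0])
    profile = []
    for pos in range(n_positions):
        probs = [softmax(step[pos]) for step in steps]
        profile.append([sum(p[a] for p in probs) / len(probs) for a in range(len(AMINO_ACIDS))])
    return profile


def entropy(dist):
    """Shannon entropy in nats; lower values indicate a more conserved position."""
    return -sum(p * math.log(p) for p in dist if p > 0)
```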

Typical Outputs

Output                   Description
Conservation profiles    Probability distribution over amino acids at each residue position
Mutation effect scores   Predicted impact of sequence mutations
Training logs            Optimization and loss metrics
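Mutation-effect scores from masked language models are often computed with the wild-type marginal heuristic: the log-probability of the mutant residue minus that of the wild-type residue, read from the model's distribution at the mutated position. The sketch below applies that heuristic to a per-position probability profile. Whether AbTune scores mutations exactly this way is an assumption, and the "A12G" notation (wild type, 1-based position, mutant) is a convention chosen for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the profile


def mutation_score(profile, sequence, mutation):
    """Wild-type marginal score: log p(mutant) - log p(wild type) at one position.

    profile: per-position probability lists ordered like AMINO_ACIDS.
    mutation: string such as "A12G" (hypothetical notation: wt, 1-based position, mutant).
    Negative scores suggest the substitution is disfavoured by the model.
    """
    wt, pos, mt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if sequence[pos - 1] != wt:
        raise ValueError(f"{mutation}: sequence has {sequence[pos - 1]} at position {pos}")
    dist = profile[pos - 1]
    return math.log(dist[AMINO_ACIDS.index(mt)]) - math.log(dist[AMINO_ACIDS.index(wt)])
```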

Example Use Cases

  • Mutation-effect prediction
  • Conservation analysis
  • Functional residue identification
  • Protein engineering studies

Configuration Files

All experiments are controlled through YAML configuration files.

Example configuration:

# ==============================================
#   ⚙️  ESM2 Fine-Tuning Configuration
# ==============================================

# Name of pretrained ESM model
# All models from esm.pretrained are supported
esm_model_name: esm2_t33_650M_UR50D

# Running mode
# Choose one of [ESM2, ESMFold, Conservation]
running_mode: ESM2

# Directory for saving outputs
save_path: ./outputs

# Number of fine-tuning steps
steps: 50

# Layer indices for LoRA injection
# [] means all layers
inject_layers: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

# Target module for LoRA injection
lora_target_replace_module: MultiheadAttention

# Fine-tuning steps used for scoring
score_seq_steps_list: [1,2,3,4,5]

# Target protein sequence
seq: VVKFMDVYQRSYCHPIETLVDIFQEYPDEIEYIFKPSCVPLMRCGGCCNDEGLECVPTEESNITMQIMRIKPHQGQHIGEMSFLQHNKCECRPK

# PDB identifier and chain ID
# Format: pdbid_chainid
pdbid_chainid: 1a2c_A
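The configuration injects LoRA adapters into MultiheadAttention modules of the layers listed in inject_layers. To show what such an adapter does, the sketch below implements the standard LoRA forward pass on small pure-Python matrices: the frozen weight W is left untouched and a trainable low-rank update (alpha / r) * B A is added to its output, with B initialized to zeros so training starts from the pretrained model. The rank, alpha, and initialization are illustrative assumptions; the example configuration does not expose them.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update (alpha / r) * B @ A.

    Illustrative rank/alpha values; AbTune's actual settings are not shown in the config.
    """

    def __init__(self, weight, rank=4, alpha=8, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weights, shape (out, in)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]  # (r, in)
        self.B = [[0.0] * rank for _ in range(out_dim)]  # (out, r), zero init
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameter_count(self):
        """Only A and B are trained: r * (in + out) values instead of in * out."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

Because B starts at zero, the adapted layer initially reproduces the pretrained output exactly, and only r * (in + out) values per projection are trained.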

Important Parameters

Parameter                    Description
esm_model_name               Name of pretrained ESM model
running_mode                 Running mode (ESM2, ESMFold, Conservation)
save_path                    Directory for output files
steps                        Number of fine-tuning steps
inject_layers                Transformer layers used for selective fine-tuning ([] means all layers)
lora_target_replace_module   Module type targeted for LoRA injection
score_seq_steps_list         Fine-tuning steps used for sequence scoring
seq                          Input amino acid sequence
pdbid_chainid                PDB identifier and chain ID, formatted as pdbid_chainid
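Before launching a long run, it can help to sanity-check a configuration. The sketch below validates a config already loaded into a plain dictionary. It checks the documented running modes, the pdbid_chainid format, and that inject_layers fit the model depth, which it reads from the t<N> part of ESM model names (for example t33 for 33 layers, following fair-esm's naming). The rules are assumptions drawn from the parameter descriptions, not AbTune's own validation, and validate_config is a hypothetical helper.

```python
import re

RUNNING_MODES = {"ESM2", "ESMFold", "Conservation"}
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
PDB_CHAIN = re.compile(r"^[0-9][A-Za-z0-9]{3}_[A-Za-z0-9]+$")  # e.g. "1a2c_A"


def model_depth(model_name):
    """Read the layer count from names like 'esm2_t33_650M_UR50D'; None if absent."""
    match = re.search(r"_t(\d+)_", model_name)
    return int(match.group(1)) if match else None


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if cfg.get("running_mode") not in RUNNING_MODES:
        problems.append(f"running_mode must be one of {sorted(RUNNING_MODES)}")

    steps = cfg.get("steps")
    if not isinstance(steps, int) or steps < 1:
        problems.append("steps must be a positive integer")
    else:
        # Assumption: scoring steps cannot exceed the number of fine-tuning steps.
        late = [s for s in cfg.get("score_seq_steps_list", []) if s > steps]
        if late:
            problems.append(f"score_seq_steps_list contains steps beyond {steps}: {late}")

    depth = model_depth(cfg.get("esm_model_name", ""))
    layers = cfg.get("inject_layers", [])
    if depth is not None and any(not 0 <= i < depth for i in layers):
        problems.append(f"inject_layers must lie in 0..{depth - 1}")

    seq = cfg.get("seq")
    if seq is not None:
        unknown = sorted(set(seq.upper()) - AMINO_ACIDS)
        if unknown:
            problems.append(f"seq contains non-standard residues: {unknown}")

    pdb = cfg.get("pdbid_chainid")
    if pdb is not None and not PDB_CHAIN.match(pdb):
        problems.append("pdbid_chainid should look like '1a2c_A'")
    return problems
```

A config loaded with OmegaConf can be converted to a plain dictionary first, for example with OmegaConf.to_container(cfg).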

Input Format

Input sequences should be provided in FASTA format.

Example:

>antibody_1_heavy
EVQLVESGGGLVQPGGSLRLSCAAS...

For paired heavy/light chain data, the 25-residue glycine linker must be inserted manually between the two chains.
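The manual linker step can be scripted. This sketch parses FASTA text with the standard library, pairs records whose identifiers end in _heavy and _light (extending the >antibody_1_heavy naming in the example above; the _light suffix is an assumption), and emits one linked sequence per antibody with the heavy chain first. All helper names are hypothetical; Biopython's SeqIO.parse could replace read_fasta for the parsing step.

```python
LINKER = "G" * 25  # 25-residue glycine linker


def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}, joining wrapped sequence lines."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            # Keep only the identifier (text before the first space).
            header, chunks = line[1:].strip().split(" ")[0], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def pair_and_link(records, linker=LINKER):
    """Combine <id>_heavy and <id>_light records (assumed naming) into {<id>: heavy + linker + light}."""
    linked = {}
    for name, heavy in records.items():
        if name.endswith("_heavy"):
            base = name[: -len("_heavy")]
            light = records.get(base + "_light")
            if light is None:
                raise ValueError(f"no light chain found for {name}")
            linked[base] = heavy + linker + light
    return linked


def to_fasta(records):
    """Format {identifier: sequence} back into FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())
```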

Project Structure

Finetune-Ab/
├── AbTune/                 Core Python package
├── configs/                Example configuration files
├── .github/workflows/      CI configuration
├── pyproject.toml          Package metadata
├── README.md
└── LICENSE
