Add MMseqs Colabfold Search GPU Support #497

Open
nbtm-sh wants to merge 56 commits into nf-core:dev from nbtm-sh:add-mmseqs-colabfoldsearch-gpu

Conversation

@nbtm-sh nbtm-sh commented Feb 26, 2026

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/proteinfold branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Description

This PR aims to add support for GPU-accelerated alignment in the colabfoldsearch pipeline. This was partly awaiting the merge of mmseqs/makepaddedseqdb in nf-core. Now that the module is merged (2025-02-16), this PR will rely on it to build GPU databases if none are provided.

To-Do

The pipeline runs just fine with the new GPU-accelerated search. The only missing feature is generating the databases if the user does not provide them.

Paths to the container also need to be changed. The upstream Dockerfile will need to be rebuilt.
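
The missing database-generation step could look roughly like the sketch below. It assumes the MMseqs2 `makepaddedseqdb` command, which takes an existing sequence database and an output database name. The function name `build_padded_db`, the `DRY_RUN` switch, and the example paths are hypothetical and are not taken from this PR.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: build a GPU-compatible (padded) copy of an MMseqs2 database.
# With DRY_RUN=1 (the default here) it only prints the command it would run.
build_padded_db() {
    local src_db="$1"   # existing MMseqs2 database prefix
    local dst_db="$2"   # prefix for the padded database

    # Skip the work if a padded database is already present.
    if [ -e "${dst_db}.dbtype" ]; then
        echo "skip: ${dst_db} already exists"
        return 0
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "mmseqs makepaddedseqdb ${src_db} ${dst_db}"
    else
        mkdir -p "$(dirname "${dst_db}")"
        mmseqs makepaddedseqdb "${src_db}" "${dst_db}"
    fi
}

# Example paths are placeholders only.
build_padded_db ./colabfold_envdb/colabfold_envdb_202108_db \
                ./colabfold_envdb_padded/colabfold_envdb_202108_db
```

Checking for an existing `.dbtype` file is one simple way to avoid rebuilding a database the user already supplied. The pipeline itself would presumably make that decision in Nextflow rather than in shell.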

@keiran-rowell-unsw keiran-rowell-unsw added this to the 2.1.0 milestone Feb 26, 2026
@keiran-rowell-unsw keiran-rowell-unsw added the enhancement Improvement for existing functionality label Feb 26, 2026
Author

nbtm-sh commented Mar 11, 2026

Closes #520

@nbtm-sh nbtm-sh marked this pull request as ready for review March 11, 2026 03:19
github-actions Bot commented Mar 11, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 3fdc5ea

+| ✅ 345 tests passed       |+
#| ❔   4 tests were ignored |#
!| ❗  35 tests had warnings |!
Details

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • files_exist - File not found: conf/igenomes_ignored.config
  • nextflow_config - Config manifest.version should end in dev: 2.0.0
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • schema_description - Ungrouped param in schema: colabfold_enable_gpu_search
  • schema_description - Ungrouped param in schema: colabfold_envdb_path_padded
  • schema_description - Ungrouped param in schema: colabfold_uniref30_path_padded
  • schema_description - No description provided in schema for parameter: rosettafold2na_uniref30_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_bfd_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_pdb100_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_weights_link
  • schema_description - No description provided in schema for parameter: rfam_full_region_link
  • schema_description - No description provided in schema for parameter: rfam_cm_link
  • schema_description - No description provided in schema for parameter: rnacentral_rfam_annotations_link
  • schema_description - No description provided in schema for parameter: rnacentral_id_mapping_link
  • schema_description - No description provided in schema for parameter: rnacentral_sequences_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_uniref30_path
  • schema_description - No description provided in schema for parameter: rosettafold2na_bfd_path
  • schema_description - No description provided in schema for parameter: rosettafold2na_pdb100_path
  • schema_description - No description provided in schema for parameter: rosettafold2na_weights_path
  • local_component_structure - prepare_esmfold_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_helixfold3_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_alphafold3_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_rosettafold2na_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_alphafold2_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - post_processing.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - aria2_uncompress.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_rosettafold_all_atom_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_boltz_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_colabfold_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2026-03-12 22:58:05

Author

nbtm-sh commented Mar 11, 2026

Genuinely not a clue why this is failing. Not a single mention of makepaddedseqdb in the code that I can see. @keiran-rowell-unsw would you be so kind as to provide a 2nd set of eyes? Disregard. The moment I draw attention to it, I find the issue.

@keiran-rowell keiran-rowell left a comment

Looking good.

Can't see/fix your dev merge conflicts since it's from your private branch. Once we're post-release, if you can resolve the conflicts, tweak the language, and it executes consistently, then LGTM.

Comment thread docs/usage.md Outdated
  label 'process_high'

- container "nf-core/proteinfold_mmseqs_colabfoldsearch:2.0.0"
+ container "docker.io/nbtmsh/mmseqs_colabfoldsearch:latest"

We should definitely place this in quay.io/nf-core (or Seqera Wave, when it gets to it) once dev is open to merges for v2.1.

Author

Just leave it as is for now?

Fine for now. Ping Jose a bit before the merge so the container can go to the standard location

Comment thread docs/gpu-dbs.md Outdated
Comment thread docs/gpu-dbs.md Outdated
Comment thread docs/gpu-dbs.md Outdated
Comment thread docs/gpu-dbs.md Outdated
Comment thread docs/gpu-dbs.md
cp ./colabfold_envdb/colabfold_envdb_202108_db_aln.* ./colabfold_envdb_padded/
```

You should now have a directory structure that looks something similar to this

Maybe add a quick summary of the expected extensions, something like "you must see the db_h files, the .index, etc."

Just to quickly highlight what's not in the pre-DLed DBs.
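
One way to make such a summary checkable is a small script that lists which files are missing for a given database prefix. The suffix list below (`.index`, `.dbtype`, `_h`, `_h.index`, `_h.dbtype`) is an assumption based on the usual MMseqs2 database layout, not on the contents of docs/gpu-dbs.md. The function name `check_db_files` and the demo file names are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical check: print every expected MMseqs2 database file that is missing.
# Returns 0 when all files are present, 1 otherwise.
check_db_files() {
    local prefix="$1"
    local missing=0
    local suffix
    # The empty suffix stands for the main data file itself.
    for suffix in "" ".index" ".dbtype" "_h" "_h.index" "_h.dbtype"; do
        if [ ! -e "${prefix}${suffix}" ]; then
            echo "missing: ${prefix}${suffix}"
            missing=1
        fi
    done
    return "${missing}"
}

# Demo on empty placeholder files in a temporary directory.
demo_dir="$(mktemp -d)"
touch "${demo_dir}/db" "${demo_dir}/db.index" "${demo_dir}/db.dbtype" "${demo_dir}/db_h"
check_db_files "${demo_dir}/db" || echo "database at ${demo_dir}/db is incomplete"
```

In the demo, the header index and header dbtype files are deliberately left out, so the check reports them and falls through to the "incomplete" message.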

Author

> Just to quickly highlight what's not in the pre-DLed DBs.

Just a bit confused by this. All the DBs shown here have been downloaded from the mmseqs server. This is just copying the unpadded alignment files to the padded database, as these are also needed.
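
The copy step described here can be sketched as a small helper, shown on empty placeholder files so it runs without the real databases. The function name `copy_aln_files`, the demo directory layout, and the two placeholder file names are hypothetical; only the `_aln.*` glob mirrors the `cp` command in docs/gpu-dbs.md.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: copy the alignment files of an unpadded database
# into the directory that holds the padded (GPU) database.
copy_aln_files() {
    local src_prefix="$1"   # e.g. ./colabfold_envdb/colabfold_envdb_202108_db
    local dst_dir="$2"      # e.g. ./colabfold_envdb_padded
    mkdir -p "${dst_dir}"
    # The glob part is left unquoted on purpose so the shell expands it.
    cp "${src_prefix}"_aln.* "${dst_dir}/"
}

# Demo with empty placeholder files instead of real databases.
work="$(mktemp -d)"
mkdir -p "${work}/colabfold_envdb"
touch "${work}/colabfold_envdb/colabfold_envdb_202108_db_aln.index" \
      "${work}/colabfold_envdb/colabfold_envdb_202108_db_aln.dbtype"
copy_aln_files "${work}/colabfold_envdb/colabfold_envdb_202108_db" \
               "${work}/colabfold_envdb_padded"
ls "${work}/colabfold_envdb_padded"
```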

Comment thread tests/colabfold_local_gpu.nf.test
Comment thread docs/gpu-dbs.md Outdated
Author

nbtm-sh commented Mar 12, 2026

@keiran-rowell resolved merge conflicts. Please review my comments

@jscgh jscgh removed this from the 2.1.0 milestone Apr 23, 2026
@jscgh jscgh mentioned this pull request Apr 23, 2026
11 tasks
@jscgh jscgh added this to the 3.0.0 milestone Apr 23, 2026
Labels

enhancement (Improvement for existing functionality), Ready for review

4 participants