Add MMseqs makepaddedseqdb #10239

Merged
keiran-rowell-unsw merged 32 commits into nf-core:master from Australian-Structural-Biology-Computing:add-createpaddeddb
Feb 26, 2026

Conversation

@nbtm-sh
Contributor

@nbtm-sh nbtm-sh commented Feb 24, 2026

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

Description

This PR adds support for MMseqs makepaddedseqdb, which is used for GPU-accelerated searches. It takes an existing MMseqs database as input and outputs a padded database in a similar format.
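For context, a minimal sketch of how the two MMseqs2 steps fit together outside Nextflow. It assumes an MMseqs2 build with GPU support on `PATH` and the argument order `mmseqs createdb <fasta> <db>` / `mmseqs makepaddedseqdb <inputDB> <outputDB>`. The helper name `make_gpu_db` and the `<name>_padded` output naming are hypothetical and not what the module itself uses.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not part of the nf-core module.
# Assumes an MMseqs2 build with GPU support is available on PATH.
make_gpu_db() {
    local fasta="$1"   # input FASTA file
    local name="$2"    # base name for both databases
    mkdir -p "${name}" "${name}_padded"
    # Step 1: FASTA -> regular MMseqs2 sequence DB
    # (a set of files sharing the ${name}/${name} prefix)
    mmseqs createdb "${fasta}" "${name}/${name}"
    # Step 2: existing DB -> padded copy in a similar file layout,
    # which is what GPU-accelerated searches expect as input
    mmseqs makepaddedseqdb "${name}/${name}" "${name}_padded/${name}_padded"
}
```

Calling `make_gpu_db proteins.fasta target` would leave the padded DB under `target_padded/`. As we understand the MMseqs2 documentation, that padded DB is then used as the target of a search run with GPU support enabled (`--gpu 1`), but check the docs for your MMseqs2 release.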

@keiran-rowell-unsw keiran-rowell-unsw left a comment

Small changes to fix. I'd like another, more experienced nf-core maintainer to approve, since they may have greater knowledge of module consistency.

@keiran-rowell-unsw keiran-rowell-unsw added the new module Adding a new module label Feb 24, 2026
Contributor

@Joon-Klaps Joon-Klaps left a comment

Couple of minor issues, nice addition.

Comment on lines 13 to 14
run("MMSEQS_CREATEDB", alias: "MMSEQS_CREATEDB_TARGET") {
script "../../../mmseqs/createdb/main.nf"
Contributor

I don't think you need an alias here; you don't have any other MMSEQS_CREATEDB modules in your setup.

Contributor Author

You're right. Removed.

Comment on lines +36 to +42
"test_query_gpu:md5,5b24585ba92fd826c78b8664c63b4e95",
"test_query_gpu.dbtype:md5,01d39098f2bfee5c808a3b4ff54deac2",
"test_query_gpu.index:md5,5946b4989d08320d9daca503155ba693",
"test_query_gpu.lookup:md5,3eb85c645034a0717db62ef0a3da5479",
"test_query_gpu_h:md5,a9fca4931be476b8f302cc27b5dff9b0",
"test_query_gpu_h.dbtype:md5,740bab4f9ec8808aedb68d6b1281aeb2",
"test_query_gpu_h.index:md5,ce0ca30c2e57677077cc23823ef17206"
Contributor

Surprised these are consistent, but that's good!
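Each snapshot entry above pairs a file name with its MD5 checksum. To spot-check a local run against the snapshot, a sketch like the following prints entries in the same `name:md5,checksum` shape for every file in an output directory. It assumes GNU coreutils `md5sum` (macOS ships `md5` instead), and `snapshot_lines` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print one "name:md5,checksum" line per regular file
# in directory $1, in the same shape as the nf-test snapshot entries.
snapshot_lines() {
    local dir="$1" f sum
    for f in "${dir}"/*; do
        [ -f "${f}" ] || continue             # skip subdirectories / empty glob
        sum=$(md5sum "${f}" | cut -d' ' -f1)  # first field is the checksum
        printf '"%s:md5,%s"\n' "$(basename "${f}")" "${sum}"
    done
}
```

Diffing the output of two independent runs is also a quick way to confirm the padded DB files are byte-for-byte reproducible, which is what the consistent checksums suggest.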

mkdir -p ${padded_prefix}
mmseqs \\
makepaddedseqdb \\
${prefix}/${prefix} \\
Contributor

This will not always give the intended results.

MMseqs allows multiple databases to live in a single directory. To handle this, we have implemented this (ugly) find command so that users can specify, if necessary, which database to use here.

    # Extract files with specified args based suffix | remove suffix | isolate longest common substring of files
    DB_TARGET_PATH_NAME=\$(find -L "${db_target}/" -maxdepth 1 -name "${args2}" | sed 's/\\.[^.]*\$//' | sed -e 'N;s/^\\(.*\\).*\\n\\1.*\$/\\1\\n\\1/;D' )
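To see what that pipeline resolves to, here is the same find/sed chain as plain bash (with the Nextflow string escapes `\\` and `\$` removed), run against a scratch directory holding two fake databases. It assumes GNU find and GNU sed; BSD sed handles `\n` and `N` on the last line differently. `resolve_db_prefix`, `query` and `target` are hypothetical names. With a catch-all `*.dbtype` pattern the common prefix collapses to the bare directory, while a narrower pattern such as `target*.dbtype` isolates one database.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The reviewer's find | sed | sed pipeline without Nextflow escaping.
# Prints the longest common prefix of the matching file paths after
# stripping each file's last extension, i.e. the DB path to pass to mmseqs.
resolve_db_prefix() {
    local db_dir="$1" pattern="$2"
    find -L "${db_dir}/" -maxdepth 1 -name "${pattern}" \
        | sed 's/\.[^.]*$//' \
        | sed -e 'N;s/^\(.*\).*\n\1.*$/\1\n\1/;D'
}

# Scratch directory with two empty, fake MMseqs databases side by side.
tmp=$(mktemp -d)
trap 'rm -rf "${tmp}"' EXIT
for db in query target; do
    touch "${tmp}/${db}" "${tmp}/${db}.dbtype" "${tmp}/${db}.index" \
          "${tmp}/${db}_h" "${tmp}/${db}_h.dbtype" "${tmp}/${db}_h.index"
done

all=$(resolve_db_prefix "${tmp}" '*.dbtype')          # ambiguous: both DBs match
target=$(resolve_db_prefix "${tmp}" 'target*.dbtype') # only the target DB
echo "catch-all pattern: ${all}"
echo "narrow pattern:    ${target}"
```

The second `sed` keeps a running common prefix: `N` appends the next path, the substitution reduces both lines to their shared leading part, and `D` drops the duplicate before the next line is read.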

Comment on lines 37 to 44
mkdir -p ${padded_prefix}
touch ${padded_prefix}/${padded_prefix}
touch ${padded_prefix}/${padded_prefix}.dbtype
touch ${padded_prefix}/${padded_prefix}.index
touch ${padded_prefix}/${padded_prefix}.lookup
touch ${padded_prefix}/${padded_prefix}_h
touch ${padded_prefix}/${padded_prefix}_h.dbtype
touch ${padded_prefix}/${padded_prefix}_h.index
Contributor

Change all of these to prefix.
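One way to act on this is to drive the stub from a single `prefix` and a list of suffixes, so the expected file set is written down once. This is a plain-bash sketch: the suffix list is copied from the stub and snapshot in this PR, while `prefix` is set to an example value (`test_query_gpu`) and the scratch `workdir` is hypothetical. Inside a Nextflow `stub:` block, the shell variables would need the usual `\$` escaping.

```shell
#!/usr/bin/env bash
set -euo pipefail

prefix="test_query_gpu"   # example value standing in for the module's ${prefix}
workdir=$(mktemp -d)      # scratch directory for this sketch only

# File suffixes of a padded MMseqs DB, as listed in this PR's stub and snapshot
suffixes=("" ".dbtype" ".index" ".lookup" "_h" "_h.dbtype" "_h.index")

mkdir -p "${workdir}/${prefix}"
for s in "${suffixes[@]}"; do
    touch "${workdir}/${prefix}/${prefix}${s}"   # empty placeholder file
done
ls "${workdir}/${prefix}"
```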

nbtm-sh and others added 8 commits February 25, 2026 08:58
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
@nbtm-sh
Contributor Author

nbtm-sh commented Feb 24, 2026

@Joon-Klaps thanks for the feedback and commit suggestions. I'd also like to apologise for my messy commit history. I believe I have implemented all changes you have suggested, and have updated meta.yml and tests accordingly. All tests are passing on my end.

Do let me know if any extra changes are needed or I've missed any suggestions.

P.S. I noticed that there is a maintainer field in the meta.yml file. Would you like me to maintain this module going forward?

@nbtm-sh nbtm-sh requested a review from Joon-Klaps February 24, 2026 23:54
Contributor

@Joon-Klaps Joon-Klaps left a comment

Hi @nbtm-sh, looks great!
I have to double-check that all tests are passing correctly. Could you update your local branch so it is in sync with the master branch? Some modules you didn't touch are being run, which typically means your local fork is outdated.

You are free to add yourself as a maintainer of the module. If you don't want to, no pressure. This module will be kept up to date as it's part of the mmseqs suite.

@nbtm-sh
Contributor Author

nbtm-sh commented Feb 25, 2026

Hey @Joon-Klaps, thanks for that.

I'm happy to maintain the module going forward. I have synced this fork with the upstream and added myself to the yaml file. Let me know if anything else needs to be changed.

Cheers

@keiran-rowell-unsw keiran-rowell-unsw left a comment

LGTM. Joon's review was a great improvement.

@keiran-rowell-unsw keiran-rowell-unsw added this pull request to the merge queue Feb 26, 2026
Merged via the queue into nf-core:master with commit 40a31ec Feb 26, 2026
19 checks passed
@keiran-rowell-unsw keiran-rowell-unsw deleted the add-createpaddeddb branch February 26, 2026 03:03

Labels

new module Adding a new module

Projects

None yet

3 participants