Add MMseqs makepaddedseqdb #10239
keiran-rowell-unsw
left a comment
Small changes to fix. I would like another, more experienced nf-core maintainer to approve as well, in case they have greater knowledge of module consistency.
Joon-Klaps
left a comment
Couple of minor issues, nice addition.
run("MMSEQS_CREATEDB", alias: "MMSEQS_CREATEDB_TARGET") {
script "../../../mmseqs/createdb/main.nf"
Don't think you need to make an alias here, you don't have any other MMSEQS_CREATEDB modules in your setup.
You're right. Removed.
"test_query_gpu:md5,5b24585ba92fd826c78b8664c63b4e95",
"test_query_gpu.dbtype:md5,01d39098f2bfee5c808a3b4ff54deac2",
"test_query_gpu.index:md5,5946b4989d08320d9daca503155ba693",
"test_query_gpu.lookup:md5,3eb85c645034a0717db62ef0a3da5479",
"test_query_gpu_h:md5,a9fca4931be476b8f302cc27b5dff9b0",
"test_query_gpu_h.dbtype:md5,740bab4f9ec8808aedb68d6b1281aeb2",
"test_query_gpu_h.index:md5,ce0ca30c2e57677077cc23823ef17206"
Surprised these are consistent, but that's good!
mkdir -p ${padded_prefix}
mmseqs \\
makepaddedseqdb \\
${prefix}/${prefix} \\
This will not always give the wanted result.
MMseqs allows multiple databases to live in a single directory. To handle this, we have implemented this (ugly) find command so that users can specify which database to use, if necessary.
# Extract files with specified args based suffix | remove suffix | isolate longest common substring of files
DB_TARGET_PATH_NAME=\$(find -L "${db_target}/" -maxdepth 1 -name "${args2}" | sed 's/\\.[^.]*\$//' | sed -e 'N;s/^\\(.*\\).*\\n\\1.*\$/\\1\\n\\1/;D'
mkdir -p ${padded_prefix}
touch ${padded_prefix}/${padded_prefix}
touch ${padded_prefix}/${padded_prefix}.dbtype
touch ${padded_prefix}/${padded_prefix}.index
touch ${padded_prefix}/${padded_prefix}.lookup
touch ${padded_prefix}/${padded_prefix}_h
touch ${padded_prefix}/${padded_prefix}_h.dbtype
touch ${padded_prefix}/${padded_prefix}_h.index
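To see what the find pipeline from the review comment does, it can be tried outside Nextflow with the Groovy escaping removed. The sketch below is only an illustration: the helper name `resolve_db_prefix`, the temporary directory, and the database names `target` and `other` are hypothetical, and the second `sed` relies on GNU sed behaviour (`\n` in the replacement, and `N` printing the pattern space at end of input).

```shell
#!/bin/sh
set -eu

# Hypothetical helper: $1 = database directory, $2 = file-name pattern
# (the pattern plays the role of ${args2} in the review comment).
resolve_db_prefix() {
    # list matching files | strip the last extension | fold lines to their common prefix
    find -L "$1/" -maxdepth 1 -name "$2" \
        | sed 's/\.[^.]*$//' \
        | sed -e 'N;s/^\(.*\).*\n\1.*$/\1\n\1/;D'
}

# Build a throwaway directory holding two empty, made-up databases.
demo_dir="$(mktemp -d "${TMPDIR:-/tmp}/mmseqsdb_XXXXXX")"
for suffix in "" .dbtype .index _h _h.dbtype _h.index; do
    touch "${demo_dir}/target${suffix}"
done
for suffix in "" .dbtype .index; do
    touch "${demo_dir}/other${suffix}"
done

# A pattern that names one database resolves to that database's prefix.
specific="$(resolve_db_prefix "${demo_dir}" 'target*.dbtype')"
# A catch-all pattern spans both databases, so only the directory part is shared.
ambiguous="$(resolve_db_prefix "${demo_dir}" '*.dbtype')"

echo "specific : ${specific}"
echo "ambiguous: ${ambiguous}"
```

With GNU tools, the specific pattern should resolve to the `target` prefix, while the catch-all pattern collapses to the bare directory path, which is why the pattern needs to be user-configurable when several databases share a directory.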
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
@Joon-Klaps thanks for the feedback and commit suggestions. I'd also like to apologise for my messy commit history. I believe I have implemented all changes you have suggested, and have updated Do let me know if any extra changes are needed or I've missed any suggestions. P.S. I noticed that there is a maintainer field in the meta.yml file. Would you like me to maintain this module going forward?
Joon-Klaps
left a comment
Hi @nbtm-sh, looks great!
I still have to double-check that all tests are passing correctly. Could you update your local branch so it is in sync with the master branch? Some modules you didn't touch are being run, which typically means your fork is outdated.
You are free to add yourself as a maintainer of the module; if you don't want to, no pressure. This module will be kept up to date as it's part of the mmseqs suite.
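The fork sync requested above can be rehearsed on throwaway repositories. This is a minimal sketch, not the exact commands used in this PR: the local `upstream` directory stands in for the real upstream repository, and the branch name `add-module`, the remote name `upstream`, and the demo commit identity are all assumptions.

```shell
#!/bin/sh
set -eu

work="$(mktemp -d)"

# Helper for the demo only: create an empty commit with a throwaway identity.
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
}

# Stand-in for the upstream repository, with a master branch.
git init -q "${work}/upstream"
git -C "${work}/upstream" symbolic-ref HEAD refs/heads/master
commit "${work}/upstream" "initial commit"

# The fork is cloned first, then upstream moves on and the fork falls behind.
git clone -q "${work}/upstream" "${work}/fork"
commit "${work}/upstream" "unrelated upstream change"
git -C "${work}/fork" checkout -q -b add-module
commit "${work}/fork" "add module"

# The actual sync: register upstream, fetch it, merge its master into the PR branch.
git -C "${work}/fork" remote add upstream "${work}/upstream"
git -C "${work}/fork" fetch -q upstream
git -C "${work}/fork" -c user.name=demo -c user.email=demo@example.com \
    merge -q --no-edit upstream/master

# Count upstream commits still missing from the PR branch.
behind="$(git -C "${work}/fork" rev-list --count HEAD..upstream/master)"
echo "commits behind upstream/master: ${behind}"
```

Once the branch is no longer behind, CI should only pick up the modules the PR actually touches.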
Hey @Joon-Klaps, thanks for that. I'm happy to maintain the module going forward. I have synced this fork with the upstream and added myself to the yaml file. Let me know if anything else needs to be changed. Cheers
keiran-rowell-unsw
left a comment
LGTM, Joon's review was a great improvement.
PR checklist
topic: versions - See version_topics
label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda
Description
This PR adds support for MMseqs makepaddedseqdb, used for GPU-accelerated searches. It takes in an existing MMseqs database and outputs a padded database in a similar format.
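Outside the module, the same two-step flow described above might look roughly like the sketch below. The file and database names are hypothetical, and the script defaults to a dry run that only prints the commands, so it can be read and tried without MMseqs installed; set `DRY_RUN=0` to execute them for real.

```shell
#!/bin/sh
set -eu

# Dry run by default: print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical names, chosen for illustration only.
fasta="target.fasta"
db="target_db"
padded_db="target_db_padded"

# Step 1: build a regular MMseqs database from a FASTA file.
run mmseqs createdb "${fasta}" "${db}"

# Step 2: derive the padded database from the existing one.
run mmseqs makepaddedseqdb "${db}" "${padded_db}"
```

The padded database is written next to the original rather than replacing it, which mirrors how the module keeps its output under a separate `${padded_prefix}` directory.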