Module to build local snpEff database #9967
Open
pmoris wants to merge 3 commits into nf-core:master from
This module builds a snpEff config file and database from an input reference FASTA, an annotation file and, optionally, CDS and protein FASTA files. It outputs the database and the config file separately. The config is created from an existing template file and a genome entry is appended for each provided genome. The template is user-provided, although the one shipped in snpEff's install directory could be used instead. The snpEff build command passes a relative path to the -dataDir flag, which snpEff resolves relative to the provided config file. This overrides any data directory defined in the config file.
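The build steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the module's actual Nextflow code. The helper names (render_config, staged_paths, build_command) and the genome id "mygenome" are hypothetical. The sketch assumes snpEff's documented conventions:
- config entries of the form "<id>.genome : <name>";
- a <dataDir>/<genome>/ layout with sequences.fa, genes.gff or genes.gtf, cds.fa and protein.fa;
- the -noCheckCds and -noCheckProtein build options for when CDS or protein files are not supplied.

```python
from pathlib import PurePosixPath

# Assumed mapping: annotation format -> (snpEff build flag, file name expected
# inside <dataDir>/<genome>/). Only GFF3 and GTF are covered in this sketch.
ANNOTATION_FORMATS = {
    "gff3": ("-gff3", "genes.gff"),
    "gtf": ("-gtf22", "genes.gtf"),
}


def render_config(template: str, genomes: list[str]) -> str:
    """Append one '<id>.genome : <id>' entry per genome to the template text."""
    lines = [template.rstrip("\n"), ""]
    for genome in genomes:
        lines.append(f"{genome}.genome : {genome}")
    return "\n".join(lines) + "\n"


def staged_paths(genome: str, annotation_format: str, data_dir: str = "data") -> dict[str, str]:
    """Hypothetical helper: where each input is placed, relative to the config file."""
    base = PurePosixPath(data_dir) / genome
    _, genes_name = ANNOTATION_FORMATS[annotation_format]
    return {
        "fasta": str(base / "sequences.fa"),
        "annotation": str(base / genes_name),
        "cds": str(base / "cds.fa"),
        "protein": str(base / "protein.fa"),
    }


def build_command(
    genome: str,
    annotation_format: str,
    config: str = "snpEff.config",
    data_dir: str = "data",
    has_cds: bool = False,
    has_protein: bool = False,
) -> list[str]:
    """Assemble a snpEff build argv; the relative data_dir overrides data.dir in the config."""
    flag, _ = ANNOTATION_FORMATS[annotation_format]
    cmd = ["snpEff", "build", flag, "-v", "-c", config, "-dataDir", data_dir]
    # Assumption: skip snpEff's CDS/protein checks when those optional files are absent.
    if not has_cds:
        cmd.append("-noCheckCds")
    if not has_protein:
        cmd.append("-noCheckProtein")
    cmd.append(genome)
    return cmd
```

For example, " ".join(build_command("mygenome", "gff3", has_cds=True, has_protein=True)) yields a single build invocation that relies only on the config entry and the staged data directory.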
Makes the snpEff annotation module compatible with the build module. The annotation module now expects a VCF, a snpEff config file and a snpEff database as its inputs. It remains to be determined whether this is a breaking change compared to the previous implementation (e.g. cache_command vs. datadir).
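How the annotation step might consume the three inputs can be sketched as follows. This is an assumption-laden illustration, not the module's code. The function name annotate_command and the genome id are hypothetical. The existence check assumes snpEff's build step writes snpEffectPredictor.bin into <dataDir>/<genome>/. The sketch passes an absolute -dataDir so the path no longer depends on where the config file is staged.

```python
from pathlib import Path


def annotate_command(vcf, genome: str, config, db_dir) -> list[str]:
    """Hypothetical helper: build a snpEff ann argv from a VCF, config and database dir."""
    # Resolve to an absolute path so snpEff does not interpret it relative to the config.
    data_dir = Path(db_dir).resolve()
    # Assumption: a built database contains <dataDir>/<genome>/snpEffectPredictor.bin.
    predictor = data_dir / genome / "snpEffectPredictor.bin"
    if not predictor.is_file():
        raise FileNotFoundError(f"no snpEff database for {genome!r} at {predictor}")
    return ["snpEff", "ann", "-c", str(config), "-dataDir", str(data_dir), genome, str(vcf)]
```

Failing early on a missing database gives a clearer error than letting snpEff attempt to download or locate a database it cannot find.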
My attempt at improving the snpEff module(s) to make it possible to generate a snpEff config file and database directory from reference fasta/annotation/cds/protein files.
Motivation
For my lab's use cases (small pathogen genomes), we have so far always built custom snpEff databases ourselves rather than using the pre-built databases recommended by snpEff. We reasoned that this makes it easier to control reference genome versions and to ensure that the same references are used for both alignment and annotation.
I can imagine there are other use cases where custom genomes not present in the database are required as well.
Note: I haven't looked into https://annotation-cache.github.io/ yet, but even with that option available, it likely still makes sense to let users create a fully custom local database if they want to.
Caveats and todos
Related info:
PR checklist
Closes #XXX
topic: versions - See version_topics
label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda