CSQ masks non-coding gene annotations #2548

@MattWellie

We've identified a corner case where a clinically relevant non-coding gene (RNU2-2[P]) overlaps a non-clinically-relevant gene (WDR74). bcftools csq's logic only searches for non-coding transcript consequences if there are no coding-transcript hits.

https://github.com/samtools/bcftools/blob/develop/csq.c#L3694

We are using BCFtools in a workflow where csq annotates variant consequences but also associates variants with genes, so annotation on non-coding genes is still important. This csq behaviour went unnoticed for a while because in our hands it was annotating some non-coding genes just fine (e.g. RNU4-2). That now appears to be because RNU4-2 doesn't overlap a coding transcript, so this condition was never triggered.

We were able to overcome this issue by splitting the GFF into coding and non-coding, and doing two non-conflicting annotation loops, but we've also solved the problem in code and wondered if this was a change you might be interested in adopting.

  • Original logic: If a CDS, UTR, or splice consequence is annotated on the variant record, don't run the transcript scan (this is the only point in the code where non-coding CSQs originate)
  • New logic: Record whether a CDS, UTR, or splice consequence is annotated on the variant record, then run the transcript scan. If a coding variant was detected, skip all coding transcripts, but annotate non-coding transcripts as normal.
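
To make the difference between the two bullets concrete, here is a minimal, self-contained sketch of the control flow. It is not the code from csq.c or from our branch: `tscript_t`, `scan_transcripts`, and the `always_scan` flag are hypothetical names invented for illustration, and the model only tracks which genes the transcript scan would annotate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified transcript record; NOT the structs used in csq.c. */
typedef struct {
    const char *gene;   /* gene name, for illustration only */
    int beg, end;       /* inclusive coordinates */
    bool coding;        /* true if the transcript is protein-coding */
} tscript_t;

/*
 * Model of the transcript scan. `coding_hit` says whether a CDS, UTR, or
 * splice consequence was already annotated on this variant record.
 *   always_scan == false : original logic, skip the scan on a coding hit
 *   always_scan == true  : proposed logic, scan anyway but skip coding
 *                          transcripts when a coding hit exists
 * Writes the genes that would receive a scan annotation into `out` and
 * returns how many were written (at most `max`).
 */
size_t scan_transcripts(const tscript_t *trs, size_t n, int pos,
                        bool coding_hit, bool always_scan,
                        const char **out, size_t max)
{
    size_t nout = 0;

    /* Original behaviour: a coding hit suppresses the whole scan. */
    if (coding_hit && !always_scan) return 0;

    for (size_t i = 0; i < n && nout < max; i++) {
        /* Ignore transcripts that do not overlap the variant position. */
        if (pos < trs[i].beg || pos > trs[i].end) continue;

        /* Proposed behaviour: coding transcripts are already annotated,
         * so only non-coding ones are reported when a coding hit exists. */
        if (coding_hit && trs[i].coding) continue;

        out[nout++] = trs[i].gene;
    }
    return nout;
}
```

With a coding transcript and an overlapping non-coding transcript at the variant position, the `always_scan == false` path reports nothing once a coding hit exists, while the `always_scan == true` path still reports the non-coding gene. When there is no coding hit, both paths behave identically.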

develop...populationgenomics:bcftools:develop

In practice this leaves the coding annotation unchanged, and always checks for overlapping non-coding gene annotations, removing the conflict between the two entities.

I appreciate that non-coding annotation is not always wanted, so this change may not benefit most users. Ideally it would be controlled by a CLI switch, so users can opt in to the additional non-coding annotation.
