Different allele counts for vcf and tree file. #41

@jodyhey

I did a test run on 100 kb of VCF with 98 genomes, using -polar 0.9. I scanned the tree file (actually, codex did) for the 0 and 1 counts at each SNP and compared them to the biallelic VCF, expecting either identical or complementary counts. However, about 3% of sites showed a discrepancy, for example 162 1's in the VCF and 153 in the tree file. I asked codex to look into this, focusing on position 5039, which has 4 1's in the VCF and 196 1's in the tree file.
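For reference, the per-site check described above can be sketched roughly as follows. This is a minimal illustration, not the script codex actually ran: the function name `classify_site` is hypothetical, and the haplotype total of 196 is an assumption taken from the counts in this report (192 + 4).

```python
def classify_site(vcf_ones, tree_ones, n_haplotypes):
    """Compare the number of 1 alleles at one biallelic site.

    Hypothetical helper: "identical" means the counts agree,
    "complementary" means the 0/1 labels are swapped (as polarization
    can do), and anything else is reported as a mismatch.
    """
    if tree_ones == vcf_ones:
        return "identical"
    if tree_ones == n_haplotypes - vcf_ones:
        return "complementary"
    return "mismatch"


# Position 5039 as observed: 4 ones in the VCF, 196 ones in the tree file.
print(classify_site(4, 196, 196))  # mismatch
# The counts that would have been expected after polarization.
print(classify_site(4, 192, 196))  # complementary
```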

Here is the explanation from codex:

"At chr2L:5039 (relative 233), the tree has two top-level mutations (parent = -1):
- root node 2955: derived 1
- descendant node 1943: derived 0

With both marked top-level, tskit applies them in table order; the root 1 overwrites everything, yielding all 1's in genotypes.

The 0 on node 1943 is a back-mutation and must be a child of the root 1 mutation (its parent should be the 1-mutation's ID). Then tskit would produce 192 ones and 4 zeros, matching the VCF."
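To illustrate the order dependence that the explanation describes, here is a toy model in plain Python. It is only a sketch and does not use tskit or reproduce its decoding code. The six-sample tree, the node IDs, and the `decode` helper are all made up for illustration, and it assumes (as the quoted explanation states) that mutations at a site are applied in table order to every sample below the mutation's node.

```python
# Toy tree: node -> parent (-1 marks the root). Samples are the leaves 0..5.
PARENT = {0: 6, 1: 6, 2: 7, 3: 7, 4: 8, 5: 8, 6: 9, 7: 9, 8: 10, 9: 10, 10: -1}
SAMPLES = [0, 1, 2, 3, 4, 5]


def samples_below(node):
    """Return the samples whose path to the root passes through `node`."""
    below = []
    for s in SAMPLES:
        u = s
        while u != -1:
            if u == node:
                below.append(s)
                break
            u = PARENT[u]
    return below


def decode(mutations, ancestral="0"):
    """Apply (node, derived_state) pairs in the order given (toy model)."""
    genotypes = {s: ancestral for s in SAMPLES}
    for node, derived in mutations:
        for s in samples_below(node):
            genotypes[s] = derived
    return "".join(genotypes[s] for s in SAMPLES)


# Root mutation first, back-mutation second: the back-mutation survives.
print(decode([(10, "1"), (6, "0")]))  # 001111
# Back-mutation first, root mutation second: the root 1 overwrites everything.
print(decode([(6, "0"), (10, "1")]))  # 111111
```

In this toy model the second ordering reproduces the "all 1's" symptom seen at position 5039, while the first gives the expected mix of 0's and 1's.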

So it looks like I can still work with this, but I thought it was worth mentioning.
