
Potential bug when using paired-end files #17

@mapo9

Hi,
I found some weird behaviour when running catt on paired-end data.

To test things, I created simulated datasets for which I know all the parameters (repertoire size, size of each clonotype, V gene, J gene, etc.).
I generated the FASTQ files as paired-end sequencing files
and ran catt with the following command:
catt --f1 test_R1.fastq --f2 test_R2.fastq -o test_out -t 20

The unintended behaviour shows up clearly in one of my samples, whose repertoire consists of a single clonotype with 10,000 clones.
catt returns three clonotypes with exactly the same NNseq, differing only in the V allele:

AAseq,NNseq,Prob,Vregion,Jregion,Dregion,Frequency
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*02,TRBJ2-1*01,TRBD1*01,6727
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*06,TRBJ2-1*01,TRBD1*01,6727
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*03,TRBJ2-1*01,TRBD1*01,6546

When the "different" clonotypes are combined into one, the frequencies sum to 20,000 clones instead of 10,000.
So it seems that catt is counting each clone twice.
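
For reference, this is roughly how I combined the rows: a minimal sketch that groups the paired-end output by NNseq and sums the Frequency column. The rows are copied from the output above; the function name `collapse_by_nnseq` is just my own helper, not part of catt.

```python
import csv
import io
from collections import defaultdict

# Paired-end output rows copied verbatim from this report.
PAIRED_END_CSV = """\
AAseq,NNseq,Prob,Vregion,Jregion,Dregion,Frequency
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*02,TRBJ2-1*01,TRBD1*01,6727
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*06,TRBJ2-1*01,TRBD1*01,6727
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*03,TRBJ2-1*01,TRBD1*01,6546
"""


def collapse_by_nnseq(csv_text):
    """Sum Frequency over all rows that share the same NNseq (hypothetical helper)."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["NNseq"]] += int(row["Frequency"])
    return dict(totals)


totals = collapse_by_nnseq(PAIRED_END_CSV)
for nnseq, count in totals.items():
    print(nnseq, count)  # a single NNseq with a total of 20000
```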

I then merged the paired-end files into a single file using PEAR and repeated the analysis.
This returned the same clonotypes as the paired-end run; only the frequencies differ:

AAseq,NNseq,Prob,Vregion,Jregion,Dregion,Frequency
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*06,TRBJ2-1*01,TRBD1*01,3379
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*02,TRBJ2-1*01,TRBD1*01,3348
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*03,TRBJ2-1*01,TRBD1*01,3273

For the merged "single-end" file, the clones sum to the expected 10,000.
What confuses me a little, though, is that the per-allele counts aren't exactly half of those in the paired-end run.
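
To make the comparison concrete, here is a small sketch that computes, per V allele, how far the paired-end count is from exactly twice the merged count. The numbers are copied from the two outputs in this report; `excess_over_double` is a hypothetical helper name of my own.

```python
# Frequencies copied from the two result tables in this report, keyed by V allele.
paired = {"TRBV20-1*02": 6727, "TRBV20-1*06": 6727, "TRBV20-1*03": 6546}
merged = {"TRBV20-1*02": 3348, "TRBV20-1*06": 3379, "TRBV20-1*03": 3273}


def excess_over_double(paired_counts, merged_counts):
    """Return paired - 2 * merged for each allele (hypothetical helper)."""
    return {
        allele: paired_counts[allele] - 2 * merged_counts[allele]
        for allele in merged_counts
    }


diff = excess_over_double(paired, merged)
for allele, delta in diff.items():
    print(allele, delta)

# Net deviation across all alleles; 0 means the total is exactly doubled.
print("net:", sum(diff.values()))
```

With these numbers, TRBV20-1*03 is exactly doubled, while TRBV20-1*02 and TRBV20-1*06 deviate by +31 and -31, so the total is exactly doubled. One possible reading, which I have not verified, is that reads are distributed between the *02 and *06 alleles slightly differently in the two runs.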

My guess is that there is an issue in how frequencies are counted for paired-end samples.

Would be awesome if you could have a look!
Thanks!
