Map transcript coordinates to genome coordinates #203

@dariober

I have domain coordinates that are relative to transcripts, and I would like to convert them to genome coordinates. I wonder if there is some magic in gffutils that makes this reasonably easy. Hopefully an example will clarify:

Say my genome GFF file contains this transcript with three exons:

import gffutils

# Columns are space-separated here for readability; GFF requires tabs,
# so the spaces are converted before building the in-memory database.
data = """\
chr1 . gene 11 30 . . . ID=g1
chr1 . mRNA 11 30 . . . ID=t1;Parent=g1
chr1 . exon 11 14 . . . ID=e1;Parent=t1
chr1 . exon 19 22 . . . ID=e2;Parent=t1
chr1 . exon 27 30 . . . ID=e3;Parent=t1"""

db = gffutils.create_db(data.replace(' ', '\t'), ":memory:", from_string=True)

As a text ideogram, it would look like this:

          GGGGGGGG_g1_GGGGGGGG
          EEEE----EEEE----EEEE
1         11        21        31 

I have a domain that, in transcript coordinates, starts at position 7 and ends at position 10. So it would look like this:

          GGGGGGGG_g1_GGGGGGGG
          EEEE----EEEE----EEEE
                    ||----||
1         11        21        31

I would like a function that, given the transcript ID and the domain coordinates in transcript space, returns the coordinates in genome space:

transcriptToGenome(db, txid='t1', tx_start=7, tx_end=10)
# returns something like:
chr1	.	dom	21	28	.	.	.	ID=d1;Parent=t1
chr1	.	dom	21	22	.	.	.	ID=d1.1;Parent=d1
chr1	.	dom	27	28	.	.	.	ID=d1.2;Parent=d1

Before I bang my head on it, I wonder if an easy solution already exists. Thanks!
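For reference, the mapping described above can be sketched in plain Python. This is only a sketch under stated assumptions, not a gffutils feature: the names `tx_interval_to_genome` and `transcript_to_genome` are hypothetical, coordinates are taken to be 1-based and inclusive (as in GFF), exons are assumed not to overlap, and any strand other than "-" (including the "." used in the example) is treated as forward. The wrapper relies only on `db.children(..., featuretype=..., order_by=...)` and the `start`, `end`, and `strand` attributes of gffutils features.

```python
def tx_interval_to_genome(exons, tx_start, tx_end, strand="+"):
    """Map a 1-based, inclusive transcript interval to genomic blocks.

    exons: list of (start, end) genomic coordinates, 1-based inclusive,
           assumed non-overlapping.
    Returns a list of (start, end) genomic blocks sorted by start.
    """
    if tx_start < 1 or tx_end < tx_start:
        raise ValueError("expected 1 <= tx_start <= tx_end")

    # Walk exons in transcription order: ascending on +, descending on -.
    ordered = sorted(exons, reverse=(strand == "-"))

    blocks = []
    offset = 0  # transcript bases consumed by the exons visited so far
    for ex_start, ex_end in ordered:
        ex_len = ex_end - ex_start + 1
        # Part of the requested interval that falls in this exon,
        # still expressed in transcript coordinates.
        lo = max(tx_start, offset + 1)
        hi = min(tx_end, offset + ex_len)
        if lo <= hi:
            if strand == "-":
                # Transcript position offset+1 sits at ex_end on the minus strand.
                blocks.append((ex_end - (hi - offset) + 1,
                               ex_end - (lo - offset) + 1))
            else:
                # Transcript position offset+1 sits at ex_start otherwise.
                blocks.append((ex_start + (lo - offset) - 1,
                               ex_start + (hi - offset) - 1))
        offset += ex_len

    if tx_end > offset:
        raise ValueError("interval extends past the end of the transcript")
    return sorted(blocks)


def transcript_to_genome(db, txid, tx_start, tx_end):
    """Hypothetical wrapper: pull exons of `txid` from a gffutils FeatureDB."""
    exons = [(e.start, e.end)
             for e in db.children(txid, featuretype="exon", order_by="start")]
    return tx_interval_to_genome(exons, tx_start, tx_end, db[txid].strand)


# Exons of t1 from the example; the domain spans transcript positions 7..10.
print(tx_interval_to_genome([(11, 14), (19, 22), (27, 30)], 7, 10))
```

With the example database, `transcript_to_genome(db, 't1', 7, 10)` should yield the same two blocks, (21, 22) and (27, 28); the parent `dom` feature would then span the first block's start to the last block's end. Writing the blocks out as GFF lines with `ID`/`Parent` attributes is left out of the sketch.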
