Description
How are protein-stemmed glyphs to be used in conjunction with CDS/CDS domain glyphs?
Should CDS glyphs be required to represent the full start->stop codon open reading frame?
The polypeptide-stemmed glyph indicates a feature that manifests in the polypeptide form. Extending from this stem, the protein cleavage site glyph might represent a TEV protease site, and the protein stability element glyph might represent a solubilization domain or(?) a degradation tag². Polypeptides are encoded by CDSs (Coding DNA Sequences), which are DNA features¹. A clash therefore arises naturally when indicating a protein-stemmed feature within the CDS that encodes it. This ambiguity may be why these have been among the least-used glyphs.
Option 1a): Require CDS/domain glyphs to cover the contiguous translation unit, i.e., from start codon to stop codon, AND³ 1b) allow protein-stem glyphs to be superposed on the CDS glyph or any of its domains, as shown in example (A).
Rationale: The SO term "CDS" is defined as "A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon." Furthermore, the CDS pentagon/arrow glyph (or a rectangle) has historically been used in biology and syn bio, long before SBOL formalized it and still today, to faithfully span the translated start->stop codon reading frame. Denoting the translational unit with a single, uninterrupted glyph is evidently of great importance; domain indicators may subdivide the glyph but must not break it. Surely we cannot violate such basic definitions and norms. In fact, we sort of already decided on the sanctity of the contiguous CDS glyph when we deliberated on the 2A peptide glyph in Issue 78, where we chose dashed lines that don't interrupt the CDS pentagon shape. I actually raised the present issue and these arguments back then: comments (#1) (#2)
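To make the SO definition concrete, here is a minimal sketch of what "covers the full start->stop translation unit" means for the sequence underneath a CDS glyph. It assumes the standard genetic code with ATG as the only start codon (alternative starts such as GTG/TTG are ignored), and it adds one assumption not in the SO wording: no in-frame stop codon before the final one. The function name `is_full_orf` is hypothetical and not part of any SBOL tool.

```python
# Hypothetical helper, not part of any SBOL library.
# Assumptions: standard genetic code, ATG is the only start codon,
# and an in-frame stop before the last codon disqualifies the span.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_full_orf(seq: str) -> bool:
    """True if seq begins with a start codon, ends with a stop codon,
    is a whole number of codons, and has no earlier in-frame stop."""
    seq = seq.upper()
    # Need at least a start and a stop codon, and complete codons only.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Any stop inside the frame would end translation early.
    return not any(c in STOP_CODONS for c in codons[1:-1])


print(is_full_orf("ATGAAATAA"))  # True: start, one sense codon, stop
print(is_full_orf("GCAGCAAAC"))  # False: tag-only fragment, no start/stop
```

Under Option 1a, the span drawn as a CDS glyph would always satisfy a check like this; under Option 2, a CDS/domain glyph could also stand for a fragment (e.g., a tag-only segment) for which it returns False.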
Option 2): Allow protein-stem glyphs to substitute for domain glyphs, as in example (B), and thus allow CDS/domain glyphs to stand for protein-coding segments of DNA without implying a full open reading frame, i.e., without implying that they begin/end with start/stop codons.
Rationale: One might think that glyph superposition must be avoided and that the CDS definition and norms should be revised rather than respected. I think example (B) is misleading: the instinct to read the CDS glyph as a translational unit makes the diagram suggest that the CDS wrongly ends after the purple domain, and that the stability element is a feature following the CDS rather than part of it. Also, the cleavage site in the middle of (B) interrupts the interlocking domain shape, which is aesthetically unpleasing.
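The composition rules of Option 1a + 1b can be stated as two checks, sketched below on a hypothetical, simplified design model (the classes `Span` and `CDSGlyph` and the function `validate_option_1` are illustrative only, not SBOL data model or library names). Domains must tile the CDS from start codon to stop codon without gaps (1a), while protein-stem features are free-floating overlays that only need to lie inside the CDS that encodes them (1b). Coordinates are assumed to be 0-based, half-open nucleotide positions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical model; coordinates are 0-based, half-open [start, end) in nt.
@dataclass
class Span:
    start: int
    end: int

    def contains(self, other: Span) -> bool:
        return self.start <= other.start and other.end <= self.end


@dataclass
class CDSGlyph:
    span: Span                                    # start codon through stop codon (1a)
    domains: list[Span] = field(default_factory=list)  # subdivide, never break, the CDS
    protein_features: list[tuple[str, Span]] = field(default_factory=list)  # overlays (1b)


def validate_option_1(cds: CDSGlyph) -> list[str]:
    """Return rule violations under Option 1a + 1b (empty list if valid)."""
    problems = []
    # 1a: domains must tile the whole CDS contiguously.
    if cds.domains:
        ordered = sorted(cds.domains, key=lambda d: d.start)
        if ordered[0].start != cds.span.start or ordered[-1].end != cds.span.end:
            problems.append("domains do not start/end at the CDS boundaries")
        for left, right in zip(ordered, ordered[1:]):
            if left.end != right.start:
                problems.append(f"gap or overlap between domains at {left.end}..{right.start}")
    # 1b: protein-stem glyphs may overlay any domain, but only inside the CDS.
    for name, span in cds.protein_features:
        if not cds.span.contains(span):
            problems.append(f"{name} lies outside the CDS that encodes it")
    return problems


gene = CDSGlyph(
    span=Span(0, 900),
    domains=[Span(0, 300), Span(300, 900)],
    protein_features=[("TEV cleavage site", Span(300, 321)),
                      ("degradation tag", Span(840, 897))],
)
print(validate_option_1(gene))  # []
```

In this sketch a protein-stem glyph never replaces a domain, so the CDS outline stays intact; Option 2 would instead let such a feature occupy a slot in `domains` and drop the requirement that the domains reach the stop codon.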
¹ CDSs may be DNA features, perhaps rationalized as information-storage parts, but they only truly manifest their coding function in RNA, since mRNA is what the ribosome and tRNAs read. 🤔
² The stability-top in general is perhaps so rarely used because the +/– direction of stabilization is hugely important to understanding the function of the part and the reason it is used in a circuit. Positive-stability domains are quite rare in syn bio; I can only imagine enzymes being stabilized by, e.g., an MBP or GST tag. It's counterintuitive that a shield glyph can represent negative stabilization when degradation tags would be the predominant use of the glyph in syn bio. Furthermore, it is pretty easy to misuse/misinterpret the protein cleavage site glyph as a degradation tag, as the X top evokes degradation as well as cleavage. Not to mention, technically, the proteasomal degradation process is a series of many proteolytic cleavages. This matter is for a separate issue.
³ Options 1a and 1b need not necessarily be adopted together. But adopting 1a without 1b would mean that either protein-stem glyphs would have to be deprecated, or such glyphs could only be used in isolation, outside genes/CDSs, e.g., in part plasmids. There must be some implied SBOL rule that prevents glyph composition from invalidating the usage of a glyph. Such invalidation would happen if, say, a degradation-tag part, represented by a protein-stemmed glyph, were used to build a CDS in a gene: the glyph would become invalid when composed with the other CDS domains, which would take precedence. Hence the proposal to permit superposition of the glyphs.
