|
Figure 1. Comparison of nomenclatures for alternative splicing. Examples of splicing structures in the 5 human genes VEGFA (A), CLEC10A (B), TCL6 (C), AURKC (D), and AIF1 (E). In each case, a schema of the exon–intron structure is shown in which variable sites are numbered consecutively from 5′ to 3′. The splicing structure is then described with Malko's 5-component strings, Nagasaki's bit matrices and integer vectors, the nomenclature of the ASD/ATD/AEdb databases, and the AS code we propose in this work. The ASD/ATD/AEdb nomenclature ambiguously assigns the same identifier to the structures in VEGFA (A) and TCL6 (C), and likewise to those in CLEC10A (B) and AURKC (D). In CLEC10A (B), the bit matrix system assumes independence between both sides of the exon and therefore cannot identify a single AS event. In AURKC (D), the vector (1,3) is assignable from the bit matrices, but it is not considered part of the alternative donor event (9,13). The authors of the ASD/ATD/AEdb nomenclature propose the term “CIR” for complex intron retention structures. However, as in AIF1 (E), the selection of the central intron can be problematic, as the names “CIR-II-5p3p-5p-IR-3p”, “CIR-CIR-II5p3p-5p-5p”, or “CIR-II5p4p-CIR-IR-3p-3p” are all conceivable.
|
|
Figure 2. Pairwise AS events in the TCL6 gene. Schematic overview of the RefSeq transcripts of the TCL6 gene (top) and all pairwise AS events (A–N) they describe according to Definition 4. For each event, the corresponding AS code and the structure with the variable splice sites numbered from 5′ to 3′ are presented. Besides traditional events such as skipped exon (A and G), retained intron (B), mutually exclusive exons (H), alternative donor (C), and alternative acceptor site (F), novel events are observed that involve more than one of these types (D and E) or are connected to differences in the transcription start/polyadenylation site (I through N). Note that in our method L, M, and N are considered three different events that exhibit the same structure (i.e., [1]–[2],[3]–[4]).
|
|
Figure 3. Comparison of the AS landscape in human reference annotations. Distribution of AS events that are not related to alternative transcription starts/polyadenylation sites and that contain exclusively introns with canonical splice sites, in different reference annotations of the human genome: EnsEmbl, RefSeq, and Gencode. Numbers represent the event count for each structure, and the proportions of the 4 simplest splicing patterns are colored as follows: exon skipping in blue, alternate donors in green, alternate acceptors in red, and retained introns in yellow; the fraction of all more complex event types is shown together in grey, with the number of different structures observed there given in brackets. In general, the AS landscape is similar across the three datasets, with the biggest difference being a comparatively larger fraction of complex events in EnsEmbl.
|
|
Figure 4. Landscape of AS events in the 5′ UTR vs. CDS. Landscape of AS events in RefSeq with all variable splice sites included in the 5′ UTR (A) in comparison to those included in the genomic region of the CDS (B). The structurally different groups are colored as in Figure 3. Exon skipping (ES) is more frequent in the CDS, whereas intron retention (IR) is observed more often in the 5′ UTR. In the CDS, alternative acceptors are more frequent than alternative donors, whereas the landscape of events in the 5′ UTR exhibits the reverse ratio, with a bias against alternative acceptors. The more complex AS events are mainly located in the region of the CDS.
|
|
Figure 5. Bias of potential stop codons in the splice site sequences. Proportion of the coding exons that truncate the ORF when artificially extended into the intronic region at the splice donor (blue diamonds) or splice acceptor sites (red crosses). The horizontal axis shows the number of artificial codons taken from the intronic sequence (i.e., the 1st, 2nd, 3rd, etc. codon downstream of the splice donor or upstream of the splice acceptor). The vertical axis to the left gives the percentage of sites that show an in-frame stop with the theoretical inclusion of the respective codon. For the regions A, B, and C, sequence logos are shown in which dotted lines indicate the exon boundary and intrinsic potential stop codons are shaded in grey. When considering exclusively the extension of one (complete) codon into the intron, one third fewer ORFs would be truncated when extending at the acceptor site than at the donor site (A vs. B). This observation can partially be explained by in-frame stop codons intrinsic to the different splice site consensus sequences. A secondary peak of stop codons is observed ~9 extended codons upstream of the acceptor site, at a common position for the branch point (consensus sequence C). Sequence logos were produced with the tool “seqlogo” [66]. Branch point sequences were kindly provided by the Ast laboratory (http://ast.bioinfo.tau.ac.il/BranchSite.htm).
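The quantity plotted on the vertical axis of Figure 5 can be illustrated with a minimal sketch for the donor side. It is not the authors' implementation: the function names, the toy sequences, and the representation of each site as a pair (leftover nucleotides of the incomplete last exonic codon, intronic sequence) are assumptions made for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def extension_codons(leftover, intron, n):
    """Return up to n complete codons read in frame across the donor site.

    leftover: the 0-2 exonic nucleotides of the incomplete last codon
              (hypothetical representation of the reading frame).
    intron:   intronic sequence starting at the splice donor.
    """
    seq = (leftover + intron).upper()
    codons = [seq[i:i + 3] for i in range(0, 3 * n, 3)]
    return [c for c in codons if len(c) == 3]

def stop_fraction(sites, k):
    """Fraction of sites whose k-th extension codon (1-based) is a stop."""
    hits = total = 0
    for leftover, intron in sites:
        codons = extension_codons(leftover, intron, k)
        if len(codons) < k:
            continue  # sequence too short to evaluate codon k
        total += 1
        hits += codons[k - 1] in STOP_CODONS
    return hits / total if total else 0.0

# Toy donor sites (invented sequences, not data from the paper).
sites = [("", "GTAAGT"), ("A", "GTAAGTAC"), ("TA", "GTAAGT"), ("CA", "GTGAGT")]
first = stop_fraction(sites, 1)
second = stop_fraction(sites, 2)
```

In the toy data, the sites with two leftover nucleotides read TAA or TGA as their second extension codon, which illustrates how a stop codon can be intrinsic to the donor consensus in a particular frame.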
|
|
Figure 6. Landscape of AS in noncoding transcripts. The landscape of AS in CDSs of coding transcripts (A) compared to events occurring in noncoding transcripts (B), with the different classes colored as in Figure 3. Complex events and retained introns are more frequent in noncoding transcripts, whereas the fraction of ES is clearly higher in coding regions. Alternative donors are more frequent than alternative acceptors in the noncoding transcripts.
|
|
Figure 7. Comparative genomics of the AS landscape in 12 metazoa. For each of the 12 compared species, a pie diagram shows the distribution of events across 5 structurally different classes (color scheme as in Figure 3). Vertebrates, and among them especially mammals, exhibit more exon skipping and complex events and fewer retained introns than invertebrates. Estimates of evolutionary distances are given according to [67].
|
|
Figure 8. Algorithm for the extraction of pairwise AS events. The algorithm extracts from a splicing graph G(V,E) all events that are described by transcript pairs (St,Su) in a locus C. Using a priority queue W, the nodes si of the splicing graph are iterated from 5′ to 3′ according to pos(si). The queue initially contains the root and is subsequently filled with all nodes sj that are connected by outedges of si, provided these edges are supported by either St or Su.
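The queue-driven traversal described in this caption can be sketched as follows. This is a minimal illustration of the node iteration only, not of the event extraction itself. The function name, the dictionary-based graph encoding (`out_edges`, `pos`), and the toy locus are assumptions introduced here and do not come from the paper.

```python
import heapq

def iterate_supported_nodes(out_edges, pos, root, t, u):
    """Yield nodes from 5' to 3' along edges supported by transcript t or u.

    out_edges: node -> list of (target node, set of supporting transcripts)
    pos:       node -> genomic position used as queue priority
    """
    queue = [(pos[root], root)]  # priority queue W, initially holding the root
    seen = {root}
    while queue:
        _, s_i = heapq.heappop(queue)
        yield s_i
        for s_j, support in out_edges.get(s_i, []):
            # Enqueue only targets of edges supported by either transcript.
            if s_j not in seen and (t in support or u in support):
                seen.add(s_j)
                heapq.heappush(queue, (pos[s_j], s_j))

# Toy locus (hypothetical): T1 and T2 share d1 and a3; T3 uses node x instead.
out_edges = {
    "root": [("d1", {"T1", "T2"}), ("x", {"T3"})],
    "d1": [("a1", {"T1"}), ("a3", {"T2"})],
    "a1": [("d2", {"T1"})],
    "d2": [("a3", {"T1"})],
    "x": [("a3", {"T3"})],
}
pos = {"root": 0, "d1": 100, "x": 150, "a1": 200, "d2": 300, "a3": 500}

order = list(iterate_supported_nodes(out_edges, pos, "root", "T1", "T2"))
```

For the pair (T1, T2), node x is never enqueued because its incoming edge is supported only by T3, so the traversal visits root, d1, a1, d2, and a3 in positional order.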
|