|
Figure 1. Genomic architecture of the Otop1 locus in vertebrates. (a) Mouse genomic region encompassing Otop1. The relative positions and transcriptional orientation of the indicated genes are drawn to scale, with the genomic structure of Otop1 provided in greater detail. The identical organization of genes within the depicted genomic region is also found in orangutan, macaque, galago, rat, guinea pig, horse, cat, dog, armadillo, opossum, platypus, chicken, and stickleback. (b) X. tropicalis genomic region encompassing Otop1 genes. Note the presence of two paralogous Otop1 genes (Otop1a and Otop1b), but otherwise the same general gene organization as in mouse. (c) Human genomic region encompassing OTOP1 showing the inversion separating OTOP1 and DRD5. The ~5-Mb inverted segment is flanked by large, highly similar SDs arranged in a palindromic fashion. The connecting blue lines indicate regions of paralogy between the proximal and distal duplicons (see also additional file 2 for additional pair-wise homology information). The underlying structure of each duplicon-containing region is depicted by colored vertical lines. Selected gene annotations from the UCSC Genome Browser RefSeq track are also shown: OTOP1 (red), genes residing in proximity to Otop1 in panels (a) and (b) (blue), genes within the RS447 megasatellite (green), and genes inverted in the human genome compared to other species' genomes (orange). (d) Fine structure of the SDs flanking OTOP1 and the RS447 megasatellite in the human genome. Each duplicon is a mosaic of smaller duplicated segments that are labeled and colored based on their ancestral cytogenetic band of origin. Sequencing gaps in the hg17 sequence assembly of the human genome are indicated with pink lines, while the gene content is annotated below each duplicon. Note the 7E OR clusters residing at each duplicon boundary (black boxes) and palindromic configuration of the proximal and distal duplicons.
|
|
Figure 2. Phylogeny of the Otop family in amphibians. Maximum-likelihood phylogenetic tree based on the multi-sequence alignment of 19 Otop proteins identified in amphibian, mouse, human, and dog. Proteins are labeled as Xtr_ for X. tropicalis, Xlv_ for X. laevis, mus_ for mouse, homo_ for human, and dog_ for dog. Three distinct clades divide the Otop family into three subfamilies: Otop1, Otop2, and Otop3 (colored red, green, and blue, respectively). Amphibian Otop3 genes appear to have undergone additional gene-duplication events, creating lineage-specific paralogs (designated a to c following the gene symbol). We have applied the same naming convention to the X. tropicalis Otop1 (Otop1a and Otop1b) and Otop2 (Otop2a, Otop2b, and Otop2c) genes, although it is less clear if the duplication events giving rise to these multiple copies occurred in the amphibian lineage or are more ancient (with the genes then getting lost in the mammalian lineage). Branch labels are bootstrap values for 1000 replicates.
|
|
Figure 3. Inversion analysis of the genomic region encompassing OTOP1. Three-color interphase FISH using probes WIBR2-1849E16 (red), WIBR2-1416B12 (blue), and WIBR2-1634L14 (green) was used to determine the orientation of the OTOP1-containing region in the human (HSA), chimpanzee (PTR: Marcus, Cochise, Douglas, Katie, and Veronica), orangutan (PPY: PPY9, PPY6, and Susie), and macaque (MMU) genomes. The inversion changes the order of the red and blue probes (mapping inside of the inversion) and their relative position with respect to the green probe (mapping outside of the inversion). FISH results show inversion of the region in human and chimpanzee with respect to orangutan and macaque.
|
|
Figure 4. Evolutionary analysis of the OTOP1-proximal locus in the human genome. (a) SDs in human (HSA), chimpanzee (PTR), orangutan (PPY), and macaque (MMU) genomes, as detected by excess of whole-genome shotgun reads (depth of coverage). Note the evidence for a ~60-kb human-specific SD containing OTOP1-like and TMEM128-like sequences (i.e., the region between the vertical red dashed lines). (b) The genomic regions in chromosomes 4 and 2 were extracted from the human assembly hg17 and aligned with Miropeats. Pertinent gene annotations and ancestral duplicon composition of each duplicon (obtained from DupMasker) are also shown. Newly identified ΨOTOP1 and ΨTMEM128 belong to a single duplication event located in the pericentromeric region of human chromosome 2. The red box represents an indel that has deleted exon 4 of ΨOTOP1.
|
|
Figure 5. Genomic architecture of the Ush1g-Otop2-Otop3 locus in vertebrates. (a) Mouse genomic region encompassing the Ush1g, Otop2, and Otop3 genes. The relative positions and transcriptional orientation of the indicated genes are drawn to scale, with the genomic structures of Ush1g, Otop2, and Otop3 provided in greater detail. Note the red, orange, and purple lines that indicate how three known splice forms of mouse Otop2 are derived from the use of four alternate non-coding exons (named 1a to 1d) and two internal splice donor sites in exons 1d and 2. The identical organization of genes within the depicted genomic region is also found in human, chimpanzee, orangutan, macaque, marmoset, galago, rat, guinea pig, horse, cow, cat, dog, armadillo, opossum, and chicken. (b) X. tropicalis genomic region encompassing the Ush1g, Otop2, and Otop3 genes. Note the presence of three paralogous genes for both Otop2 and Otop3 (for details about the phylogenetic relationships of the Otop genes in amphibian and selected vertebrates, see Figure 2). (c) Stickleback genomic regions containing the Ush1g, Otop2, and Otop3 genes showing a complex duplication and rearrangement pattern (see text for details).
|
|
Figure 6. CTCF-binding sites within the Ush1g-Otop2 locus. The coordinates of the depicted genomic region in the mouse genome (assembly NCBI/mm9) are Chr11:115168200-115193650. (a) Position of mapped sequence reads from ChIP-seq studies using an anti-CTCF antibody and the following cells: mouse E14 ES cell line [37]; human H1-hESC ES cell line (Duke/UNC/UT-Austin/EBI ENCODE group; http://genome-test.cse.ucsc.edu); resting human CD4+ T cells [38]; and five non-cancerous, karyotypically normal human cell lines (HRE, human renal epithelial cells; BJ, skin fibroblasts; HUVEC, human umbilical vein endothelial cells; SAEC, small airway epithelial cells; and NHEK, normal human epidermal keratinocytes [Duke/UNC/UT-Austin/EBI ENCODE group]). For display purposes, the coordinates of the human CTCF-binding fragments CTCF2 and CTCF3 are presented based on the coordinates in the mouse genome. Based on these data, four CTCF-binding sites (CTCF1 to CTCF4) were identified with the following species-specific occupancy: CTCF1 and CTCF4, mouse-specific; CTCF2, human-specific; and CTCF3, both mouse and human. The mouse Ush1g and Otop2 gene structures (b) and CpG-island content (c) were derived from the RefSeq and CpG island tracks of the UCSC Genome Browser (blue and green boxes, respectively). Black boxes represent non-coding MCSs identified by both ExactPlus and phastCons (d) or ExactPlus only (e), respectively. The open red boxes highlight the position of the CTCF-binding motifs. (f) Sequence Logos for CTCF1 to CTCF4 graphically represent the multi-sequence alignment at the CTCF-binding sites in placental mammals; the height of each symbol reflects the relative frequency of that nucleotide at that position. (*) Indicates that hominoid sequences were not considered for the Logo generation of CTCF1 due to lack of motif conservation (i.e., hominoid-specific deletion of base 9).
Consensus Logo motifs for low-, medium-, and high-occupancy CTCF-binding sites (LowOc, MedOc and HighOc, respectively) are also shown; these classes are based on the degree to which the CTCF-binding sites match the known CTCF-binding motif and the densities of sequence reads mapped at the binding sites [40,41].
|