Abstract
BACKGROUND: Many homeobox genes show remarkable conservation between divergent animal phyla. In contrast, the ARGFX (Arginine-fifty homeobox) locus was identified in the human genome but is not present in mouse or invertebrates. Here we ask when and how this locus originated and examine its pattern of molecular evolution.
RESULTS: Phylogenetic and phylogenomic analyses suggest that ARGFX originated by gene duplication from Otx1, Otx2 or Crx during early mammalian evolution, most likely on the stem lineage of the eutherians. ARGFX diverged extensively from its progenitor homeobox gene, and its exons have been functional and subject to purifying selection through much of the placental mammal radiation. Surprisingly, the coding region is disrupted in most mammalian genomes analysed, with human being the only mammal identified in which the full open reading frame is retained. Indeed, we describe a transcript from human testis that has the potential to encode the full deduced protein.
CONCLUSIONS: The unusual pattern of evolution suggests that the ARGFX gene may encode a functional RNA or alternatively it may have 'flickered' between functional and non-functional states in the evolutionary history of mammals, particularly in the period when many mammalian lineages diverged within a relatively short time span.
Figure 1. Gene structure of human ARGFX. PCR primer positions and amplicons are shown relative to predicted human ARGFX gene structure [6]. Boxes indicate exons, drawn to scale; lines indicate introns, not drawn to scale. Numbers above boxes and beneath lines indicate the lengths of each exon and intron. The 5' and 3' untranslated regions are shown in white and the protein coding regions are shown in black, except for the homeodomain which is red.
Figure 2. ARGFX gene in vertebrates. The phylogenetic tree of mammals is based on [29-32]. Column 1: ARGFX is inferred to be a probable gene (✓), possible pseudogene with disrupted coding region (ψ), secondarily completely lost (×) or not included in current genome data (?). Column 2: Minimal number of retrotransposed ARGFX pseudogenes based on current genome data. Column 3: Synteny with the human ARGFX genomic region is conserved (✓) or not (×), or not sequenced (?).
Figure 3. Comparison of ARGFX, OTX1, OTX2 and CRX gene structures. Human TPRX1, DPRX, HESX1 and GSC gene structures were used as references. Exons are represented by boxes and introns by lines, with the length in nucleotides written above. The 5' and 3' untranslated regions are shown in white and the protein coding regions in black except for homeodomains which are shown in red. Human gene structures follow the NCBI gene annotation; tree shrew and megabat ARGFX intron positions were deduced by reference to retroposed pseudogenes.
Figure 4. Greater conservation of DNA sequences between human ARGFX, OTX1 and OTX2 genomic regions than with DPRX and TPRX1. Human GSC and HESX1 were also used as references, but no similarity was found. Genomic sequences from the last base pair of the upstream gene to the first base pair of the downstream gene for each locus (based on UCSC at http://genome.ucsc.edu/) were used, and compared using Shuffle-LAGAN [28] in mVISTA, which can detect sequence rearrangements. Coloured peaks (purple, coding; pink, intergenic; blue, transcribed non-coding) indicate regions of at least 30 bp and 30% similarity.
Figure 5. Phylogenetic relationship between ARGFX and other PRD class homeobox genes. Maximum likelihood phylogenetic tree constructed using the complete deduced human ARGFX protein sequence and the most similar human homeodomain proteins. Bootstrap support values over 50% are shown. Essentially the same topology was recovered by Bayesian analysis except at weakly supported nodes, notably the position of VSX1.
Figure 6. Synteny and paralogy around the Otx gene family. Map positions of amphioxus Otx and its neighbouring genes are compared to their human orthologues, which map primarily to chromosomes 1, 2, 14, 11 and 19, not chromosome 3. Amphioxus genes are shown in their physical order, and are numbered as in amphioxus (B. floridae) genome assembly v. 1.0. GeneID 20 is amphioxus Otx. GeneID 22 and 23 are most likely two parts of a gene and are treated as one locus. Human orthologues are not necessarily in order. Amphioxus genes 2 and 19 (black boxes) do not have clear human homologues; phylogenetic relationships are not well resolved for amphioxus genes 8, 17 and 24 (grey boxes). Human orthologues of amphioxus gene 13 do not map to the five main chromosomal regions.
Figure 7. Conservation of DNA sequence between mammalian ARGFX genomic sequences. Only species with a high-quality genome assembly in this region were used. Sequences were aligned by the LAGAN program [17] in mVISTA. Length of the genomic region for each species is on the right. Macaque is missing a region between exon 2 and exon 4, accounting for higher similarity between human and marmoset than between human and macaque in this region. The higher sequence similarity in and around exons is clearly visible, indicative of selective constraints since divergence of the species shown. Coloured peaks (purple, coding; pink, intergenic; light blue, UTR) indicate regions of at least 50 bp and 50% similarity. a. mVISTA plot using repeat-masked genomic sequences; b. mVISTA plot using sequence with no masking of repeats.
Abascal, ProtTest: selection of best-fit models of protein evolution. 2005.
Altschul, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997.
Booth, Annotation, nomenclature and evolution of four novel homeobox genes expressed in the human germ line. 2007.
Brudno, LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. 2003.
Brudno, Glocal alignment: finding rearrangements during alignment. 2003.
Cillo, HOX genes in human cancers.
Clapp, Evolutionary conservation of a coding function for D4Z4, the tandem DNA repeat mutated in facioscapulohumeral muscular dystrophy. 2007.
Del Bene, Cell cycle control by homeobox genes in development and disease. 2005.
Felsenstein, Confidence limits on phylogenies: an approach using the bootstrap. 1985.
Frith, Pseudo-messenger RNA: phantoms of the transcriptome. 2006.
Gal-Mark, Alternative splicing of Alu exons--two arms are better than one. 2008.
Garcia-Fernández, Archetypal organization of the amphioxus Hox gene cluster. 1994.
Guindon, PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference. 2005.
Holland, Classification and nomenclature of all human homeobox genes. 2007.
Kappen, Evolution of a regulatory gene family: HOM/HOX genes. 1993.
Kriegs, Evolutionary history of 7SL RNA-derived SINEs in Supraprimates. 2007.
Manak, A class act: conservation of homeodomain protein functions. 1994.
Murphy, Molecular phylogenetics and the origins of placental mammals. 2001.
Nishihara, Retroposon analysis and recent geological data suggest near-simultaneous divergence of the three superorders of mammals. 2009.
Nunes, Homeobox genes: a molecular link between development and cancer. 2003.
Prasad, Confirming the phylogeny of mammals by use of large comparative sequence data sets. 2008.
Putnam, The amphioxus genome and the evolution of the chordate karyotype. 2008.
Ronquist, MrBayes 3: Bayesian phylogenetic inference under mixed models. 2003.
Saitou, The neighbor-joining method: a new method for reconstructing phylogenetic trees. 1987.
Schneider, Support patterns from different outgroups provide a strong phylogenetic signal. 2009.
Takatori, Comprehensive survey and classification of homeobox genes in the genome of amphioxus, Branchiostoma floridae. 2008.
Tamura, MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. 2007.
Thompson, The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. 1997.
Yang, PAML 4: phylogenetic analysis by maximum likelihood. 2007.
Zhang, Positive Darwinian selection after gene duplication in primate ribonuclease genes. 1998.
Zheng, Integrated pseudogene annotation for human chromosome 22: evidence for transcription. 2005.