Abstract
Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large-scale sequencing efforts have provided an increasingly important resource for researchers. To take full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs), both to identify full-length protein-encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. Using a suffix-array-based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contigs and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full-length proteins, and these have been matched to publicly available IMAGE clones where such clones exist. Each sequence has been compared to the KOG database, and approximately 67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. The results of the analysis have been stored in a publicly available database, XenDB (http://bibiserv.techfak.uni-bielefeld.de/xendb/). A unique capability of the database is the ability to batch upload cross-species queries to identify potential Xenopus homologues and their associated full-length clones. Examples are provided, including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate results from various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches.
Figure 3. Identification of chimeric TCs: Matches of at least 100 bp in length were mapped back to the TC sequences to identify the regions that are covered by a match (yellow boxes). If two matches overlap, the covered region is extended accordingly. If two clearly separated regions remain after the mapping, as shown here, the TC is flagged as a potential chimera.
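The mapping step in this caption amounts to merging overlapping intervals and checking whether more than one covered region survives. The following is a minimal sketch of that idea, not the authors' implementation. The names `merge_covered_regions` and `is_potential_chimera`, the 0-based half-open coordinates, and the `min_gap` parameter are assumptions made for illustration; the caption does not state how large a gap counts as "clearly separated".

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on the TC, 0-based, end exclusive

MIN_MATCH_LEN = 100  # from the caption: matches of at least 100 bp


def merge_covered_regions(matches: List[Interval],
                          min_len: int = MIN_MATCH_LEN) -> List[Interval]:
    """Drop short matches, then merge overlapping matches into covered regions."""
    kept = sorted(m for m in matches if m[1] - m[0] >= min_len)
    merged: List[Interval] = []
    for start, end in kept:
        if merged and start < merged[-1][1]:
            # Overlap with the previous region: extend that region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_potential_chimera(matches: List[Interval],
                         min_len: int = MIN_MATCH_LEN,
                         min_gap: int = 1) -> bool:
    """Flag a TC if two covered regions remain separated by >= min_gap bases.

    min_gap is a hypothetical threshold; the source gives no value.
    """
    regions = merge_covered_regions(matches, min_len)
    return any(nxt[0] - cur[1] >= min_gap
               for cur, nxt in zip(regions, regions[1:]))


if __name__ == "__main__":
    example = [(0, 300), (250, 600), (900, 1400), (1000, 1050)]
    print(merge_covered_regions(example))
    print(is_potential_chimera(example))
```

In the example, the 50 bp match is discarded, the first two matches merge into one region, and the remaining gap between the two regions causes the TC to be flagged.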
Figure 1. Full-length clone selection (top) and TC categories (bottom). ESTs derived from different clones were clustered and assembled. The CAP3 contig was compared to protein databases using BLASTX and FASTY, and hits were assigned to one of four classes. Class 1 hits had to match the whole protein sequence, start with an ATG in the TC and an M in the protein, and end at a STOP codon. Class 2 hits had to match the whole protein sequence and start with an ATG in the TC and an M in the protein. Class 3 hits had to match the full protein sequence (without further restrictions). Class 4 hits had to cover the protein over almost its full length, with the match allowed to start at most ten amino acids after the start of the protein or end at most ten amino acids before its end. Predicted 5' TCs (P5P) had to have enough sequence to fill in the missing 5' end of the protein sequence. Clone selection: Clones A and B were discarded because they lack an IMAGE ID. Clone 54321 does not span the 5' end of the protein match. Clone 21345 was selected as the most 5' clone fulfilling the requirements.
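The class definitions in this caption can be read as a cascade of checks from the strictest class to the most permissive. The sketch below encodes that reading under stated assumptions and should not be taken as the pipeline's actual code. The `Hit` record and its field names are hypothetical, protein positions are assumed to be 1-based, and the P5P test interprets "enough sequence" as at least three TC nucleotides upstream of the hit per missing amino acid, which the caption does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

MAX_END_SLACK = 10  # class 4: up to ten amino acids missing at either end


@dataclass
class Hit:
    """Hypothetical summary of one BLASTX/FASTY hit of a TC against a protein."""
    protein_len: int             # length of the matched protein (aa)
    prot_start: int              # first matched residue, 1-based
    prot_end: int                # last matched residue, 1-based
    tc_hit_start: int            # TC nucleotides upstream of the hit
    tc_starts_with_atg: bool     # hit begins at an ATG in the TC
    protein_starts_with_m: bool  # hit begins at an M in the protein
    ends_at_stop: bool           # hit ends at a STOP codon in the TC


def classify_hit(hit: Hit) -> Optional[int]:
    """Return class 1-4 as described in the caption, or None if no class fits."""
    covers_whole = hit.prot_start == 1 and hit.prot_end == hit.protein_len
    start_ok = hit.tc_starts_with_atg and hit.protein_starts_with_m
    if covers_whole and start_ok and hit.ends_at_stop:
        return 1
    if covers_whole and start_ok:
        return 2
    if covers_whole:
        return 3
    near_start = hit.prot_start - 1 <= MAX_END_SLACK
    near_end = hit.protein_len - hit.prot_end <= MAX_END_SLACK
    if near_start and near_end:
        return 4
    return None


def is_predicted_5prime(hit: Hit) -> bool:
    """Assumed P5P test: the TC extends far enough upstream of the hit
    to encode the unmatched amino-terminal residues (3 nt per residue)."""
    missing_aa = hit.prot_start - 1
    return missing_aa > 0 and hit.tc_hit_start >= 3 * missing_aa


if __name__ == "__main__":
    full = Hit(300, 1, 300, 50, True, True, True)
    partial = Hit(300, 40, 300, 150, False, False, True)
    print(classify_hit(full), classify_hit(partial), is_predicted_5prime(partial))
```

Ordering the checks from class 1 downward means each hit receives the strictest class it satisfies, which matches the nesting of the definitions in the caption.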
Figure 2. Comparison of a BLASTX alignment with the corresponding full-length FASTY alignment, as generated by the Genlight system. In (a), blue boxes indicate open reading frames, green boxes indicate start codons, and red boxes indicate stop codons. The assembled TC sequence has a frameshift at position 1150 from frame 1 to frame 3, generating two distinct HSPs in the BLASTX alignment (b). FASTY corrects this frameshift and generates a full-length alignment (c).
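The situation in panel (b), where one protein is hit by consecutive HSPs in different reading frames, can be detected with a simple scan. The following is an illustrative sketch only: the `Hsp` record and the `frame_changes` function are hypothetical names, the coordinates are made-up values chosen to resemble the figure, and the code assumes all HSPs hit the same protein on the forward strand.

```python
from typing import List, NamedTuple, Tuple


class Hsp(NamedTuple):
    """Hypothetical record for one BLASTX HSP on the TC (query) sequence."""
    query_start: int  # 1-based nucleotide position in the TC
    query_end: int    # 1-based, inclusive
    frame: int        # forward reading frame: 1, 2, or 3


def frame_changes(hsps: List[Hsp]) -> List[Tuple[int, int, int]]:
    """Return (end of upstream HSP, old frame, new frame) for each pair of
    consecutive HSPs whose reading frames differ."""
    ordered = sorted(hsps, key=lambda h: h.query_start)
    changes = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.frame != nxt.frame:
            changes.append((prev.query_end, prev.frame, nxt.frame))
    return changes


if __name__ == "__main__":
    # Two HSPs flanking a suspected frameshift near position 1150.
    hsps = [Hsp(1152, 2399, 3), Hsp(10, 1149, 1)]
    print(frame_changes(hsps))
```

A frame change between adjacent HSPs is only a hint that a frameshift-tolerant aligner such as FASTY may recover a single full-length alignment; it does not by itself locate the exact erroneous base.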
Figure 4. Two examples of TCs derived from clones predicted to have a full-length insert (P5P). The start positions in the hit suggest that the unmatched amino-terminal protein sequence is not well conserved between X. laevis and the matched organisms, here rabbit (top) and human (bottom), but the open reading frames (blue boxes) indicate that the clones from which the sequences were derived do in fact contain a full-length insert. (Screenshots of the results were generated by the Genlight system.)
Figure 5. Cluster view of the XenDB Web interface. The best FASTY hits to the NR protein database, five model organisms, and Xenopus proteins are shown at the top. Gene Ontology (GO) annotations are based on the best human and mouse IPI hits, and functional categories on hits to the COG and KOG databases. Below, additional information for each EST in the cluster is shown, such as accession, UniGene and TGI ID, clone, and cell and tissue type. Clones predicted not to be full length are colored red. Links to the CAP3 assembly and the TC sequence are provided.