PLoS Genet
2010 May 20;6(5):e1000954. doi: 10.1371/journal.pgen.1000954.
A survey of genomic traces reveals a common sequencing error, RNA editing, and DNA editing.
Zaranek AW, Levanon EY, Zecharia T, Clegg T, Church GM.
Abstract
While it is widely held that an organism's genomic information should remain constant, several protein families are known to modify it. Members of the AID/APOBEC protein family can deaminate DNA. Similarly, members of the ADAR family can deaminate RNA. Characterizing the scope of these events is challenging. Here we use large genomic data sets, such as the two billion sequences in the NCBI Trace Archive, to look for clusters of mismatches of the same type, which are a hallmark of editing events caused by APOBEC3 and ADAR. We align 603,249,815 traces from the NCBI trace archive to their reference genomes. In clusters of mismatches of increasing size, at least one systematic sequencing error dominates the results (G-to-A). It is still present in mismatches with 99% accuracy and only vanishes in mismatches at 99.99% accuracy or higher. The error appears to have entered into about 1% of the HapMap, possibly affecting other users that rely on this resource. Further investigation, using stringent quality thresholds, uncovers thousands of mismatch clusters with no apparent defects in their chromatograms. These traces provide the first reported candidates of endogenous DNA editing in human, further elucidating RNA editing in human and mouse and also revealing, for the first time, extensive RNA editing in Xenopus tropicalis. We show that the NCBI Trace Archive provides a valuable resource for the investigation of the phenomena of DNA and RNA editing, as well as setting the stage for a comprehensive mapping of editing events in large-scale genomic datasets.
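The search for clusters of same-type mismatches described above can be illustrated with a minimal sketch. It assumes a gapless pairwise alignment between a trace and its reference. The function names and the `min_size` threshold are hypothetical and are not taken from the paper's pipeline.

```python
from collections import Counter


def mismatch_types(reference: str, read: str) -> Counter:
    """Count mismatches by type (e.g. 'G>A') in a gapless alignment."""
    if len(reference) != len(read):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        # Ignore matches and ambiguous bases such as N.
        if ref_base != read_base and ref_base in "ACGT" and read_base in "ACGT":
            counts[f"{ref_base}>{read_base}"] += 1
    return counts


def same_type_cluster(reference: str, read: str, min_size: int = 3):
    """Return (type, count) for the most frequent mismatch type if it
    occurs at least min_size times (an assumed threshold), else None."""
    counts = mismatch_types(reference, read)
    if not counts:
        return None
    kind, n = counts.most_common(1)[0]
    return (kind, n) if n >= min_size else None


# Toy example with three G-to-A mismatches.
print(same_type_cluster("GGTGACGGAT", "AGTAACAGAT"))  # ('G>A', 3)
```

A real analysis would also have to handle gapped alignments and reverse-strand traces, where a G-to-A event appears as C-to-T.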
Figure 1. Evidence for editing events emerges by enrichment for clusters of mismatches. (A) Human traces are mined for clusters of mismatches of the same type. Shown is the percent frequency of clusters by type. The G-to-A mismatch type becomes more dominant with increasing numbers of mismatches (as does T-to-G). (B) Runs of five (or more) mismatches by type and sequencing center with an identical 3 bp motif centered on each mismatch. Data from eight sequencing centers are shown. All of these centers had at least 1000 examples that meet the above criteria. (C) Clusters with three (or more) mismatches with at least two very high quality mismatches (Phred 40). A mismatch spectrum consistent with editing can be observed.
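The accuracy levels in the abstract and the Phred threshold in panel C are related by the standard Phred definition, Q = -10 log10(P), where P is the base-call error probability: Phred 20 corresponds to 99% accuracy and Phred 40 to 99.99%. The sketch below shows the conversion together with a filter in the spirit of panel C. The helper names and default parameters are hypothetical.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred score implied by a base-call accuracy: Q = -10 log10(1 - accuracy)."""
    return -10 * math.log10(1 - accuracy)


def has_high_quality_support(mismatch_quals, threshold=40, min_count=2):
    """True if at least min_count mismatches reach the Phred threshold.
    The defaults mirror the caption's 'at least two ... (Phred 40)'."""
    return sum(q >= threshold for q in mismatch_quals) >= min_count
```

For example, `has_high_quality_support([35, 40, 49])` is true because two of the three mismatch qualities reach Phred 40.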
Figure 2. G-to-A sequencing artifact. (A) A chromatogram from a trace matching the criteria in Figure 1B. An AAA motif is centered at position 244 and corresponds with position 90 in the control; another AAA motif occurs at position 253, which corresponds to position 99 in the control. It can be seen that each peak in this chromatogram is preceded by a smaller, identical sub-peak. This has the effect of making it likely that a normally small peak (see control) will be overwhelmed by the sub-peak of the adjacent, normally tall peak (see control). (B) A chromatogram from a control trace that matches the reference; position 90 is the center of an AGA motif.
Figure 3. DNA editing in human HERVL-A1. Trace 1735626615 aligns uniquely to chromosome 2, where the known retrotransposon HERVL-A1 is located (chr2: 100697697-100700125). A cluster of 15 G-to-A mismatches (worst mismatch Phred 35; best mismatch Phred 49) suggests that the trace originates from an edited version of the element. Support for the APOBEC source of the editing comes from the preferred GG-to-AG motif (11 out of the 15 cases) and GA-to-AA (remaining 4 cases), which is the dinucleotide context (in the same order) in an HIV hypermutated genome, and is the sequence motif of APOBEC3G and APOBEC3F [31].
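The dinucleotide tally in this caption (GG-to-AG versus GA-to-AA) can be reproduced in outline as follows. This is a sketch under stated assumptions: the alignment is gapless, the trace is on the same strand as the reference, and the context is taken from the reference base immediately downstream of each edited G. The function name is hypothetical.

```python
from collections import Counter


def g_to_a_contexts(reference: str, read: str) -> Counter:
    """Tally the dinucleotide context of each G-to-A mismatch,
    labelled in the caption's style (e.g. 'GG-to-AG')."""
    if len(reference) != len(read):
        raise ValueError("aligned sequences must have equal length")
    ref, rd = reference.upper(), read.upper()
    contexts = Counter()
    # Stop one base early so every site has a downstream neighbour.
    for i in range(len(ref) - 1):
        if ref[i] == "G" and rd[i] == "A":
            downstream = ref[i + 1]  # assumption: context from the reference
            contexts[f"G{downstream}-to-A{downstream}"] += 1
    return contexts


# Toy example: one GG-to-AG site and one GA-to-AA site.
print(g_to_a_contexts("TGGACGATGC", "TAGACAATGC"))
```

Taking the downstream base from the reference avoids ambiguity when two adjacent G positions are both edited.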
Figure 4. DNA editing in human AluY. Example of possible DNA editing in human chr21:40977741-40978045. Alignment of trace 1745107496 to the human reference genome led to a large number of G-to-A mismatches, which indicate possible DNA editing in this retrotransposon. All the mismatches are located in high quality sequence positions, reducing the possibility of sequencing errors.
Figure 5. Evidence for RNA editing in the cDNA traces. (A) No over-representation of clusters of the RNA-derived mismatches (A-to-G and its complementary T-to-C) is observed in the full set of RNA traces in human (n = 238,370) and Xenopus tropicalis (n = 444,526), whereas (B) significant over-representation of the RNA editing type is observed in the high quality cDNA sequencing sets of human (n = 769; p-value 1.5e-119; Fisher's Exact Test) and Xenopus (n = 2,847; p-value ≪ e-200). (C) No such over-representation was observed in the set of high quality DNA traces (human: n = 64,191; Xenopus: n = 3,471). These observations support RNA editing as the cause of the mismatches in the sets of higher quality cDNA.
Figure 6. ADAR signature in the cDNA edited traces. Significant under-representation of "G" immediately upstream of the editing sites, in agreement with the known sequence motif of the ADAR proteins.
Figure 7. RNA editing in Xenopus tropicalis. (A) Evidence for RNA editing can be seen in this locus, as multiple traces of RNA origin align to it with numerous A-to-G mismatches. The trace accession numbers and their coordinates are given in the multiple alignment. (B) The predicted RNA structure of the genomic locus indicates a long and stable dsRNA structure, which is a favored target for editing by ADARs. Each editing site from the multiple alignment is marked by an arrow. The length of the arrow corresponds to the editing level.
Athanasiadis (2004). Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome.
Bass (2002). RNA editing by adenosine deaminases that act on RNA.
Bass (1987). A developmentally regulated activity that unwinds RNA duplexes.
Bentley (2008). Accurate whole human genome sequencing using reversible terminator chemistry.
Blow (2004). A survey of RNA editing in human brain.
Chiu (2008). The APOBEC3 cytidine deaminases: an innate defensive network opposing exogenous retroviruses and endogenous retroelements.
Chiu (2006). High-molecular-mass APOBEC3G complexes restrict Alu retrotransposition.
Conticello (2008). The AID/APOBEC family of nucleic acid mutators.
Eisenberg (2005). Is abundant A-to-I RNA editing primate-specific?
Esnault (2005). APOBEC3G cytidine deaminase inhibits retrotransposition of endogenous retroviruses.
Ewing (1998). Base-calling of automated sequencer traces using phred. II. Error probabilities.
Ewing (1998). Base-calling of automated sequencer traces using phred. I. Accuracy assessment.
Harris (2002). RNA editing enzyme APOBEC1 and some of its homologs can act as DNA mutators.
Harris (2003). DNA deamination mediates innate immunity to retroviral infection.
Hillier (1996). Generation and analysis of 280,000 human expressed sequence tags.
Hurst (1995). Deamination of mammalian glutamate receptor RNA by Xenopus dsRNA adenosine deaminase: similarities to in vivo RNA editing.
Jarmuz (2002). An anthropoid-specific locus of orphan C to U RNA-editing enzymes on chromosome 22.
Keegan (2004). Adenosine deaminases acting on RNA (ADARs): RNA-editing enzymes.
Kent (2002). The human genome browser at UCSC.
Kim (1994). Molecular cloning of cDNA for double-stranded RNA adenosine deaminase, a candidate enzyme for nuclear RNA editing.
Kim (2004). Widespread RNA editing of embedded alu elements in the human transcriptome.
Kimelman (1989). An antisense mRNA directs the covalent modification of the transcript encoding fibroblast growth factor in Xenopus oocytes.
Lee (2008). Hypermutation of an ancient human retrovirus by APOBEC3G.
Lehmann (2000). Double-stranded RNA adenosine deaminases ADAR1 and ADAR2 have overlapping specificities.
Lellek (2000). Purification and molecular cloning of a novel essential component of the apolipoprotein B mRNA editing enzyme-complex.
Levanon (2004). Systematic identification of abundant A-to-I editing sites in the human transcriptome.
Li (2009). Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing.
Maas (2006). A-to-I RNA editing and human disease.
Mangeat (2003). Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts.
Mariani (2003). Species-specific exclusion of APOBEC3G from HIV-1 virions by Vif.
McKernan (2009). Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding.
Mehta (2000). Molecular cloning of apobec-1 complementation factor, a novel RNA-binding protein involved in the editing of apolipoprotein B mRNA.
Melcher (1996). A mammalian RNA editing enzyme.
Muramatsu (1999). Specific expression of activation-induced cytidine deaminase (AID), a novel member of the RNA-editing deaminase family in germinal center B cells.
Muramatsu (2000). Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme.
Navaratnam (1993). The p27 catalytic subunit of the apolipoprotein B mRNA editing enzyme is a cytidine deaminase.
Neeman (2006). RNA editing level in the mouse is determined by the genomic repeat repertoire.
O'Connell (1995). Cloning of cDNAs encoding mammalian double-stranded RNA-specific adenosine deaminase.
Revy (2000). Activation-induced cytidine deaminase (AID) deficiency causes the autosomal recessive form of the Hyper-IgM syndrome (HIGM2).
Saccomanno (1999). A minor fraction of basic fibroblast growth factor mRNA is deaminated in Xenopus stage VI and matured oocytes.
Scadden (2007). Inosine-containing dsRNA binds a stress-granule-like complex and downregulates gene expression in trans.
Sheehy (2002). Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein.
Teng (1993). Molecular cloning of an apolipoprotein B messenger RNA editing protein.
Tuzun (2005). Fine-scale structural variation of the human genome.
Vartanian (2008). Evidence for editing of human papillomavirus DNA by APOBEC3 in benign and precancerous lesions.
Wedekind (2003). Messenger RNA editing in mammals: new members of the APOBEC family seeking roles in the family business.
Wheeler (2008). Database resources of the National Center for Biotechnology Information.
Wong (2001). Substrate recognition by ADAR1 and ADAR2.
Yu (2004). Single-strand specificity of APOBEC3G accounts for minus-strand deamination of the HIV genome.
Zaranek (2008). Free Factories: Unified Infrastructure for Data Intensive Web Services.
Zhang (2000). A greedy algorithm for aligning DNA sequences.