BACKGROUND: Mass spectrometry-based proteomics enables the global identification and quantification of proteins and their posttranslational modifications in complex biological samples. However, proteomic analysis requires a complete and accurate reference set of proteins and is therefore largely restricted to model organisms with sequenced genomes.
RESULTS: Here, we demonstrate the feasibility of deep genome-free proteomics by using a reference proteome derived from heterogeneous mRNA data. We identify more than 11,000 proteins with 99% confidence from the unfertilized Xenopus laevis egg and estimate protein abundance with approximately 2-fold precision. Our reference database outperforms the provisional gene models based on genomic DNA sequencing and references generated by other methods. Surprisingly, we find that many proteins in the egg lack mRNA support and that many of these proteins are found in blood or liver, suggesting that they are taken up from the blood plasma, together with yolk, during oocyte growth and maturation, potentially contributing to early embryogenesis.
CONCLUSION: To facilitate proteomics in nonmodel organisms, we make our platform available as an online resource that converts heterogeneous mRNA data into a protein reference set. Thus, we demonstrate the feasibility and power of genome-free proteomics while shedding new light on embryogenesis in vertebrates.
Figure 1. MS Data Can Be Used to Evaluate Relative Reference Database Quality
Spectra from a tryptic digest of yeast lysate were searched against the standard yeast protein database (Full DB). Shown are the numbers of total peptide spectral matches (PSMs; blue), unique peptides (orange), and proteins (black) that were confidently identified. To simulate poor reference databases, we removed half (Half DB) or three-quarters (Quarter DB) of the proteins from the reference database. The numbers of identified PSMs and unique peptides scale approximately with the number of proteins in the database. To test how the addition of nonsense sequences would affect the number of identified peptides, we added randomized human proteins to the full yeast database (Full DB + Nonsense). The numbers of peptides and proteins are negatively affected. To simulate a reference database in which proteins are fragmented, we divided every protein in the reference into two proteins at a random position. Whereas the number of identified peptides decreases slightly, the number of identified proteins increases substantially.
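The database perturbations described in this caption can be illustrated with a minimal sketch. It assumes the reference is held as a dictionary of protein name to sequence, and that "randomized" means shuffling the residues of each foreign protein; the caption does not state how the randomization was done. All function names and the toy sequences are hypothetical.

```python
import random

def subsample(db, fraction, rng):
    """Keep a random fraction of proteins (0.5 for Half DB, 0.25 for Quarter DB)."""
    names = sorted(db)
    keep = rng.sample(names, int(len(names) * fraction))
    return {name: db[name] for name in keep}

def add_nonsense(db, foreign, rng):
    """Add residue-shuffled copies of foreign proteins (assumed randomization)."""
    out = dict(db)
    for name, seq in foreign.items():
        residues = list(seq)
        rng.shuffle(residues)
        out["nonsense_" + name] = "".join(residues)
    return out

def fragment(db, rng):
    """Split every protein into two entries at one random internal position."""
    out = {}
    for name, seq in db.items():
        if len(seq) < 2:
            out[name] = seq  # too short to split
            continue
        cut = rng.randint(1, len(seq) - 1)
        out[name + "_a"] = seq[:cut]
        out[name + "_b"] = seq[cut:]
    return out

# Toy sequences, made up for illustration only.
rng = random.Random(0)
toy_db = {"P1": "MKTAYIAKQR", "P2": "MSDNELKR", "P3": "MGGKWSK", "P4": "MAAPLRK"}
half_db = subsample(toy_db, 0.5, rng)
quarter_db = subsample(toy_db, 0.25, rng)
nonsense_db = add_nonsense(toy_db, {"H1": "MEEPQSDPSV"}, rng)
fragmented_db = fragment(toy_db, rng)
```

Each perturbed dictionary would then be written out and searched in place of the full database, so that identification counts can be compared across the variants.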
Figure 2. Overview of the Steps for Constructing the High-Quality Protein Reference Set PHROG
Transcripts from four different sources were combined, trimmed and cleaned using SeqClean, masked using RepeatMasker, and clustered and assembled
using TGICL/CAP3. The assembled transcripts were aligned against a collection of model vertebrate proteins using BLASTX. The results were used for identifying
the correct translation frame, for frameshift correction (if appropriate), and for removing sequences without significant similarity to known proteins.
Once translated using BioPerl, the longest peptide for each protein is identified, and the ends are trimmed to match tryptic peptides. The collection is processed
to remove 100% redundant proteins using CD-HIT, and gene symbols are assigned to the remaining members using the reciprocal or single best
BLAST hit against human proteins. The numbers indicate the numbers of transcripts or proteins in each group.
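One possible reading of the "longest peptide" and end-trimming steps is sketched below. It is an assumption, not the pipeline's actual code: the translation is taken to mark stop codons with `*`, the longest stop-free stretch is kept, and the ends are trimmed so that the sequence starts just after a K or R and ends on a K or R (trypsin cleaves C-terminal to lysine and arginine). The proline exception and the handling of true protein termini are ignored, and both function names are hypothetical.

```python
def longest_open_stretch(translation):
    """Return the longest stop-free segment of a translation ('*' marks a stop)."""
    return max(translation.split("*"), key=len)

def trim_to_tryptic_ends(peptide):
    """Trim so the sequence starts right after a K/R and ends on a K/R.

    Returns an empty string if fewer than two cleavage sites are present.
    """
    sites = [i for i, aa in enumerate(peptide) if aa in "KR"]
    if len(sites) < 2:
        return ""
    return peptide[sites[0] + 1 : sites[-1] + 1]

# Toy sequences, made up for illustration only.
stretch = longest_open_stretch("AAK*MGGKWSKLLR*PP")
trimmed = trim_to_tryptic_ends("GGKWSKLLRPP")
```

Trimming in this way would discard partial terminal peptides that could never match a fully tryptic peptide in the search.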
Figure 3. Comparison of Protein Reference Databases for the Fractionated X. laevis Egg Sample
(A) Number of unique peptides identified with 0.5% FDR on the peptide level. PHROG significantly outperforms the publicly available proteins from Xenbase and even the preliminary gene models from the 7.0 genome assembly as a reference database.
(B) Comparison of the number of proteins identified in the egg, with additional filtering to 1% FDR at the protein level and maximal parsimony.
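Filtering to a fixed FDR can be sketched as follows, assuming a target-decoy search in which each match carries a score (higher is better) and a decoy flag. The caption does not say how FDR was estimated, so this is an illustration of one common approach rather than the method used here; the function name and toy scores are hypothetical.

```python
def filter_at_fdr(matches, max_fdr):
    """Return accepted target scores from the largest ranked set with FDR <= max_fdr.

    matches: list of (score, is_decoy) tuples; FDR is estimated as decoys / targets.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best = 0  # number of top-ranked matches to keep
    for i, (score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = i
    return [score for score, is_decoy in ranked[:best] if not is_decoy]

# Toy scores, made up for illustration only.
toy_matches = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, False)]
loose = filter_at_fdr(toy_matches, 0.25)
strict = filter_at_fdr(toy_matches, 0.10)
```

For the peptide-level filter in panel (A), `max_fdr` would be 0.005; the protein-level filter in panel (B) would apply the same idea to protein scores with `max_fdr` of 0.01.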
Figure 4. Estimation of Protein Abundance in the Xenopus Egg
(A) Previously published protein concentrations for 49 proteins versus measured ion current in MS1 spectrum normalized to protein length. The Pearson
correlation is 0.92. On average, the predicted protein concentration is approximately 2-fold different from the reported protein concentration.
(B) Histogram of concentration for all identified proteins regressed from normalized MS1 ion current. Median concentration of measured proteins is approximately 30 nM.
(C) Estimated concentration for subunits of stable complexes is similar. For the APC/C, we additionally distinguished between subunits that were reported
to be dimeric (square) or monomeric (triangle) within the complex. Although our accuracy is not good enough to separate the two populations, the estimated
concentrations for dimeric subunits tend to be higher than those for monomeric subunits.
(D) Concentrations for enzymes of a metabolic pathway can vary widely. For each metabolic pathway, the predicted concentrations of its members are
plotted (based on the Kyoto Encyclopedia of Genes and Genomes).
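The calibration in panels (A) and (B) can be illustrated with a minimal sketch, assuming an ordinary least-squares line fitted in log10 space between length-normalized MS1 signal and known concentration. The caption does not specify the regression model, so this form is an assumption; the function names and the toy numbers are hypothetical and are not data from the study.

```python
import math

def fit_log_log(signal_per_length, known_conc):
    """Least-squares fit of log10(concentration) against log10(signal / length)."""
    xs = [math.log10(s) for s in signal_per_length]
    ys = [math.log10(c) for c in known_conc]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_conc(signal_per_length, slope, intercept):
    """Predict a concentration from a length-normalized signal."""
    return 10 ** (slope * math.log10(signal_per_length) + intercept)

def mean_fold_error(predicted, known):
    """Average fold difference, computed as 10 ** mean(|log10(predicted / known)|)."""
    logs = [abs(math.log10(p / k)) for p, k in zip(predicted, known)]
    return 10 ** (sum(logs) / len(logs))

# Toy calibration values (arbitrary units and nM), made up for illustration only.
toy_signal = [1e3, 1e4, 1e5, 1e6]
toy_conc = [10.0, 100.0, 1000.0, 10000.0]
slope, intercept = fit_log_log(toy_signal, toy_conc)
predicted = [predict_conc(s, slope, intercept) for s in toy_signal]
fold_error = mean_fold_error(predicted, toy_conc)
```

A mean fold error near 2 on real calibration proteins would correspond to the "approximately 2-fold" agreement described in panel (A).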
Figure 5. mRNA and Protein Abundance
(A) Histogram of mRNA levels in the egg. mRNA for which the protein was also detected is colored blue. Orange indicates that only mRNA was detected. The
median of mRNA concentration is approximately 1,000-fold lower than the median for protein abundance. Although we see only a weak correlation between
mRNA and protein abundance (0.32 Pearson correlation), the lower the mRNA concentration, the less likely we are to detect the corresponding protein.
(B) mRNA and protein were matched via assigned gene symbols. MS is able to identify approximately 60% of all gene symbols for which we could detect
mRNA. The proteins that we cannot detect via MS are overrepresented by transcription factors, proteins involved in differentiation, and transmembrane
proteins. On the contrary, for ~350 gene symbols, we could identify only proteins, but not mRNA. This group is highly enriched for blood plasma and liver
proteins and was likely endocytosed during oocyte growth.
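The gene-symbol matching in panel (B) amounts to a set comparison, sketched here under the assumption that each data set has been reduced to a collection of assigned gene symbols. The function name and the placeholder symbols are hypothetical.

```python
def compare_symbols(mrna_symbols, protein_symbols):
    """Partition gene symbols by whether mRNA, protein, or both were detected."""
    mrna, protein = set(mrna_symbols), set(protein_symbols)
    both = mrna & protein
    return {
        "both": both,
        "mrna_only": mrna - protein,
        "protein_only": protein - mrna,
        "fraction_mrna_with_protein": len(both) / len(mrna) if mrna else 0.0,
    }

# Placeholder symbols, made up for illustration only.
toy_mrna = ["geneA", "geneB", "geneC", "geneD", "geneE"]
toy_protein = ["geneA", "geneB", "geneC", "geneX", "geneY"]
overlap = compare_symbols(toy_mrna, toy_protein)
```

The `protein_only` set corresponds to the group discussed above, for which protein but no mRNA was detected and which could then be tested for enrichment.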