What's that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins.
Hutchins JR.
Abstract
The genomic era has enabled research projects that use approaches including genome-scale screens, microarray analysis, next-generation sequencing, and mass spectrometry-based proteomics to discover genes and proteins involved in biological processes. Such methods generate data sets of gene, transcript, or protein hits that researchers wish to explore to understand their properties and functions and thus their possible roles in biological systems of interest. Recent years have seen a profusion of Internet-based resources to aid this process. This review takes the viewpoint of the curious biologist wishing to explore the properties of protein-coding genes and their products, identified using genome-based technologies. Ten key questions are asked about each hit, addressing functions, phenotypes, expression, evolutionary conservation, disease association, protein structure, interactors, posttranslational modifications, and inhibitors. Answers are provided by presenting the latest publicly available resources, together with methods for hit-specific and data set-wide information retrieval, suited to any genome-based analytical technique and experimental species. The utility of these resources is demonstrated for 20 factors regulating cell proliferation. Results obtained using some of these are discussed in more depth using the p53 tumor suppressor as an example. This flexible and universally applicable approach for characterizing experimental hits helps researchers to maximize the potential of their projects for biological discovery.
FIGURE 1. Generalized workflow for the analysis of DNA, RNA, or protein samples and questions about the hits identified. Nucleic acid or protein samples isolated from the biological material of interest are processed, then analyzed by various methods. Raw analytical data are then matched to entries in public databases, generating a results table listing the genes, transcripts, or proteins (hits) identified. For each of these hits, 10 questions relating to their features, functions, and other properties are shown (blue boxes). Each question is addressed by a section in the text, plus one or more supplemental tables containing examples of hyperlinks to entries in online resources.
FIGURE 2. Approaches for obtaining functional information about experimentally identified gene, transcript, or protein hits. Freely available software tools can be used to obtain information about features and functions of genes, transcripts, or proteins in a results table from multiple sources. Generation of an interaction network shows at a glance the nature of any previously reported interactions between members of a set of hits, each of which can be explored using the resources indicated. Making a hyperlinked results table allows one-click access from each hit directly to relevant pages from a wide range of resources. Creating an annotated results table containing controlled-vocabulary terms or keywords from a range of sources allows hits to be classified and sorted on the basis of these terms. Step-by-step protocols for performing these analyses are presented in the Supplemental Materials.
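The hyperlinked results table described in Figure 2 can be illustrated with a minimal Python sketch. It is not the protocol from the Supplemental Materials; it only shows the general idea of turning database identifiers into per-resource links. The URL templates, the function names (`hyperlink_row`, `build_table`), and the choice of three resources are assumptions made for illustration, and each resource's current linking scheme should be checked before use. The single example row uses p53 (gene symbol TP53), the example factor named in the abstract.

```python
import csv
import io

# Assumed URL templates (illustrative only): each resource may change
# its linking scheme, so verify these against the resource's own help pages.
URL_TEMPLATES = {
    "UniProt": "https://www.uniprot.org/uniprotkb/{uniprot}",
    "NCBI Gene": "https://www.ncbi.nlm.nih.gov/gene/{entrez}",
    "Ensembl": "https://www.ensembl.org/id/{ensembl}",
}


def hyperlink_row(hit):
    """Map each resource name to a URL built from one hit's identifiers."""
    return {name: tpl.format(**hit) for name, tpl in URL_TEMPLATES.items()}


def build_table(hits):
    """Return a CSV string: one row per hit, one link column per resource."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["symbol", *URL_TEMPLATES])
    for hit in hits:
        links = hyperlink_row(hit)
        writer.writerow([hit["symbol"], *(links[name] for name in URL_TEMPLATES)])
    return out.getvalue()


# Hypothetical results table with a single hit (human p53).
hits = [
    {
        "symbol": "TP53",
        "uniprot": "P04637",
        "entrez": "7157",
        "ensembl": "ENSG00000141510",
    }
]

table = build_table(hits)
print(table)
```

Opening the resulting CSV in a spreadsheet program gives one clickable link per hit and resource. Further resources can be added by extending `URL_TEMPLATES`, provided the results table carries the identifier that the resource expects.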