|
Fig. 1. Number of unique gene-phenotype associations, identification of phenologs, and the example of a worm model of breast cancer. (A) The rate of associating genes to organism-level phenotypes in model organisms greatly exceeds that in humans (data from refs. 8–11, 14). Thus, appropriate mapping of model organism phenotypes to human diseases could significantly accelerate discovery of human disease gene associations. Orthologous phenotypes (phenologs) offer one such approach. (B) Phenologs can be identified based on significantly overlapping sets of orthologous genes (gene A is orthologous to A', B to B', etc.), such that each gene in a given set (green box or cyan box) gives rise to the same phenotype in that organism. The phenotypes may differ in appearance between organisms because of differing organismal contexts. As gene-phenotype associations are often incompletely mapped, genes currently linked to only one of the orthologous phenotypes become candidate genes for the other phenotype; that is, the gene A' is a new candidate for phenotype 2. (C) An example of a phenolog mapping high incidence of male C. elegans progeny to human breast/ovarian cancers (details in text).
|
|
Fig. 2. Systematic identification of phenologs. (A) For a pair of organisms, sets of genes known to be associated with mutational phenotypes are assembled, considering only orthologous genes between the two organisms. Pairs of mutational phenotypes (one phenotype from each organism, each associated with a set of genes) are then compared to determine the extent of overlap of the associated gene sets, calculating the significance of overlap by the hypergeometric probability. Comparison of the distribution of observed probabilities with those derived from the same analysis following permutation of gene-phenotype associations reveals that many more orthologous phenotypes are observed than expected by random chance, as shown in B for the case of the human-yeast comparison (also Fig. S1), and summarized for each organism pair in C.
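The overlap significance described in panel A can be illustrated with a minimal sketch. It assumes the test is the upper tail of the hypergeometric distribution (the probability of observing at least the given number of shared orthologs by chance); the function name, argument names, and the toy numbers are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_overlap_pvalue(n_orthologs, n_set1, n_set2, n_shared):
    """Probability of sharing >= n_shared genes by chance.

    n_orthologs: orthologous genes considered for the organism pair
    n_set1:      genes linked to the phenotype in organism 1
    n_set2:      genes linked to the phenotype in organism 2
    n_shared:    orthologs present in both gene sets
    """
    max_overlap = min(n_set1, n_set2)
    total = comb(n_orthologs, n_set2)
    # Sum the hypergeometric probabilities for every overlap size
    # from the observed one up to the largest possible one.
    tail = sum(
        comb(n_set1, k) * comb(n_orthologs - n_set1, n_set2 - k)
        for k in range(n_shared, max_overlap + 1)
    )
    return tail / total


# Toy example (hypothetical numbers): 10 orthologs, two sets of 3 genes,
# all 3 genes shared. Only one of the C(10, 3) = 120 draws gives this overlap.
p_toy = hypergeom_overlap_pvalue(10, 3, 3, 3)
```

Applying the same function to gene sets whose gene-phenotype labels have been shuffled would give the permuted distribution that the observed probabilities are compared against.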
|
|
Fig. 3. Example of a nonobvious disease model revealed by phenologs: a yeast model of angiogenesis. (A) The sets of 8 genes (considering only mouse/yeast orthologs) associated with mouse angiogenesis defects and 67 genes associated with yeast hypersensitivity to the hypercholesterolemia drug lovastatin significantly overlap, suggesting that the yeast gene set may predict angiogenesis genes. This prediction was verified in Xenopus embryos for eight genes (three from literature support and five based upon vascular expression patterns) (Fig. S3) and studied in detail for the case of the transcription factor sox13. (B) sox13 is expressed in developing Xenopus vasculature, as measured by in situ hybridization (also Fig. S4). (C) Morpholino (MO) knockdown of sox13 induces defects in vasculature, measured using in situ hybridization versus the vasculature markers erg (defects observed in 31 of 49 animals tested) or agtrl1 (12 of 19 animals tested) (Fig. S5). Such defects are rare in untreated control animals and five base pair mismatch morpholino (MM) knockdowns (0 of 22 control animals tested with agtrl1, 2 of 46 tested with erg; 5 of 28 MM animals tested with erg). (D) Hemorrhaging (white arrows) is apparent in stage 45 Xenopus embryos because of dysfunctional vasculature following sox13 morpholino knockdown (12 of 50 animals tested; two also showed unusually small hearts with defective morphology; Right: magnification of yellow boxed region in Middle), but is rare in control animals (1 of 45 tested untreated animals, 1 of 22 sox13-MM knockdown animals tested). All phenotypes in Figs. 3 and 4 are significantly different from controls by χ2 tests (P < 0.001).
(E) In an in vitro human umbilical vein endothelial cell model of angiogenesis, knockdown of human SOX13 by siRNA disrupts tube formation (an in vitro model for capillary formation) to an extent comparable to knockdown of a known effector of angiogenesis (HOXA9) and significantly more than untreated cells or cells transfected with an off-target (scrambled) negative control siRNA. (Scale bar, 100 μm.)
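As a rough illustration of the comparison against controls in panel C, the sketch below runs a Pearson chi-square test on a 2 x 2 table built from the erg counts given in the caption (31 of 49 knockdown animals with defects versus 2 of 46 untreated controls). The exact test variant used by the authors (for example, whether a continuity correction was applied) is not stated, so this is an assumption; the function name is hypothetical.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test, without continuity correction,
    for the 2 x 2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Rows: sox13 MO knockdown, untreated control (erg marker counts from panel C).
# Columns: animals with vascular defects, animals without.
stat, p_value = chi2_2x2(31, 49 - 31, 2, 46 - 2)
```

Under these assumptions the resulting P value falls below the 0.001 threshold quoted in the caption.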
|
|
Fig. 4. Phenologs reveal plant models of human disease, including a model of Waardenburg syndrome (WS) neural crest defects. (A) Many orthologous phenotypes are observed between Arabidopsis and worms, yeast, mouse, and humans, with hundreds more than expected by chance. Many mammalian/plant phenologs relate to vertebrate developmental defects, including models for WS and other birth defects. (B) Considering only human/Arabidopsis orthologs, the three known WS genes significantly overlap the five genes associated with negative gravitropism defects in Arabidopsis. The plant gene set suggests unique candidate WS genes. (C) In situ hybridization versus candidate sec23ip in developing Xenopus embryos confirms neural crest cell expression. (D) Unilateral morpholino knockdown of sec23ip induces (E) defects in neural crest cell migration on the side with the knockdown (E'') but not the control side (E'), measured using in situ hybridization versus two independent markers of neural crest cells, snai2-a (defects observed in 23 of 35 animals tested) and twist (8 of 14 animals tested) (Fig. S7). Such defects are rare in untreated control animals and off-target morpholino (OM) knockdowns (0 of 21 control animals tested with snai2-a; 1 of 14 OM animals tested with snai2-a; 0 of 14 OM animals tested with twist).
|
|
Fig. S1. Enrichment for phenologs above random expectation can be seen following all pair-wise comparisons of the mutational phenotypes from mouse,
human, yeast, or worm. Histograms are plotted as in Fig. 2B.
|
|
Fig. S2. Ten-fold cross-validated tests show strong disease gene prediction by single phenologs for approx. one-sixth to one-fifth of tested diseases; simple
weighted combinations of phenologs (e.g., evaluating the k = 40 best phenologs) provide strong predictability for approximately one-third to one-half of the
tested diseases. Predictability is measured as the area under a receiver operating characteristic (ROC) curve as described in SI Materials and Methods and
evaluated separately for each human genetic disease with ≥ 2 associated genes. An area under the ROC curve (AUC) of 1 indicates perfect prediction of known
disease genes in a cross-validated test; an AUC of 0.5 indicates performance no better than chance. Error bars indicate first quartile, median, and third quartile
of predictions of shuffled disease gene sets from the k = 1 test; score distributions from shuffling tests are similar for both k = 1 and k = 40 and center around
AUC = 0.5, as expected by chance. OMIM, Online Mendelian Inheritance in Man.
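The AUC values described above can be illustrated with a minimal sketch. It uses the rank interpretation of the AUC: the fraction of (known disease gene, other gene) pairs in which the known gene receives the higher score, with ties counted as one half. The function name and the toy scores are hypothetical; the scoring scheme actually used is described in SI Materials and Methods.

```python
def roc_auc(scores_known, scores_other):
    """Area under the ROC curve from two lists of prediction scores.

    scores_known: scores assigned to held-out known disease genes
    scores_other: scores assigned to all other candidate genes
    """
    wins = 0.0
    for known in scores_known:
        for other in scores_other:
            if known > other:
                wins += 1.0   # known gene ranked above the other gene
            elif known == other:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(scores_known) * len(scores_other))


# Toy scores (hypothetical): three of the four pairs are ordered correctly.
auc_toy = roc_auc([0.9, 0.3], [0.5, 0.1])
```

A value of 1 corresponds to every known gene outranking every other gene, and 0.5 to the ordering expected by chance, matching the interpretation given in the legend.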
|
|
Fig. S3. In situ hybridization shows vascular expression of four candidate angiogenesis genes in NF stage 32 Xenopus embryos.
|
|
Fig. S4. (A–C) In situ hybridization shows sox13 expression in veins and developing heart of a stage 32 Xenopus embryo.
PCV, posterior cardinal vein.
|
|
Fig. S5. Morpholino (MO) knockdown of sox13 induces defects in vasculature, measured using in situ hybridization versus an independent marker of the vasculature, the angiotensin receptor homolog agtrl1 (12 of 19 animals tested). Such defects are rare in untreated control animals (0 of 22 control animals
tested with agtrl1).
|
|
Fig. S6. Enrichment for phenologs above random expectation can be seen following all pair-wise comparisons of Arabidopsis phenotypes with those from mouse, human, yeast, or worm. Histograms are plotted as in Fig. 2B and Fig. S1.
|
|
Fig. S7. Morpholino (MO) knockdown of sec23ip induces defects in neural crest cell migration, measured using in situ hybridization versus twist, an independent marker of the neural crest cells (8 of 14 animals tested). Such defects are rare in untreated control animals (0 of 14 control animals tested with twist).
|
|
Fig. S8. Genes involved in phenologs show enhanced interconnectivity in a gene network, shown here for yeast genes (10). All significant yeast-worm
phenologs with at least four orthologs in both the "intersection" and "nonintersection" sets (SI Materials and Methods) were tested for network connectivity,
measured as the area under a ROC plot as described in ref. 2, with values ranging from 0.5 (random network connectivity) to 1 (high network connectivity).
Genes from phenolog intersections show significantly higher network connectivity than genes associated with a phenolog, but outside of the intersection,
which in turn show significantly higher connectivity than size-matched random gene sets. Thus, phenologs capture subnetworks or network modules informative
about a given phenotype pair, and carry predictive value for additional genes relevant to the phenotypes. At the left of each box-and-whisker plot,
the center of the blue diamond indicates the mean AUC across phenologs, the top and bottom of the diamond indicate the 95% confidence interval, and the
accompanying solid vertical line indicates ± 2 SD. The bottom, middle, and top horizontal lines of the box-and-whisker plots represent the first quartile, the
median, and the third quartile of AUCs, respectively; whiskers indicate 1.5 times the interquartile range. Red plus signs represent individual outliers.
|
|
Fig. S9. To rule out the possibility that phenolog intersections arise predominantly from "deep paralogs," we measured the pair-wise BLAST E-values between
genes in phenolog intersections (I) and genes in the same phenotypes but not the phenolog intersections (the differential gene sets D1 or D2), comparing the E-value
distributions on a species-by-species basis for each species pair (see SI Materials and Methods for details). Distributions are plotted above; box plots
represent first quartile, median, and third quartile, whiskers 1.5 × interquartile range (IQR), and stars represent outliers >3 IQR [thus, the majority of pair-wise
sequence comparisons show no significant similarity with −log10(Eval) = 3]. In general, genes in phenolog intersections were no more likely to encode similar
protein sequences than genes in the phenologs but outside the intersections (i.e., associated with the phenotype in only one species), indicating that deep
paralogy is not a dominant factor in identifying phenologs. Across 20 such comparisons (10 species pairs, performing tests on a per species basis for each
comparison), in 14 cases gene pairs in the I sets showed less significant BLAST E-values than those in the D sets (one-tailed P < 0.0001 for each; Wilcoxon-Mann-Whitney);
in 3 cases gene pairs in the I sets showed more significant BLAST E-values than in the D sets (one-tailed P < 0.0001, P < 0.02, P < 0.03); and in 3 cases the
sets were not significantly biased in either direction.
|
|
hmha1 (histocompatibility (minor) HA-1) gene expression in Xenopus laevis embryos, NF stage 32, as assayed by in situ hybridization. Lateral view: anterior left, dorsal up.
|
|
rab11b.1 (RAB11B, member RAS oncogene family, gene 1) gene expression in Xenopus laevis embryos, NF stage 32, as assayed by in situ hybridization, lateral view, anterior left, dorsal up.
|
|
sec23ip (SEC23 interacting protein) gene expression in Xenopus laevis embryos, NF stage 26, as assayed by in situ hybridization. Lateral view: anterior left, dorsal up.
|
|
tbl1xr1 (transducin (beta)-like 1 X-linked receptor 1) gene expression in Xenopus laevis embryos, NF stage 32, as assayed by in situ hybridization, lateral view, anterior left, dorsal up.
|
|
tcea1 (transcription elongation factor A (SII), 1) gene expression in Xenopus laevis embryos, NF stage 32, as assayed by in situ hybridization, lateral view, anterior left, dorsal up.
|