XB-ART-58060
Elife
2021 May 04;10. doi: 10.7554/eLife.66747.
Mapping single-cell atlases throughout Metazoa unravels cell type evolution.
Tarashansky AJ, Musser JM, Khariton M, Li P, Arendt D, Quake SR, Wang B.
Abstract
Comparing single-cell transcriptomic atlases from diverse organisms can elucidate the origins of cellular diversity and assist the annotation of new cell atlases. Yet, comparison between distant relatives is hindered by complex gene histories and diversifications in expression programs. Previously, we introduced the self-assembling manifold (SAM) algorithm to robustly reconstruct manifolds from single-cell data (Tarashansky et al., 2019). Here, we build on SAM to map cell atlas manifolds across species. This new method, SAMap, identifies homologous cell types with shared expression programs across distant species within phyla, even in complex examples where homologous tissues emerge from distinct germ layers. SAMap also finds many genes with more similar expression to their paralogs than their orthologs, suggesting paralog substitution may be more common in evolution than previously appreciated. Lastly, comparing species across animal phyla, spanning sponge to mouse, reveals ancient contractile and stem cell families, which may have arisen early in animal evolution.
PubMed ID: 33944782
PMC ID: PMC8139856
Article link: Elife
Grants:
Beckman Young Investigator Award Arnold and Mabel Beckman Foundation, 1R35GM138061 NIH HHS, R35 GM138061 NIGMS NIH HHS, 788921 European Commission
Species referenced: Xenopus Xenopus tropicalis
Genes referenced: foxa1 gcm1 ids klf17 myb myod1 sox10 sox9 tcf4 wnt11 xbp1
GO keywords: embryo development
Figure 1 (with 1 supplement). SAMap addresses challenges in mapping cell atlases of distantly related species. (A) Schematic showing the phylogenetic relationships among the seven species analyzed. (B) Challenges in mapping single-cell transcriptomes. Gene duplications cause large numbers of homologs per gene, determined by reciprocal BLAST (cut-off: E-value < 10^-6), and frequent gene losses and the acquisition of new genes result in large fractions of transcriptomes lacking homology, which limits the amount of information comparable across species. (C) SAMap workflow. Homologous gene pairs initially weighted by protein sequence similarity are used to align the manifolds, which are low-dimensional representations of the cell atlases. Gene-gene correlations calculated from the aligned manifolds are used to update the edge weights in the bipartite graph, which are then used to improve manifold alignment. (D) Mutual nearest neighborhoods improve the detection of cross-species mutual nearest neighbors by connecting cells that target one another's within-species neighborhoods. (E) Convergence of SAMap is evaluated by the root mean square error (RMSE) of the alignment scores between mapped clusters in adjacent iterations for all 21 pairwise comparisons of the seven species.
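The convergence check described in panel (E) can be illustrated with a minimal sketch. It assumes the alignment scores of the mapped cluster pairs are available as equal-length lists, one list per iteration. The function names and the tolerance `tol` are hypothetical and are not taken from the SAMap code.

```python
import math


def alignment_rmse(prev_scores, curr_scores):
    """RMSE between alignment scores of the same cluster pairs in two adjacent iterations."""
    if len(prev_scores) != len(curr_scores) or not prev_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    squared = [(c - p) ** 2 for p, c in zip(prev_scores, curr_scores)]
    return math.sqrt(sum(squared) / len(squared))


def has_converged(score_history, tol=0.01):
    """Return True when the last two iterations differ by less than tol (hypothetical threshold)."""
    if len(score_history) < 2:
        return False
    return alignment_rmse(score_history[-2], score_history[-1]) < tol
```

A small RMSE between adjacent iterations means the cluster-level alignment scores have stopped changing, which is the sense of convergence the caption refers to.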
Figure 1-figure supplement 1. Scalability of SAMap. The runtime (A) and memory usage (B) of all mappings performed in this study are plotted versus the total number of cells from both datasets. For this study, SAMap was run on a standard desktop computer running Ubuntu 18.04, with an 8-core Intel i7 processor and 64 GB of RAM.
Figure 2 (with 1 supplement). SAMap successfully maps D. rerio and X. tropicalis atlases. (A) UMAP projection of the combined zebrafish (yellow) and Xenopus (blue) manifolds, with example cell types circled. (B) Sankey plot summarizing the cell type mappings. Edges with alignment score <0.1 are omitted. Edges that connect developmentally distinct secretory cell types are highlighted in black, with connections across germ layers highlighted in red. (C) Heatmaps of alignment scores between developmental time points for ionocyte, forebrain/midbrain, placodal, and neural crest lineages. X-axis: Xenopus. Y-axis: zebrafish. (D) Expressions of orthologous gene pairs linked by SAMap are overlaid on the combined UMAP projection. Expressing cells are color-coded by species, with those connected across species colored cyan. Cells with no expression are shown in gray. The mapped secretory cell types are highlighted with circles. (E) SAMap alignment scores compared to those of benchmarking methods using one-to-one vertebrate orthologs as input. Each dot represents a cell type pair supported by ontogeny annotations.
Figure 2-figure supplement 1. Existing methods failed to map D. rerio and X. tropicalis atlases. (A) UMAP projections of the integration results from SAMap using the full homology graph, compared to LIGER, BBKNN, Scanorama, Seurat, Harmony, and SAMap using 1-1 orthologs. For fair comparisons, all methods were run on the D. rerio and X. tropicalis atlases subsampled to approximately 15,000 cells to satisfy the computational constraints of Seurat and LIGER. (B) Histograms of alignment scores between individual cells.
Figure 3 (with 1 supplement). SAMap reveals prevalent paralog substitutions in frog and zebrafish. (A) Expression of orthologous (top) and paralogous (bottom) gene pairs overlaid on the combined UMAP projection. Expressing cells are color-coded by species, with those that are connected across species colored cyan. Cells with no expression are shown in gray. Paralogs are ordered by the evolutionary time when they are inferred to have duplicated. (B) Paralog substitution scores of all cell types. The substitution score counts the number of substituting paralogs that are differentially expressed in a particular cell type while normalizing for the number of differentially expressed genes in a cell type and the number of paralogs of a gene (see Materials and methods). (C) The percentage of paralogs from each phylogenetic age that were substituted for orthologs in frog or zebrafish lineages.
Figure 3-figure supplement 1. Paralog substitution analysis yields similar results using the SAMap manifold constructed from one-to-one orthologs. Comparison of substitution rates for different paralog ages (A) and cell type substitution scores (B) calculated from the original frog-zebrafish manifold versus the manifold generated using only one-to-one orthologs. (C) Histogram showing the distribution of correlation differences for paralog substitutions specific to the original (teal) and one-to-one ortholog based analyses (orange), along with those identified in both mappings (blue). Note that the majority of substitution events, especially in the large correlation difference regime, are present in both mapped manifolds.
Figure 4 (with 2 supplements). SAMap transfers cell type information from a well-annotated organism (planarian S. mediterranea) to its less-studied cousin (schistosome S. mansoni) and identifies parallel stem cell compartments. (A) UMAP projection of the combined manifolds. Tissue type annotations are adopted from the S. mediterranea atlas (Fincher et al., 2018). The schistosome atlas was collected from juvenile worms, which we found to contain neoblasts with an abundance comparable to that of planarian neoblasts (Li et al., 2021). (B) Overlapping expressions of selected tissue-specific TFs with expressing cell types circled. (C) UMAP projection of the aligned manifolds showing planarian and schistosome stem cells, with homologous subpopulations circled. Planarian neoblast data is from Zeng et al., 2018, and cNeoblasts correspond to the Nb2 population, which are pluripotent cells that can rescue neoblast-depleted planarians in transplantation experiments. (D) Distributions of conserved TF expressions in each neoblast subpopulation. Expression values are k-nearest-neighbor averaged and standardized, with negative values set to zero. Blue: planarian; yellow: schistosome.
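The standardization step mentioned in panel (D) can be sketched as follows. This is an illustration only: it assumes the values have already been nearest-neighbor averaged, and it uses the population standard deviation, a choice that the caption does not specify. The function name is hypothetical.

```python
import statistics


def standardize_and_clip(values):
    """Z-score a gene's expression values, then set negative values to zero."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; the caption does not say which estimator is used.
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant gene carries no signal after standardization.
        return [0.0] * len(values)
    return [max(0.0, (v - mean) / sd) for v in values]
```

Clipping at zero keeps only above-average expression, so the plotted distributions emphasize the cells in which a gene is enriched.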
Figure 4-figure supplement 1. SAMap-linked gene pairs that are enriched in cell type pairs between S. mediterranea and S. mansoni. (A) Rows: linked cell types. Schistosome cell types correspond to Leiden clusters. Columns: genes linked by SAMap with overlapping eukaryotic eggNOG orthology groups. We calculate the average standardized expression of each gene in an orthology group for its corresponding cell type in a particular pair and report the highest expression. A selected set of orthology groups corresponding to transcriptional regulators are labeled. (B) Fluorescence in situ hybridization shows the co-expression of wnt11 (Smp_156540) and a panel of muscle markers (collagen, troponin, myosin and tropomyosin) in S. mansoni juveniles. The body wall muscles are expected to be located close to the parasite surface (dashed outline). The images are maximum intensity projections constructed from ~10 confocal slices, with the optimal axial spacing recommended by the Zen software, collected on a Zeiss LSM 800 confocal microscope using a 40× (N.A. = 1.1, working distance = 0.62 mm) water-immersion objective (LD C-Apochromat Corr M27). (C) Whole mount in situ hybridization images showing that the expression of wnt11 and frizzled (Smp_174350) is concentrated in the parasite tail (arrows) with decreasing gradients extending anteriorly. In planarian muscles, Wnt genes provide the positional cues for setting up the body plan during regeneration (Scimone et al., 2017; Reddien, 2018). The presence of an anterior-posterior expression gradient of wnt11 and frizzled in muscles of schistosome juveniles suggests that they may have similar functional roles in patterning during development.
Figure 4-figure supplement 2. Schistosome muscle progenitors express canonical muscle markers. UMAP projections of schistosome stem cells with gene expressions overlaid. μ and μ’ cells are circled. Colormap: expression in units of log2(D+1). For visualization, expression was smoothed via nearest-neighbor averaging using SAM. Note that myod1 and cabp are expressed in both presumptive muscle progenitor populations, whereas all other markers are enriched in μ’ cells. All genes displayed are also expressed in fully differentiated muscle tissues.
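The log2(D+1) units and the nearest-neighbor smoothing mentioned in this caption can be sketched as follows. This is a simplified illustration, not SAM's implementation: it assumes a precomputed neighbor list per cell and an unweighted mean that includes the cell itself. Both function names are hypothetical.

```python
import math


def log2p1(counts):
    """Transform expression values D into log2(D + 1) units."""
    return [math.log2(d + 1) for d in counts]


def knn_average(values, neighbors):
    """Smooth per-cell values by averaging each cell with its listed neighbors.

    neighbors[i] holds the indices of cell i's nearest neighbors
    (hypothetical precomputed graph; the cell itself is always included).
    """
    smoothed = []
    for i, nbrs in enumerate(neighbors):
        idx = [i] + [j for j in nbrs if j != i]
        smoothed.append(sum(values[j] for j in idx) / len(idx))
    return smoothed
```

For example, `knn_average(log2p1([0, 1, 3]), [[1], [0, 2], [1]])` averages each cell's log-transformed value with those of its neighbors, which reduces dropout noise in the overlaid expression.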
Figure 5 (with 2 supplements). Mapping evolutionarily distant species identifies densely connected cell type groups. (A) Schematic illustrating edge (left) and node (right) transitivities, defined as the fraction of triads (set of three connected nodes) in closed triangles. (B) The percentage of cell type pairs that are topologically equivalent to the green edge in each illustrated motif. (C) Network graphs showing highly connected cell type families. Each node represents a cell type, color-coded by species (detailed annotations are provided in Supplementary file 7). Mapped cell types are connected with an edge. (D) Boxplot showing the median and interquartile ranges of node transitivities for highly connected cell type groups. For all box plots, the whiskers denote the maximum and minimum observations. The average node transitivity per group is compared to a bootstrapped null transitivity distribution, generated by repeatedly sampling subsets of nodes in the cell type graph and calculating their transitivities. **p < 5×10^-5, ***p < 5×10^-7. (E) Boxplot showing the median and interquartile ranges of the number of enriched gene pairs in highly connected cell type groups. All cell type connections in these groups have at least 40 enriched gene pairs (dashed line).
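The transitivities in panel (A) can be illustrated with a small sketch on an undirected cell type graph. The exact definitions used in the paper are in its methods. Here, as an assumption, node transitivity is the fraction of a node's neighbor pairs that are themselves connected, and edge transitivity is the fraction of third nodes adjacent to either endpoint that are adjacent to both. All function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs of mapped cell types."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_transitivity(adj, node):
    """Fraction of triads centered on `node` that are closed into triangles."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)


def edge_transitivity(adj, u, v):
    """Fraction of triads containing edge (u, v) that are closed (assumed definition)."""
    third = (adj[u] | adj[v]) - {u, v}
    if not third:
        return 0.0
    return len(adj[u] & adj[v]) / len(third)
```

With edges a-b, b-c, a-c and c-d, node a has transitivity 1.0 (its two neighbors are linked), whereas the edge c-d has transitivity 0.0 because no third cell type connects to both endpoints.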
Figure 5-figure supplement 1. Number of enriched gene pairs are mostly independent of edge transitivity. (A) Box plot showing the median and interquartile ranges of the number of enriched gene pairs in cell type mappings from all 21 pairwise mappings between the seven species. The whiskers denote the maximum and minimum observations. Of cell type mappings, 87% have greater than 40 enriched gene pairs (dashed line). Species acronyms are the same as in Figure 1A. (B) Top left: The edge transitivity is plotted against the number of enriched gene pairs for all cell type pairs in the connectivity graph. Dashed line: the linear best fit, with the Pearson correlation coefficient reported at the top. Top right: magnified view of the mapped cell type pairs supported by small numbers of gene pairs (<40) to show that those edges have low transitivity scores (<0.4). The subplots below show the number of enriched gene pairs and edge transitivity for individual species pairs.
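The dashed best-fit line and the Pearson correlation coefficient reported in panel (B) can be computed as in the following minimal sketch. It assumes paired lists of gene-pair counts and edge transitivities. The function name is hypothetical, and the fit is an ordinary least-squares line.

```python
import math


def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for paired observations x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    slope = sxy / sxx               # least-squares slope
    intercept = mean_y - slope * mean_x
    return r, slope, intercept
```

A coefficient near zero would support the caption's claim that the number of enriched gene pairs is mostly independent of edge transitivity.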
Figure 5-figure supplement 2. Alignment scores are mostly independent of edge transitivity. Top left: alignment scores and edge transitivity for all cell type pairs in the connectivity graph including the seven species. Dashed line: the linear best fit, with the Pearson correlation coefficient reported at the top. Alignment scores and edge transitivity for individual species pairs are shown in the remaining subplots.
Figure 6 (with 1 supplement). SAMap identifies muscle and stem cell transcriptional signatures conserved across species. (A) Enrichment of KOG functional annotations calculated for genes shared in contractile cell types. For each species, genes enriched in individual contractile cell types are combined. (B) Expression and enrichment of conserved muscle genes in contractile cell types. Color: mean standardized expression. Symbol size: the fraction of cells each gene is expressed in per cell type. Homologs are grouped based on overlapping eukaryotic eggNOG orthology groups. If multiple genes from a species are contained within an orthology group, the gene with highest standardized expression is shown. Genes in blue: core transcriptional program of bilaterian muscles; red: transcriptional regulators conserved throughout Metazoa. (C) Enrichment of KOG functional annotations for genes shared by stem cell types. (D) Top: boxplot showing the median and interquartile ranges of the mean standardized expressions of stem cell-enriched genes in multipotent stem cells (MSCs), lineage-committed stem cells (LSCs), and differentiated cells (DCs). MSCs include sponge archaeocytes (Musser et al., 2019), hydra interstitial stem cells (Siebert et al., 2019), planarian neoblasts cluster 0 defined in Fincher et al., 2018, and schistosome ε-cells (Tarashansky et al., 2019). LSCs include sponge transition cells, hydra ecto- and endo-epithelial stem cells, planarian piwi+ cells that cluster with differentiated tissues, and schistosome tissue-specific progenitors. Bottom: dot plot showing the mean standardized expressions of selected transcriptional regulators. The transcript IDs corresponding to each gene are listed in Supplementary file 6.
Figure 6-figure supplement 1. Phylogenetic reconstruction of animal contractile cell transcriptional regulators. Trees depict Csrp/Crip (A) and Fox group I (B) gene families. Genes labeled red are enriched in at least one contractile gene pair identified via SAMap. Support values indicate bootstrap support from 1000 nonparametric (Csrp) or ultrafast (Fox) bootstrap replicates. Besides these two transcriptional regulators, contractile cells in all seven species were also found to be enriched for transcription factors from the C2H2 Zinc Finger, Lim Homeobox, and Paired Homeobox families, though in different cell types we found enrichment of a number of distinct orthologs. Whether this reflects an ancestral role for these transcription factor families in regulating contractility or their independent evolution will require additional taxonomic sampling and broader coverage of muscle cell diversity to resolve.
References:
Alié, The ancestral gene repertoire of animal stem cells. 2015, Pubmed
Arendt, Evolution of neuronal types and families. 2019, Pubmed
Arendt, The origin and evolution of cell types. 2016, Pubmed
Barkas, Joint analysis of heterogeneous single-cell RNA-seq dataset collections. 2019, Pubmed
Baron, A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure. 2016, Pubmed
Bennett, Nodal signaling activates differentiation genes during zebrafish gastrulation. 2007, Pubmed
Betancur, A Sox10 enhancer element common to the otic placode and neural crest is activated by tissue-specific paralogs. 2011, Pubmed
Bialkowska, Krüppel-like factors in mammalian stem cells and development. 2017, Pubmed
Briggs, The dynamics of gene expression in vertebrate embryogenesis at single-cell resolution. 2018, Pubmed, Xenbase
Brunet, The evolutionary origin of bilaterian smooth and striated myocytes. 2016, Pubmed
Buzgariu, Multi-functionality and plasticity characterize epithelial cells in Hydra. 2015, Pubmed
Cao, Comprehensive single-cell transcriptome lineages of a proto-vertebrate. 2019, Pubmed
Dubaissi, A secretory cell type develops alongside multiciliated cells, ionocytes and goblet cells, and provides a protective, anti-infective function in the frog embryonic mucociliary epidermis. 2014, Pubmed, Xenbase
Eddy, A probabilistic model of local sequence alignment that simplifies statistical significance estimation. 2008, Pubmed
Eden, GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. 2009, Pubmed
El-Brolosy, Genetic compensation triggered by mutant mRNA degradation. 2019, Pubmed
Erwin, The evolution of hierarchical gene regulatory networks. 2009, Pubmed
Fincher, Cell type transcriptome atlas for the planarian Schmidtea mediterranea. 2018, Pubmed
Gabaldón, Functional and evolutionary implications of gene orthology. 2013, Pubmed
Geirsdottir, Cross-Species Single-Cell Analysis Reveals Divergence of the Primate Microglia Program. 2019, Pubmed
Haghverdi, Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. 2018, Pubmed
Hie, Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. 2019, Pubmed
Hu, Lineage dynamics of the endosymbiotic cell type in the soft coral Xenia. 2020, Pubmed
Huerta-Cepas, ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. 2016, Pubmed
Huerta-Cepas, eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. 2019, Pubmed
Janicke, Zebrafish grainyhead-like1 is a common marker of different non-keratinocyte epidermal cell lineages, which segregate from each other in a Foxi3-dependent manner. 2010, Pubmed
Kalyaanamoorthy, ModelFinder: fast model selection for accurate phylogenetic estimates. 2017, Pubmed
Korsunsky, Fast, sensitive and accurate integration of single-cell data with Harmony. 2019, Pubmed
Kurauchi, Involvement of Neptune in induction of the hatching gland and neural crest in the Xenopus embryo. 2010, Pubmed, Xenbase
Larroux, Genesis and expansion of metazoan transcription factor gene classes. 2008, Pubmed
Laumer, Nuclear genomic signals of the 'microturbellarian' roots of platyhelminth evolutionary innovation. 2015, Pubmed
Letunic, Interactive Tree Of Life (iTOL) v4: recent updates and new developments. 2019, Pubmed
Li, Single-cell analysis of Schistosoma mansoni identifies a conserved genetic program controlling germline stem cell fate. 2021, Pubmed
Littlewood, Evolution: a turn up for the worms. 2015, Pubmed
MacPherson, HBO1 is required for the maintenance of leukaemia stem cells. 2020, Pubmed
Madeira, The EMBL-EBI search and sequence analysis tools APIs in 2019. 2019, Pubmed
Malkov, Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. 2020, Pubmed
Musser, Profiling cellular diversity in sponges informs animal cell type and nervous system evolution. 2021, Pubmed
Nanes Sarfati, Single-cell deconstruction of stem-cell-driven schistosome development. 2021, Pubmed
Nehrt, Testing the ortholog conjecture with comparative functional genomic data from mammals. 2011, Pubmed
Nguyen, IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. 2015, Pubmed
Pan, Myb permits multilineage airway epithelial cell differentiation. 2014, Pubmed
Pijuan-Sala, A single-cell molecular map of mouse gastrulation and early organogenesis. 2019, Pubmed
Plass, Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics. 2018, Pubmed
Polański, BBKNN: fast batch alignment of single cell transcriptomes. 2020, Pubmed
Prince, Splitting pairs: the diverging fates of duplicated genes. 2002, Pubmed
Reddien, The Cellular and Molecular Basis for Planarian Regeneration. 2018, Pubmed
Regev, The Human Cell Atlas. 2017, Pubmed
Sarkar, The sox family of transcription factors: versatile regulators of stem and progenitor cell fate. 2013, Pubmed
Scimone, Orthogonal muscle fibres have different instructive roles in planarian regeneration. 2017, Pubmed
Sebé-Pedrós, Early metazoan cell type diversity and the evolution of multicellular gene regulation. 2018, Pubmed
Shafer, Gene family evolution underlies cell-type diversification in the hypothalamus of teleosts. 2022, Pubmed
Shafer, Cross-Species Analysis of Single-Cell Transcriptomic Data. 2019, Pubmed
Siebert, Stem cell differentiation trajectories in Hydra resolved at single-cell resolution. 2019, Pubmed
Sikder, Nonhistone human chromatin protein PC4 is critical for genomic integrity and negatively regulates autophagy. 2019, Pubmed
Stamboulian, The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction. 2020, Pubmed
Stuart, Comprehensive Integration of Single-Cell Data. 2019, Pubmed
Studer, How confident can we be that orthologs are similar, but paralogs differ? 2009, Pubmed
Suzuki, Characterization of biklf/klf17-deficient zebrafish in posterior lateral line neuromast and hatching gland development. 2019, Pubmed
Tarashansky, Self-assembling manifolds in single-cell RNA sequencing data. 2019, Pubmed
Tatusov, The COG database: an updated version includes eukaryotes. 2003, Pubmed
Tosches, The bilaterian forebrain: an evolutionary chimaera. 2013, Pubmed
Tosches, Evolution of pallium, hippocampus, and cortical cell types revealed by single-cell transcriptomics in reptiles. 2018, Pubmed
Traag, From Louvain to Leiden: guaranteeing well-connected communities. 2019, Pubmed
Wagner, Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo. 2018, Pubmed
Wang, Stem cell heterogeneity drives the parasitic life cycle of Schistosoma mansoni. 2018, Pubmed
Weir, A molecular filter for the cnidarian stinging response. 2020, Pubmed
Welch, Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity. 2019, Pubmed
Wendt, Flatworm-specific transcriptional regulators promote the specification of tegumental progenitors in Schistosoma mansoni. 2018, Pubmed
Wendt, Schistosomiasis as a disease of stem cells. 2016, Pubmed
Wolf, SCANPY: large-scale single-cell gene expression data analysis. 2018, Pubmed
Wong, Co-expression of synaptic genes in the sponge Amphimedon queenslandica uncovers ancient neural submodules. 2019, Pubmed
Yan, OrthoClust: an orthology-based network framework for clustering data across multiple species. 2014, Pubmed
Zeng, Prospectively Isolated Tetraspanin+ Neoblasts Are Adult Pluripotent Stem Cells Underlying Planaria Regeneration. 2018, Pubmed
Zeng, Heterochromatin protein 1 promotes self-renewal and triggers regenerative proliferation in adult stem cells. 2013, Pubmed