|
Fig 1. Ectoderm microdissections and experimental strategy. (A) Xenopus laevis neurula-stage embryos were dissected at Nieuwkoop and Faber stage (St.) 12.5 and St. 14 according to the patterns indicated. Live embryos are shown in dorsal view. The superficial ectoderm was separated from the underlying mesoderm germ layer. The tissues collected corresponded to the developing neural plate (NP, blue), medial and lateral neural border (NB, red-pink-beige), anterior neural border and preplacodal ectoderm (ANB/PPE or ANF/PPE, violet), and nonneural ectoderm (NNE, green, yellow-brown). NP, NB (at St. 12.5 and St. 14), and NNE (at St. 14 only) were further subdivided into 2 or 3 pieces: anterior (a, darker color) and posterior (p, lighter color) fragments, as indicated. For St. 12.5 NB only, a more lateral fragment was cut (l, beige), (lateral refers to medial-lateral coordinates on the dorsal view or corresponds to more ventral if using dorsal-ventral whole embryo coordinates). Each sample is associated to 1 color throughout this study (color code reference in S1 Table). (B) This dissection pattern (dotted lines) matches with reference genes expression patterns in the developing neurula ectoderm as indicated for pax3 (dorsal-lateral NP and NB), sox2 (NP and St. 14 PPE), and snail2 (premigratory neural crest within the NB). The dissection line is indicated outside the embryo whenever the dissection encompasses tissues located more ventrally than seen in dorsal view (e.g., for PPE and NNE). Similarly, the blastopore (white circle) can be indicated shifted outside the embryo if located more ventrally. (C) Whole embryos at each stage of neurulation were used for studying the time course of gene expression in neurulas. Drawing after living embryos. (D) The main steps of sample treatments and bioinformatics analysis are indicated. After total RNA purification from each individual dissected sample or whole embryo, quality was checked using quantitative PCR. Suitable samples (see text) were processed for deep transcriptome sequencing. Details of analysis workflow are indicated in Results and Materials and methods.
|
|
Fig 2. Unsupervised analysis of variance between samples using principal component analysis (PCA) retrieves biologically meaningful components, matching embryo polarity axes, neural border (NB) formation, and developmental stage. (A, B) At Nieuwkoop and Faber stage (St.) 12.5 (A, circles) and St. 14 (B, triangles), the first 2 PCA components segregated ectodermal samples according to their position along the dorsal-ventral (D-V) and anterior-posterior (A-P) embryonic axes, respectively. Percent of total gene expression variance captured by each component is indicated between parentheses. Sample color code is identical to Fig 1A. Lighter background colors are Voronoi diagram areas. These areas are drawn using the barycenter of samples belonging to the same dissected region. Each area indicates the regions of the PCA plane, where all points are closer to the barycenter of this dissected region than to any other barycenter. Arrows indicate anterior neural border samples falling slightly outside their intended dissected region (see text for details). (C, D) At St. 12.5 (C) and St. 14 (D), t-distributed stochastic neighbor embedding (tSNE) groups samples according to dissected regions. (E, F) At St. 12.5 (E) and St. 14 (F), neural border samples are located higher than most others along the third PCA component. This indicates that component 3 correlates with the acquisition of the NB characteristics during neurulation. (G, H) When combining St. 12.5 (circles) and St. 14 (triangles) samples together, the first component remains correlated to the D-V axis. The second component sorts out most samples according to their stage, irrespective of their position along the D-V axis (triangles separated from circles, (G). The first and third PCA components segregate most samples according to their position along the D-V and A-P axes, irrespective of their stage (circles and triangles of same color together, [H], Voronoi diagram). The posterior neural border (NBp), which is known to resemble the posterior neural plate (NPp) at St. 12.5, grouped with the NPp, while the anterior neural border (NBa)âSt. 12.5, NBaâSt. 14, and the lateral neural border (NBl)âSt. 12.5 belong to the same Voronoi cell. As noticed above, NBaâSt. 14 is more heterogeneous (red triangles).
|
|
Fig 3. Nonnegative matrix factorization (NMF) efficiently deconvolutes differential expression profiles. The NMF algorithm defines a set of ectoderm tissues (NMF-tissues) at Nieuwkoop and Faber stage (St.) 12.5 (A) and St. 14 (B). Columns in the mixing matrix indicate how these tissues contribute to the initial dissected samples. Red indicates high contribution; blue indicates low/no contribution. Interesting intermediate contributions are outlined by white dotted lines; stars (*) refer to comments in text. From these contributions, dissected samples (left part of the pattern drawn on embryo) were matched with 1 of the NMF-tissues (right part of the pattern). At St. 12.5, striped pattern indicated the mixed contribution of posterior neural plate and neural border (NMF-NP(p), NMF-NB(p)) to NBp samples. Abbreviations and colors are the same as in Fig 1 and S1 Table.
|
|
Fig 4. Nonnegative matrix factorization (NMF) predicts accurate expression patterns for neural border (NB) genes. In order to represent gene expression levels on the map of the developing ectoderm, we used 2 approaches: averaging the expression values for each of 7 domains or using NMF deconvolution predictions in 5 domains. We found those 2 approaches complementary. Average patterns directly describe raw data but are sensitive to shifts in dissection. NMF deconvolution predicted expression levels in each NMF-tissue. In the case of the anterior neural border (NBa), the NMF pattern is closer to in situ hybridization patterns than average patterns, e.g., for known genes expressed in various ectoderm areas, sox2, snail2, pax3, and foxi1e (also see Fig 8, S4 and S7 Figs for more examples). Moreover, these patterns predict expression for genes of unknown pattern and with low expression level, such as sp8. We present the average pattern and NMF-predicted pattern (NMF pattern); each is normalized on a percent scale, 100% being maximal expression of the gene of interest. This implies that genes with low expression will be represented in similar shades as genes expressed at high level, which allows better visualization but could be misleading if the gene is very weakly expressed in all ectoderm (e.g., sp8 at Nieuwkoop and Faber stage [St.] 12.5). To complete pattern prediction with expression level information, we thus also include the average expression levels per dissected sample for each gene (reads per kilobase per million [RPKM] shown for St. 14 here). Robustly expressed genes (>100 RPKM, e.g., sox2, snail2, and pax3 at St. 14) are predicted as well as genes with very low expression (e.g., snail2 at St. 12.5 or sp8 at St. 14). In addition, the color of the dot indicates high ectoderm specificity (green) to low ectoderm enrichment (orange). See text and Materials and methods for details, see S11 Table for numerical data.
|
|
Fig 5. Identification of gene ontology (GO) biological functions enriched in weighted gene co- expression network analysis (WGCNA) groups. (A) Distribution of Xenopus laevis transcripts in WGCNA groups. The color-filled bars on the histogram indicate the number of unique Entrez geneID records per group. Functions of some developmentally relevant groups are highlighted: the Wnt responding genes in blue (Groups #23 and #41 in blue), neural crest (#51, purple), forebrain (#38, orange), mRNA splicing (#80, pink), and cilium morphogenesis in green (#22, green). Refer to S6 and S6B Table for the full list of functions associated with each group. (B) Comparison between groups generated by WGCNA (blue) and those created by random assignment of transcripts (red). About 45% of the WGCNA groups associate to at least 1 biological function, which is significantly more than the average obtained for the random dataset groups (4%; the 5 random datasets of 141 groups). The distribution of adjusted p-values and sensitivities (1-β) of overrepresented terms show that, overall, WGCNA groups are more likely to contain a relevant biological function than groups assigned randomly. (C) Specificity versus sensitivity plot, in which each of the data points represents a single GO function enriched in a group. Black circles represent all 80 biological functions enriched in the 5 random datasets (total of 705 groups). The Wnt responding genes (Groups #23, blue), neural crest (#51, purple), forebrain (#38, orange) are highlighted. For further description of the function, refer to S6B Table; see S11 Table for numerical data.
|
|
Fig 6. Distribution of Wnt-responding genes and genes involved in neural, neural crest (NC), and placodal development across weighted gene co-expression network analysis (WGCNA) synexpression groups. The histogram shows how many transcripts were found in each synexpression group and if this represented a significant enrichment compared to uniform distribution in all groups; see S11 Table for numerical data. (A) The Wnt-responsive transcripts identified by Kjolby and Harland [49] were mainly distributed in 4 co-expression groups: 47% of the genes positively regulated by Wnt signaling from the input list were found in groups #23, #31, and #41, while 62% of genes negatively regulated by Wnt were found in group #92. Group #23 contained posterior genes (e.g., cdx1/2/4, hes5/9, ngfr, kremen2); group #31, Wnt-signaling components (xarp, tcf7); group #41, meso-endoderm specifiers (e.g., msgn1, myf5, t, vegt); and group #92, genes up-regulated in absence of Wnt signaling. (B) Placodal and NC genes were enriched in groups #38 (43% of the input list) and #51 (70% of the input list), respectively. Contrarily to NC markers, the neural border genes, many of which are more broadly expressed, were distributed across many different co-expression groups; however, pax3 and myc were part of group #51.
|
|
Fig 7. Genes with predicted expression and functional association to the neural crest (NC). (A) Validation of the expression pattern of sh3pxd2a at Nieuwkoop and Faber stages (St.) 12.5 and St. 14. Upon prediction of NC expression by nonnegative matrix factorization (NMF), we cloned a probe for novel gene sh3pxd2a and analyzed its in vivo pattern: while expressed at very low levels (B, 25 reads per kilobase per million [RPKM] in the anterior neural plate border [NBa] St. 14), it was readily detected, at low level, in the prospective NC in vivo. (B) Average RPKM expression per tissue dissection highlighting sh3pxd2a ectodermal enrichment in the NBa. Expression level for each homeologous gene copy is indicated. (C) Expression pattern predictions for 2 novel genes grouped in G-51, a group associated with NC development: cacng, cdon. Egr4 was recently shown expressed in cranial NC [35]. Mean expression of both homeologous copies is indicated for each dissected region. (D) Temporal expression during neurulation (mean expression levels in whole embryos) for each homeologous copy of cacng4, cdon, and egr4. See S11 Table for numerical data.
|
|
Fig 8. A few Xenopus laevis homeologous gene pairs display spatial subfunctionalization. We have studied how homeologous genes are expressed in the whole dataset (using weighted gene co-expression network analysis [WGCNA]) or in the dissected regions. We found that most pairs with differential expression exhibit asymmetrical decrease of 1 copy (e.g., egr4, sh3pxd2a, snail2, Fig 7, S11 Fig). We found 5 pairs with spatial differential expression, suggesting spatial subfunctionalizations: at Nieuwkoop and Faber stage (St.) 12.5, alas1, ccdc68, gmnc-like, and hs6st1 and at St. 14, p2ry1. Average and nonnegative matrix factorization (NMF) expression patterns are shown at the relevant stage.
|
|
Fig 9. Gene co-expression network of Xenopus neurula visualized biologically meaningful relationships between weighted gene co-expression network analysis (WGCNA) groups and subsets of genes. We obtained a gene co- expression network using a large input gene list, including Wnt-related transcripts, neural, neural crest (NC), placode, and mesoderm genes as input nodes (large points, S10 Table). The p-value cutoff <1E-09 generated 2,001 nodes (small points indicate closely correlated genes; the color of the points refers to a given WGCNA group) and 4,945 edges. Circles are drawn to groups of genes with close spatial/temporal expression patterns. The group was named based on known members of the group. Groups are linked to other groups with positive (thin line) or negative (thick line) correlation. As an example, NC genes (groups #51 and #132) and placodes (group #38, in green) display negative correlation of expression with posterior genes (groups #23, #20, thick lines). The network also indicates interactions between ventral genes such as ventx genes (group #138), positively correlated with bmp4 and bmp7.2 (thin lines). When running the EctoMap application, the network is interactive and indicates each gene name upon selection.
|
|
Fig 10. Genes linked to human melanoma are tightly associated with Wnt-associated genes and neural crest (NC) genes during development. (A) Several (26) genes linked to melanoma super-enhancers defined by Kauffman et al. (2016) were found among the posterior/Wnt genes group (#23) and the NC group (#51). (B) Cumulative frequency of genetic alterations found in these 26 genes in various cancer types: NC-derived cancers are outlined in purple. High mutation frequency (>45%) is found in melanomas. The nature of the genetic alteration is plotted along y-axis. We used 70 studies describing at least 100 tumors of each kind (TCGA). (C) These 26 genes were frequently mutated in cutaneous and uveal melanoma: this diagram displays genetic alterations found for each gene above 5% frequency, in 191 out of 287 cutaneous melanomas and in 24 out of 80 uveal melanomas. See S11 Table for numerical data. Panels (B) and (C) are based on data obtained by the TCGA Research Network: http://cancergenome.nih.gov and visualized by cBioPortal.
|
|
Fig 11. EctoMap: An application to explore gene synexpression during neurulation. (A) Workflow for the EctoMap application, indicating the input options and the files needed to produce the output plots. (B) Example of outputs (see also S1 Web Archive).
|
|
S1 Fig. Silhouette analysis. (A, B) We have used the silhouette number (a clustering quality measure) to assess how samples cluster according to their dissected region in the space defined by the first 3 principal component analysis (PCA) components. We have compared 3 com- monly used methods at stage 12.5 and stage 14: the range of log2(expression) (range), the vari- ance (var), and the interquartile range (iqr). The silhouette number allows comparing these 3 methods and defining the appropriate number of genes to select (red line). All 3 methods gave similar silhouette profiles as a function of number of genes, with a sharp drop as genes with lower differential expression were included. We used these curves to define that between 60 and 1,200 genes should be selected for stage 12.5 and between 1,100 and 6,000 genes for stage 14. (C, D) PCA plots along component 1 (PC1) and component 2 (PC2) illustrate the influence of the number of genes selected on the quality of PCA results. (E, F) At both stages 12.5 and 14, a common threshold of 5 (red dotted line) in the range of gene expression was used to define the set of genes used in the first analyses: 1,174 genes at stage 12.5 and 1,859 genes at stage 14.
|
|
S2 Fig. Distribution of principal component analysis (PCA) components. Each PCA com- ponent captures a decreasing part of the variance in gene expression between samples. Signifi- cance is computed as indicated in Materials and methods, and 1% significance line is drawn. PCA component contribution to variance is plotted for stage 12.5 samples (A), stage 14 sam- ples (B), and both stages together (C).
|
|
S3 Fig. Principal component analysis (PCA) identifies genes whose expression is strongly correlated with PCA components 1 and 2. As we matched PCA components 1 and 2 to dor- sal-ventral (D-V) and anterior-posterior (A-P) axes, we looked for the genes best correlated to each component (S4 Table). Color code indicates dissection region identity of the sample according to Fig 1 and S1 Table. (A, B) Neural plate gene sox2.1 expression is correlated to PCA component 1 at both stages, more weakly at stage 12.5 (A, correlation coefficient = â0.86) than at stage 14 (B, correlation coefficient = â0.91). (C, D) Novel gene zc4h2.l expression is even more highly correlated to PCA component 1 at stage 12.5 (C, correlation coefficient = â0.98) and stage 14 (D, correlation coefficient = â0.96). (E, F) Posterior gene cdx2.s presents a high negative correlation with PCA component 2, both at stage 12.5 (E, correlation coefficient = â0.93) and 14 (F, correlation coefficient = â0.95). (G, H) Anterior gene fzd8.s is highly posi- tively correlated with PCA component 2, both at stage 12.5 (G, correlation coefficient = 0.91) and 14 (H, correlation coefficient = 0.88). See S11 Table for numerical data.
|
|
S4 Fig. Nonnegative matrix factorization (NMF)-predicted expression for genes most cor- related to principal component analysis (PCA) axes 1 and 2. Dissected regions are projected along PCA components 1 and 2 in a pattern matching with embryonic dorsal-ventral (D-V) and anterior-posterior (A-P) axes, respectively. We have selected the 5 genes most correlated to these components (from list in S4 Table) and predicted their expression pattern using NMF. Patterns for the 5 genes with best positive or negative correlation are presented. This indepen- dent analysis confirms that the selected genes show clear D-V or A-P pattern restrictions and could thus be good diagnostic markers for position along each axis, respectively.
|
|
S5 Fig. Expression level distribution between biological replicates. (A, B) Expression level for each biological replicate is plotted for snail2 and sox2, for each dissected tissue. This high- lights the variability of expression that complicates the establishment of region-specific gene signatures for adjacent regions. Ec = NNE, nonneural ectoderm. See S11 Table for numerical data.
|
|
S6 Fig. Nonnegative matrix factorization (NMF) convergence. Convergence of NMF to a single minimum was assessed by clustering rows of the mixing matrix obtained by running NMF deconvolution with random initialization 20 times and checking that the number of tight clusters obtained equal NMF rank. This procedure shows that a single minimum is obtained up to rank 5 both at stage 12.5 (A) and stage 14 (B), although for stage 12.5, rank 4 does not lead to tight clusters. Higher ranks do not lead to a single solution, as the number of clusters recovered exceeds the rank and/or clusters are not tight (rank 6 at stage 12.5). Mixing matrices for each solution from 2 to 7 ranks are shown. Stars ( ) point out clusters that are not tight.
|
|
S7 Fig. Gini index defines a specific gene signature for each nonnegative matrix factorization (NMF)-predicted ectoderm region. Gini enrichment index was used to define the genes most specifically enriched in 1 of the 5 tissues predicted by NMF deconvolution (S5 Table). Five genes with highest Gini index are shown for each NMF-tissue. This defined known and novel genes to characterize each ectoderm region.
|
|
S8 Fig. Distinction between ectoderm-enriched genes and posterior mesoderm-expressed genes by combining nonnegative matrix factorization (NMF) pattern and average expres- sion analysis. In complement to average or NMF-predicted expression patterns, we indicate the enrichment of gene expression in ectoderm, compared to whole embryo expression. Ecto- derm enrichment index allows distinguishing between genes highly and specifically expressed in the ectoderm germ layer (e.g., sox10, dark green, expression of which is initiated at stage 14) and genes found in the posterior neural tissue because of attached mesoderm cells (myod, myf5, yellow, low level). All intermediate situations are found, including ubiquitously expressed genes or genes enriched in 1 dissected region but not in the others. Ec = NNE, non- neural ectoderm. See S11 Table for numerical data.
|
|
S9 Fig. Weighted gene correlation network analysis (WGCNA). (A) Selection of the soft threshold power used to obtain the signed matrix. (B) Mean connectivity for each of the soft threshold powers. (C) Dendrogram displaying the 141 gene co-expression groups obtained by the high topological overlap using the dynamic tree cut algorithm, using a soft power threshold of 22, hybrid cut, deep split = 4. A minimum of 15 genes were required to belong to a cluster; otherwise, genes were assigned to Group #0. See S11 Table for numerical data.
|
|
S10 Fig. Weighted gene correlation network analysis (WGCNA) identifies homeologous gene pairs expressed with different spatial or temporal dynamics during neurulation.
(A) The gene set used for WGCNA contained 2,520 gene pairs, most of which display different expression dynamics, thus falling into different WGCNA groups. (B) Global analysis of groups: groups associated with developmental processes (G51, G23, G34, and G38) tend to contain both copies of homeologous pairs, while groups associated with global cell biology retained few pairs. (C) Gene ontology (GO) terms associated with groups retaining few pairs (terms related to cellâcell relationships) or with groups retaining pairs (terms related to tran- scription). See S11 Table for numerical data.
|
|
S11 Fig. Expression dynamics of homeologous gene copies during neurulation. Expression pattern comparison between selected homeologous transcription factor pairs. The spatial and temporal expression of both copies of pax3, sox10, and six3 highly correlate in all 79 tissue samples (R2 > 96, p-value = 0). Other genes exhibit expression level differences over different time points (id2, zic1, six4) or across space and time (snai2, id2). The lowest correlations observed between the displayed homeologous pairs are snai2.l and snai2.s (R2 = 0.60, p-value = 4e-09) and zic1.s and zic1.l (R2 = 0.65, p-value = 4e-11). The parentheses after the gene names correspond to the co-expression groups assigned by weighted gene correlation network analy- sis (WGCNA). See S11 Table for numerical data.
|
|
S12 Fig. Analysis of evolution of snail2 homeologous gene pair. Snail2.l and snail2.s exhibit differential expression with asymmetrical decrease of snail2.s (S11 Fig). (A) Tree for snail2 genes. Snail2.l, the copy with retained expression in X. laevis, is closer to X. tropicalis gene than to X. laevis snail2.s. (B) In X. laevis, mutations are accumulating in the low complexity regions of Snail2.s protein compared to Snail2.l or X. tropicalis Snail2. (C) Detail of the mutation observed on Snail2 proteins. (D) Protein alignment confirms that the mutations in Snail2.s are found on residues conserved in other vertebrates.
|