|
Graphical Abstract
|
|
Figure 1. Measuring the Aneuploidy in Tumors by the Broadening of Allelic Frequency Ratios Calculated from Sequencing Data
(A) 10 examples of FA histograms, each portraying the allelic frequencies of the heterozygous SNPs of human breast tumors. The number in the top right corner of each plot represents the FA ranking based on the broadening of the allele frequency peaks, with 1 being the widest peak and 510 the narrowest peak. Note that the lower the number, the broader the central peak of allelic frequencies.
(B) Replots of 5 of the tumors from (A) showing the allele frequency ratios along chromosome positions corresponding to coordinates in the GRCh37 genome (so each chromosome runs p arm to q arm). Note that the LOH events that separate the central peaks into two often span large segments of chromosomes or whole chromosomes, which is consistent with chromosome missegregation driving the LOH event.
(C) Plot of the frequency of whole (blue) or partial (red) chromosome LOH for each chromosome.
(D and E) The number of chromosomes with an LOH event in breast tumors correlates with the ranking (D) and FA score (E) of the tumors. R² values were calculated by fitting to a second-order polynomial curve in Excel. Our method of LOH quantification is summarized in Figure S5.
|
|
Figure 2. Identification of Genes Mutated in Aneuploid Tumors
(A) Histogram showing the distribution of FA scores of the 522 analyzed human breast tumors. The boxes highlight the distributions of the 100 highest and lowest scoring tumors.
(B) The number of chromosomes with LOH events was compared in the 100 highest and lowest aneuploid tumors to demonstrate that the analysis stratifies tumors by aneuploid status. The p value was generated by Welch two-sample t test in R.
(C) All genes significantly mutated in the high aneuploid tumor sets were identified by comparing the sequence data from the 100 highest and lowest ranked tumors using the VAAST program. The p values were calculated by VAAST.
(D) p53 mutations are correlated with functional aneuploidy in the 250 top ranked tumors.
|
|
Figure 3. Regulators of Mitosis Are Overexpressed in Aneuploid Breast Tumors
The 100 most overexpressed genes in the high-FA-ranking tumors (BrFA100) were identified by comparing RNA expression data of the 100 high-FA and low-FA tumors.
(A) The overlap of the BrFA100 genes, the CIN70 list, and genes present in 3–6 previously published proliferation signatures (Multiple Proliferation Signatures).
(B) STRING diagrams (http://string-db.org) of the BrFA100 list show a highly noded grouping of mitotic regulators, mitotic cell-cycle genes, and DNA replication and repair proteins. The top 10 GO terms of the BrFA100 list show a strong enrichment of cell-cycle genes, which is driven by a high M-phase gene enrichment (Table S4).
(C) Specifically, we plotted the relative fold enrichment of genes in the chromosome segregation GO term in the BrFA100, CIN70, and six different proliferation signatures.
See also Table S4.
|
|
Figure 4. Mutations in TP53 and Overexpression of E2F1, MYBL2, and FOXM1 Are Highly Associated in Breast Tumors
(A) The overlap of BrFA100 and target genes of mitotic transcriptional regulation complexes DREAM, FoxM1-MuvB/MMB, and Rb-E2F.
(B) The percentage of tumors in each group of 50 (ranked by aneuploid status) with 1, 2, or 3 of the transcription factors MYBL2, FOXM1, and E2F1 overexpressed.
(C) Venn diagram of the overlap of the BrFA100 with the top 400 genes downregulated by TP53 (p53 expression score of less than −10, as listed in Fischer et al., 2016).
(D) Venn diagram showing the overlap of ChIP-seq datasets for E2F1, MYBL2, and FOXM1 with the BrFA100 list. Gene lists are shown in Table S3.
(E) Association p values of TP53, E2F1, MYBL2, and FOXM1 as individual pairs. p values were obtained through Fisher exact tests with Benjamini-Hochberg multiple test corrections.
(F) The percentage of tumors in each group of 50 that have a TP53 mutation and 1, 2, or 3 overexpressed transcription factors.
(G) Association of TP53, MYBL2, E2F1, and FOXM1 in 960 human breast tumors of the TCGA. Plots were generated at the cBioPortal (www.cbioportal.org). (G′) The percentage of the 960 tumors with a TP53 mutation and either an amplification (AMP) of the gene as defined by a positive GISTIC score or an upregulation (Up) of the mRNA as defined by a Z score > 2. TF, transcription factor.
See also Table S3.
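The pairwise association test described in (E) can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function names are hypothetical, the two-sided p value uses the common "sum of tables no more probable than the observed one" convention, and the 2x2 counts in the trailing comment are made up purely to show the call pattern.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    Example layout (hypothetical): rows = TP53 mutated / wild type,
    columns = transcription factor overexpressed / not overexpressed.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical usage: one table per gene pair, then correct all pairs together.
# raw = [fisher_exact_two_sided(30, 20, 10, 40), fisher_exact_two_sided(25, 25, 20, 30)]
# corrected = benjamini_hochberg(raw)
```

Correcting all pairwise p values in a single call matters here, because the Benjamini-Hochberg adjustment depends on the total number of tests performed.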
|
|
Figure 5. Overexpression of hE2F1, hFOXM1, and hMYBL2 Is Sufficient to Generate CIN Phenotypes in Xenopus Embryos
(A) 2-cell-stage embryos were injected with either RNA containing a stop codon after 33 nt (−) or functional hE2F1, hFoxM1, and hMybL2 (+), as detected by western blot.
(B) Representative images of TOPRO-stained normally dividing animal caps and two of the most common CIN phenotypes seen in triple-overexpressing embryos. Yellow arrows indicate a lagging chromatid; blue arrow indicates a micronucleus.
(C and D) Quantification of lagging chromatids (C) and micronuclei (D) in control, triply overexpressing, or singly injected embryos through fixed-animal cap analysis.
(E) Representative time-lapse series of an animal cap expressing H2B:GFP with normally dividing control cells (co-injected with Ruby-Dextran) or CIN-like phenotypes seen in neighboring triple-overexpressing cells. Blue arrows indicate abnormal divisions seen as lagging chromatids and micronuclei. Time points chosen to show anaphase events.
(F) Quantification of lagging chromatid events as seen in time-lapse videos of control embryos, triply overexpressing embryos, and overexpression of only xMYBL2.
Full supplemental videos are available upon request. Scale bars represent 40 μm in all images. **p < 0.01; ***p < 0.001; ****p < 0.0001, one-way ANOVA and Bonferroni post-test statistics. Error bars represent ±SEM. 8 hpf, 8 hr post-fertilization.
|
|
Figure 6. Characterization of the Tumors Scored as High and Low FA
(A) Tumor subtype distribution of the 100 tumors scored as the highest FA and lowest FA.
(B) Kaplan-Meier curve demonstrating that FA status indicates good prognosis for the luminal B subtype of tumors.
(C) Our two-hit model for the generation and propagation of functional aneuploidy; note that we do not indicate which event takes place first.
|
|
S1. Why aneuploidy changes the AAF of heterozygous alleles (Related to Figure 1). We calculate the AAF of a SNP as the number of next-generation sequencing reads of the alternate allele divided by the total number of reads at that locus. All heterozygous SNPs will generate an AAF around 0.5 in a normal sample. However, in the theoretical case where about half the cells in the tumor carry an extra copy of a chromosome, the AAF of each heterozygous SNP on that chromosome will either increase (if the extra copy carries the alternate allele) or decrease (if it carries the reference allele). The AAF depends on both the amount of aneuploidy (in this case, 3 chromosomes) and the percentage of cells in the tumor that have that aneuploidy (50% of the cells). In this theoretical case, the AAF for all the heterozygous SNPs on the extra chromosome will be either 0.42 or 0.58.
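As a rough check on these numbers, the sketch below computes the expected AAF of a heterozygous SNP on a chromosome that is trisomic in a given fraction of tumor cells. The function names and the two averaging conventions are illustrative assumptions rather than part of the original analysis. Averaging the per-cell AAFs reproduces the 0.42 and 0.58 quoted above, whereas weighting each cell by the number of chromosome copies it contributes to the read pool gives 0.40 and 0.60.

```python
def aaf_per_cell_average(alt_copies, total_copies, aneuploid_fraction):
    """Mean of per-cell AAFs: aneuploid cells mixed with diploid heterozygous cells."""
    aneuploid_aaf = alt_copies / total_copies
    diploid_aaf = 0.5  # one alternate and one reference copy
    return aneuploid_fraction * aneuploid_aaf + (1 - aneuploid_fraction) * diploid_aaf


def aaf_read_weighted(alt_copies, total_copies, aneuploid_fraction):
    """AAF when every chromosome copy contributes equally to the read pool."""
    alt = aneuploid_fraction * alt_copies + (1 - aneuploid_fraction) * 1
    total = aneuploid_fraction * total_copies + (1 - aneuploid_fraction) * 2
    return alt / total


# Trisomy in 50% of cells; the extra copy carries the alternate allele (2 of 3)
# or the reference allele (1 of 3).
gain_alt = aaf_per_cell_average(2, 3, 0.5)
gain_ref = aaf_per_cell_average(1, 3, 0.5)
print(round(gain_alt, 2), round(gain_ref, 2))
print(round(aaf_read_weighted(2, 3, 0.5), 2), round(aaf_read_weighted(1, 3, 0.5), 2))
```

Under either convention the shift from 0.5 grows with both the copy-number imbalance and the fraction of cells carrying it, which is the point the legend makes.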
|
|
S2. The pipeline used to measure aneuploidy in tumors (Related to Figure 1). Germline heterozygous SNPs are identified from breast cancer TCGA exome data based on their presence in paired normal samples. After calculation of alternate allele frequencies (AAFs; top plots), heterozygous SNPs are defined as those with an AAF ≥ 0.25 and ≤ 0.75 in the normal sample (middle plots). Histograms of the tumor AAFs of these initially heterozygous SNPs reveal aneuploidy and tumor heterogeneity in two ways. First, the central peak around AAF 0.5 broadens. Second, if there is LOH of chromosomes in a large percentage of the cells, the SNPs on those chromosomes shift toward an AAF of 0 or 1 and generate peaks outside the central peak. We also generate a second plot for each tumor (bottom plots), in which the AAF is on the y axis, chromosome position is on the x axis, and each SNP is shown as a single dot. The tumor used in this example was scored as having the 14th most functional aneuploidy of 522 tumors.
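The selection and histogram steps of this pipeline can be sketched in a few lines. This is only an illustration under stated assumptions: the input layout (a dict mapping a SNP identifier to alternate and total read counts), the function names, and the bin count are hypothetical, and the real pipeline works from TCGA exome variant calls rather than toy dictionaries.

```python
def germline_het_snps(normal_counts, low=0.25, high=0.75):
    """Return SNP ids whose AAF in the normal sample lies within [low, high]."""
    return {
        snp for snp, (alt, total) in normal_counts.items()
        if total > 0 and low <= alt / total <= high
    }


def tumor_aaf_histogram(tumor_counts, het_snps, n_bins=20):
    """Count germline-heterozygous SNPs per tumor AAF bin over [0, 1]."""
    counts = [0] * n_bins
    for snp in het_snps:
        if snp not in tumor_counts:
            continue
        alt, total = tumor_counts[snp]
        if total == 0:
            continue
        aaf = alt / total
        # An AAF of exactly 1.0 goes into the last bin.
        counts[min(int(aaf * n_bins), n_bins - 1)] += 1
    return counts


# Toy data with hypothetical SNP ids: (alternate reads, total reads).
normal = {"snp1": (10, 20), "snp2": (0, 30), "snp3": (12, 40), "snp4": (30, 30)}
tumor = {"snp1": (20, 20), "snp2": (0, 25), "snp3": (15, 30)}

het = germline_het_snps(normal)
print(sorted(het))
print(tumor_aaf_histogram(tumor, het, n_bins=4))
```

In the toy data, snp1 is heterozygous in the normal sample but has an AAF of 1 in the tumor, so it lands in the outermost bin, which is how an LOH event shows up outside the central peak.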
|
|
S3. Method of scoring FA according to the AAF plots (Related to Figures 1 and 2). We generated line graphs that represent the shape of the associated AAF plot and then calculated the standard deviation of the associated curve, which is visualized as the width of the peaks.
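One possible reading of this scoring step is sketched below: the FA score is taken as the standard deviation of the tumor AAFs of the germline-heterozygous SNPs, and tumors are then ranked so that rank 1 has the broadest distribution. The legend does not specify the exact computation on the line graph, so the use of a population standard deviation, the function names, and the toy values are all assumptions.

```python
from statistics import pstdev


def fa_score(tumor_aafs):
    """Spread of the tumor AAF distribution (assumed here to be the FA score)."""
    return pstdev(tumor_aafs)


def rank_by_fa(scores):
    """Map tumor id -> rank, where rank 1 is the broadest AAF distribution."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {tumor_id: position + 1 for position, tumor_id in enumerate(ordered)}


# Toy example with hypothetical tumor ids.
scores = {
    "tumor_narrow": fa_score([0.48, 0.50, 0.52]),
    "tumor_broad": fa_score([0.20, 0.50, 0.80]),
}
print(rank_by_fa(scores))
```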
|
|
S4. AAF histograms of the Normal (non-transformed) samples (Related to Figure 1). Here we show the matched normal samples from the patients whose tumors are shown in Figure 1A. The ranking of each tumor is shown as the number in the top left corner of each histogram.
|
|
S5. Method of distinguishing between whole and partial LOH events (Related to Figures 1 and 4). (A) We manually inspected the chromosome position vs. AAF plots. We scored every chromosome of every tumor by determining whether there were two major peak maxima, one below AAF 0.25 and one above 0.75, that extended along the chromosome. These AAF thresholds were chosen to rule out triploidization events, which generate peaks at 0.33 and 0.67. Note that we may miscall a chromosome LOH event if one parental chromosome outnumbers its homolog by more than 4 to 1. Each chromosome was scored as follows: if the peak was split across all positions of the chromosome, it was scored as “Whole Chromosome LOH”; if some regions of the chromosome showed splitting but others had allele frequencies between 0.25 and 0.75, it was scored as “Partial Chromosome LOH”; and if no splitting of peaks could be found along any contiguous region of the chromosome, it was scored as “No Chromosome LOH”. (B) The number of chromosomes with LOH events comprising a whole chromosome in breast tumors correlates with the FA score (similar to Fig 1D). The R² value was generated by fitting the points to a linear regression in Excel. (C) Partial chromosome events also correlated with tumor ranking, although this correlation was weaker than that of either the total number of LOH events or the whole chromosome events. The R² value was generated by fitting the points to a second-order polynomial curve in Excel. (D) TCGA RNA-seq gene expression data from primary solid tumor samples of breast cancer patients for MYBL2, E2F1, and FOXM1 were compared between the 200 highest and 200 lowest FA-scoring tumors. We stratified the data by breast cancer subtype (Basal, HER2+, Luminal A, and Luminal B), performed a t test between high-FA and low-FA tumors, and report the p value.
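The scoring in (A) was done by eye, but the rule can be approximated programmatically. The sketch below is an assumption-laden illustration, not the method used in the paper: it splits the position-ordered AAFs of a chromosome into fixed-size windows and treats a window as "split" when few SNPs remain in the 0.25 to 0.75 band. The window size, the `max_central_fraction` cutoff, and the function names are hypothetical.

```python
def window_is_split(aafs, low=0.25, high=0.75, max_central_fraction=0.2):
    """True when most SNPs in the window have left the central AAF band."""
    central = sum(1 for a in aafs if low <= a <= high)
    return central / len(aafs) <= max_central_fraction


def classify_chromosome(aafs, window=10):
    """Classify one chromosome from tumor AAFs sorted by chromosome position."""
    if not aafs:
        raise ValueError("no heterozygous SNPs on this chromosome")
    windows = [aafs[i:i + window] for i in range(0, len(aafs), window)]
    flags = [window_is_split(w) for w in windows]
    if all(flags):
        return "Whole Chromosome LOH"
    if any(flags):
        return "Partial Chromosome LOH"
    return "No Chromosome LOH"


# Toy chromosomes: fully split, split on one half only, and not split.
whole = [0.05, 0.95] * 10
partial = [0.05, 0.95] * 5 + [0.5] * 10
none = [0.5] * 20
print(classify_chromosome(whole), classify_chromosome(partial), classify_chromosome(none))
```

Because the 0.25 and 0.75 thresholds sit outside the 0.33 and 0.67 peaks expected from triploidization, a trisomic chromosome would be reported as "No Chromosome LOH" by this sketch, in line with the rationale given in the legend.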
|
|
S6. BrFA100 is significantly different from proliferation signatures (Related to Figure 4), and xtp53 knockdown alone is not responsible for Xenopus phenotypes (Related to Figure 5). (A) Venn diagrams showing the overlap of the BrFA100, CIN70, and each of 6 different proliferation signatures. Gene lists are available in Table S3. (B) A different visual representation of the overlap of the BrFA100 and the ChIP-seq datasets for E2F1, FoxM1, and MybL2. (C) Western blot analysis of p53 morpholino-injected Xenopus embryos at stage 9 and stage 22 shows that a significant decrease in tp53 protein level is not seen until much later than when most of our in vivo experiments take place. (D) This explains why we do not see an increase in the number of micronuclei or lagging chromosomes in p53MO-injected embryos (n = 30 for controls, n = 15 for p53MO experiments).
|