|
Figure 1. Theoretical models of how gRNA-specific efficiencies and frameshift gene editing outcome probabilities influence the cellular composition and percentage of protein knockout cells in a mosaic F0 animal model. (A) There is a non-linear relationship between the gRNA-specific probability of obtaining a frameshift gene editing outcome (x-axis) and the probability of obtaining a biallelic frameshift gene editing outcome in a single cell (y-axis). For example, at a gRNA-specific frameshift frequency of 80%, the probability that a single biallelically edited cell is a biallelic frameshift mutant is 64% (0.80 × 0.80; grey demarcation). (B) Examples of theoretical gene editing outcomes (assuming 100% on-target efficiency) in an F0 mosaic, varying one parameter: the gRNA-specific probability of frameshift editing. (C) Examples of theoretical gene editing outcomes in an F0 mosaic, varying two parameters: the gRNA-specific probability of frameshift editing and the gRNA-specific on-target efficiency. For example, for a 100% efficient gRNA with an 80% gRNA-specific probability of frameshift editing, we expect 64% of the cells to be biallelic frameshift mutants (see grey demarcation in A). Note that blue circles represent cells that are biallelically edited but retain at least one in-frame mutation and therefore cannot be considered complete protein knockouts. (D) Flowchart of the pipeline for investigating the correlation between experimentally observed in vivo gene editing outcomes and the outcomes projected by computational prediction models.
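The arithmetic behind panels A to C can be sketched in a few lines. This is a minimal Python illustration, not the authors' code. It assumes (our assumption) that on-target efficiency is the fraction of cells in which both alleles are edited, that all remaining cells are unedited, and that each edited allele independently receives a frameshift with the gRNA-specific probability. The function name `expected_mosaic_composition` is hypothetical.

```python
def expected_mosaic_composition(frameshift_prob: float, efficiency: float,
                                n_cells: int = 100) -> dict:
    """Expected number of cells per outcome class in an F0 mosaic.

    Assumptions (illustrative only):
      - `efficiency` = fraction of cells in which both alleles are edited;
      - the remaining cells are unedited;
      - each edited allele is independently out of frame with `frameshift_prob`.
    """
    if not (0.0 <= frameshift_prob <= 1.0 and 0.0 <= efficiency <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    biallelic_fs = frameshift_prob ** 2  # both alleles out of frame (panel A)
    edited = efficiency * n_cells
    return {
        # complete protein knockout cells
        "biallelic_frameshift": edited * biallelic_fs,
        # edited cells retaining at least one in-frame allele (blue circles)
        "edited_with_inframe_allele": edited * (1.0 - biallelic_fs),
        "unedited": n_cells - edited,
    }


print(expected_mosaic_composition(0.80, 1.0))
print(expected_mosaic_composition(0.80, 0.5))
```

With a frameshift probability of 0.80 and 100% efficiency this gives about 64 biallelic frameshift cells out of 100, in line with the grey demarcation; under the stated assumption, lowering efficiency to 50% halves that number.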
|
|
Figure 2. The InDelphi prediction model, trained on mESC data, accurately predicts CRISPR/Cas9 gene editing outcomes in X. tropicalis embryos and outperforms several other prediction models. (A) Scatter plot of model-predicted versus experimentally observed cumulative frameshift gene editing frequencies for each sgRNA (n = 28) separately, in X. tropicalis embryos. Black demarcated lines show the perfect correlation r = 1. Light-grey shading shows the standard error of the best-fit linear regression line. (B) Scatter plot of model-predicted versus experimentally observed INDEL patterns for all gRNAs combined. Black lines show the linear regression models of all correlations. Black demarcated lines show the perfect correlation r = 1. (C) Correlations between model-predicted and experimentally observed INDEL patterns for each gRNA separately. Error bars represent mean ± SD. (***p < 0.001; **p < 0.01; *p < 0.05; ns = not significant; Shapiro-Wilk (p > 0.05); Levene (p < 0.05); one-way Welch ANOVA to adjust for unequal variances (p < 0.001), with Games-Howell multiple comparisons) (Table S2). (D) Violin plots of the residuals (predicted frequency − observed frequency) between model-predicted and experimentally observed frequencies of the +1 insertion gene editing outcome. (E) SEM of the mean residual difference (predicted frequency − observed frequency) between model-predicted and experimentally observed frequencies for all modeled deletion variants.
|
|
Figure 3. The InDelphi-mESC model accurately predicts CRISPR/Cas9 gene editing outcomes in X. tropicalis, X. laevis and zebrafish embryos, which can be exploited to identify gRNAs with high frameshift frequencies. (A-F) Scatter plots of InDelphi-mESC-predicted versus experimentally observed cumulative frameshift gene editing frequencies, for each sgRNA separately, in X. tropicalis (n = 14) (Panel A), X. laevis (n = 6) (Panel B) and zebrafish (n = 15) (Panel C) embryos; and scatter plots of InDelphi-mESC-predicted versus experimentally observed INDEL patterns, for all gRNAs combined, in X. tropicalis (n = 14) (Panel D), X. laevis (n = 6) (Panel E) and zebrafish (n = 15) (Panel F) embryos. Black demarcated lines show the perfect correlation r = 1. Light-grey areas show the standard error of the best-fit linear regression line. Black lines show the linear regression model. (G) Correlations between model-predicted and experimentally observed INDEL patterns for each gRNA separately. Correlations for X. tropicalis (n = 14) (dark blue) and X. laevis (n = 6) (middle blue) embryos were analyzed by Sanger sequencing and sequence trace decomposition; correlations for zebrafish embryos (n = 15) (light blue) were analyzed by targeted amplicon sequencing (TAS). (H) Using the distribution of expected frameshift frequencies for a large dataset of SpCas9 human target sites in mESCs from Shen et al. 2018 [27] (black line, monoallelic), we derive the corresponding distribution of the probability that a randomly designed gRNA generates biallelic frameshift editing. This distribution is shown for different editing efficiencies within the F0 mosaic animal: 100%, 50% and 25% (decreasing intensities of blue; 100 circles, each representing one cell in a mosaic of 100 cells). For example, assuming 100% efficiency, the probability that a randomly designed gRNA yields more than 80% biallelic frameshift mutant cells in a developing mosaic is the area under the curve highlighted in pink, which is only 3.24%.
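The derived distribution in panel H rests on one transformation: a gRNA with monoallelic frameshift frequency p is expected to give a biallelic frameshift cell fraction of about e·p² in a mosaic with editing efficiency e. This assumes, on our part, that efficiency is the fraction of cells with both alleles edited and that alleles are independent. The sketch below applies that transformation to an invented list of per-gRNA frequencies; it does not use the Shen et al. dataset and does not reproduce the 3.24% figure. `prob_exceeding_biallelic_threshold` is a hypothetical name.

```python
import math


def prob_exceeding_biallelic_threshold(frameshift_freqs: list[float],
                                       threshold: float = 0.80,
                                       efficiency: float = 1.0) -> float:
    """Fraction of gRNAs whose expected biallelic frameshift cell fraction exceeds `threshold`.

    frameshift_freqs: per-gRNA (monoallelic) frameshift frequencies in [0, 1].
    Expected biallelic frameshift fraction per gRNA = efficiency * p**2 (assumption).
    """
    if not frameshift_freqs:
        raise ValueError("empty input")
    biallelic = [efficiency * p ** 2 for p in frameshift_freqs]
    return sum(b > threshold for b in biallelic) / len(biallelic)


# Invented per-gRNA frameshift frequencies, for illustration only.
toy = [0.50, 0.62, 0.70, 0.78, 0.85, 0.90, 0.95]
for e in (1.0, 0.5, 0.25):
    print(e, prob_exceeding_biallelic_threshold(toy, 0.80, e))

# At 100% efficiency, exceeding 80% biallelic requires p > sqrt(0.8), about 0.894.
print(math.sqrt(0.80))
```

Because the threshold on p² corresponds to p > √0.8, only the extreme upper tail of the monoallelic distribution contributes, which is why the highlighted area is small.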
|
|
Figure 4. Integrating CRISPRscan and the InDelphi-mESC model allows identification of efficient, high-frameshift-frequency gRNAs in X. tropicalis. (A) Scatter plot with marginal histograms showing, for 339,693 gRNAs across the coding sequences of 4,860 X. tropicalis genes, the relationships between the calculated CRISPRscan score, the InDelphi-mESC-predicted frequency of microhomology-mediated end joining (MMEJ) repair and the InDelphi-mESC-predicted knockout score (KO-score). The KO-score is defined as the predicted percentage of cells with biallelic out-of-frame mutations within the pool of all mutant cells (i.e. in-frame and out-of-frame; mono- and biallelic) in the mosaic mutant embryo, and is calculated as the square of the frameshift frequency predicted by InDelphi-mESC. For each gene (n = 4,860), the gRNA with the highest predicted KO-score (highest-in-class) is highlighted in blue and the gRNA with the lowest predicted KO-score (lowest-in-class) in orange. Demarcations indicate the quadrants in which gRNAs meet given cutoff thresholds. Ideally, designed gRNAs fall within the aquamarine demarcation (high predicted KO-score, high CRISPRscan score), not within the orange (low predicted KO-score, high CRISPRscan score) or purple demarcation (high predicted KO-score, low CRISPRscan score). (B) Violin plot showing that highest-in-class and lowest-in-class gRNAs have a higher predicted percentage of repair by MMEJ than a random selection of gRNAs (****p < 0.001; Table S2). (C) No distinct difference in calculated CRISPRscan scores between highest-in-class gRNAs, lowest-in-class gRNAs and a random selection of gRNAs. (D) Comparison of three pairs of gRNAs targeting the second exon of the tyrosinase gene, which is responsible for pigmentation in X. tropicalis.
Because these three pairs of gRNAs have very similar genome editing efficiencies, as determined by targeted amplicon sequencing, the effect of differences in predicted KO-score on phenotypic penetrance can be assessed. (D, E) Phenotypic scoring is based on retinal pigmentation at Nieuwkoop-Faber stage 38; under very similar genome editing efficiencies, a trend is observed in which gRNAs with higher predicted KO-scores yield higher phenotypic scores.
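The KO-score definition and the highest-/lowest-in-class selection in panel A can be expressed compactly. This is a hedged sketch, not the pipeline used for the 339,693 gRNAs: the `Guide` record, function names, gene and guide labels are hypothetical, and the cutoffs in `is_preferred` are placeholders because the exact thresholds behind the demarcations are not stated in this legend.

```python
from collections import defaultdict
from typing import NamedTuple


class Guide(NamedTuple):
    gene: str
    guide_id: str
    frameshift_freq: float  # InDelphi-mESC predicted frameshift frequency, 0-1
    crisprscan: float       # CRISPRscan score


def ko_score(frameshift_freq: float) -> float:
    """Predicted % of mutant cells that are biallelic out-of-frame: 100 * f**2."""
    return 100.0 * frameshift_freq ** 2


def best_and_worst_per_gene(guides: list[Guide]) -> dict[str, tuple[Guide, Guide]]:
    """Map each gene to (highest-in-class, lowest-in-class) by predicted KO-score."""
    by_gene: dict[str, list[Guide]] = defaultdict(list)
    for g in guides:
        by_gene[g.gene].append(g)
    result = {}
    for gene, gs in by_gene.items():
        ranked = sorted(gs, key=lambda g: ko_score(g.frameshift_freq))
        result[gene] = (ranked[-1], ranked[0])
    return result


def is_preferred(g: Guide, min_ko: float = 50.0, min_crisprscan: float = 50.0) -> bool:
    """High KO-score and high CRISPRscan score; both cutoffs are hypothetical."""
    return ko_score(g.frameshift_freq) >= min_ko and g.crisprscan >= min_crisprscan


# Invented example guides.
guides = [Guide("geneA", "a1", 0.90, 60.0),
          Guide("geneA", "a2", 0.50, 70.0),
          Guide("geneB", "b1", 0.70, 40.0)]
for gene, (best, worst) in best_and_worst_per_gene(guides).items():
    print(gene, best.guide_id, worst.guide_id, is_preferred(best))
```

Ranking on the KO-score rather than on the raw frameshift frequency gives the same order, since squaring is monotonic on [0, 1]; the squared value is kept because it is the quantity plotted in panel A.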
|
|
Fig. S1: ezh2 CRISPR/Cas9 gene editing outcomes can be accurately predicted with the online prediction algorithm InDelphi. (A) Column graphs overlaying variant calls (%) from in vivo observations and in silico predictions. (B) Pearson correlation, with significance interval, between in vivo observations and in silico predictions for the ezh2 gRNA.
|
|
Fig. S2: Pearson correlations between in vivo observed (obtained by targeted amplicon sequencing) and corresponding in silico predicted variant frequencies for 28 gRNAs injected in X. tropicalis embryos. gRNAs are injected as Cas9/gRNA-ribonucleoprotein complexes at early developmental stages (2- to 8-cell stage). Target regions are PCR amplified and sequenced using MiSeq sequencing (Illumina), and raw data are processed using the BATCH-GE analysis software. In silico predictions are generated by the InDelphi software algorithm. Plots show correlations between in vivo observed and in silico predicted variant frequencies. x_g1, x_g2, x_g3 refer to different guide RNAs against the same gene. (****p < 0.0001; ***p < 0.001; **p < 0.01).
|
|
Fig. S3: Pearson correlations between in vivo observations (generated by Sanger sequencing and sequence trace deconvolution) and corresponding in silico predictions for 14 gRNAs injected in X. tropicalis embryos. gRNAs are injected as Cas9/gRNA-ribonucleoprotein complexes at an early developmental stage (1-cell stage). Target regions are PCR amplified, sequenced using Sanger sequencing and deconvoluted using the Inference of CRISPR Edits (ICE) algorithm. In silico predictions are generated by the InDelphi software algorithm. Plots show correlations between in vivo observed and in silico predicted variant frequencies. x_g1, x_g2 refer to different guide RNAs against the same gene. (****p < 0.0001; ***p < 0.001; **p < 0.01; *p < 0.05; ns = not significant).
|
|
Fig. S4: Pearson correlations between in vivo observations (generated by Sanger sequencing and sequence trace deconvolution) and corresponding in silico predictions for 10 gRNAs injected in X. laevis embryos. gRNAs are injected as Cas9/gRNA-ribonucleoprotein complexes at an early developmental stage (1-cell stage). Target regions are PCR amplified, sequenced using Sanger sequencing and deconvoluted using the Inference of CRISPR Edits (ICE) algorithm. In silico predictions are generated by the InDelphi software algorithm. Plots show correlations between in vivo observed and in silico predicted variant frequencies. Gene name_S and gene name_L refer to the two homeologues of a given gene located on the short (S) and long (L) chromosomes, respectively. (****p < 0.0001; **p < 0.01; *p < 0.05; ns = not significant).
|
|
Fig. S5: Pearson correlations between in vivo observations (generated by targeted amplicon sequencing) and corresponding in silico predictions for 15 gRNAs injected in zebrafish embryos. gRNAs are injected as Cas9/gRNA-ribonucleoprotein complexes at an early developmental stage (1-cell stage). Target regions are PCR amplified and sequenced using MiSeq sequencing (Illumina), and raw data are processed using the BATCH-GE analysis software. In silico predictions are generated by the InDelphi software algorithm. Plots show correlations between in vivo observed and in silico predicted variant frequencies. x_g1, x_g2, x_g3 refer to different guide RNAs against the same gene. (****p < 0.0001; ***p < 0.001; **p < 0.01).
|
|
Fig. S6: Images of the eyes of tyrosinase mutant embryos, with the associated threshold masks used for quantification.
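The legend does not describe how the threshold masks are turned into a pigmentation measure. One common approach, sketched here purely as an assumption, is to mark pixels darker than a grey-level threshold as pigmented and report their fraction within the eye region. The threshold of 100, the function names and the pre-cropped eye image are all hypothetical; this is not the authors' quantification procedure.

```python
def threshold_mask(image: list[list[int]], threshold: int) -> list[list[bool]]:
    """Binary mask: True where a grayscale pixel (0 = black, 255 = white) is darker than threshold."""
    return [[px < threshold for px in row] for row in image]


def pigmented_fraction(image: list[list[int]], threshold: int = 100) -> float:
    """Fraction of pixels classified as pigmented; assumes the image is already cropped to the eye."""
    mask = threshold_mask(image, threshold)
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("empty image")
    return sum(sum(row) for row in mask) / total


# Invented 2 x 3 grayscale patch.
patch = [[0, 50, 200],
         [255, 90, 120]]
print(pigmented_fraction(patch))
```

A lower pigmented fraction would correspond to stronger loss of tyrosinase function; in practice the threshold would need to be fixed across all images so that scores remain comparable.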
|