CRISPR-based genome engineering revolutionized the gene editing field by making experimental workflows considerably easier, faster, and more efficient than previous methods. Still, generating reliable results from CRISPR edit data requires the help of robust software tools. As a consequence, a critical step in the gene editing workflow – analyzing the data – is often under-appreciated or over-looked.
Synthego has developed a new tool called ICE (short for Inference of CRISPR Edits) to make analyzing CRISPR experiments easier than ever. This tool was initially created to support the CRISPR analysis needs of Synthego’s scientists – there were simply no other suitable software tools available. We took what we learned from analyzing the results of all the CRISPR experiments we performed to build the ICE analysis tool. It’s now free for everyone to use at ice.synthego.com.
We rigorously evaluated the effectiveness of the ICE tool by analyzing thousands of edits performed over multiple experiments and comparing the robustness, accuracy, and speed with existing tools. For example, we examined ICE analysis of Sanger sequencing data alongside the analysis of NGS-based amplicon sequencing data and found that the accuracy of ICE analysis results was highly comparable (within r2=0.96) to that of NGS data.
Advantages of ICE
Determine Editing Efficiency and Types of Edits with Low-Cost Sanger Sequencing
ICE uses Sanger sequencing data to produce quantitative, NGS-quality analysis of CRISPR editing – enabling a ~100-fold reduction in cost relative to NGS-based amplicon sequencing. To use the tool, you start by uploading your Sanger sequencing files, either one at a time or as a batch of hundreds, and indicate the guide RNA sequence you used. The ICE tool will calculate overall editing efficiency and determine the profiles of all the different types of edits that are present and their relative abundances.
The ICE tool also provides a Knockout score (KO-score) which represents the proportion of cells that have either a frameshift or 21+ bp indel. This score is a useful measure for those who are interested in understanding how many of the contributing indels are likely to result in a functional KO of the targeted gene.
Analyze Complex Edits
Traditional Sanger sequencing-based analysis tools are unable to detect or analyze complex edits, such as those generated by delivering multiple sgRNAs to the cells at once (referred to as “multiplexing”). Multiplexing is a commonly used strategy for creating functional Knockouts or large deletions. Thus the ability to analyze and interpret data about these types of edits is critical to the CRISPR workflow. We addressed this in our ICE algorithm by enabling the analysis of edits resulting from multiple sgRNA targets. The tool also includes visual representations of all detected edit types in the sample. The visualization helps to illustrate which of the multiplexed sgRNA was involved in the particular edit and how it was involved.
Use ICE to easily detect large indels in your #CRISPR data without having to do #NGS. Click To Tweet
How ICE Works
Following delivery of CRISPR components into target cells, genomic DNA from both edited and unedited (control) populations is PCR-amplified and Sanger sequenced. ICE compares these sequence traces to give a detailed analysis of CRISPR editing. ICE software identifies the percentage of genomes that have been successfully modified with insertions or deletions (indels) and then characterizes the sequence and abundance of each particular indel.
For a complete guide to ICE analysis, including detailed explanation and examples, read the paper on bioRxiv.
Using the ICE Tool
Learning how to use the ICE software tool is quick and easy. Simply upload your Sanger sequencing files and provide basic information such as your sample names and guide sequences, and ICE will do the rest. There are no parameters that need optimizing and no complicated steps to learn. For increased flexibility and scalability, the ICE software has two analysis formats: “sample by sample” analysis, which can compare up to five editing experiments at a time, and “batch” analysis, which compares hundreds of samples simultaneously.
The current version of ICE software can analyze indels that result from single or multiplex CRISPR-Cas9 double-strand DNA breaks using SpCas9. More features will be added to the ICE software tool soon. To request other nucleases or other features and functionalities, please email email@example.com.
Overview of ICE Editing Analysis
Once the analysis is complete, a new screen appears with a graphical representation of the results and a list of the analyzed samples (see below).
If the sample run had no issues, the analysis window shows a green checked circle in front of the sample name. If there was a minor error during processing, the window shows a yellow checked circle. Typically, a yellow check mark indicates that a particular parameter needs to be adjusted to generate results. ICE automatically adjusts these parameters. If there are no results or there was a processing error, you will see a red exclamation point in front of that sample. You can hover over the yellow or red circles to gather details on the issues associated with each sample.
Successfully analyzed samples will display the following parameters:
Sample– The unique label name that you provided for each sample.
Guide Target– This is the user-defined 17-23 nucleotide sequence of the DNA-targeting region of the guide RNA, excluding the PAM sequence.
PAM Sequence– The Protospacer Adjacent Motif (PAM) sequence for the nuclease used. Currently, ICE is configured for the nuclease from Streptococcus pyogenes (SpCas9).
ICE– The editing efficiency (percentage of the pool with non-wild type sequence) as determined by comparing the edited trace to the control trace. In the ICE algorithm, potential editing outcomes are proposed and fitted to the observed data using linear regression.
R2 Score– When the ICE linear regression is computed during generation of the ICE Score, the Pearson correlation coefficient (r) is also computed and reported. The higher the R2 value, the more confident you can be in the ICE score.
KO-Score– Represents the proportion of cells that have either a frameshift or 21+ bp indel. This score is a useful measure for those who are interested in understanding how many of the contributing indels are likely to result in a functional Knockout (KO) of the targeted gene.
The analysis can be sorted by any of the parameters displayed in the summary table. You can also use your browser’s “Control+F” (or “Command+F” on a Mac) functionality to search for a particular sequence or name. You can perform more in-depth analyses on each sample by clicking on the sample name or on its corresponding bar graph entry. Clicking to initiate the in-depth analysis will open up a new window with three tabs. Each of the three tabs – Contributions, Indel Distribution, and Traces – provides particular details about the indel profile of the edited sample.
You can download the entire analysis as a ZIP file by clicking the “Download Analysis Data” button on the bottom right of the analysis screen.NGS-quality #CRISPR analysis from Sanger sequencing data. Click To Tweet
ICE Analysis Details
The “Contributions” tab shows the inferred sequences present in your edited population and their relative representation in the edited pool. The black vertical dotted line represents the cut site, and “+” symbol on the far left marks the wild type. If you are viewing a multiplex sample, the cut site will be aligned to the most upstream cut site.
In the “Indel Distributions” tab, you’ll find an Indel plot which displays the inferred distribution of indel sizes in the entire edited population of genomes. Hovering over each bar of the Indel plot shows the size of the insertion or deletion (+ or – 1 or more nucleotides), along with the percentage of genomes that contain it.
The discordance plot shows the level of disagreement between the non-edited wild type (control) and the edited sample in the inference window (the region around the cut site). It shows, base-by-base, the average amount of signal that disagrees with the reference sequence derived from the control trace file. On the plot, the green (edited sample) and orange (control sample) lines should be close together before the cut site, and a typical CRISPR edit results in a jump in the discordance near the cut site and continuing after the cut site (representing a high level of sequence discordance).
The “Traces” tab shows the edited and control, non-edited Sanger traces in the region around the guide binding site(s). The sequence base calls from the .ab1 file are also shown above each trace. The horizontal black underlined region represents the guide sequence, and the horizontal red underline is the PAM site. The vertical black dotted line represents the cut site. Cutting and error-prone repair typically result in mixed sequencing bases downstream of the cut.
Finally a good data analysis tool for the last step in the Design → Edit → Analyze #CRISPR workflow. Click To Tweet