Rapid Quantitative Evaluation of CRISPR Genome Editing by TIDE and TIDER

Eva Karina Brinkman, Bas van Steensel

Published: 2021-09-03 DOI: 10.17504/protocols.io.bqzmmx46

DNA mutational analysis/methods

Abstract

Current genome editing tools enable targeted mutagenesis of selected DNA sequences in many species. However, the efficiency and the type of mutations introduced by the genome editing method are largely dependent on the target site. As a consequence, the outcome of the editing operation is difficult to predict. Therefore, a quick assay to quantify the frequency of mutations is vital for a proper assessment of genome editing actions. We developed two methods that are rapid, cost-effective, and readily applicable: (1) TIDE, which can accurately identify and quantify insertions and deletions (indels) that arise after introduction of double-strand breaks (DSBs); (2) TIDER, which is suited for template-mediated editing events including point mutations. Both methods only require a set of PCR reactions and standard Sanger sequencing runs. The sequence traces are analyzed by the TIDE or TIDER algorithm (available at https://tide.nki.nl or https://deskgen.com). The routine is easy, fast, and provides much more detailed information than current enzyme-based assays. TIDE and TIDER accelerate testing and designing of DSB-based genome editing strategies.

Steps

3.1 Control and Experimental Sample Generation

For both methods genomic DNA is isolated from the cell pool that was transfected with the nuclease or guide RNA alone (control) and from cells exposed to both Cas9 and guide RNA (experimental sample). For TIDER the experimental sample is also co-transfected with the donor template.

Then a region of about 500–1500 base pairs around the target site is amplified by PCR from DNA of the control and experimental sample (Fig. 1a, b).

Fig. 1 Method to generate the required input samples for TIDE and TIDER. Control and test samples can be obtained by PCR using primers spanning the CRISPR target site (primers a, b). The reference sequence (TIDER only) can be created in a similar way as site-directed mutagenesis [16] (see Section 3.2 for detailed explanation)

Next, the PCR amplicons are subjected to conventional Sanger sequencing. In the PCR product of the experimental sample, the sequence trace may consist of a combination of multiple sequences derived from unmodified DNA and DNA that has acquired a mutation (Fig. 2a).

Fig. 2 Overview of the TIDE and TIDER algorithms. Due to imperfect repair (and repair by homology-directed repair with a donor template) after cutting by a targeted nuclease, the DNA in the cell pool consists of a mixture of indels (and designed mutations). The various introduced mutations in the pool are disentangled by TIDE or TIDER. (a) TIDE requires as input a guide RNA sequence string and two sequence traces: (1) wild-type control, (2) composite test sample. (b) For quality control the aberrant sequence signal is visualized in control (black) and treated sample (green), together with the expected break site (vertical dotted line), the region used for alignment (pink bar), and the region used for decomposition (gray bar). A constant composite sequence signal is yielded after the break site. (c) Trace decomposition yields the spectrum of indels with their frequencies. (d) In the presence of +1 insertions, the base composition is estimated. (e) Input files for TIDER are identical to those for TIDE, plus one additional sequence file carrying the mutations designed in the donor template. (f) Quality plot showing only the proportion of desired mutated nucleotide(s) as designed in the donor template that is/are present in the control (black) and treated sample (green). The region for alignment (pink bar) and decomposition (gray bar) as used in TIDER are represented. (g and h) Decomposition gives the spectrum of indels (g) and the HDR events (h) with their frequencies

3.2 Reference Sample Generation (TIDER Only)

TIDER is required for genome editing experiments in the presence of a donor template. In addition to the control and experimental sample trace (see Section 3.1), TIDER requires one extra Sanger sequencing trace called “reference.” The reference is similar to the control sequence, except that it carries the desired base pair changes as designed in the donor template (Fig. 2e). There are two paths to obtain the reference sequence as described below:

4.1.

The reference sequence can be easily created in a 2-step PCR protocol based on site-directed mutagenesis [16].

Here, two additional primers are required that overlap and carry the desired mutation(s) (mutated primers c, d, which are the reverse complement of each other) (Fig. 1c). These primers are used in combination with the primers used for the amplification of the control and experimental sample (control primers a, b). The control forward primer a is combined with the reverse mutated primer c, and the forward mutated primer d with the control reverse primer b, resulting in two PCR amplicons that incorporate the designed mutations.

Then the two amplicons are denatured and hybridized at the complementary ends in an annealing reaction. The second PCR uses the annealing mixture as a template and the control forward and reverse primers (a and b) as primers. This PCR starts with an extension step followed by exponential amplification. This results in a PCR product carrying the designed mutations (see Notes 2 and 3).
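The relationship between the two mutated primers can be sketched in a few lines of Python. This is only an illustration: the function names, the 0-based coordinates, and the toy sequence are assumptions and not part of the protocol, and practical primer design also involves length and melting temperature checks that are not shown.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def mutated_primer_pair(control, mutations, start, end):
    """Build the overlapping mutated primers c and d (hypothetical helper).

    control    -- control amplicon sequence, forward strand, 5'->3'
    mutations  -- {0-based position: new base}, as designed in the donor template
    start, end -- 0-based, end-exclusive span of the primer on the control
    """
    bases = list(control.upper())
    for position, new_base in mutations.items():
        if not start <= position < end:
            raise ValueError("every designed mutation must lie inside the primer")
        bases[position] = new_base.upper()
    primer_d = "".join(bases[start:end])      # forward mutated primer
    primer_c = reverse_complement(primer_d)   # reverse mutated primer
    return primer_c, primer_d


# Toy example: a single G>A change at position 10 of a 20-nt control fragment.
primer_c, primer_d = mutated_primer_pair("ACGTACGTACGTACGTACGT", {10: "A"}, 5, 15)
print(primer_c, primer_d)
```

Primer a would then be paired with primer_c, and primer_d with primer b, in the first PCR round.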

4.2.

Alternatively, the reference DNA can be ordered as synthesized DNA. The design should have the same DNA sequence as the PCR product of the control sample, except that it should carry the designed mutation(s) as in the donor template. The annealing sequences for the forward and reverse primers (a, b) should also be present in the synthesized fragment. Similar to the control and test sample, the reference can be amplified with primers a and b (see Note 3).

3.3 Web Tool

Process the PCR products of the control, optional reference, and experimental sample by conventional Sanger sequencing.

The resulting sequence trace files (.ab1 or .scf format) are then uploaded into the TIDE or TIDER web tool (both available at http://tide.nki.nl and https://deskgen.com). In addition, a character string representing the guide RNA sequence (20 nt) is required as input (see Notes 4 and 5).

Then, the software will perform several calculations.

First, the guide RNA sequence is aligned to the control sequence in order to determine the position of the expected Cas9 break site.
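A minimal sketch of this step, under two assumptions that are not stated in the text above: the nuclease is SpCas9, which cuts 3 bp upstream of the PAM, and the guide matches the control read exactly. The function names are hypothetical, and the web tool's own alignment routine may differ (for example, it may tolerate mismatches).

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_break_site(control, guide, cut_offset=3):
    """Return the 0-based index of the first base after the expected break.

    cut_offset is the assumed distance (in bp) between the cut and the PAM.
    Returns None when the guide is not found on either strand.
    """
    control, guide = control.upper(), guide.upper()
    index = control.find(guide)
    if index != -1:
        # Guide lies on the sequenced strand: the PAM follows the match.
        return index + len(guide) - cut_offset
    index = control.find(reverse_complement(guide))
    if index != -1:
        # Guide lies on the opposite strand: the PAM precedes the match.
        return index + cut_offset
    return None


# Toy control read: 4 nt flank + 20-nt protospacer + AGG PAM + 4 nt flank.
guide = "GACGTTACCGGATCAAGCTT"
control = "TTTT" + guide + "AGG" + "CCCC"
print(find_break_site(control, guide))
print(find_break_site(reverse_complement(control), guide))
```

The same physical cut is reported in the coordinates of whichever strand was sequenced, which is why the two calls return different indices.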

Next, in all Sanger sequence traces an alignment window is automatically selected that runs from 100 to 15 bp upstream of the break site. The sequence segment in this window of the experimental sample (and the optional reference) is aligned to that of the control in order to determine any offset between the sequence reads. Users may change the default settings for these calculations, which is necessary when alignment problems occur with the defaults (see Notes 6 and 7).
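The offset determination can be illustrated with a simple sliding comparison of base calls. This is a sketch under stated assumptions, not the tool's actual alignment: the function names, the max_shift search range, and the match-count score are all hypothetical, and the toy reads are randomly generated.

```python
import random


def alignment_window(break_site, left=100, right=15):
    """Return (start, end) of the window from 100 to 15 bp upstream of the break."""
    start = max(0, break_site - left)
    end = break_site - right
    if end <= start:
        raise ValueError("break site is too close to the start of the read")
    return start, end


def read_offset(control, sample, break_site, max_shift=30):
    """Estimate the offset such that sample[i + offset] corresponds to control[i]."""
    start, end = alignment_window(break_site)
    segment = control[start:end]
    best_score, best_shift = -1, 0
    for shift in range(-max_shift, max_shift + 1):
        pos = start + shift
        if pos < 0 or pos + len(segment) > len(sample):
            continue  # shifted window would fall outside the sample read
        score = sum(a == b for a, b in zip(segment, sample[pos:pos + len(segment)]))
        if score > best_score:
            best_score, best_shift = score, shift
    return best_shift


# Toy data: the sample read starts 5 bases earlier than the control read.
rng = random.Random(0)
control = "".join(rng.choice("ACGT") for _ in range(200))
sample = "GATTA" + control
print(read_offset(control, sample, break_site=150))
```

Because the window lies upstream of the break site, it should be unaffected by the mixed signal that editing introduces downstream.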

Subsequently, two output plots are generated: one plot that can help with quality control and one that displays the indel/HDR spectrum.

3.4 Quality Control

To generate the quality control plot, the signals of all four nucleotides (A, G, T, C) at each position in the sequence file are used. In general, each position in the sequence trace is represented by one predominant nucleotide signal indicative of the actual nucleotide. The minor signals from the other three nucleotides are normally considered as background. In TIDE(R) the percentage of these aberrant nucleotides is plotted along the sequence trace of the control and the experimental sample. Thus, a value of 0% at a position indicates that the detected nucleotide does not differ from the control sequence, while a value of 100% indicates that the expected nucleotide was not detected at all (and instead only one or more of the other three nucleotides) (Fig. 2b). The percentages of aberrant nucleotides in the control should be low along the whole sequence trace. However, the experimental sample consists of a mixture of multiple sequences due to the presence of indels and possible point mutations. Around the break site the sequences start to deviate from the control, which is visible as a consistently elevated aberrant sequence signal. Note that there is a 25% chance that a mutated sequence carries the same nucleotide as the control sequence at a given position, because there are only 4 different nucleotides available. This plot allows the user to visually inspect the sequence deviation caused by the targeted nuclease and to verify the alignments and the quality of the data. It is important to confirm that:

(1) the break site is located as expected,

(2) the aberrant signal increases only around the break site, and

(3) remains elevated downstream of the break site.
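The aberrant signal described above can be computed as follows. This is a minimal sketch, assuming the trace has already been reduced to one peak height per nucleotide per position; the column order A, C, G, T, the function name, and the toy values are assumptions rather than the tool's internals.

```python
import numpy as np

BASES = "ACGT"  # assumed column order of the peak matrix


def aberrant_percent(peaks, control_seq):
    """Percentage of signal not belonging to the control nucleotide, per position.

    peaks       -- array of shape (n_positions, 4) with peak heights
    control_seq -- control base calls, one per row of peaks
    """
    peaks = np.asarray(peaks, dtype=float)
    expected_column = np.array([BASES.index(base) for base in control_seq.upper()])
    total = peaks.sum(axis=1)
    expected = peaks[np.arange(len(expected_column)), expected_column]
    aberrant = total - expected
    # Positions without any signal are reported as 0% instead of dividing by zero.
    return 100.0 * np.divide(aberrant, total, out=np.zeros_like(total), where=total > 0)


# Toy trace: clean A, a 50/50 mix of A and C, no A at all, A with 10% background.
peaks = [[100, 0, 0, 0], [50, 50, 0, 0], [0, 0, 0, 80], [90, 5, 5, 0]]
print(aberrant_percent(peaks, "AAAA"))
```

Plotting these values for the control and the experimental sample along the read gives a curve against which the three checks above can be made by eye.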

The sequence trace downstream of the break site is decomposed into its individual sequence components. The region used for this purpose is marked as the decomposition window. All parameters in TIDE(R) have default settings but can be adjusted if necessary. The user can interactively change the alignment and decomposition windows. Choosing a different decomposition window is often a remedy for locally poor sequence traces, which should be kept out of the window (see Notes 8–10).

For TIDER two additional quality plots are generated.

In one, the aberrant signal of the reference trace compared to the control trace is plotted. This can be used to verify whether the designed mutation(s) is/are present at the expected location.

In the second one, the percentage of the designed mutation(s) present in the experimental sample is plotted, representing the relative incorporation of the donor template (Fig. 2f).

3.5 Mutation Detection by Decomposition

10.

For the detection of individual mutations with the corresponding frequencies, the TIDE and TIDER software perform a decomposition of the mixed sequence signal in the experimental sample. This composite sequence trace is a linear combination of the wild-type (control) and the mutated sequences. For TIDE, the decomposition is performed on a sequence segment downstream of the break site. As a rule of thumb, the larger the decomposition window, the more robust the estimation of mutations (see Note 9).

To perform the decomposition, generate a set of sequence trace models that contain all possible indels of size {0..n} (n is by default set to 10).

The models are derived from the control trace and contain all nucleotide peak signals of the decomposition window shifted by the appropriate number of positions to the left or right.

A wild-type trace (shift 0) is also added as a model.

Then, using non-negative linear modeling, the combination of trace models that can best explain the composite sequence trace in the experimental sample is determined (Fig. 2c) (see Note 11). An R² value is calculated as a measure of the goodness of fit (see Notes 10 and 12), and the statistical significance of the detection of each indel is calculated.
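The decomposition can be sketched with non-negative least squares on simulated data. The sketch makes several assumptions: traces are arrays of normalized peak signals with one row per position, SciPy's nnls stands in for the tool's own fitting routine, R² is the ordinary coefficient of determination, and all function names are hypothetical. The significance test per indel is not shown.

```python
import numpy as np
from scipy.optimize import nnls


def shifted_models(control, start, end, max_indel=10):
    """Stack control-derived trace models for every indel size -n..+n.

    control is an (n_positions, 4) array of normalized peak signals.
    Negative shifts model deletions, positive shifts insertions, 0 is wild type.
    """
    shifts = list(range(-max_indel, max_indel + 1))
    # After an insertion of s bases, sample position p shows control position p - s.
    columns = [control[start - s:end - s].ravel() for s in shifts]
    return shifts, np.column_stack(columns)


def decompose(control, sample, start, end, max_indel=10):
    """Return ({indel size: fitted fraction}, R squared) for the window [start, end)."""
    shifts, models = shifted_models(control, start, end, max_indel)
    observed = sample[start:end].ravel()
    coefficients, _ = nnls(models, observed)
    fitted = models @ coefficients
    ss_res = np.sum((observed - fitted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return dict(zip(shifts, coefficients)), 1.0 - ss_res / ss_tot


# Simulated noise-free data: 60% wild type, 30% 2-bp deletion, 10% 1-bp insertion.
rng = np.random.default_rng(0)
control = np.eye(4)[rng.integers(0, 4, size=300)]  # one-hot "peaks" of a random read
start, end = 120, 250                              # window downstream of the break
sample = np.zeros_like(control)
sample[start:end] = (0.6 * control[start:end]
                     + 0.3 * control[start + 2:end + 2]
                     + 0.1 * control[start - 1:end - 1])
fractions, r_squared = decompose(control, sample, start, end)
print({k: round(v, 3) for k, v in fractions.items() if v > 0.01}, round(r_squared, 3))
```

With real traces the fit is never exact, which is why the R² value matters when judging the result.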

11.

For TIDER the mutation detection is more complex. It is mandatory that the decomposition window in TIDER covers the location of the designed mutation(s) in the donor template (see Notes 9 and 13). In contrast to TIDE, the decomposition window of TIDER spans by default only 100 bp. If only a few base pair changes are introduced, the sequence with the designed mutation will be very similar to the wild-type sequence. The smaller decomposition window of TIDER better emphasizes the difference between the control and the reference.

Simulations of all possible insertions and deletions are generated from the control file and placed in a decomposition matrix together with the control and reference. Subsequently, decomposition of the experimental sample is performed, thereby choosing the best combination of the models in the decomposition matrix. This results in an estimation of the incorporation frequency of template-directed mutation(s) and distinguishes these from the background of indels that are introduced by error-prone repair (see Note 14).
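A sketch of such a decomposition matrix, under the same kind of assumptions as any simplified model of the tool: traces are arrays of normalized peak signals with one row per position, SciPy's nnls is used as a stand-in solver, and the function name, labels, and simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def tider_decompose(control, reference, sample, start, end, max_indel=10):
    """Fit control, control-derived indel models and the reference to the sample."""
    labels, columns = [], []
    for shift in range(-max_indel, max_indel + 1):
        labels.append("control" if shift == 0 else f"indel {shift:+d}")
        columns.append(control[start - shift:end - shift].ravel())
    labels.append("reference")  # trace carrying the designed mutation(s)
    columns.append(reference[start:end].ravel())
    coefficients, _ = nnls(np.column_stack(columns), sample[start:end].ravel())
    return dict(zip(labels, coefficients))


# Simulated noise-free data with two designed base changes at positions 100 and 103.
rng = np.random.default_rng(1)
bases = rng.integers(0, 4, size=200)
reference_bases = bases.copy()
reference_bases[[100, 103]] = (reference_bases[[100, 103]] + 1) % 4
control = np.eye(4)[bases]
reference = np.eye(4)[reference_bases]

start, end = 60, 160  # 100-bp window covering the designed mutations
sample = np.zeros_like(control)
sample[start:end] = (0.5 * control[start:end]
                     + 0.3 * reference[start:end]
                     + 0.2 * control[start + 1:end + 1])  # 1-bp deletion
result = tider_decompose(control, reference, sample, start, end)
print({k: round(v, 3) for k, v in result.items() if v > 0.01})
```

The reference column differs from the control column at only a few entries, which illustrates why a short window that covers the designed mutation(s) keeps that difference from being diluted.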

12.

The reliability of TIDE and TIDER depends on the quality of the input samples (see Note 15). For an accurate TIDE(R) estimation it is recommended that (1) R² > 0.9 and (2) aberrant signals upstream of the break site are below 10% in the quality plot. This applies to all files: control, reference, and experimental sample. To verify the results the samples can be sequenced from the opposite strand (see Note 13).
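These two recommendations can be expressed as a simple check. The sketch below is an illustration only: the function name is hypothetical, and interpreting "below 10%" as the mean aberrant signal upstream of the break site is an assumption, since the text does not say whether the mean or every single position is meant.

```python
def passes_quality(r_squared, aberrant_percent, break_site,
                   min_r_squared=0.9, max_upstream_percent=10.0):
    """Apply the recommended thresholds to one trace (hypothetical helper).

    aberrant_percent -- per-position aberrant signal in percent
    break_site       -- 0-based index of the first base after the break
    """
    upstream = list(aberrant_percent[:break_site])
    if not upstream:
        raise ValueError("no positions upstream of the break site")
    mean_upstream = sum(upstream) / len(upstream)  # assumption: mean, not maximum
    return r_squared > min_r_squared and mean_upstream < max_upstream_percent


# Toy values: low background upstream, elevated signal downstream of the break.
print(passes_quality(0.95, [2.0, 3.0, 4.0, 60.0, 55.0], break_site=3))
```

The check would be repeated for each input file, since the recommendation applies to the control, the reference, and the experimental sample.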

3.6 Sequence Determination of the +1 Insertion (TIDE Only)

13.

During repair of a CRISPR-Cas9-induced break, a single base pair is frequently inserted at one of the DNA ends [13, 17, 18]. TIDE provides an estimate of the base composition of this insertion. This may be of interest if one wishes to obtain a particular sequence variant (Fig. 2d). For longer insertions this base calling is computationally complicated and currently not implemented.