(C) PLOS One This story was originally published by PLOS One and is unaltered.

Identification of SaCas9 orthologs containing a conserved serine residue that determines simple NNGG PAM recognition [1]

['Shuai Wang', 'State Key Laboratory Of Genetic Engineering', 'School Of Life Sciences', 'Zhongshan Hospital', 'Fudan University', 'Shanghai', 'Chen Tao', 'Huilin Mao', 'Linghui Hou', 'Yao Wang']

Date: 2022-12

Due to different nucleotide preferences at target sites, no single Cas9 is capable of editing all sequences. This highlights the need to establish a Cas9 repertoire covering all sequences for efficient genome editing. Cas9s with simple protospacer adjacent motif (PAM) requirements are particularly attractive because they allow a wide range of genome editing, but identifying such Cas9s among the thousands of Cas9s in the public database is a challenge. We previously identified PAMs for 16 SaCas9 orthologs. Here, we compared the PAM-interacting (PI) domains in these orthologs and found that the serine residue corresponding to SaCas9 N986 was associated with the simple NNGG PAM requirement. Based on this discovery, we identified five additional SaCas9 orthologs that recognize the NNGG PAM. We further identified three amino acids that determined the NNGG PAM requirement of SaCas9. Finally, we engineered Sha2Cas9 and SpeCas9 to generate high-fidelity versions of Cas9s. Importantly, these natural and engineered Cas9s displayed high activities and distinct nucleotide preferences. Our study offers a new perspective to identify SaCas9 orthologs with NNGG PAM requirements, expanding the Cas9 repertoire.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: the authors have applied for a patent related to the work.
Funding: This work was supported by grants from the National Key Research and Development Program of China (2021YFA0910602, 2021YFC2701103 to YW), the National Natural Science Foundation of China (82070258, 81870199 to YW), the Open Research Fund of State Key Laboratory of Genetic Engineering, Fudan University (No. SKLGE-2104 to YW) and the Science and Technology Research Program of Shanghai (19DZ2282100 to YW). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2022 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Cas9 nucleases with flexible PAM requirements are crucial for large-scale genome editing. We previously developed Cas9 nucleases with highly flexible NNGG PAM recognition [13,14]. To rapidly identify additional natural Cas9 nucleases recognizing NNGG PAMs, we compared the PAM-interacting (PI) domains of SaCas9 orthologs and found that the serine residue corresponding to SaCas9 N986 was associated with the NNGG PAM. We identified five additional SaCas9 orthologs recognizing the NNGG PAM. We further engineered two of them to improve their specificity. Our study expands the Cas9 repertoire and provides a foundation to search for Cas9s with NNGG PAMs in the future.

Editing efficiency is a major hurdle of the CRISPR system. Every Cas nuclease has its own nucleotide preference [7]. For example, SpCas9 prefers guanine-rich sequences [8], while AsCas12a prefers adenine-rich sequences [9]. SpCas9 is generally considered the most efficient Cas nuclease, yet its efficiency varies from 0% to approximately 100% depending on the target sequence [8].
Although previous studies have focused on limitations of the PAM [10–12], the mere presence of a PAM within a locus does not guarantee that it can be efficiently edited. To achieve high genome editing efficiency, it is essential to establish a Cas9 repertoire that can accommodate all sequences.

The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-RNA-guided Cas endonuclease system is based on the bacterial adaptive immune system and has been utilized as a fast and efficient method for precise genome editing [1–6]. This system is made up of two main components: a Cas9 nuclease and a chimeric single-guide RNA (sgRNA) derived from CRISPR RNA (crRNA) and the trans-activating crRNA (tracrRNA) [2]. Cas9 and sgRNA combine to form a complex that recognizes the target DNA that is complementary to the 5′ end of the sgRNA [2]. In addition to sgRNA-target DNA complementarity, DNA recognition requires a specific DNA sequence, known as the protospacer adjacent motif (PAM), flanking the target sequence [2]. The PAM allows the Cas nuclease to discriminate between the target DNA and the DNA sequence encoding the sgRNA, but it also restricts the nuclease's ability to target any sequence in the genome.

Results

Genome editing for endogenous loci

Next, we tested the capacity of these Cas9s for genome editing at selected endogenous sites in HEK293T cells. Five days after transfection of Cas9 and sgRNA expression plasmid DNA, we extracted genomic DNA and amplified target sites by PCR. As an initial screen, we used the T7EI assay to rapidly analyze the efficiency of each Cas9. SmiCas9, Sha2Cas9, and SpeCas9 displayed higher editing efficiency, while SwaCas9 and Swa2Cas9 displayed lower editing efficiency (S4A and S4B Fig). In the subsequent experiments, we focused only on SmiCas9, Sha2Cas9, and SpeCas9. We compared the activity of these three Cas9s to that of SaCas9 at 13 endogenous sites with NNGGRT PAMs.
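To make the PAM notation concrete, the following minimal sketch scans the forward strand of a DNA string for protospacers followed by a given PAM, treating N as any base and R as A or G (standard IUPAC codes). The function names, the 20-nt protospacer default, and the test sequence are assumptions for illustration; this is not the site selection procedure used in the study.

```python
import re

# IUPAC degenerate codes commonly used in PAM notation (subset).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def pam_to_regex(pam):
    """Translate a PAM such as 'NNGGRT' into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_sites(seq, pam="NNGG", protospacer_len=20):
    """Return (start, protospacer, pam) for every forward-strand site.

    The PAM is expected immediately 3' of the protospacer.
    protospacer_len=20 is an illustrative default, not a value
    taken from the paper.
    """
    seq = seq.upper()
    pam_re = re.compile(pam_to_regex(pam))
    sites = []
    for start in range(len(seq) - protospacer_len - len(pam) + 1):
        pam_start = start + protospacer_len
        candidate = seq[pam_start:pam_start + len(pam)]
        if pam_re.fullmatch(candidate):
            sites.append((start, seq[start:pam_start], candidate))
    return sites


# Hypothetical sequence: one site ends in TTGGAT, another in CAGGCC.
example = "A" * 20 + "TTGGAT" + "A" * 16 + "CAGGCC"
print(len(find_sites(example, "NNGG")), "NNGG sites")
print(len(find_sites(example, "NNGGRT")), "NNGGRT sites")
```

In this toy sequence the shorter NNGG motif matches both candidate sites while NNGGRT matches only the first, which illustrates why a simpler PAM widens the set of targetable sequences. A practical scan would also cover the reverse-complement strand.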
All tested Cas9s were expressed from the same construct and achieved similar expression levels, as revealed by western blot (Fig 3A and 3B). All four Cas9 nucleases generated indels with different efficiencies depending on the target sites in HEK293T cells (Fig 3C). Interestingly, these Cas9s displayed different activities at some sites. For example, Sha2Cas9 displayed higher activity at site E0, while SmiCas9 and SpeCas9 displayed higher activity at site G10. SaCas9 displayed lower efficiency than the newly identified Cas9s at sites G3 and G9. These data demonstrated that these Cas9s prefer distinct target sequences. Overall, SaCas9, Sha2Cas9, and SpeCas9 displayed comparable activities, while SmiCas9 displayed lower activity (Fig 3D).

Fig 3. Genome editing for endogenous sites. (A) Schematic of the Cas9 expression constructs. (B) Protein expression level of Cas9s was measured by western blot. Cells without Cas9 transfection were used as a negative control. (C) Comparison of SaCas9, SmiCas9, Sha2Cas9, and SpeCas9 efficiency for genome editing at 13 endogenous loci. An additional “g” is added for U6 promoter transcription (n = 3). Underlying data for all summary statistics can be found in S1 Data. (D) Quantification of editing efficiency for SaCas9, SmiCas9, Sha2Cas9, and SpeCas9. Underlying data for all summary statistics can be found in S1 Data. https://doi.org/10.1371/journal.pbio.3001897.g003

Specificity of SmiCas9, Sha2Cas9, and SpeCas9

Next, we compared the specificity of SmiCas9, Sha2Cas9, SpeCas9, and SaCas9 using the GFP activation assay. A panel of sgRNAs with dinucleotide mutations along the protospacer was generated to detect the specificity of each Cas9. Off-target cleavage is considered to have occurred when the mismatched sgRNAs induce GFP expression. Overall, SaCas9 and SmiCas9 had negligible off-target effects, while Sha2Cas9 and SpeCas9 displayed moderate off-target effects (S5 Fig).
Specifically, SaCas9 was highly sensitive to mismatches at PAM-proximal and PAM-distal positions but relatively less sensitive at middle positions; SmiCas9 displayed minimal off-target effects with mismatches at all positions; and Sha2Cas9 and SpeCas9 were sensitive to mismatches at PAM-proximal positions 18 through 20 but less sensitive at other positions.

Recently, Tan and colleagues solved the crystal structure of the SaCas9/sgRNA–target DNA complex and identified four amino acid residues (R245, N413, N419, and R654) forming polar contacts within a 3.0-Å distance from the target DNA strand [18]. When one or more of these residues were replaced by alanine, SaCas9 specificity was significantly improved [18]. To investigate whether the specificity of Sha2Cas9 can be improved, we used pairwise alignment to identify the corresponding residues (R247, N415, S421, and R656; S6 Fig) and generated single amino acid mutants by alanine substitution. The GFP activation assay revealed that the R247A and N415A mutations could significantly improve specificity without compromising the on-target activity (S7A and S7B Fig). The R656A mutation also improved the specificity, although this was accompanied by markedly decreased on-target activity. We introduced the R247A and N415A double mutation into Sha2Cas9 to generate a high-fidelity version of Cas9 named Sha2Cas9-HF. The GFP activation assay revealed that the double mutation further improved specificity (Fig 4A).

Fig 4. Analysis of Sha2Cas9-HF and SpeCas9-HF specificity. (A) Schematic of the GFP activation assay for specificity analysis is shown on the top. A panel of sgRNAs with dinucleotide mutations is shown below. sgRNA activities were measured based on GFP expression. Mismatches are shown in red (n = 3). Underlying data for all summary statistics can be found in S1 Data. (B) Off-targets for the EMX1 locus were analyzed by GUIDE-seq.
Read numbers for on- and off-targets are shown on the right. Mismatches compared with the on-target site are shown and highlighted in color. https://doi.org/10.1371/journal.pbio.3001897.g004

In parallel, we identified the corresponding residues for SpeCas9 (R247, N415, S421, and R656; S6 Fig) and generated single amino acid mutants by alanine substitution (S8A Fig). The GFP activation assay revealed that the R247A, N415A, and S421A mutations could significantly improve specificity without compromising the on-target activity (S8B Fig). We introduced the R247A, N415A, and S421A triple mutation into SpeCas9 to generate a high-fidelity version of Cas9 named SpeCas9-HF. The GFP activation assay revealed that the triple mutation further improved specificity (Fig 4A).

Genome-wide unbiased off-target effects of Sha2Cas9, Sha2Cas9-HF, SpeCas9, and SpeCas9-HF were next evaluated by GUIDE-seq [19]. We evaluated two sites targeting the EMX1 gene and one site targeting the RUNX1 gene. Five days after transfection of the Cas9 plasmid, the sgRNA plasmid, and the GUIDE-seq oligos, we prepared libraries for deep sequencing. Sequencing and analysis showed that on-target cleavage occurred for all Cas9 nucleases at all three targets, as reflected by the high GUIDE-seq read counts (Fig 4B). High-fidelity versions of Cas9s displayed significantly fewer off-target effects than wild-type Cas9s, as reflected by the numbers of off-target sites and the off-target read counts. For example, SpeCas9 and SpeCas9-HF generated similar on-target read counts (225,292 versus 202,764) at the EMX1-sg2 site. SpeCas9 induced four off-target sites, while SpeCas9-HF induced two off-target sites. For one off-target, SpeCas9 generated 60,331 read counts, while SpeCas9-HF generated 1,061 read counts. For another off-target, SpeCas9 generated 5,634 read counts, while SpeCas9-HF generated 2 read counts. These data demonstrated that the occurrence of off-target events is significantly lower when using Sha2Cas9-HF and SpeCas9-HF.
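The read counts quoted above for the EMX1-sg2 site can be turned into fold reductions with a few lines of arithmetic. The sketch below is only an illustration: the dictionary layout and function names are hypothetical, and normalizing off-target reads to each nuclease's on-target reads is an assumption made here, not a metric reported in the study.

```python
# GUIDE-seq read counts quoted in the text for the EMX1-sg2 site.
ON_TARGET = {"SpeCas9": 225_292, "SpeCas9-HF": 202_764}
OFF_TARGET = {
    "off-target 1": {"SpeCas9": 60_331, "SpeCas9-HF": 1_061},
    "off-target 2": {"SpeCas9": 5_634, "SpeCas9-HF": 2},
}


def relative_reads(off_reads, on_reads):
    """Off-target reads expressed as a fraction of on-target reads."""
    return off_reads / on_reads


def raw_fold_reduction(site):
    """Ratio of wild-type to high-fidelity off-target read counts."""
    return site["SpeCas9"] / site["SpeCas9-HF"]


def normalized_fold_reduction(site):
    """Same ratio after scaling each count by its on-target reads.

    The normalization is an assumption for this sketch; it corrects
    for the slightly lower on-target read count of SpeCas9-HF.
    """
    wt = relative_reads(site["SpeCas9"], ON_TARGET["SpeCas9"])
    hf = relative_reads(site["SpeCas9-HF"], ON_TARGET["SpeCas9-HF"])
    return wt / hf


for name, site in OFF_TARGET.items():
    print(f"{name}: raw {raw_fold_reduction(site):.1f}-fold, "
          f"normalized {normalized_fold_reduction(site):.1f}-fold")
```

Because the on-target counts of the two nucleases differ by only about 10%, the raw and normalized ratios stay close to each other, so the conclusion does not hinge on the normalization choice.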
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001897

Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.