(C) PLOS One. This story was originally published by PLOS One and is unaltered.

Genetic diversity and evolutionary convergence of cryptic SARS-CoV-2 lineages detected via wastewater sequencing [1]

Devon A. Gregory (Department of Molecular Microbiology and Immunology, University of Missouri-School of Medicine, Columbia, Missouri, United States of America); Monica Trujillo (Department of Biological Sciences and Geology)

Date: 2022-12

Wastewater-based epidemiology (WBE) is an effective way of tracking the appearance and spread of SARS-CoV-2 lineages through communities. Beginning in early 2021, we implemented a targeted approach to amplify and sequence the receptor binding domain (RBD) of SARS-CoV-2 to characterize viral lineages present in sewersheds. Over the course of 2021, in 9 sewersheds located in 3 states in the USA, we reproducibly detected multiple SARS-CoV-2 RBD lineages that have never been observed in patient samples. These cryptic lineages contained between 4 and 24 amino acid substitutions in the RBD and were observed intermittently, for as long as 14 months, in the sewersheds in which they were found. Many of the amino acid substitutions in these lineages occurred at residues also mutated in the Omicron variant of concern (VOC), often with the same substitutions. One of the sewersheds contained a lineage that appeared to be derived from the Alpha VOC, but the majority of the lineages appeared to be derived from pre-VOC SARS-CoV-2 lineages. Specifically, several of the cryptic lineages from New York City appeared to be derived from a common ancestor that most likely diverged in early 2020. While the source of these cryptic lineages has not been resolved, it seems increasingly likely that they were derived from long-term patient infections or animal reservoirs. Our findings demonstrate that SARS-CoV-2 genetic diversity is greater than what is commonly observed through routine SARS-CoV-2 surveillance.
Wastewater sampling may more fully capture SARS-CoV-2 genetic diversity than patient sampling and could reveal new VOCs before they emerge in the wider human population.

During the COVID-19 pandemic, wastewater-based epidemiology has become an effective public health tool. Because many infected individuals shed SARS-CoV-2 in feces, wastewater has been monitored to reveal infection trends in the sewersheds from which the samples were derived. Here we report novel SARS-CoV-2 lineages in wastewater samples obtained from 3 different states in the USA. These lineages appeared intermittently in specific sewersheds over periods of up to 14 months, but generally have not been detected beyond the sewersheds in which they were initially found. Many of these lineages may have diverged in early 2020. Although these lineages share considerable overlap with each other, they have never been observed in patients anywhere in the world. While the wastewater lineages resemble lineages observed in long-term infections of immunocompromised patients, animal reservoirs cannot be ruled out as a potential source.

Funding: This project has been funded in part with federal funds from the NIDA/NIH (www.nida.nih.gov/) under contract number 1U01DA053893-01 to JW and MCJ, and by the New York City Department of Environmental Protection (www.nyc.gov/dep) under contract number 1484-RDOP to JJD. This work was supported by the Rockefeller Regional Accelerator for Genomic Surveillance (www.rockefellerfoundation.org, 133 AAJ4558) and by Wisconsin Department of Health Services Epidemiology and Laboratory Capacity funds (www.dhs.wisconsin.gov, 144 AAJ8216) to DHO. The work was supported by funds from the California Department of Health (www.dhcs.ca.gov/) to RSK. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The DEP played no role in study design, data collection, analysis, or preparation of the manuscript. However, the DEP required that it review the manuscript and approve its publication.

Data Availability: The MO raw sequence reads are available in NCBI’s SRA under the BioProject accession PRJNA748354. The NY raw sequence reads are available in NCBI’s SRA under the BioProject accession PRJNA715712. The indicated NCBI SRA data can be found at https://www.ncbi.nlm.nih.gov/sra. The script used for the haplotype condensation can be found at https://github.com/degregory/SARS2_Cryptic_WW/blob/main/Deconv_condenser.py.

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Here we describe an expanded set of cryptic lineages from multiple locations around the United States. While each sewershed contains its own signature lineages and at least some of the lineages appear to have diverged independently from one another, we present evidence that some likely shared a common ancestor. Finally, we show evidence of strong positive selection and rapid divergence of these lineages from ancestral SARS-CoV-2.

Using our targeted sequencing approach, we identified and previously reported circulating VOCs in different sewersheds around the United States [8, 9]. Variant frequencies in these sewersheds closely tracked VOC frequency estimates from clinical sampling in the same areas [8, 9]. However, in some locations, we noted the presence of cryptic lineages not observed in clinical samples anywhere in the world. Several of these lineages contained amino acid substitutions that were rarely reported in global databases such as gisaid.org [22–24] (e.g., N460K, Q493K, Q498Y, and N501S) [8].
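Three of the four substitutions named above (Q493K, Q498Y, N501S) fall at RBD positions that Omicron BA.1 also changes, though to different residues. A minimal Python sketch of that position-and-residue comparison follows, assuming substitutions are written as reference residue, Spike position, and alternate residue (e.g. N501Y). The BA.1 RBD list is an illustrative assumption based on commonly reported lineage definitions and should be checked against an authoritative source; the function names are hypothetical, and the `same_site` label is only a simplified proxy for the "similar" category in the lineage figures, since it ignores the chemistry of the substituted residue.

```python
import re
from typing import Dict, Iterable, Tuple

# Illustrative assumption: Omicron BA.1 RBD substitutions as commonly reported.
# Verify against an authoritative lineage definition before real use.
BA1_RBD = (
    "G339D", "S371L", "S373P", "S375F", "K417N", "N440K", "G446S", "S477N",
    "T478K", "E484A", "Q493R", "G496S", "Q498R", "N501Y", "Y505H",
)

# Reference residue, Spike position, alternate residue, e.g. "N501Y".
_SUB_RE = re.compile(r"^([A-Z*])(\d+)([A-Z*])$")


def parse_substitution(sub: str) -> Tuple[str, int, str]:
    """Split a substitution such as 'N501Y' into ('N', 501, 'Y')."""
    match = _SUB_RE.match(sub)
    if match is None:
        raise ValueError(f"not an amino acid substitution: {sub!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def compare_to_voc(lineage: Iterable[str],
                   voc: Iterable[str] = BA1_RBD) -> Dict[str, str]:
    """Label each lineage substitution relative to a VOC's substitutions.

    'identical' -> same position and same alternate residue as the VOC
    'same_site' -> same position, different residue (simplified proxy for
                   'similar'; residue chemistry is ignored)
    'none'      -> position not mutated in the VOC
    """
    voc_alt: Dict[int, str] = {}
    for sub in voc:
        _, pos, alt = parse_substitution(sub)
        voc_alt[pos] = alt
    labels: Dict[str, str] = {}
    for sub in lineage:
        _, pos, alt = parse_substitution(sub)
        if pos not in voc_alt:
            labels[sub] = "none"
        elif voc_alt[pos] == alt:
            labels[sub] = "identical"
        else:
            labels[sub] = "same_site"
    return labels
```

For example, `compare_to_voc(["N460K", "Q493K", "Q498Y", "N501S"])` labels N460K as `none` and the other three as `same_site`, while K417N would be labeled `identical`. Synonymous markers such as "syn" are rejected by the parser and would need to be filtered out first.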
Interestingly, polymorphisms in these lineages show considerable overlap with the Omicron VOC and with each other, suggesting convergent evolution due to similar selective pressures.

A. Schematic of regions targeted by the RBD and S1 primer sets (see Methods for primer sequences). Overview of the SARS-CoV-2 Spike RBD lineages identified in B. the MO33 sewershed and C. the MO45 sewershed. Each row represents a unique lineage and each column is an amino acid position in the Spike protein (left). Amino acid changes similar to (green boxes) or identical to (orange boxes) changes in Omicron (BA.1, BA.2, or BA.5) are indicated. Synonymous changes (syn) are indicated in gray. The major US VOCs (Alpha, Beta, Gamma, BA.1, BA.2, and BA.5) are indicated. The heatmap (right) illustrates lineage (row) detection by date (column), colored by the log10 percent relative abundance of that lineage. Uncondensed output is in S1 and S2 Data.

To address these issues, we developed a "targeted" sequencing approach that amplifies and sequences the Spike RBD of the SARS-CoV-2 genome as a single amplicon (Fig 1A) [8, 9]. Since the Spike RBD is relevant to SARS-CoV-2 infectivity, transmission, and antibody-mediated neutralization [17–21], this approach ensures that the RBD receives high sequencing coverage. Additionally, RBD sequencing enables linkage of polymorphisms, forming short, phased haplotypes [16]. These phased haplotypes permit easier lineage identification, even at low concentrations, if the targeted sequences are rich in lineage-defining polymorphisms [9].

The continuing evolution of SARS-CoV-2 [10] and the appearance of variants of concern (VOC), such as the Omicron VOC [11], highlight the importance of maintaining a vigilant watch for the emergence of unexpected, novel variants. The fact that the origins and early spread of the Alpha and Omicron VOCs were not observed strongly motivates efforts to detect and monitor novel variants [12].
However, whole genome sequencing of SARS-CoV-2 RNA isolated from wastewater often suffers from low depth of coverage in epidemiologically relevant areas of the genome, such as the Spike receptor binding domain (RBD) [13–15]. Additionally, because wastewater may contain a mixture of viral lineages and whole genome sequencing relies on sequencing small fragments of the genome, computational strategies to identify variants with linked mutations often fail to identify lineages present at low concentrations [16]. These features have made it difficult to detect unexpected, novel variants in whole genome sequencing data from wastewater samples.

SARS-CoV-2 is shed in the feces of infected individuals [1, 2], and SARS-CoV-2 RNA can be extracted and quantified from community wastewater to provide estimates of SARS-CoV-2 community prevalence [3, 4]. This approach is especially powerful since it randomly samples all community members and can detect viruses shed by individuals whose infections are not recorded, such as asymptomatic individuals, those who abstain from testing, or those who test at home [5, 6]. Additionally, SARS-CoV-2 RNA isolated from wastewater can be sequenced using high-throughput sequencing technologies to define the composition of variants in the community [7–9].

2. Results

Beginning in early 2021, wastewater surveillance programs that included RBD amplicon sequencing (Fig 1A) were independently implemented in Missouri [9] and NYC [25]. A similar strategy was subsequently adopted in California by the University of California, Berkeley wastewater monitoring laboratory (COVID-WEB). All of the sequence output was analyzed with our previously described SAM Refiner pipeline [9], which is designed to remove PCR-generated chimeric sequences.
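SAM Refiner's own chimera-removal logic is described in [9] and is not reproduced here. As a generic illustration of the idea only, the sketch below flags a haplotype as a candidate PCR chimera when it can be assembled from the prefix of one more abundant haplotype joined to the suffix of another, an abundance-skew parent check of the kind widely used for amplicon data. It assumes haplotypes are already aligned to equal length; the function name `find_chimera_parents` and the default `skew` of 2 are hypothetical choices.

```python
from typing import Dict, Optional, Tuple


def find_chimera_parents(
    child: str,
    counts: Dict[str, int],
    skew: float = 2.0,
) -> Optional[Tuple[str, str, int]]:
    """Return (left_parent, right_parent, breakpoint) if `child` equals the
    prefix of one more abundant haplotype joined to the suffix of another.

    Assumes every haplotype is a pre-aligned string of the same length.
    Returns None when no such pair of parents exists.
    """
    child_count = counts[child]
    # Candidate parents must be at least `skew` times as abundant as the child:
    # chimeras typically form late in PCR and so tend to be rarer than parents.
    parents = [
        hap for hap, n in counts.items()
        if hap != child and len(hap) == len(child) and n >= skew * child_count
    ]
    for left in parents:
        for right in parents:
            if left == right:
                continue
            # child[:bp] must come from `left` and child[bp:] from `right`.
            for bp in range(1, len(child)):
                if child[:bp] == left[:bp] and child[bp:] == right[bp:]:
                    return left, right, bp
    return None
```

For example, with counts `{"AAAA": 100, "BBBB": 80, "AABB": 5}`, the haplotype `AABB` is reported as a chimera of `AAAA` and `BBBB` with a breakpoint at position 2. Raising `skew` flags fewer genuine low-abundance lineages by mistake, at the cost of missing chimeras whose parents are only modestly more abundant.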
While the vast majority of sequences observed with this method matched known lineages identified in patients, reproducible lineages that did not match the known circulating lineages were also detected. Herein, we refer to each RBD haplotype with a unique combination of amino acid changes as a lineage, and to groups of lineages that all share specific amino acid changes as lineage classes. Amino acid combinations that have not previously been seen in patients are referred to as cryptic lineages. Here we describe cryptic lineages detected from January 1, 2021 through March 15, 2022. For display purposes, in most sewersheds (those with more than 3 cryptic lineage-positive samples), individual polymorphisms were only displayed if they were present in at least two independent samples. Further, individual lineages were only displayed if they made up more than 2% of the total signal in at least one sample, or were present in at least 2 independent samples. The detailed display criteria are outlined in Materials and Methods. The complete uncompressed data sets are included in S1–S9 Data.

2.3 Long-read sequencing of S1 identifies substantial NTD modifications and suggests high dN/dS ratio

For each sample that contained novel cryptic lineages, we attempted to amplify a larger fragment of the S1 domain of Spike. Amplification of larger fragments from wastewater is often inefficient, but can sometimes be achieved. To gain more information about the S1 domain of Spike and independently confirm the authenticity of the RBD lineages, we optimized a PCR strategy that amplifies 1.6 kb of the SARS-CoV-2 Spike encompassing amino acids 57–579. These fragments were then either subcloned and sequenced or directly sequenced using Pacific Biosciences HiFi sequencing (Fig 7A).

Fig 7. S1 amplifications. A.
Overview of the SARS-CoV-2 Spike S1 lineages in the Alpha, Delta, and Omicron VOCs and six of the sewersheds with cryptic lineages. S1 amplifications were sequenced by subcloning (SC) and Sanger sequencing, or by PacBio (PB) deep sequencing. B. Plot of the number of synonymous and non-synonymous changes in the S1 sequences shown. https://doi.org/10.1371/journal.ppat.1010636.g007

The S1 amplifications from the MO33 and MO45 sewersheds contained the previously seen RBD amino acid changes, and each contained 3 additional amino acid changes upstream of the region sequenced with the targeted amplicon strategy described above (Fig 7A). Many of the S1 amplifications from the NY10, NY11, NY13, and NY14 sewersheds contained numerous changes in S1 (Fig 7A). In particular, many of the sequences contained deletions near amino acid positions 63–75, 144, and 245–248. All three of these areas are unstructured regions of the SARS-CoV-2 Spike where deletions have been commonly observed in sequences obtained from patients [35]. Two distinct S1 sequences were detected in the NY14 sample collected on June 28, 2021. Interestingly, the first sequence contained 13 amino acid changes that matched the RBD sequences from the same sewershed. The second sequence did not match any lineage that had been seen before, though it contained several mutations that were commonly seen in other cryptic lineages (see section 2.2). This second sequence presumably represented a unique lineage that had not been detected by routine wastewater surveillance. A single S1 sequence was obtained from the NY13 samples collected on October 31, 2021. This sequence generally matched the RBD sequence from the same date, but did contain minor variations. Importantly, the S1 sequence contained deletions at positions 69–70 and 144, which, along with the amino acid changes N501Y and A570D, match the changes found in the Alpha VOC lineage.
This information is consistent with the NY13 lineages being derived from the Alpha VOC.

Comparing the number of non-synonymous to synonymous mutations in a sequence can reveal the strength of positive selection acting on that sequence. The ratios of non-synonymous to synonymous mutations in this region of S1 from the Alpha, Delta, and Omicron (BA.1) VOCs were 19/0, 2/0, and 4/1, respectively. It was not possible to calculate formal dN/dS ratios because many of the sequences had no synonymous mutations in this region, so the numbers of non-synonymous and synonymous mutations were plotted instead. The cryptic lineages contained 5 to 25 total non-synonymous mutations and 0 to 2 total synonymous mutations (Fig 7B).

[1] Url: https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1010636

Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
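The non-synonymous and synonymous counts plotted in Fig 7B can be illustrated with a minimal Python sketch that walks two aligned, in-frame coding sequences codon by codon. It is a simplification, not the authors' procedure: a codon carrying several nucleotide changes is counted once, codons with gaps or ambiguous bases are skipped (so deletions are not tallied), and no per-site normalization is applied, so the result is a raw count rather than a formal dN/dS. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# Standard genetic code built from the conventional TCAG-ordered string
# ('*' marks stop codons).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def count_codon_changes(ref: str, query: str) -> Tuple[int, int]:
    """Return (non_synonymous, synonymous) codon changes of `query` vs `ref`.

    Both sequences must be aligned, in frame, and of equal length. A codon
    with several nucleotide changes is counted once; codons containing gaps
    ('-') or ambiguous bases are skipped, so deletions are not counted here.
    """
    if len(ref) != len(query) or len(ref) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    non_syn = syn = 0
    for i in range(0, len(ref), 3):
        r_codon, q_codon = ref[i:i + 3].upper(), query[i:i + 3].upper()
        if r_codon == q_codon or r_codon not in CODON_TABLE or q_codon not in CODON_TABLE:
            continue
        if CODON_TABLE[r_codon] == CODON_TABLE[q_codon]:
            syn += 1  # codon changed, amino acid unchanged
        else:
            non_syn += 1  # codon change alters the amino acid
    return non_syn, syn


def raw_ratio(non_syn: int, syn: int) -> Optional[float]:
    """Raw count ratio (not a site-normalized dN/dS); None when syn == 0."""
    return None if syn == 0 else non_syn / syn
```

With counts like the Alpha figure given in the text (19 non-synonymous, 0 synonymous), `raw_ratio` returns None, which mirrors why counts rather than ratios were plotted: any ratio with a zero synonymous denominator is undefined.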