(C) PLOS One This story was originally published by PLOS One and is unaltered. . . . . . . . . . . Genomic evidence of contemporary hybridization between Schistosoma species [1] ['Duncan J. Berger', 'Wellcome Sanger Institute', 'Hinxton', 'United Kingdom', 'Royal Veterinary College', 'University Of London', 'London', 'Elsa Léger', 'London Centre For Neglected Tropical Diseases Research', 'Imperial College Faculty Of Medicine'] Date: 2022-10 Abstract Hybridization between different species of parasites is increasingly being recognised as a major public and veterinary health concern at the interface of infectious diseases biology, evolution, epidemiology and ultimately control. Recent research has revealed that viable hybrids and introgressed lineages between Schistosoma spp. are prevalent across Africa and beyond, including those with zoonotic potential. However, it remains unclear whether these hybrid lineages represent recent hybridization events, suggesting hybridization is ongoing, and/or whether they represent introgressed lineages derived from ancient hybridization events. In human schistosomiasis, investigation is hampered by the inaccessibility of adult-stage worms due to their intravascular location, an issue which can be circumvented by post-mortem of livestock at abattoirs for Schistosoma spp. of known zoonotic potential. To characterise the composition of naturally-occurring schistosome hybrids, we performed whole-genome sequencing of 21 natural livestock infective schistosome isolates. To facilitate this, we also assembled a de novo chromosomal-scale draft assembly of Schistosoma curassoni. Genomic analyses identified isolates of S. bovis, S. curassoni and hybrids between the two species, all of which were early generation hybrids with multiple generations found within the same host. These results show that hybridization is an ongoing process within natural populations with the potential to further challenge elimination efforts against schistosomiasis. Author summary Schistosomiasis is a chronic and debilitating major neglected tropical disease affecting both humans and livestock. Increasingly, zoonotic spillover of livestock infections, facilitated by hybridization between different Schistosoma species, is increasingly being recognised as a risk to human health. Multiple surveys conducted within endemic regions have found a high prevalence of these hybrid lineages. However, it is often unclear whether these lineages are derived from recent hybridization events, suggesting hybridization is ongoing and may be linked to anthropogenic environmental change, or simply indicators of introgression from ancient hybridization events. To understand the origin and evolution of these hybrid lineages, we produced a chromosomal-scale assembly of Schistosoma curassoni and performed whole-genome sequencing of 21 natural livestock-infective S. curassoni, S. bovis and hybridized schistosome isolates, including multi-stage sampling from the same hosts. Our analyses exclusively identified early generation hybrid lineages, including multiple unrelated generations within the same hosts, suggesting that these hybrids are viable and derived from multiple independent hybridization events. Citation: Berger DJ, Léger E, Sankaranarayanan G, Sène M, Diouf ND, Rabone M, et al. (2022) Genomic evidence of contemporary hybridization between Schistosoma species. PLoS Pathog 18(8): e1010706. https://doi.org/10.1371/journal.ppat.1010706 Editor: Mostafa Zamanian, University of Wisconsin-Madison, UNITED STATES Received: March 1, 2022; Accepted: June 27, 2022; Published: August 8, 2022 Copyright: © 2022 Berger et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: Raw Illumina sequencing data generated during this study are available in the European Nucleotide Archive (ENA) repository under study accession number PRJEB22769. The S. curassoni reference assembly and associated raw Hi-C reads are in the ENA repository under study accession number PRJEB44848 and were deposited in Zenodo (https://zenodo.org/record/5154271). The raw PacBio subreads are in the ENA repository under study accession number PRJEB3054. The code to reproduce all analysis and figures for this manuscript is described in https://github.com/duncanberger/schisto_livestock_hybrids. Funding: This work was supported by the Wellcome Trust (grant number 206194); the Biotechnology and Biological Sciences Research Council, the Department for International Development, the Economic & Social Research Council, the Medical Research Council, the Natural Environment Research Council and the Defence Science & Technology Laboratory, under the Zoonoses and Emerging Livestock Systems (ZELS) programme (grant number BB/L018985/1 to JPW and MS and grant number BB/S013822/1 to JPW, MS and NDD). FA, AE and MR received funding from the Wellcome Trust (grant number 104958/Z/14/Z). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. Introduction Interspecific hybridization, the interbreeding of two individuals from different species, is widely recognized as an important and widespread evolutionary process [1,2]. Hybridization can facilitate the transfer of adaptive advantages across species boundaries and act as a source of genetic diversity across short time scales, facilitating rapid adaptation to new and changing environments [3–7]. While the ecological outcomes of hybridization can be highly variable, hybridization between parasites is an emerging major public and veterinary health concern [8]. Hybrid parasite lineages have been identified with, for example, increased virulence [9], reduced drug susceptibility [9] and increased host range [10,11]. It is expected that the shifting distributions of humans, livestock and parasites due to migration, trade and climate change will increase the frequency of mixed-species co-infections providing more frequent opportunities for hybridization [12–15]. Hybridization between species of schistosomes, the dioecious trematodes that cause the chronic and debilitating neglected tropical disease (NTD) schistosomiasis, is a particular public and veterinary health risk [16]. Schistosomiasis currently infects over 220 million people across 78 countries [17], and untold millions of livestock [18,19]. WHO targets aim to eliminate schistosomiasis as a public health problem in all endemic countries and interrupt transmission in selected regions by 2030 [20,21]. The recently launched WHO 2021–2030 road map states, to achieve these goals, actions required include a better understanding of zoonotic transmission, hybridization and a consideration of potential veterinary and One Health interventions [21]. Domestic livestock often play a key role in cross-species transmission of zoonoses in general, due to their close association with human populations. Furthermore, animal schistosomiasis represents a double burden to poor rural communities for whom their livestock are a crucial source of income and economic security [19]. While the epidemiological risks of Schistosoma spp. hybridization are only just beginning to be realised, it is becoming clear that this has the potential to alter transmission and morbidity dynamics, circumventing current control measures. For example, field and experimental studies, most notably within the Haematobium clade, have demonstrated that hybridization results in viable offspring with altered phenotypes such as that of shifted cercarial shedding patterns [22–24], increased growth rate and higher reproductive output [25,26], altered morbidity profiles induced within their human hosts [27], as well as expanded intermediate or definitive host ranges [23,28–30] relative to their single species counterparts. Indeed, recently a child in Niger was found to be infected with a schistosome displaying markers from two livestock-infective schistosome species, S. bovis and S. curassoni, which on their own do not reach patency, further indicating that hybridization could promote expanded host range with zoonotic transmission [30]. Likewise, countries, most notably those across West Africa, which can have high proportions of hybridized schistosomes amongst both their human and livestock populations, often display extremely high persistence of infection prevalence and intensities despite years of targeted control within the human populations [20,31]. Moreover, it has been suggested that hybridization between S. bovis and S. haematobium facilitated a 2013 outbreak of schistosomiasis in Corsica (France), outside the endemic zone of these species [32–34], although the classification of these samples as recent hybrids has been disputed [35]. Understanding the disease dynamics and evolutionary consequences of these hybridization events in both humans and livestock is therefore essential for predicting disease persistence and to inform control efforts, as well as broaden our knowledge regarding inter-specific hybridization events amongst parasites and pathogens in general. Schistosomes derived from possible hybridization events were originally identified using egg morphology [36–38] or biochemical markers [39,40]. More recently, single-gene or multi-gene sequencing has been used to confirm their existence, in particular through identifying cases of mitochondrial and nuclear discordance or nuclear markers heterozygous for alleles typical of different species [31,41–43]. While these approaches have identified that hybridization could be common in natural populations, they lack the precision to reveal the full history of hybrid populations and may miss highly backcrossed or introgressed hybrid lineages. A recent genomic analysis of hybridization in schistosomes reported no evidence for contemporary hybridization amongst samples from infected children from Niger (n = 96), instead finding evidence of an ancient introgression event from the livestock-infective S. bovis into S. haematobium that occurred an estimated 108–613 generations ago [44]. Both this study and a second smaller-scale genomic survey (n = 19) found evidence of directional selection on the S. bovis-derived alleles in a gene that might facilitate definitive host infection [35,44]. Recent larger-scale genotyping surveys have also suggested that both ancient (occurring hundreds of generations prior to sampling) and more recent hybrid lineages could be present in schistosome populations [31,42,45]. These studies leave, however, many unresolved questions about the details of hybridization in current schistosome populations. These details matter: for example, the presence of first-generation hybrids does not demonstrate that hybrids are themselves fertile and of long-term importance, while fertile back-crosses with parental species could lead to introgression of clinically important traits between species. Evidence of adaptive introgression from historic hybridization can identify adaptive alleles, whereas evidence of recent hybridization could indicate demographic shifts or anthropogenic environmental changes influencing parasite population structures. Here we present novel whole-genome sequence data from 21 schistosome isolates from naturally-infected cattle, collected from two regions of Northern Senegal where livestock are known to be infected with S. curassoni, S. bovis and hybrids between these species [31,46]. Sequenced isolates were selected to represent both non-hybrid genotypes of both species and hybrid genotypes found in this population, and to include both miracidial and adult stages. By sampling from livestock, we were uniquely able to capture multiple developmental stages, notably both adult pairs and miracidia representing parents and potential offspring, respectively. These stages are otherwise unavailable to all previous studies on humans from which adult schistosomes cannot be obtained, even following chemoexpulsion, in contrast to the situation for many other helminth species. We used these data to screen for evidence of hybridization and determine whether admixed samples were derived from recent hybridization events or whether these represent introgressed lineages derived from ancient hybridization events. Discussion Hybridization of schistosomes is an emerging One Health concern with the potential to have major impacts on the epidemiology and control of schistosomiasis. Therefore, it is important to identify and characterize incidence and transmission of admixed Schistosoma species. To investigate the genomic composition of livestock schistosomes, we performed whole-genome sequencing of individual miracidia and adult stage isolates from endemic regions, where admixed isolates are common. To support the population genomic analyses, we produced a chromosomal-scale assembly of the S. curassoni genome that we used to analyse population structure and ancestry of each sequenced isolate. We successfully characterized 21 isolates sampled from four individual cattle hosts, including both miracidia and adult stage parasites from the same host. We have shown that these are most likely early generation hybrids (predominantly F 1 and F 1 backcrosses) between S. bovis and S. curassoni and we found examples of bidirectional hybridization (backcrosses to either parent). This was further supported by the mosaic composition of the backcrossed samples which showed wide variations of f 3 , Patterson’s D and ADMIXTURE estimates across the genome suggesting alternating genomic ancestry from either parent. The presence of F 1 and a later generation backcross suggests that these can produce viable offspring with both parental species as well as other F 1 hybrids. By sampling from livestock, we were able to sequence parasites from multiple developmental stages. This is in contrast to studies sampling humans, which are typically unable to sample adult-stage worms and so are reliant only on miracidia stage parasites. While genomic measures of relatedness are unlikely to be accurate when including first- or second-generation hybrid samples, the presence of a range of adult and miracidial stage early generation hybrids from different hosts suggests that these hybrid lineages were not derived from a single hybridization event but rather multiple independent interspecies pairings. While this study did not aim assess the overall prevalence of S. curassoni–S. bovis hybrids, the presence of multiple unrelated admixed offspring within a single host (BK16_B8) suggests that in mixed-species coinfections these two species of schistosome can hybridize. This is not surprising given the relatively recent divergence time between S. bovis and S. curassoni when compared to other species that can hybridize such as S. bovis and S. haematobium [57,58]. It is therefore likely that pre-zygotic barriers, possibly the limited overlap of the geographical distributions of these two species (and their intermediate host snails in particular), have maintained the reproductive isolation of these species. Indeed, this is consistent with genotyping surveys conducted between 2015 and 2017 in Senegal which found only limited overlap between the geographical and host ranges of these two Schistosoma species, at least in part restricted to their differing intermediate host snail species distribution [31]. Extensive surveys conducted in Barkedji estimated an 8% prevalence of S. curassoni in cattle, 84% in goats and 73% in sheep compared to a 2% prevalence of S. bovis across all three host types [31]. In contrast, in Richard Toll and the Lac de Guiers the same authors estimated a 1% prevalence of S. curassoni in cattle and sheep and 11% in goats but a 92% prevalence of S. bovis among cattle, a 14% prevalence in sheep and 15% prevalence in goats. Whilst any current/previous geographical barriers would have assisted with the reproductive isolation of these species this also means, however, that changes in human or livestock migration patterns or differential control pressures have the potential to radically alter the prevalence of hybrid and non-hybrid schistosomes [59]. This may be particularly pertinent given the evidence that hybrid Schistosoma can also expand their molluscan intermediate host range [25,28], and that schistosomes displaying markers from two livestock-infective schistosome species have been identified to infect humans [30]. Furthermore, while our whole-genome sequencing results were largely consistent with the widely-used dual marker approaches (such as cox1/ITS-rDNA genotyping), we found examples of gains in sensitivity suggesting that the prevalence of hybrid lineages has likely been under-estimated previously. If our observations of novel S. curassoni–S.bovis hybrid lineages are representative of livestock populations in general, then it appears that new allelic combinations are continuously being introduced into schistosome populations. The long-term survival of these hybrid lineages will be determined, at least in part, by the prevalence of non-hybrid lineages. In Barkedji, where all admixed samples were found, the estimated prevalence of S. curassoni–S. bovis hybrids across all three host types varied between 2–4% [31]. In the same study, the authors did not find snails shedding S. bovis in Barkedji concluding that S. bovis was most likely imported from neighbouring regions. Based on the overall prevalence estimates of S. curassoni, S. bovis and hybrids in Barkedji we would expect the majority of hybrid lineages to not be maintained in the population and instead repeatedly backcross with S. curassoni [59]. However, multiple studies have observed that hybrid lineages from closely-related species display increased fitness compared to non-hybrid lineages. For example, within mixed host infrapopulations, S. guineensis and S. haematobium females will preferentially form heterospecific pairings with S. guineensis–S. haematobium hybrid males compared to homospecific pairings to males of their own species [60,61]. These and other competitive mating interactions have been suggested as a factor in the declining prevalence of S. guineensis in Cameroon [62,63]. Conversely, if the S. bovis-S. curassoni hybrids display reduced fitness, as observed in pairings between distantly-related schistosome species [64–66], then they would be less likely to be maintained in the population. This ambiguity underscores the importance of epidemiological surveys to determine to what extent novel hybrid lineages are spreading through livestock populations. To conclude, the threat to public and animal health that is presented by parasite hybridization and zoonotic spillover is predicted to increase with rapid anthropogenic changes and global trends such as migration and changing land and agricultural practices. Our study, using WGS in combination with de novo genome assembly, provides conclusive evidence of contemporary hybridization between schistosomes species under natural settings. We found multiple unrelated generations of hybrids within the same definitive host demonstrating that these hybrids are viable beyond the F 1 generation. We now recommend more extensive, longitudinal genomic surveys to determine the long-term viability of these hybrids and to search for evidence of adaptive introgression from historic hybridization events. Our de novo genome assembly of S. curassoni presented here will be a valuable resource for future investigations into both livestock schistosomes and hybrid infections of both humans and animals alike. Such studies should provide greater insights into the underlying mechanisms maintaining species boundaries and the long-term genomic impacts of hybridization in general. Given the potential for zoonotic risk of livestock hybrid schistosomes our results also highlight the importance of considering control measures in a One Health framework. Understanding how these parasite populations are adapting to recent anthropogenic changes will be vital to predicting the outcomes and long-term successes of current elimination efforts, as well as to forecast the likely spread of tropical diseases such as schistosomiasis beyond their original host and geographic boundaries. Methods Ethics statement Ethical approval for sampling was provided by: (i) Imperial College (London, UK) application 03.36; (ii) the Clinical Research and Ethical Review Board at the Royal Veterinary College (London, UK) application URN20151327; (iii) the Comité National d’Ethique pour la Recherche en Santé (Dakar, Senegal) application SEN15/68. Written informed consent was obtained from all livestock owners. Livestock owners were offered standard anthelmintics for infected animals. Schistosoma curassoni genome assembly A pool of 20 adult Schistosoma curassoni males was obtained from the Schistosome Collection at the Natural History Museum (SCAN) [67]. These represent the second passage of field isolates originally sampled from Dakar, Senegal and were stored in liquid nitrogen on 4th August 1993. DNA was extracted using a modified agarose plug based protocol (Bionano Prep Animal Tissue DNA Isolation Soft Tissue Protocol, Bionano Genomics, San Diego, CA, USA) yielding a total of 2.6 μg of high-molecular weight DNA. The eluted DNA was cleaned using standard phenol/chloroform extraction protocols, DNA concentrations were determined using Qubit high sensitivity kit and DNA size distributions were confirmed using FEMTO Pulse (Agilent, Santa Clara, CA, USA). A Pacific Bioscience CLR long-read sequencing library was constructed using the SMRTbell Template Prep Kit 1.0 and sequenced on the Sequel II platform, by the Scientific Operations core at the Wellcome Sanger Institute. This yielded a total of 38.61 Gb of PacBio subreads (N50 = 16.6 kb, S12 Table). PacBio subreads were assembled using the FALCON assembly and FALCON-unzip (falcon kit v1.8.1). For initial quality control, primary contigs and alternate haplotigs were merged into a single assembly. PacBio subreads were mapped to the assembly using Minimap2 (v2.16-r922) [68]. Taxonomic annotation of contigs was performed using both Blastn (v.2.2.31) and Diamond (v0.9.14.115) against NCBI nt (v2.10.1+) and UniProt Reference Proteomes (Release 2017_07) databases respectively. Using the mapping and taxonomic annotation files Blobtools was used to remove contigs representing potentially contaminants (total = 69.61 kb) [69] (S16 Fig). Two rounds of error correction were performed by first mapping PacBio subreads to the genome using pbmm2 (v1.4.0) and then error correction was performed using gcpp (v2.0.0). Alternate haplotigs identified using FALCON-Unzip were first removed and any remaining alternate haplotigs were identified using purge_dups [70]. Arima Hi-C data (S12 Table) was then used to scaffold the primary assembly into chromosome spanning scaffolds using the Juicer and 3D-DNA pipelines [71,72]. PacBio pre-assembled reads (from FALCON assembly) were mapped to the genome using minimap2 and the assembly was visualised using Gap5 [73]. Hi-C reads were mapped to the assembly using bwa mem and visualised using PretextView. The assembly was then manually curated in Gap5 using the Hi-C data as a guide to break misjoins and scaffold over repetitive or highlighly haplotypic regions. Then a final round of polishing using pbmm2 and GCPP was performed as above. We assessed the completeness of the curated assembly by searching for conserved, single copy genes using BUSCO (Benchmarking Universal Single-Copy Orthologs) (v3.0.2) and the eukaryota lineage (odb10) dataset [74]. Juicer was used to map all Hi-C data to the final assembly and Juicebox was used to visualise the data. Pseudoautosomal regions (PARs) of the Z chromosome were identified based on synteny to S. mansoni. Annotation of repetitive elements was conducted using RepeatModeler and RepeatMasker [75]. Per chromosome GC content was measured using Bedtools [76], assembly features were visualised using the circlize package [77]. Genome annotation Gene structures of conserved S. mansoni genes were predicted using GenomeThreader (v1.7.1) [78], using spliced alignments of S. mansoni (v9) transcript and protein sequences [47]. Synteny inference Comparison of synteny between S. curassoni and S. mansoni were performed using MCscan [79]. Gene coordinates were extracted from GFF3 files and pairwise synteny was inferred using coding sequences, setting the cscore cutoff to 0.99. Macrosynteny was visualised using the JCVI utility libraries [80]. Sampling of abattoir field isolates Post-mortem abattoir surveys of routinely slaughtered cattle, sheep and goats were carried out between 2015 and 2017 in three villages of Northern Senegal, Richard Toll (16.47057, -15.694457) in the Senegal River Basin and Linguère (15.24241, -15.06592) and Barkedji (15.28243333, -14.87238333) in the Vallée du Ferlo [31]. In Richard Toll water bodies are permanent and transmission is perennial. Due to the proximity of the Diama Dam, the area has undergone substantial land-use changes, with desalination and creation of irrigation canals that have facilitated expansion of habitats for snail intermediate hosts and increased sharing of water contact points by communities and their animals. The main livestock-schistosome species circulating in this area is S. bovis. By contrast, in Barkedji and Linguère, temporary ponds are the main water sources. Schistosomiasis transmission is seasonal as they disappear completely during the dry season which necessitates seasonal migration by a large proportion of livestock-keeping communities. The main livestock-schistosome species circulating in this area is S. curassoni [31]. During post-mortem abattoir surveys of routinely slaughtered animals, mesenteric and rectal blood vessels and bladder and associated vasculature were visually inspected for the presence of adult-stage worms. Fecal samples were collected directly from the rectum as well sections of liver, spleen, lungs and kidney and processed following an adapted miracidial hatching technique [31]. Urine was collected for filtration. All adult-stage worms (single males, females and split couples) were stored in RNA-later. Following urine filtration and miracidia hatching of faeces and tissue samples, free-swimming miracidia were individually pipetted onto Whatman Indicating FTA Classic cards (GE Healthcare Life Sciences, UK). Genotyping field isolates Single miracidia were isolated from Whatman Indicating FTA Classic cards using a Harris micro-punch. DNA was extracted from single miracidia using the CGP protocol originally described by Moore et al. [81] and used to sequence S. mansoni miracidia by Doyle et al. [82]. Adult-stage worm samples were manually inspected and separated so only individuals were sequenced. DNA was extracted from these samples using the QIAGEN DNeasy Blood & Tissue Kit (QIAGEN, Hilden, Germany), following manufacturer’s instructions. Partial fragments of the mitochondrial cytochrome c oxidase subunit 1 (cox1) and the complete nuclear ribosomal DNA internal transcribed spacer (ITS) were amplified from individual DNA extracts [83,84](S13 Table). The following PCR conditions were used: initial denaturation step of 5 minutes at 95°C, 40 cycles of 30 seconds at 95°C, 30 sec at 58°C and 1 min at 72°C; final extension period of 7 min at 72°C. Polymerase chain reaction (PCR) fragments were sequenced by Eurofins Genomics (Cologne, Germany) using original primers. CodonCode Aligner v7.0.1 (Centerville, USA) was used to manually edit and assemble sequences and were compared to Schistosoma reference sequences to confirm species. Adult-stage worms or miracidia displaying both S. bovis and S. curassoni signals were designated as hybrids. Sequencing of field isolates DNA sequencing libraries for all miracidia samples were prepared using the laser capture microdissected biopsy (LCMB) protocol described by Lee-Six et al. [85]. In short, 20 μl of lysate was mixed with 50 μl of Ampure XP beads and 50 μl of TE buffer (Ambion 10 mM Tris-HCL, 1 mM EDTA), followed by incubation (5 minutes, room temperature). Tubes were placed on magnetic racks and beads were washed twice with 75% ethanol, followed by resuspension in 26 μl of TE buffer. DNA fragmentation and A-tailing was performed by mixing each sample with 7 μl of 5x Ultra II FS buffer and 2 μl of Ultra II FS enzyme (New England BioLabs) followed by two incubation steps (12 minutes at 37°C then 30 minutes at 65°C). For adaptor ligation, 1 μl ligation enhancer (New England BioLabs), 0.9 μl nuclease-free water (Ambion), 30 μl of ligation mix and 0.1 μl duplexed adapters were added to each sample followed by incubation (20 minutes, 20°C). Library purification and elution was performed by adding 65 μl of Ampure XP beads and 65 μl of TE buffer. 25 μl KAPA HiFi HotStart ReadyMix (KAPA Biosystems) and 1 μl PE1.0 primer to 21.5 μl of eluted library and each sample was thermal-cycled as follows: 98°C for 5 mins, then 12 cycles of 98°C for 30 secs, 65°C for 30 sec, 72°C for 1 min and 72°C for 5 mins. Amplified libraries were purified using a 0.7:1 volumetric ratio of Ampure beads to library, followed by elution in 25 μl of nuclease-free water. Multiplexed 150 bp paired-end read libraries were sequenced using the Illumina Hiseq X system at the Wellcome Sanger Institute. Cluster generation and sequencing were undertaken according to the manufacturer’s protocol. For all adult-stage samples DNA was quantified with the Biotium Accuclear Ultra high sensitivity dsDNA Quantitative kit using Mosquito LV liquid platform, Bravo WS and BMG FLUOstar Omega plate reader. For each sample, 200ng/120μl was used to prepare sequencing libraries. DNA was sheared to target 450 bp fragment size using a Covaris LE220 instrument, samples were then purified using Agencourt AMPure XP SPRI beads using the Agilent Bravo WS automation system. Library construction was performed using a NEB Ultra II DNA library preparation kit using the Agilent Bravo WS automation system. PCR was performed using the KapaHiFi Hot start mix and IDT 96 iPCR tag barcodes. 6 standard cycles of PCR were performed using the following conditions: initial denaturation step of 5 minutes at 95°C, 30 seconds at 98°C, 30 sec at 65°C and 1 min at 72°C; final extension period of 10 min at 72°C. Libraries were then purified using Agencourt AMPure XP SPRI beads on Beckman BioMek NX96 liquid handling platform and libraries were quantified with the Biotium Accuclear Ultra high sensitivity dsDNA Quantitative kit using Mosquito LV liquid platform, Bravo WS and BMG FLUOstar Omega plate reader. Libraries were pooled in equimolar amounts and normalised to 2.8nM. Multiplexed 150 bp paired-end read libraries were sequenced using the Illumina Hiseq X10 system at the Wellcome Sanger Institute. Cluster generation and sequencing were undertaken according to the manufacturer’s protocol. Reference samples We downloaded additional genome sequence data from the NCBI Short Read Archives for seven other species in the Haematobium clade as reference samples for each species. This included read data for each of the following species: S. curassoni, S. guineensis (unpublished), S. intercalatum (unpublished), S. matthei and S. margrebowiei [48] two S. bovis [48,86] and S. haematobium [44]. Sequence data for S. guineensis and S. intercalatum were originally sequenced as part of a consortia project [48] but were not included in the publication, below we have detailed the sample background and sequencing methods as background for future work. (S14 Table). S. guineensis was collected in Chacara, Sao Tomé and S. intercalatum from the Democratic Republic of Congo. Both were passaged through mice at the Natural History Museum, London. DNA was extracted from a single adult-stage male worm of each isolate by phenol-chloroform extraction, sheared to target 450bp fragment size using a Covaris LE220 instrument, purified with AMPure XP SPRI beads and sequencing libraries generated using the NEB Ultra II library prep kit and 8 cycles of PCR before sequencing 200bp paired-end reads on a HiSeq 2000 platform. Sequence analysis Sequence reads were aligned to the S. curassoni reference assembly (all autosomes, Z chromosome and the mitochondrial genome) using BWA mem (v.0.7.17) and duplicates were marked using PicardTools MarkDuplicates (as part of GATK v.4.1.0.0). Per-sample variant calling was performed using GATK HaplotypeCaller and consolidated using GATK CombineGVCFs. Joint-call cohort genotyping was performed using GATK GenotypeGVCFs. We partitioned and filtered SNPs and indels/mixed (variant sites that had both SNPs and indels called) sites separately using GATK SelectVariants and filtered using VariationFiltration, respectively. SNPs were retained if they met the following criteria: QD ≥ 2.0, FS ≤ 60.0, MQ ≥ 40.0, MQRankSum ≥ -12.5, ReadPosRankSum ≥ -8.0, SOR ≤ 3.0 (S17 Fig). Indels and mixed sites were filtered independently and variants were retained if they met the following criteria: QD ≥ 2.0, FS ≤ 200.0, ReadPosRankSum ≥ -20.0, SOR ≤ 10.0 (S17 Fig). We removed all sites with high proportions of SNP missingness (>5%) using BCFtools [87]. For the rest of the genome, we first excluded a single sample (BK16_B8_18) due to high rates of SNP missingness (>90% of all variant sites could not be genotyped). We then removed sites with a high proportion of SNP missingness across samples (>5%) using BCFtools. We called variants against the mitochondrial genome using GATK HaplotypeCaller in haploid mode but otherwise followed the same filtering parameters as the non-mitochondrial variants and excluded the BK16_B8_18 as above. We identified samples with same-species contamination by examining allelic imbalances in heterozygous variants, where the ratio of reads supporting each allele substantially deviates from 50% (based on methods described by https://speciationgenomics.github.io/allelicBalance/) (S5–S8 Figs). Results were manually inspected and non-reference samples with clear, continuous peaks of allelic ratios at >0.6 or <0.4, were classified as contaminated. This identified three samples with evidence of contamination and we excluded them from further analysis. As additional lines of evidence, for each sample we estimated the fraction of reads coming from cross-sample contamination using GetPileupSummaries and CalculateContamination (as implemented in GATK v.4.2.0.0; S4 Table). Finally, for all samples not removed due to contamination, we estimated the average number of haplotypes in each sample using estMOI (v.1.03; S5 Table) [88]. Depth of coverage We calculated coverage in 25 kb windows along each chromosome using bedtools coverage. We calculated median read coverage across the pseudoautosomal regions (PARs) of the Z chromosome: PAR1 (coordinates 0–9,212,000 bp), PAR2 (coordinates 44,200,000–88,179,078 bp) and the non-pseudoautosomal: Z-specific region (ZSR, coordinates 9,212,000–44,200,000 bp). To differentiate male and female samples we compared the relative coverage over the ZSR and PAR regions. Samples with a PAR/ZSR ratio >0.75 were classified as male and PAR/ZSR ratio < 0.75 were classified as female. Population structure We used PLINK (v2.0) to remove variants in strong linkage disequilibrium (LD), discarding variants according to the observed correlation between pairs of variant sites. We scanned the genome in sliding windows of 50 variants, in steps of 10 variants and removed variants with squared correlation coefficients > 0.15. We also excluded all non-autosomal variants (those found on the Z-chromosome and mitochondrial genome) resulting in 366,486 autosomal variants. Principal component analysis (PCA) was performed using PLINK. We used publicly available scripts to convert 366,486 autosomal SNPs into Phylip format (github.com/edgardomortiz/vcf2phylip/vcf2phylip.py) followed by removal of all invariant sites (github.com/btmartin721/raxml_ascbias/ascbias.py). IQ-TREE was used to perform phylogenetic inference using the best-fit substitution model with ascertainment bias correction selected by ModelFinder (TVM+F+ASC+R2) and 1000 ultrafast bootstraps. The resulting phylogeny was visualised using ggtree (v.1.10.5) [89]. We then repeated these analyses for the mitochondrial variants without LD pruning. Additionally, we used PLINK (v1.9) to produce an identity-by-state distance matrix from a subset of samples and autosomal variants. We then used ape (v5.2) bionj algorithm to produce a neighbour-joining phylogeny. The resulting phylogeny was visualised using ggtree. Ancestry estimation Using the same LD pruned variant set as the previous section was converted into a BED file using PLINK (2.0) and we then estimated individual ancestry using ADMIXTURE [90] including only S. curassoni, S. bovis and S. haematobium samples. We used a range of K values (number of hypothetical ancestral populations) ranging from 1–20, 10-fold cross-validation, standard error estimation with 250 bootstraps and repeated the process 10 times with different random seeds. We then used PLINK to divide variants into 50 kb non-overlapping windows across each autosome and the Z chromosome. We designated reference populations for both S. curassoni (SCUR_01, BK16_B3_01 and BK16_B8_03) and S. bovis (SBOV_01, SBOV_02, RT15_B6_01) selected from published datasets or those with no evidence of admixture (based on the ADMIXTURE results and cox1/ITS-rDNA). We repeated the ADMIXTURE analysis across each 50 kb window without bootstrapping, specifying two populations (K = 2) and using the six reference samples as designated populations for each species. We produced a second autosomal neighbour-joining phylogeny (as above) excluding six samples (SCUR_01, SBOV_01, SBOV_02, BK16_B3_01, BK16_B8_03, RT15_B6_01) and plotted ancestry estimates in phylogenetic order. Autosomal junctions (switches in genomic ancestry resulting from recombination between heterogenic chromosomes) were identified using ADMIXTURE. We created 1 Mb sliding windows along each chromosome, proceeding in steps of 500 kb, and calculated the ADMIXTURE value for each window as above. We created 3 ancestry categories: predominantly S. bovis ancestry (S. bovis admixture proportion > = 0.75), predominantly S. curassoni ancestry (S. bovis admixture proportion < = 0.25), admixed (S. bovis admixture proportion >0.25 & <0.75). We defined an ancestry junction as any region of the genome where one or more 500 kb windows were in a different category than the previous window. We estimated the expected number of autosomal junctions for each generation since an initial hybridization event using the junctions R package [91]. We tested values of initial heterozygosity, assuming an infinite population size and an autosomal genetic map length of 983.9 cM. We used scikit-allel to calculate the average outgroup f 3 statistic for each sample, standard errors were estimated with a block jackknife over 1000 markers (blen = 1000). We defined three populations described by the tree ((P1,P2),P3)). We designated reference populations for S. bovis (SBOV_01, SBOV_02, RT15_B6_01) and S. curassoni (SCUR_01, BK16_B3_01 and BK16_B8_03) as P1 and P2, respectively. Individual samples to be analysed were designated at P3. We used Dsuite (v0.5 r45) to calculate Patterson’s D (ABBA-BABA) statistics using all autosomes [92]. The test was performed using four populations P1, P2, P3, and O, described by the rooted tree (((P1,P2),P3),O). S. haematobium was designated as the outgroup population (O), samples to be analysed (newly sequenced non-reference samples) were designated P2, reference populations for S. curassoni and S. bovis were designated as either P1 or P3 (analysis was performed with populations in each position). The Bayesian clustering software NEWHYBRIDS was used to assign each sample into different hybrid categories (pure S. bovis, pure S. curassoni, F 1 , F 1 backcross or F 2 ). NEWHYBRIDS can only be run efficiently on a small number of loci, so we randomly subsampled 300 bi-allelic autosomal variants and used PGDSpider to convert the subset VCF file into NEWHYBRIDS format. NEWHYBRIDS was run for 150,000 sweeps of burn-in and 500,000 sweeps of data collection. We repeated this analysis 14 additional times using different random subsets of 300 variants for each replicate. Relatedness estimation We used NGSremix to estimate pairwise relatedness between all S. bovis, S. curassoni and admixed samples [93]. The LD pruned BED file was used as input along with admixture proportions and ancestral allele frequencies estimated by ADMIXTURE. The summary statistics k 0 , k 1 , k 2 (the proportion of the genome where a pair of individuals share 0,1 or 2 alleles IBD, respectively) were used to estimate relatedness. Values of (k 1 , k 2 ) of (0,0) corresponded to unrelated pairs, (1,0) for parent-offspring, (0.5,0.25) for full siblings, (0.25,0) for first-cousins and (0.625,0) for second cousins [94]. Estimating fixed variants between species We identified biallelic SNP sites found as homozygous reference alleles in three non-admixed S. curassoni samples (SCUR_01, BK16_B3_01 and BK16_B8_03) and as homozygous alternate alleles three non-admixed S. bovis samples (SBOV_01, SBOV_02, RT15_B6_01). This identified 309,040 SNPs fixed between these two populations, per SNP site statistics were calculated for the remaining samples using BCFtools. Acknowledgments We would like to thank the DNA pipeline operations and informatics teams at the Wellcome Sanger Institute for production of the sequence data. We would also like to thank Sarah Buddenborg for her helpful feedback on earlier versions of this manuscript. The authors are extremely grateful to the people who helped in the collection of the samples: Dr Anna Borlase, Mr. Samba D. Diop, Dr Stefano Catalano, Dr Cheikh Binetou Fall, Mr. Cheikh Tidiane Thiam and Mr Alassane Ndiaye. [END] --- [1] Url: https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1010706 Published and (C) by PLOS One Content appears here under this condition or license: Creative Commons - Attribution BY 4.0. via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/