(C) PLOS One This story was originally published by PLOS One and is unaltered.

Comparative analysis reveals the long-term coevolutionary history of parvoviruses and vertebrates [1]

Matthew A. Campbell, University of Alaska Museum of the North, Fishes, Marine Invertebrates, Fairbanks, Alaska, United States of America; Shannon Loncar, University of Massachusetts Medical School, Department of Microbiology

Date: 2022-12

Parvoviruses (family Parvoviridae) are small DNA viruses that cause numerous diseases of medical, veterinary, and agricultural significance and have important applications in gene and anticancer therapy. DNA sequences derived from ancient parvoviruses are common in animal genomes and analysis of these endogenous parvoviral elements (EPVs) has demonstrated that the family, which includes twelve vertebrate-specific genera, arose in the distant evolutionary past. So far, however, such “paleovirological” analysis has only provided glimpses into the biology of ancient parvoviruses and their long-term evolutionary interactions with hosts. Here, we comprehensively map EPV diversity in 752 published vertebrate genomes, revealing defining aspects of ecology and evolution within individual parvovirus genera. We identify 364 distinct EPV sequences and show these represent approximately 200 unique germline incorporation events, involving at least five distinct parvovirus genera, which took place at points throughout the Cenozoic Era. We use the spatiotemporal and host range calibrations provided by these sequences to infer defining aspects of long-term evolution within individual parvovirus genera, including mammalian vicariance for genus Protoparvovirus, and interclass transmission for genus Dependoparvovirus.
Moreover, our findings support a model of virus evolution in which the long-term cocirculation of multiple parvovirus genera in vertebrates reflects the adaptation of each viral genus to fill a distinct ecological niche. Our findings show that efforts to develop parvoviruses as therapeutic tools can be approached from a rational foundation based on comparative evolutionary analysis. To support this, we published our data in the form of an open, extensible, and cross-platform database designed to facilitate the wider utilisation of evolution-related domain knowledge in parvovirus research.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: R.M.K. is a co-founder of Carbon Therapeutics, Inc., which is a co-assignee of a patent application filed on behalf of University of Massachusetts Medical School and Carbon Biosciences, Inc.

Funding: This work was supported by funding from the Association Monégasque Contre les Myopathies (RK), and the Bill & Melinda Gates Foundation (OPP1202116 to RK). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2022 Campbell et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Comparative studies have shown that endogenous parvoviral element (EPV) sequences occur frequently in vertebrate genomes, and many of these derive from germline incorporation events that occurred millions of years ago (Mya) [15–18]. In this study, we perform broad-scale comparative analysis of 752 published vertebrate genomes to recover 364 distinct EPV sequences representing at least 199 unique loci and involving at least five distinct parvovirus genera.
Through broad-scale phylogenetic and genomic analysis—encompassing all known vertebrate EPVs and parvovirus species—we reveal the long-term evolutionary interactions between parvoviruses and their vertebrate hosts.

In recent years, high-throughput sequencing and new metagenomic analytical methods have led to the discovery of numerous novel parvovirus species, and the taxonomy of the family Parvoviridae has now been extensively reorganised to accommodate this newly discovered diversity [1,2]. The availability of genome sequence data from a wide range of diverse parvovirus species provides unprecedented opportunities to utilise comparative approaches to investigate parvovirus biology.

Furthermore, progress in whole genome sequencing (WGS) has revealed that DNA sequences derived from parvoviruses (and many other virus groups) are widespread within metazoan genomes [12–14]. Such “endogenous viral elements” (EVEs) arise when infection of germline cells results in virus-derived DNA sequences being incorporated into chromosomes and inherited as host alleles. EVE sequences can sometimes persist in the gene pool over many generations with the result that some are genetically “fixed” (i.e., they reach a frequency of 100%). Fixed EVEs have unique value to studies of virus evolution because—much like a virus “fossil record”—they preserve retrospective information from which the evolutionary interactions of viruses and hosts across geologic timescales can be inferred. For example, identification of orthologous EVE loci in multiple related host species demonstrates that virus integration occurred in the common ancestor of those species, prior to their divergence [12,13]. A robust minimum age estimate for EVE integration can therefore be inferred from host species divergence times (which are in part derived from fossil evidence).
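
The minimum-age reasoning described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the species names, the divergence times, and the function `minimum_integration_age` are all hypothetical placeholders. The sketch assumes that pairwise host divergence times are already available (for example, from a time-calibrated host phylogeny) and that the locus has been confirmed as orthologous in every listed species.

```python
from itertools import combinations

# Hypothetical pairwise host divergence times in millions of years (Mya).
# These values are placeholders for illustration, not data from the study.
DIVERGENCE_MYA = {
    frozenset({"species_A", "species_B"}): 10.0,
    frozenset({"species_A", "species_C"}): 45.0,
    frozenset({"species_B", "species_C"}): 45.0,
}


def minimum_integration_age(hosts, divergence_mya):
    """Return a minimum age (Mya) for an EVE shared as an orthologous locus.

    If the same locus is present in several host species, integration must
    predate their most recent common ancestor, so the oldest pairwise
    divergence among those species gives a lower bound on the age.
    Returns None when fewer than two species carry the locus, because
    orthology then provides no bound.
    """
    unique_hosts = sorted(set(hosts))
    if len(unique_hosts) < 2:
        return None
    return max(
        divergence_mya[frozenset(pair)]
        for pair in combinations(unique_hosts, 2)
    )


if __name__ == "__main__":
    # Locus found in A and B only: bound is their divergence time.
    print(minimum_integration_age(["species_A", "species_B"], DIVERGENCE_MYA))
    # Locus found in all three species: bound moves back to the deeper split.
    print(minimum_integration_age(
        ["species_A", "species_B", "species_C"], DIVERGENCE_MYA))
```

The result is a lower bound only: the integration event could be considerably older than the deepest divergence among the sampled hosts, and the bound is no more reliable than the divergence estimates supplied.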
Parvoviruses have highly robust, icosahedral capsids (T = 1) that contain a linear, single-stranded DNA genome approximately 5 kilobases (kb) in length. Their compact genomes are typically organised into two major gene cassettes, one (Rep/NS) that encodes the nonstructural replication proteins, and another (Cap/VP) that encodes the structural coat proteins of the virion [11]. However, some genera contain additional open reading frames (ORFs) adjacent to these genes or overlapping them in alternative reading frames. All parvovirus genomes are flanked at their 3′ and 5′ ends by palindromic inverted terminal repeat (ITR) or “telomere” sequences that are the only cis elements required for replication.

Parvoviruses (family Parvoviridae) are a diverse group of small, nonenveloped DNA viruses that infect a broad range of animal species [1,2]. The family includes numerous important pathogens of humans and domesticated species, including erythroparvovirus B19 (fifth disease) [3], carnivore protoparvovirus 1 (canine parvovirus) [4], and carnivore amdoparvovirus 1 (Aleutian mink disease) [5]. Parvoviruses are also being developed as next-generation therapeutic tools: Adeno-associated virus (AAV) has been successfully adapted as a gene therapy vector, and other parvoviruses are leading candidates for human gene therapy [6,7]. Rodent protoparvoviruses show natural oncotropic and oncolytic properties and are being explored as potential anticancer therapeutics [8–10].

Results

Open resources for comparative genomic analysis of parvoviruses

To facilitate greater reproducibility and reusability in comparative genomic analyses, we previously developed GLUE (Genes Linked by Underlying Evolution), a bioinformatics software framework for the development and maintenance of “virus genome data resources” [19].
Here, we used the GLUE framework to create Parvovirus-GLUE [20], an openly accessible online resource for comparative analysis of parvovirus genomes (S1 and S2 Figs). Data items collated in Parvovirus-GLUE include the following: (i) a set of 135 reference genome sequences (S1 Table) each representing a distinct parvovirus species and linked to isolate-associated data (isolate name, time and place of sampling, host species); (ii) a standardized set of 51 parvovirus genome features (S2 Table); (iii) genome annotations specifying the coordinates of these genome features within reference genome sequences (S3 Table); and (iv) a set of multiple sequence alignments (MSAs) constructed to represent distinct taxonomic levels within the family Parvoviridae (Table 1 and S3 Fig).

Table 1. Summary of MSA hierarchy constructed for the family Parvoviridae. https://doi.org/10.1371/journal.pbio.3001867.t001

The Parvovirus-GLUE project is built by using GLUE’s native command layer to create a bespoke MySQL database that not only contains the data items associated with our analysis, but also maps the semantic links between them (e.g., the associations between specific sequences, genome features, and MSA segments) (S1 and S2 Figs). Standardised, reproducible comparative genomic analyses can then be implemented by using GLUE’s command layer to coordinate interactions between the project database and bioinformatics software tools.

Parvovirus-GLUE aims to provide a platform through which researchers working in different areas of parvovirus genomics can benefit from one another’s work. The project can be installed on all commonly used computing platforms and is also fully containerised via Docker [21]. In the interests of maintaining a lightweight, flexible approach, the published project contains only a single reference genome for each parvovirus species.
However, it can readily be extended to allow in-depth analysis at the species level (a tutorial included with the published resource demonstrates how this can be done; [20]). Parvovirus-GLUE is hosted in an openly accessible online version control system (GitHub), providing a platform for its ongoing development by the research community, following practices established in the software industry (S1C Fig) [22]. To facilitate its use across a broad range of analysis contexts, the resource adheres to a “data-oriented programming” paradigm that directly addresses issues of reusability, complexity, and scale in the design of information systems [23].

Conservation of genome features in Parvovirinae evolution

We examined the distribution of conserved genome features among Parvovirinae genera in relation to the Parvovirinae phylogeny (Fig 5). For example, the “telomeres” that flank parvovirus genomes are heterotelomeric (asymmetrical) in some genera (Amdo-, Proto-, Boca-, and Aveparvovirus) whereas they are homotelomeric (symmetrical) in others [31]. Interestingly, the distribution of this trait across sublineages within the subfamily Parvovirinae suggests that the asymmetrical form (which is found across the “Amdo-Proto” and “Ave-Boca” sublineages) is more likely to be ancestral. Similarly, in all Parvovirinae genera except Aveparvovirus and Amdoparvovirus, the N-terminal region of VP1 (the largest of the capsid proteins) contains a phospholipase A2 (PLA2) enzymatic domain that becomes exposed at the particle surface during cell entry and is required for escape from the endosomal compartments. Phylogenetic reconstructions indicate that this domain was present ancestrally and has been convergently lost in the Aveparvovirus and Amdoparvovirus genera (Fig 5) [2,32].

Parvovirinae genera also show variation in their gene expression strategies through differential promoter usage and alternative splicing.
Members of the Proto- and Dependoparvovirus genera use two to three separate transcriptional promoters, whereas the Amdo-, Erythro-, and Boca- genera express all genes from a single promoter and use genus-specific read-through mechanisms to produce alternative transcripts [2,11]. Interestingly, both the Proto- and Dependoparvovirus genera utilise the first of these expression strategies despite being relatively distantly related, suggesting that the use of separate promoters could be the ancestral strategy within the subfamily Parvovirinae. However, this would mean that mechanisms to express multiple genes from a single promoter were acquired independently by the parvovirus genera that utilise them (Fig 5).

Mammalian vicariance has shaped the evolution of protoparvoviruses

The recovery of a rich fossil record for protoparvoviruses allowed us to examine how their evolution has been shaped by macroevolutionary processes impacting on mammals over the past 150 to 200 My, such as continental drift [33]. Around 200 Mya, the supercontinent of Pangaea, then the sole landmass on the planet, began separating into two subcomponents (Fig 7). One (Laurasia) comprised Europe, North America, and most of Asia, while the second (Gondwanaland) comprised Africa, South America, Australia, India, and Madagascar. Mammalian subpopulations were fragmented by these events, and then fragmented further as Gondwanaland separated into its component continents. The associated genetic isolation due to geographic separation (vicariance) drove the early diversification of major subgroups, including indigenous mammalian lineages in South America (xenarthrans and marsupials), Australia (marsupials), and Africa (afrotherians). At points throughout the Cenozoic Era, placental mammal groups that evolved in Laurasia (boreoeutherians) expanded into other continental regions.
For example, the ancestors of contemporary New World rodents (which include capybaras, chinchillas, and guinea pigs among many other, highly diversified species) are thought to have reached the South American continent approximately 35 Mya [34].

Protoparvovirus phylogenies strikingly reflect the impact of mammalian vicariance—and later migration—on protoparvovirus emergence and spread during the Cenozoic Era. When protoparvovirus-related EPVs are included in ML-based reconstructions, the internal structure of the resultant phylogeny has extremely robust support (Fig 6). Moreover, this phylogeny can readily be mapped onto a phylogeny of mammals (obtained via TimeTree; [35]) so that the three major protoparvovirus lineages emerge in concert with major groups of mammalian hosts (Fig 7C). Importantly, however, one exception to this pattern occurs in the “Archeoproto” clade in which EPVs from New World rodent genomes group with EPVs found in marsupial genomes, with the closest relatives being EPVs identified in the common opossum (Monodelphis domestica), a South American marsupial (Fig 6).

We propose that, as shown in Fig 7, these relationships can be accounted for by a parsimonious model of protoparvovirus evolution wherein (i) ancestral protoparvovirus species were present in Pangaea prior to its breakup; (ii) vicariance among ancestral mammal populations led to the emergence of distinct protoparvovirus clades in distinct biogeographic regions, with the “archeoprotoparvovirus” (ArcPV) clade evolving in marsupials, and the “meso-” and “neo-” clades evolving in placental mammals; and (iii) founding populations of New World rodents were exposed to infection with ArcPVs following rodent colonisation of the South American continent (estimated to have occurred approximately 50 to 30 Mya; [34]).
This simple model can account for the phylogenetic relationships shown in Fig 6, as well as the high frequency of ArcPV-derived EPVs in the genomes of New World rodent species versus their complete absence from the genomes of Old World rodent species.

[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001867

Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.