(C) PLOS One
This story was originally published by PLOS One and is unaltered.

Population genetic models for the spatial spread of adaptive variants: A review in light of SARS-CoV-2 evolution [1]

Authors: Margaret C. Steiner (Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America); John Novembre (Department of Ecology, Evolution)

Date: 2022-11

Abstract

Theoretical population genetics has long studied the arrival and geographic spread of adaptive variants through the analysis of mathematical models of dispersal and natural selection. These models take on a renewed interest in the context of the COVID-19 pandemic, especially given the consequences that novel adaptive variants have had on the course of the pandemic as they have spread through global populations. Here, we review theoretical models for the spatial spread of adaptive variants and identify areas to be improved in future work, toward a better understanding of variants of concern in Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) evolution and other contemporary applications. As we describe, characteristics of pandemics like COVID-19, including the impact of long-distance travel patterns and the overdispersion of lineages due to superspreading events, suggest new directions for improving upon existing population genetic models.

Citation: Steiner MC, Novembre J (2022) Population genetic models for the spatial spread of adaptive variants: A review in light of SARS-CoV-2 evolution. PLoS Genet 18(9): e1010391. https://doi.org/10.1371/journal.pgen.1010391

Editor: Justin C. Fay, University of Rochester, UNITED STATES

Published: September 22, 2022

Copyright: © 2022 Steiner, Novembre. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This material is based upon work supported by the National Science Foundation (DGE1746045 to MCS) and the National Institute of General Medical Sciences (R01GM132383 to JN). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The Coronavirus Disease 2019 (COVID-19) pandemic has been one of the most significant events in recent human history where the processes of evolutionary biology are unquestionably paramount. The importance of “variants of concern” (VOCs) is now well recognized, and substantial effort now goes to monitoring and studying their properties [1,2]. In considering any adaptive variant, one of the key aspects of its evolutionary dynamics is how it spreads geographically, from the place of its origin to populations potentially across the globe. In the context of COVID-19, the successive establishment and geographic spread of adaptive variants has become a major factor in the progression of the pandemic and is now a dominant management challenge in reacting to and quelling the pandemic. Intrinsic to this process is the geographic spread of an adaptive variant, a topic that has long been studied in evolutionary population genetics using theoretical models.

Motivated by COVID-19 and the dispersal of variants of infectious agents more broadly, we provide a review of the theoretical population genetic literature on models for the geographic spread of adaptive alleles. While this has been an ongoing area of research for over 80 years, no recent literature review of these models is readily available. In our writing, we give special attention to how relevant these models are to the problems occasioned by the spread of adaptive variants in pathogens.
In a retrospective way, we ask: Given this long history of study, were the theoretical models available as the pandemic began ready to provide insights regarding Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)? And to the extent they were not, what gaps exist and what research directions should be emphasized for the future? While we limit ourselves to the theoretical population genetic literature, evolutionary aspects of pandemics overlap with many academic disciplines, and we recommend readers also see other excellent reviews in this broad area (for instance, [3–5]). We additionally limit our scope to prospective, forward-in-time theoretical population genetic models, thus excluding retrospective approaches such as genealogy-based and phylogeographic models for which existing reviews are available (see [6,7], respectively). As we will show, the COVID-19 pandemic highlights several gaps in current models for the geographic spread of adaptive alleles, the resolution of which will be informative for both scientific and public health goals. Before reviewing specific theoretical models of spatial spread with selection, it is necessary to introduce some foundational vocabulary for each of the processes involved in the spatial spread of alleles. At its core, dispersal involves movement of individuals between locations in space, as described by either continuous or discrete spatial models. In many continuous models, dispersal is assumed to be diffusive, meaning, dispersal is dominated by short-range movement with few to no large, discontinuous jumps. Alternatively, when large, discontinuous jumps are more common, dispersal is described as fat-tailed. The name arises because if one considers a probability distribution on the geographic displacement between an offspring allele and its parental allele (Fig 1A; also known as a dispersal kernel), the distribution has substantial probability mass in its tails, which represent long-distance jumps. 
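As a rough numerical illustration of the contrast between exponentially bounded and fat-tailed dispersal kernels, the Python sketch below compares tail probabilities and the largest of 100 draws for exponential, folded Gaussian, and Pareto displacements. The parameter values (unit mean, unit standard deviation, Pareto shape 1.5) and all function names are illustrative assumptions, not values taken from the article or from Fig 1.

```python
import math
import random

# Survival functions P(displacement > x) for three one-dimensional kernels.
# All parameter defaults are hypothetical and chosen only for illustration.
def survival_exponential(x, mean=1.0):
    return math.exp(-x / mean)

def survival_folded_gaussian(x, sigma=1.0):
    # P(|Z| > x) for Z ~ Normal(0, sigma^2)
    return math.erfc(x / (sigma * math.sqrt(2.0)))

def survival_pareto(x, alpha=1.5, x_min=1.0):
    # Power-law tail: decays more slowly than any exponential
    return 1.0 if x < x_min else (x_min / x) ** alpha

# Matching samplers built on the standard-library random module.
SAMPLERS = {
    "exponential": lambda rng: rng.expovariate(1.0),
    "folded_gaussian": lambda rng: abs(rng.gauss(0.0, 1.0)),
    "pareto": lambda rng: rng.paretovariate(1.5),
}

def median_max(sampler, n_reps=2000, n_draws=100, seed=1):
    """Median, over n_reps replicates, of the largest of n_draws displacements."""
    rng = random.Random(seed)
    maxima = sorted(
        max(sampler(rng) for _ in range(n_draws)) for _ in range(n_reps)
    )
    return maxima[n_reps // 2]

if __name__ == "__main__":
    for name, sampler in SAMPLERS.items():
        print(name, round(median_max(sampler), 2))
```

Under these assumed parameters the longest of 100 Pareto jumps is typically several times longer than the longest exponential or Gaussian jump, which is the sense in which rare long-distance events dominate under a fat-tailed kernel.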
Formally, the tails decay slower than an exponentially decaying function (Fig 1A; and see [8] for more on dispersal kernels). Dispersal may be isotropic, meaning movement in any direction is equally probable, or anisotropic (for example, when movement occurs along predominant axes). Lastly, dispersal may also be pairwise symmetric or asymmetric, an important example of asymmetry being where dispersal has a nonzero displacement vector (as might arise when movement in one direction is greater than in the reverse direction). Dispersal can also be spatially homogeneous, meaning the same dispersal distribution applies across the whole space, or in more complicated cases, spatially heterogeneous.

In some cases, dispersal is modeled as occurring among discrete populations (for example, lattice, stepping-stone, meta-population, and network models; Fig 1B). In these models, locations take the form of nodes in a network of discrete units, typically representing local well-mixed subpopulations, known as demes in the population genetic literature (Fig 1B). In this case, varying numeric weights on the edges connecting individual demes can be used to model spatially heterogeneous levels of dispersal, and the presence of edges between distant nodes in the network can represent long-distance dispersal (Fig 1C).

Fig 1. (A) Negative exponentially bounded vs. fat-tailed dispersal kernels. (Left) Three example dispersal kernels: exponential, folded Gaussian, and Pareto distributions. The Pareto distribution is a form of power-law distribution, which is a classic example of a “fat-tailed” distribution. (Middle) Probability density of each dispersal kernel, with the inset showing values in the tail. (Right) Density plots obtained from the set of highest values of 100 draws for each dispersal kernel. (B) Nearest-neighbor stepping stone model of migration. (C) Example of non-nearest-neighbor migration in the form of commercial flight routes originating from O’Hare International Airport in Chicago, Illinois. Dots represent airport locations. Constructed using publicly available data from https://openflights.org/data.html and Natural Earth using the R package ggmap [9]. https://doi.org/10.1371/journal.pgen.1010391.g001

In addition to dispersal, one must consider basic features of the mutational, adaptive, and reproductive processes. First, in the simplest case, an adaptive variant can be traced back to a single mutational event (variously described as a unique event polymorphism, or by saying that all carriers of the mutation are identical by descent). Alternatively, in scenarios with multiple origins or recurrent mutation, a particular mutation may have arisen multiple times, complicating the spatial modeling. The adaptive sequence landscape is a mapping of sequences to fitness, with an important feature being how many single-mutant neighbors of a particular sequence result in an increase in fitness and by how much [10]. Notably, the adaptive landscape can vary both spatially and temporally as local conditions change. The fitness of an allele will impact the number of offspring its carriers will have, i.e., the distribution of offspring number. A key feature of the distribution of offspring number is its variance and the relative amount of density in the tail (i.e., skew). In classical population genetic (and epidemiological) models, the variance of offspring number is usually assumed to be finite (for example, Cannings models [11]) and sometimes assumed to be small (for example, in the Wright–Fisher model, approximately 1), with the distribution having an exponentially bounded tail. Alternatively, in models with overdispersed offspring number distributions, a few carriers may have a very large number of offspring, creating what are referred to as “superspreading” events in an epidemiological context [12].
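To make the contrast between small-variance and overdispersed offspring number concrete, the sketch below draws offspring counts from a Poisson distribution and from a negative binomial distribution with the same mean, the latter built as a gamma-Poisson mixture. The mean of 2.5, the dispersion parameter k = 0.1, and the function names are hypothetical choices for illustration only; they are not estimates for SARS-CoV-2 and do not come from the article.

```python
import math
import random

def poisson_draw(rng, lam):
    """Poisson sample via Knuth's multiplication method (fine for modest rates)."""
    threshold = math.exp(-lam)
    count, running = 0, rng.random()
    while running > threshold:
        count += 1
        running *= rng.random()
    return count

def offspring_counts(rng, n, mean_r, dispersion_k=None):
    """Offspring numbers for n carriers.

    dispersion_k=None gives Poisson offspring (variance = mean).
    Otherwise each carrier's rate is Gamma(shape=k, scale=mean/k), giving a
    negative binomial with variance mean + mean**2 / k (overdispersed).
    """
    counts = []
    for _ in range(n):
        if dispersion_k is None:
            rate = mean_r
        else:
            rate = rng.gammavariate(dispersion_k, mean_r / dispersion_k)
        counts.append(poisson_draw(rng, rate))
    return counts

def summarize(counts, top_fraction=0.2):
    """Return (mean, sample variance, share of offspring from the top carriers)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    ranked = sorted(counts, reverse=True)
    top_n = max(1, int(top_fraction * n))
    share = sum(ranked[:top_n]) / max(1, sum(counts))
    return mean, variance, share

if __name__ == "__main__":
    rng = random.Random(0)
    print("Poisson:", summarize(offspring_counts(rng, 20000, 2.5)))
    print("Neg. binomial:", summarize(offspring_counts(rng, 20000, 2.5, 0.1)))
```

With these assumed values the two distributions share a mean, but under the negative binomial most carriers leave no offspring and a small fraction of carriers accounts for the large majority of transmissions.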
An additional note of clarifying vocabulary is that the terms variant, allele, and clade have closely related and often overlapping meanings and uses. Many classic theoretical population genetic models are formulated in terms of the abstract notion of an allele, which denotes a form of genetic material in a particular locus, regardless of its exact molecular basis. The term variant is quite similar, though used more often in a modern context where the exact molecular basis of the allele, i.e., the defining mutation(s), is known. Phylogenetic approaches often classify variation in the form of clades within an inferred phylogenetic tree: Members of the same clade carry a shared set of mutations that occurred on branches ancestral to the node defining the clade. Models for the spread of an allele, in many cases, can be applied to the spread of a clade. Clades are also sometimes referred to as lineages. For instance, the Pango nomenclature system identifies lineages with epidemiological relevance [13,14]. Additionally, the Greek letter system used by the World Health Organization denotes variants of concern and variants of interest based on evidence of impact on disease characteristics (for instance, transmissibility; [15]). Both Pango and Greek letter lineages/variants are also related to clade definitions given by Nextstrain [2] and GISAID [16]. It is important to note that this nomenclature is not consistent across viruses, with HCV and HIV lineages being referred to commonly as “genotypes” and “subtypes,” respectively [17,18]. In this paper, given our intention to focus on the theoretical population genetics literature, we will often use the terms allele and variant, noting that a clade in a phylogenetic tree is a special case of an allele where all carriers are identical by descent (see above).
Given the range of possibilities implied by the vocabulary just introduced, theoretical models can take many forms, with each conferring a degree of approximation or simplification. In this review, given our motivating interest in the geographic spread of SARS-CoV-2 variants, we focus mostly on the major landmarks in the spatial modeling of adaptive variant evolution and discuss relevant aspects of mutational and reproductive processes as they arise. Before beginning, we need to clarify one more key aspect of the terminology in our writing. In the population genetic models we discuss, the processes of geographic dispersal, mutation, and reproduction each occur every generation. To think about these models in the context of a virus such as SARS-CoV-2, a natural simplification is to treat each passage from infection to transmission as a reproductive generation for the virus. In this simplification, any change in dominant viral type between an infection and transmission (i.e., within-host evolution) is considered as a mutation. Given most SARS-CoV-2 transmission occurs over a spatial scale of meters, the dispersal in each “generation” is primarily mediated by the movement of infected individuals. Additionally, in the case of SARS-CoV-2, the environment includes the immune system of the human host (as well as any other localized factors affecting transmission). Thus, the treatment of SARS-CoV-2 in the framing of these evolutionary models represents a substantial simplification. Yet, there have been few reviews of the theoretical population genetic models of spread and the lens of SARS-CoV-2 provides an interesting test case for understanding new directions in which the models could be developed. 
Complications in adaptive evolution: Spatially heterogeneous selection, allele surfing, and adaptive landscapes

Many models cited above assume homogeneous selection across the entire geographic landscape; however, this is often untrue across the species range and so modeling heterogeneous selection is important. Differing selection along a cline has long been studied using reaction-diffusion models (see [49–52]) and in the context of integro-difference equations (for instance, [53]). Many relevant models of spatially varying selection also arise in the study of the evolution of quantitative traits [54–56], species range expansions and range limits (see, for instance, [57,58]), and discrete population or metapopulation models (see [59,60]). In the case of the spread of adaptive alleles, recent work has addressed the case where fitness is “patchy” across the geographic landscape, i.e., when certain alleles are adaptive in some local environments but deleterious in other regions, using stochastic approaches. For instance, an additional paper by Ralph and Coop [31] derives a critical distance between regions where the allele is favored, and this distance determines whether an allele is expected to evolve independently in each region or whether an influx of the adaptive allele from migration is expected. Notably, this model assumes that the dispersal displacement follows a Gaussian distribution and so may not be well suited for studying long-distance dispersal.

In studying alleles that appear to be adaptive, it is also important to consider processes that can result in neutral alleles appearing to be selected. In particular, allele surfing is a phenomenon in which alleles present at the edge of an expanding population drift to higher than expected frequencies [61,62]. These alleles may be neutral, adaptive, or even deleterious [63].
This phenomenon is a result of stochastic effects at the wave front, akin to serial founder effects, such that mutations close to the edge in effect produce more offspring than those occurring internally. Notably, after the expansion has occurred, the center of the spatial distribution of a “surfed” allele will often be distant from its point of origin, complicating the interpretation of frequency data [61]. Allele surfing produces distinct regions, referred to as sectors, in which an allele is carried at high frequency and which radiate from the allele’s mutational origin along the direction of the expansion [64]. This lowers local genetic diversity, resulting in a pattern that may be misinterpreted as evidence of a selective sweep. Long-distance dispersal events are capable of breaking down these regional patterns and maintaining local diversity, though the behavior of these systems is dependent on the extent of long-distance dispersal, i.e., the tails of the dispersal kernel [65]. The offspring distribution of alleles at the wave front is also overdispersed due to the occurrence of these chance events, violating the assumptions of standard population genetics models (see [66,67] for recent work in this area). Allele surfing dynamics are potentially relevant to quickly expanding viral populations, such as SARS-CoV-2 VOCs [68,69], and introduce further complexities to the interpretation of mutation frequency data.

Lastly, the models discussed thus far largely disregard the process of mutational introduction, a topic that can be discussed in terms of a nongeographic “landscape” of mutational or sequence space. Characterizing adaptive landscapes allows one to address questions regarding the number of possible adaptive variants accessible via mutation, including the number and characteristics of paths leading to them, and the probability of each occurring.
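The questions just listed can be made concrete on a toy landscape. The sketch below, which is only an illustration under invented assumptions, uses four biallelic sites with hypothetical additive fitness effects and one hypothetical epistatic interaction, then counts beneficial single-mutant neighbors and the mutational orderings along which fitness increases at every step. None of the numbers or function names come from the article.

```python
import math
from itertools import permutations

# Hypothetical per-site fitness effects of the derived state (1) at four sites,
# plus one hypothetical pairwise interaction between sites 2 and 3.
EFFECTS = (0.10, 0.05, -0.02, 0.03)
EPISTASIS = {(2, 3): 0.08}

def fitness(genotype):
    """Fitness of a genotype given as a tuple of 0 (ancestral) / 1 (derived)."""
    value = 1.0 + sum(e for e, g in zip(EFFECTS, genotype) if g)
    for (i, j), bonus in EPISTASIS.items():
        if genotype[i] and genotype[j]:
            value += bonus
    return value

def beneficial_neighbors(genotype):
    """Single-mutant neighbors with strictly higher fitness than genotype."""
    base = fitness(genotype)
    better = []
    for site in range(len(genotype)):
        mutant = list(genotype)
        mutant[site] = 1 - mutant[site]
        mutant = tuple(mutant)
        if fitness(mutant) > base:
            better.append(mutant)
    return better

def count_accessible_paths(start, end):
    """Count direct mutational orderings from start to end along which every
    step increases fitness; also return the total number of orderings."""
    sites = [i for i, (a, b) in enumerate(zip(start, end)) if a != b]
    accessible = 0
    for order in permutations(sites):
        current = list(start)
        increasing = True
        for site in order:
            before = fitness(tuple(current))
            current[site] = end[site]
            if fitness(tuple(current)) <= before:
                increasing = False
                break
        accessible += increasing
    return accessible, math.factorial(len(sites))

if __name__ == "__main__":
    print(len(beneficial_neighbors((0, 0, 0, 0))))
    print(count_accessible_paths((0, 0, 0, 0), (1, 1, 1, 1)))
```

In this toy case the mutation at site 2 is deleterious unless site 3 has already changed, so only half of the 24 possible orderings are accessible. With L biallelic sites there are 2^L genotypes and L! direct orderings between the two extremes, so exhaustive enumeration of this kind quickly becomes infeasible.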
Describing a full adaptive landscape is extremely difficult due to the large number of mutational combinations and orderings of paths, though important insights can be made by focusing on subsets of relevant mutations (as in [70]) or through approaches such as deep mutational scanning (see [71] for such analyses in SARS-CoV-2).

Discussion

In considering population genetic models for the spatial spread of adaptive alleles and their potential applications to SARS-CoV-2 variant evolution, we have identified several shortcomings of the models with respect to both evolutionary and epidemiological complexities. These include aspects of both geographic dispersal (i.e., simultaneous short- and long-distance dispersal, dependence of spread on heterogeneous travel networks) and transmission or reproduction of the virus itself (i.e., superspreading). Beyond what we review above, one must also consider details of viral life history, such as how viral fitness is mediated through components of immune evasion and transmissibility [124], as well as the properties of the human adaptive immune system as an evolutionary system in and of itself [125]. For instance, the phenomenon of accelerated SARS-CoV-2 within-host evolution in immunocompromised individuals [126,127] has been recently discussed in the context of the Omicron variant, which carries an exceptionally high number of derived mutations [69].

Our review thus highlights several goals for future work (Table 2). An important strategic challenge is how to address them. The computational epidemiology literature includes many large-scale, parameter-rich models (for instance, the CityCOVID model from Argonne National Laboratory; [128]). Phylodynamic and phylogeographic methods take a retrospective approach (see, for instance, [68,129,130]). The theoretical population genetic literature (reviewed here) tends to be more abstract and prospective.
Certainly, an integrative model involving spatial, genetic, and epidemiological aspects of SARS-CoV-2 evolution would be ideal, in principle, for developing better prediction and insight regarding the evolution of viral pathogens such as SARS-CoV-2. That said, more elaborate models pose an incredible technical challenge to develop. Even if achievable on a technical level, there is an inherent risk of failure given the vagaries of human behavior, including the unpredictable ways humans have responded to policy changes.

Table 2. Ongoing challenges for future work in theoretical models of the spread of adaptive viral variants. https://doi.org/10.1371/journal.pgen.1010391.t002

Yet, work toward addressing the complexities listed in Table 2, either independently or in tandem, remains a worthwhile goal. Through rigorous spatial modeling, qualitative aspects of the possible evolutionary dynamics of viruses like SARS-CoV-2 will likely become apparent that can help guide public health responses. For instance, already, the core-halo structure identified in Paulose and colleagues [23] is insightful when interpreting observations of new clusters of variant transmission (see above). As a second example, the initial success of effective distance as a metric for simplifying models for the spread of SARS-CoV-2 [84,87] suggests the metric may be useful for genetic models. As a graph-based metric, its efficacy is also an indicator of opportunities to utilize results from the more general literature on spreading processes on networks [131–133]. In general, by studying these models and the patterns they predict, in particular those which are unexpected or perhaps counterintuitive outside of a spatial context, we may learn principles that will aid in the management of adaptive variants in future epidemics and pandemics.
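The effective distance metric mentioned above is not defined in this section. A minimal sketch follows, assuming the commonly used definition of Brockmann and Helbing (2013): an edge from location n to location m has effective length 1 - ln(P_mn), where P_mn is the fraction of travelers leaving n who go to m, and the effective distance between two locations is the length of the shortest path under those edge lengths. The flux values, location labels, and function names below are hypothetical.

```python
import heapq
import math

def effective_edge_lengths(flux):
    """Convert flux[src][dst] (travelers per unit time, hypothetical values)
    into effective edge lengths 1 - ln(P), with P the outgoing fraction."""
    lengths = {}
    for src, outgoing in flux.items():
        total = sum(outgoing.values())
        lengths[src] = {
            dst: 1.0 - math.log(f / total)
            for dst, f in outgoing.items()
            if f > 0
        }
    return lengths

def effective_distances(flux, origin):
    """Shortest-path effective distance from origin to every reachable
    location, computed with Dijkstra's algorithm."""
    lengths = effective_edge_lengths(flux)
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, w in lengths.get(node, {}).items():
            candidate = d + w
            if candidate < dist.get(nbr, math.inf):
                dist[nbr] = candidate
                heapq.heappush(heap, (candidate, nbr))
    return dist

# Hypothetical three-location travel network.
EXAMPLE_FLUX = {
    "A": {"B": 90, "C": 10},
    "B": {"A": 50, "C": 50},
    "C": {"A": 10, "B": 90},
}

if __name__ == "__main__":
    print(effective_distances(EXAMPLE_FLUX, "A"))
```

In this made-up network the route from A to C through the well-connected location B is effectively shorter than the thin direct connection, which illustrates why arrival order on a travel network need not follow geographic distance.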
Overall, improved modeling of these processes has the potential to answer many compelling questions regarding SARS-CoV-2 and future pandemics, for example, how often will novel adaptive variants spread only locally versus globally? What kind of lag time should we expect between origin in one location and arrival in another? How much interference should we expect between adaptive variants? And how is this impacted by the geographic location of origin of new variants and/or patterns of long range dispersal? Moreover, what is the relative importance of public health measures that control local transmission (for example, mask policies) versus host movement (for example, travel bans)?

In closing, our review of population genetic models for the spatial spread of adaptive variation identifies major gaps, in particular with respect to spatially and temporally varying dispersal, high variance in offspring number, and simultaneously spreading adaptive lineages. While we have largely focused our discussion on practical applications to modeling SARS-CoV-2 VOCs, the requisite development of theory will advance spatial genetic modeling generally. Beyond preparing for modeling and reacting to future epidemics, continued work in this area will give insights to problems in ecology and evolutionary biology such as the spread of invasive species and the consequences of population structure for adaptive evolution.

Acknowledgments

We thank Sarah Cobey, Rasa Muktupavela, Fernando Racimo, Daniel Rice, and Montgomery Slatkin for their comments on previous versions of this manuscript, as well as members of the Novembre, Berg, and Steinrücken labs for helpful discussions at various stages of this project.

[1] Url: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010391

Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.