(C) PLOS One This story was originally published by PLOS One and is unaltered. . . . . . . . . . . Getting higher on rugged landscapes: Inversion mutations open access to fitter adaptive peaks in NK fitness landscapes [1] ['Leonardo Trujillo', 'Université De Lyon', 'Insa-Lyon', 'Inria', 'Cnrs', 'Université Claude Bernard Lyon', 'Ecl', 'Université Lumière Lyon', 'Liris', 'Lyon'] Date: 2022-11 It can be verified that for point mutations 〈κ〉 = N, while for inversions mutations 〈κ〉 > N (for N ≥ 2) and therefore the genotypes are “more connected” to each other. Paraphrasing in terms of evolutionary biology, they are “more mutable”. In this sense, 〈κ〉 defines a mean mutability, which quantifies the ability to reach a different genome when the sequence undergoes a mutation. This property also holds for linear chromosomes, although as mentioned above, the average of node degrees is smaller since the number of possible inversions is lower than for circular chromosomes. Representative examples for N = 4, 7 and 10. Colour indicates the node’s degree κ. The reported values correspond to average node degrees 〈κ〉. The upper graphs show the point mutation case, verifying that the mutational networks are Hamming graphs and therefore isomorphic to their genotype spaces: (A) ; (B) and (C) . The lower graphs (D), (E) and (F) correspond to the inversion mutations cases, where we can note that these mutational networks are not isomorphic to their genotype spaces. For point mutations the mutational network is the Hamming graph [ 51 , p. 230] (as we can see in Fig 2A, 2B and 2C for , and respectively), which is isomorphic to the canonical genotype space . The notion of isomorphism means that the mutational network for point mutations preserves the adjacency of the edge structure of the genotype space. Historically, the canonical graph of the genotype space overshadowed the richness of the (full) mutation graph, since theoretically only point mutations are usually considered as generators of mutational networks. For inversions, the mutational network does not necessarily inherit the (local) topology of the genotype space. For example, Fig 2D, 2E and 2F outline the structure of the mutational networks for inversion mutations for N = 4, 7 and 10. From the point of view of graph theory, inversion mutations “rewire” the adjacency of the genotype space, i.e. they link genomes such that d H ( x , x ′) ≥ 1. Also, in graph terms, the total number of accessible mutants per genotype corresponds to the node’s degree κ x (defined as the number of edges in the graph incident on x [ 50 , p. 3]). Therefore, κ x = D ν ( x ). On the other hand, the average node degree quantifies accessible-mutants (nodes) interconnections: Now, for each directed graph of mutations m( x ), we can associate a graph M( x ) on the same set of vertices V(m). Corresponding to each directed edge of m, there is an edge of M with the same ends (loops being excluded). In this sense, the graph M( x ) is the underlying (simple) graph of the directed graph m( x ). That is, the graphs without self- and directed-edges so that when several edges (mutations) connect the genotype x with the same accessible-mutant x ′, only one (undirected) edge is kept. Thus, every directed graph m( x ) defines a unique, up to isomorphism, reduced graph M( x ) (see the mathematical definition in [ 50 ], p.3). Now it is natural to do the union of each graph M( x ), to describe how genotypes can be reached from somewhere in the genotype space in one mutation operation. We call this object the mutational network. It is defined as: Example of the total enumeration of inversion mutations, represented as graphs of accessible mutants, for each one of the 2 4 genotypes (central red nodes) with size N = 4. Edges colour quantifies the Hamming distance d H ( x , x ′), between the central nodes x (wild-types) and their mutants x ′. Each wild type is labeled in red and its number of accessible mutants D I is also displayed. Let us remark that this enumeration also depends on the fact that in our model the sequences are circular (i.e. periodic boundary conditions: x N+i = x i , ∀i ∈ {1, …, N}). Inversion’s combinatorics is not trivial since it involves the permutation of a subsequence and its flips between each strand (see Methods , schemata 2 and 6). Nevertheless, from the algorithmic point of view, for a given genotype x the mutational operations can be used to enumerate all the accessible mutants (see Methods , Algorithm 1: Mutate). Also, the combinatorics can be represented as a directed multigraph of mutations (see the mathematical definition in [ 50 , p.8]), i.e. the ordered triple where is the set of vertices (formed by a given genotype x and its mutated genotypes ), E(m) is the set of directed edges (from genotype x to a mutated genotype x ′) and is an incidence relation that associates to each element of E(m) an ordered pair of V(m). In fact, the incidence relation I m corresponds to a mutation operation (see Methods , Eqs ( 3 ), ( 4 ) and ( 5 )). As an example, in Fig 1 we display the atlas of accessible-mutants for N = 4, constructed by calculating all inversion mutations for each one of the 2 4 (wild-type) sequences (central red dots). In this example (see also Table 1 ), it is verified that the number of accessible-mutants for inversion mutations D I ( x ), ∀ x ∈ {0, 1} 4 is in the set {7, 8, 13} (unlike the case for point mutations, which must be a singleton, i.e. the single-value set: D P ( x ) ∈ {4}, ∀ x ∈ {0, 1} 4 ). We can also verify that min(D I ( x )) ≥ max(D P ( x )). In Table 1 we show the enumeration of accessible-mutants for inversions and point mutations for genome sizes ranging from N = 2 to 10. From Fig 1 and Table 1 , we can see that the combinatorics of the inversion mutations is not trivial. We can verify that the maximum number of accessible mutants is equal to N 2 − N + 1, which corresponds to the trivial cases of genotypes x with x i = 0, ∀i ∈ {0, …N} and x i = 1, ∀i ∈ {0, …N}. Note that for a circular sequence of size N, the total number of inversion mutations is N 2 , while for point mutations this number is equal to N. However, the number of mutants accessible by inversions is lower than the total number of inversions mutations (D I < N 2 ). This is due to “degenerate” inversion mutations: several inversions—occurring between different loci and/or for different interval sizes—may mutate the initial sequence to the same accessible mutant (see the multiple edges in Fig 1 ). In Fig 1 , we can also verify that there are loops (an “edge” joining a vertex to itself), that is, “invariant inversions” that preserves the nucleotide sequence after the inversion operation (i.e. d H ( x , x ′) = 0). It can easily be shown that the fraction of invariant inversions converges to 1/N (see S1 Text , section 1). A very important consequence of inversions is that mutated sequences can differ with the wild type by more than one nucleotide, i.e. d H ( x , x ′) > 1 (edges colours in Fig 1 denote the values of the Hamming distance). This result allows us to gain a first insight of how inversions can promote the escape from local fitness peaks: they can “connect”, in a single mutational event, genotypes that are at two or more point-mutational steps away. It is pertinent to remark that the combinatorics of inversions for alphabets with size would imply even more connections. We propose that, for a sequence , the mutation operation (i.e. the mechanistic representation of point or inversion mutations) build the set of accessible mutants (the subindex ν denotes the type of mutation: P for point mutations and I for inversion mutations). Therefore, the number of neighbouring mutants can be reformulated as A convenient way to organise such a set is by graphs connecting two sequences x and x ′ that differ by one point mutation (i.e. d H ( x , x ′) = 1). This is the so-called Hamming graph —a special case of the hypercube graph (the well-known graph representation of the genotype space). On the other hand, the Hamming distance for inversion mutations forms a set of integers satisfying meaning that, contrary to point mutations, the Hamming distance of inversion mutations range from zero to N (see Methods ). Note that if the inversion spans the entire chromosome, then d H ( x , x ′) = N, all loci have changed, but it also implies that 5’— 3’ becomes 3’—5’ and vice versa and nothing has changed biologically. We first study how inversion mutations can increase the number of accessible mutants. For this we translate the canonical notion of neighbour genotypes (see Methods , Eq 1 ) into a graph theory approach, and analyse the simplest and most familiar geometric object in molecular evolutionary theory: the discrete space of binary sequences (unless explicitly stated we will henceforth consider only binary alphabets). Thinking topologically, all the sequences x with N binary-nucleotides x i ∈ {0, 1}, ∀i = 1, …, N and , define the set of 2 N possible genotype combinations. A canonical measure to characterise the topology of the set is the Hamming distance Finally, contrary to point mutations, inversions are not commutative: in many cases, two overlapping inversions applied to a same initial sequence in direct or reversed order lead to different final sequences. This can easily be shown on an example: where, starting from the same sequence, the two inversions (inv(2, 3) and inv(2, 4)) give different outcomes depending on their order. Note that, given this property, the classical definition of mutational epistasis does not hold for inversion mutations. Representative instances of the NK model for N = 4 and their fitness networks in layered representation. The layers are constructed such that each node is assigned to the first possible layer, with the constraint that all its predecessors must be in earlier layers. The colors of the nodes correspond to the values of the out-degrees, i.e. the number of edges going out of a node (note that color scales differ in range between panels). Therefore, nodes with node out-degrees equal to zero correspond to local fitness maxima (sink nodes). The landscapes’ ruggedness are: single peaks K = 0, intermediate ruggedness K = 1 and full rugged case K = 3. Node sizes are scaled with fitness values (the best fitness, the largest, and vice versa). Global maximum of fitness are encircled in red. While the global minimum in blue. The total number of fitness maxima and minima are also reported. See S1 Fig for epistatic interactions with adjacent neighbouring. To illustrate the definition of fitness network used in this work, we need to build fitness landscapes instances. For that, we use the well-known NK-model [ 45 , 46 , 48 ], recalling that for a genome of size N, the parameter K ∈ {0, …, N − 1} corresponds to the epistatic coupling between loci and thus tunes the ruggedness of the landscape (for a brief introduction see Methods ). Here, we use two neighbourhood coupling models between loci: i) the adjacent model in which the K loci are those closest to a focal locus x i ; ii) the random neighbourhood model, where the K loci are chosen randomly among the N − 1 loci other than x i (an illustration of epistatic interactions is sketched in NK model in Methods ). In Fig 3 we show representative examples of fitness networks engineered for N = 4 with K epistatic random neighbours (see S1 Fig for the fitness networks with K epistatic adjacent neighbours). The landscapes range from single peaked K = 0 (no epistatic interaction) to full rugged landscapes K = 3 (highly connected epistatic interactions). Global fitness maxima and minima are highlighted by encircled nodes. In Fig 3A, 3B and 3C we show an instance of fitness networks for genotypes connected through pathways with point mutation steps. They have the same topology as the genotype space and, therefore, are isomorphic to the graph representation of the fitness landscape . When K = 0 all the trajectories arrive to the single global maximum of fitness. When we construct the fitness network with inversion mutations, we can verify in Fig 3D, 3E and 3F that there are more paths between all the genotypes, and so it is easier for an evolutionary process to explore more domains of the fitness landscape compared to point mutations. Many of these paths connect genotypes such that d H ( x , x ′) > 1, and therefore, are like jumps between distant domains of the landscape. We can also verify that, for a given fitness function f, a node that is a local optimum on the fitness graph , is not necessarily a local peak for inversion mutations in the fitness network. In most of the cases, the fitness landscape is “smoothed out” by inversion mutations, since the notion of local peak fades in the fitness network. However, the local peaks are not always smoothed out, as we can see in Fig 3E for K = 1, where genotype 0101 remains as a local peak. This is because 0101 cannot be mutated to genotype 1001 by any inversion mutation (see also the combinatorics of accessible mutants for 0101 in Fig 1 ). Note that for inversion mutations, it is verified that in some cases the global maximum can be reached from the global minimum in a single evolutionary step with d H ( x , x ′) ≠ 1, e.g. Fig 3E with d H = 2. Likewise, as the mutational network we can also define the fitness network , but in this case the edges are directed from genotypes with lower fitness to genotypes with higher fitness: which also depends on the neighbouring , with ν denoting the type of mutation (P and I for point and inversion mutations respectively). Therefore, the fitness network is the anisotropic version of the mutational network, the direction of the evolutionary paths being fitness-dependent. Precisely what mutational and fitness networks reveal to us is the ensemble of possible evolutionary paths. But in this case, fitness networks are diagrams showing the paths upward and their “altitudes” (fitness values). Even though the nature of the genotype-to-fitness function is still largely unknown, an easy way to introduce it into computational models is by assuming that for genotypes there exists a map from the set to the real numbers . In the graph-based representation, each node (genotype) then possesses a fitness value f( x ). This fitness landscape graph F is isomorphic to the hypercube graph (i.e. the genotype space) and therefore can also be represented as Hamming graphs, providing a fitness value per node. So, the fitness landscape graph is univocally defined as: Getting higher on rugged landscapes Up to now, we have shown results on the combinatorial (topological) differences between point and inversion mutations. Inversions cannot be mapped to the classical “fitness landscape” metaphor—being better represented through mutational networks and their juxtaposition with fitness landscapes through fitness networks. This is because, for inversion mutations, there are shortcut routes connecting distant sequences (differing by more than one base) in the genotype space and consequently in the fitness landscape. Therefore, this can be interpreted as “escape routes” from local peaks. We want to verify if as a consequence of these escape routes, an evolutionary process will be able to reach higher peaks of fitness. For that, we performed computer simulations in the SSWM setting, where adaptation occurs by sequential fixing of novel beneficial mutations (see Adaptive walks in Methods). We focus our study on a series of n-repetitions of adaptive walks, where the evolutionary process is driven by (random) mutational steps. For a given set of independent initial random genomes with size N = 100, {x 0 } ∈ {0, 1 }100, we create two pools of n = 100 simulations for point mutations and inversions respectively. As before, we use the NK model to engineer rugged fitness landscapes. In each round, the landscape is the same for simulations with point mutations and inversions, respectively. For independent explorations over (sub)domains of the landscape, we monitor the time-evolution of the fitness values until a fitness optimum is reached. This is when it is verified, in the simulation, that a genotype satisfies Subindex ν denotes the type of mutation (P for point mutations and I for inversion mutations). Then, we calculate the mean fitness value per K as: where the notation | K means that the average is calculated for a fixed value of K ∈ {0, …, N − 1}. In Fig 4 we show the behaviour of the average fitness 〈f ν 〉 K , calculated from n = 100 instances of adaptive walks simulations in NK landscapes, with N = 100 and values of K ranging from 0 to 99. The simulations correspond to the case of epistatic interactions with K closest adjacent loci (‘+’ marker), and with K randomly chosen loci (‘o’ marker). The markers of the simulations with point mutations and inversions are coloured with blue and red respectively. PPT PowerPoint slide PNG larger image TIFF original image Download: Fig 4. The average value of local fitness maxima suggests the escape from local fitness peaks by inversion mutations. Changes in mean final fitness for different epistatic parameter K for inversion (red) and point mutations (blue), averaged for 100 instances of adaptive walks simulations in NK landscapes. The circle (respectively cross) markers corresponds to random (respectively adjacent) neighbouring epistatic interactions. Inset: Difference Δf K between the mean of local fitness maxima of inversions and point mutations, for random (circles) and adjacent (cross) neighbouring epistatic interactions. https://doi.org/10.1371/journal.pcbi.1010647.g004 For the simplest case K = 0, with no epistatic interaction between neighbouring loci, we can verify in Fig 4, that the average fitness for point mutations and inversions are equal 〈f P 〉 K=0 = 〈f I 〉 K=0 ≃ 0.667. In this case, the landscape is smooth with only a single peak. Hence, mutations that increase fitness are not hard to find and 〈f ν 〉 K=0 is independent of mutation types. This result also agrees with the Kauffman’s (analytical) result , which was calculated using order statistics arguments [48, p. 55]. Then, for K > 0, the average fitness 〈f ν 〉 K increases with K until reaching a maximum value of fitness. In relation to this maximum, in Fig 4 we can identify the following four cases: i) point mutations with adjacent epistatic interactions, 〈f P 〉 K = 0.707 for K = 2; ii) point mutations with random epistatic interactions, 〈f P 〉 K = 0.722 for K = 5; iii) inversions with adjacent epistatic interactions, 〈f I 〉 K = 0.747 for K = 4; and iv) inversions with random epistatic interactions 〈f I 〉 K = 0.732 for K = 4. After these maximum, the fitness values decrease as K increases. This trend is also consistent with the seminal simulations carried out by Kauffman and Weinberger [46, 48]. For example, when K → N − 1, the mean fitness converge to the same value, regardless of the type of epistatic interaction neighbourhood. For point mutations with random and adjacent epistatic interactions, we obtained 〈f P 〉 K=96 ≃ 0.580 and this agrees very well with the Kauffman’s numerical outcomes for K = 96, c.f. [48, Tables 2.1 and 2.2]. It is worth mentioning that these trends—lower fitness being associated with increasing epistatic interactions—correspond to the well-known “complexity catastrophe” described by Kauffman [48, p. 52] (see also Refs. [52, 53]). These numerical outcomes confirm that our numerical set-up reproduces, with point mutations, what is known about 〈f P 〉 K vs K in the NK model [46, 48]. Now, what’s new is that for inversions, the average fitness values are higher than those for point mutations. Indeed, for K → N − 1, the average fitness trend is very different from that of point mutations. For example, the complexity catastrophe estimates that as K increases, the expected fitness of the local maximum (for point mutations) decreases toward 1/2 [48], which is indeed verified here. But for inversions, the evolutionary process reaches higher expected fitness values 〈f I 〉 K=99 = 0.610 > 〈f P 〉 K=99 = 0.579, for both random and adjacent epistatic interactions. To generalize these results, we reproduced this experiment with other, more restrictive, definition of inversion mutations. More specifically, we tested inversions on linear chromosomes (with boundary conditions) and circular inversions with upper size limit s ≤ N, ranging from 1%—a single locus (i.e. inversions are like point mutations)—up to 100% of chromosome size. Simulations with linear chromosomes show no significant difference from our reference circular model (see supplementary S1 Text, section 2 for detailed results). Simulation with an upper size limit show that the final fitness values increases as the size limit increases. However, the gain is maximal for small s values (typically up to s = 16) showing that small and mid-sized inversions are sufficient to reach high fitness peaks. Fig 4 also show that, in average and for almost all K, the adaptive walks reach higher fitness peaks through inversions than through point mutations. To better visualise this statement, in the inset of Fig 4 we plot the following difference: for the two neighbourhoods. To specify which type of epistatic interaction we are referring to, in what follows, we will use the notation (Δf K ) rnd for the case in which the fitness differences correspond to simulations with random neighbourhood, and (Δf K ) adj for the adjacent ones (respectively the markers ‘o’ and ‘+’ in Fig 4). In the absence of epistatic interactions, i.e. K = 0, we can note that (Δf K ) rnd = (Δf K ) adj = 0. Then, in the presence of epistatic interactions, (Δf K ) rnd is monotonically increasing between 0 < K ≤ 2. Then for 2 < K ≤ 31, (Δf K ) rnd is monotonically decreasing, and for K > 31 it is again monotonically increasing. For random epistatic interactions between 2 < K ≤ 50, the fitness values for the case with inversion are not very different from those of point mutations (note in Fig 4 that, in this interval, the red and blue curves with marker ‘o’ are very close to each other). On the other hand, we can observe that for 5 < K ≤ 65, (Δf K ) adj is monotonically decreasing, and for K > 65 it is again monotonically increasing. We can also observe that, between K > 0 and K ≐ 80, (Δf K ) rnd < (Δf K ) adj . Contrary to the case of random epistatic interactions, for adjacent interactions (Δf K ) adj is higher since the fitness values reached by inversions are higher than those reached by point mutations (note in Fig 4 the gap between the red and blue curves with the marker ‘+’). So, we infer that an inversion—modifying several loci—results in a mutually advantageous conjunction with local epistatic interactions, that allows explorations of more combinations that can be beneficial. Finally, for K > 80, (Δf K ) rnd ≈ (Δf K ) adj (still monotonically increasing), i.e. regardless of the epistatic interaction neighbourhood, inversions can reach higher fitness values and attenuate the complexity catastrophe by not decreasing towards 1/2 (compare with [48, Tables 2.1 and 2.2] and also note in Fig 4, the gap between the tails of the red and blue curves). Therefore, our results show that in the presence of inversion it is possible to reach higher fitness when compared to adaptive walks with only point mutations. A direct interpretation of this result is given by the properties of the inversion’s mutational network as it has been described above (see e.g. Fig 2). Indeed, as it is more densely connected than the point-mutation mutational network, it is likely to allow a larger exploration of the fitness landscape and thus reach higher peaks, as observed here. However, given that the ruggedness of a fitness landscape depends on the mutational operator at work, an alternative explanation is that inversion mutations result in a smoother fitness landscape than that of point mutations, hence facilitating the finding of trajectories leading to higher peaks. To test this assumption, we generalized the roughness measure introduced by Aita, Ikamura and Husimi in [54]. More precisely, we measured deviations from fitness additivity (in the language of the NK model, we say that a landscape is additive when it is non-epistatic, that is, K = 0). We here use the term roughness from [54] to distinguish this measure, which is a local one, from the classical ruggedness of the NK-fitness landscape which is a global property of the landscape. See, for example, Refs. [55] and [56], for other definitions of roughness and how they are calculated. Following the approach introduced in [54], we computed the roughness of the fitness landscape as the root mean square fitness variation due to each possible mutation, for both point mutations and inversions (see Methods for a formal mathematical definition of this measure). The results are shown in Fig 5. As expected, for point mutations the roughness of the fitness landscape is (almost) linearly proportional to the epistatic interaction parameter K, for both types of epistatic interaction neighborhoods (adjacent and random). In contrast, in the case of inversion mutations, the roughness is always greater than that of point mutations, this trend being particularly visible for random epistatic neighborhood when compared to adjacent neighborhood. Interestingly, even for K = 0, the roughness is already higher than for point mutations (both epistatic neighborhood being equivalent in that case) while for K = N − 1 the roughness converges approximately towards similar values both for inversions and point mutations whatever the epistatic neighborhood. PPT PowerPoint slide PNG larger image TIFF original image Download: Fig 5. Average value of the local measure of roughness. Local roughness measured as the mean square differences between the fitness in a point of the landscape and its neighbours, for inversion (red) and point mutations (blue), for adjacent (crosses) and random (circles) epistatic interaction neighbourhoods, averaged for 100 instances of NK fitness landscapes. https://doi.org/10.1371/journal.pcbi.1010647.g005 This result shows that inversion mutations actually don’t smooth the fitness landscape. On the opposite, the average roughness increases much faster with K in the case of inversions than in the case of point mutations (the roughness of the inversion-based fitness landscape with K = 1 being similar to the one for point mutations with K = 50, Fig 5). This result also suggests a new explanation for the advantage of inversion mutations over point mutations. While the high connectivity of the inversions mutational network enables a better exploration of the fitness landscape, this effect is hampered (and not facilitated) by the effect of inversions on the roughness and while the combination of both positive (connectivity) and negative (roughness) is favorable for all values of K in the case of adjacent neighborhood, it is only favorable for high values of K in the random epistatic neighborhood. Indeed, for epistatic interactions with a random neighborhood, there is no noticeable difference between the average fitness values up to K ≃ 40 (red and blue curves with markers ‘o’ in Fig 4). This is likely to be due to the fact that inversions are segmental operators. When epistatic interactions are confined to a segment close to the focal nucleotide (which is the case for the adjacent neighborhood but not the random one), both segments can largely overlap, hence limiting the effect of the inversion to a set of epistatically interacting genes. This reduces the average roughness (compared to random epistatic neighborhood), leading to a more efficient exploration of the fitness landscape. Although a full mathematical proof is out of the scope of this paper, we develop a representative mathematical analysis that illustrates the origin of this pattern in S1 Text (section 3). [END] --- [1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010647 Published and (C) by PLOS One Content appears here under this condition or license: Creative Commons - Attribution BY 4.0. via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/