(C) PLOS One This story was originally published by PLOS One and is unaltered. . . . . . . . . . . Amino acid variants of SARS-CoV-2 papain-like protease have impact on drug binding [1] ['Agata P. Perlinska', 'Centre Of New Technologies', 'University Of Warsaw', 'Warsaw', 'Adam Stasiulewicz', 'Department Of Drug Chemistry', 'Faculty Of Pharmacy', 'Medical University Of Warsaw', 'Mai Lan Nguyen', 'Faculty Of Mathematics'] Date: 2022-12 SARS-CoV-2 is undergoing massive academic evaluations, that bring vast amount of data to analyze including its genetic sequence. Due to the pandemic character of the virus, the sequences are gathered from all-around the world and then deposited in the databases for analysis. We used NCBI Virus database and ORF1a amino acid sequences, that include papain-like protease (PLpro) domain, to evaluate its sequential variability in the human population. Our main focus is to assess if the current mutations of the PLpro are present in the inhibitor binding site and if so, what their impact is on the protein-ligand interactions. Out of 2,610,999 PLpro sequences, we found mutations in 21% of them (540,129; see S1 File for detailed information about all of the variants). In most sequences only a single mutation is found in the sequence (86.5%), less frequently at two (12.8%), three (0.5%) and more (0.02%) positions. Upper panel: cartoon representation of the protein with inhibitor S43 shown in sticks and zinc ion as a sphere (based on PDB ID: 7e35). Teal region of the protein indicates N-terminal ubiquitin-like domain (Ubl). Lower panel: cartoon representation of the protein with residues colored according to their mutation frequency (percentage of sequences with a given mutation out of all mutated sequences). Structure of papain-like protease is comprised of two main domains: ubiquitin-like (Ubl) domain and catalytic domain. The latter is further subdivided into the thumb and palm domains, which include the catalytic triad, and the fingers domain, which contains the zinc-finger motif ( Fig 1 ). The inhibitor binding site is located between the thumb and the palm domains. The most crucial part of this site, in terms of inhibitor interactions [ 38 , 39 ], is formed by residues of the blocking loop (BL2) that acts as the lid for the site. After mapping the obtained sequence data on PLpro structure from PDB, we determined whether there are regions in the protein that are particularly susceptible to mutations. Due to the fact that the vast majority of PLpro amino acids can be mutated (98%), no particular region is more affected by modifications than others ( Fig 1 ). We found only five amino acids that are not being mutated in any of the sequences (Y95, L118, Y137, H272, Y273). That includes H272—one of the catalytic residues. Possibly, other invariant amino acids are important for the proper functioning of the protein. Therefore, analysis of non-mutated amino acids of unknown function will help identify positions or regions of interest in drug design that are important to the virus. Interestingly, apart from H272 the rest of the known functionally important amino acids like zinc-binding cysteines (C189, C192, C224, C226) or protein-binding residues (D164, E167) are not invariant. However, their mutation rates are minuscule (see S1 File for more details). P223 is another amino acid that began to mutate more frequently in late 2020, which is consistent with the earlier research [ 40 ]. This amino acid is important for protein function by being in the zinc-binding loop (right next to cysteine directly interacting with the ion) and also as a part of the S1 ubiquitin binding site. Proline located at this position can be changed to one of five amino acids: leucine (the most common variant), serine, histidine, phenylalanine, or threonine. Any of these variants can make a great impact on the region since the observed amino acid change is significant. The second most frequently changed amino acid (P77) is present in 27% of the mutated sequences. Similarly to A145, it can occur as a single mutation, but also as one of many mutations. In addition, there are numerous variants associated with P77, indicating that this mutation is not co-dependent on other positions. Frequent mutations are those with mutation rate >0.002. Residues from inhibitor binding site were selected based on residue-ligand distance (no more than 6 Å) calculated for all available PLpro-ligand crystal structures. Amino acids in Variants column are sorted from most to least frequent. All found variants are available in Supplementary Material. The A145D mutation is the most prevalent variation in the PLpro sequence, present in about 36% of all the sequences that deviate from the reference. Most often it is the only variant in the sequence, but it can also occur together with more mutations. We observe a great diversity among the mutations with which A145D co-occurs—which shows that there is no amino acid whose mutation is dependent on the presence of the A145D variant and vice versa. A145 is located on the surface of the protein and is surrounded mostly by polar residues, including three lysines with no other contacts. Introduction of the aspartic acid may create new strong interactions with the positively charged residues, such as K91 or K92. The mutation to Asp is not an only option seen for this amino acid, we observed also six different variants in this position (V, S, T, G, P, N; Table 1 ). The mutations at this position first appeared in late 2020 (October) and quickly expanded over the following months to become the most frequent PLpro mutation ( S1 Fig ; this accords with earlier findings [ 40 ]). A145 mutation reached its peak of popularity in April and May 2021, when more than 60% of the mutated PLpro sequences carried it. However, in these months we see the highest percentage of mutated sequences—up to 80% of all the sequences that were deposited in the NCBI database were carrying at least one mutation. In our data set such situation appeared for four consecutive months (April-June 2021) and was never repeated ( S1 Fig ). This analysis shows the importance of five amino acids in particular: P247, P248, Y264, Y268, and Q269. They have high impact on the ligand binding energy ( S5 Fig ). Surprisingly, almost all of them were found to be mutated in the data set we analyzed. P247 can be changed to one of four different amino acids and importantly, the change is substantial—from small non-polar proline to polar serine or bigger glutamine. Given that this amino acid interacts with the aromatic rings of ligands, such a change could have negative effects on their binding. However, the most common mutation is P247L, which we predict would not have a severe impact on the protein. Similar findings are applicable for P248, that is most frequently mutated to Leu and Ser. In the case of Y264, we only detected four sequences with its variant and it always co-occurs with a mutation in E263 (E263D-Y264H). The remaining two amino acids (Y268 and Q269) are already established as the most important amino acids, capable of interacting with a variety of inhibitors [ 41 , 42 ]. It was shown that both Y268G and Y268T negatively affect the inhibition of GRL-0617 [ 29 ]. However, we do not observe these specific amino acids, but rather changes to His and Cys at position 268 and to Lys, Leu and Arg at position 269. Interestingly, although mutations of Y268 or Q269 are not common, we observe double and triple variants involving these residues (e.g. T265A-Y268C). GRL-0617 is maintained in the binding site by both polar and hydrophobic interactions, whereas most of the S43 and protein binding interactions are hydrophobic. However, the hydrogen bond present in the crystal structure is further maintained in the simulation of the complex. Bonds with other amino acids are scarcely present—about 10% of the simulation time with side chain of Q269 and up to 10% with main chain of L163. The hydrophobic interaction network is more extensive. In over 80% of the simulation time naphthyl group has non-polar interaction with P248, over 25% with Y268 and over 20% with P247. Moreover, the piperidine ring of S43 interacts for 50% of the time with Y268. In the Wild-Type trajectories additional interaction with Y264 emerged that was not present in the crystal structure (piperidine-Y264 for about 60% of the simulation time). Next, in order to gather more in-depth information about protein-ligand interactions that goes beyond static crystal structures, we performed multiple 500 ns long Molecular Dynamics (MD) simulations of both complexes. In the case of GRL-0617, the hydrogen bond present in the crystal structure is poorly maintained in the WT simulation of the complex (6% of the simulation time). The main hydrogen bond found in the trajectories is with the main chain of Q269 (over 85% of the time). Bonds between the ligand and side chain of Q269, Y264, and Y268 are significant as well (over 40%, 25%, and 10%, respectively). Overall, the average number of hydrogen bonds is 1.7. In over 80% of the simulation time naphthyl group has π-alkyl interaction with P248 and in over 20% π − π with Y268. Our main focus is to identify residues that strongly reduce the inhibitory effect of noncovalent molecules. In order to identify such residues, we first analyzed PLpro-ligand interactions in wild type (WT) complexes with two chosen representatives: S43 (PDB ID: 7d7t) and GRL-0617 (PDB ID: 7jrn). In the crystal structure of the protein-S43 complex, the ligand interacts with the protein via single hydrogen bond with main chain of Y268 and non-polar interactions with P247, P248 and Y268. In the case of the crystal containing GRL-0617, there is also a single hydrogen bond (but with D164) and the hydrophobic interactions involve P248 and Y268. Hydrogen bonding with side chain of D164 is not possible for S43 because the ligand’s polar groups are not properly aligned for such an interaction. Both ligands contain naphthalene, which is involved in the π − π interactions with the protein where it forms T-shaped stacking with Y268. Based on the currently available crystal structures, some of the residues we found to be part of the inhibitor binding site are only interacting with covalent inhibitors (W106-Y112; PDB ID: 6wuu, 6wx4). Because the inhibitors form a bond with C111, all the surrounding residues can be important for the molecule’s binding; yet, the mutation rate of these residues is exceedingly low. In particular, despite its low frequency, N109S variant could be important, since the mutation to serine introduces a smaller residue, which could diminish the interactions between the molecules. The rates of mutations of these residues are presented in Table 1 , along with the residues they are replaced to. It is worth noting that the residues forming the BL2 also can be mutated (aa 267–270). Based on PLpro-S43 complex from PDB ID: 7e35. Amino acids that can undergo mutation are dispersed evenly throughout the protein sequence, including the binding site for the inhibitors. Based on all available PLpro-ligand crystal structures in PDB, we identified the residues interacting with the inhibitors (both covalent and noncovalent; Table 1 ). Most importantly, there are many different variants of 22 amino acids present in the site ( Fig 2 ). The affected residues spread out evenly across the site and can affect binding of the molecules in various ways. Even though in this work we focus on the noncovalent inhibitors, we utilized the information about the residues obtained from complexes with covalent ligands as well. It is because we believe that the current set of inhibitors represents just a beginning of the search for potent PLpro inhibitor and it is crucial to gather and present the entirety of the data. One has to bear in mind that techniques such as docking are inaccurate to semi-accurate. Thus, in some cases they may also produce results that will turn out to be false positives. In order to reduce this risk, we utilized two docking protocols and simplified mutation energy calculations. Merging results from multiple, even semi-accurate techniques, allows to obtain a consensus view that is less prone to feature a random, false outcome. Nevertheless, despite the possible problems with false positive results, such methods are valuable because of their ability to initially screen out mutations unlikely to significantly affect potential drugs binding. For example, based on PLpro crystal structures, Q269 is one of the key residues when it comes to binding site steric properties and creating interactions with the inhibitor. Therefore, the mutation of this amino acid should intuitively have a significant impact on ligand binding. However, both MD and the faster methods show that Q269R does not substantially affect the inhibitors’ affinity to PLpro. Thus, techniques such as docking may effectively act as a preliminary sieve and allow to select only most probable candidates for the more time-costly evaluation. For each PLpro-inhibitor complex, the first column shows mutation energy, which is the difference of the binding energies between the respective mutant and the wild type (WT) protein, estimated using Calculate Binding Energies protocol in Discovery Studio. The next two columns include negatives of the CDOCKER interaction energies from molecular docking conducted to the PLpro mutants prepared with two methods in Discovery Studio. The first one utilized the Build Mutant protocol. The second one consisted of the mutation of specific residues with a subsequent side chain conformation refinement. Results of both docking and energy calculations highlight the mutations that may negatively affect potential drugs’ binding. The three probably most significant ones are P248S, and the double mutations E263D-Y264H and T265A-Y268C ( Table 2 ). The latter variant most strongly affects the inhibitors’ binding. The results from the additional calculations support these findings. Moreover, they indicate that mutations of P247 and the double variant E161D-P247S may be of importance, although it seems less probable than the aforementioned ones. First, in order to choose variants of amino acids from the PLpro binding site worth further investigation, we roughly estimated the impact of the mutations on the inhibitor binding. For this purpose, we utilized molecular docking and simplified binding energy calculations in Discovery Studio. We took into account most of the mutations that had been encountered up-to-date for the residues in the proximity of the inhibitor binding site. We tested their effects on two PLpro-inhibitor complexes, containing GRL-0617 and S43 (PDB IDs: 7jrn and 7d7t, respectively). We utilized three main methods. The first one is the estimation of the mutation energy, namely the difference between the mutant and the wild type (WT) binding energies. The latter two included redocking to the mutated PLpro, prepared in two ways. Additionally, to ascertain the results, we employed additional scoring functions, MMGBSA binding energy calculations, and one more PLpro-inhibitor complex, PDB ID: 7jn2 (see S2 Fig for more information). Impact of selected variants on inhibitor binding Taking all of the above analysis into consideration, we chose five positions in the PLpro, which mutations have the biggest potential to negatively impact inhibitory effects and binding of the ligands (P247, P248, Y264, Y268, Q269; Fig 3A). For each selected variant, we performed all-atom molecular dynamics simulations (at least two trajectories for 500 ns) and calculated ligand binding energy using MMGBSA for two inhibitors: GRL-0617 and S43. Overall, for ten different variants we performed over 30 trajectories. Along with the variants predicted to be impactful, we also analyzed three additional variants (G163S, A246V and N267D) that served as a quality control. We measured the stability of the whole protein and the binding site using RMSD and it shows similar behavior of the mutants in regard to WT (S3 and S4 Figs). The differences we observed are discussed in detail below. PPT PowerPoint slide PNG larger image TIFF original image Download: Fig 3. Selected variants found in PLpro. (A) Location of amino acids subject to mutation relative to the ligand (S43) binding site (based on PDB ID: 7d7t). (B) Representative S43 conformations bound to PLpro. Calculations are based on frames sampled every 100 ps from joint Molecular Dynamics trajectories. Conformations found in less than 5% of the frames are not shown for clarity. (C) Frequency of protein-ligand hydrogen bonds in different trajectories. Left: GRL-0617, right: S43. Calculations are based on frames sampled every 100 ps from Molecular Dynamics trajectories. Dashed rectangle points to the amino acid interacting via hydrogen bond with GRL-0617 in the crystal structure (PDB ID: 7jrn), the one with solid line with S43 (PDB ID: 7d7t). In the case of Q269 the ligand can bind to either its main chain (M) or side chain (S). The red asterisk indicates the trajectories in which the ligand dissociated from the binding site. (D) Ligands MMGBSA binding energy (kcal/mol) to PLpro. Left: GRL-0617, right: S43. Calculations are based on frames sampled every 100 ps from Molecular Dynamics trajectories—the averages and standard errors are shown. The trajectories in which the ligand dissociated from the binding site are marked with red asterisk. https://doi.org/10.1371/journal.pcbi.1010667.g003 P247S. Out of the possible variants in position 247 the one with the serine replacing proline has the biggest impact on the inhibitor binding. For the S43 ligand, the binding energy ranges between -25.8 ± 1.8 kcal/mol to -33.1 ± 0.8 kcal/mol, which is significantly higher than the energy observed in WT simulations. Conformation of the ligand in its binding site is mostly resembling that observed in wild type conditions (Fig 3B). However, there are two new representative conformations that are showing changes that appear due to the mutation. For example, we observed two new bonds formed between the ligand and amino acids D164, Y264 (Fig 3C). The main hydrogen bond with Y268 seen in the WT simulation is greatly reduced, especially in the trajectories with ligand leaving the binding site. Overall, the average number of hydrogen bonds in the trajectories is slightly smaller than in WT as it ranges from 0.5 to 1. Due to the lack of proline in 247 position, nonpolar interactions between ligand and Y264 and Y268 are affected. Specifically, very weak interactions between naphthalene and piperidine rings and Y268 seem to be connected to ligand exiting the site, as these interactions are present in P247S trajectories where the ligand is bound the entire time. On the other hand, GRL-0617 is stably bound to the binding site for the entire trajectories. Ligand’s binding energy based on MMGBSA calculation (-31.7 ± 0.1 and -31.8 ± 0.1 kcal/mol; Fig 3D) is not different from the one obtained for WT trajectory. This shows that the mutation does not have an impact on the protein-ligand interactions. Specifically, GRL-0617 does not change its conformation during any of the trajectories. Both polar and nonpolar interactions between protein and the ligand are stable. Moreover, due to the fact, that the important nonpolar interactions with naphthyl group are mainly realized by P248, mutation of P247 does not affect the non-polar interactions between protein and the ligand. P248S. Regardless of the type of ligand bound to the protein containing the P248S variant, the ligand did not dissociate in either of them. However, in both cases, the energy of the ligand bound to the mutated protein is less favorable than in WT, which shows that this variant might affect the protein’s binding capabilities in a universal way, regardless of the ligand. In trajectories with bound GRL-0617, the ligand binding energy ranges from -28.1 ± 0.3 to -28.9 ± 0.2 kcal/mol (Fig 3D), which is higher than the energy observed for the WT protein. Throughout the trajectory, the ligand is stable and maintains its native conformation. Similarly, the polar interactions between the protein and the ligand resemble those found in the WT protein. The only difference is the slightly lower frequency of interactions with the Q269 side chain. However, the main effect of the lack of proline at position 248 is seen when analyzing non-polar interactions, due to the lost P248 interactions with the naphthalene rings. Their absence is offset by the more frequent naphthalene-Y268 interaction. In the case of the protein-S43 complex, the ligand binding energy is also higher than in the WT protein (-33.5 ± 1.0 kcal/mol and -31.1 ± 0.6 kcal/mol), indicating a negative effect of the P248S variant on S43 function. There are only minor differences in polar protein-ligand interactions between the mutant and WT proteins. Lower hydrogen bonding rates with Y268, but slightly higher with the main chain Q269. Overall, the average protein-ligand hydrogen bond number (0.9) remains similar to that of WT (1.1). There is an apparent weakening of the interactions between naphthalene and P247 and Y268. However, Y268 remains in contact with the piperidine ring. Most of the time, the ligand adopts the same conformations as in the not mutated trajectories. Overall, the P248S variant weakens the non-polar interactions between protein and ligand. However, even though the majority of protein-ligand interactions were non-polar, the ligands did not dissociate from the binding site. The importance of the residue for the ligand binding is high—the variant is not able to match the energy composition of the native proline. Also, it affects the P247 contribution (S5 Fig). E263D-Y264H variant. To assess the impact the double variant E263D-Y264H has on the protein-ligand interactions, we performed similar MD simulations as we did with WT protein. Although the detailed results differ between the two ligands studied, the ultimate outcome is the same: impaired protein binding that leads to dissociation from the binding site. For the S43 ligand, the binding energy is higher than in WT protein, showing that S43 is affected by E263D-Y264H variant (-30.1 ± 0.7 kcal/mol and -31.6 ± 1.2 kcal/mol). It is manifested by weaker stabilization of the molecule in its binding site. Ligand S43 changes conformation, which is shown by substantially higher Root Mean Square Deviation (RMSD) than in WT trajectories (4 Å and 5 Å for trajectories 1 and 2, respectively). Despite such high fluctuation values, S43 dissociates from the binding site only from trajectory 2 (after 415 ns). In terms of protein-ligand interactions, we observed that new bonds are formed between main chain of D164, E167 and the ligand (Fig 3C). The main hydrogen bond with Y268 seen in the WT simulation is greatly reduced, to only 12% and 18% for the two trajectories. Overall, the average number of hydrogen bonds in the trajectories is slightly smaller than in WT—0.7 and 0.9. Moreover, there are also slightly weaker interactions between naphthalene and two prolines (P247, P248) than in WT. New interaction between benzene ring and K157, not present in WT trajectories, is a result of ligand’s change in conformation in the binding site (Fig 3B). Also, the mutation of Y264 impacted its π-alkyl interaction with the piperidine ring of S43 ligand. The ligand is dissociating from the binding site in both trajectories for the E263D-Y264H mutant with bound GRL-0617 (after 194 ns and 370 ns). The energy of the ligand binding in this variant is similar to the energy from WT trajectory (-31.7 ± 1.0 kcal/mol and -31.5 ± 0.5 kcal/mol). Even though the ligand exits its binding site in both trajectories, C α RMSD of the protein remains stable and fluctuating around 2 (S3 Fig). The mutated residues are not inducing any substantial change to the protein’s backbone. Also, ligand is not changing its conformation (S6 Fig) and only slightly bigger fluctuation of the 4-methylbenzenamine part of the ligand can be observed. Due to the mutation of Y264 to histidine the rate of the hydrogen bonds with this residue dropped from over 25% to 2%. Similarly, hydrogen bond between main chain of Q269 and the ligand is lower by over 20%. Overall, the number of protein-ligand hydrogen bonds is smaller in this variant (on average 1.2) than in WT (1.7). In terms of protein-ligand stacking interactions, there are only small differences. This variant attributes to more frequent stacking interaction between Y268 and inhibitor’s naphthyl group than in WT. The reason behind ligand’s dissociation from the protein most probably lies in a lower number of hydrogen bonds, mainly with mutated 264 amino acid. T265A-Y268C. Out of two substitutions in this double variant, the mutation of Y268 has bigger impact on the inhibitor binding, due to its non-polar interactions with the ligand. We observe dissociation in one of three trajectories, regardless of which ligand is bound to the protein. For the S43 ligand, the binding energy ranges between -30.9 ± 0.7 kcal/mol to -35.1 ± 0.3 kcal/mol, which is slightly higher than the energy observed in WT simulations. Conformation of the ligand significantly differs from that observed in WT trajectories (Fig 3B). The two main WT conformations are present in only 38% of the time. Moreover, the most frequent conformation is entirely different and found in 33% of the time. For this mutation, we observe higher frequency of interaction with main chain of Q269 (Fig 3C). The main hydrogen bond of WT trajectories that involve mutated Y268 is greatly reduced. Overall, the average number of hydrogen bonds in the trajectories is similar to WT (around 1). Due to the lack of the tyrosine, stacking interactions between ligand and the protein are affected. Also, we observe less frequent interactions between ligand’s naphtyl group and P247, P248 and between Y264 and piperidine ring of S43. Similar circumstances are observed in the trajectories of protein-GRL-0617 complex. Ligand’s binding energy based on MMGBSA calculation (-27.0 ± 1.3 and -30.3 ± 0.3 kcal/mol; Fig 3D) is higher than that of the WT trajectory. Even though GRL-0617 does not change its conformation during any of the trajectories, the hydrogen bonds between protein and the ligand differ from those found in not mutated complex. The interactions between Y264 and side chain of Q269 are diminished. In the case of the non-polar ones, the naphtyl group interacts with P248 slightly less frequently. [END] --- [1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010667 Published and (C) by PLOS One Content appears here under this condition or license: Creative Commons - Attribution BY 4.0. via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/