(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org. Licensed under Creative Commons Attribution (CC BY) license. url:https://journals.plos.org/plosone/s/licenses-and-copyright
------------
The endoplasmic reticulum proteostasis network profoundly shapes the protein sequence space accessible to HIV envelope
['Jimin Yoon', 'Department Of Chemistry', 'Massachusetts Institute Of Technology', 'Cambridge', 'Massachusetts', 'United States Of America', 'Emmanuel E. Nekongo', 'Jessica E. Patrick', 'Tiffani Hui', 'Tufts University']
Date: 2022-02

The sequence space accessible to evolving proteins can be enhanced by cellular chaperones that assist biophysically defective clients in navigating complex folding landscapes. It is also possible, at least in theory, for proteostasis mechanisms that promote strict quality control to greatly constrain accessible protein sequence space. Unfortunately, most efforts to understand how proteostasis mechanisms influence evolution rely on artificial inhibition or genetic knockdown of specific chaperones. The few experiments that perturb quality control pathways also generally modulate the levels of only individual quality control factors. Here, we use chemical genetic strategies to tune proteostasis networks via natural stress response pathways that regulate the levels of entire suites of chaperones and quality control mechanisms. Specifically, we upregulate the unfolded protein response (UPR) to test the hypothesis that the host endoplasmic reticulum (ER) proteostasis network shapes the sequence space accessible to human immunodeficiency virus-1 (HIV-1) envelope (Env) protein. Elucidating factors that enhance or constrain Env sequence space is critical because Env evolves extremely rapidly, yielding HIV strains with antibody- and drug-escape mutations.
We find that UPR-mediated upregulation of ER proteostasis factors, particularly those controlled by the IRE1-XBP1s UPR arm, globally reduces Env mutational tolerance. Conserved, functionally important Env regions exhibit the largest decreases in mutational tolerance upon XBP1s induction. Our data indicate that this phenomenon likely reflects strict quality control endowed by XBP1s-mediated remodeling of the ER proteostasis environment. Intriguingly, and in contrast, specific regions of Env, including regions targeted by broadly neutralizing antibodies, display enhanced mutational tolerance when XBP1s is induced, hinting at a role for host proteostasis network hijacking in potentiating antibody escape. These observations reveal a key function for proteostasis networks in decreasing instead of expanding the sequence space accessible to client proteins, while also demonstrating that the host ER proteostasis network profoundly shapes the mutational tolerance of Env in ways that could have important consequences for HIV adaptation.

Funding: This work was funded by UNCF-Merck Postdoctoral Fellowship (to EEN, https://scholarships.uncf.org/Program/Details/1223e136-1f19-4671-84a0-8242b1fd2072); Kwanjeong Graduate Fellowship (to JY, http://en.ikef.or.kr/); National Science Foundation (Graduate Research Fellowship to AMP and SJH, https://www.nsfgrfp.org/); National Cancer Institute (Koch Institute Support (core) Grant P30-CA14051 to MDS, https://www.cancer.gov/grants-training/grants-funding/funding-opportunities); National Institute of Environmental Health Sciences (Massachusetts Institute of Technology Center for Environmental Health Sciences (core) Grant P30-ES002109 to MDS, https://www.niehs.nih.gov/funding/grants/index.cfm); Tufts University (to YSL, https://www.tufts.edu/); National Science Foundation (CAREER Award 1652390 to MDS, https://www.nsf.gov/funding/) and National Institutes of Health (1R35GM136354 to MDS, https://www.nih.gov/grants-funding).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability: All RNA-Seq data are available from the Gene Expression Omnibus database (https://www.ncbi.nlm.nih.gov/geo/; accession number GSE171356). All FASTQ files from DMS sequencing are available from the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra; accession number SRP314168; BioProject PRJNA720817). The Python code used to perform DMS data analysis and generate the sequence logo plots is provided as a series of IPython notebooks at https://github.com/yoon-jimin/2021_HIV_Env_DMS. Data used to generate all plots are also provided in the Supporting Information files.

This work demonstrates for the first time, to our knowledge, that combined upregulation of chaperones and quality control factors can actually greatly decrease the mutational tolerance of a client protein. It also provides experimental evidence that the host ER proteostasis network profoundly shapes the protein sequence space available to viral membrane proteins and, critically, that the details of the interaction vary from one protein to another, and even within different regions of the same protein.

In this study, we used chemical genetic tools to specifically induce the inositol-requiring enzyme-1/X-box binding protein-1 spliced (IRE1-XBP1s) and activating transcription factor 6 (ATF6) transcriptional arms of the UPR separately or in tandem [ 41 ]. This approach provided user-defined modulation of the composition of the host’s ER proteostasis network that mimics the cell’s natural stress response. We observed that the resulting distinct host environments caused a global decrease in Env mutational tolerance, particularly upon XBP1s-mediated enhancement of the ER proteostasis environment. In addition, we observed that sites with different structural or functional roles responded differently to UPR upregulation.
For example, conserved regions of Env exhibited an especially strong reduction in mutational tolerance, while a number of sites targeted by broadly neutralizing antibodies displayed an increase in mutational tolerance. Importantly, recent work has revealed that the cellular proteostasis network can indeed impact the sequence space of not just endogenous client proteins, but also viral proteins that hijack their host’s proteostasis machinery [ 29 – 33 ]. This relationship has critical evolutionary and therapeutic implications, because mutational tolerance is directly associated with the ability of a virus to evade the host’s innate and adaptive immune responses, as well as antiviral drugs [ 34 – 40 ]. Early work in this area focused on how viruses like influenza and poliovirus hijack the host’s heat-shock-response-regulated cytosolic chaperones to enhance their mutational tolerance [ 29 – 31 ]. More recently, we discovered that host UPR-mediated upregulation of the ER proteostasis network increases the mutational tolerance of influenza A hemagglutinin specifically at febrile temperatures [ 32 ]. Aside from that hemagglutinin work, to our knowledge no comprehensive studies testing the influence of the ER proteostasis network on client protein evolution, whether viral or endogenous, are available. Here, we evaluated whether and how the unfolded protein response (UPR)–regulated endoplasmic reticulum (ER) proteostasis network influences the sequence space accessible to membrane proteins processed by the secretory pathway. In particular, we used chemical genetic control of the UPR to broadly modulate the composition of the ER proteostasis network, and then used deep mutational scanning (DMS) to assess how such perturbations alter accessible client protein sequence space. We chose human immunodeficiency virus-1 (HIV-1) envelope (Env), a trimeric surface glycoprotein that is folded and quality-controlled by the ER, as our model client protein. 
We selected Env because its rapid evolution during HIV infections plays a critical role in HIV developing drug and host cell antibody resistance [ 21 – 23 ]. Additionally, Env interacts extensively with various components of the ER proteostasis network, including the ER chaperones calnexin [ 24 ] and calreticulin [ 25 ], binding immunoglobulin protein (BiP) [ 26 ], and ER alpha-mannosidase to initiate ER-associated degradation (ERAD) [ 27 , 28 ], suggesting the strong potential for the host ER proteostasis network to shape Env’s accessible sequence space. In contrast to chaperones increasing sequence space, one might anticipate that protein folding quality control factors would constrain the sequence space accessible to evolving client proteins. For example, promoting the rapid degradation and removal of slow-folding or aberrantly folded protein variants could cut off otherwise accessible evolutionary trajectories [ 16 – 18 ], especially if those variants might have still maintained some level of function if instead allowed to persist in the cellular environment. Unfortunately, efforts to understand the potential contributions of quality control in shaping protein sequence space are limited. This gap in understanding is particularly problematic because natural cellular mechanisms to remodel proteostasis networks function via stress-responsive transcription factors [ 19 , 20 ], rather than via inhibition or upregulation of individual chaperones. These transcription factors tune the levels of both chaperones and quality control mechanisms simultaneously. Such mechanisms may potentially compete in how they impact the sequence space of various evolving client proteins. Protein mutational tolerance is constrained by the biophysical properties of the evolving protein. Selection to maintain proper protein folding and structure purges a large number of otherwise possible mutations that could be functionally beneficial [ 1 – 5 ]. 
It is no surprise, then, that cellular proteostasis networks play a key role in defining the protein sequence space accessible to client proteins [ 6 – 17 ]. Much attention has been given to the phenomenon of chaperones increasing the sequence space accessible to their client proteins, likely by promoting the folding of protein variants with biophysically deleterious amino acid substitutions [ 7 – 11 ]. Most efforts in this area have focused specifically on how the activities of the heat shock proteins Hsp90 and Hsp70 can expand protein sequence space, in part owing to the availability of specific inhibitors that enable straightforward comparative studies of protein evolution in the presence versus the absence of folding assistance.

Results

Chemical genetic control of ER proteostasis network composition during HIV infection

We began by generating a cell line in which HIV could robustly replicate and we could chemically induce the UPR’s IRE1-XBP1s and ATF6 transcriptional responses separately or simultaneously, in an ER stress-independent manner. We sought ER stress-independent induction of these transcription factors rather than global stress-mediated UPR induction, owing to the pleiotropic effects of chemical stressors and the non-physiologic, highly deleterious consequences of inducing high levels of protein misfolding in the secretory pathway [19,32,41,42]. We selected the IRE1-XBP1s and ATF6 arms of the UPR for chemical control because, in contrast to the protein-kinase-R-like ER kinase arm of the UPR that functions largely through translational attenuation, they are the key pathways responsible for defining levels of ER chaperones and quality control factors [20,41,43] likely to influence Env folding, degradation, and secretion. To allow for robust replication of HIV, we chose human T cell lymphoblasts (SupT1 cells) as the host cells.
SupT1 cells support high levels of HIV replication in cell culture, likely due to the lack of cytidine deaminase activity that can cause hypermutation of HIV DNA [44]. Moreover, infection with HIVeGFP/VSV-G virus or HIV itself does not alter the expression of UPR-controlled genes in SupT1 cells [45,46]. To attain user control of the IRE1-XBP1s pathway and ATF6 transcriptional response in these cells, we used a previously described method of stable cell line engineering [41] (detailed in Materials and Methods). Briefly, the XBP1s transcription factor was placed under control of the tetracycline repressor, and induced by treatment with doxycycline (dox). Orthogonally, the active form of the ATF6 transcription factor was fused to an Escherichia coli dihydrofolate reductase (DHFR)–based destabilizing domain, and induced by treatment with trimethoprim (TMP). We termed the resulting engineered cells SupT1DAX cells (Fig 1A), with the DAX signifier indicating the inclusion of both the DHFR.ATF6 and XBP1s constructs.

Fig 1. Stress-independent induction of XBP1s, ATF6, or XBP1s and ATF6 creates 4 distinct endoplasmic reticulum proteostasis environments in SupT1DAX cells (basal, +XBP1s, +ATF6, and +XBP1s/+ATF6). (A) Chemical genetic strategy to orthogonally regulate XBP1s and ATF6 in SupT1DAX cells. (B–D) RNA sequencing (RNA-Seq) analysis of the transcriptomic consequences of (B) XBP1s, (C) ATF6, and (D) XBP1s/ATF6 induction. Transcripts that were differentially expressed under each condition based on a >1.5-fold change in expression level (for dox-, TMP-, or dox- and TMP-treated versus vehicle-treated cells) and a non-adjusted p-value < 10^-10 are separated by dashed lines and plotted in red, with select transcripts labeled. The lowest nonzero p-value recorded was 10^-291; therefore, p-values equal to 0 were replaced with p-value = 1.00 × 10^-300 for plotting purposes.
Transcripts for which p-values could not be calculated owing to extremely low expression or noisy count distributions were excluded from plotting. (E–G) Comparison of transcript fold change upon (E) +XBP1s versus +ATF6, (F) +ATF6 versus +XBP1s/+ATF6, and (G) +XBP1s versus +XBP1s/+ATF6 remodeling of the endoplasmic reticulum proteostasis network. Only transcripts with false-discovery-rate-adjusted p-value < 0.05 and fold increase > 1 in both of the indicated conditions are plotted. Dashed lines indicate a 1.5-fold filter to assign genes as selectively induced by the proteostasis condition on the x-axis (red), y-axis (blue), or lacking selectivity (purple). Transcripts with fold increase < 1.2 in either proteostasis environment are colored in grey to indicate low differential expression. The complete RNA-Seq differential expression analysis is provided in S1 Data. dox, doxycycline; TMP, trimethoprim.
https://doi.org/10.1371/journal.pbio.3001569.g001

With stably engineered SupT1DAX cells in hand, we anticipated that we could create 4 distinct ER proteostasis environments (basal, XBP1s-induced, ATF6-induced, and XBP1s/ATF6 co-induced) to assess potential consequences for Env mutational tolerance. We induced the XBP1s and ATF6 transcriptional responses in SupT1DAX cells, either separately or together, and evaluated resultant changes in the transcriptome using RNA sequencing (RNA-Seq) (S1 Data). We applied gene set enrichment analysis [47] to the RNA-Seq results using the MSigDB C5 collection, and found that gene sets related to ER stress, Golgi trafficking, and ERAD were highly enriched upon induction of XBP1s, induction of ATF6, and co-induction of XBP1s and ATF6 (S2 Data). In contrast, gene sets that serve as markers of other stress responses (e.g., the heat shock response) were not enriched, consistent with a highly selective, stress-independent induction of UPR transcriptional responses.
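The differential-expression filter described in the Fig 1B–D caption can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the function name `volcano_point` and its return shape are hypothetical, and applying the 1.5-fold cutoff in either direction (up or down) is an assumption. Only the three numeric thresholds come from the caption.

```python
import math

FOLD_CUTOFF = 1.5   # >1.5-fold change (Fig 1 caption)
P_CUTOFF = 1e-10    # non-adjusted p-value < 10^-10 (Fig 1 caption)
P_FLOOR = 1e-300    # p-values of exactly 0 are plotted at this value (Fig 1 caption)


def volcano_point(fold_change, p_value):
    """Return (log2 fold change, -log10 p for plotting, is_differential).

    Returns None when no p-value could be calculated, mirroring the
    exclusion of such transcripts from the plots. Hypothetical helper.
    """
    if p_value is None or fold_change is None or fold_change <= 0:
        return None
    # Replace p == 0 with the floor only for plotting; the test uses the raw p.
    p_plot = P_FLOOR if p_value == 0 else p_value
    log2_fc = math.log2(fold_change)
    # Assumption: the fold-change cutoff is symmetric on the log scale.
    is_differential = (
        abs(log2_fc) > math.log2(FOLD_CUTOFF) and p_value < P_CUTOFF
    )
    return log2_fc, -math.log10(p_plot), is_differential
```

With this sketch, a transcript with a 2-fold increase and a reported p-value of 0 would be plotted at a height of about 300 on the -log10(p) axis and flagged as differentially expressed.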
Comparing the resulting transcriptomes, we observed significant and substantial upregulation of 223 transcripts upon XBP1s induction (+XBP1s), 24 transcripts upon ATF6 induction (+ATF6), and 436 transcripts upon co-induction of XBP1s and ATF6 (+XBP1s/+ATF6) (Fig 1B–1D). For all 3 treatment conditions, the upregulated transcripts were strongly biased towards known UPR-regulated components of the ER proteostasis network. To analyze the extent to which these 3 perturbations (+XBP1s, +ATF6, and +XBP1s/+ATF6) engendered unique ER proteostasis environments, we cross-compared the mRNA fold changes owing to each treatment (Fig 1E–1G). Transcripts known to be targeted primarily by XBP1s were strongly upregulated upon dox treatment (e.g., SEC24D and DNAJB9), whereas transcripts known to be targeted primarily by ATF6 were more strongly upregulated upon TMP treatment (e.g., HSP90B1 and HSPA5) (Fig 1E) [41,48,49]. We used immunoblotting to confirm successful induction of these pathways, observing selective protein-level induction of the XBP1s target Sec24D upon dox treatment versus selective induction of the ATF6 target BiP (HSPA5) upon TMP treatment (S1 Fig). XBP1s induction caused an extensive remodeling of the entire ER proteostasis network, whereas ATF6 induction resulted in targeted upregulation of just a select subset of ER proteostasis factors, consistent with prior work showing that ATF6 induction causes upregulation of fewer transcripts than XBP1s [41,49]. Notably, the combined induction of XBP1s and ATF6 provided access to a third environment where specific transcripts (e.g., genes known to be targets of XBP1s and ATF6 heterodimers, such as HERPUD1) were more strongly upregulated than upon the single induction of either transcription factor (Fig 1F and 1G) [41,50,51]. 
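The cross-comparison in Fig 1E–G assigns each transcript to a selectivity class using the 1.5-fold and 1.2-fold cutoffs given in the figure caption. A rough Python sketch of one way to apply such cutoffs follows. The caption does not spell out the exact rule, so comparing the ratio of the two fold increases against 1.5 is an assumption, and `classify_transcript` is a hypothetical name. Inputs are assumed to be already restricted to transcripts with adjusted p-value < 0.05 and fold increase > 1 in both conditions.

```python
SELECTIVITY_CUTOFF = 1.5  # dashed-line filter from the Fig 1E-G caption
LOW_CHANGE_CUTOFF = 1.2   # below this in either condition: grey points


def classify_transcript(fc_x, fc_y):
    """Classify a transcript from its fold increases in two conditions.

    fc_x, fc_y: fold increase versus basal in the x-axis and y-axis
    proteostasis environments (both assumed > 1).
    """
    if fc_x < LOW_CHANGE_CUTOFF or fc_y < LOW_CHANGE_CUTOFF:
        return "low differential expression"
    ratio = fc_x / fc_y  # assumption: selectivity is judged on this ratio
    if ratio > SELECTIVITY_CUTOFF:
        return "x-selective"
    if ratio < 1 / SELECTIVITY_CUTOFF:
        return "y-selective"
    return "not selective"
```

Under this reading, a transcript induced 6-fold in one condition and 2-fold in the other would be called selective for the first condition, while one induced 2-fold and 2.5-fold would be called non-selective.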
Taken together, our RNA-Seq results show that we can access 4 distinctive ER proteostasis environments for Env mutational tolerance experiments via chemical genetic control of XBP1s and ATF6 (basal, +XBP1s, +ATF6, and +XBP1s/+ATF6).

We assessed whether these perturbations of the ER proteostasis environment had deleterious effects on cell viability or restricted HIV replication, as we had previously observed inhibition of HIV replication upon upregulation of the heat shock response [52]. To address the former, we induced XBP1s and ATF6, individually or simultaneously, in SupT1DAX cells and measured resazurin metabolism 72 h after drug treatment (S2A Fig). We observed that induction of XBP1s and ATF6, either separately or simultaneously, did not alter the metabolic activity of SupT1DAX cells, consistent with no deleterious effects on cell viability. To address whether HIV replication was restricted, we used the TZM-bl assay to quantify HIV infectious titer (S2B Fig). Specifically, we used TZM-bl reporter cells containing the E. coli β-galactosidase gene under the control of an HIV long terminal repeat sequence [53]. When these cells are infected with HIV, the HIV Tat transactivation protein induces expression of β-galactosidase, which cleaves the chromogenic substrate (X-Gal) and causes infected cells to appear blue. The infectious titer increased marginally (approximately 3.5-fold) when XBP1s was induced, either alone or together with ATF6. Induction of ATF6 alone did not affect HIV infectious titer. Thus, ER proteostasis network perturbation via XBP1s and/or ATF6 induction did not deleteriously impact HIV replication.

Env DMS in 4 distinct host ER proteostasis environments

We next applied DMS to Env to test our hypothesis that the composition of the host’s ER proteostasis network plays a central role in determining the mutational tolerance of Env.
For this purpose, we employed a previously developed set of 3 replicate Env proviral plasmid libraries [22], created by introducing random codon mutations at amino acid residues 31–702 of the Env protein (note that the HXB2 numbering scheme [54] is used throughout). Briefly, the library was generated using a previously described technique that uses pools of primers containing a random NNN nucleotide sequence at the codon of interest, and mutations are introduced via iterative rounds of low-cycle PCR [55]. This technique generates multi-nucleotide (e.g., gca → gAT) as well as single nucleotide (e.g., gca → gAa) codon mutations, thereby introducing mutations at the codon level rather than at the nucleotide level [22,55]. The N-terminal signal peptide and the C-terminal cytoplasmic tail of Env were excluded from mutagenesis owing to their dramatic impact on Env expression and/or HIV infectivity [22]. We generated biological triplicate viral libraries from these mutant Env plasmid libraries by transfecting the plasmid libraries into HEK293T cells and then harvesting the passage 0 (p0) viral supernatant after 4 d. Deep sequencing of the 3 p0 viral libraries showed that 74% of all possible amino acid substitutions were observed at least 3 times in each of the triplicate libraries, and 98% of all possible substitutions were observed at least 3 times in at least 1 of the triplicate libraries, consistent with prior work [22,36]. Mutations that were not included in the viral libraries were dispersed throughout the sequence and did not correspond to specific regions of structural or functional importance (S3 Fig). To establish a genotype–phenotype link, we passaged the p0 transfection supernatants in SupT1 cells at a very low multiplicity of infection (MOI) of 0.005 infectious virions/cell. 
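To see why such a low MOI helps preserve the genotype–phenotype link, one can estimate how often an infected cell receives more than one virion. The sketch below assumes that the number of infectious virions per cell follows a Poisson distribution with mean equal to the MOI, a common simplification that the text does not state. The function name is hypothetical.

```python
import math


def coinfection_fraction(moi):
    """Among infected cells, the fraction hit by 2 or more virions.

    Assumes virions per cell ~ Poisson(moi); this model is an
    illustrative assumption, not a claim about the authors' analysis.
    """
    p_one = moi * math.exp(-moi)      # P(exactly 1 virion)
    p_infected = -math.expm1(-moi)    # P(at least 1 virion) = 1 - exp(-moi)
    return (p_infected - p_one) / p_infected


if __name__ == "__main__":
    print(f"{coinfection_fraction(0.005):.4%} of infected cells co-infected")
```

Under this model, at an MOI of 0.005 roughly 0.25% of infected cells would carry more than one virion, so nearly every infected cell reports on a single Env variant.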
We next performed batch competitions of each individual Env viral library in SupT1DAX cells in each of the 4 different ER proteostasis environments: basal, +XBP1s, +ATF6, and +XBP1s/+ATF6 (Fig 2A). Briefly, SupT1DAX cells were treated with vehicle, dox, TMP, or both dox and TMP to generate the intended ER proteostasis environment, followed by infection with p1 viral supernatant at an MOI of 0.005 infectious virions/cell. We used this MOI to minimize co-infection of individual cells and thereby maintain the genotype–phenotype link. Non-integrated viral DNA was extracted, and Env amplicons were generated by PCR [22]. Finally, we deep-sequenced the amplicons using barcoded-subamplicon sequencing (S4 Fig) and analyzed the sequencing reads using the dms_tools2 suite (https://jbloomlab.github.io/dms_tools2/) [56,57].

Fig 2. Upregulation of the host cell’s ER proteostasis environment generally reduces mutational tolerance across the Env protein sequence. (A) Scheme for deep mutational scanning of Env in 4 distinct ER proteostasis environments (basal, +XBP1s, +ATF6, and +XBP1s/+ATF6). SupT1DAX cells were pretreated with DMSO (basal), dox (+XBP1s), TMP (+ATF6), or both dox and TMP (+XBP1s/+ATF6) 18 h prior to infection with biological triplicate Env viral libraries. Four days post-infection, cells were harvested, and non-integrated viral DNA was sequenced to quantify the diffsel of Env variants. (B) Diffsel for each amino acid variant can be visualized in a sequence logo plot. The black horizontal lines represent the diffsel for the wild-type amino acid at that site, and the height of the amino acid letter abbreviations is proportional to the diffsel of that variant in the remodeled ER proteostasis environment relative to the basal environment. Variants that are relatively enriched in the indicated ER proteostasis environment (positive diffsel) are located above the black horizontal line.
Variants that are relatively depleted in the indicated ER proteostasis environment (negative diffsel) are located below the black horizontal line. (C) Net site diffsel for all Env sites in 3 perturbed ER proteostasis environments, averaged over biological triplicates. The black horizontal lines on the violin plots indicate the median (solid line) and the first and the third quartiles (dashed lines) of the distribution. The significance of deviation from null (net site diffsel = 0, no selection) was tested using a 1-sample t test, with 2-tailed p-values shown. The mean of the distribution and the number of sites with net site diffsel > 0 or <0 are listed below the distribution. (D and E) Correlation for net site diffsel values for (D) +XBP1s/+ATF6 versus +XBP1s and (E) +XBP1s/+ATF6 versus +ATF6, normalized to the basal proteostasis environment. Pearson correlation coefficients (r) and corresponding p-values are shown. Select sites with highly positive or highly negative net site diffsel values in both proteostasis environments are marked in red and labeled with site numbers. (F) Diffsel for individual Env variants in 3 perturbed ER proteostasis environments, averaged over biological triplicates. The black horizontal lines on the violin plots indicate the median (solid line) and the first and the third quartiles (dashed lines) of the distribution. The significance of deviation from null (diffsel = 0, no selection) was tested using a 1-sample t test, with 2-tailed p-values shown. The mean of the distribution and the number of sites with diffsel > 0 and <0 are listed below the distribution. Diffsel values (C–F) are provided at https://github.com/yoon-jimin/2021_HIV_Env_DMS. diffsel, differential selection; dox, doxycycline; ER, endoplasmic reticulum; TMP, trimethoprim; WT, wild-type. 
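For readers who want a concrete picture of the diffsel values plotted in Fig 2, the quantity can be sketched as the logarithm of a variant's enrichment relative to wild type in the selection condition, divided by the same enrichment in the basal (mock) condition. The Python below is a simplified illustration and not the dms_tools2 implementation: the base-2 logarithm, the pseudocount of 1, and both function names are assumptions made here. The second function illustrates a replicate-consistency filter that keeps a variant only when its diffsel has the same sign in every replicate.

```python
import math

PSEUDOCOUNT = 1  # assumption: avoids division by zero and log(0)


def diffsel(sel_mut, sel_wt, mock_mut, mock_wt, pseudocount=PSEUDOCOUNT):
    """log2 of variant-vs-wild-type enrichment, selection relative to mock.

    Arguments are read counts for the variant and the wild-type amino
    acid at one site in the selection and mock (basal) conditions.
    """
    sel_ratio = (sel_mut + pseudocount) / (sel_wt + pseudocount)
    mock_ratio = (mock_mut + pseudocount) / (mock_wt + pseudocount)
    return math.log2(sel_ratio / mock_ratio)


def reproducible_mean(replicate_diffsels):
    """Mean diffsel if all replicates agree in sign, otherwise None.

    A value of exactly 0 is treated as disagreeing with both signs,
    which is a choice made for this sketch.
    """
    all_positive = all(d > 0 for d in replicate_diffsels)
    all_negative = all(d < 0 for d in replicate_diffsels)
    if not (all_positive or all_negative):
        return None
    return sum(replicate_diffsels) / len(replicate_diffsels)
```

A positive value means the variant fared better relative to wild type in the perturbed environment than in the basal one, and a negative value means it fared worse.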
https://doi.org/10.1371/journal.pbio.3001569.g002

To identify amino acid variants that were differentially enriched or depleted in a given ER proteostasis selection condition (+XBP1s, +ATF6, or +XBP1s/+ATF6) relative to the basal ER proteostasis environment, we quantified differential selection (diffsel) (Fig 2B). Diffsel was calculated by taking the logarithm of the variant’s enrichment in the selection condition relative to its enrichment in the basal ER proteostasis network condition [57]. For example, if a variant exhibited positive diffsel in +XBP1s (selection) versus basal (mock), it would indicate that the variant was more enriched relative to the wild-type amino acid in the +XBP1s condition compared to the basal condition. In addition, to decipher reliable signal from experimental noise, we filtered the DMS data using a previously described and validated 2-step strategy [32]. First, we removed variants that were not present in all 3 pre-selection replicate viral libraries. That is, we eliminated even those variants that were strongly enriched or depleted in 2 replicates if they were not present in the starting library of the third replicate. Second, we removed variants that exhibited diffsel in opposite directions in any of the biological triplicates. Using the second filter, we typically removed variants that were minimally affected by the selection, displaying slightly positive diffsel values in one replicate but slightly negative diffsel values in another. By applying these 2 filters, we were able to focus subsequent analyses only on Env variants that exhibited robust, reproducible diffsel across biological triplicates of the same ER proteostasis network conditions (out of 12,787 theoretically possible non-wild-type variants: 3,455 variants for +XBP1s [27%], 2,935 variants for +ATF6 [23%], and 3,308 variants for +XBP1s/+ATF6 [26%]).

[END]
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001569