(C) PLOS One This story was originally published by PLOS One and is unaltered.

Identification of genetic variants of the industrial yeast Komagataella phaffii (Pichia pastoris) that contribute to increased yields of secreted heterologous proteins [1]

Benjamin Offei, UCD Conway Institute, School of Medicine, University College Dublin, Dublin; Stephanie Braun-Galleani, School of Biochemical Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso; Anjan Venkatesh

Date: 2022-12

The yeast Komagataella phaffii (formerly called Pichia pastoris) is used widely as a host for secretion of heterologous proteins, but only a few isolates of this species exist, and all the commonly used expression systems are derived from a single genetic background, CBS7435 (NRRL Y-11430). We hypothesized that other genetic backgrounds could harbor variants that affect yields of secreted proteins. We crossed CBS7435 with 2 other K. phaffii isolates and mapped quantitative trait loci (QTLs) for secretion of a heterologous protein, β-glucosidase, by sequencing individual segregant genomes. A major QTL mapped to a frameshift mutation in the mannosyltransferase gene HOC1, which gives CBS7435 a weaker cell wall and higher protein secretion than the other isolates. Inactivation of HOC1 in the other isolates doubled β-glucosidase secretion. A second QTL mapped to an amino acid substitution in IRA1 that tripled β-glucosidase secretion in 1-week batch cultures but reduced cell viability; its effects are specific to this heterologous protein. Our results demonstrate that QTL analysis is a powerful method for dissecting the basis of biotechnological traits in nonconventional yeasts, and a route to improving their industrial performance.

Funding: This work was supported by Science Foundation Ireland (13/IA/1910 to KHW), the European Research Council (789341 to KHW), and Agencia Nacional de Investigacion y Desarrollo Fondecyt (11200933 to SBG).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2022 Offei et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Because they are microbial eukaryotes, yeasts offer many beneficial attributes for heterologous protein production that are absent from cellular platforms based on bacterial or mammalian cells. Although Saccharomyces cerevisiae remains a popular expression host, “nonconventional” yeast species have eclipsed it in recent decades. These species offer advantages over S. cerevisiae such as growth to very high cell densities, thermotolerance, and fewer endogenous secreted proteins. Foremost among the nonconventional yeasts is Komagataella phaffii, which has become very widely used for secretion of heterologous proteins [1–4]. K. phaffii is one of the 2 yeast species that were formerly called Pichia pastoris until they were recognized as separate species in 2009 [5]. K. phaffii is used to produce marketed therapeutic proteins including human insulin [6], interferon alpha [7], kallikrein inhibitor [8], and plasmin [8], as well as vaccines [6,9] and antibodies [10,11]. It is also used to produce enzymes for the food and feed industries [12]. All these proteins are secreted.

Many approaches have been taken to improve the industrial performance of K. phaffii as a host for heterologous protein expression further, including bioprocess engineering, expression cassette engineering, and host cell engineering [3,13]. Previous efforts to modify the host cell’s genome have targeted particular pathways, such as the development of protease-free strains [14], strains that overexpress unfolded protein response pathway genes [15], and och1 strains deficient in hypermannosylation of secreted proteins [16–18]. In contrast to these targeted approaches, quantitative trait locus (QTL) analysis has the advantage that it does not require any prior hypotheses regarding which genes will contribute to the phenotype of interest. Most previous yeast QTL analyses have been conducted in S. cerevisiae [19], and they led to the identification of genes responsible for several industrially relevant polygenic phenotypes in that species [20–24]. QTL mapping has been applied to only a few other yeast species, including Schizosaccharomyces [25], Lachancea [26,27], and Cryptococcus [28]. K. phaffii seems particularly suitable for QTL analysis because all the strains currently used for heterologous protein production are derived from the same progenitor strain (called CBS7435 or NRRL Y-11430) [5,29–31], so other genetic backgrounds may well contain beneficial alleles that could be transferred into production strains by genome editing. Nevertheless, QTL approaches have remained unexplored in K. phaffii and other nonconventional yeast species used in biotechnology.

Here, we mapped QTLs affecting secretion of a heterologous protein in K. phaffii and identified the causative nucleotides underlying 2 QTLs. To achieve high resolution, we combined 2 techniques that have previously been used only separately in S. cerevisiae QTL studies: screening a large number of meiotic segregants to select ones with extreme phenotypes before genotyping [20,32], and sequencing the genome of each selected segregant individually instead of sequencing bulk DNA from each phenotypic pool [33,34].

Results

Construction of parental BGL-secreting strains

Only 4 different natural isolates of K. phaffii are known, but they harbor substantial genetic and phenotypic diversity [30,31].
Isolates Pp2 (NRRL Y-17741) and Pp4 (NRRL YB-378) each differ from the reference strain CBS7435 by about 42,000 single-nucleotide polymorphisms (SNPs) and are similarly divergent from each other [30,31]. To maximize our ability to find variants of interest, we designed a mating scheme with 2 crosses between these genetic backgrounds: Pp2 × CBS7435 (Cross 1) and Pp4 × CBS7435 (Cross 2) (Fig 1A).

Fig 1. Parental strains and design of genetic crosses. (A) Experimental design. Two crosses were made between a derivative of CBS7435 (CBS_BGL9) and derivatives of Pp2 (Pp2_BGL5) or Pp4 (Pp4_BGL3). These parental strains each contained a BGL expression cassette and a KanR or ZeoR marker, integrated at the GAPDH locus. The resulting diploid interstrain hybrids were sporulated, and 1,000 haploid meiotic segregants were isolated from each cross by random spore isolation. BGL secretion in each segregant was assayed and used to choose pools of 30 superior segregants and 30 inferior segregants from each cross. Genomes of the segregants in each pool were then sequenced individually, after which SNP allele frequency plots were constructed for each pool, enabling identification of genomic regions (QTLs) segregating preferentially with contrasting BGL secretion phenotypes in each cross. (B) Quantitative assessment of BGL secretion in the parental strains and controls (CBS_PGAP is a derivative of CBS7435 containing an integrated empty pGAPZα vector; Pp2 and Pp4 are wild-type strains). Bars represent mean data from 3 independent cultures of each strain (open circles). Error bars show standard deviation. Significant differences in BGL secretion between CBS_BGL9 and the other parental strains were tested with unpaired t tests and are indicated by asterisks (**, P = 0.0035; ***, P = 0.0005). Numerical data are listed in S1 Data. (C) Formation of diploids by crossing.
Diploid colonies formed by mating grow at the intersections between streaks of the haploid parental strains when replica plated onto selective media containing zeocin plus geneticin. BGL, β-glucosidase; QTL, quantitative trait locus; SNP, single-nucleotide polymorphism.
https://doi.org/10.1371/journal.pbio.3001877.g001

We chose the β-glucosidase (BGL) enzyme of the filamentous fungus Thermoascus aurantiacus as a model secreted protein because it can be assayed readily in microtiter plates and has been expressed previously in K. phaffii [35]. We cloned the T. aurantiacus BGL gene into a pGAPZα plasmid (Invitrogen), creating a BGL expression cassette with a zeocin resistance gene (ZeoR) for integration at the constitutive GAPDH promoter (PGAP) locus of CBS7435. In addition, a derivative of this cassette carrying a geneticin resistance gene (KanR) instead of ZeoR was integrated at the PGAP loci of Pp2 and Pp4 (S1 Fig). Transformants were assessed for extracellular BGL secretion using both qualitative 4-MUG assays (UV fluorescence due to hydrolysis of 4-methylumbelliferyl-β-D-glucuronide to 4-methylumbelliferone) and quantitative 4-NPG assays (optical absorbance at 405 nm due to hydrolysis of 4-nitrophenyl-β-D-glucopyranoside to 4-nitrophenol (4-NP)). BGL activity was detected in supernatants of most transformants carrying the BGL expression construct and was absent from controls (S1 Fig). We then selected 1 BGL-expressing transformant from each of CBS7435 (CBS_BGL9), Pp2 (Pp2_BGL5), and Pp4 (Pp4_BGL3) for use as parents in genetic crosses. We also compared BGL secretion per unit biomass in these 3 transformants after 96 hours of cultivation in 400 μl cultures. The CBS7435-derived strain CBS_BGL9 secreted the most BGL: its supernatant yielded approximately twice as much 4-NP per cell as those of Pp2_BGL5 and Pp4_BGL3 (Fig 1B).
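Fig 1B compares BGL secretion per unit biomass. The sketch below assumes (this is not stated in the text) that such a normalization is simply the 4-NP absorbance at 405 nm divided by the OD600 of the same culture; the optional blank subtraction and all function names are hypothetical. The last function illustrates a related identity: under this simple-ratio assumption, a per-cell fold change equals the fold change in total secretion multiplied by the inverse ratio of cell densities.

```python
def secretion_per_biomass(a405: float, od600: float, blank_a405: float = 0.0) -> float:
    """Blank-corrected 4-NP absorbance (A405) per unit cell density (OD600).

    Assumes a plain ratio normalization; blank_a405 is a hypothetical option.
    """
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return (a405 - blank_a405) / od600


def fold_change(test: float, reference: float) -> float:
    """Ratio of a test value to a reference value."""
    return test / reference


def implied_density_ratio(total_fold: float, per_biomass_fold: float) -> float:
    """Cell-density ratio (test / reference) implied by two fold changes.

    per_biomass_fold = total_fold * (OD_ref / OD_test), hence
    OD_test / OD_ref = total_fold / per_biomass_fold.
    """
    return total_fold / per_biomass_fold


if __name__ == "__main__":
    # Hypothetical readings for illustration only (not data from the study).
    strain_a = secretion_per_biomass(a405=1.2, od600=10.0)
    strain_b = secretion_per_biomass(a405=0.6, od600=10.0)
    print(f"per-biomass fold change: {fold_change(strain_a, strain_b):.2f}")  # 2.00

    # Reported 168-hour values for the IRA1 N200D clones vs CBS_BGL9:
    # 2.96-fold total BGL and 3.8-fold BGL per cell.
    print(f"implied OD600 ratio: {implied_density_ratio(2.96, 3.8):.2f}")  # 0.78
```

Under this assumption, the 2.96-fold total and 3.8-fold per-cell values reported later in this section for the IRA1^N200D clones would imply that those cultures reached roughly 0.78 times the parental OD600 at 168 hours. The calculation behind S6 Fig may differ, for example if ratios were averaged per clone.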
QTL mapping by sequencing individual segregants

We sequenced the genome of each of the 120 selected segregants (pools of 30 superior and 30 inferior segregants from each of the 2 crosses) individually, which enabled us to calculate the exact allele frequencies of all variants in each pool. Each segregant’s genome was sequenced to approximately 100× coverage (Illumina). After quality filtering, reads were mapped to the CBS7435 reference genome [29], and the genotype of each segregant was determined at each of the approximately 42,000 SNP sites segregating in the cross. For each SNP site, we then calculated the frequency of the nonreference allele (i.e., the Pp2 allele in Cross 1 and the Pp4 allele in Cross 2) among the 30 genomes in the superior pool and among the 30 genomes in the inferior pool. These frequencies are plotted against genomic position for each pool in each cross (Fig 3A and 3B). Neutral SNP sites are expected to show a nonreference allele frequency of approximately 0.5 in both pools, whereas SNP sites genetically linked to loci affecting BGL secretion should deviate from 0.5 in opposite directions in the 2 pools.

Fig 3. QTL analysis using SNP allele frequencies calculated from genome sequences of individual segregants. (A) Cross 1 (Pp2_BGL5 × CBS_BGL9). Left axis: black dots show the frequency of the Pp2 allele among the 30 superior segregants, and orange dots show its frequency among the 30 inferior segregants, at each of the 42,262 polymorphic sites segregating on the 4 K. phaffii chromosomes in this cross. The gray horizontal lines at allele frequency 0.5 represent unbiased segregation. Right axis: blue bars show the statistical significance of bias in allele frequency (log10 scale; Fisher’s exact test with Bonferroni correction for multiple testing over all SNPs on a chromosome). (B) Cross 2 (Pp4_BGL3 × CBS_BGL9).
Black dots show the frequency of the Pp4 allele among the 30 superior segregants, and orange dots show its frequency among the 30 inferior segregants, at each of the 41,552 polymorphic sites segregating in this cross. Other details are as in (A). Numerical data are listed in S2 Data.
https://doi.org/10.1371/journal.pbio.3001877.g003

For comparison, we also used the standard bulk segregant analysis approach: combining equal quantities of biomass from the 30 segregants in each pool, sequencing bulk DNA from this mixture, and using SNP frequencies in the sequencing reads as a proxy for their frequencies in the 30 segregants. The results from this approach (S4 Fig) show the same pattern of variation in allele frequencies along chromosomes as in Fig 3, but with much more noise and consequently lower resolution than we obtained by sequencing segregants individually.

QTL1 maps to a frameshifted allele of HOC1 in CBS7435

In both Cross 1 and Cross 2, a prominent QTL (designated QTL1) is apparent near the center of chromosome 3 (Fig 3). In the 43-kb region at the peak of QTL1 in Cross 1, all 30 superior segregants contain the CBS7435 haplotype and all 30 inferior segregants contain the Pp2 haplotype. Fisher’s exact test for a difference in allele frequencies between the 2 pools gives a Bonferroni-corrected P = 1.84 × 10⁻¹³ at each SNP site in this peak region of QTL1 in Cross 1 (blue bars in Fig 3A). Similarly, at QTL1 in Cross 2, all 30 superior segregants contain the CBS7435 haplotype and all 30 inferior segregants contain the Pp4 haplotype in a 33-kb region at its peak (Fig 3B; P = 1.81 × 10⁻¹³). Thus, in this region of the genome, the CBS7435 allele is superior to both the Pp2 and Pp4 alleles. The regions of peak statistical significance in the 2 crosses overlap in a 23-kb interval that contains 13 genes, including HOC1 (Fig 4A).
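The per-SNP comparison described above can be sketched in Python. The source specifies nonreference allele frequencies in pools of 30 individually genotyped haploid segregants, Fisher’s exact test, and Bonferroni correction over all SNPs on a chromosome (Fig 3 legend). The sketch assumes that each SNP is tested as a 2 × 2 table of allele counts (pool × allele) and uses the common two-sided definition, which sums the probabilities of all tables with the same margins that are no more probable than the observed one. Function names are hypothetical. Genotypes are coded 1 for the nonreference allele and 0 for the reference allele.

```python
from math import comb
from typing import List, Sequence


def nonref_frequency(genotypes: Sequence[int]) -> float:
    """Fraction of haploid segregants carrying the nonreference allele (coded 1)."""
    return sum(genotypes) / len(genotypes)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Enumerates all tables with the observed margins and sums the hypergeometric
    probabilities of those no more probable than the observed table. Weights are
    exact integers, so ties are compared without floating-point error.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # weights[i] is the numerator of P(top-left cell = lo + i).
    weights: List[int] = [comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)]
    observed = weights[a - lo]
    extreme = sum(w for w in weights if w <= observed)
    return extreme / comb(n, col1)


def snp_p_value(superior: Sequence[int], inferior: Sequence[int], n_snps_on_chrom: int) -> float:
    """Bonferroni-corrected P for biased segregation at one SNP site.

    Rows are pools (superior, inferior); columns are (nonreference, reference) counts.
    """
    k_sup, k_inf = sum(superior), sum(inferior)
    raw = fisher_exact_two_sided(k_sup, len(superior) - k_sup,
                                 k_inf, len(inferior) - k_inf)
    return min(1.0, raw * n_snps_on_chrom)


if __name__ == "__main__":
    # QTL1-like site: every superior segregant carries the reference (CBS7435)
    # allele and every inferior segregant carries the nonreference allele.
    superior = [0] * 30
    inferior = [1] * 30
    print(nonref_frequency(superior), nonref_frequency(inferior))  # 0.0 1.0
    print(f"uncorrected P = {fisher_exact_two_sided(0, 30, 30, 0):.3g}")
    # QTL2-like site: 22 of 30 superior vs 2 of 30 inferior carry the Pp2 allele.
    print(f"uncorrected P = {fisher_exact_two_sided(22, 8, 2, 28):.3g}")
```

For the QTL1 pattern (0 of 30 versus 30 of 30 nonreference alleles), the uncorrected P is 2/C(60, 30), about 1.7 × 10⁻¹⁷. Multiplying by a chromosome-wide SNP count on the order of 10⁴ gives a value on the order of 10⁻¹³, which matches the magnitude of the corrected P values reported above. The 10⁴ figure is plausible because the crosses segregate roughly 42,000 SNPs over 4 chromosomes. Because the pools contain individually genotyped haploids, the allele counts are exact. In bulk segregant analysis, they would instead have to be estimated from read counts.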
The Pp2 and Pp4 alleles of HOC1 are intact and code for a 397-residue protein that is well conserved among budding yeasts, but the CBS7435 allele contains a frameshift mutation that truncates the Hoc1 protein to 273 residues and presumably inactivates it. S. cerevisiae HOC1 (“homolog of OCH1”) encodes a protein with a mannosyltransferase domain that is a subunit of the Mannan Polymerase II complex, which is located in the Golgi apparatus and extends the α1,6-linked backbone of mannan chains on N-glycosylated proteins [37]. Disruption of S. cerevisiae HOC1 causes cell wall defects, so Hoc1 is postulated to function in glycosylation of cell wall proteins [38]. A deletion of HOC1 was recovered in a genome-wide screen for increased protein secretion in S. cerevisiae [39], and disruption or deletion of another Mannan Polymerase II subunit, Mnn10, also increases yields of secreted proteins in S. cerevisiae and Kluyveromyces lactis [40,41], so K. phaffii HOC1 was a strong candidate gene at QTL1.

Fig 4. QTL1 maps to a frameshift mutation in HOC1. (A) Gene map at the peak region of QTL1. Blue bars indicate the regions of maximum segregation bias in each cross. In these regions, the CBS7435 SNP alleles are present in all 30 superior segregants and absent from all 30 inferior segregants. Genes are named according to their S. cerevisiae orthologs where possible. The inset shows the frameshift mutation in the CBS7435 allele of HOC1 (systematic gene name BQ9382_C3-3105). (B) BGL secretion in strains with HOC1 edits (blue bars) and their unedited progenitors (gray bars). The relevant genotype of each strain is indicated in yellow. CBS_BGL9_HOC1FL is a derivative of CBS_BGL9 in which the HOC1 gene was restored to functionality (full length) by CRISPR/Cas9 editing to remove the frameshift; 4 independent cultures of the same edited strain were assayed.
For the other 4 edited strains, the Pp2 or Pp4 HOC1 gene was disrupted by inserting a sequence containing 6 consecutive stop codons (6xSTOP tag) after the codon for Gly157 of the Hoc1 protein, and 2–3 independently edited clones were assayed. For each of the edited diploids, we recovered and assayed 1 clone in which the Pp2/Pp4 allele was disrupted by a 6xSTOP tag and the CBS7435 allele was disrupted by its original frameshift, and 1 clone in which 6xSTOP tags were inserted into both the Pp2/Pp4 allele and the frameshifted CBS7435 allele. Assays were conducted on 4-day 400 μl cultures. Error bars indicate standard deviation. Significant differences in BGL secretion were tested with unpaired two-tailed t tests and are indicated by asterisks (*, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001). Numerical data are listed in S1 Data.
https://doi.org/10.1371/journal.pbio.3001877.g004

We verified that the frameshifted hoc1 allele of CBS7435 is the cause of QTL1. Inactivation of the intact Pp2 or Pp4 HOC1 gene by CRISPR/Cas9 editing [42] more than doubled BGL secretion in Pp2_BGL5 and Pp4_BGL3 haploids, as well as in CBS_BGL9/Pp2_BGL5 and CBS_BGL9/Pp4_BGL3 diploids (Fig 4B). Conversely, correcting the frameshift to repair HOC1 in haploid CBS_BGL9 halved its BGL secretion. Interestingly, among haploids, BGL secretion was higher in the hoc1 derivatives of Pp2 and Pp4 than in the CBS7435 background (Fig 4B), suggesting that these derivatives have potential as host strains. Using the edited haploid strains, we also found that K. phaffii hoc1 mutants are sensitive to the cell wall-perturbing agent Calcofluor White, whereas HOC1 strains are resistant (S5 Fig).

QTL2 maps to IRA1

Each cross contains one other QTL that reaches statistical significance: QTL2 on chromosome 3 in Cross 1 (P = 1.57 × 10⁻³) and QTL3 on chromosome 1 in Cross 2 (P = 1.32 × 10⁻⁴) (Fig 3).
At these QTLs, the superior allele comes from the nonreference parent Pp2 (at QTL2) or Pp4 (at QTL3), so these alleles have the potential to improve protein secretion if introduced into the widely used strain CBS7435. Each of these QTLs is absent from the other cross, which is not surprising because Pp2 and Pp4 are quite divergent from each other [30]. We chose QTL2 for further analysis because it is narrower than QTL3. Strikingly, QTL1 and QTL2 are in opposite phases on the same chromosome (the superior allele comes from CBS7435 at QTL1 but from Pp2 at QTL2), so most of the segregants in both pools in Cross 1 have a crossover in the interval between them (Fig 3A). At the peak of QTL2, the Pp2 allele is present in 22 (73%) of the 30 superior BGL-secreting segregants, compared with 2 (7%) of the 30 inferior segregants, in 2 neighboring regions (each with P = 1.57 × 10⁻³) within a 31-kb interval (Fig 5A). This interval contains 18 genes. Because QTL2 was detected in Cross 1 but not in Cross 2, its causal variant is expected to be present in Pp2 but not in Pp4; we therefore filtered the SNPs in these genes to exclude any variants shared by both Pp2 and Pp4 relative to CBS7435. Nine of the 18 genes contain at least 1 nonsynonymous SNP that passed this filter (Fig 5A). We reviewed the functions of these 9 genes and identified K. phaffii IRA1 as a strong candidate, because in S. cerevisiae a defect in IRA2 was previously found to inhibit the degradation of aggregates of a misfolded heterologous secreted protein in the endoplasmic reticulum (ER) [43], and ER stress is a known bottleneck in protein secretion by K. phaffii [15]. Because of the whole-genome duplication in the S. cerevisiae lineage, K. phaffii IRA1 is orthologous to both IRA1 and IRA2 of S. cerevisiae [44].

Fig 5. QTL2 maps to an N200D variant in IRA1. (A) Detailed map of the QTL2 region in Cross 1. The upper panel shows Pp2 allele frequencies in the QTL2 region on chromosome 3 among the 30 superior and 30 inferior segregants in Cross 1, as in Fig 3A.
Blue vertical bars indicate P values for biased segregation of alleles at individual SNP sites. The peak of QTL2 is 31 kb long and consists of 2 regions of 22 kb and 7 kb, each with P = 1.57 × 10⁻³, separated by a 2-kb region with P = 5.94 × 10⁻³. The lower panel shows a gene map of the 31-kb interval. IRA1 and the 8 genes colored green contain nonsynonymous SNPs in Pp2 relative to CBS7435 that are absent from Pp4. The 4 such SNP sites in IRA1 are labeled. (B) Reciprocal hemizygosity analysis of the effect of IRA1 alleles on BGL secretion in a CBS_BGL9/Pp2_BGL5 diploid. Haploid strains are included for comparison. X symbols in the cartoons indicate IRA1 alleles disrupted by insertion of a NatMX antibiotic resistance marker. Bars represent mean 4-NP absorbance values from 3 independent cultures of control strains and at least 7 biological replicates of the reciprocal hemizygote strains. Error bars indicate standard deviation. Significant differences in BGL secretion were tested with unpaired two-tailed t tests and are indicated by asterisks (**, P < 0.01; ****, P < 0.0001). (C) Effects of IRA1 SNP editing on BGL secretion. The edited strains were made in the haploid CBS_BGL9 background and contain individual nonsynonymous substitutions (N200D, V393L, D399N, G1466D) or a frameshift mutation (K404fs). Bars show mean 4-NP absorbance values from 3 independent cultures of control strains and at least 3 biological replicates of the edited strains. Error bars show standard deviation. Significant differences in BGL secretion between CBS_BGL9 and the other strains were tested by one-way ANOVA with Dunnett’s correction for multiple comparisons and are indicated by asterisks (*, P < 0.05; **, P < 0.01; ****, P < 0.0001) or ns (nonsignificant). Numerical data are listed in S1 Data.
https://doi.org/10.1371/journal.pbio.3001877.g005

To test whether alleles of IRA1 affect BGL secretion, we first used reciprocal hemizygosity analysis in a CBS_BGL9/Pp2_BGL5 diploid background (Fig 5B). This diploid (IRA1^CBS7435/IRA1^Pp2) secretes BGL at a level similar to that of the lower-secreting of its 2 haploid parents, Pp2_BGL5, probably because the hoc1^CBS7435 allele is recessive (Fig 4B). However, a hemizygous derivative of this diploid in which the IRA1^CBS7435 allele is disrupted and the IRA1^Pp2 allele remains intact shows significantly increased BGL secretion (Fig 5B). In contrast, the reciprocal hemizygote, in which only the IRA1^Pp2 allele is disrupted, shows little change in BGL secretion. These results functionally confirm that IRA1^Pp2 is a recessive beneficial allele for BGL secretion at QTL2.

QTL2 is caused by an IRA1^N200D variant in Pp2

There are 6 amino acid differences between the proteins encoded by the IRA1 alleles of Pp2 and CBS7435. Two of these differences are at sites where Pp2 and Pp4 carry the same amino acid change relative to CBS7435. These changes would therefore also segregate in Cross 2, where QTL2 was not detected, so they cannot be the cause of QTL2. We therefore focused on the other 4 sites (Fig 5A). We used CRISPR/Cas9 editing [42] to introduce each of these 4 amino acid substitutions individually into the IRA1 gene of the haploid parental strain CBS_BGL9. In addition, a CBS_BGL9 derivative with a frameshift mutation in IRA1 (IRA1^K404fs) was obtained fortuitously during CRISPR/Cas9 editing. Phenotyping these edited strains showed that introducing the IRA1^N200D substitution into haploid CBS_BGL9 significantly improved its BGL secretion, by an average of 17% in 4-day 400 μl cultures (Fig 5C). No improvement was seen in the strains carrying the other 3 nonsynonymous edits, most of which showed no significant difference in BGL secretion relative to CBS_BGL9. The strain with the frameshift secreted significantly less BGL than CBS_BGL9 (Fig 5C). Because this presumed loss-of-function allele has the opposite effect to N200D, the N200D variant may increase Ira1 activity.
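The filtering logic used twice for QTL2 reduces to a set difference. It was applied first to shortlist genes in the 31-kb interval, and then to narrow the 6 Ira1 amino acid differences to 4 candidates. A change that Pp2 shares with Pp4 relative to CBS7435 would also segregate in Cross 2, where QTL2 was not detected, so it is discarded. The sketch below assumes that variants are represented as amino acid change strings per gene and that "shared" means an identical change at the same site. The 2 shared Ira1 differences are not named in the text, so they appear as hypothetical placeholders, as does the second gene.

```python
from typing import Dict, Set

# Gene name -> nonsynonymous amino acid changes relative to CBS7435.
Variants = Dict[str, Set[str]]


def cross_specific_variants(pp2: Variants, pp4: Variants) -> Variants:
    """Keep Pp2 changes absent from Pp4; drop genes left with no candidates."""
    filtered: Variants = {}
    for gene, changes in pp2.items():
        remaining = changes - pp4.get(gene, set())
        if remaining:
            filtered[gene] = remaining
    return filtered


if __name__ == "__main__":
    # The 4 Pp2-specific IRA1 substitutions are from the text; the
    # "shared_*" entries and "hypothetical_gene" are placeholders.
    pp2 = {
        "IRA1": {"N200D", "V393L", "D399N", "G1466D", "shared_1", "shared_2"},
        "hypothetical_gene": {"shared_3"},
    }
    pp4 = {
        "IRA1": {"shared_1", "shared_2"},
        "hypothetical_gene": {"shared_3"},
    }
    for gene, changes in cross_specific_variants(pp2, pp4).items():
        print(gene, sorted(changes))  # IRA1 ['D399N', 'G1466D', 'N200D', 'V393L']
```

Keying the comparison on the exact change rather than on the position alone follows the text's criterion that Pp2 and Pp4 carry the same amino acid change. A position-only rule would be a different, stricter assumption.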
To further test the effect of the IRA1^N200D substitution, we measured BGL secretion and growth in 100-ml shake-flask cultures over a 1-week period. We compared 4 independently edited clones of CBS_BGL9 carrying the IRA1^N200D edit with the unedited parental strain CBS_BGL9 and an empty vector control strain (CBS_PGAP), assaying BGL secretion every 24 hours. The IRA1^N200D clones grew more slowly than their unedited parent and secreted more BGL at every time point from 96 hours onward (Fig 6). By the end of the experiment (168 hours), the IRA1^N200D clones had secreted 2.96 times as much BGL as their unedited parent. Because the IRA1^N200D clones grew more slowly, normalizing BGL secretion by cell density widens this difference to 3.8 times the parental level per cell at 168 hours (S6 Fig).

Fig 6. Effect of the IRA1^N200D substitution on BGL secretion in 1-week, 100-ml shake-flask cultures. BGL secretion and cell density data from 4 independently edited clones with the IRA1^N200D edit (green points) are compared with 3 independent cultures of their parent CBS_BGL9 (purple) and its empty vector counterpart CBS_PGAP (gray). Lines indicate trends in mean 4-NP absorbance and OD600 over time. Numerical data are listed in S1 Data.
https://doi.org/10.1371/journal.pbio.3001877.g006

Because most BGL production occurs in the late growth phase, and the IRA1^N200D-edited clones grow more slowly than unedited clones (Fig 6), we examined cell viability. We also tested whether the excess BGL in supernatants of the edited clones is released by cell lysis rather than by secretion. SDS-PAGE analysis of supernatants from these cultures confirms that BGL is secreted as a relatively pure protein at the expected size of 120 kDa (S7 Fig).
Consistent with the results in Fig 6, the BGL band is more intense in the IRA1^N200D-edited clones than in unedited controls, and this difference becomes more pronounced from 96 hours to 168 hours (S7 Fig). Propidium iodide staining shows that viability of the edited clones falls to 40% to 50% from 72 hours onward, whereas viability of unedited clones decreases little (S8 Fig). Cell lysis, visible as additional bands on the SDS-PAGE gels, is apparent in the IRA1^N200D-edited clones, but only at the final 168-hour time point (S7 Fig).

[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001877
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/