(C) PLOS One
This story was originally published by PLOS One and is unaltered.

Understanding molecular mechanisms and predicting phenotypic effects of pathogenic tubulin mutations [1]

['Thomas J. Attard', 'Wellcome Trust Centre For Cell Biology', 'School Of Biological Sciences', 'University Of Edinburgh', 'Edinburgh', 'Scotland', 'United Kingdom', 'Julie P. I. Welburn', 'Joseph A. Marsh', 'Mrc Human Genetics Unit']

Date: 2022-11

Cells rely heavily on microtubules for several processes, including cell division and molecular trafficking. Mutations in the different tubulin-α and -β proteins that comprise microtubules have been associated with various diseases and are often dominant, sporadic and congenital. While the earliest reported tubulin mutations affect neurodevelopment, mutations are also associated with other disorders such as bleeding disorders and infertility. We performed a systematic survey of tubulin mutations across all isotypes in order to improve our understanding of how they cause disease, and increase our ability to predict their phenotypic effects. Both protein structural analyses and computational variant effect predictors were very limited in their utility for differentiating between pathogenic and benign mutations. This was even worse for those genes associated with non-neurodevelopmental disorders. We selected tubulin-α and -β disease mutations that were most poorly predicted for experimental characterisation. These mutants co-localise to the mitotic spindle in HeLa cells, suggesting they may exert dominant-negative effects by altering microtubule properties. Our results show that tubulin mutations represent a blind spot for current computational approaches, being much more poorly predicted than mutations in most human disease genes. We suggest that this is likely due to their strong association with dominant-negative and gain-of-function mechanisms.
Filament-like structures, called microtubules, are essential for cells to function, distribute material around the cell and organisms, and help cells grow. The building blocks of microtubules are proteins called tubulins, which can rapidly polymerise and depolymerise. Mutations in tubulin genes can have catastrophic consequences on many different types of cells, leading to diseases such as bleeding defects, female infertility, and disorders impairing brain development. However, how these mutations cause disease and whether they can be predicted is still unknown. We used computational and experimental techniques to address these issues. First, we compared how disease-causing tubulin mutations and ones found in healthy people impact the structure of tubulin. Then, we tested the ability of available computational predictors to distinguish between these two types of tubulin mutations. We found these programs poorly predict tubulin mutations that cause diseases, limiting their usefulness. Next, we studied disease-causing mutations that were not predicted by computational methods. We found that these did not prevent tubulin from forming microtubules, indicating these mutations change the function of tubulin without inactivating them. Our work presents tubulins as a weakness of current computational predictors, potentially because they fail to consider different ways in which mutations cause disease. Funding: This work was supported by the Medical Research Council via a Precision Medicine Doctoral Training Programme studentship to TJA and a Career Development Award (MR/M02122X/1) to JAM. JAM is a Lister Institute Research Fellow. JPIW is supported by a Wellcome Trust Senior Research Fellowship (207430). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Copyright: © 2022 Attard et al. 
This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. In this study, we have first performed a systematic survey of known pathogenic missense mutations across all human tubulins and analysed their positions within the three-dimensional structures of tubulin heterodimers. This approach has allowed us to look for patterns in mutations across isotypes, structural locations, and phenotypes, in an attempt to obtain insight into the likely molecular disease mechanisms. Next, we assessed the performance of several different VEPs in distinguishing between pathogenic and putatively benign missense variants, observing that the predictive performance of all tested methods is poor compared to most other proteins. Finally, we have selected pathogenic tubulin mutations that were poorly predicted by computational approaches for experimental characterisation and show that the mutant proteins are able to be incorporated into microtubules, consistent with a likely dominant-negative mechanism. Our work suggests that many tubulin pathogenic mutations that act via non-loss-of-function mechanisms cause pathogenic phenotypes that cannot be explained computationally using current methods that rely on sequence conservation or protein structure. Overall, this study highlights the need for a greater understanding of microtubule-protein interactions to understand the molecular mechanisms underlying tubulinopathies. With the increasing accessibility of sequencing data, novel tubulin variants are being continually discovered [ 35 ]. Since it is impractical to test them all experimentally, there is a strong need for computational approaches to identify tubulin mutations most likely to be pathogenic. 
Many variant effect predictors (VEPs) have been developed in recent years, and some of these are now in widespread use to help identify mutations that potentially have clinical significance [ 36 ]. However, the performance of these predictors can vary quite dramatically across different proteins and, to our knowledge, there has been no systematic assessment of their performance on tubulin mutations specifically. Missense mutations that cause pathogenicity by dominant-negative or gain-of-function mechanisms tend to be poorly predicted by most currently available VEPs [ 37 ], which could potentially limit their applicability to tubulins. It is therefore important for us to understand to what extent we can rely on computational predictors when assessing tubulin mutations. The absence of any known protein null mutations in tubulins causing dominant disease is striking. This tells us that the molecular mechanism underlying dominant mutations cannot simply be haploinsufficiency, whereby disease is caused by a complete lack of functional protein produced from the mutant allele. One possibility is that disease is caused by a milder loss of function (i.e. hypomorphic mutations). For example, pathogenic TUBα1A mutations have been reported to disrupt interactions of the nascent protein with tubulin chaperones, which impairs heterodimer formation, suggesting that disease can be caused by a partial loss-of-function [ 25 ]. Alternatively, a pathogenic mutation could act via a non-loss-of-function mechanism, having a dominant-negative effect or causing a gain of function [ 26 ]. This would typically be associated with the mutant protein retaining the ability to incorporate into microtubules, which has been observed for many pathogenic tubulin mutations [ 27 – 32 ]. For dominant-negative mutations, the incorporation of mutant protein directly or indirectly disrupts the activity of the wild-type protein [ 26 ]. 
In these cases, the mutant tubulin retains its ability to form a heterodimer and assemble into microtubules before consequently impacting function in some other way, e.g. by perturbing microtubule properties or disrupting interactions with MAPs. For instance, TUBβ3 mutations that alter the charged surface of the microtubule prevent molecular motors from binding and thus have profound impacts on cellular transport [ 33 ]. Changes in microtubule properties could also induce a gain of function, as has been proposed for the pathogenic T178M variant in TUBβ2A and TUBβ3, which has been reported to make microtubules more stable and cause altered microtubule growth dynamics [ 34 ]. Despite the large number of identified pathogenic tubulin mutations, our understanding of the molecular mechanisms by which these mutations cause disease remains limited. The large majority of pathogenic tubulin mutations involve missense changes (i.e. single amino acid residue substitutions) and have autosomal dominant inheritance. There are only a few known exceptions, including homozygous null and in-frame deletion mutations in TUBβ8 that can cause an oocyte maturation defect [ 23 ], a dominant nonsense mutation that results in a slightly truncated TUBα4A linked to the neurodegenerative disease amyotrophic lateral sclerosis [ 19 ], and a recessive intron deletion in TUBα8 in polymicrogyria patients that interferes with splicing, producing shorter mRNAs that do not contain exon 2 [ 24 ]. The number of dominant pathogenic mutations identified for each isotype is denoted, as well as their pathogenicity classification. Where necessary, additional comments about the pathogenicity type and references for all mutations are also included. MCM = missense constraint metric (Z-Score obtained from gnomAD); ALS = amyotrophic lateral sclerosis; H-ABC = hypomyelination with atrophy of basal ganglia and cerebellum. 
A wide range of genetic disorders–called ’tubulinopathies’–have now been attributed to tubulin mutations. Over 225 pathogenic mutations in human tubulin isotypes have been reported (see Table 1 ). These findings highlight the importance of understanding tubulin function in different cell types. The first reported tubulin mutations associated with pathogenic phenotypes were found in TUBα1A, TUBβ2B, TUBβ3, and TUBβ4A and resulted in neurodevelopmental defects [ 15 ]. Mutations in the γ-tubulin isotype TUBγ1 –which is necessary for microtubule nucleation–also cause a similar neurodevelopmental disorder [ 16 , 17 ]. While mutations in TUBα4A and TUBβ4A have been linked to neurodegenerative disease [ 18 , 19 ], phenotypes outside the nervous system are now emerging. These include TUBβ1 mutations associated with bleeding disorders [ 20 ], a link between tubulin-α acetylation and reduced sperm motility [ 21 ], and TUBβ8 mutations connected with female infertility due to incorrect meiotic spindle assembly [ 22 ]. Microtubules self-assemble from tubulin-α and -β heterodimers, with the dynamics of their assembly and disassembly being integral to their function [ 3 ]. Tubulin-α and -β are ubiquitous in eukaryotes, while related proteins in the FtsZ family show similarity in sequence, structure and function in archaea and bacteria [ 4 , 5 ]. Nine tubulin-α and ten tubulin-β genes have been identified in humans, originating from evolutionary gene duplication events [ 6 ]. In the tubulin field, these tubulin paralogues are referred to as isotypes. Between tubulin-α and -β, conservation in sequence and structure is high, especially at interfaces stabilising the heterodimer, and at contacts between tubulin heterodimers across (lateral) and along (longitudinal) protofilaments [ 7 ]. While tubulin-α and -β both bind to GTP, only tubulin-β can hydrolyse it to GDP, with residues on these binding sites amongst the most conserved [ 8 ]. 
GTP hydrolysis enables distinct conformations that mediate the dimer’s ability to be incorporated into microtubules [ 9 ]. The C-terminal region makes up the outer surface of the microtubule and so contributes to most of the interactions with MAPs [ 10 ]. Furthermore, many differences in amino acid sequences between isotypes occur in this region [ 11 ] and in the unstructured, highly negative tail [ 12 ]. Tubulin-γ, δ and ε are more divergent in sequence than tubulin-α and -β and are involved in the basal bodies of centrioles, rather than being self-assembled into dynamic polymers [ 13 , 14 ].

Microtubules are polarised cytoskeletal filaments essential in several cellular processes, ranging from cell division to signalling and transport. They assemble into axons, cilia, and the mitotic spindle, while also providing tracks for microtubule-associated proteins (MAPs) and motors for molecular trafficking [ 1 , 2 ].

Results

Survey of tubulin missense mutations

First, we compiled as many previously identified pathogenic or likely pathogenic dominant tubulin missense mutations as possible, using online databases [38,39] and extensive literature searching (S1 Table). In addition, we also identified missense variants in tubulin genes observed across >140,000 people from the gnomAD v2.1 database [40]. Given that the gnomAD dataset comprises mostly healthy individuals without severe genetic disorders, these variants are unlikely to cause dominant disease, and we therefore refer to them as "putatively benign". However, we acknowledge that some of these variants could have milder effects, variable penetrance, or be associated with late-onset disease. Table 1 shows the numbers of pathogenic and gnomAD missense variants for each tubulin isotype and the associated type of genetic disease.
Somatic mutations in tubulins are also implicated in cancer development; however, these mutations are likely to provide a selective advantage to cancer cells by providing resistance to chemotherapeutic drugs [41–43]. Hence these mutations might obscure our results and have not been included in our study, although we have noted two isotypes with links to cancer in which no other disease-related mutations have been identified yet (Table 1). While pathogenic mutations occur throughout the tubulins, mutations in both tubulin-α and -β show clustering towards the intermediate and C-terminal domains when shown in the context of the linear amino acid sequence (Fig A in S1 Text). We also considered the gene-level missense constraint metric (MCM) scores provided by the gnomAD database (Table 1). These are derived from a model based on sequence context to predict the number of expected variants present in a healthy population relative to the number of actual variants observed [44]. They provide a metric for the tolerance of each isotype to missense variation, with higher values representing genes that are more intolerant to amino acid sequence changes. Interestingly, we observed high MCM scores for isotypes linked with neurodevelopmental disorders (TUBα1A, TUBβ2A, TUBβ2B, TUBβ3, TUBβ4A and TUBβ5). TUBβ6 is the only exception, and has only one pathogenic mutation reported so far, causing congenital facial palsy. These scores contrast with the much lower scores observed for TUBβ1 and TUBβ8, associated with platelet defects [45–49] and female infertility [22,23,50–53], respectively, which suggest that they are much more tolerant to sequence variation. Overall, our analysis indicates there is a stronger sequence constraint in the human population for tubulin isotypes that contribute to neurodevelopment. This may be due to the selective pressure of the process, compared to tubulin isotypes expressed in cells that affect organism fitness to a lesser extent. 
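To make the idea of a missense constraint score concrete, the sketch below computes a signed Z-like value from an observed and an expected variant count. It is a simplified illustration only: the function name `missense_z` and the example counts are hypothetical, and the published gnomAD scores come from a more elaborate mutational model with additional calibration, so values from this sketch are not expected to match Table 1.

```python
import math


def missense_z(observed: int, expected: float) -> float:
    """Signed square root of the chi-squared deviation between counts.

    Positive values mean fewer missense variants were observed than
    expected, i.e. the gene appears intolerant to missense variation.
    Simplified illustration, not the gnomAD pipeline.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    chi_sq = (observed - expected) ** 2 / expected
    # The sign follows (expected - observed): depletion gives a positive score.
    return math.copysign(math.sqrt(chi_sq), expected - observed)


if __name__ == "__main__":
    # Hypothetical counts: a depleted gene and a gene with an excess of variants.
    print(missense_z(100, 400.0))
    print(missense_z(144, 100.0))
```

In this simplified form, a gene with far fewer observed than expected variants receives a large positive score, which matches the direction of the interpretation given above for the neurodevelopmental isotypes.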
Most pathogenic tubulin mutations are not highly disruptive to protein structure

Next, we considered the predicted structural perturbations of pathogenic and putatively benign gnomAD missense variants using FoldX [60]. This outputs a ΔΔG value, in units of kcal/mol, with positive values indicating that a mutation is likely to destabilise protein structure and negative values indicating predicted stabilisation. Previous work has shown that computationally predicted ΔΔG values can sometimes show considerable utility for the identification of pathogenic missense mutations, and for understanding likely molecular disease mechanisms [61]. Interestingly, we observe no significant differences between the ΔΔG values of pathogenic and gnomAD missense variants for tubulin-α, -β or -γ (Fig 3A). Of the individual isotypes, only TUBβ2B shows significantly higher ΔΔG values for pathogenic mutations (p = 0.01), although this would not remain significant when accounting for multiple testing (Fig 3B). We initially used ΔΔG values that only consider the structural impact of variants on the monomer alone, as they are more consistent between structures. However, we also observed very similar results using the full ΔΔG values calculated using the entire complex, including intermolecular interactions, as well as when using absolute ΔΔG values (Fig E in S1 Text).

Fig 3. Comparison of predicted changes in protein stability between pathogenic and putatively benign tubulin mutations. ΔΔG values representing the predicted change in free energy of folding were calculated with FoldX considering the structure of the monomeric subunit only. Scores are shown for tubulin-α and -β families globally (A) and in isotypes with at least five identified pathogenic mutations (B). Maroon diamonds indicate the mean ΔΔG values, and mutation totals for each group are also shown at the bottom.
The p-values displayed were obtained via unpaired Wilcoxon tests. https://doi.org/10.1371/journal.pcbi.1010611.g003

These results suggest that structural destabilisation is not a primary molecular disease mechanism underlying pathogenic tubulin mutations, and that considering structural impact is not particularly useful for differentiating between pathogenic and benign tubulin variants. Notably, this aligns with recent work showing that the predicted effects on protein stability tend to be much milder in gain-of-function and dominant-negative mutations than for pathogenic mutations associated with a loss of function [37], supporting the idea that most pathogenic tubulin mutations are due to non-loss-of-function effects.

Variant effect predictors show poor performance in discrimination between pathogenic and putatively benign tubulin mutations

Next, we assessed the abilities of 25 different VEPs to distinguish between pathogenic and putatively benign tubulin missense mutations. A complete set of predictions for every tubulin mutation from all VEPs is provided in S3 Table. To compare the performance of different VEPs, we first calculated the receiver operating characteristic (ROC) area under the curve (AUC) for each VEP over the entire dataset of tubulin missense variants (Fig 4A). Overall, the VEPs performed very poorly. Most had overall AUCs below 0.6, with the top-performing predictor, REVEL, having an AUC of only 0.68. In contrast, a recent study using a very similar methodology found that many VEPs had overall AUCs above 0.8, e.g. REVEL had an AUC of 0.9 for haploinsufficient disease genes, 0.85 for genes associated with a gain-of-function, and 0.83 for genes associated with dominant-negative effects [37]. Thus, even considering that VEPs tend to do worse for non-loss-of-function mutations, the performance we observe here for tubulin mutations is strikingly poor.
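For readers unfamiliar with the metric, the ROC AUC can be read as the probability that a randomly chosen pathogenic variant receives a more damaging score than a randomly chosen putatively benign one, so 0.5 corresponds to a random predictor and 1.0 to perfect separation. The sketch below computes it directly from that definition. It is a minimal illustration, not the analysis code used in the study; the function name `roc_auc` and the scores are hypothetical, and higher scores are assumed to mean "more damaging".

```python
def roc_auc(pathogenic_scores, benign_scores):
    """ROC AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    Assumes higher scores mean 'more likely pathogenic'. For predictors where
    lower scores are more damaging, negate the scores first.
    """
    if not pathogenic_scores or not benign_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0   # pathogenic variant ranked above benign variant
            elif p == b:
                wins += 0.5   # ties count as half a win
    return wins / (len(pathogenic_scores) * len(benign_scores))


if __name__ == "__main__":
    # Hypothetical predictor scores, for illustration only.
    pathogenic = [0.9, 0.6, 0.4]
    benign = [0.5, 0.3, 0.4, 0.1]
    print(roc_auc(pathogenic, benign))
```

The pairwise loop is quadratic in the number of variants, which is fine for a few hundred tubulin variants; a rank-based implementation would be preferable for much larger datasets.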
Fig 4. Assessment of VEP performance for identification of pathogenic tubulin mutations. (A) ROC AUC values for each VEP across all tubulin-α and -β isotypes with at least one identified pathogenic mutation, colour coded according to predictor category. (B) Distribution of ROC AUC values across all VEPs for isotypes with at least 10 identified pathogenic missense mutations. Dashed line indicates the performance of a random predictor. Maroon diamonds indicate the mean area. https://doi.org/10.1371/journal.pcbi.1010611.g004

It is important to note that our analysis is likely to overstate the predictive power of some VEPs. Supervised machine learning approaches underpin most VEPs, and typically use datasets of known pathogenic and benign variants for training. Since some VEPs are likely to have been trained using some of the tubulin mutations in our evaluation, their performance has a strong possibility of being overstated. This problem is particularly acute for metapredictors, including the top-performing methods in our analysis, REVEL and M-CAP, which combine supervised learning with multiple other predictors as inputs. In contrast, predictors based upon unsupervised machine learning approaches and those utilising empirical calculations should be free from this bias. Therefore, given the performance of the unsupervised predictor DeepSequence, ranking third overall, we consider it likely to be the most reliable predictor of tubulin mutation pathogenicity, consistent with its top-ranking performance in a recent study [62]. However, even DeepSequence only achieves an AUC of 0.63 for tubulin mutations here, compared to well over 0.8 for all disease-associated proteins tested in that study. Next, we compared the AUCs calculated for individual isotypes across all VEPs, considering isotypes with at least 10 pathogenic mutations.
We found that pathogenic mutations in the tubulin-β isotypes TUBβ2B, TUBβ3 and TUBβ4A, which are all associated with neurodevelopmental diseases, were better predicted than those in TUBβ8, associated with oocyte maturation defects (Fig 4B). Therefore, we classified tubulins into two groups based upon observed disease phenotypes: neurodevelopmental and other (as classified in Table 1). We observed a significantly better performance on isotypes linked to neurodevelopmental diseases (Fig 5A). Interestingly, when we compare the performance on neurodevelopmental vs other disorders across the individual VEPs (Fig 5B), we observe that metapredictors and supervised VEPs, like REVEL, M-CAP and VEST4, outperformed all other VEPs on isotypes linked to neurodevelopmental disease but showed a drastic decrease in performance on other phenotypes. In contrast, unsupervised DeepSequence shows very similar performance between the two groups. This strongly suggests that certain VEPs have likely been overfitted in their training against the neurodevelopmental mutations.

Fig 5. Comparison of VEP performance on pathogenic mutations in tubulin genes associated with neurodevelopmental vs other disease phenotypes. (A) Distribution of ROC AUC values across all VEPs for isotypes with mutations linked with neurodevelopmental or other disease phenotypes. The p-value stated was obtained via a paired Wilcoxon test. Maroon diamonds indicate the mean area. (B) ROC AUC values for each VEP in isotypes with mutations associated with neurodevelopmental or other disease phenotypes. Dashed line indicates the performance of a random predictor. https://doi.org/10.1371/journal.pcbi.1010611.g005

Given its overall performance in our analyses and unsupervised nature, we currently recommend DeepSequence for predicting the effects of tubulin mutations, although we emphasise that its predictive utility is still relatively limited.
Therefore, we have produced DeepSequence predictions for every possible amino acid substitution across most tubulin isotypes and have provided them as a resource (S4 Table). We have also calculated optimal thresholds for DeepSequence using the closest point to the top left corner of our ROC curves. Based upon this, we suggest that DeepSequence scores lower than -5.89 are likely to be pathogenic in isotypes linked with neurodevelopmental phenotypes, and lower than -4.83 for isotypes linked with other phenotypes.

[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010611 Published and (C) by PLOS One Content appears here under this condition or license: Creative Commons - Attribution BY 4.0. via Magical.Fish Gopher News Feeds: gopher://magical.fish/1/feeds/news/plosone/
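The threshold selection described above for DeepSequence, choosing the ROC point closest to the top left corner, can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the authors' code: the function name `closest_to_top_left`, the use of midpoints between observed scores as candidate cut-offs, and the example scores are all hypothetical. As in the text, lower scores are treated as more likely pathogenic.

```python
import math


def closest_to_top_left(pathogenic_scores, benign_scores):
    """Return (threshold, tpr, fpr) for the ROC point nearest (FPR=0, TPR=1).

    A variant is called pathogenic when its score is lower than the threshold.
    Candidate thresholds are midpoints between consecutive distinct scores
    (an assumption of this sketch).
    """
    values = sorted(set(pathogenic_scores) | set(benign_scores))
    cutoffs = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in cutoffs:
        tpr = sum(s < t for s in pathogenic_scores) / len(pathogenic_scores)
        fpr = sum(s < t for s in benign_scores) / len(benign_scores)
        dist = math.hypot(fpr, 1.0 - tpr)  # Euclidean distance to the corner
        if best is None or dist < best[0]:
            best = (dist, t, tpr, fpr)
    if best is None:
        raise ValueError("need at least two distinct scores")
    _, threshold, tpr, fpr = best
    return threshold, tpr, fpr


if __name__ == "__main__":
    # Hypothetical scores on a DeepSequence-like scale, for illustration only.
    pathogenic = [-9.0, -7.0, -6.0, -3.0]
    benign = [-5.0, -4.0, -2.0, -1.0, -8.0]
    print(closest_to_top_left(pathogenic, benign))
```

Because the chosen point balances sensitivity against the false positive rate on the evaluation data itself, a threshold obtained this way is best treated as a rough guide when the underlying AUC is modest, as it is here.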