[HN Gopher] What's next for AlphaFold and the AI protein-folding revolution ___________________________________________________________________ Author : digital55 Score : 113 points Date : 2022-04-13 14:29 UTC Web link : www.nature.com | photochemsyn wrote: | It's kind of surprising that AlphaFold has some success with | random sequences of amino acids: | | > "Baker's team gets AlphaFold and RoseTTAFold to "hallucinate" | new proteins. The researchers have altered the AI code so that, | given random sequences of amino acids, the software will optimize | them until they resemble something that the neural networks | recognize as a protein. In December 2021, Baker and his | colleagues reported expressing 129 of these hallucinated proteins | in bacteria, and found that about one-fifth of them folded into | something resembling their predicted shape." | | 20% is not that great but it has potential. One long-standing | goal is the de novo design of protein-based industrial catalysts | for specific chemical transformations. Proteins from bacteria | that live in boiling sulfur vents etc. have been used to some | extent, but the idea is that similar proteins could be designed | for a much wider variety of industrial processes. As the article | notes, specificity remains a challenge (and designed proteins | don't approach the efficiency of the evolutionarily selected | proteins), but it still seems promising. | | P.S. I'm a bit more skeptical about the drug-design programs. | It's not so much that novel drugs can't be designed that bind to | the desired targets; it's that they might bind to a whole lot of | undesired targets as well, leading to nasty side effects. Now if | you could screen against the whole proteome, perhaps. | flobosg wrote: | > 20% is not that great but it has potential. | | 20% success rate is in line with other protein design methods, | though.
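The "hallucination" loop quoted above (start from a random sequence, then keep changing it until the network recognizes it as a protein) can be illustrated as a Metropolis Monte Carlo search over sequences. This is a minimal sketch under loud assumptions: `toy_score` is a made-up stand-in that merely rewards alternating hydrophobic and polar residues, whereas the real work scored sequences with AlphaFold/RoseTTAFold predictions, which this code does not call. The names `hallucinate`, `toy_score`, `steps` and `temperature` are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")


def toy_score(seq: str) -> float:
    """HYPOTHETICAL stand-in for network confidence (not AlphaFold).

    Returns the fraction of neighbouring residue pairs that alternate
    between hydrophobic and polar, a crude strand-like pattern.
    """
    alternations = sum(
        1 for a, b in zip(seq, seq[1:])
        if (a in HYDROPHOBIC) != (b in HYDROPHOBIC)
    )
    return alternations / (len(seq) - 1)


def hallucinate(length: int = 30, steps: int = 2000,
                temperature: float = 0.05, seed: int = 0) -> tuple[str, float]:
    """Metropolis Monte Carlo over sequences, maximizing toy_score."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]  # random start
    current = toy_score("".join(seq))
    best_seq, best = "".join(seq), current
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a point mutation
        proposed = toy_score("".join(seq))
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temperature).
        if proposed >= current or rng.random() < math.exp((proposed - current) / temperature):
            current = proposed
            if current > best:
                best_seq, best = "".join(seq), current
        else:
            seq[pos] = old  # reject: undo the mutation
    return best_seq, best


if __name__ == "__main__":
    designed, score = hallucinate()
    print(designed, round(score, 2))
```

Because the best-scoring sequence is tracked separately, the returned score can never be lower than the random starting point; occasionally accepting worse mutations is what lets the search escape local optima.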
| gfodor wrote: | I'd imagine the success rate isn't apples to apples; the | real measure is "time, energy, and manpower expenditure | needed per generated protein" | flobosg wrote: | Both measures can be quite similar. Most protein designs | can be screened in parallel for solubility, and successful | designs can be further engineered and tested in a | high-throughput manner. | fabian2k wrote: | I found it interesting that AlphaFold can't reliably predict the | structures for mutations that disrupt structure. The explanation | makes a lot of sense though. | | It is sometimes important to remind oneself that the selection of | protein structures that exist in nature and that we determined | experimentally is biased. Nature doesn't like proteins that | misfold because they can easily cause trouble. And proteins with | less defined structures are generally harder to solve with the | usual methods like X-ray crystallography. The list of protein | structures we know isn't a representative sample of all possible | protein structures; it's mostly structures that are useful in | nature and that we can determine with the methods we have | available. | alan-hn wrote: | >proteins with less defined structures are generally harder to | solve with the usual methods like X-ray crystallography | | What do you mean by 'proteins with less defined structures'? | I'm not familiar with what this phrase could mean; could you | please expand on this concept? | fabian2k wrote: | Less defined means flexible in this case. So either parts | that are completely random on their own, or parts that can | adopt multiple different structures. | | There are also intrinsically disordered proteins that have no | defined structure when they are on their own; they are | essentially like a piece of string that is almost completely | flexible. Those proteins can still adopt a specific | well-defined structure if they bind to something else.
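fabian2k's description of parts that "can adopt multiple different structures", and dekhn's later remark that disordered proteins visit many structures "at similar energy levels, all of which are accessible at ~room temp", both come down to the Boltzmann distribution, p_i = exp(-E_i/RT) / sum_j exp(-E_j/RT). A minimal sketch, assuming each conformation can be treated as one discrete state with a single free energy; the energies in the usage lines are made-up illustrative numbers, and the function name is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant (Boltzmann constant per mole), kcal/(mol*K)


def boltzmann_populations(energies_kcal: list[float],
                          temperature_k: float = 298.0) -> list[float]:
    """Equilibrium fraction of molecules in each discrete conformation.

    p_i = exp(-E_i / RT) / sum_j exp(-E_j / RT). Energies are shifted by the
    minimum first, which leaves the ratios unchanged but avoids overflow.
    """
    rt = R_KCAL * temperature_k  # about 0.59 kcal/mol at 298 K
    e_min = min(energies_kcal)
    weights = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Illustrative, made-up energies (kcal/mol).
    rigid = boltzmann_populations([0.0, 5.0, 5.0])        # one deep minimum
    floppy = boltzmann_populations([0.0, 0.2, 0.3, 0.4])  # near-degenerate states
    print([round(p, 3) for p in rigid])
    print([round(p, 3) for p in floppy])
```

With a 5 kcal/mol gap the lowest state holds essentially all of the population (the "rock"), while states within a few tenths of a kcal/mol of each other, i.e. less than RT, are all substantially populated (the "floppy spaghetti").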
| alan-hn wrote: | So does flexible mean that there may be different amino | acids in a portion of the peptide? From my understanding, | when flexibility is discussed in terms of proteins we're | talking about rigid vs flexible side chains which can move | or rotate along specific bonds. | | So for the intrinsically disordered ones, are you mainly | talking about the secondary or tertiary structures? My | assumption based on your statement is that we're keeping | the same primary structure (order of amino acids) but they | don't have many (if any at all) intermolecular | interactions? Would it be safe to assume that you're | referring to shorter polypeptides rather than large | proteins? | dekhn wrote: | in disordered proteins, there is no permanent tertiary | structure. they may have some secondary structure, but | the relations of those structural elements can change in | time. It does not mean the sequence has variation in it. | alan-hn wrote: | Does this mean that they have multiple conformational | states with similar energies that are easy for it to | transition between? How different are the states and is | this how the protein normally does its proteiny stuff? | dekhn wrote: | yes, I would say that intrinsically disordered proteins | adopt something like an unfolded state, which is to say | that they can visit a wide range of structures that are | at similar energy levels, all of which are accessible at | ~room temp. I can't really answer in more detail because | all the ID proteins are fairly different and how they do | their job is hard to understand compared to stable static | "rocks" like enzymes. | throwawaybio3 wrote: | Enzymes aren't stable and static -- usually in their | active site they have significant conformational changes | that enable catalysis of the relevant chemical reaction.
| It's quite a problem that we don't have general robust | ways of directly elucidating those transient structures; | a lot of our understanding of catalysis is still held | back or slow-evolving because we can only use indirect | and cumbersome methods (like isotopic mutation + laser | IR). | | I would consider most enzymes to be intrinsically | disordered at their active sites. | dekhn wrote: | No, enzymes are not intrinsically disordered at their | active sites. They are highly ordered. Most enzymes don't | undergo large changes: they accept a molecule, do their | business, and release it. You're thinking of other | proteins like motor proteins, which undergo large, | controlled conformational changes. | | The active site is structured to stabilize the transition | state of the affected molecule and move it from one state | to the next in the chemical reaction. That requires very | specific shapes and correlated changes. But of course, | this being biology, you can remove all 3 active site | residues in a serine protease catalytic triad and still | see proteolysis, because the protein, when it binds the | substrate, forces the substrate into its transition | pathway. | | People have been working on these things for quite some | time. I saw talks about time-resolved crystallography of | active sites, and while they say "significant structure | changes", they really only mean localized breathing-like | motions, not massive rotations of entire domains. | f38zf5vdt wrote: | Yes, many proteins have transitional global arrangements | that they traverse as they meet some goal. For example, | kinesin and dynein walk along microtubules in a way where | we could never perfectly characterize the intermediary | states since it's effectively a motor with free rotation | around certain elements. | | A lot of crystallography is focused on enzymatic | reactions where you bind a ligand that sits there for the | sake of introducing some conformation that you can study.
| The ligands generally approximate the natural substrate | at either the beginning, end, or some intermediate step | in enzyme-catalyzed synthesis. | panabee wrote: | is it possible to identify which proteins are | intrinsically disordered based on amino acid sequence | alone (or even base sequence)? | | put another way, is it possible to a priori determine if | a protein is ID or ordered? | | for instance, you said enzymes are highly ordered. is | this based on experimental observations (which could | later be wrong if imaging techniques improve) or is there | some principle that allows us to treat this as a fact? | | thanks in advance for your time. | flobosg wrote: | > is it possible to a priori determine if a protein is ID | or ordered? | | There's software that attempts to predict intrinsic | disorder based on sequence alone, but in general, in the | absence of homologous (evolutionarily related) proteins with | known structure, you would still need to check | experimentally for disorder. | | EDIT: | | > if the goal is to reliably assess certain viral | proteins as ID or ordered, experimental methods are the | only methods for achieving this? | | If you don't find homologs with solved structures, | experimental characterization is the way to go. | panabee wrote: | thanks for the explanation. to clarify, if the goal is to | reliably assess certain viral proteins as ID or ordered, | are experimental methods the only methods for achieving | this? | dekhn wrote: | A priori? No. Typically this would be determined by | synthesizing or expressing the protein of interest and | then using something like CD (circular dichroism).
| | There is an absolutely enormous amount of experimental | data about enzyme structure, but frankly I think the | simplest thing is to understand that the modern ideas | about the reversible protein folding process came from | ribonuclease, a protein that cuts RNA: | https://en.wikipedia.org/wiki/Anfinsen%27s_dogma | | There may also be intrinsically disordered enzymes; I'm | not really sure how they would work, but of course, in | biology, there's always a weird example that violates | normal expectations because evolution once randomly tried | something a billion years ago and got stuck with it. | panabee wrote: | thanks for the clarification. your papers also seem | interesting, will check those out. | | the goal is to reliably characterize certain viral | proteins as ID or ordered. would you happen to have any | advice on this? | abcc8 wrote: | Many proteins have intrinsically disordered regions that are | hypothesized to be directly related to the protein's role in | the cell. These regions are termed disordered because current | methods used to determine the structure of proteins are | unable to resolve a regular structure for these regions in | the context of a protein crystal or protein in solution. This | publication is an informative review on the topic: | https://pubs.acs.org/doi/10.1021/cr400525m | jostmey wrote: | The same protein can deform into multiple different 3D | shapes, called conformations. Some proteins are rigid and | exist almost exclusively in a single conformation. It is | probably easier to determine the 3D structure of proteins | with a single, dominant conformation. Other proteins don't | have well-defined conformations, and are more like a tangle | of rope that can bend in many different ways. | panabee wrote: | thanks for the explanation. what are the biggest factors | influencing conformation? what are the best ways today for | imaging proteins with different conformations, and what are | the limitations of these methods?
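On panabee's question of spotting intrinsic disorder from sequence alone: one of the oldest sequence-only heuristics is the charge-hydropathy plot (Uversky and colleagues), in which sequences with low mean hydropathy and high mean net charge tend to be disordered. The sketch below assumes the Kyte-Doolittle scale rescaled to 0-1, a net charge that counts only K/R and D/E, and the commonly cited boundary <H> = (<R> + 1.151) / 2.785; check these constants against the original paper before relying on them. It is far cruder than modern predictors, and, as flobosg and dekhn say, experimental characterization such as CD remains the reliable test. The helper names are hypothetical, and non-standard residue letters will raise a `KeyError`.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def charge_hydropathy(seq: str) -> tuple[float, float]:
    """Return (mean net charge, mean normalized hydropathy) for a sequence."""
    seq = seq.upper()
    # Net charge at ~pH 7, counting K/R as +1 and D/E as -1 (His ignored).
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    mean_charge = abs(positive - negative) / len(seq)
    # Rescale Kyte-Doolittle from [-4.5, 4.5] to [0, 1].
    mean_h = sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)
    return mean_charge, mean_h


def predicted_disordered(seq: str) -> bool:
    """Hypothetical helper: True if the sequence falls on the disordered side
    of the assumed boundary <H> = (<R> + 1.151) / 2.785."""
    mean_charge, mean_h = charge_hydropathy(seq)
    boundary = (mean_charge + 1.151) / 2.785
    return mean_h < boundary


if __name__ == "__main__":
    print(predicted_disordered("EEEEEEEEEESSSSSPPPPP"))  # charged, polar: True
    print(predicted_disordered("LLIVVAFLLIVAGLLIVFAW"))  # hydrophobic: False
```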
| dekhn wrote: | think loose floppy piles of spaghetti instead of well-defined | rocks. | tintor wrote: | Example by analogy: A flat tire has less defined structure and | can take many shapes. An inflated tire has more defined | structure and behaves more predictably. | axg11 wrote: | Very nicely explained. Also hints at the next big frontier for | protein folding: improving the prediction of those disruptive | effects. | flobosg wrote: | > I found it interesting that AlphaFold can't reliably predict | the structures for mutations that disrupt structure | | It's not _that_ surprising given the conceptual background of | the method. Since it relies on evolutionarily coupled | residues, AlphaFold is looking at sets of complementary | mutations that keep or rescue a determined structure, i.e. the | complete opposite of structural disruption. | | > The list of protein structures we know isn't a representative | sample of all possible protein structures | | And the same goes for protein sequences. | peter303 wrote: | I see a Nobel Prize around the corner. | | They aren't often given for techniques or computation, but the | results are outstanding. | mupuff1234 wrote: | Does DeepMind sell anything? Their site has no mention of any | type of offering. | benrapscallion wrote: | They have spun out a drug design company named Isomorphic Labs. | [1] | | [1] https://www.isomorphiclabs.com/ | dekhn wrote: | amusingly, I work for a pharma and they don't even return our | calls. I wonder how seriously they take this business, | because if I were selling a product based on this, pharma | would be my first customer. | alphabetting wrote: | Could be wrong, but I think DeepMind sees more value in the | elite AI/ML talent that AlphaFold will draw and help retain | than in future potential profits from drug discovery. | Open-sourcing AlphaFold and removing commercial restrictions | wouldn't make much sense if drug profits were their goal.
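flobosg's point above, that AlphaFold relies on evolutionarily coupled residues, can be made concrete with the classical coevolution signal: mutual information (MI) between two columns of a multiple sequence alignment. This illustrates the idea only and is not what AlphaFold computes; AlphaFold learns from the alignment with a neural network rather than an explicit MI score, and practical coevolution methods also correct for phylogeny and indirect couplings. The function name and the toy alignment are invented for illustration.

```python
import math
from collections import Counter


def column_mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j.

    MI = sum over residue pairs (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b))),
    estimated from raw counts with no pseudocounts or sequence weighting.
    """
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi


if __name__ == "__main__":
    # Invented toy alignment: columns 0 and 1 swap K/E together
    # (a compensating salt-bridge pair); column 2 varies independently.
    toy_msa = ["KEA", "KEG", "EKA", "EKG", "KES", "EKS"]
    print(column_mutual_information(toy_msa, 0, 1))  # log(2), about 0.693
    print(column_mutual_information(toy_msa, 0, 2))  # about 0.0
```

The coupled K/E columns score highly because a mutation at one position is always "rescued" by the complementary mutation at the other, which is exactly the structure-preserving signal flobosg describes; a mutation that simply breaks the structure leaves no such trace in natural alignments.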
| dekhn wrote: | No, Isomorphic Labs was set up to specifically | commercialize this. If their goal is to be a discovery | company, they are fairly naive. | alphabetting wrote: | Yeah, I know about Isomorphic Labs. My point is that the | talent AlphaFold will draw is more valuable than | potential drug discovery profits. | folli wrote: | If the promise of in silico drug design comes to | fruition, the potential drug discovery profits could very | well rival Google's ad profits. | dekhn wrote: | Sure. AlphaFold is, in fact, the greatest shot at revenue | that DeepMind has shown so far (and they are under | intense pressure from Alphabet to show revenue). | alphabetting wrote: | I don't think there is any pressure on that front. They | are supposedly profitable now (though I'm guessing this | is partially accounting tricks), but there just isn't a | need to be profitable. Search and YouTube print money to | fund their R&D ($31B last year alone). The goal is AGI or | close to it. | | https://venturebeat.com/2021/10/10/ai-lab-deepmind-becomes-p... | dekhn wrote: | The "profit" you're pointing at is money that Google pays | DeepMind to do software and machine learning as a service | for them. This pays off, for example with JAX, where | nobody in Google Research could touch it because of Jeff | Dean/TensorFlow, until DM demonstrated (with AlphaFold) | that JAX could do Nobel-prize-winning research, to the | point where Jeff has admitted that TensorFlow has serious | problems and systems like JAX are the future (see the | PaLM paper!!!) | mechagodzilla wrote: | Where does the value come in if you pay them lots of | money to work on unprofitable things? Just by virtue of | not letting your competitors hire them? | alphabetting wrote: | Profit is later. I strongly believe this take. | | https://twitter.com/fchollet/status/1502775288257601540 | pkaye wrote: | Must have taken the Google approach and already terminated | the product.
/s | elcomet wrote: | AlphaFold was released, the code is open source, and the | pretrained weights are available for free. | asdff wrote: | I believe only for academic use, though, right? I don't know if | it can be used for commercial use. | lucidrains wrote: | incorrect; they modified the license so it can be used for | commercial use - https://github.com/deepmind/alphafold/commit/8173117130e6df8... | xnx wrote: | Facebook: Releases a tool that makes amusing image mashups | Google: Makes revolutionary progress in one of the hardest | problems in chemistry | flobosg wrote: | To be fair, they have published a few papers and preprints | related to the topic. See e.g. | https://www.pnas.org/doi/10.1073/pnas.2016239118 and | https://www.biorxiv.org/content/10.1101/2021.02.12.430858v3 | dekhn wrote: | This wasn't Google, it was DeepMind. Google doesn't get any | credit for this. I tried to start this project at Google, but it | conflicted with the Google Health team's goals. | xiphias2 wrote: | Even if it's a sister project, it's great PR for Google. I | accept more ads from Google as it gives back so much in | healthcare. I wish Meta would do the same; I wouldn't care if | it's part of Facebook or not. Gmail was something similar at | the start: just do something good, to make more people like | Google. | | As for your own project, I'm sorry: there are no more | 20% projects like in the old times :( | bawolff wrote: | It's owned by Google (Alphabet); I think they deserve some | props for it. | dekhn wrote: | Google is owned by Alphabet. DM and Google are siblings. | codeflo wrote: | I think this article does a good job of highlighting the | difference between simulations and ML-based approaches. The | latter are faster, but have limitations outside of their training | parameters. As with everything in ML, broader training data to | cover those cases probably helps. Though I would guess some of
Though I would guess some of | the problems could be inherent, that there fundamentally is no | computational shortcut to this problem, whether you use a neural | network or not. | dekhn wrote: | I just wish people would stop using the word "fold" for this. | It's not folding. It's just structure prediction. It's great at | structure prediction (static prediction of a single structure) | and not at all at the folding process (which is dynamic and | rapidly changing). | flobosg wrote: | "Protein fold" and "protein folding" are two different | concepts. Folds are structural categories, folding is the | biophysical process. But I agree that there are better words | out there to name such a tool. | dekhn wrote: | That's very misleading, as you can see. I believe we should | not use the term fold for structural categories as it's a | misnaming. It's a historical accident that came about before | people began to understand that folding is a process, not an | on/off switch. | | See my work in this area: | https://pubmed.ncbi.nlm.nih.gov/24345941/ which is explicitly | attempting to simulate an approximate folding pathway(s). | flobosg wrote: | There are other terms analogous to "fold", like "topology" | (as used in CATH), but they will probably never see | widespread use. | daveguy wrote: | The most accurate term would probably be "tertiary | structure". Although AlphaTertiaryStructure is a | mouthful. They could have named it AlphaTS. | flobosg wrote: | > The most accurate term would probably be "tertiary | structure". | | There's an AlphaFold variant that can predict quaternary | structure: https://www.biorxiv.org/content/10.1101/2021.1 | 0.04.463034v2 | | > They could have named it AlphaTS. | | TS looks more like "transition state" to me. | daveguy wrote: | Ah, good point. TS would be as bad or worse than "Fold" | in this context. 
| dekhn wrote: | DeepMindProteinStructurePredictor or | deep_mind_protein_structure_predictor if you don't like | camel case | gilleain wrote: | Even 'topology' is a little confusing to those more | familiar with the term from maths. | | For the 'CATH' hierarchical classification, the | 'Topology' level is something like the organization of | secondary structure in an 'Architecture'. This has some | relationship to topology in the general sense, but is a | narrower definition. | | For me, the 'fold' is what happens after 'folding' | occurs, but I take the point that it is confusing. | flobosg wrote: | If I recall correctly, the difference between | Architecture and Topology in CATH is that the former is | independent of connectivity. | | > For me, the 'fold' is what happens after 'folding' | occurs, but I take the point that it is confusing. | | Same here. | dekhn wrote: | Topology actually makes some sense here in that a very | small number of proteins do fold into knots! This was a | huge surprise and completely contradicted most | predictions. | https://en.wikipedia.org/wiki/Knotted_protein | dekhn wrote: | Yup. I've had this discussion repeatedly with the | developers of SCOP and the folks who run CASP and they | simply will not budge. ___________________________________________________________________ (page generated 2022-04-13 23:01 UTC)