[HN Gopher] What's next for AlphaFold and the AI protein-folding...
       ___________________________________________________________________
        
       What's next for AlphaFold and the AI protein-folding revolution
        
       Author : digital55
       Score  : 113 points
       Date   : 2022-04-13 14:29 UTC (8 hours ago)
        
 (HTM) web link (www.nature.com)
 (TXT) w3m dump (www.nature.com)
        
       | photochemsyn wrote:
       | It's kind of surprising that AlphaFold has some success with
       | random sequences of amino acids:
       | 
       | > "Baker's team gets AlphaFold and RoseTTAFold to "hallucinate"
       | new proteins. The researchers have altered the AI code so that,
       | given random sequences of amino acids, the software will optimize
       | them until they resemble something that the neural networks
       | recognize as a protein. In December 2021, Baker and his
       | colleagues reported expressing 129 of these hallucinated proteins
       | in bacteria, and found that about one-fifth of them folded into
       | something resembling their predicted shape."
       | 
       | 20% is not that great but it has potential. One long-standing
       | goal is the de novo design of protein-based industrial catalysts
       | for specific chemical transformations. Proteins from bacteria
       | that live in boiling sulfur vents etc. have been used to some
       | extent, but the idea is that similar proteins could be designed
       | for a much wider variety of industrial processes. As the article
       | notes, specificity remains a challenge (and designed proteins
       | don't approach the efficiency of evolutionarily selected
       | proteins), but it still seems promising.
       | 
       | P.S. I'm a bit more skeptical about the drug-design programs.
       | It's not so much that novel drugs can't be designed that bind to
       | the desired targets, it's that they might bind to a whole lot of
       | undesired targets as well, leading to nasty side effects. Now if
       | you could screen against the whole proteome, perhaps.
        
         | flobosg wrote:
         | > 20% is not that great but it has potential.
         | 
         | 20% success rate is in line with other protein design methods,
         | though.
        
           | gfodor wrote:
           | I'd imagine the success rates aren't apples to apples; the
           | real measure is "time, energy, and manpower expended per
           | generated protein".
        
             | flobosg wrote:
             | Both measures can be quite similar. Most protein designs
             | can be screened in parallel for solubility and successful
             | designs can be further engineered and tested in a high-
             | throughput manner.
        
       | fabian2k wrote:
       | I found it interesting that AlphaFold can't reliably predict the
       | structures for mutations that disrupt structure. The explanation
       | makes a lot of sense though.
       | 
       | It is sometimes important to remind oneself that the selection of
       | protein structures that exist in nature and that we determined
       | experimentally is biased. Nature doesn't like proteins that
       | misfold because they can easily cause trouble. And proteins with
       | less defined structures are generally harder to solve with the
       | usual methods like X-ray crystallography. The list of protein
       | structures we know isn't a representative sample of all possible
       | protein structures, it's mostly structures that are useful in
       | nature and that we can determine with the methods we have
       | available.
        
         | alan-hn wrote:
         | >proteins with less defined structures are generally harder to
         | solve with the usual methods like X-ray crystallography
         | 
         | What do you mean by 'proteins with less defined structures'?
         | I'm not familiar with this phrase; could you please expand on
         | this concept?
        
           | fabian2k wrote:
           | Less defined means flexible in this case. So either parts
           | that are completely random on their own, or parts that can
           | adopt multiple different structures.
           | 
           | There are also intrinsically disordered proteins that have no
           | defined structure when they are on their own; they are
           | essentially like a piece of string that is almost completely
           | flexible. Those proteins can still adopt a specific, well-
           | defined structure if they bind to something else.
        
             | alan-hn wrote:
             | So does flexible mean that there may be different amino
             | acids in a portion of the peptide? From my understanding,
             | when flexibility is discussed in terms of proteins, we're
             | talking about rigid vs. flexible side chains, which can move
             | or rotate along specific bonds.
             | 
             | So for the intrinsically disordered ones, are you mainly
             | talking about the secondary or tertiary structure? My
             | assumption based on your statement is that we're keeping
             | the same primary structure (order of amino acids), but they
             | don't have many (if any at all) intermolecular
             | interactions? Would it be safe to assume that you're
             | referring to shorter polypeptides rather than large
             | proteins?
        
               | dekhn wrote:
               | In disordered proteins, there is no permanent tertiary
               | structure. They may have some secondary structure, but
               | the relations of those structural elements can change
               | over time. It does not mean the sequence has variation in it.
        
               | alan-hn wrote:
               | Does this mean that they have multiple conformational
               | states with similar energies that are easy for it to
               | transition between? How different are the states and is
               | this how the protein normally does its proteiny stuff?
        
               | dekhn wrote:
               | Yes, I would say that intrinsically disordered proteins
               | adopt something like an unfolded state, which is to say
               | that they can visit a wide range of structures that are
               | at similar energy levels, all of which are accessible at
               | ~room temp. I can't really answer in more detail because
               | all the ID proteins are fairly different, and how they do
               | their job is hard to understand compared to stable, static
               | "rocks" like enzymes.
        
               | throwawaybio3 wrote:
               | Enzymes aren't stable and static: their active sites
               | usually undergo significant conformational changes that
               | enable catalysis of the relevant chemical reaction.
               | It's quite a problem that we don't have general, robust
               | ways of directly elucidating those transient structures;
               | a lot of our understanding of catalysis is still held
               | back or slow-evolving because we can only use indirect
               | and cumbersome methods (like isotopic mutation + laser
               | IR).
               | 
               | I would consider most enzymes to be intrinsically
               | disordered at their active sites.
        
               | dekhn wrote:
               | No, enzymes are not intrinsically disordered at their
               | active sites. They are highly ordered. Most enzymes don't
               | undergo large changes: they accept a molecule, do their
               | business, and release it. You're thinking of other
               | proteins, like motor proteins, which undergo large,
               | controlled conformational changes.
               | 
               | The active site is structured to stabilize the transition
               | state of the affected molecule and move it from one state
               | to the next in the chemical reaction. That requires very
               | specific shapes and correlated changes. But of course,
               | this being biology, you can remove all 3 active-site
               | residues in a serine protease catalytic triad and still
               | see proteolysis, because the protein, when it binds the
               | substrate, forces the substrate into its transition
               | pathway.
               | 
               | People have been working on these things for quite some
               | time. I saw talks about time-resolved crystallography of
               | active sites, and while they say "significant structure
               | changes", they really only mean localized breathing-like
               | motions, not massive rotations of entire domains.
        
               | f38zf5vdt wrote:
               | Yes, many proteins have transitional global arrangements
               | that they traverse as they meet some goal. For example,
               | kinesin and dynein walk along microtubules in a way where
               | we could never perfectly characterize the intermediate
               | states, since each is effectively a motor with free
               | rotation around certain elements.
               | 
               | A lot of crystallography is focused on enzymatic
               | reactions where you bind a ligand that sits there for the
               | sake of introducing some conformation that you can study.
               | The ligands generally approximate the natural substrate
               | at either the beginning, end, or some intermediate step
               | in enzyme catalyzed synthesis.
        
               | panabee wrote:
               | is it possible to identify which proteins are
               | intrinsically disordered based on amino acid sequence
               | alone (or even base sequence)?
               | 
               | put another way, is it possible to a priori determine if
               | a protein is ID or ordered?
               | 
               | for instance, you said enzymes are highly ordered. is
               | this based on experimental observations (which could
               | later be wrong if imaging techniques improve) or is there
               | some principle that allows us to treat this as a fact?
               | 
               | thanks in advance for your time.
        
               | flobosg wrote:
               | > is it possible to a priori determine if a protein is ID
               | or ordered?
               | 
               | There's software that attempts to predict intrinsic
               | disorder based on sequence alone, but in general, in the
               | absence of homolog (evolutionarily related) proteins with
               | known structure you would still need to check
               | experimentally for disorder.
               | 
               | EDIT:
               | 
               | > if the goal is to reliably assess certain viral
               | proteins as ID or ordered, experimental methods are the
               | only methods for achieving this?
               | 
               | If you don't find homologs with solved structures,
               | experimental characterization is the way to go.
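As a rough illustration of what "predicting disorder from sequence alone" can mean, here is a minimal sketch of one of the simplest whole-sequence approaches, the charge-hydropathy comparison associated with Uversky and colleagues: the average normalized Kyte-Doolittle hydropathy <H> is compared with a boundary that depends on the mean net charge <R>, and sequences falling below the line are flagged as likely disordered. The boundary constants (1.151 and 2.785) are the commonly quoted values; averaging over the whole sequence with no sliding window, counting charge from K, R, D and E only, and the function names are simplifying assumptions, not any particular tool's implementation. Dedicated per-residue predictors are considerably more sophisticated, and experiment is still what confirms the answer.

```python
# Toy charge-hydropathy disorder check (simplified Uversky-style boundary).
# Assumptions: whole-sequence averages (no sliding window), K/R counted as
# positive and D/E as negative; histidine and chain termini are ignored.

# Kyte-Doolittle hydropathy scale (values range from -4.5 to +4.5).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_normalized_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle value rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """Absolute net charge per residue near neutral pH (K, R vs. D, E)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def likely_disordered(seq: str) -> bool:
    """True if the sequence lies on the 'natively unfolded' side of the line."""
    seq = seq.upper()
    h = mean_normalized_hydropathy(seq)
    r = mean_net_charge(seq)
    h_boundary = (r + 1.151) / 2.785  # commonly quoted boundary line
    return h < h_boundary


print(likely_disordered("E" * 20))  # True: low hydropathy, high net charge
print(likely_disordered("L" * 20))  # False: hydrophobic and uncharged
```

A poly-glutamate stretch lands well below the boundary and a poly-leucine stretch well above it; real sequences sit much closer to the line, which is one reason a crude classifier like this cannot replace experimental checks such as CD.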
        
               | panabee wrote:
               | thanks for the explanation. to clarify, if the goal is to
               | reliably assess certain viral proteins as ID or ordered,
               | are experimental methods the only methods for achieving
               | this?
        
               | dekhn wrote:
               | A priori? No. Typically this would be determined by
               | synthesizing or expressing the protein of interest and
               | then using something like CD (circular dichroism).
               | 
               | There is an absolutely enormous amount of experimental
               | data about enzyme structure, but frankly I think the
               | simplest is to just understand that the modern ideas
               | about the reversible protein folding process came from
               | ribonuclease, a protein that cuts RNA:
               | https://en.wikipedia.org/wiki/Anfinsen%27s_dogma
               | 
               | There may also be intrinsically disordered enzymes; I'm
               | not really sure how they would work, but of course, in
               | biology, there's always a weird example that violates
               | normal expectations because evolution once randomly tried
               | something a billion years ago and got stuck with it.
        
               | panabee wrote:
               | thanks for the clarification. your papers also seem
               | interesting, will check those out.
               | 
               | the goal is to reliably characterize certain viral
               | proteins as ID or ordered. would you happen to have any
               | advice on this?
        
           | abcc8 wrote:
           | Many proteins have intrinsically disordered regions that are
           | hypothesized to be directly related to the protein's role in
           | the cell. These regions are termed disordered because current
           | methods used to determine the structure of proteins are
           | unable to resolve a regular structure for these regions in
           | the context of a protein crystal or protein in solution. This
           | publication is an informative review on the topic:
           | https://pubs.acs.org/doi/10.1021/cr400525m
        
           | jostmey wrote:
           | The same protein can deform into multiple different 3D
           | shapes, called conformations. Some proteins are rigid and
           | exist almost exclusively in a single conformation. It is
           | probably easier to determine the 3D structure of proteins
           | with a single, dominant conformation. Other proteins don't
           | have well-defined conformations, and are more like a tangle
           | of rope that can bend in many different ways.
        
             | panabee wrote:
             | thanks for the explanation. what are the biggest factors
             | influencing conformation? what are the best ways today for
             | imaging proteins with different conformations, and what are
             | the limitations of these methods?
        
           | dekhn wrote:
           | think loose floppy piles of spaghetti instead of well-defined
           | rocks.
        
           | tintor wrote:
           | Example by analogy: Flat tire has less defined structure, and
           | can take many shapes. Inflated tire has more defined
           | structure, and behaves more predictably.
        
         | axg11 wrote:
         | Very nicely explained. Also hints at the next big frontier for
         | protein folding: improving the prediction of those disruptive
         | effects.
        
         | flobosg wrote:
         | > I found it interesting that AlphaFold can't reliably predict
         | the structures for mutations that disrupt structure
         | 
         | It's not _that_ surprising given the conceptual background of
         | the method. Since it's relying on evolutionarily coupled
         | residues, AlphaFold is looking at sets of complementary
         | mutations that keep or rescue a determined structure, i.e. the
         | complete opposite of structural disruption.
         | 
         | > The list of protein structures we know isn't a representative
         | sample of all possible protein structures
         | 
         | And the same goes for protein sequences.
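To make "evolutionarily coupled residues" concrete: a classic, pre-deep-learning way to detect coupling is the mutual information between two columns of a multiple sequence alignment (MSA). If a mutation at one position tends to be accompanied by a compensating mutation at another, the two columns co-vary. This is a toy sketch of that idea, not what AlphaFold computes (it learns such signals from the MSA with a neural network); practical coevolution methods also correct for phylogenetic bias and indirect couplings, which is omitted here. The alignment and the function name are hypothetical.

```python
import math
from collections import Counter


def mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)          # residue counts at column i
    counts_j = Counter(seq[j] for seq in msa)          # residue counts at column j
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)  # joint pair counts
    mi = 0.0
    for (a, b), c in counts_ij.items():
        p_ab = c / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Hypothetical 4-sequence alignment: columns 0 and 2 swap K/E together,
# like a salt bridge preserved by compensating mutations.
msa = ["KAE", "KGE", "EAK", "EGK"]
print(round(mutual_information(msa, 0, 2), 3))  # 0.693, i.e. log 2
print(round(mutual_information(msa, 0, 1), 3))  # 0.0, independent columns
```

A structure-disrupting point mutation shows up in such data as an uncompensated change, which is exactly the signal this kind of coupling analysis is not built to reward.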
        
       | peter303 wrote:
       | I see a Nobel Prize around the corner.
       | 
       | They aren't often given for techniques or computation, but the
       | results are outstanding.
        
       | mupuff1234 wrote:
       | Does Deepmind sell anything? Their site has no mention of any
       | type of offering.
        
         | benrapscallion wrote:
         | They have spun out a drug design company named Isomorphic Labs.
         | [1]
         | 
         | [1] https://www.isomorphiclabs.com/
        
           | dekhn wrote:
           | Amusingly, I work for a pharma company and they don't even
           | return our calls. I wonder how seriously they take this
           | business, because if I were selling a product based on this,
           | pharma would be my first customer.
        
             | alphabetting wrote:
             | Could be wrong but I think Deepmind sees more value in
             | elite AI/ML talent that Alphafold will draw and help retain
             | than future potential profits on drug discovery. Open
             | sourcing Alphafold and removing commercial restrictions
             | wouldn't make much sense if drug profits were their goal.
        
               | dekhn wrote:
               | No, isomorphic labs was set up to specifically
               | commercialize this. If their goal is to be a discovery
               | company, they are fairly naive.
        
               | alphabetting wrote:
               | Yeah I know about Isomorphic labs. My point is that
               | talent Alphafold will draw is more valuable than
               | potential drug discovery profits.
        
               | folli wrote:
               | If the promise of in silico drug design comes to
               | fruition, the potential drug discovery profits could very
               | well rival Google's ad profits.
        
               | dekhn wrote:
               | Sure. AlphaFold is, in fact, the greatest shot at revenue
               | that DeepMind has shown so far (and they are under
               | intense pressure from Alphabet to show revenue).
        
               | alphabetting wrote:
               | I don't think there is any pressure on that front. They
               | are supposedly profitable now (though I'm guessing this
               | is partially accounting tricks), but there just isn't a
               | need to be profitable. Search and YouTube print money to
               | fund their R&D ($31B last year alone). The goal is AGI or
               | close to it.
               | 
               | https://venturebeat.com/2021/10/10/ai-lab-deepmind-becomes-p...
        
               | dekhn wrote:
               | The "profit" you're pointing at is money that Google pays
               | DeepMind to do software and machine learning as a service
               | for them. This pays off, for example with JAX: nobody in
               | Google Research could touch it because of Jeff
               | Dean/TensorFlow, until DM demonstrated (with AlphaFold)
               | that JAX could do Nobel-prize-winning research, to the
               | point where Jeff has admitted that TensorFlow has serious
               | problems and systems like JAX are the future (see the
               | PaLM paper!!!)
        
               | mechagodzilla wrote:
               | Where does the value come in if you pay them lots of
               | money to work on unprofitable things? Just by virtue of
               | not letting your competitors hire them?
        
               | alphabetting wrote:
               | Profit is later. I strongly believe this take.
               | 
               | https://twitter.com/fchollet/status/1502775288257601540
        
             | pkaye wrote:
             | Must have taken the Google approach and already terminated
             | the product. /s
        
         | elcomet wrote:
         | Alphafold was released, the code is open source and the
         | pretrained weights are available for free.
        
           | asdff wrote:
           | I believe only for academic use though right? I don't know if
           | it can be used for commercial use.
        
             | lucidrains wrote:
             | Incorrect, they modified the license so it can be used for
             | commercial use:
             | https://github.com/deepmind/alphafold/commit/8173117130e6df8...
        
       | xnx wrote:
       | Facebook: Releases a tool that makes amusing image mashups
       | Google: Makes revolutionary progress in one of the hardest
       | problems in chemistry
        
         | flobosg wrote:
         | To be fair, they have published a few papers and preprints
         | related to the topic. See e.g.
         | https://www.pnas.org/doi/10.1073/pnas.2016239118 and
         | https://www.biorxiv.org/content/10.1101/2021.02.12.430858v3
        
         | dekhn wrote:
         | This wasn't Google, it was DeepMind. Google doesn't get any
         | credit for this. I tried to start this project at Google but it
         | conflicted with the Google Health team's goals.
        
           | xiphias2 wrote:
           | Even if it's a sister project, it's great PR for Google. I
           | accept more ads from Google as it gives back so much in
           | healthcare. I wish Meta would do the same, I wouldn't care if
           | it's part of Facebook or not. GMail was something similar at
           | the start: just do something good, to make more people like
           | Google.
           | 
           | As for your own project I'm sorry for you: there are no more
           | 20% projects, like in the old times :(
        
           | bawolff wrote:
           | It's owned by Google (Alphabet); I think they deserve some
           | props for it.
        
             | dekhn wrote:
             | Google is owned by Alphabet. DM and Google are siblings.
        
       | codeflo wrote:
       | I think this article does a good job of highlighting the
       | difference between simulations and ML-based approaches. The
       | latter are faster, but have limitations outside of their training
       | parameters. As with everything in ML, broader training data to
       | cover those cases probably helps. Though I would guess some of
       | the problems could be inherent, that there fundamentally is no
       | computational shortcut to this problem, whether you use a neural
       | network or not.
        
       | dekhn wrote:
       | I just wish people would stop using the word "fold" for this.
       | It's not folding. It's just structure prediction. It's great at
       | structure prediction (static prediction of a single structure)
       | and not at all at the folding process (which is dynamic and
       | rapidly changing).
        
         | flobosg wrote:
         | "Protein fold" and "protein folding" are two different
         | concepts. Folds are structural categories, folding is the
         | biophysical process. But I agree that there are better words
         | out there to name such a tool.
        
           | dekhn wrote:
           | That's very misleading, as you can see. I believe we should
           | not use the term fold for structural categories as it's a
           | misnaming. It's a historical accident that came about before
           | people began to understand that folding is a process, not an
           | on/off switch.
           | 
           | See my work in this area:
           | https://pubmed.ncbi.nlm.nih.gov/24345941/ which is explicitly
           | attempting to simulate an approximate folding pathway(s).
        
             | flobosg wrote:
             | There are other terms analogous to "fold", like "topology"
             | (as used in CATH), but they will probably never see
             | widespread use.
        
               | daveguy wrote:
               | The most accurate term would probably be "tertiary
               | structure". Although AlphaTertiaryStructure is a
               | mouthful. They could have named it AlphaTS.
        
               | flobosg wrote:
               | > The most accurate term would probably be "tertiary
               | structure".
               | 
               | There's an AlphaFold variant that can predict quaternary
               | structure:
               | https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2
               | 
               | > They could have named it AlphaTS.
               | 
               | TS looks more like "transition state" to me.
        
               | daveguy wrote:
               | Ah, good point. TS would be as bad or worse than "Fold"
               | in this context.
        
               | dekhn wrote:
               | DeepMindProteinStructurePredictor or
               | deep_mind_protein_structure_predictor if you don't like
               | camel case
        
               | gilleain wrote:
               | Even 'topology' is a little confusing to those more
               | familiar with the term from maths.
               | 
               | For the 'CATH' hierarchical classification, the
               | 'Topology' level is something like the organization of
               | secondary structure in an 'Architecture'. This has some
               | relationship to topology in the general sense, but is a
               | narrower definition.
               | 
               | For me, the 'fold' is what happens after 'folding'
               | occurs, but I take the point that it is confusing.
        
               | flobosg wrote:
               | If I recall correctly, the difference between
               | Architecture and Topology in CATH is that the former is
               | independent of connectivity.
               | 
               | > For me, the 'fold' is what happens after 'folding'
               | occurs, but I take the point that it is confusing.
               | 
               | Same here.
        
               | dekhn wrote:
               | Topology actually makes some sense here in that a very
               | small number of proteins do fold into knots! This was a
               | huge surprise and completely contradicted most
               | predictions.
               | https://en.wikipedia.org/wiki/Knotted_protein
        
               | dekhn wrote:
               | Yup. I've had this discussion repeatedly with the
               | developers of SCOP and the folks who run CASP and they
               | simply will not budge.
        
       ___________________________________________________________________
       (page generated 2022-04-13 23:01 UTC)