[HN Gopher] Alphafold ___________________________________________________________________ Alphafold Author : matejmecka Score : 311 points Date : 2021-07-15 18:22 UTC (4 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | swalsh wrote: | _edit_ I was wrong. Please ignore. | ali_m wrote: | > This is a completely new model that was entered in CASP14 and | published in Nature. | f38zf5vdt wrote: | From the repo: | | > This package provides an implementation of the inference | pipeline of AlphaFold v2.0 | culopatin wrote: | Does anyone know if this can be made to work with rna fold? | qeternity wrote: | Ok, so biochemists: which bit of the secret sauce are they | leaving out? | duckerude wrote: | > The AlphaFold parameters are made available for non-commercial | use only, under the terms of the Creative Commons Attribution- | NonCommercial 4.0 International (CC BY-NC 4.0) license. You can | find details at: https://creativecommons.org/licenses/by- | nc/4.0/legalcode | | Does CC BY-NC actually do this? As far as I can tell it only | really talks about sharing/reproducing, not using. | | Or is the only thing prohibiting other commercial use the words | "available for non-commercial use only"? | mikewarot wrote: | If you took their parameters, then trained it for while on a | different set of data, it would vary from the original. I | wonder how much compute would be required to make the offset | far enough to hold up from scrutiny, and in court. | | Alternatively, you could manually change the network model, add | a few hidden layers, etc... modifying the parameters in step, | and result in a new model and new parameters. Some training to | vary the parameters, and it's now a new work. | sillysaurusx wrote: | Artbreeder has some interesting prior art here: nVidia forbid | commercial use of StyleGAN, but artbreeder disregarded it and | happily sold all the breeding you wanted. No one seemed to | care. 
| | I suspect that the clause is there to prevent a startup | launching on the basis of "see this trained model? Yeah, that's | literally our business model" though, which is a mildly amusing | thought, wot wot. | | So basically, a few tens of thousands, sure. A few million, big | G might have a problem. | | Still, the smart move would be to launch the business anyway, | and gamble that you can work out a licensing deal. | jfengel wrote: | So... is it possible to clone this and turn it into a | Folding@Home client? How does it do? | kmckiern wrote: | Where there isn't an available crystal structure, Alphafold can | be used to create initial structures for simulation via | folding@home, replacing older homology modeling techniques. | | Source: former folding@home researcher. | dekhn wrote: | no, it wouldn't make sense to do that. Folding@Home is for ab | initio where you don't have any prior info for the structure, | this is for homology modelling. F@H probes the dynamics of | protein folding, this just makes a static prediction. | thesausageking wrote: | The PDF is linked in the article: | | https://www.nature.com/articles/s41586-021-03819-2_reference... | mensetmanusman wrote: | Distribution of this 2 TB file seems like a good use of | torrent... | dekhn wrote: | Fantastic, they released the dataset and code to train the model. | Science will be able to proceed. edit: not the code to train the | model, just the code to run inference. | | The underlying sequence datasets include PDB structures and | sequences, and how those map to large collections of sequences | with no known structure (no surprise). Each of those datasets | represents decades of thousands of scientists' work, along with | programmers and admins who kept the databases running for decades | with very little grant money (funding long-term databases is | something NIH hated to do until recently). 
| FredFS456 wrote: | There's a preview paper as well: | https://www.nature.com/articles/s41586-021-03819-2 | dekhn wrote: | Yes, I skimmed the paper already and it wasn't too | surprising. There are details that will take some time to | parse out to understand how important they are. | | Personally, I've found over decades that academic papers like | that are far less useful to me than a github project and | downloadable data that I can inspect, run and modify on my | own. Other folks I know could read that paper and write the | code in a day, I always wish I could do that. | cing wrote: | The process is described in Supplementary, but where do you see | the code to train the model? The repository is the inference | pipeline. | dekhn wrote: | I misread. The data dump is required for inference. | gopalv wrote: | > The total download size is around 428 GB and the total size | when unzipped is 2.2 TB. Please make sure you have a large | enough hard drive space, bandwidth and time to download. | | > This was tested on Google Cloud with a machine using the | nvidia-gpu-cloud-image with 12 vCPUs, 85 GB of RAM, a 100 GB | boot disk, the databases on an additional 3 TB disk, and an | A100 GPU. | | This is amazingly detailed for a researcher who wants to follow | in the track and also Apache licensed, which is one road-bump | out of the way for a commercial enterprise, like an actual drug | manufacturer who wants to burn some money trying this out. | | edit: said the last part too fast, the code has a "the | AlphaFold parameters are made available for non-commercial use | only under the terms of the CC BY-NC 4.0 license" | dekhn wrote: | Yes, all science should be communicated in the form of an | academic paper with a supporting git repo and quickly | downloadable dataset and a fast path to reproducing the work. | That would be a huge change from the establishment. 
| | It's quite unclear what value this will have to pharma; | personally I doubt this has any direct applications (and I'm | one of the few people in the world that can say that with | deep authority). | aantix wrote: | Who benefits from this work? | dekhn wrote: | Primarily the community that previously depended on | homology models. | gnufx wrote: | Surely not all science. Just as well Dirac wasn't required | to communicate that way the equation that fundamentally | underlies the phenomenon discussed, and you couldn't put | the unique facility my thesis work pioneered into git! I do | highly approve of publishing software and data where | possible, of course, since before Free Software needed to | be coined, and it's much easier now. | dekhn wrote: | If you're just publishing equations, you should have an | associated notebook which executes the equations. | | I don't know what you mean you can't put your thesis work | into git. Is it a physical thing? Too big for git? | astro-codes wrote: | Why wouldn't this have much value to pharma? Is it because | its application is actually really limited in scope? | dekhn wrote: | there are research groups this would be useful for but | structures are not on the critical path to drug discovery | or approval. | [deleted] | dekhn wrote: | I missed an important detail: """an academic team has developed | its own protein-prediction tool inspired by AlphaFold 2, which is | already gaining popularity with scientists. That system, called | RoseTTaFold, performs nearly as well as AlphaFold 2, and is | described in a paper in Science paper also published on 15 | July""" | | One of the things I say about CASP has to be updated. 
It used to | be "2 years after Baker wins CASP, the other advanced teams have | duplicated his methods and accuracy, and 4 years after, | everything Baker did is now open source and trivially | reproducible" | | now, it's baker catching up to DeepMind and it took about a year | | https://doi.org/10.1126/science.abj8754 | radus wrote: | Very cool! Great to see this competition between academia and | industry yielding improvements on all fronts. | Cas9 wrote: | Honest question: since AlphaFold doesn't really _solve_ the | protein folding problem (it's NP-complete after all), but only | _approximates_ solutions very well, what are the real impacts of | this? Isn't a good approximation of a protein enough to cause | unexpected problems? How do we know that an approximate structure | will perform the same as the correct solution? | radus wrote: | Yes, it is still useful. Even structures obtained through | traditional means (eg. x-ray crystallography) are | approximations to an extent since there are limits to the | resolution that you can obtain and oftentimes regions of | proteins are "disordered". Additionally, these structures are | only snapshots of a protein in a particular state, which may | not completely reflect the dynamics of the protein in its | native environment. | nmca wrote: | NP completeness tells you about the hardest cases, not the most | useful cases. | thxg wrote: | > (it's NP-complete after all) | | Protein folding is a physical/biological phenomenon. AFAIK we | don't currently have a proper exact mathematical formulation of | the problem that would let one determine its complexity. | | You may be referring to this paper [1]. It only claims that one | particular optimization problem, believed to give a solution to | protein folding problems, is NP-hard. So, even if a suitable | exact formulation exists, it is not yet proven that protein | folding is hard, although it for sure seems plausible. 
| | By the way, it is perfectly possible today to solve some very | large-scale NP-hard problems (think millions of variables and | constraints) in reasonable amounts of time (think minutes or | hours). Examples are knapsack problems, SAT problems [2], the | Traveling Salesman Problem [3] or more generally Mixed Integer | Programming [4]. | | [1] "Complexity of protein folding", 1993, by Aviezri S. | Fraenkel | | [2] http://www.satcompetition.org | | [3] http://www.math.uwaterloo.ca/tsp/ | | [4] http://plato.asu.edu/bench.html | hobofan wrote: | I would expect that once AlphaFold has helped you identify a | potential protein (e.g. as a drug) out of a bigger set of | potential proteins, there will still be a manual step of | traditional cryoEM, NMR, etc. to get an accurate high- | resolution structure. | t_serpico wrote: | To me, the interesting thing is not the specific results but | rather that you can accurately predict crystal structures from | sequence alone. This begets the question: what other physical | biological properties can we predict? | saithound wrote: | AlphaFold is not about solving any kind of NP-complete problem. | | Proteins consist of chains of amino acids which spontaneously | fold up to form a structure. Understanding how the amino acid | chain determines the protein structure is highly challenging, | and this is called the "protein folding problem". | | People use mathematical models to predict how proteins fold in | nature. Many such mathematical models are stated in terms such | as "proteins fold into a configuration that minimizes a certain | energy function". Even the simplest such models [1] give rise | to NP-hard decision problems, which are also known (somewhat | confusingly) as "protein folding problems". To make this a bit | less confusing, I will call the mathematical decision problems | PFPs. | | Like all mathematical models, our protein folding models don't | correspond exactly to reality. 
Even if you are somehow able to | determine the exact mathematical solution to a mathematical | PFP, that _still_ doesn't guarantee that the real protein that | you were trying to model behaves like the mathematical solution | would indicate. E.g. the protein may fold in such a way that it | gets stuck in a local optimum of the energy function you were | using. | | How do we detect this? We make inferences about how the protein | should behave, given the mathematical solution to the Protein | Folding Problem, and then we perform experiments, and find out | (empirically) that the protein behaves in a manner that is | inconsistent with the inferences drawn from the mathematical | model. Scientists _do_ do this. And they would have to do it | even if they had a fast, exact way to solve NP-complete | problems, because the NP-complete problems are still just part | of a mathematical model, and need not correspond to reality in | any way. | | The success of AlphaFold is not measured by how well it solves | (or approximates) mathematical PFPs. The success of AlphaFold | is measured by making successful predictions about how certain | proteins will fold. And this is exactly how it was tested [2]: | they threw it at a bunch of problems for which scientists have | empirically determined how certain amino acid chains fold, but | didn't release the results. And then they compared the | solutions predicted by AlphaFold, and found that most of the | predictions were consistent with what they knew to be the | case.* | | [1] https://en.wikipedia.org/wiki/Lattice_protein | | [2] https://predictioncenter.org/casp14/index.cgi | | * That's an understatement. The solutions were really very | good, much better than those produced by any other submission | to CASP14. | whimsicalism wrote: | You want to find a protein that has X structure (since | structure determines function to a degree). 
| | If AlphaFold is substantially more accurate at solving | proteins, it can mean that drug discovery is faster, assays are | faster, etc. etc. | | The "unexpected problems" would be caught in the assay stage. | radus wrote: | Kind of disagree with this.. solving protein structures is | not the rate-limiting step in drug discovery or in | biochemical assays -- not by a long shot. See this excellent | comment by @dekhn on a related submission: | https://news.ycombinator.com/item?id=27849046 | dekhn wrote: | The protein folding problem is not NP complete. The "formal" | protein folding problem, as posed (find the set of dihedral | angles whose resulting structure has the lowest energy) might | be, but that bears only a distant resemblance to how people | "solve" the problem today. At the very least, the statement is | incorrect because many proteins don't actually fold to their | energy minimum, they get stuck in kinetic traps, and the formal | PF definition never accommodated that idea. | bawolff wrote: | I don't know much about protein folding, but for most things in | life, exact solutions to NPC problems usually aren't needed for | non-contrived problems. In many cases, approximations are good | enough. | | Besides, this is real life - if predictions and real life | match, that's great. If they don't, well you know you went | wrong somewhere. | wpasc wrote: | A very-non-expert opinion, if an approach approximates it | pretty well and can be improved upon, then it could end up | being quite useful. Given that biology exists on a real, | tangible scale then perfection in the fold prediction isn't | necessary, instead just an approximation that is sufficiently | good to be functionally useful. | | ^ That sounds like word-salad BS but I think there's some truth | to it. I know protein folding has been postulated to be useful | in terms of understanding basic biology, understanding disease | pathology, and drug prediction. 
While a wide range of | approximations are functionally useless, perhaps the Alphafold | approach or some improved version of it surpasses the | functionally useful threshold. | | At least I hope so | ashtonbaker wrote: | Not really an answer to your question, but is the problem | really NP-complete, or just combinatorially difficult? For | example how is this condition of NP-completeness satisfied? | | > it is a problem for which the correctness of each solution | can be verified quickly [0] | | [0] https://en.wikipedia.org/wiki/NP-completeness | Cas9 wrote: | According to this answer[0] it seems it's actually NP-Hard, | my bad. Haven't seen the proof though, and I'm not an expert. | | [0] https://cs.stackexchange.com/questions/128493/is-protein- | fol... | mrfusion wrote: | Is it really np complete? If so we could map other np complete | problems onto it and let biology solve it for us. | nextos wrote: | Alphafold 2 is very very cool, but we need a little dose of | reality. It's still a bit away from really solving protein | folding as it was marketed. | | For example, multi-complex proteins are not well predicted yet | and these are really important in many biological processes and | drug design: | | https://occamstypewriter.org/scurry/2020/12/02/no-deepmind-h... | | A disturbing thing is that the architecture is much less novel | than I originally thought it would be, so this shows perhaps one | of the major difficulties was having the resources to try | different things on a massive set of multiple alignments. This is | something an industrial lab like DeepMind excels at. Whereas | universities tend to suck at anything that requires a directed | effort of more than a handful of people. | dekhn wrote: | many of these resources are available, it's mostly that | academic scientists don't have the time, money, or expertise to | manage large datasets. 
However, the community has maintained | high-quality MSA databases for decades and that's exactly the | work that DM drafted off. | gnufx wrote: | > academic scientists don't have the time, money, or | expertise to manage large datasets | | I may be cynical about general expertise, as a support | person, but large datasets have long been stock in trade of | areas I'm more or less familiar with, whether "large" is TBs | or PBs like CERN experiments. (When I were a lad, it was what | you could push past the tape interface in a few days -- data | big in cubic feet...) | dekhn wrote: | Tape is worthless except for archival purposes (and it's | not particularly good). It should not be the constraint on | the dataset (IE, any important dataset should already be in | live serving with replication). | | Very few players wrangle petabytes effectively. Many | players _have_ petabytes, but they're just piles of | disorganized data that couldn't be used for training ML. | Moving petabytes is still a huge pain and few folks have | proficiency in giving ML algorithms high performance access | to the data. | zamalek wrote: | I'm genuinely curious: could the output of Alphafold be fed | into a classical folding algorithm (as a starting point), or is | the output of Alphafold too far down the wrong path, in these | cases? | sbierwagen wrote: | >A disturbing thing is that the architecture is much less novel | than I originally thought it would be, so this shows perhaps | one of the major difficulties was having the resources to try | different things on a massive set of multiple alignments. | | A similar concern has sparked some worries about "AI overhang" | https://www.lesswrong.com/posts/75dnjiD8kv2khe9eQ/measuring-... | | Most of the compute in ML research seems to be going into | architecture search. Once the architecture is found, training | and net finetuning/transfer learning is comparatively cheap, | and then inference is cheaper still. 
This implies we could see | 10-100x gains in AI algorithms using today's hardware, or | sudden surprising appearance of AI dominance in an unexpected | field. (Object grasping in unstructured environments? Art | synthesis?) A task could go from totally impossible to trivial | in a year. In retrospect, the EfficientNet scaling graph should | have alarmed more people than it did: | https://learnopencv.com/wp-content/uploads/2019/06/Efficient... | | Waymo has been puttering along for years, not announcing much | of interest. This may have caused some complacency about self- | driving cars, which is a mistake. Algorithms only get better, | while humans stay the same. Once Waymo can replace some human | drivers some of the time, things will start changing very | quickly. | timr wrote: | > A disturbing thing is that the architecture is much less | novel than I originally thought it would be, so this shows | perhaps one of the major difficulties was having the resources | to try different things on a massive set of multiple | alignments. This is something an industrial lab like DeepMind | excels at. Whereas universities tend to suck at anything that | requires a directed effort of more than a handful of people. | | Yeah, the HN commentary on Alphafold has a high heat-to-light | ratio. I'm eager to read the paper _because_ the previous | description of the method sounded remarkably similar to methods | that have been around for ages, plus a few twists. | | The devil is going to be in the details on this one. | TaupeRanger wrote: | That's the case with basically everything DeepMind does. They | have a very good PR department which hypes up everything they | do while conveniently ignoring that basically nothing of any | practical consequence has come of their endeavors. But I do | think it's important that these companies exist now so we can | see what _not_ to try going forward. | timr wrote: | Well, the CASP14 results do speak for themselves. 
Protein | structure prediction is not necessarily of great meaning to | drug discovery or biology, but they pretty much blew | everyone else out of the water in a fair contest. For that | reason, they deserve praise. | | It's a little like making a robot that is very, very good | at something pointless (say, using a yo-yo). Who knows | where it might lead, but if they make the best damned yo-yo | bot in the world, they deserve whatever praise they get | from the yo-yo community. | MrsPeaches wrote: | > high heat-to-light ratio | | Sorry for the ignorance but what does this mean? | AlexCoventry wrote: | Emotion-to-understanding ratio | butMuhCulture wrote: | It's trying to say light is more valuable than heat, or | some such folksy thing. I cook steak in the dark so I don't | find it to be a very insightful metaphor. | Azrael3000 wrote: | Incandescent light bulbs are generally very inefficient in | producing light, compared to LED for example. They produce | a lot of heat and not much light for which they are made. | | So in this context I suppose that gp implies that these | threads don't provide much meaningful discussion but rather | lots of hand waving. | HPsquared wrote: | Light is also often used in metaphors relating to | knowledge, wisdom etc. | dekhn wrote: | "Fiat Lux" not "Fiat Calor" | timr wrote: | It's an idiom implying that there's a lot of chatter and | bold claims, but very little of it is factual or | informative. | dm319 wrote: | The key difference seems to be using the multiple alignments | and assumption about evolutionary conservation? Useful for | genes conserved, but less useful for de-novo proteins (like | COVID and cancer) I guess? | timr wrote: | Dunno yet. MSAs were always a key input to Rosetta | (previous best method). How they were used was very | different. | | Fundamentally, everything in this space (= non-physical | methods) is about inferring structure from things that are | closely related. 
And you can't solve the problem at all for | non-trivial proteins using physics, so here we are. | pjfin123 wrote: | I'm assuming you can't run this on any consumer computer? | pjfin123 wrote: | Nevermind | | > The simplest way to run AlphaFold is using the provided | Docker script. This was tested on Google Cloud with a machine | using the nvidia-gpu-cloud-image with 12 vCPUs, 85 GB of RAM, a | 100 GB boot disk, the databases on an additional 3 TB disk, and | an A100 GPU. | sambroner wrote: | That's... way closer to consumer than I expected | qeternity wrote: | For inference... | | Still accessible, but expensive to run at scale. And | training even worse. | lifthrasiir wrote: | Except for (DGX) A100. | erhk wrote: | 2.2TB data | dekhn wrote: | which is basically nothing. They could put it in a cloud | bucket and you could copy it to another bucket in minutes. | lasagnaphil wrote: | Nah, 4TB disk drives are not that expensive. | crazysim wrote: | Amazing. That's not a lot of libraries of congresses at all. | fossuser wrote: | Does anyone on HN work in bio or drug discovery? | | Could you give an overview of how people can leverage this (or | how you might?). | | From reading around about it, it sounds like there's often a need | to find a certain type of molecule to activate/inhibit another | based on shape and the ability to programmatically solve for this | makes the searching way easier. | | Is this too oversimplified/wrong? How will this be used in | practice. | | [Edit]: Thanks for the answers! | timr wrote: | > Could you give an overview of how people can leverage this | (or how you might?). | | Short answer: nobody knows. Traditionally, protein folding is a | solution in search of a problem, but that's largely because the | predictions were...unusably bad. 
This was always more of a | super-difficult validation problem for the force fields and | simulation methods, which could then be used for other problems | of greater value (such as rational protein design, or | simulation of the motion of proteins with known structures). | | These predictions are better, but still pretty far from the | level of precision that you'd want for any kind of rational | drug design, where the exact locations of protein side-chains | (for example) matter a lot. You'll note that AlphaFold returns | structures that are "relaxed" using one of the oldest | simulation systems for proteins: AMBER. So it's not exactly a | clean-room solution to the problem, and you can't assume that | the details (which matter to drug design) are going to be any | better than for the older methods. | | But that said, if you have a method that can _reliably_ give | you a blurry view of the overall shape of a protein, even that | could be useful for things like target discovery or inference | of biological networks. But this is still a lot closer to pure | research than "revolutionizing drug discovery", as is | frequently batted around on reddit, HN and the press. | dekhn wrote: | Also I would say that really they just made improvements to | protein structure prediction, not _protein folding_ which is | the dynamic process by which proteins reach their equilibrium | fold. | timr wrote: | Most definitely. | dumb1224 wrote: | I work in cancer research with a drug discovery focus in a lab | with some structure biologists. My understanding is that if we | identified proteins targets suitable for therapeutics then | understand its structure to identify secondary binding sites | could be crucial for drug discovery. Drugs can then be designed | to modulate its biological functions. | COGlory wrote: | You can't do intelligent drug design if you don't know what the | target protein looks like. 
We've gotten great at solving | protein structures with things like crystallography and | cryo-EM. Unfortunately, many interesting drug targets reside | in the membrane of a cell, which means you can't easily work | with them in a lab because they aren't soluble in anything but | a plasma membrane. For instance, this is an issue with the | 5HT2A protein, a G protein-coupled receptor that is implicated | in many serotonin-related pathways. | | Being able to predict what it would look like would be a huge | deal because then you can go about intelligently designing | drugs for it. | ponsko wrote: | You should check out Salipro (https://www.salipro.com/) for | membrane protein reconstitution. | dekhn wrote: | I've worked in bio and drug discovery for some 25 years. That | includes building classifiers using gradient descent in the 90s | (when algorithms, computers and data were all much worse). I | ported DOCK to Linux in ~96 or 97. Since then I built an | academic and then industrial career with some emphasis on using | computing to solve problems in drug discovery, but I don't play | that role any more. | | It doesn't look like the models produced by this would | immediately change the challenging problem of finding, approving, | and marketing successful pharmaceuticals (IE, it doesn't | eliminate any real bottleneck). | | There was a long-term dream of structure-based drug discovery | based on docking, but IMO, it has never really proved itself | (most of the examples of success are cherry picked from a much | larger pile of massive failures). | miltondts wrote: | > ... but I don't play that role any more. | | I was thinking of going into that field. Can you expand a bit | on why you left? | dekhn wrote: | Because programming computers is far more lucrative, and | I'm better at it. However, if I had an unlimited budget I | would return to biology. | | I spent 15 years trying to be a professor and failed | miserably. 
I was bad at it and didn't like what professors | have to do. | | I then moved to industry to be a random engineer and | thrived doing things entirely unrelated to drug discovery. | Eventually, I convinced my company to invest heavily in | life sciences. This was successful and I was on track to be | a powerful player (a "research engineer", just like the DM | folks who are building these things) in this space, when | the project got very popular and I was elbowed aside by | others who are more aggressive. So I went back to being a | programmer again, it's much less stressful, pays better, | and realistically, much of my time is just telling | scientists what I would do if I was in their place anyway. | | "Don't swim with the sharks if you don't like being bitten" | gnufx wrote: | > much of my time is just telling scientists what I would | do if I was in their place anyway. | | That sounds familiar. I guess they mostly don't listen, | whatever your record -- especially if it was in a | different field they could learn from -- but I hope it's | not always like that. | yudlejoza wrote: | Most comp-biologists who work directly with programmers | are some of the biggest jerks, and the least qualified | tech folks. | | They hide all of that under "I'm a scientist, you're | not". | fossuser wrote: | Maybe a culture clash? Academia is all about status and | prestige - more often scientific outcomes seem to be a | means to get the former (why journals don't publish | negative results, why studies fail to replicate, why | stuff isn't open access, why people worry about getting | scooped, etc.) | | Tech (at its best) hates credentialism (sometimes I think | to a point of over-correction). | | That said, 80% of the devs in the bay area seem to have | gone to Stanford or MIT, so... 
| nick238 wrote: | I haven't worked on the drug-side of things, but here's my bio | perspective: It's kind of out-of-vogue, but consider the "lock | and key" model of proteins and small molecules (drugs). For | drug design, what you want to do is get a key that fits just | one lock (to pull whatever lever) and not others (to avoid | side-effects). It's relatively easy to find a molecule that | fits a protein, because that protein is what you might spend | years researching and probing, but it's tricky to check if it | does anything against ~100,000 others in humans. If you could | do an _in silico_ computational survey to be like, oh, maybe | it'll target this accidentally, you could spot-check those _in | vitro_, and/or stick on some other atoms to your small- | molecule to make it not fit that off-target. | | Holy grail, IMO, though is being able to design _de novo_ | protein sequences (to make "biologics", aka engineered protein | drugs) that can a) target (bind/block/enhance) or do (chemical | reactions) what you want and only that, b) are easily | synthesizable by bacteria/yeast (cheap to make), and c) are | stable (easy to transport/store). | slownews45 wrote: | First seems reasonable. I've not heard of anything on the | latter coming even close credibly - though it is an obvious holy | grail. | zosima wrote: | It can be an aid in drug development, and can perhaps assist a | bit in tuning small molecule drugs for more stable binding. | | Though I think the major impacts will be two-fold: | | (1) The field of structural biology is going to see a change, | with much more data available. Some structures of difficult to | crystallize proteins will be solved, which may lead to much | greater biological understanding. We may enter a time, where | once you have a primary sequence, you also have a likely | 3d-structure, which will probably change the daily work of | quite a few biologists a bit. | | (2) Industrial protein design. 
A tool such as this can | potentially have great utility in optimizing proteins as | chemical catalysts for various processes in different | industries. This includes expanding the conditions under which | a protein is active and also making its conformation more | stable and so the protein more long-lived in solution. | dekhn wrote: | For those that are unaware, industrial protein design is a | multibillion-dollar industry. For example, decades ago | Genentech and Dow Corning formed a company that developed | proteases (proteins that cut other proteins) that worked at | much higher temperatures than the ones in nature. This was | then sold to P&G and other major laundry companies (laundry | detergent contains idle enzymes activated by the heat of the | laundry water, and they go clean up). "Protein gets out | protein" was the marketing jingle. | | That was a few billion dollars right there and almost all the | work was done by hand by lab scientists. | [deleted] | Cas9 wrote: | Honest question: since AlphaFold doesn't really _solve_ the | protein folding problem (it's NP-complete after all), but only | _approximates_ solutions very well, what are the real impacts of | this? Isn't a good approximation of a protein enough to cause | unexpected problems? How do we know that an approximate structure | will perform the same as the correct solution? | Ultimatt wrote: | There is a lot of bias in the chat here from a more chemistry | and pharma slant. If you ignore this, AlphaFold solves, in a very | meaningful way, the problem blocking a lot of scientific | investigation. | | For comparative and evolutionary analysis, structure is far more | conserved than sequence, especially in things like viruses or | anything with a high rate of reproduction like bacteria. Just | knowing the general fold or overall structure is enough to do | structural alignment and tell if two genes are related on that | basis, even if their genomic sequence is completely dissimilar.
| Large groups of researchers rely on sequence homology built | from sequences of known structure. | | But AlphaFold works well in new sequence space with far more | accuracy than is needed. If we had an AlphaFold prediction for | every known sequence, suddenly the evolutionary relationships | between all genes and even all species would be far clearer. | This on its own unlocks a new foundation to reason about | function and molecular interaction with a holistic systems | view without gaps in what we can know with some reasonable | assurance. | | For an analogy, think of the difference between having books in | different languages describing objects. You know what some of | the book in English might say, but you don't even know if the | book in Spanish is even talking about the same things. | AlphaFold is like an AI that transforms all the books into | picture books and now we can use image similarity or have one | person look at all pictures. | devindotcom wrote: | Also announced today was RoseTTAFold from UW's Baker Lab, which | claims nearly the same accuracy at much higher efficiency. | There's a public server and paper in Science. | | More info here and here: | | https://www.bakerlab.org/index.php/2021/07/15/accurate-prote... | | https://techcrunch.com/2021/07/15/researchers-match-deepmind... | [deleted] | stupidcar wrote: | The model parameters are only available for non-commercial use. | That's a shame, as I presume there might be a lot of medical | startups that would benefit from having this kind of protein-folding | tech available. | mikewarot wrote: | Unless I'm mistaken, you could train the model yourself, | starting with a random set of values. In time, your error rates | would be low enough to have a new set of parameters which you | could use however you like. | COGlory wrote: | I am a structural biologist. This is one of the handful of topics | that overlap with my field here.
I'm very excited to play with | this, although it might eventually put me out of a job. | AnimalMuppet wrote: | Here's where I think we need to be going: You go to a doctor's | office, sick. 1) They take a blood sample. 2) They find the | malignant bacteria and DNA sequence it. 3) If it's a known | strain, they know what antibiotics to use on it. 4) If not, | they solve protein folding on the genes. 5) From that, they see | which existing antibiotics would kill it. 6) If none will, then | given the proteins, they have to derive a new antibiotic. | | 1) is easy. 2) might not be - there can be a lot of things in a | blood sample, and finding only the interesting (bad) things | might not be simple. The sequencing part is pretty much solved. | 3) would take a bit of work, but I think it's possible now. 4) | we're getting there. 5) might have a fair amount in common with | 3), but it probably takes some additional work. 6) is... | probably non-trivial. | | That's just one research agenda. There are others. You may have | to move to related work, but I doubt you're going to be out of | a job in this lifetime. | rllearneratwork wrote: | Why would it put you out of a job? Wouldn't it just become one of | the tools you use? | dekhn wrote: | It would both become a tool he or she used (to produce initial | structures to fit in density maps) and a tool that used his | or her output (because AlphaFold requires known protein | structures that are homologous to the one you're predicting). | nikhilsimha wrote: | The implicit assumption you are making is that the demand | increases in lockstep with productivity gains. 100x faster | drug discovery, 100x more drugs _need_ to be discovered => | same number of people employed. | | These correlations do hold for technical fields, but | logically there should be a point beyond which productivity | gains outpace demand growth, or demand could even stop | growing.
One should either retool to solve a newer problem | before this point is reached, or hope that the point is not | reached in the span of their career. | | Take oil rig builders, for example: manufacturing has been | increasingly automated, but the demand for oil rig building | has grown consistently. But they should probably look into | solving other problems given that demand is shifting. | mensetmanusman wrote: | However, complexity for the structures is essentially | unbounded on the time scale of the universe. | sbierwagen wrote: | >but logically there should be a point beyond which | productivity gains outpace | | The limiting factor on drug approval is clinical trials. | Once every living person is enrolled in a clinical trial, | we will have hit the maximum rate at which humanity can | produce new drugs. | | That might be more than 10x the current rate, but probably | less than 1000x. | dekhn wrote: | In principle you could put people into multiple trials | and gain some additional throughput. Google | implemented putting users into multiple different | experiments (paper by Tang et al.) and that made a huge | difference. ___________________________________________________________________ (page generated 2021-07-15 23:00 UTC)