[HN Gopher] AlphaFold: a solution to a 50-year-old grand challen... ___________________________________________________________________ AlphaFold: a solution to a 50-year-old grand challenge in biology Author : momeara Score : 369 points Date : 2020-11-30 13:31 UTC (9 hours ago) (HTM) web link (deepmind.com) (TXT) w3m dump (deepmind.com) | xbmcuser wrote: | This is a lot bigger than people are assuming. If protein folding | can be done quickly and cheaply, it will trickle down to a lot | more than medicine. It is going to advance biofuels, food | production and a lot more. | shawnz wrote: | Imagine protein computers or protein metamaterials | leafmeal wrote: | I'm using mine right now to imagine one. | flobosg wrote: | De novo design of protein logic gates: | https://science.sciencemag.org/content/368/6486/78 | sabujp wrote: | so are all these protein folding labs and projects, e.g. Folding@home, | etc., essentially dead projects now? | haolez wrote: | Does this make Folding@Home obsolete? | elevenoh wrote: | So the median accuracy went from ~58% (2018) to 84% (2020) in 2 | years? | | Does 84% == solved? | | Also, any low-hanging-fruit implications for longevity tech? | dekhn wrote: | 100% accuracy is "solved". | randcraw wrote: | Solving the inverse problem would be even more valuable -- | given a specific shape (and other biochemical desiderata), | what sequence of amino acids would create that protein? | | As hard as the protein folding problem is, the inverse | problem is harder still. THAT is the one true grail. | dekhn wrote: | We "solved" this at Google years ago using Exacycle. We ran | Rosetta (the premier protein design tool) at scale. The | visiting scientist (who later joined Google and created | DeepDream) said it worked really well: "I could just watch a | folder and good designs would show up as PDB files in a | directory". | gfodor wrote: | You can't get 100% accuracy on something for which you don't | or can't know the ground truth.
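[Editor's note: the accuracy figures debated in this subthread are GDT_TS scores from CASP, not simple percentages. Roughly, GDT_TS averages, over several distance cutoffs, the fraction of residues whose predicted C-alpha atom lands within the cutoff of its experimental position. A minimal sketch, assuming the predicted and experimental coordinates are already optimally superimposed (real GDT also searches over superpositions):]

```python
import math

def gdt_ts(predicted, experimental, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Average, over the distance cutoffs (in Angstroms), of the fraction
    of residues whose predicted C-alpha lies within the cutoff of its
    experimental position. Returns a score in [0, 100]."""
    assert len(predicted) == len(experimental)
    dists = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [sum(d <= t for d in dists) / len(dists) for t in thresholds]
    return 100.0 * sum(fractions) / len(thresholds)
```

[On this scale a perfect prediction scores 100, and CASP's organizers treat ~90 as competitive with experimental error, which is the sense in which "solved" is used below.]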
| dekhn wrote: | The protein folding problem is predicated on the idea that | there is a ground truth (a single static set of atomic | coordinates with positional variances). If your point is | that even experimental methods can't truly reach 100% (due | either to underlying motion in the protein, or can't | determine the structure), that's more or less what Moult is | saying (they more or less arbitrarily define ~1A resolution | and GDT of 90 as the "threshold at which the problem is | solved"). | liuliu wrote: | The article implies that the "ground-truth" (experimentally | determined) structure has an accuracy interval as well. Above 90% | is the same accuracy as what you get from experimentally | determined results, hence the "solved" claim. | heycosmo wrote: | Fascinating! AlphaFold (and other competitors) seem to use MSA | (Multiple Sequence Alignment) and this (brilliant) idea of | co-evolving residues to build an initial graph of sections of | protein chain that are likely proximal. This seems like a useful | trick for predicting existing biological structures (i.e. ones | that evolved) from genomic data. I wonder (as very much a | non-biologist), do MSA-based approaches also help understand | "first-principles" folding physics any better? and to what degree? If I | write a random genetic sequence (think drug discovery) that has | many aligned sequences, without the strong assumption of | co-evolution at my disposal, there does not seem any good reason for | the aligned sequences to also be proximal. Please pardon my | admittedly deep knowledge gaps. | flobosg wrote: | > do MSA-based approaches also help understand | "first-principles" folding physics any better? | | Not really. MSA-based approaches, like most structure prediction | methods, have as a goal to find the lowest energy conformation | of the protein chain, disregarding folding kinetics and | basically all dynamic aspects of protein structure.
| | > If I write a random genetic sequence (think drug discovery) | that has many aligned sequences, without the strong assumption | of co-evolution at my disposal, there does not seem any good | reason for the aligned sequences to also be proximal. | | I don't think I fully understood this, but I'll give it a shot | anyway. If your artificial sequence aligns with others, there's | a chance that it will fold like them, depending on the quality | and accuracy of the multiple sequence alignment. Since multiple | sequence alignments are built under the assumption of homology | (all sequences have a common ancestor), it's a matter of how | far from the "sequence sampling space" your sequence is located | compared to the others. | heycosmo wrote: | > I don't think I fully understood this, but I'll give it a | shot anyway. If your artificial sequence aligns with others, | there's a chance that it will fold like them, depending on | the quality and accuracy of the multiple sequence alignment. | Since multiple sequence alignments are built under the | assumption of homology (all sequences have a common | ancestor), it's a matter of how far from the "sequence | sampling space" your sequence is located compared to the | others. | | I understand that similar sequences may fold similarly | (although as length increases, I highly doubt it, but IDK). | I'm talking about aligned sub-sequences within one chain and | their ultimate distance from each other in the final | structure. Co-evolution suggests that aligned sub-sequences | are also proximal. But manufactured chains did not evolve, | therefore the assumption is no longer useful. | flobosg wrote: | Oh, I see! Yes, an intrachain alignment of an artificial | sequence does not by itself give any information about | co-evolution, especially since you don't know whether your | protein is actually folding. To assess co-evolution you | need a multiple sequence alignment between protein homologs | containing correlated mutations.
| | > I understand that similar sequences may fold similarly | (although as length increases, I highly doubt it, but IDK). | | As long as the sequence similarity is kept between those | sequences, length is not an issue. | | > Co-evolution suggests that aligned sub-sequences are also | proximal | | What do you mean by "proximal"? Close in space, or similar | in structure? | ashtonbaker wrote: | This is a really insightful question and I need to take some | time to fully understand the ensuing discussion. | | If my speculation is correct, then drug discovery should use a | process of genetic programming, using something like this to | score the resulting amino acid sequences. I'm wondering if an | artificial process of evolution would be sufficient to satisfy | the co-evolution assumption here. | flobosg wrote: | > I'm wondering if an artificial process of evolution would | be sufficient to satisfy the co-evolution assumption here. | | In principle yes, if you can generate a significant number of | artificially evolved variants that are folded/functional. | ampdepolymerase wrote: | @dang, please combine the thread with | https://news.ycombinator.com/item?id=25253488 | lawrenceyan wrote: | Earlier post on this with direct results: | https://news.ycombinator.com/item?id=25253488 | chetan_v wrote: | First Nobel prize for AI from this? | aardvarkr wrote: | We'll know ten years from now | xgulfie wrote: | Hopefully they give one out for this, if only so I can say I'm | a Nobel Prize contributor | TRcontrarian wrote: | No way, you were on the team? Congrats. | hoppla wrote: | I am puzzled about "AI-knowledge". Have we really learnt | anything? Is distilling the knowledge from AlphaFold just as | hard a problem as solving protein folding? | fairity wrote: | If you forgot how to do long division, but still had a | calculator, wouldn't the calculator still be useful? | EGreg wrote: | What happens when AI is better at everything measurable than | humans?
| | Better at conversation. Better at making people laugh, and | generating attraction or other emotions, better at motivating them, | and organizing movements, etc. | | Clearly we are not ready for such an efficient system... it would | be a big disruption to all human organizations and relations. It | would start with Twitter botnets and directing sentiment. | WanderPanda wrote: | We indeed stand on the shoulders of a small number of giants! I'm | infinitely thankful for the work DeepMind is doing. Let's maybe | celebrate this accomplishment for one day and start being worried | about big tech again tomorrow. Many of the comments here usually | suggest that we should live in worry and fear, but to my | knowledge there is not too much historical evidence for these | kinds of companies turning evil. | harperlee wrote: | Not knowing a lot about biotechnology, I read the article and it | sounds great, but how big is this as a gamechanger? Can someone | comment on how big the implications of this are, let's say, 5 | years from now, on day to day life? Does this mean that biotech | is going to explode? Or just that drugs will come to market | faster, perhaps cheaper for rare diseases, but from the same | industry structure as always? | xyzzyz wrote: | My friend, who is working in a crystallization lab, has told me | that she's gonna be claiming unemployment soon, and she was | only half joking. | dalke wrote: | She can still work on complexes, binding modes, and | engineered biomolecules (eg, protein-drug conjugates and | antisense oligonucleotide dimers) where the training data | isn't really there. | _RPL5_ wrote: | The industry process will not change. You still need industrial | biologists to generate and validate AlphaFold structures, | interpret the results as part of the bigger picture, and to | finally design the drugs. And, then, of course you still need | to validate the drugs in experimental systems (first the test | tube, then mice, then humans).
| | So your second guess is correct - one of the steps is much | cheaper now, which marginally improves the entire pipeline. As | a result, drugs should now arrive on the market faster. | | As a side note, I am curious what happens to the field of | structural biology in 10 to 15 years from now. Every research | university has a large structural biology department with super | expensive X-ray/NMR/Cryo-EM machines, and armies of students who | routinely spend 4-6 years of their PhD trying to solve a | structure of a single protein. If AlphaFold works as | advertised, NIH will gradually shift funding to other problems. | | (It was predicted that it'd be taxi drivers, not professors, | that AI would get first. Ironic.) | dalke wrote: | > "armies of students who routinely spend 4-6 years of their | PhD trying to solve a structure of a single protein" | | Back in the 1990s, when I worked on structure data, I | remember that at least some crystallizations were easy enough | they could be done as a rotation project. | | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6287266/ | suggests that life is now a lot easier than the 1990s. | Quoting the abstract: | | > Macromolecular crystallography evolved enormously from the | pioneering days, when structures were solved by "wizards" | performing all complicated procedures almost by hand. In the | current situation crystal structures of large systems can be | often solved very effectively by various powerful automatic | programs in days or hours, or even minutes. Such progress is | to a large extent coupled to the advances in many other | fields, such as genetic engineering, computer technology, | availability of synchrotron beam lines and many other | techniques, creating the highly interdisciplinary science of | macromolecular crystallography.
Due to this unprecedented | success crystallography is often treated as one of the | analytical methods and practiced by researchers interested in | structures of macromolecules, but not highly competent in the | procedures involved in the process of structure | determination. | | Certainly some proteins are extremely hard to crystallize, | and the new single-atom EM work will help a lot. But are | there really "armies of students who routinely spend 4-6 | years of their PhD trying to solve a structure of a single | protein" these days? | | I honestly don't know. I'm sure some do. But if so, that army | is pretty small compared to the vast numbers who more | routinely use crystallography. | t_serpico wrote: | Also, one important thing to realize is that AlphaFold was | trained largely on proteins that we were able to | crystallize. I'd be very curious to see how its performance | fares as a function of 'ease of crystallization'. | _RPL5_ wrote: | You aren't wrong. I got caught up making the comparison | between structural biologists and taxi drivers being run | out of business by AI, so I ended up exaggerating the work | load that's addressed by AlphaFold. I should have been more | precise. | dekhn wrote: | It seems unlikely there will be any large changes in life from | solving protein folding. Knowing the structure of a protein (or | really, its dynamics) is useful for identifying drugs that | bind, but the real bottlenecks in drug discovery and biotech are | elsewhere. | ramraj07 wrote: | If folding and docking, along with dynamics simulations, start | getting commodified, that might change things significantly | though. I can already start imagining project workflows that | are significantly streamlined without much thought; God knows | what other scientists would dream up when we reach those | steps. | candiodari wrote: | This will allow us to discover much more about the structure of | the cell (of "life") at an unprecedented speed.
We | should find many, many more mechanisms and targets for | medicine, but it takes 10-20 years to bring a new medicine to | market. | | So in 5 years you'll see exactly zero new medicines pop up. | pmastela wrote: | I agree. The main limit on how quickly products of this | advancement are deployed will likely be local policies. | Though, given just how profound some of the | impacts on medicine might be, the speed at which they can be | deployed might become a matter of national security (a | healthier population bodes well for a healthier economy which | in turn strengthens national security). Hopefully this | competition shortens the time-to-market for all these new | medicines. | piyh wrote: | No new medicines, but way more biotech tools. Higher yield | GMO plants, foundational research into disease, science | backed recommendations for lifestyle changes to avoid disease | that previously eluded us, some crazy stuff happening in | animal models. The progress in biotech the past 20 years | makes Moore's law look slow. | nabla9 wrote: | Getting DNA sequences from tissue samples is relatively | straightforward. DNA -> RNA -> unfolded protein is basically a | one-to-one mapping in most cases. How a protein functions depends | on how it folds into itself. Once you solve protein folding, | you can take a DNA sample and see the structure of the molecule | without lab work using crystallography techniques. | | Solving protein folding is huge, a Nobel-in-chemistry-scale | achievement. It would be a massive leap for biochemistry. | | It seems that DeepMind solved the competition benchmark and made a | huge leap, but it's just a partial solution that works on a limited | set. | | After you have solved protein folding, there is still the problem | of solving chemical interactions between molecules accurately. | Quantum chemistry is extremely compute intensive.
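[Editor's note: the DNA -> protein-sequence step nabla9 describes really is a deterministic table lookup; the hard part AlphaFold addresses is sequence -> 3D structure. A minimal sketch using a small fragment of the standard genetic code (the codon table here is deliberately partial, for illustration only):]

```python
# Partial standard codon table (illustration only; the full table has 64 entries).
CODON_TABLE = {
    "ATG": "M", "TGG": "W", "TTT": "F", "TTC": "F",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}

def translate(dna):
    """Translate a coding DNA strand into a one-letter amino-acid string,
    reading codon by codon and stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

[This mapping is one-to-one in the forward direction, which is why the sequence is cheap to obtain while the folded structure, until now, was not.]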
| ivalm wrote: | This is still for proteins that fold without chaperones, but I | guess it does cover a lot. | comicjk wrote: | The most accurate technique in computational drug discovery is | protein-ligand binding prediction (https://blogs.sciencemag.org | /pipeline/archives/2015/02/23/is...). Given the protein | structure, you can predict which molecules will bind with it, | even for molecules which have never been synthesized. Many | protein targets have not been amenable to this because we don't | know what the potential binding pockets look like. That set of | proteins will now drastically shrink. We're going to have a lot | of new drug candidates and, with any luck, new drugs come out | of this. | shoguning wrote: | IMO, this is huge. One of the biggest applications of ML to | science that I know of for sure. People used to manually | crystallize proteins at great effort to solve for structures. | | Of course, there is a caveat. The static, crystallized | structure is only one aspect of a protein. The dynamic behavior | dissolved in H2O, at different pH, different ionic strength, | with different ligands/cofactors are all also important, and | not (afaik) directly addressed by this research. | fabian2k wrote: | Protein folding is a big and important problem, so this is | certainly big news if it works as well as it seems. But I | wouldn't assume that this changes everything; we can already | determine how proteins fold by experimental work. The | disadvantage is that this is a lot of work, though the methods | there also improved a lot. | | One question is how robust the predictions are that DeepMind | produces. I would also assume that right now it can't e.g. | determine protein structures in the presence of other small | molecules, or protein complexes. A lot of the interesting stuff | lies in the interactions between molecules.
| | And in general in life sciences any new development will take | at least a decade until it hits day to day life, likely even | more. We're living with an exception to this rule right now due | to the pandemic, but in general things take quite a bit of time | in that space. | gulperxcx wrote: | But how would this affect day to day life, though? Not how | long you think it will take. | derefr wrote: | We can already determine how _a few_ proteins (170k -- which | sounds like a lot, but which is only 0.09% of all | currently-catalogued protein sequences) fold by experimental work. | | What an accurate model of protein folding allows us to do, is | to take our big database of DNA, predict protein foldings for | _all_ of it, and then stand up a _search index_ for this | database, keying each amino-acid "row" by the "words" of its | predicted protein's structural features. | | We could then, with a simple search query that executes in | O(log n) time, find DNA targets that produce molecules with | interesting structures that might be worthy of study. | | This would, for example, be a game-changer in how | biopharmaceutical macromolecule-therapy R&D is conducted. | Right now we have to notice that some bacterium or another | produces some interesting protein, _and then_ engineer a | bioreactor to get more of that protein. With this tech, we | can work backward from an _entirely hypothetical, | under-specified_ "interesting protein", to figure out what | catalogued-but-unstudied DNA sequences produce | never-before-catalogued proteins that fit that particular functional | "shape", and therefore might do the interesting thing. Then | we can either directly synthesize that same DNA, or find the | organism we originally sampled it from and study it more. | btilly wrote: | _We can already determine how a few proteins fold by | experimental work._ | | Where "a few" is around 0.1% of the known 180 million | proteins. So a relative few and a whole lot.
| | But the catch is which proteins could we figure out by | experiment, and which not. In particular membrane proteins | are hard to experimentally determine. But knowing how they | fold is very important for figuring out how to get things | to react with or get through membranes such as cell walls. | Which is an important problem for everything from | understanding how viruses work to targeted delivery of | drugs. We now have a way to find those structures. | fabian2k wrote: | "A few" does appear quite dismissive of the enormous | amounts of effort in structural biology so far. There are | more than 170,000 structures in the PDB right now. | | To determine potential targets for drugs we have to | understand what the proteins do. Having the structure is | not really enough for that; it doesn't tell you the purpose | of the protein (though it certainly can give you some | hints). | | In most cases the proteins were determined to be | interesting by other experiments, and then people decided | to try and solve their structure. So the structures we | already solved are also biased towards the more | biologically relevant proteins. | entropicdrifter wrote: | 170,000 is three orders of magnitude less than the number | of recorded protein sequences. I don't think it's | dismissive to describe that as comparatively few. | flobosg wrote: | Structure is much, much more conserved than sequence. In | other words, protein sequences with low sequence identity | can fold similarly due to the physical constraints that | guide protein folding. | ClumsyPilot wrote: | I don't know the field, and I understood 'a few' as like | a dozen, certainly not in the thousands. | | Anyone uninitiated will think the same; and those already | informed, well, they are already informed. | ALittleLight wrote: | I also don't know the field and the opposite concern is | that 170,000 sounds like a lot, but, apparently, it's a | relatively small amount compared to the number of | proteins there are.
It makes sense to me to refer to it | as a small number - e.g. "That hard drive is tiny." "No, | it stores several million bytes..." | derefr wrote: | 170k is "a few" compared to 180 million (i.e. the size of | the PDB as soon as someone runs AlphaFold over everything | in UniProt.) | | > In most cases the proteins were determined to be | interesting by other experiments, and then people decided | to try and solve their structure. | | Yes, that's what we're doing _right now_, because | structure is not a useful predictor, _because_ we don't | have structure available in advance of studies on the | protein itself. There was no point to a "functional | taxonomy" of proteins, because we were never trying to | predict with protein structure as the only data | available. | | In a world where protein structure is "on tap" in a data | warehouse, part of the game of bioinformatics _will_ | become "structural analysis" of classes of known-function | proteins, to find functional sub-units that do | similar things among all studied proteins, allowing | searches to be conducted for other proteins that express | similar functional sub-units. | Rochus wrote: | It's a step forward for sure, but structures change over | time to perform their function. The method described here | only returns a static structure. Much more research and | development is needed to be able to predict the dynamic | behavior and interplay with other proteins or RNA. | AlexCoventry wrote: | > as soon as someone runs AlphaFold over everything in | UniProt | | It'll take a while before those results can be trusted, | though, right? There's probably a selection bias in the | training data for proteins which are easy to crystallize, | so many proteins probably aren't well represented by the | training examples. | fabian2k wrote: | Determining what a protein structure does might be even | harder than folding.
Right now we can't really do that ab | initio; you have to determine the activity in the lab and | then look at the structure. And that allows you to | potentially identify this motif in other proteins. | | If someone produces an AI that you give a sequence and it | tells you what the protein does exactly, I'd be extremely | impressed. I don't see that happening soon. | | The specifics matter a lot here. We can often determine | rough functions for subdomains by homology alone. But | that really doesn't tell you the full story, it only | gives you some hints on what that protein actually does. | jeffxtreme wrote: | Five years ago, I would have said the following: | | "If someone produces an AI that you give a sequence and | it tells you the protein conformation, I'd be extremely | impressed". | | Sure there are many more things to solve in this space; | but that doesn't take away that this is an impressive | achievement and does unlock quite a few things (including | making more tractable the problem you just brought up). | I'm excited to see what DeepMind works on now and what | the new state of the world will be just five years from | now. | fabian2k wrote: | I think I have to clarify that my response was to a large | part to the "this will change all our lives" part, and | might look too negative on its own. I'm very, very | impressed by these results, but that still doesn't mean | that we just solved biology. If this works that well on | folding, this could mean that a lot of other stuff that | simply didn't work well in silico might come into reach. | | I'm maybe overcompensating for the tech-centric | population here, with some comments speculating about very | near and drastic impacts from discoveries like this. | Biology and life sciences are much slower, and there's | always more complexity below every breakthrough. That | does tend to push me towards commenting with the more | skeptical and sober view here.
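[Editor's note: derefr's "search index over predicted structures" idea earlier in the thread can be made concrete. A toy sketch of an inverted index keyed by structural-feature "words"; the feature names are invented placeholders, and a hash-based index is used here for simplicity rather than the sorted O(log n) index derefr mentions:]

```python
from collections import defaultdict

def build_index(proteins):
    """proteins: mapping of protein id -> set of structural-feature words.
    Returns an inverted index: feature word -> set of protein ids."""
    index = defaultdict(set)
    for pid, features in proteins.items():
        for f in features:
            index[f].add(pid)
    return index

def search(index, query_features):
    """Return ids of proteins containing every queried feature word,
    by intersecting the posting sets instead of scanning the database."""
    hits = [index.get(f, set()) for f in query_features]
    return set.intersection(*hits) if hits else set()
```

[The point is only the shape of the workflow: once every catalogued sequence has a predicted structure, queries become set intersections over precomputed postings rather than per-protein experiments.]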
| whatshisface wrote: | My understanding of this is not perfect, but wouldn't | answering the "actually does" question require a full | biomolecular model of the cell, or even the whole | organism? If so I see what you mean. I suppose that it | might be possible to get around this by improving the | theory of catalysts so that you could look at a site and | say, "oh, this will act in such a way..." Dynamic quantum | simulation of a few atoms at the active site is hardly | easy but a far sight easier than the other. | ghostpepper wrote: | This does indeed sound like a game changer then, if true | IgniteTheSun wrote: | Considering that this system "uses approximately 128 TPUv3 | cores (roughly equivalent to ~100-200 GPUs) run over a few | weeks" to determine a single protein structure, making | predictions for all proteins encoded in a human genome | seems impractical at this stage. With luck, this advance | will help lead to discovery and definition of new folding | rules and optimizations that will make protein folding | predictions for the whole human genome more tractable. | mrDmrTmrJ wrote: | I think it is possible to make predictions for all | proteins encoded in the human genome. Perhaps you misread | a very long and confusing sentence? | | Background: neural networks have two modes, 1) training - | where you learn all the model weights, and 2) inference - | where you run the model once on new data. Training takes | a long time, because you're computing derivatives | to implement update rules on millions or billions of | parameters based on iteratively examining massive | datasets. Inference is extremely fast because you're just | running matrix multiplies of those parameters on new | data. And TPUs/GPUs are specially designed to compute | matrix multiplies. | | The article said: "We trained this system [...] over a | few weeks." I searched for, but did not see them identify | the inference time.
I do expect inference time to be well | under one second, though I'm not personally experienced | with running inference on this type of network | architecture. | | For comparison, GPT-3 and AlphaStar have month-long | training times and real-time (sub-second) inference | times. | Rochus wrote: | Still much faster than synthesizing the protein and then | doing NMR or crystallography to solve the structure | puzzle, which easily takes half a year or more (and very | expensive equipment). | sanxiyn wrote: | That's training time, not inference time. | [deleted] | foota wrote: | My reading based on context was that this was time to | train, not time to predict. | FredFS456 wrote: | There are post-translational modifications to proteins. | This means that for many (most?) proteins, the amino acid | chain sequence is different from what you would predict | from the DNA. These modifications are dependent on the | state of the cell at the time of translation, and so cannot | be predicted from the DNA alone. This means that even with | a 100% accurate folding model, we cannot simply know the | shapes of all the proteins inside the human body based on | the genome. | carlob wrote: | Here is another interesting approach in synthetic protein | building: | | https://science.sciencemag.org/content/369/6502/440.abstract | baybal2 wrote: | One young lady I knew worked on neural-network recognition of | X-ray images. | | They always had single-digit failure rates with bizarre artifacts, where | the program sometimes couldn't recognise the very data it was trained | on, given the most minute differences. | | The other artifact was that the most "stereotypical cases" were | least reliably recognised, and they got a lot of flak for | screwed up live demos, where a radiologist put a very, very | obvious tumor shot onto the scanner, and it didn't work without | half an hour of wiggling the film, and a camera. | | The "brute force" solutions may well always top out at 80-85%, but | they are off consistently, and always.
NN algos so far beat them, but | fail with double-digit frequencies on "artifacts" which they | themselves can't do anything about. | | How well it deals with the latter is what I believe will | measure its real-world usefulness. | tuatoru wrote: | I agree. The failures have to be explicable if we are to | trust a model. | asah wrote: | Doesn't it depend on the application? i.e. some | applications can tolerate false positives/negatives? | baybal2 wrote: | May well be, but if you spend more compute and human | time checking for those corner cases than if you went | with another, more consistent exhaustive search algorithm, | then the method loses to it economically. | | This is more the case the closer to brute force you come, | like encryption cracking. Imagine spending years | of HPC cluster time trying to break a password, while | knowing you have a single-digit chance to miss the right | key, in a way which would be completely impossible | with a conventional solution. | klmadfejno wrote: | I find this disingenuous. Yes, it's important that the algos | can perform well on real world data, but the framing of this | post begins with an anecdote about one person who had a bad | model, and implicitly extrapolates that these problems are | generalized throughout all neural nets. | | One could say the same thing about programmers automating a | task, or a number of other trivial examples. I would lean | towards assuming DeepMind has competent model validation | teams vs. not, even if data science is hard. | npunt wrote: | In short, a core problem of biochem (the wagon) was just | hitched to Moore's law (the horse). Our understanding of | proteins will now grow exponentially not linearly, helping us | to move up a level of abstraction to higher level biochemistry | and biology problems.
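[Editor's note: the training-vs-inference distinction raised a few comments up can be illustrated with a toy model; this has nothing to do with AlphaFold's actual architecture, it only shows why training is the expensive, repeated phase while inference on a new input is a single cheap evaluation. Here gradient descent fits y = 2x:]

```python
def train(xs, ys, epochs=500, lr=0.01):
    """Expensive phase: loop over the whole dataset many times,
    nudging the weight by the gradient of the squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)^2 w.r.t. w
    return w

def predict(w, x):
    """Cheap phase: one multiply per new input."""
    return w * x

w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # converges to w ~ 2.0
```

[Scaled up, the same asymmetry is why a model trained for weeks on 128 TPU cores can still produce a prediction for a new sequence in well under the training budget.]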
| breck wrote: | I never worked directly with protein folding or structure, but | worked a bit in proteomics on teams measuring gene expression | (which you could roughly think of as how much of each protein | is found in this cell). IIRC there are 50,000 - potentially | millions of "kinds" of proteins found in a human, and the | "shape" of most of them is unknown, and that determines a lot | about how they work. | | So imagine you gave an iPhone to someone in the 1800s; they | wouldn't understand how most of it works, but this may be | analogous to them finally figuring out some key aspects of the | transistor. So it's another tool in the toolbelt and like all | good tools will be used in all sorts of unpredictable ways. | | Someone else I'm sure could do a lot better at explaining how | important shape is to understanding the function and behavior | of proteins. | fogleman wrote: | How will this get into the hands of those who could use it? | sanxiyn wrote: | Realistically speaking, if you are a scientist who could use | this and you mailed DeepMind, they would probably run it for | free and send you the result. It would be good PR. | jeffbee wrote: | Pretty interesting that they only used about $15k worth of | resources (retail price) to achieve this. It's not a technique | that would have been out of reach for other organizations based | only on not being able to afford the compute. | moritonal wrote: | The tech might not be out of reach but the talent pool is. | | Whether it's good PR or not is to be debated, but it seems that | the talent at DeepMind simply can accomplish things others | can't. | allenz wrote: | That's only for the final model. To find it, they'd need to run | 1,000 experiments, trying many high-level approaches, many | architectures for each component, hyperparameter search, and | multiple seeds. Large machine learning projects need $10M in | capital. | jeffbee wrote: | I bet it's still a lot less than they spent training | AlphaStar.
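[Editor's note: the competing cost estimates in this subthread can be reproduced from the prices the commenters themselves quote -- roughly $32/hr on-demand per 32-core TPUv3 slice, or $1,752/month per 8-core v3-8 slice -- taking "a few weeks" as three weeks; actual internal Google costs would differ:]

```python
# 128 TPUv3 cores = 4 x 32-core slices = 16 x v3-8 (8-core) slices.
hours = 3 * 7 * 24  # "a few weeks" ~ 3 weeks = 504 hours

# On-demand hourly estimate ($32/hr per 32-core slice, 4 slices):
on_demand_cost = 32 * 4 * hours        # 64,512 -> the ~$65k figure

# Monthly-rate estimate ($1,752/month per v3-8, one month, 8 slices):
monthly_cost = 1752 * 8                # 14,016 -> the ~$15k figure

print(f"on-demand: ${on_demand_cost:,}, monthly x8: ${monthly_cost:,}")
```

[The gap between the two figures comes from pricing model (hourly on-demand vs. monthly rate) and from whether 8 or the full 16 v3-8 slices are counted.]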
| ducttapecrown wrote: | How much would the labor cost, though? | mdjt wrote: | Based on the going rate of a 32-core TPUv3 slice ($32/hr USD) | running "for a few weeks", isn't this closer to $65k USD? | entropicdrifter wrote: | One could buy 200 GPUs for cheaper, I think that's where the | other comment's price estimate came from. | jeffbee wrote: | It says $1,752/mo for v3-8, so I just multiplied it 8x. | mdjt wrote: | Fair enough, that calculation is still a bit off if they | used 128 cores (16x instead of 8x). Not that it really | matters... | epsylon wrote: | I'm pretty sure that this took more than 1 junior | engineer-month. | seek3r wrote: | Kudos to DeepMind. I'm eager to read their paper. | wespiser_2018 wrote: | This will undoubtedly change our understanding of human health | and biology in many impactful ways in the years to come! | | The same information we get through X-ray diffraction will now be | available 100x or even 1000x cheaper, and using this model can | even aid the interpretation of X-ray diffraction data! | | What excites me most isn't doing what we can do now, for cheaper | (which will surely lead to more effective research methods), but | the potential to gain a systematic view of protein structures, | either across the genome, species, or through time, which will | give us a deeper and more fundamental understanding of biology. | mensetmanusman wrote: | This is amazing. If we can simulate multi-protein interactions, | you could imagine in our lifetimes being able to see a fully | computation-driven simulation of a human blood cell. That would | be a huge breakthrough. | visarga wrote: | What amazed me most was that they used hundreds of millions of | unlabelled protein sequences. This means we can collect massive | data in a new modality, besides the usual suspects: images, | video, audio, text, lidar and sensors. Soon I expect neural | implant data to be massive as well.
| | They surely did unsupervised training on raw data and then | fine-tuning on the 170K labelled sequences. I expect the data | volume could be increased by orders of magnitude in the next | couple of years and we'll see a GPT-3-like jump. | hsnewman wrote: | That's kind of a big deal. | dang wrote: | Url changed from | https://predictioncenter.org/casp14/zscores_final.cgi, which | points to this. | 6gvONxR4sf7o wrote: | I hate headlines like "X has solved Y." How often have we seen | computer vision and natural language "solved" at this point, | whenever a model does well enough on a benchmark? Their own | article doesn't even have that headline. This is a massively cool | thing that's happened. Why ruin it with a massively hyperbolic | headline? | falcor84 wrote: | I don't think I ever saw a headline saying natural language is | solved; who's claiming that? | TheRealPomax wrote: | Because only the experts in this field get to tell us, the | laymen, what "solving the protein folding problem" means, and | they defined it not as "perfect" but as "more than good enough | to be accepted as a correct result". Which this did. | | X has _actually_ solved Y. That's not so much "massively | cool", that's historical. | 6gvONxR4sf7o wrote: | I think the "they" you're referring to is only whatever PR | person wrote the headline. Nowhere in the substance of this | (PR!) post does it refer to it as anything but a great leap. | When an expert in the field outside of DeepMind says protein | folding has been solved, I'll believe it. | nharada wrote: | It does appear other experts in the field are claiming | this: | https://twitter.com/MoAlQuraishi/status/1333383769861054464 | danaris wrote: | The "solved protein folding" part isn't even in the article. It | appears to be clickbait editorialization by whoever submitted | the link.
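The GDT_TS and RMSD figures quoted throughout the thread are easy to make concrete. A minimal sketch, assuming the predicted and experimental structures are already superposed (real CASP scoring uses the LGA program, which optimizes the superposition per distance cutoff, so this is a simplification):

```python
import numpy as np

def rmsd(pred, ref):
    """Root-mean-square deviation between matched C-alpha coordinates, in Å."""
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))

def gdt_ts(pred, ref):
    """Simplified GDT_TS: the average, over 1/2/4/8 Å cutoffs, of the
    percentage of C-alpha atoms lying within that cutoff of the
    reference position. Assumes a fixed, precomputed superposition."""
    d = np.linalg.norm(pred - ref, axis=1)  # per-residue error in Å
    return float(np.mean([100.0 * np.mean(d <= c) for c in (1.0, 2.0, 4.0, 8.0)]))
```

A perfect prediction scores GDT_TS 100, and a residue off by 3 Å still counts toward the 4 Å and 8 Å cutoffs; a median GDT_TS of 92.4 therefore means that, for the typical target, nearly all residues sit within a couple of ångströms of the experimentally determined positions.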
| cs702 wrote: | Two years ago, after DeepMind submitted its first set of | predictions to CASP (Critical Assessment of protein Structure | Prediction), Mohammed AlQuraishi, an expert in the field, asked, | "What just happened?" | | https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp... | | Now that the problem of static protein structure prediction has | been _solved_ (prediction errors are below the threshold that is | considered acceptable in experimental measurements), we can | confidently answer AlQuraishi's question: | | Protein Folding just had its "ImageNet moment." | | In hindsight, AlphaFold v1 represented for protein structure | prediction in 2018 what AlexNet represented for visual | recognition in 2012. | dmix wrote: | > However, if the (AlphaFold-adjusted) trend in the above | figure were to continue, then perhaps in two CASPs, i.e. four | years, we'll actually get to a point where the problem can be | called solved, in terms of gross topology (mean GDT_TS ~ 85% or | so). Interesting prediction within. | | It turned out to take only one more CASP (two years) instead of | four (depending on whether getting to the ~90 range counts as | "solved"). | | I'm curious to see if AlphaFold can do even better in the next two | years. | | Those last mile percentages always tend to be small anyway. | xral wrote: | AlQuraishi's tweet [0] about this: | | > CASP14 #s just came out and they're astounding--DeepMind | looks to have solved protein structure prediction. Median | GDT_TS went from 68.5 (CASP13) to 92.4!!!! Cf. their 2nd best | CASP13 struct scored 92.8 (out of 100). Median RMSD is 2.1Å. I | think it's over | https://predictioncenter.org/casp14/zscores_final.cgi | | [0]: | https://twitter.com/MoAlQuraishi/status/1333383634649313280 | elwell wrote: | > https://predictioncenter.org/casp14/zscores_final.cgi | | `.cgi`... we've come full circle | matsemann wrote: | What does that Å mean? Never seen our letter used in a | scientific context.
| seslattery wrote: | It's the symbol for Angstrom, a unit of length (10^-10 m): | https://en.wikipedia.org/wiki/Angstrom | kolinko wrote: | 0.1 nm - approximately the size of an atom - often used in | organic chemistry. | smt1 wrote: | It is used a lot when systems are examined at the nano-scale. | Metrification and creating a "Fubini's theorem" for a | specific problem to measure something (indeed category | theory is useful for building a localized "global wire" | with appropriate "gauges" of interest where optimization | methods will work (to achieve the non-equilibrium control- | theoretic orient-folds of interest of whatever "the | soln" is) with enough "space" to "try" pull-backs and push- | forwards as needed (for a class/family of physically | analogous data). I think looking at things through | Joseph Fourier's eyes is pretty enlightening. He seems to | have ideated both the heat transfer problem (and being able | to apply modern methodology by forming distributed or | sparse representations of it, then assessing the non-linear | dynamics of it in modern robotics and mathematics senses, | which would be very much applying Pfaffian dynamics to | me, and being able to know about cohomologies is a blessing | such that the appropriate physical effect where the maximum | likelihood is constrained) is important in both scale-free | systems, fibers of networks of systems that need to be | localized (this is approximately global sections of global | optimization but then model-identified), mass effects which | require some sort of techno-economic analysis (think the | climate resilience problem) and (historically, I think | COVID will shift that) lack of progress towards applied | coding in the life sciences vs information sciences.
What's | pretty surreal to me is that exploring (and documenting | some of the interesting blurs between fields), say like math, | physics, statistics, computer science, signal processing, | natural language (even the language of scientific | discourse), renormalization methods, naturalizations, | socializations, and what are global/local laws, lets you | almost approach it as a "reverse Robin Hood" problem. | flobosg wrote: | Angstrom, a length unit. 1 Å = 0.1 nm. | softwaredoug wrote: | > I don't think we would do ourselves a service by not | recognizing that what just happened presents a serious | indictment of academic science. | | Much like in other fields, I do begin to question the academic | structure for making advances. It appears something is rotten in | the state of academia. Oddly, it's academia doing incremental | improvements to existing methods but industry making novel | leaps and bounds... The other major case in point being NLP. | codingslave wrote: | Academia keeps employing people who have done well in classes | and within fine bounds. It's a careerist track. Industry cares | about results; it's more meritocratic. | ac42 wrote: | I think so, too. Linear algebra, control theory and quantum | mechanics haven't gotten us anywhere and ivory towers prevail, | as this machine learning solution to a problem in biological | chemistry clearly demonstrates. /s | flobosg wrote: | AlQuraishi described the progress made in CASP13 (2018) as "two | CASPs in one". This one is an even bigger breakthrough. | Seanambers wrote: | I particularly like the rant on pharmaceutical companies' | lack of basic research. My impression has been that medical | progress has been slow for quite some time; nice to see | that there is some truth to that. | | In the end software and tech companies might just eat up the | pharmaceutical industry as well - it's all just code at some | level.
| | The DeepMind team did this with: | | "We trained this system on publicly available data consisting | of ~170,000 protein structures from the Protein Data Bank | together with large databases containing protein sequences of | unknown structure. It uses approximately 128 TPUv3 cores | (roughly equivalent to ~100-200 GPUs) run over a few weeks, | which is a relatively modest amount of compute in the context | of most large state-of-the-art models used in machine | learning today." | | So it wasn't out of reach for academia, pharmaceuticals, or | others with a bit of resources. | flobosg wrote: | Yeah, it was a big slap in the face. But, to be fair, most | of the scientific and technological advances (sequencing | efforts, structural genomics projects, etc.) that generated | the data used by DeepMind came from academia and, to a | lesser extent, the pharma industry. | sjg007 wrote: | I think the lesson here is that most of the big data | genomic, metabolic, pharmacologic and other research will | _all_ be driven by deep learning. The models themselves | however require 100+ GPUs, so we are sort of back in that | phase where you need large compute systems to even | compete. A single lab will have issues unless they can | leverage a cloud and then also get grant funding to spend | that money on cloud compute... which may be difficult | b/c it's basically a consumable now and you don't have | any hardware left over. | throwawayiionqz wrote: | This is the cost of training the final architecture with | all the refinements enabled by years of research. | | These years of research involved trying many different | architectures, many of which received as much or more | compute time than the final system. | | The price of training the final architecture is | meaningless. Researching and training AlphaGo was expensive, | but it enabled the ideas and development of AlphaZero, which | is more computationally tractable.
| | To have any chance, an academic team would need the same | compute resources as what the DeepMind protein folding team | used during the whole development of the architecture | over the last few years, not only the resources used to | train the final system. And I bet this funding is not | available to most if not all academic teams. | mjn wrote: | Even if you try to account for the overall R&D cost, | DeepMind isn't _that_ large an organization by the | standards of biomedical research. It's very big and well | funded for a _computer science_ research organization, | yes, and most CS departments can't match its resources. | But the NIH budget is $40 billion, and private | pharmaceutical companies do another $80 billion in annual | R&D. It's interesting that this kind of breakthrough | didn't come from those sectors. | dekhn wrote: | DeepMind is taking advantage of NIH's funding. For | example, Anfinsen, who demonstrated that proteins fold | spontaneously and reproducibly | (https://en.wikipedia.org/wiki/Anfinsen%27s_dogma), ran a | lab at NIH. Levinthal (who postulated an early and easily | refutable model of protein folding) was funded by NIH for | decades. Most of the competitors at CASP are supported by | NIH, and its investments have contributed to the modern | results significantly. | | That said, I think the academic and pharma communities had | engineered themselves into a corner and weren't going to | see huge gains (even though they are exploring similar | ideas) for a number of banal reasons. | WanderPanda wrote: | It seems like spending these government funds on creating | new challenges like CASP and ImageNet could have an | enormous ROI. Don't let them try to choose the winner; | just let them define the game. | mjn wrote: | That's a good point; this system certainly didn't come | from nowhere! The protein datasets they used also mostly | came out of various NIH-funded projects.
| | What I meant to focus on was that I think DeepMind has | less of a pure money/scale advantage in this area than in | some others. In something like Go or Atari game-playing, | there are many academic groups researching similar | things, but their resources are laughably small compared | to what DeepMind threw at it. So you might argue that | they got good results there in part because they directed | 1000x the personnel and compute at the problem compared | to what any academic group could afford. In biomed, | though, their peers in academia and industry are also | pretty well funded. | dekhn wrote: | Personally I think a major part of the secret sauce is | Google's internal compute infrastructure. When I was an | academic, 50% of my time went to building infra to do my | science. At Google, petabytes of storage, millions of | cores, algorithms, and brains were all easily tappable | within a common software repo and cluster infrastructure. | That immediately translates to higher scientific | productivity. | smt1 wrote: | I agree. What's doubly interesting is Google's internal | transparency and open-source-first policy. I think it's | probable that that effect spreads and creates flywheel | effects for the life, natural, and behavioral | sciences. Keep in mind that they've also effectively | absorbed the R&D side of Bell Labs from a computer | science/distributed computing point of view (Gopher is | pretty much that), and it's also interesting from a | sociological p.o.v.: "this is shifting the resources | of the polyad network problem", or problems caused by | rapid commercialization of the World Wide Web rather | than the physics originally ideated @ CERN, and | moving to effective effort in other fields, even if it | doesn't happen @ Alphabet. Hell, they could be dismantled | (given the FTC complaints), and probably the resultant | companies would rebuild, like paperclips, sort of like | Ma Bell did post-1984.
| t_serpico wrote: | You hit the nail on the head here. | [deleted] | asah wrote: | Having recently experienced both, 1000x this. | MaxBarraclough wrote: | Has cloud computing changed this? | dekhn wrote: | Mostly? I left Google to work at a biotech startup | working in a related area and found that the big three | cloud providers have built systems that greatly improve | computational science. That said, it's still a lot of | work to get productive; many in the field are really | resistant to changes like version control, continuous | integration, testing, and architecting distributed | systems for handling complex lab production environments. | | Here's an exemplar of how I think it evolved well in a | cloud world: https://gnomad.broadinstitute.org/ | | That project adopts many concepts from Google and others | and greatly improved our analytic capabilities for | large-scale genomics. | zaroth wrote: | > _The price of training the final architecture is | meaningless._ | | The research is the giant shoulders you stand on; the | compute cost is the price of the tool you need to do the | present-day work. | | Both are relevant, but the shoulders of giants are | generally more accessible, particularly if we're talking | about published research and not proprietary tech. | | A competing team is not starting from the same place the | DeepMind team started at 5 or 10 years ago. | zaroth wrote: | To expand on this: after fully reading AlQuraishi's "What | Just Happened" post from a couple of years ago, this is | the point he made that stood out: | | > _I don't think we would do ourselves a service by not | recognizing that what just happened presents a serious | indictment of academic science. There are dozens of | academic groups, with researchers likely numbering in the | (low) hundreds, working on protein structure prediction.
| We have been working on this problem for decades, with | vast expertise built up on both sides of the Atlantic and | Pacific, and not insignificant computational resources | when measured collectively. For DeepMind's group of ~10 | researchers, with primarily (but certainly not | exclusively) ML expertise, to so thoroughly rout | everyone surely demonstrates the structural inefficiency | of academic science. This is not Go, which had a handful | of researchers working on the problem, and which had no | direct applications beyond the core problem itself. | Protein folding is a central problem of biochemistry, | with profound implications for the biological and | chemical sciences. How can a problem of such vital | importance be so badly neglected?_ | | In short, academia got utterly schooled by a small group | at Google spending a relatively small dollar amount on | compute, using techniques that in hindsight are fairly | described as "simplistic". There's no way around it. | Invictus0 wrote: | I don't think AlQuraishi really hits the mark in his | critique. The mere fact that hundreds or thousands of | people have been working on a problem for decades doesn't | account for the fact that the field of machine learning | has been growing extremely rapidly over the last decade, | the compute power available has grown exponentially, and | the people working on the problem simply weren't looking | at the problem in the way that the DeepMind people were | looking at it. | | If you were trying to get across the Atlantic, this would | be like getting upset at a group of bridgebuilders for | trying to solve the problem by building a bridge across | instead of by inventing the airplane. The approaches are | that different. | flobosg wrote: | > and the people working on the problem simply weren't | looking at the problem in the way that the DeepMind | people were looking at it. | | > The approaches are that different. | | I'm not sure if that analogy applies here.
DeepMind | wasn't the first group tackling structure prediction with | machine learning. Their success lies in the innovations | that they implemented (predicting inter-residue distances | as opposed to contacts, for example). | dash2 wrote: | To be fair, I'm not sure that they are "simplistic" in | the sense that, e.g., writing a neural network to | recognise cat pictures is now simplistic. I don't know | how many people have DeepMind levels of expertise in ML, | or could implement what they have done, but I doubt it is | many, and they are thinly spread amongst many interesting | problems. | craftinator wrote: | > The price of training the final architecture is | meaningless. | | Meaningless in historical terms, but meaningful in future | terms. It's meaningless how long the training took | because there were countless resources spent to get to | that point. It's meaningful for the future, because we | know that training times are fairly short, and iteration | can be done fairly quickly. | beowulfey wrote: | I mean, credit where credit is due. Google employs some of | the greatest names in artificial intelligence, and the | DeepMind team had a huge chunk of them working on this | problem. While the _resources_ may have been available, I | don't think any other single institution had the level of | brain power. | mrDmrTmrJ wrote: | Absolutely. The capability to "create" the breakthrough | is extremely rare. Perhaps only DeepMind, OpenAI, and | Google Brain can assemble these types of teams. Luckily, | the capability to replicate and exploit the breakthrough | is far more 'common', though still very rare. | | Excited to see how follow-on use of these models, by many | more teams, researchers, and companies plays out over the | next two decades. | | This is a foundational advance! | elcritch wrote: | It also makes one reconsider the notion that monopolies | are _entirely_ bad. This essentially appears to be a | vanity project for Google.
Though of course they'll | benefit from it in many ways, but it's not like they're | doing this as the core product of their service. It's a | pretty awesome achievement. | Ericson2314 wrote: | You've just described why many socialists 100 years ago were | very skeptical of anti-trust as trying to sacrifice | modernity to prop up a romanticized notion of the past | as disaggregated pure-petit-bourgeois capitalism. Really | not that different from the criticism of the Luddites 100 | years before that. | | See | https://ilr.law.uiowa.edu/print/volume-100-issue-5/all-i- | rea... | bosswipe wrote: | Imagine we lived in a culture that did not believe | "government is always bad at everything". Government | could then pay Google-level salaries and provide Google- | level resources to the top minds in the world and give | them free rein to tackle problems like this. It's worked | in the past, such as the Manhattan Project or the moon | landing. But I don't think it's doable nowadays because of | the anti-government political culture. Even when government | is fully funding things these days, the work has to be | farmed out to private interests. | Supermancho wrote: | > It also makes one reconsider the notion that monopolies | are entirely bad. | | Much like political dictators, they can be exceedingly | efficient and have the resources (and authority) to do | things in spite of opposing interests. | | People who are faced with the narrative that countries have | a monopoly on a number of aspects of life find that | monopolies are not a BAD THING(tm) per se, but that they are | bad for a consumer market - as a monopoly eventually | blockades aspects of the market. | bawolff wrote: | Or to put it another way, the kings and queens of yesteryear | funded a staggering amount of beautiful art, etc. | e_y_ wrote: | I think there's some merit to the idea that huge | corporate monopolies have the resources to accomplish | undertakings that smaller companies cannot.
But it's | often a what-if, because we don't know what the | alternative might have been. | | Big companies can suck up all the air in the room by | monopolizing talent and making it harder for startups to | pay the kinds of salaries needed for top-tier AI | research. Xerox PARC came up with all kinds of | groundbreaking inventions that were never commercialized | (by them). For every invention that comes out of a big | company, it's worth thinking about whether it might have | actually come out faster if it was born of competition | instead of a side project. Or, in the grand scheme of | things, if corporate taxes were higher and the money was | given to a university research lab. | | I think the best results may come from the middle ground. | Smaller/medium companies are so worried about staying | afloat or hitting their quarterly earnings that they have | trouble making long-term investments. Large companies are | diverse and profitable enough that they can afford to | blow money on things that might not pan out, but they | don't have the same drive -- and in fact have some | pressure to avoid being "too" innovative because it could | cannibalize their existing products. | generalizations wrote: | Note that Bell Labs is another example of a corporate | monopoly research lab producing things that others | couldn't / didn't. | xzel wrote: | Look at all of the incredible things that came out of | Bell Labs during their monopolistic reign. I think a | better way to put it is that not all monopolies are bad | for research and progress, but many are bad for other | social and economic reasons. Like any position of power, | it depends on how it is used and who is using it. | soup10 wrote: | It's kind of like a modern-day Bell Labs, where they have | so much excess profit from adtech that they can fund lots | of "basic research", or the computer science equivalent of | that. | nightski wrote: | Not even a little bit.
There is nothing here that would | require Google to be a monopoly to accomplish. If | anything, companies become lazy without competition. | | I feel like that is not too far from saying it makes one | reconsider communism because good things can happen with | authoritarian control. | IfOnlyYouKnew wrote: | In a prior(/n) life I worked on protein folding, and | participated in CASP. | | This was a/the "holy grail" problem of molecular biology, | long thought to be an automatic Nobel. It's somewhat unfair | to characterise developments prior to this as | insignificant. In fact, by the time I was working on it, | that "automatic Nobel" was no longer assumed, because the | field had made quite a bit of progress, in many tiny steps | by many different groups, and the assumption was it would | continue in this slog until reaching some state of | sufficiency for practical applications without ever seeing | the sort of singular achievement that would be worthy of | praise and prize. | | Far more went into this breakthrough, obviously, than those | TPU-hours: the development of those TPUs, for example, and | assembling a team that can make use of them. The protein | folding problem requires very little knowledge of biology | or physics to understand and was always predestined for | some outsider to sweep. Indeed, there was a game that | allowed people to solve structures by intuition alone, and, | IIRC, some 13-year-old Mexican kid cleaned everyone's clock | some years back. | | Why didn't some research group do this first? Most of them | just don't have the budget. We were five people, total, | IIRC, and felt pretty rich because we were computer-people | getting the same budget for materials as everyone at our | institution, which was all wetlab otherwise. So I was a | student being paid $20/h but with a $50,000/p.a. hardware | budget. How many false starts does it take before you do | that run with 128 TPUs "for a few weeks" that works?
If you | blow your budget on one gigantic Google invoice, what's | going to happen to you when it doesn't pan out, and the | whole institute laughs at you? Etc... | | There are quite a few rather good things this problem has | inspired over the years, though. Among them is CASP itself: | the idea of instituting a yearly competition that gives | unequivocal feedback on the state of the field and every | group working on it is rather rare, I believe, and it's | been successful. Indeed, it would seem that CASP was | necessary to attract outside groups like DeepMind, i.e. | deep-pocketed industry groups striving to prove themselves | on a clearly defined problem. Chess, Jeopardy, CASP: maybe | it would be worthwhile to explore not <solving X>, but | <stating X as a problem that attracts Google/IBM/etc.-scale | money> as a superior strategy in some cases. | | There was also folding@home, pioneering the distributed- | donated-computing model, and the aforementioned | gamification of the problem, and hundreds of the most | intricate, custom-tailored, more-or-less insane ideas | people devoted months and/or careers and/or careers of | their most promising post-docs to that didn't pan out. | | Like cellular automata. They don't work for this, trust me. | (Great hit for interactive poster sessions, though) | tonfa wrote: | > So it wasn't out of reach for academia, pharmaceuticals, | or others with a bit of resources. | | How much does hiring a DeepMind-like team cost, though? | (massively more than the TPU resources?) | | Still within reach of the pharmaceutical industry I guess, | but maybe not so easy for academia. | t_serpico wrote: | Also, pharma does not really have a huge incentive to | work on this problem. Solving the protein folding problem | does not automatically translate to new drugs, just in the | same way CRISPR or DNA sequencing did not. It's another | tool in the toolbox (which, to be clear, is a big deal).
| Seanambers wrote: | From what I can gather, Google bought DeepMind for 500 | million USD in 2014, and it has outstanding debt to its | parent company as of 2019 of 1.3 billion USD. | | And they had income of around 100 million in 2019, but | it's all against Google, so it looks like a 2 billion +/- | 0.5 operation so far, and who knows if they pay for | compute. | | Other articles place the run rate at 500 million per year | in 2019. | | Which means 500 million * 6 years = 3 bn, + 0.5 bn | purchase price = 3.5 bn. So somewhere in the 2.5 - 3.5 | billion range seems likely as the total cost so far. | | Nevertheless, it doesn't seem out of reach for a | multinational. | sseagull wrote: | It would still be a significant amount of money for a lot | of companies. | | Remember, we are looking in hindsight that it seemingly | paid off. A few years ago, this was just an educated bet; | only the richest companies with money to burn (from | selling ads) would be willing to take on that kind of | risk. | TulliusCicero wrote: | That's the cost of running DeepMind as a whole, right? | Which includes all the other stuff they've worked on, | like games. | Seanambers wrote: | Yeah, as far as I can tell, that's the whole lot of it. | mwcampbell wrote: | How far does the similarity extend? Specifically, the big | question for me is whether AlphaFold will be freely available | like ImageNet, or proprietary. | 0-_-0 wrote: | ImageNet is a competition and a dataset; AlphaFold is a | neural network. | ramraj07 wrote: | The competition requires enough revealing of the | methodology for other teams to replicate it, so open | implementations are going to be available for sure. | | It also looks like they came up with a brand new jiggling | algorithm, which is probably just V1 now; this really | changes things in a significant way! | sanxiyn wrote: | I expect this to be quickly replicated once published.
| Training data is public and training compute is not enormous, | and AlphaFold of 2018 did get replicated. | dekhn wrote: | CASP typically works this way: one person "wins" by getting | a slightly higher score than everybody else. Two years | later, the top teams have all duplicated the previous | winner's tech, and two years after that, there's a github | you can download and run on your GPU to reproduce | everything. | kxs wrote: | How do you define enormous? "It uses approximately 128 | TPUv3 cores (roughly equivalent to ~100-200 GPUs) run over | a few weeks". Also, last time it took about a year for good | replications to pop up. | dragontamer wrote: | A lot of labs have access to the various strategic | supercomputers of the USA. | | Ex: Summit has 27,648 V100 GPUs (and those V100s have | Tensor units). If you're saying that only 200 GPUs are | needed to replicate the experiment, that doesn't even use | up 1% of Summit's available capacity. | elcritch wrote: | A couple of hundred GPUs is well within the reach of | many even moderately well-heeled research institutes. | It'd seem that about 3 weeks of compute time with 128 TPU | v3s would be about $170,311.68. | kxs wrote: | But of course that cost would only be for the final | model. Anyway, I think I am just living in a different | world... :-) We could never compete with that. | elcritch wrote: | Yah, big grant money. Now the grad students programming | the open source clones will only make approximately | $0.56, or 4.2 ramen packs, for their effort. ;) | sdenton4 wrote: | Also worth keeping in mind that once a good open source | model is available, researchers with fewer resources can | still use it to fine-tune and get new results for far | cheaper than training a new model from scratch. | intpx wrote: | or cryptominers | mrDmrTmrJ wrote: | A year is a fast time to replication in many scientific | fields.
| | While substantial, the resources here are well within | reach of many labs, research institutes, and | organizations. For a result this big, I'd guess we'll | have 2-6 additional implementations in the next 18 | months. The problem has been 'open' for 40+ years, so | that's lightning fast! | justinzollars wrote: | I have a Masters in Biology. This was once described as an | impossible problem to solve. A huge achievement. | SubiculumCode wrote: | RIP Folding@home? | | EDIT: Just throwing this out there: are there national security | issues to think about with this? Can it be used to weaponize | computational biology? | flobosg wrote: | Folding@home tackles a related but different problem. They | simulate folding dynamics, i.e. how a protein reaches its | folded structure. | | If AlphaFold gives you a picture of a protein structure, | Folding@home shoots a video of that protein undergoing folding. | mylons wrote: | 12-13 years ago in a classroom, the professor for my intro to | bioinformatics class said if you were to solve this problem, you | would win a Nobel prize. Congrats to the team! What an | achievement. | comicjk wrote: | CASP (Critical Assessment of protein Structure Prediction) is | calling it a solution. To quote from the article: | | "We have been stuck on this one problem - how do proteins fold up | - for nearly 50 years. To see DeepMind produce a solution for | this, having worked personally on this problem for so long and | after so many stops and starts, wondering if we'd ever get there, | is a very special moment." | | --Professor John Moult, Co-founder and chair of CASP | light_hue_1 wrote: | This is an issue of the more subtle aspects of English. | | "To see DeepMind produce a solution for this" does not imply | something is solved. I can produce a bad solution. I can | produce a really good solution. All without solving a problem. | comicjk wrote: | This is a really good solution. 
Of course, there's still room | for more research and better methods in the future, but now | computational protein structure prediction can compete with | experiments actually measuring the structure. | dekhn wrote: | It's an improvement - and a big one - but not a solution to the | problem. It mainly shows just how stuck the community had | gotten with their techniques, and how recent improvements in | DNNs and information theory methods can be exploited if you | have lots of TPU time. | aardvarkr wrote: | It's officially recognized as a solution. | cambalache wrote: | Well, it's not. Nature does not have a committee, sorry. | Proteins are delicate "machines" where even a change in the | sequence (and thus the 3D structure) as small as a few | amino acids can effectively change the structure and | function. On top of that, proteins are dynamic beasts. In | any case, it's a great advance, but DM, like many | companies, likes to toot its own horn a bit too much. | [deleted] | ClumsyPilot wrote: | I am not sure we are talking about the same thing - i.e. | there is a solution for hunger, but it's not a solved | problem. | mrDmrTmrJ wrote: | This benchmark may be solved, but simultaneously, there | remain other open problems relating to protein folding | which are unsolved and which may not even have benchmarks | yet :) | | Said differently, there's vast space between having a | great result on a specific benchmark (this) and solving | all interesting problems in a scientific field. | dekhn wrote: | No, it's not. The folks who run CASP gave some nice PR, but | it doesn't mean that protein folding is solved. | dr_dshiv wrote: | "It has occurred decades before many people in the field would | have predicted. It will be exciting to see the many ways in which | it will fundamentally change biological research." | breck wrote: | v2 looks amazing. That jump is even more incredible than the | first. 
More context from v1 in 2018: | | https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp... | m3kw9 wrote: | Does this obsolete Folding@home? | michaelcampbell wrote: | My question exactly; or Rosetta @ home, or any of the other | protein folding "@home"s. I participate in a few, but would | gladly donate my compute resources elsewhere if this is no | longer necessary. | amelius wrote: | This was also one of the main selling points of quantum | computers. | | Makes you wonder what Deep Learning will tackle next. | Factorization of large integers? | dang wrote: | All: there are multiple pages of comments; if you're curious to | read them, click More at the bottom of the page, or like this: | | https://news.ycombinator.com/item?id=25253488&p=2 | | We changed the URL from | https://predictioncenter.org/casp14/zscores_final.cgi to the blog | post, which has more background info. | kovek wrote: | I've seen you mention this [More] comment a few times now. I | like it, though what if you change the design of the More | functionality? | nathancahill wrote: | Also, what do the traffic stats look like for the | second/third pages of big threads like this one? Pretty steep | falloff? | TrackerFF wrote: | What are the immediate real-world applications of this? Just | asking, because I have very little knowledge in this area. | candiodari wrote: | Given the DNA code for one of the "machines" that run cells, we | can generate an atomic model of that machine. This means we can | "compile" (one part of) the DNA code. It was already possible, | but so slow that entire datacenters would spend months | calculating this for a single protein and even then we can't | use them on the really complex ones at all, necessitating | things like neutron spectroscopy which are totally insane, and | only work on like 1% of proteins. | | This is useful because for example chemical simulation tools | don't run on DNA code, but on atomic models. 
And also to | produce "images" of the molecules ("images" in quotes | because most proteins are too small to interact with reasonable | photons, and no interaction with photons means you can't see | them in any way). | | DNA has other parts that are really important but that we don't | understand at all yet, where this doesn't help at all. This | applies to sections of DNA sent to ribosomes to produce actual | molecules. Besides that, there are pieces of DNA that "index" | the DNA, pointers (from one gene to another), triggers (that | for instance start production of an enzyme based on some | external influence, like detection of a marker molecule) and | export markers (that tell you what to do once the protein is | produced; for example, mark a protein to be removed from the | cell, incorporated into the cell membrane, or used | inside the cell nucleus; and there's also one that essentially | says "at this point stop producing a protein and instead couple | the rest of the DNA code to the end of the protein you just | made"). | Rochus wrote: | This is about proteins, not DNA. | shawnz wrote: | Proteins which are coded by DNA. | Rochus wrote: | So what? The DNA only codes for the RNA and amino acid | sequence. Structure determination is yet another topic. | When we determine the protein structure, we already know | the sequence. DeepMind doesn't have to look at the DNA to | train their DNN either. | shawnz wrote: | They are two topics which are both relevant to the | discussion. | | Structure determination is what allows you to see the | purpose/effect of the sequence that the DNA encoded. | Rochus wrote: | Have you read the article? It's about protein structure | determination. The DNA only determines the RNA and amino | acid sequence. But who cares. I will get a bit less work | and fewer citations because http://cara.nmr.ch/doku.php will be | used less in future. 
| candiodari wrote: | The full chain is DNA -> mRNA -> ribosome -> tRNA | combinations -> amino acid chain -> protein. | | It's true that in nature there are many steps between DNA | and proteins (this list doesn't even include the steps that | mediate the translation, i.e. start it, stop it, slow it | down, ...), but the structure of a protein is fully | determined by the DNA code. | | Protein folding is about starting from the DNA code that | is fed into the ribosome, ignoring all the meta information, | and coming up with an atomic model (a VERY long list like "H | atom at 3.27,2.17,12.18, C atom at 2.87, 2.19, 12.33, | ..."). Now there are a million niceties we've discovered to | make this problem simpler and nicer looking, but that's | what it boils down to. | Rochus wrote: | Thank you very much; almost forgot I did a PhD on the | subject ;-) | | But anyway, your answer does not contradict my statement. | What you say belongs to the basics of molecular biology, | but it does not justify that DNA should be considered when | determining the structure of proteins. In practice, the | amino acid sequence is always already present. | Rochus wrote: | For the sceptics: if you read the referenced article, you | will see that it is about protein structure determination | by means of deep neural networks. It's not about gene | expression, which is a different topic. What benefit does | it have to respond to the question "What are the | immediate real-world applications of this" by reciting | some molecular biology dogmas from textbooks mixed with | misconceptions, instead of responding to the real | question? | shawnz wrote: | Nobody is suggesting that this research has anything to | do with gene expression or anything like that. Their | point was simply that we now have better tools to | actually see the meaning/effect of a given DNA sequence. | | Also, there is no need to passive-aggressively highlight | your credentials. I already researched them before | replying. 
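[Ed.: the DNA -> mRNA -> amino acid chain described in the exchange above can be made concrete with a toy translator. This is only an illustrative sketch: it uses a small subset of the standard genetic code table and skips mRNA, tRNA, and regulatory signals entirely; folding then starts from the amino acid string it produces.]

```python
# Toy illustration of the DNA -> amino acid step discussed above,
# using a small subset of the standard genetic code. Real translation
# goes through mRNA and tRNA; this only maps codons (DNA triplets)
# to one-letter amino acid codes.

CODON_TABLE = {
    "ATG": "M",  # Methionine (also the start codon)
    "TTT": "F",  # Phenylalanine
    "AAA": "K",  # Lysine
    "GGC": "G",  # Glycine
    "TGG": "W",  # Tryptophan
    "TAA": "*",  # Stop
}

def translate(dna: str) -> str:
    """Translate a DNA coding sequence into an amino acid string,
    stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "?")  # "?" = not in the toy table
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGTTTAAATAA"))  # -> MFK
```

[Ed.: as Rochus notes above, this translation step is routine and the sequence is usually already known; the hard part AlphaFold addresses is going from the amino acid string to 3D coordinates.]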
| Rochus wrote: | I rather think most people comment without even having a | look at the referenced article. And since when is the | reference to a qualification considered aggressive? If | your doctor hangs his doctor's certificate on the wall, | is he "passive-aggressive"? Pretty weird. | | > that we now have better tools to actually see the | meaning/effect of a given DNA sequence | | Note that the "meaning/effect" of a DNA segment encoding | a protein is known and unrelated to the protein folding | process. The protein gets its conformation after the | translation process. | shawnz wrote: | > Note that the "meaning/effect" of a DNA segment | encoding a protein [...] | | The "meaning" of a DNA segment is not to encode a | protein. The "meaning" is to describe a mechanism in the | host organism (by way of encoding a protein). That is a | complex process which involves gene expression AND | protein folding. | | For example would you say that the "meaning" of some Java | code is to generate bytecode? Of course not, the | "meaning" is to run some algorithm on the computer that | executes it | [deleted] | Rochus wrote: | > _What are the immediate real-world applications of this?_ | | A protein is actually a linear sequence of amino acids, but in | a cell this sequence has a three-dimensional arrangement like a | clew of thread. The arrangement is not random, but dependent on | the specific composition of the sequence (i.e. selection and | order of amino acids) and some other factors. To understand the | function of a protein, we need to know this three-dimensional | arrangement (i.e. structure). Up to now the structure | determination process was mostly manual, complex, time- | consuming (several months up to more than a year) and error | prone. If structure determination by DNN is reliable, this is a | big win for life science. There are still a lot of problems | open: e.g. 
the structure is not constant over time but there | are "moving parts" in the structure which are important for its | function. | randcraw wrote: | For-profit corporations that value protein engineering will | beat a path to DeepMind's door ASAP, like pharmas. | | Protein conformation prediction is essential when engineering | new small-molecule drug compounds that must 'dock' with the | specific proteins that regulate disease. Knowing how to create | a protein with the precise shape to become biologically active | has soaked up a lot of R&D funding toward pie-in-the-sky | techniques that promise to advance that agenda (like quantum or | DNA computing). | | If this method works as DeepMind says, it will immediately be | adopted by every pharma to assess and tweak the shape of | candidate proteins. | dekhn wrote: | you give pharma too much credit. I had built a previous | system to do something similar to this that produced | excellent results and tried to give it away for free to | Genentech, which ignored me. They said it didn't work for | their purchasing department. | TheRealPomax wrote: | I don't believe you, but I look forward to you showing | proof of this with some links (and if you tried giving it | for free, I assume you just open sourced the whole deal, so | I look forward to a repo link or the like). | dekhn wrote: | I developed the Exacycle system at Google and used it to | publish my work (I wrote that blog entry): | https://ai.googleblog.com/2013/12/groundbreaking- | simulations... | | we offered the service for free to Genentech since I used | to work there and knew they could probably use it to get | some good publications. | | We didn't open source the distributed computing | framework, but the underlying technology (Folding@Home) | is based on gromacs, which is open source. It's the scale | at which it ran, and the processing pipeline for | filtering the results that had the real value. 
| mncharity wrote: | Additional commentary in Science: | https://www.sciencemag.org/news/2020/11/game-has-changed-ai-... | | (submitted by furcyd : | https://news.ycombinator.com/item?id=25254888 ). | ramraj07 wrote: | The most amazing part: | | > The organizers even worried DeepMind may have been cheating | somehow. So Lupas set a special challenge: a membrane protein | from a species of archaea, an ancient group of microbes. For 10 | years, his research team tried every trick in the book to get | an x-ray crystal structure of the protein. "We couldn't solve | it." | | > But AlphaFold had no trouble. It returned a detailed image of | a three-part protein with two long helical arms in the middle. | The model enabled Lupas and his colleagues to make sense of | their x-ray data; within half an hour, they had fit their | experimental results to AlphaFold's predicted structure. "It's | almost perfect," Lupas says. "They could not possibly have | cheated on this. I don't know how they do it." | dekhn wrote: | If I interpret this properly, they're saying they used the DM | prediction (not an actual model, just a prediction) to do | molecular replacement | (https://en.wikipedia.org/wiki/Molecular_replacement), which | sounds pretty audacious. I see it recently made it into the | literature: | https://journals.iucr.org/m/issues/2020/06/00/mf5047/index.h... | pmastela wrote: | Like the old Arthur C. Clarke quote goes: "Any sufficiently | advanced technology is indistinguishable from magic" -- | unless it might be cheating, in which case throw them a curve | ball. | | Kudos to the DeepMind team for making magic happen. | 14 wrote: | I am happy you mention this. I was reading the article and | thinking "wow, the amount of scientific knowledge these guys | need to know to understand what they are doing is way | beyond me". I work in health care and I always talk to | clients about all the cool things they witnessed in their | life. 
Cell phones, TVs, microwaves are some obvious ones I | like to talk about. I sit and wonder what are the things my | generation will get to look back on and say "I was alive | when that happened". I guess for many of us we will talk | about how the internet was vs what it surely will be in the | future, a shell of its initial glory. | rsiqueira wrote: | "A sufficiently advanced Artificial Intelligence would be | indistinguishable from God." (Way Of The Future - AI | Church) | sleepysysadmin wrote: | I made new years prediction about exactly this. I predicted | folding@home would die despite huge interest again because of | covid. | AlexCoventry wrote: | What's the actual news, here? AlphaFold is amazing, but it's been | around for a while. | typon wrote: | AlphaFold 2. The article specifically mentions it. | AlexCoventry wrote: | Thanks. | lgeorget wrote: | See also the piece in Nature about the topic: | https://www.nature.com/articles/d41586-020-03348-4 | gravy wrote: | Maybe combine with https://news.ycombinator.com/item?id=25254772 | ? | troelsSteegin wrote: | Can anyone (yet) provide a sketch of how this works? I saw a | mention of "attention", which I vaguely take to be a surrogate | for some form of structural information. It's an astonishing | result. How does it work? | lucidrains wrote: | Amazing day for structural biology! If it weren't for the | pandemic, I would be out at the bars celebrating tonight! | postingpals wrote: | Heh, soon you'll be able to do that too when the vaccine comes | out. What a great end to the year. | Havoc wrote: | Does something like Folding@Home still have meaning after this? | empiricus wrote: | I am actually scared. This plus CRISPR means real nanotechnology | is within reach. | marcosdumay wrote: | There is still at least one NP-hard problem on the way, that is | creating a protein with a desired format. 
| jcims wrote: | I think this is the interesting part because there aren't going | to be the same regulatory hurdles for using ribosomes to | manufacture technology as there are for medicines. Synthetic | organelles that weave fibers, build metamaterials, etc could | lead to pretty magical advances in our capability. | entropicdrifter wrote: | Perhaps we'll live to see The Diamond Age | wrinkl3 wrote: | Can't wait to join a distributed computing bacchanalia. | enchiridion wrote: | My thought as well. I wonder what the world will look like in | 20 years because of this. | | I'm willing to bet it will be staggeringly different than what | most people are expecting. | dynamite-ready wrote: | Far from an expert here, but your comment makes me think of | Michael Crichton's 'Prey', if you've not already read it. Not | that I wish to add to your apprehension. | [deleted] | echelon wrote: | This sounds wonderful and frightening. On the one hand, now we | can engineer drugs at light speed. But wasn't protein folding | supposed to be NP-hard? | | Can deep learning find the cracks in P vs NP? | | Perhaps making clever guesses at prime factors because it learned | some weird structural fact that has eluded mathematicians. | | If we break crypto, there goes the modern world. Banks, bitcoin, | privacy, Internet, the whole shebang. | | (I obviously am _not_ an expert in computational complexity and | hope that some domain experts can chime in and assuage my fears.) | glatteis wrote: | > But wasn't protein folding supposed to be NP-hard? | | Yeah, at least some variations of it are NP-hard. SAT is THE | NP-complete problem, but there are some really good SAT solvers | around. This basically means: They have a solution that mostly | does very well on most instances. But because (probably) P != | NP, you will never have a polynomial time algorithm for this. 
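[Ed.: the SAT solvers mentioned above can be sketched in a few lines. This is a minimal DPLL-style backtracking solver, consistent with the point being made: worst-case exponential (as expected for an NP-complete problem), yet simplification shortcuts make many practical instances fast. All names here are illustrative, not any particular solver's API.]

```python
# Minimal DPLL-style SAT solver sketch. Clauses are lists of nonzero
# ints in the usual DIMACS convention: variable k is the literal k,
# its negation is -k. A formula is a list of clauses (an AND of ORs).

def solve(clauses, assignment=None):
    """Return a satisfying assignment (dict var -> bool) or None."""
    if assignment is None:
        assignment = {}
    # Simplify every clause under the current partial assignment.
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # clause already satisfied: drop it
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return None  # clause falsified: backtrack
        simplified.append(rest)
    if not simplified:
        return assignment  # every clause satisfied
    # Branch on the first unassigned variable (real solvers pick smarter).
    var = abs(simplified[0][0])
    for value in (True, False):
        result = solve(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None

# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
model = solve([[1, 2], [-1, 3], [-2, -3]])
```

[Ed.: modern solvers add unit propagation, clause learning, and restart heuristics on top of this skeleton, which is what makes them "mostly very good on most instances" despite the worst-case bound.]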
| sgt101 wrote: | I think that this is a heuristic "near optimal" method rather | than an exact analytic method (I have little to no idea of what | that would be in protein folding). A domain I do understand a | bit which is NP-hard is the travelling salesman. Computing an | exact solution is unrealistic, but doing heuristic searches | that get you to 99% of the optimal 99% of the time is | relatively doable. | | But - you don't know that you are 1% from the solution... even | if you are pretty confident that you are. It's quite possible | (though unlikely) that you are way off the optimal, but if you | have a decent solution that's OK. | aparsons wrote: | There is probably a team at DeepMind working on cracking simple | crypto. Problem is, it can be difficult to cast the problem | properly/"correctly". How does a one-way function get | represented? | Someone wrote: | NP-hard doesn't say how hard it is to solve finite problems. | Even for n = 1,000,000, _O(e^n)_ isn't necessarily problematic, | _if_ the constant is small enough, or if you throw enough | hardware at it. | | This "uses approximately 128 TPUv3 cores (roughly equivalent to | ~100-200 GPUs) run over a few weeks". That is a moderate amount | of hardware for this kind of work, so it seems they have a more | efficient algorithm. | | Also, this algorithm doesn't solve protein folding in the | mathematical sense; it 'just' produces good approximations. | ichbinwiederda wrote: | Far from an expert on complexity theory, but NP-hard problems | can be approximated in polynomial time. With Deep Learning you | are doing approximation. So this is nothing groundbreaking in | that respect. | Vervious wrote: | There are also a variety of problems that are hard to | approximate. | foxtr0t wrote: | That actually isn't totally true. Approximate methods, in the | formal sense, require a guarantee that they perform within X | of the optimal solution. 
Not all NP-hard problems have | polynomial approximations, and the methods shown here are | likely not approximations in that sense, because they provide | no guarantees on performance. | ichbinwiederda wrote: | Yes, thank you for elaborating. I agree with you on both | counts. | blamestross wrote: | > Can deep learning find the cracks in P vs NP? | | No. It really is just heuristic building. A core problem with | using ML in this sort of use case is that it is often brittle. | Once it gets outside of the context it was trained in, it may or | may not be able to generalize its training to new contexts. We | may have difficulty knowing when it is very wrong. | | I think ML in research science could be viewed as a very good | intuitive oracle. Even if it is right 95% of the time, you | have to do the work to prove it the long way every time, because | that 5% matters. The real utility is in "scanning the field" to | better focus research on things likely to bear fruit. | karl-j wrote: | I think I'm almost as uninformed as you, but I believe it comes | down to the difference between perfect solutions and close | enough solutions. Consider the classic NP problem of the | traveling salesman problem. | | "[Modern heuristic and approximation algorithms] can find | solutions for extremely large problems (millions of cities) | within a reasonable time which are with a high probability just | 2-3% away from the optimal solution." [0] | | When close enough is enough, NP problems can often be solved in | P time, and I suspect this is one of those cases. For crypto, | however, close enough is not enough. | | [0] | https://en.wikipedia.org/wiki/Travelling_salesman_problem#He... | The_rationalist wrote: | Let's imagine that as a researcher I make a breakthrough NN | model, but I need a lot of TPUs/GPUs in order to test it. Is | there a service for temporarily lending such hardware to me for | free/not much? (e.g. Google Colab?) 
Otherwise researchers will | plateau with their hardware budget. | vadansky wrote: | Just to add to this whole "It's not solved! Yes it is!" | discussion. Note that | | >According to Professor Moult, a score of around 90 GDT is | informally considered to be competitive with results obtained | from experimental methods. | | So if we go by >= 90 as solved: | | >In the results from the 14th CASP assessment, released today, | our latest AlphaFold system achieves a median score of 92.4 GDT | overall across all targets. | | they solved it for their targets, but | | >Even for the very hardest protein targets, those in the most | challenging free-modelling category, AlphaFold achieves a median | score of 87.0 GDT (data available here). | | They basically admit they still haven't "solved" it for the "most | challenging free-modelling category". | | Take that as you will; I'm not sure how useful the ">= 90 is solved" | criterion is, since they call it "informal" themselves. | fastball wrote: | What do you mean you're not sure how useful ">= 90" is as a | criterion? | | You literally said why it is useful in your comment: | | > 90 GDT is informally considered to be competitive with | results obtained from experimental methods. | | It's informal because we don't have a true "gold standard" for | determining a protein's folded structure - the best we have is | experimental methods of trying to determine the structure, which | still have a great deal of error (compared to other things we | can measure). | | So all we can do is say "the GDT between two experimental | measurements (of the same protein) is often around 90, so if we | get there with predictive models that's pretty much just as | good". | | As soon as we have better experimental methods for determining | protein tertiary structure, you can be sure we will require | predictive models to deliver better results too. 
Until then, | the point is that the delta between any two experimental | determinations of folded structure is approximately the same as | the delta between an experimental determination and an | AlphaFold guess. So the AlphaFold guess may as well be an | experimental measurement. Except the AlphaFold guess happens | fairly trivially (once you give it the DNA sequence[1]), whereas | the experimental method is involved and expensive. | | [1] Or the primary structure; I'm unsure what inputs are given | to AlphaFold. | vadansky wrote: | Just to add to my own comment: why does HN like being so | pedantic about the definitions of words? This is an interesting | post regarding AI and cellular biochemistry. Do we really need | to add a philosophical debate about the meaning of "solution"? | Personally I think anyone who can't add to the discussion about | AI and protein folding should just not comment, instead of | adding to the "what does solution mean" debate. I'd | love to see a blanket rule flagging pedantic posts. | 6gvONxR4sf7o wrote: | HN pushes back on hype because there's generally too much | hype in announcements. | pretendscholar wrote: | 87 GDT sounds pretty much solved to me if 90 is the benchmark. | garmaine wrote: | That's shifting goal posts. The hardest structures are also | going to be harder experimentally. | | What makes them hard to predict is the very close energies | involved in different folding pathways. Those close energies | mean there will be more variant structures, which vary | depending on the experimental approach used, too. | [deleted] | tpoacher wrote: | Whenever DeepMind comes up with something like this, my first | instinct is to say "yay for humanity" ... then I remember who | they work for, and the second instinct is to say "Ah. Crap." | unchocked wrote: | Been out of the field for a while; could someone currently in it | qualify these results? Hyperbolic title notwithstanding, they | approach 90% median free modeling accuracy. 
The "other 90%" still | remains to be solved... | gmorainbows wrote: | The method relies on multiple sequence alignment (MSA) of | homologous proteins. This cannot fold arbitrary proteins, only | biologically relevant ones that have high-quality MSAs | available. It's also worth pointing out that the gold standard | for validating MSAs relies on PDBs of folded proteins. This is | exciting work that will assist NMR and X-ray crystallographers, | but it's not a panacea for protein folding. | | https://github.com/deepmind/deepmind-research/issues/18 | flobosg wrote: | In their CASP abstract[1] they mention alternatives to | typical co-evolution features which improve performance at | shallow MSA depths. | | [1]: https://predictioncenter.org/casp14/doc/CASP14_Abstracts.pdf... | gmorainbows wrote: | It doesn't matter so much how they perform the feature | extraction, so much as what their inputs to the feature | extraction are. | | This model requires a collection of wild-type proteins in | an accurate MSA. Producing an accurate MSA is hard even if | you have many homologs. | | They require protein homologs, which means they can "only" | do this for wild-type proteins. This work is useless with | mutant and synthetic proteins. This is a big advancement | that will assist crystallographers and NMR structural | biologists with difficult wild-type proteins, but it | doesn't "solve protein folding" by any stretch of the | imagination. | flobosg wrote: | > Producing an accurate MSA is hard even if you have many | homologs. | | To assess co-evolutionary couplings, the number of | homologs in the MSA is not as important as the number of | _effective sequences_ (i.e. sequence depth and diversity) | in it. | | > They require protein homologs which means they can | "only" do this for wild-type proteins. | | Even remote homologs work, as shown by the widespread use | of HMM-based methods in the prediction pipelines. | | > This work is useless with mutant and synthetic | proteins. 
| | Unless you generate a flurry of data with them using deep | mutational scanning for example. As long as correlated | mutations are present in the MSA the technique should | work as expected no matter where the protein sequences | originated. | gmorainbows wrote: | I'm honestly not familiar with "deep mutational | scanning." Can you share a link? I'm first author on | papers related to the structural biology of coevolution | and I competed in CASP about a decade ago, but I haven't | kept up much since then. | flobosg wrote: | Sure! Here's a paper about the method: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410700/ | | And another one about its application in structure | prediction: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295002/ | asdfasgasdgasdg wrote: | I don't think anyone on HN is going to have more authority to | qualify the results than the independent experts quoted in the | linked article. Among whom are numbered a Nobel laureate, the | president of the group that designs the tests of protein | folding systems, and the former CEO of Genentech+current CEO of | Calico. | dekhn wrote: | Art's a smart guy and I have a lot of respect for his | biological intuition, but his understanding of computational | biology is very limited. | asdfasgasdgasdg wrote: | I would imagine that he is not assessing this advancement | merely using his own personal expertise, but rather the | combined expertise of the resources he represents. CEOs | don't just look at problems and potential solutions. They | have people who look at those things, and then tell them | their opinion. In any case, you've picked a nit with one of | the three people quoted. Any objections to the other two? | dekhn wrote: | My main objection to Vivek (the Nobel Prize winner) is | the prize in that case should have gone to my advisor, | Harry Noller. John Moult... he's a nice guy but I think | he's being a bit breathless here. | asdfasgasdgasdg wrote: | I see. 
The co-founder of the organization that tests | protein folding is a "nice guy." | dekhn wrote: | CASP is not "the organization that tests protein | folding". It's _an_ organization that every two years | does a blind prediction and publishes the results (I 've | competed, some 20 years ago). John's a protein expert, no | question about it. I knew him moderately well back in the | day because our advisors moved in similar circles. | mrDmrTmrJ wrote: | dekhn, in what way is Art's "understanding of computational | biology very limited?" | | I'd love to hear more. Specifically, what do you think that | computational biology can do that you think Art doesn't | understand or credit? | WWWWH wrote: | Quite right. And the Nobel laureate in question is a | structural biologist--so his expertise is directly relevant. | verroq wrote: | So who will have access to this? DeepMind never publishes their | models. | randcraw wrote: | I suspect DM will sell this as a service, especially to | corporations like pharmas who create small molecule drugs. If | their method works as advertised, it may rejuvenate the | flagging prospects of Rational Drug Design, the guiding R&D | drug development methodology behind most new molecular entities | (drugs) for the past ~25 years, which has not proven to be the | clear economic win that had been hoped. | TomJansen wrote: | According to [1], they must release enough information for | others to replicate the AI model: "As a condition of entering | CASP, DeepMind--like all groups--agreed to reveal sufficient | details about its method for other groups to re-create it. That | will be a boon for experimentalists, who will be able to use | accurate structure predictions to make sense of opaque x-ray | and cryo-EM data." | | [1]: https://www.sciencemag.org/news/2020/11/game-has-changed- | ai-... | [deleted] | sjg007 wrote: | Looks like a transformer model. Anyone have any insights? 
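[Ed.: on the "looks like a transformer" question above: AlphaFold 2's exact architecture had not been published at the time of this thread, but the building block people mean by "attention" is scaled dot-product attention, sketched here in plain numpy as a generic illustration, not DeepMind's implementation.]

```python
import numpy as np

# Scaled dot-product attention, the core operation of transformer
# models. Each of the n query rows produces a weighted average of the
# m value rows, weighted by query-key similarity.

def attention(Q, K, V):
    """Q: (n, d), K: (m, d), V: (m, dv) -> output of shape (n, dv)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, m) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V

# For a protein, the rows could embed residues: attention lets every
# residue aggregate information from every other residue, regardless
# of how far apart they sit along the chain.
rng = np.random.default_rng(0)
out = attention(rng.normal(size=(5, 8)),   # 5 residue queries
                rng.normal(size=(5, 8)),   # 5 residue keys
                rng.normal(size=(5, 4)))   # 5 residue values
```

[Ed.: that all-pairs property is a plausible reason attention suits folding, where residues far apart in sequence end up adjacent in space; this remains a reader's sketch, not a description of AlphaFold 2.]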
| yk wrote:
| Very interesting. However, the problem now becomes characterizing
| such machine learning approaches. With traditional simulation
| methods the authors can usually explain easily in which situations
| a specific approach is good or bad; with neural networks we don't
| really have a good way to analyze the quality of a
| prediction.
| CJefferson wrote:
| Has anyone got any other good references for this? After some of
| the dodgy experiments related to AlphaZero (comparing to
| purposefully degraded chess systems), I'd love to see some
| independent analysis.
| sanxiyn wrote:
| CASP is that independent analysis...
| CJefferson wrote:
| True, but I haven't seen an independent discussion of the
| CASP results. There is a good chance this is great, but I
| don't trust DeepMind press releases.
| andi999 wrote:
| I am also wondering. I generally find these kinds of approaches
| hard to believe, but this might just be my prejudices.
| syncsynchalt wrote:
| The article in Science implies that we have independent
| confirmation of predictions yielding useful results, beyond the
| challenge itself:
|
| > The organizers even worried DeepMind may have been cheating
| somehow. So Lupas set a special challenge: a membrane protein
| from a species of archaea, an ancient group of microbes. For 10
| years, his research team tried every trick in the book to get
| an x-ray crystal structure of the protein. "We couldn't solve
| it."
|
| > But AlphaFold had no trouble. It returned a detailed image of
| a three-part protein with two long helical arms in the middle.
| The model enabled Lupas and his colleagues to make sense of
| their x-ray data; within half an hour, they had fit their
| experimental results to AlphaFold's predicted structure. "It's
| almost perfect," Lupas says. "They could not possibly have
| cheated on this. I don't know how they do it."
| bayeslaw wrote:
| As so many times recently, the HN crowd proves to be completely
| clueless and uneducated when it comes to AI. This is a miracle;
| it is THE achievement we'll remember from the past decade when it
| comes to AI. If you don't understand why, I recommend learning
| and reading. The level of ignorance, and often proud ignorance,
| here is frightening to me. People who downplay this are ignorant
| of either biochemistry or AI or both; please don't listen to
| them. This right here is the single biggest news of 2020.
| piva00 wrote:
| This sounds big, like really, really big. At least from my old
| days providing my idle computing resources to Folding@Home and
| following that project, this seems like the major golden
| milestone for protein folding.
| FrojoS wrote:
| Exactly what I was thinking. In a very small way many of us
| tried to help with this problem back in the day. Makes it feel
| even more important.
|
| Now I'm waiting for the equivalent news about SETI@Home ;-)
| iandanforth wrote:
| Title as submitted is hyperbole, please fix?
| breck wrote:
| I don't think it is. Look at the graph.
| dang wrote:
| We changed the title to that of the article as the site
| guidelines ask. Submitted title was "DeepMind Solved Protein
| Folding".
| sanxiyn wrote:
| It is not hyperbole.
| EgoIncarnate wrote:
| I agree. "AlphaFold achieves a median score of 87.0 GDT". While
| this is a major advance, to me 100 GDT would be 'solved', not
| 87.
| jhrmnn wrote:
| By this metric, nothing has ever been solved in the natural
| sciences. So this is not a useful metric.
| ClumsyPilot wrote:
| Has it not? Newton's laws of motion and Ohm's law are
| pretty on point
| joshuamorton wrote:
| Not when you introduce quantum effects.
| caymanjim wrote:
| Newton's laws of motion were not a complete solution, as
| they didn't account for relativity.
| piva00 wrote:
| If you can explain how gravity works at a quantum level
| you'd deserve a Nobel.
It's not 100% solved; Newton's
| Laws of Motion are a model, not a solution. Just like the
| vast majority of science.
| jhrmnn wrote:
| No, they are very crude (but useful!) models of reality.
| General relativity and quantum electrodynamics are much
| better corresponding models, respectively, and even those
| are just approximations.
| ashtonbaker wrote:
| > To me
|
| Are you a domain expert? Because:
|
| > According to Professor Moult, a score of around 90 GDT is
| informally considered to be competitive with results obtained
| from experimental methods.
| The_rationalist wrote:
| but experimental methods have not solved protein folding
| either. AlphaFold hasn't solved protein folding but I can't
| wait to see their progress for AlphaFold 3.
|
| What would be informatively useful would be to know how
| much accuracy is needed on average for drug engineers, I'd
| say that 99% is more likely to be the minimum to make solid
| inferences
| ashtonbaker wrote:
| > but experimental methods have not solved protein
| folding either.
|
| I might be missing something here, but isn't
| "experimental methods" just shorthand for "our best
| knowledge of a protein's structure, obtained via NMR or
| X-ray crystallography"? In that case, I'm not sure what
| "solving" protein folding even means - literally zero
| mean error? We can't know/solve anything beyond our best
| knowledge, that's tautological.
|
| > What would be informatively useful would be to know how
| much accuracy is needed on average for drug engineers.
|
| Yeah that would be interesting, but:
|
| > I'd say that 99% is more likely to be the minimum to
| make solid inferences
|
| ...what are you basing this on?
| The_rationalist wrote:
| It's pretty clear what solving means, it means to have an
| exact representation of the 3D structure. Our partial
| knowledge obtained from such techniques is what it is,
| partial.
We need new metrology that increases
| observational accuracy and completeness, OR better
| deterministic models from sequences.
|
| "We can't know/solve anything beyond our best knowledge,
| that's tautological." Yes, it is indeed tautological: if
| you assume that experimental methods can't get better,
| then guess what? It follows that they can't get better!
|
| "what are you basing this on?" On nothing solid; that's
| why I say it _would_ be interesting. 99% accuracy still
| leaves a non-negligible error rate given that proteins
| generally do not have a very high atom count and the
| protein will be produced an enormous number of times; the
| 1% error then propagates and can a priori easily break the
| system. But this guess is not solid, as I'm not an expert.
| A 1% error is significant for simple (low atom count)
| proteins and could be negligible for very high atom count
| proteins.
| ashtonbaker wrote:
| > It's pretty clear what solving means, it means to have
| an exact representation of the 3D structure.
|
| That's not clear at all, because perfect measurement
| doesn't exist. I agree that improving is always a worthy
| goal, but clearly we don't need 100% accuracy to consider
| something "solved" for the purposes of science. Also, the "3D
| structure" of a protein is not a fixed truth; the parts
| are in motion all the time and may even have multiple
| semi-stable conformations. Rather than focusing on X,Y,Z
| perfection, I would imagine getting the angles between
| bonds, or the general topological conformation, right
| would be more valuable.
|
| > if you assume that experimental methods can't get
| better ...
|
| I'm saying that if your definition of "solved" is
| "perfect knowledge", then we might as well not discuss
| whether method X or Y solves the problem, because they
| obviously do not.
|
| The more I think about it, the more I think we should
| just drop the whole debate over the word "solved".
Clearly different experiments and different proteins will
| have different requirements, which may or may not be met
| by this or by other techniques - I agree that I would be
| interested to hear an expert weigh in on those
| requirements.
| jjoonathan wrote:
| It's not.
| kgwgk wrote:
| I agree. If a newspaper published a headline "Dr. Whatever
| cured cancer (... in some of her patients)" we would find it
| misleading.
| deeviant wrote:
| If there were a headline, "Company X with Product Y cured
| cancer", and it turned out that Product Y actually only cured
| 90% of cancers, I'm pretty sure most people would be happy with
| the headline.
|
| Oh, and to make it a true parallel example: in this case the
| remaining 10% of cancers might not even be cancers. Since
| experimental determination of protein structures is itself only
| ~90% accurate, the model could very well be _more_ accurate than
| our current ability to experimentally determine protein
| structure.
| kgwgk wrote:
| I really interpreted that headline as "found a general
| solution to the protein-folding question", not as the also
| interesting, but lesser, "can be used to solve
| protein-folding problems".
| purpleidea wrote:
| Does this produce the various different foldings that each
| protein can often "sit" in?
|
| Can it take temperature and other environmental conditions into
| account?
|
| Can you specify that a particular ligand or electrical current is
| present so that you can see the resultant shape change?
|
| Is all the source code for this available so that other
| scientists can build on top of this, or will we have to go
| through a paid or SaaS Google API to use it?
| xyzal wrote:
| Does it mean there is no point in playing fold.it anymore?
| breck wrote:
| Yes, no point, as far as I understand it.
| hobofan wrote:
| fold.it was always more geared towards being edutainment than
| actually contributing solutions.
Of the ~20 publications made
| related to fold.it over a decade, only ~5 seem to have
| contributed to solving structures, while the rest are
| about the game itself.
| flobosg wrote:
| Besides structure prediction, Foldit is used for the inverse
| problem: protein design.
| IgniteTheSun wrote:
| Considering the resource requirements for this AI approach
| mentioned in the article, it's unlikely that it's been tested on
| more than a few tens to hundreds of proteins. This may only
| work on a subset of the proteome, so I would think it worth it
| to continue playing if you find it to be a fun pastime.
| nynx wrote:
| Those were the requirements for training it.
| Rochus wrote:
| Great. So then farewell CARA (http://cara.nmr.ch/doku.php), we
| had a good time.
| Rochus wrote:
| After I had some time to think about it, I came to a different
| conclusion. Contrary to my first assumption, Bio NMR (in
| contrast to crystallography) will become more and more
| important, since the method allows one to study the dynamic
| properties of proteins. With the structure predicted by DNNs,
| the chemical shifts to be expected in the NMR spectra can be
| calculated; the assignment problem is thus largely eliminated.
| Bio NMR can then be used specifically to study the "parts that
| move".
| breatheoften wrote:
| Anyone care to muse about appropriate investment strategies based
| on the not previously feasible research approaches that might now
| be possible?
|
| Should we expect to see faster progress in large, well-capitalized
| bioscience companies -- or a sudden increase in the viability of
| smaller biotechs and/or biotech startups ...? Are we gonna see top
| talent fleeing the old biotech companies to start their own
| ventures with a new belief that the potential for huge reward
| might suddenly seem achievable?
|
| What kind of companies do we think will be the first that are
| able to translate this new knowledge into profits?
| enchiridion wrote:
| I agree.
I think a company that is working on large-scale
| automated bio experiments would be well positioned to take
| advantage of something like this.
|
| What companies are doing that work?
| EgoIncarnate wrote:
| "AlphaFold achieves a median score of 87.0 GDT". Game changing,
| and a huge improvement, but not 100% solved. Also, this is for
| static folding. Dynamic folding and interaction is a much harder
| problem. Those need to be tackled too before I would consider
| protein folding 'solved'.
| nabla9 wrote:
| They solved the latest folding competition benchmark set.
|
| Shorter problems are easy to solve. The median score is a mix of
| easier and harder problems. Next year's competition will have a
| new set of much bigger and harder problems to solve.
|
| This seems like a leap, not "solved" as in having a solution that
| just works and scales.
| hans1729 wrote:
| >Those need to be tackled too before I would consider protein
| folding 'solved'
|
| Semantics. From a systems-theoretical point of view, dynamic
| folding is an abstraction of static folding; solve (i.e.
| understand the underlying mechanisms of) static folding and you
| can start progressing on dynamic folding, building on your
| previously achieved solution.
|
| Whether it's solved or not depends on whether you mean `general
| folding` or the `entire spectrum of folding` when considering
| the problem.
| 6gvONxR4sf7o wrote:
| Solve could mean understanding the underlying mechanism, but
| in this case, I don't think that's how they did it.
| hans1729 wrote:
| My intuition for deep learning was exactly that: statistical
| inference of underlying mechanisms. But I haven't read the
| paper yet, so you might be right.
| ramraj07 wrote:
| It's probably never going to be solved though, right?
To truly
| solve protein folding we'd have to have a program that can
| simulate a small but still significant system at the QM level;
| it looks like deep learning can get us 60% (conservatively
| estimating over the whole problem domain) but not all the edge
| cases, just as it did in other problem domains.
| dekhn wrote:
| It remains unclear whether QM is required to fold proteins
| accurately. So far classical methods have shown they require
| far less computer power to get far closer to the right
| structure.
| PaulDavisThe1st wrote:
| Despite this breakthrough by DeepMind, at this point we still
| do not _understand_ protein folding. That makes it very hard
| to say precisely which features would be required to do the
| simulation correctly.
|
| DeepMind/AlphaFold might have something to contribute there
| too, depending on how interpretable their network model(s?)
| are.
| ramraj07 wrote:
| They seem to have a completely new attention algorithm that's
| doing the heavy lifting now, so it's likely we will learn
| much about how folding practically works from these results
| as well.
| sabujp wrote:
| So it might "be over" for small molecules, but let's see large
| macromolecules and protein assemblies be predicted.
| tgbugs wrote:
| My conclusion reading this is that a gradient is a gradient is a
| gradient. If you can minimize one, you can minimize them all. The
| hard work would seem to be figuring out how to transform your
| problem into a gradient that your hardware can solve. It will
| also be interesting to see the kinds of systematic errors that
| will come as a result of the biases in the training set, and
| whether it can be used to predict what the structures would look
| like under slightly different conditions (e.g. pH).
| 29athrowaway wrote:
| So what's going to happen to fold.it and folding@home now?
| flobosg wrote:
| See https://news.ycombinator.com/item?id=25256318 and
| https://news.ycombinator.com/item?id=25256772
| mabbo wrote:
| Sometimes announcements like this are a bit over-the-top. But
| what really, to me, cements the 'big-deal' of this is the "Median
| Free-Modelling Accuracy" graph halfway down the page.
|
| Scores of 30-45 for 15 years. Now scores of 87-92.
|
| This isn't a minor improvement, it's a leap forward.
| martinpw wrote:
| Why is the graph not monotonically increasing? Does the
| complexity of the problem to be solved increase each time? If
| so, does that make the relative improvement from the previous
| result even more impressive?
| breatheoften wrote:
| That's quite interesting ... I believe the test set size is
| not constant year to year but rather a function of how many
| new structures have been experimentally discovered since the
| last contest?
|
| Does seem like the contest structure could include quite a
| bit of risk for hiding the effect of overfitting ... I wonder
| if there is anything inherent about the problem that reduces
| that risk ...?
| FrojoS wrote:
| My understanding is that it's always 100 new structures,
| which is a small fraction of the total structures
| identified in that year.
|
| The reason why the top score in one year can be lower than
| in the previous year is that the test (the 100 structures
| to guess) is always new and different, so it can end up
| being 'harder' than the year before. Luck will also play a
| small role.
|
| Another explanation for a reduction in the top score would
| be that previous winners are not re-submitted unchanged.
| For instance, AlphaFold v1 seems not to have been submitted
| to the latest competition.
| breatheoften wrote:
| Only 100 new structures each test cycle? That seems a
| very small test set size ...
|
| Is it really possible to select 100 new structures which
| together are likely to represent a meaningful increase in the
| sample generalization versus the prior year's test set
| ...?
| MauranKilom wrote:
| Given that we only _know_ the structure of on the order
| of 100k proteins, we might only get another 10k new ones
| per year, I guess.
|
| Using 1% of those (presumably from the more-often-
| reproduced subset) for this challenge seems reasonable?
| Note that the structures have to remain secret up until
| the challenge, and presumably all those teams uncovering
| the structures don't want to have to wait up to 2 years
| every time to actually make their results public.
| breatheoften wrote:
| Interesting ... plenty of opportunity then, potentially,
| for the 100 samples to have prediction similarity to the
| set of published discoveries (for expected or unknown
| reasons)?
|
| I suppose it will take a few more years of repetition for
| the challenge to confirm that the problem has been
| solved -- but I wonder if a new version of the contest is
| going to be needed as well? Maybe the model accuracy is
| now high enough to invert the contest into a form where
| models generate predictions for randomly selected unknown
| samples -- and experimental teams are then expected to
| make observations for those particular sequences over the
| next two years, as part of the experimental workload they
| would otherwise select from their own research agendas?
| entropicdrifter wrote:
| Not to mention the fact that two years ago they took it from
| 45% to >60%. If they can continue improving, even with an
| exponential decay in rate of improvement, this is certainly a
| stunning example of technological disruption.
| Zenst wrote:
| Even without any improvement, the amount of grunt-work the AI
| can pre-do and get down to a short-list - that in itself will
| speed research up.
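FrojoS's point that luck plays a role with only 100 targets per CASP can be made concrete with a quick simulation of how much a median over 100 scores can swing from "year" to "year". The score distribution below (mean 85, spread 10) is entirely made up for illustration; real CASP score distributions look different.

```python
import random
import statistics

random.seed(0)

# Hypothetical per-target GDT scores: assume roughly normal with mean 85
# and standard deviation 10 (invented numbers, purely illustrative).
def simulated_casp_median(n_targets=100):
    scores = [random.gauss(85, 10) for _ in range(n_targets)]
    return statistics.median(scores)

# Re-run many hypothetical "years" to see how far the median can drift
# purely from sampling 100 targets.
medians = [simulated_casp_median() for _ in range(1000)]
print(round(min(medians), 1), round(max(medians), 1))
```

Even with an identical underlying method, the reported median wanders by a few GDT points from the sampling of targets alone, which is one reason the CASP curve need not be monotone.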
| kordlessagain wrote:
| > and get down to a short-list
|
| There's no reason to believe the list will contain all
| solutions, however.
| patagurbon wrote:
| No, but it will hopefully contain _some_, which for many
| if not most problems is all that matters.
| WhompingWindows wrote:
| This reminds me of AlphaGo and AlphaZero. DeepMind was able to
| produce a very solid model on their first attempt, at both
| protein folding and at Go (and StarCraft II as well). Their
| second models, however, seemed to blow their first out of the
| water.
|
| This bodes extremely well for the future of computational
| biology; I'm very excited thinking about the prospects. If we
| know how a protein folds, we know its shape, meaning we know
| which shaped/charged molecules are needed to act as
| suppressors/enhancers of those proteins.
| layer8 wrote:
| One difference from AlphaZero though, if my understanding is
| correct, is that AlphaFold is trained on a predetermined data
| set and hence didn't learn how "arbitrary" proteins fold in
| general, but just how those proteins fold whose structures we
| already know. To work more like AlphaZero, AlphaFold
| would have to be able to synthesize arbitrary proteins and run
| the experiments on them to verify and correct its predictions.
| Therefore it's conceivable that AlphaFold is biased by the
| existing training data and doesn't fully generalize to all
| proteins we would want to apply it to. Maybe that won't be a
| problem in practice, but nevertheless it makes for a
| significant difference from what AlphaZero was about, being
| solely self-trained.
| the8472 wrote:
| > AlphaFold would have to be able to synthesize arbitrary
| proteins and run the experiments on them to verify and
| correct its predictions.
|
| Could this lead to a virtuous cycle where AlphaFold is used
| to generate a ton of random sequences where it has low
| confidence, those are then screened for ease of synthesis,
| measured, and the results used to improve the model?
|
| Edit: never mind, according to another comment[0] there are
| still plenty of real proteins without experimental data
| left to explore.
|
| [0] https://news.ycombinator.com/item?id=25255601
| treis wrote:
| That is an impressive improvement, but I think you've missed
| the most important point:
|
| >a score of around 90 GDT is informally considered to be
| competitive with results obtained from experimental methods
|
| So DeepMind is to the point where it's a question of whether
| their generated model or the experimentally determined
| structure is closer to the actual physical structure.
| timr wrote:
| _"So DeepMind is to the point where it's a question of
| whether their generated model or the experimentally
| determined structure is closer to the actual physical
| structure."_
|
| While this is an accomplishment, nobody is going to be
| confusing these models with structures produced
| experimentally. The CASP metric is for backbone atoms. To
| have a useful model of protein structure, you really need to
| have the positions of the protein side-chain atoms modeled
| correctly. Experimental methods will do that, but this
| method, as I understand it, does not.
| jey wrote:
| So it's a really good start, but nobody is going to be
| throwing these structures into molecular docking
| simulations for drug discovery etc. just yet. But
| hopefully those details can be worked out soon enough.
| timr wrote:
| Yeah, there's a huge difference between a 1A _all-atom_
| RMSD structure and a 1A _backbone_ RMSD structure. The
| non-backbone atoms in a protein make up most of the mass
| and volume. When structural biologists talk about RMSD,
| this is what they mean.
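timr's backbone-vs-all-atom distinction is easy to make concrete: RMSD is just the root-mean-square distance between matched atoms, and the number depends entirely on which atoms you include. A minimal sketch with invented toy coordinates (the superposition/alignment step that real RMSD calculations require is omitted):

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) coordinates, assuming the structures are already superposed."""
    assert len(coords_a) == len(coords_b)
    total = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Toy residue: the "backbone" atoms agree closely, the "side-chain" atoms
# do not (all positions invented for illustration).
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0),   # backbone atoms
                (3.0, 1.0, 0.0), (3.5, 2.5, 1.0)]   # side-chain atoms
predicted    = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0),
                (4.0, 2.0, 1.0), (5.0, 4.0, 2.0)]

backbone_rmsd = rmsd(experimental[:2], predicted[:2])  # ~0.12 A
all_atom_rmsd = rmsd(experimental, predicted)          # ~1.46 A
print(backbone_rmsd, all_atom_rmsd)
```

The same pair of structures scores an order of magnitude worse on the all-atom measure, which is the gap timr is pointing at.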
| mabbo wrote:
| Then we get the really fun question: if the experimentally
| determined structure is only 90% accurate, can machine
| learning actually reach 100%? Can you learn exact truth from
| inexact examples?
|
| Which gets into the concept of whether the ML model has
| _actually_ learned some deeper conceptual ideas than we have,
| some deeper truth about how this works. If so, can we somehow
| extract that truth, or is it truly a black box that does the
| thing we want?
|
| I'm reminded of a sci-fi book I read long ago in which humans
| are discussing the fact that the science they are utilizing
| is beyond the scope of a human mind to comprehend - only the
| AIs can intuitively deal with 12-dimensional manifolds (or
| something to that extent). Maybe we've reached the doorstep
| of that future.
| carlmr wrote:
| If you have an experimental error that is somewhat normally
| distributed around the mean, then the AI should, with enough
| examples, learn the rules that are closest to the mean,
| because it will minimize the sum of errors.
|
| So I do think the results could be more accurate than the
| measurement.
| SubiculumCode wrote:
| Something like this comes up in assessing the accuracy of
| automated segmentation results of brain regions, e.g. the
| hippocampus. Human-machine reliability is approaching
| human-to-human reliability, so it becomes harder to improve
| the automated methods.
| asdfasgasdgasdg wrote:
| I have a related question about this. If experimental methods
| produce results around a score of 90, what is the baseline we
| are comparing the DeepMind results against? If the
| experimental error is equal to the observed DeepMind error,
| how can we say which one is actually more erroneous?
| mrDmrTmrJ wrote:
| Excellent question. At some point, I think the only answer
| is, "have a bunch of different people run a bunch of
| experiments on the same protein."
|
| The threshold for "real" in particle physics is +5 sigma.
Which takes a lot of data.
| cpeterso wrote:
| And is it even meaningful for DeepMind to score better than
| experimental results? How are DeepMind's results scored,
| then?
| IfOnlyYouKnew wrote:
| The "experiments" here use X-ray crystallography. Like most
| methods of measuring anything, we have a pretty good idea
| of its accuracy under various conditions.
|
| Think of it like satellite imagery of a tree: A score of
| zero would be a single green-ish pixel, while a score of
| 100 would show each leaf within the range it naturally
| moves in due to wind etc. (proteins tend to wiggle quite a
| bit under natural conditions, as well)
| 0-_-0 wrote:
| That's a damn good question; it looks like we don't know
| how much above 90 AlphaFold is.
| [deleted]
| marcosdumay wrote:
| Finding the energy of each configuration should be much
| easier than finding the lowest-energy configuration. Can
| that be calculated ab initio or is it still too expensive?
| crispycrafter2 wrote:
| The problem with ab-initio methods in this context is the
| sheer number of non-covalent interactions present in
| these large proteins. A simple protein would require a
| hybrid quantum mechanics/molecular mechanics (QM/MM)
| simulation to even approximate the vibrational energy
| required to validate equilibrium.
|
| These proteins are so massive that we often use Daltons
| [1] as an averaged measure of molecular weight.
|
| Conceptually, one of the most promising applications of
| quantum computing is theoretical chemistry, and we are
| only now starting to make progress in this avenue [2]. I
| anticipate it would require quantum computing to
| explicitly optimise large folded proteins.
|
| 1. https://en.m.wikipedia.org/wiki/Dalton_(unit) 2.
| https://arxiv.org/abs/2004.04174
| radioactivist wrote:
| I think it's that a score of >90 means the result is within
| the error bars of whatever particular experiment was chosen
| to be the "reference".
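For reference, the GDT_TS score discussed throughout this thread is roughly the average, over distance cutoffs of 1, 2, 4, and 8 angstroms, of the percentage of residues whose backbone CA atom falls within that cutoff of the reference structure, which is why a score above ~90 means nearly every residue sits within experimental-scale error bars. A minimal sketch with invented toy coordinates (the superposition step used in the real metric is omitted):

```python
import math

def gdt_ts(coords_model, coords_ref):
    """Rough GDT_TS: mean over the cutoffs 1, 2, 4, and 8 angstroms of the
    percentage of residues whose CA atom lies within that cutoff of its
    reference position (structures assumed already superposed)."""
    dists = [math.dist(m, r) for m, r in zip(coords_model, coords_ref)]
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy 4-residue chain with CA deviations of 0.5, 1.5, 3.0 and 10.0 angstroms.
ref   = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (5.5, 0.0, 0.0), (11.0, 0.0, 0.0), (22.0, 0.0, 0.0)]
print(gdt_ts(model, ref))  # 56.25
```

Because the largest cutoff is 8 angstroms, a badly misplaced residue saturates rather than dominating the score, which is what makes GDT more forgiving than RMSD for partially wrong models.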
| contravariant wrote:
| Of course, this may no longer be the case for methods solely
| trained to optimize that particular metric.
| mFixman wrote:
| I don't have a background in biology, and that quote confused
| me.
|
| What's an experimental method for protein folding and why is
| it so good? Are they talking about creating an actual,
| physical protein in a lab and observing how it folds?
| flobosg wrote:
| > Are they talking about creating an actual, physical
| protein in a lab and observing how it folds?
|
| Exactly. Researchers purify the folded protein and then use
| methods such as X-ray crystallography, nuclear magnetic
| resonance, and cryo-electron microscopy to determine its
| three-dimensional atomic structure.
| beowulfey wrote:
| I don't think you can say DeepMind could ever be more
| faithful to the true physical structure, since it was built on
| the same experimental structures that it is being compared
| to. The limit of accuracy is the experimental data. However,
| I think we can say that a DeepMind prediction could at least
| be _as good as_ a new experimental structure.
| dwiel wrote:
| This seems like an obvious assumption to make, but it isn't
| always true. It is easier to see why if you are measuring a
| single value multiple times in order to get a more accurate
| estimate of the true value. In that case your "model" is
| simply the mean of all measurements made and can exceed the
| accuracy of a single measurement.
|
| In this case, the model is predicting values of multiple
| structures, but patterns could still theoretically be found
| which allow for predictions beyond the accuracy of a single
| measurement.
| dekhn wrote:
| DM is merging several sources of experimental data: known x-ray
| structures, and evolutionary data. The experimental method
| (xray) doesn't take advantage of the evolutionary data.
And | it also doesn't model the underlying protein behavior | accurately (xray basically assumes a single static model | with atoms fluctuating in little gaussian "puffs" around | the atomic centers, but that's not how most proteins | behave). | FrojoS wrote: | Is that true? I thought fundamentally, the simulation tries | to find the state of lowest energy, which is defined by | physics. So, your result can be better than the data set | used for training. | robocat wrote: | But DeepMind could be used to find errors in the training | set. | | Let's say you have 100000 proteins in the training set. Now | remove #1 and train on 99999, and then check that it still | predicts the same protein result for #1 as the experimental | result. | | Or remove from training whole sets of proteins by | particular teams to find systematic errors made by teams? | mactrey wrote: | I'm not a biologist but I'm not sure that follows. It could | be that the experimentally-derived structure is 100% accurate | to the actual physical structure but getting 90% of your | predicted residues to match that is enough to get an accurate | prediction of protein behavior and hence "competitive." | phonebucket wrote: | This is a huge jump forward. Last year's performance already was | a big step up over the previous, and this seems to go much | further. So big kudos to the research team. | | Nonetheless, I'd like to hear more from specialists outside the | context of a marketing blog post before I fully buy into a claim | of a solution. | | There's also a rabbit hole about what 'solution' actually means. | Is the performance sufficient for any protein folding prediction | application that might arise in the future? | flobosg wrote: | See also the news in Nature: | https://www.nature.com/articles/d41586-020-03348-4 | yarabarla wrote: | Man, I remember running folding@home years ago on my terrible | laptop. Now this was done with what they say is equivalent to | only 100-200 GPUs. 
Crazy to see how far we've come in just a
| short amount of time.
| sumtechguy wrote:
| me too... should have done bitcoins :)
| jjk166 wrote:
| Now onto the much harder problem of doing the reverse: taking an
| arbitrary structure and determining an amino-acid sequence that
| will fold into it.
| Rochus wrote:
| What for?
| jjk166 wrote:
| The forward folding problem lets you determine structures
| from a known genetic sequence. So, for example, you could very
| quickly sequence the genome of a virus and figure out how it
| works much faster than current methods allow.
|
| The reverse folding problem lets you specify a structure and
| then make a genetic sequence to produce it. For example, you
| could look at this virus to see how it infects its host, then
| design a custom protein to act as an antibody stopping it,
| which is a capability we don't currently have.
|
| Forward folding is certainly useful, but reverse folding
| would be revolutionary.
| Rochus wrote:
| The set of all proteins which can potentially be expressed
| in an organism is known. Now maybe we also get decent
| (static) structure information for these. But the
| interaction of a virus with the host cell is much more
| complex. There is much more than just an amino acid
| sequence involved. And these parts are all moving, so a
| static picture, which we can now create faster than before,
| does not contain all the information necessary to fully
| understand the functions.
| jjk166 wrote:
| Precisely why I referred to it as a different and harder
| problem.
| Rochus wrote:
| There are a lot of different harder problems.
| jjk166 wrote:
| So?
| weregiraffe wrote:
| >The set of all proteins which can potentially be
| expressed is known.
|
| Sure, "known", but it's on the order of 20^10000. It
| won't fit in the entire visible volume of the universe.
| Rochus wrote:
| No, the genome of the host is much smaller than the
| theoretical number of combinations.
There are about 20 to | 30k different proteins in a human cell (about 20k | directly encoded on the DNA). | jjk166 wrote: | If you are designing proteins, you're not limited to | those that are already encoded in the host's DNA. | Rochus wrote: | Right, but you made the example with the virus docking at | a known organism. If you do synthetic biology and modify | bacteria to produce any proteins then the situation is | different of course. | ramraj07 wrote: | The other comment mentioned the example of making proteins | that bind a structure. Here's an extension: a general | understanding of how an enzyme works to catalyze chemical | reactions is that it binds the reaction intermediate with | higher affinity than the two substrates; thus if we have this | reverse ability, we can start inventing enzymes that can | catalyze any arbitrary chemical reaction, even ones that need | energy input, so you could imagine for example enzyme systems | that can convert plastic to fuel! | Rochus wrote: | Ok, then this is about enzymes which do not yet exist in | the organism. You could then modify bacteria so they | produce this enzyme and feed on plastic, I see. | flobosg wrote: | Plastic degradation is a thing already in naturally | occurring bacteria that evolved a PETase: | https://science.sciencemag.org/content/351/6278/1196/tab- | fig... | Rochus wrote: | But producing fuel as the fellow suggested would then be | another function to be added to the bacterium; and maybe | it should work on different kinds of plastic. | flobosg wrote: | Of course, that's why I focused on degradation. There's | plenty of room for improvement. For instance, PETase is | not very efficient actually, and many research groups are | working on its engineering. 
Natural proteins | have no reason to be particularly predictable, just as genetic | programming produces hard-to-understand programs relative to | human-written ones. In fact making the structure especially | stable against perturbations seems like it'd make it less | responsive to changing evolutionary pressures. | | (Am not a structural biologist.) | | Added: 2019 article on de novo design: | https://www.nature.com/articles/d41586-019-02251-x Not to say | that better prediction won't also make design easier -- of | course I expect it will. | flobosg wrote: | Deep learning methods are being applied here as well; see for | example | https://www.biorxiv.org/content/10.1101/2020.07.22.211482v1 | ashtonbaker wrote: | I assume if the forward direction is fast enough, the reverse | could be done by evolutionary methods. | Metacelsus wrote: | David Baker's lab is working on this; their Rosetta program has | been getting reasonably good at it. | uuuuuuuuuuuu wrote: | I feel like DeepMind has a disproportionately large scientific | impact relative to its resource pool. How would one (or a group) | go about replicating its success? | ribrars wrote: | I think the key here to replicating the success is deploying | deep learning effectively. But I would argue that | DeepMind's resource pool is immense; it's backed by Google. The | resources of GPUs (and more advanced TPUs) are in | abundance... not to mention the many brilliant PhD scientists | who work there. | danaris wrote: | The title here is not merely breathless clickbait, it also has | very little to do with the headline of the actual article, which | is "AlphaFold: a solution to a 50-year-old grand challenge in | biology". | | I thought the #1 criterion for titles was that they should match | the original if at all reasonable...? 
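ashtonbaker's suggestion above — inverting a fast forward predictor with evolutionary search — can be sketched as a toy. Everything here is a stand-in: `structure_distance` just counts mismatches against a hidden target sequence, whereas a real version would fold each candidate and compare the predicted structure to the desired shape; all names are invented for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def structure_distance(seq, target_seq):
    # Stand-in for a real forward-folding scorer: count mismatches
    # against a hidden "target". A real search would fold `seq` and
    # compare the predicted structure to the target shape instead.
    return sum(a != b for a, b in zip(seq, target_seq))

def evolve(target_seq, generations=2000, pop_size=32, seed=0):
    """Toy evolutionary search: mutate, score with the forward model, keep the best."""
    rng = random.Random(seed)
    n = len(target_seq)
    best = "".join(rng.choice(AMINO_ACIDS) for _ in range(n))
    best_score = structure_distance(best, target_seq)
    for _ in range(generations):
        for _ in range(pop_size):
            # Single-point mutation of the current best candidate.
            pos = rng.randrange(n)
            mutant = best[:pos] + rng.choice(AMINO_ACIDS) + best[pos + 1:]
            s = structure_distance(mutant, target_seq)
            if s < best_score:
                best, best_score = mutant, s
        if best_score == 0:
            break
    return best, best_score
```

With a toy scorer this converges easily; the open question in the thread is whether a learned forward model is accurate (and cheap) enough to make this loop useful for real design.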
| woeirua wrote: | This is a big step forward, but the outstanding question as far | as to whether or not this is useful for evaluating novel | proteins, is going to be how good is the confidence metric at | telling the user to trust or not trust the results. You can see | from their examples that AlphaFold is very good but not perfect. | I imagine for some proteins it will still give misleading or | erroneous results, and if you can't tell when that happens without | verifying the structure experimentally then this will likely not | be that useful for new science. | mcshicks wrote: | I was wondering the same thing. But I also wonder if having | good guesses makes the x-ray crystallography and other | experiments to verify a given protein easier/cheaper/quicker? I | don't know enough about the actual techniques to have an | informed opinion but I would think it would be helpful. | sanxiyn wrote: | It does. https://www.nature.com/articles/d41586-020-03348-4 | reports a case of x-ray crystallography helped by AlphaFold | prediction. | asdfasgasdgasdg wrote: | > the outstanding question as far as to whether or not this is | useful for evaluating novel proteins | | That is not an outstanding question. The test on which DeepMind | scored high marks is a test of how well the algorithm folds | novel proteins -- proteins whose ground-truth structure has not | yet been published. | sundarurfriend wrote: | You missed the actual outstanding question in their comment: | | > the outstanding question ... is going to be how good is the | confidence metric at telling the user to trust or not trust | the results. | deeviant wrote: | You don't generally look at neural network output like | that. | | There is generally a threshold: less than X is not the class; | equal or more is the class. 
Then you run the network with | the same threshold on a known data set and compute a | confusion matrix, which tells you about the error. I don't | even want to know what a confusion matrix analogue for 3D | geometry would look like, but I'm sure they have something. | | This is literally the process that one does in taking part | in this. And the error rate (specifically the lack of | errors) is what everybody is talking about. 90 is just | as accurate as we can get with experimental measurement. | It's likely at this point the source of error is in the | data set (we can only train on data we experimentally | measure and these are not perfect measurements). It's also | possible, at this point, the model generalized so well that | when it deviates from experimental measurements _it's | actually correct and the experimental value was the one | that was wrong_. | | So no, the outstanding question is not "is going to be how | good is the confidence metric at telling the user to trust | or not trust the results.". Nobody is going to be looking | at confidence values when the model is giving an output; they | are going to be looking at the overall error rate across a | broad spectrum of proteins to get a sense of its accuracy. | woeirua wrote: | We'd have to see the distribution of GDT scores evaluated on | unknown proteins to say anything about how confident we can | be. If the distribution is tightly clustered around the | median then great, this works really well. If the variance is | large though then you're going to have a hard time using this | for meaningful predictions. | foota wrote: | According to the article there's a confidence score as | well. As long as this is sufficiently predictive of errors | either a tight or wide distribution is likely acceptable. | woeirua wrote: | We need to see the relationship between confidence and | GDT score. If you have a nice relationship then again | everything is great. But... 
most confidence metrics from | neural networks do not have a nice relationship to the | primary metric. | theptip wrote: | It's a good question, and I'm not a domain expert here. | | The article did claim: | | > According to Professor Moult, a score of around 90 GDT is | informally considered to be competitive with results obtained | from experimental methods. | | So perhaps their score of 87 GDT is pretty significant. But | "competitive with" is not the same as "always in agreement | with", as you point out. Could be the failure modes are | problematic. | kxs wrote: | There are other, much cheaper experimental methods that | can be used to assist validation. Also the models look damn | impressive, even down to the sidechain packing. | aardvarkr wrote: | Every simulator is going to have error. In this case this | biennial challenge represents the computational state of the | art with scores of 30-40 over the last decade. The AlphaFold2 | model sends that score up to 87, with errors about the | width of an atom. You can actually see the difference between | their prediction and the actual result and it's stunning. This | is all on the blog site so I recommend reading before throwing | shade. | woeirua wrote: | I read the blog. But there's a big difference between a mind- | blowing tech demo and something that can be used in a | commercially viable process. | comicjk wrote: | Scientists can verify that an AlphaFold-predicted structure is | correct, or at least useful, without being able to get the | structure experimentally. For instance, we could use the | AlphaFold-predicted structure to do protein-ligand binding | calculations for a bunch of known molecules. If these | calculations agree with experimental protein-ligand binding | (which they generally do for proteins with known structures), | then we can say with high confidence that we've got a good | structure. | dontbeevil1992 wrote: | does that mean that protein-folding is sort of in NP? 
| hedora wrote: | It's probably not in NP, in that there is not a polynomial- | time algorithm that checks solutions for correctness. | dekhn wrote: | The way computer scientists do it, yes, it is. In the CS | situation you define an energy function (in this case | representing the physical behavior of the protein in water) | and find a heuristic to approximate the coordinates of the | lowest energy configuration; done, problem solved. | | In reality, that's not how it works at all. The energy | functions we have are crappy and require too much sampling | before we can find the lowest energy configuration. And | more importantly, it doesn't look like proteins typically | fold to their lowest energy configuration (with the | exception of some small, fast two-state folders), but rather | explore a kinetically accessible region around there (or | even somewhere else entirely, if the energy cost to | transition is too high). | | Methods like AF depend heavily on large amounts of | correlated information from evolutionary data, which has | historically been of the highest value for making decisions | about protein structure. | jeffxtreme wrote: | GDT_TS for AlphaFold is now comparable to experimental levels; | but that's based on the class of proteins for which we've been | able to determine the 3D structure of the protein, where | there might be selection bias. | | I wonder if we can determine if this extends to proteins that | aren't as keen to determining their 3D structure? | | For example, certain proteins are more crystallizable than | others. For these non-crystallizable proteins, I wonder if we | can say that AlphaFold would generate accurate 3D models? And if | possible, might there be a way to map out this uncertainty? | deeviant wrote: | > I wonder if we can determine if this extends to proteins that | aren't as keen to determining their 3D structure? | | This has already happened. 
| | "An AlphaFold prediction helped to determine the structure of a | bacterial protein that Lupas's lab has been trying to crack for | years. Lupas's team had previously collected raw X-ray | diffraction data, but transforming these Rorschach-like | patterns into a structure requires some information about the | shape of the protein. Tricks for getting this information, as | well as other prediction tools, had failed. "The model from | group 427 gave us our structure in half an hour, after we had | spent a decade trying everything," Lupas says." | | From: https://www.nature.com/articles/d41586-020-03348-4 | jeffxtreme wrote: | Agree this is great to hear, but the fact that they had X-ray | diffraction data indicates this protein was indeed | crystallizable no? | | Though the next paragraph in the article shows that DeepMind | is indeed working on mapping out reliability: | | "Demis Hassabis, DeepMind's co-founder and chief executive, | says that the company plans to make AlphaFold useful so other | scientists can employ it. (It previously published enough | details about the first version of AlphaFold for other | scientists to replicate the approach.) It can take AlphaFold | days to come up with a predicted structure, which includes | estimates on the reliability of different regions of the | protein. "We're just starting to understand what biologists | would want," adds Hassabis, who sees drug discovery and | protein design as potential applications." | flobosg wrote: | > Agree this is great to hear, but the fact that they had | X-ray diffraction data indicates this protein was indeed | crystallizable no? | | Yes. CASP uses as targets proteins with no known published | structure but a solved or soon-to-be-solved one. They are | then kept on hold until the end of the competition. | dalbasal wrote: | Question for the wise: | | Assuming optimistic further progress, what are the implications | of accurately predicting protein folding? 
What are we hoping to | discover, or succeed in doing? | spenczar5 wrote: | AlphaFold was used to analyze proteins in SARS-CoV-2 | (https://www.crick.ac.uk/news/2020-03-05_crick-scientists- | sup...). Does anyone know what impact that has had? | | This is really an amazing moment. | nmca wrote: | https://deepmind.com/blog/article/alphafold-a-solution-to-a-... | optimalsolver wrote: | Can just anyone enter this challenge, or do you have to be part | of a major institution? | dmd wrote: | Anyone. | Quarrel wrote: | As someone who wrote a thesis many moons ago about protein | folding, this is pretty astonishing to see. Yay science. | optimalsolver wrote: | Can just anyone enter this challenge, or do you have to be part | of a major institution? | flobosg wrote: | I think anyone can take part. There are a few unaffiliated | participants. | schemescape wrote: | Did AlphaFold2 also have the biggest budget? :) | | Edit: from the other HN article on this topic: | | > We trained this system on publicly available data consisting of | ~170,000 protein structures from the protein data bank together | with large databases containing protein sequences of unknown | structure. It uses approximately 128 TPUv3 cores (roughly | equivalent to ~100-200 GPUs) run over a few weeks | | https://deepmind.com/blog/article/alphafold-a-solution-to-a-... | [deleted] | curiousllama wrote: | Actually, no! Or at least, the budget they used (<$100k at | retail prices to train the model) is well within the feasible | range for other research institutions. | | In other words, it's less like GPT3 and more like ImageNet. | whimsicalism wrote: | I don't know - tens of thousands per train is not accessible | for most academic institutions when you consider the | necessity of ablation studies, experimentation, etc. 
| anchpop wrote: | For a topic like protein folding, it should be | whimsicalism wrote: | > For a topic like protein folding, it should be | | Well, I've worked in some academic deep research labs and | they did not have the money to do the experiments they | wanted to do. | lacksconfidence wrote: | Is the cost to train really the relevant metric for | developing this? It seems like the salaries involved are | probably at least 10x whatever they spent on hardware. | TomJansen wrote: | >Is the cost to train really the relevant metric for | developing this? | | Yes, because they must release sufficient information for | others to recreate the AI model, according to the rules of | entering CASP. | lacksconfidence wrote: | I was replying in the context of the grandparent: | | > Did AlphaFold2 also have the biggest budget? :) | | And then the parent | | > Actually, no! Or at least, the budget they used (<$100k | at retail prices to train the model) is well within the | feasible range for other research institutions. | | I'm not sure how the cost of replicating the model in the | future is relevant in this context. We appear to be | discussing the cost of developing this model from | scratch, such as what it would have taken an alternate | team to create and submit this if DeepMind never got | involved. | kevincox wrote: | Additionally, the training cost of all of the models during | development. | [deleted] | partingshots wrote: | I continue to be impressed by how quickly DeepMind has managed to | progress in such a short time. CASP13 was a shocker to all of us | I think, but many were skeptical as to the longevity of the | performance DeepMind was able to achieve. I believe with CASP14 | rankings now released, it's safe to say that they've proven | themselves. | | Congratulations to the team! This work will have far reaching | impacts, and I hope that you continue to invest heavily in this | area of research. 
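For context on the budget discussion above, the <$100k figure can be reproduced with rough arithmetic from the blog's stated resources (128 TPUv3 cores for "a few weeks"). The price per board-hour and the exact duration below are assumptions at retail on-demand rates, not published numbers:

```python
# Back-of-envelope training cost from the blog's stated resources.
# The price and the "few weeks" duration are assumptions, not published figures.
tpu_v3_cores = 128
cores_per_v3_8_board = 8                       # one Cloud TPU v3-8 exposes 8 cores
boards = tpu_v3_cores // cores_per_v3_8_board  # 16 boards
price_per_board_hour = 8.00                    # assumed retail on-demand USD rate
weeks = 3                                      # "a few weeks"
hours = weeks * 7 * 24                         # 504 hours
cost = boards * price_per_board_hour * hours
print(f"~${cost:,.0f}")                        # prints ~$64,512
```

Under these assumptions the single training run lands well under $100k, consistent with curiousllama's estimate; the thread's counterpoint — that development-time experiments and salaries multiply this — still stands.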
| [deleted] | whimsicalism wrote: | Progress like this was, in my view, inevitable after the | invention of unsupervised transformers. | | It'll be genetics next. | | e: although AlphaFold appears to be convolutionally based! I | suspect that'll change soon. | alquemist wrote: | FWIW, transformers are to sequences what convnets are to grids, | modulo important considerations like kernel size and | normalization. Think of transformers as really wide (N) and | really short (1) convolutions. Both are instances of | graphnets with a suitable neighbor function. Once | normalization was cracked by transformers, all sorts of | interesting graphnets became possible, though it's possible | that stacked k-dimensional convolutions are sufficient in | practice. | whimsicalism wrote: | I work in the field; I don't need the difference explained | to me. | | > Think of transformers as really wide (N) and really short | (1) convolutions | | Modern transformer networks are not "really short" and | you're also conflating the difference between intra- and | inter- attention. | | There is still a pitched battle being waged between | convnets and transformers for sequences, although it looks | like transformers have the upper hand accuracy-wise right | now, convnets are competitive speed-wise. | klmr wrote: | > _It'll be genetics next._ | | Which part of genetics are you thinking of? Much of genetics | isn't amenable to this kind of ML, because it isn't some kind | of optimisation problem. And many other parts don't require | ML because they can be modelled very closely using exact | methods. ML _does_ get used here, and sometimes to great | effect (e.g. DeepVariant, which often outperforms other | methods, but not by much -- not because DeepVariant isn't | good, but rather because we have very efficient | approximations to the exact solution). | whimsicalism wrote: | What do you mean? 
| | Genetics is amenable because the genome is a sequence that | can be language modeled/auto-regressed for depth of | understanding by the network. | | There are plenty of inferences that you would want to do on | genetic sequences that we can't model exactly and there is | some past work on doing stuff like this, although biology | is usually a few years behind. | | https://www.nature.com/articles/s41592-018-0138-4 | | e: for clarity | garmaine wrote: | This is word salad. | whimsicalism wrote: | Rude. I would appreciate substantive criticism, | especially when I'm linking papers in Nature starting to | do _exactly_ what I 'm talking about. | garmaine wrote: | I cannot give constructive feedback to something which is | incomprehensible. | | "the genome is a sequence that can be language | modeled/auto-regressed for depth of understanding by the | network" | | The genome is not a sequence so much as a discrete set of | genes which are themselves sequences which specify | construction plans for proteins. That distinction is | important. | | Language modeling in the context of machine learning | typically means NLP methods. Genetics is nothing like | natural language. | | Auto-regression is using (typically time series) | information to predict the next codon. This makes very | little sense in the context of genetics since, again, the | genetic code is not an information carrying medium in the | same sense as human language. Being able to predict the | next codon tells you zilch in terms of useable | information. | | "Depth of understanding by the network" ... what does | that even mean??? | | The above sentence is a bunch of popular technical jargon | from an unrelated field thrown together in a nonsensical | way. AKA word salad. | whimsicalism wrote: | > The genome is not a sequence so much as a discrete set | of genes which are themselves sequences which specify | construction plans for proteins. That distinction is | important. | | aka a sequence. 
"a book is not a sequence so much as a | discrete set of chapters which are themselves sequences | of paragraphs which are themselves sequences of | sentences" -> still a sequence | | these techniques are already being used, such as in the | paper I just linked. | | > Being able to predict the next codon tells you zilch in | terms of useable information. | | You have absolutely no way of knowing that apriori. And | autogressive tasks can be more sophisticated than just | next codon. | | > bunch of popular technical jargon from an unrelated | field thrown together in a nonsensical way | | Okay, feel free to think that. | | There's always this assumption of it "will never work on | _my_ field. " I've done work on NLP and on proteins and | read others' work on genetics. I think you will end up | being surprised, although it might take a few years. | klmr wrote: | I meant, which _specifics_ are you thinking of? | | > _Genetics is amenable because it is a sequence_ | | Not sure what you mean by that. Genetics is a field of | research. The _genome_ is a sequence. And yes, that | sequence can be modelled for various purposes but without | a specific purpose there's no point in doing so (and | furthermore doing so without specific purpose is trivial | -- e.g. via markov chains or even simpler stochastic | processes -- but not informative). | | > _There are plenty of inferences that you would want to | do on genetic sequences_ | | I'm aware (I'm in the field). But, again, I was looking | for specific examples where you'd expect ML to provide | breakthroughs. Because so far, the reason why ML hasn't | provided many breakthroughs in less about the lack of | research and more because it's not as suitable here as | for other hard questions. For instance, polygenic risk | scores (arguably the current "hotness" in the general | field of genetics) can already be calculated fairly | precisely using GWAS, it just requires a ton of clinical | data. 
GWAS arguably already uses ML but, more to the | point, throwing more ML at the problem won't lead to | breakthroughs because the problem isn't compute bound or | vague, it's purely limited by data availability. | | I could imagine that ML can help improve spatial | resolution of single-cell expression data (once again ML | is already used here) but, again, I don't think we'll see | improvements worthy of being called breakthroughs, since we're | already fairly good. | whimsicalism wrote: | > Not sure what you mean by that | | I spoke loosely, my mind skipped ahead of my writing, and | I didn't realize that we were parsing so closely. | "Genetics (the field) is amenable because the object of | its study (the genome) is a sequence" would have been | more correct but I thought it was implied. | | > without a specific purpose there's no point in doing so | | Well yes, prior to the success of transfer learning I | could see why you would think that is the case, but if | you've been following deep sequence research recently | then you would know there are actually immense benefits | to doing so because the embeddings learned can then be | portably used on downstream tasks. | | > it's purely limited by data availability. | | Yes, and transfer learning on models pre-trained on | unsupervised sequence tasks provides a (so-far under- | explored) path around labeled data availability problems. | | I already linked to a paper showing a task on which these | sorts of approaches outperform, and that is without using | the most recent techniques in sequence modeling. | | Maybe read the paper in Nature that uses this exact LM | technique to predict the effect of mutations before | assuming that it doesn't work: https://sci-hub.do/10.1038/s41592-018-0138-4 | | I am not directly in the field, you are right - but I | think you are also being overconfident if you think that | these approaches are exactly the same as the HMM/Markov | chain approaches that came before. 
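The "trivial via Markov chains" baseline klmr mentions, and the mutation-effect scoring whimsicalism describes, can be made concrete in a few lines: fit k-mer transition counts on a sequence, then score a point mutation by the change in log-likelihood under that model. This is only a toy sketch (the linked Nature paper uses deep models, not this), and all names here are invented for illustration:

```python
import math
from collections import defaultdict

def fit_kmer_model(sequence, k=2, alphabet="ACGT"):
    """Count k-mer -> next-base transitions with add-one smoothing."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(sequence) - k):
        counts[sequence[i:i + k]][sequence[i + k]] += 1
    def log_prob(context, base):
        total = sum(counts[context].values()) + len(alphabet)
        return math.log((counts[context][base] + 1) / total)
    return log_prob

def mutation_delta(sequence, pos, new_base, log_prob, k=2):
    """Change in log-likelihood from mutating position `pos` to `new_base`.
    Only the terms whose context or target touches `pos` can change."""
    mutated = sequence[:pos] + new_base + sequence[pos + 1:]
    def ll(seq):
        return sum(log_prob(seq[i:i + k], seq[i + k])
                   for i in range(max(0, pos - k), min(len(seq) - k, pos + 1)))
    return ll(mutated) - ll(sequence)
```

Trained on a repetitive sequence, a mutation that breaks the repeat gets a negative delta (the model finds it surprising); the deep-learning versions discussed above replace the k-mer counts with a learned language model but score mutations in essentially the same way.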
| klmr wrote: | Thanks for the paper, I'll check it out; this isn't my | speciality so I'm definitely learning something. Just one | minor clarification: | | > _Maybe read the paper ... before assuming that it doesn't work_ | | I don't assume that. In fact, I _know_ that using ML | _works_ on many problems in genetics. What I'm less | convinced by is that we can expect a _breakthrough_ due | to ML any time soon, partly because conventional | techniques (including ML) already have a handle on some | current problems in genetics, and because there isn't | really a specific (or flashy) hard, algorithmic problem | like there is in structural biology. Rather, there's lots | of stuff where I expect to see steady incremental | improvement. In fact, in Wikipedia's list of unsolved | biological problems [1] there isn't a single one that I'd | characterise specifically as a question from the field of | genetics (as a geneticist, that's slightly depressing). | | But my question was even more innocent than that: I'm not | even _that_ sceptical, I'm just not aware of anything and | genuinely wanted an answer. And the paper you've posted | might provide just that, so I'll go and do my research now. | | [1] https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_b... | jcims wrote: | _Not_ being in the field, I would term what I see in this | story as a 'bottom up' approach to understanding | genetics/molecular biology. More akin to applied sciences than | medicine or health. This, for example, seems to be very | important but it still leaves us with a jello jigsaw | puzzle with 200 million pieces and probably far removed | from immediate utility in health outcomes. | | Then there are the more clinically oriented approaches of | looking at effects, trying to find associated | genes/mutations and whatever mechanisms exist in between to | cause a desirable or undesirable outcome. I'd call that | 'top down'. 
| | I'm sure the lines get blurred more every day, but is | there a meaningful distinction between these and/or more | categories that are working the problem from both ends? | If so, are there associated terms of art for them? | the8472 wrote: | > but many were skeptical as to the longevity of the | performance DeepMind was able to achieve | | For a non-biologist, on what is this skepticism based? | | Just purely based on following ML news it looks like the trend | for ML solutions has been that they've overtaken expert systems | once they've gained a solid foothold in a field. Maybe this is | some perception bias. Are there any cases where ML performed | decently but then hit a ceiling while expert systems kept | improving? | garmaine wrote: | > Are there any cases where ML performed decently but then | hit a ceiling while expert systems kept improving? | | Yes, this describes the entire history of AI, including several | boom-bust cycles. In particular the 80's come to mind. Yes, | the practitioners think that there are no technical barriers | stopping them from eating the world, but that's exactly what | people thought about other so-called revolutionary advances. | | Although to be pedantic, "expert systems" is the technology | behind the AI boom of the 80's. At the time people were saying | expert systems can't be as good as existing algorithms | (including what we would now call "machine learning" | techniques), then suddenly the expert systems were better and | there was rampant speculation real AI was around the corner. | Then they plateaued. | | We _appear_ to be at the tail end of the maximum hype part of | the boom-bust cycle. Thinking that the rapid gains being made | by the current deep learning approaches will soon hit a wall | is a reasonable outside-view prediction to make: nearly every | time we've had a similarly transformative technology in the | AI space and elsewhere, hitting the wall is exactly what | happened. 
The onus would be on practitioners to show that | this time really is different. | sdenton4 wrote: | I think the disconnect this time around is in | productionization. We're getting breakthroughs in a wide | range of problems, and translating those gains in the | problem space into 'real' stable, practical solutions we | can use in the world is the remaining gap, and often takes | years of additional effort. It's still really expensive to | launch this stuff, and often requires domain expertise that | the ML research team doesn't have. | | We're seeing a lot of this pattern: ML Researcher shows up, | says 'hey gimme your hardest problem in a nice parseable | format' and then knocks a solution out of the park. The ML | researcher then goes to the next field of study, leaving | (say) the doctors or whatever to try to bridge the gap | between the nice competition data and actual medical | records. It also turns out that there's a host of closely | related but different problems that ALSO need to be solved | for the competition problem to really be useful. | | I don't think this means that the ML has failed, though; | it's probably similar to the situation for accounting | software circa 1980: everything was on paper, so using a | computerized system was more trouble than it was worth. But | today the situation in accounting has completely flipped. | Apply N+1 years of consistent effort improving data | ecosystems, and the ML might be a lot easier to use on | generic real world problems. | garmaine wrote: | Next time you fly through a busy airport, think about the | system which assigns planes to gates in real time based on | a large number of variable factors in order to maximize | utilization and minimize waits. This is an expert system | designed in the 80's, which allowed a huge increase in | the number of planes handled per day at the busiest | airports. 
| | Or when you drive your car, think about the lights-out | factory that built it, using robotics technologies | developed in the 80's and 90's, and the freeways which | largely operate without choke points again due to expert | system models used by city planners. | | These advances were just as revolutionary before, and | people were just as excited about AI technologies eating | the world. Still, it largely didn't happen. To continue | the example of robotics, we don't have an equivalent of | the Jetsons' home robot Rosey. We can make a robot | assemble a $50,000 car, but we can't get it to fold the | laundry. | | These rapid successes you see aren't literally "any | problem from any field" -- they're specific problems chosen | specifically for their likely ease in solving using | current methods. DeepMind didn't decide to take on | protein folding at random; they looked around and picked | a problem that they thought they could solve. Don't | expect them to have as much success on every problem they | put their minds to. | | No, machine learning is not trivially solving the hardest | problems in every field. Not even close. In biomedicine, | for example, protein folding is probably one of the | easiest challenges. It's a hard problem, yes, but it's | self-contained: given an amino acid sequence, predict the | structure. Unlike, say, predicting the metabolism of a | drug applied to a living system, which requires | understanding an extremely dense network of existing | metabolic pathways and their interdependencies on local | cell function. There's no magic ML pixie dust that can | make that hard problem go away. | ramraj07 wrote: | It's because for many researchers ML is just to take a | standard keras or scikit-learn model, shove their data in, and | get some table or number out, and see if that solves their | problem. If that's your only ML experience then I suppose | this is how sceptical you'd be of ML in general. 
|
| It looks like DeepMind invented a completely new method for
| this round that's not just an extension of their previous
| work, showing how much you can gain if you don't shoebox
| yourself into just trying to improve existing methods.
|
| That all the scientists were highly skeptical about the
| scope of ML (and these are computer scientists to begin
| with, mind you) just shows how little they knew of what a
| computer or a program can possibly do, which is a bit
| appalling to be honest.
| timr wrote:
| _"It looks like DeepMind invented a completely new method
| for this round that's not just an extension of their
| previous work, showing how much you can gain if you don't
| shoebox yourself into just trying to improve existing
| methods. That all the scientists were highly skeptical
| about the scope of ML (and these are computer scientists to
| begin with, mind you) just shows how little they knew of
| what a computer or a program can possibly do, which is a
| bit appalling to be honest."_
|
| My PhD (now over a decade ago...yikes) was in applying much
| simpler ML methods to these kinds of problems (I started in
| protein folding, finished in protein/nucleic acid
| recognition, but my real interest was always protein
| design). Even back then, it was clear that ML methods had a
| lot more potential for structural biology (pun unintended)
| than they were being given credit for. But it was
| hard to get interest from a research community that cared
| little about non-physical solutions. No matter how well you
| did, people would dismiss it as a "black box solution", and
| that pretty much limited your impact.
|
| Some of this is understandable: even today, it's not at all
| clear that a custom-built ML model for protein folding is
| of much use to anyone -- particularly a model that doesn't
| consider all of the atoms in the protein.
The traditional
| justification for research in this area is that if you
| could develop a _sufficiently general_ model of protein
| physics, it would also allow you to do all sorts of _other_
| stuff that is much more interesting: rational protein
| design, drug binding, etc.
|
| The AlphaFold model is not really useful for any of this,
| so in a way, it's kind of like the Wienermobile of science:
| cool and impressive when done well ("hey! a giant hot dog
| on wheels!"), but not really useful outside of the niche
| for which it was designed. So it's hard to blame
| researchers in this field -- who generally have to chase
| funding and justify their existence -- for pursuing the
| application of deep learning to this one, narrow problem
| domain.
|
| Obviously there will now be a wave of follow-on research,
| and it's impossible to know what methods this will spawn.
| Maybe this will revolutionize computational structural
| biology, maybe not. But I think it's a _little_ unfair to
| demonize the entire field. Protein folding just
| traditionally hasn't been a very useful or interesting
| area, and like all "pure science", it leads to a lot of
| small-stakes, tribal thinking amongst the few players who
| can afford to compete. This is right out of Thomas Kuhn: a
| newcomer sweeps into a field, glances at the work of the
| past, then bashes it over the head, dismissively.
| ramraj07 wrote:
| We don't know too much about the exact model they made,
| but it looks sufficiently generalizable to be able to
| give a candidate protein structure for any given
| sequence. It doesn't automatically cure cancer and inject
| the drug, but that by itself is an amazing tool that, if
| available to everyone, will revolutionize biology
| experimentation.
|
| I will definitely blame the protein structure field on
| multiple levels though.
It was always frustrating to me
| to open up Nature or Science and see it filled with
| papers about structure -- like they are innovating so much
| that half of the top science magazines every week have
| papers in that field, yet it's not going anywhere? Or is
| it simply just a bunch of professors tooting their own
| horns about ostensible progress in a field that's archaic
| by years if not decades? The overall protein structure
| field internalised some dogmas in self-defeating ways to
| everyone's detriment, and finally events like this (and
| cryo-EM, maybe) will jolt them out or make them fully
| irrelevant so we can move on. It's only doubly ironic
| that this came from a team in a company with minimal
| academic ties, showing how toxic that entire system is. I
| only feel pity for the graduate students still trying to
| crystallize proteins in this day and age.
| dekhn wrote:
| The reason for your second paragraph is pretty
| straightforward. There has been an immense amount of
| support for proteins as "the workhorses of the cell" for
| a hundred-plus years. I call it the "protein bias". We've
| seen it many times -- for example, when it was first
| hypothesized and then proved that DNA, rather than
| protein, is the heredity-encoding material -- and again in
| the denial that RNA could act as an enzyme or that the
| functional core of the ribosome could be a ribozyme.
|
| I think what basically happened is that a very influential
| group of scientists, mainly in Cambridge around the '50s
| and '60s, convinced everybody that reductionist molecular
| biology would be able to crystallize proteins and
| "understand precisely how they function" by inspecting
| the structures carefully enough.
|
| I learned, after reading all those breathless papers
| about individual structures and how they explain the
| function of proteins, that in the vast majority of
| cases, they don't have enough data to speculate
| responsibly about the behavior of proteins and how
| they implement their functions. There are definitely
| cases where an elucidated structure immediately led to
| an improved understanding of function:
|
| "It has not escaped our notice that the specific
| pairing we have postulated immediately suggests a
| possible copying mechanism for the genetic material."
|
| but most papers about how cytochrome "works" aren't
| really illuminating at all.
| timr wrote:
| _"We don't know too much about the exact model they made
| but it looks sufficiently generalizable to be able to
| give a candidate protein structure for any given
| sequence. It doesn't automatically cure cancer and inject
| the drug but that by itself is an amazing tool that if
| available to everyone will revolutionize biology
| experimentation."_
|
| They say on their own press-release page that side-chains
| are a future research problem, and nothing about their
| method description makes me believe they've innovated on
| all-atom modeling. This software seems able to generate
| good models of protein backbones; these kinds of models
| certainly have uses, but a backbone model is not enough
| for drug design.
|
| This is certainly an advancement, but you're exaggerating
| the scope of the accomplishment.
|
| _"I only feel pity for the graduate students still
| trying to crystallize proteins in this day and age."_
|
| Nothing about this changes the fact that protein
| crystallography is a gold-standard method for determining
| a protein structure.
Cryo-EM has made it possible to
| obtain good structures for classes of proteins we could
| never achieve before, and it's certainly _interesting_ if
| we can run a computer for a few days to get a 1A ab
| initio model for a protein sequence, but we could
| _already_ do that for a large class of proteins with
| homology modeling. These predicted structures still
| aren't generally that useful for drug design, where tiny
| details of molecular interactions matter.
|
| To put it in perspective: protein energetics are measured
| on the scale of _tens of kcal/mol_. Protein-drug
| interactions are measured in _fractions of a kcal_. A
| single hydrogen bond or cation-pi interaction or
| displaced water molecule can make the difference between
| a drug candidate and an abandoned lead. Tiny changes in
| backbone position make the difference between a good
| structure and a bad one. AlphaFold isn't doing that kind
| of modeling.
| ramraj07 wrote:
| Of course, they haven't solved everything, but you seem to
| be doing exactly what I accuse that entire field (and
| academia in general) of doing -- which is to insist a
| problem is intractable or hard and undermine someone
| potentially challenging that. When they released the 2018
| results, the field did embrace it (for sure I'd consider
| the groups organizing CASP as at least forward-thinking)
| but was still skeptical about how much more progress it
| could make; now they blow everyone's minds again with a
| monumental leap, and again people want to come say that of
| course this is the last big jump!
|
| I understand the self-preservation instincts that kick in
| when there's a suggestion that the entire field has been
| in a dark age for a while, but I hope you can see that
| there might be something fundamentally wrong with how
| research is done in academia, and that is to blame for why
| this didn't happen sooner, and why it's so hard for many
| to embrace it.
|
| Regarding your comments on the inapplicability of this
| current solution for docking, I'm sure that's the next
| project they're taking up, and let's see where that goes.
|
| This is exactly the same type of progression that
| happened with Go, where, when their software beat a
| professional player, everyone was like "yeah but I bet he
| wasn't that good". A few years later and Lee Sedol just
| decided to retire. I am interested to see what happens to
| that entire academic field in a similar vein, though my
| interests are more in knowing how science can advance
| from more people thinking this way.
| whimsicalism wrote:
| ML is a super overloaded term.
|
| There are definitely cases where machine-learned
| statistical solutions do not perform as well as the systems
| tuned by the experts, but if you can define the task well
| and get the data for a deep solution, usually those will
| overtake.
| penagwin wrote:
| This. I believe technically just linear regression could be
| considered "machine learning".
| misnome wrote:
| I've seen people at bio conferences actively calling
| linear regression machine learning.
| diab0lic wrote:
| This is likely because linear regression meets most
| widely accepted definitions of machine learning. [0][1]
| It is simple and very effective when learning in linear
| space.
|
| [0] https://en.wikipedia.org/wiki/Machine_learning
|
| [1] https://www.cs.cmu.edu/~tom/mlbook.html
| amelius wrote:
| Curious, what are the sizes of the training and
| validation/test datasets (number of structures)?
| papaf wrote:
| _Curious, what are the sizes of the training and
| validation/test datasets (number of structures)?_
|
| The proteins are shown on the CASP website [1]. Both the
| number of residues and number of proteins are bigger than I
| expected.
|
| [1] https://predictioncenter.org/casp14/targetlist.cgi
| piannucci wrote:
| This is so cool. I hope they will also tackle the problem
| of predicting RNA structures and catalytic activity.
| xphos wrote:
| Like this is awesome and a huge advancement, but one thing
| that worries me with an AI solution is that it doesn't
| really draw us any closer to the why. Why do proteins fold
| the way they do? We can predict the resulting structure,
| which is extremely significant, but we have no clue why.
| While we get the insight of being able to predict some
| structures, we don't get the insight of why things are
| happening the way they are. In some cases like this it
| _might_ not matter, but in other cases that insight might
| actually be way more significant than answering the problem
| to begin with. Of course we can review the problem with the
| additional predictions that AI gives us, but this can be
| hazardous, because what if a specific sequence folds in
| some way that we, and thus the AI, have never seen, and it
| goes missed? I'm not enough of a biologist to say this is
| possible, but I know this kind of edge case can come up,
| and what rabbit holes will we go down because we only have
| the AI-implied insight?
|
| Disclaimer: I think the contributions are super useful for
| science, but they do come with worries, as does every path
| of discovery.
| MauranKilom wrote:
| > Why do proteins fold the way they do?
|
| I think the _why_ is pretty clearly understood
| (https://en.wikipedia.org/wiki/Protein_folding), in the
| same way that we understand the _mechanism_ behind the
| three-body problem in physics or quantum computing. But
| that does not necessarily imply that there is an efficient
| way for us to simulate/predict the results of having nature
| play out those mechanisms.
| Odenwaelder wrote:
| I have no idea what you are talking about.
| xphos wrote:
| AI solves the process but doesn't give a whole lot of
| insight into the formulas and the description of what's
| going on. Whereas we as humans have reasonably found that
| E = mc^2.
However, AI
| would give us E or m but black-boxes us away from seeing
| that c, a.k.a. the speed of light, was involved (unless we
| implied that before). There might be interesting, useful
| relationships that AI unintentionally masks that could be
| groundbreaking if we could only understand the process more
| holistically. I think a different commenter alluded to
| this: in this case we think we understand protein folding
| well; we just struggle to synthesize it in a compact
| mathematical way, even though with AI we can simulate the
| process well for known examples.
|
| The issue with AI is we don't know if our current example
| set includes every case. What if there is a strange
| sequence of amino acids that causes something "weird" to
| happen that we haven't seen? AI cannot predict something
| novel that neither it nor we have seen, which is the issue.
| The process (if it exists) of how one could solve this
| problem might also be exportable to other fields if it were
| formalized with math rather than estimated with AI.
| deeviant wrote:
| > While we get the insight of being able to predict some
| structures we don't get the insight of why things are
| happening the way they are.
|
| This isn't something specific to AI, but to science itself.
| We know the value of c, but not _why_ the value is c; sure,
| we can point to something like the Lorentz transformation,
| but we can't and probably won't ever be able to explain why
| it has these particular constants. We just know that we can
| measure them and they are what they are.
|
| Science isn't in the business of answering why. A
| successful scientific theory does two things: A) makes
| useful predictions, B) is correct in its predictions. It'd
| be wrong to call a NN a scientific theory, but it certainly
| does make predictions and, as these results show, it is
| correct in its predictions.
|
| Sometime soon, humanity is going to have to come to terms
| with the fact that we will soon enter (or perhaps have
| already entered) an age where mankind is not the only
| source of new knowledge. AI-derived knowledge will only
| increase as the future unfolds, and the analysis of such
| knowledge will likely become its own branch of study.
| naringas wrote:
| > Science isn't in the business of answering why.
|
| I agree as long as science is a business. But why is
| science a business?
|
| If science is not meant to answer why, does this mean we
| cannot know why?
|
| Should we just give up on having story-like (narrative)
| explanations for why and how things work? It seems like we
| are headed to a world where the computer just tells us what
| to do and where to go. A world in which we are free from
| having to think about why we are being told to do whatever
| it is we're doing. Click (or tap) buttons, get tokens to
| buy food and pay rent.
| crystaln wrote:
| These are predictions. Presumably the proteins will be
| inspected and the model refined and updated before we start
| using DNA without first checking the output.
| hailwren wrote:
| There are two threads here. The first is that it would not
| be surprising to learn that describing the way that
| proteins fold is a very hard thing for humans to
| understand. See, e.g., the four-color theorem (4CT) [1] and
| its computational proofs.
|
| The second is that explainability in ML is much more
| tractable than it was 10 years ago. This is not to say that
| it's solved, but having solved the predictive problem, I
| would expect model simplifications and SME research to
| proceed more quickly towards understanding the how. I did
| some work with an astrophysics postdoc using beta-VAEs [2]
| to classify astronomical observations, and simplifying
| models in order to achieve human-explainability proved to
| not cost as much predictive power as you might expect. It
| might be that the same holds true here.
|
| 1 - https://mathworld.wolfram.com/Four-ColorTheorem.html
|
| 2 - https://paperswithcode.com/method/beta-vae
| [deleted]
| andy_ppp wrote:
| So, sorry to be a philistine, but what specific discoveries
| will this lead to... will it make it easier to produce
| antivirals or even molecular machines?
| tim333 wrote:
| "DeepMind said it had started work with a handful of
| scientific groups and would focus initially on malaria,
| sleeping sickness and leishmaniasis, a parasitic disease"
| https://www.theguardian.com/technology/2020/nov/30/deepmind-...
| dluan wrote:
| I worked in the lab that helped develop folding@home, as
| well as the game where the crowd was the chaotically
| trained machine that folded and unfolded one amino acid at
| a time. This feels like a pretty significant new chapter in
| the humanity movie.
|
| A few times, I get immense pangs of jealousy for younger
| people a generation or half a generation after me. And I'm
| only 30! This is one of those times.
| maxlamb wrote:
| Is the team really that young? 20-year-olds?
| sgillen wrote:
| I think he means that those who are in their teens / even
| younger now will get to experience immensely cool tech in
| their lifetime.
| notkaiho wrote:
| Who'd have thought that the kid who programmed Theme Park
| would go on to do this kind of work.
| uoaei wrote:
| No, they didn't. They approximated a solution to protein
| folding.
|
| The two are different concepts -- this isn't the typical HN
| pedantry.
|
| "Solving" the problem would entail developing an
| interpretable algorithm for taking a string of amino acids
| and determining the 3D structure once folded.
|
| Approximating a solution would entail simulating that
| algorithm, which is what their neural network is doing. It
| is of course usually accurate, but you would expect this
| with any suitable universal function approximator.
|
| Props to DeepMind and congrats to CASP, but is it not
| obvious that this is more hype-rhetoric for public
| consumption?
| stupidcar wrote:
| > this isn't the typical HN pedantry
|
| This is the absolute definition of it.
| visarga wrote:
| > "Solving" the problem would entail developing an
| interpretable algorithm
|
| It looks like you'd like a grokable solution, but the
| problem might be just too complex to grasp for the human
| brain. "Solved" means they solved the protein puzzles on
| the official benchmark.
|
| > but you would expect this with any suitable universal
| function approximator
|
| Yeah, it's just that easy. Function approximator, engage!
| It took a team of DeepMind researchers, two years and God
| knows how much compute. The universal function
| approximation theorem also doesn't say how to find that
| network.
| uoaei wrote:
| > the problem might be just too complex to grasp for the
| human brain
|
| Maybe all at once, but having a self-consistent, unified
| theory is very important.
|
| We can't understand the full brain, but we can understand
| the essential components and how they work together. This
| still constitutes "interpretable".
|
| > The universal function approximation theorem also
| doesn't say how to find that network.
|
| Correct, and irrelevant.
| [deleted]
| deeviant wrote:
| > this isn't the typical HN pedantry.
|
| Then launches into what can only be recognized as an
| exercise in pedantry.
| uoaei wrote:
| "Pedantry" implies that the distinction is not meaningful.
|
| This is true if you're only paying attention to how this
| system can be utilized to answer questions posed to it.
|
| This achievement by itself, however, does not do much to
| push the _science_ of protein folding much further. Those
| advances will come when people poke, prod, and break the
| model to develop a unified theory for protein folding.
| deeviant wrote:
| The "science" of protein folding has a primary goal: to
| predict the structure of a protein given its constituent
| parts.
|
| This is what AlphaFold does, and it's been verified to
| produce results at an accuracy apparently at or above that
| of X-ray protein crystallography. The advances will come,
| after these results are validated and accepted by the
| scientific community as a whole, simply when groups start
| using this technique to immediately access the structure
| of proteins that in the past would be prohibitively
| expensive and time-consuming or downright impossible to
| access, and then use that knowledge to do their work.
|
| You seem to think the first thought a researcher will have
| after this becomes widely available is, "Oh hey, I can now
| accurately predict the shape of an arbitrary protein,
| which unlocks untold potential scientific progress on
| numerous scientific fronts, but the thing I want to spend
| my time on is trying to replicate the results of the
| network myself, so I can do it manually, thousands of
| times slower...", which is patently inane.
| uoaei wrote:
| This model will be an amazing tool toward a science of
| protein folding, but we have not "solved" protein folding
| as long as that remains elusive.
| kkoncevicius wrote:
| This is exactly right. It's like saying you solved chess
| because for each configuration of pieces on the board you
| can use machine learning to predict whether that position
| can be achieved with valid chess moves. With 90% accuracy.
| astroalex wrote:
| The distinction you're making between "solved" and "closely
| approximated" makes logical sense to me. However, if I'm
| interpreting the AlphaFold results correctly, this
| distinction isn't practically significant, right?
|
| If you can approximate an algorithm with error that is
| "below the threshold that is considered acceptable in
| experimental measurements" (to quote another HN comment),
| then you have something as good as the algorithm itself for
| all intents and purposes.
|
| Therefore the use of the word "solve" doesn't qualify as
| hype-rhetoric, and the distinction you're making does seem
| somewhat pedantic (even if technically true).
|
| (I'm speaking as someone with only the tiniest amount of
| stats/ML experience, so I could be totally wrong!)
| colonelcanoe wrote:
| It might be the case that the relevant, practical threshold
| now tightens. For example, perhaps it is easier to
| experimentally verify a protein shape predicted by an
| algorithm than it is to experimentally determine the
| protein shape?
| intpx wrote:
| Exactly. Even an incomplete map with somewhat limited
| resolution makes navigation a hell of a lot easier than
| flying blind. This effectively is a data-reduction
| solution -- if you have a fuzzy shape of the thing you are
| trying to model, and you learn the mechanics better with
| each thing you model, your ability to quickly and
| accurately reach a goal improves.
| uoaei wrote:
| That's true, and it's also not what my comment is
| challenging.
| robocat wrote:
| From: https://www.sciencemag.org/news/2020/11/game-has-
| changed-ai-...
|
| "The organizers even worried DeepMind may have been
| cheating somehow. So Lupas set a special challenge: a
| membrane protein from a species of archaea, an ancient
| group of microbes. For 10 years, his research team tried
| every trick in the book to get an x-ray crystal structure
| of the protein. 'We couldn't solve it.'"
|
| "But AlphaFold had no trouble. It returned a detailed image
| of a three-part protein with two long helical arms in the
| middle. The model enabled Lupas and his colleagues to make
| sense of their x-ray data; within half an hour, they had
| fit their experimental results to AlphaFold's predicted
| structure. 'It is almost perfect,' Lupas says."
| uoaei wrote:
| Practically there is little difference if all you're
| interested in is determining folds from protein sequences.
|
| The difference comes in developing a theory for
| generalizing the study of protein folding as a scientific
| pursuit.
| asbund wrote:
| Exciting times
| m-p-3 wrote:
| I'm wondering what this means for folding@home.
| great_tankard wrote:
| Folding@home mostly tries to calculate protein dynamics
| using already solved structures, so their work is still
| critical.
| breck wrote:
| I'm pretty sure this means they can pack it up? Or point
| their infra at a different problem?
| touisteur wrote:
| Or just do billions of inferences per second. Next step?
| matsemann wrote:
| I came here wondering the same. Is this based on work done
| by folding@home, for instance? (As in, did it use their
| precomputed stuff as training data?)
| 0xBA5ED wrote:
| Does this give the ability to engineer cures for currently
| incurable diseases?
| jfarlow wrote:
| In short -- certain ones, yes. This should be one step
| (that was a bottleneck) in helping a company with a fixed
| budget do an order of magnitude more 'experiments' with the
| same amount of resources. Lab resources are expensive and
| fixed, so if you can pre-compute what you need, you can get
| right to the more powerful results.
|
| We design proteins for immunotherapies -- this kind of
| thing would help us more rapidly design our proteins (and
| more efficiently use our wet-lab resources to speed up
| existing projects). For others, some drugs are hard to
| build without knowing how they will interact -- this could
| both provide new 'targets' to go after and help prevent
| projects that would otherwise accidentally target an
| important protein.
| aparsons wrote:
| Fascinating work. I wonder if this approach works to model
| interactions (no reason it shouldn't). The interactions of
| proteins with other proteins, as well as with molecules
| like lipids, water and electrolytes, form the basis for
| cellular processes. If that can be inferred correctly, you
| are looking at the building blocks of a "human simulator".
| optimalsolver wrote:
| They should make this into a Kaggle competition.
|
| Maybe they might get an even better model.
| _greim_ wrote:
| At Sun back in the day, our workstations tended to have
| fairly promiscuous login settings, so one of my coworkers
| took the liberty of launching folding@home on every machine
| in the org. Listing running processes one day, I saw this
| thing pegging my CPU; asked around and others had it too. A
| virus!?! Then he fessed up. Kinda miffed at first but
| ultimately really cool, so we let the thing keep running.
| That was my introduction to the whole protein folding
| problem, and it's really great to see this milestone!
| dekhn wrote:
| I ran Folding@Home at Google on hundreds of thousands of
| fast Xeon cores for over a year. I concluded at the end
| that unbiased MD simulations are not an effective use of
| computer time.
| t_serpico wrote:
| Out of curiosity, why not?
| dekhn wrote:
| For the dollars invested, the basic and applied results
| that came out weren't worth it.
| FrojoS wrote:
| Bell Labs invented the transistor. Now this. Monopoly money
| at its best! ___________________________________________________________________ (page generated 2020-11-30 23:00 UTC)