[HN Gopher] AlphaFold: a solution to a 50-year-old grand challen...
       ___________________________________________________________________
        
       AlphaFold: a solution to a 50-year-old grand challenge in biology
        
       Author : momeara
       Score  : 369 points
       Date   : 2020-11-30 13:31 UTC (9 hours ago)
        
 (HTM) web link (deepmind.com)
 (TXT) w3m dump (deepmind.com)
        
       | xbmcuser wrote:
        | This is a lot bigger than people are assuming. If protein
        | folding can be done quickly and cheaply, it will trickle down
        | to a lot more than medicine. It is going to advance biofuels,
        | food production, and a lot more.
        
         | shawnz wrote:
         | Imagine protein computers or protein metamaterials
        
           | leafmeal wrote:
           | I'm using mine right now to imagine one.
        
           | flobosg wrote:
           | De novo design of protein logic gates:
           | https://science.sciencemag.org/content/368/6486/78
        
       | sabujp wrote:
        | So are all these protein folding labs and projects, e.g.
        | Folding@home, etc., essentially dead projects now?
        
       | haolez wrote:
       | Does this make Folding@Home obsolete?
        
       | elevenoh wrote:
       | So the median accuracy went from ~58% (2018) to 84% (2020) in 2
       | years?
       | 
       | Does 84% == solved?
       | 
        | Also, any low-hanging fruit implications for longevity tech?
        
         | dekhn wrote:
         | 100% accuracy is "solved".
        
           | randcraw wrote:
           | Solving the inverse problem would be even more valuable --
           | given a specific shape (and other biochemical desiderata),
           | what sequence of amino acids would create that protein?
           | 
           | As hard as the protein folding problem is, the inverse
           | problem is harder still. THAT is the one true grail.
        
             | dekhn wrote:
             | We "solved" this at Google years ago using Exacycle. We ran
             | Rosetta (the premier protein design tool) at scale. The
              | visiting scientist (who later joined Google and created
              | DeepDream) said it worked really well: "I could just watch a
             | folder and good designs would show up as PDB files in a
             | directory".
        
           | gfodor wrote:
           | You can't get 100% accuracy on something for which you don't
           | or can't know the ground truth.
        
             | dekhn wrote:
             | The protein folding problem is predicated on the idea that
             | there is a ground truth (a single static set of atomic
             | coordinates with positional variances). If your point is
              | that even experimental methods can't truly reach 100% (due
              | either to underlying motion in the protein or to an
              | inability to determine the structure), that's more or less
              | what Moult is saying (they more or less arbitrarily define
              | ~1A resolution and GDT of 90 as the "threshold at which the
              | problem is solved").
        
         | liuliu wrote:
          | The article implies that the "ground-truth" (experimentally
          | determined) structure has an accuracy interval as well. Above
          | 90% is the same accuracy as what you get from experimentally
          | determined results, hence the "solved" claim.
        
       | heycosmo wrote:
       | Fascinating! AlphaFold (and other competitors) seem to use MSA
        | (Multiple Sequence Alignment) and this (brilliant) idea of co-
       | evolving residues to build an initial graph of sections of
       | protein chain that are likely proximal. This seems like a useful
       | trick for predicting existing biological structures (i.e. ones
       | that evolved) from genomic data. I wonder (as very much a non-
       | biologist), do MSA-based approaches also help understand "first-
       | principles" folding physics any better? and to what degree? If I
       | write a random genetic sequence (think drug discovery) that has
       | many aligned sequences, without the strong assumption of co-
       | evolution at my disposal, there does not seem any good reason for
       | the aligned sequences to also be proximal. Please pardon my
       | admittedly deep knowledge gaps.
        
         | flobosg wrote:
         | > do MSA-based approaches also help understand "first-
         | principles" folding physics any better?
         | 
          | Not really. MSA-based approaches, like most structure
          | prediction methods, aim to find the lowest-energy conformation
          | of the protein chain, disregarding folding kinetics and
          | basically all dynamic aspects of protein structure.
         | 
         | > If I write a random genetic sequence (think drug discovery)
         | that has many aligned sequences, without the strong assumption
         | of co-evolution at my disposal, there does not seem any good
         | reason for the aligned sequences to also be proximal.
         | 
         | I don't think I fully understood this, but I'll give it a shot
         | anyway. If your artificial sequence aligns with others, there's
         | a chance that it will fold like them, depending on the quality
         | and accuracy of the multiple sequence alignment. Since multiple
         | sequence alignments are built under the assumption of homology
         | (all sequences have a common ancestor), it's a matter of how
         | far from the "sequence sampling space" your sequence is located
         | compared to the others.
        
           | heycosmo wrote:
           | > I don't think I fully understood this, but I'll give it a
           | shot anyway. If your artificial sequence aligns with others,
           | there's a chance that it will fold like them, depending on
           | the quality and accuracy of the multiple sequence alignment.
           | Since multiple sequence alignments are built under the
           | assumption of homology (all sequences have a common
           | ancestor), it's a matter of how far from the "sequence
           | sampling space" your sequence is located compared to the
           | others.
           | 
           | I understand that similar sequences may fold similarly
           | (although as length increases, I highly doubt it, but IDK).
           | I'm talking about aligned sub-sequences within one chain and
           | their ultimate distance from each other in the final
           | structure. Co-evolution suggests that aligned sub-sequences
           | are also proximal. But manufactured chains did not evolve,
           | therefore the assumption is no longer useful.
        
             | flobosg wrote:
             | Oh, I see! Yes, an intrachain alignment of an artificial
             | sequence does not by itself give any information about co-
             | evolution, especially since you don't know whether your
             | protein is actually folding. To assess co-evolution you
             | need a multiple sequence alignment between protein homologs
             | containing correlated mutations.
             | 
             | > I understand that similar sequences may fold similarly
             | (although as length increases, I highly doubt it, but IDK).
             | 
             | As long as the sequence similarity is kept between those
             | sequences, length is not an issue.
             | 
             | > Co-evolution suggests that aligned sub-sequences are also
             | proximal
             | 
             | What do you mean by "proximal"? Close in space, or similar
             | in structure?
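The co-evolution signal discussed above can be sketched numerically: two alignment columns whose residues vary together (high mutual information) are candidate spatial contacts. A minimal illustration, using an invented four-sequence alignment:

```python
# Toy sketch of the co-evolution signal: two MSA columns whose
# residues change together (high mutual information) hint that the
# positions are in contact. The sequences below are invented.
from collections import Counter
from math import log2

msa = [
    "AKLE",
    "AKLE",
    "ARLD",  # positions 1 and 3 mutate together (K<->R with E<->D)
    "ARLD",
]

def mutual_information(col_i: int, col_j: int) -> float:
    """Mutual information between two alignment columns, in bits."""
    n = len(msa)
    pi = Counter(s[col_i] for s in msa)           # marginal of column i
    pj = Counter(s[col_j] for s in msa)           # marginal of column j
    pij = Counter((s[col_i], s[col_j]) for s in msa)  # joint distribution
    return sum(
        (c / n) * log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
        for (a, b), c in pij.items()
    )

print(mutual_information(1, 3))  # co-varying pair -> 1.0 bit
print(mutual_information(1, 2))  # fully conserved column -> 0.0 bits
```

Real contact-prediction pipelines use corrected variants of this statistic (e.g. APC) over much deeper alignments, but the underlying idea is the same.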
        
         | ashtonbaker wrote:
         | This is a really insightful question and I need to take some
         | time to fully understand the ensuing discussion.
         | 
         | If my speculation is correct, then drug discovery should use a
         | process of genetic programming, using something like this to
         | score the resulting amino acid sequences. I'm wondering if an
         | artificial process of evolution would be sufficient to satisfy
         | the co-evolution assumption here.
        
           | flobosg wrote:
           | > I'm wondering if an artificial process of evolution would
           | be sufficient to satisfy the co-evolution assumption here.
           | 
           | In principle yes, if you can generate a significant number of
           | artificially evolved variants that are folded/functional.
        
       | ampdepolymerase wrote:
       | @dang, please combine the thread with
       | https://news.ycombinator.com/item?id=25253488
        
       | lawrenceyan wrote:
       | Earlier post on this with direct results:
       | https://news.ycombinator.com/item?id=25253488
        
       | chetan_v wrote:
       | First Nobel prize for AI from this?
        
         | aardvarkr wrote:
         | We'll know ten years from now
        
         | xgulfie wrote:
         | Hopefully they give one out for this, if only so I can say I'm
         | a Nobel Prize contributor
        
           | TRcontrarian wrote:
           | No way, you were on the team? Congrats.
        
       | hoppla wrote:
        | I am puzzled about "AI-knowledge". Have we really learnt
        | anything? Is distilling the knowledge from AlphaFold just as
        | hard a problem as solving protein folding?
        
         | fairity wrote:
         | If you forgot how to do long division, but still had a
         | calculator, wouldn't the calculator still be useful?
        
       | EGreg wrote:
       | What happens when AI is better at everything measurable than
       | humans?
       | 
       | Better at conversation. Better at making people laugh, and
       | generate attraction or other emotions, better at motivating them,
       | and organizing movements, etc.
       | 
       | Clearly we are not ready for such an efficient system... it would
       | be a big disruption to all human organizations and relations. It
       | would start with Twitter botnets and directing sentiment.
        
       | WanderPanda wrote:
        | We indeed stand on the shoulders of a small number of giants! I'm
        | infinitely thankful for the work DeepMind is doing. Let's maybe
        | celebrate this accomplishment for one day and start being worried
        | about big tech again tomorrow. Many of the comments here usually
        | suggest that we should live in worry and fear, but to my
        | knowledge there is not much historical evidence for these
        | kinds of companies turning evil.
        
       | harperlee wrote:
       | Not knowing a lot about biotechnology, I read the article and it
       | sounds great, but how big is this as a gamechanger? Can someone
       | comment on how big are the implications of this in, let's say, 5
       | years from now, on day to day life? Does this mean that biotech
       | is going to explode? Or just that drugs will come to market
       | faster, perhaps cheaper for rare diseases, but from the same
       | industry structure as always?
        
         | xyzzyz wrote:
          | My friend, who works in a crystallization lab, has told me
          | that she's gonna be claiming unemployment soon, and she was
          | only half joking.
        
           | dalke wrote:
           | She can still work on complexes, binding modes, and
           | engineered biomolecules (eg, protein-drug conjugates and
           | antisense oligonucleotide dimers) where the training data
           | isn't really there.
        
         | _RPL5_ wrote:
         | The industry process will not change. You still need industrial
          | biologists to generate and validate AlphaFold structures,
         | interpret the results as part of the bigger picture, and to
         | finally design the drugs. And, then, of course you still need
         | to validate the drugs in experimental systems (first the test
         | tube, then mice, then humans).
         | 
         | So your second guess is correct - one of the steps is much
         | cheaper now, which marginally improves the entire pipeline. As
         | a result, drugs should now arrive to the market faster.
         | 
         | As a side note, I am curious what happens to the field of
         | structural biology in 10 to 15 years from now. Every research
         | university has a large structural biology department with super
          | expensive X-ray/NMR/cryo-EM machines, and armies of students who
         | routinely spend 4-6 years of their PhD trying to solve a
         | structure of a single protein. If AlphaFold works as
         | advertised, NIH will gradually shift funding to other problems.
         | 
          | (It was predicted that it'd be taxi drivers, not professors,
          | that AI would get first. Ironic.)
        
           | dalke wrote:
           | > "armies of students who routinely spend 4-6 years of their
           | PhD trying to solve a structure of a single protein"
           | 
           | Back in the 1990s, when I worked on structure data, I
           | remember that at least some crystallizations were easy enough
           | they could be done as a rotation project.
           | 
           | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6287266/
           | suggests that life is now a lot easier than the 1990s.
           | Quoting the abstract:
           | 
           | > Macromolecular crystallography evolved enormously from the
           | pioneering days, when structures were solved by "wizards"
           | performing all complicated procedures almost by hand. In the
           | current situation crystal structures of large systems can be
           | often solved very effectively by various powerful automatic
           | programs in days or hours, or even minutes. Such progress is
           | to a large extent coupled to the advances in many other
           | fields, such as genetic engineering, computer technology,
           | availability of synchrotron beam lines and many other
           | techniques, creating the highly interdisciplinary science of
           | macromolecular crystallography. Due to this unprecedented
           | success crystallography is often treated as one of the
           | analytical methods and practiced by researchers interested in
           | structures of macromolecules, but not highly competent in the
           | procedures involved in the process of structure
           | determination.
           | 
           | Certainly some proteins are extremely hard to crystallize,
           | and the new single-atom EM work will help a lot. But are
           | there really "armies of students who routinely spend 4-6
           | years of their PhD trying to solve a structure of a single
           | protein" these days?
           | 
           | I honestly don't know. I'm sure some do. But if so, that army
           | is pretty small compared to the vast numbers who more
           | routinely use crystallography.
        
             | t_serpico wrote:
             | Also, one important thing to realize is that AlphaFold was
             | trained largely on proteins that we were able to
             | crystallize. I'd be very curious to see how its performance
             | fares as a function of 'ease of crystallization'.
        
             | _RPL5_ wrote:
              | You aren't wrong. I got caught up making the comparison
              | between structural biologists and taxi drivers being run
              | out of business by AI, so I ended up exaggerating the
              | workload that's addressed by AlphaFold. I should have been
              | more precise.
        
         | dekhn wrote:
         | It seems unlikely there will be any large changes in life from
         | solving protein folding. Knowing the structure of a protein (or
         | really, its dynamics) is useful for identifying drugs that
            | bind, but the real bottlenecks in drug discovery and biotech are
         | elsewhere.
        
           | ramraj07 wrote:
            | If folding and docking, along with dynamics simulations,
            | start getting commodified, that might change things
            | significantly though. I can already start imagining project
            | workflows that are significantly streamlined without much
            | thought; god knows what other scientists will dream up when
            | we reach those steps.
        
         | candiodari wrote:
          | This will allow us to discover much more about the structure of
          | the cell (of "life") at an unprecedented speed. We should find
          | many, many more mechanisms and targets for medicine, but it
          | takes 10-20 years to bring a new medicine to market.
         | 
         | So in 5 years you'll see exactly zero new medicines pop up.
        
           | pmastela wrote:
            | I agree. The speed at which products of this advancement are
            | deployed will likely be determined mainly by local policies.
            | Though, given just how profound some of the
           | impacts on medicine might be, the speed at which they can be
           | deployed might become a matter of national security (a
           | healthier population bodes well for a healthier economy which
           | in turn strengthens national security). Hopefully this
           | competition shortens the time-to-market for all these new
           | medicines.
        
           | piyh wrote:
           | No new medicines, but way more biotech tools. Higher yield
           | GMO plants, foundational research into disease, science
           | backed recommendations for lifestyle changes to avoid disease
           | that previously eluded us, some crazy stuff happening in
           | animal models. The progress in biotech the past 20 years
           | makes moore's law look slow.
        
         | nabla9 wrote:
          | Getting the DNA sequence from tissue samples is relatively
          | straightforward. DNA -> RNA -> unfolded protein is basically a
          | one-to-one mapping in most cases. How a protein functions
          | depends on how it folds onto itself. Once you solve protein
          | folding, you can take a DNA sample and see the structure of the
          | molecule without lab work using crystallography techniques.
          | 
          | Solving protein folding is huge, a Nobel-in-chemistry-scale
          | achievement. It would be a massive leap for biochemistry.
          | 
          | It seems that DeepMind solved the competition benchmark and
          | made a huge leap, but it's just a partial solution that works
          | on a limited set.
          | 
          | After you have solved protein folding, there is still the
          | problem of solving chemical interactions between molecules
          | accurately. Quantum chemistry is extremely compute intensive.
        
           | ivalm wrote:
            | This is still for proteins that fold without chaperones, but I
           | guess it does cover a lot.
        
         | comicjk wrote:
         | The most accurate technique in computational drug discovery is
         | protein-ligand binding prediction (https://blogs.sciencemag.org
         | /pipeline/archives/2015/02/23/is...). Given the protein
         | structure, you can predict which molecules will bind with it,
          | even for molecules which have never been synthesized. Many
         | protein targets have not been amenable to this because we don't
         | know what the potential binding pockets look like. That set of
         | proteins will now drastically shrink. We're going to have a lot
         | of new drug candidates, and with any luck new drugs, come out
         | of this.
        
         | shoguning wrote:
         | IMO, this is huge. One of the biggest applications of ML to
         | science that I know of for sure. People used to manually
         | crystallize proteins at great effort to solve for structures.
         | 
          | Of course, there is a caveat. The static, crystallized
          | structure is only one aspect of a protein. Its dynamic
          | behavior dissolved in H2O, at different pH, at different ionic
          | strengths, and with different ligands/cofactors is also
          | important, and not (afaik) directly addressed by this research.
        
         | fabian2k wrote:
         | Protein folding is a big and important problem, so this is
         | certainly big news if it works as well as it seems. But I
         | wouldn't assume that this changes everything, we can already
         | determine how proteins fold by experimental work. The
         | disadvantage is that this is a lot of work, though the methods
         | there also improved a lot.
         | 
         | One question is how robust the predictions are that DeepMind
         | produces. I would also assume that right now it can't e.g.
          | determine protein structures in the presence of other small
         | molecules, or protein complexes. A lot of the interesting stuff
         | lies in the interactions between molecules.
         | 
         | And in general in life sciences any new development will take
         | at least a decade until it hits day to day life, likely even
          | more. We're living with an exception to this rule right now due
         | to the pandemic, but in general things take quite a bit of time
         | in that space.
        
           | gulperxcx wrote:
            | But how would this affect day-to-day life, though? Not how
            | long you think it will take.
        
           | derefr wrote:
           | We can already determine how _a few_ proteins (170k -- which
           | sounds like a lot, but which is only 0.09% of all currently-
           | catalogued protein sequences) fold by experimental work.
           | 
           | What an accurate model of protein folding allows us to do, is
           | to take our big database of DNA, predict protein foldings for
           | _all_ of it, and then stand up a _search index_ for this
           | database, keying each amino-acid  "row" by the "words" of its
           | predicted protein's structural features.
           | 
           | We could then, with a simple search query that executes in
           | O(log n) time, find DNA targets that produce molecules with
           | interesting structures that might be worthy of study.
           | 
           | This would, for example, be a game-changer in how
           | biopharmaceutical macromolecule-therapy R&D is conducted.
           | Right now we have to notice that some bacterium or another
           | produces some interesting protein, _and then_ engineer a
           | bioreactor to get more of that protein. With this tech, we
            | can work backward from an _entirely hypothetical, under-
           | specified_ "interesting protein", to figure out what
           | catalogued-but-unstudied DNA sequences produce never-before-
           | catalogued proteins that fit that particular functional
           | "shape", and therefore might do the interesting thing. Then
           | we can either directly synthesize that same DNA, or find the
           | organism we originally sampled it from and study it more.
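The search index described above could be sketched as an inverted index from structural-feature "words" to sequence IDs. The feature names and sequence IDs below are invented purely for illustration:

```python
# Hypothetical sketch of a structure-feature search index: each
# sequence is keyed by the structural-feature "words" its predicted
# fold contains, so candidates can be found by feature lookup
# instead of wet-lab screening.
from collections import defaultdict

index: defaultdict[str, set[str]] = defaultdict(set)

def add_protein(seq_id: str, predicted_features: list[str]) -> None:
    """Index a sequence under each structural feature of its predicted fold."""
    for feature in predicted_features:
        index[feature].add(seq_id)

def find(*features: str) -> set[str]:
    """Return sequence IDs whose predicted structures contain all features."""
    sets = [index[f] for f in features]
    return set.intersection(*sets) if sets else set()

add_protein("seq-001", ["beta-barrel", "zinc-finger"])
add_protein("seq-002", ["beta-barrel", "helix-bundle"])
print(find("beta-barrel", "zinc-finger"))  # -> {'seq-001'}
```

The hard, open part is of course the feature extraction itself, i.e. turning a predicted structure into reliable "words"; the lookup side is the easy bit.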
        
             | btilly wrote:
             | _We can already determine how a few proteins fold by
             | experimental work._
             | 
             | Where "a few" is around 0.1% of the known 180 million
             | proteins. So a relative few and a whole lot.
             | 
             | But the catch is which proteins could we figure out by
             | experiment, and which not. In particular membrane proteins
             | are hard to experimentally determine. But knowing how they
             | fold is very important for figuring out how to get things
             | to react with or get through membranes such as cell walls.
             | Which is an important problem for everything from
             | understanding how viruses work to targeted delivery of
             | drugs. We now have a way to find those structures.
        
             | fabian2k wrote:
             | "A few" does appear quite dismissive of the enormous
             | amounts of effort in structural biology so far. There are
             | more than 170,000 structures in the PDB right now.
             | 
             | To determine potential targets for drugs we have to
             | understand what the proteins do. Having the structure is
             | not really enough for that, it doesn't tell you the purpose
             | of the protein (though it certainly can give you some
             | hints).
             | 
             | In most cases the proteins were determined to be
             | interesting by other experiments, and then people decided
             | to try and solve their structure. So the structures we
             | already solved are also biased towards the more
             | biologically relevant proteins.
        
               | entropicdrifter wrote:
               | 170,000 is three orders of magnitude less than the number
               | of recorded protein sequences. I don't think it's
               | dismissive to describe that as comparatively few.
        
               | flobosg wrote:
               | Structure is much, much more conserved than sequence. In
               | other words, protein sequences with low sequence identity
               | can fold similarly due to the physical constraints that
               | guide protein folding.
        
               | ClumsyPilot wrote:
               | I don't know the field, and I understood 'a few' as like
               | a dozen, certainly not in the thousands.
               | 
                | Anyone uninitiated will think the same, and those
                | already informed? Well, they are already informed.
        
               | ALittleLight wrote:
               | I also don't know the field and the opposite concern is
               | that 170,000 sounds like a lot, but, apparently, it's a
               | relatively small amount compared to the number of
               | proteins there are. It makes sense to me to refer to it
               | as a small number - e.g. "That hard drive is tiny." "No,
               | it stores several million bytes..."
        
               | derefr wrote:
               | 170k is "a few" compared to 180 million (i.e. the size of
               | the PDB as soon as someone runs AlphaFold over everything
               | in the UniProt.)
               | 
               | > In most cases the proteins were determined to be
               | interesting by other experiments, and then people decided
               | to try and solve their structure.
               | 
                | Yes, that's what we're doing _right now_, because
                | structure is not a useful predictor, _because_ we don't
               | have structure available in advance of studies on the
               | protein itself. There was no point to a "functional
               | taxonomy" of proteins, because we were never trying to
               | predict with protein-structure as the only data
               | available.
               | 
               | In a world where protein structure is "on tap" in a data
                | warehouse, part of the game of bioinformatics _will_
                | become "structural analysis" of classes of known-
               | function proteins, to find functional sub-units that do
               | similar things among all studied proteins, allowing
               | searches to be conducted for other proteins that express
               | similar functional sub-units.
        
               | Rochus wrote:
               | It's a step forward for sure, but structures change over
               | time to perform their function. The method described here
               | only returns a static structure. Much more research and
               | development is needed to be able to predict the dynamic
               | behavior and interplay with other proteins or RNA.
        
               | AlexCoventry wrote:
               | > as soon as someone runs AlphaFold over everything in
               | the UniProt
               | 
               | It'll take a while before those results can be trusted,
               | though, right? There's probably a selection bias in the
               | training data for proteins which are easy to crystallize,
               | so many proteins probably aren't well represented by the
               | training examples.
        
               | fabian2k wrote:
               | Determining what a protein structure does might be even
               | harder than folding. Right now we can't really do that ab
                | initio; you have to determine the activity in the lab and
               | then look at the structure. And that allows you to
               | potentially identify this motif in other proteins.
               | 
               | If someone produces an AI that you give a sequence and it
               | tells you what the protein does exactly, I'd be extremely
               | impressed. I don't see that happening soon.
               | 
               | The specifics matter a lot here. We can often determine
               | rough functions for subdomains by homology alone. But
               | that really doesn't tell you the full story, it only
               | gives you some hints on what that protein actually does.
        
               | jeffxtreme wrote:
               | Five years ago, I would have said the following:
               | 
               | "If someone produces an AI that you give a sequence and
               | it tells you the protein conformation, I'd be extremely
               | impressed".
               | 
               | Sure there are many more things to solve in this space;
               | but that doesn't take away that this is an impressive
               | achievement and does unlock quite a few things (including
               | making more tractable the problem you just brought up).
               | I'm excited to see what DeepMind works on now and what
               | the new state of the world will be just five years from
               | now.
        
               | fabian2k wrote:
                | I think I have to clarify that my response was largely a
                | reaction to the "this will change all our lives" part,
                | and might look too negative on its own. I'm very, very
               | impressed by these results, but that still doesn't mean
               | that we just solved biology. If this works that well on
               | folding, this could mean that a lot of other stuff that
               | simply didn't work well in silico might come into reach.
               | 
               | I'm maybe overcompensating for the tech-centric
               | population here, with some comments speculating for very
               | near and drastic impacts from discoveries like this.
               | Biology and life sciences are much slower, and there's
               | always more complexity below every breakthrough. That
               | does tend to push me towards commenting with the more
               | skeptical and sober view here.
        
               | whatshisface wrote:
               | My understanding of this is not perfect, but wouldn't
               | answering the "actually does" question require a full
               | biomolecular model of the cell, or even the whole
               | organism? If so I see what you mean. I suppose that it
               | might be possible to get around this by improving the
               | theory of catalysts so that you could look at a site and
               | say, "oh, this will act in such a way..." Dynamic quantum
               | simulation of a few atoms at the active site is hardly
               | easy but a far sight easier than the other.
        
             | ghostpepper wrote:
             | This does indeed sound like a game changer then, if true
        
             | IgniteTheSun wrote:
             | Considering that this system "uses approximately 128 TPUv3
             | cores (roughly equivalent to ~100-200 GPUs) run over a few
             | weeks" to determine a single protein structure, making
             | predictions for all proteins encoded in a human genome
             | seems impractical at this stage. With luck, this advance
             | will help lead to discovery and definition of new folding
             | rules and optimizations that will make protein folding
             | predictions for the whole human genome more tractable.
        
               | mrDmrTmrJ wrote:
               | I think it is possible to make predictions for all
               | proteins encoded in the human genome. Perhaps you misread
               | a very long and confusing sentence?
               | 
                | Background: neural networks have two modes: 1)
                | training, where you learn all the model weights, and
                | 2) inference, where you run the model once on new
                | data. Training takes a long time because you're
                | computing derivatives to implement update rules on
                | millions or billions of parameters based on
                | iteratively examining massive datasets. Inference is
                | extremely fast because you're just running matrix
                | multiplies of those parameters on new data. And
                | TPUs/GPUs are specially designed to compute matrix
                | multiplies.
               | 
               | The article said: "We trained this system [...] over a
               | few weeks." I searched for, but did not see them identify
               | the inference time. I do expect inference time to be well
               | under one second, though I'm not personally experienced
               | with running inference on this type of network
               | architecture.
               | 
               | For comparison, GPT-3 and AlphaStar have month long
               | training times and real-time (sub-second) inference
               | times.
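
The training/inference split described above can be sketched with a toy NumPy model (an illustration of the general principle only, not AlphaFold's architecture): training loops over the data many times to update parameters, while inference is a single cheap forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: y = 3x + 1 with a little noise.
X = rng.normal(size=(256, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.01 * rng.normal(size=256)

# --- Training: many gradient-descent passes to learn the weights ---
w, b = 0.0, 0.0
for _ in range(500):                   # iterative parameter updates
    pred = w * X[:, 0] + b
    err = pred - y
    w -= 0.1 * (err * X[:, 0]).mean()  # derivative w.r.t. w
    b -= 0.1 * err.mean()              # derivative w.r.t. b

# --- Inference: a single forward pass on new data ---
x_new = np.array([2.0])
y_new = w * x_new + b                  # just a multiply-add

print(round(float(y_new[0]), 1))       # → 7.0
```

The asymmetry scales: the 500-iteration loop dominates the cost here, just as weeks of TPU time dominate AlphaFold's cost, while the forward pass is nearly free.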
        
               | Rochus wrote:
                | Still much faster than synthesizing the protein and
                | then doing NMR or crystallography to solve the
                | structure puzzle, which easily takes half a year or
                | more (and requires very expensive equipment).
        
               | sanxiyn wrote:
               | That's training time, not inference time.
        
               | [deleted]
        
               | foota wrote:
               | My reading based on context was that this was time to
               | train, not time to predict.
        
             | FredFS456 wrote:
             | There are post-translational modifications to proteins.
             | This means that for many (most?) proteins, the amino acid
             | chain sequence is different from what you would predict
             | from the DNA. These modifications are dependent on the
             | state of the cell at the time of translation, and so cannot
             | be predicted from the DNA alone. This means that even with
             | a 100% accurate folding model, we cannot simply know the
             | shapes of all the proteins inside the human body based on
             | the genome.
        
             | carlob wrote:
             | Here is another interesting approach in synthetic protein
             | building:
             | 
             | https://science.sciencemag.org/content/369/6502/440.abstrac
             | t
        
         | baybal2 wrote:
          | One young lady I knew worked on neural-network recognition
          | of X-ray images.
          | 
          | They always had single-digit percentages of bizarre
          | artifacts, where the program sometimes couldn't recognise
          | the very data it was trained on, given the most minute
          | differences.
          | 
          | The other artifact was that the most "stereotypical" cases
          | were least reliably recognised, and they got a lot of flak
          | for screwed-up live demos, where a radiologist put a very,
          | very obvious tumor shot onto the scanner, and it didn't
          | work without half an hour of wiggling the film and the
          | camera.
          | 
          | The "bruteforce" solutions may well be always 80-85% off,
          | but off consistently. NN algos so far beat them, but fail
          | with double-digit frequencies on "artifacts" which they
          | themselves can't do anything about.
          | 
          | How well it deals with the latter is what I believe will
          | measure its real-world usefulness.
        
           | tuatoru wrote:
           | I agree. The failures have to be explicable if we are to
           | trust a model.
        
             | asah wrote:
             | Doesn't it depends on the application ? i.e. some
             | applications can tolerate false positives/negatives ?
        
               | baybal2 wrote:
                | May well be, but if you spend more compute and human
                | time checking for those corner cases than you would
                | with another, more consistent exhaustive-search
                | algorithm, then the method loses to it economically.
                | 
                | This is more the case the closer to brute force you
                | come, as in encryption cracking. Imagine spending
                | years of HPC cluster time trying to break a password,
                | while knowing you have a single-digit chance of
                | missing the right key, in a way which would be
                | completely impossible with a conventional solution.
        
           | klmadfejno wrote:
            | I find this disingenuous. Yes, it's important that the
            | algos can perform well on real-world data, but the
            | framing of this post begins with an anecdote about one
            | person who had a bad model, and implicitly extrapolates
            | that these problems are generalized throughout all
            | neural nets.
           | 
           | One could say the same thing about programmers automating a
           | task, or a number of other trivial examples. I would lean
           | towards assuming deep mind has competent model validation
           | teams vs. not, even if data science is hard.
        
         | npunt wrote:
         | In short, a core problem of biochem (the wagon) was just
         | hitched to Moore's law (the horse). Our understanding of
         | proteins will now grow exponentially not linearly, helping us
         | to move up a level of abstraction to higher level biochemistry
         | and biology problems.
        
         | breck wrote:
         | I never worked directly with protein folding or structure, but
         | worked a bit in proteomics on teams measuring gene expression
         | (which you could roughly think of as how much of each protein
         | is found in this cell). IIRC there are 50,000 - potentially
         | millions of "kinds" of proteins found in a human, and the
         | "shape" of most of them is unknown, and that determines a lot
         | about how they work.
         | 
         | So imagine you gave an iPhone to someone in the 1800's, they
         | wouldn't understand how most of it works, but this may be
         | analogous to them finally figuring out some key aspects of the
         | transistor. So it's another tool in the toolbelt and like all
         | good tools will be used in all sorts of unpredictable ways.
         | 
         | Someone else I'm sure could do a lot better at explaining how
         | important shape is to understanding the function and behavior
         | of proteins.
        
       | fogleman wrote:
       | How will this get into the hands of those who could use it?
        
         | sanxiyn wrote:
          | Realistically speaking, if you are a scientist who could
          | use this and you mailed DeepMind, they would probably run
          | it for free and send you the result. It would be good PR.
        
       | jeffbee wrote:
       | Pretty interesting that they only used about $15k worth of
       | resources (retail price) to achieve this. It's not a technique
       | that would have been out of reach for other organizations based
       | only on not being able to afford the compute.
        
         | moritonal wrote:
         | The tech might not be out of reach but the talent pool is.
         | 
          | Whether it's good PR or not is to be debated, but it seems
          | that the talent at DeepMind can simply accomplish things
          | others can't.
        
         | allenz wrote:
         | That's only for the final model. To find it, they'd need to run
         | 1,000 experiments, trying many high-level approaches, many
         | architectures for each component, hyperparameter search, and
         | multiple seeds. Large machine learning projects need $10M in
         | capital.
        
           | jeffbee wrote:
           | I bet it's still a lot less than they spent training
           | AlphaStar.
        
         | ducttapecrown wrote:
         | How much would the labor cost, though?
        
         | mdjt wrote:
         | Based on the going rate of a 32-core TPUv3 slice ($32/hr USD)
         | running "for a few weeks", isn't this closer to $65k USD?
        
           | entropicdrifter wrote:
            | One could buy 200 GPUs for less; I think that's where
            | the other comment's price estimate came from.
        
           | jeffbee wrote:
           | It says $1,752/mo for v3-8, so I just multiplied it 8x.
        
             | mdjt wrote:
             | Fair enough, that calculation is still a bit off if they
             | used 128 cores (16x instead of 8x). Not that it really
             | matters...
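
For what it's worth, the two estimates in this subthread can be reconciled with quick arithmetic (the prices are the figures quoted by the commenters, not verified current GCP rates, and "a few weeks" is assumed to mean three):

```python
cores = 128

# On-demand estimate: 32-core TPUv3 slices at $32/hr (mdjt's figure),
# running for an assumed 3 weeks.
slices = cores // 32            # 4 slices needed for 128 cores
hours = 3 * 7 * 24              # "a few weeks" ~= 504 hours
on_demand_usd = slices * 32 * hours

# Monthly-rate estimate: v3-8 devices at $1,752/month (jeffbee's
# figure). 128 cores needs 16 devices (16x), not the 8x originally
# multiplied.
devices = cores // 8
monthly_usd = devices * 1752

print(on_demand_usd, monthly_usd)   # 64512 28032
```

So the plausible range for the final training run spans roughly $28k-$65k depending on the pricing model, which is why the thread's figures disagree.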
        
         | epsylon wrote:
         | I'm pretty sure that this took more than 1 junior engineer-
         | month.
        
       | seek3r wrote:
       | Kudos to DeepMind. I'm eager to read their paper.
        
       | wespiser_2018 wrote:
        | This will undoubtedly change our understanding of human
        | health and biology in many impactful ways in the years to
        | come!
        | 
        | The same information we get through X-ray diffraction will
        | now be available 100x or even 1000x cheaper, and this model
        | can even aid the interpretation of X-ray diffraction data!
       | 
       | What excites me most isn't doing what we can do now, for cheaper
       | (which will surely lead to more effective research methods), but
       | the potential to gain a systematic view of protein structures,
       | either across the genome, species, or through time which will
       | give us a deeper and more fundamental understanding of biology.
        
       | mensetmanusman wrote:
        | This is amazing. If we can simulate multi-protein
        | interactions, you could imagine seeing, within our
        | lifetimes, a fully computation-driven simulation of a human
        | blood cell. That would be a huge breakthrough.
        
         | visarga wrote:
         | What amazed me most was that they used hundreds of millions of
         | unlabelled protein scans. This means we can collect massive
         | data in a new modality, besides the usual suspects: images,
         | video, audio, text, lidar and sensors. Soon I expect neural
         | implant data to be massive as well.
         | 
         | They surely did unsupervised training on raw data and then
         | fine-tuning on the 170K labelled sequences. I expect the data
         | volume could be increased by orders of magnitude in the next
         | couple of years and we'll see a GPT-3 like jump.
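
If the speculation above is right, the unsupervised stage would resemble masked-token pretraining on raw sequences. A minimal sketch of generating such training pairs (`mask_sequence` is a hypothetical helper; this is a guess at the style of objective, not DeepMind's published method):

```python
import random

def mask_sequence(seq, mask_rate=0.15, mask_char="X", seed=None):
    """BERT-style masking for self-supervised pretraining on raw,
    unlabelled protein sequences: hide a fraction of residues and
    ask the model to predict them back."""
    rng = random.Random(seed)
    masked = list(seq)
    targets = {}                      # position -> true residue
    for i, aa in enumerate(seq):
        if rng.random() < mask_rate:
            targets[i] = aa           # record the answer
            masked[i] = mask_char     # hide it from the model
    return "".join(masked), targets

inp, tgt = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", seed=1)
```

A model pretrained this way on hundreds of millions of sequences would then be fine-tuned on the ~170K structures with known labels.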
        
       | hsnewman wrote:
       | That's kinda a big deal.
        
       | dang wrote:
       | Url changed from
       | https://predictioncenter.org/casp14/zscores_final.cgi, which
       | points to this.
        
       | 6gvONxR4sf7o wrote:
        | I hate headlines like "X has solved Y." How often have we
        | seen computer vision and natural language solved at this
        | point,
       | whenever a model does well enough in a benchmark? Their own
       | article doesn't even have that headline. This is a massively cool
       | thing that's happened. Why ruin it with a massively hyperbolic
       | headline?
        
         | falcor84 wrote:
         | I don't think I ever saw a headline saying natural language is
         | solved; who's claiming that?
        
         | TheRealPomax wrote:
          | Because only the experts in this field get to tell us, the
          | laymen, what "solving the protein folding problem" means,
          | and they defined it not as "perfect" but as "more than
          | good enough to be acceptable as a correct result". Which
          | this did.
          | 
          | X has _actually_ solved Y. That's not so much "massively
          | cool"; that's historical.
        
           | 6gvONxR4sf7o wrote:
           | I think the "they" you're referring to is only whatever PR
           | person wrote the headline. Nowhere in the substance of this
           | (PR!) post does it refer to it as anything but a great leap.
           | When an expert in the field outside of deepmind says protein
           | folding has been solved, I'll believe it.
        
             | nharada wrote:
             | It does appear other experts in the field are claiming
             | this:
             | https://twitter.com/MoAlQuraishi/status/1333383769861054464
        
         | danaris wrote:
         | The "solved protein folding" part isn't even in the article. It
         | appears to be clickbait editorialization by whoever submitted
         | the link.
        
       | cs702 wrote:
       | Two years ago, after DeepMind submitted its first set of
       | predictions to CASP (Critical Assessment of protein Structure
       | Prediction), Mohammed AlQuraishi, an expert in the field, asked,
       | "What just happened?"
       | 
       | https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp...
       | 
       | Now that the problem of static protein structure prediction has
       | been _solved_ (prediction errors are below the threshold that is
       | considered acceptable in experimental measurements), we can
        | confidently answer AlQuraishi's question:
       | 
       | Protein Folding just had its "ImageNet moment."
       | 
       | In hindsight, AlphaFold v1 represented for protein structure
       | prediction in 2018 what AlexNet represented for visual
       | recognition in 2012.
        
         | dmix wrote:
         | > However, if the (AlphaFold-adjusted) trend in the above
         | figure were to continue, then perhaps in two CASPs, i.e. four
         | years, we'll actually get to a point where the problem can be
         | called solved, in terms of gross topology (mean GDT_TS ~ 85% or
         | so). Interesting prediction within.
         | 
          | It turned out to be only one more year instead of four
          | (depending on whether getting to the ~90 range counts as
          | "solved").
         | 
         | I'm curious to see if AlphaFold can do even better the next two
         | years.
         | 
         | Those last mile percentages always tend to be small anyway.
        
         | xral wrote:
         | AlQuraishi's tweet [0] about this:
         | 
         | > CASP14 #s just came out and they're astounding--DeepMind
         | looks to have solved protein structure prediction. Median
         | GDT_TS went from 68.5 (CASP13) to 92.4!!!! Cf. their 2nd best
         | CASP13 struct scored 92.8 (out of 100). Median RMSD is 2.1A. I
         | think it's over
         | https://predictioncenter.org/casp14/zscores_final.cgi
         | 
         | [0]:
         | https://twitter.com/MoAlQuraishi/status/1333383634649313280
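
For readers unfamiliar with the metric: the 2.1 Å median RMSD quoted above is the root-mean-square deviation between matched atom positions in the predicted and experimental structures. A minimal sketch (assuming the two coordinate sets are already optimally superimposed, which real comparisons do first via the Kabsch algorithm):

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two matched sets of 3-D
    coordinates, in the same units (e.g. Angstroms)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    # Per-atom squared displacement, averaged, then square-rooted.
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

# Predicted vs. experimental positions for three atoms, each 1 A
# off along x:
pred = [[0, 0, 0], [1, 0, 0], [2, 0, 0]]
expt = [[1, 0, 0], [2, 0, 0], [3, 0, 0]]
print(rmsd(pred, expt))   # → 1.0
```

A 2.1 Å median means the typical atom in the prediction sits about two atomic radii from its experimentally measured position, comparable to the noise in the experiments themselves.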
        
           | elwell wrote:
           | > https://predictioncenter.org/casp14/zscores_final.cgi
           | 
           | `.cgi`... we've come full circle
        
           | matsemann wrote:
            | What does that A mean? Never seen our letter used in a
            | scientific context.
        
             | seslattery wrote:
             | It's the symbol for Angstrom, a unit of length 10^-10m
             | https://en.wikipedia.org/wiki/Angstrom
        
             | kolinko wrote:
              | 0.1nm - approximately the size of an atom - used often
              | in organic chemistry.
        
             | smt1 wrote:
              | It's used a lot when systems are examined at the nano-
              | scale. Metrification and creating a "Fubini's theorem"
              | for a specific problem to measure something (indeed
              | category theory is useful for building a localized
              | "global wire" with appropriate "gauges" of interest
              | where optimization methods will work (to achieve the
              | non-equilibrium control-theoretic orient-folds of
              | whatever "the soln" is) with enough "space" to "try"
              | pull-backs and push-forwards as needed (for a
              | class/family of physically analogous data)). I think
              | looking at things through Joseph Fourier's eyes is
              | pretty enlightening. He seems to have ideated the heat
              | transfer problem (and being able to apply modern
              | methodology by forming distributed or sparse
              | representations of it, then assessing its non-linear
              | dynamics in modern robotics and mathematics senses,
              | which would very much be applying Pfaffian dynamics;
              | and being able to know about cohomologies is a
              | blessing such that the appropriate physical effect
              | where the maximum likelihood is constrained). It is
              | important in scale-free systems, fibers of networks of
              | systems that need to be localized (this is
              | approximately global sections of global optimization,
              | but then model-identified), mass effects which require
              | some sort of techno-economic analysis (think the
              | climate resilience problem), and (historically; I
              | think COVID will shift that) the lack of progress
              | towards applied coding in the life sciences vs. the
              | information sciences. What's pretty surreal to me is
              | that exploring (and documenting some of the
              | interesting blurs between fields), say math, physics,
              | statistics, computer science, signal processing,
              | natural language (even the language of scientific
              | discourse), renormalization methods, naturalizations,
              | socializations, and what are global/local laws, lets
              | you almost approach it as a "reverse Robin Hood"
              | problem.
        
             | flobosg wrote:
             | Angstrom, a length unit. 1 A = 0.1 nm.
        
         | softwaredoug wrote:
         | > I don't think we would do ourselves a service by not
         | recognizing that what just happened presents a serious
         | indictment of academic science.
         | 
          | Much like in other fields, I do begin to question the
          | academic structure for making advances. It appears
          | something is rotten in the state of academia. Oddly, it's
          | academia doing incremental improvements to existing
          | methods but industry making novel leaps and bounds... The
          | other major case in point being NLP.
        
           | codingslave wrote:
            | Academia keeps employing people who have done well in
            | classes and within fine bounds. It's a careerist track.
            | Industry cares about results; it's more meritocratic.
        
           | ac42 wrote:
           | I think so, too. Linear algebra, control theory and quantum
           | mechanics haven't gotten us anywhere and ivory towers prevail
           | as this machine learning solution to a problem in biological
           | chemistry clearly demonstrates. /s
        
         | flobosg wrote:
         | AlQuraishi described the progress made in CASP13 (2018) as "two
         | CASPs in one". This one is an even bigger breakthrough.
        
           | Seanambers wrote:
            | I particularly like the rant on pharmaceutical
            | companies' lack of basic research. My impression has
            | been that medical progress has been slow for quite some
            | time; nice to see that there is some truth to that.
            | 
            | In the end, software and tech companies might just eat
            | up the pharmaceutical industry as well. It's all just
            | code at some level.
            | 
            | The DeepMind team did this with:
           | 
           | "We trained this system on publicly available data consisting
           | of ~170,000 protein structures from the protein data bank
           | together with large databases containing protein sequences of
           | unknown structure. It uses approximately 128 TPUv3 cores
           | (roughly equivalent to ~100-200 GPUs) run over a few weeks,
           | which is a relatively modest amount of compute in the context
           | of most large state-of-the-art models used in machine
           | learning today."
           | 
           | So it wasn't out of reach for academia, pharmaceuticals, or
           | others with a bit of resources.
        
             | flobosg wrote:
             | Yeah, it was a big slap in the face. But, to be fair, most
             | of the scientific and technological advances (sequencing
             | efforts, structural genomics projects, etc.) that generated
             | the data used by DeepMind came from academia and, to a
             | lesser extent, the pharma industry.
        
               | sjg007 wrote:
               | I think the lesson here is that most of the big data
               | genomic, metabolic, pharmacologic and other research will
               | _all_ be driven by deep learning. The models themselves
               | however require 100+ gpus so we are sort of back in that
               | phase where you need large compute systems to even
                | compete. A single lab will have issues unless they
                | can leverage a cloud and then also get grant funding
                | to spend that money on cloud compute... which may be
                | difficult b/c it's basically a consumable now and
                | you don't have any hardware leftover.
        
             | throwawayiionqz wrote:
             | This is the cost of training the final architecture with
             | all the refinements enabled by years of research.
             | 
             | These years of research involved trying many different
             | architectures, many of which received as much or more
             | compute time than the final system.
             | 
             | The price of training the final architecture is
             | meaningless. Researching and training AlphaGo was expensive
             | but it enabled the ideas and development of AlphaZero which
             | is more computationally tractable.
             | 
             | To have any chance, an academic team would need the same
             | compute resources as what the DeepMind protein folding team
             | used during the whole development of the architecture
             | during the last few years, not only the resources used to
             | train the final system. And I bet this funding is not
             | available to most if not all academic teams.
        
               | mjn wrote:
               | Even if you try to account for the overall R&D cost,
               | DeepMind isn't _that_ large an organization by the
                | standards of biomedical research. It's very big and
                | well funded for a _computer science_ research
                | organization, yes, and most CS departments can't
                | match its resources.
               | But the NIH budget is $40 billion, and private
               | pharmaceutical companies do another $80 billion in annual
               | R&D. It's interesting that this kind of breakthrough
               | didn't come from those sectors.
        
               | dekhn wrote:
               | DeepMind is taking advantage of NIH's funding. For
               | example, Anfinsen who demonstrated that proteins fold
               | spontaneously and reproducibly
               | (https://en.wikipedia.org/wiki/Anfinsen%27s_dogma) ran a
               | lab at NIH. Levinthal (who postulated an early and easily
               | refutable model of protein folding) was funded by NIH for
               | decades. Most of the competitors at CASP are supported by
               | NIH and its investments have contributed to the modern
               | results significantly.
               | 
                | That said, I think the academic and pharma
                | communities had engineered themselves into a corner
                | and weren't going to see huge gains (even though
                | they are exploring similar ideas) for a number of
                | banal reasons.
        
               | WanderPanda wrote:
               | It seems like spending these government funds on creating
               | new challenges like CASP and ImageNet could have an
               | enormous ROI. Don't let them try to choose the winner,
               | just let them define the game
        
               | mjn wrote:
               | That's a good point; this system certainly didn't come
               | from nowhere! The protein datasets they used also mostly
               | came out of various NIH-funded projects.
               | 
               | What I meant to focus on was that I think DeepMind has
               | less of a pure money/scale advantage in this area than in
               | some others. In something like Go or Atari game-playing,
               | there are many academic groups researching similar
               | things, but their resources are laughably small compared
               | to what DeepMind threw at it. So you might argue that
               | they got good results there in part because they directed
               | 1000x the personnel and compute at the problem compared
               | to what any academic group could afford. In biomed
               | though, their peers in academia and industry are also
               | pretty well-funded.
        
               | dekhn wrote:
               | Personally I think a major part of the secret sauce is
               | Google's internal compute infrastructure. When I was an
               | academic, 50% of my time went to building infra to do my
               | science. At Google, petabytes of storage, millions of
               | cores, algorithms, and brains were all easily tappable
               | within a common software repo and cluster infrastructure.
               | That immediately translates to higher scientific
               | productivity.
        
               | smt1 wrote:
                | I agree. What's doubly interesting is Google's
                | internal transparency and open-source-first policy.
                | I think it's probable that that effect spreads and
                | creates flywheel effects for the life, natural, and
                | behavioral sciences. Keep in mind that they've also
                | effectively absorbed the R&D side of Bell Labs from
                | a computer science/distributed computing point of
                | view (Gopher is pretty much that), which is also
                | interesting from a sociological p.o.v.: "this is the
                | shifting-the-resources-of-the-polyad-network
                | problem", or problems caused by the rapid
                | commercialization of the World Wide Web rather than
                | the physics it was originally ideated for at CERN,
                | and moving to effective effort in other fields, even
                | if it doesn't happen at Alphabet. Hell, they could
                | be dismantled (given the FTC complaints), and the
                | resultant companies would probably rebuild like
                | paperclips, sort of like Ma Bell did post-1984.
        
               | t_serpico wrote:
               | You hit the nail on the head here.
        
               | [deleted]
        
               | asah wrote:
               | Having recently experienced both, 1000x this.
        
               | MaxBarraclough wrote:
               | Has cloud computing changed this?
        
               | dekhn wrote:
               | Mostly? I left google to work at a biotech startup
               | working in a related area and found that the big three
               | cloud providers have built systems that greatly improve
               | computational science. That said, it's still a lot of
               | work to get productive, many in the field are really
               | resistant to changes like version control, continuous
               | integration, testing, and architecting distributed
               | systems for handling complex lab production environments.
               | 
               | Here's an exemplar of how I think it evolved well in a
               | cloud world: https://gnomad.broadinstitute.org/
               | 
               | that project adopts many concepts from google and others
               | and greatly improved our analytic capabilities for large-
               | scale genomics.
        
               | zaroth wrote:
               | > _The price of training the final architecture is
               | meaningless._
               | 
               | The research is the giant shoulders you stand on, the
               | compute cost is the price of the tool you need to do the
               | present-day work.
               | 
                | Both are relevant, but the shoulders of giants are
                | generally more accessible, particularly if we're
                | talking about published research and not proprietary
                | tech.
               | 
               | A competing team is not starting from the same place the
               | DeepMind team started at 5 or 10 years ago.
        
               | zaroth wrote:
                | To expand on this: after fully reading AlQuraishi's
                | "What Just Happened" post from a couple years ago,
                | what stood out was this point he made:
               | 
               | > _I don't think we would do ourselves a service by not
               | recognizing that what just happened presents a serious
               | indictment of academic science. There are dozens of
               | academic groups, with researchers likely numbering in the
               | (low) hundreds, working on protein structure prediction.
               | We have been working on this problem for decades, with
               | vast expertise built up on both sides of the Atlantic and
               | Pacific, and not insignificant computational resources
               | when measured collectively. For DeepMind's group of ~10
               | researchers, with primarily (but certainly not
               | exclusively) ML expertise, to so thoroughly rout
               | everyone surely demonstrates the structural inefficiency
               | of academic science. This is not Go, which had a handful
               | of researchers working on the problem, and which had no
               | direct applications beyond the core problem itself.
               | Protein folding is a central problem of biochemistry,
               | with profound implications for the biological and
               | chemical sciences. How can a problem of such vital
               | importance be so badly neglected?_
               | 
               | In short, academia got utterly schooled by a small group
               | at Google spending a relatively small dollar amount on
               | compute, using techniques that in hindsight are fairly
               | described as "simplistic". There's no way around it.
        
               | Invictus0 wrote:
               | I don't think AlQuraishi really hits the mark in his
               | critique. Pointing out that hundreds or thousands of
               | people worked on the problem for decades ignores the
               | fact that the field of machine learning has grown
               | extremely rapidly over the last decade, that the
               | available compute power has grown exponentially, and
               | that the people working on the problem simply weren't
               | looking at it the way the DeepMind people were.
               | 
               | If you were trying to get across the Atlantic, this would
               | be like getting upset at a group of bridgebuilders for
               | trying to solve the problem by building a bridge across
               | instead of by inventing the airplane. The approaches are
               | that different.
        
               | flobosg wrote:
               | > and the people working on the problem simply weren't
               | looking at the problem in the way that the deepmind
               | people were looking at it.
               | 
               | >The approaches are that different.
               | 
               | I'm not sure if that analogy applies here. DeepMind
               | wasn't the first group tackling structure prediction with
               | machine learning. Their success lies in the innovations
               | that they implemented (predicting interresidue distances
               | as opposed to contacts, for example).
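To make that distinction concrete, here is a toy sketch (pure Python, with made-up distances) of how one row of a pairwise distance matrix looks as a binary contact prediction versus a binned distance ("distogram") prediction. The 8 A contact cutoff is the usual convention; the bin edges here are invented for illustration:

```python
# One residue's predicted C-beta distances (in angstroms) to
# residues 0..3; the numbers are invented for illustration.
row = [0.0, 5.2, 9.8, 14.1]

# Contact prediction: a single bit per residue pair
# (closer than 8 A or not).
contacts = [d < 8.0 for d in row]

# Distance prediction: each pair is assigned to one of several
# distance bins instead, preserving far more geometric
# information for the downstream structure optimization.
edges = [4.0, 6.0, 8.0, 10.0, 12.0]

def bin_index(d):
    # Number of bin edges at or below d = which bin d lands in.
    return sum(e <= d for e in edges)

distogram = [bin_index(d) for d in row]

print(contacts)   # [True, True, False, False]
print(distogram)  # [0, 1, 3, 5]
```

A contact map throws away everything except the threshold; the distogram keeps a graded notion of "how far", which is the kind of innovation the comment above refers to.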
        
               | dash2 wrote:
               | To be fair, I'm not sure that they are "simplistic" in
               | the sense that, e.g., writing a neural network to
               | recognise cat pictures is now simplistic. I don't know
               | how many people have Deepmind levels of expertise in ML,
               | or could implement what they have done, but I doubt it is
               | many, and they are thinly spread amongst many interesting
               | problems.
        
               | craftinator wrote:
               | > The price of training the final architecture is
               | meaningless.
               | 
               | Meaningless in historical terms, but meaningful in future
               | terms. It's meaningless how long the training took
               | because there were countless resources spent to get to
               | that point. It's meaningful in the future, because we
               | know that training times are fairly short, and iteration
               | can be done fairly quickly.
        
             | beowulfey wrote:
             | I mean, credit where credit is due. Google employs some of
             | the greatest names in artificial intelligence and the
             | DeepMind team had a huge chunk of them working on this
             | problem. While the _resources_ may have been available, I
             | don't think any other single institution had the level of
             | brain power.
        
               | mrDmrTmrJ wrote:
               | Absolutely. The capability to "create" the breakthrough
               | is extremely rare. Perhaps only DeepMind, OpenAI, and
               | GoogleBrain can assemble these types of teams. Luckily,
               | the capability to replicate and exploit the breakthrough
               | is far more 'common'; though still very rare.
               | 
               | Excited to see how follow on use of these models, by many
               | more teams, researchers, and companies plays out over the
               | next two decades.
               | 
               | This is a foundational advance!
        
               | elcritch wrote:
               | It also makes one reconsider the notion that monopolies
               | are _entirely_ bad. This essentially appears to be a
               | vanity project for Google. Of course they'll benefit
               | from it in many ways, but it's not like they're doing
               | this as the core product of their service. It's a
               | pretty awesome achievement.
        
               | Ericson2314 wrote:
               | You've just described why many Socialists 100 years ago
               | were very skeptical of anti-trust, seeing it as
               | sacrificing modernity to prop up a romanticized notion
               | of the past as disaggregated pure-petit-bourgeois
               | capitalism. Really not that different from the
               | criticism of the Luddites 100 years before that.
               | 
               | See
               | https://ilr.law.uiowa.edu/print/volume-100-issue-5/all-i-
               | rea...
        
               | bosswipe wrote:
               | Imagine we lived in a culture that did not believe
               | "government is always bad at everything". Government
               | could then pay Google-level salaries and provide Google-
               | level resources to the top minds in the world and give
               | them free rein to tackle problems like this. It's worked
               | in the past, such as the Manhattan Project or the moon
               | landing.
               | But I don't think it's doable nowadays because of the
               | anti-government political culture. Even when government
               | is fully funding things these days the work has to be
               | farmed out to private interests.
        
               | Supermancho wrote:
               | > It also makes one reconsider the notion that monopolies
               | are entirely bad.
               | 
               | Much like political dictators, they can be exceedingly
               | efficient and have resources (and authority) to do things
               | in spite of opposing interests.
               | 
               | People who are faced with the narrative that countries
               | have a monopoly on a number of aspects of life find
               | that monopolies are not a BAD THING(tm) per se, but
               | that they are bad for a consumer market - as a monopoly
               | eventually blockades aspects of the market.
        
               | bawolff wrote:
               | Or to put another way, the kings and queens of yesteryear
               | funded a staggering amount of beautiful art, etc.
        
               | e_y_ wrote:
               | I think there's some merit to the idea that huge
               | corporate monopolies have the resources to accomplish
               | undertakings that smaller companies cannot. But it's
               | often a what-if, because we don't know what the
               | alternative might have been.
               | 
               | Big companies can suck up all the air in the room by
               | monopolizing talent and making it harder for startups to
               | pay the kinds of salaries needed for top tier AI
               | research. Xerox PARC came up with all kinds of
               | groundbreaking inventions that were never commercialized
               | (by them). For every invention that comes out of a big
               | company, it's worth thinking about whether it might have
               | actually come out faster if it was borne of competition
               | instead of a side project. Or in the grand scheme of
               | things, if corporate taxes were higher and the money was
               | given to a university research lab.
               | 
               | I think the best results may come from the middle ground.
               | Smaller/medium companies are so worried about staying
               | afloat or hitting their quarterly earnings that they have
               | trouble making long term investments. Large companies are
               | diverse and profitable enough that they can afford to
               | blow money on things that might not pan out, but they
               | don't have the same drive -- and in fact have some
               | pressure to avoid being "too" innovative because it could
               | cannibalize their existing products.
        
               | generalizations wrote:
               | Note that Bell Labs is another example of the corporate
               | monopoly research lab producing things that others
               | couldn't / didn't.
        
               | xzel wrote:
               | Look at all of the incredible things that came out of
               | Bell labs during their monopolistic reign. I think a
               | better way to put it is not all monopolies are bad for
               | research and progress but many are bad for other social
               | and economic reasons. Like any position of power, it
               | depends on how it is used and who is using it.
        
               | soup10 wrote:
               | It's kind of like a modern day Bell Labs where they have
               | so much excess profit from adtech that they can fund lots
               | of "basic research" or the computer science equivalent of
               | that.
        
               | nightski wrote:
               | Not even a little bit. There is nothing here that would
               | require Google to be a monopoly to accomplish. If
               | anything companies become lazy without competition.
               | 
               | I feel like that is not too far from saying it makes one
               | reconsider communism because good things can happen with
               | authoritarian control.
        
             | IfOnlyYouKnew wrote:
             | In a prior life I worked on protein folding, and
             | participated in CASP.
             | 
             | This was a/the "holy grail" problem of molecular biology,
             | long thought to be an automatic Nobel. It's somewhat unfair
             | to characterise developments prior to this as
             | insignificant. In fact by the time I was working on it,
             | that "automatic Nobel" was no longer assumed, because the
             | field had made quite a bit of progress, in many tiny steps
             | by many different groups, and the assumption was it would
             | continue in this slog until reaching some state of
             | sufficiency for practical applications without ever seeing
             | the sort of singular achievement that would be worthy of
             | praise and prize.
             | 
             | Far more went into this breakthrough, obviously, than those
             | TPU-hours: the development of those TPUs, for example, and
             | assembling a team that can make use of them. The protein
             | folding problem requires very little knowledge of biology
             | or physics to understand and was always pre-destined for
             | some outsider to sweep. Indeed, there was a game that
             | allowed people to solve structures by intuition alone,
             | and, IIRC, some 13-year-old Mexican kid cleaned
             | everyone's clock some years back.
             | 
             | Why didn't some research group do this first? Most of them
             | just don't have the budget. We were five people, total,
             | IIRC, and felt pretty rich because we were computer-people
             | getting the same budget for materials as everyone at our
             | institution, which was all wetlab, otherwise. So I was a
             | student being paid $20/h but with a $50,000/p.a. hardware
             | budget. How many false starts does it take before you do
             | that run with 128TPUs "for a few weeks" that works? If you
             | blow your budget on one gigantic Google invoice, what's
             | going to happen to you when it doesn't pan out, and the
             | whole institute laughs at you? Etc...
             | 
             | There are quite a few rather good things this problem has
             | inspired over the years, though. Among them is CASP itself:
             | the idea of instituting a yearly competition that gives
             | unequivocal feedback on the state of the field and every
             | group working on it is rather rare, I believe, and it's
             | been successful. Indeed, it would seem that CASP was
             | necessary to attract outside groups like Deepmind, i. e.
             | deep-pocketed industry groups striving to prove themselves
             | on a clearly defined problem. Chess, Jeopardy, CASP: maybe
             | it would be worthwhile to explore not <solving x>, but
             | <stating X as a problem that attracts Google/IBM/etc.-scale
             | money> as a superior strategy in some cases.
             | 
             | There was also folding@home, pioneering the distributed-
             | donated-computing model, and the aforementioned
             | gamification of the problem, and hundreds of the most
             | intricate, custom-tailored, more-or-less insane ideas people
             | devoted months and/or careers and/or careers of their most
             | promising post-docs to that didn't pan out.
             | 
             | Like cellular automata. They don't work for this, trust me.
             | (Great hit for interactive poster sessions, though)
        
             | tonfa wrote:
             | > So it wasn't out of reach for academia, pharmaceuticals,
             | or others with a bit of resources.
             | 
             | How much does hiring a deepmind-like team cost though?
             | (massively more than the TPU resources?)
             | 
             | Still within reach of pharmaceutical industry I guess, but
             | maybe not so easy for academia.
        
               | t_serpico wrote:
               | Also, pharma does not really have a huge incentive to
               | work on this problem. Solving the protein folding problem
               | does not automatically translate to new drugs, just as
               | CRISPR or DNA sequencing did not. It's another
               | tool in the toolbox (which to be clear is a big deal).
        
               | Seanambers wrote:
               | From what I can gather, Google bought Deepmind for 500
               | million USD in 2014, and it had outstanding debt to
               | its parent company of 1.3 billion USD as of 2019.
               | 
               | And they had income around 100 million in 2019 but it's
               | all against Google, so looks like a 2 billion +/- 0.5
               | operation so far, and who knows if they pay for compute.
               | 
               | Other articles place the runrate at 500 million per year
               | in 2019.
               | 
               | Which means 500 million x 6 years = 3 bn, plus the 0.5
               | bn purchase price = 3.5 bn. So a total cost so far
               | somewhere in the 2.5-3.5 billion range seems likely.
               | 
               | Nevertheless doesn't seem out of reach for a
               | multinational.
        
               | sseagull wrote:
               | It would still be a significant amount of money for a lot
               | of companies.
               | 
               | Remember, we know only in hindsight that it seemingly
               | paid off. A few years ago, this was just an educated bet;
               | only the richest companies with money to burn (from
               | selling ads) would be willing to take on that kind of a
               | risk.
        
               | TulliusCicero wrote:
               | That's the cost of running DeepMind as a whole, right?
               | Which includes all the other stuff they've worked on,
               | like games.
        
               | Seanambers wrote:
               | Yeah, as far as I can tell, that's the whole lot of it.
        
         | mwcampbell wrote:
         | How far does the similarity extend? Specifically, the big
         | question for me is whether AlphaFold will be freely available
         | like ImageNet, or proprietary.
        
           | 0-_-0 wrote:
           | ImageNet is a competition and a dataset, AlphaFold is a
           | neural network.
        
           | ramraj07 wrote:
            | The competition requires revealing enough about the
            | methodology for other teams to replicate it, so open
            | implementations are going to be available for sure.
           | 
           | It also looks like they came up with a brand new jiggling
            | algorithm which is probably just V1 now; this really changes
           | things in a significant way!
        
           | sanxiyn wrote:
           | I expect this to be quickly replicated once published.
            | Training data is public, training compute is not
            | enormous, and AlphaFold of 2018 did get replicated.
        
             | dekhn wrote:
             | CASP typically works this way: one person "wins" by getting
             | a slightly higher score than everybody else. Two years
             | later, the top teams have all duplicated the previous
             | winner's tech, and two years after that, there's a github
             | you can download and run on your GPU to reproduce
             | everything.
        
             | kxs wrote:
             | How do you define enormous? "It uses approximately 128
             | TPUv3 cores (roughly equivalent to ~100-200 GPUs) run over
             | a few weeks". Also last time it took about a year for good
             | replications to pop up.
        
               | dragontamer wrote:
               | A lot of labs have access to the various strategic
               | supercomputers of the USA.
               | 
               | Ex: Summit has 27,648 V100 GPUs (and those V100s have
               | Tensor units). If you're saying that only 200 GPUs are
               | needed to replicate the experiment, that doesn't even use
               | up 1% of Summit's available utilization.
        
               | elcritch wrote:
               | A couple of hundred GPUs is well within the reach of
               | even moderately well-heeled research institutes. It'd
               | seem that about 3 weeks of compute time with 128 TPU
               | v3s would be about $170,311.68.
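The quoted figure works out exactly if you assume a rate of $2.64 per TPU v3 core-hour for 21 days; that rate is reverse-engineered from the total above, not an official price, so treat this as a back-of-envelope sketch:

```python
# Back-of-envelope reconstruction of the ~$170k estimate. The
# $2.64/core-hour rate is an assumption inferred from the quoted
# total; real cloud TPU pricing varies by region, preemptibility,
# and commitment.
cores = 128          # TPU v3 cores
rate = 2.64          # assumed USD per core-hour
hours = 21 * 24      # "about 3 weeks"

total = cores * rate * hours
print(f"${total:,.2f}")   # $170,311.68
```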
        
               | kxs wrote:
               | But of course that cost would only be for the final
               | model. Anyway, I think I am just living in a different
               | world... :-) We could never compete with that
        
               | elcritch wrote:
               | Yah, big grant money. Now the grad students programming
               | the open source clones will only make approximately
               | $0.56, or 4.2 Ramen packs, for their effort. ;)
        
               | sdenton4 wrote:
               | Also worth keeping in mind that once a good open source
               | model is available, researchers with less resources can
               | still use it to fine tune and get new results for far
               | cheaper than training a new model from scratch.
        
               | intpx wrote:
               | or cryptominers
        
               | mrDmrTmrJ wrote:
               | A year is a fast time to replication in many scientific
               | fields.
               | 
               | While substantial, the resources here are well within
               | reach of many labs, research institutes, and
               | organizations. For a result this big, I'd guess we'll
               | have 2-6 additional implementations in the next 18
               | months. The problem has been 'open' for 40+ years, so
               | that's lightning fast!
        
       | justinzollars wrote:
       | I have a Masters in Biology. This was once described as an
       | impossible problem to solve. A huge achievement.
        
       | SubiculumCode wrote:
       | RIP folding at home?
       | 
       | EDIT: Just throwing this out there: Are there national security
       | issues to think about with this? Can it be used to weaponize
       | computational biology?
        
         | flobosg wrote:
         | Folding@home tackles a related but different problem. They
          | simulate folding dynamics, i.e. how a protein reaches
          | its folded structure.
         | 
         | If AlphaFold gives you a picture of a protein structure,
         | Folding@home shoots a video of that protein undergoing folding.
        
       | mylons wrote:
       | 12-13 years ago in a classroom the professor for my intro to
       | bioinformatics class said if you were to solve this problem, you
       | would win a Nobel prize. Congrats to the team! What an
       | achievement.
        
       | comicjk wrote:
       | CASP (Critical Assessment of protein Structure Prediction) is
       | calling it a solution. To quote from the article:
       | 
       | "We have been stuck on this one problem - how do proteins fold up
       | - for nearly 50 years. To see DeepMind produce a solution for
       | this, having worked personally on this problem for so long and
       | after so many stops and starts, wondering if we'd ever get there,
       | is a very special moment."
       | 
       | --Professor John Moult Co-founder and chair of CASP
        
         | light_hue_1 wrote:
         | This is an issue of the more subtle aspects of English.
         | 
         | "To see DeepMind produce a solution for this" does not imply
         | something is solved. I can produce a bad solution. I can
         | produce a really good solution. All without solving a problem.
        
           | comicjk wrote:
           | This is a really good solution. Of course, there's still room
           | for more research and better methods in the future, but now
           | computational protein structure prediction can compete with
           | experiments actually measuring the structure.
        
         | dekhn wrote:
         | It's an improvement- and a big one- but not a solution to the
         | problem. It mainly shows just how stuck the community had
          | gotten with their techniques and how recent improvements
          | in DNNs and information theory methods can be exploited
          | if you have lots of TPU time.
        
           | aardvarkr wrote:
           | It's officially recognized as a solution.
        
             | cambalache wrote:
             | Well, it's not. Nature does not have a committee, sorry.
             | Proteins are delicate "machines" where even a small
             | change in the sequence (and thus the 3D structure), as
             | small as a few amino acids, can effectively change the
             | structure and the function of the protein. On top of
             | that, proteins are dynamic beasts. In any case, it's a
             | great advance, but DM, like many companies, likes to
             | toot its own horn a little bit too much.
        
               | [deleted]
        
             | ClumsyPilot wrote:
             | I am not sure we are talking about the same thing - i.e.
             | there is a solution for hunger, but it's not a solved
             | problem.
        
               | mrDmrTmrJ wrote:
               | This benchmark may be solved, but simultaneously, there
               | remain other open problems relating to protein folding
               | which are unsolved and which may not even have benchmarks
               | yet :)
               | 
               | Said differently, there's vast space between having a
               | great result on a specific benchmark (this) and solving
               | all interesting problems in a scientific field.
        
             | dekhn wrote:
             | No, it's not. The folks who run CASP gave some nice PR, but
             | it doesn't mean that protein folding is solved.
        
       | dr_dshiv wrote:
       | "It has occurred decades before many people in the field would
       | have predicted. It will be exciting to see the many ways in which
       | it will fundamentally change biological research."
        
       | breck wrote:
       | v2 looks amazing. that jump is even more incredible than the
       | first. More context from v1 in 2018:
       | 
       | https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp...
        
       | m3kw9 wrote:
       | Does this obsolete Folding@home?
        
         | michaelcampbell wrote:
         | My question exactly; or Rosetta @ home, or any of the other
         | protein folding "@home"s. I participate in a few, but would
         | gladly donate my compute resources elsewhere if this is no
         | longer necessary.
        
       | amelius wrote:
       | This was also one of the main selling points of quantum
       | computers.
       | 
       | Makes you wonder what Deep Learning will tackle next.
       | Factorization of large integers?
        
       | dang wrote:
       | All: there are multiple pages of comments; if you're curious to
       | read them, click More at the bottom of the page, or like this:
       | 
       | https://news.ycombinator.com/item?id=25253488&p=2
       | 
       | We changed the URL from
       | https://predictioncenter.org/casp14/zscores_final.cgi to the blog
       | post, which has more background info.
        
         | kovek wrote:
         | I've seen you mention this [More] comment a few times now. I
         | like it, though what if you change the design of the More
         | functionality?
        
           | nathancahill wrote:
           | Also, what do the traffic stats look like for the
           | second/third pages of big threads like this one? Pretty steep
           | falloff?
        
       | TrackerFF wrote:
       | What are the immediate real-world applications of this? Just
       | asking, because I have very little knowledge in this area.
        
         | candiodari wrote:
         | Given the DNA code for one of the "machines" that run cells, we
         | can generate an atomic model of that machine. This means we can
         | "compile" (one part of) the DNA code. It was already possible,
         | but so slow that entire datacenters would spend months
         | calculating this for a single protein and even then we can't
         | use them on the really complex ones at all, necessitating
         | things like neutron spectroscopy which are totally insane, and
         | only work on like 1% of proteins.
         | 
         | This is useful because for example chemical simulation tools
         | don't run on DNA code, but on atomic models. And also to
         | produce "images" of the molecules (images between quotes
         | because most proteins are too small to interact with reasonable
         | photons, and no interaction with photons means you can't see
         | them in any way)
         | 
         | DNA has other parts that are really important but we don't
         | understand at all yet, where this doesn't help at all. This
         | applies to sections of DNA sent to ribosomes, to produce actual
         | molecules. Besides that, there are pieces of DNA that "index"
         | the DNA, pointers (from one gene to another), triggers (that
         | for instance start production of an enzyme based on some
         | external influence, like detection of a marker molecule) and
         | export markers (that tell you what to do once the protein is
         | produced, for example, mark a protein to be removed from the
         | cell, incorporated into the cell membrane, or for instance used
         | inside the cell nucleus, and there's also one that essentially
         | says "at this point stop producing a protein and instead couple
         | the rest of the DNA code to the end of the protein you just
         | made").
        
           | Rochus wrote:
           | This is about proteins, not DNA.
        
             | shawnz wrote:
             | Proteins which are coded by DNA.
        
               | Rochus wrote:
               | So what? The DNA only codes for the RNA and amino acid
               | sequence. Structure determination is yet another topic.
               | When we determine the protein structure we already know
               | the sequence. Nor does DeepMind have to look at the
               | DNA to train their DNN.
        
               | shawnz wrote:
               | They are two topics which are both relevant to the
               | discussion.
               | 
               | Structure determination is what allows you to see the
               | purpose/effect of the sequence that the DNA encoded.
        
               | Rochus wrote:
               | Have you read the article? It's about protein structure
               | determination. The DNA only determines the RNA and amino
               | acid sequence. But who cares. I will get a bit less work
               | and citations because http://cara.nmr.ch/doku.php will be
               | less used in future.
        
             | candiodari wrote:
             | The full chain is DNA -> mRNA -> Ribosome -> tRNA
             | combinations -> amino acid chain -> protein.
             | 
             | It's true that in nature there are many steps between DNA
             | and proteins (this list doesn't even include the steps that
             | mediate the translation, ie. start it, stop it, slow it
             | down, ...), but the structure of a protein is fully
             | determined by the DNA code.
             | 
             | Protein folding is about starting from the DNA code that
             | is fed into the ribosome, ignoring all the meta
             | information, and coming up with an atomic model (a VERY
             | long list like "H atom at 3.27,2.17,12.18, C atom at
             | 2.87, 2.19, 12.33, ..."). Now there are a million
             | niceties we've discovered to make this problem simpler
             | and nicer looking, but that's what it boils down to.
        
               | Rochus wrote:
               | Thank you very much; almost forgot I did a PhD on the
               | subject ;-)
               | 
               | But anyway your answer does not contradict my statement.
               | What you say belongs to the basics of molecular biology,
               | but does not justify that DNA should be considered when
               | determining the structure of proteins. In practice, the
               | amino acid sequence is always already present.
        
               | Rochus wrote:
               | For the sceptics: if you read the referenced article, you
               | will see that it is about protein structure determination
               | by means of deep neural networks. It's not about gene
               | expression, which is a different topic. What benefit does
               | it have to respond to the question "What are the
               | immediate real-world applications of this" by reciting
               | some molecular biology dogmas from text books mixed with
               | misconceptions, instead of responding to the real
               | question?
        
               | shawnz wrote:
               | Nobody is suggesting that this research has anything to
               | do with gene expression or anything like that. Their
               | point was simply that we now have better tools to
               | actually see the meaning/effect of a given DNA sequence.
               | 
                | Also, there is no need to passive-aggressively highlight
               | your credentials. I already researched them before
               | replying.
        
               | Rochus wrote:
               | I rather think most people comment without even having a
               | look at the referenced article. And since when is the
               | reference to a qualification considered aggressive? If
               | your doctor hangs his doctor's certificate on the wall,
               | is he "passive-aggressive"? Pretty weird.
               | 
               | > that we now have better tools to actually see the
               | meaning/effect of a given DNA sequence
               | 
               | Note that the "meaning/effect" of a DNA segment encoding
               | a protein is known and unrelated to the protein folding
               | process. The protein gets its conformation after the
               | translation process.
        
               | shawnz wrote:
               | > Note that the "meaning/effect" of a DNA segment
               | encoding a protein [...]
               | 
               | The "meaning" of a DNA segment is not to encode a
               | protein. The "meaning" is to describe a mechanism in the
               | host organism (by way of encoding a protein). That is a
               | complex process which involves gene expression AND
               | protein folding.
               | 
                | For example, would you say that the "meaning" of some
                | Java code is to generate bytecode? Of course not; the
                | "meaning" is to run some algorithm on the computer that
                | executes it.
        
               | [deleted]
        
         | Rochus wrote:
         | > _What are the immediate real-world applications of this?_
         | 
         | A protein is actually a linear sequence of amino acids, but in
         | a cell this sequence has a three-dimensional arrangement like a
         | clew of thread. The arrangement is not random, but dependent on
         | the specific composition of the sequence (i.e. selection and
         | order of amino acids) and some other factors. To understand the
         | function of a protein, we need to know this three-dimensional
          | arrangement (i.e. structure). Until now, the structure
          | determination process has been mostly manual, complex, time-
          | consuming (several months up to more than a year) and error-
          | prone. If structure determination by DNN is reliable, this is
          | a big win for life science. There are still a lot of open
          | problems: e.g. the structure is not constant over time; there
         | are "moving parts" in the structure which are important for its
         | function.
        
         | randcraw wrote:
         | For-profit corporations that value protein engineering will
         | beat a path to DeepMind's door ASAP, like pharmas.
         | 
         | Protein conformation prediction is essential when engineering
         | new small-molecule drug compounds that must 'dock' with the
         | specific proteins that regulate disease. Knowing how to create
         | a protein with the precise shape to become biologically active
         | has soaked up a lot of R&D funding toward pie-in-the-sky
         | techniques that promise to advance that agenda (like quantum or
         | DNA computing).
         | 
         | If this method works as DeepMind says, it will immediately be
         | adopted by every pharma to assess and tweak the shape of
         | candidate proteins.
        
           | dekhn wrote:
           | you give pharma too much credit. I had built a previous
           | system to do something similar to this that produced
           | excellent results and tried to give it away for free to
           | Genentech, which ignored me. They said it didn't work for
           | their purchasing department.
        
             | TheRealPomax wrote:
             | I don't believe you, but I look forward to you showing
             | proof of this with some links (and if you tried giving it
             | for free, I assume you just open sourced the whole deal, so
             | I look forward to a repo link or the like).
        
               | dekhn wrote:
               | I developed the Exacycle system at Google and used it to
               | publish my work (I wrote that blog entry):
               | https://ai.googleblog.com/2013/12/groundbreaking-
               | simulations...
               | 
               | we offered the service for free to Genentech since I used
               | to work there and knew they could probably use it to get
               | some good publications.
               | 
               | We didn't open source the distributed computing
               | framework, but the underlying technology (Folding@Home)
               | is based on gromacs, which is open source. It's the scale
               | at which it ran, and the processing pipeline for
               | filtering the results that had the real value.
        
       | mncharity wrote:
       | Additional commentary in Science:
       | https://www.sciencemag.org/news/2020/11/game-has-changed-ai-...
       | 
       | (submitted by furcyd :
       | https://news.ycombinator.com/item?id=25254888 ).
        
         | ramraj07 wrote:
         | The most amazing part:
         | 
         | > The organizers even worried DeepMind may have been cheating
         | somehow. So Lupas set a special challenge: a membrane protein
         | from a species of archaea, an ancient group of microbes. For 10
         | years, his research team tried every trick in the book to get
         | an x-ray crystal structure of the protein. "We couldn't solve
         | it."
         | 
         | > But AlphaFold had no trouble. It returned a detailed image of
         | a three-part protein with two long helical arms in the middle.
         | The model enabled Lupas and his colleagues to make sense of
         | their x-ray data; within half an hour, they had fit their
         | experimental results to AlphaFold's predicted structure. "It's
         | almost perfect," Lupas says. "They could not possibly have
         | cheated on this. I don't know how they do it."
        
           | dekhn wrote:
           | If I interpret this properly, they're saying they used the DM
           | prediction (not an actual model, just a prediction) to do
           | molecular replacement
           | (https://en.wikipedia.org/wiki/Molecular_replacement) which
           | sounds pretty audacious. I see it recently made it into the
            | literature: https://journals.iucr.org/m/issues/2020/06/00/mf5047/index.h...
        
           | pmastela wrote:
            | Like the old Arthur C. Clarke quote goes: "Any sufficiently
           | advanced technology is indistinguishable from magic" --
           | unless it might be cheating in which case throw them a curve
           | ball.
           | 
           | Kudos to the DeepMind team for making magic happen.
        
             | 14 wrote:
             | I am happy you mention this. I was reading the article and
             | thinking "wow the amount of scientific knowledge these guys
             | need to know to understand what they are doing is way
             | beyond me". I work in health care and I always talk to
             | clients about all the cool things they witnessed in their
             | life. Cell phones, TVs, microwaves are some obvious ones I
             | like to talk about. I sit and wonder what are the things my
             | generation will get to look back on and say "I was alive
             | when that happened". I guess for many of us we will talk
             | about how the internet was vs what it surely will be in the
             | future, a shell of its initial glory.
        
             | rsiqueira wrote:
             | "A sufficiently advanced Artificial Intelligence would be
             | indistinguishable from God." (Way Of The Future - AI
             | Church)
        
       | sleepysysadmin wrote:
       | I made new years prediction about exactly this. I predicted
       | folding@home would die despite huge interest again because of
       | covid.
        
       | AlexCoventry wrote:
       | What's the actual news, here? AlphaFold is amazing, but it's been
       | around for a while.
        
         | typon wrote:
         | AlphaFold 2. The article specifically mentions it.
        
           | AlexCoventry wrote:
           | Thanks.
        
       | lgeorget wrote:
       | See also the piece in Nature about the topic:
       | https://www.nature.com/articles/d41586-020-03348-4
        
       | gravy wrote:
       | Maybe combine with https://news.ycombinator.com/item?id=25254772
       | ?
        
       | troelsSteegin wrote:
       | Can anyone (yet) provide a sketch of how this works? I saw a
       | mention of "attention", which I vaguely take to be a surrogate
       | for some form of structural information. It's an astonishing
       | result. How does it work?
        
       | lucidrains wrote:
       | Amazing day for structural biology! If it weren't for the
       | pandemic, I would be out at the bars celebrating tonight!
        
         | postingpals wrote:
         | Heh, soon you'll be able to do that too when the vaccine comes
         | out. What a great end to the year.
        
       | Havoc wrote:
       | Does something like Folding@Home still have meaning after this?
        
       | empiricus wrote:
       | I am actually scared. This plus CRISPR means real nanotechnology
       | is within reach.
        
         | marcosdumay wrote:
          | There is still at least one NP-hard problem in the way:
          | designing a protein with a desired shape.
        
         | jcims wrote:
         | I think this is the interesting part because there aren't going
         | to be the same regulatory hurdles for using ribosomes to
         | manufacture technology as there are for medicines. Synthetic
         | organelles that weave fibers, build metamaterials, etc could
         | lead to pretty magical advances in our capability.
        
           | entropicdrifter wrote:
           | Perhaps we'll live to see The Diamond Age
        
             | wrinkl3 wrote:
             | Can't wait to join a distributed computing bacchanalia.
        
         | enchiridion wrote:
         | My thought as well. I wonder what the world will look like in
         | 20 years because of this.
         | 
         | I'm willing to bet it will be staggeringly different than what
         | most people are expecting.
        
         | dynamite-ready wrote:
         | Far from an expert here, but your comment makes me think of
         | Michael Crichton's 'Prey', if you've not already read it. Not
         | that I wish to add to your apprehension.
        
       | [deleted]
        
       | echelon wrote:
       | This sounds wonderful and frightening. On the one hand, now we
       | can engineer drugs at light speed. But wasn't protein folding
       | supposed to be NP-hard?
       | 
       | Can deep learning find the cracks in P vs NP?
       | 
       | Perhaps making clever guesses at prime factors because it learned
       | some weird structural fact that has eluded mathematicians.
       | 
       | If we break crypto, there goes the modern world. Banks, bitcoin,
       | privacy, Internet, the whole shebang.
       | 
       | (I obviously am _not_ an expert in computational complexity and
       | hope that some domain experts can chime in and assuage my fears.)
        
         | glatteis wrote:
         | > But wasn't protein folding supposed to be NP-hard?
         | 
         | Yeah, at least some variations of it are NP-hard. SAT is THE
         | NP-complete problem, but there are some really good SAT solvers
         | around. This basically means: They have a solution that mostly
         | does very well on most instances. But because (probably) P !=
         | NP, you will never have a polynomial time algorithm for this.
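The exponential worst case being described can be made concrete with a brute-force SAT check (toy encoding, purely illustrative; real solvers like MiniSat win by aggressive pruning and clause learning, not by sidestepping P vs NP):

```python
from itertools import product

def brute_force_sat(clauses, n_vars):
    """A clause is a tuple of integer literals: literal k > 0 means
    "variable k is True", k < 0 means "variable |k| is False".
    Trying all 2^n assignments is the exponential worst case."""
    for bits in product([False, True], repeat=n_vars):
        # bits[i] is the value assigned to variable i+1
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits  # satisfying assignment found
    return None  # formula is unsatisfiable

# (x1 or x2) and (not x1 or x2) is satisfied by x2 = True:
assignment = brute_force_sat([(1, 2), (-1, 2)], 2)
```

Modern solvers handle many large industrial instances quickly despite this worst case, which is exactly the "does very well on most instances" situation described above.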
        
         | sgt101 wrote:
         | I think that this is a heuristic "near optimal" method rather
         | than an exact analytic method (I have little to no idea of what
         | that would be in protein folding). A domain I do understand a
          | bit which is NP-hard is the travelling salesman. Computing an
         | exact solution is unrealistic, but doing heuristic searches
         | that get you to 99% of the optimal 99% of the time is
         | relatively doable.
         | 
         | But - you don't know that you are 1% from the solution... even
         | if you are pretty confident that you are. It's quite possible
         | (unlikely) that you are way off the optimal, but if you have a
         | decent solution that's ok.
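The heuristic-versus-exact trade-off is visible in a few lines with the classic nearest-neighbour TSP heuristic (toy coordinates, illustrative only): it runs in polynomial time and usually gives a decent tour, but provides no bound on how far the tour is from optimal.

```python
import math

def nearest_neighbour_tour(cities):
    """Greedy TSP heuristic: from each city, go to the closest
    unvisited one. Fast, often good, never guaranteed optimal."""
    unvisited = set(range(1, len(cities)))
    tour = [0]  # arbitrarily start at city 0
    while unvisited:
        last = cities[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, cities[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

tour = nearest_neighbour_tour([(0, 0), (1, 0), (2, 0), (3, 0)])
```

As the comment says: the heuristic returns *a* tour, but nothing in the output tells you how close it is to the best possible one.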
        
         | aparsons wrote:
         | There is probably a team at DeepMind working on cracking simple
         | crypto. Problem is, it can be difficult to cast the problem
          | properly/"correctly". How does a one-way function get
         | represented?
        
         | Someone wrote:
         | NP-hard doesn't say how hard it is to solve finite problems.
         | Even for n = 1,000,000, _O(e^n)_ isn't necessarily problematic,
         | _if_ the constant is small enough, or if you throw enough
         | hardware at it.
         | 
         | This "uses approximately 128 TPUv3 cores (roughly equivalent to
         | ~100-200 GPUs) run over a few weeks". That is a moderate amount
         | of hardware for this kind of work, so it seems they have a more
         | efficient algorithm.
         | 
         | Also, this algorithm doesn't solve protein folding in the
         | mathematical sense; it 'just' produces good approximations.
        
         | ichbinwiederda wrote:
         | Far from an expert on complexity theory, but NP-hard problems
         | can be approximated in polynomial time. With Deep Learning you
          | are doing approximation. So this is nothing groundbreaking in
         | that respect.
        
           | Vervious wrote:
           | there are also a variety of problems that are hard to
           | approximate.
        
           | foxtr0t wrote:
            | That actually isn't totally true. Approximation methods,
            | in the formal sense, require a guarantee that they perform
            | within X of the optimal solution. Not all NP-hard problems
            | have polynomial approximations, and the method shown here
            | is likely not an approximation algorithm, because it
            | provides no guarantees on performance.
        
             | ichbinwiederda wrote:
             | Yes thank you for elaborating. I agree with you on both
             | counts.
        
         | blamestross wrote:
         | > Can deep learning find the cracks in P vs NP?
         | 
         | No. It really is just heuristic building. A core problem with
         | using ML in this sort of use case is that it is often brittle.
         | Once it gets outside of the context it was trained in it may or
          | may not be able to generalize its training to new contexts.
         | may have difficulty knowing when it is very wrong.
         | 
         | I think ML in research science could be viewed as a very good
          | intuitive oracle. Even if they are right 95% of the time, you
          | have to do the proof the long way every time because that
          | 5% matters. The real utility is in "scanning the field" to
         | better focus research on things likely to bear fruit.
        
         | karl-j wrote:
         | I think I'm almost as uninformed as you, but I believe it comes
         | down to the difference between perfect solutions and close
         | enough solutions. Consider the classic NP problem of the
         | traveling salesman problem.
         | 
         | "[Modern heuristic and approximation algorithms] can find
         | solutions for extremely large problems (millions of cities)
         | within a reasonable time which are with a high probability just
         | 2-3% away from the optimal solution." [0]
         | 
         | When close enough is enough, NP problems can often be solved in
         | P time, and I suspect this is one of those cases. For crypto
         | however, close enough is not enough.
         | 
         | [0]
         | https://en.wikipedia.org/wiki/Travelling_salesman_problem#He...
        
       | The_rationalist wrote:
        | Let's imagine that as a researcher I make a breakthrough NN
       | model, but that I need a lot of TPUs/GPUs in order to test it, is
       | there a service for temporarily lending such hardware to me for
       | free/not much ? (e.g google colab ?) Otherwise researchers will
       | plateau with their hardware budget.
        
       | vadansky wrote:
       | Just to add to this whole "It's not solved! Yes it is!"
       | discussion. Note that
       | 
       | >According to Professor Moult, a score of around 90 GDT is
       | informally considered to be competitive with results obtained
       | from experimental methods.
       | 
       | So if we go by >= 90 as solved:
       | 
       | >In the results from the 14th CASP assessment, released today,
       | our latest AlphaFold system achieves a median score of 92.4 GDT
       | overall across all targets.
       | 
       | they solved for their targets, but
       | 
       | >Even for the very hardest protein targets, those in the most
       | challenging free-modelling category, AlphaFold achieves a median
       | score of 87.0 GDT (data available here).
       | 
       | They basically admit they still haven't "solved" it for "most
       | challenging free-modelling category"
       | 
        | Take that as you will; I'm not sure how useful the ">= 90 is
        | solved" criterion is, since they call it "informal" themselves.
        
         | fastball wrote:
          | What do you mean you're not sure how useful ">= 90" is as a
          | criterion?
         | 
         | You literally said why it is useful in your comment:
         | 
         | > 90 GDT is informally considered to be competitive with
         | results obtained from experimental methods.
         | 
         | It's informal because we don't have a true "gold-standard" for
         | determining a protein's folded structure - the best we have is
         | experimental methods of trying to determine the structure which
         | still have a great deal of error (compared to other things we
         | can measure).
         | 
         | So all we can do is say "the GDT between two experimental
         | measurements (of the same protein) is often around 90, so if we
         | get there with predictive models that's pretty much just as
         | good".
         | 
         | As soon as we have better experimental methods for determining
         | protein tertiary structure, you can be sure we will require
         | predictive models to deliver better results too. Until then,
         | the point is that the delta between any two experimental
         | determinations of folded structure is approximately the same as
         | the delta between an experimental determination and an
         | AlphaFold guess. So the AlphaFold guess may as well be an
         | experimental measurement. Except the AlphaFold guess happens
          | fairly trivially (once you give it the DNA sequence[1]),
          | whereas the experimental method is involved and expensive.
         | 
         | [1] Or the primary structure, I'm unsure what inputs are given
         | to AlphaFold.
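For concreteness, GDT_TS is (roughly) the average, over 1, 2, 4 and 8 Angstrom distance cutoffs, of the percentage of residues whose predicted position falls within that cutoff of the experimental one. A simplified sketch, assuming the optimal superposition of the two structures has already been computed and only per-residue distances remain:

```python
def gdt_ts(residue_distances):
    """residue_distances: distance (in Angstroms) between each
    residue's predicted and experimental position, post-superposition.
    Returns the GDT_TS score on a 0-100 scale."""
    n = len(residue_distances)
    percentages = [
        100.0 * sum(d <= cutoff for d in residue_distances) / n
        for cutoff in (1.0, 2.0, 4.0, 8.0)
    ]
    return sum(percentages) / 4

# Four residues at 0.5, 1.5, 3.0 and 9.0 A give 25%, 50%, 75%, 75%
# under the four cutoffs, hence a GDT_TS of 56.25.
score = gdt_ts([0.5, 1.5, 3.0, 9.0])
```

This simplification omits the search over superpositions that the real assessment performs; it is only meant to show why ~90 GDT corresponds to "most residues essentially in the right place".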
        
         | vadansky wrote:
         | Just to add to my own comment. Why does HN like being so
         | pedantic about the definitions of words? This is an interesting
         | post regarding AI and cellular biochemistry. Do we really need
         | to add a philosophical debate about the meaning of "solution"?
          | Personally I think anyone who can't add to the discussion
          | about AI and protein folding should just not comment, instead
          | of settling for adding to the "what does solution mean"
          | debate. I'd love to see a blanket rule flagging pedantic
          | posts.
        
           | 6gvONxR4sf7o wrote:
           | HN pushes back on hype and because there's generally too much
           | hype in announcements.
        
         | pretendscholar wrote:
         | 87 GDT sounds pretty much solved to me if 90 is the benchmark
        
         | garmaine wrote:
         | That's shifting goal posts. The hardest structures are also
         | going to be harder experimentally.
         | 
          | What makes them hard to predict is the very close energies
          | involved in different folding pathways. Those close energies
          | mean there will be more variant structures, which also vary
          | with the experimental approach used.
        
         | [deleted]
        
       | tpoacher wrote:
       | Whenever deepmind comes up with something like this, my first
       | instinct is to say "yay for humanity" ... then I remember who
       | they work for, and the second instinct is to say "Ah. Crap."
        
       | unchocked wrote:
       | Been out of the field for a while, could someone currently in it
       | qualify these results? Hyperbolic title notwithstanding, they
       | approach 90% median free modeling accuracy. The "other 90%" still
       | remains to be solved...
        
         | gmorainbows wrote:
         | The method relies on multiple-sequence-alignment (MSA) of
         | homologous proteins. This cannot fold arbitrary proteins, only
         | biologically relevant ones that have high quality MSAs
         | available. It's also worth pointing out that the gold-standard
         | for validating MSAs relies on PDBs of folded proteins. This is
         | exciting work that will assist NMR and XRay crystallographers,
         | but it's not a panacea of protein folding.
         | 
         | https://github.com/deepmind/deepmind-research/issues/18
        
           | flobosg wrote:
           | In their CASP abstract[1] they mention alternatives to
           | typical co-evolution features which improve performance in
           | shallow MSA depths.
           | 
              | [1]: https://predictioncenter.org/casp14/doc/CASP14_Abstracts.pdf...
        
             | gmorainbows wrote:
             | It doesn't matter so much how they perform the feature
             | extraction, so much as what their inputs to the feature
             | extraction are.
             | 
             | This model requires a collection of wild-type proteins in
             | an accurate MSA. Producing an accurate MSA is hard even if
             | you have many homologs.
             | 
             | They require protein homologs which means they can "only"
             | do this for wild-type proteins. This work is useless with
             | mutant and synthetic proteins. This is a big advancement
             | that will assist crystallographers and NMR structural
             | biologists with difficult wild-type proteins, but it
             | doesn't "solve protein folding" by any stretch of the
             | imagination.
        
               | flobosg wrote:
               | > Producing an accurate MSA is hard even if you have many
               | homologs.
               | 
               | To assess co-evolutionary couplings the amount of
               | homologs in the MSA is not as important as the number of
               | _effective sequences_ (i.e. sequence depth and diversity)
               | in it.
               | 
               | > They require protein homologs which means they can
               | "only" do this for wild-type proteins.
               | 
               | Even remote homologs work, as shown by the widespread use
                | of HMM-based methods in the prediction pipelines.
               | 
               | > This work is useless with mutant and synthetic
               | proteins.
               | 
               | Unless you generate a flurry of data with them using deep
               | mutational scanning for example. As long as correlated
               | mutations are present in the MSA the technique should
               | work as expected no matter where the protein sequences
               | originated.
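One common way to quantify "effective sequences" (Neff) in an MSA is to down-weight each sequence by its number of near-duplicates; the 80% identity cutoff below is a conventional choice from the co-evolution literature, not something stated in the article, and the tiny alignment is made up for illustration.

```python
def n_effective(msa, identity_cutoff=0.8):
    """Neff of a gapless MSA (all sequences the same length): each
    sequence contributes 1 / (number of sequences, including itself,
    at or above the identity cutoff), so clusters of near-identical
    sequences count roughly once."""
    def identity(a, b):
        return sum(x == y for x, y in zip(a, b)) / len(a)

    neff = 0.0
    for seq in msa:
        neighbours = sum(identity(seq, other) >= identity_cutoff
                         for other in msa)  # counts seq itself
        neff += 1.0 / neighbours
    return neff

# Two identical sequences, one 75%-identical variant, one unrelated
# sequence: four rows of alignment, but only 3.0 effective sequences.
depth = n_effective(["ACDE", "ACDE", "ACDF", "WXYZ"])
```

This is why sequence diversity, not raw homolog count, is what matters for co-evolutionary couplings: adding near-duplicates barely moves Neff.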
        
               | gmorainbows wrote:
               | I'm honestly not familiar with "deep mutational
               | scanning." Can you share a link? I'm first author on
               | papers related to the structural biology of coevolution
               | and I competed in CASP about a decade ago, but I haven't
               | kept up much since then.
        
               | flobosg wrote:
               | Sure! Here's a paper about the method:
               | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410700/
               | 
               | And another one about its application in structure
               | prediction:
               | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295002/
        
         | asdfasgasdgasdg wrote:
         | I don't think anyone on HN is going to have more authority to
         | qualify the results than the independent experts quoted in the
         | linked article. Among whom are numbered a Nobel laureate, the
         | president of the group that designs the tests of protein
         | folding systems, and the former CEO of Genentech+current CEO of
         | Calico.
        
           | dekhn wrote:
           | Art's a smart guy and I have a lot of respect for his
           | biological intuition, but his understanding of computational
           | biology is very limited.
        
             | asdfasgasdgasdg wrote:
             | I would imagine that he is not assessing this advancement
             | merely using his own personal expertise, but rather the
             | combined expertise of the resources he represents. CEOs
             | don't just look at problems and potential solutions. They
             | have people who look at those things, and then tell them
             | their opinion. In any case, you've picked a nit with one of
             | the three people quoted. Any objections to the other two?
        
               | dekhn wrote:
               | My main objection to Vivek (the Nobel Prize winner) is
               | the prize in that case should have gone to my advisor,
               | Harry Noller. John Moult... he's a nice guy but I think
               | he's being a bit breathless here.
        
               | asdfasgasdgasdg wrote:
               | I see. The co-founder of the organization that tests
               | protein folding is a "nice guy."
        
               | dekhn wrote:
               | CASP is not "the organization that tests protein
               | folding". It's _an_ organization that every two years
                | does a blind prediction and publishes the results (I've
               | competed, some 20 years ago). John's a protein expert, no
               | question about it. I knew him moderately well back in the
               | day because our advisors moved in similar circles.
        
             | mrDmrTmrJ wrote:
             | dekhn, in what way is Art's "understanding of computational
             | biology very limited?"
             | 
             | I'd love to hear more. Specifically, what do you think that
             | computational biology can do that you think Art doesn't
             | understand or credit?
        
           | WWWWH wrote:
           | Quite right. And the Nobel laureate in question is a
           | structural biologist--so his expertise is directly relevant.
        
       | verroq wrote:
       | So who will have access to this? DeepMind never publishes their
       | models.
        
         | randcraw wrote:
         | I suspect DM will sell this as a service, especially to
         | corporations like pharmas who create small molecule drugs. If
         | their method works as advertised, it may rejuvenate the
         | flagging prospects of Rational Drug Design, the guiding R&D
         | drug development methodology behind most new molecular entities
         | (drugs) for the past ~25 years, which has not proven to be the
         | clear economic win that had been hoped.
        
         | TomJansen wrote:
         | According to [1], they must release enough information for
         | others to replicate the AI model: "As a condition of entering
         | CASP, DeepMind--like all groups--agreed to reveal sufficient
         | details about its method for other groups to re-create it. That
         | will be a boon for experimentalists, who will be able to use
         | accurate structure predictions to make sense of opaque x-ray
         | and cryo-EM data."
         | 
         | [1]: https://www.sciencemag.org/news/2020/11/game-has-changed-
         | ai-...
        
       | [deleted]
        
       | sjg007 wrote:
       | Looks like a transformer model. Anyone have any insights?
        
       | yk wrote:
       | Very interesting, however now the problem becomes to characterize
       | such machine learning approaches. With traditional simulation
        | methods, the authors can usually explain easily in which
        | situations a specific approach is good or bad; with neural
        | networks we don't really have a good way to analyze the
        | quality of the prediction.
        
       | CJefferson wrote:
       | Has anyone got any good other references for this? After some of
       | the dodgy experiments related to alpha zero (comparing to
       | purposefully degraded chess systems), I'd love to see some
       | independent analysis.
        
         | sanxiyn wrote:
         | CASP is that independent analysis...
        
           | CJefferson wrote:
           | True, but I haven't seen an independent discussion of the
           | CASP results. There is a good chance this is great, but I
           | don't trust deepmind press releases.
        
         | andi999 wrote:
         | I am also wondering. I generally find these kind of approaches
         | hard to believe, but this might be my prejudices.
        
         | syncsynchalt wrote:
         | The article in Science implies that we have independent
         | confirmation of predictions yielding useful results, beyond the
         | challenge itself:
         | 
         | > The organizers even worried DeepMind may have been cheating
         | somehow. So Lupas set a special challenge: a membrane protein
         | from a species of archaea, an ancient group of microbes. For 10
         | years, his research team tried every trick in the book to get
         | an x-ray crystal structure of the protein. "We couldn't solve
         | it."
         | 
         | > But AlphaFold had no trouble. It returned a detailed image of
         | a three-part protein with two long helical arms in the middle.
         | The model enabled Lupas and his colleagues to make sense of
         | their x-ray data; within half an hour, they had fit their
         | experimental results to AlphaFold's predicted structure. "It's
         | almost perfect," Lupas says. "They could not possibly have
         | cheated on this. I don't know how they do it."
        
       | bayeslaw wrote:
       | as so many times recently, the hn crowd proves to be completely
       | clueless and uneducated when it comes to ai.. this is a miracle..
       | it is THE achievement we'll remember from the past decade when it
       | comes to ai.. if you don't understand why, I recommend learning
       | and reading. the level of ignorance, and often proud ignorance,
       | here is frightening to me.. people who downplay this are either
       | ignorant of biochemistry or ai or both.. please don't listen to
       | them. this right here is the single biggest news of 2020..
        
       | piva00 wrote:
       | This sounds big, like really really big. At least from my old
       | times providing my idle computing resources to Folding@Home and
       | following that project, this seems like the major golden
       | milestone for protein folding.
        
         | FrojoS wrote:
         | Exactly what I was thinking. In a very small way many of us
         | tried to help with this problem back in the day. Makes it feel
         | even more important.
         | 
         | Now I'm waiting for the equivalent news about SETI@Home ;-)
        
       | iandanforth wrote:
       | Title as submitted is hyperbole, please fix?
        
         | breck wrote:
         | I don't think it is. Look at the graph.
        
         | dang wrote:
         | We changed the title to that of the article as the site
         | guidelines ask. Submitted title was "DeepMind Solved Protein
         | Folding".
        
         | sanxiyn wrote:
          | It is not hyperbole.
        
         | EgoIncarnate wrote:
         | I agree. "AlphaFold achieves a median score of 87.0 GDT". While
         | this is a major advance, to me 100 GDT would be 'solved', not
         | 87.
        
           | jhrmnn wrote:
            | By this metric, nothing has ever been solved in the natural
            | sciences. So this is not a useful metric.
        
             | ClumsyPilot wrote:
              | Has it not? Newton's laws of motion and Ohm's law are
              | pretty on point.
        
               | joshuamorton wrote:
               | Not when you introduce quantum effects.
        
               | caymanjim wrote:
               | Newton's laws of motion were not a complete solution, as
               | they didn't account for relativity.
        
               | piva00 wrote:
                | If you can explain how gravity works at a quantum level,
                | you'd deserve a Nobel. It's not 100% solved; Newton's
                | laws of motion are a model, not a solution. Just like the
                | vast majority of science.
        
               | jhrmnn wrote:
               | No, they are very crude (but useful!) models of reality.
               | General relativity and quantum electrodynamics are much
               | better corresponding models, respectively, and even those
               | are just approximations.
        
           | ashtonbaker wrote:
           | > To me
           | 
           | Are you a domain expert? Because:
           | 
           | > According to Professor Moult, a score of around 90 GDT is
           | informally considered to be competitive with results obtained
           | from experimental methods.
        
             | The_rationalist wrote:
              | but experimental methods have not solved protein folding
              | either. AlphaFold hasn't solved protein folding, but I
              | can't wait to see their progress with AlphaFold 3.
              | 
              | What would be informatively useful would be to know how
              | much accuracy is needed on average for drug engineers; I'd
              | say that 99% is more likely to be the minimum needed to
              | make solid inferences.
        
               | ashtonbaker wrote:
               | > but experimental methods have not solved protein
               | folding either.
               | 
               | I might be missing something here, but isn't
               | "experimental methods" just shorthand for "our best
               | knowledge of a protein's structure, obtained via NMR or
               | X-ray crystallography"? In that case, I'm not sure what
               | "solving" protein folding even means - literally zero
               | mean error? We can't know/solve anything beyond our best
               | knowledge, that's tautological.
               | 
               | > What would be informatively useful would be to know how
               | much accuracy is needed on average for drug engineers.
               | 
               | Yeah that would be interesting, but:
               | 
               | > I'd say that 99% is more likely to be the minimum to
               | make solid inferences
               | 
               | ...what are you basing this on?
        
               | The_rationalist wrote:
                | It's pretty clear what solving means: having an exact
                | representation of the 3D structure. Our partial knowledge
                | obtained from such techniques is what it is, partial. We
                | need new metrology that increases the accuracy and
                | completeness of our observations, OR better deterministic
                | models from sequences.
                | 
                | "We can't know/solve anything beyond our best knowledge,
                | that's tautological." Yes, it is indeed tautological: if
                | you assume that experimental methods can't get better,
                | then guess what? It follows that they can't get better!
                | 
                | "what are you basing this on?" On nothing solid; that's
                | why I say it _would_ be interesting. 1% is a
                | non-negligible error rate given that proteins generally
                | don't have a very high atom count and that a protein will
                | be produced an enormous number of times, so the 1% error
                | propagates and can a priori easily break the system. But
                | this guess is not solid, as I'm not an expert. A 1% error
                | may be significant for simple (low atom count) proteins
                | and could be negligible for very high atom count
                | proteins.
        
               | ashtonbaker wrote:
               | > It's pretty clear what solving means, it means to have
               | an exact representation of the 3D structure.
               | 
               | That's not clear at all, because perfect measurement
               | doesn't exist. I agree that improving is always a worthy
               | goal, but clearly we don't need 100% accuracy to consider
               | something "solved" for the purposes of science. Also, "3D
               | structure" of a protein is not a fixed truth, the parts
               | are in motion all the time and may even have multiple
               | semi-stable conformations. Rather than focusing on X,Y,Z
               | perfection, I would imagine getting the angles between
               | bonds, or the general topological conformation right
               | would be more valuable.
               | 
               | > if you assume that experimental methods can't get
               | better ...
               | 
               | I'm saying that if your definition for "solved" is
               | "perfect knowledge", then we might as well not discuss
               | whether method X or Y solves the problem, because they
               | obviously do not.
               | 
               | The more I think about it, the more I think we should
               | just drop the whole debate over the word "solved".
               | Clearly different experiments and different proteins will
               | have different requirements which may or may not be met
               | by this or by other techniques - I agree that I would be
               | interested to hear an expert weigh in on those
               | requirements.
        
         | jjoonathan wrote:
         | It's not.
        
         | kgwgk wrote:
         | I agree. If a newspaper published a headline "Dr. Whatever
         | cured cancer (... in some of her patients)" we would find it
         | misleading.
        
           | deeviant wrote:
           | If there was a headline, "Company X with Product Y cured
           | cancer" and it turns out that product Y actually only cured
            | 90% of cancers, I'm pretty sure most people would be happy
            | with the headline.
           | 
            | Oh, and to be a true parallel example, in this case the
            | remaining 10% of cancers might not even be cancers: since
            | experimental determination of protein structures is itself
            | only ~90% accurate, the model could very well be _more_
            | accurate than our current ability to experimentally determine
            | protein structure.
        
             | kgwgk wrote:
              | I really interpreted that headline as "found a general
              | solution to the protein-folding question", not as the also
              | interesting, but lesser, "can be used to solve
              | protein-folding problems".
        
       | purpleidea wrote:
       | Does this produce the various different foldings that each
       | protein can often "sit" in?
       | 
       | Can it take temperature and other environmental conditions into
       | account?
       | 
       | Can you specify that a particular ligand or electrical current is
       | present so that you can see the resultant shape change?
       | 
       | Is all the source code for this available so that other
       | scientists can build on top of this, or will we have to go
       | through a paid or SaaS google API to use it?
        
       | xyzal wrote:
       | Does it mean there is no point in playing fold.it anymore?
        
         | breck wrote:
         | Yes, no point, as far as I understand it.
        
         | hobofan wrote:
         | fold.it was always more geared towards being edutainment than
         | actually contributing solutions. Of the ~20 publications made
         | related to fold.it over a decade, ~5 of them seem to have
         | contributed to solving structures, while the rest of them are
         | about the game itself.
        
         | flobosg wrote:
         | Besides structure prediction, Foldit is used for the inverse
         | problem: protein design.
        
         | IgniteTheSun wrote:
          | Considering the resource requirements for this AI approach
          | mentioned in the article, it's unlikely that it's been tested
          | on more than a few tens to hundreds of proteins. This may only
          | work on a subset of the proteome, so I would think it worth
          | continuing to play if you find it to be a fun pastime.
        
           | nynx wrote:
           | Those were the requirements for training it.
        
       | Rochus wrote:
       | Great. So then farewell CARA (http://cara.nmr.ch/doku.php), we
       | had a good time.
        
         | Rochus wrote:
         | After I had some time to think about it, I come to a different
         | conclusion. Contrary to my first assumption, Bio NMR (in
         | contrast to crystallography) will become more and more
         | important, since the method allows to study the dynamic
         | properties of proteins. With the structure predicted by DNNs,
         | the chemical shifts to be expected in the NMR spectra can be
         | calculated; the assignment problem is thus largely eliminated.
         | Bio NMR can then be used specifically to study the "parts that
         | move".
        
       | breatheoften wrote:
       | Anyone care to muse about appropriate investment strategies based
       | on the not previously feasible research approaches that might now
       | be possible?
       | 
       | Should we expect to see faster progress in large well capitalized
       | bioscience companies -- or a sudden increase in the viability of
       | smaller biotech and/or biotech startups ...? Are we gonna see top
       | talent fleeing the old biotech companies to start their own
       | ventures with a new belief that the potential for huge reward
       | might suddenly seem achievable?
       | 
       | What kind of companies do we think will be the first that are
       | able to translate this new knowledge into profits?
        
         | enchiridion wrote:
         | I agree. I think a company that is working on large scale
         | automated bio experiments would be well positioned to take
         | advantage of something like this.
         | 
         | What companies are doing that work?
        
       | EgoIncarnate wrote:
       | "AlphaFold achieves a median score of 87.0 GDT". Game changing,
       | and a huge improvement, but not 100% solved. Also this is for
       | static folding. Dynamic folding and interaction is a much harder
       | problem. Those need to be tackled too before I would consider
       | protein folding 'solved'.
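For readers unfamiliar with the metric: GDT_TS is, roughly, the fraction of a model's C-alpha atoms that fall within distance cutoffs of 1, 2, 4, and 8 Å of their experimental positions, averaged over the four cutoffs and expressed as 0-100. A minimal sketch, assuming the two structures are already optimally superimposed (real GDT also searches over superpositions, which this omits):

```python
import numpy as np

def gdt_ts(pred, ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT_TS for two pre-superimposed Nx3 C-alpha arrays."""
    # Per-residue distance between predicted and reference positions
    dists = np.linalg.norm(np.asarray(pred) - np.asarray(ref), axis=1)
    # Fraction of residues within each cutoff, averaged, scaled to 0-100
    return 100.0 * np.mean([(dists <= c).mean() for c in cutoffs])

# A model identical to the reference scores 100
ref = np.random.rand(100, 3) * 50
print(gdt_ts(ref, ref))  # 100.0
```

Under this definition a uniform 3 Å displacement of every atom scores 50 (inside the 4 and 8 Å cutoffs, outside 1 and 2 Å), which gives a feel for what the 87.0 in the article means.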
        
         | nabla9 wrote:
         | They solved the latest folding competition benchmark set.
         | 
          | Shorter problems are easy to solve. The median score is a mix
          | of easier and harder problems. Next year's competition will
          | have a new set of much bigger and harder problems to solve.
          | 
          | This seems like a leap, not "solved" as in having a solution
          | that just works and scales.
        
         | hans1729 wrote:
         | >Those need to be tackled too before I would consider protein
         | folding 'solved'
         | 
          | Semantics. From a systems-theoretical point of view, dynamic
          | folding is an abstraction of static folding; solve (i.e.
          | understand the underlying mechanisms of) static folding and
          | you can start progressing on dynamic folding, building on your
          | previously achieved solution.
          | 
          | Whether it's solved or not depends on whether you mean
          | `general folding` or the `entire spectrum of folding` when
          | considering the problem.
        
           | 6gvONxR4sf7o wrote:
           | Solve could mean understanding the underlying mechanism, but
           | in this case, I don't think that's how they did it.
        
             | hans1729 wrote:
             | My intuition for deeplearning was exactly that, statistical
             | inference of underlying mechanisms. But I haven't read the
             | paper yet, so you might be right
        
         | ramraj07 wrote:
          | It's probably never going to be solved though, right? To truly
          | solve protein folding we'd have to have a program that can
          | simulate a small but still significant system at the QM level;
          | it looks like deep learning can get us 60% of the way
          | (conservatively estimating the whole problem domain) but not
          | all the edge cases, just like it did in other problem domains.
        
           | dekhn wrote:
           | It remains unclear whether QM is required to fold proteins
           | accurately. So far classical methods have shown they require
           | far less computer power to get far closer to the right
           | structure.
        
           | PaulDavisThe1st wrote:
           | Despite this breakthrough by DeepMind, at this point we still
           | do not _understand_ protein folding. That makes it very hard
           | to say precisely which features would be required to do the
           | simulation correctly.
           | 
           | DeepMind/AlphaFold might have something to contribute there
           | too, depending on how interpretable their network model(s?)
           | are.
        
             | ramraj07 wrote:
              | They seem to have a completely new attention algorithm
              | that's doing the heavy lifting now, so it's likely we will
              | learn much about how folding practically works from these
              | results as well.
        
       | sabujp wrote:
       | So it might "be over" for small molecules, but let's see large
       | macromolecules and protein assemblies be predicted
        
       | tgbugs wrote:
       | My conclusion reading this is that a gradient is a gradient is a
       | gradient. If you can minimize one, you can minimize them all. The
       | hard work would seem to be figuring out how to transform into a
       | gradient that your hardware can solve. It will also be
       | interesting to see the kinds of systematic errors that will come
       | as a result of the biases in the training set, and whether it can
       | be used to predict what the structures would look like under
       | slightly different conditions (e.g. pH).
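The "a gradient is a gradient" observation can be made concrete: a plain gradient-descent loop is indifferent to where the gradient comes from, whether a toy quadratic or a learned energy surface, so long as it can be evaluated. An illustrative sketch (the quadratic "energy landscape" below is just an invented stand-in):

```python
def minimize(grad, x0, lr=0.1, steps=500):
    """Plain gradient descent: the loop never inspects what `grad` models."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Stand-in "energy landscape": E(x) = (x - 3)^2, so dE/dx = 2(x - 3)
x_min = minimize(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 6))  # 3.0 -- the minimum of the toy landscape
```

The hard part, as the comment says, is not this loop but constructing a differentiable objective whose minimum actually corresponds to the physical structure.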
        
       | 29athrowaway wrote:
       | So what's going to happen to fold.it and folding@home now?
        
         | flobosg wrote:
         | See https://news.ycombinator.com/item?id=25256318 and
         | https://news.ycombinator.com/item?id=25256772
        
       | mabbo wrote:
       | Sometimes announcements like this are a bit over-the-top. But
       | what really, to me, cements the 'big deal' of this is the "Median
       | Free-Modelling Accuracy" graph halfway down the page.
       | 
       | Scores of 30-45 for 15 years. Now scores of 87-92.
       | 
       | This isn't a minor improvement, it's a leap forward.
        
         | martinpw wrote:
         | Why is the graph not monotonically increasing? Does the
         | complexity of the problem to be solved increase each time? If
         | so, does that make the relative improvement from the previous
         | result even more impressive?
        
           | breatheoften wrote:
           | That's quite interesting ... I believe the test set size is
           | not constant year to year but rather a function of how many
           | new structures have been experimentally discovered since the
           | last contest?
           | 
           | Does seem like the contest structure could include quite a
           | bit of risk for hiding the effect of overfitting ... I wonder
           | if there is anything inherent about the problem that reduces
           | that risk ...?
        
             | FrojoS wrote:
              | My understanding is that it's always 100 new structures,
              | which is a small fraction of the total structures
              | identified in that year.
              | 
              | The reason why the top score in one year can be lower than
              | in the previous year is that the test (the 100 structures
              | to guess) is always new and different, so it can end up
              | being 'harder' than the year before. Luck will also play a
              | small role.
              | 
              | Another explanation for a reduction in the top score would
              | be that previous winners are not re-submitted unchanged.
              | For instance, AlphaFold v1 seems not to have been submitted
              | to the latest competition.
        
               | breatheoften wrote:
               | Only 100 new structures each test cycle? That seems a
               | very small test set size ...
               | 
               | Is it really possible to select 100 new structures which
               | together are likely to represent a meaningful increase in
               | the sample generalization versus the prior years test set
               | ...?
        
               | MauranKilom wrote:
               | Given that we only _know_ the structure of on the order
               | of 100k proteins, we might only get another 10k new ones
               | per year. I guess.
               | 
               | Using 1% of those (presumably from the more-often-
               | reproduced subset) for this challenge seems reasonable?
               | Note that the structures have to remain secret up until
               | the challenge, and presumably all those teams uncovering
               | the structures don't want to have to wait up to 2 years
               | every time to actually make their results public.
        
               | breatheoften wrote:
               | Interesting ... plenty of opportunity then potentially
               | for the 100 samples to have prediction similarity to the
               | set of published discoveries (for expected or unknown
               | reasons)?
               | 
                | I suppose it will take a few more years of repetition
                | for the challenge to confirm that the problem has been
                | solved -- but I wonder if a new version of the contest is
               | going to be needed as well? Maybe the model accuracy is
               | now high enough to invert the contest to a form where
               | models generate predictions for randomly selected unknown
               | samples -- and experimental teams are then expected to
               | make observations for those particular sequences over the
               | next two years as part of their otherwise research agenda
               | selected experimental workload?
        
         | entropicdrifter wrote:
         | Not to mention the fact that two years ago they took it from
         | 45% to >60%. If they can continue improving, even with an
         | exponential decay in rate of improvement, this is certainly a
         | stunning example of technological disruption.
        
           | Zenst wrote:
            | Even without any improvement, the amount of grunt-work the
            | AI can pre-do to get down to a short-list - that in itself
            | will speed research up.
        
             | kordlessagain wrote:
             | > and get down to a short-list
             | 
             | There's no reason to believe the list will contain all
             | solutions, however.
        
               | patagurbon wrote:
               | No but it will hopefully contain _some_. Which for many
               | if not most problems is all that matters
        
         | WhompingWindows wrote:
         | This reminds me of AlphaGo and AlphaZero. DeepMind was able to
         | produce a very solid model on their first attempt, at both
         | protein folding and at Go (and Starcraft2 as well). Their
         | second models, however, seemed to blow their first out of the
         | water.
         | 
         | This bodes extremely well for the future of computational
         | biology, I'm very excited thinking about the prospects. If we
         | know how a protein folds, we know its shape, meaning we know
         | which shaped/charged molecules are needed to act as
         | suppressors/enhancers of those proteins.
        
           | layer8 wrote:
           | One difference to AlphaZero though, if my understanding is
           | correct, is that AlphaFold is trained on a predetermined data
           | set and hence didn't learn how "arbitrary" proteins fold in
           | general, but just how the kinds of proteins fold for which we
           | already know how they fold. To work more like AlphaZero,
           | AlphaFold would have to be able to synthesize arbitrary
           | proteins and run the experiments on them to verify and
           | correct its predictions. Therefore it's conceivable that
           | AlphaFold is biased by the existing training data and doesn't
           | fully generalize to all proteins we would want to apply it
           | to. Maybe that won't be a problem in practice, but
           | nevertheless it makes for a significant difference from what
           | AlphaZero was about, being solely self-trained.
        
             | the8472 wrote:
             | > AlphaFold would have to be able to synthesize arbitrary
             | proteins and run the experiments on them to verify and
             | correct its predictions.
             | 
             | Could this lead to a virtuous cycle where AlphaFold is used
             | generate a ton of random sequences where it has low
             | confidence, those are then screened for ease of synthesis,
             | measured and the results used to improve the model?
             | 
             | Edit: nevermind, according to another comment[0] there are
             | still plenty of real proteins without experimental data
             | left to explore.
             | 
             | [0] https://news.ycombinator.com/item?id=25255601
        
         | treis wrote:
         | That is an impressive improvement, but I think you've missed
         | the most important point:
         | 
         | >a score of around 90 GDT is informally considered to be
         | competitive with results obtained from experimental methods
         | 
         | So DeepMind is to the point where it's a question of whether
         | their generated model or the experimentally determined
         | structure is closest to the actual physical structure.
        
           | timr wrote:
           | _" So DeepMind is to the point where it's a question of
           | whether their generated model or the experimentally
           | determined structure is closest to the actual physical
           | structure."_
           | 
           | While this is an accomplishment, nobody is going to be
           | confusing these models for structures produced
           | experimentally. The CASP metric is for backbone atoms. To
           | have a useful model of protein structure, you really need to
           | have the positions of the protein side-chain atoms modeled
           | correctly. Experimental methods will do that, but this
           | method, as I understand it, does not.
        
             | jey wrote:
             | So it's a really good start, but nobody is going to be
             | throwing these structures into molecular docking
             | simulations for drug discovery or etc just yet. But
             | hopefully those details can be worked out soon enough.
        
               | timr wrote:
               | Yeah, there's a huge difference between a 1A _all-atom_
               | RMSD structure, and a 1A _backbone_ RMSD structure. The
               | non-backbone atoms in a protein make up most of the mass
               | and volume. When structural biologists talk about RMSD,
               | this is what they mean.
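The backbone/all-atom distinction is easy to show in code. A sketch, assuming pre-superimposed coordinates and standard PDB atom names (the coordinates below are invented for illustration):

```python
import numpy as np

BACKBONE = {"N", "CA", "C", "O"}  # standard backbone atom names

def rmsd(pred, ref, atom_names, backbone_only=False):
    """RMSD between two superimposed Nx3 coordinate arrays in angstroms."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    if backbone_only:
        keep = np.array([name in BACKBONE for name in atom_names])
        pred, ref = pred[keep], ref[keep]
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))

# One residue's worth of atoms: backbone perfect, side chain (CB) misplaced
names = ["N", "CA", "C", "O", "CB"]
ref = np.zeros((5, 3))
pred = ref.copy()
pred[4] = [3.0, 0.0, 0.0]
print(rmsd(pred, ref, names, backbone_only=True))  # 0.0
print(rmsd(pred, ref, names))                      # ~1.34: side chain dominates
```

A structure can thus look excellent on a backbone metric while its side-chain packing, which is what binding pockets are made of, is badly wrong.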
        
           | mabbo wrote:
           | Then we get the really fun question: if the experimentally
           | determined structure is only 90% accurate, can machine
           | learning actually reach 100%? Can you learn exact truth from
           | inexact examples?
           | 
           | Which gets into the concept of whether the ML model has
           | _actually_ learned some deeper conceptual ideas than we have,
           | some deeper truth about how this works. If so, can we somehow
           | extract that truth, or is it truly a black box that does the
           | thing we want?
           | 
           | I'm reminded of a sci-fi book I read long ago in which humans
           | are discussing the fact that the science they are utilizing
           | is beyond the scope of a human mind to comprehend- only the
           | AIs can intuitively deal with 12-dimensional manifolds (or
           | something to that extent). Maybe we've reached the doorstep
           | of that future.
        
             | carlmr wrote:
              | If you have an experimental error that is somewhat
              | normally distributed around the mean, then the AI should,
              | with enough examples, learn rules that are closest to the
              | mean, because it will minimize the sum of errors.
              | 
              | So I do think the results could be more accurate than the
              | measurements.
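That intuition checks out in a toy experiment: fit a rule to many measurements corrupted by unbiased Gaussian noise, and the fitted rule lands far closer to the truth than a typical single measurement does. (The linear rule and noise level here are invented; the point is only that zero-mean noise cancels in a least-squares loss.)

```python
import numpy as np

rng = np.random.default_rng(0)
true_slope = 2.5

# 1000 noisy "experimental" measurements of y = true_slope * x
x = rng.uniform(1.0, 10.0, 1000)
y_measured = true_slope * x + rng.normal(0.0, 1.0, 1000)

# Least squares through the origin minimizes the summed squared error,
# so the unbiased noise largely averages out of the estimate
fitted_slope = np.sum(x * y_measured) / np.sum(x * x)

typical_point_error = np.mean(np.abs(y_measured - true_slope * x))
print(abs(fitted_slope - true_slope) < typical_point_error)  # True
```

The caveat is the premise: this only works if the experimental error really is unbiased. Systematic biases shared across the training structures would be learned, not averaged away.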
        
           | SubiculumCode wrote:
           | Something like this comes up in assessing the accuracy of
           | automated segmentation results of brain regions e.g. the
           | hippocampus. Human-machine reliability is approaching the
           | human to human reliability, so it becomes harder to improve
           | the automated methods.
        
           | asdfasgasdgasdg wrote:
           | I have a related question about this. If experimental methods
           | produce results around a score of 90, what is the baseline we
           | are comparing the DeepMind results against? If the
           | experimental error is equal to the observed DeepMind error,
           | how can we say which one is actually more erroneous?
        
             | mrDmrTmrJ wrote:
              | Excellent question. At some point, I think the only answer
              | is "have a bunch of different people run a bunch of
              | experiments on the same protein."
             | 
             | The threshold for "real" in particle physics is +5 sigma.
             | Which takes a lot of data.
        
             | cpeterso wrote:
             | And is it even meaningful for DeepMind to score better than
             | experimental results? How are DeepMind's results scored
             | then?
        
             | IfOnlyYouKnew wrote:
             | The "experiments" here use X-Ray Crystallography. Like most
             | methods of measuring anything, we have a pretty good idea
             | of its accuracy under various conditions.
             | 
             | Think of it like satellite imagery of a tree: A score of
             | zero would be a single green-ish pixel, while a score of
             | 100 would show each leaf within the range it naturally
             | moves in due to wind etc. (proteins tend to wiggle quite a
             | bit under natural conditions, as well)
        
             | 0-_-0 wrote:
             | That's a damn good question, it looks like we don't know
             | how much above 90 AlphaFold is.
        
               | [deleted]
        
             | marcosdumay wrote:
             | Finding the energy of each configuration should be much
             | easier than finding the lowest-energy configuration. Can
             | that be calculated ab-initio or it is still too expensive?
        
               | crispycrafter2 wrote:
               | The problem with ab-initio methods in this context is the
               | sheer number of non-covalent interactions present in
                | these large proteins. A simple protein would require a
                | hybrid quantum mechanics/molecular mechanics (QM/MM)
                | simulation to even approximate the vibrational energy
                | required to validate equilibrium.
               | 
               | These proteins are so massive that we often use Daltons
               | [1] as an averaged measure of molecular weight.
               | 
               | Conceptually one of the most promising applications of
               | quantum computing is theoretical chemistry, and we are
               | only now starting to make progress in this avenue [2]. I
               | anticipate it would require quantum computing to
               | explicitly optimise large folded proteins.
               | 
               | 1. https://en.m.wikipedia.org/wiki/Dalton_(unit) 2.
               | https://arxiv.org/abs/2004.04174
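[To illustrate marcosdumay's premise with a classical, not ab-initio, surrogate: a single-point energy of a given configuration is cheap, roughly O(N^2) pairwise terms, while searching configuration space is what explodes. A toy Lennard-Jones evaluation, with made-up parameter values, not a real protein force field:]

```python
import numpy as np

def lj_energy(coords, epsilon=0.2, sigma=3.4):
    """Single-point Lennard-Jones energy of one configuration.
    Evaluating one configuration costs O(N^2) pair terms; the hard
    part of folding is searching the astronomically large space of
    configurations, not scoring a single one."""
    n = len(coords)
    e = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])
            e += 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return e

# Two atoms at separation sigma have zero energy; at 2^(1/6)*sigma
# they sit at the potential minimum, -epsilon.
print(lj_energy(np.array([[0.0, 0.0, 0.0], [3.4, 0.0, 0.0]])))  # 0.0
```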
        
             | radioactivist wrote:
             | I think it's that a score of >90 means the result is within
             | the error bars of whatever particular experiment was chosen
             | to be the "reference".
        
           | contravariant wrote:
           | Of course this may no longer be the case for methods solely
           | trained to optimize that particular metric.
        
           | mFixman wrote:
           | I don't have a background in biology, and that quote confused
           | me.
           | 
           | What's an experimental method for protein folding and why is
           | it so good? Are they talking about creating an actual,
           | physical protein in a lab and observing how it folds?
        
             | flobosg wrote:
             | > Are they talking about creating an actual, physical
             | protein in a lab and observing how it folds?
             | 
             | Exactly. Researchers purify the folded protein and then use
             | methods such as X-ray crystallography, nuclear magnetic
             | resonance, and cryo-electron microscopy to determine its
             | three-dimensional atomic structure.
        
           | beowulfey wrote:
           | I don't think you can say DeepMind could ever be more
           | accurate to the true physical structure since it was built on
           | the same experimental structures that it is being compared
           | to. The limit of accuracy is the experimental data. However,
           | I think we can say that a DeepMind prediction could at least
           | be _as good as_ a new experimental structure.
        
             | dwiel wrote:
             | This seems like an obvious assumption to make, but it isn't
             | always true. It is easier to see why if you are measuring a
             | single value multiple times in order to get a more accurate
             | estimate of the true value. In that case your "model" is
             | simply the mean of all measurements made and can exceed the
             | accuracy of a single measurement.
             | 
             | In this case, the model is predicting values of multiple
             | structures, but patterns could still theoretically be found
             | which allow for predictions beyond the accuracy of a single
             | measurement.
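[dwiel's point can be checked numerically: the error of the mean of N measurements shrinks roughly as 1/sqrt(N) relative to a single measurement. A quick generic simulation, nothing protein-specific:]

```python
import random

random.seed(0)
true_value = 10.0
noise = 1.0            # std of a single measurement
n_trials, n_meas = 2000, 100

# Compare the error of one measurement to the error of the mean of 100.
single_errs, mean_errs = [], []
for _ in range(n_trials):
    samples = [random.gauss(true_value, noise) for _ in range(n_meas)]
    single_errs.append(abs(samples[0] - true_value))
    mean_errs.append(abs(sum(samples) / n_meas - true_value))

avg = lambda xs: sum(xs) / len(xs)
# The mean of 100 samples is roughly sqrt(100) = 10x more accurate
# than any single sample.
print(avg(single_errs), avg(mean_errs))
```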
        
             | dekhn wrote:
             | DM is merging several experimental data: known x-ray
             | structures, and evolutionary data. The experimental method
             | (xray) doesn't take advantage of the evolutionary data. And
             | it also doesn't model the underlying protein behavior
             | accurately (xray basically assumes a single static model
             | with atoms fluctuating in little gaussian "puffs" around
             | the atomic centers, but that's not how most proteins
             | behave).
        
             | FrojoS wrote:
             | Is that true? I thought that, fundamentally, the simulation tries
             | to find the state of lowest energy, which is defined by
             | physics. So, your result can be better than the data set
             | used for training.
        
             | robocat wrote:
             | But DeepMind could be used to find errors in the training
             | set.
             | 
             | Let's say you have 100000 proteins in the training set. Now
             | remove #1 and train on 99999, and then check that it still
             | predicts the same protein result for #1 as the experimental
             | result.
             | 
             | Or remove from training whole sets of proteins by
             | particular teams to find systematic errors made by teams?
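[robocat's leave-one-out idea in miniature, with a toy scalar data set and the mean of the remaining points standing in for "retrain on the other 99,999". All values here are made up:]

```python
import statistics

# Hold out each "experimental result", fit on the rest (here: just
# the mean), and flag held-out points that disagree strongly with
# what the rest of the data predicts.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 14.7, 10.0, 9.9]  # 14.7 is suspect

def loo_outliers(data, z_cut=3.0):
    flagged = []
    for i, x in enumerate(data):
        rest = data[:i] + data[i + 1:]
        mu, sd = statistics.mean(rest), statistics.stdev(rest)
        if abs(x - mu) > z_cut * sd:
            flagged.append(i)
    return flagged

print(loo_outliers(measurements))  # [5]
```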
        
           | mactrey wrote:
           | I'm not a biologist but I'm not sure that follows. It could
           | be that the experimentally-derived structure is 100% accurate
           | to the actual physical structure but getting 90% of your
           | predicted residues to match that is enough to get an accurate
           | prediction of protein behavior and hence "competitive."
        
       | phonebucket wrote:
       | This is a huge jump forward. Last year's performance already was
       | a big step up over the previous, and this seems to go much
       | further. So big kudos to the research team.
       | 
       | Nonetheless, I'd like to hear more from specialists outside the
       | context of a marketing blog post before I fully buy into a claim
       | of a solution.
       | 
       | There's also a rabbit hole about what 'solution' actually means.
       | Is the performance sufficient for any protein folding prediction
       | application that might arise in the future?
        
       | flobosg wrote:
       | See also the news in Nature:
       | https://www.nature.com/articles/d41586-020-03348-4
        
       | yarabarla wrote:
       | Man, I remember running folding@home years ago on my terrible
       | laptop. Now this was done with what they say is equivalent to
       | only 100-200 GPUs. Crazy to see how far we've come in just a
       | short amount of time.
        
         | sumtechguy wrote:
         | me too... should have done bitcoins :)
        
       | jjk166 wrote:
       | Now onto the much harder problem of doing the reverse: taking an
       | arbitrary structure and determining an amino-acid sequence that
       | will fold into it.
        
         | Rochus wrote:
         | What for?
        
           | jjk166 wrote:
           | The forward folding problem lets you determine structures
           | from a known genetic sequence. So for example you could very
           | quickly sequence the genome of a virus and figure out how it
           | worked much faster than current methods allow.
           | 
           | The reverse folding problem lets you specify a structure and
           | then make a genetic sequence to produce it. For example you
           | could look at this virus to see how it infects its host, then
           | design a custom protein to act as an antibody, stopping it,
           | which is a capability we don't currently have.
           | 
           | Forward folding is certainly useful, but reverse folding
           | would be revolutionary.
        
             | Rochus wrote:
             | The set of all proteins which can potentially be expressed
             | in an organism is known. Now maybe we also get decent
             | (static) structure information for these. But the
             | interaction of a virus with the host cell is much more
             | complex. There is much more than just an amino acid
             | sequence involved. And these parts are all moving, so a
             | static picture as we now can create faster than before does
             | not contain all the information necessary to fully
             | understand the functions.
        
               | jjk166 wrote:
               | Precisely why I referred to it as a different and harder
               | problem
        
               | Rochus wrote:
               | There are a lot of different harder problems.
        
               | jjk166 wrote:
               | So?
        
               | weregiraffe wrote:
               | >The set of all proteins which can potentially be
               | expressed is known.
               | 
               | Sure, "known", but it's on the order of 20^10000. It
               | won't fit in the entire visible volume of the universe.
        
               | Rochus wrote:
               | No, the genome of the host is much smaller than the
               | theoretical number of combinations. There are about 20 to
               | 30k different proteins in a human cell (about 20k
               | directly encoded on the DNA).
        
               | jjk166 wrote:
               | If you are designing proteins, you're not limited to
               | those that are already encoded in the host's DNA.
        
               | Rochus wrote:
               | Right, but you made the example with the virus docking at
               | a known organism. If you do synthetic biology and modify
               | bacteria to produce any proteins then the situation is
               | different of course.
        
           | ramraj07 wrote:
           | The other comment mentioned the example of making proteins
           | that bind a structure. Here's an extension: a general
           | understanding of how an enzyme catalyzes a chemical
           | reaction is that it binds the reaction intermediate with
           | higher affinity than the substrates. Thus, if we had this
           | reverse ability, we could start inventing enzymes that
           | catalyze arbitrary chemical reactions, even ones that need
           | energy input; you could imagine, for example, enzyme
           | systems that convert plastic to fuel!
        
             | Rochus wrote:
             | Ok, then this is about enzymes which do not yet exist in
             | the organism. You could then modify bacteria so they
             | produce this enzyme and feed on plastic, I see.
        
               | flobosg wrote:
               | Plastic degradation is a thing already in naturally
               | occurring bacteria that evolved a PETase:
               | https://science.sciencemag.org/content/351/6278/1196/tab-
               | fig...
        
               | Rochus wrote:
               | But producing fuel as the fellow suggested would then be
               | another function to be added to the bacterium; and maybe
               | it should work on different kinds of plastic.
        
               | flobosg wrote:
               | Of course, that's why I focused on degradation. There's
               | plenty of room for improvement. For instance, PETase is
               | not very efficient actually, and many research groups are
               | working on its engineering.
        
         | abecedarius wrote:
         | I think you have this backwards in practice. It was in the 80s
         | that I first read a paper about a de-novo protein design
         | engineered for a specific stable conformation. Natural proteins
         | have no reason to be particularly predictable, just as genetic
         | programming produces hard-to-understand programs relative to
         | human-written ones. In fact making the structure especially
         | stable against perturbations seems like it'd make it less
         | responsive to changing evolutionary pressures.
         | 
         | (Am not a structural biologist.)
         | 
         | Added: 2019 article on de novo design:
         | https://www.nature.com/articles/d41586-019-02251-x Not to say
         | that better prediction won't also make design easier -- of
         | course I expect it will.
        
         | flobosg wrote:
         | Deep learning methods are being applied here as well; see for
         | example
         | https://www.biorxiv.org/content/10.1101/2020.07.22.211482v1
        
         | ashtonbaker wrote:
         | I assume if the forward direction is fast enough, the reverse
         | could be done by evolutionary methods.
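[A toy version of that evolutionary approach: mutate a sequence and keep mutations that the forward scorer doesn't penalize. `fold_score` here is a hypothetical stand-in (similarity to a made-up target sequence); a real pipeline would fold each candidate with a fast forward model and score the structural match to the desired shape.]

```python
import random

random.seed(1)
AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Stand-in for a fast forward-folding model: here, just similarity
# to a hypothetical target sequence.
TARGET = "MKTAYIAKQR"
def fold_score(seq):
    return sum(a == b for a, b in zip(seq, TARGET))

def evolve(generations=2000):
    """Hill-climb: apply random point mutations, keeping any that are
    neutral or better under the forward score."""
    seq = "".join(random.choice(AMINO) for _ in TARGET)
    for _ in range(generations):
        i = random.randrange(len(seq))
        mutant = seq[:i] + random.choice(AMINO) + seq[i + 1:]
        if fold_score(mutant) >= fold_score(seq):
            seq = mutant
    return seq

print(evolve())  # converges toward the target under this toy score
```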
        
         | Metacelsus wrote:
         | David Baker's lab is working on this; their Rosetta program has
         | been getting reasonably good at it.
        
       | uuuuuuuuuuuu wrote:
       | I feel like DeepMind has a disproportionately large scientific
       | impact relative to its resource pool. How would one (or a group)
       | go about replicating its success?
        
         | ribrars wrote:
         | I think the key here to replicating the success is the
         | deployment of deep learning effectively. But I would argue that
         | deepmind's resource pool is immense, it's backed by Google. The
         | resources of GPU's (and more advanced TPU's) are in
         | abundance... not to mention the many brilliant PhD scientists
         | who work there.
        
       | danaris wrote:
       | The title here is not merely breathless clickbait; it also has
       | very little to do with the headline of the actual article, which
       | is "AlphaFold: a solution to a 50-year-old grand challenge in
       | biology".
       | 
       | I thought the #1 criterion for titles was that they should match
       | the original if at all reasonable...?
        
       | woeirua wrote:
       | This is a big step forward, but the outstanding question as far
       | as to whether or not this is useful for evaluating novel
       | proteins, is going to be how good is the confidence metric at
       | telling the user to trust or not trust the results. You can see
       | from their examples, that AlphaFold is very good but not perfect.
       | I imagine for some proteins it will still give misleading or
       | erroneous results and if you can't tell when that happens without
       | verifying the structure experimentally then this will likely not
       | be that useful for new science.
        
         | mcshicks wrote:
         | I was wondering the same thing. But I also wonder if having
         | good guesses makes the x-ray crystallography and other
         | experiments to verify a given protein easier/cheaper/quicker? I
         | don't know enough about the actual techniques to have an
         | informed opinion but I would think it would be helpful.
        
           | sanxiyn wrote:
           | It does. https://www.nature.com/articles/d41586-020-03348-4
           | reports a case of x-ray crystallography helped by AlphaFold
           | prediction.
        
         | asdfasgasdgasdg wrote:
         | > the outstanding question as far as to whether or not this is
         | useful for evaluating novel proteins
         | 
         | That is not an outstanding question. The test on which DeepMind
         | scored high marks is a test of how well the algorithm folds
         | novel proteins -- proteins whose ground-truth structure has not
         | yet been published.
        
           | sundarurfriend wrote:
           | You missed the actual outstanding question in their comment:
           | 
           | > the outstanding question ... is going to be how good is the
           | confidence metric at telling the user to trust or not trust
           | the results.
        
             | deeviant wrote:
             | You don't generally look at neural network output like
             | that.
             | 
             | There is generally a threshold, less than X, not the class,
             | equal or more, is the class. Then you run the network with
             | the same threshold on a known data set and compute a
             | confusion matrix, which tells you about the error, I don't
             | even want to know what a confusion matrix analogue for 3D
             | geometry would look like but I'm sure they have something.
             | 
             | This is literally the process that one does in taking part
             | of the this. And the error rate (specifically the lack of
             | errors) is what is everybody is talking about. 90 is just
             | as accurate as we can get with experimental measurement.
             | It's likely at this point the source of error is in the
             | data set (we can only train on data we experimentally
             | measure and these are not perfect measurements). It's also
             | possible, at this point, the model generalized so well that
             | when it deviates from experimental measurements __it 's
             | actually correct and the experimental value was the one
             | that was wrong __.
             | 
             | So no, the outstanding question is not "is going to be how
             | good is the confidence metric at telling the user to trust
             | or not trust the results.". Nobody is going to be looking
             | confidence values when it model is giving an output, they
             | are going to be looking at the overall error rate across a
             | broad spectrum of proteins to get a sense of it's accuracy.
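[The thresholding-plus-confusion-matrix procedure deeviant describes, as a minimal sketch. The scores and labels are made up for illustration:]

```python
def confusion_matrix(scores, labels, threshold=0.5):
    """Threshold raw network outputs into class predictions and
    tally (TP, FP, FN, TN) against known labels."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

scores = [0.9, 0.8, 0.3, 0.6, 0.1, 0.7]
labels = [1, 1, 0, 0, 0, 1]
print(confusion_matrix(scores, labels))  # (3, 1, 0, 2)
```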
        
           | woeirua wrote:
           | We'd have to see the distribution of GDT scores evaluated on
           | unknown proteins to say anything about how confident we can
           | be. If the distribution is tightly distributed around the
           | median then great, this works really well. If the variance is
           | large though then you're going to have a hard time using this
           | for meaningful predictions.
        
             | foota wrote:
             | According to the article there's a confidence score as
             | well. As long as this is sufficiently predictive of errors
             | either a tight or wide distribution is likely acceptable.
        
               | woeirua wrote:
               | We need to see the relationship between confidence and
               | GDT score. If you have a nice relationship then again
               | everything is great. But... most confidence metrics from
               | neural networks do not have a nice relationship to the
               | primary metric.
        
         | theptip wrote:
         | It's a good question, and I'm not a domain expert here.
         | 
         | The article did claim:
         | 
         | > According to Professor Moult, a score of around 90 GDT is
         | informally considered to be competitive with results obtained
         | from experimental methods.
         | 
         | So perhaps their score of 87 GDT is pretty significant. But
         | "competitive with" is not the same as "always in agreement
         | with", as you point out. Could be the failure modes are
         | problematic.
        
         | kxs wrote:
         | There are other experimental methods that are much cheaper that
         | can be used to assist validation. Also the models look damn
         | impressive, even down to the sidechain packing.
        
         | aardvarkr wrote:
         | Every simulator is going to have error. In this case this
         | biennial challenge represents the computational state of the
         | art with scores of 30-40 over the last decade. The AlphaFold2
         | model sends that score up to 87, with errors about the
         | width of an atom. You can actually see the difference between
         | their prediction and the actual result and it's stunning. This
         | is all on the blog site so I recommend reading before throwing
         | shade.
        
           | woeirua wrote:
           | I read the blog. But there's a big difference between a mind
           | blowing tech demo and something that can be used in a
           | commercially viable process.
        
         | comicjk wrote:
         | Scientists can verify that an AlphaFold-predicted structure is
         | correct, or at least useful, without being able to get the
         | structure experimentally. For instance, we could use the
         | AlphaFold-predicted structure to do protein-ligand binding
         | calculations for a bunch of known molecules. If these
         | calculations agree with experimental protein-ligand binding
         | (which they generally do for proteins with known structures),
         | then we can say with high confidence that we've got a good
         | structure.
        
           | dontbeevil1992 wrote:
           | does that mean that protein-folding is sort of in NP?
        
             | hedora wrote:
             | It's probably not in NP, in that there is not a polynomial
             | time algorithm that checks solutions for correctness.
        
             | dekhn wrote:
             | The way computer scientists do it, yes, it is. In the CS
             | situation you define an energy function (in this case
             | representing the physical behavior of the protein in water)
             | and find a heuristic to approximate the coordinates of the
             | lowest energy configuration; done, problem solved.
             | 
             | in reality, that's not how it works at all. The energy
             | functions we have are crappy and require too much sampling
             | before we can find the lowest energy configuration. And
             | more importantly, it doesn't look like proteins typically
             | fold to their lowest energy configuration (with the
             | exception of some small fast two state folders), but rather
             | explore a kinetically accessible region around there (or
             | even somewhere else entirely, if the energy cost to
             | transition is too high).
             | 
             | Methods like AF depend heavily on large amounts of
             | correlated information from evolutionary data, which has
             | historically been of the highest value for making decisions
             | about protein structure.
        
       | jeffxtreme wrote:
       | GDT_TS for AlphaFold is now comparable to experimental levels;
       | but that's based on the class of proteins for which we've been
       | able to determine the 3D structure of the protein, for which
       | there might be selection bias.
       | 
       | I wonder if we can determine if this extends to proteins that
       | aren't as keen to determining their 3D structure?
       | 
       | For example, certain proteins are more crystallizable than
       | others. For these non-crystallizable proteins, I wonder if we
       | can say that AlphaFold would generate accurate 3D models? And if
       | possible, might there be a way to map out this uncertainty?
        
         | deeviant wrote:
         | > I wonder if we can determine if this extends to proteins that
         | aren't as keen to determining their 3D structure?
         | 
         | This has already happened.
         | 
         | "An AlphaFold prediction helped to determine the structure of a
         | bacterial protein that Lupas's lab has been trying to crack for
         | years. Lupas's team had previously collected raw X-ray
         | diffraction data, but transforming these Rorschach-like
         | patterns into a structure requires some information about the
         | shape of the protein. Tricks for getting this information, as
         | well as other prediction tools, had failed. "The model from
         | group 427 gave us our structure in half an hour, after we had
         | spent a decade trying everything," Lupas says."
         | 
         | From: https://www.nature.com/articles/d41586-020-03348-4
        
           | jeffxtreme wrote:
           | Agree this is great to hear, but the fact that they had X-ray
           | diffraction data indicates this protein was indeed
           | crystallizable no?
           | 
           | Though the next paragraph in the article shows that DeepMind
           | is indeed working on mapping out reliability:
           | 
           | "Demis Hassabis, DeepMind's co-founder and chief executive,
           | says that the company plans to make AlphaFold useful so other
           | scientists can employ it. (It previously published enough
           | details about the first version of AlphaFold for other
           | scientists to replicate the approach.) It can take AlphaFold
           | days to come up with a predicted structure, which includes
           | estimates on the reliability of different regions of the
           | protein. "We're just starting to understand what biologists
           | would want," adds Hassabis, who sees drug discovery and
           | protein design as potential applications."
        
             | flobosg wrote:
             | > Agree this is great to hear, but the fact that they had
             | X-ray diffraction data indicates this protein was indeed
             | crystallizable no?
             | 
             | Yes. CASP uses as targets proteins with no known published
             | structure but a solved or soon-to-be-solved one. They are
             | then kept on hold until the end of the competition.
        
       | dalbasal wrote:
       | Question for the wise:
       | 
       | Assuming optimistic further progress, what are the implications
       | of accurately predicting protein folding? What are we hoping to
       | discover, or succeed in doing?
        
       | spenczar5 wrote:
       | AlphaFold was used to analyze proteins in SARS-CoV-2
       | (https://www.crick.ac.uk/news/2020-03-05_crick-scientists-
       | sup...). Does anyone know what impact that has had?
       | 
       | This is really an amazing moment.
        
       | nmca wrote:
       | https://deepmind.com/blog/article/alphafold-a-solution-to-a-...
        
       | optimalsolver wrote:
       | Can just anyone enter this challenge, or do you have to be part
       | of a major institution?
        
         | dmd wrote:
         | Anyone.
        
       | Quarrel wrote:
       | As someone who wrote a thesis many moons ago about protein
       | folding, this is pretty astonishing to see. Yay science.
        
       | optimalsolver wrote:
       | Can just anyone enter this challenge, or do you have to be part
       | of a major institution?
        
         | flobosg wrote:
         | I think anyone can take part. There are a few unaffiliated
         | participants.
        
       | schemescape wrote:
       | Did AlphaFold2 also have the biggest budget? :)
       | 
       | Edit: from the other HN article on this topic:
       | 
       | > We trained this system on publicly available data consisting of
       | ~170,000 protein structures from the protein data bank together
       | with large databases containing protein sequences of unknown
       | structure. It uses approximately 128 TPUv3 cores (roughly
       | equivalent to ~100-200 GPUs) run over a few weeks
       | 
       | https://deepmind.com/blog/article/alphafold-a-solution-to-a-...
        
         | [deleted]
        
         | curiousllama wrote:
         | Actually, no! Or at least, the budget they used (<$100k at
         | retail prices to train the model) is well within the feasible
         | range for other research institutions.
         | 
         | In other words, it's less like GPT3 and more like ImageNet.
        
           | whimsicalism wrote:
           | I don't know - tens of thousands per train is not accessible
           | for most academic institutions when you consider the
           | necessity of ablation studies, experimentation, etc.
        
             | anchpop wrote:
             | For a topic like protein folding, it should be
        
               | whimsicalism wrote:
               | > For a topic like protein folding, it should be
               | 
               | Well, I've worked in some academic deep research labs and
               | they did not have the money to do the experiments they
               | wanted to do.
        
           | lacksconfidence wrote:
           | Is the cost to train really the relevant metric for
         | developing this? It seems like the salaries involved are
           | probably at least 10x whatever they spent on hardware.
        
             | TomJansen wrote:
             | >Is the cost to train really the relevant metric for
             | developing this?
             | 
             | Yes, because they must release sufficient information for
             | others to recreate the AI model, according to the rules of
             | entering CASP.
        
               | lacksconfidence wrote:
               | I was replying in the context of the grand parent:
               | 
               | > Did AlphaFold2 also have the biggest budget? :)
               | 
               | And then the parent
               | 
               | > Actually, no! Or at least, the budget they used (<$100k
               | at retail prices to train the model) is well within the
               | feasible range for other research institutions.
               | 
               | I'm not sure how the cost of replicating the model in the
               | future is relevant in this context. We appear to be
               | discussing the cost of developing this model from
               | scratch, such as what it would have taken an alternate
               | team to create and submit this if DeepMind never got
               | involved.
        
             | kevincox wrote:
             | Additionally the training test of all of the models during
             | development.
        
           | [deleted]
        
       | partingshots wrote:
       | I continue to be impressed by how quickly DeepMind has managed to
       | progress in such a short time. CASP13 was a shocker to all of us
       | I think, but many were skeptical as to the longevity of the
       | performance DeepMind was able to achieve. I believe with CASP14
       | rankings now released, it's safe to say that they've proven
       | themselves.
       | 
       | Congratulations to the team! This work will have far reaching
       | impacts, and I hope that you continue to invest heavily in this
       | area of research.
        
         | [deleted]
        
         | whimsicalism wrote:
         | Progress like this was, in my view, inevitable after the
         | invention of unsupervised transformers.
         | 
         | It'll be genetics next.
         | 
         | e: although AlphaFold appears to be convolutionally based! I
         | suspect that'll change soon.
        
           | alquemist wrote:
           | FWIW, transformers are to sequences what convnets are to grids,
           | modulo important considerations like kernel size and
           | normalization. Think of transformers as really wide (N) and
           | really short (1) convolutions. Both are instances of
           | graphnets with a suitable neighbor function. Once
           | normalization was cracked by transformers, all sort of
           | interesting graphnets became possible, though it's possible
           | that stacked k-dimensional convolutions are sufficient in
           | practice.
        
             | whimsicalism wrote:
             | I work in the field, I don't need the difference explained
             | to me.
             | 
             | > Think of transformers as really wide (N) and really short
             | (1) convolutions
             | 
             | Modern transformer networks are not "really short" and
             | you're also conflating the difference between intra- and
             | inter- attention.
             | 
             | There is still a pitched battle being waged between
             | convnets and transformers for sequences, although it looks
             | like transformers have the upper hand accuracy wise right
             | now, convnets are competitive speed-wise.
        
           | klmr wrote:
           | > _It 'll be genetics next._
           | 
           | Which part of genetics are you thinking of? Much of genetics
           | isn't amenable to this kind of ML, because it isn't some kind
           | of optimisation problem. And many other parts don't require
           | ML because they can be modelled very closely using exact
           | methods. ML _does_ get used here, and sometimes to great
           | effect (e.g. DeepVariant, which often outperforms other
           | methods, but not by much -- not because DeepVariant isn't
           | good, but rather because we have very efficient
           | approximations to the exact solution).
        
             | whimsicalism wrote:
             | What do you mean?
             | 
             | Genetics is amenable because the genome is a sequence that
             | can be language modeled/auto-regressed for depth of
             | understanding by the network.
             | 
             | There are plenty of inferences that you would want to do on
             | genetic sequences that we can't model exactly and there is
             | some past work on doing stuff like this, although biology
             | is usually a few years behind.
             | 
             | https://www.nature.com/articles/s41592-018-0138-4
             | 
             | e: for clarity
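The kind of autoregressive modeling being debated can be illustrated at its very simplest: a first-order Markov model of P(next base | previous base). This is a deliberately crude stand-in (the linked paper uses deep networks), and the sequence below is invented purely for illustration:

```python
import math
from collections import defaultdict

# Toy autoregressive "language model" over nucleotides. The training
# objective has the same shape as a deep LM's: maximize the likelihood
# of each token given its context.
seq = "ATGCGATACGATTACAATGCCGAT"  # made-up sequence for illustration
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(seq, seq[1:]):
    counts[prev][nxt] += 1

def p_next(prev, nxt):
    """Estimated probability of base `nxt` following base `prev`."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

# Average log-likelihood per transition under the fitted model; a deep
# LM is trained to push this number up on held-out sequence.
avg_ll = sum(
    math.log(p_next(a, b)) for a, b in zip(seq, seq[1:])
) / (len(seq) - 1)
print(round(avg_ll, 3))
```

A deep network replaces the count table with a learned function of a much longer context, which is where the "depth of understanding" claim comes in.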
        
               | garmaine wrote:
               | This is word salad.
        
               | whimsicalism wrote:
               | Rude. I would appreciate substantive criticism,
               | especially when I'm linking papers in Nature starting to
               | do _exactly_ what I 'm talking about.
        
               | garmaine wrote:
               | I cannot give constructive feedback to something which is
               | incomprehensible.
               | 
               | "the genome is a sequence that can be language
               | modeled/auto-regressed for depth of understanding by the
               | network"
               | 
               | The genome is not a sequence so much as a discrete set of
               | genes which are themselves sequences which specify
               | construction plans for proteins. That distinction is
               | important.
               | 
               | Language modeling in the context of machine learning
               | typically means NLP methods. Genetics is nothing like
               | natural language.
               | 
               | Auto-regression is using (typically time series)
               | information to predict the next codon. This makes very
               | little sense in the context of genetics since, again, the
               | genetic code is not an information carrying medium in the
               | same sense as human language. Being able to predict the
               | next codon tells you zilch in terms of useable
               | information.
               | 
               | "Depth of understanding by the network" ... what does
               | that even mean???
               | 
               | The above sentence is a bunch of popular technical jargon
               | from an unrelated field thrown together in a nonsensical
               | way. AKA word salad.
        
               | whimsicalism wrote:
               | > The genome is not a sequence so much as a discrete set
               | of genes which are themselves sequences which specify
               | construction plans for proteins. That distinction is
               | important.
               | 
               | aka a sequence. "a book is not a sequence so much as a
               | discrete set of chapters which are themselves sequences
               | of paragraphs which are themselves sequences of
               | sentences" -> still a sequence
               | 
               | these techniques are already being used, such as in the
               | paper I just linked.
               | 
               | > Being able to predict the next codon tells you zilch in
               | terms of useable information.
               | 
               | You have absolutely no way of knowing that a priori.
               | And autoregressive tasks can be more sophisticated than
               | predicting just the next codon.
               | 
               | > bunch of popular technical jargon from an unrelated
               | field thrown together in a nonsensical way
               | 
               | Okay, feel free to think that.
               | 
               | There's always this assumption of it "will never work on
               | _my_ field. " I've done work on NLP and on proteins and
               | read others' work on genetics. I think you will end up
               | being surprised, although it might take a few years.
        
               | klmr wrote:
               | I meant, which _specifics_ are you thinking of?
               | 
               | > _Genetics is amenable because it is a sequence_
               | 
               | Not sure what you mean by that. Genetics is a field of
               | research. The _genome_ is a sequence. And yes, that
               | sequence can be modelled for various purposes but without
               | a specific purpose there's no point in doing so (and
               | furthermore doing so without specific purpose is trivial
               | -- e.g. via markov chains or even simpler stochastic
               | processes -- but not informative).
               | 
               | > _There are plenty of inferences that you would want to
               | do on genetic sequences_
               | 
               | I'm aware (I'm in the field). But, again, I was looking
               | for specific examples where you'd expect ML to provide
               | breakthroughs. Because so far, the reason ML hasn't
               | provided many breakthroughs is less a lack of research
               | and more that it's not as suitable here as for other
               | hard questions. For instance, polygenic risk
               | scores (arguably the current "hotness" in the general
               | field of genetics) can already be calculated fairly
               | precisely using GWAS, it just requires a ton of clinical
               | data. GWAS arguably already uses ML but, more to the
               | point, throwing more ML at the problem won't lead to
               | breakthroughs because the problem isn't compute bound or
               | vague, it's purely limited by data availability.
               | 
               | I could imagine that ML can help improve spatial
               | resolution of single-cell expression data (once again ML
               | is already used here) but, again, I don't think we'll see
               | improvements worthy of being called breakthroughs,
               | since we're already fairly good.
        
               | whimsicalism wrote:
               | > Not sure what you mean by that
               | 
               | I spoke loosely, my mind skipped ahead of my writing, and
               | I didn't realize that we were parsing so closely.
               | "Genetics (the field) is amenable because the object of
               | its study (the genome) is a sequence" would have been
               | more correct but I thought it was implied.
               | 
               | > without a specific purpose there's no point in doing so
               | 
               | Well yes, prior to the success of transfer learning I
               | could see why you would think that is the case, but if
               | you've been following deep sequence research recently
               | then you would know there are actually immense benefits
               | to doing so because the embeddings learned can then be
               | portably used on downstream tasks.
               | 
               | > it's purely limited by data availability.
               | 
               | Yes, and transfer learning on models pre-trained on
               | unsupervised sequence tasks provides a (so-far under-
               | explored) path around labeled data availability problems.
               | 
               | I already linked to a paper showing a task that these
               | sorts of approaches outperform, and that is without using
               | the most recent techniques in sequence modeling.
               | 
               | Maybe read the paper in Nature that uses this exact LM
               | technique to predict the effect of mutations before
               | assuming that it doesn't work: https://sci-
               | hub.do/10.1038/s41592-018-0138-4
               | 
               | I am not directly in the field, you are right - but I
               | think you are also being overconfident if you think that
               | these approaches are exactly the same as the HMM/markov
               | chain approaches that came before.
        
               | klmr wrote:
               | Thanks for the paper, I'll check it out; this isn't my
               | speciality so I'm definitely learning something. Just one
               | minor clarification:
               | 
               | > _Maybe read the paper ... before assuming that it doesn
               | 't work_
               | 
               | I don't assume that. In fact, I _know_ that using ML
               | _works_ on many problems in genetics. What I'm less
               | convinced by is that we can expect a _breakthrough_ due
               | to ML any time soon, partly because conventional
               | techniques (including ML) already have a handle on some
               | current problems in genetics, and because there isn't
               | really a specific (or flashy) hard, algorithmic problem
               | like there is in structural biology. Rather, there's lots
               | of stuff where I expect to see steady incremental
               | improvement. In fact, in Wikipedia's list of unsolved
               | biological problems [1] there isn't a single one that I'd
               | characterise specifically as a question from the field of
               | genetics (as a geneticist, that's slightly depressing).
               | 
               | But my question was even more innocent than that: I'm not
               | even _that_ sceptical, I'm just not aware of anything and
               | genuinely wanted an answer. And the paper you've posted
               | might provide just that, so I'll go do my research now.
               | 
               | [1] https://en.wikipedia.org/wiki/List_of_unsolved_proble
               | ms_in_b...
        
               | jcims wrote:
               | _Not_ being in the field, I would term what I see in this
               | story as a 'bottom up' approach to understanding genetics
               | /molecular biology. More akin to applied sciences than
               | medicine or health. This, for example, seems to be very
               | important but it still leaves us with a jello jigsaw
               | puzzle with 200 million pieces and probably far removed
               | from immediate utility in health outcomes.
               | 
               | Then there's the more clinically oriented approaches of
               | looking at effects, trying to find associated
               | genes/mutations whatever mechanisms exist in between to
               | cause a desirable or undesirable outcome. I'd call that
               | 'top down'.
               | 
               | I'm sure the lines get blurred more every day, but is
               | there a meaningful distinction into these and/or more
               | categories that are working the problem from both ends?
               | If so, are there associated terms of art for them?
        
         | the8472 wrote:
         | > but many were skeptical as to the longevity of the
         | performance DeepMind was able to achieve
         | 
         | For a non-biologist, on what is this skepticism based?
         | 
         | Just purely based on following ML news it looks like the trend
         | for ML solutions has been that they've overtaken expert-systems
         | once they've gained a solid foothold in a field. Maybe this is
         | some perception bias. Are there any cases where ML performed
         | decently but then hit a ceiling while expert systems kept
         | improving?
        
           | garmaine wrote:
           | > Are there any cases where ML performed decently but then
           | hit a ceiling while expert systems kept improving?
           | 
           | Yes, this describes the entire history of AI, including
           | several boom-bust cycles; the '80s in particular come to
           | mind. Yes, the practitioners think there are no technical
           | barriers stopping them from eating the world, but that's
           | exactly what people thought about other so-called
           | revolutionary advances.
           | 
           | Although to be pedantic, "expert systems" were the
           | technology behind the AI boom of the '80s. At the time
           | people were saying
           | expert systems can't be as good as existing algorithms
           | (including what we would now call "machine learning"
           | techniques), then suddenly the expert systems were better and
           | there was rampant speculation real AI was around the corner.
           | Then they plateaued.
           | 
           | We _appear_ to be at the tail end of the maximum hype part of
           | the boom-bust cycle. Thinking that the rapid gains being made
           | by the current deep learning approaches will soon hit a wall
           | is a reasonable outside-view prediction to make: nearly every
           | time we've had a similarly transformative technology in the
           | AI space and elsewhere, hitting the wall is exactly what
           | happened. The onus would be on practitioners to show that
           | this time really is different.
        
             | sdenton4 wrote:
             | I think the disconnect this time around is in
             | productionization. We're getting breakthroughs in a wide
             | range of problems, and translating those gains in the
             | problem space into 'real' stable, practical solutions we
             | can use in the world is the remaining gap, and often takes
             | years of additional effort. It's still really expensive to
             | launch this stuff, and often requires domain expertise that
             | the ML research team doesn't have.
             | 
             | We're seeing a lot of this pattern: ML Researcher shows up,
             | says 'hey gimme your hardest problem in a nice parseable
             | format' and then knocks a solution out of the park. The ML
             | researcher then goes to the next field of study, leaving
             | (say) the doctors or whatever to try to bridge the gap
             | between the nice competition data and actual medical
             | records. It also turns out that there's a host of closely
             | related but different problems that ALSO need to be solved
             | for the competition problem to really be useful.
             | 
             | I don't think this means that the ML has failed, though;
             | it's probably similar to the situation for accounting
             | software circa 1980: everything was on paper, so using a
             | computerized system was more trouble than it was worth. But
             | today the situation in accounting has completely flipped.
             | Apply N+1 years of consistent effort improving data
             | ecosystems, and the ML might be a lot easier to use on
             | generic real world problems.
        
               | garmaine wrote:
               | Next time you fly through a busy airport, think about the
               | system which assigns planes to gates in realtime based on
               | a large number of variable factors in order to maximize
               | utilization and minimize waits. This is an expert system
               | designed in the '80s, which allowed a huge increase in
               | the number of planes handled per day at the busiest
               | airports.
               | 
               | Or when you drive your car, think about the lights-out
               | factory that built it, using robotics technologies
               | developed in the '80s and '90s, and the freeways which
               | largely operate without choke points again due to expert
               | system models used by city planners.
               | 
               | These advances were just as revolutionary before, and
               | people were just as excited about AI technologies eating
               | the world. Still, it largely didn't happen. To continue
               | the example of robotics, we don't have an equivalent of
               | the Jetsons' home robot Rosie. We can make a robot
               | assemble a $50,000 car, but we can't get it to fold the
               | laundry.
               | 
               | These rapid successes you see aren't literally "any
               | problem from any field" -- it's specific problems chosen
               | specifically for their likely ease in solving using
               | current methods. DeepMind didn't decide to take on
               | protein folding at random; they looked around and picked
               | a problem that they thought they could solve. Don't
               | expect them to have as much success on every problem they
               | put their minds to.
               | 
               | No, machine learning is not trivially solving the hardest
               | problems in every field. Not even close. In biomedicine,
               | for example, protein folding is probably one of the
               | easiest challenges. It's a hard problem, yes, but it's
               | self-contained: given an amino acid sequence, predict the
               | structure. Unlike, say, predicting the metabolism of a
               | drug applied to a living system, which requires
               | understanding an extremely dense network of existing
               | metabolic pathways and their interdependencies on local
               | cell function. There's no magic ML pixie dust that can
               | make that hard problem go away.
        
           | ramraj07 wrote:
           | It's because for many researchers, ML just means taking a
           | standard Keras or scikit-learn model, shoving their data
           | in, getting some table or number out, and seeing if that
           | solves their problem. If that's your only ML experience
           | then I suppose this is how sceptical you'd be of ML in
           | general.
           | 
           | It looks like DeepMind invented a completely new method for
           | this round that's not just an extension of their previous
           | work, showing how much you can gain if you don't shoebox
           | yourself into just trying to improve existing methods.
           | 
           | That all the scientists were highly skeptical about the
           | scope of ML (and these are computer scientists to begin
           | with, mind you) just shows how little they knew of what a
           | computer or a program can possibly do, which is a bit
           | appalling to be honest.
        
             | timr wrote:
             | _" It looks like DeepMind invented a completely new method
             | for this round that's not just an extension of their
             | previous work, showing how much you can gain if you don't
             | shoebox yourself into just trying to improve existing
             | methods. That all the scientists were highly skeptical
             | about the scope of ML (and these are computer scientists to
             | begin with mind you) just shows how little they knew of
             | what they did know of what a computer or a program can
             | possibly do, which is a bit appalling to be honest."_
             | 
             | My PhD (now over a decade ago...yikes) was in applying much
             | simpler ML methods to these kinds of problems (I started in
             | protein folding, finished in protein / nucleic acid
             | recognition, but my real interest was always protein
             | design). Even back then, it was clear that ML methods had a
             | lot more potential for structural biology (pun unintended)
             | than for which they were being given credit. But it was
             | hard to get interest from a research community that cared
             | little about non-physical solutions. No matter how well you
             | did, people would dismiss it as a "black box solution", and
             | that pretty much limited your impact.
             | 
             | Some of this is understandable: even today, it's not at all
             | clear that a custom-built ML model for protein folding is
             | of much use to anyone -- particularly a model that doesn't
             | consider all of the atoms in the protein. The traditional
             | justification for research in this area is that if you
             | could develop a _sufficiently general_ model of protein
             | physics, it would also allow you to do all sorts of _other_
             | stuff that is much more interesting: rational protein
             | design, drug binding, etc.
             | 
             | The AlphaFold model is not really useful for any of this,
             | so in a way, it's kind of like the Wienermobile of science:
             | cool and impressive when done well ("hey! a giant hot dog
             | on wheels!"), but not really useful outside of the niche
             | for which it was designed. So it's hard to blame
             | researchers in this field -- who generally have to chase
             | funding and justify their existence -- for not pursuing the
             | application of deep learning to this one, narrow problem
             | domain.
             | 
             | Obviously there will now be a wave of follow-on research,
             | and it's impossible to know what methods this will spawn.
             | Maybe this will revolutionize computational structural
             | biology, maybe not. But I think it's a _little_ unfair to
             | demonize the entire field. Protein folding just
             | traditionally hasn't been a very useful or interesting
             | area, and like all "pure science", it leads to a lot of
             | small-stakes, tribal thinking amongst the few players who
             | can afford to compete. This is right out of Thomas Kuhn: a
             | newcomer sweeps into a field, glances at the work of the
             | past, then bashes it over the head, dismissively.
        
               | ramraj07 wrote:
               | We don't know too much about the exact model they made
               | but it looks sufficiently generalizable to be able to
               | give a candidate protein structure for any given
               | sequence. It doesn't automatically cure cancer and inject
               | the drug, but that by itself is an amazing tool that,
               | if available to everyone, will revolutionize biology
               | experimentation.
               | 
               | I will definitely blame the protein structure field in
               | multiple levels though. It was always frustrating to me
               | to open up Nature or Science and see it filled with
               | papers about structure - like they are innovating so much
               | that half of the top science magazines every week have
               | papers in that field, yet it's not going anywhere? Or is
               | it simply just a bunch of professors tooting their own
               | horns about ostensible progress in a field that's archaic
               | by years if not decades? The overall protein structure
               | field internalised some dogmas in self defeating ways to
               | everyone's detriment, and finally events like this (and
               | cryo-EM, maybe) will jolt them out or make them fully
               | irrelevant so we can move on. It's only doubly ironic
               | that this came from a team in a company with minimal
               | academic ties showing how toxic that entire system is. I
               | only feel pity for the graduate students still trying to
               | crystallize proteins in this day and age.
        
               | dekhn wrote:
               | The reason for your second paragraph is pretty
               | straightforward. There has been an immense amount of
               | support for proteins as "the workhorses of the cell"
               | for a hundred-plus years. I call it the "protein bias".
               | We've seen it many times -- for example, when it was
               | first hypothesized and then proved that DNA, rather
               | than protein, is the heredity-encoding material, and in
               | the denial that RNA could act as an enzyme or that the
               | functional core of the ribosome could be a ribozyme.
               | 
               | I think what basically happened is a very influential
               | group of scientists mainly in Cambridge around the '50s
               | and '60s convinced everybody that reductionist molecular
               | biology would be able to crystallize proteins and
               | "understand precisely how they function" by inspecting
               | the structures carefully enough.
               | 
               | What I learned, after reading all those breathless
               | papers about individual structures and how they explain
               | the function of a protein, is that in the vast majority
               | of cases they don't have enough data to speculate
               | responsibly about the behavior of proteins and how they
               | implement their functions. There are definitely cases
               | where an elucidated structure immediately led to an
               | improved understanding of function:
               | 
               | "It has not escaped our notice (12) that the specific
               | pairing we have postulated immediately suggests a
               | possible copying mechanism for the genetic material."
               | 
               | but most papers about how cytochrome "works" aren't
               | really illuminating at all.
        
               | timr wrote:
               | _" We don't know too much about the exact model they made
               | but it looks sufficiently generalizable to be able to
               | give a candidate protein structure for any given
               | sequence. It doesn't automatically cure cancer and inject
               | the drug but that by itself is an amazing tool that if
               | available to everyone will revolutionize biology
               | experimentation."_
               | 
               | They say on their own press-release page that side-chains
               | are a future research problem, and nothing about their
               | method description makes me believe they've innovated on
               | all-atom modeling. This software seems able to generate
               | good models of protein backbones; these kinds of models
               | certainly have uses, but a backbone model is not enough
               | for drug design.
               | 
               | This is certainly an advancement, but you're exaggerating
               | the scope of the accomplishment.
               | 
               |  _" I only feel pity for the graduate students still
               | trying to crystallize proteins in this day and age. "_
               | 
               | Nothing about this changes the fact that protein
               | crystallography is a gold-standard method for determining
               | a protein structure. CryoEM has made it possible to
               | obtain good structures for classes of proteins we could
               | never achieve before, and it's certainly _interesting_ if
               | we can run a computer for a few days to get a 1A ab
               | initio model for a protein sequence, but we could
               | _already_ do that for a large class of proteins with
               | homology modeling. These predicted structures still
               | aren't generally that useful for drug design, where tiny
               | details of molecular interactions matter.
               | 
               | To put it in perspective: protein energetics are measured
               | on the scale of _tens of kcal / mol_. Protein-drug
               | interactions are measured in _fractions of a kcal_. A
               | single hydrogen bond or cation-pi interaction or
               | displaced water molecule can make the difference between
               | a drug candidate and an abandoned lead. Tiny changes in
               | backbone position make the difference between a good
               | structure and a bad one. AlphaFold isn't doing that kind
               | of modeling.
        
               | ramraj07 wrote:
               | Of course, they haven't solved everything, but you seem
               | to
               | be doing exactly what I accuse that entire field (and
               | academia in general) of doing - which is to insist a
               | problem is intractable or hard and undermine someone
               | potentially challenging that. When they released the 2018
               | results the field did embrace it (for sure I'd consider
               | the groups organizing CASP as at least forward thinking)
               | but was still skeptical on how much more progress it can
               | make; now they blow everyone's minds again by a
               | monumental leap and again people want to come say of
               | course this is the last big jump!
               | 
               | I understand the self preservation instincts that kick in
               | when there's a suggestion that the entire field has been
               | in a dark age for a while, but I hope you can see that
               | there might be something fundamentally wrong with how
               | research is done in academia and that is to blame for why
               | this didn't happen sooner, and why it's so hard for many
               | to embrace it.
               | 
               | Regarding your comments on the inapplicability of this
               | current solution for docking, I'm sure that's the next
               | project they're taking up, and let's see where that goes.
               | 
               | This is exactly the same type of progression that
               | happened with Go, where when their software beat a
               | professional player everyone's like "yeah but I bet he
               | wasn't that good". A few years later and Lee Sedol just
               | decided to retire. I am interested to see what happens to
               | that entire academic field in a similar vein, though my
               | interests are more in knowing how science can advance
               | from more people thinking this way.
        
           | whimsicalism wrote:
           | ML is a super overloaded term.
           | 
           | There are definitely cases where machine learned statistical
           | solutions do not perform as well as the systems tuned by the
           | experts, but if you can define the task well and get the data
           | for a deep solution, usually those will overtake.
        
             | penagwin wrote:
             | This. I believe technically just linear regression could be
             | considered "machine learning".
        
               | misnome wrote:
               | I've seen people at bio conferences actively calling
               | linear regression machine learning.
        
               | diab0lic wrote:
               | This is likely because linear regression meets most
               | widely accepted definitions of machine learning. [0][1]
               | It is simple and very effective when learning in linear
               | space.
               | 
               | [0] https://en.wikipedia.org/wiki/Machine_learning
               | 
               | [1] https://www.cs.cmu.edu/~tom/mlbook.html
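                | A minimal sketch of that definition in action (the
                | numbers and function here are my own illustration, not
                | from either reference): plain linear regression fitted
                | by gradient descent, i.e. a program whose predictions
                | improve with experience (data) at the task of
                | minimizing squared error.

```python
# Illustrative sketch: linear regression trained by gradient descent.
# It "learns" the line y = 2x + 1 purely from example (x, y) pairs,
# which satisfies the usual textbook definition of machine learning:
# a program whose performance at a task improves with experience.
def fit_linear(xs, ys, lr=0.01, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Recovers roughly w = 2, b = 1 from the data alone.
w, b = fit_linear([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```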
        
       | amelius wrote:
       | Curious, what are the sizes of the training and validation/test
       | datasets (number of structures)?
        
         | papaf wrote:
         | _Curious, what are the sizes of the training and validation
         | /test datasets (number of structures)?_
         | 
         | The proteins are shown on the CASP website [1]. Both the number
         | of residues and number of proteins are bigger than I expected.
         | 
         | [1] https://predictioncenter.org/casp14/targetlist.cgi
        
       | piannucci wrote:
       | This is so cool. I hope they will also tackle the problem of
       | predicting RNA structures and catalytic activity.
        
       | xphos wrote:
        | This is awesome and a huge advancement, but one thing that
        | worries me with an AI solution is that it doesn't really draw
        | us any closer to the why. Why do proteins fold the way they
        | do? We can predict the resulting structure, which is
        | extremely significant, but we have no clue why. While we get
        | the insight of being able to predict some structures, we
        | don't get the insight of why things happen the way they do.
        | In some cases like this it _might_ not matter, but in other
        | cases that insight might actually be way more significant
        | than answering the problem to begin with. Of course we can
        | revisit the problem armed with the additional predictions
        | that AI gives us, but this can be hazardous: what if some
        | specific sequence folds in a way that we, and thus the AI,
        | have never seen, and it goes unnoticed? I'm not enough of a
        | biologist to say whether this is possible, but I know this
        | kind of edge case can come up, and who knows what rabbit
        | holes we will go down because we only have the AI-implied
        | insight.
       | 
        | Disclaimer: I think the contributions are super useful for
        | science, but they do come with worries, as does every path of
        | discovery.
        
         | MauranKilom wrote:
         | > Why do proteins fold the way they do?
         | 
         | I think the _why_ is pretty clearly understood
         | (https://en.wikipedia.org/wiki/Protein_folding), in the same
         | way that we understand the _mechanism_ behind the three body
          | problem in physics or quantum computing. But that does not
          | necessarily imply that there is an efficient way for us to
          | simulate/predict the results of having nature play out
          | those mechanisms.
        
         | Odenwaelder wrote:
         | I have no idea what you are talking about.
        
           | xphos wrote:
            | AI solves the problem, but it doesn't give a whole lot
            | of insight into the formulas describing what's going on.
            | We as humans eventually found that e = mc^2, but an AI
            | would give us e from m while black-boxing us away from
            | seeing that c, the speed of light, was involved (unless
            | we implied that beforehand). There might be interesting,
            | useful relationships that AI unintentionally masks, which
            | could be groundbreaking if we could only understand the
            | process more holistically. I think a different commenter
            | alluded to this: in this case we think we understand
            | protein folding well, we just struggle to synthesize it
            | in a compact mathematical way, even though with AI we can
            | simulate the process well for known examples.
            | 
            | The issue with AI is that we don't know if our current
            | example set includes every case. What if there is a
            | strange sequence of amino acids that causes something
            | "weird" to happen that we haven't seen? AI cannot predict
            | something novel that neither it nor we have seen, which
            | is the issue. The process (if it exists) of how one could
            | solve this problem might also be exportable to other
            | fields if it were formalized with math rather than
            | estimated with AI.
        
         | deeviant wrote:
         | > While we get the insight of being able to predict some
         | structures we don't get the insight of why things are happening
         | the way they are.
         | 
          | This isn't something specific to AI, but to science
          | itself. We know the value of c, but not _why_ the value is
          | c. Sure, we can point to something like the Lorentz
          | transformation, but we can't, and probably won't ever be
          | able to, explain why it has these particular constants; we
          | just know that we can measure them and this is what they
          | are.
         | 
         | Science isn't in the business of answering why. A successful
         | scientific theory does two things, A) Makes useful predictions,
         | B) Is correct in its predictions. It'd be wrong to call a NN a
         | scientific theory, but it certainly does make predictions and
         | as these results show, it is correct in its predictions.
         | 
          | Sometime soon, humanity is going to have to come to terms
          | with the fact that we will soon enter (or perhaps already
          | have entered) an age where mankind is not the only source
          | of new knowledge. AI-derived knowledge will only increase
          | as the future unfolds, and the analysis of such knowledge
          | will likely become its own branch of study.
        
           | naringas wrote:
           | > Science isn't in the business of answering why.
           | 
           | I agree as long as science is a business. But why is science
           | a business?
           | 
           | If science is not meant to answer why, does this mean we
           | cannot know why?
           | 
            | Should we just give up on having story-like (narrative)
            | explanations for why and how things work? It seems like
            | we are headed to a world where the computer just tells us
            | what to do and where to go; a world in which we are free
            | from having to think about why we are being told to do
            | whatever it is we're doing. Click (or tap) buttons, get
            | tokens to buy food and pay rent.
        
         | crystaln wrote:
         | These are predictions. Presumably the proteins will be
         | inspected and the model refined and updated before we start
         | using DNA without first checking the output.
        
         | hailwren wrote:
         | There are two threads here. The first is that it would not be
         | surprising to learn that describing the way that proteins fold
          | is a very hard thing for humans to understand. See, e.g.,
          | the four color theorem (4CT) [1] and its computational
          | proofs.
         | 
         | The second is that explainability in ML is much more tractable
         | than it was 10 years ago. This is not to say that it's solved,
         | but having solved the predictive problem -- I would expect
         | model simplifications and SME research to proceed more quickly
         | towards understanding the how now. I did some work w/ an
         | Astrophysics postdoc using beta-VAEs [2] to classify
         | astronomical observations, and simplifying models in order to
         | achieve human-explainability proved to not cost as much
         | predictive power as you might expect. It might be that the same
         | holds true here.
         | 
         | 1- https://mathworld.wolfram.com/Four-ColorTheorem.html
         | 
         | 2 - https://paperswithcode.com/method/beta-vae
        
         | [deleted]
        
       | andy_ppp wrote:
       | So, sorry to be a philistine but what specific discoveries will
       | this lead to... will it make it easier to produce antivirals or
       | even molecular machines?
        
         | tim333 wrote:
         | "DeepMind said it had started work with a handful of scientific
         | groups and would focus initially on malaria, sleeping sickness
         | and leishmaniasis, a parasitic disease"
         | https://www.theguardian.com/technology/2020/nov/30/deepmind-...
        
       | dluan wrote:
       | I worked in the lab that helped develop folding@home, as well as
       | the game where the crowd was the chaotically trained machine that
       | folded and unfolded one amino acid at a time. This feels like a
       | pretty significant new chapter in the humanity movie.
       | 
        | A few times, I get immense pangs of jealousy for people a
        | generation or half-generation younger than me. And I'm only
        | 30! This is one of those times.
        
         | maxlamb wrote:
         | Is the team really that young? 20 year olds?
        
           | sgillen wrote:
           | I think he means that those who are in their teens / even
           | younger now will get to experience immensely cool tech in
           | their lifetime.
        
       | notkaiho wrote:
       | Who'd have thought that the kid who programmed Theme Park would
       | go on to do this kind of work.
        
       | uoaei wrote:
       | No, they didn't. They approximated a solution to protein folding.
       | 
       | The two are different concepts -- this isn't the typical HN
       | pedantry.
       | 
       | "Solving" the problem would entail developing an interpretable
       | algorithm for taking a string of amino acids and determining the
       | 3D structure once folded.
       | 
       | Approximating a solution would entail simulating that algorithm,
       | which is what their neural network is doing. It is of course
       | usually accurate, but you would expect this with any suitable
       | universal function approximator.
       | 
       | Props to DeepMind and congrats to CASP but is it not obvious that
       | this is more hype-rhetoric for public consumption?
        
         | stupidcar wrote:
         | > this isn't the typical HN pedantry
         | 
         | This is the absolute definition of it.
        
         | visarga wrote:
         | > "Solving" the problem would entail developing an
         | interpretable algorithm
         | 
          | It looks like you'd like a grokkable solution, but the
          | problem might just be too complex for the human brain to
          | grasp. "Solved" means they solved the protein puzzles on
          | the official benchmark.
         | 
         | > but you would expect this with any suitable universal
         | function approximator
         | 
          | Yeah, it's just that easy. Function approximator, engage!
          | It took a team of DeepMind researchers, two years, and God
          | knows how much compute. The universal function
          | approximation theorem also doesn't say how to find that
          | network.
        
           | uoaei wrote:
           | > the problem might be just too complex to grasp for the
           | human brain
           | 
           | Maybe all at once, but having a self-consistent, unified
           | theory is very important.
           | 
           | We can't understand the full brain, but we can understand the
           | essential components and how they work together. This still
           | constitutes "interpretable".
           | 
           | > The universal function approximation theorem doesn't also
           | say how to find that network.
           | 
           | Correct, and irrelevant.
        
             | [deleted]
        
         | deeviant wrote:
         | > this isn't the typical HN pedantry.
         | 
         | Then launches into what can only be recognized as an exercise
         | in pedantry.
        
           | uoaei wrote:
           | "Pedantry" implies that the distinction is not meaningful.
           | 
           | This is true if you're only paying attention to how this
           | system can be utilized to answer questions posed to it.
           | 
            | This achievement by itself, however, does not push the
            | _science_ of protein folding much further. Those advances
            | will come when people poke, prod, and break the model to
            | develop a unified theory of protein folding.
        
             | deeviant wrote:
             | The "science" of protein folding has a primary goal: to
             | predict the structure of a protein given it's constituent
             | parts.
             | 
              | This is what AlphaFold does, and it's been verified to
              | produce results with an apparent accuracy at or above
              | that of X-ray protein crystallography. The advances
              | will come, after these results are validated and
              | accepted by the scientific community as a whole, simply
              | when groups start using this technique to immediately
              | access the structure of proteins that in the past would
              | have been prohibitively expensive and time consuming,
              | or downright impossible, to access, and then use that
              | knowledge to do their work.
             | 
             | You seem to think the first thought a researcher will have
             | after this becomes widely available is, "Oh hey, I can now
             | accurately predict the shape of an arbitrary protein which
             | unlocks untold potential scientific progress on numerous
             | scientific fronts, but the thing I want to spend my time on
             | is trying to replicate the results of the network myself,
             | so I can do it manually thousands of times slower...",
             | which is patently inane.
        
               | uoaei wrote:
               | This model will be an amazing tool toward a science of
               | protein folding, but we have not "solved" protein folding
               | as long as that remains elusive.
        
         | kkoncevicius wrote:
         | This is exactly right. It's like saying you solved chess
         | because for each configuration of pieces on the board you can
         | use machine learning to predict whether that position can be
         | achieved with valid chess moves. With 90% accuracy.
        
         | astroalex wrote:
         | The distinction you're making between "solved" and "closely
         | approximated" makes logical sense to me. However, if I'm
         | interpreting the AlphaFold results correctly, this distinction
         | isn't practically significant, right?
         | 
         | If you can approximate an algorithm with error that is "below
         | the threshold that is considered acceptable in experimental
         | measurements" (to quote another HN comment), then you have
         | something as good as the algorithm itself for all intents and
         | purposes.
         | 
          | Therefore the use of the word "solve" doesn't qualify as
          | hype-rhetoric, and the distinction you're making does seem
          | somewhat pedantic (even if technically true).
         | 
         | (I'm speaking as someone with only the tiniest amount of
         | stats/ML experience, so I could be totally wrong!)
        
           | colonelcanoe wrote:
           | It might be the case that the relevant, practical threshold
           | now tightens. For example, perhaps it is easier to
           | experimentally verify a protein shape predicted by an
           | algorithm than it is to experimentally determine the protein
           | shape?
        
             | intpx wrote:
                | Exactly. Even an incomplete map with somewhat limited
                | resolution makes navigation a hell of a lot easier
                | than flying blind. This effectively is a data
                | reduction solution -- if you have a fuzzy shape of
                | the thing you are trying to model, and you learn the
                | mechanics better with each thing you model, your
                | ability to quickly and accurately reach a goal
                | improves.
        
               | uoaei wrote:
               | That's true and also is not what is being challenged by
               | my comment.
        
             | robocat wrote:
             | From: https://www.sciencemag.org/news/2020/11/game-has-
             | changed-ai-...
             | 
             | "The organizers even worried DeepMind may have been
             | cheating somehow. So Lupas set a special challenge: a
             | membrane protein from a species of archaea, an ancient
             | group of microbes. For 10 years, his research team tried
             | every trick in the book to get an x-ray crystal structure
             | of the protein. 'We couldn't solve it.'"
             | 
             | "But AlphaFold had no trouble. It returned a detailed image
             | of a three-part protein with two long helical arms in the
             | middle. The model enabled Lupas and his colleagues to make
             | sense of their x-ray data; within half an hour, they had
             | fit their experimental results to AlphaFold's predicted
             | structure. 'It is almost perfect,' Lupas says."
        
           | uoaei wrote:
           | Practically there is little difference if all you're
           | interested in is determining folds from protein sequences.
           | 
           | The difference comes in developing a theory for generalizing
           | the study of protein folding as a scientific pursuit.
        
       | asbund wrote:
        | Exciting times
        
       | m-p-3 wrote:
       | I'm wondering what this means for folding@home.
        
         | great_tankard wrote:
         | Folding@home mostly tries to calculate protein dynamics using
         | already solved structures, so their work is still critical.
        
         | breck wrote:
         | I'm pretty sure this means they can pack it up? Or point their
         | infra to a different problem?
        
           | touisteur wrote:
           | Or just do billions of inferences per second. Next step?
        
         | matsemann wrote:
         | I came here wondering the same. Is this based on work done by
         | folding@home for instance? (As in, it used their precomputed
         | stuff as training data)
        
       | 0xBA5ED wrote:
       | Does this give the ability to engineer cures for currently
       | incurable diseases?
        
         | jfarlow wrote:
          | In short - for certain ones, yes. This should be one step
          | (that was a bottleneck) in helping a company with a fixed
          | budget do an order of magnitude more 'experiments' with the
          | same amount of resources. Lab resources are expensive and
          | fixed, so if you can pre-compute what you need, you can get
          | right to the more powerful results.
         | 
          | We design proteins for immunotherapies - this kind of
          | thing would help us more rapidly design our proteins (and
          | more efficiently use our wet-lab resources to speed up
          | existing projects). For others, some drugs are hard to
          | build without knowing how they will interact - this could
          | both provide new 'targets' to go after and help prevent
          | projects that would otherwise accidentally target an
          | important protein.
        
       | aparsons wrote:
       | Fascinating work. I wonder if this approach works to model
        | interactions (no reason it shouldn't). The interactions of
        | proteins with other proteins, as well as with molecules like
        | lipids, water and electrolytes, form the basis for cellular
       | processes. If that can be inferred correctly, you are looking at
       | the building blocks of a "human simulator".
        
       | optimalsolver wrote:
       | They should make this into a Kaggle competition.
       | 
       | Maybe they might get an even better model.
        
       | _greim_ wrote:
       | At Sun back in the day our workstations tended to have fairly
       | promiscuous login settings, so one of my coworkers took the
       | liberty to launch folding@home on every machine in the org.
       | Listing running processes one day, I saw this thing pegging my
       | CPU; asked around and others had it too. A virus!?! Then he
       | fessed up. Kinda miffed at first but ultimately really cool, so
       | we let the thing keep running. That was my introduction to the
       | whole protein folding problem, and it's really great to see this
       | milestone!
        
         | dekhn wrote:
         | I ran Folding@Home at Google on hundreds of thousands of fast
         | Xeon cores for over a year. I concluded at the end that
         | unbiased MD simulations are not an effective use of computer
         | time.
        
           | t_serpico wrote:
           | Out of curiosity, why not?
        
             | dekhn wrote:
                | For the dollars invested, the amount of basic and
                | applied results that came out wasn't worth it.
        
       | FrojoS wrote:
       | Bell Labs invented the transistor. Now this. Monopoly money at
       | its best!
        
       ___________________________________________________________________
       (page generated 2020-11-30 23:00 UTC)