[HN Gopher] Alphafold
       ___________________________________________________________________
        
       Alphafold
        
       Author : matejmecka
       Score  : 311 points
       Date   : 2021-07-15 18:22 UTC (4 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | swalsh wrote:
       | _edit_ I was wrong. Please ignore.
        
         | ali_m wrote:
         | > This is a completely new model that was entered in CASP14 and
         | published in Nature.
        
         | f38zf5vdt wrote:
         | From the repo:
         | 
         | > This package provides an implementation of the inference
         | pipeline of AlphaFold v2.0
        
       | culopatin wrote:
       | Does anyone know if this can be made to work for RNA folding?
        
       | qeternity wrote:
       | Ok, so biochemists: which bit of the secret sauce are they
       | leaving out?
        
       | duckerude wrote:
       | > The AlphaFold parameters are made available for non-commercial
       | use only, under the terms of the Creative Commons Attribution-
       | NonCommercial 4.0 International (CC BY-NC 4.0) license. You can
       | find details at: https://creativecommons.org/licenses/by-
       | nc/4.0/legalcode
       | 
       | Does CC BY-NC actually do this? As far as I can tell it only
       | really talks about sharing/reproducing, not using.
       | 
       | Or is the only thing prohibiting other commercial use the words
       | "available for non-commercial use only"?
        
         | mikewarot wrote:
         | If you took their parameters, then trained it for a while on a
         | different set of data, it would vary from the original. I
         | wonder how much compute would be required to make the offset
         | far enough to hold up to scrutiny, and in court.
         | 
         | Alternatively, you could manually change the network model, add
         | a few hidden layers, etc... modifying the parameters in step,
         | and result in a new model and new parameters. Some training to
         | vary the parameters, and it's now a new work.
        
         | sillysaurusx wrote:
         | Artbreeder has some interesting prior art here: Nvidia forbade
         | commercial use of StyleGAN, but Artbreeder disregarded it and
         | happily sold all the breeding you wanted. No one seemed to
         | care.
         | 
         | I suspect that the clause is there to prevent a startup
         | launching on the basis of "see this trained model? Yeah, that's
         | literally our business model" though, which is a mildly amusing
         | thought, wot wot.
         | 
         | So basically, a few tens of thousands, sure. A few million, big
         | G might have a problem.
         | 
         | Still, the smart move would be to launch the business anyway,
         | and gamble that you can work out a licensing deal.
        
       | jfengel wrote:
       | So... is it possible to clone this and turn it into a
       | Folding@Home client? How does it do?
        
         | kmckiern wrote:
         | Where there isn't an available crystal structure, Alphafold can
         | be used to create initial structures for simulation via
         | folding@home, replacing older homology modeling techniques.
         | 
         | Source: former folding@home researcher.
        
         | dekhn wrote:
         | No, it wouldn't make sense to do that. Folding@Home is for ab
         | initio work where you don't have any prior info about the
         | structure; this is for homology modelling. F@H probes the
         | dynamics of protein folding, while this just makes a static
         | prediction.
        
       | thesausageking wrote:
       | The PDF is linked in the article:
       | 
       | https://www.nature.com/articles/s41586-021-03819-2_reference...
        
       | mensetmanusman wrote:
       | Distribution of this 2 TB file seems like a good use case for
       | torrents...
        
       | dekhn wrote:
       | Fantastic, they released the dataset and code to train the model.
       | Science will be able to proceed. edit: not the code to train the
       | model, just the code to run inference.
       | 
       | The underlying sequence datasets include PDB structures and
       | sequences, and how those map to large collections of sequences
       | with no known structure (no surprise). Each of those datasets
       | represents decades of work by thousands of scientists, along
       | with the programmers and admins who kept the databases running
       | for decades with very little grant money (funding long-term
       | databases is something NIH hated to do until recently).
        
         | FredFS456 wrote:
         | There's a preview paper as well:
         | https://www.nature.com/articles/s41586-021-03819-2
        
           | dekhn wrote:
           | Yes, I skimmed the paper already and it wasn't too
           | surprising. There are details that will take some time to
           | parse out to understand how important they are.
           | 
           | Personally, I've found over decades that academic papers like
           | that are far less useful to me than a github project and
           | downloadable data that I can inspect, run and modify on my
           | own. Other folks I know could read that paper and write the
           | code in a day; I always wish I could do that.
        
         | cing wrote:
         | The process is described in Supplementary, but where do you see
         | the code to train the model? The repository is the inference
         | pipeline.
        
           | dekhn wrote:
           | I misread. The data dump is required for inference.
        
         | gopalv wrote:
         | > The total download size is around 428 GB and the total size
         | when unzipped is 2.2 TB. Please make sure you have a large
         | enough hard drive space, bandwidth and time to download.
         | 
         | > This was tested on Google Cloud with a machine using the
         | nvidia-gpu-cloud-image with 12 vCPUs, 85 GB of RAM, a 100 GB
         | boot disk, the databases on an additional 3 TB disk, and an
         | A100 GPU.
         | 
         | This is amazingly detailed for a researcher who wants to follow
         | in their tracks, and it's also Apache licensed, which removes
         | one roadblock for a commercial enterprise, like an actual drug
         | manufacturer that wants to burn some money trying this out.
         | 
         | edit: said the last part too fast, the code has a "the
         | AlphaFold parameters are made available for non-commercial use
         | only under the terms of the CC BY-NC 4.0 license"
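A back-of-the-envelope sketch of the numbers in the quoted README. The 428 GB download and ~2.2 TB unpacked sizes come from the quote; the link speeds are illustrative assumptions, and decimal units (1 GB = 10^9 bytes) are assumed throughout.

```python
# Rough transfer/disk arithmetic for the AlphaFold genetic databases.
# Sizes come from the README quoted above; link speeds are assumptions.
DOWNLOAD_GB = 428    # compressed download size
UNZIPPED_GB = 2200   # ~2.2 TB after unpacking (decimal units assumed)


def download_hours(size_gb: float, link_mbps: float) -> float:
    """Hours to move size_gb gigabytes over a sustained link of link_mbps megabits/s."""
    bits = size_gb * 8e9                   # decimal GB -> bits
    return bits / (link_mbps * 1e6) / 3600


expansion = UNZIPPED_GB / DOWNLOAD_GB
# If the archives are kept until extraction finishes, both copies coexist on disk.
peak_disk_gb = DOWNLOAD_GB + UNZIPPED_GB

if __name__ == "__main__":
    print(f"expansion ratio: {expansion:.1f}x, peak disk: {peak_disk_gb} GB")
    for mbps in (100, 1000):
        print(f"{mbps:>5} Mbit/s -> {download_hours(DOWNLOAD_GB, mbps):.1f} h")
```

At a sustained 100 Mbit/s the download alone takes roughly nine and a half hours, and the ~2.6 TB peak (archives plus unpacked data) still fits within the 3 TB database disk mentioned in the README.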
        
           | dekhn wrote:
           | Yes, all science should be communicated in the form of an
           | academic paper with a supporting git repo, a quickly
           | downloadable dataset, and a fast path to reproducing the work.
           | That would be a huge change from the establishment.
           | 
           | It's quite unclear what value this will have to pharma;
           | personally I doubt this has any direct applications (and I'm
           | one of the few people in the world that can say that with
           | deep authority).
        
             | aantix wrote:
             | Who benefits from this work?
        
               | dekhn wrote:
               | Primarily the community that previously depended on
               | homology models.
        
             | gnufx wrote:
             | Surely not all science. It's just as well Dirac wasn't
             | required to communicate that way the equation that
             | fundamentally underlies the phenomenon under discussion,
             | and you couldn't put the unique facility my thesis work
             | pioneered into git! I do highly approve of publishing
             | software and data where possible, of course, and have since
             | before the term "Free Software" was coined; it's much easier
             | now.
        
               | dekhn wrote:
               | If you're just publishing equations, you should have an
               | associated notebook which executes the equations.
               | 
               | I don't know what you mean by not being able to put your
               | thesis work into git. Is it a physical thing? Too big for
               | git?
        
             | astro-codes wrote:
             | Why wouldn't this have much value to pharma? Is it because
             | its application is actually really limited in scope?
        
               | dekhn wrote:
               | There are research groups this would be useful for, but
               | structures are not on the critical path to drug discovery
               | or approval.
        
        
       | dekhn wrote:
       | I missed an important detail: """an academic team has developed
       | its own protein-prediction tool inspired by AlphaFold 2, which is
       | already gaining popularity with scientists. That system, called
       | RoseTTAFold, performs nearly as well as AlphaFold 2, and is
       | described in a paper in Science also published on 15 July"""
       | 
       | One of the things I say about CASP has to be updated. It used to
       | be "2 years after Baker wins CASP, the other advanced teams have
       | duplicated his methods and accuracy, and 4 years after,
       | everything Baker did is now open source and trivially
       | reproducible"
       | 
       | Now it's Baker catching up to DeepMind, and it took about a year.
       | 
       | https://doi.org/10.1126/science.abj8754
        
         | radus wrote:
         | Very cool! Great to see this competition between academia and
         | industry yielding improvements on all fronts.
        
       | Cas9 wrote:
       | Honest question: since AlphaFold doesn't really _solve_ the
       | protein folding problem (it's NP-complete after all), but only
       | _approximates_ solutions very well, what are the real impacts of
       | this? Isn't a good approximation of a protein enough to cause
       | unexpected problems? How do we know that an approximate structure
       | will perform the same as the correct solution?
        
         | radus wrote:
         | Yes, it is still useful. Even structures obtained through
         | traditional means (e.g. X-ray crystallography) are
         | approximations to an extent since there are limits to the
         | resolution that you can obtain and oftentimes regions of
         | proteins are "disordered". Additionally, these structures are
         | only snapshots of a protein in a particular state, which may
         | not completely reflect the dynamics of the protein in its
         | native environment.
        
         | nmca wrote:
         | NP completeness tells you about the hardest cases, not the most
         | useful cases.
        
         | thxg wrote:
         | > (it's NP-complete after all)
         | 
         | Protein folding is a physical/biological phenomenon. AFAIK we
         | don't currently have a proper exact mathematical formulation of
         | the problem that would let one determine its complexity.
         | 
         | You may be referring to this paper [1]. It only claims that one
         | particular optimization problem, believed to give a solution to
         | protein folding problems, is NP-hard. So, even if a suitable
         | exact formulation exists, it is not yet proven that protein
         | folding is hard, although it for sure seems plausible.
         | 
         | By the way, it is perfectly possible today to solve some very
         | large-scale NP-hard problems (think millions of variables and
         | constraints) in reasonable amounts of time (think minutes or
         | hours). Examples are knapsack problems, SAT problems [2], the
         | Traveling Salesman Problem [3] or more generally Mixed Integer
         | Programming [4].
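To make the point concrete, here is the textbook dynamic-programming solution to the 0/1 knapsack problem: NP-hard in general, yet solved exactly in O(n * W) time when the capacity W is a modest integer. This is only a sketch; the solvers behind the linked benchmarks rely on far more sophisticated techniques (branch-and-bound, cutting planes, clause learning and the like).

```python
from typing import List, Tuple


def knapsack(items: List[Tuple[int, int]], capacity: int) -> Tuple[int, List[int]]:
    """Exact 0/1 knapsack by dynamic programming.

    items: (weight, value) pairs with non-negative integer weights.
    Returns (best_value, sorted indices of the chosen items).
    Runtime is O(n * capacity): polynomial in the numeric *value* of the
    capacity (pseudo-polynomial), which is why NP-hardness does not rule
    out fast exact answers on many practical instances.
    """
    n = len(items)
    # best[i][c] = best value achievable with the first i items and capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (w, v) in enumerate(items, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c:                   # option 2: take it, if it fits
                best[i][c] = max(best[i][c], best[i - 1][c - w] + v)
    # Walk the table backwards to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= items[i - 1][0]
    return best[n][capacity], sorted(chosen)


if __name__ == "__main__":
    items = [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)]
    print(knapsack(items, 15))
```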
         | 
         | [1] "Complexity of protein folding", 1993, by Aviezri S.
         | Fraenkel
         | 
         | [2] http://www.satcompetition.org
         | 
         | [3] http://www.math.uwaterloo.ca/tsp/
         | 
         | [4] http://plato.asu.edu/bench.html
        
         | hobofan wrote:
         | I would expect that once AlphaFold has helped you identify a
         | potential protein (e.g. as a drug) out of a bigger set of
         | potential proteins, there will still be a manual step of
         | traditional cryoEM, NMR, etc. to get an accurate high-
         | resolution structure.
        
         | t_serpico wrote:
         | To me, the interesting thing is not the specific results but
         | rather that you can accurately predict crystal structures from
         | sequence alone. This raises the question: what other physical
         | or biological properties can we predict?
        
         | saithound wrote:
         | AlphaFold is not about solving any kind of NP-complete problem.
         | 
         | Proteins consist of chains of amino acids which spontaneously
         | fold up to form a structure. Understanding how the amino acid
         | chain determines the protein structure is highly challenging,
         | and this is called the "protein folding problem".
         | 
         | People use mathematical models to predict how proteins fold in
         | nature. Many such mathematical models are stated in terms such
         | as "proteins fold into a configuration that minimizes a certain
         | energy function". Even the simplest such models [1] give rise
         | to NP-hard decision problems, which are also known (somewhat
         | confusingly) as "protein folding problems". To make this a bit
         | less confusing, I will call the mathematical decision problems
         | PFPs.
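A minimal sketch of one of the simplest such models, the 2D HP lattice protein: each residue is hydrophobic (H) or polar (P), a fold is a self-avoiding walk on a square grid, and every pair of H residues that touch on the grid without being chain neighbours contributes -1 to the energy. Exhaustive search finds the exact minimum, but it has to enumerate every self-avoiding walk, and their number grows exponentially with chain length. The function names are illustrative, not from any library.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def hp_energy(seq: str, coords: List[Coord]) -> int:
    """-1 for each pair of H residues on adjacent lattice sites that are not bonded."""
    index: Dict[Coord, int] = {p: i for i, p in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            # j > i + 1 counts each contact once and skips the bonded neighbour
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def fold_exhaustively(seq: str) -> Tuple[int, List[Coord], int]:
    """Exact minimum-energy 2D HP fold by enumerating self-avoiding walks.

    Returns (best_energy, best_coords, walks_checked). Assumes len(seq) >= 2.
    """
    path: List[Coord] = [(0, 0), (1, 0)]  # fix the first bond to remove rotations
    occupied = set(path)
    best_energy = 0
    best_coords: List[Coord] = []
    walks = 0

    def extend() -> None:
        nonlocal best_energy, best_coords, walks
        if len(path) == len(seq):
            walks += 1
            e = hp_energy(seq, path)
            if not best_coords or e < best_energy:
                best_energy, best_coords = e, list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # self-avoidance
                path.append(nxt)
                occupied.add(nxt)
                extend()
                path.pop()
                occupied.remove(nxt)

    extend()
    return best_energy, best_coords, walks


if __name__ == "__main__":
    for s in ("HPPH", "HPHPPHHPHH"):
        e, _, n = fold_exhaustively(s)
        print(s, "min energy", e, "after checking", n, "walks")
```

Adding a few residues multiplies the number of walks checked, which is the practical face of the NP-hardness result for this kind of model.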
         | 
         | Like all mathematical models, our protein folding models don't
         | correspond exactly to reality. Even if you are somehow able to
         | determine the exact mathematical solution to a mathematical
         | PFP, that _still_ doesn't guarantee that the real protein that
         | you were trying to model behaves like the mathematical solution
         | would indicate. E.g. the protein may fold in such a way that it
         | gets stuck in a local optimum of the energy function you were
         | using.
         | 
         | How do we detect this? We make inferences about how the protein
         | should behave, given the mathematical solution to the Protein
         | Folding Problem, and then we perform experiments, and find out
         | (empirically) that the protein behaves in a manner that is
         | inconsistent with the inferences drawn from the mathematical
         | model. Scientists _do_ do this. And they would have to do it
         | even if they had a fast, exact way to solve NP-complete
         | problems, because the NP-complete problems are still just part
         | of a mathematical model, and need not correspond to reality in
         | any way.
         | 
         | The success of AlphaFold is not measured by how well it solves
         | (or approximates) mathematical PFPs. The success of AlphaFold
         | is measured by making successful predictions about how certain
         | proteins will fold. And this is exactly how it was tested [2]:
         | they threw it at a bunch of problems for which scientists have
         | empirically determined how certain amino acid chains fold, but
         | didn't release the results. And then they compared the
         | solutions predicted by AlphaFold, and found that most of the
         | predictions were consistent with what they knew to be the
         | case.*
         | 
         | [1] https://en.wikipedia.org/wiki/Lattice_protein
         | 
         | [2] https://predictioncenter.org/casp14/index.cgi
         | 
         | * That's an understatement. The solutions were really very
         | good, much better than those produced by any other submission
         | to CASP14.
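CASP compares predictions against the withheld experimental structures using scores such as GDT_TS: the percentage of C-alpha atoms lying within 1, 2, 4 and 8 Å of their experimental positions, averaged over the four cutoffs. The sketch below is simplified and assumes the two structures are already superimposed, whereas the real GDT_TS searches over superpositions to maximise each fraction; the function name is illustrative.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def gdt_ts_fixed(pred: List[Vec3], ref: List[Vec3]) -> float:
    """Simplified GDT_TS (0-100) for already-superimposed C-alpha coordinates in angstroms."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    dists = [math.dist(p, r) for p, r in zip(pred, ref)]
    # Fraction of residues within each cutoff, then average the four fractions.
    fractions = [sum(d <= cutoff for d in dists) / len(dists) for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return 100.0 * sum(fractions) / len(fractions)


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    pred = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
    print(gdt_ts_fixed(pred, ref))
```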
        
         | whimsicalism wrote:
         | You want to find a protein that has X structure (since
         | structure determines function to a degree).
         | 
         | If AlphaFold is substantially more accurate at predicting
         | protein structures, it can mean that drug discovery is faster,
         | assays are faster, etc.
         | 
         | The "unexpected problems" would be caught in the assay stage.
        
           | radus wrote:
           | Kind of disagree with this: solving protein structures is
           | not the rate limiting step in drug discovery or in
           | biochemical assays -- not by a long shot. See this excellent
           | comment by @dekhn on a related submission:
           | https://news.ycombinator.com/item?id=27849046
        
         | dekhn wrote:
         | The protein folding problem is not NP complete. The "formal"
         | protein folding problem, as posed (find the set of dihedral
         | angles whose resulting structure has the lowest energy) might
         | be, but that bears only a distant resemblance to how people
         | "solve" the problem today. At the very least, the statement is
         | incorrect because many proteins don't actually fold to their
         | energy minimum; they get stuck in kinetic traps, and the formal
         | PF definition never accommodated that idea.
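A toy illustration of the kinetic-trap point, using a made-up one-dimensional energy landscape rather than any real folding model: a purely downhill search settles into whichever well it starts in, even though the formal problem only asks for the global minimum. The landscape values and function name are hypothetical.

```python
from typing import List


def greedy_descent(energy: List[float], start: int) -> int:
    """Steepest descent on a 1D toy landscape: move to the lower neighbour until stuck."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(energy)]
        if not neighbours:
            return i
        best = min(neighbours, key=lambda j: energy[j])
        if energy[best] >= energy[i]:
            return i  # local minimum: no downhill move left
        i = best


# Hypothetical landscape: a shallow well at index 2, a barrier at index 4,
# and the global minimum at index 7.
landscape = [5.0, 3.0, 1.0, 4.0, 6.0, 2.0, -1.0, -3.0, 0.0]

if __name__ == "__main__":
    global_min = min(range(len(landscape)), key=landscape.__getitem__)
    print("start 0 ends at", greedy_descent(landscape, 0), "; global minimum at", global_min)
```

Starting on the left, the search is trapped behind the barrier; only a start to the right of it reaches the global minimum.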
        
         | bawolff wrote:
         | I don't know much about protein folding, but for most things in
         | life, exact solutions to NP-complete problems usually aren't
         | needed for non-contrived problems. In many cases, approximations
         | are good enough.
         | 
         | Besides, this is real life - if predictions and real life
         | match, that's great. If they don't, well you know you went
         | wrong somewhere.
        
         | wpasc wrote:
         | A very non-expert opinion: if an approach approximates it
         | pretty well and can be improved upon, then it could end up
         | being quite useful. Given that biology exists on a real,
         | tangible scale, perfection in the fold prediction isn't
         | necessary; what's needed is an approximation that is
         | sufficiently good to be functionally useful.
         | 
         | ^ That sounds like word-salad BS but I think there's some truth
         | to it. I know protein folding has been postulated to be useful
         | in terms of understanding basic biology, understanding disease
         | pathology, and drug prediction. While a wide range of
         | approximations are functionally useless, perhaps the Alphafold
         | approach or some improved version of it surpasses the
         | functionally useful threshold.
         | 
         | At least I hope so
        
         | ashtonbaker wrote:
         | Not really an answer to your question, but is the problem
         | really NP-complete, or just combinatorially difficult? For
         | example how is this condition of NP-completeness satisfied?
         | 
         | > it is a problem for which the correctness of each solution
         | can be verified quickly [0]
         | 
         | [0] https://en.wikipedia.org/wiki/NP-completeness
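For the lattice-model version of the problem, the quick-verification condition is easy to see: in the decision form "does this HP sequence have a fold with energy <= -k?", a proposed fold acts as a certificate that can be checked in linear time. The sketch below assumes the 2D HP model; the function name and the threshold `k` are illustrative, and none of this says anything about how hard real proteins are to fold.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def verify_hp_certificate(seq: str, coords: List[Coord], k: int) -> bool:
    """Check in O(n) whether coords is a valid 2D HP fold of seq with energy <= -k."""
    if len(coords) != len(seq):
        return False
    # Consecutive residues must occupy adjacent lattice sites.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            return False
    index: Dict[Coord, int] = {p: i for i, p in enumerate(coords)}
    if len(index) != len(coords):  # two residues on one site: not self-avoiding
        return False
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                contacts += 1  # non-bonded H-H contact, counted once
    return -contacts <= -k
```

Finding a fold that passes the check is the hard part; checking one is a single pass over the chain.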
        
           | Cas9 wrote:
           | According to this answer[0] it seems it's actually NP-Hard,
           | my bad. Haven't seen the proof though, and I'm not an expert.
           | 
           | [0] https://cs.stackexchange.com/questions/128493/is-protein-
           | fol...
        
         | mrfusion wrote:
         | Is it really NP-complete? If so, we could map other NP-complete
         | problems onto it and let biology solve them for us.
        
       | nextos wrote:
       | Alphafold 2 is very very cool, but we need a little dose of
       | reality. It's still a bit away from really solving protein
       | folding as it was marketed.
       | 
       | For example, multi-protein complexes are not well predicted yet,
       | and these are really important in many biological processes and
       | drug design:
       | 
       | https://occamstypewriter.org/scurry/2020/12/02/no-deepmind-h...
       | 
       | A disturbing thing is that the architecture is much less novel
       | than I originally thought it would be, so this shows perhaps one
       | of the major difficulties was having the resources to try
       | different things on a massive set of multiple alignments. This is
       | something an industrial lab like DeepMind excels at. Whereas
       | universities tend to suck at anything that requires a directed
       | effort of more than a handful of people.
        
         | dekhn wrote:
         | Many of these resources are available; it's mostly that
         | academic scientists don't have the time, money, or expertise to
         | manage large datasets. However, the community has maintained
         | high-quality MSA databases for decades, and that's exactly the
         | work that DM drafted off of.
        
           | gnufx wrote:
           | > academic scientists don't have the time, money, or
           | expertise to manage large datasets
           | 
           | I may be cynical about general expertise, as a support
           | person, but large datasets have long been stock in trade of
           | areas I'm more or less familiar with, whether "large" is TBs
           | or PBs like CERN experiments. (When I were a lad, it was what
           | you could push past the tape interface in a few days -- data
           | big in cubic feet...)
        
             | dekhn wrote:
             | Tape is worthless except for archival purposes (and it's
             | not particularly good at that). It should not be the
             | constraint on the dataset (i.e., any important dataset
             | should already be in live serving with replication).
             | 
             | Very few players wrangle petabytes effectively. Many
             | players _have_ petabytes, but they're just piles of
             | disorganized data that couldn't be used for training ML.
             | Moving petabytes is still a huge pain, and few folks have
             | proficiency in giving ML algorithms high-performance access
             | to the data.
        
         | zamalek wrote:
         | I'm genuinely curious: could the output of Alphafold be fed
         | into a classical folding algorithm (as a starting point), or is
         | the output of Alphafold too far down the wrong path, in these
         | cases?
        
         | sbierwagen wrote:
         | >A disturbing thing is that the architecture is much less novel
         | than I originally thought it would be, so this shows perhaps
         | one of the major difficulties was having the resources to try
         | different things on a massive set of multiple alignments.
         | 
         | A similar concern has sparked some worries about "AI overhang"
         | https://www.lesswrong.com/posts/75dnjiD8kv2khe9eQ/measuring-...
         | 
         | Most of the compute in ML research seems to be going into
         | architecture search. Once the architecture is found, training
         | and net finetuning/transfer learning is comparatively cheap,
         | and then inference is cheaper still. This implies we could see
         | 10-100x gains in AI algorithms using today's hardware, or the
         | sudden, surprising appearance of AI dominance in an unexpected
         | field. (Object grasping in unstructured environments? Art
         | synthesis?) A task could go from totally impossible to trivial
         | in a year. In retrospect, the EfficientNet scaling graph should
         | have alarmed more people than it did:
         | https://learnopencv.com/wp-content/uploads/2019/06/Efficient...
         | 
         | Waymo has been puttering along for years, not announcing much
         | of interest. This may have caused some complacency about self-
         | driving cars, which is a mistake. Algorithms only get better,
         | while humans stay the same. Once Waymo can replace some human
         | drivers some of the time, things will start changing very
         | quickly.
        
         | timr wrote:
         | > A disturbing thing is that the architecture is much less
         | novel than I originally thought it would be, so this shows
         | perhaps one of the major difficulties was having the resources
         | to try different things on a massive set of multiple
         | alignments. This is something an industrial lab like DeepMind
         | excels at. Whereas universities tend to suck at anything that
         | requires a directed effort of more than a handful of people.
         | 
         | Yeah, the HN commentary on Alphafold has a high heat-to-light
         | ratio. I'm eager to read the paper _because_ the previous
         | description of the method sounded remarkably similar to methods
         | that have been around for ages, plus a few twists.
         | 
         | The devil is going to be in the details on this one.
        
           | TaupeRanger wrote:
           | That's the case with basically everything DeepMind does. They
           | have a very good PR department which hypes up everything they
           | do while conveniently ignoring that basically nothing of any
           | practical consequence has come of their endeavors. But I do
           | think it's important that these companies exist now so we can
           | see what _not_ to try going forward.
        
             | timr wrote:
             | Well, the CASP14 results do speak for themselves. Protein
             | structure prediction is not necessarily of great meaning to
             | drug discovery or biology, but they pretty much blew
             | everyone else out of the water in a fair contest. For that
             | reason, they deserve praise.
             | 
             | It's a little like making a robot that is very, very good
             | at something pointless (say, using a yo-yo). Who knows
             | where it might lead, but if they make the best damned yo-yo
             | bot in the world, they deserve whatever praise they get
             | from the yo-yo community.
        
           | MrsPeaches wrote:
           | > high heat-to-light ratio
           | 
           | Sorry for the ignorance but what does this mean?
        
             | AlexCoventry wrote:
             | Emotion-to-understanding ratio
        
             | butMuhCulture wrote:
             | It's trying to say light is more valuable than heat, or
             | some such folksy thing. I cook steak in the dark so I don't
             | find it to be a very insightful metaphor.
        
             | Azrael3000 wrote:
             | Incandescent light bulbs are generally very inefficient in
             | producing light, compared to LED for example. They produce
             | a lot of heat and not much light for which they are made.
             | 
             | So in this context I suppose that gp implies that these
             | threads don't provide much meaningful discussion but rather
             | lots of hand waving.
        
               | HPsquared wrote:
               | Light is also often used in metaphors relating to
               | knowledge, wisdom etc.
        
               | dekhn wrote:
               | "Fiat Lux" not "Fiat Calor"
        
             | timr wrote:
             | It's an idiom implying that there's a lot of chatter and
             | bold claims, but very little of it is factual or
             | informative.
        
           | dm319 wrote:
           | The key difference seems to be using the multiple alignments
           | and assumptions about evolutionary conservation? Useful for
           | conserved genes, but less useful for de novo proteins (like
           | COVID and cancer) I guess?
        
             | timr wrote:
             | Dunno yet. MSAs were always a key input to Rosetta
             | (previous best method). How they were used was very
             | different.
             | 
             | Fundamentally, everything in this space (= non-physical
             | methods) is about inferring structure from things that are
             | closely related. And you can't solve the problem at all for
             | non-trivial proteins using physics, so here we are.
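For readers wondering how an MSA carries structural information at all: a classic pre-deep-learning signal is covariation, where two alignment columns that mutate together often correspond to residues in contact. A minimal sketch using mutual information between columns follows. The toy alignment is made up, real coevolution methods add corrections such as sequence weighting, and this is not AlphaFold's architecture, which learns from the MSA with a neural network.

```python
import math
from collections import Counter
from typing import List


def column_mi(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between columns i and j of an aligned MSA."""
    n = len(msa)
    p_i = Counter(seq[i] for seq in msa)
    p_j = Counter(seq[j] for seq in msa)
    p_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Hypothetical toy alignment: columns 1 and 3 co-vary (K<->E, R<->D),
# column 0 is fully conserved, column 2 varies independently of column 1.
msa = ["AKGE", "ARGD", "AKLE", "ARLD"]

if __name__ == "__main__":
    for i, j in ((1, 3), (0, 3), (1, 2)):
        print(f"MI(col {i}, col {j}) = {column_mi(msa, i, j):.3f}")
```

A conserved column carries no covariation signal at all, which is one reason alignments need enough diverse homologous sequences to be informative.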
        
       | pjfin123 wrote:
       | I'm assuming you can't run this on any consumer computer?
        
         | pjfin123 wrote:
         | Never mind
         | 
         | > The simplest way to run AlphaFold is using the provided
         | Docker script. This was tested on Google Cloud with a machine
         | using the nvidia-gpu-cloud-image with 12 vCPUs, 85 GB of RAM, a
         | 100 GB boot disk, the databases on an additional 3 TB disk, and
         | an A100 GPU.
        
           | sambroner wrote:
           | That's... way closer to consumer than I expected
        
             | qeternity wrote:
             | For inference...
             | 
             | Still accessible, but expensive to run at scale. And
             | training even worse.
        
             | lifthrasiir wrote:
             | Except for (DGX) A100.
        
         | erhk wrote:
         | 2.2TB data
        
           | dekhn wrote:
           | which is basically nothing. They could put it in a cloud
           | bucket and you could copy it to another bucket in minutes.
        
           | lasagnaphil wrote:
           | Nah, 4TB disk drives are not that expensive.
        
           | crazysim wrote:
           | Amazing. That's not a lot of Libraries of Congress at all.
        
       | fossuser wrote:
       | Does anyone on HN work in bio or drug discovery?
       | 
       | Could you give an overview of how people can leverage this (or
       | how you might?).
       | 
       | From reading around about it, it sounds like there's often a need
       | to find a certain type of molecule to activate/inhibit another
       | based on shape and the ability to programmatically solve for this
       | makes the searching way easier.
       | 
       | Is this too oversimplified/wrong? How will this be used in
       | practice?
       | 
       | [Edit]: Thanks for the answers!
        
         | timr wrote:
         | > Could you give an overview of how people can leverage this
         | (or how you might?).
         | 
         | Short answer: nobody knows. Traditionally, protein folding is a
         | solution in search of a problem, but that's largely because the
         | predictions were...unusably bad. This was always more of a
         | super-difficult validation problem for the force fields and
         | simulation methods, which could then be used for other problems
         | of greater value (such as rational protein design, or
         | simulation of the motion of proteins with known structures).
         | 
         | These predictions are better, but still pretty far from the
         | level of precision that you'd want for any kind of rational
         | drug design, where the exact locations of protein side-chains
         | (for example) matter a lot. You'll note that AlphaFold returns
         | structures that are "relaxed" using one of the oldest
         | simulation systems for proteins: AMBER. So it's not exactly a
         | clean-room solution to the problem, and you can't assume that
         | the details (which matter to drug design) are going to be any
         | better than for the older methods.
         | 
         | But that said, if you have a method that can _reliably_ give
         | you a blurry view of the overall shape of a protein, even that
         | could be useful for things like target discovery or inference
         | of biological networks. But this is still a lot closer to pure
         | research than "revolutionizing drug discovery", as is
         | frequently batted around on reddit, HN and the press.
        
           | dekhn wrote:
           | Also, I would say that really they just made improvements to
           | protein structure prediction, not _protein folding_, which is
           | the dynamic process by which proteins reach their equilibrium
           | fold.
        
             | timr wrote:
             | Most definitely.
        
         | dumb1224 wrote:
         | I work in cancer research with a drug discovery focus in a lab
         | with some structural biologists. My understanding is that once
         | we have identified protein targets suitable for therapeutics,
         | understanding their structure to identify secondary binding
         | sites could be crucial for drug discovery. Drugs can then be
         | designed to modulate their biological functions.
        
         | COGlory wrote:
         | You can't do intelligent drug design if you don't know what the
         | target protein looks like. We've gotten great at solving
         | protein structures with techniques like crystallography and
         | cryo-EM. Unfortunately, many interesting drug targets reside in
         | the membrane of a cell, which means you can't easily work with
         | them in a lab because they aren't soluble in anything but a
         | plasma membrane. For instance, this is an issue with 5-HT2A, a
         | G protein-coupled receptor that is implicated in many
         | serotonin-related pathways.
         | 
         | Being able to predict what it would look like would be a huge
         | deal because then you can go about intelligently designing
         | drugs for it.
        
           | ponsko wrote:
           | You should check out Salipro (https://www.salipro.com/) for
           | membrane protein reconstitution.
        
         | dekhn wrote:
         | I've worked in bio and drug discovery for some 25 years. That
         | includes building classifiers using gradient descent in the 90s
         | (when algorithms, computers and data were all much worse). I
         | ported DOCK to Linux in ~96 or 97. Since then I built an
         | academic and then industrial career with some emphasis on using
         | computing to solve problems in drug discovery, but I don't play
         | that role any more.
         | 
         | It doesn't look like the models produced by this would
         | immediately make the challenging problem of finding, approving,
         | and marketing successful pharmaceuticals any easier (i.e., it
         | doesn't eliminate any real bottleneck).
         | 
         | There was a long-term dream of structure-based drug discovery
         | based on docking, but IMO, it has never really proved itself
         | (most of the examples of success are cherry-picked from a much
         | larger pile of massive failures).
        
           | miltondts wrote:
           | > ... but I don't play that role any more.
           | 
           | I was thinking of going into that field. Can you expand a bit
           | on why you left?
        
             | dekhn wrote:
             | Because programming computers is far more lucrative, and
             | I'm better at it. However, if I had an unlimited budget, I
             | would return to biology.
             | 
             | I spent 15 years trying to be a professor and failed
             | miserably. I was bad at it and didn't like what professors
             | have to do.
             | 
             | I then moved to industry to be a random engineer and
             | thrived doing things entirely unrelated to drug discovery.
             | Eventually, I convinced my company to invest heavily in
             | life sciences. This was successful and I was on track to be
             | a powerful player (a "research engineer", just like the DM
             | folks who are building these things) in this space, when
             | the project got very popular and I was elbowed aside by
             | others who were more aggressive. So I went back to being a
             | programmer again; it's much less stressful, pays better,
             | and, realistically, much of my time is just telling
             | scientists what I would do if I was in their place anyway.
             | 
             | "Don't swim with the sharks if you don't like being bitten"
        
               | gnufx wrote:
               | > much of my time is just telling scientists what I would
               | do if I was in their place anyway.
               | 
               | That sounds familiar. I guess they mostly don't listen,
               | whatever your record -- especially if it was in a
               | different field they could learn from -- but I hope it's
               | not always like that.
        
               | yudlejoza wrote:
               | Most comp-biologists who work directly with programmers
               | are some of the biggest jerks, and the least qualified
               | tech folks.
               | 
               | They hide all of that under "I'm a scientist, you're
               | not".
        
               | fossuser wrote:
               | Maybe a culture clash? Academia is all about status and
               | prestige; more often than not, scientific outcomes seem to
               | be a means to get the former (why journals don't publish
               | negative results, why studies fail to replicate, why
               | stuff isn't open access, why people worry about getting
               | scooped, etc.)
               | 
               | Tech (at its best) hates credentialism (sometimes I think
               | to a point of over-correction).
               | 
               | That said, 80% of the devs in the bay area seem to have
               | gone to Stanford or MIT, so...
        
         | nick238 wrote:
         | I haven't worked on the drug side of things, but here's my bio
         | perspective: It's kind of out of vogue, but consider the "lock
         | and key" model of proteins and small molecules (drugs). For
         | drug design, what you want is a key that fits just one lock (to
         | pull whatever lever) and not others (to avoid side effects).
         | It's relatively easy to find a molecule that fits a protein,
         | because that protein is what you might spend years researching
         | and probing, but it's tricky to check whether it does anything
         | against the ~100,000 others in humans. If you could do an _in
         | silico_ computational survey that says, "oh, maybe it'll target
         | this accidentally," you could spot-check those _in vitro_,
         | and/or stick some other atoms onto your small molecule to make
         | it not fit that off-target.
         | 
         | The holy grail, IMO, though, is being able to design _de novo_
         | protein sequences (to make "biologics", aka engineered protein
         | drugs) that a) target (bind/block/enhance) or do (chemical
         | reactions) what you want and only that, b) are easily
         | synthesizable by bacteria/yeast (cheap to make), and c) are
         | stable (easy to transport/store).
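A toy sketch of the _in silico_ off-target survey described above. It assumes, hypothetically, that each protein's binding pocket has already been reduced to a fixed-length numeric descriptor (for example, derived from predicted structures), and flags proteins whose pocket resembles the intended target's by cosine similarity. Real off-target prediction relies on docking or much richer pocket comparisons; the descriptors, threshold, and protein names here are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_off_targets(target_pocket, proteome_pockets, threshold=0.9):
    """Return (protein, similarity) pairs whose pocket descriptor looks like
    the intended target's, most similar first. These are the candidates
    you would then spot-check in vitro."""
    hits = []
    for name, pocket in proteome_pockets.items():
        score = cosine_similarity(target_pocket, pocket)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical 3-number pocket descriptors, purely for illustration.
target = [1.0, 0.0, 2.0]
panel = {
    "P1": [1.0, 0.1, 2.0],
    "P2": [0.0, 1.0, 0.0],
    "P3": [2.0, 0.0, 4.0],
}
print([name for name, _ in flag_off_targets(target, panel)])  # prints ['P3', 'P1']
```

The threshold trades recall for lab effort: lowering it surfaces more possible off-targets to check, at the cost of more false alarms.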
        
           | slownews45 wrote:
           | The first seems reasonable. I've not heard of anything on the
           | latter credibly coming even close, though it is an obvious
           | holy grail.
        
         | zosima wrote:
         | It can be an aid in drug development, and can perhaps assist a
         | bit in tuning small molecule drugs for more stable binding.
         | 
         | Though I think the major impacts will be two-fold:
         | 
         | (1) The field of structural biology is going to see a change,
         | with much more data available. Some structures of
         | difficult-to-crystallize proteins will be solved, which may lead
         | to much greater biological understanding. We may enter a time
         | where, once you have a primary sequence, you also have a likely
         | 3D structure, which will probably change the daily work of
         | quite a few biologists.
         | 
         | (2) Industrial protein design. A tool such as this can
         | potentially have great utility in optimizing proteins as
         | chemical catalysts for various processes in different
         | industries. This includes expanding the conditions under which
         | a protein is active and also making their conformation more
         | stable and so the protein more long-lived in solution.
        
           | dekhn wrote:
           | For those who are unaware, industrial protein design is a
           | multibillion-dollar industry. For example, decades ago
           | Genentech and Corning formed a company that developed
           | proteases (proteins that cut other proteins) that worked at
           | much higher temperatures than the ones in nature. These were
           | then sold to P&G and other major laundry companies (laundry
           | detergent contains idle enzymes that are activated by the heat
           | of the laundry water and then go clean up). "Protein gets out
           | protein" was the marketing jingle.
           | 
           | That was a few billion dollars right there and almost all the
           | work was done by hand by lab scientists.
        
       | Cas9 wrote:
       | Honest question: since AlphaFold doesn't really _solve_ the
       | protein folding problem (it's NP-complete after all), but only
       | _approximates_ solutions very well, what are the real impacts of
       | this? Couldn't even a good approximation of a protein's structure
       | cause unexpected problems? How do we know that an approximate
       | structure will perform the same as the correct solution?
        
         | Ultimatt wrote:
         | There is a lot of bias in the discussion here toward a
         | chemistry and pharma perspective. If you set that aside,
         | AlphaFold solves, in a very meaningful way, a problem that has
         | been blocking a lot of scientific investigation.
         | 
         | For comparative and evolutionary analysis, structure is far more
         | conserved than sequence, especially in things like viruses or
         | anything with a high rate of reproduction, like bacteria. Just
         | knowing the general fold or overall structure is enough to do
         | structural alignment and tell whether two genes are related on
         | that basis, even if their genomic sequences are completely
         | dissimilar. Large groups of researchers rely on sequence
         | homology built from sequences of known structure.
         | 
         | But AlphaFold works well in new sequence space, with far more
         | accuracy than is needed. If we had an AlphaFold prediction for
         | every known sequence, the evolutionary relationships between all
         | genes and even all species would suddenly be far clearer. This
         | on its own unlocks a new foundation to reason about function and
         | molecular interaction with a holistic systems view, without gaps
         | in what we can know with some reasonable assurance.
         | 
         | For an analogy, think of the difference between having books in
         | different languages describing objects. You know what some of
         | the book in English might say, but you don't even know if the
         | book in Spanish is talking about the same things. AlphaFold is
         | like an AI that transforms all the books into picture books, so
         | now we can use image similarity or have one person look at all
         | the pictures.
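A common basic measure for comparing two structures is the RMSD after optimal rigid superposition, computed with the Kabsch algorithm. The NumPy sketch below shows only that core step and assumes the residue-to-residue correspondence is already known. Real structural aligners (TM-align, for example) must also search for that correspondence, and often report a TM-score, which is less sensitive to outlier regions than raw RMSD.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid
    superposition (Kabsch algorithm). Row i of P is assumed to correspond
    to row i of Q, e.g. the C-alpha atom of the same aligned residue."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both structures on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 cross-covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Apply R to every point of P (row vectors, hence the transpose).
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy "structure": five points, plus a copy rotated 90 degrees about z
# and shifted. After superposition the RMSD should be essentially zero.
P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(P, Q))
```

Because translation and rotation are factored out, a low RMSD means the two folds really have the same shape, regardless of where each structure happens to sit in its coordinate frame.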
        
       | devindotcom wrote:
       | Also announced today was RoseTTAFold from UW's Baker Lab, which
       | claims nearly the same accuracy at much higher efficiencies.
       | There's a public server and paper in Science.
       | 
       | More info here and here:
       | 
       | https://www.bakerlab.org/index.php/2021/07/15/accurate-prote...
       | 
       | https://techcrunch.com/2021/07/15/researchers-match-deepmind...
        
       | stupidcar wrote:
       | The model parameters are only available for non-commercial use.
       | That's a shame, as I presume there might be a lot of medical
       | startups that would benefit from having this kind of
       | protein-folding tech available.
        
         | mikewarot wrote:
         | Unless I'm mistaken, you could train the model yourself,
         | starting with a random set of values. In time, your error rates
         | would be low enough to have a new set of parameters which you
         | could use however you like.
        
       | COGlory wrote:
       | I am a structural biologist. This is one of the handful of topics
       | that overlaps with my field here. I'm very excited to play with
       | this, although it might eventually put me out of a job.
        
         | AnimalMuppet wrote:
         | Here's where I think we need to be going: You go to a doctor's
         | office, sick. 1) They take a blood sample. 2) They find the
         | pathogenic bacteria and DNA-sequence them. 3) If it's a known
         | strain, they know what antibiotics to use on it. 4) If not,
         | they solve protein folding on the genes. 5) From that, they see
         | which existing antibiotics would kill it. 6) If none will, then
         | given the proteins, they have to derive a new antibiotic.
         | 
         | 1) is easy. 2) might not be - there can be a lot of things in a
         | blood sample, and finding only the interesting (bad) things
         | might not be simple. The sequencing part is pretty much solved.
         | 3) would take a bit of work, but I think it's possible now. 4)
         | we're getting there. 5) might have a fair amount in common with
         | 3), but it probably takes some additional work. 6) is...
         | probably non-trivial.
         | 
         | That's just one research agenda. There are others. You may have
         | to move to related work, but I doubt you're going to be out of
         | a job in this lifetime.
        
         | rllearneratwork wrote:
         | why would it put you out of job? Wouldn't it just become one of
         | the tools you use?
        
           | dekhn wrote:
           | It would become both a tool they use (to produce initial
           | structures to fit into density maps) and a tool that uses their
           | output (because AlphaFold requires known protein structures
           | that are homologous to the one you're predicting).
        
           | nikhilsimha wrote:
           | The implicit assumption you are making is that demand
           | increases in lockstep with productivity gains: 100x faster
           | drug discovery, 100x more drugs _need_ to be discovered =>
           | same number of people employed.
           | 
           | These correlations do hold for technical fields, but
           | logically there should be a point beyond which productivity
           | gains outpace demand growth, or demand could even stop
           | growing. One should either retool to solve a newer problem
           | before this point is reached, or hope that the point is not
           | reached in the span of their career.
           | 
           | Take oil rig builders, for example: manufacturing has been
           | increasingly automated, but the demand for oil rig building
           | has grown consistently. But they should probably look into
           | solving other problems given that demand is shifting.
        
             | mensetmanusman wrote:
             | However, the complexity of the structures is essentially
             | unbounded on the timescale of the universe.
        
             | sbierwagen wrote:
             | >but logically there should be a point beyond which
             | productivity gains outpace
             | 
             | The limiting factor on drug approval is clinical trials.
             | Once every living person is enrolled in a clinical trial,
             | we will have hit the maximum rate at which humanity can
             | produce new drugs.
             | 
             | That might be more than 10x the current rate, but probably
             | less than 1000x.
        
               | dekhn wrote:
               | In principle, you could put people into multiple trials
               | and gain some additional throughput. Google implemented
               | putting users into multiple different experiments (paper
               | by Tang et al.), and that made a huge difference.
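A rough sketch of the layered idea behind the overlapping-experiments setup mentioned above: experiments are grouped into layers, and each layer hashes users independently. A user is in at most one experiment per layer, but can be in one experiment in every layer at the same time. This is a simplified illustration under those assumptions, not Google's implementation; the layer names, experiment names, and bucket count are hypothetical.

```python
import hashlib


def bucket(user_id, layer, num_buckets=1000):
    """Deterministically map a user to a bucket within one layer. Salting
    the hash with the layer name makes bucket assignments in different
    layers effectively independent of each other."""
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def assign(user_id, layers, num_buckets=1000):
    """layers maps a layer name to a list of (experiment, lo, hi) bucket
    ranges. A user lands in at most one experiment per layer, but can be
    in one experiment in every layer simultaneously."""
    assignment = {}
    for layer, experiments in layers.items():
        b = bucket(user_id, layer, num_buckets)
        for experiment, lo, hi in experiments:
            if lo <= b < hi:
                assignment[layer] = experiment
                break
    return assignment


# Hypothetical layers: experiments within a layer are mutually exclusive,
# while experiments in different layers overlap.
layers = {
    "ui": [("blue_button", 0, 500), ("green_button", 500, 1000)],
    "ranking": [("new_ranker", 0, 1000)],
}
for user in ["alice", "bob", "carol"]:
    print(user, assign(user, layers))
```

Salting the hash per layer is what keeps assignment in one layer from correlating with assignment in another; reusing one unsalted hash everywhere would put the same users together in every layer and confound the experiments.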
        
       ___________________________________________________________________
       (page generated 2021-07-15 23:00 UTC)