hngopher.com

       [HN Gopher] AlphaFold Protein Structure Database
       ___________________________________________________________________
        
       AlphaFold Protein Structure Database
        
       Author : matejmecka
       Score  : 229 points
       Date   : 2021-07-22 15:15 UTC (7 hours ago)
        
 (HTM) web link (alphafold.ebi.ac.uk)
 (TXT) w3m dump (alphafold.ebi.ac.uk)
        
       | sdbrown wrote:
       | This is a fabulous convenience! The reach of this ready-to-go
       | data will be much larger (in some directions) than the model and
       | CASP results themselves.
        
       | culopatin wrote:
       | I happen to be working on a database for folds as well. But RNA
       | folds not protein folds. I'm not a bio guy but my gf is and if I
       | understand correctly this is not the same. I hope they are
       | different because it would suck to be me lol.
       | 
       | This is my first big boy project and I'm driving solo so it takes
       | me a while to make any progress. But at least now I have this db
       | and genbank to model after
        
       | ricksunny wrote:
       | I'm sorry but why don't they just release the ability for a user
       | to enter a known real-world sequence's accession number from
       | Genbank / GISAID, and generate the protein structure from that?
       | Why do they have to abstract the user from the process by only
       | exposing a completed database of the protein structures the
       | Alphafold researchers decided would be worth producing?
        
         | sherjilozair wrote:
         | DeepMind has already released the open source code and model
         | parameters. The database makes it easier to access the
         | predictions.
        
         | sveme wrote:
         | I'd guess the ad-hoc simulation of the structure is
         | computationally quite expensive and takes a while, though
         | that's just a guess and I haven't read the original paper yet.
        
           | ricksunny wrote:
           | In fact a cost of $1-$4 for the preferred implementation:
           | 
           | https://news.ycombinator.com/item?id=27894060
           | 
           | The colab provides a slightly-less-accurate version that
           | operates in the cloud. For the real mccoy it seems one must
           | set up one's own environment and leverage the git repo.
        
         | tazjin wrote:
         | You can use the open-source code, and we also have a Colab
         | notebook for that: https://bit.ly/alphafoldcolab
         | 
         | More info: https://deepmind.com/blog/article/putting-the-power-
         | of-alpha...
        
           | ricksunny wrote:
           | Thanks for that - I can see why my comment was downvoted now,
           | as the posted article's FAQ lists these links for those
           | who would like to study their favorite sequenced-but-
           | unmodeled protein. I'm glad Alphafold is as open source as it
           | is, and I recognize that it didn't have to be so.
           | 
           | I think I was primed for a knee-jerk reaction because when
           | Alphafold's results were announced back in Dec. 2020, with
           | expressions of what a boon it would be for researchers around
           | the globe, I anticipated there would be a timeline announced
           | for exposing a tool or for the open-sourcing. (The Github
           | repo has only just been released about 6 days ago ...)
           | 
           | With all the work on SARS-CoV-2's 'interactome', as well as
           | human proteins & enzymes involved in pharmacology of
           | antiviral drugs under development / repurposing , it's easy
           | to imagine that drug developers would have liked to exercise
           | Alphafold as soon as it was announced. (I myself have wanted
           | a structure for human enzyme OATP1A2 that wasn't available on
           | the PDB for such a drug pharmacology study - quite glad it is
           | available at hand now. :) ).
           | 
           | Anyway I'm sure good arguments will be made about the need to
           | really 'get it right' before releasing, or internal
           | deliberations on how much to open up vs charging for it.
           | 
           | But 7 months lead time during a pandemic is a long time...
           | 
           | In all cases thanks again for this innovation's availability
           | now. :)
        
       | [deleted]
        
       | spacecity1971 wrote:
       | Quick question, please excuse my ignorance, but is there a way to
       | extrapolate sequence from structure? In other words, can we
       | design proteins and calculate the sequence required to make it?
        
         | kmckiern wrote:
         | It's hard but people do it! This is the field of "protein
         | engineering".
        
       | moyix wrote:
       | Anyone else getting a 403 Forbidden?
       | 
       | If so it might be better to link to the paper instead:
       | https://www.nature.com/articles/s41586-021-03828-1
        
         | jkh1 wrote:
         | Works fine for me. Must have been a temporary glitch.
        
       | dnautics wrote:
       | yikes, this doesn't even do some basic stuff like trim off pre-
       | protein segments for secreted proteins... Without this, you could
       | get some very incorrect structures.
        
         | [deleted]
        
       | lumost wrote:
       | I used to do some RNA molecular dynamics simulations in college
       | which were both computationally expensive and difficult to
       | replicate. Having the ability to reasonably predict protein
       | structure is an incredible scientific achievement - however I am
       | curious if anyone here who is better informed has takes on the
       | following.
       | 
       | 1. How likely is it that alphafold learned to accurately predict
       | protein structure in the narrow domain of proteins that have been
       | experimentally synthesized and whose structure has been measured?
       | in other words will AlphaFold's results generalize to proteins
       | which cannot yet be synthesized in the laboratory.
       | 
       | 2. If Alphafold's accuracy holds, what type of commercial
       | applications does this open up?
        
       | nharada wrote:
       | From the abstract[1]:
       | 
       | > After decades of effort, 17% of the total residues in human
       | protein sequences are covered by an experimentally-determined
       | structure. Here we dramatically expand structural coverage by
       | applying the state-of-the-art machine learning method,
       | AlphaFold2, at scale to almost the entire human proteome (98.5%
       | of human proteins).
       | 
       | [1] https://www.nature.com/articles/s41586-021-03828-1
        
         | vmception wrote:
         | Basically they are saying that decades of distributed protein
         | folding was useless and everyone would have had more utility
         | mining cryptocurrency if it existed several years earlier
         | 
         | But at least it inspired someone to make and release this
        
           | dekhn wrote:
           | you're conflating two different disciplines: distributed
           | protein folding studies the biophysical process of proteins
           | folding over time, while protein structure prediction makes a
           | static single prediction of what is believed to be the final
           | structure adopted by the protein in the folding process.
           | 
           | I think many people believe that given infinite computer time
           | the protein folding simulations would produce the same output
           | as the static prediction (modulo a number of complex details)
           | but use far, far more computer time to get there.
           | 
           | The fundamental observation from the DM AF2 paper that I've
           | been able to glean (which I kind of sort of already believed)
           | is that careful multiple sequence alignments of 30-100
           | evolutionarily related proteins is enough to produce coarse
           | distance constraints that can be used to guide a structure
           | prediction to a good answer quickly. And that depended on new
           | ML technology that didn't exist before.
        
             | vmception wrote:
             | thanks for that explanation!
        
           | cing wrote:
           | Just in case you're not joking, it's worth noting that the
           | majority of distributed molecular simulation (past and
           | present) is spent studying "folded proteins" to discover
           | structures of proteins that are often hidden from methods
           | like AlphaFold (currently). For example,
           | https://www.nature.com/articles/s41557-021-00707-0
        
           | dmitryminkovsky wrote:
           | > experimentally-determined structure
           | 
           | refers to structures determined by means of physical
           | examination, such as crystallography, not to attempts at
           | predictive computational analysis prior to AlphaFold, which
           | were not accurate compared to AlphaFold.
        
           | ramraj07 wrote:
           | I don't know if you know, but doctors spent 1,300 YEARS using
           | the wrong anatomy book. A few years and compute time isn't the
           | end of the world. I'm sure oracle's DB2 test suite has burned
           | more carbon than protein folding labs have.
        
           | Jabbles wrote:
           | A third way in which you are wrong is that AlphaFold derives
           | a lot of its power by referring to previously-solved protein
           | structures, or parts of them. It doesn't fold the proteins
           | from scratch in an "alpha-zero" way.
        
             | vmception wrote:
             | so it's more like protein folding _was_ useless until an AI
             | could make sense of the 17% solved variations and use
             | that for the other 83% of proteins found in humans?
             | 
             | > After decades of effort, 17% of the total residues in
             | human protein sequences are covered by an experimentally-
             | determined structure. Here we dramatically expand
             | structural coverage by applying the state-of-the-art
             | machine learning method, AlphaFold2, at scale to almost the
             | entire human proteome (98.5% of human proteins).
             | 
             | I just don't actually understand the quote from the article
             | if it isn't comparing the same thing
        
       | _RPL5_ wrote:
       | This is awesome! When they announced CASP results a few months
       | ago, I was wondering if AlphaFold will be accessible as an API,
       | where you can submit a protein id or a sequence and get back a 3D
       | structure. This database is basically that, except it's free &
       | open to the public. Major props!
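       | 
       | A minimal sketch of the "submit a protein id" lookup described
       | above, assuming the entry-page URL pattern seen in links posted
       | in this thread (https://alphafold.ebi.ac.uk/entry/<accession>).
       | The function name and the simplified accession check are
       | hypothetical; this is not an official client.

```python
import re
from urllib.parse import quote

# Base of the entry pages, taken from links posted in this thread.
ENTRY_BASE = "https://alphafold.ebi.ac.uk/entry/"

# Simplified stand-in for a UniProt accession check (assumption):
# 6 to 10 upper-case letters or digits, nothing else.
_ACCESSION = re.compile(r"[A-Z0-9]{6,10}")


def entry_url(accession: str) -> str:
    """Return the database entry page URL for a UniProt accession."""
    acc = accession.strip().upper()
    if not _ACCESSION.fullmatch(acc):
        raise ValueError(f"does not look like an accession: {accession!r}")
    return ENTRY_BASE + quote(acc)


if __name__ == "__main__":
    # BRCA1 and CFTR, the two accessions mentioned in this thread.
    for acc in ("P38398", "P13569"):
        print(entry_url(acc))
```

       | The sketch only builds the page address; fetching and parsing
       | the structure file is left out because the download layout is
       | not described in this thread.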
        
       | Ovah wrote:
       | Interesting that they're porting it to other organisms. Different
       | organisms have variations in ribosomes, post translational
       | modifications and even tRNA repertoire. So it's not a guarantee
       | that two identical DNA sequences will give identical proteins in
       | two different organisms.
        
         | ramraj07 wrote:
         | ??? Unless you jump from eukaryotes to archaea these are not
         | real concerns. Most PTM markers are very conserved.
        
           | Ovah wrote:
           | I'd say the jump from eukaryotes to prokaryotes is a
           | realistic scenario in recombinant DNA technology.
           | 
           | I have some experience with recombinant yeast and PTMs.
           | Degree of glycosylation actually varies a lot depending on
           | strain used and has a huge effect on protein activity. And of
           | course these PTMs affect the crystal structure.
        
         | pelorat wrote:
         | Shouldn't matter? Protein folding is based on the laws of
         | physics after all. If DNA sequences fold differently in
         | different organisms then an external factor is missing.
        
           | Ovah wrote:
           | While the laws of physics remain the same, the folding
           | machinery between species varies to some degree. Protein
           | folding is determined by the unique environment/machinery of
           | a cell. A concrete example is disulphide bonds (S-S, ex
           | cysteine-cysteine) that require a certain pH to form. The
           | primary pathways of disulphide-bond formation are localized
           | in the endoplasmic reticulum (ER) of eukaryotic cells and the
           | periplasmic space of prokaryotic cells. So two completely
           | different mechanisms to end up with the same bond (protein
           | structure) depending on the organism.
        
             | dnautics wrote:
             | Outside of missing post translational modifications, can
             | you give a concrete example of a protein that is known to
             | fold differently in different species, not counting, say,
             | stuff getting sent to the garbage bin of inclusion bodies
             | due to the stress of overexpression? My understanding (7
             | years of grad school researching protein folding in the ER)
             | is that outside of some rare corner and disease state
             | cases, folding is pretty much a binary event, and if it
             | weren't for most cases the low delta g difference between
             | isoforms would be just as easily overcome over the course
             | of environmental changes in a single individual as "between
             | different species" namely having a deterministic outcome is
             | important for through-time robustness.
        
       | ramraj07 wrote:
       | As an ex biomedical researcher I was trying to think what protein
       | I should enter and see, and couldn't come up with a protein that
       | I know of, that didn't have a structure already (at least a crude
       | one). That is, we roughly know what most known important proteins
       | look like. This is an amazing tool, and will be indispensable in
       | labs (I'll expect any lab to use this site at least once a year?)
       | But it's not as transformative as some might think.
        
         | amelius wrote:
         | https://www.embl.org/news/science/alphafold-potential-impact...
         | 
         | > A discussion of the applications that AlphaFold DB may enable
         | and the possible impact of the resource on science and society
        
         | pelorat wrote:
         | Do we really know the structure of every protein that assembles
         | into a human cell?
        
           | seventytwo wrote:
           | Definitely not.
        
           | cing wrote:
           | One of the reasons we don't have them all is that individual
           | genes can encode for multiple protein isoforms through
           | alternative splicing. AlphaFold was only run on one.
           | Otherwise, there's lots of important biochemical/biophysical
           | processes that impact structure, as cells are only about 50%
           | protein by weight.
        
           | _RPL5_ wrote:
           | From their abstract:
           | 
           | ---
           | 
           | After decades of effort, 17% of the total residues in human
           | protein sequences are covered by an experimentally-determined
           | structure. Here we dramatically expand structural coverage
           | by applying the state-of-the-art machine learning method,
           | AlphaFold2, at scale to almost the entire human proteome
           | (98.5% of human proteins). The resulting dataset covers 58%
           | of residues with a confident prediction, of which a subset
           | (36% of all residues) have very high confidence.
           | 
           | https://www.nature.com/articles/s41586-021-03828-1
           | 
           | ---
           | 
           | The metric they use (residues) is a bit unusual (I would have
           | used number of proteins instead), but I assume they wanted to
           | account for ambiguity (such as proteins with partial
           | structures).
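           | 
           | A small sketch of why the two metrics can diverge, using
           | made-up proteins (the names, lengths and counts below are
           | hypothetical, not figures from the paper). Counting residues
           | weights long proteins more and handles partial structures
           | without needing a per-protein cutoff.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Protein:
    name: str
    length: int   # total residues in the sequence
    covered: int  # residues that have a structure


def residue_coverage(proteins: List[Protein]) -> float:
    """Fraction of all residues that are covered."""
    total = sum(p.length for p in proteins)
    return sum(p.covered for p in proteins) / total


def protein_coverage(proteins: List[Protein], min_fraction: float) -> float:
    """Fraction of proteins whose covered share reaches min_fraction."""
    hits = sum(
        1 for p in proteins
        if p.covered > 0 and p.covered / p.length >= min_fraction
    )
    return hits / len(proteins)


# Hypothetical toy proteome: one fully solved short protein, one long
# protein with a single solved domain, one with no structure at all.
toy = [
    Protein("A", length=100, covered=100),
    Protein("B", length=1000, covered=100),
    Protein("C", length=400, covered=0),
]

if __name__ == "__main__":
    print("by residue:", residue_coverage(toy))
    print("by protein, any structure:", protein_coverage(toy, 0.01))
    print("by protein, full structure:", protein_coverage(toy, 1.0))
```

           | The per-protein number swings between one third and two
           | thirds depending on the cutoff, while the per-residue number
           | has no cutoff to choose.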
        
       | narrator wrote:
       | Gain of function researchers working for the world's militaries
       | will use this research to figure out how to get viruses to attach
       | to receptor sites peculiar to particular races. The people
       | developing the antivirals will have a lot harder time countering
       | these weapons because making antivirals that aren't poisonous in
       | some weird way is a much harder job. If this is not the case,
       | please let me know why, it will really help me sleep better at
       | night.
       | 
       | A U.S. congressional representative came out of a classified
       | briefing recently and announced that the CCP is hard at work on
       | race specific bioweapons.[1]
       | 
       | Unfortunately, I think this is the launch of a new era of weapons
       | we're seeing right now. The biggest development in war since the
       | atom bomb. Like the atom bomb, the big question was will we kill
       | ourselves with this technology. Who knows?
       | 
       | [1]https://yournews.com/2021/07/22/2185645/rep-marjorie-
       | taylor-...
        
         | drcode wrote:
         | ...and many doctors will use it to attach pharmaceuticals to
         | receptor sites of particular cancers.
        
           | narrator wrote:
           | I'm thinking that the problem is that it is much harder to
           | develop drugs that only kill cancers very efficiently and
           | don't harm the rest of the body than to tweak viruses that
           | just have to keep the person alive long enough to spread the
           | virus.
        
             | drcode wrote:
             | I 100% agree your point is valid. The counterargument is
             | "Yes, people can do bad things with protein data, just as
             | they can do bad things with a telephone, like use it to
             | discuss a bank robbery."
        
             | narrator wrote:
             | The crazy part is a bioweapons program is really cheap
             | compared to a nuclear weapons program, and now with these
             | new tools it's even cheaper. Before, it was vastly more
             | expensive to do the cycle of creating a new viral protein
             | and testing a bioweapon on human cell culture. Now that
             | process is sped up millions of times with this
             | technology because that can all take place inside a
             | computer.
             | 
             | This is similar to the change with drone weaponry. Before,
             | you had to have large cruise missiles to get pinpoint
             | strikes. Now small countries like Azerbaijan can buy a
             | whole fleet of drone weapons and get the benefits of having
             | a modern air force with pinpoint strikes and even stealth
             | for vastly less money.
        
               | mlyle wrote:
               | Is this a correct summary of your statements:
               | 
               | Because it -might- make things slightly easier for a
               | state actor with nigh-unlimited resources to enact a
               | doomsday scenario, which they might or might not be
               | pursuing, medical researchers should not publish
               | otherwise helpful research?
        
               | narrator wrote:
               | I think it's great that the Wuhan institute published all
               | their gain of function research. They even said who paid
               | for it. It's a clear trail back to them, but apparently
               | taking any action to acknowledge that this is a bad thing
               | and something fishy might be going on is a completely
               | politicized issue now that apparently gets as many
               | downvotes as arguing about hot button political topics
               | now.
               | 
               | What I'm saying is there should at least be an open and
               | frank discussion of what the whole world is getting
               | itself into right now with all this.
        
         | ravila4 wrote:
         | 1. Gain of function is not as easy as you think. 2. Such bio-
         | weapons are not likely because any virus released in the wild
         | will mutate over time, and also because you cannot target
         | "races" in the way you describe. Phenotypic traits span across
         | geographical borders, and any attempt to do such a thing is
         | likely to backfire.
        
           | narrator wrote:
           | I think if the CCP were successful in creating race targeted
           | bioweapons it would be in their interest to convince the
           | world that they didn't exist.
           | 
           | Insults, character assassination campaigns and politicizing
           | the existence of these bioweapons would be a good way to do
           | that. Just copy paste some of the comments here and change
           | the name to insult anyone who thinks they exist. They could
           | then go and kill millions and not receive any retaliation
           | whatsoever with people praising them for their effective
           | program of keeping the disease epidemic they created under
           | control. Even if you got the guy who discovered AIDS and won
           | the Nobel prize for it to say that these were gain of
           | function viruses that incorporated HIV protein parts, you
           | could just launch a big propaganda campaign to attack his
           | character.[1] Much cheaper than having to fight a war.
           | 
           | [1]https://www.gmanetwork.com/news/scitech/science/736458/fre
           | nc...
        
         | moistly wrote:
         | > _Ridiculous fookin' idjit and compulsive liar M.T.Greene_
         | came out of a classified briefing recently and announced that
         | the CCP is hard at work on race specific bioweapons.
         | 
         | Fixed that for you. By the way, you shouldn't pay attention to
         | that clown.
        
       | jkh1 wrote:
       | Didn't see this post so posted it also. Also relevant:
       | https://www.embl.org/news/science/alphafold-potential-impact...
        
       | pelorat wrote:
       | There's a lot of news about AlphaFold lately but what about
       | RoseTTAFold? Wasn't it more accurate and much faster?
        
         | creddit wrote:
         | I believe slightly less accurate but significantly faster is
         | where it stands.
        
           | pelorat wrote:
           | Running a sequence against both seems like a good idea. If
           | they agree the certainty will go way up.
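           | 
           | One way to put a number on "they agree", sketched under the
           | assumption that both tools return coordinates for the same
           | residues in the same order. It uses distance RMSD, which
           | compares internal pairwise distances and so needs no
           | superposition; this is a swapped-in illustration, not how
           | either tool scores itself, and it cannot tell a structure
           | from its mirror image.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def pairwise_distances(coords: Sequence[Coord]) -> List[float]:
    """All inter-point distances, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square difference between the two distance lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both models must have the same number of points")
    da = pairwise_distances(coords_a)
    db = pairwise_distances(coords_b)
    if not da:  # fewer than two points: nothing to compare
        return 0.0
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


if __name__ == "__main__":
    model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # Same shape, shifted in space: should count as full agreement.
    model_2 = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in model_1]
    print(distance_rmsd(model_1, model_2))
```

           | A low value says the two models have the same shape; it does
           | not by itself say that shape is correct, since both methods
           | could share the same error.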
        
       | visarga wrote:
       | Citation factory, that's what it is.
        
         | abcc8 wrote:
         | Resources as useful as this are bound to be. We do cite our
         | sources after all.
        
       | stephanheijl wrote:
       | I'm impressed and grateful that DeepMind released this resource,
       | this will save a lot of compute from labs trying to replicate an
       | entire exome for themselves. While some structures look great,
       | there are still some misses here. Important structures like BRCA1
       | (a well-studied breast cancer associated protein) are just
       | structures for the BRCT and RING domains surrounded by a low-
       | confidence string of amino acids, likely shaped to be globular:
       | https://alphafold.ebi.ac.uk/entry/P38398
       | 
       | Maybe I was wrong for expecting the impossible here, but I was
       | excited to see this specific structure and it appears that there
       | is still work to do. Nevertheless, kudos to Deepmind on their
       | amazing achievement and contributions to the field!
        
         | maga wrote:
         | A curious non-biologist here: how valuable are these low
         | confidence predictions for biologists? In other words, is it
         | hard to predict but easy to check situation as with, say, prime
         | numbers in mathematics?
        
           | toufka wrote:
           | The medium-confidence predictions are great for grounding or
           | sourcing intuition. If you're trying to divide up a protein
           | for an experiment and you have to choose where to divvy it up
           | - you'd like to use even a bad prediction to help weight an
           | otherwise completely random approach. AND there are great
           | methods to help with this, but they're often custom, time-
           | consuming, and out-of-field for most. So being able to very
           | quickly spot-check using a uniform state-of-the-art, for any
           | arbitrary protein, makes it actually pretty useful for
           | certain kinds of pre-experimental guidance.
        
           | devindotcom wrote:
           | Some are valuable for the reasons the other person responding
           | noted, but some of the low confidence predictions may also be
           | high confidence predictions of a disordered class of protein
           | that doesn't have a standard rest state. So it's useful work
           | one way or the other.
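           | 
           | A minimal sketch for spotting the low-confidence stretches
           | in a downloaded model, assuming a PDB-format file in which
           | the per-residue confidence score (pLDDT) is stored in the
           | B-factor column. The band cutoffs (90, 70, 50) are an
           | assumption based on the confidence colouring the database
           | shows; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

ResidueKey = Tuple[str, int]  # (chain id, residue number)


def plddt_per_residue(pdb_text: str) -> Dict[ResidueKey, float]:
    """Read one score per residue from the C-alpha ATOM records."""
    scores: Dict[ResidueKey, float] = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resseq = int(line[22:26])
            scores[(chain, resseq)] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Map a score to a named band (cutoffs are an assumption)."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def band_fractions(scores: Dict[ResidueKey, float]) -> Dict[str, float]:
    """Fraction of residues falling into each band."""
    counts = Counter(confidence_band(v) for v in scores.values())
    total = sum(counts.values())
    return {band: n / total for band, n in counts.items()} if total else {}
```

           | Given the text of a model file, band_fractions(
           | plddt_per_residue(text)) gives a quick summary. A long run
           | of "very low" residues may be a poor prediction or a region
           | with no fixed structure, so it is a prompt for a closer
           | look rather than a verdict.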
        
         | cing wrote:
         | Everything between the BRCT and RING domains of BRCA1 is an
         | intrinsically unstructured region which DeepMind correctly
         | predicts, https://pubmed.ncbi.nlm.nih.gov/15571721/
         | 
         | Another famous one would be R-domain of CFTR, which was not
         | resolved in experimental structure determination, and AlphaFold
         | models correctly show disorder there. Nothing to be done in
         | those cases except perform molecular simulation or other
         | experiments to assess dynamic ensembles,
         | https://alphafold.ebi.ac.uk/entry/P13569
        
       ___________________________________________________________________
       (page generated 2021-07-22 23:00 UTC)