[HN Gopher] AlphaFold Protein Structure Database ___________________________________________________________________ AlphaFold Protein Structure Database Author : matejmecka Score : 229 points Date : 2021-07-22 15:15 UTC (7 hours ago) (HTM) web link (alphafold.ebi.ac.uk) (TXT) w3m dump (alphafold.ebi.ac.uk) | sdbrown wrote: | This is a fabulous convenience! The reach of this ready-to-go | data will be much larger (in some directions) than the model and | CASP results themselves. | culopatin wrote: | I happen to be working on a database for folds as well, but RNA | folds, not protein folds. I'm not a bio guy but my gf is, and if I | understand correctly this is not the same. I hope they are | different, because it would suck to be me lol. | | This is my first big boy project and I'm driving solo, so it takes | me a while to make any progress. But at least now I have this db | and GenBank to model after. | ricksunny wrote: | I'm sorry, but why don't they just release the ability for a user | to enter a known real-world sequence's accession number from | GenBank / GISAID and generate the protein structure from that? | Why do they have to abstract the user from the process by only | exposing a completed database of the protein structures the | AlphaFold researchers decided would be worth producing? | sherjilozair wrote: | DeepMind has already released the open source code and model | parameters. The database makes it easier to access the | predictions. | sveme wrote: | I'd guess the ad-hoc simulation of the structure is | computationally quite expensive and takes a while, though | that's just a guess and I haven't read the original paper yet. | ricksunny wrote: | In fact, the preferred implementation costs $1-$4: | | https://news.ycombinator.com/item?id=27894060 | | The Colab notebook provides a slightly less accurate version that | runs in the cloud. For the real McCoy, it seems one must | set up one's own environment and use the Git repo.
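ricksunny's question assumes a GenBank/GISAID lookup, but the AlphaFold DB entry pages linked later in this thread are keyed by UniProt accession (e.g. https://alphafold.ebi.ac.uk/entry/P38398). A minimal Python sketch of going from an accession to its entry page, assuming the /entry/<accession> URL pattern seen in those links stays stable. The validation regex is the accession format UniProt documents (worth re-checking against UniProt's help pages), and the function name `alphafold_entry_url` is hypothetical. Nucleotide accessions from GenBank or GISAID would first have to be mapped to a UniProt protein.

```python
import re

# Accession format documented by UniProt (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def alphafold_entry_url(accession: str) -> str:
    """Return the AlphaFold DB entry page for a UniProt accession.

    Hypothetical helper: the URL pattern is copied from links such as
    https://alphafold.ebi.ac.uk/entry/P38398; it is not an official API.
    """
    acc = accession.strip().upper()
    if not UNIPROT_ACCESSION.fullmatch(acc):
        # GenBank/GISAID identifiers end up here and need mapping to UniProt first.
        raise ValueError(f"not a UniProt accession: {accession!r}")
    return f"https://alphafold.ebi.ac.uk/entry/{acc}"


print(alphafold_entry_url("P38398"))  # BRCA1
print(alphafold_entry_url("p13569"))  # CFTR, case-insensitive input
```

A GenBank nucleotide accession such as MN908947 fails the pattern and raises `ValueError` instead of producing a broken link.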
| tazjin wrote: | You can use the open-source code, and we also have a Colab | notebook for that: https://bit.ly/alphafoldcolab | | More info: https://deepmind.com/blog/article/putting-the-power- | of-alpha... | ricksunny wrote: | Thanks for that. I can see why my comment was downvoted now, | as the posted article's FAQ lists these links for those | who would like to study their favorite sequenced-but- | unmodeled protein. I'm glad AlphaFold is as open source as it | is, and I recognize that it didn't have to be so. | | I think I was primed for a knee-jerk reaction because when | AlphaFold's results were announced back in Dec. 2020, with | expressions of what a boon it would be for researchers around | the globe, I anticipated there would be a timeline announced | for exposing a tool or for open-sourcing it. (The GitHub | repo was only released about 6 days ago...) | | With all the work on SARS-CoV-2's 'interactome', as well as on | human proteins & enzymes involved in the pharmacology of | antiviral drugs under development/repurposing, it's easy | to imagine that drug developers would have liked to use | AlphaFold as soon as it was announced. (I myself have wanted | a structure for the human enzyme OATP1A2, which wasn't available in | the PDB, for such a drug pharmacology study; I'm quite glad it is | available now :) ). | | Anyway, I'm sure good arguments will be made about the need to | really 'get it right' before releasing, or about internal | deliberations on how much to open up vs. charging for it. | | But 7 months of lead time during a pandemic is a long time... | | In any case, thanks again for making this innovation available | now. :) | [deleted] | spacecity1971 wrote: | Quick question, please excuse my ignorance, but is there a way to | infer a sequence from a structure? In other words, can we | design proteins and calculate the sequence required to make them? | kmckiern wrote: | It's hard, but people do it!
This is the field of "protein | engineering". | moyix wrote: | Anyone else getting a 403 Forbidden? | | If so, it might be better to link to the paper instead: | https://www.nature.com/articles/s41586-021-03828-1 | jkh1 wrote: | Works fine for me. Must have been a temporary glitch. | dnautics wrote: | Yikes, this doesn't even do some basic stuff like trimming off | pre-protein segments for secreted proteins... Without this, you could | get some very incorrect structures. | [deleted] | lumost wrote: | I used to do some RNA molecular dynamics simulations in college, | which were both computationally expensive and difficult to | replicate. Having the ability to reasonably predict protein | structure is an incredible scientific achievement; however, I am | curious whether anyone here who is better informed has takes on the | following. | | 1. How likely is it that AlphaFold learned to accurately predict | protein structure only in the narrow domain of proteins that have been | experimentally synthesized and whose structure has been measured? | In other words, will AlphaFold's results generalize to proteins | which cannot yet be synthesized in the laboratory? | | 2. If AlphaFold's accuracy holds, what type of commercial | applications does this open up? | nharada wrote: | From the abstract[1]: | | > After decades of effort, 17% of the total residues in human | protein sequences are covered by an experimentally-determined | structure. Here we dramatically expand structural coverage by | applying the state-of-the-art machine learning method, | AlphaFold2, at scale to almost the entire human proteome (98.5% | of human proteins).
| | [1] https://www.nature.com/articles/s41586-021-03828-1 | vmception wrote: | Basically they are saying that decades of distributed protein | folding was useless and everyone would have had more utility | mining cryptocurrency if it had existed several years earlier. | | But at least it inspired someone to make and release this. | dekhn wrote: | You're conflating two different disciplines: distributed | protein folding studies the biophysical process of proteins | folding over time, while protein structure prediction makes a | single static prediction of what is believed to be the final | structure adopted by the protein in the folding process. | | I think many people believe that, given infinite computer time, | protein folding simulations would produce the same output | as the static prediction (modulo a number of complex details) | but use far, far more computer time to get there. | | The fundamental observation from the DM AF2 paper that I've | been able to glean (which I kind of sort of already believed) | is that careful multiple sequence alignments of 30-100 | evolutionarily related proteins are enough to produce coarse | distance constraints that can be used to guide a structure | prediction to a good answer quickly. And that depended on new | ML technology that didn't exist before. | vmception wrote: | Thanks for that explanation! | cing wrote: | Just in case you're not joking, it's worth noting that the | majority of distributed molecular simulation (past and | present) is spent studying "folded proteins" to discover | structures of proteins that are often hidden from methods | like AlphaFold (currently). For example, | https://www.nature.com/articles/s41557-021-00707-0 | dmitryminkovsky wrote: | > experimentally-determined structure | | refers to structures determined by physical measurement, | e.g. crystallography, not to attempts at | predictive computational analysis prior to AlphaFold, which | were far less accurate than AlphaFold.
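dekhn's summary above, that alignments of evolutionarily related sequences carry coarse distance information, rests on an older observation: residues in contact tend to mutate together. A minimal Python sketch of that idea using plain mutual information (MI) between alignment columns, standard library only. This is not AlphaFold2's method (AlphaFold2 learns from the MSA with a neural network rather than computing a fixed statistic), raw MI is known to be confounded by shared ancestry (corrections such as APC or direct coupling analysis address that), and the toy alignment, the helper names and the `min_separation` cutoff are all made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    p_i = Counter(seq[i] for seq in msa)
    p_j = Counter(seq[j] for seq in msa)
    p_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def top_coupled_pairs(msa: list[str], k: int = 3, min_separation: int = 3) -> list[tuple[int, int]]:
    """Rank column pairs by MI, skipping pairs that are close in sequence
    (neighbours co-vary trivially and are not informative contacts)."""
    length = len(msa[0])
    scored = [
        (column_mi(msa, i, j), i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:k]]


# Hypothetical alignment: columns 1 and 6 swap K/E together, like a salt bridge.
toy_msa = [
    "AKLVGAEL",
    "AELVGAKL",
    "AKIVGSEL",
    "AELVGTKL",
    "AKLIGAEL",
    "AELVGAKV",
]

print(top_coupled_pairs(toy_msa, k=3))
```

On this toy alignment the perfectly co-varying pair scores ln 2 and `top_coupled_pairs(toy_msa, k=1)` returns `[(1, 6)]`; real contact prediction needs hundreds of diverse sequences and the corrections mentioned above.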
| ramraj07 wrote: | I don't know if you know, but doctors spent 1,300 YEARS using | the wrong anatomy book. A few years of compute time isn't the | end of the world. I'm sure Oracle's database test suite has burned | more carbon than protein folding labs have. | Jabbles wrote: | A third way in which you are wrong is that AlphaFold derives | a lot of its power by referring to previously-solved protein | structures, or parts of them. It doesn't fold the proteins | from scratch in an "AlphaZero" way. | vmception wrote: | So it's more like protein folding _was_ useless until an AI | could make sense of the 17% solved variations and use | that for the other 83% of proteins found in humans? | | > After decades of effort, 17% of the total residues in | human protein sequences are covered by an experimentally- | determined structure. Here we dramatically expand | structural coverage by applying the state-of-the-art | machine learning method, AlphaFold2, at scale to almost the | entire human proteome (98.5% of human proteins). | | I just don't actually understand the quote from the article | if it isn't comparing the same thing. | _RPL5_ wrote: | This is awesome! When they announced CASP results a few months | ago, I was wondering if AlphaFold would be accessible as an API, | where you can submit a protein ID or a sequence and get back a 3D | structure. This database is basically that, except it's free & | open to the public. Major props! | Ovah wrote: | Interesting that they're porting it to other organisms. Different | organisms have variations in ribosomes, post-translational | modifications, and even tRNA repertoire. So it's not a guarantee | that two identical DNA sequences will give identical proteins in | two different organisms. | ramraj07 wrote: | ??? Unless you jump from eukaryotes to archaea, these are not | real concerns. Most PTM markers are very conserved.
| Ovah wrote: | I'd say the jump from eukaryotes to prokaryotes is a | realistic scenario in recombinant DNA technology. | | I have some experience with recombinant yeast and PTMs. The | degree of glycosylation actually varies a lot depending on the | strain used and has a huge effect on protein activity. And of | course these PTMs affect the crystal structure. | pelorat wrote: | Shouldn't matter? Protein folding is based on the laws of | physics, after all. If the same sequence folds differently in | different organisms, then an external factor is missing. | Ovah wrote: | While the laws of physics remain the same, the folding | machinery varies between species to some degree. Protein | folding is determined by the unique environment/machinery of | a cell. A concrete example is disulphide bonds (S-S, e.g. | cysteine-cysteine), which require a certain pH to form. The | primary pathways of disulphide-bond formation are localized | in the endoplasmic reticulum (ER) of eukaryotic cells and the | periplasmic space of prokaryotic cells. So there are two completely | different mechanisms to end up with the same bond (protein | structure), depending on the organism. | dnautics wrote: | Outside of missing post-translational modifications, can | you give a concrete example of a protein that is known to | fold differently in different species, not counting, say, | stuff getting sent to the garbage bin of inclusion bodies | due to the stress of overexpression? My understanding (7 | years of grad school researching protein folding in the ER) | is that, outside of some rare corner cases and disease | states, folding is pretty much a binary event. If it | weren't, in most cases the low delta G difference between | isoforms would be overcome just as easily by environmental | changes within a single individual as "between different | species". In other words, having a deterministic outcome is | important for robustness over time.
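dnautics' point about small free-energy differences can be made concrete with the textbook two-state model, in which a protein is either folded or unfolded and the two populations follow a Boltzmann ratio. The two-state picture is an assumed simplification (real proteins can populate intermediates), `fraction_folded` is a hypothetical helper, and the same arithmetic applies to any pair of competing conformations.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_folded(dg_unfold_kcal: float, temp_k: float = 310.15) -> float:
    """Fraction of molecules in the folded state under a two-state model.

    dg_unfold_kcal is G(unfolded) - G(folded) in kcal/mol, so positive values
    mean the folded state is more stable. K = [folded]/[unfolded] = exp(dG/RT)
    and the folded fraction K / (1 + K) is written in the equivalent logistic
    form 1 / (1 + exp(-dG/RT)).
    """
    rt = R_KCAL_PER_MOL_K * temp_k
    return 1.0 / (1.0 + math.exp(-dg_unfold_kcal / rt))


# Default temperature is 37 degrees C (310.15 K).
for dg in (0.0, 1.0, 2.0, 5.0):
    print(f"dG = {dg:.1f} kcal/mol -> folded fraction {fraction_folded(dg):.4f}")
```

At 37 °C a 5 kcal/mol margin leaves well under 0.1% of molecules unfolded, while a 1 kcal/mol margin leaves roughly one in six unfolded, which is the sense in which a small ΔG gap would not give a clean, deterministic outcome.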
| ramraj07 wrote: | As an ex-biomedical researcher, I was trying to think of what protein | I should enter to see, and couldn't come up with a protein | I know of that didn't already have a structure (at least a crude | one). That is, we roughly know what most important known proteins | look like. This is an amazing tool, and it will be indispensable in | labs (I'd expect every lab to use this site at least once a year), | but it's not as transformative as some might think. | amelius wrote: | https://www.embl.org/news/science/alphafold-potential-impact... | | > A discussion of the applications that AlphaFold DB may enable | and the possible impact of the resource on science and society | pelorat wrote: | Do we really know the structure of every protein that makes up | a human cell? | seventytwo wrote: | Definitely not. | cing wrote: | One of the reasons we don't have them all is that individual | genes can encode multiple protein isoforms through | alternative splicing. AlphaFold was only run on one of them. | Beyond that, there are lots of important biochemical/biophysical | processes that impact structure, as cells are only about 50% | protein by weight. | _RPL5_ wrote: | From their abstract: | | --- | | After decades of effort, 17% of the total residues in human | protein sequences are covered by an experimentally-determined | structure. Here we dramatically expand structural coverage | by applying the state-of-the-art machine learning method, | AlphaFold2, at scale to almost the entire human proteome | (98.5% of human proteins). The resulting dataset covers 58% | of residues with a confident prediction, of which a subset | (36% of all residues) have very high confidence. | | https://www.nature.com/articles/s41586-021-03828-1 | | --- | | The metric they use (residues) is a bit unusual (I would have | used the number of proteins instead), but I assume they wanted to | account for ambiguity (such as proteins with partial | structures).
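_RPL5_'s question about counting residues rather than proteins is easier to see with numbers. The Python sketch below uses three made-up proteins (not real data) and a hypothetical `min_fraction` threshold. A per-protein count has to decide how much of a protein must be covered before it counts, while a per-residue share needs no such cutoff and weights large, partially solved proteins by their size. That is one plausible reason for the paper's choice, not a statement of the authors' reasoning.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    length: int   # total residues
    covered: int  # residues covered by a structure (experimental or confident prediction)


def residue_coverage(proteins: list[Protein]) -> float:
    """Covered residues divided by all residues."""
    total = sum(p.length for p in proteins)
    return sum(p.covered for p in proteins) / total


def protein_coverage(proteins: list[Protein], min_fraction: float = 0.5) -> float:
    """Share of proteins with at least min_fraction of their residues covered."""
    hits = sum(1 for p in proteins if p.covered / p.length >= min_fraction)
    return hits / len(proteins)


# Made-up example: a small fully solved protein, a large protein with only
# one domain solved, and a medium protein that is half covered.
toy = [
    Protein("small_domain", length=100, covered=100),
    Protein("large_multidomain", length=1800, covered=200),
    Protein("medium", length=400, covered=200),
]

print(f"residue coverage: {residue_coverage(toy):.1%}")
for cutoff in (0.1, 0.5, 0.9):
    print(f"protein coverage at >= {cutoff:.0%}: {protein_coverage(toy, cutoff):.1%}")
```

Here the residue share stays at about 21.7%, while the per-protein figure swings from 33.3% to 100% depending on the cutoff chosen.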
| narrator wrote: | Gain-of-function researchers working for the world's militaries | will use this research to figure out how to get viruses to attach | to receptor sites peculiar to particular races. The people | developing the antivirals will have a much harder time countering | these weapons, because making antivirals that aren't poisonous in | some weird way is a much harder job. If this is not the case, | please let me know why; it will really help me sleep better at | night. | | A U.S. congressional representative came out of a classified | briefing recently and announced that the CCP is hard at work on | race-specific bioweapons.[1] | | Unfortunately, I think what we're seeing right now is the launch | of a new era of weapons: the biggest development in war since the | atom bomb. As with the atom bomb, the big question is whether we will kill | ourselves with this technology. Who knows? | | [1]https://yournews.com/2021/07/22/2185645/rep-marjorie- | taylor-... | drcode wrote: | ...and many doctors will use it to attach pharmaceuticals to | receptor sites of particular cancers. | narrator wrote: | I'm thinking that the problem is that it is much harder to | develop drugs that kill only cancers very efficiently and | don't harm the rest of the body than to tweak viruses that | just have to keep the person alive long enough to spread the | virus. | drcode wrote: | I 100% agree your point is valid. The counterargument is | "Yes, people can do bad things with protein data, just as | they can do bad things with a telephone, like use it to | discuss a bank robbery." | narrator wrote: | The crazy part is that a bioweapons program is really cheap | compared to a nuclear weapons program, and now with these | new tools it's even cheaper. Before, it was vastly more | expensive to run the cycle of creating a new viral protein | and testing a bioweapon on human cell culture.
Now that | process is sped up millions of times with this | technology, because it can all take place inside a | computer. | | This is similar to the change with drone weaponry. Before, | you had to have large cruise missiles to get pinpoint | strikes. Now small countries like Azerbaijan can buy a | whole fleet of drone weapons and get the benefits of having | a modern air force with pinpoint strikes and even stealth | for vastly less money. | mlyle wrote: | Is this a correct summary of your statements: | | Because it -might- make things slightly easier for a | state actor with nigh-unlimited resources to enact a | doomsday scenario, which they might or might not be | pursuing, medical researchers should not publish | otherwise helpful research? | narrator wrote: | I think it's great that the Wuhan Institute published all | their gain-of-function research. They even said who paid | for it. It's a clear trail back to them, but apparently | taking any action to acknowledge that this is a bad thing | and that something fishy might be going on is now a completely | politicized issue that gets as many | downvotes as arguing about hot-button political topics. | | What I'm saying is there should at least be an open and | frank discussion of what the whole world is getting | itself into right now with all this. | ravila4 wrote: | 1. Gain-of-function work is not as easy as you think. 2. Such | bioweapons are not likely, because any virus released in the wild | will mutate over time, and also because you cannot target | "races" in the way you describe. Phenotypic traits span across | geographical borders, and any attempt to do such a thing is | likely to backfire. | narrator wrote: | I think if the CCP were successful in creating race-targeted | bioweapons, it would be in their interest to convince the | world that they didn't exist.
| | Insults, character assassination campaigns, and politicizing | the existence of these bioweapons would be a good way to do | that. Just copy-paste some of the comments here and change | the name to insult anyone who thinks they exist. They could | then go and kill millions and not receive any retaliation | whatsoever, with people praising them for their effective | program of keeping the disease epidemic they created under | control. Even if you got the guy who discovered HIV and won | the Nobel Prize for it to say that these were gain-of-function | viruses that incorporated HIV protein parts, you | could just launch a big propaganda campaign to attack his | character.[1] Much cheaper than having to fight a war. | | [1]https://www.gmanetwork.com/news/scitech/science/736458/fre | nc... | moistly wrote: | > _Ridiculous fookin' idjit and compulsive liar M.T.Greene_ | came out of a classified briefing recently and announced that | the CCP is hard at work on race specific bioweapons. | | Fixed that for you. By the way, you shouldn't pay attention to | that clown. | jkh1 wrote: | Didn't see this post, so I posted it too. Also relevant: | https://www.embl.org/news/science/alphafold-potential-impact... | pelorat wrote: | There's a lot of news about AlphaFold lately, but what about | RoseTTAFold? Wasn't it more accurate and much faster? | creddit wrote: | I believe slightly less accurate but significantly faster is | where it stands. | pelorat wrote: | Running a sequence against both seems like a good idea. If | they agree, the certainty will go way up. | visarga wrote: | Citation factory, that's what it is. | abcc8 wrote: | Resources as useful as this are bound to be. We do cite our | sources after all. | stephanheijl wrote: | I'm impressed and grateful that DeepMind released this resource; | it will save labs a lot of compute they would otherwise spend | replicating an entire proteome themselves. While some structures look great, | there are still some misses here.
Important structures like BRCA1 | (a well-studied breast-cancer-associated protein) are just | structures for the BRCT and RING domains surrounded by a low- | confidence string of amino acids, likely shaped to be globular: | https://alphafold.ebi.ac.uk/entry/P38398 | | Maybe I was wrong to expect the impossible here, but I was | excited to see this specific structure, and it appears that there | is still work to do. Nevertheless, kudos to DeepMind on their | amazing achievement and contributions to the field! | maga wrote: | A curious non-biologist here: how valuable are these low- | confidence predictions for biologists? In other words, is it a | hard-to-predict but easy-to-check situation, as with, say, prime | numbers in mathematics? | toufka wrote: | The medium-confidence predictions are great for grounding or | sourcing intuition. If you're trying to divide up a protein | for an experiment and you have to choose where to divvy it up, | you'd like to use even a bad prediction to help weight an | otherwise completely random approach. AND there are great | methods to help with this, but they're often custom, time- | consuming, and out-of-field for most. So being able to very | quickly spot-check with a uniform, state-of-the-art method for any | arbitrary protein makes it actually pretty useful for | certain kinds of pre-experimental guidance. | devindotcom wrote: | Some are valuable for the reasons the other person responding | noted, but some of the low-confidence predictions may also be | high-confidence predictions of a disordered class of protein | that doesn't have a standard rest state. So it's useful work | one way or the other.
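The confidence discussed above is reported per residue as pLDDT (0 to 100), and AlphaFold DB's PDB-format model files store each residue's pLDDT in the B-factor column. A minimal Python sketch, assuming standard fixed-column PDB ATOM records, that reads the scores from CA atoms and lists long very-low-confidence stretches like the ones flanking BRCA1's domains. The band cut-offs follow the four bands the AlphaFold DB site displays (above 90, 70 to 90, 50 to 70, below 50; exact boundary handling here is an assumption), and the helper names and the `min_length` parameter are hypothetical. As devindotcom notes, a very low score often marks a genuinely disordered region rather than a failed prediction.

```python
def ca_plddt(pdb_text: str) -> dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor field of CA atoms.

    Assumes fixed-width PDB ATOM records: atom name in columns 13-16,
    residue number in 23-26 and B-factor in 61-66 (1-based).
    """
    scores: dict[int, float] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def plddt_band(score: float) -> str:
    """Confidence band, modelled on the AlphaFold DB's four display bands."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def low_confidence_runs(scores: dict[int, float], threshold: float = 50.0,
                        min_length: int = 3) -> list[tuple[int, int]]:
    """(start, end) residue ranges of at least min_length residues scoring
    below threshold. Residues are taken in sorted order; gaps in the
    numbering are not checked."""
    runs: list[tuple[int, int]] = []
    start = end = None
    for res in sorted(scores):
        if scores[res] < threshold:
            if start is None:
                start = res
            end = res
        else:
            if start is not None and end - start + 1 >= min_length:
                runs.append((start, end))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and end - start + 1 >= min_length:
        runs.append((start, end))
    return runs
```

For a downloaded model file, `low_confidence_runs(ca_plddt(text))` lists the stretches in the lowest band; raising `threshold` to 70.0 would also include the "low" band.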
| cing wrote: | Everything between the BRCT and RING domains of BRCA1 is an | intrinsically unstructured region, which DeepMind correctly | predicts: https://pubmed.ncbi.nlm.nih.gov/15571721/ | | Another famous one would be the R domain of CFTR, which was not | resolved in experimental structure determination, and AlphaFold | models correctly show disorder there. There's nothing to be done in | those cases except to run molecular simulations or other | experiments to assess dynamic ensembles: | https://alphafold.ebi.ac.uk/entry/P13569 ___________________________________________________________________ (page generated 2021-07-22 23:00 UTC)