[HN Gopher] Making the collective knowledge of chemistry open an...
___________________________________________________________________
Making the collective knowledge of chemistry open and machine actionable

Author : bryanrasmussen
Score  : 81 points
Date   : 2022-06-14 20:53 UTC (3 days ago)

(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)

| pfisherman wrote:
| Good luck with that...lol. The ontological / informatics space for chemicals is a mess.
|
| To make the collective knowledge of chemistry open and available, you need to represent, organize, and index it. This problem is not as sexy, but it is orders of magnitude more important.
| convolvatron wrote:
| this is a huge problem. arguably one of the primary technical reasons that 'web 2.0' was such a dud.
| [deleted]
| gtmitchell wrote:
| Chemist here. Every few years, someone has the novel idea that we should have open data for all chemistry laboratories, so then we can do Better Science. And like every other proposal I've seen, this one will get approximately zero traction because it doesn't address any of the core issues behind why laboratory data is currently closed.
|
| I try not to be too pessimistic about it, because it really would be great if there were more open chemical data. I just really doubt anything could accomplish that without remaking the US university research system from top to bottom.
| bjelkeman-again wrote:
| What are the core issues?
| mint2 wrote:
| Probably dealing with enough meta data to capture the stuff like the reaction only works because the supplier of one of the reagents used by that lab had ppm copper impurities
| gtmitchell wrote:
| Off the top of my head:
|
| -Academic researchers are already overworked, underpaid, and undertrained. Asking them to spend even more of their time to meticulously upload all their notes and data to an electronic notebook is going to be an uphill battle.
|
| -Academic scientists live or die by their ability to publish. Open data, especially if you're sharing in real time, makes you vulnerable to being scooped by competing researchers. Even disclosures of data after the fact make it easier for others to benefit from work you did with no benefit to the ones who collected the data. Given how cut-throat academia is, you're also not going to get many researchers on board with this idea.
|
| -Interoperability of most laboratory software is poor. People have been trying to get laboratory instrument manufacturers to support open data standards for years with little success. They don't have any financial incentive to allow competitors to have easy access to their data.
| Hellbanevil wrote:
| If I were in charge of awarding federal grants, I would demand that the recipients open source the data and upload everything in an orderly manner.
|
| It would just be: if you want this money, do the above.
| JPLeRouzic wrote:
| > _Open data, especially if you're sharing in real time, makes you vulnerable to being scooped by competing researchers._
|
| Why didn't something like standards and patents emerge in the scientific world?
| airstrike wrote:
| No economic incentive
| barry-cotter wrote:
| The scientific world rewards people in glory and honor much more than money. If you want more money go corporate. If you want to reward people more with money then they'll pay less attention to the glory but that's really expensive.
| BenoitP wrote:
| There are initiatives in the EU to require -by law- that if it's public research, then it must be released to the public. And there are official guidelines on how to do so:
|
| https://hal.archives-ouvertes.fr/hal-03318932
|
| I believe such an initiative for chemistry could very well succeed, even if it takes 10 years.
|
| Hopefully this can percolate to other countries and continents too, through EU's normative power.
| elcritch wrote:
| That could be very valuable. In many ways it's like material science and parts of chemistry are skimping along on the fumes of basic science done in the 1950's up to the 70's at national labs. Good experimentalists made solid careers doing core research without chasing endless grants or the latest fads. Seems pretty much all publicly available chemical and material databases come from that era. Some specialty areas have progressed way beyond that but it's rarely systematically collected, unless you're willing and able to pay lots of money for private databases. Those private databases of course largely build from publicly funded research.
|
| I hope this pans out.
| cellis wrote:
| Can someone with more knowledge of Chemistry enlighten me why chemistry experimentation isn't the killer app for the Metaverse, at least for low-order reactions? I know that e.g. the protein folding class of problems is prohibitively computationally expensive, but surely there's some low hanging fruit?
| photochemsyn wrote:
| If you're talking about computational modeling of chemical reactions, for example getting a computer to figure out a novel low-cost synthesis route for an important molecule, well... This becomes incredibly complicated very quickly. It's generally more likely to get a result using the traditional experimental methods, with some exceptions for very small molecules perhaps.
|
| The field of physical inorganic/organic chemistry is one of the more difficult ones to build accurate models for. A first step is to calculate the electronic structure of products, reactants, possible intermediates, and this blows up fast for even moderately complex molecules. A lot of work has been done with simpler systems like 2 H2O -> 2 H2 + O2 but even that's ridiculously complicated, as you have to model the catalyst and the surrounding environment as well, and then get the kinetic model right.
| The computational power required is on the supercomputer scale, and the level of background knowledge required is pretty high to even start to implement something like that, for a taste see:
|
| https://h2awsm.org/capabilities/dft-and-ab-initio-calculatio...
|
| This is an area where quantum computers may have applications (2021):
|
| https://www.energy.gov/science/ascr/articles/quantum-computi...
| ur-whale wrote:
| This kind of endeavor should be a common theme to all science, not just chemistry.
| shpongled wrote:
| It's certainly a goal to work towards. However, it's pretty difficult to build One ELN to Rule Them All given how flexible many kinds of biological experimental designs are - especially when you're working on the bleeding edge.
|
| A good first step is to require that supplemental materials be published in a machine readable format (e.g. not manually thrown together Excel files that lack any kind of normalization or rational schema)
| ur-whale wrote:
| But then there are things like GPT-3, which means stashing everything in a rigid schema isn't as hard-core of a requirement as it used to be.
|
| OTOH, facilitating:
|
| 1. access to the raw data
| 2. access to the metadata
| 3. access to the source code of whatever software was used / created to run the experiment
| 4. making sure everything is computer readable (i.e. not a 256x128 graph as a PNG embedded in a bloody PDF)
|
| should be a requirement for any scientific publication worth its salt.
| abraxaz wrote:
| > it's pretty difficult to build One ELN to Rule Them All given how flexible many kinds of biological experimental designs are - especially when you're working on the bleeding edge.
|
| RDF is quite flexible and using a combination of domain specific ontologies like cheminf[1] and other top level ontologies like BFO[2] should allow you to capture most of the semantics.
|
| [1]: https://www.ebi.ac.uk/ols/ontologies/cheminf
| [2]: https://en.wikipedia.org/wiki/Basic_Formal_Ontology?wprov=sf...
| apienx wrote:
| "Alchemists turned into chemists when they stopped keeping secrets." -- Eric S. Raymond
|
| Open Science (in the publishing sense) used to be fringe just a decade ago. It's very much mainstream now.
|
| Open Data will be a much tougher (and long-term) battle, but it's inevitable.
| photochemsyn wrote:
| The notion of open-source scientific discovery is a good one, but some of the suggestions here seem very unlikely to gain much traction, and even if they do, problems will remain.
|
| For example, say an academic chemical research group synthesizes a series of novel compounds in the lab - they're not going to just release the raw data on everything they did immediately. The thinking might be, 'we can give this MS student this compound to work out a better synthesis route for, or this PhD student can try to extend the synthesis and make other compounds'.
|
| A more realistic scenario mentioned in the article would be to require publication of the raw data to a database as a condition of publication. This is already done to some extent in journals, but materials and methods sections are notorious for leaving out some key factor or other, meaning repeatability is an issue and other labs will generally only try to replicate the more interesting results (possible new antibiotic, etc.).
|
| This worked out fairly well with GenBank, the database of published gene sequences, and also with the protein crystallography databases, but everyone in the molecular biology world knows that all sequence data is not of the same quality, and so cross-referencing by the more reputable researchers and reading their papers to see if their methods are transparent and robust or not is still an important step.
| A database clogged with low-quality data isn't as valuable as a more carefully curated one, certainly.
|
| It would be nice though, to have a database where you could look up everything there is to know about something like the antibiotic ciprofloxacin, including all the spectral identification data, optimal reaction conditions, etc. - but this is also a molecule that researchers are busy making derivatives of, likely with the hopes of patenting some novel knockoff and getting an exclusive license distribution deal with a major pharma corp, and so they won't be releasing any data, or even publishing in a timely manner (at least not until the patent application goes through, and maybe not even then).
|
| That leads to a controversial question: should research universities and academics financed by taxpayers behave like for-profit startups pitching to a VC outfit?
| statuslover9000 wrote:
| For chemical reaction prediction, see the Open Reaction Database, a collaboration including the Coley lab at MIT (surprisingly not cited by OP):
|
| Paper: https://pubs.acs.org/doi/10.1021/jacs.1c09820
|
| Docs: https://docs.open-reaction-database.org/en/latest/overview.h...
|
| It's an incredible effort to collate and clean this data, and even then a substantial portion of it will not be reproducible due to experimental variability or outright errors.
|
| For computational methods development it's extremely useful, maybe even necessary, to have a substantial amount of money and one's own lab space to collect new data and experimentally test prospective predictions under tightly controlled conditions. The historical data is certainly useful but is not a panacea.
| mlinksva wrote:
| Relatedly (and also not citing) from a couple weeks ago https://news.ycombinator.com/item?id=31566200 Call for a Public Open Database of All Chemical Reactions
| RationPhantoms wrote:
| It would be wonderful to see something like the Materials Project (https://materialsproject.org/) but for Chemical research/knowledge.
| JPLeRouzic wrote:
| Can someone in the field explain how this "machine actionable" would be different from Galaxy Pipeline [0], or a Chemputer [1]?
|
| [0] https://en.wikipedia.org/wiki/Galaxy_(computational_biology)
|
| [1] https://www.chem.gla.ac.uk/cronin/news/cronin-group-builds-c...
___________________________________________________________________
(page generated 2022-06-17 23:01 UTC)