[HN Gopher] Dozens of scientific journals have vanished from the... ___________________________________________________________________ Dozens of scientific journals have vanished from the internet Author : bookofjoe Score : 298 points Date : 2020-09-09 16:19 UTC (6 hours ago) web link (www.sciencemag.org) | ramshorns wrote: | > The authors defined a vanished journal as one that published at | least one complete volume as immediate OA, and less than 50% of | its content is now available for free online. | | Well, this exact definition could have some false positives, like | a journal that publishes every third volume as complete open | access and keeps the others behind a paywall. But I'm sure they | were a bit more careful than it says here. | l_matthia wrote: | Yes. We checked that all journals were full OA journals (so | nothing like the scenario you just described here). So the | timeline looked like this: OA journal was actively publishing > | then became inactive, but the content on the journal website | was still accessible > and eventually the website and the | content disappeared/became inaccessible. | | In some cases, we found websites (other than the original | journal website) that now host some individual issues, but not | all of the content. | bnewbold wrote: | At the Internet Archive, we are working on this exact problem, | and have been in communication with the pre-print's authors. We | have built open infrastructure (open source, open data) tracking | "preservation coverage", for example: | | https://fatcat.wiki/coverage/search?q=is_oa%3Atrue+year%3A%3... | | and are working to improve crawling. There is a "save paper now" | feature, as well as an API for bots. Organizations like DOAJ, | ISSN, DOI registrars (Crossref, Datacite, others) are crucial for | this.
In the broader ecosystem, we hope this can complement | existing efforts that partner with large publishers (like LOCKSS, | Portico, JSTOR) and institutional repositories. A natural niche | for us is web-native (HTML) content, which we have crawled a lot | of but are just getting started indexing. For example, | publications like d-lib, first monday, and distill.pub. | | If folks want to help, it would be great to have a "youtube-dl | for open access papers". There is a lot of content on large | platforms and publishers which have anti-crawling measures (even | for gold OA and hybrid content!), as well as a long tail of small | publishers that don't use simple/common mechanisms like OAI-PMH | and the `citation_pdf_url` HTML meta tag to identify fulltext | content. The OAI-PMH ecosystem sadly is not very complete or | helpful for the use case of mirroring. | pintxo wrote: | The predigital and national solution are laws requiring a copy | to be sent to the national library. | | What's the digital and post-national solution? | ghaff wrote: | In the US, the mandatory deposit requirement likely worked | pretty well with traditional book/magazine publishers and music | labels. Outside of that, I expect a huge amount slipped through | the cracks. There's no real enforcement AFAIK and I imagine | most who are independently publishing or otherwise working | outside of conventional channels don't deposit. | jumelles wrote: | It's not mandatory and not even a requirement for copyright | protection. | ghaff wrote: | It's independent of copyright registration. | | It is, as far as I can tell, mandatory in theory but not in | practice. https://www.copyright.gov/help/faq/mandatory_deposit.html#:~.... | Thlom wrote: | In Norway you are required to allow the national library to | archive your websites, but I assume there's a lot that's | slipping through the cracks ... | heldergg wrote: | Yes, the same applies to newspapers and magazines online. We
We | need a law demanding a legal digital deposit at least at the | national level. | | It is somewhat trivial to devise an API to be integrated in the | publications pipelines to automatically and transparently | submit new and modified articles to a central repository. | PeterisP wrote: | We are not (yet?) living in a post-national world, but the | simple digital solution is to require a digital copy to be sent | to the national library, as some nations have done; since | national libraries in any case are now all heavily working on | digitizing their pre-digital assets and making them available | online. | panic wrote: | Legalize Sci-Hub. | Jerry2 wrote: | This article reminded me to donate to Sci-Hub again. I feel like | donating to various archives is some of the best use for my | monthly donations budget. | dmix wrote: | archive.is always seems to be struggling to stay online, they | are well deserving too. It's a thankless job trying to work | around all of the pushback against archiving, like copyright | and whatnot. | | Having archives is so important in the legal field and plenty | of areas of research. | Jerry2 wrote: | > archive.is always seems to be struggling to stay online | | Very good point. I will donate to them too. I've been | donating to Archive.org for a long time but I use Archive.is | more often these days so they deserve some love too. | | I have a list of places where I make donations to on my | profile page here. | homarp wrote: | The article does not discuss sci-hub unfortunately. | | list of the 176 vanished is here: | https://github.com/njahn82/vanished_journals/blob/master/Dis... | Aperocky wrote: | A list of 176 but it's an excel file, why? | l_matthia wrote: | Dataset is published on Zenodo (as .csv). | https://zenodo.org/record/4014076#.X1kj-rexVkw | dtgriscom wrote: | Link seems to be broken: fixed version may be | https://github.com/njahn82/vanished_journals/blob/master/dat... 
| waynecochran wrote: | What would be really useful is to know the average citation | count of these journals. | | As someone who hates to see this stuff disappear, there is | still a cynical person inside me that knows there is a glut of | journals that are often used to bump publication counts for | professors trying to get tenure. | | That cynic inside of me also realizes that a subset of the | journal business is a bit of a scam anyway since frequently | authors have to pay the journals to include their paper and | then the journals charge an exorbitant rate to get access. Have | you tried to buy a journal article lately ($35 for one paper!) | -- yeah, neither have I. | Lagogarda wrote: | True and true. "Sci journals" are everything but science | jszymborski wrote: | It's worth noting that citation counts are an increasingly | poor metric of paper quality (and always have been). | | There are multiple works showing that the rise of search | engines like Google Scholar has meant that researchers are | increasingly citing the same papers, because their searches | are all returning the same thing. | | Meanwhile, there are some "sleeper" papers that are super | relevant to a lot of works, offer great insight, but by | virtue of their low search ranking, never get cited. | | That's not to say there isn't a fair amount of unremarkable | research. It's just that it doesn't always correlate with | citation count. | umlautae wrote: | To possibly find "sleeper" papers, check out the "show | similar" feature of this arXiv mirror on condensed matter | physics. https://cond-mat.abbrivia.com Search for some | keywords and then surf by "show similar" on relevant | articles. | elcritch wrote: | Not to mention many of these OA journals were only created to | scam researchers. They'd create a fake front and pretend | they're a prominent journal. Can't recall them specifically | but there were a few news clips about them a few years back.
| HenryKissinger wrote: | Can people not try to make a scam out of everything?? | waynecochran wrote: | The problem is that almost anything that can be exploited | eventually will be. | wrkronmiller wrote: | I'm getting a 404 at that link, now. | Jaxkr wrote: | Let's be real here: was anything of value _really_ lost? Any | important work was likely cited, paraphrased, or duplicated | elsewhere. | | Anyone disagree? | amirkdv wrote: | Yes, disagree. | | 1. The importance of a piece of scholarly work need not be | immediately apparent to its contemporaries. | | 2. Being cited is a poor proxy for importance. | | 3. Work that is cited is rarely paraphrased or duplicated in a | meaningful way. | | 4. Paraphrased citations are a poor proxy for the canonical source; | papers are often cited incompletely or sometimes outright | inaccurately. | | 5. These are the fruits of people's labour. They spent days and | months producing them. To lose them, especially when digital | copies are so cheap, is an unnecessary disregard for said | labour. | jefft255 wrote: | I disagree. I cite important work all the time and you | definitely need the original paper. Paraphrasing isn't really | done enough to negate the need for the original paper, | otherwise what kind of lame plagiarized paper are you writing? | Citations only tell you where to look, and if that article is | gone then you're screwed. | | What do you mean by "duplicated elsewhere"? Sitting on some | scientist's hard drive doesn't count; it has to be | discoverable. That's what the issue is: when journals die, how | do we ensure that the papers are saved somewhere easily | searchable? | jmmcd wrote: | We don't always know what is important at the time it's | published, and there are many examples of this in the | literature. | | Meanwhile, people are working hard to preserve every Commodore | 64 game. | nurbl wrote: | Part of the point of citing another article is that you don't | then have to repeat all of it.
So if B is cited by A, and B is | no longer available, it's not really possible to read and | understand A either. And A likely used information in B to | justify some claim, which is now weaker. | amirkdv wrote: | I think there is a case to be made for a kind of "public utility" | infrastructure for the distribution and storage of scholarly work | given how | | 1. cheap it is, considering the size of the institutions that | produce and benefit from them. | | 2. absurdly broken the private publishing industry has become. | CydeWeys wrote: | I'm surprised that this doesn't exist already. There are many | hundreds of open access journals already, yet they're not | standardized on one internetworked system? They're all | implementing the basics of a PDF repository independently, and | poorly? Why?? | | Take arXiv and expand its scope and upload everything there. | Boom, problem effectively solved using mostly existing tools. | esfandia wrote: | Shouldn't university libraries be that public utility? | random_visitor wrote: | That's exactly what Library Genesis and Sci-Hub are. Assuming | you aren't expecting this public utility to be 100% lawful, | since the other parties involved here (universities for | instance) don't seem too keen on the idea of having their | work circulate openly. | Ma8ee wrote: | Why would the universities mind having their work circulated | for free? They don't make any money from the current system, | but pay exorbitant fees for journal subscriptions. | marcosdumay wrote: | > other parties involved here (universities for instance) | don't seem too keen on the idea of having their work | circulate openly | | Hum... What? | | Universities at worst don't care. Most really want their work | circulating and will do a lot of things to get it (many | useless things that miss the point, but well, that's how | people are). | | Universities could push it harder. But they are surely | pushing in the correct direction.
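On the "youtube-dl for open access papers" idea from bnewbold upthread: for publishers that do expose fulltext through the `citation_pdf_url` HTML meta tag, locating the PDF from an article landing page is a small parsing job. A minimal sketch using only Python's standard library, assuming you have already fetched the landing page HTML; fetching, rate limiting, and anti-crawling measures are out of scope, and `CitationPdfFinder` / `find_pdf_urls` are hypothetical names, not part of any existing tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CitationPdfFinder(HTMLParser):
    """Collects the content of <meta name="citation_pdf_url" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.pdf_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values may be None.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "citation_pdf_url" and attr_map.get("content"):
            self.pdf_urls.append(attr_map["content"].strip())


def find_pdf_urls(html: str, page_url: str) -> list[str]:
    """Return absolute fulltext URLs advertised via citation_pdf_url on a page."""
    finder = CitationPdfFinder()
    finder.feed(html)
    finder.close()
    # Resolve relative URLs against the article landing page's own URL.
    return [urljoin(page_url, url) for url in finder.pdf_urls]


page = '<head><meta name="citation_pdf_url" content="/articles/42.pdf"></head>'
print(find_pdf_urls(page, "https://journal.example.org/articles/42"))
```

Self-closing `<meta ... />` tags are also handled, because `HTMLParser`'s default `handle_startendtag` forwards to `handle_starttag`. Pages without the tag simply return an empty list, which is the "long tail" case a mirroring tool would need other heuristics for.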
| Lammy wrote: | > Universities at worst don't care. | | I wonder how aaronsw would feel about this statement. | afandian wrote: | There are archiving schemes in scholarly publishing such as | LOCKSS and CLOCKSS. Not saying that they apply in this case, but | YSK. | | https://en.m.wikipedia.org/wiki/LOCKSS | cycomanic wrote: | I can't really say this is a bad thing. The number of journals | has so massively exploded over the last 20 years that it is | pretty much impossible to follow all the literature anymore. I'm | not even counting the predatory OA journals (which I think make up the | majority of the list) but just looking at the big societies and | publishers creating ever more journals. | mensetmanusman wrote: | On the bright side, 50% of the vanished content was not | reproducible. | guerby wrote: | In France we have HAL: | | https://en.wikipedia.org/wiki/Hyper_Articles_en_Ligne | | "Hyper Articles en Ligne, generally shortened to HAL, is an open | archive where authors can deposit scholarly documents from all | academic fields." | | I work at a university and I know library people and management | check carefully that every paper we produce is deposited in HAL. | | Also: | | https://fr.wikipedia.org/wiki/Hyper_articles_en_ligne | | "Since 25 September 2018, software deposits on HAL | have been connected to Software Heritage" | | https://en.wikipedia.org/wiki/Software_Heritage | | For recruiting, some French institutions like CNRS will only | consider papers deposited in HAL when doing the evaluation. | BelleOfTheBall wrote: | I'm astonished there's not an equivalent for this literally | everywhere. Even if some articles seem not terribly important | at the moment, as progress marches on, many could become | relevant again, and losing access to those is simply | inexcusable. | mattkrause wrote: | There (mostly) is. | | Work funded by the NIH needs to end up in PubMed Central | within a year of publication. The NSF and DoE have similar | policies.
Unclassified DoD-funded work also needs to end up | in the Defense Technical Information Center. All of this | flows from a 2013 memo by John Holdren/OST entitled | "Increasing Access to Results of Federally Funded Science", | and, as far as I know, it hasn't been overturned. While this | doesn't formally cover everything, it comes pretty close, and | many journals now handle this automatically. | | The memo is here: https://obamawhitehouse.archives.gov/blog/2016/02/22/increas... | | Canada has a similar policy for Tri-Council funded research | (here: http://www.science.gc.ca/eic/site/063.nsf/eng/h_F6765465.htm...) but does not require specific repositories. | programLyrique wrote: | Actually, although it is mainly aimed at French researchers, | nothing prevents researchers not based in France from depositing | their papers there. | Thlom wrote: | In Norway any publication needs to be deposited in the national | archive by law. That includes scientific journals and in theory | even small publications distributed in a private setting if | it's a big enough group of people (I'm not sure of the | details). Not sure how it works for scientific work published | in foreign publications, but I assume it's sent to the national | archive as routine. | | However, most of the archive is not publicly accessible due to | copyright, privacy, etc. You can request access to specific | content both as a private person and as a researcher. | acomjean wrote: | In the US, for medical/biology papers, there's "The National Center for | Biotechnology Information" (NCBI). They store papers/abstracts | in a service called PubMed: | | https://pubmed.ncbi.nlm.nih.gov | | It works pretty well. Papers are submitted. They get a unique | id. Some are stored and accessible. (Some US funding sources | require publicly accessible papers.) | | NCBI has a ton of information. It's a pretty awesome resource.
| https://www.ncbi.nlm.nih.gov | | They even index submitted papers with a controlled | vocabulary of terms (within a month or two): | https://www.ncbi.nlm.nih.gov/mesh/ | mattkrause wrote: | Nearly _all_ federally-funded (and unclassified) research | needs to be publicly accessible, as per a 2013 policy memo. | thaumasiotes wrote: | In the US, I believe the Library of Congress fills a similar | role. | vram22 wrote: | IIRC, some years back I read that the Library even had a | full archive of all of Twitter, up to that date at least. | at-fates-hands wrote: | They changed their policy back in 2017: | | However, almost 12 years of tweets is still very cool. A | LOT of startups probably had quite a bit of material in | those early days of social media. | | _But today, the institution announced it will no longer | archive every one of our status updates, opinion threads, | and "big if true"s. As of Jan. 1, the library will only | acquire tweets "on a very selective basis."_ | | _The library says it began archiving tweets "for the | same reason it collects other materials -- to acquire and | preserve a record of knowledge and creativity for | Congress and the American people." The archive stretches | back to Twitter's beginning, in 2006._ | | _The institution says it will continue to preserve its | collection of tweets from the platform's first 12 years, | but indicates that it has yet to figure out exactly how | to make the archive public._ | | https://www.npr.org/sections/thetwo-way/2017/12/26/573609499... | bnewbold wrote: | In Latin America, the SciELO network has been very successful | at providing shared, low-cost, stable infrastructure for | digital journal hosting using state funding: | https://en.wikipedia.org/wiki/SciELO | jmmcd wrote: | Yes, and of course we have arXiv and friends, and sci-hub, and | researchers' and institutions' own pages.
| | But all this misses the point a little -- it is not just the | articles that should be preserved, but the journal itself, as a | collection of articles with metadata (including the fact that | it was collected in the journal), records of editorial boards, | editorial articles, etc. | hwbehrens wrote: | A similar service, arXiv[0], is used in other fields as well. | In fact, the parent study that we're discussing was itself | found _on_ arXiv. | | However, arXiv is (or aims to be) a supra-national | organization. Do you think it is preferable to have an | international standard repository for this knowledge like | arXiv, a network of federated, national systems such as HAL, or | both? | | ArXiv has historically sometimes found itself hard-up for | funding, so I think that a valid argument could be made for | both approaches. | | [0]: https://arxiv.org/ | guerby wrote: | HAL offers authors automatic transfer to a few open | archives: | | "Automatic transfer of documents to an international | open archive such as ArXiv or PubMed Central" | toxik wrote: | FYI, arxiv is for the pre-prints, i.e. the papers as they are | before peer review and publication in a proper journal. IEEE | and friends hold the copyright to the articles after that, | and generally do _not_ want you to publish "their" version. | | Which is strange, as you paid them to publish your article, | which was likely paid for by state funding -- aka tax money. | Ah, the academic racket is so beautiful. | tasogare wrote: | > For recruiting, some French institutions like CNRS will only | consider papers deposited in HAL when doing the evaluation. | | While I like HAL in general as a consumer, that policy is | terrible if true for people like me who started their academic | career abroad. One more reason not to go back, I guess. | programLyrique wrote: | You can add all articles afterwards (even though it can be | tedious).
| | There is actually a tool that is supported by HAL that makes | it easier to quickly add all your publications by just giving | your name: https://dissem.in/ | | For instance, it automatically fills in all the metadata for | your publications. | tgflynn wrote: | I would be very surprised if there wasn't some exception for | papers published abroad. Otherwise they would potentially be | ignoring a lot of information about candidates. | physicsguy wrote: | In the UK we have to deposit with the institution we were | working/studying at. It's a bit annoying, as there's no central | place to deposit. | LunaSea wrote: | French people using unknown French standards only, news at | 20h00. | waihtis wrote: | Your quote prompted me to read the Minitel Wikipedia entry, | and apparently there were still 10 million monthly connections | on it in 2009. Unfortunately it was retired in 2012; it would | have been interesting to see how it would fare today. | dddddaviddddd wrote: | Such as the metric system | hpfr wrote: | Ironically, the French time format is best in my opinion, | because some operating systems and services can't handle | colons in file names. ISO 8601 allows for Thhmmss.sss which | can be represented in filenames, but I'd rather use something | like 20h33m02.345 because it's much easier to read at a | glance than T203302.345, which looks like one decimal number. | OJFord wrote: | It appeals to me as an EE (by training) too. I've | occasionally slipped a 'PS2k3' or similar and had to | explain... | guerby wrote: | Protocol and metadata in HAL follow OAI-PMH: Open Archives | Initiative Protocol for Metadata Harvesting | | https://www.openarchives.org/pmh/ | | With a very long list of institutions following the same | standard. | peter303 wrote: | The Library of Congress should preserve them. | Kednicma wrote: | This sounds like apologia from a big closed publisher (AAAS) | explaining why open-access is supposedly bad.
See, sometimes | open-access journals fold, and when that happens, nobody knows | what happens to the articles. But they'd like you to ignore two | inconvenient facts: First, that traditional closed publishers | effectively lose _all_ articles by default by this metric! And | second, that the Internet Archive, itself open-access, was | essential to conducting the study in the first place! | isido wrote: | I have been involved in the same projects furthering open | access within Finnish universities as the corresponding author | has, and I think the aim of the study is not to make OA look bad, | but to make it better by finding its shortcomings and then | fixing them. | l_matthia wrote: | Exactly. We don't see OA as the problem. OA solves many | issues that exist with traditional publishing and also makes | it easier to preserve content in the first place. The problem | lies with decreasing library budgets, rising subscription | prices, and the fact that preservation services often are not suitable | for smaller OA journals. | | LOCKSS provides a free option for publishers to join, but | only accepts a limited number of OA publishers | (https://www.lockss.org/use-lockss/publishers). A couple | years ago the PKP launched their preservation service, which | we're really excited about as it also offers free | preservation (for OJS journals) and would help esp. those | smaller journals that otherwise couldn't afford to enroll | in preservation schemes. | mturmon wrote: | AAAS is a non-profit, and _Science_ subscriptions are really | cheap -- like US$100 for a year of a weekly magazine. They are | not money-grubbers, and they are very distinct from Springer, | Elsevier, and Nature in this regard. | | I think they legit view preservation of the scientific record | as within their purview. | mordae wrote: | This. | jmmcd wrote: | Counterpoint: we all know that "free" internet services are | flaky. E.g., many people prefer to pay for email, for reliability.
| This article introduces the perspective that OA journals are a | bit like ad-supported email. | | I don't really agree that traditional publishers lose all | articles by default by this metric. I think there is some value | in a reliable record of the journal itself, as more than the | sum of its parts. | | Still, my preferred solution for all of this is something like JMLR -- | very low-cost and open access, and it has reliability by virtue | of association with a top university, and prestige by virtue of | its editorial board (which becomes self-fulfilling). | jp1016 wrote: | I read an article about the Wayback Machine/archive.org on HN a few days | back. Will there be a copy on it? | l_matthia wrote: | We found that in some cases, some of the published articles are | archived through the Internet Archive! (Yay!) Unfortunately, | this doesn't amount to complete issues/volumes and seems to be | by chance. | toomuchtodo wrote: | If it's in SciHub, it's in Archive.org, just not accessible. | jrochkind1 wrote: | Say more? All of scihub is mirrored at archive.org? Where do | I find out more about this? | thomasballinger wrote: | A related project is fatcat.wiki | dan-robertson wrote: | I feel like a lot of this article is trying to make comparisons | between online-only open access journals and traditional closed | publishers. However, the paper the article is based on does not | collect any data about the latter and so there isn't any real | comparison to make. | | I don't think the solution is to move back towards the old model. | There are already lots of initiatives towards creating online | archives of academic work that may be piggybacked on. In | mathematics, perhaps the easiest way to set up an open access | journal is as an arxiv overlay journal where at the most basic | level each issue of the journal is a list of links to specific | versions of papers on the arxiv. This would likely be | archived sufficiently well.
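The overlay idea dan-robertson describes can be sketched in a few lines: an issue is just an ordered list of pinned arXiv versions. The `OverlayEntry` and `render_issue` names, the titles, and the identifiers below are hypothetical placeholders; the only real convention relied on is that arXiv serves a specific version of a paper at `https://arxiv.org/abs/<id>v<n>`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlayEntry:
    # Hypothetical record: which arXiv paper, and which refereed version of it.
    arxiv_id: str  # e.g. "2001.00001" (new-style) or "math/0601001" (old-style)
    version: int   # pinning a version means later revisions don't change the issue
    title: str

    def __post_init__(self) -> None:
        if self.version < 1:
            raise ValueError("arXiv versions start at v1")

    def url(self) -> str:
        # arXiv serves a specific version of a paper at /abs/<id>v<n>.
        return f"https://arxiv.org/abs/{self.arxiv_id}v{self.version}"


def render_issue(volume: int, issue: int, entries: list[OverlayEntry]) -> str:
    """Render an overlay-journal issue as a plain-text table of contents."""
    lines = [f"Volume {volume}, Issue {issue}"]
    for number, entry in enumerate(entries, start=1):
        lines.append(f"{number}. {entry.title} - {entry.url()}")
    return "\n".join(lines)


print(render_issue(1, 1, [OverlayEntry("2001.00001", 2, "Placeholder paper A"),
                          OverlayEntry("math/0601001", 1, "Placeholder paper B")]))
```

Pinning the version is the design choice that matters here: arXiv keeps earlier versions available, so the issue keeps pointing at exactly the refereed text even if the authors later post a revision, and preserving the journal reduces to preserving this small list plus the arXiv itself.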
| | For a traditional journal that shuts down to be archived, lots of | things need to happen: | | 1. Some library needs to pay some exorbitant fee to get physical | or (permanent, not SaaS-based) digital copies of the journal | | 2. That library needs to keep hold of that copy for the 100 years | or so until copyright expires | | 3. That library then needs to take the initiative to make its | copies available | | This seems like a harder process than finding some public domain | digital copy. And for a lot of journals, the only reason the | library gets copies is the bundling systems which | universities hate. | | I'm curious to know more about these journals which did vanish, | and what their quality is. If a predatory journal offers | open access and later disappears, would it be counted? | rektide wrote: | Welcome to our new dark-aged ultra-tech future. | dang wrote: | Please don't post unsubstantive comments here. | SamLicious wrote: | Yeah, same for Wikipedia | aksss wrote: | This isn't just a problem with scientific journals, but also | niche research/enthusiast journals. When a magazine goes under, | and the copyright holder is of a murky/unknown status, it's still too | dangerous to digitize and make available, which is a shame. ___________________________________________________________________ (page generated 2020-09-09 23:00 UTC)