[HN Gopher] The arXiv of the future will not look like the arXiv
___________________________________________________________________

Author : dginev
Score  : 71 points
Date   : 2022-03-28 19:14 UTC (3 hours ago)

(HTM) web link (ar5iv.labs.arxiv.org)
(TXT) w3m dump (ar5iv.labs.arxiv.org)

| gsvclass wrote:
| At https://42papers.com/ we want to get more folks reading papers. Our focus is on surfacing papers from arXiv that our community would appreciate, so we focus on trending papers, improving readability, etc.

| wolverine876 wrote:
| > the PDF is not a format fit for sharing, discussing, and reading on the web. PDFs are (mostly) static, 2-dimensional and non-actionable objects. It is not a stretch to say that a PDF is merely a digital photograph of a piece of paper.
|
| It is too far a stretch, murdering the poor subject:
|
| PDFs are the best format available for long-term information, such as research papers. They have the advantages of digital data: Searchable, copy-able, transmittable, and data is extractable. They are also an open format, don't rely on a central service to be available, and they preserve presentation across platforms. They have metadata, and are annotatable and reviewable. And the PDF format is the best for long-term preservation, carefully designed to be readable in 50 years - partly because they preserve presentation across platforms - and that includes the metadata, annotations, and reviews.
|
| PDFs _are_ like paper in that they will look the same 50 years from now as they do today, unlike (almost?) any other digital format.
|
| Yes, I wish they were a bit more dynamic in layout, and that the text was more cleanly extracted.

| chaxor wrote:
| One thing I would love to see from the arxiv sites is a publicly available download of an SQLite database*. They have a bunch of PDFs, and latex source - but the real killer would be a database with just the text for each section, and then the ability to _generate_ the pdf, using various different styles. This would save an enormous amount of space, and make things far more tidy. I suppose the images could be stored in the SQLite as blobs, but there's probably a better way with vector dbs or something.
|
| That's what the future will probably look like. With the SQLite decentralized on IPFS or torrent, where only queries get stored on each computer, making more popular queries faster to load (more peers).
|
| *(or maybe an archive of tons of zstd parquets for each table? - Not sure what the best way to organize several tables in parquet is yet)

| enriquto wrote:
| > This would save an enormous amount of space, and make things far more tidy.
|
| Why? The output pdf is typically smaller than the input that produces it. Using rendered pdfs seems simple and very natural, and at worst can use twice the total amount of space.

| azangru wrote:
| > The arXiv of the future is format-neutral and separates format from content.
|
| Didn't this use to be Latex's tagline? Separate format from content. Which the authors of the article don't find separate enough.
|
| How does the proper separation of format from content even work? Don't you need to mark up your content in order for it to become formatted?

| ssivark wrote:
| It's fairly well separated from the perspective of being able to write content fairly agnostic of a presentation template, and then swap in the required publication template in the end (with a few cosmetic tweaks very occasionally).
|
| But LaTeX is largely an extension of TeX, and these markup languages seem not very amenable to re-implementing parsing / automated processing (given numerous attempts that have resulted in stalemates).

| lazyjeff wrote:
| I read this article the other day, "There are four schools of thought on reforming peer review" [1], about how there are four schools of thought on how to reform publishing and peer review. Each of them independently is fairly well received and makes sense in itself, at least among my academic circles. However, there are tensions between them, so it's hard to come up with a solution that's universally satisfying to even the majority of stakeholders.
|
| This article about ArXiv is clearly in the "Democracy and Transparency school" as categorized in that article, but it doesn't yet address the other three camps. The arxiv article proposes machine-readable semantics, easier sharing and discoverability, papers + supplementary materials + reviews all open; this floods the world with even more publications with varying quality, so it's even harder to identify good quality work; and when things can be more easily aggregated by machines and measured with the alternative metrics proposed, it often leads to a more powerful winner-takes-all system that can be gamed (there's now a subtle game of increasing citations that appear on Google Scholar); finally, with an increase in submissions and materials that go along with submissions, it puts an even greater strain on the review system. These problems are not unsolvable, but almost every idea I've seen proposed so far has only been in a single camp, and there are side effects that harm the goals of the other three camps. So I'd love to see more ideas that balance the interests of all four camps that want to reform peer review and publishing.
|
| [1]: https://blogs.lse.ac.uk/impactofsocialsciences/2022/03/24/th...

| curiousgal wrote:
| I don't usually read long articles on my phone but the design of that page on my Pixel 6 was just so perfect! I hope this becomes the norm!

| periheli0n wrote:
| This is precisely their point. Reading the usual Arxiv-PDF on a phone is a pain, even if you just want to glance at some key parts of the text. Their version is much, much better. It's self-promotion by the Authorea team on the platform they are competing with (ArXiv), but they have a point.
|
| Arxiv needs to go HTML.

| stncls wrote:
| But the article link is arXiv's own (admittedly experimental) HTML5 viewer!! And your parent comment is praising it.

| bee_rider wrote:
| This seems... ambitious.
|
| I think ArXiv (edit: _Actually this is not by ArXiv, but some other group_) is drastically over-estimating the desire to submit papers to their service. They are popular because they host the documents you were going to produce, in the format that the journals expect. The production of an Arxiv-appropriate document is a side effect of the actual job, which is writing a paper to submit to a journal (hey, I'm as unhappy as you are that this is the actual job, but everyone hates publish-or-perish, if it could be overthrown it would have been).
|
| "Getting academics to act in a way that is not directly in their self-interest because they just love sharing information" is usually a pretty safe bet, but I think this would be a bit too far. Unless ArXiv can somehow get journals to expect their format (good luck!) I think this is going to be hard.

| stncls wrote:
| The article is not at all by the arXiv people. This is just a paper submitted to arXiv (about arXiv). The confusion is understandable, because the link is to arXiv's experimental HTML5 viewer, not the usual format (which would be: https://arxiv.org/abs/1709.07020).
|
| The authors are from Authorea.com, a for-profit that wants to replace arXiv.
|
| Edit: Aside from that, fully agree with you. Good luck to them.

| bee_rider wrote:
| Ah, thanks for the correction, that really changes things!

| 0lmer wrote:
| I'm still wondering about a service that would be to arXiv what Github became to Sourceforge. Order of magnitude improvement of collaboration and interconnection between published materials.

| tempnow987 wrote:
| "sharing research via PDF must inevitably come to an end."
|
| Maybe instead of using the obsolete toolset arxiv provides, they could host their groundbreaking research on their own platform? The combination of groundbreaking features and insightful commentary would draw users?
|
| Actually, many of the negatives they list are positives in my book. The latex barrier screens out a ton of garbage in my view - I'm on some social science / word based research lists, and the quality of stuff is mind-bogglingly bad.
|
| Getting stuff to fit into a PDF (instead of the NY times new scrollable story stuff) makes grabbing or printing off or even reading easy - less dynamic is good in my book.

| kkfx wrote:
| A small proposal: why not a PopcornTime of papers? Which means a distributed network (no matter if BitTorrent, ZeroNet, GNUNet, I2P or something else) to publish? That's the best freedom guarantee, and just the mere number of nodes with a paper is a good metric of its popularity. To avoid oblivion each uni/researcher can easily store and serve their own papers forever: files are small, so download is quick, and not many resources are needed.

| PeterisP wrote:
| What problems does it solve for the authors? The features you describe above don't seem a problem in the current solutions. Freedom and availability are a non-issue for authors. "to avoid oblivion each uni/researcher can easily store and serve their own papers forever" is a flaw not a feature (there are already far too many ways to do that, which only add extra burden to the authors if they want to "be everywhere" for the sake of availability). It doesn't seem that it would be easier than the current way; the resources/effort needed would be small but non-zero, so it sounds like just an extra annoyance, not something beneficial.
|
| And if it solves some problems for someone else but not the authors, then how would a comprehensive majority of papers enter the system? Papers are even less interchangeable than movies; if you want to have a particular movie and it isn't available on PopcornTime, you might watch something else, but for papers you just have to go elsewhere that actually does have everything.

| wcerfgba wrote:
| Readers may find the Octopus project interesting:
|
| > Designed to replace journals and papers as the place to establish priority and record your work in full detail, Octopus is free to use and publishes all kinds of scientific work, whether it is a hypothesis, a method, data, an analysis or a peer review.
|
| > Publication is instant. Peer review happens openly. All work can be reviewed and rated.
|
| > Your personal page records everything you do and how it is rated by your peers.
|
| > Octopus encourages meritocracy, collaboration and a fast and effective scientific process.
|
| > Created in partnership with the UK Reproducibility Network.
|
| https://science-octopus.org/

| akvadrako wrote:
| It's fascinating to imagine what the arxiv of the future would look like.
|
| I imagine all scientific publications available on a distributed block store, including raw emails, data and notes on a voluntary basis.
|
| Stuff that could be published would include reviews, corrections in version control fashion, and enough metadata to model scientific progress.
|
| What this article is describing sounds reasonable but not game changing.

| stncls wrote:
| The authors first list some issues with arXiv. Next, they describe how to fix those issues. Then the good news arrives: this improved arXiv already exists. It's called Authorea.com. All three authors are Authorea.com employees. They do disclose it as their affiliation. Still, this is essentially an ad written in LaTeX.
|
| They correctly point out a few of the limitations of arXiv (mostly: static LaTeX and PDFs). But I profoundly dislike the other things they propose:
|
| 1. "open comments and reviews". I have no problem with open reviews on a third-party website, but arXiv is literally a "distribution service". It has one job and does it pretty well. I don't want it to turn into Reddit or (worse?) ResearchGate.
|
| 2. "alternative metrics". Enough with the metrics already. We all know they're destructive, at least all that have been tried so far. I didn't even know that arXiv showed some bibliometrics (because they are _thankfully_ hidden behind default-disabled switches). Their proposed alternatives? "How many times a paper has been downloaded, tweeted, or blogged." I am not joking, this is what they propose to include in addition to citations. Seriously???
|
| PS: Just a heads-up to anyone who, like me, would be wondering about the ar5iv.labs.arxiv.org link. The article is a regular paper submitted to arXiv. The authors do not belong to the organization maintaining arXiv. The usual link is: https://arxiv.org/abs/1709.07020
|
| The ar5iv.labs.arxiv.org thing is an experimental html5 paper viewer by the arXiv people.
|
| Edit: typos.

| jimhefferon wrote:
| Thanks. It was not clear to me whether this is a white paper by the arXiv people, or talk by external folks.
|
| I now see that Wikipedia says this.
|
| _Authorea was launched in February 2013 by co-founders Alberto Pepe and Nathan Jenkins and scientific adviser Matteo Cantiello, who met while working at CERN. They recognized common difficulties in the scholarly writing and publishing process. To address these problems, Pepe and Jenkins developed an online, web-based editor to support real-time collaborative writing, and sharing and execution of research data and code. Jenkins finished the first prototype site build in less than three weeks.
|
| Bootstrapping for almost two years, Pepe and Jenkins grew Authorea by reaching out to friends and colleagues, speaking at events and conferences, and partnering with early adopter institutions.
|
| In September 2014, Authorea announced the successful closure of a $610K round of seed funding with the New York Angels and ff Venture Capital groups. In January 2016, Authorea closed a $1.6M round of funding led by Lux Capital and including the Knight Foundation and Bloomberg Beta. It later acquired the VC-backed company The Winnower.
|
| In 2018 Authorea was acquired for an undisclosed amount by Atypon (part of Wiley)._

| sdenton4 wrote:
| I don't really see how a for-profit preprint service is desirable, given the terrible track record of other for-profit entities in academic publishing. The extra features will be great until the gatekeeping kicks in after the first missed funding round...

| einpoklum wrote:
| They lost me at suggesting that a future ArXiv should be
|
| > Web-native and web-first
|
| Absolutely not. It should be "physical paper first". Any long-term archiving cannot rely on electrical devices for viewing archived material. Electrical grids fail. Technology changes. Even if ArXiv is not a print archive, the material in it must be, first and foremost, printable in a consistent manner, and with the authors targeting the physical printed form. Of course, one would need to actually print ArXiv items to physically archive them, but still.
|
| Now, of course archiving data is useful and important; and large amounts of data are less appropriate for print archiving. But that should always be secondary to the archiving of knowledge.
___________________________________________________________________
(page generated 2022-03-28 23:00 UTC)