[HN Gopher] The arXiv of the future will not look like the arXiv
       ___________________________________________________________________
        
       The arXiv of the future will not look like the arXiv
        
       Author : dginev
       Score  : 71 points
       Date   : 2022-03-28 19:14 UTC (3 hours ago)
        
 (HTM) web link (ar5iv.labs.arxiv.org)
 (TXT) w3m dump (ar5iv.labs.arxiv.org)
        
       | gsvclass wrote:
       | At https://42papers.com/ we want to get more folks reading papers.
       | Our focus is on surfacing papers from arXiv that our community
       | would appreciate, so we focus on trending papers, improving
       | readability, etc.
        
       | wolverine876 wrote:
       | > the PDF is not a format fit for sharing, discussing, and
       | reading on the web. PDFs are (mostly) static, 2-dimensional and
       | non-actionable objects. It is not a stretch to say that a PDF is
       | merely a digital photograph of a piece of paper.
       | 
       | It is too far a stretch, murdering the poor subject:
       | 
       | PDFs are the best format available for long-term information,
       | such as research papers. They have the advantages of digital
       | data: Searchable, copy-able, transmittable, and data is
       | extractable. They are also an open format, don't rely on a
       | central service to be available, and they preserve presentation
       | across platforms. They have metadata, and are annotatable and
       | reviewable. And the PDF format is the best for long-term
       | preservation, carefully designed to be readable in 50 years -
       | partly because they preserve presentation across platforms - and
       | that includes the metadata, annotations, and reviews.
       | 
       | PDFs _are_ like paper in that they will look the same 50 years
       | from now as they do today, unlike (almost?) any other digital
       | format.
       | 
       | Yes, I wish they were a bit more dynamic in layout, and that the
       | text was more cleanly extracted.
        
       | chaxor wrote:
       | One thing I would love to see from the arxiv sites is a publicly
       | available download of an SQLite database*. They have a bunch of
       | PDFs, and latex source - but the real killer would be a database
       | with just the text for each section, and then the ability to
       | _generate_ the pdf, using various different styles. This would
       | save an enormous amount of space, and make things far more tidy.
       | I suppose the images could be stored in the SQLite as blobs, but
       | there's probably a better way with vector dbs or something.
       | 
       | That's what the future will probably look like. With the SQLite
       | decentralized on IPFS or torrent, where only queries get stored
       | on each computer, making more popular queries faster to load
       | (more peers).
       | 
       | *(or maybe an archive of a ton of zstd parquets for each table?
       | - Not sure what the best way to organize several tables in
       | parquet is yet)
        
         | enriquto wrote:
         | > This would save an enormous amount of space, and make things
         | far more tidy.
         | 
         | Why? The output pdf is typically smaller than the input that
         | produces it. Using rendered pdfs seems simple and very natural,
         | and at worst can use twice the total amount of space.
        
       | azangru wrote:
       | > The arXiv of the future is format-neutral and separates format
       | from content.
       | 
       | Didn't this use to be Latex's tagline? Separate format from
       | content. Which the authors of the article don't find separate
       | enough.
       | 
       | How does the proper separation of format from content even work?
       | Don't you need to mark up your content in order for it to become
       | formatted?
        
         | ssivark wrote:
         | It's fairly well separated from the perspective of being able
         | to write content fairly agnostic of a presentation template,
         | and then swap in the required publication template in the end
         | (with a few cosmetic tweaks very occasionally).
         | 
         | But LaTeX is largely an extension of TeX, and these markup
         | languages seem not very amenable to re-implementing parsing /
         | automated processing (given numerous attempts that have
         | resulted in stalemates).
        
       | lazyjeff wrote:
       | I read this article the other day, "There are four schools of
       | thought on reforming peer review" [1], about the four schools of
       | thought on how to reform publishing and peer review. Each of
       | them is independently fairly well received and makes sense in
       | itself, at least among my academic circles.
       | However, there are tensions between them, so it's hard to come up
       | with a solution that's universally satisfying to even the
       | majority of stakeholders.
       | 
       | This article about ArXiv is clearly in the "Democracy and
       | Transparency school" as categorized in that article, but it doesn't
       | yet address the other three camps. The arxiv article proposes
       | machine-readable semantics, easier sharing and discoverability,
       | papers + supplementary materials + reviews all open; this floods
       | the world with even more publications with varying quality, so
       | it's even harder to identify good quality work; and when things
       | can be more easily aggregated by machines and measured with the
       | alternative metrics proposed, it often leads to a more powerful
       | winner-takes-all system that can be gamed (there's now a subtle
       | game of increasing citations that appear on Google Scholar);
       | finally, with an increase in submissions and materials that go
       | along with submissions, it puts an even greater strain on the
       | review system. These problems are not unsolvable, but almost
       | every idea I've seen proposed so far has only been in a single
       | camp, and there are side effects that harm the goals of the other
       | three camps. So I'd love to see more ideas that balance the
       | interests of all four camps that want to reform peer review and
       | publishing.
       | 
       | [1]:
       | https://blogs.lse.ac.uk/impactofsocialsciences/2022/03/24/th...
        
       | curiousgal wrote:
       | I don't usually read long articles on my phone but the design of
       | that page on my Pixel 6 was just so perfect! I hope this becomes
       | the norm!
        
         | periheli0n wrote:
         | This is precisely their point. Reading the usual Arxiv-PDF on a
         | phone is a pain, even if you just want to glance at some key
         | parts of the text. Their version is much, much better. It's
         | self-promotion by the Authorea team on the platform they are
         | competing with (ArXiv), but they have a point.
         | 
         | Arxiv needs to go HTML.
        
           | stncls wrote:
           | But the article link is arXiv's own (admittedly experimental)
           | HTML5 viewer!! And your parent comment is praising it.
        
       | bee_rider wrote:
       | This seems... ambitious.
       | 
       | I think ArXiv (edit: _Actually this is not by ArXiv, but some
       | other group_ ) is drastically over-estimating the desire to
       | submit papers to their service. They are popular because they
       | host the documents you were going to produce, in the format that
       | the journals expect. The production of an ArXiv-appropriate
       | document is a side effect of the actual job, which is writing a
       | paper to submit to a journal (hey, I'm as unhappy as you are that
       | this is the actual job, but everyone hates publish-or-perish, if
       | it could be overthrown it would have been).
       | 
       | "Getting academics to act in a way that is not directly in their
       | self-interest because they just love sharing information" is
       | usually a pretty safe bet, but I think this would be a bit too
       | far. Unless ArXiv can somehow get journals to expect their format
       | (good luck!) I think this is going to be hard.
        
         | stncls wrote:
         | The article is not at all by the arXiv people. This is just a
         | paper submitted to arXiv (about arXiv). The confusion is
         | understandable, because the link is to arXiv's experimental
         | HTML5 viewer, not the usual format (which would be:
         | https://arxiv.org/abs/1709.07020).
         | 
         | The authors are from Authorea.com, a for-profit that wants to
         | replace arXiv.
         | 
         | Edit: Aside from that, fully agree with you. Good luck to them.
        
           | bee_rider wrote:
           | Ah, thanks for the correction, that really changes things!
        
       | 0lmer wrote:
       | I'm still wondering about a service that would be to arXiv what
       | Github became to Sourceforge. Order of magnitude improvement of
       | collaboration and interconnection between published materials.
        
       | tempnow987 wrote:
       | "sharing research via PDF must inevitably come to an end."
       | 
       | Maybe instead of using the obsolete toolset arxiv provides, they
       | could host their groundbreaking research on their own platform?
       | The combination of ground breaking features and insightful
       | commentary would draw users?
       | 
       | Actually, many of the negatives they list are positives in my
       | book. The latex barrier screens out a ton of garbage in my view -
       | I'm on some social science / word based research lists, and the
       | quality of stuff is mind bogglingly bad.
       | 
       | Getting stuff to fit into a PDF (instead of the NY times new
       | scrollable story stuff) makes grabbing, printing off, or even
       | reading easy - less dynamic is good in my book.
        
       | kkfx wrote:
       | A small proposal: why not a PopcornTime of papers? Which means a
       | distributed network (no matter if BitTorrent, ZeroNet, GNUNet,
       | I2P or something else) to publish? That's the best freedom
       | guarantee and just the mere number of nodes with a paper is a
       | good metric about its popularity, to avoid oblivion each
       | uni/researcher can easily store and serve their own papers
       | forever: files are small, so download is quick, not many
       | resources are needed.
        
         | PeterisP wrote:
         | What problems does it solve for the authors? The features you
         | describe above don't seem a problem in the current solutions;
         | freedom and availability is a non-issue for authors, "to avoid
         | oblivion each uni/researcher can easily store and serve their
         | own papers forever" is a flaw not a feature (there are already
         | far too many ways to do that, which only add extra burden to
         | the authors if they want to "be everywhere" for the sake of
         | availability), it doesn't seem that it would be easier than the
         | current way; the resources/effort needed would be small but
         | non-zero, so it sounds like just an extra annoyance, not
         | something beneficial.
         | 
         | And if it solves some problems for someone else but not the
         | authors, then how would a comprehensive majority of papers
         | enter the system? Papers are even less interchangeable than
         | movies; if you want to have a particular movie and it isn't
         | available on PopcornTime, you might watch something else, for
         | papers you just have to go elsewhere that actually does have
         | everything.
        
       | wcerfgba wrote:
       | Readers may find the Octopus project interesting:
       | 
       | > Designed to replace journals and papers as the place to
       | establish priority and record your work in full detail, Octopus
       | is free to use and publishes all kinds of scientific work,
       | whether it is a hypothesis, a method, data, an analysis or a peer
       | review.
       | 
       | > Publication is instant. Peer review happens openly. All work
       | can be reviewed and rated.
       | 
       | > Your personal page records everything you do and how it is
       | rated by your peers.
       | 
       | > Octopus encourages meritocracy, collaboration and a fast and
       | effective scientific process.
       | 
       | > Created in partnership with the UK Reproducibility Network.
       | 
       | https://science-octopus.org/
        
       | akvadrako wrote:
       | It's fascinating to imagine what the arxiv of the future would
       | look like.
       | 
       | I imagine all scientific publications available on a distributed
       | block store, including raw emails, data and notes on a voluntary
       | basis.
       | 
       | Stuff that could be published would include reviews, corrections
       | in version control fashion, and enough metadata to model
       | scientific progress.
       | 
       | What this article is describing sounds reasonable but not game
       | changing.
        
       | stncls wrote:
       | The authors first list some issues with arXiv. Next, they
       | describe how to fix those issues. Then the good news arrives:
       | this improved arXiv already exists. It's called Authorea.com. All
       | three authors are Authorea.com employees. They do disclose it as
       | their affiliation. Still, this is essentially an ad written in
       | LaTeX.
       | 
       | They correctly point out a few of the limitations of arXiv
       | (mostly: static LaTeX and PDFs). But I profoundly dislike the
       | other things they propose:
       | 
       | 1. "open comments and reviews". I have no problem with open
       | reviews on a third-party website, but arXiv is literally a
       | "distribution service". It has one job and does it pretty well. I
       | don't want it to turn into Reddit or (worse?) ResearchGate.
       | 
       | 2. "alternative metrics". Enough with the metrics already. We all
       | know they're destructive, at least all that have been tried so
       | far. I didn't even know that arXiv showed some bibliometrics
       | (because they are _thankfully_ hidden behind default-disabled
       | switches). Their proposed alternatives?  "How many times a paper
       | has been downloaded, tweeted, or blogged." I am not joking, this
       | is what they propose to include in addition to citations.
       | Seriously???
       | 
       | PS: Just a heads-up to anyone who, like me, would be wondering
       | about the ar5iv.labs.arxiv.org link. The article is a regular
       | paper submitted to arXiv. The authors do not belong to the
       | organization maintaining arXiv. The usual link is:
       | https://arxiv.org/abs/1709.07020
       | 
       | The ar5iv.labs.arxiv.org thing is an experimental html5 paper
       | viewer by the arXiv people.
       | 
       | Edit: typos.
        
         | jimhefferon wrote:
         | Thanks. It was not clear to me whether this is a white paper by
         | the arXiv people, or a talk by external folks.
         | 
         | I now see that Wikipedia says this.
         | 
         |  _Authorea was launched in February 2013 by co-founders Alberto
         | Pepe and Nathan Jenkins and scientific adviser Matteo
         | Cantiello, who met while working at CERN. They recognized
         | common difficulties in the scholarly writing and publishing
         | process. To address these problems, Pepe and Jenkins developed
         | an online, web-based editor to support real-time collaborative
         | writing, and sharing and execution of research data and code.
         | Jenkins finished the first prototype site build in less than
         | three weeks.
         | 
         | Bootstrapping for almost two years, Pepe and Jenkins grew
         | Authorea by reaching out to friends and colleagues, speaking at
         | events and conferences, and partnering with early adopter
         | institutions.
         | 
         | In September 2014, Authorea announced the successful closure of
         | a $610K round of seed funding with the New York Angels and ff
         | Venture Capital groups. In January 2016, Authorea closed a
         | $1.6M round of funding led by Lux Capital and including the
         | Knight Foundation and Bloomberg Beta. It later acquired the VC-
         | backed company The Winnower.
         | 
         | In 2018 Authorea was acquired for an undisclosed amount by
         | Atypon (part of Wiley)._
        
           | sdenton4 wrote:
           | I don't really see how a for-profit preprint service is
           | desirable, given the terrible track record of other for-
           | profit entities in academic publishing. The extra features
           | will be great until the gatekeeping kicks in after the first
           | missed funding round...
        
       | einpoklum wrote:
       | They lost me at suggesting that a future ArXiv should be
       | 
       | > Web-native and web-first
       | 
       | Absolutely not. It should be "physical paper first". Any long-
       | term archiving cannot rely on electrical devices for viewing
       | archived material. Electrical grids fail. Technology changes.
       | Even if ArXiv is not a print archive, the material in it must be,
       | first and foremost, printable in a consistent manner, and with
       | the authors targeting the physical printed form. Of course, one
       | would need to actually print ArXiv items to physically archive
       | them, but still.
       | 
       | Now, of course archiving data is useful and important; and large
       | amounts of data are less appropriate for print archiving. But
       | that should always be secondary to the archiving of knowledge.
        
       ___________________________________________________________________
       (page generated 2022-03-28 23:00 UTC)