[HN Gopher] ArXiv now offers papers in HTML format ___________________________________________________________________ ArXiv now offers papers in HTML format Author : programd Score : 454 points Date : 2023-12-21 18:34 UTC (4 hours ago) (HTM) web link (blog.arxiv.org) (TXT) w3m dump (blog.arxiv.org) | shrimpx wrote: | Since the article doesn't link to any example HTML article, | here's a random link: | | https://browse.arxiv.org/html/2312.12451v1 | | It's cool that it has a dark mode. I didn't see a toggle, but it | renders in the system mode. | | Overall, this will make arXiv a lot more accessible on mobile. | burkaman wrote: | And here's the PDF of the same paper for comparison: | https://arxiv.org/pdf/2312.12451.pdf | FredPret wrote: | The contrast is massive. I'm much more likely to read the | HTML version; that PDF is deeply off-putting in some | hard-to-define way. Maybe it's the two columns, or the font, or the | fact that the format doesn't adjust to fit different screen | sizes. | ForkMeOnTinder wrote: | Definitely the two columns for me. It's super annoying | skimming a paper and having to scroll down and back up | again in a zig-zag pattern. | mmis1000 wrote: | I think the consuming device matters. An iPad or computer | has a much wider screen. A one-column layout is too | wide for the average person to scan lines of text | quickly. | | It looks perfectly fine on a phone, though. A two-column | layout looks terrible on a smartphone; the text is too | small to read comfortably. | | It would probably be even better if you could flip pages left | and right like an e-book instead of scrolling, to locate | content faster. But the current design is good enough | IMO (compared to reading a PDF on a cellphone). | kjkjadksj wrote: | Just zoom the smartphone into one column. Problem solved. | mmis1000 wrote: | And then you have to scroll both up and down and left and | right, an even worse experience.
| tobias2014 wrote: | This is very interesting, because for me it's just the | opposite. In particular, the two-column layout is just more | readable and approachable for me. The PDF version also | allows for a presentation just as the authors intended. I | guess it's good that they offer both now. | kjkjadksj wrote: | The authors don't format the PDF; the editor does. | Authors probably sent a double-spaced Word document with | figures and tables in another file. | tonyg wrote: | In computer science, the usual case is that the author | fully formats the paper. | z2h-a6n wrote: | Not on arXiv (unless I'm much mistaken), which is a | preprint server, not a conventional journal. | | arXiv accepts various flavors of TeX, or PDFs not | produced by TeX [0], and automatically produces PDFs and | HTML where possible (e.g. if TeX is submitted). In the | case of the example paper under discussion, the authors | submitted TeX with PDF figures [1], and the PDF version | of the paper was produced by arXiv. The formatting was | mainly set by using REVTeX, which is a set of macros for | LaTeX intended for American Physical Society journals. | | [0] | https://info.arxiv.org/help/submit/index.html#formats-for-te... [1] https://arxiv.org/format/2312.12451 | smartmic wrote: | FWIW, I recently learned that it is also possible to | produce nice PDF papers with GNU roff (groff); have a | look at this example: https://github.com/SudarsonNantha/LinuxConfigs/blob/master/.... | frocmlol wrote: | You are very confidently wrong. | | On arXiv you use LaTeX and do everything yourself. | There is no editor. | cozzyd wrote: | You typically send a .tar.gz of TeX files (and figures, | .bbl, etc.) to the journal.
And then you typically upload | something very similar to arXiv (I have an "arxivify" | Makefile target for my papers that handles some arXiv | idiosyncrasies, like requiring all figures to be in the | same folder as the .tex file, and it also clears all the | comments; sometimes you can find amusing things in source | file comments for some papers). | | Some fields may use Word files, but in most of physics | you would get laughed at... | | It is true that most journals will typically reformat | your .tex in a different way than is displayed on | arXiv. | eigenket wrote: | You are completely wrong. ArXiv doesn't work like that. | JumpCrisscross wrote: | Do you work extensively with LaTeX? | | Two columns is good, albeit annoying on mobile. But the | font. The typeface kills me, and almost every | LaTeX-generated document sports it. | saurik wrote: | Hilariously, I would probably tolerate the HTML version a | lot better if it had the font from the PDF (and FWIW, the | answer for me is "no: I don't work with LaTeX at all... I | just read a lot of papers"). | cozzyd wrote: | Hating on Computer Modern (OK, probably now Latin Modern) | is something close to blasphemy. | kjkjadksj wrote: | If you read a lot of papers in your line of work, you will | quickly appreciate the two columns and justification. | FredPret wrote: | Admittedly, I don't read research papers. But with HTML, | surely the choice between one or two columns is a | checkbox away. | IlliOnato wrote: | Which checkbox? | | I cannot find anything relevant in any of the 3 browsers | I use (Vivaldi, Firefox, Chrome). I would really | appreciate this option. | | A quick search turned up some apparently unmaintained browser | extensions, and that's it. | FredPret wrote: | No, I'm saying there _should_ be a checkbox. That way, | you could switch between two columns formatted like LaTeX, | with that font they always use, and one column with | Helvetica / Arial.
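cozzyd mentions an "arxivify" Makefile target that, among other things, clears all the comments before upload. As a rough sketch of just that one step, here is a small Python function. It is not cozzyd's script or any arXiv tool: the name `strip_tex_comments` is hypothetical, and it assumes the source has no verbatim/listings environments or `\url{...}` arguments, where `%` is literal text:

```python
def strip_tex_comments(source: str) -> str:
    """Remove LaTeX % comments while keeping escaped \\% signs.

    Assumption: the file has no verbatim/listings environments or
    \\url{...} arguments, where % is literal text, not a comment.
    """
    cleaned_lines = []
    for line in source.splitlines():
        cut = None
        for i, ch in enumerate(line):
            if ch != "%":
                continue
            # Count the backslashes directly before this %.
            backslashes = 0
            j = i - 1
            while j >= 0 and line[j] == "\\":
                backslashes += 1
                j -= 1
            # Odd count: the % is escaped (\%). Even count (e.g. \\%,
            # a line break followed by %): a real comment starts here.
            if backslashes % 2 == 0:
                cut = i
                break
        if cut is None:
            cleaned_lines.append(line)
        elif line[:cut].strip() == "":
            # Whole-line comment: drop the line entirely.
            continue
        else:
            # Keep the % itself so TeX still ignores the line break.
            cleaned_lines.append(line[: cut + 1])
    trailing = "\n" if source.endswith("\n") else ""
    return "\n".join(cleaned_lines) + trailing
```

Keeping the trailing `%` on partial-line comments matters: in TeX a `%` also swallows the end of the line, so deleting it outright could introduce a stray space between the two lines.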
| jabroni_salad wrote: | The only problem is jagoffs like me who need the text to be | bigger. On PDFs you now get to experience a horizontal | scrollbar. HTML has text reflow, and I can set the line | length by resizing the window. I'm willing to make a lot | of sacrifices for that experience. | z2h-a6n wrote: | For what it's worth, two-column layouts are very common in | the physical sciences, or at least in physics, which I'm | more familiar with. I have a feeling that the reason is at | least partly to save page space when using displayed math | (e.g. equations that are formatted in a break between | blocks of text), which use the full text width (i.e. the | width of one column) to display what may be much less than | half a page wide. | FredPret wrote: | It makes sense - for paper. But pixels are infinite - | HTML is far better for screen display, which is how | people read things nowadays. | | The extra column next to the one I'm reading introduces a | lot of visual noise, and the content is hard enough as it | is. I'm sure physicists have all gotten used to it, but | it certainly trips me up. | nyssos wrote: | > The extra column next to the one I'm reading introduces | a lot of visual noise | | Papers are generally not read start to finish in one go: | there's lots of rereading and jumping back and forth | between key parts, and anything that moves them further | apart makes this harder. | FredPret wrote: | Ah, that makes more sense. I imagined scientists just | reading the whole thing start-to-finish. | | I still think a flexible layout is best. If you like | multi-columns and have a wide screen, why not display 12 | columns next to each other? | | With PDF this is not possible. With HTML the content can | in principle be sliced and diced how you like it. | shusaku wrote: | Seems like the references aren't working very well. | | I really want journals to have two-way links in a paper.
I get | Google Scholar alerts about certain papers being cited, and I | want to skip to "Why did they cite this? Did they use it, | improve it, or just mention it?" | r3trohack3r wrote: | I'd never considered setting up citation alerts like this. | | Thank you for the idea! | shrimpx wrote: | Looks like clicking a reference adds the hash to the URL but | doesn't scroll to the reference. If you load the hash URL | directly in the browser, you get a 404 page... | burkaman wrote: | https://browse.arxiv.org/html/2312.12451v1#bib.bib1 works, | but https://browse.arxiv.org/html/2312.12451v1/#bib.bib1 | doesn't. | IlliOnato wrote: | Yeah, it seems like a bug in the HTML generator... | winwang wrote: | Probably more accessible in general. (PDF) Papers are | psychologically scary. | mmis1000 wrote: | PDF is by design an image format that can also embed text. It | just doesn't have the primitives to properly retain the article | structure. | PaulHoule wrote: | Nah, it's a super-complex system that creates a graph of | components, can draw vectors like PostScript, can embed 3-D | models, etc. The spec is here: | | https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard... | | If you look at sections 14.6 through 14.10, you will find | quite baroque facilities for representing the structure of | documents in great detail, making documents with | accessibility data, making documents that can reflow with | HTML, etc. Not to mention the 14.11 stuff, which addresses | problems with high-end printing (say you want to make litho | plates for a book). | | For that matter, sections 14.4 and 14.5 describe facilities | that can be used to add additional private data to PDF | files for particular applications.
For instance, Adobe | Illustrator's files are PDF files with some extra private | data, and so are GeoPDF files: https://en.wikipedia.org/wiki/GeoPDF | | I like to complain that PDF has no facility to draw a | circle but instead makes you approximate one with | (accursed) Bezier curves, but other than that, the main | complaint people make about PDF is that it is too | complicated, not that it is lacking this feature or that | feature. | | Contrast that with a highly opinionated document format like | DjVu | | https://en.wikipedia.org/wiki/DjVu | | which came out around the same time as PDF, is | specialized for the problem of scanned documents, and works | by decomposing the document into three layers, one of which | is a bilevel layer intended to represent text. All three | layers have specialized coding schemes; the text layer in | particular tries to identify that every copy of (say) the | letter "e" or the character "Han" is the same and reuses | the same bitmap for them. | anonimo37 wrote: | You would normally use a library to create the PDF so you | don't need to deal with the complexity of the format. A | library would likely provide a function for drawing | circles that translates the circle into Bezier curves. | tarboreus wrote: | One of the reasons is to make the papers more accessible to | people with disabilities, especially the blind. I participated | in a conference they hosted on this a few months ago; I | recommend taking a look at the recordings if you're interested | in thinking about this. | | https://accessibility2023.arxiv.org/ | miki123211 wrote: | Blind person here, can confirm this. Reading PDFs with a | screen reader is bad, reading PDFs that come from LaTeX is | worse, and reading LaTeX math is pretty much impossible. All the | semantic info you need is just thrown away. | | You _can_ make decently accessible PDFs, but it's lots of | work: you need Acrobat on the producer's side and might also | need it on the consumer's side. Free tools don't even come | close.
There's also the fact that the process of making | accessible PDFs in Acrobat isn't itself accessible. | | With that said, the way screen readers treat HTML math | certainly isn't perfect, it's geared more towards school | children than anything above calculus. I'm probably going to | stay with my LaTeX source files for now. At least ArXiv | offers those, not many sites do. To be fair, that approach | also has its own set of problems (particularly when people | use some extra fancy formatting in their math equations, | making the markup hard to read), but I find this to be the | best approach for me so far, at least on AI/ML papers. | saurik wrote: | Huh. It would seem like, of all the things which should | make it easy to generate the correct accessibility | information, the pipeline of compiling a paper from source | code in LaTeX should nail it... maybe we should all pitch | in to some pool to pay someone to put in the required | effort to connect all the dots? | semi-extrinsic wrote: | Kind of tangential, but it's also kind of surprising how | difficult it is in LaTeX to make a plot of an equation. | | Say I have Equation \ref{eq}. Why can't I just say "plot | \ref{eq} for x from -6 to 11" and get my graph? | | And yes, I know about pgfplots, PSTricks, TikZ etc. But | in all those cases, I need to define the same equation | twice, in different syntax to boot. It's kind of | unsatisfying. | ldenoue wrote: | I wrote an app called PDF Reflow that reflows the original | PDF using image processing to cut out words into tiles so | you see the reflowed version of the text in their original | look. | | https://www.appblit.com/pdfreflow | jakderrida wrote: | Hold on... Are you telling me that all these complex | sentences are being typed out based on your voice alone? | That's insane. | ehPReth wrote: | ? blind people can use keyboards | kzrdude wrote: | Hm tangential question but shouldn't touch typing be well | accessible for many blind computer users? 
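On the PDF-drawing point earlier in the thread (PDF has no circle operator, so libraries translate circles into Bezier curves): the usual construction splits the circle into four quarter arcs, each drawn as one cubic Bezier segment whose control points sit at distance k = 4/3 (sqrt(2) - 1) r, about 0.5523 r, along the tangents. A minimal Python sketch that emits raw PDF path operators (`m` moveto, `c` curveto, `h` closepath, `S` stroke); `circle_path` is a hypothetical helper, not any particular library's API:

```python
import math

# Control-point distance (as a fraction of the radius) that approximates
# a quarter circle with one cubic Bezier segment: 4/3 * (sqrt(2) - 1).
KAPPA = 4.0 * (math.sqrt(2.0) - 1.0) / 3.0


def circle_path(cx: float, cy: float, r: float) -> str:
    """Return PDF content-stream operators drawing a circle as 4 Beziers."""
    k = KAPPA * r

    def fmt(*nums: float) -> str:
        return " ".join(f"{n:.4f}" for n in nums)

    ops = [
        f"{fmt(cx + r, cy)} m",                                  # start at the rightmost point
        f"{fmt(cx + r, cy + k, cx + k, cy + r, cx, cy + r)} c",  # quarter arc to the top
        f"{fmt(cx - k, cy + r, cx - r, cy + k, cx - r, cy)} c",  # quarter arc to the left
        f"{fmt(cx - r, cy - k, cx - k, cy - r, cx, cy - r)} c",  # quarter arc to the bottom
        f"{fmt(cx + k, cy - r, cx + r, cy - k, cx + r, cy)} c",  # back to the start
        "h S",                                                   # close the subpath and stroke it
    ]
    return "\n".join(ops)
```

With this choice of k, each segment's midpoint lies exactly on the circle, and the largest radial deviation elsewhere is roughly 0.03% of the radius.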
| topato wrote: | I'd say it would be simple to talk-type these using | Windows 11's redux of voice typing. Pretty damn accurate | and easy to modify/variate text/options. I use it all the | time to make tech/engineering blog posts; it's faster and more | organic than typing, typically, and it learns your | technoacronyms. Combined with voice access, it makes it | trivial to fully operate your computer (well, at least | browse the web, email, and media apps) from across the | room. For anyone who hasn't tried the updated version, I | highly suggest hitting Windows key+H and giving it a shot. | anthk wrote: | Emacs with Emacspeak has a math reading module. | codethief wrote: | Ugh. I don't belong to the target audience (people with | disabilities), but the typesetting doesn't exactly look pleasant | on my machine (Chrome on Linux). | jll29 wrote: | It's a cool feature because it makes the papers more findable, | more easily navigable, easier to read online, and faster to | scroll through. I am also happy for blind people that they can | more easily use arXiv with Braille readers now. | | (I'm still a fan of printing the PDFs, because I annotate on | paper and refer to page numbers, but the HTML feature is in | addition to PDF download, not a replacement.) | | One thing that still sucks (not arXiv-related, though) is | reading mathematical formulae on the Kindle; I wonder if someone | with rendering expertise could have a look into the MOBI | format. | alephnerd wrote: | This is a great UX addition. Why did it take them so long? | gwern wrote: | The conversion is still very error-prone. It can't convert a | lot of packages, and in the last paper I read, StarVector, half | the HTML version is just missing. (I think it hit an error at a | figure of some sort.) I reported an error, but I've been | reporting errors against ar5iv and the abstracts for years now, | and the long tail of problems just seems like an incredible | slog.
| KRAKRISMOTT wrote: | Where are the computer vision people? This is the perfect | type of problem for multimodal LLMs. | IlliOnato wrote: | Except that the errors made by an LLM might be harder to | spot than converter errors, which typically are very blatant | and don't usually alter text (perhaps just drop parts of | it). | | Also, a bug in a converter is conceptually much easier to | fix than an LLM is to retrain. | | I am not sure that AI in its current state is useful when | "high fidelity" is required. | dginev wrote: | Can confirm. From an ar5iv standpoint, 2.56% of articles | currently fail to convert entirely, and 22.9% have known | conversion errors. That leaves 74.5% of nominally | usable articles. This success rate is noticeably _lower_ for | the newest batches of arXiv submissions, as the converter | hasn't caught up with the most recent package innovations. | | We have a plan in place to meaningfully fall back for unknown | packages, but that will take at least another year to put in | place, and likely another couple of years to stabilize. | | Meanwhile, there is some hope that with arXiv launching the | HTML Beta we will get more contributions for package support | (LaTeXML is an open source project with public domain | licensing, so everybody benefits). | | But again, the original point is spot on. Coverage will be | hit-or-miss for a while longer yet, for an arbitrary arXiv | submission. The good news is that authors _could_ work | towards better support for their articles, if they wanted to. | eviks wrote: | Because this is a rather conservative field with little | dependency on the general public, and so without much interest in | helping disseminate knowledge broadly and accessibly | (relative to other priorities, not in absolute terms). | Strilanc wrote: | How would you do it quickly? | | For example, HTML isn't divided into numbered pages, while | PDFs are. A lot of LaTeX interacts with page boundaries. | Figures tend towards the tops of pages.
And there's \clearpage. | And the reference list might say which page each citation | appeared on. All that stuff needs someone to decide how to | handle it and then to implement that handling. Like... what | value does \pageheight return? Sometimes I resize things to fit | the page height, and if it was doubled, then I should have | resized to fit the width instead. | lynndotpy wrote: | Almost universally, we prepare conference papers as LaTeX files | made to export to PDFs which fit within the conference's | template. | | It's nontrivial to export this to HTML in all cases, and even | then, nobody is asking for HTML from us even though we all want | it. I'm guessing arXiv is using some kind of converter which | _usually_ but not _always_ works. | | That said, this has been a long time coming, and PDF as the standard | should've died a decade ago. I wish I'd had this when I was in my | PhD program. | alright2565 wrote: | LaTeX is a very complicated programming language for creating | documents. It is not easy to create a new backend for it. | | As a glimpse into the very tip of the iceberg, this diagram | (https://tex.stackexchange.com/a/158740/) is generated with 100% | LaTeX code. | binarymax wrote: | Nice! Now I don't need to manually replace arxiv with ar5iv. | Congrats to the team. | imjonse wrote: | "Our ultimate goal is to backfill arXiv's entire corpus so that | every paper will have an HTML version, but for now this feature | is reserved for new papers." | | For now it only works for papers submitted this month. But it's | great to have this feature; it makes it so much easier to read on | phones. | eviks wrote: | Finally a modern format you can copy & paste from and read on one | of the most popular computing platforms!!! | pushfoo wrote: | Previously discussed: | https://news.ycombinator.com/item?id=38713215 | carlosjobim wrote: | With the 2024 browser update, this means I can read these | articles on my ancient Kindle perfectly fine.
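binarymax mentions manually swapping arxiv for ar5iv in URLs. A minimal Python sketch of that rewrite, assuming the URL shapes seen in this thread (`browse.arxiv.org/html/<id>v<n>` for the new HTML and `ar5iv.labs.arxiv.org/html/<id>` for ar5iv) and new-style IDs only. `html_urls` is a hypothetical helper, defaulting to `v1` when no version is given is an assumption, and arXiv may change these paths:

```python
import re
from typing import Optional

# Assumption: new-style identifiers only (e.g. 2312.12451 or 2312.12451v1);
# old-style IDs such as hep-th/9901001 are not handled.
ARXIV_URL = re.compile(
    r"https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(v\d+)?(?:\.pdf)?/?$"
)


def html_urls(url: str, default_version: str = "v1") -> Optional[dict]:
    """Map an arXiv abs/pdf URL to the HTML URL patterns seen in the thread."""
    match = ARXIV_URL.match(url)
    if match is None:
        return None
    paper_id = match.group(1)
    # Hypothetical default: the latest version of a paper may not be v1.
    version = match.group(2) or default_version
    return {
        # Same shape as the example link: browse.arxiv.org/html/<id>v<n>
        "arxiv_html": f"https://browse.arxiv.org/html/{paper_id}{version}",
        # ar5iv serves converted papers under /html/<id>
        "ar5iv": f"https://ar5iv.labs.arxiv.org/html/{paper_id}",
    }
```

Per the announcement quoted by imjonse, the arXiv HTML version exists only for new papers for now, so a generated link for an older ID may not resolve.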
| ChrisArchitect wrote: | [dupe] from yesterday | | More here: https://news.ycombinator.com/item?id=38713215 | ZeroCool2u wrote: | Wow, this is _so_ much better! | choppaface wrote: | Hope they benefit from CDN caching now too. | | Edit: aaaand they got Fastly | https://news.ycombinator.com/item?id=38723373 | cozzyd wrote: | It doesn't work great with long author lists... | | https://browse.arxiv.org/html/2312.12907v1 | degenerate wrote: | The PDF is worse, so there is no simple answer to this: | https://arxiv.org/pdf/2312.12907v1.pdf | | At least the HTML version pairs each author with their | affiliations, unlike the PDF, which has all the names on | page 1 and all the affiliations on page 2. That's completely | unreadable. | cozzyd wrote: | The PDF is better because I'm trained to scroll past the | author list. That takes forever on the HTML version. | mattigames wrote: | You can click the "Introduction" anchor on the left side, | and it scrolls past the author list for you. | cozzyd wrote: | Well, it skips the abstract too, but yes, you can scroll | back up to see it. | mattigames wrote: | Yeah, it's a bit weird that the abstract doesn't have a | link on the left. | cozzyd wrote: | Probably because \abstract{ } is treated differently than | \section{ }, I guess... | IlliOnato wrote: | For me the PDF is much better. It's compact and clean; if I | really need to see an affiliation for a particular author, | it's really easy to do so in the PDF, not so in the HTML. | | It's highly unlikely anybody will read an entire author list | this long; typically you would read the first two or three | names, or check if some particular name is on the list. So | the compactness of the list and being able to quickly get to | the article contents are important. | Al-Khwarizmi wrote: | Nice! It would be even better if they offered authors of previous | papers the option of converting to HTML, as the LaTeX sources are | already in the system.
| fprog wrote: | The article states they're going to backfill all, or nearly | all, previously submitted papers! | FredPret wrote: | This is brilliant. I don't share academia's love of LaTeX | multi-column PDFs. | tiagod wrote: | I like multi-column text on paper (literally), but it's awkward | in digital form, where you can just shape text on the fly to whatever | column size you want. | leoncaet wrote: | I just hope they don't stop offering the papers in PDF. Even when | I'm on a computer, I still prefer to read PDFs. | sylware wrote: | Like the math noscript/basic (X)HTML Wikipedia generator: | | The magic of inline images at a known DPI; of course, you can | provide images for different DPIs. | | I read math/science noscript/basic (X)HTML documents on my 100 | DPI monitor, on Wikipedia. It's not yet fully ready on arXiv. | gms7777 wrote: | About time. bioRxiv and medRxiv have been doing this for probably | half a decade at this point? | jez wrote: | It would be neat if they offered submitters the chance to upload | their own HTML version alongside the PDF version, instead of | always relying on an automatic conversion process. | | - I can imagine authors feeling frustrated if someone reaches out | about a problem in the HTML version of their paper, but they have | no way to correct it except by hoping that a change to the PDF | fixes the generated HTML. Easier to just fix the | formatting problem in the PDF outright. | | - It would be neat to allow people to experiment with alternative | formatting for their papers. For example, imagine a paper about a | programming language that embeds a sandbox you can use to play | around with the language under discussion. Or a paper about | multivariable calculus where you can interact with a | three-dimensional plot of some function. | layer8 wrote: | They'd have to define and document a "safe" subset of HTML, and | implement a filter/checker for it.
Otherwise we'd end up with | papers containing ads and tracking and XSS vulnerabilities and | whatnot. | digging wrote: | Those are issues with JavaScript, not HTML. Wouldn't | filtering out iframes pretty much keep us in the clear? | layer8 wrote: | The parent wanted interactive 3D plots, which means | JavaScript embedded in or linked from the HTML. Then | there's stuff like JavaScript embedded in SVG. | diffeomorphism wrote: | > It would be neat if they offered submitters the chance to | upload their own HTML version alongside the PDF version, | instead of always relying on an automatic conversion process. | | Please don't. Then you will have a mismatch between the source | and the "own HTML", which ruins the point of uploading the | source. | eviks wrote: | PDF isn't the source. | IlliOnato wrote: | But the PDF is also generated. LaTeX is the single source | of truth. | kjkjadksj wrote: | Most authors probably have no interest in learning HTML. Also, | most authors want nothing to do with the work by the time it's | submitted. It was probably hell getting the project to the | point of publishing; they want to be done with it and move on | to the next thing going on in their career ASAP. | jez wrote: | I think this is an argument in favor of doing automatic PDF | -> HTML conversion for the authors who don't want to touch | it, but I don't think it's an argument against letting those | who are fine with HTML provide their own. | tiagod wrote: | I was under the impression that the source authors publish to arXiv | was a LaTeX file. | jraph wrote: | It is. | jez wrote: | Ah, thanks for clarifying! | | I looked up the submission formats, and it looks like if you | authored the paper in TeX/LaTeX, they do not accept | pre-rendered versions of the document. | | https://info.arxiv.org/help/submit/index.html#formats-for-te... | | But if you did not author it in TeX/LaTeX (e.g., Word, Google | Docs, etc.), it appears you can upload a PDF or HTML yourself.
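layer8's "safe subset of HTML" suggests an allowlist checker rather than a blocklist. A minimal sketch using Python's standard `html.parser`. The allowlist itself is hypothetical, and the checker only reports violations (disallowed tags, inline event handlers, `javascript:` URLs) instead of rewriting anything. It is not hardened against obfuscated payloads, so a real deployment would need a maintained sanitizer and a much more careful policy:

```python
from html.parser import HTMLParser

# Hypothetical allowlist for illustration only; a real policy would also
# need to cover MathML, SVG, CSS, link schemes, and more.
ALLOWED_TAGS = {
    "p", "h1", "h2", "h3", "em", "strong", "a", "img", "ul", "ol", "li",
    "table", "tr", "td", "th", "figure", "figcaption", "section", "span",
    "div", "math", "mi", "mo", "mn",
}
ALLOWED_ATTRS = {"href", "src", "alt", "id", "class", "title"}


class SafeSubsetChecker(HTMLParser):
    """Collect allowlist violations instead of rewriting the HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        # Self-closing tags such as <img/> also end up here, because the
        # default handle_startendtag calls handle_starttag.
        if tag not in ALLOWED_TAGS:
            self.violations.append(f"tag <{tag}>")
        for name, value in attrs:
            if name not in ALLOWED_ATTRS:
                # Catches inline event handlers such as onclick=...
                self.violations.append(f"attribute {name} on <{tag}>")
            elif name in ("href", "src") and value:
                if value.strip().lower().startswith("javascript:"):
                    self.violations.append(f"javascript: URL in <{tag} {name}>")


def check_html(document: str) -> list[str]:
    """Return a list of violations; an empty list means the check passed."""
    checker = SafeSubsetChecker()
    checker.feed(document)
    checker.close()
    return checker.violations
```

An allowlist fails closed: anything the policy does not name, including `<iframe>` and `<script>`, gets reported, which also covers digging's iframe suggestion.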
| IlliOnato wrote: | No, it would not. It's critically important that there is only | one "logical" article, albeit with different representations. | In other words, a single "source of truth". | | With "sideloading" of HTML, there is no way in general to make | sure that the _contents_ of the LaTeX (and PDF) on one side and | the HTML on the other side are the same. | thomasahle wrote: | > It would be neat if they offered submitters the chance to | upload their own HTML version alongside the PDF version, | instead of always relying on an automatic conversion process. | | Can you recommend a system I can use to compile my LaTeX, while | also making sure the HTML is going to look good? I'd like some | kind of CSS @media queries to switch between certain | parts of the layout, while keeping a single LaTeX file. | endergen wrote: | I was hoping this meant that HTML-native submissions would be | possible, so that people could make interactive explanations. | lucidrains wrote: | nice! will make reading papers on the phone so much more | pleasant! | odyssey7 wrote: | article { text-justify: Knuth-Plass; } | matt1 wrote: | For anyone interested in staying informed about important new | AI/ML papers on arXiv, check out https://www.emergentmind.com, a | site I'm building that should help. | | Emergent Mind works by checking social media for arXiv paper | mentions (Hacker News, Reddit, X, YouTube, and GitHub), then ranks | the papers based on how much social media activity there has been | and how long since the paper was published (similar to how HN and | Reddit work, except using social media activity, not upvotes, for | the ranking). Then, for each paper, it summarizes it using GPT-4, | links to the social media discussions, paper references, and | related papers. | | It's a fairly new site and I haven't shared it much yet. Would | love any feedback or requests you all have for improving it. | raccoonDivider wrote: | That looks great.
No real feedback yet, but it's the kind of | thing I've always been looking for as a better alternative to | Twitter. | matt1 wrote: | Thanks! I've got a lot more planned for it too. If anyone has | any feedback that doesn't make sense to share here, or if | you're a researcher who is open to some questions about how | you currently follow arXiv papers, drop me a note at | matt@emergentmind.com. | CodeCube wrote: | Love to see Emergent Mind continuing to innovate! | sureglymop wrote: | Love the clean design of the website! Looks amazing on mobile. | jakderrida wrote: | This is exactly what I was using HN for. But, yeah, it kinda | sucked compared to yours. Another thing I was trying to create | was some sort of NN model that could use the Semantic Scholar | h-index of authors along with the abstract text and T5 to | estimate one-year-out citations. Just for personal use, | though. That whole thing fell apart because Semantic Scholar is | kinda crap at associating author links with the same author. I | frequently ended up with the wrong professors, which I'd think | would be easily fixable for them. | carlossouza wrote: | I did that (used other features). This is how new papers are | ranked here: | | https://trendingpapers.com | apstats wrote: | I wonder if this could be used to train an LLM to convert PDFs | with rich charts into HTML? | reqo wrote: | A lot of AI/ML papers these days have an accompanying interactive | page like [0]; will we now see anything like this directly on | arXiv? | | [0] https://voyager.minedojo.org/ | z2h-a6n wrote: | I think then arXiv would have to deal with maintaining the tech | stack and providing the presumably much higher server capacity | to serve the more varied web pages that would result, so it | seems like a tall order. arXiv already has an experimental | integration with Papers with Code [0], which I guess provides | similar results for the reader, though the authors have to | figure out their own web hosting.
| | [0] https://info.arxiv.org/labs/showcase.html#arxiv-links-to-cod... | ansk wrote: | When I open a large PDF on arXiv (100+ MB, not uncommon for ML | papers focused on hi-res image generation), there is a | significant load time (10+ seconds) before anything is rendered | at all other than a loading bar. Does anyone know what the source | of this delay is? Is it network-bound, or is Chrome just really | slow to render large PDFs? Do PDFs have to be fully downloaded to | begin rendering? In any case, this delay is my only gripe with | arXiv, and a progressively rendered HTML doc that instantly loads | the document text would be a huge improvement. | IlliOnato wrote: | It may even be that the time is taken to _generate_ the PDF. | | The format in which articles are submitted and stored in arXiv | is LaTeX. The PDF is automatically generated from it. | | Probably arXiv does some caching of PDFs so they don't have to | be generated anew every time they are requested, but I don't | know how this caching works. | upbeat_general wrote: | I have the same issue. From what I can tell, it's just | network-bound and the arXiv servers are slow. They theoretically allow | you to set up a caching server, but after spending a while | trying to get it set up, I haven't been able to get it to work. | | https://info.arxiv.org/help/faq/cache.html | arccy wrote: | Maybe it'll be faster now with Fastly. | | https://news.ycombinator.com/item?id=38723373 | ww520 wrote: | That's great. Now I can read the papers on my phone. | svag wrote: | The tool being used for this offering is this one, | https://github.com/arXiv/arxiv-readability, just to save a few | clicks :) | IshKebab wrote: | Wow, I did not know they have the LaTeX for all the papers and | compile it themselves! That's pretty crazy. What if they don't | have the packages you need? What if your paper isn't written with | LaTeX?
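On ansk's question of whether PDFs must be fully downloaded before rendering: not necessarily. A "linearized" (Fast Web View) PDF puts what the first page needs at the start of the file, so a viewer that supports it can show page one early and fetch the rest with HTTP range requests. Whether arXiv's PDFs are linearized is not something this thread establishes. As I understand the PDF specification, a linearized file's linearization dictionary (the one with the `/Linearized` key) must sit within the first 1024 bytes, which gives a cheap heuristic. A minimal Python sketch; the function names are hypothetical, and a file that was incrementally updated after linearization can still carry a stale `/Linearized` entry:

```python
def looks_linearized(head: bytes) -> bool:
    """Heuristic check for a linearized ("Fast Web View") PDF.

    Assumption: the linearization parameter dictionary, which holds the
    /Linearized key, lies within the first 1024 bytes of the file.
    """
    window = head[:1024]
    return window.startswith(b"%PDF-") and b"/Linearized" in window


def file_looks_linearized(path: str) -> bool:
    """Read only the first 1024 bytes of a local PDF and apply the check."""
    with open(path, "rb") as fh:
        return looks_linearized(fh.read(1024))
```

For a remote file, the same 1024 bytes could be fetched with a `Range: bytes=0-1023` request, provided the server honors range requests.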
| WendyTheWillow wrote: | So far I'm left wanting an app that gives me a way to easily | track and consume newly published work on a given topic. The | existing apps are not great, and maybe this change will make it | easier to provide better "reader" views, and possibly even TTS (I | like to listen+read). | aragonite wrote: | A lot of academic journals (say, from Springer) also offer HTML | formats for papers published in the past decade or so, which I | personally often find more convenient for reading purposes than | PDFs. For example, I parse text a lot faster if I use a regex to | split each paragraph into sentences and place a line break after | each sentence, or if I do natural language "syntax highlighting" | by assigning a distinctive color to functional words indicating | logical structure like 'if/then', 'and', 'or', 'not', 'because', | and 'is'. And sometimes it really improves readability to be able | to do "semantic highlighting", in the sense of, say, assigning a | different hashed color to each proper name (or each labeled | thesis, etc.) that occurs in the paper. Such manipulations are | basically impossible with PDFs. It makes me wish Sci-Hub would | start archiving HTML versions in addition to PDFs! | johnsillings wrote: | https://www.arxiv-vanity.com/ | jakderrida wrote: | And, of course, https://ar5iv.labs.arxiv.org/html | | However, ar5iv isn't a la carte like arxiv-vanity. They pretty | much do last month's papers every month or so. Something like | that. | dginev wrote: | Hi, ar5iv creator here. | | You can think of both arxiv-vanity and ar5iv as the "alpha" | experiments that led into the official arXiv "beta" HTML | announced today. | | Once a few rounds of feedback and improvements are | integrated, and the full collection of articles acquires HTML | on the main arXiv site, ar5iv will be decommissioned. | | The plan is to turn all existing ar5iv links into redirects | to the official HTML, and free up the resources for | maintaining it. I am not sure what the plans are for
I am not sure what the plans are for | maintaining arxiv-vanity, but I suspect they may head down a | similar path some time later. | philipashlock wrote: | 30 years after HTML was invented to support accessibility and | collaboration for research and academia, and on the same day the | White House released their new accessibility guidance, which | happens to be the first time they've published formal new policy | natively as HTML rather than PDF - | https://www.whitehouse.gov/omb/management/ofcio/m-24-08-stre... | murphyslab wrote: | I feel surprised by how succinct, easy-to-understand, and | sensible the policy (M-23-22) is: | | > Default to HTML: HyperText Markup Language (HTML) is the | standard for publishing documents designed to be displayed in a | web browser. HTML provides numerous advantages (e.g., easier to | make accessible, friendlier to assistive technology, more | dynamic and responsive, easier to maintain). When developing | information for the web, agencies should default to creating | and publishing content in an HTML format in lieu of publishing | content in other electronic document formats that are designed | for printing or preserving and protecting the content and | layout of the document (e.g., PDF and DOCX formats). An agency | should develop online content in a non-HTML format only if | necessitated by a specific user need. | | https://www.whitehouse.gov/omb/management/ofcio/delivering-a... | golol wrote: | IMO pdf and HTML optimize for different things. pdf is easy and | pretty. HTML is easy and responsive. But making pdf responsive is | impossible and making HTML pretty is not easy. I'd rather have | arxiv serve well-polished pretty documents than responsive ugly | ones. Most researchers don't have time to make an HTML document | responsive and pretty. | querez wrote: | Am researcher, care about responsiveness way more than pretty. | I am super glad for the option. Downloading PDFs is super | annoying. I'm stoked.
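aragonite's comment above describes two reading aids that become easy once a paper is available as HTML text: putting each sentence on its own line, and coloring words. Below is a minimal Python sketch of both, assuming the paragraph text has already been pulled out of the HTML as plain text. The regex, the word list, the color, and the function names (`split_sentences`, `highlight_function_words`, `name_color`) are illustrative assumptions, not taken from any existing tool, and the sentence splitter is a naive heuristic that will wrongly break text such as "Dr. Smith".

```python
import hashlib
import re

# Naive sentence boundary: '.', '!' or '?' followed by whitespace and an
# uppercase letter. A heuristic only: "Dr. Smith" will be split wrongly.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Illustrative list of words that signal logical structure.
FUNCTION_WORDS = ("if", "then", "and", "or", "not", "because", "is")
_FUNCTION_RE = re.compile(
    r"\b(" + "|".join(FUNCTION_WORDS) + r")\b", re.IGNORECASE
)


def split_sentences(paragraph: str) -> str:
    """Return the paragraph with a line break after each sentence."""
    return "\n".join(_SENTENCE_END.split(paragraph.strip()))


def highlight_function_words(text: str, color: str = "#c0392b") -> str:
    """Wrap each whole-word function word in an HTML <span> of one color.

    Expects plain text; running it on markup could alter tag attributes.
    """
    return _FUNCTION_RE.sub(
        lambda m: f'<span style="color:{color}">{m.group(0)}</span>', text
    )


def name_color(name: str) -> str:
    """Map a name to a CSS hex color taken from its SHA-256 hash,
    so the same name always gets the same color."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return "#" + digest[:6]


if __name__ == "__main__":
    para = "We assume X is bounded. If X is bounded, then Y converges."
    print(split_sentences(para))
    print(highlight_function_words("If X is bounded, then Y converges."))
    print(name_color("Smith"))
```

Hashing the name, rather than handing out colors in order of appearance, keeps each name's color stable across paragraphs and even across papers, which is what makes the "semantic highlighting" useful. Deciding which tokens are proper names in the first place is left out here and would need something like a named-entity tagger.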
| radicalriddler wrote: | FUCK YES (excuse my profanity). I have a tool that converts HTML | to Neural Speech and I always wanted to push arXiv papers through | it, but couldn't be bothered with a PDF implementation. | topicseed wrote: | What do they use to convert a PDF document to a clean, correct | HTML document? It's a difficult space, especially with the | variety of layouts you may find in PDF documents... | blackbear_ wrote: | Arxiv encourages users to submit the latex source of their | papers rather than the PDF. | vegabook wrote: | PDF is objectively much better than HTML at rendering text | documents. And it's not even close. This could easily have been | done 10, even 15-20 years ago. That it wasn't is not just | inertia. Latex and PDF have enormously better text rendering, and | the static format locks a state-commit in time that is much | easier to go back to and reference/critique, unlike the | intrinsically fluid nature of HTML. For academic work, | milestone-like formats that lock state in time are useful for | those who later build on them. And again, the rendering just | doesn't compare, and that imparts [sub]conscious quality signals. | imranq wrote: | At this point are academic papers simply peer-reviewed blog | posts? | acjohnson55 wrote: | This is great! I browse papers on mobile, and PDF is so bad for | that use case. ___________________________________________________________________ (page generated 2023-12-21 23:00 UTC)