[HN Gopher] Embedded PDF viewer in Firefox 81 supports filling f...
       ___________________________________________________________________
        
       Embedded PDF viewer in Firefox 81 supports filling forms
        
       Author : muxator
       Score  : 798 points
       Date   : 2020-09-22 14:16 UTC (8 hours ago)
        
 (HTM) web link (support.mozilla.org)
 (TXT) w3m dump (support.mozilla.org)
        
       | austincheney wrote:
       | Does this support digital signatures via signing certificate?
        
       | SigmundA wrote:
       | Still waiting for the SVG backend to be fully implemented for
       | high quality printing.
        
       | skratlo wrote:
       | Which PDF forms standard is this?
        
       | blackbrokkoli wrote:
       | I have a feeling this thread has a strong bias from highly
       | automated valley life. In more provincial regions and even just
       | much of Europe lots of forms have to be filled out and printed.
       | 
       | It is not something you have to do every day, but the
       | existing solutions suck massively. You either have to use Adobe,
       | which requires Windows (or Mac, I suppose) and your firstborn, or
       | use some massively shady online service. So personally, I love
       | this feature!
       | 
       | (And I also do not think that this will halt all other
       | development at Mozilla like some comments here imply)
        
         | drdaeman wrote:
         | Okular works on Windows. Or at least it used to, some years
         | ago.
        
         | boogies wrote:
         | Isn't evince capable of this and the default PDF viewer on
         | GNOME?
        
           | krastanov wrote:
           | I have had evince fail to render forms, but okular (the KDE
           | default) has worked pretty well.
        
           | jhoechtl wrote:
           | GNOME, no. Okular, the KDE counterpart, works very well.
        
         | Jaxan wrote:
         | On a Mac you can fill in PDFs with the built-in Preview app. I
         | like it.
        
           | mkskm wrote:
           | There's also the paid app PDF Expert which is generally
           | excellent.
        
       | boringg wrote:
       | Can we just step away from PDFs to a better standard? Every time
       | I deal with PDFs or I have to on behalf of my parents it is a
       | true waste of time and resources - there has to be a better way.
        
         | jrochkind1 wrote:
         | Well, no, we can't just do that. But it's nice to dream.
        
           | topspin wrote:
           | Well, yes, we can, but the outcome will be far worse. The
           | naive imagine "something better." The real world will
           | interpret "better" as 27 half baked alternatives, 2 of which
           | will work on something other than Chrome running on Windows.
        
         | kibwen wrote:
         | Sure you can: find an ideologically motivated tech billionaire,
         | buy Adobe, release a new version of PDF and make the spec an
         | inaccessible trade secret, aggressively legislate against
         | anyone who attempts to implement it, start charging for Reader,
         | increase the price by a compounding 2% every year, and put that
         | money towards a foundation with a purpose of openly designing
         | and implementing a better, freely-licensed replacement. I
         | predict this would only take 20 to 30 years. :)
        
           | boringg wrote:
           | Sounds like you've been thinking about this. You don't happen
           | to be an ideologically motivated billionaire who happens to
           | think the best thing for humanity and return on capital is to
           | rebuild the pdf spec, do you? _fingers crossed_
        
         | Semiapies wrote:
         | At the very least, you need a replacement that's technically
         | better, works well cross-platform, has a layman-acceptable UI,
         | and supports 99.999%+ of all the use cases PDFs currently
         | support. It also has to convert old PDFs into the new format.
         | 
         | Then, you have to worry about market share and acceptance.
        
       | INTPenis wrote:
       | This is huge. I've felt like an outsider for years here because
       | the gov uses a lot of online forms in PDF.
        
       | hapless wrote:
       | PDF form support still doesn't work very well -- it cannot export
       | filled fields correctly, nor do they print correctly.
        
       | abrowne wrote:
       | I actually like the pdf.js viewer enough that I use the chrome
       | extension version on chromium. But I see it hasn't been updated
       | in over a year now. Hopefully it will get updated!
       | 
       | https://chrome.google.com/webstore/detail/pdf-viewer/oemmndc...
        
       | getpost wrote:
       | "After entering data into these fields you can download the file
       | to have the filled out version saved to your computer."
       | 
       | What's the use case? Printing out filled-in forms? But otherwise,
       | who would want the PDF in electronic format? It doesn't seem like
       | a practical way for users to submit data.
        
         | brainwad wrote:
         | Well, if the form needs to be faxed (still a thing!) then
         | having the filled PDF makes it easy to use an e-fax service.
         | But I assume sending the file to another computer for printing
         | is the main use case.
        
         | pbhjpbhj wrote:
         | Last year applying for jobs most places had a pdf form, if you
         | were lucky it was an actual form too! So, filling the form and
         | emailing it back is useful -- much better than trying to
         | overwrite text with a PDF background; far better than printing
         | the form, filling with a pen, scanning, then sending.
        
         | detaro wrote:
         | Printing is one use case. Also, plenty of places want filled-
         | out forms uploaded, e-mailed, etc., and it allows you to keep a
         | copy of what you entered.
        
           | getpost wrote:
           | That's my point. Other than printing a nice looking form
           | (which includes faxing), the content of the field data is
           | hard to reuse. Searching PDF content on your own hard drive
           | is problematic.
           | 
           | Are there utilities that extract PDF field data and submit it
           | to a database? I'd be grateful to see examples.
           | 
           | What about field validation? The PDF may have some minor
           | validation, but that's no substitute for the validation done
           | in a DBMS.
           | 
           | If you want users to be able to save a nice looking form,
           | you'd still want the data entered online directly into a
           | DBMS. I'd offer a "download PDF of your input" as an option,
           | for example.
        
             | toyg wrote:
             | _> the content of the field data is hard to reuse._
             | 
             | A lot of people don't care, because they come from paper
             | forms, where they have to manually retype everything
             | anyway. For many, their "DBMS" will be an Excel sheet with
             | a dozen rows. The more advanced types likely have some
             | Adobe software that does all the magic.
             | 
             | Fillable PDF forms are really seen as a courtesy to users
             | more than anything particularly useful to the emitter.
        
             | detaro wrote:
             | Sure, it's a structured format, so you totally can extract
             | the individual fields. AFAIK Adobe sells a server product
             | that does that, but I'm sure there are competitors, and I
             | have seen the underlying parsing feature in PDF libraries
             | before.
             | 
             | That said, plenty of users of PDFs have a very paper-
             | based/manual workflow still, and not the motivation and
             | expertise to run and update an online form thing. Or they
             | need to have the ability to handle odd inputs anyways,
             | because paper forms have even worse input validation.
             | 
             | And from a browser/user perspective, the feature here is
             | useful because people expect me to handle PDFs and do not
             | provide nice web forms. They might have terrible reasons
             | for doing so, but I still need to live with that.
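On getpost's question about pulling field data out of a filled PDF: a crude, stdlib-only sketch, assuming an uncompressed file whose text fields store their name in /T and their value in /V as plain literal strings. Real files often put objects into compressed object streams or use hex and escaped strings, so a proper PDF library is the safer choice; `extract_form_fields` is a hypothetical helper, not any library's API.

```python
import re

# Each indirect object: "<num> <gen> obj ... endobj" (non-greedy, spans lines).
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
# Field name (/T) and value (/V) as simple literal strings. Assumes no
# escaped or nested parentheses and no hex strings such as <FEFF...>.
NAME_RE = re.compile(rb"/T\s*\(([^()]*)\)")
VALUE_RE = re.compile(rb"/V\s*\(([^()]*)\)")


def extract_form_fields(pdf_bytes: bytes) -> dict:
    """Hypothetical helper: map field names to values in an uncompressed PDF."""
    fields = {}
    for body in OBJ_RE.findall(pdf_bytes):
        name = NAME_RE.search(body)
        value = VALUE_RE.search(body)
        if name and value:  # only objects that carry both a name and a value
            fields[name.group(1).decode("latin-1")] = value.group(1).decode("latin-1")
    return fields
```

Each resulting dict could then be written to a database row or a spreadsheet; the point is only that the entered data is structured and reachable, not hidden in page graphics.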
        
       | sixhobbits wrote:
       | Today I took a screenshot of a PDF and uploaded it to an OCR
       | service and copied the result into a doc.
       | 
       | The PDF was text-based but every time I copied something it added
       | millions of new lines and hyphens and extra text that wasn't
       | shown on the page.
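For the copy-paste mess sixhobbits describes, a minimal cleanup sketch, assuming the junk is mostly line-end hyphenation and hard line breaks. It cannot remove text that the PDF contains but doesn't display, and it will also merge genuine hyphenated compounds that happen to break at a line end. `clean_pdf_copy` is a hypothetical helper.

```python
import re


def clean_pdf_copy(text: str) -> str:
    """Undo line-end hyphenation and hard wraps in text copied from a PDF."""
    # Join words split as "hyphen-\nated" (also joins real compounds; a trade-off).
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Keep paragraph breaks (blank lines); collapse other whitespace to single spaces.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)
```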
        
       | TazeTSchnitzel wrote:
       | Will it support only the standardised kind of forms, or also the
       | proprietary Adobe-only kind of forms? (Yes, there's two, and the
       | latter are what Swedish administrative agencies use, so I'm
       | forced to choose the "non-fillable PDF" option lest I get a file
       | intentionally made unreadable to non-Adobe software.)
        
       | dehrmann wrote:
       | PDF support in Firefox is one of the most important additions in
       | recent years. My gripe with Mozilla was that they're pursuing all these
       | side projects when they really should be targeting feature parity
       | with Chrome. That's the only way people will ever switch.
        
         | eddiecalzone wrote:
         | I'm curious what you notice is missing in terms of feature
         | parity. I'm mostly a back-end developer (not diving into
         | devtools very often) and switched a year ago. I'm much happier
         | and haven't looked back.
        
           | andrewzah wrote:
           | I just want support for APIs, mainly. I get websites from
           | time to time that just refuse to load. E.g.
           | 
           | * blank pages when trying to load an imgur gallery on v68
           | (esr).
           | 
           | * image uploading not working right on instagram and various
           | other sites, either producing blank images or ones with weird
           | lines.
           | 
           | * several teleconferencing / video meeting websites just
           | don't work properly, whether it's not detecting hardware
           | properly, etc
           | 
           | I have to keep chromium installed so I can use these sites
           | properly.
        
         | kodablah wrote:
         | > My gripe with Mozilla was that they're pursuing all these
         | projects when they really should be targeting feature parity
         | with Chrome. That's the only way people will ever switch.
         | 
         | A bit off topic from the post at hand, but my gripe was the
         | opposite. The relentless pursuit of parity made them
         | indistinguishable, giving users no reason to switch (and taking
         | dev time away from distinguishing features). Granted the
         | pursuit of users instead of principles is its own folly that's
         | hard to overcome when money is needed.
        
         | DHowett wrote:
         | > feature parity
         | 
         | Like PDF support?
        
           | nacs wrote:
           | That's what OP said, yes: that features like PDF form filling
           | are essential, while things like Pocket are basically non-core
           | side projects.
        
             | bad_user wrote:
             | Pocket is an acquired company, the integration with FF has
             | been minimal (it does less than the Chrome extension ;))
             | and I'm pretty sure it pays for itself.
        
             | yjftsjthsd-h wrote:
             | On rereading I agree with your interpretation, but it's
             | easy to read "all these side projects" as referring to the
             | PDF reader.
        
       | steviedotboston wrote:
       | Chrome has had this forever, right?
        
       | PaulHoule wrote:
       | I find the built-in PDF reader in Firefox to be bloat. It's OK,
       | it works 95% of the time, but really I want to use a native PDF
       | viewer.
       | 
       | Is there a version of Firefox that removes this bloat?
       | 
       | Given that Mozilla is very resource constrained, why are they
       | working on features that aren't necessary?
        
         | cptskippy wrote:
         | Given the number of PDF exploits over the years and the habit
         | of browsers to automatically invoke your PDF viewer of choice,
         | either as a plug-in or via a call-out, PDF viewers are an easy target.
         | 
         | Having a sandboxed PDF viewer that works 95% of the time is
         | great. For those 5% circumstances where I am actively trying to
         | view a PDF and it won't work in browser, I'll gladly go through
         | the minimal effort to open it in an external viewer.
        
         | pessimizer wrote:
         | It was once an add-on, and it was once disableable. It may
         | still be disableable, but I'm sure there's some strange
         | procedure you have to go through to do it.
        
         | derefr wrote:
         | I like that single-page PDFs stay in the browser. I don't want
         | to _keep_ them; I just want to _see_ them. Like any other web-
         | page. I want to be able to hit back, or close the tab, and
         | continue on with my day.
         | 
         | And I also like that I can _preview_ long-form PDFs in the
         | browser, before choosing whether to save them and read them
         | "for real."
         | 
         | Imagine if every time you opened a direct-linked JPEG image in
         | your browser, it treated it as an attachment, downloading it
         | and opening it in your external image-previewer app, rather
         | than rendering it as a synthesized HTML DOM wrapper around the
         | image. Wouldn't you be annoyed by how cluttered your Downloads
         | directory would get with random files you never actually wanted
         | to save?
        
         | danso wrote:
         | A lot of everyday users likely benefit from being able to fill
         | out PDFs in the browser.
        
         | snovv_crash wrote:
         | I find it extremely convenient. Also I know a lot of security
         | issues in PDF viewers are effectively solved by running it in
         | the browser's JS sandbox.
        
           | Mediterraneo10 wrote:
           | You could also disable Firefox's built-in PDF viewer and
           | instead use an external PDF viewer that doesn't even support
           | Javascript.
        
             | nitrogen wrote:
             | Not all PDF vulnerabilities involve JS though.
        
             | snovv_crash wrote:
             | Native PDF clients have had lots of security holes. In this
             | case having the client written in JS means we can repurpose
             | the battle hardened JS sandbox to also contain PDF
             | exploits.
        
             | detaro wrote:
             | You misunderstand the argument the parent comment makes.
             | It's not about JavaScript _in_ PDFs.
        
         | Keycap wrote:
         | I hope no one is listening to you.
         | 
         | I don't want to start explaining to my mother, over the phone,
         | how to install and use the pdf viewer anymore :|
        
         | anonymousab wrote:
         | And here I found it much less bloated than the other free
         | desktop PDF viewers.
         | 
         | I think Mozilla's line of thought here is that PDF documents
         | are widespread in the web, to the point where they are a de
         | facto web document type. So it makes sense for a web browser to
         | support them rather than calling out to a user's desktop
         | program (though I assume you can configure it to do so
         | instead).
         | 
         | There's probably a bit of "our competitors do it, so we have to
         | too" in there as well.
        
         | ocdtrekkie wrote:
         | Given that Chrome/Edge also just added the feature, I would
         | point out: All web browsers are using the same library for PDF
         | handling, so a feature in pdf.js ends up benefiting a lot of
         | people.
         | 
         | And the reasons for not requiring an outside PDF reader are
         | major: It's yet another likely-to-have-vulnerabilities program
         | people need to install, then update. In most cases, avoiding
         | Adobe programs on your PC is a good way to avoid a lot of
         | vulnerabilities.
        
           | oefrha wrote:
           | > All web browsers are using the same library for PDF
           | handling
           | 
           | Chrome/Chromium uses PDFium, not PDF.js, so no. Not sure
           | about Edge.
           | 
           | PDFium has been able to fill out forms for a long time.
           | What's new for Chrome is the ability to save edited PDF (as
           | fillable).
        
             | ocdtrekkie wrote:
             | If Chrome doesn't use pdf.js, then neither would Edge,
             | which is a Chrome fork. My original comment may have been
             | mistaken.
        
               | oefrha wrote:
               | A Chromium fork could replace certain components if they
               | so choose. The PDF rendering component would be one of
               | the easier ones to replace.
               | 
               | However, I was able to confirm that Edge uses extension
               | ID mhjfbmdgcfjbbpaeojofohoefgiehjai to render PDF
               | internally[1], same as Chrome, so indeed it's using
               | PDFium.
               | 
               | [1] The rendered PDF element would look like
               | <embed id="plugin" type="application/x-google-chrome-pdf"
               | src="..." stream-url="chrome-extension://mhjfbmdgcfjbbpaeojofohoefgiehjai/..."
               | headers="..." ...>
               | 
               | And here's the extension's manifest in Chromium source,
               | where you can find the extension ID:
               | https://github.com/chromium/chromium/blob/2baa2b094cdd60e980...
        
       | burtonator wrote:
       | I'm the author of Polar (https://getpolarized.io/) that uses
       | PDF.js as its PDF backend.
       | 
       | This is a somewhat big update for PDF.js which is kind of cool in
       | that they haven't really been updating it as aggressively as they
       | usually do in the last year or so.
       | 
       | It's a bit frustrating to work with though. The entire concept of
       | rendering a PDF via JS is fascinating but actually using the API
       | has been a huge pain for us.
       | 
       | We've had to fork it internally and work on TypeScript bindings
       | and other features to get it to work.
       | 
       | They seem to have a silly policy of only allowing developers to
       | use a subset of the API, not the whole API itself, so that it
       | doesn't look like PDF.js (which I don't understand).
       | 
       | A lot of the functionality just isn't available otherwise.
        
         | 0xFF0123 wrote:
         | Looks cool! On a sidenote, I've always been curious with
         | product sites: what's your metric for including other orgs
         | under "Used and Trusted by Top Organizations"? How do you know
         | they use / trust it?
        
         | 52-6F-62 wrote:
         | I've been working on an internal tool for my company using the
         | same library. It's saved me a _ton_ of work, but my experience
         | has been similar to yours. I've even had to lock in to a much
         | older version rather than put a lot more work on my plate,
         | since the API seems to have changed a fair bit (well, between
         | it and JSDOM, which I am using to do some rendering on the server).
         | And like you I've had to write a bunch of the
         | bindings/definitions myself or just reduce them to nil (declare
         | module "yadda/yadday/thing" as any)--which is thankfully
         | permissible since it just needs to be built once and run
         | "forever" with near-zero need for feature additions, etc.
         | 
         | All to extract images in a routine fashion.
         | 
         | Just the same, I'm still immensely thankful they've published
         | the library as OSS.
        
         | 0x6A75616E wrote:
         | Hey! Polarized looks pretty cool. Question, has this feature
         | (forms) been merged into the public pdf.js master yet?
        
         | maest wrote:
         | Do you have, by any chance, any good resources on PDF.js? The
         | README on the GitHub is OK, but it doesn't really cover what
         | workers are supposed to do or provide any useful mental model
         | for the architecture of the whole thing.
        
         | brendandahl wrote:
         | PDF.js dev here. I'm a bit confused about which part of the internal
         | API you would like to use? The way I think of it, there are
         | really three APIs in pdf.js: 1) the main-thread API (api.js),
         | which we base the version off; 2) the code that runs in the
         | worker; 3) the viewer components (web/*).
         | 
         | Quite a while ago, when we decided what parts of the API to
         | version, we thought more people would want to use #1. Now that
         | the project is mature, we could probably expose some more and
         | base the version off of that.
         | 
         | As for the "so that it doesn't look like PDF.js", we don't
         | limit the API because of this. That suggestion (which I don't
         | totally agree with) came from what we saw people doing, where
         | they'd copy the entire viewer, when it'd probably be better to
         | just let the user's browser choose how to show the PDF.
        
           | torresjrjr wrote:
           | > PDF.js dev here
           | 
           | I'm so sorry about being forward, but why the hell don't the
           | vim keys (hjkl) smooth scroll? It's so frustrating. Is there
           | an option to set it as so? Using the arrow keys is so
           | cumbersome.
        
             | colejohnson66 wrote:
             | Because not everyone uses Vim?
        
             | tarikozket wrote:
             | here, I'll make it easier for you to contribute to the
             | project by providing you with the lines you'd need to
             | update:
             | 
             | https://github.com/mozilla/pdf.js/blob/83e1bbea6e23db8744420...
             | 
             | https://github.com/mozilla/pdf.js/blob/83e1bbea6e23db8744420...
        
               | mavsman wrote:
               | Brilliant
        
       | kebman wrote:
       | I looked into coding PDFs once. Then I closed my MacBook (Pro)
       | and went for a long walk into the ocean. I think I almost got to
       | America, but then I turned and swam back again. Turned out I had
       | just fallen asleep and had a nightmare. I was actually just
       | working with regular text files, and everything was fine.
        
         | gorgoiler wrote:
         | _Only Forward_
         | 
         | ...a wonderful novel along these lines. They only get shot and
         | kidnapped though. Nothing so bad as PDFs.
        
         | rvense wrote:
         | My favourite PDF fact is that it doesn't have to start at the
         | beginning or end at the end of a file. Any sea of bytes that
         | contains a PDF file is an acceptable PDF file...
        
           | TeMPOraL wrote:
           | Did anyone try to pluck out PDFs from /dev/urandom? How about
           | from radiotelescope feed? Maybe the first evidence of
           | extraterrestrial life will be some poor alien's tax form?
        
             | willis77 wrote:
             | The digits of pi contain every pdf that ever could and ever
             | will exist.
        
               | rbonvall wrote:
               | Since a PDF can begin with non-PDF content, then pi
               | itself is a valid PDF file.
        
               | dheera wrote:
               | Pi is thought to be normal but it hasn't been proven yet,
               | so we can't say that for sure, but it's likely true.
        
               | mtzet wrote:
               | Well maybe. We don't know if pi is a normal number.
        
               | SteveGoob wrote:
               | > We don't know if pi is a normal number.
               | 
               | Sure we do. There are plenty of proofs out there that pi
               | is an irrational number.
        
               | jimktrains2 wrote:
               | Normal in this sense means that the frequency of each
               | digit approaches a uniform distribution as the length of
               | the sample increases towards infinity. Basically, if we
               | could see "all of" pi and count all the 0s, 1s, 2s, 3s,
               | etc. up to 9, all the counts would be equal.
        
               | gugagore wrote:
               | That on its own can't be right, because
               | 0.12345678901234.....
               | 
               | According to wikipedia, you gave a definition for "simply
               | normal", and for normal numbers the distribution of any
               | sequence of digits is uniform. So 00, 01, ..., 99 each
               | occur uniformly too.
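gugagore's counterexample is easy to check on a finite prefix. The sketch below (hypothetical helper `block_frequencies`) counts overlapping length-k blocks of digits; a finite prefix can only illustrate the definition, never prove or disprove normality.

```python
from collections import Counter


def block_frequencies(digits: str, k: int) -> dict:
    """Relative frequency of each overlapping length-k block in `digits`."""
    windows = [digits[i:i + k] for i in range(len(digits) - k + 1)]
    counts = Counter(windows)
    return {block: n / len(windows) for block, n in counts.items()}
```

For the prefix `"1234567890" * 1000`, each single digit has frequency 0.1, yet only 10 of the 100 possible two-digit blocks ever appear: the pattern is simply normal in base 10 without being normal.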
        
               | enedil wrote:
               | Moreover you need to consider it with regards to all
               | other bases than 10 too.
        
               | andreareina wrote:
               | Irrational does not imply normal. For example,
               | 1.01001000100001... is irrational but it's certainly not
               | normal.
        
               | x3c wrote:
               | Technically, 1.01001000100001... can be normal depending
               | on what ... stands for. :)
        
               | bscphil wrote:
               | Well, obviously. But presumably the ... is meant to imply
               | that this is the summation of 1/(10^(x(x+3)/2)).
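bscphil's series can be checked directly: the 1s sit at decimal positions x(x+3)/2 for x = 0, 1, 2, ... (position 0 being the integer part), and every other digit is 0. A small sketch that builds the expansion from those exact positions rather than with floating point; `lacunary_digits` is a hypothetical name.

```python
def lacunary_digits(n_decimals: int) -> str:
    """Expansion of sum_{x>=0} 10**(-x*(x+3)/2), truncated to n_decimals places.

    Position 0 is the integer part; a 1 sits at position x*(x+3)//2 for each
    x >= 0 (x*(x+3) is always even), and every other position is 0.
    """
    ones = set()
    x = 0
    while x * (x + 3) // 2 <= n_decimals:
        ones.add(x * (x + 3) // 2)
        x += 1
    fraction = "".join("1" if p in ones else "0" for p in range(1, n_decimals + 1))
    return "1." + fraction  # x = 0 contributes the leading 1
```

`lacunary_digits(14)` reproduces `1.01001000100001`, and only 43 of the first 1000 decimals are 1s. The share of 1s keeps shrinking toward zero, so even single-digit frequencies are not uniform, which is why the number is not normal.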
        
               | coddle-hark wrote:
               | Or what 1 or 0 or . stands for.
        
               | landryl wrote:
               | Actually I'd argue the example you provided is normal, as
               | long as you authorise a particular encoding where every
               | number n you're looking for is encoded as a string of n
               | zeros.
               | 
               | It's then trivial to see that every number you can think
               | of is encoded in there, and therefore any data, piece of
               | music or movie that ever existed.
               | 
               | (I'm not sure we're allowed to fiddle with the encoding,
               | but since we allow ourselves to represent a piece of
               | music as a number, we're already talking about encoding
               | anyway, so it doesn't seem like cheating to me...)
        
               | enedil wrote:
               | Normality of a number is with respect to number bases, so
               | your trick with encoding is invalid. Otherwise, every
               | computable number could be considered normal - take an
               | algorithm for generating of it, supply a random string
               | (this is the encoding), disregard the random string, and
               | you have a perfectly valid normal representation of your
               | number. So it is cheating.
        
               | dheera wrote:
               | Encoding doesn't count. Normality is a very specific
               | mathematical concept:
               | https://en.wikipedia.org/wiki/Normal_number
               | 
               | Also, 1.01001000100001... is a good example of a number
               | that is both irrational and transcendental but not
               | normal.
        
               | ducktective wrote:
               | I don't think that is a proven fact.
        
               | vcxy wrote:
               | "Find the earliest valid pdf in consecutive digits of pi"
        
               | btown wrote:
               | Imagine if it was a PDF that simply rendered the number
               | 42.
        
               | ddalex wrote:
               | If that happens we know for a fact that we are in a
               | simulation
        
               | drevil-v2 wrote:
               | Please don't give them any ideas... the whiteboard
               | interview coding tests are hard enough as it is
        
               | paulmd wrote:
               | I mean, the answer is trivially zero, there exists a PDF-
               | like structure somewhere in Pi, and the offset of that
               | doesn't have to be zero, it can start or end anywhere. So
               | the range [0, N] is a valid PDF.
        
               | infogulch wrote:
               | "Find the last byte of the first valid PDF in the binary
               | digits of Pi"
        
               | nroets wrote:
               | <citation needed>
               | 
               | Including a PDF that generates the digits of pi
        
           | nostoc wrote:
           | It'll depend on the pdf reader you're using, but I'm pretty
           | sure the PDF header needs to start in the first 1K of the
           | file.
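On nostoc's point, a sketch of the lenient header check, assuming the 1024-byte window mentioned above (actual limits differ between readers). `find_pdf_header` is hypothetical; it only looks for the `%PDF-` marker and does not validate anything else in the file.

```python
MARKER = b"%PDF-"


def find_pdf_header(data: bytes, window: int = 1024) -> int:
    """Offset of the %PDF- marker if it starts within the first `window` bytes, else -1."""
    # The end bound lets a marker that *starts* at offset window - 1 still match.
    return data.find(MARKER, 0, window + len(MARKER) - 1)
```

A proxy sniffing content types has to make the same kind of guess, which is why leading junk bytes make reliable PDF detection awkward.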
        
             | mkl wrote:
             | Some readers won't need a header at all, I think. Near the
             | end (usually!) of the file there's an index of objects
             | (page data etc.) with byte offsets, which can point to
             | anywhere in the file.
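mkl's description matches how a conventional PDF ends: the keyword `startxref`, the byte offset of the cross-reference section, then `%%EOF`. A sketch that reads that offset from the tail of the file; the hypothetical `find_startxref` takes the last occurrence because incremental updates append new sections. In PDF 1.5 and later the offset may point to a cross-reference stream object instead of a classic `xref` table.

```python
import re
from typing import Optional


def find_startxref(data: bytes, tail: int = 1024) -> Optional[int]:
    """Byte offset written after the last 'startxref' near the end of a PDF, or None."""
    window = data[-tail:]  # whole file if it is shorter than `tail`
    matches = list(re.finditer(rb"startxref\s+(\d+)", window))
    if not matches:
        return None
    # Incremental updates append new sections, so the last occurrence wins.
    return int(matches[-1].group(1))
```

A reader would then check what sits at that offset, e.g. `data[offset:offset + 4] == b"xref"` for a classic table, and follow the object offsets listed there.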
        
           | sp332 wrote:
           | On the other hand, this allows for some incredible polyglot
           | files, like some of the tricks with PoC||GtfO issues where
           | the file is a readable PDF but also a game cartridge and also
           | a zip file with the proof-of-concept code in the issue. And
           | the front cover has the MD5 hash of the whole file printed on
           | it... but that's another trick entirely!
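The polyglot trick sp332 mentions works partly because the formats are anchored at opposite ends: PDF readers look for the header near the start, while ZIP readers find the central directory by scanning back from the end. A sketch of the ZIP half only, using Python's `zipfile`; the PDF part in the usage is a placeholder header rather than a renderable document, and a real polyglot needs more care so the PDF reader still finds its own trailer. `make_pdf_zip_polyglot` is hypothetical.

```python
import io
import zipfile


def make_pdf_zip_polyglot(pdf_part: bytes, files: dict) -> bytes:
    """Append a ZIP archive after `pdf_part`.

    ZIP readers locate the end-of-central-directory record by scanning back
    from the end, which is why readers such as Python's zipfile tolerate
    bytes prepended to the archive.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return pdf_part + buf.getvalue()


blob = make_pdf_zip_polyglot(b"%PDF-1.4\n% placeholder, not a real page tree\n",
                             {"poc.txt": b"proof of concept"})
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    print(zf.read("poc.txt"))  # b'proof of concept'
```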
        
             | rvense wrote:
             | Yeah, next time I need a CV it'll be a single-file Ruby web
             | server and PDF that's also an archive of its own sources.
        
               | toomanybeersies wrote:
               | I'm currently looking for a job as a Rails Developer. I
               | might just do that.
               | 
               | Probably won't send it to any recruiters, but it will be
               | a funny anecdote for interviews.
        
           | cgb223 wrote:
           | My favorite PDF fact is that the security flags for things
           | like copy protection and passwords are up to the viewer to
           | implement, so you can just turn them off and all the security
           | is gone.
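To illustrate cgb223's point: under the standard security handler, permissions are just bits in the signed 32-bit `/P` integer of the encryption dictionary, and it is the viewer that decides whether to honor them. A decoding sketch, assuming `/P` has already been read from the file; bit numbers follow the spec's user access permissions table (bit 1 is the least significant), and `decode_permissions` is a hypothetical helper.

```python
# Bit positions (1 = least significant) from the PDF spec's table of user
# access permissions for the standard security handler.
PERMISSION_BITS = {
    "print": 3,
    "modify_contents": 4,
    "copy_text_and_graphics": 5,
    "annotate_and_fill_forms": 6,
    "fill_form_fields": 9,
    "extract_for_accessibility": 10,
    "assemble_document": 11,
    "high_quality_print": 12,
}


def decode_permissions(p: int) -> dict:
    """Decode the signed 32-bit /P value into named permission flags."""
    p &= 0xFFFFFFFF  # reinterpret the signed value as unsigned 32-bit
    return {name: bool((p >> (bit - 1)) & 1) for name, bit in PERMISSION_BITS.items()}
```

`decode_permissions(-4)` reports every permission as granted, while `-3904` denies all of them; nothing in the bits themselves stops a viewer from ignoring the answer.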
        
             | BHSPitMonkey wrote:
             | Debian actually goes out of their way to patch those checks
             | out in their PDF-related packages as part of their stance
             | against DRM, like this example with "pdftk":
             | 
             | https://sources.debian.org/patches/pdftk/2.02-4/drm_fix/
        
             | toomanybeersies wrote:
             | You can also circumvent copy protection on PDFs by taking a
             | screenshot, or taking a photo of the screen with your
             | phone.
        
             | r00fus wrote:
             | This is not entirely true: you can encrypt PDFs [1] since
             | v1.3 of the spec, but the cipher is often so weak (RC4 until
             | v1.6) they can be bruteforced in reasonable amounts of
             | time.
             | 
             | [1] https://www.pdflib.com/pdf-knowledge-base/pdf-password-secur...
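Some rough arithmetic shows why the old scheme falls quickly. The standard security handler in PDF 1.3 used a 40-bit RC4 key, so an exhaustive search has 2^40 candidates. The key-testing rate below is an invented round number, not a benchmark, and the function name is hypothetical:

```python
KEY_BITS = 40                  # RC4 key length in PDF 1.3's standard security handler
ASSUMED_RATE = 1_000_000_000   # keys per second: an invented figure, not a measurement


def exhaustive_search_seconds(key_bits: int, keys_per_second: float) -> float:
    """Worst-case time to try every possible key once (hypothetical helper)."""
    return 2 ** key_bits / keys_per_second


minutes = exhaustive_search_seconds(KEY_BITS, ASSUMED_RATE) / 60
print(f"40-bit worst case: {minutes:.1f} minutes")  # 40-bit worst case: 18.3 minutes
print(f"128-bit worst case: {exhaustive_search_seconds(128, ASSUMED_RATE):.2e} seconds")
```

Each extra key bit doubles the search, which is why longer keys and later AES support changed the picture.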
        
             | Drdrdrq wrote:
             | My somewhat less favorite pdf fact is that if you do that,
             | you are still breaking protection, legally speaking.
        
               | smegger001 wrote:
               | So what if you open it in a postscript viewer instead of
               | a PDF viewer? Because they are compatible formats except
               | for some edge cases like security flags.
        
               | mkl wrote:
               | Postscript and PDF are definitely not compatible formats.
               | The drawing model is similar, but the structure and code
               | are completely different.
        
               | maxerickson wrote:
               | Seems to be a reasonable analogy with trespass, where you
               | are violating the law when you cross an invisible line.
               | The need for marking the line varies considerably.
               | 
               | And even places with strong roaming rights tend to place
               | limits on well marked land.
        
           | maddyboo wrote:
           | Can a PDF file contain a PDF file, and if so can that PDF
           | file contain a PDF file?
        
             | betatim wrote:
             | Yes. Because the PDF standard specifies a mechanism that
             | lets you "attach" files to a PDF :)
        
             | layoutIfNeeded wrote:
             | ZIP files can: https://research.swtch.com/zip
        
           | alexott wrote:
           | You can imagine the pain when you need to reliably detect the PDF
           | MIME type on a web proxy, or something like that...
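A proxy check could be written either strictly or leniently. Here is a Python sketch of both. A strict sniffer wants `%PDF-` at byte 0, while a lenient one accepts it anywhere in an early window. The 1024-byte window mirrors the tolerance Adobe's implementation notes describe for Acrobat, but treat the exact figure and both (hypothetical) function names as assumptions:

```python
def looks_like_pdf_strict(data: bytes) -> bool:
    """Hypothetical strict check: header must be the very first bytes."""
    return data.startswith(b"%PDF-")


def looks_like_pdf(data: bytes, window: int = 1024) -> bool:
    """Hypothetical lenient check: header anywhere in the first `window` bytes.

    This leniency is what lets a file be a PDF and something else at once.
    """
    return b"%PDF-" in data[:window]


# Junk before the header: strict sniffers say no, lenient ones say yes.
payload = b"GIF89a" + b"\x00" * 100 + b"%PDF-1.7\n"
print(looks_like_pdf_strict(payload), looks_like_pdf(payload))  # False True
```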
        
           | agumonkey wrote:
           | I can never find the PDF hack talk where author explains all
           | 100 ways to embed things in pdf or pdf into things
        
             | wffurr wrote:
             | It's hidden in a PDF in the digits of Pi.
        
         | divbzero wrote:
         | I had to handle action buttons for a PDF once. I swam out from
         | America a long long ways before turning back. Might have
         | spotted you middle of the ocean.
        
           | meshaneian wrote:
           | Legit username warns of one PDF peril.
        
         | schoolornot wrote:
         | I thought I found a nifty trick by using OpenOffice to create a
         | form with a pre-filled value. I decoded the PDF using pdftk or
         | one of the free tools, and then modified the value. Nope, that
         | caused some kind of cascading/checksum error.
         | 
         | Ended up just making the app generate HTML before calling
         | wkhtmltopdf.
         | 
         | The PDF spec is insane! But like all things, what you get out of
         | Word/OpenOffice is 100x more complex than if you wrote it
         | yourself, which is indeed doable.
        
         | phpdave11 wrote:
         | It's really not that difficult if you read and understand the
         | PDF specification. As a learning exercise, I created a simple
         | PDF generator library that creates ASCII PDF documents (you can
         | open them in Notepad) and includes comments about what each
         | drawing instruction does.
         | 
         | https://github.com/phpdave11/davepdf
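In the same spirit, here is a hedged Python sketch that writes a one-page, all-ASCII PDF. It is not code from davepdf, whose API isn't shown here. The only fiddly part is the cross-reference table. It records the byte offset of every object, so it must be rebuilt whenever anything earlier in the file changes length. `build_minimal_pdf` is a hypothetical helper and assumes ASCII-only text:

```python
def build_minimal_pdf(text: str) -> bytes:
    """Hypothetical helper: a one-page, all-ASCII PDF showing `text`."""
    # Backslashes and parentheses are special inside PDF literal strings.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: begin text, font F1 at 24pt, move to (72, 720), show, end.
    content = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET"
    objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        "/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        f"<< /Length {len(content)} >>\nstream\n{content}\nendstream",
        "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = "%PDF-1.4\n"
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # ASCII only, so character count == byte offset
        out += f"{number} 0 obj\n{body}\nendobj\n"
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n"
    out += "0000000000 65535 f \n"  # object 0 heads the free list
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n"  # fixed-width 20-byte entries
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
    out += f"startxref\n{xref_start}\n%%EOF\n"
    return out.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text


print(build_minimal_pdf("Hello, PDF").decode("ascii"))
```

Every line of the output is readable, so it opens fine in Notepad.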
        
           | recursive wrote:
           | I'm sure generating PDFs is much easier than reading them,
           | such that "it just works" with any kind of PDF.
        
             | phpdave11 wrote:
             | Reading them is also easy. I wrote a library that reads
             | PDFs and imports page(s) from an existing PDF into a new
             | PDF as a Form XObject.
             | 
             | https://github.com/phpdave11/gofpdi
        
         | Someone wrote:
         | OK, they added various ways of data compression, but PDF is,
         | basically, a text-based format.
         | 
         | As far as I know, any PDF can be losslessly converted to an
         | equivalent PDF that can be edited in any text editor, even
         | Notepad. And yes, you could fill in the forms there, too (if
         | you were stubborn enough)
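The "various ways of data compression" are stream filters, and the most common one, `/FlateDecode`, is plain zlib. Here is a Python sketch with an invented content stream. It shows that undoing the filter gets back editable operator text. A real conversion would also have to drop `/Filter`, update `/Length`, and deal with predictors and object streams:

```python
import zlib

# An invented page content stream: the drawing operators themselves are ASCII text.
content = b"BT /F1 12 Tf 72 712 Td (Fill me in) Tj ET"

# What a writer stores when the stream dictionary says /Filter /FlateDecode.
compressed = zlib.compress(content)

# Undoing the filter recovers the original bytes exactly (lossless).
restored = zlib.decompress(compressed)
print(restored.decode("ascii"))  # BT /F1 12 Tf 72 712 Td (Fill me in) Tj ET
```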
        
           | quickthrowman wrote:
           | That's news to me and Bluebeam's PDF search feature! It turns
           | out you can make PDFs (this usually happens with
           | architectural drawings) that are comprised purely of images
           | that are not searchable, and therefore you are wrong.
           | 
           | I silently thank every architect that provides searchable
           | PDFs, it makes my job way easier
        
             | mkl wrote:
             | GP is right. The _code_ that makes up a PDF is text-based.
             | Those images can be encoded in the PDF file using the
             | ASCIIHexDecode filter, i.e. as editable ASCII text code.
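Here is a Python sketch of the `ASCIIHexDecode` rules as the spec describes them. Data is stored as pairs of hex digits, whitespace is ignored, `>` marks the end of the data, and an odd final digit is treated as if followed by 0. The function names are hypothetical. The cost is that the data doubles in size:

```python
import binascii

PDF_WHITESPACE = b" \t\n\r\f\x00"


def ascii_hex_encode(data: bytes) -> bytes:
    """Hypothetical helper: encode bytes as an ASCIIHexDecode stream body."""
    return binascii.hexlify(data).upper() + b">"  # '>' marks end-of-data


def ascii_hex_decode(encoded: bytes) -> bytes:
    """Hypothetical helper: decode an ASCIIHexDecode stream body."""
    body = encoded.split(b">", 1)[0]                            # stop at the EOD marker
    digits = bytes(c for c in body if c not in PDF_WHITESPACE)  # whitespace is ignored
    if len(digits) % 2:
        digits += b"0"                                          # odd last digit: pad with 0
    return binascii.unhexlify(digits)


pixels = bytes([255, 0, 0, 0, 128, 255])  # two RGB pixels
print(ascii_hex_encode(pixels))           # b'FF00000080FF>'
```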
        
               | xfer wrote:
               | Yes, just use a hex editor and all data is text-based.
        
           | roflc0ptic wrote:
           | It sounds like you either know a lot more than me or a lot
           | less than me. The PDFs I've dealt with don't store text as
           | strings, they store it as individual characters. This left me
           | having to write a heuristic based algorithm to group the
           | characters into words, words into lines, lines into
           | paragraphs, paragraphs into columns.
           | 
           | Again, as far as I know, there are no heuristics good enough
           | to get that right for all values of PDF.
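Here is a toy version of that heuristic in Python. It assumes you already have each glyph's character, x position, advance width and baseline y. In real files, getting those means running the content stream and font metrics first. `Glyph`, `group_words` and both thresholds are hypothetical, and the thresholds are arbitrary guesses. That is exactly why no single setting works for every PDF:

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record for one positioned glyph."""
    char: str
    x: float      # left edge on the baseline
    width: float  # advance width
    y: float      # baseline (PDF y grows upward)


def group_words(glyphs, gap_ratio=0.3, line_tol=2.0):
    """Group glyphs into lines of words using two guessed thresholds."""
    lines = []
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):  # top-to-bottom, left-to-right
        if lines and abs(lines[-1][0].y - g.y) <= line_tol:
            lines[-1].append(g)                          # close enough: same line
        else:
            lines.append([g])
    result = []
    for line in lines:
        line.sort(key=lambda g: g.x)                     # restore reading order in the line
        words, current = [], line[0].char
        for prev, g in zip(line, line[1:]):
            gap = g.x - (prev.x + prev.width)            # space between neighbouring glyphs
            if gap > gap_ratio * prev.width:
                words.append(current)
                current = g.char
            else:
                current += g.char
        words.append(current)
        result.append(words)
    return result


line1 = [Glyph(c, 72 + 6 * i, 6, 700) for i, c in enumerate("Hi there") if c != " "]
line2 = [Glyph(c, 72 + 6 * i, 6, 680) for i, c in enumerate("PDF")]
print(group_words(line2 + line1))  # [['Hi', 'there'], ['PDF']]
```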
        
             | mkl wrote:
             | More. Go here and download the PDF spec:
             | https://www.adobe.com/devnet/pdf/pdf_reference.html
             | 
             | Look at Chapter 3, Syntax. The code is all text based. We
             | are not talking about the visible characters in a PDF
             | viewer, but the code of the PDF file itself.
        
             | belval wrote:
             | He knows a lot less than you, probably, because there is
             | absolutely no requirement for PDFs to be in text format
             | and most aren't. The "text" he is editing could render to
             | completely different characters depending on how the PDF
             | document was created.
             | 
             | The default MacOS PDF printer will actually remap the font
             | cmap making born-digital PDFs where the "text" is something
             | else entirely (say "$" maps to "a").
        
               | yjftsjthsd-h wrote:
               | > The default MacOS PDF printer will actually remap the
               | font cmap making born-digital PDFs where the "text" is
               | something else entirely (say "$" maps to "a").
               | 
               | What? Why!? I've heard of doing that as a form of DRM,
               | but I can't imagine Darwin _defaulting_ to doing that.
        
               | belval wrote:
               | I never dug deeper into it, so I am not aware of why it
               | does that or if it's a specific version or whatnot, but
               | take a PDF from which you can extract the text (with
               | pdftotext/pdfbox for example). Open it in the document
               | viewer and "print" it to PDF. If you extract the text
               | again it is not readable anymore.
               | 
               | This wouldn't be an issue if it was a conscious choice,
               | but when I parsed a lot of born-digital PDFs we ended up
               | with a lot that were like that from various sources. Try
               | explaining that...
        
               | colejohnson66 wrote:
               | Could it be "compacting" the fonts? So if U+0000 to
               | U+007F aren't used at all, remove those glyphs and set
               | U+0000's glyph to be what _was_ U+0080? Yes, I know NULL
               | doesn't have a glyph, but I hope that gets the idea
               | across.
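Here is a Python illustration of that guess. It is a hypothesis about the mechanism, not a description of what macOS actually does. Suppose a subsetting writer hands out one-byte codes in order of first use. The bytes in the content stream then no longer match any standard encoding, and text extraction only works if the writer also emits a `/ToUnicode` map. `subset_encode` and its starting code are invented for the example:

```python
def subset_encode(text: str, first_code: int = 0x21):
    """Hypothetical subsetting writer: give each distinct character the next
    free one-byte code, in order of first appearance."""
    mapping = {}
    for ch in text:
        if ch not in mapping:
            mapping[ch] = first_code + len(mapping)
    encoded = bytes(mapping[ch] for ch in text)
    to_unicode = {code: ch for ch, code in mapping.items()}  # what /ToUnicode would carry
    return encoded, to_unicode


encoded, to_unicode = subset_encode("abba cab")
naive = encoded.decode("latin-1")                    # extractor assumes a standard encoding
recovered = "".join(to_unicode[b] for b in encoded)  # extractor honours the ToUnicode map
print(repr(naive), repr(recovered))
```

The page would still render correctly, because the glyphs are reordered along with the codes. Only the extracted text comes out as punctuation soup.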
        
         | mettamage wrote:
         | What's your favorite PDF feature that causes a brain meltdown?
         | 
         | I've read a few comments on HN how PDF is, well, not developer-
         | friendly. If people are interested in providing some more
         | examples here, I'd be curious to know!
        
           | jahewson wrote:
           | Acrobat can read fantastically corrupt PDF files, none of
           | which are covered by the spec. The endless surprises induce a
           | special kind of madness.
           | 
           | Streams just suddenly end? That's ok. Totally corrupt xref
           | tables? Ok. Incorrect image headers? Ok. Unrecognisably
           | mangled Type1 font formats? Fine!
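In practice, "Ok" often means a recovery pass. A common strategy is to ignore the broken xref and scan the whole file for `N G obj` headers. The Python sketch below shows that generic technique; it makes no claim about Acrobat's internals. It is naive, since it can be fooled by that pattern appearing inside stream data. `rebuild_xref` is a hypothetical name:

```python
import re

OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def rebuild_xref(data: bytes) -> dict:
    """Hypothetical helper: map (object number, generation) -> byte offset.

    Later definitions overwrite earlier ones, mirroring incremental updates
    that redefine an object further down the file.
    """
    offsets = {}
    for m in OBJ_HEADER.finditer(data):
        offsets[(int(m.group(1)), int(m.group(2)))] = m.start()
    return offsets


# A toy file whose xref entries and startxref point at nonsense offsets.
broken = (b"%PDF-1.4\n"
          b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
          b"xref\n0 3\n0000000000 65535 f \n0000000999 00000 n \n0000000999 00000 n \n"
          b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n9999\n%%EOF\n")
offsets = rebuild_xref(broken)
for (num, gen), off in sorted(offsets.items()):
    print(num, gen, off, broken[off:off + 7])
```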
        
             | belval wrote:
             | That's great because it creates client expectations
             | regarding what my PDF application should support.
             | Implementing the spec is not good enough, you have to do
             | what PDFium or Adobe do.
        
           | pierrebai wrote:
           | In the early 2000s I coded a PDF library for an industrial
           | printer suite. (Print, proof, impositions)
           | 
           | I personally think the structural PDF format is a really
           | great format. It's entirely ASCII-based, a pure text format,
           | yet it can embed arbitrary binary data and compress that
           | data. The actual structure is simple and supports just enough
           | functionality, like a tree of objects, dictionaries and
           | arrays, unicode strings, date formats, etc.
           | 
           | I think if you limit yourself to pure structural PDF, it would
           | have been a great format to standardize upon, much better
           | than JSON or XML. It's richer than JSON, simpler and saner
           | than XML. Again, its top-notch ability to embed binary is
           | great. It has other great characteristics, for example you
           | can update anything just by appending.
           | 
           | The ugly bits are in the "semantic" PDF: the page
           | descriptions, media, etc. Even then, the early versions of PDF
           | were nice, mainly just simplified Postscript.
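To illustrate the structural layer on its own, here is a Python sketch that serializes plain values into PDF object syntax. `Name` and `pdf_serialize` are hypothetical helpers. The sketch assumes ASCII-only strings and names without special characters. Real writers also need `#xx` name escapes and PDFDocEncoding or UTF-16 for text:

```python
class Name(str):
    """Hypothetical marker: a string to be written as a PDF name, e.g. /Page."""


def pdf_serialize(value) -> str:
    """Hypothetical helper: serialize plain Python values into PDF object syntax."""
    if isinstance(value, Name):
        return "/" + value
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: True is an int in Python
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # PDF reals have no exponent notation, so use fixed-point and trim zeros.
        return format(value, "f").rstrip("0").rstrip(".")
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(value, list):
        return "[" + " ".join(pdf_serialize(v) for v in value) + "]"
    if isinstance(value, dict):
        entries = " ".join(f"/{key} {pdf_serialize(v)}" for key, v in value.items())
        return "<< " + entries + " >>"
    raise TypeError(f"cannot serialize {type(value).__name__}")


example = {
    "Type": Name("Example"),
    "Rect": [0, 0, 612, 792],
    "Title": "Q3 (draft)",
    "Scale": 0.5,
    "Open": False,
    "Parent": None,
}
print(pdf_serialize(example))
# << /Type /Example /Rect [0 0 612 792] /Title (Q3 \(draft\)) /Scale 0.5 /Open false /Parent null >>
```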
        
           | izacus wrote:
           | Being able to do SQL queries to remote servers, upload form
           | contents directly to a server, embedded 3D models and being
           | able to embed a fully featured Tetris game in a page due
           | to support for JS.
           | 
           | Having said that (and worked on a commercial PDF library),
           | despite all the cruft that came with age, it's a well built
           | format that survived the test of time with good reasons.
        
           | amelius wrote:
           | At least it's probably better than MS-Word's internal format
           | ... (?)
        
             | DavidPeiffer wrote:
             | I'm not saying it's beautiful, but isn't Ms-Word's internal
             | format basically a series of XML files that are zipped up?
             | 
               | The old .doc and .xls files were a bad format, but my
             | understanding is that since Office 2007 the format is
             | generally much better.
        
               | alexott wrote:
               | Ms office files prior to office 2007 were mostly memory
               | dumps of specific components, wrapped into composite
               | files aka OLE2 storage - their content varied depending
               | on office versions and often locale
        
               | jhoechtl wrote:
               | To be fair, the old doc format was conceived in the DOS era
               | and memory efficiency was a prime concern back in the day.
        
               | smegger001 wrote:
               | And if they hadn't waited so long to update to a sane
               | file format, no one would complain, but they waited until
               | 2007 to fix the format, long after the DOS-era memory
               | excuse had ceased to be an issue. Even then, they
               | only did it to shoehorn their format in as
               | the ISO standard after one was already selected, bribing
               | their way through the process.
        
             | liversage wrote:
             | Microsoft Word stores XML documents inside a zip archive.
             | There is a detailed specification of the format available:
               | https://docs.microsoft.com/en-us/openspecs/office_standards/...
        
               | xaldir wrote:
               | I think he was talking about the classic .doc format
               | which was a clusterfuck and not the open XML.
        
               | akie wrote:
               | If I remember correctly the XML format was just an XML-
               | encoded version of the binary counterpart. Including all
               | or most of the bugs and weird hacks.
        
               | patrec wrote:
               | You don't remember correctly. Word's docx format is far
               | more intelligent than openoffice ODT, despite propaganda
               | to the contrary. With one exception: word's zip files
               | don't have a convenient magic header. The way it works
               | with ODT, and a bunch of other formats is that you put an
               | uncompressed identifier file (`mimetype`) as the first
               | entry inside your zipfile. At byte 30 (of your zipfile)
               | you then get `mimetype$THE_MIMETYPE`. This is a nice
               | trick and works for any zip-based format. Sadly, docx
               | does not do that so you have to go by file extension or
               | look at (more of) the contents of the zipfile.
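The trick can be sketched with Python's standard `zipfile` module. If you write `mimetype` first and uncompressed, the string lands right after the 30-byte local file header. ODF and EPUB both use this convention. The helper names and the sample payload are hypothetical:

```python
import io
import zipfile


def write_with_mimetype(mimetype: str, files: dict) -> bytes:
    """Hypothetical helper: a zip whose first entry is an uncompressed 'mimetype'."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # First entry, stored (not deflated), so its bytes appear verbatim.
        zf.writestr(zipfile.ZipInfo("mimetype"), mimetype,
                    compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def sniff_mimetype(data: bytes):
    """Hypothetical helper: return the embedded mimetype, or None if absent."""
    # Local file header: the first entry's file name starts at byte 30.
    if data[:4] != b"PK\x03\x04" or data[30:38] != b"mimetype":
        return None
    if int.from_bytes(data[8:10], "little") != 0:  # compression method must be 'stored'
        return None
    size = int.from_bytes(data[18:22], "little")   # compressed size (== size when stored)
    name_len = int.from_bytes(data[26:28], "little")
    extra_len = int.from_bytes(data[28:30], "little")
    start = 30 + name_len + extra_len
    return data[start:start + size].decode("ascii")


odt_like = write_with_mimetype("application/vnd.oasis.opendocument.text",
                               {"content.xml": "<office:document-content/>"})
print(sniff_mimetype(odt_like))
```

A sniffer only has to read the first hundred or so bytes. A zip without the marker, like a .docx, falls through to `None`.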
        
               | amaccuish wrote:
               | with the previous format being essentially a memory dump,
               | i'd say that's progress
        
               | alexott wrote:
               | That's correct - I worked with MS team that documented
               | old formats, and they said that sometimes they don't have
               | people left who knew what a specific struct was intended
               | for - although that was mostly for PowerPoint and
               | Visio; Excel and Word were better documented.
        
           | belval wrote:
           | - Remapping font tables to different characters to reduce
           | code usage.
           | 
           | - Clipping path logic: you can write text outside of it,
           | which makes it effectively invisible, yet it will show up if
           | you try to extract the text.
           | 
           | - Anything regarding the graphics state stack; it's a pain to
           | debug.
           | 
           | - Extracting content from AcroForm/JS "XFA" forms
           | 
           | PDF is great format for printing, it's just a pain for pretty
           | much everything else.
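Here is a Python sketch of the `q`/`Q`/`cm` bookkeeping that makes this hard to debug. Every `q` saves the whole graphics state, clip path included, and every `Q` restores it. So the transform applied to a given `Tj` depends on everything before it. This toy tracks only the current transformation matrix (CTM). It splits tokens on whitespace, so strings must not contain spaces. `multiply` and `track_ctm` are hypothetical names:

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m, n):
    """Concatenate PDF matrices [a b c d e f] (row-vector convention): m x n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def track_ctm(content: str):
    """Hypothetical helper: report the CTM in effect at each Tj operator."""
    stack, ctm, operands, shown = [], IDENTITY, [], []
    for token in content.split():
        if not token[0].isalpha():  # numbers, /Names and (strings) are operands
            operands.append(token)
            continue
        if token == "q":
            stack.append(ctm)       # save state
        elif token == "Q":
            ctm = stack.pop()       # restore state; an unbalanced Q raises IndexError
        elif token == "cm":
            ctm = multiply(tuple(float(x) for x in operands[-6:]), ctm)
        elif token == "Tj":
            shown.append((operands[-1], ctm))
        operands = []               # an operator consumes its operands
    return shown


stream = "q 2 0 0 2 0 0 cm BT /F1 12 Tf (Big) Tj ET Q BT /F1 12 Tf (Normal) Tj ET"
for text, ctm in track_ctm(stream):
    print(text, ctm)
```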
        
             | aidos wrote:
             | This is also my list. Except for the forms, that's one I
             | _don't_ have to deal with.
             | 
             | My other one is the use of multiple subset fonts that are
             | actually the same font with a different subset of glyphs
             | that you want to merge back together.
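Subset fonts can at least be recognised by name. The spec prefixes a subset's `/BaseFont` with a tag of six uppercase letters and a `+`. Here is a Python sketch that groups candidates by stripping that tag. The helper names are hypothetical. A matching name only suggests, and does not prove, that the glyph programs come from the same font file. The actual merge needs a font library:

```python
import re
from collections import defaultdict

SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def base_font_name(name: str) -> str:
    """Hypothetical helper: 'ABCDEF+Helvetica' -> 'Helvetica'."""
    return SUBSET_TAG.sub("", name.lstrip("/"))


def group_subsets(font_names):
    """Hypothetical helper: group subset fonts by their underlying font name."""
    groups = defaultdict(list)
    for name in font_names:
        groups[base_font_name(name)].append(name)
    return dict(groups)


print(group_subsets(["/EOODIA+Times-Roman", "/KFBHJQ+Times-Roman", "/Helvetica"]))
```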
        
         | moultano wrote:
         | It's kinda crazy that this is the format we've standardized on
         | to carry all of the output of academia into the future.
        
           | mjcohen wrote:
           | A lot of the input is LaTeX, so that's ok.
        
             | MayeulC wrote:
             | And arxiv asks for the original latex source when
             | submitting.
             | 
             | Well, at least, pdf is probably better than printed paper
             | for that purpose.
        
         | belval wrote:
         | As someone still working with PDF processing, I can confirm
         | that it doesn't get easier.
        
       | nvr219 wrote:
       | FINALLY! Now I can finally uninstall Chrome.
       | 
       | Of course, I do wish Sumatra supported filling forms. Then I
       | could uninstall Firefox too! ;-)
        
         | MayeulC wrote:
         | Okular works quite well for filling forms in my experience :)
        
       | flowerlad wrote:
       | > _After entering data into these fields you can download the
       | file to have the filled out version saved to your computer._
       | 
       | And then what? Fax it? Sounds like a missed opportunity to me. It
       | would be nice if you can add a Submit button to have the data
       | posted to the server, just like any other web-based form.
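Acrobat-style forms can carry a submit action that posts field values to a URL, in formats including HTML-form encoding, FDF and XFDF. Whether a given viewer (pdf.js included) implements it is a separate question. For the receiving end, here is a Python sketch that flattens an XFDF payload using the standard library. The payload and field names are invented, and `xfdf_fields` is a hypothetical helper. For untrusted uploads, heed the XML-vulnerability warning in the `xml` module docs:

```python
import xml.etree.ElementTree as ET

XFDF_NS = {"x": "http://ns.adobe.com/xfdf/"}


def xfdf_fields(payload: str) -> dict:
    """Hypothetical helper: flatten XFDF into {fully.qualified.name: value}."""
    root = ET.fromstring(payload)
    values = {}

    def walk(field, prefix):
        name = prefix + field.get("name", "")
        value = field.find("x:value", XFDF_NS)
        if value is not None:
            values[name] = value.text or ""
        for child in field.findall("x:field", XFDF_NS):
            walk(child, name + ".")  # child names are qualified with a dot

    for top in root.findall("x:fields/x:field", XFDF_NS):
        walk(top, "")
    return values


sample = """<xfdf xmlns="http://ns.adobe.com/xfdf/">
  <fields>
    <field name="applicant"><value>Jane Doe</value></field>
    <field name="address">
      <field name="city"><value>Springfield</value></field>
    </field>
  </fields>
</xfdf>"""
print(xfdf_fields(sample))  # {'applicant': 'Jane Doe', 'address.city': 'Springfield'}
```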
        
         | Jaxan wrote:
         | That would be nice if websites support that. But in my
           | experience all PDF forms I fill in have to be printed and then
         | signed and posted...
        
           | chairmanwow1 wrote:
           | I haven't printed a PDF to sign in years. Why don't you just
           | affix a digital image of your signature to the file? Save it
           | and email it back to whomever.
        
             | emidln wrote:
             | This, and in the rare circumstance where they only accept
             | regular mail or faxes, I use HelloFax.
        
             | Jaxan wrote:
             | Many places don't accept emails. Sure you can sign
             | digitally and then print.
        
         | unbalancedevh wrote:
         | e-mail it, or save it for your records.
        
           | flowerlad wrote:
           | And what would the recipient do with the email? Type it in
           | manually? You don't see any room for improvement here?
        
       | _coveredInBees wrote:
       | Sheesh, what's with the hate for a generally all-round useful
       | feature in an Open source browser? The last thing I want is to
       | have to install 3rd-party software on my machine and have my
       | browser be held hostage to it just to view PDF documents on the
       | web. Being able to fill them in is a very useful feature and the
       | in-browser PDF readers are still way less bloated than most other
       | plugins.
        
         | yjftsjthsd-h wrote:
         | Yes, this is a nice feature added to a basically-reasonable
         | implementation of a PDF viewer. I think the objection is that
         | that PDF viewer should be an actual independent application,
         | not baked into a browser that already is too many things to too
         | many people. It's like Chrome including a basic antivirus
         | function (https://support.google.com/chrome/answer/2765944?co=GENIE.Pl...)
         | - yes it's useful, yes I trust it more than a lot
         | of AV products, but no I don't think it's reasonable to bundle
         | it into the program that's supposed to be here to render web
         | pages for me. (Similar arguments, to varying degrees, are made
         | against WebRTC and Pocket)
        
           | bad_user wrote:
           | No, it's like Chrome including a PDF viewer.
        
           | _coveredInBees wrote:
           | I really don't see why it should be an independent
           | application. I mean it's not like we expect a PNG viewer or
           | HTML5 video viewer to be a separate application in a browser.
           | Being able to view (and in this case fill/interact with) PDFs
           | is pretty much a basic necessity on the web. Beyond the core
           | HN crowd, almost nobody cares to have a 3rd party application
           | that they have to install to view PDFs in their browser.
           | Having a lightweight and secure PDF viewer that is also not
           | made by some 3rd party company that could be collecting any
           | amount of data on you is a good thing in general.
        
         | kibwen wrote:
         | Yes, like it or not PDF is a de facto standard of the web, in
         | the same way that Flash was nearly a de facto standard before
         | the industry-wide decade-long effort to kill it. A browser that
         | doesn't support PDFs is as lacking in the eyes of users as a
         | browser that doesn't support PNGs.
        
           | Eduard wrote:
           | I don't agree.
           | 
           | PDF is fine to be some binary blob to download just as most
           | other binary blob formats are.
           | 
           | Would you expect to have .exe files being directly
           | interpreted by a browser?
        
             | jiveturkey wrote:
             | > Would you expect to have .exe files being directly
             | interpreted by a browser?
             | 
             | no, i wouldn't. and yet here we are: wasm.
        
           | jmiserez wrote:
           | If Flash was rendered natively in the browser, sandboxed and
           | across different browsers, and with high enough
           | performance/low enough battery impact, it would have stayed.
           | 
           | There were efforts similar to PDF.js to run Flash content
           | using JS but they were never able to tick all those boxes.
        
       | bawolff wrote:
       | Finally. It's a nightmare trying to fill out a PDF form on Linux.
        
         | franga2000 wrote:
         | Okular can handle basically everything for me, except for those
         | Adobe-proprietary ones that require JS and all kinds of other
         | dumb features that only Acrobat supports.
        
           | jhoechtl wrote:
           | I recently made the switch to gnome as the multi-monitor
           | support, fractional scaling and general Wayland support is
           | only excelled by sway. I sorely miss Okular!
        
             | mgbmtl wrote:
             | Can't you still run KDE apps under Gnome, even with
             | Wayland? I use a few. Some of them look better with the
             | "QT_QPA_PLATFORM=wayland" environment variable.
        
               | formerly_proven wrote:
               | Most KDE apps work not just under Gnome, but even under
               | _gasp_ Windows! I think Okular and some others are even
               | in the MS app store.
        
             | ReverseCold wrote:
             | Okular should work fine on GNOME, but you might need extra
             | disk space for all the KDE dependencies.
        
               | jhoechtl wrote:
               | That's the point. Apps which only use Qt, like keepassx, are
               | ok but Okular would swap in half of KDE.
        
         | kevincox wrote:
         | I very rarely have any issues using evince. What PDF viewer are
         | you using?
        
         | randlet wrote:
         | I purchased PDF Studio Pro and it works pretty well for me.
        
         | loufe wrote:
         | I just use Libreoffice Draw to add text into stubborn pdfs on
         | windows and any on Linux. It's a good, free OSS way to get the
         | job done, though not pretty.
        
       | torresjrjr wrote:
       | I just want to smooth scroll with vim keys (hjkl). Too much to
       | ask? :/
        
         | calcifer wrote:
         | How is that relevant to this thread?
        
       | doc_gunthrop wrote:
       | Any chance Firefox will have built-in support for printing to
       | PDF? There's a browser extension[1], but it was last updated 3
       | years ago. Seems the Chrome browser has had this feature for
       | ages.
       | 
       | 1: https://addons.mozilla.org/en-US/firefox/addon/print-to-pdf-...
        
         | jwatt wrote:
         | It's not ready for release yet, but if you flip the preference
         | `print.tab_modal.enabled` to true you'll get the replacement
         | printing interface which has a "Save as PDF" pseudo-printer.
        
           | [deleted]
        
         | callalex wrote:
         | Does your operating system not support this natively from the
         | print dialog?
        
           | auxym wrote:
           | On Windows at least, using the built-in PDF printer with
           | Firefox results in text in the PDF file being converted to
           | paths (not text). Huge file and you can't copy/paste. I've
           | tried 3rd party PDF printers (PDFForge) and the result is the
           | same, so I think it might be a FF bug (or feature)?
           | 
           | Chrome's save-as PDF produces actual text. It's the main
           | reason I still have chrome installed.
        
             | RonanTheGrey wrote:
             | That seems.... odd. I am on Firefox on Windows and I print
             | to PDF all the time using the Windows built-in PDF printer
             | ("Microsoft Print to PDF"), without issue. In fact
             | sometimes that printer is the only one that can get things
             | to format correctly!
             | 
             | Something on your system might be interfering with the
             | printing process.
        
             | vel0city wrote:
             | There must be something strange about your particular setup,
             | or maybe the behavior changes based on the page. With Firefox
             | 81 and Windows 10 version 2004, on multiple computers,
             | printing this page of comments with the "Microsoft Print to
             | PDF" printer results in a PDF of ~470KB with selectable
             | text.
        
       | elric wrote:
       | That built in PDF viewer is another feature that could have been
       | an addon. It's bloat which increases the browser's attack
       | surface. It's completely unneeded given that just about every OS
       | ships with some kind of PDF reader out of the box.
        
         | morpheuskafka wrote:
         | The built-in PDF reader on Windows is literally to open the PDF
         | in Edge, so not very good UX for Firefox and also a good
         | argument that browsers are expected to have PDF readers.
        
         | [deleted]
        
         | sp332 wrote:
         | The alternative was installing an Adobe plugin with no sandbox,
         | so it made sense at the time.
        
         | godshatter wrote:
         | I would like to see Mozilla modularize Firefox more. Browsers
         | are such huge beasts that contain everything imaginable plus
         | the kitchen sink these days. It would be nice for these kinds
         | of features to be add-ons that can be disabled or deleted if
         | their functionality is not needed or desired, freeing resources
         | for other use.
         | 
         | They can be part of the initial install so that Mozilla can
         | provide the browser as they envision it, but be able to be
         | removed for those who have other ideas of what their browser
         | should consist of.
         | 
         | I don't know how technically feasible that is with their code,
         | but it makes sense to me from a developer standpoint.
        
         | toyg wrote:
         | Thank Google - Chrome was the first browser to ship with a pdf
         | reader, and people loved it. Now it's just expected that any
         | browser should have a workable PDF reader built-in.
        
         | axelf4 wrote:
         | > It's bloat which increases the browser's attack surface.
         | 
         | AFAICT PDF.js is just another JavaScript application and thus
         | as sandboxed as any other website.
        
           | est31 wrote:
           | It's a js application and thus less exploitable than your
           | average C application with tons of unsound code, but IIRC it
           | belongs to the class of "privileged js" layer that Firefox
           | has, so has special rights that usual website js doesn't
           | have.
        
       | saghm wrote:
       | I just found out that this feature was coming last night, and I
       | hadn't realized that today was release day! I did discover that
       | if you want to enable it on Firefox 80, you can toggle
       | `pdfjs.renderInteractiveForms` in about:config
        
       | paulpauper wrote:
       | I would like to see a version that allows forms to be signed
        
         | speedmagnet wrote:
         | Microsoft Edge allows you to draw on PDFs and save them easily.
         | I use it for signing all the time.
        
           | anaganisk wrote:
           | I think he meant digital signature
        
             | nip wrote:
             | In case he meant regular (drawn) signature, it can be done
             | via Preview on Mac.
             | 
               | For local web use, I built for myself
             | https://formulairemagique.fr for this very reason
        
               | tendersej wrote:
               | good job on the simple UI! I think it will prove useful
               | next time I have a form to fill.
        
               | lostlogin wrote:
               | Preview.app is just so good. It's my favourite default
               | Mac app by miles.
               | 
               | It and terminal.app have survived the thing Apple does
               | where they update applications and remove all the
               | application's power to achieve anything.
        
           | shoguning wrote:
           | I was pleasantly surprised by this recently. Just worked
           | using my touchscreen laptop. So rare on Windows.
        
       | godelski wrote:
       | Can we just get support for math text? For years I've
       | accidentally printed research papers from the browser, only to
       | have to open them back up in a non-browser PDF reader and reprint.
       | 
       | With that and form fill I basically don't need another PDF
       | reader, which is nice.
        
       | Causality1 wrote:
       | Is there something hard about fillable forms on PDF? Why have a
       | PDF viewer at all if it couldn't fill out a form?
        
         | derefr wrote:
         | > Is there something hard about fillable forms on PDF?
         | 
         | In the sense of a "form" just being lines on paper that you can
         | arbitrarily add some text to -- no, that's easy.
         | 
         | Likewise, in the sense of a "form" being some defined input
         | regions that accept your keystrokes and turn them into new text
         | DOM nodes in the PDF itself -- easy enough. Though, unlike
         | HTML, there's no concept of an <input> _tag_ that just has the
         | semantics of accepting keystrokes and turning them into
         | (persisted) input; instead, this all has to be done through
         | scripting [i.e. writing event-handlers, or having some PDF
         | authoring software generate them]; and there are several
         | incompatible scripting languages for PDF that get used, some of
         | which are proprietary with no open specification.
         | 
         | But, doing form _validation_? Or, worse yet, making one of
         | those fancy PDF forms that auto-calculates fields like an Excel
         | spreadsheet? Now you're getting into the hairy stuff, because
         | IIRC none of the _open-standard_ PDF scripting systems provide
         | these sorts of mechanisms, so these are inherently proprietary
         | things.
         | 
         | And when I say "proprietary", I mean "like old versions of Word
         | or Photoshop, where each version emitted its own in-memory
         | data-structures to disk without formal serialization; and it
         | was the job of authors of future versions to write importers to
         | deserialize whatever format resulted."
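To make the "auto-calculating" behaviour concrete, here is a hypothetical Python sketch (not Acrobat's or pdf.js's actual code) of the model behind such forms: entered values, per-field validators, and calculated fields recomputed in a fixed order. In a real PDF these hooks are usually JavaScript actions attached to the fields, and the interactive form dictionary has a /CO array listing the order in which fields with calculation actions are recalculated. The class, the field names and the 20% tax rule below are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Values = dict[str, float]


@dataclass
class Form:
    """Hypothetical form model: user-entered fields plus calculated fields."""
    values: Values = field(default_factory=dict)
    # field name -> function computing its value from the current values
    calculations: dict[str, Callable[[Values], float]] = field(default_factory=dict)
    # recalculation order, similar in spirit to the /CO array
    calc_order: list[str] = field(default_factory=list)
    # field name -> validator returning an error message, or None if valid
    validators: dict[str, Callable[[float], Optional[str]]] = field(default_factory=dict)

    def set_value(self, name: str, value: float) -> Optional[str]:
        """Validate and store an entered value, then recalculate dependents."""
        validator = self.validators.get(name)
        if validator is not None:
            error = validator(value)
            if error is not None:
                return error  # reject the input and keep the old value
        self.values[name] = value
        self.recalculate()
        return None

    def recalculate(self) -> None:
        # Fields later in calc_order may read fields computed earlier.
        for name in self.calc_order:
            self.values[name] = self.calculations[name](self.values)


form = Form(
    calculations={
        "subtotal": lambda v: v.get("qty", 0) * v.get("unit_price", 0),
        "tax": lambda v: round(v["subtotal"] * 0.2, 2),
        "total": lambda v: v["subtotal"] + v["tax"],
    },
    calc_order=["subtotal", "tax", "total"],
    validators={"qty": lambda x: None if x >= 0 else "quantity must be non-negative"},
)
form.set_value("qty", 3)
form.set_value("unit_price", 10.0)
error = form.set_value("qty", -1)
print(form.values["total"], error)  # prints: 36.0 quantity must be non-negative
```

The explicit order matters because "total" depends on "tax", which depends on "subtotal"; recomputing them in the wrong order would show stale values.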
        
         | foxdev wrote:
         | While PDF is an open format on paper, in practice it's as
         | proprietary as any ancient format. Supporting it in full is not
         | trivial.
        
           | core-questions wrote:
           | The real problem here is that, 20+ years on, printing to PDF
           | is still a totally natural and easy-to-understand metaphor
           | for a normal office desktop user; but producing HTML for the
           | browser is still impossible for them.
           | 
           | If we simply had print-to-HTML functionality which resulted
           | in a document identical to what you view onscreen while
           | editing, PDF could die the death it deserves.
           | 
           | But HTML+CSS somehow manages to suck just as much for common
           | usage, so it persists.
        
             | foxdev wrote:
             | I wish epub would catch on for more than books. An epub is
             | just HTML and CSS in a zip file, and a large part of the
             | world population has a device that can load it and present
             | it cleanly.
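At the packaging level that description holds up. As a minimal sketch using only the Python standard library (build_epub is a hypothetical name, and the OEBPS/ folder and file names are conventional choices rather than required ones), an EPUB 3 needs: an uncompressed mimetype entry first in the zip, a META-INF/container.xml pointing at a package document, the package document itself, a navigation document, and the XHTML/CSS content. Real reading systems can be stricter than this; EPUBCheck is the usual way to validate output.

```python
import io
import uuid
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(title: str, paragraphs: list[str], css: str = "") -> bytes:
    """Package plain-text paragraphs as a minimal EPUB 3 file, returned as bytes."""
    t = escape(title)
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Package document: metadata, list of files (manifest), reading order (spine).
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:{uuid.uuid4()}</dc:identifier>
    <dc:title>{t}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="doc" href="doc.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="doc"/>
  </spine>
</package>
"""
    nav = f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{t}</title></head>
<body><nav epub:type="toc"><ol><li><a href="doc.xhtml">{t}</a></li></ol></nav></body>
</html>
"""
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    doc = f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{t}</title><link rel="stylesheet" href="style.css"/></head>
<body><h1>{t}</h1>
{body}
</body>
</html>
"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must come first and be stored uncompressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, text in [
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", opf),
            ("OEBPS/nav.xhtml", nav),
            ("OEBPS/doc.xhtml", doc),
            ("OEBPS/style.css", css),
        ]:
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


data = build_epub("Hello", ["An epub is just HTML and CSS in a zip file."],
                  "p { font-family: serif; }")
```

Writing `data` to a file ending in `.epub` should let most readers open it; putting the mimetype entry first and uncompressed is what lets tools identify the format from the file's first bytes.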
        
         | jahewson wrote:
         | Yes! PDF forms are amazingly complex. Text in PDF is very
         | complex and the forms themselves are a kind of templated vector
         | graphics. Multiply this by all the weird and corrupt PDF forms
         | out there that Acrobat supports, and you have a challenging
         | task.
        
         | gpvos wrote:
         | I don't know about you, but >98% of the PDFs I use are just for
         | reading and don't contain a fillable form.
         | 
         | And implementing a PDF viewer is already a major undertaking;
         | adding the form functionality complicates things even more.
        
           | nip wrote:
           | I posted a link (above) to the app I built to solve that
           | problem.
           | 
           | The vast majority of forms are indeed not "ready" for
           | input, requiring users to jump through hoops to fill them
           | out. And that work is repeated by the next person.
        
       | marvindanig wrote:
       | Why is Firefox spending all their money and goodwill on a piece
       | of technology that should be done away with?
       | 
       | PDF is a dork. It's an accessibility nightmare with no obvious
       | advantage over simple ordinary webpages. Somewhere in the
       | comments below, it is mentioned that supporting PDFs is a
       | non-trivial piece of technology. Maybe! Even steam engines have
       | non-trivial technology under the hood.
        
         | cptskippy wrote:
         | > It's an accessibility nightmare with no obvious advantage
         | over simple ordinary webpages.
         | 
         | It is easy to criticize something when you don't look back at
         | the historical context through which it emerged. It has plenty
         | of advantages over HTML but they're easy to dismiss if you
         | don't have a use case for them.
        
           | inetknght wrote:
           | > _It has plenty of advantages over HTML but they're easy to
           | dismiss if you don't have a use case for them._
           | 
           | Can you discuss some of the advantages? The only advantage
           | that comes to mind is that Apple has built-in support for
           | writing PDFs and that has a lot to do with Adobe rather than
           | PDF being a better candidate.
        
             | tdhz77 wrote:
             | I work for the US federal courts, and I can assure you HTML
             | isn't an adequate substitute for PDFs in court cases.
             | Evidence is filed as PDFs. Documents (PDFs) need to serve as
             | a historical archive, and the ability to modify them would
             | damage their credibility.
        
               | kevincox wrote:
               | > ability to modify
               | 
               | How are PDFs any less modifiable than HTML other than
               | requiring (widely available) specialized tools instead of
               | a text editor?
        
               | endless1234 wrote:
               | Cryptographic signing is a core feature of PDF, but not
               | HTML.
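As a rough illustration of how PDF signing works at the byte level: a signature dictionary's /ByteRange array names two spans of the file, and the signature is computed over exactly those bytes, which cover everything except the /Contents hex string where the signature itself is later written. The Python sketch below only computes that digest; byte_range_digest is a hypothetical helper and the input is toy bytes, not a real PDF. Actually verifying a signature also means parsing the CMS/PKCS#7 blob and checking the certificate chain, which a real PDF library would handle.

```python
import hashlib


def byte_range_digest(pdf_bytes: bytes, byte_range: list[int]) -> str:
    """SHA-256 over the two spans named by a /ByteRange-style array.

    byte_range is [offset1, length1, offset2, length2]; the gap between the
    spans is where the /Contents value (the signature blob) sits.
    """
    if len(byte_range) != 4:
        raise ValueError("expected [offset1, length1, offset2, length2]")
    o1, l1, o2, l2 = byte_range
    h = hashlib.sha256()
    h.update(pdf_bytes[o1:o1 + l1])
    h.update(pdf_bytes[o2:o2 + l2])
    return h.hexdigest()


# Toy bytes standing in for a PDF; only the layout matters here.
prefix = b"%PDF-1.7\n% ... signature dictionary ... /Contents "
gap = b"<" + b"0" * 32 + b">"  # placeholder reserved for the signature
suffix = b"\n% ... signed page content ...\n%%EOF\n"
byte_range = [0, len(prefix), len(prefix) + len(gap), len(suffix)]

unsigned = prefix + gap + suffix
original = byte_range_digest(unsigned, byte_range)

# Writing the signature into the gap leaves the digest unchanged...
signed = prefix + b"<" + b"ab" * 16 + b">" + suffix
assert byte_range_digest(signed, byte_range) == original

# ...but changing any byte outside the gap breaks it.
tampered = signed.replace(b"signed page", b"edited page")
assert byte_range_digest(tampered, byte_range) != original
```

Excluding only the signature value from the hash is what makes it possible to embed the signature in the same file it signs.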
        
               | marvindanig wrote:
               | Yeah, but does Firefox need to solve the use-case of a
               | court system? Also, tangentially the solution to
               | guarantee "tamperproof" archiving is in cryptography and
               | that's not a feature of PDF.
        
               | yzmtf2008 wrote:
               | No, Firefox doesn't need to support the use case of a
               | court system. That's not what GP is saying. All we're
               | establishing here is that PDF is a useful format, and
               | Firefox is supporting it.
               | 
               | Also, cryptographic signatures do happen to be a feature
               | of PDF.
        
               | marvindanig wrote:
               | Now that I read my comment I see the issue with it.
               | 
               | What I meant to say is that Firefox should focus on
               | implementing cryptographic signing for HTML instead, not
               | on a PDF viewer on the web; enabling cryptographic
               | signatures isn't tied to the PDF format per se.
        
             | toyg wrote:
             | PDF prints infinitely better than HTML, and it can be
             | somewhat hardened against modification by average users.
             | 
             | If you think MSOffice users would prefer to output HTML
             | over PDF, you don't live in the same corporate world I
             | inhabit.
        
             | bbarn wrote:
             | PDF's ubiquity is 100% due to the fact that it printed the
             | same (or close to the same) on any PostScript-compatible
             | printer. It's tech so old that many in the industry ignore
             | the reason it existed (and still exists). Every solution
             | beyond PDF has also been either closed source (read
             | Microsoft) or ignored. It's useful; that's why it exists.
             | Yes, it's archaic, and yes, it's hard to read for tech
             | people, but for non-tech people it solves an issue that
             | plagues the entire software industry: standardization.
        
         | toyg wrote:
         | _> Why is Firefox spending all their money and goodwill_
         | 
         | I doubt "all" their money goes towards the pdf-reader bit. And
         | tbh, I'd say nobody will really lower their goodwill towards
         | Mozilla because they add features that a lot of people actually
         | need.
        
         | beervirus wrote:
         | There are lots of use cases for PDF where a web page is totally
         | unsuitable.
        
           | marvindanig wrote:
           | As someone working on formats, I disagree with your
           | generalization. But let's get into specifics: list the things
           | about PDF that you believe can't be done with web pages.
        
             | cptskippy wrote:
             | It's easy to dismiss things in their entirety and then
             | require someone else to "prove you wrong". Why don't you
             | prove you're right instead?
             | 
             | Why don't you list all of the things that PDFs can do that
             | can also be done with web pages?
        
               | marvindanig wrote:
               | Sure, here's my list: everything + more.
               | 
               | There's nothing a PDF can do that a webpage can't. In
               | fact, there are hundreds of things that a webpage can do
               | but a PDF can't, including form fields, input fields and
               | seamless form submissions.
               | 
               | Webpages can also do this:
               | https://bubblin.io/cover/official-handbook-by-marvin-
               | danig#f...
               | 
               | Disclosure: It's my work.
        
               | minerjoe wrote:
               | Wish I could look at your work but my browser doesn't
               | support javascript. I wonder what it is about.
        
               | cptskippy wrote:
               | Anyone can create a PDF form to capture data and
               | signatures, email it to someone who can then fill it out
               | offline, and then email it back. That's not something
               | easily done with a webpage, and it's not something my mom
               | can do.
               | 
               | PDFs are easy to make and easy to work with. Web pages
               | aren't.
               | 
               | Your work is impressive, but why would anyone want that?
               | Do you envision lawyers putting all their legal contracts
               | into fancy flippy books?
        
               | marvindanig wrote:
               | > Do you envision lawyers putting all their legal
               | contracts into fancy flippy books?
               | 
               | Someone will have to solve it for the lawyers in a not so
               | 'fancy consumerish' way. Point is that it is possible to
               | do that, and Firefox shouldn't be solving this problem
               | using an ancient format and a layer of cruft in between.
        
             | f1refly wrote:
             | Distributing a document with functioning kerning and
             | embedded fonts that works offline
        
               | marvindanig wrote:
               | Service workers + @font-face + CSS3's font-kerning property.
               | Done, next.
        
               | yzmtf2008 wrote:
               | I think you missed the point of distributing. You can't
               | email me your service workers, and I can't forward this
               | document to anyone without relying on you to keep hosting
               | a server and not change the content.
        
               | marvindanig wrote:
               | Oh, I'm all in for email/attachment-based distribution.
               | Just not with Firefox supporting it in the web browser,
               | where you'd almost certainly need someone to host a
               | server and would have to trust them that no changes have
               | been made to the content.
               | 
               | That was the entire point of my comment at the top.
        
             | edflsafoiewq wrote:
             | Going to a particular page and only having to render that
             | one page. Large HTML documents are unwieldy.
        
               | beervirus wrote:
               | The modern web is slow for a lot of reasons, but none of
               | them are about rendering lots of static html. Anyway just
               | break things up into multiple pages if necessary.
        
           | derefr wrote:
           | Yes, maybe generally; but let's talk about the specific case
           | here -- filling of complex PDF forms.
           | 
           | When a PDF has interactive form fields, calculated
           | auto-populated fields, fields that are enabled/disabled according
           | to the inputs of other fields, etc. -- the organization that
           | created it (usually government or education) usually does
           | that because they want you to fill it out _using_ a PDF
           | viewer; save it (which will persist the form inputs "into"
           | the resulting PDF); and then submit _the modified PDF file_
           | back to them. They want this, because they can use automated
           | backend processes to extract the data from the PDF. They
           | _don't_ want you to just print out the thing and fill it out.
           | In fact, many such "fillable" PDFs start off in a state with
           | many of their form-fields disabled and voided, such that
           | printing them out in that state would result in a form you
           | can't really write on!
           | 
           | So, at _that_ point, why didn't they just make the PDF a web
           | page? They've essentially reinvented a web form, but with
           | extra steps. The only benefit a client gets is the ability to
           | edit and save the form offline (but that can be done in a
           | browser, too, with local storage); and furthermore, the
           | ability to treat the resulting filled form as a file, moving
           | it around before you submit it. But the cases where you need
           | that are _very_ niche, compared to the cases where you can
           | just direct employees to your Intranet portal.
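For the "automated backend processes" mentioned above, the entered data lives in the form's field dictionaries: /T holds a field's name and /V its value. The following is a deliberately naive Python sketch (extract_text_fields is a hypothetical helper) that pulls literal-string names and values out of uncompressed objects with regular expressions. Real filled forms often use compressed object streams, hex or UTF-16 strings, escaped parentheses and inherited field names, so in practice a proper PDF parsing library would be the safer choice.

```python
import re

# Deliberately naive patterns: they assume uncompressed objects and
# literal strings without escaped parentheses.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj", re.S)
NAME_RE = re.compile(rb"/T\s*\(([^)]*)\)")   # /T (field name)
VALUE_RE = re.compile(rb"/V\s*\(([^)]*)\)")  # /V (field value)


def extract_text_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Map field names to values for objects carrying both /T and /V strings."""
    fields: dict[str, str] = {}
    for body in OBJ_RE.findall(pdf_bytes):
        name = NAME_RE.search(body)
        value = VALUE_RE.search(body)
        if name and value:
            # latin-1 is only an approximation of PDF text-string encodings
            fields[name.group(1).decode("latin-1")] = value.group(1).decode("latin-1")
    return fields


sample = b"""%PDF-1.7
4 0 obj
<< /FT /Tx /T (last_name) /V (Smith) /Rect [50 700 250 720] >>
endobj
5 0 obj
<< /V (1990-04-01) /FT /Tx /T (date_of_birth) >>
endobj
6 0 obj
<< /FT /Tx /T (middle_name) >>
endobj
"""
fields = extract_text_fields(sample)
print(fields)  # {'last_name': 'Smith', 'date_of_birth': '1990-04-01'}
```

Searching for /T and /V separately within each object, rather than with one combined pattern, avoids assuming any key order inside the dictionary; the empty middle_name field is skipped because it has no /V.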
        
             | cptskippy wrote:
             | The use case you're describing wasn't feasible until about
             | 20 years after PDFs were introduced. Web Storage isn't that
             | old, has only recently become widely deployed, and in a lot
             | of cases is disabled for security concerns.
        
             | vonmoltke wrote:
             | > In fact, many such "fillable" PDFs start off in a state
             | with many of their form-fields disabled and voided, such
             | that printing them out in that state would result in a form
             | you can't really write on!
             | 
             | I have never seen this. Do you have an example? Every use
             | of fillable PDFs I have encountered is a use case where
             | submitting a handwritten form is still an option.
             | 
             | > The only benefit a client gets is the ability to edit and
             | save the form offline (but that can be done in a browser,
             | too, with local storage); and furthermore, the ability to
             | treat the resulting filled form as a file, moving it around
             | before you submit it.
             | 
             | I have yet to see a web form that actually saves a
             | readable, properly-formatted, self-contained, easy to
             | access, fully-offline copy.
             | 
             | > But the cases where you need that are very niche,
             | compared to the cases where you can just direct employees
             | to your Intranet portal.
             | 
             | This is not a trivial need; most forms sent as fillable
             | PDFs need to or should be retained for some period after
             | submission. Also, I don't know what "employees" and
             | "Intranet" have to do with anything.
             | 
             | You are also missing the use case where a form legally
             | requires a live signature from one or more parties and needs
             | to be printed, even if just to scan and return. I recently
             | had to do this for some insurance paperwork.
        
               | Isthatablackgsd wrote:
               | > You are also missing the use case where a form legally
               | requires a live signature from one or more parties and
               | needs to be printed, even if just to scan and return. I
               | recently had to do this for some insurance paperwork.
               | 
               | My company has to do this for one state government. They
               | require the signature to be handwritten in black ink. It
               | is a PITA, since we all have digital signatures set up.
               | But nope, this state government requires a handwritten
               | signature.
        
               | andrewshadura wrote:
               | The Canadian visa application form is an example.
        
               | derefr wrote:
               | > I have never seen this. Do you have an example?
               | 
               | I don't have one on-hand, no. But I've certainly had to
               | fill them out in the past. IIRC an especially-bad one
               | came in the form [heh] of a student-loan application for
               | the college I attended. It was essentially a Hypercard
               | stack in the guise of a PDF.
               | 
               | Here are some early Adobe marketing materials (as a PDF,
               | because of course it is) talking about the advantages of
               | "eForm Solutions": https://planetpdf.com/planetpdf/pdfs/p
               | df2k/02E/ldefurio_pdff...
               | 
               | It sounds like every PDF form you've ever dealt with is
               | what Adobe, in this brochure, calls a "Type 1: Print and
               | Fill" or "Type 2: Fill and Print" form. But Type 3 and
               | Type 4 forms do exist in the wild! (They're not often
               | _created_ any more; most of the ones that exist now are
               | from around a decade or two ago, when Adobe was really
               | pushing this idea.) Creating such forms was basically the
               | point of Acrobat as a software product.
               | 
               | When PDF viewers (e.g. Apple Preview) say they don't
               | support "PDF forms", they're not talking about Type 2
               | forms. They usually support those just fine. They're
               | talking about Type 3 and Type 4 forms. And more
               | specifically, the ones that use Adobe's proprietary
               | AcroForms data-embedding system, rather than the open-
               | standard XFA data-embedding system.
               | 
               | (I could swear I saw an HN post about the horrors of
               | AcroForms once, but I can't find it now.)
               | 
               | > I have yet to see a web form that actually saves a
               | readable, properly-formatted, self-contained, easy to
               | access, fully-offline copy.
               | 
               | To be clear, that was what I meant by the second
               | qualifier, "as a file." Browsers support _persisting the
               | state_ of the form. Just, not _as a file_. They persist
               | the state internally, when the form's author does the
               | client-side Javascript work to enable that.
               | 
               | For the use-case where the user wants to stop filling out
               | the form for now (e.g. because they don't have some
               | required information on-hand), and then come back to it
               | to finish it later, in-browser persistence works
               | perfectly well.
               | 
               | Even cleaner, though, is just building a web-form as a
               | wizard, where fields are submitted one-at-a-time, and you
               | can also freely navigate to previously-filled "steps" to
               | change your answers. That doesn't even require
               | JavaScript; just pure 90s HTML-generated-on-the-backend.
               | Most government sites that thought PDF eForms were a good
               | idea are now falling back to this approach.
               | 
               | > Also, I don't know what "employees" and "Intranet" have
               | to do with anything.
               | 
               | Secure installations. The main use-case for fillable PDFs
               | (as can be seen in Adobe's marketing brochure, where
               | "government" is the core client) is a case where _public_
               | or _cloud_ solutions just aren't tenable, i.e. in secure
               | government/military/etc. installations, where the
               | workstations are air-gapped from the public Internet. In
               | such a case, PDF forms can still be sent around via a
               | local non-Internet-routable email server, for the workers
               | there to fill in.
               | 
               | Today, this need can be served just as well by setting up
               | a non-Internet-routable web portal for those same workers
               | to use. But back in the 90s and 00s, "Intranet web
               | portals" were a fancy thing only the most forward of IT
               | bigcorps had on offer. They had _Intranets_, for sure,
               | but they weren't hosting web-apps on them.
               | 
               | So, what did they do instead? Well, Adobe had two main
               | competitors in the "eForm" market:
               | 
               | * Lotus Notes form documents, connecting to a Lotus
               | Domino database server;
               | 
               | * Microsoft Excel sheets that use VBA to data-bind to an
               | accessible Microsoft Access database file sitting on an
               | SMB network share.
               | 
               | None of these "forms" were hand-submittable. They're all
               | little self-contained interactive applications, that
               | happen to look like forms.
               | 
               | AcroForms did have the fancy property, though, that the
               | AcroForms application-PDF could _generate_ or _export_ a
               | bog-standard output-PDF representing the filled form. But
               | that's not actually a modified copy of the source PDF.
               | That's the PDF using scripting to _generate you another
               | PDF_, from scratch.
               | 
               | ------
               | 
               | To be clear, I agree with all the stuff you're talking
               | about; those are all valid use-cases for "PDFs" (i.e.
               | encapsulated PostScript containers.) But they're not what
               | I mean by "PDF forms." I mean the Type 3/4 forms referred
               | to above. There's no reason, in the modern era, that one
               | would implement one of these Type 3/4 "eForm solutions",
               | instead of just putting up a webpage.
               | 
               | If you need an e-signature at the end, have them fill out
               | the web form, then generate a raw PostScript PDF
               | representing their inputs, and let them sign it by
               | dropping a signature vector image on the dotted line in
               | any standard PDF viewer.
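The last suggestion, turning web-form answers into a flat PDF that can then be signed, needs surprisingly little code for the simplest case. Below is a minimal Python sketch, assuming ASCII-only answers, a single US-Letter page with no line wrapping or pagination, and Helvetica, one of the standard fonts that need not be embedded; answers_to_pdf and escape_pdf_string are hypothetical names. It writes the page objects and then a cross-reference table of byte offsets, which is the part hand-rolled PDF writers most often get wrong. A production system would more likely use a PDF library, and adding a signature field is out of scope here.

```python
def escape_pdf_string(text: str) -> str:
    """Escape characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def answers_to_pdf(answers: dict[str, str]) -> bytes:
    """Render 'label: value' lines onto one page as a minimal PDF 1.4 file."""
    ops = ["BT", "/F1 12 Tf", "16 TL", "72 720 Td"]  # font, leading, start position
    for label, value in answers.items():
        line = f"{label}: {value}"
        ops.append(f"({escape_pdf_string(line)}) Tj T*")  # show text, move to next line
    ops.append("ET")
    content = "\n".join(ops).encode("ascii", "replace")  # non-ASCII becomes '?'

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


pdf = answers_to_pdf({"Name": "Ada Lovelace", "Loan amount": "1200 (USD)"})
```

To inspect the result, write the bytes to a file (for example `open("answers.pdf", "wb").write(pdf)`) and open it in any viewer. Everything is built in one bytearray so that each recorded offset is exactly where its object starts.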
        
             | abdullahkhalids wrote:
             | 1. A webpage form requires a server to be up and running,
             | which requires an IT person to manage it, separate from the
             | dept making the form. PDF forms can be created by a person
             | given the right tools (I think Word does it)
             | 
             | 2. IT person + webserver costs have to be included in the
             | budget somewhere. Which can be a big problem.
             | 
             | 3. The webpage form can fail, and the support for it has to
             | be provided by the IT dept. If the PDF form fails, dept can
             | handle it on its own, and will often accept a
             | filled+scanned print out of the PDF form.
             | 
             | 4. Adding to the point above, PDF forms degrade gracefully.
             | If they don't work, or the internet doesn't work, or someone
             | is on holiday, you can still print, fill and hand them in in
             | person. Webpages can degrade catastrophically, where your
             | whole dept grinds to a halt while the IT person tries to fix
             | the problem.
        
               | derefr wrote:
               | Re: all four of your points -- see my sibling post. I'm
               | not talking about encapsulated-PostScript "Print and
               | Fill" forms (which do certainly degrade gracefully), or
               | even open-standard PDF "Fill and Print" forms (which
               | degrade gracefully _if_ you don't set them up with a bad
               | default state where there's big "N/A" text over all the
               | disabled fields until you fill in other fields.)
               | 
               | Instead, I'm talking about the PDFs you can basically
               | _only_ load in Acrobat (though, other PDF viewers do
               | _try_ to render them, to varying success) that actually
               | do data-binding to some remote database; do XHRs to
               | submit the form data on success; do "online"
               | onBlur-XHR-esque field validation; generate new output PDFs
               | _using scripting, from scratch_ when you ask them to
               | save/print; etc.
               | 
               | These are applications, not documents. You can't print
               | them. You just use Acrobat as a glorified application
               | host to fill and submit them. (You can press Ctrl+P to
               | get Acrobat to request to the loaded PDF application that
               | it perform some scripted action to generate a print
               | output. This may or may not do anything, depending on how
               | the PDF was created. It usually just pops a "Printing is
               | not implemented for this form" box. It certainly won't
               | work in non-Acrobat PDF viewers.)
               | 
               | When other PDF viewers say they don't support "fillable
               | PDF eForms", _these_ are the things they're talking
               | about. They usually support "Fill and Print" forms just
               | fine, because "Fill and Print" forms are a somewhat-sane
               | format, rather than being a competitor to Lotus Notes.
        
               | abdullahkhalids wrote:
               | I understand better what you are saying. I don't think I
               | have ever seen any PDF forms that require an internet
               | connection. The Canadian Visa application forms have
               | inbuilt validation code, that checks the form, and once
               | you upload it, I believe data is extracted into a
               | database.
               | 
               | The benefit of these forms is that the validated form
               | that you submit online is actually printable. Which means
               | that what you see on your screen/paper is pixel by pixel
               | identical to what Canada receives, and therefore
               | _legally_, there is no confusion about what was
               | communicated between Canada and the candidate.
               | 
               | Webforms are not as readily accepted as such by courts,
               | because they have to be manipulated further before being
               | printed.
               | 
               | I have read a bunch of your replies, and you are thinking
               | of all the technical reasons why webforms are better than
               | PDFs (and you are right about that), but PDFs have legal,
               | operational and budgetary advantages that are more
               | relevant to various organizations.
        
         | rk06 wrote:
         | PDF is widely used and supported. And FWIW, Edge does support
         | it.
        
       | shanecleveland wrote:
       | I've built fillable PDFs for a manufacturing business. Links to
       | the PDF files are provided on the company website, and the files
       | typically now open in the browser, with varying degrees of
       | reliability. Unfortunately, many people assume this is just
       | another page of the website and that they should be able to
       | interact with it like any other web form. Always fun trying to
       | explain this.
        
       | voldacar wrote:
       | Truly revolutionary tech
        
         | rpastuszak wrote:
         | Well, it's a PDF reader that doesn't come with a tracking
         | package, so in a way--yes.
        
           | inetknght wrote:
           | > _it's a PDF reader that doesn't come with a tracking
           | package_
           | 
           | Uh, what? Firefox supports javascript. PDFs support
           | javascript. Javascript empowers tracking.
        
             | Hamuko wrote:
             | Firefox PDF support is actually Javascript.
             | 
             | https://github.com/mozilla/pdf.js
        
             | MCOfficer wrote:
             | following that logic
             | 
             | - every browser that supports cookies comes with a tracking
             | package
             | 
             | - electron comes with a tracking package
             | 
             | - every language interpreter, runtime or compiler _is_ a
             | tracking package.
             | 
             | - your OS can run tracking software, thus coming with a
             | tracking package.
             | 
             | - Anyone carrying their phone comes with a tracking package
             | 
             | Hang on, did you just post something on the internet? Your
             | HN account comes with a tracking package!
        
           | gpvos wrote:
           | Which PDF readers contain tracking? Anyway, there are several
           | open-source ones that don't.
        
             | ilikehurdles wrote:
             | Acrobat and Chrome come to mind.
        
           | gspr wrote:
           | What's the problem with the Poppler-based ones? I've been
           | producing (with LaTeX) and consuming (with Poppler/Okular)
           | PDFs for a decade and never once have I had to worry about
           | anything related to the format (including tracking).
        
             | rpastuszak wrote:
             | Poppler looks great! But I _just_ learned about it, and I
             | don't think the majority of the population, say, outside
             | of HN knows it exists, so it's good to have a fairly
             | mainstream alternative available.
             | 
             | OK, Firefox is, sadly, far from being a mainstream browser
             | nowadays, but still I suspect it has a larger user base
             | than Poppler.
        
         | lumberjack wrote:
         | You say it in jest, but this simple upgrade very likely improves
         | the lives of more people, more significantly, than some
         | billion-dollar unicorns ever do.
        
         | ManBlanket wrote:
         | And here I was thinking we were living in the future when I
         | could print out a pdf, fill out the fields with a pencil, take
         | a picture of it, then email it to myself, change the file type
         | back to pdf, and send it to whomever requested it...
        
           | mxuribe wrote:
           | I think it was William Gibson who once stated something like,
           | "The future is already here, it is simply unequally
           | distributed...in that, some people just fill out PDF forms,
           | while others have to print it out, fill it out with a
           | pencil...etc...." Ok, maybe i'm remembering that quote
           | inaccurately. ;-)
        
       | ansaso wrote:
       | Seems so absurd that filling a form digitally is breaking tech
       | news in 2020. PDF in a nutshell.
       | 
       | Does anyone see a trend moving away from the PDF standard in
       | recent years? Tried to look for data on it but found nothing.
        
       | [deleted]
        
       | spidersouris wrote:
       | Search in PDFs has been broken since this update. I have to go
       | through the whole document for Firefox to load it before I can
       | search in it. I couldn't find a similar issue on Bugzilla. Is
       | anyone else having the same problem?
        
         | cpeterso wrote:
         | I just tested PDF search (in an IRS PDF in Firefox 81 on
         | Windows) and it works for me.
         | 
         | Do you see the problem in all PDFs? Maybe there is something
         | unique to the PDF you are searching?
        
         | brendandahl wrote:
         | Please file a bug
         | https://bugzilla.mozilla.org/enter_bug.cgi?product=Firefox&c...
        
           | mattashii wrote:
           | I've seen the same happen, so I've filed a bug with my
           | reproduction:
           | https://bugzilla.mozilla.org/show_bug.cgi?id=1666575
        
       | DesiLurker wrote:
       | One Giant Leap for mankind! not /s.
        
       | dgellow wrote:
       | A note for Linux and macOS users, from someone who switched to
       | Windows a year ago: it may be surprising, but it is a VERY REAL
       | pain in the Windows world to find a PDF reader that also lets
       | you edit forms, doesn't come with malware or adware, and has even
       | a modest UX!
       |
       | Sure, you already have access to Evince and Preview.app, and
       | they already do everything you want, but Windows users don't
       | really have that luxury! Being able to tell users to just
       | install Firefox if they want to edit PDFs is really good IMHO, way
       | better than the current situation.
        
         | jiveturkey wrote:
         | eh? acroread is very easily found.
        
         | ImaCake wrote:
         | Just to provide anecdata against the current comments: I
         | totally agree with you. It's not particularly hard if you are
         | pretty tech savvy, but the average user is pretty much stuck
         | with Adobe. Or you can try your luck with the Edge/Chrome PDF
         | form fill, but there's a decent chance it just won't bother
         | saving your input. Adobe is still full of extra crap that is
         | irrelevant to everyday use. I think it still bugs people to
         | update all the time, but I don't use Adobe, so I don't know.
        
         | inopinatus wrote:
         | I read that as suggesting this is potentially a killer app for
         | Firefox adoption in the enterprise.
        
         | maxerickson wrote:
         | What comes with Adobe Reader?
         | 
         | I have it on my work computer and haven't noticed anything I
         | would rate as particularly obnoxious, but I don't use it much.
        
         | mickotron wrote:
         | Okular can "edit" forms. I have been doing this on Linux and
         | Windows for a while. Not the most usable, but it works. What I
         | can't do in Okular, I do in GIMP.
         |
         | I will use Firefox for PDFs with editable forms, but for those
         | that don't have editable forms, I will continue to use
         | Okular/GIMP.
         |
         | I actually stumbled across the ability to edit forms in Firefox
         | only recently. I was like... What? This is amazing! For some
         | reason the PDF I clicked on opened in Firefox and, yeah, I was
         | surprised.
        
           | MayeulC wrote:
           | And IIRC it's available on the windows store. It probably has
           | msi as well.
        
       ___________________________________________________________________
       (page generated 2020-09-22 23:00 UTC)