Embedded PDF viewer in Firefox 81 supports filling forms Author : muxator Score : 798 points Date : 2020-09-22 14:16 UTC (8 hours ago) Link : support.mozilla.org | austincheney wrote: | Does this support digital signatures via signing certificate? | SigmundA wrote: | Still waiting for the SVG backend to be fully implemented for | high-quality printing. | skratlo wrote: | Which PDF forms standard is this? | blackbrokkoli wrote: | I have a feeling this thread has a strong bias from highly | automated valley life. In more provincial regions, and even just | much of Europe, lots of forms have to be filled out and printed. | | It is not something you have to do every day, but the | existing solutions suck massively. You either have to use Adobe, | which requires Windows (or Mac, I suppose) and your firstborn, or | use some massively shady online service. So personally, I love | this feature! | | (And I also do not think that this will halt all other | development at Mozilla like some comments here imply.) | drdaeman wrote: | Okular works on Windows. Or, at least, it used to, some years | ago. | boogies wrote: | Isn't evince, the default PDF viewer on GNOME, capable | of this? | krastanov wrote: | I have had evince fail to render forms, but okular (the KDE | default) has worked pretty well. | jhoechtl wrote: | GNOME - no. Okular, the KDE counterpart, works very well. | Jaxan wrote: | On a Mac you can fill in PDFs with the built-in Preview app. I | like it. | mkskm wrote: | There's also the paid app PDF Expert, which is generally | excellent. | boringg wrote: | Can we just step away from PDFs to a better standard? Every time | I deal with PDFs, or have to on behalf of my parents, it is a | true waste of time and resources - there has to be a better way.
| jrochkind1 wrote: | Well, no, we can't just do that. But it's nice to dream. | topspin wrote: | Well, yes, we can, but the outcome will be far worse. The | naive imagine "something better." The real world will | interpret "better" as 27 half-baked alternatives, 2 of which | will work on something other than Chrome running on Windows. | kibwen wrote: | Sure you can: find an ideologically motivated tech billionaire, | buy Adobe, release a new version of PDF and make the spec an | inaccessible trade secret, aggressively legislate against | anyone who attempts to implement it, start charging for Reader, | increase the price by a compounding 2% every year, and put that | money towards a foundation with the purpose of openly designing | and implementing a better, freely-licensed replacement. I | predict this would only take 20 to 30 years. :) | boringg wrote: | Sounds like you've been thinking about this. You don't happen | to be an ideologically motivated billionaire who happens to | think the best thing for humanity and return on capital is to | rebuild the PDF spec, do you? _fingers crossed_ | Semiapies wrote: | At the very least, you need a replacement that's technically | better, works well cross-platform, has a layman-acceptable UI, | and supports 99.999%+ of all the use cases PDFs currently | support. It also has to convert old PDFs into the new format. | | Then, you have to worry about market share and acceptance. | INTPenis wrote: | This is huge. I've felt like an outsider here for years because | the government uses a lot of online forms in PDF. | hapless wrote: | PDF form support still doesn't work very well -- it cannot export | filled fields correctly, nor do they print correctly. | abrowne wrote: | I actually like the pdf.js viewer enough that I use the Chrome | extension version on Chromium. But I see it hasn't been updated | in over a year now. Hopefully it will get updated! | | https://chrome.google.com/webstore/detail/pdf-viewer/oemmndc...
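Whether a viewer "exports filled fields correctly" depends partly on whether it writes each field's value back into the file. In an AcroForm, each field dictionary carries its partial name under /T, its type under /FT (for example /Tx for text, /Btn for buttons) and its value under /V, which is also what getpost's question further down about extracting field data comes down to. The sketch below is a naive, hypothetical extractor (`extract_form_fields` is not a real library function) that assumes an uncompressed PDF with no object streams and literal strings without escaped parentheses; real files often break all three assumptions, so a proper PDF library is the safer choice in practice.

```python
import re

# Naive patterns; they assume an uncompressed PDF with no object streams,
# and literal strings that contain no escaped or nested parentheses.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.S)
NAME_RE = re.compile(rb"/T\s*\(([^)]*)\)")  # /T (partial field name)
VALUE_RE = re.compile(rb"/V\s*(?:\(([^)]*)\)|/([A-Za-z0-9_.-]+))")  # /V (text) or /V /Name


def extract_form_fields(pdf_bytes):
    """Return {field name: value or None} for simple AcroForm fields."""
    fields = {}
    for body in OBJ_RE.findall(pdf_bytes):
        name = NAME_RE.search(body)
        if name is None:
            continue  # not a named field dictionary
        # latin-1 is only an approximation of PDFDocEncoding (exact for ASCII).
        key = name.group(1).decode("latin-1")
        value = VALUE_RE.search(body)
        if value is None:
            fields[key] = None  # field exists but was never filled in
        else:
            text, pdf_name = value.groups()
            raw = text if text is not None else pdf_name
            fields[key] = raw.decode("latin-1")
    return fields


sample = b"""%PDF-1.7
4 0 obj
<< /FT /Tx /T (full_name) /V (Jane Doe) /Rect [50 700 300 720] >>
endobj
5 0 obj
<< /FT /Btn /T (subscribe) /V /Yes /AS /Yes >>
endobj
6 0 obj
<< /FT /Tx /T (notes) >>
endobj
"""
print(extract_form_fields(sample))
# {'full_name': 'Jane Doe', 'subscribe': 'Yes', 'notes': None}
```

Feeding the resulting dictionary into a database insert is then ordinary application code; the hard part is the PDF parsing, which is why this sketch stays deliberately narrow.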
| getpost wrote: | "After entering data into these fields you can download the file | to have the filled out version saved to your computer." | | What's the use case? Printing out filled-in forms? Otherwise, | who would want the PDF in electronic format? It doesn't seem like | a practical way for users to submit data. | brainwad wrote: | Well, if the form needs to be faxed (still a thing!), then | having the filled PDF makes it easy to use an e-fax service. | But I assume sending the file to another computer for printing | is the main use case. | pbhjpbhj wrote: | Last year, when I was applying for jobs, most places had a PDF form; if you | were lucky it was an actual fillable form too! So, filling the form and | emailing it back is useful -- much better than trying to | overwrite text on a PDF background; far better than printing | the form, filling it in with a pen, scanning, then sending. | detaro wrote: | Printing is one use case. Also, plenty of places want filled-out | forms uploaded, e-mailed, ... + it allows you to keep a | copy with what you entered. | getpost wrote: | That's my point. Other than printing a nice-looking form | (which includes faxing), the content of the field data is | hard to reuse. Searching PDF content on your own hard drive | is problematic. | | Are there utilities that extract PDF field data and submit it | to a database? I'd be grateful to see examples. | | What about field validation? The PDF may have some minor | validation, but that's no substitute for the validation done | in a DBMS. | | If you want users to be able to save a nice-looking form, | you'd still want the data entered online directly into a | DBMS. I'd offer a "download PDF of your input" as an option, | for example. | toyg wrote: | _> the content of the field data is hard to reuse._ | | A lot of people don't care, because they come from paper | forms - where they have to manually retype everything | anyway. For many, their "DBMS" will be an Excel sheet with | a dozen rows.
The more advanced types likely have some | Adobe software that does all the magic. | | Fillable PDF forms are really seen as a courtesy to users | more than anything particularly useful to the issuer. | detaro wrote: | Sure, it's a structured format, so you can totally extract | the individual fields. AFAIK Adobe sells a server product | that does that, but I'm sure there are competitors, and I have | seen the underlying parsing feature in PDF libraries | before. | | That said, plenty of users of PDFs still have a very | paper-based/manual workflow, and lack the motivation and | expertise to run and update an online form thing. Or they | need to have the ability to handle odd inputs anyway, | because paper forms have even worse input validation. | | And from a browser/user perspective, the feature here is | useful because people expect me to handle PDFs and do not | provide nice web forms. They might have terrible reasons | for doing so, but I still need to live with that. | sixhobbits wrote: | Today I took a screenshot of a PDF, uploaded it to an OCR | service and copied the result into a doc. | | The PDF was text-based, but every time I copied something it added | millions of new lines and hyphens and extra text that wasn't | shown on the page. | TazeTSchnitzel wrote: | Will it support only the standardised kind of forms, or also the | proprietary Adobe-only kind of forms? (Yes, there are two, and the | latter are what Swedish administrative agencies use, so I'm | forced to choose the "non-fillable PDF" option lest I get a file | intentionally made unreadable to non-Adobe software.) | dehrmann wrote: | PDF support in Firefox is one of the most important additions in | recent years. My gripe with Mozilla was that they're pursuing all these | side projects when they really should be targeting feature parity | with Chrome. That's the only way people will ever switch. | eddiecalzone wrote: | I'm curious what you notice is missing in terms of feature | parity.
I'm mostly a back-end developer (not diving into | devtools very often) and switched a year ago. I'm much happier | and haven't looked back. | andrewzah wrote: | I just want support for APIs, mainly. I get websites from | time to time that just refuse to load. E.g. | | * blank pages when trying to load an imgur gallery on v68 | (ESR). | | * image uploading not working right on Instagram and various | other sites, either producing blank images or ones with weird | lines. | | * several teleconferencing / video meeting websites just | don't work properly, whether from not detecting hardware | properly or something else. | | I have to keep Chromium installed so I can use these sites | properly. | kodablah wrote: | > My gripe with Mozilla was that they're pursuing all these side | projects when they really should be targeting feature parity | with Chrome. That's the only way people will ever switch. | | A bit off topic from the post at hand, but my gripe was the | opposite. The relentless pursuit of parity made them | indistinguishable, giving users no reason to switch (and taking | dev time away from distinguishing features). Granted, the | pursuit of users instead of principles is its own folly that's | hard to overcome when money is needed. | DHowett wrote: | > feature parity | | Like PDF support? | nacs wrote: | That's what OP said, yes. That features like PDF form filling are | essential while things like Pocket are basically non-core | side projects. | bad_user wrote: | Pocket is an acquired company, the integration with FF has | been minimal (it does less than the Chrome extension ;)) | and I'm pretty sure it pays for itself. | yjftsjthsd-h wrote: | On rereading I agree with your interpretation, but it's | easy to read "all these side projects" as referring to the | PDF reader. | steviedotboston wrote: | Chrome has had this forever, right? | PaulHoule wrote: | I find the built-in PDF reader in Firefox to be bloat.
It's OK, | it works 95% of the time, but really I want to use a native PDF | viewer. | | Is there a version of Firefox that removes this bloat? | | Given that Mozilla is very resource-constrained, why are they | working on features that aren't necessary? | cptskippy wrote: | Given the number of PDF exploits over the years and the habit | of browsers to automatically invoke your PDF viewer of choice, | either as a plug-in or via a call-out, they're an easy target. | | Having a sandboxed PDF viewer that works 95% of the time is | great. For the 5% of cases where I am actively trying to | view a PDF and it won't work in the browser, I'll gladly go through | the minimal effort to open it in an external viewer. | pessimizer wrote: | It was once an add-on, and it was once disableable. It may | still be disableable, but I'm sure there's some strange | procedure you have to go through to do it. | derefr wrote: | I like that single-page PDFs stay in the browser. I don't want | to _keep_ them; I just want to _see_ them, like any other web | page. I want to be able to hit back, or close the tab, and | continue on with my day. | | And I also like that I can _preview_ long-form PDFs in the | browser, before choosing whether to save them and read them | "for real." | | Imagine if every time you opened a direct-linked JPEG image in | your browser, it treated it as an attachment, downloading it | and opening it in your external image-previewer app, rather | than rendering it as a synthesized HTML DOM wrapper around the | image. Wouldn't you be annoyed by how cluttered your Downloads | directory would get with random files you never actually wanted | to save? | danso wrote: | A lot of everyday users likely benefit from being able to fill | out PDFs in the browser. | snovv_crash wrote: | I find it extremely convenient. Also, I know a lot of security | issues in PDF viewers are effectively solved by running them in | the browser's JS sandbox.
| Mediterraneo10 wrote: | You could also disable Firefox's built-in PDF viewer and | instead use an external PDF viewer that doesn't even support | JavaScript. | nitrogen wrote: | Not all PDF vulnerabilities involve JS though. | snovv_crash wrote: | Native PDF clients have had lots of security holes. In this | case, having the client written in JS means we can repurpose | the battle-hardened JS sandbox to also contain PDF | exploits. | detaro wrote: | You misunderstand the argument the parent comment makes. | It's not about JavaScript _in_ PDFs. | Keycap wrote: | I hope no one is listening to you. | | I don't want to start explaining to my mother, over the phone, | how to install and use the PDF viewer anymore :| | anonymousab wrote: | And here I found it much less bloated than the other free | desktop PDF viewers. | | I think Mozilla's line of thought here is that PDF documents | are widespread on the web, to the point where they are a de | facto web document type. So it makes sense for a web browser to | support them rather than calling out to a user's desktop | program (though I assume you can configure it to do so | instead). | | There's probably a bit of "our competitors do it, so we have to | too" in there as well. | ocdtrekkie wrote: | Given that Chrome/Edge also just added the feature, I would | point out: all web browsers are using the same library for PDF | handling, so a feature in pdf.js ends up benefiting a lot of | people. | | And the reasons for not requiring an outside PDF reader are | major: it's yet another likely-to-have-vulnerabilities program | people need to install, then update. In most cases, avoiding | Adobe programs on your PC is a good way to avoid a lot of | vulnerabilities. | oefrha wrote: | > All web browsers are using the same library for PDF | handling | | Chrome/Chromium uses PDFium, not PDF.js, so no. Not sure | about Edge. | | PDFium has been able to fill out forms for a long time.
| What's new for Chrome is the ability to save edited PDFs (as | fillable). | ocdtrekkie wrote: | If Chrome doesn't use pdf.js, then neither would Edge, | which is a Chromium fork. My original comment may have been | mistaken. | oefrha wrote: | A Chromium fork could replace certain components if they | so choose. The PDF rendering component would be one of | the easier ones to replace. | | However, I was able to confirm that Edge uses extension | ID mhjfbmdgcfjbbpaeojofohoefgiehjai to render PDFs | internally[1], same as Chrome, so indeed it's using | PDFium. | | [1] The rendered PDF element would look like | <embed id="plugin" type="application/x-google-chrome-pdf" | src="..." stream-url="chrome-extension://mhjfbmdgcfjbbpaeojofohoefgiehjai/..." | headers="..." ...> | | And here's the extension's manifest in the Chromium source, | where you can find the extension ID: https://github.com/chromium/chromium/blob/2baa2b094cdd60e980... | burtonator wrote: | I'm the author of Polar (https://getpolarized.io/), which uses | PDF.js as its PDF backend. | | This is a somewhat big update for PDF.js, which is kind of cool in | that they haven't really been updating it as aggressively as they | usually do in the last year or so. | | It's a bit frustrating to work with, though. The entire concept of | rendering a PDF via JS is fascinating, but actually using the API | has been a huge pain for us. | | We've had to fork it internally and work on TypeScript bindings | and other features to get it to work. | | They seem to have a silly policy of only allowing developers to use | a subset of the API, not the whole API itself, so that it doesn't | look like PDF.js (which I don't understand). | | A lot of the functionality just isn't available otherwise. | 0xFF0123 wrote: | Looks cool! On a side note, I've always been curious about | product sites: what's your metric for including other orgs | under "Used and Trusted by Top Organizations"? How do you know | they use / trust it?
| 52-6F-62 wrote: | I've been working on an internal tool for my company using the | same library. It's saved me a _ton_ of work, but my experience | has been similar to yours. I've even had to lock in to a much | older version to avoid putting a lot more work on my plate, | since the API seems to have changed a fair bit (well, between | it and JSDOM, which I am using to do some rendering on the server). | And like you, I've had to write a bunch of the | bindings/definitions myself or just reduce them to nil (declare | module "yadda/yadday/thing" as any)--which is thankfully | permissible since it just needs to be built once and run | "forever" with near-zero need for feature additions, etc. | | All to extract images in a routine fashion. | | Just the same, I'm still immensely thankful they've published | the library as OSS. | 0x6A75616E wrote: | Hey! Polarized looks pretty cool. Question: has this feature | (forms) been merged into the public pdf.js master yet? | maest wrote: | Do you have, by any chance, any good resources on PDF.js? The | README on GitHub is OK, but it doesn't really cover what | workers are supposed to do or provide any useful mental model | for the architecture of the whole thing. | brendandahl wrote: | PDF.js dev here. I'm a bit confused about which part of the internal | API you would like to use. The way I think of it, there are | really three APIs in pdf.js: 1) the main-thread API (api.js), which | we base the version off; 2) the code that runs in the worker; 3) | the viewer components (web/*). | | Quite a while ago, when we decided what parts of the API to | version, we thought more people would want to use #1. Now that | the project is mature, we could probably expose some more and base | the version off of that. | | As for the "so that it doesn't look like PDF.js", we don't | limit the API because of this.
That suggestion (which I don't | totally agree with) came from what we saw people doing, where | they'd copy the entire viewer, when it'd probably be better to | just let the user's browser choose how to show the PDF. | torresjrjr wrote: | > PDF.js dev here | | I'm so sorry for being forward, but why the hell don't the | Vim keys (hjkl) smooth scroll? It's so frustrating. Is there | an option to set it that way? Using the arrow keys is so | cumbersome. | colejohnson66 wrote: | Because not everyone uses Vim? | tarikozket wrote: | Here, I'll make it easier for you to contribute to the | project by providing you with the lines you'd need to | update: | | https://github.com/mozilla/pdf.js/blob/83e1bbea6e23db8744420... | | https://github.com/mozilla/pdf.js/blob/83e1bbea6e23db8744420... | mavsman wrote: | Brilliant | kebman wrote: | I looked into coding PDFs once. Then I closed my MacBook (Pro) | and went for a long walk into the ocean. I think I almost got to | America, but then I turned and swam back again. Turned out I had | just fallen asleep and had a nightmare. I was actually just | working with regular text files, and everything was fine. | gorgoiler wrote: | _Only Forward_ | | ...a wonderful novel along these lines. They only get shot and | kidnapped, though. Nothing as bad as PDFs. | rvense wrote: | My favourite PDF fact is that it doesn't have to start at the | beginning or end at the end of a file. Any sea of bytes that | contains a PDF file is an acceptable PDF file... | TeMPOraL wrote: | Did anyone try to pluck PDFs out of /dev/urandom? How about | from a radio telescope feed? Maybe the first evidence of | extraterrestrial life will be some poor alien's tax form? | willis77 wrote: | The digits of pi contain every PDF that ever could and ever | will exist. | rbonvall wrote: | Since a PDF can begin with non-PDF content, pi | itself is a valid PDF file.
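rvense's point that a PDF can sit inside a larger "sea of bytes" follows from how readers find their way in: they look for the `%PDF-` header, and from the end of the data they look for the `startxref` keyword, whose value is the byte offset of the last cross-reference section. A minimal sketch of that search, using a hypothetical `locate_pdf` helper; the optional `header_window` parameter models the "header in the first 1K" rule mentioned later in the thread, which is a reader-specific tolerance rather than something every reader applies. The sketch does not try to interpret the returned offset, since how readers treat offsets in files with leading junk is also reader-specific.

```python
def locate_pdf(data, header_window=None):
    """Find (header_offset, startxref_value) in an arbitrary byte string.

    Returns None if no header or trailer can be found. If header_window is
    given, headers starting at or beyond that many bytes are rejected.
    """
    start = data.find(b"%PDF-")
    if start == -1 or (header_window is not None and start >= header_window):
        return None
    # Search backwards: the last "startxref" belongs to the newest revision.
    marker = data.rfind(b"startxref")
    if marker < start:  # also covers marker == -1 (not found)
        return None
    tokens = data[marker + len(b"startxref"):].split()
    if not tokens or not tokens[0].isdigit():
        return None
    return start, int(tokens[0])


blob = b"JUNK" * 10 + b"%PDF-1.4\n...\nstartxref\n1234\n%%EOF\ntrailing garbage"
print(locate_pdf(blob))                    # (40, 1234)
print(locate_pdf(blob, header_window=16))  # None
```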
| dheera wrote: | Pi is thought to be normal, but it hasn't been proven yet, | so we can't say that for sure, but it's likely true. | mtzet wrote: | Well, maybe. We don't know if pi is a normal number. | SteveGoob wrote: | > We don't know if pi is a normal number. | | Sure we do. There are plenty of proofs out there that pi | is an irrational number. | jimktrains2 wrote: | Normal in this sense means that the frequency of each | digit approaches a uniform distribution as the length of | the sample increases towards infinity. Basically, if we | could see "all of" π and count all the 0s, 1s, 2s, 3s, &c. | up to 9, all the counts would be equal. | gugagore wrote: | That on its own can't be right, because of | 0.12345678901234..... | | According to Wikipedia, you gave a definition for "simply | normal", and for normal numbers the distribution of any | sequence of digits is uniform. So 00, 01, ..., 99 each | occur uniformly too. | enedil wrote: | Moreover, you need to consider it with regard to all | bases other than 10 too. | andreareina wrote: | Irrational does not imply normal. For example, | 1.01001000100001... is irrational but it's certainly not | normal. | x3c wrote: | Technically, 1.01001000100001... can be normal depending | on what ... stands for. :) | bscphil wrote: | Well, obviously. But presumably the ... is meant to imply | that this is the sum of 1/(10^(x(x+3)/2)) for x = 0, 1, 2, and so on. | coddle-hark wrote: | Or what 1 or 0 or . stands for. | landryl wrote: | Actually I'd argue the example you provided is normal, as | long as you authorise a particular encoding where every | number n you're looking for is encoded as a string of n | zeros. | | It's then trivial to see that every number you can think | of is encoded in there, and therefore any data, piece of | music or movie that ever existed.
| | (I'm not sure we're allowed to fiddle with the encoding, | but since we allow ourselves to represent a piece of | music as a number, we're already talking about encoding | anyway, so it doesn't seem like cheating to me...) | enedil wrote: | Normality of a number is with respect to number bases, so | your trick with encoding is invalid. Otherwise, every | computable number could be considered normal - take an | algorithm for generating it, supply a random string | (this is the encoding), disregard the random string, and | you have a perfectly valid normal representation of your | number. So it is cheating. | dheera wrote: | Encoding doesn't count. Normality is a very specific | mathematical concept: | https://en.wikipedia.org/wiki/Normal_number | | Also, 1.01001000100001... is a good example of a number | that is both irrational and transcendental but not | normal. | ducktective wrote: | I don't think that is a proven fact. | vcxy wrote: | "Find the earliest valid PDF in consecutive digits of pi" | btown wrote: | Imagine if it were a PDF that simply rendered the number | 42. | ddalex wrote: | If that happens, we know for a fact that we are in a | simulation. | drevil-v2 wrote: | Please don't give them any ideas... the whiteboard | interview coding tests are hard enough as it is. | paulmd wrote: | I mean, the answer is trivially zero: there exists a PDF-like | structure somewhere in pi, and the offset of that | doesn't have to be zero, it can start or end anywhere. So | the range [0, N] is a valid PDF. | infogulch wrote: | "Find the last byte of the first valid PDF in the binary | digits of pi" | nroets wrote: | <citation needed> | | Including a PDF that generates the digits of pi | nostoc wrote: | It'll depend on the PDF reader you're using, but I'm pretty | sure the PDF header needs to start in the first 1K of the | file. | mkl wrote: | Some readers won't need a header at all, I think. Near the end (usually!)
of the file there's an index of objects | (page data etc.) with byte offsets, which can point to | anywhere in the file. | sp332 wrote: | On the other hand, this allows for some incredible polyglot | files, like some of the tricks in PoC||GTFO issues where | the file is a readable PDF but also a game cartridge and also | a zip file with the proof-of-concept code in the issue. And | the front cover has the MD5 hash of the whole file printed on | it... but that's another trick entirely! | rvense wrote: | Yeah, next time I need a CV it'll be a single-file Ruby web | server and PDF that's also an archive of its own sources. | toomanybeersies wrote: | I'm currently looking for a job as a Rails developer. I | might just do that. | | Probably won't send it to any recruiters, but it will be | a funny anecdote for interviews. | cgb223 wrote: | My favorite PDF fact is that the security flags for things | like copy protection and passwords are up to the viewer to | implement, so you can just turn them off and all the security | is gone. | BHSPitMonkey wrote: | Debian actually goes out of its way to patch those checks | out in its PDF-related packages as part of its stance | against DRM, like this example with "pdftk": | | https://sources.debian.org/patches/pdftk/2.02-4/drm_fix/ | toomanybeersies wrote: | You can also circumvent copy protection on PDFs by taking a | screenshot, or taking a photo of the screen with your | phone. | r00fus wrote: | This is not entirely true: you can encrypt PDFs [1] since | v1.3 of the spec, but the cipher is often so weak (RC4 until | v1.6) that they can be brute-forced in a reasonable amount of | time. | | [1] https://www.pdflib.com/pdf-knowledge-base/pdf-password-secur... | Drdrdrq wrote: | My somewhat less favorite PDF fact is that if you do that, | you are still breaking protection, legally speaking. | smegger001 wrote: | So what if you open it in a PostScript viewer instead of | a PDF viewer?
Because they are compatible formats | except for some edge cases like security flags. | mkl wrote: | PostScript and PDF are definitely not compatible formats. | The drawing model is similar, but the structure and code | are completely different. | maxerickson wrote: | Seems to be a reasonable analogy with trespass, where you | are violating the law when you cross an invisible line. | The need for marking the line varies considerably. | | And even places with strong roaming rights tend to place | limits on well-marked land. | maddyboo wrote: | Can a PDF file contain a PDF file, and if so, can that PDF | file contain a PDF file? | betatim wrote: | Yes. Because the PDF standard specifies a mechanism that | lets you "attach" files to a PDF :) | layoutIfNeeded wrote: | ZIP files can: https://research.swtch.com/zip | alexott wrote: | You can imagine the pain when you need to reliably detect the PDF | MIME type on a web proxy, or something like that... | agumonkey wrote: | I can never find the PDF hack talk where the author explains all | 100 ways to embed things in PDFs or PDFs into things. | wffurr wrote: | It's hidden in a PDF in the digits of pi. | divbzero wrote: | I had to handle action buttons for a PDF once. I swam out from | America a long, long way before turning back. Might have | spotted you in the middle of the ocean. | meshaneian wrote: | Legit username warns of one PDF peril. | schoolornot wrote: | I thought I'd found a nifty trick by using OpenOffice to create a | form with a pre-filled value. I decoded the PDF using pdftk or | one of the free tools, and then modified the value. Nope, that | caused some kind of cascading/checksum error. | | Ended up just making the app generate HTML before calling | wkhtmltopdf. | | The PDF spec is insane! But, like all things, what you get out of | Word/OpenOffice is 100x more complex than if you wrote it | yourself, which is indeed doable. | phpdave11 wrote: | It's really not that difficult if you read and understand the | PDF specification.
As a learning exercise, I created a simple | PDF generator library that creates ASCII PDF documents (you can | open them in Notepad) and includes comments about what each | drawing instruction does. | | https://github.com/phpdave11/davepdf | recursive wrote: | I'm sure generating PDFs is much easier than reading them | in a way that "just works" with any kind of PDF. | phpdave11 wrote: | Reading them is also easy. I wrote a library that reads | PDFs and imports page(s) from an existing PDF into a new | PDF as a Form XObject. | | https://github.com/phpdave11/gofpdi | Someone wrote: | OK, they added various ways of compressing data, but PDF is, | basically, a text-based format. | | As far as I know, any PDF can be losslessly converted to an | equivalent PDF that can be edited in any text editor, even | Notepad. And yes, you could fill in the forms there, too (if | you were stubborn enough). | quickthrowman wrote: | That's news to me and Bluebeam's PDF search feature! It turns | out you can make PDFs (this usually happens with | architectural drawings) that are composed purely of images | that are not searchable, and therefore you are wrong. | | I silently thank every architect who provides searchable | PDFs; it makes my job way easier. | mkl wrote: | GP is right. The _code_ that makes up a PDF is text-based. | Those images can be encoded in the PDF file using the | ASCIIHexDecode filter, i.e. as editable ASCII text code. | xfer wrote: | Yes, just use a hex editor and all data is text-based. | roflc0ptic wrote: | It sounds like you either know a lot more than me or a lot | less than me. The PDFs I've dealt with don't store text as | strings; they store it as individual characters. This left me | having to write a heuristic-based algorithm to group the | characters into words, words into lines, lines into | paragraphs, paragraphs into columns. | | Again, as far as I know, there are no heuristics good enough | to get that right for all values of PDF.
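roflc0ptic's description of rebuilding words from individually positioned characters can be illustrated with a toy version of the first grouping steps: characters into words and words into lines. This sketch assumes the glyphs have already been extracted together with their positions and advance widths; the `Glyph` type, the `group_glyphs` function and the `line_tol` and `word_gap` thresholds are all hypothetical. As the comment says, no fixed heuristic like this is right for every PDF: multiple columns, rotated text and tight kerning all break it.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph, in PDF user-space units
    y: float      # baseline; PDF y coordinates grow upwards
    width: float  # advance width


def group_glyphs(glyphs, line_tol=2.0, word_gap=1.0):
    """Group positioned glyphs into lines (top to bottom) of words."""
    # Top line first (largest y), then left to right.
    ordered = sorted(glyphs, key=lambda g: (-g.y, g.x))
    lines = []
    for g in ordered:
        # Glyphs whose baselines are within line_tol of the line's first
        # glyph are treated as belonging to the same line.
        if lines and abs(lines[-1][0].y - g.y) <= line_tol:
            lines[-1].append(g)
        else:
            lines.append([g])
    result = []
    for line in lines:
        line.sort(key=lambda g: g.x)
        words = [line[0].char]
        for prev, cur in zip(line, line[1:]):
            # A horizontal gap wider than word_gap starts a new word.
            if cur.x - (prev.x + prev.width) > word_gap:
                words.append(cur.char)
            else:
                words[-1] += cur.char
        result.append(words)
    return result


# Glyphs may arrive in any order, as they can in a content stream.
glyphs = [
    Glyph("k", 5, 680, 5), Glyph("y", 13, 700, 5), Glyph("H", 0, 700, 5),
    Glyph("o", 0, 680, 5), Glyph("o", 18, 700, 5), Glyph("i", 5, 700, 5),
]
print(group_glyphs(glyphs))  # [['Hi', 'yo'], ['ok']]
```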
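mkl's and Someone's point that the PDF file itself is text can be made concrete by writing one by hand. One plausible reason a hand-edited value breaks a file, as schoolornot found, is the cross-reference table: it records the byte offset of every object, and a stream's /Length records its size, so changing a string's length shifts numbers that must then be rewritten. The sketch below uses a hypothetical `build_minimal_pdf` helper (not phpdave11's library) to write a one-page, all-ASCII PDF and compute those offsets while writing. It assumes ASCII-only input text and uses Helvetica, one of the standard 14 fonts, so no font data needs embedding.

```python
def build_minimal_pdf(text):
    """Return a one-page PDF (as bytes) in which every byte is ASCII text."""
    # Backslashes and parentheses must be escaped inside PDF literal strings.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET"  # draw text at (72, 720)
    objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        f"<< /Length {len(content)} >>\nstream\n{content}\nendstream",
    ]
    out = "%PDF-1.4\n"
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # characters == bytes because it is all ASCII
        out += f"{number} 0 obj\n{body}\nendobj\n"
    xref_offset = len(out)
    # Each xref entry is exactly 20 bytes: 10-digit offset, 5-digit generation,
    # 'n' (in use) or 'f' (free), then the two-character end of line " \n".
    out += f"xref\n0 {len(objects) + 1}\n0000000000 65535 f \n"
    out += "".join(f"{off:010d} 00000 n \n" for off in offsets)
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
    out += f"startxref\n{xref_offset}\n%%EOF\n"
    return out.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text


pdf = build_minimal_pdf("Hello (PDF)")
print(pdf.decode("ascii"))
```

Opening the output in a text editor shows every object, the xref table and the trailer as plain lines; editing the greeting by hand would require updating /Length and every later offset, which is the kind of bookkeeping a PDF library normally does for you.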
| mkl wrote: | More. Go here and download the PDF spec: | https://www.adobe.com/devnet/pdf/pdf_reference.html | | Look at Chapter 3, Syntax. The code is all text-based. We | are not talking about the visible characters in a PDF | viewer, but the code of the PDF file itself. | belval wrote: | He probably knows a lot less than you, because there are | absolutely no requirements for PDFs to be in text format, | and most aren't. The "text" he is editing could render to | completely different characters depending on how the PDF | document was created. | | The default macOS PDF printer will actually remap the font | cmap, producing born-digital PDFs where the "text" is something | else entirely (say "$" maps to "a"). | yjftsjthsd-h wrote: | > The default macOS PDF printer will actually remap the | font cmap, producing born-digital PDFs where the "text" is | something else entirely (say "$" maps to "a"). | | What? Why!? I've heard of doing that as a form of DRM, | but I can't imagine Darwin _defaulting_ to doing that. | belval wrote: | I never dug deeper into it, so I am not aware of why it | does that or if it's a specific version or whatnot, but | take a PDF from which you can extract the text (with | pdftotext/PDFBox, for example). Open it in the document | viewer and "print" it to PDF. If you extract the text | again, it is not readable anymore. | | This wouldn't be an issue if it was a conscious choice, | but when I parsed a lot of born-digital PDFs, we ended up | with a lot that were like that from various sources. Try | explaining that... | colejohnson66 wrote: | Could it be "compacting" the fonts? So if U+0000 to | U+007F aren't used at all, remove those glyphs and set | U+0000's glyph to be what _was_ U+0080? Yes, I know NULL | doesn't have a glyph, but I hope that gets the idea | across. | mettamage wrote: | What's your favorite PDF feature that causes a brain meltdown? | | I've read a few comments on HN about how PDF is, well, not | developer-friendly. If people are interested in providing some more
If people are interested in providing some more | examples here, I'd be curious to know! | jahewson wrote: | Acrobat can read fantastically corrupt PDF files none of | which are covered by the spec. The endless surprises induce a | special kind of madness. | | Streams just suddenly end? That's ok. Totally corrupt xref | tables? Ok. Incorrect image headers? Ok. Unrecognisably | mangled Type1 font formats? Fine! | belval wrote: | That's great because it creates client expectations | regarding what my PDF application should support. | Implementing the spec is not good enough, you have to do | what PDFium or Adobe do. | pierrebai wrote: | In the early 2000 I coded a PDF library for an industrial | printer suite. (Print, proof, impositions) | | I personally think the structural PDF format is a really | great format. It's entirely ASCII-based, a pure text format, | yet it can embed arbitrary binary data and compress that | data. The actual structure is simple and support just enough | functionality, like a tree of object, dictionaries and | arrays, unicode strings, date formats, etc. | | I think if you limit yourself to pure structural PDF woulde | have been a great format to standardize upon, much better | than JSON or XML. It';s richer than JSON, simpler and saner | than XML. Again, it's top-notch ability to embed binary is | great. It has other great characteristics, for example you | can update anything just by appending. | | The ugly bits are in the "semantic" PDF: the page | descriptions, media, etc. Even then, the early version of PDF | were nice, mainly just simplified Postscript. | izacus wrote: | Being able to do SQL queries to remote servers, upload form | contents directly to a server, embedded 3D models and being | able to have a fully featured page embedded Tetris game due | to support for JS. 
| | Having said that (and having worked on a commercial PDF library), | despite all the cruft that came with age, it's a well-built | format that survived the test of time for good reasons. | amelius wrote: | At least it's probably better than MS Word's internal format | ... (?) | DavidPeiffer wrote: | I'm not saying it's beautiful, but isn't MS Word's internal | format basically a series of XML files that are zipped up? | | The old .doc and .xls files were a bad format, but my | understanding is that since Office 2007 the format is | generally much better. | alexott wrote: | MS Office files prior to Office 2007 were mostly memory | dumps of specific components, wrapped into composite | files aka OLE2 storage; their content varied depending | on Office version and often locale. | jhoechtl wrote: | To be fair, the old doc format was conceived in the DOS era, | and memory efficiency was paramount back in the day. | smegger001 wrote: | And if they hadn't waited so long to update to a sane | file format, no one would complain, but they waited until | 2007 to fix the format, long after the DOS-era memory | excuse had ceased to be an issue. Even then they | only did it to shoehorn their format in as | the ISO standard after one was already selected, bribing | their way through the process. | liversage wrote: | Microsoft Word stores XML documents inside a zip archive. | There is a detailed specification of the format available: | https://docs.microsoft.com/en- | us/openspecs/office_standards/... | xaldir wrote: | I think he was talking about the classic .doc format, | which was a clusterfuck, and not Open XML. | akie wrote: | If I remember correctly the XML format was just an XML- | encoded version of the binary counterpart, including all | or most of the bugs and weird hacks. | patrec wrote: | You don't remember correctly. Word's docx format is far | more intelligent than OpenOffice ODT, despite propaganda | to the contrary.
With one exception: Word's zip files | don't have a convenient magic header. The way it works | with ODT and a bunch of other formats is that you put an | uncompressed identifier file (`mimetype`) as the first | entry inside your zipfile. At byte 30 (of your zipfile) | you then get `mimetype$THE_MIMETYPE`. This is a nice | trick and works for any zip-based format. Sadly, docx | does not do that, so you have to go by file extension or | look at (more of) the contents of the zipfile. | amaccuish wrote: | With the previous format being essentially a memory dump, | I'd say that's progress. | alexott wrote: | That's correct. I worked with the MS team that documented | the old formats, and they said that sometimes they didn't have | people left who knew what a specific struct was intended | for, although that was mostly for PowerPoint and | Visio; Excel and Word were better documented. | belval wrote: | - Remapping font tables to different characters to reduce | code usage. | | - Clipping path logic: you can write text outside of it, | which makes it effectively invisible, yet it will show up if | you try to extract the text. | | - Anything regarding the graphics state stack; it's a pain to | debug. | | - Extracting content from AcroForm/JS "XFA" forms. | | PDF is a great format for printing, it's just a pain for pretty | much everything else. | aidos wrote: | This is also my list. Except for the forms, that's one I | _don't_ have to deal with. | | My other one is the use of multiple subset fonts that are | actually the same font with a different subset of glyphs | that you want to merge back together. | moultano wrote: | It's kinda crazy that this is the format we've standardized on | to carry all of the output of academia into the future. | mjcohen wrote: | A lot of the input is LaTeX, so that's ok. | MayeulC wrote: | And arXiv asks for the original LaTeX source when | submitting. | | Well, at least, PDF is probably better than printed paper | for that purpose.
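patrec's point about the ODF `mimetype` entry can be checked with a few lines of standard-library Python. The sketch below assumes the usual ZIP local-file-header layout: 30 fixed bytes, then the file name, then any extra field, then the data. It only handles the case where the first entry is stored uncompressed and has its sizes in the header. The helper names `sniff_zip_mimetype` and `make_zip` are made up for illustration, and the "docx-like" archive is a stand-in, not a real Word file.

```python
import io
import zipfile
from typing import Optional

LOCAL_HEADER_SIZE = 30  # fixed part of a ZIP local file header


def sniff_zip_mimetype(data: bytes) -> Optional[str]:
    """Return the MIME type stored in a leading, uncompressed `mimetype`
    entry (the ODF-style convention), or None if the file doesn't use it."""
    if data[:4] != b"PK\x03\x04":  # local file header signature
        return None
    method = int.from_bytes(data[8:10], "little")      # 0 = stored
    size = int.from_bytes(data[18:22], "little")       # compressed size
    name_len = int.from_bytes(data[26:28], "little")
    extra_len = int.from_bytes(data[28:30], "little")
    name = data[LOCAL_HEADER_SIZE:LOCAL_HEADER_SIZE + name_len]
    if name != b"mimetype" or method != zipfile.ZIP_STORED:
        return None
    start = LOCAL_HEADER_SIZE + name_len + extra_len
    return data[start:start + size].decode("ascii")


def make_zip(entries) -> bytes:
    """Build an in-memory ZIP from (name, data, compress_type) tuples."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data, method in entries:
            zf.writestr(name, data, compress_type=method)
    return buf.getvalue()


odt = make_zip([
    ("mimetype", b"application/vnd.oasis.opendocument.text", zipfile.ZIP_STORED),
    ("content.xml", b"<office:document-content/>", zipfile.ZIP_DEFLATED),
])
docx_like = make_zip([("[Content_Types].xml", b"<Types/>", zipfile.ZIP_DEFLATED)])

print(odt[30:38])                    # b'mimetype'
print(sniff_zip_mimetype(odt))       # application/vnd.oasis.opendocument.text
print(sniff_zip_mimetype(docx_like)) # None
```

The check only reads the first local header. It never touches the central directory at the end of the archive, which is what makes the trick cheap.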
| belval wrote: | As someone still working with PDF processing, I can confirm | that it doesn't get easier. | nvr219 wrote: | FINALLY! Now I can finally uninstall Chrome. | | Of course, I do wish Sumatra supported filling forms. Then I | could uninstall Firefox too! ;-) | MayeulC wrote: | Okular works quite well for filling forms in my experience :) | flowerlad wrote: | > _After entering data into these fields you can download the | file to have the filled out version saved to your computer._ | | And then what? Fax it? Sounds like a missed opportunity to me. It | would be nice if you could add a Submit button to have the data | posted to the server, just like any other web-based form. | Jaxan wrote: | That would be nice if websites supported that. But in my | experience all PDF forms I fill in have to be printed and then | signed and posted... | chairmanwow1 wrote: | I haven't printed a PDF to sign in years. Why don't you just | affix a digital image of your signature to the file? Save it | and email it back to whomever. | emidln wrote: | This, and in the rare circumstance where they only accept | regular mail or faxes, I use HelloFax. | Jaxan wrote: | Many places don't accept emails. Sure, you can sign | digitally and then print. | unbalancedevh wrote: | e-mail it, or save it for your records. | flowerlad wrote: | And what would the recipient do with the email? Type it in | manually? You don't see any room for improvement here? | _coveredInBees wrote: | Sheesh, what's with the hate for a generally all-round useful | feature in an open source browser? The last thing I want is to | have to install 3rd-party software on my machine and have my | browser be held hostage to it just to view PDF documents on the | web. Being able to fill them in is a very useful feature and the | in-browser PDF readers are still way less bloated than most other | plugins. | yjftsjthsd-h wrote: | Yes, this is a nice feature added to a basically-reasonable | implementation of a PDF viewer.
I think the objection is that | that PDF viewer should be an actual independent application, | not baked into a browser that already is too many things to too | many people. It's like Chrome including a basic antivirus | function (https://support.google.com/chrome/answer/2765944?co=G | ENIE.Pl...) - yes it's useful, yes I trust it more than a lot | of AV products, but no I don't think it's reasonable to bundle | it into the program that's supposed to be here to render web | pages for me. (Similar arguments, to varying degrees, are made | against WebRTC and Pocket) | bad_user wrote: | No, it's like Chrome including a PDF viewer. | _coveredInBees wrote: | I really don't see why it should be an independent | application. I mean it's not like we expect a PNG viewer or | HTML5 video viewer to be a separate application in a browser. | Being able to view (and in this case fill/interact with) PDFs | is pretty much a basic necessity on the web. Beyond the core | HN crowd, almost nobody cares to have a 3rd party application | that they have to install to view PDFs in their browser. | Having a lightweight and secure PDF viewer that is also not | made by some 3rd party company that could be collecting any | amount of data on you is a good thing in general. | kibwen wrote: | Yes, like it or not PDF is a de facto standard of the web, in | the same way that Flash was nearly a de facto standard before | the industry-wide decade-long effort to kill it. A browser that | doesn't support PDFs is as lacking in the eyes of users as a | browser that doesn't support PNGs. | Eduard wrote: | I don't agree. | | PDF is fine to be some binary blob to download just as most | other binary blob formats are. | | Would you expect to have .exe files being directly | interpreted by a browser? | jiveturkey wrote: | > Would you expect to have .exe files being directly | interpreted by a browser? | | no, i wouldn't. and yet here we are: wasm. 
| jmiserez wrote: | If Flash had been rendered natively in the browser, sandboxed and | across different browsers, and with high enough | performance/low enough battery impact, it would have stayed. | | There were efforts similar to PDF.js to run Flash content | using JS, but they were never able to tick all those boxes. | bawolff wrote: | Finally. It's a nightmare trying to fill out a PDF form on Linux. | franga2000 wrote: | Okular can handle basically everything for me, except for those | Adobe-proprietary ones that require JS and all kinds of other | dumb features that only Acrobat supports. | jhoechtl wrote: | I recently made the switch to GNOME, as its multi-monitor | support, fractional scaling and general Wayland support are | only surpassed by Sway. I sorely miss Okular! | mgbmtl wrote: | Can't you still run KDE apps under GNOME, even with | Wayland? I use a few. Some of them look better with the | "QT_QPA_PLATFORM=wayland" environment variable. | formerly_proven wrote: | Most KDE apps work not just under GNOME, but even under | _gasp_ Windows! I think Okular and some others are even | in the MS app store. | ReverseCold wrote: | Okular should work fine on GNOME, but you might need extra | disk space for all the KDE dependencies. | jhoechtl wrote: | That's the point. Apps which only use Qt, like KeePassX, are | OK, but Okular would pull in half of KDE. | kevincox wrote: | I very rarely have any issues using Evince. What PDF viewer are | you using? | randlet wrote: | I purchased PDF Studio Pro and it works pretty well for me. | loufe wrote: | I just use LibreOffice Draw to add text to stubborn PDFs on | Windows, and to any PDF on Linux. It's a good, free OSS way to get the | job done, though not pretty. | torresjrjr wrote: | I just want to smooth scroll with vim keys (hjkl). Too much to | ask? :/ | calcifer wrote: | How is that relevant to this thread? | doc_gunthrop wrote: | Any chance Firefox will have built-in support for printing to | PDF?
There's a browser extension[1], but it was last updated 3 | years ago. Seems the Chrome browser has had this feature for | ages. | | 1: https://addons.mozilla.org/en-US/firefox/addon/print-to- | pdf-... | jwatt wrote: | It's not ready for release yet, but if you flip the preference | `print.tab_modal.enabled` to true you'll get the replacement | printing interface, which has a "Save as PDF" pseudo-printer. | [deleted] | callalex wrote: | Does your operating system not support this natively from the | print dialog? | auxym wrote: | On Windows at least, using the built-in PDF printer with | Firefox results in text in the PDF file being converted to | paths (not text). Huge file and you can't copy/paste. I've | tried 3rd party PDF printers (PDFForge) and the result is the | same, so I think it might be a FF bug (or feature)? | | Chrome's save-as PDF produces actual text. It's the main | reason I still have Chrome installed. | RonanTheGrey wrote: | That seems... odd. I am on Firefox on Windows and I print | to PDF all the time using the Windows built-in PDF printer | ("Microsoft Print to PDF"), without issue. In fact | sometimes that printer is the only one that can get things | to format correctly! | | Something on your system might be interfering with the | printing process. | vel0city wrote: | There must be something strange about your particular setup, or | maybe the behavior changes based on the page. Firefox 81, | Windows 10 version 2004, multiple computers: printing this | page of comments with the "Microsoft Print to PDF" printer | always results in a PDF of ~470KB with selectable | text. | elric wrote: | That built-in PDF viewer is another feature that could have been | an add-on. It's bloat which increases the browser's attack | surface. It's completely unneeded given that just about every OS | ships with some kind of PDF reader out of the box. | morpheuskafka wrote: | The built-in PDF reader on Windows is literally to open the PDF | in Edge,
so not very good UX for Firefox, and also a good | argument that browsers are expected to have PDF readers. | [deleted] | sp332 wrote: | The alternative was installing an Adobe plugin with no sandbox, | so it made sense at the time. | godshatter wrote: | I would like to see Mozilla modularize Firefox more. Browsers | are such huge beasts that contain everything imaginable plus | the kitchen sink these days. It would be nice for these kinds | of features to be add-ons that can be disabled or deleted if | their functionality is not needed or desired, freeing resources | for other use. | | They could be part of the initial install so that Mozilla can | provide the browser as they envision it, but be removable | by those who have other ideas of what their browser | should consist of. | | I don't know how technically feasible that is with their code, | but it makes sense to me from a developer standpoint. | toyg wrote: | Thank Google: Chrome was the first browser to ship with a PDF | reader, and people loved it. Now it's just expected that any | browser should have a workable PDF reader built in. | axelf4 wrote: | > It's bloat which increases the browser's attack surface. | | AFAICT PDF.js is just another JavaScript application and thus | as sandboxed as any other website. | est31 wrote: | It's a JS application and thus less exploitable than your | average C application with tons of unsound code, but IIRC it | belongs to the "privileged JS" layer that Firefox | has, so it has special rights that regular website JS doesn't | have. | saghm wrote: | I just found out last night that this feature was coming, and I | hadn't realized that today was release day! I did discover that | if you want to enable it on Firefox 80, you can toggle | `pdfjs.renderInteractiveForms` in about:config. | paulpauper wrote: | I would like to see a version that allows forms to be signed. | speedmagnet wrote: | Microsoft Edge allows you to draw on PDFs and save them easily.
I use it for signing all the time. | anaganisk wrote: | I think he meant a digital signature. | nip wrote: | In case he meant a regular (drawn) signature, it can be done | via Preview on Mac. | | For local web use, I built | https://formulairemagique.fr for myself for this very reason. | tendersej wrote: | Good job on the simple UI! I think it will prove useful | next time I have a form to fill. | lostlogin wrote: | Preview.app is just so good. It's my favourite default | Mac app by miles. | | It and Terminal.app have survived the thing Apple does | where they update applications and remove all the | application's power to achieve anything. | shoguning wrote: | I was pleasantly surprised by this recently. It just worked | using my touchscreen laptop. So rare on Windows. | godelski wrote: | Can we just get support for math text? For years I've accidentally | printed research papers from the browser, only to have to open them | back up in a non-browser PDF reader and reprint. | | With that and form filling I basically don't need another PDF | reader, which is nice. | Causality1 wrote: | Is there something hard about fillable forms on PDF? Why have a | PDF viewer at all if it can't fill out a form? | derefr wrote: | > Is there something hard about fillable forms on PDF? | | In the sense of a "form" just being lines on paper that you can | arbitrarily add some text to -- no, that's easy. | | Likewise, in the sense of a "form" being some defined input | regions that accept your keystrokes and turn them into new text | DOM nodes in the PDF itself -- easy enough. Though, unlike | HTML, there's no concept of an <input> _tag_ that just has the | semantics of accepting keystrokes and turning them into | (persisted) input; instead, this all has to be done through | scripting [i.e.
writing event handlers, or having some PDF | authoring software generate them]; and there are several | incompatible scripting languages for PDF that get used, some of | which are proprietary with no open specification. | | But, doing form _validation_? Or, worse yet, making one of | those fancy PDF forms that auto-calculates fields like an Excel | spreadsheet? Now you're getting into the hairy stuff, because | IIRC none of the _open-standard_ PDF scripting systems provide | these sorts of mechanisms, so these are inherently proprietary | things. | | And when I say "proprietary", I mean "like old versions of Word | or Photoshop, where each version emitted its own in-memory | data-structures to disk without formal serialization; and it | was the job of authors of future versions to write importers to | deserialize whatever format resulted." | foxdev wrote: | While PDF is an open format on paper, in practice it's as | proprietary as any ancient format. Supporting it in full is not | trivial. | core-questions wrote: | The real problem here is that, 20+ years on, printing to PDF | is still a totally natural and easy-to-understand metaphor | for a normal office desktop user, but producing HTML for the | browser is still impossible for them. | | If we simply had print-to-HTML functionality which resulted | in a document identical to what you view onscreen while | editing, PDF could die the death it deserves. | | But HTML+CSS somehow manages to suck just as much for common | usage, so it persists. | foxdev wrote: | I wish EPUB would catch on for more than books. An EPUB is | just HTML and CSS in a zip file, and a large part of the | world's population has a device that can load it and present | it cleanly. | jahewson wrote: | Yes! PDF forms are amazingly complex. Text in PDF is very | complex and the forms themselves are a kind of templated vector | graphics.
Multiply this by all the weird and corrupt PDF forms | out there which Acrobat supports, and you have a challenging | task. | gpvos wrote: | I don't know about you, but >98% of the PDFs I use are just for | reading and don't contain a fillable form. | | And implementing a PDF viewer is already a major undertaking; | adding the form functionality complicates things even more. | nip wrote: | I posted a link (above) to the app I built to solve that | problem. | | The vast majority of forms are indeed not «ready» for | input, requiring users to jump through hoops to fill them. And | that work is done again by the next person. | marvindanig wrote: | Why is Firefox spending all their money and goodwill on a piece | of technology that should be done away with? | | PDF is a dork. It's an accessibility nightmare with no obvious | advantage over simple ordinary webpages. Somewhere in the | comments below, it is mentioned that supporting PDFs is a | non-trivial piece of technology. Maybe! Even steam engines have | non-trivial technology under the hood. | cptskippy wrote: | > It's an accessibility nightmare with no obvious advantage | over simple ordinary webpages. | | It is easy to criticize something when you don't look back at | the historical context through which it emerged. It has plenty | of advantages over HTML, but they're easy to dismiss if you | don't have a use case for them. | inetknght wrote: | > _It has plenty of advantages over HTML but they're easy to | dismiss if you don't have a use case for them._ | | Can you discuss some of the advantages? The only advantage | that comes to mind is that Apple has built-in support for | writing PDFs, and that has a lot to do with Adobe rather than | PDF being a better candidate. | tdhz77 wrote: | I work for the US federal courts; I can assure you HTML isn't | a sufficient substitute for PDFs in court cases. Evidence is filed | in PDFs.
Documents (PDFs) need to be a historical archive, | and the ability to modify would damage the credibility of | those documents. | kevincox wrote: | > ability to modify | | How are PDFs any less modifiable than HTML other than | requiring (widely available) specialized tools instead of | a text editor? | endless1234 wrote: | Cryptographic signing is a core feature of PDF, but not | HTML. | marvindanig wrote: | Yeah, but does Firefox need to solve the use-case of a | court system? Also, tangentially, the solution to | guarantee "tamperproof" archiving is in cryptography and | that's not a feature of PDF. | yzmtf2008 wrote: | No, Firefox doesn't need to support the use case of a | court system. That's not what GP is saying. All we're | establishing here is that PDF is a useful format, and | Firefox is supporting it. | | Also, cryptographic signatures do happen to be a feature | of PDF. | marvindanig wrote: | Now that I read my comment I see the issue with it. | | What I meant to say is that Firefox should focus on | implementing cryptographic signing over HTML, | not a PDF viewer on the web; enabling | cryptographic signatures isn't tied to the PDF format per | se. | toyg wrote: | PDF prints infinitely better than HTML, and it can be | somewhat hardened against modification by average users. | | If you think MS Office users would prefer to output HTML | over PDF, you don't live in the same corporate world I | inhabit. | bbarn wrote: | PDF's ubiquity is 100% due to the fact that it printed the same (or close | to the same) on any PostScript-compatible printer. It's tech so | old that many in the industry ignore the reason it existed (and | still exists). Every solution beyond PDF has also been | either closed source (read Microsoft) or ignored. It's | useful; that's why it exists. Yes, it's archaic, yes, it's | hard to read for tech people, but for non-tech people, it | solves an issue that plagues the entire software industry: | Standardization.
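On the signing sub-thread: in the PDF spec, a signature dictionary carries a /ByteRange array of (offset, length) pairs. The digest that gets signed covers those ranges, which is the whole file except the /Contents hex string that will hold the signature value. The sketch below illustrates only that digest step, using a toy byte string that is not a valid PDF. It does not parse real files or verify the CMS/PKCS#7 signature or certificate chain. The function names and the /Amount entry are hypothetical.

```python
import hashlib
from typing import List, Sequence, Tuple


def byte_range_digest(pdf: bytes, byte_range: Sequence[int]) -> str:
    """SHA-256 over the (offset, length) pairs listed in a /ByteRange array."""
    h = hashlib.sha256()
    for off, length in zip(byte_range[0::2], byte_range[1::2]):
        h.update(pdf[off:off + length])
    return h.hexdigest()


def make_toy_signed(body: bytes, sig_hex_len: int = 16) -> Tuple[bytes, List[int]]:
    """Append a placeholder /Contents value and compute a byte range covering
    everything except that placeholder (angle brackets included)."""
    placeholder = b"<" + b"0" * sig_hex_len + b">"
    pdf = body + b"/Contents " + placeholder + b"\n%%EOF\n"
    gap_start = pdf.index(placeholder)
    gap_end = gap_start + len(placeholder)
    return pdf, [0, gap_start, gap_end, len(pdf) - gap_end]


def fill_signature(pdf: bytes, byte_range: Sequence[int], sig_hex: bytes) -> bytes:
    """Write a (fake) signature value into the excluded gap."""
    gap_start, gap_end = byte_range[1], byte_range[2]
    assert len(sig_hex) == gap_end - gap_start - 2, "must fit the placeholder"
    return pdf[:gap_start + 1] + sig_hex + pdf[gap_end - 1:]


doc, rng = make_toy_signed(b"%PDF-1.7\n1 0 obj << /Amount (100.00) >> endobj\n")
before = byte_range_digest(doc, rng)
signed = fill_signature(doc, rng, b"DEADBEEFCAFEF00D")
tampered = signed.replace(b"100.00", b"900.00")
print(byte_range_digest(signed, rng) == before)    # True: the gap is excluded
print(byte_range_digest(tampered, rng) == before)  # False: covered bytes changed
```

Bytes appended after signing (the incremental updates pierrebai mentions) fall outside the recorded byte range. A real validator therefore also has to decide what such later updates are allowed to change.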
| toyg wrote: | > _Why is Firefox spending all their money and goodwill_ | | I doubt "all" their money goes towards the PDF-reader bit. And | tbh, I'd say nobody will really lower their goodwill towards | Mozilla because they add features that a lot of people actually | need. | beervirus wrote: | There are lots of use cases for PDF where a web page is totally | unsuitable. | marvindanig wrote: | As someone working on formats, I disagree with your | generalization. But let's get into specifics. List the things | about PDF that you believe can't be done with web pages. | cptskippy wrote: | It's easy to dismiss things in their entirety and then | require someone else to "prove you wrong". Why don't you | prove you're right instead? | | Why don't you list all of the things that PDFs can do that can | also be done with web pages? | marvindanig wrote: | Sure, here's my list: everything + more. | | There's nothing a PDF can do that a webpage can't. In | fact there are hundreds of things that a webpage can do | but a PDF can't, including form fields, input fields and | seamless form submissions. | | Webpages can also do this: | https://bubblin.io/cover/official-handbook-by-marvin- | danig#f... | | Disclosure: It's my work. | minerjoe wrote: | Wish I could look at your work, but my browser doesn't | support JavaScript. I wonder what it is about. | cptskippy wrote: | Anyone can create a PDF form to capture data and | signatures, email it to someone who can then fill it out | offline, and then email it back. That's not something | easily done with a webpage, and it's not something my mom | can do. | | PDFs are easy to make and easy to work with. Web pages | aren't. | | Your work is impressive, but why would anyone want that? | Do you envision lawyers putting all their legal contracts | into fancy flippy books? | marvindanig wrote: | > Do you envision lawyers putting all their legal contracts | into fancy flippy books?
| | Someone will have to solve it for the lawyers in a not so | 'fancy consumerish' way. The point is that it is possible to | do that, and Firefox shouldn't be solving this problem | using an ancient format and a layer of cruft in between. | f1refly wrote: | Distributing a document with functioning kerning and | embedded fonts that works offline. | marvindanig wrote: | Service workers + @font-face + the CSS3 font-kerning property. | Done, next. | yzmtf2008 wrote: | I think you missed the point of distributing. I'm never | going to let you email me your service workers, because I | can't forward this document to anyone without relying on | you hosting a server / not changing the content. | marvindanig wrote: | Oh, I'm all in for email/attachment-based distribution. | Just not with Firefox sporting it in the web browser, | where you'd in all certainty require someone to host a | server and for you to trust them that no changes have | been made to the content. | | That was the entire point of my comment at the top. | edflsafoiewq wrote: | Going to a particular page and only having to render that | one page. Large HTML documents are unwieldy. | beervirus wrote: | The modern web is slow for a lot of reasons, but none of | them are about rendering lots of static HTML. Anyway, just | break things up into multiple pages if necessary. | derefr wrote: | Yes, maybe generally; but let's talk about the specific case | here -- filling of complex PDF forms. | | When a PDF has interactive form fields, calculated auto- | populated fields, fields that are enabled/disabled according | to the inputs of other fields, etc. -- the organization that | created it (usually government or education) usually does | that because they want you to fill it out _using_ a PDF | viewer; save it (which will persist the form inputs "into" | the resulting PDF); and then submit _the modified PDF file_ | back to them.
They want this because they can use automated | backend processes to extract the data from the PDF. They | _don't_ want you to just print out the thing and fill it out. | In fact, many such "fillable" PDFs start off in a state with | many of their form-fields disabled and voided, such that | printing them out in that state would result in a form you | can't really write on! | | So, at _that_ point, why didn't they just make the PDF a web | page? They've essentially reinvented a web form, but with | extra steps. The only benefit a client gets is the ability to | edit and save the form offline (but that can be done in a | browser, too, with local storage); and furthermore, the | ability to treat the resulting filled form as a file, moving | it around before you submit it. But the cases where you need | that are _very_ niche, compared to the cases where you can | just direct employees to your Intranet portal. | cptskippy wrote: | The use case you're describing wasn't feasible until about | 20 years after PDFs were introduced. Web Storage isn't that | old, has only recently become widely deployed, and in a lot | of cases is disabled due to security concerns. | vonmoltke wrote: | > In fact, many such "fillable" PDFs start off in a state | with many of their form-fields disabled and voided, such | that printing them out in that state would result in a form | you can't really write on! | | I have never seen this. Do you have an example? Every use | of fillable PDFs I have encountered is a use case where | submitting a handwritten form is still an option. | | > The only benefit a client gets is the ability to edit and | save the form offline (but that can be done in a browser, | too, with local storage); and furthermore, the ability to | treat the resulting filled form as a file, moving it around | before you submit it. | | I have yet to see a web form that actually saves a | readable, properly-formatted, self-contained, easy to | access, fully-offline copy.
| | > But the cases where you need that are very niche, | compared to the cases where you can just direct employees | to your Intranet portal. | | This is not a trivial need; most forms sent as fillable | PDFs need to or should be retained for some period after | submission. Also, I don't know what "employees" and | "Intranet" have to do with anything. | | You are also missing the use case where a form legally | requires a live signature from one or more parties and needs | to be printed, even if just to scan and return. I recently | had to do this for some insurance paperwork. | Isthatablackgsd wrote: | > You are also missing the use case where a form legally | requires a live signature from one or more parties and | needs to be printed, even if just to scan and return. I | recently had to do this for some insurance paperwork. | | My company has to do this for one state government. They | require the signature to be written in black ink. It is a | PITA to do since we all have digital signatures set up. | But nope, this state government requires the written | signature. | andrewshadura wrote: | The Canadian visa application form is an example. | derefr wrote: | > I have never seen this. Do you have an example? | | I don't have one on hand, no. But I've certainly had to | fill them out in the past. IIRC an especially bad one | came in the form [heh] of a student-loan application for | the college I attended. It was essentially a HyperCard | stack in the guise of a PDF. | | Here are some early Adobe marketing materials (as a PDF, | because of course it is) talking about the advantages of | "eForm Solutions": https://planetpdf.com/planetpdf/pdfs/p | df2k/02E/ldefurio_pdff... | | It sounds like every PDF form you've ever dealt with is | what Adobe, in this brochure, calls a "Type 1: Print and | Fill" or "Type 2: Fill and Print" form. But Type 3 and | Type 4 forms do exist in the wild!
(They're not often | _created_ anymore; most of the ones that exist now are | from around a decade or two ago, when Adobe was really | pushing this idea.) Creating such forms was basically the | point of Acrobat as a software product. | | When PDF viewers (e.g. Apple Preview) say they don't | support "PDF forms", they're not talking about Type 2 | forms. They usually support those just fine. They're | talking about Type 3 and Type 4 forms. And more | specifically, the ones that use Adobe's proprietary | AcroForms data-embedding system, rather than the | open-standard XFA data-embedding system. | | (I could swear I saw an HN post about the horrors of | AcroForms once, but I can't find it now.) | | > I have yet to see a web form that actually saves a | readable, properly-formatted, self-contained, easy to | access, fully-offline copy. | | To be clear, that was what I meant by the second | qualifier, "as a file." Browsers support _persisting the | state_ of the form. Just not _as a file_. They persist | the state internally, when the form's author does the | client-side JavaScript work to enable that. | | For the use-case where the user wants to stop filling out | the form for now (e.g. because they don't have some | required information on hand), and then come back to it | to finish it later, in-browser persistence works | perfectly well. | | Even cleaner, though, is just building a web form as a | wizard, where fields are submitted one at a time, and you | can also freely navigate to previously-filled "steps" to | change your answers. That doesn't even require | JavaScript; just pure 90s HTML generated on the backend. | Most government sites that thought PDF eForms were a good | idea are now falling back to this approach. | | > Also, I don't know what "employees" and "Intranet" have | to do with anything. | | Secure installations.
The main use-case for fillable PDFs | (as can be seen in Adobe's marketing brochure, where | "government" is the core client) is a case where _public_ | or _cloud_ solutions just aren't tenable, i.e. in secure | government/military/etc. installations, where the | workstations are air-gapped from the public Internet. In | such a case, PDF forms can still be sent around via a | local non-Internet-routable email server, for the workers | there to fill in. | | Today, this need can be served just as well by setting up | a non-Internet-routable web portal for those same workers | to use. But back in the 90s and 00s, "Intranet web | portals" were a fancy thing only the most forward-looking IT | bigcorps had on offer. They had _Intranets_, for sure, | but they weren't hosting web apps on them. | | So, what did they do instead? Well, Adobe had two main | competitors in the "eForm" market: | | * Lotus Notes form documents, connecting to a Lotus | Domino database server; | | * Microsoft Excel sheets that use VBA to data-bind to an | accessible Microsoft Access database file sitting on an | SMB network share. | | None of these "forms" were hand-submittable. They're all | little self-contained interactive applications that | happen to look like forms. | | AcroForms did have the fancy property, though, that the | AcroForms application-PDF could _generate_ or _export_ a | bog-standard output-PDF representing the filled form. But | that's not actually a modified copy of the source PDF. | That's the PDF using scripting to _generate you another | PDF_, from scratch. | | ------ | | To be clear, I agree with all the stuff you're talking | about; those are all valid use-cases for "PDFs" (i.e. | encapsulated PostScript containers). But they're not what | I mean by "PDF forms." I mean the Type 3/4 forms referred | to above. There's no reason, in the modern era, that one | would implement one of these Type 3/4 "eForm solutions" | instead of just putting up a webpage.
| | If you need an e-signature at the end, have them fill out | the web form, then generate a raw PostScript PDF | representing their inputs, and let them sign it by | dropping a signature vector image on the dotted line in | any standard PDF viewer. | abdullahkhalids wrote: | 1. A webpage form requires a server to be up and running, | which requires an IT person to manage it, separate from the | dept making the form. PDF forms can be created by a person | given the right tools (I think Word does it). | | 2. IT person + webserver costs have to be included in the | budget somewhere, which can be a big problem. | | 3. The webpage form can fail, and the support for it has to | be provided by the IT dept. If the PDF form fails, the dept | can handle it on its own, and will often accept a | filled+scanned printout of the PDF form. | | 4. Adding to the point above, PDF forms degrade gracefully. | If they don't work, or the internet doesn't work, or someone | is on holiday, you can still print, fill and hand them in | person. Webpages can degrade catastrophically, where your | whole dept grinds to a halt while the IT person tries to fix | the problem. | derefr wrote: | Re: all four of your points -- see my sibling post. I'm | not talking about encapsulated-PostScript "Print and | Fill" forms (which do certainly degrade gracefully), or | even open-standard PDF "Fill and Print" forms (which | degrade gracefully _if_ you don't set them up with a bad | default state where there's a big "N/A" text over all the | disabled fields until you fill in other fields.) | | Instead, I'm talking about the PDFs you can basically | _only_ load in Acrobat (though other PDF viewers do | _try_ to render them, with varying success) that actually | do data-binding to some remote database; do XHRs to | submit the form data on success; do "online" onBlur-XHR- | esque field validation; generate new output PDFs _using | scripting, from scratch_ when you ask them to save | /print; etc.
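The "fill out the web form, then generate a PDF representing their inputs" step does not need Acrobat scripting. Below is a minimal, stdlib-only Python sketch that writes a bare one-page PDF listing the answers (uncompressed Helvetica text, hand-built cross-reference table). `answers_to_pdf` and `_pdf_string` are hypothetical helpers; the sketch assumes ASCII-only input and few enough answers to fit on one US Letter page, and it leaves the signing step to whatever PDF viewer the user already has, as derefr suggests.

```python
def _pdf_string(text: str) -> str:
    """Wrap text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def answers_to_pdf(answers: dict) -> bytes:
    """Build a minimal one-page PDF listing each answer as 'field: value'."""
    # Content stream: Helvetica 12 pt, first line at (72, 720), 16 pt line spacing.
    ops = ["BT", "/F1 12 Tf", "72 720 Td", "16 TL"]
    for field, value in answers.items():
        line = f"{field}: {value}"
        ops.append(_pdf_string(line) + " Tj T*")  # show text, then move to next line
    ops.append("ET")
    content = "\n".join(ops).encode("ascii")  # ASCII-only in this sketch

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "num 0 obj", needed for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

For example, `open("application.pdf", "wb").write(answers_to_pdf({"name": "Ada"}))` writes a file any standard viewer can open. The output is a static document, not an application, so it prints and archives like any other PDF.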
| | These are applications, not documents. You can't print | them. You just use Acrobat as a glorified application | host to fill and submit them. (You can press Ctrl+P to | get Acrobat to ask the loaded PDF application to | perform some scripted action to generate a print | output. This may or may not do anything, depending on how | the PDF was created. It usually just pops up a "Printing is | not implemented for this form" box. It certainly won't | work in non-Acrobat PDF viewers.) | | When other PDF viewers say they don't support "fillable | PDF eForms", _these_ are the things they're talking | about. They usually support "Fill and Print" forms just | fine, because "Fill and Print" forms are a somewhat-sane | format, rather than being a competitor to Lotus Notes. | abdullahkhalids wrote: | I understand better what you are saying. I don't think I | have ever seen any PDF forms that require an internet | connection. The Canadian Visa application forms have | inbuilt validation code that checks the form, and once | you upload it, I believe data is extracted into a | database. | | The benefit of these forms is that the validated form | that you submit online is actually printable, which means | that what you see on your screen/paper is pixel by pixel | identical to what Canada receives, and therefore | _legally_, there is no confusion about what was | communicated between Canada and the candidate. | | Webforms are not as strongly accepted as such by courts, | because they have to be manipulated further before being | printed. | | I have read a bunch of your replies, and you are thinking | of all the technical reasons why webforms are better than | PDF (you are right in that), but PDFs have legal and | operational and budgetary advantages that are more | relevant to various organizations. | rk06 wrote: | PDF is widely used and supported. And FWIW, Edge does support | it. | shanecleveland wrote: | I've built fillable PDFs for a manufacturing business.
Links to the | PDF files are provided on the company website, and the files | typically now open in the browser, with varying degrees of | reliability. Unfortunately, many people assume this is just | another page of the website and that they should be able to | interact with it like any other web form. Always fun trying to | explain this. | voldacar wrote: | Truly revolutionary tech | rpastuszak wrote: | Well, it's a PDF reader that doesn't come with a tracking | package, so in a way--yes. | inetknght wrote: | > _it's a PDF reader that doesn't come with a tracking | package_ | | Uh, what? Firefox supports JavaScript. PDFs support | JavaScript. JavaScript empowers tracking. | Hamuko wrote: | Firefox PDF support is actually JavaScript. | | https://github.com/mozilla/pdf.js | MCOfficer wrote: | Following that logic: | | - every browser that supports cookies comes with a tracking | package | | - Electron comes with a tracking package | | - every language interpreter, runtime or compiler _is_ a | tracking package. | | - your OS can run tracking software, thus coming with a | tracking package. | | - Anyone carrying their phone comes with a tracking package | | Hang on, did you just post something on the internet? Your | HN account comes with a tracking package! | gpvos wrote: | Which PDF readers contain tracking? Anyway, there are several | open-source ones that don't. | ilikehurdles wrote: | Acrobat and Chrome come to mind. | gspr wrote: | What's the problem with the Poppler-based ones? I've been | producing (with LaTeX) and consuming (with Poppler/Okular) | PDFs for a decade and never once have I had to worry about | anything related to the format (including tracking). | rpastuszak wrote: | Poppler looks great! But, I _just_ learned about it and I | don't think that the majority of the population, say, outside | of HN knows about its existence, so it's good to have a | fairly mainstream alternative available.
| | OK, Firefox is, sadly, far from being a mainstream browser | nowadays, but still I suspect it has a larger user base | than Poppler. | lumberjack wrote: | You say in jest, but this simple upgrade very likely improves | the lives of more people more significantly than some billion- | dollar unicorns ever do. | ManBlanket wrote: | And here I was thinking we were living in the future when I | could print out a PDF, fill out the fields with a pencil, take | a picture of it, then email it to myself, change the file type | back to PDF, and send it to whomever requested it... | mxuribe wrote: | I think it was William Gibson who once stated something like, | "The future is already here, it is simply unequally | distributed...in that, some people just fill out PDF forms, | while others have to print it out, fill it out with a | pencil...etc...." OK, maybe I'm remembering that quote | inaccurately. ;-) | ansaso wrote: | Seems so absurd that filling a form digitally is breaking tech | news in 2020. PDF in a nutshell. | | Does anyone see a trend moving away from the PDF standard in | recent years? Tried to look for data on it but found nothing. | [deleted] | spidersouris wrote: | Search in PDF has been broken since this update. I must go through | the whole document for Firefox to load it and be able to search | in it. Couldn't find a similar issue on Bugzilla. Anyone having | the same problem? | cpeterso wrote: | I just tested PDF search (in an IRS PDF in Firefox 81 on | Windows) and it works for me. | | Do you see the problem in all PDFs? Maybe there is something | unique to the PDF you are searching? | brendandahl wrote: | Please file a bug | https://bugzilla.mozilla.org/enter_bug.cgi?product=Firefox&c... | mattashii wrote: | I've seen the same happen, so I've filed a bug with my | reproduction: | https://bugzilla.mozilla.org/show_bug.cgi?id=1666575 | DesiLurker wrote: | One Giant Leap for mankind! not /s.
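On inetknght's point that PDFs can carry JavaScript: a quick way to see whether a given file even declares scripts is to count the PDF name tokens that introduce scripts or automatic actions, similar in spirit to Didier Stevens' pdfid tool. The sketch below is a heuristic with hypothetical names (`SUSPICIOUS_KEYS`, `count_script_keywords`). It cannot see inside compressed object streams, so a zero count is not proof of absence, and a nonzero count only means the file declares scripts or actions, not that it tracks anyone.

```python
import re

# Hypothetical keyword list: PDF names that introduce scripts, automatic
# actions, or form submission to a URL.
SUSPICIOUS_KEYS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/SubmitForm")


def _unescape_names(data: bytes) -> bytes:
    """Undo #xx hex escapes (e.g. /J#53 -> /JS) that can hide a name from a naive scan."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)


def count_script_keywords(data: bytes) -> dict:
    """Count occurrences of each suspicious name in raw (uncompressed) PDF bytes."""
    data = _unescape_names(data)
    counts = {}
    for key in SUSPICIOUS_KEYS:
        # Reject matches that continue with a letter or digit, so /JS does not count /JSFoo.
        pattern = re.escape(key) + rb"(?![A-Za-z0-9])"
        counts[key.decode()] = len(re.findall(pattern, data))
    return counts
```

Usage: read the file in binary mode and pass the bytes, e.g. `count_script_keywords(open("form.pdf", "rb").read())`. Note this inspects the document, not the viewer; as Hamuko points out, Firefox's viewer (pdf.js) is itself JavaScript.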
| dgellow wrote: | A note for Linux and macOS users, from someone who switched to | Windows one year ago: it's maybe surprising, but it is a VERY REAL | pain in the Windows world to find a PDF reader that allows | you to edit forms, doesn't also come with malware or adware, | and has even just a modest UX! | | So for sure you already have access to Evince and Preview.app, | which already do everything you want, but Windows users don't | really have that luxury! Being able to tell users to just | install Firefox if they want to edit PDFs is really good IMHO, way | better than the current situation. | jiveturkey wrote: | eh? acroread is very easily found. | ImaCake wrote: | Just to provide anecdata against the current comments. I | totally agree with you. It's not particularly hard if you are | pretty tech-savvy, but for the average user you pretty much | are stuck with Adobe. Or you can try your luck with the | Edge/Chrome PDF form fill, but there's a decent chance it just | won't bother saving your input. Adobe is still full of | extra crap that is irrelevant to everyday use. I think it still | bugs people to update it all the time, but I don't use Adobe, | so I don't know. | inopinatus wrote: | I read that as suggesting this is potentially a killer app for | Firefox adoption in the enterprise. | maxerickson wrote: | What comes with Adobe Reader? | | I have it on my work computer and haven't noticed anything I | would rate as particularly obnoxious, but I don't use it much. | mickotron wrote: | Okular can "edit" forms. I have been doing this on Linux and | Windows for a while. Not the most usable, but it works. What I | can't do in Okular, I do in GIMP. | | I will use Firefox for editable form PDFs, but for those that | don't have editable forms, I will continue to use Okular/GIMP. | | I actually stumbled across the ability to edit forms in Firefox | only recently. I was like... What? This is amazing!
For some | reason the PDF I clicked on opened in Firefox and yeah, I was | surprised. | MayeulC wrote: | And IIRC it's available on the Windows Store. It probably has | an MSI installer as well. ___________________________________________________________________ (page generated 2020-09-22 23:00 UTC)