[HN Gopher] Paperless-ngx - Open source document management system ___________________________________________________________________ Paperless-ngx - Open source document management system Author : thunderbong Score : 407 points Date : 2023-10-07 11:55 UTC (11 hours ago) (HTM) web link (nerdyarticles.com) (TXT) w3m dump (nerdyarticles.com) | nkrisc wrote: | It finally happened to me: the very thing I just started | researching and testing out showed up simultaneously at the top | of the front page. | | I've found more good information about Paperless right here in | the comments than anywhere else so far. | eviks wrote: | Is there any modern solution that doesn't tie you to the clunky | interface of a single web browser client? | | While the folder organization criticism in the article is on | point (although you could also use tags that many file systems | support, but that's not a reliable system to invest time in, or | maybe if it's backed up by some app that can restore all the | tagging it could be), the range of native tools for | viewing/editing various document formats as well as your ability | to customize your workflows is unparalleled | lucas_codes wrote: | How do people usually back up their self-hosted docker services | using postgres? I have been using docker-volume-backup [0] and | just saving the postgres data directory, but I've found it | requires a minute of downtime to back up properly. | | [0] https://github.com/offen/docker-volume-backup | efrecon wrote: | I have this: https://github.com/efrecon/pgbackup | mastax wrote: | ZFS snapshots | darnir wrote: | Specifically in the case of paperless-ngx, I use their export | facility from a cron job. The export is plaintext and contains | all the information needed to recreate the postgres db and the | learned identifiers. In case of a disk failure (and I've had | one with my paperless store), I just reimported the previous | day's backup from my offline backup of paperless' export. 
| asmor wrote: | restic container with all volumes mounted to | /backup/<volumename> (and . to /backup/self - use named | volumes, not binds) in my composefile with scale 0 and a | backup.sh that's essentially | | docker compose down && docker compose run backup && docker | compose up -d | | The restore procedure is the same, you restore the composefile | through restic on the host and then `docker compose run backup | restic restore latest --exclude "/data/self/*" --target /` | | I find it's fast enough because restic is incremental, but if | you can set this up on a filesystem with snapshots that would | be a great option too. | | Restic takes a bit of fiddling around too. I mount a prepared | ssh config, a known hosts file and a private key. | andix wrote: | For now I have only backed up some databases with a pg_dump one liner | triggered from a cron job on the docker host (via docker exec | or docker run --rm). No idea how this scales for big databases. | But for your regular home server <10 GB databases this should | just work. | syntaxing wrote: | I used vackup [1] that's been obsoleted but still works for me. | However, you still need to turn off the container temporarily. | | [1] https://github.com/BretFisher/docker-vackup | jhot wrote: | docker-compose --env-file .env exec postgres /usr/bin/pg_dump | -U postgres "$db_name" | gzip -9 > | "$BACKUP_ROOT/postgres/${NOW}.${db_name}.sql.gz" | poorlyknit wrote: | pg_dump [0] (or pg_dumpall, linked there) sounds like what you | want to use. You could docker exec into the postgres container, | then copy the dump from the volume to your backup location on | the host. | | A bit more contrived than copying the volume but you don't need | to shut down the server. There are probably some scripts out | there for doing this in a structured way but I usually do it | more or less manually/use a bash script. 
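The pg_dump approach described in this subthread (a dump taken through the running container, compressed, and written to a timestamped file) can be sketched as a small script. This is only an illustration under stated assumptions: the compose service name `postgres`, the database user `postgres`, the database name `paperless`, and the backup root directory are all placeholders, not values confirmed by any commenter. It also buffers the whole dump in memory, which is only reasonable for small home-server databases.

```python
import gzip
import subprocess
from datetime import datetime
from pathlib import Path


def build_dump_command(db_name, service="postgres", user="postgres"):
    """Return the argv that runs pg_dump inside a running compose service."""
    # -T disables pseudo-TTY allocation so stdout can be captured cleanly.
    # The service and user names are assumptions; adjust to your compose file.
    return ["docker", "compose", "exec", "-T", service,
            "pg_dump", "-U", user, db_name]


def dump_path(backup_root, db_name, now):
    """Build a timestamped target such as <root>/postgres/<stamp>.<db>.sql.gz."""
    stamp = now.strftime("%Y-%m-%dT%H-%M-%S")
    return Path(backup_root) / "postgres" / f"{stamp}.{db_name}.sql.gz"


def run_backup(backup_root, db_name):
    """Dump the database while it keeps running and store it gzip-compressed."""
    target = dump_path(backup_root, db_name, datetime.now())
    target.parent.mkdir(parents=True, exist_ok=True)
    # The full dump is held in memory here; fine for small databases only.
    proc = subprocess.run(build_dump_command(db_name),
                          check=True, capture_output=True)
    with gzip.open(target, "wb", compresslevel=9) as fh:
        fh.write(proc.stdout)
    return target
```

Calling `run_backup("/srv/backups", "paperless")` from a cron job on the docker host would produce one compressed dump per run without stopping the container; both arguments are hypothetical.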
| | [0]: https://www.postgresql.org/docs/current/app-pgdump.html | kstrauser wrote: | This is nifty, but seems to lack the one thing that keeps me | coming back to DEVONthink: a learning classifier. | | With DT, say you've scanned or saved 20 docs to your inbox and | you want to sort them to their long-term homes. DT will suggest | folders based on how closely the new file matches the contents of | those folders. It has the UI equivalent of "this looks like 2023 | state taxes. Is it? This looks like kid #2's school stuff. Is it? | This looks like the older dog's veterinarian records. Is it?" | | That's so, so nice. | | Lately, as an experiment, I've been playing with organizing my | docs with Johnny Decimal, then using the Hazel app to sort known | docs with fixed structures (think bank statements and the like) | into the right folders. My ScanSnap scanner's software does OCR, | so by the time docs land in the inbox folder, they're ready for | automated processing. It's working pretty well so far, and I may | stick with it. | | But if I _were_ to go back to an app, it would be DEVONthink or | something with most of its features. That classifier is too darn | nice, plus its smart rules, plus its scriptability, plus multi- | device sync, plus Markdown notes with wiki links to stored docs, | plus a thousand other niceties. | pydry wrote: | I thought I wanted this originally when I first started going | paperless but I quickly realized that as long as I OCR | everything and throw it in a pile I can easily grep for "state | taxes" and 2023. | lolinder wrote: | Paperless has this--when I upload a new file it will attempt to | categorize it automatically using my existing tags. The more | items I put in each tag the better it gets at categorizing | them, so it definitely seems to be learning somehow, though I'm | not sure on the details of how it works. | | I've never used DT, so it's possible that their system is | substantially better in some way. 
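For readers wondering how a tagger can "get better the more items you put in each tag", here is a toy sketch of one common technique, a multinomial naive Bayes classifier over the OCR text. It is not a description of how Paperless-ngx or DEVONthink actually implement their suggestions; the class name `TagSuggester` and its methods are hypothetical and exist only for this illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TagSuggester:
    """Toy multinomial naive Bayes: learns word counts per tag."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.doc_counts = Counter()              # tag -> number of documents
        self.vocabulary = set()

    def learn(self, text, tag):
        """Record one already-tagged document."""
        words = tokenize(text)
        self.word_counts[tag].update(words)
        self.doc_counts[tag] += 1
        self.vocabulary.update(words)

    def suggest(self, text):
        """Return the most likely tag for new text, or None if untrained."""
        # Words never seen during training carry no signal, so skip them.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, n_docs in self.doc_counts.items():
            counts = self.word_counts[tag]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus add-one smoothed log likelihood of each word.
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag
```

Every call to `learn` sharpens the per-tag word statistics, which matches the behaviour commenters describe: a handful of tagged documents gives rough guesses, a larger library gives suggestions that rarely need fixing.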
| Xerox9213 wrote: | Paperless uses tags and will auto tag based on previous scans. | IME it works very well (as long as you have a decently sized | library of tagged documents) and seldom do I have to add my own | tags. It's not perfect, though, and sometimes I have to go in | and fix some of the tags. | | https://docs.paperless-ngx.com/advanced_usage/ | kstrauser wrote: | Oh! Looks like I was wrong. Nice! | | I'd still miss DT's zillion other things I've used over the | years, but that one would have been a dealbreaker. | vr46 wrote: | Previous conversation also here: | https://news.ycombinator.com/item?id=37521492 | midnitewarrior wrote: | From what I can tell, DT is only on Mac, and not open | source. If the company goes under, good luck. | steve1977 wrote: | You can always export the files and you could also access | them directly in the application's document database if | needed. | dgrabla wrote: | My Paperless-ngx listening on a network share + brother ADS-2800W | are key to staying sane. My only complaint is that it is resource | hungry. If I allocate less than 2G RAM to the paperless VM it | does not work as it should. | petepete wrote: | I have this exact setup but with the ADS-4300N. I'm new to it | and it's still a novelty. | | My only complaint is I've had the odd letter get scanned upside | down and there's no way to rotate pages in Paperless-ngx. | jplunien wrote: | https://apps.apple.com/app/id6464425056 | | Just recently started working on an iOS/macOS app for it. Hope | you like it! | Obscurity4340 wrote: | How would you compare this to something like DevonThink, out of | curiosity? | bketelsen wrote: | Nice, looks like you're headed in a good direction with this! | apfsx wrote: | This is great, nice work. | ipsi wrote: | I've spun up a copy of this recently (within the last month) and | it's already proving helpful. 
| | I've purchased a new-build home in Germany, and I'm currently in | the stage between "purchased" and "ready for move-in," and if | you've ever purchased a Neubau in Germany you know how much | paperwork is involved - I get so many documents over email, many | of which are scanned (to preserve the wet signature and stamps), | and some of which I need to copy into a translator, that this is | incredibly helpful. It checks my email, grabs PDFs, straightens | them, OCRs them, adds a correspondent, tags them, and makes them | available through a web UI. | | I also appreciate the full-text search (for all that it might | struggle if I had tens of thousands of documents) as I've had to | go and try to find particular documents where the name of the | document I've received might be a synonym for what the other | person is asking for, but the word they're asking for is at least | used in the text. | | I'll also set it up to pull documents from my NAS as well, where | the scanner writes to, as I also receive a number of documents | via mail (that I also occasionally need to translate or | copy/paste from). | | There are also some limitations that annoy me: | | * I really wish the email filters were more flexible - right now, | I have to have three filters, one for PDFs, one for JPEGs, and one | for PNGs, so I wish I could just set a regex for the attachment | name. This one annoys me enough that if I ever have time I'd look | at doing a PR for it (assuming the filtering is done locally and | not on the IMAP server). * I'd also like to be able to set up | rules to tag documents based on the email domain (e.g., house- | builders get tagged as "house-builder, house") without having to | manage a gigantic explosion of rules. In theory the ML should | handle that, but... I'm mistrustful of ML. We'll see in a few | months if I was too hasty in my judgement or not. 
* I'd like to | retain slightly more information about the correspondent, like | both name and email address (there's no consistency about who has | their From line as "Name <email>" and who's just "email", even | within the same company), both for de-duplication of | correspondents and domain-based searching. * I wish I could share | documents more easily than downloading it and re-uploading it to | my email client (or mounting the folders and trying to find the | right document, but that has its own set of problems). This is one | of those problems that's really easy to state, but potentially | quite difficult to actually implement - could a web application | add a PDF to the clipboard in such a way that GMail, say, would | understand what was happening and add it as an attachment when | pasted? | | Overall though, I'm pretty happy with it, and finding it useful | so quickly was somewhat surprising. | jdoss wrote: | If you are looking to quickly set up Paperless-NGX check out my | little side project https://github.com/jdoss/ppngx. It will set up | everything you need to run Paperless-NGX (PostgreSQL, Redis, | Tika, Gotenberg, PaperlessNGX, and SFTPGo) inside a Podman Pod on | a Linux based system. You can optionally set it up to start on | boot via systemd. | | I run this locally on my workstation and send PDFs many times a | week from Brother ADS2800w scanner via SFTP. Paperless NGX has | reduced my home office paper piles to almost zero. It is a | fantastic open source project and I am very thankful it exists. | wolverine876 wrote: | > everything you need to run Paperless-NGX (PostgreSQL, Redis, | Tika, Gotenberg, PaperlessNGX, and SFTPGo) | | That is a lot of dependency. How stable is Paperless with all | those applications making uncoordinated changes on their own | schedules? | darnir wrote: | The only hard dependencies are Redis and Postgres. 
The | official stance is to run them from the provided docker | compose and the container for paperless-ngx itself is kept | updated and working for the stable containers of redis and | postgres. | | Tika and Gotenberg are additional features for scanning and | converting MS Office documents to PDF. Not necessary and I | don't use them in my setup at all. Same with sftpgo. I'm not | sure about its use case. But paperless doesn't directly depend | on it in any way. | traverseda wrote: | Why would you want to use this over one of the official docker | compose setups? https://github.com/paperless-ngx/paperless- | ngx/blob/main/doc... | | They will also automatically launch if you have docker running | at boot. Is it just because you prefer redhat/IBM's docker | equivalent stack to the much more common and cross platform | docker install? | jdoss wrote: | I don't use Docker at all on any of my infra or workstations. | That's why I made this. | traverseda wrote: | Alright, but you've sort of re-invented docker compose | there, but as a shell script. These days docker compose | even works with podman if you really prefer IBM's docker | implementation to the original. | efrecon wrote: | Well... Maybe re-inventing was part of the fun or a | learning experience. If you want, there is even this: | https://github.com/Mitigram/docker-compose-build | abacate wrote: | I would want this over docker and docker-compose any day. | | I've been using docker compose in production for a couple of | years now and it adds another layer on top of systemd that is | a continuous source of headache, especially during updates. | | Podman gets it right: no central daemon, can automatically | generate systemd services for a whole pod. Updates are | seamless. | | This by itself is enough of a reason to me. | growingkittens wrote: | Paperless-NGX doesn't have document version history, | unfortunately. 
| | Right now I am looking at OpenProDoc [1] and bitfarm-archiv [2] | as document management possibilities. | | [1] http://jhierrot.github.io/openprodoc/Spec_EN.html | | [2] https://www.bitfarm-archiv.com/document- | management/features.... | lobochrome wrote: | I am just rcloning my paperless-ngx document volume to s3 deep | glacier every night for this. | | It's a bit "scary" since even documents I delete in paperless- | ngx are thus preserved forever, but it may come in handy | someday. | andix wrote: | I'm looking for a suitable document management system for a | while. There is one feature I would like to have, I didn't find | anywhere except maybe in $$$ enterprise systems: | | I want to add custom metadata to documents by | categories/tags/folders, for example like this: | Invoice {issued: date, invoiceNumber: string, amount: number, | due: date} Contract { validFrom: date, renewsAt: date, | autoRenew: boolean} | | When adding a tag like this, it should either automatically fetch | this information from the content document (probably very hard) | or give you a manual workflow to type it into a form, while | showing the document next to it. Maybe just by selecting the text | from the PDF. | | In the folder list and in the search you would be able to add | those meta data information as columns, sort them by value or do | queries (tag:invoice AND invoice.amount > 1000) | | Edit: this feature seems to be one of most upvoted feature | requests for paperless https://github.com/paperless- | ngx/paperless-ngx/discussions/1... | jamala1 wrote: | Is Paperless suitable for business use, say, for a smallish sized | company with 25 employees and 1000 customers. I think in my EU | country such systems need to fulfill certain requirements like | versioning/tracking of changes. 
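andix's request above (typed metadata attached per tag, then sortable and queryable) can be sketched in a few lines. This is a minimal illustration of the idea only, not a Paperless-ngx feature or API: the `Invoice` and `Contract` fields mirror the ones listed in the comment, while the `Document` container and the `invoices_over` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class Invoice:
    issued: date
    invoice_number: str
    amount: Decimal
    due: date


@dataclass
class Contract:
    valid_from: date
    renews_at: date
    auto_renew: bool


@dataclass
class Document:
    title: str
    # Maps a tag name (e.g. "invoice") to its typed metadata record.
    metadata: dict = field(default_factory=dict)


def invoices_over(documents, threshold):
    """Rough equivalent of `tag:invoice AND invoice.amount > threshold`,
    returned sorted by amount like a sortable column would be."""
    hits = [d for d in documents
            if "invoice" in d.metadata
            and d.metadata["invoice"].amount > threshold]
    return sorted(hits, key=lambda d: d.metadata["invoice"].amount)
```

Using `Decimal` rather than `float` for the amount is a design choice of this sketch, made so that comparisons such as `> 1000` behave exactly for currency values.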
| ephimetheus wrote: | Shameless plug: I recently released a native app for iOS that | connects to Paperless-ngx: | | https://apps.apple.com/de/app/swift-paperless/id6448698521 | petergrace wrote: | I use MayanEDMS personally, and have for the past five or so | years. It's complex but does what it says on the tin. | | https://www.mayan-edms.com/ | growingkittens wrote: | Mayan EDMS recently moved a lot of basic documentation behind a | subscription paywall. | saintradon wrote: | I tinkered with this a few weeks ago. Pleasantly surprised with | its capabilities. | lwhi wrote: | This is very interesting to me. | | I'd love it if I could also use my mobile devices to bring up | paper docs instantly (mobile phone, tablet, kindle). | lobochrome wrote: | There is even a nice OSS Swift app now in the app store. v1 but | looks nice, is fast and simple. | | https://apps.apple.com/app/id6448698521 | ephimetheus wrote: | I made that! Glad you like it! | diarrhea wrote: | Easily possible. Paperless-ngx works great on mobile as well. I | have WireGuard on my phone and connect that way, then simply | use a mobile browser, no app needed. | lwhi wrote: | Nice! | LeSaucy wrote: | It's not free/oss, and it's on the Apple ecosystem, but | DEVONTHINK does a fantastic job of this, and supports storing | all of your documents in a webdav store which you can host | yourself. It uses ABBYY FineReader for OCR which I have found | to provide better results than tensorflow based ocr. | rufugee wrote: | I've been using DEVONThink for just this for a few years, and | it's very good at it. However, it's macOS only and has far | more features than I need (simple searching, tagging, and | organization). I tried paperless a year ago and the search | and rendering was far too slow, and many docs just gave | obscure errors. Perhaps it's time to give it another shot. | I'd love to have something on Linux that could handle my | large repository of documents. 
| kristofferR wrote: | Is this in reality a German cry for help, disguised as tech talk? | | As one of the least digitized countries in Europe, and the | digitalization budget recently cut 99%, it seems like they still | need to use paper in their lives, and it's not gonna improve | soon. | | This feels so incredibly archaic to me as a Norwegian, I would | have to print out documents to have anything to fill paperless- | ngx with. | _frkl wrote: | You can just use your digital documents directly, and augment | it with the few paper receipts that you might (or might not) | still have to deal with. The main selling point is really | document management (to me, anyway), the 'branding focus' on | physical documents is probably a little misleading. | greenicon wrote: | You can easily use this for digital documents as well. The only | difference in my setup is a tag showing whether the document id | maps to a physical document in a binder or not. | diarrhea wrote: | I track, using tags, whether a document is a scan or properly | digital. The pendulum is strongly in favor of the latter: I use | this tool a ton for natively digital documents as well. | Invoices, contracts, tickets etc. all come in as PDFs anyway, | luckily. I have all that knowledge at the tip of my fingers. | Yes, some of those documents are scans and used to be physical | paper, but that's besides the point. | rayshan wrote: | Genuine question: for simple needs, why use this or DevonThink | over macOS' built-in features? macOS now does OCR (Live Text), | has tagging, and spotlight search is fast (but sometimes presents | too many results to be useful). I even stopped splitting PDFs | into separate documents and organizing them into folders. I just | search. | acka wrote: | Obvious answer: because, contrary to popular belief, not | everyone uses macOS. | phodo wrote: | Does auto OCR work on iCloud files ? 
For example: I scansnap a | huge collection of documents to a folder that is on iCloud | (synced w desktop). It works great because it is so simple. | However if I have, say, PDF document, will the Mac ocr | functionality perform the OCR if the doc is on iCloud and will | I then be able to search for the text in that doc via spotlight | / finder ? I tested this a few years ago and the search on | content inside scanned PDFs did not work. I had looked at | Paperless but decided to stay on Mac os file system. | darkteflon wrote: | Yeah. I had a Devonthink-based setup but after one too many | database corruptions I threw in the towel. Now I just OCR scan | everything into a few MacOS folders and search using Houdahspot | (Spotlight, I found, was not suitable for fine-grained search). | I'm very happy with the setup. | ndsipa_pomu wrote: | This is more designed for a self hosted server, so if you want | multi-device web access then it's a great solution. I can | download a PDF on my android phone and upload it to my | paperless-ngx instance in a couple of clicks and easily edit | the tags as necessary. It's great for travelling as you're not | reliant on having a locally installed application on your | chosen device with you, and of course it would still be | available if you lost your main device and only had your phone | on you. | LVB wrote: | I used to be the target audience and really enjoyed having my | system just right, sorting and tagging everything, etc. But | over the years I realized that I wasn't really benefiting much, | and gave SwiftScan on my iPhone + dumping into an iCloud | folder a try. For my needs, this has worked fine. It is rare I | even need to refer to the scans, and the macOS OCR + automatic | dates usually let me find the doc quickly. In the worst case I | browse thumbnails. 
| aetherspawn wrote: | If anyone is looking for a fully-commercial version, we use | something like this -- it is called Hubdoc and it is free with | any Xero subscription. | | I really really appreciate the work that went into paperless, but | for us the business risk of self-hosting this is far too high | because if we lose our docs we lose our tax proof. | xwowsersx wrote: | I wonder if people know about Google's Stacks app? I don't know | if it's as powerful as Paperless-Ngx, but it lets you organize | docs pretty easily and some of it is automatic. I have "stacks" | for insurance, id cards, receipts, medical records, etc. Whenever | I get paper mail, I snap a photo and immediately toss it. I can | then organize it in the Stacks app and easily be able to pull it | up later. It's a pretty useful, easy solution IMO. | swader999 wrote: | Until they cancel it. | xwowsersx wrote: | True :((( | yunohn wrote: | I usually don't jump on the "Google cancels everything" train, | but do keep in mind that Stacks is a project from their Area | 120 incubator, which saw heavy layoffs [1]. It's not on the | remaining list, so it may have already been cancelled | internally and currently in the process of being shut down. | | [1] https://techcrunch.com/2023/01/25/google-spares-three- | area-1... | Eddy_Viscosity2 wrote: | If it starts with 'google' then at best its something you try | out then, if you like it, try and find that functionality in an | app made by someone else. Google will kill this app just when | you get fully invested. All google apps are traps and foot- | guns, especially the ones that work great. 
| xwowsersx wrote: | Probably right | navigate8310 wrote: | Definitely scary as it's under their incubator area120 | hoppyhoppy2 wrote: | I can't get it on my, ahem, _Google_ Pixel device running | Android 13: | | > _This app isn't available for your device because it was | made for an older version of Android._ | dstroot wrote: | Also: | | "Stack is only available on Android in the U.S. You can | install it through the Google Play store." | xwowsersx wrote: | That's weird. I'm using it right now on the Pixel 7 Pro | running Android 13. | jeleh wrote: | If you own a Synology NAS I recommend to have a look at synOCR: | | https://github.com/geimist/synOCR/wiki | | English translation: https://github- | com.translate.goog/geimist/synOCR?_x_tr_sl=au... | | I've been using this for several years and it works great. | JW_00000 wrote: | What I don't really understand is, do people really have that | many physical documents that they need to keep track of, that | such a system is worth it? E.g. to file my taxes (in Belgium), I | think I only ever need a few (maybe even only 1 or 2) digital | documents. Or is this more a mentality thing? I know my parents | have folders and folders, e.g. my father kept all expense notes | from his work even after retirement... I throw everything away | once it's handled. | _frkl wrote: | Can't speak for physical documents in general, but personally I | really appreciate paperless-ngx for its general document | indexing/storage. Being able to scan and ocr physical documents | (usually using the camera on my mobile phone) is very nice, but | I mainly use it with pdfs that paperless automatically fetches, | ocrs (if necessary), and tags from my email inbox, or which I | copy into a specific local folder which gets synced with | paperless. | | Getting all my invoices from last year to prepare taxes is now | just a simple query in the paperless UI, the result would be | about 95% digital and 5% physical documents, probably. 
Of | course I could do all that old-school using filesystem folders, | but having all my documents indexed and searchable in a single | place was definitely worth the (small) effort of setting it all | up and keeping it running. | kristofferR wrote: | I don't understand what you mean by prepare taxes. | | I just add all purchases/sales right when they happen in my | accounting app and attach the invoice PDF. Then when I have | to file taxes, I export the correct numbers. | | Are you doing your bookkeeping in Excel or something? | _frkl wrote: | This is just for my personal taxes, no accounting involved. | I just get all the relevant stuff together once a year. Of | course it's not 10s of 100s of documents, but still enough | so it would take me some time to get everything together | manually. | | Also it was just meant as an example, paperless is | generally useful (to me) in situations where I need to | access somehow related documents, like traveling and such, | or searching my documents for some information. As I said, | there are other systems and ways to do this, but for me | this is the one that stuck. | NoboruWataya wrote: | I'm quite paranoid about throwing stuff away so for me it's at | least partly a mentality thing. I probably save a lot more than | I need but it gives me peace of mind to know that it's all | there. There are some things that it is very helpful to have | easy access to, like utility bills and bank statements (which I | occasionally need for KYC stuff) or ID documents. | ipsi wrote: | Kinda - at the moment I'm receiving _a lot_ of documents, | mostly as PDFs via E-Mail (some the original digital version, | some scans of physical copies), but some via post as well. | | I've only added documents I've received this year (plus a | couple of dozen documents going further back), and I've got | ~250 in there, with a total of ~2.5m words (although I think | word-count is a fuzzy concept in German). 
| | I've posted a top level comment in more detail, but yeah, it's | helpful to me. | kstrauser wrote: | I guess it's partly a mentality thing for me. I've had numerous | cases of sadness that I couldn't produce a necessary document, | and gladness that I was able to pull up something presumed long | lost. For me, it's easier to save everything "just in case". It | all adds up to less than 50GB so it's not an enormous amount of | data to store by current standards. | | Seriously, a couple cases of "sorry, I don't have proof to back | up that tax deduction" or "hey, here's the receipt proving that | our TV is still covered by warranty!" make it all worthwhile. | dividedbyzero wrote: | Definitely, Germany strongly believes that a document that | hasn't been a physical piece of paper at least once can't be | real. That makes for folders upon folders of documents and it's | actually worse than back in the 20th century because generating | and mailing documents has become way easier and cheaper, so | things that would have been a one-page typewritten letter back | then now are five ten-page ones full of automatically generated | crap. One lengthy illness in the family alone filled hundreds | of pages and it can be very hard to know what can be thrown | away at which point. | schlowmo wrote: | > Definitely, Germany strongly believes that a document that | hasn't been a physical piece of paper at least once can't be | real. | | I'm sorry to tell you that is an oversimplification and | especially for documenting expenses as a company/freelancer | it's kind of worse. | | Last time I checked if you want to follow the tax law to the | word you're not allowed to change the medium: | | If an invoice came as a paper copy (e.g. by snail mail), this | paper copy is the original. If you scan it the digital | version isn't. | | If an invoice came as a digital document (e.g. 
a PDF by | email), this digital document is the original - a printed | version of that digital document isn't. | | So if a tax inspector asks for "originals" it's technically | almost impossible to provide them in the sense of the law. Whether | a tax inspector would even care is another question. | germanier wrote: | It's perfectly legal (and common) for a decade now to scan | documents and destroy the paper original as long as you | follow some guidelines. Keyword is "ersetzendes Scannen". | | And yes, they care about those rules and that you provide | "originals" according to that definition - in particular | that you didn't modify digital documents in any way. You | can (and should) comply with that and there are service | providers to help if you are too small to set that up | yourself. | schlowmo wrote: | Thanks, today I learned about "ersetzendes Scannen". I | just checked and it's exactly a decade (2013) since it has been | allowed which coincidentally is the year when I started | working as a freelancer (and I have to care about such | rules). | | I admit that my last paragraph was kind of hyperbole, but | I never heard (at least from other freelancers) of a tax | inspector who wasn't happy with either everything | printed or everything digital. I guess they really start | to care if they suspect something fishy. | noAnswer wrote: | Another search/keyword is "Revisionssicher". If your | storage/software has that, you are good to go. | greenicon wrote: | Just a side note to this and the other replies: You can | also keep the original documents and add scans to paperless | for indexing, etc. Since I switched to paperless I keep my | originals in binders just ordered by the paperless id, so I | can retrieve the original when required. | ipsi wrote: | Yeah, I'm also in Germany (although not German) and installed | Paperless because of this! 
| | I think more than a few of these projects are started and/or | maintained by Germans due to the astonishing number of | documents received - e.g., paperless-ng appears to have been | done by a German, although neither the original Paperless nor | Paperless NGX immediately appear to be. | esafak wrote: | I would be in favor of not scanning them, forgetting about | them, then throwing them away when I eventually see them again | and deciding I did not miss them. | whateveracct wrote: | It's nice to throw papers away without worrying about it. Or to | archive instruction manuals for stuff I own - paperless is the | first place I look (its search is nice). | krupan wrote: | In my experience, no, you don't need this. The few things I | keep just go in folders named for the year under my Documents | folder, and they are given descriptive filenames like | paystub-2022-10-15.pdf, or companyA-w-2.pdf. In the rare cases | where I need to go back to those (like for a loan application | or doing taxes) it's easy enough to find them. | faiD9Eet wrote: | You are right, you do not want to look up documents that old, it | is a waste of time... ... unless you are a German and the state | asks for your time sheets three years in the past because | you've gotten child support and are requested to prove your | working hours. ... unless you happen to have an accident and | your insurance is fighting with another insurance who's gonna | pay and they ask you about the incident two years later ... | unless you end up in a contract fight with the postal operator, | that can take a year of mailing before being settled. | | Some correspondences take years and only add a mailing every | few months. You would like to have a thread-like view -- as in | an electronic mail. That is the strength of document management | systems. | djbusby wrote: | I have a small business in USA. For federal business taxes I | need 6-7 documents. 
Then that process creates other documents I | need for personal taxes, which also requires 6-8 more | documents. So, I'm at roughly 20 important documents per year for | federal taxes. Nexus in 3 states adds more. And save them all | for 7 years. | | The other end of the spectrum in USA is filing with the | 1040-EZ which is like a 3-4 document process. | catlover76 wrote: | Seriously. People in this thread are describing some setups | that momentarily seem cool in theory, but are almost certainly | overkill for personal use. | whateveracct wrote: | Luckily, running paperless-ngx on my NixOS desktop is | trivial. And it was also trivial to make it accessible over | an avahi name on my local network. So it was kind of a "why | not" sort of thing. | yunohn wrote: | In the Netherlands, government bodies are regularly pushing | everything they can to a digital inbox - which I vastly prefer. | My simple, single-employer yearly income tax is all pre-calculated. | Further, deductions for mortgage interest, | healthcare, studies, etc are all pre-filled as much as | possible. I think you only need to upload documents for | complicated situations or audits? | | Of course, I still quickly download my year-end | bank/salary/mortgage statements and cross-verify the tax | department's numbers. The whole process takes at most a few | hours. | | IME Germany has significantly more hard-copy requirements. | t0mas88 wrote: | You never need to upload the documents in the Netherlands, | their software doesn't have such an option. | | But technically you're expected to keep the documents at | least until you receive the "definitieve aanslag" and if | you're nitpicking I think there is a 7 year term for the tax | services to come back on your filed taxes and change things | or demand proof. | | Practically that doesn't happen if you accepted their | pre-filled numbers and they match your employer's. 
But if you're a | freelancer or other non-standard case I would keep digital | copies for a few years just to be sure. | yunohn wrote: | > You never need to upload the documents in the | Netherlands, their software doesn't have such an option. | | Ah, interesting. I just assumed my situation never | triggered it. | t0mas88 wrote: | Depends on your tax situation. For my private taxes it's maybe | 3 or 4 documents and those from the bank etc have all gone PDF | anyway. | | But when I was a freelancer I used a document scanning system | provided by my bookkeeper. It worked similar to this open | source thing, scan to PDF, automatic OCR and classification. | Needed it because many invoices still arrived on paper, and | receipts for restaurants etc I usually took a picture to | upload. | kristofferR wrote: | In some countries like Germany, the government still | communicates with its citizens by snail mail. Important | documents are usually physical there. They are one of the least | developed countries in Europe with digitalization, they are far | behind. | [deleted] | Macha wrote: | So here's an example where it came in useful to have back | documents: | | I recently purchased a house. As part of the process, I needed | to apply for a mortgage. The bank wanted a statement from my | employer about my income from them, along with my last 2 | complete years tax documents. | | The bank had an inquiry. My employer had said my salary + bonus | was X, but in the first of these two years, my tax documents | said my income from my employer that year was 2.5X. The extra | 1.5X was due to the employer being bought out and some change | of control terms in the RSUs causing immediate payout of what | would normally have been paid out over 4 years. Since I kept | the documents of the RSU terms and the payslips, I could | provide these to the bank to clear the matter up. 
| | Notably, had I not kept my own copy of these documents, I could | not have gone back to my employer for new copies. Due to the | change of control, they had changed payroll vendors, and had | eventually terminated the contract with the old vendor, so I | could not have gotten a payslip from 1.5 years ago. Similarly, | in the move to the new owner's HR system, the company had lost | many of their records of agreements with employees, including | contracts etc., so it's not clear they would still have the | terms of the RSUs, especially since the change of control | payout rendered this a "completed" transaction. And later | events made it clear that they did not have, e.g. a copy of my | employment contract. | | Similarly, if I ever had had a dispute over the terms of those | contracts - if I hadn't kept a copy of the contract, and the | company definitely hadn't kept theirs, any dispute would have | been my word against theirs. | iamwpj wrote: | Companies are legally required to keep payroll records for | multiple years (depends on where you live, though I doubt | most places are less than 3-4). This is ok advice, but these | systems don't just work like this. If you didn't have the | documentation the bank would likely take your approved tax | filings as evidence and move on with their day. | | In a real contract dispute your copy of a contract from your | documents isn't notably different in the eyes of the court | than one from your employer. They're both notarized and if | there's a dispute between them there are established | processes. Aside from some titles etc., historical filing | ownership is typically relegated to the document originators. | viraptor wrote: | It's not just for physical documents. I have payslips which may | be useful in the future, but would be really hard to | recover when I leave the company. Any invoices which come to my | email. Any bank documents which exist in a vaguely named | "account updates" email. 
And many other things which could be | possible to find in the future, but are much better in | paperless with appropriate tags and OCR. | | But yeah, then there are for example the bank account contract | updates which come by physical mail only. | | > expense notes (...) I throw everything away once it's | handled. | | Don't know about your location, but I need to keep the tax | related documents for 5 years in case of an audit. | abbbi wrote: | using paperless for some months now and i really like it. Nice to | see the project got some new contributors and frequent releases. | noodlesUK wrote: | One thing that I've done that makes my paper handling process | much easier is have my printer/scanner point to a write-only | samba share. Most HP printers support this. I wrote a short | script that looks for new files in there (with inotify), runs OCR | on them with OCRmyPDF and moves them to a different file share. | It means that my non-technical family members can just stick the | paper in the document feeder, and 20 seconds later, an OCRed copy | ends up on the family file share. You don't get the fancy tagging | and search that this provides, but file shares integrate natively | into all OSs, which is a huge perk. | manuc66 wrote: | People using HP printers with the feature "Scan to Computer" are | also using https://github.com/manuc66/node-hp-scan-to to send | documents to Paperless-ngx: | https://www.reddit.com/r/selfhosted/comments/tethlr/hp_scan_... | doubled112 wrote: | I wanted to read the article, but it was incredibly twitchy on | my iPhone. | | I scan into a Samba share that paperless-ngx picks up | automatically, OCRs, tags, and deletes. | | A web application is pretty cross platform too, at this point. | | Plus I can get to them on my phones with less trouble than a | share. | noodlesUK wrote: | Yeah, I was looking at the docs for this and it looks like a | somewhat more featureful version of what I've stuck together. 
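For anyone wanting to try the watch-folder approach noodlesUK describes further up (scanner writes to one share, a script OCRs new files and puts the result on another share), here is a minimal Python sketch. It is not noodlesUK's script. Assumptions that are not from the thread: it polls the folder with the standard library rather than using inotify, the directory paths in the usage note are hypothetical, and it relies only on the basic `ocrmypdf input.pdf output.pdf` invocation.

```python
import subprocess
import time
from pathlib import Path


def find_new_pdfs(incoming: Path, seen: set) -> list:
    """Return PDFs in `incoming` whose names are not in `seen`, sorted by path."""
    return sorted(p for p in incoming.glob("*.pdf") if p.name not in seen)


def build_ocr_command(src: Path, dest: Path) -> list:
    """Basic OCRmyPDF call: ocrmypdf <input> <output>."""
    return ["ocrmypdf", str(src), str(dest)]


def process(src: Path, shared: Path, run=subprocess.run) -> Path:
    """OCR `src` into the `shared` folder, then delete the raw scan."""
    dest = shared / src.name
    run(build_ocr_command(src, dest), check=True)
    src.unlink()  # the raw scan is removed only after OCR succeeded
    return dest


def watch(incoming: Path, shared: Path, interval: float = 5.0) -> None:
    """Poll `incoming` forever; a stand-in for an inotify-based watcher."""
    seen = set()
    while True:
        for pdf in find_new_pdfs(incoming, seen):
            seen.add(pdf.name)  # failed files are remembered and not retried
            try:
                process(pdf, shared)
            except subprocess.CalledProcessError as err:
                print(f"OCR failed for {pdf.name}: {err}")
        time.sleep(interval)
```

To run it, call something like `watch(Path("/srv/scans/incoming"), Path("/srv/scans/ocr"))` with your own directories. One caveat of plain polling: a file the scanner is still writing could be picked up too early, so a real setup would probably wait until the file size stops changing or have the scanner write under a temporary name first.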
| | How does it handle when you have digital documents you want | to store (a la google drive or similar)? | djhworld wrote: | I've done something similar although I had to jump through a | few hoops to get it to work. | | I have a Fujitsu ScanSnap which is one of those feed-through | scanners. I have it hooked up to a Raspberry Pi which listens | for the button press on the scanner. You press the button, the | paper feeds through the scanner and once it has finished the | scan a script runs to collate everything into a PDF and drops | the result onto a Samba share that's running on the box where | paperless-ngx is. | | It's pretty neat and feels seamless. The worst part was dealing | with SANE and finding linux drivers for my scanner. | godsfshrmn wrote: | Do you have any other info on how to do this? I've looked for | this but cannot find how to do | alchemist1e9 wrote: | I don't understand the Pi and button part. I also have a | Fujitsu ScanSnap and just configure it to save to a Samba | share. | | What does listen for button press mean? and how? | djhworld wrote: | I'm not sure how I would do that on my model (ScanSnap | S1300i), it connects over USB and has no | touchscreen/control interface or network port, or wifi | capability, you have to connect it to a computer via USB. | | This works fine on say, a Mac, with the official Fujitsu | ScanSnap software, and I'm guessing _that_ supports saving | to a samba share, but I wanted a solution that's | | 1. completely headless, i.e. no desktop machine required | and experience needs to be friction free as the headless | part means the only way to interact with the scanning | function is to press 1 button | | 2. linux compatible, as I wanted to connect it to a Pi. I | had to dig for the drivers, Fujitsu didn't have the right | ones for my model on their website! 
| | I couldn't find any official software from Fujitsu, but I | found the drivers eventually, so ended up coming up with | connecting the scanner to the Pi over USB and glueing the | bits together to drop the PDFs onto the samba share | | The button is located on the scanner, and I run "scanbd" | [1] to listen for the button press, this is what | coordinates the scan function (feeding the paper through) | and then post-scan -> running a script to collate + create | PDFs | | [1] https://wiki.archlinux.org/title/Scanner_Button_Daemon | [deleted] | benbarbersmith wrote: | If you have any notes on this, I've been wanting to set this | up for ages and I'd be incredibly grateful! | djhworld wrote: | My solution was pretty much the same as what this guy did, | although he had a slightly different model of scanner to | me, but it's a very similar setup | | https://chrisschuld.com/2020/01/network-scanner-with- | scansna... | mirashii wrote: | Paperless-ngx supports a folder on disk that you can drop | files into and have them ingested. Throw in a samba | container pointed at the same directory in your docker- | compose and you've replicated the same setup. | Osmose wrote: | I've got this setup with a Brother ADS-1700W scanner, which | can write directly to a network share over wifi. Paperless- | ngx is running on my NAS which hosts the share as well. | diarrhea wrote: | I self host a couple things, but if I had to choose only one, | it'd be this. So far the project strikes a great balance of | stability (zero issues over two years now) and new features | (ownership concept already available, allowing for multiple | accounts in a pretty intuitive way). | | I've killed my instance twice now and had to restore from backup, | which is also surprisingly pleasant to do. Their document | exporter makes that possible. 
Having everything in a single JSON | and otherwise just the raw PDFs makes a ton of sense and has me | confident my documents are "just there" and moving to a different | system would be feasible. | Ylpertnodi wrote: | >the project strikes a great balance of stability (zero issues | over two years now).... | | >I've killed my instance twice now and had to restore from | backup, which is also surprisingly pleasant to do | | Stable, but murderable? | diarrhea wrote: | Yep, it's not undying, but the murder happened through no fault of | theirs. I'm taking credit for that one. | AmazingTurtle wrote: | I'm working on my own SaaS document management system that is | easy-to-use, affordable and fully automated. Basically a black | hole, throw a scan in or wait for emails to come in, it will | name, tag and categorize it. It will also attempt to retrieve | the most important data such as invoice amount, customer numbers, so | that you can easily distinguish and find the documents you're | looking for. It comes with a chat feature so that you can ask | things such as "what was my liability insurance number?" and | it'll answer from the knowledge of your documents. I find this | pretty useful, recently I was at an airport and forgot my flight | number. I just asked what was my flight number and it retrieved | that information from my recent documents easily. Integration | with third party APIs and agnostic backend configuration for LLM | and OCR is in progress. It works with Google Cloud Vision OCR and | OpenAI at the moment. | locustmostest wrote: | We may want to get in touch with each other. We have an Open | Core document management platform that runs in AWS; I'm not | sure about your roadmap, but there may be something there | that's of use: https://github.com/formkiq/formkiq-core | AmazingTurtle wrote: | Cool, I mean - that's a LOT of AWS services right there. | | But yeah, let's connect. Take a look at my project as well! 
| https://turtledev.net/projects/refind-ai | diarrhea wrote: | Where can I sign up to track progress? This sounds like exactly | the future I envisioned. I take great care manicuring my | paperless instance such that when the day arrives, the LLM | integration can work its magic best. | | That said, open source is absolutely table stakes in this, to | me. From the documents I have in the system one could trivially | impersonate me. Perhaps even as good as clone me. So sending | all that off to random internet corporations, no can do. | hiAndrewQuinn wrote: | That's unfortunately why I think Microsoft and Google are | going to be the first ones to actually achieve this future. | They're the only organizations well known enough that | enterprise might trust them with this kind of thing. | Jedd wrote: | https://news.ycombinator.com/item?id=37702095 | AmazingTurtle wrote: | I keep this site updated when something changes. | | https://turtledev.net/projects/refind-ai | gsich wrote: | My main gripe is that you can't use an existing folder structure. | denysvitali wrote: | I've created "ODI" (Overengineered Documents Indexer) and | presented it recently. | | https://clis-everywhere.k8s.best/16 | | My approach is scanning the documents with airscan1, indexing | them with a custom OCR Server (using the MLKit by Google on an | Android phone which does completely offline OCR scanning) and | indexing everything in OpenSearch. I've then created a backend + | frontend to see the documents and do full-text search with that. | | Everything is (going to be) open source with a permissive | license. | [deleted] | nvahalik wrote: | I love seeing more Angular projects in the wild like this. | | Angular is an under-appreciated, solid, no-gimmicks framework. | Been using it for years rather than React and it seems the | pendulum is swinging back toward "this side" now. 
| frde wrote: | Looking through the setup, this seems like an insane way to | package an application for users to install: | https://docs.paperless-ngx.com/setup | | The documentation itself is so full of implementation details | that, as someone who is interested in the concept of this, I'm | scared off even trying to set up and use this. | | The project would be much more approachable if there was a simple | native installer. My parents could also benefit from this but | there's no way they would ever even understand how to install | this, much less troubleshoot docker things. | switch007 wrote: | It doesn't look like the project goals include being | installable by your parents | | It looks to sit in the self hosted space that has an admin | manage all the sysadmin tasks. They've provided docker which is | a pretty good step. | | There are desktop apps aimed at the single user/less | experienced user, which might be more suitable | starkparker wrote: | You might want Recoll[1]. Similar if less powerful | capabilities, cross-platform, open source, has Windows and | macOS installers. | | Still an overly complex FOSS user interface for a tech-unsavvy | target with lots of digging around to configure it (OCR setup, | for instance[2]), but at least you don't need to know what | Docker is to install it. | | 1: https://www.lesbonscomptes.com/recoll/ | | 2: | https://www.lesbonscomptes.com/recoll/usermanual/webhelp/doc... | ndsipa_pomu wrote: | Self-hosting services usually entails more technical knowledge | than just installing an app and I don't think a document | management system would necessarily work well as a native | application. For starters, there's the backup issue and you | wouldn't want non-technical people to store important documents | that only live on a local drive. Remote web access is also a | very useful feature for when travelling and that wouldn't be | easy to set up for a local install. 
| | I've been using it for over a year and am very happy with it, | though I intend on moving it from my home Pi docker swarm onto | a free Oracle cloud instance to improve the performance and | uptime (I've got my Pis auto updating and rebooting, so | services get shunted around fairly often). | tmerse wrote: | _The project would be much more approachable if there was a | simple native installer_ | | Actually the very first example on | https://docs.paperless-ngx.com/setup lists an interactive installer which asks the | user some questions and eventually arrives at a working | docker-compose setup. | | $ bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)" | | If you ask me, this is already pretty user friendly. Although I | agree that if your needs are more involved, there is some | reading you'll have to do. | | I am currently in the process of migrating from mayan-edms to | paperless-ngx and it feels pretty approachable to me if you | know your way around docker (compose). | preya2k wrote: | It is designed to be a server application, so it'd be very | difficult to offer a desktop-like app experience that's easier | to install. | bettercallsalad wrote: | Is it using local storage or cloud? | ndsipa_pomu wrote: | Yes. | | It's a self-hosted application, so it depends on your setup. I | suppose it's arguably using local storage on the server you run | it on which is often going to be a cloud hosted machine. | beestripes wrote: | Does it have annotation capabilities? Quickly adding a checkmark | or signature would make managing documents much easier. | ndsipa_pomu wrote: | It looks like it does, though I've never wanted to use them. I | just had a quick look at my instance and you can add text notes | alongside the document and also there's some basic editing | draw/text tools to add to the document itself. 
___________________________________________________________________ (page generated 2023-10-07 23:00 UTC)