[HN Gopher] Paperless-ngx - Open source document management system
       ___________________________________________________________________
        
       Paperless-ngx - Open source document management system
        
       Author : thunderbong
       Score  : 407 points
       Date   : 2023-10-07 11:55 UTC (11 hours ago)
        
 (HTM) web link (nerdyarticles.com)
 (TXT) w3m dump (nerdyarticles.com)
        
       | nkrisc wrote:
       | It finally happened to me: the very thing I just started
       | researching and testing out showed up simultaneously at the top
       | of the front page.
       | 
       | I've found more good information about Paperless right here in
       | the comments than anywhere else so far.
        
       | eviks wrote:
       | Is there any modern solution that doesn't tie you to the clunky
       | interface of a single web browser client?
       | 
       | While the article's criticism of folder organization is on point
       | (you could also use the tags that many file systems support,
       | though that's not a reliable system to invest time in unless
       | it's backed up by some app that can restore all the tagging),
       | the range of native tools for viewing/editing various document
       | formats, as well as your ability to customize your workflows, is
       | unparalleled.
        
       | lucas_codes wrote:
       | How do people usually back up their self-hosted Docker services
       | that use Postgres? I have been using docker-volume-backup [0] and
       | just saving the Postgres data directory, but I've found it
       | requires a minute of downtime to back up properly.
       | 
       | [0] https://github.com/offen/docker-volume-backup
        
         | efrecon wrote:
         | I have this: https://github.com/efrecon/pgbackup
        
         | mastax wrote:
         | ZFS snapshots
        
         | darnir wrote:
         | Specifically in the case of paperless-ngx, I use their export
         | facility from a cron job. The export is plaintext and contains
         | all the information needed to recreate the postgres db and the
         | learned identifiers. In case of a disk failure (and I've had
         | one with my paperless store), I just reimported the previous
         | day's backup from my offline backup of paperless' export.
        
         | asmor wrote:
         | A restic container in my compose file, with all volumes mounted
         | to /backup/<volumename> (and . to /backup/self; use named
         | volumes, not binds) and scale 0, plus a backup.sh that's
         | essentially:
         | 
         | docker compose down && docker compose run backup && docker
         | compose up -d
         | 
         | The restore procedure is the same: you restore the compose file
         | through restic on the host and then `docker compose run backup
         | restic restore latest --exclude "/data/self/*" --target /`
         | 
         | I find it's fast enough because restic is incremental, but if
         | you can set this up on a filesystem with snapshots that would
         | be a great option too.
         | 
         | Restic takes a bit of fiddling around too. I mount a prepared
         | ssh config, a known hosts file and a private key.
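A minimal Python sketch of the stop / back up / restart sequence from the comment above, assuming the scale-0 restic service is named `backup`. The command runner is injected so the order can be checked without Docker. One deliberate difference from the `&&` chain: this version runs `docker compose up -d` even if the backup step fails, so a broken backup doesn't leave the services down.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list; the default one actually executes it.
Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Execute a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def compose_backup(run: Runner = run_checked, service: str = "backup") -> None:
    """Stop the stack, run the one-shot backup service, then restart the stack.

    `service` is assumed to be the scale-0 restic service from the compose file.
    """
    run(["docker", "compose", "down"])
    try:
        run(["docker", "compose", "run", service])
    finally:
        # Unlike `a && b && c`, restart even when the backup step raised.
        run(["docker", "compose", "up", "-d"])
```

With the default runner, `compose_backup()` re-raises the backup's `CalledProcessError` only after the stack is back up.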
        
         | andix wrote:
         | So far I've only backed up some databases with a pg_dump
         | one-liner triggered from a cron job on the Docker host (via
         | docker exec or docker run --rm). No idea how this scales for big
         | databases, but for your regular home server with <10 GB
         | databases this should just work.
        
         | syntaxing wrote:
         | I used vackup [1], which has been deprecated but still works for
         | me. However, you still need to turn off the container temporarily.
         | 
         | [1] https://github.com/BretFisher/docker-vackup
        
         | jhot wrote:
         | docker-compose --env-file .env exec postgres /usr/bin/pg_dump
         | -U postgres "$db_name" | gzip -9 >
         | "$BACKUP_ROOT/postgres/${NOW}.${db_name}.sql.gz"
        
         | poorlyknit wrote:
         | pg_dump [0] (or pg_dumpall, linked there) sounds like what you
         | want to use. You could docker exec into the postgres container,
         | then copy the dump from the volume to your backup location on
         | the host.
         | 
         | A bit more involved than copying the volume, but you don't need
         | to shut down the server. There are probably some scripts out
         | there for doing this in a structured way, but I usually do it
         | more or less manually or with a bash script.
         | 
         | [0]: https://www.postgresql.org/docs/current/app-pgdump.html
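A minimal Python sketch of the no-downtime `pg_dump` approach, mirroring the one-liner above: the dump is streamed out of the running container and gzipped on the host. The service name `postgres`, the user `postgres`, and the timestamp format are assumptions; `-T` tells `docker compose exec` not to allocate a pseudo-TTY, which keeps the piped output clean.

```python
import datetime as dt
import gzip
import shutil
import subprocess
from pathlib import Path
from typing import List


def dump_command(db_name: str, service: str = "postgres", user: str = "postgres") -> List[str]:
    """Build a pg_dump invocation that runs inside the (still running) compose service."""
    # -T disables pseudo-TTY allocation so the dump streams cleanly to our stdout.
    return ["docker", "compose", "exec", "-T", service, "pg_dump", "-U", user, db_name]


def backup_path(root: Path, db_name: str, now: dt.datetime) -> Path:
    """Naming scheme like the one-liner: <root>/postgres/<timestamp>.<db>.sql.gz"""
    stamp = now.strftime("%Y%m%dT%H%M%S")  # assumed format for ${NOW}
    return root / "postgres" / f"{stamp}.{db_name}.sql.gz"


def backup(db_name: str, root: Path) -> Path:
    """Stream pg_dump output through gzip -9 into the backup directory."""
    target = backup_path(root, db_name, dt.datetime.now())
    target.parent.mkdir(parents=True, exist_ok=True)
    with subprocess.Popen(dump_command(db_name), stdout=subprocess.PIPE) as proc:
        if proc.stdout is None:
            raise RuntimeError("no stdout pipe from pg_dump")
        with gzip.open(target, "wb", compresslevel=9) as out:
            shutil.copyfileobj(proc.stdout, out)
    # Popen's context manager waits for the process, so returncode is set here.
    if proc.returncode != 0:
        target.unlink(missing_ok=True)  # don't keep a truncated dump around
        raise RuntimeError(f"pg_dump exited with status {proc.returncode}")
    return target
```

Run it from cron on the Docker host, e.g. `backup("paperless", Path("/srv/backups"))`; restoring is a matter of feeding the unzipped SQL to `psql`.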
        
       | kstrauser wrote:
       | This is nifty, but seems to lack the one thing that keeps me
       | coming back to DEVONthink: a learning classifier.
       | 
       | With DT, say you've scanned or saved 20 docs to your inbox and
       | you want to sort them to their long-term homes. DT will suggest
       | folders based on how closely the new file matches the contents of
       | those folders. It has the UI equivalent of "this looks like 2023
       | state taxes. Is it? This looks like kid #2's school stuff. Is it?
       | This looks like the older dog's veterinarian records. Is it?"
       | 
       | That's so, so nice.
       | 
       | Lately, as an experiment, I've been playing with organizing my
       | docs with Johnny Decimal, then using the Hazel app to sort known
       | docs with fixed structures (think bank statements and the like)
       | into the right folders. My ScanSnap scanner's software does OCR,
       | so by the time docs land in the inbox folder, they're ready for
       | automated processing. It's working pretty well so far, and I may
       | stick with it.
       | 
       | But if I _were_ to go back to an app, it would be DEVONthink or
       | something with most of its features. That classifier is too darn
       | nice, plus its smart rules, plus its scriptability, plus multi-
       | device sync, plus Markdown notes with wiki links to stored docs,
       | plus a thousand other niceties.
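The "this looks like X, is it?" behaviour can be approximated with a simple similarity ranking. The Python sketch below is not DEVONthink's (or Paperless-ngx's) actual algorithm; it just treats each folder as the summed word counts of the documents already filed there and ranks folders by cosine similarity to the new document's OCR text. The folder names in the test are made up.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> Counter:
    """Lower-cased word counts for a document's OCR text."""
    return Counter(TOKEN.findall(text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class FolderSuggester:
    """Suggest folders for a new document by similarity to each folder's contents."""

    def __init__(self) -> None:
        self.profiles: Dict[str, Counter] = {}

    def add(self, folder: str, text: str) -> None:
        # A folder's profile is the summed word counts of everything filed there,
        # so the more documents a folder holds, the better its profile gets.
        self.profiles.setdefault(folder, Counter()).update(tokens(text))

    def suggest(self, text: str, top: int = 3) -> List[Tuple[str, float]]:
        """Return up to `top` (folder, score) pairs, best match first."""
        doc = tokens(text)
        scored = [(folder, cosine(doc, profile)) for folder, profile in self.profiles.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top]
```

A real classifier would at least down-weight common words (e.g. TF-IDF), but the "file it, and it gets better" loop is the same.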
        
         | pydry wrote:
         | I thought I wanted this originally when I first started going
         | paperless but I quickly realized that as long as I OCR
         | everything and throw it in a pile I can easily grep for "state
         | taxes" and 2023.
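The "throw it in a pile and grep" approach boils down to an AND search over the OCR text. A minimal Python sketch, assuming each scan has a plain-text sidecar (`*.txt`) holding its OCR output; it does roughly what chaining case-insensitive `grep -l` calls would do.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def load_texts(folder: Path) -> Dict[str, str]:
    """Read OCR text sidecars (assumed to be *.txt files) into memory."""
    return {p.name: p.read_text(errors="ignore") for p in sorted(folder.glob("*.txt"))}


def search(texts: Dict[str, str], terms: Iterable[str]) -> List[str]:
    """Return the documents whose text contains every term, case-insensitively."""
    wanted = [t.lower() for t in terms]
    return [name for name, body in texts.items() if all(t in body.lower() for t in wanted)]
```

For example, `search(load_texts(Path("scans")), ["state taxes", "2023"])` lists every sidecar mentioning both.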
        
         | lolinder wrote:
         | Paperless has this--when I upload a new file it will attempt to
         | categorize it automatically using my existing tags. The more
         | items I put in each tag the better it gets at categorizing
         | them, so it definitely seems to be learning somehow, though I'm
         | not sure of the details of how it works.
         | 
         | I've never used DT, so it's possible that their system is
         | substantially better in some way.
        
         | Xerox9213 wrote:
         | Paperless uses tags and will auto tag based on previous scans.
         | IME it works very well (as long as you have a decently sized
         | library of tagged documents) and seldom do I have to add my own
         | tags. It's not perfect, though, and sometimes I have to go in
         | and fix some of the tags.
         | 
         | https://docs.paperless-ngx.com/advanced_usage/
        
           | kstrauser wrote:
           | Oh! Looks like I was wrong. Nice!
           | 
           | I'd still miss DT's zillion other things I've used over the
           | years, but that one would have been a dealbreaker.
        
             | vr46 wrote:
             | Previous conversation also here:
             | https://news.ycombinator.com/item?id=37521492
        
             | midnitewarrior wrote:
             | From what I can tell, DT is only on Mac, and not open
             | source. If the company goes under, good luck.
        
               | steve1977 wrote:
               | You can always export the files, and you could also access
               | them directly in the application's document database if
               | needed.
        
       | dgrabla wrote:
       | My Paperless-ngx listening on a network share + a Brother ADS-2800W
       | are key to staying sane. My only complaint is that it is resource
       | hungry. If I allocate less than 2 GB of RAM to the paperless VM, it
       | does not work as it should.
        
         | petepete wrote:
         | I have this exact setup but with the ADS-4300N. I'm new to it
         | and it's still a novelty.
         | 
         | My only complaint is I've had the odd letter get scanned upside
         | down and there's no way to rotate pages in Paperless-ngx.
        
       | jplunien wrote:
       | https://apps.apple.com/app/id6464425056
       | 
       | Just recently started working on an iOS/macOS app for it. Hope
       | you like it!
        
         | Obscurity4340 wrote:
         | How would you compare this to something like DevonThink, out of
         | curiosity?
        
         | bketelsen wrote:
         | Nice, looks like you're headed in a good direction with this!
        
         | apfsx wrote:
         | This is great, nice work.
        
       | ipsi wrote:
       | I've spun up a copy of this recently (within the last month) and
       | it's already proving helpful.
       | 
       | I've purchased a new-build home in Germany, and I'm currently in
       | the stage between "purchased" and "ready for move-in," and if
       | you've ever purchased a Neubau (new build) in Germany, you know how much
       | paperwork is involved - I get so many documents over email, many
       | of which are scanned (to preserve the wet signature and stamps),
       | and some of which I need to copy into a translator, that this is
       | incredibly helpful. It checks my email, grabs PDFs, straightens
       | them, OCRs them, adds a correspondent, tags them, and makes them
       | available through a web UI.
       | 
       | I also appreciate the full-text search (for all that it might
       | struggle if I had tens of thousands of documents) as I've had to
       | go and try to find particular documents where the name of the
       | document I've received might be a synonym for what the other
       | person is asking for, but the word they're asking for is at least
       | used in the text.
       | 
       | I'll also set it up to pull documents from my NAS, where the
       | scanner writes them, as I also receive a number of documents
       | via post (that I also occasionally need to translate or
       | copy/paste from).
       | 
       | There are also some limitations that annoy me:
       | 
       | * I really wish the email filters were more flexible. Right now
       | I need three filters, one for PDFs, one for JPEGs, and one for
       | PNGs, when I'd rather just set a regex for the attachment name.
       | This one annoys me enough that if I ever have time I'd look at
       | doing a PR for it (assuming the filtering is done locally and
       | not on the IMAP server).
       | 
       | * I'd also like to be able to set up rules to tag documents
       | based on the email domain (e.g., house-builders get tagged as
       | "house-builder, house") without having to manage a gigantic
       | explosion of rules. In theory the ML should handle that, but...
       | I'm mistrustful of ML. We'll see in a few months if I was too
       | hasty in my judgement or not.
       | 
       | * I'd like to retain slightly more information about the
       | correspondent, like both name and email address (there's no
       | consistency about who has their From line as "Name <email>" and
       | who's just "email", even within the same company), both for
       | de-duplication of correspondents and domain-based searching.
       | 
       | * I wish I could share documents more easily than downloading
       | them and re-uploading them to my email client (or mounting the
       | folders and trying to find the right document, but that has its
       | own set of problems). This is one of those problems that's
       | really easy to state, but potentially quite difficult to
       | actually implement: could a web application add a PDF to the
       | clipboard in such a way that Gmail, say, would understand what
       | was happening and add it as an attachment when pasted?
       | 
       | Overall though, I'm pretty happy with it, and finding it useful
       | so quickly was somewhat surprising.
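A Python sketch of the three email-filter wishes above, done outside Paperless: one case-insensitive regex instead of three per-type filters, `email.utils.parseaddr` to normalise both `Name <email>` and bare `email` From lines, and a single domain-to-tags table that also matches subdomains. The domain and tag names are hypothetical, and none of this is an existing Paperless-ngx setting.

```python
import re
from email.utils import parseaddr
from typing import Dict, List, Set, Tuple

# One pattern instead of separate PDF/JPEG/PNG filters (hypothetical rule).
ATTACHMENT_PATTERN = re.compile(r"\.(pdf|jpe?g|png)$", re.IGNORECASE)

# Hypothetical rules: one entry per sender domain, each granting several tags.
DOMAIN_TAGS: Dict[str, Set[str]] = {
    "example-builder.de": {"house-builder", "house"},
}


def wanted_attachments(filenames: List[str]) -> List[str]:
    """Keep only attachments whose names match the single regex."""
    return [name for name in filenames if ATTACHMENT_PATTERN.search(name)]


def correspondent(from_header: str) -> Tuple[str, str, str]:
    """Normalise 'Name <addr>' and bare 'addr' From lines to (name, address, domain)."""
    name, addr = parseaddr(from_header)
    addr = addr.lower()
    domain = addr.rpartition("@")[2]
    return name, addr, domain


def tags_for(from_header: str) -> Set[str]:
    """Collect tags for the sender's domain or any parent domain."""
    _, _, domain = correspondent(from_header)
    labels = domain.split(".")
    tags: Set[str] = set()
    # e.g. mail.example-builder.de -> try it, then example-builder.de
    for i in range(len(labels) - 1):
        tags |= DOMAIN_TAGS.get(".".join(labels[i:]), set())
    return tags
```

Keeping the lower-cased address as the correspondent key is what makes de-duplication and domain searches work regardless of how each sender formats the From line.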
        
       | jdoss wrote:
       | If you are looking to quickly set up Paperless-NGX, check out my
       | little side project https://github.com/jdoss/ppngx. It will set up
       | everything you need to run Paperless-NGX (PostgreSQL, Redis,
       | Tika, Gotenberg, PaperlessNGX, and SFTPGo) inside a Podman Pod on
       | a Linux based system. You can optionally set it up to start on
       | boot via systemd.
       | 
       | I run this locally on my workstation and send PDFs many times a
       | week from a Brother ADS-2800W scanner via SFTP. Paperless NGX has
       | reduced my home office paper piles to almost zero. It is a
       | fantastic open source project and I am very thankful it exists.
        
         | wolverine876 wrote:
         | > everything you need to run Paperless-NGX (PostgreSQL, Redis,
         | Tika, Gotenberg, PaperlessNGX, and SFTPGo)
         | 
         | That is a lot of dependency. How stable is Paperless with all
         | those applications making uncoordinated changes on their own
         | schedules?
        
           | darnir wrote:
           | The only hard dependencies are Redis and Postgres. The
           | official stance is to run them from the provided docker
           | compose file, and the paperless-ngx container itself is kept
           | updated and working with the stable containers of Redis and
           | Postgres.
           | 
           | Tika and Gotenberg are optional extras for parsing and
           | converting MS Office documents to PDF. They're not necessary,
           | and I don't use them in my setup at all. Same with SFTPGo; I'm
           | not sure of its use case, but paperless doesn't directly
           | depend on it in any way.
        
         | traverseda wrote:
         | Why would you want to use this over one of the official docker
         | compose setups? https://github.com/paperless-ngx/paperless-
         | ngx/blob/main/doc...
         | 
         | They will also automatically launch if you have docker running
         | at boot. Is it just because you prefer redhat/IBM's docker
         | equivalent stack to the much more common and cross platform
         | docker install?
        
           | jdoss wrote:
           | I don't use Docker at all on any of my infra or workstations.
           | That's why I made this.
        
             | traverseda wrote:
             | Alright, but you've sort of re-invented docker compose
             | there, but as a shell script. These days docker compose
             | even works with podman if you really prefer IBM's docker
             | implementation to the original.
        
               | efrecon wrote:
               | Well... Maybe re-inventing was part of the fun or a
               | learning experience. If you want, there is even this:
               | https://github.com/Mitigram/docker-compose-build
        
           | abacate wrote:
           | I would want this over docker and docker-compose any day.
           | 
           | I've been using docker compose in production for a couple of
           | years now and it adds another layer on top of systemd that is
           | a continuous source of headache, especially during updates.
           | 
           | Podman gets it right: no central daemon, can automatically
           | generate systemd services for a whole pod. Updates are
           | seamless.
           | 
           | This by itself is enough of a reason to me.
        
       | growingkittens wrote:
       | Paperless-NGX doesn't have document version history,
       | unfortunately.
       | 
       | Right now I am looking at OpenProDoc [1] and bitfarm-archiv [2]
       | as document management possibilities.
       | 
       | [1] http://jhierrot.github.io/openprodoc/Spec_EN.html
       | 
       | [2] https://www.bitfarm-archiv.com/document-
       | management/features....
        
         | lobochrome wrote:
         | I am just rcloning my paperless-ngx document volume to S3
         | Glacier Deep Archive every night for this.
         | 
         | It's a bit "scary" since even documents I delete in paperless-
         | ngx are thus preserved forever, but it may come in handy
         | someday.
        
       | andix wrote:
       | I've been looking for a suitable document management system for
       | a while. There is one feature I would like to have that I haven't
       | found anywhere, except maybe in $$$ enterprise systems:
       | 
       | I want to add custom metadata to documents by
       | categories/tags/folders, for example like this:
       |     Invoice { issued: date, invoiceNumber: string, amount: number, due: date }
       |     Contract { validFrom: date, renewsAt: date, autoRenew: boolean }
       | 
       | When adding a tag like this, it should either automatically fetch
       | this information from the document's content (probably very hard)
       | or give you a manual workflow to type it into a form while
       | showing the document next to it, maybe just by selecting the text
       | in the PDF.
       | 
       | In the folder list and in the search you would be able to add
       | that metadata as columns, sort by value, or run queries
       | (tag:invoice AND invoice.amount > 1000).
       | 
       | Edit: this feature seems to be one of the most upvoted feature
       | requests for paperless https://github.com/paperless-
       | ngx/paperless-ngx/discussions/1...
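A Python sketch of the per-tag metadata idea: each tag carries a typed schema, values are validated when attached, and a query is a tag filter plus a predicate on that tag's fields. The `Document` class, the `SCHEMAS` table, and the `query` helper are hypothetical, and `number` from the comment is modelled as `float`.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Callable, Dict, List

# Per-tag metadata schemas from the comment (field name -> Python type).
SCHEMAS: Dict[str, Dict[str, type]] = {
    "invoice": {"issued": date, "invoiceNumber": str, "amount": float, "due": date},
    "contract": {"validFrom": date, "renewsAt": date, "autoRenew": bool},
}


@dataclass
class Document:
    title: str
    tags: List[str]
    meta: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def set_meta(self, tag: str, **values: Any) -> None:
        """Attach typed metadata for a tag, rejecting unknown fields or wrong types."""
        schema = SCHEMAS[tag]
        for key, value in values.items():
            if key not in schema:
                raise KeyError(f"{tag} has no field {key!r}")
            if not isinstance(value, schema[key]):
                raise TypeError(f"{tag}.{key} must be {schema[key].__name__}")
        # Only reached if every value validated, so a bad call changes nothing.
        self.meta.setdefault(tag, {}).update(values)


def query(
    docs: List[Document], tag: str, predicate: Callable[[Dict[str, Any]], bool]
) -> List[Document]:
    """Rough equivalent of `tag:<tag> AND <predicate on that tag's fields>`."""
    return [d for d in docs if tag in d.tags and predicate(d.meta.get(tag, {}))]
```

`tag:invoice AND invoice.amount > 1000` becomes `query(docs, "invoice", lambda m: m.get("amount", 0) > 1000)`, and sorting by value is `sorted(hits, key=lambda d: d.meta["invoice"]["amount"])`.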
        
       | jamala1 wrote:
       | Is Paperless suitable for business use, say, for a smallish
       | company with 25 employees and 1000 customers? I think in my EU
       | country such systems need to fulfill certain requirements, like
       | versioning/tracking of changes.
        
       | ephimetheus wrote:
       | Shameless plug: I recently released a native app for iOS that
       | connects to Paperless-ngx:
       | 
       | https://apps.apple.com/de/app/swift-paperless/id6448698521
        
       | petergrace wrote:
       | I use MayanEDMS personally, and have for the past five or so
       | years. It's complex but does what it says on the tin.
       | 
       | https://www.mayan-edms.com/
        
         | growingkittens wrote:
         | Mayan EDMS recently moved a lot of basic documentation behind a
         | subscription paywall.
        
       | saintradon wrote:
       | I tinkered with this a few weeks ago. Pleasantly surprised with
       | its capabilities.
        
       | lwhi wrote:
       | This is very interesting to me.
       | 
       | I'd love it if I could also use my mobile devices to bring up
       | paper docs instantly (mobile phone, tablet, kindle).
        
         | lobochrome wrote:
         | There is even a nice OSS Swift app in the App Store now. It's v1,
         | but it looks nice and is fast and simple.
         | 
         | https://apps.apple.com/app/id6448698521
        
           | ephimetheus wrote:
           | I made that! Glad you like it!
        
         | diarrhea wrote:
         | Easily possible. Paperless-ngx works great on mobile as well. I
         | have WireGuard on my phone and connect that way, then simply
         | use a mobile browser, no app needed.
        
           | lwhi wrote:
           | Nice!
        
         | LeSaucy wrote:
         | It's not free/OSS, and it's Apple-only, but DEVONthink does a
         | fantastic job of this, and supports storing all of your
         | documents in a WebDAV store which you can host yourself. It
         | uses ABBYY FineReader for OCR, which I have found to provide
         | better results than TensorFlow-based OCR.
        
           | rufugee wrote:
           | I've been using DEVONThink for just this for a few years, and
           | it's very good at it. However, it's macOS only and has far
           | more features than I need (simple searching, tagging, and
           | organization). I tried paperless a year ago and the search
           | and rendering was far too slow, and many docs just gave
           | obscure errors. Perhaps it's time to give it another shot.
           | I'd love to have something on Linux that could handle my
           | large repository of documents.
        
       | kristofferR wrote:
       | Is this in reality a German cry for help, disguised as tech talk?
       | 
       | Germany is one of the least digitized countries in Europe, and
       | with the digitalization budget recently cut by 99%, it seems like
       | they still need to use paper in their lives, and it's not going
       | to improve soon.
       | 
       | This feels so incredibly archaic to me as a Norwegian, I would
       | have to print out documents to have anything to fill paperless-
       | ngx with.
        
         | _frkl wrote:
         | You can just use your digital documents directly, and augment
         | them with the few paper receipts that you might (or might not)
         | still have to deal with. The main selling point is really
         | document management (to me, anyway), the 'branding focus' on
         | physical documents is probably a little misleading.
        
         | greenicon wrote:
         | You can easily use this for digital documents as well. The only
         | difference in my setup is a tag showing whether the document id
         | maps to a physical document in a binder or not.
        
         | diarrhea wrote:
         | I track, using tags, whether a document is a scan or properly
         | digital. The pendulum is strongly in favor of the latter: I use
         | this tool a ton for natively digital documents as well.
         | Invoices, contracts, tickets etc. all come in as PDFs anyway,
         | luckily. I have all that knowledge at the tip of my fingers.
         | Yes, some of those documents are scans and used to be physical
         | paper, but that's beside the point.
        
       | rayshan wrote:
       | Genuine question: for simple needs, why use this or DevonThink
       | over macOS' built-in features? macOS now does OCR (Live Text),
       | has tagging, and spotlight search is fast (but sometimes presents
       | too many results to be useful). I even stopped splitting PDFs
       | into separate documents and organizing them into folders. I just
       | search.
        
         | acka wrote:
         | Obvious answer: because, contrary to popular belief, not
         | everyone uses macOS.
        
         | phodo wrote:
         | Does auto OCR work on iCloud files? For example: I ScanSnap a
         | huge collection of documents to a folder that is on iCloud
         | (synced with my desktop). It works great because it is so simple.
         | However, if I have, say, a PDF document, will the Mac OCR
         | functionality perform the OCR if the doc is on iCloud and will
         | I then be able to search for the text in that doc via spotlight
         | / finder ? I tested this a few years ago and the search on
         | content inside scanned PDFs did not work. I had looked at
         | Paperless but decided to stay on Mac os file system.
        
         | darkteflon wrote:
         | Yeah. I had a Devonthink-based setup but after one too many
         | database corruptions I threw in the towel. Now I just OCR scan
         | everything into a few macOS folders and search using HoudahSpot
         | (Spotlight, I found, was not suitable for fine-grained search).
         | I'm very happy with the setup.
        
         | ndsipa_pomu wrote:
         | This is more designed for a self hosted server, so if you want
         | multi-device web access then it's a great solution. I can
         | download a PDF on my android phone and upload it to my
         | paperless-ngx instance in a couple of clicks and easily edit
         | the tags as necessary. It's great for travelling as you're not
         | reliant on having a locally installed application on your
         | chosen device with you, and of course it would still be
         | available if you lost your main device and only had your phone
         | on you.
        
         | LVB wrote:
         | I used to be the target audience and really enjoyed having my
         | system just right, sorting and tagging everything, etc. But
         | over the years I realized that I wasn't really benefiting much,
         | and gave SwiftScan on my iPhone + dumping into an iCloud
         | folder a try. For my needs, this has worked fine. It is rare I
         | even need to refer to the scans, and the macOS OCR + automatic
         | dates usually let me find the doc quickly. In the worst case I
         | browse thumbnails.
        
       | aetherspawn wrote:
       | If anyone is looking for a fully-commercial version, we use
       | something like this -- it is called Hubdoc and it is free with
       | any Xero subscription.
       | 
       | I really really appreciate the work that went into paperless, but
       | for us the business risk of self-hosting this is far too high
       | because if we lose our docs we lose our tax proof.
        
       | xwowsersx wrote:
       | I wonder if people know about Google's Stacks app? I don't know
       | if it's as powerful as Paperless-Ngx, but it lets you organize
       | docs pretty easily and some of it is automatic. I have "stacks"
       | for insurance, id cards, receipts, medical records, etc. Whenever
       | I get paper mail, I snap a photo and immediately toss it. I can
       | then organize it in the Stacks app and easily pull it up later.
       | It's a pretty useful, easy solution IMO.
        
         | swader999 wrote:
         | Until they cancel it.
        
           | xwowsersx wrote:
           | True :(((
        
         | yunohn wrote:
         | I usually don't jump on the "Google cancels everything" train,
         | but do keep in mind that Stacks is a project from their Area
         | 120 incubator, which saw heavy layoffs [1]. It's not on the
         | remaining list, so it may have already been cancelled
         | internally and currently in the process of being shut down.
         | 
         | [1] https://techcrunch.com/2023/01/25/google-spares-three-
         | area-1...
        
         | Eddy_Viscosity2 wrote:
         | If it starts with 'google' then at best it's something you try
         | out then, if you like it, try and find that functionality in an
         | app made by someone else. Google will kill this app just when
         | you get fully invested. All google apps are traps and foot-
         | guns, especially the ones that work great.
        
           | xwowsersx wrote:
           | Probably right
        
           | navigate8310 wrote:
           | Definitely scary as it's under their incubator area120
        
         | hoppyhoppy2 wrote:
         | I can't get it on my, ahem, _Google_ Pixel device running
         | Android 13:
         | 
         | > _This app isn't available for your device because it was
         | made for an older version of Android._
        
           | dstroot wrote:
           | Also:
           | 
           | "Stack is only available on Android in the U.S. You can
           | install it through the Google Play store."
        
           | xwowsersx wrote:
           | That's weird. I'm using it right now on the Pixel 7 Pro
           | running Android 13.
        
       | jeleh wrote:
       | If you own a Synology NAS, I recommend having a look at synOCR:
       | 
       | https://github.com/geimist/synOCR/wiki
       | 
       | English translation: https://github-
       | com.translate.goog/geimist/synOCR?_x_tr_sl=au...
       | 
       | I've been using this for several years and it works great.
        
       | JW_00000 wrote:
       | What I don't really understand is: do people really have that
       | many physical documents that they need to keep track of, that
       | such a system is worth it? E.g. to file my taxes (in Belgium), I
       | think I only ever need a few (maybe even only 1 or 2) digital
       | documents. Or is this more a mentality thing? I know my parents
       | have folders and folders, e.g. my father kept all expense notes
       | from his work even after retirement... I throw everything away
       | once it's handled.
        
         | _frkl wrote:
         | Can't speak for physical documents in general, but personally I
         | really appreciate paperless-ngx for its general document
         | indexing/storage. Being able to scan and ocr physical documents
         | (usually using the camera on my mobile phone) is very nice, but
         | I mainly use it with pdfs that paperless automatically fetches,
         | ocrs (if necessary), and tags from my email inbox, or which I
         | copy into a specific local folder which gets synced with
         | paperless.
         | 
         | Getting all my invoices from last year to prepare taxes is now
         | just a simple query in the paperless UI, the result would be
         | about 95% digital and 5% physical documents, probably. Of
         | course I could do all that old-school using filesystem folders,
         | but having all my documents indexed and searchable in a single
         | place was definitely worth the (small) effort of setting it all
         | up and keeping it running.
        
           | kristofferR wrote:
           | I don't understand what you mean with prepare taxes.
           | 
           | I just add all purchases/sales right when they happen in my
           | accounting app and attach the invoice PDF. Then when I have
           | to file taxes, I export the correct numbers.
           | 
           | Are you doing your bookkeeping in Excel or something?
        
             | _frkl wrote:
             | This is just for my personal taxes, no accounting involved.
             | I just get all the relevant stuff together once a year. Of
             | course it's not 10s of 100s of documents, but still enough
             | so it would take me some time to get everything together
             | manually.
             | 
             | Also it was just meant as an example, paperless is
             | generally useful (to me) in situations where I need to
             | access somehow related documents, like traveling and such,
             | or searching my documents for some information. As I said,
             | there are other systems and ways to do this, but for me
             | this is the one that stuck.
        
         | NoboruWataya wrote:
         | I'm quite paranoid about throwing stuff away so for me it's at
         | least partly a mentality thing. I probably save a lot more than
         | I need, but it gives me peace of mind to know that it's all
         | there. There are some things that it is very helpful to have
         | easy access to, like utility bills and bank statements (which I
         | occasionally need for KYC stuff) or ID documents.
        
         | ipsi wrote:
         | Kinda - at the moment I'm receiving _a lot_ of documents,
         | mostly as PDFs via E-Mail (some the original digital version,
         | some scans of physical copies), but some via post as well.
         | 
         | I've only added documents I've received this year (plus a
         | couple of dozen documents going further back), and I've got
         | ~250 in there, with a total of ~2.5m words (although I think
         | word-count is a fuzzy concept in German).
         | 
         | I've posted a top level comment in more detail, but yeah, it's
         | helpful to me.
        
         | kstrauser wrote:
         | I guess it's partly a mentality thing for me. I've had numerous
         | cases of sadness that I couldn't produce a necessary document,
         | and gladness that I was able to pull up something presumed long
         | lost. For me, it's easier to save everything "just in case". It
         | all adds up to less than 50GB so it's not an enormous amount of
         | data to store by current standards.
         | 
         | Seriously, a couple cases of "sorry, I don't have proof to back
         | up that tax deduction" or "hey, here's the receipt proving that
         | our TV is still covered by warranty!" make it all worthwhile.
        
         | dividedbyzero wrote:
         | Definitely, Germany strongly believes that a document that
         | hasn't been a physical piece of paper at least once can't be
         | real. That makes for folders upon folders of documents and it's
         | actually worse than back in the 20th century because generating
         | and mailing documents has become way easier and cheaper, so
         | things that would have been a one-page typewritten letter back
         | then now are five ten-page ones full of automatically generated
         | crap. One lengthy illness in the family alone filled hundreds
         | of pages and it can be very hard to know what can be thrown
         | away at which point.
        
           | schlowmo wrote:
           | > Definitely, Germany strongly believes that a document that
           | hasn't been a physical piece of paper at least once can't be
           | real.
           | 
           | I'm sorry to tell you that's an oversimplification, and
           | especially for documenting expenses as a company/freelancer
           | it's kind of worse.
           | 
           | Last time I checked, if you want to follow the tax law to the
           | letter you're not allowed to change the medium:
           | 
           | If an invoice came as a paper copy (e.g. by snail mail), this
           | paper copy is the original. If you scan it the digital
           | version isn't.
           | 
           | If an invoice came as a digital document (e.g. a PDF by
           | email), this digital document is the original - a printed
           | version of that digital document isn't.
           | 
           | So if a tax inspector asks for "originals" it's technically
           | almost impossible to provide them in the sense of the law. Whether
           | a tax inspector would even care is another question.
        
             | germanier wrote:
             | It's perfectly legal (and common) for a decade now to scan
             | documents and destroy the paper original as long as you
             | follow some guidelines. Keyword is "ersetzendes Scannen".
             | 
             | And yes, they care about those rules and that you provide
             | "originals" according to that definition - in particular
             | that you didn't modify digital documents in any way. You
             | can (and should) comply with that and there are service
             | providers to help if you are too small to set that up
             | yourself.
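
One technical building block for showing that a digital document "wasn't modified in any way" is a checksum manifest recorded at archiving time. This is only a sketch of that idea. The legal requirements for "ersetzendes Scannen" or a "revisionssicher" archive go well beyond hashing files, and the function names here are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large scans don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that were modified or deleted since it was built.

    Files added after the manifest was built are not reported.
    """
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

The manifest should be stored outside the archived folder, and ideally somewhere the archive's users can't write to. Otherwise it can simply be regenerated after a change.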
        
               | schlowmo wrote:
               | Thanks, today I learned about "ersetzendes Scannen". I
               | just checked, and it's been allowed for exactly a decade (since
               | 2013), which coincidentally is the year when I started
               | working as a freelancer (and I have to care about such
               | rules).
               | 
               | I admit that my last paragraph was kind of hyperbole, but
               | I never heard (at least from other freelancers) of a tax
               | inspector who wasn't happy with either everything
               | printed or everything digital. I guess they really start
               | to care if they suspect something fishy.
        
               | noAnswer wrote:
               | Another search/keyword is "Revisionssicher". If your
               | storage/software has that, you're good to go.
        
             | greenicon wrote:
             | Just a side note to this and the other replies: You can
             | also keep the original documents and add scans to paperless
             | for indexing, etc. Since I switched to paperless I keep my
             | originals in binders just ordered by the paperless id, so I
             | can retrieve the original when required.
        
           | ipsi wrote:
           | Yeah, I'm also in Germany (although not German) and installed
           | Paperless because of this!
           | 
           | I think more than a few of these projects are started and/or
           | maintained by Germans due to the astonishing number of
           | documents received - e.g., paperless-ng appears to have been
           | done by a German, although neither the original Paperless nor
           | Paperless NGX immediately appear to be.
        
         | esafak wrote:
         | I would be in favor of not scanning them, forgetting about
         | them, then throwing them away when I eventually see them again
         | and deciding I did not miss them.
        
         | whateveracct wrote:
         | It's nice to throw papers away without worrying about it. Or to
         | archive instruction manuals for stuff I own - paperless is the
         | first place I look (its search is nice).
        
         | krupan wrote:
         | In my experience, no, you don't need this. The few things I
         | keep just go in folders named for the year under my Documents
         | folder, and they are given descriptive filenames like
         | paystub-2022-10-15.pdf, or companyA-w-2.pdf. In the rare cases
         | where I need to go back to those (like for a loan application
         | or doing taxes) it's easy enough to find them.
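
The year-folder scheme above is easy to automate when the date is already in the filename. This is a minimal sketch. It assumes dates are written as YYYY-MM-DD, as in `paystub-2022-10-15.pdf`. Files without such a date, like `companyA-w-2.pdf`, fall back to a year you supply. The function names are hypothetical.

```python
import re
import shutil
from pathlib import Path

# First YYYY-MM-DD in a filename; only the year is captured.
DATE_IN_NAME = re.compile(r"(\d{4})-\d{2}-\d{2}")


def year_folder_for(filename: str, fallback_year: int) -> str:
    """Use the year of the first YYYY-MM-DD date in the name, else the fallback."""
    m = DATE_IN_NAME.search(filename)
    return m.group(1) if m else str(fallback_year)


def file_by_year(src: Path, documents_root: Path, fallback_year: int) -> Path:
    """Move src into documents_root/<year>/, creating the folder if needed."""
    target_dir = documents_root / year_folder_for(src.name, fallback_year)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(target_dir / src.name)))
```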
        
         | faiD9Eet wrote:
         | You are right, you do not want to look up documents that old, it
         | is a waste of time... ... unless you are a German and the state
         | asks for your time sheets three years in the past because
         | you've gotten child support and are requested to prove your
         | working hours. ... unless you happen to have an accident and
         | your insurance is fighting with another insurance who's gonna
         | pay and they ask you about the incident two years later ...
         | unless you end up in a contract fight with the postal operator,
         | that can take a year of mailing before being settled.
         | 
         | Some correspondences take years and only add a mailing every
         | few months. You would like to have a thread-like view -- as in
         | an electronic mail. That is the strength of document management
         | systems.
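
The "thread-like view" amounts to grouping by sender and sorting by date. Paperless approximates it by filtering on a correspondent. The sketch below uses hypothetical records and names purely for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Letter(NamedTuple):
    correspondent: str
    received: date
    title: str


def threads(letters: list[Letter]) -> dict[str, list[Letter]]:
    """Group letters by correspondent, each thread in chronological order."""
    grouped: dict[str, list[Letter]] = defaultdict(list)
    for letter in letters:
        grouped[letter.correspondent].append(letter)
    return {who: sorted(items, key=lambda item: item.received) for who, items in grouped.items()}


letters = [
    Letter("Insurer A", date(2023, 5, 1), "Request for statement"),
    Letter("Postal operator", date(2022, 11, 3), "Contract termination"),
    Letter("Insurer A", date(2021, 7, 9), "Claim received"),
]
for who, items in threads(letters).items():
    print(who, [item.title for item in items])
```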
        
         | djbusby wrote:
         | I have a small business in USA. For federal business taxes I
         | need 6-7 documents. Then that process creates other documents I
         | need for personal taxes, which also requires 6-8 more
         | documents. So I'm at roughly 20 important documents per year for
         | federal taxes. Nexus in 3 states adds more, and I save them all
         | for 7 years.
         | 
         | The other end of the spectrum in USA is filing with the
         | 1040-EZ which is like a 3-4 document process.
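
A "keep them for 7 years" rule can be applied mechanically to decide what may be thrown away. This is a minimal sketch. It assumes the clock starts at the end of the tax year and that `keep_years` is a single number. Actual retention periods differ by country and by document type, so both are assumptions to check.

```python
from datetime import date


def can_discard(tax_year: int, today: date, keep_years: int = 7) -> bool:
    """True once `keep_years` full calendar years have passed after `tax_year`.

    Counting from the end of the tax year is an assumption; some rules count
    from the filing date or from the end of the year the return was filed.
    """
    return today >= date(tax_year + 1 + keep_years, 1, 1)


def discardable(years: list[int], today: date, keep_years: int = 7) -> list[int]:
    """Return the tax years whose documents are past the retention period."""
    return [y for y in sorted(years) if can_discard(y, today, keep_years)]
```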
        
         | catlover76 wrote:
         | Seriously. People in this thread are describing some setups
         | that momentarily seem cool in theory, but are almost certainly
         | overkill for personal use.
        
           | whateveracct wrote:
           | Luckily, running paperless-ngx on my NixOS desktop is
           | trivial. And it was also trivial to make it accessible over
           | an avahi name on my local network. So it was kind of a "why
           | not" sort of thing.
        
         | yunohn wrote:
         | In the Netherlands, government bodies are regularly pushing
         | everything they can to a digital inbox - which I vastly prefer.
         | My simple, single-employer yearly income tax is all pre-
         | calculated. Further, deductions for mortgage interest,
         | healthcare, studies, etc are all pre-filled as much as
         | possible. I think you only need to upload documents for
         | complicated situations or audits?
         | 
         | Of course, I still quickly download my year-end
         | bank/salary/mortgage statements and cross-verify the tax
         | department's numbers. The whole process takes at most a few
         | hours.
         | 
         | IME Germany has significantly more hard-copy requirements.
        
           | t0mas88 wrote:
           | You never need to upload the documents in the Netherlands,
           | their software doesn't have such an option.
           | 
           | But technically you're expected to keep the documents at
           | least until you receive the "definitieve aanslag" and if
           | you're nitpicking I think there is a 7 year term for the tax
           | services to come back on your filed taxes and change things
           | or demand proof.
           | 
           | Practically that doesn't happen if you accepted their pre-
           | filled numbers and they match your employer's. But if you're a
           | freelancer or other non-standard case I would keep digital
           | copies for a few years just to be sure.
        
             | yunohn wrote:
             | > You never need to upload the documents in the
             | Netherlands, their software doesn't have such an option.
             | 
             | Ah, interesting. I just assumed my situation never
             | triggered it.
        
         | t0mas88 wrote:
         | Depends on your tax situation. For my private taxes it's maybe
         | 3 or 4 documents and those from the bank etc have all gone PDF
         | anyway.
         | 
         | But when I was a freelancer I used a document scanning system
         | provided by my bookkeeper. It worked similarly to this open
         | source thing: scan to PDF, automatic OCR and classification.
         | I needed it because many invoices still arrived on paper, and
         | for restaurant receipts etc. I usually took a picture to
         | upload.
        
         | kristofferR wrote:
         | In some countries like Germany, the government still
         | communicates with its citizens by snail mail. Important
         | documents are usually physical there. They are one of the least
         | developed countries in Europe with digitalization, they are far
         | behind.
        
           | [deleted]
        
         | Macha wrote:
         | So here's an example where it came in useful to have back
         | documents:
         | 
         | I recently purchased a house. As part of the process, I needed
         | to apply for a mortgage. The bank wanted a statement from my
         | employer about my income from them, along with my last 2
         | complete years tax documents.
         | 
         | The bank had an inquiry. My employer had said my salary + bonus
         | was X, but in the first of these two years, my tax documents
         | said my income from my employer that year was 2.5X. The extra
         | 1.5X was due to the employer being bought out and some change
         | of control terms in the RSUs causing immediate payout of what
         | would normally have been paid out over 4 years. Since I kept
         | the documents of the RSU terms and the payslips, I could
         | provide these to the bank to clear the matter up.
         | 
         | Notably, had I not kept my own copy of these documents, I could
         | not have gone back to my employer for new copies. Due to the
         | change of control, they had changed payroll vendors, and had
         | eventually terminated the contract with the old vendor, so I
         | could not have gotten a payslip from 1.5 years ago. Similarly,
         | in the move to the new owner's HR system, the company had lost
         | many of their records of agreements with employees, including
         | contracts etc., so it's not clear they would still have the
         | terms of the RSUs, especially since the change of control
         | payout rendered this a "completed" transaction. And later
         | events made it clear that they did not have, e.g. a copy of my
         | employment contract.
         | 
         | Similarly, if I ever had had a dispute over the terms of those
         | contracts - if I hadn't kept a copy of the contract, and the
         | company definitely hadn't kept theirs, any dispute would have
         | been my word against theirs.
        
           | iamwpj wrote:
           | Companies are legally required to keep payroll records for
           | multiple years (depends on where you live, though I doubt
           | most places are less than 3-4). This is ok advice, but these
           | systems don't just work like this. If you didn't have the
           | documentation the bank would likely take your approved tax
           | filings as evidence and move on with their day.
           | 
           | In a real contract dispute your copy of a contract from your
           | documents isn't notably different in the eyes of the court
           | than one from your employer. They're both notarized and if
           | there's a dispute between them there are established
           | processes. Aside from titles and the like, historical filing
           | ownership is typically relegated to the document originators.
        
         | viraptor wrote:
         | It's not just for physical documents. I have payslips which may
         | be useful in the future, but would be really hard to
         | recover when I leave the company. Any invoices which come to my
         | email. Any bank documents which exist in a vaguely named
         | "account updates" email. And many other things which could be
         | possible to find in the future, but are much better in
         | paperless with appropriate tags and OCR.
         | 
         | But yeah, then there are for example the bank account contract
         | updates which come by physical mail only.
         | 
         | > expense notes (...) I throw everything away once it's
         | handled.
         | 
         | Don't know about your location, but I need to keep the tax
         | related documents for 5 years in case of an audit.
        
       | abbbi wrote:
       | using paperless for some months now and i really like it. Nice to
       | see the project got some new contributors and frequent releases.
        
       | noodlesUK wrote:
       | One thing that I've done that makes my paper handling process
       | much easier is have my printer/scanner point to a write only
       | samba share. Most HP printers support this. I wrote a short
       | script that looks for new files in there (with inotify), runs OCR
       | on them with OCRMyPDF and moves them to a different file share.
       | It means that my non-technical family members can just stick the
       | paper in the document feeder, and 20 seconds later, an OCRed copy
       | ends up on the family file share. You don't get the fancy tagging
       | and search that this provides, but file shares integrate natively
       | into all OSs, which is a huge perk.
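
The watch-and-OCR script described here can be sketched in a few lines. This version swaps inotify for simple polling to stay standard-library only, which is a deliberate substitution. It shells out to the OCRmyPDF command line (`ocrmypdf [options] input output`, with `--skip-text` to leave pages that already have text alone). The folder layout and function names are hypothetical. One caveat: polling can see a file the scanner is still writing. Waiting for inotify's close-write event is the usual way to avoid that. In this sketch a failed OCR run just leaves the file for the next pass.

```python
import subprocess
import time
from pathlib import Path

DEFAULT_OCR_CMD: tuple[str, ...] = ("ocrmypdf", "--skip-text")


def process_once(inbox: Path, outbox: Path, ocr_cmd: tuple[str, ...] = DEFAULT_OCR_CMD) -> list[Path]:
    """OCR every PDF currently in inbox into outbox, deleting each original on success."""
    done = []
    for src in sorted(inbox.glob("*.pdf")):
        dest = outbox / src.name
        result = subprocess.run([*ocr_cmd, str(src), str(dest)], capture_output=True)
        if result.returncode == 0:
            src.unlink()
            done.append(dest)
        # On failure the file stays in the inbox and is retried on the next pass.
    return done


def watch(inbox: Path, outbox: Path, interval: float = 5.0) -> None:
    """Poll the inbox forever; a stand-in for an inotify-based watcher."""
    while True:
        process_once(inbox, outbox)
        time.sleep(interval)
```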
        
         | manuc66 wrote:
         | People using HP printers with feature "Scan to Computer" are
         | also using https://github.com/manuc66/node-hp-scan-to to send
         | documents to Paperless-ngx:
         | https://www.reddit.com/r/selfhosted/comments/tethlr/hp_scan_...
        
         | doubled112 wrote:
         | I wanted to read the article, but it was incredibly twitchy on
         | my iPhone.
         | 
         | I scan into a Samba share that paperless-ngx picks up
         | automatically, OCRs, tags, and deletes.
         | 
         | A web application is pretty cross platform too, at this point.
         | 
         | Plus I can get to them on my phones with less trouble than a
         | share.
        
           | noodlesUK wrote:
           | Yeah, I was looking at the docs for this and it looks like a
           | somewhat more featureful version of what I've stuck together.
           | 
           | How does it handle when you have digital documents you want
           | to store (a la google drive or similar)?
        
         | djhworld wrote:
         | I've done something similar although I had to jump through a
         | few hoops to get it to work.
         | 
         | I have a Fujitsu ScanSnap which is one of those feed-through
         | scanners. I have it hooked up to a Raspberry Pi which listens
         | for the button press on the scanner. You press the button, the
         | paper feeds through the scanner and once it has finished the
         | scan a script runs to collate everything into a PDF and drops
         | the result onto a Samba share that's running on the box where
         | paperless-ngx is.
         | 
         | It's pretty neat and feels seamless. The worst part was dealing
         | with SANE and finding linux drivers for my scanner.
        
           | godsfshrmn wrote:
           | Do you have any other info on how to do this? I've looked for
           | this but cannot find how to do it.
        
           | alchemist1e9 wrote:
           | I don't understand the Pi and button part. I also have a
           | Fujitsu ScanSnap and just configure it to save to a Samba
           | share.
           | 
           | What does listen for button press mean? and how?
        
             | djhworld wrote:
             | I'm not sure how I would do that on my model (ScanSnap
             | S1300i), it connects over USB and has no
             | touchscreen/control interface or network port, or wifi
             | capability, you have to connect it to a computer via USB.
             | 
             | This works fine on say, a Mac, with the official Fujitsu
             | ScanSnap software, and I'm guessing _that_ supports saving
             | to a samba share, but I wanted a solution that's
             | 
             | 1. completely headless, i.e. no desktop machine required
             | and experience needs to be friction free as the headless
             | part means the only way to interact with the scanning
             | function is to press 1 button
             | 
             | 2. linux compatible, as I wanted to connect it to a Pi. I
             | had to dig for the drivers, Fujitsu didn't have the right
             | ones for my model on their website!
             | 
             | I couldn't find any official software from Fujitsu, but I
             | found the drivers eventually, so I ended up connecting the
             | scanner to the Pi over USB and gluing the bits together to
             | drop the PDFs onto the samba share.
             | 
             | The button is located on the scanner, and I run "scanbd"
             | [1] to listen for the button press, this is what
             | coordinates the scan function (feeding the paper through)
             | and then post-scan -> running a script to collate + create
             | PDFs
             | 
             | [1] https://wiki.archlinux.org/title/Scanner_Button_Daemon
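
The "collate + create PDFs" step mostly comes down to getting the page order right. Plain string sorting puts `out10` before `out2`. This is a minimal sketch. It assumes the scan leaves numbered image files (names like `out1.tif` are hypothetical) and that the `img2pdf` tool is installed (`img2pdf page1 page2 ... -o output.pdf`). It is not the commenter's actual script.

```python
import re
from pathlib import Path


def natural_key(path: Path) -> tuple:
    """Compare digit runs as numbers so page10 sorts after page9, not after page1."""
    parts = re.split(r"(\d+)", path.name)
    # re.split with a capturing group puts the digit runs at the odd indices.
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


def collate_command(pages: list[Path], output: Path) -> list[str]:
    """Build an img2pdf call that joins the scanned pages, in page order, into one PDF."""
    ordered = sorted(pages, key=natural_key)
    return ["img2pdf", *map(str, ordered), "-o", str(output)]
```

Running it would then be `subprocess.run(collate_command(pages, out_pdf), check=True)`. The result can be dropped onto the share like any other PDF.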
        
             | [deleted]
        
           | benbarbersmith wrote:
           | If you have any notes on this, I've been wanting to set this
           | up for ages and I'd be incredibly grateful!
        
             | djhworld wrote:
             | My solution was pretty much the same as what this guy did,
             | although he had a slightly different model of scanner to
             | me, but it's a very similar setup
             | 
             | https://chrisschuld.com/2020/01/network-scanner-with-
             | scansna...
        
             | mirashii wrote:
             | Paperless-ngx supports a folder on disk that you can drop
             | files into and have them ingested. Throw in a samba
             | container pointed at the same directory in your docker-
             | compose and you've replicated the same setup.
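
If your own script (rather than the scanner) feeds the consumption folder, a common precaution is to write to a temporary file and rename it into place. That way a folder watcher never sees a half-written PDF. This is the generic write-then-rename pattern for watched folders, not something taken from the paperless docs. The `.part` temp name and the function name are hypothetical, so check that your consumer's ignore settings skip such files.

```python
import os
import tempfile
from pathlib import Path


def drop_into_consume(data: bytes, consume_dir: Path, name: str) -> Path:
    """Write data to a hidden temp file in consume_dir, then atomically rename it to `name`."""
    fd, tmp_name = tempfile.mkstemp(dir=consume_dir, prefix=".incoming-", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        final = consume_dir / name
        os.replace(tmp_name, final)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final
```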
        
           | Osmose wrote:
           | I've got this setup with a Brother ADS-1700W scanner, which
           | can write directly to a network share over wifi. Paperless-
           | ngx is running on my NAS which hosts the share as well.
        
       | diarrhea wrote:
       | I self host a couple things, but if I had to choose only one,
       | it'd be this. So far the project strikes a great balance of
       | stability (zero issues over two years now) and new features
       | (ownership concept already available, allowing for multiple
       | accounts in a pretty intuitive way).
       | 
       | I've killed my instance twice now and had to restore from backup,
       | which is also surprisingly pleasant to do. Their document
       | exporter makes that possible. Having everything in a single JSON
       | and otherwise just the raw PDFs makes a ton of sense and has me
       | confident my documents are "just there" and moving to a different
       | system would be feasible.
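
Because the export is one JSON manifest plus plain files, it is easy to sanity-check before you need it. This is a rough sketch. It assumes `manifest.json` is a JSON list of records and that document records name their exported files under keys such as `__exported_file_name__`. Those key names are an assumption from memory, not from this thread, so compare them against your own export first.

```python
import json
from pathlib import Path

# Assumption: these are the manifest keys that point at exported files.
# Verify against a real manifest.json from your own instance.
FILE_KEYS = ("__exported_file_name__", "__exported_archive_name__")


def missing_exported_files(export_dir: Path) -> list[str]:
    """Return file names referenced by manifest.json that are absent from export_dir."""
    records = json.loads((export_dir / "manifest.json").read_text(encoding="utf-8"))
    missing = []
    for record in records:
        for key in FILE_KEYS:
            name = record.get(key)
            if name and not (export_dir / name).is_file():
                missing.append(name)
    return sorted(missing)
```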
        
         | Ylpertnodi wrote:
         | >the project strikes a great balance of stability (zero issues
         | over two years now)....
         | 
         | >I've killed my instance twice now and had to restore from
         | backup, which is also surprisingly pleasant to do
         | 
         | Stable, but murderable?
        
           | diarrhea wrote:
           | Yep, it's not undying, but the murder happened through no fault of
           | theirs. I'm taking credit for that one.
        
       | AmazingTurtle wrote:
       | I'm working on my own SaaS document management system that is
       | easy-to-use, affordable and fully automated. Basically a black
       | hole: throw a scan in or wait for emails to come in, and it will
       | name, tag and categorize it. It will also attempt to retrieve
       | the most important data, such as invoice amounts and customer
       | numbers, so that you can easily distinguish and find the documents you're
       | looking for. It comes with a chat feature so that you can ask
       | things such as "what was my liability insurance number?" and
       | it'll answer from the knowledge of your documents. I find this
       | pretty useful, recently I was at an airport and forgot my flight
       | number. I just asked what was my flight number and it retrieved
       | that information from my recent documents easily. Integration
       | with third party APIs and agnostic backend configuration for LLM
       | and OCR is in progress. It works with Google Cloud Vision OCR and
       | OpenAI at the moment.
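
The product described here uses Google Cloud Vision and OpenAI. As a much simpler, swapped-in technique, this sketch shows rule-based extraction of the same kinds of fields (customer number, invoice total) from OCR text. The patterns are hypothetical and English-only, and amounts are returned as strings without currency parsing.

```python
import re

# Hypothetical patterns for English-language invoices; real documents vary widely.
PATTERNS = {
    "customer_number": re.compile(r"customer\s+(?:no\.?|number)[:\s]+([A-Z0-9-]+)", re.I),
    # \b keeps "Subtotal" from matching; the amount may contain , or . separators.
    "invoice_total": re.compile(r"\btotal[:\s]+(?:EUR|USD|\$|€)?\s*([0-9](?:[0-9.,]*[0-9])?)", re.I),
}


def extract_fields(text: str) -> dict[str, str]:
    """Return the first match for each known field found in the OCR text."""
    found = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(text)
        if m:
            found[name] = m.group(1)
    return found


ocr_text = "ACME GmbH\nCustomer No.: AB-1234\nSubtotal: 40.00\nTotal: EUR 47.60\n"
print(extract_fields(ocr_text))
```

Regexes break as soon as the layout or language changes, which is the usual argument for an LLM-based extractor. They are cheap, though, and they run entirely offline.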
        
         | locustmostest wrote:
         | We may want to get in touch with each other. We have an Open
         | Core document management platform that runs in AWS; I'm not
         | sure about your roadmap, but there may be something there
         | that's of use: https://github.com/formkiq/formkiq-core
        
           | AmazingTurtle wrote:
           | Cool, I mean - that's a LOT of AWS services right there.
           | 
           | But yeah, let's connect. Take a look at my project as well!
           | https://turtledev.net/projects/refind-ai
        
         | diarrhea wrote:
         | Where can I sign up to track progress? This sounds like exactly
         | the future I envisioned. I take great care manicuring my
         | paperless instance such that when the day arrives, the LLM
         | integration can work its magic best.
         | 
         | That said, open source is absolutely table stakes in this, to
         | me. From the documents I have in the system one could trivially
         | impersonate me, perhaps even effectively clone me. So sending
         | all that off to random internet corporations, no can't do.
        
           | hiAndrewQuinn wrote:
           | That's unfortunately why I think Microsoft and Google are
           | going to be the first ones to actually achieve this future.
           | They're the only organizations well known enough that
           | enterprise might trust them with this kind of thing.
        
             | Jedd wrote:
             | https://news.ycombinator.com/item?id=37702095
        
           | AmazingTurtle wrote:
           | I keep this site updated when something changes.
           | 
           | https://turtledev.net/projects/refind-ai
        
       | gsich wrote:
       | My main gripe is that you can't use an existing folder structure.
        
       | denysvitali wrote:
       | I've created "ODI" (Overengineered Documents Indexer) and
       | presented it recently.
       | 
       | https://clis-everywhere.k8s.best/16
       | 
       | My approach is scanning the documents with airscan1, indexing
       | them with a custom OCR Server (using the MLKit by Google on an
       | Android phone which does completely offline OCR scanning) and
       | indexing everything in OpenSearch. I've then created a backend +
       | frontend to view the documents and do full-text search on them.
       | 
       | Everything is (going to be) open source with a permissive
       | license.
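
At its core, the full-text search part of a setup like this is an inverted index from terms to documents. OpenSearch adds ranking, stemming, fuzziness and scale on top of that. Here is a tiny in-memory stand-in to show the idea. It is not how OpenSearch or the project above is implemented, and the sample document ids are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


class InvertedIndex:
    """Map each lowercase term to the set of document ids containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in TOKEN.findall(text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = TOKEN.findall(query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = InvertedIndex()
idx.add("scan-1", "Liability insurance policy number 123")
idx.add("scan-2", "Flight booking confirmation, flight LH 400")
idx.add("scan-3", "Insurance invoice")
print(sorted(idx.search("insurance")))
```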
        
         | [deleted]
        
       | nvahalik wrote:
       | I love seeing more Angular projects in the wild like this.
       | 
       | Angular is an under-appreciated, solid, no-gimmicks framework.
       | Been using it for years rather than React and it seems the
       | pendulum is swinging back toward "this side" now.
        
       | frde wrote:
       | Looking through the setup, this seems like an insane way to
       | package an application for users to install:
       | https://docs.paperless-ngx.com/setup
       | 
       | The documentation itself is so full of implementation details
       | that, as someone who is interested in the concept of this, I'm
       | scared off from even trying to set up and use this.
       | 
       | The project would be much more approachable if there was a simple
       | native installer. My parents could also benefit from this but
       | there's no way they would ever even understand how to install
       | this, much less troubleshoot docker things.
        
         | switch007 wrote:
         | It doesn't look like the project goals include being
         | installable by your parents
         | 
         | It looks to sit in the self hosted space that has an admin
         | manage all the sysadmin tasks. They've provided docker which is
         | a pretty good step.
         | 
         | There are desktop apps aimed at the single user/less
         | experienced user, which might be more suitable.
        
         | starkparker wrote:
         | You might want Recoll[1]. Similar if less powerful
         | capabilities, cross-platform, open source, has Windows and
         | macOS installers.
         | 
         | Still an overly complex FOSS user interface for a tech-unsavvy
         | target with lots of digging around to configure it (OCR setup,
         | for instance[2]), but at least you don't need to know what
         | Docker is to install it.
         | 
         | 1: https://www.lesbonscomptes.com/recoll/
         | 
         | 2:
         | https://www.lesbonscomptes.com/recoll/usermanual/webhelp/doc...
        
         | ndsipa_pomu wrote:
         | Self-hosting services usually entails more technical knowledge
         | than just installing an app and I don't think a document
         | management system would necessarily work well as a native
         | application. For starters, there's the backup issue and you
         | wouldn't want non-technical people to store important documents
         | that only live on a local drive. Remote web access is also a
         | very useful feature for when travelling and that wouldn't be
         | easy to set up for a local install.
         | 
         | I've been using it for over a year and am very happy with it,
         | though I intend on moving it from my home Pi docker swarm onto
         | a free Oracle cloud instance to improve the performance and
         | uptime (I've got my Pis auto updating and rebooting, so
         | services get shunted around fairly often).
        
         | tmerse wrote:
         | _The project would be much more approachable if there was a
         | simple native installer_
         | 
         | Actually the very first example on https://docs.paperless-ngx.com/setup
         | lists an interactive installer which asks the user some questions
         | and eventually arrives at a working docker-compose setup:
         | 
         |     $ bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"
         | 
         | If you ask me, this is already pretty user friendly. Although I
         | agree that if your needs are more involved, there is some
         | reading you'll have to do.
         | 
         | I am currently in the process of migrating from mayan-edms to
         | paperless-ngx and it feels pretty approachable to me if you
         | know your way around docker (compose).
        
         | preya2k wrote:
         | It is designed to be a server application, so it'd be very
         | difficult to offer a desktop-like app experience, that's easier
         | to install.
        
       | bettercallsalad wrote:
       | Is it using local storage or cloud?
        
         | ndsipa_pomu wrote:
         | Yes.
         | 
         | It's a self-hosted application, so it depends on your setup. I
         | suppose it's arguably using local storage on the server you run
         | it on which is often going to be a cloud hosted machine.
        
       | beestripes wrote:
       | Does it have annotation capabilities? Quickly adding a checkmark
       | or signature would make managing documents much easier.
        
         | ndsipa_pomu wrote:
         | It looks like it does, though I've never wanted to use them. I
         | just had a quick look at my instance and you can add text notes
         | alongside the document, and there are also some basic
         | draw/text editing tools to add to the document itself.
        
       ___________________________________________________________________
       (page generated 2023-10-07 23:00 UTC)