[HN Gopher] Snebu - Simple Network Encrypting Backup Utility
___________________________________________________________________

Snebu - Simple Network Encrypting Backup Utility

Author : derekp7
Score  : 58 points
Date   : 2020-12-27 13:36 UTC (9 hours ago)

(HTM) web link (www.snebu.com)
(TXT) w3m dump (www.snebu.com)

| Galanwe wrote:
| The article doesn't go into much detail on the atomicity of the backups:
|
| Are the backups performed while the mount points are still being written to?
|
| If the block device is locked during the backup, do the writes fail or just block?

  | derekp7 wrote:
  | It uses standard "tar" and "find" commands on the client. If you have something like a database, then you should exclude the dbf files and use the database's tools to put the DB in hot backup mode (using a snebu-client plugin script -- there is a template script in the manpage for snebu-client-plugin(5)).
  |
  | You can also create a plugin script that performs a disk snapshot (using LVM snapshot or similar), then mount and back up from that.
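A bare-bones sketch of that snapshot approach -- an illustrative standalone script, not the actual snebu-client-plugin(5) template (the volume group, volume, and mount point names are all made up):

    #!/bin/sh
    # Freeze a point-in-time view of the root LV, mount it read-only,
    # run the backup against the mounted snapshot, then tear it down.
    lvcreate --snapshot --size 2G --name rootsnap /dev/vg0/root
    mkdir -p /mnt/rootsnap
    mount -o ro /dev/vg0/rootsnap /mnt/rootsnap
    # ... point the backup job (e.g. snebu-client) at /mnt/rootsnap here ...
    umount /mnt/rootsnap
    lvremove -f /dev/vg0/rootsnap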
| karmakaze wrote:
| Catchy name, sounds like a Dr. Seuss creation.
|
| I used to use byobu and rsync all the time, thinking that byobu was a wrapper script around 'screen' for "bring your own back-up". Later I learned it's Japanese for those folding room-divider screens.
|
| Perhaps someone could draw a snebu for the project.

| candiddevmike wrote:
| To echo the other comment, please use restic or borg instead of this. For being a "simple" tool, it appears more complicated, and with more questionably useful features, than restic or borg. Borg and restic have server components that allow clients to send backups; I believe restic also allows the clients to manage encryption/not trust the server.

  | derekp7 wrote:
  | The "simpleness" in Snebu comes from using standardized tools such as find and tar, file storage using LZOP, and the backup catalog in a standard SQLite3 DB, allowing manual recovery of files.
  |
  | However, Restic and Borg do have the advantage of writing directly to cloud backups or "dumb" backend data stores such as sftp targets, or API-accessed cloud targets (although Restic doesn't compress, and Borg has issues with multiple hosts going to the same repository, they do still have their use cases). Whereas Snebu is more for the use cases that would otherwise be solved by harder-to-manage tools such as Bacula, Amanda, or commercial backup solutions, but with lower setup and administrative requirements.

    | Aachen wrote:
    | > Restic doesn't compress
    |
    | I didn't actually know that; that's kinda weird. It doesn't seem to be vital to deduplication. Looking at https://github.com/restic/restic/pull/2441, they just didn't originally implement it and are now wondering how best to introduce this breaking change (it can be forwards compatible, but of course not backwards), trying to get it right in one try, keeping the format "robust", adding other breaking changes while they're at it... I wonder if this will happen any time soon.
    |
    | To be honest, though, 98% of the time large files (anything over a few hundred kilobytes) are in a format that compresses only to a certain extent. Pictures, audio, even text documents and spreadsheets are compressed these days. One person mentions huge XML files, which sounds like they really should have been using a different tool (not a text file for starters, if you're not going to read through those gigabytes of text anyway), but there are other use cases like somewhat sparse database tables or disk images. Then again, if you have the space for those original files, doubling the space for a reserve copy is usually not a deal breaker, and Restic also deduplicates within and between files, so if something were to be very redundant and compressible it is also likely to be caught by that algorithm.
    |
    | I'm not against increasing the complexity to add it, and breaking compatibility of course -- it's a clear downside of Restic not to have this -- but I also don't really get anyone who considers it a deal breaker (and only to an extent those who find it a big deal).
    |
    | Either way, thanks for pointing this out; I didn't know (and made an incorrect assumption, since it seems so standard) and it's good to know about.

  | [deleted]

| ajsnigrutin wrote:
| For a simple tool, use rsnapshot (https://rsnapshot.org/).
|
| It's a pull system, which rsyncs (over ssh) the data from machines to folders on a server; then, on the next day, it creates a hard-link copy of the backup folders to get a separate "yesterday's" version, and rsyncs over "today's" to get the new "today's" ... plus some scripts to do weekly and monthly versions and to keep the number of backups at a desired number (configurable).
|
| The only downside is that it needs root access to rsync whole filesystems, and that you need to manually set the excludes for each machine.
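The core of the rotation ajsnigrutin describes fits in two commands; a stripped-down sketch (host and directory names are invented, and the real rsnapshot adds retention shuffling, locking, and error handling on top):

    # Age the newest snapshot by one day using hard links (cheap: only
    # directory entries are copied, file data is shared), then refresh
    # daily.0 in place; rsync unlinks changed files before rewriting
    # them, so yesterday's copies in daily.1 survive intact.
    cp -al /backups/host1/daily.0 /backups/host1/daily.1
    rsync -a --delete root@host1:/ /backups/host1/daily.0/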
  | derekp7 wrote:
  | That was actually the original inspiration for Snebu (not rsnapshot specifically, but a custom script implementing similar functionality based on the original rsync snapshot paper). Snapshots are so much better than level-based backups (full/differential/incremental) when using disk-backed storage.
  |
  | I ended up running into this problem: https://news.ycombinator.com/item?id=8305283 when I got to 20 hosts and a year's worth of backups (14 daily, 6 weekly, and 12 monthly snapshots, so 32 snapshots per host). It was made worse in my case because I was using ZFS with compression. So the original implementation of Snebu used shell scripts, flat files for the catalog, and a small utility to burst apart tar files.

  | rsync wrote:
  | "For a simple tool, use rsnapshot ..."
  |
  | Even simpler is to do a dumb, 1:1 mirror with rsync _and then rotate zfs snapshots on the server side_.
  |
  | Two benefits here over rsnapshot (or rsync snapshots with hardlinks) are:
  |
  | 1) rsync snapshots diff at file granularity -- if you change one bit of a file, your penalty is the storage space for that entire file. ZFS, on the other hand, diffs at the block level, so your penalty is just the changed portions of that particular file.
  |
  | 2) Depending on your implementation, ZFS snapshots are immutable/read-only ... so neither a misconfiguration nor ransomware/Mallory can destroy those backups if you lose your credentials.
  |
  | Historically, people would create and maintain "rsync snapshots" inside their rsync.net accounts using 'cp -al' and other unix commands run over SSH.[1]
  |
  | After transitioning to ZFS, however, we encourage all customers to just do a "dumb" 1:1 rsync to us and let us maintain and rotate their preferred snapshot schedule (days/weeks/months/quarters/years).
  |
  | [1] https://www.rsync.net/resources/howto/remote_commands.html
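Server-side, that "dumb mirror plus snapshots" scheme comes down to a couple of lines per host; a sketch assuming a dataset named tank/backups/host1 and a GNU userland (the names and the 30-snapshot retention are arbitrary):

    # After each nightly rsync into the dataset, take a read-only
    # point-in-time snapshot named after the date...
    zfs snapshot tank/backups/host1@$(date +%Y-%m-%d)
    # ...then prune everything except the newest 30 snapshots.
    zfs list -H -t snapshot -o name -s creation tank/backups/host1 |
        head -n -30 | xargs -r -n 1 zfs destroy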
    | derekp7 wrote:
    | Just curious, how do you sync file ownership and permissions when going to a ZFS target server with a non-privileged account? Or would you have all the files on the target owned by the user ID accessing the target server, then keep the metadata (owner/mode/SELinux attributes/ACLs, etc.) in a separate log file?
    |
    | And do you typically turn on ZFS compression? I tried that years ago with ZFS under Linux, but had major slowdowns after a while. But I'm sure it has improved quite a bit in more recent years. Also, how does ZFS stack up against Btrfs in your experience?
    |
    | But for a direct-to-cloud backup, the way you do it is really great -- to me it is much easier than messing with things like S3 buckets and similar technologies.

      | rsync wrote:
      | Well, if you are using a _sophisticated_ tool like restic or borg or duplicity, the file attributes are bundled up into the encrypted chunks and all is well.
      |
      | However, if you are using plain old rsync, you can use the --fake-super command line switch to maintain those attributes, and I believe rsync.net customers are using that successfully.
      |
      | Yes, you are correct -- in the old days (2001-2010 or so) we maintained a pair of "permission scripts" which allowed customers to dump all their attributes to a file and then reapply them later. Again, --fake-super seems to "fix" all of this.
      |
      | Generally speaking, you should _always_ enable ZFS compression. It improves many aspects of ZFS behavior and performance -- not just space usage. Unless you have very bizarre workloads or hardware, I think that in 2020 there is no reason to _not_ enable compression. Deduplication is another story and is _very expensive_ to maintain -- rsync.net has never run de-dupe on any systems.
      |
      | We (rsync.net) and I (personally) have zero experience with btrfs ...
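One wrinkle worth noting: --fake-super has to take effect on the receiving side, so when pushing to a remote it is passed as a remote option; a sketch (paths and hostname are invented; needs rsync >= 3.0 and xattr support on the destination filesystem):

    # Preserve ownership, modes, ACLs and xattrs without root on the
    # server: privileged attributes are stored in user xattrs instead.
    rsync -aAX -M--fake-super /home/ user@backuphost:backups/home/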
| izacus wrote:
| Can anyone recommend a Linux (preferably cross-platform) tool like this which is appropriate for use on personal laptops?
|
| It needs to:
|
| - Handle random sleeps/disconnects properly without corrupting the backup.
|
| - Resume a backup in progress if the process was interrupted due to sync or network issues.
|
| - Be resilient to sleeps/wakeups in a way that it won't just fail to back up the machine for months on end.
|
| - (Bonus) Be able to skip backup when connected to mobile hotspots or other slow networks.
|
| Any ideas? I'm very happy with Arq backup on Windows/macOS, but I just can't seem to find anything on Linux. Everything just assumes a hardwired workstation/server and completely falls apart on a laptop.

  | clankyclanker wrote:
  | Have you looked at git-annex on an anacron job?
  |
  | https://joeyh.name/code/git-annex/
  |
  | It lets you use git to back up large files. However, it's built for a git-sized number of files and not built for, say, your entire home directory.

  | the-dude wrote:
  | Dropbox?

  | specktr wrote:
  | I'd also be interested in recommendations that meet the above requirements.
  |
  | My current workaround for my main Linux laptop is setting rclone to run with crontab at a time that I know I'll be online for an extended period (evenings during long homework sessions). I've been doing this for several years and have yet to run into major show-stopping bugs, and I frequently do checksum verification on my backups.
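For reference, that kind of cron arrangement is a one-liner; a sketch (the remote name, paths, and schedule are invented -- rclone's built-in retries do most of the work on a flaky connection):

    # crontab -e: every evening at 21:00, sync the home directory to
    # an rclone remote, with retries and a log file for auditing.
    0 21 * * * rclone sync /home/me crypt-remote:laptop-backup --retries 5 --log-file "$HOME/rclone-backup.log"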
    | izacus wrote:
    | Yeah, that's sadly not very useful for the laptops I manage -- people tend to turn them on and off at random times, and I had multiple cases where the backups just ended up in a hosed state (because the tool got interrupted at a bad time) or just didn't back up for a week because the laptop never ran for enough time without interruption after creating a huge file.

  | derekp7 wrote:
  | What about a split or shadow repository? That is, have a host-specific local cache of the metadata from the main backup server, so that a snapshot backup can be made locally (say, to an attached thumb drive or SD card), then have a process that syncs that up periodically when a stable network connection is available?
  |
  | If this is a strong enough use case then I have a couple of ideas on how to implement it (I'll throw it in the wishlist). Otherwise, currently if you have an interrupted backup, the next one will still re-use the files already on the server, even if it is part of a partial backup.

  | rsync wrote:
  | "Can anyone recommend a Linux (preferably cross-platform) tool like this which is appropriate for use on personal laptops?"
  |
  | I think you should look into the 'borg' backup tool -- it has become the de facto standard for remote backups because it does everything that rsync does (efficient, changes-only backups) but also produces strongly encrypted remote backup sets that only you have a key to ... the remote has no access to the data.
  |
  | The borg website is here:
  |
  | https://borgbackup.readthedocs.io/en/stable/
  |
  | and a good description of how it works and why you should use it is here:
  |
  | https://www.stavros.io/posts/holy-grail-backups/

    | izacus wrote:
    | Borg has zero scheduling functionality, which means that it doesn't fit any of the requirements (at least not without a frontend).
    |
    | It's literally the tool I was thinking about when I said "all tools expect a wired workstation".
    |
    | I don't want to manually write bash scripts to resume and check backup status.

      | rsync wrote:
      | This is _not_ intended to solve your use-case -- in fact, what I am about to write is primitive and childish ... however, I have done it myself and seen others do it:
      |
      | You can replace a command like this in your crontab:
      |
      |     rsync blah blah
      |
      | with a command like this:
      |
      |     rsync blah blah ; rsync blah blah ; rsync blah blah
      |
      | See what I am doing there? I sometimes deploy this hack with very spotty WAN connections, but laptop sleep/wake is basically the same thing ... you wake up the laptop, the running command bombs out, and the next one starts.
      |
      | The reason this "works" is because rsync picks up right where it left off on a broken transfer. I believe restic and borg have similar behavior ...

  | beagle3 wrote:
  | Resuming a backup after the network has disappeared for a while, or after sleep, is problematic and mostly impossible to do in an atomic way.
  |
  | If you have enough local storage, I'd run a local backup process (restic/borg/bup), and then rclone/rsync your backup repository to remote storage.
  |
  | If you don't have enough local storage, you can restic or bup to a remote server reliably as long as you can complete one scan. On my laptop with a 256GB SSD, a common restic scan with cold caches takes less than 15 minutes; an hourly backup is a few hundred megs, and a daily backup is often close to 2 gigs -- it depends on how many files you have, and how many (and how much) of them change.
  |
  | Alternatively, you can do an rsync-based snapshot.
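beagle3's two-stage arrangement, sketched out (the repository path and remote name are assumptions; both commands use standard restic/rclone flags):

    # Stage 1: fast local snapshot -- cheap to redo if interrupted,
    # since unchanged chunks are already in the local repository.
    # (Assumes RESTIC_PASSWORD or --password-file is provided.)
    restic -r /backups/restic-repo backup /home/me
    # Stage 2: opportunistic off-site copy of the repository; safe to
    # interrupt and re-run, as the repo is append-mostly between prunes.
    rclone sync /backups/restic-repo crypt-remote:restic-repo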
    | izacus wrote:
    | There are multiple Windows and macOS tools that manage to do that -- without me writing my own bash scripts, which can fail in corner cases.
    |
    | I want to avoid writing them -- last time I ended up with devices not backed up for months because the backup tools ended up in a partially broken state that caused silent failures.
    |
    | Hence why I want a tool designed for this use-case and not a bunch of bash shell scripts in cron.

  | hiq wrote:
  | Can you mention what you've tried so far? I'm happy with borgbackup, but I don't know if it matches your bullet points.
  |
  | Still, I'd be surprised if mainstream backup solutions corrupted your backups in case of sleep/disconnect. That would be a pretty big bug. Borgbackup also handles resumes, IIRC. Your third point is unclear; you'd have to try in practice. But if throughput * time awake < size of compressed deduplicated data, you obviously cannot expect your backup to complete. I guess you just rephrased your second point.
  |
  | For the last point, personally I think I'd take care of mobile hotspots / slow networks with a script wrapper rather than expecting the backup software to handle this.

    | izacus wrote:
    | I couldn't find any borg configuration to schedule and resume after sleep. Unless that changed, how does it fit my requirements?

    | [deleted]

      | hiq wrote:
      | Regarding sleep, that would be the role of e.g. systemd (or whatever you have on your system), and you'd have to write a service unit handling that (although I'm not sure what you'd want exactly; if you have a program running when you put your computer to sleep, it'll still be running once it resumes, so there's not much to do).
      |
      | Regarding scheduling, that could be part of the service, but a simple cronjob can also do the trick, depending on what you want exactly.
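For the scheduling half of that, a systemd timer with Persistent=true catches up on runs missed while the timer was inactive (e.g. the machine was powered off at the scheduled time); a sketch (the unit names and backup command are placeholders for whatever tool is being wrapped):

    # /etc/systemd/system/backup.service
    [Unit]
    Description=Nightly backup
    Wants=network-online.target
    After=network-online.target

    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/run-backup.sh

    # /etc/systemd/system/backup.timer
    [Unit]
    Description=Schedule nightly backup

    [Timer]
    OnCalendar=daily
    Persistent=true

    [Install]
    WantedBy=timers.target

Enable it with "systemctl enable --now backup.timer"; "systemctl list-timers" shows when the next run is due.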
  | bartvk wrote:
  | Arq does this really well; I know exactly what you mean. It's more or less fire-and-forget. And unfortunately it looks like you're not getting an answer.

  | mkl wrote:
  | I use Syncthing, which just silently does its job, to the point that I forget it's there. I use that to get files to my Linux PC, where they can be backed up along with everything else.
  |
  | I don't know if Syncthing has your bonus point, as I don't use mobile hotspots much.

| Aachen wrote:
| I'm excited to see another tool for off-site encrypted backups! But I'm also wondering why reinvent the wheel when we have Restic (or does this predate a usable version of Restic?). How does it compare? It sounds like Restic does everything this does, except that Restic doesn't need a local database, and this one only skips files that are already uploaded rather than doing actual deduplication -- but I might be wrong there.
|
| Why was the design decision made to hash filenames with SHA-1? I don't see a security flaw in this specific use-case, but why dance with the devil? If it's about shorter hashes, it's safer to truncate a SHA-2.
|
| Since it uses a public key to encrypt to, can a malicious server overwrite your data with encrypted but fake data, or is the data also signed using a private or symmetric key on the client side? It might seem like an unusual use case, but why trust the server if we don't have to? And for servers, the attacker might know what software it runs and therefore what the file structure is. If you can make someone restore software with modifications of your choosing, that would be rather powerful, even if admittedly difficult to pull off.
|
| How does that local SQLite database grow, e.g. for a regular amount of files on a 1TB drive? Do you get a bunch of gigabytes of state like Duplicity? (I gave up on Duplicity because it ate too large a chunk of my SSD, while Restic didn't need local state at all.) What if the local file is gone -- can it still run normally (albeit slower, I guess), or do you now need to do a full backup?
|
| Edit: Found one of the answers: it signs the data with an HMAC, "using a combination of the RSA Public key and the passphrase used to protect the RSA Private key" as the key. Erm, what? The private key is contained on the client? What's the point of public key encryption if not that the data can't be decrypted without access to the private key? If the client has the public and the private key, it might as well use symmetric encryption. But what's weirder: the passphrase that protects the private key is used for signing?! Why not use the private key to sign the hash if you have access to it?! And mixing in the public key does nothing, as this is known information to an attacker. Might as well store the plain hash, if it weren't for the private key's passphrase.

  | derekp7 wrote:
  | To address the question in your edit:
  |
  | When the key file is created, you are prompted for a password to encrypt the private key. That password is used (along with the public key) to generate the HMAC key, which gets recorded in the keyfile and is used to sign each file in the backup. The encrypted private key is sent to the server, and gets sent back when you do the restore, but the HMAC secret key stays on the client.
  |
  | Since you need to type in the password to restore the data, the HMAC key gets re-generated at restore time.
  |
  | The reason the passphrase is hashed with the public key (again using an HMAC-SHA-256 hash) is so that if the same passphrase is used on different hosts with unique encryption keys, the HMAC key still ends up being unique.
  |
  | So a compromised key file on the client will compromise the secret HMAC key, but it won't compromise the encrypted data. Which is better than having a single symmetric encryption key lying around for both encryption and authentication (encryption being the more important item). And the key is tied to the backup snapshot, not to the repository, so each client can have its own keys.
  |
  | Further information is at https://www.snebu.com/tarcrypt.html, which gives the inner workings of tarcrypt (some of the descriptions of the tar file extensions depend on a working knowledge of the basic tar format and the PAX extensions, which are documented in the GNU tar documentation).
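To make the derivation described above concrete, a rough openssl illustration of the idea -- this is only a sketch of the concept, not snebu's actual code, file formats, or exact construction (see tarcrypt.html for those):

    # Illustrative only: derive a signing key by HMACing the public key
    # with the private key's passphrase.  The same passphrase on two
    # hosts with different key pairs yields two different HMAC keys.
    openssl dgst -sha256 -hmac "$PASSPHRASE" snebu-pubkey.pem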
  | lhoff wrote:
  | The initial commit of snebu was from January 6th, 2013, while restic's alpha release was in 2015. So snebu is older. That probably also explains why sha1 was chosen in the first place.

  | derekp7 wrote:
  | Glad you asked. First, Snebu goes back about 8 years, although the encryption code was developed more recently. That also explains why the hash is SHA-1 (at the time the thinking was "it is good enough for Git"). Right now it is kept for unencrypted backups, for backward compatibility with previous versions. But it is something I would like to address -- possibly by using SHA-256 (or truncated SHA-256), or making it user-selectable on new repositories. That will be in the next version; I didn't want to make too many changes at once. Encrypted backups, however, do already use a SHA-256 HMAC.
  |
  | Compared to Borg and Restic, making the system server-based does limit things like direct-to-cloud backup, but a future capability may be added to efficiently replicate a Snebu server backup to the cloud. What it gains is better manageability, especially if you are backing up a number of hosts to one repository. In my instance I'm backing up about 50 hosts with 14 daily, 6 weekly, and 12 monthly snapshots.
  |
  | Compared to Restic and Borg, it uses public key encryption (so you don't need to leave a sensitive symmetric key on your clients), but with HMAC-SHA-256-based hashes. So an attacker would need both your public key and the HMAC secret (only a one-way hash of the secret is stored on the server). The client reconstructs the HMAC key using the public key and the passphrase for the private key. Oh, and you can use multiple keys -- a client-specific key and a backup key (or more). Deduplication will work across all clients that share the same set of keys (the HMAC of all keys is used for the file storage hash). But for this attack scenario (forged data), you will need to keep the client key file safe and stored with appropriate permissions, and you can use different keys (and lose cross-client deduplication) across clients that are in different trust zones.
  |
  | As for the SQLite DB size -- on one of my installs the DB is taking 50 GB, and is storing over 1500 backup snapshots from 73 hosts (CentOS and RHEL, some with Oracle DBs), with a total of 4.5 TB of disk storage used.
  |
  | The SQLite DB is critical, as it stores which file belongs to which directory path on which host -- so it should be backed up independently (at the end of the backup script I use sqlite3 commands to do a dump of it once a week and run a vacuum command). To do: add these maintenance commands to the Snebu binary, to limit the need to run the sqlite3 CLI.
  |
  | Other features unique to Snebu -- you can have the server "pull" backups without installing an agent on the clients (the "tarcrypt" utility will still need to be on the client if encryption is needed). You can have the clients "push" data to the server using a restricted user account that allows backups but not deletes or restores, and have a separate restore ID, for example. You can give different administrator IDs on the backup server access to different groups of hosts in a look-but-don't-touch fashion, etc.
  |
  | Another item I like -- let's say you tell it to expire all monthly backups older than 6 weeks, but the host hasn't successfully backed up in more than that time (say it has been taken offline). There is a default to keep the most recent 3 backups in a retention schedule, so you won't accidentally lose all your backups in this case (the "preserve" number can be adjusted via command line parameters).

    | lucb1e wrote:
    | > possibly by using SHA-256 (or truncated SHA-256), or making it user-selectable on new repositories
    |
    | Honestly, I have not found that user-selectability of algorithms makes anything better. It might be convenient for the developer or a maintainer to be able to change it somehow without recompiling the whole thing, but I wouldn't include security parameters in regular config files that users change. In the 90s this made sense, with encryption considered equivalent to ammunition (it still is, but the laws have been relaxed) and needing to be changed on the fly, not-so-battle-tested algorithms that you might want to switch between quickly, updates coming on CDs, etc. These days we have security updates arriving automatically, and we know that some algorithms are pretty much rock solid for the foreseeable future. If one person reads up on it properly and then sets good values, a user (even if that user is a knowledgeable sysadmin) should really not have to tweak it.
    |
    | SHA-2 sounds like a good choice; I would just stick with that. Do think about upgrade paths, but assume that you (or a future maintainer) will just provide an update when (if) SHA-2 starts to be weakened in ways similar to SHA-1.
    |
    | > a number of hosts to one repository
    |
    | Oh, that sounds cool, especially with deduplication between them. Restic would be able to do that as well, but the backups couldn't run concurrently at all, and it doesn't really seem to be meant for that sort of thing even if it does store the hostname with each backup ("snapshot"). This would certainly be an interesting feature to a lot of people; I had never even considered using the same repo for multiple systems!
    |
    | > [the sqlite database] should be backed up independently (at the end of the backup script I use sqlite3 commands to do a dump of it once a week [...]
    |
    | Oh, that is a very important gotcha. I would expect that when I make a backup, it contains whatever is needed for it to be restored.
    |
    | Glancing through the results when searching the page for 'database', it doesn't seem to be mentioned. Am I overlooking the backup script you mentioned, or is that just your own backup script and not on the page?
    |
    | > Other features unique to Snebu -- you can have the server "pull" backups without installing an agent on the clients (the "tarcrypt" utility will still need to be on the client if encryption is needed). You can have the clients "push" data to the server using a restricted user account that allows backups but not deletes or restores, and have a separate restore ID, for example. You can give different administrator IDs on the backup server access to different groups of hosts in a look-but-don't-touch fashion, etc.
    |
    | Ooh, that is really fancy! Especially the second one; how is that enforced -- I assume with filesystem permissions? Something like allowing creating files but not changing existing ones (chmod +w on the directory and chmod -rwx on the files, IIRC)?

      | derekp7 wrote:
      | > I would expect that when I make a backup, it contains whatever is needed for it to be restored
      |
      | The normal use case is for the SQLite DB (the "backup catalog", labeled as the "meta" directory in /etc/snebu.conf) to be on the same external drive array as the rest of the backup data. And I've found that SQLite is extremely robust, esp. when used with a write-ahead log file (which Snebu uses by default).
      |
      | However, I found that if you use an external 2.5" USB hard drive, they typically have very bad seek times, which makes the DB a constraining factor on performance. In those specific cases I either make sure the "meta" directory is located on a separate small flash drive, or on the backup server itself.
      |
      | > or is that just your own backup script and not on the page
      |
      | Just a simple script running on the backup server:
      |
      |     for host in host1 host2 host3 host4
      |     do
      |         snebu-client backup --remote-client ${host}
      |     done
      |     cp /var/lib/snebu/catalog/* /var/lib/snebu/vault/catalog-backup/
      |
      | Basically something like that. In reality it reads a list of hosts from a file, anything that has an Oracle DB on it gets thrown into hot-backup mode first, and there is a bit more logic to rotate the backup directory to keep a week's worth of DB copies -- but so far I think I'm just being paranoid (as I mentioned, SQLite is really solid).
      |
      | I run that daily in cron. As a to-do, I really should either put together a general-purpose scheduling script, or add scheduling functionality to the DB, where you could add host names, ssh private keys, and backup times.
      |
      | > how is that enforced
      |
      | If you have snebu installed and owned by a non-root user ("snebu") and set-uid, and make /var/lib/snebu owned by the snebu user, then only the snebu user can access it. So when a regular user executes it, the program looks at your EUID vs. UID -- if they are different, it looks up your UID in the userpermissions table to see if you are allowed the given function on the given client name. If EUID = UID (if you run snebu as the snebu user, or if it isn't installed suid), then extended permissions aren't enforced by the application (standard file system protections are all that apply).
      |
      | BTW, there is one more trick that I don't have documented, but I will put a front end for it in the next version -- you can sync a backup from one Snebu install to a second one, by using some additional parameters to "listbackups" to give output similar to the documented "find" (so it can be fed into a "newbackup" instance on a target server). And you can issue a "snebu restore" (not "snebu-client restore") which outputs a tar file of that backup that can be fed into the second Snebu instance. Useful for cleanly syncing your on-site instance to an off-site one. But again, it is a bit tricky to explain, so I'm working out how to present the functionality in a simple front end (along with the ability to have the on-site server unencrypted, and encrypt the data on the way to the off-site server).
      |
      | That should be ready for the next minor release.
___________________________________________________________________
(page generated 2020-12-27 23:01 UTC)