[HN Gopher] LTO Tape data storage for Linux nerds
       ___________________________________________________________________
        
       LTO Tape data storage for Linux nerds
        
       Author : detaro
       Score  : 240 points
       Date   : 2022-01-27 12:10 UTC (10 hours ago)
        
 (HTM) web link (blog.benjojo.co.uk)
 (TXT) w3m dump (blog.benjojo.co.uk)
        
       | CharleFKane wrote:
       | I would like to thank the author for bringing back memories. Not
       | all of which are good...
       | 
       | (I used to work for a four letter computer corporation doing
       | enterprise technical support, mostly on tape-based products.)
        
       | cassepipe wrote:
       | "Unlike most block devices these are devices that do not enjoy
       | seeking of any kind. So you generally end up writing streaming
       | file formats to tape, unsurprisingly this is exactly what the
       | Tape ARchive (.tar) is actually for. "
       | 
       | Haha! moment
        
       | magicalhippo wrote:
       | Nice calculator. Crossover point here for LTO-8 seems to be
        | around 250TB. I think I'll stick with my HDDs for now.
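That crossover point can be sanity-checked from the article's prices (the GBP figures below are assumptions taken from the post; the author's calculator accounts for more than this):

```python
# Break-even volume where LTO-8 (expensive drive, cheap media) overtakes HDDs.
# Assumed figures from the article: ~GBP 3,000 drive, GBP 18/TB HDD,
# GBP 7.40/TB tape.
drive_cost = 3000.0
hdd_per_tb = 18.00
tape_per_tb = 7.40

crossover_tb = drive_cost / (hdd_per_tb - tape_per_tb)
print(round(crossover_tb))  # ~283 TB, the same ballpark as 250 TB
```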
        
       | paulmd wrote:
       | I picked up an LTO-5 drive a couple years ago and one thing I
       | found (this is probably a good place to bring it up!) is that the
       | software documentation for tape utilities and high-level
       | overviews of the strategies employed to build and manage
       | libraries of data on this model is pretty thin on the ground at
       | this point. Completely understandable given how few people have
       | tapes these days, but it also makes it a little tougher to pick
       | up from scratch.
       | 
        | (And in particular the high-level overviews are important because
        | tapes are wear items: you only have on the order of a hundred or
        | two (don't remember the exact figures) full tape reads before the
        | tape wears out, so this is something you want to go into with a
        | strategy rather than making it up as you go!)
       | 
       | Since it's complimentary to this discussion I'll link a few:
       | 
       | https://www.cyberciti.biz/hardware/unix-linux-basic-tape-man...
       | 
       | https://databasetutorialpoint.wordpress.com/to-know-more/how...
       | 
       | https://sites.google.com/site/linuxscooter/linux/backups/tap...
       | 
       | https://access.redhat.com/documentation/en-us/red_hat_enterp...
       | 
       | https://access.redhat.com/solutions/68115
       | 
       | That is, unfortunately, essentially the apex of LTO tape
       | documentation in 2022, as far as I can tell.
       | 
       | Do note that in terms of tape standards, LTO-5 is an important
       | threshold because that's where LTFS support got added, and that's
       | the closest thing to a "normal" filesystem abstraction that's
       | available for tape (sort of like packet-formatted CDRWs I guess,
       | in the sense of presenting an abstraction over the raw seekable
       | block device). There is also very little documentation on init,
       | care, and feeding of LTFS iirc - and again, it would be nice to
       | know any pitfalls that might cause shoeshining and tape death.
        | Although I suppose in practice it's mostly going to get used in
        | a "multi session" scenario where you mostly aren't deleting
        | files: you write till it's full and then maybe wipe the whole
        | tape at once, and it's just a nice layer that gives you the
        | abstraction of "files" rather than sequential records (tape
        | archives/TARs, in fact!) along an opaque track with no
        | contextualization.
        
       | MayeulC wrote:
       | The tech doesn't seem to be too complex, is there an open
       | hardware project?
       | 
        | Seems like one could go quite far in terms of performance with
        | just some basic HW and an FPGA. Is there a significant difference
        | between multiple generations of the tapes themselves, or is it
        | just the data encoding patterns that change?
       | 
       | More specifically, I was a bit appalled by the "magnetic erasing"
       | bit. Seems like DRM to me, on a medium that is conceptually
       | extremely simple.
       | 
       | One could probably take a VHS drive and convert it to a data
       | drive, unless I'm being naively optimistic about it?
        
         | justsomehnguy wrote:
         | > One could probably take a VHS drive and convert it to a data
         | drive
         | 
         | https://en.wikipedia.org/wiki/ArVid
         | 
         | > More specifically, I was a bit appalled by the "magnetic
         | erasing"
         | 
          | Nobody laments that there is no 'low level format' for HDDs
          | anymore.
        
         | zaarn wrote:
          | A VHS drive uses a different encoding pattern, and the head of
          | a VHS player is physically incapable of moving like the head of
          | an LTO drive. Additionally it lacks precision, as an LTO tape
          | is much more densely packed. Lastly, LTO drives use different
          | magnetic materials and signalling, so in all likelihood the VHS
          | head is only going to pick up noise.
        
         | tssva wrote:
         | Back in the day there were a few backup products available
         | which connected to standard VHS VCRs.
        
       | dark-star wrote:
       | PSA: Don't do a cleaning run unless your tape drive tells you to
       | (there are SCSI sense codes for that). The tapes can pretty well
       | assess the need (or not) for cleaning, and excessive cleaning can
       | negatively affect the lifetime of the r/w head (the cleaning
       | tapes are abrasive)
        
       | wolfgang42 wrote:
       | The mt(1) manpage describes seeking on files, records, and file
       | marks, but doesn't explain what any of them are. What's the
        | difference between these options? (It sounds like file
       | marks are stored on the tape on a special track or something, but
       | I can't seem to find any discussion of the others.)
        
         | StillBored wrote:
          | So, this is all from a few-years-old memory, and it's a
          | complex interwoven mess.
         | 
          | Let's start with: tape has two types of head-positioning
          | commands, locate and space. Locate is absolute (and mt calls it
          | seek), and space is relative. Mt generally uses space (although
          | one can read the current position with tell and then do a
          | relative space) for all the commands that aren't "seek". Hence
          | the mt commands are things like "fsf", which is forward space
          | file (mark), or "bsf" for back space file (mark). At some point
          | in the past someone thought that each "file" would fit in a
          | tape block, but then reality hit because there are limits on
          | how large the blocks can actually be (in Linux it's generally
          | the number of scatter-gather entries that can fit in a page
          | reliably). So there are filemarks, which are like "special"
          | tape blocks without any data in them. Instead, if you attempt
          | to read over a filemark, the drive returns a soft error telling
          | you that you just tried to read a filemark. There is also "fsr"
          | for forward space records, which are just the individual blocks
          | forming a "file".
         | 
         | So back to seeking. If you man st, you will notice that each
         | tape drive gets a bunch of /dev/st* aliases, which control the
         | close behavior/etc, as well as some ioctls that match the mt
         | commands. The two important close behaviors to remember are
          | that if the tape is at EOD due to the last command being a
          | write, it will write a filemark and then rewind the tape unless
          | a non-rewinding /dev/nstX device is being used, in which case
          | it will leave the head position just past the FM (this is
          | actually a bit more complex, because IIRC there may be two
          | filemarks at EOD, and the tape position gets left between
          | them).
         | 
          | This allows one to do something like `for x in *.txt; do cat
          | "$x" > /dev/nst0; done` and write a bunch of files separated by
          | filemarks (at the default block size, which will be slow,
          | probably 10k; replace the cat with tar to control
          | blocking/etc). Or, if you want to read the previous file, `mt
          | -f /dev/nst0 bsf 2` to back-space two filemarks.
         | 
          | Now, the actual data format on tape is going to be dictated by
          | the backup utility used to write it. Some never use filemarks,
          | some use them only as a volume separator (eg tar), and old ones
          | actually put FMs between files, but that tends to be slow
          | because it kills read perf: it takes the drive out of streaming
          | mode whenever you read over a filemark (note the part in man st
          | about reading a filemark).
         | 
          | Now you can pick which file to read via "mt -f /dev/st0
          | rewind; mt -f /dev/nst0 fsf X; cat /dev/nst0 > restore.file"
         | 
          | There are also tape partition control commands, tape set
          | marks, and various other options which may or may not apply to
          | a given type of tape. Notably there are also density flags on
          | the special file (some Unixes) and via mt. LTO, for example,
          | doesn't have settable densities because density is fixed by the
          | physical tape in the drive. Some drives (STK T10K, IBM
          | TS11x0/3592) can upgrade the tape density/capacity when used in
          | a newer drive.
         | 
         | That got long...
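The filemark mechanics described in that comment can be sketched with a toy model (pure illustration over Python lists, not real device I/O; the tape contents are made up):

```python
# Toy model of tape positioning: a tape is a flat sequence of data blocks
# with zero-length filemark records between "files". A real drive does this
# with SCSI SPACE commands, not lists.
FM = object()  # sentinel standing in for a filemark

tape = [b"a1", b"a2", FM, b"b1", FM, b"c1", b"c2", b"c3", FM]

def fsf(pos, count=1):
    """Forward-space past `count` filemarks (like `mt fsf`)."""
    while count:
        if tape[pos] is FM:
            count -= 1
        pos += 1
    return pos

def read_file(pos):
    """Read blocks until the next filemark (the drive's soft error)."""
    out = []
    while pos < len(tape) and tape[pos] is not FM:
        out.append(tape[pos])
        pos += 1
    return out

# "mt rewind; mt fsf 2", then read: positions us at the third file.
print(read_file(fsf(0, 2)))  # [b'c1', b'c2', b'c3']
```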
        
       | kortex wrote:
       | Is there a unix-style streaming tool, like tar/zstd/age, that
       | does forward error correction? I'd love to stick some ECC in that
       | pipeline, data>zstd>age>ecc>tape, cause I'm paranoid about
       | bitrot. I search for such a thing every few months and haven't
       | scratched the itch.
       | 
       | The closest is things like inFECtious, which is more of just a
       | library.
       | 
       | I would prefer something in go/rust, since these languages have
        | shown really high backwards compatibility over time. The last
        | thing you want is to find, 10 years later, that you can't build
        | your recovery tool. Will also accept some dusty C util with a
        | toolpath that hasn't changed in decades.
       | 
       | https://github.com/vivint/infectious
       | 
       | Ok I just dug up blkar, this looks promising, but the more the
       | merrier.
       | 
       | https://github.com/darrenldl/blockyarchive
        
         | StillBored wrote:
          | So, while others have pointed out that the media blocks are
          | ECC protected/etc, I think what you are really looking for is
          | application/fs control. LTO supports "Logical Block
          | Protection", which is metadata (CRCs) that is tracked/checked
          | alongside the transport-level ECC/etc on Fibre Channel & the
          | drive itself.
         | 
         | Check out section 4.9 in
         | https://www.ibm.com/support/pages/system/files/inline-
         | files/....
         | 
          | To be clear, this is a "user" level function that basically
          | says "here is a CRC I want the drive to check and store
          | alongside the data I'm giving it". It needs to be supported by
          | the backup application stack/etc if one isn't writing the
          | drive with SCSI passthrough or similar. It's sorta similar to
          | adding a few bytes to a 4k HD sector (something some FC/SCSI
          | HDs can do too), turning it into a 4K+X-byte sector on the
          | media that gets checked by the drive along the way, vs. just
          | running in variable block mode and adding a few bytes to the
          | beginning/end of the block being written (something that's
          | possible too, since tape drives can support blocks of
          | basically any size).
         | 
          | The problem with these methods is that one should really be
          | encoding a "block id" which describes which/where the block is
          | as well, since it's entirely possible to get a file with the
          | right ECC/protection information that is still the wrong
          | (version of the) file.
         | 
         | So, while people talk about "bitrot", no modern piece of HW
         | (except intel desktop/laptops without ECC ram) is actually
         | going to return a piece of data that is partially wrong because
         | there are multiple layers of ECC protecting the data. If the
         | media bit rots and the ECC cannot correct it, then you get read
         | errors.
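The "block id alongside a CRC" idea above can be sketched like this; the framing (8-byte big-endian id, CRC32 over id+payload) is invented for illustration and is not LTO's actual Logical Block Protection wire format:

```python
# Each block carries its own sequence number plus a CRC over (id + payload),
# so a stale-but-internally-valid block from the wrong position is detected.
import struct
import zlib

def frame(block_id: int, payload: bytes) -> bytes:
    body = struct.pack(">Q", block_id) + payload
    return struct.pack(">I", zlib.crc32(body)) + body

def unframe(expected_id: int, blob: bytes) -> bytes:
    (crc,) = struct.unpack(">I", blob[:4])
    body = blob[4:]
    if zlib.crc32(body) != crc:
        raise IOError("CRC mismatch: corruption in flight or on media")
    (block_id,) = struct.unpack(">Q", body[:8])
    if block_id != expected_id:
        raise IOError(f"valid CRC but wrong block: got {block_id}")
    return body[8:]

print(unframe(7, frame(7, b"hello")))  # b'hello'
```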
        
           | eternityforest wrote:
           | There's gotta be an API to get the raw data even if it's
           | wrong, right?
        
             | StillBored wrote:
              | Not usually; it's the same with HDs. You can't get the raw
              | signal data from the drive unless you have special
              | firmware, or find a hidden read command somewhere.
             | 
             | The drive can't necessarily even pick "wrong" data to send
             | you because there are a lot more failure cases than "I got
             | a sector but the ECC/CRC doesn't match". Embedded servo
             | errors can mean it can't even find the right place, then
             | there are likely head positioning and amp tuning parameters
             | which generally get dynamically adjusted on the fly. This
             | AFAIK is a large part of why reading a "bad" sector can
              | take so long: it's repeatedly rereading it, trying to
              | adjust/bias those tuning parameters in order to get a clean
              | read. And there are multiple layers of signal
              | conditioning/coding/etc, usually in a feedback loop. The
              | data has to get really trashed before it's not recoverable,
              | but when that happens it's good and done. (Think about even
              | CDs, which can get massively scratched/damaged before they
              | stop playing.)
        
         | dmitrybrant wrote:
         | If I'm not mistaken, the tape drive automatically adds ECC to
         | each written block, and then uses it to verify the block next
         | time you read it. So if there's bit rot on the tape (i.e. too
         | much for ECC to fix), it will just be reported as a bad block
          | with no data, and there wouldn't be any point in adding
          | "second-order" ECC from the user end.
        
           | metabagel wrote:
           | You're exactly right. There is substantial ECC in the LTO
           | format. If the drive can recover the data, then it's valid.
        
           | BenjiWiebe wrote:
            | There might be a point if you interleaved data and/or added
            | a much higher amount of error correction, such that you
            | could recover from isolated bad blocks.
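As a minimal illustration of that interleaving/extra-redundancy idea: stripe the data across equal-sized chunks plus one XOR parity chunk, so any single unreadable chunk is recoverable (RAID-4 style; real tools like par2 use Reed-Solomon and survive multiple losses):

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))

chunks = [b"1111", b"2222", b"3333"]   # equal-sized data stripes
parity = reduce(xor, chunks)           # one extra parity stripe

# Pretend the middle chunk came back as an unreadable bad block:
# XOR-ing the surviving chunks with the parity reconstructs it.
recovered = reduce(xor, [chunks[0], chunks[2], parity])
print(recovered == chunks[1])  # True
```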
        
         | c0l0 wrote:
          | It may not _exactly_ be what you are looking for, but if you
          | want to protect a stable data set from bit-rot after it's been
          | created, make sure to take a look at Parchive/par2:
         | 
         | https://en.wikipedia.org/wiki/Parchive
         | 
         | https://github.com/Parchive/par2cmdline/
        
           | genewitch wrote:
           | Parity archives used to be extremely popular back when dialup
           | was king. I've often wondered if there's a filesystem that
           | has that sort of granular control over how much parity there
           | is. I'd use it, for sure.
        
             | uniqueuid wrote:
             | ZFS is probably closest to what you want.
             | 
             | It allows you to choose the amount of parity on the disk-
             | level (as in: 1,2, or 3 disk parity in raidz1, raidz2 and
             | raidz3). You can also keep multiple copies of data around
             | with copies=N (but note that when the entire pool fails,
             | those copies are gone - this just protects you by storing
             | multiple copies in different places, potentially on the
             | same disk).
             | 
             | [edit] To add another neat feature that allows for
             | granularity: ZFS can set attributes (compression, record
             | size, encryption, hash algorithm, copies etc.) on the level
             | of logical data sets. So you can have arbitrarily many data
             | stores on a single pool with different settings. Sadly,
             | parity is not one of those attributes - that's set per
             | pool, not per dataset.
        
               | Notanothertoo wrote:
                | ZFS is king imo. Btrfs is the more liberally licensed
                | OSS competitor and ReFS is the m$ solution.
        
             | JustFinishedBSG wrote:
              | Still extremely popular (as in _the norm_) on Usenet
        
         | dmitrygr wrote:
         | man par2
        
       | amelius wrote:
       | With these prices for drives the market seems ripe for
       | disruption.
        
         | dsr_ wrote:
         | It already is, by spinning disks. Cheaper at the low end,
         | faster the whole way through, random access beats linear access
         | for end user expectations.
        
           | zozbot234 wrote:
           | SMR spinning disks are also being widely repurposed as
           | "archival", somewhat tape-like media since they turned out to
           | be quite low-performance for the most common use scenarios
           | (which means they were getting dropped from soft-RAID arrays,
           | etc.).
        
           | amelius wrote:
           | You are too much focused on read speed. I just want to write
           | huge amounts of data at a low cost, and don't mind waiting a
           | day for retrieval. I.e., how one normally uses backups.
        
             | lazide wrote:
             | Most of the time when people think backups they need faster
             | than 24 hr turn around to restore - because it usually
             | takes about that long to figure out they even need a
              | backup, and most people don't think ahead enough for a
              | 2-day recovery time to be useful for most use cases
              | nowadays.
             | 
             | If their local snapshots are dead too, or they look for it
             | and realize they can't find a copy of something they
             | thought they had, it's often because they needed that data
             | right away and it wasn't there when they went to get it.
             | Hence 'user expectations'.
             | 
             | That's not in a catastrophic case (which rarely happens)
             | that's the 'bob just realized he deleted the folder
             | containing the key customer presentation last Friday' or
             | 'mary just tried to open the contract copy she needed and
             | it's corrupted'.
             | 
             | If it's a once in 10 or 100 year or whatever event, a 1-2
             | day turnaround is not unexpected and everything else is
             | probably broken too. The file deleted or something got
             | screwed up happens more often and slow response there
             | grinds things to a halt - and causes a lot of stress
             | knowing it's not 'solved'.
        
               | amelius wrote:
               | I bet most companies who are confronted with ransom
               | demands would die for tape backup even if restoration
               | took a week (which is the amount of time they need anyway
               | to get the whole mess sorted out).
        
             | TheCondor wrote:
              | And durability. I've had a portable USB hard drive fall
             | over on my desk and it had major problems after that. Solid
             | state fixes that but it's expensive and I've heard they can
             | lose data if not plugged in with regularity
        
               | lazide wrote:
               | Yeah, SSD is not good for long term storage (like a copy
               | of your tax documents from last year you might need in 5
                | years). The expense for size also makes it infeasible to
                | keep ongoing rolled-up copies of everything, which is
                | one way of solving that.
        
           | KaiserPro wrote:
            | Kinda, but not. The problem with spinny disks is that you
            | have to allocate space for them. You can't quickly swap out
            | drives to take offsite.
            | 
            | What's grand about tape is that it's still faster to dump to
            | your library, eject the magazines and store them off site.
            | 
            | Whilst you can do that with HDDs (think snowballs but
            | bigger), it's a lot more expensive and error prone.
            | 
            | Tape serves a purpose, but that's pretty niche by today's
            | standards.
        
       | wglb wrote:
        | It would appear that Google backs up the internet on tape:
       | https://www.youtube.com/watch?v=eNliOm9NtCM
       | 
       | Or at least did at one time.
        
         | fishnchips wrote:
         | It probably still does. I was on the gTape SRE team until 2014
         | and we had lots and lots of tapes and tape libraries back then,
         | most of them giant beasts with 8 robots each. With the capacity
         | of new LTO generations constantly growing and the existing
         | investment in hardware and software it would be unusual to
         | discard that.
        
       | cassepipe wrote:
        | Apart from archiving huuuuuge amounts of data, does it make
        | sense for any business to invest in those, when you add in the
        | qualified work time it requires for the halved price it
        | provides? Plus the constant reinvestment in hardware. Plus the
        | fact that to get the data back you actually need a human to
        | fetch it for you and operate a machine.
        | 
        | Who uses this?
        
         | motoboi wrote:
         | Everyone.
         | 
          | It's much easier to store tapes in a fireproof and
          | water-resistant safe than to find fire- and water-resistant
          | storage.
          | 
          | So you can keep your backups on disk, but last-resort disaster
          | recoveries should be on tape somewhere.
          | 
          | Gmail has tapes[1]. And they saved my ass at least once. This
          | can give you a hint of how important and how much use tapes
          | get.
         | 
         | 1 -
         | https://www.datacenterknowledge.com/archives/2011/03/01/goog...
        
         | madduci wrote:
         | A lot of companies, trust me
        
       | archi42 wrote:
       | Something not mentioned by the author, but what I was told here
       | on Hacker News some years ago: If your drive has too much wear
       | (or misalignment of the drive head?) you might end up with tapes
       | that you can only read with exactly your drive.
        
         | detaro wrote:
         | That's something I've seen mentioned too but never could verify
         | if that is something that's actually true with modern tape
         | standards or not. (i.e. last I asked on HN I was told it wasn't
         | a concern anymore) If the drive needs to adjust to get precise
         | enough positioning anyhow, misalignment seems way less likely.
        
           | StillBored wrote:
            | That was true before embedded servo tracks (which is why the
            | author mentions you can't bulk erase LTO tapes); it's not
            | been true for ~20 years unless one was using DLT, DAT, etc.
        
           | op00to wrote:
           | It's absolutely true. There is a LOT more to tape storage
           | than meets the eye.
           | 
           | Let's say you're using LTO tapes as an archive. Did you know
           | LTO tape itself is abrasive, but that abrasive is meant to
           | wear over time with the intended use of the cartridges, which
           | was backups?
           | 
            | If you use new tapes a single time, the abrasive doesn't
            | wear and destroys the tape heads. You will go through a
            | drive head a month, running the drives 24/7. I had a library
            | used as a genomic storage archive with 8 drives (always
            | write, almost never read), and two were constantly out of
            | service, as we averaged two head replacements from IBM a
            | week.
            | 
            | This is much less of a factor on used tapes that have been
            | run through a drive a few times.
        
             | detaro wrote:
             | But that's different than "drive will produce tapes that it
             | can read, but other drives don't"? Because sure, drives can
             | fail and need service/replacement, but that's less
             | insidious than a drive producing tapes that are silently
             | unusable in other drives.
        
             | KaiserPro wrote:
             | It used to be true with DAT tapes.
             | 
              | I've not seen it on LTO. Where I work we had very large
              | tape libraries, with 25+ drives in them. We didn't have
             | drive affinity, so if that happened I would get an alert.
             | 
              | The other team used to import bulk data by receiving tapes
              | from all over London and beyond; there must have been
             | thousands of drives writing and reading that data. Plus we
             | didn't buy fresh tapes, and they were dropped, thrown, left
             | in the cold/sun, all sorts.
             | 
             | I think LTO is pretty solid.
        
             | eternityforest wrote:
             | I wonder why the head has to touch the tape at all? Does
             | the hard drive thing where you float a few nm away not
             | apply?
        
             | metabagel wrote:
             | I worked for an LTO tape drive manufacturer for 20 years,
             | and I never heard about this. I think something else was at
             | play here, although I could be wrong. The drives are often
             | used just as you did, although perhaps not always as
             | intensively. Data is written to tapes, and they are shipped
             | offsite. Basically, WORN (write once read never). The
              | backups are for an absolute emergency, such as a 9/11-type
              | event where a whole building comes down or a data center
             | burns to the ground.
             | 
             | A few factors which may have influenced what you
             | experienced:
             | 
             | * The quality of the tapes could be variable. In my
             | experience, some branded tapes were significantly inferior
             | to others.
             | 
             | * If the drive ran hot, then that may have contributed.
             | IIRC, IBM's LTO-3 drive ran very hot.
             | 
             | * If you don't write data to the tape fast enough, it won't
             | stream. It'll shoe-shine back and forth, as it runs out of
             | data, repositions backwards on the tape, and resumes
             | writing. I think this might affect the tape head life.
        
               | op00to wrote:
               | These were IBM drives in a QualStar XLS connected to
               | systems running FileTek StorEdge. I don't remember if
               | these were Fuji or Sony tapes, but I think Fuji, branded
               | Fuji.
               | 
               | We did have shoeshining issues in testing, but increasing
               | the amount of caching fixed that. Never heard of any
                | throughput issues in production, but... .edu, so you know
               | how well we monitored. That was a software issue anyway.
               | 
               | I think it was LTO5 era, but I don't rightly remember.
               | 
               | The IBM dude who handled all the hardware support would
               | take a look at everything, nod, and replace the drive. I
               | took him out for beer once and that's when he told me
               | about the issues with the tapes. I left for greener
               | pastures before that was solved, but it was going on for
               | a good year.
               | 
               | Maybe he liked the food trucks outside the building, or
               | maybe it was cheaper for them to replace the drives than
               | actually help us fix the problem. Anyway, thanks for the
               | insight! Glad I don't work on hardware anymore.
        
       | MayeulC wrote:
       | I'm wondering what would be the best way to store archival data?
       | 
       | A disk image plus compressed, encrypted then forward-corrected
       | `btrfs-send` snapshots sounds quite efficient to me. Take your
        | hourly, etc. snapshots to a regular disk, write monthly ones to
        | the tape until it fills up, then take another tape and repeat.
        | The downside is that you need to replay multiple diffs.
       | 
       | Or would it be a good idea to make more frequent writes? I'm not
       | sure what best practices are when it comes to tape and backup.
        
       | einpoklum wrote:
        | > LTO Tape is ... much cheaper than hard drives ... a 12TB SATA
        | drive costs around £18.00 per TB ... a LTO-8 tape that has the
        | same capacity costs around £7.40 per TB ... That's a significant
        | price difference.
       | 
       | Actually, it isn't very significant. Price factor of 2.5. I had
       | thought tape storage was cheaper than that. And then there are
       | the drives: A drive to write (3,000 GBP for LTO-8), and at least
       | a couple more drives for reading tapes.
       | 
       | At this price ratio, I would say that ease-of-use and
       | safety/robustness of the backed-up material are more important
       | considerations.
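The "price factor of 2.5" can be checked from the quoted per-TB prices (and it shifts further in tape's favor only once the drive cost is amortized over enough capacity):

```python
# Per-TB media price ratio from the quoted article figures (GBP, assumed).
hdd_per_tb = 18.00
tape_per_tb = 7.40
print(round(hdd_per_tb / tape_per_tb, 2))  # 2.43, i.e. roughly 2.5x
```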
        
         | shellac wrote:
         | Yes, this doesn't sound quite right to me but it may be an
         | economies-of-scale thing. I work on an HPC system and we budget
         | an order of magnitude less for tape storage, and that has held
         | for quite a few years.
        
         | AshamedCaptain wrote:
          | I am also worried about the long term. If there are new
          | generations so frequently and backwards compatibility is
          | limited or not guaranteed, I wonder if you'd be able to find a
          | working-condition tape reader for your 20-year-old tape...
         | 
         | At least it's likely I can find a USB port 20 years from now,
         | or a DVD reader (they are still being manufactured today, when
         | even more than 20 years have passed since their introduction,
         | and they are even compatible with much older CDs...).
        
         | ktpsns wrote:
         | What that comparison actually ignores is energy cost,
         | which can add up if you have all your disks running 24/7
         | and do not use power-saving functions (frequently turned
         | off in server contexts). Costs are in the ballpark of 5W
         | per drive; given a contemporary 16TB drive this means
         | about 0.3W/TB, and with 0.25EUR/kWh (a typical consumer
         | price in Germany) this is roughly 0.6 EUR per TB per year.
         | However, the replacement costs for these always-on disk
         | drives will probably be even higher.
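The math in that comment can be checked with a quick sketch (the 5 W, 16 TB and 0.25 EUR/kWh figures are the ones quoted above):

```python
# Idle power cost of always-spinning backup disks, using the figures
# from the comment above: 5 W per 16 TB drive, 0.25 EUR/kWh.
HOURS_PER_YEAR = 24 * 365

def eur_per_tb_year(watts_per_drive=5.0, tb_per_drive=16.0,
                    eur_per_kwh=0.25):
    """Annual electricity cost per terabyte of 24/7 storage."""
    kwh_per_year = watts_per_drive * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * eur_per_kwh / tb_per_drive

print(round(eur_per_tb_year(), 2))                   # 0.68 EUR/TB/year
print(round(eur_per_tb_year(eur_per_kwh=0.40), 2))   # at 40 c/kWh
```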
        
           | q3k wrote:
           | Another consideration related to this is that tapes,
           | being usually offline, are much more secure against
           | accidental (or malicious!) erasure when compared to
           | always-on hard drives.
           | 
           | Also related is that tapes can easily be transported
           | around/offsite, literally thrown in the back of a truck as
           | they are. Try doing that to hard drives and see how many
           | start throwing bad sectors after a round-trip.
        
             | einpoklum wrote:
             | HDD's can be taken offline. But beyond that - if you're
             | using HDDs as backup, you'll probably be using an HDD
             | drawer, e.g. something like one of these:
             | 
             | https://www.newegg.com/global/p/pl?d=hot+swap+hard+drive+ba
             | y
             | 
             | ... and the actual disks will usually be stored offline.
             | So, no accidental erasure. But I agree that tapes are
             | probably less sensitive to transportation.
        
           | piaste wrote:
           | If you want tape-like offline storage on HDDs, you can use a
           | SATA docking station. Keep the 'active' backup drives plugged
           | in, store full drives wherever you like.
           | 
           | As a bonus, they can generally be used to offline clone
           | drives.
        
           | archi42 wrote:
           | Tapes are offline and even require manual loading, so I think
           | it's feasible to mitigate this by just powering down the
           | backup system. At least that's what I do (with my primary
           | NAS). But yeah, disk idle usage should not be underestimated.
           | 
           | Also, some nit-picking: Energy prices in Germany are
           | currently MUCH higher than that. We moved and had to get a
           | new contract. Close to 40c/kWh. This makes your point a bit
           | stronger.
           | 
           | //edit: Also2, when doing the math I realized I should first
           | transcode suitable content to h265 (per TB saved the
           | necessary power is cheaper than a new disk), and as a second
           | step replace my four or five remaining 1 TB HDDs with a
           | single bigger drive to reduce the idle power draw (the NAS is
           | on a btrfs mixed-size RAID1).
        
         | pessimizer wrote:
         | > and at least a couple more drives for reading tapes.
         | 
         | Why?
        
           | op00to wrote:
           | Even in an "enterprise+++ class" multi petabyte, multi drive,
           | totally integrated from top to bottom tape archive for
           | scientific data, there would be all kinds of errors found by
           | our data validation process that would have failed an archive
           | restore. It's not just cache overruns; sometimes the
           | tapes or drives just screwed up silently.
        
           | benjojo12 wrote:
           | If you have 500TB of tape, the chances are that you are
           | reading at least one tape while also needing to write stuff.
           | 
           | I've personally never experienced that scale, I'm sure the
           | industry has some recommended ratio of drives to tapes.
        
             | op00to wrote:
             | When I evaluated this, it was all about read and write
             | access patterns. So much data coming in for so much amount
             | of time, that needs so much validation, and will be
             | restored so many times in the next few years, etc. It's
             | pretty easy if you know your data flows, but when it's a
             | big question mark, you just kind of throw hardware at it
             | and fix the bottlenecks when they come up. We usually wrote
             | more than we read, but we absolutely needed to keep read
             | capacity open.
        
           | dale_glass wrote:
           | Backups are there so that they can be restored. If your only
           | drive is dedicated to writing, then you may never bother
           | reading anything, and that's bad because you should verify
           | your backups.
           | 
           | Also, tape is slow. The MB/s is pretty nice on the latest
           | tech, but a tape is pretty big, so if you have a lot of stuff
           | it'll take a good while. Google says it takes 9.25 hours to
           | write a full LTO8 12TB tape. Which means that if you have a
           | sizable backup, in case of needing a full restore you might
           | well spend a whole week reading tapes.
           | 
           | And that's not accounting for that something might suddenly
           | break, and the time where that becomes important is right
           | when you need something restored urgently.
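The restore-time arithmetic above works out as follows (12 TB and 360 MB/s are LTO-8's native capacity and speed; the 200 TB total is just an illustrative assumption):

```python
# Sequential read/write time for LTO-8 at its native (uncompressed)
# rate: 12 TB capacity, 360 MB/s.
def hours_per_tape(capacity_tb=12, mb_per_s=360):
    return capacity_tb * 1e12 / (mb_per_s * 1e6) / 3600

def restore_days(total_tb, drives=1, capacity_tb=12, mb_per_s=360):
    """Wall-clock days for a full restore, ignoring tape swaps."""
    tapes = total_tb / capacity_tb
    return tapes * hours_per_tape(capacity_tb, mb_per_s) / drives / 24

print(round(hours_per_tape(), 2))   # 9.26 hours per tape
print(round(restore_days(200), 1))  # ~6.4 days for 200 TB on one drive
```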
        
       | connorgutman wrote:
       | I recently purchased a LTO-5 drive for my Gentoo-based NAS and
       | have a few key takeaways for those who are interested. Don't
       | buy an HP tape drive if you want to use LTFS on Linux! HPE
       | Library & Tape Tools is pretty much dead on modern Linux:
       | official support covers only RHEL 7.x and a few versions of
       | SUSE, and building from source is a dependency nightmare that
       | will leave you pulling your hair out. IBM drives have much
       | better Linux support thanks to
       | https://github.com/LinearTapeFileSystem/ltfs. That being said,
       | IMO, you should consider ditching LTFS for good ol' TAR! It's
       | been battle tested since 1979 and can be installed on basically
       | anything. TAR is easy to use, well documented, and makes way more
       | sense for linear filesystems. While drag&drop is nice and all, it
       | really does not make sense for linear storage.
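As a minimal sketch of the tar route, Python's stdlib tarfile module has pure-streaming modes that match tape's no-seek constraint. This writes to an ordinary file so it runs anywhere; on a real drive you would point `target` at the no-rewind device node (typically /dev/nst0 on Linux):

```python
# Streaming tar archives with Python's stdlib tarfile. The "w|"/"r|"
# modes never seek, which is exactly the constraint a tape imposes.
import tarfile

def write_archive(target, paths):
    # target can be a regular file here, or a tape node like /dev/nst0
    with tarfile.open(target, mode="w|") as tar:
        for p in paths:
            tar.add(p)

def list_archive(target):
    # one sequential pass over the archive, no random access needed
    with tarfile.open(target, mode="r|") as tar:
        return [member.name for member in tar]
```

On a real drive you would pair this with mt (e.g. `mt -f /dev/nst0`) for rewinding and positioning between archives.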
        
         | smackeyacky wrote:
         | Upvote for tar! LTFS seems like an overly complex solution to a
         | relatively simple problem that tar already solved. Treating
         | tapes like disks and trying to run a file system on them
         | ignores the way they work.
        
       | wazoox wrote:
       | Hum, that reminds me that I've written a somewhat more complete
       | user guide for LTO tapes, but in French:
       | https://blogs.intellique.com/tech/2021/08/20#BandesCLI
       | 
       | Let me know if you'd like an English version :)
        
         | wazoox wrote:
         | I did it anyway:
         | http://blogs.intellique.com/tech/2022/01/27#TapeCLI
        
       | cbm-vic-20 wrote:
       | > Unlike most block devices these are devices that do not enjoy
       | seeking of any kind.
       | 
       | Old-school DECtapes were actually random-access, seekable
       | block devices! They held 578 blocks of data, each block being
       | 512 bytes (or, to be more period-correct, 256 16-bit words),
       | for roughly 289KiB total. They could be read/written in both
       | directions. When mounted on a tape drive, the OS (like DEC
       | RT-11) would treat it just like how a PC DOS computer treats
       | a floppy: you could get a directory listing,
       | work with files, etc. The random access nature caused the tape to
       | move quickly back and forth across the tape head, a process known
       | as "shoe shining".
       | 
       | https://youtu.be/ZGBS8mBAfYo?t=579
        
         | rbanffy wrote:
         | I've seen AIX being installed from a DDS tape, after booting
         | from said tape.
         | 
         | Fun times.
        
         | StillBored wrote:
         | Tape can do random seeks, but it's generally append-only.
         | LTO, though, supports partitioning, which is utilized by
         | LTFS (https://www.lto.org/linear-tape-file-system/) to
         | provide a mountable filesystem abstraction. It works just
         | like any other filesystem, but one has to remember that
         | seeks are much slower than on HDDs, and that
         | overwriting/updating a file is basically like a versioned
         | FS where the old data is still being stored.
         | Edit: Also, tape formats tend to come in one of two scan
         | methods, since the tape is generally wider than the heads
         | (which are frequently actually multiple heads): helical
         | scan (think VHS/DAT) and serpentine. LTO is serpentine,
         | which means it writes a track from beginning to end, then
         | writes the next track in "reverse" from end to beginning,
         | then the next track again from beginning to end. Back and
         | forth until it hits its track limit.
         | 
         | So basically just about every modern drive reads and writes in
         | both forward and reverse.
         | 
         | Although shoe shining (backing up to start the next read/write)
         | is still a thing despite variable speed drives which try to
         | speed match to the data rate the host is reading/writing at.
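The serpentine layout described above can be modelled with a toy block-to-position mapping (the geometry figures are illustrative, not real LTO track counts):

```python
# Toy model of serpentine recording: track 0 runs beginning-to-end,
# track 1 end-to-beginning, and so on. Maps a linear block number to
# (track, physical position along the tape).
def serpentine_position(block, blocks_per_track):
    track, offset = divmod(block, blocks_per_track)
    if track % 2 == 0:
        pos = offset                          # forward pass
    else:
        pos = blocks_per_track - 1 - offset   # reverse pass
    return track, pos

# With 10 blocks per track, block 9 ends at one end of the tape and
# block 10 starts the reverse pass from that same end (no rewind):
print(serpentine_position(9, 10))    # (0, 9)
print(serpentine_position(10, 10))   # (1, 9)
```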
        
         | EvanAnderson wrote:
         | This makes me think about the Stringy Floppy:
         | https://en.wikipedia.org/wiki/Exatron_Stringy_Floppy
        
         | tssva wrote:
         | The Coleco Adam home computer also had tape drives which were
         | random-access seekable block devices. 2 tracks with 128 1k
         | blocks per track for a total capacity of 256k. Coleco called
         | their tapes digital data packs. They were standard compact
         | cassette tapes with some additional holes. If you drilled the
         | appropriate holes you could use standard tapes instead of
         | paying the Coleco premium.
         | 
         | CP/M required booting from a block device and as far as I know
         | the Coleco Adam was the only computer which could boot CP/M
         | from a tape. Once booted to CP/M the tape drives were treated
         | just as floppies.
        
       | tlamponi wrote:
       | Interesting read, as with most of Ben's blog. And yeah,
       | buffering is definitely required to get acceptable speed out
       | of tape tech.
       | 
       | If you want a LTO Tape solution with more bells and whistles you
       | could check out Proxmox Backup Server's tape support:
       | 
       | https://pbs.proxmox.com/docs/tape-backup.html
       | 
       | We also rewrote mt and mtx (for robots/changers) in rust, well
       | the relevant parts:
       | 
       | https://pbs.proxmox.com/docs/command-syntax.html#pmt
       | 
       | https://pbs.proxmox.com/docs/command-syntax.html#pmtx
       | 
       | The introduction/main feature section of the docs contains
       | more info, if you're interested:
       | https://pbs.proxmox.com/docs/introduction.html If you have
       | your non-Linux workload contained in VMs, and maybe even
       | already use Proxmox VE for that, it really covers safe and
       | painless self-hosted backup needs.
       | 
       | Disclaimer: I work there, but our projects are 100% open source,
       | available under the AGPLv3: https://git.proxmox.com/
        
         | azalemeth wrote:
         | Do you run a service where I can give you data and reasonable
         | money, and you store it on tapes for me? Low cost cloud storage
         | prices seem very distant from this, because presumably it's
         | usually spinning rust and not tapes that are doing the storage.
         | I'd be into a cheaper, larger storage service where this was
         | offered.
        
           | tlamponi wrote:
           | No, we don't provide hosting services - only the
           | software: Proxmox VE for the hypervisor (VMs and Linux
           | containers), clustering and hyper-converged storage (Ceph
           | and ZFS are integrated directly, and most Linux storage
           | works somewhat too); then Proxmox Backup Server with PVE
           | integration, which can do deduplicated and incremental
           | sending of backups and save that to any Linux FS or,
           | well, LTO tapes; and lastly (at least currently, we've
           | got more in the pipeline) there's Proxmox Mail Gateway,
           | the oldest project and a bit of a niche, but there's not
           | much else like it available today anymore.
           | 
           | > and you store it on tapes for me?
           | 
           | I mean, we can do client-side encryption and efficient
           | remote syncs, so such a service would be possible to
           | pull off with PBS, but no, we don't have the bunker or
           | dungeon to shelve all those LTO tapes at the moment :-)
        
           | Johnny555 wrote:
           | What is reasonable money? AWS Glacier Deep Archive is around
           | $1/TB/month. Since it includes Multi-AZ replication for
           | "free", you'd have to store multiple tapes in multiple
           | facilities to get the same durability with tapes.
           | 
           | Retrieval costs are additional of course, and depend on how
           | quickly you need access to the data, but if you just want to
           | store data long term in case of disaster, $1/TB for multi-AZ
           | replicated data seems like pretty reasonable pricing.
           | 
           | LTO-6 tapes hold 2.5TB of data (uncompressed), assuming you
           | store 2 for redundancy, you'd need to find a place that will
           | store them for $1.25/tape/month to break even, plus you're
           | paying $25 for the tape itself, so over 3 years, that's
           | almost another $1/month/tape. Plus the tape drive itself is
           | around $1500.
           | 
           | You can use newer tape technology for better economies of
           | scale, but your buy-in cost is higher due to the higher price
           | of the tape drive, so you'd need a pretty high volume of data
           | to break even.
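A rough sketch of that break-even, using the figures quoted above ($1/TB/month for Glacier Deep Archive, $25 LTO-6 tapes at 2.5 TB native, two copies, a $1500 drive amortized over three years; offsite tape-storage fees are left at zero):

```python
# Break-even sketch: S3 Glacier Deep Archive at ~$1/TB/month vs. two
# redundant LTO-6 tape copies plus a drive, amortized over 36 months.
import math

def glacier_monthly(tb, usd_per_tb=1.0):
    return tb * usd_per_tb

def tape_monthly(tb, months=36, copies=2, tape_tb=2.5, tape_usd=25.0,
                 drive_usd=1500.0, storage_usd_per_tape=0.0):
    tapes = math.ceil(tb / tape_tb) * copies
    media = tapes * tape_usd / months
    drive = drive_usd / months          # ~$41.67/month on its own
    return media + drive + tapes * storage_usd_per_tape

# Tape only wins once capacity is well into the tens of terabytes:
for tb in (10, 50, 100):
    print(tb, round(glacier_monthly(tb), 2), round(tape_monthly(tb), 2))
```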
        
             | terafo wrote:
             | Glacier cost in the cheapest region is $3.60/TB/month,
             | plus at least $50 to download that terabyte once it's
             | needed (if the hardware you're backing up is not in
             | AWS), and I don't even factor in retrieval costs. You
             | can get HDD storage cheaper than this (twice as cheap
             | with some providers) if
             | you are willing to use dedicated servers. And they come
             | with unlimited traffic. And you can use hardware there for
             | something. Glacier is expensive AF.
        
               | Johnny555 wrote:
               | _S3 Glacier Deep Archive_ is the closest equivalent to
               | off site tape storage.
               | 
               | From their pricing page:
               | 
               | S3 Glacier Deep Archive - For long-term data archiving
               | that is accessed once or twice in a year and can be
               | restored within 12 hours - us-east-2 (Ohio)
               | 
               | All Storage / Month $0.00099 per GB
               | 
               | https://aws.amazon.com/s3/pricing/
        
               | terafo wrote:
               | Sorry, I confused it with Archive access tier. Still, you
               | need to spend at least 50$ to download it from AWS.
        
               | Johnny555 wrote:
               | This is deep archive offsite tape storage, not something
               | you'd need to restore often.
               | 
               | When I last managed offsite tape backups, I never planned
               | on really needing to retrieve the data -- I had the data
               | on disk and on the most recent tapes. (I did do periodic
               | restore tests)
               | 
               | If I had to restore the data, I wouldn't care how much it
               | costs (within reason).
        
             | [deleted]
        
         | aperrien wrote:
         | That is really impressive! Are the Proxmox tape utilities
         | separate from Proxmox itself? I have a Synology NAS that I'd
         | like to back up to tape. I actually have a tape library, but I
         | haven't seen anything that looks like a simple solution for
         | this until now.
        
           | tlamponi wrote:
           | Well, the CLI tools are not really coupled to Proxmox
           | Backup Server and could be built for most somewhat
           | modern Linux distros, quite possibly also other
           | *nix-like systems.
           | 
           | The whole tape management is in the common PBS API, so
           | that'd be a bit harder to port, but not impossible. For
           | example, I made some effort to get it all to compile on
           | AArch64 (ARM), and while we do not officially support
           | that currently, there are some community members that
           | run it just fine.
           | 
           | So, maybe, but could require a bit more hands-on approach. If
           | you run into trouble you could post in the community forum
           | (<https://forum.proxmox.com>).
        
       | epilys wrote:
       | What I never see explained is exactly which PCI card I
       | should get to make a full-sized SAS drive work with my
       | desktop PC. Looking at server component stores, I see
       | three-digit prices for SAS controllers, yet the author
       | mentions they are cheap.
        
         | c0l0 wrote:
         | My advice: On eBay (or any other platform that makes it easy to
         | buy used hardware components), go look for "sas2008 4e", and
         | check out the offers. You should be able to get a decent HBA
         | driven by mpt2sas/mpt3sas for around 40 to 60US$.
        
       | albertzeyer wrote:
       | I'm interested specifically in long-term archiving. These
       | tapes claim 30 years; I have read that some types of CD,
       | DVD or Blu-ray can last much longer.
       | 
       | https://superuser.com/a/71239/37009
       | 
       | For example the M-DISC (https://en.wikipedia.org/wiki/M-DISC).
       | 
       | > Millenniata claims that properly stored M-DISC DVD recordings
       | will last 1000 years.
        
         | buttonpusher wrote:
         | Yes, but storing many TBs on several low volume disks is a PITA
         | unless you can invest in a robotic library.
         | 
         | I wonder if Sony's ODA format could ever become more popular in
         | the consumer market. I've never heard anybody mention it
         | before.
         | 
         | Alternatively, I wonder if there could even be a "prosumer"
         | robotic library system for common optical disks, something like
         | a desktop archival data jukebox...
        
           | albertzeyer wrote:
           | There are many examples of cheap self-built robotic
           | systems (basically robotic CD changers), e.g.:
           | 
           | https://hackaday.com/tag/cd-changer/
           | 
           | http://hackalizer.com/jack-the-ripper-is-an-automated-diy-
           | di...
           | 
           | http://hackedgadgets.com/2006/06/07/cd-changing-lego-robot/
           | 
           | Yes, you definitely want something like that, and to
           | extend it further.
        
         | londons_explore wrote:
         | If I had a large amount of data I needed to archive long
         | term and cost-effectively, I would archive it to 12
         | different media with a 4+8 erasure code, such that if any
         | 4 of the 12 media are readable, then I can recover the
         | data. I'd choose media like a few types of hard disk
         | (different vendors), DVDs, SD cards, USB memory sticks,
         | and tapes.
         | 
         | I would then store those bits of media geographically and
         | politically distributed. And I'd store it with paper documents
         | describing the encoding, the file formats, the compression, any
         | encryption, etc. I'd also include a few physical computers (eg.
         | a raspberry pi or laptop) that has all necessary software to
         | read, decode, and display the data. Set it up to be usable by a
         | non-expert - in 1000 years time, there may be nobody who knows
         | how to use a shell or open a file!
         | 
         | And I'd have a 2nd copy of the whole lot on hard drives
         | connected to the internet for day to day serving of the data to
         | people who need to see it. All the stuff above is only needed
         | in case of organisational failure, war, civilisation collapse,
         | etc.
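A toy version of the "any 4 of 12" scheme, done as polynomial interpolation over the prime field GF(257) rather than a production Reed-Solomon code (a real archive would use an RS library such as zfec over GF(256); this sketch just demonstrates the recovery property for four data symbols):

```python
# Toy "any 4 of 12" erasure code via polynomial interpolation over the
# prime field GF(257): encode four symbols as the coefficients of a
# cubic, hand out its value at 12 points, recover from any 4 points.
P = 257  # smallest prime above 255, so every byte value fits

def encode(data4, n=12):
    """data4: four ints in [0, 256). Returns n (x, y) shares."""
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(data4)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def _mul_by_x_minus(poly, root):
    # multiply coefficient list (low -> high) by (x - root), mod P
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - c * root) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out

def decode(shares):
    """Lagrange-interpolate any 4 shares back to the data symbols."""
    xs = [x for x, _ in shares[:4]]
    ys = [y for _, y in shares[:4]]
    coeffs = [0, 0, 0, 0]
    for j in range(4):
        basis, denom = [1], 1
        for m in range(4):
            if m != j:
                basis = _mul_by_x_minus(basis, xs[m])
                denom = denom * (xs[j] - xs[m]) % P
        inv = pow(denom, P - 2, P)  # modular inverse via Fermat
        for i in range(4):
            coeffs[i] = (coeffs[i] + ys[j] * basis[i] * inv) % P
    return coeffs
```

`decode` returns the original four symbols no matter which four shares survive, which is the property the comment relies on.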
        
           | albertzeyer wrote:
           | That all sounds nice... but do you actually do that? I'm
           | sure you have some amount of data (maybe not so large)
           | that you want to back up long-term, as most of us do?
           | 
           | If you do that, I would really love to read some more details
           | on how you actually organize that.
        
             | raron wrote:
             | Maybe Github's "code vault" would be interesting for you:
             | https://github.com/github/archive-
             | program/blob/master/GUIDE....
             | https://archiveprogram.github.com/
        
             | londons_explore wrote:
             | My personal data I have no need to keep beyond my own
             | lifespan, and I don't have much of it, so it's easy.
             | 
             | The above is what I have set up for some organisations who
             | want to keep data for thousands of years.
             | 
             | There are other bits to the process, like every 10-30
             | years, repeat the process with the new data _and_ the old
             | data. This time, the  'old' data will be much smaller
             | compared to the storage mediums, so keep that data
             | uncompressed, preferably unencrypted, and un-erasure coded
             | in every geographic location. That removes many
             | barriers to accessing the data, and increases the
             | chances that someone who finds it in 200 years bothers
             | to recover it.
             | 
             | Sadly, in the future world there is a high chance some
             | of the data is copyrighted, illegal knowledge, or
             | GDPR-impacted, and all records need to be erased.
             | There isn't really a
             | good solution to that. It's almost impossible to protect
             | against future humans _wanting_ your data gone.
        
         | c0balt wrote:
         | IIRC the cost per TB, compared to tape, made discs
         | unviable for most backup/archival applications.
        
           | paulmd wrote:
           | Depends on who's asking. Amazon Glacier never formally
           | disclosed their storage medium (at least as of a few years
           | ago) and one of the theories on what it might be was actually
           | a robotic optical disc changer library based on BD-XL, and
           | the cost/capacity actually does math out. Yeah, discs might
           | be $15 a pop (for a quad layer/128GB disc) for you as a
           | consumer, but when you're Amazon and you'll be buying the
           | complete output of at least one optical disc factory, the
           | economies of scale kick in. It's just expensive because
           | there's no market for 128GB media for consumers (and honestly
           | these days hardly any market for WORM media at all as a
           | consumer), it's not inherently that expensive to make the
           | discs.
           | 
           | (I believe the final consensus pointed to arrays of HDDs
           | where most of them are powered off, and the number of "live"
           | drives per rack is bounded to allow high density/low cost,
           | hence the need for access time/service level bounds, but the
           | BD-XL idea is still intriguing!)
           | 
           | With the consumer discs, even considering cost per GB, the
           | amount of effort required to handle a large library of low-
           | capacity discs is just too great even if the cost is a little
           | bit better. 128GB discs would have been very usable 5 years
           | ago but again, those discs were never affordable to
           | consumers, and the 25GB was still some effort at that time.
           | Today even 128GB is not all that much, as data has grown. As
           | far as I know there is nothing realistic on the horizon to
           | replace blu-ray with higher capacity either, if movie content
           | started being released in 8K it probably would be something
           | like BD-XL with AV1 encoding (or maybe H265 again), not a
           | fundamentally new iteration like DVD->BD.
           | 
           | The future for consumer storage seems to be SSDs and hard
           | drives for fast and slow/bulk storage, and cloud for nearline
           | storage. Tape is still relevant for enterprises though
           | especially in automatic libraries.
        
             | c0balt wrote:
             | Interesting, I didn't know that about Glacier.
             | 
             | The theory of hard drives being shut off/powered on
             | dynamically in a rack sounds intriguing. Sounds simple
             | and yet difficult because of the rare use case, i.e.
             | no commodity hardware available. Maybe something to
             | test out for colo backups to keep power usage down and
             | prolong disk health.
        
           | numpad0 wrote:
           | Capacity per disc too. Blu-ray tops out at 128GB
           | (BD-XL), and there's no cheap and easy way to automate
           | disc handling to work around that.
        
         | at_a_remove wrote:
         | I am also interested in some long-term archiving: in
         | particular, .ISOs of various Blu-Ray, DVD, and CD media
         | releases.
         | 
         | Still, aside from it being prohibitively expensive (LTO-8 seems
         | like something of a floor given the size of Blu-Rays), tape
         | backups seem to be a hard area to get into. I did some crappy
         | little DLTs in the 1990s but nothing since, so "what software?"
         | and the like questions are all new to me. And this would be
         | with just a single drive, not even a library.
        
         | Robotbeat wrote:
         | Magneto-optical disks using glass media (instead of plastic)
         | have a rated stable media lifetime of at least 50 years and can
         | probably last a century or longer. Glass DVDs are a thing and
         | often can be read in regular DVD drives.
        
           | dehrmann wrote:
           | > Magneto-optical disks using glass media
           | 
           | Are there any commercial products that use this technology?
        
             | eternityforest wrote:
             | Not magneto-optical, but M-DISC makes a Blu-ray that
             | supposedly lasts 1k years. Some people don't trust it,
             | though.
        
             | EvanAnderson wrote:
             | I believe mag-op has fallen out of fashion. I worked with
             | HP-branded mag-op "platters" and drives back in the late
             | 2000's. Plasmon and Sony both had offerings in that space
             | too.
        
       | c0l0 wrote:
       | I recently started looking into using an LTO-7 tape drive
       | that I got handed down, along with a few dozen pristine
       | LTO-6 tapes, for archiving purposes. I got to play around a
       | bit with SAS HBAs, and was kinda shocked how much of a
       | difference that can make in the user (or shall I say
       | sysadmin?) experience: LTO-6 tapes are spec'd for transfer
       | rates of around 150MB/s, so well within the reach of even
       | the first SAS gear generation. However, the
       | very first SAS HBA with external SFF-8088 connector I managed to
       | get my hands on (an LSI SAS1068e) topped out at a disappointing
       | 80MiB/s, no matter what I tried in terms of blocking and
       | buffering. Switching to a more modern (but still old) LSI
       | SAS2008-based HBA got me close to the theoretical maximum.
       | 
       | Then there's the (to me, still open) question of how to best
       | use the actual tape storage capacity... Since my hardware is
       | newer
       | than LTO-5, LTFS (https://github.com/LinearTapeFileSystem/ltfs)
       | is an option for convenient access, especially listing tape
       | contents, but that could make it hard for other people down the
       | line to restore data from the tapes I create.
       | 
       | It's probably safest to assume that tar will always be there, at
       | least wherever there's tape, too. GNU tar also handles multi-
       | volume/-tape archives, which seems like a necessity if you need
       | to back up amounts of data that exceed a single tape's capacity.
       | Then again, if you want to use encryption with actual tar
       | (important for the kind of data I need to archive), your only
       | option seems to be piping the whole archive through something to
       | compress the stream, which will make accessing individual records
       | in the archive opaque to the drive itself... and you can't just
       | dispose of individual keys to make select parts of the archived
       | data go away for good, either.
       | 
       | Also, I would like to conserve as much tape as (conveniently)
       | possible in my archiving adventure. There's "projects" (i.e.,
       | top-level directories of directory trees) that consume more than
       | one tape of their own, and then there's smaller projects that you
       | can bin-pack together onto tapes that can fit more than one such
       | project.
       | 
       | I've started implementing a small python wrapper around GNU tar
       | to solve a number of these problems by bin-packing projects into
       | "tape slots" and also keeping track of tape-to-file mappings in a
       | small sqlite database, but a workable solution for the encryption
       | problem(s) is not something I managed to come up with yet... If
       | someone has an idea (or better yet, a complete and free
       | implementation of what I am trying to hack together :)), please
       | be so kind and let me know!
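The bin-packing part of such a wrapper might look like the following first-fit-decreasing sketch. The 2500 GB capacity (roughly an LTO-6 tape), the [partN] naming, and the idea of pre-splitting oversized projects into full-tape chunks (as GNU tar's multi-volume mode would) are all illustrative assumptions; the sqlite bookkeeping and the tar invocation are left out:

```python
# First-fit-decreasing packing of project sizes (GB) into fixed
# "tape slots". Capacity 2500 GB approximates an LTO-6 tape.
def pack_projects(sizes, capacity=2500):
    """sizes: {project_name: size_gb}. Returns a list of slots,
    each slot being the list of (chunk) names stored on that tape."""
    chunks = []
    for name, size in sizes.items():
        n_full, rest = divmod(size, capacity)
        # oversized projects become full-tape chunks plus a remainder,
        # the way GNU tar's multi-volume mode would split them
        chunks += [(capacity, f"{name}[part{i}]") for i in range(int(n_full))]
        if rest:
            chunks.append((rest, f"{name}[part{int(n_full)}]" if n_full else name))
    slots = []  # each entry: [free_space, [names]]
    for size, name in sorted(chunks, reverse=True):
        for slot in slots:
            if slot[0] >= size:      # first slot with room
                slot[0] -= size
                slot[1].append(name)
                break
        else:
            slots.append([capacity - size, [name]])
    return [names for _, names in slots]

print(pack_projects({"big": 6000, "a": 1200, "b": 1000, "c": 300}))
```

Sorting chunks largest-first before packing (first-fit-decreasing) keeps the slot count close to optimal without any search.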
        
         | TheCondor wrote:
         | LTFS has been reasonably well supported and it's fairly open.
         | (I think it's totally open and published, but I haven't drilled
         | deep in to it) I haven't manually restored files but I have
         | switched vendors and it was transparent. It makes tape almost
         | shockingly good, if you can identify by name what you want to
         | recover you can recover it quite quickly.
         | 
         | I had previously used Blu-ray for backups. I think they
         | are fairly durable if you have a dry, cool place to store
         | them, but if you have to find data spread over 20 discs,
         | it's quite a pain. Now it would feel better if Red Hat or
         | SUSE or somebody cooked LTFS into their products as a
         | first-class thing. I
         | think the catastrophic recovery process would involve building
         | enough of a system to download and install ltfs to access the
         | tapes. I could also create a "recovery system" and then just
         | tar that on to a tape too.
         | 
         | My strategy has been to keep things relatively warm, and
         | when LTFS starts to feel like a liability I'm going to
         | move the whole archive to something else. Fortunately it's
         | not 100s of Blu-ray discs, it's tens of tapes, so it will
         | take some hours, but it's mostly waiting on data to
         | stream.
        
           | [deleted]
        
         | ndespres wrote:
         | I think what you are describing is a feature of the Amanda
         | backup system, which might be worth a look. It supports writing
         | to a library of "virtual tapes" which can then be backed by
         | real tapes, tape libraries, hard disks, etc. and will handle
         | the splitting/overflow problem that you are dealing with.
         | 
         | https://www.zmanda.com/downloads/
        
       | abbbi wrote:
       | if one wants to play with virtual tape libraries, quadstorvtl is
       | a nice solution to that:
       | 
       | https://quadstor.com/
       | 
       | unfortunately they don't seem to have an open VCS for the
       | source... (other than really old versions on github)
       | 
       | other than that there is mhvtl:
       | 
       | http://www.mhvtl.com
        
       | Synaesthesia wrote:
       | That's a lot of storage! Can't really think of a use for this
       | (200tb plus) personally but it is appealing.
        
         | throw0101a wrote:
         | Tape makes more sense the larger you go, as it helps amortize
         | the fixed/upfront costs. The incremental cost of buying more
         | tapes (which are re-usable) isn't that much at scale. It's often
         | relatively cheap insurance against data loss for many
         | organizations.
         | 
         | A lot of 'enterprise' backup software is also now coming with
         | hooks into cloud storage (e.g., S3 APIs), but then you have to
         | worry about bandwidth and the time it takes to get the bits
         | offsite at "x" bits/second.
         | 
         | Of course you also have to worry about retrieving the data in
         | case of disaster per the Recovery Time Objective:
         | 
         | *
         | https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Tim...
         | 
         | Also: a backup has not happened until you try and succeed your
         | recovery process.
        
           | metabagel wrote:
           | > but then you have to worry about bandwidth and the time it
           | takes to get the bits offsite at "x" bits/second.
           | 
           | Reminds me of the saying that the fastest throughput is
           | achieved by a 747 full of hard drives.
           | 
           | > Also: a backup has not happened until you try and succeed
           | your recovery process.
           | 
           | A thousand times this.
        
         | simcop2387 wrote:
         | For me it's a lot about just being a data hoarder and never
         | _having_ to delete something because I'm low on storage. About
         | half of my system though is taken up by system backups and
         | virtual machines. I should do a cleanup of those, but the
         | freedom of just being able to spin up something new or put a
         | new backup on there without ever going, "do I have enough space
         | for this?" is rather nice.
        
           | organsnyder wrote:
           | I also rarely/never delete anything, but my ~2TB NAS still
           | has plenty of room. I guess it makes a difference that the
           | only media I store is my own photos and videos.
        
       | Spooky23 wrote:
       | Backups are a really interesting business. I helped out a
       | colleague a few years ago with a project in a big data center and
       | it was like a whole world that nobody knew existed.
       | 
       | Because of the RTOs and backup windows, the supporting
       | infrastructure was _fast_. The caching layer stuff was the
       | fastest disk in the data center by far, and the team was a small,
       | tight group of people who basically honed their craft by meeting
       | auditor and other requirements. The management left them alone
       | and they did their thing.
       | 
       | That was about a decade ago now; those guys have all moved on to
       | really big things.
        
         | StillBored wrote:
         | It's still that way. The Netflix guys get a lot of press for
         | their bandwidth numbers, but plenty of backup systems were
         | getting similar (or greater) bandwidth numbers years ago, since
         | many of the caching stacks are basically PCIe or memory
         | bandwidth limited. The 300MB/sec number the author lists is
         | really slow, and likely appropriate for LTO3/4 (IIRC, the
         | Wikipedia numbers are understated). LTO7+ can peak at > 1GB/sec,
         | with modern drives going even faster if compression is left
         | enabled. So, given a library with a few dozen drives, the
         | bandwidth gets insane. (ex: SL8500)
        
           | trasz wrote:
           | Someone should ask Spectra Logic folks for their numbers :-)
           | 
           | (Spectra Logic's tape libraries run FreeBSD too.)
        
             | monocasa wrote:
             | SpectraLogic's code isn't in the data plane, you hook up to
             | the drives directly, and the drives can forward changer
             | requests to the internals of the library. So it's however
             | fast the drives are (which are all third party).
             | 
             | Also last I checked freebsd was used for their disk
             | product, not tape.
        
       | johnklos wrote:
       | LTO has been around for more than twenty years, true, but not
       | quite thirty, so we can't test the claim of thirty years of
       | shelf life. But DLT, which is surprisingly similar, came out in
       | 1984, and lots of thirty-year-old and older DLT media has been
       | shown to be readable.
       | 
       | The tape drives themselves are much more of an issue than the
       | tapes. It's a shame, because it necessitates moving data on older
       | tapes to newer generation tapes after a few generations (which
       | reminds me I have to do that with some LTO-3 tapes).
        
         | wheybags wrote:
         | My one experience of digital magnetic tape is mini-dv
         | cassettes. I recently ripped a bunch of old home videos from
         | some cassettes from the 2000s, and quite a few were fairly
         | damaged. Compared to the VHSes from the same time and even
         | older, they were way worse.
        
         | jgrahamc wrote:
         | Speaking of tape lifetimes, my old cassette CrO2 tapes seem to
         | have survived my parents' house:
         | https://blog.jgc.org/2009/08/in-which-i-switch-on-30-year-ol...
        
       | grapescheesee wrote:
       | Many clients I have seen using tapes for archive or onsite backup
       | keep them in a humidity and temperature controlled device (looks
       | like a mini fridge). Seems the emphasis is on humidity for the
       | onsite backup rotations.
        
       | watersb wrote:
       | Everyone who cares about backups chooses a backup system design.
       | 
       | Anyone who cares about their stuff needs to practice a full
       | emergency RESTORE.
       | 
       | I have met very few people who actually do that. For most systems
       | I've seen, the first full test of the restore process is a very
       | scary first production usage of the restore process.
       | 
       | Which is very exciting, sure. I don't want excitement in my data
       | management life.
       | 
       | (I actually see weekly test of onsite backup power at the local
       | banks, and at some large commercial kitchens. Those diesel
       | generators are very loud. I've never seen systematic test of UPS
       | or generators in a front-office environment.)
        
       | smackeyacky wrote:
       | There is one recommendation there I find a bit questionable, and
       | that's encryption. If you are out of options and restoring from
       | tape, it might be better to have the data uncompressed and
       | unencrypted. It's possible, after some physical disaster, that
       | you are on somebody else's infrastructure, and having encryption
       | on your data doubles the problems you might have.
       | 
       | I use an ancient LTO2 drive for last-resort backups that are off
       | cloud and off premises. It's more peace of mind than practical on
       | a daily basis, but I did find myself restoring a few files a
       | couple of weeks ago after I fat-fingered an rm command. It was
       | quicker than getting them from S3 Glacier.
        
         | El_RIDO wrote:
         | I'd like to suggest two arguments that made me use software
         | encryption on my tapes instead: 1. You don't have to trust the
         | hardware and can use a tool I trust and have the sources for.
         | 2. If you encrypt yourself you can combine it with something
         | like par2 to generate error detection and recovery data,
         | letting you restore the encrypted file off a damaged tape.
         | 
         | A downside of encrypting yourself is that you can't benefit
         | from the hardware compression either, hence the article's
         | suggestion to compress in software before encrypting as well.
         | 
         | Personally, my tape writing workflow is: dar (per file
         | compression, skips uncompressable mime types + encryption)
         | followed by par2cmdline with 30% redundancy. For comparison:
         | CD-ROMs have 33% redundancy information (8 bits per 24 bits,
         | CIRC encoding).
        
         | op00to wrote:
         | The tapes compress themselves. There's no real need for file
         | compression.
        
         | benjojo12 wrote:
         | I agree somewhat. Encryption is more critical on tape because
         | there is no easy path to wiping a tape, and in a company
         | situation if you need to erase something in your backups too
         | (think GDPR erasure), then encryption is reasonably critical
         | unless you want to go through all of your cold backups.
         | 
         | For my archival use (the reason why I got into this in the
         | first place) I do not encrypt nor compress the data going to
         | tape. For server/desktop backups, they are compressed and
         | encrypted.
        
           | rowanG077 wrote:
           | It's trivial to wipe data on a tape with a degausser. You
           | destroy the tape in the process, though, since it also wipes
           | out the factory-written servo tracks.
        
             | kortex wrote:
             | Is there a way to restore the servo tracks? This sounds
             | like the kind of hack a dedicated nerd could pull off with
             | an arduino and duct tape.
        
               | ansible wrote:
               | Without looking into the specs, at the very least, you'd
               | need to modify the LTO drive firmware. The drive itself
               | isn't designed to operate without the servo tracks. Those
               | are written to newly-manufactured tapes with special
               | equipment at the factory.
               | 
               | So, it would take a very dedicated nerd indeed.
        
               | rowanG077 wrote:
                | Not that I know of. But the positioning on recent-gen
                | LTOs is pretty tight. I don't think it's out of the realm
               | of possibility for a dedicated nerd but it won't be
               | trivial.
        
           | throw0101a wrote:
           | > [...] _nor compress the data going to tape._
           | 
           | Just to note that tape drives have built-in compression that
           | generally is done transparently in the background. So while
           | using something like _zstd_ (per the article) may get more
           | bits on a given tape, there is some compression that one gets
           | "for free" without doing anything at all.
           | 
           | * https://en.wikipedia.org/wiki/Linear_Tape-
           | Open#Optional_tech...
           | 
           | * https://en.wikipedia.org/wiki/Magnetic_tape_data_storage#Da
           | t...
        
             | benjojo12 wrote:
             | I mention this in the post itself
        
               | lights0123 wrote:
               | You mention that they're advertised in the amount of
               | compressed data that can be stored, not that they
               | actually compress data themselves. I thought you meant
               | that they assume you use a compression algorithm
               | yourself.
        
               | benjojo12 wrote:
               | Ah, ok fair enough! I should have pointed that out more
               | clearly!
        
               | throw0101a wrote:
               | You wrote:
               | 
               | > _Drives above LTO-4 have built-in hardware encryption,
               | however I would steer away from using it and instead just
               | encrypt data yourself (possibly with the tool I helped
               | make called age!). Like most things, you should also
               | consider compressing your data before encrypting and
               | writing it to tape. LTO tape capacities are often quoted
               | in their "compressed capacity" which is a little cheeky
               | since it assumes basically over a 50% compression ratio,
               | this is not at all likely to be true if you are writing
               | video or other lossy mediums like images etc to the tape.
               | I generally run my data through zstd to compress and then
               | age to encrypt. Zstd and age are quite fast and I've not
               | found them to impede performance noticeably._
               | 
               | If someone is not familiar with tape drives, I think it
               | would be easy not to realize that the compression is
               | built into drives like the explicitly called out "built-
               | in hardware encryption".
        
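[Ed.: The compress-before-encrypt ordering discussed above can be shown directly: ciphertext is statistically random, so running the drive's (or any) compressor after encryption gains nothing. A small stdlib-only sketch, using random bytes as a stand-in for encrypted output:]

```python
import zlib
import secrets

# Redundant, highly compressible data vs. random bytes, which look
# exactly like the output of a good cipher.
text = b"backup backup backup " * 1000
ciphertext_like = secrets.token_bytes(len(text))

compressed_plain = zlib.compress(text, 9)
compressed_cipher = zlib.compress(ciphertext_like, 9)

# Plaintext shrinks dramatically; "encrypted" data does not shrink at
# all (zlib's framing can even make it slightly larger). This is why
# you compress first, then encrypt, then write to tape.
print(len(text), "->", len(compressed_plain))
print(len(ciphertext_like), "->", len(compressed_cipher))
```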
           | lostapathy wrote:
           | > Encryption is more critical on tape because there is no
           | easy path to wiping a tape.
           | 
           | I used to work for a government agency. We ran backup tapes
           | that rotated out through a degaussing machine that spun them
           | around for like 10 minutes to wipe them. It's not common to
           | have, but it's definitely easy.
        
       | amelius wrote:
       | Would love to see an article of someone taking a drive apart, and
       | hooking an oscilloscope to the read head of a tape drive.
        
       | dmitrybrant wrote:
       | Funny, I just recently did a similar thing: found an LTO-4 tape
       | drive on eBay for $40, and a few used cartridges (2TB each) for
       | $20.
       | 
       | But before writing my backup to the cartridges, I tried reading
       | their contents, and found that they actually came from a major
       | film studio, with backups of raw animated film content on them!
        
         | paulmd wrote:
         | one thing to emphasize is that the quoted LTO capacity numbers
         | are usually including transparent device compression - if your
         | data is not compressible, such as ZIP/RAR files or compressed
         | audio/video, that's not the number you will get!
         | 
         | Home users will really want to think in terms of the "raw
         | capacity" imo. This is normally half of the advertised capacity
         | for the older standards (I believe the newer ones have stronger
         | compression that squeezes a bit more). LTO-5 tapes are 1.5tb
         | raw, for example.
         | 
         | Maybe you'll get a little bit out of it, but a lot of the
         | things you'd want to back up (and especially the bulkier stuff
         | that really eats space) are already compressed. Family photo
         | library, audio/video storage? JPGs are compressed, H264/H265 or
         | MP3/FLAC/etc are already compressed. System images? A lot of
         | application files are already compressed. Home user scenarios
         | are not outlook mailboxes and database backups like the
         | "official" scenarios.
        
           | nybble41 wrote:
           | > Home users will really want to think in terms of the "raw
           | capacity" imo.
           | 
           |  _Everyone_ would be better off thinking in terms of the raw
           | capacity.  "Compressed capacity" is nothing but a marketing
           | gimmick. Even in enterprise use cases the compression ratios
           | will vary, and the drive's transparent compression is
           | unlikely to offer the most savings. If your data is at all
           | compressible you should compress the backup yourself before
           | sending it to the drive.
        
             | dark-star wrote:
             | It actually works pretty well. Compression in the tape
             | drives is certainly worse than what you could achieve by
             | zipping before, but at least it works at line speed (which
             | is a couple hundred megabytes per second). Factor in the
             | fact that you often write out multiple streams in parallel
             | from a single server to multiple tapes, and it'll become
             | rather tricky to find a compression algorithm that keeps up
             | AND compresses better than the drive.
             | 
             | And most enterprises don't really care if their monthly
             | backup requires 10 or 15 tapes. And zipping it all up
             | beforehand requires even more space on the primary storage
              | which is even more expensive than a couple dozen tapes.
        
               | nybble41 wrote:
               | It's still misleading to market the tapes based on a
               | compression factor which will depend in practice on the
               | data being stored. The _tape 's_ capacity is one thing;
               | the effectiveness of the _drive 's_ hardware-accelerated
               | compression algorithm on any given dataset is something
               | else entirely. The two should not be mixed.
        
         | NavinF wrote:
         | I was looking into the same thing recently. The price is right
         | ($10/TB tape vs $13/TB HDD) and it'd be nice to have fewer HBAs
         | and SAS cables, but having to swap the tapes manually every 2TB
         | (every 6 hours?) kinda ruins it for me. An automatic tape
         | library would be ideal, but I couldn't find any in the 100TB
         | range that are cheaper than spinning rust.
        
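[Ed.: The crossover the calculator comments allude to is simple arithmetic. The per-TB prices below are the commenter's; the $500 used-drive price is an assumed figure for illustration only:]

```python
import math

drive_cost = 500    # assumed one-time cost of a used LTO drive, USD
tape_per_tb = 10.0  # USD/TB for cartridges (commenter's figure)
hdd_per_tb = 13.0   # USD/TB for hard drives (commenter's figure)

# Tape wins once drive_cost + tape_per_tb * x < hdd_per_tb * x,
# i.e. beyond x = drive_cost / (hdd_per_tb - tape_per_tb) terabytes.
break_even_tb = drive_cost / (hdd_per_tb - tape_per_tb)
print(f"tape is cheaper beyond ~{math.ceil(break_even_tb)} TB")
```

With these numbers the break-even lands around 167TB; a pricier drive or cheaper HDDs pushes it higher, which matches the "couple hundred TB" crossovers mentioned elsewhere in the thread.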
           | numpad0 wrote:
           | I have a 2U-sized LTO2 robot that might have collapsed from
           | the stuff on top of it by now, but it seemed to have a
           | standard 5" bay drive inside with a passthrough adapter
           | marshaling the drive and the loader mechanism. I wonder if a
           | more recent drive can just be dropped into those libraries or
           | if they need firmware support.
        
         | ChuckNorris89 wrote:
         | Damn, that's cool. I wish the second hand market in my country
         | was abundant with cheap exotic hardware. Then again, maybe not,
         | because I'd probably fill my small apartment from hoarding
         | stuff like this.
         | 
         | Still, did you try to recover any of the material and watch
         | it?
        
       | dragontamer wrote:
       | I've come to the understanding that tape-drives are for people
       | who need to "build a custom-sized storage solution", especially
       | if you need capacity but not necessarily read/write speeds.
       | 
       | A tape-drive is your read-heads. The tape is like a platter. The
       | tape-library / jukebox is just a robotic mechanism for switching
       | tapes into and/or out of the read-head.
       | 
       | ----------
       | 
       | If you need a Petabyte of uncompressed storage, you can reach it
       | with a tape-library consisting of 84 LTO8 tapes (12TB each). If
       | read/write of 400MB/s is sufficient, one tape drive is
       | sufficient. If you need faster access speeds, you buy a 2nd, 3rd,
       | or 4th tape drive.
       | 
       | So let's say you need 2GB/s read/write speed and a petabyte of
       | storage. You simply get 4x LTO 8 drives, 84 LTO8 tapes, and stick
       | them into a tape library of some kind.
       | 
       | You then buy a certain amount of SSDs + HDDs sufficient for
       | caching, so that you can read/write to this tape library at
       | sufficient speeds (especially since it could be many minutes
       | before a specific byte is accessed).
        
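[Ed.: The sizing above works out as straightforward arithmetic; the capacity figure is the comment's own, and the 400MB/s per-drive rate is its assumption:]

```python
import math

target_tb = 1000   # ~1 PB of raw storage
tape_tb = 12       # LTO-8 native capacity per cartridge
drive_mb_s = 400   # assumed per-drive streaming rate

tapes = math.ceil(target_tb / tape_tb)
print(tapes)  # 84 tapes, matching the comment

# Four drives stream ~1.6GB/s of raw data; hitting a full 2GB/s at
# this rate would take a fifth drive (or compressible data, since
# the drive's transparent compression raises effective throughput).
print(4 * drive_mb_s)                 # 1600 MB/s aggregate
print(math.ceil(2000 / drive_mb_s))   # drives needed for 2GB/s raw
```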
       | kragen wrote:
       | Hey uh
       | 
       | is that a DECTape?
        
         | benjojo12 wrote:
         | The header image is the insides of a LTO 5 tape
        
       | watersb wrote:
       | I use a cloud storage provider to back up via Arq
       | https://arqbackup.com
       | 
       | But I don't expect to restore more than a few gigabytes at a time
       | from that.
       | 
       | It would take me a week or more to download a terabyte of data. I
       | have very little power over internet connection speed, and there
       | are very few alternatives here. I believe there are two different
       | vendors providing connectivity to our town, and you can pick
       | between four retail resellers.
       | 
       | With those limitations, I have tested a full restore process
       | exactly once. That's not good enough.
       | 
       | Data at rest on LTO or offline hard disk is something I can
       | control. Distributed offsite storage, too. Restore within 12
       | hours, I can do that.
       | 
       | The downside to tape or cold disk is more in the management of
       | hourly/daily/weekly backups: you have to provision a media
       | rotation schedule, whereas that's sort of built into an online
       | cloud storage service.
        
       | robohoe wrote:
       | I cut my sysadmin teeth doing tape work in early 2000s. It was
       | quite fun but I don't miss changing tapes and ensuring that the
       | FC tape loader library properly labeled them.
        
       | PaulHoule wrote:
       | I notice that he talks a lot about dealing with malfunctioning
       | drives and malfunctioning tapes.
       | 
       | That is my experience too. There is that time I got kicked out of
       | the computer lab as an undergraduate because I'd created a number
       | of newsgroups and they 'wrote' all my files... to what turned out
       | to be an empty SunTape. That time I tried to recover a
       | configuration file from an IBM tape robot and it took 14 hours.
       | When I was successful with tape I always did a lot of practicing
       | and testing. A sysadmin who taught me a lot (esp. how to get
       | things done in a place where you need 'social engineering' to get
       | things done) told me "you don't have a backup plan until you've
       | tested it" and many people learned that the hard way.
        
         | ansible wrote:
         | > _" you don't have a backup plan until you've tested it"_
         | 
         | Yep. Though that's what makes small-shop disk-to-disk backups
         | easy, depending on the backup software used.
         | 
         | We use rsnapshot, which uses rsync and "cp -l" to make backups.
         | So restoring is as easy as using cd to go into the appropriate
         | directory and copying out the files. No special utilities
         | needed. Yes, we encrypt the backup drives using cryptfs / LUKS.
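[Ed.: The rsnapshot scheme can be sketched with stdlib calls: a snapshot is a tree of hard links ("cp -l"), and rsync's write-then-rename update is what keeps old snapshots intact. A minimal, hypothetical illustration:]

```python
import os
import tempfile

root = tempfile.mkdtemp()
live = os.path.join(root, "live")
snap = os.path.join(root, "snap.0")
os.makedirs(live)
os.makedirs(snap)

with open(os.path.join(live, "file.txt"), "w") as f:
    f.write("v1")

# "cp -l": the snapshot entry is a hard link to the live file's inode,
# so the "full" snapshot costs no extra space for unchanged files.
os.link(os.path.join(live, "file.txt"), os.path.join(snap, "file.txt"))

# rsync-style update: write a new file, then rename it into place.
# An in-place overwrite would modify the shared inode and silently
# change the snapshot too.
tmp = os.path.join(live, "file.txt.tmp")
with open(tmp, "w") as f:
    f.write("v2")
os.replace(tmp, os.path.join(live, "file.txt"))

print(open(os.path.join(snap, "file.txt")).read())  # v1 -- snapshot intact
print(open(os.path.join(live, "file.txt")).read())  # v2
```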
        
       ___________________________________________________________________
       (page generated 2022-01-27 23:00 UTC)