[HN Gopher] What every programmer should know about solid-state ...
       ___________________________________________________________________
        
       What every programmer should know about solid-state drives (2014)
        
       Author : fagnerbrack
       Score  : 122 points
       Date   : 2022-05-23 07:21 UTC (1 days ago)
        
 (HTM) web link (codecapsule.com)
 (TXT) w3m dump (codecapsule.com)
        
       | B1FF_PSUVM wrote:
       | This guy used to hammer a good point about databases:
       | 
       | "In a time of SSD, multi-core/processor, two terabyte memory and
       | Optane App Direct Mode machines, there is no reason not to build
       | from BCNF data. Time to do what Dr. Codd demonstrated. Technology
       | has finally caught up with the maths."
       | 
       | https://drcoddwasright.blogspot.com (skip the distractions)
        
       | NovemberWhiskey wrote:
       | I feel like maybe this is "what all filesystem developers should
       | know about solid-state drives"; not very obvious how most other
       | developers would interact with a device at the level of
       | abstraction where they have the kind of necessary control.
        
       | amelius wrote:
       | From Wikipedia:
       | 
       | > In December 2012, Taiwanese engineers from Macronix revealed
       | their intention to announce at the 2012 IEEE International
       | Electron Devices Meeting that they had figured out how to improve
       | NAND flash storage read/write cycles from 10,000 to 100 million
       | cycles using a "self-healing" process that used a flash chip with
       | "onboard heaters that could anneal small groups of memory cells."
       | 
       | So can I apply this myself by placing an SSD drive in an oven?
        
       | nonrandomstring wrote:
       | From a low level programmatic standpoint, managing size and
       | alignment with (potentially unknown) page sizes throws the same
       | challenges as for AV buffers and network packet MTU/sizes -
       | either side of "just right" is suboptimal.
        
       | dang wrote:
       | Related:
       | 
       |  _What every programmer should know about solid-state drives_ -
       | https://news.ycombinator.com/item?id=9049630 - Feb 2015 (31
       | comments)
        
         | dtgriscom wrote:
         | A question: do you have a tool that searches the history for
         | previous links, or do you just have a really good memory?
        
       | user3939382 wrote:
       | What you should know is that I had an Apple OEM 1TB SSD in my
       | late-2013 MBP and one day it failed so catastrophically under
       | normal conditions that 2 of the best data recovery teams in the
       | world told me there was nothing they could do.
       | 
       | Back up your stuff
        
         | avgcorrection wrote:
         | Wow, you didn't have a backup routine. That's so basic. Why
         | not?
         | 
         | -
         | 
         | Oh, what my routine is? Uh. I `cp -a ~ /mnt/backup/date` a
         | couple of times a month.
         | 
         | ... Testing backups?
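
On the "testing backups" point: a minimal sketch of one way to check a
copied tree against its source by comparing SHA-256 digests. The
function names and the directory layout are hypothetical, and this only
checks regular-file contents (not permissions, symlinks, or whether a
restore actually boots).

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return relative paths of regular files missing from the backup or differing."""
    problems = []
    for src in sorted(source.rglob("*")):
        # Skip directories and symlinks; only regular file contents are compared.
        if src.is_symlink() or not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(str(rel))
    return problems
```

An empty list from `verify_backup(Path.home(), Path("/mnt/backup/date"))`
would mean every regular file in the source has an identical copy; it
says nothing about files that exist only in the backup.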
        
         | twofornone wrote:
         | Speaking about backing up...if one were interested in long term
         | archiving, do magnetic platters offer longer lasting data
         | integrity than SSDs in cold storage?
        
           | bombcar wrote:
           | In general I trust the older tech more than newer for long-
           | term archiving. So that would mean HDD (the oldest tech
           | thereof you can find still sold, probably) or tape or DVD
           | over SSD.
           | 
           | But multiple copies in multiple formats cannot hurt, and the
           | most important stuff should have multiple live copies.
        
             | unilynx wrote:
             | it really depends on the format. pressed DVDs will outlast
             | your VHS tapes
        
           | Melatonic wrote:
           | Not sure about that but I do know that the new sealed helium
           | filled drives are much harder to take apart and do backup
           | recovery on
        
           | UI_at_80x24 wrote:
           | >..do magnetic platters offer longer lasting data integrity
           | than SSDs in cold storage?
           | 
           | Yes. With an SSD the enemy is electron leakage. Minute
           | quantities of electrons trying to escape an unnatural state
           | and return to equilibrium. (yes, I just anthropomorphized
           | electrons.) Magnets however are more stable by nature. (yes
           | there is nothing natural about hard-drive storage. SMR doubly
           | so!)
           | 
           | Anecdote/anecdata: I have been able to retrieve full drives
           | worth of data off of drives that have sat in a cardboard box
           | for 10 years. I also have trouble accessing data on 1-year
           | old USB flash drives.
        
         | Dwedit wrote:
         | Get an 8TB backup drive (Costco has them really cheap), and run
         | Macrium Reflect to clone your HDD onto the backup drive.
         | Macrium Reflect makes use of Volume Shadow Copy, so you can
         | continue using your computer while it's backing things up.
         | 
         | Those big backup HDDs use shingled storage, so they're not any
         | good as general purpose hard drives, but they're excellent for
         | strictly sequential writes, such as a full disk backup to a
         | single file.
        
           | eli wrote:
           | Pair that with an online/remote backup and you're all set. I
           | like Backblaze because the software client is very good but
           | you could just as well push your own encrypted backup to S3
           | or a VPS.
        
             | thekrendal wrote:
             | You can also use BackBlaze B2 to push your own backups with
             | whatever software will support it, similarly to how you'd
             | use S3.
        
         | toast0 wrote:
         | From my experience, SSDs tend to just disappear from the bus
         | when they're done. If there's JTAG pins, maybe it's OEM
         | recoverable, but good luck. At least with spinning disks, they
         | usually have a media failure which often has warning signs.
         | Bearing failures are usually seized at startup and there are
         | ways to get them moving and then do a full dump. If the
         | electronics fail, often you can pull a board from a working
         | unit and attach it to the media and get good results. I don't
         | think it's reasonable to swap flash chips onto another board
         | (but maybe, I dunno?).
        
         | samatman wrote:
         | I'll admit my memories of 2013 are hazy, but I do recall TRIM
         | being an issue early in the MacBook's history+.
         | 
         | Back up your stuff! I happen to also back up to an SSD these
         | days, because the difference between minutes and hours is hard
         | to argue with.
         | 
         | +edit: history of shipping with an SSD standard, that is.
        
           | lostlogin wrote:
           | > because the difference between minutes and hours is hard to
           | argue with.
           | 
           | If the backups are incremental it shouldn't take hours.
        
             | Retric wrote:
             | Incremental backups are slightly higher risk.
        
             | samatman wrote:
             | For a given backup an SSD will be much faster, less
             | susceptible to drop and vibration damage, and pocketable
             | where a portable hard drive is pouchable at best.
        
       | mhh__ wrote:
       | Is anyone aware of a book-length equivalent of this?
        
         | the_only_law wrote:
         | Doubt it, books on niche technical subjects don't seem to be
         | much of a thing anymore unless you're willing to pay
         | extortionist prices for university textbooks.
        
           | wolverine876 wrote:
           | > books on niche technical subjects don't seem to be much of
           | a thing anymore
           | 
           | Why not? Blog posts aren't nearly as valuable.
        
             | SketchySeaBeast wrote:
             | I assume someone would be writing that book in the hope
             | they'd make money back and that's hard to do with a super
             | niche subject few will be interested in and even fewer
             | would be willing to pay for.
        
           | mhh__ wrote:
           | There is a book on DRAM, caches and hard drives by Bruce
           | Jacob.
           | 
           | Basically I want what every programmer should know about
           | storage but in the style of Drepper's original article.
        
       | wly_cdgr wrote:
       | How relevant is this in 2022? What's changed and what still
       | applies?
        
         | wolverine876 wrote:
         | A serious question: What has changed?
        
       | tester756 wrote:
       | are speeds of bleeding edge mem devices getting close to RAM?
        
         | Scene_Cast2 wrote:
         | In terms of bandwidth or latency? All conditions, worst case,
         | best case?
        
         | cogman10 wrote:
         | Not really.
         | 
         | Max throughput is around 6gbps with a fairly high latency. DDR5
         | has speeds of 52gbps, lower latency, AND your CPU will almost
         | undoubtedly have a cache on it to increase that speed further.
         | 
         | This is all assuming you are putting your mem device on a pci-
         | express bus.
        
           | KennyBlanken wrote:
           | > Max throughput is around 6gbps with a fairly high latency.
           | 
           | In the consumer market, a number of performance NVMe drives
           | will hit over 5GB/sec, which would be 40 Gbps.
           | 
           | The latency isn't anywhere near as good as even quite-old
           | RAM, but modern SSDs are considerably less than an order of
           | magnitude off in transfer speed from even current, common RAM
           | (DDR4) and "only" about a hundred times higher in latency
           | than RAM.
           | 
           | That's pretty stunning _from mass storage_. So is well over
           | 500,000 IOPS.
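
The disagreement above is mostly about bytes versus bits. A small
sketch of the conversion, using only the two figures quoted in the
comments (5 GB/s and 6 Gbit/s); the helper name is made up for
illustration. For what it's worth, 6 Gbit/s is also the SATA III line
rate, which may be where the lower figure came from.

```python
def gbytes_to_gbits(gigabytes_per_second: float) -> float:
    """Convert a rate in GB/s to Gbit/s (1 byte = 8 bits)."""
    return gigabytes_per_second * 8


nvme_gbit = gbytes_to_gbits(5)  # the "over 5GB/sec" NVMe figure from the comment
parent_gbit = 6                 # the "6gbps" figure from the parent comment

print(nvme_gbit)                # 40
print(nvme_gbit / parent_gbit)  # a bit under 7x the parent comment's number
```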
        
       | mmmpetrichor wrote:
       | If some typical write pattern from a typical app is wearing out
       | the SSD really fast, I'd say that's the SSD firmware engineer's
       | problem? And I think they've actually done a great job in
       | general, judging by the typical lifespan of SSDs and the
       | typically great performance. I'd argue that if the drive is
       | designed correctly, most programmers shouldn't have to care about
       | low level details. (I did say MOST).
        
         | Sakos wrote:
         | I think you misspelled "it's the user's problem". I don't think
         | most companies care until it becomes something that materially
         | affects them. Until then, users are reliant on the developers
         | of the applications they use to make up for the deficiencies in
         | lower layers.
        
       | dekhn wrote:
       | I treat ssds like faster hard drives and I have never been
       | disappointed.
        
       | tenebrisalietum wrote:
       | > Splitting cold and hot data as much as possible into separate
       | pages will make the job of the garbage collector easier.
       | 
       | How do I tell my SSD to write stuff to specific pages? You can't
       | really tell the SSD to do anything except read, write, or trim
       | LBAs.
       | 
       | Does NVMe support this with its queues?
       | 
       | > 27. Over-provisioning is useful for wear leveling and
       | performance
       | 
       | I thought most if not all SSDs were already overprovisioned. Does
       | additional overprovisioning help?
       | 
       | > To ensure that logical writes are truly aligned to the physical
       | memory, you must align the partition to the NAND-flash page size
       | of the drive.
       | 
       | I think this is false. This assumes there is a one-to-one mapping
       | of LBA to SSD PBA which you don't know. LBA 2048 could go to any
       | PBA on any page/block/flash line in the unit and as things are
       | written and rewritten, any correspondence that might happen due
       | to sequential assignment of PBAs->LBAs would gradually diminish,
       | IF you knew for sure that was happening in the first place.
       | Because you wouldn't really know what the SSD is doing without
       | reverse engineering or seeing the source code of firmware, unless
       | there's things going on in NVMe land that are new and I don't yet
       | know.
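
For reference, the only alignment a host can actually check is the
logical one. A minimal sketch, assuming 512-byte logical sectors and a
page size you have to guess or look up (both are assumptions, and the
helper names are hypothetical). As the comment says, this tells you
nothing about where the FTL really puts the data.

```python
SECTOR_SIZE = 512  # bytes per LBA; an assumption, some drives expose 4096


def is_aligned(start_lba: int, page_size: int, sector_size: int = SECTOR_SIZE) -> bool:
    """True if the byte offset of start_lba is a multiple of page_size."""
    return (start_lba * sector_size) % page_size == 0


def align_up(offset: int, page_size: int) -> int:
    """Round a byte offset up to the next multiple of page_size."""
    return -(-offset // page_size) * page_size


# LBA 2048 is byte offset 1 MiB, a multiple of every common page size.
print(is_aligned(2048, 16 * 1024))  # True
# The old MS-DOS default start of LBA 63 is not a multiple of 4 KiB.
print(is_aligned(63, 4096))         # False
print(align_up(63 * 512, 4096))     # 32768
```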
        
         | wtallis wrote:
         | I wrote a series of articles that covered the new features
         | defined for NVMe drives. The general pattern is that there are
         | now lots of optional hints that drives and host systems can
         | exchange about data placement, alignment and lifetime. But
         | there are also alternative paradigms available like Zoned
         | Storage that break compatibility to offer explicit control.
         | These features are mostly only implemented in enterprise SSDs,
         | and often only if a big customer specifically asks for them.
         | 
         | https://www.anandtech.com/show/11436/nvme-13-specification-p...
         | 
         | https://www.anandtech.com/show/14543/nvme-14-specification-p...
         | 
         | https://www.anandtech.com/show/16702/nvme-20-specification-r...
         | 
         | https://www.anandtech.com/show/15959/nvme-zoned-namespaces-e...
        
         | thfuran wrote:
         | >I thought most if not all SSDs were already overprovisioned.
         | Does additional overprovisioning help?
         | 
         | I think a big extra helping of overprovisioning is one of the
         | major differences between consumer and enterprise SSDs.
        
       | jbverschoor wrote:
       | > Cells are grouped into a grid, called a block, and blocks are
       | grouped into planes. The smallest unit through which a block can
       | be read or written is a page. Pages cannot be erased
       | individually, only whole blocks can be erased. The size of a
       | NAND-flash page can vary, and most drives have pages of size
       | 2 KB, 4 KB, 8 KB or 16 KB. Most SSDs have blocks of 128 or 256
       | pages, which means that the size of a block can vary between 256
       | KB and 4 MB. For example, the Samsung SSD 840 EVO has blocks of
       | size 2048 KB, and each block contains 256 pages of 8 KB each.
       | 
       | Very confusing and might be incorrect. What are planes? And are
       | pages made out of blocks or vice versa? If blocks are grouped in
       | pages, erasing sounds very different. "Only whole blocks" sounds
       | like blocks are bigger than pages.
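
Going by the numbers in the quoted passage, the hierarchy is cell, then
page, then block, then plane: a block contains pages, not the other way
around. A quick sketch of the arithmetic, using only the figures from
the quote; the helper names are made up for illustration.

```python
KIB = 1024


def block_size_bytes(pages_per_block: int, page_size_bytes: int) -> int:
    """A block is the erase unit: pages_per_block pages of page_size_bytes each."""
    return pages_per_block * page_size_bytes


def pages_touched(offset: int, length: int, page_size: int) -> int:
    """Number of pages a write of `length` bytes at byte `offset` overlaps."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


# Samsung 840 EVO figures from the quote: 256 pages of 8 KB each.
print(block_size_bytes(256, 8 * KIB) // KIB)   # 2048 (KB)
# Extremes of the quoted ranges: 128 x 2 KB and 256 x 16 KB.
print(block_size_bytes(128, 2 * KIB) // KIB)   # 256 (KB)
print(block_size_bytes(256, 16 * KIB) // KIB)  # 4096 (KB), i.e. 4 MB

# An 8 KB write fits in one 8 KB page only if it starts on a page boundary.
print(pages_touched(0, 8 * KIB, 8 * KIB))      # 1
print(pages_touched(512, 8 * KIB, 8 * KIB))    # 2
```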
        
       | bob1029 wrote:
       | I've been thinking about the possibility of "dumb" SSD devices.
       | 
       | All of the current HW-level performance hacks could actually get
       | in the way if your software already enforces things like single
       | writer, chunky writes and/or append-only log structures.
       | 
       | Give me a drive that only writes in 1 linear direction (until it's
       | full) and has a big red button to clean the entire thing all at
       | once (which would clearly require some offline processing time &
       | multiple disks for a realistic system).
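
To make the idea concrete, here is a toy in-memory model of such a
device: writes only land at a write pointer, and the only way to
reclaim space is to reset everything. The class and method names are
hypothetical; this is not any real drive interface, just a sketch of
the contract being described.

```python
class AppendOnlyZone:
    """Toy model: sequential appends only, whole-device reset to reclaim space."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data = bytearray()

    @property
    def write_pointer(self) -> int:
        """Offset at which the next append will land."""
        return len(self._data)

    def append(self, payload: bytes) -> int:
        """Write payload at the write pointer and return the offset it was written at."""
        if self.write_pointer + len(payload) > self.capacity:
            raise IOError("device full; reset before writing again")
        offset = self.write_pointer
        self._data += payload
        return offset

    def read(self, offset: int, length: int) -> bytes:
        """Reads may be random; only writes are constrained."""
        return bytes(self._data[offset:offset + length])

    def reset(self) -> None:
        """The 'big red button': discard everything and rewind the write pointer."""
        self._data.clear()


zone = AppendOnlyZone(capacity=16)
first = zone.append(b"hello")
second = zone.append(b"world")
print(first, second, zone.read(second, 5))  # 0 5 b'world'
zone.reset()
print(zone.write_pointer)                   # 0
```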
        
         | bruce343434 wrote:
         | Sure! Go ahead and order some memory cells.
        
         | jerdfelt wrote:
         | Does the ZNS (Zoned Namespaces) spec come close enough?
         | 
         | https://nvmexpress.org/new-nvmetm-specification-defines-zone...
        
           | bob1029 wrote:
           | Yes, actually. This looks like a realistic/practical path.
           | Had no idea this was a thing.
        
             | mbjorling wrote:
             | There is more technical information at zonedstorage.io
             | which also offers drives for academia and open-source
             | projects.
             | 
             | https://zonedstorage.io/docs/community/devices
        
       | metadat wrote:
       | What sorts of programmers should be concerned about these
       | matters? Page cache doesn't seem too important or interesting in
       | my day to day app and distributed systems development.
       | 
       | Maybe it's useful if you want to make something like a more
       | performant version of grep? (aka ripgrep?)
        
         | golergka wrote:
         | Don't your distributed systems use databases of some sort?
        
           | alpaca128 wrote:
           | And why does a DB user need to know those details? Isn't it
           | the whole point of DB systems to provide an optimized
           | solution that allows users to focus on other things?
        
             | dotopotoro wrote:
             | Databases always try to flush something to disk after
             | transaction, just in case unexpected reboot happens. So
             | your writes to db have direct correlation to disk writes.
             | 
             | Choice of db schema impacts physical layout on ssd. E.g.
             | Different tables are more likely to be on different ssd
             | pages resulting in random writes.
             | 
             | Databases are insanely complex, but not magic.
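
A rough illustration of the "flush after transaction" point: each
commit that must survive a reboot ends in an fsync, so the number of
commits drives the number of forced disk writes. This is a simplified
sketch with hypothetical function names, not how any particular
database implements its log.

```python
import os


def commit_each(path: str, records: list[bytes]) -> int:
    """Append records one at a time, forcing each to disk; returns the fsync count."""
    fsyncs = 0
    with open(path, "ab") as f:
        for record in records:
            f.write(record)
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to push it to the device
            fsyncs += 1
    return fsyncs


def commit_batch(path: str, records: list[bytes]) -> int:
    """Append all records as one sequential write with a single fsync."""
    with open(path, "ab") as f:
        f.write(b"".join(records))
        f.flush()
        os.fsync(f.fileno())
    return 1
```

Both leave the same bytes in the file; the difference is that the first
forces one small write per record, while the second hands the drive one
larger sequential write.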
        
         | pavon wrote:
         | My take:
         | 
         | 1-13) General background info that informs the rest.
         | 
         | 14-25) Important for any programmer that does enough file IO
         | that they need to optimize it.
         | 
         | 26-29) Important for any system admin to ensure they aren't
         | inadvertently limiting the performance of their hardware.
        
         | Gordonjcp wrote:
         | By the looks of the article? People writing SSD firmware, or
         | SSD drivers.
         | 
         | There is probably a small but non-zero number of these on here.
        
           | jqcoffey wrote:
           | The author appears to be an EM at Booking.com. It seems
           | unlikely that anyone at Booking would be working on SSD
           | firmware or drivers, but a CDN seems like a reasonable
           | assumption and also a useful place to plumb the depths of SSD
           | implementations.
        
         | eschneider wrote:
         | People who read from disks and people who write to them. How
         | SSDs organize data definitely has read and write performance
         | implications and if you're writing to disk, some write habits
         | that are perfectly reasonable on regular disks can cause
         | catastrophically fast wear on SSDs.
        
           | KennyBlanken wrote:
           | Yes, but the number of people who need to be worried about
           | aligning their writes and such is pretty small; certainly not
           | "every" programmer. The author gets into the weeds about
           | certain things application level programmers almost never
           | need to know or concern themselves about. He really doesn't
           | understand what's useful information and what isn't.
           | 
           | If you're programming at enterprise scale, this sort of stuff
           | is the responsibility of architect-level programmers and
           | senior systems engineers.
           | 
           | Even most linux sysadmins know all about block alignment
           | (well, if they predate most of the various tools figuring out
           | block size/alignment stuff for you.) It's nothing new - RAID
           | arrays work best when properly aligned, for example.
        
         | loxias wrote:
         | > Page cache doesn't seem too important or interesting in my
         | day to day app and distributed systems development
         | 
         | This is why we can't have nice things.
        
           | tshaddox wrote:
           | How so? Isn't the only point of developing these systems and
           | abstractions so that other people don't have to worry about
           | them?
        
             | chrisandchris wrote:
             | IMHO, today too many people think "don't have to worry about
             | them" equals "don't need to know anything about it".
        
               | tshaddox wrote:
               | I would argue that in most cases you "don't need to know
               | anything about it" either. It's reasonable to
               | deliberately treat abstractions as if they are not leaky,
               | as long as you're aware that all abstractions in fact
               | _are_ leaky and you're equipped to investigate and learn
               | about them _if the leaks cause problems_.
        
               | dotopotoro wrote:
               | "don't need to know anything about it" is acceptable, but
               | should not be encouraged.
               | 
               | It's not like reading 10 bullet points on the subject is
               | "diving deep" and making huge time investment.
               | 
               | It's just getting the minimal context, so later on at
               | least some keywords are known.
        
               | macintux wrote:
               | 10 bullet points on every conceivable computer-related
               | topic is, well, a lot more than 10.
        
           | the_only_law wrote:
           | I love how people say this, when the reality is, all the
           | software from the oh-so-coveted is the biggest shit show I've
           | seen.
           | 
           | But it's rarely because some developer didn't understand page
           | caches, and usually because it obviously didn't receive enough
           | QA or UX input.
        
         | jeffbee wrote:
         | > doesn't seem too important or interesting in my day to day
         | app and distributed systems development.
         | 
         | Makes sense to me. At Google we were told to stop thinking
         | about all this stuff, that the storage hardware and software
         | people were responsible for hiding things like wearout from
         | application developers. This article is really "things you
         | should know if you plan to directly access an NVMe device" but
         | there is a huge class of programmers who are better off not
         | knowing.
        
           | bombcar wrote:
           | There was an article by Varnish talking about how you should
           | leave the caching and memory management to the OS - even if
           | you can beat the virtual memory manager _today_ you'll stop
           | improving your home grown solution while RAM and the kernel
           | keep marching on.
        
             | wmf wrote:
             | https://varnish-cache.org/docs/trunk/phk/notes.html
        
         | yourapostasy wrote:
         | Not just programmers. Anyone using ZFS with SSD, whether as the
         | pool itself or in various caches like slog(zil) is going to
         | find this information of use when tuning for better SSD
         | citizenship. Programmers treating SSD like faster spinning rust
         | is like programmers treating S3 like another POSIX filesystem;
         | you _can_ do it, but you're trading away compounding future
         | advantages for that one moment of expedience.
        
           | dekhn wrote:
           | In my career I have found file system tuning for the devices
           | to be an anti-pattern that almost always ends up causing more
           | problems than it's worth.
        
             | philjohn wrote:
             | Are you writing low-level software, such as filesystems, or
             | raw block backed database storage engines? If not, then
             | that's definitely a decent maxim to live by.
        
       ___________________________________________________________________
       (page generated 2022-05-24 23:00 UTC)