[HN Gopher] What every programmer should know about solid-state ... ___________________________________________________________________ What every programmer should know about solid-state drives (2014) Author : fagnerbrack Score : 122 points Date : 2022-05-23 07:21 UTC (1 day ago) (HTM) web link (codecapsule.com) (TXT) w3m dump (codecapsule.com) | [deleted] | B1FF_PSUVM wrote: | This guy used to hammer a good point about databases: | | "In a time of SSD, multi-core/processor, two terabyte memory and | Optane App Direct Mode machines, there is no reason not to build | from BCNF data. Time to do what Dr. Codd demonstrated. Technology | has finally caught up with the maths." | | https://drcoddwasright.blogspot.com (skip the distractions) | NovemberWhiskey wrote: | I feel like maybe this is "what all filesystem developers should | know about solid-state drives"; not very obvious how most other | developers would interact with a device at the level of | abstraction where they have the kind of necessary control. | amelius wrote: | From Wikipedia: | | > In December 2012, Taiwanese engineers from Macronix revealed | their intention to announce at the 2012 IEEE International | Electron Devices Meeting that they had figured out how to improve | NAND flash storage read/write cycles from 10,000 to 100 million | cycles using a "self-healing" process that used a flash chip with | "onboard heaters that could anneal small groups of memory cells." | | So can I apply this myself by placing an SSD drive in an oven? | nonrandomstring wrote: | From a low level programmatic standpoint, managing size and | alignment with (potentially unknown) page sizes throws the same | challenges as for AV buffers and network packet MTU/sizes - | either side of "just right" is suboptimal.
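The size-and-alignment point above can be sketched as a simple rounding helper. This is a minimal illustration assuming a 4 KB page size; real drives may use 8 KB or 16 KB pages and rarely report the true value, so `PAGE_SIZE` here is an assumption, not something queried from the device.

```python
# Hedged sketch: round an I/O buffer size up to a multiple of an
# *assumed* NAND page size. Drives generally do not expose their real
# page size, so 4096 is only a common guess.
PAGE_SIZE = 4096  # assumed flash page size in bytes

def aligned_size(nbytes: int, page: int = PAGE_SIZE) -> int:
    """Round nbytes up to the next multiple of page."""
    return ((nbytes + page - 1) // page) * page

print(aligned_size(5000))  # 8192
print(aligned_size(4096))  # 4096
```

Either side of "just right" is suboptimal: a buffer below the page size still costs a whole-page program, while an oversized unaligned buffer straddles an extra page.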
| dang wrote: | Related: | | _What every programmer should know about solid-state drives_ - | https://news.ycombinator.com/item?id=9049630 - Feb 2015 (31 | comments) | dtgriscom wrote: | A question: do you have a tool that searches the history for | previous links, or do you just have a really good memory? | user3939382 wrote: | What you should know is that I had an Apple OEM 1TB SSD in my | late-2013 MBP and one day it failed so catastrophically under | normal conditions that 2 of the best data recovery teams in the | world told me there was nothing they could do. | | Backup your stuff | avgcorrection wrote: | Wow, you didn't have a backup routine. That's so basic. Why | not? | | - | | Oh, what my routine is? Uh. I `cp -a ~ /mnt/backup/date` a | couple of times a month. | | ... Testing backups? | twofornone wrote: | Speaking about backing up...if one were interested in long term | archiving, do magnetic platters offer longer lasting data | integrity than SSDs in cold storage? | bombcar wrote: | In general I trust the older tech more than newer for long- | term archiving. So that would mean HDD (the oldest tech | thereof you can find still sold, probably) or tape or DVD | over SSD. | | But multiple copies in multiple formats cannot hurt, and the | most important stuff should have multiple live copies. | unilynx wrote: | it really depends on the format. pressed DVDs will outlast | your VHS tapes | Melatonic wrote: | Not sure about that but I do know that the new sealed helium | filled drives are much harder to take apart and do backup | recovery on | UI_at_80x24 wrote: | >..do magnetic platters offer longer lasting data integrity | than SSDs in cold storage? | | Yes. With an SSD the enemy is electron leakage. Minute | quantities of electrons trying to escape an unnatural state | and return to equilibrium. (yes, I just anthropomorphized | electrons.) Magnets however are more stable by nature. (yes | there is nothing natural about hard-drive storage. SMR doubly | so!) 
| | Anecdote/anecdata: I have been able to retrieve full drives | worth of data off of drives that have sat in a cardboard box | for 10 years. I also have trouble accessing data on 1-year | old USB flash drives. | Dwedit wrote: | Get an 8TB backup drive (Costco has them really cheap), and run | Macrium Reflect to clone your HDD onto the backup drive. | Macrium Reflect makes use of Volume Shadow Copy, so you can | continue using your computer while it's backing things up. | | Those big backup HDDs use shingled storage, so they're not any | good as general purpose hard drives, but they're excellent for | strictly sequential writes, such as a full disk backup to a | single file. | eli wrote: | Pair that with an online/remote backup and you're all set. I | like Backblaze because the software client is very good but | you could just as well push your own encrypted backup to S3 | or a VPS. | thekrendal wrote: | You can also use BackBlaze B2 to push your own backups with | whatever software will support it, similarly to how you'd | use S3. | [deleted] | toast0 wrote: | From my experience, SSDs tend to just disappear from the bus | when they're done. If there's JTAG pins, maybe it's OEM | recoverable, but good luck. At least with spinning disks, they | usually have a media failure which often has warning signs. | Bearing failures are usually seized at startup and there are | ways to get them moving and then do a full dump. If the | electronics fail, often you can pull a board from a working | unit and attach it to the media and get good results. I don't | think it's reasonable to swap flash chips onto another board | (but maybe, I dunno?). | samatman wrote: | I'll admit my memories of 2013 are hazy, but I do recall TRIM | being an issue early in the Macbook's history+. | | Backup your stuff! I happen to also back up to an SSD these | days, because the difference between minutes and hours is hard | to argue with. | | +edit: history of shipping with an SSD standard, that is. 
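The dated-copy backup routine mentioned earlier in the thread (`cp -a ~ /mnt/backup/date` a couple of times a month) can be sketched in Python. The paths are hypothetical, and this is a plain full copy, not an incremental or verified backup; per the "testing backups" point above, a real routine should also check that the copy is restorable.

```python
# Hedged sketch of a dated full-copy backup: copy src into a
# destination directory named after today's date. Paths are
# illustrative; no verification or incremental logic is attempted.
import shutil
from datetime import date
from pathlib import Path

def dated_backup(src: str, backup_root: str) -> Path:
    """Copy src into backup_root/YYYY-MM-DD and return the new path."""
    dest = Path(backup_root) / date.today().isoformat()
    shutil.copytree(src, dest, dirs_exist_ok=False)  # fail if today's copy exists
    return dest
```

As the thread notes, a second copy on different media (or a remote target such as B2/S3) guards against the single-drive failure mode described above.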
| lostlogin wrote: | > because the difference between minutes and hours is hard to | argue with. | | If the backups are incremental it shouldn't take hours. | Retric wrote: | Incremental backups are slightly higher risk. | samatman wrote: | For a given backup an SSD will be much faster, less | susceptible to drop and vibration damage, and pocketable | where a portable hard drive is pouchable at best. | mhh__ wrote: | Is anyone aware of a book-length equivalent of this? | the_only_law wrote: | Doubt it, books on niche technical subjects don't seem to be | much of a thing anymore unless you're willing to pay | extortionist prices for university textbooks. | wolverine876 wrote: | > books on niche technical subjects don't seem to be much of | a thing anymore | | Why not? Blog posts aren't nearly as valuable. | SketchySeaBeast wrote: | I assume someone would be writing that book in the hope | they'd make money back and that's hard to do with a super | niche subject few will be interested in and even fewer | would be willing to pay for. | mhh__ wrote: | There is a book on DRAM, caches and hard drives by Bruce | Jacob. | | Basically I want "what every programmer should know about | storage" but in the style of Drepper's original article. | wly_cdgr wrote: | How relevant is this in 2022? What's changed and what still | applies? | wolverine876 wrote: | A serious question: What has changed? | tester756 wrote: | are speeds of bleeding edge mem devices getting close to RAM? | Scene_Cast2 wrote: | In terms of bandwidth or latency? All conditions, worst case, | best case? | cogman10 wrote: | Not really. | | Max throughput is around 6gbps with a fairly high latency. DDR5 | has speeds of 52gbps, lower latency, AND your CPU will almost | undoubtedly have a cache on it to increase that speed further. | | This is all assuming you are putting your mem device on a PCI | Express bus. | KennyBlanken wrote: | > Max throughput is around 6gbps with a fairly high latency.
| | In the consumer market, a number of performance NVMe drives | will hit over 5GB/sec, which would be 40 Gbps. | | The latency isn't anywhere near as good as even quite-old | RAM, but modern SSDs are considerably less than an order of | magnitude off in transfer speed from even current, common RAM | (DDR4) and "only" about a hundred times higher in latency | than RAM. | | That's pretty stunning _from mass storage_. So is well over | 500,000 IOPS. | mmmpetrichor wrote: | If some typical write pattern from a typical app is wearing out | the SSD really fast, I'd say that's the SSD firmware engineer's | problem? And I think they've actually done a great job in | general, judging by the typical lifespan of SSDs and the | typically great performance. I'd argue that if the drive is | designed correctly, most programmers shouldn't have to care about | low level details. (I did say MOST). | Sakos wrote: | I think you misspelled "it's the user's problem". I don't think | most companies care until it becomes something that materially | affects them. Until then, users are reliant on the developers | of the applications they use to make up for the deficiencies in | lower layers. | dekhn wrote: | I treat SSDs like faster hard drives and I have never been | disappointed. | tenebrisalietum wrote: | > Splitting cold and hot data as much as possible into separate | pages will make the job of the garbage collector easier. | | How do I tell my SSD to write stuff to specific pages? You can't | really tell the SSD to do anything except read, write, or trim | LBAs. | | Does NVMe support this with its queues? | | > 27. Over-provisioning is useful for wear leveling and | performance | | I thought most if not all SSDs were already overprovisioned. Does | additional overprovisioning help? | | > To ensure that logical writes are truly aligned to the physical | memory, you must align the partition to the NAND-flash page size | of the drive. | | I think this is false.
This assumes there is a one-to-one mapping | of LBA to SSD PBA which you don't know. LBA 2048 could go to any | PBA on any page/block/flash line in the unit and as things are | written and rewritten, any correspondence that might happen due | to sequential assignment of PBAs->LBAs would gradually diminish, | IF you knew for sure that was happening in the first place. | Because you wouldn't really know what the SSD is doing without | reverse engineering or seeing the source code of firmware, unless | there's things going on in NVMe land that are new and I don't yet | know. | wtallis wrote: | I wrote a series of articles that covered the new features | defined for NVMe drives. The general pattern is that there are | now lots of optional hints that drives and host systems can | exchange about data placement, alignment and lifetime. But | there are also alternative paradigms available like Zoned | Storage that break compatibility to offer explicit control. | These features are mostly only implemented in enterprise SSDs, | and often only if a big customer specifically asks for them. | | https://www.anandtech.com/show/11436/nvme-13-specification-p... | | https://www.anandtech.com/show/14543/nvme-14-specification-p... | | https://www.anandtech.com/show/16702/nvme-20-specification-r... | | https://www.anandtech.com/show/15959/nvme-zoned-namespaces-e... | thfuran wrote: | >I thought most if not all SSDs were already overprovisioned. | Does additional overprovisioning help? | | I think a big extra helping of overprovisioning is one of the | major differences between consumer and enterprise SSDs. | jbverschoor wrote: | > Cells are grouped into a grid, called a block, and blocks are | grouped into planes. The smallest unit through which a block can | be read or written is a page. Pages cannot be erased | individually, only whole blocks can be erased. The size of a | NAND-flash page size can vary, and most drive have pages of size | 2 KB, 4 KB, 8 KB or 16 KB. 
Most SSDs have blocks of 128 or 256 | pages, which means that the size of a block can vary between 256 | KB and 4 MB. For example, the Samsung SSD 840 EVO has blocks of | size 2048 KB, and each block contains 256 pages of 8 KB each. | | Very confusing and might be incorrect. What are planes? And are | pages made out of blocks or vice-versa? If blocks are grouped in | pages, the erasing sounds very different... Only whole blocks, | which sounds like blocks are bigger than pages. | bob1029 wrote: | I've been thinking about the possibility of "dumb" SSD devices. | | All of the current HW-level performance hacks could actually get | in the way if your software already enforces things like single | writer, chunky writes and/or append-only log structures. | | Give me a drive that only writes in 1 linear direction (until it's | full) and has a big red button to clean the entire thing all at | once (which would clearly require some offline processing time & | multiple disks for a realistic system). | bruce343434 wrote: | Sure! Go ahead and order some memory cells. | jerdfelt wrote: | Does the ZNS (Zoned Namespaces) spec come close enough? | | https://nvmexpress.org/new-nvmetm-specification-defines-zone... | bob1029 wrote: | Yes, actually. This looks like a realistic/practical path. | Had no idea this was a thing. | mbjorling wrote: | There is more technical information at zonedstorage.io, | which also offers drives for academia and open-source | projects. | | https://zonedstorage.io/docs/community/devices | metadat wrote: | What sorts of programmers should be concerned about these | matters? Page cache doesn't seem too important or interesting in | my day to day app and distributed systems development. | | Maybe it's useful if you want to make something like a more | performant version of grep? (aka ripgrep?) | golergka wrote: | Don't your distributed systems use databases of some sort? | alpaca128 wrote: | And why does a DB user need to know those details?
Isn't it | the whole point of DB systems to provide an optimized | solution that allows users to focus on other things? | dotopotoro wrote: | Databases always try to flush something to disk after a | transaction, just in case an unexpected reboot happens. So | your writes to the db have a direct correlation to disk writes. | | The choice of db schema impacts the physical layout on the ssd. E.g. | different tables are more likely to be on different ssd | pages, resulting in random writes. | | Databases are insanely complex, but not magic. | pavon wrote: | My take: | | 1-13) General background info that informs the rest. | | 14-25) Important for any programmer that does enough file IO | that they need to optimize it. | | 26-29) Important for any system admin to ensure they aren't | inadvertently limiting the performance of their hardware. | Gordonjcp wrote: | By the looks of the article? People writing SSD firmware, or | SSD drivers. | | There is probably a small but non-zero number of these on here. | jqcoffey wrote: | The author appears to be an EM at Booking.com. It seems | unlikely that anyone at Booking would be working on SSD | firmware or drivers, but a CDN seems like a reasonable | assumption and also a useful place to plumb the depths of SSD | implementations. | eschneider wrote: | People who read from disks and people who write to them. How | SSDs organize data definitely has read and write performance | implications, and if you're writing to disk, some write habits | that are perfectly reasonable on regular disks can cause | catastrophically fast wear on SSDs. | KennyBlanken wrote: | Yes, but the number of people who need to be worried about | aligning their writes and such is pretty small; certainly not | "every" programmer. The author gets into the weeds about | certain things application level programmers almost never | need to know or concern themselves about. He really doesn't | understand what's useful information and what isn't.
| If you're programming at enterprise scale, this sort of stuff | is the responsibility of architect-level programmers and | senior systems engineers. | | Even most linux sysadmins know all about block alignment | (well, if they predate most of the various tools figuring out | block size/alignment stuff for you.) It's nothing new - RAID | arrays work best when properly aligned, for example. | loxias wrote: | > Page cache doesn't seem too important or interesting in my | day to day app and distributed systems development | | This is why we can't have nice things. | tshaddox wrote: | How so? Isn't the only point of developing these systems and | abstractions so that other people don't have to worry about | them? | chrisandchris wrote: | IMHO, today too many people think "don't have to worry about | them" equals "don't need to know anything about it". | tshaddox wrote: | I would argue that in most cases you "don't need to know | anything about it" either. It's reasonable to | deliberately treat abstractions as if they are not leaky, | as long as you're aware that all abstractions in fact | _are_ leaky and you're equipped to investigate and learn | about them _if the leaks cause problems_. | dotopotoro wrote: | "don't need to know anything about it" is acceptable, but | should not be encouraged. | | It's not like reading 10 bullet points on the subject is | "diving deep" and making a huge time investment. | | It's just getting the minimal context, so later on at | least some keywords are known. | macintux wrote: | 10 bullet points on every conceivable computer-related | topic is, well, a lot more than 10. | the_only_law wrote: | I love how people say this, when the reality is, all the | software from the oh-so-coveted companies is the biggest shit | show I've seen. | | But it's rarely because some developer didn't understand page | caches, and usually because it obviously didn't receive enough | QA or UX input.
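The earlier point about databases flushing to disk after each transaction can be sketched with SQLite: with the default durability settings, each COMMIT pushes data toward the disk, so many tiny transactions mean many small writes, while batching rows into one transaction means a single flush. The table and data here are illustrative only (an in-memory database is used so the sketch runs anywhere; the flush behavior applies to on-disk databases).

```python
# Hedged sketch: per-transaction commits vs. one batched transaction.
# On an on-disk SQLite database, each commit is a durability point
# (a flush), so the batched form produces far fewer small writes.
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory for illustration
conn.execute("CREATE TABLE t (x INTEGER)")

# One transaction per row: one flush per insert on a real on-disk DB.
for i in range(3):
    with conn:  # each `with` block commits on exit
        conn.execute("INSERT INTO t VALUES (?)", (i,))

# One transaction for the whole batch: a single flush instead.
with conn:
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3, 6)])

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 6
```

This is the sense in which schema and transaction shape have a "direct correlation to disk writes": the application's commit pattern, not just its data volume, decides how the SSD gets written.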
| jeffbee wrote: | > doesn't seem too important or interesting in my day to day | app and distributed systems development. | | Makes sense to me. At Google we were told to stop thinking | about all this stuff, that the storage hardware and software | people were responsible for hiding things like wearout from | application developers. This article is really "things you | should know if you plan to directly access an NVMe device", but | there is a huge class of programmers who are better off not | knowing. | bombcar wrote: | There was an article by the Varnish author talking about how you | should leave caching and memory management to the OS - even if | you can beat the virtual memory manager _today_, you'll stop | improving your home-grown solution while RAM and the kernel | keep marching on. | wmf wrote: | https://varnish-cache.org/docs/trunk/phk/notes.html | yourapostasy wrote: | Not just programmers. Anyone using ZFS with an SSD, whether as the | pool itself or in various caches like the slog (zil), is going to | find this information of use when tuning for better SSD | citizenship. Programmers treating an SSD like faster spinning rust | are like programmers treating S3 like another POSIX filesystem; | you _can_ do it, but you're trading away compounding future | advantages for that one moment of expedience. | dekhn wrote: | In my career I have found that file system tuning for the | devices is an anti-pattern that almost always ends up causing | more problems than it's worth. | philjohn wrote: | Are you writing low-level software, such as filesystems, or | raw block-backed database storage engines? If not, then | that's definitely a decent maxim to live by. ___________________________________________________________________ (page generated 2022-05-24 23:00 UTC)