[HN Gopher] Comparing DDR5 Memory from Micron, Samsung, SK Hynix
___________________________________________________________________
Comparing DDR5 Memory from Micron, Samsung, SK Hynix
Author : JoachimS
Score : 47 points
Date : 2022-02-15 19:59 UTC (3 hours ago)
(HTM) web link (www.eetimes.com)
(TXT) w3m dump (www.eetimes.com)
| bcrl wrote:
| If anyone thinks that on-die ECC is a good thing as the
| manufacturers are touting, please go read the discussions on this
| topic over in the forums at www.realworldtech.com. The goal of
| on-die ECC is purely to ensure that DRAM manufacturers are able
| to obtain better yields by reducing the impact of defects, which
| is not the same as ensuring data integrity. This means that it
| fails the "trust but verify" tenet. Even worse, some failures may
| not even get reported to the system, as is the case with ECC
| implemented in the memory controllers and caches of modern CPUs.
| The industry is trying to make this look like a good thing, but
| I'm on the same side as Linus Torvalds: all modern systems should
| ship with ECC memory. IBM got it right with parity memory in the
| IBM PC.
| grue_some wrote:
| DDR5 contains two forms of ECC. The first is standard ECC, which
| is used to correct bit flips in transmission. The second, on-die
| ECC, is used to correct bit flips on the die, hence the name. The
| world has already accepted that standard ECC on high-speed
| interfaces is a good idea, so why would on-die ECC be a bad idea?
| Yes, they correct different error types, but they both attempt to
| correct corrupted bits, and they do so in a mathematically
| similar way.
|
| All that said, there are still ECC DIMMs (with a separate ECC
| memory chip) and non-ECC DIMMs for DDR5. So if the on-die ECC is
| concerning for anyone, they can still get a DIMM with separate
| ECC memory. But the ECC happening at the interface between the
| DIMM and the CPU will always exist, and you will have to trust
| it.
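As a concrete sketch of the "mathematically similar way" both kinds of ECC work, here is a toy Hamming(7,4) single-error-correcting code in Python. This is an illustration only: real link ECC and DDR5 on-die ECC use much wider codes (on-die ECC is commonly described as 8 check bits protecting 128 data bits), but the parity-and-syndrome machinery is the same idea.

```python
# Toy Hamming(7,4) single-error-correcting code. Three parity bits
# cover overlapping subsets of four data bits; on decode, the parity
# checks (the "syndrome") name the position of a single flipped bit.

def encode(d):
    """4 data bits -> 7-bit codeword, parity bits at positions 1,2,4."""
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(c):
    """7-bit codeword -> 4 data bits, correcting any single bit flip."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
cw = encode(word)
cw[4] ^= 1                 # simulate a single bit flip in storage
assert decode(cw) == word  # decoder recovers the original data
```

Whether this happens on a DRAM die or across the DIMM-to-CPU link changes which errors it can see, not the math.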
| bcrl wrote:
| Again, going back to the discussions over on RWT: some of the
| less robust forms of ECC that DRAM manufacturers typically
| implement can end up amplifying the problem by turning double-bit
| flips into silent multi-bit flips, which makes the memory
| controller's job much harder. DRAM manufacturing process tech is
| not optimized for logic like CPUs are, and those limitations
| really do constrain how much logic (or "how good") the ECC
| implemented on DRAM chips is. I trust CPU manufacturers to get
| memory controllers right more than I trust DRAM manufacturers to
| get ECC right for one simple reason: row hammer.
| kimixa wrote:
| It's entirely possible that on-die ECC is still a good thing for
| the end user - to really judge you need to compare the error rate
| (and the proportion the ECC corrected) of dies that would have
| previously failed validation. It may be that it's good for both -
| i.e. more dies can be used (so higher supply and lower prices for
| the consumer), yet the unfixed error rate is still lower than
| that of dies that would have previously passed validation but
| lack on-die ECC.
|
| I doubt any manufacturer would make that public, but an estimate
| may be made if error rates actually start increasing in the real
| world due to DDR5 allowing this.
|
| I agree that end-to-end ECC really should be the default for
| consumer products these days, but so long as the big players see
| it as a "Professional User" product differentiation point it'll
| always be more expensive than it should be.
| deckard1 wrote:
| > so long as the big players see it as a "Professional User"
| product differentiation point it'll always be more expensive than
| it should be.
|
| Right. The more important Linus to speak up for ECC isn't
| Torvalds. It's Linus Sebastian, of Linus Tech Tips. He's made a
| few videos on ECC targeted towards gamers.
| Gamers drive the enthusiast PC market, and when they start
| caring, more ECC gets made, which will drive the cost down a bit.
| Last time I bought 32GB DDR4 UDIMM ECC there was literally one
| SKU. Not manufacturer. Not brand. _SKU_. One single item in
| production in the entire world. 16GB wasn't much better off,
| either.
|
| It's a hard sell, though. Non-ECC will always be cheaper because
| it costs less to produce. Gamers don't really care that ECC
| prevents one crash in years because they are used to frequent
| crashes already. They are largely being fed dogshit from the AAA
| gaming industry and they have learned to just deal with it.
| Crashes are just part of being on the bleeding edge of gaming and
| Nvidia/Radeon drivers. One less crash in a sea of crashes isn't
| something gamers are lining up for. But a better model GPU or
| bigger SSD? It's an obvious choice.
| kimixa wrote:
| > Gamers don't really care that ECC prevents one crash in years
| because they are used to frequent crashes already.
|
| I work on GPU drivers for one of those companies.
|
| We regularly get reports and backtraces that cannot be
| reproduced, or "Cannot Happen" without some external factor (e.g.
| some other bit of code poking around our memory space). Often
| they're just silently dropped or ignored on the long tail of
| issues that nobody can get any traction on.
|
| My understanding is that the stats from hyperscalers show ECC
| correction events happen a lot more often than "Common Knowledge"
| may imply - I wonder just what proportion of things that are
| blamed on software may actually be due to hardware issues like
| this?
|
| Again, without a significant change in the market (i.e. enough
| gamers start using ECC to actually be statistically relevant and
| comparing stability) this cannot really be tested, but I've
| wondered.
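bcrl's "amplification" concern earlier in the thread can be demonstrated with a toy Hamming(7,4) single-error-correcting code: fed a double-bit error, a plain SEC decoder miscorrects, flipping a third bit and returning wrong data without signalling anything. SECDED variants add one extra overall parity bit specifically to catch this case. A sketch of the failure mode only, not the actual on-die ECC algorithm:

```python
# The same style of Hamming(7,4) code, but given a DOUBLE bit flip.
# A plain SEC code cannot tell one flip from two: the syndrome points
# at some third bit, the decoder "corrects" it, and the result is
# silently wrong - two bit flips become three.

def encode(d):
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(c):
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
cw = encode(word)
cw[0] ^= 1
cw[6] ^= 1                  # two bit flips in the stored codeword
out = decode(cw)            # decoder still applies a "correction"
assert out != word          # data is wrong, and no error was reported
```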
| bcrl wrote:
| Except that anyone using Intel desktop CPUs pretty much can't use
| ECC thanks to marketing deciding that ECC is a market
| segmentation feature.
|
| The real way to make ECC happen industry-wide is for OS vendors
| like Microsoft to make it a platform requirement. A no-ECC, no-
| boot policy would change things overnight. Sadly, we can't even
| get DRAM manufacturers to fix row hammer properly, so the
| likelihood of this happening is pretty much nil.
| sliken wrote:
| If people cared, they would buy ECC-capable chips. In fact my
| desktop is a Xeon E3-1230v5, which was cheaper and slightly
| slower (3.4 vs 3.6 GHz or something) than the equivalent i7. It
| was $50 more for the motherboard and $100 more for the RAM. I'm
| sure if the market flocked to ECC-capable chips (the silicon is
| the same) Intel would sell them.
|
| So many people grumble, but I'm not really sure Intel should push
| ECC if desktop users aren't willing to pay a modest premium for
| it.
|
| Many cheer AMD, which does not disable ECC on desktop chips, but
| neither does it promise ECC will actually work. It's a confusing
| mess between physical capacity (RAM increases by 16GB when you
| add a 16GB DIMM) and the actual correction of errors and
| reporting of the event to the OS. Only on EPYC does AMD test and
| certify that ECC will work.
| wmf wrote:
| You can use ECC by buying the Xeon version, which is only
| slightly more expensive.
| g42gregory wrote:
| Maybe I am not understanding something, but I thought that total
| memory bandwidth is critical for Deep Learning applications. This
| is where on-package HBM would shine, no? I am deferring the
| purchase of a new desktop/server until processors with HBM come
| to market. I think AMD is shipping EPYC engineering samples with
| some version of this memory, and Intel's release is slated for
| the end of the year. Am I wrong about this?
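On the bandwidth numbers raised in this thread: the EETimes figure quoted below as "7,200 MB/s" is really DDR5-7200's transfer rate, 7200 MT/s (megatransfers per second); peak bandwidth is transfers per second times bus width. A quick sanity check (the helper function is ours, for illustration):

```python
# Back-of-envelope peak memory bandwidth. DDR transfer rates are
# quoted in MT/s; each transfer moves one bus-width's worth of data,
# so a 64-bit (8-byte) channel at 7200 MT/s moves 57.6 GB/s - far
# beyond the ~7 GB/s of a fast NVMe SSD.

def channel_gbps(mts, bus_bits=64):
    """Peak bandwidth in GB/s at `mts` megatransfers/s on a bus of
    `bus_bits` bits."""
    return mts * 1e6 * (bus_bits // 8) / 1e9

assert channel_gbps(7200) == 57.6        # one DDR5-7200 64-bit channel
# A 512-bit-wide LPDDR system (as claimed for Apple laptops later in
# the thread) at the same rate would be 8x that:
assert channel_gbps(7200, 512) == 460.8
```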
| wmf wrote:
| The only CPU with HBM is Sapphire Rapids and it may cost $20K;
| for that money you're probably better off buying an H100.
| hulitu wrote:
| The article seems to imply that all DDR5 chips have ECC. Is this
| true?
| sliken wrote:
| Yes, as discussed in other threads here, the on-die ECC helps
| increase chip yields, but does not prevent off-chip errors. So
| it's not equivalent to what people normally mean by ECC memory,
| which stores extra check bits that will correct single-bit errors
| and detect double-bit errors anywhere in the chip, DIMM, DIMM
| slot, motherboard, socket, or CPU areas.
| tester756 wrote:
| > DDR5 provides both data and clock rates that double the
| performance up to at least 7,200 MB/s. Additionally, DDR5 lowers
| the operating voltage to 1.1V.
|
| hmm? 7 GB/s is the performance that modern disks achieve
| kamilner wrote:
| Why is it that LPDDR is recently faster than DDR of the same
| 'generation'? I thought LPDDR is purely a lower-voltage version
| of DDR, so I naively would have expected worse performance. Is it
| because it's typically closer (physically) to the CPU?
| bcrl wrote:
| DDR is typically a bus with more than one DIMM slot per channel.
| LPDDR is typically point-to-point. Electrically, it's a lot
| easier to meet signal integrity requirements on a point-to-point
| trace than it is to make a multi-drop bus work properly.
| grue_some wrote:
| LPDDR uses a wider bus so, at a similar clock rate, it is faster.
| dhdc wrote:
| More importantly, because of the low-power requirement, LPDDR
| typically has better-binned dies than DDR.
| sliken wrote:
| I believe it's just the advantages you get from very short trace
| lengths. DIMM slots are usually inches away, so you end up with
| long traces from CPU -> DIMM slot, pay the overhead of the DIMM
| slot connection, and then traces within a DIMM.
|
| LPDDR, on the other hand, moves the individual DRAM chips as
| close as possible to the CPU and doesn't have any connector.
| This also makes it much easier to have wider memory. A 13" MBP
| can have a 512-bit-wide memory system with at least 16 channels
| in a thin/light laptop that is quite power efficient. To get
| something similar with DIMMs you'd have to buy a dual-socket
| server motherboard with 8 channels per socket, and you would be
| lucky to fit that in an ATX-size motherboard in a 1.75"-thick
| chassis.
___________________________________________________________________
(page generated 2022-02-15 23:01 UTC)