[HN Gopher] Comparing DDR5 Memory from Micron, Samsung, SK Hynix
       ___________________________________________________________________
        
       Comparing DDR5 Memory from Micron, Samsung, SK Hynix
        
       Author : JoachimS
       Score  : 47 points
       Date   : 2022-02-15 19:59 UTC (3 hours ago)
        
 (HTM) web link (www.eetimes.com)
 (TXT) w3m dump (www.eetimes.com)
        
       | bcrl wrote:
       | If anyone thinks that on-die ECC is a good thing as the
       | manufacturers are touting, please go read the discussions on this
       | topic over in the forums at www.realworldtech.com. The goal of on
       | die ECC is purely to ensure that DRAM manufacturers are able to
       | obtain better yields by reducing the impact of defects, which is
       | not the same as ensuring data integrity. This means that it fails
       | the "trust but verify" tenet. Even worse is that some failures
       | may not even get reported to the system, unlike errors caught by
       | the ECC in the memory controllers and caches of modern CPUs.
       | The industry is trying to make this look like a good thing, but
       | I'm on the same side as Linus Torvalds: all modern systems should
       | ship with ECC memory. IBM got it right with parity memory in the
       | IBM PC.
        
         | grue_some wrote:
         | DDR5 contains two forms of ECC. The first is standard ECC which
         | is used to correct for bit flips in transmission. The second
         | on-die ECC is used to correct bit flips on the die, hence the
         | name. The world has already accepted that standard ECC on high
         | speed interfaces is a good idea, so why would on-die ECC be a
         | bad idea? Yes, they correct different error types, but they
         | both attempt to correct corrupted bits and they do so in a
         | mathematically similar way.
         | 
         | All that said, there are still ECC (with a separate ECC memory)
         | and non-ECC DIMMs for DDR5. So if the on-die ECC is concerning for
         | anyone, they can still get a DIMM with a separate ECC memory.
         | But the ECC happening at the interface between the DIMM and the
         | CPU will still exist always and you will have to trust it.
        
           | bcrl wrote:
           | Again, going back to the discussions over on RWT: some of the
           | less robust forms of ECC that DRAM manufacturers typically
           | implement can end up amplifying the problem by turning double
           | bit flips into silent multi bit flips which makes the memory
           | controller's job much harder. DRAM manufacturing process tech
           | is not optimized for logic like CPUs are, and those
           | limitations really do constrain how much logic (or "how
           | good") the ECC implemented on DRAM chips is. I trust CPU
           | manufacturers to get memory controllers right more than I
           | trust DRAM manufacturers to get ECC right for one simple
           | reason: row hammer.
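
The amplification effect described above can be shown with a toy code. This
is a minimal sketch using a textbook Hamming(7,4) single-error-correcting
code, which is an assumption for illustration and not the code any DRAM
vendor actually ships. The function names are hypothetical. With two flipped
bits, the decoder "corrects" a third, innocent bit and lands on a different
valid codeword, so nothing downstream sees an error flag.

```python
def hamming74_encode(data4):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data4
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Single-error correction only: flip whichever bit the syndrome names."""
    word = list(codeword)
    syndrome = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:
        word[syndrome - 1] ^= 1  # assumes exactly one bit was wrong
    return word, syndrome


def bit_errors(a, b):
    """Count positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


good = hamming74_encode([1, 0, 1, 1])

# Two bit flips (positions 1 and 2) before the decoder sees the word.
damaged = list(good)
damaged[0] ^= 1
damaged[1] ^= 1

repaired, _ = hamming74_correct(damaged)
print("errors before decode:", bit_errors(damaged, good))
print("errors after decode: ", bit_errors(repaired, good))
```

In this toy code every double flip leaves three wrong bits after decoding,
which is the kind of multi-bit error a controller-side ECC then has to cope
with.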
        
         | kimixa wrote:
         | It's entirely possible that on-die ECC is still a good thing
         | for the end user - to really judge you need to compare the
         | error rate (and proportion the ecc corrected) of dies that
         | would have previously failed validation. It may be that it's
         | good for both - IE more dies can be used (so higher supply and
         | lower prices to the consumer), yet the un-fixed error rate is
         | still lower than dies that would have previously passed
         | validation but lack on-die ECC.
         | 
         | I doubt any manufacturer would make that public, however, but
         | an estimate may be made if error rates actually start
         | increasing in the real world due to ddr5 allowing this.
         | 
         | I agree that end-to-end ECC really should be the default for
         | consumer products these days, but so long as the big players
         | see it as a "Professional User" product differentiation point
         | it'll always be more expensive than it should be.
        
           | deckard1 wrote:
           | > so long as the big players see it as a "Professional User"
           | product differentiation point it'll always be more expensive
           | than it should be.
           | 
           | Right. The more important Linus to speak up for ECC isn't
           | Torvalds. It's Linus Sebastian, of Linus Tech Tips. He's made
           | a few videos on ECC targeted towards gamers. Gamers drive the
           | enthusiast PC market and when they start caring, more ECC
           | gets made which will drive the cost down a bit. Last time I
           | bought 32GB DDR4 UDIMM ECC there was literally one SKU. Not
           | manufacturer. Not brand. _SKU_. One single item in production
           | in the entire world. 16GB wasn't much better off, either.
           | 
           | It's a hard sell, though. Non-ECC will always be cheaper
           | because it costs less to produce. Gamers don't really care
           | that ECC prevents one crash in years because they are used to
           | frequent crashes already. They are largely being fed dogshit
           | from the AAA gaming industry and they have learned to just
           | deal with it. Crashes are just part of being on the bleeding
           | edge of gaming and Nvidia/Radeon drivers. One less crash in a
           | sea of crashes isn't something gamers are lining up for. But
           | a better model GPU or bigger SSD? It's an obvious choice.
        
             | kimixa wrote:
             | > Gamers don't really care that ECC prevents one crash in
             | years because they are used to frequent crashes already.
             | 
             | I work on GPU drivers for one of those companies.
             | 
             | We regularly get reports and backtraces that cannot be
             | reproduced, or "Cannot Happen" without some external factor
             | (e.g. some other bit of code poking around our memory
             | space). Often they're just silently dropped or ignored on
             | the long tail of issues that nobody can get any traction
             | on.
             | 
             | My understanding from the hyperscalers' stats is that ECC
             | correction events happen a lot more than "Common Knowledge"
             | may imply - I wonder just what proportion of things that
             | are blamed on software may actually be due to hardware
             | issues like this?
             | 
             | Again, without a significant change in the market (IE
             | enough gamers start using ECC to actually be statistically
             | relevant and comparing stability) this cannot really be
             | tested, but I've wondered.
        
               | bcrl wrote:
               | Except that anyone using Intel desktop CPUs pretty much
               | can't use ECC thanks to marketing deciding that ECC is a
               | market segmentation feature.
               | 
               | The real way to make ECC happen industry wide is for OS
               | vendors like Microsoft to make it a platform requirement.
               | A no ECC, no boot policy would change things overnight.
               | Sadly, we can't even get DRAM manufacturers to fix row
               | hammer properly, so the likelihood of this happening is
               | pretty much nil.
        
               | sliken wrote:
               | If people cared, they would buy ECC capable chips. In
               | fact my desktop is a Xeon e3-1230v5, which was cheaper and
               | slightly slower (3.4 vs 3.6 GHz or something) than the
               | equivalent i7. It was $50 more for the motherboard and
               | $100 more for the ram. I'm sure if the market flooded to
               | ECC capable chips (the silicon is the same) Intel would
               | sell them.
               | 
               | So many people grumble, but I'm not really sure Intel
               | should push ECC if desktop users aren't willing to pay a
               | modest premium for it.
               | 
               | Many cheer AMD, which does not disable ECC on desktop
               | chips, but neither do they promise ECC will actually
               | work. It's a confusing mess between physical capacity
               | (ram increases by 16GB when you add a 16GB dimm), and the
               | actual correction of errors and telling the OS about the
               | event. Only on the EPYC does AMD test and certify that
               | ECC will work.
        
               | wmf wrote:
               | You can use ECC by buying the Xeon version which is only
               | slightly more expensive.
        
       | g42gregory wrote:
       | Maybe I am not understanding something, but I thought that total
       | memory bandwidth is critical for Deep Learning applications. This
       | is where HBM on-die would shine, no? I am deferring the purchase
       | of a new desktop/server until processors with HBM come to market.
       | I think AMD is shipping EPYC engineering samples with some
       | version of memory and Intel is slated to release by the end of
       | the year. Am I wrong about this?
        
         | wmf wrote:
         | The only CPU with HBM is Sapphire Rapids and it may cost $20K;
         | for that money you're probably better off buying an H100.
        
       | hulitu wrote:
       | Article seems to imply that all DDR5 chips have ECC. Is this
       | true?
        
         | sliken wrote:
         | Yes, as discussed on other threads here, the ECC helps increase
         | chip yields, but does not prevent offchip errors. So it's not
         | equivalent to what people normally mean by ECC memory which
         | stores parity that will correct single bit errors and detect 2
         | bit errors anywhere in the chip, dimm, dimm slot, motherboard,
         | socket, or CPU areas.
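
The "correct single bit errors and detect 2 bit errors" behaviour (SECDED)
can be sketched with an extended Hamming(8,4) code: a Hamming(7,4) word plus
one overall parity bit. This is a toy illustration under that assumption;
real ECC DIMMs protect much wider words, and the function names here are
hypothetical.

```python
def secded_encode(data4):
    """Encode 4 data bits as Hamming(7,4) plus one overall parity bit."""
    d1, d2, d3, d4 = data4
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p3, d2, d3, d4]
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]


def secded_decode(word8):
    """Return (status, data4); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word8)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos - 1]:
            syndrome ^= pos
    parity_ok = sum(word) % 2 == 0

    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd number of flips: treat as a single error and repair it.
        if syndrome:
            word[syndrome - 1] ^= 1
        else:
            word[7] ^= 1  # the overall parity bit itself was hit
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two flips, report, don't guess.
        status = "uncorrectable"
    return status, [word[2], word[4], word[5], word[6]]


stored = secded_encode([1, 0, 1, 1])

single = list(stored)
single[5] ^= 1
print(secded_decode(single))

double = list(stored)
double[0] ^= 1
double[5] ^= 1
print(secded_decode(double))
```

The point of the extra parity bit is that a double flip is reported to the
system instead of being silently "repaired" into the wrong data.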
        
       | tester756 wrote:
       | >DDR5 provides both data and clock rates that double the
       | performance up to at least 7,200 MB/s. Additionally, DDR5 lowers
       | the operating voltage to 1.1V.
       | 
       | hmm? 7GB/s is the performance that modern disks achieve
        
       | kamilner wrote:
       | Why is it that LPDDR is recently faster than DDR of the same
       | 'generation'? I thought LPDDR is purely a lower voltage version
       | of DDR, so I naively would have expected worse performance. Is it
       | because it's typically closer (physically) to the CPU?
        
         | bcrl wrote:
         | DDR is typically a bus with more than 1 DIMM slot per channel.
         | LPDDR is typically point to point. Electrically, it's a lot
         | easier to meet signal integrity requirements on a point to
         | point trace than it is to make a multi drop bus work properly.
        
         | grue_some wrote:
         | LPDDR uses a wider bus so, at a similar clock rate, it is
         | faster.
        
         | dhdc wrote:
         | More importantly, because of the low-power requirement, LPDDR
         | typically has better binned dies than DDR.
        
         | sliken wrote:
         | I believe it's just the advantages you get from very short
         | trace lengths. Dimm slots are usually inches away, so you end
         | up with long traces from CPU -> dimm slot, pay the overhead of
         | the dimm slot connection, and then traces within a dimm.
         | 
         | LPDDR, on the other hand, moves the individual DRAM chips as
         | close as possible to the CPU and has no connector. This also
         | makes it much easier to have wider memory. A 13" MBP can have a
         | 512 bit wide memory system with at least 16 channels in a
         | thin/light laptop that is quite power efficient. To get similar
         | with DIMMs you'd have to buy a dual socket server motherboard
         | with 8 channels per socket and would be lucky to fit that in an
         | ATX size motherboard in a 1.75" thick chassis.
        
       ___________________________________________________________________
       (page generated 2022-02-15 23:01 UTC)