[HN Gopher] ECC RAM should be a human right
___________________________________________________________________
ECC RAM should be a human right

Author : zdw
Score  : 49 points
Date   : 2023-01-21 21:44 UTC (1 hours ago)

(HTM) web link (dmitrybrant.com)
(TXT) w3m dump (dmitrybrant.com)

| greenbit wrote:
| Commodity PCs back in the 80s and 90s didn't have error
| correction but they did have parity (iirc). Correction requires
| three extra bits per byte compared to parity only carrying one
| extra bit per byte. I recall around 1990 you could get your
| 30-pin SIMMs as 9-bit (parity) or 8-bit (no parity), and
| virtually all of the PCs at the time wanted the 9-bit modules.
| Parity can't correct errors, but at least it can cause an
| exception when you read something that's had a bit flip.
| Felger wrote:
| Yes, I can indeed recall systems being shipped with 4x 9-bit
| sticks.
|
| And god, the horrific price of those sticks...
| phkahler wrote:
| Unfortunately DDR5 is going to complicate rather than fix the
| story.
| zdw wrote:
| It's very strange that DDR5 mandates internal ECC within each
| physical package, but not on the longer and possibly more
| EMI-sensitive connections between the memory chips and
| controller.
|
| I would have thought that the additional cost would be minimal
| (additional wiring on the logic board in some cases), but maybe
| this is just more artificial market segmentation?
| ilyt wrote:
| They have internal ECC coz that allows them to have higher
| yields: what would be considered a faulty chip in DDR4 can
| now be sold in DDR5. So it is effectively a cost-reducing
| measure for them. Exposing that to the user would not only
| cost extra pennies, but potentially have users go "hey, this
| stick is shit, look at how many correctable errors it is
| producing, please replace it"
| arp242 wrote:
| Why is that?
| loeg wrote:
| DDR5 will have some minimal ECC on the stick but critically
| does not mandate full runs to the CPU.
| Or in Wikipedia's words:
|
| > Unlike DDR4, all DDR5 chips have on-die ECC, where errors
| are detected and corrected before sending data to the CPU.
| This, however, is not the same as true ECC memory with an
| extra data correction chip on the memory module. DDR5's
| on-die error correction is to improve reliability and to allow
| denser RAM chips which lowers the per-chip defect rate. There
| still exist non-ECC and ECC DDR5 DIMM variants; the ECC
| variants have extra data lines to the CPU to send
| error-detection data, letting the CPU detect and correct errors
| that occurred in transit.
|
| So in some ways it is better than previous generations, but
| it gives vendors another excuse not to implement full-coverage
| ECC. That's my guess as to why GP said it complicates things.
|
| https://en.wikipedia.org/wiki/DDR5_SDRAM
| adhoc32 wrote:
| Latest Intel desktop CPUs (e.g. i9-13900KF) support ECC with
| the W680 chipset.
| fortran77 wrote:
| One of the main reasons I buy Xeon desktops is the ECC. With
| 128 GB of memory, and 1 bitflip/GB/year average error rate, it
| seems too risky to not use ECC for production work.
| Retric wrote:
| Real world numbers are closer to 1 bitflip/GB/hour than year
| because bit flips are highly correlated.
|
| "A large-scale study based on Google's very large number of
| servers was presented at the SIGMETRICS/Performance '09
| conference. The actual error rate found was several orders
| of magnitude higher than the previous small-scale or
| laboratory studies, with between 25,000 (2.5 x 10^-11
| error/bit*h) and 70,000 (7.0 x 10^-11 error/bit*h, or 1 bit
| error per gigabyte of RAM per 1.8 hours) errors per billion
| device hours per megabit. More than 8% of DIMM memory modules
| were affected by errors per year."
| https://en.wikipedia.org/wiki/ECC_memory
|
| A random stick of non-ECC memory might be far better than
| average, but you don't know.
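To put the two figures quoted above side by side, here is a rough back-of-the-envelope sketch. It assumes a decimal gigabyte (10^9 bytes) and treats errors as independent and evenly spread, which the thread itself says is not true in practice (flips are correlated, and the quoted study found only about 8% of DIMMs affected per year). The function names are made up for illustration.

```python
# Rates quoted in the thread (Wikipedia's summary of the Google study),
# in errors per bit per hour.
RATE_LOW = 2.5e-11
RATE_HIGH = 7.0e-11

BITS_PER_GB = 8 * 10**9      # assumption: decimal gigabyte (10^9 bytes)
HOURS_PER_YEAR = 24 * 365


def errors_per_hour(gigabytes, rate_per_bit_hour):
    """Expected bit errors per hour under a uniform, independent-error model."""
    return gigabytes * BITS_PER_GB * rate_per_bit_hour


def hours_between_errors(gigabytes, rate_per_bit_hour):
    """Mean time between bit errors, in hours, under the same simple model."""
    return 1.0 / errors_per_hour(gigabytes, rate_per_bit_hour)


def errors_per_hour_from_yearly(gigabytes, flips_per_gb_year):
    """Convert a 'flips per GB per year' figure into errors per hour."""
    return gigabytes * flips_per_gb_year / HOURS_PER_YEAR


if __name__ == "__main__":
    gb = 128  # the machine size mentioned in the thread
    print("1 flip/GB/year :", errors_per_hour_from_yearly(gb, 1.0), "errors/hour")
    print("study, low rate:", errors_per_hour(gb, RATE_LOW), "errors/hour")
    print("study, high rate:", errors_per_hour(gb, RATE_HIGH), "errors/hour")
    print("1 GB, high rate:", round(hours_between_errors(1, RATE_HIGH), 1),
          "hours between errors")
```

For 1 GB at the upper rate this gives about 1.8 hours between errors, matching the quoted figure. The averages are likely dominated by a minority of bad modules, so they say little about any one particular stick.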
| skunkworker wrote:
| I wish those motherboards didn't cost $450+. I've contemplated
| building a home server with a 13th gen + ECC because you also
| get Quick Sync onboard.
| coder543 wrote:
| Exactly. $450 for a motherboard just to get ECC support is
| ridiculous. I don't know how it is with AM5, but on AM4, you
| could use ECC memory with many normally-priced motherboards.
|
| Mentioning W680 feels pointless. You've _always_ been able to
| buy high end motherboards and stick ECC in them. The entire
| point of the article is that _all_ computers should be using
| ECC RAM, not just the expensive, workstation class computers.
| Dylan16807 wrote:
| It's worth keeping in mind that the chipset has zero
| involvement in ECC. The CPU is directly attached to the memory
| slots. They're using the chipset as an expensive dongle.
| NelsonMinar wrote:
| Still wild to me we lost ECC RAM. It used to be standard in PCs.
|
| Does Apple hardware come with ECC RAM? If anyone could make it
| make sense as a business, it's them.
| pram wrote:
| The Xeon based Macs had ECC of course. None of the ARM ones do
| (yet)
| [deleted]
| MBCook wrote:
| When was it standard? It's been the high-end extra thing for as
| long as I can remember.
| MisterTea wrote:
| I know the Pentium Pro/2/3 chipsets and motherboards all(?)
| supported it. Unsure of the Pentium 1 as the 430TX on my Tyan
| Tomcat IV doesn't, and that is a dual processor board. 486
| and earlier likely depended on the chipset as there were
| many.
|
| At work I have two working slot 1 PIII 800s each with 1GB
| ECC (4x 256MB DIMMs) on a regular Asus board (doing nothing
| but waiting to go home with me one day). The board reports
| the RAM is in fact ECC and that it is enabled.
| NelsonMinar wrote:
| I was thinking of 386 era computers and strictly speaking it
| was just parity RAM, not ECC. Which often led to annoyances
| when a single parity error would cause your whole computer to
| halt.
|
| Wikipedia says "By the mid-1990s, most DRAM had dropped
| parity checking as manufacturers felt confident that it was
| no longer necessary."
| https://en.wikipedia.org/wiki/RAM_parity
|
| I'd love to read a technical deep dive on RAM reliability
| over time. You'd think with increasing memory cell density
| and overall larger RAM the number of absolute errors on a
| desktop computer would be going up over time.
| Felger wrote:
| I can remember 486 motherboards in Packard Bell (quite the
| entry brand...) systems frequently used 36-bit ECC FP SIMMs.
|
| Printers and plotters from this era used ECC modules most of
| the time.
|
| But by the end of the century, they were replaced by
| unbuffered, unregistered 16/32/64-bit modules.
|
| Every mid-range server still uses ECC. Entry HPE servers use
| ECC UREG (unregistered, 9 chips) modules, while mid-range and
| higher use ECC REG modules (9 chips + interface controller
| onboard). Ironically, UREG modules are more expensive than ECC
| REG.
|
| Also, most workstations used ECC modules. Less frequently
| over the last 4-5 years.
| [deleted]
| dale_glass wrote:
| ECC RAM would actually be a boon to everyone, including gamers.
|
| ECC means not only that you know precisely when you've gone too
| far with overclocking, but potentially allows overclocking a bit
| further, relying on the fact that some amount of trouble can now
| be tolerated.
|
| It also means you're not going to break your OS by playing with
| this stuff. Memory corruption carries a huge risk of disk
| corruption, which can mean things like corrupt data, random
| crashes or an unbootable system that persists even after
| reverting everything to defaults.
| p1necone wrote:
| The sweet spot for overclocking ECC RAM is still before it
| starts malfunctioning. If it's clocked higher but is correcting
| for errors it will still be slower.
| ilyt wrote:
| Entirely depends on error rate
| [deleted]
| RealityVoid wrote:
| I doubt that would actually be useful with overclocking.
| I don't know the arch of the modern PC well enough to say with
| 100% confidence, but on embedded arches, the RAM has the parity
| bits checked when they get placed on the bus. If the error
| happens on data retrieval (or was already present), then the
| ECC saves you, but if it happens anywhere else... not really? I
| don't know if ALUs, for example, automatically include the
| parity bits in the computation.
| p1mrx wrote:
| You're talking about overclocking the CPU. ECC is more
| relevant when overclocking the RAM itself, which also affects
| gaming performance.
| Dylan16807 wrote:
| They specifically mean overclocking the memory.
| jjtheblunt wrote:
| totally embarrassingly naive question: why bother overclocking?
| dale_glass wrote:
| I think it's mostly pointless in this day and age.
|
| I'm just saying that it has a potential appeal for gamers
| too, so it's not just a datacenter type of technology that
| some nerds want to play with.
|
| At the very least it'd make overclocking safer and easier, so
| any manufacturer making gamer type boards with a lot of
| overclocking settings in the BIOS should like the idea of it.
| eric__cartman wrote:
| Some people prefer to trade off stability for a slight
| performance improvement. With modern hardware I don't think
| it's worth it to be honest. I want my computer to work day in
| and day out even if it means a 2% lower score in some
| benchmark.
| jjeaff wrote:
| There are also a lot of cases where you can overclock
| without sacrificing stability. The standard clock speed for
| any line of processors is simply the minimum it is tested
| for. But you sometimes get lucky and can get a better chip
| with more viable transistors. So you can boost the clock on
| those and reap the benefits without any drawbacks.
|
| There are sites and services that do "binning" where they
| test the specific chips and you can buy ones that have been
| vetted to clock higher.
| [deleted]
| LanternLight83 wrote:
| I'm with you, but it's worth noting that errant bit-flips are
| also the most convincing argument for vertically integrated
| file-systems like ZFS and BTRFS.
| whitepoplar wrote:
| At least make it user-configurable! I'd trade off a bit of memory
| capacity for ECC protection in a heartbeat.
| PaulKeeble wrote:
| ECC has been used as an artificial market segmentation mechanism
| for a long time and it needs to come to an end. RAM, just like
| SSDs and HDDs, ought to have some amount of self-protection
| against basic errors; all places where data is stored, even for
| short periods, need this.
| thinking001001 wrote:
| Digital privacy should be a human right. ECC RAM is just another
| iteration.
| theandrewbailey wrote:
| * * *
___________________________________________________________________
(page generated 2023-01-21 23:00 UTC)