[HN Gopher] DRAM thermal issues reach crisis point ___________________________________________________________________ DRAM thermal issues reach crisis point Author : rbanffy Score : 165 points Date : 2022-07-18 13:37 UTC (9 hours ago) (HTM) web link (semiengineering.com) (TXT) w3m dump (semiengineering.com) | anthony_r wrote: | we're lucky that this happens at 360 Kelvin and not at 200 Kelvin | or even lower. | p1mrx wrote: | Note that "kelvin" is lowercase: | https://english.stackexchange.com/questions/329629/is-kelvin... | 8jy89hui wrote: | If our habitable temperature was cooler or hotter, we would use | different materials to best reflect that environment. I'm not | so sure it is luck | tinus_hn wrote: | It's lucky in the northern hemisphere there is an easily | recognizable star pointing almost exactly at the North Pole, | which makes navigation much easier. | | It's lucky some available material worked the right way to | make a transistor. | | It's lucky some person smart enough to make that work got to | work on that. | | History is full of lucky coincidences like that. How many | Einsteins have died out in the jungle, without access to our | scientific knowledge or a way to add to it? For most of | history and partly still today, being a scientist wasn't | possible for just about anyone, you had to be from the right | family. It's _all_ about luck. | H8crilA wrote: | So let's just use those that work up to 400K :) | somebodynew wrote: | There is a bit of luck in even having any viable materials | that work at the required temperature to choose from. | | For example, humanity hasn't been able to find a single | appropriate material for a superconductor at room | temperature/atmospheric pressure despite significant | research, but a civilization living below 100 K has a myriad | of options to choose from. Superconductors are high | technology to us, but if your planet is cold enough then | superconducting niobium wire would be a boring household item | like copper wire is for us. | dodobirdlord wrote: | Niobium superconducts at 9.3K, so that would be a pretty | cold household! | marcosdumay wrote: | Hum... We inhabit that temperature exactly because it | allows for a wide range of chemical reactions in a | controlled fashion. | | The Anthropic Principle is not luck. | | We are lucky that those interesting things are possible. We | are also unlucky that many interesting things are not | possible. But given that they are possible, it was almost | inevitable that most of them would be possible around us. | YakBizzarro wrote: | well, depends how you define lucky. at cryogenic temperature, | the leakage current of a transistor is so small that you | virtually don't require DRAM refresh. I tested DRAM cells with | discharge times of hours, and the transistor was not at all | optimized. See https://www.rambus.com/blogs/part-1-dram-goes- | cryogenic/ (not my work) | klodolph wrote: | It's a combination of chemistry and geometry (and other | factors). Maybe there's some luck. | | There are ICs and components built for operating in extreme | environments, like drilling. You can get SiC (silicon carbide) | chips that operate above 200degC (473K), if that's important to | you. There are also various semiconductors that are worse than | silicon at handling high temperatures, like germanium. Old | germanium circuits sometimes don't even work correctly on a hot | day. 
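A rough illustration of the cryogenic-DRAM point above: cell retention is set by leakage, which falls off steeply as temperature drops. The toy model below is a minimal sketch (not from the article) that assumes the common rule of thumb of retention roughly doubling per 10degC of cooling, anchored to a 64 ms refresh budget at 85degC; that rule certainly breaks down over such a wide range, but it shows why refresh becomes nearly unnecessary when cold and why the budget tightens above 85degC.

    # Toy model only: DRAM retention vs. temperature, assuming retention
    # roughly doubles for every 10 degC of cooling (a common rule of thumb)
    # and a 64 ms budget at 85 degC. Real leakage is Arrhenius-like and the
    # simple rule is not reliable far from the anchor point.
    def retention_seconds(temp_c, t_ref=0.064, temp_ref_c=85.0):
        return t_ref * 2 ** ((temp_ref_c - temp_c) / 10.0)

    for t in (95, 85, 25, -75):  # hot spec corner, nominal limit, room, cold
        print(f"{t:>4} degC -> {retention_seconds(t):.3g} s")
    # ~0.03 s at 95 degC, 0.064 s at 85 degC, ~4 s at room temperature,
    # and over an hour well below freezing - consistent with the hour-scale
    # discharge times reported for cryogenic cells above.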
| | If we lived at 200K, I'm sure that there's a host of | semiconductor materials which would be available to us which | don't work at 300K. | dusted wrote: | Sounds like nothing a little liquid nitrogen can't fix. | | > (as a standard metric, about once every 64 milliseconds) | | 64 milliseconds? wow.. I thought they'd need refreshing way more | often | brutusborn wrote: | I loved this part at the end: "By contrast, allowing a | temperature increase for chips in large data centers could have | surprising environmental benefits. To this point, Keysight's | White recalled that a company once requested JEDEC increase the | spec for an operating temperature by five degrees. The estimate | of the potential savings was stunning. Based on how much energy | they consumed annually for cooling, they calculated a five degree | change could translate to shutting down three coal power plants | per year. JEDEC ultimately compromised on the suggestion." | JJMcJ wrote: | I've heard of some large companies that run their data centers | hot. | | Cheaper to have a slightly higher failure rate, or have the | computers throttle their clock speed, than to pay for extra air | conditioning. | woleium wrote: | Could also be more likely that it's cheaper to extend the | life of an older DC by accepting higher temperatures and | failure rates than to upgrade the hvac to accommodate newer | higher density designs | klysm wrote: | Hard to do the math here though because you don't know the | failure statistics in advance. Kind of a multi-armed bandit | problem of sorts. | magicalhippo wrote: | Wouldn't the Arrhenius equation[1] be a good approximation? | It's used in the industry[2] from what I know. | | Of course you'll need some data for calibrating the model, | but if you got that? | | [1]: https://en.wikipedia.org/wiki/Arrhenius_equation | | [2]: https://www.ti.com/lit/an/snva509a/snva509a.pdf | buescher wrote: | Well, sometimes, and more frequently now at really small | process nodes, but mostly no. Most electronic failures | are basically mechanical and thermal cycling will cause | fatigue failures more than elevated temperatures will | accelerate things like electromigration. Lots of people | still use 1960s style handbook methods anyway because | there's no plug-and-chug replacement. | | The groundbreaking work here was by Michael Pecht back in | the early nineties: | https://apps.dtic.mil/sti/pdfs/ADA275029.pdf | gjsman-1000 wrote: | _Only three?_ That 's not an immediate win. What if the | temperature increase causes just so slightly more failures, | causing so slightly more replacements, and each replacement | requires energy to make, ship, install, replace, recycle, the | effects of increased demand... What if it lasts slightly less | long, causing more early failures and eWaste? After all that | potential risk, is it a benefit still, and if so, how much? | | We don't know and it is hard to know - but I don't blame JEDEC | and would not call it a "compromise" on their part like it was | a superior option. | Spooky23 wrote: | For one company? That's pretty impressive. | | When I was on an architecture team that consolidated ~80 | datacenter to 3 circa 2010, this was a key dollar driver. We | raised the temperature ~6 degrees from the average temp, | which meant kicking out a few vendors initially. The cost | savings for doing that was essentially the total operational | costs of 5 datacenters. | | The annual failure rates for the hardware did not change at | all by any metric. 
Number of service impacting hardware | failures went to zero due to the consolidation. | | In general, if you operate within the operating ranges of | your hardware, you won't have failure. You will have | complaints from employees, because computers will operate at | temperatures not comfortable for humans. | benlivengood wrote: | It's almost certainly Google, since they've historically | run their data centers hotter than most [0]. Cooling | efficiency increases with a higher delta-T to the working | fluid, and Google uses a gigawatt or two continuously [1]. | From PUE numbers that's hundreds of MW spent on cooling, so | making it more efficient is quite worth it. | | [0] https://www.google.com/about/datacenters/efficiency/ | [1] https://www.cnbc.com/2022/04/13/google-data-center-goal-100p... | sllabres wrote: | We did the same several years ago. At the time I found http://www.cs.toronto.edu/~bianca/papers/temperature_cam.pdf | quite interesting. I didn't find many other precise papers | about issues when running at higher temperatures, but | many "should" and "can". | | Based on ASHRAE there are two guidelines from HPE and IBM: | https://www.chiltrix.com/documents/HP-ASHRAE.pdf | https://www.ibm.com/downloads/cas/1Q94RPGE | | We found (by measurement) that some places in the | datacenter with suboptimal airflow are well over the | average or simulated temperature so one can leave the safe | temperature envelope if one isn't careful. | Spooky23 wrote: | The beauty of a big project like this is you get the | engineering resources to make sure the datacenter is | working right. | | The hyper scale people take this to the next degree. | Freestyler_3 wrote: | Don't you want to have AC running on cooling mode or | dehydration mode to get water out of the air? | | edit: this makes me wonder what is the ideal humidity in a | data centre? Is too dry a thing? | Dylan16807 wrote: | Dry air makes static electricity buildup more. | picture wrote: | Cmon.. logically it has to be more than just three right? | "Cooling efficiency" sometimes comes in units of W/degC | difference, so I'd imagine that a few more degrees would be a | huge deal. | marcosdumay wrote: | You usually have to budget a few degC of difference just | for pushing enough energy through heat exchangers. So the | rate of chip temperature / external temperature is lower | than the rate that effectively determines the cooling | efficiency. | gjsman-1000 wrote: | I don't know how much power efficiency would be saved - my | concern is more that it is completely logical that running | any part at higher temperatures causes increased risk of | failure, whether it be a computer part or a mechanical | part. _How much?_ I don't know - I just don't blame JEDEC | for recognizing this is not a clear and obvious win. | | Imagine if the failure rate was raised by as little as 1%. | RAM failure is not uncommon compared to other parts - I've | had it happen before and render a system unable to boot, | that's why we have Memtest86 and not CPUtest86 or | SSDtest86. A 1% increase in failure over 5 years would have | effects just as unbelievable as the power saving that | increasing the temperature would be. How many smartphones | would be junked? How many PCs would be thrown out for not | working by people who are average Joes who can't diagnose | them, and the extra waste that generates from both | disposing the old PC and purchasing a new one?
Perhaps the | new PC is more efficient, but which is better: the greater | emissions from keeping the old one running, or the extra eWaste | in the ground from replacing it with a likely more efficient | new one? | | The point is that it is not a clear win. With further | research it might be, and I might be all for it. I'm only | nitpicking the description of it as being a "compromise" as | though it were obvious. | | [@picture: I'm at my posting limit for the day because HN | is, well... I'll leave their censorship policies for | another day. I would agree with you if the RAM with the 90C | limit were strictly ECC RAM because that is most often used | in data centers and not consumer parts. Maybe we have non-ECC/85 RAM and ECC/90 RAM options...] | dcow wrote: | Well now you're just being hyperbolic. As you say, this | is an engineering problem so solutions are far from their | ideal states in either direction. However, a 1% increase | in ram failure rates ruining the world? That doesn't | sound right. Errors are encountered in RAM _all the time_ | and guess what, they're corrected often by the hardware | before even bothering the system. I'm sure we could deal | with a 1% increase... | Spooky23 wrote: | Most datacenter hardware is fine at 95 degrees F (inlet | temp). Approved configurations are usually available to | 105 degrees F or slightly higher. Some devices can run as | high as 130F. | | In the operating range, you're not going to have any | measurable change in operations or failure rate - if you | do the parts are defective. All of the stories you hear | about this and that are conjecture. | smolder wrote: | Interestingly, computer chips can often be run at lower | voltage and wattage for a given frequency if they are | kept at a colder temperature. As a home user I can | significantly reduce power draw for a CPU/GPU by | improving the cooling solution and lowering voltages. | | The reasons this doesn't work for datacenters are two-fold, I think: First, they won't see efficiency | improvements just by keeping their CPUs and GPUs (or RAM) | cooler because the power levels/tables for the chips are | baked-in, and operators aren't going to the trouble of | tweaking voltages themselves. Second, even if they did | tweak voltages, the cost of sustaining lower temperatures | with better cooling likely won't outweigh the savings | resulting from lower power draw for the chips. | | Still, this raises the question of whether designing | hardware for higher operating temperatures is always the | right move. At some point there's going to be a cost in | performance and/or efficiency that outweighs the savings | from allowing higher temperatures. Ideally these | tradeoffs should be balanced as a whole. | kllrnohj wrote: | I think you missed the biggest reason this doesn't do | much for servers - they _already_ run at low frequencies | & voltages. | | For example take the Epyc 7742, at 225W it sounds super | power hungry. But the 64-core chip only boosts to 3.4ghz | max (2.25ghz base). That's less than the base clock of | almost any of the Ryzen consumer CPUs. And if you look at | the lower frequency end of https://images.anandtech.com/doci/16214/PerCore-1-5950X.png there's not a whole heck of | a lot of efficiency gains likely to be had below that | ~3-3.4ghz mark. They're already basically sipping power | at something like 3w per CPU core or less. 225w / 64c = | 3.5w/c, _but_ the IO uncore isn't exactly cheap to run | and iirc sits more like in the 50-70w range.
So subtract | that out and you're at more like 2.5-2.7w/c. I don't | think throwing cooling at this is really going to get you | much of a gain. | bradstewart wrote: | This usually isn't true _at scale_ though. The chip | manufacturers do a ton of validation and qualification to | set the operating parameters (voltage, etc). | | You can undervolt (or overclock) one specific chip, | individuals have been doing this at home for basically | ever, but there's (almost) always a system-specific | validation process you then do to make sure the system is | stable for a specific workload with the new parameters. | | And these parameters differ between batches of chips, or | even between chips within a batch. | | It's also significantly harder to drastically reduce the | temperature of the chips inside of a single server, given | the machine density of a typical data center. | rbanffy wrote: | > It's also significantly harder to drastically reduce | the temperature of the chips inside of a single server, | given the machine density of a typical data center. | | It'd be fun, however, if we could dynamically adjust that | according to workload. If workload is light, you could | consolidate load into fewer sockets/memory sticks and | power down everything in that socket. | picture wrote: | That's for sure. I 100% agree that increased temperature | will statistically increase failure rate. I'm just | thinking that, the most common mechanisms of thermal | failure in electronics are caused by repeated thermal | cycling which cause fatigue and stress failures at | interconnects (solder bumps, silicon bonding, etc). Data | centers are designed to be operated in a relatively very | constant temperature environment, so I would suspect that | the failure rate may not be raised significantly. | uoaei wrote: | One single company changing one single design parameter and | enabling savings on the scale of _multiple power plants_? | That is as immediate as wins get. | __alexs wrote: | Maybe RAM will finally get more than a 4mm thermal pad and a | random bit of Alu for cooling. Seems like most cooling designs | have treated RAM as even more of an after thought than VRMs up | until recently. | | Even in most servers the accommodation for RAM cooling has | basically just been orientating the DIMMs to line up with | airflow. They are still packed together with minimal clearance. | dcow wrote: | > cooling has basically just been orientating the DIMMs to line | up with airflow | | Isn't that server cooling in a nutshell? Ram high volumes of | airflow through the chassis with stupidly loud fans and hope | the parts stay cool? | dodobirdlord wrote: | You still need to conduct the heat away from the sources to a | radiator of some sort, since cooling is proportional to | surface area and it's much easier to increase surface area by | adding fins than by increasing airflow. You can only speed up | the air to a certain point, past which better cooling becomes | a matter of shaping the components for more contact with the | air. | kllrnohj wrote: | Sure but the airflow over DIMMs in a server chassis is | already _vastly_ more cooling than RAM gets in any consumer | application other than GDDR on GPUs. | __alexs wrote: | The density is also vastly higher. | Ekaros wrote: | Yeah, kinda weird that on ATX RAM is placed in way that | is perpendicular to usual CPU cooling or even general air | flow. The top mounted fans do change this, but I don't | think those are very common. 
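A back-of-envelope sketch of the surface-area point above, using Newton's law of cooling, Q = h * A * dT. Every number here is an assumption picked only for illustration (heat-transfer coefficient, DIMM area, allowed temperature rise), not a measurement of any real module:

    # Convective cooling estimate, Q = h * A * dT. Illustrative values only.
    h = 50.0           # W/(m^2*K): plausible figure for decent forced air
    dT = 30.0          # K: allowed DRAM surface rise above the inlet air
    area_bare = 0.008  # m^2: very roughly both faces of one DIMM
    area_finned = 4 * area_bare  # a modest finned heatspreader

    for name, area in (("bare DIMM", area_bare), ("finned spreader", area_finned)):
        print(f"{name}: ~{h * area * dT:.0f} W at a {dT:.0f} K rise")
    # ~12 W vs ~48 W: dissipation is linear in area, while pushing air
    # faster only raises h sub-linearly, so fins are the cheaper lever.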
| sbierwagen wrote: | Makes it easier to keep all the traces the same length: h | ttps://electronics.stackexchange.com/questions/74789/purp | os... | kllrnohj wrote: | Although the entire socket can be rotated 90* for even | better traces, which is what the EVGA Kingpin | motherboards do ( | https://www.evga.com/articles/01543/EVGA-Z690-DARK- | KINGPIN/ ) | mjevans wrote: | This design would have made so much more sense before the | top of the case closed loop watercooler radiator setups | became popular. | | I still like this a lot, but now the top down fan and | some kind of ducting to help direct the air out the top / | side vent makes more sense. There's so much heat these | days everyone needs the baffles inside of a case. | AshamedCaptain wrote: | I am not sure how much this is related to external cooling | versus actually internal thermal dissipation. DDR JEDEC | standards have actually decreased power consumption on every | generation. | __alexs wrote: | They have reduced voltage but power consumption per sq-mm has | gone up with increased densities. Many people run DRAM at | above JEDEC speeds which usually requires higher voltages | too. | | Peak power consumption of DDR4 is around 375mW/GB @ 1.2V, | DDR5 drops this about 10% but also increases the maximum | density of a DIMM by 8x to 512GB which is like, 150W for a | single DIMM. | formerly_proven wrote: | There are only three (tiny) 12 V power pins on a DDR5 | module, neither that nor the form factor allows for | dissipating anywhere close to 150 W. The teased 512 GB | Samsung module doesn't even have a heatspreader. | __alexs wrote: | VIN_BULK is 5V with a max current of 2A but every data | pin provides current that is used on the DIMM in some | respect. | deelowe wrote: | There's been talk of eliminating sockets for years. Something | has got to give. | jnwatson wrote: | You can still actively cool socketed RAM. | deelowe wrote: | Sort of. Trace length is already a nightmare. | zeroth32 wrote: | more compact chips will have a higher failure rate. Not a | great idea for servers. | to11mtm wrote: | There's a fun curve on this to be sure. | | If I had to guess, Servers would not go any further than | some sort of memory-backplane where the memory for multiple | channels was integrated onto a single PCB. | | Even then, IIRC hot-swapping of memory modules is a thing | for some servers, so that will have to be handled somehow. | wallaBBB wrote: | Question that comes to mind - Is M2 (with thermal issue on new | Air) affected considering how RAM is packed there? | Toutouxc wrote: | What thermal issues? All I've seen so far are people who don't | seem to understand how passive cooling works, despite the M1 | Air being out for two years and working the same way. | nostrademons wrote: | The M2 chip generates more heat than the M1, with 20% more | transistors and about a 12% higher clock speed. M2 Mac Pro | has thermal issues compared to M1 Mac Pro as well, even with | the fan. | buryat wrote: | mac pro doesn't have m1/2 | ywain wrote: | They were likely referring to the laptop Macbook Pro, not | the desktop Mac Pro. | webmobdev wrote: | Perhaps OP came across this recent article - _Reviewers | agree: The M2 MacBook Air has a heat problem_ - | https://www.digitaltrends.com/computing/m2-macbook-air- | revie... . | GeekyBear wrote: | Throttles under load isn't a heat problem. 
| | This review of Lenovo's Thinkpad Yoga is what a heat | problem looks like: | | >Unfortunately, the laptop got uncomfortably hot in its | Best performance mode during testing, even with light | workloads. | | https://arstechnica.com/gadgets/2022/07/review-lenovos- | think... | | Too hot to comfortably touch, even under light workloads, | unless you set it to throttle all the time? That's a heat | problem. | tedunangst wrote: | Is it really a problem if it's designed to thermally | throttle? | EricE wrote: | It is if you are expecting maximum performance. | tinus_hn wrote: | Perhaps that's the problem. Their expectations are | unrealistic. Did Apple promise no thermal throttle? | Dylan16807 wrote: | Marketing usually talks about unthrottled speed only, | including Apple's here as far as I have seen. | jhallenworld wrote: | Maybe DRAM becomes non-viable, so switch to SRAM. Which is | denser, 14 nm DRAM or 5 nm SRAM? | 55873445216111 wrote: | SRAM is ~10x higher cost per bit (due to memory cell size) than | DRAM | [deleted] | to11mtm wrote: | DRAM. | | IIRC TSMC's 135MBit 5nm example is 79.8mm^2, although that's | got other logic. | | In the abstract, a 0.021 square-micrometer-per-bit size [1] | says you'd need about 21mm^2 for a gigabit (base 10) of 5nm | SRAM, without other logic. | | Micron claimed 0.315Gb/mm^2 on their 14nm process, [2] so | somewhere between a factor of 6 and 7. | | That said, my understanding is that there is some sort of wall | around 10nm, where we can't really make smaller capacitors and | thus the limitation on things. (This may have changed since I | last was aware however.) | | (There is also the way than 'nm' works these days... but I'm | not qualified to speak on that) | | Also, AFAIK SRAM is still broadly speaking more power hungry | than DRAM (I may be completely out of date on this though...) | | [1] - https://fuse.wikichip.org/news/3398/tsmc-details-5-nm/ | | [2] - https://semiengineering.com/micron-d1%CE%B1-the-most- | advance... | Victerius wrote: | > A few overheated transistors may not greatly affect | reliability, but the heat generated from a few billion | transistors does. This is particularly true for AI/ML/DL designs, | where high utilization increases thermal dissipation, but thermal | density affects every advanced node chip and package, which are | used in smart phones, server chips, AR/VR, and a number of other | high-performance devices. For all of them, DRAM placement and | performance is now a top design consideration. | | I know this may not be a cheap solution, but why not start | selling pre-built computers with active cooling systems? | Refrigerant liquids like those used in refrigerators or water | cooling could be an option. The article addresses this: | | > Although it sounds like a near-perfect solution in theory, and | has been shown to work in labs, John Parry, industry lead, | electronics and semiconductor at Siemens Digital Industries | Software, noted that it's unlikely to work in commercial | production. "You've got everything from erosion by the fluid to | issues with, of course, leaks because you're dealing with | extremely small, very fine physical geometry. And they are | pumped. One of the features that we typically find has the lowest | reliability associated with it are electromechanical devices like | fans and pumps, so you end up with complexity in a number of | different directions." 
| | So instead of integrating fluids within the computer, build | powerful mini-freezers for computers and store the computer | inside. Or split the warm transistors from the rest of the build | and store only those inside the mini freezer, with cables to | connect to the rest of the computer outside. | CoolGuySteve wrote: | I've always wondered why motherboards aren't placed at a slight | angle like the wing of a car so that the air moving over it has | a higher angle of incidence, higher pressure, and higher | thermal capacity. | | With the angle, you can also place cable connectors and whatnot | on the bottom of the board so they don't obstruct airflow as | much. | | Basically, optimize PV = nRT inside the computer case at no | extra cost other than a redesign. | saltcured wrote: | I'm struggling slightly to envision the effect you are | seeking. My motherboards don't tend to be flying through the | air and so lack a well-defined angle of attack... :-) There | already exist horizontal and vertical motherboard mounts in | different computer cases, including ones that could be stood | either way to suit the desktop. In my experience, this | doesn't affect cooling that much. | | I think the fan, internal baffle, and vent positions dominate | the airflow conditions inside the case. So, rather than | tilting a motherboard, wouldn't you get whatever you are | after with just a slight change in these surrounding | structures? | CoolGuySteve wrote: | You seem to be ignoring that all the punch through | connectors on a board are currently on the side that air | must pass over. | | Furthermore, I've never seen a case, either desktop or | rackmount, that allows one to angle the fans at anything | other than a 90 degree angle or parallel to the board. | | None of this makes sense in terms of fluid dynamics. | saltcured wrote: | Having a smooth board seems at odds with having a large | surface area for heat transfer, doesn't it? And wouldn't | laminar flow also have less movement near the surface? | For optimal cooling, would you actually want turbulence | to mix the layers? Instead of mounting fans at different | angles, add some vanes or even duct work to aim and | accelerate the flow where it needs to transfer heat. | | But, given that boards do not have completely | standardized layouts, it seems like you eventually need | to assume a forest of independent heat sinks sticking up | in the air. You lose the commodity market if everything | has to be tailor made, like the integrated heat sink and | heat pipe systems in laptops. | SketchySeaBeast wrote: | I would assume because the things with the greatest heat have | typically had such a requirement for active cooling that | minor optimization wouldn't have helped much and for | everything else you really didn't worry about (though my | motherboard now has heat-pipes across the VRMs and my RAM and | northbridge have got big old heat spreaders). | CoolGuySteve wrote: | Yeah, the way my case is laid out, airflow to the VRM is | blocked by the heat spreaders on the RAM and the ATX power | connector. AMD systems in particular seem to require better | memory thermals. | | It seems like we're reaching a point where a new ATX | standard is required to ensure the memory and GPU can make | contact with a large heatsink similar to how the trashcan | Mac Pro and XBox Series X are designed. Doing so would also | cut down on the ridiculous number of fans an overclocked | gaming PC needs these days, my GPU and CPU heatsinks have 5 | 80mm fans mounted to them. 
| | ATX is great but it seems like only minor improvements to | power connectors and whatnot have been made since it was | introduced in 1995. | Macha wrote: | Are the trashcan Mac Pro and Xbox Series X considered | efficient cooling solutions? I thought the trashcan Pro | had issues at higher temperatures which in turn limited | their ability to use higher end parts and in turn forced | the return of the cheese grater? | | The series X GPU then is considered equivalent to a | desktop 3070, and laptop 3080s exist and are also | considered equivalent to a desktop 3070, so don't require | anything particularly novel in terms of cooling solutions | (3080 laptops are loud under load, but so is the series | X). | | Overclocked components are so heavy in cooling needs as | they're being run so far outside their most efficient | window to get the maximum performance - which is why | datacenters which care more about energy usage than | gamers tend to use lower clocked parts. | CoolGuySteve wrote: | Both systems are a fraction of the size of an ATX case | and as efficient as they needed to be to meet their | predetermined convective cooling needs. In both cases, | profit margin is increased by reducing material and | shipping volume requirements. | | A similar single heatsink design for high end PCs would | need to be much larger than either of those designs but | considering how much empty space is in an ATX case, I | don't think it would be much larger than current PCs. | | Consider that the best PC cooling solutions all look like | this: https://assets1.ignimgs.com/2018/01/18/cpucooler-1280-149617... | | Or pass liquid through a radiator with comparable volume. | Standardizing the contact points for a single block | heatsink with larger fans would make computers more | efficient and quiet. | picture wrote: | It won't be a simple redesign to tilt boards "slightly" | because manufacturing processes that are already honed in | need to be completely retooled, with likely more complexity | (many different lengths of standoffs per board?) | | And additionally, there are only a few key components of a | motherboard that need cooling. Most of the passive components | like the many many decoupling capacitors don't generate | significant heat. The components that do require access to | cool air are already fitted with finned heat sinks and even | additional fans. They interact with air enough to where a | slight tilt cannot make a meaningful difference. | | Basically just adding a small piece of aluminum to key areas | will work better than angling the whole board | kllrnohj wrote: | You don't really need to pass any air over the PCB, though. | Anything that needs cooling sticks up above it. Also the | airflow through a case isn't perfectly parallel to the | motherboard PCB anyway. GPU fans throw the air in all sorts | of directions, including straight down into the motherboard. | And so do CPU coolers. | | Cables also don't really obstruct the airflow like at all. | dangrossman wrote: | The article mentions that the automotive industry demands some | of the largest temperature ranges for these parts. New cars are | basically computers on wheels (especially something like a | Tesla), and the cabin on a hot day under a glass roof can | easily exceed 170F. Where will the freezer you build around all | the computers go, and how will it be powered while the car is | sitting parked in a lot?
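For scale on the automotive point, here is the same cabin figure converted and compared against the 85degC/95degC DRAM thresholds quoted from the article further down the thread; it is plain arithmetic on those two numbers, nothing more:

    # 170 F cabin soak vs. the DRAM thresholds quoted later in the thread
    # (85 degC "starts to change", 95 degC "start losing data").
    cabin_f = 170.0
    cabin_c = (cabin_f - 32.0) * 5.0 / 9.0
    print(f"cabin soak: {cabin_c:.1f} degC")               # ~76.7 degC
    print(f"headroom to 85 degC: {85.0 - cabin_c:.1f} K")  # ~8.3 K
    print(f"headroom to 95 degC: {95.0 - cabin_c:.1f} K")  # ~18.3 K
    # A heat-soaked interior alone consumes most of the thermal budget
    # before the electronics dissipate a single watt.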
| outworlder wrote: | > I know this may not be a cheap solution, but why not start | selling pre-built computers with active cooling systems? | Refrigerant liquids like those used in refrigerators or water | cooling could be an option. | | Before going into water cooling, a change in form factor to | allow for better airflow (and mounting of larger heat sinks) | would be in order. | | Water cooling would require a water cooling block, not sure how | it would work with the current form factor. | | > So instead of integrating fluids within the computer, build | powerful mini-freezers for computers and store the computer | inside. Or split the warm transistors from the rest of the | build and store only those inside the mini freezer, with cables | to connect to the rest of the computer outside. | | That's impractical. You are heat exchanging with the air, and then | you are cooling down the air? Versus exhausting the hot air and | bringing more in from the outside. You just need to dissipate | heat; active cooling is not needed. | kube-system wrote: | Heat pipes are the phase-change cooling solution that solves | all of those issues. People don't really think of their cheap | laptop as having a phase-change liquid cooling system, but it | actually does. | _jal wrote: | For most commercial use, you're talking about refrigerated | racks. They exist, but they're pretty niche. | | In a typical data center, all this does is decentralize your | cooling. Now you have many smaller (typically less robust) | motors to monitor and replace, and many drain lines much closer | to customer equipment and power. | | Those units take up a lot more space, too, because of the | insulation. | tbihl wrote: | The elevated temperatures of the overheating components are | such that fluid flow, not temperature difference, is the thing | to go after, and it also has the advantage of being much | simpler than adding a whole refrigeration cycle. | | These problems start to read like problems from nuclear power, | where sufficiently uniform flow is a huge deal so that various | materials aren't compromised in the reactor. | beckingz wrote: | Condensation in most environments gets really rough on | computers. | | In theory you can eliminate condensation. | | But in practice, there's a difference between theory and | practice. | tonetheman wrote: | i would hang the memory upside down so that condensation goes | away from the electronics then put in a catch tray at the | bottom for evaporation. | | I am sure there is a lot more to it than that though... ha | beckingz wrote: | More of an issue on the motherboards where it will | eventually get into something. | dclowd9901 wrote: | Heat also moves upward so that would probably cause the | board and its components to get too hot. | dtx1 wrote: | why not integrate the ram into the package like apple does | anyway and use a slightly larger SoC cooling solution for the | chips? Or just attach heatspreaders to ram modules (like gaming | modules) and add a fan for them like servers already do due to | their general front to back airflow design. The only thing you | can't do anymore is relying on the passive cooling of the chip's | own surface, something CPUs can't do anymore since the early | 90s | toast0 wrote: | Apple does a ram on top system, right? | | That's not going to be viable for servers for two big | reasons: | | a) it would be a major capacity limitation; you're not | fitting 8-16 DIMMs worth of ram on top of the CPU. Sure, not | everyone fills up their servers, but many do.
| | b) if you put the ram on top of the cpu, all of the cpu heat | needs to transit the ram, which practically means you need a | low heat cpu. This works for Apple; their laptop cooling | design has never been appropriate for a high heat cpu, but | servers manage to cool hundred watt chips in 1U through | massive airflow, so high heat enables more computation. | | Heatspreaders may make their way into server ram though | (although not so big, cause a lot of servers are 1U) | | Otoh, the article says | | > 'From zero to 85degC, it operates one way, and at 85deg to | 90degC, it starts to change,' noted Bill Gervasi, principal | systems architect at Nantero and author of the JEDEC DDR5 | NVRAM spec. 'From 90deg to 95degC, it starts to panic. Above | 95degC, you're going to start losing data, so you'd better | start shutting the system down.' | | CPUs commonly operate in that temperature range, but RAM | doesn't pull that much power, so it doesn't get too much | above ambient as long as there's some airflow, and if ambient | hits 50C, most people are going to shut down their servers | anyway. | kube-system wrote: | Maybe we could architect servers with more CPU packages and | fewer cores per package? | | Maybe instead of 32 RAM packages and 4 CPU packages, we | could have 16 CPU packages each with onboard RAM? | nsteel wrote: | Will these CPUs talk to each other with a similar latency | hit as we get from talking to DRAM today? | __alexs wrote: | The M1/M2 has the RAM on the same package as the CPU but | it's not actually on top of the die, it's adjacent to it. | Here's a pic of one someone on reddit delidded: | https://imgur.com/a/RhGk1xw | | Obviously this is still a lot of heat in a small space but | it does mean the cooler gets to have good coupling with the | die rather than going all the way through some DRAM first. | toast0 wrote: | That's more tractable. Gotta make sure everything hits | the same z-height and the contact patches are right... | But you still have a capacity issue. | SketchySeaBeast wrote: | Big old SoCs really do seem like the future. CPU, GPU, RAM, | motherboard controllers, throw all those different problems | onto a big old die and optimize for cooling that guy. | foobiekr wrote: | SoCs are harder, not easier, to cool. | AtlasBarfed wrote: | Yeah I don't understand why a dedicated fan and other basic | CPU cooling techniques don't apply here. It's probably | because the DRAM industry doesn't want to change form factors | and standards to a substantial degree... | | ... probably because they do the bare minimum to keep up with | CPU design and routinely get busted for cartel price fixing | and predatory pricing. | dtx1 wrote: | I mean literally this https://youtu.be/TFE9wfAfudE?t=611 | Problem solved | jackmott42 wrote: | Active cooling tends to have the challenge of controlling | condensation, and then of course now you are drawing even MORE | power from the wall. | mrtranscendence wrote: | I've seen YouTube videos of overclockers employing | refrigeration techniques (or coolants like liquid nitrogen), | and it does seem like condensation is a major issue. Maybe | that's not as much of a problem at more reasonable | temperatures?
| | But yeah, I'd be just as or more concerned about the amount | of power it would take to run a freezer like that ... I'm | already drawing as much as 850 watts for my PC, with a max of | a couple hundred watts for my OLED TV and speakers, and don't | forget the modem and router, and a lamp to top it all off; | would a powerful enough mini freezer to cool my PC even fit | on the circuit? | | Actually, it's even worse because I've got an air purifier | running there too ... but I could move that, I suppose. | Ekaros wrote: | Cascade cooling is a fun thing. The next step after water | cooling before getting to liquid nitrogen... | | Still, I wouldn't really go for that, knowing how noisy the | average compressor and fan for that size is. I much prefer | my nearly silent fan cooled machine... | EricE wrote: | If you are going to go extreme enough to have a | compressor and fan, you can always put them in another | room :p | snarfy wrote: | The biological solution to leaks is clotting. Do we have | cooling liquids that clot like blood does, say when exposed to | oxygen? | kansface wrote: | Great, now your computer can have a thrombosis or a stroke! | xxpor wrote: | the reliability there isn't particularly great ;) | mgsouth wrote: | 50-100 yrs between major overhaul? When's the last time you | had to manually top-up or bleed air out of your circulatory | system? I'd say that's impressively robust. | chmod775 wrote: | It will clot inside the cooling circuit because of the air | within it. Or air will get within it. | | However there are ways to prevent and detect leaks in current | systems with negative pressure: | https://www.youtube.com/watch?v=UiPec2epHfc | SketchySeaBeast wrote: | So you're taking a 300W-1000W space heater and putting it into | a freezer that needs to be able to bleed that much heat? Going | to need another breaker. | Victerius wrote: | I'm just brainstorming. I can troubleshoot my computer and | write basic code but I'm not a computer engineer. | 7speter wrote: | This has come up often in the comments section of articles | I've seen about prospective 600-900w 40 series nvidia cards. | SketchySeaBeast wrote: | Honestly, the fact that my 3080 can draw 400W makes me kind | of sick and I limit FPS specifically so it doesn't. I can't | ever see myself buying a card that draws double that. | max51 wrote: | You can reduce the power limit a lot on a 3080 before it | impacts performance. The last 3 - 5% of performance they | are getting out of their chip is responsible for more | than a third of the power draw on higher clocked cards. | SketchySeaBeast wrote: | Yeah, I've significantly undervolted both my GPU and CPU. | I now never see 300W, really helped with thermals as | well. | baybal2 wrote: | nonrandomstring wrote: | Still waiting to see the first micro-engineered Stirling engine | that can self-cool. Any physicists care to comment on why that | won't work yet, or ever? | Chabsff wrote: | You can only cool something by making something else warmer by | a larger amount. The heat has to go somewhere, and moving that | heat in any non-passive way will invariably produce yet more | heat in the process. | nonrandomstring wrote: | I think some people are interpreting that as a joke. I'm not | talking about a _net gain_ of energy or any crazy perpetual | motion machine. Think of something like a "heat brake". | Differential heat energy can be converted to mechanical work. | Some of that can be used to cool the system elsewhere, | creating a negative feedback loop. Another way to think of | such a system is like the "reluctance" of an inductor.
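A rough bound on the mini-freezer idea, and on why any active cooler pays a thermodynamic tax, per the comment above about making something else warmer by a larger amount. The temperatures and the 500 W load are assumptions for illustration; real vapor-compression (or Peltier) coolers fall well short of the Carnot limit:

    # Idealized refrigerator bound for chilling a PC enclosure.
    # All figures are assumed, illustrative values.
    t_cold = 273.15 + 10.0   # K: target temperature inside the enclosure
    t_hot = 273.15 + 35.0    # K: room air the heat is rejected into
    heat_w = 500.0           # W: heat the PC dumps into the cold side

    cop_carnot = t_cold / (t_hot - t_cold)   # ~11.3, best physics allows
    work_ideal = heat_w / cop_carnot         # ~44 W of input work, ideal case
    work_real = heat_w / 2.5                 # ~200 W at a realistic COP of ~2.5

    print(f"Carnot COP {cop_carnot:.1f} -> at least {work_ideal:.0f} W extra draw")
    print(f"realistic COP ~2.5 -> about {work_real:.0f} W extra, and the room")
    print("still receives all ~700 W of rejected heat.")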
| | With present thermoelectric effects, using a Seebeck junction | to generate current for a fan is hopelessly ineffective. But | is that necessarily the case for all designs which could help | to hold a system under a critical temperature when heat | spikes? | acomjean wrote: | do you mean something like a solar chimney, where heat is | used to draw air through the rest of the building? | | https://en.wikipedia.org/wiki/Solar_chimney | nonrandomstring wrote: | That's an example of a similar system, but probably | impractical for use in an electronics context. I have in | my imagination a fantasy "smart" material that in the limit can transfer 0.5 * k^m joules of heat per square | meter per second from one side to the other (where m is | somewhere between 1 and 2). Such a material would always | feel slightly warmer on one side and cooler on the other, | and this effect would actually increase in the presence | of ambient heat, hence it could act as a thermal "brake" | or active heat pipe/diode. I believe such a device is | "allowable" within the laws of physics. | ta8645 wrote: | > You can only cool something by making something else warmer | by a larger amount. | | Why isn't it also true that you can only make something | warmer, by cooling something else by a larger amount? | | The movement of electricity generates waste heat, why isn't | that process reversible? Making the heat disappear into a | cold wire, rather than just dissipating into the atmosphere? | (not suggesting it would be easy or even practical). | nostrademons wrote: | 2nd law of thermodynamics - entropy is always increasing. | Heat transfer is never 100% efficient, you always lose | something in transmission. This is also why it's not | possible to create a perpetual-motion machine. | | https://en.wikipedia.org/wiki/Second_law_of_thermodynamics | nonrandomstring wrote: | Peltier coolers [1] do exist for specialist applications | but they are not at all effective. You can even buy them on | Amazon [2]. If the goal is to iron out a spike to stop your | semiconductor from going into thermal runaway (instead of | generating net energy as is the knee-jerk of some | unimaginative down-voters here) then it's a possible | saviour. | | [1] https://www.britannica.com/science/Seebeck-effect | | [2] https://www.amazon.com/Peltier-Cooler/s?k=Peltier+Cooler | dylan604 wrote: | Next, we'll have a generation of mobile devices that will be | liquid cooled. Of course because of the miniaturization, there | will be no way to refill the liquid coolant without getting a | new device. This will naturally happen before the batteries die, | creating an even shorter life cycle in devices. Sounds like a | perfect pitch for an upcoming WWDC type of event. | superkuh wrote: | RAM has been parallel for ages. IBM's new POWER10 architecture | switches to serial control of ram with firmware running on the | ram sticks. As long as complex mitigations and monitoring are | going to be required this might be the way to go. | bilsbie wrote: | It's at least partly caused by climate change too ___________________________________________________________________ (page generated 2022-07-18 23:00 UTC)