[HN Gopher] DRAM thermal issues reach crisis point
       ___________________________________________________________________
        
       DRAM thermal issues reach crisis point
        
       Author : rbanffy
       Score  : 165 points
       Date   : 2022-07-18 13:37 UTC (9 hours ago)
        
 (HTM) web link (semiengineering.com)
 (TXT) w3m dump (semiengineering.com)
        
       | anthony_r wrote:
       | we're lucky that this happens at 360 Kelvin and not at 200 Kelvin
       | or even lower.
        
         | p1mrx wrote:
         | Note that "kelvin" is lowercase:
         | https://english.stackexchange.com/questions/329629/is-kelvin...
        
         | 8jy89hui wrote:
         | If our habitable temperature was cooler or hotter, we would use
         | different materials to best reflect that environment. I'm not
         | so sure it is luck
        
           | tinus_hn wrote:
           | It's lucky in the northern hemisphere there is an easily
           | recognizable star pointing almost exactly at the North Pole,
           | which makes navigation much easier.
           | 
           | It's lucky some available material worked the right way to
           | make a transistor.
           | 
           | It's lucky some person smart enough to make that work got to
           | work on that.
           | 
           | History is full of lucky coincidences like that. How many
           | Einsteins have died out in the jungle, without access to our
           | scientific knowledge or a way to add to it? For most of
            | history, and partly still today, being a scientist wasn't
            | possible for just anyone; you had to be from the right
            | family. It's _all_ about luck.
        
           | H8crilA wrote:
           | So let's just use those that work up to 400K :)
        
           | somebodynew wrote:
           | There is a bit of luck in even having any viable materials
           | that work at the required temperature to choose from.
           | 
           | For example, humanity hasn't been able to find a single
           | appropriate material for a superconductor at room
           | temperature/atmospheric pressure despite significant
           | research, but a civilization living below 100 K has a myriad
           | of options to choose from. Superconductors are high
           | technology to us, but if your planet is cold enough then
           | superconducting niobium wire would be a boring household item
           | like copper wire is for us.
        
             | dodobirdlord wrote:
             | Niobium superconducts at 9.3K, so that would be a pretty
             | cold household!
        
             | marcosdumay wrote:
             | Hum... We inhabit that temperature exactly because it
             | allows for a wide range of chemical reactions in a
             | controlled fashion.
             | 
             | The Anthropic Principle is not luck.
             | 
             | We are lucky that those interesting things are possible. We
             | are also unlucky that many interesting things are not
             | possible. But given that they are possible, it was almost
             | inevitable that most of them would be possible around us.
        
         | YakBizzarro wrote:
         | well, depends how you define lucky. at cryogenic temperature,
         | the leakage current of a transistor is so small that you
         | virtually don't require DRAM refresh. I tested DRAM cells with
         | discharge times of hours, and the transistor was not at all
         | optimized. See https://www.rambus.com/blogs/part-1-dram-goes-
         | cryogenic/ (not my work)
        
         | klodolph wrote:
         | It's a combination of chemistry and geometry (and other
         | factors). Maybe there's some luck.
         | 
         | There are ICs and components built for operating in extreme
         | environments, like drilling. You can get SiC (silicon carbide)
          | chips that operate above 200°C (473 K), if that's important to
         | you. There are also various semiconductors that are worse than
         | silicon at handling high temperatures, like germanium. Old
         | germanium circuits sometimes don't even work correctly on a hot
         | day.
         | 
         | If we lived at 200K, I'm sure that there's a host of
         | semiconductor materials which would be available to us which
         | don't work at 300K.
        
       | dusted wrote:
       | Sounds like nothing a little liquid nitrogen can't fix.
       | 
       | > (as a standard metric, about once every 64 milliseconds)
       | 
       | 64 milliseconds? wow.. I thought they'd need refreshing way more
       | often
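
        As a rough aside on where that 64 ms figure leads: DDR-style
        parts spread the refresh of all rows across that window, so the
        controller issues refresh commands far more often than every
        64 ms. A minimal sketch of the arithmetic in Python, where the
        8192 refresh commands per window is an assumed (typical JEDEC)
        figure rather than something from the article:

            # Back-of-envelope: average refresh-command spacing for a
            # DDR4/DDR5-style part. The 64 ms window is from the article;
            # 8192 commands per window is an assumed typical figure.
            RETENTION_MS = 64.0
            REFRESH_COMMANDS_PER_WINDOW = 8192

            tREFI_us = RETENTION_MS * 1000.0 / REFRESH_COMMANDS_PER_WINDOW
            print(f"Average refresh interval: {tREFI_us:.2f} us")  # ~7.81 us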
        
       | brutusborn wrote:
       | I loved this part at the end: "By contrast, allowing a
       | temperature increase for chips in large data centers could have
       | surprising environmental benefits. To this point, Keysight's
       | White recalled that a company once requested JEDEC increase the
       | spec for an operating temperature by five degrees. The estimate
       | of the potential savings was stunning. Based on how much energy
       | they consumed annually for cooling, they calculated a five degree
       | change could translate to shutting down three coal power plants
       | per year. JEDEC ultimately compromised on the suggestion."
        
         | JJMcJ wrote:
         | I've heard of some large companies that run their data centers
         | hot.
         | 
         | Cheaper to have a slightly higher failure rate, or have the
         | computers throttle their clock speed, than to pay for extra air
         | conditioning.
        
           | woleium wrote:
            | It could also be that it's cheaper to extend the life of an
            | older DC by accepting higher temperatures and failure rates
            | than to upgrade the HVAC to accommodate newer, higher-
            | density designs.
        
           | klysm wrote:
           | Hard to do the math here though because you don't know the
           | failure statistics in advance. Kind of a multi-armed bandit
           | problem of sorts.
        
             | magicalhippo wrote:
             | Wouldn't the Arrhenius equation[1] be a good approximation?
             | It's used in the industry[2] from what I know.
             | 
             | Of course you'll need some data for calibrating the model,
             | but if you got that?
             | 
             | [1]: https://en.wikipedia.org/wiki/Arrhenius_equation
             | 
             | [2]: https://www.ti.com/lit/an/snva509a/snva509a.pdf
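
              For reference, the acceleration-factor form of the Arrhenius
              model used in app notes like [2] is straightforward to
              sketch in Python; the 0.7 eV activation energy below is an
              illustrative assumption, not a value taken from either
              reference:

                  import math

                  K_B_EV = 8.617e-5  # Boltzmann constant, eV/K

                  def arrhenius_af(t_use_c, t_stress_c, ea_ev=0.7):
                      # AF = exp((Ea/k) * (1/T_use - 1/T_stress)), temps in kelvin
                      t_use = t_use_c + 273.15
                      t_stress = t_stress_c + 273.15
                      return math.exp((ea_ev / K_B_EV) * (1.0 / t_use - 1.0 / t_stress))

                  # Example: speed-up of a temperature-activated mechanism
                  # going from 85 C to 90 C under the assumed 0.7 eV.
                  print(round(arrhenius_af(85, 90), 2))  # ~1.37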
        
               | buescher wrote:
               | Well, sometimes, and more frequently now at really small
               | process nodes, but mostly no. Most electronic failures
               | are basically mechanical and thermal cycling will cause
               | fatigue failures more than elevated temperatures will
               | accelerate things like electromigration. Lots of people
               | still use 1960s style handbook methods anyway because
               | there's no plug-and-chug replacement.
               | 
               | The groundbreaking work here was by Michael Pecht back in
               | the early nineties:
               | https://apps.dtic.mil/sti/pdfs/ADA275029.pdf
        
         | gjsman-1000 wrote:
          | _Only three?_ That's not an immediate win. What if the
          | temperature increase causes ever so slightly more failures,
          | and therefore slightly more replacements, and each replacement
          | requires energy to make, ship, install, replace, and recycle,
          | plus the effects of increased demand... What if the hardware
          | doesn't last as long, causing more early failures and e-waste?
          | After all that potential risk, is it still a benefit, and if
          | so, by how much?
         | 
         | We don't know and it is hard to know - but I don't blame JEDEC
         | and would not call it a "compromise" on their part like it was
         | a superior option.
        
           | Spooky23 wrote:
           | For one company? That's pretty impressive.
           | 
           | When I was on an architecture team that consolidated ~80
            | datacenters to 3 circa 2010, this was a key dollar driver. We
           | raised the temperature ~6 degrees from the average temp,
           | which meant kicking out a few vendors initially. The cost
           | savings for doing that was essentially the total operational
           | costs of 5 datacenters.
           | 
           | The annual failure rates for the hardware did not change at
           | all by any metric. Number of service impacting hardware
           | failures went to zero due to the consolidation.
           | 
           | In general, if you operate within the operating ranges of
           | your hardware, you won't have failure. You will have
           | complaints from employees, because computers will operate at
           | temperatures not comfortable for humans.
        
             | benlivengood wrote:
             | It's almost certainly Google, since they've historically
              | run their data centers hotter than most [0]. Cooling
             | efficiency increases with a higher delta-T to the working
             | fluid, and Google uses a gigawatt or two continuously [1].
             | From PUE numbers that's hundreds of MW spent on cooling, so
             | making it more efficient is quite worth it.
             | 
             | [0] https://www.google.com/about/datacenters/efficiency/
             | [1] https://www.cnbc.com/2022/04/13/google-data-center-
             | goal-100p...
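
              A rough sanity check of the scale implied there, treating
              the "gigawatt or two" as total facility draw and assuming a
              fleet-wide PUE of about 1.10; both figures are assumptions
              for illustration only:

                  # PUE = total facility power / IT power, so the non-IT
                  # overhead (cooling, power conversion, etc.) is
                  # total * (1 - 1/PUE). Assumed, illustrative values only.
                  PUE = 1.10

                  for total_gw in (1.0, 2.0):
                      overhead_mw = total_gw * 1000.0 * (1.0 - 1.0 / PUE)
                      print(f"{total_gw:.0f} GW total -> ~{overhead_mw:.0f} MW overhead")
                  # ~91 MW and ~182 MW under these assumptions.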
        
             | sllabres wrote:
              | We did the same several years ago. At the time I found
              | http://www.cs.toronto.edu/~bianca/papers/temperature_cam.pdf
             | quite interesting. I didn't find many other precise papers
              | about issues when running at higher temperatures, but
             | many "should" and "can".
             | 
              | Based on ASHRAE there are two guidelines from HPE and IBM:
             | https://www.chiltrix.com/documents/HP-ASHRAE.pdf
             | https://www.ibm.com/downloads/cas/1Q94RPGE
             | 
             | We found (by measurement) that some places in the
             | datacenter with suboptimal airflow are well over the
             | average or simulated temperature so one can leave the safe
             | temperature envelope if one isn't careful.
        
               | Spooky23 wrote:
                | The beauty of a big project like this is you get the
                | engineering resources to make sure the datacenter is
                | working right.
               | 
                | The hyperscale people take this to the next degree.
        
             | Freestyler_3 wrote:
              | Don't you want to have AC running in cooling mode or
              | dehumidifying mode to get water out of the air?
             | 
             | edit: this makes me wonder what is the ideal humidity in a
             | data centre? Is too dry a thing?
        
               | Dylan16807 wrote:
               | Dry air makes static electricity buildup more.
        
           | picture wrote:
            | C'mon, logically it has to be more than just three, right?
            | "Cooling efficiency" sometimes comes in units of W per °C of
            | difference, so I'd imagine that a few more degrees would be
            | a huge deal.
        
             | marcosdumay wrote:
              | You usually have to budget a few °C of difference just
              | for pushing enough energy through heat exchangers. So the
              | ratio of chip temperature to external temperature is lower
              | than the ratio that effectively determines the cooling
              | efficiency.
        
             | gjsman-1000 wrote:
              | I don't know how much power efficiency would be saved - my
              | concern is more that it is completely logical that running
              | any part at higher temperatures increases the risk of
              | failure, whether it's a computer part or a mechanical
              | part. _How much?_ I don't know - I just don't blame JEDEC
              | for recognizing this is not a clear and obvious win.
              | 
              | Imagine if the failure rate rose by as little as 1%. RAM
              | failure is not uncommon compared to other parts - I've had
              | it happen before and render a system unable to boot;
              | that's why we have Memtest86 and not CPUtest86 or
              | SSDtest86. A 1% increase in failures over 5 years could
              | have effects just as striking as the power savings from
              | raising the temperature. How many smartphones would be
              | junked? How many PCs would be thrown out as "not working"
              | by average Joes who can't diagnose them, with the extra
              | waste that generates from both disposing of the old PC
              | and purchasing a new one? Which is better: the greater
              | emissions of keeping the old PC, or more e-waste in the
              | ground from replacing it with a new, likely more
              | efficient one?
              | 
              | The point is that it is not a clear win. With further
              | research it might be, and I might be all for it. I'm only
              | nitpicking the description of it as a "compromise", as
              | though the higher limit were obviously the superior
              | option.
             | 
             | [@picture: I'm at my posting limit for the day because HN
             | is, well... I'll leave their censorship policies for
              | another day. I would agree with you if the RAM with the
              | 90°C limit were strictly ECC RAM, because that is most
              | often used in data centers and not in consumer parts.
              | Maybe we could have non-ECC/85°C RAM and ECC/90°C RAM
              | options...]
        
               | dcow wrote:
                | Well now you're just being hyperbolic. As you say, this
                | is an engineering problem, so solutions are far from
                | their ideal states in either direction. However, a 1%
                | increase in RAM failure rates ruining the world? That
                | doesn't sound right. Errors are encountered in RAM _all
                | the time_ and guess what, they're often corrected by the
                | hardware before even bothering the system. I'm sure we
                | could deal with a 1% increase...
        
               | Spooky23 wrote:
               | Most datacenter hardware is fine at 95 degrees F (inlet
               | temp). Approved configurations are usually available to
               | 105 degrees F or slightly higher. Some devices can run as
               | high as 130F.
               | 
               | In the operating range, you're not going to have any
               | measurable change in operations or failure rate - if you
               | do the parts are defective. All of the stories you hear
               | about this and that are conjecture.
        
               | smolder wrote:
               | Interestingly, computer chips can often be run at lower
               | voltage and wattage for a given frequency if they are
               | kept at a colder temperature. As a home user I can
               | significantly reduce power draw for a CPU/GPU by
               | improving the cooling solution and lowering voltages.
               | 
                | The reason this doesn't work for datacenters is two-
               | fold, I think: First, they won't see efficiency
               | improvements just by keeping their CPUs and GPUs (or RAM)
               | cooler because the power levels/tables for the chips are
               | baked-in, and operators aren't going to the trouble of
               | tweaking voltages themselves. Second, even if they did
               | tweak voltages, the cost of sustaining lower temperatures
               | with better cooling likely won't outweigh the savings
               | resulting from lower power draw for the chips.
               | 
               | Still, this raises the question of whether designing
               | hardware for higher operating temperatures is always the
               | right move. At some point there's going to be a cost in
               | performance and/or efficiency that outweighs the savings
               | from allowing higher temperatures. Ideally these
               | tradeoffs should be balanced as a whole.
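
                The effect described above is mostly the dynamic-power
                term, which scales roughly with voltage squared times
                frequency; a minimal sketch of that scaling, with
                illustrative ratios rather than measured values (leakage
                is ignored):

                    # Dynamic switching power scales roughly as P ~ C * V^2 * f.
                    # Leakage is ignored, so this only illustrates why
                    # undervolting at a fixed clock reduces power draw.
                    def relative_dynamic_power(voltage_ratio, freq_ratio=1.0):
                        return voltage_ratio ** 2 * freq_ratio

                    print(relative_dynamic_power(0.90))        # ~0.81 with a 10% undervolt
                    print(relative_dynamic_power(0.90, 0.90))  # ~0.73 adding a 10% downclock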
        
               | kllrnohj wrote:
               | I think you missed the biggest reason this doesn't do
               | much for servers - they _already_ run at low frequencies
               | & voltages.
               | 
               | For example take the Epyc 7742, at 225W it sounds super
                | power hungry. But the 64-core chip only boosts to 3.4GHz
                | max (2.25GHz base). That's less than the base clock of
               | almost any of the Ryzen consumer CPUs. And if you look at
               | the lower frequency end of https://images.anandtech.com/d
               | oci/16214/PerCore-1-5950X.png there's not a whole heck of
               | a lot of efficiency gains likely to be had below that
                | ~3-3.4GHz mark. They're already basically sipping power
                | at something like 3W per CPU core or less. 225W / 64c =
                | 3.5W/c, _but_ the IO uncore isn't exactly cheap to run
                | and IIRC sits more like in the 50-70W range. So subtract
                | that out and you're at more like 2.5-2.7W/c. I don't
               | think throwing cooling at this is really going to get you
               | much of a gain.
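
                Spelling out the per-core arithmetic above (the 50-70 W
                uncore figure is the comment's own rough recollection,
                carried over here as-is):

                    # Per-core power for the Epyc 7742 example. TDP and core
                    # count are as quoted; the uncore range is the comment's
                    # rough estimate, not a datasheet value.
                    TDP_W = 225
                    CORES = 64

                    print(f"Naive split: {TDP_W / CORES:.1f} W/core")  # ~3.5
                    for uncore_w in (50, 70):
                        per_core = (TDP_W - uncore_w) / CORES
                        print(f"Uncore {uncore_w} W -> {per_core:.1f} W/core")
                    # ~2.7 and ~2.4 W per core respectively.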
        
               | bradstewart wrote:
               | This usually isn't true _at scale_ though. The chip
               | manufacturers do a ton of validation and qualification to
               | set the operating parameters (voltage, etc).
               | 
               | You can undervolt (or overclock) one specific chip,
               | individuals have been doing this at home for basically
               | ever, but there's (almost) always a system-specific
               | validation process you then do to make sure the system is
               | stable for a specific workload with the new parameters.
               | 
               | And these parameters differ between batches of chips, or
               | even between chips within a batch.
               | 
               | It's also significantly harder to drastically reduce the
               | temperature of the chips inside of a single server, given
               | the machine density of a typical data center.
        
               | rbanffy wrote:
               | > It's also significantly harder to drastically reduce
               | the temperature of the chips inside of a single server,
               | given the machine density of a typical data center.
               | 
               | It'd be fun, however, if we could dynamically adjust that
               | according to workload. If workload is light, you could
               | consolidate load into fewer sockets/memory sticks and
               | power down everything in that socket.
        
               | picture wrote:
               | That's for sure. I 100% agree that increased temperature
               | will statistically increase failure rate. I'm just
               | thinking that, the most common mechanisms of thermal
               | failure in electronics are caused by repeated thermal
               | cycling which cause fatigue and stress failures at
               | interconnects (solder bumps, silicon bonding, etc). Data
               | centers are designed to be operated in a relatively very
               | constant temperature environment, so I would suspect that
               | the failure rate may not be raised significantly.
        
           | uoaei wrote:
           | One single company changing one single design parameter and
           | enabling savings on the scale of _multiple power plants_?
           | That is as immediate as wins get.
        
       | __alexs wrote:
       | Maybe RAM will finally get more than a 4mm thermal pad and a
       | random bit of Alu for cooling. Seems like most cooling designs
        | have treated RAM as even more of an afterthought than VRMs up
       | until recently.
       | 
       | Even in most servers the accommodation for RAM cooling has
       | basically just been orientating the DIMMs to line up with
       | airflow. They are still packed together with minimal clearance.
        
         | dcow wrote:
         | > cooling has basically just been orientating the DIMMs to line
         | up with airflow
         | 
         | Isn't that server cooling in a nutshell? Ram high volumes of
         | airflow through the chassis with stupidly loud fans and hope
         | the parts stay cool?
        
           | dodobirdlord wrote:
           | You still need to conduct the heat away from the sources to a
           | radiator of some sort, since cooling is proportional to
           | surface area and it's much easier to increase surface area by
           | adding fins than by increasing airflow. You can only speed up
           | the air to a certain point, past which better cooling becomes
           | a matter of shaping the components for more contact with the
           | air.
        
             | kllrnohj wrote:
             | Sure but the airflow over DIMMs in a server chassis is
             | already _vastly_ more cooling than RAM gets in any consumer
             | application other than GDDR on GPUs.
        
               | __alexs wrote:
               | The density is also vastly higher.
        
               | Ekaros wrote:
                | Yeah, kinda weird that on ATX the RAM is placed in a way
                | that is perpendicular to the usual CPU cooling or even
                | the general airflow. Top-mounted fans do change this,
                | but I don't think those are very common.
        
               | sbierwagen wrote:
                | Makes it easier to keep all the traces the same length:
                | https://electronics.stackexchange.com/questions/74789/purpos...
        
               | kllrnohj wrote:
                | Although the entire socket can be rotated 90° for even
               | better traces, which is what the EVGA Kingpin
               | motherboards do (
               | https://www.evga.com/articles/01543/EVGA-Z690-DARK-
               | KINGPIN/ )
        
               | mjevans wrote:
                | This design would have made so much more sense before
                | the top-of-case closed-loop watercooler radiator setups
                | became popular.
               | 
               | I still like this a lot, but now the top down fan and
               | some kind of ducting to help direct the air out the top /
               | side vent makes more sense. There's so much heat these
               | days everyone needs the baffles inside of a case.
        
         | AshamedCaptain wrote:
          | I am not sure how much this is related to external cooling
          | versus actual internal thermal dissipation. DDR JEDEC
          | standards have actually decreased power consumption with
          | every generation.
        
           | __alexs wrote:
           | They have reduced voltage but power consumption per sq-mm has
           | gone up with increased densities. Many people run DRAM at
           | above JEDEC speeds which usually requires higher voltages
           | too.
           | 
            | Peak power consumption of DDR4 is around 375mW/GB @ 1.2V.
            | DDR5 drops this by about 10%, but also increases the maximum
            | density of a DIMM by 8x to 512GB, which is something like
            | 150W for a single DIMM.
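
            Spelling out that estimate, taking the per-GB figure, the 10%
            reduction, and the 512GB density above exactly as stated
            (none of them verified here):

                # Back-of-envelope using the figures quoted above, as stated.
                DDR4_PEAK_MW_PER_GB = 375   # mW/GB at 1.2 V, as quoted
                DDR5_REDUCTION = 0.10       # "about 10%", as quoted
                DIMM_GB = 512               # claimed maximum DDR5 DIMM density

                peak_w = DDR4_PEAK_MW_PER_GB * (1 - DDR5_REDUCTION) * DIMM_GB / 1000.0
                print(f"~{peak_w:.0f} W peak per DIMM")  # ~173 W, the 150+ W ballpark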
        
             | formerly_proven wrote:
             | There are only three (tiny) 12 V power pins on a DDR5
             | module, neither that nor the form factor allows for
             | dissipating anywhere close to 150 W. The teased 512 GB
             | Samsung module doesn't even have a heatspreader.
        
               | __alexs wrote:
               | VIN_BULK is 5V with a max current of 2A but every data
               | pin provides current that is used on the DIMM in some
               | respect.
        
         | deelowe wrote:
         | There's been talk of eliminating sockets for years. Something
         | has got to give.
        
           | jnwatson wrote:
           | You can still actively cool socketed RAM.
        
             | deelowe wrote:
             | Sort of. Trace length is already a nightmare.
        
           | zeroth32 wrote:
           | more compact chips will have a higher failure rate. Not a
           | great idea for servers.
        
             | to11mtm wrote:
             | There's a fun curve on this to be sure.
             | 
             | If I had to guess, Servers would not go any further than
             | some sort of memory-backplane where the memory for multiple
             | channels was integrated onto a single PCB.
             | 
             | Even then, IIRC hot-swapping of memory modules is a thing
             | for some servers, so that will have to be handled somehow.
        
       | wallaBBB wrote:
        | Question that comes to mind - is the M2 (with the thermal
        | issues on the new Air) affected, considering how the RAM is
        | packaged there?
        
         | Toutouxc wrote:
         | What thermal issues? All I've seen so far are people who don't
         | seem to understand how passive cooling works, despite the M1
         | Air being out for two years and working the same way.
        
           | nostrademons wrote:
           | The M2 chip generates more heat than the M1, with 20% more
           | transistors and about a 12% higher clock speed. M2 Mac Pro
           | has thermal issues compared to M1 Mac Pro as well, even with
           | the fan.
        
             | buryat wrote:
             | mac pro doesn't have m1/2
        
               | ywain wrote:
               | They were likely referring to the laptop Macbook Pro, not
               | the desktop Mac Pro.
        
           | webmobdev wrote:
           | Perhaps OP came across this recent article - _Reviewers
           | agree: The M2 MacBook Air has a heat problem_ -
           | https://www.digitaltrends.com/computing/m2-macbook-air-
           | revie... .
        
             | GeekyBear wrote:
              | Throttling under load isn't a heat problem.
             | 
             | This review of Lenovo's Thinkpad Yoga is what a heat
             | problem looks like:
             | 
             | >Unfortunately, the laptop got uncomfortably hot in its
             | Best performance mode during testing, even with light
             | workloads.
             | 
             | https://arstechnica.com/gadgets/2022/07/review-lenovos-
             | think...
             | 
             | Too hot to comfortably touch, even under light workloads,
             | unless you set it to throttle all the time? That's a heat
             | problem.
        
             | tedunangst wrote:
             | Is it really a problem if it's designed to thermally
             | throttle?
        
               | EricE wrote:
               | It is if you are expecting maximum performance.
        
               | tinus_hn wrote:
               | Perhaps that's the problem. Their expectations are
               | unrealistic. Did Apple promise no thermal throttle?
        
               | Dylan16807 wrote:
               | Marketing usually talks about unthrottled speed only,
               | including Apple's here as far as I have seen.
        
       | jhallenworld wrote:
       | Maybe DRAM becomes non-viable, so switch to SRAM. Which is
       | denser, 14 nm DRAM or 5 nm SRAM?
        
         | 55873445216111 wrote:
         | SRAM is ~10x higher cost per bit (due to memory cell size) than
         | DRAM
        
         | [deleted]
        
         | to11mtm wrote:
         | DRAM.
         | 
         | IIRC TSMC's 135MBit 5nm example is 79.8mm^2, although that's
         | got other logic.
         | 
         | In the abstract, a 0.021 square-micrometer-per-bit size [1]
         | says you'd need about 21mm^2 for a gigabit (base 10) of 5nm
         | SRAM, without other logic.
         | 
         | Micron claimed 0.315Gb/mm^2 on their 14nm process, [2] so
         | somewhere between a factor of 6 and 7.
         | 
         | That said, my understanding is that there is some sort of wall
         | around 10nm, where we can't really make smaller capacitors and
         | thus the limitation on things. (This may have changed since I
         | last was aware however.)
         | 
          | (There is also the way that 'nm' works these days... but I'm
         | not qualified to speak on that)
         | 
         | Also, AFAIK SRAM is still broadly speaking more power hungry
         | than DRAM (I may be completely out of date on this though...)
         | 
         | [1] - https://fuse.wikichip.org/news/3398/tsmc-details-5-nm/
         | 
         | [2] - https://semiengineering.com/micron-d1%CE%B1-the-most-
         | advance...
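
          A quick check of the density arithmetic above, using only the
          cell-size figure from [1] and the density claim from [2]:

              # Area per gigabit for 5 nm SRAM vs 14 nm-class DRAM,
              # using the figures cited in [1] and [2] above.
              SRAM_UM2_PER_BIT = 0.021   # 5 nm SRAM bit cell, per [1]
              DRAM_GBIT_PER_MM2 = 0.315  # DRAM density, per [2]

              sram_mm2_per_gbit = SRAM_UM2_PER_BIT * 1e9 / 1e6  # um^2 -> mm^2, base-10 Gbit
              dram_mm2_per_gbit = 1.0 / DRAM_GBIT_PER_MM2

              print(f"SRAM: ~{sram_mm2_per_gbit:.0f} mm^2/Gbit")  # ~21
              print(f"DRAM: ~{dram_mm2_per_gbit:.1f} mm^2/Gbit")  # ~3.2
              print(f"Ratio: ~{sram_mm2_per_gbit / dram_mm2_per_gbit:.1f}x")  # ~6.6x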
        
       | Victerius wrote:
       | > A few overheated transistors may not greatly affect
       | reliability, but the heat generated from a few billion
       | transistors does. This is particularly true for AI/ML/DL designs,
       | where high utilization increases thermal dissipation, but thermal
       | density affects every advanced node chip and package, which are
       | used in smart phones, server chips, AR/VR, and a number of other
       | high-performance devices. For all of them, DRAM placement and
       | performance is now a top design consideration.
       | 
       | I know this may not be a cheap solution, but why not start
       | selling pre-built computers with active cooling systems?
       | Refrigerant liquids like those used in refrigerators or water
       | cooling could be an option. The article addresses this:
       | 
       | > Although it sounds like a near-perfect solution in theory, and
       | has been shown to work in labs, John Parry, industry lead,
       | electronics and semiconductor at Siemens Digital Industries
       | Software, noted that it's unlikely to work in commercial
       | production. "You've got everything from erosion by the fluid to
       | issues with, of course, leaks because you're dealing with
       | extremely small, very fine physical geometry. And they are
       | pumped. One of the features that we typically find has the lowest
       | reliability associated with it are electromechanical devices like
       | fans and pumps, so you end up with complexity in a number of
       | different directions."
       | 
       | So instead of integrating fluids within the computer, build
       | powerful mini-freezers for computers and store the computer
       | inside. Or split the warm transistors from the rest of the build
       | and store only those inside the mini freezer, with cables to
       | connect to the rest of the computer outside.
        
         | CoolGuySteve wrote:
         | I've always wondered why motherboards aren't placed at a slight
         | angle like the wing of a car so that the air moving over it has
         | a higher angle of incidence, higher pressure, and higher
         | thermal capacity.
         | 
         | With the angle, you can also place cable connectors and whatnot
         | on the bottom of the board so they don't obstruct airflow as
         | much.
         | 
         | Basically, optimize PV = nRT inside the computer case at no
         | extra cost other than a redesign.
        
           | saltcured wrote:
           | I'm struggling slightly to envision the effect you are
           | seeking. My motherboards don't tend to be flying through the
           | air and so lack a well-defined angle of attack... :-) There
           | already exist horizontal and vertical motherboard mounts in
           | different computer cases, including ones that could be stood
           | either way to suit the desktop. In my experience, this
           | doesn't affect cooling that much.
           | 
           | I think the fan, internal baffle, and vent positions dominate
           | the airflow conditions inside the case. So, rather than
           | tilting a motherboard, wouldn't you get whatever you are
           | after with just a slight change in these surrounding
           | structures?
        
             | CoolGuySteve wrote:
             | You seem to be ignoring that all the punch through
             | connectors on a board are currently on the side that air
             | must pass over.
             | 
             | Furthermore, I've never seen a case, either desktop or
             | rackmount, that allows one to angle the fans at anything
             | other than a 90 degree angle or parallel to the board.
             | 
             | None of this makes sense in terms of fluid dynamics.
        
               | saltcured wrote:
               | Having a smooth board seems at odds with having a large
               | surface area for heat transfer, doesn't it? And wouldn't
               | laminar flow also have less movement near the surface?
               | For optimal cooling, would you actually want turbulence
               | to mix the layers? Instead of mounting fans at different
               | angles, add some vanes or even duct work to aim and
               | accelerate the flow where it needs to transfer heat.
               | 
               | But, given that boards do not have completely
               | standardized layouts, it seems like you eventually need
               | to assume a forest of independent heat sinks sticking up
               | in the air. You lose the commodity market if everything
               | has to be tailor made, like the integrated heat sink and
               | heat pipe systems in laptops.
        
           | SketchySeaBeast wrote:
            | I would assume because the things with the greatest heat
            | have typically had such a requirement for active cooling
            | that minor optimization wouldn't have helped much, and for
            | everything else you really didn't need to worry (though my
            | motherboard now has heat pipes across the VRMs, and my RAM
            | and northbridge have got big old heat spreaders).
        
             | CoolGuySteve wrote:
             | Yeah, the way my case is laid out, airflow to the VRM is
             | blocked by the heat spreaders on the RAM and the ATX power
             | connector. AMD systems in particular seem to require better
             | memory thermals.
             | 
             | It seems like we're reaching a point where a new ATX
             | standard is required to ensure the memory and GPU can make
             | contact with a large heatsink similar to how the trashcan
             | Mac Pro and XBox Series X are designed. Doing so would also
             | cut down on the ridiculous number of fans an overclocked
             | gaming PC needs these days, my GPU and CPU heatsinks have 5
             | 80mm fans mounted to them.
             | 
             | ATX is great but it seems like only minor improvements to
             | power connectors and whatnot have been made since it was
             | introduced in 1995.
        
               | Macha wrote:
               | Are the trashcan Mac Pro and Xbox Series X considered
                | efficient cooling solutions? I thought the trashcan Pro
               | had issues at higher temperatures which in turn limited
               | their ability to use higher end parts and in turn forced
               | the return of the cheese grater?
               | 
               | The series X GPU then is considered equivalent to a
               | desktop 3070, and laptop 3080s exist and are also
               | considered equivalent to a desktop 3070, so don't require
               | anything particularly novel in terms of cooling solutions
               | (3080 laptops are loud under load, but so is the series
               | X).
               | 
               | Overclocked components are so heavy in cooling needs as
               | they're being run so far outside their most efficient
               | window to get the maximum performance - which is why
               | datacenters which care more about energy usage than
               | gamers tend to use lower clocked parts.
        
               | CoolGuySteve wrote:
               | Both systems are a fraction of the size of an ATX case
               | and as efficient as they needed to be to meet their
               | predetermined convective cooling needs. In both cases,
               | profit margin is increased by reducing material and
               | shipping volume requirements.
               | 
               | A similar single heatsink design for high end PCs would
               | need to be much larger than either of those designs but
               | considering how much empty space is in an ATX case, I
               | don't think it would be much larger than current PCs.
               | 
               | Consider that the best PC cooling solutions all look like
               | this: https://assets1.ignimgs.com/2018/01/18/cpucooler-12
               | 80-149617...
               | 
                | Or they pass liquid through a radiator of comparable
                | volume. Standardizing the contact points for a single
                | block heatsink with larger fans would make computers
                | more efficient and quiet.
        
           | picture wrote:
           | It won't be a simple redesign to tilt boards "slightly"
           | because manufacturing processes that are already honed in
           | need to be completely retooled, with likely more complexity
            | (many different lengths of standoffs per board?)
           | 
           | And additionally, there are only a few key components of a
           | motherboard that need cooling. Most of the passive components
           | like the many many decoupling capacitors don't generate
           | significant heat. The components that do require access to
           | cool air are already fitted with finned heat sinks and even
           | additional fans. They interact with air enough to where a
           | slight tilt cannot make a meaningful difference.
           | 
           | Basically just adding a small piece of aluminum to key areas
           | will work better than angling the whole board
        
           | kllrnohj wrote:
           | You don't really need to pass any air over the PCB, though.
           | Anything that needs cooling sticks up above it. Also the
           | airflow through a case isn't perfectly parallel to the
           | motherboard PCB anyway. GPU fans throw the air in all sorts
           | of directions, including straight down into the motherboard.
           | And so do CPU coolers.
           | 
           | Cables also don't really obstruct the airflow like at all.
        
         | dangrossman wrote:
         | The article mentions that the automotive industry demands some
         | of the largest temperature ranges for these parts. New cars are
         | basically computers on wheels (especially something like a
          | Tesla), and the cabin on a hot day under a glass roof can
         | easily exceed 170F. Where will the freezer you build around all
         | the computers go, and how will it be powered while the car is
         | sitting parked in a lot?
        
         | outworlder wrote:
         | > I know this may not be a cheap solution, but why not start
         | selling pre-built computers with active cooling systems?
         | Refrigerant liquids like those used in refrigerators or water
         | cooling could be an option.
         | 
         | Before going into water cooling, a change in form factor to
         | allow for better airflow (and mounting of larger heat sinks)
         | would be in order.
         | 
         | Water cooling would require a water cooling block, not sure how
         | it would work with the current form factor.
         | 
         | > So instead of integrating fluids within the computer, build
         | powerful mini-freezers for computers and store the computer
         | inside. Or split the warm transistors from the rest of the
         | build and store only those inside the mini freezer, with cables
         | to connect to the rest of the computer outside.
         | 
          | That's impractical. You are heat exchanging with the air, and
          | then you are cooling down the air? Versus exhausting the hot
          | air and bringing in more from the outside. You just need to
          | dissipate heat; active cooling is not needed.
        
         | kube-system wrote:
         | Heat pipes are the phase-change cooling solution that solves
         | all of those issues. People don't really think of their cheap
         | laptop as having a phase-change liquid cooling system, but it
         | actually does.
        
         | _jal wrote:
         | For most commercial use, you're talking about refrigerated
         | racks. They exist, but they're pretty niche.
         | 
         | In a typical data center, all this does is decentralize your
         | cooling. Now you have many smaller (typically less robust)
         | motors to monitor and replace, and many drain lines much closer
         | to customer equipment and power.
         | 
         | Those units take up a lot more space, too, because of the
         | insulation.
        
         | tbihl wrote:
         | The elevated temperatures of the overheating components are
         | such that fluid flow, not temperature difference, is the thing
         | to go after, and it also has the advantage of being much
         | simpler than adding a whole refrigeration cycle.
         | 
         | These problems start to read like problems from nuclear power,
         | where sufficiently uniform flow is a huge deal so that various
         | materials aren't compromised in the reactor.
        
         | beckingz wrote:
         | Condensation in most environments gets really rough on
         | computers.
         | 
         | In theory you can eliminate condensation.
         | 
         | But in practice, there's a difference between theory and
         | practice.
        
           | tonetheman wrote:
            | I would hang the memory upside down so that condensation
            | drips away from the electronics, then put a catch tray at
            | the bottom for evaporation.
           | 
           | I am sure there is a lot more to it than that though... ha
        
             | beckingz wrote:
             | More of an issue on the motherboards where it will
             | eventually get into something.
        
             | dclowd9901 wrote:
             | Heat also moves upward so that would probably cause the
             | board and its components to get too hot.
        
         | dtx1 wrote:
          | Why not integrate the RAM into the package like Apple does
          | anyway, and use a slightly larger SoC cooling solution for
          | the chips? Or just attach heatspreaders to RAM modules (like
          | gaming modules) and add a fan for them, like servers already
          | do thanks to their general front-to-back airflow design. The
          | only thing you can't do anymore is rely on the passive
          | cooling of the chip's own surface, something CPUs haven't
          | been able to do since the early 90s.
        
           | toast0 wrote:
            | Apple does a RAM-on-top system, right?
            | 
            | That's not going to be viable for servers for two big
            | reasons:
            | 
            | a) It would be a major capacity limitation; you're not
            | fitting 8-16 DIMMs worth of RAM on top of the CPU. Sure, not
            | everyone fills up their servers, but many do.
            | 
            | b) If you put the RAM on top of the CPU, all of the CPU heat
            | needs to transit the RAM, which practically means you need a
            | low-heat CPU. This works for Apple, whose laptop cooling
            | design has never been appropriate for a high-heat CPU, but
            | servers manage to cool hundred-watt chips in 1U through
            | massive airflow, so high heat enables more computation.
            | 
            | Heatspreaders may make their way into server RAM though
            | (although not so big, because a lot of servers are 1U).
            | 
            | OTOH, the article says:
            | 
            | > 'From zero to 85°C, it operates one way, and at 85° to
            | 90°C, it starts to change,'" noted Bill Gervasi, principal
            | systems architect at Nantero and author of the JEDEC DDR5
            | NVRAM spec. "From 90° to 95°C, it starts to panic. Above
            | 95°C, you're going to start losing data, so you'd better
            | start shutting the system down."
            | 
            | CPUs commonly operate in that temperature range, but RAM
            | doesn't pull that much power, so it doesn't get too much
            | above ambient as long as there's some airflow, and if
            | ambient hits 50°C, most people are going to shut down their
            | servers anyway.
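
            Those quoted temperature bands map naturally onto a simple
            monitoring policy. In the sketch below the band edges come
            from the quote, while the specific actions are assumptions
            for illustration and not from any particular controller or
            firmware:

                # Illustrative mapping from DRAM temperature to an action,
                # based on the bands quoted above. Actions are assumed.
                def dram_thermal_action(temp_c):
                    if temp_c < 85:
                        return "normal operation"
                    if temp_c < 90:
                        return "extended range: refresh more aggressively, keep watching"
                    if temp_c < 95:
                        return "critical: throttle memory traffic, raise an alarm"
                    return "out of spec: risk of data loss, begin orderly shutdown"

                for t in (70, 87, 92, 97):
                    print(t, "C ->", dram_thermal_action(t))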
        
             | kube-system wrote:
             | Maybe we could architect servers with more CPU packages and
             | fewer cores per package?
             | 
             | Maybe instead of 32 RAM packages and 4 CPU packages, we
             | could have 16 CPU packages each with onboard RAM?
        
               | nsteel wrote:
               | Will these CPUs talk to each other with a similar latency
               | hit as we get from talking to DRAM today?
        
             | __alexs wrote:
             | The M1/M2 has the RAM on the same package as the CPU but
             | it's not actually on top of the die, it's adjacent to it.
                | Here's a pic of one someone on reddit delidded:
             | https://imgur.com/a/RhGk1xw
             | 
             | Obviously this is still a lot of heat in a small space but
             | it does mean the cooler gets to have good coupling with the
             | die rather than going all the way through some DRAM first.
        
               | toast0 wrote:
               | That's more tractable. Gotta make sure everything hits
               | the same z-height and the contact patches are right...
               | But you still have a capacity issue.
        
           | SketchySeaBeast wrote:
            | Big old SoCs really do seem like the future. CPU, GPU, RAM,
            | motherboard controllers - throw all those different
            | problems onto a big old die and optimize for cooling that
            | guy.
        
             | foobiekr wrote:
             | SOCs are harder, not easier, to cool.
        
           | AtlasBarfed wrote:
           | Yeah I don't understand why a dedicated fan and other basic
           | CPU cooling techniques don't apply here. It's probably
           | because the DRAM industry doesn't want to change form factors
           | and standards to a substantial degree...
           | 
           | ... probably because they do the bare minimum to keep up with
           | CPU design and routinely get busted for cartel price fixing
           | and predatory pricing.
        
             | dtx1 wrote:
              | I mean, literally this: https://youtu.be/TFE9wfAfudE?t=611
              | Problem solved.
        
         | jackmott42 wrote:
         | Active cooling tends to have the challenge of controlling
         | condensation, and then of course now you are drawing even MORE
         | power from the wall.
        
           | mrtranscendence wrote:
           | I've seen YouTube videos of overclockers employing
           | refrigeration techniques (or coolants like liquid nitrogen),
           | and it does seem like condensation is a major issue. Maybe
           | that's not as much of a problem at more reasonable
           | temperatures?
           | 
           | But yeah, I'd be just as or more concerned about the amount
           | of power it would take to run a freezer like that ... I'm
           | already drawing as much as 850 watts for my PC, with a max of
           | a couple hundred watts for my OLED TV and speakers, and don't
           | forget the modem and router, and a lamp to top it all off;
           | would a powerful enough mini freezer to cool my PC even fit
           | on the circuit?
           | 
           | Actually, it's even worse because I've got an air purifier
           | running there too ... but I could move that, I suppose.
        
             | Ekaros wrote:
             | Cascade cooling is a fun thing. The next step after water
             | cooling before getting to liquid nitrogen...
             | 
              | Still, I wouldn't really go for that, knowing how noisy
              | the average compressor and fan of that size are. I much
              | prefer my nearly silent fan-cooled machine...
        
               | EricE wrote:
               | If you are going to go extreme enough to have a
               | compressor and fan, you can always put them in another
               | room :p
        
         | snarfy wrote:
         | The biological solution to leaks is clotting. Do we have
         | cooling liquids that clot like blood does, say when exposed to
         | oxygen?
        
           | kansface wrote:
            | Great, now your computer can have a thrombosis or a stroke!
        
           | xxpor wrote:
           | the reliability there isn't particularly great ;)
        
             | mgsouth wrote:
             | 50-100 yrs between major overhaul? When's the last time you
             | had to manually top-up or bleed air out of your circulatory
             | system? I'd say that's impressively robust.
        
           | chmod775 wrote:
            | It will clot inside the cooling circuit because of the air
            | within it - or the air that will get within it.
            | 
            | However, there are ways to prevent and detect leaks in
            | current systems using negative pressure:
           | https://www.youtube.com/watch?v=UiPec2epHfc
        
         | SketchySeaBeast wrote:
         | So you're taking a 300W-1000W space heater and putting it into
         | a freezer that needs to be able to bleed that much heat? Going
         | to need another breaker.
        
           | Victerius wrote:
           | I'm just brainstorming. I can troubleshoot my computer and
           | write basic code but I'm not a computer engineer.
        
           | 7speter wrote:
           | This has come up often in the comments section of articles
            | I've seen about prospective 600-900W 40-series Nvidia cards.
        
             | SketchySeaBeast wrote:
             | Honestly, the fact that my 3080 can draw 400W makes me kind
             | of sick and I limit FPS specifically so it doesn't. I can't
             | ever see myself buying a card that draws double that.
        
               | max51 wrote:
               | You can reduce the power limit a lot on a 3080 before it
               | impacts performance. The last 3 - 5% of performance they
               | are getting out of their chip is responsible for more
               | than a third of the power draw on higher clocked cards.
        
               | SketchySeaBeast wrote:
               | Yeah, I've significantly undervolted both my GPU and CPU.
               | I now never see 300W, really helped with thermals as
               | well.
        
       | baybal2 wrote:
        
       | nonrandomstring wrote:
       | Still waiting to see the first micro-engineered Stirling engine
       | that can self-cool. Any physicists care to comment on why that
       | won't work yet, or ever?
        
         | Chabsff wrote:
         | You can only cool something by making something else warmer by
         | a larger amount. The heat has to go somewhere, and moving that
         | heat in any non-passive way will invariably produce yet more
         | heat in the process.
        
           | nonrandomstring wrote:
           | I think some people are interpreting that as a joke. I'm not
           | talking about a _net gain_ of energy or any crazy perpetual
            | motion machine. Think of something like a "heat brake".
           | Differential heat energy can be converted to mechanical work.
           | Some of that can be used to cool the system elsewhere,
           | creating a negative feedback loop. Another way to think of
           | such a system is like the "reluctance" of an inductor.
           | 
           | With present thermoelectric effects, using a Seebeck junction
           | to generate current for a fan is hopelessly ineffective. But
           | is that necessarily the case for all designs which could help
           | to hold a system under a critical temperature when heat
            | spikes?
        
             | acomjean wrote:
             | do you mean something like a solar chimney, where heat is
             | used to draw air through the rest of the building?
             | 
             | https://en.wikipedia.org/wiki/Solar_chimney
        
               | nonrandomstring wrote:
               | That's an example of a similar system, but probably
               | impractical for use in an electronics context. I have in
               | my imagination a fantasy "smart" material that in the
               | limit can transfer 0.5 * k^m joules of heat per square
               | meter per second from one side to the other (where m is
               | somewhere between 1 and 2). Such a material would always
               | feel slightly warmer on one side and cooler on the other,
               | and this effect would actually increase in the presence
               | of ambient heat, hence it could act as a thermal "brake"
                | or active heat pipe/diode. I believe such a device is
               | "allowable" within the laws of physics.
        
           | ta8645 wrote:
           | > You can only cool something by making something else warmer
           | by a larger amount.
           | 
           | Why isn't it also true that you can only make something
           | warmer, by cooling something else by a larger amount?
           | 
           | The movement of electricity generates waste heat, why isn't
           | that process reversible? Making the heat disappear into a
           | cold wire, rather than just dissipating into the atmosphere?
            | (not suggesting it would be easy or even practical).
        
             | nostrademons wrote:
             | 2nd law of thermodynamics - entropy is always increasing.
             | Heat transfer is never 100% efficient, you always lose
             | something in transmission. This is also why it's not
             | possible to create a perpetual-motion machine.
             | 
             | https://en.wikipedia.org/wiki/Second_law_of_thermodynamics
        
             | nonrandomstring wrote:
             | Peltier coolers [1] do exist for specialist applications
             | but they are not at all effective. You can even buy them on
             | Amazon. If the goal is to iron out a spike to stop your
             | semiconductor from going into thermal runaway (instead of
             | generating net energy as is the knee-jerk of some
             | unimaginative down-voters here) then it's a possible
             | saviour.
             | 
             | [1] https://www.britannica.com/science/Seebeck-effect
             | 
             | [2] https://www.amazon.com/Peltier-
             | Cooler/s?k=Peltier+Cooler
        
       | dylan604 wrote:
       | Next, we'll have a generation of mobile devices that will be
        | liquid cooled. Of course, because of the miniaturization, there
        | will be no way to refill the liquid coolant without getting a
        | new device. This will naturally happen before the batteries
        | die, creating an even shorter life cycle for devices. Sounds
        | like a perfect pitch for an upcoming WWDC type of event.
        
       | superkuh wrote:
        | RAM has been parallel for ages. IBM's new POWER10 architecture
        | switches to serial control of RAM, with firmware running on the
        | RAM sticks. As long as complex mitigations and monitoring are
        | going to be required, this might be the way to go.
        
       | bilsbie wrote:
       | It's at least partly caused by climate change too
        
       ___________________________________________________________________
       (page generated 2022-07-18 23:00 UTC)