[HN Gopher] Intel 3rd gen Xeon Scalable (Ice Lake): generational...
       ___________________________________________________________________
        
       Intel 3rd gen Xeon Scalable (Ice Lake): generationally big,
       competitively small
        
       Author : totalZero
       Score  : 60 points
       Date   : 2021-04-06 16:33 UTC (6 hours ago)
        
 (HTM) web link (www.anandtech.com)
 (TXT) w3m dump (www.anandtech.com)
        
       | ChuckMcM wrote:
       | I found the news of Intel releasing this chip quite encouraging.
       | If they have enough capacity on their 10nm node to put it into
       | production then they have tamed many of the problems that were
       | holding them back. My hope is that Gelsinger's renewed attention
       | to engineering excellence will allow the folks who know how to
       | iron out a process to work more freely than they did under the
       | previous leadership.
       | 
        | That said, fixing Intel is a three-step process, right? First
        | they have to get their process issues under control (it seems
        | like they are making progress there). Second, they need to
        | figure out third-party use of that process so that they can
        | bank some of the revenue that is out there from the chip
        | shortage. And finally, they need to answer the "jelly bean"
        | market: "jelly bean" processors have become powerful enough to
        | be the only processor in a system, so Intel needs to play
        | there or it will lose that whole segment to Nvidia/ARM.
        
         | sitkack wrote:
         | If they price it right, it could be amazing. Computing is
         | mostly about economics. The new node sizes greatly increase the
         | production capacity. Half the dimension in x and y gets you 4x
         | the transistors on the same wafer. It is like making 4x the
         | number of fabs.
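          | 
          | As a back-of-envelope check of that 4x claim (a hedged
          | sketch in Python; the die sizes are made up, not any real
          | node's numbers):
          | 
          |   # halving feature size in x and y quadruples how many
          |   # identical dies fit on a wafer (edge effects ignored)
          |   wafer_area = 3.14159 * (300 / 2) ** 2   # 300 mm wafer
          |   die_old = 100.0                         # mm^2, made up
          |   die_new = die_old * 0.5 * 0.5
          |   print(wafer_area / die_old)   # ~707 dies
          |   print(wafer_area / die_new)   # ~2827 dies, i.e. 4x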
         | 
         | It also has speed and power advantages.
         | 
         | I think this release is excellent news on many levels.
        
           | Retric wrote:
           | Intel _10nm_ is really just a marketing term at this point
           | and has nothing to do with transistor density.
        
             | [deleted]
        
         | judge2020 wrote:
          | Production for a datacenter CPU is not the same as production
          | for datacenter plus enthusiast-grade consumer CPUs, as Zen 3
          | currently achieves, unfortunately. Rocket Lake being
          | backported to 14nm is still not a good sign for actual
          | production volume, although it probably means the next
          | generation will be 10nm all the way.
        
           | willis936 wrote:
            | Datacenter CPUs are much larger than consumer parts, and
            | yield falls off sharply with die area (roughly
            | exponentially for a given defect density). They start with
            | these because the margins rise faster with die area than
            | the yield falls.
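            | 
            | For intuition, the usual textbook Poisson yield model (a
            | rough sketch; the defect density is assumed purely for
            | illustration, not a real fab number):
            | 
            |   import math
            |   # yield ~ exp(-defect_density * die_area)
            |   d0 = 0.1   # defects per cm^2, assumed
            |   for area_cm2 in (1.0, 2.0, 4.0, 6.6):  # 6.6 ~ 660 mm^2
            |       print(area_cm2, round(math.exp(-d0 * area_cm2), 2))
            |   # 1.0 0.9, 2.0 0.82, 4.0 0.67, 6.6 0.52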
        
             | Robotbeat wrote:
              | But modern techniques exist to deal with problems in a
              | large die (i.e. testing and then fusing off cores with
              | defects), so the fact that they're starting with large
              | die sizes doesn't really tell you much, no?
        
               | erik wrote:
               | The top end most profitable SKUs are fully enabled dies.
               | That they are now able to ship dies this large is a good
               | sign. The 10nm laptop chips they have produced so far
               | were rumored to have atrocious yield.
        
           | knz_ wrote:
            | > Rocket Lake being backported to 14nm is still not a good
            | sign for actual production volume,
           | 
           | I'm not seeing a good reason for thinking this is the case.
           | Server CPUs are harder to fab (much larger die area) and they
           | need to fab more of them (desktop CPUs are relatively niche
           | compared to mobile and server CPUs).
           | 
           | If anything this is a sign that 10nm is fully ready.
        
           | bushbaba wrote:
            | I assume Intel makes more server CPUs per year than the
            | entirety of AMD's output.
        
             | bayindirh wrote:
              | Intel has momentum and something of a cult following, but
              | they're no match for AMD in certain aspects like PCIe
              | lanes, memory channels, and some types of computation
              | which favor AMD's architecture.
              | 
              | Day by day, more data centers get AMD systems by choice
              | or by requirement (e.g. if you want an 8x NVIDIA A100
              | module running at maximum performance, you need an AMD
              | CPU since it has more PCIe lanes).
              | 
              | You don't see many AMD server CPUs around because the
              | first generation and most of the second generation were
              | bought up entirely by FAANG, Dropbox, et al.
              | 
              | As production ramps up with the newer generations, the
              | rest of us can buy the overflow after most of the volume
              | is gobbled up by those buyers.
        
               | zepmck wrote:
                | In terms of PCIe lane efficiency, there is no
                | competition between Intel and AMD; Intel is well ahead
                | of AMD. Don't be impressed by the number of lanes
                | available on the board.
        
               | cptskippy wrote:
               | I can only assume you're referring to Intel's Rocket Lake
               | storage demonstration they tweeted out. This was using
               | PCMark 10's Quick Storage Benchmark which is more CPU
               | bound than anything else.
               | 
               | All of the other benchmarks in the PCMark test suite push
               | the bottleneck down to the storage device.
               | 
               | One would think Intel might want to build a storage array
               | that could stress the PCIe lanes but then that might show
               | an entirely different picture than the one Intel is
               | portraying.
        
               | bayindirh wrote:
                | > Don't be impressed by the number of lanes available
                | on the board.
                | 
                | When you configure the system full-out with GPUs & HBAs,
                | the number of lanes becomes a matter of necessity rather
                | than a spec you drool over.
                | 
                | A PCIe lane is a PCIe lane. Its capacity, latency and
                | speed are fixed, and you need these lanes, with a
                | minimum number of PCIe switches, to saturate the
                | devices and servers you have, at least in our scenario.
        
             | ryan_j_naughton wrote:
             | There is a slow but seismic shift to AMD within data
             | centers right now.
        
               | mhh__ wrote:
               | They still have something like 40% to go to even reach
               | parity with Intel though
        
               | [deleted]
        
           | totalZero wrote:
            | > Rocket Lake being backported to 14nm is still not a good
            | sign for actual production volume
           | 
           | I'm genuinely having trouble understanding what you mean by
           | this.
           | 
           | Rocket Lake being backported to 14nm means that 10nm can be
           | allocated in greater proportion toward higher-priced chips
           | like Alder Lake and Ice Lake SP. Seems like it would be good
           | for production volume.
        
             | rincebrain wrote:
             | I think they mean that the fact that they needed to
             | backport Rocket Lake, versus just having all their
              | production on 10nm, implies much more limited 10nm
              | capacity than the alternative would.
        
             | zamadatix wrote:
              | "Production volume" here refers to the production volume
              | of the node as a whole still being low, not to any risk
              | of being able to get enough volume of these particular
              | SKUs out of the node.
        
             | jlawer wrote:
              | I was under the impression the issue with 10nm is
              | frequency, which led to the Rocket Lake backport.
              | Unfortunately it seems that the 10nm node's efficiency
              | point is lower on the frequency curve. The reviewed Ice
              | Lake processor was 300 MHz lower than the previous
              | generation (though with a much higher core count),
              | despite higher power draw and the process node shrink.
              | 
              | In laptop processors they can easily show efficiency
              | gains from the 10nm process and the IPC improvement. It
              | appears most laptop processors end up running at a power
              | envelope below the ideal performance-per-watt point.
              | Server processors with higher core counts mean you can
              | run more workloads per server, again providing
              | efficiency gains. However, desktop / gaming tends to be
              | smaller core counts plus higher frequency, with little
              | concern for efficiency outside of quality-of-life
              | factors (i.e. don't make me use a 1 kW chiller). Intel
              | has been pushing 5 GHz processor frequencies for years,
              | and Rocket Lake continues that push (5.3 GHz boost);
              | when they drop frequency to move to 10nm, it's hard to
              | see an IPC improvement that is able to paper over that.
              | 
              | However, Alder Lake CPUs will have a thread-count
              | advantage, so at least with 24 threads they should be
              | able to show a generational improvement over the current
              | 8c/16t Rocket Lake parts. That will allow them to at
              | least argue their value with select benchmarks and
              | Intel-only features. Those 8 efficiency cores will likely
              | be a BIG win on laptops, but on desktop I doubt they will
              | compare favourably to the full-fat cores on a current
              | Ryzen 5900X (i.e. a currently available 24-thread
              | processor).
              | 
              | Intel is going to have at least one more BAD mainstream
              | desktop generation before they can truly compete on the
              | mainstream high end, though there is a chance they have
              | something like a HEDT part that would allow them to at
              | least save face. That being said, given the choice, Intel
              | will give up desktop market share for the faster-growing
              | laptop and server markets.
        
         | buu700 wrote:
         | What's a "jelly bean" processor? Trying to search for that just
         | gets a bunch of hits about Android 4.1.
        
           | madsushi wrote:
           | https://news.ycombinator.com/item?id=17376874
           | 
           | > [1] Jelly Bean chips are those that are made in batches of
           | 1 - 10 million with a set of functions that are fairly
           | specific to their application.
        
             | foobarian wrote:
             | Is that the chip-on-board packaging like described here:
             | https://electronics.stackexchange.com/questions/9137/what-
             | ki... ?
        
           | ChuckMcM wrote:
            | Sometimes referred to as an "application-specific
            | processor" (ASP) or "system on chip" (SoC). These are the
            | bulk of semiconductor sales these days, as they have
            | replaced all of the miscellaneous gate logic on devices
            | with a single programmable block that has a bunch of
            | built-in peripherals.
            | 
            | Think Atmel ATmega parts; there are trillions of these in
            | various roles. When you consider that something like a 555
            | timer[1] is now more cost-effectively and capably replaced
            | with an 8-pin microcontroller, you can get an idea of the
            | shift.
            | 
            | While these are rarely built on the "leading edge" process
            | node, when a new process node takes over for high-margin
            | chips, the previous node gets used for lower-margin chips,
            | which effectively does a shrink on their die, reducing
            | their cost (most of these chips seem to keep their
            | performance specs fairly constant, preferring cost
            | reduction over performance improvement).
            | 
            | Anyway, the zillions of these chips in lots of different
            | "flavors" are colloquially referred to as "jelly bean"
            | chips.
        
           | dragontamer wrote:
           | http://sparks.gogo.co.nz/assets/_site_/downloads/smd-
           | discret...
           | 
           | > Jellybean is a common term for components that you keep in
           | your parts inventory for when your project just needs "a
           | transistor" or "a diode" or "a mosfet"
           | 
           | -----------
           | 
           | For many hobbyists, a Raspberry Pi or Arduino is a good
           | example of a Jellybean. You buy 10x Raspberry Pis and stuff
           | your drawer full of them, because they're cheap enough to do
            | most tasks. You don't really know what you're going to use
            | all 10 Raspberry Pis for, but you know you'll find a use
            | for them a few weeks from now.
           | 
           | ---------
           | 
            | At least, in my Comp. Engineering brain, I think of 2N2222
            | or 2N3904 transistors, or the 741 op-amp. There are better
            | op-amps and better transistors for any particular job. But
            | I chose these parts because they're familiar, comfortable,
            | cheap and well understood by a wide variety of engineers.
            | 
            | Well, not the 741 op-amp anymore, anyway. The 741 was a
            | jellybean back in the 12V days. Today, 5V compatibility has
            | become the standard (because of USB), so 5V op-amps are a
            | more important "jellybean".
        
             | klodolph wrote:
             | I don't know how old you are, but the 741 was obsolete in
             | the 1980s. It sticks around in EE textbooks because it's
             | such an easy way to demonstrate _problems_ with op-amps...
             | high input current, low gain-bandwidth product, low slew
             | rate, etc.
             | 
             | I think your jellybean op-amps would more likely be TL072,
             | LM358, or NE5532.
        
               | dragontamer wrote:
                | Old, beaten-up textbooks in the corner of my
                | neighborhood library were still talking about the 741
                | back in the 2000s, when I was in high school and
                | started dabbling in electronics more seriously.
               | 
               | Maybe it was fully obsolete by that point, but high
               | school + neighborhood libraries aren't exactly filled
               | with up-to-date textbooks or the latest and greatest.
               | 
               | I remember that Radio Shack was still selling kits with
               | 741 in them, as well as breadboards and common
               | components... 12V wall-warts and the like. Online
               | shopping was beginning to get popular, but I was still a
               | mallrat who picked up components and dug through old
               | Radio Shack manuals into 2005 or 2006.
               | 
                | It was the ability to walk around and see those
                | component shelves sitting there in Radio Shack that got
                | me curious about the hobby and started me researching
                | it. I do wonder how modern children are supposed to get
                | interested in hobbies like this now that malls are less
                | popular (and electronics shops like Radio Shack have
                | basically disappeared).
               | 
               | ------------
               | 
                | I don't remember what we used in college. I know that I
                | was more selective by then and understood the kinds of
                | problems various op-amps had. Also, in college you're
                | not really rich enough to invest in a private stockpile
                | of chips; you just use whatever the labs are stocked
                | with.
                | 
                | LM358 is the jellybean that I keep in my drawer today,
                | if you're curious. Old habits die hard though; I still
                | think of the 741 as the jellybean even though it really
                | is obsolete today.
        
               | ChuckMcM wrote:
               | I've got a tube each of 358's and 1458's (dual version)
               | in my parts supplies. But my microwave stuff is finding
               | them lacking.
        
               | bavell wrote:
               | +1 for LM358
        
         | carlhjerpe wrote:
          | What I don't understand is: ASML is building these machines
          | for making ICs. Why can TSMC use them for 7nm but Intel can
          | only use them for 10nm right now? Doesn't ASML make the
          | lenses as well, so that you're "only" stuck making the
          | etching thingy (the photomask/reticle, i.e. the reflective
          | template of a CPU)?
          | 
          | It seems like nobody is talking about this, could anyone shed
          | some light?
        
           | dragontamer wrote:
           | Consider that the wavelength of red light is 700 nm, and the
           | wavelength of UV-C is 100nm to 280nm.
           | 
            | And immediately we see the problem with dropping to 10nm:
            | the features are smaller than the wavelength of the light
            | used to print them.
            | 
            | And yeah, "10nm" and "7nm" are marketing terms, but that
            | doesn't change the fact that these processes all have
            | features smaller than the wavelength of light.
           | 
           | -------
           | 
           | So there are two ways to get around this problem.
           | 
           | 1. Use smaller light: "Extreme UV" is even smaller than
           | normal UV at 13.5nm. Kind of the obvious solution, but higher
           | energy and changes the chemistry slightly, since the light is
           | a different color. Things are getting mighty close to literal
           | "X-Ray Lasers" as they are, so the power requirements are
           | getting quite substantial.
           | 
           | 2. Multipatterning -- Instead of developing the entire thing
           | in one shot, do it in multiple shots, and "carefully line up"
           | the chips between different shots. As difficult as it sounds,
           | its been done before at 40nm and other processes. (https://en
           | .wikipedia.org/wiki/Multiple_patterning#EUV_Multip...)
           | 
            | 3. Do both at the same time to reach 5nm, 4nm, or 3nm.
            | 10nm and 7nm is the point where the various companies had
            | to decide whether to do #1 or #2 first; either way, your
            | company needs to learn to do both in the long term. TSMC
            | and Samsung went with #1 (EUV), and I think Intel thought
            | that #2 (multi-patterning) would be easier.
           | 
           | And the rest is history. Seems like EUV was easier after all,
           | and TSMC / Samsung's bets paid off.
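            | 
            | For a rough sense of why "smaller light" matters: the
            | usual back-of-envelope is the Rayleigh criterion, minimum
            | half-pitch ~ k1 * wavelength / NA. A hedged sketch with
            | typical published k1/NA figures (not any specific tool):
            | 
            |   def half_pitch(k1, wavelength_nm, na):
            |       return k1 * wavelength_nm / na
            | 
            |   print(half_pitch(0.35, 193.0, 1.35))  # ArF imm.: ~50 nm
            |   print(half_pitch(0.35, 13.5, 0.33))   # EUV:      ~14 nm
            | 
            | Which is roughly why single-exposure DUV stalls out around
            | 40nm half-pitch, and you need multipatterning or EUV (or
            | both) below that.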
           | 
           | Mind you, I barely know any of the stuff I'm talking about.
           | I'm not a physicist or chemist. But the above is my general
           | understanding of the issues. I'm sure Intel had their reasons
           | to believe why multipatterning would be easier. Maybe it was
           | easier, but other company issues drove away engineers and
           | something unrelated caused Intel to fall behind.
        
           | vzidex wrote:
           | I'll take a crack at it, though I'm only in undergrad (took a
           | course on VLSI this semester).
           | 
           | Making a device at a specific technology node (e.g. 14nm,
           | 10nm, 7nm) isn't just about the lithography, although litho
           | is crucial too. In effect, lithography is what allows you to
           | "draw" patterns onto a wafer, but then you still need to do
           | various things to that patterned wafer (deposition, etching,
           | polishing, cleaning, etc.). Going from "we have litho
           | machines capable of X nm spacing" to "we can manufacture a
           | CPU on this node at scale with good yield" requires a huge
           | amount of low-level design to figure out transistor sizings,
           | spacings, and then how to actually manufacture the designed
           | transistors and gates using the steps listed above.
        
           | mqus wrote:
            | TSMC's 7nm is roughly equivalent to Intel's 10nm; the
            | numbers don't really mean anything anymore and aren't
            | comparable across foundries.
        
       | lifeisstillgood wrote:
       | This might be a very dumb question but it always bothered me -
       | silicon wafers are always shown as great circles, but processor
       | dies are obviously square. But it looks like the etching etc goes
       | right to the circular edges - wouldn't it be better to leave the
       | dead space untouched?
        
         | pas wrote:
         | I think these are just press/PR wafers and real production ones
         | don't pattern on the edge. (First of all it takes time, and in
         | case of EUV it means things amortize even faster, because every
         | shot damages the "optical elements" a bit.)
         | 
          | edit: it also depends on how many dies the mask (reticle) has
          | on it. Intel uses one-die reticles, so in theory their real
          | wafers have no situation in which they have partial dies at
          | the edge.
        
         | w0utert wrote:
          | Most semiconductor production processes like etching, doping,
          | polishing, etc. are done on the full wafer, not on individual
          | images/fields. So there is nothing to be gained there in
          | terms of production efficiency.
          | 
          | The litho step could in theory be optimized by skipping
          | incomplete fields at the edges, but the reduction in exposure
          | time would be relatively small, especially for smaller
          | designs that fit multiple chips within a single image field.
          | I imagine it would also introduce yield risk because of
          | things like uneven wafer stress & temperature, higher
          | variability in stage move time when stepping edge fields vs
          | center fields, etc.
        
         | andromeduck wrote:
         | Many of the process steps involve rotation so this is
         | impractical.
        
       | jvanderbot wrote:
       | From Anandtech[1]:
       | 
       | "As impressive as the new Xeon 8380 is from a generational and
       | technical stand-point, what really matters at the end of the day
       | is how it fares up to the competition. I'll be blunt here; nobody
       | really expected the new ICL-SP parts to beat AMD or the new Arm
       | competition - and it didn't. The competitive gap had been so
       | gigantic, with silly scenarios such as where a competing 1-socket
       | systems would outperform Intel's 2-socket solutions. Ice Lake SP
       | gets rid of those more embarrassing situations, and narrows the
       | performance gap significantly, however the gap still remains, and
       | is still undeniable."
       | 
       | This sounds about right for a company fraught with so many
       | process problems lately: Play catch up for a while and hope you
       | experience fewer in the future to continue to narrow the gap.
       | 
       | "Narrow the gap significantly" sounds like good technical
       | progress for Intel. But the business message isn't wonderful.
       | 
       | 1. https://www.anandtech.com/show/16594/intel-3rd-gen-xeon-
       | scal...
        
         | ajross wrote:
         | I don't know that it's all so bad. The final takeaway is that a
         | 660mm2 Intel die at 270W got about 70-80% of the performance
         | that AMD's 1000mm2 MCM gets at 250W. So performance per
         | transistor is similar, but per watt Intel lags. But then the
         | idle draw was significantly better (AMD's idle power remains a
         | problem across the Zen designs), so for many use cases it's
         | probably a draw.
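          | 
          | Taking those numbers at face value (a rough sketch; 0.75 is
          | just the midpoint of the 70-80% range, and die area is a
          | crude proxy for transistors, as replies note):
          | 
          |   intel_perf = 0.75            # relative to AMD = 1.0
          |   intel_mm2, amd_mm2 = 660, 1000
          |   intel_w, amd_w = 270, 250
          |   print(intel_perf / intel_mm2 * amd_mm2)  # ~1.14 per mm^2
          |   print(intel_perf / intel_w * amd_w)      # ~0.69 per watt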
         | 
         | That sounds "competetive enough" to me in the datacenter world,
         | given the existing market lead Intel has.
        
           | marmaduke wrote:
           | It's impressive how you and parent comment copied over
           | to/from the dupe posting verbatim.
           | 
           |  _edit_ oops nevermind, I see my comment was also
           | mysteriously transported from the dupe.
        
           | Symmetry wrote:
           | I'm not sure that's a fair area comparison? AMD only has
           | around 600 mm2 of expensive leading edge 7nm silicon and uses
           | chiplets to up their yields. The rest is the connecting bits
           | from an older and cheaper process. Intel's full size is a
           | single monolithic die on a leading edge process.
        
             | ineedasername wrote:
             | Do chiplets underperform compared to a monolithic die?
        
               | wmf wrote:
               | Yes.
        
               | Symmetry wrote:
               | All things being equal a chiplet design will underperform
               | a monolithic die. But we've already seen the benchmarks
               | on the performance of Milan so looking at chiplets versus
               | monolithic is mostly about considering AMD's strategy and
               | constraints rather than how the chips perform.
        
               | monocasa wrote:
               | Pretty much any time you have signals going off chip, you
               | lose out on both bandwidth and latency.
        
           | ComputerGuru wrote:
            | I would argue that for high-end servers, idle draw is a bit
            | of a non-issue, as presumably either you have only one of
            | these machines and it's sitting idle (so however
            | inefficient it is, it doesn't matter much) or you have
            | hundreds/thousands of them and they'll be as far from idle
            | as it's possible to be.
           | 
           | AMD's idle power consumption is a bigger issue for desktop,
           | laptop, and HEDT.
        
             | rbanffy wrote:
             | If it has 80% of the performance, it will still be
             | competitive at 80% of the price.
        
               | ShroudedNight wrote:
               | This sounds like a dangerous assumption to make. I would
               | expect that needing 25% more machines for the same
               | performance would be a non-starter for many potential
               | customers.
        
             | throwaway4good wrote:
             | I would expect most high-end servers in data centers to sit
             | idle most of the time? Do you know of any data on this?
        
               | ajross wrote:
               | Most servers are doing things for human beings, and we
               | have irregular schedules. Standard rule of thumb is that
                | you plan for a peak capacity of 10x average. A
                | datacenter that _doesn't_ have significant idle
                | capacity is one that's some kind of weird special
                | purpose thing like a mining facility.
        
               | adrian_b wrote:
               | That's true, but I would expect that most idle servers
               | are turned off and they use Wake-on-LAN to become active
               | when there is work to do.
               | 
               | Just a few servers could be kept idle, not off, to enable
               | a sub-second start-up time for some new work.
        
               | jeffbee wrote:
               | Certainly for bit players and corporate datacenters with
               | utilization < 1% you'd expect the median server to just
               | sit there. For larger (amazon, google, etc) players the
               | economic incentives against idleness are just too great.
        
               | JoshTriplett wrote:
               | > For larger (amazon, google, etc) players the economic
               | incentives against idleness are just too great.
               | 
               | Not all workloads are CPU-bound. Cloud providers have
               | _many_ servers for which the CPUs are idle most of the
                | time, because they're disk-bound, network-bound, other-
               | server-bound, bursty, or similar. They're going to aim to
               | minimize the idle time, but they can't eliminate it
               | entirely given that they have customer-defined workloads.
        
               | mamon wrote:
               | But if the workload is not CPU-bound then why would they
               | care about upgrading their CPUs to more performant ones,
               | like Ice Lake Xeons?
        
               | JoshTriplett wrote:
               | The workloads are determined by their customers, and
               | customers don't always pick the exact size system they
               | need (or there isn't always an option for the exact size
               | system they need). The major clouds are going to upgrade
               | and offer faster CPUs as an option, people are going to
               | use that option, and some of their workloads will end up
               | idling the CPU. Major cloud vendors almost certainly have
               | statistics for "here's how much idle time we have, so
               | here's approximately how much we'd save with lower power
               | consumption on idle".
        
             | ajross wrote:
             | Electricity costs for large datacenters are higher than the
             | equipment costs. They absolutely care about idle draw.
        
               | bostonsre wrote:
               | If you are the one paying the electricity bills for that
               | datacenter, then yes, it probably matters to you a lot.
               | If you are just renting a server from aws or gcp, it
               | probably matters less. Although, I assume costs born from
               | idle inefficiency will probably be passed to the
               | customer...
        
               | [deleted]
        
               | spideymans wrote:
               | Shouldn't datacenters attempt to minimize idle time
               | though? A server sitting at idle is a depreciating asset
               | that could likely be put to more productive use if tasks
               | were rescheduled to take advantage of idle time (this
               | would also reduce the total number of servers needed).
        
               | deelowe wrote:
               | Utilization is a very difficult problem to solve. The
               | difference between peak and off peak utilization can be
               | as much as 70% or more depending on the application.
        
               | gumby wrote:
               | That is definitely the objective but the reality is that
               | load is not* uniform over the day. So you are paying to
               | keep some number of servers hot (I don't know about
               | spinup/spindown practices in modern datacenters).
               | 
               | I doubt this applies to HPC (the target market for this
               | part) as they either schedule jobs closely or could, I
               | imagine, shut them down. But I'm not in that space either
               | so this is merely conjecture.
               | 
               | * I am sure there are corner cases where the load _is_
               | uniform, but they are by definition few.
        
               | ComputerGuru wrote:
               | If you have enough servers for idle draw to be more than
               | a rounding error in your opex breakdown, then you have a
               | strategy to keep idle time to zero. It doesn't make any
               | financial sense (no matter how low idle draw is) to have
               | a server sit idle (or even powered off, but that's a
               | capex problem).
        
               | my123 wrote:
               | For a cloud infrastructure, you have a significant part
               | at idle, for when customers want to instantly spawn a VM.
        
               | zsmi wrote:
               | The target market for this part is not that kind of
               | datacenter.
               | 
               | Based on the article they're targeting high performance
               | compute, i.e. "application codes used in earth system
               | modeling, financial services, manufacturing, as well as
               | life and material science."
        
               | klodolph wrote:
               | The opposite is true... a major advantage of running
               | cloud infrastructure is that you can run your CPUs near
               | 100% all the time. CPUs which are not running full bore
               | can have jobs moved to them.
        
               | jrockway wrote:
               | Yeah, I think it's hard to keep your computers at 100%
               | utilization for the entire day. You host services close
               | to your users, and your users go to bed at some point,
               | many of them at around the same time every day. Then your
               | computers have very little work to do.
               | 
               | Some bigger companies have a lot of batch jobs that can
               | run overnight and steal idle cycles, but you have to be
               | gigantic before that's realistic. (My experience with
               | writing gigantic batch jobs is that I just requisitioned
               | the compute at "production quality" so I could work on
               | them during the day, rather than waiting for them to run
               | overnight. Not sure what other people did, and therefore
               | not really sure how much runs overnight at big
               | companies.)
               | 
               | Cloud providers have spot instances that could take up
               | some of this slack, but I bet there is plenty of idle
               | capacity precisely because the cost can't go to $0
               | because of electricity use. Or I could be completely
               | wrong about workloads, maybe everyone has their web
               | servers and CI systems running at 100% CPU all night.
               | I've never seen it, though.
        
               | thekrendal wrote:
                | Or for redundancy's sake, if you're using any kind of
                | sane setup. (Yes, YMMV bigly with this particular
                | idea.)
        
               | chomp wrote:
               | Can confirm, built out a datacenter space in a past life.
               | Power costs were of limited concern - cooling was the
               | limited resource. Even then, literally no one went down a
               | spec sheet and compared "hmm, this one has a tiny less
               | amount of watts idle". We just kept servers dark
               | regardless so that we can save on cooling. Nitpicking
               | idle draw for server processors just isn't realistic for
               | a lot of cases.
        
               | dahfizz wrote:
               | large datacenters have hardware orchestration systems
               | that let them turn off unused machines. There really is
               | no reason to have lots of machines on but unused. At
               | least, that is not a significant enough event to be a
               | determining factor in hardware purchasing.
        
             | neogodless wrote:
             | A bit off topic from the server CPU discussion, but I was
             | curious how well AMD is advancing idle power consumption.
             | 
             | For example, the Ryzen 3000 desktop chips seemed to have
             | the issue[0], but the same Zen 2 cores seem to have found
             | some improvements in the Ryzen 4000 mobile chips[1].
             | 
             | I didn't want to just rely on Reddit forum comments, so I
             | found this measure of the Ryzen 3600[2].
             | 
             | > When one thread is active, it sits at 12.8 W, but as we
             | ramp up the cores, we get to 11.2 W per core. The non-core
             | part of the processor, such as the IO chip, the DRAM
             | channels and the PCIe lanes, even at idle still consume
             | around 12-18 W in the system.
             | 
              | My interpretation was to expect ~12 W or more of idle
              | consumption (just from the CPU package), but I'm not sure
              | I understand it correctly.
             | 
             | I couldn't find the same information for Ryzen 4000
             | laptops, but the same APU is tested in a NUC, where the
             | total system draw (at the wall) at idle was about 10-11 W,
             | still nearly double that of a Core i7 U-series NUC[3], but
             | certainly lower than that of just the CPU package in the
             | Ryzen 3600.
             | 
             | Anecdotally, my 45W Ryzen 7 4800H laptop with 15.6" 1080p
             | screen lasts about 4 hours on 80% of the 60Wh battery with
             | 95% brightness, doing various non-intensive tasks. Though I
             | don't know how well the battery holds up on complete non-
             | use standby.
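              | 
              | For what that anecdote implies about average draw (just
              | arithmetic on the numbers above, nothing measured):
              | 
              |   usable_wh = 60 * 0.80   # 80% of a 60 Wh battery
              |   hours = 4
              |   print(usable_wh / hours)  # ~12 W whole-system average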
             | 
             | [0] https://old.reddit.com/r/AMDHelp/comments/cfm1xa/why_is
             | _ryze...
             | 
             | [1] https://old.reddit.com/r/Amd/comments/haq4fg/the_idle_p
             | ower_...
             | 
             | [2] https://www.anandtech.com/show/15787/amd-
             | ryzen-5-3600-review...
             | 
             | [3] https://www.anandtech.com/show/16236/asrock-4x4-box4800
             | u-ren...
        
               | bkor wrote:
               | > I couldn't find the same information for Ryzen 4000
               | laptops
               | 
               | I measured an Asus Mini PC PN50 with a Ryzen 4500U. The
               | idle power usage was 8.5 Watt for the system. This with
               | 32GB of memory and a SATA SSD installed. It would be nice
               | if it was lower than this, but it isn't too bad.
                | Interestingly, the machine used 1.2 Watt while off
                | after first being plugged in, and 0.5 Watt while off
                | after starting it up and shutting it down.
               | 
               | Recently noticed some people focussing on low power but
               | powerful 24/7 home "servers". Systems that are on 24/7,
               | but often idle. One system used around 4.5 Watt in idle.
               | The "brick" / power adapter often uses too much power,
               | even when everything is off.
        
               | wtallis wrote:
               | Ryzen 3000 desktop processors use a chiplet design, with
               | the IO die built on an older process than the processor
               | dies. Ryzen 4000 mobile processors are monolithic dies,
                | so they don't have the extra power draw of the inter-
                | chiplet connections, and they're entirely 7nm parts
                | instead of a mix of 7nm and 14nm.
        
           | monocasa wrote:
           | You can't really compare die sizes of a MCM and a single die
           | and expect to get transistor counts out of that. So much of
           | the area of the MCM is taken up by all the separate phys to
           | communicate between the chiplets and the I/O die, and the I/O
           | die itself is on GF14nm (about equivalent to Intel 22nm) last
           | time I checked, not a new competitive logic node.
           | 
           | There's probably a few more gates still on the AMD side, but
           | it's not the half again larger that you'd expect looking at
           | area alone.
        
         | jvanderbot wrote:
         | Furthermore:
         | 
         | "At the end of the day, Ice Lake SP is a success. Performance
         | is up, and performance per watt is up. I'm sure if we were able
         | to test Intel's acceleration enhancements more thoroughly, we
         | would be able to corroborate some of the results and hype that
         | Intel wants to generate around its product. But even as a
         | success, it's not a traditional competitive success. The
         | generational improvements are there and they are large, and as
         | long as Intel is the market share leader, this should translate
         | into upgraded systems and deployments throughout the enterprise
         | industry. Intel is still in a tough competitive situation
         | overall with the high quality the rest of the market is
         | enabling."
        
           | jandrese wrote:
            | I found it a little weird that the conclusions section
            | didn't mention the AMD or ARM competition at all, given
            | that the Intel chip seemed to be behind them in most of the
            | tests.
        
             | jvanderbot wrote:
             | You mean OP didn't? Yes, that's probably standard PR to
             | focus on strengths rather than competition.
        
               | jandrese wrote:
               | I mean the Anand piece.
        
               | jvanderbot wrote:
               | The conclusions section was quoted in my post and they
               | explicitly mention it.
               | 
               | "As impressive as the new Xeon 8380 is from a
               | generational and technical stand-point, what really
               | matters at the end of the day is how it fares up to the
               | competition. I'll be blunt here; nobody really expected
               | the new ICL-SP parts to beat AMD or the new Arm
               | competition - and it didn't. "
        
         | ksec wrote:
          | It is certainly good enough to compete: prioritise fab
          | capacity for the server unit and lock in those important
          | (swaying) deals with clients. Sales and marketing can work
          | their connections, along with the software tools that HPC
          | markets need, where AFAIK Intel is still far ahead of AMD.
          | 
          | And I can bet those prices have lots of room for special
          | discounts to clients. Since RAM and NAND storage dominate the
          | cost of a server, the difference between Intel and AMD
          | shrinks rapidly in the grand scheme of things, giving Intel a
          | chance to fight. And there is something not mentioned enough:
          | the importance of PCIe 4.0 support.
         | 
         | I wanted to rant about AMD, but I guess there is not much
         | point. ARM is coming.
        
         | quelsolaar wrote:
         | >This sounds about right for a company fraught with so many
         | process problems lately
         | 
          | Publicly the problems have only shown up lately, but the
          | things that caused them happened much further back.
          | 
          | I'm cautiously bullish on Intel. From what I gather, Intel is
          | in a much better place internally. They have much better
          | focus, there is less infighting, it's more engineering-led
          | than sales-led, they have some very good people and they are
          | no longer complacent. It will however take years before this
          | becomes visible from the outside.
          | 
          | Given the demand for CPUs and the competition's inability to
          | deliver, I think Intel will do OK while they try to catch up,
          | even if they are no one's first choice of CPU vendor.
        
       | intricatedetail wrote:
        | Why does Intel even bother releasing products that don't bring
        | anything new or worthwhile to the table? This is such a massive
        | waste of time and resources, and a needless environmental cost.
        
       | w0mbat wrote:
       | 10nm? I love retro-computing.
        
         | ajross wrote:
          | As gets repeated ad nauseam, industry numbering has gone
          | wonky. Intel still hews more or less to the ITRS labelling
          | for its nodes, which means that its 10nm process has pitches
          | and density values along the same lines as TSMC's or
          | Samsung's 7nm processes.
          | 
          | This is, indeed, no longer an industry-leading density, and
          | it lags what you see on "5nm" parts from Apple and Qualcomm.
          | But it's the same density that AMD is using for the Zen 2/3
          | devices against which this is competing in the datacenter.
        
           | adrian_b wrote:
           | Maybe the density is the same, but the 10-nm process variant
           | that Intel is forced to use for Ice Lake Server is much worse
           | than the 7-nm TSMC process.
           | 
           | It is worse in the sense that at the same number of active
           | cores and the same power consumption, the 10-nm Ice Lake
           | Server can reach only a much lower clock frequency than the
           | 7-nm Epyc, which results in a much lower performance for
           | anything that does not use AVX-512.
           | 
           | It is also worse in the sense that the maximum clock
           | frequency when the power limits are not reached is also much
           | worse for the 10-nm process used for Ice Lake Server.
           | 
           | Ice Lake Server does not use the improved 10-nm process
           | (SuperFin) that is used for Tiger Lake and it is strongly
           | handicapped because of that.
        
             | ac29 wrote:
              | While I'd agree with you that TSMC's current 7nm seems to
              | be better than Intel's current 10nm, comparing Epyc to
              | Ice Lake SP isn't quite apples to apples. Intel is
              | putting (up to) 40 cores on a single die, while AMD only
              | puts 8 cores per die. It looks like AMD has the better
              | method for overall performance, and Intel will likely
              | follow them - in addition to being able to get more cores
              | into a socket, I suspect Intel could also crank frequency
              | higher with fewer cores per die.
        
               | adrian_b wrote:
               | For the user it does not matter how many cores are on a
               | die.
               | 
               | For the user it matters what is included in a package.
               | The new Ice Lake Server package (77.5 mm x 56.5 mm) has
               | finally reached about the same size as the Epyc package
               | (75.4 mm x 58.5 mm), because now Intel offers for the
               | first time 8 memory channels, like its competitors have
               | offered for many years.
               | 
               | So in packages of the same size, Intel has 40 cores,
               | while AMD offers 64 cores. Moreover Intel requires an
               | extra package for the I/O controller, while AMD includes
               | it in the CPU package.
               | 
               | So for general-purpose users, AMD offers much more in the
               | same space.
               | 
               | On the other hand, Ice Lake Server has twice the number
               | of FMA units, so it has as many floating-point
               | multipliers as 80 AMD cores. This advantage is diminished
               | by the fact that the clock frequency for heavy AVX-512
               | instructions is only 80% of the nominal frequency, but it
               | can still give an advantage to Ice Lake Server for the
               | programs that can use AVX-512.
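                | 
                | A rough peak-FP64-per-clock sketch of that FMA point
                | (assuming 2x 512-bit FMA pipes per Ice Lake core, 2x
                | 256-bit FMA pipes per Zen 3 core, and 2 ops per FMA;
                | the AVX-512 clock penalty above is ignored):
                | 
                |   # FLOP/cycle = cores * pipes * FP64 lanes * 2
                |   icx_40c = 40 * 2 * (512 // 64) * 2    # 1280
                |   epyc_64c = 64 * 2 * (256 // 64) * 2   # 1024
                |   print(icx_40c, epyc_64c)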
        
               | totalZero wrote:
               | From a yield perspective, if core failures are
               | independent events, binning is probably easier with the
               | big chiplet approach.
               | 
               | The Epyc 3 approach does have some drawbacks. Looking at
               | the Epyc 3 TDP numbers, there's probably a nontrivial
               | thermal cost to breaking out the dies as AMD has. Not to
               | mention the I/O for Epyc 3 is not on TSMC 7nm.
        
         | mhh__ wrote:
          | Intel's process has been a disaster, however considering that
          | for the most part they aren't _that_ far behind (especially
          | financially) I don't think they have to catch up much, on
          | process at least, to be right back in the fight - I will
          | believe that the pecking order has truly changed when AMD's
          | documentation and software is as good as Intel's.
        
       | Pr0GrasTiNati0n wrote:
       | And only 20 of those cores have back doors.....lulz
        
       | Sephr wrote:
       | As disappointing as the perf is for server workloads, what I'm
       | really interested in is SLI gaming performance. I can imagine
       | that this would be a boon for high end gaming with multiple x16
       | PCIe 4.0 slots and 8 DDR4 channels.
       | 
       | SLI really shines on HEDT platforms, and this is probably the
       | last non-multi-chip quasi-HEDT CPU for a while with this kind of
       | IO.
       | 
       | (Yes, I know SLI is 'dead' with the latest generation of GPUs)
        
         | zamadatix wrote:
         | These would be absolute trash for SLI performance vs top end
         | standard consumer desktop parts. The best SKU has a peak boost
         | clock of 3.7 GHz, the core to core latencies are about twice as
         | high as the desktop parts, and the memory+PCIe bandwidth mean
          | little to nothing for gaming performance (remember SLI
          | bandwidth goes over a dedicated bridge as well), which is
          | highly sensitive to latencies instead.
        
       | marmaduke wrote:
       | Nice to see that AVX512 hasn't died with Xeon Phi. I see it
       | coming out in a number of high end but lightweight notebooks too
       | (Surface Pro with i7 10XXG7, MacBookPro 13" idem). This is a nice
       | way to avoid needing GPU for heavily vectorizable compute tasks,
       | assuming you don't need the CUDA ecosystem.
        
         | api wrote:
         | The 2020 Intel MacBook Air and 13" Pro have 10nm Ice Lake with
         | AVX512. The Ice Lake MacBook Air performs pretty well and very
         | close to the Ice Lake Pro, though of course the M1 destroys it.
        
           | mhh__ wrote:
           | > though of course the M1 destroys it.
           | 
           | SIMD throughput?
        
             | api wrote:
             | Actually I don't know... I suspect Intel still wins in wide
             | SIMD. The M1 totally destroys Intel in general purpose code
             | performance, especially when you consider power
             | consumption.
        
         | bitcharmer wrote:
         | AVX-512 is an abomination in my field and we avoid it like the
         | plague. It looks like we're not the only ones. Linus has a lot
         | to say about it as well.
         | 
         | https://www.phoronix.com/scan.php?page=news_item&px=Linus-To...
        
         | 37ef_ced3 wrote:
         | For example, AVX-512 neural net inference: https://NN-512.com
         | 
         | Only interesting if you care about price (dollars spent per
         | inference)
         | 
         | For raw speed (no matter the price) the GPU wins
        
         | dragontamer wrote:
         | GPGPU will never really be able to take over CPU-based SIMD.
         | 
         | GPUs have far more bandwidth, but CPUs beat them in latency.
         | Being able to AVX512 your L1 cached data for a memcpy will
         | always be superior to passing data to the GPU.
         | 
          | With Ice Lake's 1.25MB L2 cache, pretty much any task whose
          | working set is smaller than that is better done in AVX512
          | than shipped to a GPU. Sorting 250,000 Float32 elements?
          | Better to SIMD Bitonic sort / SIMD Mergepath
          | (https://web.cs.ucdavis.edu/~amenta/f15/GPUmp.pdf) on your
          | AVX512 units than spend a 5us PCIe 4.0 traversal to the GPU.
         | 
         | It is better to keep the data hot in your L2 / L3 cache, rather
         | than pipe it to a remote computer (even if the 16x PCIe 4.0
         | pipe is 32GB/s and the HBM2 RAM is high bandwidth once it gets
         | there).
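          | 
          | A hedged back-of-envelope for that 250,000-float example
          | (the latency and bandwidth figures are just the round
          | numbers above, not measurements):
          | 
          |   n_bytes = 250_000 * 4       # 1 MB of float32
          |   pcie_bw = 32e9              # ~32 GB/s, x16 PCIe 4.0
          |   pcie_lat = 5e-6             # ~5 us traversal, per above
          |   # round trip: ship the data over, ship results back
          |   t = 2 * (pcie_lat + n_bytes / pcie_bw)
          |   print(t * 1e6)   # ~72 us before the GPU sorts anything;
          |                    # on the CPU the data never leaves cache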
         | 
         | --------
         | 
         | But similarly: CPU SIMD can never compete against GPGPUs at
         | what they do. GPUs have access to 8GBs @500GB/s VRAM on the
         | low-end and 40GBs @1000GB/s on the high end (NVidia's A100).
         | EDIT: Some responses have reminded me about the 80GB @ 2000GB/s
         | models NVidia recently released.
         | 
         | CPUs barely scratch 200GB/s on the high end, since DDR4 is just
         | slower than GPU-RAM. For any problem where data-bandwidth and
         | parallelism is the bottleneck, that fits inside of GPU-VRAM
         | (such as many-many sequences of large scale matrix
         | multiplications), it will pretty much always be better to
         | compute that sort of thing on a GPU.
        
           | marmaduke wrote:
            | In my experience, the most important aspect missing in most
            | CPU vs GPU discussions is that CPUs have a massive cache
            | compared to GPUs, and that cache has pretty good bandwidth
            | (~30 GB/s per core?), even if main memory doesn't. So even
            | if your task's hot data doesn't fit in L2 but does fit in
            | L3 per core, AVX-whatever per-core processing is a good bet
            | regardless of what a GPU can do.
           | 
           | Another aspect that seems like a hidden assumption in CPU-GPU
           | discussions is that you have the time-energy-expertise budget
           | to (re)build your application to fit GPUs.
        
             | dragontamer wrote:
             | On the memory perspective, I basically see problems in
             | roughly the following grouping of categories:
             | 
             | 40TBs+ -- Storage-only solutions. "External Tape Merge sort
             | algorithm", "Sequential Table Scan", etc. etc. (SSDs or
             | even Hard drives if you go big enough)
             | 
             | 4TB to 40TBs -- Multi-socket DDR4 RAM is king (8-way Ice
             | Lake Xeon Scalable Platinum will probably reach 40TBs).
             | Single-node distributed memory with NUMA / UPI to scale.
             | 
             | 1TB to 4TB -- Single Socket DDR4 RAM (EPYC, even if at 4x
             | NUMA. Or Single-node Ice Lake).
             | 
             | 80GB to 1TB -- DGX / NVlink distributed memory A100 ganging
             | up HBM2 together. GPU-distributed RAM is king.
             | 
             | 256MBs to 80GBs -- HBM2 / GDDR6 Graphics RAM is king (80GB
             | A100 2TB/s).
             | 
             | 1.5MBs to 256MBs -- L3 cache is king (8x32MBs EPYC L3
             | cache, or POWER9 110MB+ L3 cache unified)
             | 
             | 128kB to 1.5MBs -- L2 cache is king (1.25MB Ice Lake Xeons
             | L2, this article)
             | 
             | 1kB to 128kB -- L1 cache is king. (128kB L1 cache on Apple
             | M1). Note: "GPU __Shared__" is a close analog to L1 and
             | competes against it, but is shared between 32 to 256 GPU
             | threads, so its not an apples-to-apples comparison.
             | 
              | 1kB and below -- The realm of register-space solutions.
              | (See 64-bit chess engine bitboards and the like). Almost
              | fully CPU-constrained / GPU-constrained programming. 256x
              | 32-bit GPU registers per GPU-thread / SIMD thread. CPUs
              | have fewer nominal registers, but their "out of order" /
              | "reorder" buffers practically count as register storage.
              | CPUs just use their "real registers" as a mechanism to
              | automatically discover parallelism in otherwise single-
              | threaded code.
             | 
             | ------------
             | 
             | As you can see: GPUs win in some categories, but CPUs win
             | in others. And these numbers change every few months as a
             | new CPU and/or GPU comes out. And at the lowest levels:
             | CPUs and GPUs cannot be compared due to fundamental
             | differences in architecture.
             | 
             | For example: GPU __shared__ memory has gather/scatter
             | capabilities (the NVidia PTX instructions / AMD GCN
             | instructions permute vs bpermute), while CPUs traditionally
             | only accelerate gather capabilities (pshufb), and leave
             | vgather/vscatter instructions to the L1 cache instead. GPUs
              | have 32x ports to __shared__, so every one of the 32
              | threads in a wave-front can read/write every single
              | clock-tick (as long as all 32 hit different banks, or you
              | have a special one-to-all broadcast). CPUs only have 2 or
              | 4 ports, so vscatter and vgather operate slowly, as if a
              | single thread were reading/writing each of the memory
              | locations.
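              | 
              | To make the __shared__ point concrete, here's a toy CUDA
              | kernel of my own (kernel only, and it assumes a 32-thread
              | block; it isn't from any particular codebase): each lane
              | writes one word into __shared__, then reads its neighbor's
              | word back. As long as every lane hits a different bank,
              | all 32 reads are serviced per clock; the closest CPU
              | analog is a vgather through L1, issued over only a couple
              | of load ports.
              | 
              |     __global__ void rotate_within_warp(const float* in,
              |                                        float* out) {
              |         __shared__ float buf[32];   // one bank per lane
              |         int lane = threadIdx.x & 31;
              |         // 32 writes to 32 different banks: one tick
              |         buf[lane] = in[blockIdx.x * 32 + lane];
              |         __syncwarp();
              |         // each lane reads its neighbor's slot; still
              |         // conflict-free, since every lane touches a
              |         // different bank
              |         out[blockIdx.x * 32 + lane] = buf[(lane + 1) & 31];
              |     }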
             | 
             | But CPU L1 cache has store-forwarding, MESI + cache
             | coherence, and other acceleration features that GPUs don't
             | have.
             | 
             | GPUs are therefore more efficient at sharing data within
             | workgroups of ~256 threads, but CPUs are more efficient at
             | sharing data between cores, or even among out-of-die NUMA
             | solutions, thanks to robust MESI messaging.
        
           | ajross wrote:
           | FWIW: your DRAM numbers are quoting clock speeds and not
           | bandwidth. They aren't linear at all. In fact with enough
           | cores you can easily saturate memory that wide, and CPUs are
           | getting wider just as fast as GPUs are. The giant Epyc AMD
           | pushed out last fall has 8 (!) 64 bit DRAM channels, where
           | IIRC the biggest NVIDIA part is still at 6.
        
             | mrb wrote:
              | dragontamer is still correct. He quotes correct bandwidth
              | numbers. EPYC's 8 channels of DDR4-3200 get it to 204.8
              | GB/s (and, yes, that's _bandwidth_).
             | 
             | Whereas Nvidia's A100 has over 2000 GB/s of memory
             | bandwidth. That's 10-fold better.
        
             | dragontamer wrote:
             | > 8 (!) 64 bit DRAM channels
             | 
              | Yeah. And at 3200 MT/s, that comes out to ~205GB/s.
              | (3200 MT/s x 8 bytes (aka 64-bit) == 25.6GB/s per channel.
              | x8 channels == 204.8GB/s).
             | 
             | > where IIRC the biggest NVIDIA part is still at 6.
             | 
             | That's 6x *1024-bit* HBM2 channels. Total bandwidth is
             | 2000GBps, or over 10x the speed of the "8x channel EPYC".
             | Yeah, HBM2 is fat, extremely fat.
             | 
             | ----------
             | 
             | *ONE* HBM2 channel offers over 300GBps bandwidth. And the
             | A100 has *SIX* of them. Literally ONE HBM2 channel beats
             | the speed of all 8x DDR4 EPYC memory channels working in
             | parallel.
        
               | ajross wrote:
               | You're still quoting clock speeds. That's not how this
               | works. Go check a timing diagram for a DRAM cycle in your
               | part of choice and do the math.
        
               | dragontamer wrote:
               | Do you know what 3200MHz / PC4-25600 DDR4 means?
               | 
                | 25600 is the channel rate in (EDIT) MB/sec of the stick
                | of RAM. That's 25.6GB/s for a "3200 MHz" (3200 MT/s) DDR4
                | stick. x8 (for 8 channels working in parallel) is
                | 204.8GB/s.
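                | 
                | If it helps, the arithmetic is just channels x bytes per
                | transfer x transfer rate. A back-of-the-envelope snippet
                | (nothing vendor-specific; the ~2.4 Gbit/s-per-pin HBM2
                | rate is my assumption, not a quoted spec):
                | 
                |     #include <cstdio>
                | 
                |     // peak = channels * (bus width in bytes) * MT/s
                |     double peak_gb_s(int ch, int bus_bits, double mts) {
                |         return ch * (bus_bits / 8.0) * mts / 1000.0;
                |     }
                | 
                |     int main() {
                |         // 8ch DDR4-3200: 8 * 8 B * 3200 MT/s = 204.8
                |         printf("8ch DDR4-3200: %.1f GB/s\n",
                |                peak_gb_s(8, 64, 3200));
                |         // one 1024-bit HBM2 stack at ~2.4 Gbit/s/pin
                |         printf("1x HBM2 stack: %.1f GB/s\n",
                |                peak_gb_s(1, 1024, 2400));
                |     }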
               | 
               | -----------
               | 
               | This has been measured in practice by Netflix: https://20
               | 19.eurobsdcon.org/slides/NUMA%20Optimizations%20in...
               | 
               | As you can see, Netflix's FreeBSD optimizations have
               | allowed EPYC to reach 194GB/s measured performance (or
               | just under the 200GB/s theoretical). And only with VERY
               | careful NUMA-tuning and extreme optimizations were they
               | able to get there.
        
               | gbl08ma wrote:
               | All of that is bandwidth and clock speed, not latency
        
               | dragontamer wrote:
               | Look, if CPUs were better at memory latency, the BVH-
               | traversal of raytracing would still be done on CPUs.
               | 
               | BVH-tree traversals are done on the GPU now for a reason.
               | GPUs are better at latency hiding and taking advantage of
               | larger sets of bandwidth than CPUs. Yes, even on things
               | like pointer-chasing through a BVH-tree for AABB bounds
               | checking.
               | 
               | GPUs have pushed latency down and latency-hiding up to
               | unimaginable figures. In terms of absolute latency,
               | you're right, GPUs are still higher latency than CPUs.
                | But in terms of "practical" effects (once you account
                | for latency-hiding tricks on the GPU, such as 8x-way
                | occupancy (similar to hyperthreading), plus dedicated
                | data structures and programming tricks that exploit the
                | millions of rays processed in parallel per frame), it
                | turns out that you can convert many latency-bound
                | problems into bandwidth-constrained problems.
               | 
               | -----------
               | 
                | That's the funny thing about computer science. It turns
                | out that with enough RAM and enough parallelism, you can
                | convert nearly any latency-bound problem with sufficient
                | independent work into a bandwidth-bound problem. You just
                | need enough buffer space to hold results in the meantime,
                | while you process other work in parallel.
               | 
                | Raytracing is an excellent example of this form of
                | latency hiding. Bouncing a ray off of your global data-
                | structure of objects involves traversing pointers down
                | the BVH tree: a ton of linked-list-like current_node =
                | current_node->next operations (depending on which
                | current_node->child the ray hit).
                | 
                | From the perspective of any single ray, it looks
                | latency-bound. But from the perspective of processing
                | 2.073 million rays across a 1920 x 1080 video game scene
                | with realtime raytracing enabled, it's bandwidth-bound.
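                | 
                | A toy CUDA kernel of that idea (mine, not from any real
                | renderer): each thread pointer-chases its own chain of
                | nodes. Any single thread's walk is pure serial latency,
                | but with millions of threads resident, the hardware
                | keeps enough loads in flight that the memory bus, not
                | the latency, becomes the limit.
                | 
                |     struct Node { int next; };   // stand-in BVH node
                | 
                |     __global__ void chase(const Node* nodes,
                |                           const int* start,
                |                           int* last_visited, int n_rays) {
                |         int ray = blockIdx.x * blockDim.x + threadIdx.x;
                |         if (ray >= n_rays) return;
                |         int n = start[ray];
                |         int last = n;
                |         // serial and latency-bound for any one ray...
                |         while (n >= 0) {
                |             last = n;
                |             n = nodes[n].next;
                |         }
                |         // ...but millions of resident rays keep the
                |         // memory bus saturated
                |         last_visited[ray] = last;
                |     }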
        
               | wmf wrote:
               | That presentation shows 194 gigabits/s which is only ~24
               | gigabytes/s at the NIC; that requires ~96 gigabytes/s of
               | memory bandwidth. Usable memory bandwidth on Milan is
               | only <120 gigabytes/s which is about 60% of the
               | theoretical max. DRAM never gets more than ~80% of
               | theoretical max bandwidth because of command overhead
               | (which is what I think ajross keeps alluding to).
               | https://www.anandtech.com/show/16594/intel-3rd-gen-xeon-
               | scal...
        
               | dragontamer wrote:
               | I appreciate the correction. It seems like I made the
               | mistake of Gbit vs GByte confusion (little-b vs big-B).
               | 
               | > (which is what I think ajross keeps alluding to)
               | 
               | It seems like ajross is accusing me of underestimating
               | CPU-bandwidth. At least, that's my interpretation of the
               | discussion so far. As you've pointed out however, I'm
               | overestimating it.
               | 
                | EDIT: But I'm overestimating it on both sides. The A100's
                | 2000 GB/s is the "channel bandwidth" as well, as the CAS
                | and RAS commands still need to go through the channel and
                | get interpreted.
        
           | volta83 wrote:
           | > Being able to AVX512 your L1 cached data for a memcpy will
           | always be superior to passing data to the GPU.
           | 
            | The last two apps I worked on have been GPU-only. The CPU
            | process starts running and launches GPU work, and that's it:
            | the GPU does all the work until the process exits.
           | 
           | There is no need to "pass data to the GPU" because data is
           | never on CPU memory, so there is nothing to pass from there.
           | All network and file I/O goes directly to the GPU.
           | 
           | Once all your software runs on the GPU, passing data to the
           | CPU for some small task doesn't make much sense either.
        
             | dragontamer wrote:
             | So we know that GPUs are really good at raytracing and
             | matrix multiplication, two things that are needed for
             | graphics programming.
             | 
             | However, the famous "Moana" scene for Disney-level
             | productions is a 93GB (!!!!) scene statically, with another
             | 131GBs (!!!) of animation data (trees blowing in the winds,
             | waves moving on the shore, etc. etc.).
             | 
              | That's simply never going to fit on an 8GB, 40GB, or even
             | 80GB high-end GPU. The only way to work with that kind of
             | data is to think about how to split it up, and have the CPU
             | store lots of the data, while the GPU processes pieces of
             | the data in parallel.
             | 
             | https://www.render-blog.com/2020/10/03/gpu-motunui/
             | 
              | Which has been done before, mind you. But it should be
              | noted that GPU-scale compute runs into practical
              | RAM-capacity constraints today, even on movie-scale
              | problems from 5 years ago (Moana was released in 2016, and
              | had to be rendered on hardware that predates 2016).
             | 
             | Moana scene is here if you're curious:
             | https://www.disneyanimation.com/resources/moana-island-
             | scene...
             | 
             | ----------
             | 
             | But yes, if your data fits within the 8GBs GPU (or you can
             | afford a 40GB or 80GB VRAM GPU and your data fits in that),
             | doing everything on the GPU is absolutely an option.
        
               | oivey wrote:
               | We know that GPUs are really good at far more than ray
               | tracing and matrix multiplication. Oversimplifying a bit,
               | they're great at basically any massively parallel
               | operation that has minimal branching and can fit in
               | memory. Using a GPU to just add two images together
               | probably isn't worth it, but many real world workflows
               | allow you to operate solely on the GPU.
               | 
               | If you're Disney, you can afford boxes with 10+ A100s
               | with NVLink sharing the memory in a single 400+ GB pool.
               | Unknown if that ends up being more economical than the
               | equivalent CPU version, but it's important to understand
               | in order to evaluate the future of GPUs.
        
               | volta83 wrote:
                | >That's simply never going to fit on an 8GB, 40GB, or
                | even
               | 80GB high-end GPU. The only way to work with that kind of
               | data is to think about how to split it up, and have the
               | CPU store lots of the data, while the GPU processes
               | pieces of the data in parallel.
               | 
               | There is always a problem size that does not fit into
               | memory.
               | 
               | Whether that memory is the GPU memory, or the CPU memory,
               | doesn't really matter.
               | 
               | We have been solving this problem for 60 years already.
               | It isn't rocket science.
               | 
               | ---
               | 
               | The CPU doesn't have to do anything.
               | 
               | The GPU can map a file stored on hard disk to VRAM
               | memory, do random access into it, process chunks of it,
               | write the results into network sockets and send them over
               | the network, etc.
               | 
                | The only thing the CPU has to do is launch a kernel:
                | 
                |     int main(args...) {
                |         main_kernel<<<...>>>(args...);
                |         synchronize();
                |         return 0;
                |     }
               | 
                | and this is a relatively accurate depiction of what the
                | "main function" of the last two apps I've worked on looks
                | like: the GPU does everything.
               | 
               | ---
               | 
               | > However, the famous "Moana" scene for Disney-level
               | productions is a 93GB (!!!!) scene statically, with
               | another 131GBs [...] That's simply never going to fit on
               | a high-end GPU.
               | 
               | LOL.
               | 
                | V100 with 32GB and 8x per rack gave you 256 GB of VRAM
                | addressable from any GPU in the rack.
                | 
                | A100 with 80GB and 16x per rack gives you 1.3 TB of VRAM
                | addressable from any GPU in the rack.
               | 
               | You can fit Moana in GPU VRAM in a now old DGX-2.
               | 
               | If you are willing to bet cash on Moana never fitting on
               | a single GPU, I'd take you on that bet. Sounds like free
               | money to me.
        
               | dragontamer wrote:
               | I'll post the link again: https://www.render-
               | blog.com/2020/10/03/gpu-motunui/
               | 
                | This person rendered the Moana scene on just 8GBs of GPU
                | VRAM. They do this by rendering 6.7GB chunks at a time on
                | the GPU, with the CPU keeping the RAM-heavy "big picture"
                | in mind. (EDITED paragraph. First wording of this
                | paragraph was poor).
               | 
               | ------
               | 
                | It's not that these problems "cannot be solved", it's
                | that these problems "become grossly more complicated"
                | under RAM / VRAM constraints. They're still solvable, but
                | now you have to resort to unusual techniques.
               | 
               | ------
               | 
               | With regards to a Ray-tracer, tracing the ray-of-light
               | that's bouncing around could theoretically touch ANY of
               | the 93GBs of static object data (which could have been
               | shifted by any of the 131 GBs of animation data). That is
                | to say: a ray that bounces off of any leaf on any tree
               | could bounce in any direction, hitting potentially any
               | other geometry in the scene.
               | 
               | That pretty much forces you to keep the geometry in high-
               | speed RAM, and not do an I/O cycle between each ray-
               | bounce.
               | 
               | As a rough reminder of the target performance: Raytracers
                | aim at ~30 million to 30 billion ray-bounces per second,
                | depending on movie-grade vs video-game-optimized. Either
               | way, that level of performance is really only ever going
               | to be solved by keeping all of the geometry data in RAM.
               | 
               | > A100 with 80GB and 16x per rack give you 1.3 TB of VRAM
               | addressable from any GPU in the rack.
               | 
               | That doesn't mean it makes sense to traverse a BVH-tree
               | across a relatively high-latency NVLink connection off-
               | chip. I know GPUs have decent latency hiding but...
               | that's a lot of latency to hide.
               | 
               | Again: your CPU-renderers can hit 10s of millions of rays
               | per second. I'm not sure if you're gonna get something
               | pragmatic by just dropping the entire geometry into
               | distributed NVSwitch'd memory and hoping for the best.
               | 
               | Honestly, that's where the 8GB CPU+GPU team becomes
               | interesting to me. A methodology for clearly separating
               | the geometry and splitting up which local compute-devices
               | are responsible for handling which rays is going to scale
               | better than a naive dump reliant on remote-connections
               | pretending to be RAM.
               | 
                | Video games hit billions of rays/second. The promise of
               | GPU-compute is on that order, and I just doubt that
               | remote RAM accesses over NVLink will get you there.
               | 
               | > If you are willing to bet cash on Moana never fitting
               | on a single GPU, I'd take you on that bet. Sounds like
               | free money to me.
               | 
                | The issue is not Moana (or other movies from 2016); the
                | issue is the movies that will be made in 2022 and beyond,
                | especially if they're near-photorealistic like Marvel
                | movies or Star Wars.
               | 
               | ----------
               | 
               | The other problem is: what's cheaper? A DGX-system could
               | very well be faster than one CPU system. But would it be
               | faster than a cluster of Ice-Lake Xeons with AVX512 each
               | with the precise amount of RAM needed for the problem?
               | (Ex: 512GBs in some hypothetical future movie?)
               | 
                | A CPU+GPU team would probably be better: CPUs have
                | expandable RAM, and that's their biggest advantage. GPUs
                | have fixed RAM. Slicing the problem up so that pieces of
                | raytracing fit on GPUs, while the other, "bulkier" bits
                | sit in CPU DDR4 (or DDR5), would probably be the most
                | cost-efficient way of solving the raytracing problem.
               | 
                | The GPU-Moana experiment showed that "collecting rays
                | that bounce out of the resident chunk" is an efficient
                | methodology. Slice the scene into 8GB chunks, process the
                | rays that are within each chunk, and then collate the
                | exiting rays to find which chunk they go to next, roughly
                | like the sketch below.
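                | 
                | Roughly, as a host-side sketch of that loop (my own
                | naming, heavily simplified, and the helpers are stubs;
                | the real GPU-Motunui code is far more involved):
                | 
                |     #include <vector>
                | 
                |     struct Ray        { float org[3], dir[3]; int px; };
                |     struct SceneChunk { int id; };  // ~6-7GB of geometry
                | 
                |     // stubs standing in for the real upload/trace steps
                |     void upload_chunk_to_gpu(const SceneChunk&) {}
                |     std::vector<Ray> trace_rays_in_chunk(
                |         const SceneChunk&, const std::vector<Ray>&) {
                |         return {};  // real version returns exiting rays
                |     }
                |     int chunk_containing(const Ray&) { return 0; }
                | 
                |     void render(const std::vector<SceneChunk>& chunks,
                |                 std::vector<std::vector<Ray>> queues) {
                |         bool work_left = true;
                |         while (work_left) {
                |             work_left = false;
                |             for (size_t c = 0; c < chunks.size(); ++c) {
                |                 if (queues[c].empty()) continue;
                |                 work_left = true;
                |                 // this chunk fits in 8GB of VRAM
                |                 upload_chunk_to_gpu(chunks[c]);
                |                 auto exiting = trace_rays_in_chunk(
                |                     chunks[c], queues[c]);
                |                 queues[c].clear();
                |                 // rays that left this chunk get re-queued
                |                 // for whatever chunk they enter next
                |                 for (const Ray& r : exiting)
                |                     queues[chunk_containing(r)]
                |                         .push_back(r);
                |             }
                |         }
                |     }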
        
             | aviraldg wrote:
             | > There is no need to "pass data to the GPU" because data
             | is never on CPU memory, so there is nothing to pass from
             | there. All network and file I/O goes directly to the GPU.
             | 
             | This is very interesting - do you have a link that explains
             | how it works / is implemented?
        
               | dragontamer wrote:
               | PS5, XBox Series X, and NVidia have a "GPU Direct I/O"
               | feature.
               | 
               | https://www.nvidia.com/en-us/geforce/news/rtx-io-gpu-
               | acceler...
               | 
               | https://www.amd.com/en/products/professional-
               | graphics/radeon...
               | 
                | The GPU itself can send PCIe 4.0 messages out. So why not
                | have the GPU make I/O requests on its own behalf? It's a
                | bit obscure, but this feature has been around for a
                | number of years now. The idea is to remove the CPU and
                | DDR4 from the loop entirely, because those just
                | bottleneck / slow down the GPU.
               | 
               | --------
               | 
               | From an absolute performance perspective, it seems good.
               | But CPUs are really good and standardized at accessing
               | I/O in very efficient ways. I'm personally of the opinion
               | that blocking and/or event driven I/O from the CPU (with
               | the full benefit of threads / OS-level concepts) would be
               | easier to think about than high-performance GPU-code.
               | 
                | But still, it's a neat concept, and it seems like there's
                | a big demand for it (see PS5 / XBox Series X).
        
               | etaioinshrdlu wrote:
               | The CPU is still acting as the PCIe controller though
               | (right?), which kind of makes the CPU act like a network
               | switch. PCIe is a point-to-point protocol kind of like
               | ethernet too. Old-school PCI was a shared bus so devices
               | might be able to directly talk to each other, but I don't
               | think that was ever actually used.
        
               | d110af5ccf wrote:
               | My understanding matches yours, but it's worth noting
               | that (IIUC) memory and PCIe are (last time I checked?) a
               | separate I/O subsystem that just happens to reside within
               | the same package as the CPU on modern chips. So P2PDMA
               | avoids burning CPU cycles and RAM bandwidth shuffling
               | data around that you never wanted to use on the CPU
               | anyway. (Also see: https://lwn.net/Articles/767281/)
        
               | dragontamer wrote:
               | Take a look at the Radeon more closely.
               | 
                | I think the Radeon + Premiere Pro documentation makes it
               | clear how it works:
               | https://www.amd.com/system/files/documents/radeon-pro-
               | ssg-pr...
               | 
                | As you can see, the GPU is attached to the x16 slot, and
                | the 4x NVMe SSDs are attached to the GPU. When the CPU
                | wants to store data on the SSD, it communicates first
                | with the GPU, which then passes the data through to the
                | four SSDs.
               | 
               | That's the simpler example.
               | 
               | --------------
               | 
               | In NVidia's case, they're building on top of GPUDirect
               | Storage (https://developer.nvidia.com/blog/gpudirect-
               | storage/), which seems to be based on enterprise
               | technology where PCIe switches were used.
               | 
               | NVidia's GPUs would command the PCIe switch to grab data,
               | without having the PCIe switch send data to the CPU
               | (which would most likely be dropped in DDR4, or maybe L3
               | in an optimized situation).
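                | 
                | If you want to see what this looks like from code,
                | NVidia exposes it through the cuFile / GPUDirect Storage
                | API. A rough sketch from memory (error handling omitted,
                | and the exact flags are worth double-checking against
                | the docs):
                | 
                |     #include <cufile.h>       // GPUDirect Storage API
                |     #include <cuda_runtime.h>
                |     #include <fcntl.h>
                |     #include <unistd.h>
                | 
                |     int main() {
                |         cuFileDriverOpen();
                |         int fd = open("scene.bin", O_RDONLY | O_DIRECT);
                | 
                |         CUfileDescr_t descr = {};
                |         descr.handle.fd = fd;
                |         descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
                |         CUfileHandle_t fh;
                |         cuFileHandleRegister(&fh, &descr);
                | 
                |         const size_t bytes = 64 << 20;   // 64MB chunk
                |         void* devPtr = nullptr;
                |         cudaMalloc(&devPtr, bytes);
                |         cuFileBufRegister(devPtr, bytes, 0);
                | 
                |         // DMA straight from NVMe into VRAM; no bounce
                |         // through CPU DDR4 on the way
                |         cuFileRead(fh, devPtr, bytes, 0, 0);
                | 
                |         cuFileBufDeregister(devPtr);
                |         cuFileHandleDeregister(fh);
                |         cudaFree(devPtr);
                |         close(fd);
                |         cuFileDriverClose();
                |     }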
        
       | ASpaceCowboi wrote:
       | will this work on the latest mac pro? Probably not right?
        
         | wmf wrote:
         | No, it's a different socket.
        
           | robbyt wrote:
           | Classic Intel
        
             | wmf wrote:
             | You can't increase memory and PCIe channels while keeping
             | the same socket. This isn't a cash grab; it's actual
             | progress.
        
       | paulpan wrote:
       | TLDR from Anandtech is that while this is a good improvement over
       | previous gen, it still falls behind AMD (Epyc) and ARM (Altra)
       | counterparts. What's somewhat alarming is that on a per-core
       | comparison (28-core 205W designs), the performance increase can
       | be a wash. Doesn't bode well for Intel as both their competitors
       | are due for refreshes that will re-widen the gap.
       | 
       | Key question will be how quickly Intel will shift to the next
       | architecture, Sapphire Rapids. Will this release be like the
       | consumer/desktop Rocket Lake? E.g. just a placeholder to
       | essentially volume test the 10nm fabrication for datacenter.
       | Probably at least a year out at this point since Ice Lake SP was
       | supposed to be originally released in 2H2020.
        
         | gsnedders wrote:
         | > Key question will be how quickly Intel will shift to the next
         | architecture, Sapphire Rapids. Will this release be like the
         | consumer/desktop Rocket Lake? E.g. just a placeholder to
         | essentially volume test the 10nm fabrication for datacenter.
         | Probably at least a year out at this point since Ice Lake SP
         | was supposed to be originally released in 2H2020.
         | 
         | Alder Lake is meant to be a consumer part contemporary with
         | Sapphire Rapids, which is server only. They're likely based on
          | the same (performance) core, with Alder Lake additionally
          | having low-power cores.
         | 
         | Last I heard the expectation was still that these new parts
         | would enter the market at the end of this year.
        
       | CSSer wrote:
        | Lately Intel seems to be getting a lot of flak here. As a
       | layperson in the space who's pretty out of the loop (I built a
       | home PC about a decade ago), could someone explain to me why that
       | is? Is Intel really falling behind or dressing up metrics to
       | mislead or something like that? I also partly ask because I feel
       | that I only really superficially understand why Apple ditched/is
       | ditching Intel, although I understand if that is a bit off-topic
       | for the current article.
        
         | s_dev wrote:
         | >Is Intel really falling behind
         | 
         | Intel is already behind AMD -- they have no product segment
          | where they are absolutely superior. That means AMD is setting
          | the market pace.
          | 
          | On top of this, Apple is switching to ARM-designed CPUs. This
          | also looks to be a vote of no confidence in Intel.
          | 
          | The consensus seems to be that Intel, who have their own fabs,
          | never really nailed anything under 14nm and are now being
          | outcompeted.
        
           | meepmorp wrote:
            | Apple designs its own chips; it doesn't use ARM's designs.
            | They do use the ARM ISA, tho.
        
           | totalZero wrote:
           | > Intel is already behind AMD -- they have no product segment
           | where they are absolutely superior.
           | 
           | There are some who would argue this claim, but I think it's
           | at least a defensible one.
           | 
           | Still, availability is an important factor that isn't
           | captured by benchmarking. AMD has had CPU inventory trouble
           | in the low-end laptop segment and high-end desktop segment
           | alike.
           | 
           | > The consensus seems to be that Intel who have their own
           | fabs -- never really nailed anything under 14nm and are now
           | being outcompeted.
           | 
           | Intel has done well with 10nm laptop CPUs. They were just
           | very late to the party. Desktop and server timelines have
           | been quite a bit worse. I agree Intel did not nail 10nm, but
           | they're definitely hanging in there. It's one process node at
           | the cusp of transition to EUV, so some of the defeatism
           | around Intel may be overzealous if we keep in mind that 7nm
           | process development has been somewhat parallel to 10nm
           | because of the difference in the lithographic technology.
        
         | yoz-y wrote:
         | Intel was unable to improve their fabrication process year
         | after year, while promising to do so repeatedly. Now, they have
         | been practically lapped twice. Apple has a somewhat specific
          | use case, but their CPUs have significantly better performance
         | per watt.
        
         | matmatmatmat wrote:
         | Some of the other comments above have touched on this, but I
         | think there is also a bit of latent anti-Intel sentiment in
         | many people's minds. Intel extracted a non-trivial price
         | premium out of consumers for many, many years (both for chips
         | and by forcing people to upgrade motherboards by changing CPU
         | sockets) while AMD could only catch up to them for brief
         | periods of time. People paid that price premium for one reason
         | or another, but it doesn't mean they were thrilled about it.
         | 
         | Many people, I'd say especially enthusiasts, were quite happy
         | when AMD was able to compete on a performance/$ basis and then
         | outright beat Intel.
         | 
         | Of course, now the tables have turned and AMD is able to
         | extract that price premium while Intel cut prices. Who knows
         | how long this will last, but Intel is still the 800 lb gorilla
         | in terms of capacity, engineering talent, and revenue. I don't
         | think we've heard the last from them.
        
         | blackoil wrote:
          | A perfect storm. Intel had trouble with its 10nm/7nm
          | manufacturing processes, where TSMC succeeded with equivalent
          | nodes. AMD had a resurgence with the Zen architecture, and
          | ARM/Apple/TSMC/Samsung put 100s of billions into catching up
          | with x86 performance.
          | 
          | Intel is still the biggest player in the game, because even
          | though they are stuck at 14nm, AMD isn't able to manufacture
          | enough to take bigger chunks of the market. Apple won't sell
          | its chips into the PC/datacenter space, and the rest are still
          | niche.
        
           | ac29 wrote:
           | > even though they are stuck at 14nm
           | 
            | I think this isn't quite fair: their laptop 10nm chips have
            | been shipping in volume since last year, and their server
            | chips were released today, with 200k+ units already shipped
            | (according to Anandtech). The only line left on 14nm is
            | socketed desktop processors, which is a relatively small
            | market compared to laptops and servers.
        
             | colinmhayes wrote:
              | Hacker News users generally aren't very interested in
              | laptop processors. Sure, business-wise they're incredibly
              | important, but as far as getting flak on Hacker News goes,
              | laptop chips won't stop it. People here have been waiting
              | for Intel 10nm on server and especially desktop for 6 years
              | now.
        
               | totalZero wrote:
               | Unless you have scraped past posts to perform some kind
               | of sentiment analysis, this is pure speculation intended
               | to move the goalposts on GP.
        
         | jimbob21 wrote:
          | Yes, quite simply they have fallen behind while also promising
          | things they have failed to deliver. As an example, their most
          | recent flagship release is the 11900K, which has 2 fewer cores
          | (now 8) than its predecessor (the 10-core 10900K), and almost
          | no improvement to speak of otherwise (in some games it's ~1%
          | faster). On the other hand, AMD's flagship, which to be fair is
          | $150 more expensive, has 16 cores, very similar clock speeds,
          | and is much more energy efficient (Intel and AMD calculate TDP
          | differently). Overall, AMD is the better choice by a large
          | margin, and Intel is getting flak because it rested on its
          | laurels for the last decade(?) and hasn't done anything to
          | improve itself.
         | 
         | To put it in numbers alone, look at this benchmark. Flagship vs
         | Flagship:
         | https://www.cpubenchmark.net/compare/Intel-i9-11900K-vs-AMD-...
        
           | formerly_proven wrote:
           | Naturally the 11900K performs quite a bit worse than the
           | 10900K in anything which uses all cores, but the remarkable
           | thing about the 11900K is that it even performs worse in a
           | bunch of game benchmarks, so as a product it genuinely
           | doesn't make any sense.
        
         | chx wrote:
          | Absolutely. Intel has been stuck on the 14nm node for a very,
          | very long time. 10nm CPUs were supposed to ship in 2015; they
          | only really shipped in late 2019 and 2020. Meanwhile AMD caught
          | up and Intel has been doing the silliest shenanigans to appear
          | competitive, like in 2018 when they demonstrated a 28-core 5GHz
          | CPU and kinda forgot to mention the behind-the-scenes one-
          | horsepower (~745W) industrial chiller keeping that beast
          | running.
          | 
          | Also, the first 10nm "Ice Lake" mobile CPUs were not really an
          | improvement over the by-then many-times-refined 14nm "Comet
          | Lake" chips. It's been a faecal pageant.
        
         | mhh__ wrote:
         | Intel's processes (i.e. turning files on a computer into chips)
         | have been a complete disaster in recent years, to the point of
         | basically _missing_ one of their key die shrinks entirely as
         | far as I can tell.
         | 
         | They are, in a certain sense, suffering from their own success
          | in that their competitors have basically been nonexistent up
          | until Zen came about (and even then, only with Zen 3 has Intel
          | truly been knocked off its single-thread perch). This has led
         | to them getting cagey, and a bit ridiculous in the sense that
         | they are not only backporting new designs to old processes but
         | also pumping them up to genuinely ridiculous power budgets.
         | With Apple, AMD, and TSMC they have basically been caught with
         | their trousers down by younger and leaner companies.
         | 
          | Ultimately this is where Intel need good leadership. The MBA
         | solution is to just give up and do something else (e.g. spin
         | off the fabs), but I think they should have the confidence (as
         | far as I can tell this is what they are doing) to rise to the
         | technical challenge - they will probably never have a run like
         | they did from Nehalem to shortly before now, but throwing in
         | the towel means that the probability is zero.
         | 
          | Intel have been in situations like this before, e.g. when
          | Itanium was clearly doomed and AMD were doing well (amd64),
          | they came back with new processors and basically laughed all
          | the way to the bank for years - AMD's server market share is
          | still pitiful compared to Intel (10% at most), for example.
        
           | Symmetry wrote:
            | I don't want to counsel despair, but I'm not as sanguine as
            | you either. Intel has had disastrous microarchitectures
            | before: Itanium, P4, and earlier ones. But it's never had to
            | worry about recovering from a _process_ disaster before. It
            | might very well be able to, but I worry.
        
             | mhh__ wrote:
             | I'm not exactly optimistic either, I just think that the
             | doomsaying is overblown (and sometimes looks like a tribal
             | thing from Apple and AMD fans if I'm being honest - i.e.
             | companies aren't your friends)
        
           | ac29 wrote:
           | > Intel's processes (i.e. turning files on a computer into
           | chips) have been a complete disaster in recent years, to the
           | point of basically missing one of their key die shrinks
           | entirely as far as I can tell.
           | 
            | Which one? I don't believe they missed a die shrink; it just
            | took a _long_ time. Intel 14nm came out in 2014 with their
            | Broadwell processors, and the next node, 10nm, came out in
            | 2019 (technically 2018, but very few units shipped that
            | year).
        
             | totalZero wrote:
             | Intel killed the longstanding "tick tock" model in 2016
             | because of failures with 10nm yield and the higher-than-
             | expected costs of 14nm. Intel got too aggressive with the
             | timeline of the die shrink, which led to them trying to do
             | 10nm on DUV rather than waiting for EUV technology where
             | the light is about an order of magnitude shorter in
             | wavelength than that of DUV (and thus able to resolve
             | today's nano-scale features without all the RETs needed for
             | DUV).
             | 
             | From the 2015 10-K [0]:
             | 
             |  _" We expect to lengthen the amount of time we will
             | utilize our 14nm and our next-generation 10nm process
             | technologies, further optimizing our products and process
             | technologies while meeting the yearly market cadence for
             | product introductions."_
             | 
             | Spoiler alert: In the five years after shelving the tick-
             | tock model, Intel also missed the yearly market cadence for
             | product introductions.
             | 
             | [0] https://www.sec.gov/Archives/edgar/data/50863/000005086
             | 31600...
        
             | mhh__ wrote:
             | Cannon Lake I believe was basically cancelled.
        
               | chx wrote:
               | You wish. It was released because a bunch of Intel
               | managers had bonuses tied to launching 10nm and so they
               | released it.
        
         | ineedasername wrote:
         | They can't get their next-gen fabs (chip factories) into
         | production. It's been a problem long enough that they're not
         | even next-gen anymore: it's current-gen, about to be previous-
         | gen.
         | 
         | So what you're seeing isn't really anti-Intel, it's probably
         | often more like bitter disappointment that they haven't done
         | better. Though I'm sure there's a tiny bit of fanboy-ism for &
         | against Intel.
         | 
         | There's definitely some of that pro-AMD fanboy sentiment in the
         | gaming community where people build their own rigs: AMD chips
         | are massively cheaper than a comparable Intel chip.
        
           | M277 wrote:
            | Just a minor nitpick regarding your last paragraph: this is
            | no longer the case. Intel is now significantly cheaper after
            | they heavily cut prices across the board.
            | 
            | For instance, you can now get an i7-10700K (which is roughly
            | equivalent in single-thread and better in multi-thread
            | performance) for less than an R5 5600X.
        
             | robocat wrote:
             | Nitpick: you are comparing price where you should be
             | comparing performance per dollar, or are you cherry-picking
             | the wrong comparison?
             | 
             | My cherry-pick is where the AMD chip is 30% more expensive,
             | but multi-threaded performance is 100% better in this
             | example:
             | https://www.cpubenchmark.net/compare/Intel-i9-11900K-vs-
             | AMD-...
             | 
             | Edit: picking individual processors to compare (especially
             | low volume ones) is often not useful when talking about how
             | well a company is competing in the market.
        
               | makomk wrote:
               | The comment you're replying to is "cherry-picking" the
               | current-gen AMD processor which offers the best value for
               | most users. You're cherry-picking an Intel processor
               | which almost no-one has any reason to buy over other
               | Intel options (the i9-11900K is much more expensive than
               | the 11700K or 10700k for little extra performance; AMD
               | had a few chips like this last gen, and they actually
               | downplayed how much of a price increase this gen was by
               | only comparing to those poor-value chips). One of these
               | comparisons is a lot more useful than the other.
        
           | MangoCoffee wrote:
           | >So what you're seeing isn't really anti-Intel, it's probably
           | often more like bitter disappointment that they haven't done
           | better.
           | 
            | It's back to where everyone designs their own chips for their
            | own products but doesn't need a fab, 'cause of foundries like
            | TSMC and Samsung.
        
         | tyingq wrote:
         | Lots of shade because they first missed the whole mobile
         | market, then got beat by AMD Zen by missing the chiplet concept
         | and a successful current-gen process size, then finally also
         | overshadowed by Apple's M1. The M1 thing is interesting,
         | because it likely means the next set of ARM Neoverse CPUs for
         | servers, from Amazon and others, will be really impressive.
         | Intel is behind on many fronts.
        
           | mhh__ wrote:
           | >likely means the next set of ARM Neoverse CPUs from Amazon
           | and others will be really impressive
           | 
            | M1 is proof that it can be done; however, you can absolutely
            | make a bad CPU for a good ISA, so I wouldn't take it for
            | granted.
        
             | tyingq wrote:
             | Might be a hint as to how much of M1's prowess is just the
             | process size and how much is Apple.
        
         | JohnJamesRambo wrote:
         | https://jamesallworth.medium.com/intels-disruption-is-now-co...
         | 
         | I think that summarizes it pretty well in that one graph.
        
       ___________________________________________________________________
       (page generated 2021-04-06 23:00 UTC)