[HN Gopher] Intel's Ponte Vecchio: Chiplets Gone Crazy
       ___________________________________________________________________
        
       Intel's Ponte Vecchio: Chiplets Gone Crazy
        
       Author : rbanffy
       Score  : 109 points
       Date   : 2023-09-25 07:39 UTC (15 hours ago)
        
 (HTM) web link (chipsandcheese.com)
 (TXT) w3m dump (chipsandcheese.com)
        
       | baq wrote:
       | > With that in mind, Ponte Vecchio is better seen as a learning
       | experience. Intel engineers likely gained a lot of experience
       | with different process nodes and packaging technologies while
       | developing PVC
       | 
       |  _cough_ An expensive lesson, I'm sure.
        
         | hinkley wrote:
         | Cheaper than Itanium I bet.
        
           | failuser wrote:
           | Itanium killed enough competitors by sheer announcement that
           | it might have been a positive for Intel in the end.
        
           | jahav wrote:
            | Sure, but Intel's position back then was very different from
            | today's.
            | 
            | Being dethroned and free-cash-flow negative is rather bad, I
            | am told.
        
       | nwiswell wrote:
       | Does this feel a lot like Xeon Phi v3.0 to anybody else?
       | 
       | Intel's strategy here is baffling to me. Rather than keep trying
       | to improve their existing line of coprocessors (and most
       | critically, keep accumulating key talent), they kill off the
       | program, scatter their talent to the four winds, wait a couple
       | years, and then launch another substandard product.
        
         | mastax wrote:
         | I think Intel's strategy, in a broad sense, makes sense. Xeon
         | Phi succeeded in a few tiny niches, but they need a real GPGPU
         | in order to compete in the broader market this decade. They
         | tried to make their microarchitecture broadly similar to their
         | competitors' to reduce risk and improve software compatibility.
         | They knew their architecture (and software) wouldn't be as good
         | as their experienced competitors' but thought that at the high
         | end they could use their advanced packaging technology as an
          | advantage. In hindsight that was maybe over-ambitious, if it
          | caused the substantial delays (we don't know that for certain,
          | but it's a good guess), but maybe it will pay dividends in the
          | next product. You do have to take some risks when you're in
          | last place.
        
           | nwiswell wrote:
           | I just don't understand why they would keep shutting programs
           | down rather than doing course corrections toward a more
           | competitive GPGPU. This behavior stretches all the way back
           | to Larrabee in 2010.
           | 
            | If I were a betting man, I would bet that this project is
            | dead inside 36 months. And if I were a GPU designer, I'd
           | accordingly not touch Intel with a barge pole. They've
           | painted themselves into a corner.
           | 
           | I personally know GPU experts who left Intel for Nvidia
           | because of this. I can't imagine they would consider going
           | back at this point.
        
         | brokencode wrote:
         | This is typical of Intel's weak leadership and focus on short
         | term profits instead of long term success.
         | 
         | Just look at how they dragged their feet in transitioning to
         | EUV because it was too expensive. This contributed to large
         | delays in their 10 and 7 nm processes and a total loss in their
         | process leadership.
         | 
         | And look at how many billions they poured into making a 5G
         | modem only to give up and sell their IP to Apple.
         | 
         | Or how they dragged their feet in getting into mobile, then
         | came out with Atom way too late to be successful in the market.
         | They essentially gave the market to ARM.
         | 
         | Optane is another recent example. Cool technology, but if a
         | product is not a smashing success right away, Intel throws in
         | the towel.
         | 
         | There's no real long term vision that I can see. No resilience
         | to challenges or ability to solve truly difficult problems.
        
           | tester756 wrote:
           | >Optane is another recent example. Cool technology, but if a
           | product is not a smashing success right away, Intel throws in
           | the towel.
           | 
            | Wasn't part of the actual reason that they didn't have
            | anywhere to actually manufacture them once Micron sold the
            | fab?
           | 
           | https://www.extremetech.com/computing/320932-micron-
           | ends-3d-...
        
             | brokencode wrote:
             | From my understanding, the problem was that it wasn't
             | selling well enough and they decided to cut their losses.
             | 
             | I'm not saying that Optane was a hill they needed to die
             | on, but it's just another example of their failed
             | leadership and decision making.
             | 
             | Look at how AMD is pursuing and largely succeeding with
             | their vision of using chiplets in their CPUs and GPUs to
             | enable significantly higher core counts at a lower cost.
             | 
             | Or how Nvidia is innovating with massive AI supercomputers,
             | ray tracing, and DLSS.
             | 
             | What is Intel's vision? In what way are they inventing the
             | next generation of computing? It seems to me that their
             | company objective is just to play catch up with AMD and
             | Nvidia.
        
             | wtallis wrote:
             | I think it's fair to say that Optane was not merely "not a
             | smashing success" but was completely uneconomical. Intel
             | was essentially using Optane products as loss leaders to
             | promote platform lock-in, and had limited uptake. Micron
             | made only the smallest token attempt to bring 3D XPoint to
             | market before bailing. Clearly neither partner saw a way
             | forward to reduce the costs drastically to make it
             | competitive as a high-volume product.
        
           | qwytw wrote:
           | > They essentially gave the market to ARM
           | 
            | They also had the best ARM chips for years with
            | StrongARM/XScale (using their own cores), which they killed
            | because obviously Atom was going to be much better and lock
            | everyone into x86...
        
         | [deleted]
        
       | brrrrrm wrote:
       | > This is likely a compiler issue where the v0 += acc * v0
       | sequence couldn't be converted into a FMA instruction.
       | 
       | Err, is the ISA undocumented/impossible to inspect in the
       | execution pipeline? Seems like an important thing to verify/fix
       | for a hardware benchmark...
        
         | tremon wrote:
          | Yes, at least as far as I know. The actual micro-ops
         | resulting from the instruction stream are invisible. You can
         | count the number of uops issued and partly deduce how the
         | instructions were decoded, but not view the uops themselves.
        
         | wtallis wrote:
         | From the preceding paragraph:
         | 
         | > We weren't able to get to the bottom of this because we don't
         | have the profiling tools necessary to get disassembly from the
         | GPU.
        
           | touisteur wrote:
           | And that's all I need to know about replacing all NVIDIA
           | stuff. I know it's pretty hard to get there, but Intel should
           | know that having a serious general purpose computing thing
           | means solid compilers, toolchains, optimized libraries, and a
           | whole lot of mindshare (as in 'a large number of people
           | willing to throw their time to test your stuff').
        
             | ndneighbor wrote:
              | I've been an Intel shill lately, but I think it's more of
              | a time thing than a desire to keep stuff secret. They've
              | been pretty good about openly documenting the stuff that
              | matters (like this), such as OpenVINO.
        
               | touisteur wrote:
                | I was a bit annoyed by the OpenVINO reference, because
                | I felt they closed off most of the things about Myriad X
                | and the SHAVE arch. And the last time I tried OpenVINO
                | on Tiger Lake I was left with a very thick pile of
                | undebuggable, uninspectable OpenCL-y stuff, which left a
                | very bad taste in my mouth.
               | 
               | I mean OpenVINO's perf is up there on Intel CPUs and it's
               | a great optimising compiler, I've thrown a lot of weird
               | stuff in there and it didn't crap out with complaints
               | about unsupported layers or unsupported combination of
               | layers. It also has an OK batching story (as opposed to
               | TVM last time I checked...) if you're ready to perform
               | some network surgery.
               | 
               | I also feel it's very bad at reporting errors, and
               | stepping through with gdb is one of the worst
               | experiences... BUT but yeah most of the code is available
               | now.
               | 
               | Now if they could stop moving shit around, and renaming
               | stuff, it'd be great. Hoping they settle on 'OneAPI' for
               | some time.
        
               | bigbillheck wrote:
               | SHAVE was such a cool architecture, it's too bad about
               | all the secrecy.
        
           | colejohnson66 wrote:
           | Is the Intel Xe ISA even publicly documented? I've searched
           | before and I can't find a PDF detailing the instruction set.
           | AMD releases them,[0] but I can't find anything from Intel
           | (or Nvidia for that matter).
           | 
           | [0]: RDNA2 ISA:
           | https://www.amd.com/content/dam/amd/en/documents/radeon-
           | tech...
        
             | wmf wrote:
             | https://www.intel.com/content/www/us/en/docs/graphics-for-
             | li... (Alchemist is a variant of Xe)
        
       | kcb wrote:
        | Intel (and AMD) need to get their high-end GPUs offered by a
       | cloud provider. Total non-starter until then.
        
       | ds wrote:
        | The potential for Intel to explode is definitely there if Intel
        | executes on its AI demand.
        | 
        | I suppose one unknown catalyst with Intel is what happens in
        | Taiwan/China. If things get crazy over there, suddenly Intel
        | seems a lot more valuable as the 'US' chip maker (they produce
        | roughly 75% in the US, IIRC). If the government starts to even
        | more heavily subsidize non-reliance on Asia, Intel could find
        | major gains if TSMC/Samsung get shut out.
       | 
        | I mean, just look at the market caps: Intel is worth roughly a
        | sixth of Nvidia despite historically having the same or greater
        | gross revenue (not counting the most recent quarter, of course).
        
         | eklitzke wrote:
         | Absolutely. We're still in early days, but the products that
         | Intel has announced in this space are impressive, and if they
         | execute well they should be able to capture a significant
         | amount of market share. That isn't to say that they will be the
         | majority or dominant player in this space, but even capturing
         | 10% or 20% of the datacenter GPU market in the next few years
         | would be a win for Intel.
         | 
         | Intel is also well known for inking long-term deals with major
         | discounts for big customers (Google, Facebook, etc.) that can
         | commit to purchasing large amounts of hardware, whereas Nvidia
         | doesn't really have the same reputation. It's conceivable that
         | Intel could use this strategy to help bootstrap their server
         | GPU business. The Googles and Facebooks of the world are going
         | to have to evaluate this in the context of how much additional
         | engineering work it is to support and debug multiple GPU
         | architectures for their ML stack, but thinking long-
         | term/strategically these companies should be highly motivated
         | to negotiate these kinds of contracts to avoid lock-in and get
         | better discounts.
        
         | washadjeffmad wrote:
         | I was surprised by how poorly poised Intel was to act on the
         | "Cambrian explosion" of AGI late last year. After the release
         | of their Intel Arc GPUs, it took almost two quarters for their
         | Intel Extensions for PyTorch/TensorFlow to be released, to
          | middling support and interest, which hasn't changed much
          | today.
         | 
         | How many of us learned ML using Compute Sticks, OpenVINO and
         | OneAPI or another of their libraries or frameworks, or their
         | great documentation? It's like they didn't really believe in it
         | outside of research.
         | 
         | What irony is it when a bedrock of "AI" fails to dream?
        
           | version_five wrote:
           | Maybe I'm thinking about it too simply but yeah I agree.
           | 
           | Language models in particular are very similar architectures
           | and effectively a lot of dot products. And running them on
            | GPUs is arguably overkill. Look at llama.cpp for the way the
           | industry is going. I want a fast parallel quantized dot
           | product instruction on a CPU, and I want the memory bandwidth
           | to keep it loaded up. Intel should be able to deliver that,
           | with none of the horrible baggage that comes from CUDA and
           | nvidia drivers.
        
         | bsder wrote:
         | > The potential for intel to explode is definitely there if
         | intel executes with its AI demand.
         | 
         | Nope. Intel doesn't get "It's the software, stupid."
         | 
         | Intel is congenitally unable to pay software people more than
         | their engineers--and they treat their engineers like crap,
         | mostly. And they're going to keep getting beaten black and blue
          | by Nvidia for that.
        
           | mastax wrote:
           | I think Intel is doing relatively well on the software side,
           | given how short a time frame we're talking about. OneAPI is
           | in the same ballpark as AMD and on a better trajectory, I
           | think. They're competing for second place, remember.
           | 
           | The more disappointing thing for me is that they bought like
           | 5 AI startups pretty early on and have basically just shut
           | most of them down. Maybe that was always the plan? See which
           | ones develop the best and consider the rest to be acqui-
           | hires? But I think it's more likely just fallout from Intel's
           | era of flailing around and acquiring random crap.
        
           | varelse wrote:
           | [dead]
        
         | hnav wrote:
         | Per the article, this is on TSMC's 5nm node, though it does
         | seem that Intel has some level of support from the US govt
         | since it's the only onshore player there.
        
       | dyingkneepad wrote:
       | So the author compares it with a bunch of other GPUs, but: what
       | about the price? I mean yeah H100 looks better in the graphs, but
       | does it cost the same?
        
         | wmf wrote:
         | I don't know if there even is a price. Maybe Intel is just
         | giving them out for free.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-09-25 23:00 UTC)