hngopher.com

       [HN Gopher] Intel to set its FPGA unit free to pursue its own path
       ___________________________________________________________________
        
       Intel to set its FPGA unit free to pursue its own path
        
       Author : rbanffy
       Score  : 45 points
       Date   : 2023-10-04 16:42 UTC (6 hours ago)
        
 (HTM) web link (www.nextplatform.com)
 (TXT) w3m dump (www.nextplatform.com)
        
       | thenobsta wrote:
       | Every time I see an FPGA article, I feel a little sad that
       | Tabula[1] didn't make it -- 1.6GHz clock and reprogrammable on
       | the fly. RIP.
       | 
       | 1. https://en.wikipedia.org/wiki/Tabula,_Inc.
        
       | asfarley wrote:
       | Hello to Audrey and James
        
       | almatabata wrote:
       | Damn i hoped we would one day get a customizable fpga into our
       | CPUs. I hoped that it would make sense to install certain
       | instructions on your fpga depending on your workloads. I guess
       | this either kills that possibility or pushes it into a very far
       | future.
       | 
       | I do not understand this part though:
       | 
       | > There was talk of hybrid CPU-FPGA packages, which never seem to
       | get commercialized because no system architect likes static
       | ratios of compute - unless they are determining the ratios.
       | Like the hyperscalers and cloud builders, who can tell
       | companies like Intel and AMD what their product roadmaps need
       | to look like.
       | 
       | I do not see what they mean by ratio here. Do they mean die
       | ratio between cpu and fpga?
        
         | tverbeure wrote:
         | > I hoped that it would make sense to install certain
         | instructions on your fpga depending on your workloads.
         | 
         | It's one of those things that seem like a good idea, but they
         | just don't work out in practice. FPGA LUTs are just way too
         | slow. You'd have to find a case where doing something on a 3GHz
         | CPU clock running multiple instructions in parallel gets
         | outperformed by LUTs that run at 700MHz (at best). And when
         | you cascade the LUTs, they become slower too.
         | 
         | And that's without solving the problem of closely coupling a
         | CPU pipeline with FPGA logic.
         | 
         | > I do not see what they mean by ratio here. Do they mean
         | die ratio between cpu and fpga?
         | 
         | What they mean is: in something like the Zynq FPGA family, I
         | want a die with 2 CPU cores and 5000K LUTs. The other guy wants
         | 8 CPU cores and 2000K LUTs. It works for narrow applications
         | like signal processing where power efficiency and cost aren't
         | top concerns, but for a hyperscaler, power consumption is a very
         | important metric. As is the cost of paying for a significant
         | part of the silicon die that's sitting there unused.
        
           | mikewarot wrote:
           | The right kind of sea of LUTs can outperform anything even if
           | it's clocked at 100 MHz... the trick is to get a pipeline
           | filled, instead of trying to outrun light.
           | 
           | Imagine an LLM with a new token every 10 ns
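
The arithmetic behind the pipelining argument above can be sketched as follows. This is a back-of-the-envelope model, not a measurement: the function names, the 1000-stage depth, and the item count are made up for illustration, and it assumes an ideal pipeline that accepts one input and emits one result per clock with no stalls.

```python
def cycle_time_ns(clock_hz):
    """Duration of one clock period in nanoseconds."""
    return 1e9 / clock_hz


def pipeline_timing_ns(clock_hz, stages, items):
    """Return (first_result_latency_ns, total_time_ns) for an ideal pipeline.

    Assumes one new input enters and one result leaves per clock once the
    pipeline is full, with no stalls.
    """
    period = cycle_time_ns(clock_hz)
    latency = stages * period                # time for the first item to pass through
    total = (stages + items - 1) * period    # fill time plus one result per clock
    return latency, total


if __name__ == "__main__":
    clock_hz = 100e6    # 100 MHz, the figure from the comment
    stages = 1000       # hypothetical pipeline depth
    items = 1_000_000   # hypothetical number of results wanted
    latency, total = pipeline_timing_ns(clock_hz, stages, items)
    print(f"clock period:       {cycle_time_ns(clock_hz):.1f} ns")
    print(f"first result after: {latency / 1e3:.1f} us")
    print(f"average per result: {total / items:.2f} ns")
```

Under these assumptions the steady-state rate depends only on the clock, while pipeline depth only adds to the delay before the first result appears.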
        
           | duskwuff wrote:
           | > It's one of those things that seem like a good idea, but
           | they just don't work out in practice.
           | 
           | GPGPU sucks a lot of air out of the room as well. There
           | aren't many purely computational problems which FPGAs can
           | solve better than a compute-optimized GPU; even though GPUs
           | aren't quite as flexible, they clock a lot faster, they're
           | cheaper, and they're easier to develop for.
        
           | KirillPanov wrote:
           | > where doing something on a 3GHz CPU clock running multiple
           | instructions in parallel gets outperformed by LUTs that run at
           | 700MHz
           | 
           | Easy: go wide.
           | 
           | Make the FPGA-CPU interface four times wider on the FPGA side
           | than the CPU side. Each tick of the CPU clock reads (or
           | writes) one quarter of the bits.
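
A rough sketch of the "go wide" arithmetic above: a slower clock can carry the same number of bits per second if the interface is proportionally wider. The 64-bit CPU-side width and the function names are assumptions for illustration, and the model only matches bandwidth. It says nothing about latency or about how the two clock domains would be bridged.

```python
def bandwidth_bits_per_s(width_bits, clock_hz):
    """Raw interface bandwidth, assuming one transfer per clock."""
    return width_bits * clock_hz


def fpga_width_needed(cpu_width_bits, cpu_clock_hz, fpga_clock_hz):
    """Smallest FPGA-side width (in bits) that matches the CPU-side bandwidth.

    Uses integer ceiling division, so pass the clocks as integers in Hz.
    """
    needed = cpu_width_bits * cpu_clock_hz
    return -(-needed // fpga_clock_hz)


if __name__ == "__main__":
    cpu_width = 64                 # hypothetical CPU-side interface width
    cpu_clock = 3_000_000_000      # 3 GHz, as in the thread
    for fpga_clock in (750_000_000, 700_000_000):
        width = fpga_width_needed(cpu_width, cpu_clock, fpga_clock)
        print(f"FPGA at {fpga_clock / 1e6:.0f} MHz needs >= {width} bits")
```

With a clock ratio of exactly four (3 GHz against 750 MHz) the FPGA side needs four times the width, which is the case the comment describes. At 700 MHz the ratio is slightly higher, so the bus has to be a little wider still.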
        
           | j_not_j wrote:
           | > FPGA LUTs are just way too slow
           | 
           | If, and of course that is a big if, you can repackage a
           | (parallelizable) calculation into FPGA look-up tables and
           | implement multiples of this (e.g. 8 to 80 times) then you can
           | think maybe it's quicker than CPU at 3GHz.
           | 
           | However, you have to include DMA of the data to and fro. It's
           | unlikely to be worth the very extensive effort of integrating
           | two wildly different technologies.
           | 
           | On the other hand, it may not be a complicated calculation
           | but FPGA can do much lower latency and smaller variance in
           | latency (hello high-frequency traders). That is a very narrow
           | niche.
           | 
           | A simple board with CPU and FPGA is the Arduino MKR Vidor
           | 4000 (ARM Cortex 32-bit CPU and Intel Cyclone 10 FPGA).
           | Hardware cost: $85. Full suite of development software $1000
           | or more (although lesser tools are available for free.)
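
The trade-off described above (parallel copies on the FPGA against the cost of moving data to and fro) can be put into a simple break-even model. Everything here is an illustrative assumption rather than a measured figure: the cycle counts, the DMA bandwidth and setup time, and the simplification that transfer and compute do not overlap.

```python
def cpu_time_s(items, cycles_per_item, clock_hz):
    """Time for a single CPU core to process all items."""
    return items * cycles_per_item / clock_hz


def fpga_time_s(items, cycles_per_item, clock_hz, copies,
                bytes_per_item, dma_bytes_per_s, dma_setup_s):
    """Time for the FPGA path: DMA in, compute on parallel copies, DMA out.

    Assumes transfers and compute run back to back (no overlap).
    """
    compute = items * cycles_per_item / (clock_hz * copies)
    transfer = dma_setup_s + 2 * items * bytes_per_item / dma_bytes_per_s
    return compute + transfer


if __name__ == "__main__":
    items = 1_000_000
    cpu = cpu_time_s(items, cycles_per_item=30, clock_hz=3e9)
    print(f"CPU:            {cpu * 1e3:.2f} ms")
    for copies in (1, 8, 80):      # the 8 to 80 range mentioned in the comment
        fpga = fpga_time_s(items, cycles_per_item=30, clock_hz=700e6,
                           copies=copies, bytes_per_item=8,
                           dma_bytes_per_s=4e9, dma_setup_s=10e-6)
        print(f"FPGA x{copies:<3d}      {fpga * 1e3:.2f} ms")
```

In this model the transfer term sets a floor that no number of parallel copies can remove, which is one way to read the argument for putting the FPGA closer to the CPU's memory system.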
        
             | imtringued wrote:
             | >However, you have to include DMA of the data to and fro.
             | It's unlikely to be worth the very extensive effort of
             | integrating two wildly different technologies.
             | 
             | That is exactly the part where having the FPGA next to the
             | CPU helps... You can transparently access the CPU cache via
             | an AXI slave port on the CPU on AMD's MPSoCs at a rate of
             | up to 16 bytes per cycle and you get multiple of those.
        
           | almatabata wrote:
           | Thanks for clarifying.
        
         | amluto wrote:
         | Integrating an FPGA with the actual front-end and register
         | files so you can invoke it synchronously, with fast
         | instructions at low latency, seems neat but rather complicated.
         | As for an FPGA asynchronously accessing application memory, I
         | tentatively expect CXL with some shared virtual memory trickery
         | to succeed in this space, at least in a couple years when the
         | dust hopefully settles, and then you can do whatever you want.
        
         | dralley wrote:
         | > Damn i hoped we would one day get a customizable fpga into
         | our CPUs. I hoped that it would make sense to install certain
         | instructions on your fpga depending on your workloads. I guess
         | this either kills that possibility or pushes it into a very far
         | future.
         | 
         | Depends on what AMD does with Xilinx.
        
           | imtringued wrote:
           | I am actually surprised how AMD managed to successfully
           | leverage its FPGAs for machine learning inference. It is
           | competing with Nvidia's Jetson.
        
           | gsmecher wrote:
           | > Depends on what AMD does with Xilinx.
           | 
           | Currently the AMD/Xilinx dynamic seems to reverse this:
           | "Depends on what Xilinx does with AMD".
           | 
           | AMD's software roadmap for AI/datacentre leans heavily on
           | Vitis (for software) and AI Engines (as an execution
           | platform). CPUs that integrate AI engines are already
           | shipping (Ryzen AI). It's Xilinx technology, but you should
           | expect it to look more like a GPU accelerator than a
           | traditional LUTs-and-routing FPGA. And, as duskwuff has
           | pointed out, this sucks a lot of the oxygen out of the CPU-
           | with-FPGA design space.
        
         | bfrog wrote:
         | "a customizable fpga into our CPUs" that already happened, it
         | just didn't happen in x86 land. There have been a good number
         | of products from various vendors that connect up hard cores and
         | fpga fabric.
         | 
         | PowerPC cores, RISC-V cores, and by and large ARM cores
        
           | tverbeure wrote:
           | That's not what OP meant though. They were talking about
           | custom CPU instructions implemented with FPGA logic.
        
             | bfrog wrote:
             | That doesn't sound that beneficial honestly
        
         | throwaway4590 wrote:
         | Whenever I see talk about Intel's FPGA unit, I link back to an
         | invention I submitted to Intel while I was an intern there [0].
         | I went through the patent pipeline, but to my knowledge they
         | never did anything with it. This was during the excitement of
         | Intel's original acquisition of Altera.
         | 
         | In fairness, I never mocked up a true enough implementation in
         | Verilog to get an idea of real world speedup, and even now, I'm
         | not sure exactly what operations you could see real gain with
         | from small reconfigurable fabrics near the CPU. Still, I liked
         | the elegance of having L1-L3+ FPGAs for speeding up operations
         | of increasing levels of complexity, and I figured programmers
         | smarter than me would find creative ways of using the FPGAs
         | with the added instructions.
         | 
         | [0] https://patents.google.com/patent/US10310868B2/
        
           | almatabata wrote:
           | Thanks for sharing. Small question about Image 20, does that
           | represent a use case for an instruction translator? For
           | example you have an arm chip and you want to run x86 code so
           | you offload the x86 instructions to the fpga?
        
             | throwaway4590 wrote:
             | I believe my contributions start at Image 25 on Google.
             | Images 1-24 are generic CPU boilerplate images that the
             | lawyers add to most patents in the field.
        
         | eschneider wrote:
         | This doesn't make a lot of sense. I mean, there are SOCs out
         | there with asymmetric cores (say, an ARM A53 and an ARM M4 on
         | the same die) for folks whose workloads warrant that sorta
         | thing. I'd expect there'd be a similar market for CPUs with
         | built-in FPGAs of various sizes.
        
           | tverbeure wrote:
           | It only makes sense for a few applications. See the popular
           | Xilinx Zynq UltraScale MPSoC product line. They are popular
           | for digital signal processing, for example. But they are not
           | power efficient, and they are very expensive.
           | 
           | Good enough for a low volume custom solution for which custom
           | silicon is too expensive. Not for a hyperscaler.
        
       | varelse wrote:
       | [dead]
        
       | mips_r4300i wrote:
       | Thank goodness. I've been expecting this ever since Intel bought
       | Altera, they just stuck with it a couple years longer than I
       | figured.
       | 
       | They focused solely on the high end, but it turns out nobody
       | really wants FPGA fabric on a CPU. You can already do
       | acceleration over a PCI express link, and that's what you more
       | often do with embedded applications where the CPU is acting more
       | like a dispatch controller than doing the real work.
       | 
       | Intel also have completely ignored the low end of the market. The
       | only true low-end part they have is the Cyclone 10LP, which is
       | literally the exact same part as the Cyclone 3/4 from 2008. Just
       | slightly die shrunk. No hard IP support like ddr3 controllers, no
       | MIPI, nothing that people are getting from the competition now.
       | 
       | Intel did realize this, which is why the new Agilex family
       | includes some "low-mid range" parts, but they will be still much
       | more expensive. Low-end to Intel means "under $1k unit cost"
       | which ignores a huge part of the market.
       | 
       | They have better tools, documentation, and support than Gowin,
       | who is a recent Chinese FPGA upstart using stolen Lattice IP and
       | hires. But they will lose to Gowin by default in the commodity
       | space unless they do something.
        
         | Tuna-Fish wrote:
         | They did not ignore the low end by choice.
         | 
         | The entire story of Altera inside Intel can be summarized as:
         | 
         | Intel fabs make amazing promises about process performance and
         | availability. Altera builds their product stack on that. In the
         | end, the fabs fail to deliver either performance, or sufficient
         | amount of manufacturing capability. Now Altera has to pick
         | which products they want to ship. They obviously can the low
         | end. Even the high end that ships is horribly late, because of
         | manufacturing issues.
         | 
         | There would have been massive demand for the combined
         | Intel+Altera products. Many large customers built their future
         | based on the marketing promises Intel made, and when they
         | couldn't deliver, those customers had to redevelop everything
         | on something else. As an example, look up Nokia Reefshark.
        
         | trsohmers wrote:
         | They have announced the new Agilex 3 line, which should include
         | some CPLD price point parts and be a real rebirth for
         | ~$100/unit modern devices.
        
           | bfrog wrote:
           | Let's see I guess... I'm not holding my breath, but it'd be
           | great to not use Vivado's slow ass Java IDE one day. Quartus
           | is light years faster seemingly.
        
       | SilverBirch wrote:
       | Yeah it was really funny watching Intel buy Altera at the same
       | time that they were spinning out McAfee and thinking "well we'll
       | see how long this lasts..."
       | 
       | Big chunk of the team from Altera are at AMD now anyway.
       | 
       | Hopefully they finally get back to innovating on the actual FPGA
       | now. I'm so tired of the hardened rubbish and cpu integrated
       | rubbish.
        
         | aleph_minus_one wrote:
         | > I'm so tired of the hardened rubbish and cpu integrated
         | rubbish.
         | 
         | Was there actually a way to access a CPU-integrated FPGA as an
         | "ordinary" user/customer (i.e. not a "special customer")?
        
       | brucethemoose2 wrote:
       | > We wouldn't place heavy bets on Falcon Shores making it to
       | completion unless a big HPC center adopts it, and given how
       | Argonne National Laboratory was treated, we don't think there
       | will be a lot of uptake unless Intel makes some pretty big
       | pricing concessions. Which it can ill afford. Hybrid CPU-GPU
       | devices - the original plan for Falcon Shores, have also been
       | shelved.
       | 
       | That's even more eyebrow raising than an Altera spinoff.
       | 
       | Altera is a good side business, but Falcon Shores is like Intel's
       | consolidated future. If they just let that go... What do they
       | expect? That everyone will just buy Xeon CPUs and IGP laptops
       | forever?
        
         | chx wrote:
         | Look at
         | https://benchmark.chaos.com/v5/vray?index=1&ordering=desc&by...
         | Intel can ill afford to think about forever. They have a runway
         | built by illegal monopoly tactics. When that runway ends, the
         | music stops unless they do something _very_ drastic to their
         | CPUs _right now_. The fastest 96 core AMD CPU alone is 30%
         | faster than the fastest Intel offering, 120 cores in two
         | sockets -- and that's not the fastest CPU AMD offers.
         | 
         | This is not to say Intel will go bankrupt (look at the number
         | of quarters AMD spent in the red), but it really doesn't want
         | to become #2.
        
       | tester756 wrote:
       | It will hurt them on earnings in 2024?
       | 
       | What do they gain from it? Is there some deal with TSMC behind the
       | scenes?
       | 
       | It seems like TSMC is investing in some of Intel's companies:
       | IMS and now this
        
         | SilverBirch wrote:
         | They're just dumping the distraction from their core business,
         | the Altera acquisition was one of Brian Krzanich's many
         | missteps, buying and betting instead of running a business.
        
       ___________________________________________________________________
       (page generated 2023-10-04 23:00 UTC)