[HN Gopher] Intel to set its FPGA unit free to pursue its own path
___________________________________________________________________
Intel to set its FPGA unit free to pursue its own path

Author : rbanffy
Score : 45 points
Date : 2023-10-04 16:42 UTC (6 hours ago)

(HTM) web link (www.nextplatform.com)
(TXT) w3m dump (www.nextplatform.com)

| thenobsta wrote:
| Every time I see an FPGA article, I feel a little sad that
| Tabula[1] didn't make it -- 1.6GHz clock and reprogrammable on
| the fly. RIP.
|
| 1. https://en.wikipedia.org/wiki/Tabula,_Inc.
|
| asfarley wrote:
| Hello to Audrey and James
|
| almatabata wrote:
| Damn i hoped we would one day get a customizable fpga into our
| CPUs. I hoped that it would make sense to install certain
| instructions on your fpga depending on your workloads. I guess
| this either kills that possibility or pushes it into a very far
| future.
|
| I do not understand this part though:
|
| > There was talk of hybrid CPU-FPGA packages, which never seem to
| > get commercialized because no system architect likes static
| > ratios of compute - unless they are determining the ratios.
| > Like the hyperscalers and cloud builders, who can tell
| > companies like Intel and AMD what their product roadmaps need
| > to look like.
|
| I do not see what they mean by ratio here. Do they mean die
| ratio between cpu and fpga?
|
| tverbeure wrote:
| > I hoped that it would make sense to install certain
| instructions on your fpga depending on your workloads.
|
| It's one of those things that seem like a good idea, but they
| just don't work out in practice. FPGA LUTs are just way too
| slow. You'd have to find a case where doing something on a 3GHz
| CPU clock running multiple instructions in parallel gets
| outperformed by LUTs that run at 700MHz (at best). And when
| you cascade the LUTs, they become slower too.
|
| And that's without solving the problem of closely coupling a
| CPU pipeline with FPGA logic.
|
| > I do not see what they mean by ratio here. Do they mean
| die ratio between cpu and fpga?
|
| What they mean is: in something like the Zynq FPGA family, I
| want a die with 2 CPU cores and 5000K LUTs. The other guy wants
| 8 CPU cores and 2000K LUTs. It works for narrow applications
| like signal processing where power efficiency and cost isn't a
| top concern, but for a hyperscaler, power consumption is a very
| important metric. As is the cost of paying for a significant
| part of the silicon die that's sitting there unused.
|
| mikewarot wrote:
| The right kind of sea of LUTs can outperform anything even if
| it's clocked at 100 MHz... the trick is to get a pipeline
| filled, instead of trying to outrun light.
|
| Imagine an LLM with a new token every 10 ns
|
| duskwuff wrote:
| > It's one of those things that seem like a good idea, but
| they just don't work out in practice.
|
| GPGPU sucks a lot of air out of the room as well. There
| aren't many purely computational problems which FPGAs can
| solve better than a compute-optimized GPU; even though GPUs
| aren't quite as flexible, they clock a lot faster, they're
| cheaper, and they're easier to develop for.
|
| KirillPanov wrote:
| > where doing something on a 3GHz CPU clock running multiple
| instructions in parallel gets outperformed by LUTs that run at
| 700MHz
|
| Easy: go wide.
|
| Make the FPGA-CPU interface four times wider on the FPGA side
| than the CPU side. Each tick of the CPU clock reads (or
| writes) one quarter of the bits.
|
| j_not_j wrote:
| > FPGA LUTs are just way too slow
|
| If, and of course that is a big if, you can repackage a
| (parallelizable) calculation into FPGA look-up tables and
| implement multiples of this (e.g. 8 to 80 times) then you can
| think maybe it's quicker than CPU at 3GHz.
|
| However, you have to include DMA of the data to and fro. It's
| unlikely to be worth the very extensive effort of integrating
| two wildly different technologies.
|
| On the other hand, it may not be a complicated calculation
| but FPGA can do much lower latency and smaller variance in
| latency (hello high-frequency traders). That is a very narrow
| niche.
|
| A simple board with CPU and FPGA is the Arduino MKR Vidor
| 4000 (ARM Cortex 32-bit CPU and Intel Cyclone 10 FPGA).
| Hardware cost: $85. Full suite of development software: $1000
| or more (although lesser tools are available for free).
|
| imtringued wrote:
| > However, you have to include DMA of the data to and fro.
| It's unlikely to be worth the very extensive effort of
| integrating two wildly different technologies.
|
| That is exactly the part where having the FPGA next to the
| CPU helps... You can transparently access the CPU cache via
| an AXI slave port on the CPU on AMD's MPSoCs at a rate of
| up to 16 bytes per cycle and you get multiple of those.
|
| almatabata wrote:
| Thanks for clarifying.
|
| amluto wrote:
| Integrating an FPGA with the actual front-end and register
| files so you can invoke it synchronously, with fast
| instructions at low latency, seems neat but rather complicated.
| As for an FPGA asynchronously accessing application memory, I
| tentatively expect CXL with some shared virtual memory trickery
| to succeed in this space, at least in a couple years when the
| dust hopefully settles, and then you can do whatever you want.
|
| dralley wrote:
| > Damn i hoped we would one day get a customizable fpga into
| our CPUs. I hoped that it would make sense to install certain
| instructions on your fpga depending on your workloads. I guess
| this either kills that possibility or pushes it into a very far
| future.
|
| Depends on what AMD does with Xilinx.
|
| imtringued wrote:
| I am actually surprised how AMD managed to successfully
| leverage its FPGAs for machine learning inference. It is
| competing with Nvidia's Jetson.
|
| gsmecher wrote:
| > Depends on what AMD does with Xilinx.
|
| Currently the AMD/Xilinx dynamic seems to reverse this:
| "Depends on what Xilinx does with AMD".
|
| AMD's software roadmap for AI/datacentre leans heavily on
| Vitis (for software) and AI Engines (as an execution
| platform). CPUs that integrate AI engines are already
| shipping (Ryzen AI). It's Xilinx technology, but you should
| expect it to look more like a GPU accelerator than a
| traditional LUTs-and-routing FPGA. And, as duskwuff has
| pointed out, this sucks a lot of the oxygen out of the
| CPU-with-FPGA design space.
|
| bfrog wrote:
| "a customizable fpga into our CPUs" that already happened, it
| just didn't happen in x86 land. There have been a good number
| of products from various vendors that connect up hard cores and
| fpga fabric.
|
| power pc cores, riscv cores, and by and large arm cores
|
| tverbeure wrote:
| That's not what OP meant though. They were talking about
| custom CPU instructions implemented with FPGA logic.
|
| bfrog wrote:
| That doesn't sound that beneficial honestly
|
| throwaway4590 wrote:
| Whenever I see talk about Intel's FPGA unit, I link back to an
| invention I submitted to Intel while I was an intern there [0].
| I went through the patent pipeline, but to my knowledge they
| never did anything with it. This was during the excitement of
| Intel's original acquisition of Altera.
|
| In fairness, I never mocked up a true enough implementation in
| Verilog to get an idea of real world speedup, and even now, I'm
| not sure exactly what operations you could see real gain with
| from small reconfigurable fabrics near the CPU. Still, I liked
| the elegance of having L1-L3+ FPGAs for speeding up operations
| of increasing levels of complexity, and I figured programmers
| smarter than me would find creative ways of using the FPGAs
| with the added instructions.
|
| [0] https://patents.google.com/patent/US10310868B2/
|
| almatabata wrote:
| Thanks for sharing.
| Small question about Image 20: does that
| represent a use case for an instruction translator? For
| example you have an arm chip and you want to run x86 code so
| you offload the x86 instructions to the fpga?
|
| throwaway4590 wrote:
| I believe my contributions start at Image 25 on Google.
| Images 1-24 are generic CPU boilerplate images that the
| lawyers add to most patents in the field.
|
| eschneider wrote:
| This doesn't make a lot of sense. I mean, there are SOCs out
| there with asymmetric cores (say, an ARM A53 and an ARM M4 on
| the same die) for folks whose workloads warrant that sorta
| thing. I'd expect there'd be a similar market for CPUs with
| built-in FPGAs of various sizes.
|
| tverbeure wrote:
| It only makes sense for a few applications. See the popular
| Xilinx Zynq UltraScale MPSoC product line. They are popular
| for digital signal processing, for example. But they are not
| power efficient, and they are very expensive.
|
| Good enough for a low volume custom solution for which custom
| silicon is too expensive. Not for a hyperscaler.
|
| varelse wrote:
| [dead]
|
| mips_r4300i wrote:
| Thank goodness. I've been expecting this ever since Intel bought
| Altera, they just stuck with it a couple years longer than I
| figured.
|
| They focused solely on the high end, but it turns out nobody
| really wants FPGA fabric on a CPU. You can already do
| acceleration over a PCI express link, and that's what you more
| often do with embedded applications where the CPU is acting more
| like a dispatch controller than doing the real work.
|
| Intel also have completely ignored the low end of the market. The
| only true low-end part they have is the Cyclone 10LP, which is
| literally the exact same part as the Cyclone 3/4 from 2008. Just
| slightly die shrunk. No hard IP support like ddr3 controllers, no
| MIPI, nothing that people are getting from the competition now.
|
| Intel did realize this, which is why the new Agilex family
| includes some "low-mid range" parts, but they will still be much
| more expensive. Low-end to Intel means "under $1k unit cost"
| which ignores a huge part of the market.
|
| They have better tools, documentation, and support than Gowin,
| who is a recent Chinese FPGA upstart using stolen Lattice IP and
| hires. But they will lose to Gowin by default in the commodity
| space unless they do something.
|
| Tuna-Fish wrote:
| They did not ignore the low end by choice.
|
| The entire story of Altera inside Intel can be summarized as:
|
| Intel fabs make amazing promises about process performance and
| availability. Altera builds their product stack on that. In the
| end, the fabs fail to deliver either performance, or sufficient
| amount of manufacturing capability. Now Altera has to pick
| which products they want to ship. They obviously can the low
| end. Even the high end that ships is horribly late, because of
| manufacturing issues.
|
| There would have been massive demand for the combined
| Intel+Altera products. Many large customers built their future
| based on the marketing promises Intel made, and when they
| couldn't deliver, those customers had to redevelop everything
| on something else. As an example, look up Nokia Reefshark.
|
| trsohmers wrote:
| They have announced the new Agilex 3 line, which should include
| some CPLD price point parts and be a real rebirth for
| ~$100/unit modern devices.
|
| bfrog wrote:
| Let's see I guess... I'm not holding my breath, but it'd be
| great to not use Vivado's slow ass Java IDE one day. Quartus
| is light years faster seemingly.
|
| SilverBirch wrote:
| Yeah it was really funny watching Intel buy Altera at the same
| time that they were spinning out McAfee and thinking "well we'll
| see how long this lasts..."
|
| Big chunk of the team from Altera are at AMD now anyway.
|
| Hopefully they finally get back to innovating on the actual FPGA
| now.
| I'm so tired of the hardened rubbish and cpu integrated
| rubbish.
|
| aleph_minus_one wrote:
| > I'm so tired of the hardened rubbish and cpu integrated
| rubbish.
|
| Was there actually a way to access a CPU-integrated FPGA as an
| "ordinary" user/customer (i.e. not a "special customer")?
|
| brucethemoose2 wrote:
| > We wouldn't place heavy bets on Falcon Shores making it to
| completion unless a big HPC center adopts it, and given how
| Argonne National Laboratory was treated, we don't think there
| will be a lot of uptake unless Intel makes some pretty big
| pricing concessions. Which it can ill afford. Hybrid CPU-GPU
| devices - the original plan for Falcon Shores, have also been
| shelved.
|
| That's even more eyebrow raising than an Altera spinoff.
|
| Altera is a good side business, but Falcon Shores is like Intel's
| consolidated future. If they just let that go... What do they
| expect? That everyone will just buy Xeon CPUs and IGP laptops
| forever?
|
| chx wrote:
| Look at
| https://benchmark.chaos.com/v5/vray?index=1&ordering=desc&by...
| Intel can ill afford to think about forever. They have a runway
| built by illegal monopoly tactics. When that runway ends, the
| music stops unless they do something _very_ drastic to their
| CPUs _right now_. The fastest 96 core AMD CPU alone is 30%
| faster than the fastest Intel offering, 120 cores in two
| sockets -- and that's not the fastest CPU AMD offers.
|
| This is not to say Intel will go bankrupt (look at the number of
| quarters AMD spent in the red), but it really doesn't want to
| become #2.
|
| tester756 wrote:
| It will hurt them on earnings in 2024?
|
| What do they gain from it? Is there some deal with TSMC behind the
| scenes?
|
| It seems like TSMC is investing in some of Intel's companies: IMS
| and now this
|
| SilverBirch wrote:
| They're just dumping the distraction from their core business,
| the Altera acquisition was one of Brian Krzanich's many
| missteps, buying and betting instead of running a business.
___________________________________________________________________
(page generated 2023-10-04 23:00 UTC)