[HN Gopher] AMD Patent Reveals Hybrid CPU-FPGA Design That Could... ___________________________________________________________________ AMD Patent Reveals Hybrid CPU-FPGA Design That Could Be Enabled by Xilinx Tech Author : craigjb Score : 137 points Date : 2021-01-03 17:39 UTC (5 hours ago) (HTM) web link (hothardware.com) (TXT) w3m dump (hothardware.com) | rwmj wrote: | About *!$% time! I was hoping Intel would do something like this | when they acquired Altera a few years back. Does anyone know why | Intel acquired Altera? | PedroBatista wrote: | Almost the same reason someone buys a Peloton bike or a rusted | old Porsche. Because someone had a dream last night and had | the money. | d_tr wrote: | AFAIK there exist some Xeon + FPGA chips. No clue about | availability though... | mhh__ wrote: | Xilinx already have ARM cores in their FPGAs so I wonder which | way they'll go - I'd honestly prefer a Neoverse core to an x86 | efferifick wrote: | Not sure how realistic it would be, but I would like to see a | RISC-V base core, with the FPGA implementing the extensions. | Why? Because it would be cool! Also, I don't really see a use | case except for debugging compilers supporting multiple RISC-V | extensions and whatnot. | craigjb wrote: | Microchip has the product for you then! Well, the RISC-V part | anyway. https://www.microsemi.com/product-directory/soc-fpgas/5498-p... | jagger27 wrote: | AMD already has full-on Arm products. | | https://www.amd.com/en/amd-opteron-a1100 | sbrorson wrote: | You are right about the ARM cores, mostly. Xilinx Zynq devices have | ARM Cortex-A cores built into them as "hard" cores. That is, the | ARMs are instantiated directly in silicon, not as "soft" cores | which take LUTs (gates) from the FPGA fabric. The Cortex-A is a | microprocessor (not a microcontroller) powerful enough to run | Linux. | | The ARM connects to the FPGA fabric using a so-called AXI bus, | which is a local bus defined by ARM.
Xilinx supplies a bunch of | "soft" cores which you can instantiate in the FPGA and | integrate with the ARM. Of course, you can write your own logic | for the FPGA too, as long as you can figure out how to | interface to it using one of the AXI bus variants. | | Several vendors offer experimenter platforms that are | affordable enough for hobbyists and folks making engineering | prototypes. Examples are Avnet's ZedBoard and Digilent's | Zybo board. | | The biggest problem with the Zynq ecosystem is that the Xilinx | tools -- Vivado/SDK and whatever they renamed it to last year | -- are steaming piles of smelly brown stoff. Vivado is buggy, | poorly supported, has bad documentation, and the supplied | examples typically don't work in the latest version of Vivado | since they were written long ago and have been made obsolete | via version skew. An absolute disgrace compared to what | software engineers are used to. The SDK is basically Eclipse, | which has its own problems, but is not as bad as Vivado. Ask me | how I know. | | I think AMD and Xilinx have a long way to go before they can | satisfy the hype and speculation I see in all the posts here. I | suppose one could shell out $20K for a seat of Synopsys if one | wanted a decent set of dev tools, but that's not the direction | most software engineers are going nowadays. | | Also, assuming NVidia completes its acquisition of ARM, the | whole Zynq ecosystem is imperiled since it pits ARM against | NVidia. | ohazi wrote: | For decades, the FPGA vendors have had this fever dream of "an | FPGA in every PC" -- either as an add-on card, or as part of the | chipset on a motherboard -- that would enable a compiler or | operating system to seamlessly accelerate arbitrary tasks on | demand. | | In my opinion, the problem has always been their software: the | FPGA vendor tools are slow, bloated monstrosities.
The core of | these tools is written by the big three EDA vendors (Cadence, | Synopsys, and Mentor Graphics) rather than the FPGA vendors | themselves. The licenses include ridiculous, paranoid | restrictions [1] and force the FPGA vendors to keep their | bitstream formats and timing databases secret [2] in order to | prevent competition from other tool vendors. Most FPGA vendors | didn't see this as a problem, but even the ones that did didn't | have much of a choice, because the tool market is a cartel. | | Thankfully, we now have an open source toolchain [3] with support | for a growing number of FPGA architectures [4], and using it vs. | the vendor tools is like using gcc or llvm vs. a '90s era, non-compliant | C++ compiler. It even has a real IR that isn't Verilog, | which has made it easier to design new HDLs [5]. | | I don't see how a dynamic FPGA accelerator platform can be even | remotely viable without this. It's the difference between a | developer getting to choose one of a few dozen pre-baked | designs that lock up the entire FPGA (and needing to learn how to | shovel data into it), vs. a compiler flag that can give you the | option of unrolling any loop directly into any inactive region of | FPGA fabric. | | It would be quite the cherry on top to see AMD build something | interesting in this space. But unless they're willing to fully | unencumber at least this one design, I think the effort is likely | to fail. The open source guys are champing at the bit to make | this work, and have been making real progress lately. Meanwhile, | the EDA vendors have been making promises, failing, and throwing | tantrums for the last 20 years. It's time to write them off. | | [1] | https://twitter.com/OlofKindgren/status/1052822081652617221?... | | [2] Imagine trying to write an assembler without being allowed to | see the manual that tells you how instructions are encoded.
It's | like that, but the state-space is hundreds to thousands of bytes | in multiple configurations rather than a few dozen bits. | | [3] https://github.com/YosysHQ/yosys | | [4] https://symbiflow.github.io/ | | [5] https://github.com/m-labs/nmigen | travis729 wrote: | I would love to hack on FPGAs but always run into the issue of | closed toolchains. The recent open source work is a breath of | fresh air, but we need to see an FPGA vendor that embraces and | sponsors this work. | ohazi wrote: | I think/hope it's an unstable equilibrium -- if either | Altera/Intel or Xilinx/AMD give a nod to the open source | tools, the others will follow. | | Lattice is seemingly at "wink wink, nudge nudge" levels of | support -- their lawyers won't allow them to say anything | because they're afraid of pissing off Synopsys, but they also | know that they're currently the best supported platform, and | don't seem interested in deliberately making things | difficult. | mhh__ wrote: | Symbiflow is still a long long way off replacing vendor tools | at scale, right? | | I'm really liking Clash and Bluespec (Bluespec is completely | open source now) but I don't want to write any conventional | languages. | thrtythreeforty wrote: | What does Bluespec compile to? All the way to a bitstream | (surely not) or to Verilog or an intermediate language? | mhh__ wrote: | Firstly, (for the uninitiated) Bluespec is both a Haskell | DSL (Bluespec Classic) and a Verilog-like language | (Bluespec SystemVerilog) | | It compiles to Verilog, but the stack is much more | integrated than other similar compile-to-Verilog HDLs - the | simulator is similar to Verilator and much easier to get | started with.
| | I'm kind of beginning to feel that Haskell isn't a good | medium for HDL code - Verilog already encourages unreadable | names like "mem_chk_sig_state" and Haskell code is almost | unstructured to my eye (I like functional programming but | it seems hard to keep it readable because of the style it | imposes - the flow is there but the names are usually way | too short for my taste) | ohazi wrote: | I'm pretty sure Bluespec and SpinalHDL compile to Verilog. | Chisel uses its own IR (FIRRTL). I think Migen used to | target Verilog, but now targets (one of?) the IR(s) that | Yosys supports (RTLIL?). | [deleted] | Traster wrote: | I hate to be that bucket of cold water, but there are _multiple_ | reasons FPGAs haven't been successful in package with CPUs. | Firstly, the costs of embedding the FPGA - FPGAs are relatively | large and power hungry (for what they can do), if you're sticking | one on a CPU die, you're seriously talking about trading that | against other extremely useful logic. You really need to make a | judgement at purchase time whether you want that dark piece of | silicon instead of CPU cores for day to day use. | | Secondly, whilst they're reconfigurable, they're not | reconfigurable in the time scales it takes to spawn a thread, | it's more like the same scale of time to compile a program (this | is getting a little better over time). Which makes it a difficult | system design problem to make sure your FPGA is programmed with | the right image to run the software programme you want. If you're | at that level of optimization, why not just design your system to | use a PCI-E board, it'll give you more CPU, and way more FPGA | compute and both will be cheaper because you get a stock CPU and | stock FPGA, not some super custom FPGA-CPU hybrid chip. | | Thirdly, the programming model for FPGAs is fundamentally very | different to CPUs, it's dataflow, and generally the FPGA is | completely deterministic.
We really don't have a good answer for | writing FPGA logic to handle the sort of cache hierarchy and out-of-order | execution that CPUs do. So you're not getting the same sort | of advantage that you'd expect from that data locality. It's very | difficult to write CPU/FPGA programs that run concurrently, | almost all solutions today run in parallel - you package up your | work, send it off to the FPGA and wait for it to finish. | | Finally, as others have said - the tools are bad. That's | relatively solvable. | | For me, it boils down to this, if you have an application that | you think would be good on the same package as a CPU, it's | probably worth hardening it into an ASIC (see: error correction, | Apple's AI stuff). If you have an application that isn't, then a | PCI-E card is probably a better bet - you get more FPGA, more CPU | and you're not trading the two off. | imtringued wrote: | It's easier to provide "custom instructions" and only | accelerate CPU bottlenecks if you don't have PCIe as a massive | bottleneck. If you are using an accelerator behind a bus you | always have to make sure there is enough work for the | accelerator to justify a data transfer. GPUs are built around | the idea of batching a lot of work and running it in parallel. | You can make an FPGA work like that but you are throwing away | the low latency benefits of FPGAs. | wtallis wrote: | Even the best-case scenarios for integrating an FPGA onto the | same die as CPU cores would still have the FPGA _separate_ | from the CPU cores. It's really not possible to make an | open-ended high bandwidth low latency interface to a huge | chunk of FPGA silicon part of the regular CPU core's tightly-optimized | pipeline, without drastically slowing down that | CPU. The sane way to use an FPGA is as a coprocessor, not | grafted onto the processor core itself. Then, you're | interacting with the FPGA through interfaces like memory-mapped | IO whether it's on-die, on-package, or on an add-in | card.
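The coprocessor model wtallis describes -- write operands over memory-mapped IO, kick off the job, poll for completion -- can be sketched in Python. The register offsets, the "add" operation, and the `FakeMmioWindow` class below are all invented for illustration; a real driver would mmap a device BAR and take the layout from the device's datasheet.

```python
import struct

# Hypothetical register map for a memory-mapped FPGA coprocessor.
REG_OPERAND_A = 0x00
REG_OPERAND_B = 0x04
REG_CONTROL   = 0x08  # write 1 to start
REG_STATUS    = 0x0C  # reads 1 when done
REG_RESULT    = 0x10

class FakeMmioWindow:
    """Stand-in for an mmap'd register window; here the 'hardware' is simulated."""
    def __init__(self):
        self.mem = bytearray(0x20)

    def write32(self, off, val):
        struct.pack_into("<I", self.mem, off, val & 0xFFFFFFFF)
        if off == REG_CONTROL and val & 1:
            # The simulated accelerator computes a + b and raises the done bit.
            a, = struct.unpack_from("<I", self.mem, REG_OPERAND_A)
            b, = struct.unpack_from("<I", self.mem, REG_OPERAND_B)
            struct.pack_into("<I", self.mem, REG_RESULT, (a + b) & 0xFFFFFFFF)
            struct.pack_into("<I", self.mem, REG_STATUS, 1)

    def read32(self, off):
        return struct.unpack_from("<I", self.mem, off)[0]

def offload_add(mmio, a, b):
    """Driver-style sequence: write operands, kick, poll, read back."""
    mmio.write32(REG_OPERAND_A, a)
    mmio.write32(REG_OPERAND_B, b)
    mmio.write32(REG_CONTROL, 1)
    while mmio.read32(REG_STATUS) != 1:
        pass  # real code would bound this poll or sleep on an interrupt
    return mmio.read32(REG_RESULT)
```

The point of the sketch is the round-trip cost: even for a trivial add, the software side pays several uncached register accesses plus a poll, which is why the "package up work and wait" model only wins for large jobs.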
| Traster wrote: | Yeah, worth mentioning highly optimized FPGA designs run at | up to 600MHz (or to put it another way, 400MHz lower than | what Intel advertised 4 years ago). So at a minimum, you're | going to clock cross, have a >10 cycle pipeline at CPU | speeds (variable clock) and clock cross back. | jacoblambda wrote: | I definitely agree that a PCI-E card is preferable. Hell even | if you have it in the CPU, you probably want it sat on the PCI-E | bus anyways so it can P2P DMA with other hardware. | | Also (not disagreeing but I'm curious), last time I checked | FPGAs could pull off some level of partial reconfiguration in | the millisecond and sub millisecond ranges. I may be a bit off | on these times but I saw them in a research paper a few years | back. What types of speed would be necessary for CPUs to | actually be able to benefit from a small FPGA onboard (rather | than on an expansion card) with all the context switching? | user5994461 wrote: | Yet another patent that should never have been granted. | | SoCs have been a thing for a long time. SoC = CPU + FPGA on a | single chip. | | Looking at the patent, the list of 20 claims is absurd. The title | says it all "... PROGRAMMABLE INSTRUCTIONS IN COMPUTER SYSTEMS", | they're trying to patent anything that can run or dispatch | instructions. | mhh__ wrote: | Ironically the neural engine patent is literally the only | public information on how it works I can find | refulgentis wrote: | >> the list of 20 claims is absurd. | | Claims are a union - each individual claim may sound simple, | what matters is the combination. | | >> The title says it all "... PROGRAMMABLE INSTRUCTIONS IN | COMPUTER SYSTEMS", they're trying to patent anything that can | run or dispatch instructions. | | No. The title of a patent is not a patent. | fvv wrote: | Claims define the context and boundaries of the patent | user5994461 wrote: | Every claim is almost a patent on its own.
Submit 20 claims | that are progressively more specific, so if one claim is | denied during the patent application or afterwards, the other | claims can still stand. | | The typical strategy is to claim as many things as you can | imagine, like the CPU itself, anything that can evaluate an | instruction, and instructions themselves, then remove any | claim that the patent office refuses to grant. | cptskippy wrote: | That's how the industry works. You gather and hoard as many | frivolous patents as you can in a cold war arms race. If a new | company threatens your business, you search your portfolio for | a patent they violated and sue them. | | Companies who grow to a certain size look to be acquired by | larger firms with bigger war chests. | | Sometimes companies recognize patents are stifling progress and | engage in cross licensing or pooling of patents. Sometimes they | do it to gang up on a new rival. | economusty wrote: | Computronium | d_tr wrote: | The main reason I am interested in this acquisition is a (faint) | hope that they open some specs up to help projects like | SymbiFlow. | Scene_Cast2 wrote: | A killer tech for this would be a framework that automatically | reprograms the FPGA and offloads the work if it makes sense. For | example - running k-means? Have your FPGA automatically (with | minimal dev effort) flash to be a Nearest Neighbor accelerator. | | The problem is finding a way to make that translation happen with | minimal dev effort, as software is written rather differently | from hardware. | cashsterling wrote: | I recommend checking out CacheQ: https://cacheq.com/ | | They are working on almost exactly this. If I were an investor, | or Intel or AMD, I would buy them and/or invest heavily. | therealcamino wrote: | Their web site is very sparse on what programming models the | tool supports.
Traditionally, the things you can easily | accelerate automatically are algorithms you can write | naturally in Fortran 77 (lots of arrays, no pointers), and | that's one limit on the applicability of these automatic | tools. (Other limits that other posters have pointed out are | compilation+place+route runtime, and reconfiguration time.) | | They are claiming you can use malloc and make "extensive" use | of pointers in C programs and still have them automatically | compiled for the FPGA. That's where details are needed and | they are mostly missing. | | I watched their 30-minute demo video. The speedups are | impressive, and on the small example it's impressive that it | does the partitioning automatically. However, the program | contains only a single call to malloc, and all pointers are | derived from that address, so it doesn't do much to convince | us that the memory model and alias analysis give you more | flexibility than the F77 model. | d_tr wrote: | You might want to check the "Warp Processing" project out: | http://www.cs.ucr.edu/~vahid/warp/. It is probably exactly what | you are thinking about. Transparent analysis of the instruction | stream at runtime and synthesis and offloading of hot spots to | the FPGA. | Scene_Cast2 wrote: | Huh, interesting. It seems that the work doesn't have to be | explicitly parallel for this to work, which is a surprise. | rch wrote: | I recall reading papers about doing this by profiling Java apps | a decade or so ago, but I would have to dig pretty deep in my | HN comment history to find them. | | The approach seems conceptually similar to the optimizations | available via the enterprise version of GraalVM. | BryanBeshore wrote: | Lisa Su is a fantastic CEO. Time will tell what the impact of | AMD's acquisition of Xilinx will be (should it close), but this | shows the strategy and execution behind Su and team. | | While a lot of acquisitions don't pan out, this seems great.
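The F77-style restriction therealcamino describes above can be made concrete with a small Python sketch: the array-style loop below has affine indices and dependences a compiler or HLS tool can analyze statically, while the pointer-chasing loop does not. Both functions are illustrative only, not taken from CacheQ or any real HLS flow.

```python
# F77-style: dense arrays, affine indices. Every iteration is independent,
# so an automatic tool can unroll and pipeline this loop onto FPGA fabric.
def saxpy(a, xs, ys):
    return [a * x + y for x, y in zip(xs, ys)]

# Pointer-chasing style: each step depends on dereferencing the previous
# node, which defeats static dependence analysis and parallel offload.
class Node:
    def __init__(self, val, nxt=None):
        self.val = val
        self.next = nxt

def sum_list(head):
    total = 0
    while head is not None:
        total += head.val
        head = head.next
    return total
```

Both compute something trivial; the difference is that the first loop's memory accesses are fully known at compile time, while the second's are only known after each load completes.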
| parsimo2010 wrote: | AMD purchasing Xilinx is a reaction to Intel purchasing Altera | five years ago. Dr. Su might be a good CEO for other reasons, | but this isn't something that illustrates brilliant strategy on | her part. | cptskippy wrote: | The industry doesn't move overnight. AMD might have seen | where Intel was going and didn't want to be caught off guard, | or that might be the alternative to Apple's approach of dozens | of coprocessors on a chip. | BryanBeshore wrote: | As I said, time will tell | rusticpenn wrote: | Intel has not produced anything worthwhile from that strategy | yet and I have seen no plans either. I use Altera for all my | FPGA needs. | ATsch wrote: | A large reason for the deal with Altera was that Altera | already used Intel for fabrication. I understand Intel's | 10nm and 7nm failures have hurt them a lot in that regard, | quite the opposite of the expected synergy. Unlike Xilinx | for AMD, they didn't really have any other technologies | Intel needed either, the biggest advantage was fabrication | and that fell through. | sitkack wrote: | This is AMD competing with Nvidia, not AMD competing with | Intel. | GeorgeTirebiter wrote: | Xilinx had laid off a good chunk right before their sale to | AMD. Xilinx was having some financial troubles; when that | happens, investors want out before a company craters. So | selling themselves was one possible solution. | ATsch wrote: | I think it's more a reaction to the decreasing importance of | CPUs in the datacenter in favor of interconnect technology. | FPGAs are one of the directions in which the "smart nic" or | "DPU" tech has been moving, which is critical to the trend of | datacenter disaggregation. Xilinx has a very strong offering | in that regard. | baybal2 wrote: | It is not a trend at all if you look at market data. | | The prime majority of the hosting market still goes to bog-standard | servers, not even blades.
| | I'll wait for "clouds" to get to significant double-digit | market share first. | ATsch wrote: | If you look at market data, you can see that this market | did not exist a few years ago and is now estimated to be | worth billions, with major players releasing products in | the space. Unless the dynamics pushing this forward | change overnight, I think it's pretty safe to call it a | trend. | DCKing wrote: | They're going to need good leadership to pull this off. AMD | doesn't have a great track record when it comes to these | integrations. | | AMD bought ATI while promising the same integration | "synergies". GPU-style compute was going to be completely woven | into the CPU - "AMD Fusion". Sounds great - but they ended up | being beaten to the CPU-with-integrated-GPU market by | Intel by over a year (Intel Clarkdale launched January 2010, | AMD Llano mid-2011). 14 years after the acquisition, AMD's | iGPU integration is not much different from any other | iGPU integration, their raw performance lead is shrinking | compared to Intel and they're beaten by Apple. Radeon | Technologies Group functionally operates independently within | the company, and AMD won't use their more performant new RDNA | architecture in iGPUs for two years after its launch for some | reason - even their 2021 APUs still use their 2017 Vega | architecture (fundamentally based on 2012 GCN technology). In | the intervening years they've screwed up their processor | architecture and market share by going all-in on the | terrible Bulldozer architecture that was designed around the | broken promises of far-reaching GPU integration. | | Given all that, the ATI acquisition might still have been worth | it - in hindsight AMD needed a competent GPU architecture one | way or another - but the mismanagement of this acquisition | nearly killed the company. I hope better leadership can do | something here but I'm not really holding my breath. | atq2119 wrote: | Agreed.
Now to be fair, the acquisition is also what helped | the company survive because it got them the console business. | So it's not like it was completely botched. | | They screwed up majorly with software, and they may have the | same problem with an FPGA acquisition as well. AMD failed big | time to capitalize on GPUs the way Nvidia did, and that's | really almost entirely down to lack of good software | solutions. There's ROCm now and it seems plausible that the | gap is going to narrow further with AMD GPUs deployed to big | HPC clusters, but a gap remains. | m4rtink wrote: | Aren't all the new desktop consoles and the generation before | that based on AMD CPU and GPU fused together in a specific | way? | wtallis wrote: | The consoles use AMD SoCs that include CPU and GPU cores, | but there's nothing special about how the CPU and GPU are | connected. The only remotely unusual aspect there is that | many of the console SoCs connect GDDR5/6 to the SoC's | shared memory controller, while other consumer devices | using similar chips (marketed by AMD as APUs) tend to use | DDR4 or LPDDR. | qwerty456127 wrote: | I could never stop wondering why this is not the norm yet. Why | doesn't every computer have an FPGA? | mhh__ wrote: | Probably power, getting the data onto the FPGA, and utilising | FPGAs being unlike software. | | I definitely want one but any common task worth having on an | FPGA is probably common enough to justify either a GPU or | actual silicon. | | Intel and AMD both have the IP to do it, and iPhones do have a | Lattice chip on them apparently | rowanG077 wrote: | Once partial reconfiguration works and the FPGA can access | main memory directly I see a lot of use cases. Imagine | applications reconfiguring the FPGA in the blink of an eye to | optimize their own algorithms. | rjsw wrote: | There have been PCI FPGA boards available for a long time | that can access main memory, I had them in my desktop | machines nearly 20 years ago.
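The "reconfigure in the blink of an eye" idea above, combined with the slot-based queuing m4rtink suggests further down the thread, might look something like the toy Python model below. Everything here is hypothetical -- the class, the two-slot policy, and the idea that an OS service would arbitrate partial-reconfiguration regions this way.

```python
from collections import deque

class FpgaSlotManager:
    """Toy model of OS-arbitrated partial-reconfiguration 'slots':
    N regions of fabric, apps queue for a slot and release it when done."""
    def __init__(self, num_slots=2):
        self.free = list(range(num_slots))
        self.loaded = {}        # slot -> bitstream name
        self.waiting = deque()  # queued (app, bitstream) pairs

    def request(self, app, bitstream):
        """Grant a slot immediately if one is free, else queue the app."""
        if self.free:
            slot = self.free.pop()
            self.loaded[slot] = bitstream  # stands in for partial reconfig
            return slot
        self.waiting.append((app, bitstream))
        return None  # caller must wait for another app to terminate

    def release(self, slot):
        """Free a slot; hand it straight to the next queued app, if any."""
        del self.loaded[slot]
        if self.waiting:
            app, bitstream = self.waiting.popleft()
            self.loaded[slot] = bitstream
            return app  # this app now owns the slot
        self.free.append(slot)
        return None
```

With two slots, a video decoder and a compile accelerator can coexist, and a third request simply waits -- exactly the "politely tell you to wait" behaviour described below.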
| rowanG077 wrote: | Yes, through the PCI bus, not directly. You don't want to | have that latency. You want a unified model. Like Intel | GPUs that can access main memory, or the FPGA being | another endpoint in AMD's Infinity Fabric architecture. | That exists as well in SoCFPGA boards. But not in the mid | or high performance segments. | rjsw wrote: | Back when AMD released the first Opteron CPUs there was a | vendor selling an FPGA that would plug into an Opteron | socket along with the IP to implement HyperTransport in | the FPGA. | mhh__ wrote: | Attacking a hypothetical poorly isolated on-chip FPGA | seems like the mother of all exploits, thinking about it | rowanG077 wrote: | Why? To make an FPGA do what you want you need to be able | to reconfigure it. To use that reconfiguration capability | an attacker needs remote code execution. And in that case | you have already lost. | mhh__ wrote: | As in, the FPGA would have to be carefully segmented so | the accelerator couldn't be used to access memory it | shouldn't have access to. | | I don't think it would happen in a general purpose chip | but I could see it happening in a smaller one like the | exploits Christopher Domas demonstrated against some | embedded x86 cores. | rowanG077 wrote: | Why though? Your integrated Intel or AMD GPU can also | access all of your memory. I don't see how an FPGA | provides any additional attack vector. As I said you'd | need code execution privileges anyway and once you have | that your system is already owned. | rjsw wrote: | The boards that I have used could not reprogram the FPGA | over the PCI bus. | mhh__ wrote: | I was thinking aloud about the memory rather than the | actual FPGA bitstream | imtringued wrote: | Existing FPGA vendors made sure their products remained in a | lucrative niche by maintaining full control over the | development process for FPGA designs.
| amelius wrote: | My guess: because FPGAs are slow compared to mainstream desktop | CPUs and only make sense if you have massive parallelism. But | then you'd need a massive FPGA which would be crazy expensive, | plus you'd need a good way to handle throughput. | | I could be totally wrong, though. | atq2119 wrote: | That, plus programming FPGAs kind of sucks. The software tool | chains are somewhere between 20 and 30 years behind the state | of the art for software development. | | Also, FPGAs can't be reasonably context-switched. Flashing | them takes a significant amount of time, so forget about | time-multiplexing access to the FPGA among different | applications. | m4rtink wrote: | I could imagine some sort of API-based queuing - say you | have 2 "slots" you can program stuff on so if you play 8K | video you can have one flashed as a video decoder while the | other one can speed up your kernel compilation. If you then | want to also use FPGA-accelerated denoising on some video | you recently recorded, the OS will politely tell you to | wait for one of the other apps using the available slots to | terminate first. | amelius wrote: | Is there even any progress in OSes with respect to how | they deal with tasks/processes on GPUs? | atq2119 wrote: | Progress relative to what? | | Since applications do all their rendering via the GPU | these days, desktop multi-tasking requires reasonably | time-sliced access to the GPU. GPUs have proper memory | protection these days (GPU-side page tables for each | process). That's big progress over 10 years ago. | ineedasername wrote: | Sounds like spending a few hours a month learning an HDL could be | a good long-term career decision. | deelowe wrote: | Anyone who is considering this, make sure you learn digital | circuits first. | seabird wrote: | You're going to need to commit a lot more time than that.
HDLs | and the surrounding concepts have key fundamental differences | from software that a lot of developers have a hard time | stomaching. That's why high-level synthesis is the FPGA | industry's City of El Dorado; software developers would be able | to create acceleration designs without having to build up a | fairly large new skillset. | imtringued wrote: | I've never understood this argument. The change in mindset is | extremely small. It's merely a matter of awareness. High | level synthesis can work just fine if you don't go overboard | with constructs that are hard to synthesize. There is no | fundamental reason why a math equation in C should be harder | to synthesize than the Verilog or VHDL equivalent. | Nullabillity wrote: | The dataflow dialect of VHDL instantly felt really natural to | me, coming from FRP (among a bunch of other stuff). | | Of course, using it in industry is presumably pretty | different from using it for a few school courses. | efferifick wrote: | While sibling comments mention that it is probably wiser to | learn digital logic before an HDL (and I agree with them), I think | it is important to also consider that there is now High Level | Synthesis where programming languages similar to C (e.g., | OpenCL) can compile to VHDL. HLS may lower the barrier for | programmers to take advantage of FPGAs. However, whether the | design can compile to fit the constraints of the FPGA available | is another question to which I do not know the answer. | nsajko wrote: | I think the right way isn't "learn an HDL", it's "learn digital | electronics design". Hardware description languages enable | succinct hardware description, but it's still necessary to keep | an image of the actual hardware in mind. | ip26 wrote: | HDL is really just ASCII schematics. | GuB-42 wrote: | Everyone seems to be talking about accelerated instructions but | how about I/O? | | FPGAs are awesome at asynchronous I/O and low latency.
We could | implement network stacks, sound and video processing, etc... It | can start a TLS handshake as soon as the electrical signal hits | the Ethernet port, while the CPU is not even aware of it | happening. It can timestamp MIDI input down to the microsecond | and replay with the same precision. It can process position data | from a VR headset at the very last moment in the graphics | pipeline. Maybe even do something like a software-defined radio. | | Basically, every simple but latency-critical operation. Of | course, embedded/realtime systems are a prime target. | slimsag wrote: | A fair number of enterprise NICs in data centers do exactly | this, e.g. Intel FPGA smart NICs | | I don't know enough to know how this being on the CPU would | affect performance in this scenario, but I'd love to learn | more! | leecb wrote: | Everything described in the article sounds exactly like some of | the Virtex*-FX products from more than 10 years ago. | | For instance, the Virtex4-FX had either one or two 450MHz PowerPC | cores embedded in it, where you could implement 8 of your own | additional instructions in the FPGA. This is effectively now a | CPU where you can extend the instruction set, and design your own | instructions specific to your application. For example, you might | make special instructions using the onboard logic to accelerate | video compression, or math operations; I know of one application | that was designed to do a 4x4 matrix multiply per cycle. | | https://www.digikey.com/catalog/en/partgroup/virtex-4-fx-ser... | https://www.xilinx.com/support/documentation/data_sheets/ds1... | mhh__ wrote: | What was the latency like to actually get data into your shiny | new instruction e.g. do I get a 14-stage pipeline stall to | actually use the instruction? | rowanG077 wrote: | That depends on how you designed your instruction.
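The 4x4-matrix-multiply instruction leecb mentions has a simple golden model: one fused operation producing all 16 dot products that the FPGA datapath would compute in a single cycle. A Python reference, purely illustrative -- it defines the result such a custom instruction must produce, not how the hardware computes it:

```python
def matmul4x4(a, b):
    """Golden reference for a hypothetical fused 4x4-matmul instruction.
    a and b are 4x4 matrices as nested lists; the hardware version would
    produce the same 16 dot products in parallel in one cycle."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]
```

In a real flow this kind of model is what you verify the custom-instruction RTL against: feed both the same operands and compare outputs element by element.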
| sitkack wrote: | And your pipeline | thrtythreeforty wrote: | For those curious, Xtensa is a similar embeddable architecture | (known especially for its use in the ESP32 microcontroller) | that allows broad latitude to the designer to customize its | instruction set with custom acceleration. The integration is | very good, the compiler recognizes the new intrinsics and the | designer has control over how the instruction is pipelined into | the main processor. | | Unfortunately it's very proprietary, and as far as I know there | isn't an at-home version you can play with on FPGAs. But this | kind of thing does exist if you can afford it - you don't have | to roll your own RTL. | nynx wrote: | This is exciting! Would be cool if it could access some sort of | GPIO as well! ___________________________________________________________________ (page generated 2021-01-03 23:00 UTC)