[HN Gopher] AMD Patent Reveals Hybrid CPU-FPGA Design That Could...
       ___________________________________________________________________
        
       AMD Patent Reveals Hybrid CPU-FPGA Design That Could Be Enabled by
       Xilinx Tech
        
       Author : craigjb
       Score  : 137 points
       Date   : 2021-01-03 17:39 UTC (5 hours ago)
        
 (HTM) web link (hothardware.com)
 (TXT) w3m dump (hothardware.com)
        
       | rwmj wrote:
       | About *!$% time! I was hoping Intel would do something like this
       | when they acquired Altera a few years back. Does anyone know why
       | Intel acquired Altera?
        
         | PedroBatista wrote:
          | Almost the same reason someone buys a Peloton bike or a
          | rusted old Porsche: because someone had a dream last night
          | and has the money.
        
         | d_tr wrote:
         | AFAIK there exist some Xeon + FPGA chips. No clue about
         | availability though...
        
       | mhh__ wrote:
        | Xilinx already have ARM cores in their FPGAs, so I wonder
        | which way they'll go - I'd honestly prefer a Neoverse core to
        | an x86.
        
         | efferifick wrote:
         | Not sure how realistic it would be, but I would like to see a
         | RISC-V base core, and the FPGA implementing the extensions.
         | Why? Because it would be cool! Also, I don't really see a use
         | case except for debugging compilers supporting multiple RISC-V
         | extensions and what not.
        
           | craigjb wrote:
           | Microchip has the product for you then! Well, the RiscV part
           | anyway. https://www.microsemi.com/product-directory/soc-
           | fpgas/5498-p...
        
         | jagger27 wrote:
         | AMD already has full-on Arm products.
         | 
         | https://www.amd.com/en/amd-opteron-a1100
        
         | sbrorson wrote:
          | You are right about the ARM cores, mostly. Xilinx Zynq
          | devices have ARM Cortex-A cores built into them as "hard"
          | cores. That is, the ARMs are instantiated directly in
          | silicon, not as "soft" cores which take LUTs (gates) from
          | the FPGA fabric. The Cortex-A is a microprocessor (not a
          | microcontroller) powerful enough to run Linux.
         | 
         | The ARM connects to the FPGA fabric using a so-called AXI bus,
         | which is a local bus defined by ARM. Xilinx supplies a bunch of
         | "soft" cores which you can instantiate in the FPGA and
         | integrate with the ARM. Of course, you can write your own logic
         | for the FPGA too, as long as you can figure out how to
         | interface to it using one of the AXI bus variants.
         | 
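          | As a rough illustration of that memory-mapped model: from
          | Linux on the ARM side, a user-space driver typically mmaps
          | the peripheral's AXI-lite register window and pokes 32-bit
          | registers. The sketch below is runnable anywhere because it
          | maps an anonymous buffer instead of /dev/mem, and the
          | register offsets are made-up placeholders, not real Xilinx
          | IP registers.

```python
import mmap
import struct

# Hypothetical register offsets -- placeholders for illustration only.
# On real hardware these come from the Vivado address editor / device
# tree, and the base is a physical address mapped via /dev/mem.
REG_CTRL   = 0x00
REG_STATUS = 0x04
REG_DATA   = 0x08

class AxiLiteRegs:
    """Minimal 32-bit memory-mapped register accessor."""

    def __init__(self, length=4096):
        # Stand-in for: open("/dev/mem") + mmap at the peripheral's base.
        self.mem = mmap.mmap(-1, length)

    def read32(self, offset):
        return struct.unpack_from("<I", self.mem, offset)[0]

    def write32(self, offset, value):
        struct.pack_into("<I", self.mem, offset, value & 0xFFFFFFFF)

regs = AxiLiteRegs()
regs.write32(REG_CTRL, 0x1)         # e.g. set a "start" bit
regs.write32(REG_DATA, 0xDEADBEEF)  # feed the accelerator a word
print(hex(regs.read32(REG_DATA)))   # -> 0xdeadbeef
```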
         | Several vendors offer experimenters platforms which are
         | affordable enough for hobbyists and folks making engineering
         | prototypes. Examples are the Avnet's Zed board and Digilent's
         | Zybo board.
         | 
         | The biggest problem with the Zynq ecosystem is that the Xilinx
         | tools -- Vivado/SDK and whatever they renamed it to last year
          | -- are steaming piles of smelly brown stuff. Vivado is buggy,
         | poorly supported, has bad documentation, and the supplied
         | examples typically don't work in the latest version of Vivado
         | since they were written long ago and have been made obsolete
         | via version skew. An absolute disgrace compared to what
         | software engineers are used to. The SDK is basically Eclipse
         | which has its own problems, but is not as bad as Vivado. Ask me
         | how I know.
         | 
         | I think AMD and Xilinx have a long way to go before they can
         | satisfy the hype and speculation I see in all the posts here. I
         | suppose one could shell out $20K for a seat of Synopsys if one
         | wanted a decent set of dev tools, but that's not the direction
         | most software engineers are going nowadays.
         | 
          | Also, assuming NVidia completes its acquisition of ARM, the
          | whole Zynq ecosystem is imperiled, since it would leave
          | AMD/Xilinx dependent on rival NVidia for its ARM cores.
        
       | ohazi wrote:
       | For decades, the FPGA vendors have had this fever dream of "an
       | FPGA in every PC" -- either as an add-on card, or as part of the
       | chipset on a motherboard -- that would enable a compiler or
       | operating system to seamlessly accelerate arbitrary tasks on
       | demand.
       | 
       | In my opinion, the problem has always been their software: the
       | FPGA vendor tools are slow, bloated monstrosities. The core of
       | these tools are written by the big three EDA vendors (Cadence,
       | Synopsys, and Mentor Graphics) rather than the FPGA vendors
       | themselves. The licenses include ridiculous, paranoid
       | restrictions [1] and force the FPGA vendors to keep their
       | bitstream formats and timing databases secret [2] in order to
       | prevent competition from other tool vendors. Most FPGA vendors
       | didn't see this as a problem, but even the ones that did didn't
       | have much of a choice, because the tool market is a cartel.
       | 
       | Thankfully, we now have an open source toolchain [3] with support
       | for a growing number of FPGA architectures [4], and using it vs.
       | the vendor tools is like using gcc or llvm vs. a '90s era, non-
       | compliant C++ compiler. It even has a real IR that isn't Verilog,
       | which has made it easier to design new HDLs [5].
       | 
       | I don't see how a dynamic FPGA accelerator platform can be even
       | remotely viable without this. It's the difference between a
       | developer getting to choose between one of a few dozen pre-baked
       | designs that lock up the entire FPGA (and needing to learn how to
       | shovel data into it), vs. a compiler flag that can give you the
       | option of unrolling any loop directly into any inactive region of
       | FPGA fabric.
       | 
       | It would be quite the cherry on top to see AMD build something
       | interesting in this space. But unless they're willing to fully
       | unencumber at least this one design, I think the effort is likely
       | to fail. The open source guys are chomping at the bit to make
       | this work, and have been making real progress lately. Meanwhile,
       | the EDA vendors have been making promises, failing, and throwing
       | tantrums for the last 20 years. It's time to write them off.
       | 
       | [1]
       | https://twitter.com/OlofKindgren/status/1052822081652617221?...
       | 
       | [2] Imagine trying to write an assembler without being allowed to
       | see the manual that tells you how instructions are encoded. It's
       | like that, but the state-space is hundreds to thousands of bytes
       | in multiple configurations rather than a few dozen bits.
       | 
       | [3] https://github.com/YosysHQ/yosys
       | 
       | [4] https://symbiflow.github.io/
       | 
       | [5] https://github.com/m-labs/nmigen
        
         | travis729 wrote:
         | I would love to hack on FPGAs but always run into the issue of
         | closed toolchains. The recent open source work is a breath of
         | fresh air, but we need to see an FPGA vender that embraces and
         | sponsors this work.
        
           | ohazi wrote:
           | I think/hope it's an unstable equilibrium -- if either
           | Altera/Intel or Xilinx/AMD give a nod to the open source
           | tools, the others will follow.
           | 
           | Lattice is seemingly at "wink wink, nudge nudge" levels of
           | support -- their lawyers won't allow them to say anything
           | because they're afraid of pissing off Synopsys, but they also
           | know that they're currently the best supported platform, and
           | don't seem interested in deliberately making things
           | difficult.
        
         | mhh__ wrote:
         | Symbiflow is still a long long way off replacing Vendor tools
         | at scale, right?
         | 
         | I'm really liking Clash and Bluespec (Bluespec is completely
         | open source now) but I don't want to write any conventional
         | languages.
        
           | thrtythreeforty wrote:
           | What does Bluespec compile to? All the way to a bitstream
           | (surely not) or to Verilog or an intermediate language?
        
             | mhh__ wrote:
              | Firstly, (for the uninitiated) Bluespec is both a
              | Haskell DSL (Bluespec Classic) and a Verilog-like
              | language (Bluespec SystemVerilog).
             | 
             | It compiles to Verilog, but the stack is much more
             | integrated than other similar compile-to-verilog HDLs - the
             | simulator is similar to verilator and much easier to get
             | started with.
             | 
             | I'm kind of beginning to feel that Haskell isn't a good
             | medium for HDL code - Verilog already encourages unreadable
             | names like "mem_chk_sig_state" and Haskell code is almost
             | unstructured to my eye (I like functional programming but
             | it seems hard to keep it readable because of the style it
             | imposes - the flow is there but the names are usually way
             | too short for my taste)
        
             | ohazi wrote:
              | I'm pretty sure Bluespec and SpinalHDL compile to
              | Verilog. Chisel uses its own IR (FIRRTL). I think Migen
              | used to target Verilog, but now targets (one of?) the
              | IR(s) that Yosys supports (RTLIL?).
        
       | [deleted]
        
       | Traster wrote:
        | I hate to be that bucket of cold water, but there are
        | _multiple_ reasons FPGAs haven't been successful in-package
        | with CPUs.
       | Firstly, the costs of embedding the FPGA - FPGAs are relatively
       | large and power hungry (for what they can do), if you're sticking
       | one on a CPU die, you're seriously talking about trading that
       | against other extremely useful logic. You really need to make a
       | judgement at purchase time whether you want that dark piece of
       | silicon instead of CPU cores for day to day use.
       | 
        | Secondly, whilst they're reconfigurable, they're not
        | reconfigurable on the time scale it takes to spawn a thread;
        | it's more like the time scale of compiling a program (this is
        | getting a little better over time). That makes it a difficult
        | system design problem to make sure your FPGA is programmed with
        | the right image to run the software program you want. If you're
       | at that level of optimization, why not just design your system to
       | use a PCI-E board, it'll give you more CPU, and way more FPGA
       | compute and both will be cheaper because you get a stock CPU and
       | stock FPGA, not some super custom FPGA-CPU hybrid chip.
       | 
        | Thirdly, the programming model for FPGAs is fundamentally very
        | different from CPUs: it's dataflow, and generally the FPGA is
       | completely deterministic. We really don't have a good answer for
       | writing FPGA logic to handle the sort of cache hierarchy, out of
       | order execution that CPUs do. So you're not getting the same sort
       | of advantage that you'd expect from that data locality. It's very
       | difficult to write CPU/FPGA programs that run concurrently,
       | almost all solutions today run in parallel - you package up your
       | work, send it off to the FPGA and wait for it to finish.
       | 
       | Finally, as others have said - the tools are bad. That's
       | relatively solvable.
       | 
       | For me, it boils down to this, if you have an application that
       | you think would be good on the same package as a CPU, it's
       | probably worth hardening it into ASIC (see: error correction,
       | Apple's AI stuff). If you have an application that isn't, then a
       | PCI-E card is probably a better bet - you get more FPGA, more CPU
       | and you're not trading the two off.
        
         | imtringued wrote:
         | It's easier to provide "custom instructions" and only
         | accelerate CPU bottlenecks if you don't have PCIe as a massive
         | bottleneck. If you are using an accelerator behind a bus you
         | always have to make sure there is enough work for the
         | accelerator to justify a data transfer. GPUs are built around
         | the idea of batching a lot of work and running it in parallel.
         | You can make an FPGA work like that but you are throwing away
         | the low latency benefits of FPGAs.
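          | To make that batching trade-off concrete, here is a crude,
          | hypothetical latency model (all numbers invented for
          | illustration): offload only wins once the batch is large
          | enough to amortize the fixed per-batch overhead of the bus.

```python
def offload_wins(n_items, cpu_ns_per_item, accel_ns_per_item,
                 transfer_ns_per_item, fixed_overhead_ns):
    """Crude model (assumption: fully serialized transfer, then
    compute): offloading pays only when the accelerator's speedup
    outweighs the per-item transfer cost plus fixed per-batch
    overhead."""
    cpu_time = n_items * cpu_ns_per_item
    accel_time = (fixed_overhead_ns
                  + n_items * (transfer_ns_per_item + accel_ns_per_item))
    return accel_time < cpu_time

# Tiny batch over a bus: fixed overhead dominates, the CPU wins.
print(offload_wins(100, 10, 1, 2, 5000))     # -> False
# Large batch amortizes the overhead: the accelerator wins.
print(offload_wins(100000, 10, 1, 2, 5000))  # -> True
```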
        
           | wtallis wrote:
            | Even the best-case scenarios for integrating an FPGA onto
            | the same die as CPU cores would still have the FPGA
            | _separate_ from the CPU cores. It's really not possible to
            | make an
           | open-ended high bandwidth low latency interface to a huge
           | chunk of FPGA silicon part of the regular CPU core's tightly-
           | optimized pipeline, without drastically slowing down that
           | CPU. The sane way to use an FPGA is as a coprocessor, not
           | grafted onto the processor core itself. Then, you're
           | interacting with the FPGA through interfaces like memory-
           | mapped IO whether it's on-die, on-package, or on an add-in
           | card.
        
             | Traster wrote:
              | Yeah, worth mentioning highly optimized FPGA designs run
              | at up to 600MHz (or to put it another way, 400MHz lower
              | than what Intel advertised 4 years ago). So at a
              | minimum, you're going to clock cross, have a >10 cycle
              | pipeline at CPU speeds (variable clock), and clock cross
              | back.
        
         | jacoblambda wrote:
         | I definitely agree that a PCI-E card is preferable. Hell even
         | if you have it in CPU, you probably want it sat on the PCI-E
         | bus anyways so it can P2P DMA with other hardware.
         | 
          | Also (not disagreeing, but I'm curious): last time I
          | checked, FPGAs could pull off some level of partial
          | reconfiguration in the millisecond and sub-millisecond
          | ranges. I may be a bit off on these times, but I saw them in
          | a research paper a few years back. What kind of speed would
          | be necessary for CPUs to actually benefit from a small FPGA
          | onboard (rather than on an expansion card), with all the
          | context switching?
        
       | user5994461 wrote:
       | Yet another patent that should never have been granted.
       | 
       | SoC have been a thing for a long time. SoC = CPU + FPGA on a
       | single chip.
       | 
       | Looking at the patent, the list of 20 claims is absurd. The title
       | says it all "... PROGRAMMABLE INSTRUCTIONS IN COMPUTER SYSTEMS",
       | they're trying to patent anything that can run or dispatch
       | instructions.
        
         | mhh__ wrote:
          | Ironically, the neural engine patent is literally the only
          | public information I can find on how it works.
        
         | refulgentis wrote:
         | >> the list of 20 claims is absurd.
         | 
          | Claims are a union - each individual claim may sound simple;
          | what matters is the combination.
         | 
         | >> The title says it all "... PROGRAMMABLE INSTRUCTIONS IN
         | COMPUTER SYSTEMS", they're trying to patent anything that can
         | run or dispatch instructions.
         | 
         | No. The title of a patent is not a patent.
        
           | fvv wrote:
           | Claims define the context and boundaries of the patent
        
           | user5994461 wrote:
           | Every claim is almost a patent on its own. Submit 20 claims
           | that are progressively more specific, so if one claim is
           | denied during the patent application or afterwards, the other
           | claims can still stand.
           | 
           | Typical strategy is to claim as many things as you can
           | imagine, like inventing CPU and anything that can evaluate an
           | instruction and instructions themselves, then remove any
           | claim that the patent office refuses to grant.
        
         | cptskippy wrote:
         | That's how the industry works. You gather and hoard as many
         | frivolous patents as you can in a cold war arms race. If a new
         | company threatens your business, you search your portfolio for
         | a patent they violated and sue them.
         | 
         | Companies who grow to a certain size look to be acquired by
         | larger firms with bigger war chests.
         | 
         | Sometimes companies recognize patents are stifling progress and
         | engage in cross licensing or pooling of patents. Sometimes they
         | do it to gang up on a new rival.
        
       | economusty wrote:
       | Computronium
        
       | d_tr wrote:
       | The main reason I am interested in this acquisition is a (faint)
       | hope that they open some specs up to help projects like
       | SymbiFlow.
        
       | Scene_Cast2 wrote:
       | A killer tech for this would be a framework that automatically
       | reprograms the FPGA and offloads the work if it makes sense. For
       | example - running k-means? Have your FPGA automatically (with
       | minimal dev effort) flash to be a Nearest Neighbor accelerator.
       | 
       | The problem is finding a way to make that translation happen with
       | minimal dev effort, as software is written rather differently
       | from hardware.
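        | For a sense of why k-means is a tempting target: the hot loop
        | below is a fixed-bound reduction over flat arrays with no
        | pointer chasing, which is exactly the shape that maps well
        | onto an FPGA dataflow pipeline (a pure-Python sketch; names
        | and data are invented for illustration).

```python
def nearest_neighbor(query, points):
    """The inner loop a k-means step spends its time in: for one query
    point, find the index of the closest centroid by squared Euclidean
    distance. Fixed trip count, streaming reads, one running minimum --
    a natural fit for a pipelined hardware datapath."""
    best_i, best_d = 0, float("inf")
    for i, p in enumerate(points):
        d = sum((a - b) ** 2 for a, b in zip(query, p))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
print(nearest_neighbor((4.0, 4.5), centroids))  # -> 1
```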
        
         | cashsterling wrote:
         | I recommend checking out CacheQ: https://cacheq.com/
         | 
         | they are working on almost exactly this. If I was an investor,
         | or Intel or AMD, I would buy them and/or invest heavily.
        
           | therealcamino wrote:
           | Their web site is very sparse on what programming models the
           | tool supports. Traditionally, the things you can easily
           | accelerate automatically are algorithms you can write
           | naturally in Fortran 77 (lots of arrays, no pointers), and
           | that's one limit on the applicability of these automatic
           | tools. (Other limits that other posters have pointed out are
           | compilation+place+route runtime, and reconfiguration time.)
           | 
           | They are claiming you can use malloc and make "extensive" use
           | of pointers in C programs and still have them automatically
           | compiled for the FPGA. That's where details are needed and
           | they are mostly missing.
           | 
            | I watched their 30-minute demo film. The speedups are
            | impressive, and on the small example it's impressive that
            | it does the partitioning automatically. However, the
            | program contains only a single call to malloc, and all
            | pointers are derived from that address, so it doesn't do
            | much to convince us that the memory model and alias
            | analysis give you more flexibility than the F77 model.
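            | For reference, the "naturally Fortran 77" shape is
            | something like this saxpy kernel: flat arrays, a static
            | trip count, no aliasing (a pure-Python illustration, not
            | CacheQ's actual input language).

```python
def saxpy(a, x, y):
    """F77-style kernel: elementwise a*x + y over flat arrays. No
    pointers, no aliasing, loop bounds known up front -- the shape
    automatic acceleration tools have traditionally handled well."""
    return [a * xi + yi for xi, yi in zip(x, y)]

print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
# -> [12.0, 24.0, 36.0]
```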
        
         | d_tr wrote:
         | You might want to check the "Warp Processing" project out:
         | http://www.cs.ucr.edu/~vahid/warp/. It is probably exactly what
         | you are thinking about. Transparent analysis of the instruction
         | stream at runtime and synthesis and offloading of hot spots to
         | the FPGA.
        
           | Scene_Cast2 wrote:
           | Huh, interesting. It seems that the work doesn't have to be
           | explicitly parallel for this to work, which is a surprise.
        
         | rch wrote:
         | I recall reading papers about doing this by profiling Java apps
         | a decade or so ago, but I would have to dig pretty deep in my
         | HN comment history to find them.
         | 
         | The approach seems conceptually similar to the optimizations
         | available via the enterprise version of GraalVM.
        
       | BryanBeshore wrote:
       | Lisa Su is a fantastic CEO. Time will tell what the impact of
       | AMD's acquisition of Xilinx will be (should it close), but this
       | shows the strategy and execution behind Su and team.
       | 
       | While a lot of acquisitions don't pan out, this seems great.
        
         | parsimo2010 wrote:
         | AMD purchasing Xilinx is a reaction to Intel purchasing Altera
         | five years ago. Dr. Su might be a good CEO for other reasons,
         | but this isn't something that illustrates brilliant strategy on
         | her part.
        
           | cptskippy wrote:
           | The industry doesn't move overnight. AMD might have seen
           | where Intel was going and didn't want to be caught off guard,
           | or that might be the alternative to Apple approach of dozens
           | of coprocessors on a chip.
        
           | BryanBeshore wrote:
            | As I said, time will tell.
        
           | rusticpenn wrote:
            | Intel has not produced anything worthwhile from that
            | strategy yet, and I have seen no plans either. I use
            | Altera for all my FPGA needs.
        
             | ATsch wrote:
              | A large reason for the deal with Altera was that Altera
              | already used Intel for fabrication. I understand Intel's
              | 10nm and 7nm failures have hurt them a lot in that
              | regard - quite the opposite of the expected synergy.
              | Unlike Xilinx for AMD, they didn't really have any other
              | technologies Intel needed either; the biggest advantage
              | was fabrication, and that fell through.
        
           | sitkack wrote:
           | This is AMD competing with Nvidia, not AMD competing with
           | Intel.
        
           | GeorgeTirebiter wrote:
           | Xilinx had laid off a good chunk right before their sale to
           | AMD. Xilinx was having some financial troubles; when that
           | happens, investors want out before a company craters. So
           | selling themselves was one possible solution.
        
           | ATsch wrote:
           | I think it's more a reaction to the decreasing importance of
           | CPUs in the datacenter in favor of interconnect technology.
           | FPGAs are one of the directions in which the "smart nic" or
           | "DPU" tech has been moving, which is critical to the trend of
           | datacenter disaggregation. Xilinx has a very strong offering
           | in that regard.
        
             | baybal2 wrote:
             | It is not a trend at all if you look at market data.
             | 
              | The vast majority of the hosting market still goes to
              | bog-standard servers, not even blades.
              | 
              | I'll wait for "clouds" to get to a significant double-
              | digit market share first.
        
               | ATsch wrote:
               | If you look at market data, you can see that this market
               | did not exist a few years ago and is now estimated to be
               | worth billions, with major players releasing products in
               | the space. Unless the dynamics pushing this forward
               | change overnight, I think it's pretty safe to call it a
               | trend.
        
         | DCKing wrote:
         | They're going to need good leadership to pull this off. AMD
         | doesn't have a great track record when it comes to these
         | integrations.
         | 
         | AMD bought ATI while promising the same integration
         | "synergies". GPU style compute was going to be completely woven
         | into the CPU - "AMD Fusion". Sounds great - but they ended up
         | with them being beaten to the CPU-with-integrated-GPU market by
         | Intel by over a year (Intel Clarkdale launched January 2010,
         | AMD Llano midway 2011). 14 years after the acquisition, AMD's
         | iGPU integration is not much different compared to any other
         | iGPU integration, their raw performance lead is shrinking
         | compared to Intel and they're beaten by Apple. Radeon
         | Technologies Group functionally operates independently within
         | the company, and AMD won't use their more performant new RDNA
         | architecture in iGPUs for two years after its launch for some
         | reason - even their 2021 APUs still use their 2017 Vega
         | architecture (fundamentally based on 2012 GCN technology). In
          | the intervening years they've screwed up their processor
          | architecture and market share by going all-in on the
          | terrible Bulldozer architecture that was designed around the
          | broken promises of far-reaching GPU integration.
         | 
          | Given all that, the ATI acquisition might still have been
          | worth it - in hindsight AMD needed a competent GPU
          | architecture one way or another - but the mismanagement of
          | this acquisition
         | nearly killed the company. I hope better leadership can do
         | something here but I'm not really holding my breath.
        
           | atq2119 wrote:
           | Agreed. Now to be fair, the acquisition is also what helped
           | the company survive because it got them the console business.
           | So it's not like it was completely botched.
           | 
           | They screwed up majorly with software, and they may have the
           | same problem with an FPGA acquisition as well. AMD failed big
           | time to capitalize on GPUs the way Nvidia did, and that's
           | really almost entirely down to lack of good software
           | solutions. There's ROCm now and it seems plausible that the
           | gap is going to narrow further with AMD GPUs deployed to big
           | HPC clusters, but a gap remains.
        
           | m4rtink wrote:
           | Aren't all the new desktop consoles and the generation before
           | that based on AMD CPU and GPU fused together in a specific
           | way ?
        
             | wtallis wrote:
             | The consoles use AMD SoCs that include CPU and GPU cores,
             | but there's nothing special about how the CPU and GPU are
             | connected. The only remotely unusual aspect there is that
             | many of the console SoCs connect GDDR5/6 to the SoC's
             | shared memory controller, while other consumer devices
             | using similar chips (marketed by AMD as APUs) tend to use
             | DDR4 or LPDDR.
        
       | qwerty456127 wrote:
        | I can never stop wondering why this is not the norm yet. Why
        | doesn't every computer have an FPGA?
        
         | mhh__ wrote:
          | Probably power, getting the data onto the FPGA, and the fact
          | that utilising FPGAs is unlike writing software.
         | 
         | I definitely want one but any common task worth having on an
         | FPGA is probably common enough to justify either a GPU or
         | actual silicon.
         | 
          | Intel and AMD both have the IP to do it, and iPhones do
          | apparently have a Lattice chip in them.
        
           | rowanG077 wrote:
           | Once partial reconfiguration works and the FPGA can access
           | main memory directly I see a lot of use cases. Imagine
           | applications reconfiguring the FPGA in the blink of an eye to
           | optimize their own algorithms.
        
             | rjsw wrote:
             | There have been PCI FPGA boards available for a long time
             | that can access main memory, I had them in my desktop
             | machines nearly 20 years ago.
        
               | rowanG077 wrote:
                | Yes, through the PCI bus, not directly. You don't want
                | that latency. You want a unified model, like Intel
                | GPUs that can access main memory, or the FPGA being
                | another endpoint in AMD's Infinity Fabric
                | architecture. That exists as well in SoC-FPGA boards,
                | but not in the mid or high performance segments.
        
               | rjsw wrote:
               | Back when AMD released the first Opteron CPUs there was a
               | vendor selling an FPGA that would plug into an Opteron
               | socket along with the IP to implement HyperTransport in
               | the FPGA.
        
               | mhh__ wrote:
               | Attacking a hypothetical poorly isolated on-chip FPGA
               | seems like the mother of all exploits, thinking about it
        
               | rowanG077 wrote:
               | Why? To make an FPGA do what you want you need to be able
               | to reconfigure it. If you have reconfiguration capability
               | you need to have remote code execution. And in that case
               | you have already lost.
        
               | mhh__ wrote:
               | As in, the FPGA would have to be carefully segmented so
               | the accelerator couldn't be used to access memory it
               | shouldn't have access to.
               | 
                | I don't think it would happen in a general-purpose
                | chip, but I could see it happening in a smaller one,
                | like the exploits Christopher Domas demonstrated
                | against some embedded x86 cores.
        
               | rowanG077 wrote:
               | Why though? Your Integrated Intel or AMD GPU can also
               | access all of your memory. I don't see how an FPGA
               | provides any additional attack vector. As I said you'd
               | need code execution privileges anyway and once you have
               | that your system is already owned.
        
               | rjsw wrote:
               | The boards that I have used could not reprogram the FPGA
               | over the PCI bus.
        
               | mhh__ wrote:
               | I was thinking aloud about the memory rather than the
               | actual FPGA bitstream
        
         | imtringued wrote:
         | Existing FPGA vendors made sure their products remained in a
         | lucrative niche by maintaining full control over the
         | development process for FPGA designs.
        
         | amelius wrote:
          | My guess: because FPGAs are slow compared to mainstream
          | desktop CPUs and only make sense if you have massive
          | parallelism. But then you'd need a massive FPGA, which would
          | be crazy expensive, plus you'd need a good way to handle
          | throughput.
         | 
         | I could be totally wrong, though.
        
           | atq2119 wrote:
           | That, plus programming FPGA kind of sucks. The software tool
           | chains are somewhere between 20 and 30 years behind the state
           | of the art for software development.
           | 
           | Also, FPGAs can't be reasonably context-switched. Flashing
           | them takes a significant amount of time, so forget about
           | time-multiplexing access to the FPGA among different
           | applications.
        
             | m4rtink wrote:
             | I could imagine some sort of API-based queuing - say
             | you have 2 "slots" you can program designs onto. If you
             | play 8K video, you can have one flashed as a video
             | decoder while the other one speeds up your kernel
             | compilation. If you then want to also use FPGA-
             | accelerated denoising on some video you recently
             | recorded, the OS will politely tell you to wait for one
             | of the other apps using the available slots to
             | terminate first.
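The two-slot idea above can be sketched as a toy slot manager. Everything here (fpga_slots_t, slot_acquire, slot_release) is hypothetical, invented for illustration; real partial-reconfiguration stacks look quite different, but the queuing policy would be the same.

```c
#include <assert.h>
#include <string.h>

#define NUM_SLOTS 2
#define NAME_LEN  32

/* Hypothetical OS-side bookkeeping for two reconfigurable regions. */
typedef struct {
    int  busy[NUM_SLOTS];
    char bitstream[NUM_SLOTS][NAME_LEN]; /* which design is flashed */
} fpga_slots_t;

void slots_init(fpga_slots_t *s) {
    memset(s, 0, sizeof *s);
}

/* Returns a slot index, or -1 if the caller must wait for a slot. */
int slot_acquire(fpga_slots_t *s, const char *bitstream) {
    for (int i = 0; i < NUM_SLOTS; i++) {
        /* A slot already flashed with the same design can be shared
         * without paying the (slow) reconfiguration cost again. */
        if (s->busy[i] && strcmp(s->bitstream[i], bitstream) == 0)
            return i;
    }
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!s->busy[i]) {   /* flash an idle slot: the slow path */
            s->busy[i] = 1;
            strncpy(s->bitstream[i], bitstream, NAME_LEN - 1);
            s->bitstream[i][NAME_LEN - 1] = '\0';
            return i;
        }
    }
    return -1; /* both slots taken: the OS politely says "wait" */
}

void slot_release(fpga_slots_t *s, int i) {
    s->busy[i] = 0;
}
```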
        
               | amelius wrote:
               | Is there even any progress in OSes with respect to how
               | they deal with tasks/processes on GPUs?
        
               | atq2119 wrote:
               | Progress relative to what?
               | 
               | Since applications do all their rendering via the GPU
               | these days, desktop multi-tasking requires reasonably
               | time-sliced access to the GPU. GPUs have proper memory
               | protection these days (GPU-side page tables for each
               | process). That's big progress over 10 years ago.
        
       | ineedasername wrote:
       | Sounds like spending a few hours a month learning an HDL could be
       | a good long-term career decision.
        
         | deelowe wrote:
         | Anyone who is considering this, make sure you learn digital
         | circuits first.
        
         | seabird wrote:
         | You're going to need to commit a lot more time than that. HDLs
         | and the surrounding concepts have key fundamental differences
         | from software that a lot of developers have a hard time
         | stomaching. That's why high-level synthesis is the FPGA
         | industry's City of El Dorado; software developers would be able
         | to create acceleration designs without having to build up a
         | fairly large new skillset.
        
           | imtringued wrote:
           | I've never understood this argument. The change in mindset is
           | extremely small. It's merely a matter of awareness. High
           | level synthesis can work just fine if you don't go overboard
           | with constructs that are hard to synthesize. There is no
           | fundamental reason why a math equation in C should be harder
           | to synthesize than the Verilog or VHDL equivalent.
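For illustration, this is the kind of plain-C math kernel that HLS tools generally handle well: fixed trip count, integer-only arithmetic, no dynamic memory. The pragma in the comment is Vitis-HLS-style and shown only as a hint; the function itself is ordinary C and runs fine on a CPU.

```c
#include <assert.h>
#include <stdint.h>

#define N 8

/* A synthesis-friendly dot product. An HLS tool can unroll this
 * loop into N parallel multipliers plus an adder tree; on a CPU it
 * is just a sequential multiply-accumulate loop. */
int32_t dot8(const int16_t a[N], const int16_t b[N]) {
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
        /* #pragma HLS unroll  -- vendor hint, ignored by a C compiler */
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}
```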
        
           | Nullabillity wrote:
           | The dataflow dialect of VHDL instantly felt really natural to
           | me, coming from FRP (among a bunch of other stuff).
           | 
           | Of course, using it in industry is presumably pretty
           | different from using it for a few school courses.
        
         | efferifick wrote:
         | While sibling comments mention that it is probably wiser to
         | learn digital logic before HDL (and I agree with them), I think
          | it is important to also consider that there is now High-
          | Level Synthesis (HLS), where programming languages similar
          | to C (e.g., OpenCL) can compile to VHDL. HLS may lower the
          | barrier for programmers to take advantage of FPGAs.
          | However, whether a design can compile to fit the
          | constraints of the available FPGA is another question to
          | which I do not know the answer.
        
         | nsajko wrote:
         | I think the right way isn't "learn a HDL", it's "learn digital
         | electronics design". Hardware description languages enable
         | succint hardware description, but it's still necessary to keep
         | an image of the actual hardware in mind.
        
           | ip26 wrote:
           | HDL is really just ascii schematics.
        
       | GuB-42 wrote:
       | Everyone seems to be talking about accelerated instructions but
       | how about I/O?
       | 
       | FPGAs are awesome at asynchronous I/O and low latency. We could
       | implement network stacks, sound and video processing, etc... It
       | can start a TLS handshake as soon as the electrical signal hits
       | the ethernet port, while the CPU is not even aware of it
       | happening. It can timestamp MIDI input down to the microsecond
       | and replay with the same precision. It can process position data
       | from a VR headset at the very last moment in the graphics
       | pipeline. Maybe even do something like a software defined radio.
       | 
       | Basically, any simple but latency-critical operation. Of
       | course, embedded/realtime systems are a prime target.
        
         | slimsag wrote:
         | A fair number of enterprise NICs in data centers do exactly
         | this, e.g. Intel FPGA smart NICs
         | 
         | I don't know enough to know how this being on the CPU would
         | affect performance in this scenario, but I'd love to learn
         | more!
        
       | leecb wrote:
       | Everything described in the article sounds exactly like some of
       | the Virtex*-FX products from more than 10 years ago.
       | 
       | For instance, the Virtex4-FX had either one or two 450MHz PowerPC
        | cores embedded in it, where you could implement 8 of your own
       | additional instructions in the FPGA. This is effectively now a
       | CPU where you can extend the instruction set, and design your own
       | instructions specific to your application. For example, you might
       | make special instructions using the onboard logic to accelerate
       | video compression, or math operations; I know of one application
       | that was designed to do a 4x4 matrix multiply per cycle.
       | 
       | https://www.digikey.com/catalog/en/partgroup/virtex-4-fx-ser...
       | https://www.xilinx.com/support/documentation/data_sheets/ds1...
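To make the 4x4 example concrete, this is the scalar loop nest such a custom instruction would collapse: 64 multiply-accumulates that a fabric-implemented instruction could retire in a single cycle. Plain C for illustration; no vendor API is assumed.

```c
#include <assert.h>

/* Reference 4x4 matrix multiply: c = a * b. On a plain CPU this is
 * 64 multiplies and 48 adds; a custom FPGA-fabric instruction can
 * instantiate all of them as parallel hardware. */
void matmul4(const float a[4][4], const float b[4][4], float c[4][4]) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }
}
```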
        
         | mhh__ wrote:
         | What was the latency like to actually get data into your shiny
          | new instruction, e.g. do I get a 14-stage pipeline stall to
         | actually use the instruction?
        
           | rowanG077 wrote:
           | That depends on how you designed your instruction.
        
             | sitkack wrote:
             | And your pipeline
        
         | thrtythreeforty wrote:
         | For those curious, Xtensa is a similar embeddable architecture
         | (known especially for its use in the ESP32 microcontroller)
         | that allows broad latitude to the designer to customize its
          | instruction set with custom acceleration. The integration
          | is very good: the compiler recognizes the new intrinsics,
          | and the designer has control over how the instruction is
          | pipelined into the main processor.
         | 
         | Unfortunately it's very proprietary, and as far as I know there
         | isn't an at-home version you can play with on FPGAs. But this
         | kind of thing does exist if you can afford it - you don't have
         | to roll your own RTL.
        
       | nynx wrote:
       | This is exciting! Would be cool if it could access some sort of
       | gpio as well!
        
       ___________________________________________________________________
       (page generated 2021-01-03 23:00 UTC)