[HN Gopher] eGPU: A 750 MHz Class Soft GPGPU for FPGA
       ___________________________________________________________________
        
       eGPU: A 750 MHz Class Soft GPGPU for FPGA
        
       Author : matt_d
       Score  : 39 points
       Date   : 2023-08-01 20:11 UTC (2 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | stefanpie wrote:
       | One group at Georgia Tech in our building has also been working
       | on open source GPU designs that can also target FPGAs and
        | interoperate with RISC-V. They have several publications on the
       | work they have built up. Thought I might share since it's not
       | referenced in the submission paper.
       | 
       | https://vortex.cc.gatech.edu/
        
         | mepian wrote:
         | They still haven't published the source code for their Skybox
         | project, I wonder why. Unless I missed it in their repository?
         | https://github.com/vortexgpgpu
        
       | gsmecher wrote:
       | Also discussed here:
       | https://old.reddit.com/r/FPGA/comments/15fnb6u/egpu_a_750_mh...
        
       | dragontamer wrote:
       | For a GPU circuit, it basically comes down to the number of
       | hardware multipliers on the FPGA, does it not?
       | 
       | I remember synthesizing a 16-bit Wallace tree in a lab exercise
       | back in college. I think that single multiplier used up 70% of my
       | LUTs.
       | 
        | You'll only get massive amounts of parallel hardware
        | multipliers if the underlying fabric has a ton of hard
        | multipliers (like Xilinx's VLIW SIMD AI chips)
       | 
       | -------
       | 
        | At all compute sizes, a GPU probably will have more multiply
        | circuits than an equivalent-cost FPGA, with the exception of
        | maybe those AI chips from Xilinx (where the individual cores
        | are basically presynthesized with a hardcoded ISA).
       | 
        | Ex: at under 500 mW of power you'd probably prefer some ARM
        | NEON SIMD or a TI DSP / VLIW. At cell phone levels you'd prefer
        | a cell phone GPU, and at desktop/server levels a desktop GPU.
        
         | danhor wrote:
         | > At all computer sizes, a GPU probably will have more multiply
         | circuits than an equivalent cost FPGA
         | 
         | Very likely yes, but FPGAs often have hundreds to thousands of
         | hardware multipliers, as part of the DSP blocks. Here for
         | example newer AMD FPGAs:
         | https://eu.mouser.com/datasheet/2/903/ds890_ultrascale_overv...
        
           | mathisfun123 wrote:
            | I wish people would stop quoting marketing material as some
            | kind of representation of what they know.
           | 
           | You're giving completely the wrong impression about dsp
           | slices - it is absolutely not 1 dsp slice per FP operator at
           | any precision that you would want to do floating point
           | arithmetic. It's definitely at least 2 plus a whole bunch of
           | LUTs (~500) for FP16 with 4 stages or something like that.
           | And if you want faster (fewer stages) then you need more
           | slices. On alveo u280, which is an ultrascale part, I have
           | never been able to effectively utilize more than ~4000 dsp
           | slices (out of 9024) for 5,4 mults and that cost basically
           | 99% of clbs in SLR1 and SLR2.
           | 
            | And even then, disconnected FPUs are completely meaningless
            | without a datapath implementing e.g. matmul, and boy oh boy
            | do you have no clue what you're in for there.
           | 
           | Takeaway: it's pointless to compare raw specsheet numbers
           | when _everything_ comes down to datapath.
        
           | pkaye wrote:
           | How much would that FPGA cost?
        
           | UncleOxidant wrote:
           | The FPGAs with enough multipliers to be competitive against
           | an actual GPU are going to be quite a bit more expensive than
           | a GPU aren't they?
        
       | Lramseyer wrote:
       | Full Disclosure, I work for an FPGA company.
       | 
        | The mind-blowing part of all of this is the fact that they were
        | able to close timing at 771 MHz. That is insanely fast for an
        | FPGA. For perspective, most modern FPGAs run their designs at
        | around 300 MHz.* While most of the heavy lifting in this design
        | uses hardened components like DSPs and FPUs, it's still very
        | impressive to see!
       | 
       | What I didn't see talked about much was how memory is loaded in
       | and out of the processor. I'm curious to see what the memory
       | bandwidth numbers look like as well as the resource utilization
       | of the higher level routing.
       | 
       | *For most hardware designs that aren't things like CPUs and GPUs,
       | you don't always need a super high clock speed. You have a lot
       | more flexibility to compute in space rather than in time (think
       | more threads running slower.) The pros and cons of such tradeoffs
       | are a bit of a complicated topic, but should at least be noted.
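
The space-vs-time tradeoff in this footnote can be illustrated with a toy throughput model (the numbers are illustrative only, not from the paper):

```python
# Toy model of the "compute in space rather than in time" tradeoff:
# a design can hit a target throughput either with one fast pipeline
# or with several slower replicated copies (more area, lower fmax).
def throughput(ops_per_cycle: int, fmax_hz: float, replicas: int) -> float:
    """Total ops/sec for `replicas` copies of a pipeline."""
    return ops_per_cycle * fmax_hz * replicas

fast = throughput(1, 750e6, 1)   # one pipeline closing timing at 750 MHz
wide = throughput(1, 250e6, 3)   # three copies closing at an easier 250 MHz

print(fast == wide)              # → True: same throughput, more area
```

The replicated design trades fabric area for relaxed timing, which is why most FPGA designs don't chase CPU/GPU-style clock rates in the first place.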
        
         | mathisfun123 wrote:
         | > The mind blowing part of all of this is the fact that they
         | were able to close timing at 771MHz
         | 
         | It's true but I mean this is Intel in-house research right? If
         | they can't get absolute peak fmax on their own parts that would
          | be a really bad look right? Plus these Stratix parts have hard
          | FP blocks (not just DSPs), so they're basically just scheduling
          | stuff rather than building the whole datapath. But admittedly I
          | haven't read the paper...
         | 
         | >Full Disclosure, I work for an FPGA company
         | 
         | I currently do too (as an intern, maybe even the same one as
         | you) and I haven't looked very hard but I'm sure we have
         | similar fmax achieving projects (maybe even GPUs since we're
         | fighting hard to compete with Nvidia...).
        
       | unwind wrote:
       | Uh, non-native question: what is the word "class" doing in the
       | title?
       | 
       | Is a hyphen missing, so it should be "750 MHz-class"? I searched
       | the linked page but the word only appears in the title, sans
       | hyphen.
        
       | avmich wrote:
        | Wonder if this could help to alleviate the momentary shortage of
       | GPUs on the market.
        
         | ZiiS wrote:
          | 10-year-old entry-level GPUs have 100 750 MHz cores.
        
           | monocasa wrote:
            | 'Cores' are really overstated in GPUs. CUDA cores are really
            | SIMD lanes, and if you counted them the same way a CPU does,
            | you'd get somewhere in the dozens-of-cores range even for
            | modern GPUs.
        
             | codedokode wrote:
             | A proper method is counting ALUs instead of vague "cores".
        
             | xigency wrote:
             | That seems backwards to me. Sure, a GPU core is less
             | general, but in terms of concurrent execution, memory
             | bandwidth, and FLOPS I would expect hundreds to thousands
             | of cores for all new GPU offerings. Apple's double-digit
             | GPU core counts for instance sound extremely understated.
        
               | monocasa wrote:
               | It's not. The best comparison is the SM count for Nvidia
               | hardware, or the wavefront count for AMD hardware. So a
               | 4070 has 46 cores as you'd count them on a CPU.
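
The counting convention in this subthread works out as follows; the figures are the RTX 4070's public specs (46 SMs, 128 FP32 lanes per Ada SM):

```python
# "CUDA cores" vs. CPU-style core counts: Nvidia's marketed number
# is SMs multiplied by FP32 lanes per SM, i.e. it counts SIMD lanes.
sms = 46            # streaming multiprocessors (the CPU-like "cores")
lanes_per_sm = 128  # FP32 SIMD lanes per Ada Lovelace SM

cuda_cores = sms * lanes_per_sm
print(cuda_cores)   # → 5888, the RTX 4070's marketed "CUDA core" count
```

By the same convention, a CPU advertising its AVX-512 lanes instead of its cores would claim 16x its usual core count, which is the comparison the parent comments are making.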
        
         | latchkey wrote:
         | I don't think this will be momentary. Reality is that there
         | have been shortages of GPUs for a long time now and demand
          | isn't going down. People are signing 3-year contracts with
          | Lambda now.
        
         | [deleted]
        
         | monocasa wrote:
         | If it's on an FPGA then it doesn't really compete with GPUs you
         | can buy from just about any perspective other than openness.
        
       ___________________________________________________________________
       (page generated 2023-08-01 23:00 UTC)