[HN Gopher] Nyuzi - An Experimental Open-Source FPGA GPGPU Proce...
       ___________________________________________________________________
        
       Nyuzi - An Experimental Open-Source FPGA GPGPU Processor
        
       Author : peter_d_sherman
       Score  : 128 points
       Date   : 2021-02-14 14:37 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | ourlordcaffeine wrote:
        | Are there open-source OpenCL-to-FPGA compilers?
        | 
        | If you're playing with FPGAs, you might as well compile the
        | kernel directly into a circuit, rather than building a GPU on an
        | FPGA and running your kernel on that.
        | 
        | Proprietary solutions like the Altera OpenCL compiler exist.
        
       | antman wrote:
       | Layman here! I see a lot of posts on that subject lately so I
       | need to ask: Can someone design a RAM chip?
        
         | pjc50 wrote:
         | Possibly, but why would you want a less efficient RAM chip that
         | costs more compared to something that's a commodity you can
         | buy?
        
           | ComputerGuru wrote:
            | There's basically little to no magic sauce in RAM chip
            | _design_ itself. The production process (which is rather
            | independent of the commonly discussed "node size" processes
            | for CPU/GPU-related tech) is where the magic happens.
        
         | detaro wrote:
         | What do you mean specifically by "design a RAM chip"?
         | (obviously RAM chips that you can buy are designed before they
         | are made, so that's probably not what you are after?)
         | 
         | FPGAs typically do contain dedicated RAM areas, because
         | implementing it out of FPGA logic slices is terribly
         | inefficient.
        
           | tcherasaro wrote:
            | FPGA designer here. Just wanted to point out that
            | "efficiency" is highly context-sensitive in FPGA design.
            | Everything is an area / speed / power trade-off. If you only
            | need a RAM that is 8 bits wide and 64 words deep, it might
            | be very inefficient to waste a dedicated 18 kbit block RAM
            | on it when it would fit into a handful of LUTs. This is why
            | Xilinx, for one, provides pragmas such as RAM_STYLE to help
            | guide synthesis:
            | 
            |     (* ram_style = "distributed" *)
            |     reg [data_size-1:0] myram [2**addr_size-1:0];
           | 
           | block: Instructs the tool to infer RAMB type components.
           | 
           | distributed: Instructs the tool to infer the LUT RAMs.
           | 
           | registers: Instructs the tool to infer registers instead of
           | RAMs.
           | 
           | ultra: Instructs the tool to use the UltraScale+TM URAM
           | primitives.
           | 
           | See: https://www.xilinx.com/support/documentation/sw_manuals/
           | xili...
           | 
           | edit: formatting*
        
         | sitkack wrote:
          | Yes. Designing RAM is a lower-level task than designing logic
          | via an HDL, and it needs to take the fab's process (chemistry,
          | optics, mechanics) directly into account.
         | 
         | https://openram.soe.ucsc.edu/
        
           | bserge wrote:
           | DRAM chips are fascinating.
           | 
           | Instead of going with the much more expensive SRAM, someone
           | decided that refreshing billions of capacitors hundreds of
           | times a second while performing read and write operations is
           | an acceptable way of _storing_ data (even if only while
           | powered).
           | 
           | I wonder what the managers who first heard the idea must've
           | thought :D
           | 
            | And it works so well! It's probably one of the most reliable
            | components in a computer.
        
             | pjc50 wrote:
             | Many of the early RAM systems were non-persistent (mercury
             | delay lines, phosphor) and some were destructive-read (core
             | memory).
             | 
             | Appears to have been invented by Dennard of Dennard
             | Scaling: https://www.thoughtco.com/who-invented-the-
             | intel-1103-dram-c...
        
       | kleiba wrote:
       | I'm a total lay person here, but my understanding is that
       | designing a new processor is very challenging these days because
       | of the patent situation. That is, so much in hardware design is
       | patented that you're bound to run into problems if you don't know
       | what you're doing.
       | 
       | Is this true, and is it of relevance here?
        
         | 10000truths wrote:
         | Yes, IP cores are very expensive to license, if they're even
         | available for licensing at all. This is part of the appeal of
         | RISC-V - an open-spec, royalty-free processor architecture that
         | is free of charge for chip designers to implement.
        
           | lkcl wrote:
            | unfortunately, if you make modifications and you want them to
            | be "upstreamed" (using libre/open project terminology as an
            | analogy) you cannot do that without participating in the
            | RISC-V Foundation. you can implement APPROVED (Authorized)
           | parts of the RISC-V specification. you cannot arbitrarily go
           | changing it and still call it "RISC-V", that's a Trademark
           | violation.
        
           | admax88q wrote:
           | RISC-V is not an IP core, just an instruction set
           | architecture.
           | 
            | Any implementation of it has the exact same patent minefield
            | to navigate as any other ISA. Most of the patents are around
            | implementation techniques, not the instruction set.
        
             | jecel wrote:
             | The RISC-V instruction set was carefully designed not to
             | require the use of any currently valid patents to do an
             | implementation. It is up to each processor designer to not
             | violate any patents in their project.
        
               | lkcl wrote:
                | this is unfortunately not true (that the RISC-V ISA was
                | designed not to require currently-valid patents). people
                | may _believe_ that to be the case, but it's not. from
                | third hand i've heard that IBM has absolutely tons of
                | patents that RISC-V infringes. whether IBM decide to take
               | action on that is another matter. they're a bit of a
               | heavyweight, so there would have to be substantial harm
               | to their business for the "800 lb gorilla" effect to kick
               | in.
        
               | astrange wrote:
                | It's not actually possible to do this, though; it's up
                | to the other side's lawyers to decide if they're going
                | to sue you, and the answer is yes if they can afford it.
                | You don't have a jury on hand to evaluate every patent
                | in existence.
               | 
               | Besides that, engineers in large companies are told to
               | explicitly not look up any patents so they won't be
               | accused of willful infringement.
        
         | vmception wrote:
          | Yes, but there are a lot of profitable applications which
          | don't need to be advertised. You run it in-house and make
          | money on the output, e.g. ML farms or mining. You don't take
          | preorders for the hardware at all and just have boutique
          | custom units, and nobody knows the architecture, even if you
          | offer some remote rental/SaaS tool.
        
         | pkaye wrote:
          | From a quick look, this processor seems to use a barrel
          | processor architecture, so it's not an entirely new idea.
         | 
         | https://en.wikipedia.org/wiki/Barrel_processor
        
         | ChuckNorris89 wrote:
          | Not sure how relevant it is here, but yes, GPU architectures
          | are bound by tons of patents, so you can bet your a$$ that if
          | you were to commercially launch your own GPU IP, you'd have
          | Nvidia's and AMD's lawyers knocking on your door in under 10
          | seconds.
         | 
         | IIRC most companies out there selling GPU IP are still paying
         | royalties to AMD for their patents on shader architecture which
         | they got from their acquisition of ATI which in turn came from
         | their acquisition of ArtX which was founded by people who
         | worked at the long defunct SGI (Silicon Graphics).
         | 
         | The funny thing is, if you backtrack through all GPU
         | innovations, most stem from former SGI employees.
         | 
         | When 3Dfx went under, even though Nvidia's GPU tech was already
         | superior to anything 3Dfx had, Nvidia immediately swept in and
         | picked their carcass clean, mostly for their patents in this
         | space, so they would have more ammo/leverage against
         | competitors going forward.
         | 
          | Regardless of how you feel about patents, with their pros and
          | cons, hardware engineering is a capital-intensive business,
          | and without patents to protect your expensive R&D, it wouldn't
          | be a viable one.
        
           | bserge wrote:
           | Aren't patents supposed to expire?
           | 
           | Isn't that the idea, you have a patent for 10-20 years, build
           | your business (which AMD/nVidia did, very successfully) then
           | everyone is free to use it, possibly leading to innovation?
           | 
           | I'm poorly versed in this, so if anyone with more knowledge
           | could share some thoughts, that would be appreciated.
        
             | lkcl wrote:
             | only if the patent holder does not create an "improvement"
             | on the old one. then the older (referenced) patent is
             | extended. Bosch have done this specifically so that they
             | can hold on to the original CAN Bus patent.
        
             | HideousKojima wrote:
             | Correct, patents in the US expire after 20 years.
        
               | JPLeRouzic wrote:
                | And if I remember correctly (I wrote my last patent 10
                | years ago), there are annual fees that invalidate the
                | patent's rights if not paid.
        
           | arithmomachist wrote:
           | >Nvidia immediately swept in and picked their carcass clean,
           | mostly for their patents in this space, so they would have
           | more ammo/leverage against competitors going forward.
           | 
           | That's surely not a healthy situation either. Courts should
           | never be a central part of competition among businesses.
        
           | joshspankit wrote:
           | To clarify what I think is the relevance, as well as to
           | explore my own questions:
           | 
           | If someone were to clean-room design their own GPU chip, how
           | likely is it that Nvidia and AMD would come down on them
           | anyway simply by virtue of the fact that they (presumably)
           | have patents on everything that you could think of putting in
           | that chip?
           | 
           | In essence: do you now have to be an expert in what you're
           | _not_ allowed to put in before you even start?
        
             | raphlinus wrote:
             | So here's what I would do if I were in this situation. I
             | wouldn't build a graphics processing unit per se, but
             | instead would build a highly parallel SIMD CPU organized in
             | workgroups, and with workgroup-local shared memory. These
             | cores could be relatively simple in some respects (they
             | wouldn't need complex out-of-order superscalar pipelines or
             | sophisticated branch prediction), but should have good
             | simultaneous multithreading to hide latency effectively.
             | 
             | Then, if you wanted to run a traditional rasterization
             | pipeline, you'd do it basically in software, using
             | approaches similar to cudaraster (which is BSD licensed!).
              | The paper on that suggests it would be on the order of 2x
             | slower than optimized GPU hardware for triangle-centric
             | workloads, but that might be worth it. The good news is
             | this story gets better the more the workload diverges from
             | what traditional GPUs are tuned for - in particular, the
             | more sophisticated the shaders get, the more performance
             | depends on the ability to just evaluate the shader code
             | efficiently.
             | 
             | It would of course be very difficult to make a chip that is
             | competitive with modern GPUs (the engineering involved is
             | impressive by any standards), but I think a lot would be
             | gained from such an effort.
             | 
             | I should probably disclaim that this is _definitely_ not
             | legal advice. Anyone who wants to actually play in the GPU
             | space should plan on spending some quality time with a team
              | of top-notch lawyers.
        
               | jeffbush wrote:
               | (project author here) That is pretty close to the
               | approach this project has taken, although my motivation
               | was not so much avoiding IP as exploring the line between
               | hardware acceleration and software.
        
               | lkcl wrote:
                | allo jeff, nice to see you're around :) thank you so
                | much for the time you spent guiding me through nyuzi,
                | and for explaining the value of the metric "pixels /
                | clock" as a measure for iteratively focusing on the
                | highest bang-per-buck areas to make incremental
                | improvements, progressing from full-software to high-
                | performance 3D.
               | 
               | have you seen Tom Forsyth's fascinating and funny talk
               | about how Larrabee turned into AVX512 after 15 years?
               | 
               | https://player.vimeo.com/video/450406346
               | https://news.ycombinator.com/item?id=15993848
        
               | raphlinus wrote:
               | Great to hear! I've poked around a little and see that,
               | and in any case wish you success and that we can all
               | learn from it.
        
             | ChuckNorris89 wrote:
              | To clarify further, Nvidia and AMD (and probably other,
              | smaller players like ARM, Qualcomm, Imagination) own the
              | patents on core shader tech, which are the building blocks
              | of any modern GPU design.
              | 
              | If you want to design a GPU IP that works around all their
              | patents, you probably can, but unless you're a John
              | Carmack x10, your resulting design would be horribly
              | inefficient, not competitive enough to be worth the
              | expensive silicon it's etched on, and probably not
              | compatible with any modern API like Vulkan or DirectX.
             | 
             | But if you just want to build your own meme GPU for
             | education/shits and giggles, that doesn't follow any
             | patents or APIs, then you can and some people already did:
             | 
             | https://www.youtube.com/watch?v=l7rce6IQDWs
        
             | ericbarrett wrote:
             | I am not in the graphics space but I am quite familiar with
             | tech business practices.
             | 
             | I think the chance you would be sued is near 100%. If you
             | released and showed any market traction at all, you would
             | immediately become a threat to the duopoly; they surely
             | remember the rise of 3Dfx. Don't bother arguing the merits
             | of the patents because it would be a business decision, not
             | a technical one--this is the kind of thing that's decided
             | at the C-level and then justified (or cautioned against) by
             | the company's legal team, not the other way around. Patents
             | are merely leverage to effect the defense of the business,
             | and you can be sure they'll be used.
        
               | joshspankit wrote:
               | I agree with you (and definitely a conversation worth
               | having) but for the sake of this thread let's pretend
               | that legal action would only be taken when a patent was
               | actually matched with what was put in the chip.
        
             | lkcl wrote:
             | if it were done, say, as a Libre/Open processor, say, with
             | the backing of NLnet (a Charitable Foundation), where the
             | "Bad PR ju-ju" for trying it on was simply not worth the
             | effort
             | 
              | if it were done, say, as a Libre/Open processor, say, with
             | the backing of NLnet (a Charitable Foundation), where NLnet
             | has access to over 450 Law Professors more than willing to
             | protect "Libre/Open" projects from patent trolls by running
             | crowd-funded patent-busting efforts
             | 
             | if it were done as a Libre/Open Hybrid Processor, based on
             | extending an ISA such as ooo, I dunno, maybe OpenPOWER,
             | which has the backing of IBM with a patent portfolio
             | spanning several decades, who would be very upset if tiny
             | companies like NVidia or AMD tried it on against a
             | Charitably-funded project.
             | 
             | that would be a very interesting situation, wouldn't it? i
             | wonder if there's a project around that's trying this as a
             | strategy? hmmm, hey, you know what? there is! it's called
             | http://libre-soc.org
        
           | ericbarrett wrote:
           | I learned GL in the 1990s on SGI systems. Shaders didn't
           | exist, poly counts were in the 100s, and textures were a
           | massive processing burden. The rendering pipeline of course
           | was quite different. And yet so much is the same! Code
           | organization, data types, all is quite familiar, whether it's
           | OpenGL or DirectX or what not. The achievements of SGI
           | engineers have literally benefited generations.
        
             | lkcl wrote:
             | Jeff's evaluation of GPLGPU is fascinating:
             | https://jbush001.github.io/2016/07/24/gplgpu-
             | walkthrough.htm...
             | 
             | you are absolutely correct in that everything has moved on
             | from "Fixed Function" of SGI, and how GPLGPU works (worked)
             | - btw it's NOT GPL-licensed: Frank sadly made his own
             | license, "GPL words but with non-commercial tacked onto the
             | end" which ... er... isn't GPL... _sigh_ - but everything
             | commercially has now moved on to Shader Engines.
             | 
             | that basically means Vulkan.
             | 
             | however you may be fascinated to know, from Jeff's
             | evaluation, that there are still startling similarities in
             | basic functionality in not-GPL GPLGPU and in modern designs
              | targeted at Shader Engines.
        
           | ComputerGuru wrote:
           | I don't see how patents acquired from SGI could possibly
           | still be protected and require licensing.
        
       | peter_d_sherman wrote:
       | Related:
       | 
       | Ben Eater - Let's build a video card!
       | 
       | https://eater.net/vga
       | 
       | Embedded Thoughts Blog - Driving a VGA Monitor Using an FPGA
       | 
       | https://embeddedthoughts.com/2016/07/29/driving-a-vga-monito...
       | 
       | Ken Shirriff - Using an FPGA to generate raw VGA video:FizzBuzz
       | with animation
       | 
       | http://www.righto.com/2018/04/fizzbuzz-hard-way-generating-v...
       | 
       | Clifford Wolf - SimpleVOut -- A Simple FPGA Core for Creating
       | VGA/DVI/HDMI/OpenLDI Signals
       | 
       | https://github.com/cliffordwolf/SimpleVOut
       | 
       | PDS: Also, this looks interesting, from SimpleVOut:
       | 
       | >"svo_vdma.v
       | 
       | A _video DMA controller_. Has a read-only AXI4 master interface
       | to access the video memory. "
        
         | fortran77 wrote:
          | Yeah, but these people aren't doing GPGPU computation.
        
           | phendrenad2 wrote:
            | Or anything even resembling 2D graphics acceleration.
        
       | FPGAhacker wrote:
       | One of the things that interests me (of many), is the use of
       | cmake.
       | 
        | Does anyone have good references on extending cmake to new tools
        | that don't produce executables per se, or otherwise work in non-
        | traditional ways?
        
       | code-scope wrote:
        | Very cool project!
        | 
        | I love GPGPU. I git-cloned it and am trying to understand the
        | code better here: https://www.code-
        | scope.com/s/s/u#c=sd&uh=0f2c2fa280a2&h=afe7a329&di=-1&i=38
        | 
        | It looks like a 5-stage FP (FP32?) pipeline, with
        | NUM_VECTOR_LANES=16 and NUM_REGISTERS=32.
       | 
        | Are you writing your own kernel from scratch? If so, which CPU
        | does it run on - some embedded CPU inside the FPGA?
        | 
        | In the mandelbrot.c code, it has the following:
        | 
        |     #define vector_mixi __builtin_nyuzi_vector_mixi
        | 
        | How does this get translated into vector operations on the
        | FPGA? Where is the code implementing the __builtin_*?
        | 
        | Thanks a lot, a very interesting project.
        
       | marcodiego wrote:
        | There are people keeping OpenVGA alive[1]. With the failure of
        | the Open Graphics Project[2], are there any known promising
        | projects besides libregpu[3]?
       | 
       | [1] https://github.com/elec-otago/openvga
       | 
       | [2] https://en.wikipedia.org/wiki/Open_Graphics_Project
       | 
       | [3] https://libre-soc.org/3d_gpu/
        
         | phkahler wrote:
         | >> is there any known promising projects besides libregpu?
         | 
         | I think the most useful thing right now would be a high quality
         | version of the "easy" parts of a GPU. Basic scan out, possibly
         | overlays, color space conversion, buffer handling. This would
         | allow ANY open processor projects to have frame buffer graphics
         | and run LLVMpipe for basic rendering and desktop compositing.
         | This may be slow, but it is required for every open GPU
         | project, while a SoC can live without the actual GPU for some
         | applications.
         | 
          | IMHO, first things first.
        
           | lkcl wrote:
           | this is easy to chuck together in a few days, literally, from
           | pre-existing components found on the internet.
           | 
           | * litex (choose any one of the available cores)
           | 
           | * richard herveille's excellent rgb_ttl / VGA HDL
           | https://github.com/RoaLogic/vga_lcd
           | 
           | * some sort of "sprite" graphics would do
           | https://hackaday.com/2014/08/15/sprite-graphics-
           | accelerator-...
           | 
           | the real question is: would anyone bother to give you the
           | money to make such a project, and the question before that
           | is: can you tell a sufficiently compelling story to get
           | customers - _real_ customers with money - to write you a
           | Letter of Intent that you can show to investors?
           | 
           | if the answer to either of those questions is "no" then, with
           | many apologies for pointing this out, it's a waste of your
           | time unless you happen to have some other reason for doing
           | the work - basically one with zero expectation up-front of
           | turning it into a successful commercial product.
           | 
           | now, here's the thing: even if you were successful in that
           | effort, it's so trivial (Richard Herveille's RGB/TTL HDL sits
           | as a peripheral on the Wishbone Bus) that it's like... why
           | are you doing this again?
           | 
           | the _real_ effort _is_ the 3D part - Vulkan compliance,
           | Texture Opcodes, Vulkan Image format conversion opcodes
           | (YUV2RGB, 8888 to 1555 etc. etc.), SIN /COS/ATAN2, Dot
           | Product, Cross Product, Vector Normalisation, Z-Buffers and
           | so on.
        
             | phkahler wrote:
             | Seriously? VGA with DVI outputs? And a link to a Sprite
             | engine?
             | 
             | We need HDMI output, preferably 4K capable. I also
             | mentioned colorspace conversion. Should have also said to
              | "just throw in" a video decoder for VP9 and AV1 if that's
             | available. The point is that the likes of SiFive and other
              | RISC-V SoC vendors should be making desktop chips, not just
             | headless Linux boards or ones with proprietary GPUs.
             | 
             | Like I said, the "easy" part should be done and available -
             | not theoretically assemblable from various pieces.
             | 
             | If this were readily available, I'd be able to buy it from
             | someone today. There IS a market for it and that will be
             | growing fast. Add a real GPU and things look even better.
        
           | marcodiego wrote:
           | Yeah. I also miss the small but firm steps approach.
        
       ___________________________________________________________________
       (page generated 2021-02-14 23:00 UTC)