[HN Gopher] Show HN: Minimax - A Compressed-First, Microcoded RI...
       ___________________________________________________________________
        
       Show HN: Minimax - A Compressed-First, Microcoded RISC-V CPU
        
       RISC-V's compressed instruction (RVC) extension is intended as an
       add-on to the regular, 32-bit instruction set, not a replacement or
       competitor. Its designers intended RVC instructions to be expanded
       into regular 32-bit RV32I equivalents via a pre-decoder.  What
       happens if we explicitly architect a RISC-V CPU to execute RVC
       instructions, and "mop up" any RV32I instructions that aren't
       convenient via a microcode layer? What architectural optimizations
       are unlocked as a result?  "Minimax" is an experimental RISC-V
       implementation intended to establish if an RVC-optimized CPU is, in
       practice, any simpler than an ordinary RV32I core with pre-decoder.
       While it passes a modest test suite, you should not use it without
       caution. (There are a large number of excellent, open source,
       "little" RISC-V implementations you should probably reach for
       first.)
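
       To make the "pre-decoder" idea concrete, here is a minimal sketch
       of expanding a single RVC instruction (C.ADDI) into its 32-bit
       ADDI equivalent. The function name expand_c_addi is hypothetical
       and this is not taken from Minimax's source; it only follows the
       published C.ADDI and ADDI encodings.

```python
def expand_c_addi(insn16: int) -> int:
    """Expand a 16-bit C.ADDI into the equivalent 32-bit ADDI encoding.

    Illustrative only: a real pre-decoder handles every RVC opcode.
    """
    quadrant = insn16 & 0x3          # bits [1:0]
    funct3 = (insn16 >> 13) & 0x7    # bits [15:13]
    if quadrant != 0b01 or funct3 != 0b000:
        raise ValueError("not a C.ADDI encoding")

    rd = (insn16 >> 7) & 0x1F        # rd and rs1 share one 5-bit field
    imm = (((insn16 >> 12) & 0x1) << 5) | ((insn16 >> 2) & 0x1F)
    if imm & 0x20:                   # sign-extend the 6-bit immediate
        imm -= 0x40

    # ADDI layout: imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011
    return ((imm & 0xFFF) << 20) | (rd << 15) | (rd << 7) | 0x13


# c.addi a0, 1 (0x0505) should become addi a0, a0, 1
print(hex(expand_c_addi(0x0505)))
```

       A compressed-first core inverts this arrangement: the 16-bit
       form is what the datapath executes, and the 32-bit forms are the
       special case.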
        
       Author : gsmecher
       Score  : 103 points
       Date   : 2022-11-01 15:41 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | thrtythreeforty wrote:
       | This is very impressive, especially the performance per LUT! Did
       | I overlook frequency spec on a given target or did you not
       | specify?
       | 
       | Will the execute stage pipeline effectively to reach higher
       | f_max? (Of course there will be a small logic penalty, and a
       | larger FF penalty, but the core is small enough that it would
       | probably be tolerable.) Or is the core's whole architecture
       | predicated on a two stage design?
        
         | gsmecher wrote:
         | This core is targeted at "smaller-is-better" applications with
         | few actual instruction-throughput requirements. If it reaches
         | 200 MHz on a Xilinx KU060, I will be delighted. (That specific
         | clock frequency on that specific part carries heavy hints about
         | what this core is intended for.)
         | 
         | With that in mind: the single instruction-per-clock design is
         | for simplicity's sake, not performance's sake. If the execution
         | stage were pipelined, it'd be a different core. If performance
         | is the goal, I'd start by ripping out some of the details that
         | distinguish this core from other (excellent) RISC-V cores.
        
           | varispeed wrote:
           | KU060 costs a nice sum of £4,529.10 on Mouser (out of stock
           | of course)
        
             | Teknoman117 wrote:
             | > out of stock of course
             | 
             | I picked probably the worst time imaginable to get into
             | FPGAs. All of my "higher" end stuff is repurposed mining
             | hardware...
        
           | thatcherc wrote:
           | > 200 MHz on a Xilinx KU060
           | 
           | > (That specific clock frequency on that specific part
           | carries heavy hints about what this core is intended for.)
           | 
           | Fun clue! Looks like the Xilinx KU060 is a rad-hard FPGA for
           | space applications. Does anyone know what 200 MHz might
           | imply? Comms maybe?
        
           | gaudat wrote:
           | Poor man's Tile64?
        
       | cmrdporcupine wrote:
       | This is very nice. A couple years ago I was playing around with a
       | hobby project I was dubbing "Retro-V" which was to be a RISC-V
       | core tied to a 1980s-style display processor and keyboard/mouse
       | input on a small FPGA and 512k or 1MB or so of SRAM. I was using
       | PicoRV32 for that, but this would have been far better.
        
         | drh wrote:
         | Sounds interesting! What were you using for the display
         | processor?
        
           | cmrdporcupine wrote:
           | I was hand-rolling my own. I had it doing a basic 640x480
           | buffer with some basic character generation and sprite
           | support & HDMI/DVI output
           | 
           | These days I'd probably consider forking my friend Randy's
           | C64 VICII implementation (VIC-II Kawari) and just expand
           | framebuffer size, sprites, colours, etc, since he put so much
           | work into it.
           | 
           | It was a lot of fun, but I got stalled on the SD card
           | interface. That was more complexity than I felt like dealing
           | with at that point. And I was working at Google at the time and so
           | they owned all my thoughts and deeds and going through the
           | open sourcing process for it would have been a hassle. If I
           | wasn't hunting for work and needing to make $$ right now, I'd
           | pick it up again maybe? Was more of a verilog learning
           | process.
        
         | gsmecher wrote:
         | PicoRV32 and FemtoRV32 are both excellent, conventional RISC-V
         | implementations, and are more complete and proven than Minimax.
         | Relative to the size of any 7-series or newer Xilinx FPGA, the
         | difference in LUT cost between any of the three is pretty
         | minor. I think you made a perfectly defensible decision. (I
         | love me some SERV, too, and if you are willing to spend
         | orthodoxy to save gates, it's an excellent choice too.)
        
           | cmrdporcupine wrote:
           | Yes, PicoRV32 is very nice. However for what I was building,
           | with limited RAM, compressed instructions would have made a
           | lot of sense. I started porting a BASIC to my system (in C),
           | and it quite easily would have filled almost the whole 512kB
           | SRAM.
           | 
           | And the thought of handwriting one in RISC-V assembly
           | convinced me that maybe RISC-V wasn't as "retro friendly" as
           | I would have liked.
        
             | gsmecher wrote:
             | Understood. Maybe this landed after your project - but both
             | PicoRV32 and SERV now support compressed extensions, at
             | some additional resource cost. FemtoRV32 Quark doesn't -
             | which is not a knock, since it's a beautifully simple
             | implementation and that's the point.
             | 
             | The retrocomputing scene looks like a ton of fun and I'd be
             | delighted if any of my work is used there.
        
               | cmrdporcupine wrote:
               | Ah, yes, this was 2018/19, in the Before Times, and I
               | don't recall if PicoRV32 had compressed yet but I don't
               | think it did.
               | 
               | SERV always looked intriguing, too. Though I recall maybe
               | its build process was a hassle.
               | 
               | Anyways, this is neat, keep on keeping on! I'm just a
               | software guy, so I remain amazed by the world at the gate
               | level and what it can do. Entirely different kind of
               | abstraction building.
        
       | tomcam wrote:
       | > RISC-V's compressed instruction (RVC) extension is intended as
       | an add-on
       | 
       | Doesn't it make this... an IISC? Increased instruction set?
       | Asking for a friend
        
         | znwu wrote:
         | RISC no longer has the clear border as it had 30 years ago.
         | Nowadays RISC just means an ISA has most of the following
         | points: 1. Load/Store architecture 2. Fixed-length instructions
         | or few length variations. 3. Highly uniform instruction
         | encoding. 4. Mostly single-operation instructions.
         | 
         | These four points all have direct benefits on hardware design.
         | And compressed ISAs like RVC and Thumb check them all.
         | 
         | On the contrary, "fewer instruction types", "orthogonal
         | instructions" never had any real benefit beyond perceptual
         | aesthetics, so as a result they are long abandoned.
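
       As a rough illustration of "few length variations": with only
       RV32I plus RVC, the length of an instruction can be read off the
       two lowest bits of its first 16-bit parcel. The sketch below
       assumes that restricted case (the longer encodings the spec
       reserves are ignored), and the helper names are hypothetical.

```python
def insn_length(first_parcel: int) -> int:
    """Length in bytes of an RV32IC instruction, from its first 16 bits.

    Low bits 11 mean a 32-bit instruction; anything else is 16-bit RVC.
    Assumption: encodings longer than 32 bits never appear.
    """
    return 4 if (first_parcel & 0x3) == 0x3 else 2


def split_stream(parcels):
    """Group little-endian 16-bit parcels into whole instructions."""
    instructions = []
    i = 0
    while i < len(parcels):
        if insn_length(parcels[i]) == 2:
            instructions.append(parcels[i])
            i += 1
        else:
            if i + 1 >= len(parcels):
                raise ValueError("truncated 32-bit instruction")
            # The low parcel comes first in memory.
            instructions.append(parcels[i] | (parcels[i + 1] << 16))
            i += 2
    return instructions


# c.addi a0,1 ; addi a0,a0,1 ; c.nop
print([hex(x) for x in split_stream([0x0505, 0x0513, 0x0015, 0x0001])])
```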
        
       | [deleted]
        
       | sterlind wrote:
       | the actual Verilog source is incredibly small. I would have
       | thought that implementing a CPU, even a toy one, would take more
       | than 500 lines. is this normal for hardware?
        
         | nine_k wrote:
         | I suspect some heavier lifting is done here:
         | use ieee.std_logic_1164.all;
         | use ieee.numeric_std.all;
         | 
         | It looks like the VHDL source is about instruction decoding,
         | registers, etc, but does not include things like ALU logic. (I
         | don't know VHDL actually.)
        
           | robinsonb5 wrote:
           | Those two lines are just the VHDL equivalent of #include
           | <stdio.h> - i.e. boilerplate that you'll see in almost every
           | source file.
           | 
           | But it's true that you don't have to describe the ALU down to
           | the bit level - thanks to those two lines you can say "q <=
           | d1 + d2" instead of having to build an adder at the gate
           | level. (Though you can, of course, do that if you really want
           | to!)
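
       For readers from the software side, here is a sketch of what
       "building an adder at the gate level" amounts to, modeled in
       Python rather than VHDL. It is a plain ripple-carry adder made
       of one-bit full adders; the function names are hypothetical, and
       a synthesis tool is free to choose a very different structure
       for "d1 + d2".

```python
def full_adder(a: int, b: int, carry_in: int):
    """One-bit full adder built only from XOR, AND and OR."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_add(x: int, y: int, width: int = 32) -> int:
    """Add two unsigned values bit by bit, chaining the carry."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    # The final carry is dropped, so the sum wraps modulo 2**width.
    return result


print(ripple_add(5, 7))
```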
        
         | gsmecher wrote:
         | What you see is all there is.
         | 
         | At a certain scale, it's conventional for hardware designs to
         | become complex enough that it's necessary to structure them in
         | hierarchies, just to maintain control. This design is small
         | enough that none of the extra structure is essential.
         | 
         | It's possible to be incredibly expressive in Verilog and VHDL.
         | This implementation is written in VHDL, which has an outdated
         | reputation for being long-winded.
         | 
         | Also worth a look: FemtoRV32 Quark [0], which is written in
         | Verilog.
         | 
         | [0]: https://github.com/BrunoLevy/learn-fpga/blob/master/FemtoRV/...
        
           | robinsonb5 wrote:
           | Have you seen the OPC series of CPUs? (One Page Computing -
           | the challenge being to keep the code small enough to be
           | printed onto a single sheet of line printer paper!)
        
             | gsmecher wrote:
             | Yup! Thanks for pointing OPC [0] out. These CPUs were a
             | huge eye-opener - and a huge lesson about the value of
             | using a standardized instruction set.
             | 
             | Building a custom CPU commits you to writing an assembler
             | and listing generator - which is a good hobby-project job
             | for one person who's handy with Python. After stumbling
             | through those foothills, though, I found myself at the base
             | of some very steep, scary GCC/binutils cliffs wondering how
             | I could have gotten so lost, so far from home.
             | 
             | Even if all RISC-V does is offer a bunch of arbitrary
             | answers to arbitrary design questions, I consider it a
             | massive win.
             | 
             | [0]: https://revaldinho.github.io/opc/
        
       | robinsonb5 wrote:
       | That is very cool. I'm particularly interested in the
       | compressed-first approach because I have some projects where
       | minimising BRAM usage is paramount so code density really
       | matters. The use of microcode to emulate 32-bit instructions
       | reminds me a lot of ZPU (I still have a soft spot for that
       | architecture) - was that an influence?
        
       | downvotetruth wrote:
       | Can the address and/or data also be 16 bit or would that violate
       | RISC-V spec?
        
         | snvzz wrote:
         | AIUI the registers and operations with them should be 32bit for
         | RV32I.
         | 
         | The bus is up to you... should you want an 8bit data bus and 16
         | bit address bus, I don't think the spec cares.
         | 
         | This is akin to 68020 (32bit ISA) vs 68000 (still 32bit ISA) or
         | 68008 (still 32bit ISA).
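
       A small sketch of that separation between register width and
       bus width: a 32-bit load performed as four transfers over an
       8-bit data bus, assuming the usual little-endian byte order. The
       read_byte callback and the function name are hypothetical
       stand-ins for whatever the bus interface would be.

```python
def load_word_8bit_bus(read_byte, addr: int) -> int:
    """Assemble a 32-bit little-endian word from four 8-bit bus reads."""
    word = 0
    for i in range(4):
        # One bus transaction per byte; byte 0 is the least significant.
        word |= (read_byte(addr + i) & 0xFF) << (8 * i)
    return word


# Memory holding the bytes of "addi a0, a0, 1" (0x00150513).
memory = bytes([0x13, 0x05, 0x15, 0x00])
print(hex(load_word_8bit_bus(memory.__getitem__, 0)))
```

       Software sees the same 32-bit value either way; only the number
       of bus cycles changes.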
        
           | gsmecher wrote:
           | I don't think the RISC-V spec cares, either, since it
           | specifies an execution environment but not interfaces.
           | 
           | A narrower data bus would allow a 2-cycle execution path, and
           | would likely split the longest combinatorial path in the
           | current design (which certainly goes through the adder tree.)
           | This could be either a 0.5 instruction-per-clock (IPC)
           | design, or a pipelined design that maintains 1 IPC at the
           | expense of extra pipeline hazards and corresponding bubbles.
           | 
           | A narrower address seems like it's only helpful as a knock-on
           | to a split data bus.
           | 
           | Gut feeling: I doubt that splitting the data or address buses
           | into additional phases would actually save resources. You
           | would certainly need more flip-flops to maintain state, and
           | more LUTs to manage combinational paths across the two
           | execution stages. While you can sometimes add complexity and
           | "win back" gates, it's an approach with limits. If you
           | compare SERV's resource usage to FemtoRV32-Quark's, it's
           | notable how much additional state (flip-flops) SERV "spends"
           | to reduce its combinatorial logic (LUT) footprint.
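
       The trade-off described above can be sketched with a 32-bit add
       split into two 16-bit phases. This is a Python model under
       stated assumptions, not Minimax's design: the carry_ff variable
       stands in for the extra flip-flop that has to carry state from
       the first cycle into the second.

```python
MASK16 = 0xFFFF


def add32_two_phase(x: int, y: int) -> int:
    """32-bit add done as two 16-bit adds on consecutive 'cycles'."""
    # Cycle 1: add the low halves and latch the carry.
    low_sum = (x & MASK16) + (y & MASK16)
    carry_ff = low_sum >> 16          # extra state held between cycles
    low = low_sum & MASK16

    # Cycle 2: add the high halves plus the latched carry.
    high = (((x >> 16) & MASK16) + ((y >> 16) & MASK16) + carry_ff) & MASK16

    return (high << 16) | low


print(hex(add32_two_phase(0x0000FFFF, 0x00000001)))
```

       The adder itself is half as wide, but the latched carry and the
       sequencing around it are the added state the comment refers to.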
        
       ___________________________________________________________________
       (page generated 2022-11-01 23:01 UTC)