[HN Gopher] Show HN: Minimax - A Compressed-First, Microcoded RI... ___________________________________________________________________ Show HN: Minimax - A Compressed-First, Microcoded RISC-V CPU RISC-V's compressed instruction (RVC) extension is intended as an add-on to the regular, 32-bit instruction set, not a replacement or competitor. Its designers intended RVC instructions to be expanded into regular 32-bit RV32I equivalents via a pre-decoder. What happens if we explicitly architect a RISC-V CPU to execute RVC instructions, and "mop up" any RV32I instructions that aren't convenient via a microcode layer? What architectural optimizations are unlocked as a result? "Minimax" is an experimental RISC-V implementation intended to establish whether an RVC-optimized CPU is, in practice, any simpler than an ordinary RV32I core with a pre-decoder. While it passes a modest test suite, you should not use it without caution. (There are a large number of excellent, open source, "little" RISC-V implementations you should probably reach for first.) Author : gsmecher Score : 103 points Date : 2022-11-01 15:41 UTC (7 hours ago) | thrtythreeforty wrote: | This is very impressive, especially the performance per LUT! Did | I overlook a frequency spec on a given target, or did you not | specify one? | | Will the execute stage pipeline effectively to reach a higher | f_max? (Of course there will be a small logic penalty, and a | larger FF penalty, but the core is small enough that it would | probably be tolerable.) Or is the core's whole architecture | predicated on a two-stage design? | gsmecher wrote: | This core is targeted at "smaller-is-better" applications with | few actual instruction-throughput requirements. If it reaches | 200 MHz on a Xilinx KU060, I will be delighted. (That specific | clock frequency on that specific part carries heavy hints about | what this core is intended for.)
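To make the pre-decoder mentioned in the submission text concrete, here is a minimal Python sketch of what such a stage does. It is an illustration only, not Minimax's design (Minimax executes RVC directly instead), and the function names are hypothetical. It uses the standard RISC-V length rule (a 16-bit parcel whose low two bits are not 0b11 is a compressed instruction) and expands just two CI-format instructions, C.ADDI and C.LI, into their 32-bit ADDI equivalents; every other encoding is deliberately left unimplemented.

```python
def instruction_length(parcel):
    """Length in bytes of a RISC-V instruction, given its first 16-bit parcel.

    Low two bits != 0b11 -> 16-bit compressed (RVC) instruction.
    Low two bits == 0b11 and bits [4:2] != 0b111 -> 32-bit instruction.
    Longer encodings are outside the scope of this sketch.
    """
    if (parcel & 0b11) != 0b11:
        return 2
    if ((parcel >> 2) & 0b111) != 0b111:
        return 4
    raise NotImplementedError("instructions longer than 32 bits")


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def expand_rvc(insn):
    """Expand C.ADDI or C.LI (CI format, quadrant 1) into a 32-bit ADDI."""
    quadrant = insn & 0b11
    funct3 = (insn >> 13) & 0b111
    rd = (insn >> 7) & 0x1F
    # CI-format immediate: imm[5] is bit 12, imm[4:0] are bits [6:2].
    imm = sign_extend((((insn >> 12) & 1) << 5) | ((insn >> 2) & 0x1F), 6)
    if quadrant == 0b01 and funct3 == 0b000:    # C.ADDI -> addi rd, rd, imm
        rs1 = rd
    elif quadrant == 0b01 and funct3 == 0b010:  # C.LI -> addi rd, x0, imm
        rs1 = 0
    else:
        raise NotImplementedError(f"RVC encoding {insn:#06x} not handled here")
    # I-type ADDI: imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0b0010011
```

For example, 0x0505 (c.addi a0, 1) expands to 0x00150513 (addi a0, a0, 1), and 0x557D (c.li a0, -1) expands to 0xFFF00513 (addi a0, x0, -1). A real pre-decoder covers the whole RVC table; the sketch only shows the kind of bit shuffling that a compressed-first design like Minimax tries to avoid.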
| | With that in mind: the single instruction-per-clock design is | for simplicity's sake, not performance's sake. If the execution | stage were pipelined, it'd be a different core. If performance | is the goal, I'd start by ripping out some of the details that | distinguish this core from other (excellent) RISC-V cores. | varispeed wrote: | KU060 costs a nice sum of £4,529.10 on Mouser (out of stock | of course) | Teknoman117 wrote: | > out of stock of course | | I picked probably the worst time imaginable to get into | FPGAs. All of my "higher" end stuff is repurposed mining | hardware... | thatcherc wrote: | > 200 MHz on a Xilinx KU060 | | > (That specific clock frequency on that specific part | carries heavy hints about what this core is intended for.) | | Fun clue! Looks like the Xilinx KU060 is a rad-hard FPGA for | space applications. Does anyone know what 200 MHz might | imply? Comms maybe? | gaudat wrote: | Poor man's Tile64? | cmrdporcupine wrote: | This is very nice. A couple of years ago I was playing around with a | hobby project I was dubbing "Retro-V", which was to be a RISC-V | core tied to a 1980s-style display processor and keyboard/mouse | input on a small FPGA and 512k or 1MB or so of SRAM. I was using | PicoRV32 for that, but this would have been far better. | drh wrote: | Sounds interesting! What were you using for the display | processor? | cmrdporcupine wrote: | I was hand-rolling my own. I had it doing a basic 640x480 | buffer with some basic character generation and sprite | support & HDMI/DVI output. | | These days I'd probably consider forking my friend Randy's | C64 VICII implementation (VIC-II Kawari) and just expand | framebuffer size, sprites, colours, etc., since he put so much | work into it. | | It was a lot of fun, but I got stalled on the SD card | interface. That was more complexity than I felt like dealing with | at that point.
And I was working at Google at the time and so | they owned all my thoughts and deeds, and going through the | open sourcing process for it would have been a hassle. If I | wasn't hunting for work and needing to make $$ right now, I'd | pick it up again maybe? It was more of a Verilog learning | process. | gsmecher wrote: | PicoRV32 and FemtoRV32 are both excellent, conventional RISC-V | implementations, and are more complete and proven than Minimax. | Relative to the size of any 7-series or newer Xilinx FPGA, the | difference in LUT cost between any of the three is pretty | minor. I think you made a perfectly defensible decision. (I | love me some SERV, too, and if you are willing to spend | orthodoxy to save gates, it's an excellent choice too.) | cmrdporcupine wrote: | Yes, PicoRV32 is very nice. However, for what I was building, | with limited RAM, compressed instructions would have made a | lot of sense. I started porting a BASIC to my system (in C), | and it quite easily would have filled almost the whole 512kB | SRAM. | | And the thought of handwriting one in RISC-V assembly | convinced me that maybe RISC-V wasn't as "retro friendly" as | I would have liked. | gsmecher wrote: | Understood. Maybe this landed after your project - but both | PicoRV32 and SERV now support compressed extensions, at | some additional resource cost. FemtoRV32 Quark doesn't - | which is not a knock, since it's a beautifully simple | implementation and that's the point. | | The retrocomputing scene looks like a ton of fun and I'd be | delighted if any of my work is used there. | cmrdporcupine wrote: | Ah, yes, this was 2018/19, in the Before Times, and I | don't recall if PicoRV32 had compressed support yet, but I don't | think it did. | | SERV always looked intriguing, too. Though I recall maybe | its build process was a hassle. | | Anyways, this is neat, keep on keeping on! I'm just a | software guy, so I remain amazed by the world at the gate | level and what it can do.
Entirely different kind of | abstraction building. | tomcam wrote: | > RISC-V's compressed instruction (RVC) extension is intended as | an add-on | | Doesn't it make this... an IISC? Increased instruction set? | Asking for a friend | znwu wrote: | RISC no longer has the clear boundary it had 30 years ago. | Nowadays RISC just means an ISA that has most of the following | points: 1. Load/Store architecture 2. Fixed-length instructions | or few length variations. 3. Highly uniform instruction | encoding. 4. Mostly single-operation instructions. | | These four points all have direct benefits on hardware design. | And compressed ISAs like RVC and Thumb check them all. | | By contrast, "fewer instruction types" and "orthogonal | instructions" never had any real benefit beyond perceptual | aesthetics, and as a result they have long been abandoned. | [deleted] | sterlind wrote: | The actual Verilog source is incredibly small. I would have | thought that implementing a CPU, even a toy one, would take more | than 500 lines. Is this normal for hardware? | nine_k wrote: | I suspect some heavier lifting is done here: | use ieee.std_logic_1164.all; use ieee.numeric_std.all; | | It looks like the VHDL source is about instruction decoding, | registers, etc., but does not include things like ALU logic. (I | don't actually know VHDL.) | robinsonb5 wrote: | Those two lines are just the VHDL equivalent of #include | <stdio.h> - i.e. boilerplate that you'll see in almost every | source file. | | But it's true that you don't have to describe the ALU down to | the bit level - thanks to those two lines you can say "q <= | d1 + d2" instead of having to build an adder at the gate | level. (Though you can, of course, do that if you really want | to!) | gsmecher wrote: | What you see is all there is. | | At a certain scale, it's conventional for hardware designs to | become complex enough that it's necessary to structure them in | hierarchies, just to maintain control.
This design is small | enough that none of the extra structure is essential. | | It's possible to be incredibly expressive in Verilog and VHDL. | This implementation is written in VHDL, which has an outdated | reputation for being long-winded. | | Also worth a look: FemtoRV32 Quark [0], which is written in | Verilog. | | [0]: https://github.com/BrunoLevy/learn-fpga/blob/master/FemtoRV/... | robinsonb5 wrote: | Have you seen the OPC series of CPUs? (One Page Computing - | the challenge being to keep the code small enough to be | printed onto a single sheet of line printer paper!) | gsmecher wrote: | Yup! Thanks for pointing OPC [0] out. These CPUs were a | huge eye-opener - and a huge lesson about the value of | using a standardized instruction set. | | Building a custom CPU commits you to writing an assembler | and listing generator - which is a good hobby-project job | for one person who's handy with Python. After stumbling | through those foothills, though, I found myself at the base | of some very steep, scary GCC/binutils cliffs wondering how | I could have gotten so lost, so far from home. | | Even if all RISC-V does is offer a bunch of arbitrary | answers to arbitrary design questions, I consider it a | massive win. | | [0]: https://revaldinho.github.io/opc/ | robinsonb5 wrote: | That is very cool. I'm particularly interested in the | compressed-first approach because I have some projects where | minimising BRAM usage is paramount, so code density really | matters. The use of microcode to emulate 32-bit instructions | reminds me a lot of ZPU (I still have a soft spot for that | architecture) - was that an influence? | downvotetruth wrote: | Can the address and/or data also be 16-bit, or would that violate | the RISC-V spec? | snvzz wrote: | AIUI the registers and operations on them should be 32-bit for | RV32I. | | The bus is up to you... should you want an 8-bit data bus and a 16-bit | address bus, I don't think the spec cares.
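A small Python sketch of snvzz's point, under assumed parameters: the registers stay 32 bits wide as RV32I requires, while a hypothetical memory interface moves each 32-bit word over a narrower data bus in several beats (little-endian, as RISC-V normally is) and drives only the low 16 bits of an address. The helper names and bus widths are illustrative only, not taken from Minimax or from the RISC-V spec.

```python
def split_word(word, bus_bytes=1):
    """Split a 32-bit register value into little-endian bus beats."""
    if 4 % bus_bytes != 0:
        raise ValueError("bus width must divide 4 bytes")
    beat_bits = 8 * bus_bytes
    mask = (1 << beat_bits) - 1
    # Lowest-addressed byte(s) go out first.
    return [(word >> (beat_bits * i)) & mask for i in range(4 // bus_bytes)]


def join_beats(beats, bus_bytes=1):
    """Reassemble little-endian bus beats into one 32-bit register value."""
    beat_bits = 8 * bus_bytes
    word = 0
    for i, beat in enumerate(beats):
        word |= beat << (beat_bits * i)
    return word & 0xFFFFFFFF


def bus_address(addr, address_bits=16):
    """Drive only the low address_bits of a 32-bit address onto the bus."""
    return addr & ((1 << address_bits) - 1)
```

With an 8-bit bus, every 32-bit load or store becomes four transfers, and a 16-bit address bus limits the reachable memory to 64 KiB (addresses that differ only above bit 15 alias). Neither change touches the 32-bit register file, which is why the ISA itself is unaffected; the extra cycles and state are the costs weighed in the replies that follow.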
| | This is akin to 68020 (32-bit ISA) vs 68000 (still 32-bit ISA) or | 68008 (still 32-bit ISA). | gsmecher wrote: | I don't think the RISC-V spec cares, either, since it | specifies an execution environment but not interfaces. | | A narrower data bus would allow a 2-cycle execution path, and | would likely split the longest combinatorial path in the | current design (which certainly goes through the adder tree). | This could be either a 0.5 instruction-per-clock (IPC) | design, or a pipelined design that maintains 1 IPC at the | expense of extra pipeline hazards and corresponding bubbles. | | A narrower address seems like it's only helpful as a knock-on | to a split data bus. | | Gut feeling: I doubt that splitting the data or address buses | into additional phases would actually save resources. You | would certainly need more flip-flops to maintain state, and | more LUTs to manage combinational paths across the two | execution stages. While you can sometimes add complexity and | "win back" gates, it's an approach with limits. If you | compare SERV's resource usage to FemtoRV32-Quark's, it's | notable how much additional state (flip-flops) SERV "spends" | to reduce its combinatorial logic (LUT) footprint. ___________________________________________________________________ (page generated 2022-11-01 23:01 UTC)