[HN Gopher] Show HN: RISC-V core written in 600 lines of C89
       ___________________________________________________________________
        
       Show HN: RISC-V core written in 600 lines of C89
        
       Author : mnurzia
       Score  : 145 points
       Date   : 2023-06-10 13:08 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | aportnoy wrote:
       | How about a RISC-V disassembler in 200 lines of C99?
       | 
       | https://github.com/andportnoy/riscv-disassembler/blob/master...
        
         | mnurzia wrote:
         | This is really cool, thanks for sharing! Something like this
         | would be a great tool to distribute with my emulator.
        
           | garganzol wrote:
           | It would be nice if you could put a link to that project in
           | your README file. Both projects are very impressive,
           | especially when seen in conjunction with each other.
        
             | aportnoy wrote:
             | I mean, his simulator already has a disassembler contained
             | within it, would just need to replace comments with print
             | statements.
        
       | charcircuit wrote:
       | This isn't a RISC-V core. It is a RISC-V emulator library.
        
       | nevi-me wrote:
       | Question: do the implementations of single instructions compile to
       | single instructions when targeting RISC-V with optimisations
       | enabled? It would be really awesome if compilers realised what
       | your code is doing and replaced the implementations of
       | instructions with those instructions.
        
         | mnurzia wrote:
         | Not really, my implementation isn't smart enough to guide
         | compilers to the right solution. Trivial instructions, like
         | xor, are of course recognized, but for example the 32x32 mul
         | implementation isn't. Maybe compilers will be smart enough one
         | day...
         | 
         | https://godbolt.org/z/WEcTzKf7M
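
For context, one common way to write a 32x32 multiply without a 64-bit integer type is to build it from 16-bit halves. The following is a minimal sketch under that assumption, not the project's actual code: the name `mulhu32` is hypothetical, and C99's `<stdint.h>` is used for brevity. It computes the upper 32 bits of the unsigned product, which is the value RISC-V's MULHU returns, and it shows how much a compiler would have to see through to collapse the whole function into one instruction.

```c
#include <stdint.h>

/* Upper 32 bits of the unsigned 64-bit product a * b, computed with
 * 32-bit arithmetic only (hypothetical helper, not taken from the project). */
uint32_t mulhu32(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t lo_lo = a_lo * b_lo;   /* weight 2^0,  covers bits 0..31  */
    uint32_t hi_lo = a_hi * b_lo;   /* weight 2^16, covers bits 16..47 */
    uint32_t lo_hi = a_lo * b_hi;   /* weight 2^16, covers bits 16..47 */
    uint32_t hi_hi = a_hi * b_hi;   /* weight 2^32, covers bits 32..63 */

    /* Add up everything that lands in bits 16..31; the sum is below 2^18,
     * so it cannot overflow, and its top part is the carry into bit 32. */
    uint32_t mid = (lo_lo >> 16) + (hi_lo & 0xFFFFu) + (lo_hi & 0xFFFFu);

    return hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (mid >> 16);
}
```

For example, `mulhu32(0xFFFFFFFF, 0xFFFFFFFF)` yields `0xFFFFFFFE`, since the full product is `0xFFFFFFFE00000001`.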
        
         | dbcurtis wrote:
         | Yeah, well, the rock that breaks your pick in that scenario is
         | copying all the processor state back and forth to/from the
         | emulation model, including flag register bits, and also
         | correctly handling exceptions and faults. Emulating the
         | instruction's happy path is just scratching the surface.
        
           | duskwuff wrote:
           | > including flag register bits
           | 
           | RISC-V doesn't have those. Compare+branch is a single
           | instruction.
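
To illustrate duskwuff's point, here is a minimal sketch of how an emulator might execute a BLT (branch if less than, signed) instruction. The function name `exec_blt` and its parameters are hypothetical, and the cast to `int32_t` assumes a two's complement host. The comparison and the branch decision happen in one step, so there is no flags register to save or restore.

```c
#include <stdint.h>

/* Hypothetical helper: returns the next pc after a BLT instruction.
 * rs1_val and rs2_val are the source register values, imm is the
 * already sign-extended branch offset in bytes. */
uint32_t exec_blt(uint32_t pc, uint32_t rs1_val, uint32_t rs2_val, int32_t imm)
{
    /* Compare and branch together; no condition codes are written. */
    if ((int32_t)rs1_val < (int32_t)rs2_val) {
        return pc + (uint32_t)imm;   /* taken: pc-relative target */
    }
    return pc + 4;                   /* not taken: next 4-byte instruction */
}
```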
        
           | sitkack wrote:
           | In the guest, you trap on reading emulation state, so that
           | the source of truth is the hardware. Rather than use
           | something like KVM, I wonder if you could run another child
           | process and use ptrace?
        
       | [deleted]
        
       | sutterbutter wrote:
       | As a total newb to programming, what am I looking at here?
        
         | detrites wrote:
         | There are several different types of CPU's, in two main
         | classes, CISC and RISC. The difference is summarised by the
         | first letter - "Complex" vs. "Reduced" - Instruction Set
         | Computer. Or, what size "vocabulary" a CPU decodes.
         | 
         | RISC-V is a type of CPU architecture (a set of plans for how to
         | build one, not an actual CPU itself), that also happens to be
         | open source. Anyone can build a RISC-V CPU without having to
         | buy the rights to do so. (Many are.)
         | 
         | This project is an _emulation_ of a RISC-V CPU. A kind of
         | virtual "reference" CPU in software. It can be used to run
         | code built for a RISC-V type CPU, and to help understand
         | what's happening inside the CPU when it runs.
         | 
         | It's written in C, which is and was a very fundamental
         | programming language that's influenced the design of many other
         | languages. It is a language that is very close to the fundamental
         | language CPU's natively decode and process.
         | 
         | CPU's natively use a language referred to as "Assembly", but
         | which actually has many varieties particular to each CPU
         | design. Regardless of variety of CPU, assembly is usually
         | about as reasonably "close to metal" as it gets.
         | 
         | It's literally communicating with the CPU directly in its own
         | language. This makes it extremely fast to run, but laborious to
         | code, and also somewhat "dangerous" in that with such low-level
         | control, it's easy to mess things up.
         | 
         | This project takes an input of a text list of RISC-V assembly
         | instructions (a "program") and pretends to be a RISC-V CPU with
         | those instructions loaded into it and being run on it. Useful
         | for understanding, prototyping and building a RISC-V program.
         | 
         | CPU's are designed rather to run assembly that already "works",
         | having been created programmatically (compiled or interpreted),
         | by a higher level language that isn't going to give it things
         | that make no sense (hopefully).
         | 
         | So there is not usually a lot of provisioning done in the
         | design of the CPU to make it easy to watch it and its state
         | carefully at a low level and examine how your assembly program
         | is working, or not working. Emulation eases this.
        
           | dragonwriter wrote:
           | > CPU's natively use a language referred to as "Assembly", b
           | 
           | Strictly, CPUs use machine code. Assembly targeting a
           | particular CPU is a _very_ thin more-human-readable
           | abstraction around the underlying machine code, but it is
           | not, itself, what the CPU executes. That's why "assemblers"
           | exist - they are compilers from assembly language to machine
           | code (though, because assembly is a very thin abstraction,
           | they are much simpler than most other compilers.)
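
As a small illustration of how thin that abstraction is, the sketch below does what an assembler does for a single RV32I instruction: it packs the fields of `addi rd, rs1, imm` into the 32-bit I-type machine word. The function name `encode_addi` is hypothetical; the field layout follows the RV32I base encoding.

```c
#include <stdint.h>

/* Encode "addi rd, rs1, imm" as a 32-bit RV32I I-type word.
 * rd and rs1 are register numbers 0..31; imm is a signed 12-bit value. */
uint32_t encode_addi(unsigned rd, unsigned rs1, int imm)
{
    uint32_t imm12 = (uint32_t)imm & 0xFFFu;  /* low 12 bits, two's complement */

    return (imm12 << 20)                      /* imm[11:0] in bits 31..20 */
         | ((uint32_t)(rs1 & 31u) << 15)      /* rs1 in bits 19..15       */
         | (0u << 12)                         /* funct3 = 000 for ADDI    */
         | ((uint32_t)(rd & 31u) << 7)        /* rd in bits 11..7         */
         | 0x13u;                             /* opcode OP-IMM            */
}
```

With a0 being register x10, `encode_addi(10, 10, 1)` gives `0x00150513`, the machine code for `addi a0, a0, 1`.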
        
             | tester756 wrote:
             | Would calling "Assembly" a CPU's frontend language be
             | correct?
             | 
             | The same way as it is in compilers
        
             | detrites wrote:
             | Agree. And deeper than that may be microcode, which we
             | rarely see or reason about, and which, while it may very
             | much be there, is rarely of practical use. (I.e., when
             | learning, the distinctions may be somewhat of an impediment
             | without payoff.)
        
       | bjourne wrote:
       | Why stick with c89? Can't think of any compilers that don't
       | support c99 nowadays. The major benefit is that you can use
       | uint8_t and friends directly and don't need to define your own
       | wrapper types.
        
         | flohofwoe wrote:
         | One "advantage" (if one wants to call that) is that the code
         | would also compile as C++, while C99 has diverged enough from
         | the common C/C++ subset that one cannot use all C99 features in
         | C++ mode.
        
           | mnurzia wrote:
           | I totally missed this, good point.
           | 
           | Slightly unrelated, but just thought I should mention: the
           | sokol libraries are awesome!
        
         | contrarian1234 wrote:
         | Did Visual Studio finally make the jump?! (you could always
         | just compile it as C++ code though)
        
           | bjourne wrote:
           | Nope, stdint.h has been in msvc for over 10 years. Other c99
           | features may not be supported though.
        
             | flohofwoe wrote:
             | Except for VLAs (which are optional post-C99 anyway), MSVC
             | actually has pretty good support for recent C versions, and
             | since 2020 they're basically back on the "modern C" train: 
             | https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-...
        
             | jpfr wrote:
             | MSVC did a big rewrite of the C frontend around MSVC2013. I
             | haven't encountered C99 idioms that don't work nowadays.
             | Granted, I might not use every feature in my typical coding
             | style...
        
               | arp242 wrote:
               | It's been "fully" C99 (and C11, C17) compliant for about
               | 2 or 3 years. The only missing C99 features before that
               | were relatively rarely used ones like _Pragma.
        
             | mort96 wrote:
             | Hasn't the main issue with MS been VLAs? I seem to recall
             | that VLAs are the main reason MSVC won't ever support C99,
             | and that MSVC is one of the main reasons why VLAs were made
             | optional. It seems like MSVC supports C11 and C17 now,
             | thanks to the removal of mandatory VLAs.
        
               | zabzonk wrote:
               | vehement opposition from ms may be one of the reasons for
               | them being optional (and thus worthless) but the main one
               | is that they are impossible to use correctly. what
               | happens if you make one too big?
        
               | mort96 wrote:
               | I think they could potentially have some very limited
               | valid use cases, but I agree that a fixed length array
               | and/or heap allocation is usually much better than VLAs.
               | 
               | I was mainly just pointing out that MS's lack of C99
               | support isn't really a part of keeping C89 alive,
               | especially now that they officially support C11.
        
         | boricj wrote:
         | Funnily enough, the file rv.h does use stdint.h if available
         | and contains the following comment:
         | 
         | > All I want for Christmas is C89 with stdint.h
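
One common way to get "C89 with stdint.h" is to include the header only when the compiler reports C99 or later and to fall back to hand-written typedefs otherwise. The sketch below uses hypothetical type names (`my_u8`, `my_u32`) and is not a copy of what rv.h does; the fallback branch assumes an 8-bit `char` and a 32-bit `int`, which has to be verified per target.

```c
/* Prefer <stdint.h> on C99-or-later compilers; otherwise use fallback
 * typedefs. The fallback assumes 8-bit char and 32-bit int. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
typedef uint8_t  my_u8;
typedef uint32_t my_u32;
#else
typedef unsigned char my_u8;
typedef unsigned int  my_u32;
#endif

/* C89-style compile-time check: the array size becomes -1 (a compile
 * error) if my_u32 is not exactly 4 bytes wide. */
typedef char my_u32_is_4_bytes[(sizeof(my_u32) == 4) ? 1 : -1];
```

The negative-array-size trick stands in for `_Static_assert`, which C89 does not have.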
        
         | dezgeg wrote:
         | I've met several people that seriously think that C89 is the
         | peak of programming languages and that C99 just brings
         | misfeatures (like allowing variable declarations in the middle
         | of basic blocks, according to them)
        
         | mnurzia wrote:
         | It's more of a fun exercise, I guess. But I do have experience
         | with at least one compiler that doesn't support C99: Zilog's
         | ez80 C compiler. Back in the day I used to program my TI-84+ CE
         | for fun[0], and the only C solution was a pretty bespoke
         | C89-only compiler[1] distributed with a community toolchain[2],
         | which has since switched to clang. It's somewhat irrational,
         | but in the back of my mind it bugs me if the software I write
         | can't run on platforms like that.
         | 
         | [0] https://github.com/mnurzia/chip8-ce
         | 
         | [1] http://www.zilog.com/docs/appnotes/pb0098.pdf
         | 
         | [2] https://ce-programming.github.io/toolchain/
        
       | freecodyx wrote:
       | This proves that, at the core, the things we rely on to achieve
       | great software and life-impacting technologies are extremely
       | simple. The complexity is in how to make them.
        
         | numpad0 wrote:
         | The complexity is in how to distribute dev workload and how to
         | make it financially viable. No one pays for beautiful works of
         | art unless it's somehow anchored, tangled and aligned into
         | their interests.
        
         | arcticbull wrote:
         | The core concepts are generally very straightforward, however
         | it's always the optimization that adds complexity. That's how
         | you get the orders of magnitude improvement. This C89 core
         | definitely doesn't do macro op fusion for instance.
        
       | rowanG077 wrote:
       | The Readme doesn't answer it but I struggle to see why you want a
       | c implementation of an ISA.
        
         | detrites wrote:
         | Not sure if this was intended, but coming to this as someone
         | vaguely aware of RISC-V, it's looking like a fantastic form of
         | documentation for the ISA, that both describes and gives a way
         | to play with it, but in an intuitive, even fun manner.
         | 
         | Obviously this works best for someone who already knows C -
         | but the fact that it's C89 mitigates this aspect somewhat.
        
           | rowanG077 wrote:
           | A reference implementation would be in Verilog or VHDL.
        
         | Farmadupe wrote:
         | Considering it's allocation-free, maybe it's an ultralight/
         | simulator for checking large quantities of compiler output?
         | (i.e no VM to create and destroy for every testcase)
         | 
         | Or the same but for testing some verilog/vhdl CPU implementation
         | in a simulator?
         | 
         | Or since it's only 500SLOC, maybe it's just for fun!
        
           | rowanG077 wrote:
           | Then I would expect a comparison with verilator.
        
           | mnurzia wrote:
           | This is an excellent idea. One limitation of a testing
           | library of mine, `mptest`, is its inability to sandbox tests.
           | I may take this idea and develop a more robust (and
           | potentially parallel) testing framework around it.
        
         | nly wrote:
         | So you can compile and run it on any platform with a C compiler
        
           | rowanG077 wrote:
           | That is just something you can do with C code. That is not a
           | goal in itself. Why would you want to run a C ISA instead of
           | just using a standard simulator? Why not use verilator + any
           | of the open source RISC-V cores?
        
             | LoganDark wrote:
             | Because those are slower, more complex, and more difficult
             | to understand?
        
               | rowanG077 wrote:
               | I doubt verilator is much slower. The speed of it is
               | insane. They are indeed more complex and difficult to
               | understand. But I fail to see how that is a criterion. I
               | would very much rather include an industry standard
               | library in comparison to something homegrown.
        
         | srgpqt wrote:
         | Perhaps this could be used to run sandboxed code. Game engines
         | could safely run mods using something like this, ala QuakeC.
        
           | mnurzia wrote:
           | Definitely. My motivation for writing this was to have a
           | simple CPU for a virtual game console-like project. I decided
           | to release it on its own, though.
        
           | mcraiha wrote:
           | For a modern game engine you most likely want WebAssembly
           | support, e.g. Flight Simulator does that:
           | https://flightsimulator.zendesk.com/hc/en-us/articles/766290...
        
             | srgpqt wrote:
             | Sure, I'd love to see your 600 line webassembly
             | interpreter.
        
               | sitkack wrote:
               | Run wasm on this core.
        
       | bitwize wrote:
       | I feel myself descending into old-fartitude more and more with
       | every year. My wife and I were recently involved in a car
       | accident (no one was hurt). While I was being checked out I
       | overheard a 20-year-old firefighter exchange Facebook information
       | with an 18-year-old EMT. I was like, "wait a minute, you guys
       | seem really young and you still use Facebook? I thought Facebook
       | was for your grandparents and all the kids now use Snapchat or
       | TikTok?"
       | 
       | I get that same feeling now. This kid is 20 and still using C89?
       | Shouldn't people his age have been reared entirely in the
       | crystal-spires-and-togas utopia of Rust, with raw pointers and
       | buffer overruns being mere legends of their people's benighted
       | past told to them by their elders?
       | 
       | It's kind of comforting to see young programmers embracing the
       | old ways, even if it's for hack value only.
        
         | sitkack wrote:
         | I think there's at least a risk of kids seeing old people
         | romantically reenacting their eight-bit micro days and
         | thinking that it's something besides nostalgia.
         | 
         | I was kind of the opposite as a kid, if it wasn't crazy
         | futuristic I didn't want it. So even in the 80s I wanted
         | FPGA accelerators in every machine.
        
         | mnurzia wrote:
         | Admittedly, C89 has very little utility, especially among
         | people my age. For example, my university progresses from
         | Racket to Java to C++, and has a systems course that partially
         | teaches C11. Although good for teaching, I don't think those
         | languages artificially constrained me in the ways that C89
         | does. I felt that my programming skills improved the most when
         | I forced myself to work in such an under-powered language.
         | 
         | I also like the idea of being able to run my code anywhere,
         | kind of like Doom.
        
       | peterfirefly wrote:
       | 'switch' is a really, really nice language construct that was
       | fully implemented long before C89. Using lots of nested 'if's
       | instead is not a good idea.
        
         | hgs3 wrote:
         | 'switch' is good, but for VM's computed goto is better.
        
           | KerrAvon wrote:
           | depends on the compiler implementation. modern compilers may
           | be able to treat equivalent switch statements, gotos, and
           | if/else statements pretty much the same
        
             | nsajko wrote:
             | Only in trivial cases.
        
         | sylware wrote:
         | nested "ifs" are optimized out by compilers. Moreover in the
         | latest horrible gcc extensions you have the case statement
         | using a _not_ compiler-constant expression (you can find the
         | usage of such horrible gcc extension in linux net code).
        
           | mnurzia wrote:
           | This was one of my main justifications for making this
           | design choice, in addition to the (in my opinion)
           | overwhelming amount of break statements that would result
           | from using switches. But more importantly, many of the "if"
           | statements have non-constant or more complex expressions in
           | them that aren't supported in switch statements in ANSI C.
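
A minimal sketch of the kind of decode logic this describes, assuming conditions that combine several instruction fields at once. The names `enum kind` and `classify` are hypothetical and this is not the project's decoder; the opcode and funct values are the standard RV32I/M ones. A `switch` on a single field cannot express these combined tests directly, so an if chain keeps each case on one line.

```c
#include <stdint.h>

enum kind { K_MULDIV, K_ADD, K_SUB, K_ADDI, K_BRANCH, K_OTHER };

/* Classify a 32-bit instruction word with an if chain. Most conditions
 * test more than one field of the word at the same time. */
enum kind classify(uint32_t inst)
{
    uint32_t opcode = inst & 0x7Fu;
    uint32_t funct3 = (inst >> 12) & 7u;
    uint32_t funct7 = inst >> 25;

    if (opcode == 0x33u && funct7 == 0x01u) {
        return K_MULDIV;             /* M extension shares opcode 0x33 */
    } else if (opcode == 0x33u && funct3 == 0u && funct7 == 0x20u) {
        return K_SUB;
    } else if (opcode == 0x33u && funct3 == 0u && funct7 == 0x00u) {
        return K_ADD;
    } else if (opcode == 0x13u && funct3 == 0u) {
        return K_ADDI;
    } else if (opcode == 0x63u) {
        return K_BRANCH;             /* all conditional branches */
    }
    return K_OTHER;
}
```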
        
             | sylware wrote:
             | Yep.
             | 
             | And as you stated, it is important to stay as close as
             | possible to c89, because ISO is literally doing
             | planned obsolescence, but on a long time cycle (5-10
             | years).
             | 
             | Hopefully risc-v will be a success, and all system
             | components and interpreters of very-high-level languages
             | will be rewritten in risc-v assembly and it will become
             | actually very hard to do planned-obsolescence.
        
       | sylware wrote:
       | A bigger implementation, but has 64bits support:
       | 
       | https://bellard.org/tinyemu/
        
       | garganzol wrote:
       | Seeing the RISC-V instructions implemented in the emulator like
       | that, it comes to my mind that RISC-V is really a reduced
       | instruction set CPU.
       | 
       | When compared to AVR 16-bit RISC instruction set, RISC-V looks so
       | much simpler. (You may be indirectly familiar with AVR
       | architecture by the household name "Arduino".)
       | 
       | The intriguing part is that AVR is just a microcontroller, while
       | RISC-V is intended to be a full-blown CPU.
        
         | opencl wrote:
         | The base instruction set is tiny but there are quite a few
         | extensions and pretty much every practical implementation
         | includes at least a few of them.
         | 
         | E.g. the GD32V microcontrollers implement RV32IMAC, while the
         | Allwinner D1, a "full-blown" CPU meant to run Linux, implements
         | RV64IMAFDCVU.
         | 
         | RV32I/RV64I are the base 32/64 bit integer instruction sets and
         | every letter after that is a different extension. Most of the
         | extensions are relatively small and simple, but the C
         | (compressed instructions) extension introduces some decoder
         | complexity and the V (vector) extension adds several hundred
         | instructions.
         | 
         | Though even with all the extensions it is still a very
         | small/simple ISA by modern standards.
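
The naming scheme described above is simple enough to parse mechanically. The sketch below, with the hypothetical name `parse_isa`, reads a string such as "RV32IMAC", extracts the base width, and records each following letter in a bitmask. It assumes an ASCII-like character set and only handles single-letter extensions, not multi-letter names such as "Zicsr".

```c
#include <stddef.h>

/* Parse a simple ISA string such as "RV32IMAC".
 * Stores the base width (32, 64, ...) in *xlen and returns a bitmask
 * with bit (letter - 'A') set for every letter after the width.
 * Returns 0 and leaves *xlen untouched if the string is malformed. */
unsigned long parse_isa(const char *s, int *xlen)
{
    unsigned long mask = 0;
    int width = 0;

    if (s == NULL || s[0] != 'R' || s[1] != 'V') return 0;
    s += 2;
    while (*s >= '0' && *s <= '9') {     /* decimal base width */
        width = width * 10 + (*s - '0');
        s++;
    }
    if (width == 0) return 0;
    for (; *s != '\0'; s++) {            /* one bit per letter */
        if (*s < 'A' || *s > 'Z') return 0;
        mask |= 1UL << (*s - 'A');
    }
    *xlen = width;
    return mask;
}
```

A caller can then test for a given extension with something like `mask & (1UL << ('M' - 'A'))`.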
        
       | RobotToaster wrote:
       | Is this designed to be used with some kind of C to VHDL/verilog
       | transpiler?
        
         | RealityVoid wrote:
         | Not really, think of it like a... CPU emulator? Ish? You have
         | registers as variables in the program. If you have register a1
         | and you are at an instruction adding 1 to it, it will add 1 to
         | the variable representing a1. So on and so forth.
         | 
         | This works because, well, memory operations are mostly (all?)
         | what a CPU does, so this "core" takes the program and does the
         | same kind of memory operations the silicon would do, only in SW.
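
The "registers as variables" idea can be sketched in a few lines. The code below is an illustrative toy, not the project's implementation: `struct cpu` and `step` are hypothetical names, and only the ADDI instruction is handled. Executing `addi a1, a1, 1` simply adds 1 to the array element that stands in for a1 (register x11).

```c
#include <stdint.h>

/* Hypothetical minimal CPU state: 32 integer registers and a pc. */
struct cpu {
    uint32_t x[32];
    uint32_t pc;
};

/* Execute one instruction word. Only ADDI is handled in this sketch.
 * Returns 0 on success, -1 for anything not implemented. */
int step(struct cpu *c, uint32_t inst)
{
    uint32_t opcode = inst & 0x7Fu;
    uint32_t rd     = (inst >> 7) & 31u;
    uint32_t funct3 = (inst >> 12) & 7u;
    uint32_t rs1    = (inst >> 15) & 31u;
    uint32_t imm    = inst >> 20;            /* 12-bit immediate field */

    if (imm & 0x800u) imm |= 0xFFFFF000u;    /* sign-extend to 32 bits */

    if (opcode != 0x13u || funct3 != 0u) return -1;

    if (rd != 0) {                           /* x0 is hard-wired to zero */
        c->x[rd] = c->x[rs1] + imm;          /* wraps modulo 2^32 */
    }
    c->pc += 4;
    return 0;
}
```

Starting with `x[11] == 41`, stepping the word `0x00158593` (`addi a1, a1, 1`) leaves `x[11] == 42` and advances `pc` by 4.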
        
       ___________________________________________________________________
       (page generated 2023-06-10 23:01 UTC)