[HN Gopher] Show HN: RISC-V core written in 600 lines of C89 ___________________________________________________________________ Show HN: RISC-V core written in 600 lines of C89 Author : mnurzia Score : 145 points Date : 2023-06-10 13:08 UTC (9 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | aportnoy wrote: | How about a RISC-V disassembler in 200 lines of C99? | | https://github.com/andportnoy/riscv-disassembler/blob/master... | mnurzia wrote: | This is really cool, thanks for sharing! Something like this | would be a great tool to distribute with my emulator. | garganzol wrote: | It would be nice if you could put a link to that project in | your README file. Both projects are very impressive, | especially when seen in conjunction with each other. | aportnoy wrote: | I mean, his simulator already has a disassembler contained | within it, would just need to replace comments with print | statements. | charcircuit wrote: | This isn't a RISC-V core. It is a RISC-V emulator library. | nevi-me wrote: | Question: do the implementations of single instructions compile to | single instructions if targeting RISC-V with optimisations | enabled? It would be really awesome if compilers realised what | your code is doing and replaced the implementations of | instructions with those instructions. | mnurzia wrote: | Not really, my implementation isn't smart enough to guide | compilers to the right solution. Trivial instructions, like | xor, are of course recognized, but for example the 32x32 mul | implementation isn't. Maybe compilers will be smart enough one | day... | | https://godbolt.org/z/WEcTzKf7M | dbcurtis wrote: | Yeah, well, the rock that breaks your pick in that scenario is | copying all the processor state back and forth to/from the | emulation model, including flag register bits, and also | correctly handling exceptions and faults. Emulating the | instruction's happy path is just scratching the surface.
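For anyone following the xor vs. mul exchange above, here is a rough sketch of why one is easy for a compiler to spot and the other is not. It is not taken from the project: the names `u32`, `MASK32`, `emu_xor` and `emu_mulhu` are made up for illustration, and the multiply assumes a strict C89 setting where no 64-bit integer type is available, so the high half of the product has to be assembled from 16-bit pieces.

```c
#include <assert.h>

/* Hypothetical register type: C89 guarantees unsigned long has at least
   32 bits, so values are masked down to 32 bits by hand. */
typedef unsigned long u32;
#define MASK32 0xFFFFFFFFUL

/* XOR is a single C operator, which compilers readily map to one
   machine instruction. */
static u32 emu_xor(u32 a, u32 b)
{
    return (a ^ b) & MASK32;
}

/* Upper 32 bits of the unsigned 32x32 product (what RISC-V calls MULHU),
   computed from four 16x16 partial products. A compiler has to see
   through all of this to recognize a single multiply-high. */
static u32 emu_mulhu(u32 a, u32 b)
{
    u32 a_lo = a & 0xFFFFUL, a_hi = (a >> 16) & 0xFFFFUL;
    u32 b_lo = b & 0xFFFFUL, b_hi = (b >> 16) & 0xFFFFUL;
    u32 lo_lo = a_lo * b_lo;
    u32 hi_lo = a_hi * b_lo;
    u32 lo_hi = a_lo * b_hi;
    u32 hi_hi = a_hi * b_hi;
    u32 mid;

    assert(a <= MASK32 && b <= MASK32); /* inputs must be 32-bit values */

    /* Sum of the middle column; its upper bits carry into the high word. */
    mid = (lo_lo >> 16) + (hi_lo & 0xFFFFUL) + (lo_hi & 0xFFFFUL);
    return (hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (mid >> 16)) & MASK32;
}
```

Calling `emu_mulhu(0xFFFFFFFFUL, 0xFFFFFFFFUL)` gives `0xFFFFFFFE`, the top half of 0xFFFFFFFE00000001. With a 64-bit type the body would shrink to one multiply and a shift, which is the kind of pattern an optimizer has a much better chance of matching.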
| duskwuff wrote: | > including flag register bits | | RISC-V doesn't have those. Compare+branch is a single | instruction. | sitkack wrote: | In the guest, you trap on reading emulation state, so that | the source of truth is the hardware. Rather than use | something like KVM I wonder if you could run another child | process and use ptrace? | [deleted] | sutterbutter wrote: | As a total newb to programming, what am I looking at here? | detrites wrote: | There are several different types of CPUs, in two main | classes, CISC and RISC. The difference is summarised by the | first letter - "Complex" vs. "Reduced" - Instruction Set | Computer. Or, what size "vocabulary" a CPU decodes. | | RISC-V is a type of CPU architecture (a set of plans for how to | build one, not an actual CPU itself), that also happens to be | open source. Anyone can build a RISC-V CPU without having to | buy the rights to do so. (Many are.) | | This project is an _emulation_ of a RISC-V CPU. A kind of | virtual "reference" CPU in software. It can be used to run | code built for a RISC-V type CPU, and to help understand | what's happening inside the CPU when it runs. | | It's written in C, which is and was a very fundamental | programming language that's influenced the design of many other | languages. It is a language that is very close to the fundamental | language CPUs natively decode and process. | | CPUs natively use a language referred to as "Assembly", but | which actually has many varieties particular to each CPU | design. Regardless of variety of CPU, assembly is usually | about as reasonably "close to metal" as it gets. | | It's literally communicating with the CPU directly in its own | language. This makes it extremely fast to run, but laborious to | code, and also somewhat "dangerous" in that with such low-level | control, it's easy to mess things up.
| | This project takes an input of a text list of RISC-V assembly | instructions (a "program") and pretends to be a RISC-V CPU with | those instructions loaded into it and being run on it. Useful | for understanding, prototyping and building a RISC-V program. | | CPUs are designed rather to run assembly that already "works", | having been created programmatically (compiled or interpreted), | by a higher level language that isn't going to give it things | that make no sense (hopefully). | | So there is not usually a lot of provisioning done in the | design of the CPU to make it easy to watch it and its state | carefully at a low level and examine how your assembly program | is working, or not working. Emulation eases this. | dragonwriter wrote: | > CPUs natively use a language referred to as "Assembly", b | | Strictly, CPUs use machine code. Assembly targeting a | particular CPU is a _very_ thin more-human-readable | abstraction around the underlying machine code, but it is | not, itself, what the CPU executes. That's why "assemblers" | exist - they are compilers from assembly language to machine | code (though, because assembly is a very thin abstraction, | they are much simpler than most other compilers.) | tester756 wrote: | Would calling "Assembly" a CPU's frontend language be | correct? | | The same way as it is in compilers | detrites wrote: | Agree. And deeper than that may be microcode, which we | rarely see or reason about, and which, while it may very | much be there, is rarely of practical use. (Ie, when learning, the | distinctions may be somewhat an impediment without payoff.) | bjourne wrote: | Why stick with C89? Can't think of any compilers that don't | support C99 nowadays. The major benefit is that you can use | uint8_t and friends directly and don't need to define your own | wrapper types.
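To illustrate the "wrapper types" point above: a C89 code base that wants fixed-width integers typically has to pick them itself when `stdint.h` is not available. The following is only a sketch of that pattern, not the contents of the project's rv.h; the `ex_` names are invented for the example, and it assumes a platform where either `unsigned int` or `unsigned long` is exactly 32 bits wide.

```c
#include <assert.h>
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
/* C99 or later: take the standard fixed-width types. */
#include <stdint.h>
typedef uint8_t  ex_u8;
typedef uint32_t ex_u32;
#else
/* C89: choose types by inspecting the limits of the basic types. */
typedef unsigned char ex_u8;
#if UINT_MAX == 0xFFFFFFFFUL
typedef unsigned int ex_u32;
#elif ULONG_MAX == 0xFFFFFFFFUL
typedef unsigned long ex_u32;
#else
#error "no exactly-32-bit unsigned type found"
#endif
#endif

/* C89-style compile-time check: the array size is negative, and the
   build fails, if ex_u32 is not exactly 32 bits wide. */
typedef char ex_u32_is_32_bits[(sizeof(ex_u32) * CHAR_BIT == 32) ? 1 : -1];

/* 32-bit addition with wraparound, relying on the width chosen above. */
static ex_u32 ex_add32(ex_u32 a, ex_u32 b)
{
    ex_u32 sum = (ex_u32)(a + b);
    assert(sum <= 0xFFFFFFFFUL); /* holds by construction of ex_u32 */
    return sum;
}
```

With C99 the whole `#else` branch disappears, which is the convenience being described; the C89 branch is the price of supporting older compilers.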
| flohofwoe wrote: | One "advantage" (if one wants to call it that) is that the code | would also compile as C++, while C99 has diverged enough from | the common C/C++ subset that one cannot use all C99 features in | C++ mode. | mnurzia wrote: | I totally missed this, good point. | | Slightly unrelated, but just thought I should mention: the | sokol libraries are awesome! | contrarian1234 wrote: | Did Visual Studio finally make the jump?! (you could always | just compile it as C++ code though) | bjourne wrote: | Nope, stdint.h has been in MSVC for over 10 years. Other C99 | features may not be supported though. | flohofwoe wrote: | Except for VLAs (which are optional post-C99 anyway), MSVC | actually has pretty good support for recent C versions, and | since 2020 they're basically back on the "modern C" train: | https://devblogs.microsoft.com/cppblog/c11-and-c17-standard | -... | jpfr wrote: | MSVC did a big rewrite of the C frontend around MSVC2013. I | haven't encountered C99 idioms that don't work nowadays. | Granted, I might not use every feature in my typical coding | style... | arp242 wrote: | It's been "fully" C99 (and C11, C17) compliant for about | 2 or 3 years. The only missing C99 features before that | were relatively rarely used ones like _Pragma. | mort96 wrote: | Hasn't the main issue with MS been VLAs? I seem to recall | that VLAs are the main reason MSVC won't ever support C99, | and that MSVC is one of the main reasons why VLAs were made | optional. It seems like MSVC supports C11 and C17 now, | thanks to the removal of mandatory VLAs. | zabzonk wrote: | vehement opposition from ms may be one of the reasons for | them being optional (and thus worthless) but the main one | is that they are impossible to use correctly. what | happens if you make one too big? | mort96 wrote: | I think they could potentially have some very limited | valid use cases, but I agree that a fixed length array | and/or heap allocation is usually much better than VLAs.
| | I was mainly just pointing out that MS's lack of C99 | support isn't really a part of keeping C89 alive, | especially now that they officially support C11. | boricj wrote: | Funnily enough, the file rv.h does use stdint.h if available | and contains the following comment: | | > All I want for Christmas is C89 with stdint.h | dezgeg wrote: | I've met several people that seriously think that C89 is the | peak of programming languages and that C99 just brings | misfeatures (like, allowing variable declarations in the middle of | basic blocks according to them) | mnurzia wrote: | It's more of a fun exercise, I guess. But I do have experience | with at least one compiler that doesn't support C99: Zilog's | ez80 C compiler. Back in the day I used to program my TI-84+ CE | for fun[0], and the only C solution was a pretty bespoke | C89-only compiler[1] distributed with a community toolchain[2], | which has since switched to clang. It's somewhat irrational, | but in the back of my mind it bugs me if the software I write | can't run on platforms like that. | | [0] https://github.com/mnurzia/chip8-ce | | [1] http://www.zilog.com/docs/appnotes/pb0098.pdf | | [2] https://ce-programming.github.io/toolchain/ | freecodyx wrote: | This proves that, at the core, the things we rely on to achieve | great software and life-impacting technologies are extremely | simple. The complexity is in how to make them. | numpad0 wrote: | The complexity is in how to distribute dev workload and how to | make it financially viable. No one pays for beautiful works of | art unless it's somehow anchored, tangled and aligned into | their interests. | arcticbull wrote: | The core concepts are generally very straightforward, however | it's always the optimization that adds complexity. That's how | you get the orders of magnitude improvement. This C89 core | definitely doesn't do macro-op fusion for instance.
| rowanG077 wrote: | The Readme doesn't answer it but I struggle to see why you would want a | C implementation of an ISA. | detrites wrote: | Not sure if this was intended, but coming to this as someone | vaguely aware of RISC-V, it's looking like a fantastic form of | documentation for the ISA, that both describes and gives a way | to play with it, but in an intuitive, even fun manner. | | Obviously this works best for someone who already knows C - | but its being C89 mitigates this somewhat. | rowanG077 wrote: | A reference implementation would be in Verilog or VHDL. | Farmadupe wrote: | Considering it's allocation-free, maybe it's an ultralight | simulator for checking large quantities of compiler output? | (i.e. no VM to create and destroy for every testcase) | | Or the same but for testing some verilog/vhdl CPU implementation | in a simulator? | | Or since it's only 500 SLOC, maybe it's just for fun! | rowanG077 wrote: | Then I would expect a comparison with verilator. | mnurzia wrote: | This is an excellent idea. One limitation of a testing | library of mine, `mptest`, is its inability to sandbox tests. | I may take this idea and develop a more robust (and | potentially parallel) testing framework around it. | nly wrote: | So you can compile and run it on any platform with a C compiler | rowanG077 wrote: | That is just something you can do with C code. That is not a | goal in itself. Why would you want to run a C ISA instead of | just using a standard simulator? Why not use verilator + any | of the open source RISC-V cores? | LoganDark wrote: | Because those are slower, more complex, and more difficult | to understand? | rowanG077 wrote: | I doubt verilator is much slower. The speed of it is | insane. They are indeed more complex and difficult to | understand. But I fail to see how that is a criterion. I | would very much rather include an industry standard | library than something homegrown.
| srgpqt wrote: | Perhaps this could be used to run sandboxed code. Game engines | could safely run mods using something like this, ala QuakeC. | mnurzia wrote: | Definitely. My motivation for writing this was to have a | simple CPU for a virtual game console-like project. I decided | to release it on its own, though. | mcraiha wrote: | For a modern game engine you most likely want WebAssembly | support. e.g. Flight Simulator does that | https://flightsimulator.zendesk.com/hc/en-us/articles/766290... | srgpqt wrote: | Sure, I'd love to see your 600 line WebAssembly | interpreter. | sitkack wrote: | Run wasm on this core. | bitwize wrote: | I feel myself descending into old-fartitude more and more with | every year. My wife and I were recently involved in a car | accident (no one was hurt). While I was being checked out I | overheard a 20-year-old firefighter exchange Facebook information | with an 18-year-old EMT. I was like, "wait a minute, you guys | seem really young and you still use Facebook? I thought Facebook | was for your grandparents and all the kids now use Snapchat or | TikTok?" | | I get that same feeling now. This kid is 20 and still using C89? | Shouldn't people his age have been reared entirely in the | crystal-spires-and-togas utopia of Rust, with raw pointers and | buffer overruns being mere legends of their people's benighted | past told to them by their elders? | | It's kind of comforting to see young programmers embracing the | old ways, even if it's for hack value only. | sitkack wrote: | I think there's at least a risk of kids seeing old | people romantically reenacting their eight-bit micro days and | thinking that it's something besides nostalgia. | | I was kind of the opposite as a kid, if it wasn't crazy | futuristic I didn't want it. So even in the 80s I wanted an | FPGA accelerator in every machine. | mnurzia wrote: | Admittedly, C89 has very little utility, especially among | people my age.
For example, my university progresses from | Racket to Java to C++, and has a systems course that partially | teaches C11. Although good for teaching, I don't think those | languages artificially constrained me in the ways that C89 | does. I felt that my programming skills improved the most when | I forced myself to work in such an under-powered language. | | I also like the idea of being able to run my code anywhere, | kind of like Doom. | peterfirefly wrote: | 'switch' is a really, really nice language construct that was | fully implemented long before C89. Using lots of nested 'if's | instead is not a good idea. | hgs3 wrote: | 'switch' is good, but for VMs, computed goto is better. | KerrAvon wrote: | depends on the compiler implementation. modern compilers may | be able to treat equivalent switch statements, gotos, and | if/else statements pretty much the same | nsajko wrote: | Only in trivial cases. | sylware wrote: | nested "ifs" are optimized out by compilers. Moreover in the | latest horrible gcc extensions you have the case statement | using a _not_ compile-time constant expression (you can find the | usage of such horrible gcc extension in linux net code). | mnurzia wrote: | This was one of my main justifications for making this | design choice, in addition to the (in my opinion) | overwhelming amount of break statements that would result | from using switches. But more importantly, many of the "if" | statements have non-constant or more complex expressions in | them that aren't supported in switch statements in ANSI C. | sylware wrote: | Yep. | | And as you stated, it is important to stay as much as | possible close to c89, because ISO is literally doing | planned obsolescence, but on a long time cycle (5-10 | years). | | Hopefully risc-v will be a success, and all system | components and interpreters of very-high-level languages | will be rewritten in risc-v assembly and it will become | actually very hard to do planned-obsolescence.
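To make the switch vs. if exchange above a bit more concrete, here is a small sketch of an if-chain decoder for three RV32I instructions (ADDI, ADD, XOR). It is not the project's code: `word`, `struct cpu`, `BITS` and `step` are hypothetical names, and only the standard RV32I field positions and opcodes are relied on. The conditions combine several fields at once, which a single `case` label in ANSI C cannot express.

```c
#include <assert.h>

typedef unsigned long word; /* at least 32 bits in C89; masked to 32 below */

/* Hypothetical CPU state: just the 32 integer registers. */
struct cpu {
    word x[32];
};

/* Extract n bits of v starting at bit position lo. */
#define BITS(v, lo, n) (((v) >> (lo)) & ((1UL << (n)) - 1UL))

/* Execute one instruction. Returns 0 on success, 1 if unhandled. */
static int step(struct cpu *c, word insn)
{
    word opcode = BITS(insn, 0, 7);
    word rd     = BITS(insn, 7, 5);
    word funct3 = BITS(insn, 12, 3);
    word rs1    = BITS(insn, 15, 5);
    word rs2    = BITS(insn, 20, 5);
    word funct7 = BITS(insn, 25, 7);
    word result;

    assert(c != 0);

    if (opcode == 0x13 && funct3 == 0) {            /* ADDI */
        word imm = BITS(insn, 20, 12);
        if (imm & 0x800UL)
            imm |= 0xFFFFF000UL;                    /* sign-extend 12 bits */
        result = c->x[rs1] + imm;
    } else if (opcode == 0x33 && funct7 == 0) {     /* register-register ops */
        if (funct3 == 0)
            result = c->x[rs1] + c->x[rs2];         /* ADD */
        else if (funct3 == 4)
            result = c->x[rs1] ^ c->x[rs2];         /* XOR */
        else
            return 1;
    } else {
        return 1;
    }

    if (rd != 0)                                    /* x0 stays zero */
        c->x[rd] = result & 0xFFFFFFFFUL;
    return 0;
}
```

Feeding it `0x00500093` (`addi x1, x0, 5`) leaves 5 in `x[1]`. A switch on `opcode` with nested switches on `funct3` would do the same job; which reads better is the matter of taste being argued above.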
| sylware wrote: | A bigger implementation, but has 64-bit support: | | https://bellard.org/tinyemu/ | garganzol wrote: | Seeing the RISC-V instructions implemented in the emulator like | that, it comes to my mind that RISC-V is really a reduced | instruction set CPU. | | When compared to the AVR 16-bit RISC instruction set, RISC-V looks so | much simpler. (You may be indirectly familiar with AVR | architecture by the household name "Arduino".) | | The intriguing part is that AVR is just a microcontroller, while | RISC-V is intended to be a full-blown CPU. | opencl wrote: | The base instruction set is tiny but there are quite a few | extensions and pretty much every practical implementation | includes at least a few of them. | | E.g. the GD32V microcontrollers implement RV32IMAC, Allwinner | D1 which is a "full-blown" CPU meant to run Linux implements | RV64IMAFDCVU. | | RV32I/RV64I are the base 32/64 bit integer instruction sets and | every letter after that is a different extension. Most of the | extensions are relatively small and simple, but the C | (compressed instructions) extension introduces some decoder | complexity and the V (vector) extension adds several hundred | instructions. | | Though even with all the extensions it is still a very | small/simple ISA by modern standards. | RobotToaster wrote: | Is this designed to be used with some kind of C to VHDL/verilog | transpiler? | RealityVoid wrote: | Not really, think of it like a... CPU emulator? Ish? You have | registers as variables in the program. If you have register a1 | and you are at an instruction adding 1 to it, it will add 1 to | the variable representing a1. So on and so forth. | | This works because, well, memory operations are mostly (all?) a | CPU does so this "core" takes the program and does the same | kind of memory operations the silicon would do, only in SW. ___________________________________________________________________ (page generated 2023-06-10 23:01 UTC)