[HN Gopher] RISC-V Int. Ratifies 15 New Specs, Opening Up New RI... ___________________________________________________________________ RISC-V Int. Ratifies 15 New Specs, Opening Up New RISC-V Design Possibilities Author : snvzz Score : 109 points Date : 2021-12-02 16:05 UTC (6 hours ago) (HTM) web link (riscv.org) (TXT) w3m dump (riscv.org) | FullyFunctional wrote: | I have some mixed feelings about most of these. As Jim Keller | said, "most of the performance comes from just six instructions | and RISC-V has all of those". Adding more instructions will cost | area, power, design & verification time, all of which could go to | making the existing code go faster. | rektide wrote: | Your argument is general, but these are very specific areas | being served. | | * RISC-V Vector instructions seem like a huge win for all forms | of HPC. x86 is getting vector instructions & the wins have been | immense. Rather than a wide range of specific SIMD | instructions, vector instructions seem like a far more general | & easier to scale up & down implementation strategy. Not | everyone has to implement! | | * RISC-V Hypervisor specifications seem required for modern | computing, where VMs are commonplace. Have to have this | specification. Not everyone has to implement! | | * RISC-V Scalar Cryptography specifications providing | accelerated cryptography seem like another have-to-have in modern | data-centers. | | Worth re-iterating what's been said already: extensions are | just that: extensions. They're not required. I'm not sure what | the current state is of code detecting & using the accelerated | implementation when available, with soft-fallbacks otherwise. | For things like cryptography, usually it's a library, openssl | or similar, where the library is the reference implementation, | with special paths written in for using hardware where | available.
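A minimal sketch of the "library with a fast path and a soft fallback" pattern rektide describes, not how openssl or any real library does it. The function names are hypothetical, the caller is assumed to supply the set of detected extensions (real detection is OS-specific and left out), and the "accelerated" function is only a stand-in, since Python cannot emit a rotate instruction. Zbkb is used as the example because it is the bit-manipulation subset for cryptography mentioned elsewhere in this thread.

```python
from typing import Callable, FrozenSet

MASK32 = 0xFFFFFFFF


def rotl32_portable(x: int, n: int) -> int:
    """Reference path: rotate left using shifts and an OR. Works anywhere."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotl32_accelerated(x: int, n: int) -> int:
    """Stand-in for a path that would compile to a single rotate
    instruction on a core with Zbkb. Here it just reuses the portable
    arithmetic so the two paths give identical results."""
    return rotl32_portable(x, n)


def select_rotl32(available: FrozenSet[str]) -> Callable[[int, int], int]:
    """Pick an implementation once, based on the extensions the caller
    detected (hypothetical input; detection itself is not shown)."""
    if "zbkb" in available:
        return rotl32_accelerated
    return rotl32_portable


# Callers resolve the function once and then never care which path they got.
rotl32 = select_rotl32(frozenset({"i", "m", "a", "c"}))
```

The point of the design is that the extension check lives in exactly one place, so the rest of the program is identical on cores with and without the extension.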
| aseipp wrote: | These particular extensions come across as "long-tail" things | that are probably worth standardizing, IMO. Not every core | needs cryptographic acceleration, but the ones that do need it | tend to _really_ need it for those cases. Similarly if you need | hypervisor mode support, there are basically no alternatives to | just having it, and it requires enough software support to the | point you probably have to standardize it, if there's any hope | of it working. There's also the advantage that these give a | baseline for vendors and software to target instead of rolling | their own, within reason (though they may choose not to). | Some of the other drafted extensions not mentioned here are | perhaps more questionable... | | All three of these are complex enough to definitively increase | the design/verification time for any core that implements them, | though, that's for sure. (A net effect of this is that while | there are tons of simple in-order cores, actual "production" | RISC-V cores with features like this will remain rare...) | YorkshireSeason wrote: | The beauty of a modular instruction set architecture like | RISCV's is that you _don't_ have to implement all of it, only | the extensions that make sense for your use case. | | Aside, Keller's quote is probably partly in jest. If you are in | a constrained micro-controller environment something like the | Zfinx extension is probably helpful beyond the "just six | instructions" for code density. If you are crypto heavy, the | crypto extensions are going to be more helpful than "just six | instructions". If your workload is parallelisable and regular, | vectorisation helps you more than "just six instructions" and | so on. | | One size doesn't fit all. | bsder wrote: | > One size doesn't fit all. | | True, but a standard that is too malleable isn't really a | standard at all.
| neilalexander wrote: | In probably any "open" ISA, vendors/manufacturers are | likely to "fork it" and show up with their own extensions | anyway. By embracing extensions as a first-class concept, | it would seem RISC-V is trying to embrace variance rather | than to repeat the mistakes of architectures like amd64 | (which has multiple "microarchitecture levels" and only the | lowest level is truly portable). | panick21_ wrote: | To a certain extent yes they embrace variance but to a | certain extent they don't. | | The idea is that what dominates is software. If you | add your own extensions, literally all software in the | world won't support it. You will need to provide a huge | amount of stuff to fully take advantage of that. | | The availability of software both open and commercial on | top of standardized profiles should be what | manufacturers target. | | Early on of course, manufacturers have provided things | that are not standard yet. However over time, does it | really make sense to supply your own bit manipulation | extension? As the standard grows the vast majority of | applications should not require or be really improved by | proprietary extensions. | | Of course if somebody comes along and makes a chip that | is just vastly better than what anybody else has with | some extensions, that could break that paradigm and | people might embrace it. | brucehoult wrote: | A fairly high proportion of extensions (both existing, | and simply possible in future in general) are so | specialised that you wrap the special instructions inside | a function (often within a loop inside that function) and | then put that function in a library. | | You just choose whether to use that version of the | library or another one that uses normal instructions. | | It's no exaggeration to say that many of those extension | instructions might exist in only one function in one | library on your entire Linux (or Android, FreeBSD, | whatever) system.
| | To some extent the Vector extension can be like that. For | most programs they'll just pick up vectorised versions of | memcpy, strlen and so forth. In other programs (generally | ones you compile yourself) you might want to use the | vector extension directly -- maybe with auto-vectorisation | in time. LLVM can do a bit of that already. | | Only a few of the extensions have instructions that can | profitably weave their way into every part of your code. | The Bitmanip extension is like that. You _really_ want to | know whether your target processor has B or not. | snvzz wrote: | If you're building a chip for a server, workstation, | laptop, smartphone, then you'll want to adhere to a | platform spec profile. | | RVA22[0] is the first such profile, and among other | important things which go a long way to ease cross-vendor | software compatibility, it does require RVA22U and RVA22S, | which in turn require a set of extensions. | | [0]: https://github.com/riscv/riscv-platform-specs/blob/main/risc... | bee_rider wrote: | How does the modular instruction set work? If someone | proposes an extension, is the onus on them to also provide a | minimal RISCV implementation of that functionality? Or is it | just accepted that some binaries won't work on all devices? | forty wrote: | I have no idea about RISC-V specifically, but x86/amd64 have a | lot of optional instructions (I'm mostly aware of vector | stuff like SSE, AVX but I'm sure there is other stuff). | | On the programming side, you can detect feature support at | runtime and use a specific code path accordingly, or decide | at compile time that you require a specific CPU feature and | then your binary will just not work on CPUs without the | feature. | Pet_Ant wrote: | > Or is it just accepted that some binaries won't work on | all devices? | | Yes. Just like you cannot run Pentium code on a 386 because | they added new extensions.
Or how Scheme isn't really a | programming language but more like a _family_ of very | nearly compatible languages. RISCV has multiple targets | and so they have very different needs, from embedded | automotive to desktop. But with a common core it is easier to | develop and share tooling. | [deleted] | YorkshireSeason wrote: | It's best to think of RISCV not as a single ISA (= | instruction set architecture), but a parametric ISA. The | extensions are parameters. | | RISCV offers lots of official extensions to choose from, | such as M, A, F, D, P, V, .... In addition you have the 32 | vs 64 bit data width parameter. Any specific ISA will have | to instantiate those parameters, like e.g. so: _RISCV32MFP_ | or _RISCV64MAF_. Any implementation of e.g. _RISCV64MAF_ | will have to implement in silicon exactly those assembly | instructions (and supporting features) that the M, A and F | extensions demand, with 64 bit register width. | | It's like OO programming, where class constructors take | arguments that parameterise the created object. | ------ | | Regarding an implementation, given that RISCV is an ISA, | not an ISA implementation, you need to provide a functional | model. The official standard is [1] but it's a bit behind | the ratified extensions. For example [2] defines the | (ISA-visible) registers, while [3] gives you the instruction | decoding and execution clause for the most base instruction | set. [4] describes part of one of the available address | translation modes (for the 32 bit variant of the ISA). | Note: in modern processors page-table walks are hardware | accelerated, so OS and processor need to use the same | format here, which is why this is part of the ISA. | | [1] https://github.com/riscv/sail-riscv/tree/master/model | | [2] https://github.com/riscv/sail-riscv/blob/master/model/riscv_... | | [3] https://github.com/riscv/sail-riscv/blob/master/model/riscv_... | | [4] https://github.com/riscv/sail-riscv/blob/master/model/riscv_...
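To make the "parametric ISA" idea concrete, here is a rough sketch that treats an ISA name as data: a register width plus a set of extensions. Everything here is illustrative. The names `Isa`, `parse_isa` and `runs_on` are made up, the parser uses the lower-case `rv64...` spelling rather than the `RISCV64MAF` form used above, and the grammar is deliberately simplified: single-letter extensions first, multi-letter ones separated by underscores, no version numbers. It also assumes that `g` is shorthand for I, M, A, F, D plus Zicsr and Zifencei.

```python
import re
from typing import FrozenSet, NamedTuple


class Isa(NamedTuple):
    xlen: int                   # register width parameter: 32 or 64
    extensions: FrozenSet[str]  # extension parameters, lower-case


# Assumed expansion of the "g" shorthand.
G_EXPANSION = ("i", "m", "a", "f", "d", "zicsr", "zifencei")

# Simplified grammar: rv<width><single letters>[_<multi-letter name>]...
_ISA_RE = re.compile(r"rv(32|64)([a-z]+)((?:_[a-z][a-z0-9]*)*)")


def parse_isa(name: str) -> Isa:
    match = _ISA_RE.fullmatch(name.lower())
    if match is None:
        raise ValueError(f"unrecognised ISA string: {name!r}")
    extensions = set()
    for letter in match.group(2):
        if letter == "g":
            extensions.update(G_EXPANSION)
        else:
            extensions.add(letter)
    # Underscore-separated multi-letter extensions, e.g. "_zbkb".
    extensions.update(p for p in match.group(3).split("_") if p)
    return Isa(int(match.group(1)), frozenset(extensions))


def runs_on(binary: Isa, cpu: Isa) -> bool:
    """A binary fits a CPU if the widths match and the CPU implements
    every extension the binary was built for."""
    return binary.xlen == cpu.xlen and binary.extensions <= cpu.extensions
```

Under these assumptions, `runs_on(parse_isa("rv64imac"), parse_isa("rv64gc"))` holds while the reverse does not, which is the "some binaries won't work on all devices" situation in miniature.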
| panick21_ wrote: | The way it works is that there are profiles. The idea | behind profiles is that different use cases define profiles | with the instruction extensions they require or treat as optional | and so on. | | So the major Linux distros agree on a set of instructions | and that's called a profile. Same for embedded and others | eventually. | | You can add your own extensions for yourself if you want. | You can also make extensions and try to make it a pseudo-standard. | Or you can attempt to make it into a standard | extension. | | To be a standard extension it has to go through a long | process and it will likely be taped out multiple times | before it is ever ratified. Once it's ratified it will find | its way into profiles. | | So for example standard Linux distros now use RV64GC, | likely the next version of the Linux profile will include | more of the new instructions. | | But yes, the goal is not to create a 'universal binary', | but a reasonable compromise between reuse and | specialization. | FullyFunctional wrote: | I think you misunderstood what he said and I know he wasn't | joking, but I didn't point out that the implied context was | for Tenstorrent's usage, thus data center. He didn't mean | that you just need six instructions (e.g. Turing tarpit), he | meant (and he's right) that the bulk of [integer] performance | comes from a very small set of instructions, most critically | loads and conditional branches. | | All of the discussed extensions help _specific_ workloads, | but unless your workload is, say, 100% encryption all the | time, then the crypto extension will only provide a trivial | improvement on the _overall_ performance. | | Vector is a little bit different, but it (like AVX2/512) | comes at a _very_ significant cost and you better have | software that can take advantage of it. | panick21_ wrote: | The whole point of RISC-V is to be a universal architecture | used for everything.
The idea is to have profiles for | different verticals and applications. In these profiles you | define what extensions you need. | | If there is really a significant win for a certain type of | server workload, that community will make its own profile | and hopefully be able to get chips that utilize that. | | The problem is that there are also many mixed workloads and | having lots of general compute can work pretty well if you | want to run a broad set of applications. | | RISC-V is sort of a fluid spectrum from highly specialized | to highly general depending on the use case. | BenoitP wrote: | I don't see the J Extension on here. Does anyone know what's the | state of work of that group? | | (J Extension is about dynamic languages acceleration; stuff like | code caches, and maybe providing GCs some help. I guess that's | new territory so it's not as straightforward compared to say, the | bitmanip extension) | Pet_Ant wrote: | Is there a full list of what was ratified? | | Wikipedia only lists 6 as frozen, so where did the others come | from? https://en.wikipedia.org/wiki/RISC-V#Design | stephano wrote: | https://wiki.riscv.org/display/TECH/Recently+Ratified+Extens... | | Updated versions of the Privileged and Unprivileged Spec PDFs | will be posted to riscv.org/specifications soon.
| Pet_Ant wrote: | For convenience: | * PMP Enhancements for memory access and execution prevention on Machine mode (Smepmp) | * RISC-V Base Cache Management Operation ISA Extensions | * RISC-V Bit-Manipulation ISA-extensions | * RISC-V Count Overflow and Mode-Based Filtering Extension | * RISC-V Cryptography Extensions Volume I: Scalar & Entropy Source Instructions | * RISC-V State Enable Extension | * RISC-V "stimecmp / vstimecmp" Extension | * RISC-V Vector Extension | * The RISC-V Instruction Set Manual Volume II: Privileged Architecture | * "Zfh" and "Zfhmin" Standard Extensions for Half-Precision Floating-Point | * "Zfinx", "Zdinx", "Zhinx", "Zhinxmin": Standard Extensions for Floating-Point in Integer Registers | kiwidrew wrote: | Oh boy, give it a few more years and the RISC-V architecture is | going to have as many extensions as XMPP! Yay for | interoperability! | ghaff wrote: | Was on a call ahead of the RISC-V Summit last night where the | topic came up. | | Not to name drop but here's what David Patterson had to say | (he's vice chair of RISC-V BoD among other things). | | "One of the brilliant features of RISC-V is modularity. Everyone | wants an ecosystem that is adaptable but runs standard | software. Defining profiles and platforms is the next thing on | their slate. Binary compatibility is not the overwhelming thing | in the SoC world that it was with microprocessors. Flexibility | is one of the various attractive features of RISC-V." | | The idea with profiles is that you create groupings of modules | aimed at a specific use case. | | So, yes, there needs to be some balancing of flexibility and | compatibility/interoperability and there are concerns around | this. (One of the processor analysts brought this up.) But | people are aware and thinking about it. | hajile wrote: | When they say RV64GC, the C is compressed while the G is short | for I, M, A, F, D, Zicsr, and Zifencei. | | ARM does something similar.
They have TONS of extensions, but | then group them into 8.0, 8.1, 8.2, etc. and then also group them | with the A, R, and M designators too. | ufo wrote: | My memory is failing me... Does the scalar cryptography extension | include the one that has the bitwise manipulation (rotations, | etc.) or is that a separate spec? | Pet_Ant wrote: | You may be interested in the just ratified "RISC-V Bit-Manipulation | ISA-extensions" https://github.com/riscv/riscv-bitmanip/releases/download/1.... | bem94 wrote: | There is some overlap. There's the "Zbkb" (horrible name, I | know) extension which contains a subset of instructions from | the larger bitmanip extensions which are very useful for | cryptography. | | The more general bitmanip extensions contain other things | useful for e.g. address arithmetic. These are somewhat | orthogonal to scalar crypto. | ufo wrote: | I'd love to hear what people have to say about the vector | instructions. I've always found that SIMD on x86 was quite clunky | and I heard RISC-V vectors are very different from that. Is that | true? | d_tr wrote: | The extension is agnostic with respect to the actual width of | the chip's registers, and you also won't have to separately | account for the "last iteration" where you don't have enough | elements to fill a register, or at least it will be more | convenient. It also has strided load and store as well as | scatter and gather. | | This is all I remember, there is probably more. | _chris_ wrote: | Very different. RISC-V's vectors (RVV) are "variable length", | so the programmer can request a length and the machine tells | you what it can give you. Different machine versions can change | the underlying vector size and the code Will Just Work. | | This is different from "fixed-width SIMD" which has a | hard-coded vector length.
To make things more challenging for the | programmer/compiler, I believe most x86 SIMD versions also | don't provide a "mask" register, so you're stuck with using all | vector elements (AVX512 added masks). | | Each has its advantages and disadvantages (esp. on the design | complexity vs programmer/compiler interface complexity). | | RVV also provides a mechanism to reconfigure the register file, | ganging logical registers together to get longer effective | vector lengths. | volta83 wrote: | Can you change the "shape" of the vectors, e.g. 1x16 vs 4x4 to | support vectors and matrices? | crest wrote: | You have widening operations e.g. 16x16->32 bit | multiplications and can reduce the number of available | registers to get longer vectors, but among the really | interesting ones are fault-only-first loads and masked | instructions that enable the vector unit to work on things | like null-terminated strings. The specification includes | vectorized strlen/strcmp/strcpy/strncpy implementations as | examples. Most existing (packed) SIMD instruction sets | aren't useful for these common functions. | sanxiyn wrote: | Yes it is. On x86, SSE is 128-bit and AVX is 256-bit and | AVX-512 is 512-bit. RISC-V V extension handles all vector | lengths uniformly: vector add is the same instruction no matter | vector length. | petermcneeley wrote: | What about a vector of 1 element? | brucehoult wrote: | Yes, no problem. | | Machines with any size vector registers handle code | specifying vector length of 1 (or 0!) no problem. | | If you really want to make a machine with vector registers | that hold only one element then that will work too, except | for a handful of instructions that simply don't make sense | in that case (unless you use the LMUL feature): vector | permute register, slide up, slide down.
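The "request a length, get told what you can have" loop described above can be modelled in a few lines. This is a plain-Python simulation, not RVV code: `vsetvl` here is a hypothetical helper that simply grants `min(requested, vlmax)`, which is a simplification of the real instruction's rules, and `vlmax` stands in for however many elements a given machine's vector registers hold.

```python
from typing import List


def vsetvl(requested: int, vlmax: int) -> int:
    """Simplified model of the length request: the machine grants as many
    elements as asked for, capped at what its registers can hold."""
    return min(requested, vlmax)


def vector_add(a: List[int], b: List[int], vlmax: int) -> List[int]:
    """Element-wise add written once, independent of the machine's
    vector size. `vlmax` is the only machine-specific input."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vlmax < 1:
        raise ValueError("vlmax must be at least 1")
    out: List[int] = []
    i = 0
    n = len(a)
    while i < n:
        vl = vsetvl(n - i, vlmax)  # ask for everything that is left
        # One "vector instruction": operate on vl elements at once.
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl                    # the short final chunk needs no special case
    return out
```

The same function gives the same result with `vlmax=1`, `vlmax=4` or `vlmax=64`, and an empty input simply runs zero iterations, which mirrors the points about one-element vectors and the missing "last iteration" cleanup.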
| | CPUs intended to run standard operating systems with | shrink-wrapped software are constrained in the RVA22 | profile to provide vector registers of at least 128 bits | and no more than 65536 bits. But if you're doing some | custom embedded CPU then you can make the vector | registers the same size as the integer registers (32 or 64 | bits). Note that if you do that, you can still usefully do | vector operations on chars and shorts, and you can also set | LMUL=8 to give you effectively four vector registers of 256 | or 512 bits each (which might or might not be processed | serially). | bem94 wrote: | Direct links to some of the latest specs: | | - Scalar crypto: https://github.com/riscv/riscv-crypto/releases | | - Vectors: https://github.com/riscv/riscv-v-spec/releases | | - Bitmanip: https://github.com/riscv/riscv-bitmanip/releases | kevin_thibedeau wrote: | I don't see why crypto can't just be a peripheral. Here's a | block of memory and a key. Tell me when you're done. | crest wrote: | There are lots of good reasons to make cryptographic | operations instructions instead of a memory mapped | peripheral, but I prefer something like VIA padlock which | implemented cipher modes instead of just implementing the | round function as an instruction. Any implementation could even | trap those and implement them in a peripheral. The problem | with memory mapped peripherals is that access to them has to | be multiplexed and their state preserved by context switches. | Specialized instructions on existing registers avoid this | problem. VIA padlock solved it by piggybacking on the | existing x86 REP prefix for interruptible string instructions | and only cached the cipher round keys in the crypto unit, | reloading them from memory (or repeating the key schedule) | after a context switch. | bem94 wrote: | In lots of places this makes sense. E.g. lots of embedded ARM | platforms have a separate AES / ECC accelerator peripheral.
| | The trouble comes when you need to share access to a memory | mapped peripheral among multiple threads/processes/users etc. | It can be done, but it's usually easier to manage CPU | registers than peripheral devices for things like crypto | operations in larger systems. Plus, you have to do access | control to the peripheral (so other processes don't try and | steal your key); if it's all within the security boundary of a | "normal" process, you get that (mostly) for free. | | All of the above has caveats and exceptions, but generally | ISAs (ARM, SPARC, x86, now RISC-V) take this approach. | Symmetry wrote: | Huh, I'd heard that the Bitmanip extension would have a | conditional move but I don't see it in this version. | chem83 wrote: | Hypervisor seems to be covered here: | https://github.com/riscv/riscv-isa-manual/blob/master/src/hy... ___________________________________________________________________ (page generated 2021-12-02 23:01 UTC)