[HN Gopher] RISC-V Int. Ratifies 15 New Specs, Opening Up New RI...
       RISC-V Int. Ratifies 15 New Specs, Opening Up New RISC-V Design
       Author : snvzz
       Score  : 109 points
       Date   : 2021-12-02 16:05 UTC (6 hours ago)
 (HTM) web link (riscv.org)
 (TXT) w3m dump (riscv.org)
       | FullyFunctional wrote:
       | I have some mixed feelings about most of these. As Jim Keller
       | said, "most of the performance comes from just six instructions
       | and RISC-V has all of those". Adding more instructions will cost
       | area, power, design & verification time, all of which could go to
       | making the existing code go faster.
         | rektide wrote:
         | Your argument is general, but these are very specific areas
         | being served.
         | * RISC-V Vector instructions seem like a huge win for all forms
         | of HPC. x86 is getting vector instructions & the wins have been
         | immense. Rather than a wide range of specific SIMD
         | instructions, vector instructions seem like a far more general
         | & easier to scale up & down implementation strategy. Not
         | everyone has to implement!
         | * RISC-V Hypervisor specifications seem required for modern
         | computing, where VMs are commonplace. Have to have this
         | specification. Not everyone has to implement!
         | * RISC-V Scalar Cryptography specifications providing
         | accelerated cryptography seem like another must-have in modern
         | data centers.
         | Worth re-iterating what's been said already: extensions are
         | just that: extensions. They're not required. I'm not sure what
         | the current state is of code detecting & using the accelerated
         | implementation when available, with soft fallbacks otherwise.
         | For things like cryptography, usually it's a library, openssl
         | or similar, where the library is the reference implementation,
         | with special paths written in for using hardware where
         | available.
         | aseipp wrote:
         | These particular extensions come across as "long-tail" things
         | that are probably worth standardizing, IMO. Not every core
         | needs cryptographic acceleration, but the ones that do need it
         | tend to _really_ need it for those cases. Similarly if you need
         | hypervisor mode support, there are basically no alternatives to
         | just having it, and it requires enough software support to the
         | point you probably have to standardize it, if there's any hope
         | of it working. There's also the advantage that these give a
         | baseline for vendors and software to target instead of rolling
         | their own, within sensibility (though they may choose not to).
         | Some of the other drafted extensions not mentioned here are
         | perhaps more questionable...
         | All three of these are complex enough to definitively increase
         | the design/verification time for any core that implements them,
         | though, that's for sure. (A net effect of this is that while
         | there are tons of simple in-order cores, actual "production"
         | RISC-V cores with features like this will remain rare...)
         | YorkshireSeason wrote:
         | The beauty of a modular instruction set architecture like
         | RISCV's is that you _don't_ have to implement all of it, only
         | the extensions that make sense for your use case.
         | Aside, Keller's quote is probably partly in jest. If you are in
         | a constrained micro-controller environment something like the
         | ZFinx extension is probably helpful beyond the "just six
         | instructions" for code density. If you are crypto heavy, the
         | crypto extensions are going to be more helpful than "just six
         | instructions". If your workload is parallelisable and regular,
         | vectorisation helps you more than "just six instructions" and
         | so on.
         | One size doesn't fit all.
           | bsder wrote:
           | > One size doesn't fit all.
           | True, but a standard that is too malleable isn't really a
           | standard at all.
             | neilalexander wrote:
             | In probably any "open" ISA, vendors/manufacturers are
             | likely to "fork it" and show up with their own extensions
             | anyway. By embracing extensions as a first-class concept,
             | it would seem RISC-V is trying to embrace variance rather
             | than to repeat the mistakes of architectures like amd64
             | (which has multiple "microarchitecture levels" and only the
             | lowest level is truly portable).
               | panick21_ wrote:
               | To a certain extent yes they embrace variance but to a
               | certain extent they don't.
               | The idea is that what dominates is software. If you
               | add your own extensions, literally all software in the
               | world won't support them. You will need to provide a huge
               | amount of stuff to fully take advantage of them.
               | The availability of software, both open and commercial, on
               | top of standardized profiles should be what
               | manufacturers target.
               | Early on, of course, manufacturers have provided things
               | that are not standard yet. However, over time, does it
               | really make sense to supply your own bit manipulation
               | extension? As the standard grows, the vast majority of
               | applications should not require or be really improved by
               | proprietary extensions.
               | Of course, if somebody comes along and makes a chip that
               | is just vastly better than what anybody else has, thanks
               | to some extensions, that could break that paradigm and
               | people might embrace it.
               | brucehoult wrote:
               | A fairly high proportion of extensions (both existing,
               | and simply possible in future in general) are so
               | specialised that you wrap the special instructions inside
               | a function (often within a loop inside that function) and
               | then put that function in a library.
               | You just choose whether to use that version of the
               | library or another one that uses normal instructions.
               | It's no exaggeration to say that many of those extension
               | instructions might exist in only one function in one
               | library on your entire Linux (or Android, FreeBSD,
               | whatever) system.
               | To some extent the Vector extension can be like that. For
               | most programs they'll just pick up vectorised versions of
               | memcpy, strlen and so forth. In other programs (generally
               | ones you compile yourself) you might want to use the
               | vector extension directly -- maybe with auto-
               | vectorisation in time. LLVM can do a bit of that already.
               | Only a few of the extensions have instructions that can
               | profitably weave their way into every part of your code.
               | The Bitmanip extension is like that. You _really_ want to
               | know whether your target processor has B or not.
             | snvzz wrote:
             | If you're building a chip for a server, workstation,
             | laptop, smartphone, then you'll want to adhere to a
             | platform spec profile.
             | RVA22[0] is the first such profile, and among other
             | important things which go a long way to ease cross-vendor
             | software compatibility, it does require RVA22U and RVA22S,
             | which in turn require a set of extensions.
             | [0]: https://github.com/riscv/riscv-platform-
             | specs/blob/main/risc...
           | bee_rider wrote:
           | How does the modular instruction set work? If someone
           | proposes an extension, is the onus on them to also provide a
           | minimal RISCV implementation of that functionality? Or is it
           | just accepted that some binaries won't work on all devices?
             | forty wrote:
             | I have no idea for Riscv specifically, but x86/amd64 have a
             | lot of optional instructions (I'm mostly aware of vector
             | stuff like SSE, AVX but I'm sure there is other stuff).
             | On the programming side, you can detect at runtime feature
             | support and use specific code path accordingly, or decide
             | at compile time that you require a specific CPU feature and
             | then your binary will just not work on CPUs without the
             | feature.
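The runtime-detection approach described above can be sketched roughly as follows. This is a minimal illustration, not how any particular library does it: `detect_features()` and the feature name "vector" are hypothetical stand-ins for whatever the platform actually reports (CPUID on x86, for example), and both code paths are plain Python that produce the same result.

```python
from typing import Callable, FrozenSet, List

AddFn = Callable[[List[int], List[int]], List[int]]


def add_scalar(a: List[int], b: List[int]) -> List[int]:
    """Portable fallback: works on every CPU."""
    return [x + y for x, y in zip(a, b)]


def add_vector(a: List[int], b: List[int]) -> List[int]:
    """Stand-in for an accelerated path (hypothetical; same result)."""
    return list(map(sum, zip(a, b)))


def detect_features() -> FrozenSet[str]:
    """Hypothetical probe; a real one would query the CPU or the OS."""
    return frozenset()  # assumption: nothing beyond the base ISA


def select_add(features: FrozenSet[str]) -> AddFn:
    """Pick the implementation once, e.g. at library load time."""
    return add_vector if "vector" in features else add_scalar


add = select_add(detect_features())
```

The compile-time alternative mentioned above corresponds to hard-coding `add = add_vector`, which is simpler but gives a binary that only runs where the feature exists.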
             | Pet_Ant wrote:
             | > Or is it just accepted that some binaries won't work on
             | all devices?
             | Yes. Just like you cannot run Pentium code on a 386 because
             | they added new extensions. Or how Scheme isn't really a
             | programming language but more like a _family_ of very
             | nearly compatible languages. RISCV has multiple targets,
             | and so they have very different needs, from embedded
             | automotive to desktop. But with a common core it is easier to
             | develop and share tooling.
             | [deleted]
             | YorkshireSeason wrote:
             | It's best to think of RISCV not as a single ISA (=
             | instruction set architecture), but a parametric ISA. The
             | extensions are parameters.
             | RISCV offers lots of official extensions to choose from,
             | such as M, A, F, D, P, V, .... In addition you have the 32
             | vs 64 bit data width parameter. Any specific ISA will have
             | to instantiate those parameters, like e.g. so: _RISCV32MFP_
             | or _RISCV64MAF_. Any implementation of e.g. _RISCV64MAF_
             | will have to implement in silicon exactly those assembly
             | commands (and supporting features) that the M, A and F
             | extensions demand, with 64 bit register width.
             | It's like OO programming, where class constructors take
             | arguments that parameterise the created object.
             | ------
             | Regarding an implementation, given that RISCV is an ISA,
             | not an ISA implementation, you need to provide a functional
             | model. The official standard is [1] but it's a bit behind
             | the ratified extensions. For example [2] defines the (ISA-
             | visible) registers, while [3] gives you the instruction
             | decoding and execution clause for the most base instruction
             | set. [4] describes part of one of the available address
             | translation modes (for the 32 bit variant of the ISA).
             | Note: in modern processors page-table walks are hardware
             | accelerated, so OS and processor need to use the same
             | format here, which is why this is part of the ISA.
             | [1] https://github.com/riscv/sail-riscv/tree/master/model
             | [2] https://github.com/riscv/sail-
             | riscv/blob/master/model/riscv_...
             | [3] https://github.com/riscv/sail-
             | riscv/blob/master/model/riscv_...
             | [4] https://github.com/riscv/sail-
             | riscv/blob/master/model/riscv_...
             | panick21_ wrote:
             | The way it works is that there are profiles. The idea
             | behind profiles is that different use cases define profiles
             | listing the instruction extensions that are required or
             | optional, and so on.
             | So the major Linux distros agree on a set of instructions
             | and that's called a profile. Same for embedded and others
             | eventually.
             | You can add your own extensions for yourself if you want.
             | You can also make extensions and try to make them a
             | pseudo-standard. Or you can attempt to make one into a
             | standard extension.
             | To be a standard extension it has to go through a long
             | process and it will likely be taped out multiple times
             | before it is ever ratified. Once it's ratified it will find
             | its way into profiles.
             | So for example standard Linux distros now use RV64GC;
             | likely the next version of the Linux profile will include
             | more of the new instructions.
             | But yes, the goal is not to create a 'universal binary',
             | but a reasonable compromise between reuse and
             | specialization.
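As a rough sketch of the profile idea above: a profile can be modelled as a set of mandatory extensions plus a set of optional ones, and a chip conforms if it implements every mandatory one. The `Profile` class and the `LINUX_DEMO` contents are made up for illustration and are not taken from any ratified RISC-V profile.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Profile:
    """Hypothetical model of a profile: required and optional extensions."""
    name: str
    mandatory: FrozenSet[str]
    optional: FrozenSet[str] = frozenset()

    def missing(self, implemented: FrozenSet[str]) -> FrozenSet[str]:
        """Mandatory extensions that the chip does not implement."""
        return self.mandatory - implemented

    def conforms(self, implemented: FrozenSet[str]) -> bool:
        """Extra (custom) extensions are allowed; missing mandatory ones are not."""
        return not self.missing(implemented)


# Illustrative contents only, not a real profile definition.
LINUX_DEMO = Profile("linux-demo", frozenset("IMAFDC"), frozenset({"V"}))
```

A chip with vendor-specific extensions still conforms as long as the mandatory set is covered, which is how a profile leaves room for specialization without breaking standard software.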
           | FullyFunctional wrote:
           | I think you misunderstood what he said and I know he wasn't
           | joking, but I didn't point out that the implied context was
           | for Tenstorrent's usage, thus data center. He didn't mean
           | that you just need six instructions (eg. Turing tarpit), he
           | meant (and he's right) that the bulk of [integer] performance
           | comes from a very small set of instructions, most critically
           | loads and conditional branches.
           | All of the discussed extensions help _specific_ workloads,
           | but unless your workload is, say, 100% encryption all the
           | time, then the crypto extension will only provide a trivial
           | improvement on the _overall_ performance.
           | Vector is a little bit different, but it (like AVX2/512)
           | comes at a _very_ significant cost and you better have
           | software that can take advantage of it.
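The "trivial improvement on the overall performance" point above is essentially Amdahl's law. A small sketch of the arithmetic follows; the 5% share and 10x speedup used in the note below are made-up numbers, not measurements of any crypto extension.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when `fraction` of the run time
    is made `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)
```

With those illustrative numbers, `overall_speedup(0.05, 10)` is 1 / 0.955, i.e. under 5% faster overall, whereas a workload that is 100% encryption would see the full 10x.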
             | panick21_ wrote:
             | The whole point of RISC-V is to be a universal architecture
             | used for everything. The idea is to have profiles for
             | different verticals and applications. In these profiles you
             | define what extensions you need.
             | If there is really a significant win for a certain type of
             | server workloads, that community will make its own profile
             | and hopefully be able to get chips that utilize that.
             | The problem is that there are also many mixed workloads and
             | having lots of general compute can work pretty well if you
             | want to run a broad set of functions.
             | RISC-V is sort of a fluid spectrum from highly specialized
             | to highly general depending on the use case.
       | BenoitP wrote:
       | I don't see the J Extension on here. Does anyone know what's the
       | state of work of that group?
       | (J Extension is about dynamic languages acceleration; stuff like
       | code caches, and maybe providing GCs some help. I guess that's
       | new territory so it's not as straightforward compared to say, the
       | bitmanip extension)
       | Pet_Ant wrote:
       | Is there a full list of what was ratified?
       | Wikipedia only lists 6 as frozen, so where did the others come
       | from? https://en.wikipedia.org/wiki/RISC-V#Design
         | stephano wrote:
         | https://wiki.riscv.org/display/TECH/Recently+Ratified+Extens...
         | Updated versions of the Privileged and Unprivileged Spec PDFs
         | will be posted to riscv.org/specifications soon.
           | Pet_Ant wrote:
           | For convenience:
           | * PMP Enhancements for memory access and execution prevention
           | on Machine mode (Smepmp)
           | * RISC-V Base Cache Management Operation ISA Extensions
           | * RISC-V Bit-Manipulation ISA-extensions
           | * RISC-V Count Overflow and Mode-Based Filtering Extension
           | * RISC-V Cryptography Extensions Volume I: Scalar & Entropy
           | Source Instructions
           | * RISC-V State Enable Extension
           | * RISC-V "stimecmp / vstimecmp" Extension
           | * RISC-V Vector Extension
           | * The RISC-V Instruction Set Manual Volume II: Privileged
           | Architecture
           | * "Zfh" and "Zfhmin" Standard Extensions for Half-Precision
           | Floating-Point
           | * "Zfinx", "Zdinx", "Zhinx", "Zhinxmin": Standard Extensions
           | for Floating-Point in Integer Registers
       | kiwidrew wrote:
       | Oh boy, give it a few more years and the RISC-V architecture is
       | going to have as many extensions as XMPP! Yay for
       | interoperability!
         | ghaff wrote:
         | Was on a call ahead of the RISC-V Summit last night where the
         | topic came up.
         | Not to name drop but here's what David Patterson had to say
         | (he's vice chair of RISC-V BoD among other things).
         | "One of the brilliant features of RISC-V is modularity. Everyone
         | wants an ecosystem that is adaptable but runs standard
         | software. Defining profiles and platforms is the next thing on
         | their slate. Binary compatibility is not the overwhelming thing
         | in the SoC world that it was with microprocessors. Flexibility
         | is one of the various attractive features of RISC-V."
         | The idea with profiles is that you create groupings of modules
         | aimed at a specific use case.
         | So, yes, there needs to be some balancing of flexibility and
         | compatibility/interoperability and there are concerns around
         | this. (One of the processor analysts brought this up.) But
         | people are aware and thinking about it.
         | hajile wrote:
         | When they say RV64GC, the C is compressed while the G is short
         | for I, M, A, F, D, Zicsr, and Zifencei.
         | ARM does something similar. They have TONS of extensions, but
         | then group them into 8.0, 8.1, 8.2, etc then also group them
         | with the A, R, and M designators too.
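The G shorthand above can be expanded mechanically. Here is a minimal sketch, assuming a simplified ISA string of the form rv32 or rv64 followed by single-letter extensions only; real ISA strings also allow underscore-separated multi-letter names and version numbers, which this ignores. `expand_isa` is a name made up for this example.

```python
from typing import List, Tuple

# G is shorthand for the general-purpose bundle.
G_EXPANSION = ("I", "M", "A", "F", "D", "Zicsr", "Zifencei")


def expand_isa(isa: str) -> Tuple[int, List[str]]:
    """Return (xlen, extensions) for simplified strings like 'rv64gc'."""
    s = isa.lower()
    if not s.startswith(("rv32", "rv64")):
        raise ValueError(f"unsupported ISA string: {isa!r}")
    xlen = int(s[2:4])
    extensions: List[str] = []
    for letter in s[4:].upper():
        if not letter.isalpha():
            raise ValueError(f"only single-letter extensions handled: {isa!r}")
        names = G_EXPANSION if letter == "G" else (letter,)
        for name in names:
            if name not in extensions:  # keep first occurrence, drop repeats
                extensions.append(name)
    return xlen, extensions
```

So "rv64gc" comes out as a 64-bit base with I, M, A, F, D, Zicsr, Zifencei and C.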
       | ufo wrote:
       | My memory is failing me... Does the scalar cryptography extension
       | include the one that has the bitwise manipulation (rotations,
       | etc) or is that a separate spec?
         | Pet_Ant wrote:
         | You may be interested in the just-ratified
         | "RISC-V Bit-Manipulation ISA-extensions"
         | https://github.com/riscv/riscv-bitmanip/releases/download/1....
         | bem94 wrote:
         | There is some overlap. There's the "Zbkb" (horrible name, I
         | know) extension which contains a subset of instructions from
         | the larger bitmanip extensions which are very useful for
         | cryptography.
         | The more general bitmanip extensions contain other things
         | useful for e.g. address arithmetic. These are somewhat
         | orthogonal to scalar crypto.
       | ufo wrote:
       | I'd love to hear what people have to say about the vector
       | instructions. I've always found that SIMD on x86 was quite clunky
       | and I heard RISC-V vectors are very different from that. Is that
       | true?
         | d_tr wrote:
         | The extension is agnostic with respect to the actual width of
         | the chip's registers, and you also won't have to separately
         | account for the "last iteration" where you don't have enough
         | elements to fill a register, or at least it will be more
         | convenient. It also has strided load and store as well as
         | scatter and gather.
         | This is all I remember, there is probably more.
         | _chris_ wrote:
         | Very different. RISC-V's vectors (RVV) are "variable length",
         | so the programmer can request a length and the machine tells
         | you what it can give you. Different machine versions can change
         | the underlying vector size and the code Will Just Work.
         | This is different from "fixed-width SIMD" which has a hard-
         | coded vector length. To make things more challenging for the
         | programmer/compiler, I believe most x86 SIMD versions also
         | don't provide a "mask" register, so you're stuck with using all
         | vector elements (AVX512 added masks).
         | Each has its advantages and disadvantages (esp. on the design
         | complexity vs programmer/compiler interface complexity).
         | RVV also provides a mechanism to reconfigure the register file,
         | ganging logical registers together to get longer effective
         | vector lengths.
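The "request a length and the machine tells you what it can give you" loop can be modelled in plain Python. This is only a simulation of the idea: `vsetvl` here is a stand-in that returns min(requested, hardware maximum), which simplifies what the real instruction is allowed to do, and `max_elems` is a made-up parameter for how many elements one vector register holds.

```python
from typing import List


def vsetvl(requested: int, max_elems: int) -> int:
    """Simplified model: the machine grants at most max_elems elements."""
    return min(requested, max_elems)


def vector_add(a: List[int], b: List[int], max_elems: int) -> List[int]:
    """Strip-mined add: the same loop works for any hardware vector size."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if max_elems < 1:
        raise ValueError("max_elems must be at least 1")
    out: List[int] = []
    i = 0
    remaining = len(a)
    while remaining > 0:
        vl = vsetvl(remaining, max_elems)  # ask for everything that is left
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
        remaining -= vl
    return out
```

Note there is no separate tail loop: on the last pass the granted length is simply smaller, and changing `max_elems` changes the number of iterations but not the result.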
           | volta83 wrote:
           | Can you change the "shape" of the vectors? e.g. 1x16 vs 4x4 to
           | support vectors and matrices?
             | crest wrote:
             | You have widening operations e.g. 16x16->32 bit
             | multiplications and can reduce the number of available
             | registers to get longer vectors, but among the really
             | interesting ones are fault-only-first loads and masked
             | instructions that enable the vector unit to work on things
             | like null-terminated strings. The specification includes
             | vectorized strlen/strcmp/strcpy/strncpy implementations as
             | examples. Most existing (packed) SIMD instruction sets
             | aren't useful for these common functions.
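To illustrate why a fault-only-first load helps with null-terminated strings, here is a toy Python model. It is not the strlen example from the specification: `mem` plays the role of mapped memory (reading past its end "faults"), and `load_fault_only_first` and `vector_strlen` are hypothetical names. The modelled behaviour is that a fault on the first element is raised as usual, while a fault on a later element only shortens the load.

```python
def load_fault_only_first(mem: bytes, addr: int, vl: int) -> bytes:
    """Load up to vl bytes; only a fault on element 0 is reported."""
    if addr >= len(mem):
        raise MemoryError("fault on first element")
    return mem[addr:addr + vl]  # slicing stops at the end of mapped memory


def vector_strlen(mem: bytes, start: int, max_elems: int) -> int:
    """Length of the NUL-terminated string at mem[start:], chunk by chunk."""
    if start < 0 or max_elems < 1:
        raise ValueError("start must be >= 0 and max_elems >= 1")
    addr = start
    while True:
        chunk = load_fault_only_first(mem, addr, max_elems)
        pos = chunk.find(0)  # index of the first NUL byte, or -1
        if pos >= 0:
            return addr + pos - start
        addr += len(chunk)  # advance by what was actually loaded
```

The loop can safely ask for a full vector even when the terminator sits just before the end of mapped memory, because over-reading never faults unless the very first byte is unmapped.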
         | sanxiyn wrote:
         | Yes it is. On x86, SSE is 128-bit and AVX is 256-bit and
         | AVX-512 is 512-bit. RISC-V V extension handles all vector
         | lengths uniformly: vector add is the same instruction no matter
         | vector length.
           | petermcneeley wrote:
           | What about a vector of 1 element
             | brucehoult wrote:
             | Yes, no problem.
             | Machines with any size vector registers handle code
             | specifying vector length of 1 (or 0!) no problem.
             | If you really want to make a machine with vector registers
             | that hold only one element then that will work too, except
             | for a handful of instructions that simply don't make sense
             | in that case (unless you use the LMUL feature): vector
             | permute register, slide up, slide down.
             | CPUs intended to run standard operating systems with
             | shrink-wrapped software are constrained in the RVA22
             | profile to provide vector registers of at least 128 bits
             | and no more than 65536 bits. But if you're doing some
             | custom embedded CPU then you can make the vector
             | registers the same size as the integer registers (32 or 64
             | bits). Note that if you do that, you can still usefully do
             | vector operations on chars and shorts, and you can also set
             | LMUL=8 to give you effectively four vector registers of 256
             | or 512 bits each (which might or might not be processed
             | serially).
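The LMUL arithmetic above is easy to check. A small sketch, assuming the 32 architectural vector registers of the V extension and integer LMUL values only (fractional LMUL is ignored here); `register_groups` is a name made up for this example.

```python
from typing import Tuple

NUM_VECTOR_REGS = 32  # architectural vector registers in the V extension


def register_groups(vlen_bits: int, lmul: int, sew_bits: int) -> Tuple[int, int, int]:
    """Return (usable register groups, bits per group, elements per group)."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("only integer LMUL of 1, 2, 4 or 8 is modelled")
    if vlen_bits <= 0 or sew_bits <= 0:
        raise ValueError("vlen_bits and sew_bits must be positive")
    groups = NUM_VECTOR_REGS // lmul
    group_bits = vlen_bits * lmul
    return groups, group_bits, group_bits // sew_bits
```

With 32-bit vector registers and LMUL=8 this gives four groups of 256 bits, enough for 32 chars per group; with 64-bit registers the groups are 512 bits.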
       | bem94 wrote:
       | Direct links to some of the latest specs:
       | - Scalar crypto: https://github.com/riscv/riscv-crypto/releases
       | - Vectors: https://github.com/riscv/riscv-v-spec/releases
       | - Bitmanip: https://github.com/riscv/riscv-bitmanip/releases
         | kevin_thibedeau wrote:
         | I don't see why crypto can't just be a peripheral. Here's a
         | block of memory and a key. Tell me when you're done.
           | crest wrote:
           | There are lots of good reasons to make cryptographic
           | operations instructions instead of a memory mapped
           | peripheral, but I prefer something like VIA padlock which
           | implemented cipher modes instead of just implementing the
           | round function as an instruction. Any implementation could even
           | trap those and implement them in a peripheral. The problem
           | with memory mapped peripherals is that access to them has to
           | be multiplexed and their state preserved by context switches.
           | Specialized instructions on existing registers avoid this
           | problem. VIA padlock solved it by piggybacking on the
           | existing x86 REP prefix for interruptible string instructions
           | and only cached the cipher round keys in the crypto unit,
           | reloading them from memory (or repeating the key schedule)
           | after a context switch.
           | bem94 wrote:
           | In lots of places this makes sense. E.g. lots of embedded ARM
           | platforms have a separate AES / ECC accelerator peripheral.
           | The trouble comes when you need to share access to a memory
           | mapped peripheral among multiple threads/processes/users etc.
           | It can be done, but it's usually easier to manage CPU
           | registers than peripheral devices for things like crypto
           | operations in larger systems. Plus, you have to do access
           | control to the peripheral (so other processes don't try and
           | steal your key); if it's all within the security boundary of a
           | "normal" process, you get that (mostly) for free.
           | All of the above has caveats and exceptions, but generally
           | architectures (ARM, SPARC, x86, now RISC-V) take this approach.
         | Symmetry wrote:
         | Huh, I'd heard that the Bitmanip extension would have a
         | conditional move but I don't see it in this version.
         | chem83 wrote:
         | Hypervisor seems to be covered here:
         | https://github.com/riscv/riscv-isa-manual/blob/master/src/hy...
       (page generated 2021-12-02 23:01 UTC)