[HN Gopher] ARM chips have an instruction with JavaScript in the... ___________________________________________________________________ ARM chips have an instruction with JavaScript in the name Author : kdeldycke Score : 216 points Date : 2020-10-17 07:45 UTC (15 hours ago) (HTM) web link (stackoverflow.com) (TXT) w3m dump (stackoverflow.com) | chubot wrote: | Emery Berger argues that the systems community should be doing | exactly this -- improving infrastructure to run JS and Python | workloads: | | https://blog.sigplan.org/2020/10/12/from-heavy-metal-to-irra... | | _We need to incorporate JavaScript and Python workloads into our | evaluations. There are already standard benchmark suites for | JavaScript performance in the browser, and we can include | applications written in node.js (server-side JavaScript), Python | web servers, and more. This is where cycles are being spent | today, and we need evaluation that matches modern workloads. For | example, we should care less if a proposed mobile chip or | compiler optimization slows down SPEC, and care more if it speeds | up Python or JavaScript!_ | FridgeSeal wrote: | "We need more performance, should we fix the underlying | performance problem in our software? No, we should design our | CPUs to accommodate our slow, bottlenecked language!" | | Asking for CPU features to speed up Python is like trying to | strap a rocket to a horse and cart. Not the best way to go | faster. We should focus on language design and tooling that | makes it easier to write in "fast" languages, rather than | bending over backwards to accommodate things like Python, which | have so much more low-hanging fruit in terms of performance. | throwaway_pdp09 wrote: | Please No. Let the hardware do best what it's good at, being | simple and running fast. Let the interpreter/compiler layer do | its thing best, flexibility. There have been attempts to move | the hardware 'upwards' to meet the software and it's not | generally worked well. No special purpose language supporting | hardware exists now that I'm aware of - lisp machines, | smalltalk machines, Rekursiv, stretch, that 1980s object | oriented car crash by intel whose name escapes me... | | Edited to be a touch less strident. | matthewmacleod wrote: | _Let the hardware do best what it 's good at, being simple | and running fast. Let the interpreter/compiler layer do its | thing best, flexibility._ | | Yeah, this is pretty much the opposite of what actually works | in practice for general-purpose processors though - otherwise | we'd all be using VLIW processors. | throwaway_pdp09 wrote: | The complex processors like the PDPs were followed by risc | processors because they were simpler. The hardware has to | run code, I get that, but VLIW didn't work. Risc did. The | x86 decodes its opcodes into micro-ops which are | load/store risc-y simple things. Simplicity was always the | way to go. | | I do take your point about VLIW, but I'm kind of assuming | that the CPU has to, you know, actually run real workloads. | So move the complexity out of the languages. Or strongly, | statically type them. Or just don't use JS for server-side | work. Don't make the hardware guys pick up after bad | software. | klelatti wrote: | When RISC first appeared they really were simpler than | competing designs. | | But today I think it's hard to argue that modern | pipelined, out of order processors with hundreds of | millions of transistors are in any sense 'simple'.
| | If there is a general lesson to be learned it's that the | processor is often best placed to optimise on the fly | rather than have the compiler try to do it (VLIW) or | trying to fit a complex ISA to match the high level | language you're running. | saagarjha wrote: | You do understand that current hardware exists to support C, | right? | throwaway_pdp09 wrote: | Yep. They have a compiler to bring it down to the metal so | IDK what you're saying. | | --- EDIT --- | | @saagarjha, as I'm being slowposted by HN, here's my | response via edit: | | OK, sure! You need some agreed semantics for _that_ , at | the low level. But the hardware guys aren't likely to add | actors in the silicon. And they presumably don't intend to | support eg. hardware level malloc, nor hardware level | general expression evaluation[0], nor hardware level | function calling complete with full argument handling, nor | fopen, nor much more. | | BTW "The metal which largely respects C's semantics?" C | semantics were modelled after real machinery, which is why | C has variables which can be assigned to, and arrays which | follow very closely actual memory layout, and pointers | which are for the hardware's address handling. If the C | designers could follow theory rather than hardware, well, | look at lisp. | | [0] IIRC the PDPs had polynomial evaluation in hardware. | saagarjha wrote: | The metal which largely respects C's semantics? For | example, here are some instructions that exist to match | C's atomics model: | https://developer.arm.com/documentation/den0024/a/Memory- | Ord... | FullyFunctional wrote: | [0] Close, it was the VAX-11 and is the poster child for | CISC madness. | kps wrote: | What aspect of currently popular CPU instruction sets | 'exists to support C'? | saagarjha wrote: | Here's a (doubly-indirected) example: | https://news.ycombinator.com/item?id=24813376 | kps wrote: | And what about that has anything to do with C | specifically? Every useful programming language requires | cause precede effect, and every architecture that allows | load-store reordering has memory barrier instructions. | Specifically, where would code written in C require the | compiler to generate one of these instructions, where | code hand-written for the process's native instruction | set would not? | saagarjha wrote: | It matches C's semantics exactly, to the point where ARM | chose a specific acquire/release to match the "sequential | consistency for data-race-free programs" model without | requiring any global barriers or unnecessarily strong | guarantees, while still allowing reordering. | | (I should note that I believe this is actually C++'s | memory model that C is using as well, and perhaps some | other languages have adopted it too.) | Athas wrote: | Strong sequential consistency is a big one. Most | architectures that have tried to diverge from this for | performance reasons run into trouble with the way people | like to write C code (but will not have trouble with | languages actually built for concurrency). | | Arguably the scalar focus of CPUs is also to make them | more suited for C-like languages. Now, attempts to do | radically different things (like Itanium) failed for | various reasons, in Itanium's case at least partially | because it was hard to write compilers good enough to | exploit its VLIW design. It's up in the air whether a | different high-level language would have made those | compilers feasible. 
| | It's not like current CPUs are completely crippled by | having to mostly run C programs, and that we'd have 10x | as many FLOPS if only most software was in Haskell, but | there are certainly trade-offs that have been made. | | It is interesting to look at DSPs and GPU architectures, | for examples of performance-oriented machines that have | _not_ been constrained by mostly running legacy C code. | My own experience is mostly with GPUs, and I wouldn 't | say the PTX-level CUDA architecture is _too_ different | from C. It 's a scalar-oriented programming model, | carefully designed so it can be transparently vectorised. | This approach won over AMDs old explicitly VLIW-oriented | architecture, and most GPU vendors are now also using the | NVIDIA-style design (I think NVIDIA calls it SPMT). From | a programming experience POV, the main difference between | CUDA programming and C programming (apart from the | massive parallelism) is manual control over the memory | hierarchy instead of a deep cache hierarchy, and a really | weak memory model. | | Oh, and of course, when we say "CPUs are built for C", we | really mean the huge family of shared-state imperative | scalar languages that C belongs to. I don't think C has | any really unique limitations or features that have to be | catered to. | OldHand2018 wrote: | > Now, attempts to do radically different things (like | Itanium) failed for various reasons, in Itanium's case at | least partially because it was hard to write compilers | good enough to exploit its VLIW design. It's up in the | air whether a different high-level language would have | made those compilers feasible. | | My day job involves supporting systems on Itanium: the | Intel C compiler on Itanium is actually pretty good... | now. We'd all have a different opinion of Itanium if it | had been released with something half as good as what | we've got now. | | I'm sure you can have a compiler for any language that | really makes VLIW shine. But it would take a lot of work, | and you'd have to do that work early. Really early. | Honestly, if any chip maker decided to do a clean-sheet | VLIW processor and did compiler work side-by-side while | they were designing it, I'd bet it would perform really | well. | klelatti wrote: | Thank you for an interesting comment - seems to imply | that Intel have markedly improved the Itanium compiler | since they discontinued Itanium which is interesting! | | I guess any new architecture needs to be substantially | better than existing out of order, superscalar | implementations to justify any change and we are still | seeing more transistors being thrown at existing | architectures each year and generating some performance | gains. | | I wonder if / when this stops then we will see a | revisiting of the VLIW approach. | innocenat wrote: | I doubt it. Another major disadvantage of VLIW is | instruction density. If compiler cannot fill all | instruction slots, you are losing the density (thus | wasting cache, bandwidth, etc). | klelatti wrote: | It's obviously interesting to understand performance on real | life JS and Python workloads and maybe to use this to inform | ISA implementations. | | I don't think that it's being suggested that ISAs should be | designed to closely match the nature of these high level | languages. This has been tried before (e.g. iAPX 432 which | wasn't a resounding success!) 
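A concrete illustration of the acquire/release point raised above (the claim that modern ISAs are shaped around C's memory model): C11's sequentially consistent atomics map directly onto AArch64's load-acquire/store-release instructions, so the common message-passing pattern needs no separate barrier. This is a minimal sketch; the exact instructions emitted (LDAR/STLR) depend on the compiler and flags.

        #include <stdatomic.h>

        _Atomic int ready;
        int payload;

        void producer(int v)
        {
            payload = v;           /* plain store, ordered by the atomic store below */
            /* On AArch64 a seq_cst atomic store typically compiles to STLR. */
            atomic_store_explicit(&ready, 1, memory_order_seq_cst);
        }

        int consumer(void)
        {
            /* On AArch64 a seq_cst atomic load typically compiles to LDAR;
               the STLR->LDAR ordering rule gives "sequential consistency for
               data-race-free programs" without an extra DMB barrier. */
            while (!atomic_load_explicit(&ready, memory_order_seq_cst))
                ;
            return payload;
        }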
| FullyFunctional wrote: | Personally I think this would be unfortunate as I don't think | JavaScript is the path forward, but computers have always | existed to run software (d'oh), which means natural | selection will obviously make this happen if there is a market | advantage. | | However I see all high performance web computing moving to WASM | and JavaScript will exist just as the glue to tie it together. | Adding hardware support for this is naive and has failed before | (ie. Jazelle, picoJava, etc). | masklinn wrote: | The quote isn't really saying that JS-specific instructions | need be added to the ISA though. | FullyFunctional wrote: | In that sense it's not saying anything different from what | we have been doing for the past 60 years. | | The only significant thing that has changed is that power & | cooling is no longer free, so perf/power is a major | concern, especially for datacenter customers. | masklinn wrote: | > In that sense it's not saying anything different from | what we have been doing for the past 60 years. | | Yes it is? The essay's point is that "standard" hardware | benchmarks (C and SPEC and friends) don't match modern | workloads, and should be devalued in favour of better | matching actual modern workloads. | FullyFunctional wrote: | You think Intel designs processors around SPEC? (Hint: | they don't). | | ADD: It was an issue a long time ago. Benchmarks like | SPEC are actually much nicer than real server workloads. | For example, running stuff like SAP would utterly trash | the TLB. Curiously, AMD processors can address 0.25 TB | without missing in the TLB, much better than Intel. | randomsearch wrote: | Yeah WASM killing JS is an outside bet for this decade. Could | happen. (Please Lord). | doteka wrote: | Curious, have you actually written production software | targeting WASM? Because I have, in Rust, the language with | the best toolchain for this by an order of magnitude. | | Despite all that, and me being very familiar with both Rust | and JS, it was a big pain. WASM will remain a niche | technology for those who really need it, as it should be. | No one is going to write their business CRUD in it, it | would be a terrible idea. | FullyFunctional wrote: | I don't dispute it, but could you elaborate on the | painful parts? | | I find the crossing in and out of JavaScript to be less | than ideal. However, I don't see why WASM couldn't evolve | to require less of that, ie. expose more of what you need | JavaScript for today. | monoideism wrote: | > However, I don't see why WASM couldn't evolve to | require less of that, ie. expose more of what you need | JavaScript for today | | It can, and it is. Designers are already doing all they | can to make it an appealing target for a variety of | languages on multiple platforms. | alquemist wrote: | We are no longer in 2005. Javascript, especially in its | Typescript flavor, is a perfectly capable modern language. | sterlind wrote: | maybe this will lead to the revival of Transmeta-like | architectures? I always had a soft spot for reprogrammable | microcode. | FullyFunctional wrote: | WASM is _designed_ to be easily jitted, without the | expensive machinery we had to put in place to do this for | x86, so the whole point is to not require a new architecture. | | Because of this I find WASM to be the best direction yet | for a "universal ISA" as it's very feasible to translate to | most strange new radical architectures (like EDGE, Prodigy, | etc).
(Introducing a new ISA is almost impossible due to | the cost of porting the world. RISC-V might be the last to | succeed). | masklinn wrote: | > Adding hardware support for this is naive and has failed | before (ie. Jazelle, picoJava, etc). | | The hardware support being added here would work just as well | for WASM (though it might be less critical). | FullyFunctional wrote: | Pray tell in which way this will help WASM? | dimeatree wrote: | It would mean JavaScript would have to compile to the same | byte code less there are multiple instruction sets. | FullyFunctional wrote: | Unlike Java there's no official bytecode and all | implementation do it differently. I don't think any high | performance implementation use bytecodes, but instead uses | threaded code for their 1st tier, and native code for all | others. | nom wrote: | So it was easier to add an instruction in silicon to cater for an | ill-designed programming language, than to change the language | itself? | | I mean, if float-to-integer performance is so critical, why was | this not fixed a long time ago _in the language_? What am I | missing? | Skunkleton wrote: | > What am I missing? | | ARM wants to be appropriate for more workloads. They don't want | to have to wait for software to change. The want to sell | processor designs now. | TameAntelope wrote: | One is a technology change, the other is a culture change. The | former is _infinitely_ easier than the latter, in my opinion | /experience. | [deleted] | offtop5 wrote: | Consider Node JS is a top server side language, and arm for | data centers is coming ( to the extent it's not already here), | this makes sense. | | Technically someone can just make a better VM engine for | JavaScript to execute inside of, but whatever I guess they | decided this would be easier. | dtech wrote: | I'd say JS on the mobile ARM devices is 10.000x more common, | and thus important, than the NodeJS on ARM servers. | offtop5 wrote: | Wait. | | Do you mean JS inside of browsers themselves ? Or JS | running in another manner | untog wrote: | > What am I missing? | | I guess the incredibly long tail of JavaScript deployments. You | could change the language today and it would take years for it | to percolate down to all the deployments out there. By | comparison a silicon change requires zero code changes and is | entirely backwards compatible. Plus there's a pretty simple | marketing argument: "buy our CPU, it makes commonly-encountered | workloads faster", vs "buy our CPU, when everyone upgrades the | platform they use it'll be faster" | kanox wrote: | My guess is that this is what JIT people asked for. | amelius wrote: | Don't sufficiently advanced compilers infer what the real type | of a variable is, in the most important cases? | olliej wrote: | We do, but there are still times when a double -> int | conversion is necessary, this is true in every other language | as well. | | The real problem is that JS inherited the x86 behavior, so | everyone has to match that. The default ARM behavior is | different. All this instruction does is perform a standard | fpu operation, but instead of passing the current mode flags | to the fpu, it passes a fixed set irrespective of the current | processor mode. | | As far as I can tell, any performance win comes from removing | the branches after the ToInt conversion that are normally | used to match x86 behavior. 
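For reference, the conversion being discussed is ECMAScript's ToInt32 (the spec text is quoted in the next comment): truncate toward zero, wrap modulo 2^32, and send NaN, ±0 and ±∞ to +0. A portable C sketch of those semantics, for illustration only:

        #include <math.h>
        #include <stdint.h>

        /* Illustrative reference implementation of ECMAScript ToInt32. */
        static int32_t js_to_int32(double number)
        {
            if (isnan(number) || isinf(number) || number == 0.0)
                return 0;                            /* NaN, +/-0, +/-Infinity -> +0 */
            double pos_int  = trunc(number);         /* sign(number) * floor(abs(number)) */
            double int32bit = fmod(pos_int, 4294967296.0);  /* modulo 2^32 ...          */
            if (int32bit < 0.0)
                int32bit += 4294967296.0;            /* ... taken with positive sign    */
            return (int32bit >= 2147483648.0)
                       ? (int32_t)(int32bit - 4294967296.0)
                       : (int32_t)int32bit;
        }

On x86 the truncating conversion instruction (CVTTSD2SI) returns the sentinel 0x80000000 for NaN and out-of-range inputs, so a JIT can emit the cheap conversion and fall back to a slow path like the above only when it sees that value (INT_MIN is also a legitimate result, so the slow path re-checks). FJCVTZS, available from ARMv8.3-A, lets AArch64 code drop the equivalent software fix-up described in the surrounding comments.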
| robocat wrote: | First comment on question: "The JavaScript engine has to do | this operation (which is called ToInt32 in the spec) whenever | you apply a bitwise operator to a number and at various other | times (unless the engine has been able to maintain the number | as an integer as an optimization, but in many cases it | cannot). - T.J. Crowder" | | Edit: From https://www.ecma-international.org/ecma-262/5.1/#sec-9.5
  |
  |     9.5 ToInt32: (Signed 32 Bit Integer)
  |
  |     The abstract operation ToInt32 converts its argument to one of
  |     2^32 integer values in the range -2^31 through 2^31-1, inclusive.
  |     This abstract operation functions as follows:
  |       1. Let number be the result of calling ToNumber on the input
  |          argument.
  |       2. If number is NaN, +0, -0, +∞, or -∞, return +0.
  |       3. Let posInt be sign(number) * floor(abs(number)).
  |       4. Let int32bit be posInt modulo 2^32; that is, a finite integer
  |          value k of Number type with positive sign and less than 2^32
  |          in magnitude such that the mathematical difference of posInt
  |          and k is mathematically an integer multiple of 2^32.
  |       5. If int32bit is greater than or equal to 2^31, return
  |          int32bit - 2^32, otherwise return int32bit.
  |
  |     NOTE Given the above definition of ToInt32:
  |       - The ToInt32 abstract operation is idempotent: if applied to a
  |         result that it produced, the second application leaves that
  |         value unchanged.
  |       - ToInt32(ToUint32(x)) is equal to ToInt32(x) for all values of
  |         x. (It is to preserve this latter property that +∞ and -∞
  |         are mapped to +0.)
  |       - ToInt32 maps -0 to +0.
| domenicd wrote: | Remember not to refer to outdated specs; the modern version | is at https://tc39.es/ecma262/#sec-toint32 . The changes | look like editorial modernizations (i.e., I don't think there | have been any bugfixes to this low-level operation in the 9 | years since ES 5.1 was published), but it's better to be | safe than sorry, and build the right habits. | chrisseaton wrote: | Many important cases are polymorphic so have to be able to | handle both. | why_only_15 wrote: | The JSC JIT people seemed kind of surprised by this, which was | weird. Maybe the V8 JIT people asked for it? | monocasa wrote: | Is there a source for that? I was under the impression this | initially shipped on iOS devices so it'd be weird for JSC to | be surprised by it. | smt1 wrote: | Remember, don't think too hard about this stuff. It's better to | meditate. You can apparently just telepathically learn this stuff | by activating your paraterial lobe which can take 2-nary infos | about learn to think about it as more generalized cognitive | processes properly. Sharing = caring. | | I think it's best understood under a generalized ethnographic | "utterance" aka linguistic and cultural problem in say a first | order generalized theory of computing or combinatorics. aka the | discrete log encoding::decode problem or the more ancient wisdom | not passing down quick enough problem. When 2 group of static | people of different cultures that some static group can just | (lua) = implicitly define stuff very hard (combinatorical | optimization) = branch and bound, it's better to remember than | coding=calculate can be just thought as just a data structure | problem (do a one time pad) in a generalized calculation in a | automorphism and using techniques found in that abstract syntax | tree that JavaScript can gradually typed more safely using | TypeScript which Microsoft "already" gradually feed a generalized | search engine (Google) knows so they can just Wikipedia (autocode | the "Wisdom of the Crowds") it.
| | I'd understand it by understanding first principles from logic | and congitive sciences and applied software engineering. | | Software defined dynamic by necessity requires regularization to | be very secure cryptographically to be ever exist in the first | place. | | But remember you can think about it anything another way. This is | basically doing a reinterpret_cast in classical first order C++. | | I personally think about it best as myself as a lexically scoped | optimizing compiler using words compiled in some interior | language to do a 1:1 mapping in Visual Studio to static analysis | where my "utterances" can really apparently convince people that | pattern matching stuff can really understood various matching | techniques to recognize that these are just the same patterns. | | Basic elementary physics and elementary internet-networking | things to do figure out some paradoxes: | | 1. why aren't people focusing on general energy scavenging in | principle? Apparently other people are. But it requires change of | focus in many respects. Basically you can on a computer on | software do it safely, but in the physical world, you may | accidentally be doing very non-sensical things like imagining | that you can just do an inversion of gravity. | | 2. remember that bias exists, but history suggests that bias can | be thought many different ways. your experiences bias you, but | sometimes humans beings need to think automatically in new | adductive reasoning inferential systems in safety. That can only | be done to affect the real world in computer simulation. This can | add to human knowledge though. Deductive and inductive and | coinductive can be performed over arbitrary domains in whatever | order theoretic ways. 3. remember these are just analogies. They | can condition your mind, but could also let your brain think more | clearly. 4. for example implicit::ego::bias (abc) problem. 5. | everyone who knows signal processing knows singular "poles" as 0 | can be patched up by more modern stable manifold theory but only | people who understand that boundaries are just arbitrary of | linear spaces. | | Remember than in the cultural sense of philosophy you can uw | "hard" stuff once you understand traditionally thought of random | [0,1] as boolean as more rich simplexes in the newer norm space | [-1,1] which modern probability theory can be applied to. It | creates a generalized system of norms of measurement of classical | informational theory space. | | Sorry for being so cryptic but recall that you could understand | that shannon for example "accidentally" created the information | by creating information and commutation theory by autodualizing | math and a implicit association of a simple easy to read 1 page | document that created a "digital bit". Equated as entropy. You | could understand say hibert who let his "school" understand inf | dimensional spaces. | | I'll just point out a few free hugs in unicode spectrum (see | https://home.unicode.org/emoji/emoji-frequency/). 
I just | understand to look at it using free glue guns but people will | have to discover the right modulation schemes themselves in | discrete logarithms but it may require say, for example, doing 3 | simultaneous generalizations to combine 2 y-combinators of | information theoretic boolean bits could be remodulated into L-R | folded "cultural" to use experience to debias past implicit bias | and use + forward thinking bias to auto create quasi-well ordered | programs that computer science people have already created. | | let's just say I know by experience I know these things: | | 1. the www was originally created for people to collaborate over | long distances to communicate more easily with clarity by being | disconnected 2. social media was created for people to | communicate as "new" ways but things facebook and twitter got too | polluted by disinformation 3. telecomm systems eventually went | from very messy error connecting codes modems and progressed to | broadband by easing collaboration yet censorship still exists 4. | search engine systems consuming more and more information to help | locate stuff, but things like google get new bits of | interformation to associations in their own internal db 5. it was | at one time very hard to write international software but the | invention of unicode helped people get access to the internet for | the first time 6. the open/free source software like github and | cultural practices like code reviews, allowed software developers | for people around teams and corporations in remote areas to | collaborate easily 7. computing devices like phones and smarter | free computer languages allowed safer gradually and shifting | codes of conduct to allow more open collaboration 8. the creation | of cryptographic two session protocols allowed two parties | security agree to securely exchange two exchange metadata | together and work together 9. in applied software engineering, | the creation of distributed version control systems like git and | linux allowed mass adoption of free software 10. so these days in | the software world you can just code software just by "coding" by | pasting in semantic meaningful this is compositional software. | 11. the productivity improvements in computer science has been | enormous so that most people can save on energy both in hardware | and software and refocus on more important stuff | joosters wrote: | Bad bot. | klelatti wrote: | Given that the instruction set already has a float to integer | conversion it seems likely that the overhead of implementing this | would be small and so given the performance (and presumably | energy) win quoted elsewhere seems like a good move. | | It would be interesting to know the back story on this: how did | the idea feed back from JS implementation teams to ARM. Webkit | via Apple or V8 via Google? | olliej wrote: | Correct, the overhead is minimal - it basically just makes the | float->int conversion use a fixed set of rounding and clamping | modes, irrespective of what the current mode flags are set to. | | The problem is JS's double->int conversion was effectively | defined as "what wintel does by default", so on arm, ppc, etc | you need a follow on branch that checks for the clamping | requirements and corrects the result value to what x86 does. | | Honestly it would not surprise me if the perf gains are due to | removing the branch rather than the instruction itself. | klelatti wrote: | Interesting. 
So it's as much an x86 legacy issue as JS and | presumably JS followed x86 because it was more efficient to | do so (or maybe by default). | | Sounds too like performance gains will depend on how often | the branch is taken which seems highly dependent on the | values that are being converted? | formerly_proven wrote: | > Interesting. So it's as much an x86 legacy issue as JS | and presumably JS followed x86 because it was more | efficient to do so (or maybe by default). | | Most languages don't start with a spec, so the semantics of | a lot of these get later specced as "uhhhhh whatever the C | compiler did by default on the systems we initially built | this on". | brundolf wrote: | Seeing as JavaScript was designed and implemented in two | weeks, I'm betting this is the answer | tekromancr wrote: | Today's JavaScript is so divorced, so radically different | from the the original implementation to be considered a | different language, though. | monocasa wrote: | Rounding mode was defined then and hasn't changed since | though. | goatlover wrote: | Isn't modern JS backward compatible with 1.1? | domenicd wrote: | Mostly. See https://tc39.es/ecma262/#sec-additions-and- | changes-that-intr... for a comprehensive list of | backward-incompatible changes in the spec. | | Using that list to answer your question is a bit tricky, | since it also includes backward-compatibility breaks with | newer features. But, e.g., | | > In ECMAScript 2015, ToNumber applied to a String value | now recognizes and converts BinaryIntegerLiteral and | OctalIntegerLiteral numeric strings. In previous editions | such strings were converted to NaN. | | and | | > In ECMAScript 2015, the Date prototype object is not a | Date instance. In previous editions it was a Date | instance whose TimeValue was NaN. | | sound like backward-incompatible changes to a JS 1.1 | behavior. | | Another notable example is the formalization of function- | in-block semantics, which broke compatibility with | various implementations in order to find a least-bad | compromise everyone could interop on. I'm not sure if JS | 1.1 even had blocks though, much less functions in | blocks... | asddubs wrote: | >Another notable example is the formalization of | function-in-block semantics, which broke compatibility | with various implementations in order to find a least-bad | compromise everyone could interop on. I'm not sure if JS | 1.1 even had blocks though, much less functions in | blocks... | | can you explain what you mean? Did early js | implementations not have functions inheriting the block | scope? | olliej wrote: | It was 100% [ed] due to [/ed] the default wintel | behavior. All other architectures have to produce the | same value for that conversion. | freeone3000 wrote: | The branch always has to be taken if executing JavaScript, | because otherwise how would you tell if the value was | correct or not? You'd have to calculate it using this | method regardless and then compare! | olliej wrote: | What are you defining as correct? | freeone3000 wrote: | "correct" in this case being consistent with the | JavaScript specification. | fred256 wrote: | If you always take a branch, it's not a branch. | | Or did you mean by "taken" that the branch instruction | has to be executed regardless of whether the branch is | taken or not? | freeone3000 wrote: | JavaScript JITs always emit this instruction when ToInt32 | is required, since checking would be more expensive in | user code. 
And the instruction always used the JS | rounding method, since that's cheaper in silicon. I used | branch since the parent used "branch". | jleahy wrote: | Not quite. JS is round towards zero, ie. the same as C. If | you look at the x86 instruction set then until SSE2 (when | Intel specifically added an extra instruction to achieve | this) this was extremely awkward to achieve. x86 always did | round-to-nearest as the default. | | The use of INT_MIN as the overflow value is an x86-ism | however, in C the exact value is undefined. | marcusarmstrong wrote: | If anybody else was curious, it appears that the performance win | of use of this instruction looks to be about 1-2% in general | javascript workloads: | https://bugs.webkit.org/show_bug.cgi?id=184023#c24 | bitcoinmoney wrote: | I would argue a solid 1-2% can get you a promotion in HW | companies. You put 5 of this improvements and that's a | generational improvement. | nichch wrote: | Is that significant enough to justify the instruction? | lstamour wrote: | It is if it also increases battery life by a similar amount | ;-) | johntb86 wrote: | It increases the die area, power consumption, and instruction | set size by a miniscule amount, so probably. | secondcoming wrote: | Nothing justifies the prolonging of Javascript torture. | Joker_vD wrote: | Nothing justifies the prolonging of C torture either, | except of the C's wide spread. Why do you think modern CPUs | still expose mostly C-abstract-machine-like interface | instead of their actual out-of-order, pipelined, | heterogeneous-memory-hierarch-ied internal workings? | atq2119 wrote: | CPUs expose a "mostly-C-abstract-machine-like" interface | because this allows chip designers to change the internal | workings of the processor to improve performance while | maintaining compatibility with all of the existing | software. | | It has nothing to do with C, specifically, but with the | fact that vast amounts of important software tend to be | distributed in binary form. In a hypothetical world where | everybody is using Gentoo, the tradeoffs would be | different and CPUs would most likely expose many more | micro-architectural details. | wnoise wrote: | It's not like other languages are well-adapted to that | either. That's a hard target to code for. | goatlover wrote: | Why couldn't Haskell compilers make good use of that? | jerf wrote: | The "sufficiently smart compiler" [1] has been tried | often enough, with poor enough results, that it's not | something anyone counts on anymore. | | In this case, the most relevant example is probably the | failure of the Itanium. Searching for that can be | enlightening too, but heres a good start: | https://stackoverflow.com/questions/1011760/what-are-the- | tec... (For context, the essential Itanium idea was to | move complexity out of the chip and into the compiler.) | | Also, don't overestimate Haskell's performance. As much | fun as I've had with it, I've always been a bit | disappointed with its performance. Though for good | reasons, it too was designed in the hopes that a | Sufficiently Smart Compiler would be able to turn it into | something blazingly fast, but it hasn't succeeded any | more than anything else. Writing high-performance Haskell | is a lot like writing high performance Javascript for a | particular JIT... 
it can be done, but you have to know | _huge_ amounts of stuff about how the compiler /JIT will | optimize things and have to write in a very particular | subset of the language that is much less powerful and | convenient than the full language, with little to no | compiler assistance, and with even slight mistakes able | to trash the perfromance hardcore as some small little | thing recursively destroys all the optimizations. It's | such a project it's essentially writing in a different | language that just happens to integrate nicely with the | host. | | [1]: https://duckduckgo.com/sufficiently smart compiler | erikpukinskis wrote: | This is like saying "nothing justifies the prolonging of | capitalist torture". On some level it's correct, but it's | also being upset at something bordering a fundamental law | of the universe. | | There will always be a "lowest common denominator" platform | that reaches 100% of customers. | | By definition the lowest common denominator will be | limited, inelegant, and suffer from weird compatibility | problems. | | If it wasn't JavaScript it would be another language with | very similar properties and a similar history of | development. | moonchild wrote: | A recent AMD microarch was 10-20% faster than the previous | one, despite running on the same physical process. No single | component was responsible; there were changes to several | components, each of which increased speed by only 1-4%. | marcusarmstrong wrote: | A great question for the folks who actually do CPU design! As | somebody who writes JS rather frequently I'm not complaining. | [deleted] | mhh__ wrote: | Unless I misread the current arm docs, I don't think this is | still present in the ISA as of 2020? | | The whole RISC/CISC thing is long dead anyway, so I don't really | mind having something like this on my CPU. | | Bring on the mill (I don't think it'll set the world on fire if | they ever make it to real silicon but it's truly different) | alain94040 wrote: | To understand RISC, ignore the acronym it stands for, instead | just think fixed-instruction, load-store architecture. That's | what RISC really means today. | | No variable-length instructions. No arithmetic instructions | that can take memory operands, shift them, and update their | address at the same time. | elygre wrote: | Someone once explained it like "not a reduced set, but a set | of reduced instructions". Not r(is)c, but (ri)sc. | | Pretty much what you say, I just liked the way of describing | it. | FullyFunctional wrote: | It seems to me people are ignoring that the C stands for | Complexity. What's reduced is Complexity of the instruction | set, not the size of it (or even the instructions | themselves). In the context of the coinage of the term, | they almost certainly could have called it "microcode-free | ISA", but it wouldn't have sounded as cool. | mavhc wrote: | Doesn't the C stand for Computer? | FullyFunctional wrote: | Oops I'm wrong about the name but not about the spirit. | This is the original paper: | https://dl.acm.org/doi/pdf/10.1145/641914.641917 | andromeduck wrote: | Is ARM even microcode free these days? | userbinator wrote: | ...and even ARM breaks instructions into uops, just like x86. | mhh__ wrote: | Of course it does. The level of abstraction required for | modern pipelining and OOo scheduling is still beneath ARM. | I'm not that familiar with the details of arm on paper but | it's not that low level by research standards. | Joker_vD wrote: | What is with variable-length instruction aversion? 
Why is it | better to load a 32-bit immediate with two 4-byte | instructions (oh, and splitting it in 12/20 bit parts is non- | intuitive because of sign extension, thanks RISC V authors) | than with one 5-byte instruction? | sharpneli wrote: | Fixed width instructions allow trivial parallel instruction | decoding. | | With variable length instructions one must decode a | previous one to figure out where the next one will start. | scottlamb wrote: | > With variable length instructions one must decode a | previous one to figure out where the next one will start. | | People said the same thing about text encodings. Then | UTF-8 came along. Has anyone applied the same idea to | instruction encoding? | Someone wrote: | That would eat precious bits in each instruction (one in | each byte, if one only indicates 'first' or 'last' bytes | of each instruction). | | It probably is better to keep the "how long is this | instruction" logic harder and 'waste' logic on the | decoder. | pjc50 wrote: | All sorts of boundary issues occur when you're near the end | of a page or cache line. How many instruction bytes should | you load per cycle? What happens when your preload of a | byte that happens to be in a subsequent page trips a page | fault? | | By comparison, 4-byte instructions that are always aligned | have none of those problems. Alignment is a significant | simplification for the design. | | (Somewhere I have a napkin plan for fully Huffman coded | instructions, but then jumps are a serious problem as | they're no longer byte aligned!) | Hello71 wrote: | > What happens when your preload of a byte that happens | to be in a subsequent page trips a page fault? | | spectre | dragontamer wrote: | > No variable-length instructions | | AESE / AESMC (AES Encode, AES Mix-columns) are an instruction | pair in modern ARM chips in which the pair runs as a singular | fused macro-op. | | That is to say, a modern ARM chip will see "AESE / AESMC", | and then fuse the two instructions and execute them | simultaneously for performance reasons. Almost every "AESE" | encode instruction must be followed up with AESMC (mix | columns), so this leads to a significant performance increase | for ARM AES instructions. | kanox wrote: | Arm v8 is the "current" 64-bit version of the ISA and you | almost certainly have it inside your phone. You might have a | version older than v8.3. | ramshorns wrote: | What does it mean for the RISC/CISC thing to be dead? The | distinction between them is more blurred than it used to be? | dragontamer wrote: | RISC / CISC was basically IBM-marketing speak for "our | processors are better", and never was defined in a precise | manner. The marketing is dead, but the legend lives on years | later. | | IBM's CPU-advancements of pipelining, out-of-order execution, | etc. etc. were all implemented into Intel's chips throughout | the 90s. Whatever a RISC-machine did, Intel proved that the | "CISC" architecture could follow suit. | | ------ | | From a technical perspective: all modern chips follow the | same strategy. They are superscalar, deeply-pipelined, deeply | branch predicted, micro-op / macro-op fused "emulated" | machines using Tomasulo's algorithm across a far larger | "reorder buffer register" set which is completely independent | of the architectural specification. (aka: out-of-order | execution). | | Ex: Intel Skylake has 180 64-bit reorder buffer registers | (despite having 16 architectural registers). ARM A72 has | 128-ROB registers (despite having 32 architectural | registers).
The "true number" of registers of any CPU is | independent of the instruction set. | FullyFunctional wrote: | Since RISC wasn't coined by IBM (but by Patterson and | Ditzel) this is just plain nonsense. RISC was and is a | philosophy that's basically about not adding transistors or | complexity that doesn't help performance and accepting that | we have to move some of that complexity to software | instead. | | Why wasn't it obvious previously? A few things had to | happen: compilers had to evolve to be sophisticated enough, | mindsets had to adapt to trusting these tools to do a good | enough job (I actually know several who in the 80' still | insisted on assembler on the 390), and finally, VLSI had to | evolve to the point where you could fit an entire RISC on a | die. The last bit was a quantum leap as you couldn't do | this with a "CISC" and the penalty for going off-chip was | significant (and has only grown). | FullyFunctional wrote: | Not even remotely. Nothing in the RISC philosophy says | anything about pure data-path ALU operations. In fact this | instruction is pretty banal compared to many other FP | instructions. | jbirer wrote: | I think you are thinking of the old Java-optimizing | instructions on the older ARM processors. | saagarjha wrote: | Those never really took off... | barumi wrote: | FTA: | | > Which is odd, because you don't expect to see JavaScript so | close to the bare metal. | | This seems to ignore all the work done on server-side javascript | with projects such as node.js and deno, as well as the fact that | cloud service providers such as AWS have been developing their | own ARM-based servers. | yjftsjthsd-h wrote: | Just because a language is used on servers / at scale / in | enterprise, doesn't make it any closer to the metal. I mean, | look at Python, Java, or Ruby: All used on servers for | "serious" applications, but certainly not any sort of "bare | metal" languages still. | joshuaissac wrote: | Some ARM chips used to have bare-metal Java bytecode support, | called Jazelle. | orf wrote: | No it doesn't, it's still surprising to see a CPU instruction | added specifically to cater for such a high level language like | JS. | | It does totally makes sense though, given the importance of JS | and it's common use in mobiles. | est31 wrote: | Not surprising to me to see CPU instructions being added for | widely used use cases. Basically, a CPU instruction set | maker's job is to perform clustering of the tasks that the | CPUs do and accelerate commonly used tasks. If people do a | lot of AES, the instruction set maker adds AES extensions. If | people do a lot of CRC, they add CRC instructions. If people | convert doubles to integers all day in a very JS specific | way, then that's the instruction they add. | regularfry wrote: | With reduced instruction sets, though, the idea is to | provide common subparts and make them fast rather than | dedicated instructions. While it's not odd to see dedicated | instructions in CPUs in general, it's jarring to see if | you're familiar with ARM chips from back when they had 25 | instructions total. | est31 wrote: | Yeah that's how RISC vs CISC is taught in class, I've | heard that same thing. I think it's an outdated paradigm | though, if it's not been wrong all along. A CISC chip can | still officially support instructions, but deprecate them | and implement them very slowly in microcode, as is done | with some x86 instructions. 
And a RISC chip manufacturer | might just have used this phrase as a marketing line | because designing RISC chips is easier than starting with | a monster that has tons of instructions, and they needed | some catchy phrase to market their alternative. They then | get into the gradual process of making their customers | happier one by one by adding special, mostly cold, | silicon targeted for special workflows, like in this | instance. | | Ultimately, the instruction set isn't that relevant | anyways, what's more relevant is how the microcode can | handle speculative execution for common workflows. | There's a great podcast/interview from the x86 | specification author: | https://www.youtube.com/watch?v=Nb2tebYAaOA | orf wrote: | Can you name any other instructions with the name of a | programming language in the actual instruction name? | | No? Then it seems way more specific than the other examples | you listed. So specific that it's only applicable to a | single language and that language is in the instruction | name. That's surprising, like finding an instruction called | "python GIL release". | tedunangst wrote: | If they had named the instruction FCVTZSO, would you | care? Would you even know? | Someone wrote: | CPUs with the name of a programming language exist: | | - https://en.wikipedia.org/wiki/Lisp_machine | | - https://en.wikipedia.org/wiki/Java_processor | | These are examples of https://en.wikipedia.org/wiki/High- | level_language_computer_a.... | aardvark179 wrote: | Off the top of my head, and ignoring the Java bytecode | instructions added at one point, no. I can however think | of quite a lot of crypto related instructions, and many | CPU and MMU features aimed at very specific language | features such as garbage collection. | FullyFunctional wrote: | Well if you go far enough back, it was not usual to find | special allowance for programming language constructs | (like procedure linkage that supported display, useful | for languages like Pascal with nested procedures etc). | | The reason you don't find this is that all "modern" | processors designed since ~ 1980 are machines to run ... | C as the vast majority (up until ~ Java) of all software | on desktops and below were written in C. This also has | implications for security as catching things like out of | bound access or integer overflow isn't part of C so doing | it comes with an explicit cost even when it's cheap in | hardware. | est31 wrote: | In addition to what the sibling comments said, note that | "Python GIL release" is a very high level concept and how | the GIL is implemented can change over time. It interacts | with how the OS implements locking. FJCVTZS on the other | hand is a (mostly) pure function with well defined inputs | and outputs, and will likely be useful for Javascript | implementations for a long time. Will JS be around in 10 | years? Very likely. So ARM adds an instruction for it. | | The people at ARM have likely put a lot of thought behind | this, trying to find the places where they, as | instruction set vendor, can help making JS workflows | easier. | | And btw, I'm pretty sure that if python has some equally | low hanging fruit to make execution faster, and python | enjoys equally large use, ARM will likely add an | instruction for it as well. | diggan wrote: | Not necessarily programming languages (in similar vain as | what est31 said) but cryptography has made its way into | instructions, see SHA1RNDS4, SHA1NEXTE, SHA1MSG1 and | more. 
They are not general cryptographic primitives but | specific instructions for computing a specific | cryptographic hash, just because SHA became popular. Also | has SHA in its name :) | est31 wrote: | Yes, the examples I quoted, AES and CRC, all have dedicated | instructions in x86 processors (at least the ones that | implement the extensions). | | https://en.wikipedia.org/wiki/AES_instruction_set#Instruc | tio... | | https://www.felixcloutier.com/x86/crc32 | | As does ARM: | | https://developer.arm.com/documentation/ddi0514/g/introdu | cti... | | https://developer.arm.com/documentation/dui0801/g/A32-and | -T3... | | You could say that these are language-agnostic, and | indeed they are, but in a certain way they are more | _specific_ than the JavaScript operation, because the | JavaScript operation is likely used in many different | algorithms. I 'd argue that from the point of view of a | chip manufacturer, the difference matters only little. | Both are tasks that the CPUs do often and thus cost | energy. By implementing them natively, the CPU | manufacturer reduces the cost of those workflows for | their clients. | neerajsi wrote: | More importantly, the customers of the cpu vendors use | benchmarks to evaluate potential chip suppliers, and Js | benchmarks are near the top of the list for ARM now. | Everyone from the OS vendor to the compiler authors to | ARM itself are looking at the same operations, so | instructions like this get requested and added. | | There are also funny xaflag, axflag instructions in armv8 | specifically to speed up x86 emulation on arm64. I | believe they were added at the request of the msft | emulator team. | imtringued wrote: | It doesn't surprise me because we are supposed to expect | commonly used programming languages to receive optimizations. | If we didn't follow this logic there would be no reason to | optimize C. Instead each processor company would optimize for | their own proprietary language and expect everyone to use the | proprietary language for high performance needs. | barumi wrote: | > No it doesn't, it's still surprising to see a CPU | instruction added specifically to cater for such a high level | language like JS. | | It surprises you personally, but if you think about it it's | easy to understand that widespread interpreters that use a | specific number crunching primitive implemented in software can | and do benefit from significant performance improvements if | they offload that to the hardware. | | You only need to browse through the list of opcodes supported | by modern processors to notice a countless list of similar | cases of instructions being added to support even higher-level | operations. | | I mean, are you aware that even Intel added instructions for | signal processing, graphics, and even support for hash | algorithms? | | And you're surprised by a floating point rounding operation? | moritzwarhier wrote: | Considering that JS is JIT-compiled and doesn't have an | integer type, this is not surprising to me at all.. (but | still interesting) | Animats wrote: | Nice. Also useful is a set of integer operations that raise an | exception on overflow. | nabla9 wrote: | SPARC processors have tagged add and subtract instructions to | help dynamic languages. | goatinaboat wrote: | IBM has hardware acceleration for XML processing, of all | things, so there is plenty of precedent for this. | | https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0... | FullyFunctional wrote: | ...
which were removed in SPARCv8 (64-bit) because nobody used | them. | rhacker wrote: | It seems like every 2 months I feel the burn of JS not having | more standard primitive types and choices for numbers. I get this | urge to learn Rust or Swift or Go which lasts about 15 minutes... | until I realize how tied up I am with JS. | | But I do think one day (might take a while) JS will no longer be | the obvious choice for front-end browser development. | robpalmer wrote: | What other numeric types would you like to see? | | In addition to the existing doubles, ES2020 added support for | signed integers. | kevincox wrote: | With WebAssembly that day is nearing. However it is certainly | many years out. | tachyonbeam wrote: | Are you saying you'd like to have specific int32, int64, | float32, float64 types? | jjuhl wrote: | Strong typing and static typing - yes please. | jayd16 wrote: | You could split the difference and go C# or Java and use one of | their web stacks. | FullyFunctional wrote: | Funny, I have a small JavaScript app I have abandoned because I | find developing JS so awful. Now that I have ramped up in Rust | I am very tempted to rewrite it as Rust has first-class WASM | support. Unfortunately I'd still need JavaScript for the UI | bits. | | IMO: Rust isn't the easiest language to learn, but the | investment pays off handsomely and the ecosystem is just | wonderful. | | EDIT: I meant "to learn" which completely changes the statement | :) | qball wrote: | >But I do think one day (might take a while) JS will no longer | be the obvious choice for front-end browser development. | | I think that day might be sooner than anyone thinks- Chromium | is dominant enough now that their including Dart as a first- | class language (or more likely, a successor to Dart) will | likely be a viable strategy soon. | | Of course, the wildcard is Apple, but ultimately Dart _can_ | compile down to JS- being able to write in a far superior | language that natively runs on 80% of the market and transpiles | to the rest is suddenly much more of a winning proposition. | shrewduser wrote: | I feel like kotlin is a much better language than dart, has | many more use cases and compiles down to javascript also. | elevenoh wrote: | Dart is weak for functional programming / functional way of | thinking. Right there, dart lang loses me as a potential | user. | sroussey wrote: | If you want that, you can start with TypeScript and name your | number types. Doesn't do anything real at the moment, but | subsections of code can then be compiled as assemblyscript to | wasm. | offbynull wrote: | Anyone else remember the ARM Jazelle DBX extension? I wonder if | they'll end up dumping this in this the same way. | | I don't remember very many phones supporting DBX, but IIRC the | ones that did seemed to run J2ME apps much smoother. | saagarjha wrote: | This one is at least documented and in use by major browsers, | so I doubt it will go away anytime soon. | masklinn wrote: | It's quite different though, Jazelle literally implemented java | bytecode in hardware. | | These instructions "merely" performs a float -> int conversion | with JS semantics, such that implementations don't have to | reimplement those semantics in software on ARM. The JS | semantics probably match x86 so x86 gets an "unfair" edge and | this is a way for ARM to improve their position. | [deleted] | [deleted] ___________________________________________________________________ (page generated 2020-10-17 23:00 UTC)