[HN Gopher] iAPX432: Gordon Moore, Risk and Intel's Super-CISC F... ___________________________________________________________________ iAPX432: Gordon Moore, Risk and Intel's Super-CISC Failure Author : klelatti Score : 74 points Date : 2023-04-02 16:33 UTC (6 hours ago) (HTM) web link (thechipletter.substack.com) (TXT) w3m dump (thechipletter.substack.com) | nickdothutton wrote: | I think the iAPX432 team went on to do the i960 (another | interesting architecture that didn't really find the success that | was hoped for) and then finally they went on to the Pentium Pro, | where they found more success. | convolvatron wrote: | it wasn't a total failure. the 860 and 960 were decent little | engines that found a home in high performance computing and | embedded applications that needed a little oomph. I worked on | some 860 array products and certainly remember finding 960s in | printers and other gear | speedbird wrote: | Worked on an i860 Stratus machine in the early 90s - it provided | a key part of our distributed infra due to its FT | capabilities. | trzy wrote: | Trivia: The i960 was used to power Sega's highly successful | Model 2 arcade board, which truly ushered in the era of 3D | gaming* (with Daytona USA), and was used in the F-22 Raptor | until it was later replaced with other CPUs. | | * Certainly not the first 3D arcade hardware, but arguably | this, along with Namco's MIPS-based System 22 (Ridge Racer, | released a little before Daytona), was the inflection point | that made 2D effectively obsolete. | panick21_ wrote: | It's really sad the i960 didn't take off. Intel wasn't in the | Unix workstation market much, and the high end was owned by | Digital and IBM. Intel was working on the i860 and i960. | | If Intel had acted quicker, it could have cooperated with a | Unix workstation maker and potentially done really well. | | Sun was definitely looking around for a chip partner at the | time, but none of the American companies were interested, so | they went to Japan.
So the timing didn't really work out. A Sun-Intel | alliance would have been a scary prospect and beneficial for | both companies. | twoodfin wrote: | What's most interesting to me about the i432 is the rich array of | object types essentially embedded into its ISA. The JVM "knows" a | little bit about virtual dispatch tables, monitors, and arrays, but | even that pales in comparison to the i432's user-facing model of | the CPU state. | | Is there anything comparable surviving today? | userbinator wrote: | There were some attempts in the Java direction: | https://en.wikipedia.org/wiki/Java_processor | | But ultimately it seems that the idea of language-specific CPUs | just didn't survive, because people want to be able to use any | programming language with them. | panick21_ wrote: | The Java-Everything trip Sun went on was truly horrific, both | in terms of technical and business results. | gumby wrote: | I don't think so, except at the margins. | | I started out as a Lisp hacker on machines designed for it | (PDP-10 and CADR, later D-machines), so I was very much in the | camp you describe. They had hardware / microcode support for | tagging, unboxing, fundamental Lisp opcodes, and, for the Lispms | specifically, things like a GC barrier and transporter support. | When I looked at implementations like VAXLisp, the extra cycles | needed to implement these things seemed like a burden to me. | | Of course those machines did lots of other things as well, and | so were subject to a lot of evolutionary pressure the research | machines were not subject to. | | The shocker that changed my mind was the idea of using the TLB | to implement the write barrier. Yes, doing all that extra work | cost cycles, but you were doing it on a machine that had evolved | lots of extra capabilities that could ameliorate some of the | burden. Plus the underlying hardware just got faster, faster | (i.e. the second derivative was higher).
| | Meanwhile, the more dedicated architectures were burning | valuable real estate on these features and couldn't keep up | elsewhere. You saw this in the article when the author wrote | about gates that could have been used elsewhere. | | Finally, some decisions box you in -- the 64 KB object size | limitation being an example in the 432. Sure, you can work | around it, but then the support for these objects becomes a | deadweight (part of the RISC argument). | | You see this also in the use of GPUs as huge parallel machines, | even though the original programming abstraction was triangles. | | Going back to my first sentence about "at the margins": | optimize at the end. Apple famously added a "jvm" instruction | -- must have been the fruit of a lot of metering! Note that | they didn't have to do this for Objective-C: some extremely | clever programming made dispatch cheap. | | Tagging/unboxing can be supported in a variety of (relatively) | inexpensive ways, by using ALU circuitry otherwise idle during | address calculation or (more likely these days) by implementing | a couple of in-demand ops; either way, pretty cheap. | | Finally, we do have a return to, and flourishing of, separate, | specialized functional units (image processors, "learning" | units and such, like, say, the database hardware of old), but | they aren't generally fully programmable (even if they have | processors embedded in them); the key factor is that they | don't interfere (except via some DMA) with the core processing | operations. | aardvark179 wrote: | "Going back to my first sentence about "at the margins": | optimize at the end. Apple famously added a "jvm" instruction | -- must have been the fruit of a lot of metering! Note that | they didn't have to do this for Objective-C: some extremely | clever programming made dispatch cheap." | | I'm struggling to think of what you are referring to here.
| ARM added op codes for running JVM byte code on the processor | itself, but I think those instructions were dropped a long | time ago. ARM also added an instruction (floating point | convert to fixed point, rounding towards zero) as that became | such a common operation in JS code. There have also been | various GC-related instructions and features added to POWER, | but I think all that was well after Apple had abandoned the | architecture. | | I may be forgetting something; could you clarify? | panick21_ wrote: | Not adding tagging is basically a negligence crime. That | feature isn't that expensive, and it could have saved us most of | the security issues that have happened in the last 20+ years. | Lammy wrote: | NSA et al probably like it better that way, so they have | easier access to the "'intel' inside" my PC. | yourapostasy wrote: | _> Is there anything comparable surviving today?_ | | I'm not aware of such Super CISC instruction sets in popular | use today, but with VMs and statistically-based AI | proliferating now, I wonder whether we might revisit such | architectures in the future. Could continuous VM-collected | statistical data inform compiler and JIT compiler design, | collapsing expensive, common complex operations whose patterns | we can't identify with current methods into Super CISC | instructions that substantially speed up patterns we didn't | know existed? Or are our current methods for analyzing and | implementing compilers and JITs good enough, and what's mostly | holding them back these days are other factors like memory and | cache access speed and pipeline stalls? | rodgerd wrote: | > Is there anything comparable surviving today? | | Surviving? No. The most recent is arguably Sun's Rock | processor, which was one of the final nails in their coffin and | was quite an i432 redux.
It promised all sorts of hardware | support for transactions and other features that Sun thought | would make it a killer chip, was rumoured at tape-out to require | 1 kW of power for mediocre performance, and Oracle killed it | when they saw how dysfunctional it was. | monocasa wrote: | If you squint hard enough, the underlying object-capability- | system-as-privilege-boundary concept still does live on. | | In hardware, the 432 went on to inspire the 16- and 32-bit | protected modes on x86. There it was the inspiration for just | about anything involving the GDT and the LDT, including fine- | grained memory segments, hardware task switching of Task State | Segments, and virtual dispatch through Task Gates. | | But a large point of the RISC revolution was that these kinds | of abstractions in microcode don't make sense anymore when you | have ubiquitous I$s. Rather than a fixed-length blob of vendor | code that's hard to update, let end users create whatever | abstractions they feel make sense in regular (albeit | privileged) code. Towards that end, the 90s and 2000s had an | explosion of supervisor-mode-enforced object capability | systems. These days the most famous is probably seL4; there are | a lot of parallels between seL4's syscall layer and the | iAPX432's object capability interface between user code and | microcode. In a lot of ways, the most charitable way to look at | the iAPX432 microcode is as a very early microkernel in ROM. | markhahn wrote: | timing is interesting: the 432 is listed as "late 1981" | (wikipedia), and the 286 (protected mode, 16:16 segmentation) | as Feb 1982. of course, the 432 had been going on for some | time... | CalChris wrote: | The ambitious 432 was also late, quite late. So Intel needed a | simple stopgap product which was an iteration of the 8088, the | 8086. | wtallis wrote: | The 8088 (1979) was a low-cost (reduced bus width) follow-up to | the 8086 (1978). You may be thinking of the 8080 (1974) or 8085 | (1976).
| B1FF_PSUVM wrote: | """ | | The key new features included: | | Ada : The architecture would be programmed using the Ada | programming language, which at the time was seen as the 'next big | thing' in languages. | | """ | | This was it - the next big thing. It missed and went down, but | there's always the doubt whether they were simply right too early ... | | Seems the performance just wasn't there: | | """ | | Gordon admitted that "to a significant extent," he was personally | responsible. "It was a very aggressive shot at a new | microprocessor, but we were so aggressive that the performance | was way below par." | | """ | | It was kind of uncanny going from having it shouted from the | rooftops one year to dead silence about anything having ever | happened a few years later. | markhahn wrote: | Ada was the CISC of languages - high-concept in the same way. And | it lost to C, surely the RISC of languages. | | Never bet against low-tech. | markhahn wrote: | The 432 was a pretty interesting flop, but surely the ia64 has to | rank up there. | | it would be interesting to try to chart some of the features that | show up in various chips: for instance, 64k segments in both the | 432 and 286+, or VLIW in the 860 and ia64. | ghaff wrote: | One difference is that, according to the article, Intel | actually learned quite a bit technically from the 432 even | though it was a commercial flop. It's hard to see much of a | silver lining in IA64/Itanium for either Intel or HP--or, | indeed, for all the other companies that wasted resources on | Itanium, if only because they felt they had to cover their | bases. | fpoling wrote: | Itanic was a flop due to AMD releasing a 64-bit CPU. And I still | think Intel learned a lot from its failure, if not technically | then business-wise: just stick to improving the | existing architecture while keeping backward compatibility.
| markhahn wrote: | VLIW was really marooned in time: driven by overconfidence | in the compiler (which had shown that you could actually | expose pipeline hazards) and underestimates of the coming | abundance of transistors (which made superscalar OoO really | take off, along with giant on-chip caches). well, and | multicore to sop up even more available transistors. | PAPPPmAc wrote: | IMO, Itanic was a doomed design from the start; the lesson | to be learned is that "You can't statically schedule | dynamic behavior." The VLIW/EPIC-type designs like Itanium | require a _very clever_ compiler to schedule well | enough to extract even a tiny fraction of theoretical | performance, for both instruction packing and memory | scheduling reasons. That turns out to be extremely | difficult in the best case, and in a dynamic environment | (with things like interrupts, a multitasking OS, bus | contention, DRAM refresh timing, etc.) it's basically | impossible. Doing much of the micro-scheduling dynamically | in the instruction decoder (see: all modern x86 parts that | decompose x86 instructions into whatever it is they run | internally for that vendor generation) nearly always wins in | practice. | | Intel spent decades trying to clean-room a user-visible | high-end architecture (iAPX432, then i860, then Itanium), | while the x86 world found a cheat code for microprocessors | with the dynamic translation of a standard ISA into | whatever fancy modern core you run internally (microcode- | on-top-of-a-RISC? Dynamic microcode? JIT instruction | decoder? I don't think we really have a comprehensive name | for it). Arguably, NexGen were really the first to | the trick in 1994, with their Nx586 design that later | evolved into the AMD K6, but Intel's P6 - from which most | i686 designs descend - is an even better implementation of | the same trick less than a year later, and almost all | subsequent designs work that way.
| wtallis wrote: | Based on https://en.wikipedia.org/wiki/File:Itanium_Sales_Forecasts_e... | it's clear that Itanium was delayed and sales | projections were drastically reduced multiple times before | AMD even announced their 64-bit alternative, let alone | actually shipped Opteron. (For reference, AMD announced | AMD64 in October 1999, published the spec in August 2000, and | shipped hardware in April 2003. Intel didn't publicly | confirm its plans to adopt x86-64 until February 2004, | and shipped hardware in June 2004.) | l1k wrote: | A lot of RISC CPU architectures which were popular in the 1990s | declined because their promulgators stopped investments and | bet on switching to IA64 instead. Around the year 2000, VLIW | was seen as the future, and all the CISC and RISC | architectures were considered obsolete. | | That strategic failure by competitors allowed x86 to grow | market share at the high end, which benefited Intel more than | the money lost on Itanium. | ghaff wrote: | It's more complicated than that. | | Sun didn't slow down on UltraSPARC or make an Itanium side | bet. IBM did (and continues to) place their big hardware | bet on Power--Itanium was mostly a cover-your-bases thing. | I don't know what HP would have done--presumably either | gone their own way with VLIW or kept PA-RISC going. | | Pretty much all the other RISC/Unix players had to go to a | standard processor; some were already on x86. Intel mostly | recovered from Itanium specifically, but it didn't do them | any favors. | sliken wrote: | Actually, they did. Intel promised an aggressive delivery | schedule, performance ramp, and performance. The industry | took it hook, line, and sinker, while AMD decided not to | limit 64-bit to the high end and brought out x86-64. | | Sun did an IA64 port of Solaris, which is definitely | an Itanium side bet. | | HP was involved in the IA64 effort and definitely was | planning on the replacement of PA-RISC from day 1.
| davidgay wrote: | > HP was involved in the IA64 effort and definitely was | planning on the replacement of PA-RISC from day 1. | | As my memory remembers, and | https://en.wikipedia.org/wiki/Itanium agrees, Itanium | originated at HP. So yes, a replacement for PA-RISC from | day 1, but even more so... | rodgerd wrote: | Another way to look at the Itanic is that HP somehow | conned Intel into betting the farm on building HP-PA3 for | HP. Which is pretty impressive. | foobiekr wrote: | This isn't really true. IBM/Motorola need to own the | failure of POWER and PowerPC, and MIPS straight up died on | the performance side. Sun continued with UltraSPARC. | | It wasn't that IA64 killed them; it's that they were | getting shaky and IA64 appealed _because_ of that. Plus the | lack of a 64-bit x86. | userbinator wrote: | _Plus the lack of a 64-bit x86._ | | If you look at the definitions of various structures and | opcodes in x86, you'll notice gaps that would've been | ideal for a 64-bit expansion, so I think they had a plan | besides IA64, but AMD beat them to it (and IMHO with a | far more inelegant extension.) | panick21_ wrote: | It's simply economics: Intel had the volume. Sun and SGI | simply didn't have the economics to invest the same | amount, and they were also not chip companies; they both | either didn't invest enough in chip design or invested it | wrongly. | | Sun spent an unbelievable amount of money on dumb ass | processor projects. | | Towards the end of the 90s all of them realized their | business model would not do well against Intel, so pretty | much all of them were looking for an exit, and the IA64 hype | basically killed most of them. Sun stuck it out with | Sparc with mixed results. IBM POWER continues, but in a | thin slice of the market. | | Ironically, there was a section of Digital and Intel who | thought that Alpha should be the basis of 64-bit x86. | That would have made Intel pretty dominant: Alpha (maybe | a TSO version) with a 32-bit x86 compatibility mode.
| PAPPPmAc wrote: | Look closely at AMD designs (and staff) of the very late | 90s and early 2000s, and/or at all modern x86 parts, and see | that... more or less, that's what happened, just not with | an Alpha mode. | | Dirk Meyer (co-architect of the DEC Alpha 21064 and | 21264) led the K7 (Athlon) project, and they run on a | licensed EV6 bus borrowed from the Alpha. | | Jim Keller (co-architect of the DEC Alpha 21164 and 21264) | led the K8 (first-gen x86-64) project, and there are a | number of design decisions in the K8 evocative of the | later Alpha designs. | | The vast majority of x86 parts since the (NexGen Nx686, | which became the) AMD K6 and Pentium Pro (P6) have been | internal RISC-ish cores with decoders that ingest x86 | instructions and chunk them up to be scheduled on an | internal RISC architecture. | | It has turned out to be sort of a better-than-both-worlds | thing, almost by accident. A major part of what did in the | VLIW-ish designs was that "You can't statically schedule | dynamic behavior," and a major problem for the RISC | designs was that exposing architectural innovations on a | RISC requires you to change the ISA and/or memory behavior | in visible ways from generation to generation, | interfering with compatibility. So... the RISC-behind- | x86-decoder designs get to follow the state of the | art, changing whatever they need to behind the decoder | without breaking compatibility, AND get to have the | decoder do the micro-scheduling dynamically. | Dalewyn wrote: | >That strategic failure by competitors allowed x86 to grow | market share at the high end, which benefited Intel more | than the money lost on Itanium. | | In that sense, Itanium was a resounding success for Intel | (and AMD). | panick21_ wrote: | Itanium was a success right until they actually made a | chip.
| | What they should have done is hype Itanium, and then the | day it came out they should have said: yeah, this was a | joke; what we did is buy Alpha from Compaq, and it's | literally just Alpha with an x86 compatibility mode. | | Then they would have dominated. | jimmaswell wrote: | If I recall correctly, and shuttling instructions around fast | enough is the main bottleneck right now, why do people want to | return to RISC? ___________________________________________________________________ (page generated 2023-04-02 23:00 UTC)