[HN Gopher] iAPX432: Gordon Moore, Risk and Intel's Super-CISC F...
       ___________________________________________________________________
        
       iAPX432: Gordon Moore, Risk and Intel's Super-CISC Failure
        
       Author : klelatti
       Score  : 74 points
       Date   : 2023-04-02 16:33 UTC (6 hours ago)
        
 (HTM) web link (thechipletter.substack.com)
 (TXT) w3m dump (thechipletter.substack.com)
        
       | nickdothutton wrote:
        | I think the iAPX432 team went on to do the i960 (another
        | interesting architecture that didn't really find the success
        | that was hoped for) and then finally the Pentium Pro, where
        | they found more success.
        
         | convolvatron wrote:
          | It wasn't a total failure. The 860 and 960 were decent little
          | engines that found a home in high-performance computing and
          | embedded applications that needed a little oomph. I worked on
          | some 860 array products and certainly remember finding 960s in
          | printers and other gear.
        
           | speedbird wrote:
           | Worked on an i860 Stratus machine in the early 90s - provided
           | a key part of our distributed infra due to its FT
           | capabilities.
        
           | trzy wrote:
           | Trivia: The i960 was used to power Sega's highly successful
           | Model 2 arcade board, which truly ushered in the era of 3D
           | gaming* (with Daytona USA), and was used in the F-22 Raptor
           | until it was later replaced with other CPUs.
           | 
           | * Certainly not the first 3D arcade hardware but arguably
           | this along with Namco's MIPS-based System 22 (Ridge Racer,
           | released a little before Daytona), was the inflection point
           | that made 2D effectively obsolete.
        
         | panick21_ wrote:
          | It's really sad the i960 didn't take off. Intel wasn't in the
          | Unix workstation market much, and the high end was owned by
          | Digital and IBM. Intel was working on the i860 and i960.
          | 
          | If Intel had acted quicker, it could have cooperated with a
          | Unix workstation maker and potentially done really well.
          | 
          | Sun was definitely looking around for a chip partner at the
          | time, but none of the American companies were interested, so
          | they went to Japan. So the timing didn't really work out. A
          | Sun-Intel alliance would have been a scary prospect and
          | beneficial for both companies.
        
       | twoodfin wrote:
       | What's most interesting to me about the i432 is the rich array of
       | object types essentially embedded into its ISA. The JVM "knows" a
       | little bit about virtual dispatch tables, monitors, arrays, but
       | even that pales in comparison to the i432's user-facing model of
       | the CPU state.
       | 
       | Is there anything comparable surviving today?
        
         | userbinator wrote:
         | There were some attempts in the Java direction:
         | https://en.wikipedia.org/wiki/Java_processor
         | 
         | But ultimately it seems that the idea of language-specific CPUs
         | just didn't survive because people want to be able to use any
         | programming language with them.
        
           | panick21_ wrote:
           | The Java-Everything trip Sun went on was truly horrific. Both
           | in terms of technical and business results.
        
         | gumby wrote:
         | I don't think so, except at the margins.
         | 
         | I started out as a Lisp hacker on machines designed for it
         | (PDP-10 and CADR, later D-machines) so I was very much in the
         | camp you describe. They had hardware / microcode support for
         | tagging, unboxing, fundamental Lisp opcodes, and for the Lispms
         | specifically, things like a GC barrier and transporter support.
         | When I looked at implementations like VAXLisp, the extra cycles
         | needed to implement these things seemed like a burden to me.
         | 
         | Of course those machines did lots of other things as well, and
         | so were subject to a lot of evolutionary pressure the research
         | machines were not subject to.
         | 
          | The shocker that changed my mind was the idea of using the TLB
          | to implement the write barrier. Yes, doing all that extra work
          | cost cycles, but you were doing it on a machine that had
          | evolved lots of extra capabilities that could ameliorate some
          | of the burden. Plus the underlying hardware just got faster,
          | faster (i.e. the second derivative was higher).
         | 
         | Meanwhile, the more dedicated architectures were burning
         | valuable real estate on these features and couldn't keep up
         | elsewhere. You saw this in the article when the author wrote
         | about gates that could have been used elsewhere.
         | 
          | Finally, some decisions box you in -- the 64KB object size
          | limitation being an example in the 432. Sure, you can work
          | around it, but then the support for these objects becomes
          | deadweight (part of the RISC argument).
         | 
         | You see this also in the use of GPUs as huge parallel machines,
         | even though the original programming abstraction was triangles.
         | 
         | Going back to my first sentence about "at the margins":
         | optimize at the end. Apple famously added a "jvm" instruction
         | -- must have been the fruit of a lot of metering! Note that
         | they didn't have to do this for Objective-C: some extremely
         | clever programming made dispatch cheap.
         | 
          | Tagging/unboxing can be supported in a variety of (relatively)
          | inexpensive ways, by using ALU circuitry otherwise idle during
          | address calculation or (more likely these days) by
          | implementing a couple of in-demand ops -- either way pretty
          | cheap.
         | 
          | Finally, we do have a return to, and flourishing of, separate,
          | specialized functional units (image processors, "learning"
          | units and such -- like, say, the database hardware of old),
          | but they aren't generally fully programmable (even if they
          | have processors embedded in them). The key factor is that they
          | don't interfere (except via some DMA) with the core processing
          | operations.
        
           | aardvark179 wrote:
           | "Going back to my first sentence about "at the margins":
           | optimize at the end. Apple famously added a "jvm" instruction
           | -- must have been the fruit of a lot of metering! Note that
           | they didn't have to do this for Objective-C: some extremely
           | clever programming made dispatch cheap."
           | 
           | I'm struggling to think of what you are referring to here.
           | ARM added op codes for running JVM byte code on the processor
           | itself, but I think those instructions were dropped a long
           | time ago. ARM also added an instruction (floating point
           | convert to fixed point rounding towards zero) as it became
           | such a common operation in JS code. There have also been
           | various GC related instructions and features added to POWER,
           | but I think all that was well after Apple had abandoned the
           | architecture.
           | 
            | I may be forgetting something, could you clarify?
        
           | panick21_ wrote:
            | Not adding tagging is basically a negligence crime. That
            | feature isn't that expensive, and it could have prevented
            | most of the security issues of the last 20+ years.
        
             | Lammy wrote:
              | NSA et al probably like it better that way, so they have
              | easier access to the "'intel' inside" my PC.
        
         | yourapostasy wrote:
         | _> Is there anything comparable surviving today?_
         | 
          | I'm not aware of such Super CISC instruction sets in popular
          | use today, but with VMs and statistically-based AI
          | proliferating now, I wonder whether we might revisit such
          | architectures in the future. Could continuous VM-collected
          | statistical data inform compiler and JIT compiler design,
          | collapsing expensive, common complex operations -- ones we
          | can't identify patterns for with current methods -- into Super
          | CISC instructions that substantially speed up patterns we
          | didn't know existed? Or are our current methods for analyzing
          | and implementing compilers and JITs good enough, with the real
          | bottlenecks these days being other factors like memory and
          | cache access speed and pipeline stalls?
        
         | rodgerd wrote:
         | > Is there anything comparable surviving today?
         | 
          | Surviving? No. The most recent is arguably Sun's Rock
          | processor -- one of the final nails in their coffin -- which
          | was quite an i432 redux. It promised all sorts of hardware
          | support for transactions and other features that Sun thought
          | would make it a killer chip, was rumoured at tape-out to
          | require 1 kW of power for mediocre performance, and Oracle
          | killed it when they saw how dysfunctional it was.
        
         | monocasa wrote:
         | If you squint hard enough, the underlying object capability
         | system as privilege boundary concept still does live on.
         | 
          | In hardware, the 432 went on to inspire the 16- and 32-bit
          | protected modes on x86. There it was the inspiration for just
          | about anything involving the GDT and the LDT, including
          | fine-grained memory segments, hardware task switching of Task
          | State Segments, and virtual dispatch through Task Gates.
          | 
          | But a large point of the RISC revolution was that these kinds
          | of abstractions in microcode don't make sense anymore when you
          | have ubiquitous I$s. Rather than a fixed-length blob of vendor
          | code that's hard to update, let end users create whatever
          | abstractions they feel make sense in regular (albeit
          | privileged) code. Towards that end, the 90s and 2000s had an
          | explosion of supervisor-mode-enforced object capability
          | systems. These days the most famous is probably seL4; there
          | are a lot of parallels between seL4's syscall layer and the
          | iAPX432's object capability interface between user code and
          | microcode. In a lot of ways, the most charitable way to look
          | at the iAPX432's microcode is as a very early microkernel in
          | ROM.
        
           | markhahn wrote:
           | timing is interesting: ia432 listed as "late 1981"
           | (wikipedia), and 286 (protected mode 16:16 segmentation) in
           | Feb 1982. of course, the 432 had been going on for some
           | time...
        
       | CalChris wrote:
       | The ambitious 432 was also late, quite late. So Intel needed a
       | simple stopgap product which was an iteration of the 8088, the
       | 8086.
        
         | wtallis wrote:
         | The 8088 (1979) was a low-cost (reduced bus width) follow-up to
         | the 8086 (1978). You may be thinking of the 8080 (1974) or 8085
         | (1976).
        
       | B1FF_PSUVM wrote:
       | """
       | 
       | The key new features included:
       | 
       | Ada : The architecture would be programmed using the Ada
       | programming language, which at the time was seen as the 'next big
       | thing' in languages.
       | 
       | """
       | 
        | This was it - the next big thing. Missed and went down, but
        | there's always the doubt whether they were right, just too
        | early ...
       | 
       | Seems the performance just wasn't there:
       | 
       | """
       | 
       | Gordon admitted that "to a significant extent," he was personally
       | responsible. "It was a very aggressive shot at a new
       | microprocessor, but we were so aggressive that the performance
       | was way below par."
       | 
       | """
       | 
       | It was kind of uncanny having it shouted from the rooftops one
       | year, to dead silence about anything having ever happened a few
       | years later.
        
         | markhahn wrote:
         | Ada was the CISC of languages - high-concept the same way. And
         | it lost to C, surely the RISC of languages.
         | 
         | Never bet against low-tech.
        
       | markhahn wrote:
       | 432 was a pretty interesting flop, but surely the ia64 has to
       | rank up there.
       | 
       | it would be interesting to try to chart some of the features that
       | show up in various chips. for instance, 64k segments in both 432
       | and 286+, or VLIW in 860 and ia64.
        
         | ghaff wrote:
         | One difference is that, according to the article, Intel
         | actually learned quite a bit technically from the 432 even
         | though it was a commercial flop. It's hard to see much of a
         | silver lining in IA64/Itanium for either Intel or HP--or,
         | indeed, for all the other companies that wasted resources on
         | Itanium if only because they felt they had to cover their
         | bases.
        
           | fpoling wrote:
            | Itanic was a flop due to AMD releasing a 64-bit CPU. And I
            | still think Intel learned a lot from its failure, if not
            | from the technology then business-wise: just stick to
            | improving the existing architecture while keeping backward
            | compatibility.
        
             | markhahn wrote:
             | VLIW was really marooned in time: driven by overconfidence
             | in the compiler (which had shown that you could actually
             | expose pipeline hazards), and underestimates of the coming
             | abundance of transistors (which make superscalar OoO really
             | take off, along with giant onchip caches). well, and
             | multicore to sop up even more available transistors.
        
             | PAPPPmAc wrote:
             | IMO, Itanic was a doomed design from the start, the lesson
             | to be learned is that "You can't statically schedule
             | dynamic behavior." The VLIW/EPIC type designs like Itanium
             | require you have a _very clever_ compiler to schedule well
             | enough to extract even a tiny fraction of theoretical
             | performance for both instruction packing and memory
             | scheduling reasons. That turns out to be extremely
             | difficult in the best case, and in a dynamic environment
             | (with things like interrupts, a multitasking OS, bus
             | contention, DRAM refresh timing, etc.) it's basically
             | impossible. Doing much of the micro-scheduling dynamically
              | in the instruction decoder (see: all modern x86 parts,
              | which decompose x86 instructions into whatever that
              | vendor's generation runs internally) nearly always wins in
              | practice.
             | 
             | Intel spent decades trying to clean-room a user-visible
             | high end architecture (iAPX432, then i860, then Itanium),
             | while the x86 world found a cheat code for microprocessors
             | with the dynamic translation of a standard ISA into
             | whatever fancy modern core you run internally (microcode-
             | on-top-of-a-RISC? Dynamic microcode? JIT instruction
             | decoder? I don't think we really have a comprehensive name
             | for it) thing. Arguably, NexGen were really the first to
             | the trick in 1994, with their Nx586 design that later
             | evolved into the AMD K6, but Intel's P6 - from which most
             | i686 designs descend - is an even better implementation of
             | the same trick less than a year later, and almost all
             | subsequent designs work that way.
        
             | wtallis wrote:
              | Based on https://en.wikipedia.org/wiki/File:Itanium_Sales_Forecasts_e...
              | it's clear that Itanium was delayed and sales
             | projections were drastically reduced multiple times before
             | AMD even announced their 64-bit alternative, let alone
             | actually shipping Opteron. (For reference, AMD announced
             | AMD64 in October 1999, published the spec August 2000,
             | shipped hardware in April 2003. Intel didn't publicly
             | confirm their plans to adopt x86-64 until February 2004,
             | and shipped hardware in June 2004.)
        
           | l1k wrote:
            | A lot of RISC CPU arches that were popular in the 1990s
            | declined because their promulgators stopped investing and
            | bet on switching to IA64 instead. Around the year 2000, VLIW
           | was seen as the future and all the CISC and RISC
           | architectures were considered obsolete.
           | 
           | That strategic failure by competitors allowed x86 to grow
           | market share at the high end, which benefited Intel more than
           | the money lost on Itanium.
        
             | ghaff wrote:
             | It's more complicated than that.
             | 
             | Sun didn't slow down on UltraSPARC or make an Itanium side
             | bet. IBM did (and continues to) place their big hardware
             | bet on Power--Itanium was mostly a cover your bases thing.
             | I don't know what HP would have done--presumably either
             | gone their own way with VLIW or kept PA-RISC going.
             | 
             | Pretty much all the other RISC/Unix players had to go to a
             | standard processor; some were already on x86. Intel mostly
             | recovered from Itanium specifically but it didn't do them
             | any favors.
        
               | sliken wrote:
                | Actually, they did. Intel promised an aggressive
                | delivery schedule and performance ramp. The industry
                | took it hook, line, and sinker, while AMD decided not to
                | limit 64-bit to the high end and brought out x86-64.
                | 
                | Sun did an IA64 port of Solaris, which is definitely an
                | Itanium side bet.
               | 
               | HP was involved in the IA64 effort and definitely was
               | planning on the replacement of pa-risc from day 1.
        
               | davidgay wrote:
               | > HP was involved in the IA64 effort and definitely was
               | planning on the replacement of pa-risc from day 1.
               | 
                | As I remember, and
                | https://en.wikipedia.org/wiki/Itanium agrees, Itanium
                | originated at HP. So yes, a replacement for PA-RISC from
                | day 1, but even more so...
        
               | rodgerd wrote:
               | Another way to look at the Itanic is that HP somehow
               | conned Intel into betting the farm on building HP-PA3 for
               | HP. Which is pretty impressive.
        
             | foobiekr wrote:
              | This isn't really true. IBM/Motorola need to own the
              | failure of POWER and PowerPC, and MIPS straight up died on
              | the performance side. Sun continued with UltraSPARC.
              | 
              | It wasn't that IA64 killed them; it's that they were
              | getting shaky and IA64 appealed _because_ of that. Plus
              | the lack of a 64bit x86.
        
               | userbinator wrote:
               | _Plus the lack of a 64bit x86._
               | 
                | If you look at the definitions of various structures and
                | opcodes in x86, you'll notice gaps that would've been
                | ideal for a 64-bit expansion, so I think they had a plan
                | besides IA64, but AMD beat them to it (and IMHO with a
                | far less elegant extension).
        
               | panick21_ wrote:
                | It's simply economics: Intel had the volume. Sun and SGI
                | simply didn't have the economics to invest the same
                | amount, and they were also not chip companies; both
                | didn't invest enough in chip design, or invested it
                | wrongly.
                | 
                | Sun spent an unbelievable amount of money on dumb-ass
                | processor projects.
                | 
                | Towards the end of the 90s all of them realized their
                | business model would not do well against Intel, so
                | pretty much all of them were looking for an exit, and
                | the IA64 hype basically killed most of them. Sun stuck
                | it out with Sparc with mixed results. IBM POWER
                | continues, but in a thin slice of the market.
                | 
                | Ironically there was a section of Digital and Intel who
                | thought that Alpha should be the basis of 64-bit x86.
                | That would have made Intel pretty dominant: Alpha (maybe
                | a TSO version) with a 32-bit x86 compatibility mode.
        
               | PAPPPmAc wrote:
                | Look closely at AMD designs (and staff) of the very late
                | 90s and early 2000s, and/or at all modern x86 parts, and
                | you'll see that, more or less, that's what happened --
                | just not with an Alpha mode.
                | 
                | Dirk Meyer (co-architect of the DEC Alpha 21064 and
                | 21264) led the K7 (Athlon) project, and it runs on a
                | licensed EV6 bus borrowed from the Alpha.
                | 
                | Jim Keller (co-architect of the DEC Alpha 21164 and
                | 21264) led the K8 (first-gen x86-64) project, and there
                | are a number of design decisions in the K8 evocative of
                | the later Alpha designs.
               | 
               | The vast majority of x86 parts since the (NexGen Nx686
               | which became) AMD K6 and Pentium Pro (P6) have been
               | internal RISC-ish cores with decoders that ingest x86
               | instructions and chunk them up to be scheduled on an
               | internal RISC architecture.
               | 
                | It has turned out to be a sort of
                | better-than-both-worlds thing, almost by accident. A
                | major part of what did in the VLIW-ish designs was that
                | "you can't statically schedule dynamic behavior", and a
                | major problem for the RISC designs was that exposing
                | architectural innovations on a RISC requires changing
                | the ISA and/or memory behavior in visible ways from
                | generation to generation, interfering with
                | compatibility. So the RISC-behind-x86-decoder designs
                | get to follow the state of the art, changing whatever
                | they need to behind the decoder without breaking
                | compatibility, AND get to have the decoder do the
                | micro-scheduling dynamically.
        
             | Dalewyn wrote:
             | >That strategic failure by competitors allowed x86 to grow
             | market share at the high end, which benefited Intel more
             | than the money lost on Itanium.
             | 
             | In that sense, Itanium was a resounding success for Intel
             | (and AMD).
        
               | panick21_ wrote:
                | Itanium was a success right up until they actually made
                | a chip.
                | 
                | What they should have done is hype Itanium, and then the
                | day it came out they should have said: yeah, this was a
                | joke; what we actually did is buy Alpha from Compaq, and
                | it's literally just Alpha with an x86 compatibility
                | mode.
                | 
                | Then they would have dominated.
        
       | jimmaswell wrote:
        | If I recall correctly, shuttling instructions around fast
        | enough is the main bottleneck right now, so why do people want
        | to return to RISC?
        
       ___________________________________________________________________
       (page generated 2023-04-02 23:00 UTC)