[HN Gopher] 8086 Microcode Disassembled
       ___________________________________________________________________
        
       8086 Microcode Disassembled
        
       Author : matt_d
       Score  : 169 points
       Date   : 2020-09-05 14:10 UTC (8 hours ago)
        
 (HTM) web link (www.reenigne.org)
 (TXT) w3m dump (www.reenigne.org)
        
       | derefr wrote:
       | One thing I've always wondered: in what ways is the design of
       | microcode instruction-sets for CISC-ISA CPUs, different from the
       | design of outward-presenting RISC ISAs?
       | 
       | For example, does microcode tend to have instructions that "half
       | complete" a transfer-level operation, leaving some registers in
       | an indeterminate state, under the assumption (which is, in
       | practice, a guarantee) that they'll always have another ucode op
       | executed after them that does "the rest of" the operation and so
       | puts things right?
       | 
       | Or, for another example, on CISC CPUs that have a small set of
       | system-visible registers, and use register renaming to map them
       | to a larger register file (e.g. x86_64), do the user-visible
       | register names make it into the microcode; or do the microcode
       | ops function directly on register-file offsets?
       | 
       | To answer these questions, though, we'd probably need a _survey_
       | of microcode for various CPUs, including modern ones. So I'm not
       | holding my breath. Unless an engineer from Intel or the like
       | wants to jump in!
       | 
       | ------
       | 
       | I've also been curious whether there are any lessons in the
       | design of microcode ISAs, that can be applied in the design of
       | abstract-machine bytecode ISAs.
       | 
       | Right now, most bytecode ISAs are semi-idealized RISC ISAs, with
       | some load-time specialization of bytecode into VM-specific ops;
       | but rarely is there recompilation of bytecode into VM-specific
       | microcode. I'm curious why that is.
        
         | ajenner wrote:
         | Microcode instruction sets have different engineering trade-
         | offs to the user-visible ISA. In microcode, memory bandwidth
         | isn't such an issue so microcode instructions can be relatively
         | wide (21 bits compared to 8 for the 8086).
         | 
         | The microcode can also be relatively difficult to write. For
         | example, in the 8086 microcode I saw one place where there is a
         | "DEC2 tmpc" microinstruction (subtract 2 from tmpc), then tmpc
         | is loaded after that, after which the correct result is
         | available (this makes sense when you think about how the ALU
         | works on the chip, but in any normal ISA you have to load the
         | values into the operands before you perform operations on
         | them).
         | 
         | There's nothing in the 8086 microcode which creates any
         | temporary undetermined states as far as I can tell but there
         | may be combinations of microinstructions which could create a
         | race condition.
        
         | tenebrisalietum wrote:
         | So go to the 6502 for an example which has a decode ROM which
         | is similar in concept, but far more primitive and fixed rather
         | than programmable.
         | 
         | Instructions take a length of time from 2 to 7 cycles, with an
         | additional cycle under certain conditions.
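         | 
         | A rough sketch of that timing rule, using the documented cycle
         | counts for LDA only. The table and helper names here are made up
         | for illustration; the real chip derives this from its decode ROM
         | and timing logic, not from a lookup like this.

```python
# Documented 6502 cycle counts for LDA, keyed by addressing mode.
# The boolean marks modes that cost one extra cycle when indexing
# crosses a 256-byte page boundary.
LDA_CYCLES = {
    "immediate":   (2, False),
    "zero_page":   (3, False),
    "zero_page_x": (4, False),
    "absolute":    (4, False),
    "absolute_x":  (4, True),
    "absolute_y":  (4, True),
    "indirect_x":  (6, False),
    "indirect_y":  (5, True),
}


def crosses_page(base: int, index: int) -> bool:
    """True if base + index lands in a different 256-byte page than base."""
    return (base & 0xFF00) != ((base + index) & 0xFF00)


def lda_cycles(mode: str, base: int = 0, index: int = 0) -> int:
    """Cycle count for LDA in the given mode (hypothetical helper)."""
    cycles, page_penalty = LDA_CYCLES[mode]
    if page_penalty and crosses_page(base, index):
        cycles += 1  # the "additional cycle under certain conditions"
    return cycles


print(lda_cycles("immediate"))                 # 2
print(lda_cycles("absolute_x", 0x12F0, 0x05))  # 4 (stays in page 0x12)
print(lda_cycles("absolute_x", 0x12F0, 0x20))  # 5 (crosses into page 0x13)
```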
         | 
         | The decode ROM determines what is done for each of those cycles
         | and allows the modularization of circuitry for common purposes
         | among instructions.
         | 
         | I think CPUs such as the Z80 and definitely the 68000 had more
         | sophisticated mechanisms where the microcode was really a sub-
         | CPU executing actual micro-instructions.
         | 
         | > I've also been curious whether there are any lessons in the
         | design of microcode ISAs, that can be applied in the design of
         | abstract-machine bytecode ISAs.
         | 
         | I think something like Itanium's VLIW breaks this barrier
         | between microcode and ISA more than other ISAs and the lesson
         | was it's too difficult to port legacy software to it, so we
         | keep going on with CPUs that continue to support the appearance
         | of an in-order ISA initially developed in the 1970s with more
         | and more frankensteined extensions as time goes on.
        
         | kens wrote:
         | > in what ways is the design of microcode instruction-sets for
         | CISC-ISA CPUs, different from the design of outward-presenting
         | RISC ISAs?
         | 
         | There's a lot of variability in microcode designs, but based on
         | the microcodes I've examined closely (various IBM 360
         | mainframes, Xerox Alto, 8086), there are several
         | characteristics.
         | 
         | Microcode is usually much wider than instructions (21 bits for
         | the 8086, over 100 bits for some IBM machines). Microcode is
         | usually doing several things in parallel. An instruction set is
         | designed to be general-purpose and "make sense", while
         | microcode is nearly incomprehensible and does whatever bizarre
         | tricks are necessary to implement just what is needed for the
         | instruction set. (One important factor is that microcode
         | doesn't need to be backwards or forwards compatible, so
         | designers can do whatever they want.) Microcode's relationship
         | with memory is different since you're dealing with address and
         | data registers, not abstract reading and writing of memory.
         | Microcode needs to worry a lot more about timing. For instance,
         | in the 8086, an ALU operation is set up a cycle before it
         | happens. In the Xerox Alto, conditional branches happen a cycle
         | after you issue them.
         | 
         | For your specific question about registers, much of the 8086's
         | microcode ignores the specific register names, saying things
         | like move the generic source register to the ALU. The hardware
         | selects the appropriate register based on the instruction,
         | direction bit, etc. (I'm in the middle of writing a blog post
         | about this.)
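         | 
         | As a small illustration of the instruction-side information
         | involved, here is a sketch that picks source and destination
         | from the opcode's direction bit and the ModR/M byte. It only
         | models the documented encoding for 16-bit register-to-register
         | forms; the function name is hypothetical and this is not a
         | model of the microcode or the selection hardware itself.

```python
from typing import Tuple

# 16-bit register names in 8086 encoding order (field values 0..7).
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_reg_reg(opcode: int, modrm: int) -> Tuple[str, str]:
    """Return (source, destination) for a 16-bit register-to-register form.

    Only mod == 0b11 with the word bit set is handled; memory operands
    and byte registers are out of scope for this sketch.
    """
    if (modrm >> 6) != 0b11:
        raise ValueError("only register-to-register forms are handled")
    if not opcode & 0b01:
        raise ValueError("only word-sized forms are handled")
    d = (opcode >> 1) & 1          # direction bit: 1 means reg is the destination
    reg = REG16[(modrm >> 3) & 7]  # ModR/M reg field
    rm = REG16[modrm & 7]          # ModR/M r/m field
    return (rm, reg) if d else (reg, rm)


print(decode_reg_reg(0x89, 0xD8))  # ('BX', 'AX'), i.e. MOV AX, BX
print(decode_reg_reg(0x8B, 0xD8))  # ('AX', 'BX'), i.e. MOV BX, AX
```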
         | 
         | For a more modern look at microcode, the book "The Anatomy of a
         | High-Performance Microprocessor" describes the AMD K6 processor
         | in way more detail than you'd want.
        
         | ChuckMcM wrote:
         | In my experience, CISC instruction sets are primarily a way to
         | create compact representations of fairly long instruction
         | words.
         | 
         | That arises from the way in which the register file, the ALU,
         | the flags, and various counters (instruction, stack, etc.) are
         | laid out in logic and the buses between them.
         | 
         | Something that I find really fun to do is to experiment with
         | compute architectures. I first started playing around with this
         | with an Altera DE2 board, then the Spartan III dev board from
         | Xilinx, and these days with a Lattice Ice40K board (Icebreaker,
         | and soon ULX3S board). There are a number of "soft" CPUs where
         | you can play around with this to your heart's content (given you
         | have enough gates in your FPGA, which is getting easier and
         | easier these days).
         | 
         | CISC instructions, as the name suggests, are simply
         | "subroutine calls" or "macro calls" (depending on what era of
         | computing you were introduced to) into an underlying machine
         | that can move bits around each "clock". RISC is essentially
         | making that level of instruction available directly to the
         | compiler.
         | 
         | The most infamous version of exposing what is essentially
         | microcode to the compiler was, in my opinion, the Itanium which
         | has a really flexible native instruction set that the compiler
         | mixed and matched into pseudo instructions which it used to
         | compile code into. A more elegant version of this was the Xerox
         | PARC "D" machines which allowed you to load an instruction set
         | prior to booting into your actual applications. This made
         | Mesa[1] development interesting because you needed the
         | appropriate instruction set to go with the Mesa compiler you
         | were using.
         | 
         | [1] Mesa was Xerox's modular development language that inspired
         | Wirth's Modula 2 (I believe that was the ordering, Wirth might
         | claim it went the other way)
        
       | dylan604 wrote:
       | It seems like there have been a few disassembly write-ups on the
       | 8086 lately. Are the tools getting to the point where this is
       | possible, or just enough people with enough serious interest in
       | this? Coincidence? Am I seeing a pattern that isn't really there?
        
         | ajenner wrote:
         | Probably not entirely a coincidence - Ken Shirriff is doing a
         | series on the 8086 which may account for at least one of the
         | other articles you've noticed. My disassembly was only possible
         | because of Ken's high-resolution photos of the die with the
         | metal layer removed - that's why it took me until now to do it.
        
           | dylan604 wrote:
           | so it's turtles all the way down? someone makes a
           | breakthrough that gets used by someone else to make a
           | different breakthrough kind of a thing. this is why science
           | needs to be open. no one person/group can do it all. i just
           | wish that research didn't have to be done in secret to
           | protect potential patentability. Let the work be published
           | and let the people responsible receive whatever
           | credit/recognition/awards deserved.
           | 
           | kudos for your efforts!
        
       | surfsvammel wrote:
       | Awesome stuff. Really nostalgic. An 8086 with yellow monochrome
       | screen was my first computer. It ran Police Quest I, I think.
        
         | ksaj wrote:
         | Have you used a green monochrome screen? I still remember the
         | first time I got one, because it was cheaper than those
         | newfangled amber screens.
         | 
         | At first I thought it was a little stupid because of how slow
         | the fade was when the cursor blinked, and it wasn't nearly as
         | sharp or vivid. But within the first few hours of hacking
         | around, I recognized how much easier on the eyes it was without
         | the flickery amber that wobbled when you clacked your teeth
         | together, and the weird random "snow" when refreshing the
         | screen in a text "animation."
         | 
         | If only fractals didn't take an hour or so to render back then,
         | an animated one at modern speeds would have been quite soothing
         | to watch that way.
         | 
         | Fractint - I'm shocked I actually remember the name.
         | Downloading it from a BBS is how I got my _second_ computer
         | virus! Exciting times. Nostalgic is right.
        
           | dm319 wrote:
           | Have you come across coolretroterm? It simulates the snow and
           | wobble of those screens, and I think, does a reasonable job.
           | Not sure if it would work with a graphical program though.
        
       | viler wrote:
       | Outstanding work - never fails to amaze me when people unearth
       | little secrets like that 4 decades after the fact. That
       | MUL/IMUL/IDIV status bit hack is one for the ages.
        
       | ajenner wrote:
       | Author here if anyone has any questions.
        
         | userbinator wrote:
         | Does the microcode give any hints on why the general PUSH and
         | POP are in completely different places in the opcode map (push
         | is FF/6, pop is in its own group in 8F/0 with 8F/1-7 invalid,
         | while FF/7 is unused)? It almost looks like FF/7 was supposed
         | to be the pop. I've always wondered what 8F/1-7 and FF/7 do on
         | an 8086/8 too, but it's very hard to find that information.
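         | 
         | For anyone unfamiliar with the FF/6 and 8F/0 notation: the
         | digit after the slash is the reg field of the ModR/M byte,
         | used as an opcode extension. A minimal sketch with the
         | documented assignments only (the table and function names are
         | made up; the undocumented slots are simply left out because
         | their behaviour is the open question here):

```python
from typing import Optional

# Documented operations selected by the ModR/M reg field for each group opcode.
GROUP_FF = {0: "INC", 1: "DEC", 2: "CALL near", 3: "CALL far",
            4: "JMP near", 5: "JMP far", 6: "PUSH"}   # /7 is unused
GROUP_8F = {0: "POP"}                                  # /1 to /7 are invalid
GROUPS = {0xFF: GROUP_FF, 0x8F: GROUP_8F}


def opcode_extension(modrm: int) -> int:
    """Extract the /digit (bits 5..3 of the ModR/M byte)."""
    return (modrm >> 3) & 0b111


def documented_operation(opcode: int, modrm: int) -> Optional[str]:
    """Name of the documented operation, or None for an undocumented slot."""
    return GROUPS[opcode].get(opcode_extension(modrm))


print(documented_operation(0xFF, 0x36))  # PUSH (FF /6, direct address)
print(documented_operation(0x8F, 0x06))  # POP  (8F /0, direct address)
print(documented_operation(0xFF, 0x3E))  # None (FF /7)
```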
        
         | mkup wrote:
         | Does MUL/IMUL/IDIV result negation trick (via REP prefix) work
         | on later 8086-compatible Intel CPUs (e.g. 80286, 80386 etc)?
        
           | ajenner wrote:
           | I have just learned from dreNorteR on VCF that it has no effect
           | on a 286 but has a different, unexpected, and useful effect
           | on a 186! http://www.vcfed.org/forum/showthread.php?76657-808
           | 8-8086-mi...
        
         | dm319 wrote:
         | I wonder if you could do a version of this article for a lay
         | person like me? I really enjoyed Ken's articles because they
         | assumed very little knowledge.
        
         | derefr wrote:
         | > While most of the unused parts of the ROM (64 instructions)
         | are filled with zeroes, there are a few parts which aren't. The
         | following instructions appear right at the end of the ROM [...]
         | 
         | Given that they're right at the end -- and seemingly
         | intentionally written there _after_ the rest of the unused
         | space before them was zeroed -- might those bytes be a checksum
         | of the ROM?
        
           | ajenner wrote:
           | I don't think there's anything on the chip that could compute
           | a checksum of the microcode ROM contents. It could be some
           | kind of copyright message perhaps, though I don't know how
           | it's encoded and it's only 42 bits long so there isn't much
           | space for anything meaningful.
        
             | derefr wrote:
             | I would guess that it's not a runtime-verified checksum,
             | but rather a simple embedded "sum complement" value, used
             | for ROM-mastering-time integrity verification.
             | 
             | A sum-complement value is a value computed _from_ some
             | data, such that, when the data is checksummed with the sum-
             | complement value now embedded _into_ it, the data will sum
             | to zero. This approach to checksumming is useful, as any
             | potential verifier just has to throw the image-as-a-whole
             | through the checksumming algorithm, and ensure that the
             | output is zero. It doesn't need one iota of knowledge about
             | _what_ it's verifying. It doesn't even need an extra
             | machine-register to hold the expected checksum.
             | 
             | These "blind" checksums allow ROM production hardware
             | (programmers, copiers) to both pre-verify the integrity of
             | the input image, and to post-verify that it has programmed
             | the image onto a chip successfully. No special container
             | format for the ROM image is required, nor is the ROM image
             | required to be structured in any particular way (which is
             | good, because ROMs are used for all sorts of things, not
             | just code.) The ROM image can be any opaque blob, just as
             | long as it sums to zero.
             | 
             | In fact, you don't even need a ROM "image" at all. It's
             | possible to integrity-verify a programmed ROM "against
             | itself"; and thus, a hand-programmed ROM (e.g. an EEPROM
             | you programmed in your office) can be sent to the
             | duplication facility to serve as the reference from which
             | mask-ROM masks will be generated. The data on the EEPROM
             | can be trusted, because it sums to zero. And the mask ROMs
             | themselves can be checked for flaws by seeing whether
             | _they_ sum to zero.
             | 
             | For smaller-scale ROM distribution, ROM-to-PROM bulk
             | copiers are used. These copiers can be made to both pre-
             | verify the source, and to post-verify the programmed
             | copies. Using this approach to checksumming, the copier can
             | avoid having to verify the source "against" the
             | destination, instead only needing to verify the source
             | once, and then verify the destinations against themselves.
             | This both speeds up verification; and allows for the use of
             | simpler microcontrollers in these copiers, which reduces
             | their design cost. (By quite a lot, back in the 1970s, when
             | all this was most relevant.)
             | 
             | You can see this approach to checksumming in practice in
             | early-generation game cartridge ROMs, which almost always
             | have these embedded sum-complement values (and so
             | presumably were integrity-verified during
             | mastering/duplication.) These sum-complement value fields
             | get referred to by emulators as "the checksum" of the ROM
             | image--but technically, they're not; if you're following
             | along, you'll realize that "the checksum" of such ROM
             | images is zero! :)
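             | 
             | A minimal sketch of the idea, assuming a simple 8-bit
             | additive checksum with the sum-complement byte appended
             | at the end. The width, the position of the field, and
             | the function names are all assumptions for illustration;
             | real ROM formats vary.

```python
def sum_complement(data: bytes) -> int:
    """Byte that, once appended, makes the 8-bit additive sum of the image zero."""
    return (-sum(data)) & 0xFF


def sums_to_zero(image: bytes) -> bool:
    """Blind verification: no expected value is needed, only the image itself."""
    return (sum(image) & 0xFF) == 0


payload = bytes([0x12, 0x34, 0x56, 0x78])
image = payload + bytes([sum_complement(payload)])

print(hex(sum_complement(payload)))  # 0xec
print(sums_to_zero(image))           # True

# A single corrupted byte makes the check fail.
corrupted = bytes([0x13]) + image[1:]
print(sums_to_zero(corrupted))       # False
```

             | The verifier never sees the payload and the embedded value
             | separately; it only sums the whole image, which is what
             | lets a copier check a programmed chip against itself.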
        
         | soufron wrote:
         | Yup. Who wrote the microcode at the time? And how long did it take?
        
           | ajenner wrote:
           | According to https://en.wikipedia.org/wiki/Intel_8086: "The
           | architecture was defined by Stephen P. Morse with some help
           | and assistance by Bruce Ravenel (the architect of the 8087)
           | in refining the final revisions. Logic designer Jim McKevitt
           | and John Bayliss were the lead engineers of the hardware-
           | level development team and Bill Pohlman the manager for the
           | project." I expect the microcode was developed in tandem with
           | the rest of the chip, so probably took about 2 years.
        
         | schoen wrote:
         | What's "random logic"? From context, it sounds like circuitry
         | that explicitly implements the functionality of an opcode, as
         | opposed to circuitry that can be used by the microcode, or
         | something?
        
           | ajenner wrote:
           | Yes, exactly - the logic that implements the simpler
           | instructions directly as special-purpose gates rather than
           | microcode.
        
             | kens wrote:
             | To expand on that, "random logic" means that it looks
             | random; it's not _actually_ random. This is in contrast to
             | circuits that have an underlying structure to them, like a
             | PLA or ROM.
        
       | pkphilip wrote:
       | Amazing work! Can't say I understood half of what you have
       | written, but sure is some top quality work!
        
       | procd wrote:
       | Amazing!
        
       | rkagerer wrote:
       | Chris Gerlinsky did a talk last year on the process he uses to
       | decap chips and extract their ROM bits with a microscope:
       | https://youtu.be/4YpSevQWCX8
       | 
       | My favorite line was when he described how one hint you've got
       | the decoding right might be stumbling upon a recognizable ASCII
       | string, and said "sometimes the only ASCII text you find is a
       | copyright notice... keep putting those in, that's great!"
        
       | kevbin wrote:
       | The link to https://www.righto.com/2020/06/a-look-at-die-
       | of-8086-process... is worth clicking
        
       ___________________________________________________________________
       (page generated 2020-09-05 23:00 UTC)