[HN Gopher] 8086 Microcode Disassembled
___________________________________________________________________

Author : matt_d
Score  : 169 points
Date   : 2020-09-05 14:10 UTC (8 hours ago)

(HTM) web link (www.reenigne.org)
(TXT) w3m dump (www.reenigne.org)

| derefr wrote:
| One thing I've always wondered: in what ways is the design of
| microcode instruction-sets for CISC-ISA CPUs, different from the
| design of outward-presenting RISC ISAs?
|
| For example, does microcode tend to have instructions that "half
| complete" a transfer-level operation, leaving some registers in
| an indeterminate state, under the assumption (which is, in
| practice, a guarantee) that they'll always have another ucode op
| executed after them that does "the rest of" the operation and so
| puts things right?
|
| Or, for another example, on CISC CPUs that have a small set of
| system-visible registers, and use register renaming to map them
| to a larger register file (e.g. x86_64), do the user-visible
| register names make it into the microcode; or do the microcode
| ops function directly on register-file offsets?
|
| To answer these questions, though, we'd probably need a _survey_
| of microcode for various CPUs, including modern ones. So I'm not
| holding my breath. Unless an engineer from Intel or the like
| wants to jump in!
|
| ------
|
| I've also been curious whether there are any lessons in the
| design of microcode ISAs, that can be applied in the design of
| abstract-machine bytecode ISAs.
|
| Right now, most bytecode ISAs are semi-idealized RISC ISAs, with
| some load-time specialization of bytecode into VM-specific ops;
| but rarely is there recompilation of bytecode into VM-specific
| microcode. I'm curious why that is.
|
| ajenner wrote:
| Microcode instruction sets have different engineering
| trade-offs to the user-visible ISA.
| In microcode, memory bandwidth isn't such an issue, so
| microcode instructions can be relatively wide (21 bits compared
| to 8 for the 8086).
|
| The microcode can also be relatively difficult to write. For
| example, in the 8086 microcode I saw one place where there is a
| "DEC2 tmpc" microinstruction (subtract 2 from tmpc), then tmpc
| is loaded after that, after which the correct result is
| available (this makes sense when you think about how the ALU
| works on the chip, but in any normal ISA you have to load the
| values into the operands before you perform operations on
| them).
|
| There's nothing in the 8086 microcode which creates any
| temporary undetermined states as far as I can tell but there
| may be combinations of microinstructions which could create a
| race condition.
|
| tenebrisalietum wrote:
| So go to the 6502 for an example which has a decode ROM which
| is similar in concept, but far more primitive and fixed rather
| than programmable.
|
| Instructions take a length of time from 2 to 7 cycles, with an
| additional cycle under certain conditions.
|
| The decode ROM determines what is done for each of those cycles
| and allows the modularization of circuitry for common purposes
| among instructions.
|
| I think CPUs such as the Z80 and definitely the 68000 had more
| sophisticated mechanisms where the microcode was really a
| sub-CPU executing actual micro-instructions.
|
| > I've also been curious whether there are any lessons in the
| design of microcode ISAs, that can be applied in the design of
| abstract-machine bytecode ISAs.
|
| I think something like Itanium's VLIW breaks this barrier
| between microcode and ISA more than other ISAs and the lesson
| was it's too difficult to port legacy software to it, so we
| keep going on with CPUs that continue to support the appearance
| of an in-order ISA initially developed in the 1970's with more
| and more frankensteined extensions as time goes on.
|
| kens wrote:
| > in what ways is the design of microcode instruction-sets for
| CISC-ISA CPUs, different from the design of outward-presenting
| RISC ISAs?
|
| There's a lot of variability in microcode designs, but based on
| the microcodes I've examined closely (various IBM 360
| mainframes, Xerox Alto, 8086), there are several
| characteristics.
|
| Microcode is usually much wider than instructions (21 bits for
| the 8086, over 100 bits for some IBM machines). Microcode is
| usually doing several things in parallel. An instruction set is
| designed to be general-purpose and "make sense", while
| microcode is nearly incomprehensible and does whatever bizarre
| tricks are necessary to implement just what is needed for the
| instruction set. (One important factor is that microcode
| doesn't need to be backwards or forwards compatible, so
| designers can do whatever they want.) Microcode's relationship
| with memory is different since you're dealing with address and
| data registers, not abstract reading and writing of memory.
| Microcode needs to worry a lot more about timing. For instance,
| in the 8086, an ALU operation is set up a cycle before it
| happens. In the Xerox Alto, conditional branches happen a cycle
| after you issue them.
|
| For your specific question about registers, much of the 8086's
| microcode ignores the specific register names, saying things
| like move the generic source register to the ALU. The hardware
| selects the appropriate register based on the instruction,
| direction bit, etc. (I'm in the middle of writing a blog post
| about this.)
|
| For a more modern look at microcode, the book "The Anatomy of a
| High-Performance Microprocessor" describes the AMD K6 processor
| in way more detail than you'd want.
|
| ChuckMcM wrote:
| In my experience, CISC instruction sets are primarily a way to
| create compact representations of fairly long instruction
| words.
|
| That arises from the way in which the register file, the ALU,
| the flags, and various counters (instruction, stack, etc.) are
| laid out in logic and the buses between them.
|
| Something that I find really fun to do is to experiment with
| compute architectures. I first started playing around with this
| with an Altera DE2 board, then the Spartan III dev board from
| Xilinx, and these days with a Lattice Ice40K board (Icebreaker,
| and soon ULX3S board). There are a number of "soft" CPUs where
| you can play around with this to your heart's content (given you
| have enough gates in your FPGA, which is getting easier and
| easier these days).
|
| CISC instructions, as the name suggests, are simply
| "subroutine calls" or "macro calls" (depending on what era of
| computing you were introduced to) into an underlying machine
| that can move bits around each "clock". RISC is essentially
| making that level of instruction available directly to the
| compiler.
|
| The most infamous version of exposing what is essentially
| microcode to the compiler was, in my opinion, the Itanium which
| has a really flexible native instruction set that the compiler
| mixed and matched into pseudo instructions which it used to
| compile code into. A more elegant version of this was the Xerox
| PARC "D" machines which allowed you to load an instruction set
| prior to booting into your actual applications. This made
| Mesa[1] development interesting because you needed the
| appropriate instruction set to go with the Mesa compiler you
| were using.
|
| [1] Mesa was Xerox's modular development language that inspired
| Wirth's Modula 2 (I believe that was the ordering, Wirth might
| claim it went the other way)
|
| dylan604 wrote:
| It seems like there have been a few disassembly write ups on the
| 8086 lately. Are the tools getting to the point where this is
| possible, or just enough people with enough serious interest in
| this? Coincidence?
| Am I seeing a pattern that isn't really there?
|
| ajenner wrote:
| Probably not entirely a coincidence - Ken Shirriff is doing a
| series on the 8086 which may account for at least one of the
| other articles you've noticed. My disassembly was only possible
| because of Ken's high-resolution photos of the die with the
| metal layer removed - that's why it took me until now to do it.
|
| dylan604 wrote:
| so it's turtles all the way down? someone makes a breakthrough
| that gets used by someone else to make a different
| breakthrough kind of a thing. this is why science needs to
| be open. no one person/group can do it all. i just wish that
| research didn't have to be done in secret to protect
| potential patentability. Let the work be published and let
| the people responsible receive whatever
| credit/recognition/awards deserved.
|
| kudos for your efforts!
|
| surfsvammel wrote:
| Awesome stuff. Really nostalgic. An 8086 with yellow monochrome
| screen was my first computer. It ran Police Quest I, I think.
|
| ksaj wrote:
| Have you used a green monochrome screen? I still remember the
| first time I got one, because it was cheaper than those
| newfangled amber screens.
|
| At first I thought it was a little stupid because of how slow
| the fade was when the cursor blinked, and it wasn't nearly as
| sharp or vivid. But within the first few hours of hacking
| around, I recognized how much easier on the eyes it was without
| the flickery amber that wobbled when you clacked your teeth
| together, and the weird random "snow" when refreshing the
| screen in a text "animation."
|
| If only fractals didn't take an hour or so to render back then,
| an animated one at modern speeds would have been quite soothing
| to watch that way.
|
| Fractint - I'm shocked I actually remember the name.
| Downloading it from a BBS is how I got my _second_ computer
| virus! Exciting times. Nostalgic is right.
|
| dm319 wrote:
| Have you come across coolretroterm?
| It simulates the snow and wobble of those screens and, I think,
| does a reasonable job. Not sure if it would work with a
| graphical program though.
|
| viler wrote:
| Outstanding work - never fails to amaze me when people unearth
| little secrets like that 4 decades after the fact. That
| MUL/IMUL/IDIV status bit hack is one for the ages.
|
| ajenner wrote:
| Author here if anyone has any questions.
|
| userbinator wrote:
| Does the microcode give any hints on why the general PUSH and
| POP are in completely different places in the opcode map (push
| is FF/6, pop is in its own group in 8F/0 with 8F/1-7 invalid,
| while FF/7 is unused)? It almost looks like FF/7 was supposed
| to be the pop. I've always wondered what 8F/1-7 and FF/7 do on
| an 8086/8 too, but it's very hard to find that information.
|
| mkup wrote:
| Does the MUL/IMUL/IDIV result negation trick (via REP prefix)
| work on later 8086-compatible Intel CPUs (e.g. 80286, 80386
| etc)?
|
| ajenner wrote:
| I have just learned from dreNorteR on VCF that it has no effect
| on a 286 but has a different, unexpected, and useful effect
| on a 186!
| http://www.vcfed.org/forum/showthread.php?76657-8088-8086-mi...
|
| dm319 wrote:
| I wonder if you could do a version of this article for a lay
| person like me? I really enjoyed Ken's articles because they
| assumed very little knowledge.
|
| derefr wrote:
| > While most of the unused parts of the ROM (64 instructions)
| are filled with zeroes, there are a few parts which aren't. The
| following instructions appear right at the end of the ROM [...]
|
| Given that they're right at the end -- and seemingly
| intentionally written there _after_ the rest of the unused
| space before them was zeroed -- might those bytes be a checksum
| of the ROM?
|
| ajenner wrote:
| I don't think there's anything on the chip that could compute
| a checksum of the microcode ROM contents.
| It could be some kind of copyright message perhaps, though I
| don't know how it's encoded and it's only 42 bits long so there
| isn't much space for anything meaningful.
|
| derefr wrote:
| I would guess that it's not a runtime-verified checksum,
| but rather a simple embedded "sum complement" value, used
| for ROM-mastering-time integrity verification.
|
| A sum-complement value is a value computed _from_ some
| data, such that, when the data is checksummed with the
| sum-complement value now embedded _into_ it, the data will sum
| to zero. This approach to checksumming is useful, as any
| potential verifier just has to throw the image-as-a-whole
| through the checksumming algorithm, and ensure that the
| output is zero. It doesn't need one iota of knowledge about
| _what_ it's verifying. It doesn't even need an extra
| machine-register to hold the expected checksum.
|
| These "blind" checksums allow ROM production hardware
| (programmers, copiers) to both pre-verify the integrity of
| the input image, and to post-verify that it has programmed
| the image onto a chip successfully. No special container
| format for the ROM image is required, nor is the ROM image
| required to be structured in any particular way (which is
| good, because ROMs are used for all sorts of things, not
| just code.) The ROM image can be any opaque blob, just as
| long as it sums to zero.
|
| In fact, you don't even need a ROM "image" at all. It's
| possible to integrity-verify a programmed ROM "against
| itself"; and thus, a hand-programmed ROM (e.g. an EEPROM
| you programmed in your office) can be sent to the
| duplication facility to serve as the reference from which
| mask-ROM masks will be generated. The data on the EEPROM
| can be trusted, because it sums to zero. And the mask ROMs
| themselves can be checked for flaws by seeing whether
| _they_ sum to zero.
|
| For smaller-scale ROM distribution, ROM-to-PROM bulk
| copiers are used.
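The sum-complement idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a plain 8-bit additive checksum and a single reserved byte for the sum-complement value, whereas real ROMs and cartridge formats may use other widths or algorithms, and nothing here is taken from the 8086 microcode ROM. The function names (`checksum8`, `sum_complement`, `embed_sum_complement`, `verify`) and the sample bytes are hypothetical.

```python
def checksum8(data: bytes) -> int:
    """Additive checksum: sum of all bytes, modulo 256 (assumed width)."""
    return sum(data) % 256


def sum_complement(data: bytes) -> int:
    """Byte value that, added to the sum of `data`, gives zero modulo 256."""
    return (-sum(data)) % 256


def embed_sum_complement(image: bytes, offset: int) -> bytes:
    """Return a copy of `image` whose byte at `offset` is the sum complement
    of all the other bytes, so the whole image sums to zero."""
    patched = bytearray(image)
    patched[offset] = 0  # the reserved field must not count toward its own value
    patched[offset] = sum_complement(patched)
    return bytes(patched)


def verify(image: bytes) -> bool:
    """A 'blind' verifier: it needs no expected value, only that the sum is zero."""
    return checksum8(image) == 0


# Hypothetical 5-byte image; the last byte is the reserved sum-complement field.
rom = bytes([0x12, 0x34, 0xAB, 0xFF, 0x00])
patched = embed_sum_complement(rom, offset=len(rom) - 1)

# A copy with one byte altered no longer sums to zero.
corrupted = bytes([patched[0] ^ 0x01]) + patched[1:]
```

With this layout, a copier or programmer only has to run `verify` on whatever it reads back; it never needs to be told where the reserved field is or what the expected checksum was.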
| These copiers can be made to both pre-verify the source, and
| to post-verify the programmed copies. Using this approach to
| checksumming, the copier can avoid having to verify the source
| "against" the destination, instead only needing to verify the
| source once, and then verify the destinations against
| themselves. This both speeds up verification; and allows for
| the use of simpler microcontrollers in these copiers, which
| reduces their design cost. (By quite a lot, back in the 1970s,
| when all this was most relevant.)
|
| You can see this approach to checksumming in practice in
| early-generation game cartridge ROMs, which almost always
| have these embedded sum-complement values (and so
| presumably were integrity-verified during
| mastering/duplication.) These sum-complement value fields
| get referred to by emulators as "the checksum" of the ROM
| image--but technically, they're not; if you're following
| along, you'll realize that "the checksum" of such ROM
| images is zero! :)
|
| [deleted]
|
| soufron wrote:
| Yup. Who did write the microcode at the time? And for how long?
|
| ajenner wrote:
| According to https://en.wikipedia.org/wiki/Intel_8086: "The
| architecture was defined by Stephen P. Morse with some help
| and assistance by Bruce Ravenel (the architect of the 8087)
| in refining the final revisions. Logic designer Jim McKevitt
| and John Bayliss were the lead engineers of the hardware-level
| development team and Bill Pohlman the manager for the
| project." I expect the microcode was developed in tandem with
| the rest of the chip, so probably took about 2 years.
|
| schoen wrote:
| What's "random logic"? From context, it sounds like circuitry
| that explicitly implements the functionality of an opcode, as
| opposed to circuitry that can be used by the microcode, or
| something?
|
| ajenner wrote:
| Yes, exactly - the logic that implements the simpler
| instructions directly as special-purpose gates rather than
| microcode.
|
| kens wrote:
| To expand on that, "random logic" means that it looks
| random; it's not _actually_ random. This is in contrast to
| circuits that have an underlying structure to them, like a
| PLA or ROM.
|
| pkphilip wrote:
| Amazing work! Can't say I understood half of what you have
| written, but sure is some top quality work!
|
| procd wrote:
| Amazing!
|
| rkagerer wrote:
| Chris Gerlinsky did a talk last year on the process he uses to
| decap chips and extract their ROM bits with a microscope:
| https://youtu.be/4YpSevQWCX8
|
| My favorite line was when he described how one hint you've got
| the decoding right might be stumbling upon a recognizable ASCII
| string, and said "sometimes the only ASCII text you find is a
| copyright notice... keep putting those in, that's great!"
|
| kevbin wrote:
| The link to
| https://www.righto.com/2020/06/a-look-at-die-of-8086-process...
| is worth clicking
___________________________________________________________________
(page generated 2020-09-05 23:00 UTC)