[HN Gopher] Reverse-engineering the multiplication algorithm in ...
       ___________________________________________________________________
        
       Reverse-engineering the multiplication algorithm in the Intel 8086
       processor
        
       Author : CoBE10
       Score  : 117 points
       Date   : 2023-03-15 16:16 UTC (6 hours ago)
        
 (HTM) web link (www.righto.com)
 (TXT) w3m dump (www.righto.com)
        
       | kens wrote:
       | Author here. I'm without electricity but can discuss the 8086 if
       | anyone has questions :-)
        
         | gregschlom wrote:
         | Ken, I've read and enjoyed many of your posts and I always
         | wonder: how do you find so much time to do that? Is that your
         | full time occupation?
         | 
         | Thanks!
        
           | kens wrote:
           | I'm retired, so I have time for various vintage computer
           | projects.
        
         | detrites wrote:
         | Not a question, just a comment that after having read several
         | of your posts now, building up knowledge as I go, this one is
         | the first I've understood from start to end. Most satisfying!
        
         | peteri wrote:
         | Interesting, any reason for not discussing AAD yet? (oh and
         | great series BTW)
        
           | kens wrote:
           | AAD is lower on the list since it is both complicated and
           | kind of obscure. Also, it depends on multiplication and
           | division, so I want to cover them first.
        
             | bonzini wrote:
             | AAD is essentially a multiplication and a sum:
              | 170  Q     -> tmpc
              | 171  AH    -> tmpb      CALL  CORX
              | 172  AL    -> tmpb      ADD   tmpc
              | 173  ZERO  -> AH        JMP   179
              | 179  SIGMA -> AL        RNI              F
             | 
              | It doesn't depend on division despite the name; it performs
              | ASCII to binary conversion.
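bonzini's reading of the microcode matches AAD's documented behavior, which can be sketched in Python. This is a hedged model, not the microcode itself; the `base` parameter stands in for AAD's immediate byte (normally 10), which is what makes the instruction need the multiply routine:

```python
# Hedged model of what AAD computes, per its documented behavior:
# AL = AH * base + AL, then AH = 0 -- a multiplication and a sum
# that turns two unpacked BCD digits into one binary value in AL.
def aad(ah: int, al: int, base: int = 10) -> tuple[int, int]:
    """Return (AH, AL) after AAD; `base` models the immediate byte."""
    al = (ah * base + al) & 0xFF
    return 0, al

# Digits 3 and 7 in AH:AL become the binary number 37 in AL.
assert aad(3, 7) == (0, 37)
```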
        
         | jzwinck wrote:
         | As far as I know, x86 and x86-64 have never had microcode for
         | ASCII (base 10 is what I'm interested in) to binary number
         | conversion. Why do you think this is? It seems like a very
         | common operation these days, and the reasons you gave for
         | microcode multiply seem applicable.
        
           | prirun wrote:
           | Prime minicomputers had decimal arithmetic in microcode &
           | hardware in the early 80's to support PL/I and COBOL
           | "picture" data types and packed decimal. They also had
           | sophisticated editing to pictures like ZZZ,ZZ9.99 which would
           | display 1024 as 1,024.00
           | 
           | These instructions were also implemented in PMA, the Prime
           | Macro Assembler, in the OS so that if a machine didn't have
           | these instructions, a UII fault occurred and the instruction
           | was simulated in software. I implemented a lot of the easier
           | instructions in C in the Prime emulator, but for the more
           | complex things like decimal math, I let those fault so the
           | Prime OS would handle them.
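The picture editing prirun describes can be sketched in Python. This is a hedged simplification of real COBOL/PL/I pictures (unsigned values only, value given in hundredths): 'Z' suppresses leading zeros, '9' always prints a digit, and insertion characters like ',' and '.' are copied through, with a comma blanked while still in the suppressed region:

```python
# Hedged sketch of COBOL/PL/I-style "picture" editing, e.g. ZZZ,ZZ9.99.
def edit_picture(picture: str, value: int) -> str:
    ndigits = sum(c in "Z9" for c in picture)
    digits = str(value).rjust(ndigits, "0")   # pad to picture width
    out, i, significant = [], 0, False
    for c in picture:
        if c in "Z9":
            d = digits[i]; i += 1
            if d != "0" or c == "9" or significant:
                significant = True
                out.append(d)
            else:
                out.append(" ")               # suppressed leading zero
        elif c == "," and not significant:
            out.append(" ")                   # drop comma in suppressed region
        else:
            out.append(c)                     # insertion character
    return "".join(out)

# 1024.00 (stored as 102400 hundredths) edited by ZZZ,ZZ9.99:
assert edit_picture("ZZZ,ZZ9.99", 102400) == "  1,024.00"
```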
        
           | kens wrote:
           | The IBM System/360 (1964) had instructions to convert a
           | character string to packed decimal, and to convert packed
           | decimal to binary. This was motivated by the large amount of
           | punch-card data that was handled back then, with all the
           | numbers in decimal. But I guess Intel figured that string
           | conversion wasn't common enough in modern systems to merit
           | adding it to the instruction set.
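The two-step conversion kens describes can be sketched in Python. This is a hedged, simplified model of what S/360-style PACK (character digits to packed-decimal nibbles) and CVB (packed decimal to binary) accomplish, ignoring the sign nibble and fixed field widths of the real instructions:

```python
# Hedged sketch: ASCII digits -> packed decimal -> binary, unsigned only.
def pack(s: str) -> bytes:
    digits = [ord(c) & 0x0F for c in s]      # low nibble of '0'..'9'
    if len(digits) % 2:                      # pad to whole bytes
        digits.insert(0, 0)
    return bytes((digits[i] << 4) | digits[i + 1]
                 for i in range(0, len(digits), 2))

def cvb(packed: bytes) -> int:
    value = 0
    for byte in packed:                      # two decimal digits per byte
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return value

# Punch-card style decimal "1024" becomes the binary value 1024.
assert cvb(pack("1024")) == 1024
```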
        
       | garganzol wrote:
        | CISC operations like 'imul' and 'idiv' were perceived as curse
        | words in the RISC world of the day, but they were actually a
        | beacon of hope. As the x86 architecture matured, those operations
        | became super fast thanks to hardware acceleration.
       | 
        | So take it with a grain of salt when you hear someone claim that
        | RISC is invariably superior to CISC. Time and time again, the
        | truth lies somewhere in the middle.
        
         | leni536 wrote:
          | Well, idiv being super fast is news to me, but I guess it's
          | relative.
        
           | cperciva wrote:
            | Integer division is famously slow, to the point that people
            | (and compilers) go to great lengths to avoid it... which
            | makes it rare enough that there's very little point in
            | optimizing it.
           | 
            | If you really need integer division, you can typically
            | optimize it by doing an FP iteration and then fixing up the
            | approximate result.
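A hedged sketch of the fixup idea cperciva mentions, in Python: get an approximate quotient in floating point, then correct it with cheap integer operations. (A double's 53-bit mantissa means full 64-bit operands need a wider correction step; this sketch assumes inputs where the FP guess is off by at most one.)

```python
# Approximate-then-correct integer division, assuming the FP quotient
# is within 1 of the true result.
def fp_div(a: int, b: int) -> int:
    q = int(a / b)        # approximate quotient via floating point
    r = a - q * b         # exact remainder of the guess (integer ops)
    if r < 0:             # guess was one too high
        q -= 1
    elif r >= b:          # guess was one too low
        q += 1
    return q

assert fp_div(7, 2) == 3
```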
        
             | Tuna-Fish wrote:
              | The worst-case integer divide times on the newest Intel/AMD
              | CPUs are 15/18 cycles for 64-bit and 12/13 for 32-bit. Both
              | take an early out for easy values at around 10 cycles.
              | 
              | Integer division is still among the slower operations on
              | these machines, but the advice to avoid division at all
              | costs is outdated. In comparison, the FP divide instructions
              | are in the ~13/14 cycle range; it would be hard for you to
              | shuffle data to that side and back and not be slower. (And
              | no, they are not doing this internally -- both have separate
              | integer divide units.)
        
       | ajross wrote:
        | For those confused as to why there's all this signedness handling
        | when the 2's complement multiplication algorithm we were all
        | taught in school is identical for both signed and unsigned
        | numbers: x86 multiplication is _widening_. The 8086 computes the
        | 32-bit product of two 16-bit values[1] (one of which must be AX)
        | and places the result in two output registers (DX and AX,
        | always).
       | 
        | If you interpret the arguments as unsigned, you can treat the
        | input high bits as zeroes. But if they're signed then you need to
        | conceptually sign-extend from the high bit. So they're different
        | operations with different results in the high word.
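The point about the high word can be sketched in Python, modeling the 8086's DX:AX result of a 16x16 -> 32-bit widening multiply:

```python
# MUL-style unsigned widening multiply: returns (DX, AX).
def mul16(a: int, b: int) -> tuple[int, int]:
    p = (a & 0xFFFF) * (b & 0xFFFF)
    return (p >> 16) & 0xFFFF, p & 0xFFFF

# IMUL-style signed widening multiply: sign-extend, then multiply.
def imul16(a: int, b: int) -> tuple[int, int]:
    sa = a - 0x10000 if a & 0x8000 else a
    sb = b - 0x10000 if b & 0x8000 else b
    p = (sa * sb) & 0xFFFFFFFF
    return (p >> 16) & 0xFFFF, p & 0xFFFF

# 0xFFFF is 65535 unsigned but -1 signed: the low words (AX) agree,
# the high words (DX) do not.
assert mul16(0xFFFF, 2) == (0x0001, 0xFFFE)   # 65535 * 2 = 0x0001FFFE
assert imul16(0xFFFF, 2) == (0xFFFF, 0xFFFE)  # -1 * 2 = 0xFFFFFFFE
```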
       | 
       | [1] There was an 8 bit multiply too. It didn't have this property
       | and looked more like a normal instruction, kens would have to
       | tell us if it shared the same microcode.
        
         | bonzini wrote:
          | Also, the pencil-and-paper algorithm uses sign-magnitude
          | representation, while x86 uses two's complement.
         | 
         | > It didn't have this property and looked more like a normal
         | instruction, kens would have to tell us if it shared the same
         | microcode.
         | 
          | If you're thinking of "IMUL reg,reg", that was added only in
          | the 80186. If instead you're thinking of AAD, then yes, it
          | reuses the CORX subroutine; it's only unsigned, so it doesn't
          | need the parts that deal with negation.
        
       | andrehacker wrote:
       | Ken, please tell us you have a book deal, this is just awesome
       | stuff.
        
         | kens wrote:
         | I've been vaguely thinking about a book on the 8086 die. I'm
         | sure I could sell at least 5 copies :-)
        
           | amir wrote:
           | If that's all it takes then I pre-order five copies.
        
           | JohnFen wrote:
           | For the good of humanity -- or at least our industry -- or at
           | least my intellectual curiosity -- or at least so I can have
           | another cool book on my shelf to increase my geek cred --
           | please do this!
        
           | spyremeown wrote:
            | Count six. I'd buy it in a heartbeat.
        
       ___________________________________________________________________
       (page generated 2023-03-15 23:01 UTC)