# i386 assembler

The i386 assembler lives at asm/i386.fs and covers a big part of the
architecture (some parts are not implemented yet).

You write code by calling operand specifiers followed by an operation word. For
example, "ax bx add," assembles an ADD operation with EAX as a destination and
EBX as a source.

Operand words always result in one element being placed on PS. The instruction
word then consumes the operands from PS and writes the result. Error checking
is minimal, so be careful about how you write instructions.

## Register operand

You use a register operand by typing its name. The list of registers is:

ax bx cx dx
sp bp si di

These register words don't have an "e" prefix because their meaning is
context-dependent. When we're in real mode, they refer to the 16-bit register.
When we're using an instruction where the register has to be 16-bit, such as
movzx, they refer to the 16-bit register too. Otherwise, these words refer to
the 32-bit register, so they can be read as if they had the "e" prefix.

Then, there are the 8-bit register words:

al bl cl dl
ah bh ch dh

There is also a list of "special" registers which can only be used with "mov,"
and another "regular" register operand:

es ss ds fs gs
cr0 cr2 cr3
dr0 dr1 dr2 dr3 dr6 dr7
tr6 tr7

## Immediate operand

    i) ( imm -- operand )

You can specify an immediate operand with the "i)" word. For example "si 42 i)
sub," writes the SUB operation with ESI as destination and 42 as an immediate
source.

An immediate operand is always the source and should always be placed second in
the operand list (except for "push,").

Remember, all operand words leave an item (always one and only one) on PS. If,
for example, you want to use a number that is already on PS as an immediate,
you have to do proper PS juggling:

    : movAXi, ( i -- ) ax swap i) mov, ;
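
The same juggling applies to any macro that takes a raw number. As a sketch,
here is a hypothetical "addi," word (the name is an assumption and not part of
the assembler) that adds a number on PS to whatever operand sits beneath it. No
swap is needed here because the number is already in second position:

```forth
\ Hypothetical helper, not part of asm/i386.fs.
\ "i)" turns the number on top of PS into an immediate operand, leaving
\ ( operand imm-operand ), which is what "add," consumes.
: addi, ( operand n -- ) i) add, ;

bx 4 addi,  \ same as writing: bx 4 i) add,
```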

## Memory operand

    m) ( addr -- operand )

You can refer to a memory address with the "m)" word. "cl $1234 m) mov," loads
the byte at memory address $1234 into register CL.

Order is important here: "$2345 m) dx mov," writes the contents of EDX to memory
address $2345.
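
As a small illustration of operand order, the following sketch increments a
32-bit counter kept in memory by going through EAX. The address $1234 is
arbitrary, and the default 32-bit width (not real mode) is assumed:

```forth
ax $1234 m) mov,  \ load the 32-bit value at $1234 into EAX
ax 1 i) add,      \ EAX = EAX + 1
$1234 m) ax mov,  \ store EAX back to $1234
```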

## Indirect register operand

    d) ( operand displacement -- operand )

We can also refer to a memory address stored in a register with the "d)" word,
along with a constant offset. For example, if EDI is $1200, "ax di 42 d) mov,"
loads the 4-byte value from memory address $122a into EAX. Again, order is
important, so "di 42 d) ax mov," does the opposite: it stores EAX at that
address.

If you want indirect addressing without offset, use "0 d)". The assembler will
automatically use the operation form that is more compact (which contains no
offset).
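
A short sketch combining both forms. It assumes ESI points at two consecutive
32-bit values, that EDI points at a place to store their sum, and that "add,"
accepts a "d)" operand as its source the same way "mov," does:

```forth
ax si 0 d) mov,  \ EAX = 32-bit value at address ESI (compact, no offset)
ax si 4 d) add,  \ EAX = EAX + 32-bit value at address ESI+4
di 0 d) ax mov,  \ store EAX at address EDI
```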

## Operation width

    8b) ( operand -- operand )
    16b) ( operand -- operand )
    32b) ( operand -- operand )

By default, operations are written in their 32-bit wide versions. But operations
can be 32-bit, 16-bit or 8-bit wide. There are multiple factors deciding on that
width.

First, using an 8-bit register operand (al, ch, etc.) yields an 8-bit operation.

But sometimes you want an 8-bit operation that does not involve a register, for
example "$1234 m) inc,". To force this one operation into 8-bit or 16-bit mode,
use "8b)" or "16b)", for example: "$1234 m) 8b) inc,".

You can set the "realmode" global value to 1 to put the assembler in real mode,
where the default width is 16-bit until you set "realmode" back to 0.

You won't be using "32b)" often, but it can be useful in macros where the input
operand can be sized and you want to make it "full-width".

By design, the "8b)", "16b)" and "32b)" words have the exact same effect as the
words of the same name in i386 HAL. This means, for example, that the MOD) word
[doc/usage/bmw] works fine with assembler code.
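
To make the rules concrete, here is how the same memory increment comes out at
each width, assuming the assembler is not in real mode. The address is
arbitrary:

```forth
$1234 m) inc,       \ no modifier: 32-bit increment
$1234 m) 16b) inc,  \ forced 16-bit increment
$1234 m) 8b) inc,   \ forced 8-bit increment
al $1234 m) mov,    \ 8-bit move: the al operand sets the width
```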

## Instructions

Instructions are divided into multiple groups with different signatures.

### Inherent ( -- )

ret,     nop,      cli,      sti,      cld,      std,
lodsb,   lodsw,    lods,     stosb,    stosw,    stos,
cmpsb,   cmpsw,    cmps,
movsb,   movsw,    movs,     scasb,    scasw,    scas,
insb,    insw,     ins,      outsb,    outsw,    outs,
repz,    repnz,    rep,
pusha,   popa,     pushf,   popf,     iret,

### Single operand ( dst -- )

push,    pop,
neg,     not,
mul,     div,
dec,     inc,
setg,    setl,     seta,     setb,
setge,   setle,    setae,    setbe,
setz,    setnz,    setc,     setnc,
lgdt,    lidt,

With "mul," and "div," the destination is always ax and you don't specify it.
So, you'll write them like "bx mul," or "cx div,".

"push," is the only instruction that can take an immediate as its first (and
only) operand.
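
A brief sketch of the single-operand forms, showing the immediate "push," and
the implied destination of "mul,". The value 42 is arbitrary:

```forth
42 i) push,  \ push the immediate 42
bx pop,      \ pop it into EBX
bx mul,      \ multiply by EBX; the ax destination is implied
```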

### Two operands ( dst src -- )

add,     or,       adc,      sbb,
and,     sub,      xor,      cmp,
test,    xchg,
mov,     lea,

### Shift instructions ( dst shiftby -- )

"shiftby" is either an immediate or "cl".

rol,     ror,      rcl,      rcr,
sal,     sar,      shl,      shr,
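
The two kinds of "shiftby" look like this, assuming the immediate count is
written with "i)" like any other immediate operand:

```forth
ax 4 i) shl,  \ shift EAX left by 4
bx cl shr,    \ shift EBX right by the count in CL
```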

### movsx, and movzx,

movsx, and movzx, are like regular two-operand instructions, but widths are
funky. "dst" is always full-width, but as you know, src is always either 8b or
16b. For the 8b variant, you apply 8b) to src, but for the 16b variant, you
apply nothing. Examples:

bx $1234 m) 8b) movsx, \ 8-bit sign-extended move into ebx
bx $1234 m) movzx, \ 16-bit zero-extended move into ebx
bx ax movsx, \ 16-bit sign-extended move from ax to ebx

### ?movzx, macro ( dst src -- )

"?movzx," is a macro that does the "right thing" to get a possibly smaller "src"
into a full width "dst" register ("dst" has to be a register and it has to be
full width).

If src is full width or an immediate, a simple "mov," is written. Otherwise, a
"movzx," is written.
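
Following the rules above, these three calls would be expected to come out as
shown in the comments. The address and the immediate are arbitrary:

```forth
bx $1234 m) ?movzx,      \ src is full width: written as a plain mov,
bx 42 i) ?movzx,         \ src is an immediate: written as a plain mov,
bx $1234 m) 8b) ?movzx,  \ src is 8-bit: written as a movzx,
```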

### in, and out,

The "in," and "out," operations support both their immediate form and their
ax/dx form. You have to specify the registers even though only al and ax are
legal. ax is always first for both instructions. The width modifier is applied
to ax, not dx. Examples:

al 42 i) out, \ 8-bit out to port 42
ax 16b) 42 i) in, \ 16-bit in from port 42
ax dx out, \ 32-bit out to port DX
ax 16b) dx in, \ 16-bit in from port DX

### Jumps and calls

"jmp," and "call," have two possible forms: Immediate or mod/rm.

In mod/rm mode, these operations work like others. For example, "ax jmp," works
as you'd expect.

The immediate offset form is used directly, without the "i)" word. The number
supplied to it is expected to be an offset relative to the operation's
*beginning* position (yes, *beginning*, unlike what the i386 operation expects,
which is an offset from the end of the operation). This means that "0 jmp," is
always an infinite loop.

At this point a bit of fiddling happens to this offset. First, we check if the
offset is small enough to fit in 8 bits. If it is, we write the 8-bit form of
the jump/call. If it's not, we write the 32-bit form (or 16-bit form if we're
in real mode).

Then, after that, we need to adjust that offset so that it jumps to where it's
supposed to by subtracting 2, 3 or 5 bytes (depending on the opcode width)
before writing it.

Conditional jumps (jz, jnc, etc.) work the same way except that they only
support the immediate mode (again, no "i)") and will subtract an additional 1
from the resulting offset in 16-bit/32-bit because the opcode is 2 bytes wide.

jz,      jnz,      jc,       jnc,
js,      jns,      jl,       jnl,
ja,      jna,
loop,    loopz,    loopnz,
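
As a worked example of the adjustment, here are the displacements that would end
up in the instruction for a few offsets, following the rules described above.
The offset 1000 is arbitrary, and the figures for it assume the assembler is not
in real mode:

```forth
0 jmp,     \ fits in 8 bits: 2-byte form, written displacement is 0 - 2 = -2
1000 jmp,  \ too big for 8 bits: 5-byte form, written displacement is 1000 - 5 = 995
0 jz,      \ fits in 8 bits: 2-byte form, written displacement is 0 - 2 = -2
1000 jz,   \ 32-bit form with a 2-byte opcode: 1000 - 5 - 1 = 994
```

In real mode, the 16-bit forms would subtract 3 (4 for conditional jumps)
instead of 5 (6 for conditional jumps).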

There is also "jmpfar," and "callfar," with signature "seg16 absaddr --"
(regular numbers, not "i)" immediates).

### int,

"int," is special and is called with a regular number: $80 int,