Title: 8080 PRNG Part 2
Date: November 21, 2018
Tags: altair programming
========================================

Before moving on to the next program (which still doesn't work right), I'll dig
into the Xorshift PRNG, explain how it works and give a little overview of 8080
assembly programming.

Xorshift was chosen for being very low on resource requirements.  There are very
few, easy calculations and little memory used.  My initial approach is very
straightforward and I didn't take into account cycle times or code reuse or loop
costs.  There could be some improvements with loop unrolling or using
subroutines for reused code. And I could have chosen a set of fewer total
shifts.  Everything has trade-offs, though:  execution speed, program size,
stack size. 

If you're not familiar with assembly at all, the 8080 instruction set is fairly
easy to wrap your head around.  Briefly, The 8080 CPU is 8 bits with 16 bits of
memory addressing.  This is accomplished by pairing 8 bit registers and
providing a few 16 bit OP codes that implicitly use the pairs.

The registers are B, C, D, E, H, L, and A (the accumulator).  There is also PSW
which contains the flag bits.  Mostly it's important as a register when making
subroutine calls.  Otherwise, you don't eve use it like a register.  When an OP
code references a register pair, specifying register B will operate on registers
B and C as a 16 bit value and the same for D and E and H and L.  H and L are
special in that when an OP code modifies memory, the 16 bit address is read from
those registers exclusively.  The accumulator is special in that many operations
assume use of that register.  These special properties means you have to
carefully chose which registers to use to reduce unnecessary shuffling or
reading and writing to memory to preserve data for later.

Here is the code as originally written.  I'll then pull out pieces and describe
what it's doing.  Later, we might play with optimization choices.

The code is presented with the following columns:
address, OP code, address parameter or immediate value in octal, OP code in
octal, comment.


000	LHLD		052 ; Load seed into HL
001		111Q	111 ; Address 000111Q - low bits
002		000Q	000 ; high bits
003	MVI A		076 ; Zero accumulator
004		000Q	000
005	CMP H		274 ; Check if H is zero
006	JNZ		302 ; If not 0, start
007		016Q	016 ; Starting address 000016Q - low bits
010		000Q	000 ; high bits
011	CMP L		275 ; Check if L is zero
012	JNZ		302 ; If not 0, start
013		016Q	016
014		000Q	000
015	HLT		166 ; If HL is zero, halt as an error
016	MVI A		076 ; Copy a into accumulator
017		015Q	015 ; a = 13
020	MOV B,H		104 ; Store original seed - H into B
021	MOV C,L		115 ; L into C
022	DAD H		051 ; 16bit left shift HL
023	DCR A		075 ; Decrement accumulator
024	JNZ		302 ; Check if we've shifted enough times
025		022Q	022 ; If no, jump to shift again
026		000Q	000
027	MOV A,L		175 ; Copy low bits
030	XRA C		251 ; XOR with seed low bits
031	MOV L,A		157 ; Store low bits
032	MOV A,H		174 ; Copy high bits
033	XRA B		250 ; XOR with seed high bits
034	MOV H,A		147 ; Store high bits
035	MOV B,H		104 ; Store copy of
036	MOV C,L		115 ; modified seed
037	MVI D		026 ; Copy b into D
040		011Q	011 ; b = 9
041	STC		067 ; Clear carry bit by setting to 1
042	CMC		077 ; and complimenting
043	MOV A,H		174 ; Copy high bits
044	RAR		037 ; Rotate right through carry (thereby saving what falls off the end)
045	MOV H,A		147 ; Store shifted high bits
046	MOV A,L		175 ; Copy low bits
047	RAR		037 ; Rotate right through carry (picking up what fell off the high bits)
050 	MOV L,A		157 ; Store rotated low bits
051	DCR D		025 ; Decrement counter
052	JNZ		302 ; If not zero
053		041Q	041 ; shift again
054		000Q	000
055	MOV A,L		175 ; Copy low bits
056	XRA C		251 ; XOR with modified seed
057	MOV L,A		157 ; Store low bits
060	MOV H,A		174 ; Copy high bits
061	XRA B		250 ; XOR with modified seed
062	MOV H,A		147 ; Store high bits
063	MOV B,H		104 ; Store copy
064	MOV C,L		115 ; of modified seed
065	MVI A		076 ; Store c into accumulator
066		007Q	007 ; c = 7
067	DAD H		051 ; 16bit left shift
070	DCR A		075 ; Decrement counter
071	JNZ		302 ; If not zero
072		067Q	067 ; shift again
073		000Q	000 ;
074	MOV A,L		175 ; Copy low bits
075	XRA C		251 ; XOR with modified seed
076	MOV L,A		157 ; Save low bits
077	MOV A,H		174 ; Copy high bits
100	XRA B		250 ; XOR with modified seed
101	MOV H,A		147 ; Store high bits
102	SHLD		042 ; Store random number
103		111Q	111 ; as new seed
104		000Q	000 ; for next iteration
105	HLT		166 ; Stop

111		053Q	053 ; Initial seed 555 - low bits
112		002Q	002 ; high bits


72 lines, or memory addresses.  72 bytes or 576 bits.  No stack used except for
2 bytes to store the seed (which doubles as the current random number).  All
other data is hard coded.

## Breaking it down ##
# Input check #
One caveat of the Xorshift algorithm is that you can't have a seed of zero and
consequently the series will never include zero as a random value.  So we need
to make sure that either the high byte or the low byte of the seed are not zero
before we begin.  As soon as either byte is not zero, begin, otherwise we halt
as an error.


> 000  LHLD       052 ; Load seed into HL
> 001        111Q 111 ; Address 111Q
> 002        000Q 000
> 003  MVI A      076 ; Zero accumulator
> 004        000Q 000
> 005  CMP H      274 ; Check if H is zero
> 006  JNZ        302 ; If not 0, start
> 007        016Q 016 ; Starting address 000016Q - low bits
> 010        000Q 000 ; high bits
> 011  CMP L      275 ; Check if L is zero
> 012  JNZ        302 ; If not 0, start
> 013        016Q 016
> 014        000Q 000
> 015  HLT        166 ; If HL is zero, halt as an error


I start by reading the seed from address 000111Q which was chosen simply because
it was past the end of the program.  LHLD is one of the special OP codes that
assumes a register pair.  Then we need to set the accumulator to zero so we can
do comparisons with the bytes in H and L to use CMP to check for zero.  I'm not
sure if there a more efficient way to do this.  I haven't thought of one.

# Shift left #


> 016 MVI A       076 ; Copy a into accumulator
> 017        015Q 015 ; a = 13
> 020 MOV B,H     104 ; Store original seed - H into B
> 021 MOV C,L     115 ; L into C
> 022 DAD H       051 ; 16bit left shift HL
> 023 DCR A       075 ; Decrement accumulator
> 024 JNZ         302 ; Check if we've shifted enough times
> 025        022Q 022 ; If no, jump to shift again
> 026        000Q 000


Here is our first left shift.  First, save the seed unshifted as we'll need it
to XOR with later.  This uses the DAD instruction which does a 16 bit addition
of one of the register pairs.  Adding a number to itself is equivalent to a left
shift.  This allow me to operate on the full 16 bits in one instruction.  You'll
see that shifting right is a little bit more complicated.  Then all we do is
keep tack of how many times we shift.  When the accumulator reaches zero, the
zero bit will be set and JNZ won't jump and execution will continue on to
XORing.

# XORing #


> 027 MOV A,L     175 ; Copy low bits
> 030 XRA C       251 ; XOR with seed low bits
> 031 MOV L,A     157 ; Store low bits
> 032 MOV A,H     174 ; Copy high bits
> 033 XRA B       250 ; XOR with seed high bits
> 034 MOV H,A     147 ; Store high bits
> 035 MOV B,H     104 ; Store copy of
> 036 MOV C,L     115 ; modified seed


Using the accumulator to hold a byte a time, we can XOR with the original,
unshifted bytes.  This shifted and XORed result is then used going forward, we
don't have to care about the original seed any more.  We XOR three times so this
might benefit from code reuse by making into a subroutine.  While a subroutine
might mean less program memory used, it will mean stack space is needed and the
context switching might add execution time.  We can avoid the stack by jumping
but reducing hard coded addresses has it benefits, too.

# Shifting right #


> 037 MVI D       026 ; Copy b into D
> 040        011Q 011 ; b = 9
> 041 STC         067 ; Clear carry bit by setting to 1
> 042 CMC         077 ; and complimenting
> 043 MOV A,H     174 ; Copy high bits
> 044 RAR         037 ; Rotate right through carry
> 045 MOV H,A     147 ; Store shifted high bits
> 046 MOV A,L     175 ; Copy low bits
> 047 RAR         037 ; Rotate right through carry
> 050 MOV L,A     157 ; Store rotated low bits
> 051 DCR D       025 ; Decrement counter
> 052 JNZ         302 ; If not zero
> 053        041Q 041 ; shift again
> 054        000Q 000


Shifting right doesn't have a nice 16 bit shortcut.  We have to shift one byte
at a time and take advantage of the carry bit.  If we clear the carry bit to
start with we will always be pushing a zero into the high end.  If the catch
what falls off the low end of the byte in the carry bit, then when we shift the
low byte, it'll pull it into the high end.  RAR rotates the accumulator right
through the carry bit.  Rotate, not shift, meaning whatever falls off one end,
gets put back on the other end.  Using the carry bit prevents what falls off
from getting put back on. but instead gives us a place to hold onto it.


Right shift[0]


Right shifting only happens once, and I don't know of a cleaner way to do it.
The only optimization, I've mentioned previously, is to use XRA A at address 041
instead of STC and CMC to clear the carry bit in half the time.

After this, it's another XOR, a left shift, and a final XOR before writing out
the random number over the original seed in memory.  This is the random number
that can be consumed by another program and it also serves as the next seed to
generate the next number in the sequence.

Another benefit of putting the reused code sections into subroutines, is they
might be usable by other programs in the future.  So it matters what the overall
plan is.  If all we are doing is spitting out a random number without context,
avoiding CALLs and RETs and stack use is very efficient.  If we built a BASIC
type interpreter environment, these might become primitives for many operations
and we'd benefit from generalizing and creating subroutines in saved space.


[0] gopher://kagu-tsuchi.com:70/I/blog/images/PRNG_rshift.png