Title: 8080 PRNG Part 2 Date: November 21, 2018 Tags: altair programming ======================================== Before moving on to the next program (which still doesn't work right), I'll dig into the Xorshift PRNG, explain how it works and give a little overview of 8080 assembly programming. Xorshift was chosen for being very low on resource requirements. There are very few, easy calculations and little memory used. My initial approach is very straightforward and I didn't take into account cycle times or code reuse or loop costs. There could be some improvements with loop unrolling or using subroutines for reused code. And I could have chosen a set of fewer total shifts. Everything has trade-offs, though: execution speed, program size, stack size. If you're not familiar with assembly at all, the 8080 instruction set is fairly easy to wrap your head around. Briefly, The 8080 CPU is 8 bits with 16 bits of memory addressing. This is accomplished by pairing 8 bit registers and providing a few 16 bit OP codes that implicitly use the pairs. The registers are B, C, D, E, H, L, and A (the accumulator). There is also PSW which contains the flag bits. Mostly it's important as a register when making subroutine calls. Otherwise, you don't eve use it like a register. When an OP code references a register pair, specifying register B will operate on registers B and C as a 16 bit value and the same for D and E and H and L. H and L are special in that when an OP code modifies memory, the 16 bit address is read from those registers exclusively. The accumulator is special in that many operations assume use of that register. These special properties means you have to carefully chose which registers to use to reduce unnecessary shuffling or reading and writing to memory to preserve data for later. Here is the code as originally written. I'll then pull out pieces and describe what it's doing. Later, we might play with optimization choices. The code is presented with the following columns: address, OP code, address parameter or immediate value in octal, OP code in octal, comment. 000 LHLD 052 ; Load seed into HL 001 111Q 111 ; Address 000111Q - low bits 002 000Q 000 ; high bits 003 MVI A 076 ; Zero accumulator 004 000Q 000 005 CMP H 274 ; Check if H is zero 006 JNZ 302 ; If not 0, start 007 016Q 016 ; Starting address 000016Q - low bits 010 000Q 000 ; high bits 011 CMP L 275 ; Check if L is zero 012 JNZ 302 ; If not 0, start 013 016Q 016 014 000Q 000 015 HLT 166 ; If HL is zero, halt as an error 016 MVI A 076 ; Copy a into accumulator 017 015Q 015 ; a = 13 020 MOV B,H 104 ; Store original seed - H into B 021 MOV C,L 115 ; L into C 022 DAD H 051 ; 16bit left shift HL 023 DCR A 075 ; Decrement accumulator 024 JNZ 302 ; Check if we've shifted enough times 025 022Q 022 ; If no, jump to shift again 026 000Q 000 027 MOV A,L 175 ; Copy low bits 030 XRA C 251 ; XOR with seed low bits 031 MOV L,A 157 ; Store low bits 032 MOV A,H 174 ; Copy high bits 033 XRA B 250 ; XOR with seed high bits 034 MOV H,A 147 ; Store high bits 035 MOV B,H 104 ; Store copy of 036 MOV C,L 115 ; modified seed 037 MVI D 026 ; Copy b into D 040 011Q 011 ; b = 9 041 STC 067 ; Clear carry bit by setting to 1 042 CMC 077 ; and complimenting 043 MOV A,H 174 ; Copy high bits 044 RAR 037 ; Rotate right through carry (thereby saving what falls off the end) 045 MOV H,A 147 ; Store shifted high bits 046 MOV A,L 175 ; Copy low bits 047 RAR 037 ; Rotate right through carry (picking up what fell off the high bits) 050 MOV L,A 157 ; Store rotated low bits 051 DCR D 025 ; Decrement counter 052 JNZ 302 ; If not zero 053 041Q 041 ; shift again 054 000Q 000 055 MOV A,L 175 ; Copy low bits 056 XRA C 251 ; XOR with modified seed 057 MOV L,A 157 ; Store low bits 060 MOV H,A 174 ; Copy high bits 061 XRA B 250 ; XOR with modified seed 062 MOV H,A 147 ; Store high bits 063 MOV B,H 104 ; Store copy 064 MOV C,L 115 ; of modified seed 065 MVI A 076 ; Store c into accumulator 066 007Q 007 ; c = 7 067 DAD H 051 ; 16bit left shift 070 DCR A 075 ; Decrement counter 071 JNZ 302 ; If not zero 072 067Q 067 ; shift again 073 000Q 000 ; 074 MOV A,L 175 ; Copy low bits 075 XRA C 251 ; XOR with modified seed 076 MOV L,A 157 ; Save low bits 077 MOV A,H 174 ; Copy high bits 100 XRA B 250 ; XOR with modified seed 101 MOV H,A 147 ; Store high bits 102 SHLD 042 ; Store random number 103 111Q 111 ; as new seed 104 000Q 000 ; for next iteration 105 HLT 166 ; Stop 111 053Q 053 ; Initial seed 555 - low bits 112 002Q 002 ; high bits 72 lines, or memory addresses. 72 bytes or 576 bits. No stack used except for 2 bytes to store the seed (which doubles as the current random number). All other data is hard coded. ## Breaking it down ## # Input check # One caveat of the Xorshift algorithm is that you can't have a seed of zero and consequently the series will never include zero as a random value. So we need to make sure that either the high byte or the low byte of the seed are not zero before we begin. As soon as either byte is not zero, begin, otherwise we halt as an error. > 000 LHLD 052 ; Load seed into HL > 001 111Q 111 ; Address 111Q > 002 000Q 000 > 003 MVI A 076 ; Zero accumulator > 004 000Q 000 > 005 CMP H 274 ; Check if H is zero > 006 JNZ 302 ; If not 0, start > 007 016Q 016 ; Starting address 000016Q - low bits > 010 000Q 000 ; high bits > 011 CMP L 275 ; Check if L is zero > 012 JNZ 302 ; If not 0, start > 013 016Q 016 > 014 000Q 000 > 015 HLT 166 ; If HL is zero, halt as an error I start by reading the seed from address 000111Q which was chosen simply because it was past the end of the program. LHLD is one of the special OP codes that assumes a register pair. Then we need to set the accumulator to zero so we can do comparisons with the bytes in H and L to use CMP to check for zero. I'm not sure if there a more efficient way to do this. I haven't thought of one. # Shift left # > 016 MVI A 076 ; Copy a into accumulator > 017 015Q 015 ; a = 13 > 020 MOV B,H 104 ; Store original seed - H into B > 021 MOV C,L 115 ; L into C > 022 DAD H 051 ; 16bit left shift HL > 023 DCR A 075 ; Decrement accumulator > 024 JNZ 302 ; Check if we've shifted enough times > 025 022Q 022 ; If no, jump to shift again > 026 000Q 000 Here is our first left shift. First, save the seed unshifted as we'll need it to XOR with later. This uses the DAD instruction which does a 16 bit addition of one of the register pairs. Adding a number to itself is equivalent to a left shift. This allow me to operate on the full 16 bits in one instruction. You'll see that shifting right is a little bit more complicated. Then all we do is keep tack of how many times we shift. When the accumulator reaches zero, the zero bit will be set and JNZ won't jump and execution will continue on to XORing. # XORing # > 027 MOV A,L 175 ; Copy low bits > 030 XRA C 251 ; XOR with seed low bits > 031 MOV L,A 157 ; Store low bits > 032 MOV A,H 174 ; Copy high bits > 033 XRA B 250 ; XOR with seed high bits > 034 MOV H,A 147 ; Store high bits > 035 MOV B,H 104 ; Store copy of > 036 MOV C,L 115 ; modified seed Using the accumulator to hold a byte a time, we can XOR with the original, unshifted bytes. This shifted and XORed result is then used going forward, we don't have to care about the original seed any more. We XOR three times so this might benefit from code reuse by making into a subroutine. While a subroutine might mean less program memory used, it will mean stack space is needed and the context switching might add execution time. We can avoid the stack by jumping but reducing hard coded addresses has it benefits, too. # Shifting right # > 037 MVI D 026 ; Copy b into D > 040 011Q 011 ; b = 9 > 041 STC 067 ; Clear carry bit by setting to 1 > 042 CMC 077 ; and complimenting > 043 MOV A,H 174 ; Copy high bits > 044 RAR 037 ; Rotate right through carry > 045 MOV H,A 147 ; Store shifted high bits > 046 MOV A,L 175 ; Copy low bits > 047 RAR 037 ; Rotate right through carry > 050 MOV L,A 157 ; Store rotated low bits > 051 DCR D 025 ; Decrement counter > 052 JNZ 302 ; If not zero > 053 041Q 041 ; shift again > 054 000Q 000 Shifting right doesn't have a nice 16 bit shortcut. We have to shift one byte at a time and take advantage of the carry bit. If we clear the carry bit to start with we will always be pushing a zero into the high end. If the catch what falls off the low end of the byte in the carry bit, then when we shift the low byte, it'll pull it into the high end. RAR rotates the accumulator right through the carry bit. Rotate, not shift, meaning whatever falls off one end, gets put back on the other end. Using the carry bit prevents what falls off from getting put back on. but instead gives us a place to hold onto it. Right shift[0] Right shifting only happens once, and I don't know of a cleaner way to do it. The only optimization, I've mentioned previously, is to use XRA A at address 041 instead of STC and CMC to clear the carry bit in half the time. After this, it's another XOR, a left shift, and a final XOR before writing out the random number over the original seed in memory. This is the random number that can be consumed by another program and it also serves as the next seed to generate the next number in the sequence. Another benefit of putting the reused code sections into subroutines, is they might be usable by other programs in the future. So it matters what the overall plan is. If all we are doing is spitting out a random number without context, avoiding CALLs and RETs and stack use is very efficient. If we built a BASIC type interpreter environment, these might become primitives for many operations and we'd benefit from generalizing and creating subroutines in saved space. [0] gopher://kagu-tsuchi.com:70/I/blog/images/PRNG_rshift.png