Title: Altair Assembler Part 3
Date: November 17 2020
Tags: altair programming
========================================

I had intended to write a series of posts documenting the process of writing my assembler but ended up spending my available time writing the assembler. The assembler being the largest program I've undertaken on the Altair, I expected to take two or three years to complete it. But because of the pandemic, I've been home a lot more than usual for summertime and that put me in the mood to write some assembly. After about a year, and almost 3 kilobytes of code, I've gotten the assembler written and working. I still need to do more exhaustive testing but the problems are solved, subroutines are written, and it will assemble a small program start to finish. I'll use this post to summarize the project.

To remind everyone, the point of this project was to write the minimum features necessary to eliminate human error and allow me to write a real, full-featured assembler. The primary needs were translating assembler mnemonics to opcodes, keeping count of the addresses, and allowing the use of labels.

As I went, I ended up filling in many more features. I figured that I could write the algorithms now or in the full assembler later and it's "just a little more code" so why not now? And if something took too long to get right, I could just comment it out and move on.

I also wanted this project to be a programming challenge so I tried to solve all of the problems on my own. That is, parsing a line of assembly code, storing structured data in memory, searching, error checking, etc. Other than a 20 year old Computer Science education, the only "cheating" was taking a quick glance at Hjalfi's CP/M assembler[0] written in C and seeing how he used function callbacks for each opcode. I had thought I would need to do some complex decoding and branching based on the opcode's bits.
I probably could have done some of that to reduce the amount of code, but by the time I got this far I was just trying to get stuff written. There was a lot more copying and pasting with minor tweaking than I usually like to do. That's why the assembler is so big despite lacking the larger features found in other assemblers that aren't much larger in size.

The other trick I picked up was to grow the symbol table down like a stack. I got that idea while watching a video about C64 assembly on 8-bit Show and Tell's YouTube channel[1] but I can't recall exactly which one. That idea made sense as a way to maximize available memory for the user's program without trying to guess a limit to the size of the symbol table. I also plan to move the assembler into a PROM chip in high memory, where there won't be room for the table and it won't be writable.

# Assembler Features #

Refer back to Part 1[2] of this series to see what I planned to do and not do in this first pass. The end result is a bit different.

Obviously, the assembler handles translating mnemonics to opcodes, covering the entire 8080 instruction set. I had to handle opcodes that take a register as a parameter that becomes part of the opcode, opcodes that take data, opcodes that take addresses, etc. That's where the callbacks came in.

It also handles address counting. It supports the ORG pseudo-opcode to set the address counter to a specific address to start assembly or to continue assembling from. And it tracks the count as opcodes of 1, 2 or 3 bytes are processed.

You can create labels which will save the current address to the symbol table to be referenced by other instructions. And, bonus, you can create a label on a line by itself, allowing for multiple labels for a single address.

You can reference undefined labels, as long as they get defined later. Undefined label references are stored and, at the end of assembly, the references are resolved by searching for the labels in the symbol table.
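
As a rough illustration of that last point, here is a minimal Python sketch of resolving forward references at the end of assembly. It is not the assembler's actual code: the names `resolve_forward_refs`, `pending`, and `symbols` are made up for this example, a Python dict stands in for the in-memory symbol table, and I'm assuming each stored reference is just the address of a 2-byte placeholder plus the label name.

```python
# Hypothetical sketch: names and data structures are illustrative only,
# not taken from the assembler described in this post.
JMP = 0xC3  # 8080 JMP opcode, followed by a 2-byte address (low byte first)


def resolve_forward_refs(memory, symbols, pending):
    """Patch every stored (address, label) reference into memory.

    Returns the list of labels that were never defined.
    """
    unresolved = []
    for patch_addr, label in pending:
        if label not in symbols:
            unresolved.append(label)
            continue
        target = symbols[label]
        memory[patch_addr] = target & 0xFF             # low byte
        memory[patch_addr + 1] = (target >> 8) & 0xFF  # high byte
    return unresolved


memory = bytearray(8)
symbols = {}
pending = []

# "JMP DONE" at address 0, before DONE is defined:
# emit the opcode, leave a 2-byte hole, and remember where the hole is.
memory[0] = JMP
pending.append((1, "DONE"))

# Later in the source, the label DONE turns up at address 0105H.
symbols["DONE"] = 0x0105

# End of assembly: fill in the holes.
unresolved = resolve_forward_refs(memory, symbols, pending)
```

After the final call, the three bytes at address 0 hold the JMP opcode followed by the low and high bytes of DONE's address, and `unresolved` is empty. A label that never gets defined would come back in that list, which is where an "undefined symbol" error would be raised.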

DW, DB, and DS pseudo-opcodes are implemented for data storage and can be labeled and referenced by that label. I even managed to support strings with DB, including some escaped, non-printable characters like tab and newline.

Both the EQU and SET pseudo-opcodes are implemented. EQU definitions cannot be changed and will be an error if you try. SET definitions can be changed and the assembler uses whatever the last value was when you reference it. Unlike labels, both need to be defined before they are referenced.

Numeric values can be entered as decimal, binary, hex, single word octal or 2 byte octal. That last one is quirky, but important. I've done most of my manual addressing as 2 octal bytes because when you reference an address in a CALL or JMP, the address is broken into 2 bytes and octal has a bad habit of changing when represented as a single 16-bit word versus 2 8-bit bytes. For example, 123456Q as a 16-bit word becomes 247Q 056Q as 2 bytes. It makes it easier if I can be consistent with how I have been counting up until now. Although, it's about time I switch to using hexadecimal anyway.

The code can be commented. Anything after a ';' until the end of the line is ignored.

# Missing Features #

I still didn't get all the bells and whistles in on this round. I had to draw a line somewhere and I had blown way past it already.

Formatting is very strict. Optional label, one tab, opcode, one tab, comma separated args. The args can be followed by any garbage you want (see below). You can't use multiple tabs or spaces and all entered characters are automatically uppercased before being stored.

Robust parsing. Once the parser sees all the fields it needs, it stops looking. You could, by mistake, provide 2 args to an opcode that requires one and the second one will be silently ignored. The program might work the way you want but the written code won't make sense.

Robust error reporting.
It will catch most formatting errors, references to undefined symbols, attempts to redefine an EQU, etc., but no detail is provided regarding what line, or exactly where, the error is. Right now, the code is echoed as it is being streamed in so you'll see where assembly stops. Messages are terse to save space.

IF/ENDIF pseudo-opcodes aren't implemented. I haven't felt the need for them. They may be a convenient way to comment out code, toggle debug code, or support multiple hardware configurations but I can use ';' to comment out code or debug instructions, and I only need to support my own hardware.

MACROs were a "no way". I wasn't sure how to handle MACROs yet, especially MACROs with parameters, which would be cool. I'm saving this one for later.

It does not support mathematical expressions. Besides chars, number conversions, and strings, you can't get fancy with argument values. Basically, I'd have to write a full 16-bit multi-function calculator. This might be the next thing I do so I can include it in the assembler, but it is too much extra work for my needs right now.

Separate from error reporting, it also doesn't do much error checking. For example, you can grow the symbol table down over your own program or the assembler itself, if it's in memory below the symbol table. You can ORG into the symbol table or assembler and overwrite it that way. You can ORG backwards and overwrite the program you're assembling. All the memory is yours to abuse; the assembler won't be looking out for you. Burning the assembler to a PROM will at least protect that from being wiped out. I also don't check for buffer overruns so you can probably cause wackiness by entering really long labels or something.

This doesn't act like a real assembler. You're expected to type each line of assembly, or in the modern day cheat and stream it from a terminal emulator. There is no reading code from storage or memory, no saving the symbol table, and no saving the binary output, except directly into memory.
File IO is on the short list of things to get to. Also an editor to make a full development environment.

# Known Bugs #

Besides the above shortcomings, there are also a few bugs I already know I'll need to fix. I haven't yet, mostly because adding code would require me to, by hand, re-address over 2 thousand lines of code and update all the references. I should probably use this new assembler to do that for me. I had been adding new subroutines near the end of the code.

When entering a line of code, typing a tab, which is the required field separator, advances the cursor 4 spaces in your terminal, but backspacing or deleting it only moves the cursor back one space. I reused an old subroutine and somehow I never caught that until recently. It doesn't break data entry, but it can be confusing if you try to delete more of the visual spaces: in memory you're deleting actual characters which will still be visible in the terminal.

You can't use special escaped characters in expressions, only within strings (which are limited to the DB pseudo-opcode). You might want to CPI '\0' to check for a null terminating byte at the end of a string but you'll need to use a numeric value instead. Printable literal chars work, though. That will be an easy copy and paste from the string subroutine to implement.

All characters are uppercased automatically when entered. I did this just to simplify opcode and symbol lookup but neglected to think that you might want to enter case sensitive strings as data. Oops.

I got lazy and it doesn't check data length against an expected size. If you enter a hex word of FFFFH when a byte is needed, it will silently truncate to the last byte, FFH. If a word is needed and you enter less than that, it will prefix it with as many zeros as it needs. I would consider this a feature, but the behavior was worth mentioning.
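
The truncate-or-pad behavior described above can be sketched in a few lines of Python. This is only a model of the observable behavior, not how the assembler does it internally: `emit_value` is a made-up name, and storing words low byte first is my assumption here, based on how the 8080 lays out 16-bit addresses.

```python
# Hypothetical sketch of the silent truncation / zero padding described above.
def emit_value(value, size_bytes):
    """Return the bytes stored for `value` in a field of `size_bytes` bytes.

    High-order bits that don't fit are dropped without complaint, and
    short values are padded with leading zeros.
    Assumption: multi-byte values are stored low byte first.
    """
    mask = (1 << (8 * size_bytes)) - 1
    return (value & mask).to_bytes(size_bytes, "little")


too_big = emit_value(0xFFFF, 1)   # a word where a byte is needed
too_small = emit_value(0x5, 2)    # a short value where a word is needed
```

Here `too_big` ends up as the single byte FFH and `too_small` as 05H followed by 00H. A stricter version would compare `value` against `mask` first and report an error instead of masking.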

Similarly, it doesn't check numeric input length so if you make a mistake and use too many digits, you silently lose the most significant digits that didn't fit. Too many bits in that 16-bit binary word and you might not notice you lost the most significant bit. I also imagine it's going to be really easy to mistakenly enter a 000 377Q as a 2 byte octal word and end up with 000277Q in memory because I parse it as 2 separate bytes and truncate each one separately. I know how to fix this, but didn't go back and implement it that way. I should really be checking syntax better overall, anyway.

# Up Next #

The next thing I need to do, besides more testing, is to reassemble into higher memory so you can write programs using interrupts and be able to start at address zero and have as much memory real estate as possible. This will require me to rewrite the assembler code in this assembler's assembly syntax.

The big features I want to add are simple calculations so you can reference memory offsets and such, and MACROs to better organize and reuse code. I think. I'm not sure I'm sold on the necessity of MACROs, yet.

I'd like to revisit a number of design decisions, try other algorithms, clean up the code, optimize for code reuse and make it more user friendly.

Alternatively, I could just claim victory here, say I can write an assembler from scratch by hand, know that I could add more to it if I wanted to, and instead just use one off the shelf from the era that is more compact and featureful. It's going to depend on my mood, I guess. And on whether I can find an off the shelf assembler I like.

[0] http://cowlark.com/2019-06-01-cpm-asm/
[1] https://www.youtube.com/c/8BitShowAndTell/
[2] gopher://kagu-tsuchi.com:70/0/blog/articles/altair_assembler_part_1.html