[HN Gopher] A 23-byte "hello, world" program assembled with DEBU...
___________________________________________________________________
A 23-byte "hello, world" program assembled with DEBUG.EXE in MS-DOS

Author : susam
Score  : 68 points
Date   : 2022-10-30 20:07 UTC (2 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| marginalia_nu wrote:
| DEBUG.EXE is some necronomicon-tier dark magic.
| Narishma wrote:
| What do you mean? It's a simple, straightforward debug tool, or
| a monitor as it used to be called on 8-bit systems.
| userbinator wrote:
| PC magazines used to publish source code listings for little
| utilities in the form of DEBUG scripts.
| blueflow wrote:
| The debugger and the 8086 instruction set are well documented,
| much better than the "modern" software that I have to work with
| at my day job. It's not magic.
| pizza234 wrote:
| It's interesting how ASM nowadays appears as dark magic, since
| all but a small fraction of programmers are (rightfully) very
| far from that level.
|
| Curiously, I found learning Rust way more challenging than
| learning 16-bit assembly (it was way simpler back then; no
| complex instructions, less baggage, simpler processors... and
| fewer expectations :)).
| int_19h wrote:
| I learned x86 16-bit assembly originally, and I find that most
| of that knowledge is still applicable when looking at assembly
| listings while debugging C/C++ today (which is the most likely
| area where one might have to deal with it in production these
| days; few people get to write asm from scratch).
|
| For x86, at least, I wouldn't even say that it was less
| complex. Segmented memory alone is a huge complication, and
| then on top of that there was all the legacy CISC stuff like
| the BCD helpers or ENTER/LEAVE; x64 is comparatively
| streamlined.
| mtrower wrote:
| An 8-year-old, using the ring-bound manual that came with their
| computer, can figure out the basics - enough to write a program
| like this, and examine it.
| I'm not being dismissive here, but speaking literally from
| experience. Personally, it's my opinion that computers were a
| lot simpler back then.
|
| Nowadays we're very, very far from the hardware - hardware that
| has grown very complicated in comparison.
| userbinator wrote:
| You can just use ret at the end, saving 3 bytes. Also, the
| initial value of bp is 09xx on every version of MS-DOS since
| 4.0, so you can also start off with an xchg ax,bp to save
| another byte.
|
|     xchg ax,bp
|     mov dx,107
|     int 21
|     ret
|     db "hello, world" 0d 0a '$'
|
| 22 bytes, of which 15 is the message.
| EvanAnderson wrote:
| I was going to suggest the RET but I didn't remember the BP
| being set to save the additional byte. Nice, albeit sacrificing
| some compatibility.
|
| For anybody interested, some DOS default register values are
| documented here: http://www.fysnet.net/yourhelp.htm
| userbinator wrote:
| It's an old demoscene trick, so perhaps a bit obscure but
| somewhat common in the sizecoding community.
| jart wrote:
| Wow, from feedback to commit in 30 minutes.
| https://github.com/susam/hello/commit/36fa08e7cafb7c5268b651...
| susam wrote:
| Updated the repository to use your RET suggestion. Thanks!
| ithinkso wrote:
| With no attribution nevertheless :) (/s)
|
| I love how, without the 'hello, world' message itself, 25% of
| your entire HELLO.ASM codebase is from a random HN comment
| colejohnson66 wrote:
| Impressive. Is there a reason to jump over the string instead of
| just having the string _after_ the program? Seems like one could
| save two bytes doing that.
| donio wrote:
| debug.exe is single pass and doesn't do labels, so by having the
| string first you know its address later.
| vore wrote:
| You can compute the address of the string yourself like a
| two-pass assembler would, though, so that shouldn't be
| limiting.
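
The hand computation vore describes can be sketched in a few lines of Python. This is only an illustration, not anything posted in the thread: the helper name `build_hello` and the constant `COM_ORIGIN` are hypothetical, and the opcode bytes are the standard 8086 encodings (the same ones that appear in the DEBUG disassembly quoted later in the thread). Pass one measures the code so the string's address is known; pass two emits the bytes with that address filled in.

```python
# Sketch: lay out a tiny DOS .COM image and compute the string's address
# by hand, the way a two-pass assembler would. build_hello and COM_ORIGIN
# are hypothetical names used only for this illustration.

COM_ORIGIN = 0x100  # DOS loads a .COM image at offset 100h


def build_hello(message: bytes = b"hello, world\r\n$") -> bytes:
    # Pass 1: measure the code so the string's address is known.
    # MOV AH,9 (2 bytes) + MOV DX,imm16 (3) + INT 21 (2) + RET (1)
    code_size = 2 + 3 + 2 + 1
    string_addr = COM_ORIGIN + code_size

    # Pass 2: emit the bytes with the address filled in.
    code = bytes([0xB4, 0x09])                                  # MOV AH, 9
    code += bytes([0xBA]) + string_addr.to_bytes(2, "little")   # MOV DX, string_addr
    code += bytes([0xCD, 0x21])                                 # INT 21
    code += bytes([0xC3])                                       # RET
    assert len(code) == code_size
    return code + message


image = build_hello()
print(len(image), image[:8].hex())
```

With the default message the string lands at 108h, so the image is 8 bytes of code followed by the 15-byte message. Writing `image` to a file named HELLO.COM would be the equivalent of DEBUG's `-N`, `-R CX`, and `-W` steps.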
| pizza234 wrote:
| This is an ASM program with a very standard structure
| (including the standard printing API:
| http://spike.scu.edu.au/~barry/interrupts.html#ah09) using a
| very standard tool (DEBUG.exe, common at the time for quick
| debugging); I'm confused why this is impressive.
| Agingcoder wrote:
| The COM executable gets loaded by DOS at address 100h, so the
| first bytes have to be executable code, if memory serves me
| well?
| vore wrote:
| I think OP is saying what if you wrote it as:
|
|     mov ah, 9
|     mov dx, offset helloworld
|     int 21
|     mov ah, 0
|     int 21
|     .helloworld:
|     db 'hello, world', d, A, '$'
| dmitrygr wrote:
| You are right and sister comment to this one is wrong. Thusly:
|
|     MOV AH, 9
|     MOV DX, str
|     INT 21
|     MOV AH, 0
|     INT 21
|     Str:
|     DB 'hello, world', d, A, '$'
| q-big wrote:
| This program can be simplified further:
|
|     MOV AH, 9
|     MOV DX, str
|     INT 21
|     RET
|     str:
|     DB 'hello, world', d, A, '$'
|
| Why can
|
|     MOV AH, 0
|     INT 21
|
| be replaced by RET? Here is the answer:
| https://stackoverflow.com/a/60805758
|
| UPDATE: Under https://news.ycombinator.com/item?id=33398592
| userbinator posted an additional possible optimization.
| ralferoo wrote:
| Not a size optimisation, but a performance optimisation...
|
|     INT 21
|     RET
|
| can be replaced with
|
|     JP 5
| susam wrote:
| I wrote this about 20 years ago during my university days. I
| happened to stumble upon it today in my archives and thought of
| sharing it on GitHub. I was still learning microprocessors back
| then. While browsing the C:\Windows directory, I fortuitously
| happened to discover the DEBUG.EXE program. Turned out it was
| available on any standard installation of MS-DOS as well as
| Windows 98. That chance encounter helped me to dive into the
| world of assembly language programming much before the
| coursework introduced me to more popular assemblers.
|
| Since I was still learning the x86 CPUs, the intention here was
| not to save bytes but instead to have something working.
| I believe I picked up the style of having the string at the top
| and jumping over it from other similar code I had come across
| in those days.
|
| You are right of course. Here is a complete example that moves
| the string to the bottom and saves two bytes:
|
|     C:\>DEBUG
|     -A
|     1165:0100 MOV AH, 9
|     1165:0102 MOV DX, 10B
|     1165:0105 INT 21
|     1165:0107 MOV AH, 0
|     1165:0109 INT 21
|     1165:010B DB 'hello, world', D, A, '$'
|     1165:011A
|     -G
|     hello, world
|
|     Program terminated normally
|     -N HELLO.COM
|     -R CX
|     CX 0000
|     :1A
|     -W
|     Writing 0001A bytes
|     -Q
|     C:\>HELLO
|     hello, world
|
|     C:\>
|
| I have now updated the GitHub repository with this updated
| source code and binary. Thank you for the nice comment!
| Narishma wrote:
| DEBUG.EXE is present in all versions of MS-DOS since the very
| first.
| owl57 wrote:
| Curious. These other programmers probably learned the habit
| from some even older code. Jumping over data isn't a very
| obvious way of organising code, so probably it served some
| purpose many years ago.
|
| Maybe someone here knows what was that purpose?
| [deleted]
| userbinator wrote:
| I remember seeing that in old Asm books too. My best guess
| is that it avoids having too many forward references, which
| would take up precious memory in the systems of the time
| and perhaps reach the limit of the assembler sooner.
| _the_inflator wrote:
| Maybe also better for linking the files. At least this
| was the reason I did it on Amiga when using absolute
| addresses. I only had to remember the start of the
| address area even when recompiling.
| jstanley wrote:
| I wrote a compiler for a small machine that did this so
| that it could output the string content straight away
| without having to buffer it in memory.
| jmole wrote:
| You can hard-code values if you know their address in
| memory. If the data section comes first, then it doesn't
| move around if your code size changes.
| secondcoming wrote:
| This brings back memories!
|
| Many years ago I used debug.exe to create a bootable floppy
| that did nothing but display my name on the screen when booted
| from. I peed a little when it finally worked.
|
| Why did MS stop shipping it???
| ivoras wrote:
| Crazy that once upon a time just sequences of machine code were
| written to files, without any headers, or checksums, or basically
| any modern metadata. Just plain machine code, to be directly fed
| into the CPU.
|
| Cue lamentations of today's complexity mixed with feelings of
| life being great because of it.
| pizza234 wrote:
| Yes, although this was for COM files only, which were limited
| to (64k - 0x100) bytes. EXEs didn't have this limitation, but
| they indeed had a header.
| Lt_Riza_Hawkeye wrote:
| And before that, they were toggled directly into the machine
| using physical switches!
| 13of40 wrote:
| MS-DOS v1 didn't have an assembler in debug.exe (or was it
| .com?) so the only way to author machine code on a 5150 with no
| extra dev tools was to code it on paper, translate it by hand,
| and enter it in hex.
| susam wrote:
| You are right. It was DEBUG.COM. Indeed it did not have an
| assembler. It had a disassembler though. An archived copy
| from MS-DOS v1.25 can be found here:
| https://github.com/microsoft/MS-DOS/blob/master/v1.25/bin/DE...
|
|     C:\>DEBUG.COM
|     -A
|      ^ Error
|     -N HELLO.COM
|     -L
|     -U 100 107
|     0340:0100 B409      MOV AH,09
|     0340:0102 BA0801    MOV DX,0108
|     0340:0105 CD21      INT 21
|     0340:0107 C3        RET
|     -
|
| There is another copy of the debugger for MS-DOS v2.0 here:
| https://github.com/microsoft/MS-DOS/blob/master/v2.0/bin/DEB... .
| This one does have an assembler.
|
|     C:\>DEBUG.COM
|     -A
|     0482:0100 MOV AH, 2
|     0482:0102 MOV DL, 41
|     0482:0104 INT 21
|     0482:0106 RET
|     0482:0107
|     -N A.COM
|     -R CX
|     CX 0000
|     :7
|     -W
|     Writing 0007 bytes
|     -Q
|     C:\>A
|     A
|     C:\>
| pjc50 wrote:
| If you're programming a microcontroller, that time is now.
|
| (OK, so it'll often be the output of a C compiler, but there's
| usually some work to be done in asm to get the system in a
| state with a stack, clocks, RAM etc. to run a C program!)
___________________________________________________________________
(page generated 2022-10-30 23:00 UTC)