Efficient and Concise Multiplication in CHIP-8 I've entertained how to nicely express a decent many algorithms in CHIP-8 but hadn't thought my such thoughts warranted an article. I recently read of someone doing something similar, and this changed my mind. This other wrote a program to unintelligently and inefficiently find instruction listings. My approach targets multiplication and uses simple logic to get, I think, a nicer result; in mulling possible instructions, it's clear which are applicable; if the machine can or can't perform the task in one instruction it's clear if, say, two are found to suffice they form an answer and upper bound. Binary shifts give powers of two a clear advantage, and an important aspect of intelligently hacking machine code is to consider global optimizations; this simple routine is optimal for all such shifts and it could be optimized further through more indirection to handle register management. The first shift is unnecessary; it ultimately clears the register. These are placed at no particular address: 400-401 1024-1025 ß ÜÜÜ 800E 32782 x256 V0  V0 x 2; VF  MSB 402-403 1026-1027 ß ÜÜÜ 800E 32782 x128 V0  V0 x 2; VF  MSB 404-405 1028-1029 ß ÜÜÜ 800E 32782 x64 V0  V0 x 2; VF  MSB 406-407 1030-1031 ß ÜÜÜ 800E 32782 x32 V0  V0 x 2; VF  MSB 408-409 1032-1033 ß ÜÜÜ 800E 32782 x16 V0  V0 x 2; VF  MSB 40A-40B 1034-1035 ß ÜÜÜ 800E 32782 x8 V0  V0 x 2; VF  MSB 40C-40D 1036-1037 ß ÜÜÜ 800E 32782 x4 V0  V0 x 2; VF  MSB 40E-40F 1038-1039 ß ÜÜÜ 800E 32782 x2 V0  V0 x 2; VF  MSB 410-411 1040-1041 ÜÜÜ ÜÜÜ 00EE 00238 Return Addition of a register with itself or a lone shift instruction are both suitable for a doubling, but the latter has the significant advantage of selecting a destination register rather than clobbering. A simple multiplication by three uses a second register, forming this trivial pair: 200-201 0512-0513 ß ÜÜÜß 810E 33038 V1  V0 x 2; VF  MSB 202-203 0514-0515 ß Ü ß 8104 33028 V1  V1 + V0; VF  overflow A multiplication by five is opportunity to use part of the routine and is a mere instruction larger: 200-201 0512-0513 ß ß 8100 33024 V1  V0 202-203 0514-0515 ß ÜÛ 240C 09228 Call x4 204-205 0516-0517 ß Ü ß 8104 33028 V1  V1 + V0; VF  overflow Multiplying by seven is close enough to a power of two for this alternative approach to be pleasant: 200-201 0512-0513 ß ß 8100 33024 V1  V0 202-203 0514-0515 ß ÜßÜ 240A 09226 Call x8 204-205 0516-0517 ß Ü Ü Ü 8015 32789 V0  V0 - V1; VF  borrow Multiplying by larger numbers well uses fragmentation; this takes twenty-four as eight plus sixteen: 200-201 0512-0513 ß Ü 8010 32784 V0  V1 202-203 0514-0515 ß ÜßÜ 240A 09226 Call x8 204-205 0516-0517 ß ß 8200 33280 V2  V0 206-207 0518-0519 ß Ü 8010 32784 V0  V1 208-209 0520-0521 ß Üß 2408 09224 Call x16 20A-20B 0522-0523 ß Ü Ü 8024 32804 V0  V0 + V2; VF  overflow That's no comparison to the much nicer method of splitting twenty-four as eight times three, though: 200-201 0512-0513 ß ÜÜÜÜ 801E 32798 V0  V1 x 2; VF  MSB 202-203 0514-0515 ß Ü Ü 8014 32788 V0  V0 + V1; VF  overflow 204-205 0516-0517 ß ÜßÜ 240A 09226 Call x8 The number forty-three, resulting by adding or subtracting powers of two, should suit as a bad case: 200-201 0512-0513 ß Ü 8010 32784 V0  V1 202-203 0514-0515 ß ÛÜ 2406 09222 Call x32 204-205 0516-0517 ß ß 8200 33280 V2  V0 206-207 0518-0519 ß Ü 8010 32784 V0  V1 208-209 0520-0521 ß ÜßÜ 240A 09226 Call x8 20A-20B 0522-0523 ß Ü Ü 8024 32804 V0  V0 + V2; VF  overflow 20C-20D 0524-0525 ß Ü Ü 8014 32788 V0  V0 + V1; VF  overflow 20E-20F 0526-0527 ß Ü Ü 8014 32788 V0  V0 + V1; VF  overflow 210-211 0528-0529 ß Ü Ü 8014 32788 V0  V0 + V1; VF  overflow Notice three additions is cost just large enough to make other approaches infeasible; multiplication by four, followed by a subtraction, is one instruction larger, due to the extra register management; replacing the three additions with one and a shifting results in the same count, and so isn't shown: 200-201 0512-0513 ß Ü 8010 32784 V0  V1 202-203 0514-0515 ß ÛÜ 2406 09222 Call x32 204-205 0516-0517 ß ß 8200 33280 V2  V0 206-207 0518-0519 ß Ü 8010 32784 V0  V1 208-209 0520-0521 ß ÜßÜ 240A 09226 Call x8 20A-20B 0522-0523 ß Üß 8204 33284 V2  V2 + V0; VF  overflow 20C-20D 0524-0525 ß Ü 8010 32784 V0  V1 20E-20F 0526-0527 ß ÜÛ 240C 09228 Call x4 210-211 0528-0529 ß Ü Ü 8024 32804 V0  V0 + V2; VF  overflow 212-213 0530-0531 ß Ü Ü Ü 8015 32789 V0  V0 - V1; VF  borrow Loop adding forty-three would be concise, but inefficient; a table is a good solution yet changes I: 200-201 0512-0513 ß ß Ü ß A208 41480 I  forty three 202-203 0514-0515 ßßßÛÜÜÜ F01E 61470 I  I + V0 204-205 0516-0517 ßÛÛß Ü Ü F065 61541 Load V0V0; I  I + 01 206-207 0518-0519 ÜÜÜ ÜÜÜ 00EE 00238 Return 208 0520 00 000 forty three 209 0521 Û Û ÛÛ 2B 043 20A 0522 Û Û ÛÛ 56 086 20B 0523 Û Û 81 129 20C 0524 Û Û ÛÛ AC 172 20D 0525 ÛÛ Û ÛÛÛ D7 215 Know the lower registers are rather poor choices for practical routines here, and zero in particular due to its use by BXXX. I will provide no general rule for determining efficient multiplication, as this can easily be done by a human at a whim. An easy way to get good multiplication chooses a good number to multiply by. Making a routine unnecessary is much better than making such more efficient. .