68030 Cycles Optimisation

Last Updated: 01 October 1998
Typed and bad HTML version by Scorpion/Silicon
Based on the Motorola 68030 User's Manual :)
Sorry for my lousy English ;)
How the cycle tables work
How to know execution times
How to optimise cycles

How the cycle tables work
Many of you have probably seen the AmigaGuide file with the 68030 cycle counts without understanding all those numbers! Remember, it looks something like this:

    Instruction    Head  Tail  I-Cache    No-Cache
    Move Rn,Dn      2     0    2(0/0/0)   2(0/1/0)

When I saw this for the first time I supposed that a Move takes 2 cycles, because there was a 2 in the I-Cache column ;) And it is practically always right that we only need to look at the I-Cache column. OK, now for more details about these strange numbers.

On the 030, every instruction and every effective address has what we call a Head and a Tail. Why? Because 030 instructions can overlap each other. Look at this figure:

    >>> Execution time >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    +--- Instruction A ---+
                      +--- Instruction B ---+
                                        +--- Instruction C ---+

Now imagine each instruction looks like this:

    +- Head - ... - Tail -+

The Tail of instruction A can overlap the Head of instruction B, on condition that both the Tail column of A and the Head column of B are different from 0.

Now that you know what the Head and Tail columns are, let's move on to the I-Cache and No-Cache columns. (Don't panic: how to optimise with Head and Tail is explained a little later ;)

The I-Cache column is really the Instruction-Cache-Case Execution Time (phew ;) This is the total number of clock periods required to execute the instruction when the instruction is in the instruction cache. It does not take overlap into account.

The No-Cache column is the opposite: the worst-case execution time, when the instruction misses the cache.

Now take a look at the a(b/c/d) numbers:

    a : Total number of clocks
    b : Number of read cycles
    c : Maximum number of instruction access cycles
    d : Number of write cycles

OK, now you know how to read the 030 cycle tables :)
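To make the a(b/c/d) notation concrete, here is a minimal sketch that splits one table entry into its four numbers. Python is used only as a scratch calculator here; the names Timing and parse_timing are made up for this illustration and do not come from the Motorola manual or the AmigaGuide file.

```python
import re
from typing import NamedTuple


class Timing(NamedTuple):
    """One a(b/c/d) entry from the cycle table (hypothetical helper type)."""
    clocks: int      # a: total number of clocks
    reads: int       # b: number of read cycles
    ifetches: int    # c: maximum number of instruction access cycles
    writes: int      # d: number of write cycles


def parse_timing(entry: str) -> Timing:
    """Parse a table entry written as a(b/c/d), for example '2(0/1/0)'."""
    m = re.fullmatch(r"\s*(\d+)\((\d+)/(\d+)/(\d+)\)\s*", entry)
    if m is None:
        raise ValueError(f"not an a(b/c/d) entry: {entry!r}")
    return Timing(*(int(g) for g in m.groups()))


# The two entries for Move Rn,Dn quoted in the text.
icache = parse_timing("2(0/0/0)")
nocache = parse_timing("2(0/1/0)")
print(icache.clocks, icache.ifetches)    # 2 0
print(nocache.clocks, nocache.ifetches)  # 2 1
```

For Move Rn,Dn the only difference between the two columns is the single instruction access cycle in the No-Cache case.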
How to know execution times
Simple: there are 2 formulas ;) The overall time of an instruction depends on its overlap with the previous and the following instructions.

The first formula covers the case where there is no effective address:

    CC1 + [CC2 - min(H2,T1)] + [CC3 - min(H3,T2)] + ...

With:

    CCn      : the I-Cache column of instruction n
    Tn       : the Tail column of instruction n
    Hn       : the Head column of instruction n
    min(a,b) : the minimum of a and b

An example:

    Code                  H   T   CC
    move.l  d0,d1         2   0   2
    add.l   d2,d1         2   0   2
    sub.l   d3,d2         2   0   2
    move.l  d1,-(a0)      0   2   4

The execution time will be:

      CC1 + [CC2 - min(H2,T1)] + [CC3 - min(H3,T2)] + [CC4 - min(H4,T3)]
    = 2   + [2   - min(2,0)  ] + [2   - min(2,0)  ] + [4   - min(0,0)  ]
    = 2   +  2   - 0           +  2   - 0           +  4   - 0
    = 10 cycles

Oh! What a shame, that's exactly the sum of the CC column! No instructions overlap here. But imagine we have this instead:

    Code                  H   T   CC
    move.l  d0,d1         2   0   2
    add.l   d2,d1         2   0   2
    move.l  d1,-(a0)      0   2   4
    sub.l   d3,d2         2   0   2

This time we get:

      CC1 + [CC2 - min(H2,T1)] + [CC3 - min(H3,T2)] + [CC4 - min(H4,T3)]
    = 2   + [2   - min(2,0)  ] + [4   - min(0,0)  ] + [2   - min(2,2)  ]
    = 2   +  2   - 0           +  4   - 0           +  2   - 2
    = 8 cycles

Yeah! We have won 2 cycles! Note that the sub after the move adds 0 cycles: its 2 cycles are completely hidden inside the Tail of the move. So, as you can see, we can win cycles on the 030 just by moving some instructions around.
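The first formula is easy to check mechanically. The sketch below is a small calculator for it, not a cycle-exact model of the 030: each instruction is a (Head, Tail, CC) tuple copied from the example tables in the text, and the function name total_cycles and the constant names are invented for this illustration.

```python
def total_cycles(instrs):
    """Apply CC1 + [CC2 - min(H2,T1)] + [CC3 - min(H3,T2)] + ...

    instrs is a list of (head, tail, cc) tuples in program order.
    """
    total = 0
    prev_tail = 0  # nothing before the first instruction to overlap with
    for head, tail, cc in instrs:
        total += cc - min(head, prev_tail)
        prev_tail = tail
    return total


# (Head, Tail, CC) values as given in the example tables.
MOVE_D0_D1 = (2, 0, 2)      # move.l d0,d1
ADD_D2_D1 = (2, 0, 2)       # add.l  d2,d1
SUB_D3_D2 = (2, 0, 2)       # sub.l  d3,d2
MOVE_D1_PREDEC = (0, 2, 4)  # move.l d1,-(a0)

first_order = [MOVE_D0_D1, ADD_D2_D1, SUB_D3_D2, MOVE_D1_PREDEC]
second_order = [MOVE_D0_D1, ADD_D2_D1, MOVE_D1_PREDEC, SUB_D3_D2]

print(total_cycles(first_order))   # 10
print(total_cycles(second_order))  # 8
```

Starting prev_tail at 0 means the sequence is counted in isolation, which is the same assumption the worked examples make.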
The second formula takes effective addresses into account. Here it is:

    CCea1 + [CCop1 - min(Hop1,Tea1)] + [CCea2 - min(Hea2,Top1)] + [CCop2 ...] + ...

With:

    CCean    : the instruction-cache-case time for the effective address of instruction n
    CCopn    : the instruction-cache-case time for the operation portion of instruction n
    Tean     : the tail time for the effective address of instruction n
    Hean     : the head time for the effective address of instruction n
    Hopn     : the head time for the operation portion of instruction n
    Topn     : the tail time for the operation portion of instruction n
    min(a,b) : the minimum of a and b

It's a little more complex than the first one, don't you think ;) In fact, when an instruction has an effective address we must treat the effective address as an instruction of its own. Here is an example:

    move.l  -(a0),d0

In reality we have:

    fea     -(a0)
    move.l  EA,d0

So with cycles we get:

    Code                  H   T   CC
    fea     -(a0)         2   2   4
    move.l  EA,d0         0   0   2

Now compute the cycles:

      CCea1 + [CCop1 - min(Hop1,Tea1)]
    = 4     + [2     - min(0,2)      ]
    = 4     +  2
    = 6 cycles

OK, now you know how to compute cycle times.
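The second formula is the first one applied after splitting each instruction into its effective-address part and its operation part. Here is a minimal sketch of that idea, assuming (Head, Tail, CC) tuples taken from the example in the text; the helper names expand and total_cycles are hypothetical.

```python
def total_cycles(parts):
    """Sum CC over all parts, minus min(Head, previous Tail) at each junction."""
    total = 0
    prev_tail = 0  # nothing before the first part to overlap with
    for head, tail, cc in parts:
        total += cc - min(head, prev_tail)
        prev_tail = tail
    return total


def expand(program):
    """Flatten (ea, op) pairs into one list of parts, effective address first.

    ea is None for an instruction that has no effective-address part.
    """
    parts = []
    for ea, op in program:
        if ea is not None:
            parts.append(ea)
        parts.append(op)
    return parts


# (Head, Tail, CC) values from the move.l -(a0),d0 example.
FEA_PREDEC = (2, 2, 4)   # fea    -(a0)
MOVE_EA_D0 = (0, 0, 2)   # move.l EA,d0

program = [(FEA_PREDEC, MOVE_EA_D0)]     # move.l -(a0),d0
print(total_cycles(expand(program)))     # 6
```

The Tail of the fetch (2) cannot be used here because the Head of the operation part is 0, so the result is simply 4 + 2.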
How to optimise cycles
Note that this part applies only to 030 processors. As you have seen, 030 instructions can overlap other instructions. The simple way to see whether we can win cycles is to look at the Head and Tail columns diagonally! (My English is really, really, really ... bad 8) ) OK, an example:

                          H   T
    Instruction (n)       a   b
                            /
    Instruction (n+1)     c   d

H and T are the Head and Tail columns; a, b, c and d are cycle counts. If b or c is 0 there is no overlap; if b and c are both different from 0, you win cycles :)

So here is a quick method to find out how many cycles a sequence takes: add up all the CC values, then for each diagonal (b,c) take the minimum and subtract it from the sum.

Winning cycles on an effective address works exactly the same way:

                          H   T
    Effective address     a   b
                            /
    Instruction           c   d      (EA always first!)

An example: many of you have probably read that clr.l -(a0) is faster than clr.l (a0)+. Let's look at it.

clr.l (a0)+ is in reality:

    cea     (a0)+
    clr.l   cea

So in cycles we have:

    Code                  H   T   CC
    cea     (a0)+         0   0   2
                            /
    clr.l   cea           0   1   3

As you can see, the diagonal holds two 0s, so the time is simply the sum of CC:

    = 5 cycles

Now the second possibility. clr.l -(a0) is in reality:

    cea     -(a0)
    clr.l   cea

With cycles:

    Code                  H           T   CC
    cea     -(a0)         2+op Head   0   2
                                    /
    clr.l   cea           0           1   3

(op Head = 0 here: it is the Head of clr.l)

    = 5 cycles

What a shame, that's exactly the same speed! Yet if you test it, it really is faster. Why? Imagine we have:

    clr.l   (a0)+
    clr.l   (a0)+
    clr.l   (a0)+

A quick calculation gives:

    = 15 cycles

But look at:

    clr.l   -(a0)
    clr.l   -(a0)
    clr.l   -(a0)

In cycles:

    Code                  H   T   CC
    cea     -(a0)         2   0   2
                            /
    clr.l   cea           0   1   3
                            /
    cea     -(a0)         2   0   2
                            /
    clr.l   cea           0   1   3
                            /
    cea     -(a0)         2   0   2
                            /
    clr.l   cea           0   1   3

The sum of CC is 15 cycles. Now look at the diagonals: every time a clr.l is followed by the next cea we have a Tail of 1 against a Head of 2, so the minimum is 1. Inside these three instructions that happens twice, which already gives 15-2 = 13 cycles. If the run sits in the middle of a longer sequence of clr.l -(a0), the first cea also overlaps the Tail of the clr.l before it, so the overlap counts 3 times:

    = 15-3
    = 12 cycles

Yeah! 3 instructions, 3 cycles won :) With (a0)+ the Head of the cea is 0, so nothing can ever overlap.

So, happy cycle optimising ;) or buy a PPC ...
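The diagonal method can be checked with a few lines of code. The sketch below assumes the (Head, Tail, CC) values from the clr.l tables in the text; the names saved_cycles and run_cost are invented for this illustration. One detail shows up when the method is applied strictly: three clr.l -(a0) taken in isolation contain only two clr/cea junctions, so the formula gives 13 cycles, and the figure of 12 appears when the three instructions follow another clr.l -(a0) whose Tail overlaps the first cea.

```python
def saved_cycles(rows):
    """Sum of min(Tail of row n, Head of row n+1) over every diagonal."""
    return sum(min(rows[i][1], rows[i + 1][0]) for i in range(len(rows) - 1))


def run_cost(rows):
    """Sum of the CC column minus the cycles won on the diagonals."""
    return sum(cc for _head, _tail, cc in rows) - saved_cycles(rows)


# (Head, Tail, CC) values from the clr.l tables.
CEA_POSTINC = (0, 0, 2)  # cea   (a0)+
CEA_PREDEC = (2, 0, 2)   # cea   -(a0), with op Head = 0
CLR_EA = (0, 1, 3)       # clr.l cea

postinc_x3 = [CEA_POSTINC, CLR_EA] * 3
predec_x1 = [CEA_PREDEC, CLR_EA]
predec_x3 = [CEA_PREDEC, CLR_EA] * 3
predec_x4 = [CEA_PREDEC, CLR_EA] * 4

print(run_cost(postinc_x3))                      # 15: no overlap at all
print(run_cost(predec_x3))                       # 13: two junctions inside the run
print(run_cost(predec_x4) - run_cost(predec_x1)) # 12: three clr.l after a first one
```

In a long unrolled run, each extra clr.l -(a0) therefore costs 4 cycles by this formula, against 5 for each clr.l (a0)+.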