+FarmerPotato Posted October 22, 2022

Optimization Exercise: Move one byte into the low byte of a register and make the upper byte zero.

This is a common idiom for preparing to work on an 8-bit value.

Choose: optimize for speed or for program size. A compiler, for instance, would optimize accordingly.

Consider: the 9995 can read/write exactly one byte on MOVB; the 9900 must read before write.

Examples:

#1
         MOVB *R0,R1
         SRL  R1,8

#2
         CLR  R1
         MOVB *R0,R1
         SWPB R1

I cringe when I see the first one, because I assume shifts are slow.

Then I see this a lot in TI source code:

    R1LB EQU  MYWS+3
         LWPI MYWS
         CLR  R1
         MOVB *R0,@R1LB

I suggest:

         MOVB *R0,@R1LB
         SB   R1,R1          look, we do have a CLRB instruction! also SZCB

Similar:

         STWP R2
         CLR  R1
         MOVB *R0,@3(R2)

Returning a byte inside a BLWP call:

         CLR  @2(R13)
         MOVB *R0,@3(R13)
         RTWP

No advantage here:

         MOVB @ZERO,@2(R13)
         MOVB *R0,@3(R13)
         RTWP
    ZERO DATA 0              handy thing to put into a register, too.

Finally, absolutely nuts:

         MOV  *R0,R1         possibly misaligned. If odd, LSB is ok, else MSB.
         SRL  R0,1           test for odd: the low bit shifts into carry (R0 is changed)
         JOC  $+4            odd address, skip the swap
         SWPB R1
         SB   R1,R1          clear MSB

Discuss...

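As a sanity check on the idioms above, here is a minimal sketch in Python that models only the register arithmetic, not timing or a real CPU. The function names (`idiom_srl`, `idiom_swpb`, `idiom_sb`, `idiom_word`) are made up for illustration, and the model assumes the 9900's big-endian layout, where the byte at the even address is the high byte of the word.

```python
# Toy model of one 16-bit TMS9900 register. Not an emulator: it only
# checks that each idiom leaves the byte zero-extended in the register.
MASK16 = 0xFFFF


def idiom_srl(byte, r1):
    """#1: MOVB *R0,R1 ; SRL R1,8"""
    r1 = (byte << 8) | (r1 & 0x00FF)    # MOVB to a register writes the high byte
    return r1 >> 8                      # SRL shifts zeros in from the left


def idiom_swpb(byte, r1):
    """#2: CLR R1 ; MOVB *R0,R1 ; SWPB R1"""
    r1 = 0                              # CLR R1
    r1 = (byte << 8) | (r1 & 0x00FF)    # MOVB into the high byte
    return ((r1 << 8) | (r1 >> 8)) & MASK16   # SWPB exchanges the two bytes


def idiom_sb(byte, r1):
    """MOVB *R0,@R1LB ; SB R1,R1"""
    r1 = (r1 & 0xFF00) | byte           # MOVB straight into the low byte
    hi = ((r1 >> 8) - (r1 >> 8)) & 0xFF # SB R1,R1: high byte minus itself
    return (hi << 8) | (r1 & 0x00FF)


def idiom_word(word, addr):
    """MOV *R0,R1, then swap only when the byte address is even."""
    r1 = word                           # the word fetch ignores the low address bit
    if addr % 2 == 0:                   # even address: wanted byte is the high byte
        r1 = ((r1 << 8) | (r1 >> 8)) & MASK16
    return r1 & 0x00FF                  # SB R1,R1 clears the high byte


# Every byte value, with different stale contents left in R1.
for byte in range(256):
    for stale in (0x0000, 0xFFFF, 0x1234):
        results = {idiom_srl(byte, stale), idiom_swpb(byte, stale), idiom_sb(byte, stale)}
        assert results == {byte}
```

The idioms differ only in cost and in whether they need the register's memory address; the resulting register value is the same in every case.
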
+InsaneMultitasker Posted October 22, 2022

For many years I used CLR register/MOVB/SWPB and the MOV/SRL methods. I changed my approach some years ago for reasons related to the 9995, after noticing that according to its datasheet, SWPB is a slow instruction. When possible, I only clear the destination register once, then move the byte value as shown below. The SB method looks like a neat, quick way to do the same thing.

4 hours ago, FarmerPotato said:

    R1LB EQU  MYWS+3
         LWPI MYWS
         CLR  R1
         MOVB *R0,@R1LB

+jedimatt42 Posted October 22, 2022

I'm sure it is all very context dependent, so for a number of applications, looking back at some of my source, I find I have at times just ignored the existence of the low byte. In such cases, I've used word values for immediate operands as necessary to manipulate the high byte the way I want. As long as what you are doing in word instructions doesn't cause undesired side effects to your high byte... such as carrying or borrowing a bit.

I had never imagined the subtract high byte from itself trick.

I think no matter what instruction(s) you use, the only meaningful assembly code is the part in the far right column. ( the comments )

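The carry caveat can be illustrated with plain 16-bit arithmetic. This is a hedged sketch in Python, not code from the thread: `add_to_high_byte` is a hypothetical helper standing in for a word add whose immediate has a zero low byte, and the register values are invented.

```python
def add_to_high_byte(reg, n):
    """Model a 16-bit word add whose immediate is n in the high byte, 0 in the low byte."""
    return (reg + ((n & 0xFF) << 8)) & 0xFFFF


reg = 0x41C7                      # 0x41 in the high byte, stale junk in the low byte
reg = add_to_high_byte(reg, 1)
assert reg >> 8 == 0x42           # the high byte advanced by one
assert reg & 0xFF == 0xC7         # the low byte is untouched, so nothing carried upward

# Counter-example: an immediate with a nonzero low byte lets the stale
# low byte carry into the high byte.
bad = (0x41C7 + 0x0140) & 0xFFFF
assert bad >> 8 == 0x43           # one more than the intended 0x42
```

As long as the immediate's low byte is zero, whatever sits in the register's low byte cannot disturb the high byte during an add.
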
+FarmerPotato Posted October 23, 2022

Motivation: A compiler needs this sequence often. In C, a char must be promoted to int before arithmetic. In Forth, C@ and C! must also load/store the low byte.

(If you need to optimize char operations, use ASM MOVB)

+Vorticon Posted October 23, 2022

I've only ever used the CLR/MOVB/SWPB method. So which method is the fastest? My current project has loads of these operations as my data is byte oriented and I wouldn't mind a speed boost...

+FarmerPotato Posted October 23, 2022

I was hoping somebody would check this in Classic99. But here is the calculation, following 9900 FAMILY SYSTEMS DESIGN page 8-23.

Calculation:

    Sum C, M for Inst, Src, Dst. CPU M includes register access.
    Add more from tables A, B.
    Add C for CPU (includes M for register access).
    Multiply M by W wait states.

    Cycles = C + (W * M)
    (99/4A: time = 333ns * cycles)

For the 9900 in the 99/4A, I assume registers are on the 16-bit bus (no wait) and instruction fetch incurs +5 W for external RAM.

For the 9995 in the Geneve, I am assuming the registers are on-chip, code is off-chip, and instruction fetch incurs W=1 for DRAM.

    SRL R1,8     CPU              SRC          DST      Total
                 C      M  ?      C  M  ?      C  M     C   M    Cycles
    ---------    -------------    ---------    -----    ------   ------
    9900         12+2n  3  A      0  0  -               28  3    36
    9995         8+n    2  A      0  0  -               16  2    18

    SWPB R1      CPU              SRC          DST      Total
                 C      M  ?      C  M  ?      C  M     C   M    Cycles
    ---------    -------------    ---------    -----    ------   ------
    9900         10     3  A      0  0  -               10  3    15
    9995         14     2  A      0  0  -               14  2    16

I may have made mistakes, but SWPB wins on both CPUs. That is a pretty big margin in favor of SWPB on the 9900.

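The formula in the post can be written out as a short Python sketch. It recomputes only the 9995 rows, using the C, M and W values exactly as posted; the 9900 totals are taken from the table as given rather than rederived. The names `total_cycles` and `time_ns` are invented for this illustration.

```python
def total_cycles(c, m, w):
    """Cycles = C + (W * M), as stated in the post."""
    return c + w * m


def time_ns(cycles, ns_per_cycle=333):
    """99/4A timing from the post: 333 ns per cycle."""
    return cycles * ns_per_cycle


# 9995 rows, with the post's assumption W = 1 and a shift count n = 8.
srl_9995 = total_cycles(c=8 + 8, m=2, w=1)
swpb_9995 = total_cycles(c=14, m=2, w=1)
assert srl_9995 == 18
assert swpb_9995 == 16

# The posted 9900 totals, converted to time on a 99/4A.
srl_9900_ns = time_ns(36)     # 11988 ns, about 12 microseconds
swpb_9900_ns = time_ns(15)    # 4995 ns, about 5 microseconds
```

These figures cover the single instruction in each table only, not the full byte-move sequences.
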
+Lee Stewart Posted October 23, 2022

1 hour ago, FarmerPotato said:

    I may have made mistakes -- but SWPB wins on both CPUS. That is a pretty big margin in favor of SWPB on 9900.

But CLR R1 needs to be added to the calculations for SWPB R1 to truly compare it to the MSB clearing of SRL R1,8 (when clearing the MSB matters). SWPB still wins for the 9900, but not by much (6 cycles?). However, it looks like SRL R1,8 wins for the 9995 (surely more than 2 more cycles).

...lee

+Vorticon Posted October 23, 2022

Good to know. I'll stick to my old habits then.

Tursi Posted October 23, 2022

I quite like the SB idea, I never thought of that either. Will probably use it in the future.

+FarmerPotato Posted October 23, 2022

Oops. My brain: “That was tedious. I don’t wanna do the other instructions. The others are common to both anyway.”

Lee guesses the 9900 margin in favor of SWPB is 6. I see that now, if CLR is equal to SWPB.

The 99105 is probably like the 9995; I’ll assume SRL there.

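The 6-cycle margin can be reproduced from the thread's own numbers. This Python sketch uses the posted 9900 totals (36 for SRL R1,8 and 15 for SWPB R1) and the stated assumption that CLR R1 costs the same as SWPB R1. The MOVB is common to both sequences and is left out. The dictionary and variable names are illustrative only.

```python
# Posted 9900 totals; CLR is ASSUMED equal to SWPB, as in the thread.
CYCLES_9900 = {
    "SRL R1,8": 36,
    "SWPB R1": 15,
    "CLR R1": 15,   # assumption, not a measured or documented value
}

# Sequence #1: MOVB + SRL.  Sequence #2: CLR + MOVB + SWPB.
seq_srl = CYCLES_9900["SRL R1,8"]
seq_swpb = CYCLES_9900["CLR R1"] + CYCLES_9900["SWPB R1"]

margin = seq_srl - seq_swpb
assert seq_swpb == 30
assert margin == 6   # SWPB sequence is ahead by 6 cycles under these assumptions
```

If the real CLR cost differs from the assumed value, the margin shifts by the same amount.
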
Willsy Posted October 24, 2022

On 10/22/2022 at 6:41 PM, FarmerPotato said:

    I suggest:

         MOVB *R0,@R1LB
         SB   R1,R1          look, we do have a CLRB instruction! also SZCB

I say, old chap!

+Vorticon Posted October 24, 2022

12 hours ago, Tursi said:

    I quite like the SB idea, I never thought of that either. Will probably use it in the future.

Very clever. That said, the problem here is that one would have to either set up pointers to the location of each register used unless one sticks to a single register for these operations, which I feel obfuscates the code as compared to the straightforward CLR/MOVB/SWPB method. Worthwhile only if there is a definite speed advantage to be had.

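The "pointer to each register" is just an address computed from the workspace pointer: register n occupies the word at WS + 2n, so its low byte sits one byte further on. A small Python sketch of that arithmetic follows; `low_byte_addr` is a made-up helper, and the workspace address 0x8300 is only an illustrative value, not something taken from the thread.

```python
def reg_addr(ws, n):
    """Address of workspace register Rn: each register is one 16-bit word."""
    return ws + 2 * n


def low_byte_addr(ws, n):
    """Address of Rn's low byte: big-endian, so it is the odd byte of the word."""
    return reg_addr(ws, n) + 1


MYWS = 0x8300                                 # hypothetical workspace address
assert low_byte_addr(MYWS, 1) == MYWS + 3     # matches R1LB EQU MYWS+3
assert reg_addr(MYWS, 1) == MYWS + 2          # matches the @2(R13) form for R1

# One EQU-style constant per register that needs the trick:
LOW_BYTES = {f"R{n}LB": low_byte_addr(MYWS, n) for n in range(16)}
```

The same offsets explain the indexed forms earlier in the thread: `@3(R2)` after `STWP R2` and `@3(R13)` inside a BLWP routine both address the low byte of an R1.
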
+TheBF Posted October 24, 2022

3 hours ago, Vorticon said:

    Very clever. That said, the problem here is that one would have to either set up pointers to the location of each register used unless one sticks to a single register for these operations, which I feel obfuscates the code as compared to the straightforward CLR/MOVB/SWPB method. Worthwhile only if there is a definite speed advantage to be had.

Maybe it's just me, but when I am counting cycles on different code options I find the old 9900 does not give up many cycles for my efforts.

Tursi once posted that it generally just comes down to whichever code uses the fewest instructions. Seems like a good rule of thumb.