
SRL vs SWPB: moving one byte into LSB


FarmerPotato


Optimization Exercise:

 

Move one byte into the low byte of a register and zero the upper byte.
This is a common idiom for preparing to work on an 8-bit value.

 

Choose: optimize for speed or for program size. A compiler, for instance, would optimize accordingly.


Consider: the 9995 can read or write exactly one byte on MOVB; the 9900 must read the whole word before it writes.

 

Examples:

 

#1

 MOVB *R0,R1
 SRL  R1,8

 

#2

 CLR  R1
 MOVB *R0,R1
 SWPB R1

 
I cringe when I see the first one, because I assume shifts are slow.


Then I see this a lot in TI source code:

R1LB EQU MYWS+3
 LWPI MYWS
 CLR  R1
 MOVB *R0,@R1LB


I suggest:
 

 MOVB *R0,@R1LB
 SB   R1,R1        look, we do have a CLRB instruction! also SZCB
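
For comparison, here is a sketch of the same clear done with SZCB, which the comment above mentions. MYWS is given a hypothetical address purely for illustration, and R0 is assumed to already hold the source address. SZCB R1,R1 clears every bit of R1's high byte that is set in R1's high byte, which leaves it zero. SB and SZCB do not affect the same status bits, so check the data manual if the flags matter afterwards.

```asm
MYWS   EQU  >8300          hypothetical workspace address (assumption)
R1LB   EQU  MYWS+3         low byte of R1: workspace + 2*1 + 1
       LWPI MYWS
* R0 is assumed to hold the address of the source byte
       MOVB *R0,@R1LB      byte goes straight into R1's low byte
       SZCB R1,R1          high byte AND NOT high byte = 0
```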

 

Similar:

 

 STWP R2
 CLR  R1
 MOVB *R0,@3(R2) 

 

Returning a byte inside a BLWP call:

 

 CLR  @2(R13)
 MOVB *R0,@3(R13)
 RTWP
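
A fuller sketch of how this might sit inside a BLWP subroutine. The names GETBYT, GETB1 and SUBWS, the address >A000, and the convention that the caller passes the byte address in its own R0 are all assumptions for illustration, not something from TI source.

```asm
SUBWS  BSS  32             hypothetical subroutine workspace (must be word aligned)
GETBYT DATA SUBWS,GETB1    BLWP vector: new workspace, entry address
GETB1  MOV  *R13,R0        R13 = caller's WP, so *R13 is caller's R0
       CLR  @2(R13)        clear caller's R1, both bytes
       MOVB *R0,@3(R13)    store the byte in the low byte of caller's R1
       RTWP                back to caller with R1 = >00xx
* Caller side
       LI   R0,>A000       hypothetical address of the byte to fetch
       BLWP @GETBYT
```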

 

No advantage here:

 

 MOVB @ZERO,@2(R13)
 MOVB *R0,@3(R13)
 RTWP 
ZERO DATA 0          handy thing to put into a register, too. 

 
Finally, absolutely nuts:

 

 MOV  *R0,R1    possibly misaligned. If odd, LSB is ok, else MSB.
 SRL  R0,1      test for odd: carry = old low bit (note: R0 is destroyed)
 JOC  $+4       odd address: skip the swap
 SWPB R1
 SB   R1,R1     clear MSB

 

Discuss...

 


For many years I used the CLR register/MOVB/SWPB and the MOVB/SRL methods. I changed my approach some years ago for reasons related to the 9995, after noticing that, according to its datasheet, SWPB is a slow instruction. When possible, I clear the destination register only once, then move the byte value as shown below. The SB method looks like a neat, quick way to do the same thing.

4 hours ago, FarmerPotato said:
R1LB EQU MYWS+3
 LWPI MYWS
 CLR  R1
 MOVB *R0,@R1LB
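
A sketch of the clear-once idea in a loop, summing a run of bytes. The workspace address, buffer address, and COUNT are hypothetical values picked for illustration. Because nothing in the loop writes R1's high byte, the single CLR before the loop is enough.

```asm
MYWS   EQU  >8300          hypothetical workspace address (assumption)
R1LB   EQU  MYWS+3         low byte of R1
COUNT  EQU  16             hypothetical byte count (must be non-zero)
       LWPI MYWS
       LI   R0,>A000       hypothetical source buffer address
       LI   R3,COUNT
       CLR  R2             running 16-bit sum
       CLR  R1             clear R1 once, outside the loop
LOOP   MOVB *R0+,@R1LB     next byte into R1's low byte; MSB stays zero
       A    R1,R2          R1 is usable as a zero-extended word
       DEC  R3
       JNE  LOOP
```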

 


I'm sure it is all very context dependent, so for a number of applications, looking back at some of my source, I find I have at times just ignored the existence of the low byte. In such cases, I've used word values for immediate operands as necessary to manipulate the high byte the way I want. 

That works as long as what you are doing in word instructions doesn't cause undesired side effects to your high byte, such as carrying or borrowing a bit.

 

I had never imagined the subtract-high-byte-from-itself trick. I think no matter what instruction(s) you use, the only meaningful assembly code is the part in the far right column (the comments).


I was hoping somebody would check this in Classic99. But here is the calculation, following 9900 FAMILY SYSTEMS DESIGN, page 8-23.

 

Calculation:

 

Sum C (clock cycles) and M (memory accesses) across the Inst, Src, and Dst columns.
The CPU column's C and M already include register access; add more from tables A and B as needed.
Multiply M by W, the number of wait states per memory access.

Cycles = C + (W * M)

(99/4A:  time = 333ns * cycles)

 

For the 9900 in the 99/4A, I assume the registers are on the 16-bit bus (no wait) and the instruction fetch incurs +5 W for external RAM.

 

For 9995 in Geneve, I am assuming the registers are on-chip, code off-chip, instruction fetch incurs W=1 for DRAM. 

 

 

 

                  CPU         SRC           DST           Total
SRL  R1,8         C       M   Tbl  C  M     Tbl  C  M     C   M   Cycles
----------------  ---------   ---------     ---------     -----   ------
9900              12+2n   3   A    0  0     -             28  3   36
9995              8+n     2   A    0  0     -             16  2   18


                  CPU         SRC           DST           Total
SWPB R1           C       M   Tbl  C  M     Tbl  C  M     C   M   Cycles
----------------  ---------   ---------     ---------     -----   ------
9900              10      3   A    0  0     -             10  3   15
9995              14      2   A    0  0     -             14  2   16
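
As a check on the arithmetic, the 9995 rows above follow from Cycles = C + (W * M) with W = 1 and a shift count of n = 8. The same n gives C = 12 + 2*8 = 28 for SRL on the 9900. This only re-derives the 9995 totals from the figures already in the table; it is not an independent measurement.

```latex
\begin{align*}
\text{SRL R1,8 (9995):}\quad & C = 8 + n = 8 + 8 = 16, & \text{Cycles} &= 16 + 1 \cdot 2 = 18 \\
\text{SWPB R1 (9995):}\quad & C = 14, & \text{Cycles} &= 14 + 1 \cdot 2 = 16
\end{align*}
```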

 

 

I may have made mistakes, but SWPB wins on both CPUs. That is a pretty big margin in favor of SWPB on the 9900.

 


1 hour ago, FarmerPotato said:

I may have made mistakes, but SWPB wins on both CPUs. That is a pretty big margin in favor of SWPB on the 9900.

 

But CLR R1 needs to be added to the calculations for SWPB R1 to truly compare it to the MSB clearing of SRL R1,8 (when clearing the MSB matters). SWPB still wins for the 9900, but not by much (6 cycles?). However, it looks like SRL R1,8 wins for the 9995, since CLR R1 surely adds more than the 2 cycles by which SWPB leads.

 

...lee


Oops.  My brain: “That was tedious. I don’t wanna do the other instructions. The others are common to both anyway”


Lee guesses the 9900 margin in favor of SWPB is 6. I see that now, if CLR costs the same as SWPB.
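
Spelled out with the totals from the tables earlier in the thread, and assuming CLR R1 costs the same as SWPB R1 on the 9900 (the MOVB is common to both sequences and drops out). The 9995 cost of CLR is left as an unknown, written here as the symbol C_CLR, since no figure for it appears in the thread.

```latex
\begin{align*}
\text{9900:}\quad & 36 - (15 + 15) = 6 \quad \text{(CLR/MOVB/SWPB ahead)} \\
\text{9995:}\quad & 18 < 16 + C_{\mathrm{CLR}} \iff C_{\mathrm{CLR}} > 2 \quad \text{(MOVB/SRL ahead)}
\end{align*}
```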
 

The 99105 is probably like the 9995; I'll assume SRL is the better choice there.
 

 

 

 


12 hours ago, Tursi said:

I quite like the SB idea, I never thought of that either. Will probably use it in the future. ;)

 

Very clever. That said, the problem here is that one would have to set up pointers to the location of each register used, unless one sticks to a single register for these operations. I feel that obfuscates the code compared to the straightforward CLR/MOVB/SWPB method. Worthwhile only if there is a definite speed advantage to be had.
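
To make that bookkeeping concrete, here is a sketch of what the per-register pointers might look like. The workspace address and the R2LB/R3LB names are hypothetical, following the R1LB pattern from the first post, and R0 is assumed to already hold the source address. Each EQU is tied to one workspace, so it stops being valid as soon as WP points somewhere else.

```asm
MYWS   EQU  >8300          hypothetical workspace address (assumption)
R1LB   EQU  MYWS+3         low byte of R1: MYWS + 2*1 + 1
R2LB   EQU  MYWS+5         low byte of R2: MYWS + 2*2 + 1
R3LB   EQU  MYWS+7         low byte of R3: MYWS + 2*3 + 1
       LWPI MYWS           the EQUs only hold while WP = MYWS
       CLR  R2
       MOVB *R0,@R2LB      R2 = >00xx
```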


3 hours ago, Vorticon said:

Very clever. That said, the problem here is that one would have to set up pointers to the location of each register used, unless one sticks to a single register for these operations. I feel that obfuscates the code compared to the straightforward CLR/MOVB/SWPB method. Worthwhile only if there is a definite speed advantage to be had.

 

Maybe it's just me but when I am counting cycles on different code options I find the old 9900 does not give up many cycles for my efforts. :) 

Tursi once posted it generally just comes down to whichever code uses the least number of instructions. Seems like a good rule of thumb.

 

 

