6502 Killer hacks

Thomas Jentzsch · March 9, 2007

This is how I would do it:

	lda	 v	  ; 2
ldx	 #-1	; 2
.count:
inx			; 2
.loop:
lsr			; 2
bcs	 .count ; 2³
bne	 .loop  ; 2³

No someone do the math.

Urchlay · March 9, 2007

This is how I would do it:
	lda	 v	 ; 2
ldx	 #-1; 2
.count:
inx		; 2
.loop:
lsr		; 2
bcs	 .count; 2³
bne	 .loop ; 2³
No someone do the math.

Holy crap, *that* is a killer hack!

10 cycles if v was 0, and 63 cycles if it was $FF (worst case)... in 10 bytes of code!

Also, your version is optimized for lower numbers... the execution time depends on where the most significant set bit is, instead of the number of set bits like the Kernighan version (so your version is faster for $0F than it would be for $F0, even though both have 4 bits set).

If I were wearing a hat, I'd take it off to you

vdub_bobby · March 9, 2007

Thanks Urchlay and TJ, lots of fun reading.

Now tell me what in the world "parity" is.

Urchlay · March 9, 2007

Thanks Urchlay and TJ, lots of fun reading.

Now tell me what in the world "parity" is.

Long description: http://en.wikipedia.org/wiki/Parity_%28telecommunication%29

Short version: An 8th bit that's added to a 7-bit byte, so that the number of 1 bits comes out to an even number (even parity) or to an odd number (odd parity). Used to detect errors on a serial line: the sender sends bytes with correct parity, and the receiver calculates what the parity bit should be... and if they're different, the receiver knows the data got garbled during transmission.

vdub_bobby · March 9, 2007

Thanks.

EDIT: How about this one:

Determining if an integer is a power of 2

unsigned int v; // we want to see if v is a power of 2
bool f;		 // the result goes here 

f = (v & (v - 1)) == 0;

Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:

f = !(v & (v - 1)) && v;

What's the killer 6502 hack for this? I've come up with this:

   lda v
  dec v
  and v
  inc v

Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

Edited March 9, 2007 by vdub_bobby

Urchlay · March 10, 2007

   lda v
  dec v
  and v
  inc v
Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

The return value should really be in a flag, so you can immediately use a branch instruction afterwards... otherwise you'll need something like "cmp #0" in the caller. How about:

; Warning: untested code!

is_x_power_of_2:; convenience entry point: checks X instead of location v
 stx v
 .byte $2C; aka BIT absolute opcode, skips the "sta v" below

is_a_power_of_2:; convenience entry point: checks accumulator instead of location v
 sta v

; Main entry point:
is_v_power_of_2: subroutine; {
 lda v
 bne .main
 lda #1; Special case: return with Z flag false, if v==0
 rts

.main; v is known to be non-zero, here
 dec v
 and v; accumulator now contains 0 if v is a power of 2 (Z flag is set correctly, but "inc v" is about to destroy it)

; If you don't care about preserving v, you can comment out both these lines:
 inc v
 cmp #0; set Z flag true if a==0, or false otherwise (needed because "inc v" trashes the Z flag)

.done
 rts
; }

It's not so "killer" any more, but it's easy to use:

 ldx $whatever; get a byte from somewhere
 stx v
 jsr is_v_power_of_2; Was it a power of 2?
 beq yes_it_was; yep!
; no, do whatever here...

;; Alternatively:
 lda $whatever; or, A could be the result of some calculations
 jsr is_a_power_of_2
 beq yes_it_was

Also, since you're only dealing with an 8-bit byte, you can just brute-force it:

; Return Z=true if A is a power of two, or Z=false if it isn't
; Uses no RAM, preserves A/X/Y, trashes flags
is_a_power_of_2: subroutine; {
 cmp #1
 beq .yes
 cmp #2
 beq .yes
 cmp #4
 beq .yes
 cmp #8
 beq .yes
 cmp #16
 beq .yes
 cmp #32
 beq .yes
 cmp #64
 beq .yes
 cmp #128
 beq .yes

; if none of the compares are true, Z ends up being false
.yes; if any of the compares are true, Z is true when we branch here

; Either way, just return it.
 rts
; }

Converting the brute-force method to a loop... which will be slower than the original:

is_a_power_of_2: subroutine; {
 ldx #1
 stx tmp

.loop; {
 cmp tmp
 beq .done; return with Z=true
 asl tmp
 bne .loop
;	}

; original: lda #1; return with Z=false (too bad there's no CLZ in 6502-speak)
; EDIT:
 ldx #1; return with Z=false, preserving the value of A

.done
 rts
; }

However, TJ is probably about to post something that blows all this code away

Edited March 10, 2007 by Urchlay

+batari · March 10, 2007

Thanks.

EDIT: How about this one:
Determining if an integer is a power of 2

unsigned int v; // we want to see if v is a power of 2
bool f;		 // the result goes here 

f = (v & (v - 1)) == 0;

Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:

f = !(v & (v - 1)) && v;
What's the killer 6502 hack for this? I've come up with this:
   lda v
  dec v
  and v
  inc v
Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

Nice! :lust:

Urchlay · March 10, 2007

However, TJ is probably about to post something that blows all this code away

This isn't up to TJ's standards, but I just thought of another solution in the shower:

; destructively test to see if the accumulator is a power of two
; on exit, A is trashed, X and Y are preserved, and Z is set if A was a power of two
is_a_power_of_2: subroutine; {
 cmp #0; Test for special case A==0
 bne .loop
 adc #1; another "CLZ"
 rts

; All powers of two have only one bit set!
.loop; {
; This loop terminates when we find a set bit
 lsr
 bcc .loop
;   }

 rts; If we found a set bit, the Z flag will be set if that was the *only* set bit
; }

That would actually be kind of elegant, if it weren't for the need to test for the special A==0 case

supercat · March 10, 2007

That would actually be kind of elegant, if it weren't for the need to test for the special A==0 case

Hmm...

 cmp #1
 adc #255
lp:
 lsr
 bcc lp

Should do an early exit (NZ) in the A=0 case.

Thomas Jentzsch · March 10, 2007

   lda v
  dec v
  and v
  inc v
Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

How about

  dec v
 lda v
 inc v
 beq .exit
 and v
.exit:

?

Not perfect, since you still need a compare.

Urchlay · March 11, 2007

That would actually be kind of elegant, if it weren't for the need to test for the special A==0 case

Hmm...
 cmp #1
 adc #255
lp:
 lsr
 bcc lp
Should do an early exit (NZ) in the A=0 case.

Nice!

Now it's elegant.

supercat · March 11, 2007

Now it's elegant.

Maybe. Still a bit slow, though.

 cmp #1   ; Optional first part to make zero yield a non-zero result
 adc #255

 tax
 dex
 sbx #0

If for some reason it's better to have the result in the carry flag, replace the last instruction with "sbx #1". In that case, the zero flag will not be meaningful.

djmips · March 11, 2007

Now it's elegant.

Maybe. Still a bit slow, though.
 cmp #1  ; Optional first part to make zero yield a non-zero result
 adc #255

 tax
 dex
 sbx #0
If for some reason it's better to have the result in the carry flag, replace the last instruction with "sbx #1". In that case, the zero flag will not be meaningful.

Ahhh... I wasn't going to say anything about your previous routine but I was wondering... I mean it looped!

But this one ties it all together.

+batari · March 11, 2007

To detect zero, just do this:

 tax
 dex
 bmi done
 sbx #0
done

Also, why not load the value to be tested with lax, so you don't need the tax above?

vdub_bobby · March 12, 2007

Cool! Here's another one:

Modulus.

I need to find n mod 20. Well, to be specific, I need to find if n mod 20 is zero or otherwise.

Here's how I do it:

   lda n
  sec
Loop
  sbc #20
  bcc Remainder
  bne Loop
;--remainder is zero

Which is slow. Is there a better way?

EDIT: And another:

I have often wanted to find the absolute value of a signed byte, usually I do something like this:

   lda v
  bpl FoundAbs
  lda #0
  sec
  sbc v
FoundAbs

Which is ok, I guess. The website djmips posted has this non-branching solution:

Compute the integer absolute value (abs) without branching

int v;	  // we want to find the absolute value of v
int r;	  // the result goes here 
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;

Patented variation:

r = (v ^ mask) - mask;

Which I translated into 6502 assembly; seems a bit messy:

   lda v
  eor #$80
  asl
  sta temp
  sbc temp
  sta temp
  clc
  adc v
  eor temp

Any improvements on either routine?

Edited March 12, 2007 by vdub_bobby

Urchlay · March 12, 2007

I have often wanted to find the absolute value of a signed byte, usually I do something like this:
   lda v
  bpl FoundAbs
  lda #0
  sec
  sbc v
FoundAbs

How about:

 lda v
 bpl .foundabs
 eor #$ff
 clc; not needed if the state of C is known up front
 adc #1; would be "adc #0", if C is known to be set
.foundabs

Not much of an improvement, but it does run 1 cycle faster if v is negative (or 3, if you can leave out the CLC).

Reading over this thread, I can see that I need to learn how to use the illegal/undocumented opcodes. About the only ones I've ever used are DCP and some of the extended NOPs (e.g. to sleep for 3 cycles).

Thomas Jentzsch · March 13, 2007

Reading over this thread, I can see that I need to learn how to use the illegal/undocumented opcodes. About the only ones I've ever used are DCP and some of the extended NOPs (e.g. to sleep for 3 cycles).

LAX is also often very useful (especially LAX (),y). And the various NOPs can also be used to skip instructions (like BIT does) but without changing the flag and sometimes even slightly faster (NOP #imm).

Thomas Jentzsch · March 13, 2007

I need to find n mod 20. Well, to be specific, I need to find if n mod 20 is zero or otherwise.

Using prime factors might help here. 20 = 2*2*5, so first we could SHR twice and check the carries. But then we still need an effective solution for 5. :ponder:

Not sure, but this may work:

  while (n > 0) {
n = n/4 - n%4
 }

n has to be zero.

Cybergoth · March 13, 2007

But then we still need an effective solution for 5.

A 60 byte lookup table? :twisted:

Thomas Jentzsch · March 13, 2007

A 60 byte lookup table?

Nah, that's cheating.

But I just had another idea: Add the two nibbles and check if the result can be divided by 5 (== 5, A, F). For results above F you have to repeat.

Bruce Tomlin · March 13, 2007

Here's a quick mod 5 I just came up with:

	LDX	#5*32; start with 5 shifted left as far as possible
STX	temp
.10
CMP	temp
BCS	.20; branch if A < temp
SEC; may or not be needed, I can never remember how carry works 6502 subtract
SBC	temp; subtract temp from A
;   BNE	.30; can short-cut exit here if exact multiple
.20
LSR	temp; shift divisor right
BCC	.10; loop until all divisors tried
.30
; remainder is in A

Completely untested of course. This should work for a mod by any odd number, using the low bit as an end condition. If you shift out and save the two low bits, then shift them back in, you should have a mod 20. You can then use LDX #5*8 instead of LDX #5*32. It might even be possible to completely unroll this:

	LSRA
ROR temp
LSRA
ROR temp

CMP #5*8
BCS .10
SEC
SBC #5*8
.10
CMP #5*4
BCS .20
SEC
SBC #5*4
.20
CMP #5*2
BCS .30
SEC
SBC #5*2
.30
CMP #5*1
BCS .40
SEC
SBC #5*1
.40
ASL temp
ROLA
ASL temp
ROLA
.50
BNE notdiv20

Fix the carry flag stuff to work the right way of course.

If all you need to do is test for divide by 20, then just BCS notdiv20 instead of ROR temp.

Edited March 13, 2007 by Bruce Tomlin

supercat · March 13, 2007

Here's a quick mod 5 I just came up with:

Decent approach, though I don't see the need for temp:

 ldx #8
lp:
 cmp #80
 bcc nosub
 sbc #160 ; Note carry set by cmp
 bcs nosub
 adc #80 ; Note carry clear by sbc
nosub:
 asl
 dex
 bne lp

supercat · March 13, 2007

Here's another interesting approach:

; Enter at s0
s4:
 asl
 bcs s4
 beq done_bad
s3:
 asl
 bcs s2
s1:
 asl
 bcs s3
s2:
 asl
 bcc s4
s0:
 asl
 bcs s1
 bne s0
done_good:

djmips · May 21, 2007

Arithmetic Shifts

-------------------

For signed shifting you need to use an arithmetic shift.

The consensus on how to do a single arithmetic shift right is.

  cmp #$80
 ror

And here is the code for when you are doing 4 or more bits at once. 4 in this example.

  lsr
 lsr
 lsr
 lsr
 clc
 adc #$F8
 eor #$F8

For an arithmetic shift left, you can just use ASL but it doesn't correctly set the overflow flag. If you need to check the overflow flag then use the following code sequence.

   clc
  sta temp
  adc temp

Credits - Lee Davison, TJ

Edited May 24, 2007 by djmips

supercat · May 21, 2007

If you need to check the overflow flag then use the following code sequence.

Or you could do the shift the normal way and then do something like:

  bcs  shifted_out_1
 bpl  overflow_nope
overflow_yep:
 ...

shifted_out_1:
 bpl overflow_yep
overflow_nope:

That takes 5 cycles if no overflow; 4 or 6 cycles if overflow. The accumulator is not disturbed.

6502 Killer hacks

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members