Jump to content
IGNORED

6502 Killer hacks


djmips

Recommended Posts

This is how I would do it:

	lda	 v	 ; 2
ldx	 #-1; 2
.count:
inx		; 2
.loop:
lsr		; 2
bcs	 .count; 2³
bne	 .loop ; 2³

No someone do the math. :)

 

Holy crap, *that* is a killer hack!

 

10 cycles if v was 0, and 63 cycles if it was $FF (worst case)... in 10 bytes of code!

 

Also, your version is optimized for lower numbers... the execution time depends on where the most significant set bit is, instead of the number of set bits like the Kernighan version (so your version is faster for $0F than it would be for $F0, even though both have 4 bits set).

 

If I were wearing a hat, I'd take it off to you :)

Link to comment
Share on other sites

:D Thanks Urchlay and TJ, lots of fun reading.

 

Now tell me what in the world "parity" is. :?

 

Long description: http://en.wikipedia.org/wiki/Parity_%28telecommunication%29

 

Short version: An 8th bit that's added to a 7-bit byte, so that the number of 1 bits comes out to an even number (even parity) or to an odd number (odd parity). Used to detect errors on a serial line: the sender sends bytes with correct parity, and the receiver calculates what the parity bit should be... and if they're different, the receiver knows the data got garbled during transmission.

Link to comment
Share on other sites

Thanks. ;)

 

EDIT: How about this one:

Determining if an integer is a power of 2

unsigned int v; // we want to see if v is a power of 2
bool f;		 // the result goes here 

f = (v & (v - 1)) == 0;

Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:

f = !(v & (v - 1)) && v;

What's the killer 6502 hack for this? I've come up with this:

   lda v
  dec v
  and v
  inc v

Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

 

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

Edited by vdub_bobby
Link to comment
Share on other sites

 

   lda v
  dec v
  and v
  inc v

Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

 

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

 

The return value should really be in a flag, so you can immediately use a branch instruction afterwards... otherwise you'll need something like "cmp #0" in the caller. How about:

 

; Warning: untested code!

is_x_power_of_2:; convenience entry point: checks X instead of location v
 stx v
 .byte $2C; aka BIT absolute opcode, skips the "sta v" below

is_a_power_of_2:; convenience entry point: checks accumulator instead of location v
 sta v

; Main entry point:
is_v_power_of_2: subroutine; {
 lda v
 bne .main
 lda #1; Special case: return with Z flag false, if v==0
 rts

.main; v is known to be non-zero, here
 dec v
 and v; accumulator now contains 0 if v is a power of 2 (Z flag is set correctly, but "inc v" is about to destroy it)

; If you don't care about preserving v, you can comment out both these lines:
 inc v
 cmp #0; set Z flag true if a==0, or false otherwise (needed because "inc v" trashes the Z flag)

.done
 rts
; }

 

It's not so "killer" any more, but it's easy to use:

 

 ldx $whatever; get a byte from somewhere
 stx v
 jsr is_v_power_of_2; Was it a power of 2?
 beq yes_it_was; yep!
; no, do whatever here...

;; Alternatively:
 lda $whatever; or, A could be the result of some calculations
 jsr is_a_power_of_2
 beq yes_it_was

 

Also, since you're only dealing with an 8-bit byte, you can just brute-force it:

 

; Return Z=true if A is a power of two, or Z=false if it isn't
; Uses no RAM, preserves A/X/Y, trashes flags
is_a_power_of_2: subroutine; {
 cmp #1
 beq .yes
 cmp #2
 beq .yes
 cmp #4
 beq .yes
 cmp #8
 beq .yes
 cmp #16
 beq .yes
 cmp #32
 beq .yes
 cmp #64
 beq .yes
 cmp #128
 beq .yes

; if none of the compares are true, Z ends up being false
.yes; if any of the compares are true, Z is true when we branch here

; Either way, just return it.
 rts
; }

 

Converting the brute-force method to a loop... which will be slower than the original:

 

is_a_power_of_2: subroutine; {
 ldx #1
 stx tmp

.loop; {
 cmp tmp
 beq .done; return with Z=true
 asl tmp
 bne .loop
;	}

; original: lda #1; return with Z=false (too bad there's no CLZ in 6502-speak)
; EDIT:
 ldx #1; return with Z=false, preserving the value of A

.done
 rts
; }

 

 

However, TJ is probably about to post something that blows all this code away :)

Edited by Urchlay
Link to comment
Share on other sites

Thanks. ;)

 

EDIT: How about this one:

Determining if an integer is a power of 2

unsigned int v; // we want to see if v is a power of 2
bool f;		 // the result goes here 

f = (v & (v - 1)) == 0;

Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:

f = !(v & (v - 1)) && v;

What's the killer 6502 hack for this? I've come up with this:

   lda v
  dec v
  and v
  inc v

Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

 

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

Nice! :lust:

Link to comment
Share on other sites

However, TJ is probably about to post something that blows all this code away :)

 

This isn't up to TJ's standards, but I just thought of another solution in the shower:

 

; destructively test to see if the accumulator is a power of two
; on exit, A is trashed, X and Y are preserved, and Z is set if A was a power of two
is_a_power_of_2: subroutine; {
 cmp #0; Test for special case A==0
 bne .loop
 adc #1; another "CLZ"
 rts

; All powers of two have only one bit set!
.loop; {
; This loop terminates when we find a set bit
 lsr
 bcc .loop
;   }

 rts; If we found a set bit, the Z flag will be set if that was the *only* set bit
; }

 

That would actually be kind of elegant, if it weren't for the need to test for the special A==0 case :(

Link to comment
Share on other sites

   lda v
  dec v
  and v
  inc v

Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

 

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

How about

  dec v
 lda v
 inc v
 beq .exit
 and v
.exit:

?

Not perfect, since you still need a compare.

Link to comment
Share on other sites

Now it's elegant.

 

Maybe. Still a bit slow, though.

 cmp #1   ; Optional first part to make zero yield a non-zero result
 adc #255

 tax
 dex
 sbx #0

If for some reason it's better to have the result in the carry flag, replace the last instruction with "sbx #1". In that case, the zero flag will not be meaningful.

Link to comment
Share on other sites

Now it's elegant.

 

Maybe. Still a bit slow, though.

 cmp #1  ; Optional first part to make zero yield a non-zero result
 adc #255

 tax
 dex
 sbx #0

If for some reason it's better to have the result in the carry flag, replace the last instruction with "sbx #1". In that case, the zero flag will not be meaningful.

 

Ahhh... I wasn't going to say anything about your previous routine but I was wondering... I mean it looped! ;)

 

But this one ties it all together.

Link to comment
Share on other sites

Cool! Here's another one:

 

Modulus.

 

I need to find n mod 20. Well, to be specific, I need to find if n mod 20 is zero or otherwise.

Here's how I do it:

   lda n
  sec
Loop
  sbc #20
  bcc Remainder
  bne Loop
;--remainder is zero

Which is slow. Is there a better way?

 

EDIT: And another:

I have often wanted to find the absolute value of a signed byte, usually I do something like this:

   lda v
  bpl FoundAbs
  lda #0
  sec
  sbc v
FoundAbs

Which is ok, I guess. The website djmips posted has this non-branching solution:

Compute the integer absolute value (abs) without branching

int v;	  // we want to find the absolute value of v
int r;	  // the result goes here 
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;

Patented variation:

r = (v ^ mask) - mask;

Which I translated into 6502 assembly; seems a bit messy:

   lda v
  eor #$80
  asl
  sta temp
  sbc temp
  sta temp
  clc
  adc v
  eor temp

Any improvements on either routine?

Edited by vdub_bobby
Link to comment
Share on other sites

 

I have often wanted to find the absolute value of a signed byte, usually I do something like this:

   lda v
  bpl FoundAbs
  lda #0
  sec
  sbc v
FoundAbs

 

How about:

 

 lda v
 bpl .foundabs
 eor #$ff
 clc; not needed if the state of C is known up front
 adc #1; would be "adc #0", if C is known to be set
.foundabs

 

Not much of an improvement, but it does run 1 cycle faster if v is negative (or 3, if you can leave out the CLC).

 

Reading over this thread, I can see that I need to learn how to use the illegal/undocumented opcodes. About the only ones I've ever used are DCP and some of the extended NOPs (e.g. to sleep for 3 cycles).

Link to comment
Share on other sites

Reading over this thread, I can see that I need to learn how to use the illegal/undocumented opcodes. About the only ones I've ever used are DCP and some of the extended NOPs (e.g. to sleep for 3 cycles).

LAX is also often very useful (especially LAX (),y). And the various NOPs can also be used to skip instructions (like BIT does) but without changing the flag and sometimes even slightly faster (NOP #imm).

Link to comment
Share on other sites

I need to find n mod 20. Well, to be specific, I need to find if n mod 20 is zero or otherwise.

Using prime factors might help here. 20 = 2*2*5, so first we could SHR twice and check the carries. But then we still need an effective solution for 5. :ponder:

 

Not sure, but this may work:

  while (n > 0) {
n = n/4 - n%4
 }

n has to be zero.

Link to comment
Share on other sites

Here's a quick mod 5 I just came up with:

 

	LDX	#5*32; start with 5 shifted left as far as possible
STX	temp
.10
CMP	temp
BCS	.20; branch if A < temp
SEC; may or not be needed, I can never remember how carry works 6502 subtract
SBC	temp; subtract temp from A
;   BNE	.30; can short-cut exit here if exact multiple
.20
LSR	temp; shift divisor right
BCC	.10; loop until all divisors tried
.30
; remainder is in A

 

Completely untested of course. This should work for a mod by any odd number, using the low bit as an end condition. If you shift out and save the two low bits, then shift them back in, you should have a mod 20. You can then use LDX #5*8 instead of LDX #5*32. It might even be possible to completely unroll this:

 

	LSRA
ROR temp
LSRA
ROR temp

CMP #5*8
BCS .10
SEC
SBC #5*8
.10
CMP #5*4
BCS .20
SEC
SBC #5*4
.20
CMP #5*2
BCS .30
SEC
SBC #5*2
.30
CMP #5*1
BCS .40
SEC
SBC #5*1
.40
ASL temp
ROLA
ASL temp
ROLA
.50
BNE notdiv20

Fix the carry flag stuff to work the right way of course.

 

If all you need to do is test for divide by 20, then just BCS notdiv20 instead of ROR temp.

Edited by Bruce Tomlin
Link to comment
Share on other sites

  • 2 months later...

Arithmetic Shifts

-------------------

 

For signed shifting you need to use an arithmetic shift.

 

The consensus on how to do a single arithmetic shift right is.

 

  cmp #$80
 ror

 

And here is the code for when you are doing 4 or more bits at once. 4 in this example.

 

  lsr
 lsr
 lsr
 lsr
 clc
 adc #$F8
 eor #$F8

 

For an arithmetic shift left, you can just use ASL but it doesn't correctly set the overflow flag. If you need to check the overflow flag then use the following code sequence.

 

   clc
  sta temp
  adc temp

 

Credits - Lee Davison, TJ

Edited by djmips
Link to comment
Share on other sites

If you need to check the overflow flag then use the following code sequence.

 

Or you could do the shift the normal way and then do something like:

  bcs  shifted_out_1
 bpl  overflow_nope
overflow_yep:
 ...

shifted_out_1:
 bpl overflow_yep
overflow_nope:

That takes 5 cycles if no overflow; 4 or 6 cycles if overflow. The accumulator is not disturbed.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...