IGNORED

# 6502 Killer hacks

## Recommended Posts

This is how I would do it:

```	lda	 v	  ; 2
ldx	 #-1	; 2
.count:
inx			; 2
.loop:
lsr			; 2
bcs	 .count ; 2³
bne	 .loop  ; 2³```

No someone do the math.

##### Share on other sites

This is how I would do it:

```	lda	 v	 ; 2
ldx	 #-1; 2
.count:
inx		; 2
.loop:
lsr		; 2
bcs	 .count; 2³
bne	 .loop ; 2³```

No someone do the math.

Holy crap, *that* is a killer hack!

10 cycles if v was 0, and 63 cycles if it was \$FF (worst case)... in 10 bytes of code!

Also, your version is optimized for lower numbers... the execution time depends on where the most significant set bit is, instead of the number of set bits like the Kernighan version (so your version is faster for \$0F than it would be for \$F0, even though both have 4 bits set).

If I were wearing a hat, I'd take it off to you

##### Share on other sites

Thanks Urchlay and TJ, lots of fun reading.

Now tell me what in the world "parity" is.

##### Share on other sites

Thanks Urchlay and TJ, lots of fun reading.

Now tell me what in the world "parity" is.

Long description: http://en.wikipedia.org/wiki/Parity_%28telecommunication%29

Short version: An 8th bit that's added to a 7-bit byte, so that the number of 1 bits comes out to an even number (even parity) or to an odd number (odd parity). Used to detect errors on a serial line: the sender sends bytes with correct parity, and the receiver calculates what the parity bit should be... and if they're different, the receiver knows the data got garbled during transmission.

##### Share on other sites

Thanks.

```Determining if an integer is a power of 2

unsigned int v; // we want to see if v is a power of 2
bool f;		 // the result goes here

f = (v & (v - 1)) == 0;

Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:

f = !(v & (v - 1)) && v;```

What's the killer 6502 hack for this? I've come up with this:

```   lda v
dec v
and v
inc v```

Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

Edited by vdub_bobby
##### Share on other sites

```   lda v
dec v
and v
inc v```

Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

The return value should really be in a flag, so you can immediately use a branch instruction afterwards... otherwise you'll need something like "cmp #0" in the caller. How about:

```; Warning: untested code!

is_x_power_of_2:; convenience entry point: checks X instead of location v
stx v
.byte \$2C; aka BIT absolute opcode, skips the "sta v" below

is_a_power_of_2:; convenience entry point: checks accumulator instead of location v
sta v

; Main entry point:
is_v_power_of_2: subroutine; {
lda v
bne .main
lda #1; Special case: return with Z flag false, if v==0
rts

.main; v is known to be non-zero, here
dec v
and v; accumulator now contains 0 if v is a power of 2 (Z flag is set correctly, but "inc v" is about to destroy it)

; If you don't care about preserving v, you can comment out both these lines:
inc v
cmp #0; set Z flag true if a==0, or false otherwise (needed because "inc v" trashes the Z flag)

.done
rts
; }
```

It's not so "killer" any more, but it's easy to use:

``` ldx \$whatever; get a byte from somewhere
stx v
jsr is_v_power_of_2; Was it a power of 2?
beq yes_it_was; yep!
; no, do whatever here...

;; Alternatively:
lda \$whatever; or, A could be the result of some calculations
jsr is_a_power_of_2
beq yes_it_was
```

Also, since you're only dealing with an 8-bit byte, you can just brute-force it:

```; Return Z=true if A is a power of two, or Z=false if it isn't
; Uses no RAM, preserves A/X/Y, trashes flags
is_a_power_of_2: subroutine; {
cmp #1
beq .yes
cmp #2
beq .yes
cmp #4
beq .yes
cmp #8
beq .yes
cmp #16
beq .yes
cmp #32
beq .yes
cmp #64
beq .yes
cmp #128
beq .yes

; if none of the compares are true, Z ends up being false
.yes; if any of the compares are true, Z is true when we branch here

; Either way, just return it.
rts
; }
```

Converting the brute-force method to a loop... which will be slower than the original:

```is_a_power_of_2: subroutine; {
ldx #1
stx tmp

.loop; {
cmp tmp
beq .done; return with Z=true
asl tmp
bne .loop
;	}

; original: lda #1; return with Z=false (too bad there's no CLZ in 6502-speak)
; EDIT:
ldx #1; return with Z=false, preserving the value of A

.done
rts
; }
```

However, TJ is probably about to post something that blows all this code away

Edited by Urchlay
##### Share on other sites

Thanks.

```Determining if an integer is a power of 2

unsigned int v; // we want to see if v is a power of 2
bool f;		 // the result goes here

f = (v & (v - 1)) == 0;

Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:

f = !(v & (v - 1)) && v;```

What's the killer 6502 hack for this? I've come up with this:

```   lda v
dec v
and v
inc v```

Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

Nice!

##### Share on other sites

However, TJ is probably about to post something that blows all this code away

This isn't up to TJ's standards, but I just thought of another solution in the shower:

```; destructively test to see if the accumulator is a power of two
; on exit, A is trashed, X and Y are preserved, and Z is set if A was a power of two
is_a_power_of_2: subroutine; {
cmp #0; Test for special case A==0
bne .loop
rts

; All powers of two have only one bit set!
.loop; {
; This loop terminates when we find a set bit
lsr
bcc .loop
;   }

rts; If we found a set bit, the Z flag will be set if that was the *only* set bit
; }
```

That would actually be kind of elegant, if it weren't for the need to test for the special A==0 case

##### Share on other sites

That would actually be kind of elegant, if it weren't for the need to test for the special A==0 case

Hmm...

``` cmp #1
lp:
lsr
bcc lp
```

Should do an early exit (NZ) in the A=0 case.

##### Share on other sites

```   lda v
dec v
and v
inc v```

Which returns zero in the accumulator if v is a power of two and anything else if it isn't. And preserves v.

Except it doesn't handle v==0 and I'd rather have it return 0 if false and anything else if true.

```  dec v
lda v
inc v
beq .exit
and v
.exit:```

?

Not perfect, since you still need a compare.

##### Share on other sites

That would actually be kind of elegant, if it weren't for the need to test for the special A==0 case

Hmm...

``` cmp #1
lp:
lsr
bcc lp
```

Should do an early exit (NZ) in the A=0 case.

Nice!

Now it's elegant.

##### Share on other sites

Now it's elegant.

Maybe. Still a bit slow, though.

``` cmp #1   ; Optional first part to make zero yield a non-zero result

tax
dex
sbx #0
```

If for some reason it's better to have the result in the carry flag, replace the last instruction with "sbx #1". In that case, the zero flag will not be meaningful.

##### Share on other sites

Now it's elegant.

Maybe. Still a bit slow, though.

``` cmp #1  ; Optional first part to make zero yield a non-zero result

tax
dex
sbx #0
```

If for some reason it's better to have the result in the carry flag, replace the last instruction with "sbx #1". In that case, the zero flag will not be meaningful.

Ahhh... I wasn't going to say anything about your previous routine but I was wondering... I mean it looped!

But this one ties it all together.

##### Share on other sites

To detect zero, just do this:

``` tax
dex
bmi done
sbx #0
done
```

Also, why not load the value to be tested with lax, so you don't need the tax above?

##### Share on other sites

Cool! Here's another one:

Modulus.

I need to find n mod 20. Well, to be specific, I need to find if n mod 20 is zero or otherwise.

Here's how I do it:

```   lda n
sec
Loop
sbc #20
bcc Remainder
bne Loop
;--remainder is zero```

Which is slow. Is there a better way?

EDIT: And another:

I have often wanted to find the absolute value of a signed byte, usually I do something like this:

```   lda v
bpl FoundAbs
lda #0
sec
sbc v
FoundAbs```

Which is ok, I guess. The website djmips posted has this non-branching solution:

```Compute the integer absolute value (abs) without branching

int v;	  // we want to find the absolute value of v
int r;	  // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

Patented variation:

Which I translated into 6502 assembly; seems a bit messy:

```   lda v
eor #\$80
asl
sta temp
sbc temp
sta temp
clc
eor temp```

Any improvements on either routine?

Edited by vdub_bobby
##### Share on other sites

I have often wanted to find the absolute value of a signed byte, usually I do something like this:

```   lda v
bpl FoundAbs
lda #0
sec
sbc v
FoundAbs```

``` lda v
bpl .foundabs
eor #\$ff
clc; not needed if the state of C is known up front
adc #1; would be "adc #0", if C is known to be set
.foundabs
```

Not much of an improvement, but it does run 1 cycle faster if v is negative (or 3, if you can leave out the CLC).

Reading over this thread, I can see that I need to learn how to use the illegal/undocumented opcodes. About the only ones I've ever used are DCP and some of the extended NOPs (e.g. to sleep for 3 cycles).

##### Share on other sites

Reading over this thread, I can see that I need to learn how to use the illegal/undocumented opcodes. About the only ones I've ever used are DCP and some of the extended NOPs (e.g. to sleep for 3 cycles).

LAX is also often very useful (especially LAX (),y). And the various NOPs can also be used to skip instructions (like BIT does) but without changing the flag and sometimes even slightly faster (NOP #imm).

##### Share on other sites

I need to find n mod 20. Well, to be specific, I need to find if n mod 20 is zero or otherwise.

Using prime factors might help here. 20 = 2*2*5, so first we could SHR twice and check the carries. But then we still need an effective solution for 5.

Not sure, but this may work:

```  while (n > 0) {
n = n/4 - n%4
}```

n has to be zero.

##### Share on other sites

But then we still need an effective solution for 5.

A 60 byte lookup table?

##### Share on other sites

A 60 byte lookup table?

Nah, that's cheating.

But I just had another idea: Add the two nibbles and check if the result can be divided by 5 (== 5, A, F). For results above F you have to repeat.

##### Share on other sites

Here's a quick mod 5 I just came up with:

```	LDX	#5*32; start with 5 shifted left as far as possible
STX	temp
.10
CMP	temp
BCS	.20; branch if A < temp
SEC; may or not be needed, I can never remember how carry works 6502 subtract
SBC	temp; subtract temp from A
;   BNE	.30; can short-cut exit here if exact multiple
.20
LSR	temp; shift divisor right
BCC	.10; loop until all divisors tried
.30
; remainder is in A
```

Completely untested of course. This should work for a mod by any odd number, using the low bit as an end condition. If you shift out and save the two low bits, then shift them back in, you should have a mod 20. You can then use LDX #5*8 instead of LDX #5*32. It might even be possible to completely unroll this:

```	LSRA
ROR temp
LSRA
ROR temp

CMP #5*8
BCS .10
SEC
SBC #5*8
.10
CMP #5*4
BCS .20
SEC
SBC #5*4
.20
CMP #5*2
BCS .30
SEC
SBC #5*2
.30
CMP #5*1
BCS .40
SEC
SBC #5*1
.40
ASL temp
ROLA
ASL temp
ROLA
.50
BNE notdiv20```

Fix the carry flag stuff to work the right way of course.

If all you need to do is test for divide by 20, then just BCS notdiv20 instead of ROR temp.

Edited by Bruce Tomlin
##### Share on other sites

Here's a quick mod 5 I just came up with:

Decent approach, though I don't see the need for temp:

``` ldx #8
lp:
cmp #80
bcc nosub
sbc #160 ; Note carry set by cmp
bcs nosub
adc #80 ; Note carry clear by sbc
nosub:
asl
dex
bne lp
```

##### Share on other sites

Here's another interesting approach:

```; Enter at s0
s4:
asl
bcs s4
s3:
asl
bcs s2
s1:
asl
bcs s3
s2:
asl
bcc s4
s0:
asl
bcs s1
bne s0
done_good:
```

##### Share on other sites

• 2 months later...

Arithmetic Shifts

-------------------

For signed shifting you need to use an arithmetic shift.

The consensus on how to do a single arithmetic shift right is.

```  cmp #\$80
ror```

And here is the code for when you are doing 4 or more bits at once. 4 in this example.

```  lsr
lsr
lsr
lsr
clc
eor #\$F8```

For an arithmetic shift left, you can just use ASL but it doesn't correctly set the overflow flag. If you need to check the overflow flag then use the following code sequence.

```   clc
sta temp

Credits - Lee Davison, TJ

Edited by djmips
##### Share on other sites

If you need to check the overflow flag then use the following code sequence.

Or you could do the shift the normal way and then do something like:

```  bcs  shifted_out_1
bpl  overflow_nope
overflow_yep:
...

shifted_out_1:
bpl overflow_yep
overflow_nope:```

That takes 5 cycles if no overflow; 4 or 6 cycles if overflow. The accumulator is not disturbed.

## Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

×   Pasted as rich text.   Paste as plain text instead

Only 75 emoji are allowed.