IGNORED

divide by 3 and by 36

Recommended Posts

Hello!

For hacking some nice demo effect, I need to convert a sprite position (range 0-95) to a cpu cycle offset. This means division by 3. I also need to divide the range into three areas. For technical reasons, I chose 12 cycles = 36 "positions". So I need to divide by 36 as well.

How should I do this?

An idea would be to have a table of 96 bytes storing all possible results: the lower 0-4 bits contain the result of div3, bits 5-6 div36, so that I can AND out or LSR the desired result rather easy and fast. But that costs precious ROM. Creating the table in RAM also is no option, because I already need >80 bytes for other data.

Right now I'm stuck with a subtraction, counting how many times I can subtract the desired value like this:

```   lda   value
ldx   #\$ff
sec
loop36:
inx
sbc   #36
bcs   loop36
stx   div36

lda   value
ldx   #\$ff
sec
loop3:
inx
sbc   #3
bcs   loop3
stx   div3
```

This costs 24 bytes in total, but also costs quite some computing time, worst case should be 263 cpu cycles which is ~3.5 raster lines. Right now I can live with that, but my mind keeps banging if there's still an alternative worth checking out. One thing I found saved one byte:

```   lda   value
ldx   #\$ff
sec
loop3:
inx
sbc   #3
bcs   loop3
stx   div3

txa
ldx   #\$ff
sec
loop12:
inx
sbc   #12
bcs   loop12
stx   div36
```

Just divide by 3 and after that by 12. Gets to the same result, saves one byte by using "txa", but still needs the same amount of cycles.

For now I focused on the div3 since that's where most cpu time is spent on a bad case.

I looked around the internet already and found the idea of multiplying by 256/3 and throw away the low byte. That approach does not work entirely as the result sometimes gets "rounded up", which would be the death of my cycle exact code.

Multiplying by 3 is easy:

```   lda   value
asl
clc           ; only needed if value could be > 127
sta   mul3
```

So I was hoping either to come up wirh or find something similar like this for dividing by 3, but no luck yet.

Any ideas?

p.s.: i should be able modify my code on the 36 positions part to use there 33 positions instead, if someone got a better idea for dividing by 33 (or 11) instead of 36 (or 12). Other numbers are technically not possible.

Share on other sites

Just thinking out loud here without testing it: The division into three areas can be done by manual comparisons (2 cmp instructions). If you do that first, then you could re-use the result for the division by 3, i.e. you only need a look-up table of size 12 (0, 3, ..., 33) and then add a base value depending in your previous cmp results (+0, +36, +72). This should be very fast and might be light on ROM.

EDIT: Scratch that, this is nonsense - I should refrain from "thinking out loud" and think things through before posting instead.

EDIT 2: Or maybe not. (I'm tired today...) After the manual comparisons for the div 36, you could subtract 36 or 72 from the sprite position if it's in area 2 or 3, and then use the smaller, 12 byte look-up table for the div 3 and add 12 or 24 again to the result.

Edited by Kylearan
Share on other sites

Divide by 3 (Omegamatrix)

```;18 bytes, 30 cycles
sta  temp
lsr
lsr
ror
lsr
ror
lsr
ror
lsr
```
Share on other sites

For hacking some nice demo effect, I need to convert a sprite position (range 0-95) to a cpu cycle offset.

You could do something like X * 1/4 + X * 1/16 + X * 1/64... or (X * 16 + X * 4 + X * 1) / 64 (rounding might become a problem).

But why would you have to do that division anyway?

Share on other sites

You could do something like X * 1/4 + X * 1/16 + X * 1/64... or (X * 16 + X * 4 + X * 1) / 64 (rounding might become a problem).

Yes, that was one of the approaches I tried and ran into rounding problems.

Divide by 3 (Omegamatrix)

```;18 bytes, 30 cycles
sta  temp
lsr
lsr
ror
lsr
ror
lsr
ror
lsr
```

Nukey, thanks. I think that was one of the solutions I was looking for evaluation.

I found another shortcut

```   lda   value
sec
ldx   #\$ff
loop36:
inx
sbc   #36
bcs   loop36
stx   div36

ldx   #\$ff
loop3:
inx
sbc   #3
bcs   loop3
txa
ldx   div36
adc   tab12,x ; tab12: .byte 0,12,24
sta   div3
```

I'm using only the remainder of div36 for div3, to lower the number of loops needed to 12 max. This lower the number of cycles in the worst case from 263 to 129 by increasing ROM neede from 24 to 32. Fair tradeoff. This is something I can work with, if I don't figure out anything better.

But if I combine the subtract loop for div36 and omegamatrix' code for div3, I'll end up with 34 bytes and worst case of 65 cycles. Seems like the better deal.

Share on other sites

29 bytes, and 42 cycles (constant). Good for dividing values of 0 to 107.

```;Divide by 3
;18 bytes, 30 cycles
sta  temp
lsr
lsr
ror
lsr
ror
lsr
ror
lsr

;Continue routine for divide by 36
;11 bytes 12 cycles
tax      ; X = div3
lda  #0
cpx  #12
cpx  #24
; A = div36
```

More of my unsigned integer division routines can be found here.

Share on other sites

I took a look, and for the the divide by 3 can be shortened for a limited range. So now it's 27 bytes and 40 cycles (constant), and valid over all for dividing values of 0 to 107.

```;Divide by 3 (can be used for 0-128 range)
;16 bytes, 28 cycles
sta  temp
lsr
lsr
ror
lsr
ror
lsr
ror
lsr

;Continue routine for dividing by 36 (good for 0-107 only)
;11 bytes 12 cycles
tax      ; X = div3
lda  #0
cpx  #12
cpx  #24
; A = div36
```
Share on other sites

Another optimization, a trick I discovered long ago and forgotten for this particular situation. 26 bytes and 40 cycles.

```;Divide by 3 (can be used for 0-128 range)
;16 bytes, 28 cycles
sta  temp
lsr
lsr
ror
lsr
ror
lsr
ror
lsr

;Continue routine for dividing by 36 (good for 0-107 only)
;10 bytes 12 cycles
tax      ; X = div3
lda  #0
cpx  #12
rol
cpx  #24
; A = div36
```
Share on other sites

And by switching A and X for the results, I can even optimize that to 24 bytes and 38 cycles:

```; prepare (division code suggested by omagematrix)
lda   xpos
;Divide by 3 (can be used for 0-107 range)
;16 bytes, 28 cycles
sta   temp
lsr
lsr
ror
lsr
ror
lsr
ror
lsr
; A = div3

;Continue routine for dividing by 36 (good for 0-107 only)
;8 bytes 10 cycles
ldx   #0
cmp   #12
inx
cmp   #24
inx
; X = div36

```

Which turns out even better this way, because I have to run a subtraction with the div3 value.

Edit: never write posts just after getting out of bed... The "inx" don't work of course... and fixing that with "bcc"s wil result in code that's slower and takes more ROM.

Edited by SvOlli
Share on other sites

On a hunch I tried another optimization, and it saves 3 cycles. It just makes the 0-95 criteria.

26 bytes, and 37 cycles overall.

```;Divide by 3 (good for 0-95 range only)
;16 bytes, 25 cycles
cmp  #33
sta  temp
lsr
lsr
ror
lsr
ror
lsr

;Continue routine for dividing by 36 (good for 0-107 range only)
;10 bytes 12 cycles
tax      ; X = div3
lda  #0
cpx  #12
rol
cpx  #24
; A = div36
```
Share on other sites

Thanks again. But I'm sticking with with "26 bytes and 40 cycles" versions, because what comes in via A has been calculated and needs to be stored to RAM anyway for later use. And since the "adc temp" doesn't change anything at "temp", I don't need a temporary storage, but just can "reuse" the original one.

Share on other sites

The "rol" instead of the first "adc #\$00" is a very nice trick to save 1 byte... included.

Share on other sites

I'm going to present one more version. It saves 3 bytes but costs 16 more cycles. Use whatever routine that makes the most sense for the situation.

This one is 23 bytes and 56 cycles (constant cycles) overall.

```;Divide by 3 (can be used for 0-128 range)
;13 bytes, 44 cycles
sta  temp
lsr
lsr

ldx  #3
.loopFinishDiv3:
ror
lsr
dex
bne  .loopFinishDiv3

;Continue routine for dividing by 36 (good for 0-107 only)
;10 bytes 12 cycles
tax      ; X = div3
lda  #0
cpx  #12
rol
cpx  #24
; A = div36
```
Share on other sites

Also.... I always post and realize something else. If Y is available save 1 more byte.

So 22 bytes and 56 cycles (constant).

```;Divide by 3 (can be used for 0-128 range)
;13 bytes, 44 cycles
sta  temp
lsr
lsr

ldy  #3
.loopFinishDiv3:
ror
lsr
dey
bne  .loopFinishDiv3

;Continue routine for dividing by 36 (good for 0-107 only)
;9 bytes 12 cycles
tax      ; X = div3
tya      ; A=0
cpx  #12
rol
cpx  #24
; A = div36
```
Share on other sites

Here's a 21 byte version. It adds another 9 cycles but hey if you got the cycles to spare then it's another byte saved. Use whatever routine works.

21 bytes and 65 cycles (constant) overall.

```;Divide by 3 (can be used for 0-128 range)
;12 bytes, 53 cycles
ldy  #4
sta  temp
.byte \$0C ; NOP skip 2 bytes
.loopFinishDiv3:
lsr
lsr
dey
bne  .loopFinishDiv3

;Continue routine for dividing by 36 (good for 0-107 only)
;9 bytes 12 cycles
tax      ; X = div3
tya      ; A=0
cpx  #12
rol
cpx  #24
; A = div36
```
Share on other sites

• 2 weeks later...

Share on other sites

• 1 year later...

Divide by 3 (Omegamatrix)

```;18 bytes, 30 cycles
sta  temp
lsr
lsr
ror
lsr
ror
lsr
ror
lsr
```

Hello..

I'm trying to divide the 160 color clocks by 6. If I wanted to divide by 6, should I just put an lsr right at the top of this (before storing to temp)? Should I do anything with the carry?

Share on other sites

Hello..

I'm trying to divide the 160 color clocks by 6. If I wanted to divide by 6, should I just put an lsr right at the top of this (before storing to temp)? Should I do anything with the carry?

Can't say I've studied on it any, but I'd just add an lsr at the end, that way you keep the most bits for the divide by 3

Share on other sites

Could this work ?

After line 192, you VBLANK, then :

-init Y at zero (2) and A to proper collision mask (2)

-position a missile at pixel X (11 cycles)

-position a ball at start of line (3 cycles)

-set HMBL at 3 (5 cycles)

-at each new scanline, HMOVE (3 cycles) and increment Y (2) until collision detected (6/5 cycles)

total : 28 + result * 11, max about 380 cycles, but definitively more fun.

Note you can divide many number at once using this techic, or divide a number by severals number, saving many cycles.

Edited by Windless
Share on other sites

Divide by 3 (Omegamatrix)

```;18 bytes, 30 cycles
sta  temp
lsr
lsr
ror
lsr
ror
lsr
ror
lsr
```

Well I've looked at this multiple times this week, manually worked through it with various values and it always works. I made a spreadsheet to try to observe what happens, why and when.. and I still have no solid idea why this works. My guess is that the adc #\$15 (%00010101) is seeding the value to affect the carry for some reason and that repeatedly dividing by four, then adding the original amount back in, somehow works out to dividing by 3. Also thinking that binary 3 behaves differently than decimal 3, being that its a fraction of the base in decimal, but 1.5 times the base in binary.

Would someone care to comment it so I might get a better grip on it? Also, did I get any of it right?

Thanks

Share on other sites

I didn't try to work it out but..

1/3 is 1/4 + 1/16 + 1/64 + 1/256 ...

You're multiplying by reciprocal 3

\$15 is the first three of those

That will (roughly) have the effect of rounding up if the fractional part is > .66

ie you're multiplying by .3333... and adding .333 (approximately, I've not accounted for truncation and rounding)

Edited by bogax
Share on other sites

19 hours ago, bogax said:

1/3 is 1/4 + 1/16 + 1/64 + 1/256 ...

Ahh thanks, that was the missing piece.  I saw Thomas mention something like that but it didn’t click because I had no idea that was a thing.  Everyone here seems to know it without having to say.  All those years of math in school and I don’t recall ever seeing it.  Here’s my interpretation:

```Div6:	SUBROUTINE 	; divide player position by 6
; rors are necessary for larger numbers
sta  temp		; Store the value for recycling. 1/3 is the  convergence of 1/4 + 1/16 + 1/64 + 1/256...
lsr				; Shift significant bits (of 2 bit pairs) over 1
adc  #\$15		; = %00010101. Round up if the 2 bit pair was 3?
lsr				; Now we are at 1/4 the original number.  The results here will be divided 3 more times for the 1/256 portion
adc  temp		; Add that to the original number for round 2 (Original # plus 1/4)
ror				;
lsr				; Divide that by 4.  The results here will be divided 2 more times for the 1/64 portion
adc  temp		; Add that to the total number for round 3 (Original # plus 1/4 plus 1/16)
ror				;
lsr				; Divide that by 4.  The results here will be divided 1 more time for the 1/16 portion
adc  temp		; Add that to the total number for round 4 (Original # plus 1/4 plus 1/16 plus 1/64)
ror				;
lsr				; Divide that by 4 for final result.  Original can be discarded now. Result is 1/4 plus 1/16 plus 1/64 plus 1/256
;Make it divide by 6
lsr				; ```

Edited by BNE Jeff
Fractions were wrong

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

×   Pasted as rich text.   Paste as plain text instead

Only 75 emoji are allowed.