More efficient sub-pixel movement

GroovyBee · March 8, 2013

On the CP1610 it's convenient to express an 8.8 fixed point number in the form :-

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| M | M | M | M | M | M | M | M | F | F | F | F | F | F | F | F |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Where M are bits of the magnitude (the integer part) and F are the bits of the fractional part.

Which can lead you to write code like this for movement in the game's main loop :-

; Move right
   mvi playerX, r0
   addi #$0180, r0    ; Move 1.5 pixels right in X.
   mvo r0, playerX
   andi #$FF00, r0
   cmpi #SOME_LIMIT, r0
; unsigned 16bit compares follow e.g. bc/bnc
...

And this in the VBLANK ISR :-

   mvi thePlayerX, r0
   swap r0            ; Move magnitude into position.
   andi #$00FF, r0    ; Magnitude only.
   xori #STIC.mobx_visb+STIC.mobx_intrm, r0
   mvo r0, STIC.mob0_x
....
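
For anyone who wants to check the numbers without an assembler, here is a rough Python model of the conventional 8.8 layout used above. It is not CP1610 code. The helper names are made up for illustration, and the VISB/INTR values assume bits 9 and 8 of the MOB X register, as in the register diagram further down.

```python
MASK16 = 0xFFFF
VISB = 0x0200   # assumed: visibility flag, bit 9 of the MOB X register
INTR = 0x0100   # assumed: interaction flag, bit 8 of the MOB X register

def to_fixed_8_8(pixels):
    """Encode a non-negative pixel count as MMMMMMMM.FFFFFFFF."""
    return int(round(pixels * 256)) & MASK16

def swap(word):
    """Model of SWAP: exchange the two bytes of a 16-bit word."""
    return ((word << 8) | (word >> 8)) & MASK16

def move_right(pos, velocity):
    """Model of ADDI: a plain 16-bit add, the fraction carries into the magnitude."""
    return (pos + velocity) & MASK16

def mob_x_from_8_8(pos):
    """Model of the ISR sequence: SWAP, ANDI #$00FF, XORI with the flag bits."""
    return (swap(pos) & 0x00FF) ^ (VISB | INTR)

pos = to_fixed_8_8(10)            # $0A00, i.e. 10.0
pos = move_right(pos, 0x0180)     # $0B80, i.e. 11.5
pos = move_right(pos, 0x0180)     # $0D00, i.e. 13.0
print(hex(pos), hex(mob_x_from_8_8(pos)))
```

Two moves of 1.5 pixels take X from 10 to 13, and the ISR model produces the magnitude with both flag bits set.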

So how about we optimise the code? Let's start in the ISR :-

1) Let's get rid of the "swap r0". We can do this by making the magnitude part the Least Significant Byte (LSB) and the fraction part the Most Significant Byte (MSB) in the movement variables we operate on. The only downside is that we now have to manually account for the carry generated when adding to the fractional part of the variable. The main loop code now becomes :-

; Move right
   mvi playerX, r0
   addi #$8001, r0    ; Move 1.5 pixels right in X.
   adcr r0            ; Account for fractional carry.
   mvo r0, playerX
   andi #$00FF, r0
   cmpi #SOME_LIMIT, r0
; Use signed branches because the data to compare against is a byte in size.
...

And this in the VBLANK ISR :-

   mvi thePlayerX, r0
   andi #$00FF, r0    ; Magnitude only.
   xori #STIC.mobx_visb+STIC.mobx_intrm, r0
   mvo r0, STIC.mob0_x
....

So... we haven't really got rid of the swap; we've exchanged it for another opcode (the adcr in the main loop).
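
To see why the "adcr" is needed once the fraction lives in the MSB, here is a small Python model of the add (a sketch only, with made-up helper names). The carry out of bit 15 is the fraction overflowing, so it has to be fed back into bit 0 of the magnitude.

```python
MASK16 = 0xFFFF

def add_with_end_around_carry(pos, velocity):
    """Model of ADDI followed by ADCR on FFFFFFFF.MMMMMMMM words."""
    total = pos + velocity
    carry = total >> 16                  # carry out of bit 15, the top fraction bit
    return ((total & MASK16) + carry) & MASK16

def magnitude(pos):
    """Whole-pixel part, held in the low byte."""
    return pos & 0x00FF

def fraction(pos):
    """Fractional part, held in the high byte, as a fraction of a pixel."""
    return (pos >> 8) / 256

pos = 0x000A                                      # 10.0
first = add_with_end_around_carry(pos, 0x8001)    # 11.5 -> $800B
second = add_with_end_around_carry(first, 0x8001) # 13.0 -> $000D
print(hex(first), hex(second))
```

The second add overflows the fraction, and the end-around carry is what turns 12 into 13.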

2) In arcade games 8 fractional bits of movement is probably a bit of overkill, so let's reduce that part to, say, 5 bits so we now have a fixed point word format like this :-

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| F | F | F | F | F | X | X | X | M | M | M | M | M | M | M | M |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Where F is the fractional part, X are don't care and M is the magnitude. Hmmmm.... That layout looks strangely familiar. It matches the layout of the STIC's MOB X register exactly! :-

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   |   |   |   |   | X | V | I |   |   |   |   |   |   |   |   |
|   |   |   |   |   |   | I | N |   |   |   | X |   |   |   |   |
| X | X | X | X | X | S | S | T |   | C | O | O | R | D |   |   |
|   |   |   |   |   | I | B | R |   |   |   |   |   |   |   |   |
|   |   |   |   |   | Z |   |   |   |   |   |   |   |   |   |   |
|   |   |   |   |   | E |   |   |   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

So the main loop game code becomes :-

; Move right
   mvi playerMobXShadow, r0
   addi #$8001, r0    ; Move 1.5 pixels right in X.
   adcr r0            ; Account for fractional carry.
   mvo r0, playerMobXShadow
   andi #$00FF, r0
   cmpi #SOME_LIMIT, r0
; Use signed branches because the data to compare against is a byte in size.
...

And the VBLANK ISR can be optimised down to :-

   mvii #theListOfXCoordinates, r4
   mvii #STIC.mob0_x, r5

; Handle the list of X coordinates.
   mvi@ r4, r0
   mvo@ r0, r5
   mvi@ r4, r0
   mvo@ r0, r5
   mvi@ r4, r0
   mvo@ r0, r5
...

The variable playerMobXShadow, located in system RAM (16 bits wide), would be initialised at game set-up time with the X position, visibility, interaction and size all at the same time. Although you end up writing the fractional part of the movement to the STIC in the ISR, the STIC ignores those bits.
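
Here is a Python sketch of the shadow word idea (a model of the arithmetic, not the actual game code). The names `make_shadow` and `encode_velocity` are hypothetical, and the flag positions assume INTR, VISB and XSIZE sit in bits 8, 9 and 10. As long as the velocity has zeros in bits 8 to 10 and the magnitude stays in range, the flag bits pass through the add untouched.

```python
MASK16 = 0xFFFF
INTR, VISB, XSIZE = 0x0100, 0x0200, 0x0400   # assumed MOB X flag bits 8, 9, 10
FLAG_MASK = INTR | VISB | XSIZE

def make_shadow(x, flags):
    """Build the set-up time value: flags in bits 8-10, X in bits 0-7, fraction zero."""
    return (flags & FLAG_MASK) | (x & 0xFF)

def encode_velocity(whole, thirty_seconds):
    """Whole pixels in bits 0-7, fraction in steps of 1/32 pixel in bits 11-15."""
    return ((thirty_seconds & 0x1F) << 11) | (whole & 0xFF)

def move_right(shadow, velocity):
    """Model of ADDI then ADCR: add, then wrap the carry from bit 15 into bit 0."""
    total = shadow + velocity
    return ((total & MASK16) + (total >> 16)) & MASK16

shadow = make_shadow(10, VISB | INTR)          # $030A
velocity = encode_velocity(1, 16)              # 1 + 16/32 = 1.5 -> $8001
shadow = move_right(shadow, velocity)          # $830B
shadow = move_right(shadow, velocity)          # $030D
print(hex(shadow))

# The smallest step with 5 fraction bits is 1/32 pixel: 32 of them make one pixel.
slow = make_shadow(10, VISB | INTR)
for _ in range(32):
    slow = move_right(slow, encode_velocity(0, 1))
```

With 5 fraction bits the finest speed is 1/32 of a pixel per frame, which is why 1.5 still encodes as $8001.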

The major downside is that if you let the X coordinate part of the MOB X shadow overflow it does impact how the MOB looks. The good part of that is you get the feedback straight away. :lol:

The same principle can be applied to handling the STIC MOB Y shadow copies as well. In the case of Y the fractional part is only 4 bits.

It also means that you only keep direct shadow copies of the STIC X and Y registers, which cuts down on the number of variables and logic, especially if you are using the X and Y flip bits in the MOB Y register.

I almost forgot.... the code for left movement uses this :-

; Move left
   mvi playerMobXShadow, r0
   subi #$8001, r0    ; Move 1.5 pixels left in X.
   adcr r0            ; Account for fractional carry.
   decr r0
   mvo r0, playerMobXShadow
   andi #$00FF, r0
   cmpi #SOME_LIMIT, r0
; Use signed branches because the data to compare against is a byte in size.
...

I'll leave that as an exercise to the reader to understand.

DZ-Jay · March 8, 2013

Great stuff. By the way, I'm curious: why do you manipulate the STIC directly on VBLANK?

My game variables include game objects with sub-pixel positions and a STIC shadow structure. The STIC shadow is separated from the game objects and serves only to avoid having to do any register manipulation during VBLANK. I do all computations of game objects in the main game loop, and at the end synchronize them with the STIC shadow--still outside the VBLANK context. Then, I just block-copy the STIC shadow wholesale on VBLANK using counter registers (R4 and R5).

I have an idea to optimize this in my next engine by keeping a "dirty flag" for each MOB in a vector, so that I can just blast the registers of the necessary objects. (X, Y, and A registers would be copied atomically.)
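
As a rough illustration of the dirty flag idea, here is a Python sketch. It is only one guess at how such a scheme might look: the bitmask form of the "vector" and the function name `flush_dirty` are assumptions, and the register bases assume the X, Y and A registers for the 8 MOBs start at offsets $00, $08 and $10.

```python
STIC_X, STIC_Y, STIC_A = 0x00, 0x08, 0x10   # assumed register bases, 8 MOBs each

def flush_dirty(stic, shadow, dirty):
    """Copy X, Y and A together for every MOB whose dirty bit is set.

    Returns the cleared dirty mask so the caller can store it back.
    """
    for mob in range(8):
        if dirty & (1 << mob):
            for base in (STIC_X, STIC_Y, STIC_A):
                stic[base + mob] = shadow[base + mob]
    return 0

stic = [0] * 24                       # stand-in for the 24 MOB registers
shadow = list(range(100, 124))        # stand-in shadow values
dirty = flush_dirty(stic, shadow, 0b00000101)   # MOBs 0 and 2 changed this frame
print(stic)
```

Only MOBs 0 and 2 are copied, and each one gets all three of its registers in the same pass.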

It does use more RAM, though, which may not be a big deal with architectures like the JLP boards.

This says nothing on the efficiency of your algorithm, which is very neat and interesting.

-dZ.

Edited March 8, 2013 by DZ-Jay

intvnut · March 8, 2013

I use a variation of this trick in Space Patrol, although I use a full 8 bits of fraction. I store all my X and Y values as:

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| F | F | F | F | F | F | F | F | M | M | M | M | M | M | M | M |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

and then just treat my velocities as 1s complement numbers. That way I don't have separate "positive" and "negative" code branches. That works better for my overall code flow, where I have 12 objects I need to move and track. My main velocity update loop then looks like this:

       MVI@    R5,     R0          ;   8 Get velocity
       ADD@    R4,     R0          ;   8 Add velocity to position
       ADCR    R0                  ;   6 end-around carry for 1s compl

       MVI@    R5,     R1          ;   8 Get velocity
       ADD@    R4,     R1          ;   8 Add velocity to position
       ADCR    R1                  ;   6 end-around carry for 1s compl

       SUBR    R2,     R4          ;   6 Rewind
       MVO@    R0,     R4          ;   9 Store new position
       MVO@    R1,     R4          ;   9 Store new position

That, of course, gets repeated for all 12 objects. As I mentioned, X and Y get stored in the same format, so I just do them both the same way in this code.
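
A Python model of that update may help show why one code path covers both directions. This is a sketch with hypothetical helper names, not the Space Patrol source. A negative velocity is just the bitwise complement of the positive one, and the same add plus end-around carry handles it.

```python
MASK16 = 0xFFFF

def swapped(whole, frac256):
    """Build an FFFFFFFF.MMMMMMMM word from whole pixels and 1/256ths."""
    return ((frac256 & 0xFF) << 8) | (whole & 0xFF)

def negate(velocity):
    """1s complement negation: flip every bit."""
    return ~velocity & MASK16

def update(pos, velocity):
    """Model of ADD@ then ADCR: add with end-around carry."""
    total = pos + velocity
    return ((total & MASK16) + (total >> 16)) & MASK16

right = swapped(1, 128)        # +1.5 -> $8001
left = negate(right)           # -1.5 -> $7FFE

pos = swapped(10, 0)           # 10.0
pos = update(pos, left)        # 8.5  -> $8008
pos = update(pos, left)        # 7.0  -> $0007
pos = update(pos, right)       # 8.5  -> $8008
print(hex(pos))
```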

I don't make use of the trick you're doing, storing the MOB X flags in the X position, because I dynamically reassociate MOB attributes with object positions on a frame-to-frame basis. The code that handles merging X/Y coordinates with MOB attributes looks (approximately) like this:

       INCR    R2                  ;   6 Go to next attr entry
       MVI@    R2,     R5          ;   8 Get attribute #
       DECR    R5                  ;   6 Inactive object?
       BMI     @@gp1skip           ; 7/9 Yes:  Skip it.

       ADDI    #SPATBL,R5          ;   8 Index into global attribute table

       MVI@    R4,     R0          ;   8 Get x position
       ANDI    #$FF,   R0          ;   8
       ADD@    R5,     R0          ;   8 Merge w/ x-pos attr template
       MVO@    R0,     R3          ;   9 Store to MOB X position 

       ADDI    #8,     R3          ;   8 Move to MOB Y register

       MVI@    R4,     R0          ;   8 Get y position
       ANDI    #$FF,   R0          ;   8
       ADD@    R5,     R0          ;   8 Merge w/ y-pos attr template
       MVO@    R0,     R3          ;   9 Store to MOB Y position 

       ADDI    #8,     R3          ;   8 Move to MOB A register

       MVI@    R5,     R0          ;   8 Get attr register
       MVO@    R0,     R3          ;   9 Store to MOB A register

       SUBI    #15,    R3          ;   8 Go to next MOB X register

Yeah, there's annoying code to skip by 8/8/-15 among the X, Y and A registers. I pine for an indexed addressing mode...

If I had a tighter binding of MOBs to game objects (instead of muxing, like SP does), tricks like yours would definitely make things more efficient. Some add'l comments:

The major downside is that if you let the X coordinate part of the MOB X shadow overflow it does impact how the MOB looks. The good part of that is you get the feedback straight away.

You'd have to let it overflow by quite a lot before you could see it... INTR is the first bit you'd corrupt, and VISB is the next bit you'd corrupt.
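
A quick Python model of the overflow, assuming INTR, VISB and XSIZE are bits 8, 9 and 10 of the shadow word. The helper names are made up. It steps X from 255 to 256 with no range check and shows where the carry lands.

```python
MASK16 = 0xFFFF
INTR, VISB, XSIZE = 0x0100, 0x0200, 0x0400   # assumed MOB X flag bits 8, 9, 10

def describe(shadow):
    """Split a MOB X shadow word into (x, intr, visb, xsize)."""
    return (shadow & 0xFF, bool(shadow & INTR),
            bool(shadow & VISB), bool(shadow & XSIZE))

def step_right(shadow):
    """One whole pixel to the right with no range check."""
    return (shadow + 1) & MASK16

visible_only = step_right(VISB | 0xFF)                 # $02FF -> $0300
visible_interacting = step_right(VISB | INTR | 0xFF)   # $03FF -> $0400
print(describe(visible_only), describe(visible_interacting))
```

In this model the carry out of X lands on INTR first. If INTR was already set, the carry ripples on into VISB, so which flag changes depends on the flags you started with.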

       subi #$8001, r0         ; Move 1.5 pixels left in X.
       adcr r0                 ; Account for fractional carry.
       decr r0

So I guess you can't SUBI #$8002 here and eliminate the DECR? *headscratch*

One last:

       andi #$00FF, r0
       cmpi #SOME_LIMIT, r0

If you align SOME_LIMIT to the upper half of the word, you can save two cycles and one word of code size with:

       swap r0
       cmpi #SOME_LIMIT, r0
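
The equivalence of the two forms can be checked with a short Python model. This is a sketch that assumes an unsigned comparison (the bc/bnc style of branch) and a limit that fits in a byte; the function names are invented.

```python
MASK16 = 0xFFFF

def swap(word):
    """Model of SWAP: exchange the two bytes of a 16-bit word."""
    return ((word << 8) | (word >> 8)) & MASK16

def at_limit_masked(pos, limit):
    """ANDI #$00FF then CMPI #limit: compare the magnitude byte on its own."""
    return (pos & 0x00FF) >= limit

def at_limit_swapped(pos, limit):
    """SWAP then CMPI #(limit << 8): unsigned compare with the limit in the upper byte."""
    return swap(pos) >= (limit << 8)

LIMIT = 160   # hypothetical right-hand edge, in whole pixels
mismatches = [p for p in range(0x10000)
              if at_limit_masked(p, LIMIT) != at_limit_swapped(p, LIMIT)]
print(len(mismatches))
```

After the swap the fraction (and any flag bits) end up in the low byte, where they cannot change the outcome of an unsigned compare against a limit whose low byte is zero.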

Another advantage of putting X and Y into LSBs is that an SDBD read can slurp up X and Y together, if you've interleaved X and Y in memory. I use a combination of both tricks to make my bounding box collision detection go more quickly in SP. See this file: http://www.spacepatrol.info/src/engine/ckggb.asm

Edited March 8, 2013 by intvnut

GroovyBee · March 8, 2013

Great stuff. By the way, I'm curious: why do you manipulate the STIC directly on VBLANK?

Because of the RAM usage. Manipulating the STIC shadow copies directly means that you don't need extra RAM to hold position information. It also means that you don't need to use extra RAM to check if X/Y flip/size bits and visibility bits should be set/cleared. Rocketeer has 16 moving "objects" all of which need state information so system RAM is at a premium.

Gemintronic · March 8, 2013

I can't understand the assembly but I think this kind of movement code is interesting. I tend to think of movement in terms of frames. So, for instance, I'd call the enemy AI function every other frame to get the equivalent .5 pixel movement.

DZ-Jay · March 8, 2013

I can't understand the assembly but I think this kind of movement code is interesting. I tend to think of movement in terms of frames. So, for instance, I'd call the enemy AI function every other frame to get the equivalent .5 pixel movement.

Loon,

That works really well when your velocity is in multiples of the frame rate. So what do you do when your velocity is not constant due to acceleration?

GroovyBee · March 8, 2013

Thanks for the alternate viewpoints.

So I guess you can't SUBI #$8002 here and eliminate the DECR? *headscratch*

Not really. If you consider this code sequence to subtract 3.5 from 30 three times :-

   mvii #30, r0
   mvii #$8003, r1
   subr r1, r0
   adcr r0
   subr r1, r0
   adcr r0
   subr r1, r0
   adcr r0

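The "answer" in r0 isn't 19.5 like you'd expect; it's 22.5, because on the CP1610 a subtract sets carry when there is no borrow, so the "adcr" adds one to the magnitude on every step that does not borrow from the fraction.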

   mvii #30, r0
   mvii #$8003, r1
   subr r1, r0
   adcr r0
   decr r0
   subr r1, r0
   adcr r0
   decr r0
   subr r1, r0
   adcr r0
   decr r0

The "answer" in r0 is 19.5 like you'd expect. The addition of the "decr" instruction means the magnitude is adjusted correctly when a carry occurs.
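
Both sequences can be replayed in a small Python model (a sketch with made-up helper names). The one CPU detail it relies on is that a CP1610 subtract leaves carry set when no borrow occurs, which matches the jzIntv trace later in the thread.

```python
MASK16 = 0xFFFF

def sub_then_adcr(pos, velocity):
    """Model of SUBR then ADCR; carry is set when the subtract does NOT borrow."""
    carry = 1 if pos >= velocity else 0
    diff = (pos - velocity) & MASK16
    return (diff + carry) & MASK16

def decr(value):
    """Model of DECR."""
    return (value - 1) & MASK16

def as_pixels(pos):
    """Decode an FFFFFFFF.MMMMMMMM word into a pixel value."""
    return (pos & 0xFF) + (pos >> 8) / 256

without_decr = 30
with_decr = 30
for _ in range(3):
    without_decr = sub_then_adcr(without_decr, 0x8003)
    with_decr = decr(sub_then_adcr(with_decr, 0x8003))

print(as_pixels(without_decr), as_pixels(with_decr))
```

Without the "decr" each pass removes 3 whole pixels only when the fraction borrows and 2 otherwise, so the net step is 2.5 rather than 3.5.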

GroovyBee · March 8, 2013

I can't understand the assembly but I think this kind of movement code is interesting. I tend to think of movement in terms of frames. So, for instance, I'd call the enemy AI function every other frame to get the equivalent .5 pixel movement.

When you have a bunch of slow moving objects you then need lots of "action counters" to check if it's their time to move using this approach, which takes CPU resources and RAM that may be in short supply. Fractional movement also helps NTSC to PAL (and vice versa) conversions because you can adjust the amount an object moves in X and Y on a per frame basis. That way it'll appear to play at the same speed on both systems.
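
As an example of the NTSC/PAL point, here is a Python sketch that turns a speed in pixels per second into a per-frame velocity word in the swapped 8-bit-fraction layout. The function name is hypothetical and the frame rates are the nominal 60 and 50 Hz, which is an assumption; the real rates differ slightly.

```python
def per_frame_velocity(pixels_per_second, frames_per_second):
    """Return a per-frame velocity as an FFFFFFFF.MMMMMMMM word."""
    scaled = round(pixels_per_second * 256 / frames_per_second)  # in 1/256 pixels
    whole, frac = scaled >> 8, scaled & 0xFF
    return (frac << 8) | (whole & 0xFF)

ntsc = per_frame_velocity(90, 60)   # 1.5 pixels per frame
pal = per_frame_velocity(90, 50)    # 1.8 pixels per frame, rounded to 1/256
print(hex(ntsc), hex(pal))
```

The game logic stays the same on both systems; only the table of velocity constants changes.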

intvnut · March 8, 2013

The "answer" in r0 is 19.5 like you'd expect. The addition of the "decr" instruction means the magnitude is adjusted correctly when a carry occurs.

I think we're talking past each other. I'm saying "SUBI #$8001, R0; ADCR R0; DECR R0" should always produce the same result as "SUBI #$8002, R0; ADCR R0", because neither the SUBI nor the DECR R0 can spill unwanted carries/borrows from the lower byte into the upper byte due to how you've constrained your numbers. Here's a quick test I did in jzIntv just to see:

0000 7000 0000 0000 01FE 103D 02F1 7001 ---Z--iq  SUBI #$8001,R0          660
7FFF 7000 0000 0000 01FE 103D 02F1 7003 ------iq  ADCR R0                 668
7FFF 7000 0000 0000 01FE 103D 02F1 7004 ------iq  DECR R0                 674
7FFE 7000 0000 0000 01FE 103D 02F1 7005 ------iq  SUBI #$8001,R0          680
FFFD 7000 0000 0000 01FE 103D 02F1 7007 S-O---iq  ADCR R0                 688
FFFD 7000 0000 0000 01FE 103D 02F1 7008 S-----iq  DECR R0                 694
FFFC 7000 0000 0000 01FE 103D 02F1 7009 S-----iq  SUBI #$8001,R0          700
7FFB 7000 0000 0000 01FE 103D 02F1 700B -C----iq  ADCR R0                 708
7FFC 7000 0000 0000 01FE 103D 02F1 700C ------iq  DECR R0                 714
7FFB 7000 0000 0000 01FE 103D 02F1 700D ------iq  CLRR R0                 720
0000 7000 0000 0000 01FE 103D 02F1 700E ---Z--iq  SUBI #$8002,R0          726
7FFE 7000 0000 0000 01FE 103D 02F1 7010 ------iq  ADCR R0                 734
7FFE 7000 0000 0000 01FE 103D 02F1 7011 ------iq  SUBI #$8002,R0          740
FFFC 7000 0000 0000 01FE 103D 02F1 7013 S-O---iq  ADCR R0                 748
FFFC 7000 0000 0000 01FE 103D 02F1 7014 S-----iq  SUBI #$8002,R0          754
7FFA 7000 0000 0000 01FE 103D 02F1 7016 -C----iq  ADCR R0                 762
7FFB 7000 0000 0000 01FE 103D 02F1 7017 ------iq  HLT                     768

Both sequences produced the same final results $7FFE, $FFFC, $7FFB.
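
The same comparison can be run over every 16-bit starting value with a Python model (a sketch, assuming carry is set when a subtract does not borrow, as the trace above shows). The function names are made up.

```python
MASK16 = 0xFFFF

def sub_then_adcr(value, constant):
    """Model of SUBI then ADCR; carry is set when the subtract does NOT borrow."""
    carry = 1 if value >= constant else 0
    return (((value - constant) & MASK16) + carry) & MASK16

def with_decr(value):
    """SUBI #$8001, R0 ; ADCR R0 ; DECR R0"""
    return (sub_then_adcr(value, 0x8001) - 1) & MASK16

def folded(value):
    """SUBI #$8002, R0 ; ADCR R0"""
    return sub_then_adcr(value, 0x8002)

# Replay the three steps from the trace, starting at zero.
trace_a, trace_b, a, b = [], [], 0, 0
for _ in range(3):
    a, b = with_decr(a), folded(b)
    trace_a.append(a)
    trace_b.append(b)

# Then look for any starting value where the two forms disagree.
differs = [v for v in range(0x10000) if with_decr(v) != folded(v)]
print([hex(v) for v in trace_a], [hex(v) for v in differs])
```

In this model the only starting value where the two forms disagree is $8001 (X = 1, fraction .5, every flag bit clear), where one gives $0000 and the other $FFFF. A shadow word with VISB set can never hold that value.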

Edited March 8, 2013 by intvnut

GroovyBee · March 8, 2013

I think we're talking past each other.

Sorry! I get what you're saying now. We are both correct. It's a failure of my contrived example because the constant I subtracted could be the address of a variable like acceleration (or some such) so adding/subtracting one to the final acceleration value elsewhere for MOB movement purposes may not be ideal either.

intvnut · March 8, 2013

Sorry! I get what you're saying now. We are both correct. It's a failure of my contrived example because the constant I subtracted could be the address of a variable like acceleration (or some such) so adding/subtracting one to the final acceleration value elsewhere for MOB movement purposes may not be ideal either.

Ah, that makes perfect sense now. :-)

BTW, I have to admit that storing the flags in this particular way isn't something that's occurred to me. (Or if it has, I've forgotten. Happens more than I'd like these days.) Anyway, I do definitely like it, and I'll be adding it to my bag of tricks.

I hope my initial post above didn't come across the wrong way. I just wanted to share a related set of tricks. This is good stuff.

Edited March 8, 2013 by intvnut

GroovyBee · March 8, 2013

Ah, that makes perfect sense now. :-)

No worries. I should have used an address and not a constant which would have made it less ambiguous.

BTW, I have to admit that storing the flags in this particular way isn't something that's occurred to me. (Or if it has, I've forgotten. Happens more than I'd like these days.) Anyway, I do definitely like it, and I'll be adding it to my bag of tricks.

Agreed! It's always good to have new tricks to call on.

I hope my initial post above didn't come across the wrong way. I just wanted to share a related set of tricks. This is good stuff.

Nah! Don't worry! Discussion is all good if it brings new concepts and (hopefully) games to the Inty table.

DZ-Jay · January 4, 2015

Another advantage of putting X and Y into LSBs is that an SDBD read can slurp up X and Y together, if you've interleaved X and Y in memory. I use a combination of both tricks to make my bounding box collision detection go more quickly in SP. See this file: http://www.spacepatrol.info/src/engine/ckggb.asm

It's time to bump this thread back into the light, for I am currently implementing the newest version of the P-Machinery sprite driver.

Now to the point above, I didn't know that if you used SDBD on 16-bit RAM it will actually just read the LSB of each word... that is actually quite useful!

My main concern in using the technique described in this thread is that it would make it more costly to use the position information for anything other than movement, such as collision/edge detection. However, intvnut's suggestion of using SDBD to slurp X and Y together (which I missed on my first read of this thread a year ago) seems to compensate for that to some extent.
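
For reference, here is a Python model of what an SDBD read does with two consecutive 16-bit words. It is a sketch: the function name is invented, and it assumes X is stored immediately before Y and that the first word supplies the low byte of the result.

```python
def sdbd_read(memory, address):
    """Model of SDBD + MVI@: take the low byte of two consecutive words.

    The word at `address` supplies bits 0-7 of the result and the word at
    `address + 1` supplies bits 8-15; the upper bytes of both are ignored.
    """
    low = memory[address] & 0xFF
    high = memory[address + 1] & 0xFF
    return (high << 8) | low

# X and Y interleaved, fraction (and any flag bits) in the upper bytes.
memory = [0x8312, 0x4034]      # X = $12 with fraction/flags $83, Y = $34 with $40
packed = sdbd_read(memory, 0)  # Y in the upper byte, X in the lower byte
print(hex(packed))
```

The fractions and flags drop out for free, leaving a packed Y/X word ready for comparisons.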

The other question I have is regarding the treatment of velocities as 1's complement. I don't quite get what that gains us, could someone please expand on this?

-dZ.

intvnut · January 4, 2015

The other question I have is regarding the treatment of velocities as 1's complement. I don't quite get what that gains us, could someone please expand on this?

It allows you to add the velocity without doing a SWAP/ADD/SWAP or adding guard bits. In the swapped representation, you have three main options for adding the velocity that I can think of (others chime in if there's something I don't think of here):

1) SWAP back to normal representation, add, SWAP back.
2) Put guard bits between lower and upper halves. IIRC, that requires ADD, ADCR and AND to do the add.
3) Use 1s complement addition to add the velocity. The velocity add is just ADD, ADCR, but negative velocities need to be adjusted down by 1 ahead of time (0xFFFF is "minus zero"). Or, you live with the slight 1/256 bias.
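
The "adjusted down by 1" remark in the 1s complement option can be made concrete with a Python model. This is a sketch with hypothetical helper names. It starts from -1.5 as an ordinary 2s complement 8.8 value, then compares using it as is against subtracting 1 first.

```python
MASK16 = 0xFFFF

def swap(word):
    """Exchange the two bytes of a 16-bit word."""
    return ((word << 8) | (word >> 8)) & MASK16

def eac_add(pos, velocity):
    """ADD then ADCR: add with end-around carry."""
    total = pos + velocity
    return ((total & MASK16) + (total >> 16)) & MASK16

def as_pixels(pos):
    """Decode an FFFFFFFF.MMMMMMMM word into a pixel value."""
    return (pos & 0xFF) + (pos >> 8) / 256

twos = (-0x0180) & MASK16                 # -1.5 in normal 8.8 form: $FE80
biased = swap(twos)                       # used as is: $80FE
adjusted = swap((twos - 1) & MASK16)      # adjusted down by 1: $FE7F -> $7FFE

start = 0x000A                            # 10.0
print(as_pixels(eac_add(start, biased)), as_pixels(eac_add(start, adjusted)))
```

The unadjusted velocity lands 1/256 of a pixel high on each add, which is the bias you either correct ahead of time or live with.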

As far as collision / edge detection, you can do that with packed arithmetic too. That's what I do in Space Patrol.

@@checkmob:                           
        MVI@    R5,     R3      ;   8  Get attr for next mob
        DECR    R3              ;   6  Zero?  Skip to next.
        BMI     @@skipmob       ; 7/9
        ADDI    #SPATBL+3, R3   ;   8  Offset to 'size' info in MOB record
                                      
        SDBD                    ;   4      
        MVI@    R4,     R1      ;  10  Get Y/X coordinate
                                      
        MOVR    R1,     R2      ;   6
                                      
        SUB     GGB1,   R1      ;  10  Check the lower right corner
        BNC     @@out1          ; 7/9  If Y went -ve, bullet's to the right
        SWAP    R1,     1       ;   6  
        SWAP    R1,     1       ;   6  
        BMI     @@out1          ; 7/9  If X went -ve, bullet's below

catsfolly · January 5, 2015

DZ-Jay Said:

The other question I have is regarding the treatment of velocities as 1's complement.

Besides, treating velocities as "one's insult" is just so, well, negative. Velocities deserve better treatment than this.

Compliments are nicer than insults. ;-)

If "Two's a complement",

does that mean that "Three's just flattery"?

And "Four's a bunch of yes men"?

So "five's a full house"?

And "six's a hex of a gong"?

Sorry, my brain is tired and I have nothing constructive to ADD,

Catsfolly
