GroovyBee, posted March 8, 2013:

On the CP1610 it's convenient to express an 8.8 fixed-point number in the form :-

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | M | M | M | M | M | M | M | M | F | F | F | F | F | F | F | F |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Where M are the bits of the magnitude (the integer part) and F are the bits of the fractional part. That can lead you to write code like this for movement in the game's main loop :-

    ; Move right
        mvi     playerX, r0
        addi    #$0180, r0          ; Move 1.5 pixels right in X.
        mvo     r0, playerX
        andi    #$FF00, r0
        cmpi    #SOME_LIMIT, r0     ; unsigned 16bit compares follow e.g. bc/bnc
        ...

And this in the VBLANK ISR :-

        mvi     thePlayerX, r0
        swap    r0                  ; Move magnitude into position.
        andi    #$00FF, r0          ; Magnitude only.
        xori    #STIC.mobx_visb+STIC.mobx_intrm, r0
        mvo     r0, STIC.mob0_x
        ....

So how about we optimise the code? Let's start in the ISR :-

1) Let's get rid of the "swap r0". We can do this by making the magnitude part the Least Significant Byte (LSB) and the fraction part the Most Significant Byte (MSB) in the movement variables we operate on. The only downside is that we now have to manually account for the carry generated when adding to the fractional part of the variable. The main loop code now becomes :-

    ; Move right
        mvi     playerX, r0
        addi    #$8001, r0          ; Move 1.5 pixels right in X.
        adcr    r0                  ; Account for fractional carry.
        mvo     r0, playerX
        andi    #$00FF, r0
        cmpi    #SOME_LIMIT, r0     ; Use signed branches because the data to compare against is a byte in size.
        ...

And this in the VBLANK ISR :-

        mvi     thePlayerX, r0
        andi    #$00FF, r0          ; Magnitude only.
        xori    #STIC.mobx_visb+STIC.mobx_intrm, r0
        mvo     r0, STIC.mob0_x
        ....

So... we haven't really got rid of the swap; we've exchanged it for another opcode (the adcr in the main loop).
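To make the end-around carry in step 1 concrete, here is a minimal Python sketch that models the `addi`/`adcr` pair on a 16-bit register. It only models the arithmetic and is not CP1610 code; the helper names `add_swapped` and `to_pixels` are hypothetical, and it assumes the magnitude byte never overflows.

```python
MASK16 = 0xFFFF


def add_swapped(pos, delta):
    """Model `addi #delta, r0` followed by `adcr r0`.

    Layout assumed from the post: fraction in bits 15..8, magnitude in bits 7..0.
    """
    total = pos + delta
    carry = total >> 16                          # carry out of the fraction byte
    return ((total & MASK16) + carry) & MASK16   # adcr feeds it back into bit 0


def to_pixels(pos):
    """Decode a swapped 8.8 word into a float, for inspection only."""
    return (pos & 0xFF) + (pos >> 8) / 256.0


x = 0x0010                   # 16.0
x = add_swapped(x, 0x8001)   # 0x8011, i.e. 17.5
x = add_swapped(x, 0x8001)   # fraction overflows, carry bumps magnitude: 0x0013
print(hex(x), to_pixels(x))
```

The second step is where `adcr` matters: the two halves of the fraction sum to a whole pixel, and the carry out of bit 15 is what delivers that pixel to the magnitude byte.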
2) In arcade games 8 fractional bits of movement is probably a bit of overkill, so let's reduce that part to, say, 5 bits. We now have a fixed-point word format like this :-

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | F | F | F | F | F | X | X | X | M | M | M | M | M | M | M | M |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Where F is the fractional part, X are don't care and M is the magnitude. Hmmmm.... That layout looks strangely familiar. It matches the layout of the STIC's MOB X register exactly! :-

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    |   |   |   |   |   | X | V | I |   |   |   |   |   |   |   |   |
    |   |   |   |   |   |   | I | N |   |   |   | X |   |   |   |   |
    | X | X | X | X | X | S | S | T |   | C | O | O | R | D |   |   |
    |   |   |   |   |   | I | B | R |   |   |   |   |   |   |   |   |
    |   |   |   |   |   | Z |   |   |   |   |   |   |   |   |   |   |
    |   |   |   |   |   | E |   |   |   |   |   |   |   |   |   |   |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

So the main loop game code becomes :-

    ; Move right
        mvi     playerMobXShadow, r0
        addi    #$8001, r0          ; Move 1.5 pixels right in X.
        adcr    r0                  ; Account for fractional carry.
        mvo     r0, playerMobXShadow
        andi    #$00FF, r0
        cmpi    #SOME_LIMIT, r0     ; Use signed branches because the data to compare against is a byte in size.
        ...

And the VBLANK ISR can be optimised down to :-

        mvii    #theListOfXCoordinates, r4
        mvii    #STIC.mob0_x, r5

    ; Handle the list of X coordinates.
        mvi@    r4, r0
        mvo@    r0, r5
        mvi@    r4, r0
        mvo@    r0, r5
        mvi@    r4, r0
        mvo@    r0, r5
        ...

The variable playerMobXShadow, located in system RAM (16 bits wide), would be initialised at game set-up time with the X position, visibility, interaction and size all at the same time. Although you end up writing the fractional part of the movement to the STIC in the ISR, it ignores those bits. The major downside is that if you let the X coordinate part of the MOB X shadow overflow, it does impact how the MOB looks. The good part of that is you get the feedback straight away.
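The shadow-word idea in step 2 can be modelled in a few lines of Python. This is a sketch under assumptions: the bit positions are read off the register diagram in the post (X coordinate in bits 7..0, INTR in bit 8, VISB in bit 9, X SIZE in bit 10, fraction in bits 15..11), and `make_shadow` and `move_right` are hypothetical helper names.

```python
INTR = 1 << 8        # interaction bit, per the diagram in the post
VISB = 1 << 9        # visibility bit
XSIZE = 1 << 10      # X size bit
FRAC_SHIFT = 11      # the 5 fraction bits live in bits 15..11


def make_shadow(x, frac5=0, flags=VISB | INTR):
    """Build a MOB X shadow word: fraction, flags and coordinate in one word."""
    return ((frac5 & 0x1F) << FRAC_SHIFT) | flags | (x & 0xFF)


def move_right(shadow, delta):
    """Model `addi #delta, r0` followed by `adcr r0`."""
    total = shadow + delta
    return ((total & 0xFFFF) + (total >> 16)) & 0xFFFF


shadow = make_shadow(100)              # 0x0364
shadow = move_right(shadow, 0x8001)    # 0x8365: X = 101, fraction = 0.5
shadow = move_right(shadow, 0x8001)    # 0x0367: X = 103, flags untouched

# Letting the coordinate run past 255 spills into the flag bits.
edge = make_shadow(255, flags=VISB)    # 0x02FF
spilled = move_right(edge, 0x0001)     # 0x0300: X wrapped to 0, INTR now set
```

Because $8000 sets only bit 15, the half-pixel step never touches bits 10..8, so the flags survive as long as the coordinate stays within its byte.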
The same principle can be applied to handling the STIC MOB Y shadow copies as well. In the case of Y the fractional part is only 4 bits. It also means that you only keep direct shadow copies of the STIC X and Y registers, which cuts down on the number of variables and logic, especially if you are using the X and Y flip bits in the MOB Y register.

I almost forgot.... the code for left movement uses this :-

    ; Move left
        mvi     playerMobXShadow, r0
        subi    #$8001, r0          ; Move 1.5 pixels left in X.
        adcr    r0                  ; Account for fractional carry.
        decr    r0
        mvo     r0, playerMobXShadow
        andi    #$00FF, r0
        cmpi    #SOME_LIMIT, r0     ; Use signed branches because the data to compare against is a byte in size.
        ...

I'll leave that as an exercise to the reader to understand.
DZ-Jay, posted March 8, 2013 (edited March 8, 2013):

Great stuff. By the way, I'm curious: why do you manipulate the STIC directly on VBLANK?

My game variables include game objects with sub-pixel positions and a STIC shadow structure. The STIC shadow is separated from the game objects and serves only to avoid having to do any register manipulation during VBLANK.

I do all computations of game objects in the main game loop, and at the end synchronize them with the STIC shadow, still outside the VBLANK context. Then, I just block-copy the STIC shadow wholesale on VBLANK using the auto-incrementing registers (R4 and R5).

I have an idea to optimize this in my next engine by keeping a "dirty flag" for each MOB in a vector, so that I can just blast the registers of the necessary objects. (X, Y, and A registers would be copied atomically.)

It does use more RAM, though, which may not be a big deal with architectures like the JLP boards.

This says nothing about the efficiency of your algorithm, which is very neat and interesting.

-dZ.
intvnut, posted March 8, 2013 (edited March 8, 2013):

I use a variation of this trick in Space Patrol, although I use a full 8 bits of fraction. I store all my X and Y values as:

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | F | F | F | F | F | F | F | F | M | M | M | M | M | M | M | M |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

and then just treat my velocities as 1s complement numbers. That way I don't have separate "positive" and "negative" code branches. That works better for my overall code flow, where I have 12 objects I need to move and track. My main velocity update loop then looks like this:

        MVI@    R5,     R0      ; 8     Get velocity
        ADD@    R4,     R0      ; 8     Add velocity to position
        ADCR    R0              ; 6     end-around carry for 1s compl
        MVI@    R5,     R1      ; 8     Get velocity
        ADD@    R4,     R1      ; 8     Add velocity to position
        ADCR    R1              ; 6     end-around carry for 1s compl
        SUBR    R2,     R4      ; 6     Rewind
        MVO@    R0,     R4      ; 9     Store new position
        MVO@    R1,     R4      ; 9     Store new position

That, of course, gets repeated for all 12 objects. As I mentioned, X and Y get stored in the same format, so I just do them both the same way in this code.

I don't make use of the trick you're doing, storing the MOB X flags in the X position, because I dynamically reassociate MOB attributes with object positions on a frame-to-frame basis. The code that handles merging X/Y coordinates with MOB attributes looks (approximately) like this:

        INCR    R2              ; 6     Go to next attr entry
        MVI@    R2,     R5      ; 8     Get attribute #
        DECR    R5              ; 6     Inactive object?
        BMI     @@gp1skip       ; 7/9   Yes: Skip it.
        ADDI    #SPATBL,R5      ; 8     Index into global attribute table
        MVI@    R4,     R0      ; 8     Get x position
        ANDI    #$FF,   R0      ; 8
        ADD@    R5,     R0      ; 8     Merge w/ x-pos attr template
        MVO@    R0,     R3      ; 9     Store to MOB X position
        ADDI    #8,     R3      ; 8     Move to MOB Y register
        MVI@    R4,     R0      ; 8     Get y position
        ANDI    #$FF,   R0      ; 8
        ADD@    R5,     R0      ; 8     Merge w/ y-pos attr template
        MVO@    R0,     R3      ; 9     Store to MOB Y position
        ADDI    #8,     R3      ; 8     Move to MOB A register
        MVI@    R5,     R0      ; 8     Get attr register
        MVO@    R0,     R3      ; 9     Store to MOB A register
        SUBI    #15,    R3      ; 8     Go to next MOB X register

Yeah, there's annoying code to skip by 8/8/-15 among the X, Y and A registers. I pine for an indexed addressing mode...

If I had a tighter binding of MOBs to game objects (instead of muxing, like SP does), tricks like yours would definitely make things more efficient.

Some add'l comments:

> The major downside is that if you let the X coordinate part of the MOB X shadow overflow it does impact how the MOB looks. The good part of that is you get the feedback straight away.

You'd have to let it overflow by quite a lot before you could see it... INTR is the first bit you'd corrupt, and VISB is the next bit you'd corrupt.

>     subi    #$8001, r0          ; Move 1.5 pixels left in X.
>     adcr    r0                  ; Account for fractional carry.
>     decr    r0

So I guess you can't SUBI #$8002 here and eliminate the DECR? *headscratch*

One last:

>     andi    #$00FF, r0
>     cmpi    #SOME_LIMIT, r0

If you align SOME_LIMIT to the upper half of the word, you can save two cycles and one word of code size with:

        swap    r0
        cmpi    #SOME_LIMIT, r0

Another advantage of putting X and Y into LSBs is that an SDBD read can slurp up X and Y together, if you've interleaved X and Y in memory. I use a combination of both tricks to make my bounding box collision detection go more quickly in SP. See this file: http://www.spacepatrol.info/src/engine/ckggb.asm
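The `swap`/`cmpi` suggestion above can be sanity-checked with a small Python model. This is a sketch, assuming the limit test is an unsigned "greater than or equal" (the kind you would branch on with bc/bnc) and that SOME_LIMIT is pre-shifted into the upper byte; the function names and the value 160 are hypothetical.

```python
def swap(word):
    """Model `swap r0`: exchange the two bytes of a 16-bit word."""
    return ((word << 8) | (word >> 8)) & 0xFFFF


def at_limit_masked(pos, limit):
    """andi #$00FF, r0 / cmpi #limit, r0 followed by an unsigned >= test."""
    return (pos & 0x00FF) >= limit


def at_limit_swapped(pos, limit):
    """swap r0 / cmpi #(limit << 8), r0 followed by an unsigned >= test."""
    return swap(pos) >= (limit << 8)


SOME_LIMIT = 160             # hypothetical right-hand limit in pixels

just_short = 0x809F          # X = 159, fraction = 0.5
on_limit = 0x00A0            # X = 160, fraction = 0.0
print(at_limit_masked(just_short, SOME_LIMIT), at_limit_swapped(just_short, SOME_LIMIT))
print(at_limit_masked(on_limit, SOME_LIMIT), at_limit_swapped(on_limit, SOME_LIMIT))
```

After the swap the fraction sits below the magnitude, so it can never lift a smaller magnitude up to the limit, which is why the mask becomes unnecessary for an ordering test. An equality test would still need the mask.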
GroovyBee, posted March 8, 2013:

> Great stuff. By the way, I'm curious: why do you manipulate the STIC directly on VBLANK?

Because of the RAM usage. Manipulating the STIC shadow copies directly means that you don't need extra RAM to hold position information. It also means that you don't need to use extra RAM to check if X/Y flip/size bits and visibility bits should be set/cleared. Rocketeer has 16 moving "objects", all of which need state information, so system RAM is at a premium.
Gemintronic, posted March 8, 2013:

I can't understand the assembly but I think this kind of movement code is interesting.

I tend to think of movement in terms of frames. So, for instance, I'd call the enemy AI function every other frame to get the equivalent .5 pixel movement.
DZ-Jay, posted March 8, 2013:

> I can't understand the assembly but I think this kind of movement code is interesting. I tend to think of movement in terms of frames. So, for instance, I'd call the enemy AI function every other frame to get the equivalent .5 pixel movement.

Loon,

That works really well when your velocity is in multiples of the frame rate. So what do you do when your velocity is not constant due to acceleration?
GroovyBee, posted March 8, 2013:

Thanks for the alternate viewpoints.

> So I guess you can't SUBI #$8002 here and eliminate the DECR? *headscratch*

Not really. Consider this code sequence to subtract 3.5 from 30 three times :-

        mvii    #30, r0
        mvii    #$8003, r1
        subr    r1, r0
        adcr    r0
        subr    r1, r0
        adcr    r0
        subr    r1, r0
        adcr    r0

The "answer" in r0 isn't 19.5 like you'd expect; it's 22.5, due to the way that carry works.

        mvii    #30, r0
        mvii    #$8003, r1
        subr    r1, r0
        adcr    r0
        decr    r0
        subr    r1, r0
        adcr    r0
        decr    r0
        subr    r1, r0
        adcr    r0
        decr    r0

The "answer" in r0 is 19.5 like you'd expect. The addition of the "decr" instruction means the magnitude is adjusted correctly when a carry occurs.
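The two results quoted above can be reproduced with a short Python model of `subr`/`adcr`/`decr`. This is a sketch of the arithmetic only; it assumes the carry flag after a subtract is set when no borrow occurred (which is what the jzIntv trace later in the thread shows), and the helper names are hypothetical.

```python
def sub_step(pos, delta, with_decr):
    """Model `subr r1, r0`, `adcr r0` and, optionally, `decr r0`."""
    diff = pos - delta
    carry = 1 if diff >= 0 else 0              # C = 1 means "no borrow"
    result = ((diff & 0xFFFF) + carry) & 0xFFFF
    if with_decr:
        result = (result - 1) & 0xFFFF
    return result


def to_pixels(pos):
    """Decode fraction-in-MSB, magnitude-in-LSB for inspection."""
    return (pos & 0xFF) + (pos >> 8) / 256.0


def subtract_three_times(with_decr):
    r0 = 30                                    # 30.0
    for _ in range(3):
        r0 = sub_step(r0, 0x8003, with_decr)   # subtract 3.5
    return r0


print(to_pixels(subtract_three_times(with_decr=False)))   # 22.5
print(to_pixels(subtract_three_times(with_decr=True)))    # 19.5
```

Without the `decr`, `adcr` adds 1 whenever the fraction did not borrow, so each step removes only 2.5 on average; with it, the net adjustment is -1 on a borrow and 0 otherwise, which is the correction the magnitude needs.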
GroovyBee, posted March 8, 2013:

> I can't understand the assembly but I think this kind of movement code is interesting. I tend to think of movement in terms of frames. So, for instance, I'd call the enemy AI function every other frame to get the equivalent .5 pixel movement.

With that approach, when you have a bunch of slow-moving objects you need lots of "action counters" to check if it's their time to move, which takes CPU resources and RAM that may be in short supply. Fractional movement also helps NTSC to PAL (and vice versa) conversions because you can adjust the amount an object moves in X and Y on a per-frame basis. That way it'll appear to play at the same speed on both systems.
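As a rough illustration of the NTSC/PAL point, a per-frame velocity can be rescaled once at build or start-up time. The sketch below assumes nominal frame rates of 60 Hz and 50 Hz and a velocity held in 1/256-pixel units; `pal_velocity` and `to_swapped` are hypothetical helpers, not part of any existing toolchain.

```python
NTSC_HZ = 60   # assumed nominal NTSC frame rate
PAL_HZ = 50    # assumed nominal PAL frame rate


def pal_velocity(ntsc_raw):
    """Scale a per-frame velocity (in 1/256 pixel units) from NTSC to PAL.

    Integer arithmetic with rounding to nearest, to avoid float surprises.
    """
    return (ntsc_raw * NTSC_HZ + PAL_HZ // 2) // PAL_HZ


def to_swapped(raw):
    """Convert plain 8.8 into the thread's fraction-in-MSB layout."""
    return ((raw & 0xFF) << 8) | ((raw >> 8) & 0xFF)


ntsc = 0x0180                # 1.5 pixels per frame
pal = pal_velocity(ntsc)     # 461 = 0x01CD, about 1.8 pixels per frame

print(hex(to_swapped(ntsc)), hex(to_swapped(pal)))
```

With these numbers the object covers 384 * 60 = 23040 units per second on NTSC and 461 * 50 = 23050 on PAL, so the on-screen speed differs by well under one percent.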
intvnut, posted March 8, 2013 (edited March 8, 2013):

> The "answer" in r0 is 19.5 like you'd expect. The addition of the "decr" instruction means the magnitude is adjusted correctly when a carry occurs.

I think we're talking past each other. I'm saying "SUBI #$8001, R0; ADCR R0; DECR R0" should always produce the same result as "SUBI #$8002, R0; ADCR R0", because neither the SUBI nor the DECR R0 can spill unwanted carries/borrows from the lower byte into the upper byte due to how you've constrained your numbers.

Here's a quick test I did in jzIntv just to see:

    0000 7000 0000 0000 01FE 103D 02F1 7001 ---Z--iq  SUBI #$8001,R0    660
    7FFF 7000 0000 0000 01FE 103D 02F1 7003 ------iq  ADCR R0           668
    7FFF 7000 0000 0000 01FE 103D 02F1 7004 ------iq  DECR R0           674
    7FFE 7000 0000 0000 01FE 103D 02F1 7005 ------iq  SUBI #$8001,R0    680
    FFFD 7000 0000 0000 01FE 103D 02F1 7007 S-O---iq  ADCR R0           688
    FFFD 7000 0000 0000 01FE 103D 02F1 7008 S-----iq  DECR R0           694
    FFFC 7000 0000 0000 01FE 103D 02F1 7009 S-----iq  SUBI #$8001,R0    700
    7FFB 7000 0000 0000 01FE 103D 02F1 700B -C----iq  ADCR R0           708
    7FFC 7000 0000 0000 01FE 103D 02F1 700C ------iq  DECR R0           714
    7FFB 7000 0000 0000 01FE 103D 02F1 700D ------iq  CLRR R0           720
    0000 7000 0000 0000 01FE 103D 02F1 700E ---Z--iq  SUBI #$8002,R0    726
    7FFE 7000 0000 0000 01FE 103D 02F1 7010 ------iq  ADCR R0           734
    7FFE 7000 0000 0000 01FE 103D 02F1 7011 ------iq  SUBI #$8002,R0    740
    FFFC 7000 0000 0000 01FE 103D 02F1 7013 S-O---iq  ADCR R0           748
    FFFC 7000 0000 0000 01FE 103D 02F1 7014 S-----iq  SUBI #$8002,R0    754
    7FFA 7000 0000 0000 01FE 103D 02F1 7016 -C----iq  ADCR R0           762
    7FFB 7000 0000 0000 01FE 103D 02F1 7017 ------iq  HLT               768

Both sequences produced the same final results: $7FFE, $FFFC, $7FFB.
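The trace can be cross-checked, and the "constrained numbers" condition made explicit, with a brute-force Python model. This is a sketch that assumes the carry flag after SUBI means "no borrow", as the trace shows; the function names are hypothetical.

```python
def sub_end_around(pos, delta):
    """Model `SUBI #delta, R0` followed by `ADCR R0`."""
    diff = pos - delta
    carry = 1 if diff >= 0 else 0              # C = 1 means "no borrow"
    return ((diff & 0xFFFF) + carry) & 0xFFFF


def with_decr(pos):
    """SUBI #$8001, R0 ; ADCR R0 ; DECR R0"""
    return (sub_end_around(pos, 0x8001) - 1) & 0xFFFF


def without_decr(pos):
    """SUBI #$8002, R0 ; ADCR R0"""
    return sub_end_around(pos, 0x8002)


def run(step, start=0, count=3):
    """Apply one of the sequences repeatedly, as in the jzIntv test."""
    values, r0 = [], start
    for _ in range(count):
        r0 = step(r0)
        values.append(r0)
    return values


print([hex(v) for v in run(with_decr)])
print([hex(v) for v in run(without_decr)])

# Every 16-bit starting value for which the two sequences disagree.
mismatches = [p for p in range(0x10000) if with_decr(p) != without_decr(p)]
print([hex(p) for p in mismatches])
```

In this model the only disagreement is at a starting value of exactly $8001 (X = 1 plus half a pixel), where subtracting 2 from the low byte borrows into the fraction. That fits the stated condition: the shortcut is safe as long as the coordinate is never allowed to get that low.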
GroovyBee, posted March 8, 2013:

> I think we're talking past each other.

Sorry! I get what you're saying now. We are both correct. It's a failure of my contrived example: the constant I subtracted could instead be the address of a variable like acceleration (or some such), so adding/subtracting one to the final acceleration value elsewhere for MOB movement purposes may not be ideal either.
intvnut, posted March 8, 2013 (edited March 8, 2013):

> Sorry! I get what you're saying now. We are both correct. It's a failure of my contrived example: the constant I subtracted could instead be the address of a variable like acceleration (or some such), so adding/subtracting one to the final acceleration value elsewhere for MOB movement purposes may not be ideal either.

Ah, that makes perfect sense now. :-)

BTW, I have to admit that storing the flags in this particular way isn't something that's occurred to me. (Or if it has, I've forgotten. Happens more than I'd like these days.) Anyway, I do definitely like it, and I'll be adding it to my bag of tricks.

I hope my initial post above didn't come across the wrong way. I just wanted to share a related set of tricks. This is good stuff.
GroovyBee, posted March 8, 2013:

> Ah, that makes perfect sense now. :-)

No worries. I should have used an address and not a constant, which would have made it less ambiguous.

> BTW, I have to admit that storing the flags in this particular way isn't something that's occurred to me. (Or if it has, I've forgotten. Happens more than I'd like these days.) Anyway, I do definitely like it, and I'll be adding it to my bag of tricks.

Agreed! It's always good to have new tricks to call on.

> I hope my initial post above didn't come across the wrong way. I just wanted to share a related set of tricks. This is good stuff.

Nah! Don't worry! Discussion is all good if it brings new concepts and (hopefully) games to the Inty table.
DZ-Jay, posted January 4, 2015:

> Another advantage of putting X and Y into LSBs is that an SDBD read can slurp up X and Y together, if you've interleaved X and Y in memory. I use a combination of both tricks to make my bounding box collision detection go more quickly in SP. See this file: http://www.spacepatrol.info/src/engine/ckggb.asm

It's time to bump this thread back into the light, for I am currently implementing the newest version of the P-Machinery sprite driver.

Now to the point above, I didn't know that if you used SDBD on 16-bit RAM it will actually just read the LSB of each word... that is actually quite useful!

My main concern in using the technique described in this thread is that it would make it more costly to use the position information for anything other than movement, such as collision/edge detection. However, intvnut's suggestion of using SDBD to slurp X and Y together (which I missed on my first read of this thread a year ago) seems to compensate for that to some extent.

The other question I have is regarding the treatment of velocities as 1's complement. I don't quite get what that gains us, could someone please expand on this?

-dZ.
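The SDBD behaviour mentioned above can be pictured with a tiny Python model. It is only a sketch: it assumes X and Y are stored in adjacent 16-bit words with the fraction in the upper byte, and that a double-byte read takes the low byte of the first word as the low half of the result and the low byte of the second word as the high half. `sdbd_read` and the sample values are hypothetical.

```python
def sdbd_read(mem, addr):
    """Model `SDBD` + `MVI@ R4, Rx`: combine the low bytes of two words.

    Returns the packed value and the post-incremented pointer.
    """
    lo = mem[addr] & 0xFF        # low byte of the first word
    hi = mem[addr + 1] & 0xFF    # low byte of the second word
    return (hi << 8) | lo, addr + 2


# One object: X = 100 with fraction 0.5, Y = 50 with fraction 0.25.
ram = [0x8064, 0x4032]

packed, next_addr = sdbd_read(ram, 0)
x = packed & 0xFF
y = packed >> 8
print(hex(packed), x, y)
```

The fractions in the upper bytes are skipped for free, so one read yields a packed Y/X pair that is ready for byte-wise comparisons such as bounding-box tests.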
intvnut, posted January 4, 2015:

> The other question I have is regarding the treatment of velocities as 1's complement. I don't quite get what that gains us, could someone please expand on this?

It allows you to add the velocity without doing a SWAP/ADD/SWAP or adding guard bits.

In the swapped representation, you have three main options for adding the velocity that I can think of (others chime in if there's something I don't think of here):

1) SWAP back to normal representation, add, SWAP back.

2) Put guard bits between lower and upper halves. IIRC, that requires ADD, ADCR and AND to do the add.

3) Use 1s complement addition to add the velocity. The velocity add is just ADD, ADCR, but negative velocities need to be adjusted down by 1 ahead of time (0xFFFF is "minus zero"). Or, you live with the slight 1/256 bias.

As far as collision / edge detection, you can do that with packed arithmetic too. That's what I do in Space Patrol.

    @@checkmob:
        MVI@    R5,     R3      ; 8     Get attr for next mob
        DECR    R3              ; 6     Zero?  Skip to next.
        BMI     @@skipmob       ; 7/9
        ADDI    #SPATBL+3, R3   ; 8     Offset to 'size' info in MOB record
        SDBD                    ; 4
        MVI@    R4,     R1      ; 10    Get Y/X coordinate
        MOVR    R1,     R2      ; 6
        SUB     GGB1,   R1      ; 10    Check the lower right corner
        BNC     @@out1          ; 7/9   If Y went -ve, bullet's to the right
        SWAP    R1,     1       ; 6
        SWAP    R1,     1       ; 6
        BMI     @@out1          ; 7/9   If X went -ve, bullet's below
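The 1s complement option above can be modelled in Python to show why a single ADD/ADCR pair handles both directions. This is a sketch under assumptions: positions use the fraction-in-MSB layout from this thread, a negative velocity is stored as the bitwise complement of the positive one, and `encode_velocity` and `add_velocity` are hypothetical names.

```python
def swap(word):
    """Exchange the two bytes of a 16-bit word."""
    return ((word << 8) | (word >> 8)) & 0xFFFF


def encode_velocity(raw):
    """Encode a signed velocity given in 1/256 pixel units.

    Positive values are byte-swapped; negative values are the 1s
    complement (bitwise NOT) of the swapped magnitude.
    """
    word = swap(abs(raw) & 0xFFFF)
    return word ^ 0xFFFF if raw < 0 else word


def add_velocity(pos, vel):
    """Model `ADD@ R4, R0` followed by `ADCR R0` (end-around carry)."""
    total = pos + vel
    return ((total & 0xFFFF) + (total >> 16)) & 0xFFFF


left = encode_velocity(-384)     # -1.5 pixels per frame -> 0x7FFE
right = encode_velocity(384)     # +1.5 pixels per frame -> 0x8001

pos = 0x0064                     # X = 100.0
pos = add_velocity(pos, left)    # 0x8062: X = 98.5
pos = add_velocity(pos, left)    # 0x0061: X = 97.0
pos = add_velocity(pos, right)   # 0x8062: back to X = 98.5
```

Addition with end-around carry behaves like arithmetic modulo $FFFF, and a byte swap is just a rotation in that arithmetic, which is one way to see why the swapped layout needs no SWAP/ADD/SWAP and no separate code path for negative velocities.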
catsfolly, posted January 5, 2015:

DZ-Jay said:

> The other question I have is regarding the treatment of velocities as 1's complement.

Besides, treating velocities as "one's insult" is just so, well, negative.

Velocities deserve better treatment than this. Compliments are nicer than insults.

If "Two's a complement", does that mean that "Three's just flattery"? And "Four's a bunch of yes men"? So "five's a full house"? And "six's a hex of a gong"?

Sorry, my brain is tired and I have nothing constructive to ADD,

Catsfolly