Jump to content
IGNORED

Sprite/Background collision proof of concept.


marc.hull

Recommended Posts

The attachment is a .dsk file. Change the suffix to .dsk and use TIDIR or PC-TI99 to transfer. This is fairly rough and has some bugs but is a proof that PIXEL perfect, real time collision detection is possible between Sprites and tiles with no record keeping what so ever. It has some bugs like wrapping the sprite causes errors etc. When I get it perfected I'll post the code.

 

You may have to spend some time figuring out how detection can be accomplished from this limited demo but I assure you it works. I can also made to be selectable so that certain tiles do not trigger it.

DEMO.txt

Link to comment
Share on other sites

The attachment is a .dsk file. Change the suffix to .dsk and use TIDIR or PC-TI99 to transfer. This is fairly rough and has some bugs but is a proof that PIXEL perfect, real time collision detection is possible between Sprites and tiles with no record keeping what so ever. It has some bugs like wrapping the sprite causes errors etc. When I get it perfected I'll post the code.

 

You may have to spend some time figuring out how detection can be accomplished from this limited demo but I assure you it works. I can also made to be selectable so that certain tiles do not trigger it.

 

Marc, can you give some details how you are doing this.

Also would you consider "donating" the collision detection routine for inclusion in the spectra2 1.1 release ?

spectra2 is currently still missing a collision detection routine.

 

Sorry can't offer you $$$$ but you will be credited :D

 

Either way, excellent stuff :thumbsup:

Link to comment
Share on other sites

The attachment is a .dsk file. Change the suffix to .dsk and use TIDIR or PC-TI99 to transfer. This is fairly rough and has some bugs but is a proof that PIXEL perfect, real time collision detection is possible between Sprites and tiles with no record keeping what so ever. It has some bugs like wrapping the sprite causes errors etc. When I get it perfected I'll post the code.

 

You may have to spend some time figuring out how detection can be accomplished from this limited demo but I assure you it works. I can also made to be selectable so that certain tiles do not trigger it.

 

Marc, can you give some details how you are doing this.

Also would you consider "donating" the collision detection routine for inclusion in the spectra2 1.1 release ?

spectra2 is currently still missing a collision detection routine.

 

Sorry can't offer you $$$$ but you will be credited :D

 

Either way, excellent stuff :thumbsup:

 

The routine basically reads the characters under the sprite, extracts their pattern data and parses it down to needed bits to create a pattern that is a match to the 16 x 16 grid under the sprite. It then defines the sprite as such. Not complete as detection in itself but......

 

If this sprite were made transparent and a game play sprite was laid on top of it then normal sprite/sprite collision detection can be used as sprite/background collision detection.

 

The routine is currently set up for graphics 1 mode and a 16 x 16 sprite. It would have to be modified for bit map mode so you would most likely need two different routines. It "should" work for regular sized sprites through double size but not double sized and magnified.

 

The VDP RAM set-up is hard coded but making that relative shouldn't be too tough a task.

 

At any rate I'll send you Source this evening and post it here when it is fully pared down.

 

Marc

  • Like 1
Link to comment
Share on other sites

 

The routine basically reads the characters under the sprite, extracts their pattern data and parses it down to needed bits to create a pattern that is a match to the 16 x 16 grid under the sprite. It then defines the sprite as such. Not complete as detection in itself but......

 

If this sprite were made transparent and a game play sprite was laid on top of it then normal sprite/sprite collision detection can be used as sprite/background collision detection.

 

The routine is currently set up for graphics 1 mode and a 16 x 16 sprite. It would have to be modified for bit map mode so you would most likely need two different routines. It "should" work for regular sized sprites through double size but not double sized and magnified.

 

The VDP RAM set-up is hard coded but making that relative shouldn't be too tough a task.

 

At any rate I'll send you Source this evening and post it here when it is fully pared down.

 

Marc

 

Marc, I suspect collision detection will deteriorate proportionately to sprite velocity given all the background work that is happening, more so than with usual sprite to sprite detection, unless you predefine sprites for selected areas on the screen that you want to detect collision against. Am I understanding this correctly?

I also assume that this is useful only for mobile tiles, because otherwise you could simply check the (X,Y) position of the sprite against a fixed table of tile coordinates with appropriate range checking, i.e. if sprite is within X and Y of the uppper left corner of a tile, then detection occurs. If Sprite is X,Y and tile is x,y, then the sign of X-x and Y-y should tell you which quadrant you are in to further refine the detection.

Am I completely off here?

Link to comment
Share on other sites

 

Marc, I suspect collision detection will deteriorate proportionately to sprite velocity given all the background work that is happening, more so than with usual sprite to sprite detection, unless you predefine sprites for selected areas on the screen that you want to detect collision against. Am I understanding this correctly?

I also assume that this is useful only for mobile tiles, because otherwise you could simply check the (X,Y) position of the sprite against a fixed table of tile coordinates with appropriate range checking, i.e. if sprite is within X and Y of the uppper left corner of a tile, then detection occurs. If Sprite is X,Y and tile is x,y, then the sign of X-x and Y-y should tell you which quadrant you are in to further refine the detection.

Am I completely off here?

 

If you are talking about auto sprite motion (which is what I am assuming) then the collision detection will still work perfectly. The interrupt will have to be off when the routine is called therefor the sprite will not be in motion when the code does its work. The demo yields about 70-80 cycles per second which should be plenty fast enough for all but an insane games ;-).

 

Of course you can do normal x,y coordinate checking to determine proximity to a tile and collision but this will not give you PIXEL accuracy which this routine does. The pixel accuracy is important (IMO) because it is possible to encroach on a tile without having pixel collisions (think of a lander navigating down some treacherous jagged mountain.) It could also be expanded to ignore certain tiles therefor giving you selective detection as well.

 

Additionally the programmer will not have to keep track of anything as the routine reads the screen in real time.

 

In short, it works exactly like sprite/sprite detection except it is sprite/background.

After the routine is run, simply check the collision bit. The ownership is on the programmer to determine if the player sprite hit another sprite or the background but that is easy enough. It will cost you one sprite to implement and reduce your visible sprites on a line to 3 but well worth it if you ask me.

Link to comment
Share on other sites

Nice work, Marc!

 

For other conceptual approaches, don't overlook the console GPL COINC routine (which is not the XB CALL COINC), it does pixel perfect collision against two arbitrary objects using a lookup table. Although it seems overcomplicated to me (and is actually a GPL function), some of the concepts in there (like precalculating which offsets are collisions) are possibly useful if you have a small number of collision cases. Thierry documents it here, and also has a utility for making the tables: http://nouspikel.group.shef.ac.uk//ti99/gpl2.htm#COIN

Link to comment
Share on other sites

Marc,this is quite interesting. If you kept the screen in CPU RAM and performed detection there, would it speed things up? I'm assuming you are reading VDP based on your comments but perhaps misunderstood.

 

It speeds thing a bit but not by much. I am working on crunching it down to slide it into scratchpad. That should put it into turbo mode ;-)

Link to comment
Share on other sites

Nice work, Marc!

 

For other conceptual approaches, don't overlook the console GPL COINC routine (which is not the XB CALL COINC), it does pixel perfect collision against two arbitrary objects using a lookup table. Although it seems overcomplicated to me (and is actually a GPL function), some of the concepts in there (like precalculating which offsets are collisions) are possibly useful if you have a small number of collision cases. Thierry documents it here, and also has a utility for making the tables: http://nouspikel.group.shef.ac.uk//ti99/gpl2.htm#COIN

 

That looked like Chinese arithmetic to me. Way to complicated ;-)

Link to comment
Share on other sites

So... if I understand this right, there are always two sprites. The first is your character, ship, whatever... the second is a "reflection" or copy of whatever background tiles are underneath and once the patterns are copied, and the sprite redefined, the built-in collision routine tests for overlap? Or is the VDP doing the work?

 

(be sure to add comments to that source code - grin)

Link to comment
Share on other sites

So... if I understand this right, there are always two sprites. The first is your character, ship, whatever... the second is a "reflection" or copy of whatever background tiles are underneath and once the patterns are copied, and the sprite redefined, the built-in collision routine tests for overlap? Or is the VDP doing the work?

 

(be sure to add comments to that source code - grin)

 

Yes two sprites. 1 player and 1 a reflection of the bitmap directly beneath the sprite.

 

The routine does no comparisons. It relies on the VDP to check for sprite/sprite coincidence.

 

Comments ???? We don't need no stinkin Comments ;-).

Link to comment
Share on other sites

Here is the commented source and EA3 executable for the collision detection demo. Again change the .txt to .dsk and unroll in TIDIR or TI99-pc. I have done as much as I can to speed it up. Looks to be over 80 cycles per second. That is as much as I can see to do without shoving it down into the scratchpad area. If anyone can see a speed up please chime in. I am starting on a new game to utilize this feature. See you in a couple of months ;-)

Marc

Link to comment
Share on other sites

Here is the commented source and EA3 executable for the collision detection demo. Again change the .txt to .dsk and unroll in TIDIR or TI99-pc. I have done as much as I can to speed it up. Looks to be over 80 cycles per second. That is as much as I can see to do without shoving it down into the scratchpad area. If anyone can see a speed up please chime in. I am starting on a new game to utilize this feature. See you in a couple of months ;-)

Marc

 

Sorry, I must have too much blood in my alcohol system for a Friday afternoon. File attached

demo.txt

Link to comment
Share on other sites

hehe.. for anyone checking, demo.txt is a DSK file. (er: which I now see Marc said... why is it a txt file? ;) )

Cause AA wouldn't let me upload a .dsk file for some reason.

 

Doesn't anybody compress files anymore? ;)

Link to comment
Share on other sites

Looks like pretty nice code! I had a look with an eye toward performance optimizations, like you wondered.

 

One simple one I might suggest, replace your VMBR and VMBW routines with routines that unroll and do two bytes per loop (or even four!) Since you are generally doing blocks of 8 bytes, you'll see some improvements there:

 

VMBR1  MOVB @VDPRD,*R1+   * 14 (MOVB) + 4 (read instruction 8-bit) + 8 (symbolic) + 4 (read symbolic) + 6 (WR indirect/auto-inc) + 0+0+0 (WR in scratchpad) + 4+4 (indirect data write 8-bit)
      DEC  R2            * 10 (DEC) + 4 (read instruction 8-bit) + 0+0 (WR in scratchpad)
      JNE VMBR1          * 10 (JNE) + 4 (read instruction 8-bit)

This loop costs 72 cycles per byte read from VDP. The nice thing about unrolling to two is that the DECT takes the same time as a DEC:

 

VMBR1  MOVB @VDPRD,*R1+   * 14 (MOVB) + 4 (read instruction 8-bit) + 8 (symbolic) + 4 (read symbolic) + 6 (WR indirect/auto-inc) + 0+0 (WR in scratchpad) + 4+4 (indirect data write 8-bit)
      MOVB @VDPRD,*R1+   * 14 (MOVB) + 4 (read instruction 8-bit) + 8 (symbolic) + 4 (read symbolic) + 6 (WR indirect/auto-inc) + 0+0 (WR in scratchpad) + 4+4 (indirect data write 8-bit)
      DECT  R2           * 10 (DECT) + 4 (read instruction 8-bit) + 0+0 (WR in scratchpad)
      JNE VMBR1          * 10 (JNE) + 4 (read instruction 8-bit)

This loop costs 116 cycles per two bytes, or 58 cycles per byte. The symbolic addressing is killing you, though, if you can free up another register, lose that. Assume VDPRD is in R5:

 

VMBR1 MOVB *R5,*R1+       * 14 (MOVB) + 4 (read instruction 8-bit) + 4 (read indirect) + 0+0 (WR in scratchpad) + 4+4 (indirect data write 8-bit)
     MOVB *R5,*R1+       * 14 (MOVB) + 4 (read instruction 8-bit) + 4 (read indirect) + 0+0 (WR in scratchpad) + 4+4 (indirect data write 8-bit)
     DECT R2             * 10 (DECT) + 4 (read instruction 8-bit) + 0+0 (WR in scratchpad)
     JNE VMBR1           * 10 (JNE) + 4 (read instruction 8-bit)

This loop needs 88 cycles per loop, or 44 cycles per byte, which is almost 40% faster than the original loop as long as you can guarantee you'll always pass even counts. ;)

 

Even so.. your VMBR and VMBW routines do almost as much setup as work in cases where you only copy 8 bytes at a time - you might consider just inlining them and using locally free registers, if you can, to save all that setup time. Large setup pays off better on longer loops. Consider this, here's your VMBR setup and teardown:

 

VMBR   MOV  R1,@R1HOLD    * 14+4 + 0 + 8+4 + 4+4 = 38
      MOV  R2,@R2HOLD    * 14+4 + 0 + 8+4 + 4+4 = 38
      MOVB @R0LB,@VDPWA  * 14+4 + 8+4 + 8+4 + 4 + 4+4 = 54
      MOVB R0,@VDPWA     * 14+4 + 0 + 8+4 + 4+4 = 38
...
      MOV  @R1HOLD,R1    * 14+4 + 8+4 + 0 + 0 = 30
      MOV  @R2HOLD,R2    * 14+4 + 8+4 + 0 + 0 = 30
      RT                 * 8+4 + 4 = 16

38+38+54+38+30+30+16 = 244 cycles. The BL is (12+4 + 8+4) = 28 cycles, making the total overhead 272 cycles.

 

But if you could reserve a register for the VDP read address (or initialize it with LI at a cost of 20 cycles), you could just unroll for 8 bytes:

 

       MOVB *R5,*R1+      * 14+4 + 4+0 + 6+0+0 + 4 + 4+4 = 40

Repeat 8 times for a cost of 320 cycles. The overhead of VMBR alone (before any work at all is done) is 85% of the entire time needed to unroll it inline. (And in 8-bit RAM it should not overrun).

 

The other thing is, that I suspect you might actually get better performance by not mirroring the sprite coordinates or screen table in CPU RAM -- but that depends on the application. It's not a bad idea to keep the sprite table in CPU RAM, for instance, to ease flickering or rapid updates without worrying about the VDP. Having separate addresses that you need to load might offset that advantage though, if you don't have them in RAM already it's probably almost as fast to read the sprite table in VRAM to get the coordinates.

 

Algorithm wise, it looks really solid, and the only thing that jumps out is more conceptual than practical... if you can find a way to skip over blocks of data you don't need to copy, you can save a little more time. (For instance, rows above or below the sprite don't need to be copied to the work buffer.) It's hard to say if the calculation to determine that can be made faster than a blind copy, though. :)

 

Nice routine, nice concept!

Link to comment
Share on other sites

Looks like pretty nice code! I had a look with an eye toward performance optimizations, like you wondered.

 

One simple one I might suggest, replace your VMBR and VMBW routines with routines that unroll and do two bytes per loop (or even four!) Since you are generally doing blocks of 8 bytes, you'll see some improvements there:

 

VMBR1  MOVB @VDPRD,*R1+   * 14 (MOVB) + 4 (read instruction 8-bit) + 8 (symbolic) + 4 (read symbolic) + 6 (WR indirect/auto-inc) + 0+0+0 (WR in scratchpad) + 4+4 (indirect data write 8-bit)
      DEC  R2            * 10 (DEC) + 4 (read instruction 8-bit) + 0+0 (WR in scratchpad)
      JNE VMBR1          * 10 (JNE) + 4 (read instruction 8-bit)

This loop costs 72 cycles per byte read from VDP. The nice thing about unrolling to two is that the DECT takes the same time as a DEC:

 

VMBR1  MOVB @VDPRD,*R1+   * 14 (MOVB) + 4 (read instruction 8-bit) + 8 (symbolic) + 4 (read symbolic) + 6 (WR indirect/auto-inc) + 0+0 (WR in scratchpad) + 4+4 (indirect data write 8-bit)
      MOVB @VDPRD,*R1+   * 14 (MOVB) + 4 (read instruction 8-bit) + 8 (symbolic) + 4 (read symbolic) + 6 (WR indirect/auto-inc) + 0+0 (WR in scratchpad) + 4+4 (indirect data write 8-bit)
      DECT  R2           * 10 (DECT) + 4 (read instruction 8-bit) + 0+0 (WR in scratchpad)
      JNE VMBR1          * 10 (JNE) + 4 (read instruction 8-bit)

This loop costs 116 cycles per two bytes, or 58 cycles per byte. The symbolic addressing is killing you, though, if you can free up another register, lose that. Assume VDPRD is in R5:

 

VMBR1 MOVB *R5,*R1+       * 14 (MOVB) + 4 (read instruction 8-bit) + 4 (read indirect) + 0+0 (WR in scratchpad) + 4+4 (indirect data write 8-bit)
     MOVB *R5,*R1+       * 14 (MOVB) + 4 (read instruction 8-bit) + 4 (read indirect) + 0+0 (WR in scratchpad) + 4+4 (indirect data write 8-bit)
     DECT R2             * 10 (DECT) + 4 (read instruction 8-bit) + 0+0 (WR in scratchpad)
     JNE VMBR1           * 10 (JNE) + 4 (read instruction 8-bit)

This loop needs 88 cycles per loop, or 44 cycles per byte, which is almost 40% faster than the original loop as long as you can guarantee you'll always pass even counts. ;)

 

Even so.. your VMBR and VMBW routines do almost as much setup as work in cases where you only copy 8 bytes at a time - you might consider just inlining them and using locally free registers, if you can, to save all that setup time. Large setup pays off better on longer loops. Consider this, here's your VMBR setup and teardown:

 

VMBR   MOV  R1,@R1HOLD    * 14+4 + 0 + 8+4 + 4+4 = 38
      MOV  R2,@R2HOLD    * 14+4 + 0 + 8+4 + 4+4 = 38
      MOVB @R0LB,@VDPWA  * 14+4 + 8+4 + 8+4 + 4 + 4+4 = 54
      MOVB R0,@VDPWA     * 14+4 + 0 + 8+4 + 4+4 = 38
...
      MOV  @R1HOLD,R1    * 14+4 + 8+4 + 0 + 0 = 30
      MOV  @R2HOLD,R2    * 14+4 + 8+4 + 0 + 0 = 30
      RT                 * 8+4 + 4 = 16

38+38+54+38+30+30+16 = 244 cycles. The BL is (12+4 + 8+4) = 28 cycles, making the total overhead 272 cycles.

 

But if you could reserve a register for the VDP read address (or initialize it with LI at a cost of 20 cycles), you could just unroll for 8 bytes:

 

       MOVB *R5,*R1+      * 14+4 + 4+0 + 6+0+0 + 4 + 4+4 = 40

Repeat 8 times for a cost of 320 cycles. The overhead of VMBR alone (before any work at all is done) is 85% of the entire time needed to unroll it inline. (And in 8-bit RAM it should not overrun).

 

The other thing is, that I suspect you might actually get better performance by not mirroring the sprite coordinates or screen table in CPU RAM -- but that depends on the application. It's not a bad idea to keep the sprite table in CPU RAM, for instance, to ease flickering or rapid updates without worrying about the VDP. Having separate addresses that you need to load might offset that advantage though, if you don't have them in RAM already it's probably almost as fast to read the sprite table in VRAM to get the coordinates.

 

Algorithm wise, it looks really solid, and the only thing that jumps out is more conceptual than practical... if you can find a way to skip over blocks of data you don't need to copy, you can save a little more time. (For instance, rows above or below the sprite don't need to be copied to the work buffer.) It's hard to say if the calculation to determine that can be made faster than a blind copy, though. :)

 

Nice routine, nice concept!

 

 

Having a special VMBW routine that clobbers 8 bytes at a time makes sense. That is the only read in the routine. But if I shove it into scratchpad then it may bonk out. I can't remember if it is setting the address without a delay that causes the error or reading too fast. I'll experiment. I have a feeling that cramming it all into no wait state memory will have the largest impact. Running with no wait states on my console doubles performance. Obviously that will be pared down as the VMBR will not be in the fast area but I imagine I can get at least a 40 percent performance boost over all.

Link to comment
Share on other sites

Having a special VMBW routine that clobbers 8 bytes at a time makes sense. That is the only read in the routine. But if I shove it into scratchpad then it may bonk out. I can't remember if it is setting the address without a delay that causes the error or reading too fast. I'll experiment. I have a feeling that cramming it all into no wait state memory will have the largest impact. Running with no wait states on my console doubles performance. Obviously that will be pared down as the VMBR will not be in the fast area but I imagine I can get at least a 40 percent performance boost over all.

 

I don't know if I'd bother putting it in scratchpad unless it's actually hurting your game's performance. I would optimize the heck out of it in 8-bit RAM first.

 

That said, I did a lot of theorhetical work on what's safe or not, and as long as you are not using register-indirect to register-indirect (ie: even register-indirect post-increment or symbolic addressing is good) then a normal machine won't have trouble on reads. Where you have to watch your timing is between setting the address and reading, you need a delay between those two steps.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...