
techniques for loading back-to-back sprite data: general discussion


grafixbmp


I've been sick for a few weeks and haven't had the energy to post anything new lately. I have been bouncing balls around in my head about the many games that use multiple sprite tricks to get higher res graphics on the screen and was wondering what some of your tricks are and some of the ones used in the past for commercial games.

 

I know the gist of certain limitations: the traditional approach is 6 sprites seamlessly back-to-back for 48 pixels, and some use the 2 missiles for extra pixels. This avoids having to use either flicker or blinds. I know there are others out there that DO NOT (as far as I can tell) use flicker or blinds to get more than 6 across; however, these are not reloadable per line with new data.

 

I am not sure if I totally understand everything that goes on, but am I correct in assuming that when the sprites are aligned properly and triple sprites with proper spacing are set, then in order to get new data into each sprite, the first 2 are loaded right off the bat long before they are even displayed, then A, X, and Y are all loaded with data and just written to each of the corresponding sprites as they come, and then there is just enough time left to do a load and store for the last one? The minimum needed being 5 cycles.

 

I have long wondered whether it is possible to sacrifice part of the resolution to get more screen coverage and possibly more time to load more sprites, like double-width sprites with proper spacing to have 2-color-clock pixels instead of just 1-clock spacing. I bet I am obviously missing something here, because I have never heard of or seen this done before. I bet the triple sprites, spacing, and double width are not possible together.

 

I would like to know what some of your techniques are for handling time constraints when loading data like this as fast as possible. Can more than 6 sprites be done where each has new data without flicker/blinds?


I am not sure if I totally understand everything that goes on, but am I correct in assuming that when the sprites are aligned properly and triple sprites with proper spacing are set, then in order to get new data into each sprite, the first 2 are loaded right off the bat long before they are even displayed, then A, X, and Y are all loaded with data and just written to each of the corresponding sprites as they come, and then there is just enough time left to do a load and store for the last one? The minimum needed being 5 cycles.

 

That almost works, but not quite. Either the last pixel of the first and third sprites or the first pixel of the fourth and sixth sprites will have to share data if that approach is used. If the appropriate columns of data happen to be blank (as may be true of some 'scoring' fonts) that's not a problem. Nonetheless, it's possible to avoid that restriction.

 

The trick is to use the so-called "vertical delay" latches. If the "delay" latch for a sprite is set, changes to that sprite's shape won't be displayed until the next time the other sprite is written. If sprites are written on alternate scan lines, this will shift a sprite vertically. There is no requirement that the latches be used for vertical positioning, however; indeed, they're often useful (essential) for other purposes.

 

The normal approach for a "six-digit score" kernel, assuming player 0 is to the left of player 1, is to store the first three bytes of data into GRP0, GRP1, and GRP0, in that order. Note that at this point player 0 will show the first byte loaded and player 1 the second. Load A, X, and Y with the remaining three bytes of data. Then just as the beam is about to clear player 0, store the fourth byte to GRP1 (which will set player 0 to start displaying the last value written there, and will latch player 1's data but won't cause it to be displayed yet), then store the fifth byte to GRP0 (display new data for player 1; latch but don't display new data for player 0), and the sixth byte to GRP1. Then store anything into GRP0 (this last step is necessary to make player 1 show the last value written).
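
As a rough sketch, the setup this kernel relies on might look like the following. It assumes DASM syntax and the standard vcs.h register names; positioning player 1 eight pixels to the right of player 0 is not shown.

```asm
    lda #%00000011  ; NUSIZx D2-D0 = 011: three copies, close spacing
    sta NUSIZ0
    sta NUSIZ1
    lda #%00000001  ; VDELPx D0 = 1: turn on the delay latch
    sta VDELP0
    sta VDELP1
```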

 

There are 32 pixels between the end of the first copy of player 0 and the start of the last copy of player 1. Each cycle moves the beam three pixels. If one does four stores consecutively, the beam will thus move 27 pixels between the end of the first and the end of the last. If one does three stores, a TSX, and a fourth store, the beam will move 33 pixels between the end of the first and last stores. My 1994 kernel that would later get built into Strat-O-Gems used this approach for displaying gems (since COLUPx doesn't have delay latches).
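
To make that arithmetic concrete, here is a sketch of the two store sequences with cycle counts (zero-page stores take 3 cycles, TSX takes 2, and each cycle is 3 pixels). The target registers and which CPU register holds which byte are assumptions for illustration only, not taken from the Strat-O-Gems kernel; the second sequence also assumes the fourth value was put into the stack pointer with TXS beforehand.

```asm
; Four consecutive stores
    sta GRP1        ; first store ends here
    stx GRP0        ; +3 cycles:  9 pixels after the first store
    sty GRP1        ; +3 cycles: 18 pixels
    sty GRP0        ; +3 cycles: 27 pixels

; Three stores, a TSX, and a fourth store (hypothetical COLUPx targets)
    sta COLUP1      ; first store ends here
    stx COLUP0      ; +3 cycles:  9 pixels after the first store
    sty COLUP1      ; +3 cycles: 18 pixels
    tsx             ; +2 cycles: X := stack pointer (fourth value)
    stx COLUP0      ; +3 cycles: 33 pixels
```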


I have long wondered whether it is possible to sacrifice part of the resolution to get more screen coverage and possibly more time to load more sprites, like double-width sprites with proper spacing to have 2-color-clock pixels instead of just 1-clock spacing. I bet I am obviously missing something here, because I have never heard of or seen this done before. I bet the triple sprites, spacing, and double width are not possible together.

 

Showing more than one double-width or quad-width copy of a sprite on a line will require writing NUSIZx with some rather precise timing. Since at least two different values will have to be written to NUSIZx, there's not a whole lot that can be done with this technique.

 

A more interesting approach is the "multi-resp" trick. Writing to RESPx has the following effects:

 

-1- If a sprite is about to appear within three pixels of the beam's position, it will be displayed two pixels from the beam's position.

 

-2- If the second or third sprite copies are enabled, they will appear on the current line (18, 34, or 66 pixels from the beam's position at the time of the store) as well as subsequent lines.

 

My AtariAge logo for Stella's Stocking shows sixteen sprites using venetian blinds (that's eight per line!). There aren't enough cycles to make all the sprite shapes different, but there are enough cycles to spell out "ATARIAGE" in nice big letters.


I am not sure if I totally understand everything that goes on, but am I correct in assuming that when the sprites are aligned properly and triple sprites with proper spacing are set, then in order to get new data into each sprite, the first 2 are loaded right off the bat long before they are even displayed, then A, X, and Y are all loaded with data and just written to each of the corresponding sprites as they come, and then there is just enough time left to do a load and store for the last one? The minimum needed being 5 cycles.

 

The trick is to use the so-called "vertical delay" latches. If the "delay" latch for a sprite is set, changes to that sprite's shape won't be displayed until the next time the other sprite is written. If sprites are written on alternate scan lines, this will shift a sprite vertically. There is no requirement that the latches be used for vertical positioning, however; indeed, they're often useful (essential) for other purposes.

 

 

 

Doesn't the vertical delay cause double pixel height in the sprite graphics? I figured this was used more for flicker/blind routines than anything when getting hi-res graphics, but I probably have trouble following the steps right. Let me read it a bit more closely and get back to ya.


Sorry to double post, but on a side note, IMHO the best resource I have found for the TIA, and the one I have been using lately, is at http://www.howell1964.freeserve.co.uk/Atar...description.htm

I now see why this would be very hard to do. However, if NUSIZ is changed after the last sprite is displayed for P0 from xxxxx011 to xxxxx100, would this cause the sprite to finish out the line 2 sprites wide, with the same done to P1, also 2 sprites wide? The only problem with this is that you would have 48 pixels, then a 16-pixel gap, and then 16 more new pixels, like:

 

XOXOXO__XO

 

If only there was a way to fix or patch that gap with something like the vertical delay somehow. I don't know, this may not even work. The missiles could be used, but that would only bring it from 16 down to 14 empty pixels.

 

 

P.S. I recall someone once said that there is redundant memory in the TIA for the sprites that is mirrored, and thought of a way to have different data in each one and use it with precise timing to shift between the two. Is this true?

 

P.P.S.

 

HOLD THE PHONE! I could be wrong because it may take up too many cycles, but most if not all of it doesn't even have to happen while the line is being drawn; it could all happen in HBLANK. I may have found a left-field way to get not just 2 extra pixels with missiles but a pseudo 4 extra, depending on how many cycles are left after drawing all the player sprite data.

It's just a matter of calling one of about 15 predefined routines, knowing what needs to be done.

First, having the horizontal triggers for missiles 0 and 1 preset.

Then, depending on what needs to be displayed, not using all of these at once but just the settings needed:

writing one of the playfield registers at the right time on screen with just a single bit at a precise location

changing none, one or the other, or both of only D4 of the NUSIZs

setting the missiles on or off

setting HMM0 and/or HMM1

Of course, HMOVE falls into place sometimes.

Not all of these are used together, only a few. But with great timing and some ingenious coding, the 2 missiles and one of the playfield pixels can be used to make at least 4 possible pixels appear in one spot vertically, depending on which of the 16 possible combinations is needed.

I guess this may not be possible, but if there are enough cycles... I am only thinking about something like a demo cart, not an actual game scenario.


Doesn't the vertical delay cause double pixel height in the sprite graphics?

No, it doesn't-- that is, not unless you use it that way. I prefer to call them "video delay" registers, because that describes their function more accurately.

 

When you enable the VDELP0 and VDELP1 features, you can write shape data to GRP0, but the player0 sprite won't display the new shape data until you subsequently write to GRP1. Likewise, you can write shape data to GRP1, but the player1 sprite won't display the new shape data until you subsequently write to GRP0. It doesn't matter whether you do all of this on the same scan line, or on different scan lines. If you write to GRP0 and GRP1 on alternating scan lines, then you do get double pixel height. But if you write to them on the same scan line-- while displaying multiple copies of each player-- then you can get different graphics for each copy of the players on the same scan line.

 

P.S. I recall someone once said that there is redundant memory in the TIA for the sprites that is mirrored, and thought of a way to have different data in each one and use it with precise timing to shift between the two. Is this true?

That sounds like it may have been a garbling of two different things.

 

If you store the lines (bytes) for a player shape in a ROM table so they can be loaded on different scan lines, you can use the same data to display a mirrored copy of the player shape, simply by enabling the mirror feature. You don't need to store a second ROM table for the mirrored shape, because that would be redundant, and a waste of ROM.

 

On the other hand, the TIA contains two graphics registers for each player. Normally-- when VDELPx is disabled-- writing to GRPx will immediately change the player's shape. But if VDELPx is enabled, writing to GRPx will store the new shape in one register, yet the player's shape will still be drawn from the other register. Then, when you write the other player's data, the first player's data will be copied from one register to the other, such that the first player is now drawn with the new data.

 

Let's call these registers GRP0a, GRP0b, GRP1a, and GRP1b.

 

   LDA #1
   STA VDELP0      ; enable video delay for player0
   STA VDELP1      ; enable video delay for player1

   LDA #%00000001
   STA GRP0        ; player0 doesn't display %00000001 yet

   LDA #%00000010
   STA GRP1        ; player1 doesn't display %00000010 yet, but player0 now displays %00000001

   LDA #%00000100
   STA GRP0        ; player0 doesn't display %00000100 yet, but player1 now displays %00000010

   LDA #%00001000
   STA GRP1        ; player1 doesn't display %00001000 yet, but player0 now displays %00000100

; etc.

 

The advantage of using video delay like this is that you can store the first 3 shapes for player0 copy 1, player1 copy 1, and player0 copy 2, but player0 still shows the shape for copy 1.

 

Then you can load the next 3 shapes into A, X, and Y, in preparation for updating the graphics.

 

Then you wait until the first copy of player 0 is just about to be finished, and you start storing the other 3 shapes one after the other, writing the third shape to player0 as well as to player1:

 

   LDA #1
   STA VDELP0
   STA VDELP1

   LDA player0copy1
   STA GRP0

   LDA player1copy1
   STA GRP1        ; now player0 displays player0copy1

   LDA player0copy2
   STA GRP0        ; now player1 displays player1copy1

   LDA player1copy2
   LDX player0copy3
   LDY player1copy3

; wait for the right moment, then

   STA GRP1        ; now player0 displays player0copy2
   STX GRP0        ; now player1 displays player1copy2
   STY GRP1        ; now player0 displays player0copy3
   STY GRP0        ; now player1 displays player1copy3

 

In practice, you'll want to use Y as an index into the shape tables, so you'll need to store the Y index value before you load Y with shape data, and then restore the Y index value afterward. You can also use the "illegal" opcode LAX to save time:

 

   LDA #1
   STA VDELP0
   STA VDELP1

   LDY #7          ; assuming we want to draw 8 lines (7, 6, 5, 4, 3, 2, 1, 0)

loop

   STY temp1       ; save the index for later

   LDA (player0copy1),Y
   STA GRP0

   LDA (player1copy1),Y
   STA GRP1

   LDA (player0copy2),Y
   STA GRP0

   LAX (player1copy2),Y ; load A and X at the same time

   LDA (player0copy3),Y ; load A
   STA temp2       ; save it for a moment

   LDA (player1copy3),Y ; load A with the last value
   TAY             ; move it to Y

   LDA temp2       ; restore A from the saved value

; wait for the right moment, then

   STX GRP1        ; player1copy2 latched; player0 now shows player0copy2
   STA GRP0        ; player0copy3 latched; player1 now shows player1copy2
   STY GRP1        ; player1copy3 latched; player0 now shows player0copy3
   STY GRP0        ; dummy write; player1 now shows player1copy3

   LDY temp1       ; restore the index to Y
   DEY
   BPL loop

Of course, there will most likely be a STA WSYNC in there somewhere. :)

 

Michael


Of course, there will most likely be a STA WSYNC in there somewhere. :)

 

Why? If you need a WSYNC someplace, that means you haven't filled up your cycle allotment and you should put more stuff on the screen. :)

 

Well, sometimes a WSYNC is okay if there are parts of the kernel that are a lot less busy than others and there really isn't anything good to put there (e.g. in the parts of the screen between vertically-separated enemies). :) But I'd really suggest learning to do busy kernels without WSYNC. It will give you more than a 4% boost in CPU horsepower, which on the 2600 is a lot.


Of course, there will most likely be a STA WSYNC in there somewhere. :)

 

Why? If you need a WSYNC someplace, that means you haven't filled up your cycle allotment and you should put more stuff on the screen. :)

 

Well, sometimes a WSYNC is okay if there are parts of the kernel that are a lot less busy than others and there really isn't anything good to put there (e.g. in the parts of the screen between vertically-separated enemies). :) But I'd really suggest learning to do busy kernels without WSYNC. It will give you more than a 4% boost in CPU horsepower, which on the 2600 is a lot.

That's why I said "will most likely be." ;) It seems like I'm more often looking for ways to reduce cycles so I can squeeze in all the things I want to do, rather than wasting more cycles. In those cases, WSYNC is certainly the first thing to go, maybe followed by unrolling the loop (if that's feasible).

 

Michael


I have long wondered whether it is possible to sacrifice part of the resolution to get more screen coverage and possibly more time to load more sprites.

This hasn't been specifically answered, so I'll throw this out there.

 

First of all, you can't have duplicate copies of anything other than single-width sprites. Well, you can use multiple-RESP tricks like supercat explains above.

 

But if resolution isn't really a big deal, you can cover 64+ pixels with two quad-width sprites (the '+' comes from using the missiles and/or ball), which is a pretty huge improvement on the 48-52 that the standard 6-digit trick gets you.
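
A minimal sketch of the NUSIZx setup for that, assuming DASM syntax and the standard vcs.h register names; placing player 1 exactly 32 pixels to the right of player 0 so the two 32-pixel sprites abut is not shown.

```asm
    lda #%00000111  ; NUSIZx D2-D0 = 111: one quad-width player (32 pixels)
    sta NUSIZ0
    sta NUSIZ1
```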


I have long wondered whether it is possible to sacrifice part of the resolution to get more screen coverage and possibly more time to load more sprites.

This hasn't been specifically answered, so I'll throw this out there.

 

First of all, you can't have duplicate copies of anything other than single-width sprites. Well, you can use multiple-RESP tricks like supercat explains above.

 

But if resolution isn't really a big deal, you can cover 64+ pixels with two quad-width sprites (the '+' comes from using the missiles and/or ball), which is a pretty huge improvement on the 48-52 that the standard 6-digit trick gets you.

 

I only said part of the resolution, not all of it.

Besides, with twice the space covered by 2 sprites, could the horizontal position trigger be re-pulsed more than once per scanline, especially with more time to do it and reload sprite data with double-width sprites? Maybe double-width sprites with the missiles as buffer spacers? The vertical bit shift might also help with this. The accumulator would need to be free to do extra stuff.


Besides, with twice the space covered by 2 sprites, could the horizontal position trigger be re-pulsed more than once per scanline

 

Not without changing NUSIZx to show multiple copies and then changing it back just before the sprite is actually displayed.

 

I should mention a fun little wrinkle with the 4x sprites, though: if the sprites are positioned so that their pixel boundaries are 2 pixels away from playfield-pixel boundaries, then if the sprite and playfield colors match it's possible to show what will look like a 66-pixel sprite with half resolution, provided there's no need to show a single lighted 'double' pixel in isolation. The left and right halves of the screen could be different colors if 'score' mode is used, provided that either the playfield pixel just to the left of center is unused or else the right edge of it is covered by Player 0 or Missile 0. Otherwise, a tiny sliver of the rightmost part of that playfield pixel--about 1/2 the width of a normal pixel--will appear in the wrong color. The actual color that appears there may change as a 2600 warms up.


Several people keep talking about using the vertical delay for storing P0 copy 2, the third in the line, but not the vertical delay for P1 copy 2. Is this so you can alternate on each line with the vertical delay for the 2 sprites? Just wondering, since each player has a vertical delay, right? So technically you could do 4 bytes of data for the sprites for a single line without having to load any other registers.


Several people keep talking about using the vertical delay for storing P0 copy 2, the third in the line, but not the vertical delay for P1 copy 2. Is this so you can alternate on each line with the vertical delay for the 2 sprites? Just wondering, since each player has a vertical delay, right? So technically you could do 4 bytes of data for the sprites for a single line without having to load any other registers.

Some code might help illustrate what's happening here:

Both P0 and P1 are VDELed.

   ldy #5
ScoreKernelLoop             ;        59
   SLEEP 3
   dey                      ;+5      64
   sty Temp+7               ;+3      67
   lda (MiscPtr),Y
   sta GRP0                 ;+8      75   VDEL
   lda (MiscPtr+2),Y
   sta GRP1                 ;+8       7   Now GRP0 is written to screen
   lda (MiscPtr+4),Y
   sta GRP0                 ;+8      15   Now GRP1 is written to screen
   lda (MiscPtr+6),Y
   tax                      ;+7      22
   lda (MiscPtr+8),Y
   pha                      ;+8      30
   lda (MiscPtr+10),Y
   tay                      ;+7      37
   pla                      ;+4      41
   stx GRP1                 ;+3      44   Now GRP0 is written to screen
   sta GRP0                 ;             Now GRP1 is written to screen
   sty GRP1                 ;             Now GRP0 is written to screen
   sty GRP0                 ;+9      53   Now GRP1 is written to screen
   ldy Temp+7               ;+3      56
   bne ScoreKernelLoop      ;+3      59


Several people keep talking about using the vertical delay for storing P0 copy 2, the third in the line, but not the vertical delay for P1 copy 2. Is this so you can alternate on each line with the vertical delay for the 2 sprites? Just wondering, since each player has a vertical delay, right? So technically you could do 4 bytes of data for the sprites for a single line without having to load any other registers.

For the "score" trick, you need to use both. With VDELP0 enabled, you can write to GRP0, but it won't update the player0 sprite until you subsequently write something to GRP1. With VDELP1 enabled, you can write to GRP1, but it won't update the player1 sprite until you subsequently write something to GRP0. This is shown in the comments in the sample code.

 

Michael
