
VBXE example/tutorial - Using the blitter


Thelen


Hi everybody,

To (hopefully) show some of the basics of the VBXE blitter's capabilities, I made this source, in which everything is documented in some detail and written in plain 6502 code. (The pseudo opcodes of MADS are great!)

It can be assembled with MADS. At the top of the source you'll have to set the VBXE base address manually. To assemble the source you also need the file 'tiles_header_removed.bmp'; this is the stripped-down, pixel-data-only version of the Tiles.bmp file, which is also in the .zip file.

What is a blitter? I'll quote @danwinslow from the XDL VBXE tutorial:

Quote

A blitter is, as was said, a very fast memory transfer unit optimized for video memory targets and sometimes specifically for rectangular arrays of bits. It is intended to move arrays of bits in a 'block transfer' to a destination memory, and it usually has facilities for controlling how the bits get laid into the destination memory - via or, xor, or replacement. It can be used as the core of a 'sprite engine', but it is not correct to say 'it is the sprite engine'. There still needs to be software wrapped around it that knows what a sprite is, what the color bits in a memory word mean, contains the actual sprite movement and detection logic, etc.



Any questions or comments? Feel free to ask!

 

 

 

 

 

VBXE blitter example.zip


It looks like this example is leaving the ANTIC display on, but just setting the colors to black. Ordinarily, you'd want to actually turn off ANTIC playfield DMA via SDMCTL/DMACTL if exclusively using the VBXE display, as the GR.0 screen slows down the CPU by 30%.

 

This is a risky race condition:

;Start to execute the blitterlist
	LDA	#1
	STA	VBXE_BLITTER_START
;we don't check or wait until the blitterlist is done, because it's a small list :-)


;we want to move blitterblock 2 in VRAM, which is displaying the candles, so we
;increase the destination VRAM address of this 2nd blitterblock by increasing byte 40 of the blitterlist, which
;is the low byte of the destination address of the 2nd blitterblock. By doing this the candles
;will move horizontally
	INC	$8000+$28

 

It's modifying the second blit of a blit list immediately after starting the blitter. Thing is, the first blit takes long enough that this is actually modifying the second blit before it can execute, as the first blit takes ~8000 local cycles, or ~1000 computer cycles. This seems unintentionally subtle for a tutorial as in other circumstances this could lose the race instead, depending on DMA timing, interrupts, and how fast the first blit is.


The other thing regarding ANTIC DMA: if the banking is such that its accesses go to VRAM, then there'd be wasted cycles, since it takes priority over local accesses.

The worst case is still a fairly small fraction of the 14 MHz available, but it has the potential to cause problems in a demanding demo or game situation.


        	.byte $ff    		; AND FF
        	.byte $00       	; XOR

 

Wow, the breadth and quality of examples being added every day is great; thanks so much.

I don't use MADS and am still holding out, trying not to learn it, although one answer may be that, hey, you've got to learn it, because that's where the community is at. But for now, I definitely appreciate the regular 6502 examples.

I do see this example is using the BCB configured as above. And this may not be classified as a VBXE question, perhaps more of a general programming question, but are there any non-MADS examples of writing out sprites that overlap each other?

Or even just a high-level walkthrough of the logic: you save off the background and write your sprites; then, next frame, you restore the background and write your sprites again. Is that it? Usually I try to just read through previous work, and that makes the strategy being used immediately clear. But because the examples I found were in MADS, I never really read them. I usually got as far as reading up on how MADS macros work, and then life got in the way; I always meant to get back to it. So in the real world, I usually find the project trailing off without my having learned MADS. Thanks.

 

 

 

 


3 hours ago, Mark2008 said:
Or even just a high-level walkthrough of the logic: you save off the background and write your sprites; then, next frame, you restore the background and write your sprites again. Is that it? Usually I try to just read through previous work, and that makes the strategy being used immediately clear. But because the examples I found were in MADS, I never really read them. I usually got as far as reading up on how MADS macros work, and then life got in the way; I always meant to get back to it. So in the real world, I usually find the project trailing off without my having learned MADS. Thanks.

I am working on an example of animated sprites moving over a background.  I am using MADS but really none of the advanced features (since I am new to it).  I have unfortunately gotten stuck on the background restore operation.  I think I found the error in my logic.  Whenever I get it working, I'll use it as the basis for a Blitter write-up in the style of my XDL example I posted earlier.  I can't make any promises on when it will be ready.

 

You have the basic idea.  The steps for doing this without background graphics:
1 - Draw object
2 - Calculate new positions
3 - Swap buffers
4 - Clear screen
Goto 1

 

The steps for doing this with background graphics:

1 - Save background (only the area where the new sprite will be drawn)
2 - Draw object
3 - Calculate new positions
4 - Swap buffers
5 - Only restore the 16*16 patch that changed
Goto 1


Clearing the screen is a big operation.  If you're only using the bitmap overlay for sprites, then you can just draw and then clear each one.  A wipe operation has the advantage over a save/restore type in that it's much quicker.

That's what I did on Moon Cresta.  Start the clear operation where the last possible display of one could occur then draw the new ones when clear is finished.

You could chain the whole lot of BCBs together so it's all done for you.  Can also use blits to populate the BCBs with the positional data.

 

Though a bit wasteful, a trick to make address calculation faster can be to use 512 bytes as the scanline increment instead of 320.  In any case you'd probably want >320 anyway so that you can have sprites that go partially offscreen without wrapping around.


16 minutes ago, Rybags said:

Though a bit wasteful, a trick to make address calculation faster can be to use 512 bytes as the scanline increment instead of 320.  In any case you'd probably want >320 anyway so that you can have sprites that go partially offscreen without wrapping around.

Yeah - I am running a chained BCB for my mask then draw operation, which I also use as just a draw only by toggling the BL_ADR0 value.  Also for simplicity, I am using narrow mode so each line is only 256 bytes.  The advantage is, the Blitter's dest_adr0 is the "X position register", and dest_adr1 is the starting line or the "Y position register". 
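
To make the stride trade-off concrete, here is a small Python model (not VBXE code) of how a sprite's (x, y) maps to the three destination address bytes for a few scanline strides. It assumes an 8-bit-per-pixel framebuffer that starts at VRAM address 0; the function names are made up for illustration.

```python
# Model only: converts a sprite's (x, y) into destination address bytes
# for a given scanline stride (bytes per line). Framebuffer base is
# assumed to be VRAM address 0.

def vram_offset(x: int, y: int, stride: int) -> int:
    """Return the byte offset of pixel (x, y) in an 8-bit-per-pixel buffer."""
    return y * stride + x

def split_bytes(offset: int) -> tuple[int, int, int]:
    """Split a 24-bit offset into (low, middle, high) address bytes."""
    return offset & 0xFF, (offset >> 8) & 0xFF, (offset >> 16) & 0xFF

# 256-byte stride: the low byte is simply x and the middle byte is y.
print(split_bytes(vram_offset(40, 100, 256)))   # (40, 100, 0)
# 320-byte stride: a real multiply (or a lookup table) is needed.
print(split_bytes(vram_offset(40, 100, 320)))   # (40, 125, 0)
# 512-byte stride: the offset is y shifted left by 9 bits, plus x.
print(split_bytes(vram_offset(40, 100, 512)))   # (40, 200, 0)
```

With the 256-byte stride no arithmetic is needed at all, which is why the two address bytes can be treated as X and Y position registers.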


29 minutes ago, Rybags said:

Can also use blits to populate the BCBs with the positional data

Can you elaborate on that?  Is it worth setting up a blit operation to just change what is only a few bytes?  I am currently only working on a simple example, starting with one sprite.  Would the idea be that say for a shooter where there can be up to 32 sprites on screen, there would be 32 dedicated BCBs, one per sprite, and the main loop would be:
1 - Run game logic

2 - Calculate collisions and positions and scoring, etc.

3 - Blit everything in one chained operation

4 - Goto 1


I'll try and get the latest Moon Cresta source and upload it.

I can't remember if I used blits to help calculate object addresses but at the least it's generally easier to keep them in a table then pick/place into BCBs with the blit.

An annoyance is that there's no NOP-type code for BCBs that would let you just skip inactive objects.  I think for those I might have just rendered at pos 0,0, which would always be offscreen.

Additionally, you could reduce the size so you're not wasting too much time rendering/clearing something that's not visible.


You could set AND=$00 on the disabled sprites if they're blitted in transparent mode. It'd still take the time to blit the object, but you'd be spending that time anyway if the object were live, so it'd get you a more consistent frame time.

 

Alternatively, it should be possible to do a pack left operation with the blitter to delete BCBs from the list since it can compute partial sums, but at that point copying all of the BCBs back and forth would itself consume some bandwidth. Probably not advantageous unless you have some big blits to omit in a long list.

 


1 hour ago, phaeron said:

Alternatively, it should be possible to do a pack left operation with the blitter to delete BCBs from the list since it can compute partial sums

Can you explain this (or point me to some reference material)?  I thought the only thing the blitter could do was just copy RAM with some optional binary operations (sum, xor, etc.).  I also learned that the SVBXE.SYS driver uses the blitter to draw lines (I am assuming some form of Bresenham algo)?  I can't even begin to think how to do this stuff.


3 hours ago, Stephen said:

Can you explain this (or point me to some reference material)?  I thought the only thing the blitter could do was just copy RAM with some optional binary operations (sum, xor, etc.).  I also learned that the SVBXE.SYS driver uses the blitter to draw lines (I am assuming some form of Bresenham algo)?  I can't even begin to think how to do this stuff.

A partial sum operation converts a list of numbers into a list of partial sums of all numbers up to that point. For instance, given (1, 1, 0, 0, 1), the partial sum result is (1, 2, 2, 2, 3). This is a useful primitive for pack and unpack operations, since it can provide packed source or destination indices for a sparse list.

 

A pack left operation, in turn, removes some elements from a list and packs all of the remaining elements to the left. Given (1, 2, 0, 0, 5), the pack left operation to remove the zeroes would result in (1, 2, 5).

 

The partial sum operation converts a mask of elements to keep into the indices where they should go. In this example, the 3rd and 4th elements are being deleted, so the mask of elements to keep is (1, 1, 0, 0, 1) and the partial sum result says that the elements with a '1' in the mask should go to indices 1, 2, and 3. Thus, a pack left can be performed as follows:

  • Compute a mask of elements to keep.
  • Compute a partial sum to determine where they should go.
  • Copy the partial sum indices into a blit list to copy only the elements marked valid in the mask into the indices given by the partial sum. The unmarked elements can either have their blits nullified, or the copy order can be set up so that unmarked elements are overwritten by marked elements.

This technique is used in vectorized code using vector instruction sets, such as SSE/AVX or NEON. In AVX-512, there is a series of dedicated instructions for this: VPCOMPRESSB/W/D/Q. The VBXE blitter doesn't support these operations directly, but you can construct them with some trickery.
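
As a plain illustration of the three steps above, here is a minimal Python sketch of pack left built on partial sums. It runs on ordinary lists and says nothing about how the blits themselves would be set up; `pack_left` is a hypothetical helper name.

```python
from itertools import accumulate

def pack_left(values, keep_mask):
    """Keep the elements whose mask entry is 1, packed to the left."""
    sums = list(accumulate(keep_mask))            # inclusive partial sums
    packed = [None] * (sums[-1] if sums else 0)   # one slot per kept element
    for value, keep, index in zip(values, keep_mask, sums):
        if keep:
            packed[index - 1] = value             # partial sums are 1-based
    return packed

values = [1, 2, 0, 0, 5]
mask = [1 if v != 0 else 0 for v in values]       # step 1: mask of elements to keep
print(list(accumulate(mask)))                     # step 2: [1, 2, 2, 2, 3]
print(pack_left(values, mask))                    # step 3: [1, 2, 5]
```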

 

For a partial sum, the trick is to deliberately overlap source and destinations of the blit. Normally when moving a block of memory where source and destination overlap, you have to choose the direction to ensure that the copy is correct: ascending if source > destination and descending if source < destination. However, if you deliberately choose the "wrong" direction, then the blitter will re-read its own output in the middle of the blit. In particular, if destination = source + 1 for a step of +1 in add mode, then the blitter will compute a partial sum by repeatedly adding the last sum result to each successive byte in the array. Similarly, if you use EOR instead of ADD and do this vertically, the blitter will execute an EOR fill.
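
The overlap trick can be modeled in a few lines of Python. This sketch assumes the blitter reads and writes strictly one byte at a time in ascending order, with no buffering; that is an assumption of the model, not something checked against hardware. The EOR case is shown along one row here rather than vertically.

```python
import operator

def overlap_blit(mem, src, dst, width, op):
    """Model a one-line blit that combines source into destination, byte by byte."""
    for i in range(width):
        # Each write lands where the next read will happen (dst = src + 1),
        # so the running result is carried forward through the array.
        mem[dst + i] = op(mem[dst + i], mem[src + i]) & 0xFF

sums = bytearray([1, 1, 0, 0, 1])
overlap_blit(sums, src=0, dst=1, width=len(sums) - 1, op=operator.add)
print(list(sums))    # [1, 2, 2, 2, 3]

edges = bytearray([0, 1, 0, 0, 1, 0])
overlap_blit(edges, src=0, dst=1, width=len(edges) - 1, op=operator.xor)
print(list(edges))   # [0, 1, 1, 1, 0, 0]
```

In the EOR run, the two 1 markers act as start and stop edges, and the span between them ends up filled.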

 

Computing masks is trickier but can be done with the aid of masking and stenciling operations. For instance, if you want to check for X > $65 for an entire array of X's, you can compute ((X + $1A) OR X) in two blits and bit 7 will indicate true/false. Then, stencil blitting this with AND=$80 on top of an array filled with $7F will convert these values to $7F/$80, after which subtracting $7F (add $81) converts to $00/$01. It's a bunch of blits, but keep in mind that the blitter executes these as fast as 1-3 elements per 6502 cycle, so this runs about 20x faster than the 6502 can do it, and it's done in parallel while the 6502 can do something else.
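
Here is a rough Python model of that comparison, to show why the arithmetic works. It assumes each source byte is ANDed and then XORed with the BCB masks before the blit mode is applied, and it starts from a work buffer pre-filled with $1A so that the first two steps are one ADD and one OR; both are assumptions of this sketch. `blit` and `greater_than_65` are hypothetical names.

```python
def blit(src, dst, mode, and_mask=0xFF, xor_mask=0x00):
    """Model one blit: mask each source byte, then combine it into dst."""
    for i, s in enumerate(src):
        s = (s & and_mask) ^ xor_mask
        if mode == "stencil":          # transparent copy: zero bytes are skipped
            if s != 0:
                dst[i] = s
        elif mode == "add":
            dst[i] = (dst[i] + s) & 0xFF
        elif mode == "or":
            dst[i] |= s
        else:
            raise ValueError(mode)

def greater_than_65(xs):
    """Return 1 where x > $65 and 0 elsewhere, using only blit-style passes."""
    work = bytearray([0x1A] * len(xs))            # assumed pre-filled buffer
    blit(xs, work, "add")                         # work = X + $1A
    blit(xs, work, "or")                          # work = (X + $1A) OR X
    flags = bytearray([0x7F] * len(xs))
    blit(work, flags, "stencil", and_mask=0x80)   # flags = $7F or $80
    # AND=$00 / XOR=$81 turns any source into the constant $81.
    blit(work, flags, "add", and_mask=0x00, xor_mask=0x81)
    return list(flags)

print(greater_than_65(bytearray([0x00, 0x65, 0x66, 0xE6, 0xFF])))  # [0, 0, 1, 1, 1]
```

The OR with X is what rescues values of $E6 and above, where X + $1A wraps past $FF and loses bit 7.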

 

Not sure about doing a Bresenham with the VBXE blitter. The Amiga blitter can do Bresenham lines natively, but the VBXE can't, and it'd probably be hard to cobble it together. But you can do a fixed-point line routine with the VBXE blitter, assuming you have a fast way to divide.
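
For reference, a fixed-point line routine of the kind mentioned can be sketched in Python as follows. It uses an 8.8 format with one division per line and one addition per scanline; the function name and the half-pixel rounding offset are choices made for this sketch, and it is not a description of any blitter setup.

```python
def edge_x_positions(x0, y0, x1, y1):
    """Return the x position on each scanline from y0 to y1 (inclusive)."""
    if y1 <= y0:
        raise ValueError("this sketch expects y1 > y0")
    slope = ((x1 - x0) << 8) // (y1 - y0)   # 8.8 fixed-point step per scanline
    x_fixed = (x0 << 8) + 0x80              # start half a pixel in, for rounding
    xs = []
    for _ in range(y0, y1 + 1):
        xs.append(x_fixed >> 8)             # integer part is the pixel column
        x_fixed += slope                    # one addition per scanline
    return xs

print(edge_x_positions(0, 0, 10, 5))   # [0, 2, 4, 6, 8, 10]
print(edge_x_positions(10, 0, 0, 5))   # [10, 8, 6, 4, 2, 0]
```

The only expensive step is the division that produces the slope, which is why a fast divide is the prerequisite.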


18 hours ago, Stephen said:

The steps for doing this with background graphics:

1 - Save background (only the area where the new sprite will be drawn)
2 - Draw object
3 - Calculate new positions
4 - Swap buffers
5 - Only restore the 16*16 patch that changed
Goto 1

Exactly, it's like that. When I first made Simon of Castlevania walk, it was done with a few blits like that. Some time later I needed more sprites onscreen, and I ended up with a very big blitterlist containing 54 blitterblocks. It was my first attempt to test whether something could be run with such a list. (The other choice, which would be more efficient, is to end the list before the BCBs that do nothing. But then there would be an extra step in which the Atari checks which ones should be executed and which not, and moves them within the list.)

I was amazed that 54 blitterblocks and 11 separate blitterblocks could be done!

Now I set the height and width to 0 for blitterblocks that don't do anything. The only catch is that the top-left pixel of the BCB should be 0/transparent; otherwise a stray pixel shows up somewhere onscreen 🙂

This is my sprite blitterlist in Castlevania (I also have a separate blitterlist for the tile graphics):

 

;Blitlist sprites


;Blitterblocks 0-17 are all 'restore background' blits

;0 restore object1/enemy1 bg 
;1 restore object2/enemy2 bg 
;2 restore object3/enemy3 bg  
;3 restore object4/enemy4 bg  
;4 restore object5/enemy5 bg 
;5 restore object6/enemy6 bg  
;6 restore object7/enemy7 bg  
;7 restore object8/enemy8 bg  
;8 restore background hand back 
;9 restore background whip back top 
;10 restore background whip back bottom 
;11 restore background hand front 
;12 restore background whip front 
;13 restore subweapon1 bg  
;14 restore subweapon2 bg 
;15 restore background Simon top 
;16 restore background Simon bot ***NOT USED, all done by 15
;17 restore BG EXTRA


Blitterblocks 18-36 are all 'save background to a temporary space' blits (in the same order as the restores)

Blitterblocks 37-54 are all 'Draw sprite' blits (in the same order as the restores)


As you can see, like the original NES version, only 8 objects/enemies can be onscreen. Whip saving/drawing could probably be done more efficiently, but I didn't know what to expect with all the blits...

It amazed me that at first I could make a scrolling screen with 11 blitter commands... and still had enough cycles to draw all the sprites with all the BCBs, convert X/Y positions to VRAM positions, do a soft collision check, and let Simon and the objects walk through the level data and react to it... and this with ANTIC on 🙂 (If I need more CPU cycles I can turn it off, but I don't expect to; I'm hoping to use a mode 4 background behind the VBXE GX.)

 


8 hours ago, phaeron said:

Not sure about doing a Bresenham with the VBXE blitter. The Amiga blitter can do Bresenham lines natively, but the VBXE can't, and it'd probably be hard to cobble it together. But you can do a fixed-point line routine with the VBXE blitter, assuming you have a fast way to divide.

Nice tricks there…

 

Re: line… I used the VBXE for edge scanning. The CPU calculated the slope in 8.8 format and set the start x position in the left and/or right edge table. Then, in different blitter passes, the VBXE picks from a huge 256x256 table (values 0-255, each repeated 256 times) with the slope as the step value… that value gets added to the start x position and written into the start address of a 200-line BCB list. That's for the starting position. Now, the length calculation?

Well… different passes are needed. There is no SUB, but it can be done with ADD, EOR $FF, and ADD +1… run over the BCB list. The polygon color is blitted into the appropriate byte of each BCB… The stop is set by a termination byte in the blit list at the last scanline needed… and then you can start the blit… Don't forget to reset that termination byte after the blit.
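
The subtraction trick is ordinary two's complement arithmetic on bytes. A minimal Python check, with `sub_via_add` as a made-up name:

```python
def sub_via_add(a, b):
    """Compute (a - b) mod 256 using only EOR $FF, +1, and ADD."""
    negated = ((b ^ 0xFF) + 1) & 0xFF   # two's complement of b
    return (a + negated) & 0xFF

print(sub_via_add(200, 60))   # 140
print(sub_via_add(10, 20))    # 246, i.e. -10 modulo 256
```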


On 1/21/2023 at 8:38 AM, Thelen said:

This is my sprite blitterlist in Castlevania (I also have a separate blitterlist for the tile graphics):

When you do the three main operations (Restore, SaveMask, DrawNew) do you do this to the same screen buffer?  I don't know what I am doing wrong but I've been stuck on this damn restore operations for days now.  This should be so simple but something is just not "clicking" for me!  I'm making changes to my code now to allow for operating on a larger amount of sprites and using separate BCBs for each operation rather than swapping a bunch of values trying to reuse.  Maybe that will help.


1 hour ago, Stephen said:

When you do the three main operations (Restore, SaveMask, DrawNew) do you do this to the same screen buffer?  I don't know what I am doing wrong but I've been stuck on this damn restore operations for days now.  This should be so simple but something is just not "clicking" for me!  I'm making changes to my code now to allow for operating on a larger amount of sprites and using separate BCBs for each operation rather than swapping a bunch of values trying to reuse.  Maybe that will help.

Are you saving the background underneath each sprite as you draw them or all together? If you do them batched with all restore first, then all saves, then all draws, then only the order of draws matters -- the saves and restores can be executed in any order within each pass. If you're doing save-draw interleaved for each object at a time, then the restores must occur in reverse order of the saves as some objects may save a portion of other objects that have been drawn before them.
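
The ordering rule can be demonstrated with a toy one-dimensional framebuffer in Python. Two overlapping sprites are saved, drawn, and restored either batched or interleaved; the sprite data and the `run_frame` helper are invented for this sketch.

```python
BACKGROUND = bytes(range(16))
SPRITES = [(2, 6, 0xAA), (5, 6, 0xBB)]   # (x, width, color), overlapping at x = 5..7

def run_frame(interleaved, restore_order):
    """Save, draw, then restore; return True if the background comes back intact."""
    fb = bytearray(BACKGROUND)
    saved = []
    if interleaved:
        for x, w, color in SPRITES:           # save-draw, save-draw, ...
            saved.append(bytes(fb[x:x + w]))  # may capture an earlier sprite
            fb[x:x + w] = bytes([color]) * w
    else:
        for x, w, _ in SPRITES:               # all saves first (clean background)
            saved.append(bytes(fb[x:x + w]))
        for x, w, color in SPRITES:           # then all draws
            fb[x:x + w] = bytes([color]) * w
    for i in restore_order:
        x, w, _ = SPRITES[i]
        fb[x:x + w] = saved[i]
    return bytes(fb) == BACKGROUND

print(run_frame(False, [0, 1]))   # True:  batched, any restore order works
print(run_frame(False, [1, 0]))   # True
print(run_frame(True, [1, 0]))    # True:  interleaved, restores reversed
print(run_frame(True, [0, 1]))    # False: interleaved, restores in save order
```

In the failing case, the second sprite's save buffer contains pixels of the first sprite, and restoring it last puts them back on screen.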

 


4 hours ago, phaeron said:

Are you saving the background underneath each sprite as you draw them or all together? If you do them batched with all restore first, then all saves, then all draws, then only the order of draws matters -- the saves and restores can be executed in any order within each pass. If you're doing save-draw interleaved for each object at a time, then the restores must occur in reverse order of the saves as some objects may save a portion of other objects that have been drawn before them.

 

Just thinking about what would be best on VBXE. Maybe I would do it per sprite, with a special blit list per sprite. Otherwise you need to reorder when writing back in reverse…

Or the cheap way: double-buffer the complete screen… or keep a second copy of the background without any sprites and copy from there for the restore…

Not sure which is best, though.


Save/restore of the entire screen would be pretty expensive.

Using the blit to prepare BCBs can save a lot of CPU time.

In most cases having save/restore operations for sprites shouldn't be too much of a problem.

Or double-buffer with save/restore also which can take away the problem of animation tearing.


18 hours ago, Stephen said:

When you do the three main operations (Restore, SaveMask, DrawNew) do you do this to the same screen buffer?

I have separated the Restore operation from the SaveMask and DrawNew operations: after the first 18 blitterblocks there is an end block, and then I point the blitterlist to start at blitterblock 19 for saving and drawing. Sorry for the confusing explanation about two separate blitterlists; technically there are three different operations (but the sprite blitterlist felt like one list and the tile blitterlist like another 😂)

 

 


 

 

 


11 hours ago, Heaven/TQA said:

Just thinking what would be the best on VBXE. Maybe I would do it by sprite with special blitlist per sprite. Otherwise you need to reorder when writing back reverse…

It is actually the opposite. You do need to reverse the order if you are alternating saving and drawing sprites, you don't if you batch saving together. If you haven't drawn any sprites yet, the order of individual sprites in the save or restore pass doesn't matter -- any place where they overlap, they'll save or restore the same pixels.

 

The simplest case is if you have a solid background, where there is no save and erase is a clear operation. It doesn't matter if sprite A erases before or after sprite B if they're both just going to write $00 to the same addresses.

 

When using a static background with a background buffer, there is no save and the erase is a copy back from the background buffer to the framebuffer. Again, order doesn't matter, because either way the sprites will copy back the same pixels to the same areas whether they overlap or not.

 

When using a static background without a background buffer, the save is a copy from FB to save buffers and the erase is the reverse copy. Once again, sprite order doesn't matter, because overlapping sprites will save the same pixels and then restore the same pixels. The only requirement is that all of the erases happen before any saves, else you will need to do them in a specific order.

 

Two basic 16x16 copy blits take 1066 local cycles or 133.25 machine cycles, so if you run the blitter flat out with an optimized blit list you can save and restore 16 sprites of 16x16 size and still be within the vblank. In contrast, a 320x192 restore takes at least 134 scanlines to do -- it's really expensive.
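
The figures above can be reproduced with a little arithmetic. This Python sketch assumes 8 local cycles per machine cycle (implied by 1066 local cycles being 133.25 machine cycles), 114 machine cycles per scanline, and roughly 2 local cycles per copied byte for the full-screen estimate; the last number is inferred from the quoted results, not taken from documentation.

```python
LOCAL_PER_MACHINE = 8         # local (VBXE) cycles per machine cycle
MACHINE_PER_SCANLINE = 114    # machine cycles per scanline
LOCAL_PER_COPIED_BYTE = 2     # assumed: one read plus one write per byte

def local_to_machine(local_cycles):
    return local_cycles / LOCAL_PER_MACHINE

pair = local_to_machine(1066)                 # save + restore of one 16x16 sprite
print(pair)                                   # 133.25

sixteen = 16 * pair                           # 16 sprites, save and restore each
print(sixteen)                                # 2132.0
print(round(sixteen / MACHINE_PER_SCANLINE, 1))   # 18.7 scanlines

full_screen_local = 320 * 192 * LOCAL_PER_COPIED_BYTE
full_screen_lines = local_to_machine(full_screen_local) / MACHINE_PER_SCANLINE
print(round(full_screen_lines, 1))            # 134.7 scanlines
```

So sixteen save/restore pairs cost about 19 scanlines of blitter time, while a single full-screen restore costs about seven times as much.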

 


Well, I'm back at it.  Getting better at using the debugger.  It's no wonder my screen is not rendering correctly - LMAO.  Incorrect initial BCB setup, incorrect BCB changes during the flip screen code.  Running my code single frame at a time and lining things up this way was a huge help.

[Attached image: Debug.thumb.jpg]


Here is an example I have been working on for the past few days.

I was using VBXE over ANTIC mode 4.

The sprite is drawn in the VBXE memory area, whereas the background is drawn in the ANTIC mode 4 area.

I had a few hurdles along the way and, with the help of excellent AA members, I've overcome most of them.

I am going to cancel my summary tutorial thread and create a new thread for a step-by-step VBXE coding example using MADS assembly code.

The result of the step-by-step example is what you see below.

 

[Attached image: image.thumb.png]

 

Here is the executable:

vbxe.xex

 

 

