
VBXE speed



Yes - incredible savings potential. Generally source data will be in some sort of array anyhow so easily picked/placed by a blit whereas with the CPU it's lots of work since the target address will be +1 a few times then +21 from the first one for the next BCB. For several objects not a problem but I imagine moving polygons like you have would be into the hundreds.


There is currently an emulation limitation in Altirra where it will only "flush" blit lists at the end of a scanline. For short blits like this, this is slower than the real hardware, which does not have this artifact. However, it's still the case that you will get much better performance with blit lists instead of individual blits. You don't want to use the CPU to copy the left and right edges into the blit list -- put your ledge and redge lists in VRAM and let the blitter do the copies. This is a prerequisite anyway for doing the edge stepping on the blitter.


Right now the CPU builds the edge lists (thanks to the blitter not having fractional steps ;)), which are aligned in the VRAM window.

 

Then two copy BCBs copy the right edge and the color into the scanline BCB list.

 

Unfortunately, as the blitter has no second source channel, the CPU calculates the sizex for each span length and writes it into the blit list.

 

Then VBXE blits the polygon.

 

I don't see how to do that with the blitter...


Yeah, the blitter has its limitations. It's got the basic add function, but without carry it's not greatly useful.

 

Are you using "constant source data"? Refer to fx1.24.pdf, page 37:

 

 


The Blitter and constant source data
If the result of the following equation:
(blt_and_mask==0)
is true, then the source data is CONSTANT – it is independent from the source area and
its value is equal to blt_xor_mask. The Blitter will skip the phase of fetching the source
data, and the entire operation will be performed quicker. Filling VRAM with a constant
value is twice as fast as copying.

 

Set the AND mask in the BCB to 00 which instructs the blit to not fetch source data, instead whatever is in the XOR mask is used as the fill data.
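To make the constant-source rule concrete, here is a minimal Python model of the source path as the quoted text describes it: the fetched byte is ANDed with blt_and_mask and then XORed with blt_xor_mask. The function name is made up for illustration, and this only models the data path, not the timing or the skipped fetch.

```python
def blit_source_byte(src: int, and_mask: int, xor_mask: int) -> int:
    """Model of the blitter source data path: (src AND and_mask) XOR xor_mask.

    Hypothetical helper for illustration; all values are 8-bit.
    """
    return ((src & and_mask) ^ xor_mask) & 0xFF


# With and_mask == 0 the source byte no longer matters:
# the result is always xor_mask, i.e. a constant fill.
fill = [blit_source_byte(s, 0x00, 0x3C) for s in (0x00, 0xAB, 0xFF)]

# With and_mask == $FF and xor_mask == 0 it is a plain copy.
copy = [blit_source_byte(s, 0xFF, 0x00) for s in (0x00, 0xAB, 0xFF)]
```

So a fill in color $3C is just a BCB with AND mask $00 and XOR mask $3C.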


I simply copied the data from the edge buffer into the blit list with a standard copy.

 

The colors of spans are filled in with xor.

 

I'm not sure if it makes sense to blit constant array values into the XOR mask and then blit in burst mode into the blit list.

 

Sounds like a stupid operation ;) but if it works...


Do you mean XOR as in to force the fast fill mode... not XOR as in XOR to plot then second XOR to unplot?

 

I imagine doing it that way would be very slow... fastest method would probably be to just have a single fill/erase blit that wipes all possible memory that the polygons can occupy each time.


Phaeron.... how accurate would you describe the emulation level of VBXE in Altirra?

 

Probably about 90%. Attribute map is probably the biggest issue as attribute map collision is not implemented and attribute map cells narrower than 8 pixels do not work authentically -- they are clamped to 8 pixels wide instead of rendering narrower and then running out of data. MEMAC, overlay, and blitter should be feature complete. The emulation is not cycle exact and you will encounter small differences if you attempt to race the beam very tightly. Also, MEMAC cycles are not counted against the blitter.

 

Another thing to keep in mind is that the emulator emulates core version 1.24. The current version is 1.26 and there have been some changes to overlay priority. Since there are multiple versions in the wild, you will probably want to try to work on both and maybe even 1.09 as well.

 

Right now the CPU builds the edge lists (thanks to the blitter not having fractional steps ;)), which are aligned in the VRAM window.

 

Then two copy BCBs copy the right edge and the color into the scanline BCB list.

 

Unfortunately, as the blitter has no second source channel, the CPU calculates the sizex for each span length and writes it into the blit list.

 

Then VBXE blits the polygon.

 

I don't see how to do that with the blitter...

 

Assuming you always have left <= right, use one blit to copy right into width, a second blit to add left into it with XOR $FF, and a third blit to add constant $01. A + (B XOR $FF) + 1 = A - B and the three blits cost 8 VBXE cycles per entry.
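The identity is easy to check off-target. Below is a small Python sketch that walks the three blits over byte lists standing in for the ledge, redge and width arrays. It assumes the add mode wraps at 8 bits (no carry, as noted earlier in the thread); the function name is made up for illustration.

```python
def span_widths(left: list[int], right: list[int]) -> list[int]:
    """Emulate the three blits on 8-bit values, assuming left <= right."""
    # Blit 1: plain copy of the right edge into the width array.
    width = list(right)
    # Blit 2: add mode, source is left with XOR mask $FF (AND mask $FF).
    width = [(w + (l ^ 0xFF)) & 0xFF for w, l in zip(width, left)]
    # Blit 3: add mode, constant source $01 (AND mask $00, XOR mask $01).
    width = [(w + 0x01) & 0xFF for w in width]
    return width


widths = span_widths([10, 0, 200], [50, 0, 255])
```

Each entry comes out as right - left modulo 256, which is exact whenever left <= right.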

 

Add with carry CAN be done in a rather expensive way: create a 64K lookup table and do one blit per operation to look up the result from the two bytes. I think it's also possible to do it with 7-bit math instead of 8-bit math, by using the 8th bit as a carry bit. It can be shifted down by ANDing with $80 during a stencil blit on top of $7F, leaving either $7F or $80. It takes a lot of steps to do all this, but keep in mind that given a big enough blit the blitter is still more than 20x faster than the 6502.
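A rough sketch of the lookup-table idea in Python, assuming two 64K tables indexed by (a << 8) | b: one holding the low byte of a + b and one holding the carry. On the real hardware each lookup would be its own blit, and getting the two operand bytes into the source address is not shown here. Table and function names are made up for illustration.

```python
# Two 64K tables, indexed by (a << 8) | b.
SUM_LO = bytearray(65536)   # low 8 bits of a + b
CARRY = bytearray(65536)    # 1 if a + b overflows 8 bits, else 0
for a in range(256):
    for b in range(256):
        SUM_LO[(a << 8) | b] = (a + b) & 0xFF
        CARRY[(a << 8) | b] = (a + b) >> 8


def add16(x: int, y: int) -> int:
    """16-bit add built only from table lookups (one lookup ~ one blit)."""
    xl, xh = x & 0xFF, x >> 8
    yl, yh = y & 0xFF, y >> 8
    lo = SUM_LO[(xl << 8) | yl]        # low byte of the result
    c = CARRY[(xl << 8) | yl]          # carry out of the low byte
    hi = SUM_LO[(xh << 8) | yh]        # high bytes, carry not yet applied
    hi = SUM_LO[(hi << 8) | c]         # fold the carry into the high byte
    return (hi << 8) | lo
```

That is four lookups for one 16-bit add, which shows why it is expensive but still workable for large batches.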


OK... another issue regarding stepx/stepy, especially for the source.

 

The docs are somewhat misleading.

 

Assume a texture line 0-4095.

 

And I want to scale that down to a size of maybe 64.

 

Which would mean a stepx of 4096/64 = 256

 

But stepx only covers the range -128 to 127.

 

And stepy gets added after each line.

 

My sizex is 64.


Yes, if you are trying to downscale a 4096x1 image to 64x1, the X step is too large to fit. What you can do is interpret it as 1x4096 and use Y step instead. Both the X step and Y step are controllable, so you are not required to have X and Y match your actual X and Y in the bitmap.
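As a rough illustration of the "treat it as 1 pixel wide" trick, here is a simplified Python model of how the source address advances: step X per pixel within a row, step Y per row from the start of the row. It only enforces the signed 8-bit X step mentioned above; the Y step range is not modelled, and the function name and the row count in the usage line are made up for illustration.

```python
def blit_source_addresses(src_adr, step_x, step_y, width, height):
    """Return the source address read for every (row, col) of a blit.

    Simplified model: row start = src_adr + row * step_y,
    pixel = row start + col * step_x.
    """
    if not -128 <= step_x <= 127:
        raise ValueError("step_x must fit in a signed 8-bit value")
    return [
        [src_adr + row * step_y + col * step_x for col in range(width)]
        for row in range(height)
    ]


# A step of 256 does not fit in step_x, but a 1-pixel-wide blit can
# carry it in step_y instead: one "row" per sampled source pixel.
rows = blit_source_addresses(src_adr=0, step_x=1, step_y=256, width=1, height=16)
sampled = [r[0] for r in rows]
```

The destination would use the same trick in reverse (width 1, destination Y step 1) to lay the samples out side by side.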

 

Using the blitter step to scale will only get you integer factors, though, so it's not going to work for texture mapping if that's what you're thinking.


Have to wonder though - would just having "zoom tables" be more memory efficient than replicating an object massively oversized?

 

I guess a zoom table for a 100 pixel wide object could potentially be 10,000 entries or make it 20,000 if you want to represent each size from 1 pixel to 200 wide.

Doing it the 256 per pixel way to allow that method of pick and place would come to 25,600 bytes.

I suppose it comes down to what zoom factors, is there more emphasis on enlarge or reduce mode? The advantage of a zoom table might be that you only need one and it can be used for multiple objects, though you'd probably need to align each object on some address boundary.
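For what a zoom table could look like, here is a minimal Python sketch. It assumes nearest-neighbour sampling and one row per target size, where each row maps a destination pixel to the source pixel to pick; that layout is only one possible choice, and the function name is made up for illustration.

```python
def build_zoom_table(src_width: int, max_size: int) -> dict[int, list[int]]:
    """For each target size 1..max_size, list the source index per dest pixel."""
    return {
        size: [x * src_width // size for x in range(size)]
        for size in range(1, max_size + 1)
    }


table = build_zoom_table(src_width=100, max_size=100)
half = table[50]     # every second source pixel: 0, 2, 4, ...
full = table[100]    # identity mapping
```

Stored as ragged rows like this the table is smaller than the estimate above; padding every row to a fixed 100 entries, which makes the rows easier to address, gives the 10,000 figure. The table holds only indices, so the same one can serve any object of that width.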

 

Another possibility - with graphics card textures on PCs, not sure if it's still done - they keep multiple copies of each texture, each one 50% reduced from the previous. Once the displayed size drops to 50% it starts using the next reduced size.
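That multi-copy scheme can be sketched in a few lines of Python. The halving chain and the selection rule (switch to the next copy once the displayed width is no bigger than it) follow the description above; the function names are made up for illustration.

```python
def build_mip_widths(width: int) -> list[int]:
    """Widths of the stored copies: full size, then each one halved."""
    widths = [width]
    while widths[-1] > 1:
        widths.append(max(1, widths[-1] // 2))
    return widths


def pick_level(widths: list[int], displayed_width: int) -> int:
    """Pick the smallest stored copy that is still >= the displayed width."""
    level = 0
    while level + 1 < len(widths) and widths[level + 1] >= displayed_width:
        level += 1
    return level


widths = build_mip_widths(128)
```

The whole chain costs less than twice the memory of the full-size copy per dimension, since the widths sum to just under 2x the original.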

Edited by Rybags

Native VBXE zoom horizontally is faster than a normal copy operation since you have N replications where only the first one needs the source read.

As Avery said earlier, Y zoom has no such advantage since the data can't be buffered so has to be re-read each time.


Zoom-X is easy since it's doing the copy/fill operation in a linear fashion and the "current" value is being repeated.

Zoom-Y not quite so since the blits are usually in a raster fashion so after pixel (0,0) is copied, pixel (0,1) might be after another hundred or more have been moved.

 

Since Step_X only allows the signed 8-bit value, doesn't really make it feasible to use the blit in a "sideways" mode.


ST and Amiga have them? I didn't think so. Can't say I've ever seen stuff on either that uses the sort of variable sizing that would allow.

 

As for why it's not implemented, I suppose it wouldn't fit. Though doing the simple fractional stuff where it's just a phase-accumulator type thing like SID uses, it'd be fairly cheap.
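For reference, a phase-accumulator stepper is only a few lines. The Python sketch below assumes an 8.8 fixed-point step (8 integer bits, 8 fraction bits), which is an arbitrary choice for illustration and not anything VBXE implements; the function name is made up as well.

```python
def fractional_steps(start: int, step_fixed: int, count: int, frac_bits: int = 8) -> list[int]:
    """Source positions produced by adding a fixed-point step each pixel."""
    acc = start << frac_bits          # accumulator holds position * 2**frac_bits
    positions = []
    for _ in range(count):
        positions.append(acc >> frac_bits)   # integer part is the address offset
        acc += step_fixed                    # fraction carries over on its own
    return positions


# Step of 1.5 source pixels per destination pixel ($0180 in 8.8).
shrink = fractional_steps(0, 0x0180, 4)
# Step of 0.5 doubles every source pixel ($0080 in 8.8).
grow = fractional_steps(0, 0x0080, 4)
```

In hardware terms that is one adder and a register per channel.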

Plenty of things VBXE didn't get that we want... if I had the skills I'd do a core that sacrificed a few display features to give more coprocessing type aids.

