Blitting into GPU RAM



Blitter registers must be written with long access. That's the reason for the (r14+x) opcode ;-) But the blitter likely needs more time to draw than it takes to shuffle everything into the right format.

 

Are you using the painter's algo to draw the stripes or just the visible parts?

 


9 hours ago, 42bs said:

Are you using the painter's algo to draw the stripes or just the visible parts?

 

Just visible spans, front to back. It’s amazingly efficient and easy to implement. Very elegant algorithm. 😄
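For anyone following along, here is a minimal sketch of the front-to-back idea for a single screen column. It is Python purely for illustration (the real code runs on the Jaguar GPU), and `render_column`, `samples` and `horizon` are hypothetical names, not taken from anyone's actual source. The assumption is that each depth sample has already been projected to a screen row, with smaller `y` being higher on screen.

```python
def render_column(samples, screen_height):
    """Return the visible spans of one screen column, nearest sample first.

    samples: list of (top_y, colour), ordered front to back.
    Each span is (y_start, y_end_exclusive, colour).
    """
    spans = []
    horizon = screen_height  # rows at or below this are already covered
    for top_y, colour in samples:
        top_y = max(top_y, 0)          # clip to the top of the screen
        if top_y < horizon:            # only the part above the horizon is visible
            spans.append((top_y, horizon, colour))
            horizon = top_y
            if horizon == 0:           # column is full, nothing further can show
                break
    return spans
```

Because nearer samples are handled first, anything hidden behind them is skipped without ever being drawn, which is why no painter's-style overdraw is needed.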


7 hours ago, agradeneu said:

Any video how it moves?

Not yet, but I will do before long! :) 

 

A little bit more optimisation (and bug fixing where blits go across map boundaries) and I'll show it in action!

 

 


1 hour ago, SainT said:

Not yet, but I will do before long! :) 

 

A little bit more optimisation (and bug fixing where blits go across map boundaries) and I'll show it in action!

 

 

I've never tried it, but would the blitter mask register be a cheap way to allow the map boundaries to wrap?


9 hours ago, Sporadic said:

I've never tried it, but would the blitter mask register be a cheap way to allow the map boundaries to wrap?

The blitter is used to draw the spans. The map (at least in my case) is in main RAM and "wrapping" is just an "and" on the map pointer.
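A small sketch of wrapping with an "and", for illustration only. It assumes a square map whose side is a power of two; the 1024 here is an assumed size, and `map_index` is a hypothetical helper rather than anything from the posted code.

```python
MAP_SIZE = 1024           # assumed; must be a power of two for the mask trick
MASK = MAP_SIZE - 1

def map_index(x, y):
    """Linear index into the map with both coordinates wrapped by masking."""
    return ((y & MASK) * MAP_SIZE) + (x & MASK)
```

Masking works for coordinates that run off either edge, since `-1 & MASK` gives `MAP_SIZE - 1`.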


2 hours ago, 42bs said:

The blitter is used to draw the spans. The map (at least in my case) is in main RAM and "wrapping" is just an "and" on the map pointer.

I thought saint said he was using the blitter to traverse the height map


7 minutes ago, Sporadic said:

I thought saint said he was using the blitter to traverse the height map

Yep, in my code the blitter is used in the opposite way than you’d perhaps expect! 😄 Unfortunately the and mask is on A2 and I’m using A1 for the map traversal. Otherwise that would have done the job. 😞

 

I should be able to use the step update in the outer blitter loop to subtract the map width / height at the appropriate place to get it to wrap. Although it wouldn’t work for wrap in both x and y without multiple blits. The one downside of this method. 


5 hours ago, 42bs said:

So you copy the visible part of the map rotated into the GPU RAM?

Basically, yes. The blitter replaces the map position increment and read in the inner loop such that the GPU just has a local linear buffer to work with. It makes the inner loop much quicker.
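As a rough software stand-in for that blit, the sketch below walks the map along an arbitrary direction and gathers the samples into a linear buffer. It is only an illustration: `gather_line` is a hypothetical name, 16.16 fixed point is an assumed format, and the walk is assumed to stay inside the map (no wrapping or clipping).

```python
FP_SHIFT = 16  # assumed 16.16 fixed-point map coordinates

def gather_line(height_map, size, x0, y0, dx, dy, count):
    """Copy `count` map entries along the direction (dx, dy) into a list.

    height_map: flat list of size*size entries.
    x0, y0, dx, dy: 16.16 fixed-point start position and per-sample step.
    """
    x, y = x0, y0
    out = []
    for _ in range(count):
        xi = x >> FP_SHIFT             # integer part of the position
        yi = y >> FP_SHIFT
        out.append(height_map[yi * size + xi])
        x += dx                        # the position increment the blitter takes over
        y += dy
    return out
```

The inner loop on the GPU then only has to step through `out` linearly, which is the saving described above.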


Oh, I found waiting for the blitter to draw the spans is the most time-consuming part. Even the "div" for the z axis does not have much of an impact.

 

Do you have a separate color map or do you derive the color from the height?


9 minutes ago, 42bs said:

Oh, I found waiting for the blitter to draw the spans is the most time-consuming part. Even the "div" for the z axis does not have much of an impact.

 

Do you have a separate color map or do you derive the color from the height?

I use one over z and multiply for the inner loop for perspective projection. When you’re doing 16,000 iterations in your inner loop (160 wide with 100 depth samples) then every last cycle helps. I’ve not even started using the blitter for rendering yet, but I will do. As the height map is in local RAM then the blitter can fill in parallel quite nicely.
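A minimal sketch of the "one over z and multiply" idea: the divide is done once per depth step and every column at that depth then only needs a multiply and a shift. The 16-bit fixed-point format, the `scale` parameter and the function names are assumptions made for the example, not details from the actual renderer.

```python
FP_SHIFT = 16  # assumed fixed-point precision

def reciprocal(z, scale):
    """One divide per depth step: scale / z as a 16.16 fixed-point value."""
    return (scale << FP_SHIFT) // z

def project(height, camera_height, horizon, inv_z):
    """Screen row for a terrain height, using a multiply instead of a divide."""
    return horizon + (((camera_height - height) * inv_z) >> FP_SHIFT)

ITERATIONS = 160 * 100  # columns times depth samples, as quoted above
```

With 16,000 inner-loop iterations per frame, swapping a divide for a multiply in each one is where the cycles are saved.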

 

There is a separate colour and height map, but they’re interleaved such that a single 32bit map entry has 8 bit height and 16 bit colour.
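To make the interleaving concrete, here is one possible packing of such an entry. The post only gives the field widths, so the bit positions below (colour in the low 16 bits, height in bits 16 to 23, top 8 bits unused) are purely an assumption, as are the function names.

```python
def pack_entry(height, colour):
    """Pack an 8-bit height and 16-bit colour into one 32-bit map entry."""
    return ((height & 0xFF) << 16) | (colour & 0xFFFF)

def unpack_entry(entry):
    """Return (height, colour) from a packed 32-bit entry."""
    return (entry >> 16) & 0xFF, entry & 0xFFFF
```

The point of interleaving is that a single 32-bit read fetches both values for a map position.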


1 hour ago, SainT said:

I use one over z and multiply for the inner loop for perspective projection. When you’re doing 16,000 iterations in your inner loop (160 wide with 100 depth samples) then every last cycle helps. I’ve not even started using the blitter for rendering yet, but I will do. As the height map is in local RAM then the blitter can fill in parallel quite nicely.

 

There is a separate colour and height map, but they’re interleaved such that a single 32bit map entry has 8 bit height and 16 bit colour.

Ouch, so you're drawing "by hand"? I made a 192x200x112 and a 320x200x112 version. The difference is massive.
Keen to see your stuff in action.


Just now, 42bs said:

Ouch, so you're drawing "by hand"? I made a 192x200x112 and a 320x200x112 version. The difference is massive.
Keen to see your stuff in action.

Yep, all drawn on the GPU! 😆 If I just draw one pixel per span it goes from 30fps to 38fps, so I'm expecting similar gains by switching to the blitter for span rendering.
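Put as frame times, those two figures work out as below. This is just arithmetic on the numbers quoted, under the assumption that the one-pixel-per-span case is a fair proxy for "fill cost removed".

```python
def frame_ms(fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps

full_fill_ms = frame_ms(30)              # GPU writes every pixel of each span
one_pixel_ms = frame_ms(38)              # GPU writes one pixel per span
saved_ms = full_fill_ms - one_pixel_ms   # time spent on the fill itself
```

So roughly 7 ms of each 33 ms frame goes on the GPU filling spans, which is the part the blitter could take over in parallel.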

 

A bit more tweaking and I'll dig the video capture device out!


On 5/25/2024 at 8:02 PM, SainT said:

Not yet, but I will do before long! :) 

 

A little bit more optimisation (and bug fixing where blits go across map boundaries) and I'll show it in action!

 

 

Nice, looking forward to it! 



 

With some reasonable (lower) quality settings I can get a pretty consistent 50fps, so it's possible you could get a game running at about 25fps with this kind of landscape.

 

I got around the clipping / wrapping of map data into GPU RAM by just implementing a sliding window on the GD instead. So there is a virtual 512x512 window in memory whose top left corner you can specify with a position from 0-1023, and it reads the corresponding subsection of a 1024x1024 map. This way the blitter is always rendering from 256,256 and I just alter the map read position on the GD. This also means it's reading the map data directly from the cart. This cost a couple of fps, but it makes things much more flexible, and the clipping would have cost a bit of time too, so in general it's a good tradeoff.
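A sketch of how the window-to-map mapping could look, for illustration. Since the corner can be anywhere from 0-1023 while the window is 512 wide, the sketch assumes the window wraps around the map edges; that wrap, and the names `window_to_map` and `origin_for_camera`, are assumptions rather than a description of the GD firmware.

```python
MAP_SIZE = 1024
WINDOW_SIZE = 512
WINDOW_CENTRE = WINDOW_SIZE // 2   # the blitter always starts at 256,256

def window_to_map(win_x, win_y, origin_x, origin_y):
    """Map coordinate read for a window coordinate, given the window's top left."""
    return (origin_x + win_x) % MAP_SIZE, (origin_y + win_y) % MAP_SIZE

def origin_for_camera(cam_x, cam_y):
    """Top-left corner that places the camera at the window centre."""
    return (cam_x - WINDOW_CENTRE) % MAP_SIZE, (cam_y - WINDOW_CENTRE) % MAP_SIZE
```

Moving the camera then only changes the origin; the blit parameters on the Jaguar side stay the same every frame.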

 

The voxel map is now 8bit colour and 8bit height, so a 1024x1024 map is 2MB. It could go bigger, as you can access the whole 16MB cart space on the GD and load from memory card if you wanted.
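The sizes are easy to check. The snippet below is only arithmetic on the figures in the post (2 bytes per entry, 16MB of cart space); `map_bytes` is a hypothetical helper.

```python
BYTES_PER_ENTRY = 2        # 8-bit colour + 8-bit height
MIB = 1024 * 1024

def map_bytes(size):
    """Storage needed for a square size x size voxel map."""
    return size * size * BYTES_PER_ENTRY

maps_in_cart_space = (16 * MIB) // map_bytes(1024)
```

A 2048x2048 map at the same format would need 8MB, so it would still fit in 16MB, while 4096x4096 (32MB) would not.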


31 minutes ago, 42bs said:

16MB cart space? I need to read some docs I guess.

 

So what is the resolution now? 160x200 or less? Or even more?

 

The GD has 16MB of RAM onboard, you can page it in and out of the 6MB physical space the Jag provides. This allows you to access it via a sliding window as well.

 

The rendered resolution is 160x200; it's rendering into a 320x200 image so that it's then easy to composite sprites over the top.


Hmmm, this may be a daft question, but how the hell do you get the blitter to write Z data?

 

I just want to write some Z data with the column. The image is set up with a pitch of 2 and a z offset of 1, so it's interleaved pixel and z data. No matter what settings I've tried (PIXEL16|WID320|XADDPIX|ZOFFS1|PITCH2|DSTWRZ for example), nothing gets written to the Z data. I've tried pixel mode, phrase mode, reading z data, not reading z data, and it all ends up writing nothing. Is there something I'm missing, like it only writes Z when A1 is the destination or something?


I use this in jag_ball demo:

    movei    #(1<<18)|BLIT_DSTENZ|BLIT_DSTWRZ|BLIT_PATDSEL

 

for CMD

 

and set BLIT_SRCZ1

 

In the demo I interleave screen0, screen1 and Z (no need to waste Z buffer twice) and set A1 to

 

BLIT_XADDPIX|BLIT_WID320|BLIT_ZOFFS1|BLIT_PIXEL16|BLIT_PITCH3

 

or

 

BLIT_XADDPIX|BLIT_WID320|BLIT_ZOFFS2|BLIT_PIXEL16|BLIT_PITCH3

 

https://github.com/42Bastian/JaguarDemos/tree/main/jag_ball
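To picture that layout, here is a rough address calculation for the interleaved buffers. It rests on my understanding of the flags, which should be checked against the Jaguar reference: PITCH is taken as the distance in phrases between successive pixel phrases of one buffer, and ZOFFS as the offset in phrases from a pixel phrase to its Z phrase. `phrase_addresses` is a hypothetical helper, and 16-bit pixels (four per 8-byte phrase) are assumed.

```python
PHRASE_BYTES = 8
PIXELS_PER_PHRASE = 4   # 16-bit pixels

def phrase_addresses(base, x, y, width, pitch, zoffs):
    """Byte addresses of the pixel phrase and Z phrase holding pixel (x, y)."""
    phrase_index = (y * width + x) // PIXELS_PER_PHRASE
    pixel_phrase = base + phrase_index * pitch * PHRASE_BYTES
    z_phrase = pixel_phrase + zoffs * PHRASE_BYTES
    return pixel_phrase, z_phrase

# screen0 at base 0 uses ZOFFS2, screen1 one phrase later uses ZOFFS1
screen0 = phrase_addresses(0, 4, 0, 320, pitch=3, zoffs=2)
screen1 = phrase_addresses(PHRASE_BYTES, 4, 0, 320, pitch=3, zoffs=1)
```

Under these assumptions both screens land on the same Z phrase, which is how one Z buffer can serve two frame buffers with PITCH3.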

 

 

 


Ouch, Z buffer completely kills performance. Even just having the video memory as interleaved seems to kill it -- must be the number of page misses increasing. I was hoping it wouldn't be this bad. :(


30 minutes ago, SainT said:

Ouch, Z buffer completely kills performance. Even just having the video memory as interleaved seems to kill it -- must be the number of page misses increasing. I was hoping it wouldn't be this bad. :(

For the vertical stripes? But yes, you now have only half the pixels per page, plus the blitter has to read. All of this is a performance killer.
