SainT, posted May 23

Is it possible to blit into GPU RAM while the GPU is running? Am I missing some magic? I have some test code which copies some data to screen memory. I can see it copying, it's a small buffer, there's no overflow, and it's working fine. If I change the destination address to be within GPU RAM it just dies. 🤷♂️ I've seen quite a few comments about blitting into GPU RAM, and there is even the write-only 32-bit space at G_RAM+$8000 specifically for this purpose. It seems crazy this isn't working...
SainT (author), posted May 23

A bit more poking -- if I set the destination address to the object processor line buffer, that works, and anywhere in main RAM works too. Even Jerry RAM works, although it's quite slow. I'm starting to think you may only be able to blit into GPU RAM when the GPU is halted? Which would be a bit crap.

And just for reference, this is using 32 bits per pixel, in pixel mode, with A2 as the destination so A1 can scan arbitrarily across some data. All buffers are phrase aligned.
+DrTypo, posted May 23

In GemRace I use a temporary buffer in GPU RAM. The GPU sets up the blitter and waits for the blit to complete. So blitting into GPU RAM with the GPU running is possible.

I'm using 16 bits per pixel, destination in pixel mode, source in INC mode. A2 is the destination and A1 the source. Buffers are long aligned.

I can send you the whole GPU source if you want?
SainT (author), posted May 23

That's good to know! In one test I set the buffer sometimes to GPU RAM and sometimes to DRAM based on the value of a counter, and that didn't hang. However, it was not very GPU heavy; mostly it was going to DRAM. So there must be some odd timing / contention related issue? If I have a tight loop that calculates the next span location, starts the blitter and immediately waits for completion, it hangs pretty quickly (within a couple of seconds) when writing to GPU RAM, but runs fine forever when writing to DRAM.

I'd certainly be interested to see any GPU code you have with blitter use like this. It might be happier if I just add a large delay after the blit wait has completed (which in practice will be the time taken to process the buffer). I'm just trying to get a proof of concept of getting the data where I need it before proceeding!

The idea here is that, rather than having the GPU run the DDA to fetch multiple bytes at an arbitrary angle from a byte map, the blitter does this and puts the bytes in a linear buffer, which is far easier for the GPU to process and can happen in parallel.
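To make the idea concrete, here is a minimal C model of what the blitter is being asked to do: step a 16.16 fixed-point position across a byte map and gather the samples into a linear buffer. This is only a sketch, not Jaguar code and not SainT's implementation; `gather_span`, `MAP_SIZE` and the wrap-at-the-edge behaviour are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 512u              /* assumed power-of-two map edge */
#define MAP_MASK (MAP_SIZE - 1u)

/* Walk a 16.16 fixed-point DDA across a byte map and gather the samples
 * into a linear buffer. Coordinates wrap at the map edge (an assumption). */
static void gather_span(const uint8_t *map,
                        uint32_t x, uint32_t y,   /* 16.16 start position */
                        int32_t dx, int32_t dy,   /* 16.16 step per sample */
                        uint8_t *out, size_t count)
{
    assert(map != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        uint32_t xi = (x >> 16) & MAP_MASK;   /* integer part, wrapped */
        uint32_t yi = (y >> 16) & MAP_MASK;
        out[i] = map[yi * MAP_SIZE + xi];
        x += (uint32_t)dx;                    /* unsigned add wraps safely */
        y += (uint32_t)dy;
    }
}
```

Once the samples sit in a linear buffer, the consumer's inner loop is a plain sequential read, which is the simplification the post is after.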
+DrTypo, posted May 23

Here is the file with the GPU code. The interesting function starts at line 68 (draw road). The blitter set-up part is mostly done from line 307.

Attachment: gpu.s
JagChris, posted May 23

3 hours ago, SainT said:
    Is it possible to blit into GPU RAM while the GPU is running? Am I missing some magic?

I remember Gorf mentioning this once. Supposedly you can run GPU code in a low/high area of memory while loading GPU code into the unused portion.
JagChris, posted May 23

Check the Doom code. Chilly Willy claimed the GPU loaded itself. It had to have something running if so.

Quote:
    Given how efficient loading the GPU local ram is, having GPU code load itself in stages could have easily been used by more games than just Doom. It probably would have become standard if the Jag had lasted in the marketplace.
CyranoJ, posted May 23

2 hours ago, JagChris said:
    Check the Doom code. Chilly Willy claimed the gpu loaded itself. It had to have something running if so.

No, it's a byte copy loop.

Chris, this is so far above your paygrade, please don't shit up yet another thread with your crap. Go back to reading kindergarten books or something. This is the programming section, after 20+ years you have zero to contribute in here, and misinformation is not helpful.
42bs, posted May 24

13 hours ago, SainT said:
    Is it possible to blit into GPU RAM while the GPU is running? Am I missing some magic? I have some test code which copies some data to screen memory. I can see it copying, it's a small buffer, there's no overflow, and it's working fine. If I change the destination address to be within GPU RAM it just dies. 🤷♂️ I've seen quite a few comments about blitting into GPU RAM, and there is even the write-only 32-bit space at G_RAM+$8000 specifically for this purpose. It seems crazy this isn't working...

JagTris runs from the GPU only and loads overlays via the blitter when needed. So yes, it is possible. BJL also contains macros to handle this.

Be sure to use the +$8000 address for 32-bit bus access.
42bs, posted May 24 (edited)

;-----------------------------------------
;- Copy overlay routine
;- (register names in the comments assume `blitter` points at A1_BASE)
;-----------------------------------------
overlay:
        load    (blitter+$38),r3        ; B_CMD read = blitter status
        shrq    #1,r3                   ; bit 0 (idle) -> carry
        jr      cc,overlay              ; loop until the blitter is idle
        nop
        store   r0,(blitter)            ; A1_BASE = destination
        store   r1,(blitter+$24)        ; A2_BASE = source
        movei   #BLIT_PITCH1|BLIT_PIXEL8|BLIT_WID320|BLIT_XADDPHR,r0
        xor     r1,r1
        store   r0,(blitter+4)          ; A1_FLAGS
        store   r0,(blitter+$28)        ; A2_FLAGS
        store   r1,(blitter+$c)         ; A1_PIXEL = 0
        store   r1,(blitter+$18)        ; A1_FPIXEL = 0
        store   r1,(blitter+$30)        ; A2_PIXEL = 0
        movei   #BLIT_SRCEN|BLIT_LFU_REPLACE|BLIT_BUSHI*0,r1
        store   r2,(blitter+$3c)        ; B_COUNT
        store   r1,(blitter+$38)        ; B_CMD: start the blit
        WAITBLITTER
        jump    (LR)
        nop

and loading:

        movei   #MODrun_\0+$8000,r0     ; dest-adr
        movei   #MODstart_\0,r1
        movei   #1<<16|(MODlen_\0),r2
        movei   #overlay,r3
        BL      (r3)

Where MODrun_ => destination (run address), MODstart_ => source and MODlen_ => length.

WAITBLITTER does the same as the entry code, but uses R0!
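Two constants in the loading snippet are easy to misread, so here is a small C sketch of how they are formed, assuming the usual Jaguar memory map (GPU local RAM at $F03000, 4 KB, with its 32-bit write window $8000 higher) and the B_COUNT layout of outer loop count in the high word and inner loop count in the low word. The helper names `blit_count` and `gpu_ram_blit_dest` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define GPU_RAM_BASE   0xF03000u  /* start of GPU local RAM (assumed map) */
#define GPU_RAM_SIZE   0x1000u    /* 4 KB */
#define GPU_RAM_32BIT  0x8000u    /* offset of the 32-bit write window */

/* Pack outer (high word) and inner (low word) loop counts for B_COUNT,
 * mirroring `1<<16|(MODlen)` in the loading code above. */
static uint32_t blit_count(uint16_t outer, uint16_t inner)
{
    return ((uint32_t)outer << 16) | inner;
}

/* Map a GPU RAM address to its alias in the 32-bit window,
 * mirroring `MODrun+$8000`. */
static uint32_t gpu_ram_blit_dest(uint32_t addr)
{
    assert(addr >= GPU_RAM_BASE && addr < GPU_RAM_BASE + GPU_RAM_SIZE);
    return addr + GPU_RAM_32BIT;
}
```

For example, `blit_count(1, 0x100)` gives `0x00010100` (one outer pass of 256 inner steps) and `gpu_ram_blit_dest(0xF03000)` gives `0xF0B000`.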
SainT (author), posted May 24

Well, I got it working, but it is a bit unstable. I think the instability is to do with the 68000 and the blitter both writing to GPU RAM, so the hanging I was seeing was probably down to the GPU getting bad parameters passed from the 68000.

But more importantly, it is worth around an additional 4 fps, going from 18 to 22. So it's a good improvement.
42bs, posted May 24

Why is the 68k writing to GPU RAM? You should avoid this in any case, at least while the GPU is running. If you want to pass parameters, use a DRAM section.

But the 68k has a lower priority than the blitter, so there should not be any disturbance.
SainT (author), posted May 24

2 minutes ago, 42bs said:
    Why is the 68k writing to GPU RAM? You should avoid this in any case, at least while the GPU is running. If you want to pass parameters, use a DRAM section. But the 68k has a lower priority than the blitter, so there should not be any disturbance.

Good to know -- this is just a test case, so I have been writing parameters to GPU RAM. There was no issue with parameters being corrupted before the blitter code was added, so there is definitely some kind of issue with the 68K and blitter both writing at the same time. The bus might be getting interrupted halfway through the long write or something, with one half getting corrupted.
agradeneu, posted May 24

18 hours ago, CyranoJ said:
    No, it's a byte copy loop. Chris, this is so far above your paygrade, please don't shit up yet another thread with your crap. Go back to reading kindergarten books or something. This is the programming section, after 20+ years you have zero to contribute in here, and misinformation is not helpful.

Well, he is the same guy (under the handle "Achris31") who has the nerve to accuse members of this forum and AA in general of being "con artists" and "destroying homebrew game development" in almost every comment section under a Jaguar-related video on YouTube. It's a miracle he is still tolerated here or at any place with common sense!
+cubanismo, posted May 24

Not speaking from experience, but what I've read is that while you can blit to GPU RAM from somewhere else, you can't blit from GPU RAM -> GPU RAM, so beware of that scenario. Hence, the LUT-as-span-buffer trick for texturing.
SainT (author), posted May 24

This is the result so far. There doesn't seem to be much scope to speed up the rendering of the actual spans with the blitter: if I remove the rendering entirely I only gain a few FPS. The majority of the time is spent just iterating over and processing the height map ready for rendering.
agradeneu, posted May 24

1 minute ago, SainT said:
    This is the result so far. There doesn't seem to be much scope to speed up the rendering of the actual spans with the blitter: if I remove the rendering entirely I only gain a few FPS. The majority of the time is spent just iterating over and processing the height map ready for rendering.

Looks cool, Comanche voxel engine?
SainT (author), posted May 24

Just now, agradeneu said:
    Looks cool, Comanche voxel engine?

Yes, exactly that. Just a pretty normal voxelspace engine. The only slightly cunning bit is using the blitter to do the map traversal to allow a tighter inner loop.
agradeneu, posted May 24

Just now, SainT said:
    Yes, exactly that. Just a pretty normal voxelspace engine. The only slightly cunning bit is using the blitter to do the map traversal to allow a tighter inner loop.

Yeah, but this looks more realistic than previous ones, due to a much more detailed texture.
SainT (author), posted May 24

2 minutes ago, agradeneu said:
    Yeah, but this looks more realistic than previous ones, due to a much more detailed texture.

True, I don't think I've seen anything this detailed on the Jag in terms of voxelspace. It's a 512*512 height and colour map with 8-bit height and 16-bit colour. It's doing 100 depth samples and rendering to a 160*200 buffer. Increasing vertical resolution shouldn't be that much slower either, due to the column nature of voxels.

There may be some kind of hierarchical approach that could be used to discard larger blocks of map data to speed things up as well. It's a nice tight little domain for optimisation!
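For readers unfamiliar with the "column nature" mentioned here, the following is a generic C sketch of a voxelspace column renderer, not SainT's engine. It assumes the height and colour samples along one ray have already been gathered nearest-first (for example by the blitter), uses a linear depth step, and every name in it (`draw_column`, `cam_height`, `horizon`, `scale`) is made up for illustration. Only the 160*200 buffer size comes from the post.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W 160
#define SCREEN_H 200

/* Draw one screen column front to back. heights[] and colours[] hold the
 * map samples along this column's ray, nearest first. */
static void draw_column(uint16_t *screen, int col,
                        const uint8_t *heights, const uint16_t *colours,
                        int samples, int cam_height, int horizon, int scale)
{
    assert(col >= 0 && col < SCREEN_W);
    int top = SCREEN_H;                        /* lowest row not yet drawn */
    for (int i = 0; i < samples && top > 0; i++) {
        int z = i + 1;                         /* linear depth (assumed) */
        int y = horizon + (cam_height - heights[i]) * scale / z;
        if (y < 0)
            y = 0;
        for (int row = y; row < top; row++)    /* fill only newly visible rows */
            screen[row * SCREEN_W + col] = colours[i];
        if (y < top)
            top = y;
    }
}
```

Because each sample only fills the rows between its projected height and the previous highest point, a taller buffer mostly lengthens these short fill runs while the per-sample work stays the same, which is consistent with the remark about vertical resolution.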
agradeneu, posted May 24

16 minutes ago, SainT said:
    True, I don't think I've seen anything this detailed on the Jag in terms of voxelspace. It's a 512*512 height and colour map with 8-bit height and 16-bit colour. It's doing 100 depth samples and rendering to a 160*200 buffer. Increasing vertical resolution shouldn't be that much slower either, due to the column nature of voxels. There may be some kind of hierarchical approach that could be used to discard larger blocks of map data to speed things up as well. It's a nice tight little domain for optimisation!

That amount of detail and color is truly something special, great stuff!
42bs, posted May 24 (edited)

37 minutes ago, SainT said:
    True, I don't think I've seen anything this detailed on the Jag in terms of voxelspace. It's a 512*512 height and colour map with 8-bit height and 16-bit colour. It's doing 100 depth samples and rendering to a 160*200 buffer. Increasing vertical resolution shouldn't be that much slower either, due to the column nature of voxels. There may be some kind of hierarchical approach that could be used to discard larger blocks of map data to speed things up as well. It's a nice tight little domain for optimisation!

In my version you can change the Z depth and directly see the impact of it.

Your colors look nicer!
SainT (author), posted May 24

29 minutes ago, 42bs said:
    In my version you can change the Z depth and directly see the impact of it. Your colors look nicer!

Yes, I have all of those types of controls on the controller. You can adjust depth samples, depth increment, horizon, etc., all from the number pad. I've implemented a non-linear z increment as well, so that the sample distance increases with each successive depth step, giving a greater view distance at the expense of detail. It works quite well!

I also have sample quantisation and skipping in the near samples, so that multiple columns are generated from a single sample. That was a useful optimisation, too.
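One simple way to get a sample distance that grows with each depth step is to let the step itself increase by a fixed amount per sample. The C sketch below builds such a depth table in 16.16 fixed point; it is only an assumed scheme for illustration (the post does not say which growth rule is used), and `build_depth_table`, `z0`, `step` and `growth` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Fill depths[0..count-1] with 16.16 fixed-point sample distances whose
 * spacing grows by `growth` after every sample. */
static void build_depth_table(uint32_t *depths, int count,
                              uint32_t z0, uint32_t step, uint32_t growth)
{
    assert(depths != 0 && count >= 0);
    uint32_t z = z0;
    for (int i = 0; i < count; i++) {
        depths[i] = z;
        z += step;        /* advance by the current spacing */
        step += growth;   /* widen the spacing for the next sample */
    }
}
```

Starting at 1.0 with a step of 1.0 and a growth of 0.5, the first four depths come out as 1.0, 2.0, 3.5 and 5.5, so the same number of samples reaches further than a constant step would.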
42bs, posted May 24

Do you plan a game or just a demo?
SainT (author), posted May 24

25 minutes ago, 42bs said:
    Do you plan a game or just a demo?

Depends on how I can allocate my time. It would be nice to do some kind of racing game with a voxel landscape engine, but getting the time to actually finish a full-on game would be a hard push. Getting at least some kind of playable demo would be a nice goal.

Is there a way to load the blitter registers a word (16 bits) at a time? Or is it long only? I tried testing word writes, but it seems to fail, or perhaps write a full long, I'm not sure. Having the integer and fractional parts split across different registers is quite annoying when working with fixed-point numbers. More time than I'd like is spent shuffling registers into the right format.
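The shuffling described here can be pictured with a short C sketch. It assumes the A1 pointer layout where the integer register (A1_PIXEL) holds Y in the high word and X in the low word, and the fractional register (A1_FPIXEL) holds the Y fraction in the high word and the X fraction in the low word. The type `a1_pos` and the function `split_a1_position` are hypothetical names for illustration only.

```c
#include <stdint.h>

/* Values destined for the integer and fractional A1 pointer registers. */
typedef struct {
    uint32_t pixel;   /* Y integer in high word, X integer in low word */
    uint32_t fpixel;  /* Y fraction in high word, X fraction in low word */
} a1_pos;

/* Split two 16.16 fixed-point coordinates into the two register values. */
static a1_pos split_a1_position(uint32_t x, uint32_t y)
{
    a1_pos p;
    p.pixel  = (y & 0xFFFF0000u) | (x >> 16);
    p.fpixel = (y << 16) | (x & 0x0000FFFFu);
    return p;
}
```

For x = $00123456 and y = $00ABCDEF this yields pixel = $00AB0012 and fpixel = $CDEF3456, which shows why each coordinate pair costs a few shifts and masks before it can be written to the blitter.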