42bs Posted April 14, 2022
Just to record/warn: the following sequence works in VJ but not on real HW:

    xor     r3,r2
    btst    #4,r2
    addqt   #4,DISTANCE
    jr      ne,noc
    addqt   #1,LIGHT
    moveq   #0,r4

It seems the Z flag from the btst is not seen, only the one from the xor. An "or r2,r2" between those two fixes it.
42bs (Author) Posted April 14, 2022
OK, it is not the Z flag. Original sequence:

    xor     r3,r2
    loadb   (LIGHT),r4
    btst    #4,r2
    addqt   #4,DISTANCE
    jr      ne,noc
    addqt   #1,LIGHT
    moveq   #0,r4
noc:

Result: "jr" is _always_ taken?!

Working:

    xor     r3,r2
    loadb   (LIGHT),r4
    or      r4,r4
    btst    #4,r2
    addqt   #4,DISTANCE
    jr      ne,noc
    addqt   #1,LIGHT
    moveq   #0,r4
noc:

This is the "moveq" bug described in the manual. The write-back of the loadb comes after the write of the #0.
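The hazard described above can be illustrated with a toy model. This is only a sketch, not a cycle-accurate description of the GPU: the function `run`, the tuple-based instruction format, the value stored at LIGHT and the load latency of 10 cycles are all hypothetical. It assumes that a load's write-back lands some cycles after the load is issued, that "moveq" writes at once without waiting for it, and that reading the register (as "or r4,r4" does) stalls until the load has finished.

```python
def run(program, load_latency=10):
    """Toy model of a delayed load write-back; not cycle-accurate."""
    regs = {"r4": 0xFF}        # arbitrary start value
    memory = {"LIGHT": 7}      # hypothetical byte stored at LIGHT
    pending = {}               # register -> (cycle when data arrives, value)
    cycle = 0
    for op, *args in program:
        # Retire every load whose data has arrived by now.
        for reg, (ready, value) in list(pending.items()):
            if ready <= cycle:
                regs[reg] = value
                del pending[reg]
        if op == "loadb":
            addr, dst = args
            pending[dst] = (cycle + load_latency, memory[addr])
        elif op == "read":     # stands in for "or r4,r4"
            (src,) = args
            if src in pending:  # assumed: a read waits for the load
                ready, value = pending.pop(src)
                cycle = max(cycle, ready)
                regs[src] = value
        elif op == "moveq":    # assumed: writes without waiting
            value, dst = args
            regs[dst] = value
        # "nop" only lets time pass
        cycle += 1
    # Loads still in flight land after the last instruction.
    for reg, (ready, value) in pending.items():
        regs[reg] = value
    return regs


LOAD = ("loadb", "LIGHT", "r4")
CLEAR = ("moveq", 0, "r4")

broken = [LOAD, ("nop",), ("nop",), ("nop",), CLEAR]
fixed = [LOAD, ("read", "r4"), ("nop",), ("nop",), CLEAR]

print(run(broken)["r4"])  # the late write-back overwrites the 0
print(run(fixed)["r4"])   # the read forced the load to finish first
```

In this model the `broken` sequence ends with the loaded value in r4, while the `fixed` sequence ends with 0, which matches the behaviour reported on real hardware.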
laoo Posted April 14, 2022
@42bs I have been a happy new owner of a Jaguar for a few weeks now and I'm trying to wrap my head around those RISC intricacies... So the problem isn't in the jump (the jump is correct), but in r4 being trashed by the delayed loadb, right?
42bs (Author) Posted April 14, 2022
Right.
Cyprian Posted April 14, 2022
Is that loadb from main RAM?

    loadb   (LIGHT),r4

I wonder how big the delay is. E.g. which registers will receive the new r4 (LIGHT) content?

    loadb   (LIGHT),r4
    move    r4,r5
    move    r4,r6
    move    r4,r7
    move    r4,r8
    move    r4,r9
    move    r4,r10

Unfortunately I can't check that myself due to an issue with the "EZ-HOST" driver or the Skunk itself.
laoo Posted April 14, 2022 (edited)
@Cyprian As I understand how the Jaguar's bus works, the delay is something you just can't rely on, because the bus can always be taken by a master with higher priority (the OP, i.e. the Object Processor, to name one). On the other hand, I've read somewhere that, as a good approximation, a read from main RAM takes 10 cycles in optimal circumstances. I don't know what this number depends on.
42bs (Author) Posted April 14, 2022
53 minutes ago, Cyprian said: "E.g, which registers will have a new r4 (LIGHT) content."

R5, as the read of R4 stalls until the loadb has finished. What is interesting is this:

    loadb   (LIGHT),r4
    REPT m
    nop
    ENDR
    moveq   #0,r4
    move    r4,r5

How many NOPs are needed before r5 becomes 0 and not what was read from LIGHT?
42bs (Author) Posted April 14, 2022
Quote: "How many NOPs are needed before r5 becomes 0 and not what was read from LIGHT"

I added this:

    REPT 14
    movei   #100000,r9
    movei   #10,r8
    div     r8,r9
    ENDR

Then I see no more display errors. And yes, LIGHT is in main memory.
Cyprian Posted April 14, 2022
14 reps for one loadb is a lot.
42bs (Author) Posted April 14, 2022
Yes, I was kinda shocked. But this was needed for "every" loadb I do (one for every 2nd displayed pixel). The fewer repetitions, the more often the "moveq" was overridden.
Cyprian Posted April 16, 2022
Wouldn't it be better to copy the data from main RAM to the GPU with the blitter in that case? That would avoid the scoreboarding issue, and the data would be transferred a bit faster (the blitter's 64 bits vs. the GPU's 8-bit "loadb").
42bs (Author) Posted April 17, 2022
My size limit is 256 bytes. I have not tried to use the blitter yet, but that is next on my list of things to squeeze into this corset.
Cyprian Posted April 17, 2022
OK, I forgot about that limit. 256 bytes on the Jag is a challenge.
42bs (Author) Posted April 26, 2022
On 4/16/2022 at 10:53 PM, Cyprian said: "Wouldn't it be better to copy the data from the main to the GPU with the blitter in that case?"

I changed my intro to use the blitter for writing line by line to DRAM, but there is no visible speed-up. The problem is that the pixels are 16 bit, so I need to combine two of them in GPU RAM, which again adds some cycles. Simply put, reading 320*240*6 bytes and writing 320*240*4 bytes eats a lot of time.
Cyprian Posted April 26, 2022 (edited)
OK. Does the blitter work in parallel with the GPU, or does it just stop the GPU while blitting? If they can work concurrently, then maybe interleaving the code with blitting would speed it up a bit. E.g. process only half of a line (two lines, actually) in a pass, load the next half at the beginning and save the previous half in the middle of the code. Anyway, I guess the code would not fit into 256 bytes.
42bs (Author) Posted April 26, 2022
4 minutes ago, Cyprian said: "Does the blitter works in parallel with the GPU or stops it during blitting?"

Oh, wait. Yes, I wait for the blitter to finish. I should try double buffering the line I write back to RAM. I have a tiny intro which fits into 64 bytes, so plenty of space to give it a try.
Cyprian Posted April 26, 2022 (edited)
cool
42bs (Author) Posted April 26, 2022
10 minutes ago, Cyprian said: "cool"

I checked and it is really interesting. I now wait _before_ I set up a new blit, and it actually takes more time to prepare a line of 320 pixels than to write it to memory with the blitter. So, from what I see, it is not possible to have a generated 320x240 picture updated every frame, and using the blitter does not have any advantage (at least in my case). Or I have a big bug somewhere which I do not see yet.
Cyprian Posted April 27, 2022
Do I understand correctly that reading/writing a whole line with the blitter isn't faster than reading/writing each pixel separately with the GPU? BTW, I wonder if the blitter operates on 64-bit or 32-bit data at once. I mean the reads from/writes to main RAM when it exchanges data with the GPU RAM. It would be worth checking that with a logic analyzer.
42bs (Author) Posted April 27, 2022 (edited)
The point is, I am building a line of 320 16-bit pixels in GPU RAM. Each pixel takes about 30 cycles to compute, so the stall for writing to memory does not affect the calculation. Since the GPU can only write 32 bits and not 16 bits (Edit: to internal RAM), I need to spend another 6 cycles to combine odd and even pixels. From what I understand, the blitter can only read from GPU RAM 32 bits at a time.
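A rough back-of-the-envelope check of these numbers, as a sketch only. The cycle counts come from the post above; the nominal GPU clock of about 26.59 MHz, the 60 Hz frame rate and the reading that the 6 combine cycles apply once per pixel pair are assumptions, and memory stalls are ignored entirely.

```python
PIXELS = 320 * 240          # picture size mentioned in the thread
COMPUTE_CYCLES = 30         # per pixel, "more or less"
COMBINE_CYCLES = 6          # assumed: once per pair of 16-bit pixels
GPU_CLOCK_HZ = 26_590_000   # assumed: nominal Jaguar GPU clock
FRAME_RATE_HZ = 60          # assumed: NTSC; use 50 for PAL

# Total GPU cycles to compute and pack one full picture.
cycles_per_picture = PIXELS * COMPUTE_CYCLES + (PIXELS // 2) * COMBINE_CYCLES

# GPU cycles available during one video frame.
cycles_per_frame = GPU_CLOCK_HZ / FRAME_RATE_HZ

frames_needed = cycles_per_picture / cycles_per_frame

print(cycles_per_picture)        # 2534400
print(round(frames_needed, 1))   # 5.7
```

Under these assumptions the computation alone takes between five and six video frames per picture, before any bus traffic is counted, so a full update every frame is out of reach regardless of how the line is written back.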
Cyprian Posted April 27, 2022
OK, 30 cycles is a lot for reading the data from main RAM. This is a good example of where the blitter isn't helpful, but maybe it would be worth using the DSP together with the GPU as well.
42bs (Author) Posted April 27, 2022
It is 30 cycles for computing the pixel (more or less). Anyway, after the Outline Demo Party I will release the sources. Maybe I got some major things wrong and someone can point out where.
Cyprian Posted April 27, 2022
1 hour ago, 42bs said: "It is 30cycle for the computing of the pixel (more or less)."

Yep, I understand that, it was just a mental shortcut. I wonder whether in your case it would be possible to run the GPU and the DSP concurrently; together they could calculate two pixels in 30 cycles.

1 hour ago, 42bs said: "Anyway, after Outline Demo Party I will release sources. And maybe I made some major things wrong and someone can point me where"

Great.
42bs (Author) Posted April 27, 2022
24 minutes ago, Cyprian said: "I wonder whether in your case would be possible to run the GPU and the DSP concurrently, together they could calculate two pixels in 30 cycles."

In the intros I am currently working on, rather not. But I did this with the Mandelbrot set. What is really important is to "stop #$2000" the 68k, especially if it runs in ROM space. It pollutes "The One Bus" (c) Mike Brent:

"The biggest thing to remember about the Jaguar, and you must remember this in all your theory, is that there is only one bus - the One Bus."
JagMod Posted April 27, 2022
If your code does this:

loop1:
    wait for blitter not busy
    gpu prepares line
    gpu uses blitter to write line to ram
    goto loop1

Then it will be faster if you do this with double buffering:

loop2:
    gpu prepares 1st line
    gpu uses blitter to write 1st line to ram
    gpu prepares 2nd line
    gpu uses blitter to write 2nd line to ram
    goto loop2

You stated the blitter will always finish before the gpu prepares a line. The advantage in loop2 is that the gpu never waits for the blitter to finish, so it gets a head start preparing the next line instead of waiting.
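The difference between the two loops can be sketched with a simple timing model. This is an illustration under stated assumptions, not a measurement: the function names and the figures of 100 time units to prepare a line and 40 to blit it are hypothetical, and the model ignores bus contention between the GPU and the blitter.

```python
def single_buffer_time(lines, prepare, blit):
    """loop1: each pass waits for the previous blit, then prepares a line."""
    return lines * (prepare + blit)


def double_buffer_time(lines, prepare, blit):
    """loop2: the GPU fills one buffer while the blitter drains the other."""
    gpu_time = 0        # time at which the GPU is free to continue
    blitter_free = 0    # time at which the blitter becomes idle
    for _ in range(lines):
        gpu_time += prepare                  # prepare the line
        start = max(gpu_time, blitter_free)  # a blit needs an idle blitter
        blitter_free = start + blit          # blit runs in the background
        gpu_time = start                     # GPU moves on immediately
    return blitter_free                      # last blit has finished


# Hypothetical figures: preparing a line takes longer than blitting it.
print(single_buffer_time(240, prepare=100, blit=40))  # 33600
print(double_buffer_time(240, prepare=100, blit=40))  # 24040
```

As long as a blit is shorter than preparing a line, the double-buffered total is the preparation time for all lines plus a single trailing blit, so the saving is roughly one blit per line.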