Score board bug

42bs · April 14, 2022

Just to record/warn:

The following sequence works in VJ but not on real HW:

	xor     r3,r2
	btst	#4,r2
	addqt	#4,DISTANCE
	jr	ne,noc
	addqt	#1,LIGHT
	moveq	#0,r4

It seems the jr does not see the Z flag from the btst, but the one from the xor.

Inserting an or r2,r2 between the two fixes it.

42bs · April 14, 2022

Ok, it is not the Z flag.

original sequence:

	xor	r3,r2
	loadb	(LIGHT),r4
	btst	#4,r2
	addqt	#4,DISTANCE
	jr	ne,noc
	addqt	#1,LIGHT
	moveq	#0,r4
noc:

Result: "jr" is _always_ taken?!

Working:

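	xor	r3,r2
	loadb	(LIGHT),r4	; external load, result arrives late
	or	r4,r4		; reads r4, so it stalls until the loadb has written back
	btst	#4,r2
	addqt	#4,DISTANCE	; addqt leaves the flags untouched
	jr	ne,noc
	addqt	#1,LIGHT	; delay slot, always executed
	moveq	#0,r4		; only reached if the jr is not taken
noc: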

This is the "moveq" bug described in the manual. The write-back of the loadb result lands after the moveq has written #0, so r4 ends up holding the loaded byte instead of 0.
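
The ordering problem described above can be sketched with a toy model. This is not cycle-accurate and not how the GPU is actually implemented: it only assumes that register writes land in completion order rather than program order, that an external load takes about 10 cycles, and that an instruction reading r4 stalls until the load has landed. The latencies, issue cycles and the LIGHT_VALUE byte are made-up illustration values.

```python
# Toy model (NOT cycle-accurate): register writes are applied in the order
# in which they complete, not in program order.

LOAD_LATENCY = 10   # assumption: external load needs about 10 cycles
ALU_LATENCY = 1     # assumption: moveq writes back one cycle after issue
LIGHT_VALUE = 0x7F  # hypothetical byte stored at (LIGHT)


def run(program):
    """program: list of (issue_cycle, dest, value, latency) tuples.

    Returns the final register contents after all write-backs have landed.
    """
    regs = {}
    # Sort by the cycle in which each write-back lands; ties keep program order.
    writes = sorted(
        (issue + latency, index, dest, value)
        for index, (issue, dest, value, latency) in enumerate(program)
    )
    for _, _, dest, value in writes:
        regs[dest] = value
    return regs


# Broken: loadb issues at cycle 0, moveq #0,r4 issues at cycle 5.
# The moveq lands at cycle 6, the load at cycle 10 and overwrites the 0.
broken = run([
    (0, "r4", LIGHT_VALUE, LOAD_LATENCY),
    (5, "r4", 0, ALU_LATENCY),
])

# Fixed: "or r4,r4" reads r4 and stalls until the load has landed (cycle 10),
# so the moveq cannot issue before cycle 11 and its write lands last.
fixed = run([
    (0, "r4", LIGHT_VALUE, LOAD_LATENCY),
    (11, "r4", 0, ALU_LATENCY),
])

print("broken r4 =", broken["r4"])
print("fixed  r4 =", fixed["r4"])
```

In this model the only thing that matters is which write lands last, which matches the observation that any instruction reading r4 before the moveq makes the problem disappear.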

laoo · April 14, 2022

@42bs I've been a happy new owner of a Jaguar for a few weeks now and I'm trying to wrap my head around those RISC intricacies... So the problem isn't in the jump (jumping is correct), but in r4 being trashed by the delayed loadb, right?

42bs · April 14, 2022

Right.

Cyprian · April 14, 2022

Is that loadb from main RAM?

loadb	(LIGHT),r4

I wonder how big the delay is.

E.g., which registers will hold the new r4 (LIGHT) content?

loadb	(LIGHT),r4
move	r4,r5
move	r4,r6
move	r4,r7
move	r4,r8
move	r4,r9
move	r4,r10

Unfortunately I can't check that myself due to an issue with the "EZ-HOST" driver or the Skunk itself.

laoo · April 14, 2022

@Cyprian As I understand how the Jaguar's bus works, it's something you just can't rely on, because the bus can always be taken by a master with higher priority (the OP, to name one). On the other hand, I've read somewhere that, as a good approximation, a read from main RAM under optimal circumstances takes 10 cycles. I don't know what this number depends on.

Edited April 14, 2022 by laoo

42bs · April 14, 2022

53 minutes ago, Cyprian said:

E.g., which registers will hold the new r4 (LIGHT) content?

R5, as the read of R4 stalls until loadb has finished.

This is the interesting case:

loadb	(LIGHT),r4
REPT m
nop
ENDR
moveq	#0,r4
move	r4,r5

How many NOPs are needed before r5 becomes 0 and not what was read from LIGHT?

42bs · April 14, 2022

Quote

How many NOPs are needed before r5 becomes 0 and not what was read from LIGHT?

I added this:

	REPT 14
	movei	#100000,r9
	movei	#10,r8
	div	r8,r9
	ENDR

With that I see no more display errors.

And yes, LIGHT is in main memory.

Cyprian · April 14, 2022

14 reps for one loadb is a lot

42bs · April 14, 2022

Yes, I was kinda shocked.

But this was needed for "every" loadb I do (one for every second displayed pixel). The fewer repetitions, the more often the "moveq" was overridden.

Cyprian · April 16, 2022

Wouldn't it be better to copy the data from main RAM to GPU RAM with the blitter in that case?

That would avoid the scoreboard issue, and the data would be transferred a bit faster (the blitter's 64 bits vs. the GPU's 8-bit "loadb").

42bs · April 17, 2022

My size limit is 256 bytes. I have not tried the blitter yet, but it is next on my list of things to squeeze into this corset.

Cyprian · April 17, 2022

OK, I forgot about that limit.

256 bytes on the Jag is a challenge.

42bs · April 26, 2022

On 4/16/2022 at 10:53 PM, Cyprian said:

Wouldn't it be better to copy the data from main RAM to GPU RAM with the blitter in that case?

I changed my intro to use the blitter for writing line by line to DRAM, but there is no visible speed-up. The problem is that the pixels are 16 bit, so I need to combine two of them in GPU RAM, which again adds some cycles.

Simply put, reading 320*240*6 bytes and writing 320*240*4 bytes eats a lot of time.
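
To put numbers on that, here is the plain arithmetic for the figures in the post. The 60 frames per second is an assumption (one full update per frame on a 60 Hz display); the byte counts per pixel are taken from the post as given.

```python
# Per-frame memory traffic implied by the figures in the post above.
WIDTH, HEIGHT = 320, 240
BYTES_READ_PER_PIXEL = 6     # from the post
BYTES_WRITTEN_PER_PIXEL = 4  # from the post
FRAMES_PER_SECOND = 60       # assumption: one update per 60 Hz frame

pixels = WIDTH * HEIGHT
bytes_read = pixels * BYTES_READ_PER_PIXEL
bytes_written = pixels * BYTES_WRITTEN_PER_PIXEL
bytes_per_frame = bytes_read + bytes_written
bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND

print(f"pixels per frame:        {pixels}")
print(f"bytes read per frame:    {bytes_read}")
print(f"bytes written per frame: {bytes_written}")
print(f"total bytes per frame:   {bytes_per_frame}")
print(f"total bytes per second:  {bytes_per_second}")
```

That is 768,000 bytes per frame, or roughly 46 MB per second if every frame were to be updated, before any bus contention is taken into account.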

Cyprian · April 26, 2022

ok.

Does the blitter work in parallel with the GPU, or does it just stop the GPU while blitting?

If they can work concurrently, then maybe interleaving the code with blitting would speed it up a bit. E.g. process only half of the line (two lines actually) per pass, load the next half at the beginning and save the previous half in the middle of the code.

Anyway, I guess the code would not fit into 256 bytes.

Edited April 26, 2022 by Cyprian

42bs · April 26, 2022

4 minutes ago, Cyprian said:

Does the blitter work in parallel with the GPU, or does it stop it during blitting?

Oh, wait. Yes, I wait for the blitter to finish.

I should try double buffering the line I write back to RAM.

I have a tiny intro which fits into 64 bytes, so plenty of space to give it a try.

Cyprian · April 26, 2022

cool

Edited April 26, 2022 by Cyprian

42bs · April 26, 2022

10 minutes ago, Cyprian said:

cool

I checked and it is really interesting. I now wait _before_ I set up a new blit, and it actually takes more time to prepare a line of 320 pixels than to write it to memory with the blitter.

So, from what I see, it is not possible to update a generated 320x240 picture every frame, and using the blitter has (at least in my case) no advantage.

Or I have a big bug somewhere that I do not see yet.

Cyprian · April 27, 2022

Do I understand correctly that reading/writing a whole line with the blitter isn't faster than reading/writing each pixel separately with the GPU?

BTW, I wonder whether the blitter operates on 64-bit or 32-bit data at once. I mean the reads from/writes to main RAM when it exchanges data with the GPU RAM. It would be worth checking that with a logic analyzer.

42bs · April 27, 2022

The point is, I am building a line of 320 16-bit pixels in GPU RAM. Each pixel takes about 30 cycles to compute, so the stall for writing to memory does not affect the calculation.

Since the GPU can only write 32 bits and not 16 bits (Edit: to internal RAM), I need to spend another 6 cycles to combine odd and even pixels.

From what I understand, the blitter can only read from GPU RAM in 32-bit units.
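
A rough cycle budget based on the numbers above, as a sketch only. The GPU clock of about 26.59 MHz and the 60 Hz frame rate are assumptions, and the 6 combine cycles are interpreted here as a cost per pair of pixels; the 30 cycles per pixel is the approximate figure from the post.

```python
# Rough cycle budget for a fully computed 320x240 picture.
WIDTH, HEIGHT = 320, 240
CYCLES_PER_PIXEL = 30       # from the post (approximate)
COMBINE_CYCLES_PER_PAIR = 6 # from the post, assumed to be per pixel pair
GPU_CLOCK_HZ = 26_590_000   # assumption: nominal Jaguar GPU clock
FRAMES_PER_SECOND = 60      # assumption: 60 Hz display

pixels = WIDTH * HEIGHT
compute_cycles = pixels * CYCLES_PER_PIXEL
combine_cycles = (pixels // 2) * COMBINE_CYCLES_PER_PAIR
total_cycles = compute_cycles + combine_cycles

cycles_per_frame = GPU_CLOCK_HZ / FRAMES_PER_SECOND
frames_needed = total_cycles / cycles_per_frame

print(f"cycles to compute one picture: {total_cycles}")
print(f"cycles available per frame:    {cycles_per_frame:.0f}")
print(f"frames needed per picture:     {frames_needed:.2f}")
```

Under these assumptions the computation alone needs more than five frame times per picture, so the memory stalls are not the limiting factor.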

Edited April 27, 2022 by 42bs

Cyprian · April 27, 2022

ok, 30 cycles is a lot for reading the data from main ram

This is a good example where the blitter isn't helpful, but maybe it would be worth using the DSP together with the GPU as well.

42bs · April 27, 2022

It is 30 cycles for computing the pixel (more or less).

Anyway, after Outline Demo Party I will release the sources. And maybe I got some major things wrong and someone can point out where ;-)

Cyprian · April 27, 2022

1 hour ago, 42bs said:

It is 30 cycles for computing the pixel (more or less).

Yep, I understand that, it was just a mental shortcut.

I wonder whether in your case it would be possible to run the GPU and the DSP concurrently; together they could calculate two pixels in 30 cycles.

1 hour ago, 42bs said:

Anyway, after Outline Demo Party I will release the sources. And maybe I got some major things wrong and someone can point out where

great

42bs · April 27, 2022

24 minutes ago, Cyprian said:

I wonder whether in your case it would be possible to run the GPU and the DSP concurrently; together they could calculate two pixels in 30 cycles.

Not really in the intros I am currently working on, but I did this with the Mandelbrot set.

What is really important is to "stop #$2000" the 68k, especially if it runs from ROM space. It pollutes "The One Bus" (c) Mike Brent

Quote

The biggest thing to remember about the Jaguar, and you must remember this in all your theory, is that there is only one bus - the One Bus.

JagMod · April 27, 2022

If your code does this:

loop1:
    wait for blitter not busy
    gpu prepares line
    gpu uses blitter to write line to ram
    goto loop1

Then it will be faster if you do this with double buffering:

loop2:
    gpu prepares 1st line
    gpu uses blitter to write 1st line to ram
    gpu prepares 2nd line
    gpu uses blitter to write 2nd line to ram
    goto loop2

You stated the blitter will always finish before the gpu prepares a line.

The advantage in loop2 is that the gpu never waits for the blitter to finish
so it gets a head start preparing the next line instead of waiting.
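
The difference between the two loops can be estimated with a small timing model. This is only a sketch: the cycle counts are hypothetical placeholders (9600 is simply 320 pixels at about 30 cycles each, the blit time is made up), bus contention is ignored, and the function names are invented for illustration.

```python
# Timing model for single vs. double buffering of line blits.

def single_buffer_time(lines, prep, blit):
    """loop1: one buffer, so the GPU idles while the blitter drains it."""
    return lines * (prep + blit)


def double_buffer_time(lines, prep, blit):
    """loop2: the GPU fills one buffer while the blitter drains the other."""
    now = 0               # current GPU time
    blitter_free = 0      # time at which the blitter becomes idle
    buffer_free = [0, 0]  # time at which each buffer has been fully blitted
    for line in range(lines):
        buf = line % 2
        now = max(now, buffer_free[buf])  # buffer must be drained before reuse
        now += prep                       # GPU prepares the line
        now = max(now, blitter_free)      # previous blit must have finished
        blitter_free = now + blit         # start the blit, GPU carries on
        buffer_free[buf] = blitter_free
    return blitter_free                   # time at which the last blit ends


LINES = 240
PREP_CYCLES = 9600  # hypothetical: 320 pixels at about 30 cycles each
BLIT_CYCLES = 2000  # hypothetical: blit is shorter than the preparation

single = single_buffer_time(LINES, PREP_CYCLES, BLIT_CYCLES)
double = double_buffer_time(LINES, PREP_CYCLES, BLIT_CYCLES)
print("single buffer:", single)
print("double buffer:", double)
print("saved cycles: ", single - double)
```

As long as the blit is shorter than the preparation, the model says the saving is one blit time per line except for the last one, so the gain stays small when the per-pixel computation dominates.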
