
DSP external store bug details?


cubanismo


Does anyone have more detail on the 7th bug listed in the software reference bugs appendix? It sounds really nasty at first glance, but after thinking about it, I can't tell how severe it actually is from the brief description and examples. Here's what it says for reference:

 

7) The DSP must not do an external write unless it is preceded by an external read that will complete for [sic] the write starts. This problem is intermittent and could be missed by testing. Be careful in any DSP code that writes to external memory

 

Then it gives some examples. However a few things aren't clear:

 

  1. Do I need to do a separate external read for *every* external write, or is it sufficient to do one external load when starting the GPU up before I do any subsequent stores? Or is some other frequency implied?
  2. What counts as external? I'm reading and writing the JOYSTICK register, for example, which is on Jerry, but has word size, so I'm assuming it counts as external since only external locations can handle non-32-bit load/store ops. How about loads/stores to DSP local memory? Is this "internal" or is any load/store considered "external", with only registers counting as "internal?"

 

My current simple joystick parsing loop is working fine, but the bug is apparently intermittent, so it's hard to tell from simple experimentation whether I'm interpreting this bug correctly or not.


  • 3 weeks later...

Note that I'm not an expert on the Jaguar, but I can throw in my two cents' worth in an attempt to help make sense of things. A different viewpoint sometimes makes a difference.

 

1. From the text on the bug, a completed external read is necessary before every external write. The first key term here is COMPLETED: the examples use an OR on the register targeted by the load, so the scoreboard holds up the write until the read completes. The second key term is EVERY: none of the examples show one completed read followed by two or more writes.

 

2. External is outside Jerry; for example, to Tom or main memory. The joy ports are internal, but Jerry can do some internal locations as words and others as longs. If you read through the manual on Jerry, it specifically tells you when registers MUST be read as 32 bits, even if only 16 bits are valid. An example here would be the I2S ports. But other parts of Jerry have 16 bit registers one after another and no mention of accessing only as 32 bits. An example of that would be the async serial ports, or the joy ports. All I2S ports are on long boundaries and the manual specifically tells you to access as 32 bits. The async serial ports are NOT all on long boundaries and make no mention of accessing as 32 bits. The joy ports are the same, being at 0xF14000 and 0xF14002, and all examples access as 16 bits.
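
As a rough illustration of the rule in point 2, the sketch below classifies an address as internal or external from the DSP's point of view by checking whether it falls inside Jerry's address window. The window bounds (0xF10000 to 0xF1FFFF) and both function names are assumptions for illustration, taken from the commonly cited Jaguar memory map rather than from this thread, and the code models this reply's interpretation, not confirmed hardware behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: Jerry's registers and DSP local RAM occupy this window.
 * The joystick ports mentioned above (0xF14000, 0xF14002) fall inside it. */
#define JERRY_BASE 0xF10000u
#define JERRY_END  0xF1FFFFu

/* Hypothetical helper: true if the DSP can reach addr without leaving Jerry. */
static bool dsp_access_is_internal(uint32_t addr)
{
    return addr >= JERRY_BASE && addr <= JERRY_END;
}

/* Hypothetical helper: under the reading above, only stores that leave
 * Jerry (to Tom or main memory) need the load-then-use workaround. */
static bool dsp_store_needs_workaround(uint32_t addr)
{
    return !dsp_access_is_internal(addr);
}
```

Under these assumptions a store to the joystick port would not need the workaround, while a store to a semaphore in main RAM would.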

 

All of the Jerry code I've looked at avoids the external write bug by never doing external writes. I've yet to see any Jerry code accessing anything other than internal registers and local ram.


Thanks for chiming in. This makes some sense. It's still not clear to me that the bug implies every store needs a corresponding load. It's easier to imagine silicon bugs that behave that way, but working with that assumption just leaves me with more questions. For example, can I do overly clever things like:

; Some external reads
load (r1), r2
load (r3), r4

; WAR external write bug and perform external write 1
or r2, r2
store r5, (r6)

; WAR external write bug and perform external write 2
or r4, r4
store r7, (r8)

E.g., how many and what type of instructions are allowed between the external read and its corresponding external write? This is not at all clear to me.

 

My joystick code has a couple of semaphores in external memory currently to synchronize access to a shared circular buffer. I could move them to internal memory and be done with it I suppose. Currently, I think I'm working around the bug in a straight-forward manner as done in the errata examples, but it makes me pretty uncomfortable having such a shoddy understanding of it.


According to "JAG_V8-bugs.pdf":

 

3 Jerry can see previous DBGL
Level: 1 hardware
Description:

If Jerry asserts DSP bus request one cycle after a previous bus request it is
possible for it to see the end of the previous bus grant for one cycle, and this can
mean that Jerry writes occur with the wrong data. The work-around is to ensure
that Jerry is off the bus before performing a write, either by leaving a long period
of bus inactivity, which is usually greater than the maximum possible period of
object processor bus ownership; or to perform a load and perform an operation
on the loaded data so that the score-board unit can ensure the load has
completed.

 

4 Jerry generates long transfer size bits wrongly
Level: 1 hardware
Description:

If Jerry does a long transfer in a 32 bit system, the size bits are 11 where they
should be 00. Either only perform word transfers, or fix this externally.

 

 

Edited by Cyprian

On 11/10/2021 at 5:07 AM, cubanismo said:

My joystick code has a couple of semaphores in external memory currently to synchronize access to a shared circular buffer. I could move them to internal memory and be done with it I suppose. Currently, I think I'm working around the bug in a straight-forward manner as done in the errata examples, but it makes me pretty uncomfortable having such a shoddy understanding of it.

Don't poll main ram from a jRISC in a tight loop, you'll hammer the bus.


20 hours ago, CyranoJ said:

Don't poll main ram from a jRISC in a tight loop, you'll hammer the bus.

I'm not. When I detect a joystick button state change, I read one 16-bit value and write another 16-bit value to main mem. The rest of the values are in local RAM. I did it this way to avoid potential word-tearing with a 32-bit semaphore in DSP local mem, but that isn't actually an issue given the limited range of values I'm working with, so there's no real reason to do it this way.
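
To make the "read one 16-bit value, write another" pattern concrete, here is a minimal host-side C model of a single-producer, single-consumer ring buffer with two 16-bit indices. It is not the poster's actual code: the type, the function names, and `RING_SIZE` are all hypothetical, and where each field would live on real hardware is left open. The point of the sketch is only that each push reads the other side's index once and writes its own index once, so every shared write is naturally preceded by a shared read.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical capacity; a power of two */

typedef struct {
    volatile uint16_t head;   /* written only by the producer */
    volatile uint16_t tail;   /* written only by the consumer */
    uint16_t slots[RING_SIZE];
} ring_t;

/* Producer side: one read of the consumer's index, one write of our own. */
static bool ring_push(ring_t *r, uint16_t value)
{
    uint16_t tail = r->tail;                  /* the single shared read  */
    uint16_t head = r->head;
    if ((uint16_t)(head - tail) == RING_SIZE) /* buffer full */
        return false;
    r->slots[head % RING_SIZE] = value;
    r->head = (uint16_t)(head + 1);           /* the single shared write */
    return true;
}

/* Consumer side: mirror image of ring_push. */
static bool ring_pop(ring_t *r, uint16_t *out)
{
    uint16_t head = r->head;
    uint16_t tail = r->tail;
    if (head == tail)                         /* buffer empty */
        return false;
    *out = r->slots[tail % RING_SIZE];
    r->tail = (uint16_t)(tail + 1);
    return true;
}
```

Because each index has exactly one writer and fits in 16 bits, neither side needs a lock in this model.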

 

Incidentally though, my Tom code does poll its semaphore in main RAM in a tight loop, largely because I was curious what effect this would have on the 68k. For some reason, it works OK. I'm not even using any 68k interrupts while in that loop. I should move that to local GPU mem, but I wish there was a way to sleep the GPU but leave its interrupts enabled like you can the 68k.


11 minutes ago, cubanismo said:

I wish there was a way to sleep the GPU but leave its interrupts enabled like you can the 68k.

If you want the GPU to sleep until the 68k wakes it up, I suppose you could use either the GPUGO bit or the SINGLE_STEP/SINGLE_GO bits in G_CTRL.

I don't really see the point, though; busy-waiting on a memory location in local RAM doesn't generate any access on the main bus either, so you're not really gaining anything.
(Well, the power consumption may go down a bit, but the Jaguar is not a battery-powered console...)


Yeah, I just feel dirty spinning a processor doing nothing. Goes against everything I know from working on modern systems where power is always an issue. GPUGO doesn't work if you want GPU interrupts to keep working from what I can tell. Not sure if SINGLE_STEP/SINGLE_GO allow interrupts.


 

2 hours ago, cubanismo said:

I just feel dirty spinning a processor doing nothing. Goes against everything I know from working on modern systems where power is always an issue.

I agree, but in that specific case, doing things "right" adds complexity for very little gain.

Maybe it would help when running in an emulator?

On the other hand, maybe it would cause issues because of inaccurate emulation. Who knows.

 

2 hours ago, cubanismo said:

Yeah, I just feel dirty spinning a processor doing nothing. Goes against everything I know from working on modern systems where power is always an issue. GPUGO doesn't work if you want GPU interrupts to keep working from what I can tell. Not sure if SINGLE_STEP/SINGLE_GO allow interrupts.

I should have been clearer; I meant having the 68k explicitly wake up the GPU before sending it an interrupt (otherwise it's not going to work).

 


  • 2 months later...

Edit:
Ah, so I got it: Jerry should not read by itself. Still, I don't even know what's so bad about load/store. If I abuse Jerry for some CPU-intensive algorithm, I could have a kind of in-place data buffer for input and output (I mean, if I worry about the amount of memory), and then load internal, load external, store internal, load external to swap out data. Or, in a different application, if I render small objects into the frame buffer, I could read the z-buffer, do a check, and then write back to the z-buffer. Read the pixel, do some shading (transparency), write back. Or I work at the phrase level and load 4 z-values, move 0,1, and, sub, check bits, generate mask, write back z... and then read-modify-write the pixel data (with the same mask).

The second solution sounds strange, like Jerry would be allowed to write 16 bit once per scanline, but only for the OP enabled scanlines? If Jerry doesn't Load .. then the CPU should get the bus. That would be a reason not to STOP the 68k, but to idle wait, or let it do garbage collection or optimization or help with transformation ( work from both sides ).

 

Reading from Jerry is only problematic because I want to utilize the GPU to its full extent and not let it babysit the DSP.

Old:
Is there a working code sample where the DSP loads code or data in a burst? Everything to main RAM goes through the memory controller. If we read or write continuously to the same address, it should only take one cycle. I mean, when the memory controller has just read the same phrase address, does it cycle through a RAM read access? There was no write, so how? Same with writes: when I write three times to the same address, the second should stall and the last one should be merged. The DSP reads 16 bits over the bus, so the phrase address should stay fixed for four reads. Should I just look into Doom? Or is the bugfix to activate audio on github?

Edited by ArneCRosenfeldt
I reread all the replies