
Open Club  ·  76 members

StellaRT

Community-Built Unnamed 1970's Video Game Console-Compatible System (WIP)


Al_Nafuur


Pretty sure when Stella enters the debugger, it attempts to read all 4K of the address space at once, into an internal buffer.  There is logic in each of the Cart::bank() methods that checks if we're in 'read-only' mode, and then doesn't do bankswitching if a hotspot is hit.  Obviously a real cart isn't going to be doing that, so hotspots that are hit are going to cause an inadvertent bankswitch.

 

EDIT: I bet trying it on a 4K or less, non-banked cart will allow the debugger to function correctly.
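The effect described above can be shown with a deliberately simplified, hypothetical F8-style cartridge (8K, two 4K banks, hotspots at $1FF8/$1FF9). The class, its methods and the chosen start bank are invented for this sketch and are not Stella's actual Cartridge API. It only illustrates why a dump of the 4K window has to suppress bankswitching, and why a physical cart, which has no such "lock", switches banks when the dump touches a hotspot.

```cpp
#include <array>
#include <cstdint>

// Hypothetical, simplified F8-style cartridge; NOT Stella's Cartridge class.
class CartF8Sketch {
 public:
  explicit CartF8Sketch(const std::array<std::uint8_t, 8192>& image) : myImage(image) {}

  // While locked (e.g. the debugger is dumping the address space),
  // hotspot accesses must not switch banks.
  void lockBank(bool locked) { myBankLocked = locked; }

  std::uint8_t peek(std::uint16_t address) {
    address &= 0x0FFF;  // the cart only sees A0..A11 within its 4K window
    if (address == 0x0FF8 || address == 0x0FF9) bank(address - 0x0FF8);
    return myImage[myCurrentBank * 4096 + address];
  }

  int currentBank() const { return myCurrentBank; }

 private:
  void bank(int b) {
    if (myBankLocked) return;  // "read-only" mode: ignore the hotspot
    myCurrentBank = b;
  }

  std::array<std::uint8_t, 8192> myImage;
  int myCurrentBank = 0;       // arbitrary start bank for this sketch
  bool myBankLocked = false;
};

// Simulates the debugger reading the whole 4K window into a buffer.
std::array<std::uint8_t, 4096> dumpWindow(CartF8Sketch& cart, bool useLock) {
  std::array<std::uint8_t, 4096> buffer{};
  cart.lockBank(useLock);
  for (unsigned a = 0; a < 4096; ++a)
    buffer[a] = cart.peek(static_cast<std::uint16_t>(0x1000 | a));
  cart.lockBank(false);
  return buffer;
}
```

Without the lock, the dump walks over $1FF8 and then $1FF9 and leaves the cart in bank 1, although the emulated program was running in bank 0.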


5 hours ago, stephena said:

I bet trying it on a 4K or less, non-banked cart will allow the debugger to function correctly.

This worked fine before we disabled the Linux scheduler. With the scheduler disabled, I tried entering the debugger in RTLinux with a few 4K ROMs; it still freezes on a black canvas.


7 hours ago, stephena said:

Pretty sure when Stella enters the debugger, it attempts to read all 4K of the address space at once, into an internal buffer.  There is logic in each of the Cart::bank() methods that checks if we're in 'read-only' mode, and then doesn't do bankswitching if a hotspot is hit.  Obviously a real cart isn't going to be doing that, so hotspots that are hit are going to cause an inadvertent bankswitch.

1 hour ago, MarcoJ said:

This worked fine before we disabled the Linux scheduler. With the scheduler disabled, I tried entering the debugger in RTLinux with a few 4K ROMs; it still freezes on a black canvas.

The debugger had "crashed" bank-switched cartridges before, which resulted in an "Invalid Instruction", but it never crashed Stella.

 

7 hours ago, stephena said:

EDIT: I bet trying it on a 4K or less, non-banked cart will allow the debugger to function correctly.

Yes, cartridges without hotspots in the 4K ROM area will work with the debugger, but others will not. I have thought about how the debugger should behave with the CartPort, but I have postponed this for the time being until we have a working CartPort driver.

 

My conclusion is that in the future the debugger cannot simply scan the ROM area of the CartPort. It can only make assumptions about the banking from the addresses (ROM mirrors) requested by the 6502 and from the reads/writes to the ROM area at runtime. These assumptions can be displayed in the debugger, but shouldn't be the basis for any ROM scans by the debugger.
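A minimal sketch of such runtime inference, assuming an F8 cart (an access to $1FF8 selects bank 0, $1FF9 selects bank 1, mirrored wherever A12 is high). The class name is hypothetical. It never issues reads of its own; it only watches the addresses the emulated CPU actually put on the bus, and it reports "unknown" until a hotspot has been seen.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical passive observer for an F8 (8K) cartridge.
class BankGuessF8 {
 public:
  // Call for every address the emulated 6507 requests (read or write).
  void observe(std::uint16_t address) {
    if ((address & 0x1000) == 0) return;  // A12 low: not a cartridge access
    switch (address & 0x0FFF) {
      case 0x0FF8: myBank = 0; break;     // hotspot for bank 0 (any mirror)
      case 0x0FF9: myBank = 1; break;     // hotspot for bank 1 (any mirror)
      default: break;
    }
  }

  // Empty until a hotspot has been seen: the power-up bank is unknown.
  std::optional<int> bank() const { return myBank; }

 private:
  std::optional<int> myBank;
};
```

Such a guess could be shown in the debugger, but, as argued above, it should not be used to drive ROM scans.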

 

 


On 9/26/2023 at 11:17 PM, Al_Nafuur said:

I mean too slow for a write cycle of 300 Pi4 NOPs ;-)

I am trying to follow things here. Here is my understanding:

 

We know that executing code from SC-RAM doesn't work. This implies that consecutive peeks at 1 CPU cycle intervals are too fast. The fastest possible access via the 6507 CPU is every 4 cycles. That means the required access time is somewhere between >1 and <=4 CPU cycles. We also know that we can access ROM much faster, that the current delay is tailored to that ROM access, and that we are accessing ROM faster than once per 6507 CPU cycle. Is this correct so far?

 

If yes, then we cannot use the same timing for SC-RAM access. It could be that we can, e.g., access ROM every 0.5 cycles (and have tailored the delays accordingly), but need the full 4 cycles for SC-RAM access. That would be 8 times slower, but the delays would only add up to 4 times slower.

 

What I mean to say here is, between the last SC-RAM access and the next SC-RAM access, we might need a separate, longer delay time.

 

Does that make sense? And have we tried that already?
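A minimal sketch of what a separate, longer SC-RAM delay could look like. It assumes we somehow know that the cart has a Superchip, whose 128 bytes of RAM are written at $F000-$F07F and read at $F080-$F0FF (i.e. the first 256 bytes of the 4K window). The `hasScRam` flag, the struct and function names, and the delay values are all hypothetical.

```cpp
#include <cstdint>

// Two independent settle times; the numbers used with this are placeholders.
struct AccessDelays {
  unsigned romNs;    // tuned for ROM reads
  unsigned scRamNs;  // separate, longer delay for SC-RAM
};

// Superchip RAM occupies $x000-$x0FF of the cartridge window (A12 high).
bool isScRamAddress(std::uint16_t address) {
  return (address & 0x1000) != 0 && (address & 0x0FFF) < 0x0100;
}

unsigned delayFor(std::uint16_t address, bool hasScRam, const AccessDelays& d) {
  return (hasScRam && isScRamAddress(address)) ? d.scRamNs : d.romNs;
}
```

The weak point is the `hasScRam` input: the driver would need some other way to learn that the cart has SC-RAM.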

Edited by Thomas Jentzsch

20 minutes ago, Thomas Jentzsch said:

If yes, then we cannot use the same timing for SC-RAM access. It could be that we can, e.g., access ROM every 0.5 cycles (and have tailored the delays accordingly), but need the full 4 cycles for SC-RAM access.

I have also noticed that SC-RAM carts need longer delays to be stable as compared to ROM only carts. How does a regular Atari 2600 adjust its timing for carts with SC-RAM or other RAM types? I thought such timings were fixed to the clock speed and read/write cycle characteristics? Or is this the 6507 write cycle (6507->cart/TIA/RIOT) in general?


33 minutes ago, Thomas Jentzsch said:

I am trying to follow things here. Here is my understanding:

 

We know that executing code from SC-RAM doesn't work. This implies that consecutive peeks at 1 CPU cycle intervals are too fast. The fastest possible access via the 6507 CPU is every 4 cycles. That means the required access time is somewhere between >1 and <=4 CPU cycles. We also know that we can access ROM much faster, that the current delay is tailored to that ROM access, and that we are accessing ROM faster than once per 6507 CPU cycle. Is this correct so far?

 

If yes, then we cannot use the same timing for SC-RAM access. It could be that we can, e.g., access ROM every 0.5 cycles (and have tailored the delays accordingly), but need the full 4 cycles for SC-RAM access. That would be 8 times slower, but the delays would only add up to 4 times slower.

 

What I mean to say here is, between the last SC-RAM access and the next SC-RAM access, we might need a separate, longer delay time.

Unfortunately, just like a real 2600, we cannot know about SC-RAM.

 

33 minutes ago, Thomas Jentzsch said:

Does that make sense? And have we tried that already?

Yes, No.

8 minutes ago, MarcoJ said:

I have also noticed that SC-RAM carts need longer delays to be stable as compared to ROM only carts. How does a regular Atari 2600 adjust its timing for carts with SC-RAM or other RAM types? I thought such timings were fixed to the clock speed and read/write cycle characteristics? Or is this the 6507 write cycle (6507->cart/TIA/RIOT) in general?

 

I have long suspected that the ghost reads and writes do not arrive at the driver. I can't really see them in the screenshots @Kroko has posted:

 

 

So if SC-RAM and some bankswitching schemes (FA?) need these ghost writes or the extra cycles, we will have to check whether Stella really requests them from the CartPort driver.


Just now, Al_Nafuur said:

So if SC-RAM and some bankswitching schemes (FA?) need these ghost writes or the extra cycles, we will have to check whether Stella really requests them from the CartPort driver.

Unfortunately this will again reduce our performance.

🤔

Maybe, for the ghost reads, we could only start the timer for the next request and not wait for the response (as we plan to do for the writes); that might reduce the performance impact.
 

But this means we need a reliable precise timer (internal or external).

 

Currently my 64-bit PiOS Stella is not working with any delay/timer version we have tested. I can't get stable reads, and the emulation always stops with "Invalid instruction". At first I suspected a hardware issue with the breadboard again, but switching back (multiple times!) to the 32-bit SD card resulted in a working emulation.

 


27 minutes ago, MarcoJ said:

I have also noticed that SC-RAM carts need longer delays to be stable as compared to ROM only carts. How does a regular Atari 2600 adjust its timing for carts with SC-RAM or other RAM types? I thought such timings were fixed to the clock speed and read/write cycle characteristics? Or is this the 6507 write cycle (6507->cart/TIA/RIOT) in general?

Everything is clocked by the 6507 CPU. It runs at about 1.19 MHz (NTSC), and in the fastest possible instruction sequence with two consecutive peeks to SC-RAM (lda abs, lda abs) the two data reads are 4 cycles apart. Though I am not sure how the ghost peeks affect this timing.


57 minutes ago, Al_Nafuur said:

Unfortunately, just like a real 2600, we cannot know about SC-RAM.

First idea: Maybe we don't have to.

 

The idea would be to add a calibration run before trying to execute code. This calibration would try to find out how fast we can peek from or poke to certain ROM regions. It would then automatically detect regions that can only be accessed more slowly (e.g. SC-RAM).

 

Example: 

  1. loop in small intervals (e.g. 64 bytes) over the 4K ROM address space
  2. start with a long, safe delay (e.g. ~4 CPU cycles)
  3. peek from current address
  4. delay
  5. read and remember peek result
  6. peek from a completely different address
  7. delay
  8. peek result should be different from 5., else repeat 6.
  9. peek from current address again
  10. delay
  11. read and compare peek result with 5.
  12. if same, loop with smaller delay (binary search would work too)
  13. else, exit delay loop and use previous delay (+ some safety margin) for this interval (or for binary search, extend delay)

Would this or something like this work?
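One way to express the steps above, as a sketch only: the hardware access is abstracted as a hypothetical `PeekFn` callback (set the address, wait the given delay, sample the data bus), and steps 6 to 8 are simplified by assuming the "other" address, half the window away, holds a different byte. The function names, the 64-byte segment size and the margin are assumptions for illustration.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical hardware hook: put `address` on the cartridge bus, wait
// `delayNs`, then sample the data bus. A real driver would drive the GPIOs.
using PeekFn = std::function<std::uint8_t(std::uint16_t address, unsigned delayNs)>;

// Binary search (the alternative mentioned in step 12) for the shortest delay
// at which `address` still returns the value read at the safe delay.
unsigned calibrateSegment(const PeekFn& peek, std::uint16_t address,
                          std::uint16_t otherAddress, unsigned safeNs,
                          unsigned marginNs) {
  const std::uint8_t reference = peek(address, safeNs);  // steps 2-5
  unsigned good = safeNs;  // known to return `reference`
  unsigned bad = 0;        // assumed too fast
  while (good - bad > 1) {
    const unsigned trial = bad + (good - bad) / 2;
    // Steps 6-8, simplified: assumes `otherAddress` holds a different byte,
    // so a too-short delay shows up as a stale (wrong) value.
    peek(otherAddress, safeNs);
    if (peek(address, trial) == reference) {
      good = trial;  // steps 9-12: still correct, try faster
    } else {
      bad = trial;   // step 13: too fast
    }
  }
  return good + marginNs;  // safety margin on top of the fastest working delay
}

// Step 1: one delay per 64-byte segment of the 4K window ($1000-$1FFF).
std::vector<unsigned> calibrateWindow(const PeekFn& peek, unsigned safeNs,
                                      unsigned marginNs) {
  std::vector<unsigned> delays;
  for (unsigned seg = 0; seg < 4096; seg += 64) {
    const auto address = static_cast<std::uint16_t>(0x1000 | seg);
    const auto other = static_cast<std::uint16_t>(0x1000 | ((seg + 2048) & 0x0FFF));
    delays.push_back(calibrateSegment(peek, address, other, safeNs, marginNs));
  }
  return delays;
}
```

Note that this blindly reads every segment, which is exactly what the hotspot concern raised in the replies is about.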

 

Second idea: Maybe we don't need a valid timer either.

We might not need an absolutely correct timer, just something that is reliable in itself. If we, e.g., assume that each NOP in a NOP loop takes constant time, we could use that too. The calibration from above would automatically deliver the number of NOPs required.
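A sketch of such a self-referential "NOP loop" timer. The busy loop (writing to a volatile so the compiler keeps it) stands in for the NOPs; it is measured once against `std::chrono::steady_clock`, and a calibrated delay is then converted into an iteration count. The function names are hypothetical, and the assumption that the iteration time stays constant (no CPU frequency scaling, no preemption) is exactly what would need verifying on the Pi.

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>

// Stand-in for a "NOP loop": a fixed amount of busy work per iteration.
// Writing to a volatile keeps the compiler from removing the loop.
inline void spin(std::uint64_t iterations) {
  volatile std::uint64_t sink = 0;
  for (std::uint64_t i = 0; i < iterations; ++i) sink = i;
}

// Measures how many spin iterations fit into one microsecond on this CPU.
double spinsPerMicrosecond(std::uint64_t sampleIterations = 10'000'000) {
  using clock = std::chrono::steady_clock;
  const auto start = clock::now();
  spin(sampleIterations);
  const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
  return static_cast<double>(sampleIterations) / elapsed.count();
}

// Converts a calibrated delay into an iteration count, rounding up so the
// wait errs on the long (safe) side.
std::uint64_t spinsFor(double delayNs, double perMicrosecond) {
  return static_cast<std::uint64_t>(std::ceil(delayNs * perMicrosecond / 1000.0));
}
```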

 

Again, does that make sense?

Edited by Thomas Jentzsch

7 minutes ago, MarcoJ said:

Are ghost peeks only something Stella does, or are they done on a regular console? 

Stella emulates the regular console. :)

 

E.g. for LDA absolute,X it does:

 

  const uInt16 low = peek(PC++, DISASM_CODE);
  const uInt16 high = (static_cast<uInt16>(peek(PC++, DISASM_CODE)) << 8);
  intermediateAddress = high | static_cast<uInt8>(low + X);
  if((low + X) > 0xFF)
  {
    peek(intermediateAddress, DISASM_NONE); // ghost peek!
    intermediateAddress = (high | low) + X;
    operand = peek(intermediateAddress, DISASM_DATA);
  }
  else
  {
    operand = peek(intermediateAddress, DISASM_DATA);
  }
  A = operand;
  notZ = A;
  N = A & 0x80;

So, while the result of the ghost peek is ignored, the peek is still executed.
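Since the 6502/6507 performs exactly one bus access per cycle, the same logic can be written as a per-cycle address trace. This standalone sketch (the function name is hypothetical) lists the addresses put on the bus for LDA absolute,X, including the ghost peek at the un-carried address when indexing crosses a page.

```cpp
#include <cstdint>
#include <vector>

// Bus addresses for LDA absolute,X (opcode $BD), one entry per cycle.
std::vector<std::uint16_t> ldaAbsXBusTrace(std::uint16_t pc, std::uint16_t base,
                                           std::uint8_t x) {
  std::vector<std::uint16_t> trace;
  trace.push_back(pc);                                  // opcode fetch
  trace.push_back(static_cast<std::uint16_t>(pc + 1));  // operand low byte
  trace.push_back(static_cast<std::uint16_t>(pc + 2));  // operand high byte
  const std::uint8_t low = base & 0xFF;
  const std::uint16_t high = base & 0xFF00;
  if (low + x > 0xFF) {
    // Page crossed: the CPU first reads from the un-carried address
    // (the "ghost peek"), then fixes up the high byte.
    trace.push_back(static_cast<std::uint16_t>(high | static_cast<std::uint8_t>(low + x)));
  }
  trace.push_back(static_cast<std::uint16_t>(base + x));  // the real data read
  return trace;
}
```

With base $F0F0 and X = $20 the trace has five entries, and the ghost peek goes to $F010 while the real read goes to $F110; without a page crossing there are only four.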

Edited by Thomas Jentzsch

33 minutes ago, Al_Nafuur said:

Maybe, for the ghost reads, we could only start the timer for the next request and not wait for the response (as we plan to do for the writes); that might reduce the performance impact.

I think that's effectively like not executing the ghost peeks at all.


16 minutes ago, Thomas Jentzsch said:

First idea: Maybe we don't have to.

 

The idea would be to add a calibration run before trying to execute code. This calibration would try to find out how fast we can peek from or poke to certain ROM regions. It would then automatically detect regions that can only be accessed more slowly (e.g. SC-RAM).

 

Example: 

  1. loop in small intervals (e.g. 64 bytes) over the 4K ROM address space
  2. start with a long, safe delay (e.g. ~4 CPU cycles)
  3. peek from current address
  4. delay
  5. read and remember peek result
  6. peek from a completely different address
  7. delay
  8. peek result should be different from 5., else repeat 6.
  9. peek from current address again
  10. delay
  11. read and compare peek result with 5.
  12. if same, loop with smaller delay (binary search would work too)
  13. else, exit delay loop and use previous delay (+ some safety margin) for this interval (or for binary search, extend delay)

Would this or something like this work?

Not with every cartridge, and the CartPort can hold any cartridge. Peeking and poking around blindly like that can be dangerous.

 

The 7800 does some kind of ROM scan at startup; maybe we can imitate this and get some information about the cartridge. 🤷‍♂️

 

16 minutes ago, Thomas Jentzsch said:

Second idea: Maybe we don't need a valid timer either.

We might not need an absolutely correct timer, just something that is reliable in itself. If we, e.g., assume that each NOP in a NOP loop takes constant time, we could use that too. The calibration from above would automatically deliver the number of NOPs required.

 

Again, does that make sense?

No, we need a real timer. A delay like that will lead to unnecessary waits for us in our writes (and ghost reads).


3 minutes ago, Thomas Jentzsch said:

I think that's effectively like not executing the ghost peeks at all.

Of course we would set the address! And this would give the cartridge the extra cycle needed by, e.g., the SC-RAM.


8 minutes ago, Al_Nafuur said:

Not with every cartridge, and the CartPort can hold any cartridge. Peeking and poking around blindly like that can be dangerous.

What could be dangerous here? 

8 minutes ago, Al_Nafuur said:

No, we need a real timer. A delay like that will lead to unnecessary waits for us in our writes (and ghost reads).

Why would this lead to unnecessary waits?


8 minutes ago, Al_Nafuur said:

Of course we would set the address! And this would give the cartridge the extra cycle needed by, e.g., the SC-RAM.

But our cycle is not the 6507 cycle; ours is faster. Maybe we need two or more of our cycles for an SC-RAM access, so just setting the address wouldn't help, and we might as well skip that step completely. Instead, we have to use two different, independent access cycles: one for ROM and one for RAM.


8 minutes ago, Thomas Jentzsch said:

What could be dangerous here?

unknown hotspots?

 

10 minutes ago, Thomas Jentzsch said:

Why would this lead to unnecessary waits?

The delay loop doesn't account for the time Stella has consumed for the emulation between the write (ghost read) and the next access.
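A sketch of the "real timer" behaviour: timestamp each bus access and, before the next one, wait only for the part of the minimum spacing that the emulation has not already used up. The `BusPacer` class is hypothetical, and whether `std::chrono::steady_clock` has sufficient resolution and low enough call overhead on the Pi is an assumption that would need measuring.

```cpp
#include <chrono>

// Hypothetical helper: remembers when the last bus access happened and,
// before the next one, busy-waits only for the remaining part of the
// minimum spacing (time already spent emulating counts towards it).
class BusPacer {
 public:
  using clock = std::chrono::steady_clock;

  void markAccess() { myLast = clock::now(); }

  // Returns true if it actually had to wait.
  bool waitUntilReady(std::chrono::nanoseconds minSpacing) const {
    const auto readyAt = myLast + minSpacing;
    if (clock::now() >= readyAt) return false;  // Stella already used up the time
    while (clock::now() < readyAt) {}           // wait only for the remainder
    return true;
  }

 private:
  clock::time_point myLast = clock::now();
};
```

Unlike a fixed NOP count, this never adds a delay on top of time the emulation has already spent between two accesses.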

 

6 minutes ago, Thomas Jentzsch said:

But our cycle is not the 6507 cycle. Ours is faster. Maybe we need two or more of our cycles for SC-RAM access.

As you have already pointed out, accessing SC-RAM in a real cart always takes 2 or more 6502 cycles. The SC-RAM starts to respond when the address is set (first cycle), but it hasn't driven the data bus by the end of that cycle (as normal ROM or faster RAM does). That doesn't matter: since no code is executed from SC-RAM, the 6502 reads the data at the end of the second (or third) cycle. We would imitate that with the ghost reads, which I suspect are currently not reaching the CartPort driver, so we are trying to read SC-RAM in one cycle!

 

7 minutes ago, Thomas Jentzsch said:

Instead we have to use two different, independent access cycles, one for ROM and one for RAM.

Does the 2600 have that?

 

 


23 minutes ago, Al_Nafuur said:

unknown hotspots?

Who cares? We are not executing anything.

23 minutes ago, Al_Nafuur said:

The delay loop doesn't account for the time Stella has consumed for the emulation between the write (ghost read) and the next access.

True. Using a separate thread on a separate CPU would solve the problem. But that's quite costly. A reliable timer would be better.

23 minutes ago, Al_Nafuur said:

As you have already pointed out, accessing SC-RAM in a real cart always takes 2 or more 6502 cycles.

Four cycles! Or are you counting differently here?

23 minutes ago, Al_Nafuur said:

The SC-RAM starts to respond when the address is set (first cycle), but it hasn't driven the data bus by the end of that cycle (as normal ROM or faster RAM does). That doesn't matter: since no code is executed from SC-RAM, the 6502 reads the data at the end of the second (or third) cycle. We would imitate that with the ghost reads, which I suspect are currently not reaching the CartPort driver, so we are trying to read SC-RAM in one cycle!

Since the addresses of ghost and real reads are different, how would a ghost read help with the timing? 

 

BTW: Ghost peeks happen only in certain cases (e.g. page-crossing indexing). E.g. LDA absolute,X (0xBD) does either 4 (no ghost peek) or 5 peeks in total.

23 minutes ago, Al_Nafuur said:

Does the 2600 have that?

Not required, because its cycles are longer than ours. 6507 cycles are long enough for the slow SC-RAM. Ours are probably not. And if we slow down to cycles required by SC-RAM we might become too slow overall.

Edited by Thomas Jentzsch

13 minutes ago, Thomas Jentzsch said:

Who cares? We are not executing anything.

But the cartridge...

 

15 minutes ago, Thomas Jentzsch said:

True. Using a separate thread on a separate CPU would solve the problem. But that's quite costly. A reliable timer would be better.

👍

 

15 minutes ago, Thomas Jentzsch said:

Four cycles! Or are you counting differently here?

I only made a rough estimate ;-)

17 minutes ago, Thomas Jentzsch said:

Since the addresses of ghost and real reads are different, how would a ghost read help with the timing? 

BTW: Ghost peeks happen only in certain cases (e.g. page-crossing indexing). E.g. LDA absolute,X (0xBD) does either 4 (no ghost peek) or 5 peeks in total.

Indeed! We don't need the ghost read, but the real amount of 6502 cycles and the address of each cycle when doing a data read.

 

17 minutes ago, Thomas Jentzsch said:

Not required, because its cycles are longer than ours.

Currently our cycles aren't much shorter than the 6502's!

22 minutes ago, Thomas Jentzsch said:

6507 cycles are long enough for the slow SC-RAM.

Obviously not! Otherwise code execution in SC-RAM would work! SC-RAM needs more than one 6502 cycle to set the data bus.

 

24 minutes ago, Thomas Jentzsch said:

And if we slow down to cycles required by SC-RAM we might become too slow overall.

We also shouldn't do data reads in one cycle! The 2600 doesn't, and neither should we.


22 minutes ago, Al_Nafuur said:

But the cartridge..

Yes, but if we jump to the start vector, it shouldn't matter what the cart was doing before. It will be in a random state, and the cart's code usually has to take care of making it non-random after starting.

22 minutes ago, Al_Nafuur said:

Obviously not! Otherwise code execution in SC-RAM would work! SC-RAM needs more than one 6502 cycle to set the data bus.

That's not what I meant to say, sorry. I meant that 6507 cycles are slow enough for successful SC-RAM access (over multiple cycles). Ours, even if only slightly shorter, could still be too short.

22 minutes ago, Al_Nafuur said:

We also shouldn't do data reads in one cycle! The 2600 doesn't, and neither should we.

Same misunderstanding as before. :) 

Edited by Thomas Jentzsch

I just wondered if we could do the calibration on-the-fly. So no wild peeking, just to be sure.*

 

E.g., initially we read at the safe (incl. SC-RAM) speed 1 and do an extra peek at the faster speed 1/2. If the results differ, we know the faster speed is too fast, so the next peeks will be at 1 and 3/4. If they are identical, we use 1/2 and 1/4. And so on, until we have the required accuracy. If we do that per segment, we should soon find out which segment can handle which speed.

 

At first the emulation will be a little slow, but only for a few moments. And if we later hit segments that are not yet calibrated, the already optimized segments will most likely make up for them.
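The on-the-fly scheme above is a bisection per segment, and its state fits in two numbers. In this sketch (all names hypothetical, 64-byte segments assumed), `safeNs` is the fastest delay known to give the same result as the safe read, and `tooFastNs` is the fastest delay known or assumed to fail. Starting from safe = 1 and tooFast = 0, the first probe is at 1/2; a mismatch moves the next probe to 3/4, a match moves the pair to 1/2 and 1/4, exactly as described.

```cpp
#include <array>
#include <cstdint>

// Per-segment bisection state for on-the-fly calibration.
struct SegmentCalibration {
  unsigned safeNs = 0;     // fastest delay known to give the same result as the safe read
  unsigned tooFastNs = 0;  // fastest delay known (or assumed) to fail

  // Delay for the extra "probe" peek done next to the normal safe peek.
  unsigned nextProbe() const { return tooFastNs + (safeNs - tooFastNs) / 2; }

  void report(unsigned probeNs, bool sameAsSafeRead) {
    if (sameAsSafeRead) safeNs = probeNs; else tooFastNs = probeNs;
  }

  // resolutionNs must be >= 1, otherwise the search never finishes.
  bool done(unsigned resolutionNs) const { return safeNs - tooFastNs <= resolutionNs; }
};

constexpr unsigned kSegmentBytes = 64;
constexpr unsigned kSegments = 4096 / kSegmentBytes;

std::array<SegmentCalibration, kSegments> makeCalibrationTable(unsigned safeNs) {
  std::array<SegmentCalibration, kSegments> table{};
  for (auto& s : table) s.safeNs = safeNs;
  return table;
}

inline unsigned segmentOf(std::uint16_t address) {
  return (address & 0x0FFF) / kSegmentBytes;
}
```

Each real read is still done at `safeNs` for its segment, so the emulation stays correct while a segment is being calibrated; only the extra probe peek costs time.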

 

* Have we disabled Stella's frame layout auto detection? This runs the emulation for a few frames (60) to find out if a game is NTSC or PAL. Afterwards a real cart, which doesn't do a full setup at start (e.g. relying on being in a certain bank) might fail.

Edited by Thomas Jentzsch

4 hours ago, Thomas Jentzsch said:

SC-RAM needs more than one 6502 cycle to set the data bus

This is new to me. Where did you get this info? As far as I can tell, the SARA RAM provides valid output 400 ns after the address bus has stabilized.
I have come across this statement several times now, but I doubt that the SARA RAM is really that slow. Has anybody measured the RAM response time in Superchip cartridges?
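For scale, a rough calculation assuming the NTSC CPU clock (3.579545 MHz colour clock divided by 3, i.e. 1.193182 MHz): one cycle is about 838 ns, so a 400 ns access time is roughly 48% of a cycle, and the two data reads of `lda abs, lda abs` are about 3.35 µs apart. The comparison is optimistic, because it ignores how far into the cycle the address becomes valid and the data setup time the CPU needs before the cycle ends; no specific 6507 datasheet values are assumed here.

```cpp
// NTSC 2600 CPU clock: 3.579545 MHz / 3. (PAL: about 1.182298 MHz.)
constexpr double kCpuHzNtsc = 1193182.0;

// Length of one CPU cycle in nanoseconds.
constexpr double cycleNs(double cpuHz) { return 1e9 / cpuHz; }

// Share of one CPU cycle taken by a memory access time (optimistic, see above).
constexpr double fractionOfCycle(double accessNs, double cpuHz) {
  return accessNs / cycleNs(cpuHz);
}
```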

