StellaRT

Community-Built Unnamed 1970's Video Game Console-Compatible System (WIP)


Al_Nafuur


OK, I have been able to update CartPort.cxx and CartPort.hxx with the new code, and it compiles OK.

 

First Impressions

- With the DigDug cart the emulation started OK and was stable, although it was slowed down from real time.

- I noticed the source file has a nop delay set for the Cycle Manager, which is 700 by default.

- Tried PlusCart. It was also stable and working, but slowed down from real time.

- Set the nop delay to 400. Recompiled and this allowed DigDug to run at real time.

- Tried PlusCart. It ran OK with the nop delay set at 400. Also tried 200 and 300. It was able to work at 300, but not 200.

- Overall it seems to have similar behaviour to the static nop delay version. 

- This is a good start at setting up the timer and getting the technical aspects of the timer idea working with Stella.

 

Some further ideas

- It seems to me, Stella is needing to complete emulation processing tasks serially with each instruction cluster. 

- Is there a way to execute the Stella emulation processing jobs in another CPU core that is triggered after an instruction I/O cluster? The idea would be, immediately after that parallel process begins, the Cartport driver then starts the next instruction cluster, and hopefully finishes the parallel processing before the next instruction cluster ends. 

 

 


8 hours ago, MarcoJ said:

OK, I have been able to update CartPort.cxx and CartPort.hxx with the new code, and it compiles OK.

 

First Impressions

- With the DigDug cart the emulation started OK and was stable, although it was slowed down from real time.

- I noticed the source file has a nop delay set for the Cycle Manager, which is 700 by default.

- Tried PlusCart. It was also stable and working, but slowed down from real time.

- Set the nop delay to 400. Recompiled and this allowed DigDug to run at real time.

- Tried PlusCart. It ran OK with the nop delay set at 400. Also tried 200 and 300. It was able to work at 300, but not 200.

Did it run stable and real time on the PlusCart at 400?

 

8 hours ago, MarcoJ said:

- Overall it seems to have similar behaviour to the static nop delay version.

Yes similar, but it should be real time earlier (with a higher delay), because now Stella emulation is "included" in the write cycle delays.

 


16 minutes ago, Al_Nafuur said:

Did it run stable and real time on the PlusCart at 400?

Good question. I have used the PlusCart running the Kovi Kovi title screen in the past to gauge whether errors are occurring. At 400, within 10 minutes there was some corruption on the screen. It was more or less real time if the scheduler is disabled. At 700, although slow, it has lasted for 7 hours without corruption.
 

16 minutes ago, Al_Nafuur said:

because now Stella emulation is "included" in the write cycle delays

This is a step forward.

 

Is the isolcpus command important here? I had just used the performance CPU governor.


14 minutes ago, MarcoJ said:

Good question. I have used the PlusCart running the Kovi Kovi title screen in the past to gauge whether errors are occurring. At 400, within 10 minutes there was some corruption on the screen. It was more or less real time if the scheduler is disabled. At 700, although slow, it has lasted for 7 hours without corruption.

Even at 700 a simple 4K ROM has occasional glitches on my system. My tests showed that most reads are stable with 250, but a few need much more. It looks like the CPU sometimes gets through the loop much faster (cached?).

🤔 Maybe not only the "nop"s need to be volatile, but also the whole function (thread?).🤷‍♂️
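One way to make the delay independent of how fast the CPU happens to run a nop loop is to spin against a monotonic clock instead of counting iterations. The following is only a minimal sketch, assuming Linux and a GCC/Clang-style `asm volatile`; `nowNs` and `spinFor` are hypothetical helper names and are not taken from CartPort.cxx.

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds.
static inline std::uint64_t nowNs() {
  timespec ts{};
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull
       + static_cast<std::uint64_t>(ts.tv_nsec);
}

// Hypothetical helper: busy-wait until at least durationNs nanoseconds
// have elapsed, and return the time that actually elapsed.
static inline std::uint64_t spinFor(std::uint64_t durationNs) {
  const std::uint64_t start = nowNs();
  std::uint64_t now = start;
  while (now - start < durationNs) {
    asm volatile("nop");  // volatile: the compiler may not drop or merge this
    now = nowNs();
  }
  return now - start;
}
```

The cost of `clock_gettime` itself sets a floor on the shortest delay this can produce, so it may be too coarse for the tightest bus timings; that would need measuring on the Pi.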

 

 

14 minutes ago, MarcoJ said:

Is the isolcpus command important here?

The idea is to use a single CPU without interference from the OS scheduler. So our cycle timer thread is using cpu2, RTstella is using the last cpu (cpu3) and the OS uses the rest (cpu0, cpu1)

 

To isolate the cpus from the OS scheduler, you need to add them to "/boot/cmdline.txt":

isolcpus=domain,managed_irq,2,3
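Isolating the cores only keeps the scheduler from placing other work on them; each thread still has to be put on its core explicitly. A minimal sketch of that pinning step, assuming Linux with glibc (the `_np` calls are non-portable extensions); `pinCurrentThreadToCpu` and `firstAllowedCpu` are hypothetical helper names, not functions from the RTStella source.

```cpp
#include <pthread.h>
#include <sched.h>

// Hypothetical helper: restrict the calling thread to a single CPU core.
// Returns true on success.
static bool pinCurrentThreadToCpu(int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Hypothetical helper: lowest-numbered CPU the calling thread is currently
// allowed to run on, or -1 if the affinity mask cannot be read.
static int firstAllowedCpu() {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
    return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    if (CPU_ISSET(cpu, &set))
      return cpu;
  return -1;
}
```

With the layout described above, the cycle timer thread would call `pinCurrentThreadToCpu(2)` and the emulation thread `pinCurrentThreadToCpu(3)` at startup.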

 

 

22 minutes ago, MarcoJ said:

I had just used performance cpu governor.

The performance governor only sets the cpu frequency for all cpus.

 


8 hours ago, MarcoJ said:

At 700, although slow, has lasted for 7 hours without corruptions. 

15 hours now, no corruptions. 

 

8 hours ago, Al_Nafuur said:

Maybe not only the "nop"s need to be volatile, but also the whole function (thread?).

perhaps. I'll do some testing in a few hours.


10 hours ago, MarcoJ said:

15 hours now, no corruptions. 

made 24 hours without corruptions and I stopped it.

 

 

18 hours ago, Al_Nafuur said:

The idea is to use a single CPU without interference from the OS scheduler. So our cycle timer thread is using cpu2, RTstella is using the last cpu (cpu3) and the OS uses the rest (cpu0, cpu1)

OK, I have added this to the bootup. For now, it didn't seem to change the emulation performance.

 

18 hours ago, Al_Nafuur said:

Maybe not only the "nop"s need to be volatile, but also the whole function (thread?)

Tried to define the Timer functions as volatile. I ran into compilation errors. Will try again tomorrow night.

19 hours ago, Al_Nafuur said:

Stella emulation is "included" in the write cycle delays

I'm trying to understand where Stella injects its emulation processing.

 

I understand the M6502::_execute loop seems to be the heart of where instruction by instruction execution happens. This loop can be exited if Stella tells it to execute a limited number of instructions. Inside that loop there doesn't appear to be any emulation processing happening, just peeks and pokes for one instruction (and its arguments), and this gets repeated however many times Stella has programmed the instruction function to run.

 

I can also see that M6502::execute (container function) does contain the TIA and M6532 updateEmulation() commands, which means that every time Stella indicates a number of instructions to process the emulation gets updated once for those indicated instructions. 


void M6502::execute(uInt64 cycles, DispatchResult& result)
{
  _execute(cycles, result);

#ifdef DEBUGGER_SUPPORT
  // Debugger hack: this ensures that stepping a "STA WSYNC" will actually end at the
  // beginning of the next line (otherwise, the next instruction would be stepped in order for
  // the halt to take effect). This is safe because as we know that the next cycle will be a read
  // cycle anyway.
  handleHalt();
#endif

  // Make sure that the hardware state matches the current system clock. This is necessary
  // to maintain a consistent state for the debugger after stepping and to make sure
  // that audio samples are generated for the whole timeslice.
  mySystem->tia().updateEmulation();
  mySystem->m6532().updateEmulation();

}

 

Do you know more about how many instructions executes at once before updating emulation?


1 hour ago, MarcoJ said:

made 24 hours without corruptions and I stopped it.

👍 Awesome.

 

1 hour ago, MarcoJ said:

OK, I have added this to the bootup. For now, it didn't seem to change the emulation performance.

AFAIK it will not improve (much) the emulation performance, it is to prevent or reduce the interrupts from the scheduler, the OS and IRQs.

 

1 hour ago, MarcoJ said:

Tried to define the Timer functions as volatile. I ran into compilation errors. Will try again tomorrow night.

👍

 

1 hour ago, MarcoJ said:

I'm trying to understand where Stella injects its emulation processing.

The emulation is continued, because the write cycle is returning right after it has set the address/data bus and started the timer. Waiting for the end of the cycle is done before the next peek/poke.
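The deferred wait described here can be sketched roughly as follows. This is a simplified stand-in, not the CartPort.cxx code: it replaces the separate cycle timer thread with a deadline on CLOCK_MONOTONIC, and the class name `DeferredCycleBus` and all of its members are hypothetical.

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical sketch: a bus whose write returns immediately and whose
// cycle end is awaited lazily, just before the next access.
class DeferredCycleBus {
 public:
  explicit DeferredCycleBus(std::uint64_t cycleNs) : myCycleNs(cycleNs) {}

  void poke(std::uint16_t address, std::uint8_t value) {
    waitForCycleEnd();                   // finish the previous cycle first
    myLastAddress = address;             // stand-in for driving the bus GPIOs
    myLastValue = value;
    myDeadlineNs = nowNs() + myCycleNs;  // "start the timer" and return at once
  }

  // Spin until the cycle started by the last access has ended.
  void waitForCycleEnd() const {
    while (nowNs() < myDeadlineNs) { /* busy-wait */ }
  }

  std::uint64_t deadlineNs() const { return myDeadlineNs; }

  static std::uint64_t nowNs() {
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull
         + static_cast<std::uint64_t>(ts.tv_nsec);
  }

 private:
  std::uint64_t myCycleNs;
  std::uint64_t myDeadlineNs = 0;
  std::uint16_t myLastAddress = 0;
  std::uint8_t myLastValue = 0;
};
```

Whatever the emulator does between two accesses then counts against the cycle time instead of being added on top of it, which is why real time should be reachable with a higher delay value than before.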

 

1 hour ago, MarcoJ said:

I understand the M6502::_execute loop seems to be the heart of where instruction by instruction execution happens. This loop can be exited if Stella tells it to execute a limited number of instructions. Inside that loop there doesn't appear to be any emulation processing happening, just peeks and pokes for one instruction (and its arguments), and this gets repeated however many times Stella has programmed the instruction function to run.

 

I can also see that M6502::execute (container function) does contain the TIA and M6532 updateEmulation() commands, which means that every time Stella indicates a number of instructions to process the emulation gets updated once for those indicated instructions. 


void M6502::execute(uInt64 cycles, DispatchResult& result)
{
  _execute(cycles, result);

#ifdef DEBUGGER_SUPPORT
  // Debugger hack: this ensures that stepping a "STA WSYNC" will actually end at the
  // beginning of the next line (otherwise, the next instruction would be stepped in order for
  // the halt to take effect). This is safe because as we know that the next cycle will be a read
  // cycle anyway.
  handleHalt();
#endif

  // Make sure that the hardware state matches the current system clock. This is necessary
  // to maintain a consistent state for the debugger after stepping and to make sure
  // that audio samples are generated for the whole timeslice.
  mySystem->tia().updateEmulation();
  mySystem->m6532().updateEmulation();

}

 

Do you know more about how many instructions executes at once before updating emulation?

Sorry I don't know more, but I am afraid I will have to look into it


void M6502::execute(uInt64 cycles, DispatchResult& result)
{
  _execute(cycles, result);

#ifdef DEBUGGER_SUPPORT
  // Debugger hack: this ensures that stepping a "STA WSYNC" will actually end at the
  // beginning of the next line (otherwise, the next instruction would be stepped in order for
  // the halt to take effect). This is safe because as we know that the next cycle will be a read
  // cycle anyway.
  handleHalt();
#endif

  // Make sure that the hardware state matches the current system clock. This is necessary
  // to maintain a consistent state for the debugger after stepping and to make sure
  // that audio samples are generated for the whole timeslice.
//  mySystem->tia().updateEmulation();
//  mySystem->m6532().updateEmulation();

}

I tried commenting out the updateEmulation() calls above. They only seem to matter in the debugger. With this change, during normal operation something else updates the emulation and the screen is drawn as normal; in the debugger, the screen doesn't get updated.


bool TIA::poke(uInt16 address, uInt8 value)
{
 // updateEmulation();

  address &= 0x3F;

  switch (address)
  {
    case WSYNC:
      mySystem->m6502().requestHalt();
      break;

    case RSYNC:
      flushLineCache();
      applyRsync();

This is in TIA/TIA.cxx, and it does a lot of work. If the updateEmulation() call at the top of TIA::poke is commented out (as shown above), the screen just flickers a few vertical lines. So a lot of the emulation happens as the TIA is poked from CartPort.cxx, which is what you were after. Interestingly, there is not much improvement in speed with this line commented out.
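The pattern at work here looks like lazy "catch-up" emulation: the chip only advances its state when something needs it to be current. A minimal sketch under that assumption; `LazyDevice` and its members are hypothetical and far simpler than Stella's TIA class.

```cpp
#include <cstdint>

// Hypothetical device that is emulated lazily: it advances its internal
// state only when asked to catch up to the system clock.
class LazyDevice {
 public:
  // Advance internal state up to the given system clock (in CPU cycles).
  void updateEmulation(std::uint64_t systemCycles) {
    if (systemCycles <= myLastCycle) return;   // already up to date
    myTicksRun += systemCycles - myLastCycle;  // stand-in for per-cycle work
    myLastCycle = systemCycles;
  }

  // A register write first catches up, so that the new value takes effect
  // at the correct point in emulated time.
  void poke(std::uint64_t systemCycles, std::uint8_t value) {
    updateEmulation(systemCycles);
    myRegister = value;
  }

  std::uint64_t ticksRun() const { return myTicksRun; }
  std::uint8_t registerValue() const { return myRegister; }

 private:
  std::uint64_t myLastCycle = 0;
  std::uint64_t myTicksRun = 0;
  std::uint8_t myRegister = 0;
};
```

If the catch-up inside `poke` is skipped, register writes land at the wrong point in emulated time, which may be why the picture collapses when the call is commented out.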


On 10/15/2023 at 8:49 AM, Al_Nafuur said:

Just pushed a fix which changed the CPU for the cycle timer thread.

Have had the threadtimer enabled RTStella running on Pitfall 2 under harmony for 3 days now, with the 700 nop speed. All good, no crashes. 

 

I'm optimistic there's a way to increase emulation performance. Any news on that front?


I found this loop in dispatchEmulation. It calls the TIA update function, which in turn executes code. I'm wondering if this is the core loop that keeps the emulation ticking when in run mode.

 


// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
void EmulationWorker::dispatchEmulation(std::unique_lock<std::mutex>& lock)
{
  // Technically, we could do without State::running, but it is cleaner and might be useful in the future
  myState = State::running;

  uInt64 totalCycles = 0;

  do {
    myTia->update(*myDispatchResult, totalCycles > 0 ? myMinCycles - totalCycles : myMaxCycles);
    totalCycles += myDispatchResult->getCycles();
  } while (totalCycles < myMinCycles && myDispatchResult->getStatus() == DispatchResult::Status::ok);
 

The TIA::update function:


// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
void TIA::update(DispatchResult& result, uInt64 maxCycles)
{
  mySystem->m6502().execute(maxCycles, result);

  updateEmulation();
}
 

 

 


Actually...I think it is in RTEmulationWorker instead. I am guessing the EmulationWorker.cxx is superseded by the RT version. 

 


void RTEmulationWorker::dispatchEmulation()
{
  struct timespec timeOffset;
  struct timespec timesliceStart;
  double virtualTimeNanoseconds = 0;

  clock_gettime(CLOCK_MONOTONIC, &timeOffset);

  myDispatchResult->setOk(0);

  myState = State::running;
  myPendingSignal = Signal::none;

  while (myState == State::running) {
    clock_gettime(CLOCK_MONOTONIC, &timesliceStart);

    uInt64 realTimeNanoseconds = timeDifferenceNanoseconds(timesliceStart, timeOffset);
    double deltaNanoseconds = realTimeNanoseconds - virtualTimeNanoseconds;

    // reset virtual clock if emulation lags
    if (deltaNanoseconds > MAX_LAG_NANOSECONDS) {
      virtualTimeNanoseconds = realTimeNanoseconds;
      deltaNanoseconds = 0.;
    }

    const Int64 cycleGoal = (deltaNanoseconds * myCyclesPerSecond) / 1000000000;
    Int64 cyclesTotal = 0;

    while (cycleGoal > cyclesTotal && myDispatchResult->isSuccess()) {
      myTia->update(*myDispatchResult, cycleGoal - cyclesTotal);
      cyclesTotal += myDispatchResult->getCycles();
    }

 

I believe the inner while loop (the one that calls myTia->update) is the core loop. Commenting out the myTia line crashes the program.
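To get a feel for the numbers in that loop, the cycle-goal line can be pulled out on its own. A small sketch; `cycleGoal` is a hypothetical stand-alone function, and the 1,193,182 Hz used in the note below is the nominal NTSC 2600 CPU clock (3.579545 MHz / 3), assumed here rather than read from the source.

```cpp
#include <cstdint>

// Hypothetical stand-alone version of the cycle-goal calculation:
// how many emulated CPU cycles fit into deltaNanoseconds of real time.
static std::int64_t cycleGoal(double deltaNanoseconds, double cyclesPerSecond) {
  return static_cast<std::int64_t>(
      (deltaNanoseconds * cyclesPerSecond) / 1000000000.0);
}
```

With the assumed NTSC clock, `cycleGoal(1e6, 1193182.0)` gives 1193, so falling 1 ms behind real time means the inner loop has to run about 1193 cycles before it has caught up. If each cycle takes longer on the bus than it should, the lag keeps growing until the MAX_LAG_NANOSECONDS reset kicks in.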


6 hours ago, MarcoJ said:

Have had the threadtimer enabled RTStella running on Pitfall 2 under harmony for 3 days now, with the 700 nop speed. All good, no crashes. 

great! But 700 is not real time even on the Pi4, or is it?

 

6 hours ago, MarcoJ said:

I'm optimistic there's a way to increase emulation performance. Any news on that front?

I suspect my hardware (Pi3B+) is defective. It crashes even when emulating a ROM from the SD card. Very rarely the desktop crashes too. I tried a fresh install of 32-bit PiOS on a new SD card, but Stella is still not working.

 

I have ordered a new Pi4 (because they are actually cheaper than the 3B+). It arrived today, but I am still waiting for my micro-HDMI cable.

 

 


3 hours ago, Al_Nafuur said:

But 700 is not real time even on the Pi4, or is it?

no, 700 is still running slow. 

 

3 hours ago, Al_Nafuur said:

I suspect my hardware (Pi3B+) is defective

How long could you get Stella running continuously on your rig before the timer change? There is also the possibility that cable lengths and/or voltage instability are causing rare glitches. I used a 100 nanofarad capacitor across the LVC245 power inputs.


20 minutes ago, MarcoJ said:

How long could you get Stella running continuously on your rig before the timer change?

I can't get the CartPort to run for more than a few seconds. No matter what timer method I use, Stella freezes and I have to kill the process manually with "kill -9". ROMs loaded from the SD-Card also freeze Stella after about a minute.

 

25 minutes ago, MarcoJ said:

There is also the possibility that cable lengths and/or voltage instability are causing rare glitches.

I haven't changed the cabling, but maybe I should  check/switch my power supply.

 

30 minutes ago, MarcoJ said:

I used a 100 nanofarad capacitor across the LVC245 power inputs.

👍

Good Idea. I will check here too.

 


// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
void TIA::update(DispatchResult& result, uInt64 maxCycles)
{
  mySystem->m6502().execute(maxCycles, result);

  updateEmulation();
}

 

Not sure if I am correct, but I think that TIA::update gets continually executed. mySystem->m6502().execute gets the next 6502 instruction with operands. The delays for the CartridgePort driver are included in the execute command, such that it may spend a few microseconds in the m6502().execute function. Following that, updateEmulation is run, which I believe updates the TIA, sound, etc.

 

I theorize that updateEmulation() is taking quite long to process, such that the emulation slows down from real time. The way I imagine it is below:

 

[Image: diagram of the serial instruction / update cycle]

Instead of fetching 6502 instructions continuously, it does a cycle of instruction / update / instruction / update.

 

 

I wonder, is it possible to use the CPU affinity rules to get updateEmulation() to run on another CPU core, allowing it to process while the next instruction/opcodes are fetched?

 

 

[Image: diagram of updateEmulation() running in parallel on another CPU core]

 

// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
void TIA::update(DispatchResult& result, uInt64 maxCycles)
{
  mySystem->m6502().execute(maxCycles, result);

  // Idea (not implemented): use CPU_ZERO / CPU_SET / sched_setaffinity so that the
  // next line runs in parallel on another core, and let this function return
  // immediately after starting it.
  updateEmulation();
}
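Note that sched_setaffinity on its own only restricts where an existing thread may run; it does not start anything in parallel. To overlap the two phases, the update has to be handed to a second thread. A minimal sketch of that hand-off, using std::async for brevity; `OverlappedUpdater` and its members are hypothetical, and the sketch assumes the two phases do not touch shared state, which is not true of Stella as it stands (TIA::poke itself calls updateEmulation), so real code would need synchronization on top of this.

```cpp
#include <functional>
#include <future>
#include <utility>

// Hypothetical wrapper: run `execute` on the calling thread while the
// previous `update` is still being processed on another thread.
class OverlappedUpdater {
 public:
  OverlappedUpdater(std::function<void()> execute, std::function<void()> update)
    : myExecute(std::move(execute)), myUpdate(std::move(update)) {}

  ~OverlappedUpdater() { finish(); }

  void step() {
    myExecute();  // fetch/execute phase, overlaps with the previous update
    finish();     // the previous update must be done before the next starts
    myPending = std::async(std::launch::async, myUpdate);  // returns at once
  }

  // Block until the update started by the last step() has completed.
  void finish() {
    if (myPending.valid()) myPending.get();
  }

 private:
  std::function<void()> myExecute;
  std::function<void()> myUpdate;
  std::future<void> myPending;
};
```

Starting a thread per step, as std::async may do, is likely far too slow at this granularity; a persistent worker pinned to its own core would be the more realistic variant, at the cost of more code.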

On 10/20/2023 at 11:05 PM, MarcoJ said:

There is also the possibility that cable lengths and/or voltage instability are causing rare glitches.

I have my Pi4 up and running. Decathlon is running with glitches in debug mode (cartridge and the PlusCart). @alex_79 mentioned on @Andrew Davie's new forum that the Decathlon cartridge is finicky with an extension cable and that he had issues dumping the cartridge. So I tried running Decathlon from my breadboard with an extension to a real 2600, and it did indeed have glitches similar to those with rtstella and the Pi4.

 

Maybe a PCB and shorter traces will solve our issues with the FE banking.

 

I'd like to proceed with a (test) PCB setup. But I am a little bit stuck in analysis paralysis, because if we want to have "full" compatibility with a real 2600 we most likely have to use GPIOs to "emulate" the joystick ports too. The Raspberry Pi3/4 CPUs have 2x 32 GPIOs, but only the first 32 are on the pin header. So we either need a port expander like the MCP23S17 or one of the Raspberry Pi Compute Modules (which seem to have more GPIOs on the edge connector).

 

 


7 hours ago, Al_Nafuur said:

I have my Pi4 up and running. Decathlon is running with glitches in debug mode (cartridge and the PlusCart). @alex_79 mentioned on @Andrew Davie's new forum that the Decathlon cartridge is finicky with an extension cable and that he had issues dumping the cartridge. So I tried running Decathlon from my breadboard with an extension to a real 2600, and it did indeed have glitches similar to those with rtstella and the Pi4.

 

Maybe a PCB and shorter traces will solve our issues with the FE banking.

I also tested the SuperCharger with the extension cable to a real 2600 and it worked, so shorter traces will most likely not fix the SC not working on the Pi


9 hours ago, Al_Nafuur said:

I have my Pi4 up and running.

Hey, excellent. Have you noticed if the emulation is lasting longer without glitches as compared to your Pi3 setup? I am guessing you're using the same breadboard setup. 

 

9 hours ago, Al_Nafuur said:

Maybe a PCB and shorter traces will solve our issues with the FE banking.

I have tried this out, I could only ever get it working for a few hundred milliseconds on PlusCart. A real cart didn't work. How long did it run on your rig?

 

9 hours ago, Al_Nafuur said:

because if we want to have "full" compatibility with a real 2600 we most likely have to use GPIOs to "emulate" the joystick ports too

I looked into this last month. The I/O expander chips run at a maximum clock of 10 MHz, so once the bit banging is done the effective rate would be below 1 MHz, short of our target. Also, the SPI and I2C ports are not available, as the cartridge bus uses parts of them.
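A rough timing budget makes the mismatch concrete. The sketch below is only arithmetic under stated assumptions: a three-byte (24-bit) SPI transaction per register access, which is typical for SPI port expanders but assumed here, and the nominal NTSC 2600 CPU clock of 1,193,182 Hz. `transferNs` and `cycleNs` are hypothetical helper names.

```cpp
#include <cstdint>

// Time in nanoseconds needed to clock `bits` bits at `clockHz`.
constexpr double transferNs(std::uint32_t bits, double clockHz) {
  return bits * 1e9 / clockHz;
}

// Duration of one clock cycle at `clockHz`, in nanoseconds.
constexpr double cycleNs(double clockHz) {
  return 1e9 / clockHz;
}
```

Under these assumptions, `transferNs(24, 10e6)` is 2400 ns, while one 6502 cycle, `cycleNs(1193182.0)`, is roughly 838 ns. Even at the full 10 MHz a single expander access would span almost three CPU cycles, and a bit-banged bus running below 1 MHz would be more than ten times slower again.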

 

9 hours ago, Al_Nafuur said:

or one of the Raspberry Pi Compute Modules (which seem to have more GPIOs on the edge connector)

Hmm, this is a good idea! It should work well to bit bang the ports in parallel, perfectly in sync with the execution. Such a design would need at the very least a breakout board of some type. Such as this:

https://www.raspberrypi.com/products/compute-module-io-board-v3/ This would have enough I/O to interface to bidirectional joystick ports. This does give some hope to extend and expand the project. Also, it gives the hobbyist options to build the basic cartport driver or the full emulated ports. 

 

In other news, I've had the "Back to the future" demo rom on Harmony I posted before running for about a week without crashing. The CartridgePort driver is very stable. 

9 hours ago, Al_Nafuur said:

I'd like to proceed with a (test) PCB setup

OK, are you hoping to have a direct cart connector on your PCB, or wires? If the latter, this design could be used:

[Image: proposed design for a wired cartridge connection]
