Community-Built Unnamed 1970's Video Game Console-Compatible System (WIP)

+MarcoJ · October 14, 2023

34 minutes ago, Al_Nafuur said:

Just pushed a fix which changed the CPU for the cycle timer thread. Now the emulation works, but again I have trouble getting a stable emulation on my Pi 3.

Awesome. Will try out in a few hrs.

+MarcoJ · October 15, 2023

OK, I have been able to update CartPort.cxx and CartPort.hxx with the new code, and it compiles OK.

First Impressions

- With the DigDug cart, the emulation started OK and was stable, although it was slowed down from real time.

- I noticed the source file has a nop delay set for the Cycle Manager, which is 700 by default.

- Tried PlusCart. It was also stable and working, but slowed down from real time.

- Set the nop delay to 400 and recompiled. This allowed DigDug to run in real time.

- Tried PlusCart. It ran OK with the nop delay set at 400. Also tried 200 and 300: it worked at 300, but not at 200.

- Overall it seems to have similar behaviour to the static nop delay version.

- This is a good start at setting up the timer and getting the technical aspects of the timer idea working with Stella.

Some further ideas

- It seems to me that Stella needs to complete its emulation processing serially after each instruction cluster.

- Is there a way to run Stella's emulation processing on another CPU core, triggered after each instruction I/O cluster? The idea would be that, immediately after that parallel process begins, the CartPort driver starts the next instruction cluster, and hopefully the parallel processing finishes before the next instruction cluster ends.

+Al_Nafuur · October 15, 2023

8 hours ago, MarcoJ said:

OK, I have been able to update CartPort.cxx and CartPort.hxx with the new code, and it compiles OK.

First Impressions

- With the DigDug cart, the emulation started OK and was stable, although it was slowed down from real time.

- I noticed the source file has a nop delay set for the Cycle Manager, which is 700 by default.

- Tried PlusCart. It was also stable and working, but slowed down from real time.

- Set the nop delay to 400 and recompiled. This allowed DigDug to run in real time.

- Tried PlusCart. It ran OK with the nop delay set at 400. Also tried 200 and 300: it worked at 300, but not at 200.

Did it run stable and real time on the PlusCart at 400?

8 hours ago, MarcoJ said:

- Overall it seems to have similar behaviour to the static nop delay version.

Yes, similar, but it should reach real time earlier (with a higher delay), because Stella's emulation is now "included" in the write cycle delays.

+MarcoJ · October 15, 2023

16 minutes ago, Al_Nafuur said:

Did it run stable and real time on the PlusCart at 400?

Good question. In the past, I have used the PlusCart running the Kovi Kovi title screen to gauge whether errors are occurring. At 400, some corruption appeared on the screen within 10 minutes. It was more or less real time if the scheduler was disabled. At 700, although slow, it has lasted for 7 hours without corruption.

16 minutes ago, Al_Nafuur said:

because now Stella emulation is "included" in the write cycle delays

This is a step forward.

Is the isolcpus setting important here? I had just used the performance CPU governor.

+Al_Nafuur · October 15, 2023

14 minutes ago, MarcoJ said:

Good question. In the past, I have used the PlusCart running the Kovi Kovi title screen to gauge whether errors are occurring. At 400, some corruption appeared on the screen within 10 minutes. It was more or less real time if the scheduler was disabled. At 700, although slow, it has lasted for 7 hours without corruption.

Even at 700, a simple 4K ROM has occasional glitches on my system. My tests showed that most reads are stable at 250, but a few need much more. It looks like the loop is sometimes executed much faster by the CPU (cached?).

🤔 Maybe not only the "nop"s need to be volatile, but also the whole function (thread?).🤷‍♂️
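On the volatile question, here is a rough sketch of the difference between counting nops and waiting for a deadline. The helper names are hypothetical, it assumes GCC/Clang inline asm, and it is not RTStella's actual delay code. `volatile` and `asm volatile` only stop the compiler from removing or merging the loop; they do not make each iteration take a fixed time, which might explain reads that sometimes finish early. A deadline-based wait on `std::chrono::steady_clock` does not depend on how fast the loop body runs, although whether clock reads are cheap enough on a Pi for sub-microsecond bus cycles is untested here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: spin for `nops` iterations.
// The volatile counter and asm volatile keep the compiler from deleting or
// collapsing the loop, but the CPU may still run it faster or slower
// (caches, frequency scaling), so the wall-clock length can vary.
inline void nopDelay(std::uint32_t nops) {
  for (volatile std::uint32_t i = 0; i < nops; i = i + 1) {
    __asm__ __volatile__("nop");
  }
}

// Hypothetical helper: spin until a fixed point in time instead of counting
// iterations, so the wait no longer depends on loop speed.
inline void spinUntil(std::chrono::steady_clock::time_point deadline) {
  while (std::chrono::steady_clock::now() < deadline) {
    // busy-wait
  }
}

inline void spinFor(std::chrono::nanoseconds d) {
  spinUntil(std::chrono::steady_clock::now() + d);
}
```

For example, `spinFor(std::chrono::nanoseconds(838))` would wait roughly one NTSC 6502 cycle regardless of how quickly the loop itself executes.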

14 minutes ago, MarcoJ said:

Is the isolcpus setting important here?

The idea is to have each thread use a single CPU without interference from the OS scheduler. So our cycle timer thread uses cpu2, RTStella uses the last CPU (cpu3), and the OS uses the rest (cpu0, cpu1).

To isolate the cpus from the OS scheduler, you need to add them to "/boot/cmdline.txt":

isolcpus=domain,managed_irq,2,3
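isolcpus only keeps the scheduler from placing other work on cpu2 and cpu3; the threads still have to be pinned there explicitly. A minimal sketch of pinning a `std::thread` on Linux/glibc (the helper name is hypothetical, not the function RTStella uses):

```cpp
#include <cassert>
#include <chrono>
#include <pthread.h>
#include <sched.h>
#include <thread>

// Hypothetical helper: restrict a std::thread to a single CPU core.
// Linux/glibc only (pthread_setaffinity_np is a GNU extension).
// Returns true if the kernel accepted the new affinity mask.
inline bool pinThreadToCpu(std::thread& t, int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);       // start with an empty mask
  CPU_SET(cpu, &set);   // allow exactly one core
  return pthread_setaffinity_np(t.native_handle(), sizeof(set), &set) == 0;
}
```

With the cmdline above, the cycle timer thread would be pinned with `pinThreadToCpu(timerThread, 2)` and the emulation thread with CPU 3, leaving cpu0 and cpu1 to the OS.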

22 minutes ago, MarcoJ said:

I had just used the performance CPU governor.

The performance governor only sets the CPU frequency for all CPUs.

+MarcoJ · October 16, 2023

8 hours ago, MarcoJ said:

At 700, although slow, has lasted for 7 hours without corruptions.

15 hours now, no corruptions.

8 hours ago, Al_Nafuur said:

Maybe not only the "nop"s need to be volatile, but also the whole function (thread?).

Perhaps. I'll do some testing in a few hours.

+MarcoJ · October 16, 2023

10 hours ago, MarcoJ said:

15 hours now, no corruptions.

It made it to 24 hours without corruption, and then I stopped it.

18 hours ago, Al_Nafuur said:

The idea is to have each thread use a single CPU without interference from the OS scheduler. So our cycle timer thread uses cpu2, RTStella uses the last CPU (cpu3), and the OS uses the rest (cpu0, cpu1).

OK, I have added this to the boot configuration. So far, it didn't seem to change the emulation performance.

18 hours ago, Al_Nafuur said:

Maybe not only the "nop"s need to be volatile, but also the whole function (thread?)

Tried to define the Timer functions as volatile. I ran into compilation errors. Will try again tomorrow night.

19 hours ago, Al_Nafuur said:

Stella emulation is "included" in the write cycle delays

I'm trying to understand where Stella injects its emulation processing.

As I understand it, the M6502::_execute loop is the heart of instruction-by-instruction execution. This loop can be exited when Stella tells it to execute only a limited number of instructions. Inside that loop there doesn't appear to be any emulation processing, just peeks and pokes for one instruction (and its arguments), repeated however many times Stella has asked for.

I can also see that M6502::execute (the container function) does contain the TIA and M6532 updateEmulation() calls, which means that every time Stella requests a batch of instructions, the emulation gets updated once for that batch.


void M6502::execute(uInt64 cycles, DispatchResult& result)
{
  _execute(cycles, result);

#ifdef DEBUGGER_SUPPORT
  // Debugger hack: this ensures that stepping a "STA WSYNC" will actually end at the
  // beginning of the next line (otherwise, the next instruction would be stepped in order for
  // the halt to take effect). This is safe because as we know that the next cycle will be a read
  // cycle anyway.
  handleHalt();
#endif

  // Make sure that the hardware state matches the current system clock. This is necessary
  // to maintain a consistent state for the debugger after stepping and to make sure
  // that audio samples are generated for the whole timeslice.
  mySystem->tia().updateEmulation();
  mySystem->m6532().updateEmulation();
}

Do you know more about how many instructions are executed at once before the emulation is updated?

+Al_Nafuur · October 16, 2023

1 hour ago, MarcoJ said:

It made it to 24 hours without corruption, and then I stopped it.

👍 Awesome.

1 hour ago, MarcoJ said:

OK, I have added this to the boot configuration. So far, it didn't seem to change the emulation performance.

AFAIK it will not improve the emulation performance (much); its purpose is to prevent or reduce interruptions from the scheduler, the OS and IRQs.

1 hour ago, MarcoJ said:

Tried to define the Timer functions as volatile. I ran into compilation errors. Will try again tomorrow night.

👍

1 hour ago, MarcoJ said:

I'm trying to understand where Stella injects its emulation processing.

The emulation continues because the write cycle returns right after it has set the address/data bus and started the timer. Waiting for the end of the cycle is done before the next peek/poke.
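A hypothetical sketch of that "return early, wait later" pattern (class and method names are made up, the GPIO accesses are stubbed out, and this is not the actual CartPort code): a poke drives the bus, arms a deadline, and returns at once, so the emulator keeps working during the bus cycle; the next peek or poke first spins until the previous cycle has ended.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical sketch, not the real CartPort driver.
class BusCycleTimer {
 public:
  using Clock = std::chrono::steady_clock;

  explicit BusCycleTimer(std::chrono::nanoseconds cycle) : myCycle(cycle) {}

  // Write: finish the previous cycle, drive address/data, arm the deadline,
  // then return immediately so emulation work overlaps the bus cycle.
  void poke(std::uint16_t address, std::uint8_t value) {
    waitForCycleEnd();
    myLastAddress = address;   // stub for the GPIO address writes
    myLastValue = value;       // stub for the GPIO data writes
    ++myWrites;
    myDeadline = Clock::now() + myCycle;
  }

  // Read: also waits for the outstanding cycle before touching the bus.
  std::uint8_t peek(std::uint16_t address) {
    waitForCycleEnd();
    myLastAddress = address;   // stub: drive address, wait, sample data pins
    myDeadline = Clock::now() + myCycle;
    return myLastValue;        // stub value
  }

  std::uint64_t writes() const { return myWrites; }

 private:
  void waitForCycleEnd() const {
    while (Clock::now() < myDeadline) { /* spin */ }
  }

  std::chrono::nanoseconds myCycle;
  Clock::time_point myDeadline{};
  std::uint16_t myLastAddress = 0;
  std::uint8_t myLastValue = 0;
  std::uint64_t myWrites = 0;
};
```

The design choice is that only the *next* bus access pays for the remaining cycle time, so whatever the emulator does between two accesses is effectively hidden inside the write cycle.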

1 hour ago, MarcoJ said:

As I understand it, the M6502::_execute loop is the heart of instruction-by-instruction execution. This loop can be exited when Stella tells it to execute only a limited number of instructions. Inside that loop there doesn't appear to be any emulation processing, just peeks and pokes for one instruction (and its arguments), repeated however many times Stella has asked for.

I can also see that M6502::execute (the container function) does contain the TIA and M6532 updateEmulation() calls, which means that every time Stella requests a batch of instructions, the emulation gets updated once for that batch.

Do you know more about how many instructions are executed at once before the emulation is updated?

Sorry, I don't know more, but I'm afraid I will have to look into it.

+MarcoJ · October 17, 2023


void M6502::execute(uInt64 cycles, DispatchResult& result)
{
  _execute(cycles, result);

#ifdef DEBUGGER_SUPPORT
  // Debugger hack: this ensures that stepping a "STA WSYNC" will actually end at the
  // beginning of the next line (otherwise, the next instruction would be stepped in order for
  // the halt to take effect). This is safe because as we know that the next cycle will be a read
  // cycle anyway.
  handleHalt();
#endif

  // Make sure that the hardware state matches the current system clock. This is necessary
  // to maintain a consistent state for the debugger after stepping and to make sure
  // that audio samples are generated for the whole timeslice.
  // mySystem->tia().updateEmulation();
  // mySystem->m6532().updateEmulation();
}

I tried commenting out the updateEmulation() calls above. They only seem to matter in the debugger: with this change, during normal operation something else updates the emulation and the screen is drawn as normal, but in the debugger the screen doesn't get updated.

+MarcoJ · October 17, 2023


bool TIA::poke(uInt16 address, uInt8 value)
{
  // updateEmulation();

  address &= 0x3F;

  switch (address)
  {
    case WSYNC:
      mySystem->m6502().requestHalt();
      break;

    case RSYNC:
      flushLineCache();
      applyRsync();

In TIA/TIA.cxx, this line does a lot of work: if the updateEmulation() call above is commented out, the screen just flickers a few vertical lines. So a lot of the emulation happens as the TIA is poked from CartPort.cxx, which is what you wanted. Interestingly, there is not much improvement in speed when this line is commented out.
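One way to read this is the generic "lazy catch-up" pattern, sketched below with made-up names (a simplified model, not Stella's TIA code): the video chip only advances when something asks it to, and a register write first runs it forward to the current CPU cycle so the write lands at the right beam position. In such a design, "updating clocks and cycles" is where the per-pixel work happens, which would be consistent with the flicker seen when the call is removed. The only hardware fact assumed is that the TIA runs 3 color clocks per CPU cycle (228 per 76-cycle scanline).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical catch-up model, loosely analogous to updateEmulation()
// being called at the top of TIA::poke. Not Stella's implementation.
class LazyVideoChip {
 public:
  static constexpr std::uint32_t kClocksPerCpuCycle = 3;  // TIA color clocks per CPU cycle

  // Run the chip from the last synchronised CPU cycle up to `cpuCycles`.
  void catchUp(std::uint64_t cpuCycles) {
    const std::uint64_t clocks = (cpuCycles - myLastCpuCycle) * kClocksPerCpuCycle;
    for (std::uint64_t i = 0; i < clocks; ++i) tick();
    myLastCpuCycle = cpuCycles;
  }

  // Without the catch-up, pixels that belong before this write would later
  // be drawn with the new register value.
  void poke(std::uint64_t cpuCycles, std::uint8_t reg, std::uint8_t value) {
    catchUp(cpuCycles);
    myRegs[reg & 0x3F] = value;
  }

  std::uint64_t colorClocks() const { return myColorClocks; }

 private:
  void tick() { ++myColorClocks; /* render one pixel, advance the beam ... */ }

  std::uint64_t myLastCpuCycle = 0;
  std::uint64_t myColorClocks = 0;
  std::uint8_t myRegs[64] = {};
};
```

For example, a poke at CPU cycle 10 first runs 30 color clocks, and catching up to cycle 76 completes one 228-clock scanline.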

+MarcoJ · October 17, 2023

14 minutes ago, MarcoJ said:

In TIA/TIA.cxx, this does a lot of work.

Actually, this is just updating clocks and cycles. There is something else updating the emulation. Could it be an interrupt routine?

+MarcoJ · October 20, 2023

On 10/15/2023 at 8:49 AM, Al_Nafuur said:

Just pushed a fix which changed the CPU for the cycle timer thread.

I have had RTStella with the thread timer enabled running Pitfall 2 on Harmony for 3 days now, with the nop delay at 700. All good, no crashes.

I'm optimistic there's a way to increase emulation performance. Any news on that front?

+MarcoJ · October 20, 2023

I found this loop in dispatchEmulation. It calls the TIA update function, which in turn executes code. I'm wondering if this is the core loop that keeps the emulation ticking when in run mode.


// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
void EmulationWorker::dispatchEmulation(std::unique_lock<std::mutex>& lock)
{
  // Technically, we could do without State::running, but it is cleaner and might be useful in the future
  myState = State::running;

  uInt64 totalCycles = 0;

  do {
    myTia->update(*myDispatchResult, totalCycles > 0 ? myMinCycles - totalCycles : myMaxCycles);
    totalCycles += myDispatchResult->getCycles();
  } while (totalCycles < myMinCycles && myDispatchResult->getStatus() == DispatchResult::Status::ok);

The TIA::update function:


// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
void TIA::update(DispatchResult& result, uInt64 maxCycles)
{
  mySystem->m6502().execute(maxCycles, result);

  updateEmulation();
}

+MarcoJ · October 20, 2023

58 minutes ago, MarcoJ said:

I'm wondering if this is the core loop that keeps the emulation ticking when in run mode.

Nope. Emulation still works even if I comment out the body of the function.

+MarcoJ · October 20, 2023

Actually... I think it is in RTEmulationWorker instead. I am guessing EmulationWorker.cxx is superseded by the RT version.


void RTEmulationWorker::dispatchEmulation()
{
  struct timespec timeOffset;
  struct timespec timesliceStart;
  double virtualTimeNanoseconds = 0;

  clock_gettime(CLOCK_MONOTONIC, &timeOffset);

  myDispatchResult->setOk(0);

  myState = State::running;
  myPendingSignal = Signal::none;

  while (myState == State::running) {
    clock_gettime(CLOCK_MONOTONIC, &timesliceStart);

    uInt64 realTimeNanoseconds = timeDifferenceNanoseconds(timesliceStart, timeOffset);
    double deltaNanoseconds = realTimeNanoseconds - virtualTimeNanoseconds;

    // reset virtual clock if emulation lags
    if (deltaNanoseconds > MAX_LAG_NANOSECONDS) {
      virtualTimeNanoseconds = realTimeNanoseconds;
      deltaNanoseconds = 0.;
    }

    const Int64 cycleGoal = (deltaNanoseconds * myCyclesPerSecond) / 1000000000;
    Int64 cyclesTotal = 0;

    while (cycleGoal > cyclesTotal && myDispatchResult->isSuccess()) {
      myTia->update(*myDispatchResult, cycleGoal - cyclesTotal);
      cyclesTotal += myDispatchResult->getCycles();
    }

The inner while loop that calls myTia->update() is, I believe, the core loop. Commenting out the myTia line crashes the program.
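To make the pacing arithmetic in that function concrete, here is a small restatement with hypothetical names. It assumes the NTSC 6502 clock of 3.579545 MHz / 3 (about 1.193 MHz), and it assumes virtual time advances by the cycles actually emulated, since that part of the function is not shown above. Each timeslice owes `delta * cyclesPerSecond / 1e9` cycles; if the backlog exceeds the lag limit, the virtual clock simply jumps to real time and the backlog is dropped. That may be why an overloaded setup shows up as "slower than real time" rather than as a crash.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative restatement, not Stella's code. Names are hypothetical.
constexpr double kNtscCpuHz = 3579545.0 / 3.0;  // ~1.193182 MHz 6502 clock (NTSC)

struct PacingState {
  double virtualTimeNs = 0;  // emulated time already accounted for
};

// Cycles the emulation owes for this timeslice. If it lags by more than
// maxLagNs, virtual time jumps to real time and the backlog is discarded.
inline std::int64_t cycleGoal(PacingState& s, std::uint64_t realTimeNs,
                              double maxLagNs, double cyclesPerSecond) {
  double deltaNs = static_cast<double>(realTimeNs) - s.virtualTimeNs;
  if (deltaNs > maxLagNs) {
    s.virtualTimeNs = static_cast<double>(realTimeNs);
    deltaNs = 0.;
  }
  return static_cast<std::int64_t>((deltaNs * cyclesPerSecond) / 1e9);
}

// Assumption: virtual time advances by the cycles actually emulated.
inline void advance(PacingState& s, std::int64_t cyclesRun, double cyclesPerSecond) {
  s.virtualTimeNs += static_cast<double>(cyclesRun) * 1e9 / cyclesPerSecond;
}
```

For example, 1 ms of real time with no backlog gives a goal of 1193 cycles.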

Thomas Jentzsch · October 20, 2023

57 minutes ago, MarcoJ said:

The inner while loop that calls myTia->update() is, I believe, the core loop. Commenting out the myTia line crashes the program.

Of course it does crash, because myDispatchResult is undefined.

+Al_Nafuur · October 20, 2023

6 hours ago, MarcoJ said:

I have had RTStella with the thread timer enabled running Pitfall 2 on Harmony for 3 days now, with the nop delay at 700. All good, no crashes.

Great! But 700 is not real time even on the Pi4, or is it?

6 hours ago, MarcoJ said:

I'm optimistic there's a way to increase emulation performance. Any news on that front?

I suspect my hardware (Pi3B+) is defective. Stella crashes even when emulating a ROM from the SD card. Very rarely, the desktop crashes too. I tried a fresh install of 32-bit PiOS on a new SD card, but Stella is still not working.

I have ordered a new Pi4 (because they are actually cheaper than the 3B+). It arrived today, but I am still waiting for my micro-HDMI cable.

+MarcoJ · October 20, 2023

5 hours ago, Thomas Jentzsch said:

Of course it does crash, because myDispatchResult is undefined.

OK. Is this the loop that runs everything: executing code, updating the TIA, everything stemming from here?

+MarcoJ · October 20, 2023

3 hours ago, Al_Nafuur said:

But 700 is not real time even on the Pi4, or is it?

No, at 700 it is still running slow.

3 hours ago, Al_Nafuur said:

I suspect my hardware (Pi3B+) is defective

How long could you get Stella running continuously on your rig before the timer? There is also the possibility that cable length and/or voltage instability is causing rare glitches. I used a 100 nanofarad capacitor across the LVC245 power inputs.

+Al_Nafuur · October 20, 2023

20 minutes ago, MarcoJ said:

How long could you get Stella running continuously on your rig before the timer?

I can't get the CartPort to run for more than a few seconds. No matter what timer method I use, Stella freezes and I have to kill the process manually with "kill -9". ROMs loaded from the SD-Card also freeze Stella after about a minute.

25 minutes ago, MarcoJ said:

There is also the possibility that cable length and/or voltage instability is causing rare glitches.

I haven't changed the cabling, but maybe I should check/switch my power supply.

30 minutes ago, MarcoJ said:

I used a 100 nanofarad capacitor across the LVC245 power inputs.

👍

Good idea. I will check here too.

Thomas Jentzsch · October 21, 2023

14 hours ago, MarcoJ said:

OK. Is this the loop that runs everything: executing code, updating the TIA, everything stemming from here?

Yup.

+MarcoJ · October 24, 2023

// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
void TIA::update(DispatchResult& result, uInt64 maxCycles)
{
  mySystem->m6502().execute(maxCycles, result);

  updateEmulation();
}

Not sure if I am correct, but I think TIA::update gets executed continually. mySystem->m6502().execute gets the next 6502 instruction with its operands. The delays for the CartridgePort driver are included in the execute call, so it may spend a few microseconds in m6502().execute. After that, updateEmulation() runs, which I believe updates the TIA, sound, etc.

I theorize that updateEmulation() takes quite long to process, so that the emulation slows down from real time. The way I imagine it is below:

Instead of continuously fetching 6502 instructions, it runs a cycle of instruction / update / instruction / update.

I wonder, is it possible to use CPU affinity to get updateEmulation() to run on another CPU core, so it can process while the next instructions/opcodes are fetched?

// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
void TIA::update(DispatchResult& result, uInt64 maxCycles)
{
  mySystem->m6502().execute(maxCycles, result);

  // Idea (pseudocode): use CPU_ZERO / CPU_SET / sched_setaffinity so the next line runs
  // in parallel on another core, and return immediately after starting it.
  updateEmulation();
}
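One note on the idea above: CPU affinity only chooses *where* a thread runs, so running updateEmulation() in parallel would first need a second thread that the work can be handed to. Below is a hypothetical sketch of such a hand-off (class and method names are made up; it is not Stella code). The big caveat is that a job must not touch state the CPU thread is still changing; in Stella the TIA and 6502 state are shared, so a real split would probably need snapshots or a redesign. The worker's `thread()` could then be pinned with `pthread_setaffinity_np` as in the isolcpus discussion.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical worker: runs submitted jobs in order on its own thread, so the
// caller (e.g. the bus-driving loop) can return immediately after submit().
class OffloadWorker {
 public:
  OffloadWorker() : myThread([this] { run(); }) {}
  ~OffloadWorker() {
    {
      std::lock_guard<std::mutex> lock(myMutex);
      myStop = true;
    }
    myCv.notify_one();
    myThread.join();  // remaining jobs are finished before the thread exits
  }

  void submit(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(myMutex);
      myJobs.push(std::move(job));
    }
    myCv.notify_one();
  }

  // Block until every submitted job has finished.
  void drain() {
    std::unique_lock<std::mutex> lock(myMutex);
    myIdle.wait(lock, [this] { return myJobs.empty() && !myBusy; });
  }

  std::thread& thread() { return myThread; }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(myMutex);
    for (;;) {
      myCv.wait(lock, [this] { return myStop || !myJobs.empty(); });
      if (myJobs.empty()) return;  // stop requested and nothing left to do
      auto job = std::move(myJobs.front());
      myJobs.pop();
      myBusy = true;
      lock.unlock();
      job();  // runs outside the lock
      lock.lock();
      myBusy = false;
      myIdle.notify_all();
    }
  }

  std::mutex myMutex;
  std::condition_variable myCv, myIdle;
  std::queue<std::function<void()>> myJobs;
  bool myBusy = false;
  bool myStop = false;
  std::thread myThread;  // declared last so it starts after the members above
};
```

In this shape, TIA::update would call `worker.submit(...)` instead of updateEmulation() and return, and something would have to call `drain()` before the emulated state is read again.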

+Al_Nafuur · October 30, 2023

On 10/20/2023 at 11:05 PM, MarcoJ said:

There is also the possibility that cable length and/or voltage instability is causing rare glitches.

I have my Pi4 up and running. Decathlon runs with glitches in debug mode (both the cartridge and the PlusCart). @alex_79 mentioned on @Andrew Davie's new forum that the Decathlon cartridge is finicky with an extension cable and that he had issues dumping the cartridge. So I tried running Decathlon from my breadboard with an extension to a real 2600, and it indeed had glitches similar to those with RTStella and the Pi4.

Maybe a PCB and shorter traces will solve our issues with the FE banking.

I'd like to proceed with a (test) PCB setup. But I am a little stuck in analysis paralysis, because if we want "full" compatibility with a real 2600, we most likely have to use GPIOs to "emulate" the joystick ports too. The Raspberry Pi 3/4 CPUs have 2x 32 GPIOs, but only the first 32 are on the pin header. So we either need a port expander like the MCP23S17 or one of the Raspberry Pi Compute Modules (which seem to have more GPIOs on the edge connector).

+Al_Nafuur · October 30, 2023

7 hours ago, Al_Nafuur said:

I have my Pi4 up and running. Decathlon runs with glitches in debug mode (both the cartridge and the PlusCart). @alex_79 mentioned on @Andrew Davie's new forum that the Decathlon cartridge is finicky with an extension cable and that he had issues dumping the cartridge. So I tried running Decathlon from my breadboard with an extension to a real 2600, and it indeed had glitches similar to those with RTStella and the Pi4.

Maybe a PCB and shorter traces will solve our issues with the FE banking.

I also tested the SuperCharger with the extension cable to a real 2600 and it worked, so shorter traces will most likely not fix the SuperCharger not working on the Pi.

+MarcoJ · October 30, 2023

9 hours ago, Al_Nafuur said:

I have my Pi4 up and running.

Hey, excellent. Have you noticed if the emulation is lasting longer without glitches as compared to your Pi3 setup? I am guessing you're using the same breadboard setup.

9 hours ago, Al_Nafuur said:

Maybe a PCB and shorter traces will solve our issues with the FE banking.

I have tried this out, but I could only ever get it working for a few hundred milliseconds on the PlusCart. A real cart didn't work. How long did it run on your rig?

9 hours ago, Al_Nafuur said:

because if we want to have "full" compatibility with a real 2600 we most likely have to use GPIOs to "emulate" the joystick ports too

I looked into this last month. The I/O expander chips run at a maximum 10 MHz clock; once the bit-banging is executed, the effective rate would be below the ~1 MHz we need for this to work. Also, the SPI and I2C ports are not available, as the cartridge bus uses some of their pins.
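A back-of-envelope check of that limit, as a sketch under stated assumptions: an MCP23S17-style SPI expander where one register write is three bytes (opcode, register address, data) clocked at the 10 MHz maximum, and an NTSC CPU clock of 3.579545 MHz / 3. Chip-select timing and driver overhead are ignored, so real numbers would be worse.

```cpp
#include <cassert>

// Assumptions: 3-byte SPI write transaction at 10 MHz, no CS/driver overhead.
constexpr double kSpiClockHz = 10e6;
constexpr int kBitsPerWrite = 3 * 8;        // opcode + register address + data
constexpr double kCpuHz = 3579545.0 / 3.0;  // NTSC 2600 CPU clock, ~1.19 MHz

constexpr double writeTimeSeconds() { return kBitsPerWrite / kSpiClockHz; }   // 2.4 us
constexpr double writesPerSecond() { return 1.0 / writeTimeSeconds(); }       // ~417 kHz
constexpr double cpuCyclesPerWrite() { return writeTimeSeconds() * kCpuHz; }  // ~2.9 cycles
```

So even in the best case, a single expander write spans almost three 6502 cycles, before any reads of the joystick inputs are counted.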

9 hours ago, Al_Nafuur said:

or one of the Raspberry Pi Compute Modules (which seem to have more GPIOs on the edge connector)

Hmm, this is a good idea! It should work well for bit-banging the ports in parallel, perfectly in sync with the execution. Such a design would need, at the very least, a breakout board of some type, such as this:

https://www.raspberrypi.com/products/compute-module-io-board-v3/

This would have enough I/O to interface to bidirectional joystick ports. This does give some hope to extend and expand the project. Also, it gives the hobbyist the option to build either the basic CartPort driver or the full emulated ports.

In other news, I've had the "Back to the Future" demo ROM on Harmony that I posted before running for about a week without crashing. The CartridgePort driver is very stable.

9 hours ago, Al_Nafuur said:

I'd like to proceed with a (test) PCB setup

OK, are you hoping to have a direct cart connector on your PCB, or wires? If the latter, this design could be used:

Community-Built Unnamed 1970's Video Game Console-Compatible System (WIP)
