
DirtyHairy


Posts posted by DirtyHairy

  1. 8 hours ago, MarcoJ said:

    It would be interesting to try running SC and E7 games with a driver in PlusCart that does not attempt to fulfil the PlusROM functions and see how it performs. I expect its performance would rival that of the real carts.

    Don't overestimate the cycle budget on the STM32. From my experience with the UnoCart, the headroom is not that big, and some banking schemes like DPC can break even if a new compiler version emits slightly less efficient code. I would estimate that you would get a headroom factor of maybe 2 with extremely simple banking schemes, but certainly not more. I wouldn't be surprised if real carts in fact overclock much better than the Uno / Plus.

  2. 37 minutes ago, MarcoJ said:

    @Al_Nafuur does the interrupt look feasible?

    Nope. There is no way to use interrupts from userspace. AFAIK the closest you can get is the in-kernel GPIO driver, which lets you do a select() on a file descriptor that returns once an interrupt has been triggered; this is what libpigpio does. But that means yielding to the scheduler, and then we're talking microseconds again. I think polling is the only way here.
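    To illustrate what polling means here: a minimal sketch of a busy-wait loop on a memory-mapped register. The function name, the register pointer and the spin limit are hypothetical; on a Pi the pointer would come from mmap()ing /dev/gpiomem, which is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Busy-poll a memory-mapped register until any bit in `mask` is set.
// `reg` is hypothetical: on a Pi it would point into an mmap'ed GPIO
// block (e.g. the GPIO level register), which is not set up here.
// Returns the register value on success, or 0 after `max_spins`
// iterations so that a dead bus cannot hang the caller forever.
inline uint32_t poll_until_set(const volatile uint32_t* reg, uint32_t mask,
                               uint64_t max_spins) {
    assert(reg != nullptr && mask != 0);
    for (uint64_t i = 0; i < max_spins; ++i) {
        uint32_t value = *reg;  // volatile read: re-read from memory every time
        if (value & mask) return value;
    }
    return 0;
}
```

    The thread never sleeps or blocks, so there is no trip through the scheduler; the price is that it occupies its core completely.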

  3. 1 hour ago, Al_Nafuur said:

    If we want to let the user also run "normal" bin files from external storage like an SD card or USB stick, then we need the file selection anyway.

     

    I thought of using the file as a configuration for the hardware (CPU, GPIOs etc. 🤷‍♂️). However, I have not yet thought of anything we really need to configure there. For the TV format, for example, one only has to put "NTSC" or "PAL50" in the file name.

     

    On the other hand, we could also remove all other bankswitching schemes and the file selector from Stella and only start/stop the emulation of the cartridge port via a switch.

     

    What are your suggestions on this?

     

    Ship two builds of Stella: rtstella for bitbanging, and a regular build. Run a daemon that watches the port. Once a cartridge is inserted, the daemon kills the running Stella instance and starts rtstella; once a cartridge is removed, rtstella is killed and ordinary Stella runs again (displaying the dialog).
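    The daemon's decision logic boils down to a two-state machine. A minimal sketch of just the transition logic; the enum and function names are hypothetical, the cartridge-detect input is assumed to come from somewhere like a GPIO, and the actual kill/exec process handling is left out.

```cpp
#include <cassert>
#include <string>

// Which emulator the daemon should keep running.
enum class Mode { Stella, RtStella };

// What the daemon does on one polling step. `None` means the running
// instance already matches the cartridge state.
enum class Action { None, StartStella, StartRtStella };

// Decide what to do from the running mode and the cartridge state.
// On a switch, the caller is expected to kill the running instance
// before starting the other one.
inline Action next_action(Mode running, bool cartridge_present) {
    Mode wanted = cartridge_present ? Mode::RtStella : Mode::Stella;
    if (wanted == running) return Action::None;
    return wanted == Mode::RtStella ? Action::StartRtStella : Action::StartStella;
}

// Binary to exec for a given action (placeholder names).
inline std::string binary_for(Action a) {
    assert(a != Action::None);
    return a == Action::StartRtStella ? "rtstella" : "stella";
}
```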

     

    Why do you want to encode configuration in a "dummy cartridge"? Just use a configuration file 😏 For the TV format, just add a hardware switch and wire it to a GPIO.

  4. 6 hours ago, MarcoJ said:

    This would be my second guess as to why writes sometimes get corrupted. I tried rewriting the Poke function to include the nop delay inline with the write, rather than waiting for the next time it's called. It seemed to make no difference. If there is unstable data on the cart side of the driver IC, this could explain why data gets corrupted. I wonder if some light termination on the cart bus side or shorter data bus wiring could help.

    My thoughts exactly. I am not sure what happens on a read from a write location on a real VCS, either. No one is driving the bus, and I would assume that most of the time the bus holds the last value. I guess the driver chip behaves differently than a real 6502 bus. Electronics is not my strong side, but would an array of pulldowns help? That would put the bus in a well-defined state if no one is driving it.

  5. 5 hours ago, Al_Nafuur said:

    That's the one that I saw initially and couldn't find yesterday evening. It gives the corresponding instructions for arm32. However, there is a small flaw: the function in the post only reads the lower 32-bit word. The counter is 64 bits wide, and you can use MRRC to transfer both words.
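    For illustration, a minimal sketch of a full 64-bit read on ARMv8 cores running in 32-bit mode (as on a Pi 3 or 4 with a 32-bit OS), with the AArch64 variant alongside. As before, user-space access has to be enabled from kernel space first. The function names are made up, and the non-ARM fallback to std::chrono exists only so the sketch compiles elsewhere.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Read the full 64-bit PMU cycle counter (PMCCNTR).
// AArch32: MRRC moves both halves in one instruction (Rt = low word,
// Rt2 = high word), unlike MRC, which only returns the low 32 bits.
// Without user-space access enabled by a kernel module this traps (SIGILL).
static inline uint64_t read_cycle_counter() {
#if defined(__arm__)
    uint32_t lo, hi;
    asm volatile("mrrc p15, 0, %0, %1, c9" : "=r"(lo), "=r"(hi));
    return (static_cast<uint64_t>(hi) << 32) | lo;
#elif defined(__aarch64__)
    uint64_t val;
    asm volatile("mrs %0, pmccntr_el0" : "=r"(val));
    return val;
#else
    // Fallback for non-ARM hosts: nanoseconds instead of CPU cycles.
    return static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count());
#endif
}

// Ticks between two readings taken on the same core.
static inline uint64_t elapsed(uint64_t start, uint64_t end) {
    assert(end >= start && "counter went backwards (core migration?)");
    return end - start;
}
```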

  6. 2 minutes ago, MarcoJ said:

    Question, will/does it make a difference if the "Multithreaded" option is ticked in Stella?

    Ho-hum, good question. It may be that those threads are dispatched from the emu thread, in which case they would inherit the affinity and RT priority and start competing with the emu thread for core 3. That could cause a lockup, though. Do you see a difference if you tick that option?

     

    Oh, btw, does only the debugger freeze Stella, or does the menu freeze it, too?
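    Whether worker threads pick up the emulation thread's settings can be checked directly. On Linux, a new thread starts with a copy of its creator's CPU affinity mask, and with default attributes (PTHREAD_INHERIT_SCHED in glibc) it also inherits the scheduling policy and priority. A minimal sketch that spawns a std::thread and compares both; the helper names are hypothetical and nothing here is taken from Stella.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>
#include <thread>

// Affinity mask, policy and priority of the calling thread.
struct SchedInfo {
    cpu_set_t affinity;
    int policy = -1;
    int priority = -1;
};

inline SchedInfo current_sched_info() {
    SchedInfo info;
    CPU_ZERO(&info.affinity);
    int rc = pthread_getaffinity_np(pthread_self(), sizeof(info.affinity),
                                    &info.affinity);
    assert(rc == 0);
    sched_param param{};
    rc = pthread_getschedparam(pthread_self(), &info.policy, &param);
    assert(rc == 0);
    (void)rc;  // silence unused warnings when NDEBUG is set
    info.priority = param.sched_priority;
    return info;
}

// Spawn a thread with default attributes and check whether it starts
// with the creator's affinity mask, policy and priority.
inline bool worker_inherits_settings() {
    const SchedInfo parent = current_sched_info();
    SchedInfo child;
    std::thread worker([&] { child = current_sched_info(); });
    worker.join();
    return CPU_EQUAL(&parent.affinity, &child.affinity) &&
           parent.policy == child.policy && parent.priority == child.priority;
}
```

    Run from the pinned RT emulation thread, this would show whether spawned threads land on core 3 at RT priority.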

  7. 2 hours ago, Al_Nafuur said:

    I think the CPU is fooling us about these timers/cycle counters!

    I'm not convinced 😏 The performance counters are part of the ARM architecture spec, nothing Pi-specific, and they should work fine. Maybe we are using them wrong, maybe there is something we overlooked, and maybe there is something else wrong altogether. We'll only find out by systematic debugging. What would help:

    • Test the performance counter against a well-known clock source
    • Generate a square wave on a GPIO pin using a loop that spins on the counter, and check it with a scope.

    Both are on my list, but not today (and probably not tomorrow).
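    A rough sketch of the square-wave check: toggle a pin every time the counter advances by half a period. The counter read and the pin write are injected as callables, hypothetically wrapping the PMU read and a GPIO set/clear register write, so the logic can be exercised without hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Produce `edges` pin transitions, one every `half_period` counter
// ticks, by spinning on the counter. Each deadline is advanced from the
// previous deadline rather than from "now", so a late toggle does not
// push all following edges back.
inline void square_wave(const std::function<uint64_t()>& counter,
                        const std::function<void(bool)>& set_pin,
                        uint64_t half_period, int edges) {
    assert(half_period > 0);
    bool level = false;
    uint64_t deadline = counter() + half_period;
    for (int i = 0; i < edges; ++i) {
        while (counter() < deadline) {
            // busy-wait; sleeping would hand control to the scheduler
        }
        level = !level;
        set_pin(level);
        deadline += half_period;
    }
}
```

    On the scope, the measured period should equal 2 × half_period divided by the ARM clock rate; a mismatch would point at the counter or the clock assumptions.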

  8. 1 hour ago, MarcoJ said:

    Result: This dramatically reduced the lag that occurs every second or so. I have noticed this lag on both the 32-bit and 64-bit OS.

    Awesome! The lag comes from the scheduler periodically interrupting the emulation thread. Setting /proc/sys/kernel/sched_rt_runtime_us to -1 will disable that.
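    The same setting can also be applied from code rather than the shell. A minimal sketch; the helper name is hypothetical, and the path is a parameter only so that the function can be tried against an ordinary file. By default the kernel lets RT threads use 950000 µs out of every 1000000 µs period, which matches a hiccup roughly once per second.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Write a value to a sysctl-style file. For RT throttling the real path
// is /proc/sys/kernel/sched_rt_runtime_us; "-1" removes the throttling
// window for SCHED_FIFO/SCHED_RR threads. Needs root on a real system.
// Returns false if the file cannot be opened or written.
inline bool write_sysctl(const std::string& path, const std::string& value) {
    assert(!value.empty());
    std::ofstream out(path);
    if (!out) return false;
    out << value << '\n';
    return static_cast<bool>(out.flush());
}
```

    Usage on the Pi would be write_sysctl("/proc/sys/kernel/sched_rt_runtime_us", "-1"), run as root. The setting does not survive a reboot.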

    1 hour ago, MarcoJ said:

    Result: This didn't appear to change the performance at the time.

    😏 It's not a command, but a kernel parameter. You have to append it to the kernel command line (cmdline.txt) on the boot partition. I just double-checked and tested it myself: you have to append "isolcpus=domain,managed_irq,3". The nohz part would also be useful, but we'd need to rebuild the kernel to support that. Anyway, this will reserve the fourth core for Stella to claim, and no other threads or processes will be scheduled there.
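    With the core isolated, the emulation thread still has to move itself onto it explicitly. A minimal sketch using the plain pthread API, not taken from rtstella; the helper name, core number and priority are assumptions, and SCHED_FIFO needs root or CAP_SYS_NICE.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <sched.h>

// Move the calling thread onto `cpu` and switch it to SCHED_FIFO.
// Returns 0 on success or the first pthread error code: e.g. EINVAL if
// the CPU is not in the allowed set, EPERM if the process lacks the
// privilege for real-time scheduling.
inline int claim_core(int cpu, int rt_priority) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set))
        return err;

    sched_param param{};
    param.sched_priority = rt_priority;  // 1..99 for SCHED_FIFO
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}
```

    On the Pi this would be called as claim_core(3, some_priority) at the start of the emulation thread. Threads created from it afterwards inherit both settings.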

    29 minutes ago, MarcoJ said:

    It might need a more sophisticated approach to apply.

    Nah, I think those are just bugs in my thread handling. I tried to make sure that the emulation thread gives up control and yields if Stella quits emulation mode, but it seems I missed something. I am pretty sure that I can debug and solve this when I find time.
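    For reference, one common shape for that hand-off: the emulation thread blocks on a condition variable while emulation is paused and checks a quit flag on every iteration. This is a minimal sketch, not the actual rtstella thread code; the class and its methods are hypothetical, and the slice callable stands in for one chunk of emulation. Blocking matters under SCHED_FIFO, where sched_yield() only lets threads of the same priority run.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical controller for the emulation thread.
class EmulationThread {
public:
    template <typename Slice>
    void start(Slice slice) {
        assert(!worker_.joinable());
        worker_ = std::thread([this, slice]() mutable {
            for (;;) {
                {
                    std::unique_lock<std::mutex> lock(mutex_);
                    // Sleep while paused instead of spinning, so other
                    // threads on this core are not starved.
                    cv_.wait(lock, [this] { return running_ || quit_; });
                    if (quit_) return;
                }
                slice();  // one chunk of emulation at full speed
                slices_run_.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    void set_running(bool on) {
        { std::lock_guard<std::mutex> lock(mutex_); running_ = on; }
        cv_.notify_one();
    }
    void stop() {
        { std::lock_guard<std::mutex> lock(mutex_); quit_ = true; }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }
    unsigned long slices_run() const {
        return slices_run_.load(std::memory_order_relaxed);
    }
    ~EmulationThread() { stop(); }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool running_ = false;
    bool quit_ = false;
    std::atomic<unsigned long> slices_run_{0};
    std::thread worker_;
};
```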

  9. 1 hour ago, MarcoJ said:

    I was thinking, as a thought experiment, what happens if normal Stella gets nop delays added to its read/write cycles? Would similar things happen where emulation struggles to keep up in real time? 

    The effect would be the same.

     

    After more digging, it seems that we can fully disable scheduling on the emulation core after all. Specifically, adding "isolcpus=domain,nohz,managed_irq,3" to the kernel command line should fully disable scheduling and interrupts on the last core, and only threads with explicit affinity will be scheduled there; this will give us full ownership of the core. Furthermore, "echo -1 > /proc/sys/kernel/sched_rt_runtime_us" allows realtime threads to run indefinitely without interruption (although I am not sure whether this is required with isolcpus). A good reference is https://canonical.com/blog/real-time-kernel-tuning .
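    To confirm after a reboot that isolcpus actually took effect, the kernel reports the isolated set in /sys/devices/system/cpu/isolated, in the same cpu-list syntax (e.g. "3" or "2-3"). A minimal parser sketch for that format; the function name is hypothetical, and reading the file is left to the caller.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parse a kernel cpu-list such as "3", "2-3" or "0,2-3" into CPU numbers.
// An empty list (no isolated CPUs) yields an empty vector. Malformed
// input is not handled beyond the exceptions std::stoi throws.
inline std::vector<int> parse_cpu_list(const std::string& text) {
    std::vector<int> cpus;
    std::stringstream ss(text);
    std::string item;
    while (std::getline(ss, item, ',')) {
        // Strip the trailing newline / spaces found in sysfs files.
        while (!item.empty() && (item.back() == '\n' || item.back() == ' '))
            item.pop_back();
        if (item.empty()) continue;
        const auto dash = item.find('-');
        const int first = std::stoi(item.substr(0, dash));
        const int last = dash == std::string::npos
                             ? first
                             : std::stoi(item.substr(dash + 1));
        assert(first <= last);
        for (int cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
    }
    return cpus;
}
```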

     

    However, we should first identify and understand the current issue with the performance counter; I don't think it is related to scheduling.

  10. I don't doubt the counter or the timers; I think there is something fundamentally wrong that we are overlooking. There is too much weird behaviour that does not line up for me. Could you do a test? For a well-known ROM (say Combat), log each peek and poke in an array, and write that to disk after about 1000000 entries. For each peek / poke the log should contain the values of the performance counter at entry and at exit, the address, the type (peek/poke) and the value read (if applicable), preferably in text format, one line per peek/poke. This should give us some insight into what is happening.

     

    Just take care to reset the log in ::reset, otherwise the entries from TV standard detection and actual emulation will mix 😏
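    Such a log could be kept roughly like this. A minimal sketch; BusLog, its fields and the output format are hypothetical. The counter values would come from the cycle counter read, and the buffer is preallocated so logging does not allocate during emulation. Here pokes also record the value written.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <vector>

// One bus access: counter at entry/exit, address, type and value.
struct BusAccess {
    uint64_t t_entry;
    uint64_t t_exit;
    uint16_t address;
    bool is_poke;
    uint8_t value;  // value read (peek) or written (poke)
};

class BusLog {
public:
    explicit BusLog(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
        entries_.reserve(capacity);  // no allocations while emulating
    }
    // Returns false once the log is full, so the caller knows to dump it.
    bool record(const BusAccess& a) {
        if (entries_.size() >= capacity_) return false;
        entries_.push_back(a);
        return true;
    }
    // Call from the cartridge's reset so TV-mode detection does not mix in.
    void reset() { entries_.clear(); }
    bool full() const { return entries_.size() >= capacity_; }

    // One line per access: t_entry t_exit address type value (hex).
    void dump(std::ostream& out) const {
        char line[80];
        for (const auto& a : entries_) {
            std::snprintf(line, sizeof(line), "%llu %llu %04X %s %02X\n",
                          static_cast<unsigned long long>(a.t_entry),
                          static_cast<unsigned long long>(a.t_exit),
                          static_cast<unsigned>(a.address),
                          a.is_poke ? "poke" : "peek",
                          static_cast<unsigned>(a.value));
            out << line;
        }
    }

private:
    std::size_t capacity_;
    std::vector<BusAccess> entries_;
};
```

    With a capacity of 1000000 and 24-byte entries, the buffer takes about 24 MB, and dump() would be pointed at a std::ofstream once full() returns true.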

  11. 9 hours ago, Al_Nafuur said:

    For a read cycle we are waiting the full ~700ns (like the 6502) before we read the data bus.

    Hm, my gut feeling is that this is too long to get full speed (it doesn't leave enough time for emulation and for catching up with lost cycles), but anyway, that's not the issue here. From a brief look I can't see any issues in the code. You should definitely implement ::Reset, though: Stella first runs the emulation to detect the video mode, and then resets the system before running the actual emulation. I can't see any obvious issues with that here, though. Maybe a look at the generated assembly gives a hint.

     

    I'll take a close look later this weekend when I find more time.

  12. 6 hours ago, Al_Nafuur said:

    According to this routine the readings are stable with a performance counter > 7000 (CPU cycles?). However, when I use the performance counter and this delay value in Stella, the emulation crashes really fast. The emulation gets somewhat stable with delays > 100,000, but it is extremely slow!

    Something is very wrong here. You are right, the performance counter measures CPU cycles. If you have set the Pi's CPU governor to performance, then one 6502 cycle is roughly 1000 ARM cycles, so 7000 cycles is already too slow by a factor of 7. The bus should stabilise much faster, as it does on a real VCS. Maybe electrical issues?

     

    On the Stella end I suspect a bug. Can you maybe push your code to a branch so I can have a look at it? Each read should look roughly like this:

     

    1. Check how much time is left from the last cycle and spin until P <= (T_current - T_start)

    2. Store the counter at the beginning of this cycle in a T_start

    3. Write the address

    4. Wait for a short delay until the bus stabilizes

    5. Read and return the value

     

    T_start = counter at cycle start , T_current = current counter, P = bus cycle length in ARM cycles

     

    Emulation happens between 5 and 1, between one call to peek and the next.
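    The five steps as a minimal C++ sketch. The counter read and the bus accessors are injected as callables, hypothetical stand-ins for the PMU read and the GPIO code. P is derived from assumed clock rates: a 1.2 GHz ARM clock and the NTSC 6502 clock of about 1.193 MHz give roughly the 1000 ARM cycles per bus cycle mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// P: bus cycle length in counter ticks, rounded to the nearest tick.
constexpr uint64_t bus_cycle_ticks(double arm_hz, double cpu6502_hz) {
    return static_cast<uint64_t>(arm_hz / cpu6502_hz + 0.5);
}

// Hypothetical bus interface; all hardware access goes through callables.
struct CartBus {
    std::function<uint64_t()> counter;            // cycle counter read
    std::function<void(uint16_t)> write_address;  // drive the address lines
    std::function<uint8_t()> read_data;           // sample the data lines
    uint64_t period;       // P
    uint64_t settle;       // delay until the data bus is stable
    uint64_t t_start = 0;  // counter at the start of the previous cycle

    uint8_t peek(uint16_t address) {
        assert(settle < period);
        // 1. Spin until the previous cycle has lasted P ticks.
        while (counter() - t_start < period) {}
        // 2. Remember when this cycle starts.
        t_start = counter();
        // 3. Put the address on the bus.
        write_address(address);
        // 4. Wait until the bus has settled.
        while (counter() - t_start < settle) {}
        // 5. Sample and return the data. Emulation runs until the next peek.
        return read_data();
    }
};
```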

  13. 8 hours ago, Al_Nafuur said:

    I think it makes no sense to merge the two branches yet. @DirtyHairy

    I think a merge would be fine. The rtstella scheduling code seems to be stable and work as it should, and you have an ifdef in your timing code that checks for rtstella, so people can try both variants. Anyway, rtstella is the way forward; the normal scheduler will never work at full speed with a real cart.

  14. 5 hours ago, Al_Nafuur said:

    I think I first need the 64 bit Pi OS:

    The same functionality is available in 32bit mode and exposed via CP15, so you need slightly different instruction sequences there. There is also example code on the web for this (both for reading and for the setup in kernel space), but I think switching to 64bit would be a good idea anyway and give better performance.

  15. Well, the performance counter works. After building and inserting the kernel module that I linked I can run the following sample:

     

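    #include <iostream>
    #include <cstdint>
    
    using namespace std;
    
    // Read the 64-bit PMU cycle counter (AArch64). User-space access must be
    // enabled from kernel space first (here by the kernel module), or this traps.
    static inline uint64_t read_pmccntr(void) {
        uint64_t val;
        asm volatile("mrs %0, pmccntr_el0" : "=r"(val));
        return val;
    }
    
    int main() {
        uint64_t samples[32];
    
        // Take all samples back to back first, so printing does not skew the deltas.
        for (auto& sample: samples) sample = read_pmccntr();
    
        // Print each sample and its difference to the previous one.
        for (int i = 0; i < 32; i++) {
            cout << samples[i];
            if (i > 0) cout << " : " << samples[i] - samples[i-1];
    
            cout << endl;
        }
    }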

     

    and the output at "-O2" is

     

    13553396869
    13553396888 : 19
    13553396901 : 13
    13553396914 : 13
    13553396920 : 6
    13553396926 : 6
    13553396932 : 6
    13553396938 : 6
    13553396944 : 6
    13553396950 : 6
    13553396956 : 6
    13553396962 : 6
    13553396968 : 6
    13553396974 : 6
    13553396980 : 6
    13553396986 : 6
    13553396992 : 6
    13553396998 : 6
    13553397004 : 6
    13553397010 : 6
    13553397016 : 6
    13553397022 : 6
    13553397028 : 6
    13553397034 : 6
    13553397040 : 6
    13553397046 : 6
    13553397052 : 6
    13553397058 : 6
    13553397064 : 6
    13553397070 : 6
    13553397076 : 6
    13553397082 : 6

     

    🎉

    @Al_Nafuur This should do the trick.
