
JetSetIlly

Member, 763 posts
Everything posted by JetSetIlly

  1. Yes. I was thinking from an emulation point of view about why this works without any change to the driver. It's because these functions are copied from ROM to RAM as part of the .data section, which is already emulated (in Stella, etc.). A very nice solution for time-critical code.
  2. This is incredibly useful. PrepAreaBuffers() appears to be at 0x000011b0 while RAM_PrepAreaBuffers() appears to be at 0x4000185c. There's no copying of the custom program to somewhere in RAM; it's just the variable block. Interesting.
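For anyone following along, the two addresses above fall in different memory regions. A minimal Go sketch of classifying an address, assuming the LPC2103 memory map of 32 KB Flash starting at 0x00000000 and 8 KB SRAM starting at 0x40000000 (the sizes are my assumption, not taken from the Harmony documentation). The names are hypothetical and this is not Gopher2600's code.

```go
package main

import "fmt"

// Region identifies which memory an address falls in.
type Region int

const (
	Unmapped Region = iota
	Flash
	SRAM
)

// Assumed LPC2103 memory map: 32 KB of Flash and 8 KB of SRAM.
const (
	flashOrigin = 0x00000000
	flashSize   = 32 * 1024
	sramOrigin  = 0x40000000
	sramSize    = 8 * 1024
)

// classify returns the memory region that contains addr.
func classify(addr uint32) Region {
	switch {
	case addr < flashOrigin+flashSize: // Flash starts at address 0
		return Flash
	case addr >= sramOrigin && addr < sramOrigin+sramSize:
		return SRAM
	default:
		return Unmapped
	}
}

func (r Region) String() string {
	switch r {
	case Flash:
		return "Flash"
	case SRAM:
		return "SRAM"
	default:
		return "unmapped"
	}
}

func main() {
	// The two addresses quoted in the post.
	fmt.Println("PrepAreaBuffers:", classify(0x000011b0))     // Flash
	fmt.Println("RAM_PrepAreaBuffers:", classify(0x4000185c)) // SRAM
}
```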
  3. I think the changes I've made today around cycle accuracy and how the MAM works are worth packaging up. Summary:
     - MAM now differentiates between mode 1 and mode 2
     - MAM set according to cartridge mapper (i.e. DPC+ or CDF*) [thanks @SpiceWare]
     - Counting of conditional branches corrected [thanks @Thomas Jentzsch]
     https://github.com/JetSetIlly/Gopher2600/releases/tag/v0.12.1
     And thanks to @Al_Nafuur for help with the Windows binary.
  4. Good point. Currently the assumption is that SRAM is only accessed by PUSH, POP, LDMIA and STMIA, which I think is reasonable. All other loads and stores are stretched as Flash accesses. The current MAM mode alters that, but in principle most reads/writes are stretched for Flash. But as you said in a previous post, the available documentation for cycle usage isn't very good, so I'm making a lot of assumptions. At the moment, I'm happy if the timings work out so that (a) existing ROMs work (or don't work) as expected and (b) it's helpful for developing new ROMs. For me, that means causing a screen roll or running past the end of memory if the Thumb program runs too long.
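The assumption above comes down to a decode check: a data access is costed at SRAM speed only when the instruction is PUSH/POP (Thumb format 14) or LDMIA/STMIA (Thumb format 15), and every other load or store is stretched as Flash. A minimal Go sketch of that rule, with bit masks following the Thumb encodings in the ARM7TDMI data sheet; the function and type names are hypothetical.

```go
package main

import "fmt"

type Memory int

const (
	FlashMem Memory = iota
	SRAMMem
)

// isStackOrMultiple reports whether a 16-bit Thumb opcode is PUSH/POP
// (format 14: 1011 x10x xxxx xxxx) or LDMIA/STMIA (format 15: 1100 xxxx xxxx xxxx).
func isStackOrMultiple(opcode uint16) bool {
	pushPop := opcode&0xf600 == 0xb400
	multiple := opcode&0xf000 == 0xc000
	return pushPop || multiple
}

// dataAccessMemory applies the assumption from the post: only PUSH, POP,
// LDMIA and STMIA touch SRAM; all other data accesses are stretched as Flash.
func dataAccessMemory(opcode uint16) Memory {
	if isStackOrMultiple(opcode) {
		return SRAMMem
	}
	return FlashMem
}

func main() {
	for _, op := range []uint16{
		0xb500, // PUSH {LR}
		0xbd00, // POP {PC}
		0xca03, // LDMIA R2!, {R0, R1}
		0x6808, // LDR R0, [R1, #0]
	} {
		fmt.Printf("%#06x -> SRAM: %v\n", op, dataAccessMemory(op) == SRAMMem)
	}
}
```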
  5. It's definitely running ARM code, so we can assume that it just happens not to trigger the bug. I'm assuming that the PC will never jump from one area to the other and that data reads/writes are always from SRAM. I'm already measuring whether each read/write is from RAM or Flash. Right. That's probably enough information to have a linting check in the emulator (probably not needed now that CDF* is available, but it's a nice feature). Thanks @SpiceWare and @Thomas Jentzsch for the info and for helping me think things through.
  6. Yes. Right. This might be where my confusion has arisen. Many DPC+ ARM ROMs I have seen do as you describe. However, some do not. For example, the ROM I have of DK Arcade (link below) does not. So are we saying that this ROM may crash on the hardware if it triggers the bug? Either way, I understand now. Right. So it remains in MAM mode 2. That tallies with what I've read in the Turbo demo thread. Let me summarise my understanding:
     1. The DPC+ driver does not change the MAM mode from mode 2
        1a) The Thumb program must therefore change to MAM mode 1 or risk a crash
        1b) The Thumb program must change back to MAM mode 2 before exit
     2. The CDF and CDFJ drivers handle the MAM changes (so the Thumb program doesn't have to)
     3. The CDFJ+ driver leaves MAM in mode 2
     Thanks.
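Points 1a and 1b of that summary are enough to drive the kind of DPC+ linting check mentioned earlier in the thread. A minimal Go sketch, assuming the emulator can see when a DPC+ Thumb program starts, every MAM mode write it makes, and when it returns; the type, method names and warning text are hypothetical.

```go
package main

import "fmt"

// MAMMode is the MAM control value (0, 1 or 2).
type MAMMode int

// dpcPlusLint tracks MAM mode changes during one DPC+ Thumb program run.
type dpcPlusLint struct {
	mode      MAMMode
	usedMode1 bool
}

// start is called when the driver hands control to the Thumb program.
// Per the summary, the DPC+ driver leaves the MAM in mode 2.
func (l *dpcPlusLint) start() {
	l.mode = 2
	l.usedMode1 = false
}

// setMode is called whenever the Thumb program writes a new MAM mode.
func (l *dpcPlusLint) setMode(m MAMMode) {
	l.mode = m
	if m == 1 {
		l.usedMode1 = true
	}
}

// finish is called when the Thumb program returns to the driver. It
// returns any warnings for that run (rules 1a and 1b of the summary).
func (l *dpcPlusLint) finish() []string {
	var warnings []string
	if !l.usedMode1 {
		warnings = append(warnings, "Thumb program never switched MAM to mode 1: risk of crash on hardware")
	}
	if l.mode != 2 {
		warnings = append(warnings, fmt.Sprintf("Thumb program exited with MAM in mode %d, expected mode 2", l.mode))
	}
	return warnings
}

func main() {
	var l dpcPlusLint
	l.start()
	l.setMode(1) // program switches to mode 1 for its main work...
	// ...but returns without switching back to mode 2
	for _, w := range l.finish() {
		fmt.Println("lint:", w)
	}
}
```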
  7. If it's enabled by default on entering the Thumb program, then even that is a significant difference from MAM 0. @batari, if you could confirm that MAM 1 is set by the driver, I would be very grateful.
  8. Can you tell me something about the MAM as it exists in the Harmony? What exactly is the nature of the bug? Reading comments from @johnnywc, I know the latest drivers put the chip in MAM mode 2 by default, but what about earlier drivers? Do they initialise in MAM mode 0 or MAM mode 1? In old Harmony cartridges, is it only mode 2 that you can't enter from the Thumb program, or can you not enter mode 1 either?
  9. Sort of. I've made the assumption that the caching is perfect and runs at the same speed as SRAM, which, as you say, is negligible. No more than 10ns, I suspect. Currently, you can have the MAM turned on by default (meaning the driver has turned it on for you), or allow the Thumb program to turn it on (which, as I understand it, some versions of the Harmony do not allow). By default, I have the MAM active at the start of program execution, which is good for the very recent Champ games. Are you sure that you can discount S cycles if the MAM is active? I've not found good information about the MAM, but I'm assuming that an S cycle would take 1 unstretched ARM cycle. Is it documented anywhere that you can ignore S cycles?
  10. The big question for me now is the access speed of SRAM and Flash memory as found in the Harmony. I've instrumented the settings so they can be changed on the fly but a good default is required.
  11. Hmm. Good question. If we look at the equivalent ARM instruction, MOV, the format of that instruction suggests that a shift happens when the shift bits are non-zero; if the bits are zero, no shift happens. The cycle information for the MOV instruction, meanwhile, says that the I cycle isn't required unless there is a shift. A bit pattern of zero means no shift, so no I cycle is required. Therefore, if we take at face value the equivalence of Thumb-mode LSL/LSR and ARM-mode MOV, LSL/LSR instructions with shift bits of zero do not require the additional I cycle. I may be overthinking it and I'm prepared to be wrong, but that was my interpretation.
  12. That makes sense and I think you're right. In ARM mode all instructions can be conditional but in Thumb mode it's only the branch which is conditional. I hadn't considered the possibility of conditionality and took that section at face value. Cheers.
  13. Floats. We must have different documentation. This is from the ARM7TDMI-S technical reference manual https://developer.arm.com/documentation/ddi0234/b
  14. Disabling CRT effects shouldn't make any difference. All the CRT processing is done on the GFX chip so assuming you have a GFX chip the performance impact is negligible. I'm interested if you're experiencing anything different. Note that I have a GTX 650 in my development machine which is the same vintage as your MacBook Pro. What spec GFX chip does the Pro have?
  15. I'll summarise my understanding as best I can. Apologies if you already know this. There are four types of cycle: I, S, N and C cycles. C cycles can be ignored in our case. I cycles are unaffected by memory speed and run at the ARM clock rate. N and S cycles can be "stretched" according to the speed of the memory being addressed. For Gopher2600, I've added the cycle profile for each of the 19 instruction groups. During execution of an instruction I count the I, N and S cycles. Crucially, I count N and S cycles according to whether they were a PC fetch or a "data fetch" (this information is in the ARM7TDMI data sheet). Once an instruction has completed, I apply the appropriate stretching for the memory type. I assume here that all data fetches are from SRAM and all PC fetches are from the memory area pointed to by the PC value at the end of the instruction. I think both of these are reasonable assumptions. If the MAM is enabled, I assume that the caching is "perfect" and all accesses occur at SRAM speed. How the MAM works is probably the area where the most improvement can be found, but for now it seems to work okay. On the subject of conditional branching: the ARM7TDMI data sheet says it takes two S cycles and one N cycle (all PC-bound), so that's three cycles at the speed of the memory pointed to by the PC. If the branch moves the PC from one memory area to the other then the cycles might stretch differently, but I haven't bothered modelling that because (a) it's unlikely and (b) the 6502 probably wouldn't even notice the difference (unless it happens a lot). There's nothing in the documentation that indicates cycle usage is different depending on whether the branch is taken or not.
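The count-then-stretch scheme described above can be sketched in Go. This is a minimal illustration, not Gopher2600's code: the struct and method names are hypothetical, and the stretch formula (round the access time up to a whole number of ARM clock periods, never less than one) is my own assumption about how an access time in nanoseconds becomes extra cycles. The 70 MHz clock and 10 ns SRAM figure in main are illustrative; 50 ns is the Flash figure guessed elsewhere in the thread.

```go
package main

import (
	"fmt"
	"math"
)

// cycleCount holds the I, N and S cycles counted for one instruction,
// with N and S split by PC fetch versus data access.
type cycleCount struct {
	I            int
	NPC, SPC     int
	NData, SData int
}

// memTiming describes the memory system. Access times are in nanoseconds.
type memTiming struct {
	ClockMHz   float64
	FlashNs    float64
	SRAMNs     float64
	MAMEnabled bool
}

// stretch converts a memory access time into whole ARM cycles
// (assumption: round up, minimum of one cycle).
func (m memTiming) stretch(accessNs float64) int {
	periodNs := 1000.0 / m.ClockMHz
	return int(math.Max(1, math.Ceil(accessNs/periodNs)))
}

// armCycles applies stretching once an instruction has completed. Data
// accesses are assumed to hit SRAM; PC fetches use the memory the PC points
// to at the end of the instruction. With the MAM enabled, caching is treated
// as perfect, so PC fetches also run at SRAM speed.
func (m memTiming) armCycles(c cycleCount, pcInFlash bool) int {
	sram := m.stretch(m.SRAMNs)
	pcMem := sram
	if pcInFlash && !m.MAMEnabled {
		pcMem = m.stretch(m.FlashNs)
	}
	// I cycles are not affected by memory speed.
	return c.I + (c.NPC+c.SPC)*pcMem + (c.NData+c.SData)*sram
}

func main() {
	timing := memTiming{ClockMHz: 70, FlashNs: 50, SRAMNs: 10}

	// Branch: 2S + 1N, all PC-bound, as described in the post.
	branch := cycleCount{NPC: 1, SPC: 2}

	fmt.Println("branch from Flash, MAM off:", timing.armCycles(branch, true))
	timing.MAMEnabled = true
	fmt.Println("branch from Flash, MAM on:", timing.armCycles(branch, true))
}
```

With these numbers a Flash access rounds up to 4 ARM cycles and an SRAM access to 1, so the same branch costs 12 cycles with the MAM off and 3 with it on.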
  16. I've made some significant performance improvements to Gopher2600 this week. I can't promise it'll be fast enough for everyone, but on my machine there is about a 9% improvement in FPS for a normal 2600 ROM and about a 20% improvement for a typical ROM using the ARM chip. The improvements are a combination of TIA streamlining (recognising that some conditions can be eliminated if some other condition is true/false) and removing a counter-productive memory reallocation when the ARM program is executed. https://github.com/JetSetIlly/Gopher2600/releases/tag/v0.12 I've also made some improvements to the CRT emulation. Scaling of the image is now limited to whole steps (so 3x or 4x, etc.). This prevents the color and size banding noticeable on some ROMs when the scaling factor was not a whole number. I've also added controls to adjust the sharpness of the image and the fineness of the shadowmask and scanlines. A bilinear filter is now applied to the source CRT texture. I discovered this by accident when experimenting with scaling methods, but I've found that it enhances the brick effect in Zookeeper very nicely indeed. This probably isn't news to anyone but me. Finally, screen roll. There are no settings to adjust this yet, but the screen will desynchronise "correctly" when a VSYNC comes too late. Recovery to a stable image takes a second or two. I hadn't thought about screen roll originally, but I was encountering the need for it more and more now that ARM cycle counting is in place.
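Limiting scaling to whole steps amounts to picking the largest integer factor that still fits the window. A minimal Go sketch; the function name and the sizes in main are illustrative only, and pixel aspect ratio is ignored here.

```go
package main

import "fmt"

// integerScale returns the largest whole-number scale factor at which a
// srcW x srcH image fits inside a winW x winH window (minimum 1).
func integerScale(srcW, srcH, winW, winH int) int {
	s := winW / srcW
	if sy := winH / srcH; sy < s {
		s = sy
	}
	if s < 1 {
		s = 1
	}
	return s
}

func main() {
	// Illustrative sizes: integer division discards the fractional part,
	// so a window that would allow 3.4x horizontally gets 3x.
	fmt.Println(integerScale(320, 240, 1280, 1024)) // 4
	fmt.Println(integerScale(320, 240, 1100, 1000)) // 3
}
```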
  17. I've messaged you, but in case you're talking about the scanline effect being too heavy: you can turn it down or off through the CRT Preferences window, which you can open with F10. Uncheck the effect or alter its strength to your taste. Pixel Perfect renders with no effects at all.
  18. Ouch. That is slow. I'll have another look at performance for next version.
  19. Cheers. We'll have to see when 1.17 is released, but from what I've seen there will be a difference. But speed, generally, is a problem for this emulator when compared to Stella. It's partly down to the differences between C++ and Go, but a lot of it is down to my emulation method, which is probably more fussy than it needs to be. These are the top performance hogs in the emulator, running the Gorf Arcade demo. As you can see, the ARM emulation is a very small percentage of the overall cost; it's the way I'm doing the TIA emulation that is causing the most expense. I can get a more-or-less solid 60fps on my development machine (a 2012 i3), which was my goal when starting this. You can check performance with the following two commands:
      gopher2600 performance -display -fpscap=false romfile.bin
      gopher2600 performance -fpscap=false romfile.bin
Any difference between the -display and non-display versions tells us the basic overhead of screen rendering, which is cut out entirely unless the -display flag is used. By my measurements there's quite a lot to gain, but I'm no expert on graphics programming so I can't see how to improve it at the moment. If you're getting around 60fps normally, using the -fpscap=false option can give you a better idea of performance. Limiting the frame rate to the TV specification introduces its own set of problems, so removing it from the measurement can be good. I think a good next step for me would be to run and profile the program on a different machine (with a different OS). I've only ever seen it run on this machine and I think a different rig might highlight differences I've not considered.
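Comparing the -display and non-display runs is easiest in frame time rather than FPS, because rendering adds a roughly fixed cost to every frame. A small Go sketch of that arithmetic; the FPS figures in main are made up for illustration and are not measurements from this thread.

```go
package main

import "fmt"

// renderOverheadMs estimates the per-frame cost of screen rendering from
// two FPS measurements: one with -display and one without.
func renderOverheadMs(fpsWithDisplay, fpsWithoutDisplay float64) float64 {
	msWith := 1000 / fpsWithDisplay
	msWithout := 1000 / fpsWithoutDisplay
	return msWith - msWithout
}

func main() {
	// Illustrative numbers only: 80fps with -display, 100fps without.
	overhead := renderOverheadMs(80, 100)
	fmt.Printf("rendering costs about %.1f ms per frame\n", overhead)

	// A 60Hz frame has a budget of 1000/60 ms, about 16.7 ms.
	fmt.Printf("that is %.0f%% of a 60Hz frame\n", overhead/(1000.0/60)*100)
}
```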
  20. I've packaged up recent changes as v0.11. https://github.com/JetSetIlly/Gopher2600/releases/tag/v0.11 The main features in this release are better CRT shaders and the improved ARM timings. The Turbo Arcade demo will also work with this version now that I've added support for CDFJ+. I was hoping the new Go compiler would be ready to use, but it's not due for a few more weeks yet. When compiled with the development version of the compiler, however, there is an approx. 8% performance increase in Gopher2600. Not massive, but still significant. This version has been compiled with Go 1.16.4.
  21. I've been reworking the ARM cycle counting to try to better account for all the hardware variants and the expectations people have. In particular, the new Turbo Arcade demo had issues unless the emulator was in "immediate ARM execution" mode. I've gone through all the instructions and differentiated when N and S cycles are addressing "PC" addresses and when they are addressing "data" addresses. I'd already entered the cycle profile for each instruction group, so this wasn't as much hard work as I first expected. For simplicity, I've assumed that all "data" reads/writes are done in SRAM. This is not as big an assumption as it first appears, because only PUSH, POP, LDMIA and STMIA ever use cycles in this way, and I'm fairly confident those instructions rarely (if ever) read/write Flash. The other assumption is that the MAM caching is essentially perfect and, when enabled, Flash memory is never touched. N and S cycles are stretched according to the speed of the memory being addressed. In previous versions I used a flat value of 2 (which was a reasonable first estimate) for N cycles only, in all instances. This led to a reasonable average result, but the new version should be more accurate in more situations. I've also added some more ARM options to the preferences window, which can be summoned in playmode as well as in the debugger. The full list of ARM options is now:
     * Immediate ARM execution - Thumb program returns immediately and consumes no 6507 time
     * Default MAM Enable for Thumb Programs - assume the Harmony driver is enabling the MAM
     * Allow MAM Enable from Thumb - allow the enabling of the MAM from within the Thumb program. From what I understand, some editions of the Harmony do not allow this. I've added this option in case there are versions or variants which do allow it.
     The Timings sliders:
     * ARM Clock - the basic speed of the ARM
     * Flash Access Time and SRAM Access Time - speed in nanoseconds. The slower the memory, the more stretching for N and S cycles.
I'm not sure if my default speed values are correct, but these are the values that seem to hit the sweet spot for the collection of ARM ROMs I have available. I plan to do some more work on this this week: checking for accuracy and adding some instrumentation to the debugger. Here's a short video showing the effect of changing memory speed on the Gorf Arcade title screen. Apologies for my poor screen-roll emulation; that's next on the TODO list. Source on GitHub. debugger-2021-06-02_19.06.09.mp4
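To show how the two MAM options might interact, here is a minimal Go sketch of a preferences struct and a handler for Thumb-program writes to the MAM control register. Assumptions, all mine: the register is MAMCR at 0xE01FC000 as on the LPC2000 family; "Default MAM Enable" puts the MAM in mode 2; turning the MAM off is always honoured. Field and method names are hypothetical and not taken from Gopher2600's source.

```go
package main

import "fmt"

// mamcr is the MAM control register address on LPC2000-family parts
// (assumed to apply to the Harmony's LPC2103).
const mamcr = 0xE01FC000

// armPrefs mirrors the ARM options listed above (hypothetical names).
type armPrefs struct {
	ImmediateExecution bool // Thumb program returns at once, no 6507 time
	DefaultMAMEnable   bool // assume the driver enables the MAM
	AllowMAMFromThumb  bool // honour MAM enables made by the Thumb program
	ClockMHz           float64
	FlashNs            float64
	SRAMNs             float64
}

type arm struct {
	prefs   armPrefs
	mamMode uint32
}

// startThumb sets the MAM state the Thumb program sees on entry.
func (a *arm) startThumb() {
	if a.prefs.DefaultMAMEnable {
		a.mamMode = 2 // assumption: driver leaves the MAM fully enabled
	} else {
		a.mamMode = 0
	}
}

// writeMAMCR handles a store to MAMCR. Attempts to turn the MAM on are
// ignored unless enabling from Thumb is allowed; turning it off is always
// honoured (assumption).
func (a *arm) writeMAMCR(value uint32) {
	mode := value & 0x3
	if mode != 0 && !a.prefs.AllowMAMFromThumb {
		return
	}
	a.mamMode = mode
}

// write32 is a hypothetical hook for 32-bit stores made by the Thumb program.
func (a *arm) write32(addr, value uint32) {
	if addr == mamcr {
		a.writeMAMCR(value)
	}
}

func main() {
	a := arm{prefs: armPrefs{ClockMHz: 70}}
	a.startThumb()
	a.write32(mamcr, 2) // ignored: enabling from Thumb not allowed
	fmt.Println("MAM mode:", a.mamMode)

	a.prefs.AllowMAMFromThumb = true
	a.write32(mamcr, 2)
	fmt.Println("MAM mode:", a.mamMode)
}
```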
  22. Can anyone tell me what the access times for the Flash and SRAM are in the Harmony etc.? Flash is 50ns I think, but what's the SRAM access time?
  23. Yes. About halfway through the hill stage. Cycle counting is part of my emulator. It differs from Stella in that Stella executes the ARM program instantly, relative to the 6507. I've tried a different approach whereby the 6507 is stalled with NOPs, as on the real hardware. Cycle counting is tricky with the ARM, however, so it's not perfect, but it is helpful for making sure the program isn't going berserk. I hope to get the emulation to a state where it is accurate enough for optimisation work, but it's not there yet. This is the link to the README where I briefly discuss ARM emulation. https://github.com/JetSetIlly/Gopher2600#arm7tdmi-emulation (Current code supports CDFJ+ but I've not prepared a binary yet.) Ah. Of course. I didn't think about it being enabled in the driver.
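The difference from Stella's instant execution comes down to converting ARM cycles into a number of 6507 cycles during which the 6507 is held on NOPs. A minimal Go sketch, assuming a 70 MHz ARM clock (my assumption for the Harmony; it is one of the adjustable sliders) and the NTSC 6507 clock of 3.579545 MHz / 3, about 1.19 MHz. The function name and the round-up choice are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

const (
	armClockHz = 70_000_000        // assumed Harmony ARM clock
	cpuClockHz = 3_579_545.0 / 3.0 // NTSC 6507 clock, about 1.19 MHz
)

// stallCycles returns how many 6507 cycles the 6507 must be held on NOPs
// while the ARM executes armCycles cycles, rounded up to whole cycles.
func stallCycles(armCycles int) int {
	seconds := float64(armCycles) / armClockHz
	return int(math.Ceil(seconds * cpuClockHz))
}

func main() {
	// A Thumb routine that takes 10,000 ARM cycles.
	n := stallCycles(10000)
	fmt.Println("6507 cycles stalled:", n)

	// One NTSC scanline is 76 6507 cycles.
	fmt.Printf("about %.1f scanlines\n", float64(n)/76)
}
```

Running too long here shows up naturally as lost scanlines, which is what lets the emulator produce a screen roll when a Thumb program overruns.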