llabnip Posted November 18, 2022
Yeah... I noticed that with the latest Chaotic Grill ROM that dropped earlier this week. I haven't had a chance to debug yet. There's a chance it relies on something I've optimized away. For example, to gain speed on the DS, I don't wrap the fast fetch index as Stella does - assuming that it will always be set/reset before the next block of transfers. It might be something like that. So many things to work on... so little hobby time!
SpiceWare Posted November 18, 2022
2 hours ago, llabnip said: "It might be something like that."
That appears to be it - datastream 5 is being used to update color and graphics of both players and is wrapping from the end to the beginning of the 4K Display Data RAM.
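The wrap under discussion can be illustrated with a minimal sketch. It assumes a 4K Display Data RAM as stated above; the `DataFetcher` struct and both function names are hypothetical and are not taken from the Stella or StellaDS sources, and the real DPC+ fetchers carry more state than a single pointer.

```c
#include <stdint.h>

#define DISPLAY_RAM_SIZE 4096u                    /* 4K Display Data RAM, per the thread */
#define DISPLAY_RAM_MASK (DISPLAY_RAM_SIZE - 1u)  /* works because the size is a power of 2 */

/* Hypothetical, simplified data fetcher: just a pointer into Display Data RAM. */
typedef struct {
    uint16_t pointer;
} DataFetcher;

/* Wrapping read: the pointer is masked on every access, so a stream that
 * runs off the end of the 4K region continues from offset 0. */
static inline uint8_t fetch_wrapped(DataFetcher *df, const uint8_t *display_ram)
{
    uint8_t value = display_ram[df->pointer & DISPLAY_RAM_MASK];
    df->pointer = (uint16_t)((df->pointer + 1u) & DISPLAY_RAM_MASK);
    return value;
}

/* Unwrapped fast path: saves the masking, but is only valid if the caller
 * guarantees the pointer is reset before it can reach DISPLAY_RAM_SIZE. */
static inline uint8_t fetch_unwrapped(DataFetcher *df, const uint8_t *display_ram)
{
    return display_ram[df->pointer++];
}
```

The trade-off is the one described in the posts: the unwrapped path does less work per fetch, but a ROM whose datastream crosses the end of the region reads the wrong bytes.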
llabnip Posted November 18, 2022
Holy crap - what kind of wizard are you, Darrell?! I'll add back the wrap... it drops Space Rocks by almost 1 frame/sec but I'll find a way to get that speed back elsewhere!
SpiceWare Posted November 18, 2022
29 minutes ago, llabnip said: "Holy crap - what kind of wizard are you, Darrell?!"
I'm very familiar with DPC+; it exists because I posted "expanded DPC?" after experimenting with the Harmony's DPC driver (for Pitfall 2) back in 2010:
On 3/13/2010 at 9:55 PM, SpiceWare said: "Did some experimenting with DPC today, very nice. Sprite demo posted here. Would it be possible to expand DPC to support 7 banks of data and 4K for graphics data?
xFF5 = bank 0
xFF6 = bank 1
xFF7 = bank 2
xFF8 = bank 3
xFF9 = bank 4
xFFA = bank 5
xFFC = bank 6"
Over the next couple of months batari wrote the DPC+ driver for the Harmony/Melody while I wrote the CartridgeDPCPlus class in Stella as well as the original demos. As such, when you said "I don't wrap the fast fetch index" I knew exactly what you were talking about.
splendidnut Posted November 18, 2022
Oh, oops, that data fetcher wrapping on the ChaoticGrill title screen is unintentional. I didn't realize that was happening... You can blame me for being lazy and not applying the windowing feature to that data... which might be something that I want to add in after seeing this.
llabnip Posted November 18, 2022
I'd like to claim this was a "developer mode" on StellaDS but... it's just me trying to keep the optimization super high to run on the venerable Nintendo hardware.
splendidnut Posted November 18, 2022
1 hour ago, splendidnut said: "You can blame me for being lazy and not applying the windowing feature to that data... which might be something that I want to add in after seeing this."
Actually... I just checked the code and I am using the windowing feature. I poked at the code a bit and was able to make some adjustments so the data pointer will stay within range now. It'll be in the next build, whenever that is.
llabnip Posted November 19, 2022
Okay... daily build StellaDS 5.8a checked in with the update to wrap the fast fetchers. Chaotic Grill looks much better. I did a bit of optimization to almost offset the performance hit... it's probably less than half a frame per second now.
llabnip Posted November 21, 2022
Version 5.9 is released! https://github.com/wavemotion-dave/StellaDS
V5.9 : 21-Nov-2022 by wavemotion-dave (aka llabnip)
- Minor fixes for some games to render them more accurately including the new Chaotic Grill homebrew.
- Improved ARM Thumbulator for another frame of performance.
- Minor cleanups and optimizations across the board.
Another frame of performance on DPC+ games using the ARM Thumbulator - 90% of those games will run full speed on the DSi with notable holdouts being Scramble (but still playable) and Space Rocks which is fine with just a hint of slowdown when there are about one billion** rocks floating about.
**Did not actually count the rocks. Estimates may be skewed based on perception and length of time spent trying to optimize the ARM core.
SpiceWare Posted November 21, 2022
Nice! After having remapped controls for AD&D in NINTV-DS, I've found that Space Rocks plays best if I map B and X to Joystick Up and Joystick Down. I have noticed a very minor issue with Space Invaders:
1. Launch StellaDS
2. Tap on Cartridge Slot to select game
3. Select Space Invaders
4. Hold down DS's START button
5. Tap A
6. Space Invaders starts up with double-shots 😁
Versus:
1. Launch StellaDS
2. Tap on Cartridge Slot to select game
3. Start any game
4. Tap on Cartridge Slot to select new game
5. Select Space Invaders
6. Hold down DS's physical START button
7. Tap A button to load game
8. Space Invaders starts up with single-shot ☹️
Basically the double-shot trick works, but only if Space Invaders is the first game loaded.
llabnip Posted November 21, 2022
You know... there was a time when I knew that was a "thing" for Space Invaders (as it's one of my favorite classics on the 2600). I had completely forgotten over the years! I didn't know it worked under emulation. But with your detailed report, I can clearly see why it works the first time and not subsequent loads. On the first load, we check the switches and then run the first frame of emulation. On subsequent loads, that's done after we set the switches so holding START button will run the first frame without reset (and the reset switch will not be seen until the 2nd and subsequent frames). Easy fix for something I didn't even know I half-handled.
5.9a daily build checked in with this fix... and it will go out in the official build 6.0 in a few weeks. Thanks again!
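The ordering issue described above can be modeled with a toy sketch. Everything here is hypothetical (the `ToyConsole` struct, the function names, and the idea that the game latches its mode on frame 0); it is not the StellaDS load path, only an illustration of why sampling the switches before or after the first emulated frame changes what the game sees.

```c
#include <stdbool.h>

/* Toy model of a console whose game checks RESET only on its first frame. */
typedef struct {
    bool reset_switch;   /* console RESET as the emulated machine sees it */
    bool double_shot;    /* toy game state latched on the first frame */
    int  frames_run;
} ToyConsole;

/* Stand-in for one frame of emulation. */
static inline void run_frame(ToyConsole *c)
{
    if (c->frames_run == 0 && c->reset_switch) {
        c->double_shot = true;   /* power-on trick seen by the game */
    }
    c->frames_run++;
}

/* Order A: copy the held button into the switch state, then run frame 0. */
static inline void load_switches_first(ToyConsole *c, bool start_held)
{
    c->reset_switch = start_held;
    run_frame(c);
}

/* Order B: run frame 0 first; the held button is only visible from frame 1 on. */
static inline void load_frame_first(ToyConsole *c, bool start_held)
{
    run_frame(c);
    c->reset_switch = start_held;
}
```

In this model only order A lets the held button reach the game on its first frame, which matches the double-shot versus single-shot behavior in the report.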
SpiceWare Posted November 21, 2022
3 hours ago, llabnip said: "On the first load, we check the switches and then run the first frame of emulation. On subsequent loads, that's done after we set the switches so holding START button will run the first frame without reset (and the reset switch will not be seen until the 2nd and subsequent frames)."
With that:
1. Launch StellaDS
2. Tap on Cartridge Slot to select game
3. Start any game
4. Hold down DS's physical START button
5. Tap on Cartridge Slot to select new game (OK to release START after game list shows up)
6. Select Space Invaders
7. Tap A button to load game
8. Space Invaders starts up with double-shots 😁
SpiceWare Posted November 22, 2022
20 hours ago, llabnip said: "I didn't know it worked under emulation."
On the computer you can right-click a game in Stella's ROM list to access the Power-On options. Checking Reset will start Space Invaders with double-shots. Since I knew it worked there, I was curious if I could replicate it in StellaDS. As a developer I also find the Startup mode option to be very useful - I can tell Stella to go immediately into the debugger. From the changelog, it was added in 2013:
Quote: August 21, 2013. Stella release 3.9.1 for Linux, MacOS X and Windows is now available. ...
Renamed 'Override properties' dialog (accessible from the ROM launcher by a right-mouse-button click) to 'Power-on options', with the following new options:
- Set start-up state for both joysticks as well as console select/reset buttons. Related to this, added 'holdjoy01' and 'holdjoy1' commandline arguments, and removed 'holdbutton0' argument.
- The ability to load the ROM directly from this dialog, after changing any settings, and also to start in the debugger.
- Added more detailed information as to how to use this functionality to the UI.
- Buttons held down are reset approx. 0.5 seconds after starting the ROM, to simulate pressing and releasing the buttons on a real console.
llabnip Posted November 22, 2022
Yeah, the full-blown Stella is the 8th wonder of the world, IMO. Incredible how much awesome is packed into that package. I threw them some financial support last year but I think it's time to do so again. I'm almost embarrassed by the tricks I've had to pull to get any of that running properly on the DS/DSi/3DS. At the top of my emulator core source headers, I place this disclaimer:
// This file has been modified by wavemotion-dave for optimized execution on the DS/DSi platform.
// Please seek the official Stella source distribution which is far cleaner, newer, and better maintained.
With coffee this morning, I did get another frame out of the ARM Thumb emulation... I have a new configuration option for 'Thumb Optimization' which is shorthand for 'Stuff that is unsafe but probably works'. I turn it on for Space Rocks, Stay Frosty 2 and Scramble only (but it can be enabled/disabled for any ARM assisted game). Basically it eliminates the ARM Thumb V overflow flag for ADDs, SUBs and MULs. I have not seen any game utilize the overflow flag for basic arithmetic operations - and if the ARM code is written in C, there may not be any convenient way to look at the overflow flag anyway. I've seen no ill effects yet - still testing. And one frame of boosted performance represents about 10-20 hours of my life-energy via more traditional optimizations.
I'm still toying with adding support for CDFJ. I realize you're the 'D'. The 32K version with 8K RAM should be doable as I have 8K of fast RAM (which I'm already using for DPC+). But I'm worried that it will be an uphill climb and still won't be playable on the little handheld-that-could. As you know, Space Rocks, Stay Frosty 2 and Scramble use more CPU time than I have available... and I can't imagine CDFJ will be any less taxing. I also can't process the fast music fetchers with any fidelity. My one hope is that the balance shifts from 6502 CPU to ARM Thumb which, truth be told, is leaner and faster than the 6502 (emulation wise - the entire ARM emulation fits in my fast ITCM ARM memory). But it's the TIA processing with collision detection that really soaks up most of my CPU processing... anyway, I go back and forth on whether to add the support. It was a solid 80 hours to get DPC+ working to the point where games are playable.
Edited November 22, 2022 by llabnip
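To make the 'Thumb Optimization' idea concrete, here is a minimal sketch of a flag-setting ADD in an ARM emulator, with and without the overflow (V) calculation. The `Flags` struct and function names are hypothetical and this is not the StellaDS Thumbulator code; it only shows the work that is skipped.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical condition-flag state for an emulated ARM core. */
typedef struct {
    bool n, z, c, v;
} Flags;

/* Full flag-setting ADD: computes N, Z, C and V. */
static inline uint32_t adds_full(uint32_t a, uint32_t b, Flags *f)
{
    uint32_t r = a + b;
    f->n = (r >> 31) != 0;   /* negative: top bit of result */
    f->z = (r == 0);         /* zero */
    f->c = (r < a);          /* unsigned carry out */
    /* signed overflow: operands share a sign and the result's sign differs */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) != 0;
    return r;
}

/* "Unsafe but probably works" variant: V is left untouched, saving the
 * extra XOR/AND/shift work on every flag-setting ADD. */
static inline uint32_t adds_no_v(uint32_t a, uint32_t b, Flags *f)
{
    uint32_t r = a + b;
    f->n = (r >> 31) != 0;
    f->z = (r == 0);
    f->c = (r < a);
    return r;
}
```

The risk is that signed condition codes such as GE, LT, GT and LE read V, so a conditional branch that directly follows a flag-setting ADD or SUB could take the wrong path when V is stale. That fits the post's decision to make this a per-game option.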
SpiceWare Posted November 22, 2022
55 minutes ago, llabnip said: "But it's the TIA processing with collision detection that really soaks up most of my CPU processing..."
🤔 I just searched the source of Space Rocks, Draconian, and Frantic for CX - not a single instance of TIA collision detection registers. My object multiplexers mean players/missiles/ball that are colliding will often not be drawn on the same frame. So I have to use software collision detection. @johnnywc might likewise only be using software collision detection. As such, you might be able to gain performance if you had an option to enable/disable TIA collision detection for specific games. Stay Frosty 2 does use them, so disabling TIA collisions for it would make the game unplayable.
llabnip Posted November 22, 2022
Cool! TIA processing is a combo of colors and collisions... but I temporarily removed the collision detection just to see what effect it would have and it gained me almost 2 full frames of performance on Space Rocks and about 1.5 frames of performance on Scramble (which seems to play fine in a 5 minute test). As you mention, Stay Frosty 2 doesn't work as you just float through platforms and can't collect objects, etc. That's enough of a boost that I may add an option to skip collision detection and enable it for the two games that need it most.
SpiceWare Posted November 22, 2022
Nice! I just did a quick test of RobotWar and Qyx in Stella, used the DeveloperKeys to toggle TIA Collision off for each object and it played OK. Likewise SF2 was unplayable. So CDF games that don't use TIA collisions are probably doable. And there's a chance the voices in Draconian would work; there's a lot less ARM overhead for samples vs music. Samples are stored as 2 nybbles per byte, so it's just splitting a byte in half. For 1 sample the value can be returned as is because AUDVx ignores the upper nybble anyway. For the other sample just return a simple sample>>4, which is super fast with the ARM's barrel shifter.
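The nybble-splitting described above might look like the following sketch. The function names are hypothetical, and which nybble plays first (high nybble on even sample indexes here) is an assumption made only for illustration; the post does not say.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns the byte to write to AUDVx for the given sample index.
 * Two 4-bit samples are packed per byte. ASSUMPTION: even indexes use the
 * high nybble, odd indexes the low nybble. */
static inline uint8_t sample_for_audv(const uint8_t *packed, size_t index)
{
    uint8_t byte = packed[index >> 1];
    if (index & 1u) {
        /* Low-nybble sample: return the byte unchanged, since the volume
         * register only looks at the lower 4 bits. */
        return byte;
    }
    /* High-nybble sample: a single shift moves it into the low 4 bits. */
    return (uint8_t)(byte >> 4);
}

/* What a 4-bit volume register effectively keeps of a written value. */
static inline uint8_t audv_effective(uint8_t written)
{
    return (uint8_t)(written & 0x0Fu);
}
```

For example, with the packed byte 0xA3 this sketch yields 0x0A for sample 0 and writes 0xA3 for sample 1, which the register treats as 0x03.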
johnnywc Posted November 22, 2022
3 hours ago, SpiceWare said: "@johnnywc might likewise only be using software collision detection."
I think the only 2 games of mine that use hardware collision are Conquest of Mars and Avalanche. All others use software collision detection.
llabnip Posted November 22, 2022
Well... it's all working... I think. For the first time, the Space Rocks title screen and the entire first wave of rocks will sustain 60 fps. This is with every optimization enabled and reducing the sound quality (the sound quality was always reduced for DPC+ games but with recent speedups, I've restored the normal 20 kHz sound and let the user select 10/15/20/30 kHz - these are rough numbers as the internal Atari to DS conversions are optimized for powers of 2). With more rocks and especially when the ship is firing, it will momentarily drop down to mid-to-upper 50s which is still playable.
As an aside... Adventure is one of my 20 benchmark games (mostly because I love the game and it comes first, alphabetically, in my favorites folder) and I just hit unthrottled 180 fps. That is, I can (if I so choose) now run 3 simultaneous Adventure games on the DSi/3DS. I like to think of these as SAUs (Standard Adventure Units) and I'm now at 3.0 SAUs. Unfortunately, I'm only at 0.8 SUs (Scramble Units).
SpiceWare Posted November 22, 2022
53 minutes ago, llabnip said: "As an aside... Adventure is one of my 20 benchmark games (mostly because I love the game and it comes first, alphabetically, in my favorites folder) and I just hit unthrottled 180 fps."
🤔 I've not put much time into Treasure of Tarmin yet, but I don't think it requires fast reactions like Cloud Mountain. Unthrottling it might help with the slow screen refresh while moving thru the maze.
llabnip Posted November 22, 2022
With NINTV-DS you can set the speed to 130% (or whatever) and it will actually sync the video and sound to that speed. That's what I do to move through the maze faster on Tarmin.
llabnip Posted November 23, 2022
With coffee this morning, I did some profiling of the emulation. These are rough numbers - but probably fairly accurate representations of where the emulator is spending most of its time with the two more challenging games (values obtained by running the actual game for 5 minutes each).

Game          6502   TIA   ARM   Other
Space Rocks   30%    25%   40%   5%
Scramble      35%    30%   33%   2%

6502 represents the CPU execution minus any time spent in TIA or calling into the ARM function (but does include fast fetchers). TIA represents only rendering the screen as I've disabled "hardware" collision detection for these games. Other is stuff like sound or RIOT processing... This is about what I'd expect, all things considered. The removal of HW collision detections only had a small percentage difference - there is still a ton of TIA processing happening. I feel like I'm down to almost the bare-metal with optimization of these three big areas. I'm shifting to seeing if I can render the NDS display faster using DMA and some other fast-memory tricks.
SpiceWare Posted November 23, 2022
4 hours ago, llabnip said: "I feel like I'm down to almost the bare-metal with optimization of these three big areas. I'm shifting to seeing if I can render the NDS display faster using DMA and some other fast-memory tricks."
Don't know if this is relevant to the DS, or with the compiler you're using, but just in case: On the Harmony/Melody the ARM has a large amount of Flash memory (32K) and a small amount of SRAM (8K). As covered by @cd-w in this blog entry, the Flash is slower than SRAM. The game's ROM, and thus custom ARM code, ends up in Flash with 4 cycle access time. We figured out how to run time critical functions in SRAM for a 23% boost in performance due to its 1 cycle access time. It's not 1/4th the original runtime because MAM (Memory Accelerator Module - aka caching) helps the Flash code run faster. Also in this 2-page discussion @Thomas Jentzsch figured out how to have the compiler optimize specific functions for performance, even in Flash, with similar speed gains. Search on optimize(3) to find those.
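For readers unfamiliar with the optimize(3) reference, here is a minimal sketch of per-function optimization using GCC's `optimize` function attribute. The `HOT_FUNC` macro and the `sum_bytes` function are hypothetical examples, and the guard that disables the attribute on non-GCC compilers is an added assumption; the linked discussion may apply it differently.

```c
#include <stdint.h>

/* GCC supports a per-function optimization level via the "optimize"
 * attribute. Other compilers may not recognize it, so this hypothetical
 * macro expands to nothing there. */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_FUNC __attribute__((optimize(3)))
#else
#define HOT_FUNC
#endif

/* Hypothetical time-critical routine: built as if with -O3 even when the
 * rest of the file uses a size-oriented level such as -Os. */
HOT_FUNC uint32_t sum_bytes(const uint8_t *data, uint32_t count)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < count; i++) {
        total += data[i];
    }
    return total;
}
```

The appeal of this approach on a small cart is that only the handful of hot functions pay the code-size cost of aggressive optimization.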
llabnip Posted November 23, 2022
The DS is a bit of a mess... nothing runs from flash - it's all copied into the "massive" 4MB SRAM (16MB on the DSi) but this SRAM is not fast. Even worse... they hooked it up to a 16-bit bus so there are wait states that clobber performance. To compensate, they have working RAM which is fast... this comes in several flavors... 16K DTCM (Data Tightly Coupled Memory) and 32K ITCM (Instruction Tightly Coupled Memory) which the programmer has lots of control over... plus some data/instruction cache which the programmer doesn't have as much control over. This working RAM has a real 32-bit bus access. There is also 16-bit Video VRAM which is faster than SRAM and slower than working RAM. But it can be general-purposed (and of the 700K of VRAM, I have about half available as pseudo-fast-RAM).
I'm just about out of the fast working RAM. I use a general purpose 8K fast buffer for a lot of things... for 4K or 8K carts (or AR carts) I can store the entire program in that memory (I also allocate another fast 256 bytes of Atari RAM for the 6502 RAM plus SARA chip RAM). And there are piles of fetchers and counters and two sets of CPU registers (6502 vs ARM) which I also put in fast memory. 16K doesn't go that far!
The cache is the biggest mystery... move one line of code and something falls out of a 32-byte 'cache line' and performance can swing one way or the other. It can be frustrating to make a change you know is fewer cycles only to upset the cache-line balance and lose a frame of performance. The DS is part science, part Voodoo.
Having said all this, I did have a revelation in the middle of the night. The DS is an ARM9 core which has Thumb support. In theory, I wouldn't even have to emulate the ARM code. But the DS is not a Harmony cart and so the RAM won't be in the right spot... but it could possibly be managed/redirected to work. But I got a headache thinking about how to do it... and instead I just went back to sleep.
llabnip Posted November 24, 2022
@SpiceWare Well... it's not going to set any land-speed-records... but it's something.