
Any info on StellaDS?


AgentOrange96


Yeah... I noticed that with the latest Chaotic Grill ROM that dropped earlier this week. I haven't had a chance to debug yet. 

 

There's a chance it relies on something I've optimized away. For example, to gain speed on the DS, I don't wrap the fast fetch index as Stella does - assuming that it will always be set/reset before the next block of transfers. It might be something like that. 
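To make the trade-off concrete, here is a minimal sketch of a wrapped versus an unwrapped fetch. It is not StellaDS or Stella code: the 4K display-data size is an assumption, `display_data`, `fetch_wrapped` and `fetch_unwrapped` are hypothetical names, and the one spare byte exists only so the sketch stays in bounds.

```c
#include <stdint.h>

#define DISPLAY_SIZE 4096u               /* assumed 4K display-data area */
#define DISPLAY_MASK (DISPLAY_SIZE - 1u)

/* Hypothetical display data; one spare byte so the unwrapped read in
   this sketch cannot run past the array. */
static uint8_t display_data[DISPLAY_SIZE + 1u];

/* Safe: the index is masked on every fetch, so it wraps at 4K. */
static uint8_t fetch_wrapped(uint16_t *index)
{
    uint8_t value = display_data[*index & DISPLAY_MASK];
    *index = (uint16_t)((*index + 1u) & DISPLAY_MASK);
    return value;
}

/* Faster: no mask. Correct only while the game sets/resets the index
   before it reaches DISPLAY_SIZE. */
static uint8_t fetch_unwrapped(uint16_t *index)
{
    return display_data[(*index)++];
}
```

A game that lets the index run past the end and expects it to wrap would read the right byte from `fetch_wrapped` and the wrong one from `fetch_unwrapped`, which is the kind of mismatch suspected here.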

 

So many things to work on... so little hobby time!

 

 


29 minutes ago, llabnip said:

Holy crap - what kind of wizard are you, Darrell?!  :D

 

I'm very familiar with DPC+; it exists because I posted "expanded DPC?" after experimenting with the Harmony's DPC driver (for Pitfall 2) back in 2010:

 

On 3/13/2010 at 9:55 PM, SpiceWare said:

Did some experimenting with DPC today, very nice. Sprite demo posted here.

 

Would it be possible to expand DPC to support 7 banks of data and 4K for graphics data?

 

xFF5 = bank 0

xFF6 = bank 1

xFF7 = bank 2

xFF8 = bank 3

xFF9 = bank 4

xFFA = bank 5

xFFB = bank 6

 

Over the next couple of months, batari wrote the DPC+ driver for the Harmony/Melody while I wrote the CartridgeDPCPlus class in Stella as well as the original demos.

 

 

 

As such, when you said "I don't wrap the fast fetch index" I knew exactly what you were talking about.


1 hour ago, splendidnut said:

You can blame me for being lazy and not applying the windowing feature to that data... which might be something that I want to add in after seeing this.

Actually... I just checked the code and I am using the windowing feature.  I poked at the code a bit and was able to make some adjustments so the data pointer will stay within range now.  It'll be in the next build, whenever that is.


Version 5.9 is released! https://github.com/wavemotion-dave/StellaDS 

 

V5.9 : 21-Nov-2022 by wavemotion-dave (aka llabnip)

  • Minor fixes for some games to render them more accurately including the new Chaotic Grill homebrew.
  • Improved ARM Thumbulator for another frame of performance.
  • Minor cleanups and optimizations across the board

Another frame of performance on DPC+ games using the ARM Thumbulator - 90% of those games will run full speed on the DSi with notable holdouts being Scramble (but still playable) and Space Rocks which is fine with just a hint of slowdown when there are about one billion** rocks floating about.

 

 

**Did not actually count the rocks. Estimates may be skewed based on perception and length of time spent trying to optimize the ARM core.


Nice!

 

After having remapped controls for AD&D in NINTV-DS, I've found that Space Rocks plays best if I map B and X to Joystick Up and Joystick Down.

 

I have noticed a very minor issue with Space Invaders:

  1. Launch StellaDS
  2. Tap on Cartridge Slot to select game
  3. Select Space Invaders
  4. Hold down DS's START button
  5. Tap A

Space Invaders starts up with double-shots 😁

 

  1. Launch StellaDS
  2. Tap on Cartridge Slot to select game
  3. Start any game
  4. Tap on Cartridge Slot to select new game
  5. Select Space Invaders
  6. Hold down DS's physical START button
  7. Tap A button to load game

Space Invaders starts up with single-shot ☹️

 

 

Basically the double-shot trick works, but only if Space Invaders is the first game loaded.


You know... there was a time when I knew that was a "thing" for Space Invaders (as it's one of my favorite classics on the 2600). I had completely forgotten over the years!

 

I didn't know it worked under emulation. But with your detailed report, I can clearly see why it works the first time and not on subsequent loads. On the first load, we check the switches and then run the first frame of emulation. On subsequent loads, that's done after we set the switches, so holding the START button will run the first frame without reset (and the reset switch will not be seen until the 2nd and subsequent frames).
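The ordering issue can be modeled in a few lines. This is only a sketch of the behavior described above, not the StellaDS load path: `BootTrace` and `boot_game` are hypothetical names, and the whole console is reduced to a single reset flag.

```c
#include <stdbool.h>

#define FRAMES 3

/* Hypothetical model: records, for each of the first FRAMES frames,
   whether the emulated console saw RESET held down. */
typedef struct {
    bool reset_seen[FRAMES];
} BootTrace;

/* switches_first = true  : read the DS buttons, then run frame 0.
   switches_first = false : run frame 0, then read the DS buttons. */
static BootTrace boot_game(bool start_held, bool switches_first)
{
    BootTrace trace;
    bool console_reset = false;   /* state latched into the emulated console */

    for (int frame = 0; frame < FRAMES; frame++) {
        if (switches_first || frame > 0) {
            console_reset = start_held;   /* poll switches before this frame */
        }
        trace.reset_seen[frame] = console_reset;
    }
    return trace;
}
```

With START held, `boot_game(true, true)` sees reset on frame 0 (double-shots), while `boot_game(true, false)` only sees it from frame 1 onward, by which point the game has already started normally.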

 

Easy fix for something I didn't even know I half-handled :)

 

5.9a daily build checked in with this fix... and it will go out in the official build 6.0 in a few weeks.

 

Thanks again!


3 hours ago, llabnip said:

On the first load, we check the switches and then run the first frame of emulation. On subsequent loads, that's done after we set the switches, so holding the START button will run the first frame without reset (and the reset switch will not be seen until the 2nd and subsequent frames).

 

With that:

  1. Launch StellaDS
  2. Tap on Cartridge Slot to select game
  3. Start any game
  4. Hold down DS's physical START button
  5. Tap on Cartridge Slot to select new game (OK to release START after game list shows up)
  6. Select Space Invaders
  7. Tap A button to load game

Space Invaders starts up with double-shots 😁


20 hours ago, llabnip said:

I didn't know it worked under emulation.

 

On the computer, you can right-click a game in Stella's ROM list to access the Power-On options. Checking Reset will start Space Invaders with double-shots. Since I knew it worked there, I was curious if I could replicate it in StellaDS.

 

As a developer, I also find the Startup mode option very useful - you can tell Stella to go immediately into the debugger.

 

From the changelog it was added in 2013:

 

Quote

August 21, 2013

 

Stella release 3.9.1 for Linux, MacOS X and Windows is now available.

 

...

  • Renamed 'Override properties' dialog (accessible from the ROM launcher by a right-mouse-button click) to 'Power-on options', with the following new options:
    • Set start-up state for both joysticks as well as console select/reset buttons. Related to this, added 'holdjoy0' and 'holdjoy1' commandline arguments, and removed 'holdbutton0' argument.
    • The ability to load the ROM directly from this dialog, after changing any settings, and also to start in the debugger.
    • Added more detailed information as to how to use this functionality to the UI.
    • Buttons held down are reset approx. 0.5 seconds after starting the ROM, to simulate pressing and releasing the buttons on a real console.


Yeah, the full-blown Stella is the 8th wonder of the world, IMO.  Incredible how much awesome is packed into that package. I threw them some financial support last year but I think it's time to do so again. 

 

I'm almost embarrassed by the tricks I've had to pull to get any of that running properly on the DS/DSi/3DS.

 

At the top of my emulator core source headers, I place this disclaimer:

// This file has been modified by wavemotion-dave for optimized execution on the DS/DSi platform.
// Please seek the official Stella source distribution which is far cleaner, newer, and better maintained.


With coffee this morning, I did get another frame out of the ARM Thumb emulation... I have a new configuration option for 'Thumb Optimization' which is shorthand for 'Stuff that is unsafe but probably works'.  I turn it on for Space Rocks, Stay Frosty 2 and Scramble only (but it can be enabled/disabled for any ARM assisted game).  Basically it eliminates the ARM Thumb V overflow flag for ADDs, SUBs and MULs. I have not seen any game utilize the overflow flag for basic arithmetic operations - and if the ARM code is written in C, there may not be any convenient way to look at the overflow flag anyway.   I've seen no ill effects yet - still testing. And one frame of boosted performance represents about 10-20 hours of my life-energy via more traditional optimizations :)
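As an illustration of what skipping the V flag saves, here is a minimal sketch of a flag-setting 32-bit add. It is not the Thumbulator or StellaDS code: `Flags`, `thumb_adds` and the `skip_overflow` parameter are hypothetical names standing in for the 'Thumb Optimization' setting.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool n, z, c, v;   /* ARM condition flags */
} Flags;

/* ADDS-style add: returns a + b and updates the flags.
   When skip_overflow is true the V flag is left untouched, which is
   the "unsafe but probably works" shortcut described above. */
static uint32_t thumb_adds(uint32_t a, uint32_t b, Flags *f, bool skip_overflow)
{
    uint32_t result = a + b;

    f->n = (result >> 31) != 0;
    f->z = (result == 0);
    f->c = (result < a);              /* unsigned carry out of bit 31 */
    if (!skip_overflow) {
        /* Signed overflow: operands share a sign that the result lacks. */
        f->v = ((~(a ^ b) & (a ^ result)) >> 31) != 0;
    }
    return result;
}
```

The shortcut only goes wrong if later code branches on a signed condition (such as GT or LT) that reads V after an add that really did overflow.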

I'm still toying with adding support for CDFJ. I realize you're the 'D'. The 32K version with 8K RAM should be doable as I have 8K of fast RAM (which I'm already using for DPC+).  But I'm worried that it will be an uphill climb and still won't be playable on the little handheld-that-could. As you know, Space Rocks, Stay Frosty 2 and Scramble use more CPU time than I have available... and I can't imagine CDFJ will be any less taxing.  I also can't process the fast music fetchers with any fidelity. My one hope is that the balance shifts from 6502 CPU to ARM Thumb which, truth be told, is leaner and faster than the 6502 (emulation wise - the entire ARM emulation fits in my fast ITCM ARM memory). But it's the TIA processing with collision detection that really soaks up most of my CPU processing... anyway, I go back and forth on whether to add the support.  It was a solid 80 hours to get DPC+ working to the point where games are playable.


55 minutes ago, llabnip said:

But it's the TIA processing with collision detection that really soaks up most of my CPU processing...

 

🤔

 

I just searched the source of Space Rocks, Draconian, and Frantic for CX - not a single instance of TIA collision detection registers.

 

My object multiplexers mean that players/missiles/ball that are colliding will often not be drawn on the same frame, so I have to use software collision detection. @johnnywc might likewise only be using software collision detection.

 

As such, you might be able to gain performance if you had an option to enable/disable TIA collision detection for specific games.

 

Stay Frosty 2 does use them, so disabling TIA collisions for it would make the game unplayable.


Cool!  TIA processing is a combo of colors and collisions... but I temporarily removed the collision detection just to see what effect it would have and it gained me almost 2 full frames of performance on Space Rocks and about 1.5 frames of performance on Scramble (which seems to play fine in a 5-minute test).  As you mention, Stay Frosty 2 doesn't work as you just float through platforms and can't collect objects, etc. 

 

That's enough of a boost that I may add an option to skip collision detection and enable it for the two games that need it most. 
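A per-game switch that bypasses the collision work could look roughly like this. It is a minimal sketch rather than StellaDS code: `GameConfig`, `skip_collisions` and `update_collisions` are hypothetical names, and the model tracks only the player-to-player collision bit.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-game setting; the name is illustrative only. */
typedef struct {
    bool skip_collisions;
} GameConfig;

/* Bit masks for which objects cover the current pixel (simplified:
   two players only; a real TIA also has missiles, ball and playfield). */
enum { OBJ_P0 = 1 << 0, OBJ_P1 = 1 << 1 };

/* Updates a simplified collision latch for one pixel and returns it.
   With skip_collisions set, the per-pixel work is bypassed entirely. */
static uint8_t update_collisions(const GameConfig *cfg, uint8_t latch,
                                 uint8_t objects_on_pixel)
{
    if (cfg->skip_collisions) {
        return latch;
    }
    if ((objects_on_pixel & OBJ_P0) && (objects_on_pixel & OBJ_P1)) {
        latch = (uint8_t)(latch | 0x80u);   /* P0-P1 bit, as in CXPPMM bit 7 */
    }
    return latch;
}
```

Games that never read the CX registers would not notice the latch staying at zero; a game like Stay Frosty 2 would, so the setting has to default to off.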

 

 


Nice!  

 

I just did a quick test of RobotWar and Qyx in Stella: I used the DeveloperKeys to toggle TIA Collision off for each object and both played OK. As expected, SF2 was unplayable.  So CDF games that don't use TIA collisions are probably doable.

 

And there's a chance the voices in Draconian would work, there's a lot less ARM overhead for samples vs music.  Samples are stored as 2 nybbles per byte, so it's just splitting a byte in half. For 1 sample the value can be returned as is because AUDVx ignores the upper nybble anyway. For the other sample just return a simple sample>>4, which is super fast with the ARM's barrel shifter.
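The nybble split is small enough to sketch. This is an illustration only, not the CDFJ driver code: `sample_for_audv` is a hypothetical name, and which half of the byte plays first is an assumption.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns the value to write to AUDVx for the given sample number.
   Which half of the byte plays first is an assumption of this sketch. */
static uint8_t sample_for_audv(const uint8_t *packed, size_t sample_index)
{
    uint8_t byte = packed[sample_index >> 1];
    if (sample_index & 1u) {
        return byte;              /* low nybble; AUDVx ignores the upper bits */
    }
    return (uint8_t)(byte >> 4);  /* high nybble moved into bits 0-3 */
}
```

Only one of the two samples per byte needs any arithmetic at all, and that one is a single shift.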


Well... it's all working... I think.

 

For the first time, the Space Rocks title screen and the entire first wave of rocks will sustain 60 fps. This is with every optimization enabled and reducing the sound quality (the sound quality was always reduced for DPC+ games but with recent speedups, I've restored the normal 20KHz sound and let the user select 10/15/20/30KHz - these are rough numbers as the internal Atari to DS conversions are optimized for powers of 2). 

 


 

With more rocks and especially when the ship is firing, it will momentarily drop down to mid-to-upper 50s which is still playable.

 

As an aside... Adventure is one of my 20 benchmark games (mostly because I love the game and it comes first, alphabetically, in my favorites folder) and I just hit unthrottled 180 fps.

 


 

 

That is, I can (if I so choose) now run 3 simultaneous Adventure games on the DSi/3DS :)

 

I like to think of these as SAUs (Standard Adventure Units) and I'm now at 3.0 SAUs.

 

Unfortunately, I'm only at 0.8 SUs (Scramble Units).


53 minutes ago, llabnip said:

As an aside... Adventure is one of my 20 benchmark games (mostly because I love the game and it comes first, alphabetically, in my favorites folder) and I just hit unthrottled 180 fps.

 

🤔 I've not put much time into Treasure of Tarmin yet, but I don't think it required fast reactions like Cloud Mountain.  Unthrottling it might help with the slow screen refresh while moving thru the maze.


With coffee this morning, I did some profiling of the emulation. 

 

These are rough numbers - but probably fairly accurate representations of where the emulator is spending most of its time with the two more challenging games (values obtained by running the actual game for 5 minutes each). 

 

Game             6502   TIA   ARM   Other
Space Rocks      30%    25%   40%   5%
Scramble         35%    30%   33%   2%

 

6502 represents the CPU execution minus any time spent in TIA or calling into the ARM function (but does include fast fetchers).

TIA represents only rendering the screen as I've disabled "hardware" collision detection for these games.

Other is stuff like sound or RIOT processing... 

 

This is about what I'd expect, all things considered. The removal of HW collision detections only had a small percentage difference - there is still a ton of TIA processing happening.

 

I feel like I'm down to almost the bare-metal with optimization of these three big areas. I'm shifting to seeing if I can render the NDS display faster using DMA and some other fast-memory tricks.


4 hours ago, llabnip said:

I feel like I'm down to almost the bare-metal with optimization of these three big areas. I'm shifting to seeing if I can render the NDS display faster using DMA and some other fast-memory tricks.

 

 

Don't know if this is relevant to the DS, or with the compiler you're using, but just in case:

 

On the Harmony/Melody, the ARM has a large amount of Flash memory (32K) and a small amount of SRAM (8K). As covered by @cd-w in this blog entry, the Flash is slower than SRAM.

 

The game's ROM, and thus custom ARM code, ends up in Flash with 4 cycle access time. We figured out how to run time-critical functions in SRAM for a 23% boost in performance due to its 1 cycle access time. It's not 1/4th the original runtime because MAM (Memory Accelerator Module - aka caching) helps the Flash code run faster.

 

Also in this 2-page discussion @Thomas Jentzsch figured out how to have the compiler optimize specific functions for performance, even in Flash, with similar speed gains. Search on optimize(3) to find those.
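For reference, GCC's per-function optimize attribute looks like this. It is a minimal sketch assuming a GCC toolchain: `HOT_FUNC` and `sum_bytes` are hypothetical names, and the macro falls back to nothing on other compilers so the code still builds. Placing a function in SRAM is a separate step that needs a section attribute plus linker-script support, which is not shown here.

```c
#include <stdint.h>
#include <stddef.h>

/* GCC accepts a per-function optimize attribute; other compilers get
   an empty macro so the sketch still builds. HOT_FUNC is a
   hypothetical name. */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_FUNC __attribute__((optimize(3)))
#else
#define HOT_FUNC
#endif

/* Stand-in for a time-critical routine. */
static HOT_FUNC uint32_t sum_bytes(const uint8_t *data, size_t len)
{
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}
```

This lets the rest of the project keep a size-oriented setting while only the hot functions are compiled for speed.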

 

 


The DS is a bit of a mess... nothing runs from flash - it's all copied into the "massive" 4MB SRAM (16MB on the DSi) but this SRAM is not fast. Even worse... they hooked it up to a 16-bit bus so there are wait states that clobber performance.  

 

To compensate, they have working RAM which is fast... this comes in several flavors... 16K DTCM (Data Tightly Coupled Memory) and 32K ITCM (Instruction Tightly Coupled Memory) which the programmer has lots of control over... plus some data/instruction cache which the programmer doesn't have as much control over.   This working RAM has a real 32-bit bus access.

 

There is also 16-bit Video VRAM which is faster than SRAM and slower than working RAM. But it can be general-purposed (and of the 700K of VRAM, I have about half available as pseudo-fast-RAM).

 

I'm just about out of the fast working RAM. I use a general purpose 8K fast buffer for a lot of things... for 4K or 8K carts (or AR carts) I can store the entire program in that memory (I also allocate another fast 256 bytes of Atari RAM for the 6502 RAM plus SARA chip RAM).  And there are piles of fetchers and counters and two sets of CPU registers (6502 vs ARM) which I also put in fast memory.  16K doesn't go that far!

 

The cache is the biggest mystery... move one line of code and something falls out of a 32-byte 'cache line' and performance can swing one way or the other.  It can be frustrating to make a change you know is fewer cycles only to upset the cache-line balance and lose a frame of performance. The DS is part science, part Voodoo.

 

Having said all this, I did have a revelation in the middle of the night.  The DS is an ARM9 core which has Thumb support. In theory, I wouldn't even have to emulate the ARM code. But the DS is not a Harmony cart and so the RAM won't be in the right spot... but it could possibly be managed/redirected to work.  But I got a headache thinking about how to do it... and instead I just went back to sleep.
