
Gopher2600 (continuing development on Github)


JetSetIlly


@JetSetIlly  I completely understand that this is mainly a fun/educational project.  There's no need to apologize.  I'm merely providing input from my experiences of trying out this interesting project. 

 

1 hour ago, JetSetIlly said:

But I have to be honest that I don't understand why you think 70mb after one minute is a memory leak.

 

I did NOT specifically say there is a memory leak.  I only meant to indicate there might be one:

 

Quote

Also, it seems to slowly eat more and more memory.

...

As for memory leaking:  I let it go for awhile and after about a minute, it was around 70mb (in run mode) and was continuing to climb.

(Granted, I should have inserted the word "potentially" in that second sentence and that would have made things more clear.)

 

If that's expected (hitting 70mb... and yet still climbing), then there's really nothing to discuss.  Personally, I would be worried about a slight memory leak with those circumstances.  But that's just me :)

 

...

 

SO.... just ran the program for ~5 minutes... and we're up to 71mb.  So I guess we're okay in that area.  Still climbing though... but that might just be the nature of memory allocation/garbage collection in Go.

 

As for Specs:  I'm running Windows 7 (64-bit) on an Intel Core i5 M520 @2.4ghz with 6gb.  Graphics is NVidia NVS 3100M.

Yes, some would say this is an old laptop... but Stella runs just fine on it :)


1 minute ago, splendidnut said:

@JetSetIlly  I completely understand that this is mainly a fun/educational project.  There's no need to apologize.  I'm merely providing input from my experiences of trying out this interesting project. 

 

That's fine. I appreciate your input. Getting feedback is, after all, why I'm posting here.

 

I don't have access to Windows, only Wine, so I was a little concerned that you'd spotted something that I haven't.

 

1 minute ago, splendidnut said:

SO.... just ran the program for ~5 minutes... and we're up to 71mb.  So I guess we're okay in that area.  Still climbing though... but that might just be the nature of memory allocation/garbage collection in Go.

 

It is, I'm afraid, the nature of memory allocation in Go. I'm not happy about such relatively high memory usage but hopefully the Go compiler will continue to improve in this area. I will look into whether there is anything more I can do to mitigate it but, short of a redesign of the entire project, I'm not sure I'll be able to.

 

It's worth noting that memory usage will be much higher in debug mode because of the new rewind system. In this first version, I'm taking snapshots of the entire system very frequently (currently editable to a small margin in the preferences - potentially much greater). This doesn't include the Cartridge ROM but does include the Cartridge RAM, which in the case of the Supercharger is massive compared to everything else.
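
To illustrate why frequent snapshots push memory usage up, here is a minimal sketch of a fixed-capacity snapshot ring buffer. It is not Gopher2600's rewind code: the `snapshot` and `rewind` types, their fields and the 6 KB RAM size used in `main` are assumptions chosen for illustration. The ROM is deliberately left out of each snapshot because it never changes, while cartridge RAM is copied every time.

```go
package main

import "fmt"

// snapshot is a hypothetical record of emulator state at one point in time.
// Cartridge RAM is copied; cartridge ROM is not part of the snapshot.
type snapshot struct {
	frame   int
	cartRAM []byte
}

// rewind keeps the most recent snapshots in a ring buffer.
type rewind struct {
	buf   []snapshot
	next  int // index the next snapshot will be written to
	count int // number of valid snapshots currently held
}

func newRewind(capacity int) *rewind {
	return &rewind{buf: make([]snapshot, capacity)}
}

// push stores a copy of cartRAM, overwriting the oldest entry when full.
func (r *rewind) push(frame int, cartRAM []byte) {
	cp := make([]byte, len(cartRAM))
	copy(cp, cartRAM)
	r.buf[r.next] = snapshot{frame: frame, cartRAM: cp}
	r.next = (r.next + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// oldest returns the frame number of the oldest snapshot still held.
func (r *rewind) oldest() int {
	idx := (r.next - r.count + len(r.buf)) % len(r.buf)
	return r.buf[idx].frame
}

// bytesHeld returns the total size of the RAM copies currently held.
func (r *rewind) bytesHeld() int {
	total := 0
	for i := 0; i < r.count; i++ {
		total += len(r.buf[i].cartRAM)
	}
	return total
}

func main() {
	r := newRewind(3)
	ram := make([]byte, 6*1024) // illustrative RAM size only
	for frame := 0; frame < 5; frame++ {
		r.push(frame, ram)
	}
	fmt.Println("oldest frame:", r.oldest())
	fmt.Println("bytes held:", r.bytesHeld())
}
```

The memory cost is simply capacity multiplied by the size of the copied state, which is why a cartridge with a lot of RAM dominates the total.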

 

Speaking personally though, I'm not too concerned. I've done my fair share of low-level, highly optimised programming. I wanted to try something different with this project.

 

1 minute ago, splendidnut said:

As for Specs:  I'm running Windows 7 (64-bit) on an Intel Core i5 M520 @2.4ghz with 6gb.  Graphics is NVidia NVS 3100M.

Yes, some would say this is an old laptop... but Stella runs just fine on it :)

 

Yes. Stella is definitely more performant as well as more feature complete. C++ also produces faster code than Go; that's just a fact.

 

There are also no optimisations in Gopher2600 (apart from one which cuts the frequency of collision detection by about a third). This obviously harms comparative performance. I've intentionally not looked at introducing any more optimisations because, as far as I am able, I want the code to be as representative of my understanding of the TIA as it can be. It might be interesting to have swappable cores: a reference version as it is now, and a performant version, hopefully while retaining the accuracy.

 

With regards to the frame rate not being significantly higher in playmode than in debug mode on your machine, it might have something to do with the graphics card. I'm not familiar with that chip but the runtime might be deciding to run the GL/shader program on the CPU rather than on the graphics chip.

 

Thanks again for your interest. I appreciate your feedback.


6 hours ago, JetSetIlly said:

 

I must admit I can't spot it myself. Can you post some screenshots showing the differences?

Sorry I meant that the animation stalls or isn't running smoothly for short moments, not that there are graphical glitches.
 


Another version with marginal performance improvements. There won't be much improvement when running the ROM inside the debugger (because of the debugging loop's overhead) but there should be a noticeable difference in the playmode.

 

I'm sure that there is more I can do but this was the low hanging fruit.

 

https://github.com/JetSetIlly/Gopher2600/releases/tag/v0.7.1

 

I've also included a "stats viewer" (details in the link below) which will show you the internal memory performance of the program as it is running. @Al_Nafuur reports that he cannot see all six charts on his Windows machine. It works fine on my Linux machine and on an acquaintance's Mac so I'm not sure what the problem is. If anyone tries it on Windows and *can* see all six charts, can they let me know?

 

https://github.com/JetSetIlly/Gopher2600#statistics-viewer

 


Cool!  I just tried the stats viewer in Firefox and Chrome on Windows 7 and it appears there are JS issues.  But it looks like they're in the stats viewer component itself and not your program.

 


EDIT:  Upon further inspection, it looks like they might be/probably are meaningless.

 

Edited by splendidnut

45 minutes ago, splendidnut said:

Cool!  I just tried the stats viewer in Firefox and Chrome on Windows 7 and it appears there are JS issues.  But it looks like they're in the stats viewer component itself and not your program.

 


EDIT:  Upon further inspection, it looks like they might be/probably are meaningless.

 

 

Cheers. It's a very new third-party package to present the standard Go profiling data in a nice way. I've submitted a bug report to the author.

Incidentally, the raw data can be viewed at http://localhost:12600/debug/pprof


Another version with some small improvements:

 

https://github.com/JetSetIlly/Gopher2600/releases/tag/v0.7.2

 

After @splendidnut stung me into action, I've been working on performance. I've made some improvements but I don't think I can do any more unless I start hacking at the emulation itself, which I'm reluctant to do.

 

This version has improved TV frame resizing. What I had already was pretty good but some pathological cases (Spike's Peak and Tapper are the ones I noticed) didn't react very well. Hopefully, this has sorted it once and for all.

 

"Flicker kernels" should look better now. NTSC ROMs running on a 60Hz monitor should be pretty solid. Other combinations will still look a bit janky with those sorts of ROMs, but it's better than before I think.

 

Also, in playmode, I've added F12 to toggle an FPS indicator and F11 to toggle fullscreen.


3 minutes ago, Prizrak said:

Is there a win32 option for this emulator? When I try it on my Windows 10 PC it says incompatible.

 

 

Not at the moment. I'll look into it for the future. Although I have to confess that I don't use Windows or really know anything about it, so I can't promise anything. Happy for other people to have a look though.


  • 1 month later...

Support for ARM cartridge formats: DPC+ and CDF, including CDFJ (but not CDFJ+). Faster rewind system and fancier CRT effects.

 

There's a bunch still to do but enough has been done to warrant another release https://github.com/JetSetIlly/Gopher2600/releases/tag/v0.8.0

 

I recorded a video of me playing around in the debugger a couple of weeks ago. For people who haven't managed to run the program yet it gives a flavour of what the debugger is all about.

 

I'll be continuing to work on new debugging features in the next few months. I'll hopefully get around to working on a new game, which will give me a chance to try the debugger out in practice.

 

 


Some speed improvements to the ARM emulation.

 

https://github.com/JetSetIlly/Gopher2600/releases/tag/v0.8.1

 

@Andrew Davie helpfully supplied me with a useful CDFJ test ROM. I was aware that there was no optimisation but most ROMs use the ARM only sparingly so getting some good profiling data was helpful. I also found some other areas of the TIA that I could optimise so there should be a small improvement in non-ARM ROMs too.

 

Unrelated, but Go has some interesting compiler changes penciled in for v1.17 that should bring a significant performance improvement (register based function arguments/return values, as opposed to stack based).

 

@Al_Nafuur spotted that @splendidnut's  Congo Bongo demo ROM wasn't being detected correctly. You can force the cartridge type by file extension or with a command line argument but it's good to have automatic fingerprinting so I've fixed that too.
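
The order of precedence described here (explicit override, then file extension, then automatic fingerprinting) can be sketched as follows. This is an illustration only and not Gopher2600's detection code: `detectMapper`, `fingerprint` and the `knownExt` set are hypothetical, and the fingerprint step guesses from ROM size alone, whereas a real fingerprinter inspects the ROM contents.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// knownExt is a hypothetical set of file extensions that force a cartridge type.
var knownExt = map[string]bool{
	"2K": true, "4K": true, "F8": true, "F6": true, "F4": true, "CDF": true,
}

// detectMapper picks a cartridge type: an explicit override wins, then a
// recognised file extension, and only then the automatic fingerprint.
func detectMapper(filename string, data []byte, forced string) string {
	if forced != "" {
		return strings.ToUpper(forced)
	}
	ext := strings.ToUpper(strings.TrimPrefix(filepath.Ext(filename), "."))
	if knownExt[ext] {
		return ext
	}
	return fingerprint(data)
}

// fingerprint is a placeholder that guesses from ROM size only.
func fingerprint(data []byte) string {
	switch len(data) {
	case 2048:
		return "2K"
	case 4096:
		return "4K"
	case 8192:
		return "F8"
	case 16384:
		return "F6"
	case 32768:
		return "F4"
	}
	return "UNKNOWN"
}

func main() {
	rom := make([]byte, 8192)
	fmt.Println(detectMapper("game.bin", rom, ""))   // fingerprinted by size
	fmt.Println(detectMapper("game.f6", rom, ""))    // forced by extension
	fmt.Println(detectMapper("game.f6", rom, "cdf")) // forced by argument
}
```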

 

As ever, I'm keen for people to give it a go and give me feedback. The project wouldn't be as far along as it is without AtariAge's input over the last year.

 


4 hours ago, JetSetIlly said:

Some speed improvements to the ARM emulation.

 

https://github.com/JetSetIlly/Gopher2600/releases/tag/v0.8.1

 

@Andrew Davie helpfully supplied me with a useful CDFJ test ROM. I was aware that there was no optimisation but most ROMs use the ARM only sparingly so getting some good profiling data was helpful. I also found some other areas of the TIA that I could optimise so there should be a small improvement in non-ARM ROMs too.

 

Unrelated, but Go has some interesting compiler changes penciled in for v1.17 that should bring a significant performance improvement (register based function arguments/return values, as opposed to stack based).

 

@Al_Nafuur spotted that @splendidnut's  Congo Bongo demo ROM wasn't being detected correctly. You can force the cartridge type by file extension or with a command line argument but it's good to have automatic fingerprinting so I've fixed that too.

 

As ever, I'm keen for people to give it a go and give me feedback. The project wouldn't be as far along as it is without AtariAge's input over the last year.

 

 

As I understand things, Stella emulates the ARM code as if it executed in 0 cycles (6507 continues as if it didn't happen). This is why my chess will run on Stella but not* on the Harmony Cart.  (*) However, a recent change seems to have made my code fast enough to run on both. But it's just a timing quirk. So, my question: does Gopher2600 correctly adjust the 6507-side timing based on the number of cycles passed in any ARM code "triggered"?  I'm guessing that this is actually a difficult thing to detect.

 

Essentially one would (in the bankswitch scheme) detect the specific condition that halts the 6502 (or more specifically puts EA on the bus until further notice). Then the ARM code runs -- and we should be able to know how many ARM cycles until it once again re-enables servicing of the address bus and revectors the 6502 to the instruction after the "halt" write.  AFAIK that halt is writing $FF to $1FF3.

 

The latest test chess ROM which should work on both emulator and actual hardware (but unconfirmed)...

 

 

 

 

 

 

CDFJChess.bin


9 hours ago, Andrew Davie said:

 

As I understand things, Stella emulates the ARM code as if it executed in 0 cycles (6507 continues as if it didn't happen). This is why my chess will run on Stella but not* on the Harmony Cart.  (*) However, a recent change seems to have made my code fast enough to run on both. But it's just a timing quirk. So, my question: does Gopher2600 correctly adjust the 6507-side timing based on the number of cycles passed in any ARM code "triggered"?  I'm guessing that this is actually a difficult thing to detect.

 

Currently, Gopher2600 behaves the same as Stella with regards to ARM timings. That is, the ARM code is assumed to execute instantaneously and for the VCS to continue as if it didn't happen (apart from the side-effects to cartridge memory). This means that the RIOT timer only decrements enough to account for the STX CALLFN in your code and not the NOPs that we would see in real hardware.

 

9 hours ago, Andrew Davie said:

 

Essentially one would (in the bankswitch scheme) detect the specific condition that halts the 6502 (or more specifically puts EA on the bus until further notice). Then the ARM code runs -- and we should be able to know how many ARM cycles until it once again re-enables servicing of the address bus and revectors the 6502 to the instruction after the "halt" write.  AFAIK that halt is writing $FF to $1FF3.

 

Counting cycles executed by the custom ARM program (which is actually Thumb code) would be straightforward. The complication is that a call to the custom ARM program almost certainly involves running non-Thumb code as well. Neither Stella nor Gopher2600 emulates the non-Thumb code that is found in the Harmony drivers.

 

Full emulation of ARM code would be beneficial because it would mean that new Harmony bankswitching schemes would be supported automatically without having to write new support code in the host language (i.e. Go in Gopher2600 and C++ in Stella).

 

In the short term, I'm wondering if it's possible to decode the ARM instructions with a view to only counting cycles (rather than executing them). My guess is that it is, but I'll have to do some more studying of the tech specs to say for sure.

 

https://developer.arm.com/documentation/ddi0234/b/instruction-cycle-timings/instruction-cycle-count-summary

 

In the end however, I suspect that it will mean emulating the ARM7TDMI in its entirety in order for us to get the results we want.

 


6 hours ago, JetSetIlly said:

Counting cycles executed by the custom ARM program (which is actually Thumb code) would be straightforward.

 

Access time varies between ROM and RAM. There's also MAM (cache) which we had to partially disable due to a bug in the original ARM processor used in the Harmony/Melody.  @johnnywc's latest projects do not disable the cache as he needed the extra CPU time. The bug has been fixed in the ARM now used to build the Harmony/Melody boards.

 

On 1/1/2021 at 12:40 PM, johnnywc said:
On 1/1/2021 at 11:55 AM, SpiceWare said:

 

Works on my Harmony Encore, but crashes on my older Harmony after showing the initial splash screen.

 

My Encore has BIOS 1.06, while the Harmony had 1.05. I updated the Harmony to 1.06, but it still crashes.

 


Oh yes, I forgot to mention that this uses the CDFJ driver with MAM enabled which won't work on older Harmony carts.  I would have assumed it would work on a Concerto cart (aren't those new?); it may have the old version of the ARM with the MAM bug.  Gorf Arcade (and RobotWar:2684) require MAM to be enabled due to the high processing requirements (RobotWar:2684 on the later levels with 100+ enemies and Gorf Arcade with the complex graphics and the in-game voice).  I could upload a Gorf Arcade with MAM disabled that should work with older Harmony carts (and possibly the Concerto) but would disable voice by default since that is what causes the screen rolls.

 

 

 


41 minutes ago, SpiceWare said:

 

Access time varies between ROM and RAM. There's also MAM (cache) which we had to partially disable due to a bug in the original ARM processor used in the Harmony/Melody.  @johnnywc's latest projects do not disable the cache as he needed the extra CPU time. The bug has been fixed in the ARM now used to build the Harmony/Melody boards.

 

 

 

Good point, I hadn't started thinking about that yet. IIRC, the memory bus will remain the same speed for sequences of S cycles and change when an N cycle is encountered if the new address is in a different type of memory. Is that what you're referring to?

 

I was going to start playing with this this week. The first step will be to define the cycle profile for each of the 19 instruction formats in the Thumb set, i.e. what combinations of N, S, I and C cycles are required. After that I can think about how the timing changes for different memory types. I have a low expectation of success at this point. Plus it's a bit of an academic exercise unless I implement the full ARM instruction set. Still, it's worth trying I think.
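
As a rough idea of what a cycle profile could look like in code, here is a minimal sketch that tallies N, S, I and C cycles for a sequence of instructions. The `profile` type, the `profiles` table and the class names are hypothetical. The table groups instructions by broad class rather than by the 19 Thumb formats, and the figures are my reading of the ARM7TDMI cycle count summary, so they should be checked against the reference manual before being relied on.

```go
package main

import "fmt"

// profile counts the bus cycle types an instruction needs:
// N (non-sequential), S (sequential), I (internal) and C (coprocessor).
type profile struct{ N, S, I, C int }

// add returns the element-wise sum of two profiles.
func (p profile) add(q profile) profile {
	return profile{N: p.N + q.N, S: p.S + q.S, I: p.I + q.I, C: p.C + q.C}
}

// profiles holds a few illustrative entries keyed by a hypothetical class
// name. The values are assumptions to be verified against the ARM7TDMI docs.
var profiles = map[string]profile{
	"alu":    {S: 1},             // register data processing
	"ldr":    {N: 1, S: 1, I: 1}, // load register
	"str":    {N: 2},             // store register
	"branch": {N: 1, S: 2},       // taken branch
}

// tally sums the profiles for a sequence of instruction classes.
func tally(classes []string) (profile, error) {
	var total profile
	for _, c := range classes {
		p, ok := profiles[c]
		if !ok {
			return total, fmt.Errorf("no cycle profile for %q", c)
		}
		total = total.add(p)
	}
	return total, nil
}

func main() {
	total, err := tally([]string{"ldr", "alu", "str", "branch"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", total)
}
```

Keeping the four cycle types separate, rather than collapsing them to a single number, leaves room to weight each type differently once memory access speeds are taken into account.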

 

 

The good news is that I've learned about Gorf Arcade. I hadn't come across that before.


11 minutes ago, JetSetIlly said:

I was going to start playing with this this week. The first step will be to define the cycle profile for each of the 19 instruction formats in the Thumb set, i.e. what combinations of N, S, I and C cycles are required. After that I can think about how the timing changes for different memory types. I have a low expectation of success at this point. Plus it's a bit of an academic exercise unless I implement the full ARM instruction set. Still, it's worth trying I think.

Wouldn't an approximation help here too?  While not perfect, developers would at least notice when new code costs a lot of CPU time or when the available time becomes low.

Edited by Thomas Jentzsch

Just now, Thomas Jentzsch said:

Wouldn't an approximation help here too?  While not perfect, developers would at least notice when new code costs a lot of CPU time or when the available time becomes low.

 

I think you're right. There could be a definable warning level in the emulator to alert the developer when the ARM program looks to be using too much time.
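
A definable warning level could be as simple as comparing the cycles an ARM call consumed against a budget. The following is a small sketch of that idea only; the `check` function, the `level` values and the numbers in `main` are hypothetical and do not reflect any existing emulator setting.

```go
package main

import "fmt"

// level is the outcome of comparing the time used against the time available.
type level string

const (
	levelOK   level = "ok"
	levelWarn level = "warning"
	levelOver level = "over budget"
)

// check compares the cycles an ARM call consumed against a budget.
// warnAt is a fraction of the budget (for example 0.8) chosen by the developer.
func check(used, budget int, warnAt float64) level {
	switch {
	case used > budget:
		return levelOver
	case float64(used) >= warnAt*float64(budget):
		return levelWarn
	}
	return levelOK
}

func main() {
	for _, used := range []int{1000, 1700, 2500} {
		fmt.Println(used, check(used, 2000, 0.8))
	}
}
```

Even with approximate cycle counts, a check like this would flag the cases that matter most to a developer: code that has suddenly become expensive, or a budget that is nearly exhausted.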


1 hour ago, JetSetIlly said:

Good point, I hadn't started thinking about that yet. IIRC, the memory bus will remain the same speed for sequences of S cycles and change when an N cycle is encountered if the new address is in a different type of memory. Is that what you're referring to?

 

I'm not familiar with how MAM is implemented, so don't know what the impact is. @cd-w might have some insight that would help to add ARM cycle time emulation.

 


Short video of the debugger with an ARM timing overlay (the purple pixels) sketched in. Still work in progress and it's nowhere near 100% accurate but I wanted to post now to show the potential of the idea.

 

What I need to do next is to factor in the speed of the different memory types.

 

At the moment I'm assuming that S and I cycles are the same length and N cycles are double the length of an S cycle. From what I understand of the ARM (which admittedly is not much) it is the length of the S cycle that can change depending on the memory type being accessed (N cycles being double of whatever the S cycle is).
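
The assumption in the paragraph above (S and I cycles the same length, N cycles double that, with the S length depending on the memory being accessed) can be written down as a small calculation. This is a sketch of that stated assumption only; the `memType` names and the S-cycle lengths in `sCycle` are placeholder values in arbitrary units, not measured Harmony timings.

```go
package main

import "fmt"

// memType identifies the kind of memory being accessed.
type memType int

const (
	flash memType = iota
	sram
)

// sCycle holds an assumed S-cycle length per memory type, in arbitrary
// time units. These are placeholders, not real hardware figures.
var sCycle = map[memType]float64{
	flash: 4.0,
	sram:  1.0,
}

// duration converts cycle counts to time under the stated assumption:
// I cycles take as long as S cycles and N cycles take twice as long.
func duration(n, s, i int, m memType) float64 {
	sLen := sCycle[m]
	return float64(s)*sLen + float64(i)*sLen + float64(n)*2*sLen
}

func main() {
	// One N, one S and one I cycle against each memory type.
	fmt.Println("sram: ", duration(1, 1, 1, sram))
	fmt.Println("flash:", duration(1, 1, 1, flash))
}
```

Making the S-cycle length a per-memory-type table is what would allow the values to be dialled in later without touching the calculation itself.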

 

The next step is to add a configuration window that will allow the dialing in of precise timings for each memory type.

 

From an emulation point of view this is now working more like what happens in the Harmony, although not precisely. Step 2 is a fudge, albeit one without serious side effects.

 

1) CallFn register is triggered

2) ARM program is run in its entirety and number of ARM cycles consumed is returned

3) Reading cartridge memory results in a NOP being returned (put on the data bus)

3b) This continues for the length of time the ARM program requires to run (even though it has finished in actuality)

4) When ARM program has concluded, the three bytes of JMP <resume address> are put on the data bus
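
The steps above can be sketched in Go as follows. This is a heavily simplified illustration and not Gopher2600's implementation: the `cart` type, its fields and the `armPerRead` conversion factor are hypothetical, and the sketch counts cartridge reads rather than modelling real 6507 and ARM clock rates.

```go
package main

import "fmt"

const (
	opNOP = 0xEA // 6507 NOP opcode
	opJMP = 0x4C // 6507 JMP absolute opcode
)

// cart is a hypothetical, heavily simplified cartridge model.
type cart struct {
	runARM     func() int // runs the ARM program to completion, returns ARM cycles used
	armPerRead int        // assumed number of ARM cycles that elapse per 6507 read
	busyReads  int        // reads that must still return NOP
	jmp        []byte     // pending JMP <resume address> bytes
}

// callFn models steps 1 and 2: the ARM program runs in its entirety and the
// cycle count is converted into a number of NOP reads (rounded up).
func (c *cart) callFn(resume uint16) {
	cycles := c.runARM()
	c.busyReads = (cycles + c.armPerRead - 1) / c.armPerRead
	c.jmp = []byte{opJMP, byte(resume), byte(resume >> 8)} // low byte first
}

// read models steps 3, 3b and 4. ok is false once the cartridge is idle
// again and normal ROM contents should be served instead.
func (c *cart) read() (data byte, ok bool) {
	if c.busyReads > 0 {
		c.busyReads--
		return opNOP, true
	}
	if len(c.jmp) > 0 {
		b := c.jmp[0]
		c.jmp = c.jmp[1:]
		return b, true
	}
	return 0, false
}

func main() {
	c := &cart{runARM: func() int { return 250 }, armPerRead: 100}
	c.callFn(0x1234)
	for {
		b, ok := c.read()
		if !ok {
			break
		}
		fmt.Printf("%02X ", b)
	}
	fmt.Println()
}
```

Because the ARM program has already finished by the time the first NOP is served, its side effects on cartridge memory are visible early; that is the fudge in step 2, and it only matters if the 6507 could observe that memory while it is being fed NOPs.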

 

As far as I can see, the only value of running the ARM program in parallel with the 6507 program would be the ability to monitor the ARM program step-by-step and possibly to add breakpoints, but I don't believe it's required to get accurate timing information.

Edited by JetSetIlly