
Classic99 Updates


Tursi


-Fix post-increment instructions in GPU (this was the important one!)

-fix CRU testing of VDP interrupt bit

-Added support for F18 opcodes to disassembler (not fully tested)

-Add support for interleaved GPU operation (check option under Video)

-return end of frame on GPU address >7000 so it can be checked for (super hacky)

-add heat map to GPU VDP access

-make GPU VDP memory changes force a redraw

-override TriggerInterrupt for GPU to prevent potential errors

-protect 'muteaudio' call with audio critical section (helps pause on inactive?)

-more DSR debug checks

-fix for more than 100 cartridges across groups

-updated source folder

 

http://harmlesslion.com/software/classic99

 

Fair warning: the GPU interleaving is very hacky and may not work very well. It may also have broken the old single-CPU mode, but it appears to work okay. Switching GPU interleaving on and off during a running program is not guaranteed to work (well, none of it is /guaranteed/). Also, the interleaved GPU will only run if the main CPU leaves enough time, so if your machine can't keep up or you turn on overdrive, the GPU probably won't run. While interleaved GPU is on, single-stepping the GPU will NOT WORK AT ALL. You have to turn it off if you want to use the debugger.

 

The end-of-frame on the GPU register at >7000 always returns 192 -- I just wanted to see a demo run. ;) It's not synchronized with anything so you can expect things to flicker.

 

I'm not positive if the mute on "pause when window inactive" was fixed with the small change I made, but I couldn't reproduce it afterwards. Maybe it was just a race? Audio should mute any time the main window loses focus with that option on (this includes going to the debugger window, although the emulator won't pause).


yay! Your demo was why I tried for the interleaving CPU/GPU. The code is done very wrong, like most of the current F18A support, but it was cool to see your demo come up. :) (I apologize for the flicker, fixing that is further out).

 

Thanks for confirming on the audio, too!


Excellent. So when does Classic99 become sentient so that it can contribute to the community? ;)


Due to the work of Sarah Connor, the day for this keeps getting pushed back. But, rest assured, there is a Classic99 army of sentient machines sent back from the future to ensure that this will indeed happen.


At least you can take pride in knowing they have a 9900 at their core. You all will be the last hope to save humanity with your knowledge of their assembly language (or join the machines and keep upgrading them, either way they'll probably keep you around ;) ).


 

I've not been able to get it to work either....

 

 

For some reason you need to disable Interleave GPU. TI Scramble is not using the GPU except for the F18A detection.

 

Edit: Titanium is not starting up the F18A mode either if you interleave the GPU.

 

Edit 2: The GPU-based F18A detection routine works under the (valid) assumption that the GPU is much faster than the CPU and returns its result before the CPU has a chance to read it. Perhaps this is why it fails when you interleave the GPU in Classic99? Without interleaving, the CPU waits for the GPU to finish.

 

http://atariage.com/forums/topic/207586-f18a-programming-info-and-resources/?do=findComment&comment=2676863


Ahhh... that makes sense. The CPU doesn't let the GPU run before it checks to see whether the GPU has already run. ;) That's a Classic99 hack and wouldn't happen on hardware, since the GPU is many, many times faster than the CPU. :)

 

Yay, first interleaving bug! ;)


Tursi, I'm not sure how you implemented the GPU but it seems having it run in its own thread and execute instructions as fast as it can would be the most reliable way. In the F18A the GPU sits and runs asynchronously and any synchronization is up to the host program.

 

You should only have to ensure that, when the GPU is triggered, the 99/4A CPU yields so the GPU gets to update its state, which happens instantly on the real hardware as a result of the host-to-VDP write operation. It is not so much that the GPU runs much faster than the host CPU, but rather that the GPU responds in real time, and in parallel, to the host system. When the GPU is triggered, it updates its "running" status even before the host system finishes the VDP register write operation.

 

But even if the GPU were running at the same speed as the host system, the internal FSM would still be many times faster than the host CPU's fetch, decode, execute cycle, and even a slow GPU would easily have updated its status before the host CPU could execute the instruction that checks the status.

 

You probably realize all this though...


About 379 bank-switched ROMs: it looks like the .bin file is a straight dump of the (EP)ROM. Is that correct? Is there a way (in Classic99) to set which bank the 379 boots up in, so I can test that the cartridge boots correctly from each bank?

 

Thanks,

Stuart.

Edited by Stuart

Yes, a straight dump of the EPROM file. Bank 1 begins at >0000 and bank 0 begins at >8000 in the EPROM, if I remember correctly.

 

You cannot set which bank comes up in Classic99 or any other emulator, as far as I know. You need to put headers in both banks.


On the surface, running the GPU in its own thread sounds like the most reliable way, but in reality that alone won't always work either. Because the operating system schedules threads in fixed increments, that approach can't guarantee that the GPU thread actually runs before the CPU thread reaches its next step. Two parts are necessary: the CPU thread releasing its timeslice, and the GPU thread /getting/ it. This can be done with a wait object, which is how I'll probably do it.
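A minimal sketch of that wait-object handoff, assuming Python's `threading.Event` as the wait object. The class and method names (`GpuThread`, `start_and_wait`) are hypothetical and this is not how Classic99 is written; it only shows the two halves: the CPU side gives up control by blocking, and it is only woken once the GPU thread has actually run and updated its state.

```python
import threading


class GpuThread:
    """Hypothetical GPU worker: sleeps until triggered, runs, then signals the CPU."""

    def __init__(self):
        self.trigger = threading.Event()  # CPU -> GPU: "start running"
        self.done = threading.Event()     # GPU -> CPU: "status has been updated"
        self.status = 0                   # stand-in for the GPU's status/result
        self._stop = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            self.trigger.wait()           # GPU thread sleeps until the CPU starts it
            self.trigger.clear()
            if self._stop:
                break
            self.status = 0x42            # pretend the GPU program wrote a result
            self.done.set()               # hand control back to the CPU thread

    def start_and_wait(self, timeout=1.0):
        """CPU side: start the GPU, then block until it has updated its state."""
        self.done.clear()
        self.trigger.set()
        return self.done.wait(timeout)    # True if the GPU finished in time

    def shutdown(self):
        self._stop = True
        self.trigger.set()
        self._thread.join()


if __name__ == "__main__":
    gpu = GpuThread()
    ok = gpu.start_and_wait()
    print(ok, hex(gpu.status))
    gpu.shutdown()
```

The point of blocking on `done` rather than just yielding is that a plain yield only gives the OS a chance to run the GPU thread; waiting on the event guarantees it did.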

 

For the current version, it's easier, because the interleave just uses "left over" CPU time, so I can just end the CPU timeslice when the GPU is started. I probably won't get that fix this week, though.
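To make the "left over time" problem and the end-the-timeslice fix concrete, here is a toy model, not Classic99's code. It assumes a hypothetical emulated program where instruction 0 starts the GPU and instruction 1 reads back the GPU's result (like the detection trick), and a hypothetical marker value `0xF18A` that the GPU writes. The GPU only runs after the CPU's timeslice ends.

```python
CPU_PROGRAM_LEN = 10  # hypothetical: instructions the CPU executes this frame


def detect(yield_on_trigger):
    """Return the value the CPU reads back right after starting the GPU."""
    state = {"gpu_pending": False, "result": 0}
    seen = None
    pc = 0
    while pc < CPU_PROGRAM_LEN:
        # --- CPU timeslice ---
        while pc < CPU_PROGRAM_LEN:
            if pc == 0:
                state["gpu_pending"] = True   # VDP register write starts the GPU
            elif pc == 1:
                seen = state["result"]        # very next instruction reads the result
            pc += 1
            if yield_on_trigger and state["gpu_pending"]:
                break                         # end the CPU timeslice immediately
        # --- GPU runs only in the time left over after the CPU slice ---
        if state["gpu_pending"]:
            state["result"] = 0xF18A          # hypothetical value the GPU writes
            state["gpu_pending"] = False
    return seen


if __name__ == "__main__":
    print(hex(detect(yield_on_trigger=False)))  # CPU reads before the GPU ran
    print(hex(detect(yield_on_trigger=True)))   # GPU ran before the read
```

With `yield_on_trigger=False` the CPU reads 0 because the GPU hasn't had its turn yet; ending the slice on the trigger lets the GPU write first.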

 

Most reliable is always going to be single-threaded, counting every cycle and executing what is due, but, computers aren't fast enough for that yet. ;) Someday they will be, and that will be cool. For now, multi-threaded /is/ more fun, you just have to be more careful in what you assume.
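For reference, the single-threaded "count every cycle and execute what is due" approach can be sketched as a priority queue keyed on each device's next due time. This is a toy under stated assumptions: device periods are hypothetical integer counts of a master clock tick, and incrementing a counter stands in for stepping a real chip.

```python
import heapq


def run_lockstep(devices, end_tick):
    """Single-threaded scheduler: always execute whichever device is due next.

    devices: dict of name -> period in master-clock ticks (hypothetical units).
    Returns how many steps each device executed before end_tick.
    """
    counts = {name: 0 for name in devices}
    # heap entries: (next due tick, tie-break index, device name)
    heap = [(0, i, name) for i, name in enumerate(devices)]
    heapq.heapify(heap)
    while heap:
        due, i, name = heapq.heappop(heap)
        if due >= end_tick:
            break                  # earliest device is past the end, so all are
        counts[name] += 1          # a real emulator would step the device here
        heapq.heappush(heap, (due + devices[name], i, name))
    return counts


if __name__ == "__main__":
    print(run_lockstep({"cpu": 1, "sound": 4, "speech": 16}, 64))
```

Every device sees time advance in exactly the right order, which is why it is the most reliable option, but it pays scheduling overhead on every single tick.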

 

The assumption behind the detection trick, which I wrote, remember, is that the GPU must be /guaranteed/ to get to the write before the CPU does. Yes, there are a number of things that make it more likely even if they're the same speed, but the assumption was that it was faster: that I can start the GPU and immediately, in the next CPU instruction, read back the data. /Why/ the GPU is faster doesn't matter to it. :)

Well, a modern 3GHz computer is 3000 times faster than the 99/4A, so I would hope that a modern computer would be fast enough to cycle emulate a 3MHz 16-bit machine. Even a 300MHz machine is 100 times faster and should be able to do it. The problem is trying to run such an emulator on a time slice pre-emptive OS. If you wrote the emulator as the *only* program running on the CPU, then it would be easily possible.

 

There is such a platform that could be used on modern systems:

 

http://www.returninfinity.com/


This argument goes back to the very beginnings of emulation. :) I started programming my emulation under DOS, which is /not/ a multi-tasking OS, and I spent ten years of my professional life squeezing true real-time performance out of Windows XP, which is not a real-time OS, so I do have some ideas of how the OS works and where it impacts you.

 

My advice would be to try it before you decide. Those 3000 cycles per original machine cycle will be spent faster than you think. Is it doable? Yeah, maybe. Is it going to be easy? I'd be surprised. At 30,000 cycles per original machine cycle I'd start being comfortable.

 

That sounds ridiculous, yes, but you are doing more in a machine emulation than just a small operation per clock tick. You are emulating all the systems in the computer, and you are also managing the /host/ resources and doing conversion. For the TI you can probably reasonably reduce it to the CPU, VDP, and sound chip, since the 9901 can be loosely considered an extension of the CPU and the memory systems are predictable. Probably should throw in the speech unit. The CPU needs at least 3,000,000 cycles per second (it's really 12MHz due to the 4-phase clock, but you can ignore that if the precise bus timing is not part of your emulation). The VDP needs to update at least at its pixel clock, which I don't have the numbers for (and you know better than I!), but is at least 3,000,000 pixels per second (ignoring overscan areas). The sound chip needs attention 44,000 times a second (if you want that level of audio quality), and the speech unit 8000 times per second. So that gives you 6,052,000 operations per second. Your budget becomes 495 cycles per operation (per chip!). That's not bad, but you wouldn't want to squander it. A single P4 cache miss, for example, takes from 18 to 276 cycles to resolve - that's just fetching data from RAM! Two cache misses and you've spent your budget. There are lots of other ways to eat up those cycles, and I haven't even touched an OS yet.
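The budget arithmetic in the post can be checked directly. This assumes a 3 GHz host and uses the per-chip minimum rates listed above; the 276-cycle figure is the worst-case P4 cache miss quoted in the post.

```python
HOST_HZ = 3_000_000_000  # assumption from the thread: a 3 GHz host CPU
WORST_MISS = 276         # worst-case P4 cache miss, in host cycles, per the post

# Minimum update rates per emulated chip, as listed in the post
rates = {
    "CPU (TMS9900)": 3_000_000,
    "VDP pixel clock (lower bound)": 3_000_000,
    "sound (44 kHz output)": 44_000,
    "speech": 8_000,
}

total_ops = sum(rates.values())
budget = HOST_HZ // total_ops  # whole host cycles available per emulated operation

print(f"{total_ops:,} operations/s -> {budget} host cycles per operation")
print(f"worst-case cache misses that fit in one budget: {budget // WORST_MISS}")
```

Only one worst-case miss fits per operation, which is the "two cache misses and you've spent your budget" point.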


A raw hardware cycle-accurate emulator is on my TODO list, somewhere. :-) Though I'm not convinced that "modern hardware" even makes a good platform for writing emulators. Memory is still grossly behind CPU speed to the point where, as you mentioned, getting a cache miss effectively destroys any CPU speed advantage. Without a raw-metal emulator to compare to it is hard to say for sure how much the OS is getting in the way. I just find it hard to accept that a 3000x increase in performance would not be good enough for cycle accurate emulation.

 

Although, I suppose the comparison needs to be done at the *slowest* component in a modern computer instead of the fastest. Modern SDRAM is still only pushing 70ns access time when you have to make a full request and really can't take advantage of burst or wide access (emulators don't *stream* data to/from RAM or access 64-bits).

 

Hmm, I have a pile of ideas now. If only I could quit my day job! :-)


Just to give an idea how MAME/MESS attempt to achieve cycle precision:

  • One component handles all timers in the emulated system. These include timers firing at regular intervals, such as the screen at 60 Hz, and ad-hoc timers (e.g. for the delay until the sector on the emulated floppy reaches the read head, or the delay until the GROM raises the READY line again).
  • This event subsystem offers a method which returns the time until the next timer is about to fire.
  • The scheduler lets each active component perform a specific number of cycles, which it calculates from the component's cycle time and the time until the next event occurs.
  • If all the cycles could be performed before the event occurs, the emulator will report an average speed of 100%.

That is, the emulator does not run 100% in sync with the real machine, but makes sure that the correct number of cycles is executed before a timer event occurs. Since we cannot perceive jitter on such short time scales, this is sufficient.
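A toy version of the scheme described in the list above, assuming exact rational time via `fractions.Fraction` to avoid rounding drift. This is not MAME's actual scheduler; the class and method names are hypothetical. Timers sit in a heap, the time until the next timer decides how many cycles each CPU runs, and periodic timers re-arm themselves.

```python
import heapq
from fractions import Fraction


class Scheduler:
    """Hypothetical event-driven scheduler; times are in seconds (Fractions)."""

    def __init__(self):
        self.now = Fraction(0)
        self.timers = []  # heap of (fire_time, seq, callback, period or None)
        self.seq = 0      # tie-breaker so callbacks are never compared
        self.cpus = []    # list of [name, clock_hz, cycles_run]

    def add_timer(self, delay, callback, period=None):
        heapq.heappush(self.timers, (self.now + delay, self.seq, callback, period))
        self.seq += 1

    def add_cpu(self, name, clock_hz):
        self.cpus.append([name, clock_hz, 0])

    def time_to_next_event(self):
        return self.timers[0][0] - self.now

    def run_until(self, end):
        """Process all timers due at or before `end` (now stops at the last one)."""
        while self.timers and self.timers[0][0] <= end:
            slice_len = self.time_to_next_event()
            # Each CPU executes the cycles that fit before the next event
            for cpu in self.cpus:
                cpu[2] += int(slice_len * cpu[1])
            fire_time, _, callback, period = heapq.heappop(self.timers)
            self.now = fire_time
            callback(self.now)
            if period is not None:  # periodic timers (e.g. 60 Hz screen) re-arm
                self.add_timer(period, callback, period)


if __name__ == "__main__":
    sched = Scheduler()
    sched.add_cpu("tms9900", 3_000_000)
    frames = []
    sched.add_timer(Fraction(1, 60), frames.append, period=Fraction(1, 60))
    # one-shot ad-hoc timer, e.g. a hypothetical 1 ms GROM READY delay
    sched.add_timer(Fraction(1, 1000), lambda t: print("one-shot fired at", t))
    sched.run_until(Fraction(1))
    print(len(frames), "frames,", sched.cpus[0][2], "CPU cycles")
```

Over one emulated second this fires the 60 Hz timer 60 times and credits the 3 MHz CPU with exactly 3,000,000 cycles, even though the one-shot timer splits the first frame into two slices.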

 

The problem that one has to consider with precise emulation is that there are usually more components than just a CPU, especially when there is also a video processor, a TMS9901 running in counter mode, various circuits with specific timing behavior, and so on.
