Classic99 Updates

Tursi · October 22, 2013

I can't help you with that one, more than likely APIs will need to be updated again. :/

Looks like I have the August 2009 SDK installed, it may still be on MSDN.

Tursi · October 22, 2013

-Fix post-increment instructions in GPU (this was the important one!)

-fix CRU testing of VDP interrupt bit

-Added support for F18 opcodes to disassembler (not fully tested)

-Add support for interleaved GPU operation (check option under Video)

-return end of frame on GPU address >7000 so it can be checked for (super hacky)

-add heat map to GPU VDP access

-make GPU VDP memory changes force a redraw

-override TriggerInterrupt for GPU to prevent potential errors

-protect 'muteaudio' call with audio critical section (helps pause on inactive?)

-more DSR debug checks

-fix for more than 100 cartridges across groups

-updated source folder

http://harmlesslion.com/software/classic99

Fair warning: the GPU interleaving is very hacky and may not work very well. It may also have broken the old single-CPU mode, but it appears to work okay. Switching GPU interleaving on and off during a running program is not guaranteed to work (well, none of it is /guaranteed/). Also, the interleaved GPU will only run if the main CPU leaves enough time, so if your machine can't keep up or you turn on overdrive, the GPU probably won't run. While interleaved GPU is on, single-stepping the GPU will NOT WORK AT ALL. You have to turn it off if you want to use the debugger.

The end-of-frame on the GPU register at >7000 always returns 192 -- I just wanted to see a demo run. It's not synchronized with anything so you can expect things to flicker.

I'm not positive if the mute on "pause when window inactive" was fixed with the small change I made, but I couldn't reproduce it afterwards. Maybe it was just a race? Audio should mute any time the main window loses focus with that option on (this includes going to the debugger window, although the emulator won't pause).

Asmusr · October 22, 2013

The sound problem is fixed, the CRU bit is working, and my GPU demo is working! All my wishes come true. :thumbsup:

Tursi · October 22, 2013

yay! Your demo was why I tried for the interleaving CPU/GPU. The code is done very wrong, like most of the current F18A support, but it was cool to see your demo come up. (I apologize for the flicker, fixing that is further out).

Thanks for confirming on the audio, too!

+InsaneMultitasker · October 22, 2013

Excellent. So when does Classic99 become sentient so that it can contribute to the community?

+OLD CS1 · October 23, 2013

Due to the work of Sarah Connor, the day for this keeps getting pushed back. But, rest assured, there is a Classic99 army of sentient machines sent back from the future to ensure that this will indeed happen.

Tursi · October 23, 2013

At least you can take pride in knowing they have a 9900 at their core. You all will be the last hope to save humanity with your knowledge of their assembly language (or join the machines and keep upgrading them, either way they'll probably keep you around).

Tursi · October 27, 2013

-Added initial support for F18A ECM sprites provided by RamusM

-Fixed menu list for TurboForth 1.2 (which was added last release!)

http://harmlesslion.com/software/classic99

OX. · October 28, 2013

Tried the latest Classic99 with Scramble - can't say I'm seeing any difference with the sprites

Asmusr · October 28, 2013

Do you have the F18A GPU enabled in the Video menu?

Omega-TI · October 28, 2013

I've not been able to get it to work either....

Asmusr · October 28, 2013

For some reason you need to disable Interleave GPU. TI Scramble is not using the GPU except for the F18A detection.

Edit: Titanium is not starting up the F18A mode either if you interleave the GPU.

Edit 2: The GPU-based F18A detection routine is working under the (valid) assumption that the GPU is much faster than the CPU and returns its result before the CPU has a chance to read it. Perhaps this is why it fails when you interleave the GPU in Classic99? Without interleaving, the CPU is waiting for the GPU to finish.

http://atariage.com/forums/topic/207586-f18a-programming-info-and-resources/?do=findComment&comment=2676863

Omega-TI · October 28, 2013

It worked! THANKS!

Tursi · October 28, 2013

Ahhh.. that makes sense. The CPU doesn't let the GPU run before it checks to see if the GPU has already run. That's a Classic99 hack and wouldn't happen on hardware, since the GPU is many, many times faster than the CPU.

Yay, first interleaving bug!
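
A toy model of that race, as a rough sketch only: `ToyGpu`, `detect` and the `run_gpu_on_trigger` flag are made-up names for illustration and are not Classic99 code. It assumes the detection routine triggers the GPU and reads the result in the very next CPU instruction, and that the interleaved GPU only gets leftover time afterwards.

```python
# Toy model of the F18A detection race; every name here is hypothetical.

class ToyGpu:
    def __init__(self):
        self.running = False
        self.result = 0          # stands in for the VDP byte the GPU writes

    def trigger(self):
        self.running = True

    def step(self):
        # One GPU "instruction": write the detection marker and stop.
        if self.running:
            self.result = 1
            self.running = False


def detect(run_gpu_on_trigger):
    gpu = ToyGpu()
    # CPU instruction 1: write the VDP register that starts the GPU.
    gpu.trigger()
    if run_gpu_on_trigger:
        # End the CPU timeslice here so the GPU gets to run first.
        gpu.step()
    # CPU instruction 2: read back the marker immediately.
    detected = gpu.result == 1
    # Leftover time: without the yield, the GPU only runs now, too late.
    gpu.step()
    return detected
```

With `run_gpu_on_trigger=False` the read happens before the GPU has done anything and detection fails; with `True` the GPU's write lands first, which is what the detection routine assumes.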

matthew180 · October 28, 2013

Tursi, I'm not sure how you implemented the GPU but it seems having it run in its own thread and execute instructions as fast as it can would be the most reliable way. In the F18A the GPU sits and runs asynchronously and any synchronization is up to the host program.

You should only have to ensure that when the GPU is triggered, the 99/4A CPU yields so that the GPU gets to update its state, which happens instantly on the real hardware as a result of the host-to-VDP write operation. It is not so much that the GPU is running much faster than the host CPU, but rather that the GPU responds in real time, and in parallel, to the host system. When the GPU is triggered, it will update its "running" status even before the host system finishes the VDP register write operation.

But even if the GPU were running at the same speed as the host system, the internal FSM would still be many times faster than the host CPU's fetch, decode, execute cycle, and even a slow GPU would easily have updated its status before the host CPU could execute the instruction that checks the status.

You probably realize all this though...

OX. · October 28, 2013

Thanks for the update, fellas

Stuart · October 28, 2013

379 bank-switched ROMs - it looks like the .bin file is a straight dump of the (EP)ROM - is that correct? Is there a way (in Classic99) to set which bank the 379 boots up in, to be able to test that the cartridge boots correctly from each bank?

Thanks,

Stuart.

Edited October 28, 2013 by Stuart

Willsy · October 28, 2013

Yes - a straight dump of the eprom file. Bank 1 begins at >0000 and bank 0 begins at >8000 in the eprom, if I remember correctly.

You cannot set which bank comes up in Classic99 or any other emulator as far as I know. You need to put headers in both banks.
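
As a rough way to check the "headers in both banks" advice on a dump, here is a minimal sketch. It assumes 8K banks and that a valid header starts with the standard >AA byte; `banks_missing_header` is a hypothetical helper, and the index it reports is the position in the file, which may not match the hardware bank number given the bank order described above.

```python
BANK_SIZE = 0x2000      # 8K per bank (assumption for this sketch)
HEADER_BYTE = 0xAA      # first byte of a standard TI cartridge header

def banks_missing_header(image: bytes):
    """Return the file-order indexes of banks whose first byte is not >AA."""
    missing = []
    for index in range(len(image) // BANK_SIZE):
        if image[index * BANK_SIZE] != HEADER_BYTE:
            missing.append(index)
    return missing
```

An empty list means every bank in the file at least starts with a header marker, so the cartridge should be recognized whichever bank is active at power-up.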

Willsy · October 28, 2013

I have a program written by Tursi that takes an option 3 object file assembled by Cory Burr's ASM994A and produces bin files. For my 16k EPROMs I then use the DOS copy command to combine them into one 16k EPROM file.

Want a copy of the program?

Tursi · October 28, 2013

matthew180 wrote: "Tursi, I'm not sure how you implemented the GPU but it seems having it run in its own thread and execute instructions as fast as it can would be the most reliable way. In the F18A the GPU sits and runs asynchronously and any synchronization is up to the host program."

On the surface, yes, but in reality just that won't always work either. Because the operating system schedules threads in fixed increments, you can't guarantee with that approach that the GPU thread actually runs before the CPU thread makes it to the next step. Anyway, two parts are necessary - the CPU thread releasing its timeslice, and the GPU thread /getting/ it. This can be done with a wait object, which is how I'll probably do it.
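
A minimal sketch of the wait-object idea, using Python's `threading.Event` as a stand-in for a native wait object. This is not how Classic99 is written; `run_detection`, `gpu_start`, `gpu_done` and the `vdp` dictionary are hypothetical names for illustration.

```python
import threading

def run_detection():
    gpu_start = threading.Event()   # CPU -> GPU: "you have been triggered"
    gpu_done = threading.Event()    # GPU -> CPU: "my write has landed"
    vdp = {"marker": 0}             # stands in for shared VDP memory

    def gpu_thread():
        gpu_start.wait()            # sleep until the CPU triggers the GPU
        vdp["marker"] = 1           # the write the detection code looks for
        gpu_done.set()              # hand control back to the CPU thread

    worker = threading.Thread(target=gpu_thread)
    worker.start()

    gpu_start.set()                 # CPU: write the trigger register
    gpu_done.wait(timeout=1.0)      # CPU: give up the timeslice until the GPU ran
    seen = vdp["marker"]            # CPU: the next instruction reads the result
    worker.join()
    return seen
```

The point is that both halves are explicit: the CPU side blocks instead of racing ahead, and the GPU side signals once its write is visible, so the order no longer depends on how the OS happens to schedule the two threads.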

For the current version, it's easier, because the interleave just uses "left over" CPU time, so I can just end the CPU timeslice when the GPU is started. I probably won't get to that fix this week, though.

Most reliable is always going to be single-threaded, counting every cycle and executing what is due, but, computers aren't fast enough for that yet. Someday they will be, and that will be cool. For now, multi-threaded /is/ more fun, you just have to be more careful in what you assume.

The assumption behind the detection trick, which I wrote, remember, is that the GPU must be /guaranteed/ to get to the write before the CPU does. Yeah, there are a number of things that make it more likely even if they're the same speed, but the assumption was that it was faster. The assumption was that I can start the GPU and immediately, in the next CPU instruction, read back the data. /Why/ the GPU is faster doesn't matter to it.

matthew180 · October 30, 2013

Well, a modern 3GHz computer is 3000 times faster than the 99/4A, so I would hope that a modern computer would be fast enough to cycle emulate a 3MHz 16-bit machine. Even a 300MHz machine is 100 times faster and should be able to do it. The problem is trying to run such an emulator on a time slice pre-emptive OS. If you wrote the emulator as the *only* program running on the CPU, then it would be easily possible.

There is such a platform that could be used on modern systems:

http://www.returninfinity.com/

Tursi · October 30, 2013

This argument goes back to the very beginnings of emulation. Since I started programming my emulation under DOS, which is /not/ a multi-tasking OS, and because I spent ten years of my professional life squeezing true real time performance out of Windows XP, which is not, I do have some ideas of how the OS works and where it impacts you.

My advice would be to try it before you say. Those 3000 cycles per original machine cycle will be spent faster than you think. Is it doable? Yeah, maybe. Is it going to be easy? I'd be surprised. At 30,000 cycles per original machine cycle I'd start being comfortable.

That sounds ridiculous, yes, but you are doing more in a machine emulation than just a small operation per clock tick. You are emulating all the systems in the computer, and you are also managing the /host/ resources and doing conversion. For the TI you can probably reasonably reduce it to the CPU, VDP, and sound chip, since the 9901 can be loosely considered an extension of the CPU and the memory systems are predictable. Probably should throw in the speech unit. The CPU needs at least 3,000,000 cycles per second (it's really 12MHz due to the 4-phase clock, but you can ignore that if the precise bus timing is not part of your emulation). The VDP needs to update at least at its pixel clock, which I don't have the numbers for (and you know better than I!), but is at least 3,000,000 pixels per second (ignoring overscan areas). The sound chip needs attention 44,000 times a second (if you want that level of audio quality), and the speech unit 8000 times per second. So that gives you 6,052,000 operations per second. Your budget becomes 495 cycles per operation (per chip!). That's not bad, but you wouldn't want to squander it. A single P4 cache miss, for example, takes from 18 to 276 cycles to resolve - that's just fetching data from RAM! Two cache misses and you've spent your budget. There are lots of other ways to eat up those cycles, and I haven't even touched an OS yet.
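
The budget arithmetic above can be reproduced with a few lines. The rates are the ones quoted in the paragraph and the constant names are just labels for this sketch. Note that 3 GHz over 3 MHz works out to a ratio of 1000, and the 495 figure is what a 3 GHz host gives.

```python
HOST_HZ = 3_000_000_000      # 3 GHz host, as in the discussion above
CPU_HZ = 3_000_000           # TMS9900 cycles per second
VDP_HZ = 3_000_000           # lower bound on pixels per second
SOUND_HZ = 44_000            # audio samples per second
SPEECH_HZ = 8_000            # speech samples per second

ops_per_second = CPU_HZ + VDP_HZ + SOUND_HZ + SPEECH_HZ
budget = HOST_HZ // ops_per_second             # host cycles per emulated operation
worst_miss = 276                               # worst-case P4 cache miss quoted above
misses_to_exhaust = -(-budget // worst_miss)   # ceiling division

print(ops_per_second, budget, misses_to_exhaust)
```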

matthew180 · October 30, 2013

A raw hardware cycle-accurate emulator is on my TODO list, somewhere. :-) Though I'm not convinced that "modern hardware" even makes a good platform for writing emulators. Memory is still grossly behind CPU speed to the point where, as you mentioned, getting a cache miss effectively destroys any CPU speed advantage. Without a raw-metal emulator to compare to it is hard to say for sure how much the OS is getting in the way. I just find it hard to accept that a 3000x increase in performance would not be good enough for cycle accurate emulation.

Although, I suppose the comparison needs to be done at the *slowest* component in a modern computer instead of the fastest. Modern SDRAM is still only pushing 70ns access time when you have to make a full request and really can't take advantage of burst or wide access (emulators don't *stream* data to/from RAM or access 64-bits).

Hmm, I have a pile of ideas now. If only I could quit my day job! :-)

+mizapf · October 30, 2013

Just to give an idea how MAME/MESS attempt to achieve cycle precision:

One component handles all timers in the emulated system. This may be the screen at 60 Hz, firing at regular intervals, and ad-hoc timers (e.g. for the delay of the sector on the emulated floppy until it reaches the read head, or the delay until the GROM raises the READY line again).
This event subsystem offers a method which returns the time until the next timer is about to fire.
The scheduler lets each active component perform a specific number of cycles, which it calculates from its cycle time and the time until the next event occurs.
If all the cycles could be performed before the event occurs, the emulator will report an average speed of 100%.

That is, the emulator does not run 100% in sync with the real machine, but makes sure that a certain number of cycles are executed until a timer event occurs. Since we cannot perceive a jitter in those short time scales, this is just sufficient.
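
A very reduced sketch of that scheduling idea, not MAME/MESS code: `run_until`, the component table and the timer list are invented for illustration, times are in nanoseconds, and leftover fractions of a cycle at the end of each slice are simply dropped.

```python
import heapq

def run_until(end_time_ns, components, timers):
    """components: name -> cycle time in ns; timers: list of (fire_time_ns, label)."""
    heapq.heapify(timers)
    now = 0
    executed = {name: 0 for name in components}
    fired = []
    while now < end_time_ns:
        # Time until the next timer is about to fire (or the end of the run).
        next_event = timers[0][0] if timers else end_time_ns
        next_event = min(next_event, end_time_ns)
        slice_ns = next_event - now
        # Each component performs as many whole cycles as fit in the slice.
        for name, cycle_ns in components.items():
            executed[name] += slice_ns // cycle_ns
        now = next_event
        # Fire every timer that is due at this instant.
        while timers and timers[0][0] <= now:
            fired.append(heapq.heappop(timers)[1])
    return executed, fired
```

For example, `run_until(4000, {"cpu": 250, "vdp": 100}, [(3000, "b"), (1000, "a")])` runs three slices (0 to 1000, 1000 to 3000, 3000 to 4000) and fires the timers in time order between them.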

The problem that one has to consider with precise emulation is that there are usually more components than just a CPU, especially when there is also a video processor, a TMS9901 running in counter mode, various circuits with specific timing behavior, and so on.

Asmusr · October 30, 2013

Perhaps we should have a general MESS thread for this type of question, but how would you go about emulating the F18A in MESS?
