7.16mhz 1200XL

atariksi · September 15, 2008

No real need to boot up in turbo mode if it's a software speed-switchable system.

You gain nothing with disk I/O having a quicker CPU (although of course, compression systems would benefit greatly).

My comment on the "OS State" was in reference to the earlier suggestion of sharing the OS bit and having it as the control method to go into turbo mode.

I'm not fanciful about the idea as it then means you lose a lot of versatility so far as being able to arbitrarily jump in and out of fast mode.

Doesn't disk i/o in the OS use timeouts and loops that would affect compatibility if they were in turbo-mode? That's why I was stating you would need a substitute OS or not be in turbo mode.

There isn't that much software that runs with the ROM disabled that would suffer from turbo mode being enabled at the same time (in machines with this sort of accelerator).

Rybags · September 15, 2008

Yes, I think some of that was covered earlier.

I checked the hardcopy OS-B Source that I have the other day, and it uses nested DEX/DEY in between asserting the Command Line and sending the first byte.

But, you're running from ROM when you boot up, so that sort of delay loop should take the same amount of time to execute since ROM and mapped I/O accesses are always in "slow" mode.

Of course, for RAM-based SIO routines that use similar loops, there is a definite possibility of problems.
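To put rough numbers on why a RAM-resident delay loop is at risk, here is a small Python sketch. The loop shape and the counts are hypothetical stand-ins, not the actual OS-B code; only the standard 6502 cycle costs are assumed, and page-crossing branch penalties are ignored.

```python
# Hypothetical nested delay loop (NOT the real OS-B source):
#       LDX #outer      ; 2 cycles
# o:    LDY #inner      ; 2
# i:    DEY             ; 2
#       BNE i           ; 3 if taken, 2 on fall-through
#       DEX             ; 2
#       BNE o           ; 3 if taken, 2 on fall-through

def delay_cycles(outer: int, inner: int) -> int:
    """CPU cycles for the loop above; counts in 1..255, no page crossings."""
    inner_pass = 5 * inner - 1           # DEY+BNE per pass, last BNE falls through
    outer_pass = 2 + inner_pass + 2 + 3  # LDY + inner loop + DEX + taken BNE
    return 2 + outer * outer_pass - 1    # LDX, and the last outer BNE falls through

def delay_us(outer: int, inner: int, cpu_mhz: float) -> float:
    """Wall-clock length of the delay in microseconds at a given CPU clock."""
    return delay_cycles(outer, inner) / cpu_mhz

if __name__ == "__main__":
    # Same loop, run from RAM at the stock clock and at the 4x turbo clock.
    for mhz in (1.79, 7.16):
        print(f"{mhz:5.2f} MHz: {delay_us(4, 200, mhz):8.1f} us")
```

A loop like this run from fast RAM finishes in a quarter of the time at 7.16 MHz, which is the kind of shortened timeout the post is warning about. The same loop executed from ROM would not shrink, per the point above about ROM accesses staying in "slow" mode.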

drac030 · September 15, 2008

Well, I would assume that anyone running such code would have one of the other accelerators, so what would be the point? The objective in using a control register that cannot be reached by existing 6502 code is to minimize conflicts.

The project that you refer to is a very ambitious undertaking, on a completely different plane than the XL7. I keep stuff pretty simple because it all has to fit in my head.

The point is to not conflict with other designs and already existing software. For example, one of the HPU registers in F7 can be (and is) used to test whether this is an F7 or not (i.e. a Warp4). The 65816 OS needs to check what hardware it is running on, and e.g. reset the HPU when it is present. If you locate your control register where the F7 has the HPU and the Warp4 has nothing, then there will be a compatibility problem. Also, some programs (e.g. SysInfo) scan the 16 MB address space to detect whether there is RAM there.

An address decoder that locates the control register at some unused address (e.g. $FF8000) should be simple enough. But if you are going to take the whole 64 KB (or whole 16 MB) for just one control register, then well - there is some difference between "simple" and "primitive".

@Larry: F7 is off, but there are some prospects on Warp4. This does not depend on me, unfortunately (because if it did, it would be long done).

Edited September 15, 2008 by drac030

+bob1200xl · September 15, 2008

The clock would be a little weird if you wanted 12.53 mhz... there isn't enough difference between the full 14.32mhz and a 7x clock to solve the problem, anyway. 80ns RAM would not be fast enough - a 12mhz system would only have 40ns or so to access RAM. I have 20ns SRAM if we need it.

Bob

If the 6X mode was your Radar, wouldn't it be better to go for the 7X (12.53Mhz) multiplier and use like 80ns RAM?

I could have figured that you would go straight to this one, Claus... It has been at the top of my list, also.

Basically, there is too long a path to resolve the hardware state in 35ns or less. A 14mhz clock allows you the first half of the cycle to decide if you're going to RAM (at 14mhz) or ROM and I/O (at 1.79mhz). This amounts to 35ns. This path is from the CPU, down to the Atari PAL, thru the PAL, back to the CPLD, and thru the CPLD. Even by using 10ns parts in the PAL and CPLD, we lose that race once in a while - way too often to call it 'usable'. The state is latched at 35ns - half-way into the 70ns clock.

It looks OK on a scope except for that little critter that flicks out towards 35ns...

I have considered using an odd clock value - one that is not a multiple of 2 of the Atari clock. Something like 52.5ns (3x17.5) of setup and 35ns (2x17.5) of execution, giving you the potential to run six times faster. Might be doable... by someone else - not on my Radar.
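The timing figures in this post can be checked with a few lines of Python. This is only a sketch: it assumes the NTSC Atari CPU clock of 1.7897725 MHz as the base rate, and the function names are made up for illustration.

```python
ATARI_MHZ = 1.7897725  # NTSC Atari CPU clock; assumed here as the base rate

def period_ns(mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / mhz

def decision_window_ns(multiplier: float) -> float:
    """First half of the fast cycle: time to decide RAM vs ROM/I-O."""
    return period_ns(ATARI_MHZ * multiplier) / 2.0

def split_clock_multiplier(setup_units: int, exec_units: int) -> float:
    """Effective speed-up for a cycle built from quarter-periods of the 8x clock."""
    unit_ns = period_ns(ATARI_MHZ * 8) / 4.0        # about 17.5 ns
    cycle_ns = (setup_units + exec_units) * unit_ns
    return period_ns(ATARI_MHZ) / cycle_ns

if __name__ == "__main__":
    print(f"8x half-cycle window: {decision_window_ns(8):.1f} ns")
    print(f"7x half-cycle window: {decision_window_ns(7):.1f} ns")
    print(f"3+2 unit cycle speed-up: {split_clock_multiplier(3, 2):.1f}x")
```

Under these assumptions the 8x clock leaves just under 35 ns for the decision, and a 3+2 split of 17.5 ns units gives an 87.5 ns cycle, a little over six times the stock rate.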

Otherwise, consider a RAM-only processor, or one that could talk to everybody at 14mhz. (the limit of the 65816 itself) Everybody being FRAM(s) in a cartridge along with a 14mhz CPU. Not much to it - not many decisions to make - select FRAM or SRAM? That's about it. Everything else is s/w. Run like crazy!

Question: if you set up multiple FRAMs with the same starting parameters at the beginning of each video frame, (like all start at address $0000) couldn't you load/start them concurrently? No reason to load each one by itself, is there? Only the data for each memory has to be done individually? There is no protocol, per se. The CPU talks - the FRAMs better be listening.

Not to be greedy but...

How much faster could it be made to go? Which component limits the design to 7 MHz (besides the clock)?

+bob1200xl · September 15, 2008

I don't have a problem with fitting in with existing designs as long as it is relatively straightforward. What is the proper method of recognizing an '816 implementation and toggling 'turbo-mode'?

Bob

peteym5 · September 15, 2008

I thought running at 7x was a little strange and probably could not evenly divide up the cycles back to the main board. Running at 4x or 6x would probably be more than fast enough for running any existing Atari software. Something I had thought about: wouldn't it now be easier to incorporate an IDE interface with the extra address space? A much better MIO-type device could be created. Since the address range is 16 megs, the needed addresses can be placed in the $FFxxxx range.

Question about that OS. Does it support binary loading above the 64k area?

Back to the subject of Player/Missile multiplexing. I prefer the DLI Zone method because it also leaves the option to include playfield color changes. Another project I have sitting on the back burner, "Jungle Quest", uses many playfield color changes along with multicolored player/missile graphics. The DLI zones have to scroll vertically as well. This is something I have full control over, and I plan to create several versions for different Atari hardware and memory configurations. I plan to have worlds that fit on a 64k machine, with larger ones that take up 128k. I can easily make a custom 65816 version with more intense color changing that takes advantage of the 24-bit address space. Wouldn't we like to see something demonstrating what these accelerated Ataris can do?

+bob1200xl · September 15, 2008

Some of the newer CF cards run at very high speed - they could be addressed in the $FFxxxx range if we had extended memory, but we do not. We were talking about a 'turbo' switch control register, which we can toggle at powerup using $FFxxxx as a one-shot decision. No function exists for dynamic turbo, much less an IDE interface in the extended memory. All of that kind of thing will have to be implemented in the existing Atari hardware map - at low speed.

Bob

DamageX · September 16, 2008

I don't know why you'd consider a software-controlled turbo switch rather than a physical one. It seems to me that one would just end up wasting a lot of time having to reboot the machine whenever a speed change was desired. Even if you have a patched OS with a key combination to toggle the setting, some programs will disable the OS and then you have no control. With a real switch you always have full control.

+bob1200xl · September 16, 2008

You make a good case for a physical switch but it does not invalidate a software interface. Some programs may need to jump out of turbo mode for a while in order to run a routine that doesn't like the extra speed. I don't think the idea was to have a manual s/w interface - you would be better off with a mechanical switch in that case, yes.

Bob

Shawn Jefferson · September 16, 2008

I don't want your project to get bogged down in feature-creep, so I'd say keep it as simple as you can, but the question of where to put a software switch is important IMO, since you may be setting the "standard" for this family of upgrades, much as Atari set the standard for $D301 memory bank switching. For instance, making the switch attached to any read/write to $FFxxxx creates a "standard" that will make putting a full 16MB of RAM on the board difficult later on.

Rybags · September 16, 2008

I don't see much problem with $FFxxxx.

And for massive RAM upgrades in the future, nothing wrong either with bank-switching something like 2, 4 or 8 meg chunks.

+bob1200xl · September 16, 2008

I understand, although 16mb is some ways down the road - largest SRAM I have seen is 2mb... how about $D5xx? Only carts use that, as far as I know. Do we know what is available in that address range?

It isn't like the system is going to crash if it drops out of turbo mode, anyway. Might be a problem if it jumps into turbo.

Bob

drac030 · September 16, 2008

@bob: in existing implementations there is no such software switch, so a register like the one you are considering would provide something new. In Warp 4 the clocks are switched by address: if the address is $00xxxx, the clock is "slow", otherwise it is fast (unless the turbo is disabled by a hardware switch). In F7 the turbo mode is enabled permanently; you only decide in software how the fast memory is to be mapped into the address space (e.g. you can map Atari memory onto $0000-$FFFF, and then it works more or less like Warp 4, or you can map the card's memory onto $0000-$FFFF, and then the entire address space is fast). Again, this can be overridden by a hardware switch.
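As a rough model of the Warp 4 behaviour just described, the clock selection can be sketched in Python. The function name and the `turbo_switch_on` flag are invented for illustration; this only restates the rule from the post and says nothing about the real logic equations.

```python
def warp4_style_clock(address: int, turbo_switch_on: bool = True) -> str:
    """Return 'slow' or 'fast' for a 24-bit address, per the rule described above."""
    bank = (address >> 16) & 0xFF      # top 8 bits of the 24-bit address
    if not turbo_switch_on:            # hardware switch disables turbo entirely
        return "slow"
    return "slow" if bank == 0x00 else "fast"

if __name__ == "__main__":
    for addr in (0x00D40A, 0x002000, 0x012000, 0xFF8000):
        print(f"${addr:06X}: {warp4_style_clock(addr)}")
```

Note that in this scheme there is nothing for software to toggle: the speed follows from where the code and data live, which is why a dedicated control register would be something new.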

$D5xx is a bad idea: this page is already overloaded, so there is a high possibility of a conflict (with a Maxflash cart, for instance). I think the register should be located outside the first 64K and inside the bank $FFxxxx, as you planned. As I wrote, this area is allocated for I/O by other implementations anyway. This would need a 24-bit address decoder anyway, so making it more precise, i.e. decoding it at $FF8000 (or $FF8000-$FF80FF), should not be a big problem.
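The difference between a precise decoder and one that claims the whole bank can be sketched in Python as address masks. $FF8000 is the example address from the post, not a settled standard, and all function names here are hypothetical.

```python
TURBO_REG = 0xFF8000  # example address from the post; an assumption, not a standard

def decode_exact(address: int) -> bool:
    """Responds only at $FF8000: all 24 address bits are compared."""
    return (address & 0xFFFFFF) == TURBO_REG

def decode_page(address: int) -> bool:
    """Responds at $FF8000-$FF80FF: the low 8 bits are ignored."""
    return (address & 0xFFFF00) == TURBO_REG

def decode_bank(address: int) -> bool:
    """Responds anywhere in $FFxxxx: only the top 8 bits are compared."""
    return (address & 0xFF0000) == (TURBO_REG & 0xFF0000)

def addresses_consumed(mask: int) -> int:
    """How many of the 2**24 addresses a decoder with this compare mask claims."""
    ignored_bits = 24 - bin(mask & 0xFFFFFF).count("1")
    return 1 << ignored_bits

if __name__ == "__main__":
    for name, mask in (("exact", 0xFFFFFF), ("page", 0xFFFF00), ("bank", 0xFF0000)):
        print(f"{name:5s} decoder claims {addresses_consumed(mask)} addresses")
```

Comparing more address bits costs a little more decoding logic but leaves the rest of the bank free for other I/O or RAM, which is the trade-off being argued here.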

Other things: even if the largest SRAM is 2 MB, I don't see a problem with using 8 of them to make up 16 MB (minus 128K). Banking "large chunks" is a, hm, "interesting" idea, but not everyone likes banking so much as to implement it even when it is not necessary (in other words: I don't understand the pleasure of having 8 banks of 2 MB instead of a linear 16 MB).

@peteym: the Atari OS does not support binary loading at all, and the addresses in the "boot file" header are 16-bit, so it does not. However, the SIO is extended so that it can transfer sectors directly from/to the area above the first 64K, so some support for a DOS, that could implement an extended binary loader is there.

atariksi · September 16, 2008

It's easier for turbo software to boot in normal mode and then switch over to turbo mode (on demand) for the new, tested software that works in turbo mode. And the accelerator hardware is easier to install w/o a physical switch. A dynamic turbo software switch would allow you to use existing OSes w/o any patches and thus make the accelerator backward compatible (unless the software tampered with the PORTB or whatever hardware memory mapped area was used for enabling turbo mode).

atariksi · September 16, 2008

Okay, I thought the max Mhz capability of the 65816 and 35 ns were limiting factors in the 14.3Mhz version so I thought next lower step would be 7X instead of 8X and probably fill in wait states when alignment of cycles was needed. I guess memory reads in parallel into a latch or cache would be too much to do (and won't show on your radar). I mean if you leave some burden on software to wait for aligned cycles when switching modes, would that help do 7X?

atariksi · September 16, 2008

It's more flexible to use a 16-bit address like current Atari hardware registers since then older software can easily adapt to turbo mode without having to use 24-bit addresses and still remain backward compatible with normal Ataris. Even in BASIC, one can enable the turbo mode and make certain calculations faster and if the same program is run on a normal Atari, it still runs but the turbo mode never gets enabled (has no effect).

drac030 · September 16, 2008

That's some point. Just not $D5xx, nor $D1xx.

peteym5 · September 16, 2008

How hard would it be to wire something onto the same page as GTIA, PIA, POKEY, or ANTIC? Let's say use the last byte, like $D0FF, $D2FF, $D3FF, $D4FF, something that is not used by the internal hardware, cartridges, or known attached devices. I know dual POKEY chips can be wired onto the same page 16 bytes apart.

Shaun.Bebbington · September 17, 2008

And have you tried softloading Drac030's '816 ROM? Does it speed things up?

It is not intended to speed things up. It just allows new programs to fully utilize the 65C816 potential (i.e. the "native", 16-bit mode).

Which would surely speed things up? Like, long branching and 16-bit maths and accumulator and an enhanced instruction set etc...

Okay, so there might not be much noticeable difference, but wouldn't the move opcodes be faster than lots of LDA/STA? That sort of thing?

Regards,

Shaun.

Shaun.Bebbington · September 17, 2008

Not to be greedy but...

How much faster could it be made to go? Which component limits the design to 7 MHz (besides the clock)?

The SuperCPU is over-clocked at 20Mhz (there is no difference between a PAL and NTSC unit), from an internal 65816 at 14Mhz, but this causes problems such as heat. I suppose the board could easily run at 14Mhz without a problem then.

Regards,

Shaun.

drac030 · September 17, 2008

Long branches and extended addressing modes are always available, regardless of the mode the CPU runs. The native mode "only" gives access to 16-bit registers, and, when we put it this way, yes, this "speeds things up", of course only for new programs which use the 16-bit loads/stores/calculations.

An example: let's have a 16-bit value stored in memory. We need to add a 16-bit constant ($6502) to it and store the result somewhere else. The standard way:

   lda var
   clc
   adc #$02
   sta ext
   lda var+1
   adc #$65
   sta ext+1

Assuming that var and ext are outside the zero page, this is 22 clocks. Now 65C816:

   lda var
   clc
   adc #$6502
   sta ext

Assuming the same as above, this takes 15 clocks, about 32% fewer cycles. But to run this code you need to be able to switch to native mode, and for that you have to replace the OS.
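The two cycle totals can be reproduced by adding up per-instruction costs. The Python sketch below uses the standard cycle counts for absolute and immediate addressing; the 65C816 column assumes native mode with a 16-bit accumulator, where each data access costs one extra cycle. The table and function names are made up for illustration.

```python
# Cycles per instruction with an 8-bit accumulator (6502 style).
CYCLES_8BIT = {"lda abs": 4, "clc": 2, "adc #imm": 2, "sta abs": 4}
# Same instructions on a 65C816 with a 16-bit accumulator:
# one extra cycle wherever a second data byte is moved.
CYCLES_16BIT = {"lda abs": 5, "clc": 2, "adc #imm": 3, "sta abs": 5}

PROGRAM_8BIT = ["lda abs", "clc", "adc #imm", "sta abs",
                "lda abs", "adc #imm", "sta abs"]
PROGRAM_16BIT = ["lda abs", "clc", "adc #imm", "sta abs"]

def total_cycles(program, table) -> int:
    """Sum the cycle cost of each instruction in the listing."""
    return sum(table[op] for op in program)

if __name__ == "__main__":
    slow = total_cycles(PROGRAM_8BIT, CYCLES_8BIT)
    fast = total_cycles(PROGRAM_16BIT, CYCLES_16BIT)
    print(f"8-bit: {slow} cycles, 16-bit: {fast} cycles")
    print(f"cycles saved: {1 - fast / slow:.0%}, throughput: {slow / fast:.2f}x")
```

Going from 22 to 15 cycles saves about 32% of the cycles, which corresponds to roughly 1.47 times the throughput for this particular sequence.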

peteym5 · September 17, 2008

How much faster could it be made to go? Which component limits the design to 7 MHz?

The SuperCPU is over-clocked at 20Mhz from an internal 65816 at 14Mhz ... I suppose the board could easily run at 14Mhz...

Around 14 mhz is the speed limit of these CPUs, which would probably let them run at 8x the speed of an Atari. You would have to keep certain limitations in mind when writing to the Atari hardware area: you cannot write faster than the equivalent of one update per 1.79mhz cycle. An LDA #, STA $Dxxx pair, which takes 6 cycles, would translate to less than one Atari cycle. If you get to this speed, you probably need to start adding NOPs inside the DLIs, or anywhere else that sends lots of writes to the hardware area. The ANTIC/GTIA at least has to run at 1.79 mhz. POKEY chips in arcade machines ran at 1mhz, 1.5mhz, and 2mhz without any problems.
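The arithmetic behind that concern can be sketched in Python. It assumes, as this post does, that every cycle of the sequence runs at the fast rate; earlier posts in the thread say mapped I/O accesses are always made in "slow" mode, so on the actual board the STA's write cycle would be stretched. The function names are hypothetical.

```python
def atari_cycles(fast_cycles: int, multiplier: int) -> float:
    """Length of a fast-clock sequence measured in stock 1.79 MHz cycles."""
    return fast_cycles / multiplier

def nops_to_pad(fast_cycles: int, multiplier: int) -> int:
    """2-cycle NOPs needed so the sequence spans at least one stock cycle."""
    shortfall = multiplier - fast_cycles
    return max(0, -(-shortfall // 2))   # ceiling division, never negative

if __name__ == "__main__":
    # LDA #imm (2) + STA abs (4) = 6 fast cycles
    for mult in (4, 8):
        print(f"{mult}x: {atari_cycles(6, mult):.2f} Atari cycles, "
              f"{nops_to_pad(6, mult)} NOP(s) of padding")
```

At 8x the six-cycle pair spans 0.75 of a stock cycle, so under this simplified model one NOP per write would be enough to keep successive hardware writes at least one stock cycle apart.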

On the subject of how fast one of these things can go, they have mentioned on the Western Design Center website that an ASIC version can run up to 200mhz. Those are like programmable logic chips. Doing an ASIC version probably opens up the possibility of running at different speeds and you could do your speed switch with one of those. You can add cache memory to buffer the writes to the hardware. This design is a little more complicated and I know we want to keep this stuff simple.

Honestly I think going over 8x is ridiculous, you will run into issues with interfacing the Atari video/audio chipset and other hardware.

atariksi · September 17, 2008

...
   lda var
   clc
   adc #$6502
   sta ext

Assuming the same as above, this takes 15 clocks, about 32% fewer cycles. But to run this code you need to be able to switch to native mode, and for that you have to replace the OS.

I don't have a 65816 to play around with. What's the cycle breakdown of the above for the 65816: does the 2nd read cycle for the MSB in the LDA var occur one cycle after the first read for the LSB, or more? Similarly for STA. There is only an 8-bit data bus, so it must be splitting this up into multiple reads/writes. So if I write a 16-bit value to 53270, two color registers would get set, but what's the time in cycles between the first write and the second?

atariksi · September 17, 2008

How much faster could it be made to go? Which component limits the design to 7 MHz?

The SuperCPU is over-clocked at 20Mhz from an internal 65816 at 14Mhz ... I suppose the board could easily run at 14Mhz...

...

Honestly I think going over 8x is ridiculous, you will run into issues with interfacing the Atari video/audio chipset and other hardware.

I think 8X would be great even if software has to align code to 8 cycle blocks and page alignment so no code crosses over page boundaries. Even on Pentium processors, misalignment of code on addresses can cause a major speed penalty so you have time and space alignment instead of just space alignment.

potatohead · September 17, 2008

Re: Time in cycles for 16 bit write

I was wondering something along the same lines. Got this from another forum:

Every CPU cycle performs zero or one memory accesses (read or write). "Internal Operation" means it is busy doing something, but it isn't a read or write. Those CPU cycles are always 6 master cycles long.

For a "Read" or "Write" cycle, the length of the CPU cycle might be 6, 8 or 12 master cycles, depending on what address was written, and whether a certain bit of a certain register, was enabled (which allows faster accesses--6 instead of 8--to some address ranges). If I remember correctly, 12 is only used for a small range of addresses that map to some internal registers.

Edit: the W65C816S data sheet has a long table that describes what the 65816 does during each CPU cycle. If it says "IO", that is an Internal Operation cycle. The W65C816S is almost identical to the 65816 core used in the SNES. I don't believe there are any known timing differences between the two chips.

http://www.westerndesigncenter.com/wdc/dat...ts/w65c816s.pdf

(the datasheet referenced)

Looks like one extra cycle to me. See page 25.

Edit: That's at the 7mhz clock. In the slow mode, under this modification, I don't know. My gut says it's gonna happen on the next ordinary Atari clock cycle, for a one cycle color change.
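A Python sketch of that reading of the datasheet, for a 16-bit STA with absolute addressing: the opcode and the two address bytes are fetched in cycles 1 to 3, then the low and high data bytes are written on consecutive cycles. The function name is invented, and the cycle numbers are counted in CPU cycles only; how long each one lasts when it targets Atari hardware depends on the accelerator.

```python
COLPF0 = 53270  # $D016, the address used in the question above

def sta_abs_16bit_writes(address: int, value: int):
    """Bus writes for a 16-bit STA absolute: (instruction cycle, address, byte)."""
    low = value & 0xFF
    high = (value >> 8) & 0xFF
    # Cycle 1: opcode fetch; cycles 2-3: address low/high bytes.
    # Cycle 4 writes the low byte, cycle 5 writes the high byte to address+1.
    return [(4, address, low), (5, address + 1, high)]

if __name__ == "__main__":
    for cycle, addr, byte in sta_abs_16bit_writes(COLPF0, 0x1234):
        print(f"cycle {cycle}: ${addr:04X} <- ${byte:02X}")
```

So a 16-bit store to 53270 would set $D016 first and $D017 on the very next CPU cycle, low byte first.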

Edited September 17, 2008 by potatohead
