
A multi-core 6502... would that be possible?


carmel_andrews


W65C134S 8–bit Microcontroller (MCU)

The W65C134S is a feature rich 8–bit microcontroller based on the W65C02 with an advanced (originally designed for life support) Serial Interface Bus (SIB) token passing local area network for multi–W65C134S processor systems. The W65C134S has a full external memory bus (8–bit data and 16–bit address bus) for flexible system design. This MCU has an embedded debug monitor ROM with a library of routines that can help you reduce development time.

 

Also, I think the ECI+cartridge port supports a second external CPU, or not?

Edited by Matej

6 hours ago, Peri Noid said:

Have you heard about the Weronika project by Zenon Rakoczy?

 

Nope, and neither has google. 😆

 

A link or two would be nice...

 

19 hours ago, ivop said:

Definitely! 👍

 

Perhaps next year "we" could submit an Atari 8-bit SOC ;)

 

I was thinking about the existing Atari 800 core for MiSTer: https://github.com/MiSTer-devel/Atari800_MiSTer

 

It would be interesting to come up with extensions to the architecture that Atari never made... making the processor twice as fast and adding improved graphics and sound... that sort of thing. Once you have a decent updated Atari, THEN get a SoC made. :)

 

 

Edited by Chilly Willy

 

I am absolutely not promising anything here: it is specifically a non-goal of my project to start with, and I'm very keen to avoid feature creep. But in recent research for this, I was thinking of a 'turbo' mode for the 6502, implemented by putting a 6502 core in the FPGA. The synthesis results were ...

[Screenshot: FPGA synthesis results (maximum-frequency figures) for the 6502 core]

The '20K' is a better FPGA fabric than the '9K' and a larger chip. It costs a bit more, of course :), but that's still a very good deal. It's also not quite as easy to use and doesn't come with HDMI output (which, for me, is ideal), but it does have a lot of I/O pins, which is a real bonus, and some of them are length-matched, so they'd be good candidates for HDMI on a daughterboard. I'm still a little torn on which to use :)

 

Those max-frequency figures might come down a little in any real implementation, too, because as it stands they're for just the 6502 core on the chip. Once you start adding more cores, routing becomes a more difficult optimization problem, and congestion can be an issue. The larger chip has an advantage in that it'll keep its speed for more complex designs, simply because of its abundance of resources.

 

For some reason, it seems ok to me to add a high-speed 6502 core to the bus (and in my case, it'd have access to the same RAM as the XL/XE does by mapping the XL/XE into the RAM provided over the PBI/ECI+CART interface) but it doesn't really seem "right" to link in the even-higher speed (200MHz+) dual-core ARM chip that'll be sitting right next to the FPGA...

 

I am seriously tempted to implement much faster 'math pack' routines, though. Calls from the 8-bit into the floating-point routines could spend more time putting data into the FP registers' memory locations than the actual operation takes to perform...

 

 


Veronica... the big disadvantage there is that it's somewhat detached from the rest of the system.

I'm pretty sure it can only execute from and write to its own internal storage.

VBXE is similar (though it doesn't operate as a CPU) but has the big advantage that it can map large amounts of its VRAM over system RAM, replacing it.

So things like blits can process graphical objects that ANTIC can use directly.

It's my understanding that Veronica can do something similar, but the memory window visible to the CPU isn't as big, and shuffling things around is probably not as easy.

 

All that aside, though, we have the problem that a second CPU or core on a stock system isn't of much use. The 6502 is a bus hog, and between it and ANTIC, all the cycles are used.

Making a second CPU work would need a fair bit of re-engineering. A plug-in board with 2 or more emulated 6502 cores and its own RAM and cache might be the way to go.


11 hours ago, Rybags said:

All that aside, though, we have the problem that a second CPU or core on a stock system isn't of much use. The 6502 is a bus hog, and between it and ANTIC, all the cycles are used.

Making a second CPU work would need a fair bit of re-engineering. A plug-in board with 2 or more emulated 6502 cores and its own RAM and cache might be the way to go.

 

All you need to counter that is dual-port RAM: either the real thing on a chip, or 'effective' dual-port RAM, because the RAM requests are being directed towards something external to the 8-bit that can run faster, or can disconnect the RAM bus on the 6502 side (with value-hold), or isn't really RAM at all and just provides a RAM interface to the 6502. All of the last three are easy on an FPGA...

 

That way the 6502/ANTIC get to strut their funky stuff as usual, and the co-processor/whatever-you-call-it can run at its own bus speed with full access to the same memory the 6502 is seeing. There's no bus contention unless they both write to the same address at the same time, and if you're implementing this, you'd put in a check for that and delay one of them.

 

My approach would be the third one: use EXSEL/MPD to map RAM requests onto the parallel port/Cartridge+ECI and have the FPGA provide that interface. If a memory aperture is marked as dual-access (host + FPGA), then have an arbiter make sure there's no write-address contention; otherwise both sides get to play in their "own" sandbox. Use the display list to map ANTIC's video memory into that memory space, and the 8-bit will happily draw its screen using that external memory, and then let the in-fabric FPGA core have at it when it comes to writing to "video RAM".

 

This isn't much over and above what I'm proposing anyway, but with "stay on target" in mind (every time I type that, Star Wars: A New Hope comes to mind...), all this sort of fun stuff will have to wait until I have the basics working.

 


Veronica's biggest problem is the IPC bottleneck: there is only one bit in common between the 6502 and the 65C816. This makes it very difficult to do reliable two-way RPC, and the window-swapping hack I did in Veronica BASIC ended up breaking the V1 devices because they couldn't stably support the 65C816 accessing banked memory during a bank swap. If it simply had a second semaphore bit to allow for one bit in each direction, it would have been much easier to deal with. It was designed solely for the 65C816 to be a helper for the 6502, but this is inefficient since the 65C816 is so much faster than the 6502.

 

I was under the impression that the V2 hardware had already solved the dual access issue by simply multiplexing a single RAM, since for modern devices it's not a big deal to run at 28MHz. True shared memory would make IPC so much easier, as then you can just use any number of classic synchronization algorithms to implement two-way communication.

 

The second major issue with Veronica, though, is that it has no way to bootstrap itself. It literally just provides the 65C816 and RAM, but no flash or mass storage. This means that you have to boot software by other means, and you can't use a cartridge device since Veronica already occupies the cartridge port. So no MaxFlash or SIDE1-3, no SDX for most people who don't have an original pass-through cart, and, for XE people, no ECI devices. In most cases, then, you can only load software over SIO, which is cumbersome.

 

The lack of interrupts also makes it difficult to run unmodified software since you need polling loops to receive input from the computer. With interrupts, it would potentially have been possible to run some unmodified software on the 65C816 with a modified OS to hand off I/O calls to a driver on the computer side. Wouldn't work for anything hitting the hardware, of course, but it would have allowed for well behaved software like ARC.COM. It's hard to run unmodified 6502 software in native mode due to TXS causing a crash, but Veronica will happily run in emulation mode since it's a 65C816 and also doesn't have 24-bit addressing to muck up indexing. Unfortunately, the current architecture means you really have to build software around it. Besides Veronica BASIC, the only other thing I've done with it is to modify Acid800's 65C816 test to run on it since Veronica is the only actual 65C816 I have access to.

 

IMO, what Veronica needed:

  • Flash mappable to the left cartridge window to allow the cartridge to boot.
  • True shared memory, or at least one semaphore bit in each direction, maybe at least two (one each might be prone to deadlocks).
  • Ability for the 6502 to trigger an IRQ on the 65C816.

 


  • 2 weeks later...

PiTube6502

There is also a 274MHz emulated "PiStorm"-like solution for the 6502:

https://github.com/hoglet67/PiTubeDirect/wiki

It's good because you probably own an old or spare RPi or RPi Zero at home.

So we need a CPU adapter like the one on the Amiga, but as a second CPU like Veronica: a cartridge adapter.

Here is an old Acorn turbo board:

https://www.domesday86.com/?page_id=1962

 


 

Edited by Matej

Along that line of thinking, I wonder what we could achieve if we kept the exact same hardware (in an FPGA, of course) but simply scaled the speed. We've said that if ANTIC/GTIA had double the bandwidth, we could have 80 columns with colour. So let's not just double the system from the 1.79MHz base, but go up to 179MHz, and let the ANTIC/GTIA combo run at 358MHz. It's probably pointless beyond maybe a doubling or quadrupling of the original speed.


An FPGA that can do a 179MHz 6502 CPU core is going to be an expensive FPGA. They exist, certainly, but it's not really hobbyist territory. The Tang 20K can manage about 75MHz, and that's around the best bang-for-the-buck FPGA going at the moment. 

 

I think I got ~100MHz on a Spartan-7 a few years back (which are currently unobtainium anyway), so you're probably looking at a Xilinx UltraScale or one of the new Intel ones for the sort of speeds you're talking about. That's in the "hundreds of dollars" range for the FPGA, going on thousands if you're low-volume.

 

If pure speed is all you're after, then Matej's approach is probably the best option. My target is a bit more broad than just a fast CPU though :)


Well, my idea is to use a standard (64KB XL/XE) or upgraded Atari (Ultimate, stereo, VBXE, maybe Rapidus too).

And to have a turbo cartridge with an RPi Zero or RPi CM.

And have:

A) The standard Atari platform, but with games, demos and apps using a fast second CPU emulated on the RPi

B) Games, demos and apps on an upgraded Atari, but using a super-fast second CPU emulated on the RPi

 

I think standard graphics still have potential (with a 3D math coprocessor, or lots of sprites, big sprites, parallax scrolling, huge maps, lots of enemies, playing samples on POKEY, etc.). The ARM will act just as a second, powerful 6502.

 

Also upgraded Ataris. There's a big difference between a standard Falcon 030 or Amiga 1200 and a fast Falcon or A1200 with a 68060 turbo card, 16MB of RAM and 128MB or more of fast RAM. You can see better or more complex demos (demoscene) or better full-3D games, for example.

It will just be a co-processor, like adding a 387 to a 386DX to get close to a 486 PC. Simply a co-processor. If, for example, WDC makes a 150MHz 6502, we can use that too. There are 150MHz Z80s, I think, for the industrial market or so. It's still 8-bit, but for the 21st century. Yes, the beauty of coding is using 1.79MHz and a standard Atari. But there are new possibilities (3D, parallax, etc.) with a fast CPU.

 

Edited by Matej

I think the most interesting idea is to start with the existing A8 core for something like MiSTer and then start making improvements as if this were a real Atari in the 80s. Add another POKEY for stereo. Double or quadruple the speed of the chips.  Make derivations of the existing graphics to take advantage of the speed. Make the 6502 dual core. What Atari MIGHT have done to improve the A8 instead of just more of the same.


6 minutes ago, Chilly Willy said:

I think the most interesting idea is to start with the existing A8 core for something like MiSTer

For anything dramatic like dual CPUs, you'd have to rewrite the OS anyway. In doing that, it seems more likely that it would be easier to just take completely modern components and write a backwards-compatible OS for them (the "high level emulation" approach taken by several emulators). Hell, if you could jimmy up analog video output to Altirra, you'd be right there!


The "over the horizon" thoughts I've had along those terms is to implement the 6502 core (I already have that from a previous project), link it up to (say) 8K of 1-cycle block-ram in the FPGA so it's not limited by RAM speed, and run it at ~75MHz because that's what the fabric will support.

 

The interface to the board will mean I have to write a programming language anyway (kind of similar to a modern BASIC/C hybrid, compiled so I get half of zero-page to play with) in order to maximise performance. The memory-dynamics of the 6502 mean you want to get the best out of zero-page RAM that you can, and given that I can export zero-page to be hosted by the FPGA, the long-term plan is to have a language that understands and takes advantage of the capabilities of the bank-switching that the hardware can perform. 

 

Things like:

  • each function gets to use $F0 -> $FF as scratch space without having to restore,
  • a set of up-to-4-byte "registers" in zero-page that can be accessed by writing to 1-byte or 2-byte indexes also in zero-page. That gets you 256 (or 65536) zero-page registers with a small overhead. Writing the index changes the actual memory provided at the 'register'.
  • hardware floating point package (write value to FP0 and FP1, write operation to FPop, and result is in FPresult)

On the flight back from NY on Friday (being somewhat captive on a plane), I got out the MacBook and wrote a transpiler (converting a basic version of my prototype language into C) to get some insight into how this might pan out. What it showed me was that cc65 is bloody awful in terms of optimisation, and that a full language is probably the way to go. That's a bit more involved... I can re-use the lexer and the parser I have, but I need to generate an Abstract Syntax Tree (AST), then convert that to a Linear Intermediate Representation (LIR) to do the sort of optimizations I'm interested in, and probably a Directed Acyclic Graph (DAG) for some others. I can then bind the emitter stage to produce 6502 assembly rather than C.

 

As it stands, it converted (for example) this Fibonacci program...

```
@elysium xtal % cat rsrc/input.xt
PRINT "How many fibonacci numbers do you want?"
INPUT nums
PRINT ""

LET a = 0
LET b = 1
WHILE nums > 0 REPEAT
    PRINT a
    LET c = a + b
    LET a = b
    LET b = c
    LET nums = nums - 1
ENDWHILE
```

 

to 

 

```c
@elysium xtal % cat /tmp/out.c
#include <stdio.h>
int main(void) {
float nums;
float a;
float b;
float c;
printf("How many fibonacci numbers do you want?\n");
if (0 == scanf("%f", &nums)) {
nums = 0;
}
printf("\n");
a = 0;
b = 1;
while (nums>0){
printf("%.2f\n", (float)(a));
c = a+b;
a = b;
b = c;
nums = nums-1;
}
return 0;
}
```

 

Each of the lines in the output is a result of parsing the lines in the source. I didn't bother with trying to do pretty indentation on something that was designed to be machine-readable as opposed to human-readable :) I've tentatively named the language 'xtal' ("crystal") where it stands for 'eXTended Atari Language', "extended" being a play on the fact that both the language (compared to BASIC) and the Atari will be extended :) 

 

It wouldn't be so hard to convert that output to be 6502 assembly instead, but the approach needs some modification (as above) for a couple of reasons:

  • the current algorithm isn't tree-like. For example, when performing + with C as the target, I didn't need to know the operation joining the two operands before outputting the first operand; it was sufficient to know I *was* in the middle of an expression, and I could rely on C to do the actual operation (addition, in this case). In assembly, the addition can't be done that way.
  • To get any reasonable performance without a whole load of redundant loads/saves to zero-page RAM, I'd need AST, DAG, and LIR stages to operate on, transforming the program into a more optimal representation and thus a more optimal result.

Anyway, dragging this (kicking and screaming) back to the point... Once I have my own language, annotating a function as 'fast' could mean that the code is given to the (one? many?) FPGA-hosted 6502 core(s) to handle, rather than the built-in one. A function seems like the best compromise between the setup cost of getting data to and from it and the benefit of running it at speed. Of course, if all your code will fit into a bunch of functions, you'll get a huge boost that way anyway :)

 

And doing it with separate RAM (but with access to the rest of the RAM if it's set up that way) means the 'co-processor' 6502 can run without worrying about interfering with the main host 6502. The PBI/Cart+ECI has the /IRQ line as a signal that the 'soft' 6502 wants to talk (FPGA->XL), and a memory-based FIFO buffer will work in the opposite direction (XL->FPGA).

 

As I've said, though, this is all pie-in-the-sky until such time as the current goals of the project are reached. I wouldn't even have written the transpiler apart from the fact that I had nothing to do on the plane, and it didn't seem like a great idea to pull out all sorts of wired electronics while traveling on a plane. Coding on a MacBook by comparison seemed more "normal" :)

 

If you're looking for a good reference for any of the above language stuff, btw, "the compiler book" is a good, free, downloadable PDF which goes through all the concepts in a more-open way than most (yes, Dragon book, I'm looking at you).

 
