Chimera Teaser continued - Ouroboros
Besides using the ARM as a coprocessor, the big feature we're working on is the queues. I'd like to call them Ouroboros, but that is kind of a long name and hard to get used to. David Crane calls them "data fetchers". Other people have called them "stacks".
If you aren't a programmer yet, or if you are a novice, you might not see the value in these, so I'll try to explain the rationale. There are three big advantages to Ouroboros: speed, simplicity, and compactness.
Let's review the Atari 2600 architecture. The 2600 has been described as a "1D" system because it only has enough registers to draw one scanline at a time. In fact most games rewrite some of these registers in mid-scanline to be able to do more, so it could be seen as even more constrained. Think of the Atari 2600 as a series of brushes that you paint with. The kernel allows you to change the colors and sizes of the brushes as the video beam scans across. How quickly and effectively you can make those changes will make or break your game.
One of the big timesinks in a kernel is the logic involved in knowing when to draw a sprite and how to pull the proper frame of sprite animation data.
Atari engineers realized how tedious this was on the 2600, so when they designed the Atari 400/800 they took the TIA and added a helper chip called ANTIC that would automate the kernel process. The data for the screen would come from zones of RAM. You could finally have real bitmaps. Sprites came from strips of RAM. This wasn't like a PC graphics chip with dedicated RAM. It was still flexible enough that you could rearrange where the graphics data came from, even in the middle of the screen, making for fast animation and other effects.
A similar concept can be applied even with the 2600. All you really need is more RAM. The Supercharger provided that. Imagine a strip of RAM representing the sprite shape across the entire height of the screen. Imagine a strip of RAM representing the color of that shape. The game can erase and redraw the shape up and down the strip to move the sprite up and down. This is the same technique used to move sprites vertically on the Atari 400/800. The difference with the 2600 is that instead of an ANTIC display list, you have a handwritten kernel doing everything manually.
So the inner loop of the kernel to draw a single sprite with Y as the scanline counter might look like this:
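    LDA SPRITEDATA,Y        ; shape byte for scanline Y
    STA GRP0                ; player 0 graphics
    LDA SPRITECOLORDATA,Y   ; color byte for scanline Y
    STA COLUP0              ; player 0 color
    STA WSYNC               ; wait for the start of the next scanline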
As you can see, there is no if/then logic. The kernel is very simple, leaving plenty of room for other graphics on the same line.
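To make the strip idea concrete, here is a rough Python model of it, not real 2600 code. The 192-line screen height, the sprite bytes, and the function names are all assumptions picked for illustration. The point is that the game does the erase and redraw work ahead of time, and the "kernel" just reads one byte per scanline with no decisions.

```python
SCREEN_HEIGHT = 192  # assumed number of visible scanlines


def draw_sprite(strip, frame, top):
    """Copy one animation frame into the strip starting at scanline `top`."""
    for i, byte in enumerate(frame):
        if 0 <= top + i < len(strip):
            strip[top + i] = byte


def erase_sprite(strip, height, top):
    """Zero the rows the sprite previously occupied."""
    for i in range(height):
        if 0 <= top + i < len(strip):
            strip[top + i] = 0


def kernel(strip):
    """Stand-in for the kernel: one unconditional read per scanline."""
    return [strip[y] for y in range(len(strip))]


shape_strip = [0] * SCREEN_HEIGHT
frame = [0x18, 0x3C, 0x7E, 0x3C, 0x18]  # made-up 5-line sprite shape

draw_sprite(shape_strip, frame, top=40)
erase_sprite(shape_strip, len(frame), top=40)  # to move: erase the old rows...
draw_sprite(shape_strip, frame, top=41)        # ...then redraw one line lower
scanlines = kernel(shape_strip)
```

Note that 187 of the 192 bytes in this strip are zero padding, which is where the trouble starts.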
There are problems with even this approach, though. The Supercharger only gives you 6K to work with. Reserving 100 or 200 bytes per element adds up fast. It also takes precious time during VBLANK to erase the strips and write the data back into them.
Remember that most of the space in these strips would be padded with zeroes. This is fine on the Atari 8-bits, which I don't think shipped with less than 16K of RAM, on top of the actual game code on ROM carts. But it is very wasteful when you only have 4K available at a time and 6K max for game code and data.
By 1984, David Crane, who was intimately familiar with the 2600 and also helped write the Atari 400/800 OS, decided to take what he learned and try to upgrade the VCS with something called the DPC chip in Pitfall II. Unlike the Atari 400/800 approach, these strips of RAM would not be randomly accessible by the VCS. The VCS would only see one byte of data at a time. The cart kept track of an internal index pointer for each queue. Each access would trigger an automatic seek-forward operation. When the cart reached the end of the queue, it would automatically loop back. There were other bells and whistles in the DPC chip, but that was the core of it.
So a DPC kernel might look like this:
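    LDA SPRITEDATA          ; aka QUEUE0: returns the next shape byte
    STA GRP0                ; player 0 graphics
    LDA SPRITECOLORDATA     ; aka QUEUE1: returns the next color byte
    STA COLUP0              ; player 0 color
    STA WSYNC               ; wait for the start of the next scanline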
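Each read is now a plain absolute load, which takes 4 cycles and never pays a page-crossing penalty. That saves 1 CPU cycle per read compared with the (pointer),Y indirect loads that kernels usually need once sprite data can move around, and the kernel no longer needs an index register to address the data. The cartridge is also no longer cluttered with large reserved strips, just a zone of 1-byte access windows or hotspots.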
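Here is a toy Python model of that queue behavior, following the description above rather than the actual DPC register layout, which I am not trying to reproduce. The class name, the data bytes, and the forward-counting pointer are assumptions for illustration. Each `read()` stands in for one LDA of a hotspot.

```python
class Queue:
    """Toy model of a cartridge-side queue read through a single hotspot."""

    def __init__(self, data):
        self.data = list(data)
        self.index = 0  # internal pointer kept by the cart, not by the 6507

    def read(self):
        """Return the current byte, then seek forward, wrapping at the end."""
        value = self.data[self.index]
        self.index = (self.index + 1) % len(self.data)
        return value


sprite = Queue([0x18, 0x3C, 0x7E])  # made-up shape bytes
color = Queue([0x0E, 0x1C, 0x2A])   # made-up color bytes

# Four "scanlines": the fourth read wraps back to the start of each queue.
lines = [(sprite.read(), color.read()) for _ in range(4)]
```

The console never sees the index, only the next byte, which is why the kernel needs no pointer math of its own.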
No matter what you do to help out the 2600, no matter how much memory you have, you still have to deal with the critical timing of the kernel itself. I think David Crane really knew what he was doing when he devised the DPC chip. My feeling is that "queues" are indisputably the best way to improve the graphics on the 2600 (let alone sound, if you want to do the Pitfall II approach there too).
Nevertheless, even the DPC architecture is missing something. RAM. Pitfall II still just uses the stock 128 bytes of VCS RAM. The queues can be massaged somewhat at runtime, and some are dynamically generated, but by and large you are talking about read-only resources.
Ouroboros in Chimera are intended to take the DPC idea and add read-write capability. At first I envisioned them as writable only at design-time, as part of game loads: you could overwrite them only during the multiload process. But we should be able to make them fully read-write at runtime.
I am not a DPC expert, but the patent seems to discuss ways to apply masks to the queues. Let's say a queue had a lot of data in it. The game could tell DPC to only show a subsection of the queue and hide the rest. The kernel would try reading from the entire height of the queue, but would wind up only showing the area it was configured to show, starting at a particular scanline. This largely gets around the read-only limitation.
On Chimera, we're talking about potentially having more than 64K of RAM for queue data, so we're not concerned with optimizing for storage. It makes sense for us to simplify the design instead: what you put in the queue is what gets displayed. If you don't want it shown, erase it. If you want frames of animation and don't want to overwrite the queue on each frame, feel free to precook each frame in its own queue. It's simpler, and hopefully more flexible.
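The two approaches can be contrasted with a small Python sketch. This is only my reading of the masking idea, not the patent's actual mechanism, and every name and value here is hypothetical. `masked_read` hides everything outside a configured window, while the Chimera-style version stores each precooked frame in its own full-height queue and simply picks which queue to read.

```python
HEIGHT = 8  # assumed tiny screen height to keep the example readable


def masked_read(queue, line, first, last):
    """Mask-style read: bytes outside the window [first, last] are hidden."""
    return queue[line] if first <= line <= last else 0


def precook(frame, top, height=HEIGHT):
    """Build a full-height queue holding one frame; assumes the frame fits."""
    queue = [0] * height
    for i, byte in enumerate(frame):
        queue[top + i] = byte
    return queue


# Mask approach: a big read-only queue, of which only lines 2..4 are shown.
big_queue = list(range(1, HEIGHT + 1))
masked = [masked_read(big_queue, y, first=2, last=4) for y in range(HEIGHT)]

# Chimera-style approach: one precooked queue per animation frame.
frames = [[0x18, 0x3C, 0x18], [0x24, 0x3C, 0x24]]  # made-up shapes
queues = [precook(f, top=2) for f in frames]
frame_number = 1
shown = [queues[frame_number % len(queues)][y] for y in range(HEIGHT)]
```

The second approach spends more RAM on zero padding, which is the trade I am happy to make when there is plenty of it.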
Right now we intend to control queues with only two hotspots, a read/write hotspot and a seek control hotspot. We'd also have a handful of "group" seek hotspots so you could do a global or grouped seek. The grouping and the loop-points would be set at design-time.
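As a very rough illustration of how that could behave, here is a Python sketch. None of this is the final Chimera design: the class and method names are invented, and I am assuming that both reads and writes through the data hotspot advance the pointer, that the seek hotspot takes an absolute position, and that a group seek just applies the same seek to every queue in the group.

```python
class Ouroboros:
    """Hypothetical read-write queue with a data hotspot and a seek hotspot."""

    def __init__(self, length):
        self.ram = [0] * length  # loop point fixed when the queue is set up
        self.index = 0

    def _advance(self):
        self.index = (self.index + 1) % len(self.ram)

    def read_data(self):
        """Data hotspot, read side: fetch a byte and move forward."""
        value = self.ram[self.index]
        self._advance()
        return value

    def write_data(self, value):
        """Data hotspot, write side: store a byte and move forward."""
        self.ram[self.index] = value & 0xFF
        self._advance()

    def seek(self, position):
        """Seek control hotspot (assumed to take an absolute position)."""
        self.index = position % len(self.ram)


def group_seek(queues, position):
    """Group seek hotspot: reposition every queue in the group at once."""
    for queue in queues:
        queue.seek(position)


left, right = Ouroboros(4), Ouroboros(4)
for b in (1, 2, 3, 4):
    left.write_data(b)
    right.write_data(b * 16)

group_seek([left, right], 2)
pair = (left.read_data(), right.read_data())
```

A group seek like this is what would let a game rewind all of its queues to the top of the screen with a single write at the start of a frame.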
Since this is a teaser, it's a little too early to go over the Ouroboros system in complete technical detail, but I am happy to announce that we are already successfully running multiple queues on the prototype hardware.
Since no emulators support Chimera right now, nor should they since the hardware is in a state of flux, there isn't much sense sharing the binary of the demo. When run on the Cuttle Cart or an emulator, our most recent demo just displays a solid static playfield since the kernel just reads from fixed ROM addresses. When run on Chimera, however, the game clears and loads up 4 queues which generate two multicolored playfield messages that run vertically on the left and right half of the screen. The left half scrolls up and the right half scrolls down. On the surface it isn't that impressive. You could easily accomplish that effect via Supercharger RAM alone, for instance. But it is a proof of concept. If this works, then all sorts of other things should be possible.
Also, these initial queues store their data directly in the ARM's native RAM. What this means is that the ARM could run specialized functions to write to these queues on its own with full unrestricted throughput. One obvious application of this is for bitmaps. We initially intend to use this feature to build the menuing system of Chimera. Chimera would generate text that looks like Stellar Track automatically internally. The actual kernel would be a lot simpler than Stellar Track. It would simply spool out data directly from the queues and not be concerned about rendering any of the text between rows. So we hope that the VCS-side of the menu application will be incredibly tiny. Also, any game that runs entirely out of a bitmap like Stellasketch or Suicide Mission could just pass the rendering task almost entirely to the ARM between frames. You could imagine the kinds of things that might enable you to do. Maybe going from this:
to a little closer to this: