
To infinity and beyond... (new hardware)


Spaced Cowboy


Cartridge support

 

So it would be really nice to have a cartridge slot on the expansion box, as a convenient place to plug things in when your machine is a 1088XEL and the expansion bus has taken up the cartridge slot... This does, however, have some issues.

 

The original design was to make all data-access local to the 6502 bus. That's why there's an SDRAM there - to provide plenty of buffering RAM for the peripherals to use. The basic idea for how a peripheral would communicate with the host machine is:

 

  • Let's say we have a MIDI card that receives and sends data and wants to communicate with the host computer when it does so.
  • On boot, the peripheral sets up an IRQ handler, and any data-sending routines, in its own section of the SDRAM. These will be mapped into a standard location when the IRQ is triggered or when the send-data routine is called.
  • {midi data arrives}: the peripheral transmits the data over the link to the SDRAM, and then triggers an interrupt. The data is already in local SDRAM before the interrupt happens.
  • {send-data}: the 6502 copies the data to a local buffer (in SDRAM, over the bus) and then calls a TRAP routine (essentially a software interrupt: set up a few "registers" in SDRAM, and tell the XMOS to transfer the data).
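To make the TRAP idea concrete, a hypothetical layout for those SDRAM "registers" might look like the below - field names and sizes are purely illustrative, nothing is finalised:

```c
#include <stdint.h>

/* Hypothetical layout of the TRAP "registers" in shared SDRAM.
   Field names and sizes are illustrative only. */
typedef struct {
    uint8_t  routine;      /* which service to invoke (e.g. send MIDI data) */
    uint8_t  status;       /* written back by the XMOS: 0=idle, 1=busy, 2=done */
    uint16_t length;       /* number of bytes in the local buffer */
    uint32_t buffer_addr;  /* SDRAM address of the buffer the 6502 filled */
} trap_registers_t;
```

The 6502 fills these in and triggers the TRAP; the XMOS picks them up, performs the transfer, and updates the status byte when done.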

The important thing is that at no time is the 6502 talking directly to the peripheral. That's a "long path" for the data to go:

  • 6502 writes address/control lines
  • XMOS [local] picks up address after 177ns
  • XMOS [local] transmits address/control lines over the link
  • XMOS [remote] receives the {address,control} packet
  • XMOS [remote] pushes signals out to peripheral
  • peripheral gets data (say a read-request) and pushes data out to its data-bus
  • XMOS [remote] gets the data off the local data-bus
  • XMOS [remote] packages data up and sends it over the link
  • XMOS [local] gets package and pushes data onto the 6502's data bus

The timing budget is given by

 

bus-timings-orig.png

 

... which leaves me with (558-177)ns of time to do all the above. If *everything* synchronises up just perfectly, I think it's just-about do-able, but it's going to be very very tight if it's even possible. So, what else could we do ?

 

 

Option 1: De-serialise

 

I could ditch the serialisation approach and just transfer data over a parallel bus. There are 41 distinct signals (including GND and +5V) on the ECI. The simplest approach might be a 2x21 ribbon cable, but these are pretty bulky at ~6cm for the connector assembly, so would take up a lot of real-estate on the back-panel of an H80 case. The next simplest would be to double-data-rate the signal and use an HD26 cable and sockets, which take up 4cm x 1.2cm on the back panel. We're still only talking a data-rate of ~3.6MHz for this, so there's no need for differential drivers or even twisted pairs. If it came to it, we could use SCSI-2 cabling, although the only reliable source of cables I can find is the 50-way ones, whose sockets are larger than the HD26 above at 52mm long; on the other hand, with 50 wires there's no need for data-doubling, which might be worth it.

 

A major benefit is that it would make the electronics a lot simpler, and it still provides a single cable to the expansion box, even if it's a lot less flexible :)

 

 

Option 2: Emulate

 

A lot of work has been done to identify cartridge types and how any individual banking is done on those cartridges. I could leverage that, and (on boot) copy the whole contents of the cartridge up to local SDRAM (banking as necessary to get all the data); the cartridge could then be emulated by the local XMOS. In this way, we're more in line with the original design.

 

This does have some drawbacks though - there's no way that hardware cartridges (eg: USB, SIDEx, etc.) could be made to work. The hope would be that such functionality would be taken care of by the box itself, but it's still a barrier that I'd rather not have.

 

 

Anyway, food for thought. Is fully supporting a remote cartridge port worth a re-design...

 

Simon


Well, it would simplify the interface a lot. It reduces the 'XE interface' part down to a few buffer chips and a transceiver for the D[7:0] lines.

 

I was opposed to the idea originally because I didn't want to have a ribbon cable coming out the back, but having looked around, there's a VHDCI standard which would give 68 pins and it still has a round (if stiff) cable coming out the back. The price of 2 connectors and a 16" cable would be ~$37, which is about the price of my BOM for the interface card, in other words it's a wash.

 

On the positive side,

 

  • I can pipe the audio line through, so the expansion box can source/sink audio as well. That wasn't going to be possible on the purely-digital approach. Not a huge win, but a win nonetheless.
  • I have 68 pins and I need 40 for the system-bus, so I can put the SIO signals on there as well. Then people can design either SIO or parallel-bus expansion cards. That's actually a pretty big win.
  • It means I can just have a single firmware change for the cross-platform aspect rather than two chips. Simplifying that is a major win, in my experience. Anything users might have to update is best made as easy as possible.
  • From a technical standpoint, there's no possibility of the two sides of the link being out of sync - it would have been possible for the interface board with the SDRAM on it to be plugged in and working, but not connected to the expansion box. The firmware on both would have to manage hot plugging (or at least graceful failure).
  • Different hardware configurations are easier to support - no need for a 4-layer PCB and software control, just route the wires from the edge-connector to the correct pins of the VHDCI connector, and you're done. The XL (with its PBI) and any non-atari boxes just got a lot easier.
  • It *is* a simpler system, I'm generally a fan of the KISS principle...

I'm snowed under with work right now (I have a demo to give on Tuesday) but I've ordered a VHDCI cable and connectors. I'll rig it up and see what the signal integrity is like across the bus, and we'll go from there.

 

I'm envisioning 2 internal PCBs, in the configuration {card-edge interface}<---ribbon-cable--->{VHDCI interface}, which ought to allow it to be placed anywhere on the back-panel of either a stock XE/XL or in the mini-itx case of your choice.

 

The first PCB would have the XE card-edge interface, line buffers/transceiver, and connections for ribbon cables of the following:

  • 2x34-way to the VHDCI connector PCB
  • 2x5-way for each joystick port [7 pins used] because why not {1088XEL-specific}
  • 1x7-way for the SIO interface [5 pins used] {1088XEL-specific}

There are 41 pins used for the system bus, so I have 8 currently 'spare'. I won't be running power across the link, but I can always use them as multiple grounds if there's no better use for them. I'll have to look at the C64 bus to see if there's any special requirements there as well.

 

The second PCB would be the ribbon-cable-to-VHDCI connector. The VHDCI connector is a straddle-type, and I *think* the connector above would be panel-mountable, such that it can be screwed to the back of the panel with some M2 hex standoffs, with the VHDCI cable then screwing into the standoffs.

 

Both of these PCBs can be 2-layer, and they're simple designs with a short ribbon cable to connect them. Then there's the VHDCI cable to the expansion box, another VHDCI connector, and you're on the expansion box itself.

 

[edit] The Monoprice cable turned up today, and it's a lot more flexible than I'd been expecting - I was thinking it was going to be like the SCSI cables of old on the ST, but it's way more bend-able than those were. I'd say it was slightly more flexible than a DVI cable, if that helps :) The connector's nice and small, I think we have a winner.


Thanks for the memory

 

One of the things I want to explore with the system is memory management. The 8-bit machines have a 64k limit on their memory address range, and having 32MB of SDRAM attached means we have a lot of playground to play in.

 

[aside: just to be clear, I'm not suggesting multiple types of memory descriptor here, I'm leading through the thought process, starting from the simplest form to the final type that will actually be implemented - sorry for any confusion :)]

 

The basics:

 

The obvious first step is to allow access to a range of memory. This will be possible using a memory-descriptor that the 8-bit can fill in, and subsequent memory accesses within the range of that descriptor will be fulfilled by the SDRAM instead of the on-board SRAM.

 

Memory will be made available on a page-by-page basis (ie: any contiguous available set of pages in the 8-bit memory space will be mappable from SDRAM). There are memory ranges which the external bus can't override, but in general it will be possible to source any normally-accessible RAM address from the SDRAM. The memory descriptor will have:

  • a four-byte source address in SDRAM (expressed in bytes),
  • a one-byte destination address, expressed in pages,
  • a one-byte size, expressed in pages.

This means that by modifying just one register in the source address, everything can shift by up to 256 bytes, and it takes changing a maximum of 4 registers to remap the pages to any location in SDRAM. All well and good; this is a simple linear memory mapping.

 

Graphics uses:

 

Now that we have a basic aperture-of-memory available, we can try some more things - we can add a 'stride' to the memory descriptor, which defines the length of a horizontal line in the backing memory. Consider setting up a memory-descriptor structure like

 

 

$+0000 [4]: Base address in SDRAM for start of memory aperture
$+0004 [1]: Page address in host-memory for start of memory aperture
$+0005 [1]: Number of pages to map
$+0006 [1]: 'stride', or pages per horizontal line
$+0007 [2]: width in bytes of each virtual line

Now, whenever an address is requested within the aperture of {page address + size} pages, the actual byte returned is discovered by using:

 

 

X    = (address - page * 256) % width   // % is the traditional 'C' modulus operator
Y    = (address - page * 256) / width
byte = sdram_base + Y * stride * 256 + X

... where 'page' is the one-byte host page address from the descriptor, and 'sdram_base' is the four-byte SDRAM base address.

The stride and width allow us to specify a longer virtual horizontal length than the physical linear map would allow - and yet we can map this into a linear space in host RAM. Again, moving everything left or right by a byte is a matter of changing the base address by 1, and moving everything up or down by a line is a matter of changing the base address by 'stride * 256' bytes.
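In C-ish terms the translation works out as below. This is a sketch only - the real lookup happens inside the XMOS firmware, and the names are just for illustration:

```c
#include <stdint.h>

/* Sketch of the aperture translation; names are illustrative.
   Fields follow the $+0000..$+0008 descriptor layout described above. */
typedef struct {
    uint32_t sdram_base;  /* $+0000: base address in SDRAM, in bytes */
    uint8_t  page;        /* $+0004: first mapped page in host memory */
    uint8_t  num_pages;   /* $+0005: number of pages mapped */
    uint8_t  stride;      /* $+0006: pages per horizontal line */
    uint16_t width;       /* $+0007: width in bytes of each virtual line */
} mem_descriptor_t;

/* Map a host address inside the aperture to the backing SDRAM byte. */
uint32_t translate(const mem_descriptor_t *d, uint16_t address)
{
    uint32_t offset = (uint32_t)address - (uint32_t)d->page * 256;
    uint32_t x = offset % d->width;
    uint32_t y = offset / d->width;
    return d->sdram_base + y * (uint32_t)d->stride * 256 + x;
}
```

Bumping sdram_base by 'stride * 256' then scrolls one line, exactly as described.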

 

This is similar to how Antic can have longer lines in memory and then update display-list LMS values rather than copy data to effect a visual change.

 

Services:

 

So one other point is that there is 32 Megabytes of SDRAM available. That's an enormous amount of space, and even when you start taking chunks of it for each peripheral and giving them dedicated i/o space, you're still left with at least 31 Megabytes of space, or 496x the entire address space of a 6502. Clearly there ought to be services available for the use of this memory.

 

One thing we can exploit is that the memory is dissociated from the host computer's bus. Sure, the host has read/write access to a section at a time, and it's probably worth staying away from that while it's mapped into host space, but the rest doesn't have to be static.

 

Consider allocating a 2k memory-mapping into host space that extends for 512k of SDRAM, and just update the memory-descriptor to "scroll" through that mapping by bumping the memory descriptor by 512 bytes every VBLANK - that's just a few register changes.

 

Why do I choose those values ? Well, this thread here at AtariAge uses some clever coding to produce simply stunning 8-bit pulse-density-modulated audio from an Atari XL at 44.1kHz. The audio itself will take a lot of storage, obviously, and at 44.1kHz and a 50Hz VBLANK, 882 bytes are consumed every 1/50th of a second (it's similar for 60Hz NTSC: 735 bytes per frame there). That means it ought to be possible to store the PCM audio in SDRAM, and pretend to have a linear buffer in host-ram that extends for the size of the audio in SDRAM, with judicious pointer/memory-descriptor manipulation on every VBLANK.

 

All well and good.

 

Now let's say, when we get to the point 256k into the 512k section, we ask for two things to happen

 

1. Memory Blit: we copy the last 2k of the 512k section to the start of it. This gives us a buffer such that we can switch pointers and nothing appears to have happened, but we're now in the first half of the memory buffer, not at the tail end.

 

2. Memory Fill: we request that the SD card copy from file /path/to/somefile.ext on the SD card to {memory-descriptor base + 2k}, with a length of {254k}, and an offset of {x}, where x increases by 254k every time we call the copy op.

 

The blit allows us to switch pointers at any point in the last 2k, to the same offset in that 2k but now at the start of the buffer, making for a seamless transition. The read (which happens in the background, the host computer being blissfully unaware apart from a status byte) means the next section of the song is available once it finishes.

 

Keep ping-ponging between the two halves of the memory-descriptor, reading into one half while we play audio from the other, and we can play literally gigabytes of PCM audio on an 8-bit computer.
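A simplified sketch of the ping-pong bookkeeping, in pure simulation - request_fill just counts the SD-card reads the real firmware would issue, and all the names are mine:

```c
#include <stdint.h>

#define HALF      (256u * 1024u)  /* switch point: 256k into the 512k section */
#define WINDOW    (2u * 1024u)    /* the 2k host aperture */
#define PAL_BYTES 882u            /* 44100 / 50 bytes consumed per PAL VBLANK */

/* Stand-in for the background SD-card read; here it just counts requests. */
static unsigned fills_requested = 0;
static void request_fill(uint32_t dest, uint32_t len)
{
    (void)dest; (void)len;
    fills_requested++;
}

/* Advance the aperture base each VBLANK.  On crossing into the mirrored
   last-2k region, wrap back to the copy at the start of the buffer and
   kick off a refill of the region we just vacated. */
uint32_t vblank_advance(uint32_t base)
{
    base += PAL_BYTES;
    if (base >= HALF) {
        request_fill(WINDOW, HALF - WINDOW);  /* refill behind the mirror */
        base -= HALF;                         /* seamless pointer switch */
    }
    return base;
}
```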

 

Blitting:

 

I'm envisioning the 'blit' operation as taking two memory descriptors and copying the contents of the first into the second. In the basic form, it's just a memory copy, but blits could also have operations associated, allowing masks and effectively getting software sprites. Given the memory-freedom, handling horizontal shifts of less than 1 pixel could be done by having multiple pre-shifted sprites and large enough backing defined in the memory-descriptors. In this case the memory descriptor starts to look like:

 

$+0000 [4]: Base address in SDRAM for start of memory aperture
$+0004 [1]: Page address in host-memory for start of memory aperture
$+0005 [1]: Number of pages to map
$+0006 [1]: 'stride', or pages per horizontal line
$+0007 [2]: X position in bytes within the defined descriptor range
$+0009 [2]: Y position in lines within the defined descriptor range
$+000B [2]: W position in bytes within the defined descriptor range
$+000D [2]: H position in lines within the defined descriptor range

... which fits nicely into 16 bytes with one left over for future expansion. Dedicate a page to descriptors, and you have 16 of them that can be referenced by the software TRAP-style call {routine, descriptor-1-id, descriptor-2-id, operation} from the host to the XMOS chip. Obviously one of those descriptors will be the screen memory if you're blitting to screen-RAM, thus reducing the usable total down to 15. There's no reason those can't be re-used, of course...
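For reference, the 16-byte layout packs like this - it has to be a packed struct, since the 16-bit fields sit at odd offsets; field names are mine, with the reserved byte at $+000F:

```c
#include <stdint.h>

/* The extended memory-descriptor above, packed into 16 bytes.
   Field names are illustrative; offsets match the $+0000..$+000E layout. */
#pragma pack(push, 1)
typedef struct {
    uint32_t sdram_base;  /* $+0000 [4] base address in SDRAM */
    uint8_t  page;        /* $+0004 [1] host page address */
    uint8_t  num_pages;   /* $+0005 [1] pages to map */
    uint8_t  stride;      /* $+0006 [1] pages per horizontal line */
    uint16_t x;           /* $+0007 [2] X position in bytes */
    uint16_t y;           /* $+0009 [2] Y position in lines */
    uint16_t w;           /* $+000B [2] width in bytes */
    uint16_t h;           /* $+000D [2] height in lines */
    uint8_t  reserved;    /* $+000F [1] future expansion */
} blit_descriptor_t;
#pragma pack(pop)
```

A 256-byte page then holds exactly 256 / 16 = 16 descriptors.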

 

I'm sure there'll be more services (JPEG decode, MP3 decode, ?) given that we have a CPU attached to the SDRAM, and can exploit that independently of the 8-bit host, but these are the fundamental ones that come to mind that will allow 8-bits to explore a lot more memory than they would otherwise be restricted to.


So things will probably get a little quieter, as I build up the re-design. There's a fair amount of stuff I can re-use, but there's a fair amount of new things as well, so it might take a little while... I also want to play with the card that I have, and code up some software on the XMOS to implement the behaviour of the expected new card - given that it's going to effectively plug into the bus, the "i/o" card I already have is actually a pretty good test-bed. At the very least, it'll let me validate some of the approach.

 

The new design will use a larger XMOS part - there are 4 tiles on this chip, each of which is effectively its own CPU, with its own RAM and other resources. The pins are mapped as below, which'll give you an idea of where it's headed...

 

xmos-tiles-0-and-1.png . xmos-tiles-2-and-3.png

Pins are listed down the side, ports are across the top, with the colours identifying the range of the port to be used for any given function. In summary, the tiles are structured as:

 

Tile 0: USB, SD-card, Debug, UART x2, XTAG-link

Tile 1: VHDCI interface, peripheral i/o

Tile 2: SDRAM, SPI, peripheral i/o

Tile 3: Gigabit ethernet, I2C

 

At this point I'm pretty much committing myself to an external case as well - I'll either 3D print it at low volume, or if we drum up enough support, get a case made as detailed earlier. I've surrendered to the inevitable and ordered an H80 case to join y'all :)

 

With that in mind, the expansion main-board will offer a base set of expansion possibilities, and then the slots will allow anyone to do anything they want. At this point, I'm intending the slots to carry:

  • All bus signals @ native voltage
  • All SIO signals @ native voltage
  • All Joystick signals @ native voltage
  • An address stable signal from the XMOS @ 3.3v
  • A 3-bit peripheral-select output from the XMOS @ 3.3v
  • An 8-bit peripheral-wants-attention input to the XMOS @ 3.3v
  • An 8-bit data, 2-bit control input bus from peripherals to the XMOS @ 3.3v
  • An 8-bit data, 2-bit control output bus to peripherals from the XMOS @ 3.3v
... on a physical form-factor of PCIex16 slots, which have plenty of pins and more besides. Native voltage signals will be directly wired to the bus, so cards that want to use them will have to cope with the voltages themselves. Any cards content to interact via the peripheral-i/o bus can use 3.3v (in fact, must, since the XMOS is not 5v tolerant).

 

In addition, the expansion box motherboard will provide:

 

- 2x high-speed (anything up to 10 Mbit/sec) serial ports. More than the host computer can source/sink, anyway :)

- A USB-host interface

- A gigabit ethernet interface.

- An OLED display on the front (SPI-based)

- An SD-card interface (probably full-size).

 

I'm planning on putting these standard peripherals on the left or right hand rear side of the case, with the slots being rear-facing; that way I can put the motherboard against 2 sides and not have to fill the entire base of the unit. There are going to be SPI and I2C busses coming out of the XMOS, so there may be more things as I think of them. For now, that's 'it' :)


Speccing out the parallel connector

So I'm in the process of testing out the parallel bus interface to make sure the 1.79MHz bus speed will be successfully transmitted over the VHDCI cable, and figured I might as well make the PCB something we can actually use in the future - to this end it will serve a dual purpose.

  • If plugged into the back of an XE, it'll be a simple conduit (via buffering) to the VHDCI cable which will plug in the back. The goal is to keep the interface short here, so there's the minimum of space used at the back of the XE. I don't have a good way to get Joystick, or SIO connections to this PCB in this configuration (Lotharek has SIO plugs for ~$8, but I don't know where I'd get a socket to provide the onward daisy-chain)

 

  • If plugged into a 1088XEL, it will act as the aggregator of the {bus, joystick, SIO} connections, and feed them all to a 2x34 way ribbon cable, that in turn has a PCB that takes the ribbon cable and maps the signals 1:1 onto a VHDCI cable connector. This lets me have a tiny PCB that (because of the VHDCI connector mounting screws) is panel-mountable, and allows a lot of flexibility with placement.

 

With that in mind, I started figuring out which signals need buffering, which ones are bi-directional etc. If those-in-the-know could take a gander at the below, I'd appreciate it. I think I have them all in the correct buckets, but the more eyes the better...

 

pin-directionality.png

 

Cheers

Simon


Quick update

I'm ankle-deep in the re-design, and it's already undergone some fairly major changes to the above because I discovered that the XMOS USB library was device-only. I'd like to (eventually) support USB keyboards and mice which means the chip has to support host (or on-the-go) mode, so that was a bit of a deal-breaker. To be clear: I'm not saying this is all going to be there on day-1, but I want to make sure I remove as many barriers as possible to any future development, during the design phase. Anyway, for the same cost of the larger XMOS chip, I can get a smaller XMOS + an ARM M7 which has built-in USB (host, device, and OTG). I've re-jigged a few things on the XMOS side to account for the smaller chip ...




redesign-xmos-side.png


It occurs to me that this sheet may need some explaining - basically, XMOS chips have fixed ports of different widths (1, 4, 8, 16, 32-bit) which overlap the pins. All the pins on any given port have to be either read or write; you can't mix. That means there's an extra level of planning to make sure that each port is doing what you want.

In this revision of the design, I've accounted for any future needs for different hardware support by adding in 4 more bits of read-port from the VHDCI cable, and 4 more bits of write-port. If and when I get around to doing the (example: C64) version, the wiring from VHDCI to XMOS will remain fixed, but the signals will be specific to the hardware it's connected to, because of course I can flash a C64-specific image to the XMOS, and the interface cable on the host side defines the wiring loom layout. As long as I match address-bus to address-bus, data-bus to data-bus, and common control signals to the same, it'll be all well and good.

Given that there is reduced functionality on the XMOS side, we now also have:

redesign-stm32f7-side.png


This is basically the expansion box motherboard host. It handles the peripheral-slot i/o (relaying to the XMOS for transmission back to the host) and also handles the motherboard-supplied peripherals (10/100 ethernet, full-speed (12Mbit/sec) usb, 2x serial (up to 27 Mbit/sec), SD card) and an experiment in video. You can see I've used up every pin available...

Hang on, what was that last thing ? Video ? Well, it occurred to me way back when I started this, that given access to every byte transferred over the bus by CPU and ANTIC, it ought to be possible to build a display just like ANTIC does. You'd need to understand the video-subsystem of the host, and you'd need to be able to produce a video signal. I'd already tried shoe-horning it into the XMOS, but with each core running at only 100 MHz, it was getting pretty tight without using too many resources. With the two chips now available, one of which has video-generation hardware, I have a bit more leeway so I could re-visit... Here's the plan:

  • You need to be able to interpret the bus activity on the fly. Pokey interrupts and DLIs can change the colour registers as the screen is being drawn, can move around players and missiles, and can instigate horizontal scrolling etc. There's a lot to support there.
  • Fortunately we have the XMOS monitoring every bus transaction, and I've included a dedicated video-bus to the STM32F7. I'll dedicate a core on the XMOS to understanding the video-layout of the host, and when ANTIC reads data (/HALT goes low), or when the display list vector is read, or when the display-list itself is read, or when a colour register is changed, or... that core will send a packet over the video-bus.
  • The STM32F7 will get an interrupt (its interrupt latency when running from ITCM memory is 12 clock cycles at 216 MHz) as the packet on the video bus is sent, and it will process the packet as needed. The video-bus interrupt request line is the highest-priority one, so it ought to be able to respond quickly. The STM32F7 actually has a reasonable amount of time to respond: even if ANTIC is fetching back-to-back data (which it mainly doesn't), the bus-cycle is only 1.79 MHz, and the STM is running 120x faster than that.
  • The current plan calls for a double-buffered display (ie: I'll be writing to the framebuffer that will be displayed *next*), which puts me a frame behind the host computer (since it's generating the video signal on the fly). I'm not sure if that will be a deal-breaker. Maybe for some. It may be possible to write to the active framebuffer, but you're racing the beam and I'm not sure what that would look like. If I fall behind the beam, the data won't be displayed... If I subsequently catch up, the screen would start to draw again. It might be weird.
  • I'm currently planning on providing output via HDMI using an ADV7511 - the LCD interface on the STM32F7 provides 24-bit RGB, /HSync, /VSync, CLK and /DE signals, which is a good match for the ADV7511 inputs.
  • The ADV7511 can take digital audio, and encode that into the HDMI signal. There are 3 ADCs; I'll connect the SIO audio line to one via a 5v->3.3v scaling circuit, and we can have audio too (I'm assuming the audio is bi-directional, ie: it's not just the input from SIO, but also the output from the XL). There's also the (stereo!) audio output on the 1088XEL, but I'm not sure how to convert a +/- 0.4v signal into something that oscillates between 0 and 3.3v without corrupting the signal. I'll ask around the EE-types at work, or if y'all have any ideas, feel free to chime in. In any event, those signals will be connected to another 2 ADC inputs via the magic black box that will scale/offset them :)

Of course, it doesn't stop there. Once we can support standard atari video modes we can look at expanding it; since I'm using 16-bit access to SDRAM, according to ST 640x480 ought to be reachable in 24-bit colour, and 800x600 in 16-bit colour. If we use an indexed mode (256 colour registers) it could go as high as 1920 x 1080 (!) Coming back down to earth, it does support 320x240 as well, so mapping the existing atari modes to the output ought to be possible without playing tricks by plotting 4 pixels for each one.

One last note on video. It occurs to me that since I'll be writing to a frame-buffer, it ought to be possible to enable the PAL-blending of APAC-style modes. It would probably have to be software-enabled (flip a bit in a "register"), or you'd see sharp colours on one scan-line followed by sharp greyscale on the next, but that could be one advantage of *not* racing the beam. It's also worth mentioning that in emulation on my Mac, double-buffering is always enabled by virtue of the host operating system, and Altirra still seems pretty darn good to me :)

And one last thing: Seeed offered me two options to make good on their mistake - they would either (at their expense) ship the boards back to them, re-work them, and ship them to me again, or they'd offer me a refund coupon against further work. I chose the coupon, and it arrived today, valid until 18th August. I was critical of them when they made the mistake, so it behoves me to be appreciative when they make up for it. A "hip, hip, hurrah" for Seeed :)

[addendum]

Had a chat with one of the EE guys, he suggested ...

level-shift-audio-signal.png


... as a way to get the signal within range. It occurs to me that I actually have a 3.3v rail and a 1v power-rail, so I could do the same thing with the 1v rail and get a better dynamic range, since the signal swing is 0.8v. He did caution me that the current across the potential divider (+3.3v to ground) ought to be within an order of magnitude of the audio-signal current - if we're talking milliamps of supply current, then 3.3v and 1K resistors work reasonably well, I may need to change it for +1v<-->GND.

Question: what is the supply current on the 1088XEL audio pins ?

Cheers
Simon


In a similar vein - anyone know if Antic's /HALT output is open-drain ?

 

I'm looking at the schematics (page-1, middle right) and my best guess is that it's a standard digital driver. I also don't see any pull-up, either internally or externally on the 1088XEL schematic, so I'm not that hopeful - it would however be very cool if the cartridge port could take control of the /HALT line so thought I'd ask while I'm figuring out the directionality of the signals for buffering etc. ...


Well, to quote from the Turbo Freezer manual:

'This is achieved using a small trick. The freezer pulls the line of the ANTIC refresh pin, which is intended to be an output, to “low”. This makes the MMU in the Atari believe that ANTIC is performing a memory refresh, hence it deactivates all built-in memory. According to Bernhard Engl, who works as a full-time chip designer nowadays, it is not dangerous to pull a “high” level “low” from outside in the NMOS-Depletion-Load technology, because that is how NMOS works anyway. If the ANTIC had been manufactured in CMOS technology, this approach would have been dangerous and could have damaged the ANTIC – and there never had been a TURBO FREEZER.'

 

OK, that's talking about refresh and not halt, but I guess the same should be true since it's also from Antic. Has anyone tried this?


That ... is pretty cool.

 

I completely forgot we were using NMOS technology back then - my head is wired (pun intended) to CMOS constraints, and having one chip drive high when another drives the same signal low in a CMOS device is ... well it's not pretty.

 

There is a difference in the output pin configuration of /HALT and /REF ...

 

refresh-vs-halt.png

 

... and I'm not quite sure I understand what the difference is between them, but it's maybe worth a try.

 

What I'm thinking of is that if the external box understands Antic's /HALT cadence, it can insert its own /HALTs in-between, and perform memory writes directly to internal registers or internal RAM. It'd be like an uber-DLI, because there's no IRQ handler required, and you're not limited to the CPU altering memory/registers.

 

The XMOS chip has on-board high-resolution timers, easily fast enough to do per-pixel writes to (to pluck an example out of the air) color registers... Of course, halting the CPU on every pixel might be a noticeable slow-down, but it's worth exploring as an idea I think :)

 

By the way, I'm slowly making progress on the XIO board. I've been mainly creating symbols for the schematics, but if you want a sneak-peek, it's currently looking like this...

 

xio-layout.png

and here are the corresponding schematics (PDF).


There's another couple of things I thought of, which open the floodgates even more.

 

Multi-master DMA to RAM

Let's say you define a RAM mapping in the SDRAM of 0..64K, ie: all of memory. Now, any access to an available RAM address will go via the XIO SDRAM rather than internal RAM, no matter whether you have 16k or 64k installed. What does this get you ?

 

Well, now you not only have the XMOS chip having access to that memory, any device in one of the slots could send a write-request via the XMOS to the SDRAM, so you have multiple-master DMA without losing any 6502 cycles. In fact, unless we arrange for it (by sending an IRQ), the 6502 won't even know anything has changed. This is independent of whether the Antic /HALT thing works. We can definitely do this.

 

Cycle-timed access to hardware registers

 

The above is of course limited to RAM, and you might want to change registers or access hardware. Let's offer an 'access-list' (akin to the display-list), which is basically a list of {time, address, value, op} tuples. At any given {time} (modulo Antic's control of the bus, in which case your access is deferred), the XMOS could assert /HALT, and ...

  • If 'op' is 'write', take over the bus on the next clock and issue a write of 'value' to 'address'.
  • If 'op' is 'read', take over the bus on the next clock and issue a read of 'address' and report back to the caller
  • If 'op' is one of 'and','or','xor', do a 'read/modify/write' on the address, starting at the next clock

We could have a 'loop' op as well, which would reschedule the same pattern to start again on the next frame (otherwise it's a one-shot sequence).
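The {time, address, value, op} tuple could be modelled as something like this - a minimal C sketch, with invented names, of the entry type and of what the read-modify-write ops ('and'/'or'/'xor') would leave behind at the address:

```c
#include <stdint.h>

/* Hypothetical access-list entry - names are illustrative, not a real API */
typedef enum { OP_WRITE, OP_READ, OP_AND, OP_OR, OP_XOR, OP_LOOP } alist_op_t;

typedef struct {
    uint32_t   time;   /* ns from start of frame                */
    uint16_t   addr;   /* 6502 address to touch                 */
    uint8_t    value;  /* operand for write/and/or/xor          */
    alist_op_t op;
} alist_entry_t;

/* Pure helper: the byte an op would leave at 'addr', given the byte
 * currently there.  'read' and 'loop' leave memory untouched. */
static uint8_t alist_modify(alist_op_t op, uint8_t current, uint8_t value)
{
    switch (op) {
    case OP_WRITE: return value;
    case OP_AND:   return current & value;
    case OP_OR:    return current | value;
    case OP_XOR:   return current ^ value;
    default:       return current;
    }
}
```

The and/or/xor ops are what make this more than a write-queue - you can flip a single bit in a hardware register at an exact scanline position without a full read on the 6502 side.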

 

This needs the XMOS to have characterised the current Antic pattern of /HALT asserts, and I think that could be done by piggy-backing onto the screen-parsing code I'm intending to have anyway. The plan is to wait until something (probably Antic) reads the display list (by monitoring ${D402,3}) and then look for LMS instructions within the display-list space. The first LMS defines the start-of-frame for Antic, and we wait for the cycle to repeat.

 

Once we've found Antic reading its first LMS, we just set a timer, and whenever a bus-cycle has /HALT asserted we note it down as a time when Antic is in control, and so we're not allowed to assert /HALT ourselves. Timers on the XMOS are accurate to ~10ns, so we'd have periods of {X nanoseconds from beginning of frame -> X + 1120 ns from beginning of frame} as regions where we would not be able to assert /HALT. My understanding is that Antic asserts /HALT on the clock before it accesses the bus, so we need 2 clocks of time blocked out where we can't assert /HALT ourselves.
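The bookkeeping described above can be sketched like so. The names, the fixed-size table, and the hard-coded 1120 ns window (2 bus clocks at ~560 ns each) are my illustrative assumptions:

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_WINDOWS 128
#define WINDOW_NS   1120u  /* two bus clocks blocked per Antic grab */

static uint32_t win_start[MAX_WINDOWS];
static int      win_count = 0;

/* Called once per bus cycle seen with /HALT asserted;
 * t_ns = time in nanoseconds from the start of frame. */
static void note_antic_halt(uint32_t t_ns)
{
    if (win_count < MAX_WINDOWS)
        win_start[win_count++] = t_ns;
}

/* May the expansion box assert /HALT itself at time t_ns in the frame? */
static bool may_assert_halt(uint32_t t_ns)
{
    for (int i = 0; i < win_count; i++)
        if (t_ns >= win_start[i] && t_ns < win_start[i] + WINDOW_NS)
            return false;
    return true;
}
```

A real implementation would presumably sort or bitmap the windows so the check is O(1) at interrupt time, but the linear scan shows the idea.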

 

The 'access list' would be offered via a software-trap kind of interface to the program running on the 6502, and as a message-based interface to the peripherals. Peripherals would have to be careful about what they did though - maybe only doing any of this when the program running on the 6502 has in some way enabled it - you don't want weird stuff happening in the middle of a good session of Ballblazer :)

 

Issues

  • Whether this plays well with other internal memory expansions like the U1MB, I don't know. In theory, /EXTSEL ought to be in control of the memory bus, but other hardware may take a different view :)
  • There's an implicit acceptance that Antic remains in control of the bus. I don't think there's really any option here, but what it implies is that we have to characterise the /HALT timings before we can do the above. That could be an issue if the display-list changes from frame to frame. I think we could mitigate this (with even more monitoring) by looking for writes to the display-list vectors, and to the display-list-content area - if we see a change we defer a frame so we can re-characterise the /HALT pattern.
  • If software changes the display-list on a per-frame basis, we'd never get to run. I'm just going to accept that as a limitation. The only things doing this will be newly-written software anyway, so they can make sure they *don't* change the display-list too often.
  • In fact, given that it's only going to be new-software that will care about this, it may even be appropriate to make the re-characterisation process a software-initiated thing - ie: if you're going to be doing things to the hardware registers, make sure you (as the main running program on the 6502) are a good citizen in terms of display-list management.

 

Anyway, food for thought. I'm going to link up the analyser again (just my own one, on limited pins) and try and get a high-res (in terms of time) picture of what's going on with the bus to try and help flesh this out.

 

Simon.


  • 2 weeks later...

So, still alive, and I've just finished the "baseline" schematics for the expansion box - baseline being defined by me having all the bits in there that are necessary. There are still a few things to go ...

  • I need to add the cartridge port, though given the connectors I have access to, I think that will be done using a mezzanine card, bolted to the side of the case, so all I need on the motherboard is a sufficiently large pin-header. The reason for the side-access is (1) this will eventually be able to fit under the 1088XEL in the standard H80-style cases, and (2) the connectors I have are all vertical ones, not right-angle ones, so the PCB has to be parallel to the side of the case that the slot is cut in.
  • I may yet add a little OLED display. I have to come up with a real reason (other than: it'd look pretty) though...
So the first stab at layout looks something like this...

 

 

xio-baseline.png

... but that will definitely change as I optimise for different criteria. As it stands, it means the cards all have to be pretty long, since the rear slots are at the bottom of that view, and the SD card pokes out the front of the case. Ideally I'd like to move the slots to the back, but that makes the impedance-matched and length-matched traces (HDMI, Ethernet, USB) more awkward to route. Still, I have some time to play with it now :)

 

If you're interested, the schematics for the whole thing are also available.

 

Anyway, thought I'd post an update because it's been a while...


  • 2 weeks later...

So here's where we are now...

 

xio-prior-to-review.png

 

Things have changed a bit :)

 

The main difference is that the XMOS chip has disappeared, and we're using 2 identical STM32F7 parts. These are 5v-tolerant, so it makes interfacing that much easier, and I'm pretty sure they're sufficiently fast to handle the bus / computation / RAM retrieval / signaling. The first STM is 'XIO' and handles all the slots, serial i/o, and the interface to the 8-bit. The second ('PIO') provides the high-speed peripherals... USB, 100-Base-T ethernet, and HDMI.

 

The port map for the host-interface XIO chip looks like:

 

xio-portmap.png

.. and for the high-speed peripherals we have ...

 

pio-portmap.png

The two STMs communicate via dual 54-Mbit/sec SPI interfaces, and the XIO board has a 26 Mbit/sec SPI channel which it uses to communicate with the slots. Each slot has a hard-coded 4-bit port-id, a 'card-is-present' line it has to pull down, and an I2C and SPI interface to the XIO. The XIO provides 1 MB of PSRAM, used so that the access-time is predictable, rather than SDRAM, where refresh might cause a delay too long to be tolerable. This might change after review.

 

What hasn't changed is that the slots are going to be at the front of the box, so any cards will have to be 17cm long. The reason for that is that I can't get the high-speed routes (HDMI is a couple of GBit/sec/lane) to simulate correctly when their board traces are too long, so they have to be close to the back, which pushes the slots towards the front (top) of the board. I might try and move the two main chips to the top, and put the slots in the middle, but that will depend on the review.

 

Review? Where I work, when an EE has designed a board (and theirs are *far and away* more complex than this), they all get together and "discuss" the schematics - it's called a 'schematic review' for a reason [grin]. I'm planning on providing pizza and soft-drinks if they'll do the same for the above over a lunchtime or two. I'm most definitely not an EE, but I figure if it works for them, it's probably an even better idea for me to do it... so I'm hoping the ensuing deluge of criticism will be constructive :)

 

Anyway, it doesn't look like much, but it's a reasonable milestone - it routes @ 100%, the simulations look good on the high-speed (ethernet, HDMI, USB and SD-card) lines, and it still fits onto a 4-layer board (I was afraid I'd have to go to 6-layer at one point).

 

 

[update]

One of the things I have to do as part of a design review is produce a block-diagram at the top of the PDF I send out beforehand. Typically our designs will have pages of block-diagrams, let alone the schematic. This one's a lot simpler, but it'll give an idea of the system design, so I'm uploading it here as well ...

 

xio-block.png

 

Simon.


100 Mbit ethernet.... will the hardware keep up at full duplex, eating from and feeding the port?

 

 

Well, the 6502 won't, but the STM32F7 can. The thing is that the ethernet is across a 50 MBit/sec SPI bridge from the 6502 in any case, so the host cpu could only send a max of ~50 MBit/sec in either direction anyway.

 

The thing is, though, that ethernet is a packet-switched network that allows for packet rate management. The host can send / receive packets as fast as it can, and the ethernet interface will slot them in whenever there is an idle network state. If more than one device on the network tries to insert a packet at the same time, collision-detection kicks in and they'll both back off a random amount of time and try again.

 

So you'll end up with the STM32F7 handling all the networking and the 6502 sending data as fast as it can. The plan is to provide a BSD sockets interface to CC65 which isn't totally fleshed out in my head yet, but the basics will be:

 

  • on calling 'socket()' you get back either -1 (error) or a number from 1..N as the socket-id. This number is your socket handle, and all future data communications will use it to determine which socket you're talking to.
  • 'write()' will use the socket-id as an index into N FIFO addresses, meaning that it will sequentially write to the same address over and over, with the STM32 buffering the data into a sequential buffer before calling its own 'write(...)' routine to send it over the network.
  • 'read()' will work in the same fashion, sequentially reading a byte and then checking a status byte for the socket to see if the read was successful. Since sockets are 8-bit clean, there has to be a separate status-read operation per byte.
  • etc. Basically we don't take up much RAM on the 6502 side for the socket itself, we buffer it all on the STM side and expose it through FIFOs.
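The 'write()' path above can be sketched in C. The register map (base address, control offset, command byte) is invented for illustration - on real hardware these would be volatile stores to fixed addresses, so here a callback stands in for the bus to make the logic testable:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical register map - NOT the real design */
enum { SOCKET_BASE = 0xD500, CTRL_OFFSET = 0x20, CMD_SEND = 0x01 };

typedef void (*bus_write_fn)(uint16_t addr, uint8_t val);

/* Every payload byte goes to the SAME FIFO address for its socket,
 * then a single control write tells the STM32 to transmit. */
static void socket_write(int sock, const uint8_t *buf, size_t len,
                         bus_write_fn bus)
{
    uint16_t fifo = (uint16_t)(SOCKET_BASE + sock);
    uint16_t ctrl = (uint16_t)(SOCKET_BASE + CTRL_OFFSET + sock);

    for (size_t i = 0; i < len; i++)
        bus(fifo, buf[i]);      /* same address, over and over */
    bus(ctrl, CMD_SEND);        /* kick off the network send   */
}

/* Tiny test double: record the addresses written */
static uint16_t trace_addr[300];
static int      trace_n = 0;
static void bus_capture(uint16_t addr, uint8_t val)
{
    (void)val;
    if (trace_n < 300) trace_addr[trace_n++] = addr;
}
```

Writing to a single fixed address per socket is what lets the 6502 side be a tight LDA/STA loop with no pointer arithmetic, which is where the throughput estimate below comes from.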

 

The cool thing is that since the socket interface will be just registers, any language ought to be able to implement a BSD-sockets layer pretty simply, even BASIC. Once you have BSD sockets, implementing telnet/HTTP etc. isn't too bad.

 

The time to write 256 bytes would then be:

 

- write 256 bytes to address X (where X is based on the socket handle).

- write a 'send' byte to the control register for the socket

 

Assuming you're just writing an existing 256 byte page-aligned buffer in 6502 RAM, that's (approximately, there's probably a bit of function-call overhead as well)

 

- 256 * (4 + 4 + 2 + 3) clocks for {LDA $addr,X , STA $fifo, INX, BNE} loop

- 20 clocks or so for the control byte

-> ~3350 clocks or ~ 1/534 secs

-> ~133 KBytes/sec (or ~1Mbit/sec if you prefer)

 

Yeah, that's a very rough guide, but it's gonna be there or thereabouts.


Another thought I had was a card for your gizmo that would pre-process/convert images and pages and stuff them through the bus, so a web page would display, and if you clicked on a video, let's say, maybe it would play using something along the lines of Avery's player... so from LAN to card to Atari... :)

Just another crazy idea that could work, given the speed and the work done in your box...

Edited by _The Doctor__

There's a reason why I struggled to get 32-bit access to the SDRAM on the video port of the 'PIO' STM chip - I wanted high bandwidth access so I could push the video output as much as possible. There's also a built-in hardware JPEG decoder in the STM32 fabric ...

 

Cheers

Simon


Wow, sounds like a think tank of talent being utilized to improve our very old tech. So I'm curious as to what they think about your project being based on an Atari? You can answer that in your super PBI thread if you wish.

Schematic review [latest PDF] set for tomorrow lunch. So far it's all been very civil - they're treating it like any other project and I expect to get grilled, just like any other EE presenting their own schematic. I (at least) have the defence that I'm a talented amateur at PCB design at best though, so I'm hoping they'll go (relatively) easy on the criticism and just point out the mistakes that you can tell from having the experience... There's a fair difference between "What simple-minded cretin would ever do ..." and "it turns out that the better way to do this is..." :)

 

In truth I'm over-stating this - I expect it to all be friendly and helpful - from what I understand the EE's can be pretty brutal amongst themselves, but they're a pretty cool group to work with.

 

Making your own stuff is fairly common in this group; indeed, during my (8-hour !) interview to get into the group, as soon as they realised I'd designed and built a saltwater reef-tank controller they zeroed in on that, asking everything from the thinking behind the overall design, through communications protocols I'd used, design decisions for chip selection, coding practices, software update procedures, boot loader, etc. etc. They used my home project as a compare/contrast to the working methodologies of the group. Apart from one guy, who made me design a realtime debugger for an ARM processor from scratch.

 

Anyway, the point is that they'll tease me for the things I get wrong, and they'll help me through some of the trickier parts, and point out the ways I could make it better. I don't think they really care about the specific purpose, just the actual design. It's a bit like playing D&D - you suspend your disbelief in magic and monsters, and then everything has to follow logically.



News??


Sorry to be late reporting back, I wasn't trying to be mysterious or anything, life just got very busy all of a sudden.

The review was very cordial, everyone was very friendly. They gave lots of tips regarding things that I'd either missed or not considered. Things like:

  • In the process of transcribing from XMOS to 2x STM32F7's, I'd forgotten to add in a reset supervisor circuit again, so there was no reset button any more, and no pull-up on the /RESET net.
  • I'd missed the pull-ups on the SWDIO nets for the ARM chips as well
  • They recommended filters on the analogue power supplies, and "PI" filters on the digital supply lines (there are 3 different supplies) for the HDMI chip
  • They suggested serial terminating resistors on some of the data transport lines (primarily clocks, but also SPI MOSI)
  • They suggested capacitors on the power rails where the user can plug things in, e.g. the slots and the cartridge port. They also wanted bulk capacitors (say 4.7uF) on larger parts' VDD lines, e.g. the SDRAM.
  • Also little things like bringing out more test points, removing 4-way junctions in the schematic, and placing 0-ohm resistors on all the power-rail sources, for testing for shorts during bring-up, etc.

There's probably more - this is just off the top of my head. I fixed all the things they were talking about, so I'm good to go there. I have to replace the PSRAM with SDRAM, because refresh cycles aren't something I need to worry about, and I might try to bring out the other chip's USB, so we could have 2 USB complexes made available.

The elephant in the room, though, was a simple question at the end: "Why aren't you using the STM32H7 instead of the STM32F7?" - to which I didn't have a good answer, because I hadn't considered it. The H7 is 2x faster than the F7, which would be very useful in parsing, processing and responding to the 8-bit bus signals, but it's not 5v tolerant, so I'd have to level-shift everything. It has other benefits too: it has 1MB of embedded SRAM, and it bumps the SPI transport to 150Mbit/sec rather than 50Mbit/sec. It's also pretty much a drop-in replacement in terms of pin-out. Things I have to think about:

  • Regarding my idea for taking over the bus with /HALT, I need to make sure the level-shifted signal lines can be reversed and sourced from the expansion box. I have to get that right now, not defer until later because of the level shifting thing.
  • I also have to make sure the level shifted lines have sufficient drive strength to send the signal over the VHDCI connector
  • And I have to figure out if the extra $10/chip, so $20 per expansion board (at Q100), is worth the doubling of the CPU horsepower. So I've spent what little time I have had over the last few days figuring out the timings for how the STM chip would handle the interrupts when the 8-bit bus clock line goes low. It's a bit tighter than I would like, but it does seem do-able with the STM32F7, assuming I run interrupt code from ITCM/DTCM (the instruction/data tightly-coupled memory). It may mean the STM32F7 chip spends most of its time handling interrupts though, which might have knock-on effects I don't want....

So far, I've thought of, and discarded:

 

  • Adding an FPGA (Spartan-2 is 5v tolerant), but the Spartan-2 is (a) more costly than using the 'H' part, and (b) over a decade out of date, so the tools aren't great, and could even be hard to come by
  • Using a CPLD (Coolrunner XPLA3 is 5v-tolerant) to handle the bus logic decoding. The CPLD doesn't have enough register-space for my 'memory descriptor' scheme, though.
  • Using a PSoC-5, which has UDB blocks to handle the decoding (basically 24 tiny CPLDs), but I'm not very familiar with the chip, and again it's adding complexity I don't want.

So I'm tentatively coming down on the side of using the 'H' parts, which gives me a lot more time to perform instructions in the interrupt handler (I might even be able to get away with writing it in 'C' rather than assembly). I just need to make sure they won't blow up when the power is applied, and get familiar with the programming architecture. I'm very familiar with the 'F' parts, but I've never used the 'H' parts before. ST have good documentation/examples generally though, so I'm not overly worried about that. Using the H parts would also help on the graphics chip in terms of what could be done per frame (twice as much :)

 

So, I'm considering. That's what I'm doing. I'm considering :) I'll get back when I have made up my mind, but feel free to chime in :)

