Understanding the CRU and usage implications

+OLD CS1 · December 30, 2014

To make certain I understand: the CRU comprises the CRUCLK, CRUIN and CRUOUT lines, as well as A3-A14, with A0-A2 assumed to be 0. Outside the console, CRUOUT is carried on A15. If I understand correctly, a CRU addressing cycle steals a slot that could otherwise be a regular memory address cycle, rather than being interleaved with it, especially for external peripherals, since A15 is "borrowed." After receiving a CRU address, the device may or may not be expected to return data on the CRUIN line.

Is that close to being correct? I used to think that the CRU worked like a three-wire fast serial bus between CPU and peripherals, asynchronous from everything else in the system, but I seem to have misunderstood part of that.

Stuart · December 30, 2014

Broadly correct.

CRU devices such as the TMS9901 and TMS9902 are supplied with phase 3 of the processor clock in addition to CRUCLK. CRU cycles do indeed 'cycle steal' from normal memory accesses and everything is nice and synchronous.

In a typical system that contains several CRU devices, you would decode A0-A9 to produce a chip select /CS signal for a particular CRU device, and A10-A14 would select a particular CRU bit on that device. Just making /CS active for a CRU device won't actually do anything - it needs the pulse on CRUCLK to perform a CRU access cycle.
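
As a rough illustration of that decoding, here is a minimal Python sketch. It assumes TI's bit numbering, where A0 is the most significant address line and A15 the least significant, so a 16-bit bus value has A0-A9 in its top ten bits and A10-A14 in bits 5 to 1. The function name `decode_cru_address` and the returned tuple are hypothetical, chosen only for this example.

```python
def decode_cru_address(bus_value):
    """Split a 16-bit address-bus value into (device_select, bit_select).

    Assumes TI numbering: A0 is the MSB (bit 15 of bus_value) and A15 is
    the LSB (bit 0). A15 carries CRUOUT during a CRU cycle, so it is ignored.
    """
    device_select = (bus_value >> 6) & 0x3FF  # A0-A9: decoded to produce /CS
    bit_select = (bus_value >> 1) & 0x1F      # A10-A14: one of 32 CRU bits
    return device_select, bit_select


# Example: bus value 0x0024 has A0-A9 = 0 and A10-A14 = 0b10010 (bit 18).
print(decode_cru_address(0x0024))
```

In real hardware the /CS decode and the bit select are of course combinational logic, and nothing happens on the device until CRUCLK pulses; the sketch only shows which address lines feed which part.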

Be aware of a 'shortcut' on the TI-99/4A console circuit. For a CRU access cycle, A0-A2 are most definitely 0. There are five 9900 instructions - LREX, CKOF, CKON, RSET and IDLE - that produce a pulse on CRUCLK with A0-A2 set to a non-zero value. The intent is that this combination of signals is decoded and used to control some (non-defined) external hardware. But in the console, A0-A2 are not fully decoded, so the console hardware will interpret these five instructions as CRU accesses, with (possibly) strange results.

gregallenwarner · December 30, 2014

LREX, CKOF, CKON, RSET and IDLE - that produce a pulse on CRUCLK with A0-A2 set to a non-zero value.

Are these commands supported by the E/A module? If so, could these potentially be used to control some harebrained console modification ideas, or is that just asking for trouble?

Willsy · December 30, 2014

Are these commands supported by the E/A module? If so, could these potentially be used to control some harebrained console modification ideas, or is that just asking for trouble?

Yes. The EA will assemble these instructions.

+mizapf · December 30, 2014

... or is that just asking for trouble?

Exactly. :-)

The problem is that you have to include a decoder for A0-A2 to make sure that this is not misinterpreted as a CRU access. At this time, the lines are not decoded, so an execution of one of the control instructions may make the TMS9901 in the console pick up some values, or maybe some other peripheral device gets triggered. One could, for instance, combine the CRUCLK with the A0-A2 lines, like CRUCLK1 = CRUCLK * ¬A0 * ¬A1 * ¬A2. Just a quick guess, I'd have to look up the details.
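
A minimal Python sketch of that gating idea, offered only as a truth-table illustration: all signals are treated as active-high logic levels, which is an assumption (the real polarities on the console bus would need checking), and the function name `gated_cruclk` is made up for this example.

```python
def gated_cruclk(cruclk, a0, a1, a2):
    """Return CRUCLK1 = CRUCLK AND (NOT A0) AND (NOT A1) AND (NOT A2).

    All arguments are treated as active-high logic levels (True = asserted).
    This is an assumption for illustration, not a verified console circuit.
    """
    return bool(cruclk) and not (a0 or a1 or a2)


# A genuine CRU access (A0-A2 all 0) passes the clock pulse through.
print(gated_cruclk(True, False, False, False))
# An external instruction (A0-A2 non-zero) is blocked.
print(gated_cruclk(True, True, False, True))
```

With such a gate in front of each CRU device, the five external instructions would no longer look like CRU output cycles to that device.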

[Edit:]

The effect depends on the CPU's reaction to the external command invocation. I studied the specifications, but I could not find a clear statement of what happens to the remaining address lines A3-A14 during the cycles of the external command. If they fall back to 0, this could be interpreted as an access to the TMS9901 bit 0. The two machine cycles of the external command look like CRU output cycles, as there is a CRUCLK pulse, so the 9901 will sample the value on CRUOUT (which is not intended to be set to a particular value for external commands and could be 0 or 1) and probably change from interrupt to clock mode.

If the address lines A3-A14 are not cleared, they may carry some previous setting, or, as a third alternative, they could output the contents of R12 as with normal CRU commands. In that case, anything could be possible, depending on the value of R12.
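
For reference, this is roughly how R12 ends up on the address lines during a normal single-bit CRU instruction (SBO, SBZ, TB). The Python sketch below covers only that normal case; whether the external commands behave the same way is exactly the open question above. The function name `cru_bus_address` is hypothetical, and TI bit numbering (bit 0 / A0 is the most significant) is assumed.

```python
def cru_bus_address(r12, displacement=0):
    """Return the 16-bit bus value (A0 = MSB) for a single-bit CRU access.

    Bits 3-14 of R12 form the 12-bit CRU base address; the signed
    displacement of SBO/SBZ/TB is added to it. A0-A2 are forced to 0 and
    A15 (which carries the CRUOUT data bit) is left at 0 in this sketch.
    """
    base = (r12 >> 1) & 0x0FFF                # R12 bits 3-14 (TI numbering)
    cru_bit = (base + displacement) & 0x0FFF  # wrap within the 12-bit CRU space
    return cru_bit << 1                       # place the result on A3-A14


# Example: R12 = >1100 with no displacement appears on the bus as >1100.
print(hex(cru_bus_address(0x1100)))
```

If the third alternative applies, whatever happens to be in R12 would select the device and bit that see the stray CRUCLK pulse.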

Edited December 30, 2014 by mizapf

mnbvcxz · December 31, 2014

As I understand it, it requires 1 cycle to transfer each bit of data; if ports were memory mapped, I/O could work 16 times faster.

It has to be the worst idea ever.

Stuart · January 1, 2015

As I understand it, it requires 1 cycle to transfer each bit of data; if ports were memory mapped, I/O could work 16 times faster.

It has to be the worst idea ever.

But on the plus side, you're not losing memory space to I/O. And at the time, it might well have been considered fast enough. One could still implement memory mapped I/O if one needed it.

Willsy · January 1, 2015

The best solution I've seen is the Z80 I/O port system (IN & OUT)

Lisias · January 5, 2015

As I understand it, it requires 1 cycle to transfer each bit of data; if ports were memory mapped, I/O could work 16 times faster.

It has to be the worst idea ever.

As a matter of fact, it was one of the best ideas ever.

The I²C protocol has been doing exactly the same for decades now, and even our current hard disks use a serial protocol: SATA.

Serial protocols are cheaper (fewer wires across the board/cable) and faster (fewer signals to interfere with each other, so you can raise the clock speed).

Of course, if you have space and money to spare, an equivalent 8/16/32-bit bus at the same speed will be 8/16/32 times faster. But it will occupy 8/16/32 times the board space, and will cause/suffer 8/16/32 times more interference.

Edited January 5, 2015 by Lisias

Lisias · January 5, 2015

The best solution I've seen is the Z80 I/O port system (IN & OUT)

It's the most elegant solution, but not necessarily the best.

The IN/OUT instructions need 12 T-states to execute, while a simple LD (HL),A takes only 7. Compare them with the 6502's LDA/STA, which take only 4 cycles.

If you need performance, you should use memory mapped I/O (or better yet, a DMA controller).

Things got better on the R800, which needs only 4 T-states for IN/OUT and 3 for the LD.

(I never understood why the Z80 takes so long to execute the IN/OUT instructions. I would have thought they should be faster than accessing memory...)

Tursi · January 5, 2015

CRU has been debated as best/worst/meh idea back and forth... but it's a rather inefficient way to transfer data. There's no real penalty in using it for single bits.

The usage of SATA and I2C doesn't tend to block the host processor or the memory bus - on nearly all systems where those are implemented, they are dedicated buses. Doing that, rather than tying up the entire memory system for a single bit of data, would have been amazing, and made using CRU for serial data transfer valuable. As it is, it's kind of 'meh' and doesn't do anything that a memory access couldn't do faster, except live in a different virtual space.

Lisias · January 5, 2015

As it is, it's kind of 'meh' and doesn't do anything that a memory access couldn't do faster, except live in a different virtual space.

It saves 13 long tracks crossing the motherboard to connect the CRU devices, saving board space (and size), reducing routing problems and, so, reducing costs. Not a small thing.

16- and 32-bit computers became really viable when 3- and 4-layer boards became affordable. But in the era of 2-layer boards, every single track counted. IBM adopted the 8088 (a 16-bit CPU with an 8-bit data bus) for its IBM PC for a good reason.

Wiring space was such a big problem in the 70's that Zilog decided to use a 4-bit ALU on the Z80 to save *DIE* space for more registers and more microcode.

Yeah, the Z80 is kind of a 4-bit CPU internally, doing all 8-bit operations in 4-bit chunks.

Would the Z80 be considerably faster if it had an 8-bit ALU? No doubt about that - you need a 4 MHz Z80 to have the same "raw power" as a 2 MHz 6502. But that Z80 would also be prohibitively expensive - so it's hard to say that Zilog made a bad choice here.

Tursi · January 5, 2015

It saves 13 long tracks crossing the motherboard to connect the CRU devices, saving board space (and size), reducing routing problems and, so, reducing costs. Not a small thing.

Does it now?

To use the CRU space, you need to decode as many address lines as you have CRU space, plus the CRU data bit, plus the CRU control signals. It is NOT equivalent to I2C or similar protocols which include address information on the data line - it uses the parallel address bus. So if you need a single 16-bit register, you need at least 4 address lines. Unfortunately, you also need 5 address lines (A3-A7) in order to differentiate whether you or a different device is being activated, and you need to dedicate a CRU bit to turning yourself on and off. So that's another 5+1, for a total of 11 address lines being needed. You also need CRUCLK and A15 (CRU data), so there's your 13 lines being run.

Add to that the fact that nearly all devices (the keyboard and cassette port being the only exceptions I can think of) /also/ have memory chips and/or memory-mapped IO, so they already have all the address and data lines, and I do not see the advantage.

What were you thinking of?

Lisias · January 6, 2015

Dude, you need 20 pins to read a 2716 EPROM (2048 x 8 bits) on a parallel bus, plus one /CS for each additional chip hooked onto the data bus. To address a 16-bit word using such chips, you'll need an additional 8 wires for the data.

The crazy guys from TI reduced it to 13 wires for a whole I/O bus, derived from the TI 990 minicomputer (where the TMS9900 came from). If you need to reduce your wiring costs, and don't mind the performance hit, it fits.

So, you see, when the TI-99 came out, they already had a lot of support chips from the TI 990 era to use in the computer. And, well, some of them were CRU chips. Can't you see the advantage?

At least give the TI engineers the benefit of the doubt - these guys were inventing computer architectures 10 years before the 4004, the first microprocessor (and the chip our current status quo derives from). Damn, I wasn't even born when these guys sold their first computer.

And, uh, when I talked about I2C and SATA, I was talking about serial buses - I didn't intend to imply that the CRU was a 3-pin bus (although it does have a 3-pin *data* bus, like I2C).

Tursi · January 6, 2015

Now you are changing your argument, and I am done.

Lisias · January 22, 2015

Now you are changing your argument, and I am done.

It wasn't my intention to change argument - but I have to admit I could have done a better job at expressing myself in English.

Using that EPROM example, I would need 28 wires (though IIRC it can be done with 27 or 26 if you make some compromises, such as keeping the IC enabled all the time) to address 2048 words of 16 bits on a full parallel bus. The CRU does the same with 13, so there are my 13 wires being saved.

But on taking another look at it, I realize that I did in fact end up changing my argument... The saving appears to be 15 wires, not 13. :-)

Tursi · January 22, 2015

Good luck with your projects.
