
INTV clock


decle


On 5/24/2016 at 11:15 PM, intvnut said:

As for what those circuits are doing: They're sending in the clocks at a whopping 11V or so... ... I don't fully understand the principle of operation of the transistor part of the circuit; but I understand what the goal is.

     The goal of the circuit is to provide clock signals with fast rising and falling edges, a high level near but not exceeding VDD and a low level near but not less than VSS.

     The internal logic circuitry of the CP1600 all runs off VDD, which in this circuit is nominally 11.3 volts. All of the other inputs and outputs to the chip go through internal buffers that do logic level translation. That's OK for those signals, because they each drive at most a handful of transistors. The clocks, however, run everywhere inside the chip. They're only used to drive MOS devices (the CPU and the System RAM), so you may as well skip the on-chip level translation and have them directly supply the voltage levels needed by the internal logic.

     The console's 12-volt supply is regulated by a 7812 voltage regulator. The 12 volts is supplied through CR1, a general purpose 1N4001 diode with a forward voltage drop of about 0.7 volts, so that the actual VDD input voltage to the CPU is nominally 11.3 volts. One wants the high level of the clock signals to be near, but never exceed, this value.

     The clock inputs are protected by a diode network. CR5 is a 1N758 10-volt 400mW Zener diode. CR3 and CR4 are 1N4148 general purpose small-signal diodes with a nominal 0.7-volt forward drop. Among them, these diodes conduct away any charge that would cause the clock voltages to rise (much) above 10.7 volts. The three-diode configuration allows the job to be accomplished with only one somewhat more expensive Zener.
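As a sanity check, the clamp arithmetic above works out like this. A minimal Python sketch; the part numbers come from the post, and the 0.7 V forward drops are nominal datasheet figures, not measurements:

```python
# Nominal values from the post / typical datasheets
V_REG = 12.0        # output of the 7812 regulator
V_F_1N4001 = 0.7    # forward drop of CR1
V_Z_1N758 = 10.0    # Zener voltage of CR5
V_F_1N4148 = 0.7    # forward drop of CR3/CR4

vdd = V_REG - V_F_1N4001          # nominal VDD at the CPU pin
clamp = V_Z_1N758 + V_F_1N4148    # level at which the diode network conducts

print(f"VDD ≈ {vdd:.1f} V, clocks clamped near {clamp:.1f} V")
```

So the clamp level (10.7 V) sits safely below the 11.3 V VDD, as the post describes.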

     The clock lines use the unregulated 16V supply for their positive rail to get better rise times. Q1 and Q2 are two general purpose (inexpensive) PNP transistors configured as current limiters: each stops conducting when the voltage across its emitter resistor (R2 or R16) plus its base-emitter junction exceeds the Zener voltage of CR2, which is 3.3V. This limits the amount of current the 7407s have to sink.
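The resulting current limit falls out of the same arithmetic. A hedged sketch: the 3.3 V Zener and 0.7 V Vbe figures come from the post, but the emitter resistor value below is an assumption for illustration only; check the actual Intellivision schematic for the real R2/R16 values.

```python
V_Z = 3.3     # CR2 Zener voltage, sets the base reference (from the post)
V_BE = 0.7    # nominal base-emitter drop of the PNP
R_E = 47.0    # ASSUMED emitter resistance in ohms -- NOT from the post

# The transistor conducts until the drop across R_E plus Vbe reaches V_Z,
# so the collector current is limited to roughly:
i_limit = (V_Z - V_BE) / R_E
print(f"current limit ≈ {i_limit * 1000:.1f} mA (for the assumed resistor)")
```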

 

    WJI


On 5/24/2016 at 11:15 PM, intvnut said:

They're sending in the clocks at a whopping 11V or so... Those clocks are driving a lot of circuits, so I imagine the high voltage is to charge that capacitance quickly.

    The whole guts of the CP1600 is running at VDD, here 11.3 volts. The 11V is thus a normal signal level, not at all "whopping." You generally want logical high inputs to an MOS device to be near but not exceed VDD. You're not completely wrong though, as the amount of current delivered is indeed a wee bit on the high side.

    Don't let the fact that all of the other inputs and outputs operate at TTL levels mislead you: they all connect to internal buffers that do the logic-level translation. GI was quite proud of the process that let them do that.

 

    WJI


On 5/25/2016 at 2:19 AM, intvnut said:

Two things really kill the CP-1610 performance, FWIW:

  • Double-pumped 8-bit data path. Nearly every cycle is followed by a NACT cycle, and I suspect it's the double-pumped nature.
  • Multiplexed address/data bus. Every access to the outside world is two bus cycles (each with a NACT).

Those two together give a factor of 4 slowdown on nearly everything. The only bus cycles that aren't followed by a NACT are during SDBD reads.

 

   Nope. The thing that really kills CP-1610 performance is that it is only being clocked at 36% of its design speed: 1.79 MHz instead of 5 MHz.

    Without going down the rabbit hole of process details, all of the manufacturers were using 6 to 8 micron lithography and all of their transistors switched at about the same speeds (you could speed things up a bit by using a higher VDD or depletion-mode load transistors). What faster multi-phase clocks did was provide you with more internal phasing that you could use for architectural cleverness (like using a half-width ALU): they didn't make the transistors switch any faster.

    Internal transistors switched much faster than output drivers. All of the manufacturers were designing processors to use the commonly available MOS memory devices. In 1974-1976, when these particular processors came out, that meant designing for a one microsecond memory cycle time.

    This is why a 2 MHz clock on 8080/Z80 style devices yields the same instruction timing as a 1 MHz clock on 6800/6502 devices.

    Each CP1600 bus state (NACT, BAR, DTB, etc.) takes two clock cycles. At 5 MHz that's 400ns per bus state. Accordingly, a complete BAR-NACT-DTB-NACT memory read cycle takes 1.6 us. While that's slower than the 1 us cycle times of the 8080/Z80/6800/6502, each cycle fetches 16 bits (the top six of which are zero for instructions). The CP1600 could execute a minimal instruction like SETC in that time, compared to 2 us for the 8080 or 6502, both of which wasted a memory cycle in the course of completing that instruction.
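The timing figures above can be reproduced with a few lines of arithmetic. A quick sketch using only the numbers stated in the post (two clock cycles per bus state, four bus states per memory read):

```python
def bus_state_ns(clock_mhz):
    """Duration of one CP1600 bus state (two clock cycles), in ns."""
    return 2 * 1000.0 / clock_mhz

# Design speed vs. the Intellivision's actual clock
for mhz in (5.0, 1.79):
    state = bus_state_ns(mhz)
    read = 4 * state    # BAR, NACT, DTB, NACT
    print(f"{mhz} MHz: {state:.0f} ns per bus state, {read:.0f} ns per memory read")
```

At 5 MHz this gives the 400 ns bus state and 1.6 us read cycle cited above; at 1.79 MHz the same read cycle stretches to roughly 4.5 us.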

    As I mentioned in another post, although the CP1600/CP1610 can run just fine for a short while at its design speed of 5MHz, it gets hot rather quickly. Resistance goes up with temperature (the vibrating lattice slows down the electrons) and the instructions fail to execute properly. The only difference between the CP1600 and the CP1610 was that the former was packaged in ceramic and the latter in plastic. The ceramic was more expensive than the plastic but helped with the heat dissipation and was less likely to pull the leads off the die due to thermal expansion. In either case the processor could run faster if heat sinks and fans were incorporated into the system. The CPUs in the Intellivision III prototype systems used ceramic packages and were clocked at double speed with no additional cooling.

    Why does a CP1600 run hotter than an 8080? First order answer: it uses more transistors.

    The above is a first order discussion. There are obviously many more weeds and nuances, but my fingers are getting tired, so honor pricks me on.

    When the CP1600 is run at full speed the instruction set is pretty good. The extra two bits in the op code (as compared to the 8080/6800 families) lets you do quite a bit more with one instruction fetch. The 16-bit register-to-register instructions of a full-speed CP1600 take 2.4 us, which is good. While 10-bit instruction words are just fine for the opcodes, the 10-bit ROM width really slows down access to memory above 1K. That's not a fault of the processor, though: it's the fault of cheaping out and using 10-bit wide ROMs. You get that speed back if you use 16-bit wide ROMs to eliminate the need for using SDBD mode. [In terms 6502 aficionados can understand, that turns your whole 64K address space into zero page.] That's what was planned for Aphix. (Aphix was the code-name for the system that used Coffee/STIC 1b, the system that was announced to the trade as Intellivision III.)

 

    WJI


On 5/25/2016 at 2:19 AM, intvnut said:

Two things really kill the CP-1610 performance, FWIW:

  • Double-pumped 8-bit data path.

    Well, the limiting factor in ALU addition was propagating the carry. So if you have a double speed clock just sitting around you can use that to pipeline the two bytes of a word (or in the case of the Z80, the two nibbles of a byte) through the same half-width ALU in such a way that the second byte of the operands arrive at the ALU inputs just as the carry-out from the first byte becomes available. You thus achieve a significant savings in transistors with no loss in performance.
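The half-width-ALU trick described above can be modeled in a few lines. A toy sketch, purely illustrative of the data flow (a 16-bit add done as two 8-bit adds, with the low byte's carry-out feeding the high byte on the next double-speed clock phase):

```python
def add16_via_8bit_alu(a, b):
    """Add two 16-bit values through a modeled 8-bit ALU, low byte first."""
    lo = (a & 0xFF) + (b & 0xFF)       # phase 1: low bytes through the ALU
    carry = lo >> 8                    # carry-out becomes available...
    hi = (a >> 8) + (b >> 8) + carry   # phase 2: ...just as the high bytes arrive
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

assert add16_via_8bit_alu(0x00FF, 0x0001) == 0x0100   # carry ripples up
assert add16_via_8bit_alu(0x1234, 0x4321) == 0x5555
assert add16_via_8bit_alu(0xFFFF, 0x0001) == 0x0000   # 16-bit wraparound
```

Both bytes pass through the same half-width adder, which is exactly the transistor saving being described.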

 

    WJI


On 5/25/2016 at 2:19 AM, intvnut said:

Two things really kill the CP-1610 performance ... Multiplexed address/data bus. Every access to the outside world is two bus cycles (each with a NACT).

Those two together give a factor of 4 slowdown on nearly everything. The only bus cycles that aren't followed by a NACT are during SDBD reads.

    Nope. The multiplexed address/data bus doesn't slow anything down.

    The NACTs following BARs don't cost anything because the memory chips have already latched the address and the selected chip is using that time to access its internal array cells. It won't be ready with the accessed data until the DTB cycle comes around.

    As for the NACTs following DTBs: the CP1600 uses those to do processing the 6502 does in the memory cycle it throws away as described in my preceding post. (Additional weeds here.)

    You can see that 8080/6800 family processors don't use their data bus during the first half of a memory cycle, and that their address bus holds the same value during the second half of a memory cycle as it did during the first, so it isn't conveying any new information during that time. What the separate buses do is simplify the overall system by eliminating the need for external address latches. They also simplify the internal chip circuitry, which mattered in the days when CPUs were implemented with on the order of 4000 transistors.

    If you harbor doubts, look at how both the 8085 and the 8051 dealt with their package-pin limitations by multiplexing the data and low address byte on the same 8 lines with no slowdown. The downside was that both parts needed external address latches.

 

    WJI


On 5/25/2016 at 5:08 AM, decle said:

However, the CP1610 is one of several versions in the CP16x0 range.

    You (decle) know this by now, but for others who happen upon this rather authoritative thread while perusing this forum: The CP1600 and the CP1610 use exactly the same die. The only difference is that the former is packaged in ceramic, the latter in plastic. These were the only versions available prior to 1983.

    There is, however, another. In 1983 GI completed the design of a functionally identical 5-volt-only part as part of a cost-reduction redesign. This part is designated the CP1610A. It fit in a 28-pin package because it only needed a single-phase clock, eliminated the extra supply pins, and omitted quite a few of the other signals on the die that the Intellivision didn't use (EBCx, TCI, STPST, PCIT, HALT). The part was slated for use in the Intellivision II as a running change once the inventory of existing processors was exhausted; Mattel closed before that happened. It was also slated for use in the Intellivision III, where it could be clocked at double speed when running new cartridges, but that project was also canceled. Valeski finally put it into service in 1987 when he ran out of his inventory of original processors; you can see it in the images of INTV 1988 circuit boards floating around this forum.

[image attachment]

    So there are exactly two versions of the die, designed 8 years apart. No more.

 

    WJI


On 5/25/2016 at 5:08 AM, decle said:

... which suggest that some versions can be clocked as fast as 5MHz, which would imply machine cycle rates as high as 2.5MHz. This would probably bring the CP1600 into the same ballpark as the 6502 and Z80 in terms of real world performance. I still don't think it would match these CPUs on instruction throughput, but the greater flexibility of its architecture stands a chance of making up the difference.

    Are you kidding? A full speed CP1600 with 16-bit wide memory is more than competitive with the contemporary 8-bitters in applications requiring data types larger than 8 bits. Like handling BACKTAB entries, object x- and y-positions, or pixel patterns that are 16 bits wide (for wider or multi-color cards). Where it comes up short is in string applications that pack two 8-bit characters into a 16-bit word. Where it really outshines them is that you could actually write a tolerable C compiler for it.

 

    WJI


On 5/25/2016 at 5:11 AM, decle said:

Hilarious, I had assumed the 8bit ALU was an error on the datasheet. And I did not know that the Z80 had a 4 bit ALU, learn something new every day. ... Even better, chatting with a colleague about this little Z80 factoid, he said he had always wondered how they did the half carry. Now it makes perfect sense.

    But the Z80's predecessor, the 8080, had an 8-bit ALU, so how did it get its half-carry flag and why did it bother? (If you google it to check, Intel called it the Auxiliary Carry flag.) Out! Out, damn facts! And just when everything seemed like it was starting to make perfect sense…
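For readers wondering what the half-carry is actually for: it exists to support BCD (decimal) adjustment, the 8080's DAA instruction. A minimal sketch of the idea, not the 8080's exact flag semantics:

```python
def bcd_add(a, b):
    """Add two packed-BCD bytes, adjusting each decimal digit.

    The half-carry records whether the low nibble overflowed past 9,
    which is exactly the information DAA needs.
    """
    raw = a + b
    half_carry = ((a & 0x0F) + (b & 0x0F)) > 0x0F
    if half_carry or (raw & 0x0F) > 9:
        raw += 0x06    # adjust the low BCD digit
    if raw > 0x99:
        raw += 0x60    # adjust the high BCD digit
    return raw & 0xFF

assert bcd_add(0x19, 0x28) == 0x47   # 19 + 28 = 47 in decimal
assert bcd_add(0x09, 0x01) == 0x10   #  9 +  1 = 10 in decimal
```

The punchline of the post stands: an 8-bit ALU like the 8080's computes the low-nibble overflow as a flag rather than getting it for free from a 4-bit pipeline, but it needs the information either way.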

 

    WJI

 

    "To BCD or not BCD, that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous bases, or to take arms against a sea of conversions, and, with opposing thumbs, end them?"


On 5/25/2016 at 1:48 AM, intvnut said:

That works out to an average current of 20mA during the swing. The peak current would be higher, naturally. That's a pretty significant switching current though.

    Ah, but it's only for 15 nanoseconds. Just how many picocoulombs of charge does that add up to? Well, at least you now know why there are ferrite beads on the transistor leads.
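The rhetorical question has a one-line answer. A quick sketch using the figures quoted above (20 mA average over a 15 ns edge):

```python
I_AVG = 20e-3    # average current during the swing, amperes
T_EDGE = 15e-9   # duration of the edge, seconds

charge_pC = I_AVG * T_EDGE * 1e12   # coulombs -> picocoulombs
print(f"charge per edge ≈ {charge_pC:.0f} pC")
```

That is, about 300 pC per edge: a lot of instantaneous current, but very little total charge, which is why ferrite beads suffice to tame the resulting transients.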

 

    WJI


On 5/25/2016 at 8:34 AM, decle said:

OK, follow up question. Parking for a moment the potential problems with R6 and R7, do you think the CP1610 would have been a better CPU (whatever that means) if GI had gone for 8 bit registers and and an 8 bit ALU? So just considering smaller registers vs faster machine cycle rate.

    As you intimate, better for what? The CP1600 was a joint product designed by Honeywell's Process Control Division and GI's Microelectronics group. Whereas the practice had been to bring the wires for every sensor and control in a plant into a single large room, like this,

[image attachment]

    Honeywell thought the way of the future was to distribute control processors around the plant and to use video terminals to communicate with those processors over a network. It started using PDP-11s for that purpose, their 16-bit data words being a good match for plant instrumentation such as analog-to-digital converters. But PDP-11s were a bit pricey, so Honeywell had the idea of putting the essence of a PDP-11 on a chip and building its own controllers around that chip. Programmability was far more important than code size, and the relatively easy-to-program CP1600 that resulted was absolutely perfect for this purpose. Although it was designed to be clocked at 5 MHz, speed was not all that critical and they usually ran it at reduced speed to simplify cooling. Honeywell was more than willing to pay a hundred dollars apiece in 1975.

    Unlike industrial process control, consumer electronics is extremely sensitive to component prices. GI, in the person of Duncan Harrower, accidentally happened upon the video game business and pretty much owned it for a year. It was obvious to Maine and Harrower that their next generation offering should involve a microprocessor—where to get one cheap? "Ah," they said to themselves, and to Ed Sack and Lew Solomon, "GI has this CP1600 that's being produced on our fully amortized process line. Let's use that. We can design a chipset around it that can be sold for under twenty bucks." And so was born the 8800 GIC and the Unisonic Champion 2711. The CP1600 was overkill for that, but the price was right.

    The follow-on STIC chipset was designed to be a match to a CP1600 running at 1.79 MHz. The two are conjoined: you cannot speed up one without speeding up the other. And speeding up the STIC side of things was problematic: in what universe was it a good idea to use an MOS chip as a gateway to access the STIC registers or graphics memory? Answer: in a universe in which you are using your old fab lines to make a chipset that you can sell for 30 bucks. [National used a newer line; that's one of the reasons it wanted a higher price.]

    Because the STIC chipset was designed for use with the CP1600 it implemented 16-bit features like BACKTAB and STIC registers. Well, 14-bit, but you get the idea.

    So to get back to your question: GI wouldn't have gone for an 8-bit processor as described in your counter-factual scenario because GI didn't have such a processor in its portfolio at the time. The only reason it had the CP1600 was because Honeywell had paid for its development. And the reason Ed Sack and Lew Solomon reluctantly gave Maine and Harrower a few dollars to finance the development of the STIC chipset was because there was an outside chance it would let them sell large volumes of CP1600s to a small number of well-heeled companies. It isn't a matter of your desire; you go to market with the chips that you have, not the chips you might want or at a later time wish you had had.

    Slow as the resulting system was, it seems to me to have been adequate for the first round of Intellivision games. In evaluating the first few games you have to take into account that development wasn't started until May/June of 1978, that the games had to fit in 4K, and that they had to be done by September to be on store shelves in time for Christmas. Thanks to the Exec and APh's summer program, the first round of games was ready on time (Football was a little late). Unfortunately, the chipset wasn't.

    According to secondary sources (meaning I can't really vouch for this and so you should verify it before repeating), the APh crew had close ties with Carver Mead and was quite aware of trends in VLSI design and the speed at which change was occurring. APh predicted that the chipset would only have two years of life before competition became a problem.

[image attachment]

    Accordingly, even before software development began, APh was advising Chang to immediately begin development of an upgraded, backward-compatible system using more current process technologies and doubling the clock speed of the CPU.

    In 1981 Maine proposed a CP16000 microprocessor with a 100ns cycle time, so GI's thinking was moving forward too.

    In 1982 GI began laying out the 5-volt-only but otherwise functionally identical CP1610A as part of the Big Mac/Intellivision II cost-reduction effort. That device could easily run at 5 MHz without a heat sink, and probably faster, although there was no requirement at the time to qualify the part for greater speed. If you really want to know how fast it can actually run you can obtain one by cannibalizing one of your INTV 88 boards.

    The importance of backward compatibility can't be overemphasized: it really was a Holy Grail. Backward compatibility was a very important reason Coffee/Aphix/Intellivision III was approved. (When used internally, names like Intellivision II, Intellivision III and Intellivision IV just meant the next generation system and so kept changing meaning. If Chandler had had his way, Decade would have become Intellivision III. So it's more precise to use Aphix/Decade rather than Intellivision III/IV. Don't use MAGIC as a system name; that was the name for the graphics chip in the Decade system.) To add to the confusion, Prodromou would subsequently invent his own code names like Coffee and Big Mac. Aphix was slated to use the CP1610A, which could be set to be clocked at double speed (3.56 MHz) when executing new programs. Overall performance could be further improved by using 16-bit wide ROMs.

    During the course of developing Aphix, APh realized that it could easily do yet another generation of backward compatible upgrades and the design team transitioned to designing Aphix II immediately upon delivering the reference Aphix emulator to Toshiba for layout at the end of January 1983. As far as the CPU was concerned that meant doubling its clock speed again, to 7.16 MHz, expanding its 10-bit opcode to address 16 registers, adding a couple of addressing modes and adding a few bits to the program counter. By keeping other changes to a minimum the resulting CP16000 processor was reduced to a straightforward modification of the existing design that could be completed in a very short time at very low risk and would both significantly outperform the 68000 in video game applications and have a much smaller die size. If you're noticing that the resulting processor is starting to sound a lot like the Thumb mode of the ARM processor you wouldn't be far wrong. The expanded op code set and other operational characteristics had been specified and a team was being assembled to lay it out when Morris halted all hardware development.

    The CP16000 would not have been the end of the line. Even after the abovementioned enhancements, over half of the opcode space was still available for further expansion.

 

    WJI


On 5/25/2016 at 8:56 AM, intvnut said:

The CPU has a 0ns setup time relative to TS3 and 10ns hold time relative to the fall of TS3, so the RA-3-9600 just meets timing in a PAL system.

    A coincidence? I think not. It is the specified performance of the RA-3-9600 that just meets the required timing. The RA-3-9600 was designed to provide whatever functionality was required, and if there happened to be extra margin there was no point in committing to it in the datasheet. Had you asked Harrower or Dunn to do so they would have looked at you funny and muttered, "Managers. Clueless as usual."

 

    WJI


On 5/25/2016 at 1:37 PM, intvnut said:

Going back to 8 bit operations I think may allow eliminating a TS state most of the time, getting you closer to the Z80 T-state count. Have a look at the CP1600 pipeline:

[CP1600 pipeline diagram]

I suspect the "Read Registers" phase just reads the low halves of the register. You can probably pull "Start Processing Next Data Bytes" one cycle earlier, or maybe two. One seems doable.

    If you carefully analyze this diagram you can see that it's a great illustration of two facts: the transistors inside all of the contemporary microprocessors switched at about the same rate, but multiphase clocking let you cleverly synchronize the internal operation; and getting signals on and off the chip ("Input Data Valid", "Output Buffers Propagate") took much longer than internal processing.

 

    WJI


On 5/25/2016 at 1:37 PM, intvnut said:

You can probably pull "Start Processing Next Data Bytes" one cycle earlier, or maybe two.

    I don't think that means what you think it means. Each microcode cycle takes four TS states, so each instruction has to be a multiple of four TS states. Even though it takes 8 TS states to complete an ALU cycle on 16 bits, data flow is pipelined so that you can start a new ALU cycle every 4 TS states.
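The latency-versus-throughput distinction above can be made concrete. A hedged sketch using only the figures in the post (8 TS states of latency per 16-bit ALU operation, a new operation issued every 4 TS states):

```python
def ts_states(n_ops, issue=4, latency=8):
    """TS states for n_ops back-to-back pipelined ALU operations."""
    return (n_ops - 1) * issue + latency

assert ts_states(1) == 8    # a single op pays the full latency
assert ts_states(4) == 20   # pipelined: far fewer than 4 * 8 = 32
```

So even though each 16-bit result takes 8 TS states to emerge, sustained throughput is one result every 4 TS states, which is why instructions only need to be multiples of four TS states.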

 

    WJI


On 5/25/2016 at 1:37 PM, intvnut said:

If you could get 8-bit instructions in addition to the 16-bit instructions, with the 8 bit instructions going 1 TS state faster, that would give a nice speedup on 8-bit values, but nothing dramatic.

    Every one of the 1024 possible op-codes was spoken for. A few combinations were admittedly useless or duplicative—I doubt CMP R7, R7 got much use—but it would have been difficult to decode those slots for other use. Adding a set of 8-bit instructions would have required increasing the width of the instruction word and adding a significant number of additional transistors.

 

    WJI


On 5/25/2016 at 1:37 PM, intvnut said:

If you could get 8-bit instructions in addition to the 16-bit instructions, with the 8 bit instructions going 1 TS state faster, that would give a nice speedup on 8-bit values, but nothing dramatic.

    If you're interested in saving cycles here and there, I draw your attention to the CP-1600 direct addressing instructions, e.g. "ADD R3, foo", and compare them with comparable 8-bit processor instructions, like the 6502's "ADC foo", on comparably clocked systems. On contemporary 8-bit processors (8080/Z80/6800/6502) the address "foo" must be brought into the CPU on the data bus on one cycle and then output on the address bus on the next. The CP1600 uses the "Addressed Data to Address Register" bus state to skip the cycle that brings the address into the CPU.
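A hedged tally of that saving: with ADAR, the operand address read from memory is routed straight back out as the next address instead of being brought into the CPU and re-emitted later. The bus-state sequences below are illustrative of the mechanism, not cycle-exact transcriptions from the databook:

```python
# Operand fetch using ADAR (address word routed directly to address register)
with_adar = ["BAR", "NACT", "ADAR", "NACT", "DTB", "NACT"]

# Hypothetical equivalent without ADAR: read the address word into the CPU,
# then re-emit it as an address and read the data
without_adar = ["BAR", "NACT", "DTB", "NACT",
                "BAR", "NACT", "DTB", "NACT"]

print(f"ADAR saves {len(without_adar) - len(with_adar)} bus states per access")
```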

 

    WJI


On 5/25/2016 at 1:37 PM, intvnut said:

What I'm saying is that the machine is fairly balanced as it is. If you could get 8-bit instructions in addition to the 16-bit instructions, with the 8 bit instructions going 1 TS state faster, that would give a nice speedup on 8-bit values, but nothing dramatic.

    The realities of fitting an uncomfortable number of pounds in the sack and the exigencies of completing the project led to the trading of silicon for microcycles. The CP16000, designed two process iterations later, reclaimed those microcycles.

 

    WJI

