
VDP overrun revisited...


marc.hull


Found it. On Thierry's tech pages, specifically the one about removing the wait states, he says:

 

You may remember from the introduction that multiplexing occurs for the whole memory range except >0000-1FFF and >8000-83FF. Which means that it does occur when accessing the memory mapped devices: the sound chip (>8400-87FF), the VDP (>8800-8FFF), the speech synthesizer (>9000-97FF) and the GROMs (>9800-9FFF). This might seem strange since all these devices have 8-bit ports and could thus be hooked to half the 16-bit bus, upstream of the multiplexer. This is indeed the case for the VDP, yet multiplexing still occurs when accessing the VDP: a special circuit prevents the multiplexer from answering when data is read from the VDP.

 

So why multiplex these addresses? Most probably for timing reasons. These devices, especially the GROMs, are very slow and cannot answer within a regular 2-clock-cycle memory operation. So circuitry was added to the console so that a GROM access automatically triggers the SysRdy line and puts the CPU on hold until the GROMs are ready to answer, which can take over 24 clock cycles!

 

So, there you go. Even if you have high-speed code in the scratch pad RAM, you are still going to trigger the wait states for the MOVB.
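
For reference, the access in question looks something like this - a minimal single-byte VDP write in the usual style (the port equates are the standard ones; the routine name and register usage are just for illustration):

VDPWA  EQU  >8C02            VDP write-address port
VDPWD  EQU  >8C00            VDP write-data port

* Write one byte to VDP RAM: R0 = VDP address, R1 = data byte in its MSB
VSBW   ORI  R0,>4000         set address bits 14-15 to 01 = VRAM write (clobbers R0)
       SWPB R0
       MOVB R0,@VDPWA        send the low byte of the address
       SWPB R0
       MOVB R0,@VDPWA        send the high byte, write bit set
       MOVB R1,@VDPWD        this MOVB still picks up the wait states
       B    *R11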

 

Matthew



All right just to put this to bed once and for all.... I will put the code into scratchpad and run it at normal speed and see what happens. Any bets ??? ;-) Give me a couple hours....


Yes, it can overrun if you set the address then immediately read the data port, but only when running with no wait states (it doesn't matter whether that's scratchpad or a modified console). It won't happen every time, since you're only about 2uS too fast (out of 10, so roughly one in five such accesses could be expected to fail).

 

It will not overrun if you set the address, delay, then read as much data as you like. It's the delay between setting the address and reading the data that is easy to make too short.
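
In code terms it's just this (a sketch; VDPWA = >8C02 and VDPRD = >8800 assumed, the address in R0 assumed to have bits 14-15 clear, register choice arbitrary):

* Too fast - can return stale prefetch data on a no-wait-state machine
       SWPB R0
       MOVB R0,@VDPWA        low byte of the read address
       SWPB R0
       MOVB R0,@VDPWA        high byte (bits 14-15 = 00), prefetch starts
       MOVB @VDPRD,R1        reading immediately can beat the prefetch

* Safe - any instruction between the two accesses is enough of a delay
       SWPB R0
       MOVB R0,@VDPWA        low byte of the read address
       SWPB R0
       MOVB R0,@VDPWA        high byte, prefetch starts
       NOP                   the delay everyone keeps mentioning
       MOVB @VDPRD,R1        the prefetched byte is valid by now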

 

It should not overrun on writes thanks to the wait states combined with the read-before-write.

 

The stuff we saw with the logic analyzer exactly matches what we saw in the datasheet and the known TI-99 architecture; there were no surprises there. It just took us a few tries to include all the different parts. :)

 

Running at 3.58MHz is fine by the numbers, too. That's about 20% faster on the CPU, with a cycle time of 0.28uS instead of 0.33uS. So 10uS is about 35.8 CPU cycles on the 3.58MHz machine versus 30 CPU cycles on the 3.0MHz machine. The 5-cycle difference is less than one instruction, so you'll probably be fine most of the time.


  • 6 years later...

Now I've not really used my TI-99/4A in recent years, but I know that during the "golden years", when it was used every day or so, the game Tennis was the only one that definitely didn't work when running on my console, which has 64 K RAM on the 16-bit bus without additional wait states.

The players, obviously made up of two sprites placed adjacent to each other, would split in half, with the legs running one way and the body floating away in the other direction.

 

My memory modification allows me to simply disable it. When I do, the normal 32 K RAM expansion becomes visible, if present. Run like that, Tennis plays as it should.

But I also added the ability, in hardware, to run the 16-bit-wide RAM while enabling a wait state only when a VDP access is decoded. That way, even code that leaves out the NOP (or other time delay) between VDP access instructions runs properly. But the game Tennis becomes virtually unbeatable...

 

Enabling/disabling of memory and the hardware wait state generation is done via I/O bits (CRU, base address >400), so it can be done under software control.
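
To give the flavour, the software side is just a couple of CRU bit operations (the bit numbers below are made up for illustration; the real assignments are whatever the decode logic in the modification uses):

       LI   R12,>0400        CRU base address of the modification
       SBZ  0                e.g. bit 0: disable the fast 16-bit RAM
       SBO  1                e.g. bit 1: enable the VDP-read wait state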


I want to make sure I understand this. There is no need for any delay in sending the address to VDPWA at >8C02; there is no way to send the two-byte address too fast. There needs to be a one-instruction delay when reading from VDPRD at >8800 or writing to VDPWD at >8C00, but only when reading/writing the first byte. After the delay on the first byte there is no need for any additional delays when reading/writing. Correct?

Any need for a delay when sending a new address to VDPWA after reading/writing to VDP?


As far as I remember, it's the gap between writing the high byte to VDPWA and reading the first byte from VDPRD that you can't do in two instructions immediately after each other.

When I installed my 16-bit memory expansion, I augmented it with a hardware wait state generation to fix this issue, and that one triggers only on the VDP read data chip select signal in the console. If I run software that doesn't work in fast RAM (like the game Tennis), and then enable this hardware wait state, it works. But it does of course delay every read, not only the first one, so it's less intelligent than correctly written software.

 

But as far as I know, you don't need to delay the write, just the read. I don't have time to check the VDP data book right now, though. However, I see now that a few posts up from here, it says that the write works without delay, since the CPU does a read before it writes, and that wastes enough time.


The reason for the delay is that the VDP does a prefetch of the read data. So after you set up a new read address, the VDP prefetches the data that will be returned when you do an actual read.

 

This is also why the VDP distinguishes between setting up a read address and a write address. Both operations set the VDP's address register; however, setting a read address causes the prefetch and increments the address register, while setting up a write address inhibits the prefetch.

 

So it is not an "overrun" per se; rather, you get the wrong data back when you read too soon after setting up the read address. There is also nothing that stops you from setting up a read address and then writing. Your data will not go to the address you set, though, since the address register was incremented by the prefetch that follows the read-address setup.
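
In instruction terms (a sketch with the usual equates; R0 holds a 14-bit VRAM address with the top two bits clear):

* Read-address setup: top two bits 00, triggers the prefetch
       SWPB R0
       MOVB R0,@VDPWA        low byte
       SWPB R0
       MOVB R0,@VDPWA        high byte as-is = read address

* Write-address setup: OR in >4000 so the top two bits are 01, no prefetch
       ORI  R0,>4000
       SWPB R0
       MOVB R0,@VDPWA        low byte
       SWPB R0
       MOVB R0,@VDPWA        high byte with the write bit set
* Write to VDPWD after a READ setup instead, and the byte lands one
* address further on, as described above.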

 

There is also the situation where you interleave writing and reading. I think writing data also inhibits the prefetch, so if you write, then read, you will not get the data expected.

 

I'm sure this has all been explained before, probably in this thread (I'll have to go check). IIRC, Tursi has done extensive testing and characterization of all of this. I had to make sure to get it all right in the F18A too, but it amazes me how fast I forget these kinds of details.


As Matt noted, it's all due to the DRAM access, which can only happen at set times during the VDP's internal state machine. When you read or write, you are just accessing a single internal register in the VDP (the "prefetch buffer"). The VDP then asynchronously either fetches data into that buffer or stores data from that buffer -- it's possible to change the buffer more than once before the VDP does the operation because of that.

 

I did some specific testing on the ColecoVision, since it is fast enough to trivially overrun the VDP, and as far as I've been able to tell there is no need for delays during register access -- this means both the address register and the VDP write-only registers.

 

Likewise, if you set an address and write a byte, I don't believe you need to delay between those operations (although I've not hammered this case very hard yet - the datasheet suggests you should).

 

Delays are required between setting an address and reading a byte, and between each byte written or read (to/from VRAM): 8uS in normal use, as that's the time between memory access windows. In text mode it's only 6uS, and if the screen is blanked (or during vertical blank) no delay is required.

 

The architecture of the TI-99/4A is such that 8uS is shorter than most operations, meaning the VDP can usually keep pace with the CPU. But there are some gotchas:

 

If you set an address then try to read back immediately, the time between the write (which occurs near the end of the instruction) and the read (which occurs near the beginning) is too short, even though the whole instruction timing seems long enough. So this can overrun - meaning the VDP hasn't had a chance to fetch the data yet and you get whatever was already in the buffer. But it's tight, so "sometimes" it will work. It's unpredictable, though, as you can't synchronize with the VDP's internal state machine.

 

There are some really obscure sequences that can overrun the VDP even from 16-bit RAM - if you're coding a clever bulk erase or something, do a cycle count and make sure you're safe. (Mind you, if it's a bulk erase anyway, just blank the screen first and you can write as fast as you can dream up. ;) )
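
For the bulk-erase case, something along these lines (a sketch only: usual equates, graphics mode with VR1 normally >E0, screen image table assumed at VRAM >0000, and interrupts already off with LIMI 0 so the console ISR doesn't poke the VDP mid-sequence):

* Blank the display: VDP register 1 = >A0 (display-enable bit cleared)
       LI   R1,>A000
       MOVB R1,@VDPWA        register value first
       LI   R1,>8100
       MOVB R1,@VDPWA        then >80 + register number

* Point at VRAM >0000 for writing and fill the 768-byte screen table
       CLR  R1
       MOVB R1,@VDPWA        address low byte
       LI   R1,>4000
       MOVB R1,@VDPWA        address high byte, write mode
       LI   R1,>2000         fill value: space character >20 in the MSB
       LI   R2,768
FILL   MOVB R1,@VDPWD        no delay needed while the screen is blanked
       DEC  R2
       JNE  FILL

* Display back on: VDP register 1 = >E0 again
       LI   R1,>E000
       MOVB R1,@VDPWA
       LI   R1,>8100
       MOVB R1,@VDPWA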

 

To Matt's comment about interleaving reading and writing (something I've used in the past ;) ), I've verified that the data written goes into the prefetch buffer, so if you immediately read, you'll get the same data back.
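
In other words (sketch):

       MOVB R1,@VDPWD        queue a byte to be written at the current address
       MOVB @VDPRD,R2        read straight back: R2 gets that same byte from the buffer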

 

 

 

Any need for a delay when sending a new address to VDPWA after reading/writing to VDP?

 

I've not explicitly tested this case, but in the case of a read, I'd say no, you have already completed the operation. After a write I would say yes, because you can change the address before it actually writes the byte to VRAM. The VDP doesn't double-buffer its registers; the CPU gets one address register, one data register, and a couple of bits to indicate whether a read or write needs to happen when the access slot comes around.

 

Just think of the VDP as free-wheeling in its state machine. "display, display, display, memory access, display, display, display, memory access" etc. When it hits the memory access step, it just checks whether a read or write is needed. If it's a read, it fetches the byte in the address register and stores it in the data register. If it's a write, it writes the byte from the data register to the address in the address register. In either case it increments the address register. (If neither is required, I don't actually know what the hardware does, but from the CPU's point of view nothing happens.)

 

There's no locking or protection on those registers and the CPU can change them at any time, so the delay is just there to ensure you've waited long enough that the VDP must have gotten a memory access cycle.

