How long...?

Willsy · April 23, 2012

How many uS does this code take to execute, assuming registers in 16-bit RAM and code in 8-bit RAM?

LI R0,20     (20)
DEC R0   (14)
JNE $-2    (14)

Any takers?

The numbers in brackets are the values displayed in brackets by the classic99 debugger.

Using the T option in classic99, beginning at the address of the LI instruction, and ending at the line after the JNE, I get the following output:

Timer: 578 CPU cycles - Min: 34  Max: 578  Average(4): 170
Timer: 578 CPU cycles - Min: 34  Max: 578  Average(5): 251
Timer: 578 CPU cycles - Min: 34  Max: 578  Average(6): 306

So, the routine takes 578 CPU cycles. I think the MIN figure is the number of cycles assuming the code doesn't loop. The max is the actual number of cycles accrued during the loop. I don't understand the average figure at all!

Anyway, how do I convert that 578 cycles to uS? Is one cycle 3.333nS? I can't remember! Don't we need 4 cycles (13.332nS) to clock an instruction on the 9900?

The whole thing gets rapidly complicated to a luddite like me!

Willsy · April 23, 2012

Oh, hang on... A cycle is 333.333nS isn't it? Not 3.333!

So: 578 x 333.333 = 192,666nS. Convert to uS (move the decimal 3 places left): 192.666uS.

I think that's right, eh?
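As a quick check of that conversion, here is a minimal Python sketch. It assumes the 3 MHz clock discussed in this thread (one cycle = 333.333nS); the function name `cycles_to_us` is just an illustrative helper, not anything from classic99.

```python
CLOCK_HZ = 3_000_000  # assumed 99/4A CPU clock: 3 MHz, i.e. 333.333 ns per cycle


def cycles_to_us(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a CPU cycle count to microseconds."""
    return cycles * 1_000_000 / clock_hz


# The 578 cycles reported by the classic99 timer
print(round(cycles_to_us(578), 3))
```

Swapping in a different cycle count from the timer output gives the corresponding time in the same way.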

matthew180 · April 23, 2012

A000 0200 LI R0,20	* time: 12c, memory: 3 (fetch 8-bit, iop 8-bit, store 16-bit)
A002 0014
A004 0600 DEC R0	* time: 10c, memory: 3 (fetch 8-bit, read 16-bit, store 16-bit)
A006 16FE JNE $-2	* time: 10c, memory: 1 (fetch 8-bit, PC changed)

c = clock cycles = 333ns (0.333us) on the 99/4A. Every time you access 8-bit memory it will add 4c to the time, or 1.332us. If the code is in 8-bit RAM, then the fetch will cause a wait, immediate ops cause a wait, and any source or destination in 8-bit memory will cause a wait.

You have a total of 32c = 10.656us if all RAM was 16-bit. You had four 8-bit memory accesses, so you have to add the wait-states: 4 * 1.332us = 5.328us, and 10.656us + 5.328us = 15.984us to execute all 3 instructions once.
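The same bookkeeping can be sketched in a few lines of Python. The base cycle counts and the number of 8-bit accesses come from the listing above; the table layout and the name `total_cycles` are only illustrative, and the 333ns cycle is the rounded figure used in this post.

```python
CYCLE_NS = 333    # one clock cycle in ns (rounded figure used in the post)
WAIT_CYCLES = 4   # extra cycles per 8-bit memory access on the 99/4A

# (mnemonic, base cycles, number of accesses to 8-bit memory)
INSTRUCTIONS = [
    ("LI R0,20", 12, 2),  # instruction fetch + immediate operand
    ("DEC R0", 10, 1),    # instruction fetch only, R0 is in 16-bit RAM
    ("JNE $-2", 10, 1),   # instruction fetch only
]


def total_cycles(instructions) -> int:
    """Sum base cycles plus wait-state cycles for one pass through the code."""
    return sum(base + WAIT_CYCLES * slow for _, base, slow in instructions)


cycles = total_cycles(INSTRUCTIONS)
print(cycles, "cycles =", cycles * CYCLE_NS / 1000, "us")
```

This covers a single pass through the three instructions only: 32 base cycles plus 16 wait-state cycles, 48 cycles in all.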

Edited April 23, 2012 by matthew180

matthew180 · April 23, 2012

A.A. is acting very strange today. I'm not sure if posts are getting through?

Willsy · April 23, 2012

Okay, thanks. But it's a loop, which repeats 20 times! I get the idea though. Just multiply the times of the last two instructions by 20, and add the time of the first instruction.

Thanks. Yes, AA does seem a little odd today.

matthew180 · April 23, 2012

The DEC and JNE take: 20c * 333ns = 6.66us. Add the wait states for the two 8-bit memory accesses: 2 * 1.332us = 2.664us, so 6.66us + 2.664us = 9.324us for one loop, and 9.324us * 20 = 186.48us. The LI takes 12c * 333ns = 3.996us plus two wait states (2.664us), which is 6.66us, so: 186.48us + 6.66us = 193.14us.

I'd say Classic99 is pretty spot-on.
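For anyone who wants to count cycles rather than microseconds, here is a rough Python sketch of the whole loop. The taken-jump and wait-state figures are the ones quoted in this thread. The 8 base cycles for the final JNE that falls through is an assumption on my part (it is the figure the TMS9900 data manual lists for a jump that does not change the PC), and the names are made up for illustration.

```python
WAIT = 4  # extra cycles per 8-bit memory access on the 99/4A

LI = 12 + 2 * WAIT             # instruction fetch + immediate operand in 8-bit RAM
DEC = 10 + 1 * WAIT            # instruction fetch in 8-bit RAM
JNE_TAKEN = 10 + 1 * WAIT      # jump taken (PC changed)
JNE_NOT_TAKEN = 8 + 1 * WAIT   # assumption: 8 base cycles when the jump falls through


def loop_cycles(iterations: int) -> int:
    """Cycles for LI followed by `iterations` passes of DEC/JNE."""
    body = iterations * DEC + (iterations - 1) * JNE_TAKEN + JNE_NOT_TAKEN
    return LI + body


print(loop_cycles(20))  # 578
```

Under those assumptions the count comes out at 578 cycles, the same figure the Classic99 timer reported in the first post. The 2-cycle difference from a flat 20 * 28 + 20 = 580 is the last JNE not being taken.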

Edited April 23, 2012 by matthew180

apersson850 · April 26, 2012

Don't we need 4 cycles (13.332nS) to clock an instruction on the 9900?

I think you are mixing this up with the fact that the TMS 9900 has a four-phase clock to keep track of its timing.

But the four phases of the clock are driven by a 12 MHz timing signal (actually a 48 MHz crystal, which is divided down to get the 12 MHz timing), so these four phases all occur with a frequency of 3 MHz, just skewed against each other and with a 25% duty cycle.

Thus one clock cycle is still only 0.333 us (one period at 3 MHz), even if there are four distinct pulses within that cycle.

It's the third phase that's used for timing of most other circuitry in the 99/4A.

matthew180 · April 26, 2012

Don't we need 4 cycles (13.332nS) to clock an instruction on the 9900?
I think you are mixing this up with the fact that the TMS 9900 has a four-phase clock to keep track of its timing.

When talking about instruction timing, a "clock cycle" always refers to the 333ns period as measured between any of the same phases, i.e. rising edge of Phi3 to rising edge of Phi3. The *99/4A*, not the 9900, requires four extra 333ns clock cycles for *any* memory access (including the VDP which sits on the CPU side of the multiplexer) other than the ROM or scratchpad. Thierry covers it in detail, as well as how to suppress the wait states for anyone who wants to hardware hack on their 99/4A.

http://nouspikel.group.shef.ac.uk/ti99/wait.htm

apersson850 · April 27, 2012

His description involves modifying the wait state generation logic, hoping that today's circuitry is fast enough to cope with that. It probably is, but it does introduce timing issues with software that relies on instruction execution timing to work.

In my console, I let the 16 to 8-bit multiplexing remain untouched, but put 64 K RAM inside the console. Half of that works as the 32 K RAM expansion, the other half isn't active (normally).

If I want to, I can page in the remaining part in 8 K chunks, to give RAM across the whole address range. But then there's no access to memory mapped devices, DSR, console ROM etc. It's still possible to map in RAM covering only DSR and cartridge space, for example.

Everything running in this memory runs on the 16-bit data bus. Yet another CRU bit can enable an additional wait state, which is needed for some software to access the VDP properly. If that software has omitted some of the NOPs which TI recommends, it fails when all memory suddenly is as fast as possible.

I can also disable the fast 32 K RAM expansion and run on the memory card in the PEB box instead. Thus I can get the normal timing if I want that, or use that memory as a paged expansion, so that I have access to two 32 K RAM expansions at the same time.

This is something I did a long time ago. Now there are several solutions to give the TI much more memory, but back then even circuits with static 32 K RAM in one package were quite a lot.

Tursi · May 1, 2012

So, the routine takes 578 CPU cycles. I think the MIN figure is the number of cycles assuming the code doesn't loop. The max is the actual number of cycles accrued during the loop. I don't understand the average figure at all!

Right on the cycle count. MIN is the minimum number of cycles timed, and MAX is the maximum number of cycles timed. The number in parentheses is how many times the timer has been triggered, and average is the average number of cycles over that many triggers. That's why it's increasing.

It's not clear from your example where '34' came from, but a quick look at the code shows that Classic99 doesn't provide a way to reset those min/max figures. If you played with any timers before running this one, the previous timer is probably interfering with your min/max/avg figures. I've added that to my list.
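To illustrate how running figures like these behave, here is a small Python model. It is not Classic99's code: the class `TimerStats` is hypothetical, the integer (truncated) average is an assumption, and the three earlier 34-cycle triggers are just one invented history that happens to reproduce the numbers in the first post.

```python
class TimerStats:
    """Hypothetical model of a timer that keeps min/max/average and is never reset."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.min = None
        self.max = None

    def record(self, cycles: int) -> str:
        """Add one timed run and return a status line in the debugger's style."""
        self.count += 1
        self.total += cycles
        self.min = cycles if self.min is None else min(self.min, cycles)
        self.max = cycles if self.max is None else max(self.max, cycles)
        average = self.total // self.count  # assumption: integer average
        return (f"Timer: {cycles} CPU cycles - Min: {self.min}  "
                f"Max: {self.max}  Average({self.count}): {average}")


stats = TimerStats()
for earlier in (34, 34, 34):  # invented earlier triggers, not from the thread
    stats.record(earlier)
for _ in range(3):
    print(stats.record(578))
```

Because nothing resets the totals, the stale 34-cycle runs keep the minimum at 34 and drag the average down, and each new 578-cycle run pulls the average a little closer to 578.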
