
Question about vertical retrace timing


orbitaldecay


Hello again everyone,

 

I have another question for you. I read at http://www.unige.ch/medecine/nouspikel/ti99/tms9918a.htm that there is a 4300 uS window to draw during the vertical refresh. 4300 uS / 0.333 = 12900 clock cycles. Now in the TMS 9900 data manual it states that a MOV instruction uses 14 clock cycles (not counting memory access --- see page 28). Say we are in bitmap mode (graphics II mode). To simply clear every byte in the pattern table in VRAM would take at least 6144 * 14 = 86016 clock cycles i.e. 7 frames! But all this implies that the maximum frame rate for animation in bitmap mode (not using sprites) is around 4 - 5 FPS (assuming the refresh rate is 30 FPS), which can't possibly be true. What am I misunderstanding?
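The arithmetic in the question can be reproduced with a short Python sketch. The constants are the figures quoted above (4300 µs window, 3 MHz clock, 14 cycles per MOV, 6144-byte pattern table); the function names are only illustrative, and the 14-cycle figure ignores memory access and VDP wait states, so the result is a lower bound rather than a measurement.

```python
import math

CPU_HZ = 3_000_000           # TMS9900 clock on the TI-99/4A
VBLANK_US = 4300             # vertical retrace window quoted above, in microseconds
MOV_CYCLES = 14              # base cost of MOV from the data manual, no memory access
PATTERN_TABLE_BYTES = 6144   # Graphics II pattern table size


def cycles_in_window(window_us: int, cpu_hz: int = CPU_HZ) -> int:
    """Clock cycles available in a window of the given length."""
    return window_us * cpu_hz // 1_000_000


def windows_needed(total_cycles: int, window_cycles: int) -> int:
    """Whole retrace windows needed to fit total_cycles of work."""
    return math.ceil(total_cycles / window_cycles)


vblank_cycles = cycles_in_window(VBLANK_US)
clear_cycles = PATTERN_TABLE_BYTES * MOV_CYCLES
print(vblank_cycles, clear_cycles, windows_needed(clear_cycles, vblank_cycles))
```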


The frame rate for the TMS9918A is 60 FPS.

 

...lee

 

Thank you for clearing that up, Lee. I also stumbled upon this http://bifi.msxnet.org/msxnet/tech/tmsposting.txt which confirms my suspicions about the number of clock cycles available during the vertical refresh period. It appears to be true that it is not possible to clear the entire screen during a single vertical refresh... an interesting restriction to work with! Let's see... what can I get done in 12900 clock cycles...


When writing to the screen, blank the screen (VDP register #1, bit 1=0) and disable interrupts. You then have all the time you need to write to the screen.
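For reference, here is a sketch of what that register change looks like at the byte level. A TMS9918A register is loaded by writing the value and then >80 OR'd with the register number to the VDP address port, and BLANK is the >40 bit of register 1 (bit 1 when counting from the MSB, as TI does). The starting value >E0 is only an assumed typical setting, and the function names are hypothetical.

```python
BLANK_BIT = 0x40  # register 1, bit 1 in TI numbering (MSB = bit 0); 0 = screen blanked


def register_write_bytes(reg: int, value: int) -> tuple[int, int]:
    """Bytes to send to the VDP address port, in order, to load one register."""
    if not 0 <= reg <= 7:
        raise ValueError("the TMS9918A has write registers 0-7")
    return (value & 0xFF, 0x80 | reg)


def blank(r1_value: int) -> int:
    """Register 1 value with the display blanked."""
    return r1_value & ~BLANK_BIT & 0xFF


def unblank(r1_value: int) -> int:
    """Register 1 value with the display enabled again."""
    return (r1_value | BLANK_BIT) & 0xFF


r1 = 0xE0  # assumed setting: 16K VRAM, display on, interrupts enabled
print([hex(b) for b in register_write_bytes(1, blank(r1))])
```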

 

...lee

 

Yeah, I was thinking about that, but I'm poking around with the idea of doing pixel animations and I'd like it to be as fluid as possible. It'll be a good challenge :-)


There are others who will be along shortly who are better versed than I in the necessary fine tuning here; but, you shouldn't need to always rewrite the entire screen for animation.

 

...lee


You can't (easily) overrun the VDP from the 9900 on a stock 4A, so there's no need to worry about blanking the screen or the safe write time. But your math is right - you won't get much better than 4-5 fps fully loading the bitmap screen every frame. My video playback code loads 50% of the bitmap screen and manages about 9 fps.


Got it. Thanks for the confirmation. I wonder what kind of black magic makes things like

 

https://www.youtube.com/watch?v=rKqqOcTFVm0

 

possible on a stock MSX, which also uses the TMS9918 and a comparable processor (Z80 @ 3.5 MHz).

Edited by orbitaldecay

The Z80 in the MSX can hit the VDP almost twice as hard as our 9900, but there's a LOT of tricks going on as well as brute force. :)

 

Yeah, I'm familiar with a lot of demo trickery, having been involved with it on and off over the years, but most of my experience has been on the x86 side of things. I'm just now getting back to the old-school stuff :cool:


Your best chance of updating the full bitmap screen in a reasonable frame rate is to draw the image in CPU memory and then copy everything to the VDP as fast as possible. Here is a demo (one of my first) that did just that. I later improved it a bit, but I can't find the thread now.

 

 

Here is another demo of soft sprites using the same technique:

 

http://atariage.com/forums/topic/237374-soft-sprites-on-9918amsx1/page-2?do=findComment&comment=3237482

 

Both of these demos are monochrome, which of course saves half of the memory and bandwidth. Once you start using color, speed is one problem, but you also need to synchronize the transfer of colors and patterns to the VDP in an order that avoids flickering, for instance by transferring 8 pixel rows of pattern, then 8 pixel rows of color, and so on.
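One way to picture that ordering is as a list of transfers. The sketch below assumes the common bitmap layout with the pattern table at >0000 and the color table at >2000 (6144 bytes each) and CPU-side buffers that mirror it; one character row is 32 cells times 8 bytes, so 256 bytes. The addresses and names are assumptions for illustration, not taken from the demos.

```python
PATTERN_BASE = 0x0000   # assumed VDP address of the pattern table
COLOR_BASE = 0x2000     # assumed VDP address of the color table
ROW_BYTES = 32 * 8      # one character row: 32 cells, 8 pixel rows each
ROWS = 24               # character rows on the bitmap screen


def interleaved_schedule():
    """Yield (table, vdp_address, length) so each row's colors follow its patterns."""
    for row in range(ROWS):
        offset = row * ROW_BYTES
        yield ("pattern", PATTERN_BASE + offset, ROW_BYTES)
        yield ("color", COLOR_BASE + offset, ROW_BYTES)


schedule = list(interleaved_schedule())
for table, address, length in schedule[:4]:
    print(f"{table:8s} >{address:04X} {length} bytes")
```

Keeping each chunk to one character row limits how long a row is displayed with new patterns but old colors.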

 

It is possible to double buffer the bottom two thirds of the bitmap screen if you can figure out how to use the same colors on the two thirds as in this demo:

 

 

Smooth scrolling techniques (like this) are not using full bitmap mode, but either graphics I mode or one of the hybrid modes.


Thanks so much for sharing source for the software sprites. They look very nice. Let me know if you find the thread or source for the vector graphics. I was thinking of doing something similar and would love to see how you chose to implement it.


I can easily post the lines demo source, just don't know where we discussed it. lines - 1.a99 I think is the one from the video. lines - 2.a99 is the new one.

lines - 1.a99

lines - 2.a99

LINES.dsk


I can safely explain the magic behind the spinning roulette.

It was done using some tools of mine based on vector quantization and on a custom version of Floyd–Steinberg dithering adapted to the color constraints of the screen 2 mode.

 

You can find some explanation and tests here

https://sites.google.com/site/ragozini/vdpenc2

and here

https://sites.google.com/site/ragozini/vdpenc2bis

 

The dithering algorithm in C is here

https://sites.google.com/site/ragozini/dithering

 

The encoder and decoder are unreleased. The encoder runs in Matlab; the decoder is trivial: it just unpacks frames of 32*24 tiles into RAM and transfers them to VRAM.

Edited by artrag

 

Is that because of the speed difference between the CPUs or because of the RAM wait states on the TI?

 

Yes. :)

 

I don't know the Z80 or MSX as well, so maybe artrag can fact check me here. :)

 

The Z80 in the MSX is clocked at 3.5xxxx MHz, and instructions take roughly 4 to 23 cycles (http://www.z80.info/z80time.txt). It has block I/O instructions that can move a byte to the VDP every 21 cycles (OTIR), and there are ways to go faster: unrolled OUTIs take 16 cycles each. Some of the faster opcodes are very useful things like add (although you do still have to get data in and out of the registers). 21 cycles at 3.5MHz means a flat copy can move a byte from a table in RAM every 6 microseconds (which is actually too fast for the VDP in most modes, where the limit is 8uS, but it works when the display is disabled or during vertical blank). 16 cycles is about 4.6 microseconds, but you have to do your own looping.
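As a quick check of those Z80 figures, the sketch below converts cycle counts to microseconds per byte. It assumes the usual MSX1 clock of 3.579545 MHz, which the post rounds to 3.5 MHz, and it uses the 21-cycle and 16-cycle costs quoted above; the names are illustrative.

```python
Z80_HZ = 3_579_545        # assumed MSX1 Z80 clock; the post rounds this to 3.5 MHz
VDP_ACTIVE_LIMIT_US = 8   # minimum spacing between bytes while the display is active


def us_per_byte(cycles: int, hz: int = Z80_HZ) -> float:
    """Microseconds spent per byte for a transfer costing `cycles` each."""
    return cycles * 1_000_000 / hz


for name, cycles in (("OTIR", 21), ("unrolled OUTI", 16)):
    t = us_per_byte(cycles)
    verdict = "safe any time" if t >= VDP_ACTIVE_LIMIT_US else "vblank or blanked display only"
    print(f"{name}: {t:.2f} us per byte, {verdict}")
```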

 

The 9900 is clocked at 3MHz flat and instructions take roughly 8-52 cycles (leaving out DIV). In addition, any write to the VDP on the TI-99/4A automatically costs 8 cycles more due to the wait states and the read-before-write. The more useful instructions like MOV and A are 14 cycles, so a MOV to the VDP starts at 22 cycles before any other access happens (like fetching the data you intend to move). If we argue that we can do a copy with a MOVB register indirect autoincrement, unroll the whole thing (MOVB *R1+,*R0), and run from scratchpad with the data there too, then we're looking at 14+6+4+(4+4)=32 cycles (and that doesn't include counting down or looping). 32 cycles at 3MHz means a byte every 10.7 microseconds, but the reality is we are probably at least copying from 8-bit memory, so add another 4 cycles for the read. 36 cycles is 12 microseconds. Odds are good that you'll run slightly slower in order to loop (but an unrolled loop at least minimizes that). If you're writing a static value (i.e. clearing the screen), you can lose one indirection and the increment, which saves 6 cycles (MOVB R1,*R0): 26 cycles is 8.7 microseconds, again with no looping.
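The cycle bookkeeping above can be written out as a small Python sketch. It only adds up the per-component costs quoted in the post (14 base, 6 for a byte autoincrement source, 4 for the indirect destination, 8 for the VDP penalty, 4 for an 8-bit RAM read). The best-case frame rate at the end assumes a full bitmap update of 12288 bytes with zero loop overhead, so it is an upper bound, not a measurement.

```python
TMS9900_HZ = 3_000_000   # TI-99/4A CPU clock
VDP_PENALTY = 8          # extra cycles per VDP write (wait states, read-before-write)
BITMAP_BYTES = 2 * 6144  # pattern table plus color table


def movb_cycles(src_autoincrement: bool, src_in_8bit_ram: bool) -> int:
    """Cycle count for one unrolled MOVB to the VDP, using the figures above."""
    cycles = 14 + 4 + VDP_PENALTY   # base MOVB, *R0 destination, VDP penalty
    if src_autoincrement:
        cycles += 6                 # *R1+ byte source
    if src_in_8bit_ram:
        cycles += 4                 # source read over the 8-bit bus
    return cycles


def microseconds(cycles: int) -> float:
    return cycles * 1_000_000 / TMS9900_HZ


copy = movb_cycles(True, True)     # MOVB *R1+,*R0 with the source in 8-bit RAM
fill = movb_cycles(False, False)   # MOVB R1,*R0 with a static value
best_case_fps = TMS9900_HZ / (copy * BITMAP_BYTES)
print(copy, fill, round(microseconds(copy), 1), round(best_case_fps, 1))
```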


As an aside, the multiplexer delay is frustrating, but it's still the fastest operation the CPU deals with. It may seem tempting to remove wait states by pulling down words instead of bytes, but even in scratchpad the extra instructions just eat up much more time:

 

MOVB *R1+,*R0   (36)
MOVB *R1+,*R0   (36 - total 72 cycles)

MOV  *R1+,R2    (22)
MOVB R2,*R0     (26)
SWPB R2         (10)
MOVB R2,*R0     (26 - total 84 cycles)

 

Or the ever-popular direct access to the LSB. Let's say R3 points to the LSB of R2, since a register indirect access is faster than a direct memory reference:

MOV  *R1+,R2    (22)
MOVB R2,*R0     (26)
MOVB *R3,*R0    (32 - total 80 cycles)

 

So it's there, and it's frustrating sometimes, but don't work too hard to work around it. If the data doesn't fit in scratchpad, nothing is likely to be faster than simply accessing it directly. Most of the time the best optimizations on the 9900 just come from reducing the instruction count, because the base cost is so high.
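To make the comparison concrete, the three sequences can be totalled in a few lines of Python. The cycle counts are the ones listed above; the first instruction of the third sequence is taken to be a word MOV, which is what the 22-cycle figure and the LSB trick imply. The labels are only descriptive.

```python
# Cycle counts per instruction as quoted in the listings above.
SEQUENCES = {
    "two byte copies": [
        ("MOVB *R1+,*R0", 36),
        ("MOVB *R1+,*R0", 36),
    ],
    "word fetch + SWPB": [
        ("MOV *R1+,R2", 22),
        ("MOVB R2,*R0", 26),
        ("SWPB R2", 10),
        ("MOVB R2,*R0", 26),
    ],
    "word fetch + LSB pointer": [
        ("MOV *R1+,R2", 22),
        ("MOVB R2,*R0", 26),
        ("MOVB *R3,*R0", 32),
    ],
}

# Each sequence moves two bytes to the VDP, so the totals compare directly.
totals = {name: sum(cycles for _, cycles in seq) for name, seq in SEQUENCES.items()}
best = min(totals, key=totals.get)
for name, total in totals.items():
    print(f"{name}: {total} cycles for 2 bytes")
```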


On MSX1 it is pretty simple to overrun the VDP and miss data.

The best conditions for VRAM I/O are during VBLANK and when the screen is disabled.

In these cases you can move a block of RAM to VRAM using an unrolled loop of OUTI instructions (each one does the I/O, increments a pointer and decrements a counter all at the same time).

It costs 16 CPU cycles on paper, but due to some crappy choices in the standard you have to add 2 extra wait states.

The Z80 runs at 3.57 MHz, so 18 cycles are about 5 us.

In a frame, the VBLANK lasts about 4300 us (assuming NTSC at 60 Hz, but there are also MSX1 machines with PAL chips at 50 Hz).

This means that you can copy about 4300/5 = 860 bytes per frame at maximum speed.

Outside the VBLANK, the VDP needs a delay of about 2+6 = 8 us between successive bytes.

In Z80 cycles that is about 28 cycles, which corresponds to the 3 instructions OUTI, NOP, NOP.

Theoretically, outside VBLANK you can output about another 1546 bytes per frame:

(1/60 s - 4300 us)/8 us = 1546 bytes

So the theoretical maximum is 1546+860 = 2406 bytes per frame.
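The per-frame budget above works out as follows in a small Python sketch. It uses only the figures from this post (60 Hz NTSC frame, 4300 us VBLANK, 5 us per byte inside VBLANK, 8 us per byte outside); the function name is illustrative.

```python
FRAME_US = 1_000_000 / 60   # NTSC frame time in microseconds
VBLANK_US = 4300            # vertical blank duration quoted above


def bytes_per_frame(vblank_us_per_byte: float = 5, active_us_per_byte: float = 8):
    """Return (bytes during VBLANK, bytes during active display), rounded."""
    in_vblank = VBLANK_US / vblank_us_per_byte
    outside = (FRAME_US - VBLANK_US) / active_us_per_byte
    return round(in_vblank), round(outside)


in_vblank, outside = bytes_per_frame()
print(in_vblank, outside, in_vblank + outside)
```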

Edited by artrag

Thanks for the info. If you place the workspace at >8C00 you can clear a VDP byte using CLR R0 instead of CLR *R0. Would that be faster?

 

Yeah, we've toyed with that idea before (though I've never tried it)... that gets you down to 10+4+4=18 cycles (because CLR still does a read before write), which gets you down to 6 microseconds. I'd tried to think about how to use that before but I guess for clearing it's not so bad -- if you're in vblank you can get away with it. Outside of vblank you need to run slower.

 

MOV @ADR,R0 with the workspace in scratchpad would be 14+8+4+4+4=34 cycles if ADR was in 8-bit RAM, so that's 11.3 microseconds, but that's almost as slow as register indirect with increment, and arguably less useful. Just thinking aloud now... ;)
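For a rough sense of scale, here are the two cycle counts from this post turned into microseconds, plus how many bytes a CLR-based fill could cover in one vertical blank. The 12900-cycle window is the 4300 us figure from earlier in the thread at 3 MHz. Loop overhead is ignored, so treat the byte count as an upper bound under those assumptions.

```python
CPU_HZ = 3_000_000
VBLANK_CYCLES = 12_900            # 4300 us at 3 MHz, from earlier in the thread

CLR_R0 = 10 + 4 + 4               # CLR R0 with the workspace on the VDP write port
MOV_ADR_R0 = 14 + 8 + 4 + 4 + 4   # MOV @ADR,R0 with ADR in 8-bit RAM


def microseconds(cycles: int) -> float:
    return cycles * 1_000_000 / CPU_HZ


bytes_cleared_per_vblank = VBLANK_CYCLES // CLR_R0
print(CLR_R0, microseconds(CLR_R0), MOV_ADR_R0, round(microseconds(MOV_ADR_R0), 1))
print(bytes_cleared_per_vblank)
```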

