orbitaldecay Posted October 6, 2015
Hello again everyone, I have another question for you. I read at http://www.unige.ch/medecine/nouspikel/ti99/tms9918a.htm that there is a 4300 µs window to draw during the vertical refresh. At 0.333 µs per clock cycle, that is 4300 / 0.333 ≈ 12900 clock cycles. The TMS 9900 data manual states that a MOV instruction uses 14 clock cycles (not counting memory access; see page 28). Say we are in bitmap mode (graphics II mode). Simply clearing every byte in the pattern table in VRAM would take at least 6144 * 14 = 86016 clock cycles, i.e. about 7 frames' worth of vertical refresh windows! All this implies that the maximum frame rate for animation in bitmap mode (not using sprites) is around 4 - 5 FPS (assuming the refresh rate is 30 FPS), which can't possibly be true. What am I misunderstanding?
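For readers who want to check the arithmetic in the question, here is a minimal sketch in Python. It only restates the figures quoted in the post (3 MHz clock, 4300 µs window, 14-cycle MOV, 6144-byte pattern table); the variable names are made up for illustration, and memory access and loop overhead are ignored, so the real cost is higher.

```python
# Cycle budget during vertical blank, using only the figures quoted in the post.
CPU_HZ = 3_000_000          # TMS9900 clock (about 0.333 us per cycle)
VBLANK_US = 4300            # vertical refresh window quoted in the post
PATTERN_TABLE_BYTES = 6144  # bitmap-mode pattern table size
MOV_CYCLES = 14             # base MOV cost, not counting memory access

# Cycles available in one vertical refresh window.
vblank_cycles = VBLANK_US * CPU_HZ // 1_000_000

# Lower bound on the cycles needed to touch every pattern byte once.
clear_cycles = PATTERN_TABLE_BYTES * MOV_CYCLES

# How many vertical refresh windows that lower bound spans.
vblanks_needed = clear_cycles / vblank_cycles

print(vblank_cycles)             # 12900
print(clear_cycles)              # 86016
print(round(vblanks_needed, 2))  # 6.67
```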
+Lee Stewart Posted October 6, 2015
The frame rate for the TMS9918A is 60 FPS. ...lee
orbitaldecay Posted October 6, 2015
Quoting Lee Stewart: "The frame rate for the TMS9918A is 60 FPS. ...lee"
Thank you for clearing that up, Lee. I also stumbled upon this http://bifi.msxnet.org/msxnet/tech/tmsposting.txt which confirms my suspicions about the number of clock cycles available during the vertical refresh period. It appears to be true that it is not possible to clear the entire screen during a single vertical refresh... an interesting restriction to work with! Let's see... what can I get done in 12900 clock cycles...
+Lee Stewart Posted October 6, 2015
When writing to the screen, blank the screen (VDP register #1, bit 1=0) and disable interrupts. You then have all the time you need to write to the screen. ...lee
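As a rough illustration of the blanking suggestion, the sketch below computes the two bytes that a VDP register write consists of (the value byte, then the register number with the top bit set). It is only a model in Python, not TI code: the function name is hypothetical and the starting register value of >E0 is an assumption, since the thread does not say what register 1 holds. Bit 1 in TI numbering corresponds to the mask 0x40.

```python
from typing import Tuple

BLANK_BIT = 0x40  # VDP register 1, bit 1 in TI numbering: 1 = display on, 0 = blanked


def vdp_register_write(reg: int, value: int) -> Tuple[int, int]:
    """Return the two bytes sent to the VDP address port to set a register.

    The first byte is the new register value, the second is 0x80 | register.
    (Hypothetical helper name, for illustration only.)
    """
    if not 0 <= reg <= 7:
        raise ValueError("the TMS9918A has write-only registers 0..7")
    return (value & 0xFF, 0x80 | reg)


# Assumed current contents of register 1; the real value depends on the program.
r1_current = 0xE0

blank_bytes = vdp_register_write(1, r1_current & ~BLANK_BIT)   # screen off
unblank_bytes = vdp_register_write(1, r1_current | BLANK_BIT)  # screen back on

print([hex(b) for b in blank_bytes])    # ['0xa0', '0x81']
print([hex(b) for b in unblank_bytes])  # ['0xe0', '0x81']
```

Because the VDP registers are write-only, a program typically keeps its own copy of register 1 so it can clear and restore the bit, which is what `r1_current` stands in for here.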
orbitaldecay Posted October 6, 2015
Quoting Lee Stewart: "When writing to the screen, blank the screen (VDP register #1, bit 1=0) and disable interrupts. You then have all the time you need to write to the screen. ...lee"
Yeah, I was thinking about that, but I'm poking around with the idea of doing pixel animations and I'd like it to be as fluid as possible. It'll be a good challenge :-)
+Lee Stewart Posted October 6, 2015
Quoting orbitaldecay: "Yeah, I was thinking about that, but I'm poking around with the idea of doing pixel animations and I'd like it to be as fluid as possible. It'll be a good challenge :-)"
There are others who will be along shortly who are better versed than I in the necessary fine tuning here; but you shouldn't need to always rewrite the entire screen for animation. ...lee
Tursi Posted October 6, 2015
You can't (easily) overrun the VDP from the 9900 on a stock 4A, so there's no need to worry about blanking the screen or the safe write time. But your math is right - you won't get much better than 4-5 fps fully loading the bitmap screen every frame. My video playback code loads 50% of the bitmap screen and manages about 9 fps.
orbitaldecay Posted October 6, 2015
Got it. Thanks for the confirmation. I wonder what kind of black magic makes things like https://www.youtube.com/watch?v=rKqqOcTFVm0 possible on a stock MSX, which also uses the TMS9918 and a comparable processor (Z80 @ 3.5 MHz).
Tursi Posted October 6, 2015
The Z80 in the MSX can hit the VDP almost twice as hard as our 9900, but there's a LOT of tricks going on as well as brute force.
orbitaldecay Posted October 6, 2015
Quoting Tursi: "The Z80 in the MSX can hit the VDP almost twice as hard as our 9900, but there's a LOT of tricks going on as well as brute force."
Yeah, I'm familiar with a lot of demo trickery, having been involved with it on and off over the years, but most of my experience has been on the x86 side of things. I'm just now getting back to the oldschool stuff.
Asmusr Posted October 6, 2015
Your best chance of updating the full bitmap screen at a reasonable frame rate is to draw the image in CPU memory and then copy everything to the VDP as fast as possible. Here is a demo (one of my first) that did just that. I later improved it a bit, but I can't find the thread now.

Here is another demo of soft sprites using the same technique: http://atariage.com/forums/topic/237374-soft-sprites-on-9918amsx1/page-2?do=findComment&comment=3237482

Both these demos are monochrome, which of course saves half of the memory and bandwidth. The problem when you start using color is the speed, of course, but also that you need to synchronize the transfer of colors and patterns to the VDP in an order that avoids flickering, for instance by transferring 8 pixel rows of pattern, then 8 pixel rows of color, and so on.

It is possible to double buffer the bottom two thirds of the bitmap screen if you can figure out how to use the same colors on the two thirds, as in this demo.

Smooth scrolling techniques (like this) are not using full bitmap mode, but either graphics I mode or one of the hybrid modes.
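The interleaving idea (8 pixel rows of pattern, then the matching 8 pixel rows of color) can be sketched as a transfer schedule. This is a Python model only, not code from either demo; the table addresses >0000 and >2000 are assumed for illustration, and the function name is hypothetical. Each character row of a bitmap screen is 32 characters of 8 bytes, i.e. 256 bytes per table.

```python
PATTERN_BASE = 0x0000        # assumed VRAM address of the pattern table
COLOR_BASE = 0x2000          # assumed VRAM address of the color table
BYTES_PER_CHAR_ROW = 32 * 8  # 32 characters across, 8 bytes (pixel rows) each
CHAR_ROWS = 24               # 24 character rows = 192 pixel rows


def interleaved_schedule():
    """Yield (table, vram_address, length) chunks, one character row at a time.

    The color chunk follows its pattern chunk immediately, so the two tables
    are never out of step by more than 8 pixel rows.
    """
    for row in range(CHAR_ROWS):
        offset = row * BYTES_PER_CHAR_ROW
        yield ("pattern", PATTERN_BASE + offset, BYTES_PER_CHAR_ROW)
        yield ("color", COLOR_BASE + offset, BYTES_PER_CHAR_ROW)


schedule = list(interleaved_schedule())
total_bytes = sum(length for _, _, length in schedule)

print(schedule[:2])  # [('pattern', 0, 256), ('color', 8192, 256)]
print(total_bytes)   # 12288
```

The total of 12288 bytes is the pattern table plus the color table, which is why the monochrome demos (pattern table only) need half the bandwidth.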
Opry99er Posted October 6, 2015
Light Year demo = single coolest graphics/sound combo I have ever seen on the stock TI ever......
Willsy Posted October 6, 2015
Quoting Opry99er: "Light Year demo=single coolest graphics/sound combo I have ever seen on the stock TI ever......"
Damn straight.
orbitaldecay Posted October 6, 2015
Thanks so much for sharing source for the software sprites. They look very nice. Let me know if you find the thread or source for the vector graphics. I was thinking of doing something similar and would love to see how you chose to implement it.
Asmusr Posted October 6, 2015
Quoting orbitaldecay: "Thanks so much for sharing source for the software sprites. They look very nice. Let me know if you find the thread or source for the vector graphics. I was thinking of doing something similar and would love to see how you chose to implement it."
I can easily post the lines demo source, I just don't know where we discussed it. lines - 1.a99 I think is the one from the video. lines - 2.a99 is the new one.
Attachments: lines - 1.a99, lines - 2.a99, LINES.dsk
artrag Posted October 6, 2015
Quoting orbitaldecay: "Got it. Thanks for the confirmation. I wonder what kind of black magic makes things like https://www.youtube.com/watch?v=rKqqOcTFVm0 possible on a stock MSX, which also uses the TMS9918 and a comparable processor (Z80 @ 3.5 Mhz)."
I can safely explain the magic behind the spinning roulette. It has been done using some tools of mine based on vector quantization and on a custom version of the Floyd–Steinberg dithering adapted to the color constraints of the screen 2 mode.
You can find some explanation and tests here https://sites.google.com/site/ragozini/vdpenc2 and here https://sites.google.com/site/ragozini/vdpenc2bis
The dithering algorithm in C is here https://sites.google.com/site/ragozini/dithering
Encoder and decoder are unreleased. The encoder works in Matlab; the decoder is trivial, it just unpacks frames of 32*24 tiles in RAM and transfers them to VRAM.
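For readers unfamiliar with the dithering step, here is a minimal sketch of the plain textbook Floyd–Steinberg error diffusion, reduced to 1-bit output. It is not artrag's custom version: it knows nothing about the screen 2 color constraints or vector quantization, and the function name is made up. It only shows the basic mechanism of pushing each pixel's quantization error onto its unprocessed neighbours with the standard 7/16, 3/16, 5/16, 1/16 weights.

```python
def floyd_steinberg_1bit(image):
    """Dither a grayscale image (list of rows of 0..255 values) to pure 0/255.

    Textbook Floyd-Steinberg only; no platform-specific color constraints.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Work on a float copy so accumulated error is not truncated.
    work = [[float(v) for v in row] for row in image]

    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255.0 if old >= 128 else 0.0
            work[y][x] = new
            err = old - new
            # Distribute the error to neighbours that are still unprocessed.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16

    return [[int(v) for v in row] for row in work]


# A flat mid-gray patch turns into a mix of black and white pixels.
gray = [[128] * 4 for _ in range(4)]
dithered = floyd_steinberg_1bit(gray)
for row in dithered:
    print(row)
```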
orbitaldecay Posted October 6, 2015
Sweet. Thanks artrag and Asmusr-M!
Asmusr Posted October 9, 2015
Quoting Tursi: "The Z80 in the MSX can hit the VDP almost twice as hard as our 9900, but there's a LOT of tricks going on as well as brute force."
Is that because of the speed difference between the CPUs or because of the RAM wait states on the TI?
+mizapf Posted October 9, 2015
Also, I wonder whether the Geneve with its fast 9995 and the v9938 could allow for porting MSX2 stuff.
Tursi Posted October 10, 2015
Quoting Asmusr: "Is that because of the speed difference between the CPUs or because of the RAM wait states on the TI?"
Yes. I don't know the Z80 or MSX as well, so maybe artrag can fact check me here.

The Z80 in the MSX is clocked at a little over 3.5 MHz, and instructions take roughly 4 to 23 cycles (http://www.z80.info/z80time.txt). It's got some memory move instructions that can move a byte to the VDP every 21 cycles (OTIR, and there are ways to stack the instructions to go faster; unrolled OUTIs can do 16 cycles). Some of the faster opcodes are very useful things like add (although you do still have to get data in and out of the registers). 21 cycles at 3.5 MHz means a flat copy can copy a byte from a table in RAM every 6 microseconds (which is actually too fast for the VDP in most modes, where the limit is 8 µs, but it works when the display is disabled or during vertical blank). 16 cycles is 4.6 microseconds, but you have to do your own looping.

The 9900 is clocked at 3 MHz flat and instructions take roughly 8-52 cycles (leaving out DIV). In addition, any write to the VDP on the TI-99/4A automatically costs 8 cycles more due to the wait states and the read-before-write. The more useful instructions like MOV and A are 14 cycles - a MOV to the VDP starts at 22 cycles before any other access happens (like fetching the data you intend to move). If we argue that we can do a copy with a MOV register indirect increment, and unroll the whole thing (MOVB *R1+,*R0), and run from scratchpad and have data there too, then we're looking at 14+6+4+(4+4)=32 cycles (and that doesn't include counting down or looping). 32 cycles at 3 MHz means a byte every 10.6 microseconds, but the reality is we are probably at least copying from 8-bit memory, so add another 4 cycles for the read. 36 cycles is 12 microseconds. Odds are good that you'll run slightly slower in order to loop (but an unrolled loop at least minimizes that).

If you're writing a static value (i.e. clearing the screen), you can lose one indirection and the increment, which saves 6 cycles (MOVB R1,*R0) - 26 cycles is 8.6 microseconds, again with no looping.
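The per-byte costs above translate directly into an upper bound on frame rate. The sketch below is a back-of-the-envelope model in Python using only the cycle counts given in the post; the function names are hypothetical, and it assumes the CPU does nothing but copy (no looping, no drawing, no interrupts), so real code lands below these numbers. A full bitmap screen is taken to be the 6144-byte pattern table plus a color table of the same size.

```python
CPU_HZ = 3_000_000  # TMS9900 clock in the TI-99/4A

# Cycle counts as broken down in the post.
COPY_FROM_SCRATCHPAD = 14 + 6 + 4 + (4 + 4)  # MOVB *R1+,*R0, data in scratchpad
COPY_FROM_8BIT_RAM = COPY_FROM_SCRATCHPAD + 4  # extra read wait for 8-bit memory
FILL_STATIC = COPY_FROM_SCRATCHPAD - 6         # MOVB R1,*R0, no source indirection

FULL_BITMAP_BYTES = 6144 * 2  # pattern table + color table


def us_per_byte(cycles):
    """Microseconds per VDP byte for a given cycle count."""
    return cycles * 1_000_000 / CPU_HZ


def max_fps(bytes_per_frame, cycles_per_byte):
    """Upper bound on frames per second if the CPU only copies bytes."""
    return CPU_HZ / (bytes_per_frame * cycles_per_byte)


print(COPY_FROM_SCRATCHPAD, COPY_FROM_8BIT_RAM, FILL_STATIC)          # 32 36 26
print(us_per_byte(COPY_FROM_8BIT_RAM))                                # 12.0
print(round(max_fps(FULL_BITMAP_BYTES, COPY_FROM_8BIT_RAM), 2))       # 6.78
print(round(max_fps(FULL_BITMAP_BYTES // 2, COPY_FROM_8BIT_RAM), 2))  # 13.56
```

The roughly 6.8 fps ceiling for a full screen and 13.6 fps for half a screen are consistent with the 4-5 fps and about 9 fps reported earlier in the thread once decoding and loop overhead are added.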
Tursi Posted October 10, 2015
As an aside, the multiplexer delay is frustrating, but it's still the fastest operation the CPU deals with. It may seem tempting to remove wait states by pulling down words instead of bytes, but even in scratchpad the extra instructions just eat up much more time:

    MOVB *R1+,*R0    (36)
    MOVB *R1+,*R0    (36 - total 72 cycles)

    MOV *R1+,R2      (22)
    MOVB R2,*R0      (26)
    SWPB R2          (10)
    MOVB R2,*R0      (26 - total 84 cycles)

Or the ever-popular direct access to the LSB; let's say R3 points to the LSB of R2, since that's faster than a direct memory reference:

    MOV *R1+,R2      (22)
    MOVB R2,*R0      (26)
    MOVB *R3,*R0     (32 - total 80 cycles)

So it's there, it's frustrating sometimes, but don't work too hard to work around it. If it doesn't fit in scratchpad, it's probably not going to be faster any way but directly accessing it. Most of the time the best optimizations on the 9900 just come from reducing the instruction count, because the base cost is so high.
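The comparison above can be tallied mechanically. This small Python sketch just sums the per-instruction cycle counts given in the post for the three ways of moving two bytes to the VDP; the dictionary keys and step descriptions are labels invented here, and the costs themselves are taken as stated rather than re-derived.

```python
# Per-step cycle costs as listed in the post, for moving two bytes to the VDP.
SEQUENCES = {
    "two byte copies": [
        ("byte copy to VDP", 36),
        ("byte copy to VDP", 36),
    ],
    "word fetch, then SWPB": [
        ("fetch into R2", 22),
        ("R2 high byte to VDP", 26),
        ("SWPB R2", 10),
        ("R2 high byte to VDP", 26),
    ],
    "word fetch, low byte via pointer": [
        ("fetch into R2", 22),
        ("R2 high byte to VDP", 26),
        ("R2 low byte to VDP through R3", 32),
    ],
}

totals = {name: sum(cost for _, cost in steps) for name, steps in SEQUENCES.items()}
fastest = min(totals, key=totals.get)

for name, cycles in totals.items():
    print(f"{name}: {cycles} cycles for 2 bytes, {cycles / 2} per byte")
print("fastest:", fastest)  # fastest: two byte copies
```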
Asmusr Posted October 10, 2015
Thanks for the info. If you place the workspace at >8C00 you can clear a VDP byte using CLR R0 instead of CLR *R0. Would that be faster?
artrag Posted October 10, 2015
On MSX1 it is pretty simple to overrun the VDP and miss data. The best conditions for VRAM I/O are during VBLANK and when the screen is disabled. In these cases you can move a block of RAM to VRAM using an unrolled loop of OUTI instructions (it does the I/O, increases a pointer and decreases a counter all at the same time). It costs 16 CPU cycles on paper, but due to some crappy choices in the standard you have to add 2 extra wait states. The Z80 is at 3.57 MHz, so 18 cycles are about 5 µs.

In a frame, the VBLANK lasts about 4300 µs (assuming NTSC at 60 Hz, but there are also MSX1 machines with PAL chips at 50 Hz). This means that you can copy about 4300/5 = 860 bytes per frame at maximum speed.

Outside the VBLANK, the VDP needs a delay of about 2+6 = 8 µs between successive bytes. In Z80 cycles it means about 28 cycles, which corresponds to the 3 instructions: OUTI, NOP, NOP.

Theoretically, outside VBLANK you can output about another 1546 bytes per frame: (1/60 s - 4300 µs)/8 µs = 1546 bytes.

You get as theoretical maximum 1546+860 = 2406 bytes per frame.
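The MSX budget above is easy to reproduce. This Python sketch uses only the figures in the post (60 Hz frame, 4300 µs VBLANK, about 5 µs per byte inside VBLANK, 8 µs outside); the variable names are illustrative. The final line is an extra derived figure, not from the post: the frame rate if all 12288 bytes of pattern and color data were resent each frame with the CPU doing nothing else.

```python
FRAME_US = 1_000_000 / 60  # one NTSC frame at 60 Hz, in microseconds
VBLANK_US = 4300           # VBLANK duration quoted in the post
FAST_US_PER_BYTE = 5       # unrolled OUTI plus wait states, about 18 cycles
SLOW_US_PER_BYTE = 8       # OUTI, NOP, NOP pacing needed outside VBLANK

# Bytes that fit into VBLANK at full speed.
vblank_bytes = VBLANK_US // FAST_US_PER_BYTE

# Bytes that fit into the rest of the frame at the slower safe rate.
active_bytes = round((FRAME_US - VBLANK_US) / SLOW_US_PER_BYTE)

bytes_per_frame = vblank_bytes + active_bytes

# Derived here, not stated in the post: full bitmap (pattern + color) rate.
full_bitmap_fps = 60 * bytes_per_frame / (6144 * 2)

print(vblank_bytes)               # 860
print(active_bytes)               # 1546
print(bytes_per_frame)            # 2406
print(round(full_bitmap_fps, 2))  # 11.75
```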
Tursi Posted October 10, 2015
Quoting Asmusr: "Thanks for the info. If you place the workspace at >8C00 you can clear a VDP byte using CLR R0 instead of CLR *R0. Would that be faster?"
Yeah, we've toyed with that idea before (though I've never tried it)... that gets you down to 10+4+4=18 cycles (because CLR still does a read before write), which gets you down to 6 microseconds. I'd tried to think about how to use that before, but I guess for clearing it's not so bad -- if you're in vblank you can get away with it. Outside of vblank you need to run slower.

MOV @ADR,R0 with the workspace in scratchpad would be 14+8+4+4+4=34 cycles if ADR was in 8-bit RAM, so that's 11.3 microseconds, but that's almost as slow as register indirect with increment, and arguably less useful. Just thinking aloud now...
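To put the workspace trick in perspective, the sketch below converts the two cycle counts from the post into microseconds and estimates how long a 6144-byte pattern table clear would take at the CLR R0 rate. It is a Python model with made-up names; the clear time is a best case that ignores looping and the caveat above that writes this fast are only safe during vblank.

```python
CPU_HZ = 3_000_000
FRAME_US = 1_000_000 / 60  # one frame at 60 Hz

CLR_R0_CYCLES = 10 + 4 + 4             # CLR R0 with the workspace at >8C00
MOV_ADR_R0_CYCLES = 14 + 8 + 4 + 4 + 4  # MOV @ADR,R0 with ADR in 8-bit RAM


def cycles_to_us(cycles):
    """Convert CPU cycles to microseconds at 3 MHz."""
    return cycles * 1_000_000 / CPU_HZ


clr_us = cycles_to_us(CLR_R0_CYCLES)
mov_us = cycles_to_us(MOV_ADR_R0_CYCLES)

# Best-case time to clear the 6144-byte pattern table, no loop overhead.
clear_ms = 6144 * clr_us / 1000
clear_frames = 6144 * clr_us / FRAME_US

print(CLR_R0_CYCLES, clr_us)            # 18 6.0
print(MOV_ADR_R0_CYCLES, round(mov_us, 1))  # 34 11.3
print(round(clear_ms, 2))               # 36.86
print(round(clear_frames, 2))           # 2.21
```

Even in this best case a full pattern clear spans a little over two frames, so it cannot fit inside a single vblank window.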