Heaven/TQA Posted June 28, 2008

Ehm... are we talking about the draw routine for building the screen, or do you need the line routine in-game as well? I found it quite fast enough, compared to the first one.

CharlieChaplin Posted June 28, 2008

Well, here is a Code3 Cruncher packed version of the Tempest V29 release. I think this one can be loaded with most DOS and game-DOS versions, and Basic will be disabled automatically. However, if it is loaded from DOS, one cannot return to DOS after the program has been unpacked and started (but the program will do a coldstart when Reset is pressed).

Some redundant file segment info:

$0400-$042C = Basic off switch, as released by Bill Wilkinson in Compute! some aeons ago
$0244-$0244 = coldstart setting for the Reset key (a single byte, the OS COLDST flag)
$0600-$0668 = simple text title which appears while loading the game
all other segments = Code3 Cruncher packed data + depacking routines

Greetings, Andreas Koch.

Edited June 28, 2008 by CharlieChaplin

peteym5 Posted June 28, 2008

>> Well, having all the variables in zero page was done almost right away. The routine I came across uses the same zero page locations as the OS line drawing routine used. That avoids lots of conflicts. One issue I have been looking at is that it uses 2 bytes each to store deltax, deltay, tempx and tempy, and I wonder how accurate it would be if I found a way to do it with only 8 bits.

> As Rybags noted, 16 bits are only needed for GR8 lines on a normal screen. Also, why do you care about the zero-page locations reserved by the OS? The first thing you do is switch the OS off.

>> I also have looked at the idea of cases for each direction; I actually think you end up with 8, depending on the slope. I do agree one stumbling block is calculating the screen address for every pixel point there.

> As I said, you don't need 8; 4 is enough. You assume you only handle right (or top, left, down) facing lines, and if you detect one facing left, you simply reverse the start and the end points. An extra step is to optimize the horizontal or vertical lines, but it's only useful if you know you have a lot of these.
>
> Also, you don't have to FULLY compute the address for every pixel. If you follow my suggestion, where the X register holds your current X, then you save a lot. Also, it's not smart to keep a 'currentY' variable and compute the address for every pixel. If you detect a line change, you can simply add or subtract 40 on a zero-page pointer; it's a bit faster. And if you can store every line on a different page, it's even faster.
>
> If you send me your line routine, I can probably improve it.

Adding or subtracting the screen width in bytes is something I have thought about. However, storing every line on a different page probably will not work for a game like Tempest, because we use vast amounts of RAM already.

There is also an option using self-modifying code that can be implemented: instead of loading the A register, adding 1 or 255, and storing it, simply use INC or DEC, with the opcode changed depending on the direction. This could be further enhanced with INX or DEX, if the X register can be freed up.

This is really a side project I am working on and I should place it in a new thread. I am just trying to see how fast we can make something go. I will see if I can get the line drawing routine ready to be sent.

Edited June 28, 2008 by peteym5

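To make the two suggestions above concrete (swap the endpoints so only four direction cases remain, and step the row address by 40 instead of recomputing it per pixel), here is a rough Python model rather than 6502 code. It assumes a 40-byte-wide screen and plain Bresenham stepping; `draw_line`, `SCREEN_BASE` and the returned `(row_address, x)` pairs are names made up for illustration and are not taken from the Tempest source or from eru's routine.

```python
BYTES_PER_ROW = 40       # screen width in bytes (160 pixels, 4 per byte)
SCREEN_BASE = 0x4000     # hypothetical screen address, for illustration only


def draw_line(x0, y0, x1, y1):
    """Return (row_address, x) for every pixel of the line.

    Assumption: plain Bresenham stepping. Endpoints are swapped so the
    line always runs left to right, which leaves four cases:
    x-major or y-major, each going down or up the screen.
    """
    if x0 > x1:                              # left-facing: reverse the endpoints
        x0, y0, x1, y1 = x1, y1, x0, y0
    dx = x1 - x0                             # never negative after the swap
    dy = abs(y1 - y0)
    row_step = BYTES_PER_ROW if y1 >= y0 else -BYTES_PER_ROW

    row = SCREEN_BASE + y0 * BYTES_PER_ROW   # the only multiply, once per line
    x = x0
    pixels = [(row, x)]
    if dx >= dy:                             # x-major: step x every pixel
        err = dx // 2
        for _ in range(dx):
            x += 1
            err -= dy
            if err < 0:
                err += dx
                row += row_step              # line change: add/subtract 40
            pixels.append((row, x))
    else:                                    # y-major: step the row every pixel
        err = dy // 2
        for _ in range(dy):
            row += row_step
            err -= dx
            if err < 0:
                err += dy
                x += 1
            pixels.append((row, x))
    return pixels
```

On the 6502 the `row` value would live in a zero-page pointer and `x` in the X register, so the inner loop never multiplies by 40.
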
eru Posted June 29, 2008

> I will see if I can get the line drawing routine ready to be sent.

I looked at this routine and it's just oh-so-slow. I quickly hacked a different version. As usual, we trade off space for speed: it requires 2KB of extra arrays. It can use less, but will get slower. I don't know if Tempest really needs a fast drawto, but perhaps someone else can use it.

Sources: http://homepages.cwi.nl/~marcin/a8/drawto15.asx
Executable: http://homepages.cwi.nl/~marcin/a8/drawto15.xex

peteym5 Posted June 29, 2008

I actually did find a few compact ways of increasing speed without using huge tables. What I've been doing for plot pixel is using a different sort of table for mask and color. The tables are only about 20 bytes in size. Of course we still have the other 192x2 row address table. Oh yes, I successfully tested adding or subtracting 40 to ROWAC instead of having to multiply or look up the row address for each row. I don't see a whole lot of gain beyond what I have done already. It runs in about 35 CPU cycles. I can always do away with the JSR-RTS in the drawto part. The code is something like this:

PLOTPIXEL
    LDA PIXELCOLUMN
    AND #3              ; pixel position within the byte (0-3)
    TAX
    LDA PIXELCOLUMN
    LSR
    LSR                 ; column / 4 = byte offset within the row
    TAY
    LDA (ROWAC),Y
    AND ANDMASK,X       ; clear the two bits of this pixel
GETCMASK = *+1          ; operand of the ORA below, patched by GETMASK
    ORA $FFFF,X
    STA (ROWAC),Y
    RTS

GETMASK
    LDA COLOR
    ASL
    ASL                 ; color * 4 = offset of the matching mask row
    CLC
    ADC #<COLORMASK0
    STA GETCMASK
    LDA #0
    ADC #>COLORMASK0
    STA GETCMASK+1
    RTS

ANDMASK     DTA 63,207,243,252
COLORMASK0  DTA 0,0,0,0
COLORMASK1  DTA 64,16,4,1
COLORMASK2  DTA 128,32,8,2
COLORMASK3  DTA 192,48,12,3

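As a cross-check on the tables above, the following Python model mirrors what PLOTPIXEL appears to do for a 4-pixels-per-byte mode: clear the two bits of the target pixel with ANDMASK, then OR in the colour bits from the selected COLORMASK row. It is a sketch under that reading of the posted code; `plot_pixel` and the `row` bytearray are illustrative names, not part of the original routine.

```python
# Tables copied from the post: one entry per pixel position within a byte.
ANDMASK = [63, 207, 243, 252]          # keeps the other three pixels
COLORMASK = [
    [0, 0, 0, 0],                      # COLORMASK0
    [64, 16, 4, 1],                    # COLORMASK1
    [128, 32, 8, 2],                   # COLORMASK2
    [192, 48, 12, 3],                  # COLORMASK3
]


def plot_pixel(row, pixel_column, color):
    """Set one 2-bit pixel in `row`, a bytearray holding one screen row."""
    pos = pixel_column & 3             # AND #3 / TAX
    offset = pixel_column >> 2         # LSR / LSR / TAY
    row[offset] = (row[offset] & ANDMASK[pos]) | COLORMASK[color][pos]


row = bytearray(40)                    # 40 bytes = 160 pixels
plot_pixel(row, 0, 3)                  # leftmost pixel, colour 3
plot_pixel(row, 1, 1)                  # next pixel, colour 1
plot_pixel(row, 0, 2)                  # overwrite the leftmost pixel
```

The 20 bytes of tables replace any shifting of the colour value at plot time, which is where the routine gets its speed.
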
Heaven/TQA Posted June 29, 2008

Thanks, Eru, for sharing the source... it's really readable... but you cannot leave out the XASM/QASM fragments like DTA. I have been forcing myself for a few months to use .byte etc. instead...

Just one question, as I never understood the MADS macros 100%: what are the :1 doing? Especially the .def :1 = * ?

PIXEL .MACRO
    ldy div4,x
    lda (zer),y
    and mask,x
.def :1 = *
    ora color_bits,x
    sta (zer),y
.ENDM

PREPARE .MACRO
    sta todo
    inc todo
    lsr @
    sta tmp
    lda color
    ora >color_bits
    sta :1+2
.ENDM

eru Posted June 29, 2008

> I actually did find a few compact ways of increasing speed without using huge tables. What I've been doing for plot pixel is using a different sort of table for mask and color. The tables are only like 20 bytes in size. Of course we still have the other 192x2 row address table. Oh yes, I successfully did test adding or subtracting 40 to ROWAC instead of have to multiply or lookup the row address for each row. I don't see a whole lot of gain beyond what I have done already. Does in about 35 CPU cycles. I can always do away with the JSR-RTS in the drawto part. Code is something like this:

These are of course valid optimizations. If it's fast enough, there is no point in using tables. Btw, 2KB by my standards is far from huge. Another thing I found in your sources is a multiplication by 40 for every pixel - please avoid it.

> thanks Eru, for sharing the source... it's really readable... but you can not leave out the XASM/QASM fragments like DTA i am forcing me since for few months to use .byte etc instead...

I hacked it in like 45-60 minutes last night, what do you expect? And I like DTA.

> just one question as I never understood 100% the MADS macros what are the :1 doing? esp. the .def :1 = * ?
>
> PIXEL .MACRO
>     ldy div4,x
>     lda (zer),y
>     and mask,x
> .def :1 = *
>     ora color_bits,x
>     sta (zer),y
> .ENDM
>
> PREPARE .MACRO
>     sta todo
>     inc todo
>     lsr @
>     sta tmp
>     lda color
>     ora >color_bits
>     sta :1+2
> .ENDM

These are macros. I only used them to make the code shorter, as these pieces repeat roughly for all 4 cases. I truly recommend using macros - these are great. But they are sometimes dangerous, and the MADS handling of the parameters leaves a bit to be desired.

:1 means 'insert macro parameter number 1 here'.
".def :1 = *" means "define a label whose name is given by macro parameter number 1 and whose value is the current address".

Edited June 29, 2008 by eru

peteym5 Posted June 29, 2008

Thank you for your help. I am looking to keep the code and tables compact and small, since we might need some spare RAM for wave sounds. Doing some of this research did reduce the time Tempest takes to delete and redraw the screen, which is now less than 0.1 seconds. There is much potential in this, and future projects may benefit.

I am considering setting something up for APAC screen mode manipulation, with the ability to draw lines in 256 colors. No, it's not for Tempest, but for games that don't need high resolution. On that note, has anyone tried to just make a custom NMI instead of DLIs for that? In that case one could make something that won't hog the CPU too badly. Many are stating you can't do much with APAC because of the resources it consumes.

Rybags Posted June 30, 2008

With APAC in standard width, I'm pretty sure you must do it with a single DLI/kernel. However, there are 2 things that can be done on a 64K machine which might free up enough cycles:

- A user vector at $FFFA that redirects to the DLI directly. Code that checks the NMI source could be avoided if a Pokey Timer routine was also used to change the vector to service the VBlank at the appropriate time. Saving there per DLI (since we can bypass this part of the OS code):

    BIT $D40F       ; 4 cycles
    BPL DOVBLANK    ; 2 cycles (failed branch)
    JMP ($200)      ; 5 cycles

  11 cycles saved.

- Disable Display List Instruction Fetch on Antic. That would save 1 cycle per scanline, assuming you're doing GTIA modes in Antic F. When DList Instruction fetch is disabled, Antic just updates the memory scan counter and continues displaying the same mode. Of course the problem here is that you have to cater for the memory scan crossing a 4K boundary.

Then there's the usual bag of tricks in the NMI itself, like self-modifying code, z-page variables etc.

For drawing in APAC: one possible workaround which allows using standard routines is to map the colour and luma lines in separate consecutive 4K bitmaps, e.g.

    LMS F COLOUR
    LMS F LUMA
    LMS F COLOUR + 40
    LMS F LUMA + 40
    etc.

Of course, doing that, we lose 2 cycles per scanline due to the extra 2 bytes used for an LMS every scanline.

Then, to draw lines, just draw the colour one first, change the screen pointer, then draw the LUMA portion.

Edited June 30, 2008 by Rybags

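The interleaved LMS layout described above can be sketched as a small display-list generator. This is a Python model under stated assumptions: `COLOUR_BASE` and `LUMA_BASE` are made-up addresses, only the mode lines are generated (no blank lines, DLI bits or jump back to the start), and $4F is ANTIC mode F with the LMS bit set.

```python
LMS_MODE_F = 0x4F          # ANTIC mode F ($0F) with the LMS bit ($40) set
BYTES_PER_ROW = 40         # standard-width playfield
COLOUR_BASE = 0x4000       # hypothetical start of the colour bitmap
LUMA_BASE = 0x5000         # hypothetical start of the luma bitmap, 4K later


def apac_display_list(pairs):
    """Build the mode-line bytes for `pairs` colour/luma scanline pairs."""
    dl = bytearray()
    for n in range(pairs):
        for base in (COLOUR_BASE, LUMA_BASE):
            addr = base + n * BYTES_PER_ROW
            dl += bytes([LMS_MODE_F, addr & 0xFF, addr >> 8])  # opcode, lo, hi
    return dl


dl = apac_display_list(96)             # 96 pairs = 192 scanlines
```

With 96 rows of 40 bytes, each bitmap takes 3840 bytes and so fits inside its own 4K block, which is why a plain line routine can draw into either one just by changing the screen pointer.
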
peteym5 Posted June 30, 2008

Actually, my idea for getting around the DLIs was to not use the DLI vector (512, $200) and to do a simple toggle of the high bit of PRIOR in the NMI, which only has to save the A register and restore it before the RTI. The VBI will just set the initial value. Since we have 256 colors available, we may not have to use the Player/Missile graphics. It would leave more CPU cycles open for the VBI and other interrupts.

Of course an APAC type game does not have to be high speed, since I am looking to use it for turn-based strategy or puzzle type games like the puzzle piece shifting thing, Tetris, or Columns. You most likely will not have enough CPU cycles to do side-scrollers, high speed shoot-em-ups or action games.

I have checked out the disable-fetch thing, but for some reason it looked weird in the emulator. I have to try it on a real Atari.

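The high-bit toggle mentioned above works because the two GTIA modes that APAC alternates between differ only in bit 7 of PRIOR. A tiny Python model of the per-scanline sequence follows; `prior_per_scanline` is an illustrative name, and starting on the hue line is an assumption.

```python
GTIA_MODE_9 = 0x40     # PRIOR bits 7-6 = 01: 16 luminances
GTIA_MODE_11 = 0xC0    # PRIOR bits 7-6 = 11: 16 hues


def prior_per_scanline(first, lines):
    """PRIOR value seen on each scanline when bit 7 is flipped every line."""
    values = []
    prior = first
    for _ in range(lines):
        values.append(prior)
        prior ^= 0x80              # what EOR #$80 would do in the NMI
    return values


sequence = prior_per_scanline(GTIA_MODE_11, 4)
```

Since PRIOR cannot be read back, a real handler would presumably keep the current value in a RAM byte, flip that, and store the result to the hardware register.
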
Heaven/TQA Posted June 30, 2008

Ehm... re: APAC... why do you need to squeeze every cycle out of it? Have a look at my demo http://atari.fandal.cz/detail.php?files_id=224

There is a complete music player at the beginning, or look at the "manga pic" distorter... so it is not as if you have no CPU cycles left... Jesus, done 1996... I am getting old...

And I remember Fox or Eru using APAC in a Forever-compo intro (the unlimited balls one), and Hiassoft (yes... the one doing the high speed SIO) has done a demo called "plasma clouds" in 256 colours... and I used 256-colour APAC in font mode...

Or am I on a completely wrong track???

titel44.zip

Edited June 30, 2008 by Heaven/TQA