Wrathchild Posted March 22, 2010
I guess that on all versions the action slows down considerably when many objects are in view, so worrying about a few cycles here or there seems an odd argument to me?
JamesD Posted March 22, 2010
Quote: I guess on all versions the action slows down considerably when many objects are within view and so worrying about a few cycles here or there seems an odd argument to me?
You can bump the frame rate a little by optimizing the line drawing, but you aren't going to get massive improvements. It just takes time to draw lines. With a 256 pixel wide bitmapped screen you can eliminate 16 bit math through much of the line drawing, and that offers a noticeable reduction in clock cycles, but the math is still a bottleneck. Now, if you have a 65816 you could probably speed things up since it supports 16 bit math, so a 320 pixel wide screen isn't as big of a deal... still slower than 256 but not as much. Too bad they didn't include a multiply instruction on the 65816. That would save a lot of clock cycles. That's one place Motorola has a huge advantage with the 6803 and 6809.
<edit> Saving a few cycles here or there makes the difference between the game still being fun and the game being unplayable. For smooth animation you really need a 6502 at a much higher clock rate.
Edited March 22, 2010 by JamesD
JamesD Posted March 22, 2010
Quote: You can disable the screen refresh on the Plus/4 if I remember right... it would let you run full speed but it wouldn't do you much good. Elite would have a much better frame rate but you couldn't actually see it.
Quote: You can turn the screen off and get full processor speed all the time, yes; some decompression schemes do that to get the job done more quickly. What you can also do is shorten the screen vertically (as well as expand it) because the raster register is read/write - you can "lie" to the machine about which scanline it's currently on and it'll take your word for it.
Well, like I said about the Atari... in this case it's a moot point. Now, if I were writing a compiler or some other really time intensive utility, that could be really useful. For this topic I'm not sure messing with the screen size would be of much use.
Wrathchild Posted March 22, 2010
Both my BBC and C64 conversions are using a narrow screen mode rather than normal width. Made sense as the play area was 256 pixels wide.
Lazarus Posted March 22, 2010
Atari is actually a bit quicker than Plus/4 in default screen mode. Plus/4 has less Refresh (5 vs 9 per line normally), but it has two "badlines" per character row rather than one. Oops, yes forgot about the dual badlines. It's ~1.2 MHz on +4. Fastest "net speed" of Atari should be ~1.63 MHz if screen DMA is disabled on PAL. Yeps, but for a screenmode used to run Elite I'd guess it's ~1.3 MHz.
Heaven/TQA Posted March 22, 2010
Wrathchild... I assume that you are still using bitmap mode and not the charmode mimic from the C64?
JamesD Posted March 22, 2010
Quote: Both my BBC and C64 conversions are using a narrow screen mode rather than normal width. Made sense as the play area was 256 pixels wide.
I'd guess they are calculating for 256 and centering it on a 320 screen. Not as efficient as an actual 256 pixel screen width though.
atariksi Posted March 22, 2010
Quote: Both my BBC and C64 conversions are using a narrow screen mode rather than normal width. Made sense as the play area was 256 pixels wide.
Quote: I'd guess they are calculating for 256 and centering it on a 320 screen. Not as efficient as an actual 256 pixel screen width though.
Although you can set 256 pixels on the A8 with LMS or narrow mode, an efficient line algorithm doesn't need that power-of-two width. If you add the deltaX fractional overflow to screenaddr (carry flag) and then the deltaY fractional overflow with a branch on carry to add the width amount, it's the same for powers of two or non-powers of two.
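One way to read the fractional-overflow idea above is as a fixed-point DDA, modelled here in Python rather than 6502 code. This is only a sketch under assumptions: the name `dda_line`, the 8-bit fraction, and the use of a linear pixel index (instead of a byte address plus bit mask) are all illustrative, not taken from any Elite port. Because the fractions are truncated, very long lines could land a pixel off, so a real routine would need more care.

```python
def dda_line(x0, y0, x1, y1, width):
    """Return linear pixel offsets (y * width + x) along a line.

    Hypothetical model: each axis has an 8-bit fractional accumulator.
    An overflow past 255 stands in for the 6502 carry flag.
    """
    dx, dy = x1 - x0, y1 - y0
    steps = max(abs(dx), abs(dy))
    offset = y0 * width + x0
    points = [offset]
    if steps == 0:
        return points
    # Fraction added per step; 256 means "carry on every step".
    frac_x = (abs(dx) * 256) // steps
    frac_y = (abs(dy) * 256) // steps
    step_x = 1 if dx >= 0 else -1
    step_y = width if dy >= 0 else -width   # "add the width amount"
    acc_x = acc_y = 128                     # start half-way, for rounding
    for _ in range(steps):
        acc_x += frac_x
        if acc_x >= 256:                    # carry set: move one pixel in X
            acc_x -= 256
            offset += step_x
        acc_y += frac_y
        if acc_y >= 256:                    # carry set: move one row in Y
            acc_y -= 256
            offset += step_y
        points.append(offset)
    return points
```

The width only appears as the constant added on a Y carry, so the loop itself is identical for a 256 or 320 pixel wide screen; what changes on real hardware is whether that addition fits in 8 bits.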
Lazarus Posted March 22, 2010
Quote: Although you can set to 256 pixels on A8 with LMS or narrow mode, an efficient line algorithm doesn't need that power of 2 width. If you add the deltaX fractional overflow to screenaddr (carry flag) and then deltaY fractional overflow with a branch on carry to add the width amount, it's same for powers of 2 or non-powers of two.
An efficient line drawing algo would simply use a table for that. Oh, and to troll a bit: that table does not care if it's linear A8 addressing or char-based C64 addressing.
atariksi Posted March 22, 2010
Quote: Although you can set to 256 pixels on A8 with LMS or narrow mode, an efficient line algorithm doesn't need that power of 2 width. If you add the deltaX fractional overflow to screenaddr (carry flag) and then deltaY fractional overflow with a branch on carry to add the width amount, it's same for powers of 2 or non-powers of two.
Quote: An efficient line drawing algo would simply use a table for that. Oh and to troll a bit: That table does not care if it's linear A8 addressing or char-based C64 addressing
I guess you mean a table for the slopes: store the fractional dx & dy in some look-up table.
Wrathchild Posted March 22, 2010
@Heaven, yes - for the C64 port I've re-worked the line-draw routines (LDR) to work on a bitmapped screen. Generally the LDR will use a table to find the start address, but then things are done relative to that. In an A8 bitmap this is simple: up/down is -/+ 32 bytes, and left/right is a bitshift and then -/+ 1 byte on a char boundary. On the C64, up/down is -/+ 1 byte but then -/+ 8*40 bytes on a char boundary, and left/right is a bitshift and then -/+ 8 bytes. (Hope that makes sense)
@JamesD - the BBC looks to be handling this in H/W, but the C64 just adds 4 characters to the Xpos to centre everything.
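The two layouts described above can be compared with a small Python model. This is a sketch, not code from either port: the function names are made up, offsets are relative to an unspecified screen base, and it assumes a 32 byte per row linear A8 bitmap and the standard C64 hi-res layout of 40 cells per row with 8 bytes per cell.

```python
A8_BYTES_PER_ROW = 32     # 256 pixels / 8 pixels per byte
C64_CELLS_PER_ROW = 40    # 320 pixels / 8 pixels per cell


def a8_offset(x, y):
    """Byte offset of pixel (x, y) in a linear 32-byte-per-row bitmap."""
    return y * A8_BYTES_PER_ROW + (x >> 3)


def c64_offset(x, y):
    """Byte offset of pixel (x, y) in the C64 cell-ordered bitmap."""
    cell_row = (y >> 3) * C64_CELLS_PER_ROW * 8   # 320 bytes per char row
    cell_col = (x >> 3) * 8                       # 8 bytes per cell
    return cell_row + cell_col + (y & 7)


def bit_mask(x):
    """Mask for pixel x within its byte (leftmost pixel is bit 7)."""
    return 0x80 >> (x & 7)


def c64_centred_offset(x, y):
    """256-wide play area centred by shifting 4 characters (32 pixels)."""
    return c64_offset(x + 4 * 8, y)
```

In this model, moving down one line on the C64 is +1 byte inside a cell, and the net step from the last line of one cell to the first line of the cell below works out to 320 - 7 = 313 bytes.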
Lazarus Posted March 23, 2010
Quote: I guess you mean table for the slopes-- store the fractional dx & dy in some look-up table.
There is no table for slopes if you use Bresenham.
Rybags Posted March 23, 2010
You can get more speed in 256 pixel mode on the A8, simply because you don't need to do anything with the high address byte after the initial table lookup dependent on Y-Pos (assuming you map your screen out properly).
No point having a slope lookup table in something like Elite - the angles possible are virtually infinite.
The Bresenham algorithm can be further optimised, although I'd suspect existing versions probably already do so. The DeltaX/Y calculations are often carried out for both dimensions in generic draw routines (and our OS one), but in reality you only need to do the "short" one; the "long" one simply alternates between 0 and Line_length each iteration.
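One possible reading of the "short axis only" optimisation is the usual single-accumulator form of Bresenham, sketched below in Python. The function name and structure are illustrative assumptions, not the routine from any existing Elite version: the long axis steps on every iteration, and only one error term decides when the short axis steps.

```python
def bresenham(x0, y0, x1, y1):
    """Return the (x, y) points of a line using one error accumulator."""
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 >= x0 else -1
    sy = 1 if y1 >= y0 else -1
    x_major = dx >= dy
    long_len, short_len = (dx, dy) if x_major else (dy, dx)
    err = long_len // 2            # start half-way, for symmetric rounding
    x, y = x0, y0
    points = [(x, y)]
    for _ in range(long_len):
        # The long axis advances unconditionally.
        if x_major:
            x += sx
        else:
            y += sy
        # Only the short axis needs the error term.
        err -= short_len
        if err < 0:
            err += long_len
            if x_major:
                y += sy
            else:
                x += sx
        points.append((x, y))
    return points
```

With coordinates limited to 0..255, `long_len`, `short_len` and `err` all stay within one byte (plus the borrow), which is the 8 bit versus 16 bit point made elsewhere in the thread.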
Wrathchild Posted March 23, 2010
I wonder if the filled vector routines from 'Ars Mori' or 'Black' are feasible to use in a game?
Heaven/TQA Posted March 23, 2010
Mark, I would not look at demos using vector stuff, esp. filled ones... as these are mainly highly optimised code, obviously done for the one demo fx... not to mention the RAM usage for all the lookups...
Wrathchild Posted March 23, 2010
Granted, but with bank switched carts you can include the pre-calced lookups onboard.
Edited March 23, 2010 by Wrathchild
+Philsan (Author) Posted March 23, 2010
Quote: granted, but with bank switched carts you can include the pre-calced lookups onboard.
For demanding software, we can always use the Corina cartridge (Bomb Jake).
Heaven/TQA Posted March 23, 2010
I am sure that EliteA8 would look fast enough even using standard Antic F plus our little bit more horsepower... or better... why not skip every 2nd line in the draw routine, like in the demos of the old days, and/or multiply the scanlines back to "Gr.7"?
Rybags Posted March 23, 2010
Scanline skip would look cheap IMO, kinda like that Spectrum Doom clone.
VBXE blit might make filled polygons a reality - for a stock machine I reckon it'd be something of a slowdown. The absolute best optimisation would equate to 4 cycles per pixel (for 2 bpp multicolour modes), plus <whatever> overhead per scanline. Might work OK in a lower rez like OS Gr. 7 or 5, but it's a big sacrifice to get that bit of extra colour.
Edited March 23, 2010 by Rybags
Wrathchild Posted March 23, 2010
Quote: ...we can always use Corina cartridge.
I'm quite comfortable with the AtariMax flashcarts, thanks, as I did the support for them in Atari800WinPLus and Atari++... I read that Corina could be supported in Atari++ but it is not mentioned in 1.58?
Lazarus Posted March 23, 2010
Quote: You can get more speed on 256 pixel mode on the A8, simply because you don't need to do anything with the high address byte after the initial table lookup dependant on Y-Pos (assuming you map your screen out properly).
I bet testing for a high byte change and changing it on change is slower than just
    LDA table,Y
    STA HIBYTE
The main reason for 256 pixels is that you can use 8 bit Bresenham instead of 16 bit.
Rybags Posted March 23, 2010
Yep, but you can also gain an advantage with Ypos changes during the linedraw. No need to do another table lookup, just inc or dec the high byte of the screen pointer; the low byte remains the same unless XPos changes.
Wrathchild Posted March 23, 2010
There are other quick wins: if you know it's a straight line (x1=x2 or y1=y2) then there's no need to use Bresenham's.
[Edit] And then on the X axis it's quicker, as you can easily draw the 'ends' and then (if needed, e.g. x2-x1 > pixels per byte) quickly loop through slapping the 'full' byte value onto the screen.
Edited March 23, 2010 by Wrathchild
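The 'ends plus full bytes' idea for horizontal lines might look roughly like the Python sketch below. It assumes a 1 bit per pixel bitmap with 32 bytes per row and the leftmost pixel in bit 7, and it simply ORs pixels in; the name `hline` and the choice of OR (rather than, say, EOR plotting) are assumptions made for illustration.

```python
BYTES_PER_ROW = 32   # assumed 256-pixel-wide, 1 bpp screen


def hline(screen, x0, x1, y):
    """Set pixels x0..x1 (inclusive) on row y of a bytearray bitmap."""
    if x0 > x1:
        x0, x1 = x1, x0
    row = y * BYTES_PER_ROW
    first, last = x0 >> 3, x1 >> 3
    left_mask = 0xFF >> (x0 & 7)                    # x0 to end of its byte
    right_mask = (0xFF << (7 - (x1 & 7))) & 0xFF    # start of byte to x1
    if first == last:
        # Both ends fall in the same byte: combine the two masks.
        screen[row + first] |= left_mask & right_mask
        return
    screen[row + first] |= left_mask                # left 'end'
    for b in range(first + 1, last):
        screen[row + b] = 0xFF                      # 'full' bytes
    screen[row + last] |= right_mask                # right 'end'
```

For example, drawing pixels 3 to 20 touches three bytes: a partial byte, one full byte, and another partial byte, with no per-pixel error arithmetic at all.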
Rybags Posted March 23, 2010
Have you dug deep into the linedraw code of an existing version, i.e. what method and optimisations do they already use?
Lazarus Posted March 23, 2010
Quote: Yep, but you can also gain advantage with Ypos changes during the linedraw. No need to do another table-lookup, just inc or dec the high byte of the screen pointer, low byte remains the same unless XPos changes.
But you have to test that, so that the high byte is only increased every 8 lines. And that test is most likely slower than just updating the high byte from a table.
    LDA temp
    SBC dy
    BCS noychange    ; carry still set: no Y step
    ADC dx
    INX              ; next row
    LDY table_lo,X   ; reload both pointer bytes from the tables
    STY zp
    LDY table_hi,X
    STY zp+1
noychange:
    STA temp
would be:
    LDA temp
    SBC dy
    STA temp
    BCS noychange    ; carry still set: no Y step
    ADC dx
    STA temp
    INX              ; next row
    LDA table_lo,X   ; low byte still comes from the table
    STA zp
    TXA
    AND #$07         ; high byte only changes every 8th row
    BNE noychange
    INC zp+1
noychange:
Edited March 23, 2010 by Lazarus
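As a sanity check on the two pointer updates above, here is a small Python model showing that they give the same screen address on every row, at least under some assumptions: a page-aligned screen base (the `0x4000` value is hypothetical), 32 bytes per row, and stepping downwards only. It says nothing about which variant costs fewer cycles; that is the point being argued in the post.

```python
SCREEN_BASE = 0x4000      # hypothetical, page-aligned screen address
BYTES_PER_ROW = 32        # 256-pixel-wide, 1 bpp screen
ROWS = 192

TABLE_LO = [(SCREEN_BASE + y * BYTES_PER_ROW) & 0xFF for y in range(ROWS)]
TABLE_HI = [(SCREEN_BASE + y * BYTES_PER_ROW) >> 8 for y in range(ROWS)]


def step_down_table(y, zp_lo, zp_hi):
    """First variant: reload both pointer bytes from the tables."""
    y += 1
    return y, TABLE_LO[y], TABLE_HI[y]


def step_down_inc(y, zp_lo, zp_hi):
    """Second variant: low byte from table, INC high byte every 8 rows."""
    y += 1
    zp_lo = TABLE_LO[y]
    if (y & 7) == 0:                  # models TXA / AND #$07 / BNE
        zp_hi = (zp_hi + 1) & 0xFF    # models INC zp+1
    return y, zp_lo, zp_hi


def walk(step):
    """Step from row 0 to the last row, collecting the pointer each time."""
    y, lo, hi = 0, TABLE_LO[0], TABLE_HI[0]
    addresses = []
    for _ in range(ROWS - 1):
        y, lo, hi = step(y, lo, hi)
        addresses.append((hi << 8) | lo)
    return addresses
```

Comparing `walk(step_down_table)` with `walk(step_down_inc)` confirms that the two agree for this layout, so the choice between them comes down to cycle counts and table space.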