chjmartin2 Posted July 5, 2021

Hi,

It was funny as I went back and read about stuff I started 10 years ago now (wow @jaybird3rd, it has been more than a decade) and realized how much more I know now, and then sad at how little I have progressed. I am super excited about having MAME, as it has really sped up my development cycle. I am working on a quick demo that I think everyone will find fun.

Here is my question. It is really easy to update a section of the screen if you want to update a whole line: just set up HL, BC and DE, have your screen data labelled .db style, and LDIR your way to success. Here is the rub: I am very cycle starved in my current program and I only want to update a SECTION of the screen, not the whole line. I have several THEORIES about the best way to do this in the fewest t-states, but I am frustrated because this is clearly a solved problem and SOMEBODY knows the best way.

I have a "graphic" that is 4 characters wide by 6 characters tall, with both colors and characters. The total graphic is 48 bytes. I have a few versions of it and want to use them to make an animation. So let's say I wanted to load it in the bottom left side of the screen. 12,930 is the starting memory position (upper left) and 13,133 is the ending memory position (lower right). Color runs from 13,954 to 14,157.

Here are the methods I think I can use, but again, I feel like this is something somebody else has already optimized, so why reinvent the wheel? Anyway... I could:

    ld de, 12930             ; 10 t-states
    ld hl, IMAGECHARLINEONE  ; 10 t-states
    ld bc, 4                 ; 10 t-states
    ldir                     ; 21, 21, 21, 16 t-states
    ld de, 12970
    ld hl, IMAGECHARLINETWO
    ld bc, 4
    ldir
    ...

It will be like 654 t-states (6 rows x 109), an eternity, and then double it for the color. This just seems like an awfully inefficient way of doing it, and it doesn't use PUSH, POP, IX/IY or anything else.
My alternative is just:

    ld hl, 12930   ; 10 t-states
    ld (hl), nn    ; 10 t-states
    inc hl         ; 6 t-states
    ld (hl), nn    ; 10 t-states
    inc hl         ; 6 t-states
    ld (hl), nn    ; 10 t-states
    inc hl         ; 6 t-states
    ld (hl), nn    ; 10 t-states
    ld hl, 12970
    ...

This would be 408 t-states (6 rows x 68), better than LDIR but still quite a bit, and again doubled for the color.

Is this a solved problem? Ideally I would have a more generic routine where I specify the upper left corner, point to the ROM location of the character/color data, and it draws it to the screen. Any thoughts would be appreciated. Sorry if it is a basic, obvious question.

Thanks,
Chris
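A minimal sketch of the kind of generic routine asked for above, assuming a fixed width of 4 and rows stored back to back in the source data. The routine name `draw4xN` and the labels `frame0_chars` and `frame0_colors` are hypothetical and not from the thread. It trades speed for generality: by my count it costs about 107 t-states per row, so it is no faster than the per-row LDIR version and slower than the unrolled immediate stores.

```z80
; Hypothetical helper, not from the thread.
; In:  HL = source data (4 bytes per row, rows back to back)
;      DE = screen address of the upper-left cell
;      A  = number of rows (1..255; 0 would loop 256 times)
; Out: HL = first byte after the source data
; Clobbers: A, BC, DE, flags
draw4xN:
        ldi                     ; 16  (DE) <- (HL), HL+1, DE+1, BC-1
        ldi                     ; 16
        ldi                     ; 16
        ldi                     ; 16
        ex de,hl                ; 4   HL = screen pointer
        ld bc,36                ; 10  40-byte row minus the 4 bytes written
        add hl,bc               ; 11  start of the next row
        ex de,hl                ; 4   back to HL = source, DE = screen
        dec a                   ; 4
        jp nz,draw4xN           ; 10
        ret

; Usage: characters first, then colors 1024 bytes further up.
        ld hl,frame0_chars      ; hypothetical label
        ld de,12930
        ld a,6
        call draw4xN
        ld hl,frame0_colors     ; hypothetical label
        ld de,12930+1024        ; 13954
        ld a,6
        call draw4xN
```

If the color bytes are stored directly after the character bytes, the second `ld hl` can be dropped, since HL already points there when the first call returns.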
jltursan Posted July 13, 2021

If you absolutely need to save cycles, the fastest way to dump a graphic is stack-blasting; but to do it "easily" you need to assume an even graphic width or use self-modifying code. This way you can POP registers (HL', DE', BC', HL, DE, BC, IX, IY: 16 bytes wide) from the source and PUSH them into the destination. It's tricky and a bit dirty, but there is no faster way to do it.

    ld hl, 12930   ; 10 t-states
    ld (hl), nn    ; 10 t-states
    inc hl         ; 6 t-states
    ld (hl), nn    ; 10 t-states
    inc hl         ; 6 t-states
    ld (hl), nn    ; 10 t-states
    inc hl         ; 6 t-states
    ld (hl), nn    ; 10 t-states
    ld hl, 12970

That's good enough. Sadly, there's no general way to use "INC L" (4 t-states) instead of "INC HL" (6 t-states), because with the Aquarius's 40-byte screen width a row can cross a 256-byte page boundary.
chjmartin2 (Author) Posted July 14, 2021

On 7/13/2021 at 3:32 AM, jltursan said:

> If you're absolutely in need of saving cycles the fastest way to dump a graphic is using stack-blasting; but to do so "easily" you need to assume an even width of graphics or auto-modifying code. This way you can POP registers (HL',DE',BC',HL,DE,BC,IX,IY: 16 bytes width) from source and PUSH them into destination, tricky and a bit dirty but there're no fastest ways to do it

So help me with that, because I am very interested in it. I don't know what stack blasting is, nor do I know how to auto-modify code. Is there somewhere I can read about it?
jltursan Posted July 15, 2021

Stack-blasting techniques use the stack to transfer (or fill) data quickly. Consider that the fastest way to put a single byte in RAM is "LD (HL),A", which runs in 7 t-states; "LDI" takes 16 t-states, which can still pay off if you can take advantage of the registers it updates for you. Using the stack, you can "PUSH xx" 2 bytes at a cost of only 11 t-states. Of course it's not quite that good: you need to load the PUSHed register, and build the rest of the code needed to really transfer a block around it... As an example:

    ld (savesp),sp      ; 20  save the real stack pointer
    ld sp,sprite_data   ; 10  POP reads upward from SP
    pop de              ; 10
    pop bc              ; 10
    exx                 ; 4
    pop de              ; 10
    pop bc              ; 10
    pop ix              ; 14
    pop iy              ; 14
    ld sp,$3000+4       ; 10
    push ix             ; 15
    push iy             ; 15
    ld sp,$3000+40+4    ; 10
    push de             ; 11
    push bc             ; 11
    ld sp,$3000+80+4    ; 10
    exx                 ; 4
    push de             ; 11
    push bc             ; 11
    ld sp,(savesp)      ; 20  restore the real stack pointer

That untested and rough code dumps a 4x3 character image without attributes to a fixed screen position (top left area). Note that each PUSH writes downward from SP, so sprite_data has to be stored in the order the routine consumes it: as written, the first four bytes land on the third row and the last four on the first, with the two 16-bit words of each row swapped. The code can get a lot more complex when you try to parametrize it and make it as "universal" as possible. Right now AF and HL are not being used; you can use them to implement some logic or to carry more data in the stack chain.

This code runs in 190 t-states (230 counting the two instructions that save and restore SP), so about 16 t-states per byte transferred. That is above the 10.5 t-states per byte of a bare POP/PUSH pair because of the LD SP reloads and the slower IX/IY instructions.

About the so-called "auto-modifying" code, it just means that you can dynamically overwrite your code by poking data over it. Usually you're not going to overwrite instruction opcodes (although sometimes you can, and it's really handy); it's more common to change data values like the ones loaded by the "LD SP,nn" instructions above. "nn" can be replaced easily, and then you're self-modifying your code. Hope you get the idea...
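The operand patching described above can be sketched as follows. This is an illustration under stated assumptions, not the poster's code: the labels `set_blit_pos`, `blit4x3` and `patch_row0` to `patch_row2` are made up, the `dw`/`ds` directives depend on the assembler, and the whole routine has to live in RAM, since code in ROM cannot be overwritten. `LD SP,nn` assembles to the opcode byte $31 followed by the low and high bytes of nn, so storing HL at label+1 replaces the operand.

```z80
; Sketch only: all labels here are assumptions made for illustration.
; In: HL = screen address of the block's upper-left cell
set_blit_pos:
        ld de,4
        add hl,de               ; PUSH writes downward, so start past the row
        ld (patch_row0+1),hl    ; +1 skips the LD SP,nn opcode byte
        ld de,40                ; one 40-byte screen row further down
        add hl,de
        ld (patch_row1+1),hl
        add hl,de
        ld (patch_row2+1),hl
        ret

; Clobbers DE, BC, DE', BC', IX, IY.
blit4x3:
        ld (savesp),sp          ; save the real stack pointer
        ld sp,sprite_data       ; POP reads upward from here
        pop de
        pop bc
        exx
        pop de
        pop bc
        pop ix
        pop iy
patch_row0:
        ld sp,0                 ; operand overwritten by set_blit_pos
        push ix
        push iy
patch_row1:
        ld sp,0                 ; operand overwritten by set_blit_pos
        push de
        push bc
patch_row2:
        ld sp,0                 ; operand overwritten by set_blit_pos
        exx
        push de
        push bc
        ld sp,(savesp)          ; restore the real stack pointer
        ret

savesp:      dw 0               ; scratch word in RAM
sprite_data: ds 12              ; 12 bytes, in the order the routine expects
```

Repositioning costs about 110 t-states by my count, so the patching pays off when the block is drawn several times at the same position, for example when cycling animation frames in place.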
chjmartin2 (Author) Posted July 15, 2021

4 hours ago, jltursan said:

> The stack blasting techniques come from the use of stack to quickly transfer (or fill) data. [...]

Yeah, so I have not used the stack at all in any of my programs, and this is an area I want to get better at.
I think I will try to create a 4x3 frame animation demo to see exactly how fast I can get it to run. 190 t-states is faster than my unrolled loop code, which comes to about 204 t-states for the same three rows (3 x 68).
jltursan Posted July 15, 2021 (edited)

I never ended up using a "stack-blasting" sprite routine, as I usually need routines that are a little more universal. Of course they can be enhanced, but they're hard to adapt because you are using all the available registers (the more registers, the faster you can go). However, this next routine is among my favorites; it fills RAM with a value very fast:

    ; HL = start
    ; DE = length (must be a multiple of 8)
    ; A  = byte
    fillram:
            ld (stack),sp   ; only remove if you're sure you don't need the stack
            add hl,de
            ld sp,hl        ; SP = final address + 1
            srl d
            rr e            ; length / 2
            srl d
            rr e            ; length / 4
            srl d
            rr e            ; length / 8
            ld h,a
            ld l,a          ; HL = fill byte in both halves
            ld b,e
            dec de
            inc d           ; B = inner count, D = outer count
    .loop:
            [4] push hl     ; four PUSH HL in a row: 8 bytes per pass
            djnz .loop
            dec d
            jp nz,.loop
            ld sp,(stack)
            ret

Edited July 15, 2021 by jltursan: formatting fixes
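A short usage sketch for the fill routine above, assuming the screen layout used earlier in the thread (character RAM from $3000, 40 bytes per row). The region, the fill byte and the `stack` scratch word are assumptions chosen for illustration; the routine expects `stack` to be a two-byte location in RAM.

```z80
; Usage sketch: blank the 8 rows starting at the row that holds 12930.
        ld hl,12288+16*40       ; $3000 + 16 rows = 12928
        ld de,8*40              ; 320 bytes, a multiple of 8
        ld a,$20                ; fill byte (space in ASCII)
        call fillram

stack:  dw 0                    ; scratch word in RAM for the saved SP
```

With DE = 320 the routine shifts the length down to 40, so the loop body runs 40 times and pushes 8 bytes per pass. One caveat that applies to any stack-blasting code: an interrupt taken while SP points into the fill area would push its return address into the data, so wrapping the call in DI/EI is a common precaution on systems where interrupts are enabled.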