[AQUARIUS] - Screen Block Data Copy


Hi,

 

It was funny to go back and read about stuff I started 10 years ago (wow @jaybird3rd, it has been more than a decade) and realize how much more I know now, and then sad to see how little I have progressed.  I am super excited about having MAME, as it has really sped up my development cycle.  I am working on a quick demo that I think everyone will find fun.

Here is my question.  It is really easy to update a section of the screen if you want to update a whole line: just set up HL, BC and DE, have your screen data labelled .db style, and LDIR your way to success.  Here is the rub: I am very cycle starved in my current program and I only want to update a SECTION of the screen, not the whole line.  I have several THEORIES about the best way to do this in the fewest t-states, but I am frustrated because this is clearly a solved problem and SOMEBODY knows the best way.

I have a "graphic" that is 4 characters wide by 6 characters tall, with both colors and characters.  The total graphic is 48 bytes (24 character bytes plus 24 color bytes).  I have a few versions of it and want to use them to make an animation.  So let's say I wanted to load it in the bottom left side of the screen.

 

12,930 is the starting memory position upper left and 13,133 is the ending memory position lower right.  Color is 13,954 down to 14,157.
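
Those numbers are consistent with a 40-byte row layout in which character RAM starts at $3000 (12288) and color RAM sits 1024 bytes above it: 12930 = 12288 + 16*40 + 2, and 13954 = 12930 + 1024. As a rough sketch under that assumption, a helper like the following could turn a row and column into both addresses. The name `screen_addr` and its register interface are made up for illustration.

```asm
; Hypothetical helper, not from the thread.
; Assumes character RAM at $3000, color RAM at $3400, 40 bytes per row.
; In:  B = row (0-24), C = column (0-39)
; Out: HL = character RAM address, DE = matching color RAM address
screen_addr:
        ld   l,b
        ld   h,0                ; HL = row
        add  hl,hl              ; row*2
        add  hl,hl              ; row*4
        add  hl,hl              ; row*8
        ld   d,h
        ld   e,l                ; DE = row*8
        add  hl,hl              ; row*16
        add  hl,hl              ; row*32
        add  hl,de              ; row*40
        ld   e,c
        ld   d,$30              ; DE = $3000 + column
        add  hl,de              ; HL = $3000 + row*40 + column
        ld   d,h
        ld   e,l
        set  2,d                ; DE = HL + $0400, the same cell in color RAM
        ret
```

With B = 16 and C = 2 this gives HL = $3282 (12930) and DE = $3682 (13954). In a cycle-starved loop you would probably compute this once per move rather than once per frame.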

 

Here are the methods I think I can use, but again, I feel like this is something that somebody else has already optimized, so why reinvent the wheel?  Anyway... I could:

 

ld de, 12930                 ; 10 t-states
ld hl, IMAGECHARLINEONE      ; 10 t-states
ld bc, 4                     ; 10 t-states
ldir                         ; 21 + 21 + 21 + 16 t-states
ld de, 12970
ld hl, IMAGECHARLINETWO
ld bc, 4
ldir
...

It will be like 654 t-states (109 per row times 6 rows), an eternity, and then double it for the color.

 

This just seems like an awfully inefficient way of doing it, and it doesn't use PUSH, POP, IX/IY or anything else.  My alternative is just:

 

ld hl, 12930                  ; 10 t-states
ld (hl), nn                   ; 10 t-states
inc hl                        ; 6 t-states
ld (hl), nn                   ; 10 t-states
inc hl                        ; 6 t-states
ld (hl), nn                   ; 10 t-states
inc hl                        ; 6 t-states
ld (hl), nn                   ; 10 t-states
ld hl, 12970
...

This would be 408 t-states (68 per row), better than LDIR but still quite a lot, and again doubled for the color.

 

Is this a solved problem?  Ideally I would have a more generic routine where I specify the upper left corner and point to the ROM location of the character/color data, and it draws it to the screen.  Any thoughts would be appreciated.  Sorry if it is a basic, obvious question.

 

Thanks,

 

Chris

 

 

 

 

 


If you absolutely need to save cycles, the fastest way to dump a graphic is stack blasting; but to do so "easily" you need to assume an even graphic width or use self-modifying code. This way you can POP registers (HL', DE', BC', HL, DE, BC, IX, IY: 16 bytes wide) from the source and PUSH them to the destination. It's tricky and a bit dirty, but there's no faster way to do it.

 


ld hl, 12930                  ; 10 t-states
ld (hl), nn                   ; 10 t-states
inc hl                        ; 6 t-states
ld (hl), nn                   ; 10 t-states
inc hl                        ; 6 t-states
ld (hl), nn                   ; 10 t-states
inc hl                        ; 6 t-states
ld (hl), nn                   ; 10 t-states
ld hl, 12970

 

That's good enough. Sadly you can't rely on "INC L" instead of "INC HL" in general, because with the Aquarius's 40-byte screen width a row can straddle a 256-byte page boundary.
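
Whether INC L is safe depends on the address, so it can be checked for a fixed position. For the one in the question, the six rows start at $3282, $32AA, $32D2, $32FA, $3322 and $334A, and adding 3 to any of those low bytes does not wrap past $FF. A sketch of one row under that assumption, with placeholder byte values:

```asm
; Row 0 of the graphic at 12930 = $3282. The stored values are placeholders.
        ld   hl,$3282           ; 10 t-states
        ld   (hl),$41           ; 10
        inc  l                  ; 4 (INC HL would be 6)
        ld   (hl),$42           ; 10
        inc  l                  ; 4
        ld   (hl),$43           ; 10
        inc  l                  ; 4
        ld   (hl),$44           ; 10, L is now $85, so no wrap occurred
; Row 1 starts at 12970 = $32AA and ends at $32AD, also inside one page.
        ld   hl,$32AA           ; 10
```

That is 62 t-states per row instead of 68. The saving disappears as soon as the graphic moves to a position where a row crosses a page, so it only suits fixed positions that have been checked by hand.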

 

 



 

So help me with that because I am very interested in it.  I don't know what stack blasting is, nor do I know how to auto-modify code.  Is there somewhere I can read about it?

 


Stack blasting techniques use the stack to transfer (or fill) data quickly. Consider that the fastest way to put a single byte in RAM is "LD (HL),A", which runs in 7 t-states; "LDI" takes 16 t-states, though you can win some speed back if you take advantage of the registers it updates for you.
Now, using the stack, you can "PUSH xx" 2 bytes at a cost of only 11 t-states. Of course it's not quite that good: you need to load the PUSHed register first, and build around it the code needed to really transfer a block...

 

As an example:

 

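ld (savesp),sp           ; 20, saves the real stack pointer
ld sp,sprite_data        ; 10, POP reads upward from the first data byte
pop de                   ; 10, row 0, bytes 0-1
pop bc                   ; 10, row 0, bytes 2-3
exx                      ; 4
pop de                   ; 10, row 1, bytes 0-1
pop bc                   ; 10, row 1, bytes 2-3
pop ix                   ; 14, row 2, bytes 0-1
pop iy                   ; 14, row 2, bytes 2-3
ld sp,$3000+80+4         ; 10, one past the end of row 2
push iy                  ; 15, PUSH writes downward, so the right half goes first
push ix                  ; 15
ld sp,$3000+40+4         ; 10, one past the end of row 1
push bc                  ; 11
push de                  ; 11
ld sp,$3000+4            ; 10, one past the end of row 0
exx                      ; 4
push bc                  ; 11
push de                  ; 11
ld sp,(savesp)           ; 20, restores the real stack pointer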

 

That untested and rough code dumps a 4x3 character image, without attributes, to a fixed screen position (top left area).
The code can get a lot more complex once you try to parametrize it and make it as "universal" as possible. Right now AF and HL are not being used; you can use them to implement some logic, or add them to the stack chain.

 

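This code runs in 190 t-states (plus 40 to save and restore SP), which is about 16 t-states per byte transferred. That is more than the 10.5 of a bare POP/PUSH pair, because of the LD SP and EXX overhead and the slower IX/IY instructions.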

 

As for the so-called "self-modifying" code, it just means that you can dynamically overwrite your own code by poking data over it. Usually you're not going to overwrite instruction opcodes (of course sometimes you can, and it's really handy); it's more common to change data values like the ones loaded by the "LD SP,nn" instructions above. The "nn" can be replaced easily, and then you're self-modifying your code.
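
As a minimal sketch of patching an "LD SP,nn" operand: the opcode of LD SP,nn is one byte, so its 16-bit operand lives at the instruction's address plus one. All labels here are hypothetical. The code has to be assembled to run from RAM (a cartridge ROM cannot be patched), and it assumes nothing else, such as an interrupt, uses the stack while SP points at the screen.

```asm
; In: HL = screen address of the first of four cells in a row
set_row:
        ld   de,4
        add  hl,de              ; PUSH writes downward, so aim one past the row
        ld   (row_sp+1),hl      ; overwrite the nn of the LD SP,nn at row_sp
        ret

; In: DE = bytes 0-1 of the row, BC = bytes 2-3
;     (the low register of each pair lands at the lower address)
put_row:
        ld   (old_sp+1),sp      ; same trick: patch the restore instruction
row_sp: ld   sp,$0000           ; nn is patched by set_row
        push bc                 ; lands in cells 2-3
        push de                 ; lands in cells 0-1
old_sp: ld   sp,$0000           ; nn is patched on entry to put_row
        ret
```

Calling `set_row` once when the graphic moves, and `put_row` every frame, keeps the address arithmetic out of the per-frame path. Patching the restore also replaces a 20 t-state "LD SP,(nn)" with a 10 t-state "LD SP,nn".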

 

Hope you get the idea...



Yeah, so I have not used the stack at all in any of my programs, and this is something I want to get better at.  I think I will try to create a 4x3 frame animation demo to see exactly how fast I can get it to run.  190 t-states is faster than my unrolled loop code.


I never ended up using a "stack-blasting" sprite routine, as I usually need routines that are a little more universal. Of course they can be enhanced, but they're hard to adapt because you are using all the available registers (the more registers, the faster you can go).
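
For comparison, here is a rough, untested sketch of what a more universal routine without stack tricks could look like for a 4-wide graphic. It is not from this thread: the name `draw4`, the register interface and the data layout are assumptions. It relies on LDI clearing the P/V flag when BC reaches zero, so BC doubles as the byte counter and the end-of-image test.

```asm
; Hypothetical routine. Assumes 40 bytes per screen row.
; In:  HL = source data, 4 bytes per row, rows stored top to bottom
;      DE = address of the top-left destination cell
;      BC = 4 * number of rows (at least 4)
; Out: HL = first byte after the data consumed; A, BC, DE changed
draw4:
.row:
        ldi                     ; 16 t-states: (DE) <- (HL), HL++, DE++, BC--
        ldi
        ldi
        ldi
        ret  po                 ; P/V is cleared when LDI brings BC to zero
        ld   a,e
        add  a,40-4             ; step DE from the end of this row to the next
        ld   e,a
        jr   nc,.row
        inc  d                  ; carry into the high byte
        jp   .row
```

A call for the 4x6 graphic in the question might look like this, where `frame1` is a made-up label for 24 character bytes followed by 24 color bytes:

```asm
        ld   hl,frame1
        ld   de,12930           ; top-left cell in character RAM
        ld   bc,4*6
        call draw4
        ld   de,12930+1024      ; same cell in color RAM; HL already points at the colors
        ld   bc,4*6
        call draw4
```

By my count this costs about 96 t-states per row, so it is only a little cheaper than reloading the registers for LDIR on every row, but the position and the data are plain parameters.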

 

However, this next routine is among my favorites. It is used to fill RAM with a value very quickly:


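; HL=start
; DE=length (must be a nonzero multiple of 8)
; A=byte
; "stack" is a 2-byte variable in RAM
fillram:
        ld (stack),sp       ; Only remove if you're sure you don't need the stack
        add hl,de
        ld sp,hl            ; SP = FINAL_ADDRESS+1, PUSH fills downward
        srl d
        rr e                ; LENGTH_TO_FILL/2
        srl d
        rr e                ; LENGTH_TO_FILL/4
        srl d
        rr e                ; LENGTH_TO_FILL/8
        ld h,a
        ld l,a              ; HL = fill byte in both halves
        ld b,e              ; B = low byte of the loop count
        dec de
        inc d               ; D = number of passes through the outer loop
.loop:
        push hl             ; four PUSHes store 8 bytes per pass
        push hl
        push hl
        push hl
        djnz .loop
        dec d
        jp nz,.loop
        ld sp,(stack)
        ret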
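
A possible way to call it, as a sketch: this assumes the 40 x 25 character matrix starts at $3000, and the `dw` directive and the `clear_chars` label are assumptions about your assembler and program rather than anything from the routine above.

```asm
stack:  dw   0                  ; 2-byte RAM variable used by fillram to hold SP

clear_chars:
        ld   hl,$3000           ; start of character RAM (assumed layout)
        ld   de,1000            ; 40 x 25 cells, and 1000 = 125 * 8
        ld   a,$20              ; ASCII space
        call fillram
        ret
```

Each pass of the inner loop is four PUSHes plus a DJNZ, 57 t-states for 8 bytes, or roughly 7 t-states per byte against 21 for LDIR.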
