
Rewrite of XB ROMs


RXB


In order to gain some space in the XB ROMs, I have seen how to eliminate some insanely useless Assembly in XB ROM 1 that is no faster than GPL.

In this case it is GRSUB1 in XB GROM 3, which calls XML GREAD, that is changed. The original code is here:

***********************************************************
* SUBROUTINE TO READ 2 BYTES OF DATA FROM VDP OR ERAM
*  (use GREAD)
***********************************************************
GRSUB1 FETCH @FAC4            Fetch the source addr on ERAM
       DST  *FAC4,@FAC2       Put it in @FAC2
       CZ   @RAMTOP           If ERAM present
       BS   G6823
       DST  2,@FAC4           @FAC4 : Byte count
       XML  GREAD             Read data from ERAM
*                             @FAC6 : Destination addr on CPU
       BR   G6827             ERAM not exists
G6823  DST  V*FAC2,@FAC6      Read data from VDP
G6827  RTN
***********************************************************

RXB NEW PATCH CODE
***********************************************************
GRSUB1 FETCH @FAC4            Fetch the source addr on ERAM
       DST  *FAC4,@FAC2       Put it in @FAC2
       CZ   @RAMTOP           If ERAM present
       BS   G6823
       DST  2,@FAC4           @FAC4 : Byte count
       DST  *FAC2,@FAC6       Read data from ERAM
*                             @FAC6 : Destination addr on CPU
       BR   G6827             ERAM not exists
G6823  DST  V*FAC2,@FAC6      Read data from VDP
G6827  RTN
***********************************************************
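
To make the comparison concrete, here is a rough Python model of the ERAM path only (the RAMTOP test and the VDP branch are left out). It assumes what the listings in this thread show: ADDR1, BCNT1 and ADDR2 sit at >834C, >834E and >8350, the same cells GRSUB1 uses as @FAC2, @FAC4 and @FAC6. The function names, the `run` helper and the source address >A000 are made up for the illustration and are not from the XB source.

```python
# Toy model of CPU memory as 64 KiB of bytes.
FAC2, FAC4, FAC6 = 0x834C, 0x834E, 0x8350  # ADDR1, BCNT1, ADDR2 in the listing


def read_word(mem, addr):
    """Read a big-endian 16-bit word, as the TMS9900 stores it."""
    return (mem[addr] << 8) | mem[addr + 1]


def write_word(mem, addr, value):
    """Write a big-endian 16-bit word."""
    mem[addr] = (value >> 8) & 0xFF
    mem[addr + 1] = value & 0xFF


def xml_gread(mem):
    """Model of the GREAD loop: copy @FAC4 bytes from *FAC2 into @FAC6."""
    src = read_word(mem, FAC2)                       # MOV  *R2,R4
    dst = FAC6                                       # LI   R1,ADDR2
    while True:
        mem[dst] = mem[src]                          # MOVB *R4+,*R1+
        src += 1
        dst += 1
        count = (read_word(mem, FAC4) - 1) & 0xFFFF  # DEC  *R3
        write_word(mem, FAC4, count)
        if count == 0:                               # JNE  GRZ2
            break


def gpl_dst_indirect(mem):
    """Model of DST *FAC2,@FAC6: copy one 16-bit word."""
    write_word(mem, FAC6, read_word(mem, read_word(mem, FAC2)))


def run(routine):
    """Set up the cells the way GRSUB1 does, then run one routine."""
    mem = bytearray(0x10000)
    write_word(mem, 0xA000, 0x1234)  # hypothetical word stored in ERAM
    write_word(mem, FAC2, 0xA000)    # source address
    write_word(mem, FAC4, 2)         # byte count
    routine(mem)
    return read_word(mem, FAC6), read_word(mem, FAC4)
```

In this model both paths leave the same word in @FAC6. The one visible difference is @FAC4: the GREAD loop counts it down to 0, while the DST version leaves it at 2. Whether anything in XB relies on that is not something this sketch can tell.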

 

Now this is the Assembly in XB ROM 1 that is to be removed, as XML GREAD is not any faster than DST *FAC2,@FAC6:

7EA6              AORG >7EA6

           * (RAM to RAM)
           * Read data from ERAM
           * @GSRC  : Source address on ERAM
           * @DEST  : Destination address in CPU
           *           Where the data stored after read from ERAM
           * @BCNT3 : byte count
7EA6 0203  GREAD1 LI   R3,BCNT3          # of bytes to move
7EA8 8356
7EAA 0202         LI   R2,GSRC           Source in ERAM
7EAC 8354
7EAE 0201         LI   R1,DEST           Destination in CPU
7EB0 8358
7EB2 1006         JMP  GRZ1              Jump to common routine
           * Read data from ERAM to CPU
           * @ADDR1 : Source address on ERAM
           * @ADDR2 : Destination address in CPU
           *           Where the data stored after read from ERAM
           * @BCNT1 : byte count
7EB4 0203  GREAD  LI   R3,BCNT1          # of bytes to move
7EB6 834E
7EB8 0202         LI   R2,ADDR1          Source in ERAM
7EBA 834C
7EBC 0201         LI   R1,ADDR2          Destination in CPU
7EBE 8350
           * Common ERAM to CPU transfer routine
7EC0 C112  GRZ1   MOV  *R2,R4
7EC2 DC74  GRZ2   MOVB *R4+,*R1+         Move byte from ERAM to CPU
7EC4 0613         DEC  *R3               One less to move, done?
7EC6 16FD         JNE  GRZ2              No, copy the rest
7EC8 045B         RT
           ************************************************************

There are a few other Assembly routines in XB ROM 1 that are no faster than the equivalent GPL commands, or that run at the same speed but take way more set-up than GPL.

I am looking at maybe almost 1K of Assembly in XB ROM 1 that is much less efficient than the GPL CALL MOVE command, which only takes 7 bytes.

GPL is no slower than the above Assembly routine, because of all the time spent setting things up just to use the Assembly, so replacing it will be a benefit overall.

As an example in GPL:

CALL MOVE @BCNT1,*ADDR1,*ADDR2 only takes 7 bytes and does not require reloading variables before a call to make it work.

So it is 7 bytes vs 34 bytes, but they do the same thing in the same amount of time.

 

I still have to do some timing checks but I do need room to add INTEGERs to XB to go with Floating Point too.


I am thinking the code for GREAD looks very similar to what you showed. There are not too many different ways to read bytes from VDP RAM.

 

7EC0 C112  GRZ1   MOV  *R2,R4
7EC2 DC74  GRZ2   MOVB *R4+,*R1+         Move byte from ERAM to CPU
7EC4 0613         DEC  *R3               One less to move, done?
7EC6 16FD         JNE  GRZ2              No, copy the rest
7EC8 045B         RT

 

If you have room you could make a 2nd, slightly faster one for an even number of bytes, by reading two bytes in a row inside the loop and using DECT on the count.

It could also be a bit faster if R3 contained the actual count and not a pointer to the count.

Something like this?

 

GREAD  LI   R3,BCNT1          Address of byte count
       LI   R2,ADDR1          Source in ERAM
       LI   R1,ADDR2          Destination in CPU
GRZ3   MOV  *R2,R4            Source address in R4
       MOV  *R3,R3            # of bytes to move in R3, not address of BCNT1
GRZ4   MOVB *R4+,*R1+         Move 1st byte from ERAM to CPU
       MOVB *R4+,*R1+         Move 2nd byte from ERAM to CPU
       DECT R3                Decrement counter by two (count must be even)
       JNE  GRZ4              Not done, copy the next pair
       RT
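
A quick way to sanity-check the idea is to model the two loop shapes in Python. This is only a sketch of the control flow, not of cycle timing; the function names are invented for the illustration, and the count is treated as a 16-bit value the way DEC and DECT would see it.

```python
def copy_dec(src, count):
    """Model of the existing loop: one MOVB per pass, DEC on a 16-bit count."""
    out = bytearray()
    passes = 0
    while True:
        out.append(src[len(out)])      # MOVB *R4+,*R1+
        count = (count - 1) & 0xFFFF   # DEC
        passes += 1
        if count == 0:                 # JNE falls through at zero
            return bytes(out), passes


def copy_dect(src, count):
    """Model of the suggested loop: two MOVBs per pass, DECT on the count."""
    if count % 2:
        # An odd count stays odd under DECT, so it would never reach zero.
        raise ValueError("count must be even")
    out = bytearray()
    passes = 0
    while True:
        out.append(src[len(out)])      # MOVB *R4+,*R1+ (1st byte)
        out.append(src[len(out)])      # MOVB *R4+,*R1+ (2nd byte)
        count = (count - 2) & 0xFFFF   # DECT
        passes += 1
        if count == 0:                 # JNE falls through at zero
            return bytes(out), passes
```

For the same even count, the paired loop copies the same bytes in half as many passes, so it executes half as many decrement and jump instructions. For a 2-byte move that is a saving of a single pass.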

 

 


6 hours ago, TheBF said:

I am thinking the code for GREAD looks very similar to what you showed. There are not too many different ways to read bytes from VDP RAM.

 

7EC0 C112  GRZ1   MOV  *R2,R4
7EC2 DC74  GRZ2   MOVB *R4+,*R1+         Move byte from ERAM to CPU
7EC4 0613         DEC  *R3               One less to move, done?
7EC6 16FD         JNE  GRZ2              No, copy the rest
7EC8 045B         RT

If you have room you could make a 2nd, slightly faster one for an even number of bytes, by reading two bytes in a row inside the loop and using DECT on the count.

It could also be a bit faster if R3 contained the actual count and not a pointer to the count.

Something like this?

GREAD  LI   R3,BCNT1          Address of byte count
       LI   R2,ADDR1          Source in ERAM
       LI   R1,ADDR2          Destination in CPU
GRZ3   MOV  *R2,R4            Source address in R4
       MOV  *R3,R3            # of bytes to move in R3, not address of BCNT1
GRZ4   MOVB *R4+,*R1+         Move 1st byte from ERAM to CPU
       MOVB *R4+,*R1+         Move 2nd byte from ERAM to CPU
       DECT R3                Decrement counter by two (count must be even)
       JNE  GRZ4              Not done, copy the next pair
       RT

 

 

Well, the issue is that this Assembly routine is not faster than GPL, as it only moves 2 bytes from VDP to CPU or RAM to CPU.

That is a hell of a waste of space just to move 2 bytes.

The most it moves in XB is only 4 bytes, and the setup in GPL is stupidly complicated.

GPL could do exactly the same thing with no speed loss.


4 hours ago, RXB said:

Well, the issue is that this Assembly routine is not faster than GPL, as it only moves 2 bytes from VDP to CPU or RAM to CPU.

That is a hell of a waste of space just to move 2 bytes.

The most it moves in XB is only 4 bytes, and the setup in GPL is stupidly complicated.

GPL could do exactly the same thing with no speed loss.

I don't know about the application of it, but it is a loop that will move any number of bytes, specified by BCNT1.

And the Assembler form of a loop like this will spin about 10X faster than the GPL interpreter can spin.

(Maybe GPL passes control to its own loop like this. ?? That's what Forth does.)

This is the kind of loop that makes your new HCHAR run so fast. I remember making a suggestion to Lee to use one extra register, that sped it up about 12%.

But if it is used to only move 1 or 2 bytes it will not make a difference in a program as you say. Not worth it.


3 hours ago, oddemann said:

Is this an fix/update of the "standard" Ex Basic?
(That would be cool!)

 

Yes, the idea is to replace many of these Assembly routines and, if I need one, to have a single routine that does it all instead of the 8 different ones that the Assembly guys wrote, as they did not seem to understand that GPL could do it with fewer complications.

It is easy to see that the GPL and Assembly guys did not know what each other were doing, hence you get duplicate routines in both Assembly and GPL, and some are no faster than others.

For example, XML GREAD only ever moves 2 or 4 bytes, so there is no overall speed increase from Assembly.

This alone gives us 34 bytes of Assembly space to use for something else.


3 hours ago, TheBF said:

I don't know about the application of it, but it is a loop that will move any number of bytes, specified by BCNT1.

And the Assembler form of a loop like this will spin about 10X faster than the GPL interpreter can spin.

(Maybe GPL passes control to its own loop like this. ?? That's what Forth does.)

This is the kind of loop that makes your new HCHAR run so fast. I remember making a suggestion to Lee to use one extra register, that sped it up about 12%.

But if it is used to only move 1 or 2 bytes it will not make a difference in a program as you say. Not worth it.

I know XB, and it only calls XML GREAD or XML GREAD1 5 times: 4 of those calls move 2 bytes, and only 1 moves 4 bytes.

That is a hell of a waste of 34 bytes, and it does not return any speed increase, as GPL has to fill 5 different variables to use it. This is inefficient.

Yes, Assembly is faster, but those variables are moved and copied for Assembly, so any speed increase is lost due to GPL having to set it up.

This is like using dirt from a second hole to fill the first one when all you needed was 1 hole not 2.


36 minutes ago, RXB said:

I know XB, and it only calls XML GREAD or XML GREAD1 5 times: 4 of those calls move 2 bytes, and only 1 moves 4 bytes.

That is a hell of a waste of 34 bytes, and it does not return any speed increase, as GPL has to fill 5 different variables to use it. This is inefficient.

Yes, Assembly is faster, but those variables are moved and copied for Assembly, so any speed increase is lost due to GPL having to set it up.

This is like using dirt from a second hole to fill the first one when all you needed was 1 hole not 2.

 

Not to mention that GREAD code is on the 8-bit bus, whereas DST code is on the 16-bit bus (though the code looks more complicated).

 

...lee


1 minute ago, Lee Stewart said:

 

Not to mention that GREAD code is on the 8-bit bus, whereas DST code is on the 16-bit bus (though the code looks more complicated).

 

...lee

Yeah, this is why the GPL MOVE command can sometimes outrun an Assembly version in XB ROM: setting up the Assembly takes more cycles than it is worth.
