RXB Posted March 3, 2023

In order to gain some space in the XB ROMs, I have found a way to eliminate some insanely useless Assembly in XB ROM 1 that is no faster than GPL. In this case it is GRSUB1 in XB GROM 3, which calls XML GREAD. It is changed here:

    ***********************************************************
    * SUBROUTINE TO READ 2 BYTES OF DATA FROM VDP OR ERAM
    * (use GREAD)
    ***********************************************************
    GRSUB1 FETCH @FAC4             Fetch the source addr on ERAM
           DST  *FAC4,@FAC2        Put it in @FAC2
           CZ   @RAMTOP            If ERAM present
           BS   G6823
           DST  2,@FAC4            @FAC4 : Byte count
           XML  GREAD              Read data from ERAM
    *                              @FAC6 : Destination addr on CPU
           BR   G6827              ERAM not exists
    G6823  DST  V*FAC2,@FAC6       Read data from VDP
    G6827  RTN
    ***********************************************************
    RXB NEW PATCH CODE
    ***********************************************************
    GRSUB1 FETCH @FAC4             Fetch the source addr on ERAM
           DST  *FAC4,@FAC2        Put it in @FAC2
           CZ   @RAMTOP            If ERAM present
           BS   G6823
           DST  2,@FAC4            @FAC4 : Byte count
           DST  *FAC2,@FAC6        Read data from ERAM
    *                              @FAC6 : Destination addr on CPU
           BR   G6827              ERAM not exists
    G6823  DST  V*FAC2,@FAC6       Read data from VDP
    G6827  RTN
    ***********************************************************

Now this is the Assembly in XB ROM 1 that is to be removed, as XML GREAD is not any faster than DST *FAC2,@FAC6:

    * 99/4 ASSEMBLER  GREADS  PAGE 0091
                      AORG >7EA6
    * (RAM to RAM)
    * Read data from ERAM
    * @GSRC  : Source address on ERAM
    * @DEST  : Destination address in CPU
    *          Where the data stored after read from ERAM
    * @BCNT3 : byte count
    7EA6 0203  GREAD1 LI   R3,BCNT3     # of bytes to move
    7EA8 8356
    7EAA 0202         LI   R2,GSRC      Source in ERAM
    7EAC 8354
    7EAE 0201         LI   R1,DEST      Destination in CPU
    7EB0 8358
    7EB2 1006         JMP  GRZ1         Jump to common routine
    * Read data from ERAM to CPU
    * @ADDR1 : Source address on ERAM
    * @ADDR2 : Destination address in CPU
    *          Where the data stored after read from ERAM
    * @BCNT1 : byte count
    7EB4 0203  GREAD  LI   R3,BCNT1     # of bytes to move
    7EB6 834E
    7EB8 0202         LI   R2,ADDR1     Source in ERAM
    7EBA 834C
    7EBC 0201         LI   R1,ADDR2     Destination in CPU
    7EBE 8350
    * Common ERAM to CPU transfer routine
    7EC0 C112  GRZ1   MOV  *R2,R4
    7EC2 DC74  GRZ2   MOVB *R4+,*R1+    Move byte from ERAM to CPU
    7EC4 0613         DEC  *R3          One less to move, done?
    7EC6 16FD         JNE  GRZ2         No, copy the rest
    7EC8 045B         RT
    ************************************************************

There are a few other Assembly routines in XB ROM 1 that are no faster than the GPL commands, or run at the same speed but take way more set-up than GPL. I am looking at maybe as much as almost 1K of Assembly in XB ROM 1 that is much less efficient than the GPL CALL MOVE command, which only takes 7 bytes. GPL is no slower than the Assembly routine above, because of all the time spent setting up just to use the Assembly, so replacing it will benefit more overall.

As an example, in GPL CALL MOVE @BCNT1,*ADDR1,*ADDR2 only takes 7 bytes and does not require reloading variables before the call to make it work. So it is 7 bytes vs 34 bytes, but they do the same thing in the same amount of time.

I still have to do some timing checks, but I do need room to add INTEGERs to XB to go with Floating Point too.

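To make the claimed equivalence concrete, here is a minimal Python sketch that models the GREAD loop from the listing next to a model of the patched `DST *FAC2,@FAC6`. It assumes a flat 64K byte array for the CPU address space and reads `*FAC2` as "the word at the address held in FAC2", which is how the post describes it. The helper names (`gread`, `dst_indirect`, `make_memory`) and the sample address >A000 are made up for illustration, and the model says nothing about timing.

```python
# Simplified model, not an emulator: one flat 64K byte array stands in for
# the CPU address space (scratchpad and ERAM together).
FAC2 = 0x834C  # ADDR1 in the listing: holds the source address
FAC4 = 0x834E  # BCNT1 in the listing: holds the byte count
FAC6 = 0x8350  # ADDR2 in the listing: destination bytes land here


def read_word(mem, addr):
    """Read a big-endian 16-bit word, as the TMS9900 stores it."""
    return int.from_bytes(mem[addr:addr + 2], "big")


def write_word(mem, addr, value):
    """Write a big-endian 16-bit word."""
    mem[addr:addr + 2] = (value & 0xFFFF).to_bytes(2, "big")


def gread(mem):
    """Model of GREAD/GRZ1/GRZ2: byte loop driven by the count stored at FAC4."""
    src = read_word(mem, FAC2)                       # GRZ1 MOV *R2,R4
    dst = FAC6                                       # R1 = ADDR2
    while True:
        mem[dst] = mem[src]                          # GRZ2 MOVB *R4+,*R1+
        src = (src + 1) & 0xFFFF
        dst = (dst + 1) & 0xFFFF
        count = (read_word(mem, FAC4) - 1) & 0xFFFF  # DEC *R3
        write_word(mem, FAC4, count)
        if count == 0:                               # JNE GRZ2 falls through at zero
            return


def dst_indirect(mem):
    """Model of 'DST *FAC2,@FAC6': move the two bytes FAC2 points at into FAC6."""
    src = read_word(mem, FAC2)
    mem[FAC6:FAC6 + 2] = mem[src:src + 2]


def make_memory(source_addr=0xA000, data=b"\x12\x34"):
    """Hypothetical test fixture: sample data in 'ERAM', its address in FAC2."""
    mem = bytearray(0x10000)
    write_word(mem, FAC2, source_addr)
    mem[source_addr:source_addr + len(data)] = data
    return mem
```

In this model, calling `gread` with a count of 2 in FAC4 and calling `dst_indirect` leave the same two bytes at FAC6. The one visible difference is that `gread` counts the word at FAC4 down to zero, while the `DST` path never touches it.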
TheBF Posted March 3, 2023

I am thinking the code for GREAD looks very similar to what you showed. There are not too many different ways to read bytes from VDP RAM.

    7EC0 C112  GRZ1   MOV  *R2,R4
    7EC2 DC74  GRZ2   MOVB *R4+,*R1+    Move byte from ERAM to CPU
    7EC4 0613         DEC  *R3          One less to move, done?
    7EC6 16FD         JNE  GRZ2         No, copy the rest
    7EC8 045B         RT

If you have room, you could make a 2nd, slightly faster one for an even number of bytes by reading two bytes in a row inside the loop and using DECT on the count. It could also be a bit faster if R3 contained the actual count and not a pointer to the count.

Something like this. ??

    GREAD  LI   R3,BCNT1
           LI   R2,ADDR1      Source in ERAM
           LI   R1,ADDR2      Destination in CPU
    GRZ3   MOV  *R2,R4
           MOV  *R3,R3        # of bytes to move in R3 not address of BCNT1
    GRZ4   MOVB *R4+,*R1+     Move 1st byte from ERAM to CPU
           MOVB *R4+,*R1+     Move 2nd byte from ERAM to CPU
           DECT R3            decr. counter by two
           JNE  GRZ4          Loop back to the copy, not the set-up at GRZ3
           RT

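As a rough illustration of the DECT idea, the Python sketch below counts loop passes for a 16-bit counter that is decremented by 1 (DEC) or by 2 (DECT) until it reaches zero. The function name `loop_passes` and its safety cap are invented for this example; it models only the counter, not the byte copies or their timing.

```python
def loop_passes(count, step, max_passes=0x10000):
    """Count passes of a 'decrement, JNE' loop on a 16-bit counter.

    step=1 models DEC, step=2 models DECT. Returns None if the counter
    never reaches zero within max_passes (the real loop would keep running).
    """
    for passes in range(1, max_passes + 1):
        count = (count - step) & 0xFFFF  # 16-bit wraparound, as in a register
        if count == 0:                   # JNE falls through when the result is zero
            return passes
    return None


# Even counts: the two-bytes-per-pass loop needs half as many passes.
print(loop_passes(4, 1))  # byte-at-a-time loop with DEC
print(loop_passes(4, 2))  # unrolled loop with DECT

# Odd counts: subtracting 2 from an odd number never gives zero, so the
# unrolled loop is only safe when the count is known to be even.
print(loop_passes(3, 2))
```

For the 2-byte and 4-byte moves discussed in this thread the count is always even, so the unrolled form would take one or two passes. An odd count would never terminate correctly, which is why it is suggested only as a second routine for even counts.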
RXB (Author) Posted March 3, 2023

6 hours ago, TheBF said:

    I am thinking the code for GREAD looks very similar to what you showed. There are not too many different ways to read bytes from VDP RAM.

        7EC0 C112  GRZ1   MOV  *R2,R4
        7EC2 DC74  GRZ2   MOVB *R4+,*R1+    Move byte from ERAM to CPU
        7EC4 0613         DEC  *R3          One less to move, done?
        7EC6 16FD         JNE  GRZ2         No, copy the rest
        7EC8 045B         RT

    If you have room, you could make a 2nd, slightly faster one for an even number of bytes by reading two bytes in a row inside the loop and using DECT on the count. It could also be a bit faster if R3 contained the actual count and not a pointer to the count.

    Something like this. ??

        GREAD  LI   R3,BCNT1
               LI   R2,ADDR1      Source in ERAM
               LI   R1,ADDR2      Destination in CPU
        GRZ3   MOV  *R2,R4
               MOV  *R3,R3        # of bytes to move in R3 not address of BCNT1
        GRZ4   MOVB *R4+,*R1+     Move 1st byte from ERAM to CPU
               MOVB *R4+,*R1+     Move 2nd byte from ERAM to CPU
               DECT R3            decr. counter by two
               JNE  GRZ4          Loop back to the copy, not the set-up at GRZ3
               RT

Well, the issue is that this Assembly routine is not faster than GPL, as it only moves 2 bytes from VDP to CPU or RAM to CPU. That is a hell of a waste of space just to move 2 bytes. The most it moves in XB is only 4 bytes, and the set-up in GPL is stupid complicated. GPL could do exactly the same thing with no speed loss.

oddemann Posted March 3, 2023

Is this a fix/update of the "standard" Ex Basic? (That would be cool!)

TheBF Posted March 3, 2023

4 hours ago, RXB said:

    Well, the issue is that this Assembly routine is not faster than GPL, as it only moves 2 bytes from VDP to CPU or RAM to CPU. That is a hell of a waste of space just to move 2 bytes. The most it moves in XB is only 4 bytes, and the set-up in GPL is stupid complicated. GPL could do exactly the same thing with no speed loss.

I don't know about the application of it, but it is a loop that will move any number of bytes, specified by BCNT1. And the Assembler form of a loop like this will spin about 10X faster than the GPL interpreter can spin. (Maybe GPL passes control to its own loop like this. ?? That's what Forth does.)

This is the kind of loop that makes your new HCHAR run so fast. I remember making a suggestion to Lee to use one extra register, which sped it up about 12%.

But if it is used to move only 1 or 2 bytes, it will not make a difference in a program, as you say. Not worth it.

RXB (Author) Posted March 3, 2023

3 hours ago, oddemann said:

    Is this a fix/update of the "standard" Ex Basic? (That would be cool!)

Yes, the idea is to replace many of these Assembly routines and, if I need one at all, to have a single routine that does it all instead of the 8 different ones the Assembly guys wrote, as they did not seem to understand that GPL could do it with fewer complications.

It is easy to see that the GPL and Assembly guys did not know what each other were doing, hence you get duplicate routines in both Assembly and GPL, and some are no faster than the others.

For example, XML GREAD only ever moves 2 or 4 bytes, so there is no overall speed increase from doing it in Assembly. This alone gives us 34 bytes of Assembly space to use for something else.

RXB (Author) Posted March 3, 2023

3 hours ago, TheBF said:

    I don't know about the application of it, but it is a loop that will move any number of bytes, specified by BCNT1. And the Assembler form of a loop like this will spin about 10X faster than the GPL interpreter can spin. (Maybe GPL passes control to its own loop like this. ?? That's what Forth does.)

    This is the kind of loop that makes your new HCHAR run so fast. I remember making a suggestion to Lee to use one extra register, which sped it up about 12%.

    But if it is used to move only 1 or 2 bytes, it will not make a difference in a program, as you say. Not worth it.

I know XB, and it only calls XML GREAD or XML GREAD1 5 times. In 4 of those calls it moves 2 bytes, and only 1 time does it move 4 bytes.

That is a hell of a waste for 34 bytes, and it does not return any speed increase, as GPL has to fill 5 different variables to use it. This is inefficient.

Yes, Assembly is faster, but those variables are moved and copied for the Assembly, so any speed increase is lost because GPL has to set it up.

This is like using dirt from a second hole to fill the first one when all you needed was 1 hole, not 2.

Lee Stewart Posted March 3, 2023

36 minutes ago, RXB said:

    I know XB, and it only calls XML GREAD or XML GREAD1 5 times. In 4 of those calls it moves 2 bytes, and only 1 time does it move 4 bytes.

    That is a hell of a waste for 34 bytes, and it does not return any speed increase, as GPL has to fill 5 different variables to use it. This is inefficient.

    Yes, Assembly is faster, but those variables are moved and copied for the Assembly, so any speed increase is lost because GPL has to set it up.

    This is like using dirt from a second hole to fill the first one when all you needed was 1 hole, not 2.

Not to mention that the GREAD code is on the 8-bit bus, whereas the DST code is on the 16-bit bus (though the code looks more complicated).

...lee

RXB (Author) Posted March 3, 2023

1 minute ago, Lee Stewart said:

    Not to mention that the GREAD code is on the 8-bit bus, whereas the DST code is on the 16-bit bus (though the code looks more complicated).

    ...lee

Yeah, this is why the GPL MOVE command can sometimes outrun an Assembly version in the XB ROM: setting up the Assembly takes more cycles than it is worth.
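
The set-up argument in this thread can be written as a simple break-even calculation. The Python sketch below is only an illustration: the cost units and every number in it are arbitrary placeholders, not measured cycle counts, and the function name `break_even_bytes` is made up. The only figure taken from the thread is the rough 10:1 per-byte ratio TheBF mentions.

```python
def break_even_bytes(setup_cost, fast_per_byte, slow_per_byte):
    """Smallest byte count n for which the fast routine wins overall.

    The fast path pays a fixed set-up cost: setup_cost + n * fast_per_byte.
    The slow path pays nothing up front:   n * slow_per_byte.
    Returns None if the fast path is not actually faster per byte.
    All arguments are integers in the same arbitrary cost unit.
    """
    saving_per_byte = slow_per_byte - fast_per_byte
    if saving_per_byte <= 0:
        return None
    # Need setup_cost + n*fast < n*slow, i.e. n > setup_cost / saving_per_byte.
    return setup_cost // saving_per_byte + 1


# Placeholder numbers only: per-byte cost 1 vs 10 (the "about 10X" ratio),
# with a made-up set-up cost of 40 units for loading the variables.
n = break_even_bytes(setup_cost=40, fast_per_byte=1, slow_per_byte=10)
print(n)
```

With these placeholder values the fixed set-up only pays for itself from 5 bytes upward, which is the shape of the argument made here for 2-byte and 4-byte moves. Real timing checks, as mentioned in the first post, would be needed to put actual numbers in.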