HOME AUTOMATION, posted January 1, 2023

59 minutes ago, SteveB said: "...the write command comes from >6112"

E/A has no >6112, unless you meant in GROM?

+Lee Stewart, posted January 1, 2023

1 hour ago, SteveB said: "I was able to activate the Multicolor mode. A tutorial suggested using >0000 to >02FF for the Screen Image Table and >0800 to >0FFF for the pattern table. This seems to work, but something is writing data to >0820 to >0822 (literal 'KEY') and >0960 to >096F (constantly updated), destroying my screen. All the examples I found seem to solve this with LIMI 0. I want to provide routines for XB, so I need to keep LIMI 2. Any hints on what I need to do to relocate those memory blocks? PS: Still no clue on the >0820, but the >0960 seems to be a PAB (Peripheral Access Block), the write command comes from >6112 ... a DSR? How can I tell the system not to use this area for PABs?"

Well, >0820 is the XB Crunch Buffer and >0960 is the Value Stack. You can change the location of the Value Stack by changing the address in >836E. I believe that area needs to be large enough for 4 floating point numbers (4x8=32 bytes). I have no idea whether you can change the Crunch Buffer address. Others will know this better than I (@RXB, @senior_falcon).

I think your best bet, however, is to save/restore areas you cannot live without using.

If you want Multicolor Mode to persist upon return to XB, you will need to change quite a few pointers, if I am not mistaken. If it is only to persist while in your code, there is certainly no reason to enable interrupts until you return to XB. Even then, you can test the status word upon entry to your code to see whether to enable interrupts at return time.

...lee

HOME AUTOMATION, posted January 1, 2023

Alright then: The Videoprocessor RAM (link)

SteveB, posted January 1, 2023

Phew .. this gets tricky... I should have stuck with "hello world" for my first assembler project..

Screen image table
This table contains the code of the character to be displayed for each screen position. In all video modes except text mode, the screen is 24 lines x 32 columns, so the table is >0300 bytes in length. It must be aligned on a >0400-byte boundary.

Character pattern table
This table contains the bit pattern to be displayed for each character code. It must be aligned on a >800-byte boundary. It is >1800 bytes long in bitmap mode, >800 bytes in all other modes.

Multicolor needs 192 characters, so more than there are in XB with the overlap of screen image and pattern table. I can't use >0000 to >07FF for the patterns either, as >0370 is also XB reserved memory. I may keep >0000 for the screen image, but need to use a higher pattern area: >1000, >1800, >2000?

When I use >1000 to >1800, can I just update >8324, which is now pointing to VRAM >0985? Just like >836E? Do I need to copy the values from the stack to a new location? I need "only" >0600 bytes for 192 characters, so the stack could start at >1600.

SteveB, posted January 1, 2023

53 minutes ago, Lee Stewart said: "If you want Multicolor Mode to persist upon return to XB, you will need to change quite a few pointers, if I am not mistaken."

Right now Multicolor remains active while waiting for the keyboard in line 130, then the screen automatically returns to XB and Graphics Mode. I was surprised to see this happen...

100 CALL CLEAR :: CALL SCREEN(2)
110 CALL LOAD("DSK4.MCOLOR.OBJ")
120 CALL LINK("MCOLOR")
130 CALL KEY(3,K,S) :: IF S=0 THEN 130

RXB, posted January 1, 2023

1 hour ago, Lee Stewart said: "Well, >0820 is the XB Crunch Buffer and >0960 is the Value Stack. You can change the location of the Value Stack by changing the address in >836E. I believe that area needs to be large enough for 4 floating point numbers (4x8=32 bytes). I have no idea whether you can change the Crunch Buffer address. Others will know this better than I (@RXB, @senior_falcon). I think your best bet, however, is to save/restore areas you cannot live without using. If you want Multicolor Mode to persist upon return to XB, you will need to change quite a few pointers, if I am not mistaken. If it is only to persist while in your code, there is certainly no reason to enable interrupts until you return to XB. Even then, you can test the status word upon entry to your code to see whether to enable interrupts at return time. ...lee"

The best way to avoid all these problems is just to move the Screen Image Table to another location, and to move the VDP Stack above that location so variables and open files do not overwrite the new Screen Image Table. (Problem solved.)

I would use CALL POKER to move the Screen Image Table to >0958 (the VDP Stack location) and move the VDP Stack higher up, above that location.

And no, Lee, the Crunch Buffer address is hard-coded into the ROMs and GROMs.

It took me a year to figure out how to move the VDP Stack so it would not crash XB when you restart and retain the new VDP Stack address.

+Lee Stewart, posted January 1, 2023

29 minutes ago, SteveB said: "When I use >1000 to >1800, can I just update >8324, which is now pointing to VRAM >0985? Just like >836E? Do I need to copy the values from the stack to a new location? I need "only" >0600 bytes for 192 characters, so the stack could start at >1600."

The Value Stack is a temporary stack for floating point calculations. When you make a CALL to your code, both >8324 (base of Value Stack) and >836E (top of Value Stack) should be pointing to the same address, which means the stack is not in use and there is no need to save/restore it.

...lee

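The check Lee describes can be made on entry to an assembly routine. The fragment below is only a minimal sketch, assuming the two scratchpad words behave as described in the post above (>8324 as base, >836E as top); the labels VSBASE, VSTOP, CHKSTK, STKBSY and STKOK and the use of R0 as a result flag are hypothetical and not part of anyone's posted code.

```asm
* Sketch only: test whether the XB Value Stack is empty on entry.
* Assumes >8324 = base and >836E = top of the Value Stack.
VSBASE EQU  >8324           base of Value Stack (scratchpad word)
VSTOP  EQU  >836E           top of Value Stack (scratchpad word)

CHKSTK C    @VSBASE,@VSTOP  base = top means the stack is not in use
       JEQ  STKOK           empty: nothing to save or restore
STKBSY SETO R0              hypothetical flag: >FFFF = stack holds data
       RT
STKOK  CLR  R0              hypothetical flag: 0 = stack is empty
       RT
```

Called with BL @CHKSTK, this leaves the decision in R0 so the caller can skip any save/restore of the stack area when the two pointers match.
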
HOME AUTOMATION, posted January 1, 2023

30 minutes ago, SteveB said: "I was surprised to see this happen..."

That is likely the value from >83D4 being reasserted. Works sort of like a ghost in the system.👻

RXB, posted January 1, 2023

1 hour ago, Lee Stewart said: "The Value Stack is a temporary stack for floating point calculations. When you make a CALL to your code, both >8324 (base of Value Stack) and >836E (top of Value Stack) should be pointing to the same address, which means the stack is not in use and no need to save/restore it. ...lee"

Any time you use a variable or a CALL, XB pushes the name or address being modified or used onto the VDP Stack and pops it off again afterwards. This is why programs lock up: if the entry is never popped off the top of the stack, XB just repeats the same command forever.

SteveB, posted January 1, 2023

3 hours ago, Lee Stewart said: "The Value Stack is a temporary stack for floating point calculations. When you make a CALL to your code, both >8324 (base of Value Stack) and >836E (top of Value Stack) should be pointing to the same address, which means the stack is not in use and no need to save/restore it. ...lee"

When I use >1000 as base for my character pattern table, the stack can grow from >0958 to >0FFF ... this is >06A8 bytes, dec. 1704 ... a waste of VRAM, but I hope the stack will always fit.

Can I "reserve" the VRAM somehow so that it does not get corrupted by a downwards-growing string table etc.? My pattern table ends at >15FF, the first free address is >1600.

RXB, posted January 1, 2023

22 minutes ago, SteveB said: "When I use >1000 as base for my character pattern table, the stack can grow from >0958 to >0FFF ... this is >06A8 bytes, dec. 1704 ... a waste of VRAM, but I hope the stack will always fit. Can I "reserve" the VRAM somehow so that it does not get corrupted by a downwards-growing string table etc.? My pattern table ends at >15FF, the first free address is >1600."

Yes. In RXB, CALL VDPSTACK(5632) will set the VDP Stack after your Pattern Table at >1600 (decimal 5632), and strings or files will go from the top of VRAM down to the VDP Stack.

In RXB you can even restart XB with CALL XB and run as many programs as you want with no problems. Only going back to the Title Screen will reset the VDP Stack location, but that would also restart Graphics mode.

So I would leave the normal graphics for XB where they are and switch to your graphics mode from within the XB programs. That way you can switch back and forth and never crash XB. (This is pretty much what TML does.)

+Lee Stewart, posted January 1, 2023

2 minutes ago, SteveB said: "When I use >1000 as base for my character pattern table, the stack can grow from >0958 to >0FFF ... this is >06A8 bytes, dec. 1704 ... a waste of VRAM, but I hope the stack will always fit."

That is enough space for 213 floating point (FP) values (1704 / 8)! XB goes out of its way to keep the stack small. The XB random number generator, which is pretty complex, uses only 4 stack positions (each FP value is 8 bytes). XB should not need a Value Stack anywhere near that large.

...lee

senior_falcon, posted January 2, 2023

Here is the source code for an early version of T80XB. It shows how to reserve additional space in the VDP RAM and how to return to the graphics mode when returning to XB. You should be able to adapt this technique so you can use the multicolor mode with XB. This method is how The Missing Link, T40XB, T80XB and XB256 work.

Spoiler

*TEST OF 80 COLUMN MODE FOR F18A by Harry Wilhelm, February 2017
*There are a few things that must happen to use the 80 column text mode (T80) within the
*TI Extended BASIC environment. I chose to use these VDP memory locations for the screen image table and
*the character pattern table:
*V1000 is the start of the screen image table (1920 or >780 bytes) - so the stack now must start at V1780
*and not V0958 as with normal XB
*V0A00 is the start of the pattern table - the pattern for ASCII 32 (space) begins at >0A00
*This means an offset like in XB, but instead of >60 it is now >20
*If you wanted to have a second set of characters with inverse video for highlighting text they should
*begin at >0D00 and that would require a screen offset of >80 to display them.
*All this means that all the XB tables, crunch buffer, edit/recall buffer are left untouched by the new
*T80 routines. Also, the two modes are totally independent and changing one screen has no effect on the other.
*The magic that makes it happen is an interrupt driven routine (MONITR) that keeps track of what is happening
*with XB. At program startup, normally XB uses >0958 as the top of the stack. These routines need to use
*>1780 as the top of the stack. The MONITR looks in certain memory locations to see if the top of the
*stack has been reset to >0958 by a "garbage collection". If so, then it will change the pointers to the top
*of the stack back to >1780.
*MONITR also keeps track of whether a program is running or not running. When a program starts to
*run it will look to see why - was it CON that made the program start, or was it RUN. If CON then was the
*program running in the T80 mode? If so then it sets the registers for T80. When a program breaks the
*registers are automatically reset to the normal XB ones.
       DEF  T80ON
       DEF  T80
       DEF  G32
       DEF  DISPLY
       DEF  CLS80
       DEF  TCOLOR
FAC    EQU  >834A           \
NUMASG EQU  >2008
NUMREF EQU  >200C
STRASG EQU  >2010
STRREF EQU  >2014
XMLLNK EQU  >2018
KSCAN  EQU  >201C           EXTENDED BASIC EQUATES
VDPWA  EQU  >8C02
VDPWD  EQU  >8C00
VDPRD  EQU  >8800
VSBR   EQU  >2028
VWTR   EQU  >2030
ERR    EQU  >2034
VMBW   EQU  >2024
VMBR   EQU  >202C
*********************************
VSBW   SWPB R0
       MOVB R0,@>8C02       bl routine, set up same as
       SWPB R0
       ORI  R0,>4000        blwp @vsbw
       MOVB R0,@>8C02       to run faster
       MOVB R1,@>8C00
       ANDI R0,>BFFF
       B    *R11
*******************************************************************************
*GPLLNK AND DSRLINK FROM THE SMART PROGRAMMER
********************************************************************************
GPLWS  EQU  >83E0
GR4    EQU  GPLWS+8
GR6    EQU  GPLWS+12
LDGADD EQU  >60
XTAB27 EQU  >200E
GETSTK EQU  >166C
GPLLNK DATA GLNKWS
       DATA GLINK1
RTNAD  DATA XMLRTN
GXMLAD DATA >176C
       DATA >50
GLNKWS EQU  $->18
       BSS  >08
GLINK1 MOV  *R11,@GR4
       MOV  *R14+,@GR6
       MOV  @XTAB27,R12
       MOV  R9,@XTAB27
       LWPI GPLWS
       BL   *R4
       MOV  @GXMLAD,@>8302(R4)
       INCT @>8373
       B    @LDGADD
XMLRTN MOV  @GETSTK,R4
       BL   *R4
       LWPI GLNKWS
       MOV  R12,@XTAB27
       RTWP
****************************************************
T80SCN EQU  >1000
WKSP   BSS  32
BLWPWS BSS  32
MONWS  BSS  32
BUFFER BSS  256
SCRBUF BSS  80
****************************
CONTXT DATA >A3C3,>AFCF,>AECE   CcOoNn with offset
HX06B0 DATA >06B0
HX0870 DATA >0870
BOTSTK DATA >1780           1000+24*80 bottom of stack (>1000 to >177F is table for 80 column screen)
HX0958 DATA >0958
HX8080 DATA >8080
DEC80  DATA 80
****************************************************
*INTERRUPT ROUTINE BELOW
*MONITR flags:
*R5 is a copy of the last XB run flag from >8344
*R6 is whether program was in G32 or T80 mode 0=G32 >ffff = T80
MONITR LWPI MONWS
*this section looks to see if there has been a garbage collection
*if there has been, then we need to reset from >0958 to >1780 (BOTSTK)
       C    @>836E,@HX0958  a quick check to see if garbage collection happened - no slow VDP
       JNE  MONITG          if NE then a garbage collection has not just happened
*garbage collection has happened or is happening, need to check further to see if it is finished
       LI   R0,>0388
       LI   R1,MONWS+6
       LI   R2,2
       BLWP @VMBR           read two bytes from V0388 into R3
       C    R3,@HX0958
       JNE  MONITG          not done with garbage collection
*done with garbage collection, so reset pointers
       MOV  @BOTSTK,R3      >1780 to R3
       BLWP @VMBW           write 2 bytes from R3 to V0388
       MOV  R3,@>8324
       MOV  R3,@>836E
*at this point have reset pointers so the bottom of the stack is where we want them
*******************************************************************
MONITG CB   @>8344,R5       >8344 is the run flag for XB. 0 if not running, >ffff if running
       JNE  RCHANG          if NE then there has been a change in RUN flag
MONBK  LWPI >83E0           go back
       B    *R11
*run state has changed if program gets here
RCHANG MOVB @>8344,R5       store the new run state
       JEQ  MONBK           IF =0 then prog is not running, nothing to do so go back
*program just started this cycle, need to see if it is from RUN or from CON
*To do this we copy 16 bytes from the crunch buffer in VDP and look for CON or con
NEWRUN LI   R0,>08C0        start of crunch buffer
       LI   R1,BUFFER
       LI   R2,16
       BLWP @VMBR           read 16 bytes from the crunch buffer, will look for CON
*get rid of leading spaces if any
MONI2A CB   *R1,@HX8080     a space?
       JNE  MONI2B
       INC  R1
       JMP  MONI2A
*leading spaces are all gone when we get here
MONI2B LI   R2,CONTXT       "CcOoNn" with offset
MONI2C CB   *R1,*R2+
       JEQ  MONI2D
       CB   *R1,*R2
       JNE  NOTCON          not con
MONI2D INC  R1
       INC  R2
       CI   R2,CONTXT+6
       JNE  MONI2C          not a match for con
*CON was found in edit/recall buffer
BP1    MOV  R6,R6           did program break in T80 or G32 mode
       JEQ  MONBK           if eq then program broke in graphics mode, just go back
       LI   R3,T80REG       broke in T80 mode, so set T80 registers
       BL   @REGSET
       JMP  MONBK
*A new RUN below (i.e. not a CON)
NOTCON CLR  R6              program always starts in the graphics mode so clear T80 flag
       CLR  @CHLFLG         Flag that tells whether T80 characters were loaded
       JMP  MONBK
************************************
*T80ON sets up interrupt routine for the 80 column mode in F18A
*CALL LINK("T80ON")
**********************************
T80ON  LWPI WKSP
       CLR  @MONWS+10
       MOVB @>8344,@MONWS+10   Move the run state byte into MSB R5 of monitor workspace
       CLR  @MONWS+12       flag for graphics mode =0
       LI   R0,MONITR
       MOV  R0,@>83C4       start up MONITR
BK2XB  LWPI >83E0           return to XB
       B    @>006A
*************************************************************************
G32    LWPI WKSP
       CLR  @MONWS+12       R6 of monitor workspace. Set to G32
       LI   R3,G32REG
       JMP  TX80A
***********************************************
CHLFLG DATA 0               characters loaded flag 0 if not loaded, -1 if loaded
T80    LWPI WKSP
       MOV  @CHLFLG,R0
       JNE  CHSETY          if not 0 then characters already loaded so do not reload them or clear screen
       BL   @CL80SB         clears the screen for 80 column display
       CLR  @>83C4          turn off interrupt routine so GPLLNK can happen without trouble
*load character set below
       SETO @CHLFLG         set characters loaded flag
       LI   R0,>09FF        table starts at >0A00 but we load them 1 pixel higher for better lower case
       MOV  R0,@FAC
       BLWP @GPLLNK
       DATA >0018           load small capitals
       AI   R0,>0200
       MOV  R0,@FAC
       BLWP @GPLLNK
       DATA >004A           lower case letters
       LI   R0,MONITR
       MOV  R0,@>83C4       restart interrupt routine (this could be put into gpllnk instead)
       LI   R1,LCDEFS       lower case definitions
CHSETZ MOV  *R1+,R2         length to R2
       JEQ  CHSETY          done if length=0
       MOV  *R1+,R0
       BLWP @VMBW
       A    R2,R1
       JMP  CHSETZ
CHSETY SETO @MONWS+12       R6 of monitor workspace. If 0 then in g32 mode; if >ffff then in t80 mode
       LI   R3,T80REG
TX80A  BL   @REGSET         load T80 registers
       JMP  BK2XB           and return to XB
*******************************
CLS80  LWPI WKSP            clears the 80 column screen like CALL CLEAR
       BL   @CL80SB
       JMP  BK2XB
*sub below is used by CLS80 and by T80
CL80SB LI   R7,>0050        bytes are swapped (5000) start writing at vdp >1000
       MOVB R7,@VDPWA
       LI   R1,>4000        a space (with offset of >20) into MSB of R1
       SWPB R7
       MOVB R7,@VDPWA
       LI   R8,80*24
CL80LP MOVB R1,@VDPWD
       DEC  R8
       JNE  CL80LP
       B    *R11
***********************************
TCOLOR LWPI WKSP
       CLR  R1
       BL   @GETNUM         gets a number within the limits (1-16 below) and puts in FAC
       DATA 1
       DATA 16
       MOV  @>834A,R4       foreground color to R4
       DEC  R4
       SLA  R4,4            to msn of lsb - i.e. >0004 becomes >0040
       BL   @GETNUM         background color to FAC
       DATA 1
       DATA 16
       A    @>834A,R4
       DEC  R4
       SWPB R4
       LI   R3,T80REG       R3 points to T80 registers
       MOVB R4,@9(R3)       change the colors in the T80REG table
       MOV  @MONWS+12,R5    check R6 of MONWS to find out if in T80 mode
       JEQ  TCBACK          not in T80
       BL   @REGSET         in T80, so set the registers to change color
TCBACK JMP  BK2XB
***************************************
DISPLY LWPI WKSP
*      MOV  @>8312,R8
*      SRL  R8,8            number of arguments in R8
       CLR  R1
       BL   @GETNUM         gets a number within the limits (1-25 below) and puts in FAC
       DATA 1
       DATA 48
       MOV  @>834A,R3       row to R3
       DEC  R3              now row 1 is row 0
       MPY  @DEC80,R3       multiply x 80 - now R4 has (ROW-1)*80
       BL   @GETNUM         col to FAC
       DATA 1
       DATA 80
       A    @>834A,R4
       AI   R4,>0FFF        now R4 points to screen address
DISPXX INC  R1              next item in list
       CLR  R0
*find out if string or number by looking at >8302. If odd then string; if even then number
       MOVB @>8302,R8
       SRL  R8,9
       JOC  PRSTR           odd, so it is a string
*print out a number
       BLWP @NUMREF
       MOVB R0,@>8355       set to BASIC format by clearing byte at >8355
       BLWP @XMLLNK
       DATA >06             convert number to string
       LI   R2,>8300
       AB   @>8355,@WKSP+5  now R2 points to start of string
       MOVB @>8356,R3       length byte to MSB of R3
       JMP  DISPL1
*print out a string
PRSTR  LI   R2,BUFFER
       SETO *R2             longest possible string is 255
       BLWP @STRREF         get the string to print
       MOVB *R2+,R3         length of string to MSB of R3
       SRL  R3,8            length of string in to LSB of R3
DISPL1 MOV  R4,R0           address on screen into R0
DISPL3 CI   R0,>1780
       JLT  DISPL4          if on screen then print
       BLWP @SCROLL         otherwise scroll
       AI   R0,-80
       JMP  DISPL3          and check again
DISPL4 MOV  R3,R3           a null string?
       JEQ  DSPEND          if null string then go back
DISP5  MOVB *R2+,R1
       AI   R1,>2000        add in screen offset of >20
       BL   @VSBW           N.B. THIS IS BL, NOT BLWP
       INC  R0
       DEC  R3              done printing?
       JNE  DISPL3
DSPEND B    @BK2XB
***********************
SCROLL DATA BLWPWS
       DATA SCROL1
SCROL1 LI   R0,>1050        row 2, column1
       LI   R1,SCRBUF
       LI   R2,80
       LI   R3,23
SCLOOP BLWP @VMBR           read 80 bytes from a row
       S    R2,R0           subtract 80
       BLWP @VMBW           write 80 bytes 1 row up
       AI   R0,160          down 2 rows
       DEC  R3              do it 23 times
       JNE  SCLOOP
       S    R2,R0           gets here with R0 at Row 25, column 1, so subtract 80
       LI   R1,>4000        space with the offset
SCLP2  BL   @VSBW
       INC  R0
       DEC  R2
       JNE  SCLP2
SCRT   RTWP
**********************************************
GETNUM INC  R1
GETNU1 CLR  R0
       BLWP @NUMREF
       BLWP @XMLLNK
       DATA >12B8           CFI
       C    @>834A,*R11+
       JLT  GTNERR
       C    @>834A,*R11+
       JGT  GTNERR
       B    *R11
************************************************
GTNERR LI   R0,>1E00
       BLWP @ERR
******************************************
REGSET MOVB @3(R3),@>83D4
REGSE  MOV  *R3+,R0
       JLT  REGSE1
       BLWP @VWTR
       JMP  REGSE
REGSE1 B    *R11
T80REG DATA >0004,>01F0,>0207,>0401,>07F4,>FFFF
G32REG DATA >0000,>01E0,>0200,>0400,>0717,>FFFF
************************
LCDEFS DATA 8,>0B78
       DATA >3844,>4444,>4444,>3800   O
       DATA 8,>0A80
       DATA >3C44,>4C54,>6444,>7800   0
       DATA 16,>0C08
       DATA >0000,>3804,>3C44,>3C00   a
       DATA >4040,>7844,>4444,>7800   b
*      DATA >0000,>3C40,>4040,>3C00   c
       DATA 144,>0C20
       DATA >0404,>3C44,>4444,>3C00   d
       DATA >0000,>3844,>7C40,>3C00   e
       DATA >1824,>2078,>2020,>2000   f
       DATA >0000,>3C44,>443C,>0438   g
       DATA >4040,>7844,>4444,>4400   h
       DATA >1000,>3010,>1010,>3800   i
       DATA >0800,>0808,>0808,>4830   j
       DATA >4040,>4448,>7048,>4400   k
       DATA >3010,>1010,>1010,>3800   l
       DATA >0000,>6854,>5454,>5400   m
       DATA >0000,>5864,>4444,>4400   n
       DATA >0000,>3844,>4444,>3800   o
       DATA >0000,>7844,>4478,>4040   p
       DATA >0000,>3C44,>443C,>0404   q
       DATA >0000,>5864,>4040,>4000   r
       DATA >0000,>3C40,>3804,>7800   s
       DATA >2020,>7820,>2024,>1800   t
       DATA >0000,>4444,>4444,>3C00   u
*      DATA >0000,>4444,>2828,>1000   v
*      DATA >0000,>4454,>5454,>2800   w
*      DATA >0000,>4428,>1028,>4400   x
       DATA 8,>0CC8
       DATA >0000,>4444,>443C,>0438   y
*      DATA >0000,>7C08,>1020,>7C00   z
       DATA 8,>09F0
       DATA >7C7C,>7C7C,>7C7C,>7C00
       DATA 0
       END

SteveB, posted January 3, 2023 (edited)

On 12/31/2022 at 11:19 AM, Asmusr said: "Yes my latest Mario demo uses the column layout for both VDP RAM and the CPU RAM buffer, which gives a small advantage when transferring the buffer to the VDP compared to the linear layout. But the initial question of whether to store one or two pixels per byte in the buffer is still the more important."

I could use some advice on speed, as my Buffer-to-VRAM routine needs 120ms ... a little too much to build a usable library.

****************************************************************************
*
* CALL MCSYNC - Writes the CPU RAM Buffer to VRAM
*
****************************************************************************
MCSYNC LI   R0,CHRPAT       VRAM Address of CharPat table
       LI   R4,SCRN1        Address of even column pixel
       LI   R5,SCRN1+48     Address of odd column pixel
       LI   R6,48           Row Count until 0
       LI   R7,64           Columns Count until 0
* Setup VRAM Address
       LIMI 0               Disable interrupts so no one interferes with VDP
       SWPB R0
       MOVB R0,@VDPWA       Write lo-byte of VRAM Address
       SWPB R0
       ORI  R0,>4000        Set R/W bits 14 and 15 to 01
       MOVB R0,@VDPWA       Write hi-byte of VRAM Address
LPSYNC CLR  R1              Clear R1 for lower byte in shift op
       MOVB *R4+,R1         read even column byte in MSB
       MOVB *R5+,R8         read odd column byte in MSB
       SLA  R1,4            shift 4 bits for left nybble
       SOC  R8,R1           merge right nybble out of R8
       MOVB R1,@VDPWD       Write "character" to VRAM
       DEC  R6
       JNE  LPSYNC          column not done?
       LI   R6,48           Row Count until 0 again
       AI   R4,48           next column ... skip one (odd/even)
       AI   R5,48
       DECT R7
       JNE  LPSYNC          not all columns done?
       LIMI 2               Enable Interrupts
       RT

My thoughts on this code:

I eliminated the column buffer in CPU RAM and now write byte-by-byte, as the VRAM pattern table and my buffer are both column-aligned.

The column layout of my SCRN1 buffer forces me to use two MOVB. In row layout it could be a 16-bit word fetch for two adjacent pixels, but I would have to add 64 to get to the next line instead of using *R4+ (2x20 cycles vs. 22+14 cycles?). Calculating an address from (x,y) would be faster as well, as multiplying by 48 takes longer than SLA Rx,6 for multiplying by 64.

What is the minimal time it takes to transfer 1536 bytes from CPU RAM to VRAM?

Should my buffer be identical to the VRAM layout, with all the consequences of ugly pixel-coordinate calculations and nybble updates (read before write)?

Clearing my buffer with 3072 bytes takes 60ms ... half the bytes would take half the time. I can't understand why this isn't faster than writing to VRAM.

MCCLR2 MOV  @PCOLOR,R4
       SWPB R4
       SOC  @PCOLOR,R4      Color in MSB and LSB
       LI   R6,3072         3072 bytes to clear, two at a time (word)
       LI   R1,SCRN1
LPMCCL MOV  R4,*R1+         Write two color bytes as one word
       DECT R6
       JNE  LPMCCL
       RT

Any hints and thoughts are welcome.

Steve

Edited January 4, 2023 by SteveB (copy error in MCCLR2)

Tursi, posted January 3, 2023

5 minutes ago, SteveB said: "Clearing my buffer with 3072 bytes takes 60ms ... half the bytes would take half the time. I can't understand why this isn't faster than writing to VRAM."

Because VRAM is the same speed as any 8-bit RAM access. The only thing that makes accessing the VDP slower is when you need to change the address.

Speed up your clear loop by unrolling - four MOVs will give you a good tradeoff between unroll and memory space. The benefit seems to level off at 8. Right now you're spending half your time counting and branching instead of copying data.

I don't have too much advice off the cuff for the copy loop. Storing the value "48" in a register will be faster than using immediates in your increment phase. Also store VDPWD in a register for the same reason. Do away with R8 and take advantage of the memory-to-memory architecture (SOC *R5+,R1). That's all I see without data structure changes (and I didn't investigate that).

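The unrolling Tursi suggests could look something like the fragment below. It is only a sketch built on SteveB's MCCLR2: the label names MCCLR4 and LPCLR4 are hypothetical, and PCOLOR and SCRN1 are declared here as stand-ins so the fragment assembles on its own (PCOLOR is assumed to hold the colour in one byte and zero in the other, as MCCLR2 implies).

```asm
* Sketch only: clear 3072 bytes with four MOVs per pass.
PCOLOR DATA >0001           stand-in colour word (colour in LSB)
SCRN1  BSS  3072            stand-in for the CPU RAM pixel buffer

MCCLR4 MOV  @PCOLOR,R4
       SWPB R4
       SOC  @PCOLOR,R4      colour in MSB and LSB
       LI   R1,SCRN1        start of buffer
       LI   R6,384          3072 bytes / 8 bytes per pass
LPCLR4 MOV  R4,*R1+         four word writes per pass ...
       MOV  R4,*R1+
       MOV  R4,*R1+
       MOV  R4,*R1+
       DEC  R6              ... and only one count and branch
       JNE  LPCLR4
       RT
```

The count-and-branch pair now runs 384 times instead of 1536 times, which is where the saving comes from; going to eight MOVs per pass would halve it again at the cost of eight more bytes of code.
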
Tursi, posted January 3, 2023 (edited)

25 minutes ago, SteveB said: "The column layout of my SCRN1 buffer forces me to use two MOVB. In row layout it could be a 16-bit word fetch for two adjacent pixels, but I would have to add 64 to get to the next line instead of using *R4+ (2x20 cycles vs. 22+14 cycles?). Calculating an address from (x,y) would be faster as well, as multiplying by 48 takes longer than SLA Rx,6 for multiplying by 64."

Thinking about this - there's nothing that says you can't waste a little memory and have 64-byte columns so you can shift instead of multiplying by 48.

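With 64-byte columns the address calculation reduces to a shift and two adds. The fragment below is a minimal sketch of that idea, not code from the thread: PIXADR and SCRN64 are hypothetical names, and the register assignment (x in R1, y in R2) is an assumption made for illustration.

```asm
* Sketch only: buffer address of pixel (x,y) with 64-byte columns.
* Entry: R1 = x (0-63), R2 = y (0-47).  Exit: R1 = address in buffer.
PIXADR SLA  R1,6            x * 64, a shift instead of MPY by 48
       A    R2,R1           add the row within the column
       AI   R1,SCRN64       add the buffer base address
       RT

SCRN64 BSS  4096            64 columns * 64 bytes (16 per column unused)
```

The padded buffer takes 4096 bytes instead of 3072, so the price of the faster address calculation is 1 kB of CPU RAM.
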
Asmusr (Author), posted January 3, 2023 (edited)

I combine pixels from two rows at the same time:

       mov  *r4+,r1         ; Even pixel
       sla  r1,4            ; Shift to high nybble
       soc  *r5+,r1         ; Odd pixel
       movb r1,*r15         ; Write to VDP
       movb *r6,*r15        ; Write low byte to VDP

Edit: r6 contains the address of r1 low byte. r15 contains VDPWD.

+TheBF, posted January 3, 2023

For a speed reference, I just tried writing 3072 bytes (>C00) and timed it with the 9901 timer: it took 1128 uS (1.128 mS).

This includes pulling 2 parameters off the stack into registers and setting the VDP address with a sub-routine call.

SteveB, posted January 3, 2023

2 hours ago, Tursi said: "Thinking about this - there's nothing that says you can't waste a little memory and have 64 byte columns so you can shift instead of multiply by 48."

If there is one thing the 99er has in abundance, it is memory ... 🤣

The 8kB lower memory would indeed have room for this additional 1kB, but blanking the memory would be more complicated. Either I stop at 48 bytes per column and move to the next, or I have 16 more bytes to blank for each column. Both may kill any gain.

SteveB, posted January 3, 2023

3 hours ago, Asmusr said: "I combine pixels from two rows at the same time:

       mov  *r4+,r1         ; Even pixel
       sla  r1,4            ; Shift to high nybble
       soc  *r5+,r1         ; Odd pixel
       movb r1,*r15         ; Write to VDP
       movb *r6,*r15        ; Write low byte to VDP

Edit: r6 contains the address of r1 low byte. r15 contains VDPWD."

Your code is more compact than mine and you use registers only, which is four cycles faster than label access... As I preserve the XB context, I am not as free with the Scratchpad as you were in your 2014 demo, but I think I will get this somehow with STWP dynamically. Thank you, this looks promising!

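The STWP idea can be sketched as follows. This is a hedged illustration rather than SteveB's or Asmusr's actual setup code: the label SETUP is hypothetical, and it assumes the routine runs in a workspace where R6 and R15 are free to overwrite (in the GPL workspace at >83E0 that would not be a safe assumption for R15).

```asm
* Sketch only: point R6 at the low byte of R1 without knowing the
* workspace address in advance, and keep the VDP write port in R15.
VDPWD  EQU  >8C00           VDP write data port

SETUP  STWP R6              R6 = workspace pointer = address of R0
       AI   R6,3            R1 is at WP+2, so its low byte is at WP+3
       LI   R15,VDPWD       write port in a register for MOVB ...,*R15
       RT
```

After this setup, MOVB R1,*R15 sends the high byte of R1 and MOVB *R6,*R15 sends its low byte, which is the pair of writes used in the quoted loop.
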
HOME AUTOMATION, posted January 3, 2023

I think that the registers (R1, R2, R3...) are also LABELS. Maybe you mean Symbolic Memory Addressing.

SteveB, posted January 3, 2023

3 hours ago, TheBF said: "For a speed reference I just tried writing 3072 bytes (C00) and timing with the 9901 timer it took 1128 uS ( 1.128 mS) This includes pulling 2 parameters off the stack into registers and setting the VDP address with a sub-routine call."

I am trying to build an XB library, so my measurements are always done with the CALL LINK command ... which contributes to the slowness ... a CALL without parameters takes about 21ms, with 4 parameters 60ms, and this does not change using RXB or XB 2.9. Any useful application in games etc. will require compiling. When subtracting 20ms for the plain call, both routines aren't that bad. With the proposed changes I see some potential.

But writing 3kB in a little over 1/1000 of a second? Let's say we unroll all MOVs, no overhead for a loop.

       MOV  R1,*R2+
       MOV  R1,*R2+
       MOV  R1,*R2+
       (1536 times)

According to the Data Sheet, page 28f, each line takes

Cycles: 14 + 8
Memory Access: 4 + 2
Wait states: 8 for 8-bit CPU RAM

T = tc * (C + W * M) = 0.333 uS * (22 + 6 * 8) = 0.333 * 70 uS = 23.3 uS

for 3kB: 1536 * 23.3 uS = 35,804 uS = 35.8 mS ... without any overhead.

Or have I gotten something wrong?

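The estimate can be written out term by term. The first evaluation uses the figures exactly as posted. The second is only a sketch under assumptions that the post does not state: that the workspace registers sit in 16-bit scratchpad RAM and cost no wait states, and that each of the three remaining accesses of MOV R1,*R2+ (instruction fetch, destination read, destination write) costs 4 wait states.

```latex
\begin{align*}
T &= t_c\,(C + W \cdot M) \\[4pt]
T_{\text{posted}} &= 0.333\,\mu\text{s} \times (22 + 8 \times 6)
                   = 0.333\,\mu\text{s} \times 70 \approx 23.3\,\mu\text{s},
  & 1536 \times 23.3\,\mu\text{s} &\approx 35.8\,\text{ms} \\[4pt]
T_{\text{assumed}} &= 0.333\,\mu\text{s} \times (22 + 4 \times 3)
                    = 0.333\,\mu\text{s} \times 34 \approx 11.3\,\mu\text{s},
  & 1536 \times 11.3\,\mu\text{s} &\approx 17.4\,\text{ms}
\end{align*}
```

Which of the two applies depends on where the workspace and the buffer actually live, so the second line should be read as a lower bound under those assumptions rather than a measured figure.
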
SteveB, posted January 4, 2023

4 hours ago, Tursi said: "Speed up your clear loop by unrolling - four MOV's will give you a good tradeoff between unroll and memory space."

I tried 4 and 8 MOVs ... 52ms vs. 50ms. Taking into account the 20ms for the CALL LINK and 5ms for the handling of one parameter, that leaves 27ms (4 MOVs) vs. 25ms (8 MOVs) vs. 35ms without unrolling... and faster than my theoretical speed from above !?! Is the 0.333 uS not valid for the 99/4A, or do I have a fundamental misunderstanding of the tables in the data sheet?

+TheBF, posted January 4, 2023

You caught me. I used the wrong parameters for the VWRITE function. Using a low-res timer based on the interrupt, it shows 30mS here. Mea culpa.

+TheBF, posted January 4, 2023

To double check, here is the timing from the Classic99 Debugger:

\ get parameters from Forth
A7C4 C004 mov R4,R0 (18)            DUP byte count to R0
A7C6 C136 mov *R6+,R4 (30)          Pop VDP address
A7C8 C0B6 mov *R6+,R2 (30)          Pop CPU RAM address
\ test if the byte count is not zero
A7CA C000 mov R0,R0 (18)
A7CC 1309 jeq >a7e0 (12)
\ set VDP address
A7CE 06A0 bl @>a73e (32)
     A73E
A73E 0264 ori R4,>4000 (22)
     4000
A742 02A1 stwp R1 (12)
A744 0300 limi >0000 (24)
     0000
A748 D821 movb @>0009(R1),@>8c02 (50)
     0009
     8C02
A74E C804 mov R4,@>8c02 (38)
     8C02
A752 045B b *R11 (20)
\ set R3 to the write port. This improves speed by ~12%
A7D2 0203 li R3,>8c00 (20)
     8C00
\ write the bytes
A7D6 D4F2 movb *R2+,*R3 (40)
A7D8 0600 dec R0 (14)
A7DA 16FD jne >a7d6 (14)
\ enable interrupts
A7DC 0300 limi >0002
     0002
\ refill top of stack cache register
A7E0 C136 mov *R6+,R4
\ return to Forth
A7E2 045A b *R10

So the loop is taking 40+14+14 = 68 cycles times .333 uS = 22.64 uS

22.64 * 3072 = 69,562 uS = 69.6 mS

So my interrupt-based elapsed timer is not accurate with interrupts being off so much in this code. No surprise there, I guess.
