
Multicolor mode - the mode everybody wants


Asmusr


1 hour ago, SteveB said:

I was able to activate the Multicolor mode. A tutorial suggested using >0000 to >02FF for the Screen Image Table and >0800 to >0FFF for the pattern table.

 

This seems to work, but something is writing data to >0820 to >0822 (literal 'KEY') and >0960 to >096F (constantly updated), destroying my screen. All examples I found seem to solve this with LIMI 0. I want to provide routines for XB, so I need to keep LIMI 2. 

 

Any hints on what I need to do to relocate those memory blocks?

 

PS: Still no clue on the >0820, but the >0960 seems to be a PAB (Peripheral Access Block), the write command comes from >6112 ... a DSR? How can I tell the system not to use this area for PABs?

 

Well, >0820 is the XB Crunch Buffer and >0960 is the Value Stack. You can change the location of the Value Stack by changing the address in >836E. I believe that area needs to be large enough for 4 floating point numbers (4x8=32 bytes). I have no idea whether you can change the Crunch Buffer address. Others will know this better than I (@RXB, @senior_falcon). I think your best bet, however, is to save/restore areas you cannot live without using.

 

If you want Multicolor Mode to persist upon return to XB, you will need to change quite a few pointers, if I am not mistaken. If it is only to persist while in your code, there is certainly no reason to enable interrupts until you return to XB. Even then, you can test the status word upon entry to your code to see whether to enable interrupts at return time.
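
A minimal sketch of the status-word test described above, assuming the routine is entered with BL-style linkage and that R8 is free to hold the saved mask. MCOLOR and MCEXIT are made-up labels, and the final RT stands in for whatever return sequence the routine already uses.

```asm
* Hypothetical entry/exit wrapper, not taken from any posted source.
MCOLOR STST R8            Copy the status register into R8
       ANDI R8,>000F      Keep only the interrupt mask (low 4 bits)
       LIMI 0             No interrupts while VDP RAM is rewritten
*      ... multicolor work goes here (must preserve R8) ...
       MOV  R8,R8         Was the mask non-zero on entry?
       JEQ  MCEXIT        No: leave interrupts disabled
       LIMI 2             Yes: re-enable them
MCEXIT RT                 Placeholder return (B *R11)
```

The point of saving the mask is that the routine leaves the interrupt state the way it found it instead of unconditionally issuing LIMI 2.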

 

...lee


Phew ... this gets tricky... I should have stuck with "hello world" for my first assembler project.

 

Screen image table
This table contains the code of the character to be displayed for each screen position. In all video modes except text mode, the screen is 24 lines x 32 columns, so the table is >0300 bytes in length. It must be aligned on a >0400-byte boundary.

Character pattern table
This table contains the bit pattern to be displayed for each character code. It must be aligned on a >800-byte boundary. It is >1800 bytes long in bitmap mode, >800 bytes in all other modes.

 

Multicolor needs 192 characters, so more than there are in XB with the overlap of screen image and pattern table. I can't use >0000 to >07FF either for the patterns, as >0370 is also XB reserved memory. I may keep >0000 for the screen image, but need to use a higher pattern area, >1000, >1800, >2000 ?

 

When I use >1000 to >1800, can I just update >8324, which is now pointing to VRAM >0985? Just like >836E? Do I need to copy the values from the stack to a new location? I need "only" >0600 bytes for 192 characters, so the stack could start at >1600.

 

 


53 minutes ago, Lee Stewart said:

If you want Multicolor Mode to persist upon return to XB, you will need to change quite a few pointers, if I am not mistaken.

Right now Multicolor remains active while waiting for the keyboard in line 130, then the screen gets back automatically to XB and Graphics Mode. I was surprised to see this happen...

 

100 CALL CLEAR :: CALL SCREEN(2)
110 CALL LOAD("DSK4.MCOLOR.OBJ")				
120 CALL LINK("MCOLOR")  
130 CALL KEY(3,K,S) :: IF S=0 THEN 130

 


1 hour ago, Lee Stewart said:

 

Well, >0820 is the XB Crunch Buffer and >0960 is the Value Stack. You can change the location of the Value Stack by changing the address in >836E. I believe that area needs to be large enough for 4 floating point numbers (4x8=32 bytes). I have no idea whether you can change the Crunch Buffer address. Others will know this better than I (@RXB, @senior_falcon). I think your best bet, however, is to save/restore areas you cannot live without using.

 

If you want Multicolor Mode to persist upon return to XB, you will need to change quite a few pointers, if I am not mistaken. If it is only to persist while in your code, there is certainly no reason to enable interrupts until you return to XB. Even then, you can test the status word upon entry to your code to see whether to enable interrupts at return time.

 

...lee

The best way to avoid all these problems is just move the Screen Image Table to another location.

Move the VDP Stack above that location so variables/open files do not overwrite that new Screen Image Table. (Problem solved)

I would use CALL POKER to move the Screen Image Table to >0958, where the VDP Stack normally starts, and move the VDP Stack higher up, above that location.

 

And no, the Crunch Buffer is hard-coded into the ROMs and GROMs, Lee.

It took me a year to figure out how to move the VDP Stack so it would not crash XB when you restart and retain the new VDP Stack address.


29 minutes ago, SteveB said:

When I use >1000 to >1800, can I just update >8324, which is now pointing to VRAM >0985? Just like >836E ? Do I need to copy the values from the stack to a new location. I need "only" >0600 bytes for 192 characters, so the stack could start at >1600.

 

The Value Stack is a temporary stack for floating point calculations. When you make a CALL to your code, both >8324 (base of Value Stack) and >836E (top of Value Stack) should be pointing to the same address, which means the stack is not in use and no need to save/restore it.
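
As a rough sketch of that check, assuming it runs at the start of the linked routine: comparing the two scratchpad words shows whether the Value Stack holds anything. CHKSTK and STKEMP are made-up labels.

```asm
* Hypothetical check, not from any posted source.
CHKSTK C    @>8324,@>836E   Base of Value Stack equal to top?
       JEQ  STKEMP          Equal: stack is empty, nothing to save
*      ... not equal: save the bytes between base and top here ...
STKEMP RT                   Placeholder return (B *R11)
```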

 

...lee


1 hour ago, Lee Stewart said:

 

The Value Stack is a temporary stack for floating point calculations. When you make a CALL to your code, both >8324 (base of Value Stack) and >836E (top of Value Stack) should be pointing to the same address, which means the stack is not in use and no need to save/restore it.

 

...lee

Any time you use a variable or a CALL, XB uses the VDP Stack to push and pop the name or address being modified or used.

This is why programs lock up: the entry is never popped off the top of the stack, so the same command repeats forever.


3 hours ago, Lee Stewart said:

 

The Value Stack is a temporary stack for floating point calculations. When you make a CALL to your code, both >8324 (base of Value Stack) and >836E (top of Value Stack) should be pointing to the same address, which means the stack is not in use and no need to save/restore it.

 

...lee

When I use >1000 as the base for my character pattern table, the stack can grow from >0958 to >0FFF ... this is >06A8 bytes, dec. 1704 ... a waste of VRAM, but I hope the stack will always fit.

 

Can I "reserve" the VRAM somehow so that it does not get corrupted by a downward-growing string table etc.? My pattern table ends at >15FF, the first free address is >1600.

 

 


22 minutes ago, SteveB said:

When I use >1000 as the base for my character pattern table, the stack can grow from >0958 to >0FFF ... this is >06A8 bytes, dec. 1704 ... a waste of VRAM, but I hope the stack will always fit.

 

Can I "reserve" the VRAM somehow so that it does not get corrupted by a downward-growing string table etc.? My pattern table ends at >15FF, the first free address is >1600.

 

 

Yea if in RXB you use CALL VDPSTACK(5632) it will set VDP Stack after your Pattern Table at >1600 and Strings or Files will go from top of VRAM down to VDP Stack.

In RXB you can even restart XB with CALL XB and run as many programs as you want with no problems.

Only when you go back to Title Screen will this reset VDP Stack Locations, but that would also restart Graphics mode too.

So I would leave normal graphics for XB where they are and switch back to your graphics mode in the XB programs.

That way you can switch back and forth and not crash XB ever. (This is pretty much what TML does.)


2 minutes ago, SteveB said:

When I use >1000 as the base for my character pattern table, the stack can grow from >0958 to >0FFF ... this is >06A8 bytes, dec. 1704 ... a waste of VRAM, but I hope the stack will always fit.

 

That is enough space for 212 floating point (FP) values! XB goes out of its way to keep the stack small. The XB random number generator, which is pretty complex, uses only 4 stack positions (each FP value is 8 bytes). XB should not need a Value Stack anywhere near that large.

 

...lee


Here is the source code for an early version of T80XB. It shows how to reserve additional space in the VDP RAM and how to return to the graphics mode when returning to XB. You should be able to adapt this technique so you can use the multicolor mode with XB. This method is how The Missing Link, T40XB, T80XB and XB256 work.

 

 


*TEST OF 80 COLUMN MODE FOR F18A by Harry Wilhelm, February 2017
*There are a few things that must happen to use the 80 column text mode (T80) within the
*TI Extended BASIC environment.  I chose to use these VDP memory locations for the screen image table and
*the character pattern table:
*V1000    is the start of the screen image table (1920 or >780 bytes) - so the stack now must start at V1780
*and not V0958 as with normal XB
*V0A00    is the start of the pattern table - the pattern for ASCII 32 (space) begins at >0A00
*This means an offset like in XB, but instead of >60 it is now >20
*If you wanted to have a second set of characters with inverse video for highlighting text they should
*begin at >0D00 and that would require a screen offset of >80 to display them.
*All this means that all the XB tables, crunch buffer, edit/recall buffer are left untouched by the new
*T80 routines.  Also, the two modes are totally independent and changing one screen has no effect on the other.
 
*The magic that makes it happen is an interrupt driven routine (MONITR) that keeps track of what is happening
*with XB.  At program startup, normally XB uses >0958 as the top of the stack.  These routines need to use
*>1780 as the top of the stack.  The MONITR looks in certain memory locations to see if the top of the
*stack has been reset to >0958 by a "garbage collection".  If so, then it will change the pointers to the top
*of the stack back to >1780.   
*MONITR also keeps track of whether a program is running or not running.  When a program starts to
*run it will look to see why - was it CON that made the program start, or was it RUN.  If CON then was the
*program running in the T80 mode? If so then it sets the registers for T80.  When a program breaks the
*registers are automatically reset to the normal XB ones.
 
 
    DEF T80ON
    DEF T80
    DEF G32
    DEF DISPLY
    DEF CLS80
    DEF TCOLOR
    
FAC    EQU >834A         \
NUMASG EQU >2008
NUMREF EQU >200C
STRASG EQU >2010
STRREF EQU >2014
XMLLNK EQU >2018
KSCAN  EQU >201C          EXTENDED BASIC EQUATES
VDPWA    EQU >8C02
VDPWD    EQU >8C00
VDPRD    EQU >8800
 
 
VSBR   EQU >2028
VWTR   EQU >2030
ERR    EQU >2034    
VMBW   EQU >2024
 
VMBR   EQU >202C
*********************************
 
 
VSBW    SWPB R0
    MOVB R0,@>8C02        bl routine, set up same as
    SWPB R0
    ORI R0,>4000         blwp @vsbw
    MOVB R0,@>8C02        to run faster
    MOVB R1,@>8C00
    ANDI R0,>BFFF
    B *R11  
    
*******************************************************************************
*GPLLNK AND DSRLINK FROM THE SMART PROGRAMMER
********************************************************************************
 
GPLWS  EQU >83E0
GR4    EQU GPLWS+8
GR6    EQU GPLWS+12
LDGADD EQU >60
XTAB27 EQU >200E
GETSTK EQU >166C
 
GPLLNK DATA GLNKWS
       DATA GLINK1
 
RTNAD  DATA XMLRTN
GXMLAD DATA >176C
       DATA >50
 
GLNKWS EQU $->18
       BSS >08
 
 
GLINK1    MOV *R11,@GR4
       MOV *R14+,@GR6
       MOV @XTAB27,R12
       MOV R9,@XTAB27
       LWPI GPLWS
       BL *R4
       MOV @GXMLAD,@>8302(R4)
       INCT @>8373
       B @LDGADD
 
XMLRTN MOV @GETSTK,R4
       BL *R4
       LWPI GLNKWS
       MOV R12,@XTAB27
 
       RTWP
    
****************************************************
T80SCN    EQU >1000    
WKSP    BSS 32
BLWPWS BSS 32
MONWS     BSS 32
BUFFER    BSS 256
SCRBUF BSS 80
 
****************************
 
CONTXT    DATA >A3C3,>AFCF,>AECE    CcOoNn with offset
HX06B0    DATA >06B0
HX0870    DATA >0870
BOTSTK    DATA >1780     1000+24*80    bottom of stack (>1000 to >177F is table for 80 column screen)
HX0958    DATA >0958
HX8080    DATA >8080
DEC80    DATA 80
****************************************************
*INTERRUPT ROUTINE BELOW
*MONITR flags:
*R5 is a copy of the last XB run flag from >8344
*R6 is whether program was in G32 or T80 mode    0=G32  >ffff = T80
 
MONITR    LWPI MONWS    
*this section looks to see if there has been a garbage collection
*if there has been, then we need to reset from >0958 to >1780 (BOTSTK)
    C @>836E,@HX0958    a quick check to see if garbage collection happened - no slow VDP
    JNE MONITG    if NE then a garbage collection has not just happened
*garbage collection has happened or is happening, need to check further to see if it is finished
    LI R0,>0388
    LI R1,MONWS+6    
    LI R2,2
    BLWP @VMBR    read two bytes from V0388 into R3
    C R3,@HX0958
    JNE MONITG    not done with garbage collection
*done with garbage collection, so reset pointers
    MOV @BOTSTK,R3     >1780 to R3
    BLWP @VMBW    write 2 bytes from R3 to V0388
    MOV R3,@>8324
    MOV R3,@>836E
*at this point have reset pointers so the bottom of the stack is where we want them
 
*******************************************************************
MONITG    CB @>8344,R5        >8344 is the run flag for XB.  0 if not running, >ffff if running
    JNE RCHANG        if NE then there has been a change in RUN flag
                
MONBK    LWPI >83E0     go back
    B *R11
    
*run state has changed if program gets here
RCHANG    MOVB @>8344,R5    store the new run state
    JEQ MONBK        IF =0 then prog is not running, nothing to do so go back  
 
*program just started this cycle, need to see if it is from RUN or from CON
*To do this we copy 16 bytes from the crunch buffer in VDP and look for CON or con
NEWRUN    LI R0,>08C0        start of crunch buffer
    LI R1,BUFFER
    LI R2,16
    BLWP @VMBR        read 16 bytes from the crunch buffer, will look for CON
*get rid of leading spaces if any    
MONI2A    CB *R1,@HX8080        a space?
    JNE MONI2B
    INC R1
    JMP MONI2A
*leading spaces are all gone when we get here    
MONI2B    LI R2,CONTXT        "CcOoNn" with offset
MONI2C    CB *R1,*R2+
    JEQ MONI2D
    CB *R1,*R2
    JNE NOTCON        not con
MONI2D    INC R1
    INC R2
    CI R2,CONTXT+6
    JNE MONI2C        not a match for con
*CON was found in edit/recall buffer    
BP1    MOV R6,R6        did program break in T80 or G32 mode
    JEQ MONBK            if eq then program broke in graphics mode, just go back
    LI R3,T80REG        broke in T80 mode, so set T80 registers
    BL @REGSET
    JMP MONBK
*A new RUN below (i.e. not a CON)    
NOTCON    CLR R6            program always starts in the graphics mode so clear T80 flag
    CLR @CHLFLG        Flag that tells whether T80 characters were loaded
    JMP MONBK
    
************************************
*T80ON sets up interrupt routine for the 80 column mode in F18A
*CALL LINK("T80ON")
**********************************
 
T80ON    LWPI WKSP
    CLR @MONWS+10        
    MOVB @>8344,@MONWS+10    Move the run state byte into MSB R5 of monitor workspace
    CLR @MONWS+12            flag for graphics mode =0
    LI R0,MONITR
    MOV R0,@>83C4            start up MONITR            
    
BK2XB    LWPI >83E0            return to XB
    B @>006A            
    
*************************************************************************
G32    LWPI WKSP
    CLR @MONWS+12        R6 of monitor workspace. Set to G32
    LI R3,G32REG
    JMP TX80A
***********************************************    
CHLFLG    DATA 0            characters loaded flag  0 if not loaded, -1 if loaded
 
T80    LWPI WKSP
    MOV @CHLFLG,R0
    JNE CHSETY        if not 0 then characters already loaded so do not reload them or clear screen
    BL @CL80SB        clears the screen for 80 column display
    
    CLR @>83C4        turn off interrupt routine so GPLLNK can happen without trouble
*load character set below
    SETO @CHLFLG        set characters loaded flag    
    LI R0,>09FF        table starts at >0A00 but we load them 1 pixel higher for better lower case
    MOV R0,@FAC
    BLWP @GPLLNK
    DATA >0018        load small capitals
    AI R0,>0200        
    MOV R0,@FAC
    BLWP @GPLLNK
    DATA >004A        lower case letters
    
    LI R0,MONITR
    MOV R0,@>83C4        restart interrupt routine (this could be put into gpllnk instead)
    
    LI R1,LCDEFS        lower case definitions
CHSETZ    MOV *R1+,R2        length to R2
    JEQ CHSETY        done if length=0
    MOV *R1+,R0
    BLWP @VMBW
    A R2,R1
    JMP CHSETZ
    
 
    
CHSETY    SETO @MONWS+12    R6 of monitor workspace. If 0 then in g32 mode; if >ffff then in t80 mode
 
    LI R3,T80REG
TX80A    BL @REGSET        load T80 registers
 
 
    JMP BK2XB        and return to XB
*******************************    
 
CLS80    LWPI WKSP        clears the 80 column screen like CALL CLEAR
    BL @CL80SB
    JMP BK2XB
*sub below is used by CLS80 and by T80    
CL80SB    LI R7,>0050    bytes are swapped (5000)   start writing at vdp >1000  
    MOVB R7,@VDPWA
    LI R1,>4000        a space (with offset of >20) into MSB of R1
    SWPB R7
    MOVB R7,@VDPWA
    LI R8,80*24
CL80LP    MOVB R1,@VDPWD
    DEC R8
    JNE CL80LP
    B *R11
***********************************
TCOLOR    LWPI WKSP
    CLR R1   
    BL @GETNUM          gets a number within the limits (1-16 below) and puts in FAC
    DATA 1
    DATA 16
    MOV @>834A,R4        foreground color to R4
    DEC R4
    SLA R4,4        to msn of lsb - i.e. >0004 becomes >0040
 
    BL @GETNUM          background color to FAC
    DATA 1
    DATA 16
    A @>834A,R4  
    DEC R4
    SWPB R4
    LI R3,T80REG        R3 points to T80 registers    
    MOVB R4,@9(R3)    change the colors in the T80REG table
    MOV @MONWS+12,R5    check R6 of MONWS to find out if in T80 mode
    JEQ TCBACK        not in T80
    BL @REGSET        in T80, so set the registers to change color
TCBACK    JMP BK2XB
 
***************************************    
DISPLY LWPI WKSP  
*    MOV @>8312,R8   
*    SRL R8,8          number of arguments in R8
    CLR R1   
    BL @GETNUM          gets a number within the limits (1-25 below) and puts in FAC
    DATA 1
    DATA 48
    MOV @>834A,R3     row to R3
    DEC R3          now row 1 is row 0
    MPY @DEC80,R3        multiply x 80 - now R4 has (ROW-1)*80
    
    BL @GETNUM          col to FAC
    DATA 1
    DATA 80
    A @>834A,R4   
    AI R4,>0FFF         now R4 points to screen address
    
DISPXX    INC R1            next item in list
    CLR R0
    
*find out if string or number by looking at >8302.  If odd then string; if even then number
    MOVB @>8302,R8
    
    SRL R8,9
    JOC PRSTR        odd, so it is a string
*print out a number     
    BLWP @NUMREF
 
    MOVB R0,@>8355    set to BASIC format by clearing byte at >8355
    BLWP @XMLLNK
    DATA >06        convert number to string
    
    LI R2,>8300
    AB @>8355,@WKSP+5    now R2 points to start of string
    MOVB @>8356,R3    length byte to MSB of R3
    JMP DISPL1
    
    
*print out a string    
PRSTR    LI R2,BUFFER
    SETO *R2          longest possible string is 255
    BLWP @STRREF        get the string to print
    MOVB *R2+,R3      length of string to MSB of R3
    SRL R3,8        length of string in to LSB of R3
DISPL1    MOV R4,R0        address on screen into R0
 
DISPL3    CI R0,>1780
    JLT DISPL4        if on screen then print
    BLWP @SCROLL        otherwise scroll  
    AI R0,-80
 
    JMP DISPL3        and check again
    
DISPL4    MOV R3,R3        a null string?
    JEQ DSPEND        if null string then go back
 
DISP5    MOVB *R2+,R1  
    AI R1,>2000        add in screen offset of >20
    BL @VSBW         N.B. THIS IS BL, NOT BLWP  
    INC R0
    DEC R3            done printing?
    
    JNE DISPL3
DSPEND    B @BK2XB
***********************
SCROLL    DATA BLWPWS
    DATA SCROL1
 
SCROL1    LI R0,>1050        row 2, column1
    LI R1,SCRBUF
    LI R2,80
    LI R3,23
SCLOOP    BLWP @VMBR        read 80 bytes from a row
 
    S R2,R0        subtract 80
    BLWP @VMBW        write 80 bytes 1 row up
    AI R0,160        down 2 rows
    DEC R3            do it 23 times
    JNE SCLOOP
 
    S R2,R0        gets here with R0 at Row 25, column 1, so subtract 80
    LI R1,>4000        space with the offset
SCLP2    BL @VSBW
    INC R0
    DEC R2
    JNE SCLP2
SCRT    RTWP
**********************************************  
GETNUM  INC R1  
GETNU1    CLR R0
    BLWP @NUMREF   
    BLWP @XMLLNK
    DATA >12B8        CFI
    C @>834A,*R11+
    JLT GTNERR
    C @>834A,*R11+
    JGT GTNERR
    B *R11   
************************************************    
GTNERR    LI R0,>1E00
    BLWP @ERR
 
******************************************    
REGSET    MOVB @3(R3),@>83D4
REGSE    MOV *R3+,R0
    JLT REGSE1
    BLWP @VWTR
    JMP REGSE
REGSE1    B *R11
    
T80REG    DATA >0004,>01F0,>0207,>0401,>07F4,>FFFF
G32REG DATA >0000,>01E0,>0200,>0400,>0717,>FFFF
 
************************
LCDEFS    
    DATA 8,>0B78
    DATA >3844,>4444,>4444,>3800 O
    DATA 8,>0A80
    DATA >3C44,>4C54,>6444,>7800 0  
    
    DATA 16,>0C08
    DATA >0000,>3804,>3C44,>3C00 a
    DATA >4040,>7844,>4444,>7800 b
*    DATA >0000,>3C40,>4040,>3C00 c
    DATA 144,>0C20
    DATA >0404,>3C44,>4444,>3C00 d
    DATA >0000,>3844,>7C40,>3C00 e
    DATA >1824,>2078,>2020,>2000 f
    DATA >0000,>3C44,>443C,>0438 g
    DATA >4040,>7844,>4444,>4400 h
    DATA >1000,>3010,>1010,>3800 i  
    DATA >0800,>0808,>0808,>4830 j
    DATA >4040,>4448,>7048,>4400 k
    DATA >3010,>1010,>1010,>3800 l
    DATA >0000,>6854,>5454,>5400 m  
    DATA >0000,>5864,>4444,>4400 n
    DATA >0000,>3844,>4444,>3800 o
    DATA >0000,>7844,>4478,>4040 p
    DATA >0000,>3C44,>443C,>0404 q
    DATA >0000,>5864,>4040,>4000 r
    DATA >0000,>3C40,>3804,>7800 s
    DATA >2020,>7820,>2024,>1800 t
    DATA >0000,>4444,>4444,>3C00 u
*    DATA >0000,>4444,>2828,>1000 v
*    DATA >0000,>4454,>5454,>2800 w
*    DATA >0000,>4428,>1028,>4400 x
    DATA 8,>0CC8
    DATA >0000,>4444,>443C,>0438 y
*    DATA >0000,>7C08,>1020,>7C00 z
 
    DATA 8,>09F0
    DATA >7C7C,>7C7C,>7C7C,>7C00
 
    DATA 0    
 
 
    END

 

 


On 12/31/2022 at 11:19 AM, Asmusr said:

Yes my latest Mario demo uses the column layout for both VDP RAM and the CPU RAM buffer, which gives a small advantage when transferring the buffer to the VDP compared to the linear layout. But the initial question of whether to store one or two pixels per byte in the buffer is still the more important.

 

I could use some advice on speed, as my buffer-to-VRAM routine needs 120ms ... a little too much to build a usable library.

 

****************************************************************************
*
* CALL MCSYNC - Writes the CPU RAM Buffer to VRAM
*
****************************************************************************

MCSYNC 
       LI R0,CHRPAT        VRAM Address of CharPat table
       LI R4,SCRN1         Address of even column pixel
       LI R5,SCRN1+48      Address of odd column pixel
       LI R6,48            Row Count until 0 
       LI R7,64            Columns Count until 0

* Setup VRAM Address
       LIMI 0                  Disable interrupts so no one interferes with VDP 
       SWPB R0
       MOVB R0,@VDPWA          Write lo-byte of VRAM Address
       SWPB R0
       ORI  R0,>4000           Set R/W bits 14 and 15 to 01 
       MOVB R0,@VDPWA          Write hi-byte of VRAM Address       

LPSYNC CLR R1                  Clear R1 for lower byte in shift op
       MOVB *R4+,R1            read even column byte in MSB
       MOVB *R5+,R8            read odd column byte in MSB
       SLA R1,4                shift 4 bits for left nybble 
       SOC R8,R1               merge right nybble out of R8
       MOVB R1,@VDPWD          Write "character" to VRAM
       DEC R6
       JNE LPSYNC              column not done?
       LI R6,48                Row Count until 0 again 
       AI R4,48                next column ... skip one (odd/even) 
       AI R5,48
       DECT R7
       JNE LPSYNC              not all columns done?
       LIMI 2                  Enable Interrupts
       RT
       

 

My thoughts on this code:

  • I eliminated the column-buffer in CPU RAM and now write byte-by-byte as VRAM Patterntable and my buffer are both column-aligned
  • the column-layout of my SCRN1 buffer forces me to use two MOVB. In row-layout it could be a 16-bit word fetch for two adjacent pixels, but I would have to add 64 to get to the next line instead of using *R4+ (2x20 cycles vs. 22+14 cycles?). Calculating an address from (x,y) would be faster as well, as multiplying by 48 takes longer than SLA Rx,6 for multiplying by 64.
  • what is the minimal time it takes to transfer 1536 bytes from CPU RAM to VRAM? Should my buffer be identical to the VRAM layout, with all the consequences on ugly pixel-coordinate calculations and nybble updates (read before write)?
  • Clearing my buffer with 3072 bytes takes 60ms ... half the bytes would take half the time. I can't understand why this isn't faster than writing to VRAM.
MCCLR2 MOV @PCOLOR,R4
       SWPB R4
       SOC @PCOLOR,R4       Color in MSB and LSB
       LI R6,3072           3072 bytes to clear, two at a time (word)
       LI R1,SCRN1
  
LPMCCL MOV R4,*R1+          Write two color bytes as one word 
       DECT R6
       JNE LPMCCL
       RT

 

Any hints and thoughts are welcome.

 

Steve


5 minutes ago, SteveB said:
  • Clearing my buffer with 3072 bytes takes 60ms ... half the bytes would take half the time. I can't understand why this isn't faster than writing to VRAM.

Because VRAM is the same speed as any 8-bit RAM access. The only thing that makes VDP access slower is when you need to change the address.

 

Speed up your clear loop by unrolling - four MOV's will give you a good tradeoff between unroll and memory space. The benefit seems to level off at 8. Right now you're spending half your time counting and branching instead of copying data.
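
A minimal sketch of the four-way unroll, assuming the same PCOLOR and SCRN1 symbols and the same 3072-byte buffer as the posted MCCLR2 routine. MCCLR4 and LPCLR4 are made-up labels.

```asm
* Hypothetical unrolled variant of MCCLR2, not a posted routine.
MCCLR4 MOV  @PCOLOR,R4
       SWPB R4
       SOC  @PCOLOR,R4      Color in MSB and LSB
       LI   R6,384          3072 bytes / 8 bytes per pass
       LI   R1,SCRN1
LPCLR4 MOV  R4,*R1+         Four word writes per pass ...
       MOV  R4,*R1+
       MOV  R4,*R1+
       MOV  R4,*R1+
       DEC  R6              ... and one count/branch per 8 bytes
       JNE  LPCLR4
       RT
```

The loop overhead (DEC plus JNE) is now paid once per four stores instead of once per store. The byte count must remain a multiple of 8 for this form to be safe.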

 

I don't have too much advice off the cuff for the copy loop. Store the value "48" into a register, will be faster than using immediates in your increment phase. Also store VDPWD into a register for the same reason. Do away with R8 and take advantage of the memory-to-memory architecture (SOC *R5+,R1). That's all I see without data structure changes (and I didn't investigate that).
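
Applied to the posted MCSYNC routine, those three suggestions might look like the sketch below. It assumes each buffer byte holds one color (0-15) in its low nybble and that R9 and R10 are free. The note above writes SOC, but the sketch uses the byte form SOCB because the buffer holds one pixel per byte; that reading of the intent is an assumption. NXTCOL is a made-up label.

```asm
* Hypothetical rework of MCSYNC, not a posted routine.
MCSYNC LI   R0,CHRPAT       VRAM address of the pattern table
       LI   R4,SCRN1        Even column
       LI   R5,SCRN1+48     Odd column
       LI   R7,32           32 column pairs
       LI   R9,VDPWD        VDP write-data port in a register
       LI   R10,48          Rows per column, also the column skip
       CLR  R1              LSB stays zero for the whole transfer
       LIMI 0
       SWPB R0
       MOVB R0,@VDPWA       Low byte of VRAM address
       SWPB R0
       ORI  R0,>4000        Mark the address as a write address
       MOVB R0,@VDPWA       High byte of VRAM address
NXTCOL MOV  R10,R6          Row count for this column pair
LPSYNC MOVB *R4+,R1         Even pixel into MSB
       SLA  R1,4            Move it to the left nybble
       SOCB *R5+,R1         OR the odd pixel into the right nybble
       MOVB R1,*R9          Write the byte to VRAM
       DEC  R6
       JNE  LPSYNC
       A    R10,R4          Skip the column the other pointer did
       A    R10,R5
       DEC  R7
       JNE  NXTCOL
       LIMI 2
       RT
```

Because SOCB and MOVB only touch the MSB of R1, the CLR can move out of the loop and R8 is no longer needed.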

 

 


25 minutes ago, SteveB said:
  • the column-layout of my SCRN1 buffer forces me to use two MOVB. In row-layout it could be a 16-bit word fetch for two adjacent pixels, but I would have to add 64 to get to the next line instead of using *R4+ (2x20 cycles vs. 22+14 cycles?). Calculating an address from (x,y) would be faster as well, as multiplying by 48 takes longer than SLA Rx,6 for multiplying by 64.

Thinking about this - there's nothing that says you can't waste a little memory and have 64 byte columns so you can shift instead of multiply by 48. ;)
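
A small sketch of what the padded layout buys, assuming a buffer at SCRN1 with 64 columns of 64 bytes each, of which only the first 48 bytes per column are displayed. PIXADR is a made-up label.

```asm
* Hypothetical helper, not a posted routine.
* In:  R1 = x (0-63), R2 = y (0-47)
* Out: R1 = address of that pixel's byte in the buffer
PIXADR SLA  R1,6            x * 64 (column stride)
       A    R2,R1           + y (row within the column)
       AI   R1,SCRN1        + start of the buffer
       RT
```

The shift replaces an MPY by 48, at the cost of 16 unused bytes per column (1 kB in total).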

 


I combine pixels from two rows at the same time:

mov  *r4+,r1                    ; Even pixel
sla  r1,4                       ; Shift to high nybble
soc  *r5+,r1                    ; Odd pixel
movb r1,*r15                    ; Write to VDP
movb *r6,*r15                   ; Write low byte to VDP

Edit: r6 contains the address of r1 low byte. r15 contains VDPWD.


For a speed reference I just tried writing 3072 bytes (C00) and timing with the 9901 timer it took 1128 uS ( 1.128 mS)

This includes pulling 2 parameters off the stack into registers and setting the VDP address with a sub-routine call.

 


 

 


2 hours ago, Tursi said:

Thinking about this - there's nothing that says you can't waste a little memory and have 64 byte columns so you can shift instead of multiply by 48. ;)

 

If the 99er has something abundantly it is memory ...  🤣

 

The 8kB lower memory would have this additional 1kB indeed, but the blanking of the memory would be more complicated. Either I stop at 48 bytes per column and move to the next or I have 16 bytes more to blank for each column. Both may kill any gain.


3 hours ago, Asmusr said:

I combine pixels from two rows at the same time:

mov  *r4+,r1                    ; Even pixel
sla  r1,4                       ; Shift to high nybble
soc  *r5+,r1                    ; Odd pixel
movb r1,*r15                    ; Write to VDP
movb *r6,*r15                   ; Write low byte to VDP

Edit: r6 contains the address of r1 low byte. r15 contains VDPWD.

Your code is more compact than mine and you use registers only, which is four cycles faster than label access... As I preserve the XB context I am not as free with the scratchpad as you in your 2014 demo, but I think I will get this somehow with STWP dynamically. Thank you, this looks promising!
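
A minimal sketch of the STWP idea, assuming R6 and R15 are free in the routine's own workspace and that VDPWD is equated to >8C00 as in the listings above. It sets up the two pointers the quoted loop relies on without hard-coding a workspace address.

```asm
* Hypothetical setup for the quoted loop, not a posted routine.
       STWP R6              R6 = address of the current workspace (R0)
       AI   R6,3            R1 sits at WP+2/WP+3; WP+3 is its low byte
       LI   R15,VDPWD       Keep the VDP write-data port in a register
```

After this, `movb *r6,*r15` sends the low byte of R1 to the VDP wherever the workspace happens to be located.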


3 hours ago, TheBF said:

For a speed reference I just tried writing 3072 bytes (C00) and timing with the 9901 timer it took 1128 uS ( 1.128 mS)

This includes pulling 2 parameters off the stack into registers and setting the VDP address with a sub-routine call.

 


 

 

I am trying to build an XB library, so my measurements are always done with the CALL LINK command ... which contributes to the slowness ... a CALL without parameters takes about 21ms, with 4 parameters 60ms, which does not change using RXB or XB 2.9.

 

Any useful usage in games etc. will require compiling.

 

When subtracting 20ms for the plain call, both routines aren't that bad. With the proposed changes I see some potential. But writing 3kB in a little over 1/1000 of a second?

 

Let's say we unroll all MOVs, no overhead for a loop.

    MOV R1,*R2+    

    MOV R1,*R2+    

    MOV R1,*R2+    

    (1536 times)

 

According to the data sheet (page 28f), each line takes:

Cycles: 14 + 8 = 22

Memory accesses: 4 + 2 = 6

Wait states: 8 per access for 8-bit CPU RAM

T = tc * (C + W * M)

  = 0.333 µs * (22 + 8 * 6) = 0.333 µs * 70 = 23.3 µs

For 3kB: 1536 * 23.3 µs ≈ 35,800 µs = 35.8 ms ... without any overhead.

 

Or have I gotten something wrong?

 

 

 

 

 


4 hours ago, Tursi said:

Speed up your clear loop by unrolling - four MOV's will give you a good tradeoff between unroll and memory space.

I tried 4 and 8 MOVs ... 52ms vs. 50ms. 

 

Taking into account the 20ms for the CALL LINK and 5ms for the handling of one parameter, it is 25 vs. 27 vs. 35 without unrolling... and faster than my theoretical speed from above !?! Is the 0.333 µs not valid for the 99/4A, or do I have a fundamental misunderstanding of the tables in the data sheet?

 


To Double check here is the timing from Classic99 Debugger

 

\ get parameters from Forth    
   A7C4  C004  mov  R4,R0                  (18)  DUP byte count to R0 
   A7C6  C136  mov  *R6+,R4                (30)  Pop VDP address 
   A7C8  C0B6  mov  *R6+,R2                (30)  Pop CPU RAM address 
   
\ test if the byte count is not zero    
   A7CA  C000  mov  R0,R0                  (18)
   A7CC  1309  jeq  >a7e0                  (12)
\ set VDP address 
   A7CE  06A0  bl   @>a73e                 (32)
         A73E
   A73E  0264  ori  R4,>4000               (22)
         4000
   A742  02A1  stwp R1                     (12)
   A744  0300  limi >0000                  (24)
         0000
   A748  D821  movb @>0009(R1),@>8c02      (50)
         0009
         8C02
   A74E  C804  mov  R4,@>8c02              (38)
         8C02
   A752  045B  b    *R11                   (20)
   
 \ set R3 to the write port. This improves speed by ~12%  
   A7D2  0203  li   R3,>8c00               (20)
         8C00
         
 \ write the bytes         
   A7D6  D4F2  movb *R2+,*R3               (40)
   A7D8  0600  dec  R0                     (14)
   A7DA  16FD  jne  >a7d6                  (14)
 
 \ enable interrupts   
   A7DC  0300  limi >0002                 
         0002
 \ refill top of stack cache register         
   A7E0  C136  mov  *R6+,R4           
   
\ return to Forth    
   A7E2  045A  b    *R10        

 

So the loop is taking 40+14+14= 68 cycles

times .333 uS = 22.64 uS 

22.64 * 3072 = 69,562uS = 69mS

 

So my interrupt based elapsed timer is not accurate with interrupts being off so much in this code.

No surprise there I guess. 

 
