Memory Expansion vs VDP RAM speed tests


TheBF

I always knew that TI BASIC used the VDP RAM for most things.

I also believed that the VDP RAM was a big contributor to the slow performance of TI BASIC.

 

So I decided to write some code for CAMEL99 Forth that would let me create variables and strings in VDP RAM, using the same kind of code that Forth uses for this purpose in CPU RAM.

 

Here is the code that creates the memory allocation commands.

The words are named like their Forth equivalents, but with a 'V' prefix.

HEX
1000  CONSTANT VBASE
37D7  CONSTANT VENDS    ( this allows 10k of VDP RAM for data)

VARIABLE VP             ( VDP memory pointer)

: VHERE    ( -- addr) VBASE VP @ + ; ( returns next available VDP addr)

: VALLOT   ( n --)    DUP VHERE +  VENDS >  ABORT" VDP MEM FULL"
                      VP +! ;

I needed a way to create a variable and a buffer to hold a string.

I also made a buffer creator for CPU RAM while I was at it.

: VDPVAR:  ( -- Vaddr)   VHERE CONSTANT  2 VALLOT ;
: VBUFFER: ( n -- Vaddr) VHERE CONSTANT    VALLOT ;
: BUFFER:  ( n -- addr) CREATE ALLOT ;  ( CPU ram buffer creator)
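
A short usage sketch of the allocation words, assuming the definitions above are loaded and that CAMEL99's V! and V@ are available. The names SCORE, NAME$ and ALLOC-DEMO are made up for illustration.

```forth
\ Usage sketch only. SCORE, NAME$ and ALLOC-DEMO are hypothetical names.
DECIMAL
VHERE                    \ remember the next free VDP address
VDPVAR: SCORE            \ reserves 2 bytes of VDP RAM
32 VBUFFER: NAME$        \ reserves 32 bytes of VDP RAM
VHERE SWAP - .           \ bytes allocated since the first VHERE: prints 34

: ALLOC-DEMO ( -- )
   1234 SCORE V!         \ store an integer into VDP RAM
   SCORE V@ . ;          \ fetch it back and print it
```

Because VDPVAR: and VBUFFER: build plain constants, SCORE and NAME$ simply leave a VDP address on the stack, which is why they can be passed straight to V!, V@, VPLACE and VGET.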

CAMEL99 already has ASM code that is the equivalent of fetch and store for integers in VDP RAM (V@ and V!), but I needed routines to move a string from Forth into VDP RAM with the count byte included. I also needed a way to GET it back into Forth. While I was at it I made a PRINT for both kinds of strings.

( PLACE a string from CPU ram into VDP ram as a counted string)
: VPLACE   ( adr len vdp-addr -- ) 2DUP VC!  1+  SWAP VWRITE ;

( GET a counted string from VDP ram and return address and count on stack)
: VGET     ( V$ -- addr cnt ) DUP VC@  PAD SWAP 1+ VREAD PAD COUNT ;

( syntax candy)
: VPRINT   ( VDP$ -- ) VGET TYPE ; ( print a VDP string )
: PRINT    ( cpu$ -- ) COUNT TYPE ; ( print a CPU string)

Then I created variables and buffers in each memory space.

DECIMAL
   VARIABLE X  
   VARIABLE Y    
50 BUFFER: B$ 
                
   VDPVAR: Q  
   VDPVAR: W     
50 VBUFFER: A$         

Below is the test code which completely blew away my assumptions about the speed of VDP RAM.

 

For integers there is a pretty big hit as you can see. 50 to 75% slower in VDP ram.

 

BUT there is only about an 8% slowdown when using VDP RAM for string storage!

I am really surprised.

 

This ran on CLASSIC99.

Maybe somebody can tell me if real hardware would show the same kind of results.

 

BF

 

 

NOTE: Both Forth routines PLACE and VPLACE use ASM code to do the actual byte-by-byte movement of data. Forth only massages the parameters on the stack for the Assembly code to pick them up.

DECIMAL
( integer store and fetch test)
: CPU_R/W  ( -- ) 10000 0 DO   I X !    X  @ DROP  LOOP ; ( 4 SECS)
: VDP_R/W  ( -- ) 10000 0 DO   I Q V!   Q V@ DROP  LOOP ; ( 6 SECS)

( test integer to integer transfer speed)
: CPU_A->B ( -- ) 10000 0 DO   I X !   X  @  Y  !  LOOP ;  ( 4.5 SECS)
: VDP_A->B ( -- ) 10000 0 DO   I Q V!  Q V@  W V!  LOOP ;  ( 7.9 SECS)

( test write a string to VDP RAM)
: VDP$   ( -- )
         CR S" Initial A$ string." A$ VPLACE
         CR A$ VPRINT
         5000 0 DO
            S" This string was placed into VDP RAM 5000 times." A$ VPLACE
         LOOP
         CR A$ VPRINT ; ( 11.4 SECS)

: CPU$   ( -- )
         CR S" Initial B$ string." B$ PLACE
         CR B$ PRINT
         5000 0 DO
            S" This string was placed into CPU RAM 5000 times." B$ PLACE
         LOOP
         CR B$ PRINT ; ( 10.6 SECS)
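
As a quick check on the percentages, the relative slowdown can be worked out directly from the timings in the comments above, taking those hand-timed figures at face value:

```latex
\[
\text{slowdown} = \frac{T_{\text{VDP}} - T_{\text{CPU}}}{T_{\text{CPU}}}
\]
\[
\begin{aligned}
\text{R/W:}          &\quad \frac{6 - 4}{4} = 50\% \\
\text{A}\to\text{B:} &\quad \frac{7.9 - 4.5}{4.5} \approx 75.6\% \\
\text{strings:}      &\quad \frac{11.4 - 10.6}{10.6} \approx 7.5\%
\end{aligned}
\]
```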

Edited by TheBF

OK, at Fest West I explained this issue as a huge factor in TI Basic being slow compared to XB.

It also explains why XB is slower than other Basic languages, even on other computers.

 

 

Basic and XB use the Crunch Buffer whenever it is needed, so each line is copied from RAM into the Crunch Buffer, depending on the commands being processed.

Even though numeric variables are in RAM, they are at times put into VDP for processing (adding values, for example). Try this:

100 Y=INT(10)

110 GOTO 100

RUN

You will see very little effect on VDP memory at >37D0 as about 10 bytes are used in total.

Now see the difference:

100 Y$="TEST"

110 GOTO 100

RUN

 

In a few seconds VDP memory from >37D7 to >09AB will be filled with TEST

 

And program speed will drop by 50% or more.

 

You can reproduce this demo in Classic99 by turning on the debugger and viewing VDP address >37D0.

Edited by RXB

Wow! That is pretty weird. I will try that. Seems like a pretty big bug.

 

But I am not touching BASIC in any way for these tests. Most of the work is done by little assembler routines.

One is called CMOVE (character move). The other is a regular version of VMBW, which I renamed to VWRITE.

The Forth interpreter just controls the sequence of what happens and when.

 

My stacks are both in low memory expansion. VDP is essentially set up as in Editor/Assembler, except that I switch to Text mode.

 

So these timings are literally comparing using memory expansion for your storage RAM or using VDP for your storage RAM.

(of course the program itself is running in expansion memory)

 

Thanks for the insights on BASIC VDP memory usage.

I am going to look at that right now.

 

BF


I haven't looked at your examples in detail, but I believe the main penalty for using VDP RAM is due to having to repeatedly set up read and write addresses. For integers where you have to do this for each word the penalty is high. When transferring longer byte sequences, like the string in your example, the difference evens out.
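
To put rough symbols on that argument (a sketch only: $S$, $b$ and $c$ are assumed names for costs, not measured values, and it assumes every VDP access routine pays the address set-up once per call): let $S$ be the fixed cost of setting up the VDP address, $b$ the cost of moving one byte through the VDP data port, and $c$ the cost of moving one byte in CPU RAM.

```latex
\[
T_{\text{VDP}}(n) = S + n\,b, \qquad T_{\text{CPU}}(n) = n\,c
\]
\[
\frac{T_{\text{VDP}}(n)}{n} = b + \frac{S}{n}
\]
```

For a 2-byte integer the set-up cost $S$ is spread over only two bytes, so it dominates. For a string of a few dozen bytes the $S/n$ term becomes small, and the per-byte cost approaches $b$.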


 

That makes perfect sense when you put it that way.

The VMBW routine sets up the address once and then just fires the bytes into VDP ram in a tight loop, with the VDP chip doing the auto-incrementing of the address in hardware.

So of course it goes pretty fast.

 

I guess I had higher expectations of the memory expansion, even with its 8-bit bus and wait states.

 

On the good news side I can now move string storage to VDP Ram in my little system without a big speed penalty.

 

:-)

 

BF


Since I see your handle is ASMUSR, here are the routines that were called, in Forth Assembler, so you can see what was really going on behind the Forth smoke and mirrors.

 

It's pretty obvious to me now that the VDP address set-up sub-routine is bigger than the code to write to VDP.

:-)

 

DUH!

CODE: CMOVE   ( src dst n -- )  \ forward character move
             *SP+ R0 MOV,       \ pop DEST into R0                           22
             *SP+ R1 MOV,       \ pop source into R1                         22
              TOS TOS MOV,
              @@2 JEQ,          \ if n=0 get out
@@1:         *R1+ *R0+ MOVB,    \ byte move, with auto increment by 1. cool  26
              TOS DEC,          \ n is in TOS (R4)                           10
              @@1 JNE,                                                \      10
@@2:          TOS POP,                                                \      22
              NEXT,                                                   \     112
              END-CODE

\ factored sub-routine. Sets up VDP address
l: WMODE      R0 3FFF ANDI,             \ WRITE mode entry. CLR control bits
              R0 4000  ORI,             \ set control bits to write mode (01)
l: VMODE      0 LIMI,
              R0 SWPB,                  \ R0= VDP-adr, set the VDP WRITE address
              R0 VDPWA @@ MOVB,         \ send low byte of vdp ram write address
              R0 SWPB,
              R0 VDPWA @@ MOVB,         \ send high byte of vdp ram write address
              2 LIMI,
              RT,

\ VMBW in Forth style
CODE: VWRITE  ( RAM-addr VDP-addr cnt -- )
              R0 POP,                  \ vaddr to R0
              R1 POP,                  \ cpu addr to R1
              R3 VDPWD LI,             \ vdp addr. in a reg. makes this 12.9% faster
              WMODE @@ BL,             \ setup VDP write address
@@1:         *R1+ *R3 MOVB,            \ write byte to vdp write port
              TOS DEC,                 \ dec the byte counter
              @@1 JNE,                 \ jump back if not done
              TOS POP,
              NEXT,
              END-CODE
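
One thing visible in the listing above: CMOVE tests for n=0 before entering its loop, but VWRITE does not, so as posted a count of zero would decrement to >FFFF and run the loop 65,536 times. A possible guard, sketched in high-level Forth rather than assembler (the name ?VWRITE is hypothetical, and 2DROP is assumed to be present as in standard Forth):

```forth
\ Hypothetical wrapper: skip the VDP write entirely when the count is zero.
: ?VWRITE ( ram-addr vdp-addr cnt -- )
   DUP IF  VWRITE          \ non-zero count: do the block write
       ELSE  DROP 2DROP    \ zero count: discard all three parameters
       THEN ;
```

This costs a little interpreter overhead on every call, so putting the test inside the CODE word, the way CMOVE does it, may be the better fit where speed matters.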

Edited by TheBF

Here is a simple program I wrote to test the difference between VDP and RAM, with only a single line changed to measure the difference, using Classic99:

100 OPEN #1:"CLOCK"
110 INPUT #1:A$,B$,C$
120 PRINT A$:B$:C$
130 FOR X=0 TO 10000
140 Y=10
150 NEXT X
160 INPUT #1:A$,B$,C$
170 PRINT A$:B$:C$
180 END

I got 1 minute 10 seconds for Numeric Variable using RAM in line 140 Y=10

 

Next in line 140 change it to 140 Y$="T"

 

That came out to 2 minutes 10 seconds, almost twice as long.

Edited by RXB

That's what I would expect, but is there any way in BASIC you could compare the storage of long coherent sequences of bytes in VDP RAM vs CPU RAM?


Yes. In Basic, using Classic99, enter this program for numeric variables:

100 OPEN #1:"CLOCK"
110 INPUT #1:A$,B$,C$
120 PRINT A$:B$:C$
130 DIM Y(200)
140 FOR X=0 TO 200
150 Y(X)=9999999999
160 NEXT X
170 INPUT #1:A$,B$,C$
180 PRINT A$:B$:C$
190 END

This took 2 seconds to run.

Now put in this one for Strings:

100 OPEN #1:"CLOCK"
110 INPUT #1:A$,B$,C$
120 PRINT A$:B$:C$
130 DIM Y$(200)
140 FOR X=0 TO 200
150 Y$(X)="9999999999"
160 NEXT X
170 INPUT #1:A$,B$,C$
180 PRINT A$:B$:C$
190 END

And this program took 4 seconds to run, again twice as long to do the same thing.
