+TheBF Posted May 5, 2017 (edited)

I always knew that TI BASIC used the VDP RAM for most things. I also believed that VDP RAM was a big contributor to the slow performance of TI BASIC. So I decided to write some code for CAMEL99 Forth that would let me create variables and strings in VDP RAM, using the same kind of code that Forth uses for this purpose.

Here is the code that creates the memory allocation commands. The words are like their Forth equivalents, but they have a 'V' prefix.

    HEX
    1000 CONSTANT VBASE
    37D7 CONSTANT VENDS   ( this allows 10k of VDP RAM for data)

    VARIABLE VP           ( VDP memory pointer)

    : VHERE   ( -- addr)  VBASE VP @ + ;   ( returns next available VDP addr)
    : VALLOT  ( n --)     DUP VHERE + VENDS > ABORT" VDP MEM FULL"  VP +! ;

I needed a way to create a variable and a buffer to hold a string. I also made a buffer creator for CPU RAM while I was at it.

    : VDPVAR:   ( -- Vaddr)   VHERE CONSTANT  2 VALLOT ;
    : VBUFFER:  ( n -- Vaddr) VHERE CONSTANT  VALLOT ;
    : BUFFER:   ( n -- addr)  CREATE ALLOT ;   ( CPU ram buffer creator)

CAMEL99 already has ASM code that is the equivalent of fetch and store for integers in VDP RAM (V@ and V!), but I needed routines to move a string from Forth into VDP RAM with the count byte included. I also needed a way to GET it back into Forth. While I was at it I made a PRINT for both kinds of strings.

    ( PLACE a string from CPU ram into VDP ram as a counted string)
    : VPLACE  ( adr len vdp-addr -- )  2DUP VC!  1+ SWAP VWRITE ;

    ( GET a counted string from VDP ram and return address and count on stack)
    : VGET    ( V$ -- addr cnt )  DUP VC@ PAD SWAP 1+ VREAD  PAD COUNT ;

    ( syntax candy)
    : VPRINT  ( VDP$ -- )  VGET TYPE ;    ( print a VDP string )
    : PRINT   ( cpu$ -- )  COUNT TYPE ;   ( print a CPU string)

Then I created variables and buffers in each memory space.

    DECIMAL
    VARIABLE X
    VARIABLE Y
    50 BUFFER: B$

    VDPVAR: Q
    VDPVAR: W
    50 VBUFFER: A$

Below is the test code, which completely blew away my assumptions about the speed of VDP RAM. For integers there is a pretty big hit, as you can see: 50 to 75% slower in VDP RAM. BUT there is only about an 8% slowdown when using VDP RAM for string storage (11.4 vs. 10.6 seconds)! I am really surprised.

This ran on CLASSIC99. Maybe somebody can tell me if real hardware would show the same kind of results.

BF

NOTE: Both Forth routines PLACE and VPLACE use ASM code to do the actual byte-by-byte movement of data. Forth only massages the parameters on the stack for the Assembly code to pick them up.

    DECIMAL
    ( integer store and fetch test)
    : CPU_R/W  ( -- )  10000 0 DO  I X !   X @  DROP  LOOP ;   ( 4 SECS)
    : VDP_R/W  ( -- )  10000 0 DO  I Q V!  Q V@ DROP  LOOP ;   ( 6 SECS)

    ( test integer to integer transfer speed)
    : CPU_A->B ( -- )  10000 0 DO  I X !   X @  Y !   LOOP ;   ( 4.5 SECS)
    : VDP_A->B ( -- )  10000 0 DO  I Q V!  Q V@ W V!  LOOP ;   ( 7.9 SECS)

    ( test write a string to VDP RAM)
    : VDP$ ( -- )
        CR S" Initial A$ string." A$ VPLACE
        CR A$ VPRINT
        5000 0 DO
            S" This string was placed into VDP RAM 5000 times." A$ VPLACE
        LOOP
        CR A$ VPRINT ;   ( 11.4 SECS)

    : CPU$ ( -- )
        CR S" Initial B$ string." B$ PLACE
        CR B$ PRINT
        5000 0 DO
            S" This string was placed into CPU RAM 5000 times." B$ PLACE
        LOOP
        CR B$ PRINT ;    ( 10.6 SECS)

Edited May 6, 2017 by TheBF
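For anyone who wants to try the allocation and string words above without a TI-99, here is a minimal hosted sketch for a standard Forth such as gforth. It is an assumption-laden stand-in, not the CAMEL99 code: VRAM is an ordinary CPU buffer pretending to be the 16K of VDP RAM, and VC!, VC@, VWRITE and VREAD are plain-Forth substitutes written to match the stack effects implied by VPLACE and VGET in the post. It says nothing about speed; it only shows the logic.

```forth
\ Hosted sketch: VDP RAM is simulated by an ordinary buffer.
\ VRAM, VC!, VC@, VWRITE and VREAD are stand-ins (assumptions),
\ not the CAMEL99 assembler versions.
HEX
4000 CONSTANT VSIZE                     \ 16K of simulated VDP RAM
CREATE VRAM VSIZE ALLOT

: VC!     ( c vaddr -- )         VRAM + C! ;
: VC@     ( vaddr -- c )         VRAM + C@ ;
: VWRITE  ( addr vaddr cnt -- )  >R VRAM + R> MOVE ;
: VREAD   ( vaddr addr cnt -- )  >R >R VRAM + R> R> MOVE ;

\ Allocation words, as in the post
1000 CONSTANT VBASE
37D7 CONSTANT VENDS
VARIABLE VP   0 VP !

: VHERE    ( -- vaddr )  VBASE VP @ + ;
: VALLOT   ( n -- )      DUP VHERE + VENDS > ABORT" VDP MEM FULL"  VP +! ;
: VBUFFER: ( n -- )      VHERE CONSTANT  VALLOT ;

\ Counted-string transfer words, as in the post
: VPLACE  ( addr len vaddr -- )  2DUP VC!  1+ SWAP VWRITE ;
: VGET    ( vaddr -- addr len )  DUP VC@ PAD SWAP 1+ VREAD  PAD COUNT ;

DECIMAL
50 VBUFFER: A$                \ reserve 50 bytes of simulated VDP RAM
S" HELLO VDP" A$ VPLACE       \ store as a counted string
A$ VGET TYPE CR               \ copy back to PAD and print it
```

Because the "VDP" here is just CPU memory, a timing loop run against this sketch would not reproduce the numbers in the post.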
+RXB Posted May 5, 2017 (edited)

OK, at Fest West I explained this issue as being a huge factor in TI Basic being slow compared to XB. It also explains why XB is slower than other Basic languages, even on other computers.

Basic and XB use the Crunch Buffer as needed, so each line is copied from RAM into the Crunch Buffer depending on the commands being processed. Even though numeric variables are in RAM, they are put into VDP at times for processing, such as adding values. For example:

    100 Y=INT(10)
    110 GOTO 100
    RUN

You will see very little effect on VDP memory at >37D0, as about 10 bytes are used in total.

Now see the difference:

    100 Y$="TEST"
    110 GOTO 100
    RUN

In a few seconds VDP memory from >37D7 down to >09AB will be filled with TEST, and program speed will decrease by at least about 50%.

You can run the above demo of the issue using Classic99: turn on the DEBUGGER and set it to VDP address >37D0.

Edited May 5, 2017 by RXB
+TheBF Posted May 5, 2017

Wow! That is pretty weird. I will try that. Seems like a pretty big bug.

But I am not touching BASIC in any way for these tests. Most of the work is done by little assembler routines. One is called CMOVE (character move). The other one is a regular version of VMBW, which I renamed to VWRITE. The Forth interpreter is just controlling the sequence of what happens and when.

My stacks are both in low memory expansion. VDP is essentially the Editor/Assembler setup, except that I switch to Text mode.

So these timings are literally comparing memory expansion against VDP as your storage RAM. (Of course the program itself is running in expansion memory.)

Thanks for the insights on BASIC VDP memory usage. I am going to look at that right now.

BF
Asmusr Posted May 5, 2017

I haven't looked at your examples in detail, but I believe the main penalty for using VDP RAM is due to having to repeatedly set up read and write addresses. For integers, where you have to do this for each word, the penalty is high. When transferring longer byte sequences, like the string in your example, the difference evens out.
+TheBF Posted May 5, 2017

That makes perfect sense when you put it that way. The VMBW routine sets up the address once and then just fires the bytes into VDP RAM in a tight loop, with the VDP chip doing the auto-incrementing of the address in hardware. So of course it goes pretty fast. I guess I had higher expectations of the memory expansion, even with its 8-bit bus and wait states.

On the good news side, I can now move string storage to VDP RAM in my little system without a big speed penalty. :-)

BF
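The amortization argument above can be put into a few lines of standard Forth. This is only a toy cost model: SETUP-COST and BYTE-COST are made-up placeholder figures, not measured TMS9900 or VDP timings, and the word names are hypothetical. It assumes a VDP transfer costs one fixed address set-up plus the same per-byte cost as a CPU RAM transfer, and reports the set-up as a percentage of the CPU-only cost.

```forth
\ Toy cost model. The two constants are illustrative assumptions,
\ NOT measured cycle counts for the TI-99/4A.
DECIMAL
100 CONSTANT SETUP-COST     \ assumed fixed cost of one VDP address set-up
 30 CONSTANT BYTE-COST      \ assumed cost of moving one byte

: CPU-COST  ( nbytes -- units )  BYTE-COST * ;
: VDP-COST  ( nbytes -- units )  CPU-COST SETUP-COST + ;

\ Extra cost of the VDP transfer, as a percentage of the CPU transfer
: OVERHEAD% ( nbytes -- pct )    SETUP-COST 100 ROT CPU-COST */ ;

 2 OVERHEAD% .    \ one 16-bit cell:      prints 166
48 OVERHEAD% .    \ a 48-byte transfer:   prints 6
```

With these invented figures the fixed set-up dominates a 2-byte transfer and nearly disappears over 48 bytes, which is the same shape as the integer and string timings reported earlier, though the real ratios depend on the actual instruction timings.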
+TheBF Posted May 5, 2017 (edited)

Since I see your handle is ASMUSR, here are the routines that were called, in Forth Assembler, so you can see what was really going on behind the Forth smoke and mirrors. It's pretty obvious to me now that the VDP address set-up sub-routine is bigger than the code to write to VDP. :-) DUH!

    CODE: CMOVE  ( src dst n -- )   \ forward character move
            *SP+ R0 MOV,            \ pop DEST into R0                           22
            *SP+ R1 MOV,            \ pop source into R1                         22
            TOS TOS MOV, @@2 JEQ,   \ if n=0 get out
    @@1:    *R1+ *R0+ MOVB,         \ byte move, with auto increment by 1. cool  26
            TOS DEC,                \ n is in TOS (R4)                           10
            @@1 JNE,                \                                            10
    @@2:    TOS POP,                \                                            22
            NEXT,                   \                                           112
            END-CODE

    \ factored sub-routine. Sets up VDP address
    l: WMODE   R0 3FFF ANDI,        \ WRITE mode entry. CLR control bits
               R0 4000 ORI,         \ set control bits to write mode (01)
    l: VMODE   0 LIMI, R0 SWPB,     \ R0= VDP-adr, set the VDP WRITE address
               R0 VDPWA @@ MOVB,    \ send low byte of vdp ram write address
               R0 SWPB,
               R0 VDPWA @@ MOVB,    \ send high byte of vdp ram write address
               2 LIMI,
               RT,

    \ VMBW in Forth style
    CODE: VWRITE  ( RAM-addr VDP-addr cnt -- )
            R0 POP,                 \ vaddr to R0
            R1 POP,                 \ cpu addr to R1
            R3 VDPWD LI,            \ vdp addr. in a reg. makes this 12.9% faster
            WMODE @@ BL,            \ setup VDP write address
    @@1:    *R1+ *R3 MOVB,          \ write byte to vdp write port
            TOS DEC,                \ dec the byte counter
            @@1 JNE,                \ jump back if not done
            TOS POP,
            NEXT,
            END-CODE

Edited May 5, 2017 by TheBF
+RXB Posted May 5, 2017 (edited)

Here is a simple program I wrote to test the difference between VDP and RAM, with only a single line changed, to measure the difference using Classic99:

    100 OPEN #1:"CLOCK"
    110 INPUT #1:A$,B$,C$
    120 PRINT A$:B$:C$
    130 FOR X=0 TO 10000
    140 Y=10
    150 NEXT X
    160 INPUT #1:A$,B$,C$
    170 PRINT A$:B$:C$
    180 END

I got 1 minute 10 seconds for a numeric variable using RAM in line 140 (Y=10).

Next, change line 140 to:

    140 Y$="T"

That came out to 2 minutes 10 seconds, almost twice as long.

Edited May 5, 2017 by RXB
Asmusr Posted May 5, 2017

That's what I would expect, but is there any way in BASIC you could compare the storage of long coherent sequences of bytes in VDP RAM vs CPU RAM?
+RXB Posted May 6, 2017

Yes. In Basic, using Classic99, put in this program for numeric variables:

    100 OPEN #1:"CLOCK"
    110 INPUT #1:A$,B$,C$
    120 PRINT A$:B$:C$
    130 DIM Y(200)
    140 FOR X=0 TO 200
    150 Y(X)=9999999999
    160 NEXT X
    170 INPUT #1:A$,B$,C$
    180 PRINT A$:B$:C$
    190 END

This took 2 seconds to run. Now put in this one for strings:

    100 OPEN #1:"CLOCK"
    110 INPUT #1:A$,B$,C$
    120 PRINT A$:B$:C$
    130 DIM Y$(200)
    140 FOR X=0 TO 200
    150 Y$(X)="9999999999"
    160 NEXT X
    170 INPUT #1:A$,B$,C$
    180 PRINT A$:B$:C$
    190 END

This program took 4 seconds to run, again 2 times longer to do the same thing.