
Improving TI BASIC performance



I have done some experimenting with TI BASIC, trying to improve its performance. One of the big slowdowns is that CALL subprograms are much slower than they are in XB. It looks like when CALL is performed, BASIC looks everywhere else in the system first, and only at the very end looks in the cartridge and console GROMs.

I wrote a fast assembly lookup routine that compares the name against a table inside the routine. If the name is found, the scratchpad is set up the same as if the match had been found via the normal route. (I also added the fast HCHAR and VCHAR from XB 2.9.)

This requires MiniMemory to run. Using Classic99, under "options", start by enabling GRAM at >2000 and >4000.

Select BASIC

CALL INIT

CALL LOAD("DSK2.MMXB.OBJ")

CALL LINK("WTGROM")

Now BASIC is set up so that CALL, HCHAR and VCHAR will go to the new assembly routines.

I tested this with APERTURE and the play action is noticeably faster than standard TI BASIC.

Testing shows the CALLs are considerably faster than in standard BASIC and somewhat slower than in XB.

This is as far as this will go, but it serves as a tantalizing example of "what might have been."

 

(edit) In the original post I omitted HCHAR. MMXB1 below corrects that.

MMXB1.obj

 

Aperture

 

Below is a GIF showing a program running three different ways. From left to right: TI BASIC, TI BASIC with fast CALL, and XB running from VDP.

10 FOR R=1 TO 24
20 FOR C=1 TO 32
30 CALL VCHAR(R,C,65)
40 NEXT C
50 NEXT R

 

MMXBDEMO.gif
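The demo loop issues one CALL per screen cell, so whatever it costs to look up the subprogram name is paid 24 x 32 times. A rough Python transliteration, purely illustrative and not how BASIC actually executes the program, makes that count explicit. `run_demo` and `vchar` are hypothetical names, and `vchar` assumes that VCHAR wraps to the top of the next column when it runs off the bottom.

```python
# Illustrative Python transliteration of the 5-line demo (not TI code).
# It counts how many CALL VCHAR dispatches the loop performs.

ROWS, COLS = 24, 32  # TI BASIC screen: 24 rows x 32 columns
SPACE = 32


def vchar(screen, row, col, code, repeat=1):
    """Write `code` downward from (row, col); assumed to wrap to the next column."""
    r, c = row - 1, col - 1
    for _ in range(repeat):
        screen[r][c] = code
        r += 1
        if r == ROWS:  # ran off the bottom: continue at the top of the next column
            r = 0
            c = (c + 1) % COLS


def run_demo():
    screen = [[SPACE] * COLS for _ in range(ROWS)]
    calls = 0
    for r in range(1, ROWS + 1):          # 10 FOR R=1 TO 24
        for c in range(1, COLS + 1):      # 20 FOR C=1 TO 32
            vchar(screen, r, c, 65)       # 30 CALL VCHAR(R,C,65)
            calls += 1                    # one subprogram-name lookup per CALL
    return screen, calls


if __name__ == "__main__":
    screen, calls = run_demo()
    print(calls)  # 768
```

Any fixed per-CALL overhead is multiplied by 768 here, which is why a loop like this is a good stress test for the CALL lookup.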

Edited by senior_falcon
----

13 minutes ago, oddemann said:

From what you describe, this should be made into a separate version of the "cart" in Classic99. I don't think this would break old programs?

No, it should be compatible with any XB program. The problem with making a cart is that the console GROMs need to be modified in 3 places. (I could move HCHAR and VCHAR to cartridge GROM, but that still leaves CALL in the console.)

BASIC is interesting. There is nothing going on in the VDP that would prevent you from using the 40-column text mode. There is room in the MiniMemory GROM and ROM to add things such as DISPLAY AT, sprites, etc.

----

Here is what I did to modify the BASIC CALL routine, which starts at >50DB:

50DB   CEQ @>8342,>C8

50DE   BR GROM @5671

50E0   CLR @>830C           Replaced this with XML >99 which goes to my assembly lookup routine.

50E2   DST @>8356,@>832C

 

The lookup routine has a table containing the BASIC and MiniMemory CALLs. It moves the name of the routine to >834A and looks through the table for a match. If a match is found, it sets up the scratchpad memory just as it would be if the match had been found the normal way:

R2=>9800

R9=address of routine

>8356 points to the next byte in the program after the name of the routine

3 entries are added to the stack and the pointer at >8373 is adjusted

Then B @>0B94, which is where the normal CALL lookup routine goes when it finds a match. This bypasses a huge amount of code.

(Edit) In BASIC, the interpreter reaches >0B94 via a byzantine path: first through GPL, which after many twists and turns does an XML that does many more things before reaching a routine from >0BE8 to >0C0A that looks for a matching name. If one is found, that routine does INCT R11 and B *R11, and R11 now contains >0B94. Since my routine is also in assembly, I can bypass all of that and go directly to >0B94 when a match is found. I set up the stack just as it would be if CALL had been done the usual way.

If no match is found, the routine clears the byte at >830C and does B *R11, which returns to >50E2 to continue with the normal CALL routine. That path handles errors and DSR CALLs, and I also let it handle the slow CALLs like INIT, EXPMEM1, etc.
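The fast-path-with-fallback logic described above can be modeled in a few lines of Python. This is only a sketch of the control flow, not the TMS9900 routine: the table contents, the handler functions, and the names `wedge_call` and `slow_call` are hypothetical, and the comments simply map each branch onto the addresses given in the post.

```python
# Hypothetical model of the CALL wedge's control flow (not the real routine).

def hchar(*args):
    return ("HCHAR", args)


def vchar(*args):
    return ("VCHAR", args)


def charpat(*args):
    return ("CHARPAT", args)


# Table of (name, handler) pairs compared in order, like the table in the routine.
FAST_TABLE = [
    ("HCHAR", hchar),
    ("VCHAR", vchar),
    ("CHARPAT", charpat),
]


def slow_call(name, *args):
    """Stand-in for the normal CALL path at >50E2 (DSR CALLs, INIT, errors)."""
    if name == "INIT":
        return ("INIT", args)
    raise LookupError(f"no subprogram named {name}")


def wedge_call(name, *args):
    for entry_name, handler in FAST_TABLE:
        if entry_name == name:
            # Match: the real routine sets up R2, R9, >8356 and the stack,
            # then does B @>0B94.
            return "fast", handler(*args)
    # No match: the real routine clears >830C and returns via B *R11 to >50E2.
    return "slow", slow_call(name, *args)
```

Names the wedge does not know, including DSR CALLs, fall through to the original path unchanged, which is what keeps existing programs working.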

I was totally amazed when this actually worked!

Edited by senior_falcon
----

On 6/8/2022 at 6:58 AM, senior_falcon said:

Here is what I did to modify the Basic CALL which starts at >50DB

This is very interesting stuff, thanks for explaining and kudos for achieving such boost for TI Basic! 

 

I have based my StrangeCart TI Basic integration on the MiniMemory cartridge, so this is relevant and interesting information for me. The StrangeCart is able to override the system GROMs, so I could copy your idea of injecting the XML >99 or something similar in there. Currently I am not overriding GROMs when running my Basic on the StrangeCart.

 

In my case, the default cartridge image when the StrangeCart is inserted is the modified MiniMemory. I have extended the GROM from 6K to 8K and put additional GPL code in the extra 2K. Luckily, the 4K ROM of the MiniMemory has a sizeable unused block, which I am using for machine code routines: this is where the TMS9900 runs a simple "I/O server" routine while the StrangeCart is running TI Basic code. The server routine copies data to VDP RAM as required and also performs I/O such as keyboard reading. I have created an extra entry, XML >73, in my version of the MiniMemory to support this.

 

I have followed the TI Basic GPL code to check how certain things are done, and every time I find the GPL code quite confusing (partly because I have still used GPL only a little). GPL execution is not fast to begin with, and the code certainly seems to jump all around, which only makes performance worse.

----

At present, when you CALL LOAD("DSK2.MMXB1.OBJ"), the code is placed into expansion memory at >A000 (or MiniMemory RAM if there is no expansion memory). I figured that if this worked, I would put the code into that block of unused memory in the MiniMemory ROM.

I was toying with the idea of putting a copy of BASIC into GROMs 4 and 5. That would make changes like this easy to do without having to change the console GROMs.

Edited by senior_falcon
clarification
----

4 minutes ago, senior_falcon said:

At present when you CALL LOAD("DSK2.MMXB1.OBJ") the code is placed into expansion memory at >A000 (or minimemory ram if there is no expansion memory).

This is awesome.  I was going to suss this out on my own -- I assumed but did not want to assume.  Without a disk system, one would have to make a tape dump to be loaded in EasyBug as you would do, anyway.  The speed improvements for HCHAR and VCHAR alone make the extra effort worth it if you plan to target console-only systems.

----

It's really great to see more interest in TI Basic. With all its shortcomings, it's still no language to pooh-pooh.

I feel that, compared to other languages such as Extended Basic, there's a lot more unknown territory to explore.

So it's great to see what people like @senior_falcon, @speccery and @pixelpedant are doing in that area.

 

Myself I got rather fond of TI Basic while working on its integration in Stevie.

I should do a proper video; it's kinda cool having multiple TI Basic sessions open at the same time and jumping back and forth between sessions.

----

7 hours ago, OLD CS1 said:

How about making CHAR faster?

I will look into that.

 

There is one very creative way to squeeze more performance out of Basic. Make another CALL named USEREM or similar. It would work like this:

10 CALL USEREM

20 REM HCHAR,1,1,42,96,VCHAR,1,10,65,48,SOUND,1000,220,0  etc.

USEREM goes to an assembly routine that reads bytes from the next line, which must be a REM statement. In this example, it would read HCHAR and go to a custom HCHAR routine that reads row, column, character and number of repeats and acts on them. When done with that, it reads VCHAR and goes to a different custom VCHAR routine. And so on.

This would run completely from assembly and would be lightning fast, at least compared to the low bar set by BASIC. It also would allow multiple statements in a line (sort of).

Of course, the big drawback is that everything must be a constant. You couldn't use CALL HCHAR(ROW,COL+I,ASC(SEG$(A$,I,1))), so naturally the regular subprograms must be available as well.
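The USEREM proposal amounts to a tiny constant-only command interpreter. Below is a minimal Python sketch of that idea, not the assembly routine itself. It rests on a few assumptions: `run_userem` and the handler functions are hypothetical names; HCHAR and VCHAR always take four constants (row, column, character, repeats); SOUND takes exactly three (duration, frequency, volume, i.e. a single tone); and HCHAR/VCHAR wrap the way the console routines are assumed to.

```python
# Hypothetical sketch of the proposed CALL USEREM idea: interpret a
# constant-only command list taken from the following REM line.

ROWS, COLS = 24, 32
CELLS = ROWS * COLS


def hchar(screen, row, col, code, repeat):
    # Horizontal run; assumed to wrap to the next row, and from the last cell to the first.
    pos = (row - 1) * COLS + (col - 1)
    for _ in range(repeat):
        screen[pos // COLS][pos % COLS] = code
        pos = (pos + 1) % CELLS


def vchar(screen, row, col, code, repeat):
    # Vertical run; assumed to wrap to the top of the next column.
    pos = (col - 1) * ROWS + (row - 1)
    for _ in range(repeat):
        screen[pos % ROWS][pos // ROWS] = code
        pos = (pos + 1) % CELLS


def run_userem(rem_text, screen, sound_log):
    # Each command has a fixed number of constant arguments in this sketch.
    handlers = {
        "HCHAR": (4, lambda a: hchar(screen, *a)),
        "VCHAR": (4, lambda a: vchar(screen, *a)),
        "SOUND": (3, lambda a: sound_log.append(tuple(a))),  # duration, frequency, volume
    }
    tokens = [t.strip() for t in rem_text.split(",")]
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if name not in handlers:
            raise ValueError(f"unknown command {name!r} at token {i}")
        nargs, action = handlers[name]
        raw = tokens[i + 1:i + 1 + nargs]
        if len(raw) != nargs:
            raise ValueError(f"{name} needs {nargs} constants")
        action([int(t) for t in raw])  # constants only: no expression evaluation
        i += 1 + nargs
```

Because every value is a literal, nothing has to be evaluated at run time, which is where the speed would come from; it is also exactly why the regular subprograms would still be needed.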

 

This is a wonderful opportunity for someone to achieve fame and acclaim.

 

----

It's a neat idea, but, without wishing to derail the topic from improving TI BASIC performance, I think at that point, you might as well invest the time in learning something like Forth, or assembler, or, of course, compiling the BASIC code which already works superbly well.

 

My preference would be to junk the internal ROMS & GROMS and just replace them entirely with something completely new and not related to the TI system in any way. A blank sheet. Not an easy feat of course, because the cards in the PEB (e.g. disk controller) have certain expectations and dependencies.

 

But it would be a great way to develop a basic interpreter on the console that is not shackled by the GPL interpreter. It should be possible (based on no calculations whatsoever ;)) to produce a BASIC interpreter that runs at, say, half the speed of a Forth system. I say that because a BASIC system is always interpreting the tokenised code, whereas a typical Forth system isn't normally doing that - it's executing a bunch of subroutines one way or another.

 

But if you replace the internal ROMs, is it still a TI? I was keen on replacing the internal ROMs in the TI with TurboForth (since it fits in 16K); it would be running on the native 16-bit bus so would be faaaaast, but.... Is it a TI? And... there would be a lot of stuff to redevelop, such as the DSR system (DSRLNK), XMLLNK, etc., which would probably be required for interfacing with the PEB cards. Not an easy job. TI/Microsoft really did a remarkable job back in the day with what they had.

----

Since we have a disassembly of the BASIC GROMs in INTERN, I believe it should be possible to convert every GPL instruction to an assembly equivalent. Such an approach should produce a BASIC interpreter that is totally compatible with the original GPL-based interpreter, but considerably faster. It would have to go somewhere, probably in bank-switched pages of RAM in the cartridge. Certain things such as the editor would not have to be converted to assembly, although that would be nice.

Of course it is better to start with a blank sheet, but I will not hold my breath waiting for an all new assembly based BASIC.

 

----

1 hour ago, senior_falcon said:

Since we have a disassembly of the BASIC groms in INTERN, I believe that it should be possible to convert every gpl instruction to an assembly equivalent. Such an approach should produce a Basic interpreter that is totally compatible with the original gpl based interpreter, but considerably faster. It would have to go somewhere, probably in bank switched pages of ram in the cartridge. Certain things such as the editor would not have to be converted to assembly, although that would be nice.

Of course it is better to start with a blank sheet, but I will not hold my breath waiting for an all new assembly based BASIC.

 

Pretty much exactly what I have been doing with XB.

But some commands just do not see any improvement. For example, CALL GCHAR(row,column,numeric-variable) converted to assembly was no faster.

Actually, the slowdown came from the time needed to do the GPL CLR @>6004 that turns on the ROM 3 page; since there is no way around this, the GPL GCHAR was faster.

----

I like the original idea of this project, as it is essentially just a "wedge."  No special hardware is required beyond the MiniMemory module. No introduction of special syntax, e.g. XB's CHAR subprogram, which overloads and extends the number of characters that can be defined at once to four. No special target environment for programming, just standard TI BASIC with a little kick in the pants.

----

4 hours ago, Willsy said:

But it would be a great way to develop a basic interpreter on the console that is not shackled by the GPL interpreter. It should be possible (based on no calculations whatsoever ;)) to produce a BASIC interpreter that runs at, say, half the speed of a Forth system. I say that because a BASIC system is always interpreting the tokenised code, whereas a typical Forth system isn't normally doing that - it's executing a bunch of subroutines one way or another.

 

And... there would be a lot of stuff to redevelop - the DSR system (DSRLNK), XMLLNK etc which would probably be required for interfacing with the PEB cards. Not an easy job.

Half the speed of Forth? Is that realistic, really? Forth is based on storing word-size addresses of the functions you want to execute. BASIC is based on storing byte-size instruction codes. It's more efficient to get the address directly, and jump to it, than to first find a token, then figure out where the code to execute is located and finally jump to it.

Look at the p-system. The p-code interpreter is pretty well optimized, not only for the 99/4A. It takes seven instructions to perform the fetch, i.e. get the next instruction token (p-code) and figure out what to do with it. Then one more instruction is the minimum to actually do it, if it's simple enough, like adding two integers. The Forth experts here can tell you how many instructions are used to figure out what to do with a Forth address.
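The extra translation step being described (fetch a token, find the handler, then execute) versus Forth-style threaded code (the stored value already is the handler) can be illustrated in Python. This is a conceptual sketch with made-up opcodes and handler names, not a model of the p-system or of any TI interpreter; in Python the dictionary lookup is cheap, so it shows only the shape of the extra step, not TMS9900 instruction counts.

```python
# Conceptual contrast between token dispatch and threaded code.
# Opcodes and handlers are made up for illustration.

def op_push1(stack):
    stack.append(1)


def op_dup(stack):
    stack.append(stack[-1])


def op_add(stack):
    stack.append(stack.pop() + stack.pop())


# Token-based: the program is a sequence of byte codes that must be translated.
TOKEN_TABLE = {0x01: op_push1, 0x02: op_dup, 0x03: op_add}


def run_tokens(code):
    stack = []
    for token in code:                # fetch the next byte token
        handler = TOKEN_TABLE[token]  # translate it into a handler "address"
        handler(stack)                # execute
    return stack


# Threaded: the program already holds the handler references ("addresses").
def run_threaded(thread):
    stack = []
    for handler in thread:  # fetch the address and execute; no translation step
        handler(stack)
    return stack
```

The trade-off runs the other way for size: a byte token takes half the space of a 16-bit address.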

 

Actually, the DSR concept is one of the least tricky parts. Since TI put almost all the burden on the DSR itself, not on the computer's code, you only have to find and execute the DSR. The principle for that is already known. It will work for all properly written DSR code. Only DSRs that take shortcuts by using things in an unintended way will fail. There is at least one of those.

----

3 hours ago, apersson850 said:

Half the speed of Forth? Is that realistic, really? Forth is based on storing word-size addresses of the functions you want to execute. BASIC is based on storing byte-size instruction codes. It's more efficient to get the address directly, and jump to it, than to first find a token, then figure out where the code to execute is located and finally jump to it.

Look at the p-system. The p-code interpreter is pretty well optimized, not only for the 99/4A. It takes seven instructions to perform fetch, i.e. get the next instruction token (p-code) and figure out what to do with it. Then one more instruction is the minimum to actually do it, if it's simple enough. Like add two integers. The Forth experts here can tell you how many instructions are used to figure out what to do with a Forth address.

I'd say that of all the issues to be solved in a BASIC interpreter, the threading model and the address lookup/execution model are among the less contentious. One of the main issues is the syntax validation and error checking that takes place at runtime, rather than when a line of BASIC code is crunched/tokenised.

 

A case in point:

 

10 CALL HCHAR(1,2)

 

This line is completely invalid in TI BASIC/XB and results in an INCORRECT STATEMENT error. However, there's no reason why that particular error (a missing argument) could not be caught at compile/tokenisation time. Instead, the interpreter writers chose to check it at run time (and, to be fair, most BASIC interpreters do this), which adds a further runtime penalty.

 

This is a long-winded way of saying "Yes, you're probably right!" - but I did pre-qualify my ill-considered statement by saying it was based on no research at all - and it was! Read at your own risk :)

 

It's an interesting point though, isn't it? There is a lot of syntax validation that occurs at runtime that should be caught at compile time, e.g. missing operands, unbalanced parentheses, etc. Catching these up front would contribute to faster run times, even for interpreted code.
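The idea that missing operands and unbalanced parentheses could be caught once, when a line is crunched, can be sketched in Python. This is a hypothetical checker, not how TI BASIC's cruncher works: `check_call`, `split_args` and the `ARG_COUNTS` table are assumptions, string literals are not handled, and the counts follow the documented forms (HCHAR and VCHAR take row, column, character code and an optional repeat count; GCHAR takes row, column and a variable).

```python
# Hypothetical crunch-time check for a few CALL statements (illustration only).
# Statements are passed without their line number, e.g. "CALL HCHAR(1,2)".

# Allowed argument counts (min, max) for some built-in subprograms.
ARG_COUNTS = {"HCHAR": (3, 4), "VCHAR": (3, 4), "GCHAR": (3, 3)}


def split_args(text):
    """Split a top-level comma list, keeping nested parentheses together."""
    args, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise SyntaxError("unbalanced parentheses")
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    if depth != 0:
        raise SyntaxError("unbalanced parentheses")
    args.append(current.strip())
    return args


def check_call(statement):
    """Return the argument list if the CALL is checkable and valid, else None."""
    text = statement.strip()
    if not text.startswith("CALL "):
        return None  # only CALL statements are checked in this sketch
    name, paren, rest = text[5:].strip().partition("(")
    name = name.strip()
    if name not in ARG_COUNTS:
        return None  # unknown names (e.g. DSR CALLs) are left to run time
    if not paren or not rest.endswith(")"):
        raise SyntaxError("INCORRECT STATEMENT")
    args = split_args(rest[:-1])
    low, high = ARG_COUNTS[name]
    if "" in args or not low <= len(args) <= high:  # missing operand or wrong count
        raise SyntaxError("INCORRECT STATEMENT")
    return args
```

With a check like this, a line such as 10 CALL HCHAR(1,2) would be rejected once, when it is entered, instead of being validated every time the statement executes.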

----

8 hours ago, OLD CS1 said:

I like the original idea of this project, as it is essentially just a "wedge."  No special hardware is required beyond the MiniMemory module. No introduction of special syntax, e.g. XB's CHAR subprogram, which overloads and extends the number of characters that can be defined at once to four. No special target environment for programming, just standard TI BASIC with a little kick in the pants.

The limit checks can be modified for CALL CHAR. This would allow you to redefine characters 4-23, which are in the crunch buffer. The limit checks for CALL COLOR would need to be changed as well so you can set the colors of those characters. Or, with the changed limits for CALL CHAR, you have access to the sprite attribute list and could define up to 27 sprites (standard size, unmagnified). The sprite attribute list has to share space with the color table, which is where sprites 4-8 are located. Auto motion should be possible, again using CALL CHAR.

Is this getting too far from standard TI BASIC?

----

23 minutes ago, senior_falcon said:

The limit checks can be modified for CALL CHAR. This would allow you to redefine characters 4-23, which are in the crunch buffer. The limit checks for CALL COLOR would need to be changed as well so you can set the colors of those characters. Or, with the changed limits for CALL CHAR, you have access to the sprite attribute list and could define up to 27 sprites (standard size, unmagnified). The sprite attribute list has to share space with the color table, which is where sprites 4-8 are located. Auto motion should be possible, again using CALL CHAR.

Is this getting too far from standard TI BASIC?

A good philosophical question.  While I think getting character definitions in place faster is sufficient, if we are talking about using the MiniMemory to run a wedge, why not expand CHAR and COLOR a little bit?  MiniMemory already adds several subprograms to TI BASIC, like POKEV and CHARPAT, so revamping existing subprograms is not a big deal.

 

But then, on the other hand, one could utilize POKEV for the same purposes.  I wrote some MiniMemory-specific programs which manipulated the console to use things like sprites, auto-motion, and text mode.

 

I was thinking about the purity question and considered all of the software BASIC extensions and wedges for the Commodore 64.  Why hold TI BASIC to a different purity standard than the Commodore 64 -- or VIC-20, for that matter, with its never-ending array of memory cartridges and BASIC extensions?  If it can all still run on a console with just MM, what does it hurt to spice BASIC up a bit?  We can have a wedge which runs existing programs with an extra kick of speed while offering a little extra functionality to those who want to use it.

----

14 hours ago, OLD CS1 said:

I like the original idea of this project, as it is essentially just a "wedge."  No special hardware is required beyond the MiniMemory module. No introduction of special syntax, e.g. XB's CHAR subprogram, which overloads and extends the number of characters that can be defined at once to four. No special target environment for programming, just standard TI BASIC with a little kick in the pants.

Unfortunately, this is not actually true. As @senior_falcon wrote earlier in the thread, he modified the console GPL code to insert XML >99 at >50E0 in the GROM space. So you need to have GRAM capability and MiniMemory at the same time. Or from my perspective, you need the StrangeCart :) , which can act as the MiniMemory and override the system GROMs simultaneously. Without GRAM in the system GROM area, the wedge can't intercept the CALL routine, and the speed benefits to existing unmodified Basic programs are not realized.

The action point for me is to modify the StrangeCart code a little so that I can test this on real hardware. So far I have only tested the system GROM override feature; I haven't implemented TMS9900-writable GROMs, i.e. GRAMs. This is a simple thing to do.

 

15 hours ago, TheMole said:

If @speccery's BASIC interpreter for the strangecart can be adapted and cross-compiled, it might be a good starting point?

My BASIC interpreter is written in C++, and its base class currently compiles to about 24K of ARM Cortex-M4F code. The StrangeCart firmware derives two implementations of this base class: one communicates over a serial port (over USB) for testing, and the other actually talks to the TMS9900 and does the "real thing". Thus the base class alone is not sufficient, and a bit more code would be required, probably something like 27K of ARM code in total (just a guess).

 

If compiled for the TMS9900, the code would need to be ported to C, as I don't think we have a C++ compiler for the TMS9900. This is not a huge effort, however, as the base class is just over 3500 lines of C++. Anyway, since the TMS9900 compiled size would well exceed the 8K available in cartridge space, the code would need to be restructured to work with memory banks, which is a further complication.

 

Finally, currently my version of TI Basic uses single precision floating point as the base data type for numeric Basic variables. This choice is by design since I wanted to have Basic with floating point math and use the FPU hardware of the StrangeCart. A floating point library for the TMS9900 would be needed. This we could take from the console ROMs, but integrating that would be more work.

----

43 minutes ago, speccery said:

Unfortunately, this is not actually true. As @senior_falcon wrote earlier in the thread, he modified the console GPL code to insert XML >99 at >50E0 in the GROM space. So you need to have GRAM capability and MiniMemory at the same time. Or from my perspective, you need the StrangeCart :) , which can act as the MiniMemory and override the system GROMs simultaneously. Without GRAM in the system GROM area, the wedge can't intercept the CALL routine, and the speed benefits to existing unmodified Basic programs are not realized.

I went back to see the disappointment with my own eyes.  I missed the entire post with the XML modification.  This will not work in hardware, but works great in Classic99.  Still an interesting project, even while lacking application to real iron without modification.

----

11 hours ago, Willsy said:

I'd say that of all the issues to be solved in a BASIC interpreter, the threading model and the address lookup/execution model are among the less contentious. One of the main issues is the syntax validation and error checking that takes place at runtime, rather than when a line of BASIC code is crunched/tokenised.

I was in a bit of a hurry when I wrote my comment. My intention was to point out that Pascal, as implemented with the p-code card, doesn't stand a chance against Forth's speed, due to the interpreter's principle of operation. On the other hand, many p-code instructions are half the size of a Forth instruction, as the p-codes are bytes.

Still, Pascal is syntax-checked at compile time, not at runtime. The checks actually done at runtime are those that are impossible at compile time, such as verifying that an index into an array isn't out of range. If the index is computed, the error is impossible to catch at compile time, as you can't know which data the calculation will be based on.

Even so, Pascal at best reaches half the speed of Forth. I did a test of something else the other day, where Pascal did the same thing in just over two seconds, while Extended BASIC took a bit over eleven seconds. So it's much quicker than BASIC, especially if you can use integers where you had to use floating point variables in BASIC.

Now I see you are considering the same issue. What you want is actually the same thing as the p-system does, but you want a line-by-line compilation, rather than compiling the whole program in one fell swoop. If I understood correctly?

 

Edited by apersson850