
StrangeCart


speccery


RXB unlike XB256 can work with ONLY CONSOLE without 32K or any other devices.

And you get Assembly speeds from CONSOLE.

They are not even close to the same approach.

You are not being objective in the least.

 

Also if you take out the GROM delays then GPL would be the same speed as Forth.

 


Nowhere in my message was I subjective. I tested RXB and XB256 for a while and made my choice. Please accept that.
"ONLY CONSOLE without 32K" is an argument that you put forward over and over on AtariAge. Yes, it is technically interesting, and a challenge you especially enjoy, but who among present users really cares now? We are no longer in the 1980s, when any TI-99/4A expansion cost an arm and a leg. Now the 32 KB expansion costs next to nothing. As with XB256, anyone who wants to use RXB must also get a FlashROM 99 or a FinalGROM99, which also costs money. So anyone who can buy such a cartridge can also buy a 32 KB expansion. Price is not a problem anymore. So I personally prefer a language that offers many powerful graphics and sound features and great performance thanks to compilation, and that also benefits from the 32 KB memory space for more elaborate programs than the 16 KB of the stock computer can offer.


14 hours ago, RXB said:

A good portion of XB3 is Assembly in the ROMs so I have been disassembling XB3 as there is no source code.

So far I have 90% of ROM 1 done and 30% of ROM 2 done.

With the GPL and ROMs source I can make XB way faster as more Assembly replaces GPL and is 100% backwards compatible.

If GPL could be speeded up it would kill my task entirely as useless.

It wouldn't be useless, Rich, but it would give people some additional use cases. That is the beauty of our hobby--variety.


1 hour ago, RXB said:

Also if you take out the GROM delays then GPL would be the same speed as Forth.

It would be about the same speed as byte-coded Forth.

 

Byte code Forth is about 30% .. 40% slower than indirect threaded code Forth (Most TI-99 Forth systems)

ITC is 15% slower than direct threaded code Forth

DTC  is ~20% .. 30% slower than sub-routine threaded code Forth (Camel99 DTC Forth)

STC  is  ~2.5X slower than native code generating Forth compilers. :)
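
To see how these steps compound, here is a minimal sketch. It assumes "p% slower" means the run time is (1 + p/100) times longer, and it simply multiplies the quoted ranges together. The names STEPS and compound are made up for illustration.

```python
# Relative run time of each implementation style versus the next faster one,
# taken from the figures quoted above. "p% slower" is ASSUMED to mean that
# the run time is multiplied by (1 + p/100).
STEPS = [
    ("byte code vs ITC", 1.30, 1.40),
    ("ITC vs DTC", 1.15, 1.15),
    ("DTC vs STC", 1.20, 1.30),
    ("STC vs native", 2.50, 2.50),
]


def compound(steps):
    """Multiply the per-step slowdown factors into one overall (low, high) range."""
    low = high = 1.0
    for _name, step_low, step_high in steps:
        low *= step_low
        high *= step_high
    return low, high


low, high = compound(STEPS)
print(f"byte code vs native code: {low:.2f}x to {high:.2f}x slower")
```

Under that reading, byte-coded Forth would land roughly 4.5 to 5.2 times slower than a native code compiler.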

 


On 6/8/2023 at 6:03 AM, speccery said:

At least I don't see it that way. When we think about speeding up GPL in the broad sense, it appears there would be three ways to go:

  1. Convert GPL to TMS9900 machine code, which is what you have been doing. The beauty of this is that all you would need in addition to the bare computer is a ROM/GROM cartridge (I guess ROM only if everything was converted from GPL to assembly). Also, this approach is "era correct" since you're using the original CPU etc. aside from potentially using higher density memory chips RXB could have existed back in the day with the same good performance.

In principle, you could take a gpl program like TI BASIC or XB and do an instruction by instruction replacement of the gpl code with assembly instructions.

The following lines are from the TI BASIC interpreter, with assembly equivalents on the right.

Naturally, what I have written would need some changes. There are no labels. CALL GROM is stack oriented, so instead of BL you'd have to implement a stack for this, but I think you could use the same stack locations as regular BASIC.

I am certain that a total rewrite would be more efficient, but this approach has the advantage of not needing a lot of design work. It is based on code that is known to work, so you would not need to reinvent TI BASIC.

I wonder if one of the AI engines in the news recently could be trained to do this automatically.

 

(Edit) The beauty of converting TI BASIC as a first step is that it runs from VDP RAM, which means that you have 32K of memory to use for the interpreter. So you can just load and test without the complexities of a bank-switched cartridge. This would be an excellent first step to see:

1 - if this is even possible

2 - if it is possible, what sort of speed increase could result.

 

2C2B DST  @>8314,>0064      MOV  @HX0064,@>8314
2C2F DST  @>831E,>000A      MOV  @HX000A,@>831E
2C33 ST   @>8308,>2C        MOVB @HX2C00,@>8308
2C36 DDEC @>8320            DEC  @>8320
2C38 CALL GROM@>2C75        BL   @G2C75
2C3B BS   GROM@>2C2A        JEQ  G2C2A
2C3D CALL GROM@>2EF9        BL   @G2EF9
2C40 CZ   @>830C            CB   @HX000A,@>830C  or MOVB @>830C,@>830C
2C42 BR   GROM@>2C4F        JNE  G2C4F
2C44 CZ   @>8300            CB   @HX000A,@>8300
2C46 BR   GROM@>2C4D        JNE  G2C4D        (This seems odd, I think it could be BR GROM@>2016)
2C48 CALL GROM@>2C75        BL   @G2C75
2C4B BR   GROM@>2C65        JNE  G2C65
2C4D BR   GROM@>2016        JNE  G2016
2C4F DST  @>8314,@>8344     MOV  @>8344,@>8314
2C52 CZ   @>8300            CB   @HX000A,@>8300
2C54 BR   GROM@>2C60        JNE  G2C60
2C56 CALL GROM@>2C75        BL   @G2C75
2C59 BS   GROM@>2C2A        JEQ  G2C2A
2C5B ST   @>830E,@>8309     MOVB @>8309,@>830E
2C5E BS   GROM@>2C65        JEQ  G2C65
2C60 CALL GROM@>2C7A        BL   @G2C7A
2C63 BS   GROM@>2C2A        JEQ  G2C2A
2C65 CALL GROM@>2EF9        BL   @G2EF9

                     HX0064 DATA >0064
                     HX000A DATA >000A
                     HX2C00 DATA >2C00
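
The instruction-by-instruction idea can be sketched as a tiny translator. This is only an illustration covering the handful of GPL opcodes that appear in the listing above. The function names are hypothetical, the HXnnnn and Gnnnn label scheme just follows the listing, CZ is rendered with the MOVB form offered as an alternative there, and the stack handling for CALL and the limited range of JEQ/JNE are ignored.

```python
def constant_operand(text, constants, byte=False):
    """Return an assembly operand; immediates become @HXnnnn DATA labels."""
    if text.startswith("@"):
        return text                      # already a memory operand
    value = int(text.lstrip(">"), 16)
    word = value << 8 if byte else value  # byte constants live in the high byte
    label = f"HX{word:04X}"
    constants[label] = word
    return "@" + label


def grom_label(text):
    """GROM@>2C75 -> G2C75 (label naming taken from the listing)."""
    return "G" + text.split(">")[1]


def translate(gpl, constants):
    """Translate one GPL instruction (without its address) to 9900 assembly."""
    parts = gpl.split()
    op = parts[0]
    args = parts[1] if len(parts) > 1 else ""
    if op in ("DST", "ST"):
        dst, src = args.split(",")
        mnemonic = "MOV" if op == "DST" else "MOVB"
        return f"{mnemonic} {constant_operand(src, constants, byte=(op == 'ST'))},{dst}"
    if op == "DDEC":
        return f"DEC {args}"
    if op == "CALL":
        return f"BL @{grom_label(args)}"
    if op == "BS":
        return f"JEQ {grom_label(args)}"
    if op == "BR":
        return f"JNE {grom_label(args)}"
    if op == "CZ":
        return f"MOVB {args},{args}"     # moving a byte onto itself sets EQ if it is zero
    raise ValueError(f"no translation rule for {op}")


constants = {}
for line in ("DST @>8314,>0064", "ST @>8308,>2C", "CALL GROM@>2C75", "BS GROM@>2C2A"):
    print(f"{line:<22}{translate(line, constants)}")
for label, word in constants.items():
    print(f"{label} DATA >{word:04X}")
```

Anything not in the table (IO, FMT and so on) raises an error, which is where the real work of such a converter would be.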

 

 

Edited by senior_falcon

10 hours ago, fabrice montupet said:

Nowhere in my message was I subjective. I tested RXB and XB256 for a while and made my choice. Please accept that.
"ONLY CONSOLE without 32K" is an argument that you put forward over and over on AtariAge. Yes, it is technically interesting, and a challenge you especially enjoy, but who among present users really cares now? We are no longer in the 1980s, when any TI-99/4A expansion cost an arm and a leg. Now the 32 KB expansion costs next to nothing. As with XB256, anyone who wants to use RXB must also get a FlashROM 99 or a FinalGROM99, which also costs money. So anyone who can buy such a cartridge can also buy a 32 KB expansion. Price is not a problem anymore. So I personally prefer a language that offers many powerful graphics and sound features and great performance thanks to compilation, and that also benefits from the 32 KB memory space for more elaborate programs than the 16 KB of the stock computer can offer.

Do whatever you want no one is stopping you.

I do not rag on anything you do and getting sick of you doing it to me.

 


6 hours ago, TheBF said:

It would be about the same speed as byte-coded Forth.

 

Byte code Forth is about 30% .. 40% slower than indirect threaded code Forth (Most TI-99 Forth systems)

ITC is 15% slower than direct threaded code Forth

DTC  is ~20% .. 30% slower than sub-routine threaded code Forth (Camel99 DTC Forth)

STC  is  ~2.5X slower than native code generating Forth compilers. :)

 

Well unlike Forth the OS is on the 16 bit bus that includes the GPL Interpreter.

Tursi emulated GROM without the delays and stated it was as fast as Forth.


5 hours ago, RXB said:

Well unlike Forth the OS is on the 16 bit bus that includes the GPL Interpreter.

Tursi emulated GROM without the delays and stated it was as fast as Forth.

Yes, the 16-bit bus is a huge advantage on the 99 for sure. Very hard to top that.

Tursi's comparison to Forth is true as long as you limit the comparison to indirect threaded Forths, like all the Forths made for the TI-99 in the past.

 

I have a direct threaded compiler that is about 15% faster than Turbo Forth on most things I have tested.

 

I have not made a sub-routine threaded Forth yet, but I want to make one to see what happens.

What this means is that Forth compiles real machine code, but each word is a 9900 sub-routine.

It should be about 2X faster than threaded Forths out there right now.  Downside is that programs will be much bigger.

The fix for that is to do a lot of "inline" instructions rather than calling every command. We shall see if I can figure that out. :)

(This is how the commercial Forth compilers work since 1995 or so) 

 

I have a machine Forth system that generates native code which is 3X to 5X faster on some tests I have done. It is only a compiler, no interpreter.

 

And if you want to get really crazy, I made something called ASMFORTH. It is a Forth virtual machine with two stacks, Forth syntax for loops and branching but you can also use the registers. :-)

I made this one because @Reciprocating Bill made a sieve benchmark that blew everything out of the water including GCC and was 10X faster than threaded Forth. :) 

 

Here is what the code looks like. (It's really an assembler in a disguise) :) 

https://github.com/bfox9900/ASMFORTH/blob/main/demo/ASMFORTH-SIEVE.FTH

 

All that to say Forth is really just an idea about computing. How you implement it is a personal choice.

 

 

 


6 hours ago, RXB said:

Well unlike Forth the OS is on the 16 bit bus that includes the GPL Interpreter.

Tursi emulated GROM without the delays and stated it was as fast as Forth.

I don't know how fast Forth is. I did run a console with no real GROMs and UberGROMs that were 2-3 times faster than real GROMs... and it didn't make much difference to performance of TI BASIC. (A simple FOR...NEXT for 300 counts is about 1 second in either case). I then analyzed the GPL interpreter and determined that the GPL interpreter doesn't hit the GROMs often enough for their performance to make a lot of difference.

 

Classic99 didn't emulate GROM speed in the early days, and it was pretty hard to see the difference. Except for copy loops the impact of GROM speed is pretty minimal.
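
The reasoning here is essentially Amdahl's law: only the share of run time actually spent waiting on GROM gets faster. A minimal sketch follows. The 10% GROM share is a made-up placeholder for illustration, not a measurement from this thread, and overall_speedup is a hypothetical helper name.

```python
def overall_speedup(grom_fraction, grom_speedup):
    """Amdahl's law: only the GROM-bound share of run time is accelerated."""
    return 1.0 / ((1.0 - grom_fraction) + grom_fraction / grom_speedup)


# HYPOTHETICAL figure: assume 10% of interpreter time is spent on GROM access.
GROM_SHARE = 0.10

for factor in (2, 3, 1000):
    gain = overall_speedup(GROM_SHARE, factor)
    print(f"GROM {factor}x faster -> whole program {gain:.3f}x faster")
```

With a small GROM share, even an enormous GROM speedup is capped at 1 / (1 - share), which matches the observation that faster GROMs barely change a FOR...NEXT loop.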

 

Please don't propagate that through the threads. Rich just remembered a little wrong. ;)

 

I do believe that with modern techniques we could re-write the GPL interpreter and make it fly. It does a lot of redundant work. But I guess nobody will believe me until it's done. ;) My thinking is ideas like the strangecart are better - I want to emulate the system on the cartridge and only talk to the console for actual I/O. Should be able to make XB fly that way. ;)


 

Edited by Tursi

12 hours ago, RXB said:

Do whatever you want no one is stopping you.

I do not rag on anything you do and getting sick of you doing it to me.

 

I read all the threads only because I am very interested in everything concerning our dear TI-99/4A, and I participate when I like to; don't be paranoid. So catch your breath and don't be surprised if, maybe one day, I answer a future message from you to share my point of view.


18 hours ago, senior_falcon said:

In principle, you could take a gpl program like TI BASIC or XB and do an instruction by instruction replacement of the gpl code with assembly instructions.


This is wild. You show that there is a one to one correspondence here between GPL and 9900 Ass'y language. :o

The only advantage with this kind of interpreter would therefore be portability. Program size is going to be the same or similar.

 

 

 


2 hours ago, TheBF said:

This is wild. You show that there is a one to one correspondence here between GPL and 9900 Ass'y language. :o

The only advantage with this kind of interpreter would therefore be portability. Program size is going to be the same or similar.

 

 

 

Yea that is exactly what I am doing in RXB. 

Taking GPL routines and turning them into Assembly.

The big difference is I am doing it so it can still run from Console only without need for expansion RAM. Of course, this is way tougher than using Expansion RAM.

Example of a math routine:


763C 0203  LI   R3,>7B14         (>4001 3907 6020 435F)
763E 7B14
7640 0204  LI   R4,ARG
7642 835C
7644 CD33  MOV  *R3+,*R4+        >4001 INTO ARG
7646 CD33  MOV  *R3+,*R4+        >3907 INTO ARG2
7648 CD33  MOV  *R3+,*R4+        >6020 INTO ARG4
764A C513  MOV  *R3,*R4          >435F INTO ARG6
764C C30B  MOV  R11,R12
764E 06A0  BL   @FADD =>0D80
7650 0D80
7652 C2CC  MOV  R12,R11
7654 C28B  MOV  R11,R10          *
7656 06A0  BL   @>79F6
7658 79F6
765A 0203  LI   R3,>7B1C         (>3F3F 4213 4D17 433A)
765C 7B1C
765E 0204  LI   R4,ARG
7660 835C
7662 CD33  MOV  *R3+,*R4+        >3F3F INTO ARG
7664 CD33  MOV  *R3+,*R4+        >4213 INTO ARG2
7666 CD33  MOV  *R3+,*R4+        >4D17 INTO ARG4
7668 C513  MOV  *R3,*R4          >433A INTO ARG6
766A 06A0  BL   @FMULT =>0E88
766C 0E88
766E 04CC  CLR  R12
7670 D320  MOVB @FAC,R12
7672 834A
7674 0760  ABS  @FAC
7676 834A
7678 9820  CB   @FAC,@>7C9E      (>44)
767A 834A
767C 7C9E
767E 15D9  JGT  >7632
7680 06A0  BL   @>7A26
7682 7A26
7684 06A0  BL   @>7028
7686 7028
7688 D060  MOVB @FAC,R1
768A 834A
768C 130B  JEQ  >76A4
768E 0221  AI   R1,>BA00
7690 BA00
7692 1508  JGT  >76A4
7694 0221  AI   R1,>5100
7696 5100
7698 0981  SRL  R1,8
769A D821  MOVB @VAR0(R1),@R12LB
769C 8300
769E 83F9
76A0 024C  ANDI R12,>FF03
76A2 FF03
76A4 06A0  BL   @SSUB
76A6 0D74
76A8 2320  COC  @>6058,12        (>0001)
76AA 6058
76AC 1609  JNE  >76C0
76AE 0201  LI   R1,ARG
76B0 835C
76B2 CC60  MOV  @>7B76,*R1+      (>4001 INTO ARG)
76B4 7B76
76B6 04F1  CLR  *R1+             >0000 INTO ARG2
76B8 04F1  CLR  *R1+             >0000 INTO ARG4
76BA 04D1  CLR  *R1              >0000 INTO ARG6
76BC 06A0  BL   @>0D7C
76BE 0D7C
76C0 2320  COC  @>60C2,12        (>0002)
76C2 60C2
76C4 1601  JNE  >76C8
76C6 054C  INV  R12
76C8 C80C  MOV  R12,@TOPSTK
76CA 8310
76CC 0203  LI   R3,FAC
76CE 834A
76D0 0204  LI   R4,ARG
76D2 835C
76D4 0205  LI   R5,LINUM
76D6 8312
76D8 CD13  MOV  *R3,*R4+         FAC INTO ARG
76DA CD73  MOV  *R3+,*R5+        FAC INTO *>8312
76DC CD13  MOV  *R3,*R4+         FAC2 INTO ARG2
76DE CD73  MOV  *R3+,*R5+        FAC2 INTO *>8312
76E0 CD13  MOV  *R3,*R4+         FAC4 INTO ARG4
76E2 CD73  MOV  *R3+,*R5+        FAC4 INTO *>8312
76E4 C513  MOV  *R3,*R4          FAC6 INTO ARG6
76E6 C553  MOV  *R3,*R5          FAC6 INTO *>8312
76E8 06A0  BL   @FMULT =>0E88
76EA 0E88
76EC 06A0  BL   @>7A9E
76EE 7A9E
76F0 7BEC  DATA >7BEC            (apparently an inline data word for the routine at >7A9E; the disassembler decoded it as SB @>0203(R12),@LINUM(R15))
76F2 0203  LI   R3,LINUM
76F4 8312
76F6 0204  LI   R4,ARG
76F8 835C
76FA CD33  MOV  *R3+,*R4+
76FC CD33  MOV  *R3+,*R4+
76FE CD33  MOV  *R3+,*R4+
7700 C513  MOV  *R3,*R4
7702 06A0  BL   @FMULT =>0E88
7704 0E88
7706 0560  INV  @TOPSTK
7708 8310
770A 1102  JLT  >7710
770C 0520  NEG  @FAC             NEGATE 1st WORD
770E 834A
7710 0460  B    @>74A4
7712 74A4

 


3 hours ago, TheBF said:

This is wild. You show that there is a one to one correspondence here between GPL and 9900 Ass'y language. :o

The only advantage with this kind of interpreter would therefore be portability. Program size is going to be the same or similar.

Actually, GPL is considerably more compact if you don't count the interpreter. In the example above, the gpl instructions take 58 bytes and the assembly instructions take 94.

I deliberately chose a section of code that was easy to convert to assembly. Most of the gpl instructions could be converted directly to assembly, but there are some more complex instructions such as IO and a strange one called FMT, which I have never used.

From Intern:

Op-Code: >08
 Description: FMT several operands
Description: Special output command for the screen. The FMT Interpreter is independent of the GPL Interpreter. ( See ROM-Listing >04DE through >05A1 )

 

So the task would require a good understanding of GPL. I think a clever programmer could write something that did the conversion automatically.

 

 


3 hours ago, senior_falcon said:

Actually, GPL is considerably more compact if you don't count the interpreter. In the example above, the gpl instructions take 58 bytes and the assembly instructions take 94.

I deliberately chose a section of code that was easy to convert to assembly. Most of the gpl instructions could be converted directly to assembly, but there are some more complex instructions such as IO and a strange one called FMT, which I have never used.

From Intern:

Op-Code: >08
 Description: FMT several operands
Description: Special output command for the screen. The FMT Interpreter is independent of the GPL Interpreter. ( See ROM-Listing >04DE through >05A1 )

 

So the task would require a good understanding of GPL. I think a clever programmer could write something that did the conversion automatically.

 

 

I started a GPL conversion project at one time into pure assembly.

It ended when it appeared that I was not up to that level of conversion.

Which is why I was hopeful someone would make a device like strange cart to do that instead.


I've worked a little on optimising Basic execution on the StrangeCart, and I've been thinking about token formats for Basic. This message is going to be a bit technical, hopefully it makes sense. 

 

The tokenizer in the TI BASIC is a bit weird. Consider this line:

10 ABC=123

The TI BASIC tokenizer - and my tokenizer by default - create this, a screenshot from js99er.net VDP memory:

[Screenshot: js99er.net view of VDP memory showing the tokenized line]

There is the line number table at >37C9, which has a single entry: Line number >000A (10 decimal) and the pointer to the line, >37CE.

In there we have (everything in hex):

37CE: 41 42 43 (ABC)
37D1: BE (token for assignment = )
37D2: C8 03 31 32 33 (Unquoted string, length 3, contents 123)

Thus at >37CE we find the string ABC, in ASCII. Before it, the byte at >37CD is >0A, which is the length of the tokenized line. The pointer in the line number table never points to the length byte; it points to the first actual character.
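
To make the layout concrete, here is a minimal sketch that walks the bytes of this one tokenized line. It assumes that every byte of >80 or above is a token, that the line ends with a >00 terminator (which would account for the length byte being >0A for nine visible bytes), and it only special-cases the unquoted-string token >C8. The function name decode_line is made up.

```python
TOKEN_UNQUOTED_STRING = 0xC8   # followed by a length byte and that many characters


def decode_line(body):
    """Split a tokenized line body into (kind, value) items.

    ASSUMPTIONS: bytes >= 0x80 are tokens, 0x00 terminates the line,
    and anything else is part of a plain ASCII name such as ABC.
    """
    items = []
    i = 0
    while i < len(body) and body[i] != 0x00:
        byte = body[i]
        if byte == TOKEN_UNQUOTED_STRING:
            length = body[i + 1]
            text = body[i + 2:i + 2 + length].decode("ascii")
            items.append(("unquoted", text))
            i += 2 + length
        elif byte >= 0x80:
            items.append(("token", byte))
            i += 1
        else:
            # No stored length: scan forward until the name ends.
            j = i
            while j < len(body) and 0x00 < body[j] < 0x80:
                j += 1
            items.append(("name", body[i:j].decode("ascii")))
            i = j
    return items


line = bytes.fromhex("414243" "BE" "C803313233" "00")
print(decode_line(line))
```

The inner scanning loop for the name is the per-reference cost discussed next: the constant carries its length, the variable name does not.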

 

Anyway, the thing is that the variable name ABC is presented just like that, ABC, while the number 123 is tokenized as unquoted string, which conveniently includes the length byte. As I've been focused on performance, the small issue with ABC being stored just like that is that since there is no string length, the interpreter needs to count the length every time so that it can search the symbol table with that length. On the other hand, the constant 123 is stored as a string with length. 

From a performance point of view, the interpreter could run faster if the variable name length was precomputed, i.e. if it was stored as an unquoted string. I already implemented this as an optional feature, and it does improve performance if the variable names are a bit longer.

 

For the constant 123, it would be better if the numeric constants would be stored with their own token, and then stored in binary format not requiring any run time conversions. For example if there was a token for 16-bit integers, 123 could be encoded with that token followed by two bytes. This could then be interpreted in fixed time, very fast, without all the checks normally needed when converting from ASCII to a binary number. In a simple scenario all numbers could be handled with two tokens: a token for 16 bit numbers, and another token for floating point format constants to handle all non-integers and numbers not fitting in to 16 bits.
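
A minimal sketch of the two-token idea follows. The token values >F0 and >F1 are hypothetical placeholders, the integer range is assumed to be unsigned 16-bit, and an IEEE double from Python's struct module stands in for the TI's own 8-byte floating point format, which a real implementation would use instead.

```python
import struct

TOKEN_INT16 = 0xF0   # HYPOTHETICAL token value for a 16-bit integer constant
TOKEN_FLOAT = 0xF1   # HYPOTHETICAL token value for a floating point constant


def encode_constant(text):
    """Tokenize a numeric literal once, at entry time."""
    try:
        value = int(text)
    except ValueError:
        value = None                      # not a plain integer, e.g. 1.5 or 1E5
    if value is not None and 0 <= value <= 0xFFFF:
        return bytes([TOKEN_INT16]) + value.to_bytes(2, "big")
    # Stand-in encoding: IEEE double instead of the TI radix-100 format.
    return bytes([TOKEN_FLOAT]) + struct.pack(">d", float(text))


def decode_constant(data):
    """Return (value, bytes consumed); no ASCII parsing at run time."""
    if data[0] == TOKEN_INT16:
        return int.from_bytes(data[1:3], "big"), 3
    if data[0] == TOKEN_FLOAT:
        return struct.unpack(">d", data[1:9])[0], 9
    raise ValueError("not a numeric constant token")


print(encode_constant("123").hex())          # token byte plus two data bytes
print(decode_constant(encode_constant("1.5")))
```

For 123 this takes three bytes instead of the five used by the unquoted-string form (C8 03 31 32 33), and decoding is a fixed-cost read.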

 

For variable references, it's time consuming and complex to have to search the symbol table all the time.  I'm thinking about creating a new token for variable references, let's call it VAR, and have a separate table which would contain the name, and a runtime pointer to the variable entry in the runtime symbol table. That would mean that the name "ABC" would be copied into a variable name table, let's say as entry 0 since it's the first variable in the program. In the tokenized program line there would be the token VAR, followed by an 8-bit index into the variable name table. This way all references to ABC would become two bytes VAR >00, and the program size would become smaller if ABC was used a lot (ABC uses 3 bytes, VAR+index two bytes). The variable name table would need to contain the length of 3, the string ABC, and a pointer to the runtime symbol table.

 

In this setup the variable name table would become an integral part of a Basic program, as important as the tokenized lines. However, it would be possible to convert it back to normal TI Basic format for saving. Listing would also be simple.

 

When a program is run, the symbol table is cleared at start. [The symbol table in the StrangeCart Basic contains all variable values, their dimensions if they are arrays, their type (floating point or string) etc.]

With this new token format the pointers in the variable name table would also need to be set to zero on start. As VAR 0 is encountered for the first time, the variable would be created in the runtime symbol table normally, the same way variables are created as they are encountered during interpretation. Once that's done, the address of the variable in the symbol table would be stored into this new variable name table. The net result would be that variables would never have to be searched, instead they could be directly referenced with the pointers in the variable name table. 
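
The lazy binding described here can be sketched in a few lines. This is an illustration only: the class and method names are hypothetical, the symbol table is reduced to a list of values, and a list index stands in for the runtime pointer.

```python
class SymbolTable:
    """Stand-in for the runtime symbol table (values only, for illustration)."""

    def __init__(self):
        self.values = []

    def create(self, name):
        self.values.append(0.0)          # new numeric variable starts at 0
        return len(self.values) - 1      # a list index plays the role of the pointer


class VariableNameTable:
    """Maps a VAR index to a name and, once bound, to a runtime slot."""

    def __init__(self):
        self.names = []
        self.slots = []                  # None means "not bound yet"

    def intern(self, name):
        """Tokenizer side: return the VAR index for a name, adding it if new."""
        if name in self.names:
            return self.names.index(name)
        self.names.append(name)
        self.slots.append(None)
        return len(self.names) - 1

    def reset(self):
        """Called on RUN: forget all runtime pointers."""
        self.slots = [None] * len(self.names)

    def resolve(self, index, symbols):
        """Interpreter side: the first use creates the variable, later uses are direct."""
        if self.slots[index] is None:
            self.slots[index] = symbols.create(self.names[index])
        return self.slots[index]


names = VariableNameTable()
abc = names.intern("ABC")                # entry 0, stored in the program as VAR >00
symbols = SymbolTable()
slot = names.resolve(abc, symbols)       # created on first encounter
symbols.values[slot] = 123.0
print(abc, slot, symbols.values[names.resolve(abc, symbols)])
```

After the first encounter, resolve is a single table lookup regardless of how many variables the program has.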

 

Sorry if this was a bit confusing, there are quite a few tables involved, but the benefit of this type of arrangement is that all variable references could be done in fixed time, regardless of program size. The program could still be listed normally. When saving, a simple conversion would have to be done to get back to TI BASIC format. The interpreter would not need to worry about variable names during runtime.

 

If you got this far you might wonder why not store the addresses of variables directly into the token stream. This could be done, but it would expand the size of the tokenized code quite a lot. It also might cause complications when editing the code - removing lines or adding new ones.

The other observation one might have is: what happens if a program has more than 256 variable names, since that's the maximum that a single byte after the VAR token could reference? I think it rarely happens, if ever, with TI Basic programs. This could be mitigated, for example, with an escape into two bytes after the VAR token. A simple way would be to store indices 0-127 as a single byte. Having the most significant bit set would mean there is another index byte, thus creating 15-bit index values.
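
The one-or-two-byte index could look like this. It is a minimal sketch with made-up function names, and it assumes the high seven bits of a long index go in the first byte and the low eight bits in the second.

```python
def encode_index(index):
    """Encode a variable index: 0-127 in one byte, up to 32767 in two."""
    if not 0 <= index <= 0x7FFF:
        raise ValueError("index does not fit in 15 bits")
    if index < 0x80:
        return bytes([index])
    # Most significant bit set marks a two-byte index (ASSUMED byte order: high, low).
    return bytes([0x80 | (index >> 8), index & 0xFF])


def decode_index(data, pos=0):
    """Return (index, position of the next byte)."""
    first = data[pos]
    if first < 0x80:
        return first, pos + 1
    return ((first & 0x7F) << 8) | data[pos + 1], pos + 2


for i in (0, 127, 128, 300, 32767):
    print(i, encode_index(i).hex())
```

Programs with fewer than 128 variables pay nothing for the escape; they still use exactly two bytes per reference (VAR plus one index byte).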


I have been working with GPL the core of TI Basic and XB for way over 20 years now.

I believe the biggest problem for TI Basic and XB is that floating-point values have to be converted to and from integers constantly.

An example: ROW=7 and COL=21 are both saved in the program in floating-point format, and when you do a DISPLAY AT(ROW,COL):A$

first the floating-point values of ROW and COL have to be fetched and converted to integer before being used.

This is also a problem for just ROW=ROW+16 too!

ROW is fetched and converted to integer both times, then 16 is added to it.

This really slows down execution in a loop.

This is why I think adding integer math would really speed up just about everything in TI Basic and XB.
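
To give a feel for the conversion work involved, here is a minimal sketch of turning an 8-byte radix-100 floating-point value into an integer. It is an illustration in Python, not the console's own routine: the function name is made up, it truncates any fraction rather than rounding, and it does no range checking.

```python
def radix100_to_int(fp):
    """Convert an 8-byte radix-100 float to an integer (fraction truncated).

    Layout assumed here: byte 0 is the exponent biased by >40, bytes 1-7 are
    base-100 digits, and a negative number has its first 16-bit word negated.
    """
    word0 = int.from_bytes(fp[0:2], "big")
    negative = bool(word0 & 0x8000)
    if negative:
        word0 = (-word0) & 0xFFFF        # undo the two's-complement negation
    if word0 == 0:
        return 0
    exponent = (word0 >> 8) - 0x40       # power of 100 of the first digit
    digits = [word0 & 0xFF] + list(fp[2:8])
    value = 0
    for k in range(exponent + 1):        # one multiply-add per base-100 digit
        value = value * 100 + (digits[k] if k < 7 else 0)
    return -value if negative else value


row = bytes.fromhex("4007000000000000")  # 7
col = bytes.fromhex("4015000000000000")  # 21
print(radix100_to_int(row), radix100_to_int(col))
```

Even for small values like 7 and 21 this is a sign check, an exponent decode and a digit loop on every use, which is the overhead an integer variable type would avoid.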

The second problem is all Variable names and Strings are stored in slower VDP RAM.


  • 3 months later...

The GPL interpreter is in the TI-99/4A console ROM 0, while GROM 0 has the menu subsystem and cassette subsystem.

TI software like TI XB or PASCAL or most cartridges are mostly written in GPL.

 

My misunderstanding of the StrangeCart was that it was going to emulate GPL with an ARM chip; this turned out to be wrong.

 

Making GPL 1000 times faster would put the TI on par with many computers like a PC with a 3000 MHz CPU instead of the current 3 MHz CPU.

This would make the standard XB cart run almost 1000 times faster, so most XB programs would require rewrites, but that would keep us busy.

Compiling XB from GPL to Assembly is what has been going on in the TI community, but the real solution is to just speed up GPL.


8 minutes ago, RXB said:

GPL is in the TI-99/4A console ROM 0, while the GPL GROM 0 has the Menu sub-system and Cassette sub-system.

TI software such as TI XB, PASCAL and most cartridges is mostly written in GPL.

My misunderstanding of the Strange Cart was that it was going to emulate GPL with an ARM chip; this turned out to be wrong.

Making GPL 1000 times faster would put the TI on par with many computers, like a PC with a 300 MHz CPU instead of the current 3 MHz CPU.

This would make the standard XB cart run almost 1000 times faster, so most XB programs would require rewrites, but that would keep us busy.

Compiling XB from GPL to Assembly is what has been going on in the TI community, but the real solution is just to speed up GPL.

Do note, that though PASCAL is mostly stored in GROM, it is definitely not written in GPL. The GROM chips are being used as a GROM Disk here. The code itself is a mix of Assembly and p-Code.

 

I do agree that a GPL accelerator would be a very useful tool, especially if one could adjust the acceleration. The Geneve used something like this, with several speed settings for the GPL Interpreter, so there is precedent for this approach in the TI world.


On 10/5/2023 at 8:31 PM, Ksarul said:

Do note, that though PASCAL is mostly stored in GROM, it is definitely not written in GPL. The GROM chips are being used as a GROM Disk here. The code itself is a mix of Assembly and p-Code.

 

I do agree that a GPL accelerator would be a very useful tool, especially if one could adjust the acceleration. The Geneve used something like this, with several speed settings for the GPL Interpreter, so there is precedent for this approach in the TI world.

Sorry for my long absence, I have been busy with real life, not much time for retro computing. With the autumn coming I hope I will have some more time.

The GPL acceleration is interesting. I am not familiar with the Geneve (other than wanting one); I mean I know what it is, but not from a user's perspective. I suppose they have done a better job with implementing the GPL interpreter.

 

To @RXB's question about speeding up GPL: I have ventured into this domain in the icy99 project, where I added a few new instructions to the TMS9900 core I built and modified the ROM GPL interpreter to use those instructions in a few places. Accelerating GPL could be done with an accelerator like the StrangeCart, but I haven't found enough interest in me yet to try to do it. One issue is that the GPL interpreter is very tied to the scratchpad memory, and my understanding is that any machine code called by GPL using the XML opcode expects that the scratchpad is laid out exactly as it is normally. XML calls are quite common in GPL, so accelerating it becomes an exercise in interfacing with the TMS9900 code too; it's not only running a very fast GPL interpreter.


32 minutes ago, speccery said:

Sorry for my long absence, I have been busy with real life, not much time for retro computing. With the autumn coming I hope I will have some more time.

The GPL acceleration is interesting. I am not familiar with the Geneve (other than wanting one); I mean I know what it is, but not from a user's perspective. I suppose they have done a better job with implementing the GPL interpreter.

 

To @RXB's question about speeding up GPL: I have ventured into this domain in the icy99 project, where I added a few new instructions to the TMS9900 core I built and modified the ROM GPL interpreter to use those instructions in a few places. Accelerating GPL could be done with an accelerator like the StrangeCart, but I haven't found enough interest in me yet to try to do it. One issue is that the GPL interpreter is very tied to the scratchpad memory, and my understanding is that any machine code called by GPL using the XML opcode expects that the scratchpad is laid out exactly as it is normally. XML calls are quite common in GPL, so accelerating it becomes an exercise in interfacing with the TMS9900 code too; it's not only running a very fast GPL interpreter.

Sorry you are wrong about SCRATCH PAD!

 

QUOTE: "One issue is that the GPL interpreter is very tied to the scratchpad memory, and my understanding is that any machine code called by GPL using the XML opcode expects that the scratchpad is laid out exactly as it is normally. XML calls are quite common in GPL, so accelerating it becomes an exercise to interfacing the TMS9900 code too, it's not only running a very fast GPL interpreter."

 

RXB 2022 uses scratchpad for ALPHALOCK, CLEAR, CLEARPRINT, HCHAR, VCHAR, HEX, HPUT, HGET, VPUT, VGET, INVERSE and the SAMS commands, which are all ASSEMBLY, all NEW XML ROUTINES!

The first 24 bytes can be used for anything as they are all temporary, and you can still use FAC & ARG (36 bytes). That does not seem like much, but it only uses the GPL Registers in Scratchpad.

RXB 2022 uses the SCRATCH PAD GPL Registers R0 to R10 for everything it does, as only Registers R11 to R15 need to be preserved in XB.

 

The problem with GPL is GROM chip access speed, not that GPL itself is slow. As Tursi has stated, GPL would be as fast as Forth if this problem was addressed. VDP has the same exact issue.

RXB 2022 is an attack on the VDP problem, as there is not much I can do about the GPL problem except make some subroutines pure assembly, if possible, for speed, but ROM is slow too.

 

A device like STRANGE CART could have an on-board version of GROM 0 and ROM 0 to take over access to GPL and offload that to the ARM chip instead, which is much faster.

Thus, such a device would speed up GPL by 1000%, and that would spur a bunch of people to rewrite GPL to take advantage of this device.


  • 3 weeks later...

After a long pause I've been working a bit on improving the StrangeCart Basic interpreter.

 

Performance improvement attempt

I have used Noel's (I forget his handle here) Basic program as a test program, as I am sure I have written here before. I have also written about how much excess work the interpreter does when running the code. The test program goes like this:

10 FOR I=1 TO 10
20 S=0
30 FOR J=1 TO 1000
40 S=S+J
50 NEXT J
60 PRINT ".";
70 NEXT I
80 PRINT S

The inner loop consists of lines 40 and 50. The TI-99/4A Basic runs this in 77 seconds if I remember properly. On the STM32G431 port of StrangeCart Basic (explained a bit below) this code ran in 0.137 seconds. That's only 560 times faster, and I know there is a lot to do to optimise execution. One thing to do is to just optimise the code in general, step by step, to get incremental gains. I think I have previously contemplated here about transforming the code into a different representation, and rather than making simple optimisations I made a complex modification to the code. I targeted expression evaluation, specifically LET statements (lines 20 and 40 above). I modified the expression evaluator so that in addition to interpreting, it builds a parse tree of the expressions. The parse tree generation builds a directed graph of the expression. It doesn't yet support all expressions, as this is more of a proof of concept.

Anyway, now when a LET statement is encountered, the code checks if a node tree already exists for the line in question. If it does, the code skips interpretation completely and uses the stored node tree to evaluate the expression. If no node tree exists, it will build the tree and store it. For debugging I added code to dump the parse trees into textual format, so that I can see if it works right. This is the output after inserting a line "45 stop" to halt execution so that the output can be observed without the crazy loops. The code writes spaces in front of the operations in the node tree to reflect the depth in the tree:

>45 stop
>run
*** STRANGECART RUN ***
BIOP STORE:
  ADDR &0x20001AD4
  CONSTF 0
BIOP STORE:
  ADDR &0x20001AD4
  BIOP ADD:
    FETCH
      ADDR &0x20001AD4
    FETCH
      ADDR &0x20001AEC
Stop. Use cont to continue.Runtime 0 seconds and 19802 us.
0.019802

The two let statements have as root nodes "BIOP STORE" nodes. BIOP stands for binary operator, in this case that means the operator takes two arguments. Line 20 S=0 generated this:

BIOP STORE:
  ADDR &0x20001AD4
  CONSTF 0

The first operand of the LET contains the target address for the store (it is the left node of BIOP in the tree). The address is the address of the variable S' data field in the symbol table. The first execution of the line 20 runs in interpreter mode, and among other things creates the variable S in the symbol table, in this example the actual floating point value storage address of S is at 0x20001AD4. The second operand of store is the value, which in this case is presented with the terminal node CONSTF 0 (floating point constant with value of zero). When this simple tree is evaluated, it simply writes the floating value of zero to the address 0x20001AD4. Note that when the stored tree is evaluated, the code does not need to search for variables or anything, it's all stored in the tree. [For Forth aficionados this would be the same as "0.0 0x20001AD4 !" or something like that, I don't remember how floating point numbers are expressed.]

 

The second tree for the Basic line 40 S=S+J is a little more complex:

BIOP STORE:
  ADDR &0x20001AD4
  BIOP ADD:
    FETCH
      ADDR &0x20001AD4
    FETCH
      ADDR &0x20001AEC

The beginning is the same, with the STORE and the destination address. The data to be stored is more interesting, as it is another BIOP node, this time ADD. The two children of this node are both memory fetch unary operations, one from the address of S and the other from the address of J.
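The two trees above can be sketched as a small data structure plus a recursive evaluator. This is a Python illustration only: the class names, the `memory` dictionary standing in for the symbol table storage, and the `tree_cache` keyed by line number are all hypothetical and not taken from the StrangeCart source.

```python
from dataclasses import dataclass
from typing import Dict, Union

memory: Dict[int, float] = {}  # stand-in for the variables' value storage


@dataclass
class ConstF:
    value: float


@dataclass
class Addr:
    address: int


@dataclass
class Fetch:
    operand: "Node"


@dataclass
class BiOp:
    op: str  # "STORE" or "ADD" in this sketch
    left: "Node"
    right: "Node"


Node = Union[ConstF, Addr, Fetch, BiOp]


def evaluate(node):
    """Walk the tree; no variable lookup by name is needed at this point."""
    if isinstance(node, ConstF):
        return node.value
    if isinstance(node, Addr):
        return node.address
    if isinstance(node, Fetch):
        return memory[evaluate(node.operand)]
    if node.op == "ADD":
        return evaluate(node.left) + evaluate(node.right)
    if node.op == "STORE":
        value = evaluate(node.right)
        memory[evaluate(node.left)] = value
        return value
    raise ValueError("unsupported operator: " + node.op)


S_ADDR, J_ADDR = 0x20001AD4, 0x20001AEC

# Trees stored per Basic line number, as they would be after the first pass.
tree_cache = {
    20: BiOp("STORE", Addr(S_ADDR), ConstF(0.0)),
    40: BiOp("STORE", Addr(S_ADDR),
             BiOp("ADD", Fetch(Addr(S_ADDR)), Fetch(Addr(J_ADDR)))),
}


def run_inner_loop():
    """Lines 20 to 50 of the test program, using only the cached trees."""
    evaluate(tree_cache[20])
    for j in range(1, 1001):
        memory[J_ADDR] = float(j)  # done by FOR/NEXT in the real interpreter
        evaluate(tree_cache[40])
    return memory[S_ADDR]


if __name__ == "__main__":
    print(run_inner_loop())
```

The point of the cache is that the second and later executions of a line go straight to `evaluate` without tokens being re-parsed or variables being searched for.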

 

Running normally - without displaying the contents - the benchmark now runs much faster on the STM32G431, finishing in 0.036 seconds, which happens to be 3.6 times faster than the previous time of 0.129 seconds. Each iteration of lines 40 and 50 together now takes 3.6 microseconds. This can still be substantially improved. The TMS9900 can hardly execute a single machine instruction in that time, and here we do two fetches, one add and a store on line 40, plus all the activities of NEXT (fetch J, add the step of 1.0 to it, store the new value of J, compare it to the limit of 1000.0 and do a conditional branch back to line 40).
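The per-iteration figure follows from the loop counts in the test program (10 outer passes of 1000 inner iterations each). A quick check in Python, using only the timings quoted in this post; the variable names are just for illustration:

```python
iterations = 10 * 1000      # outer loop count times inner loop count
old_time_s = 0.129          # previous run time quoted in this paragraph
new_time_s = 0.036          # run time with the stored expression trees

per_iteration_us = new_time_s / iterations * 1e6
speedup = old_time_s / new_time_s

if __name__ == "__main__":
    print(round(per_iteration_us, 2), "us per inner iteration")
    print(round(speedup, 2), "times faster")
```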

 

In case you are wondering, this kind of node presentation of expressions is typical for compilers. However, I am not compiling the Basic code (yet) to machine code, but rather storing the whole tree as a data structure and evaluating (in practice traversing through) the tree on the fly. The node trees consume a lot of memory compared to the Basic tokens, and the STM32G431 only has 32K of SRAM. But this functionality is getting close to what a just-in-time compiler would do. Still, in many programs certain inner loops consume most of the time, and storing a bunch of expression evaluation trees for those would not need to consume much memory. One kilobyte would go a long way.

 

A couple of ports of code

I ported the current version over to two microcontrollers, the STM32G0B1 and STM32G431. The former is based on the Cortex M0 core and the latter on the Cortex M4 (incidentally the MCU on the StrangeCart contains both of these cores). I've used ST Micro's software development toolchain for a while now, and have started to become familiar with it. My GROM replacement grommy boards use the STM32G0 series chips too.

Even if the Cortex M4 and M0 are very similar, there are differences which uncovered a few bugs, or at least portability issues, in the code. It turned out that the Cortex M0 does not support unaligned memory loads. The TI-99/4A tokenised lines of code contain line numbers stored as 16-bit integers. I was loading them as 16-bit quantities with a 16-bit load, but that raises a memory access exception on the Cortex M0 if the 16-bit quantity is located at an odd address. The TMS9900 cannot do unaligned loads like that either. That was simple to fix.
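One common way around unaligned 16-bit loads is to assemble the value from two single-byte reads, which are always aligned. The sketch below shows the idea in Python; the helper name is hypothetical, the actual fix in the interpreter may look different, and Python itself has no alignment faults, so this only illustrates the byte-wise approach. The high byte comes first because the TMS9900 is big-endian.

```python
def read_u16_be(buf, offset):
    """Read a big-endian 16-bit value one byte at a time, at any offset."""
    return (buf[offset] << 8) | buf[offset + 1]


if __name__ == "__main__":
    # A line number of 10 stored at an odd offset in a made-up buffer.
    data = bytes([0xFF, 0x00, 0x0A, 0xFF])
    print(read_u16_be(data, 1))
```

The same two-byte pattern also takes care of the byte order difference between the big-endian TI data and a little-endian ARM core.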

 

Bug fixes

I uncovered some bugs: the expression evaluator was not doing math in the correct order. An expression such as 1/2*3 was evaluated as 1/(2*3) instead of the correct order (1/2)*3. That was simple to fix. I also noticed that the code didn't handle the valid but unusual variable name characters everywhere: @ [ ] are valid characters in TI Basic. You can write:

]=3

And use that as a variable name. Now those are supported too.
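Going back to the evaluation order fix: operators of equal precedence have to group left to right. Below is a minimal precedence-climbing sketch in Python that gets 1/2*3 right. It is only an illustration with made-up names, handles just + - * / and parentheses on plain numbers, and is not the StrangeCart evaluator.

```python
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("bad input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def apply_op(op, left, right):
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    if op == "*":
        return left * right
    return left / right


def parse_primary(tokens):
    token = tokens.pop(0)
    if token == "(":
        value = parse_expression(tokens, 1)
        tokens.pop(0)  # drop the closing parenthesis
        return value
    return float(token)


def parse_expression(tokens, min_prec):
    left = parse_primary(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Asking for strictly higher precedence on the right-hand side is
        # what makes equal-precedence operators group left to right.
        right = parse_expression(tokens, PRECEDENCE[op] + 1)
        left = apply_op(op, left, right)
    return left


def evaluate(text):
    return parse_expression(tokenize(text), 1)


if __name__ == "__main__":
    print(evaluate("1/2*3"), evaluate("1/(2*3)"))
```

Passing `PRECEDENCE[op]` instead of `PRECEDENCE[op] + 1` for the right-hand side would reproduce the wrong 1/(2*3) grouping.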

I've also done work to make the tokeniser work better. It can now correctly tokenise more complex programs, although I know there are still some issues.

 

Overall, this is interesting stuff to work on. I will make a new StrangeCart firmware version when I am a bit further with all of this.

Edited by speccery
