RXB Posted June 8, 2023
RXB, unlike XB256, can work with ONLY the CONSOLE, without 32K or any other devices. And you get Assembly speeds from the CONSOLE. They are not even close to the same approach. You are not being objective in the least. Also, if you take out the GROM delays, then GPL would be the same speed as Forth.
Link to comment https://forums.atariage.com/topic/306889-strangecart/page/18/#findComment-5266761
fabrice montupet Posted June 8, 2023
There is nowhere in my message where I was subjective. I tested RXB and XB256 for a while and made my choice. Please accept that. "ONLY CONSOLE without 32K" is an argument that you constantly put forward, over and over, on AtariAge. Yes, it is technically interesting, and a particular taste for challenge on your part, but for present-day users, who here really cares now? We are no longer in the 1980s, when any TI-99/4A expansion cost an arm and a leg. Now the 32 KB expansion costs next to nothing. As with XB256, anyone who wants to use RXB must also get a FlashROM 99 or a FinalGROM99. So anyone who can buy such a cartridge can also buy a 32 KB expansion. We see now that price is not a problem anymore. So I personally prefer using a language that offers many powerful graphics and sound features and great performance thanks to compilation, with the added benefit of the 32 KB memory space for more elaborate programs than the 16 KB of the stock computer can offer.
+Ksarul Posted June 8, 2023
14 hours ago, RXB said:
A good portion of XB3 is Assembly in the ROMs, so I have been disassembling XB3 as there is no source code. So far I have 90% of ROM 1 done and 30% of ROM 2 done. With the GPL and ROMs source I can make XB way faster as more Assembly replaces GPL, and it is 100% backwards compatible. If GPL could be sped up it would kill my task entirely as useless.
It wouldn't be useless, Rich, but it would give people some additional use cases. That is the beauty of our hobby: variety.
+TheBF Posted June 9, 2023
1 hour ago, RXB said:
Also if you take out the GROM delays then GPL would be the same speed as Forth.
It would be about the same speed as byte-coded Forth.
Byte code Forth is about 30% .. 40% slower than indirect threaded code Forth (Most TI-99 Forth systems)
ITC is 15% slower than direct threaded code Forth
DTC is ~20% .. 30% slower than sub-routine threaded code Forth (Camel99 DTC Forth)
STC is ~2.5X slower than native code generating Forth compilers.
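To get a feel for how the ratios above stack up, here is a rough back-of-the-envelope sketch. It assumes that "p% slower" means "takes (1 + p) times as long", which is only one possible reading of the figures, and the function name `relative_time` is made up for this illustration.

```python
from math import prod

def relative_time(slowdowns, stc_vs_native=2.5):
    """Run time of byte-coded Forth relative to native code (native = 1.0).

    slowdowns: fractional slowdowns for the steps
               byte-coded -> ITC, ITC -> DTC, DTC -> STC.
    Assumption: 'p% slower' is read as 'takes (1 + p) times as long'.
    """
    return prod(1.0 + p for p in slowdowns) * stc_vs_native

# Low end, midpoint and high end of the quoted ranges.
low = relative_time([0.30, 0.15, 0.20])
mid = relative_time([0.35, 0.15, 0.25])
high = relative_time([0.40, 0.15, 0.30])
print(f"byte-coded vs native: {low:.2f}x .. {high:.2f}x (midpoint {mid:.2f}x)")
```

Under that reading, byte-coded Forth (and so GPL without GROM delays, per the post above) would land somewhere around 4.5 to 5 times slower than native code.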
senior_falcon Posted June 9, 2023 (edited)
On 6/8/2023 at 6:03 AM, speccery said:
At least I don't see it that way. When we think about speeding up GPL in the broad sense, it appears there would be three ways to go: Convert GPL to TMS9900 machine code, which is what you have been doing. The beauty of this is that all you would need in addition to the bare computer is a ROM/GROM cartridge (I guess ROM only if everything was converted from GPL to assembly). Also, this approach is "era correct" since you're using the original CPU etc. Aside from potentially using higher density memory chips, RXB could have existed back in the day with the same good performance.
In principle, you could take a GPL program like TI BASIC or XB and do an instruction by instruction replacement of the GPL code with assembly instructions. The following lines are from the TI BASIC interpreter, with assembly equivalents on the right. Naturally, what I have written would need some changes. There are no labels. CALL GROM is stack oriented, so instead of BL you'd have to implement a stack for this, but I think you could use the same stack locations as regular BASIC.
I am certain that a total rewrite would be more efficient, but this approach has the advantage of not needing a lot of design work. It is based on code that is known to work, so you would not need to reinvent TI BASIC. I wonder if one of the AI engines in the news recently could be trained to do this automatically.
(Edit) The beauty of converting TI BASIC as a first step is that it runs from VDP RAM, which means that you have 32K of memory to use for the interpreter. So you can just load and test without the complexities of a bank switched cartridge. This would be an excellent first step to see:
1 - if this is even possible
2 - if it is possible, what sort of speed increase could result.
2C2B DST  @>8314,>0064      MOV  @HX0064,@>8314
2C2F DST  @>831E,>000A      MOV  @HX000A,@>831E
2C33 ST   @>8308,>2C        MOVB @HX2C00,@>8308
2C36 DDEC @>8320            DEC  @>8320
2C38 CALL GROM@>2C75        BL   @G2C75
2C3B BS   GROM@>2C2A        JEQ  G2C2A
2C3D CALL GROM@>2EF9        BL   @G2EF9
2C40 CZ   @>830C            CB   @HX000A,@>830C  or  MOVB @>830C,@>830C
2C42 BR   GROM@>2C4F        JNE  G2C4F
2C44 CZ   @>8300            CB   @HX000A,@>8300
2C46 BR   GROM@>2C4D        JNE  G2C4D   (This seems odd, I think it could be BR GROM@>2016)
2C48 CALL GROM@>2C75        BL   @G2C75
2C4B BR   GROM@>2C65        JNE  G2C65
2C4D BR   GROM@>2016        JNE  G2016
2C4F DST  @>8314,@>8344     MOV  @>8344,@>8314
2C52 CZ   @>8300            CB   @HX000A,@>8300
2C54 BR   GROM@>2C60        JNE  G2C60
2C56 CALL GROM@>2C75        BL   @G2C75
2C59 BS   GROM@>2C2A        JEQ  G2C2A
2C5B ST   @>830E,@>8309     MOVB @>8309,@>830E
2C5E BS   GROM@>2C65        JEQ  G2C65
2C60 CALL GROM@>2C7A        BL   @G2C7A
2C63 BS   GROM@>2C2A        JEQ  G2C2A
2C65 CALL GROM@>2EF9        BL   @G2EF9
HX0064 DATA >0064
HX000A DATA >000A
Edited June 9, 2023 by senior_falcon
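The hand-made pairs in the listing above follow a handful of mechanical rules, so a first pass at automating them might look like the sketch below. This is only an illustration under stated assumptions: the function `translate` and the rule table are hypothetical, it covers just the opcodes that appear in the listing, it uses the MOVB form for CZ, and it ignores the call-stack and jump-range issues mentioned in the post.

```python
import re

# GPL branch/call opcodes and the 9900 prefix used for them in the listing.
BRANCHES = {"CALL": "BL @G", "BS": "JEQ G", "BR": "JNE G"}

def translate(gpl):
    """Translate one GPL source line.

    Returns (assembly_text, literal_label); literal_label is the name of a
    DATA word that must be emitted somewhere, or None if none is needed.
    """
    parts = gpl.split(None, 1)
    op = parts[0]
    arg = parts[1].strip() if len(parts) > 1 else ""
    if op in ("DST", "ST"):
        # GPL operand order is destination,source; the 9900 is source,destination.
        dst, src = [s.strip() for s in arg.split(",")]
        mnemonic = "MOV" if op == "DST" else "MOVB"
        if src.startswith("@"):
            return f"{mnemonic} {src},{dst}", None
        value = int(src.lstrip(">"), 16)
        if op == "ST":
            value <<= 8  # MOVB moves the high byte of the literal word
        label = f"HX{value:04X}"
        return f"{mnemonic} @{label},{dst}", label
    if op == "DDEC":
        return f"DEC {arg}", None
    if op == "CZ":
        # Moving a byte onto itself sets the EQ status bit when it is zero.
        return f"MOVB {arg},{arg}", None
    match = re.fullmatch(r"GROM@>([0-9A-F]{4})", arg)
    if op in BRANCHES and match:
        return BRANCHES[op] + match.group(1), None
    raise ValueError(f"no rule for: {gpl}")

for line in ["DST @>8314,>0064", "ST @>8308,>2C", "CALL GROM@>2C75", "BS GROM@>2C2A"]:
    print(f"{line:22} {translate(line)[0]}")
```

Anything without a rule (IO, FMT and so on) raises an error, which is where the "good understanding of GPL" would have to come in.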
RXB Posted June 9, 2023
10 hours ago, fabrice montupet said:
There is nowhere in my message where I was subjective. I tested RXB and XB256 for a while and made my choice. Please accept that. "ONLY CONSOLE without 32K" is an argument that you constantly put forward, over and over, on AtariAge. Yes, it is technically interesting, and a particular taste for challenge on your part, but for present-day users, who here really cares now? We are no longer in the 1980s, when any TI-99/4A expansion cost an arm and a leg. Now the 32 KB expansion costs next to nothing. As with XB256, anyone who wants to use RXB must also get a FlashROM 99 or a FinalGROM99. So anyone who can buy such a cartridge can also buy a 32 KB expansion. We see now that price is not a problem anymore. So I personally prefer using a language that offers many powerful graphics and sound features and great performance thanks to compilation, with the added benefit of the 32 KB memory space for more elaborate programs than the 16 KB of the stock computer can offer.
Do whatever you want, no one is stopping you. I do not rag on anything you do, and I am getting sick of you doing it to me.
RXB Posted June 9, 2023
6 hours ago, TheBF said:
It would be about the same speed as byte-coded Forth.
Byte code Forth is about 30% .. 40% slower than indirect threaded code Forth (Most TI-99 Forth systems)
ITC is 15% slower than direct threaded code Forth
DTC is ~20% .. 30% slower than sub-routine threaded code Forth (Camel99 DTC Forth)
STC is ~2.5X slower than native code generating Forth compilers.
Well, unlike Forth, the OS is on the 16-bit bus, and that includes the GPL Interpreter. Tursi emulated GROM without the delays and stated it was as fast as Forth.
+TheBF Posted June 9, 2023
5 hours ago, RXB said:
Well, unlike Forth, the OS is on the 16-bit bus, and that includes the GPL Interpreter. Tursi emulated GROM without the delays and stated it was as fast as Forth.
Yes, the 16-bit bus is a huge advantage on the 99 for sure. Very hard to top that. Tursi's comparison to Forth is true as long as you limit the comparison to the indirect threaded Forths, like all the Forths made for the TI-99 in the past.
I have a direct threaded compiler that is about 15% faster than Turbo Forth on most things I have tested.
I have not made a sub-routine threaded Forth yet, but I want to make one to see what happens. What this means is that Forth compiles real machine code, but each word is a 9900 sub-routine. It should be about 2X faster than the threaded Forths out there right now. The downside is that programs will be much bigger. The fix for that is to do a lot of "inline" instructions rather than calling every command. We shall see if I can figure that out. (This is how the commercial Forth compilers have worked since 1995 or so.)
I have a machine Forth system that generates native code, which is 3X to 5X faster on some tests I have done. It is only a compiler, no interpreter.
And if you want to get really crazy, I made something called ASMFORTH. It is a Forth virtual machine with two stacks and Forth syntax for loops and branching, but you can also use the registers. :-) I made this one because @Reciprocating Bill made a sieve benchmark that blew everything out of the water, including GCC, and was 10X faster than threaded Forth. Here is what the code looks like. (It's really an assembler in disguise.)
https://github.com/bfox9900/ASMFORTH/blob/main/demo/ASMFORTH-SIEVE.FTH
All that to say Forth is really just an idea about computing. How you implement it is a personal choice.
Tursi Posted June 9, 2023 (edited)
6 hours ago, RXB said:
Well, unlike Forth, the OS is on the 16-bit bus, and that includes the GPL Interpreter. Tursi emulated GROM without the delays and stated it was as fast as Forth.
I don't know how fast Forth is. I did run a console with no real GROMs and UberGROMs that were 2-3 times faster than real GROMs... and it didn't make much difference to the performance of TI BASIC. (A simple FOR...NEXT for 300 counts is about 1 second in either case.) I then analyzed the GPL interpreter and determined that it doesn't hit the GROMs often enough for their performance to make a lot of difference. Classic99 didn't emulate GROM speed in the early days, and it was pretty hard to see the difference. Except for copy loops, the impact of GROM speed is pretty minimal.
Please don't propagate that through the threads. Rich just remembered a little wrong.
I do believe that with modern techniques we could re-write the GPL interpreter and make it fly. It does a lot of redundant work. But I guess nobody will believe me until it's done. My thinking is that ideas like the StrangeCart are better: I want to emulate the system on the cartridge and only talk to the console for actual I/O. Should be able to make XB fly that way.
Edited June 9, 2023 by Tursi
RXB Posted June 9, 2023
My bad, I guess I just remembered that one statement without the context.
fabrice montupet Posted June 9, 2023
12 hours ago, RXB said:
Do whatever you want, no one is stopping you. I do not rag on anything you do, and I am getting sick of you doing it to me.
I read all the threads only because I am very interested in everything concerning our dear TI-99/4A, and I participate when I like to. Don't be paranoid. So catch your breath, and don't be surprised if, maybe one day, I answer a future message from you to share my point of view.
+TheBF Posted June 9, 2023
18 hours ago, senior_falcon said:
In principle, you could take a GPL program like TI BASIC or XB and do an instruction by instruction replacement of the GPL code with assembly instructions. The following lines are from the TI BASIC interpreter, with assembly equivalents on the right. Naturally, what I have written would need some changes. There are no labels. CALL GROM is stack oriented, so instead of BL you'd have to implement a stack for this, but I think you could use the same stack locations as regular BASIC.
I am certain that a total rewrite would be more efficient, but this approach has the advantage of not needing a lot of design work. It is based on code that is known to work, so you would not need to reinvent TI BASIC. I wonder if one of the AI engines in the news recently could be trained to do this automatically.
(Edit) The beauty of converting TI BASIC as a first step is that it runs from VDP RAM, which means that you have 32K of memory to use for the interpreter. So you can just load and test without the complexities of a bank switched cartridge. This would be an excellent first step to see:
1 - if this is even possible
2 - if it is possible, what sort of speed increase could result.
This is wild. You show that there is a one to one correspondence here between GPL and 9900 Ass'y language. The only advantage with this kind of interpreter would therefore be portability. Program size is going to be the same or similar.
RXB Posted June 9, 2023
2 hours ago, TheBF said:
This is wild. You show that there is a one to one correspondence here between GPL and 9900 Ass'y language. The only advantage with this kind of interpreter would therefore be portability. Program size is going to be the same or similar.
Yea, that is exactly what I am doing in RXB: taking GPL routines and turning them into Assembly. The big difference is I am doing it so it can still run from the Console only, without need for expansion RAM. Of course, this is way tougher than using Expansion RAM.
Example of a math routine:
763C 0203  LI   R3,>7B14          (>4001 3907 6020 435F)
763E 7B14
7640 0204  LI   R4,ARG
7642 835C
7644 CD33  MOV  *R3+,*R4+         >4001 INTO ARG
7646 CD33  MOV  *R3+,*R4+         >3907 INTO ARG2
7648 CD33  MOV  *R3+,*R4+         >6020 INTO ARG4
764A C513  MOV  *R3,*R4           >435F INTO ARG6
764C C30B  MOV  R11,R12
764E 06A0  BL   @FADD             =>0D80
7650 0D80
7652 C2CC  MOV  R12,R11
7654 C28B  MOV  R11,R10           *
7656 06A0  BL   @>79F6
7658 79F6
765A 0203  LI   R3,>7B1C          (>3F3F 4213 4D17 433A)
765C 7B1C
765E 0204  LI   R4,ARG
7660 835C
7662 CD33  MOV  *R3+,*R4+         >3F3F INTO ARG
7664 CD33  MOV  *R3+,*R4+         >4213 INTO ARG2
7666 CD33  MOV  *R3+,*R4+         >4D17 INTO ARG4
7668 C513  MOV  *R3,*R4           >433A INTO ARG6
766A 06A0  BL   @FMULT            =>0E88
766C 0E88
766E 04CC  CLR  R12
7670 D320  MOVB @FAC,R12
7672 834A
7674 0760  ABS  @FAC
7676 834A
7678 9820  CB   @FAC,@>7C9E       (>44)
767A 834A
767C 7C9E
767E 15D9  JGT  >7632
7680 06A0  BL   @>7A26
7682 7A26
7684 06A0  BL   @>7028
7686 7028
7688 D060  MOVB @FAC,R1
768A 834A
768C 130B  JEQ  >76A4
768E 0221  AI   R1,>BA00
7690 BA00
7692 1508  JGT  >76A4
7694 0221  AI   R1,>5100
7696 5100
7698 0981  SRL  R1,8
769A D821  MOVB @VAR0(R1),@R12LB
769C 8300
769E 83F9
76A0 024C  ANDI R12,>FF03
76A2 FF03
76A4 06A0  BL   @SSUB
76A6 0D74
76A8 2320  COC  @>6058,12         (>0001)
76AA 6058
76AC 1609  JNE  >76C0
76AE 0201  LI   R1,ARG
76B0 835C
76B2 CC60  MOV  @>7B76,*R1+       (>4001 INTO ARG)
76B4 7B76
76B6 04F1  CLR  *R1+              >0000 INTO ARG2
76B8 04F1  CLR  *R1+              >0000 INTO ARG4
76BA 04D1  CLR  *R1               >0000 INTO ARG6
76BC 06A0  BL   @>0D7C
76BE 0D7C
76C0 2320  COC  @>60C2,12         (>0002)
76C2 60C2
76C4 1601  JNE  >76C8
76C6 054C  INV  R12
76C8 C80C  MOV  R12,@TOPSTK
76CA 8310
76CC 0203  LI   R3,FAC
76CE 834A
76D0 0204  LI   R4,ARG
76D2 835C
76D4 0205  LI   R5,LINUM
76D6 8312
76D8 CD13  MOV  *R3,*R4+          FAC INTO ARG
76DA CD73  MOV  *R3+,*R5+         FAC INTO *>8312
76DC CD13  MOV  *R3,*R4+          FAC2 INTO ARG2
76DE CD73  MOV  *R3+,*R5+         FAC2 INTO *>8312
76E0 CD13  MOV  *R3,*R4+          FAC4 INTO ARG4
76E2 CD73  MOV  *R3+,*R5+         FAC4 INTO *>8312
76E4 C513  MOV  *R3,*R4           FAC6 INTO ARG6
76E6 C553  MOV  *R3,*R5           FAC6 INTO *>8312
76E8 06A0  BL   @FMULT            =>0E88
76EA 0E88
76EC 06A0  BL   @>7A9E
76EE 7A9E
76F0 7BEC  SB   @>0203(R12),@LINUM(R15)
76F2 0203
76F4 8312
76F6 0204  LI   R4,ARG
76F8 835C
76FA CD33  MOV  *R3+,*R4+
76FC CD33  MOV  *R3+,*R4+
76FE CD33  MOV  *R3+,*R4+
7700 C513  MOV  *R3,*R4
7702 06A0  BL   @FMULT            =>0E88
7704 0E88
7706 0560  INV  @TOPSTK
7708 8310
770A 1102  JLT  >7710
770C 0520  NEG  @FAC              NEGATE 1st WORD
770E 834A
7710 0460  B    @>74A4
7712 74A4
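The 8-byte constants that the routine above copies into ARG can be read by hand. The sketch below assumes they are in the TI-99/4A radix-100 floating point format (first byte is an exponent in excess-64 form, the next seven bytes are base-100 digits) and only handles positive values; the function name `radix100_to_float` is made up for this illustration.

```python
def radix100_to_float(words):
    """Decode a positive radix-100 floating point value given as four 16-bit words.

    Assumed layout: byte 0 = exponent + >40 (power of 100),
    bytes 1..7 = base-100 digits, most significant first.
    Negative values (first word negated) are not handled in this sketch.
    """
    raw = b"".join(w.to_bytes(2, "big") for w in words)
    exponent = raw[0] - 0x40
    value = 0.0
    for i, digit in enumerate(raw[1:]):
        value += digit * 100.0 ** (exponent - i)
    return value

# The three constants that appear in the listing comments.
first = radix100_to_float([0x4001, 0x3907, 0x6020, 0x435F])
second = radix100_to_float([0x3F3F, 0x4213, 0x4D17, 0x433A])
third = radix100_to_float([0x4001, 0x0000, 0x0000, 0x0000])
print(first, second, third)
```

Decoded this way, the first constant comes out at about 1.570796 (close to pi/2), the second at about 0.636620 (close to 2/pi), and the third is exactly 1.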
senior_falcon Posted June 10, 2023
3 hours ago, TheBF said:
This is wild. You show that there is a one to one correspondence here between GPL and 9900 Ass'y language. The only advantage with this kind of interpreter would therefore be portability. Program size is going to be the same or similar.
Actually, GPL is considerably more compact if you don't count the interpreter. In the example above, the GPL instructions take 58 bytes and the assembly instructions take 94. I deliberately chose a section of code that was easy to convert to assembly. Most of the GPL instructions could be converted directly to assembly, but there are some more complex instructions such as IO and a strange one called FMT, which I have never used. From Intern:
Op-Code: >08
Description: FMT several operands
Description: Special output command for the screen. The FMT Interpreter is independent of the GPL Interpreter. (See ROM-Listing >04DE through >05A1)
So the task would require a good understanding of GPL. I think a clever programmer could write something that did the conversion automatically.
RXB Posted June 10, 2023
3 hours ago, senior_falcon said:
Actually, GPL is considerably more compact if you don't count the interpreter. In the example above, the GPL instructions take 58 bytes and the assembly instructions take 94. I deliberately chose a section of code that was easy to convert to assembly. Most of the GPL instructions could be converted directly to assembly, but there are some more complex instructions such as IO and a strange one called FMT, which I have never used. From Intern:
Op-Code: >08
Description: FMT several operands
Description: Special output command for the screen. The FMT Interpreter is independent of the GPL Interpreter. (See ROM-Listing >04DE through >05A1)
So the task would require a good understanding of GPL. I think a clever programmer could write something that did the conversion automatically.
I started a project at one time to convert GPL into pure assembly. It ended when it became clear that I was not up to that level of conversion. Which is why I was hopeful someone would make a device like the StrangeCart to do that instead.
speccery Posted June 11, 2023
I've worked a little on optimising Basic execution on the StrangeCart, and I've been thinking about token formats for Basic. This message is going to be a bit technical, hopefully it makes sense.
The tokenizer in TI BASIC is a bit weird. Consider this line:
10 ABC=123
The TI BASIC tokenizer, and my tokenizer by default, create the following, as seen in the js99er.net VDP memory view. There is the line number table at >37C9, which has a single entry: line number >000A (10 decimal) and the pointer to the line, >37CE. In there we have (everything in hex):
37CE: 41 42 43 (ABC)
37D1: BE (token for assignment =)
37D2: C8 03 31 32 33 (Unquoted string, length 3, contents 123)
Thus at >37CE we find the string ABC, in ASCII. Before it, the byte at >37CD is >0A, which is the length of the tokenized line. The pointer in the line number table never points to the length byte, it points to the first actual character.
Anyway, the thing is that the variable name ABC is presented just like that, ABC, while the number 123 is tokenized as an unquoted string, which conveniently includes the length byte. As I've been focused on performance, the small issue with ABC being stored just like that is that since there is no string length, the interpreter needs to count the length every time so that it can search the symbol table with that length. On the other hand, the constant 123 is stored as a string with length.
From a performance point of view, the interpreter could run faster if the variable name length was precomputed, i.e. if it was stored as an unquoted string. I already implemented this as an optional feature, and it does improve performance if the variable names are a bit longer. For the constant 123, it would be better if numeric constants were stored with their own token, and then stored in binary format not requiring any run time conversions.
For example, if there was a token for 16-bit integers, 123 could be encoded with that token followed by two bytes. This could then be interpreted in fixed time, very fast, without all the checks normally needed when converting from ASCII to a binary number. In a simple scenario all numbers could be handled with two tokens: a token for 16-bit numbers, and another token for floating point format constants to handle all non-integers and numbers not fitting into 16 bits.
For variable references, it's time consuming and complex to have to search the symbol table all the time. I'm thinking about creating a new token for variable references, let's call it VAR, and having a separate table which would contain the name and a runtime pointer to the variable entry in the runtime symbol table. That would mean that the name "ABC" would be copied into a variable name table, let's say as entry 0 since it's the first variable in the program. In the tokenized program line there would be the token VAR, followed by an 8-bit index into the variable name table. This way all references to ABC would become the two bytes VAR >00, and the program size would become smaller if ABC was used a lot (ABC uses 3 bytes, VAR+index two bytes). The variable name table would need to contain the length of 3, the string ABC, and a pointer to the runtime symbol table. In this setup the variable name table would become an integral part of a Basic program, as important as the tokenized lines. However, it would be possible to convert it back to the normal TI Basic format for saving. Also, listing would be simple.
When a program is run, the symbol table is cleared at start. [The symbol table in the StrangeCart Basic contains all variable values, their dimensions if they are arrays, their type (floating point or string) etc.] With this new token format the pointers in the variable name table would also need to be set to zero on start.
As VAR 0 is encountered for the first time, the variable would be created in the runtime symbol table normally, the same way variables are created as they are encountered during interpretation. Once that's done, the address of the variable in the symbol table would be stored into this new variable name table. The net result would be that variables would never have to be searched for; instead they could be directly referenced with the pointers in the variable name table.
Sorry if this was a bit confusing, there are quite a few tables involved, but the benefit of this type of arrangement is that all variable references could be done in fixed time, regardless of program size. The program could still be listed normally. When saving, a simple conversion would have to be done to get back to TI BASIC format. The interpreter would not need to worry about variable names during runtime.
If you got this far you might wonder why not store the addresses of variables directly into the token stream. This could be done, but it would expand the size of the tokenized code quite a lot. It also might cause complications when editing the code, i.e. removing lines or adding new ones. The other question one might have is what happens if a program has more than 256 variable names, since that's the maximum that a single byte after the VAR token could reference. I think it rarely happens, if ever, with TI Basic programs. This could be mitigated, for example, with an escape into two bytes after the VAR token. A simple way would be to store indices 0-127 as a single byte. Having the most significant bit set would mean there is another index byte, thus creating 15-bit index values.
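The proposed VAR token and variable name table can be sketched in a few lines. This is only an illustration of the scheme described above, not StrangeCart code: the token value `0xF0`, the class `NameTable` and the helper names are all hypothetical, and the runtime symbol table is modelled as a plain dictionary of one-element lists.

```python
VAR = 0xF0  # hypothetical token value for a variable reference

def encode_var(index):
    """Emit VAR followed by a 1-byte (0-127) or 2-byte (15-bit) index."""
    if not 0 <= index < 0x8000:
        raise ValueError("index must fit in 15 bits")
    if index < 0x80:
        return bytes([VAR, index])
    return bytes([VAR, 0x80 | (index >> 8), index & 0xFF])

def decode_var(code, pos):
    """Return (index, next_pos) for a VAR reference starting at code[pos]."""
    assert code[pos] == VAR
    first = code[pos + 1]
    if first < 0x80:
        return first, pos + 2
    return ((first & 0x7F) << 8) | code[pos + 2], pos + 3

class NameTable:
    """Variable names plus a cached reference into the runtime symbol table."""

    def __init__(self):
        self.names = []   # kept with the program, needed for LIST and SAVE
        self.slots = []   # runtime pointers, None until first use

    def intern(self, name):
        """Tokenizer side: return the index for a name, adding it if new."""
        if name not in self.names:
            self.names.append(name)
            self.slots.append(None)
        return self.names.index(name)

    def reset(self):
        """Called on RUN: forget all runtime pointers."""
        self.slots = [None] * len(self.names)

    def resolve(self, index, symbols):
        """Interpreter side: search by name only the first time."""
        slot = self.slots[index]
        if slot is None:
            slot = symbols.setdefault(self.names[index], [0.0])
            self.slots[index] = slot
        return slot

table = NameTable()
symbols = {}
abc = table.intern("ABC")           # first variable, so index 0
table.resolve(abc, symbols)[0] = 123.0
print(encode_var(abc).hex(), symbols)
```

After the first `resolve` call the name is never looked up again until `reset`, which is the fixed-time property the post is after.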
RXB Posted June 11, 2023
I have been working with GPL, the core of TI Basic and XB, for well over 20 years now. I believe the biggest problem for TI Basic and XB is that Floating Point values have to be converted to and from Integer constantly.
For example, ROW=7 and COL=21 are both saved in the program in Floating-Point format, and when you do a DISPLAY AT(ROW,COL):A$, the Floating-Point values of ROW and COL first have to be fetched and converted to Integer before being used. This is also a problem for just ROW=ROW+16 too! ROW is fetched and converted to Integer both times, then 16 is added to it. This really slows down execution in a loop.
This is why I think adding integer math would really speed up just about everything in TI Basic and XB. The second problem is that all Variable names and Strings are stored in slower VDP RAM.
RXB Posted October 5, 2023
The GPL interpreter is in the TI-99/4A console ROM 0, while GROM 0 has the Menu sub-system and Cassette sub-system. TI software like TI XB or PASCAL, and most Cartridges, are mostly written in GPL. My misunderstanding of the StrangeCart was that it was going to emulate GPL with an ARM chip; this turned out to be wrong. Making GPL 1000 times faster would put the TI on par with many computers, like a PC with a 300 MHz CPU instead of the current 3 MHz CPU. This would make the standard XB cart run almost 1000 times faster, so most XB programs would require rewrites, but that would keep us busy. Compiling XB from GPL to Assembly is what has been going on in the TI community, but the real solution is just to speed up GPL.
+Ksarul Posted October 5, 2023
8 minutes ago, RXB said:
The GPL interpreter is in the TI-99/4A console ROM 0, while GROM 0 has the Menu sub-system and Cassette sub-system. TI software like TI XB or PASCAL, and most Cartridges, are mostly written in GPL. My misunderstanding of the StrangeCart was that it was going to emulate GPL with an ARM chip; this turned out to be wrong. Making GPL 1000 times faster would put the TI on par with many computers, like a PC with a 300 MHz CPU instead of the current 3 MHz CPU. This would make the standard XB cart run almost 1000 times faster, so most XB programs would require rewrites, but that would keep us busy. Compiling XB from GPL to Assembly is what has been going on in the TI community, but the real solution is just to speed up GPL.
Do note that though PASCAL is mostly stored in GROM, it is definitely not written in GPL. The GROM chips are being used as a GROM Disk here. The code itself is a mix of Assembly and p-Code. I do agree that a GPL accelerator would be a very useful tool, especially if one could adjust the acceleration. The Geneve used something like this, with several speed settings for the GPL Interpreter, so there is precedent for this approach in the TI world.
speccery Posted October 8, 2023
On 10/5/2023 at 8:31 PM, Ksarul said:
Do note that though PASCAL is mostly stored in GROM, it is definitely not written in GPL. The GROM chips are being used as a GROM Disk here. The code itself is a mix of Assembly and p-Code. I do agree that a GPL accelerator would be a very useful tool, especially if one could adjust the acceleration. The Geneve used something like this, with several speed settings for the GPL Interpreter, so there is precedent for this approach in the TI world.
Sorry for my long absence, I have been busy with real life, with not much time for retro computing. With the autumn coming I hope I will have some more time.
The GPL acceleration is interesting. I am not familiar with the Geneve (other than wanting one); I mean I know what it is, but not from a user's perspective. I suppose they have done a better job with implementing the GPL interpreter.
To @RXB's question about speeding up GPL: I have ventured into this domain in the icy99 project, where I added a few new instructions to the TMS9900 core I built and modified the ROM GPL interpreter to use those instructions in a few places. Accelerating GPL could be done with an accelerator like the StrangeCart, but I haven't found enough interest in me yet to try to do it. One issue is that the GPL interpreter is very tied to the scratchpad memory, and my understanding is that any machine code called by GPL using the XML opcode expects that the scratchpad is laid out exactly as it is normally. XML calls are quite common in GPL, so accelerating it becomes an exercise in interfacing with the TMS9900 code too; it's not only about running a very fast GPL interpreter.
RXB Posted October 8, 2023 Share Posted October 8, 2023 32 minutes ago, speccery said: Sorry for my long absence, I have been busy with real life, not much time for retro computing. With the autumn coming I hope I will have some more time. The GPL acceleration interesting. I am not familiar with the Geneve (other than wanting one), I mean I know what it is but not from a user's perspective. I suppose they have done a better job with implementing the GPL interpreter. To @RXB's question, about speeding up GPL, I have ventured into this domain in the icy99 project where I added a few new instructions to the TMS9900 core I built, and modified the ROM GPL interpreter to use those instructions in a few places. Accelerating GPL could be done with an accelerator like the StrangeCart, but I haven't found enough interest in me yet to try to do it. One issue is that the GPL interpreter is very tied to the scratchpad memory, and my understanding is that any machine code called by GPL using the XML opcode expects that the scratchpad is laid out exactly as it is normally. XML calls are quite common in GPL, so accelerating it becomes an exercise to interfacing the TMS9900 code too, it's not only running a very fast GPL interpreter. Sorry you are wrong about SCRATCH PAD! QUOTE: "One issue is that the GPL interpreter is very tied to the scratchpad memory, and my understanding is that any machine code called by GPL using the XML opcode expects that the scratchpad is laid out exactly as it is normally. XML calls are quite common in GPL, so accelerating it becomes an exercise to interfacing the TMS9900 code too, it's not only running a very fast GPL interpreter." RXB 2022 uses scratchpad for APHALOCK, CLEAR, CLEARPRINT, HCHAR, VCHAR, HEX, HPUT, HGET, VPUT, HGET, VGET, INVERSE, & SAMS COMMANDS are all ASSEMBLY all NEW XML ROUTINES! 
The first 24 bytes can be used for anything, as they are all temporary, and you can still use FAC and ARG (36 bytes). That does not seem like much, but it only uses the GPL registers in scratchpad. RXB 2022 uses the scratchpad GPL registers R0 to R10 for everything it does, as only registers R11 to R15 need to be preserved in XB.
The problem with GPL is GROM chip access speed, not that GPL itself is slow. As Tursi has stated, GPL would be as fast as Forth if this problem were addressed. VDP has exactly the same issue.
RXB 2022 is an attack on the VDP problem, as there is not much I can do about the GPL problem except rewrite some subroutines in pure assembly, where possible, for speed; but ROM is slow too.
A device like StrangeCart could have an on-board version of GROM 0 and ROM 0 to take over GPL access and offload it to the ARM chip, which is much faster. Such a device would speed up GPL by 1000%, and that would spur a bunch of people to rewrite GPL code to take advantage of it.
speccery (Author) Posted October 26, 2023 (edited)
After a long pause I've been working a bit on improving the StrangeCart Basic interpreter.
Performance improvement attempt
I have used Noel's (I forget his handle here) Basic program as a test program, as I am sure I have written here before. I have also written about how much excess work the interpreter does when running the code. The test program goes like this:
10 FOR I=1 TO 10
20 S=0
30 FOR J=1 TO 1000
40 S=S+J
50 NEXT J
60 PRINT ".";
70 NEXT I
80 PRINT S
The inner loop consists of lines 40 and 50. TI-99/4A Basic runs this in 77 seconds, if I remember correctly. On the STM32G431 port of StrangeCart Basic (explained a bit below) this code ran in 0.137 seconds. That's only 560 times faster, and I know there is a lot to do to optimise execution. One thing to do is to just optimise the code in general, step by step, to get incremental gains.
I think I have previously contemplated here transforming the code into a different representation, and rather than making simple optimisations I made a complex modification to the code. I targeted expression evaluation, specifically LET statements (lines 20 and 40 above). I modified the expression evaluator so that, in addition to interpreting, it builds a parse tree of the expressions. The parse tree generation builds a directed graph of the expression. It doesn't yet support all expressions, as this is more of a proof of concept. Anyway, now when a LET statement is encountered, the code checks whether a node tree already exists for the line in question. If it does, the code skips interpretation completely and uses the stored node tree to evaluate the expression. If no node tree exists, it builds the tree and stores it.
For debugging I added code to dump the parse trees into textual format, so that I can see if it works right.
This is the output after inserting a line "45 stop" to halt execution, so that the output can be observed without the crazy loops. The code writes spaces in front of the operations in the node tree to reflect their depth in the tree:
>45 stop
>run
*** STRANGECART RUN ***
BIOP STORE:
 ADDR &0x20001AD4
 CONSTF 0
BIOP STORE:
 ADDR &0x20001AD4
 BIOP ADD:
  FETCH ADDR &0x20001AD4
  FETCH ADDR &0x20001AEC
Stop. Use cont to continue.
Runtime 0 seconds and 19802 us. 0.019802
The two LET statements have "BIOP STORE" nodes as root nodes. BIOP stands for binary operator, which here means the operator takes two arguments. Line 20 S=0 generated this:
BIOP STORE:
 ADDR &0x20001AD4
 CONSTF 0
The first operand of the LET contains the target address for the store (it is the left node of BIOP in the tree). The address is the address of the variable S's data field in the symbol table. The first execution of line 20 runs in interpreter mode and, among other things, creates the variable S in the symbol table; in this example the actual floating point value of S is stored at 0x20001AD4. The second operand of the store is the value, which in this case is represented by the terminal node CONSTF 0 (a floating point constant with a value of zero). When this simple tree is evaluated, it simply writes the floating point value zero to the address 0x20001AD4. Note that when the stored tree is evaluated, the code does not need to search for variables or anything; it's all stored in the tree. [For Forth aficionados this would be the same as "0.0 0x20001AD4 !" or something like that, I don't remember how floating point numbers are expressed.]
The second tree, for the Basic line 40 S=S+J, is a little more complex:
BIOP STORE:
 ADDR &0x20001AD4
 BIOP ADD:
  FETCH ADDR &0x20001AD4
  FETCH ADDR &0x20001AEC
The beginning is the same, with the STORE and the destination address. The data to be stored is more interesting, as it is another BIOP node, this time ADD.
The two children of this node are both memory fetch unary operations, one from the address of S and the other from the address of J.
Running normally, without displaying the contents, the benchmark now runs much faster on the STM32G431, finishing in 0.036 seconds, which happens to be 3.6 times faster than the previous time of 0.129 seconds. Each iteration of lines 40 and 50 together now takes 3.6 microseconds. This can still be substantially improved. The TMS9900 can hardly execute a single machine instruction in that time, and here we do two fetches, one add and a store on line 40, plus all the activities of NEXT (fetch J, add the step of 1.0 to it, store the new value of J, compare it to the limit of 1000.0 and do a conditional branch back to line 40).
In case you are wondering, this kind of node representation of expressions is typical for compilers. However, I am not compiling the Basic code to machine code (yet), but rather storing the whole tree as a data structure and evaluating (in practice traversing) the tree on the fly. The node trees consume a lot of memory compared to the Basic tokens, and the STM32G431 only has 32K of SRAM. But this functionality is getting close to what a just-in-time compiler would do. Still, in many programs certain inner loops consume most of the time, and storing a bunch of expression evaluation trees for those would not need to consume much memory. One kilobyte would go a long way.
A couple of ports of code
I ported the current version over to two microcontrollers, the STM32G0B1 and STM32G431. The former is based on the Cortex M0 core and the latter on the Cortex M4 (incidentally the MCU on the StrangeCart contains both of these cores). I've been using ST Micro's software development toolchain for a while now, and have started to become familiar with it. My GROM replacement grommy boards use the STM32G0 series chips too.
Even though the Cortex M4 and M0 are very similar, there are differences which uncovered a few bugs, or at least portability issues, in the code. It turned out that the Cortex M0 does not support unaligned memory loads. The TI-99/4A tokenised lines of code contain line numbers stored as 16-bit integers. I was loading them as 16-bit quantities with a 16-bit load, but that raises a memory access exception on the Cortex M0 if the 16-bit quantity is located at an odd address. The TMS9900 cannot do unaligned loads like that either. That was simple to fix.
Bug fixes
I uncovered some bugs: the expression evaluator was not doing math in the correct order. An expression such as 1/2*3 was evaluated as 1/(2*3) instead of the correct order (1/2)*3. That was simple to fix.
I also noticed that I didn't properly handle the valid but unusual variable name characters everywhere in the code: @ [ ] are valid characters in TI Basic. You can write:
]=3
and use that as a variable name. Now those are supported too.
I've also done work to make the tokeniser work better. It can now correctly tokenise more complex programs, although I know there are still some issues.
Overall, this is interesting stuff to work on. I will make a new StrangeCart firmware version when I am a bit further with all of this.
Edited October 26, 2023 by speccery
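Two of the fixes mentioned in the post above can be illustrated with a short Python sketch. This is not the StrangeCart code. `read_u16_be` and `evaluate` are hypothetical names, the big-endian byte order for line numbers is an assumption based on the TMS9900 being a big-endian CPU, and the evaluator only handles unsigned numbers with + - * / (no parentheses, no variables, no unary minus).

```python
import re


def read_u16_be(buf: bytes, offset: int) -> int:
    """Assemble a 16-bit value from two single-byte reads.

    Python itself never raises alignment faults; the point is the pattern.
    In C, the same two byte reads avoid issuing a 16-bit load at an odd
    address. Big-endian order is an assumption here.
    """
    return (buf[offset] << 8) | buf[offset + 1]


def evaluate(text: str) -> float:
    """Evaluate + - * / with operators of equal precedence grouped left to right."""
    # Characters that are neither numbers nor operators are silently skipped.
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/]", text)
    pos = 0

    def number() -> float:
        nonlocal pos
        value = float(tokens[pos])
        pos += 1
        return value

    def term() -> float:
        # Fold * and / into the running value as they appear, so that
        # 1/2*3 becomes (1/2)*3 rather than 1/(2*3).
        nonlocal pos
        value = number()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            rhs = number()
            value = value * rhs if op == "*" else value / rhs
        return value

    # Same left-to-right folding for + and -, one precedence level lower.
    value = term()
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        rhs = term()
        value = value + rhs if op == "+" else value - rhs
    return value
```

With this loop-based structure, `evaluate("1/2*3")` gives 1.5, the (1/2)*3 grouping the post calls correct. A parser that recurses on the right-hand side of / or * would instead produce the 1/(2*3) grouping described as the bug.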
Artoj Posted May 2
Hi Speccery, any news on your StrangeCart?
speccery (Author) Posted May 8
On 5/2/2024 at 11:13 AM, Artoj said: Hi Speccery, any news on your StrangeCart?
Hi @Artoj, sorry for the long silence. I have been very busy at work, and then I've also played around with some electronic music hardware. I was actually reading my post above, and that was a pretty good refresher for me as well. I hope I find the time and energy to work on the StrangeCart and my other TI related projects soon. Of course, finding time is also partially a matter of priorities, as one would guess.
I have a pretty basic 3D printer which I've had for at least five years, but I only recently started to spend a bit of time learning FreeCAD, and thanks to that I have rediscovered my 3D printer and have been using it to make some fairly simple designs. The primary use case for me has been to design "base boards" for some simple electronics projects, i.e. slabs of plastic with mounting points for screwing in some boards. This has been a pretty interesting journey so far; I have been able to put some of my microcontroller and peripheral boards to good use, building more "complete" systems from them. I'll post a picture when I get home. It's such a basic use of the 3D printer, but extremely useful for me, turning a mess of loose boards and wires into a more rugged and organised mess of boards and wires.
Artoj Posted May 8
Amazing, Speccery. I am using a 3D printer at this moment, tuning it for printing TI cartridges and my TI MPEB boxes. In the past I would have made them out of wood or carved them from plastic; I hope to finalise multiple projects using the 3D printer. FreeCAD is an excellent program: so much can be designed with great accuracy and easily turned into STL files. At the moment I am using Tinkercad; it has a simple interface, but it is not easy to define edge radii and it has limited flexibility. Looking forward to more on StrangeCart and your many other designs. Regards, Arto.