+TheBF Posted November 23, 2019 Share Posted November 23, 2019 46 minutes ago, unhuman said: If I remember correctly, I used HCHAR b/c printing with ; puts an extra space in on the TI, unlike other basics. Thanks for weighing in. Nice to hear from the author. When I ran it, the output seemed exactly the same as the original. Try it if you have an emulator and let me know if I missed something. It also takes a fair bit less code. Always a win in TI BASIC. 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted November 23, 2019 Share Posted November 23, 2019 On 11/18/2019 at 10:17 PM, TheBF said: I have been using Lee's version of this program to evaluate my screen I/O routines. I found a small improvement. We don't have to make a counted string since TYPE uses a stack string (addr,len) and we have LENGTH. It takes a few milliseconds off. : TYPE-A1 ( -- ) \ LENGTH PAD C! \ store string length PAD \ copy of PAD to start string storage loop A1 1- DUP LENGTH + DO \ DO A1+length-1 to A1 I C@ \ get next digit 48 + \ convert to ASCII OVER C! \ store ASCII digit in PAD 1+ \ next PAD location -1 +LOOP DROP \ clean up stack CR PAD LENGTH TYPE CR ; \ type number And after glancing at it again I realized we don't have put the chars into PAD at all! Lee has nicely made the array printable as is, we just have to type from back to front. : TYPE-A1 A1 LENGTH 1- + 0 LENGTH DO DUP C@ 48 + EMIT 1- -1 +LOOP DROP ; And if I make use of my new words to use VDP hardware auto-increment it becomes this and it runs in 1:03 : TYPE-A1 VPOS VDPWA! A1 LENGTH 1- + 0 LENGTH DO DUP C@ 48 + VEMIT ?CR 1- -1 +LOOP DROP ; Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted November 23, 2019 Share Posted November 23, 2019 3 hours ago, unhuman said: If I remember correctly, I used HCHAR b/c printing with ; puts an extra space in on the TI, unlike other basics. TI Basic “PRINT ;” does output a space or ‘-’ before and a space after a number. It does not, however, output any spaces for characters or strings. ...lee Quote Link to comment Share on other sites More sharing options...
unhuman Posted November 23, 2019 Share Posted November 23, 2019 (edited) 9 minutes ago, Lee Stewart said: TI Basic “PRINT ;” does output a space or ‘-’ before and a space after a number. It does not, however, output any spaces for characters or strings. ...lee Wow... I never knew that. I was always annoyed at that extra space. Prevented some things that other basics allowed when I was a kid... Edited November 23, 2019 by unhuman Quote Link to comment Share on other sites More sharing options...
unhuman Posted November 23, 2019 Share Posted November 23, 2019 3 hours ago, TheBF said: Thanks for weighing in. Nice to hear from the author. When I ran it, the output seemed exactly the same as the original. Try it if you have an emulator and let me know if I missed something. It also takes a fair bit less code. Always a win in TI BASIC. Now you gotta compile it! Quote Link to comment Share on other sites More sharing options...
+TheBF Posted November 23, 2019 Share Posted November 23, 2019 9 hours ago, unhuman said: Now you gotta compile it! I will have to learn how first... or write a compiler Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 1, 2020 Share Posted September 1, 2020 This topic is rather long winded but what the heck. I have taken to using Lee's version of the Sevens Problem, with a few "Camel Forth friendly changes, as a general test of my system builds. Lately I built a "poor-man's just-in-time compiler addition that requires some changes to the source code. You use the INLINE[ ] directive on Forth primitives, constants, variables and user-variables and they are compiled as inline machine code. Kind of manual but it works. So I wondered what it would do. With the inlining, doing the version that only prints the final results, I finally got down near the times that Lucien gets with his MLC compiler. 9.3 seconds versus 16.1 secs without inlining any code. Nice to know the tool can make a difference. Spoiler \ Lee Stewart's mod of Lucien2's code for the sevens problem \ Speedup mods for CAMEL99 Forth \ 1. Used VALUES \ 2. VTYPE for all short strings \ 3. Used UM/MOD , native division \ 4. >DIGIT for digit conversion \ 5. Redefined PAD as static memory \ 6. Defined 7* \ 7. Used inline for as many primitives as possible NEEDS ELAPSE FROM DSK1.ELAPSE NEEDS VALUE FROM DSK1.VALUES NEEDS MALLOC FROM DSK1.MALLOC NEEDS VTYPE FROM DSK1.VTYPE NEEDS INLINE[ FROM DSK1.INLINE MARKER /SEVENS DECIMAL \ ------------------------------------ 180 CONSTANT SIZE SIZE MALLOC CONSTANT A1 \ A1 digit array SIZE MALLOC CONSTANT PAD 0 VALUE LENGTH \ current number of digits in result HEX CODE 7* C044 , 0A34 , 6101 , NEXT, ENDCODE DECIMAL : A1*7->A1 ( -- ) \ perform 7 * last result 0 \ initialize carried digit on stack 1 +TO LENGTH \ assume we will increase length by 1 digit A1 LENGTH BOUNDS DO INLINE[ I C@ 7* + 0 10 UM/MOD ] \ make result ud..unsigned divide by 10 INLINE[ SWAP I C! ] \ store rem as cur digit..carry on stack LOOP DROP \ clean up stack \ eliminate leading 0 A1 LENGTH INLINE[ 1- + C@ 0= ] \ highest digit = 0? IF -1 +TO LENGTH \ correct digit count THEN ; : A1$ ( -- addr len) PAD DUP \ PAD & COPY for string storage A1 1- DUP LENGTH + DO INLINE[ I C@ >DIGIT OVER C! 1+ ] -1 +LOOP DROP \ clean up stack ( PAD) LENGTH ; : A1$.TYPE ( --) [ A1 1- ] LITERAL LENGTH OVER + DO INLINE[ I C@ >DIGIT VPUT VCOL 1+@ C/L@ ] >= IF CR THEN -1 +LOOP ; : 7COUNTER ( -- ? ) \ Brian Fox's technique 0 \ initialize counter A1 LENGTH BOUNDS \ DO A1 to A1 + length DO INLINE[ 1+ I C@ 7 = AND DUP 6 = ] IF \ more than '77777'? LEAVE \ yup..we're done THEN LOOP ; DECIMAL : .POWER ( n -- ) S" SEVEN TO THE POWER OF " VTYPE DECIMAL . S" IS" VTYPE ; : RUN \ V2.58 1:26 , \ v2.59 with 8 line scrolling, 1:02 \ v2.62 with inline v4 54 sec. PAGE A1 SIZE 0 FILL 7 A1 C! 1 TO LENGTH 2 \ starting power BEGIN 7COUNTER 5 < WHILE A1*7->A1 DUP \ dup power for display CR .POWER 1+ \ increment power CR A1$.TYPE CR REPEAT DROP ; DECIMAL : NOSCROLL PAGE A1 SIZE 0 FILL 7 A1 C! 1 TO LENGTH 2 \ starting power BEGIN 7COUNTER 5 < WHILE A1*7->A1 DUP \ dup power for display 0 0 AT-XY .POWER 1+ \ increment power \ CR CR A1$ TYPE \ 39:08 CR CR A1$ VTYPE \ 24:50, W/INLINE4 00:15.35 REPEAT 0 7 AT-XY ; DECIMAL : FASTRUN ( 16.1 seconds) ( W/INLINE 9.3 seconds) PAGE ." Working..." A1 SIZE 0 FILL PAD SIZE 0 FILL 7 A1 C! 1 TO LENGTH 2 \ starting power BEGIN 7COUNTER 5 < WHILE A1*7->A1 1+ \ increment power REPEAT 1- CR .POWER CR A1$ VTYPE 0 7 AT-XY ; DECIMAL CR .( TYPE: ELAPSE RUN, ELAPSE NOSCROLL or ELAPSE FASTRUN ) CR 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted June 1, 2022 Share Posted June 1, 2022 Revisiting old benchmarks to beat up my newest version of Camel99 Forth and I am back on this old chestnut. It is a good general workout for a language. Using my last Lee Stewart version that prints the output array directly, in ITC Forth I now get: 59 seconds on the stopwatch. Can't use the screen timeout timer because we spend too much time with interrupts disabled doing all that VDP I/O. A lot of the speedup is because I am now using unused memory for a large screen buffer for fast screen scrolling. (what the hell) In the new DTC Forth we get 56 seconds on the stopwatch. 4 Quote Link to comment Share on other sites More sharing options...
BillG Posted July 23, 2022 Share Posted July 23, 2022 This is probably years away from my being able to produce actual performance numbers, but I thought I would throw it out there. Many trips around the sun ago, I joined a makerspace where I experimented with the Arduino and began to learn Python. I thought Arduino is popular and Python is popular. Why can't we program an Arduino in Python?" I set out to implement a Python compiler. Yes, a compiler. By not requiring an interpreter implementing the entire language, we have a fighting chance to get some programs to run on the microcontroller of the Arduino. Initially, I had concerns around the 2K bytes of RAM in the Arduino. Because a Commodore 128 was on display at the makerspace available for me to use, I chose the 6502 as the initial target. Some arm twisting from a friend wanting me to target the 9900 brought me to this forum where I discovered the Sevens problem. Solving it requires: * multiplying a large number by 7 * converting the number to ASCII decimal * searching for "777777" in the string all of which are built into the Python language. Meaning that I can write the critical parts in assembly language as part of the run-time library, something I cannot do in most other languages. Python has the potential to provide the fastest high-level language solution of the Sevens problem, so I thought. The Python code would be merely be the "glue" tying these parts together. number = 1 power=0 string="" while string.find("777777") < 0: number = number * 7 power = power + 1 string = str(number) print('Seven to the power of', power, 'is', string) When I got to the point where I was able to compile and run this program, disappointment set in. It crawled. Analysis showed the vast majority of the time was spent in the int function which converted a number to a string representation. I had initially chosen to implement the obvious approach - dividing the number by 10 to extract each digit. The divisions were killing me. Some research turned up the Double Dabble algorithm to avoid dividing: https://en.wikipedia.org/wiki/Double_dabble I finally got around to reimplementing the faster conversion and got a greatly improved time, going from over 22 billion CPU cycles to a little over 90 million. An Apple II would take just over 90 seconds to find the solution. I do not know enough to even guess how fast something like this would run on the 9900. https://talk.dallasmakerspace.org/t/project-log-python-on-the-6502-c64-8080-6800-6809-and-avr/18852/363 In parallel, I experimented with the different approach of performing the calculation in BCD to avoid having to do the conversion. My best solution along those lines is using the Borland dialect of Pascal, doing the arithmetic using unpacked BCD stored in a string, marching a pchar pointer through the string for fast "indexing" and using pos to look for six consecutive bytes of binary 7 in the string. It completes almost instantaneously on my relatively slow netbook. Unless there is a 9900 Pascal compiler supporting pchar, I cannot test this. 3 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted July 23, 2022 Share Posted July 23, 2022 (edited) This is really interesting to me Bill. I have been noodling on how to make number conversion faster as well. As you probably know the Forth converter called # traditionally converts a double integer and so does two divisions. Even worse. And a python compiler , even a subset of the language, would be a big hit around here I suspect. I bet it's no mean feat to accomplish. I am going to look over the double_dabble very closely. I have found however that because 9900 consumes so many cycles for every instruction it is hard to get big improvements in speed by replacing DIV with other instructions but some improvements are possible. We shall see. Thanks! Edited July 23, 2022 by TheBF typo 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted July 24, 2022 Share Posted July 24, 2022 Way back in 2011 Lucien posted a version of this program running on Wycove Weiand Forth. I always wondered how Wycove performed. (I will see if I can run it locally for a Wycove timing) It was his first cut at it using two arrays, but we have a video record of the time on Wycove Weiand Forth. I was curious if we (and me particularly) are making progress in Forth on TI-99 so I ran the exact same code with a small amount of translation harness off the top for Camel99. We also have a video record of the time on TurboForth. Some progress has been made. EDIT: Ran the program on Wycove 3.0 and posted timing July 25,2022 \ Camel99 DTC 2:51 \ TurboForth 3:04 \ Camel99 ITC 3:16 \ Wycove 3.0 4:10 \ Weiand 4:58 Using Lee Stewart's Forth solution, Camel99 2.69 ITC runs the the program Un-optimized, with full text output now in 53 seconds. The calculation only version runs in 15.8 seconds. Using my latest version of the inline optimizer and direct to video text output, the full text runs in 45 seconds and "calculation only" version runs in 9 seconds. Original early version code Spoiler \ Original Code by Lucien \ Camel99 DTC 2:51 \ TurboForth 3:04 \ Camel99 ITC 3:16 \ Wycove 4:58 \ CAMEL Forth harness ...... \ don't need these \ : CREATE2 ( -- ) \ <BUILDS DOES> ; \ : CELLS ( n -- n ) 2 * ; \ -1 CONSTANT TRUE \ 0 CONSTANT FALSE : ENDIF POSTPONE THEN ; IMMEDIATE : CLS PAGE ; \ WYCOVE version begins ............. 180 CONSTANT SIZE VARIABLE V1 VARIABLE V2 VARIABLE V3 CREATE A1 SIZE CELLS ALLOT CREATE A2 SIZE CELLS ALLOT : A1*7->A2 ( -- ) 0 V1 ! 0 V2 ! BEGIN A1 V1 @ CELLS + @ 7 * V2 @ + DUP 10 MOD A2 V1 @ CELLS + ! 10 / V2 ! V1 @ 1+ DUP V1 ! SIZE < WHILE REPEAT ; : A2->A1 ( -- ) SIZE 0 DO A2 I CELLS + @ A1 I CELLS + ! LOOP ; : TYPE-A2 ( -- ) 0 V1 ! FALSE V2 ! -1 SIZE 1- DO A2 I CELLS + @ 48 + DUP 48 = 0= IF TRUE V2 ! ENDIF V2 @ IF PAD V1 @ 1+ + C! V1 @ 1+ V1 ! ELSE DROP ENDIF -1 +LOOP V1 @ PAD C! CR PAD COUNT TYPE CR ; : TEST-A2 ( -- f ) 0 V1 ! FALSE V2 ! SIZE 0 DO A2 I CELLS + @ 7 = 0= IF 0 V1 ! ELSE V1 @ 1+ V1 ! ENDIF V1 @ 5 > IF TRUE V2 ! ENDIF LOOP V2 @ ; : 7'S-PROBLEM CLS A1 SIZE CELLS 0 FILL A2 SIZE CELLS 0 FILL 7 A1 ! 2 V3 ! BEGIN A1*7->A2 CR ." SEVEN TO THE POWER OF " V3 @ . ." IS" V3 @ 1+ V3 ! TYPE-A2 A2->A1 TEST-A2 UNTIL ; 3 Quote Link to comment Share on other sites More sharing options...
lucien2 Posted July 24, 2022 Share Posted July 24, 2022 It was Weiand (named "Fig Forth" on the title screen), not Wycove. I don't know if they are more or less the same. Quote Link to comment Share on other sites More sharing options...
BillG Posted July 24, 2022 Share Posted July 24, 2022 22 hours ago, TheBF said: I am going to look over the double_dabble very closely. I have found however that because 9900 consumes so many cycles for every instruction it is hard to get big improvements in speed by replacing DIV with other instructions but some improvements are possible. I should probably warn you that the wiki writeup on Double Dabble is neither very clear or very optimal. It is oriented toward someone writing conversion code in a high-level language. There are two ways to convert a number from one base to another. The obvious and most intuitive method is to repeatedly divide the number to be converted by the destination radix to extract digits. But it is the slowest due to the many division operations required. A better way is to take each "digit" of the number from high to low, multiply the accumulated resultant number (initially 0) by the source radix then add in the new digit. Note that these operations must be done in the destination radix. For example, 9 is 1001 in binary. It can be viewed as 1 x 8 + 0 x 4 + 0 x 2 + 1 x 1 So you take the highest bit of the number to be converted and add to the resultant number (initially 0) yielding 1. Then multiply it by 2 yielding 2 and add the next highest bit of the number to be converted yielding 2. And then multiply it by 2 yielding 4 and add the next highest bit of the number to be converted yielding 4. Finally multiply it by 2 yielding 8 and add the lowest bit yielding 9. Double Dabble is an implementation of the second approach but shifts the number left instead of multiplying by 2. All of that fiddling about testing groups of bits and adding a correction factor is nothing more than what is known as "decimal adjust" on hardware supporting it. Unfortunately, the 9900 does not have decimal adjust so you will have to do it. More discussion and implementation for some common microprocessors can be found at: https://talk.dallasmakerspace.org/t/project-log-python-on-the-6502-c64-8080-6800-6809-and-avr/18852/364 2 1 Quote Link to comment Share on other sites More sharing options...
atrax27407 Posted July 24, 2022 Share Posted July 24, 2022 Weiand and Wycove are completelt different applications of the same language (i.e., FORTH). 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted July 24, 2022 Share Posted July 24, 2022 7 hours ago, lucien2 said: It was Weiand (named "Fig Forth" on the title screen), not Wycove. I don't know if they are more or less the same. Thank you for the correction I will update the post. Quote Link to comment Share on other sites More sharing options...
Reciprocating Bill Posted July 24, 2022 Share Posted July 24, 2022 1 hour ago, atrax27407 said: Weiand and Wycove are completelt different applications of the same language (i.e., FORTH). Actually, Wycove Forth descended from Fig Forth. On page 55 of the Wycove Forth 2.0 manual it states, "Wycove Forth is based on FIG Forth, version 1.0, with several minor differences and extensions for the 99/4A environment." I like Wycove Forth 3.0, which is quite fast, particularly with 16-bit memory installed. For example, that combination runs Lee's version of the SEVENS benchmark (with slight modifications) in ~ 58 seconds, printing everything to screen. 2 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.