+TheBF Posted April 22, 2022 Share Posted April 22, 2022 Over in Benchmarking Languages @VORTICON provided us with the Classic Byte magazine article on the subject . In this topic I would like to do this for our TI-99 universe. TI-99 has an excellent set of languages for a machine of its vintage. Pick your favourite language and use the Byte version of the sieve as your first contribution. Stay as close as possible to the Byte listing as you can with the TI-99 dialect of your chosen language. If your language supports optimization or alternative coding techniques use them and post those results as an "optimized" result. Here are some results from other machines circa 1989. Here is what we have so far. 1 Quote Link to comment Share on other sites More sharing options...
Reciprocating Bill Posted April 22, 2022 Share Posted April 22, 2022 The followup January 1983 followup BYTE article, with a sh-t ton of results: https://archive.org/details/byte-magazine-1983-01/page/n291/mode/2up My physical copy (from my original subscription) is tattered, taped and scribbled to an embarrassing degree. Not unlike myself. Here are some additional TI results, running on a 16-bit console. IIRC the Forth code had to be modified a bit to run. Cortex BASIC gets a small push from multiple statements per line in a couple instances. Assembly: 6.4" Wycove Forth 3.0: 92" Wycove Forth 2.0 112" Turbo Forth 105" FB Forth 158" Cortex BASIC 1 1318" (Cortex BASIC loaded into RAM, integer FLAGS array in SuperCart) Cortex BASIC 2 1685" (Cortex BASIC from FinalGrom, integer FLAGS array in expansion memory) 8 Quote Link to comment Share on other sites More sharing options...
+Vorticon Posted April 22, 2022 Share Posted April 22, 2022 Are these results using the full 8K array and 10 iterations? On the ZX-81 I could only dimension a 2K array before running out of memory, so I had to multiply the processing time by 4 to match other computers (1800 seconds). Quote Link to comment Share on other sites More sharing options...
+TheBF Posted April 22, 2022 Author Share Posted April 22, 2022 3 hours ago, Reciprocating Bill said: Here are some additional TI results, running on a 16-bit console. IIRC the Forth code had to be modified a bit to run. Cortex BASIC gets a small push from multiple statements per line in a couple instances. Assembly: 6.4" Wycove Forth 3.0: 92" Wycove Forth 2.0 112" Turbo Forth 105" FB Forth 158" Cortex BASIC 1 1318" (Cortex BASIC loaded into RAM, integer FLAGS array in SuperCart) Cortex BASIC 2 1685" (Cortex BASIC from FinalGrom, integer FLAGS array in expansion memory) Bill, when you say a 16-bit console, do you mean it was modded so all the RAM is on the 16 bit buss? Quote Link to comment Share on other sites More sharing options...
Reciprocating Bill Posted April 22, 2022 Share Posted April 22, 2022 1 hour ago, Vorticon said: Are these results using the full 8K array and 10 iterations? On the ZX-81 I could only dimension a 2K array before running out of memory, so I had to multiply the processing time by 4 to match other computers (1800 seconds). They are all against the 8192K array, and 10 iterations. CBASIC could not dimension an array that size either, because numeric arrays are dimensioned as floating point with no option for an integer DIM. But what it can do is set aside an arbitrary amount of RAM (originally for the purpose of running assembly routines) and then poke and peek bytes or words (with the MEM() operation) into specific addresses, with statements like "MEM(FLG+K)=0" where "FLG" is the base address of the array. When running CBASIC entirely in RAM (the fastest option on my console) there is only about 6K free after loading the interpreter, so I loaded CBASIC from a SuperCart and situated the FLAGS array in the 8K slow RAM it contains. When running out of FinalGrom CBASIC has about 16K free, so I allocated 8K of high memory for the integer FLAGS array (but then parts of the interpreter are running out of slow module ROM). In either case there is actually a bit of extra work for the interpreter to do relative to the original algorithm, as the MEM() address has to be calculated (one extra addition each access). Because CBASIC otherwise treats variables as integers where possible (but as FP if necessary), I'd say it's meaningful to put it up against other integer BASICs running the Sieve. So, for example, Apple II Integer BASIC completes the Sieve in 1850 seconds, an IBM PC using integers in MBASIC in 1950 seconds, and so on - both of which CBASIC bests handily at either 1318 or 1685 seconds. Remembering that this is a 16-bit console, it puts me in mind of "what could have been." 2 Quote Link to comment Share on other sites More sharing options...
Reciprocating Bill Posted April 22, 2022 Share Posted April 22, 2022 (edited) 4 minutes ago, TheBF said: Bill, when you say a 16-bit console, do you mean it was modded so all the RAM is on the 16 bit buss? Right. CBASIC in particular, which was written for the 9995, really benefits from that. Edited April 22, 2022 by Reciprocating Bill 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted April 22, 2022 Author Share Posted April 22, 2022 1 minute ago, Reciprocating Bill said: Right. CBASIC in particular, which was written for the 9995, really benefits from that. I'm jealous. Quote Link to comment Share on other sites More sharing options...
Reciprocating Bill Posted April 22, 2022 Share Posted April 22, 2022 4 minutes ago, TheBF said: I'm jealous. Don't be. I'll have to run the console for 100 years to recover the time it took to finish the mod - which included a couple failed attempts. I'm pretty sure I don't have that kind of time. 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted April 22, 2022 Author Share Posted April 22, 2022 Me either. Quote Link to comment Share on other sites More sharing options...
+TheBF Posted April 22, 2022 Author Share Posted April 22, 2022 So these are Classic99 timings but they usually hold very close to my real hardware. The Byte mag. code needed one minor change for ANS Forth and the 10X iterator seems missing so I added that. Spoiler \ BYTE MAGAZINE SIEVE NEEDS ELAPSE FROM DSK1.ELAPSE DECIMAL 8192 CONSTANT SIZE VARIABLE FLAGS SIZE ALLOT 0 FLAGS ! : DO-PRIME FLAGS SIZE 1 FILL ( set array ) 0 ( counter ) SIZE 0 DO FLAGS I + C@ IF I DUP + 3 + DUP I + BEGIN DUP SIZE < WHILE 0 OVER FLAGS + C! OVER + REPEAT DROP DROP 1+ THEN LOOP CR SPACE . ." Primes" ; : PRIMES ( n -- ) ." 10 Iterations" 10 0 DO DO-PRIME LOOP CR ." Done!" ; Camel99 DTC 101.7 secs TurboForth 107 secs Camel99 ITC 119.5 secs FbForth 2.0:13 174 secs Quote Link to comment Share on other sites More sharing options...
jstimson Posted April 22, 2022 Share Posted April 22, 2022 There was also the Creative Computing magazine article from 1984. What amazes me is just how many completely separate (and competing) hardware and software platforms there were back in the day. It was truly the wild-west of home computing. I have the magazine but here's a direct link to the specific section.... https://archive.org/details/creativecomputing-1984-03/page/n7/mode/2up 2 Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted April 23, 2022 Share Posted April 23, 2022 6 hours ago, TheBF said: fbForth 2.0:13 174 secs I changed two words in the loops to see whether it made a difference: IF I DUP + 3 + DUP I + to IF I I + 3 + DUP I + because reusing I is marginally faster than DUPing it, and DROP DROP 1+ to 2DROP 1+ 2DROP was defined as HEX CODE: 2DROP ( n1 n2 -- ) 05C9 \ SP INCT, 05C9 \ SP INCT, ;CODE It reduced the time by only one second! ...lee 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted April 23, 2022 Author Share Posted April 23, 2022 I enabled my inline optimizer where it could make a difference on the same code on the Indirect threaded Camel99 Forth. (I have a bug in the Direct threading version) NEEDS INLINE[ FROM DSK1.INLINE \ optimized Version DECIMAL : DO-PRIME2 FLAGS SIZE 1 FILL ( set array ) 0 ( counter ) SIZE 0 DO INLINE[ FLAGS I + C@ ] IF INLINE[ I DUP + 3 + DUP I + ] BEGIN INLINE[ DUP SIZE ] < WHILE INLINE[ 0 OVER FLAGS + C! OVER + ] REPEAT INLINE[ DROP DROP 1+ ] THEN LOOP CR SPACE . ." Primes" ; \ Camel99 : PRIMES-OPT ( -- ) PAGE ." 10 Iterations" 10 0 DO DO-PRIME2 LOOP CR CR ." Done!" ; This ran the 10 iterations in 80.8 seconds. Here is what the Forth code "FLAGS I + C@" became. 2002 0646 dect R6 (14) 2004 C584 mov R4,*R6 (30) 2006 0204 li R4,>c1c4 (20) 200A 0646 dect R6 (14) 200C C584 mov R4,*R6 (30) 200E C117 mov *R7,R4 (26) 2010 6127 s @>0002(R7),R4 (34) 2014 A136 a *R6+,R4 (30) 2016 D114 movb *R4,R4 (26) 2018 0984 srl R4,8 (32) 201A 045A b *R10 1 Quote Link to comment Share on other sites More sharing options...
lucien2 Posted April 24, 2022 Share Posted April 24, 2022 GPL is disappointing with this one: 342 seconds. Maybe it's all these accesses to expansion RAM? I used an old mess version (0.136), I can't find how to run it on mame 0.164: eratosthenes sieve.g.bin With Classic99, it takes 387 seconds. Spoiler grom >6000 data >aa00,>0100,>0000 data menu data >0000,>0000,>0000,>0000 menu data >0000 data start stri 'ERATOSTHENES SIEVE' upcase equ >0018 iter equ >8300 count equ >8301 size equ >8303 i equ >8305 flags equ >a000 prime equ >8307 k equ >8309 tmp equ >830b s1 stri 'ITERATIONS' s2 stri 'PRIMES' s3 stri 'DONE!' line equ >830d start dst 8191,@size st 10,@iter dst >0900,@fac call upcase dclr @arg clr @arg+2 st @iter,@arg+3 call print clr @tmp move 1,g@s1,@tmp+1 dinc @arg+4 move @tmp,g@s1+1,v*arg+4 st 1,@line L6 dclr @count dst @size,@i st 1,@tmp L1 st @tmp,@flags(@i) ddec @i dcz @i br L1 dclr @i L3 cz @flags(@i) bs L2 dst @i,@prime dadd @i,@prime dadd 3,@prime dst @prime,@k dadd @i,@k L5 dch @size,@k bs L4 clr @flags(@k) dadd @prime,@k b L5 L4 dinc @count L2 dinc @i dch @size,@i br L3 st @line,@arg clr @arg+1 dst @count,@arg+2 call print clr @tmp move 1,g@s2,@tmp+1 clr @arg st @line,@arg+1 inc @line dsll 5,@arg dinc @arg+4 dadd @arg+4,@arg move @tmp,g@s2+1,v*arg dec @iter cz @iter br L6 clr @tmp move 1,g@s3,@tmp+1 clr @arg st @line,@arg+1 dsll 5,@arg move @tmp,g@s3+1,v*arg L7 b L7 * print unsigned word decimal * arg: row, arg+1: col, arg+2/3: number print clr @fac st @arg,@fac+1 dsll 5,@fac clr @arg dadd @arg,@fac call >2f7c st @>8367,@fac+3 clr @fac+2 clr @arg+4 move @arg+4,@>8300(@fac+2),v*fac rtn 2 2 Quote Link to comment Share on other sites More sharing options...
Reciprocating Bill Posted April 24, 2022 Share Posted April 24, 2022 (edited) 3 hours ago, lucien2 said: GPL is disappointing with this one: 342 seconds. Maybe it's all these accesses to expansion RAM? I used an old mess version (0.136), I can't find how to run it on mame 0.164: eratosthenes sieve.g.bin With Classic99, it takes 387 seconds. Reveal hidden contents grom >6000 data >aa00,>0100,>0000 data menu data >0000,>0000,>0000,>0000 menu data >0000 data start stri 'ERATOSTHENES SIEVE' upcase equ >0018 iter equ >8300 count equ >8301 size equ >8303 i equ >8305 flags equ >a000 prime equ >8307 k equ >8309 tmp equ >830b s1 stri 'ITERATIONS' s2 stri 'PRIMES' s3 stri 'DONE!' line equ >830d start dst 8191,@size st 10,@iter dst >0900,@fac call upcase dclr @arg clr @arg+2 st @iter,@arg+3 call print clr @tmp move 1,g@s1,@tmp+1 dinc @arg+4 move @tmp,g@s1+1,v*arg+4 st 1,@line L6 dclr @count dst @size,@i st 1,@tmp L1 st @tmp,@flags(@i) ddec @i dcz @i br L1 dclr @i L3 cz @flags(@i) bs L2 dst @i,@prime dadd @i,@prime dadd 3,@prime dst @prime,@k dadd @i,@k L5 dch @size,@k bs L4 clr @flags(@k) dadd @prime,@k b L5 L4 dinc @count L2 dinc @i dch @size,@i br L3 st @line,@arg clr @arg+1 dst @count,@arg+2 call print clr @tmp move 1,g@s2,@tmp+1 clr @arg st @line,@arg+1 inc @line dsll 5,@arg dinc @arg+4 dadd @arg+4,@arg move @tmp,g@s2+1,v*arg dec @iter cz @iter br L6 clr @tmp move 1,g@s3,@tmp+1 clr @arg st @line,@arg+1 dsll 5,@arg move @tmp,g@s3+1,v*arg L7 b L7 * print unsigned word decimal * arg: row, arg+1: col, arg+2/3: number print clr @fac st @arg,@fac+1 dsll 5,@fac clr @arg dadd @arg,@fac call >2f7c st @>8367,@fac+3 clr @fac+2 clr @arg+4 move @arg+4,@>8300(@fac+2),v*fac rtn I ran it on real iron from a FinalGrom: 399 seconds. And that's with all RAM on the 16-bit bus. Further perspective. For my assembly version: 6.4 seconds, all RAM 16-bit. 7.1 seconds, unmodified console, code and workspace squeezed into PAD ram, array in memory expansion. (So array in memory expansion is not that big of a factor.) ~10 seconds, unmodified console, WS in PAD, code and array in memory expansion. ~15 seconds, unmodified console, everything in expansion RAM. Edited April 24, 2022 by Reciprocating Bill OCD 2 1 Quote Link to comment Share on other sites More sharing options...
+mizapf Posted April 24, 2022 Share Posted April 24, 2022 5 hours ago, lucien2 said: I used an old mess version (0.136), I can't find how to run it on mame 0.164: eratosthenes sieve.g.bin No naked bins with MAME please. Just put it in a RPK file with this layout.xml. You can re-use it for similar cartridges. <?xml version="1.0" encoding="utf-8"?> <romset version="1.0" listname="editass"> <resources> <rom id="gromimage" file="eratosthenes_sieve.bin"/> </resources> <configuration> <pcb type="standard"> <socket id="grom_socket" uses="gromimage"/> </pcb> </configuration> </romset> sieve.rpk Edit: 402 seconds on MAME. 1 1 Quote Link to comment Share on other sites More sharing options...
lucien2 Posted April 24, 2022 Share Posted April 24, 2022 11 minutes ago, mizapf said: No naked bins with MAME please. Just put it in a RPK file with this layout.xml. You can re-use it for similar cartridges. Thanks! I used "pcb type=paged", instead of "standard". Quote Link to comment Share on other sites More sharing options...
+TheBF Posted April 24, 2022 Author Share Posted April 24, 2022 2 hours ago, Reciprocating Bill said: I ran it on real iron from a FinalGrom: 399 seconds. And that's with all RAM on the 16-bit bus. Further perspective. For my assembly version: 6.4 seconds, all RAM 16-bit. 7.1 seconds, unmodified console, code and workspace squeezed into PAD ram, array in memory expansion. (So array in memory expansion is not that big of a factor.) ~10 seconds, unmodified console, WS in PAD, code and array in memory expansion. ~15 seconds, unmodified console, everything in expansion RAM. Bill, can you post the Assembly Language code that you wrote? Quote Link to comment Share on other sites More sharing options...
+TheBF Posted April 24, 2022 Author Share Posted April 24, 2022 I got the optimizer working on the Direct Threaded Forth system. In order to make the optimizer simpler for Direct threaded Forth I had to roll back an optimization that I had put this new system. ( inlining the 2 instruction Forth interpreter ) That brought the un-optimized kernel time up to 107.5 seconds (versus 101 posted earlier) With optimizer the timing on the DTC system is 71.1 seconds. Quote Link to comment Share on other sites More sharing options...
Reciprocating Bill Posted April 24, 2022 Share Posted April 24, 2022 (edited) Here's the assembly code. Later I'll try posting the binary. This version sets up the workspace at >8300 but makes no attempt to roll any code into the PAD. It should run in about 10 seconds on a stock console. * SIEVE OF ERATOSTHENES ------------------------------------------ * WSM 4/2022 * TMS9900 assembly adapted from BYTE magazine 9/81 and 1/83 issues * 10 iterations 6.4 seconds on 16-bit console * ~10 seconds on stock console DEF START REF VMBW M DATA >0000 for PRINT routine M2 BSS 6 M3 DATA 10000 DATA 1000 DATA 100 DATA 10 ROW DATA >0004 MSG1 TEXT ' Sieve of Eratosthenes ' MSG2 TEXT ' 10 iterations ' MSG3 TEXT ' done! ' SIZE EQU 8190 FLAGS BSS 8191 PRWS BSS >20 * main * EVEN START LWPI >8300 workspace in PAD for stock console LI R5,>0101 to initialize array LI R8,SIZE index LI R12,10 # iterations LI R15,11 constant (for calculating row of display) LI R0,36 display title LI R1,MSG1 LI R2,24 BLWP @VMBW AI R0,32 LI R1,MSG2 LI R2,20 BLWP @VMBW AGAIN CLR R7 0 constant CLR R10 counts primes LI R6,FLAGS base of FLAGS array LI R0,SIZE index FLAGS array LOOP MOV R5,*R6+ write ones to FLAGS DECT R0 JNE LOOP LI R6,FLAGS reset index FOR CB *R6+,R7 FLAGS(I)=0? JEQ SIEVE3 yes, skip strikeout loop MOV R0,R1 PRIME=I SLA R1,1 *2 AI R1,3 +3 MOV R0,R2 K=I A R1,R2 add PRIME to K SIEVE1 C R2,R8 K>SIZE? JGT SIEVE2 yes, goto SIEVE2 MOVB R7,@FLAGS(R2) no,reset @FLAGS(R2) A R1,R2 K=K+PRIME JMP SIEVE1 SIEVE2 INC R10 increment count of primes SIEVE3 INC R0 next I C R0,R8 I>SIZE? JLT FOR no, next MOV R10,@M print #primes found BL @PRINT DEC R12 JNE AGAIN rinse and repeat MOV @ROW,R15 print "Done!" INCT R15 SLA R15,5 MOV R15,R0 LI R1,MSG3 LI R2,12 BLWP @VMBW OUT LIMI 2 wait for quit JMP OUT * Print binary value as decimal (from Morley, p.149) * value to print @M PRINT LWPI PRWS INC @ROW increment row counter LI R0,4 LI R1,M3 LI R2,M2 R2 points to address containing result MOV @M,R4 mov integer to be converted to R4 J1 CLR R3 see Morley for comments DIV *R1+,R3 SLA R3,8 AI R3,>3000 MOVB R3,*R2+ DEC R0 JNE J1 SLA R4,8 AI R4,>3000 MOVB R4,*R2+ LI R1,>2000 MOVB R1,*R2 LI R0,>3000 LI R3,M2 J2 CB *R3,R0 JNE J3 MOVB R1,*R3+ JMP J2 J3 CB *R3,R1 JEQ J4 JMP J5 J4 DEC R3 MOVB R0,*R3 J5 MOV @ROW,R15 use row counter SLA R15,5 row *32 MOV R15,R0 INCT R0 write to this addres in VDP LI R1,M2 LI R2,6 BLWP @VMBW LWPI >8300 RT END Edited April 24, 2022 by Reciprocating Bill OCD 3 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted April 24, 2022 Author Share Posted April 24, 2022 It's a thing of beauty. 1 Quote Link to comment Share on other sites More sharing options...
RXB Posted April 24, 2022 Share Posted April 24, 2022 8 hours ago, lucien2 said: GPL is disappointing with this one: 342 seconds. Maybe it's all these accesses to expansion RAM? I used an old mess version (0.136), I can't find how to run it on mame 0.164: eratosthenes sieve.g.bin With Classic99, it takes 387 seconds. Reveal hidden contents grom >6000 data >aa00,>0100,>0000 data menu data >0000,>0000,>0000,>0000 menu data >0000 data start stri 'ERATOSTHENES SIEVE' upcase equ >0018 iter equ >8300 count equ >8301 size equ >8303 i equ >8305 flags equ >a000 prime equ >8307 k equ >8309 tmp equ >830b s1 stri 'ITERATIONS' s2 stri 'PRIMES' s3 stri 'DONE!' line equ >830d start dst 8191,@size st 10,@iter dst >0900,@fac call upcase dclr @arg clr @arg+2 st @iter,@arg+3 call print clr @tmp move 1,g@s1,@tmp+1 dinc @arg+4 move @tmp,g@s1+1,v*arg+4 st 1,@line L6 dclr @count dst @size,@i st 1,@tmp L1 st @tmp,@flags(@i) ddec @i dcz @i br L1 dclr @i L3 cz @flags(@i) bs L2 dst @i,@prime dadd @i,@prime dadd 3,@prime dst @prime,@k dadd @i,@k L5 dch @size,@k bs L4 clr @flags(@k) dadd @prime,@k b L5 L4 dinc @count L2 dinc @i dch @size,@i br L3 st @line,@arg clr @arg+1 dst @count,@arg+2 call print clr @tmp move 1,g@s2,@tmp+1 clr @arg st @line,@arg+1 inc @line dsll 5,@arg dinc @arg+4 dadd @arg+4,@arg move @tmp,g@s2+1,v*arg dec @iter cz @iter br L6 clr @tmp move 1,g@s3,@tmp+1 clr @arg st @line,@arg+1 dsll 5,@arg move @tmp,g@s3+1,v*arg L7 b L7 * print unsigned word decimal * arg: row, arg+1: col, arg+2/3: number print clr @fac st @arg,@fac+1 dsll 5,@fac clr @arg dadd @arg,@fac call >2f7c st @>8367,@fac+3 clr @fac+2 clr @arg+4 move @arg+4,@>8300(@fac+2),v*fac rtn Are you counting all the time it takes to load character sets and set up in that timing? I mean I have that same routine in RXB 2022 but is now mostly Assembly! Quote Link to comment Share on other sites More sharing options...
lucien2 Posted April 24, 2022 Share Posted April 24, 2022 For this benchmark program, we could also keep the default character set. That could improve it by a few seconds, but I was expecting something 1.5x slower than Forth (~150-200 seconds). 1 Quote Link to comment Share on other sites More sharing options...
RXB Posted April 24, 2022 Share Posted April 24, 2022 3 hours ago, lucien2 said: For this benchmark program, we could also keep the default character set. That could improve it by a few seconds, but I was expecting something 1.5x slower than Forth (~150-200 seconds). I would use GPL and Assembly in ROMs to do things just like XB does. Quote Link to comment Share on other sites More sharing options...
Reciprocating Bill Posted April 25, 2022 Share Posted April 25, 2022 4 hours ago, RXB said: I would use GPL and Assembly in ROMs to do things just like XB does. Rich - Are your Call Load (when used to poke) and Call Peek operations GPL routines? Seems to me that having fast, assembly equivalents would be very useful. It would then be practical to set up the equivalent of integer arrays larger than are possible with ordinary DIM statements - just as I did with Cortex BASIC. The original Xbasic calls are too slow for that purpose. (Although perhaps that wouldn't be much help vis speed - as anything to be poked or handled after being peeked would have to be floating point...but it would allow more compact data structures when integer values are sufficient.) Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.