Jump to content
IGNORED

Byte Magazine Sieve Benchmark


TheBF

Recommended Posts

6 hours ago, speccery said:

Just realised I was missing the repeat loop, i.e. the thing should be run 10 times, here is that result:

1130496375_Sieve8192ScreenShot14.5.20229_34.thumb.png.87990e809f6a0ccd04d02fa91aae8b6f.png

At 8.69 seconds I am behind the assembly version on the TMS9900, so will definitely need to optimise the interpreter :) 

I think you beat it.  On a stock console Bill's hand coded solution runs in 10 seconds. 

I will put you in the record books. ;)

 

  • Thanks 1
Link to comment
Share on other sites

On 5/14/2022 at 9:20 AM, TheBF said:

I think you beat it.  On a stock console Bill's hand coded solution runs in 10 seconds. 

I will put you in the record books.

On Classic99 it runs in about 9.3 - 9.4 seconds. That's hand-timed. 

 

Edited to add: On a stock console it runs in 9.35 seconds. 

Edited by Reciprocating Bill
Stock console
Link to comment
Share on other sites

33 minutes ago, Reciprocating Bill said:

On Classic99 it runs in about 9.3 - 9.4 seconds. That's hand-timed. 

I think my old Dell struggles a bit. Classic99 is using 6..7% of the CPU.

So I saw closer to 10 here.

I will push it over to my stock console and double check.

 

  • Like 1
Link to comment
Share on other sites

  • 1 month later...
On 5/14/2022 at 8:29 AM, speccery said:

After some bug fixing, I got the Sieve running on the StrangeCart Basic. Needed to fix a problem with FOR loop handing when the initial value was already exceeding the limit. Also increased the size of symbol space (as I call it) to 40kbytes, so that the DIM for the flags succeeds. Below are screenshots of a run on the actual hardware as well as the listing of the TI Basic program. This cannot be run on the normal TI Basic, since the array size exceeds RAM space. Also for convenience line 110 makes the DIM based on a variable value, this does not work with TI Basic.

The runtime is 0.86 seconds with current firmware version.

22976621_SieveRun.thumb.png.396e480c1e58d656f21b7628a28ad9a8.png

And here is the listing.

2066605220_SieveListing.thumb.png.182f7208622f66f3515ab355acfc5183.png

I also edited this so that the code reads as below, which can be run on TI Basic. It takes about 47 seconds to run. On the StrangeCart Basic this takes just under 0.14 seconds. This is only 340 times faster, so I still have optimisation work to do, I haven't focused on optimisation yet. Without I/O the Basic runs at around 60.000 to 80.000 lines per second.

5240603_Sieve1024.thumb.png.5ed908e442287049782d7f511e5b54b7.png

Excuse me, where can i find this basic that handles array 8192
numbers?
Thanks !

Link to comment
Share on other sites

22 hours ago, atari1byte said:

Excuse me, where can i find this basic that handles array 8192
numbers?
Thanks !

Yes, like @HOME AUTOMATION linked to my thread above, this is a re-creation of TI BASIC running on a processor on a cartridge, not on the stock TI-99/4A. It also uses single precision floating point as the default data type, so each array entry takes 4 bytes instead of the 8 with TI BASIC. Thus larger arrays become possible - the StrangeCart has 192K of RAM, of which something like 128K is available to BASIC. At the moment the BASIC programs, in terms of program length, are limited to saved TI BASIC programs, which in turn have to fit into 16K VDP RAM. But the variable space is on top of that and currently I have set it to something like 40K. These are compile time options in the firmware.

Link to comment
Share on other sites

47 minutes ago, speccery said:

Sì, come AUTOMAZIONE @HOME collegato al mio thread sopra, questa è una ri-creazione di TI BASIC in esecuzione su un processore su una cartuccia, non sul TI-99/4A di serie. Utilizza anche la virgola mobile a precisione singola come tipo di dati predefinito, quindi ogni voce di matrice richiede 4 byte invece degli 8 con TI BASIC. Così diventano possibili array più grandi: StrangeCart ha 192K di RAM, di cui qualcosa come 128K è disponibile per BASIC. Al momento i programmi BASIC, in termini di lunghezza del programma, sono limitati ai programmi TI BASIC salvati, che a loro volta devono adattarsi alla RAM VDP 16K. Ma lo spazio variabile è in cima a quello e attualmente l'ho impostato su qualcosa come 40K. Queste sono le opzioni del tempo di compilazione nel firmware.

Hi!

Interesting!!!

StrangeCart Basic is only real hardware? 

ps. Thanks to the forum I'm also starting to work on RXB is a little slow....
I need to learn how to use it!

Link to comment
Share on other sites

2 minutes ago, atari1byte said:

Hi!

Interesting!!!

StrangeCart Basic is only real hardware? 

ps. Thanks to the forum I'm also starting to work on RXB is a little slow....
I need to learn how to use it!

Yes it's only real hardware. It's not totally impossible imagining to run it on classsic99, but that is not a small project...

Link to comment
Share on other sites

47 minutes ago, speccery said:

Yes it's only real hardware. It's not totally impossible imagining to run it on classsic99, but that is not a small project...

I have a TI 99...
I haven't turned it on in 45 years....
I don't know if it still works!
I'd like to turn it on again.
The price of the cartridges ?
 

  • Like 1
Link to comment
Share on other sites

  • 7 months later...

I got a version of this running with my JIT compiler early December last year but I forgot to cross-post it here. 

 

This JIT is limited

  • it can only optimize Forth words that are written in Assembly Language
  • it can replace all looping and branching words with native code equivalents (big win)
  • it replaces variables and constants with native code that loads the accumulator register with LI instruction

This happens to be enough to optimize the entire inner loop of the Byte Mag. sieve. :)

Here is the final source code that uses the JIT compiler. 

 

 

Spoiler
\ BYTE MAGAZINE SIEVE
\ 10 ITERATION TIMINGS
\                Normal          OPT
\ Camel99 DTC     104            50.5  (jit.fth DTC version)
\ TurboForth      107
\ Camel99 ITC     120            50.1 (jit2b.fth)
\ FbForth 2.0:13  174

NEEDS ELAPSE FROM DSK1.ELAPSE
NEEDS JIT:   FROM DSK1.JIT

NEW-HEAP

DECIMAL
 8190 CONSTANT SIZE
    0 VARIABLE FLAGS   SIZE ALLOT  0 FLAGS !

HEX
JIT: DO-PRIME
   FLAGS SIZE  01 FILL
   0
   SIZE 0
   DO FLAGS I + C@
     IF I DUP +  3 +  DUP I +
        BEGIN
          DUP SIZE <
        WHILE
           0 OVER FLAGS +  C!
           OVER +
        REPEAT
        DROP DROP 
        1+
     THEN
   LOOP
;JIT

DECIMAL
: PRIMES ( -- )
   PAGE ."  10 Iterations"
   10 0
   DO
      DO-PRIME CR . ." PRIMES"
   LOOP
   CR ." Done!"
;

 

 

  • Like 2
Link to comment
Share on other sites

Down the rabbit hole I went...

 

So I wondered was that really the best I could do? 

What other tricks have I developed over all this time?

 

  1. Included FILLW which fills word at a time not bytes (like the ASM version Bill did)
  2. Included CODEMACROS to give Forth indexed addressing ability 
  3. make getter & setter words for the array with the indexed addressing macros 
  4. Modernized the Forth code:
    1. changed VARIABLE to CREATE
    2. changed "DUP +" to 2*   
    3. "DROP DROP" to 2DROP 
  5. Pre-initialize the 1st 2 bytes of the array to zero
  6. Include the JIT compiler 
  7. JIT the inner loop 
Spoiler
\ BYTE MAGAZINE SIEVE WITH TABLE: and FILLW  

\ 10 ITERATION TIMINGS
\                Normal         JIT + indexed addressing + code improvement
\ TurboForth      107
\ Camel99 ITC     120                   37.8
\ FbForth 2.0:13  174

NEEDS JIT:   FROM DSK1.JIT
NEEDS ELAPSE FROM DSK1.ELAPSE
NEEDS FILLW  FROM DSK1.FILLW 
NEEDS ()C!,  FROM DSK1.CODEMACROS 

DECIMAL
8192 CONSTANT SIZE
CREATE FLAGS   SIZE ALLOT 

\ getter and setter for FLAGS ARRAY 
MACRO ]FLAGS@ ( ndx -- c )  FLAGS ()C@,  ;MACRO
MACRO ]FLAGS! ( c ndx -- )  FLAGS ()C!,  ;MACRO

HEX
NEW-HEAP 
JIT: DO-PRIME
   FLAGS SIZE  0101 FILLW
   FLAGS OFF 
   -1   
   SIZE 0   
   DO I ]FLAGS@
     IF I 2*  3 +  DUP I + ( cntr ndx1 ndx2 )
        BEGIN
          DUP SIZE <
        WHILE
           0 OVER ]FLAGS!
           OVER +
        REPEAT
        2DROP 
        1+
     THEN
   LOOP
;JIT

DECIMAL
: PRIMES ( -- )
   DECIMAL 
   PAGE ."  10 Iterations"
   10 0
   DO
      DO-PRIME CR . ." PRIMES"
   LOOP
   CR ." Done!"
;

 

 

And here is what happened.  About 3.2 times faster.

I think this is as fast I can get using just Camel99 Forth extensions.

 

image.thumb.png.d13e86cf749de6d01bfd703c71588c5c.png

  • Like 3
Link to comment
Share on other sites

I re-worked Bills' Assembler version to compile in Forth Assembler to finish the series.

It is a thing of beauty to watch on Tursi's heatmap display. :)

 

The code was so organized that it dropped perfectly into the structured looping used by the Forth assembler. 

I broke out the array initializer as a sub-routine which costs a few milli-seconds but not a big deal.  

 

In this example I BLWP to a new workspace so I didn't have to re-work  Bill's register allocations.

That is easy with a PROG: directive I created for Camel99 Forth.

At compile time  PROG:  takes a workspace argument and creates it's own vector. 

At run-time it does a BLWP to its vector and after RTWP it returns to Forth. 

 

Spoiler
\ SIEVE in Forth Assembler based on code by @Reciprocating Bill  on atariage.com 

\ Original notes by BIll.
\ * SIEVE OF ERATOSTHENES ------------------------------------------
\ * WSM 4/2022
\ * TMS9900 assembly adapted from BYTE magazine 9/81 and 1/83 issues
\ * 10 iterations 6.4 seconds on 16-bit console
\ * ~10 seconds on stock console

NEEDS DUMP FROM DSK1.TOOLS
NEEDS MOV, FROM DSK1.ASM9900

\ PROG: is compiler extension for Forth
\ - at compile time creates 9900 Vector with workspace and entry address
\ - at run-time it does a BLWP to it's own vector 
: PROG: ( wksp -- )
   CREATE    ,  HERE CELL+ ,  !CSP
  ;CODE  *W BLWP,  NEXT, ENDCODE

: ;PROG  ( -- ) ?CSP  ;  \ check stack position for junk left on it.

\ macro to give you Forth's TOS register from a foreign workspace
: [TOS]  ( -- )  8  R13 () ; 


DECIMAL 
8190 CONSTANT SIZE
CREATE FLAGS   SIZE CELL+ ALLOT 

CREATE WRKSP2  16 CELLS ALLOT 

\ ----------------------------------------
\ Registers used 
\ R0    I loop index                      
\ R1    prime address 
\ R2    inner loop address (K)
\ R7    NUMBER 0  for byte comparison     
\ R5    0101 
\ R6    base address of flags array 
\ R8    size of array              
\ R10   prime number counter       

HEX
CREATE INIT-FLAGS
\ set array to 1
    R6 FLAGS LI,        \ base of FLAGS array
    R0 SIZE  LI,        \ index FLAGS array
    R5 0101  LI,        \ To initialize array
    BEGIN,
        R5 R6 *+ MOV,  \ write ones to FLAGS
        R0 DECT,
    NC UNTIL,  
    RT, 

DECIMAL 
WRKSP2 PROG: DO-PRIME ( 0 -- n)  
\ Called with '0' on stack to provide a place for return value 

   INIT-FLAGS @@ BL, 
\ inits 
   R7 CLR,       \ 0 constant
   R6 FLAGS LI,  \ array base address 
   R8 SIZE  LI,  \ size 

   R10 CLR,       \ clear counter
   R0  CLR,       \ clear loop index 
   BEGIN, 
     R6 *+ R7 CMPB,        \ FLAGS I + C@
     NE IF, 
          R0 R1 MOV,     \ I -> R1
          R1  1 SLA,     \ R1 2*
          R1  3 AI,      \ R1 3+
          R0 R2 MOV,     \ I -> R2 ( R2 is K index) 
          R1 R2 ADD,     \ PRIME K +! 
          BEGIN,  
            R2 R8 CMP,       \ K SIZE COMPARE 
          LTE WHILE,  
            R7 FLAGS R2 () MOVB, \ no,reset @FLAGS(R2)
            R1 R2 ADD,     \ PRIME K +! 
          REPEAT, 
          R10 INC,           \ increment count of primes
    ENDIF, 
    R0 INC,                \ next I
    R0 R8 CMP,             \ I>SIZE?
  GTE UNTIL,              \ no, next

  R10 [TOS] MOV,          \ R10 -> Forth top of stack register, R4
  
  RTWP, 
;PROG  

: PRIMES ( -- )
   PAGE ."  10 Iterations"
   10 0 DO   0 DO-PRIME  CR . ." primes"  LOOP
   CR ." Done!"
;

 

 

image.thumb.png.6f4c143daab680c7d20a56a029835b1d.png

 

  • Like 3
Link to comment
Share on other sites

3 hours ago, Reciprocating Bill said:

"SIEVE in Forth Assembler based on code by @billsullivan atariage.com"

 

Oops - I'm a different Bill. Just FYI. 

My humble apologies.  I will fix it directly.  

Appropriate reciprocation shall be added. :)

 

Link to comment
Share on other sites

10 hours ago, Reciprocating Bill said:

No, always and only as Reciprocating Bill. I'm not Bill R. Sullivan. 

I see that Bill Sullivan used to sign off as "Retro Bill." Plenty similar. 

 

I was not yet even lurking here when he was posting.

 

"Reciprocating Bill" has a very obscure origin in vintage HiFi, specifically, the University Sound "Reciprocating Flare" design for midrange and high frequency horns, which originated in the early 1950's or so. The TI has not been my only retro interest. 

  • Like 3
Link to comment
Share on other sites

  • 2 months later...
On 4/24/2022 at 2:40 PM, lucien2 said:

For this benchmark program, we could also keep the default character set.

That could improve it by a few seconds, but I was expecting something 1.5x slower than Forth (~150-200 seconds).

I missed this comment last year.

 

GPL is a byte coded system so it will always run slower as you suspected, but I don't think the GPL interpreter is as stripped down as the Forth equivalent. 

Forth's running interpreter as you know is 3 instructions on TI-99 for indirect threading and 2 instructions for direct threading. 

 

I have it on my project list to write a byte-coded Forth for TI-99 to see what happens and I suspect your 1.5 factor is going to be about right.

Although the TI-99's indirect addressing instructions might make the difference smaller. 

 

Does anyone have the GPL runtime interpreter code or know how many instructions are executed to interpret a byte code?

Based on the benchmark it appears to be between 6 to 10 instructions.

 

 

 

Link to comment
Share on other sites

30 minutes ago, TheBF said:

Does anyone have the GPL runtime interpreter code or know how many instructions are executed to interpret a byte code?

Based on the benchmark it appears to be between 6 to 10 instructions.

 

The GPL Interpreter is most of the first third or so of the console ROM (see the listing of ROM-4A.lst at TI-99/4A Operating System). It starts at line 261 in the listing. You will notice the familiar NEXT label at listing line 282.

 

...lee

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

8 hours ago, Lee Stewart said:

 

The GPL Interpreter is most of the first third or so of the console ROM (see the listing of ROM-4A.lst at TI-99/4A Operating System). It starts at line 261 in the listing. You will notice the familiar NEXT label at listing line 282.

 

...lee

Thanks Lee,

 

From looking at NEXT in GPL it's no surprise that it runs slower than address threaded Forth.

I don't pretend to understand all the details and I can see that not all of this code runs for every GPL instruction, but it's still a lot more than 3 instructions.

 

The counter to this on the Forth side, is that some of the GPL NEXT seems to be doing things that are distributed in other places in Forth.

Variables, branching and code to enter a routine exist separate from the NEXT interpreter. So that's where Forth spends extra time that is not seen if we just look at Forth's NEXT code. 

I think that's as deep as I need to go. :) 

 241  0070 0300 NEXT   LIMI 2                ALLOW INT'S BETWEEN GPL INSTN'S
 241  0072 0002  
 242  0000 0072 HX0002 EQU  $-2
 243  0074 0300 C074   LIMI 0
 243  0076 0000  
 244  0078 D25D        MOVB *R13,R9          LOAD INSTN FROM GAME ROM.
 245  007A 1105        JLT  ABOPS            JMP IF MS BIT SET
 246  007C D109        MOVB R9,R4            MOVE INSTN TO WORK REG'R
 247  007E 09C4        SRL  R4,12            LEAVE TOP 3 BITS *2
 248  0080 C164        MOV  @ITAB(R4),R5     GET BRANCH ADDRESS
 248  0082 0C36  
 249  0084 0455        B    *R5              B @MICSLN, MOVDAT, BRESET, BSET
 250            *---------INSTRUCTIONS WITH A & B OPERANDS---------
 251  0086 04C4 ABOPS  CLR  R4               CLEAR VDP RAM FLAGS
 252  0088 C149        MOV  R9,R5            LOAD R5 WITH DOUBLE FLAG
 253  008A 0245        ANDI R5,>0100         IF 0, BYTE OPND, IF 1 = WORD OPND
 253  008C 0100  
 254  008E 06A0        BL   @GETMAD          GET FIRST OPERAND
 254  0090 077A  
 255  0092 06C4        SWPB R4
 256  0094 C0C1        MOV  R1,R3            SAVE VARIABLE ADDRESS
 257  0096 C080        MOV  R0,R2            SAVE VAR. VALUE (OPERAND)
 258  0098 0289        CI   R9,>A000         SINGLE OPND? (8X OR 9X = SINGLE)
 258  009A A000  
 259  009C 1A09        JL   AOPS             YES
 260  009E 2260        COC  @C030,R9         IMMEDIATE OR VAR? (C030 =>0200)
 260  00A0 0030  
 261  00A2 160C        JNE  ABOPA            VARIABLE
 262  00A4 C04D        MOV  R13,R1           GET IMMEDIATE VALUE
 263            *                            (2ND OPERAND IS IMMEDIATE)
 264  00A6 D011        MOVB *R1,R0           GET FIRST BYTE
 265  00A8 0601        DEC  R1               MODIFY FOR COMMON ROUTINE
 266  00AA 06A0        BL   @MADC2           GO TO COMMON ROUTINE (PUT 2ND IMM.
 266  00AC 07AA  
 267            *                            OPND BYTE IN R0 LSB, IF WORD.
 268            *                            OTHERWISE SIGN EXTEND BYTE TO LSB)
 269  00AE 1008        JMP  BOPS
 270  00B0 C209 AOPS   MOV  R9,R8            BRANCH THROUGH BRANCH TABLE
 271  00B2 0988        SRL  R8,8
 272  00B4 0700        SETO R0
 273  00B6 C228        MOV  @ATAB(R8),R8
 273  00B8 0BFE  
 274  00BA 0458        B    *R8
 275  00BC 06A0 ABOPA  BL   @GETMAD
 275  00BE 077A  
 276  00C0 C209 BOPS   MOV  R9,R8            BRANCH THROUGH BRANCH TABLE
 277  00C2 0998        SRL  R8,9
 278  00C4 C228        MOV  @BTAB(R8),R8     LOAD BRANCH TABLE
 278  00C6 0C4E  
 279  00C8 8002        C    R2,R0            COMPARE FOR IF PRIMITIVES
 280  00CA 0458        B    *R8

 

  • Like 2
Link to comment
Share on other sites

I posted this over in the Camel99 thread but here is ASMForth code for this sieve thread. 

ASMForth was my attempt to try and blend Assembler and Forth. 

I knew that using only stacks for data, the 9900 would never compete with C or Assembler so this was my rebuttal. :) 

(A smarter man would have made an optimizing Forth compiler)

 

You can read about it here: GitHub - bfox9900/ASMFORTH: Experimental Assembler using Forth like syntax

 

It works and it's fast but nobody else will ever use it. :) 

I wanted to see if it was possible. 

(TOS in the code is just R4 aliased) 

 

Here is what the sieve looks like is ASMForth. It runs in ~10 seconds with WKSP at >8300

\ SIEVE in ASMFORTH for Camel99 Forth                 Mar 1 2023 Brian Fox
\ based on code by @Reciprocating Bill atariage.com 

\ Original notes by BIll.
\ * SIEVE OF ERATOSTHENES ------------------------------------------
\ * WSM 4/2022
\ * TMS9900 assembly adapted from BYTE magazine 9/81 and 1/83 issues
\ * 10 iterations 6.4 seconds on 16-bit console
\ * ~10 seconds on stock console

\ Removed size in R8. Used immediate compare 

HOST 
DECIMAL 8190 CONSTANT SIZE
HEX     2000 CONSTANT FLAGS   \ array in Low RAM 

ASMFORTH 
: FILLW ( addr size char --) \ nestable sub-routine 
    TOS R0 !        \ char -> R0 
    TOS POP         \ size in TOS for FOR to use 
    R1  POP         \ Memory location in R1
    TOS FOR         \ FOR takes a register argument 
       R0 R1 @+ !   \ write ones to FLAGS
    NEXT2           \ *NEW* counts down by 2 
;                 

: PRIME-MOVER 
  FLAGS # SIZE # 0101 # FILLW
\ inits 
  R0 OFF        \ clear loop index 
  R3 OFF        \ 0 constant
  FLAGS R5 #!   \ array base address 
  0 #           \ counter on top of Forth stack 

  SIZE #FOR     \ #FOR takes a literal number argument 
    R5 @+ R3 CMPB       \ FLAGS C@+ byte-compared to R3 (ie: 0)
    <> IF               \ not equal to zero ? 
          R0 R1 !       \ I -> R1
          R1 2*  R1 3 #+ \ R1 2* 3+
          R0 R2 !       \ I -> R2 ( R2 is K index) 
          R1 R2 +       \ PRIME K +! 
          BEGIN  
            R2 SIZE #CMP  \ K SIZE compare 
          < WHILE  
            R3 FLAGS (R2) C! \ reset byte FLAGS(R2)
            R1 R2 +     \ PRIME K +! 
          REPEAT 
          TOS 1+         \ increment count of primes
    THEN 
    R0 1+                \ bump index register
  NEXT 
;

HEX
CODE DO-PRIME ( -- n)  \ code words can be called from Forth 
    PRIME-MOVER        \ ASMForth colon words are called from CODE words
;CODE  

ONLY FORTH ALSO MFORTH ALSO FORTH DEFINITIONS 
DECIMAL 
: PRIMES ( -- )
   PAGE ."  10 Iterations"
   10 0 DO   DO-PRIME  CR . ." primes"  LOOP
   CR ." Done!"
;

 

  • Like 3
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...