
Benchmarking Languages


Tursi


6 hours ago, Reciprocating Bill said:

C'mon Rich. On real iron with a stop watch:

 

10 for I = 1 to 1000
20 call clear
30 next I

 

RXB 2020  22.6" (identical to XB) 

RXB 2022  51.9"

 

Of course, the accuracy is to within ~.25 seconds (namely, my reaction time). But you don't need an atomic clock, or 100,000 iterations, to see that something changed between RXB 2020 and RXB 2022. 
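As a rough check of that reasoning, here is a small Python sketch using only the figures quoted above (1000 passes, 22.6 s, 51.9 s, about 0.25 s of reaction time). The function names are made up for illustration, and applying the worst-case error to both readings is an assumption, not something stated in the post. Note that the per-pass figure includes the FOR/NEXT overhead, not just CALL CLEAR.

```python
def per_pass_ms(total_s, iterations):
    """Average time per loop pass in milliseconds (loop overhead included)."""
    return total_s / iterations * 1000.0


def ratio_bounds(slow_s, fast_s, err_s):
    """Smallest and largest slow/fast ratio if each reading is off by up to err_s."""
    low = (slow_s - err_s) / (fast_s + err_s)
    high = (slow_s + err_s) / (fast_s - err_s)
    return low, high


ITERATIONS = 1000        # from the BASIC loop above
RXB_2020_S = 22.6        # stopwatch reading quoted above
RXB_2022_S = 51.9        # stopwatch reading quoted above
REACTION_ERR_S = 0.25    # reaction-time estimate quoted above

low, high = ratio_bounds(RXB_2022_S, RXB_2020_S, REACTION_ERR_S)
print(f"RXB 2020: {per_pass_ms(RXB_2020_S, ITERATIONS):.1f} ms per pass")
print(f"RXB 2022: {per_pass_ms(RXB_2022_S, ITERATIONS):.1f} ms per pass")
print(f"slowdown is between {low:.2f}x and {high:.2f}x")
```

Even with the error stacked the worst way on both readings, the ratio stays well above 2, so the stopwatch is good enough for this particular comparison.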

A stopwatch is a terrible way to time something on an electronic device. (Go ahead and tell me I am wrong here.)

 

And here is my reply to Senior Falcon:

 

"You are correct and I was wrong.

Turns out that when working on CALL COLLIDE I accidentally reverted a section of assembly for CALL CLEAR and made it worse.

 

I have to update RXB 2022 to send out the correction, thanks for your help."


3 hours ago, Asmusr said:

The way to improve the performance of CALL CLEAR relative to XB would be to make it an unrolled loop. I think just 2 times unrolled would beat XB even though it has the advantage of running from 16-bit ROM, and 4 times unrolled would be 20% faster. But there are probably more important places where the performance could be improved by assembly routines. 

 

Good, valid point. This is the CALL CLEAR assembly I need to put back into RXB 2022B:

*********************************************************
* CALL CLEAR                                            *
*********************************************************
CLEAR  MOV  R11,R9    * save return address
       CLR  R0        * set to screen start
       LI   R1,>8000  * Screen offset + Space Character
       LI   R3,24     * ROW counter
       BL   @VWADD    * write out VDP write address
       LI   R8,VDPWD  * Register faster than address
ROWLP  MOVB R1,*R8    * write next VDP byte from R1 1
       MOVB R1,*R8    * write next VDP byte from R1 2
       MOVB R1,*R8    * write next VDP byte from R1 3
       MOVB R1,*R8    * write next VDP byte from R1 4
       MOVB R1,*R8    * write next VDP byte from R1 5
       MOVB R1,*R8    * write next VDP byte from R1 6
       MOVB R1,*R8    * write next VDP byte from R1 7
       MOVB R1,*R8    * write next VDP byte from R1 8
       MOVB R1,*R8    * write next VDP byte from R1 9
       MOVB R1,*R8    * write next VDP byte from R1 10
       MOVB R1,*R8    * write next VDP byte from R1 11
       MOVB R1,*R8    * write next VDP byte from R1 12 
       MOVB R1,*R8    * write next VDP byte from R1 13
       MOVB R1,*R8    * write next VDP byte from R1 14
       MOVB R1,*R8    * write next VDP byte from R1 15 
       MOVB R1,*R8    * write next VDP byte from R1 16
       MOVB R1,*R8    * write next VDP byte from R1 17 
       MOVB R1,*R8    * write next VDP byte from R1 18
       MOVB R1,*R8    * write next VDP byte from R1 19
       MOVB R1,*R8    * write next VDP byte from R1 20
       MOVB R1,*R8    * write next VDP byte from R1 21
       MOVB R1,*R8    * write next VDP byte from R1 22
       MOVB R1,*R8    * write next VDP byte from R1 23
       MOVB R1,*R8    * write next VDP byte from R1 24
       MOVB R1,*R8    * write next VDP byte from R1 25
       MOVB R1,*R8    * write next VDP byte from R1 26
       MOVB R1,*R8    * write next VDP byte from R1 27
       MOVB R1,*R8    * write next VDP byte from R1 28
       MOVB R1,*R8    * write next VDP byte from R1 29
       MOVB R1,*R8    * write next VDP byte from R1 30
       MOVB R1,*R8    * write next VDP byte from R1 31
       MOVB R1,*R8    * write next VDP byte from R1 32
       DEC  R3        * ROW-1
       JNE  ROWLP     * Not 0 so loop again
       B    *R9       * return to caller
***********************************************************
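To see what the unrolling buys, here is a small Python sketch that only counts executed instructions for different unroll factors. It assumes the structure of the listing above (one MOVB per screen byte, plus a DEC and a JNE per pass through the loop) and ignores setup code, cycle counts and VDP timing, so it is a rough model and not a timing prediction. The function name is hypothetical.

```python
SCREEN_BYTES = 24 * 32   # 24 rows of 32 columns, as in the listing
LOOP_CONTROL = 2         # DEC + JNE executed once per pass


def instructions_executed(unroll):
    """Instructions executed to fill the screen with `unroll` MOVBs per pass."""
    if SCREEN_BYTES % unroll:
        raise ValueError("unroll factor must divide the screen size")
    passes = SCREEN_BYTES // unroll
    return SCREEN_BYTES + passes * LOOP_CONTROL


for unroll in (1, 2, 4, 32):
    print(unroll, instructions_executed(unroll))
```

With 32 writes per pass, as in the routine above, the loop control is executed only 24 times, so almost all of the work is the MOVB instructions themselves.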

 


26 minutes ago, Reciprocating Bill said:

A stopwatch is plenty accurate to show that RXB 2020 (like Extended Basic) runs the Call Clear loop more than twice as quickly as RXB 2022. 

 

The question remains, why? 

Go back a few posts: I explain why, admit that I made a mistake, and have a fix.

 

My solution is to use the TIPI clock on REAL IRON (TI99/4A) with my program.

 

100 CALL CLEAR
110 OPEN #1:"CLOCK"
120 INPUT #1:A$,B$,C$
130 FOR C=1 TO 10000
140 PRINT
150 NEXT C
160 INPUT #1:D$,E$,F$
170 PRINT A$,D$:B$,E$,C$,F$
180 END

Change line 140 to CALL CLEAR

You know, changing to ROM 3 is the slow part.

  ST   >64,@>6004   * changes to ROM3
  XML  >81          * Assembly of program in ROM3 above
  ST   >60,@>6000   * changes to ROM0
  BR   LNKRT2       * return to XB

This is slower than

  ALL  32+96        * SPACE + OFFSET
  BR   LNKRT2

I wish there was a faster way to switch to ROM 3 than using GPL to do it.

XB normally will switch ROM1 and ROM2 so I guess I will have to fix that....


On 4/22/2022 at 1:31 PM, Vorticon said:

If anyone is interested here's the original Byte article regarding the benchmarking of the common programming languages of the time using the sieve of Eratosthenes and includes interesting comparative performance tables. I used it to benchmark the Pascal language on the ZX-81 recently (listing below).

 

[Image: Pascal listing of the sieve of Eratosthenes]

[Attachment: Erastosthenes sieve benchmark.pdf]

I converted it to TF as follows:

2047 constant size
create flags size allot
variable prime
variable k
variable _count

: sieve
  prime 0!  k 0!  
  cr ." 10 iterations "
  10 0 do
    _count 0!
    flags size $ff fill
    
    size 0 do
      flags i + c@ if
        i 3 + i + prime !
        i prime @ + k !
        begin 
          k @ size <= while
          k @ flags + 0! 
          k @ prime @ + k !
        repeat
        1 _count +!
      then
    loop
    i .
  loop
  cr _count @ . ." primes" cr
;

Note: 0! is peculiar to TF. It just zeros a cell at an address. It can be added to other Forths as follows:
 

: 0! ( addr -- ) 0 ! ;

However the code produces an incorrect result of 563 primes. I've checked the code against the Pascal version but can't see where I went wrong. Can anyone comment?

 

I ran another sieve program from the Rosetta website in TF and it produces a result (309 primes) in ~2.5 seconds.

 


: prime? ( n -- ? ) here + c@ 0= ;
: composite! ( n -- ) here + 1 swap c! ;
0 value pcount

: sieve ( n -- )
  here over 0 fill
  2
  begin
    2dup dup * >
  while
    dup prime? if
      2dup dup * do
        i composite!
      dup +loop
    then
    1+
  repeat
  drop
  0 to pcount
  2 do i prime? if 1 +to pcount then loop 
  cr pcount . ." primes found" cr ;
  
  2047 sieve

 

 


21 minutes ago, Reciprocating Bill said:

"Size" in the BYTE Sieve is 8192. That yields 1899 primes. 

I'm not sure what you're implying here, sorry. :dunce:

I did find a little bug, fixed below:

2047 constant size
create flags size allot
variable prime
variable k
variable _count

: sieve
  prime 0!  k 0!  
  cr ." 10 iterations "
  10 0 do
    _count 0!
    flags size $ff fill
    
    size 0 do
      flags i + c@ if
        i 3 + i + prime !
        i prime @ + k !
        begin 
          k @ size <= while
          0 k @ flags + c!   \ don't use 0! here: the array is an array of bytes, not cells.
          k @ prime @ + k !
        repeat
        1 _count +!
      then
    loop
    i .
  loop
  cr _count @ . ." primes" cr
;

But the results are the same.


3 hours ago, Willsy said:

However the code produces an incorrect result of 563 primes. I've checked the code against the Pascal version but can't see where I went wrong. Can anyone comment?

 

That answer is correct. Because it really doesn’t make much sense to include even numbers (known to be composite, i.e., not prime), the program under discussion uses an array of flags representing all odd numbers from 3 to size*2+1. For size = 2047, the range of numbers tested is 3 – 4095. There are, indeed, 563 primes in that range.
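For anyone who wants to check this away from the TI, here is a minimal Python sketch of the same odd-only sieve. It is a translation of the Forth listing as I read it, not the BYTE code itself: flag i stands for the odd number 2*i+3, and the inner loop stops at k < size so it never writes past the array. The names byte_sieve and odd_primes_up_to are made up for this example, and the second function is only an independent cross-check by trial division.

```python
import math


def byte_sieve(size):
    """Count primes among the odd numbers 3 .. 2*size+1 (flag i stands for 2*i+3)."""
    flags = [True] * size
    count = 0
    for i in range(size):
        if flags[i]:
            prime = i + i + 3
            k = i + prime            # index of 3*prime, the first odd multiple
            while k < size:          # strict bound: stay inside the flags array
                flags[k] = False
                k += prime
            count += 1
    return count


def odd_primes_up_to(limit):
    """Independent check: count odd primes <= limit by trial division."""
    total = 0
    for n in range(3, limit + 1, 2):
        if all(n % d for d in range(3, math.isqrt(n) + 1, 2)):
            total += 1
    return total


print(byte_sieve(2047))                  # 563
print(odd_primes_up_to(2 * 2047 + 1))    # 563
```

Both counts leave out 2, which is why the result is one less than the usual count of primes below 4096.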

 

That said, I would make one change to your program to avoid overflow by 1 of your flags array:


2047 constant size
create flags size allot
variable prime
variable k
variable _count

: sieve
  prime 0!  k 0!  
  cr ." 10 iterations "
  10 0 do
    _count 0!
    flags size $ff fill
    size 0 do
      flags i + c@ if
        i 3 + i + prime !
        i prime @ + k !
        begin 
          k @ size < while    \ LES change
          0 k @ flags + c!
          k @ prime @ + k !
        repeat
        1 _count +!
      then
    loop
    i .
  loop
  cr _count @ . ." primes" cr
;

 

 

...lee


4 hours ago, Willsy said:

 

Note: 0! is peculiar to TF. It just zeros a cell at an address. It can be added to other Forths as follows:

 

HsForth (Jim Kalahan, circa 1985..1995), which I still use, has 0! as well and also has 0C! .

 

Tom Zimmer's FPC and a few others started calling this OFF, and also had the complementary ON, back in the 90s.

That has caught on and GForth uses ON/OFF now so I went with that in Camel99.

 


1 hour ago, Lee Stewart said:

 

That answer is correct. Because it really doesn’t make much sense to include even numbers (known to be composite, i.e., not prime), the program under discussion uses an array of flags representing all odd numbers from 3 to size*2+1. For size = 2047, the range of numbers tested is 3 – 4095. There are, indeed, 563 primes in that range.

Ah! Yes! That makes more sense. Thanks for the clarification, Lee.

 

Well, okay then - for 10 iterations, ~37 secs in TF (stopwatch). However, that's in Classic99, running in Linux, under Wine. YMMV!


Just for fun I took a 2nd run at Mark's translation of the program. (was Pascal the reference?) 

I would have done it the same way using variables first to get it correct to original. 

 

The second phase is to make it idiomatic to the stack machine.

 

Below I removed the K variable in the inner loop and used the data stack instead.

Then I noticed I didn't need the count variable either and used the data stack for that.  

 

I was surprised at the difference in the speed and it saved 26 bytes of program space. 

This demonstrates the old chestnut from Forth gurus that translations of an algebraic-language program to Forth are never optimal. 

 

P.S.  I am not trying to show-off.  I have only recently started to "get this" :dunce: so I am trying to share what I have learned. 



INCLUDE DSK1.ELAPSE

HEX FF CONSTANT $FF

DECIMAL
2047 CONSTANT SIZE
CREATE FLAGS SIZE ALLOT
VARIABLE PRIME
VARIABLE K
VARIABLE _COUNT

: SIEVE
  PRIME OFF  K OFF
  CR ." 10 ITERATIONS "
  10 0 DO
    _COUNT OFF
    FLAGS SIZE $FF FILL
    SIZE 0 DO
      FLAGS I + C@ IF
        I 3 + I + PRIME !
        I PRIME @ + K !
        BEGIN
          K @ SIZE < WHILE    \ LES CHANGE
          0 K @ FLAGS + C!
          K @ PRIME @ + K !
        REPEAT
        1 _COUNT +!
      THEN
    LOOP
    I .
  LOOP
  CR _COUNT @ . ." PRIMES" CR
;

: SIEVE-NO-K
  PRIME OFF ( K OFF)
  CR ." 10 ITERATIONS "
  0                            \ dummy count, dropped on the first pass
  10 0 DO
    DROP 0  ( _COUNT OFF)      \ put a fresh counter on the data stack
    FLAGS SIZE $FF FILL
    SIZE 0 DO
      FLAGS I + C@
      IF
        I 3 + I + PRIME !
        I PRIME @ +   (  K !)  \ use data stack for storage
        BEGIN
           DUP SIZE < WHILE    \ LES CHANGE + remove K variable
           0 OVER FLAGS + C!
           PRIME @ +   ( K !)  \ use data stack for storage
        REPEAT
        DROP                  \ extra instruction to clean up
        1+     ( 1 _COUNT +!) \ increment counter on data stack
      THEN
    LOOP
    I .
  LOOP
  CR ( _COUNT @ )  . ." PRIMES" CR
;

 

 

 

 



1 minute ago, apersson850 said:

The article in Byte, linked to earlier in this thread, does have the benchmark listed in different languages. Some Forth version included.

Yes. I got the impression that Mark did his own translation of the algorithm because his is different from the article's Forth version.

 


I took the Pascal code that @Vorticon posted a screen-shot of. See my post earlier today ^^^^.

 

Okay, taking your second code, Brian, sieve-no-k, TF seems to take about 27-ish seconds - a whopping 27% faster. But again, that's running on Classic99 in Linux, in Wine - so there's no guarantee of the accuracy of the instruction timing. Could you take it for a spin on your machine? You'll need a stopwatch and a good eye ;-) 


3 minutes ago, Willsy said:

I took the Pascal code that @Vorticon posted a screen-shot of. See my post earlier today ^^^^.

 

Okay, taking your second code, Brian, sieve-no-k, TF seems to take about 27-ish seconds - a whopping 27% faster. But again, that's running on Classic99 in Linux, in Wine - so there's no guarantee of the accuracy of the instruction timing. Could you take it for a spin on your machine? You'll need a stopwatch and a good eye ;-) 

Sure. No problem.

(I used my new build of DTC Forth because I don't trust it yet, so the more code I run the more bugs I find.) :)

 


On my machine it ran in 41 seconds measured by eye. (Windows 10, Dell "optiplex" computer) (Linux/Wine is pretty amazing!)

 

And for comparison my unreleased Camel99 ITC version took 40.53 seconds and looked the same on the clock.


Just now, Willsy said:

Looks like SIEVE-NO-K is a fraction faster when replacing VARIABLEs with VALUEs. That would make sense... There would be fewer passes through the inner interpreter.

Yes. You use a nicely optimized VALUE method to do that. 


1 minute ago, TheBF said:

On my machine it ran in 41 seconds measured by eye. (Windows 10, Dell "optiplex" computer) (Linux/Wine is pretty amazing!)

 

And for comparison my unreleased Camel99 ITC version took 40.53 seconds and looked the same on the clock.

Ah, that's interesting - that would imply Classic99 is running much too fast under Wine!

 

It's cool that TF (ITC) is holding its own with Camel ITC. It would be interesting to port to other languages (BASIC etc.) and re-run. Unfortunately, in those languages the use of floating point to represent integers would be the big killer.


Oh yes, TF holds its own just fine.

Your number conversion/print to screen is over 2X faster. :) 

 

I get some advantage from TOS in register that kind of/sort of compensates for all the primitives you have in fast ram. 

In the newest version here I squeaked ! and DROP into the scratch-pad ram because ! is slower with TOS in a register.

It's pretty much all I can do now without going to an optimizer.

My simple one-line optimizer does get me about 40% improvement by removing NEXT between code words.

 

You could probably port this to TF without too much inconvenience. (?)

 


\ INLINE5B.FTH Compiles inline code as headless word in HEAP  Dec 2, 2020 B Fox
\ *VERSION 5* CAN OPTIMIZE VARIABLES, CONSTANTS AND LITERAL NUMBERS*
\ Improved constant optimization
\ APR 15 2021: Made CODE, one definition
\ This is a very narrow focus, static JIT (just in time compiler)
\ Dec 2021: Simplified CODE, with BOUNDS
\ Apr 2022  Added primitives for DROP, @ and ! that are in hispeed RAM.
\ Problem:
\  The extra overhead to compile an ITC word as inline machine code
\  is 4 bytes for the ENTRY and 4 bytes to correct the IP.
\  This meant it was OK to make new code words that combined other code words.
\  INLINE[ ] in this version uses HEAP memory to compile a headless version
\  of the new code word. That NEW XT is compiled into your Forth definition.
\

\ **not portable Forth code**  Uses TMS9900/CAMEL99 CARNAL Knowledge

\ NEEDS .S   FROM DSK1.TOOLS
\ NEEDS MOV, FROM DSK1.ASM9900
NEEDS CASE FROM DSK1.CASE

HERE
HEX
\ *** changed for kernel V2.69 ***
\ need copies of words that do not end in NEXT, in the Camel99 kernel
CODE DUP    0646 , C584 ,  NEXT, ENDCODE
CODE DROP   C136 ,         NEXT, ENDCODE
CODE !      C536 , C136 ,  NEXT, ENDCODE
CODE @      C114 ,         NEXT, ENDCODE
CODE C@     D114 , 0984 ,  NEXT, ENDCODE
CODE +      A136 ,         NEXT, ENDCODE

\ CFA of a code word contains the address of the next cell
: NOTCODE? ( xt -- ?)  DUP @ 2- - ;

\ Heap management words
: HEAP    ( -- addr) H @ ;
: HALLOT  ( n -- )   H +! ;
: H,   ( n -- )    HEAP ! 2 HALLOT ;

045A CONSTANT 'NEXT'  \ 9900 CODE for B *R10   Camel99 Forth's NEXT code

: CODE,  ( xt --)  \ Read code word from kernel, compile into target memory
           >BODY 80 CELLS  ( -- addr len)
           BOUNDS ( -- IPend IPstart)
           BEGIN
              DUP @ 'NEXT' <>  \ the instruction is not 'NEXT'
           WHILE
             DUP @  ( -- IP instruction)
             H,   \ compile instruction
             CELL+  \ advance IP
             2DUP < ABORT" End of code not found"
           REPEAT
           2DROP
;

HEX
: DUP,    ['] DUP  CODE, ;

\ LIT,              TOS PUSH,   TOS n LI,
: LIT,      ( n -- ) DUP,     0204 H,  ( n) H,  ;

\ new interpreter loop for inlining *future* make this the Forth compiler
: INLINE[ ( -- addr)  \ Returns address where code has been copied
           HEAP ( -- XT)  \ HEAP will be our new execution token (XT)
           DUP 2+ H,    \ create the ITC header for CODE word
           BEGIN   BL WORD CHAR+ C@  [CHAR] ] <>  WHILE
              HERE FIND
              IF ( *it's a Forth word* )
                 DUP NOTCODE?
                 IF DUP
                    @  \ get the "executor" code routine address
                    CASE
                      [']  DOVAR    OF >BODY LIT,    ENDOF
                      [']  DOCON    OF  EXECUTE LIT, ENDOF
                      [']  DOUSER   OF  EXECUTE LIT, ENDOF
                      CR ." *Can't optimize type"  TRUE ?ERR
                    ENDCASE

                 ELSE  \ it's a CODE primitive
                       CODE,  \ compile kernel code
                 THEN

             ELSE ( maybe it's a number)
                 COUNT NUMBER?  ?ERR
                 ( n ) LIT,   \ compile n as a literal
             THEN
           REPEAT
           045A H,    \ compile NEXT at end of HEAP code
           COMPILE,   \ compile HEAP XT into current colon definition
;  IMMEDIATE

HERE SWAP - SPACE DECIMAL . .( bytes) HEX CR

 

 

 

 


28 minutes ago, TheBF said:

Oh yes, TF holds its own just fine.

Your number conversion/print to screen is over 2X faster. :) 

 

I get some advantage from TOS in register that kind of/sort of compensates for all the primitives you have in fast ram. 

In the newest version here I squeaked ! and DROP into the scratch-pad ram because ! is slower with TOS in a register.

It's pretty much all I can do now without going to an optimizer.

My simple one-line optimizer does get me about 40% improvement by removing NEXT between code words.

 

You could probably port this to TF without too much inconvenience. (?)

Oh! Nice - I already have an inliner somewhere I think... Let me see if I can find it - not everything got transferred from my old Windows box to my Linux box...

 

Yep! Found it - file creation date is 19th May 2015 - Oh boy! Time flies...

 

variable ilExit

: asmHeader ( "name" -- )
  \ creates a dictionary header for a primitive
  header        \ create dictionary entry
  here 2+ ,     \ lay down cfa (points 2 bytes ahead)
;

: appendCode ( cfa -- )
  @        \ addr
  begin 
    dup @ $045C ( NEXT opcode) <> while
      dup @ , 2+
  repeat
  drop
;

: ;inLine ( -- ) ;

: inLine: ( -- )
  asmHeader  ilExit 0!
  begin
    bl word find if
      dup ['] ;inLine <> if appendCode else true ilExit ! then
    else
      true abort" Unknown word in definition"
    then
  >in @ span @ >= ilExit @ or until
  drop 
  $045C ,  \ append NEXT op-code
;

inLine: dup+drop dup + drop ;inLine

: test 0 0 do dup+drop loop ; \ 11.5 seconds
: test 0 0 do dup + drop loop ; \ 15 seconds

~23% speed improvement

The file is called 'primitives inliner.txt', so I guess I was thinking that 'primitives' (i.e. intrinsic TF machine code words) could be chained end-to-end to reduce inner interpreter machinations. It really needs updating to ensure that the words selected for in-lining are indeed primitives.
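The chaining idea can be sketched in a few lines of Python. This is only a toy model: the opcode values for dup, + and drop are invented placeholders, and only the $045C value for NEXT is taken from the listing above. Each primitive's body is copied up to (but not including) its NEXT, and a single NEXT is appended at the end.

```python
NEXT = 0x045C   # value used for NEXT in the listing above

# Hypothetical primitive bodies: the opcodes are placeholders, each body ends in NEXT.
PRIMITIVES = {
    "dup":  [0x1111, 0x2222, NEXT],
    "+":    [0x3333, NEXT],
    "drop": [0x4444, NEXT],
}


def append_code(body, out):
    """Copy cells from one primitive until its NEXT is reached (like appendCode)."""
    for cell in body:
        if cell == NEXT:
            break
        out.append(cell)


def inline(names):
    """Chain the named primitives into one body with a single trailing NEXT."""
    out = []
    for name in names:
        if name not in PRIMITIVES:
            raise KeyError(f"{name} is not a known primitive")
        append_code(PRIMITIVES[name], out)
    out.append(NEXT)
    return out


combined = inline(["dup", "+", "drop"])
print([hex(cell) for cell in combined])
print("NEXT executed per call:", combined.count(NEXT), "instead of 3")
```

One thing the model makes easy to see: scanning for the NEXT value stops at the first cell that happens to equal it, so a primitive containing that value as an operand would be cut short. Refusing unknown names, as the sketch does, is one simple way to make sure only primitives get inlined.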


I found that I had already, almost 40 years ago, written this program and done the benchmark. It was before my clock card was ready, so timing had to be done with an alarm clock.

It turns out that the TI implementation of UCSD Pascal runs at a speed that's virtually identical to what the Apple ][ did back then.


Also, here's an experimental peephole optimiser that I was working on (file date: 15 March, 2017). It was inspired by an article I read in Forth Dimensions.


0 value _wa
0 value _wl
0 value _im
0 value _cfa
0 value _ucc
0 value _num
0 value cfa1
0 value cfa2
0 value cfa3
0 value bytesSaved

: +opt ( -- )
    0 to bytesSaved ;

: -opt ( -- )
    cr bytesSaved . ." total bytes saved." cr ;

: .saved ( n -- )
    dup +to bytesSaved
    . ." bytes saved" cr ;

: re-compile ( xt offset -- )
    dup h @ + h ! swap ,
    .saved ;
    
: <1+ ( a b -- a+1 b ) swap 1+ swap ;
: <1- ( a b -- a-1 b ) swap 1- swap ;
: <2+ ( a b -- a+2 b ) swap 2+ swap ;
: <2- ( a b -- a-2 b ) swap 2- swap ;
: <2* ( a b -- a*2 b ) swap 2* swap ;

: n+ ( n -- n+x ) + ;
: n- ( n -- n-x ) - ;
: n* ( n -- n*x ) * ;
: n= ( n -- flag ) = ;
: n< ( n -- flag ) < ;
: n> ( n -- flag ) > ;
: n<= ( n -- flag ) <= ;
: n>= ( n -- flag ) >= ;

: optimise ( -- )
    cfa2 to cfa3   cfa1 to cfa2   _cfa to cfa1 
    
    cfa3 ['] swap =  cfa1 ['] swap = and if
        cfa2 case 
            ( swap 1+ swap) ['] 1+ of true ['] <1+ -6 re-compile endof
            ( swap 1- swap) ['] 1- of true ['] <1- -6 re-compile endof
            ( swap 2+ swap) ['] 2+ of true ['] <2+ -6 re-compile endof
            ( swap 2- swap) ['] 2- of true ['] <2- -6 re-compile endof
            ( swap 2* swap) ['] 2* of true ['] <2* -6 re-compile endof
            dup of false endof
        endcase
        
        0= if 
            cfa2 ['] lit = if
                cfa1 case
                    ( lit +)  ['] +  of true ['] n+  -6 re-compile endof
                    ( lit -)  ['] -  of true ['] n-  -6 re-compile endof
                    ( lit *)  ['] *  of true ['] n*  -6 re-compile endof
                    ( lit =)  ['] =  of true ['] n=  -6 re-compile endof
                    ( lit <)  ['] <  of true ['] n<  -6 re-compile endof
                    ( lit >)  ['] >  of true ['] n>  -6 re-compile endof
                    ( lit <=) ['] <= of true ['] n<= -6 re-compile endof
                    ( lit >=) ['] >= of true ['] n>= -6 re-compile endof
                    dup of false endof
                endcase
            then
        then
    then ;
    
: -int ( -- ) \ quit the optimising interpreter
    $f4 screen   abort ;
    
: int ( -- ) \ start the optimising interpreter
    $c0 screen \ different colour for optimising interpreter
    begin
        tib @ 80 expect
        begin
            bl word to _wl to _wa
            _wl while
                _wl 0> if
                    _wa _wl find to _im to _cfa
                    _im 0<> if
                        state @ if
                            _im 0> if
                                _cfa execute
                            else
                                _cfa ,
                                optimise
                            then 
                        else
                            _cfa execute
                        then
                    else
                        \ no cfa. maybe it's a number
                        _wa _wl number to _ucc to _num
                        _ucc 0= if
                            state @ if
                                $a052 @ 0> if
                                    \ it's a double int
                                    compile lit _num ,
                                    compile lit ,
                                else
                                    $a068 @ 0= if \ coding?
                                        compile lit _num ,
                                    else
                                        _num ,
                                    then 
                                then 
                            else
                                _num
                            then
                        else
                            cr ." Error: " _wa _wl type space ." not found. " 
                            s0 sp! 
                        then
                    then
                then
        repeat
        stk? ." ok:" depth . cr
    again ;

Most of the code is a re-write of the compiler which calls the optimisation routines!! But you can clearly see the sequences that it can identify, and how it replaces the code with optimised code - the optimised code would be replaced with assembler code later on - I was just working on finding combinations of phrases that could be optimised. There are probably some really great optimisations that occur often (e.g. over +, or dup < etc.) that would make a significant difference.
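The pattern-matching part can be modelled on its own, away from the compiler. The Python sketch below is an assumption-laden toy: it works on a list of word names rather than compiled CFAs, and it only covers the swap ... swap sequences from the listing. After each word is "compiled" it looks at the last three words and, if they match a rule, replaces them with the single combined word.

```python
# Three-word phrases and the single word that replaces each one.
RULES = {
    ("swap", "1+", "swap"): "<1+",
    ("swap", "1-", "swap"): "<1-",
    ("swap", "2+", "swap"): "<2+",
    ("swap", "2-", "swap"): "<2-",
    ("swap", "2*", "swap"): "<2*",
}


def peephole(words):
    """Compile `words` one at a time, collapsing any phrase listed in RULES."""
    out = []
    for word in words:
        out.append(word)
        window = tuple(out[-3:])
        if window in RULES:
            del out[-3:]                  # back up over the three cells
            out.append(RULES[window])     # lay down the combined word
    return out


source = ["dup", "swap", "1+", "swap", "drop"]
optimised = peephole(source)
print(optimised)
print("cells saved:", len(source) - len(optimised))
```

Each match turns three compiled cells into one, so with 2-byte cells the net saving is 4 bytes per match. New phrases such as over + can be tried out just by adding entries to the table.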
