Jump to content
IGNORED

Camel99 Forth Information goes here


TheBF

Recommended Posts

6 hours ago, TheBF said:

TI-99 Forth implementation idea for discussion

 

Consider this: 

  1. Screen output on TI-99 goes to VDP RAM
  2. File output on TI-99 goes to VDP RAM 
  3. File input on TI-99 comes from VDP RAM

A Forth system could use VDP RAM as method to re-direct output to screen or to file simply by changing the VDP address where the output is written. 

 

If the terminal input buffer used VDP RAM, then input could also be redirected from a file by changing the VDP buffer address to a PAB buffer.

 

Penny for your thoughts.

 

(A Canadian penny, which actually doesn't exist anymore so you might not get paid) :)

 

 

 

 

Sure, like a scratchpad Temp work file I guess of sorts 

Link to comment
Share on other sites

On 10/29/2022 at 11:03 AM, TheBF said:

TI-99 Forth implementation idea for discussion

Consider this: 

  1. Screen output on TI-99 goes to VDP RAM
  2. File output on TI-99 goes to VDP RAM 
  3. File input on TI-99 comes from VDP RAM

A Forth system could use VDP RAM as method to re-direct output to screen or to file simply by changing the VDP address where the output is written. 

 

If the terminal input buffer used VDP RAM, then input could also be redirected from a file by changing the VDP buffer address to a PAB buffer.

 

Penny for your thoughts.  (A Canadian penny, which actually doesn't exist anymore so you might not get paid) :)

 

I am used to thinking about blocks of 1024 characters. The 40-column screen will take almost all of one block (1024-960=64 short). The 80-column screen is easy (I think). The 40-column screen could be scrolled up and down 2 rows with no problem.

 

The screen could be the PAB buffer—even in 40-column mode (32 bytes of rollout area1 and 128 bytes of value stack2 will surely not be used while editing).

 

The above use of the screen area should be easy enough to manage for DV80 (or similar) I/O scenarios, as well.

________________

  1. The VDP rollout area is used primarily for temporary storage for GPL floating point transcendental functions, almost certainly not used while doing anything related to editing.
  2. The VDP value stack is used for most (all?) console (XML and GPL) floating point operations, but it can be moved elsewhere in VRAM by changing the value stack pointer, VSPTR (>836E).

...lee

Edited by Lee Stewart
CLARIFICATION
  • Like 3
  • Thanks 1
Link to comment
Share on other sites

I hadn't thought about how straightforward it is with blocks.

With files I will have to keep track of line endings and write to the file buffer and then write to file with every CR.

I thinks all doable but might get complicated.

It's very tempting though.

  • Like 3
Link to comment
Share on other sites

  • 2 weeks later...

A light has shone in my darkness 

 

I was fighting with MachForth to behave in a way I could understand when using IF ELSE THEN.

I decided to look at some of my other projects that needed native code for IF and that brought me to INLINE[ ]   

 

Turns out I had never finished a version that "TRIED" to compile IF/THEN or WHILE/REPEAT as native code. 

 

Armed with my new experience in MachForth it all suddenly made sense.  :) 

I realized that the way I did ?BRANCH in threaded Forth would work just as well if it used JMP instruction.  ( see:  IF,   in the code ) 

  • DEC the TOS register,
  • do a DROP to refill the TOS register. (*SP+ TOS MOV, ) 
  • DROP has no effect on the CARRY flag, so you can JNC to the branch address if the TOS went below zero.

 

My INLINE[ ] system uses a control flow stack, which needed a CS>SWAP word to replace the SWAPs in my normal Forth ELSE and WHILE statements.  That was easy.

 

After that I added the new words to the JIT CASE statement and things started coming together.

The resulting code is not as fast as hand coded Assembler, but it is about 3 times faster than threaded Forth as you can see in the screen capture. 

 

Future:

  1. There is no attempt to do PUSH/POP optimization
  2. It should be straightforward to optimize variables with symbolic addressing as I have in MachForth
  3. Make the JIT able to consume multiple lines or source code
  4. Re-write MachForth using these ideas.  

I am pretty pleased with this now. 

I want to try it on the Sieve benchmark. :) 

 

Here is the new code which is invoked with JIT[  ] .

This JIT compiler uses 1392 bytes of dictionary space. 

 

Spoiler
\ jit.fth Compiles inline code as headless words in HEAP  Nov 10 2022

\ Problem:
\ ITC Forth spends 50% of it's time running 3 instructions call NEXT.
\ This system compiles primitives from the kernel as super-instructions
\ and compiles the execution for the super instructions in a Forth word.

NEEDS .S     FROM DSK1.TOOLS
NEEDS CASE   FROM DSK1.CASE
NEEDS LIFO:  FROM DSK1.STACKS
NEEDS ELAPSE FROM DSK1.ELAPSE
NEEDS MARKER FROM DSK1.MARKER

MARKER /INLINE

HERE

8 LIFO: CS     \ small CONTROL FLOW STACK for loops and branching
: >CS     ( n -- ) CS PUSH ;
: CS>     ( -- n ) CS POP ;
: CS>SWAP ( -- )   CS> CS> SWAP >CS >CS ;

HEX
\ *** changed for kernel V2.69 ***
\ Words in scratchpad RAM end in a JMP instruction, not NEXT
\ Might change this, but for now just make some conventional versions.
CODE DUP    0646 , C584 ,  NEXT, ENDCODE
CODE DROP   C136 ,         NEXT, ENDCODE
CODE !      C536 , C136 ,  NEXT, ENDCODE
CODE @      C114 ,         NEXT, ENDCODE
CODE C@     D114 , 0984 ,  NEXT, ENDCODE
CODE +      A136 ,         NEXT, ENDCODE

\ Heap management
: THERE  ( -- addr) H @ ;  \ returns end of Target memory in HEAP
: HALLOT ( n -- )   H +! ; \ Allocate n bytes of target memory.
: T,     ( n -- )   THERE ! 2 HALLOT ;  \ "target compile" n into heap
: NEW-HEAP ( -- ) 2000 2000 0 FILL   2000 H ! ; \ reset HEAP

045A CONSTANT 'NEXT'  \ 9900 CODE for B *R10   Camel99 Forth's NEXT code

: CODE,  ( xt --)  \ Read code word from kernel, compile into target memory
           >BODY
           DUP 80 CELLS +   \ 128 bytes is max size we will try to compile
           SWAP   ( -- IPend IPstart)
           BEGIN
              DUP @ 'NEXT' <>  \ the instruction is not 'NEXT'
           WHILE
             DUP @  ( -- IP instruction)
             T,     \ compile instruction
             CELL+  \ advance IP
             2DUP < ABORT" End of code not found"
           REPEAT
           2DROP
;
\ now we can steal code word from the kernel and compile it to target memory
: DUP,   ['] DUP  CODE, ;
: DROP,  ['] DROP CODE, ;

: LIT,   ( n -- ) DUP,  0204 T, ( n) T, ;  \ DUP TOS and LI R4,n

: BEGIN,    THERE >CS ;  \ push location onto control stack

\ <DO> is CODE preamble to setup return stack.
: DO,  ( -- there) ['] <DO> CODE,  BEGIN, ;

\ store a byte offset in odd byte of addr.
\ Addr is the location of Jump instruction
: RESOLVE ( addr offset --) 2- 2/ SWAP 1+ C! ;

\ compute offset from addr addr' & complete the jump instruction
: <BACK   ( addr addr' -- ) TUCK -  RESOLVE ;

: ?BYTE ( c -- c)  DUP FF00 AND ABORT" Jump out of range" ;

\ compile misc. jump instructions with offset.
: JMP,  ( c --) ?BYTE 1000 + T, ;
: JNO,  ( c --) ?BYTE 1900 + T, ;
: JEQ,  ( c --) ?BYTE 1300 + T, ;
: JNE,  ( c --) ?BYTE 1600 + T, ;
: JOC,  ( c --) ?BYTE 1800 + T, ;
: JNC,  ( c --) ?BYTE 1700 + T, ;

: 1-,    ( n -- n') 604 T, ; \ TOS DEC,

: LOOP,
          0597 T,             \ *RP INC,
          CS> THERE  0 JNO, <BACK   \ compute offset, compile into JNO
          ['] UNLOOP CODE,    \ collapse stack frame
;

: +LOOP,
          A5CA T,  \ TOS *RP ADD,
          DROP,    \ don't need TOS value anymore
          LOOP,    \ compile loop code
;

: AGAIN,   CS> THERE 0 JMP, <BACK ;

: UNTIL,
          1-,
          DROP,
          CS> THERE 0 JNC, <BACK ;

: IF,    ( n -- )
           1-,       \ If tos=0, DEC will cause a carry
           DROP,
           THERE >CS 0 JNC, ;

: THEN,    CS> THERE OVER - RESOLVE ;
: ELSE,    THERE >CS  0 JMP,  CS>SWAP THEN, ;

: WHILE,   ( n -- ) IF, CS>SWAP ;
: REPEAT,   AGAIN, THEN, ;

\ CFA of a code word contains the address of the next cell
: NOTCODE? ( XT -- ?)  DUP @ 2- - ;

: OPT-FORTH ( cfa -- )
       ['] DOCOL @ OVER @ =  \ a colon definition?
       IF ( -- cfa)
          CASE
             ['] DO     OF DO,      ENDOF
             ['] LOOP   OF LOOP,    ENDOF
             ['] +LOOP  OF +LOOP,   ENDOF
             ['] BEGIN  OF BEGIN,   ENDOF
             ['] UNTIL  OF UNTIL,   ENDOF
             ['] AGAIN  OF AGAIN,   ENDOF
             ['] IF     OF IF,      ENDOF
             ['] ELSE   OF ELSE,    ENDOF
             ['] THEN   OF THEN,    ENDOF
             ['] WHILE  OF WHILE,   ENDOF
             ['] REPEAT OF REPEAT,  ENDOF

             TRUE ABORT" Can't optimize word"

          ENDCASE
          DROP
       ELSE \ Other type of Forth word
          DUP @   \ get the "executor" code routine address
          CASE ( data words )
             ['] DOVAR    OF >BODY LIT,    ENDOF
             ['] DOCON    OF  EXECUTE LIT, ENDOF
             ['] DOUSER @ OF  EXECUTE LIT, ENDOF
             TRUE ABORT" Unknown data type"
         ENDCASE
         DROP
      THEN
;

\ new interpreter loop for inlining
: JIT[ ( -- addr)  \ Returns address where code has been copied
           THERE ( -- XT)    \ execution token (XT) for the NEW compiled code
           DUP CELL+ T,      \ create the ITC header for CODE word
           BEGIN
             BL WORD CHAR+ C@  [CHAR] ] <>
           WHILE
              HERE FIND
              IF ( *it's a word in the dictionary* )
                 DUP NOTCODE?
                 IF ( -- cfa )
                    DUP OPT-FORTH
                 ELSE      \ it's a CODE primitive
                    CODE,  \ compile code without NEXT
                 THEN
             ELSE ( maybe its a number)
                 COUNT NUMBER? ?ERR
                 ( n ) LIT,   \ compile n as a literal
             THEN
           REPEAT      \ CR .S  ( debug line)
           'NEXT' T,   \ compile NEXT at end of new code word
            ,          \ compile CODE word's XT into Forth definition
; IMMEDIATE

HERE SWAP - SPACE DECIMAL . .( bytes)

\ ==================================================
\ Test code
NEW-HEAP
HEX
: FOREVER     JIT[ 0  BEGIN 1+ AGAIN ] ;
: JITWHILE    JIT[ FFFF BEGIN DUP WHILE  1- REPEAT ] . ;

: OPTCOUNTDN  JIT[ FFFF BEGIN 1- DUP 0= UNTIL ] .  ;

: IFTEST     JIT[ IF -1 ELSE 0  THEN ] . ;

: FORTHFILL  ( char --)  C/SCR @ 0 DO DUP I VC! LOOP  DROP ;
: FORTHTEST   [CHAR] Z  BL  DO  I FORTHFILL LOOP ;

: OPT2FILL    C/SCR @ 0 DO  JIT[ DUP I VC! ] LOOP DROP  ;
: INNEROPT   [CHAR] Z  BL DO  I OPT2FILL LOOP ;

: OPTFILL    ( char --) JIT[ C/SCR @ 0 DO DUP I VC! LOOP DROP ] ;
: LOOPOPT    [CHAR] Z  BL  DO  I OPTFILL LOOP ;

: FULLOPT
 [CHAR] Z  JIT[ BL  DO  I C/SCR @ 0 DO DUP I VC! LOOP DROP LOOP ] ;

 

image.png.729798100fcede21edff5b07813d0dd3.png

  • Like 2
Link to comment
Share on other sites

While trying to make my optimizer swallow multiple lines, I was reading up about the Forth word REFILL which is not implemented in Camel Forth. 

 

REFILL - CORE EXT (forth-standard.org)

 

There are three versions of the darn thing. One for console, one for files and one BLOCKs.

I have always used a temp buffer for INCLUDEing files but I got the sense from REFILL that I could read files into TIB so I tried it.

So far it sems to work fine. Nested INCLUDEs work just like before.  To be honest I am not 100% why.  ??

 

I have not built REFILL yet, but I seem to have the pieces to do it. 

I called this FGET which reads a record into VDP RAM and now it transfers it to TIB.

: FGET  ( -- tib len) \ read file buffer->TIB
      TIB  [PAB FBUFF] V@  OVER  [PAB CHARS] VC@ DUP>R  VREAD R> ;

 

For reference here is INCLUDED from the kernel

Spoiler
: INCLUDED  ( caddr len -- )
           ?FILE
           CR T." Loading: " 2DUP TYPE
           SOURCE-ID @ >IN @ 2>R           \ save source-ID, input pointer
           PSZ NEGATE ^PAB +!              \ make new PAB, on pab stack
           ( $ len ) 50 14 FOPEN ?FILERR   \ OPEN as 80 FIXED DV80 INPUT
           SOURCE-ID 1+!                   \ incr. source ID (1st file is 1)
           LINES OFF                       \ reset the line counter
           BEGIN
             2 FILEOP 0=                   \ file read operation
           WHILE
             FGET INTERPRET                \ interpret the buffer
             LINES 1+!                     \ count the line
           REPEAT
           PSZ ^PAB +!                     \ remove PAB from pab stack
           2R> >IN !  SOURCE-ID !          \ restore >IN, SOURCE-ID
;

 

 

  • Like 2
Link to comment
Share on other sites

I needed a distraction from native code generation so today I got something working that I always wanted.

 

I have a SUPERCART version of Camel99 Forth, which is amazing because it frees up 8K of HI RAM, and still allows one to change the kernel if need be.

However it is kind of an orphan. The existing SAVESYS word doesn't know how to deal with it and so you could never make stand-alone binaries with it.

 

I fixed that today with SUPERSAVE.  I also went one step further.  I added a SAVEHEAP function.

If your program moves the H variable ie: uses some LOW RAM at >2000, that gets saved as well as a separate image file. 

And when the program wakes up the value of H is persevered.  preserved. (Damned spelled check) 

(I could add one more module to save VDP RAM as well but that's for another day)

 

In this test file I load a bunch of libraries and I also put 4660 bytes of CHAR '#' in low RAM and save it as a "fat" Forth called DEVSYS.

 

Spoiler
CR .( SUPERSAVE.FTH for SUPERCART Camel99 Forth  Nov2022 B Fox)
\ creates a binary program E/A 5 format.
\ Makes as many files as needed to save the system

\ For SUPERCART we must save the kernel at >6000 as the primary file.
\ IF the heap is used (H<>$2000) the HEAP is also saved.
\ ALL the memory from >A000 to end of dictioanry is saved as secondary files

\ test: load up libaries
INCLUDE DSK1.TOOLS
INCLUDE DSK1.ELAPSE
INCLUDE DSK1.GRAFIX
INCLUDE DSK1.AUTOMOTION
INCLUDE DSK1.WORDLISTS

NEEDS LOCK  FROM DSK1.MARKER

: NEWBOOT    WARM INIT-WORDLISTS  ABORT ; \ new init word for the system

LOCK

\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\ nothing past here will be in the image.

NEEDS LOAD-FILE FROM DSK1.LOADSAVE  \ we use SAVE-FILE from this library

HERE
HEX
        A000 CONSTANT HIMEM
        1000 CONSTANT VDPBUFF
        2000 CONSTANT 8K
        2000 CONSTANT LOWRAM

          13 CONSTANT PROG     \ file mode for Program files
     3 CELLS CONSTANT HEADLEN
8K 3 CELLS - CONSTANT MAXSIZE

\ define the file header fields. *THESE ARE VDP ADDRESSES*
VDPBUFF            CONSTANT MULTIFLAG
VDPBUFF  1 CELLS + CONSTANT PROGSIZE
VDPBUFF  2 CELLS + CONSTANT LOADADDR
VDPBUFF  HEADLEN + CONSTANT CODEBUFF     \ COPY 8K program chunks to here

VARIABLE FILECOUNT

: ?LOCK
  ORGDP @ A100 <
  IF BEEP CR ." WARNING: missing LOCK directive" CR THEN ;

: ENDMEM  ( -- addr ) ?LOCK  ORGDP @ ;

\ words to compute the himmem parts of the system
: ?SIZE  DUP 8K > ABORT" Code to big" ;
: SYS-SIZE    ( -- n)  ENDMEM HIMEM - ;
: #FILES      ( -- n)  SYS-SIZE 8K /MOD SWAP IF 1+ THEN ;
: CODECHUNK   ( n -- addr) MAXSIZE * HIMEM + ;
: CHUNKSIZE   ( n -- n ) ENDMEM SWAP CODECHUNK -  MAXSIZE MIN ;

: ?PATH  ( addr len -- addr len )
         2DUP  [CHAR] . SCAN NIP 0= ABORT" Path expected" ;

CREATE FILE$ ( -- caddr) 20 ALLOT

: FILENAME   ( -- addr len) FILE$ COUNT ;
: LASTCHAR++ ( Caddr len --)  1- +  1 SWAP C+! ;

: SAVE-IMAGE ( addr len Vaddr size -- )
    CR ." Writing file: " FILENAME TYPE
    HEADLEN +  PROG SAVE-FILE
    FILENAME LASTCHAR++
    FILECOUNT 1+! ;

: HEADER  ( addr size ?) \ store header info in VDP RAM
    MULTIFLAG V!  PROGSIZE V!  LOADADDR V! ;

\ kernel ................
HEX
               6000 CONSTANT KERNORG
' ; 20 +  KERNORG - CONSTANT KERNSIZE \ last word in kernel is ';'

HEX
: SAVEKERNEL ( xt -- <textpath> )
    BOOT !
    PARSE-NAME ?PATH  FILE$ PLACE
    KERNORG KERNSIZE TRUE HEADER  ( Kernel always needs more files )
    KERNORG CODEBUFF KERNSIZE VWRITE \ copy kernel to VDP
    FILENAME VDPBUFF KERNSIZE SAVE-IMAGE
;

: HEAPSIZE ( -- n)  H @ LOWRAM - ;

: SAVEHEAP ( -- )
    HEAPSIZE
    IF
        LOWRAM HEAPSIZE DUP>R TRUE HEADER
        LOWRAM CODEBUFF R@ VWRITE        \ copy HEAP to VDP
        FILENAME VDPBUFF R> SAVE-IMAGE
    THEN ;

INCLUDE DSK1.UDOTR

: .BYTES&ADDR ( addr size --)
   DECIMAL 5 U.R ."  bytes, at " HEX ." >" 4 U.R ;

HEX
: REPORT
    CR
    CR ." Kernel: "  KERNORG  KERNSIZE .BYTES&ADDR
    CR ." Himem : "  HIMEM  ORGDP @ OVER -  .BYTES&ADDR
    CR ." Heap  : "  LOWRAM  HEAPSIZE  .BYTES&ADDR
    CR ." Saved in " FILECOUNT @ .  ." EA5 files"
    CR
;

: SAVEHIMEM ( -- <textpath> )
    #FILES 0
    ?DO
      \ Init file header in VDP RAM
       I CODECHUNK  I CHUNKSIZE  I 1+ #FILES <> HEADER
       LOADADDR V@  CODEBUFF  PROGSIZE V@ HEADLEN +  VWRITE
       FILENAME  VDPBUFF      PROGSIZE V@ SAVE-IMAGE
    LOOP
;

: SUPERSAVE ( xt -- <path>) SAVEKERNEL SAVEHEAP SAVEHIMEM  REPORT ;

' NEWBOOT SUPERSAVE DSK7.DEVSYS

 

 

 

 

 

 

Edited by TheBF
typo
  • Like 4
Link to comment
Share on other sites

Building this SUPERSAVE has opened up a range of options that has me excited. I must have been sleeping on it because I woke up with these ideas.

 

1. DTC Forth needs more space. (The kernel alone takes ~500 extra bytes) I had abandoned DTC because it really needs the Supercart version, but I could not make fast loading applications. 

Now you can use SuperCart for the Kernel and compile what you need then save it all off as an application or a FAT Forth that loads in seconds.

(Just tested the existing SUPERSAVE code on DTC)

 

2. With a little more effort I can save SAMS pages as a sequence of binary program files. Then with a simple loader I can pull them back into the machine.

If I put that together with the SAMSCODE library I can put SAMS Forth headers at A000 to DFFF, and use E000 to EFFF as a SAMS "code segment".

That would create a SAMS based Forth that works pretty much like normal Forth.

At the moment you still have to manually specify the code page that the compiler uses but the headers have a SAMS page field so running code is seamless. 

 

---

 

DTC Surprise Observation:

I spent a lot of time improving the DOES> runtime in the Camel99 ITC version. It leverages the BL instruction to speed it up.

TCREATE: DODOES  ( -- a-addr)
              TOS PUSH,       \ save TOS reg on data stack    
              W TOS MOV,      \ put defined word's PFA in TOS
              IP RPUSH,       \ push old IP onto return stack
              R11 IP MOV,     \ R11 has the new PFA -> IP 
              NEXT, 

 

DTC has to do a bit more work. BL is already used for each Forth word to call _DOCOL.  Means I can't use it again to get to _DODOES.

l: _DODOES  ( -- a-addr)
            TOS PUSH,       \ save TOS reg on data stack
            R11 TOS MOV,    \ After BL, R11 has defined word's PFA. Move to TOS
            IP  RPUSH,      \ save current IP on return stack
           -2 (R11) IP MOV, \ CFA to Forth IP
            IP  4 ADDI,     \ cfa>pfa of the defined word.
            NEXT,

 

I use the compile time of the Assembler as test of compiling speed.

With ITC Forth it is ~13.5 seconds. 

I was shocked to see that DTC Forth did the same job in ~17 seconds! 

Other files compile faster with DTC, but I think the lower efficiency of DOES> in DTC is the culprit in the Assembler code. 

 

 

 

  • Like 2
Link to comment
Share on other sites

SAMS Code save/load Tests

 

In my SAMSCODE project I limited the code to the top 16 PAGES of the 1M card, so 64K of Forth code potential.

I can't imagine a project on TI-99 that would need more code and I suspect the headers would fill the HI-RAM by then.

TBD.

 

Simple re-working the SUPERSAVE code gave me the save routine. SAMSCODE gave me a fast CMAP function.

This test version saves all 64K. 

The video shows the speed to save 64k. In reality each page might not be full and we won't save pages that have no code.

 

The loader works like the EA/5 loader and keeps "SLOADing" files until the flag field=0.

 

Looks like I can make this work. (Way more code here than I really want)

 

Spoiler
\ SAMSLOADER.FTH   E/A file loader for CAMEL99 Forth  Nov 2022 Brian Fox

MARKER /LOADER
NEEDS FAR:       FROM DSK1.SAMSCODE
NEEDS LOAD-FILE  FROM DSK1.LOADSAVE

HEX 1000 VP !  \ beginning of free VDP RAM

\ define the file header fields in *VDP RAM*
VP @             CONSTANT MULTIFLAG
VP @   1 CELLS + CONSTANT PROGSIZE
VP @   2 CELLS + CONSTANT LOADADDR

VP @   3 CELLS + CONSTANT CODEBUFF    \ COPY 8K program chunks to here

              13 CONSTANT PROG     \ file mode for Program files
            1000 CONSTANT 4K
            3000 CONSTANT CSEG

: LASTCHAR++  ( Caddr len --) 1- +  1 SWAP C+! ;

: HEADER  ( addr size ?) \ store header info in VDP RAM
    MULTIFLAG V!  PROGSIZE V!  LOADADDR V! ;

CREATE FILE$   20 ALLOT
: FILENAME ( -- addr len)  FILE$ COUNT ;


: SAMSAVE  ( $adr len Vaddr size page# -- )
      CR ." Writing SAMS page " .  ." to "  FILENAME TYPE
      PROG SAVE-FILE
      FILENAME LASTCHAR++
;

: SAVE-SAMS ( $adr len 1st last  -- )
     2SWAP FILE$ PLACE
     1+ SWAP
     DO
         I CMAP
        \ loadaddr     size       multiflag
          CSEG   4K 3 CELLS +   I _MAXBANK <> HEADER

          LOADADDR V@  CODEBUFF  PROGSIZE V@  VWRITE
          FILENAME      VP @     PROGSIZE V@  I SAMSAVE
     LOOP ;

\ sams loader --------------------------------------------------
: SLOAD  ( page# -- ?)     \ 4K max code size. FILE$ must be set
          CMAP
          FILENAME VP @  4K 6 + PROG LOAD-FILE   \ read into VDP RAM
          CODEBUFF  CSEG  PROGSIZE V@ 6 - VREAD  \ read VDP to CPU RAM
          FILENAME LASTCHAR++
          MULTIFLAG V@           \ return the multi-file flag
;

: LOAD-SAMS  ( addr len 1stpage -- )
          >R
          FILE$ PLACE
          BEGIN
             R@ SLOAD
          WHILE
            R> 1+ >R
          REPEAT
          R> DROP
;

\ TEST TOOLS
HEX
: FILLSAMS   100 0F0 DO  I CMAP  CSEG 4K I FILL  LOOP ;
: CLEARSAMS  100 0F0 DO  I CMAP  CSEG 4K 0 FILL   LOOP ;

 

 

Edited by TheBF
fixed comment
  • Like 2
Link to comment
Share on other sites

Since it looks like I may start compiling code into SAMS memory as a matter of course I thought I should fix a little bug waiting to happen in my FAR: compiler.

In the original code I used R1 as the page argument in _CMAP, the mapping sub-routine, which would totally blow up if you tried to use VDP routines, for example by using .S  to debug some code. 

Didn't notice that until just recently.

 

So it was back into the code and I am now using the return stack to pass the argument to _CMAP. 

This works well because FAR: pushes things onto the return stack anyway so it seems consistent for ;FAR to pickup the old SAM page from the return stack.

 

Another thing in the old code that bugged me was that I maintained an array of dictionary pointers, one for each SAMS page that could have code in it.

That's crazy when you have a 1M memory card. :)

 

The array is removed and replaced by using the last memory location in a SAMS code page as a variable. 

This meant that I had to pull the SAMS page into the window a bit earlier to get at that memory location but it a much better solution because now any SAMS page can be used for code.

The system now just needs you to initialize that memory location once with the word FIRSTUSE. 

After that you just select it with CODEPAGE as required. 

 

 

Example:  

PASSTHRU CMAP  \ init the bank# variable to default memory page

DECIMAL 
16 CODEPAGE FIRSTUSE
FAR: HELLO   CR ." Hello SAMS World!"  ;FAR

17 CODEPAGE FIRSTUSE
FAR: NESTED1  ." nesting 1" ;FAR

18 CODEPAGE FIRSTUSE
FAR: NESTED2  NESTED1  ."  2 "  ;FAR

19 CODEPAGE FIRSTUSE
FAR: NESTED3  NESTED2  ."  3 "  ;FAR

16 CODEPAGE ( Use this page again )
FAR: GO   NESTED3 HELLO ;FAR

 

image.png.ef0be1e319cbd37ee787e6b9fad33d28.png

 

Spoiler
\ SAMSCODE.FTH                for Camel99 Forth  Brian Fox
\ Code in SAMS memory based on concept in TurboForth by Mark Wills
\ Ported to Camel99 Forth with changes Oct 13, 2021,

\ Concept:
\ FAR: word headers are in the normal Forth memory space so all SAMS words
\ can be found.
\ FAR: word data structure has two extra fields
\ <link> < HEADER> <imm> <len NAME..> <FARCOL>  <BANK#> , <IP> ,

\ FAR: compiles a "fat" header that remember SAMS BANK# and SAMS IP

\ ;FAR  compiles FARSEMIS in SAMS memory, not in RAM Dictionary to save space.

\ Compile time check: ;FAR tests for SAMS memory
\ Smart MAP. Remembers the last SAMS page that was mapped. (BANK#)
\ Only performs a MAP if it's a new SAMS page

\ Update Nov 2022: removed array of SAMS DP variables.
\ - Each SAMS page uses last memory cell to hold its own DP.
\ - Can now compile code to any SAMS page.
\ - You must use ( n) CODEPAGE FIRSTUSE the first time you use a page


NEEDS SAMSINI   FROM DSK1.SAMSINI  \ common code for SAMS card

HERE
HEX
\ **************[ CHANGE CSEG to your requirements ]******************

HEX              3000 CONSTANT CSEG      \ CODE window in CPU RAM

\ ********************************************************************

\ Derived SAMS memory addresses for code
          CSEG 0FFE + CONSTANT SAMSDP    \ variable at end of SAMS page
4000 CSEG 0B RSHIFT + CONSTANT CREG      \ compute CSEG SAMS register
     CSEG 0C RSHIFT   CONSTANT PASSTHRU  \ default page for CSEG

VARIABLE  SAVHERE   \ temp holder for RAM Dictionary pointer
VARIABLE  BANK#     \ last SAMS bank# selected
VARIABLE  CPAGE     \ active code page used for compiling


HEX
\ **LEAF SUB-ROUTINE**
CREATE _CMAP ( -- ) ( R: page# -- )
      R0 RPOP,              \ POP parameter from Rstack
      R0 BANK# @@ CMP,      \ already mapped?
      NE IF,
         R0 BANK# @@ MOV,   \ update the last bank used
         R0 SWPB,           \ swap bytes
         R12 1E00 LI,       \ set SAMS card CRU address
         0 SBO,             \ turn on the card
         R0 CREG @@ MOV,    \ map it
         0 SBZ,             \ turn off card
      ENDIF,
      RT,

CODE CMAP  ( page# --) \ Forth word to map SAMS pages
      TOS RPUSH,    \ need parameter on Rstack
      _CMAP @@ BL,  \ call it
      TOS POP,      \ refill TOS
      NEXT,
ENDCODE

\ run time executor for SAMS colon words.
CREATE FARCOL
     IP RPUSH,
     W IP MOV,            \ IP = DATA cell of this word
     BANK# @@ RPUSH,      \ Rpush the currently active SAMS bank
     *IP+ RPUSH,          \ fetch bank# in PFA & save on return stack
     _CMAP @@ BL,         \ call _CMAP (uses RSTACK parameter)
     *IP IP MOV,          \ get SAMS DP & set new IP
     NEXT,

CODE FAREXIT             \ exit for SAMS word
     _CMAP @@ BL,        \ RSTACK has old BANK#, map it in
      IP RPOP,           \ Regular FORTH EXIT
      NEXT,
ENDCODE

\ \\\\\\\\\\\\\\\\ code words end  //////////////////

: FAR: ( -- ) \ special colon for words in FAR memory
     !CSP
     HEADER             \ compile Forth header with name
     FARCOL ,           \ compile the new executor as CFA
     CPAGE @ DUP ,      \ compile codepage as the DATA field
     CMAP               \ pull in the SAMS page
     SAMSDP @ DUP ,     \ SAMSDP is this word's IP address
     HERE SAVHERE !     \ save "normal here"
     ( samsdp) DP !     \ set Forth DP to SAMSDP. Compiling to SAMS now
     HIDE
     ]                  \ turn on the compiler
;

: ;FAR ( -- ) \ end SAMS compilation. *NEW* compile time memory test
      POSTPONE FAREXIT    \ compiles at end of SAMS code
      POSTPONE [          \ turn compiler off
      REVEAL ?CSP
      HERE DUP SAMSDP !   \ update HERE for this bank, keep a copy
      SAVHERE @ DP !      \ restore DP to CPU RAM
      CSEG 0FF8 + > ABORT" SAMS bank full"
; IMMEDIATE

HEX
: CODEPAGE ( bank# -- ) CPAGE ! ; \ select SAMS page for compiling

: FIRSTUSE
      CPAGE @ CMAP
      CSEG 1000 FF FILL \ fill is for debugging
      CSEG SAMSDP !     \ set the local CSEG DP variable to start of CSEG
;

HERE SWAP -
DECIMAL CR . .( bytes)

 

 

 

Edited by TheBF
error test had wrong constant
  • Like 2
Link to comment
Share on other sites

This SAMS compilation is great but there is one thing you can't do with this "fat" header for SAMS compilation.

 

Things that involve ' and CREATE DOES> do not work with this bigger header.

Tick is returning a code field in RAM but the real XT of the word is in SAMS memory. 

This will also affect EXECUTE and PERFORM. 

 

At the moment this affects VALUE and DEFER when I built them with CREATE DOES>/

These could be redone using primitives like HEADER and COMPILE, but it's not that big of deal.

It means you compile those kinds of words in conventional RAM.

 

It was easy everybody would be doing it. :)

 

Hmm.. at the moment I have defined a SAMS vocabulary with special version of :  and ;  

Maybe I can make a new tick as well... 

 

Edit: Never easy.  New tick is simple but DOES> needs to know how to pull in a SAMS page before running some code.

        Makes my head spin.  It's going to be caveat emptor for now. 

 

  • Like 1
Link to comment
Share on other sites

5 hours ago, Willsy said:

I got around this by building the headers for SAMS-hosted words in conventional dictionary, and compiling a trampoline function that, upon execution, pages in the appropriate bank, and does a simple branch to word in SAMS memory.

Your code sent me down this path.  :) 

I used your concept, (credits in the file) but wanted to see if I could improve it.

  1. Write the hard working code in Assembler. 
  2. reduce the size of the dictionary entries by putting the Exit code in SAMS memory
    • FAREXIT has been recently improved to be one BL to _CMAP plus Forth's EXIT code. :)
  3. Use the return stack to linkage
  4. write a "docol" for SAMS words (called it FARCOL) for faster entry.
  5. Make the mapping code as fast as possible.  
  6. Remove the HERE pointers and keep each pointer in its own SAMS page  ( EDIT)
     

The speed is excellent now but it won't compile "compiling" words in SAMS. 

It is because of the different header size now. 

\ run time executor for SAMS colon words.
CREATE FARCOL
     IP RPUSH,
     W IP MOV,            \ IP = DATA cell of this word
     BANK# @@ RPUSH,      \ Rpush the currently active SAMS bank
     *IP+ RPUSH,          \ fetch bank# in PFA & save on return stack
     _CMAP @@ BL,         \ call _CMAP (uses RSTACK parameter)
     *IP IP MOV,          \ get SAMS DP & set new IP
     NEXT,

CODE FAREXIT             \ exit for SAMS word
     _CMAP @@ BL,        \ RSTACK has old BANK#, map it in
      IP RPOP,           \ Regular FORTH EXIT
      NEXT,
ENDCODE

 

In one benchmark I tested I split the code with half in one bank and the other half in a 2nd calling back and forth continuously.

The difference from normal Forth and SAMS version was almost nothing.

 

I think I can find the compromise between the two.

  • Keep standard Forth header with DOCOL
  • Entry to SAMS words can be written as a CODE word instead of Forth. (I had something like this first. Gotta find that)
  • Use Return stack for linkage
  • Fast FAREXIT code 

Thanks for making me think this over. ;)

 

 

 

  • Like 2
Link to comment
Share on other sites

My first round at translating Marks code last year works perfectly.  I used DEFER as a test bed because it blew up so completely.

I have written a few variants of my changes and they all fail on DEFER. 

So, I will go back to Mark's methods and slowly add my "improvements" :)   one at a time until I find the culprit.

It could be that the return stack can't be used but I can't understand why at the moment.

 

I think I can do these changes:

  1. reduce the size of the dictionary entries by putting the Exit code in SAMS memory
  2. Use the return stack to linkage
  3. Make the mapping code as fast as possible. (done. 6 instructions) 
  4. Remove the HERE pointers and keep each pointer in its own SAMS page

 

Here is my working translation 

Spoiler
\ Code in SAMS memory based on TurboForth by Mark Wills
\ Translation to Camel99 Forth  Sept 30 2021

NEEDS MARKER  FROM DSK1.MARKER
NEEDS VALUE   FROM DSK1.VALUES
NEEDS DUMP    FROM DSK1.TOOLS
NEEDS MOV,    FROM DSK1.ASM9900

HERE
DECIMAL
CREATE HERES  32 CELLS ALLOT
HEX
CODE ]HERE ( ndx -- addr )
     A104 ,           \ TOS TOS ADD,
     0224 , HERES ,   \  TOS SAT AI,
     NEXT,
ENDCODE

\ SAMS memory management for code
HEX              3000 CONSTANT CSEG      \ code seg in CPU RAM
4000 CSEG 0B RSHIFT + CONSTANT CREG      \ compute CSEG SAMS register
\ CSEG 0C RSHIFT      CONSTANT PASSTHRU  \ default page for CSEG

\ CMAP brings pages of code into the window called CSEG
\ The SAMS register is pre-calculated as constant CREG
CODE CMAP ( bank# -- )
     TOS SWPB,
     R12 1E00 LI,
     0 SBO,             \ turn on the card
     TOS CREG @@ MOV,   \ store bank# in SAMS register
     0 SBZ,             \ turn off card
     TOS POP,           \ refill top of stack register
     NEXT,
ENDCODE

-1 VALUE _BANK          \ current bank
 0 VALUE _MAXBANK
 0 VALUE _NHERE
\ _____________________________________________
\ Stack to handle pages
DECIMAL
CREATE BS0     20 CELLS ALLOT
CREATE BSP  BS0 ,    \ stack pointer, initialzed to BS0

\ : BSDEPTH ( -- n) BSP @   BS0 -  2/ ;

: >BS     ( bank# --) DUP  2 BSP +!   BSP @ !  CMAP  ;
: BS>     ( -- bank#)
      BSP @
\      DUP BS0 = ABORT" Bank stack underflow"  \ remove line for speed
      @  CMAP  -2 BSP +!  ;

HEX  0 >BS   \ force first entry on bank stack to SAMS page 0

HEX
: BANKS ( n -- )  \ reserve space for here pointers for n banks
  DUP TO _MAXBANK
  DUP 1+ 0 DO   CSEG I ]HERE !  LOOP  \ init "here" for each bank
  CR 4 * .  ." K of SAMS reserved." CR ;

\ TF uses address branching. Camel Forth uses Offset branching.
\ GOTO lets us do a direct branch to a literal address in the Forth code
CODE GOTO  ( addr -- )  *IP IP MOV,  NEXT, ENDCODE

: FAR: ( -- )
    :                     \ compile header in CPU RAM
\ Run-time action
    POSTPONE LIT _BANK ,  \ compile my bank#
    POSTPONE >BS          \ push my bank# and MAP
    POSTPONE GOTO _BANK ]HERE @ DUP ,  \ compile jump to here for this bank

\ compile-time action
     HERE TO _NHERE      \ save "normal here"
     DP !                \ set dp to _bank's "here"
    _BANK CMAP           \ map in the appropriate bank
;

: ;FAR ( -- ) \ end banked compilation
      POSTPONE GOTO  _NHERE ,
      HERE  _BANK ]HERE !           \ update here for bank
      _NHERE DP !                   \ restore dp to "normal" memory
      POSTPONE BS>
      POSTPONE ;
; IMMEDIATE

HEX
: _BFREE ( -- n) 4000  _BANK ]HERE @ - ;

: .BFREE ( -- ) DECIMAL
    CR ." Bank# " _BANK . ." , "  _BFREE  .  ." bytes free." CR ;

: CODEPAGE ( bank -- )
   DUP _MAXBANK 0 WITHIN ABORT"  Bad bank number"
   DUP TO _BANK CMAP  .BFREE
;
HERE SWAP -
DECIMAL CR . .( bytes)

 

 

  • Like 3
Link to comment
Share on other sites

I went back to Mark's code with a translation harness to try and get a stable starting place.

I even emulated the >MAP function from TF but using SAMS Forth primitives. 

I re-discovered that I had to make the SAMS semi-colon IMMEDIATE to work in Camel Forth.

 

DEFER works as expected

 

There are some buggy things too:

Simple CREATE DOES> worked in testing but something dies when I compile the Assembler.

For some reason the last definition in a SAMS block sometimes does not return normally.

(The last definition might be getting overwritten by something in my system that uses HERE) ?

 

It's remarkable that it ports as easy as it does. 

 

Spoiler
\ Literal translation from TF to Camel99

\ HARNESS for CAMEL99 ===================================================
INCLUDE DSK1.TOOLS  \ debug only
INCLUDE DSK1.SAMSINI
INCLUDE DSK1.VALUES

HERE
HEX
3000 CONSTANT $3000
4000 CONSTANT $4000

\ SAMS CARD management
HEX              $3000 CONSTANT CSEG      \ CPU RAM window for SAMS code
$4000 CSEG 0B RSHIFT + CONSTANT CREG      \ pre-compute CSEG SAMS register
 CSEG 0C RSHIFT        CONSTANT PASSTHRU  \ default page for CSEG

\ reproduce TF function
: >MAP ( bank# window-- )
      DROP              \ don't need address in this version
      ><                \ swap bytes
      SAMSCARD          \ Select SAMS Card
      0SBO              \ turn on the card
      ( bank#) CREG !   \ store bank# in SAMS register
      0SBZ ;            \ turn off card

\ TF uses address branching. Camel Forth uses Offset branching.
\ BRANCH lets us do a direct branch to a literal address in the Forth code
HEX
CODE BRANCH ( addr -- )  C259 , ( *IP IP MOV,)  NEXT, ENDCODE

\ =====================================================================
\ turbo forth code
( replaced H with DP )
( replaced COMPILE and [COMPILE] with POSTPONE )

DECIMAL
CREATE _BNKSTK   20 CELLS ALLOT  \ bank stack

\ when Turbo Forth is in its "default" configuration the second
\ half of the 8K memory expansion (>3000) is set to SAMS page
\ >F9. The following line of code initialises the first page of
\ the bank stack to page F9. This ensures that when executing
\ nested bank/pages, when it all unwinds, the default page is
\ swapped into memory.

HEX
F9F9 _BNKSTK ! \ force first entry on bank stack to $f9

_BNKSTK VALUE _BSP             \ pointer into bank stack
-1 VALUE _BANK                 \ current bank
0 VALUE _HERES                 \ holds "here" for each bank
0 VALUE _NHERE                 \ "normal" here
0 VALUE _MAXBANK

: >BANK ( BANK -- ) \ push bank to bank stack
  2 +TO _BSP   DUP _BSP ! $3000 >MAP ;

: BANK> ( -- ) \ POP BANK FROM BANK STACK
  -2 +TO _BSP  _BSP @    $3000 >MAP ;


: BANKS ( N -- )  \ reserve space for here pointers for n banks
  HERE TO _HERES
  DUP TO _MAXBANK
  DUP 0 DO  $3000 , LOOP  \ init "here" for each bank to $3000
  CR 4 * U. ." K of banked memory reserved." CR ;

\ ** added this for code clarity **
: SAMSDP  ( -- addr)  _BANK CELLS _HERES + ;

: B: ( BANK -- )
  \ begin compiling a banked definition in bank bank
  _BANK -1 <> IF
  :
  POSTPONE LIT _BANK ,
  POSTPONE >BANK
  POSTPONE BRANCH  SAMSDP @ DUP ,
  HERE TO _NHERE           \ save "normal here"
  DP !                     \ set dp to _bank's "here"
  _BANK  $3000 >MAP        \ map in the appropriate bank
  ELSE : THEN ;

: _BFREE ( -- ) \ determine free memory in the bank...
  $4000  SAMSDP @ -  .
  ." BYTES FREE." CR ;

: ;B ( -- ) \ end banked compilation
  POSTPONE BRANCH  _NHERE ,
  HERE  SAMSDP ! \ update here for bank
  _BFREE
  _NHERE DP !    \ restore dp to "normal" memory
  POSTPONE BANK>  POSTPONE ;
; IMMEDIATE  ( had to add immediate for CAMEL99 )

: SETBANK ( BANK -- )
  \ sets the bank number that will receive colon definitions
  DUP -1 _MAXBANK WITHIN
  IF
      TO _BANK
      _BANK -1 <> IF
        CR ." Bank " _BANK . ." is now active. "
        _BFREE
      ELSE
        CR ." Compiling to standard 32K memory." CR
      THEN
   ELSE
      TRUE ABORT" Illegal bank number specified"
   THEN ;

 : : ( -- ) \ banked / non-banked compilation
  _BANK -1 = IF : ELSE B: THEN ;

 : ; ( -- ) _BANK -1 =
      IF   POSTPONE ;
      ELSE POSTPONE ;B   \ ;b is immediate in camel99 forth
      THEN ; IMMEDIATE

HERE SWAP - DECIMAL  . .( bytes)

 

 

This benchmark code using Forth for SAMS paging, is slower than running from normal RAM by about 2%.

When I ran it with CODE word switching it was about 0.4% slower.

Spoiler
\ DEMO: Compile code in CPU RAM & SAMS memory and compare speed May 2022 BFox
20 BANKS

-1 SETBANK
INCLUDE DSK1.ELAPSE

HEX
  5 CONSTANT FIVE
100 CONSTANT MASK
  0 VALUE BVAR

: INNERBENCH
    BEGIN
      DUP SWAP DUP ROT DROP 1 AND
      IF FIVE +
      ELSE 1-
      THEN TO BVAR
      BVAR DUP MASK AND
    UNTIL ;

: BENCHIE  MASK 0 DO   1 INNERBENCH  DROP   LOOP ; \ 25.55 seconds

\ __________________________________________

1 SETBANK

: INNERBENCH2
    BEGIN
      DUP SWAP DUP ROT DROP 1 AND
      IF FIVE +
      ELSE 1-
      THEN TO BVAR
      BVAR DUP MASK AND
    UNTIL
;

2 SETBANK
: BENCHIE2  MASK 0 DO  1 INNERBENCH2  DROP  LOOP ;
\ 26.03 seconds using Forth >MAP

CR .( To test: BENCHIE: )
CR .( ELAPSE BENCHIE )
CR .( ELAPSE BENCHIE2)
CR .( ~25 seconds)

 

 

 

image.png.ea829aae53b93cd2eb0f1c6febec8f1e.png

  • Like 3
Link to comment
Share on other sites

I have been trying for some time now to learn how to make a native code compiler for Forth.

It's something was always amazed by back in the '90s with TCOM by Tom Zimmer and FORTHCOM by Thomas Almy.

 

I don't have the whole thing figured out yet, but I now have a beginning with a just-in-time compiler for a subset of Forth.

It only took me 30 years. :) 

One the things that was driving me nuts was making the Forth compile/interpret loop do the correct thing.

Making the looping words IMMEDIATE was the key and using a vocabulary with search order control really simplified things.

 

This method does only 3 things:

  1. IMMEDIATE looping and branching words that compile machine code.
    • The JIT versions of the words are found first by putting the JITS vocabulary 1st in the search order. 
  2. When a code word is encountered, copy the machine code from Forth inline in the heap 
  3. When a VARIABLE, CONSTANT or USER variable is encountered, it is compiled to the TOS register with the LI instruction
     

So, this means it can only optimize CODE words, data words and loops and branches, but you can do a lot with that. 

 

My goal was to JIT compile the Byte Magazine sieve program without making a lot of changes to the source code.

 

Here is the new code for the JIT.

Things to note:

1. Addition of the NEW-HEAP command to erase the LOW RAM at >2000 and reset the H variable to >2000

2. Use the JIT:  compiler only on the inner section because the JIT can't cope with all words

3. Move the printing of the result and the word "PRIMES" out of the computation section 

 

The regular code running on Camel99 takes 120 seconds. 

This JIT version runs in 50.1 seconds. It's a long way off from GCC at 15 seconds but I have a better understanding of why now and that's personal progress. 

Forth's use of the stack instead of registers is a slowdown, especially since I am not doing any push/pop removals here. (future) 

Another big one is not using the CPU features like symbolic and indexed addressing.  That needs a bit of analysis by the compiler but using symbolic addressing is possible. (future) 

 

NEW-HEAP

DECIMAL
 8190 CONSTANT SIZE
    0 VARIABLE FLAGS   SIZE ALLOT  0 FLAGS !


JIT: DO-PRIME
   FLAGS SIZE  1 FILL
   0
   SIZE 0
   DO FLAGS I + C@
     IF I DUP +  3 +  DUP I +
        BEGIN
          DUP SIZE <
        WHILE
           0 OVER FLAGS +  C!
           OVER +
        REPEAT
        DROP DROP
        1+
     THEN
   LOOP
;JIT

: PRIMES ( -- )
   PAGE ."  10 Iterations"
   10 0
   DO
      DO-PRIME CR . ." PRIMES"
   LOOP
   CR ." Done!"
;

 

Here is the code.  It compiles to 324 bytes (excluding the libraries) 

Spoiler
\ jit.fth Compiles inline code as headless words in HEAP  Nov 29 2022

\ Problem:
\ ITC Forth spends 50% of it's time running 3 instructions called NEXT.
\ This system compiles primitives from the kernel as super-instructions
\ and compiles the execution token for the super instructions in a Forth word.

NEEDS .S     FROM DSK1.TOOLS
NEEDS CASE   FROM DSK1.CASE
NEEDS LIFO:  FROM DSK1.STACKS
NEEDS ELAPSE FROM DSK1.ELAPSE
NEEDS MARKER FROM DSK1.MARKER
NEEDS WORDLIST FROM DSK1.WORDLISTS

: EMPTY  S" *CLEAN* MARKER *CLEAN*" EVALUATE ;
MARKER *CLEAN*
EMPTY

HERE

8 LIFO: CS     \ small CONTROL FLOW STACK for loops and branching
: >CS     ( n -- ) CS PUSH ;
: CS>     ( -- n ) CS POP ;
: CS>SWAP ( -- )   CS> CS> SWAP >CS >CS ;
: ?CS     CS STACK-DEPTH ABORT" Un-match IF or loop" ;

: ABORT"  ( ? --) \ restores normal Forth interpreter
   DUP IF  ['] <INTERP> 'IV ! THEN  POSTPONE ABORT" ; IMMEDIATE

HEX
\ *** changed for kernel V2.69 ***
\ Words in scratchpad RAM end in a JMP instruction, not NEXT
\ Might change this, but for now make some conventional versions.
CODE DUP    0646 , C584 ,  NEXT, ENDCODE
CODE DROP   C136 ,         NEXT, ENDCODE
CODE !      C536 , C136 ,  NEXT, ENDCODE
CODE @      C114 ,         NEXT, ENDCODE
CODE C@     D114 , 0984 ,  NEXT, ENDCODE
CODE +      A136 ,         NEXT, ENDCODE

\ Heap management
: THERE  ( -- addr) H @ ;  \ returns end of Target memory in HEAP
: HALLOT ( n -- )   H +! ; \ Allocate n bytes of target memory.
: T,     ( n -- )   THERE ! 2 HALLOT ;  \ "target compile" n into heap
: NEW-HEAP ( -- ) 2000 2000 0 FILL   2000 H ! ; \ reset HEAP

045A CONSTANT 'NEXT'  \ 9900 CODE for B *R10   Camel99 Forth's NEXT code

: CODE,  ( xt --)  \ Read code word from kernel, compile into target memory
           >BODY
           DUP 80 CELLS +   \ 256 bytes is max size we will try to compile
           SWAP   ( -- IPend IPstart)
           BEGIN
              DUP @ 'NEXT' <>  \ the instruction is not 'NEXT'
           WHILE
             DUP @  ( -- IP instruction)
             T,     \ compile instruction
             CELL+  \ advance IP
             2DUP < ABORT" End of code not found"
           REPEAT
           2DROP
;
\ now we can steal code word from the kernel and compile it to target memory
: DUP,   ['] DUP  CODE, ;
: DROP,  ['] DROP CODE, ;
: 1-,    ( n -- n') ['] 1-  CODE,  ; \ TOS DEC,
: LIT,   ( -- n) DUP,  0204 T, ( n) T, ; \ compile n as literal in TOS

\ store a byte offset in odd byte of addr.
\ Addr is the location of Jump instruction
: RESOLVE ( addr offset --) 2- 2/ SWAP 1+ C! ;

\ compute offset from addr addr' & complete the jump instruction
: <BACK   ( addr addr' -- ) TUCK -  RESOLVE ;

: ?BYTE ( c -- c)  DUP FF00 AND ABORT" Jump out of range" ;

\ compile misc. jump instructions with no offset.
: JMP,  ( c --) ?BYTE 1000 + T, ;
: JNO,  ( c --) ?BYTE 1900 + T, ;
: JEQ,  ( c --) ?BYTE 1300 + T, ;
: JNE,  ( c --) ?BYTE 1600 + T, ;
: JOC,  ( c --) ?BYTE 1800 + T, ;
: JNC,  ( c --) ?BYTE 1700 + T, ;



\ --------[ JIT compilers for LOOPS and BRANCHING ]---------
\ When the JITS wordlist is placed 1st in the search order
\ these immediate words will run in place of their normal Forth versions.
\ These version compile machine code into the HEAP.
\ In the case of DO it "steals" the code for setting up the loop
\ from the kernel.

VOCABULARY JITS
ALSO JITS DEFINITIONS

: BEGIN    THERE >CS ; IMMEDIATE  \ push location onto control stack

\ <DO> is CODE preamble to setup return stack.
: DO  ( -- there) ['] <DO> CODE,  POSTPONE BEGIN ; IMMEDIATE

: LOOP
     0597 T,                   \ *RP INC,
     CS> THERE  0 JNO, <BACK   \ compute offset, compile into JNO
    ['] UNLOOP CODE,           \ collapse stack frame
; IMMEDIATE

: +LOOP
     0A5CA T,         \ TOS *RP ADD,
     DROP,            \ don't need TOS value anymore
     POSTPONE LOOP    \ compile loop code
; IMMEDIATE

: AGAIN   CS> THERE 0 JMP, <BACK ; IMMEDIATE

: UNTIL
          1-,
          DROP,
          CS> THERE 0 JNC, <BACK ; IMMEDIATE

: IF    ( n -- )
           1-,       \ If tos=0, DEC will cause a carry
           DROP,
           THERE >CS 0 JNC, ; IMMEDIATE

: THEN     CS> THERE OVER - RESOLVE ; IMMEDIATE

: ELSE     THERE >CS  0 JMP,
            CS>SWAP POSTPONE THEN ; IMMEDIATE

: WHILE   ( n -- ) POSTPONE IF CS>SWAP ; IMMEDIATE

: REPEAT   POSTPONE AGAIN  POSTPONE THEN ; IMMEDIATE

\ --------------------------------------------------------------

FORTH DEFINITIONS
\ CFA of a code word contains the address of the next cell
: CODE? ( XT -- ?)  DUP @ 2- = ;

: JITCOMPILE,  ( xt -- )
          DUP CODE? IF  CODE,   EXIT  THEN  \ compile kernel code & exit
  \ xt is not a code word...
          DUP @   \ FETCH the "executor" address
          CASE ( data words )
             ['] DOVAR    OF  >BODY   LIT, ENDOF
             ['] DOCON    OF  EXECUTE LIT, ENDOF
             ['] DOUSER @ OF  EXECUTE LIT, ENDOF
             TRUE ABORT" Can't optimize word"  ( any other Forth word bombs)
         ENDCASE
;

\ new interpreter loop
\ 1. Executes Immediate words: BEGIN WHILE IF THEN etc.
\ 2. Steals kernel code and compiles any CODE word.
\ 3. ABORTS if you try to compile a Forth word
\ 4. Compiles literal numbers with LI instruction

: <JIT> ( -- addr)
         'SOURCE 2!  >IN OFF
          BEGIN
             BL WORD  DUP C@
          WHILE
              FIND ?DUP
              IF ( it's a word)
                  1+ STATE @ 0= OR
                  IF   EXECUTE      \ IF BEGIN etc. are all immediate
                  ELSE JITCOMPILE,  \ code and data are not
                  THEN
              ELSE
                  COUNT NUMBER? ?ERR
                  STATE @ IF LIT, THEN  \ special number compiler
              THEN
              DEPTH 0< ABORT" JIT: Underflow"
          REPEAT
          DROP
;

: JIT: ( -- JIT-xt)
        ALSO JITS
        :
         THERE ( -- XT)   \ execution token (XT) for the NEW compiled code
         DUP CELL+ T,     \ create the ITC header for a CODE word
         ['] <JIT> 'IV !  \ switch to JIT compiler
;

: ;JIT ( JIT-XT -- )
        PREVIOUS
        ['] <INTERP> 'IV !
         'NEXT' T,       \ compile NEXT at end of new code word
         ,               \ compile CODE word's XT into Forth definition
        ?CS
        POSTPONE ;
; IMMEDIATE

HERE SWAP - SPACE DECIMAL . .( bytes)

 

 

  • Like 2
Link to comment
Share on other sites

Is your compiler doing any optimisations such as holding stack values in registers etc? That seems to be the (really) difficult bit. It's relatively easy to have the compiler just essentially copy and paste machine code into a definition, or compile a BL etc. to a machine code routine, but doing optimisations is where it gets really tricky. I wonder how VFX does it?

 

Peephole/pinhole optimsations are relatively simple. I wrote one (un-finished) for TF that would replace oft used Forth incantations with optimised versions. It would replace sequences of words with optimised equivalents - the idea being the equivalents would be written in assembly.

 

Looking at the code, the following (plus some others) are detected:

 

  • swap 1+ swap
  • swap 1- swap
  • swap 2+ swap
  • swap 2- swap
  • swap 2* swap
  • lit +
  • lit -
  • lit *
  • lit =
  • lit <
  • lit >
  • lit <=
  • lit >=

The excercise was to write the code to spot the patterns in the compiled code. After that it's easy to add optimisations. It's just a big CASE statement.

 

Anyway, from a machine code point of view, how would you tackle (for example) this:

 

: test 1 2 + ;

 

Which (naively) would be (pseudo code):

 

  • push 1
  • push 2
  • call +
  • push result

Into:

 

  • LI R0, 3  ; result can be calculated at compile time since both parameters are constants
  • MOV R0, *STACK+ ; push result

Which is all that would be required in an optimised form. Interested to hear your thoughts. 

  • Like 2
Link to comment
Share on other sites

4 hours ago, Willsy said:

Is your compiler doing any optimisations such as holding stack values in registers etc? That seems to be the (really) difficult bit. It's relatively easy to have the compiler just essentially copy and paste machine code into a definition, or compile a BL etc. to a machine code routine, but doing optimisations is where it gets really tricky. I wonder how VFX does it?

 

Peephole/pinhole optimsations are relatively simple. I wrote one (un-finished) for TF that would replace oft used Forth incantations with optimised versions. It would replace sequences of words with optimised equivalents - the idea being the equivalents would be written in assembly.

 

Looking at the code, the following (plus some others) are detected:

 

  • swap 1+ swap
  • swap 1- swap
  • swap 2+ swap
  • swap 2- swap
  • swap 2* swap
  • lit +
  • lit -
  • lit *
  • lit =
  • lit <
  • lit >
  • lit <=
  • lit >=

The excercise was to write the code to spot the patterns in the compiled code. After that it's easy to add optimisations. It's just a big CASE statement.

 

Anyway, from a machine code point of view, how would you tackle (for example) this:

 

: test 1 2 + ;

 

Which (naively) would be (pseudo code):

 

  • push 1
  • push 2
  • call +
  • push result

Into:

 

  • LI R0, 3  ; result can be calculated at compile time since both parameters are constants
  • MOV R0, *STACK+ ; push result

Which is all that would be required in an optimised form. Interested to hear your thoughts. 

At the moment it is totally naive.  The big wins are removing NEXT and loops are way faster.  LOOP is just 2 instructions. :) 

So this:

JIT: DOTEST
    FFFF  0
    DO  LOOP
;JIT

 

Compiles to:

2042  inc  *R7         \ bump the index   
      jno  >2042       \ jump back if no overflow
      ai   R7,>0004    \ unwind return stack           
      b    *R10        \ return to Forth

 

And indeed, the hard parts are reliable effective optimizations. VFX, from what Stephen writes is using all the conventional compiler tricks that you would see in conventional compilers. 

 

In my machine Forth project, I made @ and ! and anything that might touch an address smart enough to use symbolic addressing if it could and that really works well for our old friend. 

I am noodling how to do that under this scheme and also how to reliably remove DROP/DUP code when they are back-to-back

 

This DROP/DUP happens a lot when you put TOS in a register because any primitive that consumes its arguments ends with DROP to refill the TOS register. 

For DROP/DUP removal I might just do a second pass after I compile it. Seems easier than doing it on the fly.

I could probably use a string search to find the 6 bytes of code, and just MOVE the code after it six bytes to the left. 

Once you have that 2nd pass you could do anything. 

 

Your example with + is constant folding as I have recently learned and is part of Mecrisp Forth. I looked at the optimizing code and it is pretty involved. 

 

Thanks for reminding me that I have more work to do! :)

 

 

 

 

 

 

 

  • Like 2
Link to comment
Share on other sites

I got a surprise today.

I did the minor adjustments to the JIT compiler to make it work under direct threaded Forth. (DTC)

It's mostly dealing with the differences in the headers of words.

Spoiler
\ jit.fth DTC VERSION                               Dec 4 2022 Brian Fox
\ Compiles inline code as headless words in HEAP

\ Problem:
\ DTC Forth spends upto 50% of it's time running 2 instructions called NEXT.
\ This system compiles primitives from the kernel as super-instructions
\ and compiles the execution token for the super instructions in a Forth word.

\ NEEDS .S     FROM DSK1.TOOLS
\ NEEDS ELAPSE FROM DSK1.ELAPSE
NEEDS CASE   FROM DSK1.CASE
NEEDS LIFO:  FROM DSK1.STACKS
NEEDS WORDLIST FROM DSK1.WORDLISTS

HERE

8 LIFO: CS     \ small control flow stack for loops and branching
: >CS     ( n -- ) CS PUSH ;
: CS>     ( -- n ) CS POP ;
: CS>SWAP ( -- )   CS> CS> SWAP >CS >CS ;
: ?CS     CS STACK-DEPTH ABORT" Un-match IF or loop" ;

: ABORT"  ( ? --) \ restores normal Forth interpreter
   DUP IF  ['] <INTERP> 'IV ! THEN  POSTPONE ABORT" ; IMMEDIATE

HEX
\ *** changed for kernel V2.69 ***
\ Words in scratchpad RAM end in a JMP instruction, not NEXT
\ Might change this, but for now make some conventional versions.
CODE DUP    0646 , C584 ,  NEXT, ENDCODE
CODE DROP   C136 ,         NEXT, ENDCODE
CODE !      C536 , C136 ,  NEXT, ENDCODE
CODE @      C114 ,         NEXT, ENDCODE
CODE C@     D114 , 0984 ,  NEXT, ENDCODE
CODE +      A136 ,         NEXT, ENDCODE

\ CFA of a Forth word contains the BL @xxxx instruction
: FORTH? ( xt -- ?)  @ 06A0 = ;

\ Heap management
: THERE  ( -- addr) H @ ;  \ returns end of Target memory in HEAP
: HALLOT ( n -- )   H +! ; \ Allocate n bytes of target memory.
: T,     ( n -- )   THERE ! 2 HALLOT ;  \ "target compile" n into heap
: NEW-HEAP ( -- ) 2000 2000 0 FILL   2000 H ! ; \ reset HEAP

045A CONSTANT 'NEXT'  \ 9900 CODE for B *R10   Camel99 Forth's NEXT code

: CODE,  ( CODExt --)  \ Read code word from kernel, compile to HEAP
           80 CELLS  ( -- addr len)
           BOUNDS ( -- IPend IPstart)
           BEGIN
              DUP @ 'NEXT' <>  \ the instruction is not 'NEXT'
           WHILE
             DUP @  ( -- IP instruction)
             T,     \ compile instruction
             CELL+  \ advance to next instruction
             2DUP < ABORT" End of code not found"
           REPEAT
           2DROP
;

: COMPILES ( <codeword> ) POSTPONE [']  POSTPONE CODE, ; IMMEDIATE

\ now we can steal code word from the kernel and compile it to target memory
: DUP,   COMPILES DUP ;

HEX
C136 CONSTANT 'DROP'   \ 9900 machine code for DROP (*SP+ R4 MOV,)

\ Forth primitive compilers
: DROP,  COMPILES DROP ;
: 1-,    ( n -- n') COMPILES 1-  ;     \ TOS DEC,
: TOS!,   ( n --)  0204 T,  ( n) T, ;  \ LI R4,nnnn
: LIT,   ( -- n) DUP,  TOS!, ;         \ compile n as literal in TOS

\ store a byte offset in odd byte of addr.
\ Addr is the location of Jump instruction
: RESOLVE ( addr offset --) 2- 2/ SWAP 1+ C! ;

\ compute offset from addr addr' & complete the jump instruction
: <BACK   ( addr addr' -- ) TUCK -  RESOLVE ;

: ?BYTE ( c -- c)  DUP FF00 AND ABORT" Jump out of range" ;

\ compile jump instructions with no offset
: JMP,  ( c --) ?BYTE 1000 + T, ;
: JNO,  ( c --) ?BYTE 1900 + T, ;
: JNC,  ( c --) ?BYTE 1700 + T, ;

\ --------[ JIT compilers for LOOPS and BRANCHING ]---------
\ When the JITS wordlist is placed 1st in the search order
\ these immediate words will run in place of their normal Forth versions.
\ These version compile machine code into the HEAP.
\ In the case of DO it "steals" the code for setting up the loop
\ from the kernel.

VOCABULARY JITS
ALSO JITS DEFINITIONS

: BEGIN    THERE >CS ; IMMEDIATE  \ push location onto control stack

\ <DO> is CODE preamble to setup return stack.
: DO  ( -- there) COMPILES <DO>   POSTPONE BEGIN ; IMMEDIATE

: LOOP
     0597 T,                   \ *RP INC,
     CS> THERE  0 JNO, <BACK   \ compute offset, compile into JNO
     COMPILES UNLOOP           \ collapse stack frame
; IMMEDIATE

: +LOOP
     0A5CA T,         \ TOS *RP ADD,
     DROP,            \ don't need TOS value anymore
     POSTPONE LOOP    \ compile loop code
; IMMEDIATE

: AGAIN   CS> THERE 0 JMP, <BACK ; IMMEDIATE

: UNTIL
          1-,
          DROP,
          CS> THERE 0 JNC, <BACK ; IMMEDIATE

: IF    ( n -- )
           1-,       \ If tos=0, DEC will cause a carry
           DROP,
           THERE >CS 0 JNC, ; IMMEDIATE

: THEN     CS> THERE OVER - RESOLVE ; IMMEDIATE

: ELSE     THERE >CS  0 JMP,
            CS>SWAP POSTPONE THEN ; IMMEDIATE

: WHILE   ( n -- ) POSTPONE IF CS>SWAP ; IMMEDIATE

: REPEAT   POSTPONE AGAIN  POSTPONE THEN ; IMMEDIATE

\ --------------------------------------------------------------


FORTH DEFINITIONS
\ In DTC ad CODE word does not begin with BL instruction
: CODE? ( XT -- ?)  FORTH? 0= ;

: JITCOMPILE,  ( xt -- )
          DUP CODE? IF  CODE,   EXIT  THEN  \ compile kernel code & exit
  \ xt is not a code word...
          DUP CELL+ @   \ FETCH the "executor" address
          CASE ( data words )
             _DOVAR  OF  >BODY   LIT, ENDOF
             _DOCON  OF  EXECUTE LIT, ENDOF
             _DOUSER OF  EXECUTE LIT, ENDOF
             TRUE ABORT" Can't optimize word"  ( any other Forth word bombs)
         ENDCASE
;

\ new interpreter loop
\ 1. Executes Immediate words: BEGIN WHILE IF THEN etc.
\ 2. Steals kernel code and compiles any CODE word.
\ 3. ABORTS if you try to compile a Forth word
\ 4. Compiles literal numbers with LI instruction

: <JIT> ( -- addr)
         'SOURCE 2!  >IN OFF
          BEGIN
             BL WORD  DUP C@
          WHILE
              FIND ?DUP
              IF ( it's a word)
                  1+ STATE @ 0= OR
                  IF   EXECUTE      \ IF BEGIN etc. are all immediate
                  ELSE JITCOMPILE,  \ code and data are not
                  THEN
              ELSE
                  COUNT NUMBER? ?ERR
                  STATE @ IF LIT, THEN  \ special number compiler
              THEN
              DEPTH 0< ABORT" JIT: Underflow"
          REPEAT
          DROP
;

: JIT: ( -- JIT-xt)
        ALSO JITS         \ put JITS VOCABULARY first in search order
        :
         THERE ( -- XT)   \ execution token (XT) for the NEW compiled code
         DUP CELL+ T,     \ create the ITC header for a CODE word
         ['] <JIT> 'IV !  \ switch to JIT compiler
;

: ;JIT ( JIT-XT -- )
        PREVIOUS          \ restore previous search order
        ['] <INTERP> 'IV !
         'NEXT' T,        \ compile NEXT at end of new code word
         ,                \ compile CODE word's XT into Forth definition
        ?CS
        POSTPONE ;
; IMMEDIATE

HERE SWAP - SPACE DECIMAL . .( bytes)

 

 

When I JIT compiled the sieve and ran it under DTC Forth it was .8 seconds slower than when I run the same code in ITC Forth. 🤔

I don't why exactly but it could be the extra BL at the start of every hi-level word eats more time than I ever realized.

But program spends most of its time running the machine code in DO-PRIME.  ??

 

-------------

One small change I have made is to create the word COMPILES. This makes it super easy to steal code primitives from the Forth kernel.

045A CONSTANT 'NEXT'  \ 9900 CODE for B *R10   Camel99 Forth's NEXT code

: CODE,  ( CODExt --)  \ Read code word from kernel, compile to HEAP
           80 CELLS  ( -- addr len)
           BOUNDS ( -- IPend IPstart)
           BEGIN
              DUP @ 'NEXT' <>  \ the instruction is not 'NEXT'
           WHILE
             DUP @  ( -- IP instruction)
             T,     \ compile instruction
             CELL+  \ advance to next instruction
             2DUP < ABORT" End of code not found"
           REPEAT
           2DROP
;

: COMPILES ( <codeword> ) POSTPONE [']  POSTPONE CODE, ; IMMEDIATE

 


With COMPILES we can steal code from the kernel and compile it inline. 


: DUP,   COMPILES DUP ;
: +,      COMPILES + ;

 

CODE 2*   DUP,  +,  NEXT,  ENDCODE 

 

 

 

  • Like 2
Link to comment
Share on other sites

I think I will start using Visual Studio Code since the Atom editor is being put out to pasture.

So far VSC works great and there is even a Forth language hi-lighter. :)

 

Sunsetting Atom

We are archiving Atom and all projects under the Atom organization for an official sunset on December 15, 2022.

 

 

 

  • Like 1
Link to comment
Share on other sites

On 12/1/2022 at 6:22 AM, Willsy said:

Anyway, from a machine code point of view, how would you tackle (for example) this:

 

: test 1 2 + ;

 

Which (naively) would be (pseudo code):

 

  • push 1
  • push 2
  • call +
  • push result

Into:

 

  • LI R0, 3  ; result can be calculated at compile time since both parameters are constants
  • MOV R0, *STACK+ ; push result

Which is all that would be required in an optimised form. Interested to hear your thoughts. 

Getting back to this question from @Willsy

 

An idea that I learned from Tom Almy's native code compiler is to create a literal stack.

So at compile time it would work like this (I think):

  • 1   -> literal stack (no code generated)
  • 2   -> literal stack (no code generated) 
  • +    is smart with 3 cases:
    • nothing on literal stack: Do normal Forth +
    • One literal stack item: Load into temp register add temp to TOS 
    • Two literal stack items:  Add lit. stack items. Compile as a LIT

 

Link to comment
Share on other sites

This code works for a test of this idea.

To use it in a functional compiler we would need to make a new interpreter loop that puts literals on the literal stack and all the math operators need a case statement which gets pretty big.

 

\ constant folding tests
NEEDS WORDLIST FROM DSK1.WORDLISTS

ONLY FORTH DEFINITIONS
NEEDS DUMP  FROM DSK1.TOOLS 
NEEDS LIFO: FROM DSK1.STACKS 

VOCABULARY ASSEMBLER 

ONLY FORTH ALSO ASSEMBLER DEFINITIONS
NEEDS MOV,  FROM DSK1.ASM9900

\ Assembler psuedo instruction macros 
: LOAD,  ( n register -- )  SWAP LI,  ;
: LIT,   ( n ) TOS PUSH,  TOS LOAD,  ;

ONLY FORTH DEFINITIONS ALSO ASSEMBLER 

8 LIFO: LITSTK 
: >LS   ( n --) LITSTK PUSH ;
: LS>   ( -- n) LITSTK POP ;
: LITS  ( -- n ) LITSTK  STACK-DEPTH ;

: OPT+  
    LITS 
    CASE 
        0 OF  *SP+ TOS ADD,    ENDOF 

        1 OF   LS> R3 LOAD,   
               R3 TOS ADD,     ENDOF 
        
        2 OF   LS> LS> +   \ add at compile time
               LIT,            ENDOF 
    ENDCASE    
;

CODE TEST0 ( n n -- n) \ 2 ARGS on data stack 
     OPT+
     NEXT,
ENDCODE      

CODE TEST1 ( n -- n) \ 1 arg on data stack, 1 arg on lit stack 
    99 >LS 
    OPT+ 
    NEXT,
ENDCODE    

CODE TEST2 ( -- n) \ 2 args on lit stack. constant folding  
    1 >LS
    2 >LS 
    OPT+ 
    NEXT,
ENDCODE    


 

Link to comment
Share on other sites

I back migrated ideas from SUPERSAVE to the SAVESYS word. SAVESYS now checks the H variable (heap pointer) and will save that memory as well.

This is very nice if you have a pile of data in your program that was compiled into the HEAP. Now you don't need to put a file loader in the program.

Long overdue but better late than never. 

 

I will dump my current library of source code text files on Github tonight.

 

\ ----------------
\  TEST CODE
INCLUDE DSK1.MALLOC

HEX 800 MALLOC CONSTANT MYBUFFER \ mybuffer is in Low RAM

MYBUFFER 800  CHAR $ FILL     \ put something in memory to see it work
   
: GO   WARM  ABORT ;          \ minimum startup code to start Forth interpreter

LOCK                          \ lock dictionary to current size on re-boot

INCLUDE DSK1.SAVESYS 

' GO SAVESYS DSK7.TESTKERNEL  \ CODE @ >A000 and DATA @ >2000 will be saved as E/A5 program

 

Quote
CR .( SAVESYS.FTH  creates EA5 program Jun 2022 B Fox)
\ creates a binary program E/A 5 format.
\ Makes as many files as needed to save the system
\ Jun 2022 version fixed section overlap. Tested with check sum.
\ Dec 2022 saves the HEAP (Low RAM) as a file if variable H <> 0

\ Usage example:
\  INCLUDE DSK2.MYPOGRAM   ( load all your code)
\  : STARTUP     WARM   CR ." Myprogram ready"  ABORT" ;
\  LOCK   ( this locks the dictionary to the current size )
\
\   INCLUDE DSK1.SAVESYS
\  ' STARTUP SAVESYS DSK3.MYFILENAME

\ NEEDS DUMP      FROM DSK1.TOOLS
NEEDS LOCK      FROM DSK1.MARKER
NEEDS LOAD-FILE FROM DSK1.LOADSAVE  \ we use SAVE-FILE from this library
NEEDS U.R       FROM DSK1.UDOTR

HERE
HEX
A000 CONSTANT HIMEM     \ start of Camel99 Forth program in CPU RAM
1000 CONSTANT VDPBUFF  \ Programs write to file from VDP Ram
2000 CONSTANT LOWRAM
2000 CONSTANT 8K
8K 3 CELLS - CONSTANT IMGSIZE \ makes space for header cells
  13 CONSTANT PROGRAM     \ file mode for Program files

\ define the file header fields. *THESE ARE VDP ADDRESSES*
VDPBUFF            CONSTANT MULTIFLAG
VDPBUFF  1 CELLS + CONSTANT PROGSIZE
VDPBUFF  2 CELLS + CONSTANT LOADADDR
VDPBUFF  3 CELLS + CONSTANT CODEBUFF  \ COPY 8K program chunks to here
         3 CELLS   CONSTANT HEADLEN

: HEADER  ( Vaddr size ?) \ store header info in VDP RAM
    MULTIFLAG V!  PROGSIZE V!  LOADADDR V! ;

: END  ( -- addr )
  ORGDP @ DUP C000 < IF HONK CR ." WARNING: missing LOCK directive" THEN ;

 \ words to compute Forth system properties
: SYS-SIZE    ( -- n)  HIMEM  END  SWAP - ;
: #FILES      ( -- n)  SYS-SIZE 8K /MOD SWAP IF 1+ THEN ;
: CODECHUNK   ( n -- addr) IMGSIZE * HIMEM + ;
: CHUNKSIZE   ( n -- n ) CODECHUNK END SWAP -  IMGSIZE MIN ;
: LASTCHAR++  ( Caddr len --)  1- +  1 SWAP C+! ;
: HEAPSIZE    ( -- n)  H @ LOWRAM - ;
: ?PATH  ( addr len -- addr len )
  2DUP  [CHAR] . SCAN NIP 0= ABORT" Path expected" ;

: GET-PATH    ( <text>) BL PARSE-WORD ?PATH  PAD PLACE ;

: FILENAME    ( -- addr len) PAD COUNT ;

VARIABLE FILECOUNT

: SAVE-IMAGE ( addr len Vaddr size -- )
    CR ." Writing file: " FILENAME TYPE
    HEADLEN +  PROGRAM SAVE-FILE
    FILENAME LASTCHAR++
    FILECOUNT 1+! ;

: SAVELO ( -- )
    HEAPSIZE
    IF
        LOWRAM HEAPSIZE DUP>R FALSE HEADER \ heap is last file saved
        LOWRAM CODEBUFF R@ VWRITE          \ copy HEAP to VDP
        FILENAME VDPBUFF R> SAVE-IMAGE
    THEN ;

HEX
: SAVEHI ( -- )
    #FILES 0
    ?DO
      \ compute file header values
       I CODECHUNK  I CHUNKSIZE       ( -- addr size )
       I 1+ #FILES <>  HEAPSIZE 0> OR \ multiflag=true if heap has data
       ( addr size ?) HEADER          \ store in file header
      \ Copy to VDP RAM
       LOADADDR V@  CODEBUFF  PROGSIZE V@ HEADLEN +  VWRITE
      \ write VDP to disk"
       FILENAME   VDPBUFF   PROGSIZE V@  SAVE-IMAGE
    LOOP
;
: .BYTES&ADDR ( addr size --)
   DECIMAL 5 U.R ."  bytes, at " HEX ." >" 4 U.R ;

: REPORT
    CR
    CR ." Himem : "  HIMEM  ORGDP @ OVER -  .BYTES&ADDR
    CR ." Heap  : "  LOWRAM  HEAPSIZE  .BYTES&ADDR
    CR ." Saved in " FILECOUNT @ .  ." EA5 files"
    CR
;

: SAVESYS ( xt -- <path>)
    BOOT !
    FILECOUNT OFF
    GET-PATH  SAVEHI  SAVELO REPORT ;

HERE SWAP - CR DECIMAL . .( bytes)

 

 

Edited by TheBF
fixed comment
  • Like 3
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...