
Camel99 Forth Information goes here


TheBF


4 minutes ago, Vorticon said:

Well nuts. This is what I did:

bcd2hex mov     r4,r5           ;isolate low digit
        andi    r5,000fh
        mov     r4,r6           ;isolate high digit
        andi    r6,00f0h
        srl     r6,4
        mov     r6,r4           ;save original value
        sla     r6,3            ;multiply by 8
        sla     r4,1            ;multiply original value by 2
        a       r4,r6           ;high value has now been multiplied by 10
        a       r5,r6           ;add low and high numbers. r6 now has hex value
        b       *r11

 

Welcome to my club. :-)

If Pascal has a hex output procedure, or you have written one, you can use that on each byte.  The clock just never counts past 9 for any digit.


Taking @Vorticon 's advice I implemented words to handle the UTI. (no, not a urinary tract infection) 

In this case the Update transfer inhibit bit. 

 

I also added words to set the time and date, then went wild and added words to generate a time and date string which could be used to stamp a report.

There are also words to convert BCD to single precision integers and the reciprocal operator as well. 
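
To make the BCD conversion concrete, here is a minimal sketch that should load on any standard Forth. It assumes one packed BCD byte (two digits) and writes the masks in decimal so the definitions do not depend on the current number base. It is an illustration only, not a replacement for the listing.

```forth
DECIMAL
\ Packed BCD byte -> binary: low nibble + 10 * high nibble
: BCD>S ( bcd -- n )   DUP 15 AND  SWAP 240 AND 4 RSHIFT  10 * + ;

\ Binary 0..99 -> packed BCD byte: tens digit goes in the high nibble
: S>BCD ( n -- bcd )   10 /MOD  4 LSHIFT + ;

HEX 59 DECIMAL BCD>S .        \ the clock byte >59 prints as 59
59 S>BCD HEX . DECIMAL        \ 59 prints as 59 in hex, i.e. the byte >59
```

Because no BCD digit ever exceeds 9, printing the raw byte in HEX shows the same digits as printing the converted value in DECIMAL.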

 

It would not be too hard to port this stuff to the other systems.  My consultation services are available and my regular rates apply. 🙂

 

(For the Forth student it could be worth looking at how we get string concatenation here, reverse polish style, without including a whole string library.)
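
For anyone who wants to try the concatenation idea outside Camel99, here is a small sketch for a standard Forth such as gforth. C+! is not an ANS word, so it is defined here as an assumption about what the Camel99 word does (add n to the byte at addr). DEMO$ and its fixed digit strings are made up for the demonstration.

```forth
\ add n to the byte stored at addr (assumed behaviour of Camel99's C+!)
: C+!     ( n addr -- )  DUP >R  C@ +  R> C! ;

\ append (addr n) to the counted string at $addr, then bump its count byte
: +PLACE  ( addr n $addr -- )  2DUP 2>R  COUNT +  SWAP MOVE  2R> C+! ;

\ "and": append a stack string to the counted string held in PAD
: &       ( addr len -- )  PAD +PLACE ;

: DEMO$   ( -- addr len )
    0 PAD C!                          \ start with an empty counted string
    S" 10" &  S" :" &  S" 57" &  S" :" &  S" 40" &
    PAD COUNT ;

DEMO$ TYPE    \ prints 10:57:40
```

Each `&` consumes one stack string, so the pieces are simply written in reading order, one after the other.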

 

\ IDECLOCK.FTH       July 2024 B FOX 

\ INCLUDE DSK1.TOOLS  ( DEBUG only)

\ clock registers in memory 
HEX 
4020 CONSTANT secs   \ Register 0: Seconds. Valid values: >00 to >59.
4024 CONSTANT mins   \ Register 2: Minutes. Valid values: >00 to >59.
4028 CONSTANT hrs    \ Register 4: Hours. Valid values: >00 to >23, 

402C CONSTANT day    \ Register 6: Day of the month. Valid values: >01 through >31.
4032 CONSTANT month  \ Register 9: Month. Valid values >01 to >12.
4034 CONSTANT year   \ Register 10: Year. Valid values >00 to >99.

403C CONSTANT ctrl   \ Register 14: Control register

1000 CONSTANT IDECARD 

DECIMAL
12 2* USER CRU   \ address of R12 in any Camel Forth workspace 

\ Machine code CRU words
HEX
CODE 0SBO  ( -- ) 1D00 ,  NEXT, ENDCODE
CODE 0SBZ  ( -- ) 1E00 ,  NEXT, ENDCODE

CODE 1SBO  ( -- ) 1D01 ,  NEXT, ENDCODE
CODE 3SBO  ( -- ) 1D03 ,  NEXT, ENDCODE

\ BCD integer conversions 
: BCD>S ( bcd -- n ) \ "BCD to single"
    DUP  0F AND
    SWAP F0 AND
    4 RSHIFT 0A * + ;

DECIMAL
: S>BCD ( n -- bcd )  10 /MOD  4 LSHIFT + ;

\ Update Transfer Inhibit
: UTI-ON   ( -- ) ctrl DUP C@ 8 OR SWAP  C! ;
: UTI-OFF  ( -- ) ctrl DUP C@ 7 AND SWAP C! ;

: CLOCK-ON 
    IDECARD CRU !  
    0SBO            
    1SBO            \ enable mapping of >4000 - >40ff space
    3SBO            \ fixed page at >4000 - >40ff
;

: TIME@ ( -- secs min hrs)
    CLOCK-ON 
    UTI-ON 
    secs C@  mins C@  hrs C@ 
    UTI-OFF 
    0SBZ            \ card off 
;     

: DATE@ ( -- day month yr )
    CLOCK-ON 
    day C@ month C@ year C@ 
    0SBZ            \ card off
;
  
\ formatted output 

\ convert n into a 2 digit string. can be used to create a time/date string 
:  (##)  ( n -- addr len ) 0 <#  # #  #>  ;

: .##    ( n --) (##) TYPE ;

: .TIME  ( -- ) 
    BASE @ >R  
    HEX TIME@ .## ." :" .## ." :" .## 
    R> BASE ! ;         

: .DATE  
   BASE @ >R HEX 
   DATE@ .## ." /"  .## ." /" .## 
   R> BASE !  
;

: .DATE&TIME   .DATE  SPACE .TIME ;

\ +PLACE  concatenates (addr n) to counted string $addr
: +PLACE  ( addr n $addr -- ) 2DUP 2>R  COUNT +  SWAP MOVE 2R> C+! ;

\ syntax sugar: concatenate stack string to pad  
: &       ( addr len -- ) PAD +PLACE ; 

\ usage:  TIME@ >TIME$ 
: >TIME$  ( secs mins hrs -- addr len) 
    PAD OFF  
    BASE @ >R 
    HEX (##) &  S" :" &  (##) & S" :" &  (##) & 
    R> BASE !  

    PAD COUNT ;

\ Usage: DATE@ >DATE$ 
: >DATE$  ( day month year -- addr len) 
    PAD OFF  
    BASE @ >R 
    HEX (##) &  S" /" &  (##) &  S" /" &  (##) & 
    R> BASE !  

    PAD COUNT ;

\ Set the clock words 
DECIMAL
: TIME! ( hr min sec -- )
       CLOCK-ON  
       UTI-ON 
       S>BCD secs C!  
       S>BCD mins C!       
       S>BCD hrs  C!     
       UTI-OFF 
       0SBZ ;         \ card off

: DATE! ( yr month day -- )
       CLOCK-ON  
       UTI-ON 
       S>BCD day C!  
       S>BCD month C!       
       S>BCD year  C!     
       UTI-OFF 
       0SBZ ;         \ card off

 


1 hour ago, Vorticon said:

I noticed you omitted fetching the day of week at >4030 (Sunday = 1). You might as well add it to make your package complete :)

Good catch. I have all the "words" to do it now. 


So here is an even bigger version.  You would probably never use all of this. It compiles to 1418 bytes.

But it has examples of how to manage this kind of stuff in Forth. 

 

Some concepts explored here:

I changed things up and used create/does>  to make clock "field readers" that all use the same code. This saves space.

Created words that make it pretty simple to make the date format anything you like. 

Used compact string arrays that pack the strings end to end for months and days of the week
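
As a rough illustration of the compact string array idea, here is a sketch for a standard Forth such as gforth. It is simplified on purpose: there is no cell alignment between strings (the Camel99 listing uses ALIGNED), S, is defined locally as an assumption about what the Camel99 word does, and the three-entry MONTHS table is only a demo.

```forth
\ compile a counted string into the dictionary: count byte, then the text
: S,    ( addr len -- )  DUP C,  HERE OVER ALLOT  SWAP MOVE ;

\ walk n strings forward, using each count byte as the link to the next
: NTH$  ( $array n -- addr len )  0 ?DO  COUNT +  LOOP  COUNT ;

CREATE MONTHS   S" Jan" S,  S" Feb" S,  S" Mar" S,
ALIGN           \ leave the dictionary pointer aligned for later definitions

MONTHS 0 NTH$ TYPE    \ prints Jan
MONTHS 2 NTH$ TYPE    \ prints Mar
```

The table costs one byte per string beyond the text itself, at the price of a linear walk on every lookup. ?DO is used here so that index 0 skips the loop entirely.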

Spoiler
\ IDECLOCK.FTH       July 2024 B FOX

\ INCLUDE DSK1.TOOLS  ( DEBUG only)

HERE 
\ clock registers in memory 
HEX 
4020 CONSTANT secs   \ Register 0: Seconds. Valid values: >00 to >59.
4024 CONSTANT mins   \ Register 2: Minutes. Valid values: >00 to >59.
4028 CONSTANT hrs    \ Register 4: Hours. Valid values: >00 to >23, 

402C CONSTANT day    \ Register 6: Day of the month. Valid values: >01 through >31.
4030 CONSTANT dow    \ day of the week
4032 CONSTANT month  \ Register 9: Month. Valid values >01 to >12.
4034 CONSTANT year   \ Register 10: Year. Valid values >00 to >99.

403C CONSTANT ctrl   \ Register 14: Control register

1000 CONSTANT IDECARD 

DECIMAL
12 2* USER CRU   \ address of R12 in any Camel Forth workspace 

\ Machine code CRU words
HEX
CODE 0SBO  ( -- ) 1D00 ,  NEXT, ENDCODE
CODE 0SBZ  ( -- ) 1E00 ,  NEXT, ENDCODE

CODE 1SBO  ( -- ) 1D01 ,  NEXT, ENDCODE
CODE 3SBO  ( -- ) 1D03 ,  NEXT, ENDCODE

\ BCD integer conversions 
: BCD>S ( bcd -- n ) \ "BCD to single"
    DUP  0F AND
    SWAP F0 AND
    4 RSHIFT 0A * + ;

DECIMAL
: S>BCD ( n -- bcd )  10 /MOD  4 LSHIFT + ;

\ Update Transfer Inhibit
: UTI-ON   ( -- ) ctrl DUP C@ 8 OR SWAP  C! ;
: UTI-OFF  ( -- ) ctrl DUP C@ 7 AND SWAP C! ;

: CLOCK-ON 
    IDECARD CRU !  
    0SBO            
    1SBO            \ enable mapping of >4000 - >40ff space
    3SBO            \ fixed page at >4000 - >40ff
;

\ creates "field readers" that read a clock field using common code. 
\ slower reads, but saves a lot of space
: CLOCK-FIELD:   
    CREATE  ,     
    DOES> 
        CLOCK-ON 
        UTI-ON 
        @         \ get the clock address from this word
        C@ BCD>S  \ read the address and convert
        UTI-OFF 
        0SBZ ;    \ card off 

\ define the field readers 
 secs CLOCK-FIELD: SECS@ 
 mins CLOCK-FIELD: MINS@ 
  hrs CLOCK-FIELD: HRS@  
  dow CLOCK-FIELD: DOW@ 
  day CLOCK-FIELD: DAY@ 
month CLOCK-FIELD: MONTH@  
 year CLOCK-FIELD: YEAR@ 

: TIME@ ( -- secs min hrs)    SECS@ MINS@ HRS@ ;
: DATE@ ( -- day month year) DAY@ MONTH@ YEAR@ ;  

\ formatted output 

\ returns a string. can be used to create a time/date string 
:  (##)   ( n -- addr len ) 0 <#  # #  #>  ;

: .##    (##) TYPE ;


\ +PLACE  concatenates (addr n) to counted string $addr
: +PLACE  ( addr n $addr -- ) 2DUP 2>R  COUNT +  SWAP MOVE 2R> C+! ;

\ syntax sugar: concatenate stack string to pad  
: &       ( addr len -- ) PAD +PLACE ; 

\ usage:  TIME@ >TIME$ 
: >TIME$  ( secs mins hrs -- addr len) 
    PAD OFF  
    BASE @ >R 
    DECIMAL (##) &  S" :" &  (##) & S" :" &  (##) & 
    R> BASE !  

    PAD COUNT ;

\ Usage: DATE@ >DATE$ 
: >DATE$  ( day month year -- addr len) 
    PAD OFF  
    BASE @ >R 
    DECIMAL (##) &  S" /" &  (##) &  S" /" &  (##) & 
    R> BASE !  

    PAD COUNT ;

\ Set the clock words 
DECIMAL
: TIME! ( hr min sec -- )
       CLOCK-ON  
       UTI-ON 
       S>BCD secs C!  
       S>BCD mins C!       
       S>BCD hrs  C!     
       UTI-OFF 
       0SBZ ;         \ card off

: DATE! ( yr month day -- )
       CLOCK-ON  
       UTI-ON 
       S>BCD day C!  
       S>BCD month C!       
       S>BCD year  C!     
       UTI-OFF 
       0SBZ ;         \ card off

\ OUTPUT WORDS 
: .TIME  ( -- ) 
    BASE @ >R  
    DECIMAL TIME@ .## ." :" .## ." :" .## 
    R> BASE ! ;         

\ compact string array. Uses count byte as link to next string.
: NTH$ ( $array n -- address len )
  0 DO  COUNT +  ALIGNED  LOOP COUNT ;

CREATE MONTHS
  S"  " S,    S" Jan" S,
  S" Feb" S,  S" Mar" S,
  S" Apr" S,  S" May" S,
  S" Jun" S,  S" Jul" S,
  S" Aug" S,  S" Sep" S,
  S" Oct" S,  S" Nov" S,
  S" Dec" S,  0 ,

: ]MONTH  ( n -- addr len)
  DUP 13 1 WITHIN ABORT" Bad month#"
  MONTHS SWAP NTH$ ;

CREATE DAYS
  S"  " S,
  S" Sunday" S,
  S" Monday" S,
  S" Tuesday" S,
  S" Wednesday" S,
  S" Thursday" S,
  S" Friday" S,
  S" Saturday" S,
  0 ,

: ]DAY ( n -- addr len ) 
    DUP 8 1 WITHIN ABORT" Bad day#"   \ valid days are 1..7 (Sunday = 1)
    DAYS SWAP NTH$ ;

: .DOW     ( --) DOW@ ]DAY TYPE ;
: .MONTH   ( --) MONTH@ ]MONTH TYPE ;
: .YEAR    ( --) YEAR@ 20 .##  .## ;

: .M/D/Y      ( -- ) MONTH@  .## ." /" DAY@ .## ." /"  YEAR@ .## ;
: .Y-M-D      ( -- ) .YEAR ." -" MONTH@ .## ." -" DAY@ .## ;
: .D.M.Y      ( -- ) DAY@ .## ." ." MONTH@ .## ." ." YEAR@ .## ;
: .USADATE    ( -- ) .MONTH SPACE  DAY@ .## ." , "  .YEAR ;
: .FORTH-DATE ( -- ) DAY@ .##  .MONTH  YEAR@ .## ;
: .LONG-DATE  ( -- )  .DOW  ." , "  .MONTH SPACE DAY@ .## ." , "  .YEAR ;  
: .STAMP       ( -- )  .LONG-DATE ." , " .TIME ;
HERE SWAP - DECIMAL . .( bytes)
\ TEST
CR 
.M/D/Y  CR 
.Y-M-D  CR
.D.M.Y CR 
.USADATE  CR 
.FORTH-DATE  CR 
.LONG-DATE  CR 
.STAMP  CR 

 

 

[Image: COM1 - Tera Term VT 2024-07-18 10_57_40 AM.png]


You really went all out, didn't you :lol: An equivalent idea in Pascal would be to create a unit named CLOCK which would have a set of procedures that format and display the date and/or time in a variety of ways while accepting x,y screen coordinates as input. You just gave me another thing to do!


1 hour ago, Vorticon said:

You really went all out, didn't you :lol: An equivalent idea in Pascal would be to create a unit named CLOCK which would have a set of procedures that format and display the date and/or time in a variety of ways while accepting x,y screen coordinates as input. You just gave me another thing to do!

 

Confession time.  I found a file from 1991 that I had written for HsForth to start with. 

But I made a lot of changes to my thinking from 33 years ago! Wow, time is relentless.

 

You mean like this:

: XY.TIME  ( col row -- )  AT-XY .TIME ; 
: XY.DATE  ( col row -- )  AT-XY .DATE ; 

;)

 


I always wondered how few chips it would take to make a Forth CPU. This is cool.

I think the world should credit Chuck's work properly. 

There are Von Neumann architectures and there are Harvard Architectures but nobody talks about the "Moore Architecture".

Memory, 2 stacks and I/O. 

 

I think his best CPU design was ShBoom. :) 

 

Chip Hall of Fame: Computer Cowboys Sh-Boom Processor - IEEE Spectrum


2 hours ago, Vorticon said:

Saw this today. Pretty cool.

 

I need more life span. :) 

 

I started spinning on what it would look like to build one of these 7400 CPUs but incorporate a couple of 74181 ALU chips to speed up math operations (74181 - Wikipedia).

AND...

Steal the 9900 idea of using RAM for the registers, except the register window would actually be the data stack and return stack (32 cells each, i.e. 128 bytes in total).

Internally with this design you would only need a workspace register, a data stack register (8 bits) and a return stack register (8 bits).

Like Camel99 Forth, workspace register could be used as the base address for the "user area" where task local variables are stored.

 

And on it goes down the rabbit hole... 

 


  • 2 weeks later...

In the process of working on the VI99 editor I of course began to wonder how I could hold bigger files.

The original VI used a swap file and I could go that way using BLOCK so that's a consideration.

 

Of course the super power is in the SAMS card.  Previously for ED99 I made a 2 "window" memory manager so you could freely copy records from anywhere to anywhere in a SAMS 64K segment. 

 

I went down the rabbit hole of how would I do that on a character by character basis.  My solution was to wrap the entire thing into a sub-program with its own workspace.

I used registers as variables and constants to try and keep things as speedy as possible. But of course now these registers are in expansion RAM so they are slower. 

The manager as is compiles to 464 bytes. 

It works but I don't know if the editor could tolerate moving data at Forth speeds. 

 

Anyway here it is.

I like it but it probably needs much faster FILL and CMOVE code words that somehow use the SAMS manager differently. 

 

Spoiler
\ SAMS 2 window virtual memory system as SUB-PROGRAM  2024 Brian Fox

\ This version replaces variables with registers in a workspace.


NEEDS DUMP FROM DSK1.TOOLS
NEEDS MOV, FROM DSK1.ASM9900
NEEDS ELAPSE FROM DSK1.ELAPSE 

NEEDS SAMSINI  FROM DSK1.SAMSINI

HERE
 
DECIMAL
CREATE SAMSWKSP   16 CELLS ALLOT 

\ init the registers at compile time for debugging
SAMSWKSP 16 CELLS 255 FILL   \ sized in DECIMAL: under HEX, 16 CELLS would be 22 cells
HEX

\ name R0 for easy Forth access 
SAMSWKSP CONSTANT SEG  

\ set the segment in R0 like this
1 SEG ! 

\ store with memory increment 
: !++   ( addr n -- addr') OVER ! CELL+ ;

: INIT-SAMSWKSP 
    SAMSWKSP  
       1 !++     \ R0 = segment
    2000 !++     \ R1 = last window used 
      0  !++     \ R2  page# in window0 
      0  !++     \ R3  page# in window1
    2000 !++     \ R4 low ram window0
    3000 !++     \ R5 low ram window1 
    DROP 
;     

INIT-SAMSWKSP

\ ==========================================
\ REGISTER Usage in SAMSWKSP
\ separate workspace allows register renaming for code clarity 

R0 CONSTANT SEG#    \ 64K segment in use  
R1 CONSTANT USED    \ last used memory WINDOW ( 2000 or 3000 ) 
R2 CONSTANT PAGE0   \ SAMS page number in window 0 (variable)
R3 CONSTANT PAGE1   \ SAMS page number in window 1 (variable)
R4 CONSTANT WINDOW0 \ WINDOW0 address (constant)
R5 CONSTANT WINDOW1 \ WINDOW1 address (constant)
R6 CONSTANT OFFSET  \ OFFSET into window (computed)
R7 CONSTANT PAGE#   \ SAMS PAGE# (computed)
\ R8    scratch 
\ R9    scratch 
\ R10   scratch 
\ R11   scratch 
\ R12   scratch 
\ R13   address of Forth workspace  
\ R14   Forth program counter
\ R15   Forth status  


\ address of Forth TOS register in Forth workspace 
: [TOS]  ( -- ) ?EXEC  8 R13 () ; IMMEDIATE

\ 9900 sub-program. NOT a Forth word. Called with BLWP 
CREATE _real ( virtaddr  --  realaddress)
    SAMSWKSP , HERE CELL+ ,  \ compile a vector to code below 

\ perform TOS 4096 /MOD to compute offset,page# 
    [TOS] OFFSET MOV,    \ virtaddress in r6 
    R6    PAGE# MOV,     \ dup in R7

    PAGE# 0C SRL,        \ R7 / 4096 = quot.
    PAGE#  4 SLA,        \ R7 = samspage# = quotient * 16

    OFFSET 0FFF ANDI,    \ R6 = offset = virtaddr MOD >1000
      
\ fast tests if page# is already in memory 
    PAGE# PAGE0 CMP,
    EQ IF,
        WINDOW0 [TOS] MOV,     \ use 1st window
         OFFSET [TOS] ADD,     \ add the offset
                      RTWP,    \ return to Forth
    ENDIF,
\ test if page# is in 2nd buffer
    PAGE# PAGE1 CMP, 
    EQ IF,
        WINDOW1 [TOS] MOV,    \ use 2nd window,
         OFFSET [TOS] ADD,    \ add the offset
                      RTWP,   \ return to Forth
    ENDIF,

\ ***********************************************
\ * page# not in memory:  *
\ ***********************************************
      USED WINDOW0 CMP,   \ last window was window0 ? 
      EQ IF,  ( yes so used window1)
            WINDOW1 USED MOV,   \ use window1 @ 3000  
            PAGE#  PAGE1 MOV,   \ remember this page#

      ELSE, ( must have used window1 )
            WINDOW0 USED MOV,   \ use window0 @ 2000
            PAGE#  PAGE0 MOV,   \ remember this page# is in window0
      ENDIF,  
      USED R9 MOV,      \ copy memory window in use to R9 

\ map new page into a RAM window         
         R9    0B SRL,   \ R9 = index into SAMS registers

         R12 1E00 LI,    \ cru address of SAMS
                0 SBO,   \ SAMS card on
            PAGE# SWPB,  \ swap bytes on page# argument
 PAGE# 4000 R9 () MOV,   \ load page# into SAMS card memory ("registers")
                0 SBZ,   \ SAMS card off

 \ return "real" address to Forth        
       USED [TOS] MOV,    \ move active window to Forth tos  
     OFFSET [TOS] ADD,    \ add the offset into the window 
                  RTWP,
\ end code 

\ test word 
CODE >REAL  ( a -- a' ) _real @@ BLWP,  NEXT, ENDCODE 

\ FAR memory operators convert virtual address to real address 
CODE !L ( n virtual -- ) 
     _real  @@ BLWP, 
      *SP+ *TOS MOV, 
            TOS POP, 
    NEXT, 
ENDCODE     

CODE C!L   ( c virtual -- ) 
    _real @@ BLWP, 
    1 (SP) *TOS MOVB,    
            SP INCT,  \ drop c from the memory stack
            TOS POP,  \ refill TOS, dropping the address 
    NEXT, 
ENDCODE 

CODE @L ( virtual -- n)
    _real @@ BLWP, 
    *TOS TOS MOV,
    NEXT,
ENDCODE     

CODE C@L ( virtual -- c)
    _real @@ BLWP, 
    *TOS TOS MOVB,
     TOS 8   SRL, 
    NEXT,
ENDCODE     
HERE SWAP - DECIMAL . .( bytes)

 

 

And here are some Forth memory movers and fillers. 

It takes 22 seconds to fill 64K bytes.  13 seconds to fill 32K "cells". 

: SAMSFILL ( addr len char --) 
    -ROT BOUNDS DO   DUP I C!L   LOOP DROP ;

: SAMSFILLW ( addr len U -- )
    -ROT BOUNDS DO  DUP I !L  2 +LOOP DROP ;

: SAMSCMOVE ( addr1 addr2 u --) \ 8 bit move. 
    BOUNDS DO  DUP C@L  I C!L CHAR+  LOOP DROP ;   

: SAMSMOVE ( addr1 addr2 u --) \ 16 bit move. 
    BOUNDS DO  DUP @L I !L CELL+  2 +LOOP  DROP ;   

 


I was staring at the code that performed 4096 /MOD but using bit shifting.

    PAGE# 0C SRL,        \ R7 / 4096 = quot.
    PAGE#  4 SLA,        \ R7 = samspage# = quotient * 16
    PAGE#    SWPB,       

 

Given a virtual address of >1234 

We get the result   >1000

 

It seems like a lot of shifts.

 

How about we replace all that with this?  :) 

       PAGE# F000 ANDI, 

 

And if we do this then the page# is in the correct byte for use by the SAMS card, so I removed another SWPB in that section. 

 

So now performing  "<address> 4096 /MOD"  takes 2 instructions. 

     PAGE#  F000 ANDI,    \ R7 = page# in low byte 
    OFFSET  0FFF ANDI,    \ R6 = offset = virtaddr MOD >1000

This speeds up writing 64K bytes by 5%.

I'll take it. 
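
The equivalence between the two masks and a real division can be checked in high-level Forth. This is only a sketch for a standard Forth; SPLIT is a made-up name, and it keeps the page in the top nibble exactly as the ANDI version does, where /MOD returns a small quotient instead.

```forth
HEX
\ two masks instead of a division: low 12 bits = offset, top nibble = page
: SPLIT  ( virt -- offset page-word )  DUP 0FFF AND  SWAP F000 AND ;

1234 SPLIT     . .     \ prints 1000 234  (page word, then offset)
1234 1000 /MOD . .     \ prints 1 234     (quotient, then remainder)
DECIMAL
```

The offsets agree, and the page word is just the quotient left in the top nibble of the 16-bit value.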


2 minutes ago, TheBF said:

I was staring at the code that performed 4096 /MOD but using bit shifting.

    PAGE# 0C SRL,        \ R7 / 4096 = quot.
    PAGE#  4 SLA,        \ R7 = samspage# = quotient * 16
    PAGE#    SWPB,       

 

Given a virtual address of >1234 

We get the result   >1000

 

It seems like a lot of shifts.

 

How about we replace all that with this?  :) 

       PAGE# F000 ANDI, 

 

And if we do this then the page# is in the correct byte for use by the SAMS card, so I removed another SWPB in that section. 

 

So now performing  "<address> 4096 /MOD"  takes 2 instructions. 

     PAGE#  F000 ANDI,    \ R7 = page# in low byte 
    OFFSET  0FFF ANDI,    \ R6 = offset = virtaddr MOD >1000

This speeds up writing 64K bytes by 5%.

I'll take it. 

Of course this is only valid for 1M of SAMS memory but that's what I have so I will keep it until somebody complains. :)

 


I spoke too soon. (of course I did) 

I needed to add one shift back into the calculation

But at least it's a smaller and faster shift. ( only 4 bits)

 

The method is to manage the SAMS page number in the low byte of the register.

This saves a SWPB instruction later when we map the page into memory. 

 

The API to set R0 in the SAMS workspace is

\ store segment value in low byte to save instructions later
: SEGMENT  ( n -- ) 16* >< SEG ! ; 

 

So  1 SEGMENT  would set the SEG variable (alias for R0)  to  >1000 

 

Then the correction in the /mod computation is this.

    PAGE#   F000 ANDI,    \ get the significant digit
    PAGE#   04    SRL,    \ shift 4 bits 
    SEG#    PAGE# ADD,    \ add the segment no. * 16 
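
Worked through in high-level Forth for a standard system, the same arithmetic looks like this. SEG-WORD and PAGE-WORD are hypothetical names for this sketch. SEG-WORD assumes a segment number of 0 to >F, where a 12-bit left shift gives the same 16-bit value as `16* ><`.

```forth
HEX
\ segment number -> value kept in SEG (same as 16* >< for segments 0..F)
: SEG-WORD   ( seg# -- w )  4 LSHIFT  8 LSHIFT  FFFF AND ;

\ mask the top nibble of the address, shift it down 4, add the segment word
: PAGE-WORD  ( virt seg-word -- w )  SWAP F000 AND  4 RSHIFT  + ;

1 SEG-WORD .                    \ prints 1000
1234 1 SEG-WORD PAGE-WORD .     \ prints 1100
DECIMAL
```

So virtual address >1234 in segment 1 yields >1100, that is SAMS page >11, which is segment 1 times 16 plus page 1.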

 


So now the hard reality of dealing with SAMS memory on a byte by byte basis. 

 

Here is a FILL routine written using C!L  that writes a byte to SAMS in a 64K segment

: SAMSFILL ( addr len char --) 
    -ROT BOUNDS DO   DUP I C!L   LOOP DROP ;

It fills 64K bytes in 22.9 seconds.  the code is 16 bytes. 

 

Here is the same function written in Forth Assembler

CODE SAMS.FILL ( VirtAddr len char -- )
    TOS R3 MOV,        \ char is held in R3
        R3 SWPB,       \ do that 9900 thing
        R0 POP,        \ R0 is our down counter 
        R1 POP,        \ R1 is the virtual address
    BEGIN, 
        R1    TOS MOV, \ Virt to TOS ie: R4
        _real @@ BLWP, \ convert virtual to real in TOS 
        R3  *TOS MOVB, \ put the byte into the real address 
        R1 INC,        \ next virtual address 

        R0 DEC,        \ dec the counter   
    EQ UNTIL, 
    TOS POP, 
    NEXT, 
ENDCODE

It fills 64K bytes in 11.23 seconds 

 

If I run the normal FILL routine, filling 8K bytes, 8 times, in a DO LOOP it takes ...  1.33 seconds. :( 

 

So using SAMS by computing the window address on each byte, is at best 9X slower than expansion RAM.  

 


The TI-99  is a cursed machine to try and optimize. 

 

I took an earlier version of this 2 window SAMS idea that doesn't use a separate workspace.

I call the "_real" word with BL. 

I incorporated the slightly improved divide code and I used the free registers R13 14 & 15. 

It is also now using registers on the 16 bit bus in the Forth workspace.

 

This version does the 64K fill operation in 9.85 seconds vs 11.23 with the BLWP version so that's a 14% improvement.

The separate workspace is great but it's a big hit when the registers are in expansion ram. :(

 

So all that to say, I think for speedy purposes you have to use SAMS as 4K chunks.

 

 

Spoiler
\ SAMS memory access as BLOCK from Forth.  Source code   Brian Fox

NEEDS DUMP FROM DSK1.TOOLS
NEEDS MOV, FROM DSK1.ASM9900
NEEDS ELAPSE FROM DSK1.ELAPSE 

NEEDS SAMSINI  FROM DSK1.SAMSINI

HERE
\ ==========================================
\ BLOCK is the entire SAMS manager
HEX
VARIABLE USE
VARIABLE SEG  
\ set the segment 
CODE 16*    TOS 4 SLA, NEXT,  ENDCODE 

\ store segment value in low byte to save instructions later
: SEGMENT  ( n -- ) 16* >< SEG ! ; 

1 SEGMENT 

CREATE WINDOWS  2000 , 3000 ,      \ windows in Low CPU RAM
CREATE PAGES       0 ,    0 ,      \ SAMS page in the each window

4000 CONSTANT SAMS                 \ base address of registers in SAMS card 

: (R1)  R1 () ; \ syntax sugar

\ 9900 sub-routine. NOT a Forth word. 
CREATE _real ( virtual -- real_addr)
\ REGISTERS USED 
\ R0 R1 R4 R5 W 

\ perform TOS 4096 /MOD to compute offset,page# 
      TOS R5 MOV,        
      R5  R0 MOV,        \ dup in R0

      SEG @@ TOS MOV,    \ segment# to TOS

 \ manage the page numbers in the low byte to save instructions 
    R0   F000 ANDI,    \ get the significant digit
    R0   04    SRL,    \ shift 4 bits 
    R0   TOS   ADD,    \ add the segment no. * 16 

    R5    0FFF ANDI,      \ offset= virtual masked to 12 bits 
      
\ search if page# is already in 1st buffer
    TOS PAGES @@ CMP,
    EQ IF,
        TOS WINDOWS @ LI,   \ set 1st window. ~2x FASTER using LI 
        R5 TOS ADD,         \ add the offset
               RT,          \ get out 
    ENDIF,

\ search if page# is in 2nd buffer
    TOS PAGES CELL+ @@ CMP, 
    EQ IF,
         TOS WINDOWS CELL+ @ LI, \ set 2nd window,
         R5 TOS ADD,        \ add the offset
                RT,  
    ENDIF,

\ page# not in memory: Select another window 
        W    0001 LI, 
        USE @@  W XOR,  
        W  USE @@ MOV, 
        W       1 SLA,   \ W 2* is new index into PAGES & WINDOWS 

    TOS PAGES (W) MOV,   \ remember this new page#
    WINDOWS (W) R1 MOV,  \ get the window to use into R1

\ map new page into a RAM window         
         R1    0B SRL,   \ divide window by 2048 = index into SAMS registers
         R12 1E00 LI,    \ cru address of SAMS
                0 SBO,   \ SAMS card on
 \            TOS  SWPB,  \ swap bytes on page# argument
    TOS SAMS (R1) MOV,   \ load page# into SAMS card register
                0 SBZ,   \ SAMS card off

  WINDOWS (W) TOS MOV,   \ get window into TOS
           R5 TOS ADD,        \ add the offset
                  RT,
\ ------------------------------------------------------------

CODE >REAL ( virtual -- real_addr)
      _real @@ BL, 
      NEXT, 
ENDCODE 

\ Fetch and store in virtual memory. (Long addresses)
CODE !L ( n virtual -- ) 
      _real @@ BL, 
      *SP+ *TOS MOV, 
            TOS POP, 
    NEXT, 
ENDCODE     

CODE C!L   ( c virtual -- ) 
    _real @@ BL, 
    1 (SP) *TOS MOVB,    
            SP INCT, 
            TOS POP,
    NEXT, 
ENDCODE 

CODE @L ( virtual -- n)
    _real @@ BL, 
    *TOS TOS MOV,
    NEXT,
ENDCODE     

CODE C@L ( virtual -- c)
    _real @@ BL, 
    *TOS TOS MOVB,
       TOS 8 SRL, 
    NEXT,
ENDCODE     
 
: SAMSMOVE ( addr1 addr2 u --) \ 16 bit move. 
        BOUNDS
        DO 
          DUP @L I !L 
          CELL+
        2 +LOOP 
        DROP 
;   

: SAMSFILL ( addr len char -- )
       -ROT 
        BOUNDS 
        DO  
          DUP I C!L 
        LOOP 
        DROP 
; \ 20.65 seconds 

CODE SAMS.FILL ( VirtAddr len char -- )
    TOS R13 MOV,        \ char is held in R13
        R13 SWPB,       \ do that 9900 thing
        R14 POP,        \ R14 is our down counter 
        R15 POP,        \ R15 hold virtual address  
    BEGIN, 
        R15  TOS MOV,   \ Virt to TOS
        _real @@ BL,    \ convert virtual to real in TOS 
        R13  *TOS MOVB, \ put the byte into the real address 
        R15 INC,        \ next virtual address 

        R14 DEC,        \ dec the counter   
    EQ UNTIL, 
    TOS POP, 
    NEXT, 
ENDCODE \ 64k 9.8 SECONDS

 

 

 

 

 


As you've seen the TI is almost 100% memory constrained. Since everything including registers hits memory, you need to reduce the number of operations for best performance. Many processors have tricks to do things in unusual ways faster than the obvious way - unless it reduces the instruction count they usually don't help on the TI. DIV is the slowest opcode by far, but it's usually faster than two shifts and an add.

 

I can see two optimizations in your main loop - though the biggest one depends on an assumption that I don't know: is there any reason to expect that the virtual memory base will change inside of a block?

 

If you know that the start and end address of your copy loop will be inside the same virtual memory page (or whatever the terminology is in this case), then you will see massive gains by only converting the virtual address to physical once, outside of the loop.

The second gain would be to use MOV instead of MOVB and move two bytes at a time. This will immediately double your copy performance, but you need to deal with two things:
- odd starting address - move one byte before you start. MOV always moves to an even address, no matter what it's passed.
- odd count - after the loop you can drop one last byte with a MOVB.

I've taken a stab at it here, but I've never used the forth assembler before, so I posted the raw assembly since that may be more clear than my "Forthglish". ;)

 

Spoiler
* I am assuming I can use R0 as a scratch register

    MOV TOS,R13         * char is held in R13 (00XX) !! Assuming "TOS" defines to *reg or @address
    MOV  R13,R0         * make a copy in R0 (00XX)
    SWPB R0             * get into MSB (XX00)
    MOVB R0,R13         * R13 now contains the same byte in MSB and LSB (XXXX)
    POP R14             * R14 is our down counter (!!MAGIC!!)
    POP R15             * R15 hold virtual address (!!MAGIC!!)
    MOV R15,TOS         * Virtual address to stack for function call (!!MAGIC TOS!!)
    BL @_real           * convert virtual to real in TOS (!!MAGIC TOS!!)
    MOV TOS,R15         * now it's a physical address (!!MAGIC TOS!!)

    MOV R15,R0          * need to check if its an odd address
    ANDI R0,1           * isolate the 1s bit
    JEQ STARTLOOP       * if it's zero, jump ahead to start the loop
    
    MOVB R13,*R15+      * move a single byte to ensure the address starts even
    DEC R14             * count down the byte
    JEQ FINISH          * if we're done, skip to the end
    
STARTLOOP               * startloop label - correct my syntax as needed
    DECT R14            * pre-decrement so that we overflow at the end - that way we can detect even if odd
    JLT ENDLOOP         * if we already underflowed, then there was only 1 byte left (or zero to start!), go deal with it
        
LOOP
    MOV R13,*R15+       * put TWO bytes to the real address - same cycles as MOVB, twice the data, and autoincrement in one
    DECT R14            * dec the counter !!BY TWO!!
    JGT LOOP            * if still positive, loop
    JNE ENDLOOP         * If we went negative, go check for -1

    MOV R13,*R15        * else it was exactly zero, so move once more. We don't need to inc or dec anything
    JMP FINISH          * we already know we're done
    
ENDLOOP                 * label to deal with the last byte, if necessary
    INV R14             * FFFF -> 0000, otherwise we don't care
    JNE FINISH          * skip ahead if it's not 0
    MOVB R13,*R15       * Move the last byte. No need to increment or decrement anything

FINISH                  * finish label - continue with TOS POP, NEXT...

 

 

Most of your cycles are going to the BL @_real and whatever that function is doing, so if you don't need to do it for every byte, you'll save a lot by only calling it once. And unless you're talking to hardware registers, using MOV instead of MOVB will double your copy performance instantly.

 

HTH.

(Edit: if you do have big copy blocks that might change virtual memory bases, you could do a little more setup up front, and have the code split up the memory copies into safely sized and aligned chunks. It'll still be faster. ;) )
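
The chunk-splitting idea can be sketched in high-level Forth. This is a model only, written for a standard Forth: ROOM, .CHUNK, CHUNKS and #CHUNKS are hypothetical names, and .CHUNK just prints each chunk where real code would translate the address once and then run a fast fill or copy over that many bytes.

```forth
HEX
1000 CONSTANT PAGESIZE            \ one SAMS page = 4K
VARIABLE #CHUNKS                  \ chunk counter, only for the demo

\ bytes left between virt and the end of its 4K page
: ROOM    ( virt -- n )  0FFF AND  PAGESIZE SWAP - ;

\ stand-in for the real work on one chunk that never crosses a page
: .CHUNK  ( virt n -- )  CR SWAP U. U.  1 #CHUNKS +! ;

: CHUNKS  ( virt len -- )
    0 #CHUNKS !
    BEGIN  DUP 0> WHILE
        2DUP OVER ROOM MIN        ( virt len virt n )
        DUP >R  .CHUNK            ( virt len )
        R> /STRING                ( virt+n len-n )
    REPEAT  2DROP ;

0F00 300 CHUNKS    \ prints F00 100 and then 1000 200
DECIMAL
```

With this split the address translation runs once per 4K page instead of once per byte. On a 16-bit Forth the signed `0>` test limits len to 32767, so a full 64K fill would need an unsigned loop test.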

 

 


@Tursi  Oh and the magic TOS is an alias for R4. :) 

In my Forth the top of stack value is cached in R4. 

The literature says this speeds up Forth about 10%. 

I see 8 to 10 % improvement on 9900. 


Given Tursi's "magic" comments, I guess I should give a short lesson on making "macro" instructions in Forth Assembler since my code uses them all the time. 

Forth Assembler is a secret super power of Forth IMHO. It is so intimately connected to Forth it's like programming in Forth with Registers instead of the data stack.

(In fact I wrote a version of Forth Assembler that uses Forth names rather than TI names for many instructions and it works remarkably well)

 

My personal version of the TI Forth Assembler has the following "pseudo instructions" as the TI manual calls them. 

: RT,     ( -- )  R11 ** B, ;
: NOP,    ( -- )  0 JMP, ;
: NEXT,   ( -- )  R10 ** B, ;  \ CAMEL99 interpreter cached in R10 

 

Notice that these Assembler words just wrap the instructions in a "colon" definition. When these words are invoked, they "run" the assembler words. 

The magic part is that Assembler words write numbers into the Forth dictionary using the "comma" word. 

For example the NOP, above does nothing more than compile the number >1000 into the Forth dictionary memory.

 

The special part is that 0 before the JMP, word. 

JMP,  takes a signed byte on the data stack and combines it with the "JMP" instruction so that the JMP will jump forward or backward.

The 0 causes JMP to go ahead 1 cell. 

1 JMP,   will compile the number >1001 and that instruction will jump forward 2 cells.
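
To make the arithmetic concrete, here is a minimal sketch in plain Forth of how a displacement could be merged into the JMP opcode. The names JMP-BASE, JMP-WORD and MY-JMP, are made up for illustration; this is not the source of the real JMP, word, which may do range checking and other work.

```forth
HEX
1000 CONSTANT JMP-BASE     \ 9900 JMP opcode with a zero displacement

\ merge a signed displacement (counted in cells) into the low byte
: JMP-WORD  ( disp -- n )  00FF AND  JMP-BASE OR ;

\ hypothetical stand-in for JMP, : build the number and "comma" it
: MY-JMP,   ( disp -- )    JMP-WORD , ;
```

With this loaded, 0 JMP-WORD gives >1000 and 1 JMP-WORD gives >1001, matching the numbers above, while -1 JMP-WORD gives >10FF, a jump back to itself. On a 16-bit Forth the comma in MY-JMP, lays down exactly one instruction cell; on a bigger host Forth it only shows the idea.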

 

So that's the theory. Other instructions require different input arguments like the registers to use and the addressing mode but it all boils down to taking inputs from the data stack and mixing those inputs with the "raw" instruction number to make the final 9900 instruction.

 

Since the input arguments are taken from the Forth data stack we can play with those arguments with normal Forth words like SWAP DUP ROT etc.

 

So here is all it took to make a pseudo-instruction to "POP" a number from the data stack into a 9900 register. 

: POP,    ( dst -- ) *SP+  SWAP MOV, ;

The destination register is placed on the data stack.   

Then in the definition we put the source argument on the stack, which is the address of the data stack held in the SP register. (SP is just an alias for R6 in Camel99)

Next a SWAP puts the two arguments in the correct order for the MOV,  instruction.  So the instruction *SP+ <dst> MOV,  will be compiled into memory when POP, is invoked.

 

Here is the reciprocal operation PUSH,  which takes two instructions on 9900.

: PUSH,   ( src -- )  SP DECT,    *SP  MOV, ;  

 

And here are the same "macros" for the other Forth stack, the return stack.

: RPUSH,  ( src -- ) RP DECT,  *RP   MOV, ;
: RPOP,   ( dst -- ) *RP+      SWAP  MOV, ;
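
If you want to watch the argument shuffling without a 9900 handy, here is a small mock-up that should run on any standard Forth. Everything in it is a stand-in: the operand codes are invented numbers, and MOV, and DECT, only record their arguments instead of compiling machine code. Try it on a scratch system, because it redefines the assembler names.

```forth
DECIMAL
\ Hypothetical stand-ins: the "instructions" just record what they were given
VARIABLE LAST-SRC   VARIABLE LAST-DST   VARIABLE #INSTR
0 #INSTR !
: MOV,   ( src dst -- )  LAST-DST !  LAST-SRC !  1 #INSTR +! ;
: DECT,  ( reg -- )      DROP  1 #INSTR +! ;

6   CONSTANT SP      \ made-up operand codes, not real 9900 encodings
106 CONSTANT *SP
206 CONSTANT *SP+
1   CONSTANT R1

: POP,   ( dst -- )  *SP+ SWAP MOV, ;     \ same shape as the macro above
: PUSH,  ( src -- )  SP DECT,  *SP MOV, ;

R1 POP,     \ records src=*SP+ dst=R1, the same as typing  *SP+ R1 MOV,
R1 PUSH,    \ records a DECT, then src=R1 dst=*SP
```

After the last two lines #INSTR holds 3: one instruction for the POP, and two for the PUSH, which is the one-versus-two instruction difference mentioned above.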


 

I think that's pretty cool. 

 


21 hours ago, Tursi said:

I can see two optimizations in your main loop - though the biggest one depends on an assumption that I don't know:

 

is there any reason to expect that the virtual memory base will change inside of a block?

 

@Tursi I didn't answer your question.   

The thing I was trying to accomplish was making the SAMS memory within one 64K "segment" look like a contiguous chunk of memory.

With the two window concept it means you could freely move a block of memory in the segment to another address in the segment.

I had done this before for 80-byte file records and it works great, but as I have discovered, the speed hit is too much for byte-by-byte copying/filling. 

So the answer is yes.  And now I have to change that expectation.

 


OK so after lunch and a little thinking I realize that what @Tursi has taught me is:

 

1. Do the virtual->real conversion outside the loop.  

2. Work at the cell level rather than the byte level. 

 

I have code for #2 in a library file, but my version can be improved using ideas from Tursi's example.

 

And converting outside the loop is simplest in Forth using a word I made for testing, but now will probably be the core of the whole thing. :) 

 

: SAMS.FILL ( virtual len char --)  ROT >REAL -ROT  FILL ; 

 

Now the tricky part is that we can only fill 4K with this simple method, and we run the risk of going over a page boundary, so I will devise a method to handle that.

 

 

 

 


Well that took me way longer than expected. :)

 

I figured the best way to try this was to use Forth and I think it will be an acceptable solution for an expanded editor.

I have not got SAMS.CMOVE yet but SAMS.FILL is very promising and I am still using the byte by byte standard FILL word so it could go faster still.

 

I am using the SUB-PROGRAM (BLWP)  version of the SAMS manager because it's safer.  If I really need 14% more speed at that end I could change to the BL version. 

 

Here is the method:

1.  Given an (address,length) pair, find a chunk of data from the input pair with length=#bytes to the first 4K boundary

2.  Cut the input data pair by the length of that first chunk to get a "remainder" (address, length).
    (Here we can use the amazing /STRING ... again!) 

3. Loop on that until the remaining length is less than 4K

4. FILL the final chunk 

 

Here is the code: 

HEX 
1000   CONSTANT 4K
F000   CONSTANT PAGE_END 

\ compute start of next SAMS page boundary ( 1000,2000,3000 ETC)
: BOUNDARY ( addr -- addr addr') DUP PAGE_END AND   4K + ;

: CHUNK  ( addr len -- addr len addr1 n1) 
        OVER  BOUNDARY  OVER -  1FFF AND ;

\ compute the next chunk to use and what is remaining 
: NEXTCHUNK   ( addr len -- addr' len' addr1 n1)  
        CHUNK  DUP>R               \ compute chunk and save length 
        2SWAP  R> /STRING  2SWAP ; \ reduce size of data by length 

HEX
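: SAMS.FILL  ( virtaddress len char -- )
        >R                           \ save char on Ret. stack 
        BEGIN
            DUP FFF U>               \ test if SIZE > 1 SAMS page 
        WHILE 
            NEXTCHUNK                \ get a chunk to fill 
            SWAP >REAL SWAP R@ FILL  
        REPEAT
        DUP IF                       \ anything left? (partial page) 
            OVER >REAL OVER R@ FILL  \ do the last chunk 
        THEN 
        2DROP  R> DROP               \ clean up both stacks 
;
\ NOTE: a fill shorter than 4K that starts mid-page is not split here,
\ so it can still run across a page boundary.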
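
For anyone who wants to check the splitting arithmetic without SAMS hardware, here is a sketch in plain Forth that only counts the pieces instead of filling them. It is a variation on the method, not the code above: each piece is the smaller of the remaining length and the distance to the next 4K boundary, so a short range that starts mid-page gets split as well. PAGE-SIZE, USMALLER, TO-BOUNDARY, PIECE and #PIECES are names invented for this example.

```forth
HEX
1000 CONSTANT PAGE-SIZE

\ unsigned minimum, defined here because UMIN is not a standard word
: USMALLER ( u1 u2 -- u )  2DUP U< IF DROP ELSE NIP THEN ;

\ bytes from addr up to the next 4K boundary (>1000 when addr is aligned)
: TO-BOUNDARY ( addr -- n )  PAGE-SIZE 1- AND  PAGE-SIZE SWAP - ;

\ size of the next piece: never longer than len, never past a boundary
: PIECE ( addr len -- addr len n )  OVER TO-BOUNDARY  OVER USMALLER ;

\ count the pieces a range splits into (a stand-in for the real fill)
: #PIECES ( addr len -- count )
    0 >R
    BEGIN DUP WHILE
        PIECE /STRING            \ cut the piece off the front
        R> 1+ >R
    REPEAT
    2DROP R> ;
```

For example, 0F00 2200 #PIECES returns 4: pieces of >100, >1000, >1000 and >100 bytes. In a real fill the line PIECE /STRING would be replaced by something that also passes the piece to >REAL and FILL before cutting it off.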

 

And here is the performance in the video. 

This took over 11 seconds doing the virtual to real conversions on every byte. 

 

 
