Jump to content
IGNORED

Bitmap mode.


bfollett

Recommended Posts

Scratchpad wasn't really worth the time it took to code anyway, 5 seconds off of 30 isn't a huge win. ;) Removing the EQUs at the beginning, removing the scratchpad copy loops, and renaming the functions at the end to match what the EQUs were (ie: "PLOTX" becomes "PLOT") will make the code run without scratchpad (and the overhead of the branching is barely measurable). It occurred to me after the fact I should have saved a non-scratchpad version, but it was too late by then. ;)

 

Depends upon your perspective. Competitively, five seconds is an eternity if it gives the TI the edge over the Atari or others.

Link to comment
Share on other sites

Scratchpad wasn't really worth the time it took to code anyway, 5 seconds off of 30 isn't a huge win. ;) Removing the EQUs at the beginning, removing the scratchpad copy loops, and renaming the functions at the end to match what the EQUs were (ie: "PLOTX" becomes "PLOT") will make the code run without scratchpad (and the overhead of the branching is barely measurable). It occurred to me after the fact I should have saved a non-scratchpad version, but it was too late by then. ;)

16.7% isn't worth it?

Link to comment
Share on other sites

 

 

16.7% isn't worth it?

 

Not off of 30 seconds, no. The percentage doesn't tell the whole story.

 

 

 

Competitively, five seconds is an eternity if it gives the TI the edge over the Atari or others.

 

I like the way you think. ;) But, the last time I heard was 20 minutes in BASIC - if they've done an assembly version, I'd be pretty curious to hear how it measures up. (I would expect the 6502 to have a tougher time of it). Bob, are they close? ;)

Link to comment
Share on other sites

 

Not off of 30 seconds, no. The percentage doesn't tell the whole story.

 

 

 

 

I like the way you think. ;) But, the last time I heard was 20 minutes in BASIC - if they've done an assembly version, I'd be pretty curious to hear how it measures up. (I would expect the 6502 to have a tougher time of it). Bob, are they close? ;)

Have you seen Elite on the C64 and Apple or 1337 on the Oric? Even a 1MHz CPU on an odd screen setup can do a lot of math and push a lot of pixels.

Link to comment
Share on other sites

I shaved nearly an hour off of the time in post #166 by using an integer square root routine using Newton's method ( xn+1 = xn - f(xn)/f'(xn) ) implemented in Forth Assembler. Time was 1 hour, 18 minutes, 57 seconds. The graphic is not quite as clean. I will post the code later tonight—gotta go see a grandson perform!

 

post-29677-0-79726900-1426021717_thumb.jpg

 

...lee

  • Like 2
Link to comment
Share on other sites

I thought Forth was supposed to be faster than BASIC.

 

It's a lot faster than Basic. The TI Basic version took several hours. I've just been trying to make it even faster by avoiding floating point operations as much as possible. Any floating point operations (in this case, SQR , SIN , F* , F/ , F+ ) fbForth 2.0 uses are substantially the same routines as TI Basic uses, except they don't use GPLLNK. I also have not yet implemented @senior_falcon's optimized loops that run in the reverse direction of the original problem. I may do that next before I work on an integer sine function (or just use @Tursi's table lookup).

 

...lee

Link to comment
Share on other sites

Did you consider a version that generates the image in CPU RAM?

 

No, since I felt watching it draw was part of the effect. But you're right, that would make a big difference, since we'd no longer need to read/modify/write VDP RAM, and could just directly SOCB the image in CPU.

 

Damn it, I'll have to try that now, won't I? :)

  • Like 1
Link to comment
Share on other sites

For the Archimedes Spiral Hat graphic in post #206 above, here's the Forth TMS9900 Assembler SQRT function, which makes use of Newton's method for converging on the closest integer to the square root. The first guess is simply the square itself:

 

 

 

ASM:  SQRT  ( n --- √n )
    *SP R0 MOV,     ( get square)
    R0 R1 MOV,      ( square to R1 for starting x)
    R4 CLR,         ( init counter)
    BEGIN,
        R1 *SP MOV, ( current x to stack)
        R4 20 CI,   ( enough iterations?)
    LT WHILE,
        R4 INC,     ( inc counter)
        R2  CLR,    ( clear high word for DIV,)
        R0 R3 MOV,  ( square to R3)
        R1 R2 DIV,  ( square/x)
        R1 R2 A,    ( square/x + x)
        R2 1 SRL,   ( [square/x + x]/2)
        R2 R1 C,    ( same as last x?)
        EQ IF,      
            R2 INC,     ( x + 1)
            R1 R2 MPY,  ( x^2 + x)
            R0 R3 C,    ( y > x^2 + x?)
            GT IF,
                R1 INC, ( [x+1]^2 is closer to square than x^2)
            THEN,
            R4 20 LI,   ( force exit at WHILE,)
        ELSE,
            R2 R1 MOV,  ( new x)
            R2 R2 MPY,  ( x^2)
            R2 0 CI,    ( 16-bit result?)
            EQ IF,
                R0 R3 S,    ( x^2 - square)
                R3 ABS,     ( |x^2 - square|)
                R3 2 CI,    ( close enough?)
                LT IF,
                    R4 20 LI,   ( force exit at WHILE,)
                THEN,
            THEN,
        THEN,
    REPEAT,
;ASM 

 

 

 

I will post the slightly modified code for drawing the hat in an hour or two.

 

...lee

Link to comment
Share on other sites

 

It's a lot faster than Basic. The TI Basic version took several hours. I've just been trying to make it even faster by avoiding floating point operations as much as possible. Any floating point operations (in this case, SQR , SIN , F* , F/ , F+ ) fbForth 2.0 uses are substantially the same routines as TI Basic uses, except they don't use GPLLNK. I also have not yet implemented @senior_falcon's optimized loops that run in the reverse direction of the original problem. I may do that next before I work on an integer sine function (or just use @Tursi's table lookup).

 

...lee

Just for the record. I was just trying to goad you into implementing a new version based on the revised code. :D

Link to comment
Share on other sites

I also have not yet implemented senior_falcon's optimized loops that run in the reverse direction of the original problem. I may do that next before I work on an integer sine function (or just use @Tursi's table lookup).

 

...lee

I thought I was doing well to get the time down from around 5 hours to 2:20 hours. But I would use JamesD's program in post #128 as a starting point. His program gets the time down to about 52 minutes.

Link to comment
Share on other sites

Here is the fbForth 2.0 code for the Archimedes Spiral Hat in post #206 that uses the integer square root function ( SQRT ) from post #210:

 

 

 

BASE->R                         ( save current radix)
DECIMAL                         ( set current radix to decimal)
>F 4.71238905 >F 108 F/ FCONSTANT XF    ( create floating point [FP] constant XF)
0 CONSTANT ZS                           ( create integer constant ZS)
( define IDLE to loop until <break> key tapped)
: IDLE BEGIN ?TERMINAL UNTIL ;
( Archimedes Spiral Hat Plot)
: DO_HAT  
  GRAPHICS2                     ( set up bitmap mode)
  [ HEX ] 1 83D6 ! [ DECIMAL ]  ( set screen-blank counter to never blank)
  65 -64 DO                     ( z loop)
    I I * S->F                  ( square z index and make FP)
    >F 2.84765625 F*            ( FP multiply by factor to scale square of index to 108^2)
    F->S ' ZS !                 ( convert to integer and store new zs)
    11664 ZS - SQRT             ( 108^2 - zs; take integer square root)
    DUP 1+ SWAP MINUS DO        ( set up x loop with [xl+1 -xl DO])
      I I * ZS + SQRT           ( take integer square root of xi^2 + zs)
      S->F XF F*                ( convert to FP; xt = [xi^2 + zs]^0.5 * xf)
      FDUP SIN                  (  dup xt and get sine; {stack: xt sin[xt]})
      FSWAP                     (  swap positions; {stack: sin[xt] xt})
      >F 3 F* SIN               (  sin[xt*3])
      >F 0.4 F*                 (  0.4 * sin[xt*3])
      F+ >F 56 F*               ( yy = [sin[xt*3] + 0.4 * sin[xt*3]] * 56)
      F->S                      ( convert yy to integer {stack: yy})
      90 SWAP - J +             ( y1 = 90 - yy + zi)
      I J + 128 + SWAP          ( x1 = xi + zi + 128; swap positions {stack: x1 y1})
      OVER OVER                 ( copy both {stack: x1 y1 x1 y1})
      DRAW DOT                  ( plot dot {stack: x1 y1})
      1+ OVER 191               ( prepare to erase all dots below {stack: x1 y1+1 x1 191})
      UNDRAW LINE               ( erase all dots below last dot plotted)
    LOOP                    ( x loop again)
  LOOP                  ( z loop again)
  IDLE ;                ( idle until <break> tapped)
R->BASE                 ( restore radix)  

 

 

 

...lee

Link to comment
Share on other sites

30 minutes! :-o for fbForth 2.0 for the Archimedes Spiral Hat displayed in post #206.

 

Incorporating @JamesD's ideas from post #186 into the fbForth 2.0 code in post #216 got the time for drawing the Archimedes Spiral Hat displayed in post #206 down to 30 minutes. The fbForth 2.0 code follows (code to track the time in Classic99 is also included):

 

 

 

( Reverse loop version, using SQRT---)
HEX 
0 VARIABLE CLOKSTART 20 ALLOT   ( buffer for start time)
0 VARIABLE CLOKSTOP 20 ALLOT    ( buffer for stop time)
( set up Classic99 CLOCK file PAB in a mode-safe area of VRAM)
1CD2 CLOKSTOP OVER 20 + FILE CLOK CLOK SET-PAB F-D" CLOCK"
DECIMAL                         ( set current radix to decimal)
>F 4.71238905 >F 108 F/ FCONSTANT XF    ( create FP constant XF)
0 CONSTANT ZS                           ( create integer constant ZS)
( define IDLE to loop until <break> key tapped)
: IDLE BEGIN ?TERMINAL UNTIL ;  ( routine to idle, waiting for <break>)
0 VARIABLE RR 254 ALLOT         ( max plot row array)
: initRR RR 256 192 FILL ;      ( initialize RR with 192, max row + 1)
( Do we plot pixel and update RR[x]? [must first load Assembler]---)
ASM: PLOT?  ( x y --- [x y true]|false )
    *SP R0 MOV,                 ( get y)
    R0 SWPB,                    ( get y to high byte for comparison)
    2 @(SP) R1 MOV,             ( get x)
    SP  DECT,                   ( reserve stack space for flag)
    RR @(R1) R0 CB,             ( compare bytes: RR[x] with y)
    H IF,                       ( logical >?)
        R0 RR @(R1) MOVB,       ( set new max RR[x])
        *SP SETO,               ( set true flag)
    ELSE,
        SP 4 AI,                ( pop top 2 stack positions)
        *SP CLR,                ( set false flag)
    THEN,
;ASM
( Archimedes Spiral Hat Plot)
: DO_HAT  
  GRAPHICS2                     ( set up bitmap mode)
  [ HEX ] 1 83D6 ! [ DECIMAL ]  ( set screen-blank counter to never blank)
  initRR DRAW                   ( initialize RR array and set DRAW mode)
  -65 64 DO                     ( z loop, counting down)
    I I * S->F                  ( square z index and make FP)
    >F 2.84765625 F*            ( scale square of index to 108^2)
    F->S ' ZS !                 ( convert to integer and store new zs)
    11664 ZS - SQRT             ( 108^2 - zs; take integer square root)
    1+ 0 DO                     ( set up x loop with [xl+1 0 DO])
      I I * ZS + SQRT           ( take integer square root of xi^2 + zs)
      S->F XF F*                ( convert to FP; xt = [xi^2 + zs]^0.5 * xf)
      FDUP SIN                  (  dup xt and get sine; {stack: xt sin[xt]})
      FSWAP                     (  swap positions; {stack: sin[xt] xt})
      >F 3 F* SIN               (  sin[xt*3])
      >F 0.4 F*                 (  0.4 * sin[xt*3])
      F+ >F 56 F*               ( yy = [sin[xt*3] + 0.4 * sin[xt*3]] * 56)
      F->S                      ( convert yy to integer {stack: yy})
      90 SWAP - J +             ( y1 = 90 - yy + zi {stack:y1})
      J 128 +                   ( zx = zi + 128 {stack:y1 zx})
      OVER OVER                 ( {stack:y1 zx y1 zx})
      I + SWAP                  ( x1 = zx + xi; swap y1 x1 {stack:y1 zx x1 y1})
      PLOT? IF                  ( plot pixel?)
        DOT                     ( plot dot {stack: y1 zx})
      THEN
      I - SWAP                  ( x2 = zx - xi; swap y1 x2 {stack:x2 y1})
      PLOT? IF                  ( plot mirror pixel?)
        DOT                     ( plot dot)
      THEN
    LOOP                    ( x loop again)
  -1 +LOOP                  ( z loop again)
;
( Keep track of time while drawing Archimedes Spiral Hat---)
: HAT
  OPN RD                        ( open CLOCK file and read start time)
  CLOKSTOP CLOKSTART ROT CMOVE  ( copy start time to start buffer)
  DO_HAT                        ( draw hat)
  RD DROP CLSE                  ( read stop time; close CLOCK file)
  IDLE                          ( idle, waiting for <break>)
  TEXT                          ( set TEXT mode)
  ( display start and stop times)
  CR ." Start time: " CLOKSTART 11 + 8 TYPE 
  CR ."  Stop time: " CLOKSTOP 11 + 8 TYPE CR CR
;   

 

 

 

Note that the SQRT routine from post #210 is required for the above code to work.

 

There is still a fair amount of floating point math, especially the sine function calls. I will probably include @Tursi's sine lookup table before I quit.

 

...lee

  • Like 2
Link to comment
Share on other sites

I wonder about your square root function -- it's pretty large compared to the one I ported - did you check the cycle times? You might be able to get another speed boost if you can report mine? (Not sure how noticeable it would be, and I'm not sure your number format, but if you check the link I put in the source it describes the routine... it basically takes one loop for every bit you want calculated.)

  • Like 1
Link to comment
Share on other sites

 

30 minutes! :-o for fbForth 2.0 for the Archimedes Spiral Hat displayed in post #206.

 

Incorporating @JamesD's ideas from post #186 into the fbForth 2.0 code in post #216 got the time for drawing the Archimedes Spiral Hat displayed in post #206 down to 30 minutes. The fbForth 2.0 code follows (code to track the time in Classic99 is also included):

 

FWIW, I just ported someone else's code to the TI, they weren't my ideas.

Link to comment
Share on other sites

I wonder about your square root function -- it's pretty large compared to the one I ported - did you check the cycle times? You might be able to get another speed boost if you can report mine? (Not sure how noticeable it would be, and I'm not sure your number format, but if you check the link I put in the source it describes the routine... it basically takes one loop for every bit you want calculated.)

 

Yes, I'm sure I can—and I will try it soon. Newton's Method for square root is only a first order convergence. For squares of the larger numbers (near 1082) it takes 8 or so iterations. Yours might take more iterations, but mine has one MPY and one DIV (!) per iteration.

 

I also need to implement your fixed point math. At the very least, that would smoothe out the second square root.

 

...lee

Link to comment
Share on other sites

Incidentally, I snuck over to the Atari thread to see what they were doing, challenged them to beat the assembly language time -- and they kicked my butt with a 30 second runtime. ;) So more work to do here!

Pretty much what I expected and I think they have a little more room for optimization.

Link to comment
Share on other sites

 

I also need to implement your fixed point math. At the very least, that would smoothe out the second square root.

 

...lee

That's what I'm interested to see. At that point, you're more down to what *fbForth* can do. Using the FP approach means that the Forth is really only glue to stick the machine code routines together, and, of course, the speed will ultimately be determined by those floating point routines.

Link to comment
Share on other sites

Incidentally, I snuck over to the Atari thread to see what they were doing, challenged them to beat the assembly language time -- and they kicked my butt with a 30 second runtime. ;) So more work to do here!

Surprising what those 6502 guru's can squeeze out of that little 8-bitter, huh? Chuck Peddle would be proud!

  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...