Jump to content
IGNORED

Known fast line algorithms ?


R0ger

Recommended Posts

I see your improvement to SCRNP. That was my code, not original FS1. Thanks for that. 

 

Looks like your test case is one diagonal line from 0,191 to 159,0 and you wait for VCOUNT=0, draw the line, and read VCOUNT again to measure speed, right? So, you get $49 which is 73 but VCOUNT increments every 2 scan lines so that's 146 scan lines, right? At 15,700 scan lines per second, we have 15,700/146= 108 lines per second, right? But that does not account for VBI service cycles lost, right? Or am I missing something?

Link to comment
Share on other sites

SCRNP:
;	LDA SAVMSC		; Sbr: build screen ptr
;	STA SPZ
;	LDA SAVMSC+1
;	STA SPZ+1

	lda Y1

        sta     SPZ+1           ; remember value for later addition...
        ldx     #0              ; clear high-byte
        asl     @               ; * 2
        bcc     mul4            ; high-byte affected?
        ldx     #2              ; this will be the 1st high-bit soon...

mul4:   asl     @               ; * 4
        bcc     mul5            ; high-byte affected?
        inx                     ; => yes, apply to 0 high-bit
        clc                     ; prepare addition

mul5:   adc     SPZ+1           ; * 5
        bcc     mul10           ; high-byte affected?
        inx                     ; yes, correct...

mul10:  stx     SPZ+1           ; continue with classic shifting...

        asl     @               ; * 10
        rol     SPZ+1

        asl     @               ; * 20
        rol     SPZ+1

        asl     @               ; * 40
        rol     SPZ+1

	add SAVMSC
	sta SPZ

	lda SPZ+1
	adc SAVMSC+1
	sta SPZ+1

        LDA X1
        LSR @
        LSR @
        TAY			; offset into row
        LDA X1
        AND #3
        TAX
        LDA BITS,X
        STA BPZ			; bit pattern for 1st pixel
        RTS

 

https://github.com/cc65/cc65/blob/master/libsrc/common/mul40.s

 

FS1 is fast and memory efficient, if it is from 1979 it is impressive

 

Eru fast draw is similar (maybe he knew FS1), also allows you to use colors, you can use in GRAPHICS 8,15

Edited by tebe
  • Like 1
Link to comment
Share on other sites

5 hours ago, R0ger said:

Anyone has the source code, or better explanation ? I understand the core (I think) I just don't see how it can be fast.

If I understand the post, it's faster because you don't have to update a fractional accumulator. Instead, the X register is loaded with a slope factor value and each pixel update in the unrolled loop compares against it with a different value to determine if it should do a fractional update. The slope threshold determines what %age of the pixel updates do a +1 and the bit-reverse spreads out the +1s semi-regularly instead of clumping them at one end. So if you're drawing a 4-pixel line, you'd jump to the tail of the routine that compares X against a sequence like $00, $80, $40, $C0 and X controls whether the routine fractionally increments by +1-4 over the line.

 

Aside from the 6502 cycle savings, another interesting advantage of this algorithm is that the fractional calcs for each pixel are independent -- only final summation into the minor axis is dependent. This makes BRR potentially lucrative for drawing lines with the VBXE blitter.

 

  • Like 1
Link to comment
Share on other sites

21 minutes ago, phaeron said:

Aside from the 6502 cycle savings, another interesting advantage of this algorithm is that the fractional calcs for each pixel are independent -- only final summation into the minor axis is dependent. This makes BRR potentially lucrative for drawing lines with the VBXE blitter.

I still can't wrap my head around using the blitter for calculating or line drawing - I can only see the straight ahead use of a fast mem copy from a to b.

Link to comment
Share on other sites

13 hours ago, Stephen said:

I still can't wrap my head around using the blitter for calculating or line drawing - I can only see the straight ahead use of a fast mem copy from a to b.

well... remember that the blitter can ADD and SUB... think of multipass... so you can do logical operations (plus ADD, SUB, AND, OR, EXOR, Amiga has 256 minterms plus several source channels where to pull data).

 

then you have modulo offsets which could be the slope. or you have lookup tables which you can blit into 6502 unrolled code..... etc etc etc ;)

  • Like 3
  • Thanks 1
Link to comment
Share on other sites

19 hours ago, phaeron said:

If I understand the post, it's faster because you don't have to update a fractional accumulator. Instead, the X register is loaded with a slope factor value and each pixel update in the unrolled loop compares against it with a different value to determine if it should do a fractional update. The slope threshold determines what %age of the pixel updates do a +1 and the bit-reverse spreads out the +1s semi-regularly instead of clumping them at one end. So if you're drawing a 4-pixel line, you'd jump to the tail of the routine that compares X against a sequence like $00, $80, $40, $C0 and X controls whether the routine fractionally increments by +1-4 over the line.

 

Aside from the 6502 cycle savings, another interesting advantage of this algorithm is that the fractional calcs for each pixel are independent -- only final summation into the minor axis is dependent. This makes BRR potentially lucrative for drawing lines with the VBXE blitter.

 

Yeah, I understand this part. But as you said, it looks like they have the whole line unrolled, so every bitreverse table line has it own code. It wouldn't be unheard of, obviously .. for a demo, that is. But then you still have to draw the pixels, solve when to move to the next byte, do other stuff .. which won't go too well with unroll. I'd look into the demo .. but it's C64, and the computation is on the disk drive, so I don't feel too brave to go into it.

Link to comment
Share on other sites

4 hours ago, R0ger said:

Yeah, I understand this part. But as you said, it looks like they have the whole line unrolled, so every bitreverse table line has it own code. It wouldn't be unheard of, obviously .. for a demo, that is. But then you still have to draw the pixels, solve when to move to the next byte, do other stuff .. which won't go too well with unroll. I'd look into the demo .. but it's C64, and the computation is on the disk drive, so I don't feel too brave to go into it.

Maybe not, if it works out. Could be arranged, for instance, so that an 8-pixel line routine draws with thresholds $E0, $60, $A0, $20, $C0, $40, $80, $00. 4-pixel line then jumps into the middle of it and uses only the $C0, $40, $80, $00 tail portion. But no way to confirm without actually trying it or mining out the code from the demo, and yeah, C64 1541 code, blech. 😁

 

  • Like 1
Link to comment
Share on other sites

  • 2 weeks later...

Working on my TRON Recognizer animation from 1983. Originally (below) it rendered 72 frames of GR.22 bitmaps, 2K each, and played them back at 15 fps on an 800 equipped with the Axlon 128K RAMdisk. Using my ported SubLogic FS1 line plot routines, now it can draw and erase each GR.23 frame in less than 1/15 second, so no need for so much RAM. The raw speed is measured at 1152 lines per second! Of course, most are short lines, and a few of the 59 lines per frame are clipped away, and the CPU runs faster in GR.23 than in GR.24.

 

The original BASIC program rendered the animation over an hour or so and saved 144K of bitmaps to 2 SD floppies. An assembly program played it back on the Axlon at 15 fps with sound effects. I modified the BASIC program to save and clip line endpoint coordinates instead, only 17K. An assembly program feeds those coordinates to the FS1 plotter. A side-effect of using GR.23 is more colors, so I took advantage (above). Some work remains, such as double-buffering and sound.

 

Thanks to @tebe and @Rybags for inspiring some optimizations.

 

 

  • Like 4
Link to comment
Share on other sites

53 minutes ago, ClausB said:

Working on my TRON Recognizer animation from 1983. Originally (below) it rendered 72 frames of GR.22 bitmaps, 2K each, and played them back at 15 fps on an 800 equipped with the Axlon 128K RAMdisk. Using my ported SubLogic FS1 line plot routines, now it can draw and erase each GR.23 frame in less than 1/15 second, so no need for so much RAM. The raw speed is measured at 1152 lines per second! Of course, most are short lines, and a few of the 59 lines per frame are clipped away, and the CPU runs faster in GR.23 than in GR.24.

 

The original BASIC program rendered the animation over an hour or so and saved 144K of bitmaps to 2 SD floppies. An assembly program played it back on the Axlon at 15 fps with sound effects. I modified the BASIC program to save and clip line endpoint coordinates instead, only 17K. An assembly program feeds those coordinates to the FS1 plotter. A side-effect of using GR.23 is more colors, so I took advantage (above). Some work remains, such as double-buffering and sound.

 

Thanks to @tebe and @Rybags for inspiring some optimizations.

 

 

Looking forward to seeing the new version.

  • Like 1
Link to comment
Share on other sites

Had some trouble after sync'ing to VCOUNT to wait for the display list to finish before clearing the bitmap for the next frame. Discovered that some frames take a bit too long for 15 fps. Was getting 14.5 on average. Had to resort to a look-up table for the *40 routine (instead of the optimizations credited above). Also had to disable the deferred VBIR with a SEI. Now it averages 15 fps, though a few frames are long and a few are short. Tested in emulation only. Please test on real hardware and report back.

 

Runs in 40K RAM:

TRON2.XEX

 

Edit: Don't forget to press START after it loads and the screen goes black.

Edited by ClausB
  • Like 3
Link to comment
Share on other sites

58 minutes ago, Teapot said:

NTSC 800XL w/ U1MB.

 

Works great in both stock and 1088K memory modes.

 

Now that you are using multiple colors maybe the recognizer can be red.

 

Thanks for testing!

 

Yes, well, it could have been red originally. In 1983 I made it from memory (I didn't have YouTube then, not even VHS) so the shape isn't quite right either. I was lucky it was even orange, since I did it on a B&W TV!

  • Like 1
Link to comment
Share on other sites

6 hours ago, ClausB said:

Discovered that some frames take a bit too long for 15 fps.

If you implement hidden line removal you could possibly reach 20+fps.

 

4 hours ago, ClausB said:

the shape isn't quite right either.

Coincidentally, I came across my copy of the novelization in a box a few days ago. Published in 1982 - "COMPLETE WITH A FABULOUS FULL-COLOR PHOTO INSERT!"  None of the photos has a recognizer so not even that would have helped.

 

All things considered it's a great rendition.

 

  • Like 1
Link to comment
Share on other sites

51 minutes ago, ClausB said:

Have you tried it on hardware? Do you see any flaws, like flashes of black lines, or uneven frame rate? Thanks.

Not yet - sorry.  I'm at a rare point in time where none of my Ataris are hooked up.  I'll try ASAP.

  • Thanks 1
Link to comment
Share on other sites

On 12/23/2023 at 7:27 AM, ClausB said:

If you calculate all the endpoints before clearing the screen and drawing the lines, then it would be easier to judge the line drawing speed. Might not even need double buffering then. FS1 does just that and gets by without a double buffer.

This is the test of the fast_draw routine. my only problem is that it scales to 160*192 to avoid 2 byte calc.
it is pretty fast though,
https://github.com/marianodominguez/8bit-samples/blob/master/cc65/simple_graph/src/atari_draw.s

  • Like 3
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...