Known fast line algorithms ?

tebe · January 6

50 minutes ago, ClausB said:

What are these numbers?

vcount ($d40b)

ClausB · January 6

33 minutes ago, tebe said:

vcount ($d40b)

I don't understand. The number of scan lines for how many lines? How long are the lines? What slope?

ClausB · January 7

I see your improvement to SCRNP. That was my code, not original FS1. Thanks for that.

Looks like your test case is one diagonal line from 0,191 to 159,0 and you wait for VCOUNT=0, draw the line, and read VCOUNT again to measure speed, right? So, you get $49 which is 73 but VCOUNT increments every 2 scan lines so that's 146 scan lines, right? At 15,700 scan lines per second, we have 15,700/146= 108 lines per second, right? But that does not account for VBI service cycles lost, right? Or am I missing something?

tebe · January 7

SCRNP:
;	LDA SAVMSC		; Sbr: build screen ptr
;	STA SPZ
;	LDA SAVMSC+1
;	STA SPZ+1

	lda Y1

        sta     SPZ+1           ; remember value for later addition...
        ldx     #0              ; clear high-byte
        asl     @               ; * 2
        bcc     mul4            ; high-byte affected?
        ldx     #2              ; this will be the 1st high-bit soon...

mul4:   asl     @               ; * 4
        bcc     mul5            ; high-byte affected?
        inx                     ; => yes, apply to 0 high-bit
        clc                     ; prepare addition

mul5:   adc     SPZ+1           ; * 5
        bcc     mul10           ; high-byte affected?
        inx                     ; yes, correct...

mul10:  stx     SPZ+1           ; continue with classic shifting...

        asl     @               ; * 10
        rol     SPZ+1

        asl     @               ; * 20
        rol     SPZ+1

        asl     @               ; * 40
        rol     SPZ+1

	add SAVMSC
	sta SPZ

	lda SPZ+1
	adc SAVMSC+1
	sta SPZ+1

        LDA X1
        LSR @
        LSR @
        TAY			; offset into row
        LDA X1
        AND #3
        TAX
        LDA BITS,X
        STA BPZ			; bit pattern for 1st pixel
        RTS

https://github.com/cc65/cc65/blob/master/libsrc/common/mul40.s

FS1 is fast and memory efficient, if it is from 1979 it is impressive

Eru fast draw is similar (maybe he knew FS1), also allows you to use colors, you can use in GRAPHICS 8,15

Edited January 7 by tebe

tebe · January 8

A new line drawing method for the cycle savvy

http://www.quiss.org/boo/

Edited January 8 by tebe

R0ger · January 11

On 1/8/2024 at 8:25 PM, tebe said:

A new line drawing method for the cycle savvy

http://www.quiss.org/boo/

Anyone has the source code, or better explanation ? I understand the core (I think) I just don't see how it can be fast.

phaeron · January 12

5 hours ago, R0ger said:

Anyone has the source code, or better explanation ? I understand the core (I think) I just don't see how it can be fast.

If I understand the post, it's faster because you don't have to update a fractional accumulator. Instead, the X register is loaded with a slope factor value and each pixel update in the unrolled loop compares against it with a different value to determine if it should do a fractional update. The slope threshold determines what %age of the pixel updates do a +1 and the bit-reverse spreads out the +1s semi-regularly instead of clumping them at one end. So if you're drawing a 4-pixel line, you'd jump to the tail of the routine that compares X against a sequence like $00, $80, $40, $C0 and X controls whether the routine fractionally increments by +1-4 over the line.

Aside from the 6502 cycle savings, another interesting advantage of this algorithm is that the fractional calcs for each pixel are independent -- only final summation into the minor axis is dependent. This makes BRR potentially lucrative for drawing lines with the VBXE blitter.

+Stephen · January 12

21 minutes ago, phaeron said:

Aside from the 6502 cycle savings, another interesting advantage of this algorithm is that the fractional calcs for each pixel are independent -- only final summation into the minor axis is dependent. This makes BRR potentially lucrative for drawing lines with the VBXE blitter.

I still can't wrap my head around using the blitter for calculating or line drawing - I can only see the straight ahead use of a fast mem copy from a to b.

Heaven/TQA · January 12

13 hours ago, Stephen said:

I still can't wrap my head around using the blitter for calculating or line drawing - I can only see the straight ahead use of a fast mem copy from a to b.

well... remember that the blitter can ADD and SUB... think of multipass... so you can do logical operations (plus ADD, SUB, AND, OR, EXOR, Amiga has 256 minterms plus several source channels where to pull data).

then you have modulo offsets which could be the slope. or you have lookup tables which you can blit into 6502 unrolled code..... etc etc etc

R0ger · January 12

19 hours ago, phaeron said:

If I understand the post, it's faster because you don't have to update a fractional accumulator. Instead, the X register is loaded with a slope factor value and each pixel update in the unrolled loop compares against it with a different value to determine if it should do a fractional update. The slope threshold determines what %age of the pixel updates do a +1 and the bit-reverse spreads out the +1s semi-regularly instead of clumping them at one end. So if you're drawing a 4-pixel line, you'd jump to the tail of the routine that compares X against a sequence like $00, $80, $40, $C0 and X controls whether the routine fractionally increments by +1-4 over the line.

Aside from the 6502 cycle savings, another interesting advantage of this algorithm is that the fractional calcs for each pixel are independent -- only final summation into the minor axis is dependent. This makes BRR potentially lucrative for drawing lines with the VBXE blitter.

Yeah, I understand this part. But as you said, it looks like they have the whole line unrolled, so every bitreverse table line has it own code. It wouldn't be unheard of, obviously .. for a demo, that is. But then you still have to draw the pixels, solve when to move to the next byte, do other stuff .. which won't go too well with unroll. I'd look into the demo .. but it's C64, and the computation is on the disk drive, so I don't feel too brave to go into it.

phaeron · January 13

4 hours ago, R0ger said:

Yeah, I understand this part. But as you said, it looks like they have the whole line unrolled, so every bitreverse table line has it own code. It wouldn't be unheard of, obviously .. for a demo, that is. But then you still have to draw the pixels, solve when to move to the next byte, do other stuff .. which won't go too well with unroll. I'd look into the demo .. but it's C64, and the computation is on the disk drive, so I don't feel too brave to go into it.

Maybe not, if it works out. Could be arranged, for instance, so that an 8-pixel line routine draws with thresholds $E0, $60, $A0, $20, $C0, $40, $80, $00. 4-pixel line then jumps into the middle of it and uses only the $C0, $40, $80, $00 tail portion. But no way to confirm without actually trying it or mining out the code from the demo, and yeah, C64 1541 code, blech. 😁

ClausB · January 22

ClausB · January 24

Working on my TRON Recognizer animation from 1983. Originally (below) it rendered 72 frames of GR.22 bitmaps, 2K each, and played them back at 15 fps on an 800 equipped with the Axlon 128K RAMdisk. Using my ported SubLogic FS1 line plot routines, now it can draw and erase each GR.23 frame in less than 1/15 second, so no need for so much RAM. The raw speed is measured at 1152 lines per second! Of course, most are short lines, and a few of the 59 lines per frame are clipped away, and the CPU runs faster in GR.23 than in GR.24.

The original BASIC program rendered the animation over an hour or so and saved 144K of bitmaps to 2 SD floppies. An assembly program played it back on the Axlon at 15 fps with sound effects. I modified the BASIC program to save and clip line endpoint coordinates instead, only 17K. An assembly program feeds those coordinates to the FS1 plotter. A side-effect of using GR.23 is more colors, so I took advantage (above). Some work remains, such as double-buffering and sound.

Thanks to @tebe and @Rybags for inspiring some optimizations.

+Stephen · January 24

53 minutes ago, ClausB said:

Working on my TRON Recognizer animation from 1983. Originally (below) it rendered 72 frames of GR.22 bitmaps, 2K each, and played them back at 15 fps on an 800 equipped with the Axlon 128K RAMdisk. Using my ported SubLogic FS1 line plot routines, now it can draw and erase each GR.23 frame in less than 1/15 second, so no need for so much RAM. The raw speed is measured at 1152 lines per second! Of course, most are short lines, and a few of the 59 lines per frame are clipped away, and the CPU runs faster in GR.23 than in GR.24.

The original BASIC program rendered the animation over an hour or so and saved 144K of bitmaps to 2 SD floppies. An assembly program played it back on the Axlon at 15 fps with sound effects. I modified the BASIC program to save and clip line endpoint coordinates instead, only 17K. An assembly program feeds those coordinates to the FS1 plotter. A side-effect of using GR.23 is more colors, so I took advantage (above). Some work remains, such as double-buffering and sound.

Thanks to @tebe and @Rybags for inspiring some optimizations.

Looking forward to seeing the new version.

ClausB · January 28

Had some trouble after sync'ing to VCOUNT to wait for the display list to finish before clearing the bitmap for the next frame. Discovered that some frames take a bit too long for 15 fps. Was getting 14.5 on average. Had to resort to a look-up table for the *40 routine (instead of the optimizations credited above). Also had to disable the deferred VBIR with a SEI. Now it averages 15 fps, though a few frames are long and a few are short. Tested in emulation only. Please test on real hardware and report back.

Runs in 40K RAM:

TRON2.XEX

Edit: Don't forget to press START after it loads and the screen goes black.

Edited January 28 by ClausB

Teapot · January 28

NTSC 800XL w/ U1MB.

Works great in both stock and 1088K memory modes.

Now that you are using multiple colors maybe the recognizer can be red.

ClausB · January 28

58 minutes ago, Teapot said:

NTSC 800XL w/ U1MB.

Works great in both stock and 1088K memory modes.

Now that you are using multiple colors maybe the recognizer can be red.

Thanks for testing!

Yes, well, it could have been red originally. In 1983 I made it from memory (I didn't have YouTube then, not even VHS) so the shape isn't quite right either. I was lucky it was even orange, since I did it on a B&W TV!

Teapot · January 29

6 hours ago, ClausB said:

Discovered that some frames take a bit too long for 15 fps.

If you implement hidden line removal you could possibly reach 20+fps.

4 hours ago, ClausB said:

the shape isn't quite right either.

Coincidentally, I came across my copy of the novelization in a box a few days ago. Published in 1982 - "COMPLETE WITH A FABULOUS FULL-COLOR PHOTO INSERT!" None of the photos has a recognizer so not even that would have helped.

All things considered it's a great rendition.

ClausB · January 29

On 1/24/2024 at 12:23 PM, Stephen said:

Looking forward to seeing the new version.

Have you tried it on hardware? Do you see any flaws, like flashes of black lines, or uneven frame rate? Thanks.

+Stephen · January 29

51 minutes ago, ClausB said:

Have you tried it on hardware? Do you see any flaws, like flashes of black lines, or uneven frame rate? Thanks.

Not yet - sorry. I'm at a rare point in time where none of my Ataris are hooked up. I'll try ASAP.

Mariano DM · January 30

On 12/23/2023 at 7:27 AM, ClausB said:

If you calculate all the endpoints before clearing the screen and drawing the lines, then it would be easier to judge the line drawing speed. Might not even need double buffering then. FS1 does just that and gets by without a double buffer.

This is the test of the fast_draw routine. my only problem is that it scales to 160*192 to avoid 2 byte calc.
it is pretty fast though,
https://github.com/marianodominguez/8bit-samples/blob/master/cc65/simple_graph/src/atari_draw.s

Known fast line algorithms ?

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members