
Scrolling in bitmap modes on A8



@ Sheddy

 

I think Analmux is talking about something like this? I wasn't quite thinking along the same lines to start with, but his idea should work out better.

 

Yes, that's a bit what I mean, except, you'd need 16 kB screendata. Nice pictures you made  :D.

 

I thought about this method last weekend. It must be possible to do this in both directions, that's for sure. But I dropped the idea, because it needs twice as much RAM.

 

I'm working on a Super Mario clone, which plays in ONE load from disk with 40 big levels, so I had to minimize the amount of RAM needed for the screen. Though it's in charmode 4, I had just about 1500 Bytes left for the screen, so I'm using a different technique, which can be applied in Antic E too.

 

It's based on some Display List hocus pocus, splitting the screen in a way you don't have to worry about the LMS bullshit. It's usable for both horizontal AND vertical scrolling.

 

You know, when you make a scroller you MUST take care of where the graphicsline BEFORE the LMS-line ends. But I don't. I just make two identical lines in different memory locations to solve the problem. So for the exceptional lines there's always one copy somewhere else.

 

The exceptional lines are not only lines before LMS but also at the line with the highest address, so the line after.... forget it, it's too complex  :D , I'll make some pictures to demonstrate this soon.

 

It's kind of a complex way, but it saves the most time AND memory.

 

-----

mux

 

Very intriguing

Look forward to seeing how this works.. :D (Instantly some Q's arise...How can you avoid having almost 2 screens worth of memory set aside without having to do a large copy at some point???) Sounds clever!

 

Hadn't really thought it all through to having double buffered software sprites...mode E - 30K - scary.


Sheddy... the clever thing in this "2 column method" is that you don't need to do this "copy whole screen back" at all... as you are actually building 2 screens... the only thing you have to do then is reset the LMSs in the dlist... very clever... damn, it's so common on nintendo machines that i should have thought of that method earlier... :D

 

check out f.e. mario world in the No$GB emulator... and have a look at the tilemap... or get an intro with a softscroller... then you see that they do exactly the same as the described one...

 

hve


No Sheddy & Heaven, that's not what I mean.

 

you'd just need 8kB and you have only ONE screen, and only ONE column to buffer the incoming graphics.

 

You can cleverly time software sprite rendering when you paint the highest sprites (on the upper part of the screen) first, before a certain deadline.

 

You won't need double buffering then.

 

-----

mux


i was just referring to horizontal scrolling in bitmap mode without the need to copy the whole screen at a certain point before you reset the LMSs...

 

Then we're still talking about the same thing :D. My method uses 8 kB with one screen and some DL hocus pocus (and you still don't need to copy the whole screen once in a while), instead of 16 kB and two screens for horizontal (& vertical scrolling).

 

The method Sheddy explained and mine differ in the amount of double-mapped graphics. Sheddy's method uses a copy of the graphics on EVERY line (might be 192), and I use a copy of graphics only on the exceptional lines (might be 2, 3 or 4), namely the lines that cause trouble. So my screen has approx. 196 * 40 bytes = approx. 8 kB, and Sheddy's screen has approx. 192 * 80 bytes = approx. 16 kB.
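As a quick sanity check of those two figures, here is a small Python sketch. It is not part of the original posts: the function names are made up for illustration, and the count of 4 extra lines is just the upper end of the "2, 3 or 4" guess above.

```python
# Rough screen-RAM estimate for the two schemes (Antic E, 192 lines).
LINES = 192
LINE_BYTES = 40          # one normal-width mode E scanline

def double_width_bytes(lines=LINES, line_bytes=LINE_BYTES):
    """A copy of the graphics on EVERY line, so each line is twice as wide."""
    return lines * line_bytes * 2

def exceptional_line_bytes(lines=LINES, line_bytes=LINE_BYTES, extra_lines=4):
    """One screen plus a few duplicated 'trouble' lines (extra_lines is assumed)."""
    return (lines + extra_lines) * line_bytes

print(double_width_bytes())       # 15360
print(exceptional_line_bytes())   # 7840
```

So the saving is roughly one full screen of RAM, whatever the exact number of exceptional lines turns out to be.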

 

-----

mux


Mine writes a new column to the left and the right of the viewing window once every four frames (scrolling at a pixel a frame).  My prototype NES scrolling routine works in almost exactly the same way, it seems to be the norm on the NES... =-)

 

Is the NES screen that flexible?

 

A NES screen is 32 chars wide, but are you saying that it has a 64 chars wide screen area in which the NES can scroll?

 

Does the NES also have a Dlist? Or is it bound to 64 chars lines?

 

Isn't the NES screen a perfect wrap-around with exactly 32 chars wide? It seems to me it has a colorbug on the borders. Look at Mario 3 when scrolling. In the right border the color/palette-RAM is refreshed.

 

-----

mux


Mine writes a new column to the left and the right of the viewing window once every four frames (scrolling at a pixel a frame).  My prototype NES scrolling routine works in almost exactly the same way, it seems to be the norm on the NES... =-)

 

Is the NES screen that flexible?

 

A NES screen is 32 chars wide, but are you saying that it has a 64 chars wide screen area in which the NES can scroll?

 

The NES screen RAM is actually four blocks of memory with optional mirroring, so you can have screens at $2000, $2400, $2800 and $2c00 arranged in a 64 by about 60 character square. As you increment the horizontal smooth scroll, the $2000 screen moves off the left and the $2400 screen arrives from the right and something similar happens vertically.

 

Does the NES also have a Dlist? Or is it bound to 64 chars lines?

 

No Dlist (you don't get much more than an interrupt when a Vblank occurs) but it takes care of the screen for you since there are only a couple of modes. Anything that's split screen is an evil bit of timing... =-)

 

Isn't the NES screen a perfect wrap-around with exactly 32 chars wide?

 

No, although if my memory serves it's possible to set the mirrors to work that way. It may even be the default for all i remember of the (sparse) docs i've found! =-)

 

It seems to me it has a colorbug on the borders. Look at Mario 3 when scrolling. In the right border the color/palette-RAM is refreshed.

 

That's because the tiles are 8*8 pixels but the colour attributes are 16*16 pixels and a 32*32 pixel area is represented by four bit pairs in one byte(!) so, unless you're quite careful and start working in bitpairs, that colour change over happens on visible screen - Parodius has the same at the far left for example. My own routine doesn't do colour shifting yet because i'm still trying to get my head around a good way to deal with it; judging by how Journey to Silius and others look, there are ways to handle it cleanly.
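To make the bit-pair business a bit more concrete, here is a minimal Python sketch of how a tile position maps to an attribute byte and bit pair inside one NES nametable. The helper names are invented for this example, and the bytearray simply stands in for one 1 KB nametable rather than real PPU memory.

```python
ATTR_OFFSET = 0x3C0   # the attribute table follows the 32x30 tile bytes

def attribute_location(tile_x, tile_y):
    """Return (byte offset inside the nametable, bit shift) for one 8x8 tile."""
    byte_offset = ATTR_OFFSET + (tile_y // 4) * 8 + (tile_x // 4)
    shift = ((tile_y % 4) // 2) * 4 + ((tile_x % 4) // 2) * 2
    return byte_offset, shift

def set_palette(nametable, tile_x, tile_y, palette):
    """Change the 2-bit palette of the 16x16 pixel block that contains the tile."""
    offset, shift = attribute_location(tile_x, tile_y)
    cleared = nametable[offset] & ~(3 << shift) & 0xFF
    nametable[offset] = cleared | ((palette & 3) << shift)

nt = bytearray(0x400)          # stand-in for one nametable
set_palette(nt, 2, 0, 3)       # third and fourth tile columns of the top block
print(hex(nt[0x3C0]))          # 0xc
```

Because one byte covers four tile columns, a scroller that streams in new columns has to read-modify-write the bit pairs it is not replacing, which is why the colour change tends to show up at the screen edge.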


ok guys, here is the 1st code fragment.... it's more a riddle for you as i don't know why it destroys the displaylist and why the window is not scrolling fast...

 

i divided the screen in 2 areas so actually scrolling area is 1st half of the screen (routine stops at scanline 96)

 

in the 2nd half you can see how the columns are built... they are built out of random colors... the start column number is based in the visible area of the screen just to see what is happening...

 

i am tired now so i tried to find the logic error and the bugs but i failed...so it is more a riddle for you guys... ;)

 

speak to you tomorrow...

 

hve/tqa

 

 

 

 

[code]
* antic e "gba like" scroll wrapper
* written by Heaven/TQA
*
* assembled with XASM / XBOOT
*
* heaven_tqa @ hotmail.com
* 02/09/03

* known bugs: the nasty '4k' boundary antic limitation causes gfx errors

gractl	equ	53277
graphp0	equ	$d00d
sizep0	equ	53256
hposp0	equ	53248
colbk0	equ	$d01a
hscrol	equ	$d404

dliv	equ	512
dlistv	equ	560
vbiv	equ	546
nmien	equ	$d40e
wsync	equ	$d40a

si	equ	$b0
di	equ	$b2

dlist	equ	$5000	;582 bytes
dlist2	equ	$5300
coltab	equ	$5800	;pattern lookup table
vram	equ	$6010	;320x192x4 = 80x192 bytes = 15360 total

	org	$4000

start	jsr	dlist_init
	jsr	tab_init

	jsr	wait_vbl
*	jsr	fill_vram
	mva	#0	559	;switch off screen dma (screen off)
	mva	#3	shscrol	;default scroll position
	mwa	#dlist	dlistv	;custom new display list
	mwa	#vbl	vbiv	;custom VBL
	mva	#$40	nmien	;enable VBL only (DLIs would need $c0)
	mva	#34	559	;normal screen width

loop	jsr	wait_vbl
	jsr	scroll
	jmp	loop

vbl	nop
vbl_end	jmp	$e45f	;back into OS routine

wait_vbl	mva	#1	540	;will be decremented by OS every VBL
wait0	lda	540
	bne	wait0
	rts

fill_vram	mwa	#vram	di
	ldx	#$3e	;15k to fill
fill0	ldy	#0
fill1	lda	53770	;random number 0-255
	sta	(di),y+
	bne	fill1
	inc	di+1	;next page
	dex
	bne	fill0
	rts

* this routine fills the two columns

fill_collum	mwa	#vram	di
	ldx	#192	;192 scanlines to fill
	lda	53770
	and	#3
	tay
	lda	coltab,y	;get color pattern
	sta	fc0+1
	ldy	collum
fc0	lda	#0
	sta	(di),y
	lda	di
	add	#80
	sta	di
	bcc	fc1
	inc	di+1
fc1	dex
	bne	fc0
	rts

scroll	dec	shscrol
	beq	scroll0	;hardscroll?
	mva	shscrol	hscrol	;softscroll
	rts
scroll0	mva	#4	shscrol
	mva	#0	hscrol
	jsr	fill_collum
	jsr	move_window	;move visible window
	inc	collum
	lda	collum
	cmp	#80	;reached the boundary? (80th column)
	beq	scroll1	;reset dlist to the left boundary
	rts

*copy back default dlist
scroll1	ldx	#0
	mva	#39	collum
scroll2	lda	dlist2,x
	sta	dlist,x
	lda	dlist2+256,x
	sta	dlist+256,x+
	bne	scroll2	;512 bytes
scroll3	lda	dlist2+512,x
	sta	dlist+512,x+
	cpx	#72	;+72 = 584 length of displaylist
	bcc	scroll3
	rts

move_window	mwa	#dlist+4	movew+1
	mwa	#dlist+5	movew2+1
	ldx	#96	;96 LMS lines = upper half of the screen
movew	inc	dlist+4	;add +1 on each LMS address
	beq	movew0
movew2	inc	dlist+5
movew0	lda	movew+1
	add	#3
	sta	movew+1
	bcc	movew1
	inc	movew+2
movew1	lda	movew2+1
	add	#3
	sta	movew2+1
	bcc	movew3
	inc	movew2+2
movew3	dex
	bne	movew
	rts

dlist_init	mwa	#vram	di
	lda	#$70	;$70,$70,$70
	jsr	putline
	jsr	putline
	jsr	putline
	ldx	#192	;192 scanlines
dlinit0	lda	#$5e	;now generate the lms commands
	jsr	putline
	lda	di
	jsr	putline
	lda	di+1
	jsr	putline
	lda	di
	add	#80	;80 bytes per scanline
	sta	di
	bcc	dlinit3
	inc	di+1
dlinit3	dex
	bne	dlinit0
	lda	#$41
	jsr	putline
	lda	#<dlist
	jsr	putline
	lda	#>dlist
putline	sta	dlist
	sta	dlist2	;our copy of the default dlist
	inc	putline+1
	inc	putline+4
	beq	putline2
	rts
putline2	inc	putline+2
	inc	putline+5
	rts

tab_init	ldx	#0
tab0	txa
	and	#3
	tay
	lda	patttab,y
	sta	coltab,x+
	bne	tab0
	rts

shscrol	dta	b(3)
patttab	dta	b(%00000000,%01010101,%10101010,%11111111)
collum	dta	b(20)	;column number
[/code]
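For anyone who wants to poke at the riddle without an assembler, here is a small Python model of the display list that dlist_init builds and of the coarse scroll that move_window is meant to perform. It is only a sketch: build_dlist and coarse_scroll are names invented for this example, and the model describes the intended behaviour, not what the posted routine actually does.

```python
DLIST = 0x5000
VRAM = 0x6010
LINE_BYTES = 80
LINES = 192

def build_dlist():
    """3 x 8 blank lines, 192 x (LMS + HSCROL + mode E), then jump and wait for VBL."""
    dl = [0x70, 0x70, 0x70]
    for line in range(LINES):
        addr = VRAM + line * LINE_BYTES
        dl += [0x5E, addr & 0xFF, addr >> 8]
    dl += [0x41, DLIST & 0xFF, DLIST >> 8]
    return dl

def coarse_scroll(dl, lines=LINES):
    """Add 1 to each 16-bit LMS address; bump the high byte only on a carry."""
    for line in range(lines):
        lo = 3 + line * 3 + 1            # index of the low byte (dlist+4, dlist+7, ...)
        dl[lo] = (dl[lo] + 1) & 0xFF
        if dl[lo] == 0:                  # low byte wrapped, so carry into the high byte
            dl[lo + 1] = (dl[lo + 1] + 1) & 0xFF

dl = build_dlist()
print(len(dl))                           # 582
coarse_scroll(dl, lines=96)              # the listing only scrolls the top 96 lines
print(hex(dl[4]), hex(dl[5]))            # 0x11 0x60
```

The point to compare against the listing is the carry: in this model the high byte changes only when the low byte has wrapped to zero, which on the 6502 means skipping the second inc with bne. The posted move_window branches with beq after the first inc, so it may be worth checking that line first.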

demo1.zip


Still don't get how it could work Mux - you'll have to show us mere mortals an example!

 

OK - I'm beginning to get it....I think.... :ponder:

surely you still need LMS every line to avoid an ugly mess on the sides, rather than just a standard 40 byte screen, (one line straight after the other)? - or do we use widescreen so you don't see that? Waste some missiles to mask the sides? Or you have something more devious in mind, Mux ;)

 

So the 4K boundary problem crops up this way - but if you know which part of memory it's going to jump "incorrectly" to, you can have that place in memory prepared so it will look right. Way to go, Mux.
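A minimal Python sketch of that 4K behaviour, assuming the usual description of ANTIC's memory scan counter (the top 4 address bits stay fixed between LMS instructions and only the lower 12 bits count up). The function name antic_fetch is made up for this example.

```python
def antic_fetch(start, count):
    """Addresses ANTIC reads for `count` screen bytes when no LMS intervenes."""
    block = start & 0xF000            # upper 4 bits stay fixed
    offset = start & 0x0FFF           # lower 12 bits count up and wrap
    return [block | ((offset + i) & 0x0FFF) for i in range(count)]

addrs = antic_fetch(0x6FE0, 80)
print(hex(addrs[31]), hex(addrs[32]))   # 0x6fff 0x6000
```

So a line that runs over the end of a 4K block carries on at the start of the same block, and that is the place in memory that would have to be prepared in advance.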

 

 

Nice effort Heaven - but my brain is not in debug mode today though :D And those strange pseudo op codes you use in xasm confuse my poor little 6502 head when it is working ;)


surely you still need LMS every line to avoid an ugly mess on the sides?

 

No, just the exceptional trouble lines. Let me make some demo pics soon, I've got a big load of homework now. Expect something tomorrow.

 

It's not necessary to mask the sides. Just use a normal display, without any Missile tricks or something.

 

-----

mux


surely you still need LMS every line to avoid an ugly mess on the sides?

 

No, just the exceptional trouble lines. Let me make some demo pics soon, I've got a big load of homework now. Expect something tomorrow.

 

It's not necessary to mask the sides. Just use a normal display, without any Missile tricks or something.

 

-----

mux

 

Thanks - No need to rush :)


for the confused guys here:

 

mva (move value) will be assembled to

lda
sta

mwa (move word) will be assembled to

lda
sta
lda +1
sta +1

,x+ and ,y+ will be assembled to

,x
inx

,y
iny

add will be assembled to

clc
adc

so really no strange commands... i should show you numen source code with highly optimised XASM source (with all nasty tricks FOX can get out of XASM)... i bet you'll understand 5% ;) as me...

 

hve


errors found...

 

check the "move window" routine... the pseudo opcode might not generate the code i thought and mess up the display list. that's mainly the reason for getting messed up...

 

next bug is putting the vram to $x010 and therefore causing 4k boundary display errors as mux pointed out...so i changed the scanline length to 128 bytes... and it seems to help a lot...still some weird pixels but now it's more visible how the whole stuff works...

 

now imagine that the "color bars" are pure antic e gfx like mario background etc... and you got it...

 

mux, can you give me some kind of starting address for the vram...to avoid this 4k block restriction??? (i was never good at maths... ;))

 

hve/tqa

demo1.zip


Okay, here is the demopic:

 

For simplicity consider a 3*3 charactermode display.

 

The numbers in the grid denote the linenumber, as they are mapped in memory:

 

0 is mapped at f.e. $8000

1 is mapped at $8003

2 is mapped at $8006

3 is mapped at $8009

 

line 0 plays the role of the copy of line 3.

When wrapping around, there is nothing beyond line 3, so it must have some connection to line 1, that's why line 0 is the same as line 3. You see in picture 1B how this works.

 

Picture 1A is just a start configuration.

When scrolling vertical, just move from pic 1A --> 2A --> 3A --> 1A and so on, with the lowest line not in sight to buffer new graphics on screen. When vertical buffering happens in line 0 or 3 then make a copy to line 3 or 0 respectively.

 

When scrolling horizontal, just move from pic 1A --> 1B --> 1C --> 2A --> 2B ...... --> 3C --> 1A and so on.

 

The LMS is only needed at the top of the screen, and a second LMS where line 0 has to be displayed.

 

When a screen is bigger (that is: it contains more than 4 kB), you can apply the principle again. So for every screen that is smaller than n*4 kB, you'd need n extra/exceptional lines. And you can replace charmode with bitmap mode, with the usual 40-byte line length.

 

I hope it's a bit clearer now.
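For readers without the pictures, the 3*3 example can be simulated in a few lines of Python. This is only a sketch of the idea as described above: the names (build_memory, row_addresses, lms_rows, write_ring) are invented here, k is the number of bytes the window has moved into the ring formed by lines 1 to 3, and the letters stand in for screen bytes.

```python
W, H = 3, 3            # bytes per line and visible rows, as in the 3*3 example
RING = W * H           # lines 1..3 form a 9-byte ring
BASE = 0x8000          # line 0 (the copy of line 3); line 1 starts at BASE + W

def build_memory(ring):
    """Line 0 is a duplicate of line 3, followed by lines 1, 2 and 3."""
    return list(ring[-W:]) + list(ring)

def row_addresses(k):
    """Start address of each visible row when the window begins k bytes into the ring."""
    rows = []
    for r in range(H):
        q = (k + W * r) % RING
        if q > RING - W:                  # row starts inside line 3: use the copy
            rows.append(BASE + q - (RING - W))
        else:
            rows.append(BASE + W + q)
    return rows

def lms_rows(k):
    """Rows that need an LMS because ANTIC would not arrive there by itself."""
    rows = row_addresses(k)
    return [r for r, a in enumerate(rows) if r == 0 or a != rows[r - 1] + W]

def render(memory, k):
    return [memory[a - BASE:a - BASE + W] for a in row_addresses(k)]

def write_ring(memory, q, value):
    """Store one byte at ring offset q, mirroring line 3 into its copy in line 0."""
    memory[W + q] = value
    if q >= RING - W:
        memory[q - (RING - W)] = value

ring = list("abcdefghi")
memory = build_memory(ring)
for k in range(RING):
    print(k, ["".join(row) for row in render(memory, k)], "LMS at rows", lms_rows(k))
```

In this model k = 0, 1, 2 correspond to pictures 1A, 1B, 1C and k = 3 to 2A. Every position shows the ring correctly wrapped, and no position needs more than two LMS instructions.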

 

-----

mux


Some additional notes:

 

when the screen is actual in configuration 1B then in vertical scrolling you'd have to move from 1B --> 2B --> 3B --> 1B and over and over. Or just the other way around, when scrolling in a different direction.

 

The main thing to worry about is the computing of the buffer addresses for the horizontal/vertical update of screen data. When you write somewhere in line 3 then you must write to line 0 (with the same offset) at the same time. The horizontal buffer is still a column at the far right of the display.

 

I'll make some ASM-proc soon to show this in Antic 4. It will be the screen manager for my Mario game (I only found this method last weekend, so it's not even totally clear to me :D )

 

You see, there is never a time you need to copy the whole screen. Just (parts of) a line in vertical scrolling. You'd also only need one extra line of memory (40 extra bytes per 4kB Block you're using for the screen).

 

The minimum amount of LMSs is an advantage too, because in f.e. Heaven's routine you still need to update 100 or 200 bytes in the DL every 4th pixel.

 

@ Heaven.

 

Do you want to make the scrolling full-screen? Then why not keep a 80byte wide display?

 

As I wrote before, 4096/80 = approx. 51, so 192 lines would need 4 parts.

 

Just let every part of 51 lines start at the first address of a 4k block.

 

So:

First part:

line 0: $8000

line 1: $8050

line 2: $80A0

line 3: .....

.....

line 50: $8FA0

 

Second part:

line 51: $9000

line 52: $9050

.....

.....

etcetera.
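The same layout as a small Python sketch, in case a table generator is handy. The base address $8000 is taken from the example above, and line_address is just a name made up for illustration.

```python
BASE = 0x8000
LINE_BYTES = 80
LINES_PER_BLOCK = 4096 // LINE_BYTES      # 51 whole lines fit in one 4K block

def line_address(line):
    """Start address of a scanline when every group of 51 lines starts a new 4K block."""
    block, row = divmod(line, LINES_PER_BLOCK)
    return BASE + block * 0x1000 + row * LINE_BYTES

for n in (0, 1, 2, 50, 51, 52):
    print(n, hex(line_address(n)))
```

With this layout 192 lines occupy four 4K blocks, and the last 16 bytes of each full block stay unused.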

 

-----

mux


@mux...

 

why are complicated things sometimes simple? i am scrolling horizontal and steve wanted to have an antic e wrapper... ;) that's the reason why i am dealing with highres... :D antic e would be more ideal imho as you have to deal with less data....

 

i will split the copy routines into 4 chunks tonight to avoid the 4k limitation as you mentioned (every 51st scanline).

 

unfortunately i seem to need 192 LMS commands or is there a way to avoid that? but i do not know how you will then work with 80 bytes scanlines...

 

if everything works smoothly then we can talk about optimisations like

 

- putting copy routines into zeropage

- using 256 byte length scanlines ($c000 vram!). easier for sprite copy routine and you can have 4 screens for double buffering on the same line:

 

vram1 (0-79)

vram2 (80-119)

vram3 (120-159)

vram4 (160-199)

 

- scheduling the work more proper for less CPU usage per frame:

 

4 frames will be scrolled per hardware scrolling...so 4 full frames are free to build the next "stripes". so the "copy_collum" can be divided into 4 portions... or we even scroll full 16 pixel with hardware to gain more...

 

hve


i guess the scrolling isn't smooth as the softscroll routine is not perfect (again, ages since i did the last one... ;)) and mainly it's taking too much time right now so it's not in sync with VBL... (that's why i haven't put the scroller into the "real" vbl...)

 

and i haven't tried it out on real hardware just in emulator...

 

hve


unfortunately i seem to need 192 LMS commands or is there a way to avoid that? but i do not know how you will then work with 80 bytes scanlines...

 

You can't avoid it. Only if you make 40bytes scanlines, like my scroller will allow for.

 

By the way .. sounds great about an antic4 display with 256 byte rows, but isn't that overkill?? The more bytes you're using, the slower things get.

 

By the way 2.

I played Feud last night, and indeed it doesn't scroll at all. It even has a miserable animation rate of 15 frames/sec. or something :D, AND the worst part: it isn't full screen (like I seem to remember).

 

-----

mux


i am sure that dealing with 256 bytes long scanlines gains more speed due to fewer calculations... think of copying sprite data f.e. which you will do a lot in a game in general... and the "overhead shadow bytes" in each scanline can be used for tables, variables, etc...

 

so it's just another kind of organisation of RAM...

 

and if you use the PRO loader of XBOOT then you have a lot of RAM as the OS is switched off...

 

ok... might be an overkill but maybe you don't need 192 scanlines maybe just 128 for your shooter...

 

but if you are using antic 4 (for your mario...) then i would even try to go for 256 bytes scanlines as you might gain cycles because of the clever moving/positioning routines...

 

vram[xpos+ypos*256] which is ideal... isn't it?
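a rough python sketch of the difference, just to show the address math... the function names and the $4000 base are made up for this example, and the base has to sit on a page boundary for the trick to work:

```python
def address_80(base, x, y):
    """80-byte scanlines: every access needs a multiply or a lookup table."""
    return base + y * 80 + x

def address_256(base, x, y):
    """256-byte scanlines on a page boundary: y goes into the high byte, x is the low byte."""
    assert base & 0xFF == 0 and 0 <= x < 256
    hi = (base >> 8) + y
    lo = x
    return (hi << 8) | lo

print(hex(address_80(0x4000, 10, 3)))    # 0x40fa
print(hex(address_256(0x4000, 10, 3)))   # 0x430a
print(192 * 256, 128 * 256)              # 49152 32768 bytes of address space
```

on the 6502 the second form means ypos can simply be added to the high byte of a zero page pointer with xpos in the Y register, so no multiply... the price is the address space: 48 kB for 192 lines, 32 kB for 128.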

 

hve

 

ps. and 256 bytes scanlines is standard on the 7800... ;)
