
Help desperately needed: streamlining the CIN code


8bit-Dude


This is a "call to arms" for veterans.

 

I am developing a cross-platform Atari/C64 online top-down racing game (8bit-slicks).

see: http://8bit-slicks.com/ and https://github.com/8bit-Dude/8bit-Slicks

 

The C64 implementation is complete, but I have been battling the A8 for the past 3 weeks to figure out how to get good GFX/Sprites/Music/Sound.

 

I am 95% done, but there is one last thing that I am struggling mightily to achieve: streamlining the CIN code for gaming.

The CIN code as output by Atari Interlace Studio disables the OS ROM (mva #$fe portb) and all interrupts (sei), and runs a custom NMI handler (mwa #NMI $fffa).

This is fine when just showing an image, but not when running a game because the timers, keyboard, joysticks... don't get updated.

 

So I would like to have the CIN code running in a way that is similar to the output of the JAG creator: One VBI to set the first line, and then switch GFX each raster line with DLIs.
Then set up the VBI like this, so that it still runs the OS VBlank code for timers, etc.:

		ldy #(<_VBI)				; install VBI
		ldx #(>_VBI)
		lda #6
		jsr SETVBV

So yeah, this is what I wanna do but I am struggling mightily to write the ASM code for it... :-(
So if one of the veterans of A8 ASM could help me achieve this, I will be soooo grateful!

 

P.S: it will be even better if the code is pure 6502 opcodes, rather than MADS macros, because then I can compile it with CC65 as part of the main program!

 

sources.zip

menu.xex


If you're running a DLI each line you probably won't save any or much CPU vs a single DLI that covers the whole visible window. If it's character mode then you have the badlines which make it even harder.

 

An extreme example of cycle saving is the Project M (Wolf3D) game engine. It uses narrow DMA mode and instead of DLIs has Pokey Timer IRQs which are tuned to a cycle on the scanline to minimize CPU wastage.

That is bitmap mode, and OS switched out to further streamline the interrupt processing.

 

With DLIs you can get similar savings. Normal OS processing is like:

- several cycles pushing PC, status to the stack, then load PC with the NMI handler address from $FFFA, can't be avoided.

- OS based code does BIT NMIST / BPL VBLANK / JMP ($200)

 

Maximum cycle saving can be had by just having the NMI code execute the DLI straight away without the test, but then you need to keep track of where you are either by soft counter or reading VCOUNT register.

The next biggest saving is to delete the JMP ($200) and just put your DLI code starting there - put your VBlank handler somewhere just before the main NMI handler.

 

What you use can come down to what machines you want to be compatible with, how many cycles you think you can save etc.
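To put rough numbers on the dispatch overhead described above, here is a minimal sketch in C (host-side arithmetic only, not Atari code). The cycle counts are the standard 6502 timings for BIT absolute, a not-taken BPL, and an indirect JMP; the figure of 192 DLIs per frame is an assumption for illustration (one DLI per visible line), and the function name is made up.

```c
/* 6502 cycle counts for the OS-style DLI dispatch quoted above. */
enum {
    CYC_BIT_ABS       = 4, /* BIT NMIST                            */
    CYC_BPL_NOT_TAKEN = 2, /* BPL falls through when it is a DLI   */
    CYC_JMP_INDIRECT  = 5  /* JMP ($200)                           */
};

/* Cycles spent per frame on dispatch alone (hypothetical helper).
 * keep_test: the handler still does BIT/BPL to tell DLI from VBI.
 * keep_jmp:  the handler still jumps through the vector at $200. */
unsigned dispatch_cycles_per_frame(unsigned dlis_per_frame,
                                   int keep_test, int keep_jmp)
{
    unsigned per_dli = 0;
    if (keep_test) {
        per_dli += CYC_BIT_ABS + CYC_BPL_NOT_TAKEN;
    }
    if (keep_jmp) {
        per_dli += CYC_JMP_INDIRECT;
    }
    return per_dli * dlis_per_frame;
}
```

With 192 DLIs per frame, the stock path costs 11 cycles per DLI, dropping only the JMP leaves 6, and running the DLI code straight from the NMI vector leaves none. The unavoidable interrupt entry is not counted here.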


If you're just trying to get minimal VBI functionality back to use this image as a menu screen, do this to get normal OS processing back:

  • change the writes from the NMI vector at $FFFA-FFFB to just write the DLI vector into VDSLST
  • change the display list write from "dlptr" (DLISTL) to the OS shadow at SDLSTL ($0230)
  • move VBI writes to direct hardware registers into the OS shadow registers instead: SDMCTL ($022F) and COLOR0-COLOR4 ($02C4-02C8)
  • delete the PORTB writes

This will cause the DLI to be activated by the normal OS DLI path. You won't have a lot of CPU time as the DLI will occupy the CPU during the entire image, but it'll be enough for simple menus.
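The steps above amount to writing the OS shadow locations instead of the hardware registers. A minimal sketch in C, using a plain array as a stand-in for the Atari address space so it can be checked on any host; the shadow addresses are the ones named in the list, the $22 value for SDMCTL is the one the generated CIN code already uses, and the helper names and the display list / DLI addresses passed in are hypothetical.

```c
#include <stdint.h>

/* OS shadow locations named in the list above. */
#define VDSLST 0x0200u /* DLI vector (2 bytes)              */
#define SDMCTL 0x022Fu /* shadow of DMACTL                  */
#define SDLSTL 0x0230u /* shadow of DLISTL/DLISTH (2 bytes) */
#define COLOR0 0x02C4u /* COLOR0-COLOR4 live at $02C4-$02C8 */

uint8_t ram[0x10000]; /* stand-in for the 64K address space */

void poke(uint16_t addr, uint8_t value) { ram[addr] = value; }

/* Store a 16-bit value low byte first, as the 6502 expects. */
void poke_word(uint16_t addr, uint16_t value)
{
    poke(addr, (uint8_t)(value & 0xFFu));
    poke((uint16_t)(addr + 1u), (uint8_t)(value >> 8));
}

/* Route everything through the shadows so the OS VBI stays in charge. */
void setup_via_shadows(uint16_t dlist, uint16_t dli, const uint8_t colors[5])
{
    unsigned i;
    poke_word(SDLSTL, dlist);  /* instead of writing DLISTL directly   */
    poke_word(VDSLST, dli);    /* instead of patching $FFFA/$FFFB      */
    poke(SDMCTL, 0x22);        /* instead of writing DMACTL            */
    for (i = 0; i < 5u; ++i) { /* instead of COLPF0-3/COLBK            */
        poke((uint16_t)(COLOR0 + i), colors[i]);
    }
}
```

In a cc65 build the same writes would go to real memory rather than to the `ram` array; the array is only there to make the sketch testable.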

 

While you're at it, might as well fix all the other lameness in the output: $D402 is DLISTL and not DLPTR, $D016-D019 is COLPF0-COLPF3 and not COLOR0-3, $D01A is COLBK and not COLBAK, $D01B is PRIOR and not GTICTL, 20 is RTCLOK+2, 764 is CH, and the STA NMIST in the VBI handler should be STA NMIRES.

 

Trying to do this kind of mode through a DLI per scanline is not effective, because DLIs trigger at the start of a scanline while the change needs to occur at the end, so the DLIs will eat almost all the CPU in STA WSYNC anyway. To get significant CPU back requires using an IRQ instead, as Rybags notes.

Edited by phaeron

Guyz, thanks for the good replies already!

 

I plan to use CIN for actual gameplay, so reducing the time used by CIN is critical.
With the current code, FPS tests show my game running at 10-20 FPS (depending on the number of players, 2-4).

If the FPS drops due to the use of DLIs, it is bad news....

 

Could you guyz give me the specific lines of code I should use to convert this code to IRQ, while still having the OS ROM running in the background for timers/keyboard/joysticks?


Timer IRQs are poorly serviced by the OS as they're way down the list so will have totally unacceptable overhead if used more than once every several scanlines.

 

The Immediate IRQ vector is a good method - if you mask all IRQs except the timer you're using then it's guaranteed that it's the source, so no further checking is needed (disregarding that an inadvertent BRK execution messes that up).

 

Even so, you're probably still better off using a RAM-based OS and taking over the hardware vector to save some cycles.

 

To get the Timer IRQ triggering on an exact cycle in the scanline every time, it's easiest to use 16 KHz Pokey mode (AUDCTL bit 0 = 1), but that also means all the sound will default to that frequency base. Additionally you'd lose the use of one voice.

For the initial IRQ you want to either put Pokey in the INIT state (SKCTL=00) then back to operating state at a specific time or use STIMER (I think that should work OK).

 

I don't have a code example handy but it'd be something like...

 

 

  lda #0
  sta skctl ; Pokey into INIT
  sta audf4 ; AUDF4 = 0 for 1 scanline delay between IRQs (timer 3 has no IRQ; only timers 1, 2 and 4 do)
  sta audc4 ; AUDC4 = 0 to mute volume
  lda #1
  sta audctl ; 16 Khz base frequency for audio, uses divider of 114 instead of 28
  lda $14
waitvb
  cmp $14
  beq waitvb ; wait for a VBlank to complete
  lda #3 ; value to get SKCTL back to normal
  sta wsync ; sync to near end of scanline
  nop
  nop
; variable number of nop and time wasting instructions to get the Timer IRQ alignment that we want
  sta skctl ; Pokey back into operational state
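Why AUDF = 0 gives exactly one IRQ per scanline can be checked with a little arithmetic. A minimal sketch in C, assuming the usual N+1 counting of an 8-bit POKEY channel on the divided clock; the divider values are the ones from the code comment above, and the names are made up for illustration.

```c
/* POKEY base-clock dividers in CPU cycles, per the comment above. */
enum {
    DIV_DEFAULT      = 28,  /* AUDCTL bit 0 = 0 */
    DIV_AUDCTL_BIT0  = 114  /* AUDCTL bit 0 = 1 */
};

/* CPU cycles between two timer IRQs for a given AUDF value
 * (assumption: the channel counts AUDF+1 ticks of the divided clock). */
unsigned timer_period_cycles(unsigned audf, unsigned divider)
{
    return (audf + 1u) * divider;
}
```

With the divider at 114 and AUDF at 0 the period is 114 cycles, which is the length of one scanline, so the IRQ lands on the same cycle of every line once it has been aligned. With the default divider of 28 it would drift across lines.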

Your cycle usage will be as follows -

graphics DMA 40, DList 1, Refresh 9, PM Graphics 5 = 55

 

Take that from 114 per scanline leaves you 59.

You could probably use some zero page based self-modifying code for the IRQ.

Something like:

 

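prior_val=*+1
  lda #$xx       ; 2 cycles, operand is patched on every line
  sta prior      ; 4 cycles
  eor #$c0       ; 2 cycles
  sta prior_val  ; 3 cycles (zero page)
  pla            ; 4 cycles
  rti            ; 6 cycles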

 

That's 21 cycles there.

On top of that you'd want to throw in about 6 to account for jitter (waiting for the current instruction to finish when the IRQ triggers), then another 7 (?) for IRQ pre-processing, then a good few more for the part of the IRQ routine that runs before this.

So effectively you'd be over 40 cycles. You could claw a few back by doing stuff like self-modifying code instead of PHA/PLA. Another trick you could use is separate routines that do the 00 and C0 values for PRIOR, and use those values as the IRQ routine address as well to reduce the cycle count.

 

But realistically, the best case scenario is probably going to be something between 10-16 cycles per scanline for normal program running.

That's not entirely bad - on a 200 scanline display that can mean 2000 to 3200 cycles you'd otherwise be missing out on.
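The budget above can be written out as a small calculation. This is only a sketch of the arithmetic in C, not Atari code: the DMA figures and the 21-cycle routine come from the post, the per-instruction counts are standard 6502 timings, and the `extra` parameter is a hypothetical stand-in for whatever the rest of the IRQ routine costs.

```c
enum { CYCLES_PER_LINE = 114 };

/* Cycles stolen per scanline: graphics DMA, DList, refresh, PM graphics. */
unsigned dma_cycles(void) { return 40u + 1u + 9u + 5u; }

/* LDA #, STA abs, EOR #, STA zp, PLA, RTI. */
unsigned irq_body_cycles(void) { return 2u + 4u + 2u + 3u + 4u + 6u; }

/* Cycles the CPU actually gets on one scanline. */
unsigned cpu_cycles_per_line(void) { return CYCLES_PER_LINE - dma_cycles(); }

/* Cycles left for the main program on one line.
 * extra is an assumption: the unlisted part of the IRQ routine. */
int leftover_per_line(unsigned jitter, unsigned irq_entry, unsigned extra)
{
    unsigned handler = irq_body_cycles() + jitter + irq_entry + extra;
    return (int)cpu_cycles_per_line() - (int)handler;
}

/* Cycles won back over a whole frame. */
unsigned reclaimed_per_frame(unsigned per_line, unsigned lines)
{
    return per_line * lines;
}
```

Plugging in 6 cycles of jitter, 7 for interrupt entry and about 9 more for the rest of the handler gives 16 cycles per line, which sits at the top of the 10-16 range quoted above.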


There is a lot of good talk here, but the problem is that it is way beyond my level of understanding.

To try and make things clearer, I have attached a portion of my code.

 

In the file demo.c, I load the menu CIN image, and then have a keyboard loop.
The keyboard hit is never detected, because the OS ROM is disabled.

 

If I remove the line that disables the OS ROM in the CIN asm file, then the image is not displayed.
I have a real hard time understanding the connections between the various elements: OS ROM, NMI, VBI, DLI.

 

So my hope is that someone will understand what I am hoping for: something like the JAG code (see attached), that is CC65 friendly, does not jam the OS (timers/keyboard), and does not burn all CPU time.

Atari-CIN.zip

Atari-JAG.rar


I'd recommend copying the OS to RAM; then you'd still have the services it offers.

 

If you choose to use the Pokey Timers, disable all other IRQs except the timer you use, then just overwrite the hardware IRQ vector at $FFFE/F.

Yes, you'll lose keyboard IRQ. But reading the keyboard just by the Pokey registers is pretty easy.


Or add keyboard IRQs further down your dispatcher. Look at the Altirra OS sources for good examples of pretty much everything you could want to do with regard to IRQ servicing.

 

As I said in PM, at this point it might be sensible to get rid of the OS entirely, mapping in the ROM only when you need to perform IO. Turbo BASIC XL does this, and so does TLW and a whole lot of other stuff. A small wrapper toggles bit 0 of PORTB either side of calls to DOS, etc.
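The wrapper idea can be sketched in C with a variable standing in for PORTB, so the logic can be checked off-target. This is a model only: `portb`, `with_os_rom` and `sample_io` are hypothetical names, and a real wrapper on the Atari would also have to decide what to do about interrupts while the mapping changes.

```c
#include <stdint.h>

uint8_t portb = 0xFE;  /* simulated PORTB; bit 0 clear = OS ROM switched out */
uint8_t rom_seen = 0;  /* records what the sample call observed              */

typedef void (*os_call_fn)(void);

/* Map the OS ROM in, run one OS/DOS call, then restore the previous state. */
void with_os_rom(os_call_fn call)
{
    uint8_t saved = portb;
    portb = (uint8_t)(portb | 0x01u); /* bit 0 set: ROM visible */
    call();
    portb = saved;                    /* back to RAM under the ROM */
}

/* Stand-in for a DOS/CIO call; it just notes whether the ROM was mapped in. */
void sample_io(void)
{
    rom_seen = (uint8_t)(portb & 0x01u);
}
```

Calling `with_os_rom(sample_io)` leaves `portb` at its original $FE afterwards, while the call itself ran with bit 0 set.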


My problem is as follows: I can get the entire game (IP65 network code, Joystick, SFX, RMT music, PMG) working with the 4 color Graphic Mode 15.

All I want now is to replace GFX 15 with CIN, as it gives me 64 colours with the same resolution.

 

But I would like to achieve this without restructuring the entire code, because the codebase is shared between C64, A8 and Apple II.


Hey Phaeron,

 

I think I followed your suggestion step-by-step, but I must have made a mistake somewhere, as the colours are wrong (see attached).

This is the current code:

// Atari Interlaced Studio

buf0	= $2010
buf1	= $4010

RTCLOK  = $0012
VDSLST  = $0200
SDMCTL	= $022F		;dmactl	= $d400
SDLSTL  = $0230 

COLPF0	= $02C4		;COLPF0	= $d016
COLPF1	= $02C5		;COLPF1	= $d017
COLPF2	= $02C6		;COLPF2	= $d018
COLPF3	= $02C7		;COLPF3	= $d019
COLBK	= $02C8		;COLBK	= $d01a

GRACTL	= $d01b		; NB: $D01B is really PRIOR; GRACTL proper is $D01D
SKCTL	= $d20f
;PORTB	= $d301

DLISTL 	= $d402
WSYNC	= $d40a
VCOUNT	= $d40b
NMIEN	= $d40e
NMIRES	= $d40f


/*-------------------------------------------------------------------------------------------------*/

	org $80

regA	.ds 1
regX	.ds 1
regY	.ds 1
cnt	.ds 1

/*-------------------------------------------------------------------------------------------------*/

	.get 'menu.dat',-9		; palette

	org buf0
	ins 'menu.dat',0,8000

	org buf1
	ins 'menu.dat',$2800,8000

/*-------------------------------------------------------------------------------------------------*/

	.align	$100

	dlist0:	dta d'pp',$70+$80
		dta $4e,a(buf0)
		:50 dta $f,$e
		dta $f
		dta $4e,0,h(buf0+$1000)
		:44 dta $f,$e
		dta $f
		dta $41,a(dlist1)

	dlist1:	dta d'pp',$70+$80
		dta $4f,a(buf1)
		:50 dta $e,$f
		dta $e
		dta $4f,0,h(buf1+$1000)
		:44 dta $e,$f
		dta $e
		dta $41,a(dlist0)

/*-------------------------------------------------------------------------------------------------*/

main	lda:cmp:req RTCLOK+2

	;sei
	mva	#$00	NMIEN
	;mva	#$fe	PORTB

	mwa	#dlist0	SDLSTL 		;mwa	#dlist0	DLISTL 
	mwa	#dli0	vdli

	lda	#$c0
	sta	mode+1
	sta	loop+1
	
	lda	#(<dli0)	
	sta	VDSLST
	lda	#(>dli0)
	sta	VDSLST+1
	;mwa	#NMI	$fffa
	
	mva	#$c0	NMIEN
	
	lda:rne VCOUNT

wait	lda SKCTL			; press any key
	and #4
	bne wait

	lda:rne VCOUNT

	;mva	#$ff	PORTB
	mva	#$40	NMIEN
	;cli

	mva #$ff 764			; clear info about pressed key
	rts				; exit

/*-------------------------------------------------------------------------------------------------*/

	dli0:	sta regA
		stx regX

		ldx #192
	mode:	lda #$c0

	loop:	eor #$c0
		sta WSYNC
		sta GRACTL
		dex
		bne loop

		eor #$c0
		sta mode+1

		lda regA
		ldx regX
		rti

/*-------------------------------------------------------------------------------------------------*/

NMI	bit NMIRES
	bpl vbl

	jmp dli0
vdli	equ *-2

vbl	sta NMIRES
	phr

	mva #$22	SDMCTL

	mva #.get[0]	COLBK
	mva #.get[1]	COLPF0
	mva #.get[2]	COLPF1
	mva #.get[3]	COLPF2	

	plr
	rti

/*-------------------------------------------------------------------------------------------------*/

	run main

post-53448-0-00072900-1508940073_thumb.png


Does this one look any better?

 

sources_edit.zip

 

Tested and it runs in Altirra, but I'm honestly not sure what colours I'm supposed to be seeing. Make sure you enable any artifacting or blending options you need in the emulator. :)

 

EDIT: I load up the OS colour shadow registers with the colour palette and then clear the interrupt disable bit to give the stage 2 OS VBL a chance to update the colour registers.

Edited by flashjazzcat

Here you go:

 

sources_fixed.zip

 

The stage 2 VBI was resetting the display list pointer, upsetting the inherent display list swap. Re-enabling interrupts and waiting a couple of frames before shutting them off again fixes it.

 

Having the stage 2 VBI running while the display's on will require a different method of swapping display lists, since the OS will keep loading up the shadow register.

Edited by flashjazzcat

Yeah, the colors show up correctly in that case.

But "sei" disables interrupts again, which means the OS does not refresh timers/keyboard... :-S


OK. Un-swap the pointers at the ends of the display lists and patch into the stage 1 VBL and swap the display list vector shadows there. This would remove the issue of the stage 2 VBL resetting the display list pointers from the shadows.

 

If adopting this approach (and I'm sure there are other better solutions), you'd need to point VVBLKI at your VBI routine and end it with a JMP to the original vector (the original contents of VVBLKI). In there, you can swap the display list pointer every VBLANK.

 

EDIT: You could update the display list pointer shadow at the end of the DLI kernel, actually, which would remove the need for a custom VBI.
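The interaction between the OS VBI and the display list swap can be modelled in a few lines of C. This is a host-side model, not the actual fix from the attached sources: the struct, the function names and the display list addresses used with it are all hypothetical. It assumes, as described above, that the stage 2 VBI copies the shadow into the hardware pointer every frame and that the end of the DLI kernel flips the shadow to the other list.

```c
#include <stdint.h>

typedef struct {
    uint16_t sdlstl;   /* OS shadow of the display list pointer */
    uint16_t dlistl;   /* ANTIC's hardware display list pointer */
    uint16_t dlist[2]; /* addresses of the two display lists    */
    int      current;  /* which list the shadow points at       */
} cin_state;

/* Stage 2 of the OS VBI: the shadow is copied to the hardware register. */
void os_vbi(cin_state *s)
{
    s->dlistl = s->sdlstl;
}

/* End of the DLI kernel: point the shadow at the other list,
 * so the next VBI loads it instead of undoing the swap. */
void dli_kernel_end(cin_state *s)
{
    s->current ^= 1;
    s->sdlstl = s->dlist[s->current];
}
```

Running `os_vbi` and `dli_kernel_end` in turn makes the hardware pointer alternate between the two lists frame by frame, without needing the jump at the end of each list to point at the other one.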

Edited by flashjazzcat

WOW, this worked!!! Thanks so much Flash, you have really helped me a lot!

 

I have integrated your latest code in my demonstration and attached the sources (press space bar to go through the various screens).

 

I only have two little remaining issues:

(1) When stopping the CIN code, the screen remains stuck in graphic mode 10, do you know how I can return to default text mode?

(2) The sprites are all screwed up (black and occupying the entire screen), would you know how I can fix that?

 

I have also attached the C64 version (which can be opened in VICE), to show what the demonstration should look like.

sources.zip

demo.atr

demo.zip


To get rid of the CIN mode, open the screen editor:

 

http://atariki.krap.pl/index.php/Otwarcie_ekranu_w_trybie_konsoli_%28GRAPHICS_0%29

 

You may have conflicting priorities with the player missile graphics. Note that the NMI is writing PRIOR every scan line (toggling between $C0 and $00), so you'll need to change it so bits 0-5 are managed the way you want them.
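One way to keep the player/missile priority bits intact is to fold them into the value the kernel toggles, since an exclusive-or with $C0 only touches bits 6-7. A minimal sketch of that bit handling in C; the function names are made up, and the priority value of $01 used in the note below is just an example.

```c
#include <stdint.h>

/* Build a PRIOR value: GTIA mode in bits 6-7, priority bits in 0-5. */
uint8_t prior_value(uint8_t gtia_mode_bits, uint8_t priority_bits)
{
    return (uint8_t)((gtia_mode_bits & 0xC0u) | (priority_bits & 0x3Fu));
}

/* Per-scanline toggle, equivalent to the kernel's EOR #$C0:
 * only the mode bits flip, the priority bits pass through unchanged. */
uint8_t prior_next_line(uint8_t prior)
{
    return (uint8_t)(prior ^ 0xC0u);
}
```

Starting from `prior_value(0xC0, 0x01)`, the lines alternate between $C1 and $01 instead of $C0 and $00. In the kernel this corresponds to changing the initial value it loads, while the toggle itself stays as it is.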


Hey Flash!

I tried to insert the code you recommended, but all I get is a yellow screen and a crash (see attached):

.proc StopCIN
	jsr waitvbl
	mva	#$40	nmien
gr0    
	ldx #$00        ;close IOCB #0
	lda #$0c        ;CLOSE
	jsr ?xcio

	lda #<ename
	sta icbufa,x
	lda #>ename
	sta icbufa+1,x
	lda #$0c        ;READ/WRITE
	sta icax1,x
	lda #$00
	sta icax2,x
	lda #$03        ;OPEN
?xcio  
	sta iccmd,x
	jmp ciov
	rts
.endp

Trying to figure out the PMG issue meanwhile...

post-53448-0-01517600-1509023445_thumb.png

Edited by 8bit-Dude