8bit-Dude Posted October 25, 2017

This is a "call to arms" for veterans. I am developing a cross-platform Atari/C64 online top-down racing game (8bit-slicks). See: http://8bit-slicks.com/ and https://github.com/8bit-Dude/8bit-Slicks

The C64 implementation is complete, but I have been battling the A8 for the past 3 weeks to figure out how to get good GFX/sprites/music/sound. I am 95% done, but there is one last thing that I am struggling mightily to achieve: streamlining the CIN code for gaming.

The CIN code as output by Atari Interlace Studio disables the OS ROM (mva #$fe portb) and all interrupts (sei), and runs a custom NMI (mwa #NMI $fffa). This is fine when just showing an image, but not when running a game, because the timers, keyboard, joysticks... don't get updated.

So I would like to have the CIN code running in a way that is similar to the output of the JAG creator: one VBI to set the first line, and then switch GFX each raster line with DLIs. Then set up the VBI like this, so that it still runs the OS VBlank code for timers, etc.:

        ldy #(<_VBI) ; install VBI
        ldx #(>_VBI)
        lda #6
        jsr SETVBV

So yeah, this is what I wanna do, but I am struggling mightily to write the ASM code for it... :-( So if one of the veterans of A8 ASM could help me achieve this, I will be soooo grateful!

P.S: it will be even better if the code is pure 6502 opcodes, rather than MADS macros, because then I can compile it with CC65 as part of the main program!

Attachments: sources.zip, menu.xex

Rybags Posted October 25, 2017

If you're running a DLI each line you probably won't save any or much CPU vs a single DLI that covers the whole visible window. If it's character mode then you have the badlines, which make it even harder.

An extreme example of cycle saving is the Project M (Wolf3D) game engine. It uses narrow DMA mode and instead of DLIs has Pokey Timer IRQs, which are tuned to a cycle on the scanline to minimize CPU wastage. That is bitmap mode, with the OS switched out to further streamline the interrupt processing.

With DLIs you can get similar savings. Normal OS processing is like:
- several cycles pushing PC and status to the stack, then loading PC with the NMI handler address from $FFFA; this can't be avoided.
- OS based code does BIT NMIST / BPL VBLANK / JMP ($200)

Maximum cycle saving can be had by just having the NMI code execute the DLI straight away without the test, but then you need to keep track of where you are, either by a soft counter or by reading the VCOUNT register. The next biggest saving is to delete the JMP ($200) and just put your DLI code starting there; put your VBlank handler somewhere just before the main NMI handler.

What you use can come down to what machines you want to be compatible with, how many cycles you think you can save, etc.

phaeron Posted October 25, 2017 (edited)

If you're just trying to get minimal VBI functionality back to use this image as a menu screen, do this to get normal OS processing back:
- change the writes from the NMI vector at $FFFA-FFFB to just write the DLI vector into VDSLST
- change the display list write from "dlptr" (DLISTL) to the OS shadow at SDLSTL ($0230)
- move VBI writes to direct hardware registers into the OS shadow registers instead: SDMCTL ($022F) and COLOR0-COLOR4 ($02C4-02C8)
- delete the PORTB writes

This will cause the DLI to be activated by the normal OS DLI path. You won't have a lot of CPU time as the DLI will occupy the CPU during the entire image, but it'll be enough for simple menus.

While you're at it, might as well fix all the other lameness in the output: $D402 is DLISTL and not DLPTR, $D016-D019 is COLPF0-COLPF3 and not COLOR0-3, $D01A is COLBK and not COLBAK, $D01B is GRACTL and not GTICTL, 20 is RTCLOK+2, 764 is CH, and the STA NMIST in the VBI handler should be STA NMIRES.

Trying to do this kind of mode through a DLI per scanline is not effective, because DLIs trigger at the start of a scanline while the change needs to occur at the end, so the DLIs will eat almost all the CPU in STA WSYNC anyway. To get significant CPU back requires using an IRQ instead, as Rybags notes.

(Edited October 25, 2017 by phaeron)

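The four changes phaeron lists can be summarised in a small C sketch. It is only an illustration under stated assumptions: `mem` is a plain array standing in for the Atari address space, and `poke_word` and `setup_cin_shadows` are hypothetical helper names, not part of the CIN sources or of any library. The addresses are the ones quoted in this thread.

```c
#include <stdint.h>

/* OS locations named in the thread. */
enum {
    VDSLST = 0x0200,  /* DLI vector (2 bytes) */
    SDMCTL = 0x022F,  /* shadow of the DMA control register */
    SDLSTL = 0x0230,  /* shadow of the display list pointer (2 bytes) */
    COLOR0 = 0x02C4   /* COLOR0..COLOR4 occupy $02C4-$02C8 */
};

/* Store a 16-bit value low byte first, as the 6502 expects. */
static void poke_word(uint8_t *mem, uint16_t addr, uint16_t value)
{
    mem[addr] = (uint8_t)(value & 0xFF);
    mem[addr + 1] = (uint8_t)(value >> 8);
}

/* Assumption: mem is a 64 KB image; on real hardware these would be
   ordinary memory writes. Nothing here touches $FFFA or PORTB, which
   is the point of the change: the OS keeps its own NMI handler. */
void setup_cin_shadows(uint8_t *mem, uint16_t dli_addr, uint16_t dlist_addr,
                       uint8_t dmactl, const uint8_t colors[5])
{
    int i;
    poke_word(mem, VDSLST, dli_addr);    /* DLI goes through the OS vector */
    poke_word(mem, SDLSTL, dlist_addr);  /* OS copies this to the hardware */
    mem[SDMCTL] = dmactl;
    for (i = 0; i < 5; ++i) {
        mem[COLOR0 + i] = colors[i];     /* COLOR0..COLOR4 shadows */
    }
}
```

The OS VBI copies the shadows into the hardware registers each frame, so the values only take effect while interrupts are left enabled.
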
8bit-Dude Posted October 25, 2017 Author

Guyz, thanks for the good replies already! I plan to use CIN for actual gameplay, so reducing the time used by CIN is critical. With the current code, FPS tests show my game running at 10-20 FPS (depending on the number of players, 2-4). If the FPS drops due to the use of DLIs, that is bad news...

Could you guyz give me the specific lines of code I should use to convert this code to IRQ, while still having the OS ROM running in the background for timers/keyboard/joysticks?

8bit-Dude Posted October 25, 2017 Author

P.S: Attached is an example of CIN showing the track, and just 2 players running around.

Attachment: demo.atr

Rybags Posted October 25, 2017

Timer IRQs are poorly serviced by the OS as they're way down the list, so they will have totally unacceptable overhead if used more than once every several scanlines. The Immediate IRQ vector is a good method: if you mask all IRQs except the timer you're using, then it's guaranteed that it's the source, so no further checking is needed (disregarding that an inadvertent BRK execution messes that up). Even despite that, you're probably still better off using a RAM based OS and taking over the hardware vector to save some cycles.

To get Timer IRQ triggering on an exact cycle in the scanline every time, it's easiest to use 16 KHz Pokey mode (AUDCTL bit 0 = 1), but that also means all the sound will default to that frequency base. Additionally you'd lose the use of one voice.

For the initial IRQ you want to either put Pokey in the INIT state (SKCTL=00) then back to operating state at a specific time, or use STIMER (I think that should work OK). I don't have a code example handy but it'd be something like...

        lda #0
        sta skctl   ; Pokey into INIT
        sta audf3   ; AUDF3 = 0 for 1 scanline delay between IRQs
        sta audc3   ; AUDC3 = 0 to mute volume
        lda #1
        sta audctl  ; 16 Khz base frequency for audio, uses divider of 114 instead of 28
        lda $14
    waitvb
        cmp $14
        beq waitvb  ; wait for a VBlank to complete
        lda #3      ; value to get SKCTL back to normal
        sta wsync   ; sync to near end of scanline
        nop
        nop         ; variable number of nop and time wasting instructions to get the Timer IRQ alignment that we want
        sta skctl   ; Pokey back into operational state

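Why AUDF3 = 0 gives one IRQ per scanline can be checked with a little arithmetic. The following is a minimal C sketch, assuming a timer fires every (AUDF + 1) ticks of its base clock, and that the base clock divides the machine clock by 114 when AUDCTL bit 0 is set and by 28 otherwise (the dividers Rybags quotes). `timer_period_cycles` is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

/* Machine cycles between two timer IRQs.
   Assumption: period = (AUDF + 1) * base clock divider, with the
   divider selected by AUDCTL bit 0 as described in the post above. */
unsigned timer_period_cycles(uint8_t audf, uint8_t audctl)
{
    unsigned divider = (audctl & 0x01u) ? 114u : 28u;
    return ((unsigned)audf + 1u) * divider;
}
```

With AUDF3 = 0 and AUDCTL = 1 the period is 114 cycles, the same scanline length used in the cycle counts elsewhere in this thread, so the IRQ would land on the same cycle of every line once it has been aligned.
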
tebe Posted October 25, 2017

Quoting phaeron: "$D01B is GRACTL and not GTICTL"

$D01B GTIACTL, PRIOR
$D01D GRACTL

Rybags Posted October 25, 2017

Your cycle usage will be as follows: graphics DMA 40, DList 1, Refresh 9, PM Graphics 5 = 55. Taking that from 114 per scanline leaves you 59.

You could probably use some zero page based self-modifying code for the IRQ. Something like:

    prior_val=*+1
        lda #$xx
        sta prior
        eor #$c0
        sta prior_val
        pla
        rti

That's 21 cycles there. On top of that you'd want to throw in about 6 to account for jitter (waiting for the current instruction to finish when the IRQ triggers), then another 7 (?) for IRQ pre-processing, then a good few more for the IRQ routine before this. So effectively you'd be over 40 cycles.

You could claw a few back by doing stuff like self-modifying code instead of PHA/PLA. Another trick you could use is separate routines that do the 00 and C0 values for PRIOR, and use those values as the IRQ routine address also, to reduce the cycle count.

But realistically, the best case scenario is probably going to be something between 10-16 cycles per scanline for normal program running. That's not entirely bad: on a 200 scanline display that can mean 2000 to 3200 cycles you'd otherwise be missing out on.

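The numbers in this post can be reproduced with a short C sketch. The DMA figures and the 114-cycle scanline come from the post; the per-instruction cycle counts are the standard 6502 timings, under the assumption that `prior` is an absolute (hardware register) address and `prior_val` lives in zero page. The function names are hypothetical.

```c
/* DMA cycles per scanline, as listed in the post. */
enum {
    CYCLES_PER_LINE = 114,
    DMA_GRAPHICS    = 40,
    DMA_DLIST       = 1,
    DMA_REFRESH     = 9,
    DMA_PMG         = 5
};

/* Cycles left for the CPU on one scanline after DMA. */
int free_cycles_per_line(void)
{
    return CYCLES_PER_LINE
         - (DMA_GRAPHICS + DMA_DLIST + DMA_REFRESH + DMA_PMG);
}

/* Cost of the six-instruction handler body:
   lda #imm (2) + sta abs (4) + eor #imm (2) + sta zp (3)
   + pla (4) + rti (6). */
int handler_cycles(void)
{
    return 2 + 4 + 2 + 3 + 4 + 6;
}

/* Cycles regained per frame if each scanline leaves `per_line`
   cycles for the main program. */
long cycles_regained(int per_line, int lines)
{
    return (long)per_line * lines;
}
```

Adding the roughly 6 cycles of jitter and 7 of IRQ pre-processing to the 21-cycle body already gives 34, which is how the estimate ends up over 40 once the rest of the routine is counted.
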
8bit-Dude Posted October 25, 2017 Author

There is a lot of good talk here, but the problem is that it is way beyond my level of understanding. To try and make things clearer, I have attached a portion of my code.

In the file demo.c, I load the menu CIN image, and then have a keyboard loop. The keyboard hit is never detected, because the OS ROM is disabled. If I remove the line that disables the OS ROM in the CIN asm file, then the image is not displayed. I have a real hard time understanding the connections between the various elements: OS ROM, NMI, VBI, DLI.

So my hope is that someone will understand what I am hoping for: something like the JAG code (see attached), that is CC65 friendly, does not jam the OS (timers/keyboard), and does not burn all CPU time.

Attachments: Atari-CIN.zip, Atari-JAG.rar

Rybags Posted October 25, 2017

I'd recommend copying the OS to RAM, then you'd still have the services it offers. If you choose to use the Pokey Timers, disable all other IRQs except the timer you use, then just overwrite the hardware IRQ vector at $FFFE/F. Yes, you'll lose the keyboard IRQ. But reading the keyboard just by the Pokey registers is pretty easy.

flashjazzcat Posted October 25, 2017

Or add keyboard IRQs further down your dispatcher. Look at the Altirra OS sources for good examples of pretty much everything you could want to do with regard to IRQ servicing.

As I said in PM, at this point it might be sensible to get rid of the OS entirely, mapping in the ROM only when you need to perform IO. Turbo BASIC XL does this, and so does TLW and a whole lot of other stuff. A small wrapper toggles bit 0 of PORTB either side of calls to DOS, etc.

8bit-Dude Posted October 25, 2017 Author

My problem is as follows: I can get the entire game (IP65 network code, joystick, SFX, RMT music, PMG) working with the 4 colour Graphic Mode 15. All I want now is to replace GFX 15 with CIN, as it gives me 64 colours with the same resolution. But I would like to achieve this without restructuring the entire code, because the codebase is shared between C64, A8 and Apple II.

flashjazzcat Posted October 25, 2017

OK: We need to do exactly what Phaeron told us in post 3, then.

8bit-Dude Posted October 25, 2017 Author

Quoting phaeron: "If you're just trying to get minimal VBI functionality back to use this image as a menu screen, do this to get normal OS processing back: [...]"

Hey Phaeron, I think I followed your suggestion step-by-step, but I must have made a mistake somewhere, as the colours are wrong (see attached).

This is the current code:

    // Atari Interlaced Studio
    buf0   = $2010
    buf1   = $4010
    RTCLOK = $0012
    VDSLST = $0200
    SDMCTL = $022F ;dmactl = $d400
    SDLSTL = $0230
    COLPF0 = $02C4 ;COLPF0 = $d016
    COLPF1 = $02C5 ;COLPF1 = $d017
    COLPF2 = $02C6 ;COLPF2 = $d018
    COLPF3 = $02C7 ;COLPF3 = $d019
    COLBK  = $02C8 ;COLBK = $d01a
    GRACTL = $d01b
    SKCTL  = $d20f
    ;PORTB = $d301
    DLISTL = $d402
    WSYNC  = $d40a
    VCOUNT = $d40b
    NMIEN  = $d40e
    NMIRES = $d40f
    /*-------------------------------------------------------------------------------------------------*/
        org $80
    regA .ds 1
    regX .ds 1
    regY .ds 1
    cnt  .ds 1
    /*-------------------------------------------------------------------------------------------------*/
        .get 'menu.dat',-9 ; palette
        org buf0
        ins 'menu.dat',0,8000
        org buf1
        ins 'menu.dat',$2800,8000
    /*-------------------------------------------------------------------------------------------------*/
        .align $100
    dlist0: dta d'pp',$70+$80
        dta $4e,a(buf0)
        :50 dta $f,$e
        dta $f
        dta $4e,0,h(buf0+$1000)
        :44 dta $f,$e
        dta $f
        dta $41,a(dlist1)
    dlist1: dta d'pp',$70+$80
        dta $4f,a(buf1)
        :50 dta $e,$f
        dta $e
        dta $4f,0,h(buf1+$1000)
        :44 dta $e,$f
        dta $e
        dta $41,a(dlist0)
    /*-------------------------------------------------------------------------------------------------*/
    main
        lda:cmp:req RTCLOK+2
        ;sei
        mva #$00 NMIEN
        ;mva #$fe PORTB
        mwa #dlist0 SDLSTL ;mwa #dlist0 DLISTL
        mwa #dli0 vdli
        lda #$c0
        sta mode+1
        sta loop+1
        lda #(<dli0)
        sta VDSLST
        lda #(>dli0)
        sta VDSLST+1
        ;mwa #NMI $fffa
        mva #$c0 NMIEN
        lda:rne VCOUNT
    wait lda SKCTL ; press any key
        and #4
        bne wait
        lda:rne VCOUNT
        ;mva #$ff PORTB
        mva #$40 NMIEN
        ;cli
        mva #$ff 764 ; clear info about pressed key
        rts ; exit
    /*-------------------------------------------------------------------------------------------------*/
    dli0:
        sta regA
        stx regX
        ldx #192
    mode: lda #$c0
    loop: eor #$c0
        sta WSYNC
        sta GRACTL
        dex
        bne loop
        eor #$c0
        sta mode+1
        lda regA
        ldx regX
        rti
    /*-------------------------------------------------------------------------------------------------*/
    NMI bit NMIRES
        bpl vbl
        jmp dli0
    vdli equ *-2
    vbl sta NMIRES
        phr
        mva #$22 SDMCTL
        mva #.get[0] COLBK
        mva #.get[1] COLPF0
        mva #.get[2] COLPF1
        mva #.get[3] COLPF2
        plr
        rti
    /*-------------------------------------------------------------------------------------------------*/
        run main

flashjazzcat Posted October 25, 2017 (edited)

Does this one look any better?

Attachment: sources_edit.zip

Tested and it runs in Altirra, but I'm honestly not sure what colours I'm supposed to be seeing. Make sure you enable any artifacting or blending options you need in the emulator.

EDIT: I load up the OS colour shadow registers with the colour palette and then clear the interrupt disable bit to give the stage 2 OS VBL a chance to update the colour registers.

(Edited October 25, 2017 by flashjazzcat)

8bit-Dude Posted October 25, 2017 Author

This is what it should look like.

8bit-Dude Posted October 25, 2017 Author

Quoting flashjazzcat: "EDIT: I load up the OS colour shadow registers with the colour palette and then clear the interrupt disable bit to give the stage 2 OS VBL a chance to update the colour registers."

Thx buddy. But the colours are still wrong for some reason (see my attachment above).

flashjazzcat Posted October 25, 2017

Oh I see what's wrong: it's supposed to swap display lists during the VBLANK as well. Will fix it...

flashjazzcat Posted October 25, 2017 (edited)

Here you go:

Attachment: sources_fixed.zip

The stage 2 VBI was resetting the display list pointer, upsetting the inherent display list swap. Re-enabling interrupts and waiting a couple of frames before shutting them off again fixes it. Having the stage 2 VBI running while the display's on will require a different method of swapping display lists, since the OS will keep loading up the shadow register.

(Edited October 25, 2017 by flashjazzcat)

8bit-Dude Posted October 25, 2017 Author

Quoting flashjazzcat: "The stage 2 VBI was resetting the display list pointer, upsetting the inherent display list swap. [...]"

Yeah, the colours show up correctly in that case. But "sei" disables interrupts again, which means the OS does not refresh timers/keyboard... :-S

flashjazzcat Posted October 25, 2017 (edited)

OK. Un-swap the pointers at the ends of the display lists, patch into the stage 1 VBL, and swap the display list vector shadows there. This would remove the issue of the stage 2 VBL resetting the display list pointers from the shadows.

If adopting this approach (and I'm sure there are other better solutions), you'd need to point VVBLKI at your VBI routine and end it with a JMP to the original vector (the original contents of VVBLKI). In there, you can swap the display list pointer every VBLANK.

EDIT: You could update the display list pointer shadow at the end of the DLI kernel, actually, which would remove the need for a custom VBI.

(Edited October 25, 2017 by flashjazzcat)

flashjazzcat Posted October 25, 2017

Couldn't leave it be:

Attachment: sources_IRQs_enabled.zip

SDLSTL is swapped at the end of the screen kernel so the IRQs can be left enabled. Patching the stage 1 VBL might still be a good idea for the sake of stability (since stage 2 is skipped any time CRITIC is set).

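The idea of swapping SDLSTL at the end of the screen kernel can be sketched in C as follows. This is not the code from the attached archive, only an illustration under assumptions: `mem` is an array standing in for the Atari address space, and `swap_display_list` is a hypothetical name. The two display list addresses would be those of dlist0 and dlist1 in the CIN source.

```c
#include <stdint.h>

enum { SDLSTL = 0x0230 };  /* OS shadow of the display list pointer */

/* Point the display list shadow at whichever list is NOT currently
   selected, so the OS VBI loads the other list for the next frame. */
void swap_display_list(uint8_t *mem, uint16_t dlist0, uint16_t dlist1)
{
    uint16_t current = (uint16_t)(mem[SDLSTL] | (mem[SDLSTL + 1] << 8));
    uint16_t next = (current == dlist0) ? dlist1 : dlist0;

    mem[SDLSTL] = (uint8_t)(next & 0xFF);      /* low byte first */
    mem[SDLSTL + 1] = (uint8_t)(next >> 8);
}
```

Because the stage 2 VBI reloads the hardware pointer from this shadow every frame, updating the shadow (rather than jumping between lists inside the display lists themselves) keeps the alternation intact while IRQs stay enabled.
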
8bit-Dude Posted October 26, 2017 Author

WOW, this worked!!! Thanks so much Flash, you have really helped me a lot!

I have integrated your latest code in my demonstration and attached the sources (press the space bar to go through the various screens). I only have two little remaining issues:
(1) When stopping the CIN code, the screen remains stuck in graphic mode 10. Do you know how I can return to the default text mode?
(2) The sprites are all screwed up (black and occupying the entire screen). Would you know how I can fix that?

I have also attached the C64 version (which can be opened in VICE), to show what the demonstration should look like.

Attachments: sources.zip, demo.atr, demo.zip

flashjazzcat Posted October 26, 2017

To get rid of the CIN mode, open the screen editor: http://atariki.krap.pl/index.php/Otwarcie_ekranu_w_trybie_konsoli_%28GRAPHICS_0%29

You may have conflicting priorities with the player missile graphics. Note that the NMI is writing PRIOR every scan line (toggling between $C0 and $00), so you'll need to change it so bits 0-5 are managed the way you want them.

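One way to read the advice about PRIOR: keep the player/missile settings in bits 0-5 and let the kernel change only bits 6-7. A minimal C sketch of that bit handling, with hypothetical function names and the assumption that the game chooses the value of `pmg_bits`:

```c
#include <stdint.h>

/* Value to write to PRIOR on one scanline.
   pmg_bits:  the settings wanted in bits 0-5 (assumed to be chosen
              by the game for its player/missile graphics).
   gtia_mode: $C0 or $00, the mode bits the CIN kernel alternates. */
uint8_t prior_value(uint8_t pmg_bits, uint8_t gtia_mode)
{
    return (uint8_t)((pmg_bits & 0x3F) | (gtia_mode & 0xC0));
}

/* Flip the mode bits for the next scanline, as EOR #$C0 does. */
uint8_t next_gtia_mode(uint8_t gtia_mode)
{
    return (uint8_t)(gtia_mode ^ 0xC0);
}
```

Since EOR #$C0 only flips bits 6 and 7, seeding the kernel's starting value with the wanted low bits would carry them through every scanline without extra instructions in the loop.
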
8bit-Dude Posted October 26, 2017 Author (edited)

Quoting flashjazzcat: "To get rid of the CIN mode, open the screen editor: http://atariki.krap.pl/index.php/Otwarcie_ekranu_w_trybie_konsoli_%28GRAPHICS_0%29"

Hey Flash! I tried to insert the code you recommended, but all I get is a yellowed screen and a crash (see attached):

    .proc StopCIN
        jsr waitvbl
        mva #$40 nmien
    gr0 ldx #$00    ; close IOCB #0
        lda #$0c    ; CLOSE
        jsr ?xcio
        lda #<ename
        sta icbufa,x
        lda #>ename
        sta icbufa+1,x
        lda #$0c    ; READ/WRITE
        sta icax1,x
        lda #$00
        sta icax2,x
        lda #$03    ; OPEN
    ?xcio sta iccmd,x
        jmp ciov
        rts
    .endp

Trying to figure out the PMG issue meanwhile...

(Edited October 26, 2017 by 8bit-Dude)