SpiceWare Posted December 19, 2019

Made some improvements to the source code.

NOTE: See the comments for a correction that needs to be made in GameVerticalBlank()

Source Code

Download and unzip this in your shared directory.

Collect3_20191219.zip

ROM for reference

collect3_20191219.bin

Makefile revisions

The C compiler is Step 2, but its output was shown under Step 1. It now correctly shows up under Step 2

The C compiler now outputs files with a root name of armcode. These are the 3 files created in directory main/bin/. Previously the root was testarm as I had copied the initial make file from one of our test projects from back when we developed the CDFJ driver.

New defines.h file

Added a new C header file, which currently contains:

    #define MODE        RAM[_MODE]
    #define RUN_FUNC    RAM[_RUN_FUNC]
    #define P0_X        RAM[_P0_X]
    #define P1_X        RAM[_P1_X]
    #define SWCHA       RAM[_SWCHA]
    #define SWCHB       RAM[_SWCHB]

    #define JOY0_LEFT   !(SWCHA & 0x40)
    #define JOY0_RIGHT  !(SWCHA & 0x80)
    #define JOY0_UP     !(SWCHA & 0x10)
    #define JOY0_DOWN   !(SWCHA & 0x20)
    #define JOY0_FIRE   !(INPT4 & 0x80)

    #define JOY1_LEFT   !(SWCHA & 0x04)
    #define JOY1_RIGHT  !(SWCHA & 0x08)
    #define JOY1_UP     !(SWCHA & 0x01)
    #define JOY1_DOWN   !(SWCHA & 0x02)
    #define JOY1_FIRE   !(INPT5 & 0x80)

These defines make the C code easier to read and write. As an example, this snippet from GameOverScan() in Part 3:

    // abort game logic if user hit SELECT to return to the menu
    if (RAM[_MODE] < 128)
        return;

    // left player
    if (!(RAM[_SWCHA] & 0x80))  // check for joystick right
        if (player_x[0] < 152)
        {
            player_x[0]++;
            player_shape[0] = _PLAYER_RIGHT;
        }

became this in Part 5:

    // abort game logic if user hit SELECT to return to the menu
    if (MODE < 128)
        return;

    // left player
    if (JOY0_RIGHT)
        if (player_x[0] < 152)
        {
            player_x[0]++;
            player_shape[0] = _PLAYER_RIGHT;
        }

Overlapped Display Data RAM

The memory used for the Splash Datastreams is now reused for the Menu and Game Datastreams.
The starting point is also 4 byte aligned so we can initialize it faster using int values instead of unsigned char values, which is covered in the next section below.

    ;----------------------------------------
    ; To save space in RAM we can share the space used by the datastream buffers
    ; for the Splash, Menu, and Game screens.
    ;----------------------------------------
        align 4         ; using myMemsetInt to zero out RAM is faster than
                        ; myMemset, but it requires the starting address to be
                        ; 4 byte aligned
    OverlapDisplayDataRam:  ; mark the beginning of overlapped RAM

    ; Splash screen datastreams
    _SPLASH0:   ds 192
    _SPLASH1:   ds 192
    _SPLASH2:   ds 192
    _SPLASH3:   ds 192

        echo "----",($1000 - *) , "Splash bytes of Display Data RAM left"

    ;----------------------------------------
    ; this ORG overlaps the Menu datastreams on top of the Splash datastreams
    ;----------------------------------------
        ORG OverlapDisplayDataRam

    ; Menu datastreams
    _MENU0:     ds 192
    _MENU1:     ds 192

        echo "----",($1000 - *) , "Menu bytes of Display Data RAM left"

    ;----------------------------------------
    ; this ORG overlaps the Game datastreams on top of the Splash and Menu datastreams
    ;----------------------------------------
        ORG OverlapDisplayDataRam

    ; Game datastreams
    _GameZeroOutStart:
    _PLAYER0:   ds 192
    _PLAYER1:   ds 192
    _COLOR0:    ds 192
    _COLOR1:    ds 192

        align 4 ; need to be 4 byte aligned to use myMemsetInt
    _GameZeroOutBytes=*-_GameZeroOutStart

The key to making this work is to define a label, in this case OverlapDisplayDataRam, before the first Splash datastream buffer, then use ORG with that label before the first menu and first game datastreams. As covered in the final section of this post, the last line defines _GameZeroOutBytes, which is used in GameVerticalBlank() to zero out the game datastreams.

The amount of Display Data RAM remaining is shown in the output after you type make.
In Part 3 that was:

    ---- $7d bytes of RAM left (space reserved for 2 byte stack)
    ---- $876 bytes of Display Data RAM left
    ---- $530c bytes of ARM space left

$876 = 2166 bytes free. In Part 4 that output becomes:

    ---- $7d bytes of RAM left (space reserved for 2 byte stack)
    ---- $cf4 Splash bytes of Display Data RAM left
    ---- $e74 Menu bytes of Display Data RAM left
    ---- $cf4 Game bytes of Display Data RAM left
    ---- $5360 bytes of ARM space left

$cf4 = 3316 bytes free for Splash Screen
$e74 = 3700 bytes free for the Menu
$cf4 = 3316 bytes free for Game

So overlapping the RAM currently saves 1150 bytes of RAM (3316 - 2166). This savings will only increase as the project develops.

Faster Memory Fill

In Part 3 we used myMemset() to zero out RAM. This was done in Initialize():

    myMemset(RAM, 0, 4096);

and in GameVerticalBlank():

    myMemset(RAM + _PLAYER0, 0, 192);
    myMemset(RAM + _PLAYER1, 0, 192);
    myMemset(RAM + _COLOR0, 0, 192);
    myMemset(RAM + _COLOR1, 0, 192);

myMemset() is found in defines_cdfj.h:

    void myMemset(unsigned char* destination, int fill, int count)
    {
        int i;
        for (i=0; i<count; ++i) {
            destination[i] = fill;
        }
    }

We use myMemset() instead of the usual C memset() because the latter includes a lot of overhead, which uses up precious ROM.

The ARM is a 32 bit processor, and storing a 32 bit int takes no longer than storing an 8 bit char, so if we make a version of myMemset that uses 32 bit int values instead of 8 bit char values the fill will take about 1/4th the time. You can find that in defines_cdfj.h:

    // in theory 4x faster than myMemset(), but data must be WORD (4 byte) aligned
    void myMemsetInt(unsigned int* destination, int fill, int count)
    {
        int i;
        for (i=0; i<count; ++i) {
            destination[i] = fill;
        }
    }

The ARM requires int values to fall on 4-byte boundaries, which is why the align 4 was added when we overlapped the Display Data RAM.

The new Initialize():

    myMemsetInt(RAM_INT, 0, 4096/4);

and the new GameVerticalBlank().
We also merged the original 4 myMemset calls into a single call, which eliminates some overhead, and used the new label that was added when Display Data was overlapped:

    myMemsetInt(RAM_INT + _GameZeroOutStart, 0, _GameZeroOutBytes/4);

When changing memset from char to int it's critical to make sure you divide the byte counts by 4.

NOTE: see comments for a correction to this call to myMemsetInt.
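As a rough illustration of the count rule above, here is a minimal sketch in plain C that can be run on a desktop compiler. The names fill_bytes, fill_words, ram, and all_bytes_equal are hypothetical stand-ins made up for this example (they are not part of the CDFJ headers); the union simply models 4 KB of RAM that is viewed both as bytes and as 32 bit words.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise fill, same shape as myMemset() in the post. */
static void fill_bytes(unsigned char *destination, int fill, int count)
{
    for (int i = 0; i < count; ++i)
        destination[i] = (unsigned char)fill;
}

/* Word-wise fill, same shape as myMemsetInt(): count is in 32 bit words. */
static void fill_words(uint32_t *destination, uint32_t fill, int count)
{
    /* the start address must be 4 byte aligned */
    assert(((uintptr_t)destination & 3u) == 0);
    for (int i = 0; i < count; ++i)
        destination[i] = fill;
}

/* Hypothetical stand-in for 4 KB of Display Data RAM. The union keeps
   the byte view and the word view on the same aligned storage. */
static union {
    unsigned char bytes[4096];
    uint32_t      words[4096 / 4];
} ram;

/* Returns 1 if every byte in bytes[start .. start+count) equals value. */
static int all_bytes_equal(int start, int count, unsigned char value)
{
    for (int i = start; i < start + count; ++i)
        if (ram.bytes[i] != value)
            return 0;
    return 1;
}
```

Calling fill_words(ram.words, 0, 768 / 4) clears the same 768 bytes as four 192 byte calls to fill_bytes, which is the merge described above. Passing 768 instead of 768 / 4 would clear four times as much memory.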
SpiceWare Posted December 19, 2019

I thought Part 3 would be the last one for 2019, but managed to squeeze in a couple more entries. There won't be any more for 2019 as the next couple of weeks will be hectic with the holidays and the wedding of nephew # 2, who proposed to his high school sweetheart a couple months ago. They accelerated the wedding plans because he's about to be deployed overseas.

My folks and I will be driving up to Wisconsin in my Model 3. I'll be documenting the trip over in the Tesla Club if you're interested. I'm curious to see how a winter trip to Wisconsin compares with the trip I took over the summer.
Dionoid Posted December 20, 2019

Thanks for the update, Darrell!

Note that 'testrom' is still in the Makefile, which might be confusing for people (it was for me in the beginning).

In my .asm code, I prefixed the names of the datastream buffers with _BUF_, so it was easier for me to recognize them.

    ; datastream usage for Game
    _DS_GRP0    = DS0DATA
    _DS_GRP1    = DS1DATA
    _DS_COLUP0  = DS2DATA
    _DS_COLUP1  = DS3DATA
    ...
    ; Game datastream buffers
    _BUF_PLAYER0:   ds 192
    _BUF_PLAYER1:   ds 192
    _BUF_COLOR0:    ds 192
    _BUF_COLOR1:    ds 192

So e.g. in my C code it is clear that we're resetting the pointers to the start of the buffers.

    ...
    setPointer(_DS_GRP0, _BUF_PLAYER0);
    setPointer(_DS_GRP1, _BUF_PLAYER1);
    ...
Dionoid Posted December 20, 2019

Note that when clearing the ARM RAM, you also need to divide _GameZeroOutStart by 4.
SpiceWare Posted December 20, 2019

You're right. I thought about that when I made the changes, but didn't see any remnants of MENU on screen, so didn't look into it any further as I wanted to get it posted because I had a big checklist* of things to do in preparation for the trip that I needed to start on.

I should have taken a look with Fixed Debug Colors turned on, because I didn't think about how zeroing out the color datastreams would make it invisible on a black background.

I've made the correction in my source:

    void GameVerticalBlank()
    {
        // Zero out the datastreams. It's fastest to use myMemsetInt, but requires
        // proper alignment of the data streams (the ALIGN 4 pseudo-ops found in the
        // 6507 code). Additionally the offset(_GameZeroOutStart) and
        // byte count(_GameZeroOutBytes) must both be divided by 4.
        myMemsetInt(RAM_INT + _GameZeroOutStart/4, 0, _GameZeroOutBytes/4);

        // Use Y value to position player 0 in the datastreams
        myMemcpy(RAM + _PLAYER0 + player_y[0], ROM + (player_shape[0] & 0x0fff) + 0x7000, 16);
        myMemcpy(RAM + _COLOR0 + player_y[0], ROM + (_COLOR_LEFT & 0x0fff) + 0x7000, 16);

        // Use Y value to position player 1 in the datastreams
        myMemcpy(RAM + _PLAYER1 + player_y[1], ROM + (player_shape[1] & 0x0fff) + 0x7000, 16);
        myMemcpy(RAM + _COLOR1 + player_y[1], ROM + (_COLOR_RIGHT & 0x0fff) + 0x7000, 16);

        // initialize the Data Streams for 6507 code
        setPointer(_DS_GRP0, _PLAYER0);
        setPointer(_DS_GRP1, _PLAYER1);
        setPointer(_DS_COLUP0, _COLOR0);
        setPointer(_DS_COLUP1, _COLOR1);

        // set the X positions of the players
        P0_X = player_x[0];
        P1_X = player_x[1];
    }

* have 4 things left on the list to do tonight after work, we hit the road early tomorrow morning. One of the things on the list that we did yesterday was a dry run of the AutoSocks. My brother lives out in the country and the roads don't always get plowed right away.
I got stuck one year in my S2000 because the end of his road had snow covered ice with a slight slope up. He lives on a dead-end road, so getting past that was the only way to get back to the main roads.
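The reason the offset in the corrected GameVerticalBlank() also needs dividing by 4 comes down to C pointer arithmetic: adding n to a pointer to a 32 bit type moves it n * 4 bytes. A minimal sketch of that, assuming a made-up byte offset of 0x100 (EXAMPLE_START is hypothetical and only stands in for _GameZeroOutStart, and ram_words stands in for RAM_INT):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word view of 4 KB of RAM, standing in for RAM_INT. */
static uint32_t ram_words[1024];

/* Hypothetical byte offset, standing in for _GameZeroOutStart. */
#define EXAMPLE_START 0x100

/* How many bytes lie between base and target. */
static size_t byte_distance(const uint32_t *base, const uint32_t *target)
{
    return (size_t)((const unsigned char *)target - (const unsigned char *)base);
}

/* Byte offset added to a word pointer without scaling: lands 4x too far. */
static size_t unscaled_target(void)
{
    return byte_distance(ram_words, ram_words + EXAMPLE_START);
}

/* Byte offset divided by 4 first: lands on the intended byte. */
static size_t scaled_target(void)
{
    return byte_distance(ram_words, ram_words + EXAMPLE_START / 4);
}
```

With these example numbers the unscaled version points 0x400 bytes into RAM instead of 0x100, so the fill would miss the start of the intended region. The division is only exact when the offset is a multiple of 4, which is what the align 4 in the 6507 source guarantees.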
SpiceWare Posted December 20, 2019

Oops - missed this comment at first.

3 hours ago, Dionoid said:
Note that 'testrom' is still in the Makefile, which might be confusing for people (it was for me in the beginning).

Good catch, I've fixed it. I don't have a good handle on makefiles - I can butcher an existing one, but wouldn't even know where to begin if I needed to make one from scratch.

Way back when (covered in Part 8 and Part 9 of the unfinished DPC+ARM tutorial) we had to type a bunch of echo statements in the 6507 source that would output #define statements for anything in the 6507 code that the C code needed to know about. This is a small example of the ~350 echo statements in Draconian (RAM in Collect3 = QUEUE in Draconian):

    echo "#define FUNC              QUEUE[",[ARMfunc]d,"]"
    echo "#define SWCHA             QUEUE[",[ARMswcha]d,"]"
    echo "#define SWCHB             QUEUE[",[ARMswchb]d,"]"
    echo "#define INPT4             QUEUE[",[ARMinpt4]d,"]"
    echo "#define MODE              QUEUE[",[ARMmode]d,"]"
    echo "#define FRAME             QUEUE[",[ARMframe]d,"]"
    ...
    echo "#define AUDV              ((unsigned char *)( DD_BASE +",[ARMaudv0]d,"))"
    echo "#define AUDF              ((unsigned char *)( DD_BASE +",[ARMaudf0]d,"))"
    echo "#define AUDC              ((unsigned char *)( DD_BASE +",[ARMaudc0]d,"))"
    ...
    echo "// game screen datastreams"
    echo "#define DATASTREAM_SIZE   ",[DATASTREAM_SIZE]d
    echo "#define REPOSITION_SIZE   ",[REPOSITION_SIZE]d
    echo "#define CLEAR_DS_INT      ",[ClearDatastreamInt]d
    echo "#define CLEAR_DS_INT_SIZE ",[ClearDatastreamIntSize/4]d
    echo "#define P0_DATASTREAM     ",[Player0DataStream]d
    echo "#define P1_DATASTREAM     ",[Player1DataStream]d

After manually running dasm you would copy that output, paste it into defines_dasm.h, compile the C code, then run dasm again to create the ROM.
Eventually @cd-w figured out how to automate the entire process by having the makefile capture those defines and automatically create defines_dasm.h:

    dasm $(SOURCE).asm -f3 -o$(SOURCE).bin | gawk '!x[$$0]++' | grep "#define" > $(BASE)/defines_dasm.h

While working on SpiceC I figured out how to use awk and have it parse the symbol file for symbols starting with _ instead, which eliminated the requirement of manually adding all those echo statements.

    awk '$$0 ~ /^_/ {printf "#define %-25s 0x%s\n", $$1, $$2}' $(PROJECT).sym >> main/$(DASM_TO_C)

Quote:
In my .asm code, I prefixed the names of the datastream buffers with _BUF_, so it was easier for me to recognize them.

I like that! Have made that change as well. Both changes will be present in Part 6.
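For anyone who doesn't read awk, the one-liner above boils down to: keep lines that start with an underscore, take the first two whitespace separated fields, and print them as a #define with a 0x prefix on the value. Here is the same idea re-expressed as a small C function, purely as an illustration. The function name symbol_line_to_define and the sample symbol line in the usage note are made up for this sketch, and the exact column layout of a dasm symbol file is an assumption (only the first two fields matter here).

```c
#include <stdio.h>

/* Turns one symbol-file line into a "#define NAME 0xVALUE" line.
   Returns 1 if a define was written to out, 0 if the line was skipped.
   Unlike the awk version, a line with fewer than 2 fields is skipped. */
static int symbol_line_to_define(const char *line, char *out, size_t out_size)
{
    char name[64];
    char value[32];

    if (line[0] != '_')                 /* same filter as awk's /^_/ */
        return 0;
    if (sscanf(line, "%63s %31s", name, value) != 2)
        return 0;

    /* same format string as the awk printf */
    snprintf(out, out_size, "#define %-25s 0x%s\n", name, value);
    return 1;
}
```

Fed a hypothetical line such as "_PLAYER0 0a54 (R )", this writes a #define for _PLAYER0 with the value 0x0a54, padded the same way the awk printf pads it. In the real makefile the dollar signs are doubled ($$0, $$1, $$2) only because make would otherwise treat them as its own variables.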
Omegamatrix Posted December 31, 2019

Darrell, thank you so much for these threads. I've gone through them since yesterday and I got everything compiling now for CDFJ. Ironically, I was wondering about a faster clear and that was covered in this latest update. I also like Dionoid's buffer identification so I'm going to use that as well. Right now I have started tinkering by converting one of my balloon kernels from CAA, just to draw a single stationary row.

Darrell, there's a missing # on line 516 of collect3.asm. I'm also wondering about the comments on lines 424 to 427 of collect3.asm. Can there be a startup in bank 5 or will CDFJ always start in bank 6?

At this point I am going to go through the Draconian source to learn more about creating music. From making Venture Reloaded I got a major appreciation of how hard it is to get anything to sound tolerable on the 2600. I'm really looking forward to using the ARM to get better sound.
SpiceWare Posted December 31, 2019

Awesome to see you've resumed work on Circus AtariAge! I'm still on vacation in Wisconsin, will check out line 516 and the comments when I return home.

BUS, CDFJ, and DPC+ all start up in their last bank as all other banks have the potential to be filled with ARM code, which would make it tricky for them to contain 6507 startup routines.

Other than updating AUDV0 on every scanline, Draconian won't help with creating music. I'll cover that eventually, though if you need it before I finish Collect 3 then I'll take a detour to cover music like I did for John and gamepad support.
Omegamatrix Posted December 31, 2019

No need to rush Darrell, I still got lots of other work to do, and I will be slowing down as I will be going back to work soon too. For right now I am just inserting placeholders where the writes will go and building the kernels around that. I am going to make them at the same cycle all throughout so that nothing sounds off.

I'm no musician, and I'm always amazed when someone can listen to something and pick out what notes it is using. When I did the music for VR it was awful as I had to constantly move music up and down octaves due to missing notes, and even then I would still have to substitute notes quite often. I actually spent weeks on it tweaking and tweaking trying to get it to sound better. It was quite painful to be honest.

Now though I'm hoping that the different waveform generations (Square, Triangle, Sawtooth, Sine?) will be able to provide a better range of notes. Has the third channel (constructed by combining the volumes of the first two) been fully investigated with how the 2600 combines both channels? When doing three channels I imagine either AUDV0 and AUDV1 would both be updated every line, or AUDV0 would be updated one line, and AUDV1 would be updated on the following line.

Anyhow, there are more possibilities for sure, and music overall should be a lot better because of it. All the ARM games I've heard had really good music.
SpiceWare Posted December 31, 2019

I used a macro to update AUDV0. When I didn't have the music or speech routines in place it did a couple dummy instructions that took the same amount of time and ROM space. You can probably find that in Draconian's source.

You define the waveforms, so they can be anything. The music driver in SF2 even merges waveforms on the fly, such as combining a sawtooth with a sine. In DPC+ the waveforms are 32 bytes long. That's also the default size for BUS and CDFJ, though you can make them larger or smaller to better fit your needs.

The 3 voices are merged together on the fly, so you only need to update AUDV0. A bankswitch scheme that could create 6 voice music using both channels is possible, but one hasn't been written to support that yet.
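To make "3 voices merged on the fly" a bit more concrete, here is a loose sketch of one way such a mixer can be modeled in C. This is an assumption for illustration only and not the actual DPC+/CDFJ driver code: the Voice struct, the 32 bit phase accumulator, and the choice of using the top 5 bits of the phase as the index into a 32 byte waveform are all made up for this example.

```c
#include <stdint.h>

#define WAVEFORM_SIZE 32   /* default waveform length mentioned in the post */

/* One voice: a phase accumulator stepping through a waveform table.
   (Hypothetical layout, for illustration only.) */
typedef struct {
    uint32_t phase;                 /* current position, upper 5 bits = index */
    uint32_t step;                  /* added once per sample, sets the pitch  */
    const unsigned char *waveform;  /* WAVEFORM_SIZE entries                  */
} Voice;

/* Advances all 3 voices by one sample and returns their sum, which is
   the single value that would be written to AUDV0 for this scanline. */
static unsigned char next_amplitude(Voice voices[3])
{
    unsigned int sum = 0;
    for (int i = 0; i < 3; ++i) {
        voices[i].phase += voices[i].step;
        sum += voices[i].waveform[voices[i].phase >> 27];  /* index 0..31 */
    }
    return (unsigned char)sum;
}
```

In a model like this, each waveform entry would need to stay in the range 0 to 5 so that three voices together never exceed 15, the largest volume AUDV0 accepts.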
Omegamatrix Posted December 31, 2019

Well that is pretty slick that only one of the audio registers is needed for 3 voices. That of course leaves the other register for sound effects such as shots or collisions which could be updated once a frame.
Omegamatrix Posted January 1, 2020

Darrell, are you able to incorporate FASTJMP into your collect tutorial? I'm trying to get that working for my kernels.

Edit: I was able to get it working. I had a bug where I was trying to set the data but was using the wrong constant.
SpiceWare Posted January 2, 2020

Yes. Still out of town.
Dionoid Posted January 5, 2020

A thing to mention is that char values are *unsigned* by default on ARM/gcc (I learned that the hard way). So you might want to change that in your inline documentation on Variables in the file main.c

Or add the compiler flag "-fsigned-char" in the Makefile, which tells the compiler to use signed chars.

Looking forward to Part 6 !!!
SpiceWare Posted January 5, 2020

Thanks, I think I did that documentation when we were still using Codesourcery, and just copied it over as I assumed Linaro was the same. However, I must have run into it with Linaro because in Draconian the 186 instances of char are all prefixed with either unsigned (167 times) or signed (19).

I've updated the char section to:

    // signed char   = 8 bit, 1 byte, range is -128 to 127
    // char          = 8 bit, 1 byte, range is 0 to 255
    // unsigned char = 8 bit, 1 byte, range is 0 to 255

Draconian doesn't have that comment block in it, which would be why I didn't think to update it then.

We returned from our trip last night. I have a bunch of stuff to catch up on this week, so most likely won't start in on Part 6 until next week.
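A quick way to see what a given compiler does with plain char, and why spelling out signed or unsigned sidesteps the problem, is sketched below. The function names are made up for this example; CHAR_MIN comes from the standard limits.h header and is 0 when plain char is unsigned.

```c
#include <limits.h>

/* Returns 1 when plain char is signed on this compiler, 0 when unsigned. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The pitfall: -1 stored in a plain char only stays negative when plain
   char is signed. With unsigned plain char it becomes 255. */
static int plain_char_keeps_negative(void)
{
    char delta = (char)-1;
    return delta < 0;
}

/* The portable version: an explicit signed char behaves the same on
   every compiler, so x really does move left by 1. */
static int move_left(int x)
{
    signed char delta = -1;
    return x + delta;
}
```

On a toolchain where plain char is unsigned, plain_char_keeps_negative() returns 0, which is the kind of surprise described above. move_left() gives the same answer either way, which matches the habit in Draconian of always prefixing char with signed or unsigned.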
SpiceWare Posted January 12, 2020

On 12/31/2019 at 1:44 AM, Omegamatrix said:
Darrell, there's a missing # on line 516 of collect3.asm.

Fixed

On 12/31/2019 at 1:44 AM, Omegamatrix said:
I'm also wondering about the comments on lines 424 to 427 of collect3.asm. Can there be a startup in bank 5 or will CDFJ always start in bank 6?

The vectors in bank 5 shouldn't ever be needed as CDFJ always starts in bank 6. I added them anyway as a programming bug in bank 5 could result in a BRK instruction being hit (I had this occur in Medieval Mayhem). If a BRK did occur in bank 5 these 2 instructions would be executed:

    sta SELECTBANK6 on line 423
    <bank 6 switches in>
    jmp InitSystem on line 651

The jmp B5init on line 424 should never be run; I just added that instruction since I needed to use up 3 bytes so the above 2 instructions would work correctly with the bankswitch.
SpiceWare Posted January 12, 2020

On 12/31/2019 at 1:16 PM, SpiceWare said:
I used a macro to update AUDV0, when I didn't have the music or speech routines in place it did a couple dummy instructions that took the same amount of time and ROM space. You can probably find that in Draconian's source.

What I'm seeing in Draconian is not correct:

        MAC DIGITAL_AUDIO
        lda #AMPLITUDE
        sta AUDV0
    ;    dec Sleep5
        ENDM

While dec Sleep5 takes 5 cycles, the same as the lda/sta pair, it only takes 2 bytes of ROM vs 4, so it could cause code and/or data to shift, potentially causing problems. In SpiceC I'm using this:

        MAC DIGITAL_AUDIO
        nop #AMPLITUDE
        sta Sleep3
        ENDM

which uses a NOP #immediate for a 2 byte 2 cycle NOP. That compiles as opcode $80.
Dionoid Posted February 5, 2020

On 12/31/2019 at 4:48 PM, SpiceWare said:
... Other than updating AUDV0 on every scanline, Draconian won't help with creating music. I'll cover that eventually, though if you need it before I finish Collect 3 then I'll take a detour to cover music like I did for John and gamepad support.

Hi @SpiceWare,

I'm trying to add a 4-bit PCM audio sound-effect to my game (writing AUDV0 on every scanline). I got something working in plain 6507 assembly, and now I'm trying to use the ARM to pre-calculate the amplitudes so I can simply read them from a data-stream and write to AUDV0 (instead of calculating them on the 6507 each scanline using a 16-bit cycle register and a 16-bit pitch/delta value).

However it looks like (partial?) support for that is already in the example project that you shared (referring to methods like 'setNote' and 'setWaveform' in defines_cdfj.h). But I have no idea how to actually set a note from ARM and which predefined(?) datastream to read and store in AUDV0 from assembly code.

Also, I was wondering how to keep writing to AUDV0 during the time the ARM is called on VerticalBlank and OverScan. I found this forum post by you, where you say that the 6507 is being fed NOPs during the duration of an ARM subroutine being called. That makes sense, as the 6507 must be doing something while one of the ARM functions is called. And later in that same forum discussion you mention a ZP routine that can run while the ARM is still running, using the 'ldx PosObject' instruction to check if the ARM has finished yet.

Do you maybe have a simple example on how to play a single note using 4-bit PCM audio, using CDFJ? Maybe you already explained this in one of your earlier posts on DPC+, but I couldn't find it using the forum's search.

Cheers,
Dion

BTW: programming games using CDFJ is an amazing experience!
I like how it brings me new possibilities, while I still have to fight the limitations of the '2600. Just like my 6502 assembly code, my C code also has to be highly optimized. You can't get sloppy/lazy.

Edited February 5, 2020 by Dionoid
SpiceWare Posted February 5, 2020

The datastream for audio is AMPLITUDE:

    lda #AMPLITUDE
    sta AUDV0

However, it's a little different from the other datastreams. Instead of reading from a buffer you populated in Display Data RAM, it will either:

Retrieve a value from a ROM buffer* that's packed with 2 samples per byte (speech in Draconian)
Create a value on the fly for 3 voice music (music in Mappy)

I have a music example somewhere, will track it down and get it to you.

In Collect 3 the ARM code is triggered via:

    ldx #$FF
    stx CALLFN

Change it to this to have AUDV0 updated once per scanline while the ARM code is running.

    ldx #$FE
    stx CALLFN

* I think it might even work if the buffer is in RAM, which would allow you to manipulate them.
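Since the sample buffer mentioned above holds 2 samples per byte, each sample is a 4 bit value, which matches the 0 to 15 range of AUDV0. A minimal sketch of how such a buffer could be unpacked in C follows. The function name sample_at is made up, and the nibble order (high nibble played first) is an assumption for this example; the post doesn't say which half the driver reads first.

```c
#include <stddef.h>

/* Returns the 4 bit sample at position index from a buffer packed with
   2 samples per byte. ASSUMPTION: the high nibble holds the earlier
   sample. Swap the two branches if the driver uses the other order. */
static unsigned char sample_at(const unsigned char *packed, size_t index)
{
    unsigned char byte = packed[index / 2];

    if (index % 2 == 0)
        return (unsigned char)(byte >> 4);    /* even index: high nibble */
    return (unsigned char)(byte & 0x0F);      /* odd index: low nibble   */
}
```

With this layout a buffer of N bytes holds 2 * N samples, so one second of audio at one sample per scanline takes roughly half as many bytes as there are scanlines in that second.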