SpiceWare Posted July 12, 2020

Added the arena.

Source Code
Download and unzip this in your shared directory: Collect3_20200712.zip

ROM for reference: collect3_20200712.bin

Arena
The arena is drawn using the increment feature of the data streams. Usually we use an increment of 1.0, but we can use smaller increments like 0.25, which will return each value from the data stream 4 times before advancing to the next value.

The fractional portion of the increment is set using a byte value, so for 0.25 we multiply the fraction by 256 to get 64, then pass the whole part (0) and the fractional part (64) like this:

    setIncrement(_DS_PF0_LEFT,0,64); // set the Data Stream increment to 0.25

To calculate the fractional increment needed to stretch an arbitrary image of a certain height over a set number of scanlines, use:

    256 * height / scanlines

We'll let dasm calculate that for us:

    _ARENA_INCREMENTS:
            .byte 256 * Arena1_Height / _ARENA_SCANLINES
            .byte 256 * Arena2_Height / _ARENA_SCANLINES
            .byte 256 * Arena3_Height / _ARENA_SCANLINES
            .byte 256 * Arena4_Height / _ARENA_SCANLINES

While we need 6 bytes of data to send to the playfield registers, we'll define the graphics using just 5 bytes per row, one bit for each of the 40 playfield pixels.

As mentioned before, the ARM has an inline barrel shifter, which makes it super fast to shift bit values around. We'll use it to break the 5 bytes apart into the 6 bytes needed for TIA:

    void PrepArenaBuffers()
    {
        // This function loads the selected Arena layout into the 6 playfield buffers.
        //
        // The 40 bits for each row of arena data are stored in 5 bytes arranged like this:
        //     byte 0   byte 1   byte 2   byte 3   byte 4
        //    33333333 33222222 22221111 11111100 00000000
        //    98765432 10987654 32109876 54321098 76543210
        //
        // They need to be converted to this arrangement for the playfield datastreams:
        //                  LEFT                           RIGHT
        //      PF0      PF1      PF2          PF0      PF1      PF2
        //    3333---- 33333322 22222222     1111---- 11111100 00000000
        //    6789---- 54321098 01234567     6789---- 54321098 01234567
        int row;
        unsigned char byte0, byte1, byte2, byte3, byte4;
        unsigned char *arena = ROM + arena_graphics[mm_arena];
        unsigned char *arena_pf0_left  = RAM + _BUF_PF0_LEFT;
        unsigned char *arena_pf1_left  = RAM + _BUF_PF1_LEFT;
        unsigned char *arena_pf2_left  = RAM + _BUF_PF2_LEFT;
        unsigned char *arena_pf0_right = RAM + _BUF_PF0_RIGHT;
        unsigned char *arena_pf1_right = RAM + _BUF_PF1_RIGHT;
        unsigned char *arena_pf2_right = RAM + _BUF_PF2_RIGHT;

        for(row=0; row<arena_heights[mm_arena]; row++)
        {
            // fetch the 5 bytes for the current row
            byte0 = arena[row*5 + 0];
            byte1 = arena[row*5 + 1];
            byte2 = arena[row*5 + 2];
            byte3 = arena[row*5 + 3];
            byte4 = arena[row*5 + 4];

            // convert the 5 bytes into the 6 needed for TIA's PFx registers
            arena_pf0_left[row]  = BitReversal(byte0) << 4;
            arena_pf1_left[row]  = (byte0 << 4) + (byte1 >> 4);
            arena_pf2_left[row]  = BitReversal((byte1 << 4) + (byte2 >> 4));
            arena_pf0_right[row] = BitReversal(byte2);
            arena_pf1_right[row] = byte3;
            arena_pf2_right[row] = BitReversal(byte4);
        }

        // set the color of the arena
        ARENA_COLOR = ColorConvert(arena_color[mm_arena]);
    }

Storing 5 bytes per row instead of 6 saves a byte on every row, so with a large number of arenas this adds up to a worthwhile ROM savings.

In C use << to shift left and >> to shift right. The number after the << or >> denotes how many bits to shift.
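To sanity-check the bit shuffling in PrepArenaBuffers() off the target, here is a minimal standalone sketch. pack_row() repeats the same shifts and reversals for a single row, while reference_row() places each of the 40 pixels one at a time using the TIA playfield bit order (PF0 draws D4 first, PF1 draws D7 first, PF2 draws D0 first). The names pack_row, reference_row, rows_match and the PF_* indexes are hypothetical, made up for this example; bit_reverse() is the same algorithm as BitReversal.

```c
#include <assert.h>

/* hypothetical indexes for the 6 playfield bytes of one row */
enum { PF0_L, PF1_L, PF2_L, PF0_R, PF1_R, PF2_R };

/* same algorithm as BitReversal: 76543210 -> 01234567 (low 8 bits only) */
static unsigned int bit_reverse(unsigned int v) {
    v = ((0xaa & v) >> 1) | ((0x55 & v) << 1);   /* swap adjacent bits */
    v = ((0xcc & v) >> 2) | ((0x33 & v) << 2);   /* swap bit pairs */
    v = ((0xf0 & v) >> 4) | ((0x0f & v) << 4);   /* swap nibbles */
    return v;
}

/* the shifts and reversals from PrepArenaBuffers(), for one row */
static void pack_row(const unsigned char b[5], unsigned char pf[6]) {
    pf[PF0_L] = (unsigned char)(bit_reverse(b[0]) << 4);
    pf[PF1_L] = (unsigned char)((b[0] << 4) + (b[1] >> 4));
    pf[PF2_L] = (unsigned char)bit_reverse((b[1] << 4) + (b[2] >> 4));
    pf[PF0_R] = (unsigned char)bit_reverse(b[2]);
    pf[PF1_R] = b[3];
    pf[PF2_R] = (unsigned char)bit_reverse(b[4]);
}

/* reference: place each of the 40 pixels individually */
static void reference_row(const unsigned char b[5], unsigned char pf[6]) {
    int p;
    for (p = 0; p < 6; p++)
        pf[p] = 0;
    for (p = 0; p < 40; p++) {                      /* pixel 0 is leftmost */
        int bit = (b[p / 8] >> (7 - p % 8)) & 1;    /* byte 0 bit 7 = pixel 0 */
        int half = (p < 20) ? 0 : 3;                /* left or right registers */
        int q = p % 20;                             /* pixel within the half */
        if (!bit)
            continue;
        if (q < 4)
            pf[half + 0] |= (unsigned char)(1 << (4 + q));   /* PF0: D4 first */
        else if (q < 12)
            pf[half + 1] |= (unsigned char)(1 << (11 - q));  /* PF1: D7 first */
        else
            pf[half + 2] |= (unsigned char)(1 << (q - 12));  /* PF2: D0 first */
    }
}

/* compare two rows, ignoring the low nibble of PF0 that TIA doesn't use */
static int rows_match(const unsigned char a[6], const unsigned char b[6]) {
    int i;
    for (i = 0; i < 6; i++) {
        unsigned char mask = (i == PF0_L || i == PF0_R) ? 0xF0 : 0xFF;
        if ((a[i] & mask) != (b[i] & mask))
            return 0;
    }
    return 1;
}
```

Setting one source bit at a time and comparing pack_row() with reference_row() for all 40 positions exercises every shift and reversal in the conversion.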
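The fractional increment used to stretch the arena can also be modeled in plain C to see why 256 * height / scanlines works. This is a minimal sketch, not the CDFJ driver: it assumes the stream position is kept as fixed point with 8 fractional bits to match the whole/fraction bytes passed to setIncrement(), and DataStream, stream_init, stream_read, stretch_fraction and stretch are hypothetical names. It also assumes the image has fewer rows than there are scanlines, so the increment fits in the fraction byte.

```c
#include <assert.h>

/* hypothetical model of a data stream with a fractional increment */
typedef struct {
    const unsigned char *data;
    unsigned int pos;        /* fixed point: (index << 8) | fraction */
    unsigned int increment;  /* (whole << 8) | fraction */
} DataStream;

static void stream_init(DataStream *ds, const unsigned char *data,
                        unsigned int whole, unsigned int fraction) {
    ds->data = data;
    ds->pos = 0;
    ds->increment = (whole << 8) | (fraction & 0xff);
}

/* return the value at the integer part of pos, then advance by the increment */
static unsigned char stream_read(DataStream *ds) {
    unsigned char value = ds->data[ds->pos >> 8];
    ds->pos += ds->increment;
    return value;
}

/* 256 * height / scanlines, the same formula used for _ARENA_INCREMENTS */
static unsigned int stretch_fraction(unsigned int height, unsigned int scanlines) {
    assert(height < scanlines);   /* otherwise the result won't fit in a byte */
    return 256 * height / scanlines;
}

/* produce one stream value per scanline, stretching height rows over scanlines */
static void stretch(const unsigned char *image, unsigned int height,
                    unsigned char *out, unsigned int scanlines) {
    DataStream ds;
    unsigned int i;
    stream_init(&ds, image, 0, stretch_fraction(height, scanlines));
    for (i = 0; i < scanlines; i++)
        out[i] = stream_read(&ds);
}
```

With stream_init(&ds, image, 0, 64) each value comes back 4 times, and stretching a 4-row image over 16 scanlines gives the same result, since 256 * 4 / 16 = 64.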
In 6507 terms, value << 4 is equivalent to this (once the result is stored back into a byte):

    lda value
    asl
    asl
    asl
    asl

and value >> 4 is equivalent to:

    lda value
    lsr
    lsr
    lsr
    lsr

The helper function BitReversal takes care of the bit order needed for PF0 and PF2:

    unsigned int BitReversal(unsigned int value)
    {
        // value:   a byte with bits in the order 76543210
        // returns: a byte with bits in the order 01234567
        value = ((0xaa & value) >> 1) | ((0x55 & value) << 1);  // swap adjacent bits
        value = ((0xcc & value) >> 2) | ((0x33 & value) << 2);  // swap bit pairs
        value = ((0xf0 & value) >> 4) | ((0x0f & value) << 4);  // swap nibbles
        return value;
    }

Since we're using this function to reverse a byte, you'd expect it to be declared as unsigned char, but on the ARM it's more efficient to use 32-bit values when possible: it results in smaller code and faster execution. The masks in the first step also discard anything above bit 7, which is why PrepArenaBuffers can pass it (byte1 << 4) + (byte2 >> 4) directly.

Fast Jump
Fast Jump is one of the really slick features in CDFJ. It came about from a suggestion by @ZackAttack.

It eliminates the DEY (or DEX) and BNE used for kernel loop control, replacing them with a data stream filled with 2-byte addresses and a jmp $0000. While it only saves 2 cycles per loop (DEY plus a taken BNE cost 5 cycles, versus 3 for the JMP), it lets us jump to many different kernels without any 6507 overhead, which will be very useful when we add support for repositioning the players mid-screen.
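The control-flow idea behind Fast Jump can be modeled in C. This is a conceptual sketch only, not how the CDFJ driver or the 6507 kernel is implemented: a function-pointer array stands in for the jump data stream, fast_jump() plays the role of the jmp $0000 by returning the next entry and advancing, and every name here (SCANLINES, normal_kernel, alt_kernel, exit_kernel, run_kernel) is hypothetical. It also assumes every kernel, including the first, is reached through the stream.

```c
#include <assert.h>

#define SCANLINES 4   /* hypothetical, kept tiny for the example */

typedef int (*kernel_fn)(void);   /* returns 0 to end the model's loop */

static int normal_runs, alt_runs, exit_runs;

static int normal_kernel(void) { normal_runs++; return 1; }  /* one scanline of work */
static int alt_kernel(void)    { alt_runs++;    return 1; }  /* a different per-line kernel */
static int exit_kernel(void)   { exit_runs++;   return 0; }  /* end-of-kernel cleanup */

/* stand-in for the jump data stream: one target per scanline, then the exit */
static kernel_fn jump_stream[SCANLINES + 1];
static int jump_pos;

/* stand-in for jmp $0000: fetch the next target, then advance the stream */
static kernel_fn fast_jump(void) { return jump_stream[jump_pos++]; }

/* alt_line: scanline that should run alt_kernel, or -1 for none */
static void run_kernel(int alt_line) {
    kernel_fn k;
    int i;

    assert(alt_line < SCANLINES);

    /* the "C side" fills the stream before the kernel runs */
    for (i = 0; i < SCANLINES; i++)
        jump_stream[i] = (i == alt_line) ? alt_kernel : normal_kernel;
    jump_stream[SCANLINES] = exit_kernel;

    normal_runs = alt_runs = exit_runs = 0;
    jump_pos = 0;

    /* no loop counter: the stream alone decides what runs next */
    k = fast_jump();
    while (k())
        k = fast_jump();
}
```

run_kernel(-1) runs the normal kernel on every scanline; run_kernel(2) swaps in alt_kernel for scanline 2 without adding any loop logic, which is the property that makes per-scanline kernel changes cheap.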
The Kernel currently looks like this:

    _NORMAL_KERNEL:
            sta WSYNC           ;---------------------------------------
            lda #_DS_GRP1       ; 2  2  values from datastream pointing at _BUF_PLAYER1
            sta GRP1            ; 3  5
            lda #_DS_COLUP0     ; 2  7  values from datastream pointing at _BUF_COLOR0
            sta COLUP0          ; 3 10
            lda #_DS_COLUP1     ; 2 12  values from datastream pointing at _BUF_COLOR1
            sta COLUP1          ; 3 15
            lda #_DS_PF0_LEFT   ; 2 17  values from datastream pointing at _BUF_PF0_LEFT
            sta PF0             ; 3 20  PF0L 55-22
            lda #_DS_PF1_LEFT   ; 2 22  values from datastream pointing at _BUF_PF1_LEFT
            sta PF1             ; 3 25  PF1L 66-28
            lda #_DS_PF2_LEFT   ; 2 27  values from datastream pointing at _BUF_PF2_LEFT
            sta PF2             ; 3 30  PF2L before 38
            lda #_DS_GRP0       ; 2 32  values from datastream pointing at _BUF_PLAYER0
            sta GRP0            ; 3 35  VDELP0 is on, so this is for next scanline
            lda #_DS_PF0_RIGHT  ; 2 37  values from datastream pointing at _BUF_PF0_RIGHT
            sta PF0             ; 3 40  PF0R 28-49
            lda #_DS_PF1_RIGHT  ; 2 42  values from datastream pointing at _BUF_PF1_RIGHT
            sta PF1             ; 3 45  PF1R 39-54
            lda #_DS_PF2_RIGHT  ; 2 47  values from datastream pointing at _BUF_PF2_RIGHT
            sta PF2             ; 3 50  PF2R 50-65
            jmp FASTJMP1        ; 3 53  addresses from datastream pointing at _BUF_JUMP1

    _EXIT_KERNEL:               ; 53
            sta WSYNC           ;
            ldx #0              ; 2  2
            stx GRP0            ; 3  5
            stx GRP1            ; 3  8
            stx PF0             ; 3 11
            stx PF1             ; 3 14
            stx PF2             ; 3 17

Since we're dealing with addresses, we need to use 2-byte values in the buffers.
We also need to make sure they're 2-byte aligned, as required by the ARM processor:

            align 2             ; jump addresses are word values, so must be 2 byte aligned for the ARM code
    _BUF_JUMP1:
            ds _ARENA_SCANLINES * 2
    _BUF_JUMP1_EXIT:
            ds 2

The C code will fill _BUF_JUMP1 with the value _NORMAL_KERNEL and _BUF_JUMP1_EXIT with _EXIT_KERNEL:

        // set the Jump Datastream so each entry runs the NORMAL KERNEL by default
        // init Jump Datastream
        for(i=0;i<_ARENA_SCANLINES;i++)
            RAM_SINT[(_BUF_JUMP1 / 2) + i] = _NORMAL_KERNEL;
        RAM_SINT[ _BUF_JUMP1_EXIT / 2 ] = _EXIT_KERNEL;

One thing to be aware of is RAM_SINT references the RAM as signed integer values, which are 2 bytes in size, while the offsets of _BUF_JUMP1 and _BUF_JUMP1_EXIT are based on byte values. So we must divide them by 2 when using RAM_SINT.
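Here is a small host-side sketch of the byte-offset versus element-index arithmetic. It's not CDFJ code: a 16-bit array stands in for Display Data RAM, ram_bytes and ram_sint are hypothetical stand-ins for RAM and RAM_SINT, and the buffer offset, scanline count, and kernel addresses are made-up values rather than the ones dasm generates for Collect3.

```c
#include <assert.h>

/* hypothetical stand-ins for symbols dasm would generate */
#define ARENA_SCANLINES 180
#define BUF_JUMP1       0x0100                               /* byte offset, even */
#define BUF_JUMP1_EXIT  (BUF_JUMP1 + ARENA_SCANLINES * 2)    /* right after the table */
#define NORMAL_KERNEL   0x1200                               /* made-up 6507 addresses */
#define EXIT_KERNEL     0x1240

/* Display Data RAM stand-in: 16-bit storage keeps every entry 2-byte aligned */
static unsigned short ddr16[2048];
static unsigned char  *const ram_bytes = (unsigned char *)ddr16;  /* like RAM */
static unsigned short *const ram_sint  = ddr16;                   /* like RAM_SINT */

static void init_jump_stream(void) {
    int i;
    /* byte offsets from dasm become 16-bit element indexes by dividing by 2 */
    assert(BUF_JUMP1 % 2 == 0 && BUF_JUMP1_EXIT % 2 == 0);
    for (i = 0; i < ARENA_SCANLINES; i++)
        ram_sint[(BUF_JUMP1 / 2) + i] = NORMAL_KERNEL;
    ram_sint[BUF_JUMP1_EXIT / 2] = EXIT_KERNEL;
}
```

Indexing ram_sint with BUF_JUMP1 directly (no /2) would land at byte offset 2 * BUF_JUMP1, writing the jump table into the wrong part of RAM.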
cd-w Posted July 13, 2020 (edited)

SpiceWare said:
    One thing to be aware of is RAM_SINT references the RAM as signed integer values, which are 2 bytes in size, while the offsets of _BUF_JUMP1 and _BUF_JUMP1_EXIT are based on byte values. So we must divide them by 2 when using RAM_SINT.

Is that correct - signed int values are 4 bytes (32-bit) in length, while the jump addresses are 2 bytes (16 bits)?

EDIT: I see #define RAM_SINT ((unsigned short int*)DDR), so they are indeed 16 bit, but I'm still unsure why the /2 is needed?

Chris
SpiceWare (Author) Posted July 14, 2020

cd-w said:
    Is that correct - signed int values are 4 bytes (32-bit) in length, while the jump addresses are 2 bytes (16 bits)?
    EDIT: I see #define RAM_SINT ((unsigned short int*)DDR), so they are indeed 16 bit, but I'm still unsure why the /2 is needed?
    Chris

Display Data RAM is defined like this:

    void* DDR = (void*)0x40000800;
    #define RAM      ((unsigned char*)DDR)
    #define RAM_INT  ((unsigned int*)DDR)
    #define RAM_SINT ((unsigned short int*)DDR)

So the memory accessed when using these defines is:

    RAM[0] is 1 byte at 0x40000800
    RAM[1] is 1 byte at 0x40000801
    RAM[2] is 1 byte at 0x40000802
    RAM[3] is 1 byte at 0x40000803
    ...
    RAM_INT[0] is 4 bytes starting at 0x40000800
    RAM_INT[1] is 4 bytes starting at 0x40000804
    RAM_INT[2] is 4 bytes starting at 0x40000808
    RAM_INT[3] is 4 bytes starting at 0x4000080C
    ...
    RAM_SINT[0] is 2 bytes starting at 0x40000800
    RAM_SINT[1] is 2 bytes starting at 0x40000802
    RAM_SINT[2] is 2 bytes starting at 0x40000804
    RAM_SINT[3] is 2 bytes starting at 0x40000806
    ...

From dasm we get the locations of _BUF_JUMP1 and _BUF_JUMP1_EXIT as byte offsets from the start of RAM, so we need to divide the offset by 2 for RAM_SINT, which counts in 2-byte units. If we're using RAM_INT, we need to divide the offset by 4. An example of that is also in InitGameBuffers():

        // Zero out the buffers used to hold the player, missile, and ball data
        // It's fastest to use myMemsetInt, but requires
        // proper alignment of the data streams (the ALIGN 4 pseudops found in the
        // 6507 code). Additionally the offset (_EVERY_FRAME_ZERO_START) and
        // byte count (_EVERY_FRAME_ZERO_COUNT) must both be divided by 4.
        myMemsetInt(RAM_INT + _EVERY_FRAME_ZERO_START/4, 0, _EVERY_FRAME_ZERO_COUNT/4);

This is one of those things that's easy to miss. Dionoid pointed out one I missed in Part 5.
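The same idea applies to 32-bit access. This sketch is a hypothetical stand-in, not the actual myMemsetInt source: memset_words() fills whole 32-bit words, ram_bytes and ram_int mimic RAM and RAM_INT over a host array, and the zero range uses made-up offsets. It shows both the /4 conversion and why the start and byte count must be multiples of 4.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical stand-ins for dasm symbols; both must be multiples of 4 */
#define EVERY_FRAME_ZERO_START 0x0200
#define EVERY_FRAME_ZERO_COUNT 0x0180

/* Display Data RAM stand-in with 32-bit storage, so word access is aligned */
static uint32_t ddr32[1024];
static unsigned char *const ram_bytes = (unsigned char *)ddr32;  /* like RAM */
static uint32_t      *const ram_int   = ddr32;                   /* like RAM_INT */

/* word-at-a-time fill, in the spirit of myMemsetInt */
static void memset_words(uint32_t *dst, uint32_t value, int count) {
    while (count-- > 0)
        *dst++ = value;
}

static void zero_every_frame_buffers(void) {
    /* a byte offset and byte count become a word offset and word count */
    assert(EVERY_FRAME_ZERO_START % 4 == 0);
    assert(EVERY_FRAME_ZERO_COUNT % 4 == 0);
    memset_words(ram_int + EVERY_FRAME_ZERO_START / 4, 0,
                 EVERY_FRAME_ZERO_COUNT / 4);
}
```

If the start offset weren't a multiple of 4, dividing by 4 would silently round it down and the clear would begin at the wrong byte, which is why the data streams need the ALIGN 4 in the 6507 code.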