help with optimizing state and palette lookup code

bigmessowires · January 21, 2023

I'm looking for help with optimizing or rewriting a key piece of code in a puzzle game that I'm working on. The game involves drawing horizontal rows of colored beans. There are 10 beans per row, with 8 possible colors for each bean, which makes 3 x 10 = 30 bits of state information per row, stored in 4 bytes of RAM with 2 extra bits unused. The three bits of state for each bean is used as an index into a ROM palette table to get the actual Atari color value to draw. The 10 color values for the current row of beans are stored in RAM, to be used by the kernel.

My code for this works, but it's super slow, and requires a whopping 8 scan lines worth of time to set up the colors for the next row of beans. Ugh. How can I do better?

My algorithm is:

get the state bits for the current row of beans into A

initialize the bean index

Loop:

save A to Temp

mask the bottom 3 bits

transfer the masked value to an index register

perform an indexed lookup into the Palette table, to find the color

store the color in the Color table entry for the current bean

restore A from Temp

shift the state bits right by 3 places

adjust the bean index for the next bean

continue Loop until all 10 beans are done

Here's the actual code for this particular section. The full code file is also attached if you want to see some colorful but boring static beans.

; RowNum (byte) - index of the next row of beans to be drawn
; State (48 bytes) - state info for each row of beans, 30 bits per row, 4 bytes
; Colors (10 bytes) - the Atari color for each bean in the row, used by kernel

; Start a new row of beans
; Set up the Colors for the row.
; Several blank lines will be drawn while this is all happening.

NormalRow       
    	lda RowNum
; each row has 4 bytes of state, so multiply RowNum by 4 for state index
        asl
        asl
        tax
; get the 4 bytes of state for this row
        lda State,X
        sta Temp0
        lda State+1,X
        sta Temp1
        lda State+2,X
        sta Temp2
        lda State+3,X

; set up Colors from the row's state
; the state info has three bits for each of the ten beans (A-J), like this:
; S0       S1       S2       S3 (accumulator)
; 0AAABBBC CCDDDEEE 0FFFGGGH HHIIIJJJ
	ldx #9		; begin with bean column 9 and work down to 0
; state of first 5 beans come from accumulator and Temp2
        jsr StoreColorFromState	; JJJ
        ror Temp2
        ror
        lsr
        lsr
        jsr StoreColorFromState	; III
        lsr
        lsr
        lsr
        jsr StoreColorFromState	; HHH
        lda Temp2
        jsr StoreColorFromState	; GGG
        lsr
        lsr
        lsr
        jsr StoreColorFromState	; FFF 
; state of second 5 beans come from Temp1 and Temp0
        lda Temp1
      	jsr StoreColorFromState	; EEE 
     	ror Temp0
        ror
      	lsr
        lsr
        jsr StoreColorFromState	; DDD
        lsr
        lsr
        lsr
        jsr StoreColorFromState	; CCC
        lda Temp0
        jsr StoreColorFromState	; BBB
        lsr
        lsr
        lsr
        jsr StoreColorFromState	; AAA

    	; all done, now continue into the kernel (not shown)...

; helper subroutine for bean coloring, it's very slow
StoreColorFromState
    	pha		; save A before masking
        and #7	; mask low 3 bits
        tay
        lda Palette,Y	; lookup into 8-entry palette table
        sta Colors,X	; save the color entry for this bean
        dex		; next bean
        pla		; restore A to its original unmasked value
        rts

; these are the Atari color values for the 8 bean types
Palette
	.byte $42, $C6, $86, $1A, $56, $36, $4E, $AC

The beans themselves are drawn as playfield pixels with a kernel that rapidly changes COLUPF, using the values in the Colors array. I couldn't get the CPU updates of COLUPF to align correctly with all 10 beans, so one of the beans is actually a player sprite that sits on top of a broken bean.

beans.dasm

+splendidnut · January 21, 2023

First thing, you're eating 12 cycles per subroutine call. Use macros and inline that code. It's best to avoid subroutine calls like this until you absolutely need to reduce the code footprint.

Next, if you want to optimize for time, easiest thing is to stop optimizing for space. While packing in 10 pieces of color information into 4 bytes saves RAM, it's at the expense of CPU cycles used for masking/shift the bits around. You could try simplifying your data packing and use nibbles instead (10 pieces of data in 5 bytes). This would help reduce the complexity of your unpacking routine. You could then eliminate all of the PHA/PLA, and just read the source data twice.

glurk · January 22, 2023

In your kernel part, you also do not need to set PF0 PF1 and PF2 on every line. Once you set them once, they stay set that way... So you really only need to set them once at the top, and maybe clear them at the bottom.

I didn't test it, but the extra cycles may allow you to do all the COLUPF changes on each line without using up a player sprite...

bigmessowires · January 22, 2023

Thanks for the great suggestions! I have a follow-up:

7 minutes ago, glurk said:

the extra cycles may allow you to do all the COLUPF changes on each line without using up a player sprite

There's actually enough CPU time to make all ten COLUPF changes on each line, but my problem was how they lined up with the playfield beans. No matter how I shifted things around, at least one of the COLUPF changes always fell inside of a visible bean, instead of in the black space between beans, so it created a two-color bean. If there's a clever solution for this, it would definitely be nice to avoid burning a player or missile just for this.

This is my first Atari programming attempt, and boy howdy does it ever require a different way of thinking! By choosing a basic puzzle game concept, I thought I was making something simple, but now I'm realizing I probably should have gone even simpler. Because I'm already running into questions like:

How can I smoothly animate a row of beans as they shift left and right, since they're playfield blocks with limited resolution?

How can I animate a bean up or down between rows?

How can I draw other players or missiles on top of the bean row, when my kernel is already almost 100 percent busy just updating playfield colors?

How can I use players/missiles/ball to cover portions of the corners of the playfield block so that the "beans" actually look round?

How can I show which row is selected?

I don't know if I'll ever get there, but the game concept is something like this: The game consists of many levels of increasing complexity. Each level has a few different rows of beans and the remaining rows are unused and not drawn. A few vertical ladders connect adjacent rows at specific points. The player can shift beans left and right in a row, or exchange the two beans at opposite ends of a ladder. The object is to unscramble the beans so that each row contains only a single color. Basically the whole game would be a fancy shift register.

I'm not sure if this would be fun by itself, but it could be dressed up with ideas like single-use ladders, wildcard beans, frozen-immovable beans, rows that can be locked and unlocked for movement, non-connected rows that always shift at the same time as each other, and other similar enhancements.

glurk · January 22, 2023

2 hours ago, bigmessowires said:

There's actually enough CPU time to make all ten COLUPF changes on each line, but my problem was how they lined up with the playfield beans. No matter how I shifted things around, at least one of the COLUPF changes always fell inside of a visible bean, instead of in the black space between beans, so it created a two-color bean. If there's a clever solution for this, it would definitely be nice to avoid burning a player or missile just for this.

Here's a demo. You can use Stella's debugger to see how I did it, but I used a RAM variable as the counter, and used X and Y registers as needed. It reads the colors from $90-$99 and makes a 10-color rainbow thing.

colors.bin

bigmessowires · January 22, 2023

THANK YOU! You were right. Following your example, I was able to make three more versions of the kernel that shift the beans -4, -8, or +4 pixels, so crude animation should be possible.

I was originally setting PF0 PF1 PF2 on every scan line, because I was planning to unroll the kernel loop and use a slightly different playfield on each line so the beans would look less blocky. But I'm not sure that's practical anyway.

I'd like to change the color on each line, to give the beans more of a color gradient shading style. I think that may require a different kernel for each color change.

I'd planned to use black-colored players and missiles in 3x repeat mode to cover the corners of the playfield blocks, giving them a more rounded shape. But now I'm thinking this whole approach might be backwards, and instead of using players/missiles to mask playfield blocks, I should just draw the beans using players. The beans are all the same shape, so maybe I could bash RESP0 to get repeated players on the same line, and update COLUP0 between each repetition? I'm not sure the timing would work. The existing kernel is just barely fast enough to load new colors from RAM and stuff them into the TIA's color register, and this hypothetical new kernel would have to do all that plus hit RESP0 too. Not enough time.

Maybe I should limit myself to a maximum of six beans on a row? That's already what my mock-up shows. Then I could use P0 and P1 in 3x repeat mode to make six beans, and I wouldn't need to hit RESPx during the line, so there should be enough time to update the colors. I could maybe even use missiles to draw additional beans, altering the missile width and position between lines to create a crude rounded shape.

glurk · January 22, 2023

I'm an 8-bit guy kinda moving over to 2600, and there are a couple truisms I'm learned about programming this infernal machine:

1> There's probably more than one way to do it. [For any given IT]

2> There's ALWAYS going to be some trade-offs.

As you've seen, 10 changes per scanline doesn't leave much wiggle room for anything else. I was going to suggest dropping PF0 and doing 8 across, but using the players and doing only 6 across might be even better. Trade-offs...

Probably, you should get your kernel figured out first and make sure it's going to be POSSIBLE to do what you envision, then work on the game logic afterwards. If you go into it with the right attitude, that it's fun and a learning experience, you can't go wrong!

Sign In

help with optimizing state and palette lookup code

Recommended Posts

bigmessowires

Link to comment

Share on other sites

+splendidnut

Link to comment

Share on other sites

glurk

Link to comment

Share on other sites

bigmessowires

Link to comment

Share on other sites

glurk

Link to comment

Share on other sites

bigmessowires

Link to comment

Share on other sites

glurk

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members

Apps

My Activity Streams

More