Pointer-setting Optimization

+Karl G · November 11, 2019

I am working on an alternate kernel for my tile-based game, with the Color/B&W switch toggling between the two kernels. Right now, I have a 96-pixel kernel that combines the player graphics with an async playfield under certain tiles to provide an accent color for things like water tiles. I am trying to add an optional flickerblinds kernel that would lose the playfield background but would reduce eyestrain on many displays.

Anyway, I can't seem to update all 12 pointers in 3 scanlines in-between rows, and I wanted to see if anyone had any clever solutions that I hadn't considered.

Right now, my game reads the currently visible portion of the current map into SuperChip RAM. The display kernel then reads the high and low address of each tile's graphic from the table for the current map, and sets the associated pointers for each row of tiles. I have enough time to do this with 6 tiles + playfield within 3 scanlines, but 12 pointers take up too many cycles. Here's some code showing how I set each pointer in between rows:

	ldy sc_read+0,x             ; 4     (14)
	lda (TilePtrLow),y          ; 5     (19)
	sta G0Ptr                   ; 3     (22)
	lda (TilePtrHigh),y         ; 5     (27)
	sta G0Ptr+1                 ; 3     (30)

Each pointer takes 20 cycles to set, so all 12 pointers take 240 cycles, but 3 scanlines is only 228 cycles (3 x 76).

Possible solutions I have considered:

1) Reduce the width to 80 pixels, so I only have to set 10 pointers between rows. This would work, but I'm grumpy about the idea.

2) Move my current tiles in RAM from SuperChip RAM to ZeroPage RAM. This would save me 12 cycles, which would make the update exactly 3 scanlines, but it would leave me no cycles for housekeeping, and would require massive rewriting of my existing game code if game variables used SC RAM to account for separate read/write addresses.

3) Have 4 scanlines between rows instead of 3. I don't think this would look good, and I'd have to lose my decorative screen border, and cut slightly into overscan to make this work.

4) Set the high byte of my pointers ahead of time. This would only work if all the tile graphics for a map fit in a single 256-byte page, meaning the tables contain the addresses of 25 tiles or fewer (10 bytes in size each), which is only true for one map.

5) Read tile addresses from tables directly instead of from pointers to save 24 cycles. I think this would be enough savings, but it would require me to have a lot of duplicate code and tile data for each set of tiles.
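
For reference, a rough sketch of what option 5 could look like for one pointer. Map1TileLow and Map1TileHigh are made-up labels standing in for one map's address tables; they are assumptions for illustration, not names from the actual game. Cycle counts assume no page crossing on the indexed reads.

	ldy sc_read+0,x             ; 4     tile number for this column
	lda Map1TileLow,y           ; 4     absolute,Y instead of (zp),Y (hypothetical table)
	sta G0Ptr                   ; 3
	lda Map1TileHigh,y          ; 4     (hypothetical table)
	sta G0Ptr+1                 ; 3     18 cycles for this pointer

At 18 cycles per pointer, 12 pointers come to 216 cycles, which leaves 12 of the 228 for housekeeping. An indexed absolute read costs one extra cycle when it crosses a page boundary, so each table would need to sit entirely within one page for these counts to hold.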

I'll probably do either #1 or #5, but wanted to see if anyone had an idea I hadn't considered first.

Omegamatrix · November 11, 2019

Hello Karl,

I would lean towards reading the tile address directly from tables too. As you've found out, the high pointer is a good place to focus on for optimization.

For example, if you can keep at least 5 of them the same throughout each row for the kernel, then you save 25 cycles. However, that is really restrictive. Likely you would have to use different routines for setting pointers that you jump into, and the complexity and overhead for that would quickly kill it.
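
As a minimal sketch of the fixed-high-byte idea, assuming every graphic a given pointer can show lives in the same 256-byte page: TileGfxPage is a hypothetical label for that page, not something from the original code. The high byte is written once outside the time-critical section, and only the low byte is updated between rows.

	; once, before the kernel
	lda #>TileGfxPage           ; 2     high byte of the shared graphics page (hypothetical label)
	sta G0Ptr+1                 ; 3

	; between rows, per pointer
	ldy sc_read+0,x             ; 4
	lda (TilePtrLow),y          ; 5
	sta G0Ptr                   ; 3     12 cycles; G0Ptr+1 is left alone

Whether this is a net win depends on how much extra code is needed to pick between the short and full versions for each pointer.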

Another page of RAM would have been nice when Atari made the Super Chip. Then it would be problem solved.