As mentioned last time, cd-w spun off a new topic to investigate the use of an EEPROM to store level data. While we ended up not using one, he did use the CodeSourcery ARM toolchain to write routines to copy data from the EEPROM to RAM. batari mentioned that I should look into using that toolchain as it appeared to create tighter (smaller) code.
One minor issue was that the toolchain wasn't available for OS X, only Windows and Linux. That wasn't much of a problem, though, as I'd been using Parallels (a virtual machine) for a few years, so I set up a new Linux VM and installed the new compiler. If you're interested in learning more about virtual machines, my other blog series DPC+ARM Development is currently going over the free VM VirtualBox.
Using the new compiler, the ARM code shrank by 1176 bytes from 15,963, a savings of 7.4%. Additionally, I was able to implement some optimizations that had failed with the original C compiler. This brought the total savings to 1624 bytes, a savings of over 10%!
One problem did occur, though: the music stopped working and was stuck on the first note. I made some changes which fixed the music, but then the snowman stopped moving in the X direction. The variable storage when the music didn't work was:
    *(.bss)
    *(COMMON)
    COMMON         0x40001e14        0x4 music.o
                   0x40001e14                gMusicTrack
    COMMON         0x40001e18       0x24 main.o
                   0x40001e18                gFrostyX
                   0x40001e1c                gLevelWidth
                   0x40001e20                gFrostyImage
                   0x40001e24                gFrostyY
                   0x40001e28                gScore
                   0x40001e2c                gSnowballY
                   0x40001e30                gShiftWindow
                   0x40001e34                gLevelOffset
                   0x40001e38                gSnowballX
                   0x40001e3c                . = ALIGN (0x4)
                   0x40001e3c                _ebss = .
                   0x40001e3c                end = .
And when the snowman couldn't move, it was:
    *(.bss)
    *(COMMON)
    COMMON         0x40001e14       0x28 main.o
                   0x40001e14                gFrostyX
                   0x40001e18                gLevelWidth
                   0x40001e1c                gFrostyImage
                   0x40001e20                gFrostyY
                   0x40001e24                gScore
                   0x40001e28                gMusicTrack
                   0x40001e2c                gSnowballY
                   0x40001e30                gShiftWindow
                   0x40001e34                gLevelOffset
                   0x40001e38                gSnowballX
                   0x40001e3c                . = ALIGN (0x4)
                   0x40001e3c                _ebss = .
                   0x40001e3c                end = .
I realized that whatever was stored at 0x40001e14 was getting overwritten with zero, so as a quick fix I just defined a dummy variable to occupy that address:
    *(.bss)
    *(COMMON)
    COMMON         0x40001e14        0x4 music.o
                   0x40001e14                Dummy
    COMMON         0x40001e18       0x28 main.o
                   0x40001e18                gFrostyX
                   0x40001e1c                gLevelWidth
                   0x40001e20                gFrostyImage
                   0x40001e24                gFrostyY
                   0x40001e28                gScore
                   0x40001e2c                gMusicTrack
                   0x40001e30                gSnowballY
                   0x40001e34                gShiftWindow
                   0x40001e38                gLevelOffset
                   0x40001e3c                gSnowballX
                   0x40001e40                . = ALIGN (0x4)
                   0x40001e40                _ebss = .
                   0x40001e40                end = .
That fixed both problems. batari did some research and discovered a bug in custom.S, the C code boot routine that he'd written. Fixing that solved the problem, so I removed the dummy variable.
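To make the diagnosis above concrete, here is a small sketch that looks up which symbol each of the three builds places at 0x40001e14. The addresses and symbol names are copied from the maps; everything else (`MapEntry`, `symbol_at`, `symbol_at_is`, the array names) is made up for illustration and is not part of the game's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One symbol from a linker map: its start address and its name. */
typedef struct {
    uint32_t address;
    const char *name;
} MapEntry;

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))
#define CLOBBERED_ADDRESS 0x40001e14u /* the word that kept getting zeroed */

/* Build 1: music stuck on the first note. */
const MapEntry kMusicStuck[] = {
    {0x40001e14u, "gMusicTrack"},  {0x40001e18u, "gFrostyX"},
    {0x40001e1cu, "gLevelWidth"},  {0x40001e20u, "gFrostyImage"},
    {0x40001e24u, "gFrostyY"},     {0x40001e28u, "gScore"},
    {0x40001e2cu, "gSnowballY"},   {0x40001e30u, "gShiftWindow"},
    {0x40001e34u, "gLevelOffset"}, {0x40001e38u, "gSnowballX"},
};

/* Build 2: snowman could not move in the X direction. */
const MapEntry kFrostyStuck[] = {
    {0x40001e14u, "gFrostyX"},     {0x40001e18u, "gLevelWidth"},
    {0x40001e1cu, "gFrostyImage"}, {0x40001e20u, "gFrostyY"},
    {0x40001e24u, "gScore"},       {0x40001e28u, "gMusicTrack"},
    {0x40001e2cu, "gSnowballY"},   {0x40001e30u, "gShiftWindow"},
    {0x40001e34u, "gLevelOffset"}, {0x40001e38u, "gSnowballX"},
};

/* Build 3: dummy variable added as a quick fix. */
const MapEntry kWithDummy[] = {
    {0x40001e14u, "Dummy"},        {0x40001e18u, "gFrostyX"},
    {0x40001e1cu, "gLevelWidth"},  {0x40001e20u, "gFrostyImage"},
    {0x40001e24u, "gFrostyY"},     {0x40001e28u, "gScore"},
    {0x40001e2cu, "gMusicTrack"},  {0x40001e30u, "gSnowballY"},
    {0x40001e34u, "gShiftWindow"}, {0x40001e38u, "gLevelOffset"},
    {0x40001e3cu, "gSnowballX"},
};

/* Returns the name of the symbol that starts at `address`, or NULL. */
const char *symbol_at(const MapEntry *map, size_t count, uint32_t address)
{
    assert(map != NULL);
    for (size_t i = 0; i < count; i++) {
        if (map[i].address == address) {
            return map[i].name;
        }
    }
    return NULL;
}

/* Returns 1 if the symbol starting at `address` is called `name`. */
int symbol_at_is(const MapEntry *map, size_t count, uint32_t address,
                 const char *name)
{
    const char *found = symbol_at(map, count, address);
    return found != NULL && strcmp(found, name) == 0;
}
```

Looking up `CLOBBERED_ADDRESS` gives gMusicTrack in the first build, gFrostyX in the second, and Dummy in the third, which lines up with the symptoms: whichever variable landed on that word was the one that broke.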
At this point I decided to check whether our tentative level arrangement was going to fit in the ROM (the ?X notation is how wide the level is: 1X = 1 screen, 2X = 2 screens, etc.):
- five 1X, each used twice for 10 total levels
- five 2X, each used twice for 10 total levels
- six 3X, 4 used twice, 2 distinct for 10 total levels
- one 1X bonus room, used twice for 2 levels
I realized that even with the space freed up by the C compiler it wasn't going to fit. So time for more optimizing!
First up, I reviewed the music routines and revised them so data like this:
    C4, REST, C3,
    0, 0, 0,
    0, 0, 0,
    0, 0, 0,
    0, G3, E3,
    0, 0, 0,
    C4, 0, 0,
    0, 0, 0,
    C4, G3, E3,
    0, 0, 0,
    0, 0, 0,
    0, 0, 0,
was now stored like this:
    C4, REST, C3,
    0, G3, DOUBLE_TIME + E3,
    C4, 0, DOUBLE_TIME + 0,
    C4, G3, E3,
The test track used 1190 bytes of space; with the new storage method that was down to 306. I copied the music data twice to allocate space for menu music, odd level music, and even level music. So at this point the music data used 918 bytes (3 x 306), down 272 bytes from the original 1190.
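The new format can be sketched as a small expander that turns the compact data back into the old layout. This is a guess at the scheme inferred only from the sample data above, not the game's actual music routine: it assumes each row holds three bytes, that a row normally stands for four rows of the old format, and that `DOUBLE_TIME` on the third byte shortens that to two. The note values and the 0x80 flag bit are hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout constants, inferred from the sample data only. */
enum { VOICES = 3, NORMAL_STEPS = 4, FAST_STEPS = 2 };

/* Hypothetical note values; the real ones are not given in the post. */
enum { REST = 1, C3 = 2, E3 = 3, G3 = 4, C4 = 5 };

#define DOUBLE_TIME 0x80 /* hypothetical flag bit on a row's last byte */
#define NOTE_MASK   0x7F /* strips the flag, leaving the note value */

/*
 * Expands packed rows into the old one-row-per-step layout.
 * Returns the number of bytes written to `out`.
 */
size_t expand_track(const unsigned char *packed, size_t packed_len,
                    unsigned char *out, size_t out_cap)
{
    size_t written = 0;

    assert(packed != NULL && out != NULL);
    assert(packed_len % VOICES == 0);

    for (size_t i = 0; i < packed_len; i += VOICES) {
        /* The flag on the last byte of the row selects the row's length. */
        int fast = (packed[i + VOICES - 1] & DOUBLE_TIME) != 0;
        size_t steps = fast ? FAST_STEPS : NORMAL_STEPS;

        assert(written + steps * VOICES <= out_cap);

        /* First step carries the notes, with the flag removed. */
        for (size_t v = 0; v < VOICES; v++) {
            out[written++] = (unsigned char)(packed[i + v] & NOTE_MASK);
        }
        /* Remaining steps were all zeros in the old format. */
        for (size_t s = 1; s < steps; s++) {
            for (size_t v = 0; v < VOICES; v++) {
                out[written++] = 0;
            }
        }
    }
    return written;
}
```

Under those assumptions, feeding the 12 packed bytes from the example through `expand_track` reproduces the 36 bytes of the original version, which is roughly the same ratio as the 1190 to 306 byte reduction for the whole test track.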
Next I found a Fast Bit Reversal routine for reversing 16 bits and revised it to work for 8:
    unsigned char BitReversal(unsigned char value)
    {
        value = ((0xaa & value) >> 1) | ((0x55 & value) << 1);
        value = ((0xcc & value) >> 2) | ((0x33 & value) << 2);
        value = ((0xf0 & value) >> 4) | ((0x0f & value) << 4);
        return value;
    }
That replaced a 256 byte lookup table with a 42 byte function for another 202 byte savings. I also revised the sky colors, made some changes to the boss and added the defeated boss message.
Level 32 started to jitter, which I attributed to the bit reversal routine being slower than the lookup table.
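Since the routine replaces a lookup table, one way to gain confidence in it is to compare it against a plain bit-by-bit reversal for all 256 inputs. The sketch below does that; `SlowBitReversal` and `CheckBitReversal` are names invented for this example, and `BitReversal` is repeated so the block stands on its own.

```c
#include <assert.h>

/* The 8-bit routine from the post: swap neighbours, pairs, then nibbles. */
unsigned char BitReversal(unsigned char value)
{
    value = ((0xaa & value) >> 1) | ((0x55 & value) << 1);
    value = ((0xcc & value) >> 2) | ((0x33 & value) << 2);
    value = ((0xf0 & value) >> 4) | ((0x0f & value) << 4);
    return value;
}

/* Reference version: moves one bit per loop pass, low bit first. */
unsigned char SlowBitReversal(unsigned char value)
{
    unsigned char result = 0;
    for (int bit = 0; bit < 8; bit++) {
        result = (unsigned char)((result << 1) | (value & 1));
        value >>= 1;
    }
    return result;
}

/* Checks the fast routine against the reference for every 8-bit input. */
void CheckBitReversal(void)
{
    for (int i = 0; i < 256; i++) {
        assert(BitReversal((unsigned char)i) ==
               SlowBitReversal((unsigned char)i));
    }
}
```

The same loop could also be used on a desktop machine to regenerate the 256-entry table if the speed of the lookup turned out to matter more than the space.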
At this point the following levels were considered "done": they might still be tweaked and/or resequenced to make the progression better, but the layouts themselves were finished:
- 1 - didn't end up in final game
- 3 - early version of level 4 in final game
- 10 - bonus room, level 8 in final game
- 11 - level 14 in final game
- 17 - moving version of #1, didn't end up in final game
- 19 - dropped from final game
- 26 - second bonus room, level 24 in final game
- 27 - level 29 in final game
- 32 - early version of boss level - touching boss still instantly vaporizes the snowman
Level 4 wasn't considered done, but would eventually become level 3 in the final game.
At this point the ROM had 2502 bytes free:
ARM - $5d8 - 1496
Bank4 - $103 - 259
Bank5 - $1fe - 510
Display Data - $ed - 237
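As a quick cross-check of the total, the four hex values can be summed directly. The `FreeSpace` struct, `kFree` array, and `total_free` helper are invented for this sketch; only the region names and sizes come from the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Free space remaining in one region of the ROM. */
typedef struct {
    const char *region;
    unsigned free_bytes;
} FreeSpace;

/* Values as listed in the post ($ prefix written as 0x). */
const FreeSpace kFree[] = {
    {"ARM",          0x5d8}, /* 1496 */
    {"Bank4",        0x103}, /*  259 */
    {"Bank5",        0x1fe}, /*  510 */
    {"Display Data", 0x0ed}, /*  237 */
};

/* Adds up the free bytes across all listed regions. */
unsigned total_free(const FreeSpace *regions, size_t count)
{
    unsigned total = 0;
    assert(regions != NULL);
    for (size_t i = 0; i < count; i++) {
        total += regions[i].free_bytes;
    }
    return total;
}
```

Calling `total_free(kFree, 4)` gives 2502, matching the figure quoted.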
ROMs
Source
NOTE: While the ROMs work on the Harmony, they do not work in Stella.
Blog entry covers November 29 - December 2, 2010