Lynx extended addressing

+karri · September 24, 2009

Hi,

I am planning to add standard filesystem stuff into the cc65 C-compiler.

My suggestion is to use filenames like

"0" = mandatory title sprite

"1" = mandatory first executable

"2" .. "65535" file numbers on cart

The routines for reading a file from cart would be something like:

fno = open(filename, accessflags); // This also fetches the directory entry into RAM
status = read(fno, entry->loadAddress, entry->fileSize);

The Lynx cart is really a streaming device. There is no random access available. So I decided to allow just one file open ata time. Opening a new file automatically closes the old one.

The mandatory directory entry of a Lynx is:

.byte blockNumber
.word byteOffsetInBlock
.byte unused
.word loadAddress
.word fileSize

My suggestion is to take the byte value "unused" and use it for extended addressing. In this way we could extend the cart size from 512k to 128M. The new dirent structure would be like:

.byte blockNumber
.word byteOffsetInBlock
.byte hiBlockNumber
.word loadAddress
.word fileSize

The blocksize parameter would be an implementation dependent constant in the config file.

SYMBOLS {
   __STACKSIZE__ = $800;
   __BLOCKSIZE__ = 1024;
}

There are many ways to extend this range in hardware.

Obviously you need to create a new cart access routine which would send the hiBlockNumber to the extra hardware if it is used.

_setHiBlockNumber:
   sta specialHardware
   rts

You can use AUDIO I/O as a 9th blockNumber bit.

You can switch from cart0 strobe to cart1 strobe if the bit is set.

You can add a latch to keep several extra address bits in it.

But whatever you do this would be a "standard" way of extending the address range that the linker would understand.

Comments?

--

Karri

Edited September 24, 2009 by karri

Wookie · November 12, 2009

The Lynx cart is really a streaming device. There is no random access available. So I decided to allow just one file open ata time. Opening a new file automatically closes the old one.

Comments?

--

Karri

So, since the Lynx cart is a streaming device, why use the traditional "random" access fopen/fseek/fread/fwrite/fclose type of interface? I realize that it probably makes it easier to keep a paradigm programmer's are used to, but it seems to me like trying to juggle having only one file open would get cumbersome.

Let me add some more thoughts to this discussion. In my day job, I used to program/maintain the low level I/O code for a video game engine used in published games for the PC, PS2, PS3, XBox, and XBox 360. The engine was used in "big world" type games where only a small subset of the level data was in RAM at any given time and data was streamed in off of the DVD as needed. (The other design philosophy used by some games it to force all data for a level to fit into RAM and do a single load per level.)

It really helps to think of the data on the Lynx cart as a database. Sure there is an encrypted loader and an executable down at the beginning of the cart memory but the rest of it is a database of game data. Now that we're thinking of the cart as a big "database", all game "data units" (e.g. sprites, bitmaps, audio, scripts, etc) in this database can be fully described by a memory location (it's location on the cart) and a length (number of bytes).

Typically in a streaming game engine, each data unit has a unique ID assigned to it so that the ID can be used to refer to the asset. For instance if you have a text rendering engine that uses a bitmap of characters, the text engine would typically take an asset ID that refers to the character bitmap. This allows the same text rendering engine to be "data driven" in that it can be configured to use different character bitmaps. The point I'm trying to make is that having a unique ID per data unit is usually much easier to work with than to try to think of things in terms of files that you have to fopen, fseek, fread, fclose.

Once you commit to using unique ID's per data unit, you no longer need files. All you need is an index table that translates a data unit ID into the starting address and length data for the data unit. Typically, you would have a load queue that is full of load requests. Each load request consists of a data unit ID and a destination memory location and possibly a callback function pointer to call when the data is loaded. Then in the game run loop, a small portion of each frame is spent loading data. Since we know how many cycles it takes to do loads and our cart load state is preserved between across frames, we can easily limit the amount of time the loader takes each frame.

This design creates an asynchronous loading system. This is a good thing but can make your coding a little more tricky. For instance, let's say your game starts off by scrolling some text up the screen to tell a short back story. Your intro logic initializes the text rendering engine with the data unit ID of the character bitmap you want to use and when you call into the text rendering engine's render code it has to be able to "dereference" the data unit ID into a pointer to the character bitmap so that it can calculate the memory location of each character it is blit'ing to the screen. To dereference a data unit ID into a pointer, you pass the data unit ID to the loader. The loader checks to see if the data unit is already loaded, if it is, it immediately returns the pointer. If the data unit isn't loaded, it creates a load request, puts it in the load queue, and returns a value that mean "NOT YET" to the text rendering engine. The text rendering engine then needs to see that it doesn't have it's required data so it returns back out the run loop until next frame.

Now the run loop will call the loader which will take the load request, set up the upper address bit hardware and then spins doing dummy writes to the cart I/O register to get the bottom bits counter to the correct value before reading the data in and writing it to a free RAM location. Now the trick here is to figure out what the cycle budget is for the loader is so that you can "pause" a load and return to the run loop when the loader has used up its time slice. It's a primitive "cooperative multitasking" method. The next time through the run loop will call back into the loader which will continue it's load. This time sliced loading continues until the data unit is fully loaded. After that, the next call to the loader to dereference the data unit ID will return the address of the data unit in RAM. The text rendering engine then can proceed to blit the charaters to the screen buffer.

There are several cool tricks you can do with a streaming loader system like this:

1. You can partition the RAM into buffers for different types of data units. For instance, you can have a buffer for sounds, a buffer for bitmaps, a buffer for scripts, etc. Then if you store a monotonically incrementing number (think of it as a time stamp) associated with each data unit in a buffer, you can do least recently used data unit flushing when you need to free up memory in the buffer. For instance, let's say the text rendering engine was trying to load a new character bitmap. The loader sees that the bitmap buffer is full of loaded data units so it goes through the list of data unit timestamps and "flushes" the oldest one. It repeats the flushes until it has enough room for the new data unit to be loaded and then loads it into the buffer. Now this can be tricky because you can get into "thrashing" situations if you're not careful. Let's say your bitmap buffer only has room for two character bitmaps but you're level file requires text from three different character bitmaps. What will happen is that every time through the run loop, the character rendering engine will try to load three data units causing one of them to be flushed each frame. This thrashing will probably mean that one of the three character bitmaps will not be available every frame. It is likely to cause blinking text.

2. The buffers for data units can be partitioned by type or by region or however you think it is appropriate. The buffer sizes themselves can also be data driven. For instance, if you know that you need lots of sound effects in memory but fewer sprites, say during a boss battle, you could have your level definition file set the sound buffer bigger and the sprite buffer smaller.

3. The data unit ID system lets you organize data units so that they take fewer cycles to load. You could even put data units together that are logically associated. For instance, we almost alway need to load a palette with a sprite bitmap. Packing a sprite bitmap with its palettes probably makes sense so that you can load all of them at once.

4. The loader could be given some simple heuristics to scan its load queue and group together data units that are in the same 2048 byte segment on the cart. That way, it won't have to set the upper address bits and go into the offset loop to get the bottom 11-bit counter to the right offset. Instead, it just reads the first data unit, spins to skip the junk in between and then reads in the second data unit.

There are some details that I should lay out:

1. You will need an index data structure that maps data unit ID's to the cart memory locations and lengths. Let's assume you're using one of Karri's prototype carts that uses an additional 8-bit latch for address bits 19-27 giving you a 128 MB (65536 * 2048 Bytes) cart. A straightforward approach to indexing would mean that each index entry would have 3 bytes for the cart address and probably 2 bytes for the length. You could fit about 400 index records in a single 2048 byte segment. Now if your data units average 512 bytes in size, you could fit 262144 data units on a 128 MB cart. You would need 1.3 MB just to store the entire index. So clearly, some kind of hierarchical indexing scheme needs to be cooked up. That I leave up to you, but let me give you a hint: don't be afraid to have multiple copies of data units on the cart. You're going to have 128 MB to play with, having half a dozen copies of some common data units won't hurt anything. Also, think of how you could make assumptions about address bits. You could have a zero page byte that stores the upper 8 bits of cart addresses. Then you only need two bytes for address data per data unit. Each level would then store all of it's data units in a 512 KB block on the cart. When you make a level transition, you load initialize that variable with the 8-bits for the 512KB block for that level's data. Then you would load in the index from that 512 KB block and go with your level. All data unit ID translations then use that level's index and translate to addresses into that 512 KB block.

2. There will be times when your game might cause a brief period (a couple of frames) of thrashing that is harmless to the game play but might result in flushing data units that you absolutely don't want flushed (e.g. the boss a.i. script). You will want to have a locking mechanism on data units that overrides the LRU flush heuristic. You will want to use it sparingly, but I guarantee you will use it.

3. Instead of trying to make data driven A.I. or level logic where your game interprets some script, why not just code it up in C/asm, compile it, and then store it as a data unit that can be loaded when needed. The thing you have to worry about are hard coded branch destinations in the loadable executable data. The reason is that the chunk of executable could be loaded anywhere in memory and if a branch destination is not relative, then it likely branch of into nowhere and crash. You need to write "relocatable" code where all branches are relative to the program counter or are calls into permanent, non-moving engine code.

So Karri, writing a normal file interface may be good for casual programmers but if we want to really take full advantage of the new large carts, I think we need a generic way to package up assets as data units that can be streamed in as needed and an example streaming loader that people can use in their games. It would achieve the goal of abstracting the cart away from the programmer and be a better fit to the streaming cart interface.

--Wookie

Edited November 12, 2009 by Wookie

Wookie · November 13, 2009

Karri,

Don't forget that now that we can write and encrypt our own loaders, we are no longer bound by the directory entry stuff. The directory entries were a dependency of Harry's loader, not the Lynx ROM.

--Wookie

Shawn Jefferson · November 13, 2009

Karri,

Don't forget that now that we can write and encrypt our own loaders, we are no longer bound by the directory entry stuff. The directory entries were a dependency of Harry's loader, not the Lynx ROM.

--Wookie

Yes! I was going to say this too... the directory structure really can be anything, or nothing if you write your own loader and encrypt it yourself.

Also, you could technically do random access file routines, they would just be really slow since you'd have to "rewind" and seek to the data you want. A full fopen/fseek/fread/fclose interface could be written if you had the inclination. But really, only people interested in portability to other cc65 targets care about this overmuch... most lynx programmers making games for the lynx probably wouldn't require it.

Wookie · November 13, 2009

This morning it occurred to me that I could use a timer interrupt to handle time slicing a game run loop. There are a couple of possibilities:

1. I could use the timer interrupt to handle scheduling "threads" in a round-robin multi-tasking setup. That way I could have a run loop for the loader and a run loop for the sound engine and a run loop for the main game logic. The scheduler would do context switches between the three "threads" to give each one a chance to run each frame. This is really nice from a software standpoint but can be tricky when doing cross-thread communication and trying to find the right balance between the threads. All inter-thread communication would have to be done using event queues which works really nicely for the main game logic thread to talk to both the loader and sound system. Doing context switches is really pretty simple. You just have to have independent stacks for each thread. When the context switch happens, you disable interrupts, store the processor registers and state flags on the threads stack, restore the registers and state from the next thread, re-enabled interrupts and the do an rti. If you put the program counter pointer on to the thread's stack properly, when you do an rti, the cpu will initialize the pc from the next thread's stack and then jump back to where the thread was at, in the exact same state as when it was put to sleep.

2. A simpler approach than a full on multi-threading design would be to have the loader code set up a timer interrupt when it starts its time slice and when the timer interrupt hits, it sets a variable that causes the loader to return to the main loop. Something like the following:

loader_interrupt:

   inc $15        ; increment the loader exit variable in the zero page
   rti            ; return from the interrupt

loader_run_loop:

   ; check for an in-progress load and if there are load requests in the queue
   ; if there is nothing to do:
   ;   return immediately

   stz $15        ; set the loader exit variable to zero

   ; initialize the loader timer to fire an interrupt couple milliseconds from now

inner_load_loop:
   ; check for load in progress
   ; if not in progress:
   ;   initialize load variables with next load in queue

   ; do a duff device jump into the following list of load blocks
   ;   load one byte from the cart
   ;   store it
   ;   increment destination address pointer
   ;   load another byte from the cart
   ;   store it
   ;   increment destination address pointer
   ;   ...
   ;   load 64th byte from the cart
   ;   store it
   ;   increment destination address pointer
   
   ; now check to see if the loader interrupt has fired
   lda #0              ; a = 0
   adc $15             ; a += ($15)
   beq inner_load_loop ; this will loop if $15 is still zero


main_game_loop:
   
   ; let input system run

   ; let game logic system run

   ; let the sound system run
   
   ; let the loader run
   jsr loader_run_loop

   ; sleep to let Suzy have the bus

   ; endlessly loop
   bra main_game_loop

This code is a very simple duff's device loader run loop function that returns when the timer interrupt fires. I arbitrarily picked the zero page address $15 to be the loader run loop exit flag, that could be anywhere in RAM really. As long as Suzy can't mask timer interrupts, then the loader will always run as expected.

It is important to remember that if your game is running at 60 frames per second that means you have exactly 16 2/3 milliseconds of time per frame. I assume that Suzy will probably take up most of that time. I have no empirical data to back up this number but I would initially start with running the loader for 2-4 milliseconds. That would give anywhere from roughly 10%-25% of your frame time to the loader. Remember that if there are no pending load requests and no load in progress, the loader will return immediately and only take up a tiny fraction of the frame time.

--Wookie

Edited November 13, 2009 by Wookie

Wookie · November 13, 2009

This morning it occurred to me that I could use a timer interrupt to handle time slicing a game run loop. There are a couple of possibilities:

1. I could use the timer interrupt to handle scheduling "threads" in a round-robin multi-tasking setup. That way I could have a run loop for the loader and a run loop for the sound engine and a run loop for the main game logic. The scheduler would do context switches between the three "threads" to give each one a chance to run each frame. This is really nice from a software standpoint but can be tricky when doing cross-thread communication and trying to find the right balance between the threads. All inter-thread communication would have to be done using event queues which works really nicely for the main game logic thread to talk to both the loader and sound system. Doing context switches is really pretty simple. You just have to have independent stacks for each thread. When the context switch happens, you disable interrupts, store the processor registers and state flags on the threads stack, restore the registers and state from the next thread, re-enabled interrupts and the do an rti. If you put the program counter pointer on to the thread's stack properly, when you do an rti, the cpu will initialize the pc from the next thread's stack and then jump back to where the thread was at, in the exact same state as when it was put to sleep.

2. A simpler approach than a full on multi-threading design would be to have the loader code set up a timer interrupt when it starts its time slice and when the timer interrupt hits, it sets a variable that causes the loader to return to the main loop. Something like the following:
loader_interrupt:

   inc $15        ; increment the loader exit variable in the zero page
   rti            ; return from the interrupt

loader_run_loop:

   ; check for an in-progress load and if there are load requests in the queue
   ; if there is nothing to do:
   ;   return immediately

   stz $15        ; set the loader exit variable to zero

   ; initialize the loader timer to fire an interrupt couple milliseconds from now

inner_load_loop:
   ; check for load in progress
   ; if not in progress:
   ;   if load queue empty:
   ;      bra exit_loader_loop
   ;   else:
   ;      initialize load variables with next load in queue

   ; do a duff device jump into the following list of load blocks
   ;   load one byte from the cart
   ;   store it
   ;   increment destination address pointer
   ;   load another byte from the cart
   ;   store it
   ;   increment destination address pointer
   ;   ...
   ;   load 64th byte from the cart
   ;   store it
   ;   increment destination address pointer
   
   ; now check to see if the loader interrupt has fired
   lda #0              ; a = 0
   adc $15             ; a += ($15)
   beq inner_load_loop ; this will loop if $15 is still zero

exit_loader_loop:
   ; turn off the timer interrupt

   rts                 ; return to the main game loop

main_game_loop:
   
   ; let input system run

   ; let game logic system run

   ; let the sound system run
   
   ; let the loader run
   jsr loader_run_loop

   ; sleep to let Suzy have the bus

   ; endlessly loop
   bra main_game_loop
This code is a very simple duff's device loader run loop function that returns when the timer interrupt fires. I arbitrarily picked the duff's device to load 64 bytes at a time, but that should be carefully tuned to balance branching overhead and the number of clock cycles between checks of the exit condition variable. I also arbitrarily picked the zero page address $15 to be the loader run loop exit flag, that could be anywhere in RAM really but since it gets checked every frame, it should probably be in the zero page. As long as Suzy can't mask timer interrupts, then the loader will always run as expected.

It is important to remember that if your game is running at 60 frames per second that means you have exactly 16 2/3 milliseconds of time per frame. I assume that Suzy will probably take up most of that time. I have no empirical data to back up this number but I would initially start with running the loader for 2-4 milliseconds. That would give anywhere from roughly 10%-25% of your frame time to the loader. Remember that if there are no pending load requests and no load in progress, the loader will return immediately and only take up a tiny fraction of the frame time.

--Wookie

Edited November 13, 2009 by Wookie

Sign In

Lynx extended addressing

Recommended Posts

+karri

Link to comment

Share on other sites

Wookie

Link to comment

Share on other sites

Wookie

Link to comment

Share on other sites

Shawn Jefferson

Link to comment

Share on other sites

Wookie

Link to comment

Share on other sites

Wookie

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members

Apps

My Activity Streams

More