Assembly on the 99/4A

matthew180 · May 12, 2010

I'm posting this mostly as a response to help Owen get past some of the initial hurdles of AL programming and also as maybe a "starter kit" for anyone else who wants to get going with AL on the 99/4A. I will be presenting the information and programming style that I like to use, but of course there are always other ways and other people will have their own methods and madness. I'm not interested in arguing styles and such in this thread, so please let's not go down that road.

There are *basically* (but not only) two types of AL program formats you can write on the TI. The first is programs that run from the Editor/Assembler (EA) menu option 3, commonly known as EA3. These programs require a linking phase to finalize the memory references before execution. Because of this, the code is "relocatable" and the loader can place the program just about anywhere in memory that it will fit.

The second type of program is one meant to be run from a cartridge ROM. These programs are set up differently and have a few more restrictions as to defining labels and such. For now we will stick with EA3 types of programs.

This is the basic skeleton:

      DEF  START

* Typically EQUates, DATA, and BYTE definitions for variables
* will be at the head of the program, with longer data sets
* placed at the bottom of the code.


* Program execution starts here
START  LIMI 2

LP9999 JMP  LP9999

      END

This program will do nothing but execute an endless loop. However, interrupts were enabled with LIMI 2, so the FCTN= combination can be used to reset the console.

This line:

DEF START

Is an assembler directive, which means it will be read and used by the assembler. It is *not* assembly language and does not cause any code to be created. This directive is used to place a label in the REF/DEF table that the linker uses to load and find programs. This entry is saying that there will be a label called "START" and that is where our program will begin execution. "START" could be anything we want that is 1 to 6 characters in length.

Let's add something that will actually do something for us like set the VDP to Graphics Mode I, clear the screen, and say "Hello World". I think this is the basic entry level program that most people try to start with.

*Note*

Sometimes you will see a $-2 or such used in code. The assembler uses the $ to represent the current location and will subtract (or add) the specified number of bytes to the current location when generating an address. This is usually used as a shortcut in short loops to avoid using a label. I do not currently recommend using this method because I have found that Asm994a gets it wrong, and as of right now that is the only Windows based assembler I know of. If you are going to stick 100% with the E/A, then go ahead, but don't be surprised if you try to assemble with Asm994a and your code does not work.
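As a minimal sketch of what the note describes, the two delay loops below should assemble to the same machine code under the E/A assembler. The label WAIT and the count loaded into R2 are placeholders for illustration only.

```asm
* Labeled form (the style used in this thread)
       LI   R2,1000           * Arbitrary loop count
WAIT   DEC  R2                * DEC Rn is a one-word (2-byte) instruction
       JNE  WAIT              * Loop until R2 reaches zero

* Same loop using $, the address of the current instruction
       LI   R2,1000           * Arbitrary loop count
       DEC  R2
       JNE  $-2               * $-2 is the DEC just above the JNE
```

The $ form only works as long as nothing is inserted between the DEC and the JNE, which is another reason to prefer the label.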

     DEF  START

* VDP Memory Map
*
VDPRD  EQU  >8800             * VDP read data
VDPSTA EQU  >8802             * VDP status
VDPWD  EQU  >8C00             * VDP write data
VDPWA  EQU  >8C02             * VDP set read/write address

* Workspace
WRKSP  EQU  >8300             * Workspace
R0LB   EQU  WRKSP+1           * R0 low byte reqd for VDP routines

* Program execution starts here
START  LIMI 0
      LWPI WRKSP


      LIMI 2
LP9999 JMP  LP9999

      END

First notice that there are no REF statements for the normal VDP routines. This is because I don't like using the console based routines for a few reasons:

1. They are slow since they were designed to save ROM space instead of being fast to execute

2. They use BLWP which is slow and designed primarily for context switching in a multi-tasking system like TI's mini computers where the TMS9900 CPU was designed to be used

3. The routines use a workspace in 8-bit RAM! This in and of itself should be a sin!

Also, I don't like to use the GPL routines because they require a workspace pointer in the 256 bytes of scratch pad RAM. Since there are only 256 bytes of 16-bit RAM in the machine, it is highly prized memory and I don't like some other console routines chewing up 32 bytes of that RAM. Anyway, the VDP routines provided in the E/A cartridge are slow, slow, and slow, and I'm a speed freak.

What we set up in the code is an equate (EQU) to the memory address where the VDP is mapped in the 99/4A's address space. Equates are assembler directives that simply let us use a label instead of a number. The assembler will do a search and replace on the labels, so any place you see VDPRD, for example, will be replaced with >8800.

We have also changed the first instruction to LIMI 0, which will disable the VDP interrupt, which in turn disables the console ISR. While the VDP interrupt is a nice thing to have to use as a sixtieth of a second timer in a game, it triggers the console's one and only interrupt service routine which we usually don't want running (it tries to do too much IMO.) So, we shut it off. We can still use the VDP interrupt, we just have to poll for it, which is not too bad.
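A minimal sketch of such a poll, assuming the VDPSTA equate from the code above and that the frame flag is the most significant bit (>80) of the VDP status byte. The label VWAIT and the use of R1 are placeholders. Because reading the status also clears the flag, a poll that lands at exactly the wrong moment can miss a frame, so treat this as an illustration rather than a finished timing routine.

```asm
VWAIT  MOVB @VDPSTA,R1        * Read VDP status into the high byte of R1
       ANDI R1,>8000          * Keep only the frame flag bit
       JEQ  VWAIT             * Flag not set yet, keep polling
* Execution falls through to here about once per frame
```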

The next thing we do is LWPI (load workspace pointer immediate), which sets the workspace pointer to >8300, which is the first address in the 16-bit scratchpad RAM. Since the registers in the 9900 CPU are memory-based, you absolutely positively always want to keep the workspace pointer set to an address in that 16-bit RAM (unless you really want to kill performance, in which case you should just use XB... ;-) )

Finally you will note an equate called R0LB. This creates a label that gives us a convenient way to access the low byte of R0 without using SWPB. It comes in handy when dealing with the VDP as you will see in the routines we set up.

Now let's do something, like clear the screen:

      DEF  START

* VDP Memory Map
*
VDPRD  EQU  >8800             * VDP read data
VDPSTA EQU  >8802             * VDP status
VDPWD  EQU  >8C00             * VDP write data
VDPWA  EQU  >8C02             * VDP set read/write address

* Workspace
WRKSP  EQU  >8300             * Workspace
R0LB   EQU  WRKSP+1           * R0 low byte reqd for VDP routines


* Program execution starts here
START  LIMI 0
      LWPI WRKSP

      CLR  R0                * Set the VDP address to zero
      MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
      ORI  R0,>4000          * Set read/write bits 14 and 15 to write (01)
      MOVB R0,@VDPWA         * Send high byte of VDP RAM write address

      LI   R1,>2000          * Set high byte to 32 (>20)
      LI   R2,768            * Set every screen tile name to >20
CLS    MOVB R1,@VDPWD         * Write byte to VDP RAM
      DEC  R2
      JNE  CLS

      LIMI 2
LP9999 JMP  LP9999

      END

For now we are doing things directly, but soon we will make a subroutine to encapsulate the VDP reading and writing. One of the most important things to understand about the 9918A VDP is that it maintains an internal address register that auto-increments any time a byte of data is written to or read from the VDP. This is handy since setting the VDP's address register takes two writes to the VDP write-address port (VDPWA). You can see that above. R0 is loaded with the address we want to set. In this case we set the address to zero because I know the VDP name table defaults to memory location zero, so the bytes we write go to the VDP RAM used to display the screen. A little later we will set the name table location directly so we know without a doubt where it is located in the VDP RAM.

The other thing to know about the VDP registers is that they are write only (except for the status register, which is read only.) So the only way to know for sure where the various VDP tables are located is to set them.

So, the address is set up and written to the VDP. The VDP's internal address register is 14-bits since it has to reference up to 16K, so it takes two 1-byte transfers to load the address register. After that, any data we write (or read) to the VDP will cause the address to auto-increment after the write (or read.)
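The address setup can be wrapped in a small BL subroutine. This is only a sketch of one possible shape, not necessarily the routine this thread goes on to build: the name VWAD is hypothetical, it relies on the VDPWA and R0LB equates above, and it assumes interrupts are already disabled with LIMI 0.

```asm
* Hypothetical helper: point the VDP at the address in R0 for writing
* Call with the VDP address in R0, then  BL @VWAD
VWAD   MOVB @R0LB,@VDPWA      * Send low byte of the address first
       ORI  R0,>4000          * Set the write flag in the high byte
       MOVB R0,@VDPWA         * Send high byte of the address
       ANDI R0,>3FFF          * Put R0 back to the plain 14-bit address
       B    *R11              * Return to the caller (BL left the address in R11)
```

After the call, every MOVB to VDPWD stores one byte and the VDP advances its address by itself.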

Next we load R1 with the byte we want to write to the VDP. When writing from registers to memory mapped devices, the MSB of the register will always be transferred. Thus, the low byte of R1 really does not matter, but we use >00 to help keep things clear. To clear the screen we write tile (character) 32 (>20 in hex, the ASCII space), whose pattern is already defined as a blank thanks to the console boot-up process. We set R2 to 768 to keep track of how many bytes we have written to the VDP. We count *down* to take advantage of the fact that the 9900 CPU will compare R2 to 0 after the DEC instruction and we can use that to control the JNE (jump if not equal) instruction. So we are saying, jump if R2 is not 0.

Finally we re-enable interrupts and cause an infinite loop. Something else to note. You *MUST* disable interrupts when reading or writing to the VDP because the console ISR also reads and writes the VDP and will mess up the address you had set up. And since your program has no way to know that it was possibly interrupted, your VDP accesses will be all messed up. But, we don't need the console ISR, so we will just leave it disabled until the end, if we ever enable it at all.

Okay, that's it for this installment. Next I'll add code for a full VDP mode setup so things are exactly where we want them, and we can set up the VDP subroutines.

Matthew

Edited May 12, 2010 by matthew180

sometimes99er · May 12, 2010

Very nice.

R0LB   EQU  WRKSP+1           * R0 low byte reqd for VDP routines
Finally you will note an equate called R0LB. This creates a label that gives us a convenient way to access the low byte of R0 without using SWPB. It comes in handy when dealing with the VDP as you will see in the routines we set up.

I guess it must have been proven that this method either requires less space or is faster than using two SWPBs?

matthew180 · May 12, 2010

Both. The symbolic address only adds 8 clocks and 1 memory access to the MOVB instruction. The SWPB itself is 10 clocks and 3 memory accesses, times two because you would have them back to back around the MOVB. You also save 4 bytes by not having the two SWPBs, and although the symbolic address takes 2 bytes, you are still ahead by 2 bytes of memory, 12 clock cycles (20 - 8), and 5 memory accesses (6 - 1).
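For reference, these are the two sequences being compared, assuming the VDPWA and R0LB equates from the first post. The byte counts in the comments follow from the instruction formats: a register-only instruction is one word, and each symbolic (@) operand adds one word.

```asm
* Using SWPB to get at the low byte of R0: 4 words, 8 bytes
       SWPB R0                * 1 word
       MOVB R0,@VDPWA         * 2 words
       SWPB R0                * 1 word

* Using the R0LB equate: 3 words, 6 bytes
       MOVB @R0LB,@VDPWA      * 3 words
```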

Matthew

Opry99er · May 12, 2010

This is excellent man. Very clear and concise. I am very grateful you are in our little AtariAge User Group here. There is so much to learn, and so little time--- I wish I had started learning this when I was a young'n instead of writing my first assembly code at 27. But hey, now's as good a time as any.

You rock, man!!!

Opry99er · May 12, 2010


* VDP Memory Map
*
VDPRD  EQU  >8800             * VDP read data
VDPSTA EQU  >8802             * VDP status
VDPWD  EQU  >8C00             * VDP write data
VDPWA  EQU  >8C02             * VDP set read

I notice you use this for all your examples. This is a template I could use as well--- place this after my DEF and REF and before the START label. I like the readability of it and the orderly structure.

matthew180 · May 12, 2010

That's where I think they should be too. ;-) However, realize that you won't be referencing any of the E/A VDP routines. So you must make sure you do not have any of these:

       REF  VSBW,VMBW,VWTR,VSBR,VMBR

Ultimately you won't have any references at all, we will be writing our own code that does the VDP, keyboard, and joystick access.

What you need to understand is that the REF statement simply tells the assembler that these labels will be resolved (meaning the addresses for the labels will be determined) by the linker. So the assembler simply makes dummy addresses for them. Then when you "Load and Run" the program, the linker sees these references and goes and looks in the REF/DEF table to find the real addresses of the subroutines. This only works because the E/A cartridge builds up the REF/DEF table with lots of subroutines that it makes available, some of which are the VDP routines. The E/A cart copies the code from its GROM into the low memory starting at >2000, so all the E/A referenced routines eat up space our program could use, and they use an 8-bit workspace (which is bad, m'kay.)

There is nothing special about these routines. They simply hide some of the details of interfacing directly with the hardware, like the VDP, keyboard, joysticks, etc. While that can be good, the routines are general purpose and written to save ROM space, so rolling your own will be better, faster, smaller, and able to run without the E/A cartridge (which is another thing you need to understand, those routines are not available without the E/A cart, and maybe the XB cart.) Also, you learn more about your computer when you do it yourself.

Matthew

Opry99er · May 12, 2010

Wow--- it's coming a bit more clear now, thank you. I need to find the time today to sit at my console and type in a bunch of source just to get more familiar with how things work. I will be using this "roll your own" method from this point forward. As a matter of fact, this is how Lottrup does things, as the VDP routines are not in the minimem cart... You must know the addresses and go from there. Thanks again. You're a vault of knowledge.

matthew180 · May 12, 2010

Actually, the VDP routines that Lottrup references must be in the mini-memory's ROM. Look at this:

      BLWP @>6028

or

VMBW   EQU >6028
. . .
      BLWP @VMBW

These are the same thing. Both are branching to a subroutine via the BLWP instruction, and >6000 to >7FFF is the cartridge address space. So, >6028 must be mini-memory ROM or nothing would happen except a reboot or hard lock.

If Lottrup was rolling his own, you would see some code similar to what I posted where you are reading and writing directly to the VDP's memory-mapped addresses, which are set up by the equates in my example.

I think you might be confusing the difference between a subroutine and a memory-mapped device. Calling a subroutine is simply changing where the CPU gets its next instruction by changing the value of the Program Counter (one of the 3 real internal registers that the 9900 has; the PC is used to hold the address of the next instruction to execute.) These instructions are generally used with subroutines (think GOSUB in BASIC):

BLWP

BL
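A small sketch of both call styles. All of the names here (CALLS, DONE, DOUBLE, INCR3, SUBVEC, SUBWS) are made up for illustration, and SUBWS is assumed to be a free 32-byte block of scratchpad RAM just above a workspace at >8300.

```asm
SUBWS  EQU  >8320             * ASSUMED free workspace for the BLWP call

CALLS  BL   @DOUBLE           * Return address goes into R11, same workspace
       BLWP @SUBVEC           * New workspace; old WP, PC, ST saved in R13-R15
DONE   JMP  DONE              * Park here so we do not fall into the routines

DOUBLE A    R3,R3             * Double R3 in the caller's own workspace
       B    *R11              * Return through R11

INCR3  INC  @6(R13)           * Caller's R3 lives at old WP + 6 bytes
       RTWP                   * Restore WP, PC, ST and return

SUBVEC DATA SUBWS,INCR3       * BLWP vector: new workspace, then new PC
```

Note how the BLWP routine has to reach back through R13 to touch the caller's registers, which is part of the extra cost of the context switch.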

The instructions below also change the program counter (PC), but they do not store the current value of the program counter before branching (updating the program counter), so you can not "return" to where you were. These are pretty much like GOTO in BASIC, with the exception that only two of the instructions are "unconditional", but all cause a jump by changing the PC:

B

JEQ

JGT

JH

JHE

JL

JLE

JLT

JMP

JNC

JNE

JNO

JOC

JOP

The difference between the B (branch) and JMP (unconditional jump), or any of the conditional jump instructions, is how "far" you can jump. The jump instructions encode an offset from the current PC value, and only 8-bits are allocated in the machine instruction to store the offset. So you can only use the jump instructions to jump to an address within -128 to +127 "words" of the current PC value (the value in the PC is *always* even.) If you need to jump further, you have to use the B instruction which takes a complete 16-bit destination address as an operand.
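A sketch of the usual workaround when a conditional jump target is out of range: invert the condition and hop over a B. The labels NEAR, SKIP, and FAR are placeholders, and FAR is assumed to be defined elsewhere in the program, more than 128 words away.

```asm
NEAR   DEC  R2                * Short loop: the target is one word back
       JNE  NEAR              * Fits easily in the 8-bit word offset

* Target too far for a jump: test the opposite condition instead
       DEC  R3
       JEQ  SKIP              * R3 is zero, so skip the long branch
       B    @FAR              * Two words; FAR can be any 16-bit address
SKIP   CLR  R4                * Execution continues here once R3 reaches zero
```

The long form costs one extra word and an extra instruction, so it is only worth using when the assembler reports the jump as out of range.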

Thus, when you see something like BLWP @VSBW, you are simply jumping to a subroutine. It has nothing to do with the VDP in the 99/4A.

Okay, so memory-mapped devices... When you have a chip like the VDP, it needs to communicate with the main CPU somehow. The 9918A has an 8-bit data bus and 3 control pins that are used to communicate with the host CPU. To get data in and out, a CPU has an address bus and a data bus. When the CPU wants data from an address (which may or may not be RAM), it puts the address on the address bus, sends out a "hey I want to read data" signal (called "read enable"), waits a little bit, then expects to be able to read the data in on the data bus.

Same thing for writing data: the CPU puts the address of where it wants to write the data on the address bus, puts the data to write on the data bus, then sends out a "hey, store this data" signal (called "write enable"), waits a little bit for the data to be stored, then takes down the address and data.

*Note, I'm not going to cover the "I/O" commands, which on a CPU are usually done via the address and data bus, but work in a different way.

Computers have lots of different memory, there is RAM, ROM, GROM and GRAM (in our case with the 99/4A), memory-mapped devices, etc. When the CPU places an address on the address bus, that address may not necessarily be what we typically think of as "memory", i.e. RAM. When you have a device like the VDP that needs to send and receive data with the CPU, it is convenient to "map" that device into the CPU's address space somewhere such that when certain addresses are requested by the CPU, it will be talking to the VDP.

In the 99/4A, the 9918A is "mapped" to respond to 4 memory addresses. These addresses are hard wired via traces on the motherboard and can not be changed. If you read or write to the VDP addresses, you will always be sending or receiving data to/from the VDP. The sound chip works the same way (except it is write only), the GROMs are memory mapped as well (I'm sure there are other devices too.) The "mapping" is done with logic chips on the motherboard and is generally called "decoding", since parts of the address bus are used to detect when certain memory addresses are present on the address bus and the proper chips are activated.

The VDP in the 99/4A will respond to CPU addresses >8800, >8802, >8C00, >8C02 because it is hardwired that way. There are 4 addresses (also called "ports") for reading and writing because of the 9900's "read before write" nature. On other computer systems, like the MSX, they probably have the read and write mapped to a single port (address.) Also, the 9918A is only an 8-bit device, and the 9900 has a 16-bit data bus, so the VDP was hooked to the 9900's upper 8 pins of the data bus, which is why the MSB is the byte always sent to the VDP when using a register.

In our code we use the equates to set labels to the VDP's memory addresses, which helps us as humans to keep things straight and not make so many mistakes.

So all of this to make sure you understand the difference between:

BLWP @VSBW

and

MOVB R1,@VDPWD

The first is jumping to a subroutine that implies it will write some data to the VDP. The second is moving the MSB from R1 to the memory address >8C00, which happens to be the address the VDP will respond to when you want to send data to the VDP. The label is VDP-Write-Data.
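Side by side, writing a single space to screen position 0 looks something like this. The first version assumes the E/A utilities are loaded and that VSBW takes the VDP address in R0 and the byte in the high byte of R1. The second assumes the VDPWA, VDPWD, and R0LB equates from earlier and that interrupts are disabled.

```asm
* Via the utility subroutine
       REF  VSBW              * Normally placed at the top of the source
       CLR  R0                * VDP address 0
       LI   R1,>2000          * Byte to write goes in the high byte
       BLWP @VSBW             * Jump to the subroutine

* Directly through the memory-mapped ports
       CLR  R0                * VDP address 0
       MOVB @R0LB,@VDPWA      * Low byte of the address
       ORI  R0,>4000          * Mark the access as a write
       MOVB R0,@VDPWA         * High byte of the address
       LI   R1,>2000          * Byte to write goes in the high byte
       MOVB R1,@VDPWD         * Send the byte to the VDP
```

The direct version is longer on the page, but nothing in it leaves our own code or workspace.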

So, when we want code that will run from a cartridge, or that is just faster, we skip using the routines provided to us by the mini-memory, E/A, or XB carts and just write our own subroutines. We will model our routines after the ones in these carts, but ours will be in our own code, do not require references, and will be faster.

Clear as mud, right? :-)

Matthew

+adamantyr · May 12, 2010

Actually, Lottrup relies on the ROM routines as well, the architecture of the Mini-Memory system has them loaded into an area of the 4k ROM. What Matt's talking about is writing your own routines that do everything. The video routines are pretty simple to set up; KSCAN a bit more tricky if you want to not rely on the SCAN routine in the ROM; XMLLNK, DSRLNK and GPLLNK are all very nasty. (I've only done DSRLNK myself, using Travis Watford's version with some modifications.)

For my own programming, I still like using BLWP to access them because of the convenience of swapped registers. If I have something that is speed-critical then I just use the direct read/write ports in the base code. That way I get both the convenience and the speed when I need it.

Adamantyr

matthew180 · May 12, 2010

Okay, so this will be the big tile (character) definition post. This code will sit at the bottom of our program and provides the data necessary to define patterns for decent looking ASCII characters. I have no idea why TI gave the 99/4A the most awful character set in existence, but at least we can change it. :-)

I usually put these data statements at the bottom of my code because I don't want to always have to scroll past it when I'm working on the main code. Out of sight, out of mind at the bottom. I won't be including this in the example code going forward because it would just bloat the posts. Also note that this data does increase the size of our executable and uses CPU RAM when our program is loaded. That sucks. However, unless we read the data from disk, we don't have much choice. Also, when writing programs to run from a cartridge, this will chew up part of the 8K we have for our program (unless we do paging, which is a good thing for this kind of data.)

To do us any good, this data needs to be copied from the CPU RAM into the VDP's RAM so the VDP can use it as tile patterns. We will write a "load character set" subroutine after our custom VDP routines are in place.
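A rough sketch of what such a copy could look like, written inline rather than as the subroutine promised above. It assumes the pattern table sits at VDP address >0800 (an assumption only; this thread has not set the table locations yet), the equates from the earlier examples, and interrupts disabled. The label LDCH is a placeholder.

```asm
       LI   R0,>0800          * ASSUMED pattern table address in VDP RAM
       MOVB @R0LB,@VDPWA      * Send low byte of the VDP write address
       ORI  R0,>4000          * Mark the access as a write
       MOVB R0,@VDPWA         * Send high byte of the VDP write address

       LI   R1,SCS1           * Start of the pattern data in CPU RAM
       LI   R2,1024           * 128 characters x 8 bytes each
LDCH   MOVB *R1+,@VDPWD       * Copy one byte; the VDP address auto-increments
       DEC  R2
       JNE  LDCH
```

Because the data starts with character 0, the copy begins at the very start of the pattern table.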

Matthew

**
* Standard Character Set 1 - "Space" 8x8
*
SCS1
      DATA >0000,>0000,>0000,>0000       ;   0 >00
      DATA >7C82,>AA82,>BA44,>3800       ;   1 >01
      DATA >7C92,>92FE,>BA44,>3800       ;   2 >02
      DATA >6CFE,>FEFE,>7C38,>1000       ;   3 >03
      DATA >1038,>7CFE,>7C38,>1000       ;   4 >04
      DATA >3838,>D6FE,>D610,>3800       ;   5 >05
      DATA >1038,>7CFE,>FE92,>3800       ;   6 >06
      DATA >0038,>7C7C,>7C38,>0000       ;   7 >07
      DATA >FEC6,>8282,>82C6,>FE00       ;   8 >08
      DATA >0038,>4444,>4438,>0000       ;   9 >09
      DATA >FEC6,>BABA,>BAC6,>FE00       ;  10 >0A
      DATA >0E06,>0A7C,>C6C6,>7C00       ;  11 >0B
      DATA >7CC6,>C67C,>107C,>1000       ;  12 >0C
      DATA >0C0C,>0C0C,>0C38,>3000       ;  13 >0D
      DATA >3E36,>3636,>E6DC,>1800       ;  14 >0E
      DATA >0155,>2955,>2955,>01FF       ;  15 >0F
      DATA >0060,>787E,>7860,>0000       ;  16 >10
      DATA >000C,>3CFC,>3C0C,>0000       ;  17 >11
      DATA >187E,>1818,>1818,>7E18       ;  18 >12
      DATA >6666,>6666,>6600,>6600       ;  19 >13
      DATA >7ED6,>D6D6,>7616,>1600       ;  20 >14
      DATA >7EC0,>FCC6,>7E06,>FC00       ;  21 >15
      DATA >0000,>007E,>7E00,>0000       ;  22 >16
      DATA >1038,>7C00,>7C38,>10FE       ;  23 >17
      DATA >0010,>387C,>FE00,>0000       ;  24 >18
      DATA >0000,>FE7C,>3810,>0000       ;  25 >19
      DATA >1018,>1C1E,>1C18,>1000       ;  26 >1A
      DATA >0818,>3878,>3818,>0800       ;  27 >1B
      DATA >0000,>00C0,>C0C0,>7E00       ;  28 >1C
      DATA >0028,>6CEE,>6C28,>0000       ;  29 >1D
      DATA >0000,>1038,>7CFE,>0000       ;  30 >1E
      DATA >0000,>00FE,>7C38,>1000       ;  31 >1F
      DATA >0000,>0000,>0000,>0000       ;  32 >20
      DATA >3030,>3030,>3000,>3000       ;  33 >21 !
      DATA >6C6C,>2800,>0000,>0000       ;  34 >22 "
      DATA >50F8,>50F8,>5000,>0000       ;  35 >23 #
      DATA >7CD6,>D07C,>16D6,>7C00       ;  36 >24 $
      DATA >3256,>6C18,>366A,>4C00       ;  37 >25 %
      DATA >386C,>3864,>C6C6,>7E00       ;  38 >26 &
      DATA >1818,>3000,>0000,>0000       ;  39 >27 '
      DATA >1060,>C0C0,>C060,>1000       ;  40 >28 (
      DATA >100C,>0606,>060C,>1000       ;  41 >29 )
      DATA >0054,>38FE,>3854,>0000       ;  42 >2A *
      DATA >0018,>187E,>1818,>0000       ;  43 >2B +
      DATA >0000,>0000,>0018,>1830       ;  44 >2C ,
      DATA >0000,>007C,>0000,>0000       ;  45 >2D -
      DATA >0000,>0000,>0018,>1800       ;  46 >2E .
      DATA >0006,>0C18,>3060,>0000       ;  47 >2F /
      DATA >7CC6,>C6D6,>C6C6,>7C00       ;  48 >30 0
      DATA >1838,>1818,>1818,>7E00       ;  49 >31 1
      DATA >7CC6,>061C,>70C0,>FE00       ;  50 >32 2
      DATA >7CC6,>063C,>06C6,>7C00       ;  51 >33 3
      DATA >0E1E,>3666,>C6FE,>0600       ;  52 >34 4
      DATA >FEC0,>C0FC,>0606,>FC00       ;  53 >35 5
      DATA >7CC6,>C0FC,>C6C6,>7C00       ;  54 >36 6
      DATA >FE0C,>187C,>3030,>3000       ;  55 >37 7
      DATA >7CC6,>C67C,>C6C6,>7C00       ;  56 >38 8
      DATA >7CC6,>C67E,>06C6,>7C00       ;  57 >39 9
      DATA >0018,>1800,>1818,>0000       ;  58 >3A :
      DATA >0018,>1800,>1818,>1000       ;  59 >3B ;
      DATA >0C18,>3060,>3018,>0C00       ;  60 >3C <
      DATA >0000,>7C00,>7C00,>0000       ;  61 >3D =
      DATA >6030,>180C,>1830,>6000       ;  62 >3E >
      DATA >3C46,>060C,>1800,>1800       ;  63 >3F ?
      DATA >3C46,>D6D6,>DEC0,>7C00       ;  64 >40 @
      DATA >386C,>C6C6,>FEC6,>C600       ;  65 >41 A
      DATA >FCC6,>C6FC,>C6C6,>FC00       ;  66 >42 B
      DATA >7CC6,>C0C0,>C0C6,>7C00       ;  67 >43 C
      DATA >FCC6,>C6C6,>C6C6,>FC00       ;  68 >44 D
      DATA >FEC0,>C0F8,>C0C0,>FE00       ;  69 >45 E
      DATA >FEC0,>C0F8,>C0C0,>C000       ;  70 >46 F
      DATA >7CC6,>C0DE,>C6C6,>7C00       ;  71 >47 G
      DATA >C6C6,>C6FE,>C6C6,>C600       ;  72 >48 H
      DATA >3C18,>1818,>1818,>3C00       ;  73 >49 I
      DATA >1E06,>0606,>06C6,>7C00       ;  74 >4A J
      DATA >C6CC,>D8F0,>D8CC,>C600       ;  75 >4B K
      DATA >6060,>6060,>6060,>7E00       ;  76 >4C L
      DATA >C6EE,>FED6,>D6C6,>C600       ;  77 >4D M
      DATA >C6E6,>F6DE,>CEC6,>C600       ;  78 >4E N
      DATA >7CC6,>C6C6,>C6C6,>7C00       ;  79 >4F O
      DATA >FCC6,>C6C6,>FCC0,>C000       ;  80 >50 P
      DATA >7CC6,>C6C6,>D6CC,>7606       ;  81 >51 Q
      DATA >FCC6,>C6C6,>FCC6,>C600       ;  82 >52 R
      DATA >7CC6,>C07C,>06C6,>7C00       ;  83 >53 S
      DATA >7E18,>1818,>1818,>1800       ;  84 >54 T
      DATA >C6C6,>C6C6,>C6C6,>7C00       ;  85 >55 U
      DATA >C6C6,>C6C6,>C66C,>3800       ;  86 >56 V
      DATA >C6C6,>D6D6,>D6EE,>C600       ;  87 >57 W
      DATA >C6C6,>6C38,>6CC6,>C600       ;  88 >58 X
      DATA >6666,>663C,>1818,>1800       ;  89 >59 Y
      DATA >FE0C,>187C,>3060,>FE00       ;  90 >5A Z
      DATA >1E18,>1818,>1818,>181E       ;  91 >5B [
      DATA >0060,>3018,>0C06,>0000       ;  92 >5C \
      DATA >7818,>1818,>1818,>1878       ;  93 >5D ]
      DATA >1038,>6CC6,>0000,>0000       ;  94 >5E ^
      DATA >0000,>0000,>0000,>00FF       ;  95 >5F _
      DATA >1818,>0C00,>0000,>0000       ;  96 >60 `
      DATA >0000,>7C06,>7EC6,>7E00       ;  97 >61 a
      DATA >C0C0,>FCC6,>C6C6,>FC00       ;  98 >62 b
      DATA >0000,>7CC6,>C0C6,>7C00       ;  99 >63 c
      DATA >0606,>7EC6,>C6C6,>7E00       ; 100 >64 d
      DATA >0000,>7CC6,>FEC0,>7C00       ; 101 >65 e
      DATA >3C62,>60FC,>6060,>6000       ; 102 >66 f
      DATA >0000,>7CC6,>C67E,>067C       ; 103 >67 g
      DATA >C0C0,>FCC6,>C6C6,>C600       ; 104 >68 h
      DATA >1800,>3818,>1818,>1800       ; 105 >69 i
      DATA >0C00,>1C0C,>0C0C,>8C78       ; 106 >6A j
      DATA >C0C0,>C6DC,>F0DC,>C600       ; 107 >6B k
      DATA >3818,>1818,>1818,>1800       ; 108 >6C l
      DATA >0000,>6CFE,>D6D6,>C600       ; 109 >6D m
      DATA >0000,>7CC6,>C6C6,>C600       ; 110 >6E n
      DATA >0000,>7CC6,>C6C6,>7C00       ; 111 >6F o
      DATA >0000,>7CC6,>C6C6,>FCC0       ; 112 >70 p
      DATA >0000,>7CC6,>C6C6,>7E06       ; 113 >71 q
      DATA >0000,>7CC6,>C6C0,>C000       ; 114 >72 r
      DATA >0000,>7EC0,>7C06,>FC00       ; 115 >73 s
      DATA >3030,>7C30,>3030,>1C00       ; 116 >74 t
      DATA >0000,>C6C6,>C6C6,>7C00       ; 117 >75 u
      DATA >0000,>C6C6,>C66C,>3800       ; 118 >76 v
      DATA >0000,>C6D6,>D6EE,>4400       ; 119 >77 w
      DATA >0000,>C66C,>386C,>C600       ; 120 >78 x
      DATA >0000,>C6C6,>C67E,>067C       ; 121 >79 y
      DATA >0000,>FE0C,>3860,>FE00       ; 122 >7A z
      DATA >1C30,>3060,>3030,>1C00       ; 123 >7B {
      DATA >1818,>1818,>1818,>1818       ; 124 >7C |
      DATA >7018,>180C,>1818,>7000       ; 125 >7D }
      DATA >7099,>0E00,>0000,>0000       ; 126 >7E ~
      DATA >0000,>1028,>44FE,>0000       ; 127 >7F
SCS1E

+Vorticon · May 12, 2010

Actually, Lottrup relies on the ROM routines as well, the architecture of the Mini-Memory system has them loaded into an area of the 4k ROM. What Matt's talking about is writing your own routines that do everything. The video routines are pretty simple to set up; KSCAN a bit more tricky if you want to not rely on the SCAN routine in the ROM; XMLLNK, DSRLNK and GPLLNK are all very nasty. (I've only done DSRLNK myself, using Travis Watford's version with some modifications.)

For my own programming, I still like using BLWP to access them because of the convenience of swapped registers. If I have something that is speed-critical then I just use the direct read/write ports in the base code. That way I get both the convenience and the speed when I need it.

Adamantyr

I tend to agree. The video routines will usually provide the most gain in performance and need very little work to set up. I am not sure it is worthwhile to spend a huge amount of time on the other ones except when they impact performance significantly, which is pretty rare for most programs. I have to admit though that the floating point routines included in the E/A cartridge are horribly slow, and I would have loved to replace them when I wrote Skychart (it takes about 15 minutes to calculate the sky configuration!), but I doubt I had (or even currently have) the skills to do it.

Opry99er · May 12, 2010

I've re-worked the CRAPO example using this method... Still trying to debug it. I will post it when it's complete. Thanks Matthew, Walid!!!

+acadiel · May 13, 2010

I'll thread jack for a second.

I want to do something like this, but according to my Compute! assembly book, you can't redefine equates:

***************
* CF2K Module *
***************

CF2K   BL @GOGO * Set up Char Sets

* >6000 (highest bank)
BANKTO EQU >6000 * BANK TO SELECT
BYTCNT EQU >076A * BYTES DIV 4
ADRFRM EQU >6258 * ADDRESS TO COPY FROM
ADRTO  EQU >A000 * ADDRESS TO COPY TO
      BL @COPYME * COPY IT!

* >6002 (2nd highest bank)
BANKTO EQU >6002 * BANK TO SELECT
BYTCNT EQU >076A * BYTES DIV 4
ADRFRM EQU >6258 * ADDRESS TO COPY FROM
ADRTO  EQU >BDA8 * ADDRESS TO COPY TO
      BL @COPYME * COPY IT!

* >6004 (3rd highest bank)
BANKTO EQU >6004 * BANK TO SELECT
BYTCNT EQU >076A * BYTES DIV 4
ADRFRM EQU >6258 * ADDRESS TO COPY FROM
ADRTO  EQU >DB50 * ADDRESS TO COPY TO
      BL @COPYME * COPY IT!

* >6006 (4th highest bank)
BANKTO EQU >6006 * BANK TO SELECT
BYTCNT EQU >04C0 * BYTES DIV 4
ADRFRM EQU >6D00 * ADDRESS TO COPY FROM
ADRTO  EQU >2000 * ADDRESS TO COPY TO
      BL @COPYME * COPY IT!

... later down in the code ...

****************
* Copy Routine *
****************
COPYME
    MOV R11,R8     * Save our Return Spot
    MOV R0,@BANKTO * Do the bank switch
    LI R4,BYTCNT 
    LI R9,ADRFRM
    LI R10,ADRTO
LOOPIT
    MOV *R9+,*R10+
    MOV *R9+,*R10+
    DEC R4       
    JNE LOOPIT
    B *R8         * We're done…

My goal is to save the code from having to be rewritten three times. Thoughts?

+adamantyr · May 13, 2010

I'll thread jack for a second.

I want to do something like this, but according to my Compute! assembly book, you can't redefine equates:

My goal is to save the code from having to be rewritten three times. Thoughts?

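Equates aren't opcodes, they're assembler directives. Essentially, they're symbolic replacements for constant values, fixed at assembly time, which is why the same label can't be given a second value partway through the source. They make it easy on you, the programmer, to specify a label instead of a raw number, so the number can be changed later in one place.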

You can avoid code replication by storing your addresses in DATA statements, and then using registers pointing to the start of each array to populate your loop registers with an indirect move operation. It means burning up CPU memory, so you'll want to consider if the trade-off is worth it. (It usually is, unless it's a very small amount of data, or a very small amount of loop code.)
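A minimal sketch of that idea applied to the bank-copy code above. The table layout, the labels BANKS, BANKSE, NXTBNK, and DONE, and the choice of R3 and R5 are all assumptions for illustration. The BL @GOGO call is left out, and R0 is written to the bank address just as in the original.

```asm
CF2K   LI   R5,BANKS          * R5 walks the table one entry at a time
NXTBNK BL   @COPYME           * Copy the bank described by the entry at R5
       CI   R5,BANKSE         * Past the last entry?
       JNE  NXTBNK            * No, do the next bank
DONE   JMP  DONE              * Placeholder for the rest of the program

COPYME MOV  *R5+,R3           * Bank-select address
       MOV  *R5+,R4           * Bytes div 4
       MOV  *R5+,R9           * Address to copy from
       MOV  *R5+,R10          * Address to copy to
       MOV  R0,*R3            * Do the bank switch
LOOPIT MOV  *R9+,*R10+        * Two words (4 bytes) per pass
       MOV  *R9+,*R10+
       DEC  R4
       JNE  LOOPIT
       B    *R11              * COPYME calls nothing, so R11 is still valid

* One entry per bank: select address, bytes div 4, from, to
BANKS  DATA >6000,>076A,>6258,>A000
       DATA >6002,>076A,>6258,>BDA8
       DATA >6004,>076A,>6258,>DB50
       DATA >6006,>04C0,>6D00,>2000
BANKSE
```

The table costs 32 bytes, and adding another bank is just one more DATA line.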

Adamantyr

+acadiel · May 13, 2010

Thanks, Adam - see my other thread, where I posted the source. I didn't mean to hijack your thread long term

Opry99er · May 14, 2010

So, assuming I want a constant motion for the main character in Beryl Reichardt-- how would I achieve this within the framework of a game? (perhaps "constant motion" is the wrong terminology) what I mean is--- look at Final Fantasy Mystic Quest or Zelda... The character, even while standing still, maintains the "walking" motion. It's essentially a CALL PATTERN on a SPRITE which continuously executes throughout the program... In XB it's easy to do, but not within the framework of a game, since it requires too much "attention" by the processor--- essentially, it's the only thing that can happen WHILE it's happening. You get my meaning. Anyway, I noticed that even in some versions of Pac Man, the character is opening and closing his mouth at the same rate, whether the play piece is in motion or not. Sort of similar. Anyway, I really like the battle sequences in Mystic Quest and I'm seeking something similar for LoBR... Which I may just do in 100% assembly as well. Any help you may have would be greatly appreciated.

Opry99er · May 14, 2010

Here's a quick cheezy program to explain what I'm talking about. Please don't criticize my code, I realize it's sloppy and spaghetti, which is why it's a mockup at 8:00 AM and not a released game. It's just to get a point across. =)

100 CALL CLEAR :: CALL SCREEN(5) :: CALL MAGNIFY(3)
110 CALL CHAR(96,"0F1F1C1C1E1F0F03030307070202020384C4241424C4841FC43C040480804070")
120 CALL CHAR(100,"0F1F1C1C1E1F0F03030307070303030380C0201020C18204A8502840000080E0")
130 CALL CHAR(104,"0F1F1C1C1E1F0F03030307070305080E80C0201020C080000040FF4080008070")
140 CALL SPRITE(#1,96,2,100,20,0,10)
150 FOR I=96 TO 104 STEP 4 :: CALL PATTERN(#1,I) :: FOR DELAY=1 TO 45 :: NEXT DELAY :: NEXT I
160 X=1
170 FOR I=100 TO 96 STEP-4 :: CALL PATTERN(#1,I) :: FOR DELAY=1 TO 45 :: NEXT DELAY :: NEXT I
180 FOR I=100 TO 104 STEP 4 :: CALL PATTERN(#1,I) :: FOR DELAY=1 TO 45 :: NEXT DELAY :: NEXT I
190 X=X+1
200 IF X<5 THEN GOTO 170
210 IF X>10 THEN CALL MOTION(#1,0,10) :: GOTO 170
220 CALL MOTION(#1,0,0) :: GOTO 170

WALK.zip

+retroclouds · May 14, 2010

If I understand correctly, what you want is like running multiple threads at a time.

You would have 1 thread that is handling the motion of the hero and another could be used to read the keyboard or handle the sprite's position.

You might want to check how I handle this kind of stuff in the SPECTRA library.

Check the "TIMERS" section starting on page 64. The PDF is here.

matthew180 · May 14, 2010

Hey Owen, sorry for the delay on my own thread. I have my head in another TI related project right now and it is pretty consuming. At any rate, what you need is called a "game loop" and is typically done with a concept known as a Finite State Machine (FSM). Don't let the name scare you, they are very handy and the key to a lot of programs. It was this concept that I never learned early on and why my games never worked like the arcade coin-ops.

I don't have time to go into it right now, but I'll try to get more out this weekend. Hold on, I think I have a very simple game loop written in XB...

Okay, here is a brief description I wrote for some kids I was teaching to program a while ago:

The Game Loop
-------------

The key component of any game is the game loop. The game loop allows the game to run smoothly regardless of a user's input or lack thereof.

Typically software applications respond to user input and do nothing without it. For example, a word processor formats words and text as a user types. If the user doesn't type anything, the word processor does nothing.

Games, on the other hand, must continue to operate regardless of a user's input so objects like bullets, bad guys, etc. can continue to move even if the player's character is idle. The game loop allows this. A highly simplified game loop, in pseudo code, might look something like this:

while ( user doesn't exit )
 check for user input
 run AI
 move user
 move enemies
 resolve collisions
 draw graphics
 play sounds
end while
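For comparison, here is a rough sketch of how the same loop could be laid out in AL. Every routine name (INPUT, RUNAI, MOVUSR, MOVENM, COLLID, DRAW, SOUND) and the QUIT flag at >8320 are hypothetical placeholders; the routines are empty stubs that only return, so this shows the shape of the loop and nothing more.

```asm
* Game loop skeleton. Every routine name below is a hypothetical stub.
       DEF  START

WRKSP  EQU  >8300              * workspace in scratch pad RAM
QUIT   EQU  >8320              * hypothetical flag word, non-zero = exit

START  LIMI 0                  * keep interrupts off while setting up
       LWPI WRKSP
       CLR  @QUIT

GLOOP  BL   @INPUT             * check for user input
       BL   @RUNAI             * run AI
       BL   @MOVUSR            * move user
       BL   @MOVENM            * move enemies
       BL   @COLLID            * resolve collisions
       BL   @DRAW              * draw graphics
       BL   @SOUND             * play sounds
       MOV  @QUIT,R0           * MOV compares the value moved to zero
       JEQ  GLOOP              * QUIT still zero, so go around again

       LIMI 2                  * user asked to exit: idle here
DONE   JMP  DONE

* Stubs: each one just returns to the caller through R11
INPUT  B    *R11
RUNAI  B    *R11
MOVUSR B    *R11
MOVENM B    *R11
COLLID B    *R11
DRAW   B    *R11
SOUND  B    *R11

       END
```

Each pass through GLOOP calls every routine once whether or not a key was pressed, which is the whole point of the loop.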

If you look at the FlyGuy II code, I use an FSM to implement the game loop. I will also be covering the game loop in this thread. Here is an example in XB that might give you the basic idea:

5   REM >>> GAMELOOP <<<
10  CALL CLEAR
20  X=16 :: Y=12
30  DX=0 :: DY=0
40  C=64 :: D=32 :: E=42
50  CALL HCHAR(Y,X,C)

60  REM >>> GAME LOOP
70  OX=X :: OY=Y

80  REM >>> CHECK USER INPUT
90    CALL KEY(0,K,S) :: IF S=0 THEN 160
100   IF K=81 THEN 320
110   IF K=87 THEN DX=0 :: DY=-1 :: GOTO 160
120   IF K=83 THEN DX=0 :: DY=1 :: GOTO 160
130   IF K=65 THEN DX=-1 :: DY=0 :: GOTO 160
140   IF K=68 THEN DX=1 :: DY=0

150 REM >>> MOVE USER
160   X=X+DX :: Y=Y+DY

170 REM >>> AI

180 REM >>> MOVE ENEMIES

190 REM >>> RESOLVE COLLISIONS
200   IF X < 1 OR X > 32 OR Y < 1 OR Y > 24 THEN 280

210 REM >>> DRAW GRAPHICS
220   CALL HCHAR(OY,OX,D)
230   CALL HCHAR(Y,X,C)

240 REM >>> PLAY SOUNDS

250 REM >>> END GAME LOOP

260 GOTO 70

270 REM >>> BLOW UP
280   CALL HCHAR(OY,OX,E)
290   DISPLAY AT(12,14):"BOOM"
300   FOR I=1 TO 500 :: NEXT I
310   GOTO 10

320 END

This is very simple, but demonstrates the concept that even if the user is not providing input, things still happen. More to come...

Matthew

Opry99er · May 16, 2010

Thanks!! I'm pretty clear on game loops--- but what I was wondering about is how to "multitask" so to speak. For instance, your XB soundplayer is excellent because it plays in the background... I want my character to "walk in the background" if that makes any sense. Try out the demo I posted in my last entry to this thread. I think I am just not familiar enough with assembly to quite understand how it all works in that form. Thanks for this thread man. it's definitely helpful!!!

Opry99er · May 16, 2010

Filip--- I'm reading through all 70-something pages of this SPECTRA stuff. It's pretty awesome man!!! I'll be using this for sure.

matthew180 · May 17, 2010

Thanks!! I'm pretty clear on game loops--- but what I was wondering about is how to "multitask" so to speak.

Well, that "multitasking" comes from the game loop... All computers without more than 1 physical CPU do the same kind of thing, and even multi-CPU machines are chopping up the CPU time. When was the last time an OS ran just one or two processes? (That's a rhetorical question.)

Look at my example again:

while ( user doesn't exit )
 check for user input
 run AI
 move user
 move enemies
 resolve collisions
 draw graphics
 play sounds
end while

The "move user" part does not have to be 100% dependent on "user input". In any game where you are going to have things like animation, some sort of quasi real-time AI, motion, or things happening in the "background", you need to slice up the tasks and make them all based on time or some sort of completion state.

This is harder to do in slow languages like interpreted BASIC on older slow (less than 100MHz) computers, but you can still fake it. In XB the effect won't be very impressive, however in AL it works very nicely. Look at FlyGuy II. The timer is ticking down and the spiders are moving no matter what the user is doing with FlyGuy.

Below is an example in XB. It is painfully slow, but in AL this would run very fast (if I get some time I'll rewrite it in AL to demonstrate.) This is also where a language like BASIC really sucks since it does not offer a "record" or "structure" data type which lets you keep related information grouped, thus we have to use arrays. Also, BASIC does not offer us any sort of equates or constants (like #define in C for example), so the indexes of the various elements in the array are done with variables that have some relatively appropriate name. In the code below, for example, XI means "x index", FCI means "first character index", etc. There is also a separate array for the players since the character value refers to a sprite and the x,y locations are pixels. Also, if we were doing collision detection, we would only care about comparing the player with the objects and not usually all the objects with one another.
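To give a rough idea of what the "record" looks like in AL, here is a sketch that uses equates for the field offsets and indexed addressing to reach the fields. The offset names are borrowed from the XB index variables; OBJ0, ANIM and ANIMX are made-up labels, and the register usage (R1 for the record address, R2 for the elapsed ticks) is just an assumption for this example.

```asm
* Field offsets in bytes (each field is one 16-bit word)
XI     EQU  0                  * x
YI     EQU  2                  * y
CI     EQU  4                  * current character
FCI    EQU  6                  * first character in sequence
LCI    EQU  8                  * last character in sequence
TPCI   EQU  10                 * ticks per character
TICI   EQU  12                 * ticks elapsed on current character

* One record, same values as the first object in the XB listing
OBJ0   DATA 10,10,112,112,127,1,0

* ANIM: advance one record's animation.
* In:  R1 = address of the record, R2 = ticks elapsed
ANIM   A    R2,@TICI(R1)       * add the elapsed time
       C    @TICI(R1),@TPCI(R1)
       JLT  ANIMX              * not time for the next character yet
       CLR  @TICI(R1)
       INC  @CI(R1)            * next character
       C    @CI(R1),@LCI(R1)
       JLT  ANIMX              * still inside the sequence
       JEQ  ANIMX              * exactly on the last character
       MOV  @FCI(R1),@CI(R1)   * past the last one, wrap to the first
ANIMX  B    *R11
```

A caller would do something like LI R1,OBJ0 followed by BL @ANIM with the elapsed ticks in R2. The same routine works for any record because only R1 changes; note that R0 can not be used as the index register.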

Another thing to note is that the TICK variable would usually be some sort of real-time counter that is implemented in hardware and made available via the OS, like the number of milliseconds since midnight on January 1, 1970, which is common on modern OSes. On the 99/4A we can use the VDP interrupt in AL, which gives us a 1/60th of a second tick that can be used to time animations, movement, game timers, etc. Time in a game is tricky though. Notice in the code below that in the DRAW subroutine we decouple the actual time and make it relative to the last time any drawing was done.

3100 IF TICK>TLAST THEN TDIF=TICK-TLAST ELSE TDIF=TICK
3110 TLAST=TICK

In the rest of the DRAW code we use TDIF as an amount of elapsed time. What this does is to speed up or slow down the animation based on how long it has been since the last time the DRAW function was called. This is a *very* important aspect to understand since it makes our game run at the same speed no matter how fast the computer is. Remember, TICK *must* be based on "real clock time", and is not a function of the CPU speed, otherwise this method does not work. On the 99/4A the VDP interrupt is going to always be 1/60th (or 1/50th for PAL) of a second no matter what is going on with the rest of the computer. On modern computers there are timers and such available (as I mentioned above) that offer something similar.

So, notice what happens if we have a really fast CPU and we can call DRAW 1000 times in a single TICK (this is not unrealistic for a modern PC). The difference between the last time DRAW was called and the current time (TICK) will be zero, so our animations simply do not update. So the CPU can run really fast and the game will still run at the speed it was designed to run at. If the CPU gets busy and there are some delays in calling DRAW, the time difference is large and our animations "run really fast" to catch up. This is not really desirable, but on a multi-tasking OS it is not really avoidable. On the 99/4A you really won't have that problem unless your code starts to take too long to execute between calls to your various routines.

One really nice thing that this does do for us is that our game loop can run at the full speed of the CPU and the drawing will still be at the desired frame rate. So, if we have stuff we can do all the time that is not frame rate related, then it will get the full speed of the CPU. A smart modification to the DRAW function would be to simply return if the difference was zero. We don't have that problem in XB, but in AL you will.
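Here is a rough AL sketch of the "return if the difference is zero" idea. It assumes interrupts are enabled somewhere in the loop (LIMI 2) so the console interrupt routine keeps running, and that the routine keeps its frame counter byte at >8379; treat that address as an assumption to check against the E/A manual. TLAST at >8320 and the labels DRAW and DRAWX are hypothetical.

```asm
TICK   EQU  >8379              * assumed: frame counter byte kept by the console ISR
TLAST  EQU  >8320              * hypothetical: TICK value at the last draw

* DRAW: returns at once when no tick has passed since the last call.
DRAW   MOVB @TICK,R1           * current tick into the high byte of R1
       MOV  R1,R2
       SB   @TLAST,R2          * high byte of R2 = TICK - TLAST (wraps mod 256)
       JEQ  DRAWX              * zero ticks elapsed: nothing to update
       MOVB R1,@TLAST          * remember this tick for next time
       SRL  R2,8               * R2 = elapsed ticks as a plain count
* ... update animations here using the count in R2 ...
DRAWX  B    *R11
```

Because the subtraction is done on a single byte, the counter rolling over from 255 to 0 still gives the right elapsed count, as long as DRAW gets called at least once every 256 ticks.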

I hope this shines a little light on how to get lots of stuff happening at the same time. Basically you have a lot of housekeeping to keep track of for every single object that moves or is animated (or both.) You have to know where each item is, how fast it is moving, how many frames of animation it has, what the current frame is, how long to wait between frames, should the animation loop or play a certain number of times, does a certain frame trigger something else to happen, can the object be interrupted during an animation sequence, etc.

In the example below, when you move left or right (S or D), the character moves a certain number of pixels and plays a designated walk sequence. You can not move again until the "take a step" sequence is done. Usually the player sequence would be fast and small enough that the player would not want to change direction (or even notice the change was not instant) before the sequence is done.

Matthew

XI:   X
YI:   Y
CI:   CURRENT CHARACTER
FCI:  FIRST CHARACTER IN SEQUENCE
LCI:  LAST CHARACTER IN SEQUENCE
TPCI: TICKS PER CHARACTER
TICI: TICKS ELAPSED ON CURRENT CHARACTER
XDI:  X DELTA PER TICK
YDI:  Y DELTA PER TICK


100 CALL CLEAR::CALL SCREEN(5)::CALL MAGNIFY(3)
110 OBJTOT=2::XI=0::YI=1::CI=2::FCI=3::LCI=4::TPCI=5::TICI=6
120 XDI=7::YDI=8
130 DIM OBJ(2,6)
140 DIM PLY(1,8)

150 GOSUB 5000
160 GOSUB 5500

200 QUIT=0
210 TICK=0
220 TLAST=0

500 IF QUIT=1 THEN 1000
510 GOSUB 4000
520 GOSUB 3000
590 TICK=TICK+1
600 GOTO 500

1000 END

3000 REM DRAW
3100 IF TICK>TLAST THEN TDIF=TICK-TLAST ELSE TDIF=TICK
3110 TLAST=TICK
3120 FOR I=0 TO OBJTOT
3130 CALL HCHAR(OBJ(I,YI),OBJ(I,XI),OBJ(I,CI))
3140 OBJ(I,TICI)=OBJ(I,TICI)+TDIF
3150 IF OBJ(I,TICI)<OBJ(I,TPCI) THEN 3200
3160 OBJ(I,TICI)=0::OBJ(I,CI)=OBJ(I,CI)+1
3170 IF OBJ(I,CI)>OBJ(I,LCI) THEN OBJ(I,CI)=OBJ(I,FCI)
3200 NEXT I

3300 CALL SPRITE(#1,PLY(0,CI),2,PLY(0,YI),PLY(0,XI))
3310 IF PLY(0,XDI)=0 THEN 3490
3320 PLY(0,XI)=PLY(0,XI)+PLY(0,XDI)
3330 PLY(0,YI)=PLY(0,YI)+PLY(0,YDI)
3340 PLY(0,TICI)=PLY(0,TICI)+TDIF
3350 IF PLY(0,TICI)<PLY(0,TPCI) THEN 3490
3360 PLY(0,TICI)=0::PLY(0,CI)=PLY(0,CI)+4
3370 IF PLY(0,CI)>PLY(0,LCI) THEN PLY(0,CI)=PLY(0,FCI)::PLY(0,XDI)=0::PLY(0,YDI)=0
3490 RETURN

4000 REM USER INPUT
4100 CALL KEY(0,K,S)::IF S=0 THEN 4490
4110 IF K=81 THEN QUIT=1::GOTO 4490
4120 IF K=83 AND PLY(0,XDI)=0 THEN PLY(0,XDI)=-2::GOTO 4490
4130 IF K=68 AND PLY(0,XDI)=0 THEN PLY(0,XDI)=2
4490 RETURN

5000 REM CHARACTER DEFS
5100 CALL CHAR(112,"F000000000000000")
5110 CALL CHAR(113,"F080808000000000")
5120 CALL CHAR(114,"F080808080808080")
5130 CALL CHAR(115,"F0808080808080F0")
5140 CALL CHAR(116,"F0808080808080FF")
5150 CALL CHAR(117,"F0808080818181FF")
5160 CALL CHAR(118,"F1818181818181FF")
5170 CALL CHAR(119,"FF818181818181FF")
5180 CALL CHAR(120,"0F818181818181FF")
5190 CALL CHAR(121,"0F010101818181FF")
5200 CALL CHAR(122,"0F0101010101017F")
5210 CALL CHAR(123,"0F0101010101010F")
5220 CALL CHAR(124,"0F01010101010100")
5230 CALL CHAR(125,"0F01010100000000")
5240 CALL CHAR(126,"0E00000000000000")
5250 CALL CHAR(127,"0000000000000000")
5300 REM SPRITE DEFS
5310 CALL CHAR(96,"0F1F1C1C1E1F0F03030307070202020384C4241424C4841FC43C040480804070")
5320 CALL CHAR(100,"0F1F1C1C1E1F0F03030307070303030380C0201020C18204A8502840000080E0")
5330 CALL CHAR(104,"0F1F1C1C1E1F0F03030307070305080E80C0201020C080000040FF4080008070")
5490 RETURN

5500 REM SET UP OBJECTS AND PLAYER
5510 OBJ(0,XI)=10::OBJ(0,YI)=10::OBJ(0,CI)=112::OBJ(0,FCI)=112::OBJ(0,LCI)=127::OBJ(0,TPCI)=1::OBJ(0,TICI)=0
5520 OBJ(1,XI)=11::OBJ(1,YI)=10::OBJ(1,CI)=112::OBJ(1,FCI)=112::OBJ(1,LCI)=127::OBJ(1,TPCI)=2::OBJ(1,TICI)=0
5530 OBJ(2,XI)=12::OBJ(2,YI)=10::OBJ(2,CI)=112::OBJ(2,FCI)=112::OBJ(2,LCI)=127::OBJ(2,TPCI)=3::OBJ(2,TICI)=0

5540 PLY(0,XI)=10::PLY(0,YI)=100::PLY(0,CI)=96::PLY(0,FCI)=96::PLY(0,LCI)=104::PLY(0,TPCI)=1::PLY(0,TICI)=0
5550 PLY(0,XDI)=0::PLY(0,YDI)=0

5990 RETURN

Edited May 17, 2010 by matthew180

Opry99er · May 17, 2010

Fantastic Matthew!! A very clear explanation and a good example as well. In XB it's pretty lame, but I can certainly see the translation to Assembly and how strong it will be. Thanks again!

matthew180 · May 22, 2010

I guess I better get another installment posted so Owen does not get bored. ;-) Although this part is not as exciting as moving sprites or making games, it is essential to programming in assembly. So, on with the show.

Registers and the "Workspace Pointer"

Internal to a CPU are several "registers" that help make memory (used loosely here to indicate all types, i.e. RAM, ROM, VRAM, GRAM, GROM, etc.) access possible, which is where your programs and data are stored. As a programmer you need to understand these registers and how to use them. The CPU registers are typically measured in bits and are usually the same size as the CPU’s data or address bus. Every CPU is going to have at least two registers:

1. The Program Counter (PC)

2. The Status Register (ST)

The Program Counter is a special CPU controlled register that always points to the address in memory where the next instruction will be read. Since this register is holding a memory address, it is usually the same size as the CPU’s address bus. However, on some CPUs like the Z80, even though it is an 8-bit CPU (based on its data bus), the program counter is 16-bits since the Z80 can address 65,536 (64K) bytes of memory. The Status Register is another CPU controlled register that will be changed based on certain events that happen in the CPU. The status register’s size will depend on the CPU design and is "bit mapped", meaning the specific bit positions in the register mean something special depending on if the bit is set to a 1 or 0. For example, after a subtraction operation, there is usually a "carry bit" in the status register that will be 1 or 0 depending on if the subtraction caused a carry. The status register bits are usually referred to as "flags", and if a bit is 1 it is called "set", and if a bit is 0 it is called "reset". Every instruction the CPU can execute *may* affect the status register flags in certain ways, and your program can make decisions based on these flags using the various conditional jump instructions.
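As a small illustration of flags driving decisions, here is a sketch with two common patterns: a countdown loop and a compare followed by a conditional jump. The labels (LOOP, BELOW, DONE) and the values are arbitrary.

```asm
       LI   R1,3               * loop three times
LOOP   DEC  R1                 * DEC updates the flags from the new value of R1
       JNE  LOOP               * EQ flag reset (R1 not zero): go again

       LI   R2,100
       CI   R2,200             * compare R2 with 200, only the flags change
       JL   BELOW              * taken: 100 is logically lower than 200
       JMP  DONE
BELOW  CLR  R3                 * reached because of the flags set by CI
DONE   JMP  DONE
```

Neither JNE nor JL looks at a register. They only test the status flags left behind by the instruction before them.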

A CPU will generally have other internal registers that can be used by the programmer for various tasks. Some of the registers might have specific uses, for example an "accumulator" register used during mathematical operations, or an index register that can be used to help access bytes in memory based on an offset. All CPUs have different registers and you will need to know what those registers are and if any of them have specific or special purposes. In the TMS9900 CPU there are 16 "general purpose" registers available to the programmer, meaning none of them have any special uses (except for registers 0 and 11 through 15, but only with certain instructions). Most CPUs of that time (early 1980's) only had about 4 to 8 registers and each had a special use, which makes the TMS9900 very flexible and easier to program in comparison. However, unlike most other CPUs (ever), the TMS9900’s registers are not built in to the CPU itself!

Most CPUs' registers are "hardware" registers and reside inside the CPU chip itself, which makes accessing the data in a register very fast. However, the 16 general purpose registers of the TMS9900 CPU are actually stored in the computer’s RAM. The TMS9900 has a special hardware register called the "Workspace Pointer" (WP) that holds the memory address of where the register memory starts in the computer’s RAM. This design has the unfortunate side effect of making register access slower on the TMS9900 CPU when compared to other CPUs. There is an upside to this design though: it makes a context switch (changing from one process to another) very efficient, which is important in a multi-tasking system, which is how the TMS9900 CPU was designed to be used (TI's minicomputers). Unfortunately the TI-99/4A is a single tasking computer and this "feature" of the CPU is not really needed, so instead of a benefit it becomes a slowdown.
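To make the context switch idea concrete, here is a sketch using BLWP and RTWP, the instruction pair that swaps workspaces. The workspace addresses and the labels (MAINWS, TASKWS, TASKV, MAIN, TASK, HALT) are chosen just for this example.

```asm
MAINWS EQU  >8300              * workspace for the main code
TASKWS EQU  >8320              * hypothetical second workspace
TASKV  DATA TASKWS,TASK        * BLWP vector: new WP, then new PC

MAIN   LWPI MAINWS
       LI   R0,>1111           * stored at >8300
       BLWP @TASKV             * switch context to TASK
* Back here, R0 still holds >1111
HALT   JMP  HALT

TASK   LI   R0,>2222           * this R0 is the word at >8320
* RTWP restores the caller's WP, PC and ST from R13, R14 and R15
       RTWP
```

No registers are pushed or copied anywhere. BLWP only changes where the CPU looks for them and saves the old WP, PC and ST in R13, R14 and R15 of the new workspace.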

Which brings us to the reason why the WP is so important and why you should *always* have the WP set to a location in the 256 bytes of scratch pad RAM, which is the only 16-bit RAM in the 99/4A! (Did we cover the scratch pad RAM already? I should read my own thread or post more often! ;-) )

One of the first instructions you will see in any assembly program is going to be something like this:

       LWPI  >8300

That means: Load Workspace Pointer Immediate. The "immediate" part of the instruction refers to an immediate value instead of a reference like a memory location or a register. All of the "immediate" instructions in the TMS9900 instruction set require a numeric value and can not have a memory or register operand. In this case, the number >8300 is the immediate value that will be loaded into the WP. What this does is specify to the TMS9900 what memory to use to store the bytes that make up the 16 general purpose registers.
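For illustration, here are a few of the other immediate instructions in use. The values are arbitrary; each comment shows the result assuming the instructions run in this order.

```asm
       LWPI >8300              * WP = >8300
       LI   R1,>0010           * R1 = >0010
       AI   R1,>0005           * R1 = >0010 + >0005 = >0015
       ANDI R1,>000F           * R1 = >0005
       ORI  R1,>0100           * R1 = >0105
       CI   R1,>0105           * compare only, the EQ flag is set
       LIMI 2                  * interrupt mask = 2
```

In every case the number is part of the instruction itself, stored in the word that follows the opcode, rather than being fetched from a register or a separate memory location.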

Since each register is 16-bit, each takes 2 bytes, and since there are 16 registers, the memory addresses from WP to WP+32-1 (minus 1 because you have to remember to count 0) will be used by the CPU to store the contents of the registers. 32 in decimal is >20 in hex, so our formula indicates the range of >8300 to >831F:

>8300 + >20 - >01 = >831F

In our example, register use would look like this:

     MSB    LSB
R0   >8300, >8301
R1   >8302, >8303
R2   >8304, >8305
R3   >8306, >8307
R4   >8308, >8309
R5   >830A, >830B
R6   >830C, >830D
R7   >830E, >830F
R8   >8310, >8311
R9   >8312, >8313
R10  >8314, >8315
R11  >8316, >8317
R12  >8318, >8319
R13  >831A, >831B
R14  >831C, >831D
R15  >831E, >831F

Note that the WP must be an even address. If you try to load an odd address, the CPU will use the even address below the address specified. Also realize that you can access this memory directly, which will have the effect of accessing and/or modifying the values stored in the general purpose registers. While you don't usually want to do that, it can come in handy and I'll give an example of this when we talk about the VDP routines (which should be coming next.)
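As a quick sketch of what accessing a register through its memory address looks like, assuming the workspace is at >8300 as in the table (WRKSP and R1LSB are names invented for this example):

```asm
WRKSP  EQU  >8300
R1LSB  EQU  WRKSP+3            * low byte of R1 (R1 is the word at >8302)

       LWPI WRKSP
       LI   R1,>1234
       MOVB @R1LSB,R2          * high byte of R2 = >34
       MOV  @WRKSP+4,R3        * >8304 is R2, so same as MOV R2,R3
       CLR  @WRKSP+2           * clears R1 through its memory address
```

The MOVB line is the handy part: byte instructions on a register always work with the high byte, so addressing the register's second byte in memory is a way to get at the low byte without shifting or swapping.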

The value >8300 is used here because that represents the first address in the scratch pad RAM, which exists from >8300 to >83FF (256 bytes.) Actually there is a lot of waste in the memory-mapped devices in the 99/4A and the scratch pad RAM is not fully decoded, which means it will respond to memory addresses in these ranges:

>8000 to >80FF

>8100 to >81FF

>8200 to >82FF

>8300 to >83FF

Each range is accessing the same memory, however it is pretty much universally accepted that the scratch pad should be addressed from >8300 to >83FF. Something I've always wanted to do is add real memory to each of those address ranges and give the 99/4A 1K of real 16-bit scratch pad RAM! But that's another day, another project.

So, the scratch pad RAM is very important (have I stressed this enough yet?) because it does not incur the wait-states that the other memory in the system does, and because it is 16-bit memory and matches the 16-bit TMS9900 CPU. The only other 16-bit memory in the system that does not incur the dreaded wait-states is the ROM, which we can not change and don't really need for our assembly programs (with a few exceptions.) Thus, always put your workspace in the scratch pad RAM, somewhere between >8300 and >83E0 (a workspace at >83E0 uses the last 32 bytes, up to >83FF).

Trivia: The 256 bytes of scratch pad RAM exists (probably, I like the story anyway) because the 99/4A was originally supposed to have the TMS9995 8-bit CPU which has the 256 bytes of scratch pad RAM built in to the CPU itself! However, the story goes that the TMS9995 was not ready in time and the engineers had to shoe-horn the TMS9900 into the 99/4A, and thus they needed to provide that 256 bytes of scratch pad. Also, static RAM was *really* expensive back then which is probably why we only got 256 bytes... bummer. :-(

Note, the VDP routines in the E/A and XB cart (the ones you use when you specify REF VSBW, etc. in your assembly programs) set the WP to an address in the low 8K of the 32K expansion, and that RAM is 8-bit!! Every access to registers will cause 8 wait-states (since registers are 16-bit, they always access 2 bytes). This is probably the biggest reason to not use those routines.

Where exactly you set the workspace pointer is up to you, but when you are writing a program that runs from a cartridge, the 256 bytes of scratch pad memory are the only RAM you have in the machine, aside from the VDP RAM which is 8-bit and slow to access compared to the scratch pad RAM. Of course there is also the 32K of 8-bit RAM in the PEB, however most games do not require the PEB and you have to decide if your program will or will not. If not, then the 32K will not be available.

I like to set the WP to >8300 since it makes it easy to remember where the registers are. Then I have the remaining 224 bytes from >8320 to >83FF for my program's variables. If you absolutely need more RAM, then you will have to use the VDP RAM or require the PEB and 32K.
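One way to lay that out is with equates, sketched below. The variable names (SCORE, LIVES, KEYBUF, VAREND) are hypothetical, and this assumes nothing else, such as console routines, is using the same scratch pad addresses.

```asm
WRKSP  EQU  >8300              * registers use >8300 to >831F
VARS   EQU  WRKSP+32           * >8320, first free byte after the registers
SCORE  EQU  VARS               * hypothetical word variable at >8320
LIVES  EQU  VARS+2             * hypothetical word variable at >8322
KEYBUF EQU  VARS+4             * hypothetical 8-byte buffer, >8324 to >832B
VAREND EQU  VARS+12            * next free address, >832C

START  LWPI WRKSP
       CLR  @SCORE             * score starts at zero
       LI   R0,3
       MOV  R0,@LIVES          * start with three lives
```

Defining everything relative to VARS means the whole block can be moved by changing one line, and VAREND makes it easy to see how much of the 224 bytes is left.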

Okay, I hope that makes sense. I know it is a little dry, but necessary to understand what is going on with the CPU, WP, and registers. Next we can dissect VDP access and get something on the screen!

Matthew

+retroclouds · May 22, 2010

Very nice Matthew! I love reading your posts
