Assembly on the 99/4A

Airshack · June 12, 2017

I'm finally getting around to commenting on this thread from matthew180 which he started back in 2010.

Hopefully we can reignite this excellent Assembly Language tutorial/conversation.

I'm posting this mostly as a response to help Owen get past some of the initial hurdles of AL programming and also as maybe a "starter kit" for anyone else who wants to get going with AL on the 99/4A

I knew I'd want a better workflow and a modern set of tools for my "starter kit" before this BASIC programmer began climbing Mount Assembler.

Here's what I came up with thanks to the suggestions of many on AtariAge:

Editor: IntelliJ IDEA - Has a nice TMS9900 Assembly Language plug-in which also works with TI BASIC

https://www.jetbrains.com/idea/

IntelliJ Plugin (xdt99 IDEA) for TI BASIC and Assembly: https://github.com/endlos99

*** Thanks to Ralph Benzinger

The xdt99-mode and xdt99 IDEA plugins provide editor support for writing assembly and TI Extended BASIC programs using the GNU Emacs and the IntelliJ IDEA development environments, respectively. Plugin features include syntax highlighting, navigation, and semantic renaming, among others.

Cross Assembler: Asm994A (part of the Win994a simulator for Windows)

http://www.99er.net/win994a.shtml

*** Thanks to Cory Burr

AtariAge thread on using asm994a: http://atariage.com/forums/topic/229206-using-asm994a/

Emulator: Classic99

http://harmlesslion.com/

*** Thanks to Tursi

Cross-Assembler: Downloaded

Edited June 12, 2017 by Airshack

Airshack · June 12, 2017

The 11 May 2010 entry begins with a little sample code:

I'm posting this mostly as a response to help Owen get past some of the initial hurdles of AL programming and also as maybe a "starter kit" for anyone else who wants to get going with AL on the 99/4A. I will be presenting the information and programming style that I like to use, but of course there are always other ways and other people will have their own methods and madness. I'm not interested in arguing styles and such in this thread, so please let's not go down that road.

There are *basically* (but not only) two types of AL program formats you can write on the TI. The first type is programs that run from the Editor/Assembler (E/A) menu option 3, commonly known as EA3. These programs require a linking phase to finalize the memory references before execution. Because of this, this kind of code is "relocatable" and the loader can place the program just about anywhere in memory that it will fit.

The second type of program is one meant to be run from a cartridge ROM. These programs are set up differently and have a few more restrictions as to defining labels and such. For now we will stick with EA3 types of programs.

This is the basic skeleton:
       DEF  START

* Typically EQUates, DATA, and BYTE definitions for variables
* will be at the head of the program, with longer data sets
* placed at the bottom of the code.


* Program execution starts here
START  LIMI 2

LP9999 JMP  LP9999

       END
This program will do nothing but execute an endless loop. However, interrupts were enabled with LIMI 2, so the FCTN= combination can be used to reset the console.

This line:

DEF START

Is an assembler directive, which means it will be read and used by the assembler. It is *not* assembly language and does not cause any code to be created. This directive is used to place a label in the REF/DEF table that the linker uses to load and find programs. This entry is saying that there will be a label called "START" and that is where our program will begin execution. "START" could be anything we want that is 1 to 6 characters in length.

Let's add something that will actually do something for us like set the VDP to Graphics Mode I, clear the screen, and say "Hello World". I think this is the basic entry level program that most people try to start with.

*Note*
Sometimes you will see a $-2 or such used in code. The assembler uses the $ to represent the current location and will subtract (or add) the specified number of bytes to the current location when generating an address. This is usually used as a shortcut for short loops to avoid using a label. I do not currently recommend using this method because I have found that Asm994a gets it wrong, and as of right now that is the only Windows-based assembler I know of. If you are going to stick 100% with the E/A, then go ahead, but don't be surprised if you try to assemble with Asm994a and your code does not work.
      DEF  START

* VDP Memory Map
*
VDPRD  EQU  >8800             * VDP read data
VDPSTA EQU  >8802             * VDP status
VDPWD  EQU  >8C00             * VDP write data
VDPWA  EQU  >8C02             * VDP set read/write address

* Workspace
WRKSP  EQU  >8300             * Workspace
R0LB   EQU  WRKSP+1           * R0 low byte reqd for VDP routines

* Program execution starts here
START  LIMI 0
       LWPI WRKSP


       LIMI 2
LP9999 JMP  LP9999

       END
First notice that there are no REF statements for the normal VDP routines. This is because I don't like using the console based routines for a couple of reasons:

1. They are slow since they were designed to save ROM space instead of being fast to execute
2. They use BLWP which is slow and designed primarily for context switching in a multi-tasking system like TI's mini computers where the TMS9900 CPU was designed to be used
3. The routines use a workspace in 8-bit RAM! This in and of itself should be a sin!

Also, I don't like to use the GPL routines because they require a workspace pointer in the 256 bytes of scratch pad RAM. Since there are only 256 bytes of 16-bit RAM in the machine, it is highly prized memory and I don't like some other console routines chewing up 32 bytes of that RAM. Anyway, the VDP routines provided in the E/A cartridge are slow, slow, and slow, and I'm a speed freak.

What we set up in the code is an equate (EQU) to the memory address where the VDP is mapped in the 99/4A's address space. Equates are assembler directives that simply let us use a label instead of a number. The assembler will do a search and replace on the labels, so any place you see VDPRD, for example, will be replaced with >8800.

We have also changed the first instruction to LIMI 0, which will disable the VDP interrupt, which in turn disables the console ISR. While the VDP interrupt is a nice thing to have to use as a sixtieth of a second timer in a game, it triggers the console's one and only interrupt service routine which we usually don't want running (it tries to do too much IMO.) So, we shut it off. We can still use the VDP interrupt, we just have to poll for it, which is not too bad.

The next thing we do is LWPI (load workspace pointer immediate), which sets the workspace pointer to >8300, which is the first address in the 16-bit scratchpad RAM. Since the registers in the 9900 CPU are memory-based, you absolutely positively always want to keep the workspace pointer set to an address in that 16-bit RAM (unless you really want to kill performance, in which case you should just use XB... )

Finally you will note an equate called R0LB. This creates a label that gives us a convenient way to access the low byte of R0 without using SWPB. It comes in handy when dealing with the VDP as you will see in the routines we set up.

Now let's do something, like clear the screen:
       DEF  START

* VDP Memory Map
*
VDPRD  EQU  >8800             * VDP read data
VDPSTA EQU  >8802             * VDP status
VDPWD  EQU  >8C00             * VDP write data
VDPWA  EQU  >8C02             * VDP set read/write address

* Workspace
WRKSP  EQU  >8300             * Workspace
R0LB   EQU  WRKSP+1           * R0 low byte reqd for VDP routines


* Program execution starts here
START  LIMI 0
       LWPI WRKSP

       CLR  R0                * Set the VDP address to zero
       MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set read/write bits 14 and 15 to write (01)
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address

       LI   R1,>2000          * Set high byte to 32 (>20)
       LI   R2,768            * Set every screen tile name to >20
CLS    MOVB R1,@VDPWD         * Write byte to VDP RAM
       DEC  R2
       JNE  CLS

       LIMI 2
LP9999 JMP  LP9999

       END
For now we are doing things directly, but soon we will make a subroutine to encapsulate the VDP reading and writing. One of the most important things to understand about the 9918A VDP is that it maintains an internal address register that auto-increments any time a byte of data is written to or read from the VDP. This is handy, since setting the VDP's address register takes two writes to the VDP write-to-register port. You can see that above: R0 is loaded with the address we want to set. In this case we set up an address of zero. Since I know the VDP name table defaults to memory location zero, this sets up the VDP address to write bytes to the VDP RAM used to display the screen. A little later we will set the name table location directly so we know without a doubt where it is located in the VDP RAM.

The other thing to know about the VDP registers is that they are write only (except for the status register, which is read only.) So the only way to know for sure where the various VDP tables are located is to set them.

So, the address is set up and written to the VDP. The VDP's internal address register is 14-bits since it has to reference up to 16K, so it takes two 1-byte transfers to load the address register. After that, any data we write (or read) to the VDP will cause the address to auto-increment after the write (or read.)
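The two-byte address setup and the auto-increment can be sketched with a toy Python model. This is only an illustration under simplifying assumptions: the class `Vdp9918Model` and the helper `clear_screen` are hypothetical names, the model ignores read prefetch and timing, and it only handles the address-setup case of the second byte.

```python
class Vdp9918Model:
    """Toy model of the 9918A address register (no timing, no read prefetch)."""

    def __init__(self):
        self.vram = bytearray(16 * 1024)  # 16K of VDP RAM
        self.address = 0                  # 14-bit internal address register
        self.pending_low = None           # first byte of an address pair, if any

    def write_address_port(self, byte):
        """Model a byte write to VDPWA (>8C02)."""
        if self.pending_low is None:
            self.pending_low = byte & 0xFF          # first write: address LSB
        else:
            # Second write: the top two bits are the command bits, the low
            # six bits are the upper part of the 14-bit address.
            self.address = ((byte & 0x3F) << 8) | self.pending_low
            self.pending_low = None

    def write_data_port(self, byte):
        """Model a byte write to VDPWD (>8C00)."""
        self.vram[self.address] = byte & 0xFF
        self.address = (self.address + 1) & 0x3FFF  # auto-increment, wrap at 16K


def clear_screen(vdp, name_table=0x0000, fill=0x20, count=768):
    """Mirror the assembly listing: set the write address, then write 768 bytes."""
    r0 = name_table
    vdp.write_address_port(r0 & 0xFF)   # MOVB @R0LB,@VDPWA
    r0 |= 0x4000                        # ORI  R0,>4000
    vdp.write_address_port(r0 >> 8)     # MOVB R0,@VDPWA
    for _ in range(count):              # LI R2,768 / DEC R2 / JNE CLS
        vdp.write_data_port(fill)       # MOVB R1,@VDPWD


vdp = Vdp9918Model()
clear_screen(vdp)
print(bytes(vdp.vram[:768]) == b" " * 768, hex(vdp.address))
```

The address is written once and the 768 data writes that follow rely entirely on the auto-increment, which is the point of the paragraph above.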

Next we load R1 with the byte we want to write to the VDP. When writing from registers to memory mapped devices, the MSB of the register will always be transferred. Thus, the low byte of R1 really does not matter, but we use >00 to help keep things clear. Since we are clearing the screen, the pattern for tile (character) 32 is already defined as space thanks to the console boot-up process (ASCII 32, or >20 in hex.) We set R2 to 768 to keep track of how many bytes we have written to the VDP. We count *down* to take advantage of the fact that the 9900 CPU will compare R2 to 0 after the DEC instruction and we can use that to control the JNE (jump if not equal) instruction. So we are saying, Jump if R2 is not 0.

Finally we re-enable interrupts and cause an infinite loop. Something else to note. You *MUST* disable interrupts when reading or writing to the VDP because the console ISR also reads and writes the VDP and will mess up the address you had set up. And since your program has no way to know that it was possibly interrupted, your VDP accesses will be all messed up. But, we don't need the console ISR, so we will just leave it disabled until the end, if we ever enable it at all.

Okay, that's it for this installment. Next I'll add code for a full VDP mode setup so things are exactly where we want them, and we can set up the VDP subroutines.

Matthew

Today I had the opportunity to play with the code and see if I could figure everything out.

One issue I had was not knowing that MOVB is an even-to-even or odd-to-odd byte operation.

Here's my code, which I've over-commented in an attempt to capture Matthew's first lesson along with some things I've learned while studying:

Intro to Assembly Language for the TI Home Computer by Ralph Molesworth

Fundamentals of TI-99/4A Assembly Language by M.S.Morley

TI's Editor Assembler Manual (as a reference only)

Compute!'s Beginner's Guide to Assembly Language on the TI-99/4A by M. L. Lottrup

DEF LESON1

* no REF statements because VDP routines are slow: Console routines are designed to save ROM, not run quickly
* BLWP is for context switching TI 990 type multi-tasking operations
* NOTE: TI-99/4A was not initially designed to use this 16-bit CPU
* the console routines use 8-bit RAM for their workspace! BAD TI!
* the 99/4 has but 256 bytes of 16-bit RAM
* the VDP routines in E/A cartridge are super slow - avoid them
* GPL routines are slow and use up 32 bytes of 16-bit RAM
* E/A cartridge copies code from its GROM into low memory starting
* at >2000 so they cost you expansion RAM which is unnecessary

* VDP Memory Map - VDP RAM is accessed by the following four decoded address lines:
VDPRD EQU >8800 * VDP Read Data
VDPSTA EQU >8802 * VDP Read Status
VDPWD EQU >8C00 * VDP Write Data
VDPWA EQU >8C02 * VDP Read/Write Address
* VDP RAM is not directly addressable from the CPU, you MUST go through the VDP

* Workspace
WRKSP EQU >8300 * Workspace in beginning area of 16-bit fast "CPU RAM"
* * TI-99/4A only has 256 BYTES of CPU RAM! Range: >8300->83FF

RoLB EQU WRKSP+1 * R0 low byte
* gives us access to LSB of R0 w/o using the slow ~~console routine~~ SWPB
* EDIT: SWPB is not a console routine but a TMS9900 instruction which happens to be slower than it needs to be.
* console routines are designed to save ROM space, not for speed

* program execution begins here
LESON1 LIMI 0 * You MUST disable VDP interrupts any time you write/read with VDP
* because console ISRs will mess with your VDP addressing since they
* read and write to the VDP on their own
* NOTE: Whenever disabled we can still use it by polling.

LWPI WRKSP * ALWAYS set workspace pointer to fast CPU RAM (unless context shifting)

MAINLP CLR R0 * we'll use R0 to set up the VDP write address, R0 now = >0000

MOVB @RoLB,@VDPWA * send LSB of R0 to LSB of VDP RAM read/write address (00000000 ==> >8C02+1)
* 16-bit VDP R/W Address uses 14-bits (13-0) to address 16K of memory

* * NOTE: Since @RoLB is an ODD address the MOVB will move that LSB to the
* LSB @VDPWA, or @VDPWA+1. MOVB transfers even-to-even or odd-to-odd

* (the MOVB @RoLB,@VDPWA above sends LSB of R0 to VDP Write Address Register LSB)

* when loading the VDP Address Register you MUST send the LSB first

* this is simply a function of how the 9918A works in the TI-99/4A

* * NOTE: With E/A the Screen Image Table default range is >0000->02FF
* this table is divided into three sections of 256 bytes each.
*
ORI R0,>4000 * sets MSB R0 (future read/write bits 14 and 15) to signal a write (01)
* LSB does not matter since it is not used in the next MOVB
* R0: 0000 0000 XXXX XXXX
* >4000: 0100 0000 0000 0000
* Result: 0100 0000 XXXX XXXX

MOVB R0,@VDPWA * send MSB of R0 to MSB of 16-bit VDP RAM write address

* the second write to VDP Write Address Register ALWAYS writes to the MS-Byte
* a MOVB from a Register ALWAYS sends the MS-Byte

* NOTE: >0000->02FF is the default range for the screen image table

* NOTE: VDP internal register auto increments after a VDP read or write.

* Setting the VDP address register takes TWO writes (MOVB) to the VDP
* write-to-register port.
* Subsequent writes in this program will only take ONE write due to
* the initial setup along with auto-increment

LI R1,>4000 * sets MSB to "@" code >40
LI R2,768 * Loop counter set in R2 to 768 = number of tiles on a 32*24 screen

ATFILL MOVB R1,@VDPWD * write "@" character in MSB R1 to the VDP Write Data register
* write address value in @VDPWA auto-increments with each write/read
DEC R2 * decrease loop counter in R2
JNE ATFILL * Jump up to ATFILL if R2 is not zero

LI R2,>FFFF
DELAY1 DEC R2
JNE DELAY1

CLR R0
MOVB @RoLB,@VDPWA * sets up LSB of VDP Write Address Register to beginning of Screen
ORI R0,>4000
MOVB R0,@VDPWA * sets MSB

LI R1,>2000 * >20 = BLANK character
LI R2,768

CLS MOVB R1,@VDPWD * write blank character in MSB R1 to the VDP Write Data Register
DEC R2
JNE CLS

LIMI 2 * enable interrupt, allows ALT+= to halt program execution
LI R2,>FFFF *delay for clear screen
DELAY2 DEC R2
JNE DELAY2
LIMI 0 * disable interrupt, don't want ISRs messing with my VDP writing

JMP MAINLP
END

~~Once I figured out that MOVB @RoLB,@VDPWA was working on the LSB of the VDP Write Address Register @VDPWA, I was able to understand everything.~~

Edit:

My code was modified to fill the screen with the "@" character and then clear it out because I wasn't sure matthew's screen clearing code was actually working on my machine. I'm new to the whole E/A process so I wasn't sure if the blank screen was a sign that the code worked or maybe just locked up and somehow was displaying a blank screen. The drawing and clearing of @-symbols made the status of the running code obvious.

Question for the TI faithful:

~~If you leave the interrupt mask off (LIMI 0), the code will start doing funny things like changing the character foreground and background colors? Not sure why?~~

If you leave the interrupt mask on (LIMI 2), the code will start doing funny things like changing the character foreground and background colors? Not sure why?

EDIT:

Obviously, this shows that the console's ISRs are changing things around somehow. The specifics are what haunt me.

Edited June 13, 2017 by Airshack

+Lee Stewart · June 12, 2017

...

Cross Assembler: Asm994A + IntelliJ Plug-in for TMS9900 Assembly and BASIC

https://github.com/endlos99

This is Ralph Benzinger's xas99, not Cory Burr's asm994a.

...lee

Asmusr · June 12, 2017

It's not correct that MOVB only transfers even-to-even or odd-to-odd. MOVB can also transfer odd to even or even to odd.

Asmusr · June 12, 2017

MOVB @RoLB,@VDPWA sends R0 LSB to @VDPWA which is the MSB of that word.

+Lee Stewart · June 12, 2017

To be clear:

1. All writes to the VDP are byte writes. The first byte written to VDPWA is assumed to be the LSB of the VRAM address. Matthew's code always writes the first byte to VDPWA from the LSB of the relevant workspace register, and it is always his first write by his design. It just happens to be odd by his plan. If he had written the first byte of the register, it would have been even to odd.
2. SWPB is not a console routine. It is a TMS9900 instruction. It just happens to be slower than some other instructions and slower than it needs to be to give the VDP time to recover.
3. Ampersand is ‘&’, not ‘@’.

...lee

Airshack · June 12, 2017

This is Ralph Benzinger's xas99, not Cory Burr's asm994a.

...lee

Thanks for straightening me out on the links Lee! I think they're all correct now with the proper credits. - james

Airshack · June 12, 2017

It's not correct that MOVB only transfers even-to-even or odd-to-odd. MOVB can also transfer odd to even or even to odd.

Damn! There goes my understanding of what's happening!

To be clear:

All writes to the VDP are byte writes—The first byte written to VDPWA is assumed to be the LSB of the VRAM address. Matthew's code always writes the first byte to VDPWA from the LSB of the relevant workspace register—and, it is always his first write by his design. It just happens to be odd by his plan. If he had written the first byte of the register, it would have been even to odd.

SWPB is not a console routine. It is a TMS9900 instruction. It just happens to be slower than some other instructions and slower than it needs to be to give the VDP time to recover.

Ampersand is ‘&’, not ‘@’.

...lee

Here's the golden nugget I was searching for: "The first byte written to VDPWA is assumed to be the LSB of the VRAM address."

Thanks guys! I think I get it now -- again!

Airshack · June 12, 2017

SWPB is not a console routine. It is a TMS9900 instruction. It just happens to be slower than some other instructions and slower than it needs to be to give the VDP time to recover.

...lee

Thanks again Lee!

To be clear

Ampersand is ‘&’, not ‘@’.

...lee

Clearly! ;p

Edited June 12, 2017 by Airshack

Tursi · June 12, 2017

If you leave the interrupt mask off (LIMI 0), the code will start doing funny things like changing the ampersand foreground and background colors? Not sure why?

Obviously, this shows that the console's ISRs are changing things around somehow. The specifics are what haunt me.

If you leave the interrupt mask off, then the ISR doesn't run.

I can't see offhand why that program would do anything unusual in that case though...

apersson850 · June 12, 2017

I can add that MOVB always writes a word, not a byte. If you do MOVB R1,@100, then the CPU will write to the bytes @100 and @101, placing the most significant byte of R1 in @100. If you instead do a MOVB R1,@101, the CPU will write to the two bytes @100 and @101, placing the most significant byte of R1 into @101.

To prevent destroying the other byte, the one not explicitly addressed, the CPU actually reads the word @100, modifies the left or right byte of that word and writes it back to @100. The TMS 9900 has to work like this, since it's a byte addressable device that can only address 16-bit words. After all, it has a 16 bit wide data bus, but only a 15 bit address bus. Thus it can actually access 32768 words, not 65536 bytes. Bytes are handled by manipulating one half of a word.

Normally, you never need to think about this, as it's transparent to the user. But there is one instance when it's important, and that's when addressing memory mapped devices that are auto-incrementing their internal address. Like the VDP. Since it increments the address on both a read and a write, you can't have the VDPWD and VDPRD decoded on successive bytes. If they were, the read-before-write concept in the TMS 9900 would increment the address when fetching the word that contains the byte that shouldn't change and again increment it when storing the word that contains the byte we want to store.
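The read-before-write behaviour described above can be sketched in Python. This is a simplified illustration, not a model of the real bus timing: `WordBus` and `movb_from_register` are hypothetical names, and addresses are written in hex here.

```python
class WordBus:
    """Toy 16-bit memory: the CPU can only read or write whole words."""

    def __init__(self):
        self.words = {}   # even address -> 16-bit value
        self.log = []     # every bus access, in order

    def read_word(self, addr):
        addr &= 0xFFFE                      # the low address bit never reaches the bus
        self.log.append(("read", addr))
        return self.words.get(addr, 0)

    def write_word(self, addr, value):
        addr &= 0xFFFE
        self.log.append(("write", addr))
        self.words[addr] = value & 0xFFFF


def movb_from_register(bus, reg_value, dest_addr):
    """Sketch of MOVB Rn,@dest: store the register's MSB into one byte."""
    src = (reg_value >> 8) & 0xFF
    word = bus.read_word(dest_addr)            # read-before-write
    if dest_addr & 1:
        word = (word & 0xFF00) | src           # odd address: replace right byte
    else:
        word = (word & 0x00FF) | (src << 8)    # even address: replace left byte
    bus.write_word(dest_addr, word)


bus = WordBus()
bus.words[0x0100] = 0x1234
movb_from_register(bus, 0xAB00, 0x0101)
print(hex(bus.words[0x0100]), bus.log)
```

The log shows one read and one write of the same word for a single byte store. On an auto-incrementing device, each of those accesses would count if the read and write ports shared a word.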

Another thing that's not important for programming by itself, but could be good to know if you talk to other programmers, is that the programs in this thread aren't compiled. They are assembled. I understand that everyone knows what's implied here, but using the correct terminology is sometimes important, when sharing ideas.

If somebody doesn't know the difference, then assembly is the process of translating mnemonics like MOV R1,R4 to the binary code this instruction has, in this case >C101. But there's always a direct relationship between the instruction as written, MOV R1,R4, and the instruction assembled, >C101.
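To make the one-to-one relationship concrete, here is a small Python sketch that encodes and decodes a few two-operand TMS9900 instructions with both operands in plain register mode. It assumes the usual layout for these instructions (opcode in the top four bits, destination register in bits 9 to 6, source register in bits 3 to 0, counting from the least significant bit); the function names are hypothetical and only register-to-register forms are handled.

```python
# Top four bits of the word-sized two-operand instructions.
OPCODES = {"SZC": 0x4, "S": 0x6, "C": 0x8, "A": 0xA, "MOV": 0xC, "SOC": 0xE}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble_reg_to_reg(mnemonic, src_reg, dst_reg):
    """Encode `mnemonic Rsrc,Rdst` with both operands in register mode."""
    return (OPCODES[mnemonic] << 12) | (dst_reg << 6) | src_reg


def disassemble_reg_to_reg(word):
    """Recover the source text from the word (label names are not needed here)."""
    name = MNEMONICS[word >> 12]
    return f"{name} R{word & 0xF},R{(word >> 6) & 0xF}"


word = assemble_reg_to_reg("MOV", 1, 4)
print(f">{word:04X}", disassemble_reg_to_reg(word))
```

Assembling and then disassembling returns the same instruction text, which is the property that compiled code does not have.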

Compiling is instead the art of translating a language the CPU doesn't understand to assembly language, so it can be executed.

This is much more complex, since the same statement, say

a := b+c;

can be compiled to an almost infinite number of instructions. Just to see what it can be, let's assume we have this code:

var a,b,c: integer;

begin

a := b+c+2;

end;

If we assume variables are allocated at VARBASE and we have R9 as pointer to this variable area, the compiler may generate

VARBASE BSS 6

ERECP EQU 9

LI ERECP,VARBASE

MOV @2(ERECP),*ERECP

A @4(ERECP),*ERECP

MOV *ERECP,R0

AI R0,2

MOV R0,*ERECP

Another compiler may generate code that looks like this

VA DATA 0

VB DATA 0

VC DATA 0

MOV @VB,R0

A @VC,R0

AI R0,2

MOV R0,@VA

Or the first one could be

VARBASE BSS 6

ERECP EQU 9

LI ERECP,VARBASE

MOV @2(ERECP),*ERECP

A @4(ERECP),*ERECP

INCT *ERECP

Note that it's never sure exactly what the source was when the program is compiled, but the source can be reconstructed instruction by instruction by disassembling an assembled program. You lose the label names, but you can see the instructions exactly as they were written.

In the first compiled example, the source code

var d,e,f: integer;

begin

d := e+f;

end;

would give the same result.

It's not essential to know about the difference to be able to write assembly language programs, but it's always good to know the distinction.

Edited June 12, 2017 by apersson850

matthew180 · June 12, 2017

Edit: lots of posts while I was writing this, so adding to what others have clarified (not just Rasmus and Lee).

To add to what Rasmus and Lee have already clarified about the MOVB and such:

The 9918A is an 8-bit device and is wired into the 9900's address space at specific addresses, i.e. it is a memory-mapped device. The specific address of the 9918A is determined by the decode logic on the 99/4A motherboard and cannot be changed without physically modifying the motherboard. The 99/4A designers also decided to wire the 9918A's 8-bit bus to the MSB of the 9900's data bus. That means any "word" read or write (MOV instruction) will transfer the MSB of the register or memory to/from the 9918A.

So something like this:

MOV R0,@>8C00

Will move the MSB of R0 to address >8C00, and the LSB of R0 to address >8C01. Since the 9918A is only physically wired to the MSB of the data bus, it will only ever see the MSB. The address >8C00 is used by the decode logic to enable the 9918A when the 9900 writes to that address. In other words, the 9900 address >8C00 will enable the 9918A, and the MSB of the 9900 data bus will be transferred to the 9918A.

When you use the MOVB instruction, the 9900 will operate on whatever memory byte is addressed. However, when one or both of the operands of the MOVB instruction is a register, the MSB of the register will always be read or written. For example:

MOVB R0,R1         ; move R0 MSB to R1 MSB

MOVB R0,@>8C00     ; move R0 MSB to the byte address >8C00

MOVB R0,@>8C01     ; move R0 MSB to the byte address >8C01

MOVB @>8800,R0     ; move the byte from address >8800 to R0 MSB

MOVB @>8801,R0     ; move the byte from address >8801 to R0 MSB

MOVB @>8800,@>8C01 ; move the byte from address >8800 to the byte address >8C01

The reason I use the R0LB design is because when "setting up" the 9918A's internal 14-bit memory register, you have to send the low-byte of the address first, then the high byte (that's just the way the 9918A works). Since it is convenient to use a 9900 register to hold the address you want to set-up in the 9918A, you end up having to send the LSB of the register first. So you would need something like this:

LI   R0,>0400   ; Address to set-up in the 9918A in R0

SWPB R0         ; Move the address-LSB to R0-MSB

; Now the LSB of the *address* to set-up in the 9918A is in the MSB of R0.
MOVB R0,@>8C02  ; Send the address-LSB (in the MSB of R0) to the 9918A.

SWPB R0         ; Move the address-MSB back to R0-MSB

; The 9918A looks at bits >C0 (two MS-bits) to determine the kind of data being sent.
; The two MS-bits must be "01" ("01xxxxxx") to indicate a "write-address" is being
; set-up.

ORI  R0,>4000   ; Set MS-bits of R0 to "01"

; Now send the MSB of the *address* to set-up, either instruction will work:
MOVB R0,@>8C02

Because a register is being used to hold the set-up address, and the MOVB instruction always works on the MSB of a register operand, right away you have to swap the register bytes (because the 9918A requires the LSB of the set-up address to come first), then swap them back. This always bothered me. One solution you could do would be to load R0 with the set-up address in LSB/MSB order to begin with, for example:

LI  R0,>0004    ; Set up address >0400 in the 9918A.

However, that is unintuitive and error prone IMO, and you also still need at least one SWPB instruction.

In my design, I exploit the fact that the registers in the 9900 are actually in memory (the general purpose registers in the 9900, R0 .. R15, are not actually in the 9900 itself). Since R0 is really two memory locations in RAM, and the MOVB instruction can address individual bytes, I use the address of the LSB of R0 to send the first byte (the address-LSB) to the 9918A.

For example, using the workspace (the location in RAM where the 9900 references its registers) of >8300 (which is 16-bit RAM in the 99/4A), the registers will look like this in memory:

8300 R0 MSB
8301 R0 LSB
8302 R1 MSB
8303 R1 LSB
.
.
.
831E R15 MSB
831F R15 LSB

Because we tell the 9900 where in RAM the workspace is (with the LWPI instruction), we know where the LSB of R0 is and that can be used to move the first byte (the LSB) of the address to the 9918A:

LI   R0,>0400       ; Address to set-up in the 9918A in R0
MOVB @>8301,@>8C02  ; Send the address-LSB (LSB of R0) to the 9918A.

; Then make sure the two MS-bits are "01":
ORI  R0,>4000       ; Set MS-bits of R0 to "01"

MOVB R0,@>8C02      ; Send the MSB of the address to set-up.

This version is smaller and faster because it eliminates the two SWPB instructions. It also has at least one instruction (the ORI) between writes to the 9918A, which helps make sure you don't over-run the 9918A.
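The R0LB idea can be sketched in Python by treating the workspace as ordinary bytes of RAM. This is only an illustration: `ram`, `load_register`, `read_register`, and `address_setup_bytes` are hypothetical helpers, and the list returned stands in for the bytes that would arrive at >8C02.

```python
WRKSP = 0x8300           # workspace pointer, as set by LWPI WRKSP
R0LB = WRKSP + 1         # address of the low byte of R0
ram = bytearray(0x10000)


def load_register(reg, value):
    """Model LI Rn,value: the register is just two bytes of RAM, MSB first."""
    addr = WRKSP + 2 * reg
    ram[addr] = (value >> 8) & 0xFF
    ram[addr + 1] = value & 0xFF


def read_register(reg):
    addr = WRKSP + 2 * reg
    return (ram[addr] << 8) | ram[addr + 1]


def address_setup_bytes():
    """Bytes sent to the VDP address port by the three-instruction sequence."""
    sent = [ram[R0LB]]                             # MOVB @R0LB,@VDPWA
    load_register(0, read_register(0) | 0x4000)    # ORI  R0,>4000
    sent.append(ram[WRKSP])                        # MOVB R0,@VDPWA (MSB of R0)
    return sent


load_register(0, 0x0400)                           # LI R0,>0400
print([f">{b:02X}" for b in address_setup_bytes()])
```

The low byte goes out first without any swap because it is read directly from its own memory address, and the high byte follows with the "01" write bits set.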

This would not work if the 9900's registers were "real" registers internal to the CPU itself (like most other CPUs). In that case using SWPB would be required.

You could even speed this up in specific situations by removing the ORI and using MOV for the second byte:

LI   R0,>4400       ; Address to set-up in the 9918A in R0 plus "01" in MS-bits
MOVB @>8301,@>8C02  ; Send the address-LSB (LSB of R0) to the 9918A.
MOV  R0,@>8C02      ; Send the MSB of the address to set-up.

~~MOV is slightly faster than MOVB because it suppresses the read-before-write~~ (edit: not true), and both MOV and MOVB will send the MSB of R0. The destination is an even address, so the MSB of R0 still goes to the same place, and the LSB of R0 is written to a non-existent memory location as far as the 99/4A is concerned. However, this code might have the side effect of over-running a 9918A (this is not a problem with the F18A), depending on whether your code is running out of scratch-pad. Thus, I tend to stick with the version above with the ORI between the two writes and avoid having to include the MS-bit setup as part of the address.

There are a few nuances going on here:

1. The 9900's registers are really in RAM, and can be modified directly as any other memory.
2. MOVB *always* works with the MSB of any register operands.
3. MOVB can address any memory location for memory operands.

Also keep in mind that the EQU (equates) in the code are assembler *directives* and not "assembly language". They are converted directly to the values equated with the names before being assembled into code:

WRKSP  EQU  >8300
R0LB   EQU  WRKSP+1
VDPWA  EQU  >8C02

this:    LWPI WRKSP
becomes: LWPI >8300

this:    MOVB @R0LB,@VDPWA
becomes: MOVB @>8301,@>8C02

Also, not discussed is that MOVB does a read-before-write, which is how it can address and manipulate only a single byte despite being a 16-bit CPU (and no UB/LB output pins). This is why the read and write "ports" (memory-mapped locations) of the 9918A are at least two addresses apart in memory.

Edited June 12, 2017 by matthew180

apersson850 · June 12, 2017

Well, the read before write is what I discussed, but perhaps you were writing at the same time.

Another thing is that the general procedure used by MOV and MOVB is actually the same. Both read before write, since they use the same memory access principle internally in the CPU. They both have four memory accesses, i.e. fetch instruction, fetch source, fetch destination, write destination. The only difference is that with MOV, the entire destination is replaced by the source, with MOVB only half of it.

This thread has mainly been about games, and such games that focus on nice screen graphics. Nothing wrong with that, but have you considered adding anything about data management and processing? Games need to handle data too, especially if they are supposed to be clever. Intelligence frequently involves allocating, creating and traversing decision trees. Is it perhaps time to add something about how to handle not just single integers, but arrays, structures, linked lists, content addressable arrays and such stuff? Principles for evaluating a game situation and selecting the next move, depending on what the player does?

This could be interesting not just for turn-by-turn based games, but also for action games. Provided you can do the coding efficiently enough, something there has been a lot of focus on in this thread.

Edited June 12, 2017 by apersson850

+mizapf · June 12, 2017

MOV is slightly faster than MOVB because it suppresses the read-before-write

This is not true, unfortunately. In fact, for the operations A, AB, C, CB, S, SB, SOC, SOCB, SZC, SZCB, MOV, MOVB, COC, CZC, XOR, the same microprogram is used.

See 9900 Family Systems Design, Chapter 4: Hardware Design: Architecture and Interfacing Techniques, p. 4-93

NOTES

3) For MOV instruction the destination word (16 bits) is fetched although not used.

matthew180 · June 12, 2017

Damn! There goes my understanding of what's happening!

Here's the golden nugget I was searching for: "The first byte written to VDPWA is assumed to be the LSB of the VRAM address."

Thanks guys! I think I get it now -- again!

Be careful that you understand this only applies to setting up a read or write address (and that depends on what "port" you are writing to). If you are just reading or writing bytes to/from VRAM (via the 9918A), then, well, you are just reading or writing and there is no assumption on the 9918A's part other than you are moving data.

The 9918A interface to a host CPU has three signals, MODE, /CSR, and /CSW (/CSR and /CSW are active low, hence the '/'). With these signals "ports" are generally created in the host system's memory map:

1. Write to VDP register

2. Write to VRAM

3. Read from VRAM

4. Read status

Notice two are read, two are write. On systems that don't have a "read-before-write" and that can address real bytes, there are typically only two ports needed to talk to the 9918A. The 99/4A needs four ports.

The MODE pin is typically tied to the LS-bit of the address bus, and /CSR and /CSW are tied to the CPU's read/write pins (with other logic for I/O vs. memory access, memory decoding, etc.). Anyway, if you look on page 2-3 of the 9918A datasheet you will see the details about how to communicate with the 9918A and where the "00", "01", and "10" bits come from that have to be set based on what you are doing. These "rules" come from the 9918A, and exist for any computer that uses the 9918A.
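As a sketch of how those rules look from the 99/4A side: the four port addresses below are the usual console mappings, while the VRAM address >0000 and the data byte >41 are arbitrary examples. The fragment assumes a workspace is already set up and is not the only way to write this.

```asm
VDPRD  EQU  >8800         read VRAM data
VDPSTA EQU  >8802         read VDP status
VDPWD  EQU  >8C00         write VRAM data
VDPWA  EQU  >8C02         write address or VDP register

* Set the VDP address for writing, then write one byte to VRAM.
       LIMI 0             keep the console ISR away from the VDP
       LI   R0,>0000      14-bit VRAM address (example value)
       ORI  R0,>4000      top two bits = 01: address is for writing
       SWPB R0            bring the low byte into the MSB of R0
       MOVB R0,@VDPWA     first write: address LSB
       SWPB R0            bring the high byte back into the MSB
       MOVB R0,@VDPWA     second write: 01 bits plus address MSB
       LI   R1,>4100      example data byte >41 in the MSB of R1
       MOVB R1,@VDPWD     write it; the VDP address auto-increments
```

To set up a read instead, leave the top two bits at 00 and then read from VDPRD. To write a VDP register, send the data byte first and then >80 plus the register number, both to VDPWA.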

matthew180 · June 12, 2017

This is not true, unfortunately. In fact, for the operations A, AB, C, CB, S, SB, SOC, SOCB, SZC, SZCB, MOV, MOVB, COC, CZC, XOR, the same microprogram is used.

See 9900 Family Systems Design, Chapter 4: Hardware Design: Architecture and Interfacing Techniques, p. 4-93

NOTES

3) For MOV instruction the destination word (16 bits) is fetched although not used.

Damn, where is it suppressed then? I could swear there is a situation where that is true. It has been too long, and I better check my details before posting things like that again. I'll have to dig through the books tonight.

+mizapf · June 12, 2017

I think with CLR.

[Edit: No, CLR has the same issue, but it was STWP, STST, and the immediate operations, as I saw.]

Edited June 12, 2017 by mizapf

Airshack · June 12, 2017

If you leave the interrupt mask off, then the ISR doesn't run.

I can't see offhand why that program would do anything unusual in that case though...

How about that same question but reversing everything: If you leave the interrupt mask on (LIMI 2), the code will start doing funny things like changing the character foreground and background colors? Not sure why?

apersson850 · June 12, 2017

That makes sense, since there's no clear byte, and you don't care about the prior content of an address you're going to set to zero anyway.

Alas, it's not like that either. CLR does access memory three times. Which is one too many for just fetch instruction and zero destination.

If you leave the interrupt on, then the sequence between setting up a VDP address and writing to it, by your code, may be interrupted. If then the interrupt service also writes to the VDP address register, your setup is destroyed. But you don't know that when the control is returned to your program, so you'll keep on writing but to a different location than you think.

Edited June 12, 2017 by apersson850

matthew180 · June 12, 2017

Well, the read before write is what I discussed, but perhaps you were writing at the same time.

Yes, we overlapped. ;-)

Another thing is that the general procedure used by MOV and MOVB is actually the same. Both read before write, since they use the same memory access principle internally in the CPU. They both have four memory accesses, i.e. fetch instruction, fetch source, fetch destination, write destination. The only difference is that MOV replaces the entire destination with the source, while MOVB replaces only half of it.

Corrected.

This thread has mainly been about games, and such games that focus on nice screen graphics. Nothing wrong with that, but have you considered adding anything about data management and processing? Games need to handle data too, especially if they are supposed to be clever. Intelligence frequently involves allocating, creating and traversing decision trees. Is it perhaps time to add something about how to handle not just single integers, but arrays, structures, linked lists, content addressable arrays and such stuff? Principles for evaluating a game situation and selecting the next move, depending on what the player does?

This could be interesting not just for turn-by-turn based games, but also for action games. Provided you can do the coding efficiently enough, something there has been a lot of focus on in this thread.

Absolutely. I'm mostly writing based on feedback, and originally with a lean towards people new to assembly, so complex topics would come later. However, anything is fair game. Games just seem to be something people get excited about. People like to see things happen, so getting something on the screen and moving is a big motivator and can be very rewarding.

+mizapf · June 12, 2017

The interesting thing is that the 9900 and 9980A make use of a "data derivation sequence" in the microprogram, which includes loading the value from the destination, while the 9995 replaces this with an "address derivation sequence" without fetching the destination value.

Also, the TI-99/4(A) console multiplexes the words as LSB, MSB, while the 9995 and the 9980 use an MSB, LSB multiplex order. This led to problems with some peripheral cards that assumed that the LSB is written first; those failed with the Geneve. I seem to remember this was the case with SNUG cards, in particular with the BwG, but maybe they were fixed later.

matthew180 · June 12, 2017

How about that same question but reversing everything: If you leave the interrupt mask on (LIMI 2), the code will start doing funny things like changing the character foreground and background colors? Not sure why?

Yes, that is a problem. It was in your summary at "LESON1". If you are trying to read or write to the VDP, you need to make sure no other code is also trying to read or write to the VDP, especially when setting up the VDP address (since it takes two consecutive and very specific writes as just discussed). The 99/4A is not a multitasking system (no multi-core CPU, no other CPUs in the system accessing memory or devices), so your assembly code owns the machine and is the only code running... EXCEPT for interrupts. When the CPU gets interrupted, your code stops and the CPU goes off to some other code to handle the interrupt (called the "Interrupt Service Routine" or ISR). This code does some stuff and exits, at which point the CPU returns execution to where your code was interrupted.

The 99/4A has one main ISR, which can be controlled by LIMI 0 (off) and LIMI 2 (on). The ISR *will* communicate with the VDP, so if you are in the middle of writing to the VDP when the ISR happens, then all bets are off as to the VDP's internal memory address register. So, only enable the ISR with LIMI 2 when you know it is OK for the ISR to talk to the VDP. If you need to talk to the VDP then you must disable the ISR with LIMI 0.

With assembly, you don't lose anything by shutting off the ISR other than the built-in music player (which you can roll your own easily enough, code available in the forum), and "auto-sprite motion" (not really useful from assembly, easier to do it yourself).
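One common way to arrange this, sketched here under the assumption that you want the console ISR to run once per pass through a main loop: open a short window with LIMI 2, close it again with LIMI 0, and do all VDP work with interrupts masked. The label LOOP, the VRAM address, and the data byte are placeholders.

```asm
VDPWD  EQU  >8C00         write VRAM data
VDPWA  EQU  >8C02         write address or VDP register

LOOP   LIMI 2             window: a pending interrupt is serviced here
       LIMI 0             masked again, the VDP is ours until next pass
       LI   R0,>4000      VRAM address >0000 with the 01 write bits
       SWPB R0
       MOVB R0,@VDPWA     address LSB
       SWPB R0
       MOVB R0,@VDPWA     write bits plus address MSB
       LI   R1,>2A00      example data byte >2A in the MSB of R1
       MOVB R1,@VDPWD     safe: nothing can disturb the address now
       JMP  LOOP
```

If you do not need the ISR at all, a single LIMI 0 at the start of the program and no LIMI 2 anywhere is simpler still.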

Edited June 12, 2017 by matthew180

apersson850 · June 12, 2017

There you go. Didn't know about the different multiplexing between the external hardware in the 99/4A and in the CPUs with 8-bit data bus. Are the microprograms for these processors published, or just some general comments? In some data manual? Don't remember seeing them in my 9900 data book. I've microprogrammed some processors a long time ago, so it could perhaps be interesting.

Returning to data processing, the addressing modes available in the 9900 can be put to good use there. So perhaps some general hints about how to handle such structured data would be appropriate here.
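As a starting point, here is a small sketch of two of those addressing modes at work. All labels, the table contents, and the node layout (word 0 = value, word 1 = link to the next node, 0 ends the list) are made up for illustration. Both routines are meant to be entered with BL and assume a workspace is already set up.

```asm
* Sum a table of 16-bit words with indirect auto-increment.
* Result is left in R2.
SUMTAB LI   R1,TABLE      R1 points at the first entry
       CLR  R2            running sum
SUM1   A    *R1+,R2       add the entry, then advance R1 by 2
       CI   R1,TABEND     past the last entry?
       JL   SUM1          not yet (unsigned compare)
       RT

* Walk a linked list with indirect and indexed addressing.
* The sum of the node values is left in R3.
WALK   LI   R1,NODE1      R1 points at the head node
       CLR  R3            running sum
WALK1  A    *R1,R3        add this node's value (word 0)
       MOV  @2(R1),R1     follow the link (word 1), EQ set if 0
       JNE  WALK1         more nodes to visit
       RT

TABLE  DATA 10,20,30,40   four example entries
TABEND EQU  $
NODE1  DATA 100,NODE2     value, link
NODE2  DATA 200,NODE3
NODE3  DATA 300,0         link of 0 ends the list
```

The indexed form @2(R1) is what makes fixed-layout structures cheap: R1 holds the base of the record and the constant selects the field.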

Tursi · June 12, 2017

How about that same question but reversing everything: If you leave the interrupt mask on (LIMI 2), the code will start doing funny things like changing the character foreground and background colors? Not sure why?

Already well answered above.

Interestingly, this is a huge problem for the ColecoVision guys. On that machine, the VDP interrupt is hard wired to the non-maskable interrupt, which means the CPU can never ignore it, it always processes it. This trips up a lot of homebrewers, and new software that doesn't screw up is often praised for having "no vdp corruption". The TI actually gives us two easy ways to ignore the VDP interrupt (via LIMI and via the CRU), so it's easy to forget how nice it is to have that control. (The interrupt can also be disabled on the VDP itself, but this takes two writes so you have to take steps to ensure it doesn't interrupt between them.)
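For the third option, a sketch of turning the interrupt off on the VDP itself. The value >C0 for VDP register 1 is an assumption (16K VRAM, display on, interrupt enable bit cleared, Graphics I mode, small unmagnified sprites), so substitute whatever your program normally keeps in that register. The fragment assumes a workspace is already set up.

```asm
VDPWA  EQU  >8C02         write address or VDP register

       LIMI 0             the two writes below must not be split
       LI   R0,>81C0      >81 = write register 1, >C0 = new value
       SWPB R0            data byte goes out first
       MOVB R0,@VDPWA     first write: register data >C0
       SWPB R0
       MOVB R0,@VDPWA     second write: >80 plus register number 1
```

The LIMI 0 at the top is the "steps" part: without it, an interrupt landing between the two MOVB instructions could leave the VDP expecting the wrong byte.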

Edited June 12, 2017 by Tursi

Airshack · June 13, 2017

Games just seem to be something people get excited about. People like to see things happen, so getting something on the screen and moving is a big motivator and can be very rewarding.

A significant point many Assembly Language books miss.
