Assembly on the 99/4A

Airshack · June 13, 2017

TODAY's DEBRIEF & LESSONS LEARNED SUMMARY:

1. The 99/4A designers decided to wire the 9918A's 8-bit bus to the MSB of the 9900's data bus. That means any "word" read or write (MOV instruction) will transfer the MSB of the register or memory to/from the 9918A. Since the 9918A is only physically wired to the MSB of the data bus, it will only ever see the MSB.

2. The 9900 address >8C00 will enable the 9918A, and the MSB of the 9900 data bus will be transferred to the 9918A.

3. You have to send the low-byte of the VDP address first, then the high byte (that's just the way the 9918A works).

To be clear:

All writes to the VDP are byte writes—The first byte written to VDPWA is assumed to be the LSB of the VRAM address.

4. ...there is one instance when it's important, and that's when addressing memory mapped devices that are auto-incrementing their internal address. Like the VDP. Since it increments the address on both a read and a write, you can't have the VDPWD and VDPRD decoded on successive bytes. If they were, the read-before-write concept in the TMS 9900 would increment the address when fetching the word that contains the byte that shouldn't change and again increment it when storing the word that contains the byte we want to store.

5. When you use the MOVB instruction, the 9900 will operate on whatever memory byte is addressed. However, when one or both of the operands of the MOVB instruction is a register, the MSB of the register will always be read or written. MOVB *always* works with the MSB of any register operands.

6. The ISR *will* communicate with the VDP, so if you are in the middle of writing to the VDP when the ISR happens, then all bets are off as to the VDP's internal memory address register. So, only enable the ISR with LIMI 2 when you know it is OK for the ISR to talk to the VDP. If you need to talk to the VDP then you must disable the ISR with LIMI 0.

7. Tursi plans to port over ALL Colecovision games to the TI-99/4A.

Thanks to all for this initial barrage of feedback.
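
Points 3, 5 and 6 can be seen working together in a short sketch. The port equates are the usual 99/4A ones, the workspace at >8300 matches the examples in this thread, and the VRAM address (384) and the byte (>41) are arbitrary values picked only for illustration:

```asm
VDPWD  EQU  >8C00             * VDP write data port
VDPWA  EQU  >8C02             * VDP write address port
WRKSP  EQU  >8300             * Workspace (assumed, as used in this thread)
R0LB   EQU  WRKSP+1           * Address of the low byte of R0

       LWPI WRKSP
       LIMI 0                 * Point 6: keep the ISR away from the VDP
       LI   R0,>4000+384      * VRAM address 384 with the write bit (>4000) set
       LI   R1,>4100          * Point 5: the byte to send sits in the MSB of R1
       MOVB @R0LB,@VDPWA      * Point 3: low byte of the address first
       MOVB R0,@VDPWA         * ...then the high byte (the MSB of R0)
       MOVB R1,@VDPWD         * The MSB of R1 (>41) goes to VRAM
       LIMI 2                 * The ISR may talk to the VDP again
```

The first MOVB uses a memory operand, so the addressed byte (the low byte of R0) is what gets sent; the other two use register operands, so the MSB of the register is sent.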

matthew180 · June 13, 2017

...

1. The 99/4A designers decided to wire the 9918A's 8-bit bus to the MSB of the 9900's data bus. That means any "word" read or write (MOV instruction) will transfer the MSB of the register or memory to/from the 9918A. Since the 9918A is only physically wired to the MSB of the data bus, it will only ever see the MSB.

...

Reiterating that the above statement only applies to MOV. If using MOVB with a memory operand, the addressed byte will be transferred to the 9918A. Also note that MOVB is typically used (vs MOV) when communicating with the 9918A.

2. The 9900 address >8C00 will enable the 9918A, and the MSB of the 9900 data bus will be transferred to the 9918A.

I was just using that one "port" as an example. In the 99/4A, addressing any of the four ports decoded for accessing the 9918A will enable the 9918A. You can also get into interesting situations if you do things like read from a 9918A write port, and vice versa. Don't do that intentionally.

apersson850 · June 13, 2017

It's misleading to say that using any of these addresses will "enable" the VDP. You don't need to do anything special to "enable" it. It will do its task of displaying the screen anyway. But the VDP has these ports available to load an address, read data from it etc. at the specified addresses in CPU address space. By writing data to the VDP, or actually to the RAM memory managed by the VDP, you can change what's shown on the screen.

The only "enable" you can do is that you can actually turn off the screen, displaying only the backdrop. You do that by writing to a register in the VDP, an operation which is the only one that doesn't involve the RAM in VDP address space.

Unlike VDP RAM, the VDP registers can't be read from, so you need to remember what you store in them. That's why the operating system in the 99/4A keeps a copy of the VDP register where the screen blanking time out is done, so it can restore it when you press a key to cancel the timeout.
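
A minimal sketch of a register write that also keeps a copy, since the value cannot be read back. The calling convention (register number in the MSB of R0, value in the LSB) follows the E/A VWTR utility; the labels SETVR1 and VR1CPY are made up for this example, and the workspace at >8300 is an assumption:

```asm
VDPWA  EQU  >8C02             * VDP write address port
WRKSP  EQU  >8300             * Workspace (assumed)
R0LB   EQU  WRKSP+1           * Address of the low byte of R0

* R0 = register number in the MSB, value in the LSB (e.g. >01E0)
* SETVR1 (hypothetical) saves the value, then falls through to VWTR
SETVR1 MOVB @R0LB,@VR1CPY     * Remember what is about to be written
VWTR   MOVB @R0LB,@VDPWA      * Send the register value first
       ORI  R0,>8000          * Type bits 10: write to a VDP register
       MOVB R0,@VDPWA         * Then send the register number
       B    *R11

VR1CPY BYTE 0                 * Our own copy of VR1 (must be in RAM)
       EVEN
```

Calling `BL @SETVR1` with R0 = >01A0 would blank the screen and with R0 = >01E0 would turn it back on, assuming the usual VR1 layout (16K, display, interrupt enable in the top three bits). The console's own copy of VR1 is commonly documented as living at >83D4.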

+mizapf · June 13, 2017

There you go. Didn't know about the different multiplexing between the external hardware in the 99/4A and in the CPUs with 8-bit data bus. Are the microprograms for these processors published, or just some general comments? In some data manual? Don't remember seeing them in my 9900 data book. I've microprogrammed some processors a long time ago, so it could perhaps be interesting.

The microprograms of the 9900 and 9980A are published in the above-mentioned book 9900 Family Systems Design, Chapter 4: Hardware Design: Architecture and Interfacing Techniques, pages 89 ff. This is a must-read for everyone who is interested in the internal operation.

I did not find these microprograms for the 9995, so for the emulation in MAME I had to guess what they could look like, derived from the instruction timing tables.

apersson850 · June 13, 2017

I have a memory of trying to buy that book back in the 80's, but I had to get the 9900 Family data book instead. The one you mention wasn't available where I looked.

+mizapf · June 13, 2017

We have it on whtech, as I just checked, folder "datasheets and manuals/Datasheets - TI/9900-FamilySystemsDesign-1stEdition/"

Airshack · June 13, 2017

It's misleading to say that using any of these addresses will "enable" the VDP. You don't need to do anything special to "enable" it.

I appreciate and understand what you're saying about the enable analogy. Since this thread is basically Assembly Language Preschool for Dummies, I believe the "enable" analogy was used to help me grasp something I failed to reach. I'll be sure to not take it literally.

Edited June 13, 2017 by Airshack

Tursi · June 13, 2017

7. Tursi plans to port over ALL Colecovision games to the TI-99/4A.

Hehe... I don't know Z80.

matthew180 · June 13, 2017

My apologies on the "enable" confusion.

When using assembly language you are down at the bare metal of a computer, and to interface with the various pieces of the system you need to have at least a basic understanding of how the hardware works. I think this is what stumps a lot of beginners, and sometimes for the sake of moving forward it is easier to just accept that something works rather than understand the gory details.

Warning, gory details follow...

When I said "enable", it is in terms of signaling to the 9918A that it should be reading the data bus, or electrically driving data onto the data bus (writing to the bus). I did not mean "enable" in terms of the 9918A doing its job of producing video output. In that sense, as long as the 9918A is not being held in reset by a low input on pin-34, then it will be functional and working to produce a video output. I suppose you could call this "enabled", but that would be confusing to a hardware person. Let me explain.

The 9900 CPU has an address bus and a data bus. The address bus is generally (but not always) controlled by the CPU and is output-only from the CPU. However, the data bus is bi-directional since the CPU needs to be able to write data out to external circuits as well as read data back from external circuits. These two buses have physical electrical pins that emerge from the chip.

Devices that the CPU needs to communicate with, like RAM, ROM, I/O, a VDP, etc. must electrically connect to these buses. So if you have a RAM chip, a ROM, and a VDP all electrically connected to the data bus, and each one able to put data on that bus simultaneously (so the CPU can read data from those devices), you need a way to control which device is "driving" (electrically controlling the voltage levels on the individual wires that make the data bus) the bus at any given time. If every device tried to put data on the bus at the same time, the CPU would only receive garbage and the system would not work.

The standard term for a device that is currently allowed to read or write on a data bus is "enabled". If you look at the datasheets for RAM, ROM, buffer chips, etc. you will see pins called "chip select", "enable", "chip enable", etc. This is the input to the device to signal that it should be doing something with the bus.

When the CPU wants to read data from RAM, it puts an address on the address bus, waits for the memory to put the data on the data bus, then the CPU reads in (latches) the data internally. So how does the RAM, or ROM, or VDP know when it is OK to put its data on the bus? This is where address "decoding" comes in.

Decoding is typically done by extra logic circuits on the motherboard that look at what address the CPU has put on the address bus, and determine which device in the system should be "enabled" based on the address. This decoding is what gives a computer its specific "memory map", i.e. the memory locations in the CPU's address space where devices appear. The memory map is completely up to the system designer and is one of the main characteristics that makes one computer different from another, even if they have the same CPU.

Knowing a computer's memory map and the hardware at the various memory locations is essential to programming assembly language on a specific computer. You can look up the 99/4A's memory map in the E/A manual, other books, websites, etc.

On the 99/4A, the VDP is "mapped" into the 9900's address space at four memory locations (for those who know, I'm not going there right now):

>8800

>8802

>8C00

>8C02

When the decode logic on the motherboard sees that the CPU has put one of those addresses on the address bus, it will "enable" the 9918A, i.e. the 9918A input pins MODE, /CSW, and /CSR are set in such a way that the 9918A knows it should read data from the bus (accept data from the CPU), or drive data onto the bus (write data to the CPU).

For any of the addresses listed above, the *only* device "enabled" in the system will be the 9918A, and only the 9918A will read or write the data bus. All other devices in the system will ignore the data bus for those addresses. Conversely, for any addresses other than those listed above that the CPU puts on the address bus, the 9918A will ignore the data bus and will neither read nor write the data bus.
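
For reference, the four addresses are usually given names like the ones below. VDPRD, VDPWD and VDPWA are the names used elsewhere in this thread, VDPSTA is the E/A name for the status port, and the final MOVB is only an illustrative access:

```asm
VDPRD  EQU  >8800             * Read a data byte from VRAM
VDPSTA EQU  >8802             * Read the VDP status register
VDPWD  EQU  >8C00             * Write a data byte to VRAM
VDPWA  EQU  >8C02             * Write an address byte or register setup byte

       MOVB @VDPSTA,R1        * Decode logic selects the 9918A for a read;
*                               the status byte lands in the MSB of R1
```

Note that reading the status register also clears the VDP's interrupt flag, so this is not a harmless access in a program that relies on the console ISR.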

This does not mean the 9918A has stopped going about its business of generating video, etc. That is not the case and not what is meant by "enabled" in this context. The 9918A really cannot be "disabled" via software in the sense that you can no longer communicate with the 9918A via its data interface. You can set a bit in one register that causes the screen to blank, but the VDP is still functional and going about its job, you can still read / write VRAM, VDP registers, etc.

I will try to be more clear about the terms in the future, however it is a slippery slope sometimes because one thing leads to another and very quickly you are off talking about "chip enables" instead of getting something on the screen, or other such things.

apersson850 · June 13, 2017

Fair enough, now I see what you were thinking about. I'd call it chip select, or perhaps bus enable, but then I'm not a native English-speaker, so it could be me. Anyway, should anyone with beginner level competence read this, they now know the difference between enabling the basic functionality of the chip vs. enabling the electrical interface of the same.

Do you guys (gals?) here see it as a common thing to write an entire application in assembly language? I'm asking because I don't think I've ever done that on the 99/4A, not for anything useful, at least. OK, some small program which actually was intended to provide stimuli to drive hardware circuitry I've designed, to facilitate computerized testing of the same, that I've done. But even then, frequently the assembly part was a subroutine called by a program written in another language.

When doing this for a wider audience, the "other language" was frequently Extended BASIC. Using some CALL LOAD statements to install a memory image file loader, which then loaded the real assembly program, made these designs useful even if you had no other expansion than the 32 K RAM. You could still load such a program from cassette, even if it took a special tool to create the files on the tape.

For my own use, it was almost always assembly routines called from a Pascal program. Unlike in BASIC, as soon as the assembly procedure was declared as an external procedure, you didn't see any difference compared to a Pascal procedure. And since all normal memory in my machine is just as fast as the RAM PAD at >8300, I didn't have to bother with thinking about where to load the programs either.

I've designed my own embedded computers, based on the TMS 9995, but that's something else. They were programmed entirely in assembly, due to the lack of a suitable compiler.

+TheBF · June 15, 2017

I've designed my own embedded computers, based on the TMS 9995, but that's something else. They were programmed entirely in assembly, due to the lack of a suitable compiler.

I have to mention that in cases like the above, where you have new circuitry and no compiler, Forth can be a super Assembly language program to test the board.

It's not too hard to get it running and once you do, you can create (or copy) a Forth assembler for the machine and do Forth and assembly routines interactively.

apersson850 · June 15, 2017

True, but this was even before we got the first Forth designed to run on the TMS 9900.

+TheBF · June 15, 2017

True, but this was even before we got the first Forth designed to run on the TMS 9900.

I bought TI Forth in 1984 (or so). The kernel was FIG-Forth 9900, published in the late 1970's.

Was your board before that?

apersson850 · June 15, 2017

No, I think I started early 1983 or so. I didn't have the TI Forth at that time, at least. But sometimes we got things later here in Sweden than in the US.

+Lee Stewart · June 15, 2017

Yeah—TI Forth was put into the public domain December 21, 1983 and given to TI users groups for copying and redistribution at that time. MANNERS sold the manual and two diskettes for a nominal $25.

...lee

Airshack · June 15, 2017

This post with its diagram really cleared up some questions I had from the earlier conversation/post:

So, how does the VDP know we are setting up a read or write address if there is only one memory mapped port "VDP set read/write address" at >8C02? The answer is, the VDP looks at the upper two bits of the 2nd address byte we send. Since the VDP address register is 14-bits, the 1st byte (8-bits) plus 6-bits from the 2nd byte are used to form the address. The two most significant bits of the 2nd byte we send inform the VDP that this is a read or write address:
|               2nd byte                |               1st byte                |
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| t0 | t1 | a0 | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9 | a10| a11| a12| a13|
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
|  type   |                   14-bit VDP address register                       |
Here is the truth table for the two "type" bits:

00 : Setting up a read address
01 : Setting up a write address
10 : Writing to a write-only register
11 : Illegal / undefined

Here is the code to set up a read address:
WRKSP  EQU  >8300             * Where we will set the WP
R0LB   EQU  WRKSP+1           * Memory address where R0's LSB will be

. . .
       LWPI WRKSP
. . .

       LI   R0,384            * R0 contains the address we want to set up
       MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ANDI R0,>3FFF          * Set read/write bits 0 and 1 to read (00)
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
A few things to understand here. First, I'm using the fact that the CPU's general purpose registers are memory-based and I'm using a memory-to-memory move to send the low byte. This could be done a lot of different ways, and something common you will see is this:
       SWPB R0
       MOVB R0,@VDPWA
       SWPB R0
       ANDI R0,>3FFF
       MOVB R0,@VDPWA
Remember, we have to send the LSB to the VDP first. The SWPB (SWaP Bytes) instruction will do just that, swap the register's low and high bytes. So the LSB of the address we want to set up is now in the MSB of R0, and sent to the VDP. Then the second SWPB puts the bytes back to their original form and the MSB of the address we want to set up is sent. Personally I don't really like this method, but it works and you might see it in code out there in the wild. There were other reasons the SWPB was used that have to do with timing, but that's a longer story.

The ANDI R0,>3FFF instruction masks out the upper two bits and makes sure they are zero, which indicates to the VDP that we are setting up a read address. If we assume that the programmer will never set a VDP address greater than 14-bits, then the upper two bits will always be zero and we can remove this instruction. This is what I personally do since it is quicker and the VDP functions we are developing will be used a lot in games. Thus, my version of setting up a VDP read address can be reduced to two instructions and assumes the VDP address to set up has been loaded into R0:
       MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
Setting a write address is exactly the same except we make sure that the two "type" bits are 01 by using an ORI (OR Immediate) instruction to set the bits. Actually this will not work if the most significant bit of R0 was already 1, but I assume again that the programmer will not load a set up address into R0 that is greater than the 14-bits used by the VDP:
       MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set read/write bits 0 and 1 to write (01)
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
At this point you are ready to rock. You can start reading and writing bytes to the VRAM. The only thing we did not do is specifically set a graphics mode and fix up the various tables in VRAM by writing to the VDP's write-only registers. But, since this post is already long enough, that will come in another post.
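
A minimal sketch of what "reading and writing bytes" looks like once the address is set: it copies one byte from VRAM 384 to VRAM 385. The addresses are arbitrary, the equates are the usual 99/4A ports, and the NOP is a cautious assumption (a short pause before the first read is often recommended, and may not be needed on a stock console):

```asm
VDPRD  EQU  >8800             * VDP read data port
VDPWD  EQU  >8C00             * VDP write data port
VDPWA  EQU  >8C02             * VDP write address port
WRKSP  EQU  >8300             * Workspace, as in the example above
R0LB   EQU  WRKSP+1           * Address of the low byte of R0

       LWPI WRKSP
       LIMI 0                 * No ISR while we use the VDP
       LI   R0,384            * Read address: type bits 00
       MOVB @R0LB,@VDPWA      * Low byte first
       MOVB R0,@VDPWA         * Then high byte
       NOP                    * Give the VDP a moment to fetch (assumption)
       MOVB @VDPRD,R1         * Byte from VRAM 384 lands in the MSB of R1
       LI   R0,>4000+385      * Write address: type bits 01
       MOVB @R0LB,@VDPWA
       MOVB R0,@VDPWA
       MOVB R1,@VDPWD         * MSB of R1 is stored at VRAM 385
```

Because the VDP auto-increments its address after every data access, further MOVB instructions to VDPWD would continue at 386, 387 and so on without setting the address again.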

Next I'll present a complete set of VDP routines that use the same calling convention (using R0, R1, and R2) as the routines available in the E/A or XB cartridges. We'll even add a new routine of our own that makes VRAM initialization fast and easy.

Matthew

I was struggling with this earlier since I didn't realize you were assuming >00 in the two most significant bits. Now it's clear! Much cleaner than the double SWPB technique. I'm looking forward to moving along to the VDP routines post.

It seems (to me) the slight thread creep may have cut you off on the Character Definition (12-14 May 2010) post conversation? Did you intend to simply show the DATA format without going into how to load the character set into CPU RAM? I was expecting a short example.

**
* Standard Character Set 1 - "Space" 8x8
*
* NOTE: This data will increase the size of our executable and uses CPU RAM!
* When run from cartridge it eats up part of our 8K (unless we do paging).
SCS1
DATA >0000,>0000,>0000,>0000 ; 0 >00
DATA >7C82,>AA82,>BA44,>3800 ; 1 >01
DATA >7C92,>92FE,>BA44,>3800 ; 2 >02................????

matthew180 · June 16, 2017

Keep reading. Post #52 on the 3rd page or so covers it. Unfortunately when I started writing this I fell into the trap of trying to provide a lot of background *first*. If I had it to do over, I would rearrange the presented code so the learner would get things going more quickly, then go back and fill in the gaps and details.

Also, in the last 7 years I have changed some of my own style and way of doing some things. It does not matter much, the code presented will get you there, and hopefully you will evolve your own ways of doing things.

But, to answer your question about loading the tile (character) set, it is not a lot of code. You set up the address in the pattern generator where the patterns need to go (usually a tile set defines tiles 32 to 127), then copy the bytes. But before you can do that, you need to set up the VDP registers to define where the various tables in VRAM will be located, etc. All that detail is explained in subsequent posts.

       LI   R0,>2000          * Start at the space character
       LI   R1,SCS1
       LI   R2,SCS1E-SCS1
       BL   @VMBW

That is the code to actually load the tile set. The assembler will have calculated the values for the labels SCS1 (space character set) and SCS1E (the 'E' means 'end'), so all you have to do is make sure to include your data in your code. You should use longer labels these days with the new assembler tools. The address >2000 is the location in VRAM where the pattern generator table is located (which depends on the value in VR4). Also note that this tile set appears to have definitions for all 256 tiles.
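
To make the SCS1E-SCS1 expression concrete, here is a sketch of how the two labels would bracket the data. Only the three tiles quoted above are included, and defining the end label with `EQU $` (the current location counter) is one common way to do it, not necessarily how the original source does it:

```asm
SCS1   DATA >0000,>0000,>0000,>0000    * Tile >00
       DATA >7C82,>AA82,>BA44,>3800    * Tile >01
       DATA >7C92,>92FE,>BA44,>3800    * Tile >02
SCS1E  EQU  $                          * First address after the data
```

With these three tiles the assembler works out SCS1E-SCS1 as 24 (3 tiles of 8 bytes), so the byte count in R2 stays correct automatically when more DATA lines are added before SCS1E.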

matthew180 · June 16, 2017

This post with its diagram really cleared up some questions I had from the earlier conversation/post:

I was struggling with this earlier since I didn't realize you were assuming >00 in the two most significant bits. Now it's clear! Much cleaner than the double SWPB technique. I'm looking forward to moving along to the VDP routines post.

...

Yes, I probably should not make that assumption. However, I hate including code to cover a case that will almost never happen. I should probably present two versions and let the learner decide which they want.
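
A sketch of what those two versions could look like for setting a write address. The labels VWADF ("fast") and VWADS ("safe") are invented for this example, and the workspace is assumed to be at >8300 as before:

```asm
VDPWA  EQU  >8C02             * VDP write address port
WRKSP  EQU  >8300             * Workspace (assumed)
R0LB   EQU  WRKSP+1           * Address of the low byte of R0

* Fast version: trusts the caller to pass >0000 to >3FFF in R0
VWADF  MOVB @R0LB,@VDPWA      * Low byte of the VRAM address
       ORI  R0,>4000          * Type bits become 01 only if they were 00
       MOVB R0,@VDPWA         * High byte of the VRAM address
       B    *R11

* Safe version: forces the type bits whatever the caller passed
VWADS  MOVB @R0LB,@VDPWA      * Low byte of the VRAM address
       ANDI R0,>3FFF          * Clear both type bits first
       ORI  R0,>4000          * Then set them to 01 (write)
       MOVB R0,@VDPWA         * High byte of the VRAM address
       B    *R11
```

The safe version costs one extra two-word instruction per call, which is the trade-off being discussed here.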

apersson850 · June 16, 2017

Haha, doing things like this professionally, I hate when people leave out code to cover cases that never happen. They tend to happen as soon as the machine is on the customer's floor...

Asmusr · June 16, 2017

We can speed up VDP transfers with a few, simple tricks. Instead of this:

VMBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set read/write bits 14 and 15 to write (01)
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
VMBWLP MOVB *R1+,@VDPWD       * Write byte to VDP RAM
       DEC  R2                * Byte counter
       JNE  VMBWLP            * Check if done
       B    *R11

We can do this:

VMBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set read/write bits 14 and 15 to write (01)
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
VMBWLP MOVB *R1+,@VDPWD       * Write byte to VDP RAM
       MOVB *R1+,@VDPWD       * Write byte to VDP RAM
       DECT R2                * Byte counter
       JNE  VMBWLP            * Check if done
       B    *R11

Now we only have to count down and loop for every second byte. The byte count has to be a non-zero multiple of 2 (with an odd count DECT steps past zero and the loop does not stop), but that's usually OK. We can take this a step further:

VMBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set read/write bits 14 and 15 to write (01)
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
       SRL  R2,2              * Divide by 4
VMBWLP MOVB *R1+,@VDPWD       * Write byte to VDP RAM
       MOVB *R1+,@VDPWD       * Write byte to VDP RAM
       MOVB *R1+,@VDPWD       * Write byte to VDP RAM
       MOVB *R1+,@VDPWD       * Write byte to VDP RAM
       DEC  R2                * Group counter (4 bytes per pass)
       JNE  VMBWLP            * Check if done
       B    *R11

And so on, as long as the byte count is a non-zero multiple of the unroll factor (4 here). Unrolling the loop more than 8 times has very little effect, though. We can speed things up a bit more by loading the VDPWD address into a register:

VMBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set read/write bits 14 and 15 to write (01)
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
       LI   R0,VDPWD          * R0 now points at the VDP write data port
       SRL  R2,2              * Divide by 4
VMBWLP MOVB *R1+,*R0          * Write byte to VDP RAM
       MOVB *R1+,*R0          * Write byte to VDP RAM
       MOVB *R1+,*R0          * Write byte to VDP RAM
       MOVB *R1+,*R0          * Write byte to VDP RAM
       DEC  R2                * Group counter (4 bytes per pass)
       JNE  VMBWLP            * Check if done
       B    *R11

This is faster and shorter because each MOVB is only one word instead of two. The size matters if we make a final optimization and move the code into scratch pad where there is little space.

matthew180 · June 16, 2017

Haha, doing things like this professionally, I hate when people leave out code to cover cases that never happen. They tend to happen as soon as the machine is on the customer's floor...

True enough, however in this case "people" are assembly programmers, and if they are using code I wrote they need to respect the documentation, etc. If a library (or video chip) you are using says "always set these bits to zero", and you disregard that information, the consequences are on you. I don't believe in protecting a programmer from themselves.

Asmusr · June 16, 2017

Here's my general VDP copy routine that works on any number of bytes. It unrolls the loop 8 times for all groups of 8 bytes and then deals with the remaining bytes in a one-byte loop. If you transfer fewer than 16 bytes it's probably more efficient (my guess) to use a standard VMBW routine. Note that I'm using the SWPB approach instead of the @R0LB approach because it's nice that a general routine works in any workspace, and the time to execute the setup code is insignificant compared to the time of the inner loop. But you can also see that I still have Matthew's comments in the code because this thread is where I started to learn TMS9900 assembly.

*********************************************************************
*
* Fast CPU to VDP copy, replaces VMBW
*
* R0: Destination address
* R1: Source address
* R2: Number of bytes to copy
*
VDPCP  SWPB R0
       MOVB R0,@VDPWA                  ; Send low byte of VDP RAM write address
       SWPB R0
       ORI  R0,>4000                   ; Set the two MSbits to 01 for write
       MOVB R0,@VDPWA                  ; Send high byte of VDP RAM write address
       LI   R0,VDPWD
VDPCP0 MOV  R2,R3
       SRL  R3,3                       ; Number of groups of 8
       JEQ  VDPCP2
VDPCP1 MOVB *R1+,*R0
       MOVB *R1+,*R0
       MOVB *R1+,*R0
       MOVB *R1+,*R0
       MOVB *R1+,*R0
       MOVB *R1+,*R0
       MOVB *R1+,*R0
       MOVB *R1+,*R0
       DEC  R3
       JNE  VDPCP1
       ANDI R2,>0007                   ; Isolate number of remaining bytes
       JEQ  VDPCP3
VDPCP2 MOVB *R1+,*R0
       DEC  R2
       JNE  VDPCP2
VDPCP3 B    *R11
*// VDPCP
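
A short usage sketch for the routine above. The destination (>0000, assumed to be the name table), the MSG and MSGE labels and the text itself are made up for illustration:

```asm
       LIMI 0                 * Keep the ISR away from the VDP
       LI   R0,>0000          * Destination in VRAM (assumed name table)
       LI   R1,MSG            * Source in CPU memory
       LI   R2,MSGE-MSG       * 13 bytes: one group of 8 plus 5 left over
       BL   @VDPCP
DONE   LIMI 2
       JMP  DONE              * Idle here in this sketch

MSG    TEXT 'HELLO, WORLD!'
MSGE   EQU  $                 * First address after the text
       EVEN
```

On return R0 holds the VDPWD address rather than the destination, R3 has been used as the group counter, R1 points just past the source data and R2 is zero. As far as the code can be traced, a byte count of zero is not handled (the one-byte loop would run 65536 times), so the caller should not pass one.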

RXB · June 16, 2017

Hmm this looks exactly like code from the XB ROMs?

matthew180 · June 16, 2017

That is pretty nice. With a longer function like this, using SWPB (vs. MOVB with the register's LSB address) is probably a better approach to keep the function workspace agnostic (as mentioned).

For single bytes I find myself not even bothering with a function call any more, but basically in-lining the code directly. Calling VSBW in a loop where the address is changing for every write is a waste (IMO). Writing vertical lines to the name table is an example of where this might be happening. Just include the VDP address setup in your loop and skip calling VSBW. However, for something like this on the F18A where you can set the VDP address register to increment by values of +127/-128 (signed byte), you don't need to set up the address for each write.
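
A sketch of the in-lined vertical line on a stock 9918A. It assumes a 32-column screen with the name table at >0000, a workspace at >8300, and interrupts already disabled with LIMI 0; the column (5), the tile (>2A) and the VLINE label are arbitrary choices for the example:

```asm
VDPWD  EQU  >8C00             * VDP write data port
VDPWA  EQU  >8C02             * VDP write address port
WRKSP  EQU  >8300             * Workspace (assumed)
R0LB   EQU  WRKSP+1           * Address of the low byte of R0

       LI   R0,>4000+5        * Row 0, column 5, write bit already set
       LI   R1,>2A00          * Tile to draw, in the MSB of R1
       LI   R2,24             * 24 rows
VLINE  MOVB @R0LB,@VDPWA      * Address setup is part of the loop
       MOVB R0,@VDPWA
       MOVB R1,@VDPWD         * One tile per row
       AI   R0,32             * Same column, next row
       DEC  R2
       JNE  VLINE
```

Keeping the write bit in R0 for the whole loop avoids an ORI on every pass; the largest address reached (5 + 23*32) stays well inside 14 bits, so the type bits are never disturbed.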

Willsy · June 16, 2017

ANDI R2,7 .... NIIIIICE :-)
