
Assembly on the 99/4A


matthew180

Recommended Posts

On 2/22/2024 at 5:56 AM, apersson850 said:

For the TI, I've used macros in the assembler for small stuff. For more complex tasks, I usually returned to Pascal and did them there, unless they were still speed-critical or needed to do something not doable from Pascal.

I still struggle to find a use for macros in assembly. What would be the best use case for them in that environment?

  • Like 1

2 hours ago, Vorticon said:

I still struggle to find a use for macros in assembly. What would be the best use case for them in that environment?

 

Though ALC macros can certainly be complicated, the simplest use, in my opinion, is shortcuts for often-used code.

 

I use two macros for over 500 fbForth name fields of normal and immediate Forth words:

;++ </Name Field macros> -------------------------------------------

   .defm NAME_FIELD
       BYTE #1 + TERMINATOR_BIT
       TEXT #2
   .endm

   .defm NAME_FIELD_IMMEDIATE
       BYTE #1 + TERMINATOR_BIT + PRECEDENCE_BIT
       TEXT #2
   .endm

;++ Normal name field example...
;++ Macro for GOTOXY name field
GTXY_N .NAME_FIELD 6, 'GOTOXY '

;++ Assembly result of above macro
GTXY_N
       BYTE 6 + TERMINATOR_BIT
       TEXT 'GOTOXY '

 

 

Since I use the same code several times in the fbForth Floating Point Library for pushing and popping BL links to and from the return stack, the following macros are very convenient:

;++ <FPL Return Stack macros> --------------------------------------
* Return Stack (RSTACK) for transcendental fxns uses Forth's return
* stack while using GPL workspace. Return Stack Pointer is GPLWS R15,
* which value will be restored to >8C02 when done. RSTACK is EQUated
* in fbForth303_FPL_MATHS.a99.

   .defm FPUSHL      ;push link to Forth return stack
       DECT RSTACK         ;reserve space on Forth RS
       MOV  R11,*RSTACK    ;push return
   .endm
   
   .defm FRTPL       ;pop link from Forth RS and return to caller
       MOV  *RSTACK+,R11   ;pop return stack
       B    *R11           ;return to caller
   .endm

;++ Push link example
PWR$$  .FPUSHL

;++ Result of above macro
PWR$$  
       DECT RSTACK         ;reserve space on Forth RS
       MOV  R11,*RSTACK    ;push return

 

 

...lee

  • Like 3

2 hours ago, Vorticon said:

I still struggle to find a use for macros in assembly. What would be the best use case for them in that environment?

If you implement a stack, adding PUSH and POP macros clarifies the code. 

On the 9900, once you have the stack, you could make ENTRY and RETURN macros that automagically save R11 on your newly minted stack and restore it on return.
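
A rough sketch of what those could look like, borrowing the .defm/.endm notation from the fbForth macros above, and assuming SP is already EQUated to the stack-pointer register:

   .defm ENTRY       ;save the caller's R11 on the stack
       DECT SP
       MOV  R11,*SP
   .endm

   .defm RETURN      ;restore R11 and branch back to the caller
       MOV  *SP+,R11
       B    *R11
   .endm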

  • Like 2

Once you have that stacking working, you could make another one for DATA.

Then you could make macros to fetch and store a variable, then more macros to do math operations with two numbers on the stack and return the result to the stack.

Maybe some macros to MOVE memory and FILL memory with arguments from the stack and then...

 

Oh wait. 

That's building Forth. :) 

  • Like 2

1 hour ago, Lee Stewart said:

Though ALC macros can certainly be complicated, the simplest use, in my opinion, is shortcuts for often-used code.

Yes, at least when they aren't big and complicated enough that it makes more sense to call them as subroutines.

 

Like POP R2, which can be defined as MOV *SP+,%1

This assumes that SP is already EQUated to the stack pointer and that %1 means the first parameter to the macro. It's just a renaming of an instruction, but a little less to write each time.

Defining PUSH as DECT SP followed by MOV %1,*SP means two instructions are actually assembled when you write one, so there's more of a saving.
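
Spelled out as macro definitions (just a sketch; the .defm form follows the fbForth examples above, with %1 as the parameter marker):

   .defm POP         ;pop the top of the stack into the operand
       MOV  *SP+,%1
   .endm

   .defm PUSH        ;make room, then store the operand on the stack
       DECT SP
       MOV  %1,*SP
   .endm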

 

If you instead create a subroutine, you need an extra instruction to return, but only one call instruction for each use. Which means that if you repeat a four-instruction code sequence twice inline, using eight instructions, you waste one instruction compared to calling the subroutine twice (two instructions), hosting the subroutine itself (four instructions) and returning from it (one instruction), seven in total. Unless you also need to save a return address.

  • Like 3

How do you prefer to make stacks?

 

First I define a stack pointer, usually in R10.

SP EQU 10

I always make them grow towards lower addresses. Thus I define them with 

STACK BES 50

in Editor/Assembler, if I need 50 bytes of stack space.

Then I set up the stack pointer with 

LI SP,STACK

Now it's ready to use.

Pushing on the stack is with DECT SP, MOV XXX,*SP

Popping becomes MOV *SP+,XXX
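
Collected into one small sketch (R3 is just an arbitrary example register):

SP     EQU  10              stack pointer in R10
STACK  BES  50              50 bytes; BES places the label at the end of the block

       LI   SP,STACK        stack is empty and ready to use

       DECT SP              push R3
       MOV  R3,*SP
*      ...
       MOV  *SP+,R3         pop it back into R3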

 

But I've seen some do the opposite. They define the stack as

STACK BSS 50

LI SP,STACK

Then they push with MOV XXX,*SP+ and POP with DECT SP MOV *SP,XXX

This does of course work, but if you want to refer to the value on top of stack you can't do that just with *SP. You have to use @-2(SP), which takes more space and is slower.

 

I can't see any advantage with the latter scheme. At best, if you never refer to the value on top of the stack without popping it away, they are equal. Otherwise the first method is better.

I can imagine that if you use the stack solely as a return stack, then you can just as well go with the second option. But if you want to do math too, the first is clearly better. Negate top of stack is simply NEG *SP instead of NEG @-2(SP)

 

Which method do you use?

  • Like 3

1 hour ago, apersson850 said:

How do you prefer to make stacks? [...]

From what I have seen, the TI-99 Forth implementations use your first method.

I went down the road of caching the top of stack in a register, which makes many things simpler, but a few other things then need an extra POP to refill the cache register at the end of the operation.

The net improvement is about 10% in the literature, and that seems to hold true on the 9900.
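
As a rough illustration of the trade-off (register names are just examples here: R4 caches the top of stack, SP is the data-stack pointer):

* '+' gets shorter, since the second item comes straight off the memory stack
PLUS   A    *SP+,R4         add and drop in one instruction
* ...but DROP now needs the extra pop to refill the cache register
DROP   MOV  *SP+,R4
* without the cache, '+' would be two instructions and DROP only one:
*      MOV  *SP+,R0
*      A    R0,*SP
*      INCT SP              (DROP)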

  • Like 1

2 hours ago, apersson850 said:

How do you prefer to make stacks?

The gcc backend for tms9900 uses R10 as SP. The stack grows downward and the most recently pushed item is always *R10. It knows in advance how much stack space to allocate, so it can roll up the DECTs into a single AI and then use a scratch register to do the pushes, e.g.

        ai   r10, >FFF8
        mov  r10, r0
        mov  r11, *r0+
        mov  r12, *r0+
        mov  r14, *r0+
        mov  r15, *r0

 

and then pop is in the same order:

        mov  *r10+, r11
        mov  *r10+, r12
        mov  *r10+, r14
        mov  *r10+, r15
        b    *r11

 

I was thinking that at some point this could even be reduced a bit more, if R11 is always the last register saved, by doing:

        b    *r10+

 

(It only saves R11 if the current function is not a "leaf", i.e. a function that doesn't call any other functions, so it would be slightly more complex to implement.)

  • Like 3

On 2/21/2024 at 5:12 AM, dhe said:

More EA, I like the flexibility this approach provides for passing variables. But, I think there is a bug.


But I think there is another bug in the EA manual.

 

[screenshot of the macro example from the E/A manual]

 

I think the second MOV (Y Value) also needs to do an increment of R11.

 

This was actually corrected in the E/A Addendum.

 

...lee

  • Like 2

17 hours ago, khanivore said:

I was thinking that at some point this could even be reduced a bit more, if R11 is always the last register saved, by doing:

        b    *r10+

That is a very common mistake, made sometimes even by those well into TMS 9900 programming. It will not cause execution to go to the return address stored at the top of the stack, but rather start executing the contents of the top of the stack as instructions, and just advance the stack pointer after that.

It's one step further, but really the same mistake in thinking as assuming that B R2 would execute code at the address pointed to by R2. Instead it will execute the code stored in R2 itself, something the TMS 9900 will happily do, since registers are really memory.
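
In other words, returning through a stack in R10 still takes the same two steps the gcc epilogue above uses; the tempting one-liner does something quite different:

       MOV  *R10+,R11       pop the return address
       B    *R11            branch through it (correct)

*      B    *R10+           would instead execute the word on top of the
*                           stack as an instruction, then bump R10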

Edited by apersson850
  • Like 4

17 minutes ago, apersson850 said:

That is a very common mistake, made sometimes even by those well into TMS 9900 programming. It will not cause execution to go to the return address stored at the top of the stack, but rather start executing the contents of the top of the stack as instructions, and just advance the stack pointer after that.

Ah, good point, yes I had forgotten it still needs a double indirection.


Digital Equipment's VAX architecture has the Autoincrement deferred addressing mode. It does exactly what you want. Branch @(R10)+ would branch to the address stored at the location R10 points to, then autoincrement R10.

Double indirection built into the instruction. There is more freedom in a 32 bit architecture, of course.


12 minutes ago, apersson850 said:

Digital Equipment's VAX architecture has the Autoincrement deferred addressing mode. It does exactly what you want. Branch @(R10)+ would branch to the address stored at the location R10 points to, then autoincrement R10.

Double indirection built into the instruction. There is more freedom in a 32 bit architecture, of course.

Yep, the VAX approach makes a lot more sense. It would have been nice if the TMS9900 designers had built a level of indirection into branch, as "B Rx" is pointless. It would have messed up "B @LABEL" though, so it's probably more consistent to have it as it is.


It's of course the handling of the general address concept that streamlines the whole thing. A general address means the same thing regardless of context. Since they wanted two general addresses in some instructions, and these exist in both word and byte versions, there are only two bits left to describe each addressing mode. Still they managed to squeeze five modes into those two bits, so I think it's fair to say they did a pretty good job defining the instruction format after all, considering it's a 16-bit architecture with a 16-bit opcode. Several instructions would have had to be longer than 16 bits if more flexibility were to fit in.

There are machines that don't use the von Neumann architecture, although they are rare these days. Such machines typically have completely separate data and instruction paths in the CPU, meaning you can't execute data for love nor money. On the other hand, they can read data and instructions in the same machine cycle, since separate paths are used.

Edited by apersson850
  • Like 5

On 2/24/2024 at 6:14 AM, khanivore said:

Yep, the VAX approach makes a lot more sense. It would have been nice if the TMS9900 designers had built a level of indirection into branch, as "B Rx" is pointless. It would have messed up "B @LABEL" though, so it's probably more consistent to have it as it is.


The 99000 has a BIND branch indirect instruction for that.
 

BIND *R10+ fetches the word at *R10, increments R10, then puts that word in the PC.  
 

How it returns is up to you. BIND takes any addressing mode. 

 

B R8 is sometimes useful. You could set up an instruction in R8,R9, with another B in R10,R11.

 

Or load R8 with the opcode for "B *R11", then do BL R8. This is a way to get the PC in code that can be moved around in memory. (If only there were a STPC instruction.)
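
A small sketch of that trick (HERE and R9 are invented names; >045B is the machine word for B *R11):

       LI   R8,>045B        R8 now holds the instruction  B *R11
       BL   R8              branch into the workspace: R11 gets the address
*                           of HERE, then R8's contents (B *R11) execute,
*                           landing right back below
HERE   MOV  R11,R9          R11 = address of HERE, wherever the code was loaded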

 

The 99000 defines another, BLSK (Branch and Link Stack). It is like BL (branch and link), except the return address goes onto a stack instead of into R11.

 

Unfortunately, BLSK only takes a stack pointer register and an immediate address, so you can't do BLSK *R5. Just:

BLSK @SUBRT,R10

 

It decrements R10 and pushes the return address at *R10.

 

You can however:

MAIN BLSK @SUBRT,R10

...

SUBRT BLSK @SUBRT2,R10

       BIND *R10+    return

 

SUBRT2 BIND *R10+
 

 
 

Edited by FarmerPotato
Forgot R10
  • Like 1

1 hour ago, FarmerPotato said:


The 99000 has a BIND branch indirect instruction for that.
 

BIND *R10+ fetches the word at *R10, increments R10, then puts that word in the PC. [...]

That could make a very efficient indirect threaded interpreter.

 

Currently the 9900 needs something like this:

MOV     *IP+, W       move address at *IP (code field) into Working register & incr IP by 2
MOV     *W+, R5       move contents of code field to R5 & autoinc W
B          *R5        branch to the address in R5

 

Are you saying the 99000 could do it in two instructions?

MOV  *IP+,W
BIND *W+

 

  • Like 1

18 hours ago, TheBF said:

That could make a very efficient indirect threaded interpreter.

 

Are you saying the 99000 could do it in two instructions?

MOV  *IP+,W
BIND *W+

 

Yes, that's right. 

I was thinking about an interpreter for direct threading. Instead of a pointer in the CFA (for instance to DOCOLON or DOCODE), the code is inlined and it is reached by BL *W. One less indirection, but lots more memory used. Hopefully faster?
Anyhow, taking that to the Forth thread.
 

I too believe the stack should grow downward. Maybe the return stack is fine to grow upward? 


I'm trying to implement a simple low-level data transfer routine for the RS232 card using RTS/CTS flow control, and I'm not able to receive anything. The problem appears to be with the CTS control. According to Nouspikel, CRU bit 5 of the >1300 CRU base of the first RS232 card controls the CTS1 line. In the following excerpt, I first activate CTS, get the byte, then deactivate CTS and return to the calling routine (not shown). Removing the CTS parts allows me to receive data, but then I obviously can't control the flow. Anyone here familiar with the RS232 serial ops at the low level?

 

sbo     5               ;activate cts line. ready to receive
        a       @uartdis,r12    ;add uart displacement to cru base
chkdsr  tb      26              ;test rts pin. signal is inverted!
        jne     chkdsr          ;if line high then not ready
chkbuf  tb      21              ;test receive buffer
        jne     chkbuf
        mov     @cruadr,r12     ;restore base rs232 card cru address
        sbz     5               ;inactivate cts line. not ready to receive
        a       @uartdis,r12    ;add uart displacement to cru base
        stcr    r6,8            ;get byte into r6
        swpb    r6
        sbz     18              ;reset buffer cru bit 21

 


I have code that works but it is in Forth Assembler, so some "translating" will be required.

 

Looks to me like you have added the UART offset in the wrong places. 

You select the UART, test DSR and test bit 21, then restore the CARD address and try to reach the UART. Can't do that.

You must read the character with the UART address in R12. 

 

**Also, after each read you must reset bit 18 on the UART.

sbo     5               ;activate cts line. ready to receive
        a       @uartdis,r12    ;add uart displacement to cru base
chkdsr  tb      26              ;test rts pin. signal is inverted!
        jne     chkdsr          ;if line high then not ready
chkbuf  tb      21              ;test receive buffer
        jne     chkbuf
        mov     @cruadr,r12     ;restore base rs232 card cru address
        sbz     5               ;inactivate cts line. not ready to receive
        a       @uartdis,r12    ;add uart displacement to cru base
        stcr    r6,8            ;get byte into r6
        swpb    r6
        sbz     18              ;reset buffer cru bit 21

Order of operations:

- select the base CARD CRU address first

- RESET bit 5 for clear to send

- then add the displacement to get access to the UART

- test UART bit 21

- if a character is ready then:

      - read a byte from the UART

      - IMPORTANT! reset bit 18 on the UART after each character is received, to reset the rcv buffer in the UART

        (bit 18 is the Receive interrupt enable bit, but it needs to be touched even when not using interrupt-driven receive)

- endif

- re-select the CARD CRU address

- set bit 5 to block further inputs

 

         0 LIMI,
         R12 RPUSH,           \ save R12 on return stack  *Needed?*
         CARD @@ R12 MOV,     \ set base address of CARD
         TOS PUSH,            \ give us a new TOS register (R4)
         TOS CLR,             \ erase it
\  *** handshake hardware ON ***
         5 SBZ,               \ CARD CTS line LOW. You are clear to send
         UART @@ R12 ADD,     \ add UART, >1300+40 = CRU address
         21 TB,               \ test if char ready
         EQ IF,
             TOS 8 STCR,      \ read the char
             18 SBZ,          \ reset 9902 rcv buffer
             TOS SWPB,        \ shift char to other byte
         ENDIF,
\  *** handshake hardware off ***
         CARD @@ R12 MOV,     \ select card
         5 SBO,               \ CTS line HIGH. I am busy!
\  ******************************
         R12 RPOP,            \ restore old R12  *Needed?*
         2 LIMI,
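
For reference, a rough Editor/Assembler translation of the same sequence (CRDCRU, UARTOF and NOCHAR are invented names; CRDCRU would be the card base such as >1300, UARTOF the >40 displacement to the 9902, and the received byte ends up in R4):

       LIMI 0
       LI   R12,CRDCRU      select the card CRU base
       CLR  R4              clear the receive register
       SBZ  5               CTS low: clear to send
       AI   R12,UARTOF      point R12 at the 9902 UART
       TB   21              character waiting?
       JNE  NOCHAR          no character: skip the read
       STCR R4,8            read the byte (into the high byte of R4)
       SBZ  18              reset the 9902 receive buffer flag
       SWPB R4              move the byte to the low byte
NOCHAR LI   R12,CRDCRU      back to the card CRU base
       SBO  5               CTS high: busy, stop sending
       LIMI 2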

 

 

Also it is up to you to test DSR on receive. I chose not to, so that RCV could be as fast as possible.

When you are polling for serial data on a TMS9900 it's easy to miss characters. :) 

 

 

  • Like 1

Also worth pointing out that SBO, SBZ and TB can take negative displacements. So you can leave R12 pointing at the UART and use negative values to control the CTS line. The difference in CRU *addresses* is (from memory) twice the displacement in the instruction. So if R12 = >1340, you need SBO -25 to switch on the card LED at address >130E.

  • Like 3

1 hour ago, FarmerPotato said:

I was thinking about an interpreter for direct threading. [...]
 

I too believe the stack should grow downward. Maybe the return stack is fine to grow upward? 

Typically stacks grow downward. 

However, I have a DOS forth system here where the stacks are in a separate segment.

The return stack grows upward using BP and the DATA stack grows downward using SP, from the other end of the segment.

 

Edit: And so with the BIND instruction, a direct threaded system would have a one-instruction inner interpreter. Pretty cool.
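
A sketch of that one-instruction inner interpreter, with IP and W as in the earlier NEXT example (register assignments assumed):

* indirect threaded on a plain 9900 (three instructions, as shown earlier):
*      MOV  *IP+,W
*      MOV  *W+,R5
*      B    *R5
* direct threaded on a 99000: the cell at *IP is already the address of code,
* so the whole NEXT collapses to
NEXT   BIND *IP+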

 

  • Like 2

2 minutes ago, Stuart said:

Also worth pointing out that SBO, SBZ and TB can take negative values. So you can leave R12 pointing at the UART and use negative values to control the CTS line. [...]

Oh man. Now you've gone and done it.  Now I have to change my code. 

Thanks for that reminder Stuart. 

  • Like 3

I was thinking the same thing. It's only STCR and LDCR that must have the exact CRU base address in R12. The single-bit instructions take an eight-bit signed displacement, so in the range -128..+127. It's also true that the last bit of the CRU address doesn't count, so address >1300 and >1301 are the same. The reason for this is that although the TMS 9900 can address 65536 bytes of memory, it really addresses 32768 words of memory, then looks either right or left for the even or odd byte. So from a hardware point of view, there are only 15 address bits. The least significant bit doesn't exist physically, just logically. But CRU I/O is physical, so only even addresses are used there.

The single-bit displacement counts physical CRU bits, though, so any displacement is really multiplied by two when you look at it as an address.

With a base address of >1300 and a displacement of 5, you reach the bit at address >130A. If you have the base address >1340, you need to subtract 27*2 to get to >130A. So TB 5 with CRU base >1300 is the same as TB -27 with CRU base >1340.

 

As far as stacks go, I agree that a pure return stack can just as well grow upwards, since you normally never use the value at the top of the stack without also removing it; you want to do a branch with double indirection to return to your caller.

A stack growing towards lower addresses is more appropriate when you also do math on the stack top, and perhaps the value below.

Edited by apersson850
  • Like 1
