
Imagine a 32-bit 9900


FarmerPotato


Imagine how you might go about extending the 9900 family to 32-bit. What problems do you run into and what solutions do you propose? 
 

Yes, TI started over fresh with their 32-bit architectures like TMS320 and TMS340. (MSP430 is 16-bit again.) Except for the 340, these are still going strong, alongside TI’s 32-bit ARM offerings. 

Here, 32-bit means a 32-bit internal data path, with everything widened to match: registers, address space, arithmetic. 

 

I guess this 9900 would be designed in the 386 era? 
 

It must be 9995 compatible, and 99105 if you’re familiar with that.  99105 does have some 32-bit arithmetic and some 2 word opcodes.

 

Note: many 32-bit architectures pack two 16-bit instructions into a 32-bit word. 
 


The interesting thing about 32-bit architectures is that you can have 3-register operations. I learned this from the MIPS architecture; it took some time for me to drop the habit of copying register values before using them. For example, you may have an "add $2,$3,$4", which means adding registers 3 and 4 and storing the result in register 2. With a 32-bit instruction width, this allows for a register number width of 5 bits (32 registers) plus a lot of spare bits for opcode and other features.
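To make the field widths concrete, here is a small Python sketch that packs the "add $2,$3,$4" example into a MIPS R-type word. The helper name `encode_r_type` is made up for this illustration; the field layout (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code) is the standard MIPS32 R-type format.

```python
def encode_r_type(rs: int, rt: int, rd: int, funct: int, shamt: int = 0) -> int:
    """Pack opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    for field in (rs, rt, rd, shamt):
        if not 0 <= field < 32:
            raise ValueError("register and shift fields are 5 bits wide")
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | (funct & 0x3F)


FUNCT_ADD = 0x20  # function code of ADD in the MIPS32 encoding

# add $2,$3,$4: destination rd = 2, sources rs = 3 and rt = 4
word = encode_r_type(rs=3, rt=4, rd=2, funct=FUNCT_ADD)
print(f"{word:08X}")  # 00641020
```

Three register fields take 15 bits, which would not fit a 16-bit 9900 word next to an opcode; that is the room a 32-bit instruction width buys.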

 

Also, the immediate instructions of MIPS carry the immediate value in the lower 16 bits of the instruction.

 

The problem is that if you want to stay compatible with the 99xxx architecture, you cannot use all these things.


It would really benefit from a clean slate as far as the instruction set goes IMHO.

 

By that I mean opening up some of the bit fields to allow 32 registers in a workspace.

Post inc. and post dec. would be nice. ( *R1+  *R1- ) 

 

Of course there goes compatibility.

 

 



I like the idea of expanding the Ts and Td fields. I guess the assembly would look the same but produce entirely different binary. 


The 99105 builds 2-word opcodes on some unused 1-operand type instruction space. It uses the few available bits to define a subopcode, whose real fields for src and dst are in the 2nd word.  I feel “yuck” when I read this. 
 


 



Yup. Backwards compatibility is a straitjacket.

Oh. I forgot to mention.  Give me two stacks as well! :) 


6 minutes ago, GDMike said:

I think that TMS32010 is a 32 bit HOST type processor, meaning it can attach itself to the 9900?

You could attach anything to anything.
 

I see a possibility where the 99105 lets a co-processor handle an undefined instruction (like floating point) if the “Attached Processor” asserts the APP input.
 

If the 99105 hits an undefined instruction, and it sees APP raised, it momentarily outputs some helpful stuff like the current WP, then goes into a HOLD state. The Attached Processor presumably welcomes this and is able to read and write memory, including registers, while it’s in control. 

If the 99105 does not see an APP volunteer, it searches its Macrostore for a matching routine (i.e. the floating point library). See @pnr’s excellent disassembly of that in the 99110 mask ROM. 
 



Suppose there were a status bit for “16-bit”. A clean 32-bit CPU core could have a CALL instruction to execute code in a 64K compatibility space. Not sure how an RT would work to get back… but a smart loader could replace RTWP with an “RTWP from 16-bit mode”. Otherwise any 16-bit code would run forever…

 

The 99105 extends the RTWP instruction with bits for Return from Macrocode with or without accepting an interrupt before the next instruction. 


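Then, in a clean 32-bit instruction format, you have room for the things you want!

The MSP430 makes some nice use of addressing modes without increasing the field size in a 16-bit instruction. It does it by “stealing” a few registers to mean special things: the PC is R0, so autoincrement through it gives immediate operands, and R2 and R3 double as constant generators.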

 

The PDP-11 defined R6 as the stack pointer and R7 as the PC. So push and pop just worked on R6. And a branch was loading R7. It had 3 bits for addressing mode (so in 8 modes you get that pre-decrement and double indirection) plus 3 bits for register.

The PDP-11 instructions with two general operands, like ADD or MOV, are 4 bits of opcode plus (3+3) bits for source and (3+3) bits for destination.

That is quite similar to a 9900 instruction like A or MOV, which divides each 6-bit operand field into 2 bits for addressing mode and 4 bits for register: 4 + (2+4) + (2+4). 
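As an illustration of that 4 + (2+4) + (2+4) split, here is a Python sketch that pulls the fields out of a 9900 two-operand instruction word. The names `FormatI` and `decode_format_i` are invented for this example; the bit positions follow the documented 9900 layout, with the 4 opcode bits viewed as 3 bits of operation plus the byte flag.

```python
from typing import NamedTuple


class FormatI(NamedTuple):
    opcode: int  # 3 bits: 2=SZC, 3=S, 4=C, 5=A, 6=MOV, 7=SOC
    byte: int    # 1 bit: 0 = word operation, 1 = byte operation
    td: int      # 2 bits: destination addressing mode
    d: int       # 4 bits: destination register
    ts: int      # 2 bits: source addressing mode
    s: int       # 4 bits: source register


def decode_format_i(word: int) -> FormatI:
    """Split a 16-bit two-operand instruction into its six fields."""
    return FormatI(
        opcode=(word >> 13) & 0x7,
        byte=(word >> 12) & 0x1,
        td=(word >> 10) & 0x3,
        d=(word >> 6) & 0xF,
        ts=(word >> 4) & 0x3,
        s=word & 0xF,
    )


print(decode_format_i(0xC081))  # MOV R1,R2
print(decode_format_i(0xC03A))  # MOV *R10+,R0 (Ts = 3 is indirect autoincrement)
```

With only 2 mode bits there are four modes per operand, so there is no free code for pre-decrement or double indirect without widening the field.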

 

 


 


 

 


2 hours ago, apersson850 said:

It works better if you combine post-increment with pre-decrement. Then you can use the same pointer register to store/read values like you push/pop in a stack.

Though I suppose, if you were starting over, you'd just give it a stack. Or even two. If you have 32 x 32-bit registers then there's no reason why you couldn't double-purpose two of those registers as stack pointers.


With the TMS 9900, we can handle stacks with these instructions:

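SP	EQU	10		* R10 serves as the stack pointer

POP
	MOV	*SP+,R0		* read the top of stack, then SP = SP + 2

PUSH
	DECT	SP		* SP = SP - 2 to make room
	MOV	R0,*SP		* store R0 at the new top of stack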

Since we have 16 workspace registers to play with, we can implement several stacks at the same time by this method. If we want to.

However, if the CPU could not only post-increment (the address in the register is first used, then incremented), but also pre-decrement (the address is first decremented, then used) we could have simplified the code like this:

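SP	EQU	10		* R10 serves as the stack pointer

POP
	MOV	*SP+,R0		* read the top of stack, then SP = SP + 2

PUSH
	MOV	R0,*-SP		* hypothetical mode: SP = SP - 2, then store R0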

For this to work it must decrement before using the address, as the stack pointer normally should point to the value at the top of the stack.
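The ordering argument can be checked with a toy model. This Python sketch (class and method names are made up for the illustration) keeps a stack pointer in a byte-addressed memory and applies pre-decrement on push and post-increment on pop, so the pointer always rests on the top value.

```python
class Stack16:
    """Word stack growing downward; sp plays the role of R10 in the listing."""

    def __init__(self, top: int):
        self.mem = {}   # sparse memory: even byte address -> 16-bit word
        self.sp = top

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF     # pre-decrement: adjust first
        self.mem[self.sp] = value & 0xFFFF   # then store at the new top

    def pop(self) -> int:
        value = self.mem[self.sp]            # post-increment: read first
        self.sp = (self.sp + 2) & 0xFFFF     # then adjust
        return value


# Two independent stacks, e.g. a data stack and a return stack as in Forth
data, ret = Stack16(0x8400), Stack16(0x8300)
data.push(1)
data.push(2)
ret.push(0x6004)
print(data.pop(), data.pop(), hex(ret.pop()))  # 2 1 0x6004
```

Because any register can hold the pointer, a second stack costs nothing more than a second register.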

 

This is a more flexible concept than just having a stack like in older CPU designs. For branch instructions, you could pre-define a register to be used as stack pointer, or you could include the stack pointer in the branch instruction. Thus you can easily define a substack, for example for a concurrent process, or if you want to separate data- and subroutine stacks (like in Forth), or if you want to traverse an activation record on a stack, without destroying the real stack pointer.


One place where this could be relevant is for the F18A GPU. When I experimented with raycasting using the GPU, it would have been great to be able to multiply (signed) 32-bit numbers, and in order to access the extended RAM of the MKII, some instructions working on 32-bit addresses would also be useful. However, I don't think this is valuable enough to hold back the release of the MKII. 

 

 

 

 



So you need 32bit X 32bit -> 64 bit result?

 

Could you make do with a 32bit X 16bit -> 32bit result?


3 hours ago, Asmusr said:

Yes and maybe. But 32bit x 16bit = 48bit.

Yes 48 bits.

I was thinking of "mixed" operations that Forth has.  There is a mixed multiplication operator (M*)  that does this but when I looked at the source for the one I have it is written in Forth.

And although it makes use of the 9900 MPY to get a 32 bit result, applying the sign is rather involved, which I now get is why you want 32x32 "signed" multiplication in hardware. :) 
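To show why the sign handling is the fiddly part, here is a Python sketch of a signed 32 x 32 multiply built only from an unsigned 16 x 16 -> 32 step, which is what the 9900 MPY provides. The function names are invented for this example, and the approach (multiply the magnitudes, then negate if the signs differ) is one common way to do it, not a transcription of any particular Forth M* source.

```python
MASK16 = 0xFFFF


def mpy16(a: int, b: int) -> int:
    """Unsigned 16 x 16 -> 32, standing in for the 9900 MPY instruction."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def umul32(a: int, b: int) -> int:
    """Unsigned 32 x 32 -> 64 from four 16-bit partial products."""
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16
    cross = mpy16(a_hi, b_lo) + mpy16(a_lo, b_hi)
    return (mpy16(a_hi, b_hi) << 32) + (cross << 16) + mpy16(a_lo, b_lo)


def smul32(a: int, b: int) -> int:
    """Signed 32 x 32 -> 64: multiply magnitudes, then apply the sign."""
    for operand in (a, b):
        if not -(1 << 31) <= operand < (1 << 31):
            raise ValueError("operand does not fit in 32 signed bits")
    negative = (a < 0) != (b < 0)
    product = umul32(abs(a), abs(b))
    return -product if negative else product


print(smul32(-3, 7))              # -21
print(smul32(123456789, -987654)) # same as 123456789 * -987654
```

On real 16-bit hardware each `abs`, the carry propagation between partial products, and the final negate of a 64-bit value are all multi-instruction sequences, which is the overhead a hardware signed multiply would remove.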

 

 

 

 


9 hours ago, apersson850 said:

 

This is a more flexible concept than just having a stack like in older CPU designs. For branch instructions, you could pre-define a register to be used as stack pointer, or you could include the stack pointer in the branch instruction. Thus you can easily define a substack, for example for a concurrent process, or if you want to separate data- and subroutine stacks (like in Forth), or if you want to traverse an activation record on a stack, without destroying the real stack pointer.

 

I would wish for addressing modes to cover pre-decrement, as well as double-indirect. These can implement call and return instructions.

 

-R1      pre-decrement

**R1    double indirect

 

B *(*TOS+)  double indirect, return from subroutine

 

99000 Instructions

 

BLSK, branch and link to stack, is a 99000 instruction.  It does the pre-decrement of SP and pushes the return address onto the stack.   It is unique in that way. But it only takes immediate mode arguments like LI! It is nestled into the instruction set next to LST and LWP.

 

BIND, branch indirect, does double-indirect. It is the only 99000 instruction to have that mode. 

 

I had trouble understanding these at first. 

 

So:

 

SP     EQU 10
       BLSK SP,SUBP    * branch to SUBP, push return address onto stack, like *-SP
NEXT   EQU $           * the return address that BLSK pushed

...

SUBP   BIND *SP+       * branch to return address, popped from stack, like MOV *SP+,R0 and B *R0

BIND is also a CASE construct, since it has general addressing.

       A    R1,R1          * double the case number to get a word offset
       BIND @TBL(R1)       * PC = word fetched from TBL + R1
TBL    DATA CASE0,CASE1,CASE2,CASE3

       BIND R1         * PC = contents of R1
is the same as
       B    *R1        * PC = contents of R1
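For anyone else who finds these two hard to picture, here is a toy Python model of the behavior described above. The class `Cpu` and its method names are made up for the illustration, and it only tracks PC, one stack pointer and a sparse memory. It assumes `pc` holds the address of the BLSK instruction itself and that BLSK occupies two words (opcode plus immediate), so the return address is `pc + 4`.

```python
class Cpu:
    def __init__(self, pc: int, sp: int):
        self.mem = {}   # even byte address -> 16-bit word
        self.pc = pc
        self.sp = sp    # stands in for the register named in BLSK/BIND

    def blsk(self, target: int) -> None:
        """BLSK SP,target: pre-decrement SP, push return address, branch."""
        return_address = (self.pc + 4) & 0xFFFF
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self.sp] = return_address
        self.pc = target

    def bind_pop(self) -> None:
        """BIND *SP+: load PC from the word SP points to, then SP += 2."""
        self.pc = self.mem[self.sp]
        self.sp = (self.sp + 2) & 0xFFFF

    def bind_table(self, table: int, index: int) -> None:
        """A R1,R1 then BIND @TBL(R1): jump through a table of addresses."""
        self.pc = self.mem[(table + 2 * index) & 0xFFFF]


cpu = Cpu(pc=0x6000, sp=0x8400)
cpu.blsk(0x7000)      # call: PC = 0x7000, 0x6004 pushed at 0x83FE
cpu.bind_pop()        # return: PC = 0x6004, SP back at 0x8400
print(hex(cpu.pc), hex(cpu.sp))  # 0x6004 0x8400
```

The extra memory read is the whole difference from B: B jumps to the operand's address, BIND jumps to the word stored there.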

 

Orthogonal instructions?

 

9900 has 2 general operands for major operations (Move, Add, SOC, etc) but the cases without are frustrating. "Orthogonal" would mean that every instruction gets the same opportunity (including Immediate mode for MOV etc.) 

 

Are these holes part of our love for the 9900? Would fixing them just make it more like any other processor?

 

 


I recommend you take a look at Digital's VAX 11 instruction set. I'd say that the VAX 11/750 is the best CPU I've ever written assembly language programs for. There is a deferred mode, where you can access data from the address in a longword (their name for a 32 bit entity) which is pointed to by a register.

The assembler for the UCSD p-system was the first I had for the TI that could handle macros. There I implemented the same idea, i.e. Branch with Link to Stack and ReTurn with Link from Stack (BLS and RTLS).

 

The TMS 9900 instructions with two general addresses have only four bits remaining for the opcode in the word. One is used to flag byte/word operation, so three bits remain for the instruction itself. Of these eight possible values, they could implement six instructions (A, S, MOV, C, SZC and SOC). The remaining two values flag all other instructions.

They did a good job within the framework of a 16 bit CPU, I'd say.

Edited by apersson850

24 minutes ago, FarmerPotato said:

@apersson850 thanks for the pointer to VAX. I’ve never yet looked at that instruction set. I used VMS quite a bit in college but only in Pascal. 

I know your pain... That and Cobol... we also had Vax at our college.

There was also Rally SQL on ours where just modifying a label on a screen you designed meant that you had time to go eat lunch and return before it finished rebuilding the database...

 

On the other hand I did introduce a version of Tetris for the Vax on our system. I had found it on one of those PD CD compilations. It was badly coded but worked, albeit while taking about 80% of the processor time, which made our lab of 16 VT terminals crawl...


Ok, beans spilling time. 
 

I came up with a scheme mixing 16-bit and 32-bit instructions.

 

Not like the 99000 does though, where the first word is nearly just a stub and needs a second word.  

 

A 32-bit instruction fills the whole longword. It has wider bit fields, most expanded by one bit. 
 

So Ts/Td is 3 bits and allows 4 new addressing modes. Register is 5 bits allowing R16-31. B is 2 bits, allowing byte, word, and long word modes. 

There are 2 CPU modes. 
 

A status register bit is defined to mean 32 bit mode. In this mode, instruction acquisition grabs 32 bits, which should (must?) be longword aligned (or kill your bus performance.) 

 

In 16-bit mode, the 16-bit instructions are the exact same code as 99xxx, but are “expanded” in the CPU pipeline with 0s in any new fields. They are limited to accessing the local 64K of memory and 16 registers. All immediate operands and addresses are as before, 16-bit. Identical to 9900.
 

For nostalgia, 32-bit instructions have the same first 16 bits as their 16-bit counterparts, and have the same function. The new bits are in the second word. So you can kind of disassemble the binary and pencil in the new bits after. 
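A rough Python sketch of how such a two-word instruction might decode. The post does not fix where the new bits sit in the second word, so the layout here is purely an assumption made up for the example: bit 0 extends S, bit 1 extends Ts, bit 2 extends D, bit 3 extends Td, bit 4 extends B. The function name `widen` and the size codes are likewise hypothetical.

```python
def widen(word16: int, ext: int = 0) -> dict:
    """Merge a 9900 two-operand word with a hypothetical extension word.

    With ext = 0 every field equals its 16-bit value, which models
    expanding a 16-bit instruction with zeros in the new fields.
    """
    return {
        "opcode": (word16 >> 13) & 0x7,
        # assumed size codes: 0 = word, 1 = byte, 2 = longword
        "size": (((ext >> 4) & 1) << 1) | ((word16 >> 12) & 1),
        "td": (((ext >> 3) & 1) << 2) | ((word16 >> 10) & 0x3),  # 3 bits
        "d": (((ext >> 2) & 1) << 4) | ((word16 >> 6) & 0xF),    # 5 bits
        "ts": (((ext >> 1) & 1) << 2) | ((word16 >> 4) & 0x3),   # 3 bits
        "s": ((ext & 1) << 4) | (word16 & 0xF),                  # 5 bits
    }


print(widen(0xC081))           # MOV R1,R2 exactly as on the 9900
print(widen(0xC081, 0b00101))  # same first word, now R17 -> R18
print(widen(0xC081, 0b10000))  # same registers, longword size
```

Keeping the first word untouched is what lets a 9900 disassembler read the mnemonic and low register bits, with the second word only adding to them.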

The mode is switched from 16 to 32 by a trivial 16-bit instruction in unused (MID) space. From 16-bit mode, you assemble a MODE32 macro, which fills in a NOP first if needed to make the next instruction 32-bit aligned. A similar MODE16 macro writes the instruction to switch back to 16-bit mode.
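The padding rule for the MODE32 macro can be sketched in a few lines of Python. `NOP` is the usual 9900 NOP encoding (JMP $+2); `MODE32_OP` is an arbitrary placeholder value, not a claim about which opcode is actually free, and the function name is made up for the example.

```python
from typing import List

NOP = 0x1000        # JMP $+2, the conventional 9900 NOP
MODE32_OP = 0x0C00  # placeholder only: the real switch opcode is undecided


def emit_mode32(address: int) -> List[int]:
    """Words the MODE32 macro would emit at the given byte address.

    The switch instruction is one 16-bit word, so it must sit at an
    address that is 2 mod 4 for the following instruction to land on
    a 32-bit boundary. A NOP is inserted first when it would not.
    """
    if address % 2:
        raise ValueError("9900 instructions are word aligned")
    words = []
    if address % 4 == 0:
        words.append(NOP)
    words.append(MODE32_OP)
    return words


for addr in (0x6000, 0x6002):
    words = emit_mode32(addr)
    print(hex(addr), [hex(w) for w in words], hex(addr + 2 * len(words)))
```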
 

One benefit: you can have subroutines in 16-bit instructions, which use less memory and therefore execute faster. 


Instructions in 32-bit mode can operate on registers as 8,16, or 32 bit quantities.

The assembler should have mnemonics for the three. 
16-bit was AB, A.

32-bit could be AB, A, AL. 
99000 had AM for add multiple (32 bit) 

 

 (kind of yucky but ambiguity must be avoided and I hate AM,SM,SRAM,SLAM mnemonics for the 99110 32-bit arithmetic. )

 

Registers R0-31 are just word offsets in a 64-byte workspace. A single register is still 16 bits. If you use R1 in a 32-bit MOV.L, you are moving R1 and R2. 
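In Python terms, the register-to-memory mapping would look something like this. The function names and the "B"/"W"/"L" size letters are invented for the sketch; which of the two registers holds the high word of a longword is not stated in the post and is left open here.

```python
from typing import List


def register_address(wp: int, n: int) -> int:
    """Byte address of Rn in a 32-register workspace starting at wp."""
    if not 0 <= n <= 31:
        raise ValueError("register number must be 0..31")
    return wp + 2 * n


def registers_touched(n: int, size: str) -> List[int]:
    """Registers covered by an operand at Rn of size 'B', 'W' or 'L'."""
    if size not in ("B", "W", "L"):
        raise ValueError("size must be 'B', 'W' or 'L'")
    count = 2 if size == "L" else 1
    if n < 0 or n + count > 32:
        raise ValueError("operand runs past the end of the workspace")
    return list(range(n, n + count))


print(hex(register_address(0x8300, 1)))  # 0x8302
print(registers_touched(1, "L"))         # [1, 2]
```

One consequence is that a longword at R31 has no second register to spill into, so the assembler or the CPU would need a rule for that case.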
 

I toyed with the idea of making R16-31 always indicate a 32-bit operation but ambiguity causes havoc.
 

Maybe the assembler could call them N0 to N15 and convert the opcode's B field to a longword mode.  Writing N9.H or N9.L would leave the instruction in word mode. And put 16+4 in the Ts/Td field. 

 


 


 

 

I’m not sure if the PC and WP should have a “shadow” upper word. Certainly they must point to memory in a 32-bit address space (4 gigabytes) but then what do they do when you switch to MODE16? Probably continue to work as 32 bit addresses with the top half hidden,  but STWP and B are weird.
 

Maybe 16-bit addresses just refer to a 64K page around the present location of the PC. The top word of PC is hidden.

 

And B, or any pointer really, can’t escape that boundary. The assembler flags an error if you write a reference to a label that is too far away, and a loader loading relocatable code errors out if a 16-bit reference would cross a boundary and be out of range. Probably, you’d want an assembly directive to start a new 64K-aligned relocatable address.  (Maybe the unused PSEG directive.)
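A minimal Python sketch of that page rule, reading "64K page" as the aligned page selected by the hidden top word of the PC. Both function names are made up for the example, and the range check is the kind of test an assembler or loader could apply to each 16-bit reference.

```python
def effective_address(pc32: int, addr16: int) -> int:
    """Resolve a 16-bit address inside the 64K page the PC is in."""
    return (pc32 & 0xFFFF0000) | (addr16 & 0xFFFF)


def reachable(pc32: int, target32: int) -> bool:
    """True if 16-bit code at pc32 can refer to target32 at all."""
    return (pc32 >> 16) == (target32 >> 16)


pc = 0x00123456
print(hex(effective_address(pc, 0xA000)))  # 0x12a000
print(reachable(pc, 0x0012FFFE))           # True: same page
print(reachable(pc, 0x00130000))           # False: next page, flag an error
```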

 

The loader would error out on illegal combinations of REF/DEF. 
 

BLWP/RTWP could use N13-N15 when executed in 32-bit mode. Calls from 32-bit sequences *to* 16 bit sequences might be disallowed by the assembler, or fixed up with a mode switch instruction, but returning would be impossible unless the  return address would fit in 16 bits of R13.
 

Register accesses from 16-bit code would invisibly refer to the 32-bit caller’s WP, but lots of bugs would ensue if that workspace, set up during 32-bit mode, spanned a 64K page boundary. 
 

I’ve been inspired by TI’s NuBus lately. The only NuBus systems historically were:

 

TI’s Explorer (based on NuMachine)

the TI business system 1500 (Unix)

Macintosh II

NeXT workstation

 

 

NuBus specifies how 8,16, and 32 bit bus accesses work together. There is no need for read-before-write. There are two extra bus bits, one is R/W but the other combines with the lower two address bits to encode the transfer size. Which is either 32 bits, the top/bottom 16 bits, or one of 4 bytes. 
 

Recall how earlier I mentioned 1 extra bit in the Byte field indicator.

 

This maps exactly to the NuBus transfer size bits. And it leaves an unused mode… in NuBus that is a block move of 1-16 bytes. Hmm. 

This is all just wild dreaming. But it could all be done in an FPGA and run at 100 MHz. Still, who but us 9900 fans would want such a thing?

 

 

 

 


 

 

 


 

 

Edited by FarmerPotato

1 hour ago, FarmerPotato said:

This is all just wild dreaming. But it could all be done in an FPGA and run at 100 MHz. Still, who but us 9900 fans would want such a thing?

Haha, that's the easiest of all your questions to answer.

Maybe not even all of us. My interest today in assembly and "close to the hardware" programming I get enough of at work. Or with the old TI 99/4A, which has the advantage that it does not run at many Megahertz, or even more. Thus a simple hardware modification, like a wire from A to B, means that the signal goes from A to B, not from A up in the air.

 

But the exercise can be fun, though. I've been involved in one project in my lifetime, where we designed a new CPU from scratch. I was leading the micro programming team, or rather the part of it that coded everything but floating point operations. It was an interesting experience.
