TMS9900 assembly language tricks

+TheBF · July 12, 2022

5 hours ago, apersson850 said:

I think Lee meant that the value in R4 has to be 00XX in hex, or XX00. If you had used 199 hex in your example, it would have added 0199 to 9901, with the result 9A9A instead.

Indeed.
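To make the arithmetic in the quote concrete, here is a small sketch. It assumes the byte is doubled by adding a byte-swapped copy of R4 to itself, which is what the 0199 + 9901 sum suggests; the register choice (R1 as scratch) is only for illustration and is not taken from the original routine.

```
* Assumption: the byte is doubled by adding a byte-swapped copy.
       LI   R4,>0199       not a byte value: both halves are non-zero
       MOV  R4,R1          R1 = >0199
       SWPB R1             R1 = >9901
       A    R1,R4          R4 = >0199 + >9901 = >9A9A, not a doubled byte

       LI   R4,>0099       a proper 00XX value
       MOV  R4,R1          R1 = >0099
       SWPB R1             R1 = >9900
       A    R1,R4          R4 = >0099 + >9900 = >9999, the byte in both halves
```

The add only behaves like a copy when one half of R4 is zero, because then nothing can carry between the two bytes.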

The stack diagram for FILLW ( addr len c -- ) means that the top argument is a character.

I know I know. It's just a comment. Not enforced by the compiler like certain other languages that begin with the letter 'P'.

That is all the warning you get to make sure the argument is no greater than 255.

If that is troublesome you could always add runtime checking yourself.

: FILLW   ( addr len c -- )
    \ abort if any bit in the upper byte of c is set (FF00 is hex)
    DUP FF00 AND ABORT" You dummy! I told you the top argument is a character"
    FILLW   \ the original FILLW does the work
;

Edited July 13, 2022 by TheBF
Updated code: mask was reversed.

apersson850 · July 12, 2022

A few posts back we were at assembly level, and there everything is game.

+TheBF · July 13, 2022

3 hours ago, apersson850 said:

A few posts back we were at assembly level, and there everything is game.

Things got a bit blurry I guess because the Assembler code was a finished Forth word.

The Forth routine lets you put a number directly into R4, like Assembler code so "everything is game" as well.

The difference, I suppose, is that if you use the correct operators for byte reads from memory or use a byte literal in your code, R4 will always be clear on lower bit side.

apersson850 · July 13, 2022

It doesn't matter if R4 contains a byte value per the TMS 9900 definition (to the left), or a 16-bit value not larger than 255 (to the right). As long as 8 bits are zero at one end, it will work.

From that point of view, the sequence

STWP R1

MOVB R4,@9(R1)

is better, since it will only duplicate the left byte in R4. The right byte doesn't have to be zero for this to work.

STWP R1

MOVB @9(R1),R4

will of course duplicate the right byte.

But it is at the expense of using one register for a temporary value. Now you frequently have such a register available, so that may not be too big a disadvantage.
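A commented sketch of the first sequence, for anyone following along. The starting value in R4 is made up for the example; the offsets follow from R4 occupying the bytes at WP+8 (left) and WP+9 (right).

```
* Duplicate the left byte of R4 into its right byte.
       LI   R4,>41FF       left byte >41, right byte can be anything
       STWP R1             R1 = address of the workspace (where R0 lives)
       MOVB R4,@9(R1)      left byte of R4 -> WP+9, the right byte of R4
*                          R4 is now >4141
```

R1 is the temporary register mentioned above; any free register will do.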

apersson850 · July 14, 2022

On 9/8/2010 at 5:20 AM, insomnia said:

By using the BLWP instruction, you can have overlapping workspaces between the caller and callee.

The standard procedure is to access the caller's registers via R13 in the called procedure's workspace.

MOV @8(R13),R2

will bring the content of the caller's R4 to the called code's R2. More flexible but slower.

Likewise,

MOV *R14+,R2

will bring data following the BLWP instruction in memory into R2 in the subroutine. At the same time, it will advance the return address in R14, so we return to the code after the data.

This trick you can do with BL too, but you have to do a

MOV *R11+,R2

of course.

This can be used to pass parameters to subroutines. Let's say you have a routine which reads keypresses from the keyboard.

BL @KEYPRESS

DATA NULLINPUT

DATA INVALID

is a code sample where the two data items after the call instruction tell the routine what to do if the user presses just ENTER, and what to do if the user enters something that's not valid, like something that isn't a number, for example.

In all cases, the subroutine must increment R11 by four to return properly, regardless of whether it really needs the data after the BL instruction or not.
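A rough sketch of what the called side could look like. The labels are shortened to six characters, and the result codes in R0 and the label GOTNUM are invented for the example; only the MOV *R11+ fetches and the return through R11 come from the description above.

```
       BL   @KEYPRS        R11 = address of the first DATA word
       DATA NULINP         where to go if only ENTER was pressed
       DATA INVALD         where to go if the input was not valid
       JMP  GOTNUM         a normal return lands here

KEYPRS MOV  *R11+,R2       R2 = NULINP, R11 now points at the second DATA
       MOV  *R11+,R3       R3 = INVALD, R11 now points at the JMP
*      (read and check the input here; result code assumed in R0)
       CI   R0,1
       JEQ  KPNULL         1 = empty input (hypothetical code)
       CI   R0,2
       JEQ  KPBAD          2 = invalid input (hypothetical code)
       RT                  same as B *R11: back to the caller
KPNULL B    *R2            leave through the caller's empty-input handler
KPBAD  B    *R3            leave through the caller's invalid-input handler
```

Fetching both words with *R11+ does the "increment R11 by four" automatically. A routine that ignores the data would need an explicit C *R11+,*R11+ or AI R11,4 before RT.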

A similar trick, but in the other direction, is that since R15 in a code called by BLWP will hold a copy of the status register, a copy that's restored to ST on return, you can pass a message there too. By manipulating R15, you can for example change the EQ bit before returning. Thus you can call a subroutine which sets up test conditions before returning, and make a call like this

BLWP @SUBVECTOR

JNE BADDATA

Here the subroutine is supposed to come back with the EQ bit set if everything is OK. If not, you want to handle that the data wasn't correct.
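A minimal sketch of the subroutine side, assuming EQ is status bit 2 (mask >2000). The labels and the "zero means bad data" test are made up for illustration.

```
SUBWS  BSS  32             workspace for the subroutine (hypothetical label)
SUBVEC DATA SUBWS          BLWP vector: new workspace pointer
       DATA SUBENT         BLWP vector: entry point

SUBENT MOV  @8(R13),R2     fetch the caller's R4 through the old WP in R13
       JEQ  SUBBAD         made-up check: a zero value counts as bad data
       ORI  R15,>2000      set EQ in the saved status word
       RTWP                ST is restored from R15: caller's JNE is not taken
SUBBAD ANDI R15,>DFFF      clear EQ in the saved status word
       RTWP                caller's JNE BADDATA is taken
```

The flags set by the subroutine's own instructions are thrown away by RTWP, so the only way to hand a condition back is to edit the copy in R15.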

Edited July 14, 2022 by apersson850

JasonACT · July 24, 2023

On 9/5/2010 at 7:15 PM, retroclouds said:

I'd like to use this thread for collecting any cool tricks you can do in TMS9900 assembly language.

With "trick" I mean optimize a statement for speed and/or size.

Or just do something with an instruction you didn't think was possible at all.

So how about it, any cool tricks you wanna share ?

I'll kick it off with a little trick I found on Thierry's page:

Quote

C instruction

Apart from comparison, this instruction can also be used to increment a register by four:

C *Rx+,*Rx+

This uses only one word of memory as opposed to the equivalent:

INCT Rx

INCT Rx

Note that the corresponding CB instruction would increment the register by two, but there is no advantage over a plain vanilla INCT in this case.

This is such a bad idea... My latest blunder [for my Pi Pico PEB board] actually sees this for what it is in my logic analyser, as TI-XB does have a couple of places that use this trick, and it seems the numbers being incremented by 4 are quite small - they end up reading LOW-ROM locations - which is fast memory on the TI, so not so bad - except where I screwed up in my Pico C code.

Indiscriminate use will put you in 8 bit memory though (a much slower space - making the increment... s...l...o...w...) and if you happen to be within those areas where the VDP read or GROM read addresses are, then you will throw them out of sync by auto-incrementing their addresses. Worse still, you may lock up the console with indiscriminate fast-access to those ports (TI says use NOPs between those accesses, which needs to be added for the faster 16 bit memory code, I know because I was able to often crash my TI with 32KB on the 16 bit bus back when I was 15 years old and had hacked quite a lot of 74LS chips inside the console).

While someone else woke this thread though, from 2010, I still offer my apology.

apersson850 · July 25, 2023

But INCT at a memory mapped address, like VDP read, isn't too clever either. Doesn't really matter if you use that or Compare.

JasonACT · July 25, 2023

15 minutes ago, apersson850 said:

But INCT at a memory mapped address, like VDP read, isn't too clever either. Doesn't really matter if you use that or Compare.

I'm not sure we're talking about the same thing here. You wouldn't map your work-space registers in the VDP read area.

JasonACT · July 25, 2023

On the other hand, yes, you're quite right.

+TheBF · July 25, 2023

2 hours ago, JasonACT said:

I'm not sure we're talking about the same thing here. You wouldn't map your work-space registers in the VDP read area.

Well... I think I read that here somewhere, where a person did that for fast writes to the ports.

Now I have to try it and see if it works.

apersson850 · July 26, 2023

Yes, you can do that, but you have to be aware that some of these ports aren't fully decoded, so they are recurring at more than one address. You have to verify that they don't show up again within the same workspace.

But my original comment wasn't about mapping the workspace over memory mapped ports, as the instruction used the addressing mode *Rx+, which is indirect and thus may reach anywhere outside the workspace.

JasonACT · July 26, 2023

21 hours ago, apersson850 said:

But INCT at a memory mapped address, like VDP read, isn't too clever either. Doesn't really matter if you use that or Compare.

Well, I think the VDP is much safer than GROM - GROM defaults the READY signal to pause the '9900 until it's in a state to respond. Fairly sure the VDP doesn't. So given R1 = >9802 and you do a double indirect + increment x2 read quickly, you're asking for the GROM address byte and data byte in fast sequence (because, as you say, addresses are not fully decoded). That's the sort of thing I was pointing out as being a bit nasty.

Without the workspace being mapped over memory mapped ports, INCT at that port will do a read, inc, write (along with an inc[t] of the R) - however TI made the read port ignore writes, and write port ignore reads... So I'd need an example to understand how this is quite as bad as a locked up console with the GROMs.

Edited July 26, 2023 by JasonACT

+mizapf · July 26, 2023

11 minutes ago, JasonACT said:

Well, I think the VDP is much safer than GROM - GROM defaults the READY signal to pause the '9900 until it's in a state to respond. Fairly sure the VDP doesn't.

There is no READY line from the VDP to the CPU. Would have made a lot of things easier.

senior_falcon · July 26, 2023

On 7/24/2023 at 7:06 AM, JasonACT said:

This is such a bad idea... My latest blunder [for my Pi Pico PEB board] actually sees this for what it is in my logic analyser, as TI-XB does have a couple of places that use this trick, and it seems the numbers being incremented by 4 are quite small - they end up reading LOW-ROM locations - which is fast memory on the TI, so not so bad - except where I screwed up in my Pico C code.

Indiscriminate use will put you in 8 bit memory though (a much slower space - making the increment... s...l...o...w...) and if you happen to be within those areas where the VDP read or GROM read addresses are, then you will throw them out of sync by auto-incrementing their addresses. Worse still, you may lock up the console with indiscriminate fast-access to those ports (TI says use NOPs between those accesses, which needs to be added for the faster 16 bit memory code, I know because I was able to often crash my TI with 32KB on the 16 bit bus back when I was 15 years old and had hacked quite a lot of 74LS chips inside the console).

It took a number of readings before I understood what JasonACT's objection is.

As an example, let's say you wanted to prepare a section of memory in advance. We will make >B000 = >8800, >B002 = >8804, >B004 = >8808 and so on.

       LI   R1,>B000
       LI   R2,>8800
LOOP   MOV  R2,*R1+
       C    *R2+,*R2+      add 4 to R2
       JMP  LOOP

I think if R2 is anything from >4000 to >5FFF or from >8000 to >9FFF there could be trouble.

Whenever I have used this trick it has been to adjust a pointer in my code so it points to a different place in the code, and for that application there should be no problems.
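For comparison, here are the three ways of adding 4 to R2 side by side, as a sketch. The sizes are in instruction words; the note about memory reads is the point being argued in this thread.

```
* One word, but reads the two words that R2 points at along the way:
       C    *R2+,*R2+      compares (R2) with (R2+2), R2 = R2 + 4

* Two words, only the workspace register is touched:
       INCT R2             R2 = R2 + 2
       INCT R2             R2 = R2 + 2

* Also two words, only the workspace register is touched:
       AI   R2,4           R2 = R2 + 4
```

So the compare trick saves one word of code, at the price of two memory reads wherever R2 happens to point.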

apersson850 · July 26, 2023

Yes, he's worried that, as a side effect, the Compare instruction will read values from the addresses pointed to by the register being incremented. It has to, or it can't compare them. Now we don't care about the values read, as we don't care about the outcome of the compare instruction, but it's true that if you "compare" the VDP READ DATA address, then you'll increment the internal VDP address pointer, as it doesn't see any difference between a collateral data read and a real one.

He's also worried that when you start pointing into 8 bit memory segments, you'll have a penalty of 2*4 wait states to read the two values we never use anyway. Which means dual INCT may be comparable in speed, although still not in memory usage for the code itself.

All these worries are relevant, but as long as we don't count above 8192 (decimal), we hit the console's ROM on the 16 bit bus, and that's neither slow nor vulnerable to being read. Up to 16384 it's slow, but not vulnerable, since it's RAM. In the DSR space it doesn't matter, since normally that memory would not be enabled unless it's a DSR call.

Incrementing the VDP address may or may not cause any issue. It depends entirely on what you do with it before you exchange data with the VDP next time. Also, it's not the case that the VDP will blow up if you read data too fast. It will just return the wrong data.

Edited July 26, 2023 by apersson850

Willsy · July 26, 2023

Meh. It's assembly. You're free to screw up in any way you like! You're in control. And with that comes responsibility. I'm sure I use this trick in TF somewhere to save two bytes. As a space saver it's a useful technique when bytes matter more than speed. Meh.

+TheBF · July 26, 2023

1 hour ago, Willsy said:

Meh. It's assembly. You're free to screw up in any way you like! You're in control. And with that comes responsibility. I'm sure I use this trick in TF somewhere to save two bytes. As a space saver it's a useful technique when bytes matter more than speed. Meh.

Reminds me of the old American vaudeville joke:

Patient: "Doctor, it hurts when I do this."

Doctor: "Then don't do that!"

apersson850 · July 26, 2023

"Don't do that" is one of the best advice you can get for assembly programs.

The lower the level, the more important the structure becomes.

JasonACT · July 27, 2023

Yes... Don't do this:

        AORG >A000

        LIMI 0
        BLWP @vect
vect    DATA >9C00
        DATA here
here    BLWP @>0000

        END

EA5 file attached, safe for emulators, not so safe for real hardware... (I don't think it could permanently damage the console, I of course ran it a few times.)


Tursi · July 27, 2023

58 minutes ago, JasonACT said:
Yes... Don't do this:
        AORG >A000

        LIMI 0
        BLWP @vect
vect    DATA >9C00
        DATA here
here    BLWP @>0000

        END
EA5 file attached, safe for emulators, not so safe for real hardware... (I don't think it could permanently damage the console, I of course ran it a few times.)


It won't do anything harmful.

Let's look at what happens here:

LIMI 0 - sets the interrupt mask to 0, disabling interrupts. Only affects the status register.

BLWP @vect - BLWP does the following: Reads the new workspace from the vector, stores the workspace pointer, PC, and status register into the new workspace, activates the new workspace, reads the program counter from the vector, and resumes execution at the new PC.

So the WP at vect is >9C00 - this is the GROM Write Data address, with the Write Address port at >9C02. GROMs are incompletely decoded and will respond this way within the 1k range >9C00 - >9FFF.

This is what a bus cycle looks like in the 99/4A writing to the GROM Write Data address:

GROM Access: 28 cycles (2 cycles memory access, 4 cycles wait state, 22 cycles GROM hold)

0-2 Dummy LSB cycle (3 cycles)

3 Beginning of MSB cycle, release of multiplexer hold, start of GROM chip hold (1 cycle)

4-25 GROM hold (22 cycles)

26-27 Completion of regular memory cycle (2 cycles)

So the new WP is >9C00, meaning the CPU will write the old workspace pointer to >9C1A. Since bit >0002 is set, this is a write-address operation. We don't know the old workspace pointer, but the first address write to GROM (LSB) will occur. The write to >9C1B is ignored by the hardware.

Then the PC is written to >9C1C. Bit >0002 is clear, so this will be a data write. No data is written, but the GROM will treat it as a data access, increment its internal address, and perform a data prefetch. This means the internal address latch is reset. Again, the write to >9C1D is ignored.

Then the ST is written to >9C1E. Bit >0002 is set again, so this will be an address write. Again, the unknown value is written to the GROM LSB and treated as the first write. The write to >9C1F is ignored.

Next, the PC is loaded with 'here', and the code resumes at that address (which in this case is the next address). Classic99, from the test point I started at (and it would depend entirely on how you loaded the code), changed the GROM address to >84C0.

'here' contains BLWP @>0000, which goes through the BLWP sequence again. This time, though, the new WP is >83E0, and the new PC is >0024 -- both normal addresses. Since no registers have been read, the only access to the WP at >9C00 was the BLWP storing data from the initial BLWP. Now, you might think GROM is left in an undefined state, but the console startup code expects this (because it is also the case at power up), and does a dummy read from GROMs before doing anything else with them.

There is some concern and myth in a lot of the early documentation: concern about damaging GROM by hitting it too hard, or locking up the system by reading from the sound chip. But I've never seen a case on any console I've tested on where it seems to bear any fruit. (The sound chip one is easily debunked - the sound chip never even gets a chip select on reads.) It's very difficult to overrun the VDP on the 99/4A, contrary to all the warnings in the documentation (which, of course, would have been more useful if faster sequels to the hardware had come out), and the only hardware lockup I'm aware of that is possible to do from software (which I have not tested yet) is to do a recursive 'X' instruction; according to Barry Boone even a LOAD interrupt can't break it. (For instance: li r4,>0484 followed by x r4, where >0484 is the opcode for X R4.) But even that won't harm the hardware.

Edited July 27, 2023 by Tursi

Tursi · July 27, 2023

For what it's worth, I do know two ways to break "BLWP @>0000", and they should break hardware reset the same way if you have a button wired up.

The first is to call it with console interrupts enabled, and a user-defined interrupt configured at >83C4 that does not return. The user defined interrupt is /always/ called before the console finishes clearing scratchpad memory, so software can take back control, even from a hardware reset. (Actually, I don't think you need to enable interrupts yourself - the GPL Interpreter will do it for you.)

The second is to have certain bits set in the GPL Workspace, and I am not sure I ever took proper notes about exactly what. But you can confuse the interrupt handler such that when it's called (as I noted above it /always/ is), the console crashes instead of rebooting. As a precaution I just started clearing all scratchpad before resetting but I know I worked it out once or twice...

Edited July 27, 2023 by Tursi

JasonACT · July 27, 2023

58 minutes ago, Tursi said:

Next, the PC is loaded with 'here'

On real hardware, I don't think you ever get this far. You certainly do in an emulator though.

For the safety aspect, if the TI was prone to burning chips and needed fans, and those fans were software controlled - that's when I would be worried.

I did own a small form factor PC with software controlled fans. It got a BSOD, and I took my time writing down the details (should have taken a photo). By the time I'd finished, the CPU or VDP had burned out.

JasonACT · July 27, 2023

3 hours ago, Tursi said:

and the only hardware lockup I'm aware of that is possible to do from software (which I have not tested yet) is to do a recursive 'X' instruction...

And I'll accept your last prototype Dragon's Lair device (doesn't need to be hardware, I've got a Pi Pico here) if I've just skooled you.

Edit: if the 16yo still in me did...

Edit2: this is only a joke, Aquarius is sometimes puzzling to Leos, I top-scored Dragon's Lair (to the dismay of the arcade owner) back in '85 on a real machine. That's what you get when a magazine publishes all the hidden puzzles.

Edited July 27, 2023 by JasonACT

apersson850 · July 27, 2023

From what I remember, the most vulnerable hardware to a software attack is the PSI, the TMS 9901. Since it has programmable IO pins, where you can set whether they are outputs or inputs, you may turn two outputs against each other, and set them to different levels.

JasonACT · July 28, 2023

Yeah, don't do that either. Or turn all your DSR ROMs on at the same time and read through their address range, making the line drivers fight one another.
