
Assembly on the 99/4A


matthew180


On 1/2/2023 at 4:41 PM, retrodroid said:

So for a game/program that can complete all of its required processing in under 1/60th (or 1/50th) of a second, you might as well just have it execute after each vsync using the "single buffer" of the main screen.

You can also benefit from using a double buffering technique in that case (look at TI-Scramble, for instance). But if your main loop takes longer than 1/60 s (1/50 s) and you don't use a double buffering technique, you probably won't gain much from waiting for vsync.


Makes sense. With double-buffering you sync the buffer swap with the vsync to avoid the screen tearing caused by updating the buffer in the middle of its render cycle. If rendering to the off-screen buffer takes longer than 1/60 s, you just end up with a frame rate below 60 fps, but no strange screen tearing.


So here's another newbie to AL question.

 

I note that AL has multiple seemingly useful and powerful logical instructions to perform binary math operations. Coming from higher-level languages, it's a bit of a mystery to me exactly *when* each of these might be useful. I've annotated the ones below that I've identified uses for, top of mind:

  • ANDI
  • CLR - reset word to zeros
  • COC
  • CZC
  • INV - Create a "masked"/selected effect for a char pattern?
  • ORI
  • SETO - Set word to "FFFF" / max value.
  • SLA - Multiply by 2 for each bit shifted, move LSB byte to the MSB position.
  • SOC
  • SRA - Divide by 2 for each bit shifted, move MSB byte to the LSB position.
  • SRC
  • SRL
  • SZC
  • XOR

 

Can someone provide references or examples when the others would come into play? So far the books I've seen all explain in one line and a simple example WHAT these do, but give no clue as to WHEN you might find them useful.

 

I feel like the "secret" to AL programming is knowing when to leverage these instructions. Thus far, I can kind of get where I need to go by using a small set of AL instructions pieced together in what is probably a horrifically clunky and inelegant manner, and I'd like to find resources to explain/show how/when to leverage these to  create elegant AL code.

 


My take on this:
  • ANDI - Bitwise and operation with an immediate value. ANDI R1,>0FFF will let you keep 12 rightmost bits in R1, clear the four leftmost
  • CLR - reset word to zero
  • COC - Bitwise test of ones against a value at a general address. For example, COC R1,R2 will be true (equal bit set) if there are ones in each position in R2 where there are bits set to one in R1.
  • CZC - Bitwise test of zeros against a value at a general address. CZC R1,R2 will be true if there are zeros in each position in R2 where there are bits set to one in R1.
  • INV - Reverse all bits at operand. One to zero, zero to one.
  • ORI - Bitwise or operation with an immediate value. ORI R1,>0FFF will set all 12 rightmost bits and not change the four leftmost.
  • SETO - Set word to "FFFF" / max unsigned value (65535) or -1 in 2's complement representation of numbers.
  • SLA - Multiply by 2 for each bit shifted; the bits move from the LSB end towards the MSB end, and zeros are shifted in from the right.
  • SOC - Bitwise or operation with two operands. SOC R1,R2 will set all bits in R2 that are set to one in both R1 and R2. SOCB will do 8 bits instead of 16.
  • SRA - Divide by 2 with sign for each bit shifted; the bits move from the MSB end towards the LSB end. A copy of the leftmost bit is shifted in from the left.
  • SRC - Rotate bits in word to the right. Bits coming out to the right are re-fed in to the left.
  • SRL - Divide by 2 without sign for each bit shifted towards the right.
  • SZC - Bitwise and not operation with two operands. SZC R1,R2 will clear all bits in R2 that are set to one in R1. SZCB will do 8 bits instead of 16.
  • XOR - Exclusive or. Will set bits that are different in both operands, clear those that are not.

Bitwise AND and OR are for doing logical operations on 16 bits at a time. Not just "this and that", but "16 this and 16 that". ANDI, ORI, SOC, SZC and XOR belong here.

Bitwise tests check for "this or that" 16 bits at a time. COC and CZC belong here. CI and C (CB) instead compare the bits as a representation of numbers.

Shift instructions will move bits around. Arithmetic shifts are equivalent to multiplying or dividing by 2 a number of times. Logical and circular shifts move bits around for logical processing.

CLR, SETO and INV clear, set or invert bits for logical purposes.

CLR, SETO and NEG do the same for bits representing numbers.

 

For example, if you have a bit pattern representing positions of mines in R2, and another with a single bit representing you in R1, then

COC R1,R2

JEQ HIT

will jump to HIT if you are on one of the positions with mines.
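For anyone who finds it easier to experiment in a high-level language first, here is a rough Python model of the two test instructions. It is only a sketch: `coc` and `czc` are made-up helper names that return what the EQ status bit would be, and the minefield values are invented for illustration.

```python
MASK16 = 0xFFFF  # TMS9900 registers are 16 bits wide


def coc(src, dst):
    """Model of COC: EQ is set when every 1 bit in src is also 1 in dst."""
    return (src & dst & MASK16) == (src & MASK16)


def czc(src, dst):
    """Model of CZC: EQ is set when every 1 bit in src is 0 in dst."""
    return (src & dst & MASK16) == 0


# Hypothetical data: one bit per position on a 16-position strip.
mines = 0b0010_0100_1000_0001   # would sit in R2
player = 0b0000_0100_0000_0000  # would sit in R1, a single bit

hit = coc(player, mines)   # COC R1,R2 followed by JEQ HIT
safe = czc(player, mines)  # CZC R1,R2 followed by JEQ SAFE
print(hit, safe)
```

Note that with more than one bit set in the source, both tests can fail at the same time: COC needs all of the selected bits to be one, CZC needs all of them to be zero.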

 

SRC R4,8 will do the same thing as SWPB R4, but SRC can shift fewer or more bits as well.

Using INV on a character definition will invert all the bits, from on to off and the opposite.

Using

LI R2,>7C00

MOVB @CHDEF,R4

XOR R2,R4

will invert the five bits that start one bit from the left in R4, in this case in a character definition byte pulled from memory. The remaining bits in R4 are not touched. So XOR is kind of a selective INV, operating only on the bits in the destination that have their corresponding bits set in the source.
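The same ideas can be checked quickly in Python. This is just a sketch under my own naming (`inv`, `xor`, `src_rotate` and `swpb` are hypothetical helpers, not anything from TI), and the pattern byte >3C is an invented example value.

```python
MASK16 = 0xFFFF


def inv(word):
    """Model of INV: flip all 16 bits."""
    return ~word & MASK16


def xor(src, dst):
    """Model of XOR: flip the bits of dst that are set in src."""
    return (src ^ dst) & MASK16


def src_rotate(word, count):
    """Model of SRC: rotate a 16-bit word right by count bits (1..15)."""
    word &= MASK16
    return ((word >> count) | (word << (16 - count))) & MASK16


def swpb(word):
    """Model of SWPB: swap the two bytes of a word."""
    word &= MASK16
    return ((word << 8) | (word >> 8)) & MASK16


pattern = 0x3C00                     # MOVB puts the byte in the left half of R4
selective = xor(0x7C00, pattern)     # only the five masked bits are flipped
everything = inv(pattern)            # all sixteen bits are flipped
print(hex(selective), hex(everything))
print(hex(src_rotate(0x12AB, 8)), hex(swpb(0x12AB)))
```

The last line shows the SRC R4,8 versus SWPB R4 equivalence mentioned above.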

 

Don't know if this helps you anywhere?


Thanks for the breakdown.  I guess my main issue is that I don't "think in binary" yet. I understand what the operations do, but I'm not able to identify when I would want to use them. I guess I'm looking for some examples or tips on AL algorithms that leverage these to best/most common effect.


Memory is limited, so information often has to be packed tightly by using one or more bits within a word for storing different pieces of information. You can use these instructions to set, reset or extract the information.

 

Another common use case is for calculating an address into an array from coordinates or indexes. If the size of your array is a power of 2 you can do this by bit manipulation, which is faster than arithmetic. 
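Both use cases can be sketched in Python. Everything specific here is an assumption made up for the example: the field layout (4-bit kind, 3-bit colour, 1-bit flag, 8-bit x) and the 32-cells-per-row array. In AL the masking would be ANDI/SZC, the setting ORI/SOC, and the shifts SLA/SRL.

```python
# Hypothetical 16-bit record layout: kkkk ccc f xxxxxxxx
KIND_SHIFT, KIND_MASK = 12, 0xF
COLOUR_SHIFT, COLOUR_MASK = 9, 0x7
FLAG_BIT = 0x0100
X_MASK = 0x00FF


def pack(kind, colour, flag, x):
    """Combine four small values into one 16-bit word."""
    word = (kind & KIND_MASK) << KIND_SHIFT
    word |= (colour & COLOUR_MASK) << COLOUR_SHIFT
    if flag:
        word |= FLAG_BIT
    word |= x & X_MASK
    return word


def unpack(word):
    """Extract the four fields again with shifts and masks."""
    kind = (word >> KIND_SHIFT) & KIND_MASK
    colour = (word >> COLOUR_SHIFT) & COLOUR_MASK
    flag = bool(word & FLAG_BIT)
    x = word & X_MASK
    return kind, colour, flag, x


ROW_SHIFT = 5  # assumed array width of 32 cells per row, so row * 32 == row << 5


def cell_address(base, row, col):
    """Address of a cell in a 32-wide array, using a shift instead of a multiply."""
    return base + (row << ROW_SHIFT) + col


word = pack(9, 5, True, 200)
print(hex(word), unpack(word), hex(cell_address(0x1000, 3, 7)))
```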


The code is quite impossible to understand without knowing how the bitmap screen is structured, but here is a rough explanation.

 

The screen address is calculated as:
5_most_significant_bits_of_y * 256 + 5_most_significant_bits_of_x * 8 + 3_least_significant_bits_of_y =

(5_most_significant_bits_of_y << 8) + (5_most_significant_bits_of_x << 3) + 3_least_significant_bits_of_y

 

MOV  R1,R4         ; 00000000YYYYYYYY
SLA  R4,5          ; 000YYYYYYYY00000
SOC  R1,R4         ; 000YYYYY???YYYYY
ANDI R4,>FF07      ; 000YYYYY00000YYY
MOV  R0,R5         ; 00000000XXXXXXXX
ANDI R5,7          ; 0000000000000XXX
A    R0,R4         ; 000YYYYY00000YYY + 00000000XXXXXXXX
S    R5,R4         ; 000YYYYY00000YYY + 00000000XXXXXXXX - 0000000000000XXX = 000YYYYY00000YYY + 00000000XXXXX000 = 000YYYYYXXXXXYYY 
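One way to convince yourself that the eight instructions really compute that formula is to step through them in a high-level language. Below is a Python sketch that mirrors the code line by line, assuming x arrives in R0 and y in R1 as in the comments above. The function names are mine, and the check simply compares against the multiply-and-add formula for every pixel on a 256 x 192 screen.

```python
MASK16 = 0xFFFF


def bitmap_address(x, y):
    """Step-by-step model of the eight instructions (x in R0, y in R1)."""
    r0, r1 = x & 0xFF, y & 0xFF
    r4 = r1                   # MOV  R1,R4
    r4 = (r4 << 5) & MASK16   # SLA  R4,5
    r4 |= r1                  # SOC  R1,R4
    r4 &= 0xFF07              # ANDI R4,>FF07
    r5 = r0                   # MOV  R0,R5
    r5 &= 7                   # ANDI R5,7
    r4 = (r4 + r0) & MASK16   # A    R0,R4
    r4 = (r4 - r5) & MASK16   # S    R5,R4
    return r4


def bitmap_address_formula(x, y):
    """The same address written out with the formula from the post."""
    return (y >> 3) * 256 + (x >> 3) * 8 + (y & 7)


mismatches = [
    (x, y)
    for y in range(192)
    for x in range(256)
    if bitmap_address(x, y) != bitmap_address_formula(x, y)
]
print(len(mismatches))
```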

 


So that is specific to Bitmap mode?


1 hour ago, apersson850 said:

SOC - Bitwise or operation with two operands. SOC R1,R2 will set all bits in R2 that are set to one in both R1 and R2. SOCB will do 8 bits instead of 16.

 

Though the effect of this operation is definitely a bitwise OR and it probably makes more sense to consider it as such, the actual operation of “Set Ones Corresponding” is to set all bits in R2 that are set to one in R1. Whether both R1 and R2 bits are set is not considered. Those bits in R2 that correspond to 0 bits in R1 are simply left unchanged.

 

...lee


2 hours ago, Asmusr said:

The code is quite impossible to understand without knowing how the bitmap screen is structured, but here is a rough explanation.

 


LOL.  I should have asked you about this last month or maybe even read the E/A manual. :)

 

I went through the process of translating the math from the TMS9918 manual myself.

Yours looks waaay faster.  I used a lookup table for the bit position.

Now I have to redo it... :(

But thanks for showing us how it's done.  

 

 


9 hours ago, Lee Stewart said:

 

Though the effect of this operation is definitely a bitwise OR and it probably makes more sense to consider it as such, the actual operation of “Set Ones Corresponding” is to set all bits in R2 that are set to one in R1. Whether both R1 and R2 bits are set is not considered. Those bits in R2 that correspond to 0 bits in R1 are simply left unchanged.

True. It could be that the folks at TI chose to call their OR operation Set Ones Corresponding because they have no general two-operand AND instruction (only ANDI). Instead they have the opposite of SOC, i.e. Set Zeros Corresponding. The second one could also be described as an AND NOT operation, destination AND NOT source, which is how TI defines it in the TMS 9900 manual.

But as you also realized, setting the bits that are one in the source to one in the destination, while not touching the others in the destination, is one way of defining how to perform a bitwise OR.
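To make the point concrete, here is a small Python sketch of the two instructions (helper names are my own). It also shows one way to get a plain two-operand AND out of what the 9900 offers: invert the mask first, then use SZC. That trick is my assumption of how you might do it, not something quoted from a manual.

```python
MASK16 = 0xFFFF


def inv(word):
    """Model of INV: flip all 16 bits."""
    return ~word & MASK16


def soc(src, dst):
    """Model of SOC: set the bits in dst that are one in src."""
    return (dst | src) & MASK16


def szc(src, dst):
    """Model of SZC: clear the bits in dst that are one in src."""
    return dst & ~src & MASK16


def and_via_szc(src, dst):
    """Two-operand AND built from INV followed by SZC."""
    return szc(inv(src), dst)


samples = [0x0000, 0xFFFF, 0x00FF, 0x0F0F, 0x1234, 0x8001]
soc_is_or = all(soc(a, b) == (a | b) for a in samples for b in samples)
and_works = all(and_via_szc(a, b) == (a & b) for a in samples for b in samples)
print(soc_is_or, and_works)
```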


On 1/4/2023 at 8:39 PM, retrodroid said:

Thanks for the breakdown.  I guess my main issue is that I don't "think in binary" yet. I understand what the operations do, but I'm not able to identify when I would want to use them. I guess I'm looking for some examples or tips on AL algorithms that leverage these to best/most common effect.

Another real world example is when storing information on a disk, where a long string of bytes is used as a 'sector bitmap' to record which sectors on the disk are in use and which are free, with each bit equating to one sector. The binary operations can be used to find a '0' bit in the 'bitmap' which indicates that the related sector is free, and to set that bit to show that that sector is in use.
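A rough Python sketch of that idea follows. The bit ordering is an assumption for illustration only (here bit 0 of byte 0 stands for sector 0; a real disk format may number the bits differently), and the function names are made up. In AL the inner test would be a COC/CZC or a shift plus jump, and the allocation a SOCB.

```python
def find_free_sector(bitmap):
    """Return the number of the first sector whose bit is 0, or None if full."""
    for byte_index, byte in enumerate(bitmap):
        if byte == 0xFF:          # all eight sectors in this byte are taken
            continue
        for bit in range(8):
            if not byte & (1 << bit):
                return byte_index * 8 + bit
    return None


def allocate(bitmap, sector):
    """Mark a sector as used by setting its bit."""
    bitmap[sector >> 3] |= 1 << (sector & 7)


def release(bitmap, sector):
    """Mark a sector as free by clearing its bit."""
    bitmap[sector >> 3] &= ~(1 << (sector & 7)) & 0xFF


bitmap = bytearray([0xFF, 0b00001011, 0x00])  # invented example contents
first = find_free_sector(bitmap)
allocate(bitmap, first)
second = find_free_sector(bitmap)
print(first, second)
```

Note how the sector number is split with a shift (sector >> 3 picks the byte) and a mask (sector & 7 picks the bit), the same power-of-two trick as for array addressing.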


On 1/4/2023 at 3:39 PM, retrodroid said:

Thanks for the breakdown.  I guess my main issue is that I don't "think in binary" yet. I understand what the operations do, but I'm not able to identify when I would want to use them.

At the risk of being too obvious, something that is often overlooked when people first encounter Assembler is binary multiplication and division.

It can save a lot of cycles when your multipliers or divisors are powers of two.

 

Multiply R1 by 2  with    SLA R1,1 
Multiply R1 by 4  with    SLA R1,2 
Multiply R1 by 8  with    SLA R1,3
Multiply R1 by 16 with    SLA R1,4 

Divide R1 by 2 with       SRA R1,1

etc...
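If you want to play with the shifts before committing them to AL, a Python model like the one below may help. It is only a sketch with hypothetical helper names. It also shows the difference between SRA and SRL on negative numbers, and that an arithmetic right shift rounds towards minus infinity rather than towards zero.

```python
MASK16 = 0xFFFF


def to_signed(word):
    """Interpret a 16-bit word as a two's complement number."""
    word &= MASK16
    return word - 0x10000 if word & 0x8000 else word


def sla(word, count):
    """Model of SLA: shift left, zeros come in from the right."""
    return (word << count) & MASK16


def sra(word, count):
    """Model of SRA: shift right, copies of the sign bit come in from the left."""
    return (to_signed(word) >> count) & MASK16


def srl(word, count):
    """Model of SRL: shift right, zeros come in from the left."""
    return (word & MASK16) >> count


print(sla(5, 3))                  # 5 * 8
print(to_signed(sra(0xFFFA, 1)))  # -6 / 2, sign preserved
print(srl(0xFFFA, 1))             # same bits treated as unsigned 65530 / 2
print(to_signed(sra(0xFFF9, 1)))  # -7 / 2 rounds down, not towards zero
```

Keep in mind that bits shifted out to the left by SLA are lost, so the "multiply" only holds while the result still fits in 16 bits.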

 


I found this piece of code I've written once. Maybe it's useful as an example. It's written as a procedure to be called from Pascal, but the bit manipulation is the same as it would be, had it been called from BASIC.

The procedure declaration is

procedure convertline(row: integer; var buffer: packed array[0..255] of char); external;

 

The purpose is to convert the screen representation of graphics (bit-map mode) to a format that can be printed on a dot-matrix printer. On the screen, the bits corresponding to one character position are represented by bytes laid out horizontally. When printed on a dot-matrix printer, the bytes represent vertical bit patterns instead. The code converts one screen row (32 characters, or 256 by 8 bits) to a print row.

Comments are below each code segment.

There are a few things declared outside this code segment, like the workspaces, some utilities and the VDP setup. But that shouldn't matter for the example of bit manipulation. SP is the p-system's stack pointer and PASCALWS its workspace.

 

       .PROC CONVERTLINE,2
       .REF  VSBR,VMBR,DRAWWS

       MOV  *SP+,@DRAWWS+14      ;Buffer pointer -> DRAWWS R7
       MOV  *SP+,@DRAWWS+12      ;Screen row number -> DRAWWS R6
       LWPI DRAWWS

       MOV  R6,R5                ;Calculate start of row in screen image table
       SLA  R6,5
       AI   R6,IMAGE2
       SRA  R5,3                 ;Start of correct part of pattern table
       SLA  R5,11
       AI   R5,PATTERN2

First, two parameters, the screen row number and a pointer to a buffer, are fetched. Then shift instructions are used to divide and multiply, together with some additions, to calculate the start of the row in the screen image table and in the pattern table.

       LI   R3,32       ;Number of bytes on a screen row
LOOP3  MOV  R6,R0       ;Read one byte from screen image
       INC  R6
       CLR  R1
       BLWP @VSBR
       SRL  R1,5        ;Calculate address in pattern table
       MOV  R1,R0
       A    R5,R0
       LI   R1,SOURCE   ;Fetch 8 byte definition
       LI   R2,8
       BLWP @VMBR

Some more shift and add to figure out where the pattern definition is. Then fetch it, 8 bytes long.

       LWPI PASCALWS         ;Start transposing the graphics
       LI   R1,DEST
       CLR  *R1              ;Clear destination buffer
       CLR  @DEST+2
       CLR  @DEST+4
       CLR  @DEST+6
       LI   R6,8
       LI   R3,7FFFH         ;Source mask

LOOP2  LI   R0,SOURCE
       LI   R4,8000H         ;Destination mask
       LI   R5,8             ;8 bytes to convert

LOOP1  MOVB *R0+,R2
       SZCB R3,R2            ;Clear all but interesting bit
       JEQ  BLANK
       SOCB R4,*R1           ;Set bit in destination if set in source
BLANK  SRL  R4,1             ;Change destination mask
       DEC  R5
       JNE  LOOP1            ;Check next source byte

       SRC  R3,1             ;Next bit in source
       INC  R1
       DEC  R6
       JNE  LOOP2

Here one row of 8 bits at a time is converted to one bit each in 8 column bytes. Note the shifts that move the bit masks which select the bits, and the SOCB/SZCB that set or clear bits based on the contents of the bit masks. The circular shift of R3 is used to move the single zero in the bit mask one position to the right each time, since bits set to one that are shifted out to the right are fed back in from the left.
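The two nested loops amount to transposing an 8 x 8 bit matrix. As a rough cross-check, here is a Python sketch of the same loop structure. The function name and the sample pattern are my own, and `source_bit` plays the role of the single zero in the 7FFFH mask (it is the one bit that SZCB leaves alone).

```python
def transpose_8x8(source):
    """Turn 8 row bytes into 8 column bytes, mirroring LOOP2/LOOP1."""
    dest = [0] * 8
    source_bit = 0x80                 # the bit not cleared by the source mask
    for column in range(8):           # LOOP2: one destination byte per pass
        dest_mask = 0x80              # LI R4,8000H
        for row in range(8):          # LOOP1: walk the 8 source bytes
            if source[row] & source_bit:
                dest[column] |= dest_mask   # SOCB R4,*R1
            dest_mask >>= 1           # SRL R4,1
        source_bit >>= 1              # SRC R3,1 moves the zero one step right
    return dest


# Invented 8-byte character pattern, something like a letter "A".
letter = [0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00]
columns = transpose_8x8(letter)
print([hex(b) for b in columns])
```

Since this is a plain transpose, running it twice gives back the original pattern, which makes a handy sanity test.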

       LWPI DRAWWS

       LI   R10,8
       LI   R12,DEST
OUTLOOP MOVB *R12+,*R7+
       DEC  R10
       JNE  OUTLOOP

       DEC  R3           ;Next byte in row
       JNE  LOOP3

       LWPI PASCALWS
       B    *R11

SOURCE .BLOCK 8
DEST   .BLOCK 8

       .END

Here the 8 converted bytes are written to the buffer, and then history repeats itself until the 32 bytes representing a character row on the screen have been converted and stored in the buffer. The buffer is then ready to be inserted into the print string, but that was done at the Pascal level.


15 hours ago, TheBF said:
Multiply R1 by 2  with    SLA R1,1 
Multiply R1 by 4  with    SLA R1,2 

I was surprised to see that a shift operation can be just as expensive as a MPY multiplication, though I do not really know how to read this table...

[Image: instruction timing table for the shift instructions]

 

 


The table should be read like this:

  • First case, a normal shift of 1-15 bits. 12 cycles to start the whole thing, then 2 cycles per bit to shift. There are three memory accesses. One for the instruction and two for the register to shift. For each you add four cycles if you access memory on an 8-bit bus in the 99/4A. Most common would be that the instruction is fetched from memory expansion, but the register file is in scratch-pad RAM. Thus you have one slow and two fast memory accesses, so add four cycles for that. Thus a 7 bit shift would be 12+2*7+4=30 cycles.
  • Second case, a shift of zero bits, which is used to indicate that the actual number of bits is to be found in R0. In this case R0 is zero, which is pointless, so it's instead interpreted as 16 bits. One extra memory access to get the bit count, but that's in fast RAM, so we end up with 52+4=56 cycles.
  • Third case, shift count fetched from R0 and it's not zero in this case. If R0 holds 7, we shift seven bits and get 20+2*7+4=38 cycles.
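The three cases can be put into a small calculator. This Python sketch only encodes the numbers given in the list above (12, 20 and 52 base cycles, 2 cycles per bit, 4 extra cycles per slow memory access). The function name and the default of one slow access (instruction in memory expansion, workspace in scratch-pad RAM) are my assumptions.

```python
WAIT_CYCLES = 4  # extra cycles per access to memory on the 8-bit bus


def shift_cycles(count, r0=0, slow_accesses=1):
    """Estimated cycles for a shift instruction on the 99/4A.

    count: shift count in the instruction (0 means "take it from R0").
    r0:    contents of R0, only used when count is 0.
    slow_accesses: number of memory accesses that hit the 8-bit bus.
    """
    if count != 0:
        base = 12 + 2 * count            # first case
    elif r0 & 0xF == 0:
        base = 52                        # second case, treated as 16 bits
    else:
        base = 20 + 2 * (r0 & 0xF)       # third case
    return base + WAIT_CYCLES * slow_accesses


print(shift_cycles(7))          # normal 7-bit shift
print(shift_cycles(0, r0=0))    # count from R0, R0 is zero
print(shift_cycles(0, r0=7))    # count from R0, R0 holds 7
```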

4 hours ago, retrodroid said:

Thanks guys - I'll study these.  I guess what I'll end up doing is putting together my pseudo-code for key routines and then maybe posting my naive AL implementation ideas here for expert review. That should provide some fertile learning opportunities.  :)

The code piece I posted actually implements a Pascal procedure. I tested the principle in Pascal, then made it fast enough by shifting over to assembler.


In the table above, "WRO" should actually read "WR0" (workspace register 0), and would have been better written as "R0". Also, I guess "WRP" is a typo (P is next to O on the keyboard, possibly typed by someone with little or no knowledge of assembly language) and should be "WRO", or actually "WR0" ... or better "R0", anyway. OK, if I say "WRP" I actually mean "R0". 🙂

 


Indeed.  There is no free lunch with the 9900. :)

So a few shifts are faster but there comes a time when you just use MPY.
