
BLWP?



Hey everyone!

 

Doing some more assembly language and wanted a bit of help on this instruction as I'm having a bit of trouble understanding it. We thought it would be good to use BLWP since we're attempting to write a more complex game (in 3d? maybe? if we can). I will attach the code I'm trying to run and hopefully someone can explain to me how stupid I am. Thanks!

3D.asm


Since all the real experts are having something to eat I will give you whatever I have gleaned. :) 

You are not alone with confusion using BLWP. I remember puzzling on it too. 

 

BLWP needs you to define a "vector" that consists of the workspace address and the entry address of the code you want to run. 

 

That would typically be done like this:

MYWKSP   BSS 32                * make some space for your registers

MYCODE   <CODE>
          <CODE>
            .
            .
          RTWP                 * end of MYCODE: restore the caller's WP, PC and status


VECTOR  DATA  MYWKSP,MYCODE     * two memory words that point to the workspace and entry address  

MAIN    BLWP @VECTOR            * this will change to your workspace and run MYCODE  
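To make the "vector" idea concrete, here is a minimal Python model (not 9900 code) of what BLWP @VECTOR does with those two words: the first word becomes the new workspace pointer and the second becomes the new program counter. The addresses are made up for illustration.

```python
# Toy model of a BLWP vector (Python, not 9900 code).
# Memory is a dict from even byte address to a 16-bit word; addresses are hypothetical.
memory = {}

def data(addr, *words):
    """Like the assembler's DATA directive: store words at consecutive even addresses."""
    for i, word in enumerate(words):
        memory[addr + 2 * i] = word & 0xFFFF

def blwp_targets(vector_addr):
    """Return the (new WP, new PC) pair that BLWP @vector_addr would load."""
    new_wp = memory[vector_addr]        # first word of the vector: workspace address
    new_pc = memory[vector_addr + 2]    # second word: entry address of the code
    return new_wp, new_pc

MYWKSP, MYCODE, VECTOR = 0x8320, 0xA100, 0xA200   # hypothetical addresses
data(VECTOR, MYWKSP, MYCODE)                      # VECTOR  DATA  MYWKSP,MYCODE
new_wp, new_pc = blwp_targets(VECTOR)
print(hex(new_wp), hex(new_pc))                   # 0x8320 0xa100
```

So the vector is just data. BLWP reads it, and nothing about MYCODE itself says "I am a BLWP routine" apart from ending in RTWP.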

 


48 minutes ago, TheBF said:


The only other thing to note is how R13, R14 and R15 are used inside the new workspace.

Don't overwrite them, or you won't be able to return to the code that called your BLWP.

But they are useful for accessing inline data passed after the BLWP call (through R14) or the calling workspace's registers (through R13).
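A small Python model of those two uses, assuming a caller workspace at >8300, a subroutine workspace at >8320 and a DATA word placed right after the BLWP (all addresses hypothetical). R13 is used as a base pointer into the caller's registers, and R14 is read and advanced so that RTWP returns past the inline data.

```python
# Toy model (Python, not 9900 code): a BLWP'd routine reaching the caller via R13 and R14.
memory = {}

def reg_addr(wp, n):
    """Register Rn of a workspace lives in memory at WP + 2*n."""
    return wp + 2 * n

CALLER_WP = 0x8300   # hypothetical caller workspace
SUB_WP = 0x8320      # hypothetical subroutine workspace
RETURN_PC = 0xA010   # hypothetical address of the word right after BLWP @VEC

memory[reg_addr(CALLER_WP, 0)] = 0x0400   # caller's R0
memory[reg_addr(CALLER_WP, 1)] = 0x2000   # caller's R1
memory[RETURN_PC] = 0x0005                # inline DATA 5 placed after the BLWP
# What BLWP itself stored in the new workspace:
memory[reg_addr(SUB_WP, 13)] = CALLER_WP  # R13 = old WP
memory[reg_addr(SUB_WP, 14)] = RETURN_PC  # R14 = old PC (return address)

def sub_body():
    r13 = memory[reg_addr(SUB_WP, 13)]
    old_r0 = memory[r13]                       # MOV *R13,R0
    old_r1 = memory[r13 + 2]                   # MOV @2(R13),R1
    r14 = memory[reg_addr(SUB_WP, 14)]
    inline = memory[r14]                       # MOV *R14+,R2 reads the inline word...
    memory[reg_addr(SUB_WP, 14)] = r14 + 2     # ...and moves the return address past it
    return old_r0, old_r1, inline
```

If the routine forgets to advance R14, RTWP will "return" into the data word and try to execute it.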

Edited by Gary from OPA

A pertinent note is that the routine you BLWP to typically uses a different workspace to your main code.

 

Simplish example below which demonstrates:

-- making a BLWP call

-- referencing main program registers within the BLWP'd routine using R13 (which is a pointer to the old workspace).

-- doing a BL within a BLWP.

 

START   LWPI WSREG       Load workspace for main program.

        LI R2,768        Write <Space> to entire screen.
        LI R0,>0400
        LI R1,' '*256
LP06    BLWP @VSBW
        INC R0
        DEC R2
        JNE LP06

<rest of main code>

WSREG   BSS 32            Program workspace registers.
UTILWS  BSS 32            VDP utility workspace registers.
VDPW    EQU >8C00         VDP write data port (99/4A).
VDPA    EQU >8C02         VDP write address port (99/4A).

*******************************
*VDP single byte write.
*BLWP @VSBW
*******************************
*R0 = address to write to.
*R1 = byte to write in MSB.
*******************************
VSBW    DATA UTILWS      Address of workspace for subroutine.
        DATA VSBWX       Address of subroutine ... which just happens to follow straight after.

VSBWX   MOV *R13,R0      Get old R0.

        ORI R0,>4000     Set bit 1 in address to 1 for write operations.
        BL @VDPADD       Set up VDP address.

        MOVB @>0002(R13),@VDPW  Write byte in MSB of old R1 to VDP.
        RTWP

*Common routine to set up VDP address stored in R0.

VDPADD  SWPB R0          Move LSB of address into MSB of word.
        MOVB R0,@VDPA    Write LSB of address to VDP.
        SWPB R0          Move MSB of address into MSB of word.
        MOVB R0,@VDPA    Write MSB of address into VDP.
        B *R11
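The address set-up in VSBWX/VDPADD can be checked with a quick Python model of the two bytes that end up on the VDP address port. It assumes a 14-bit VRAM address (16K of VRAM), which is why the mask is there; the ORI and byte order mirror the assembly above.

```python
def vdp_write_address_bytes(addr):
    """Bytes sent to the VDP address port for a write to VRAM address addr,
    in the order VDPADD sends them (low byte first, then high byte)."""
    r0 = (addr & 0x3FFF) | 0x4000   # ORI R0,>4000: VRAM address is 14 bits, bit >4000 = write
    low = r0 & 0xFF                 # first SWPB + MOVB sends the low byte
    high = (r0 >> 8) & 0xFF         # second SWPB + MOVB sends the high byte
    return [low, high]

print([hex(b) for b in vdp_write_address_bytes(0x0400)])   # ['0x0', '0x44']
```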

 


For my two bits, the only time you would really use BLWP is if you want to use a separate workspace. If you can make do with the same 16 registers throughout, then BL is much faster. The idea is that BLWP and RTWP let you switch to an entirely different context, then back again. So it's good if the code needs to be distinct from the rest of the program. (Interrupts work this way, for that reason.)

 

It makes for arguably cleaner code, but if you're already writing in assembly language I'd argue that battle was already lost. ;)

 

Share on other sites

6 minutes ago, Tursi said:


I personally like using BLWP for library routines that are going to be used by more than one assembly program. Over the years I built up a nice large set of BLWP routines that do 90% of any program, so I just have to write the remaining 10% of assembly and add the text data, and bang, a new program is put together. It also makes the program easier to maintain, since you are using a set of known, debugged utilities that have their own workspace and will not mess with your main code. It makes for good programming if used right.

 

Of course, for time-critical things like a video game, you need to keep BLWP to a bare minimum in areas that need quick reaction to player movement and on-screen graphics. You have to judge when to use BLWP/RTWP instead of just BL with the same 16 registers, which, if time-critical, should also be in free, unused >8300 scratch-pad area.


5 hours ago, Stuart said:
        LI R2,768        Write <Space> to entire screen.
        LI R0,>0400
        LI R1,' '*256
LP06    BLWP @VSBW
        INC R0
        DEC R2
        JNE LP06

Learn som'em new ery day ...didn't know you could use ' ' aside from the TEXT, DIRECTIVE, examples. The *256 is a good thought as well.:ponder:

 

However, assuming the default, COLOR MODE, the SCREEN IMAGE/PATTERN NAME TABLE, ends at >02FF. >0400 is just after the COLOR TABLE.

Also, INC R0, will overwrite the SPRITE DESCRIPTOR BLOCKS, VELOCITY/MOTION TABLE. Should be DEC R0.

 

I tried this using the EA, utils though. Haven't reviewed yours yet.

:)

Edited by HOME AUTOMATION

5 hours ago, Krool885 said:

Doing some more assembly language and wanted a bit of help on this instruction as I'm having a bit of trouble understanding it.

 

There are three primary branching instructions on the 9900:

 

B - Unconditional branch (fastest)

BL - Branch and Link (almost as fast as B)

BLWP - Branch and Load Workspace Pointer (slowest)

 

To really understand the differences, you need to understand that the general purpose registers (R0..R15) on the 9900 are not internal to the 9900 CPU. The only *real* hardware registers are the PC (program counter), WP (Workspace Pointer), and SR (Status Register).

 

The 9900 general purpose registers R0..R15 are stored in RAM, in a block called a "workspace". The 9900 uses the WP to "point" to where the registers are located in memory. This is very different from just about all other commercial CPUs of the era (and most CPUs in general, ever). There are advantages and disadvantages to having the CPU's registers set up this way, but let's skip that and just focus on the branching for now.

 

Note: on the 99/4A you ALWAYS want your registers stored in the 256 bytes of 16-bit scratch-pad RAM. All other memory in the system is slow, wait-state 8-bit RAM, and placing your registers in any RAM outside of memory from >8300 to >83F0 will slow your program down significantly. Again, there are only 256 bytes of 16-bit RAM in the 99/4A, so you need to use it carefully.
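Since a register is just a memory word at WP + 2n, the arithmetic is easy to sketch in Python. The workspace count below is only an upper bound: part of the scratch pad is used by the console firmware (for example the GPL workspace at >83E0), so in practice fewer workspaces are free.

```python
SCRATCHPAD_START = 0x8300   # first byte of the 99/4A's 16-bit scratch-pad RAM
SCRATCHPAD_SIZE = 256       # bytes
WORKSPACE_SIZE = 16 * 2     # R0..R15, one 16-bit word each

def register_address(wp, n):
    """Byte address in memory where register Rn lives for workspace pointer wp."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    return wp + 2 * n

def max_workspaces_in_scratchpad():
    """Upper bound only: some of the scratch pad is used by the console itself."""
    return SCRATCHPAD_SIZE // WORKSPACE_SIZE

print(hex(register_address(0x8300, 11)))   # 0x8316: where R11 lives if WP = >8300
print(max_workspaces_in_scratchpad())      # 8
```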

 

The B instruction literally just loads a new value into the PC.  Program execution continues at whatever address was specified with the instruction.

 

The BL instruction is just like B, however the current value of the PC is first copied into R11 (the "link" part of the instruction), before replacing the PC with its new value.  Copying the PC to R11 is hard-wired in the CPU, so be sure you don't need the value in R11 before using BL.

 

Having the PC saved in R11 this way allows you to "return" to where the BL instruction was issued by using B *R11. (The TI assembler has a pseudo-instruction, "RT", that is converted into this instruction. If you see "RT" in an assembly program, it is literally "B *R11".)
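A toy Python model of the BL / B *R11 pair, assuming a caller at >A000 and a subroutine at >A100 (hypothetical addresses). BL @addr occupies two words, so the link saved in R11 is the address four bytes past the BL.

```python
# Toy model of BL @SUB followed by B *R11 (Python, not 9900 code).
regs = [0] * 16   # R0..R15 of the current workspace

def bl(bl_address, target):
    """BL @target located at bl_address: save the link, then jump."""
    regs[11] = bl_address + 4   # BL @addr is two words, so R11 = address of next instruction
    return target               # new PC

def return_via_r11():
    """B *R11: continue at the address saved by BL."""
    return regs[11]

pc = bl(0xA000, 0xA100)
print(hex(pc), hex(regs[11]))   # 0xa100 0xa004
pc = return_via_r11()
print(hex(pc))                  # 0xa004
```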

 

Thus, BL is a basic subroutine instruction.  As long as the called subroutine does not change R11, it can return to the caller.  If you need to nest subroutines, then it is up to you to save the value in R11 before issuing another BL instruction (and restore R11 before issuing B *R11).
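Here is the nesting problem as a Python model. R10 is an arbitrary spare register picked for this sketch, and the addresses are hypothetical. SUB1 copies its link before the inner BL overwrites R11, then returns through the copy. Restoring R11 from R10 and using B *R11 works just as well.

```python
# Toy model of nested BL calls (Python, not 9900 code).
regs = [0] * 16
trace = []

def bl(bl_address, target):
    regs[11] = bl_address + 4   # BL @addr is two words long
    return target

def run():
    pc = bl(0xA000, 0xA100)      # MAIN:  BL  @SUB1     (SUB1 starts at >A100)
    trace.append(("SUB1", pc))
    regs[10] = regs[11]          # SUB1:  MOV R11,R10   (one word, at >A100)
    pc = bl(0xA102, 0xA200)      #        BL  @SUB2     (overwrites R11 with >A106)
    trace.append(("SUB2", pc))
    pc = regs[11]                # SUB2:  B   *R11      (back into SUB1)
    trace.append(("SUB1 again", pc))
    pc = regs[10]                # SUB1:  B   *R10      (back to MAIN, after its BL)
    trace.append(("MAIN", pc))
    return pc
```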

 

The BLWP is similar to BL, however when BLWP is executed both the PC and WP get new values, which means the subroutine is usually (but not always) using a new set of registers (changing the WP changes what memory the CPU uses for R0..R15).  The instruction also copies the old WP into R13, the old PC into R14, and the old SR into R15 of the new workspace.  You typically use RTWP to return from a subroutine called with BLWP, which will restore the caller's PC, WP, and SR.  Your subroutine needs to not change R13, R14, and R15 for the BLWP / RTWP mechanism to work.
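The BLWP / RTWP round trip can be modeled in a few lines of Python. This is an illustration, not an emulator: it only tracks WP, PC, ST and word memory, and all addresses in the usage part are hypothetical. The thing to notice is that the caller's registers are never touched, because the subroutine's R0..R15 are a different block of memory.

```python
class Tms9900ContextModel:
    """Toy model of the 9900 context switch: WP, PC, ST and word-addressed memory only."""

    def __init__(self):
        self.mem = {}                      # even byte address -> 16-bit word
        self.wp = self.pc = self.st = 0

    def reg(self, n):
        """Read Rn of the current workspace (it is just memory at WP + 2n)."""
        return self.mem.get(self.wp + 2 * n, 0)

    def set_reg(self, n, value):
        self.mem[self.wp + 2 * n] = value & 0xFFFF

    def blwp(self, vector, next_pc):
        """BLWP @vector; next_pc is the address of the instruction after the BLWP."""
        old_wp, old_st = self.wp, self.st
        self.wp = self.mem[vector]         # first vector word: new workspace
        self.pc = self.mem[vector + 2]     # second vector word: entry point
        self.set_reg(13, old_wp)           # R13 <- caller's WP
        self.set_reg(14, next_pc)          # R14 <- caller's PC (return address)
        self.set_reg(15, old_st)           # R15 <- caller's ST

    def rtwp(self):
        """RTWP: reload ST, PC and WP from R15, R14 and R13 of the current workspace."""
        self.st, self.pc, self.wp = self.reg(15), self.reg(14), self.reg(13)


# Usage: caller workspace >8300, subroutine workspace >8320 (hypothetical).
cpu = Tms9900ContextModel()
cpu.wp, cpu.st = 0x8300, 0x2000
cpu.set_reg(0, 0x1234)                              # caller's R0
cpu.mem[0xA200], cpu.mem[0xA202] = 0x8320, 0xA100   # VECTOR DATA WS,CODE
cpu.blwp(0xA200, next_pc=0xA004)
cpu.set_reg(0, 0xFFFF)                              # subroutine changes its own R0 only
cpu.rtwp()
print(hex(cpu.wp), hex(cpu.pc), hex(cpu.reg(0)))    # 0x8300 0xa004 0x1234
```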

 

What you need to keep in mind with BLWP is, if you use it as intended, then every subroutine needs to have its own 32-byte chunk of memory for the new R0..R15 workspace.  On the 99/4A, where you only have 256 bytes of 16-bit memory, this can get used up really fast.

 

If you limit yourself to only one BLWP at any one time, i.e. do not nest BLWP calls, then you can get away with just one additional chunk of 32 bytes for subroutines to use as their R0..R15 workspace. But then this limits the benefits of BLWP and you might as well just use the faster BL.

 

You can also give your subroutines their own workspace when using BL, just like BLWP, so BLWP does not really have much benefit on the 99/4A.  The 9900 was designed to be the heart of a minicomputer where context switching would happen frequently, and BLWP would help a lot in that situation.  But for single programs, BL is a better, faster, more flexible choice.

 

IMO, for games, avoid BLWP.  For libraries or code designed to be reused, IMO still avoid BLWP and just set up a new workspace if necessary (you have to do that with BLWP anyway).

 

 

5 hours ago, Krool885 said:

We thought it would be good to use BLWP since we're attempting to write a more complex game (in 3d? maybe? if we can).

 

You might want to check out the first 3 or 4 pages (at the very least) in the Assembly Programming thread here on this subforum.  It was started to help people get into writing games in assembly on the 99/4A.

 

3D does not mean your program organization needs to be complex.  Always try to keep your programs as simple and well organized as possible, regardless of what the code is doing.

 

Also, 3D on any retro computer is not for the faint of heart. The 9918A VDP (used in a lot of systems of the era, e.g. the 99/4A, ColecoVision, MSX1, ADAM, Tomy Tutor, etc.) does not have a true bit-addressable display, so plotting pixels is *slow*. There are plenty of tricks for this kind of thing, and you are going to have to use all of them. Of course you can use pseudo 3D, i.e. using the tile map to draw scenes that look 3D, like Tunnels of Doom. You could also consider using the F18A to draw, but that depends on what criteria you are setting for your project.
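To show why plotting is slow, here is the address arithmetic for one pixel in the 9918A's Graphics II (bitmap) mode, sketched in Python. It assumes the common bitmap set-up where each third of the name table holds 0..255, so the pattern table is a straight 6K bitmap of 8x8 cells. On the real machine every pixel also needs a VDP read-modify-write through the ports on top of this.

```python
def bitmap_pixel(x, y):
    """Return (pattern-table byte offset, bit mask) for pixel (x, y) in Graphics II mode,
    assuming the name table is filled 0..255 for each third of the screen."""
    if not (0 <= x < 256 and 0 <= y < 192):
        raise ValueError("pixel out of range")
    offset = (y >> 3) * 256 + (x >> 3) * 8 + (y & 7)  # 32 cells per row, 8 bytes per cell
    mask = 0x80 >> (x & 7)                           # leftmost pixel of a byte is the MSB
    return offset, mask

print(bitmap_pixel(9, 10))   # (266, 64)
```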

 


5 hours ago, HOME AUTOMATION said:


Well observed. ;-) This particular snippet of program isn't using the TI-99 default VDP layout.


6 hours ago, matthew180 said:

Note, on the 99/4A you ALWAYS want your registers stored in the 256 bytes of 16-bit scratch-pad RAM.  All other memory in the system is slow wait-state 8-bit RAM, and placing your registers in any RAM outside of memory from >8300 to >83F0 will slow your program down significantly.  Again, there is only 256 bytes of 16-bit RAM in the 99/4A, so you need to use it carefully.

Which is why the hardware modification to install 16-bit wide fast expansion memory inside the console gives a significant speed boost for some programs.

6 hours ago, matthew180 said:

The BL instruction is just like B, however the current value of the PC is first copied into R11 (the "link" part of the instruction), before replacing the PC with its new value.  Copying the PC to R11 is hard-wired in the CPU, so be sure you don't need the value in R11 before using BL.

To be picky, it's the address of the next instruction, after the BL instruction, that's stored in R11. This in turn means that you can store fixed data in the code right after the call and read it in the subroutine with MOV *R11+ or similar.
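A Python model of that inline-data trick, with a hypothetical layout: BL @>A100 at >A000 (opcode word >06A0 plus the address word), followed by two DATA words. Each MOV *R11+ reads a word and steps R11, so the final B *R11 lands after the data.

```python
# Toy model of inline parameters after a BL (Python, not 9900 code).
#   >A000  BL   @>A100        (two words: opcode >06A0, address >A100)
#   >A004  DATA >0400,>2000   (inline parameters)
#   >A008  ...                (execution should continue here)
memory = {0xA000: 0x06A0, 0xA002: 0xA100, 0xA004: 0x0400, 0xA006: 0x2000}
regs = [0] * 16

def call_sub_with_inline_args(count):
    regs[11] = 0xA004                     # BL left the address after itself in R11
    args = []
    for _ in range(count):
        args.append(memory[regs[11]])     # MOV *R11+,Rn: read the word at R11...
        regs[11] += 2                     # ...and step R11 past it
    return args, regs[11]                 # B *R11 now lands after the inline data
```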

6 hours ago, matthew180 said:

The BLWP is similar to BL, however when BLWP is executed both the PC and WP get new values, which means the subroutine is usually (but not always) using a new set of registers (changing the WP changes what memory the CPU uses for R0..R15).  The instruction also copies the old WP into R13, the old PC into R14, and the old SR into R15 of the new workspace.  You typically use RTWP to return from a subroutine called with BLWP, which will restore the caller's PC, WP, and SR.  Your subroutine needs to not change R13, R14, and R15 for the BLWP / RTWP mechanism to work.

Normally yes, but you can take advantage of the fact that the value in R15 is used to set the status register when RTWP is executed. The subroutine called with BLWP can set bits as desired in R15, which means that you can design subroutines which allow mechanisms like this one:

	BLWP @HASRS232
	JNE  NOPRINTER

Here we assume that the routine called figures out if there is any RS232 card or not, and sets the equal bit in R15 if there is. Then we can test the outcome of the BLWP like after an ordinary compare instruction or similar. Compared to BL the advantage is that we can set R15 whenever we like. If you try to do this with a BL routine, you have to get the status register updated last, before returning.
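A Python sketch of that flow. EQ is bit 2 of the 9900 status register (mask >2000); HASRS232 is the hypothetical routine from the example, and the card detection itself is replaced by a plain flag. Only the EQ bit is changed, so the caller's other status bits (including the interrupt mask) come back untouched.

```python
EQ_BIT = 0x2000   # 9900 status register bit 2, the "equal" flag

def has_rs232_routine(r15, card_present):
    """Body of a hypothetical HASRS232: set or clear EQ in the caller's saved ST (R15)."""
    return (r15 | EQ_BIT) if card_present else (r15 & ~EQ_BIT & 0xFFFF)

def call_and_test(st_before, card_present):
    r15 = st_before                        # BLWP saved the caller's ST in R15
    r15 = has_rs232_routine(r15, card_present)
    st_after = r15                         # RTWP loads ST from R15
    jne_taken = not (st_after & EQ_BIT)    # JNE jumps when EQ is clear
    return "NOPRINTER" if jne_taken else "fall through"
```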

6 hours ago, matthew180 said:

If you limit yourself to only one BLWP at any one time, i.e. do not nest BLWP calls, then you can get away with just one additional chuck of 32-bytes for subroutines to use as their R0..R15 workspace.  But then this limits the benefits of BLWP and you might as well just use the faster BL.

Assuming your different library routines aren't dependent on each other, but still want to be independent of your caller's workspace, then it works perfectly with just one additional workspace. All BLWP vectors will point to the same workspace but different code. Each one of them can do whatever it likes with R0-R12 in its workspace, in spite of it being shared with other such routines.

6 hours ago, matthew180 said:

You can also give your subroutines their own workspace when using BL, just like BLWP, so BLWP does not really have much benefit on the 99/4A.  The 9900 was designed to be the heart of a minicomputer where context switching would happen frequently, and BLWP would help a lot in that situation.  But for single programs, BL is a better, faster, more flexible choice.

 

IMO, for games, avoid BLWP.  For libraries or code designed to be reused, IMO still avoid BLWP and just set up a new workspace if necessary (you have to do that with BLWP anyway).

BL may be faster and for that reason better, but for sure it's less flexible. If you do need to change to a different workspace, just because you use BL instead of BLWP (where the change is implied), then BL may not at all be any faster. Use two LWPI inside a BL routine and you've already lost the difference between the BL/B pair vs. the BLWP-RTWP pair. You've also lost the simple linkage between new and old workspace, which is not an insignificant disadvantage if you are writing general library routines. In a single program that's well planned you know the location of both and they'll not move (if "well planned" is true). Finally you lost the simple way of returning a status coming from whichever instruction in the subroutine, not just the last prior to branching back.
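To put numbers on this, here is a small Python tally using base clock cycles and memory accesses as I read them from the TMS 9900 data manual (please check them against your own copy). Wait states on the 99/4A's 8-bit memory are not included, which is why the memory-access count is listed too.

```python
# Base (cycles, memory accesses) per instruction, TMS 9900, no wait states.
BASE = {"B": (8, 2), "BL": (12, 3), "BLWP": (26, 6), "RTWP": (14, 4), "LWPI": (10, 2)}
# Extra cost of the general addressing mode used for the operand.
MODE = {"Rn": (0, 0), "*Rn": (4, 1), "@sym": (8, 1)}

def cost(op, mode=None):
    cycles, accesses = BASE[op]
    if mode:
        extra_c, extra_m = MODE[mode]
        cycles, accesses = cycles + extra_c, accesses + extra_m
    return cycles, accesses

def total(*instrs):
    """Sum (cycles, accesses) over instructions given as (op,) or (op, mode)."""
    costs = [cost(*i) for i in instrs]
    return sum(c for c, _ in costs), sum(m for _, m in costs)

bl_pair = total(("BL", "@sym"), ("B", "*Rn"))                          # BL @SUB / B *R11
blwp_pair = total(("BLWP", "@sym"), ("RTWP",))                         # BLWP @VEC / RTWP
bl_lwpi = total(("BL", "@sym"), ("LWPI",), ("LWPI",), ("B", "*Rn"))    # BL + 2x LWPI
print(bl_pair, blwp_pair, bl_lwpi)   # (32, 7) (48, 11) (52, 11)
```

With these figures the bare BL pair is cheapest, but once two LWPI are added it is slightly slower than the BLWP/RTWP pair, which matches the point above. Whether the 16 extra cycles of BLWP over a plain BL matter depends on how often the routine is called.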

 

It's a very common mistake by TMS 9900 assembly programmers, even the better ones, to look at the timing charts and think "Ouch, that instruction is slow - I need to come up with something else" and in the end "something else" takes even more time. Since the TMS 9900 needs a lot of cycles just to get an instruction going, the golden rule is the fewer instructions, the better. A slow and complex instruction is almost always better than a few simpler after each other.

 

So BLWP is the most flexible and in all except the simplest cases also the fastest option you have. BL shines only when it's simple inline code and you want to make a subroutine of it simply because it's used in more than one place in your program.

If you need to do a new BL from inside a routine called by BL, the fastest way is to save R11 but never restore it. If you have a register available, say R12 (assuming no CRU operation is done here), you can do it like this. Most of the MOV instructions are just dummies, to indicate some activity.

 

MAIN	MOV	here,there
	BL	@SUB1
	MOV	this,that

SUB1	MOV	some,more
	MOV 	R11,R12
	BL	@SUB2
	MOV	now,again
	B	*R12

SUB2	MOV	my,data
	B	*R11

 

Edited by apersson850

Adding some practical examples to the theory that others have explained nicely... I use BLWP/RTWP extensively in my code, for example in the action game Rock Runner. Some pointers:

  • game_screen.asm contains the main game loop, which is mostly a list of calls to subroutines. Subroutines themselves can also perform nested calls. This keeps the code manageable and readable.
  • rendering.asm for example contains a set of subroutines for rendering. In their documentation, IN refers to a register in the workspace of the calling code. STATIC refers to a register in the workspace of the subroutine that has a fixed purpose across calls (like a static variable in C), so it is conveniently available in all related subroutines. I put the vector with addresses of the workspace and the code right before the code ('!' is a local label, supported in the xas99 assembler).
  • main.asm defines the addresses of 9 register workspaces for all kinds of purposes, 8 of which fit in fast scratch-pad RAM.
  • The few microseconds overhead are generally irrelevant.

On 3/16/2024 at 7:43 AM, apersson850 said:

BL may be faster and for that reason better, but for sure it's less flexible.

 

I disagree.

 

On 3/16/2024 at 7:43 AM, apersson850 said:

If you do need to change to a different workspace, just because you use BL instead of BLWP (where the change is implied), then BL may not at all be any faster.

 

That is entirely circumstantial. Certainly if you need to change workspaces, and are managing the overhead of multiple workspaces, then by all means use BLWP. But BLWP is still over two times slower to execute than BL, and if all you need is a quick subroutine then you have to decide what matters.

 

On 3/16/2024 at 7:43 AM, apersson850 said:

You've also lost the simple linkage between new and old workspace, which is not an insignificant disadvantage if you are writing general library routines.

 

Personally I would never write libraries with BLWP on a RAM limited machine like the 99/4A.  I do not want to be a programmer trying to dance around the whims of a library programmer, and I would never subject a dev trying to use a lib I wrote to having to screw around with giving my code its own workspace.

 

On 3/16/2024 at 7:43 AM, apersson850 said:

Finally you lost the simple way of returning a status coming from whichever instruction in the subroutine, not just the last prior to branching back.

 

Probably don't care most of the time.  It always depends, which is why there is no right answer 100% of the time, and people should not use libraries in limited systems like retro computers.  Assembly language, and especially games programming (which is the topic of this thread), is for having speed and control.  You don't want a bunch of code in there you didn't write or at least review.

 

On 3/16/2024 at 7:43 AM, apersson850 said:

It's a very common mistake by TMS 9900 assembly programmers, even the better ones, to look at the timing charts and think "Ouch, that instruction is slow - I need to come up with something else" and in the end "something else" takes even more time.

 

I do it all the time, and never walked away with something worse. I wonder who all these "better" programmers are that do this and walk away with something slower? Assembly programmers should absolutely read and understand the datasheet for the computer they are using. They should use tools like Classic99 to measure actual performance of their code at runtime. They should read and compare the ways other people have solved the same problem, and never take anyone's "word" for it. Explore and learn it for yourself. Sharing experiences is great, and always be open to learning a new trick or way of doing something.

 

On 3/16/2024 at 7:43 AM, apersson850 said:

Since the TMS 9900 needs a lot of cycles just to get an instruction going, ...

 

The 9900 needs one cycle (fetch) to get an instruction going, but whatever.

 

On 3/16/2024 at 7:43 AM, apersson850 said:

the golden rule is the fewer instructions, the better. A slow and complex instruction is almost always better than a few simpler after each other.

 

That's the "golden rule"?  Reference, please.  Highly subjective to the situation.

 

On 3/16/2024 at 7:43 AM, apersson850 said:

So BLWP is the most flexible and in all except the simplest cases also the fastest option you have. BL shines only when it's simple inline code and you want to make a subroutine of it simply because it's used in more than one place in your program.

 

That is quite a claim.  Just because you prefer BLWP over BL, there is no need to confuse people like this.  Each instruction has its use, and programmers should understand the instructions available to them, and make their own decisions based on what they are doing.  People in the forum should try to help others learn and understand, not make up statements just to support a personal preference on how to do something.  At least prefix things with "IMO".

 

On 3/16/2024 at 7:57 AM, Eric Lafortune said:

The few microseconds overhead are generally irrelevant.

 

Would you say the same thing about a game trying to do 3D at a playable frame rate, like the O.P. is trying to do?

 

I've watched Rasmus and others on the forum go through some pretty serious optimizations, unrolling loops, etc. to try and shave off enough cycles to get some of the games they were working on running well. In cases like those, I'm sure the microseconds were adding up fast.

 


Of course a computationally intense game is sensitive to microseconds. They aren't the general case, though. If you write assembly support for Extended BASIC to do something that can't be done in BASIC alone, then even a full second may not be important.

 

Very few instructions are completed in 10 cycles, so yes, quite a lot of cycles are used even for the simplest of them. That's why adding a few more cycles to use a more "fancy" addressing mode is better than accomplishing it in a different way.
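A concrete case of that trade-off, tallied in Python with the same caveat: base figures as I read them from the TMS 9900 data manual, no 99/4A wait states. Reading a word through a pointer and stepping the pointer with one auto-increment MOV is compared against a plain indirect MOV followed by INCT.

```python
# Base (cycles, memory accesses), TMS 9900, no wait states; verify against the data manual.
MOV = (14, 4)
INCT = (10, 3)
SRC_MODE = {"*Rn": (4, 1), "*Rn+": (8, 2)}   # word-operand addressing-mode overhead

def add(*costs):
    """Element-wise sum of (cycles, accesses) pairs."""
    return tuple(map(sum, zip(*costs)))

autoinc = add(MOV, SRC_MODE["*Rn+"])          # MOV *R1+,R2
separate = add(MOV, SRC_MODE["*Rn"], INCT)    # MOV *R1,R2  then  INCT R1
print(autoinc, separate)                      # (22, 6) (28, 8)
```

Here the "fancier" single instruction is both fewer cycles and fewer memory accesses than the two-instruction version.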

 

Finally, everything I write is my opinion. Frequently I also explain why. Which gives you all you need to make a more informed decision yourself. Myself, I don't use BLWP very often, mainly BL. I use BLWP when I feel it makes things better.
