
BLWP vs BL


Willsy


53 minutes ago, apersson850 said:

Having the stack pointer also be the frame pointer/environment pointer, isn't that a bit confusing if you need to push/pop temporaries on the stack? I've always preferred a separate frame pointer.

A trick that I saw in Intel-compiled code was to push the original RP onto the return stack as the last item saved on the stack frame.

I made a cheap and dirty local variable system for my Forth using this idea. (Apologies for the RPN assembler)

CODE LOCALS ( n --) \ build a stack frame n cells deep
   RP R0 MOV,       \ dup the return stack pointer 
   TOS 1 SLA,       \ top of data stack x 2 = memory needed
   TOS RP SUB,      \ allocate space on return stack 
   R0 RPUSH,        \ push the original stack pointer onto return stack 
   TOS POP,         \ refill tos data stack cache register (R4)
   NEXT,
ENDCODE 

 

Then all the locals are accessed with indexed addressing using the RP register.

Then to collapse the stack frame you just do this:

 

*RP RP MOV, 
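
For readers who do not read RPN assembler, here is a minimal Python sketch of the same idea. It is a toy model, not the Forth system's actual code: the `ReturnStack` class, its method names, and the start address `0x8400` are all assumptions made for illustration. It assumes a return stack that grows toward lower addresses and 2-byte cells, as the `SUB,` and `1 SLA,` in the code above suggest.

```python
CELL = 2  # bytes per cell on the 16-bit TMS9900


class ReturnStack:
    """Toy model of a return stack that grows toward lower addresses."""

    def __init__(self, top=0x8400):    # hypothetical start address
        self.mem = {}                  # sparse memory: byte address -> 16-bit cell
        self.rp = top                  # return stack pointer

    def rpush(self, value):
        self.rp -= CELL
        self.mem[self.rp] = value & 0xFFFF

    def build_frame(self, n):
        """Model of LOCALS: reserve n cells, then save the original RP on top."""
        original_rp = self.rp          # RP R0 MOV,
        self.rp -= n * CELL            # TOS 1 SLA,   TOS RP SUB,
        self.rpush(original_rp)        # R0 RPUSH,

    def local_addr(self, i):
        """Address of local i (0-based); locals sit just above the saved RP."""
        return self.rp + CELL * (i + 1)

    def store_local(self, i, value):
        self.mem[self.local_addr(i)] = value & 0xFFFF

    def fetch_local(self, i):
        return self.mem[self.local_addr(i)]

    def collapse_frame(self):
        """Model of *RP RP MOV, : reload RP from the saved copy."""
        self.rp = self.mem[self.rp]


rs = ReturnStack()
rs.rpush(0x6000)        # pretend a return address is already stacked
rs.build_frame(3)       # three locals
rs.store_local(0, 111)
rs.store_local(2, 333)
rs.collapse_frame()     # RP is back where it was before build_frame
```

In this model, anything pushed on the return stack after the frame is built shifts every local's offset from RP and must be popped again before the collapse, which is the concern raised in the quoted question.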

 


1 hour ago, apersson850 said:

Then you are concerned about execution performance, I presume?

In my programming life I also care about programming performance and reliability, not only execution speed. To isolate caller from callee, a BLWP could be handy, and isolation is usually good for preventing one stupid mistake from killing more than itself. That is, unless it's real-time programming, where performance is crucial or it simply doesn't work.

Sorry to go off on a tangent, but I think I need to write this, since I deeply care about execution performance.

I haven't written software professionally for the past seven or eight years or so, but for the preceding 25 years I spent much of my career writing very time-critical software, mostly on x86 platforms but also some on TMS320C30 and TMS320C51 DSPs. I was first in the video conferencing industry and then in the game streaming industry before moving on. In those companies the performance of the software, both execution speed and reliability, was paramount, not only for technical reasons but because it was a business enabler and driver too.

 

In the late nineties we created the world's first H.320 standard-compliant software-only video conferencing solution for Windows. I wrote both H.261 standard-compliant video encoders and decoders and spent a lot of time optimising them; it wasn't easy to get them to run in real time on the PCs of the time. That meant a lot of assembler coding using MMX technology, which later changed to SSE. But I'm especially proud of the MPEG-4 encoder I wrote later on for game streaming. I think I spent at least a year optimising it, using SSE2 and SSSE3 (Supplemental Streaming SIMD Extensions 3, what an acronym) assembler code. The MPEG-4 encoder was/is very heavily multithreaded and uses dynamic programming algorithms to achieve minimum run time per frame with consistent image quality. In the process of all of this I also co-invented a very nifty motion estimation algorithm for 3D graphics, which is described in the U.S. patent at https://patents.google.com/patent/US20040095999A1/en, but that's another story. (I just now noticed that they misspelled my first name in that patent...)

 

Over the years I have hired a lot of programmers and one of the things I've tried to teach them is to indeed care about performance :) and reliability. Keep it simple and make it good enough, but not too good so that we can get it done and actually ship it...


On 12/3/2022 at 2:19 PM, apersson850 said:

The drawback of the flexible stack method is that the TMS 9900 doesn't have "deferred indirect".

The need for return stacks led to the 990/10 BIND and BLSK instructions. Since they're only in the 99105, not our consoles, that is just a historical footnote.

 

Prior to that, TI used stacked 16-word workspaces. One technique was to construct a BLWP vector in R10-R11 and then BLWP R10. Where R10 was usually the stack pointer,

   AI   R10,32
   LI   R11,CALLEE
   BLWP R10

created a stack WP frame. 
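
To make the context switch visible, here is a minimal Python sketch of what those three instructions do. It models only the documented BLWP/RTWP behaviour (the new WP and PC come from the two-word vector, and the old WP, PC and ST land in the new R13, R14 and R15). The `CPU` class, the helper names, and every address used are assumptions for illustration, not anything taken from TI's manuals.

```python
class CPU:
    """Toy TMS9900 model: registers live in memory at WP + 2 * n."""

    def __init__(self):
        self.mem = {}                  # sparse memory: byte address -> word
        self.wp = self.pc = self.st = 0

    def reg_addr(self, r):
        return self.wp + 2 * r

    def get_reg(self, r):
        return self.mem.get(self.reg_addr(r), 0)

    def set_reg(self, r, value):
        self.mem[self.reg_addr(r)] = value & 0xFFFF

    def blwp_reg(self, r):
        """BLWP Rr: the vector is the register pair Rr, Rr+1 of the caller."""
        vector = self.reg_addr(r)
        new_wp = self.mem.get(vector, 0)
        new_pc = self.mem.get(vector + 2, 0)
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = new_wp               # switch to the callee's workspace
        self.set_reg(13, old_wp)       # return context goes into new R13-R15
        self.set_reg(14, old_pc)
        self.set_reg(15, old_st)
        self.pc = new_pc

    def rtwp(self):
        """RTWP: restore ST, PC and WP from R15, R14 and R13."""
        st, pc, wp = self.get_reg(15), self.get_reg(14), self.get_reg(13)
        self.st, self.pc, self.wp = st, pc, wp


def call_with_stacked_workspace(cpu, callee):
    cpu.set_reg(10, cpu.get_reg(10) + 32)   # AI   R10,32
    cpu.set_reg(11, callee)                 # LI   R11,CALLEE
    cpu.blwp_reg(10)                        # BLWP R10


cpu = CPU()
cpu.wp, cpu.pc, cpu.st = 0x8300, 0x6010, 0x2000   # hypothetical caller state
cpu.set_reg(10, 0xA000)                           # hypothetical stack pointer
call_with_stacked_workspace(cpu, 0x7000)          # callee now runs with WP = 0xA020
cpu.rtwp()                                        # back in the caller's workspace
```

Note that in this model the callee's own R10 starts out uninitialised and the caller's R10 is still advanced after the return; how real code handled either detail is not covered here.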

Other stack techniques can be found in TI's Microprocessor Pascal manual for the TM990 series, or in the TI PASCAL manuals for the minis (which are different). See also the Software Development Handbook.

I think the same MP Pascal stack frames are shown in assembly in the TI Realtime Executive manual. All are on Bitsavers, some under the AMPL directory. Lots of 9900 heritage we can study.


1 minute ago, FarmerPotato said:

The need for return stacks led to the 990/10 BIND and BLSK instructions. Since they're only in the 99105, not our consoles, that is just a historical footnote.

 

Prior to that, TI used stacked 16-word workspaces. One technique was to construct a BLWP vector in R10-R11 and then BLWP R10. Where R10 was usually the stack pointer,

   AI   R10,32
   LI   R11,CALLEE
   BLWP R10

created a stack WP frame. 

Thanks for sharing this, that's interesting! And quite dense, looks like only 5 words of code space would be needed.

I have been wanting to test BIND and BLSK, need to dig up my TMS99105 boards :) 


1 hour ago, speccery said:

Sorry to go off on a tangent, but I think I need to write this, since I deeply care about execution performance.

Interesting to read.

I've spent quite a bit of my professional career writing code for multi-axis motion control systems that control several servo motors synchronized to each other, with high demands for low following errors combined with highly dynamic move profiles. So I can fathom what you are referring to, even if the target system is different.

 

Using the stack pointer as a frame pointer makes more sense when you have more than one stack. That's a flexible solution, although you need to figure out where to have them, at least in the rather small memory space of the 99/4A.

Stacking workspaces is one idea. More flexible than fixed areas for each workspace. That way you can actually make recursive BLWP calls as well, even if you'll run out of stack memory 16 times as fast, compared to only stacking R11.
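
The factor of 16 is just the frame size: a stacked workspace costs 16 words per call, while stacking only R11 costs one. As a rough sketch, with a stack budget of 1 KB picked purely as an assumption:

```python
WORD = 2  # bytes per 16-bit word


def max_call_depth(stack_bytes, words_per_call):
    """How many nested calls fit in stack_bytes at a given per-call cost."""
    return stack_bytes // (words_per_call * WORD)


STACK_BYTES = 1024                               # hypothetical budget
depth_bl = max_call_depth(STACK_BYTES, 1)        # BL, stacking only R11
depth_blwp = max_call_depth(STACK_BYTES, 16)     # BLWP, one workspace per call
print(depth_bl, depth_blwp)                      # prints "512 32"
```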
