BLWP vs BL

+mizapf · December 7, 2022

When life gives you BLWPs, make context-switch-ade.

+TheBF · December 7, 2022

53 minutes ago, apersson850 said:

Having the stack pointer also be the frame pointer/environment pointer, isn't that a bit confusing if you need to push/pop temporaries on the stack? I've always preferred a separate frame pointer.

A trick that I saw in Intel compiled code was to push the original RP onto the return stack as the last item saved on the stack frame.

I made a cheap and dirty local variable system for my Forth using this idea. (Apologies for the RPN assembler)

CODE LOCALS ( n --) \ build a stack frame n cells deep
   RP R0 MOV,       \ dup the return stack pointer 
   TOS 1 SLA,       \ top of data stack x 2 = memory needed
   TOS RP SUB,      \ allocate space on return stack 
   R0 RPUSH,        \ push the original stack pointer onto return stack 
   TOS POP,         \ refill tos data stack cache register (R4)
   NEXT,
ENDCODE

Then all the locals are accessed with indexed addressing using the RP register.

Then to collapse the stack frame you just do this.

*RP RP MOV,
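
To make the frame layout concrete, here is a rough Python model of what LOCALS and the collapse instruction do to RP. It is only a sketch under assumptions that are not stated in the post: the return stack grows downward, RPUSH, pre-decrements RP by one cell before storing, and the starting address 0x3F00 is purely hypothetical. The class and method names are made up for illustration.

```python
CELL = 2  # bytes per cell on the 16-bit TMS9900


class ReturnStackModel:
    """Toy model of the return stack pointer (RP). Names are hypothetical."""

    def __init__(self, top=0x3F00):      # hypothetical start address
        self.rp = top
        self.mem = {}                    # address -> 16-bit word

    def rpush(self, value):
        # Assumption: RPUSH, pre-decrements RP, then stores at *RP.
        self.rp -= CELL
        self.mem[self.rp] = value & 0xFFFF

    def locals_frame(self, n):
        original = self.rp               # RP R0 MOV,
        self.rp -= n * CELL              # TOS 1 SLA,  TOS RP SUB,
        self.rpush(original)             # R0 RPUSH,

    def local_addr(self, i):
        # Local i sits just above the saved RP: indexed addressing off RP.
        return self.rp + CELL * (i + 1)

    def collapse(self):
        self.rp = self.mem[self.rp]      # *RP RP MOV,


rs = ReturnStackModel()
rs.locals_frame(3)                       # 3 locals + 1 cell for the saved RP
print(hex(rs.rp), hex(rs.local_addr(0)))
rs.collapse()
print(hex(rs.rp))                        # back at the original RP
```

Because the saved RP is the last thing pushed, one indirect move is enough to drop the whole frame, whatever its size.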

Edited December 7, 2022 by TheBF
fixed comment

speccery · December 7, 2022

1 hour ago, apersson850 said:

Then you are concerned about execution performance, I presume?

In my programming life I also care about programming performance and reliability, not only execution speed. To isolate the caller from the callee, a BLWP could be handy, and isolation is usually good for preventing one stupid mistake from killing more than itself. Unless it's real-time programming where performance is crucial, or it simply doesn't work.

Sorry to go off on a tangent, but I think I need to write this, since I deeply care about execution performance.

I haven't professionally written software for the past seven or eight years or so, but in the preceding 25 years I spent a lot of my career writing very time-critical software. Mostly on x86 platforms, but also some on TMS320C30 and TMS320C51 DSPs. I was first in the video conferencing industry and then in the game streaming industry before moving on. In those companies the performance of the software - both execution speed and reliability - was paramount, not only for technical reasons: it was a business enabler and driver too.

In the late nineties we created the world's first H.320 standard compliant software-only video conferencing solution for Windows. I wrote both H.261 standard compliant video encoders and decoders and spent a lot of time optimising them; it wasn't easy to get them to run in real time on the PCs of the time. That meant a lot of assembler coding using MMX technology, which later changed to SSE. But I'm especially proud of the MPEG-4 encoder I wrote later on for game streaming. I think I spent at least a year optimising it, using SSE2 and SSSE3 (Supplemental Streaming SIMD Extensions 3 - what an acronym) assembler code. The MPEG-4 encoder was/is very heavily multithreaded and uses dynamic programming algorithms to achieve minimum run time per frame with consistent image quality. In the process of all of this I also co-invented a very nifty motion estimation algorithm for 3D graphics, which is described in the U.S. patent at https://patents.google.com/patent/US20040095999A1/en, but that's another story. (Just now noticed that they misspelled my first name in that patent...)

Over the years I have hired a lot of programmers and one of the things I've tried to teach them is to indeed care about performance and reliability. Keep it simple and make it good enough, but not too good so that we can get it done and actually ship it...

Edited December 7, 2022 by speccery

+FarmerPotato · December 7, 2022

On 12/3/2022 at 2:19 PM, apersson850 said:

The drawback of the flexible stack method is that the TMS 9900 doesn't have "deferred indirect".

The need for return stacks led to the 990/10 BIND and BLSK instructions. Since they're only in 99105, not our consoles, that is just a historical footnote.

Prior to that, TI used stacked 16-word workspaces. One technique was to construct a BLWP vector in R10-R11 then BLWP R10. Where R10 was usually the stack pointer,

       AI   R10,32
       LI   R11,CALLEE
       BLWP R10

created a stack WP frame.
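
For anyone who wants to step through that sequence, here is a small Python model of it. Treat it as a sketch, not a cycle-accurate emulator: it only models BLWP with a register operand (new WP from Rn, new PC from Rn+1, old WP/PC/ST saved in the new R13-R15) and RTWP. The addresses 0xA000, 0x6000 and 0x7000 and all names are hypothetical, and the model does not advance PC past the BLWP itself.

```python
class Tms9900Model:
    """Minimal model of workspace switching. Not a full CPU emulator."""

    def __init__(self, wp, pc):
        self.mem = {}            # address -> 16-bit word
        self.wp, self.pc, self.st = wp, pc, 0

    def get(self, r):
        return self.mem.get(self.wp + 2 * r, 0)

    def set(self, r, value):
        self.mem[self.wp + 2 * r] = value & 0xFFFF

    def blwp_reg(self, r):
        # The vector is read from Rr (new WP) and Rr+1 (new PC).
        new_wp, new_pc = self.get(r), self.get(r + 1)
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = new_wp
        self.set(13, old_wp)     # return context lands in the NEW workspace
        self.set(14, old_pc)
        self.set(15, old_st)
        self.pc = new_pc

    def rtwp(self):
        # Read all three before WP changes, since get() depends on WP.
        self.st, self.pc, self.wp = self.get(15), self.get(14), self.get(13)


def call_stacked(cpu, callee):
    cpu.set(10, cpu.get(10) + 32)   # AI   R10,32
    cpu.set(11, callee)             # LI   R11,CALLEE
    cpu.blwp_reg(10)                # BLWP R10


cpu = Tms9900Model(wp=0xA000, pc=0x6000)
cpu.set(10, 0xA000)        # assumption: R10 starts at the current workspace
call_stacked(cpu, 0x7000)
print(hex(cpu.wp), hex(cpu.pc))     # callee runs in the workspace 32 bytes up
cpu.rtwp()
print(hex(cpu.wp), hex(cpu.pc))     # caller's context restored
```

In this model the caller's R10 still holds the bumped value after RTWP, so the caller would have to subtract 32 again to release the frame, and a callee that wants to nest further has to set up its own R10 first. Both are observations about the sketch, not something the manuals were checked for.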

Other stack techniques can be found in TI's Microprocessor Pascal manual for the TM990 series, or the TI PASCAL manuals for the minis. (Different). Also the Software Development Handbook.

I think the same MP Pascal stack frames are shown in assembly in the TI Realtime Executive manual. All in Bitsavers. Some under the AMPL directory. Lots of 9900 heritage we can study.

speccery · December 7, 2022

1 minute ago, FarmerPotato said:

The need for return stacks led to the 990/10 BIND and BLSK instructions. Since they're only in 99105, not our consoles, that is just a historical footnote.

Prior to that, TI used stacked 16-word workspaces. One technique was to construct a BLWP vector in R10-R11 then BLWP R10. Where R10 was usually the stack pointer,

       AI   R10,32
       LI   R11,CALLEE
       BLWP R10

created a stack WP frame.

Thanks for sharing this, that's interesting! And quite dense, looks like only 5 words of code space would be needed.

I have been wanting to test BIND and BLSK; I need to dig up my TMS99105 boards.

apersson850 · December 7, 2022

1 hour ago, speccery said:

Sorry to go off on a tangent, but I think I need to write this, since I deeply care about execution performance.

Interesting to read.

I've spent quite a bit of my professional career writing code for multi-axis motion control systems that control several servo motors synchronized to each other, with high demands for low following errors combined with highly dynamic move profiles. So I can fathom what you are referring to, even if the target system is different.

Using the stack pointer as a frame pointer makes more sense when you have more than one stack. That's a flexible solution, although you need to figure out where to have them, at least in the rather small memory space of the 99/4A.

Stacking workspaces is one idea. More flexible than fixed areas for each workspace. That way you can actually make recursive BLWP calls as well, even if you'll run out of stack memory 16 times as fast, compared to only stacking R11.
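
The factor of 16 is just the workspace size (16 words) against the single word needed for R11. A quick back-of-the-envelope check in Python, with a stack area of 1 KiB chosen purely as a hypothetical example:

```python
WORD = 2                       # bytes per 16-bit word
WORKSPACE_BYTES = 16 * WORD    # one full workspace per stacked BLWP call
R11_BYTES = 1 * WORD           # only the return address per BL call


def max_depth(stack_bytes, bytes_per_call):
    """How many nested calls fit before the stack area is exhausted."""
    return stack_bytes // bytes_per_call


STACK_BYTES = 1024             # hypothetical stack area
print(max_depth(STACK_BYTES, WORKSPACE_BYTES))   # 32 nested BLWP calls
print(max_depth(STACK_BYTES, R11_BYTES))         # 512 nested BL calls
```

This ignores any parameters or locals stored alongside R11, which would narrow the gap in practice.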
