Jump to content
IGNORED

Strategy for efficiently storing and checking sets of boolean values in TI BASIC


Recommended Posts

I was puzzling lately over the most memory-efficient and time-efficient means of storing and retrieving sets of boolean values in TI BASIC (not Extended BASIC, so no bitwise operators). 

 

Say you've got a game where levels have switches/doors/toggles, and you need to store (and efficiently check) the on/off value for sets of up to 10 of them, as needed. 

 

The best approach I've been able to come up with so far, offering a balance of memory-efficiency and (importantly, as this is TI BASIC) time-efficiency is as an integer which is a multiple of primes.

 

So 10 boolean values are stored in an integer with a value of up to 2*3*5*7*11*13*17*19*23*29 = 6469693230 (when all 10 bits are "on").

 

And to set values, starting from X=1, "bit 3" can be set by X=X*5, "bit 9" by X=X*23, etc.  And zeroing values, if necessary, would of course be achieved by division. 

 

Then, to retrieve values, the truth value of, say, bit 7, is tested with (X/17=INT(X/17)), which is likewise quite fast, in TI BASIC terms. 

 

This is the sort of very elementary business that I'm sure has been trod a thousand times, over the years.  But I decided to attack TI BASIC type-in/cassette game development full force, recently, so I'm kind of exploring all these nooks and crannies for solutions. 

 

Does anyone have a better solution?  Within the domain of TI BASIC itself.  Not interested in jailbreak-based solutions, for present purposes.  Keeping in mind that speed is the top priority here, and memory use the secondary one.  This solution is very fast at storing values (a LET with a multiplication) and fairly fast at retrieving them (since an INT() and two divisions take a little longer). 

 

If you just need to store a few boolean values, sure, just making them regular numerical variables with values of 1 and 0 is fine.  But given how big TI BASIC numerical variables are, that solution doesn't really scale.  So it became apparent to me I needed something better, which isn't as slow as packing and unpacking data stored in strings. 

 

 

  • Like 2
Link to comment
Share on other sites

In TI BASIC, I found straight arrays were faster to use as simple flags rather than trying to store bit flags in a variable.  I have used bit flags in Extended BASIC using OR to set, AND to unset and detect, and it is faster than the equivalent math routines in TI BASIC.

 

You need eight bit flags?  F({0..7}) works just fine.  You can test for value or test for truth (0=false, anything else is true.)  It is more memory hungry than storing eight bits in a single variable, but faster to work with.  In terms of memory, since TI BASIC has no concept of integer storage, unlike CBM BASIC which uses six bytes (two for the name, two for the memory pointer, and two for the value,) for the % integer operator (F%, for instance,) TI BASIC will store your flag as a floating point number.

 

In terms of ease-of-use and speed, a straight variable for each flag, in my experience, is superior in TI BASIC.

  • Like 1
Link to comment
Share on other sites

Yes, as I had said, if you just need to store a few boolean values (or for that matter, your problem isn't one which is likely to butt up against memory limitations regardless), then storing them as 1s and 0s in regular numerical variables (probably as arrays so you can iterate over them easily) is fine and dandy.

 

The problem I'm confronting is just that in a complex game context where I am wrangling every spare byte and trying to control the size of very large arrays, this solution does not scale sufficiently well.  Storing a two-dimensional array of boolean values in eight bytes per bit is some crazy pants nonsense when you're fighting for spare bytes.

 

Using this solution, a 10x8 array of boolean values consists of 64 bytes of numerical data, instead of 640 bytes.  Which does make a major difference, in a 16K world.

  • Like 1
Link to comment
Share on other sites

I will say, this sort of struggle does have something of the quality of the solution in search of a problem.  After all, why develop any big or complex project for a TI BASIC 16K/Cassette use case, in a world where Extended BASIC and easy, cheap disk solutions are pervasively available?  Well, the same reason I'm using a 43 year old computer in the first place, I figure. 

 

But in a broader context, I do feel like TI BASIC was oddly underexploited.  It's somewhat shocking just how few high quality, thoughtfully developed TI BASIC programs were developed in its time.  Programs that actually looked at what it can do and the speed at which it can do it, and used those (albeit limited) capabilities to their fullest, within their own means.  Flooraway and Maze of Grog and Lord of Serpents are some examples. 

 

But at the same time, it's not shocking, given how utterly necessary Extended BASIC was, and how completely straightforward and impossible to ignore or put aside its advantages are.  Even if it had added no features at all, the huge performance advantage would still be impossible to pass up. 

  • Like 3
Link to comment
Share on other sites

One downside I can think of with your method is if you wanted to compile the program it would fail because you are using floats.

 

I have been noodling on strings as an alternative but I can't think of a fast way to set/change a char in the middle of a string. I suspect a bunch of nested string functions could do it however. 

It might still be the same speed as a lot of floating point math. (?)

 

POS(BIT$,"1",N) would let you read a "bit" pretty fast or you could use SEG$()

If you accessed  a bunch of "bits" as a string ( ie: "10101010") with SEG$ you can compared them all at once with the equal operator.

So there is that. 

 

 

  • Thanks 1
Link to comment
Share on other sites

1 hour ago, TheBF said:

POS(BIT$,"1",N) would let you read a "bit" pretty fast or you could use SEG$()

If you accessed  a bunch of "bits" as a string ( ie: "10101010") with SEG$ you can compared them all at once with the equal operator.

So there is that.

Yeah, there definitely is a use case for strings as non-text data storage in TI BASIC, and there are a fair number of programs out there that use them in such ways.

 

For example, I've seen entire songs stored in string format, which extract all their CALL SOUND values via ASC and SEG$.

 

But it is very slow.  A POS takes 40ms in TI BASIC to find character 1 of a 1 character string, and search duration grows at about 3ms per character.  And SEG$ takes 35ms to segment character 1 of a 1 character string.  Plus, string concatenation is quite slow, in BASIC.  Concatenating three single-character strings and assigning them to a variable takes over 50ms. 

 

So potentially a tool for downtime extraction or storage of data, but not something I'd use for general purpose gameplay variables that need to be accessed and modified regularly and on the fly, sadly. 

 

I did use strings for storage of integer values in one case in the program that motivated this thread, precisely because I needed something that behaved like a stack, which strings obviously can very easily.  But not for data actively used during gameplay, in that case.

Link to comment
Share on other sites

2 hours ago, pixelpedant said:

Yes, as I had said, if you just need to store a few boolean values (or for that matter, your problem isn't one which is likely to butt up against memory limitations regardless), then storing them as 1s and 0s in regular numerical variables (probably as arrays so you can iterate over them easily) is fine and dandy.

I thought I was being supportive, though I did not go so far as to say directly, but alluded, that TI BASIC may not be the proper environment for complex scenarios.  Certainly should not be a discouragement: we climb the mountain because it is there.

 

A lot has been squeezed out of TI BASIC over the decades, so getting a little more out of its native environment (no sandbox jailbreak or other special external tricks) would be neat to see, as well as the cost-benefits analysis.

  • Like 3
Link to comment
Share on other sites

19 hours ago, TheBF said:

One downside I can think of with your method is if you wanted to compile the program it would fail because you are using floats.

 

I have been noodling on strings as an alternative but I can't think of a fast way to set/change a char in the middle of a string. I suspect a bunch of nested string functions could do it however. 

It might still be the same speed as a lot of floating point math. (?)

 

POS(BIT$,"1",N) would let you read a "bit" pretty fast or you could use SEG$()

If you accessed  a bunch of "bits" as a string ( ie: "10101010") with SEG$ you can compared them all at once with the equal operator.

So there is that. 

 

 

Hmm if we turn this into an Assembly program I could incorporate it into RXB ROMs.

Link to comment
Share on other sites

18 hours ago, OLD CS1 said:

I thought I was being supportive, though I did not go so far as to say directly, but alluded, that TI BASIC may not be the proper environment for complex scenarios.  Certainly should not be a discouragement: we climb the mountain because it is there.

 

A lot has been squeezed out of TI BASIC over the decades, so getting a little more out of its native environment (no sandbox jailbreak or other special external tricks) would be neat to see, as well as the cost-benefits analysis.

To bad we can not do like I did with RXB and turn much of TI BASIC into Assembly.

Link to comment
Share on other sites

22 hours ago, TheBF said:

I have been noodling on strings as an alternative but I can't think of a fast way to set/change a char in the middle of a string. I suspect a bunch of nested string functions could do it however. 

It might still be the same speed as a lot of floating point math. (?)

 

POS(BIT$,"1",N) would let you read a "bit" pretty fast or you could use SEG$()

If you accessed  a bunch of "bits" as a string ( ie: "10101010") with SEG$ you can compared them all at once with the equal operator.

So there is that. 

 

 

I haven't thought this through 100%, but here's another way that might work using strings:

For your 10 values you could have ABCDEFGHIJ for each of the 10 values. Start out with A$=""

To see if "C" is there you could

50 P=POS(A$,"C",1) 

51 IF P THEN 100 ELSE 200

100 REM C IS THERE, GET RID OF IT

110 A$=SEG$(A$,1,P-1)&SEG$(A$,P+1,255)

200 REM C NOT THERE, ADD IT

210 A$=A$&"C"

Adding a letter is easy, as the order does not matter. A$ could be "CGDA"

But with two SEG$, removing a letter might be too slow.

 

(can't remember if you need parentheses around the POS(A$,"C",1))

  • Like 3
Link to comment
Share on other sites

Another possible way to use strings would be to have an array. This would be considerably smaller than using a numeric array.

To set the 3rd element: A$(3)="1"      1166 loops in 30 seconds

To reset: A$(3)=""                             1243 loops in 30 seconds

To test: IF A$(3)="1" THEN 200          1090 loops in 30 seconds

 

The prime method:

To set:      X=X*5                                              2285 loops in 30 seconds

To reset:   X=X/5                                               2211 loops in 30 seconds

To test:     IF X/17=INT(X/17) THEN 200              677 loops in 30 seconds

 

These are approximate times done with a stop watch. Plus I did not time empty loops which would let you determine how long each instruction takes.

Setting/resetting is faster using the prime method. However, testing is faster using the string method. So if you are mostly testing then the string method might be better. If you are mostly setting/resetting then the prime method would be better.

Arrays take time. If you have, say, 10 things to keep track of, maybe A$,B$,C$ etc. would be faster.

 

OLD CS1 has a good point about the garbage collection. But if you are not setting/resetting a lot it would not happen too often.

 

 

 

 

  • Like 3
Link to comment
Share on other sites

Here's another thing that is easy to overlook. Besides the overhead of a numeric variable or string variable or array, there is also the overhead of actually using the variable.

10 IF X/17=INT(X/17) THEN 100     takes 26 bytes

10 IF A$(3)="" THEN 100                takes 20 bytes

10 IF A$="" THEN 100                    takes 16 bytes

10 X=X*17                                     takes 14 bytes

10 A$(3)=""                                   takes 16 bytes

10 A$=""                                       takes 11 bytes

 

So the answer to your original question is.......complicated.

  • Like 2
  • Haha 1
Link to comment
Share on other sites

3 hours ago, senior_falcon said:

Here's another thing that is easy to overlook. Besides the overhead of a numeric variable or string variable or array, there is also the overhead of actually using the variable.

10 IF X/17=INT(X/17) THEN 100     takes 26 bytes

10 IF A$(3)="" THEN 100                takes 20 bytes

10 IF A$="" THEN 100                    takes 16 bytes

10 X=X*17                                     takes 14 bytes

10 A$(3)=""                                   takes 16 bytes

10 A$=""                                       takes 11 bytes

 

So the answer to your original question is.......complicated.

Gets even more complicated when you consider speed and types of memory.

Numeric values like 17 takes 8 bytes in memory as Floating Point and the longer the variable name the more memory used.

String values only take length and characters, but ONLY reside in VDP Slower memory in XB, in BASIC everything is in Slower VDP memory.

RAM memory is way faster than VDP memory and everything in BASIC is in VDP.

BASIC is slow for many reasons and I have not even mentioned GPL yet.

Link to comment
Share on other sites

4 hours ago, senior_falcon said:

 

So the answer to your original question is.......complicated.

Indeed!  And thank you for engaging with this question and producing these results.  :)

 

The significant memory cost of numeric constants is definitely something one has to keep in mind. 

 

One big takeaway from that reality being just that, given a large program with a lot of math in it, replacing every single instance of "1" with a variable "A" which is permanently set to the value 1 is very nearly always going to save a meaningful chunk of memory.  And look stupid and nonsensical, but c'est la vie, if you actually need that memory. 

 

Just another reason to effectively write two versions of any given TI BASIC program - one which is readable and unoptimised, and another which is a goddamn unreadable hodgepodge, but optimised. 

 

The limitation of this approach for very large programs being that if you're making it an unreadable hodgepodge to fit inside memory limitations and you're butting right up against said limitations, the "unoptimised" version is not actually going to be testable/usable code.  It's effectively no better than pseudocode, if it can't actually run.  So arguably, you may as well just write pseudocode instead. 

 

  • Like 1
Link to comment
Share on other sites

23 minutes ago, pixelpedant said:

 So arguably, you may as well just write pseudocode instead. 

... and then use a programmable language that turns your pseudocode into runnable code and so forth.

(Sorry. I couldn't resist the tempation) ;) 

  • Haha 1
Link to comment
Share on other sites

2 hours ago, RXB said:

RAM memory is way faster than VDP memory and everything in BASIC is in VDP.

VDP RAM is slower but not as much as I thought it was if you use an interpreted language where the interpreter has already slowed down the operations of the machine. 

I did some experiments in Forth with strings and variables in VDP RAM and expansion RAM.

I don't have the results handy but I was surprised. I need to find that code again.

Link to comment
Share on other sites

30 minutes ago, pixelpedant said:

The limitation of this approach for very large programs being that if you're making it an unreadable hodgepodge to fit inside memory limitations and you're butting right up against said limitations, the "unoptimised" version is not actually going to be testable/usable code.  It's effectively no better than pseudocode, if it can't actually run.  So arguably, you may as well just write pseudocode instead. 

Not necessarily.  The advantage of pseudocode would be writing geared toward your final goal, but with the lacking of testability.  I have written some BASIC programs which come close to consuming TI BASIC's program memory, and I have maintained two code bases of the same program.  Individual unoptimized parts can be tested in smaller chunks, feeding it useful information and checking expected output (or fuzzing it with nonsense to see if it breaks.)  You will definitely run into a problem of translating between your unoptimized source and your optimized running program.  Moving between my code bases sometimes introduced a little quirk here and there which had to be addressed, mostly because of the differences between AND and OR in XB and the equivalent mathematic formula in TI BASIC, but also simple transcription errors which change values or alter program flow unexpectedly.

  • Like 2
Link to comment
Share on other sites

7 hours ago, pixelpedant said:

One big takeaway from that reality being just that, given a large program with a lot of math in it, replacing every single instance of "1" with a variable "A" which is permanently set to the value 1 is very nearly always going to save a meaningful chunk of memory.  And look stupid and nonsensical, but c'est la vie, if you actually need that memory. 

 

I don't think there is any savings. In the program line, "1" is not kept as a floating point number using 8 bytes. It is kept as a string with a token denoting a string, a length byte and "1", and converted to floating point when it is needed by the program. 

"A" is kept the same way, with a length byte and "A"  Just the A with no length byte and no token denoting a string.

I may be remembering wrong, but this is easy to check in Classic99. Do a cold reset to clear the VDP memory, goto BASIC and enter a program line. In vdp ram the program line ends at >37D7 and you can use the debugger to see how many bytes are in the crunched line. Remember that program lines are 2 bytes, so line 10 becomes >000A.

Edited by senior_falcon
  • Like 2
Link to comment
Share on other sites

Ahah!  You make an excellent point (and have excellent knowledge of these things, as always). 

 

My hypothesis had been tested simply by dumping a giant program of mine into XB with all "1s" replaced with a variable "A" and comparing the SIZE result, which had improved meaningfully.

 

But looking at the tokenised BASIC, it looks like where this is coming from is not where I'd expected. 

 

Specifically, ZZZ=A in tokenised TI BASIC appears to consist of five bytes, those being:

 

>5A >5A >5A >BE >42
Z   Z   Z   =   A

While ZZZ=1 in tokenised TI BASIC appears to consist of seven bytes, those being:

>5A >5A >5A >BE >C8 >01 >31
Z   Z   Z   =   Str Len 1

So the memory footprint of any given reference to a numeric constant (in bytes) appears to be its decimal character length + 2. 

 

While the memory footprint of any given reference to a numeric variable appears to be its character length outright.

 

So if I dump a program like this into Classic99, it grows upwards to >3757 or so.

 

10 ZZZ=1
20 ZZZ=1
30 ZZZ=1
40 ZZZ=1
50 ZZZ=1
60 ZZZ=1
70 ZZZ=1
80 ZZZ=1
90 ZZZ=1
100 ZZZ=1

 

While if I dump a program like this into Classic 99, it grows upwards to >376B or so:

 

10 ZZZ=A
20 ZZZ=A
30 ZZZ=A
40 ZZZ=A
50 ZZZ=A
60 ZZZ=A
70 ZZZ=A
80 ZZZ=A
90 ZZZ=A
100 ZZZ=A

 

 

  • Like 2
Link to comment
Share on other sites

23 hours ago, TheBF said:

VDP RAM is slower but not as much as I thought it was if you use an interpreted language where the interpreter has already slowed down the operations of the machine. 

I did some experiments in Forth with strings and variables in VDP RAM and expansion RAM.

I don't have the results handy but I was surprised. I need to find that code again.

I spend all day for last 30 years working with GPL Interpreter and XB.

By changing VDP routines from GPL to Assembly I have increased routines in XB accessing VDP by 9 times like in CALL HCHAR or CALL VCHAR 

 

Link to comment
Share on other sites

18 hours ago, pixelpedant said:

Ahah!  You make an excellent point (and have excellent knowledge of these things, as always). 

 

My hypothesis had been tested simply by dumping a giant program of mine into XB with all "1s" replaced with a variable "A" and comparing the SIZE result, which had improved meaningfully.

 

But looking at the tokenised BASIC, it looks like where this is coming from is not where I'd expected. 

 

Specifically, ZZZ=A in tokenised TI BASIC appears to consist of five bytes, those being:

 


>5A >5A >5A >BE >42
Z   Z   Z   =   A

While ZZZ=1 in tokenised TI BASIC appears to consist of seven bytes, those being:


>5A >5A >5A >BE >C8 >01 >31
Z   Z   Z   =   Str Len 1

So the memory footprint of any given reference to a numeric constant (in bytes) appears to be its decimal character length + 2. 

 

While the memory footprint of any given reference to a numeric variable appears to be its character length outright.

 

So if I dump a program like this into Classic99, it grows upwards to >3757 or so.

 

10 ZZZ=1
20 ZZZ=1
30 ZZZ=1
40 ZZZ=1
50 ZZZ=1
60 ZZZ=1
70 ZZZ=1
80 ZZZ=1
90 ZZZ=1
100 ZZZ=1

 

While if I dump a program like this into Classic 99, it grows upwards to >376B or so:

 

10 ZZZ=A
20 ZZZ=A
30 ZZZ=A
40 ZZZ=A
50 ZZZ=A
60 ZZZ=A
70 ZZZ=A
80 ZZZ=A
90 ZZZ=A
100 ZZZ=A

 

 

This shows only the TOKENs used not the speed of execution, there will be more routines to run to interpret the A vs a constant 1

Interpreter has to look up symbol A find it's address where stored and then use that value this takes more time especially in VDP.

Do this in RAM with XB vs Consol and you will see a large difference in number of executions.

I have been fighting the fight on this for many years now.

 

100 CALL CLEAR
110 OPEN #1:"CLOCK"
120 INPUT #1:A$,B$,C$
130 FOR C=1 TO 10000
140 Z=1
150 NEXT C
160 INPUT #1:D$,E$,F$
170 PRINT A$,D$:B$,E$,C$,F$
180 END

100 CALL CLEAR :: A=1
110 OPEN #1:"CLOCK"
120 INPUT #1:A$,B$,C$
130 FOR C=1 TO 10000
140 Z=A
150 NEXT C
160 INPUT #1:D$,E$,F$
170 PRINT A$,D$:B$,E$,C$,F$
180 END

Run each to see the results.

TOP TEST Z=1 is 1 minute 17 seconds

BOTTOM TEST Z=A is 1 minute 23 seconds

  • Like 1
Link to comment
Share on other sites

I see the opposite. Looping 4000 times:

Z=A  32 seconds

Z=1  37 seconds

 

Bottom line is that it is pretty close either way.

 

Of course, I don't trust the times that my computer gives, and after seeing the results you posted in "BYTE MAGAZINE SIEVE BENCHMARK" or in "BENCHMARKING LANGUAGES", I definitely would not trust your times. Unless you have determined what was causing the anomalous results in the video and are getting results that agree with a real TI99. (As I noted in my post #219 in benchmarking languages "Here's the thing. As far as I can tell, your video shows a test that should yield valid results." Yet the video in your post #216 clearly shows that it does not.)

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...