
So, what about strings?



19 hours ago, drpeter said:

I think once again it is becoming unclear what the question is.

 

Is it which hardware is/was fastest?

Is it which hardware/software combination is fastest?

Is it which hardware/firmware/software combination is fastest?

Is it which hardware/firmware/software/application combination is fastest?

Is the question restricted to hardware and/or software of a certain time period?

Is the question restricted to high-volume off-the-shelf hardware?

Is it 'cheating' to skip parts of the benchmark made unnecessary by the string-handling method chosen?

Is it legitimate to change the sort algorithm, or even the sort type?

Is it legitimate to alter the number of items to be sorted or interfere with the data in other ways?

Is it legitimate to 'mod' the hardware and/or firmware, and if so to what extent?

What speed enhancing 'tricks' in the routine are legitimate?

Is it legitimate to 'mod' even the software itself (Fast Basic)?

 

etc. etc.

 

Bearing in mind that the chief thing demonstrated by a benchmark is the system's ability to run that exact benchmark.

Which thought led Bill Wilkinson no less to repeatedly comment that most benchmark results were worth the paper they were written on and little more....

 

Answers to these questions will depend on the point to be demonstrated, and I'm not at all sure we agree what that is any more, although it's certainly been an interesting debate so far ☺️

And I think some interesting things have been clearly demonstrated and clarified along the way.

And perhaps it's more about the journey than the destination, so long as you can enjoy the ride!

All good points. 

I'm more interested in how fast the machines are.  No 'one rule for you, but no rules for me' stuff.
Machine, which BASIC, and year.  Year would also mean when the BASIC came out, in the case of upgrades.

Having to adapt the code to a different BASIC... there has to be some allowance for that.
Unrolling loops, changing the functionality (as with going from ^2 to *2 in Ahl's benchmark), changing the sort, etc... it's not the same thing even if the result is the same.
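To make that concrete (a sketch only, not Ahl's actual listing): most of these 8-bit BASICs evaluate the power operator through the LOG/EXP path in the floating-point package, while a multiply is a single FP operation, so

100 A=A^2 : REM routed through the slow transcendental code in most 8-bit BASICs
100 A=A*A : REM same value (modulo rounding), but a single FP multiply

Same answer, very different work. Whether an interpreter spots the integer exponent and multiplies serially (as PC BASIC reportedly does) is a property of that BASIC; editing the source line is a change to the benchmark.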

If some machines can sort more names, we can also include that number, and how fast it was.  The Plus/4 would win this hands down, or a CoCo 3 with a program called Big BASIC.

The fastest machine is going to differ by year, and newer machines are likely to be faster.  That's just the way it is.
It seems a bit disingenuous to say some machines that came out later can't be compared, when the Atari came out later than the Apple II, TRS-80 Model I, and PET.  The Atari had the benefit of later development, and it was designed to be better than those machines.  The first three were designed to compete with the KIM-1, Altair, etc., which weren't really even personal computers.

The whole 'I get to use a machine that was introduced later than the Apple, but you can't bring up Apples introduced later' thing is laughable.
The Apple III from 1980 was about twice as fast as the Apple II.  It should manage around the 107 number I posted for a 2 MHz Apple with zero BASIC modifications.
The CoCo came out in 1980, and it could run anything in the ROM bank at double speed.  That speeds up BASIC by 30%, a 6309 can make it another 20% faster, its ROM would benefit from the same modifications as the MC-10, and there are ways to make it even faster without an accelerator.
So, is 1980 too late?
 


34 minutes ago, JamesD said:

(...)

Having to adapt the code to a different BASIC... there has to be some allowance for that.
Unrolling loops, changing the functionality (as with going from ^2 to *2 in Ahl's benchmark), changing the sort, etc... it's not the same thing even if the result is the same.

 

(...)
 

 

Yep... like modifying actual MS-Basic statements by removing literal values and referencing (instead) global constants... or like running MS-Basic across different machines and finding a WIDE latitude of optimizations and versions... like PC MS-Basic / Basic, for instance, which catches 2^n operations and handles them as discrete, serial multiplications (from very early in its life)...

 

I think you got the picture upside-down. What we need to do is (if working with Basic):

 

  1. Identify the top 5 wrenches we can throw at ANY interpreter to understand how they handle them, and the effects of their design and implementation choices.
  2. Throw these wrenches sequentially at EACH interpreter and present, for EACH run, two (2) results: unoptimized (as is), and then optimized to handle the wrench, if the interpreter instance allows for it. If not, timing for 2nd run is the same as unoptimized.
  3. Totalize all timings, and report the lowest aggregate #.

 

The Kilobaud magazine test-suite is a REAL example of the above, but at a more "atomic" level (smaller functions). In any case, the above is a much more solid (and transparent) approach. Much, much harder to later debunk, in fact.

 

 

Edited by Faicuai

3 hours ago, Faicuai said:

 

Yep... like modifying actual MS-Basic statements by removing literal values and referencing (instead) global constants... or like running MS-Basic across different machines and finding a WIDE latitude of optimizations and versions... like PC MS-Basic / Basic, for instance, which catches 2^n operations and handles them as discrete, serial multiplications (from very early in its life)...

 

Did the algorithm change by putting constants in variables?  No
Did the algorithm change by switching from ^2 to *2?  Yes
Every other machine benchmarked in Creative Computing used ^2.
Does the PC benefit unfairly from recognizing that optimization?  Maybe, but then that's part of BASIC itself, not a change to the benchmark.
They could have used the math coprocessor for drastically better results, so if anything, PC BASIC is slower than it could be.
Besides, isn't that why we are benchmarking modern optimized BASICs?
So people can add things like that if they want?
 

 

3 hours ago, Faicuai said:

I think you got the picture upside-down. What we need to do is (if working with Basic):

 

  1. Identify the top 5 wrenches we can throw at ANY interpreter to understand how they handle them, and the effects of their design and implementation choices.
  2. Throw these wrenches sequentially at EACH interpreter and present, for EACH run, two (2) results: unoptimized (as is), and then optimized to handle the wrench, if the interpreter instance allows for it. If not, timing for 2nd run is the same as unoptimized.
  3. Totalize all timings, and report the lowest aggregate #.

 

That is a much more solid (and transparent) approach. Much, much harder to later debunk, in fact.

 

Because clearly, no optimization was used in the Atari code but lots was used in the MS BASIC code.  :roll:

MS BASIC wrenches.

It doesn't convert constants to binary at tokenization time.  Not even line numbers.  They must be converted from ASCII to float every time.
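A sketch of what that costs inside a loop (names arbitrary):

10 FOR I=1 TO 5000:A=A+1:NEXT I : REM the literal 1 in A=A+1 is re-converted from ASCII to float on every pass
10 O=1:FOR I=1 TO 5000:A=A+O:NEXT I : REM converted once; each pass is just a variable lookup

That's the whole basis of the constants-in-variables trick discussed above.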

*edit* It converts the line #s at the start of the line, not line #s in GOTOs, GOSUBs, or ON GO... statements.
This slows down GOTO & GOSUB.
Using FOR loops saves the address of the end of the FOR statement, making it easy to return to with the NEXT, so it's faster than GOTO.

It puts variables in its table in the order they are created, and searches from start to finish to find them.  FLOATs are stored before strings.
This is why MS BASIC programs almost always define variables at the top.
Since you complained about how I optimized the MS BASIC code, I just changed line 0 to define variables in a slightly better order for the sort, and the MC-10 time dropped another second.  Not much, but if you don't define them it can add up to quite a difference.  If I play with it, I might knock off another second.
But you can see, defining or moving a variable or two can make a difference, even if it's small.
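In sketch form, the rule is simply: create the hot-loop variables first, so the linear search finds them at the head of the table. The sort listing later in this thread does exactly that in its line 0:

0 J=0:R1$="":M=0:I0=0:IH=0:O=1:T=1/2:Q=-1 : REM innermost-loop variables created first, roughly in order of use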

It doesn't store a pointer to the current line.  Any loop on the same line using GOTO has to start searching from the start of the program.
It does check to see if the line number in the GOTO follows the current line number.
It also has to search for the end of a line character by character to find the end of line marker after a failed IF condition.
If you have a long line of code that depends on an IF at the start of the line, it's better to reverse the logic and GOTO a line a few lines down, and drop down for the true condition of the IF.  But this also depends on how often the condition is TRUE, so you may have to benchmark it.
So, to make it run faster, this:
50 IF A > 1  THEN do a lot of crap:GOTO 70
60 do something else
70
Becomes:
50 IF A<=1 THEN 70
60 do a lot of crap:GOTO 80
70 do something else
80

Those are the biggies off the top of my head. 
NEXT without the loop variable is faster.
Spaces are left in the code except on the Apple II, which removes them during tokenization. 
But you can't use certain variables on the Apple II because they may combine with keywords during tokenization.
On other machines, the spaces just slow down the interpreter.

The biggest problem with MS BASIC is inefficient implementation, which is what most of my optimizations target. 
Polling the entire keyboard before every token is executed, inefficient parsing, 8-bit instead of 16-bit code...
If I convert constants during tokenization, create a faster way to look up lines, and a faster way to look up variables, everything should run faster.
Math offers the biggest room for improvement.  A faster LOG function is on the list.

Edited by JamesD

5 hours ago, JamesD said:

I'm more interested in how fast the machines are. 

OK and what is the reason for that interest?  Are you imagining which machine you might buy if you were transported back to 1980? Or is it more of a fascination with the development over time of 8-bit computing power? Or is it an interest in seeing if you can make a given system perform faster?  For me, the chief fascination is in exploring imaginative solutions to a programming challenge within the tight constraints of a particular system. This started with the challenge of seeing whether a stock Atari could even be programmed to do the full 891 name sort at all. I'm less interested in whether another machine from the year before or the year after might be cajoled into performing the same arbitrary task a few seconds, or even minutes, quicker or slower. Although it's of interest to see how different BASIC implementations make particular programming tasks simpler or harder. But that's just me and where I'm coming from.

 

For me, these machines were competitors 40 years ago, not nowadays. I'm not so much interested in the C64 or the Apple II or the BBC Micro, not because I think they are bad or inferior machines, but simply because I never owned one.  I love the Atari simply because it was my first computer, and you never forget your first love, do you? Oh, or Star Raiders and M.U.L.E.?

Edited by drpeter

1 hour ago, drpeter said:

OK and what is the reason for that interest?  Are you imagining which machine you might buy if you were transported back to 1980? Or is it more of a fascination with the development over time of 8-bit computing power? Or is it an interest in seeing if you can make a given system perform faster?  For me, the chief fascination is in exploring imaginative solutions to a programming challenge within the tight constraints of a particular system. This started with the challenge of seeing whether a stock Atari could even be programmed to do the full 891 name sort at all. I'm less interested in whether another machine from the year before or the year after might be cajoled into performing the same arbitrary task a few seconds, or even minutes, quicker or slower. Although it's of interest to see how different BASIC implementations make particular programming tasks simpler or harder. But that's just me and where I'm coming from.

 

For me, these machines were competitors 40 years ago, not nowadays. I'm not so much interested in the C64 or the Apple II or the BBC Micro, not because I think they are bad or inferior machines, but simply because I never owned one.  I love the Atari simply because it was my first computer, and you never forget your first love, do you? Oh, or Star Raiders and M.U.L.E.?

Do we really need a specific reason?
My interest is a little bit of all of the above.
There's the 'this is how the machines performed on different benchmarks' aspect, and how the performance changed over time.
There's the what-if angle... like what if some worthless manager actually made a smart decision to update the BASIC they included with the machine, possibly even before its release.  Seriously, a 4%-5% speedup on the CoCo and MC-10 required almost no effort.
The Atari?  How do you slow down one of the fastest personal computers on the market (at intro) to where it looks the slowest???  And then you NEVER do anything about it even though BASIC came on a freakin' cart?  Why couldn't they have...

It's also interesting to see the different solutions to a problem, while sticking to just BASIC. 
MS BASIC is pretty straightforward with string handling, and that was mostly tweaking the sort to work properly using FOR loops.
Porting the MS BASIC to the Apple and the C64 required changing one command, and it took a couple of minutes to look up the Plus/4 display-disable POKE.
The new gap calculation was sort of a "how can I do this with less code, and no divide?" challenge, thinking that would speed it up a tiny bit.  The fact that it seems to work that well was a total surprise, but it may not be as efficient with much smaller or larger data sets.

The Atari was a bit different.  I remember seeing a couple of comments about its poor string handling, and I was curious to see if, and how, it could do it.
How many different versions and tweaks were tried before coming to the "final" one?  
Imagine if Atari had actually funded a BASIC XL like project, and introduced it by 1981, and faster math to go with it before the XL series came out.
Speaking of XL, did you ever run the last version of the code on BASIC XL and XE?
 


5 hours ago, JamesD said:

Did the algorithm change by putting constants in variables?  No
Did the algorithm change by switching from ^2 to *2?  Yes
Every other machine benchmarked in Creative Computing used ^2.
Does the PC benefit unfairly from recognizing that optimization?  Maybe, but then that's part of BASIC itself, not a change to the benchmark.
They could have used the math coprocessor for drastically better results, so if anything, PC BASIC is slower than it could be. (...)

Unfortunately, most of these observations seem to me out of practical context.

 

An algorithm does not change if the underlying expression (mathematically) does not change. 2^n is just 2·2·2·... given n an integer. It would be broken, however, if such a parameter were computed in real (floating-point) space and then processed as an integer. Even so, there is no problem in presenting the code WITH and WITHOUT its algebraic expansion, as we will clearly see who's trapping the expansion or not (which is the case with IBM's PC/MS-Basic, and a math co-processor has little to do with this).

 

Resolving m^n (as a simple example) is just a WRENCH that can be thrown at these interpreters. Another wrench, as we learned, is a sort routine based on DATA-statement look-up (and long line #s). Talk about a BIG wrench. Another wrench is Banana-sort itself, as it relies on maintaining a properly updated sort-index in real time for the totality of the incremental data-set read into memory, which requires running array access, indexing and memory-copy operations as fast as possible during an interpreted run, record after record. Even Atari Basic comes close to MS-Basic II on this test (on ATARI's implementation).

 

We definitely see this process from different perspectives. But the results are the only ones that count. Nothing else.

Edited by Faicuai

1 hour ago, Faicuai said:

Unfortunately, most of these observations seem to me out of practical context.

 

An algorithm does not change if the underlying expression (mathematically) does not change. 2^n is just 2·2·2·... given n an integer. It would be broken, however, if such a parameter were computed in real (floating-point) space and then processed as an integer. Even so, there is no problem in presenting the code WITH and WITHOUT its algebraic expansion, as we will clearly see who's trapping the expansion or not (which is the case with IBM's PC/MS-Basic, and a math co-processor has little to do with this).

 

Resolving m^n (as a simple example) is just a WRENCH that can be thrown at these interpreters. Another wrench, as we learned, is a sort routine based on DATA-statement look-up (and long line #s). Talk about a BIG wrench. Another wrench is Banana-sort itself, as it relies on maintaining a properly updated sort-index in real time, which requires array access, indexing and memory copies to be as fast as possible, record after record. Even Atari Basic comes close to MS-Basic II on this test (on ATARI's implementation).

 

We definitely see this process from different perspectives. But the results are the only ones that count. Nothing else.

Practical?  Do benchmarks have to be practical?

The DATA statements are a PITA, but the MC-10 doesn't have a disk drive, and that's what I used to test my BASIC changes.
Sorry it turned out to be a wrench for the Atari. 
Would a disk file have been better?  That would be a PITA on MS BASIC since every DOS is different.

Is the banana sort a variation of an insertion sort?  I haven't spent much time looking at it. 
The standard MC-10, Apple, and Plus/4 ROMs usually copy a byte at a time.  
My update uses some 16 bit memory copies, but I don't know which code this will pass through to do that.

If you have to use ^3 or ^5, it uses the same code as ^2 on every 8-bit.  *2 does not.
You'll never budge, so I won't waste any more time but to say the following...

I've already patched the MC-10 and CoCo 3 to use a hardware multiply. 
Your last number for Atari using that optimization was 42.3.
On the MC-10 with the new ROM, using *2 instead of ^2 drops the time from 66 seconds to about 42 or 43 seconds (hand timed).
That's about a 36% difference!
Are you going to tell me the MC-10 is as fast as the Atari?
With an HD6303, the hardware multiply is 3 clock cycles faster (30%), and many opcodes are only 1 clock cycle. 
That would put it ahead of the Atari even with just a 5% speedup.
The CoCo 3 runs at double the clock speed, so even without the multiply or any other patch, using *2 comes in at 43.55, machine timed.
Using *2 with the multiply patch should easily put it in the 27 second range if it has the same speedup.
A CoCo 3 with an HD6309 is 20% faster in native mode on 6809 code, and I could implement some of the math functions using the additional 16 bit register, combined 32 bit register, and could use memory move instructions.
That would put it in the sub 20 second range.
According to Creative Computing, the IBM PC was timed at 24 seconds... even with its internal *2 optimization in BASIC.
A note on the scan shows the Amiga at 22. 
The BBC Micro is 21, but who knows with what optimizations.
Do you still think this is fair?

Edited by JamesD

6 hours ago, JamesD said:

Do we really need a specific reason?

Only in that it might help lead to answers to the questions I listed previously, and therefore to some reasoned means of narrowing down and agreeing parameters for this benchmarking, which seems to have spiralled a little out of control.

 

 

 

6 hours ago, JamesD said:

The new gap calculation was sort of a "how can I do this with less code, and no divide?" challenge,

No problem there, except that it also, inadvertently or otherwise, and unnecessarily, changed the calculated gaps themselves by switching from an F of 1.3 to one of ~1.266 (i.e. 1/0.79), as well as potentially introducing some rounding variations (by applying each iteration of the calculation to a rounded integer rather than a floating-point value), which as discussed above ended up significantly changing the computing work needed to complete the sort- albeit not so much for 891 names.

That opens up a whole new can of worms regarding the rationale for, and reasonableness of, 'gaming' F to optimise the sort time for the dataset in hand. That clearly introduces a difficulty if different machines/BASICs you want to compare as like-to-like as possible are using different F values, and therefore gaps, for the sort. Not a problem if you're more interested in the fastest means to a narrow end on one particular system and dataset- although I'd still wonder about the point of trying to find the best F for a particular dataset rather than for the general case.

No problem at all if all comparator machines switch to the new F-value calculation, apart from the difficulties of keeping all contributors to the debate 'in sync' and everyone having to ditch all previous timings to re-run with a new optimised routine.  The latter being an issue not restricted to new F value calculations, of course, but also applying to the introduction of whole new sort types, sizes of dataset etc. etc. etc.
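In sketch form (as I read the two methods; the actual listings may differ in detail), the per-iteration gap updates are:

G=G/1.3 : REM original F=1.3, gap kept floating-point: 891 -> 685.4 -> 527.2 -> 405.5 ...
G=INT(G*.79) : REM revised divide-free form, F~1.266, rounded each step: 891 -> 703 -> 555 -> 438 ...

The two sequences diverge from the very first step, so the sort does genuinely different passes, rather than arriving at the same gaps by a cheaper route.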

 

4 hours ago, JamesD said:

Practical?  Do benchmarks have to be practical?

Again, that sort of depends on what you're trying to demonstrate and why. How fast is this machine for a general home-programming case is a very different question from how fast it can sort a large dataset of string values, or how fast it can repetitively perform a range of complex arithmetic operations- neither of which is probably a very commonly-required scenario in home computer programming or games programming, but they might be in office or scientific use respectively.

 

13 hours ago, JamesD said:

I'm more interested in how fast the machines are.

I guess to be more specific, how fast either a stock or custom machine/firmware/BASIC implementation is.  At one level we know how fast the Atari, for example, is: it's a 6502 at 1.79 MHz. Practical results beyond that are going to depend on a combination of firmware, BASIC and application implementation.

6 hours ago, JamesD said:

Imagine if Atari had actually funded a BASIC XL like project, and introduced it by 1981, and faster math to go with it before the XL series came out.

Indeed, I can't explain or excuse it. As we know, even a hobbyist working solo had managed to do this by 1985, when the XE was released. The Newell 3rd-party replacement fast-math ROM chip came out in 1982.  But by then Atari seemed fixated almost entirely on a downward spiral of cost-control in competition with Commodore...

 

6 hours ago, JamesD said:

The Atari was a bit different.  I remember seeing a couple comments about it's poor string handling, and was curious to see if/and how it could do it.

I would say more 'different' than 'poor'. It had both advantages and disadvantages compared with other BASICs, depending on context. And of course we're talking about stock Atari BASIC here, rather than the machines themselves and their multiplicity of 3rd-party / hobbyist BASICs and string-handling capabilities.

 

Lack of string arrays was of course the chief gripe, although as we've seen there are various workarounds for that. Bill Wilkinson's comments on that omission in several articles in Compute! are an interesting read.  Some lament the absence of LEFT$, MID$, RIGHT$. Not me. I find A$(12,15) much nicer notation than MID$(A$,12,4).  That said, I never understood why no Atari-family BASIC introduced a convenient shorthand for LEN(A$) (e.g. %), so that the equivalent of RIGHT$(A$,X) could be written as, for example, A$(%-X+1) rather than the awkward A$(LEN(A$)-X+1).
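Side by side, for anyone rusty on the two notations (Atari BASIC substring indices are start and end positions; MS BASIC's MID$ takes start and length):

B$=A$(12,15) : REM Atari BASIC: characters 12 through 15
B$=MID$(A$,12,4) : REM MS BASIC: 4 characters from position 12
B$=A$(LEN(A$)-X+1) : REM Atari BASIC: last X characters
B$=RIGHT$(A$,X) : REM MS BASIC equivalent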

 

My curiosity matched yours. Remembering anyway that the parameters of the original challenge were rooted in the somewhat arbitrary constraints of another system.  Why 891 names exactly? Because that's what fitted into the MC-10 system you were working with at the time. Why DATA statements? Because the MC-10 system you were working with didn't have access to disk files.  Why all that extraneous numeric data not directly used for the task in hand?  Well, you tell me, but it's not really that important. The parameters for a benchmark are always going to be somewhat arbitrary and it's working within those parameters as far as feasible that is for me the fun of the challenge. Not so much fretting over whether the Atari was or is a 'better' or 'faster' computer than the MC-10.

6 hours ago, JamesD said:

How many different versions and tweaks were tried before coming to the "final" one?

Good question, although I wouldn't say there's a "final" version- more different ones which answer the original question in different ways and to different extents. First I tried a system of 2 float pointer arrays into a long string of names plus a float sort-sequence array, but found that sucked up far too much memory for a full 891-name sort, and would do so even with a reduced footprint of 1 float pointer array to take advantage of the specific scenario of packed static data. So, then a system where the single pointer array and sort-sequence array were encoded as word arrays held within long strings, written and extracted bytewise with CHR$() and ASC().  This achieved the original goal with a stock Atari machine and BASIC, but was very, very slow. Although if this were a practical requirement for a task you only needed to run occasionally, that wouldn't matter.  Set it to run and come back an hour later.
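For the curious, the string-as-word-array encoding looks roughly like this (a sketch; it assumes W$ is DIMensioned to twice the entry count, with P, V and I as illustrative names):

P=2*I-1
W$(P,P)=CHR$(V-256*INT(V/256)) : REM low byte of 16-bit value V as entry I
W$(P+1,P+1)=CHR$(INT(V/256)) : REM high byte
V=ASC(W$(P,P))+256*ASC(W$(P+1,P+1)) : REM and reading it back

Every access costs a handful of interpreted operations, which is exactly why the DPEEK/DPOKE variants described below were so much quicker.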

 

However, I then moved on to other Atari BASIC-compatible BASICs, still lacking string arrays but allowing access to the same string-as-word-array system using DPEEK and DPOKE, which speeded things up very considerably- both because of the more efficient means of accessing the arrays and because of intrinsically faster BASIC implementations.  Next, I tried the 'pointer-into-line-numbers-of-static-DATA-statements-as-array' technique, which led to a faster 'complete' solution on stock ATARI and BASIC, but proved not quite as quick as the 'DPEEK and DPOKE' approach in BASICs supporting that. Finally, I did the rather trivial implementation in BASICs supporting string arrays. Which unsurprisingly gave the fastest results so far, but using string arrays, to date I've only managed to fit the full 891 name solution into memory by using a combination of OSS BASIC XE and extended memory on an Atari XE. I'm yet to try Atari MS BASIC though...

 

There's a sneaky possibility I've been considering to make the code more memory efficient by auto-deleting DATA statements after loading their data into an array and before dimensioning the sort-sequence array.  But would this be 'cheating'?  So far I've considered chaining programs or ENTERing data/code on-the-fly from a second file to be out-of-bounds.

 

It's of interest to note that these programming techniques are themselves only made possible by certain advantages the Atari BASIC family has over some other BASICs. Specifically, the ability to have strings up to 32768 characters long, the ability to specify an arithmetic expression rather than a constant as the line parameter for a RESTORE statement and the ability to self-modify code on-the-fly.
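The computed-RESTORE half of that is worth a sketch (line numbers and data illustrative; the real programs pack more fields per DATA line): give each DATA statement a line number equal to a base plus its index, and the line number itself becomes the pointer:

1000 DATA AARON
1001 DATA ABEL
2000 RESTORE 1000+I(J):READ N$ : REM fetch the name at sort position J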

6 hours ago, JamesD said:

Speaking of XL, did you ever run the last version of the code on BASIC XL and XE?

Which of the above versions were you thinking of? Or were you referring to the revised gap calculation method?

Edited by drpeter

10 hours ago, JamesD said:

Practical?  Do benchmarks have to be practical?

The DATA statements are a PITA, but the MC-10 doesn't have a disk drive, and that's what I used to test my BASIC changes.
Sorry it turned out to be a wrench for the Atari. 
Would a disk file have been better?  That would be a PITA on MS BASIC since every DOS is different.

Is the banana sort a variation of an insertion sort?  I haven't spent much time looking at it. 
The standard MC-10, Apple, and Plus/4 ROMs usually copy a byte at a time.  
My update uses some 16 bit memory copies, but I don't know which code this will pass through to do that.

If you have to use ^3 or ^5, it uses the same code as ^2 on every 8-bit.  *2 does not.
You'll never budge, so I won't waste any more time but to say the following...

I've already patched the MC-10 and CoCo 3 to use a hardware multiply. 
Your last number for Atari using that optimization was 42.3.
On the MC-10 with the new ROM, using *2 instead of ^2 drops the time from 66 seconds to about 42 or 43 seconds (hand timed).
That's about a 36% difference!
Are you going to tell me the MC-10 is as fast as the Atari?
With an HD6303, the hardware multiply is 3 clock cycles faster (30%), and many opcodes are only 1 clock cycle. 
That would put it ahead of the Atari even with just a 5% speedup.
 

 

ALL timings, for ALL AHL variants  / configs. on Atari, here (already in 18 secs. territory... far below 42 secs...):

 

 

Now that you mention HW speed preference, thought I would refresh your memory...

 

Edited by Faicuai

6 minutes ago, drpeter said:

Hmm. This is what Bill Wilkinson had to say about the Ahl benchmark...

 

https://www.atarimagazines.com/compute/issue57/insight_atari.html

Of course... had already read that (detailed) article a long time ago...

 

There you can see how the "wrenches" involved in Ahl's benchmark are actually handled within Atari Basic's own limitations... It became very clear to me that the approach to these benchmarks, and the underlying SW layer (e.g. Basic, OS, FP-packs, etc.), would need to be embraced from a different point of view...


5 hours ago, Faicuai said:

 

ALL timings, for ALL AHL variants  / configs. on Atari, here (already in 18 secs. territory... far below 42 secs...):

 

 

Now that you mention HW speed preference, thought I would refresh your memory...

 

Not for an MS BASIC.


21 minutes ago, JamesD said:

Not for an MS BASIC.

Not correct.

 

MS Basic II and Indus/GT-Z80 timings (running MS-Basic as well) are also posted on that thread, at the very end (just as I reported them here with Banana sort).

 

Quite a wide plethora of interpreters and optimization settings were included in that summary. However, inter-platform MS-Basic timings are subject to significant implementation variations, to such an extent that I don't take MS-Basic results as any real solid / stable reference of machine-to-machine performance. Even Basic XL + Newell/FPP (from 1983-1984) would have blown Atari MS-Basic II out of the water, as long as the OS FP package was used!

 

And still within 8,192 bytes of ROM, and same 16 Kbytes of OS-ROM, we can already see how much better (and superior) the Atari is capable of performing, without absolutely any change to the underlying CPU (clock, speed, etc.) or RAM timings.

Edited by Faicuai

3 hours ago, Faicuai said:

Not correct.

 

MS Basic II and Indus/GT-Z80 timings (running MS-Basic as well) are also posted on that thread, at the very end (just as I reported them here with Banana sort).

 

Quite a wide plethora of interpreters and optimization settings were included in that summary. However, inter-platform MS-Basic timings are subject to significant implementation variations, to such an extent that I don't take MS-Basic results as any real solid / stable reference of machine-to-machine performance. Even Basic XL + Newell/FPP (from 1983-1984) would have blown Atari MS-Basic II out of the water, as long as the OS FP package was used!

 

And still within 8,192 bytes of ROM, and same 16 Kbytes of OS-ROM, we can already see how much better (and superior) the Atari is capable of performing, without absolutely any change to the underlying CPU (clock, speed, etc.) or RAM timings.

If you look at how MS BASIC is implemented, it works pretty much the same way on every 68xx & 6502 platform.
If anything, it's closer to the machines doing the same work than any other test. 
The one significant difference... patching the BASIC to take advantage of the hardware multiply on the MC-10 & CoCo 3 makes a huge speed difference.  
The MC-10 number even came from an early version of BASIC I created that is missing most of the optimizations I've written.  Most of the changes in that version were to squeeze in the larger multiply code.
I know you don't like the slower numbers, but it is most definitely a good machine to machine comparison.


29 minutes ago, JamesD said:

If you look at how MS BASIC is implemented, it works pretty much the same way on every 68xx & 6502 platform.
If anything, it's closer to the machines doing the same work than any other test. 
The one significant difference... patching the BASIC to take advantage of the hardware multiply on the MC-10 & CoCo 3 makes a huge speed difference.  
The MC-10 number even came from an early version of BASIC I created that is missing most of the optimizations I've written.  Most of the changes in that version were to squeeze in the larger multiply code.
I know you don't like the slower numbers, but it is most definitely a good machine to machine comparison.

Ok, once again, let's put this to the test:

 

Please, run SORTBAS7.MSB (or .MST for ease of conversion on your end), as is (it is written for MS-Basic, with essentially minimal adaptations), and post the results here.

 

Let's see.


1 hour ago, Faicuai said:

Ok, once again, let's put this to the test:

 

Please, run SORTBAS7.MSB (or .MST for ease of conversion on your end), as is (it is written for MS-Basic, with essentially minimal adaptations), and post the results here.

 

Let's see.

SORTBAS7.MSB isn't formatted so I can use it.
How about carriage returns instead of >?
I used the data from the text file version and reformatted the code.
GOTO everywhere.  This is definitely not written by someone used to optimizing MS BASIC code.

Edited by JamesD

2 hours ago, JamesD said:

31 minutes, 20 seconds.
Congrats, you broke MS BASIC
That would be dog slow even at double speed.

 

Bingo! There you have it: MS-Basic crumbling under the weight of a wrench.

 

However, I have the impression this is worth looking at carefully. Please, check times of Apple II and any other relevant MS-Basic machines you consider.

 

Let's see...

 


As I've said before, MS BASIC doesn't tokenize constants, including line numbers in GOTOs, and I haven't made any optimizations to speed up GOTO or GOSUB yet.
After seeing this, it looks like a generic memory copy may be used to copy from one float to another. 
If that's the case, I'll add a special case for that. It would speed up a LOT of things.

FWIW, minor changes to the code cut off over 5 minutes, and a few more changes might drop another 5.
The memory move seems slow.  I'll have to see what code is being called.  

I am NOT timing the Apple.  I won't sit around that long. 
The Plus/4 does it in about 35.5 minutes without blanking the screen.
 


Speeding up the interpreter's variable lookup code should help speed this up significantly. That is already on my to-do list.
Just changing the code a bit offers significant improvement.  The code below isn't optimal, but it's better without really altering the algorithm. 
This code, which includes turning off the display for the memory move on the Plus/4, drops the time to a little under 24.5 minutes.
The MC-10 with new ROM completes it in 22:36.  Even a minor patch to the Plus/4 should make it pass the MC-10. 
More efficient algorithms make a big difference, but ultimately, MHz matters.

Ditching multi-character variable names completely would make this faster.  MS BASIC only recognizes the first 2 characters anyway.
Improving the order variables are defined in at the top could speed it up, and changing a few more lines of code might help cut a few more minutes off of the time.
Using machine language for the insert memory move would probably offer the biggest improvement since it would bypass all the variable lookups and expression evaluation code, but then that's not benchmarking BASIC anymore.  It would be interesting to see how much time is spent in that loop though.
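(On the two-character point, classic MS BASIC accepts longer names but only the first two characters are significant, so for example

IMW=5:IMX=9:PRINT IMW : REM prints 9; both names collapse to the same variable "IM"

and every extra character still has to be scanned past on each reference.)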


0 J=0:R1$="":M=0:I0=0:IH=0:O=1:T=1/2:Q=-1
1 REM DEFINT I,J,L,M,N,S
2 NL=561+O:S=561+O:IMW=5:LB=T:MS=10
3 DIM I(S),N$(NL),R1$
'18 REM 
'20 POKE ANTADR,ANTON*ANTIC:POKE TH,0:POKE TLB,0
4 PRINT  "Loading & Sorting: ":J=0
5 FORX=0TO1STEP0:READ R1$,A,B,C:IFR1$="EOL"THEN70
6 J=J+O:I(J)=J:N$(J)=R1$:IFJ<LBTHENNEXTX
'30 REM 
7 PRINT".";:IH=J-O:IL=O:IFJ<MSTHEN52
8 FORP=0TO1STEP0:IW=IH-IL:IFIW<IMWTHEN52
44 IH=INT(IW*T):I0=I(IL+IH)
46 IF R1$<N$(I0)THENIH=IL+IH:NEXTP
48 IL=IL+IH:NEXTP
'50 REM 
52 IH=J:FORM=IHTOILSTEPQ:I0=I(M):IFR1$<N$(I0)THENIH=M
58 NEXT:IFIH>=JTHENNEXTX
60 I0=I(J):IL=IH+O:FORM=JTOILSTEPQ:I(M)=I(M-O):NEXT
62 I(IH)=I0:NEXTX
'64 REM
'66 X=PEEK(TH):Y=PEEK(TLB)
70 PRINT:PRINT"Printing Sort Results..."
72 FORM=OTOJ:I0=I(M):PRINT".";N$(I0);:NEXTM
'74 REM 
92 PRINT:PRINT:PRINT"Processed ";J;" records!"
'94 PRINT  "Elapsed: ";(X+256*Y)/VF;" sec."
96 END 
98 REM

'Plus/4 changes
'0 TIMER$="000000":J=0:R1$="":M=0:I0=0:IH=0:O=1:T=1/2:Q=-1
'5 FORX=0TO1STEP0:POKE65286,27:READ R1$,A,B,C:IFR1$="EOL"THEN70
'52 POKE 65286,0:IH=J:FORM=IHTOILSTEPQ:I0=I(M):IFR1$<N$(I0)THENIH=M
'62 I(IH)=I0:NEXTX:POKE65286,27
'96 ?TI/60:END
 

 


3 hours ago, JamesD said:

Speeding up the interpreter's variable lookup code should help speed this up significantly. That is already on my to do list.
Just changing the code a bit offers significant improvement.  The code below isn't optimal, but it's better without really altering the algorithm. 
This code, which includes turning off the display for the memory move on the Plus/4, drops the time to a little under 24.5 minutes.
The MC-10 with new ROM completes it in 22:36.  Even a minor patch to the Plus/4 should make it pass the MC-10. 
More efficient algorithms make a big difference, but ultimately, MHz matters.

Ditching multi-character variable names completely would make this faster.  MS BASIC only recognizes the first 2 characters anyway.
Improving the order variables are defined in at the top could speed it up, and changing a few more lines of code might help cut a few more minutes off of the time.
Using machine language for the insert memory move would probably offer the biggest improvement since it would bypass all the variable lookups and expression evaluation code, but then that's not benchmarking BASIC anymore.  It would be interesting to see how much time is spent in that loop though.


0 J=0:R1$="":M=0:I0=0:IH=0:O=1:T=1/2:Q=-1
1 REM DEFINT I,J,L,M,N,S
2 NL=561+O:S=561+O:IMW=5:LB=T:MS=10
3 DIM I(S),N$(NL),R1$
'18 REM 
'20 POKE ANTADR,ANTON*ANTIC:POKE TH,0:POKE TLB,0
4 PRINT  "Loading & Sorting: ":J=0
5 FORX=0TO1STEP0:READ R1$,A,B,C:IFR1$="EOL"THEN70
6 J=J+O:I(J)=J:N$(J)=R1$:IFJ<LBTHENNEXTX
'30 REM 
7 PRINT".";:IH=J-O:IL=O:IFJ<MSTHEN52
8 FORP=0TO1STEP0:IW=IH-IL:IFIW<IMWTHEN52
44 IH=INT(IW*T):I0=I(IL+IH)
46 IF R1$<N$(I0)THENIH=IL+IH:NEXTP
48 IL=IL+IH:NEXTP
'50 REM 
52 IH=J:FORM=IHTOILSTEPQ:I0=I(M):IFR1$<N$(I0)THENIH=M
58 NEXT:IFIH>=JTHENNEXTX
60 I0=I(J):IL=IH+O:FORM=JTOILSTEPQ:I(M)=I(M-O):NEXT
62 I(IH)=I0:NEXTX
'64 REM
'66 X=PEEK(TH):Y=PEEK(TLB)
70 PRINT:PRINT"Printing Sort Results..."
72 FORM=OTOJ:I0=I(M):PRINT".";N$(I0);:NEXTM
'74 REM 
92 PRINT:PRINT:PRINT"Processed ";J;" records!"
'94 PRINT  "Elapsed: ";(X+256*Y)/VF;" sec."
96 END 
98 REM

'Plus/4 changes
'0 TIMER$="000000":J=0:R1$="":M=0:I0=0:IH=0:O=1:T=1/2:Q=-1
'5 FORX=0TO1STEP0:POKE65286,27:READ R1$,A,B,C:IFR1$="EOL"THEN70
'52 POKE 65286,0:IH=J:FORM=IHTOILSTEPQ:I0=I(M):IFR1$<N$(I0)THENIH=M
'62 I(IH)=I0:NEXTX:POKE65286,27
'96 ?TI/60:END
 

 

Check current optimization gains vs. just enabling the DEFINT declaration in the source header.

 

All in all, I am initially inclined to think that the biggest contribution to the execution-timing differences is, in reality, actual implementation differences among MS-Basic versions themselves, more than CPU bandwidth alone (not that the latter is not a factor; it's a question of relative contribution).

 

This would be pretty much in-line with Doctor's initial comments, though...

 

I also wonder how the BBC Micro would handle the original SORTBAS7.MBT source list, though. We'll see...


4 hours ago, Faicuai said:

Check current optimization gains vs. just enabling the DEFINT declaration in the source header.

 

All in all, I am initially inclined to think that the biggest contribution to the execution-timing differences is, in reality, actual implementation differences among MS-Basic versions themselves, more than CPU bandwidth alone (not that the latter is not a factor; it's a question of relative contribution).

 

This would be pretty much in-line with Doctor's initial comments, though...

 

I also wonder how the BBC Micro would handle the original SORTBAS7.MBT source list, though. We'll see...

DEFINT requires support for integers.  


21 hours ago, JamesD said:

DEFINT requires support for integers.  

Correct. USE IT!

 

We are already past the initial "wrenched" run, and now well into the "optimized" run. DEFINT is one of MS-Basic's key strengths.
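For the variants that have it (the CP/M and PC lineages do; several of the 8-bit ROM BASICs discussed here don't), enabling it is a one-line declaration, and the listing above even carries it commented out as its line 1:

1 DEFINT I,J,L,M,N,S : REM variables starting with these letters become 16-bit integers

Index and loop arithmetic then bypass the floating-point path, which is presumably the source of the DEFINT vs nonINT spread below.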

 

As promised days ago, here are IndusGT-CP/M Z80 timings, on MS-Basic85 (5.29), interpreted, via CP/M Terminal Emulator on Atari 800/Incognito:

 

1. DEFINT: 09m:03s (about 543 secs)

2. nonINT: 11m:25s (about 685 secs)

 

All of the above for 561 records (an exact copy of SORTBAS7.LST, adapted to CP/M character coding with the ICDS tool included on the Indus CP/M disk).

 

SURPRISE: results are quite a bit faster (100 secs in the best case) than MS-Basic II for Atari (which is not normally the case, though).

Edited by Faicuai