Fast Math ROM

peteym5 · May 10, 2008

Does Turbobasic use the OS floating point package or its own?

Turbobasic uses its own floating point package. However, its improvements go beyond that. Most of the interpreter has been optimized relative to Atari Basic; you can just benchmark a simple FOR-NEXT loop and see a major difference between the two.

ClausB · May 10, 2008

Does it use BCD or binary math?

Rybags · May 10, 2008

I'd guess both.

Proper integer math routines would be "native", i.e. they would not convert numbers back and forth to BCD (the way C-64 BASIC converts its integers to floating point and back).

Easy enough to find out: if the int/binary routines are slower, then it's almost 100% certain that conversion is taking place.

peteym5 · May 10, 2008

I think all the variables are in the standard 6-byte BCD format, which is why Turbobasic is backward compatible with Atari Basic. It even compiles to BCD format. There is no way to define a variable as floating point or integer, as in later programming languages on more advanced computers. Doing direct 8- and 16-bit integer math is much faster, which is why many subroutines written directly in assembly run hundreds of times faster than a high-level language like Basic.

Some programmers use machine language routines from Basic to manipulate the player/missile graphics, sound effects, or anything that is a lengthy process. Most of the time, you only need one or two bytes to store a variable, because you are only using numbers no larger than 255 or 65535.

One of my ideas for someone working on an enhanced OS is to also include integer multiply and divide routines (16-bit integer instead of 48-bit BCD floating point).

ClausB · May 10, 2008

If I knew the entry point for Turbobasic's multiply routine, I could patch into it and maybe speed it up even more.

Yes, assembly is much faster for integer ops. The idea here was to speed up those number-crunching programs that use the floating point package and need 8-digit precision.

One could use the table-of-squares method to speed up binary multiplications of 1, 2, or 4 byte integers as well, but it would be a little different than this 5-byte BCD code.
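
To illustrate the table-of-squares idea for plain binary integers, here is a minimal Python sketch of the quarter-square identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4). The names QUARTER_SQUARES and mul8 are made up for this example, and this is not the FAFMUL code, which works on BCD digits. On a 6502 the table would more likely be split into low-byte and high-byte pages than held in one list.

```python
# Quarter-square table: floor(n*n/4) for every possible sum of two bytes.
# 255 + 255 = 510 is the largest index needed.
QUARTER_SQUARES = [(n * n) // 4 for n in range(511)]


def mul8(a, b):
    """Multiply two unsigned 8-bit values using only table lookups,
    one addition and two subtractions.

    Works because a+b and a-b always have the same parity, so the
    fractions dropped by the two floor operations cancel exactly.
    """
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("operands must fit in one byte")
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]


if __name__ == "__main__":
    print(mul8(12, 34))
    print(mul8(255, 255))
```

Wider operands can be built from byte-sized partial products combined with shifts and adds, which is where the table pays off most.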

ClausB · May 10, 2008

Just gave it a quick go.
This program runs a good deal faster (about 10.4 seconds vs 18.7) with the modified routine vs the standard one.

For comparison, I tried the Omniview OS, which supposedly has the Newell FastROM embedded in it. Time there was 12.64

Cool! Thanks for checking it out.

peteym5 · May 10, 2008

Another option is to make a non-BCD floating point package, all binary. The problem is that you have to do more processing to display the variables. What type of applications do you plan on using your faster math routines for? If it's game and graphics related, for Basic games on the Atari, that's great. It is an interesting study and a good hobby for you. Atari had a great concept for their floating point storage. I am not totally sure how Commodore and Apple did it on their 6502-based machines.

However, any professional mathematical process can be done on a more powerful computer, like a modern PC. Anything I write in Visual Basic runs much faster than anything done in Atari machine language.

ClausB · May 10, 2008

Here are the benchmark programs. BM.BAS gives the timings I posted above. I corrected them by 2% because the program multiplies by 0 that often.

BMT.BAS checks the accuracy and speed of trig functions. I believe those functions are calculated as polynomials which make extensive use of the multiplier, so they should speed up too. BMT.BAS makes use of two trigonometric identities: For any angle a,

0 = 1-sqrt(sin(a)*sin(a)+cos(a)*cos(a))

0 = a-arctan(sin(a)/cos(a))

It computes these for 1° increments from 0 to 89° and averages the errors via the root-mean-square method. With the stock multiplier, it runs in 43.3 seconds and reports RMS errors of 1.5e-9 and 3.5e-7 respectively. With FAFMUL, it runs in 23.5 seconds with RMS errors of 6.9e-9 and 3.5e-7. This is the error increase due to rounding that I mentioned in a post above. If you add a line to print each individual error, you see that they are not larger but there are more of them. The errors in the first identity are all either 0 or 1.0e-8, which is the least significant digit in the Atari BCD representation of the number 1.
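
For anyone who wants to see the shape of this test without loading BMT.BAS, here is a rough Python sketch of the same two identity checks. It is not a translation of the BASIC program: the function name is invented, angles are handled in radians as an assumption, and Python's binary doubles will give far smaller errors than the Atari BCD routines.

```python
import math


def rms_identity_errors():
    """Return the RMS errors of the two identities over 0..89 degrees.

    Identity 1: 0 = 1 - sqrt(sin(a)^2 + cos(a)^2)
    Identity 2: 0 = a - arctan(sin(a) / cos(a))
    """
    sum_sq_1 = 0.0
    sum_sq_2 = 0.0
    count = 0
    for degrees in range(90):            # 0, 1, ..., 89
        a = math.radians(degrees)
        s, c = math.sin(a), math.cos(a)
        err_1 = 1.0 - math.sqrt(s * s + c * c)
        err_2 = a - math.atan(s / c)     # cos(a) is nonzero below 90 degrees
        sum_sq_1 += err_1 * err_1
        sum_sq_2 += err_2 * err_2
        count += 1
    return math.sqrt(sum_sq_1 / count), math.sqrt(sum_sq_2 / count)


if __name__ == "__main__":
    rms_1, rms_2 = rms_identity_errors()
    print("RMS error, identity 1:", rms_1)
    print("RMS error, identity 2:", rms_2)
```

Printing err_1 inside the loop is the equivalent of the "add a line to print each individual error" suggestion.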

Finally, ASTRO.BAS computes the location of any planet in the sky from anywhere on earth. It uses many SIN()s and multiplies. I wrote it 25 years ago and used every Atari BASIC trick I knew to speed it up, including switching off GR. 0 mode. It locates the moon on the default date and time in 27 seconds stock or in 24 seconds with FAFMUL. What a disappointment! I guess it's all the READ / DATA statements. Anyway, there it is.

I'll post the FAFMUL source code after I finish commenting it.

FAFMUL.zip

ClausB · May 10, 2008

Here's the source listing. For ease, I used a PC assembler and converted the object code to an Atari .OBJ file. Sorry it's not in Atari assembler format, but it could be converted. It's pretty well commented, by my standards. Feel free to ask about what's not clear.

FAFMUL.txt

ClausB · May 12, 2008

I realized that BMT.BAS as posted does not measure the accuracy very well. With the Atari floating point representation, numbers slightly less than 1 have 10 decimals of precision. Numbers slightly greater than 1 have only 8 digits beyond the decimal point. BMT.BAS adds two squares less than 1 to get a sum of 1, losing 2 digits.
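
The jump in precision at 1 can be sketched in a few lines of Python. This assumes the usual description of the Atari format (a power-of-100 exponent plus five mantissa bytes of two BCD digits each, with the point after the first byte); the function name spacing_near is made up for the illustration.

```python
from decimal import Decimal

MANTISSA_BYTES = 5  # each byte holds two BCD digits (assumed format)


def spacing_near(x):
    """Return the gap between neighbouring representable values near x
    in a base-100 floating point format with MANTISSA_BYTES digits.
    """
    value = abs(Decimal(str(x)))
    if value == 0:
        raise ValueError("zero has no leading digit to normalise")
    exponent = 0                      # power of 100
    while value >= 100:               # normalise so that 1 <= value < 100
        value /= 100
        exponent += 1
    while value < 1:
        value *= 100
        exponent -= 1
    # The last mantissa byte sits (MANTISSA_BYTES - 1) base-100 places
    # below the leading byte.
    return Decimal(100) ** (exponent - (MANTISSA_BYTES - 1))


if __name__ == "__main__":
    for sample in (0.5, 0.99, 1, 1.01):
        print(sample, spacing_near(sample))
```

With these assumptions spacing_near(0.99) is 1E-10 while spacing_near(1) is 1E-8, which is the two-digit loss described above.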

The first identity above can also be written:

0 = r-sqrt(r*sin(a)*r*sin(a)+r*cos(a)*r*cos(a))

If you choose r=0.99 instead of r=1, you avoid the problem and get a better accuracy measurement. To fix BMT.BAS, change lines 30 and 50 to:

30 X=0.99*COS(A):Y=0.99*SIN(A)

50 S=S+(0.99-Z)*(0.99-Z)

Then you get nearly the same accuracy using the stock multiplier or FAFMUL, 5.6E-09 RMS.

Well, I've beaten this to death. I see a few of you have downloaded, so please post here what you think. Thanks.

ClausB · May 13, 2008

One last benchmark: BMR.BAS takes the square of the square root and the exponential of the log. It runs stock in 39.1 seconds with RMS errors 7.5E-8 and 1.3E-5. With FAFMUL it's almost twice as fast at 20.8 seconds and more accurate, with RMS errors 4.1E-8 and 1.3E-5. As an experiment I turned off FAFMUL's rounding and got 7.4E-8 RMS for the square root squared, so rounding does help accuracy here.

Here's the latest upload with all the benchmark programs.

FAFMUL.zip

Rybags · May 13, 2008

Nice.

Probably an idea to have some mechanism of easily "emulating" the rounding errors/precision of the stock FP-ROM. Would help in some cases in maintaining compatibility.

ClausB · May 14, 2008

I disabled rounding with a simple POKE 53220,0 and enabled it with POKE 53220,80. However, I feel that rounding is better and that pure compatibility with these very small errors is unimportant.

BTW, I noticed that FAFMUL neglects to keep the exponent within the legal range of E+98 to E-98. The stock routine returns an error if the product exceeds the range, but FAFMUL allows it. The product is still a valid number, but the print routine can't handle it. Try X=1E50*1E50:?X and you get 1E+:0, but ?SQR(X) correctly gives 1E+50. I'll post a fix when I can.
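
As a rough picture of the missing check, here is a Python sketch that adds two decimal exponents and rejects a result outside the range quoted above. It is not the stock ROM logic or the eventual fix: the names are invented, the carry from the mantissa product is ignored, and treating underflow the same as overflow is a simplification. The second function is only a guess at why the odd ":" appears, assuming the print routine builds each exponent digit by adding it to the character "0".

```python
MAX_EXPONENT = 98  # decimal exponent limit quoted in the thread


def product_exponent(exp_a, exp_b):
    """Add the decimal exponents of two factors and reject results
    outside +/-MAX_EXPONENT (mantissa carry ignored for simplicity).
    """
    total = exp_a + exp_b
    if abs(total) > MAX_EXPONENT:
        raise OverflowError("exponent %d outside legal range" % total)
    return total


def naive_two_digit(exponent):
    """Format an exponent magnitude as two characters the naive way.

    Assumed mechanism: each digit value is added to the code of '0',
    with no check that the tens value is below 10.
    """
    tens, ones = divmod(abs(exponent), 10)
    return chr(ord("0") + tens) + chr(ord("0") + ones)


if __name__ == "__main__":
    print(naive_two_digit(50))    # a legal exponent
    print(naive_two_digit(100))   # what an unchecked 1E50*1E50 would need
    try:
        product_exponent(50, 50)
    except OverflowError as exc:
        print("rejected:", exc)
```

Under that assumption an exponent of 100 has a "tens digit" of 10, and the character ten places after "0" is ":", which would match the 1E+:0 output.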

peteym5 · May 14, 2008

Pretty cool. I ran your benchmark programs and they run in almost half the time of standard Atari Basic, from about 39 seconds down to 20 seconds. I also ran both programs in Turbobasic, which got it down to just under 10 seconds. I think TB is probably using similar techniques, but keep in mind that TB also interprets all the stuff around the math statements faster. It does the adds and subtracts, FOR-NEXT processing, and printing all in less time. Just benchmark an empty FOR-NEXT loop and you'll see a difference.

I am not sure how to call the TB math routines themselves or where they're located. I am not sure if the original source code is available.

Edited May 14, 2008 by peteym5

_The Doctor__ · October 8, 2019

ah the fix is still waiting
