Benchmarking Languages

Tursi · April 23, 2022

6 hours ago, RXB said:

Hmm how come Sprite Auto motion is not being used?

Could you think of a worse example for XB to move sprites in a single direction?

Also why is line 130 not FOR CNT= 1 to 100 and line 180 not NEXT CNT ????

The purpose of the test is to measure performance of interfacing to the VDP. The purpose is not to move the sprite from point A to point B.

Using automotion defeats the purpose because XB is no longer moving the sprite.

+TheBF · April 23, 2022

5 hours ago, apersson850 said:

It's no longer the same thing, though. In the implementation. If we only consider the looks, then it is.

To truthfully follow the original you could select a speed which spends one interrupt per pixel. Which speed is that?

Why code SPEED*-1 when -SPEED most certainly is faster?

I forgot that I could do that. I haven't used XB since 1984 when I got my first PC clone.

RXB · April 23, 2022

3 hours ago, Tursi said:

The purpose of the test is to measure performance of interfacing to the VDP. The purpose is not to move the sprite from point A to point B.

Using automotion defeats the purpose because XB is no longer moving the sprite.

Well that would rig the test for Assembly or some other languages like that and really slow down XB by it moving the sprite?

That is why I moved as many of the VDP screen routines as I could into ROM3 Assembly instead of GPL in RXB 2021 and RXB 2022.

I tried writing Assembly for sprites, but there was no noticeable improvement in speed.

Tursi · April 23, 2022

Just now, RXB said:

Well that would rig the test for Assembly or some other languages like that and really slow down XB by it moving the sprite?

That's the purpose of a benchmark - to measure performance of the same task across multiple environments.

But you can do whatever you want. I abandoned this whole experiment the first time it became controversial.

If it were me, though, I'd re-run the benchmark in RXB 2022 to show how much the enhancements help.

apersson850 · April 23, 2022

8 hours ago, Tursi said:

The purpose of the test is to measure performance of interfacing to the VDP. The purpose is not to move the sprite from point A to point B.

But as we've seen it rather tests the implementation of sprite handling.

The functionality provided for that is most advanced in Pascal, and it's also the slowest. At least as long as you don't start using more intricate knowledge of how the system works.

RXB · April 23, 2022

13 hours ago, Tursi said:

That's the purpose of a benchmark - to measure performance of the same task across multiple environments.

But you can do whatever you want. I abandoned this whole experiment the first time it became controversial.

If it were me, though, I'd re-run the benchmark in RXB 2022 to show how much the enhancements help.

Yeah. As I could not find any way to speed up sprites in XB without compiling the code like 2.9 does, RXB has no advantage over XB in your original test.

On the other hand, RXB's HCHAR, VCHAR, CLEAR, and others are faster, as is the RND function.

+FarmerPotato · April 23, 2022

On 4/22/2022 at 7:31 AM, Vorticon said:

(listing below).

Erastosthenes sieve benchmark.pdf 11.29 MB · 9 downloads

I love that you have this on 32-column thermal paper.

Tursi · April 24, 2022

17 hours ago, apersson850 said:

But as we've seen it rather tests the implementation of sprite handling.

The functionality provided for that is most advanced in Pascal, and it's also the slowest. At least as long as you don't start using more intricate knowledge of how the system works.

I was explaining to Rich what I wrote it for. That people want to debate it instead of making something better is just baffling to me.

9 hours ago, RXB said:

Yeah. As I could not find any way to speed up sprites in XB without compiling the code like 2.9 does, RXB has no advantage over XB in your original test.

On the other hand, RXB's HCHAR, VCHAR, CLEAR, and others are faster, as is the RND function.

Well, it'd be easy to change the test to moving an asterisk around with HCHAR. Try that across the various systems! Assembly and C should be equivalent, the others would be new data.

lucien2 · April 24, 2022

On 4/22/2022 at 7:41 PM, RXB said:

Where is the GPL code for this as I think I could punch it up a little faster.

Here it is: https://atariage.com/forums/topic/248187-benchmarking-languages/?do=findComment&comment=3422021

You must click "Reveal hidden contents" to see the code.

+Vorticon · April 24, 2022

16 hours ago, FarmerPotato said:

I love that you have this on 32-column thermal paper.

That was the only option available to get a sharable (is that a word?) listing from real hardware.

+TheBF · April 24, 2022

On 1/22/2016 at 6:12 PM, lucien2 said:

GPL: 80 seconds

When we compared TF and GPL with the bricks demo 4 1/2 years ago they were closer.

	grom	>6000
	data	>aa00,>0100,>0000
	data	menu
	data	>0000,>0000,>0000,>0000
menu	data	>0000
	data	start
	stri	'BENCHMARK'

upcase	equ	>0018
x	equ	arg
y	equ	arg+1
xy	equ	arg
cnt	equ	arg+2

start
* magnify 2
	st	>e1,@arg
	move	1,@arg,#1
* load uppercase character set
	dst	>0900,@fac
	call	upcase
* copy asterisk pattern to sprite char 0
	move	8,v@42*8+>800,v@>400
* define sprite 0 to character 0, color black	
	dst	>8001,v@>302
* locate sprite 0 to 1,1
	dst	>0000,v@>300
	
	st	100,@cnt
	
L5	clr	@x
L1	st	@x,v@>301
	inc	@x
	ch	239,@x
	br	L1
	
	clr	@y
L2	st	@y,v@>300
	inc	@y
	ch	175,@y
	br	L2
	
	st	239,@x
L3	st	@x,v@>301
	dec	@x
	ceq	255,@x
	br	L3
	
	st	175,@y
L4	st	@y,v@>300
	dec	@y
	ceq	255,@y
	br	L4

	dec	@cnt
	cz	@cnt
	br	L5

	exit

@RXB Lucien wrote up this GPL version Rich.
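For a rough sense of what the 80-second GPL figure means per VDP access, the sketch below counts the sprite-position writes in one lap of the listing above. It assumes each of the four loops writes every coordinate from 0 through its limit inclusive (239 for columns, 175 for rows), which is how the listing reads; the constant and function names are illustrative only.

```python
# Sketch: count sprite-position writes per lap and derive a rough throughput.
# Assumption: each loop writes every coordinate from 0 through its limit
# inclusive, as the GPL listing appears to do.

COL_LIMIT = 239    # last column value written
ROW_LIMIT = 175    # last row value written
LAPS = 100         # loop count used by the benchmark
GPL_SECONDS = 80   # GPL timing quoted in the thread


def writes_per_lap(col_limit: int, row_limit: int) -> int:
    """Return the number of position writes in one trip around the screen."""
    right = len(range(0, col_limit + 1))   # columns 0..239
    down = len(range(0, row_limit + 1))    # rows 0..175
    left = len(range(col_limit, -1, -1))   # columns 239..0
    up = len(range(row_limit, -1, -1))     # rows 175..0
    return right + down + left + up


per_lap = writes_per_lap(COL_LIMIT, ROW_LIMIT)
total_writes = per_lap * LAPS
writes_per_second = total_writes / GPL_SECONDS

print(f"{per_lap} writes per lap, {total_writes} in total")
print(f"about {writes_per_second:.0f} writes per second at {GPL_SECONDS} s")
```

Under those assumptions the whole run performs 83,200 single-byte VDP writes, so the figure is dominated by per-write interpreter overhead rather than by anything the sprite itself does.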

RXB · April 24, 2022

GPL is designed around saving as much memory as possible while at the same time using Assembly ROM to do more of the work through GPL XML routines.

GPL commands are just ROM 0 Assembly routines with GPL being the upper layer.

For example, the 2-byte GPL command ALL 32 is the CALL CLEAR command in BASIC or XB.

So, if I were honestly to make a program, I would make a new ROM to go with the GPL program to speed it up and make it more optimal.

This is how GPL is supposed to work. Just like RXB 2022 that uses Assembly to speed up slower routines.

   grom   >6000
   data   >aa00,>0100,>0000
   data   menu
   data   >0000,>0000,>0000,>0000
menu   data   >0000
   data   start
   stri   'BENCHMARK'

upcase   equ   >0018
x   equ   arg
y   equ   arg+1
xy   equ   arg
cnt   equ   arg+2

start
* magnify 2
   st   >e1,@arg
   move   1,@arg,#1
* load uppercase character set
   dst   >0900,@fac
   call   upcase
* copy asterisk pattern to sprite char 0
   move   8,v@42*8+>800,v@>400
* define sprite 0 to character 0, color black
   dst   >8001,v@>302
* locate sprite 0 to 1,1
   dst   >0000,v@>300

   st   100,@cnt

L5   CLR   @X
   ST   239,@PAD
   XML   RCOL * MOVES X INTO SPRITE 1 COLUMN
* INCREMENTS X AND COMPARES X
* TO 239 IF YES END XML

   CLR   @Y
   ST   175,@PAD
   XML   DROW * MOVES Y INTO SPRITE 1 ROW
* INCREMENTS Y AND COMPARES Y
* TO 175 IF YES END XML

   ST   239,@X
   ST   255,@PAD
   XML   LCOL * MOVES X INTO SPRITE 1 COLUMN
* DECREMENTS X AND COMPARES X
* TO 255 IF YES END XML

   ST   175,@Y
   ST   255,@PAD
   XML   UROW * MOVES Y INTO SPRITE 1 ROW
* DECREMENTS Y AND COMPARES Y
* TO 255 IF YES END XML

   dec   @cnt
*   cz   @cnt * Not needed as 0 drops out loop
   br   L5 * loops as long as not 0

   exit

apersson850 · April 24, 2022

Saving as much memory as possible is also one of the main priorities of the p-system. The other is portability across hardware platforms, which was irrelevant for GPL. Top execution speed was clearly not the main priority for any of them.

apersson850 · April 24, 2022

14 hours ago, Tursi said:

I was explaining to Rich what I wrote it for. That people want to debate it instead of making something better is just baffling to me.

Yes, I got that. Still, if you want to develop it into something further, you have to debate the original intention to understand where to go with it.

Tursi · April 24, 2022

3 hours ago, apersson850 said:

Yes, I got that. Still, if you want to develop it into something further, you have to debate the original intention to understand where to go with it.

The original intention is stated pretty clearly in post 1.

For benchmarking languages, really... just write comparable programs. Trying to compare languages and implementations was always a battle, even back in the day, since algorithm matters, what parts of the language you touch matters, what parts of the hardware you need to use matters, etc. But off the top of my head, a good quick one for the TI might be something like manually moving a sprite around the outer edge of the screen, one pixel at a time (no auto-motion). See how fast you can get it whipping around. Make it loop 100 times and then exit, so that you can time the total runtime.

All the things that have been debated since counter one of the statements in that paragraph.

Particularly note the "off the top of my head" comment, indicating this was never proposed as a be-all-end-all test. I even touch on the point that any test is going to perform better in some environments than others.

There's no need nor /point/ to "improving" this one... any changes to the concept create a new benchmark and invalidate all previous tests anyway. All the results here mean is "this is the timing measured on this particular benchmark".

If you want a "better" benchmark, don't bother debating me, go write a better benchmark! It's fine, really!

RXB · April 28, 2022

I wanted to dispute that RXB was slowest in CALL CLEAR, so I showed a test proving it was not.

This is to show that FOR NEXT is very consistent across every XB variant on the TI99/4A.

https://youtu.be/UIVs_wnKeck

senior_falcon · April 28, 2022

Your example proves nothing-there is obviously something wrong with your test.

First off, you show TI BASIC running faster than any of the XB's so that should tell you something is fishy.

Second, RXB 2022A does the assembly on the 8 bit bus, while other XB's use GPL ALL where the loop is on the 16 bit bus. Yet you show the same speed which is not possible.

The gold standard is this: "How does it perform on a real TI-99?" Reciprocating Bill tested this and got these results.

---------------------------------------------------------------------------------------------------------------

Running:

10 FOR I = 1 TO 1000 (I don't see any reason to wait around for 30+ minutes to get the basic ratios).

20 CALL CLEAR

30 NEXT I

...on real iron* running out of a FinalGrom, I get:

TI-BASIC 60 seconds

Extended Basic 23 seconds

RXB 2020 23 seconds

RXB 2022 52 seconds

*Console with RAM on 16-bit bus, which doesn't have significant impact on the speed of these BASICs.

(Just to check the assumption that FinalGrom has no impact on these numbers, I ran the code on an Extended Basic cartridge. No difference.)

------------------------------------------------------------------------------------------------------------------

Your results are nothing like these. Therefore it follows that something is wrong with your tests or your equipment. I don't know which. Maybe you cannot run 6 copies of Classic99 at once and get accurate clock results.
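To put Reciprocating Bill's totals on a common footing, here is a minimal sketch that converts them to time per loop pass and to a ratio against standard Extended Basic. The numbers are the ones quoted above; the variable names are illustrative, and each pass includes the FOR/NEXT overhead as well as CALL CLEAR itself.

```python
# Sketch: normalise the quoted real-hardware timings.
# Each pass is one FOR/NEXT iteration containing a single CALL CLEAR.

ITERATIONS = 1000

# Total run times in seconds, as reported for real hardware.
total_seconds = {
    "TI-BASIC": 60,
    "Extended Basic": 23,
    "RXB 2020": 23,
    "RXB 2022": 52,
}

baseline = total_seconds["Extended Basic"]

# Milliseconds per pass and slowdown relative to standard Extended Basic.
ms_per_pass = {name: t * 1000 / ITERATIONS for name, t in total_seconds.items()}
ratio_to_xb = {name: t / baseline for name, t in total_seconds.items()}

for name in total_seconds:
    print(f"{name:<15} {ms_per_pass[name]:5.1f} ms per pass  "
          f"{ratio_to_xb[name]:.2f}x Extended Basic")
```

On these figures RXB 2022 takes a little over twice as long per pass as Extended Basic or RXB 2020, a gap far larger than anything a stopwatch would blur.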

Edited April 28, 2022 by senior_falcon

RXB · April 29, 2022

13 hours ago, senior_falcon said:

Your example proves nothing-there is obviously something wrong with your test.

First off, you show TI BASIC running faster than any of the XB's so that should tell you something is fishy.

Second, RXB 2022A does the assembly on the 8 bit bus, while other XB's use GPL ALL where the loop is on the 16 bit bus. Yet you show the same speed which is not possible.

The gold standard is this: "How does it perform on a real TI-99?" Reciprocating Bill tested this and got these results.

---------------------------------------------------------------------------------------------------------------

Running:

10 FOR I = 1 TO 1000 (I don't see any reason to wait around for 30+ minutes to get the basic ratios).

20 CALL CLEAR

30 NEXT I

...on real iron* running out of a FinalGrom, I get:

TI-BASIC 60 seconds

Extended Basic 23 seconds

RXB 2020 23 seconds

RXB 2022 52 seconds

*Console with RAM on 16-bit bus, which doesn't have significant impact on the speed of these BASICs.

(Just to check the assumption that FinalGrom has no impact on these numbers, I ran the code on an Extended Basic cartridge. No difference.)

------------------------------------------------------------------------------------------------------------------

Your results are nothing like these. Therefore it follows that something is wrong with your tests or your equipment. I don't know which. Maybe you cannot run 6 copies of Classic99 at once and get accurate clock results.

LOL 4.3 Ghz 12 Core AMD with 32Gig of 3900mhz RAM, 2TB M2 drive for OS and a RTX2070 Super for video card. Slow? LOL!

And you are wrong somewhat my Assembly is running from ROM but uses scratch PAD Registers used by GPL in Scratch Pad.

I have no clue how you get these numbers so wrong. As for Tursi's clock program in Classic99, if you have proof it does not work, show that!

And I have run up to 12 Classic99 routines in my computer with no issues and never seen Tursi clock get it wrong yet!

And your timing method is flawed at best. The video posted proves that!

I can next run it on TI99/4A console behind me but there is no clock on the computer so best test is 100000 loop to check times.

Unless you are going to claim GPL FOR/NEXT loops run at different speed on different versions of XB?

senior_falcon · April 29, 2022

3 hours ago, RXB said:

LOL 4.3 Ghz 12 Core AMD with 32Gig of 3900mhz RAM, 2TB M2 drive for OS and a RTX2070 Super for video card. Slow? LOL!

I'm glad I could be amusing.

And you are wrong somewhat my Assembly is running from ROM but uses scratch PAD Registers used by GPL in Scratch Pad.

This is not news to me. You can see all this with the Classic99 debugger. By the way, when you use the debugger to disassemble, you can see the number of clock cycles used by an instruction. For standard XB each iteration of the loop takes 50 cycles; for RXB 2022A it takes 58 cycles.

I have no clue how you get these numbers so wrong. As for Tursi's clock program in Classic99, if you have proof it does not work, show that!

Post #216 above by RXB has a video with all the proof I need.

And I have run up to 12 Classic99 routines in my computer with no issues and never seen Tursi clock get it wrong yet!

Then please explain how your test looping 100000 times only took twice as long as when you looped 10000 times, or why BASIC runs so much faster than XB in your video above.

And your timing method is flawed at best. The video posted proves that!

The video I posted shows what everyone else is seeing: That RXB 2022A is about half as fast as standard XB when doing CALL CLEAR.

I can next run it on TI99/4A console behind me but there is no clock on the computer so best test is 100000 loop to check times.

Let me get this straight. You have the ability to test this on a real TI99. Instead of doing that, you found that writing an angry diatribe was a more productive use of your time. Why don't you give it a try on real iron? You can loop it a million times or even a billion if that would make you feel better, but be sure to use 2022A.

Unless you are going to claim GPL FOR/NEXT loops run at different speed on different versions of XB?

I thought this was about CALL CLEAR.

Here's the thing. As far as I can tell, your video shows a test that should yield valid results. I'm as perplexed as you are about the results. (Actually, more so, since you cannot bring yourself to admit that there is anything wrong.)

This is a matter for Tursi, Microsoft, Intel, or some other entity higher up the food chain to resolve.

Edited April 29, 2022 by senior_falcon

Reciprocating Bill · April 29, 2022

3 hours ago, RXB said:

LOL 4.3 Ghz 12 Core AMD with 32Gig of 3900mhz RAM, 2TB M2 drive for OS and a RTX2070 Super for video card. Slow? LOL!

And you are wrong somewhat my Assembly is running from ROM but uses scratch PAD Registers used by GPL in Scratch Pad.

I have no clue how you get these numbers so wrong. As for Tursi's clock program in Classic99, if you have proof it does not work, show that!

And I have run up to 12 Classic99 routines in my computer with no issues and never seen Tursi clock get it wrong yet!

And your timing method is flawed at best. The video posted proves that!

I can next run it on TI99/4A console behind me but there is no clock on the computer so best test is 100000 loop to check times.

Unless you are going to claim GPL FOR/NEXT loops run at different speed on different versions of XB?

C'mon Rich. On real iron with a stop watch:

10 for I = 1 to 1000

20 call clear

30 next I

RXB 2020 22.6" (identical to XB)

RXB 2022 51.9"

Of course, the accuracy is to within ~.25 seconds (namely, my reaction time). But you don't need an atomic clock, or 100,000 iterations, to see that something changed between RXB 2020 and RXB 2022.
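The point about stopwatch accuracy can be checked with a few lines of arithmetic. The sketch below takes the two timings and the ~0.25 second reaction-time error from the post above and works out the worst-case bounds; treating the error as applying independently to each measurement is an assumption made here, and the names are illustrative.

```python
# Sketch: does a ~0.25 s stopwatch error matter for these two timings?
# Assumption: each measurement may be off by up to REACTION_ERROR either way.

REACTION_ERROR = 0.25   # seconds, the stated stopwatch accuracy
rxb_2020 = 22.6         # seconds for 1000 passes
rxb_2022 = 51.9         # seconds for 1000 passes

difference = rxb_2022 - rxb_2020
worst_case_error = 2 * REACTION_ERROR   # both readings off in opposite directions

# Smallest and largest slowdown consistent with the error bars.
ratio_low = (rxb_2022 - REACTION_ERROR) / (rxb_2020 + REACTION_ERROR)
ratio_high = (rxb_2022 + REACTION_ERROR) / (rxb_2020 - REACTION_ERROR)

print(f"difference: {difference:.1f} s, worst-case error: {worst_case_error:.1f} s")
print(f"slowdown between {ratio_low:.2f}x and {ratio_high:.2f}x")
```

Even with both readings off in the least favourable directions, the slowdown stays well above 2x, so the conclusion does not depend on timing precision.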

HOME AUTOMATION · April 29, 2022

Slow?/Fast hardware vs. accurate clock results...

I don't see that the issue here is necessarily limited to hardware speed capabilities... but also O/S dynamics, such as priority/affinity.

I find it somewhat unlikely that multiple running apps, having the timing/hardware requirements of emulation, would receive the same threading provisions.

I'm almost certain that I conducted similar tests using Classic99, long ago on win98, or maybe it was xp, relating to my AUTOMATION innovations....

and found as expected that whichever app. was in FOCUS got the most processor time. I'm guessing that things like the availability of the video overlay come into play as well.

GDMike · April 29, 2022

2 hours ago, senior_falcon said:

Here's the thing. As far as I can tell, your video shows a test that should yield valid results. I'm as perplexed as you are about the results. (Actually, more so, since you cannot bring yourself to admit that there is anything wrong.)

This is a matter for Tursi, Microsoft, Intel, or some other entity higher up the food chain to resolve.

If you have TIPI then there's a clock available.. just saying..

Asmusr · April 29, 2022

The way to improve the performance of CALL CLEAR relative to XB would be to make it an unrolled loop. I think just 2 times unrolled would beat XB even though it has the advantage of running from 16-bit ROM, and 4 times unrolled would be 20% faster. But there are probably more important places where the performance could be improved by assembly routines.
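As a language-neutral illustration of the unrolling idea (not the actual RXB or XB routine), the sketch below fills a 768-byte buffer, the size of the 32x24 screen, and counts how many times the loop test and branch run for different unroll factors. The fill value and function name are hypothetical, and the inner `for` merely stands in for repeated copies of the write instruction.

```python
# Sketch: loop unrolling trades code size for fewer loop-control steps.
# This models the idea only; it is not the RXB or XB CALL CLEAR code.

SCREEN_CELLS = 32 * 24   # 768 character positions on the screen
FILL = 0x20              # hypothetical fill byte used for illustration


def clear_unrolled(screen: bytearray, fill: int, unroll: int) -> int:
    """Fill the buffer `unroll` cells per pass and return the pass count."""
    if len(screen) % unroll != 0:
        raise ValueError("unroll factor must divide the buffer length")
    passes = 0
    i = 0
    while i < len(screen):            # one loop test and branch per pass
        for offset in range(unroll):  # stands in for `unroll` copies of the write
            screen[i + offset] = fill
        i += unroll
        passes += 1
    return passes


passes_by_unroll = {}
for factor in (1, 2, 4):
    buffer = bytearray(SCREEN_CELLS)
    passes_by_unroll[factor] = clear_unrolled(buffer, FILL, factor)
    assert all(cell == FILL for cell in buffer)

print(passes_by_unroll)
```

The number of writes is fixed at 768, so any gain comes only from the loop-control work that disappears; how much that is worth on real hardware depends on instruction timings this sketch does not model.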

senior_falcon · April 29, 2022

2 hours ago, Asmusr said:

The way to improve the performance of CALL CLEAR relative to XB would be to make it an unrolled loop. I think just 2 times unrolled would beat XB even though it has the advantage of running from 16-bit ROM, and 4 times unrolled would be 20% faster. But there are probably more important places where the performance could be improved by assembly routines.

Yes, CALL CLEAR is probably the least important thing to speed up. This is, after all, Extended BASIC, which is not exactly a speed demon.

There is more going on here than just the 8 bit vs 16 bit databus. The loop should only be 16% slower on the 8 bit bus (58 vs 50 clock cycles), yet the end result is less than half as fast. I would guess that extra GPL instructions have been added, and a little GPL goes a long way when it comes to slowing things down.
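The gap between the cycle counts and the measured times can be made explicit. The sketch below uses the 50 and 58 cycle figures from the debugger and the stopwatch timings of 22.6 and 51.9 seconds reported earlier in the thread. It deliberately assumes the entire run time scales with the inner loop, which overstates the bus penalty, so the predicted time is an upper bound; the names are illustrative.

```python
# Sketch: how much of the RXB 2022 slowdown can the 8-bit bus explain?
# Assumption: the whole run time scales with the inner loop's cycle count,
# which overstates the bus effect and so gives a generous upper bound.

XB_CYCLES = 50       # cycles per loop iteration, standard XB
RXB_CYCLES = 58      # cycles per loop iteration, RXB 2022A
XB_SECONDS = 22.6    # stopwatch time for 1000 passes, XB / RXB 2020
RXB_SECONDS = 51.9   # stopwatch time for 1000 passes, RXB 2022

cycle_ratio = RXB_CYCLES / XB_CYCLES          # slowdown from the loop alone
measured_ratio = RXB_SECONDS / XB_SECONDS     # slowdown actually observed

predicted_seconds = XB_SECONDS * cycle_ratio  # most the bus could account for
unexplained_seconds = RXB_SECONDS - predicted_seconds

print(f"cycle ratio {cycle_ratio:.2f}x, measured ratio {measured_ratio:.2f}x")
print(f"bus accounts for at most {predicted_seconds:.1f} s, "
      f"leaving about {unexplained_seconds:.1f} s unexplained")
```

Roughly half of the measured time is left over even under this generous assumption, which is consistent with the guess that something other than the bus width changed.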

Edited April 29, 2022 by senior_falcon

RXB · April 29, 2022

6 hours ago, senior_falcon said:

Here's the thing. As far as I can tell, your video shows a test that should yield valid results. I'm as perplexed as you are about the results. (Actually, more so, since you cannot bring yourself to admit that there is anything wrong.)

This is a matter for Tursi, Microsoft, Intel, or some other entity higher up the food chain to resolve.

You are correct and I was wrong.

Turns out when working on CALL COLLIDE I accidentally reverted a section of Assembly for CALL CLEAR and made it worse.

I have to update RXB 2022 to send out the correction, thanks for your help.
