
Benchmarking Languages


Tursi


6 hours ago, RXB said:

Hmm, how come sprite auto-motion is not being used?

Could you think of a worse example for XB to move sprites in a single direction?

Also, why is line 130 not FOR CNT=1 TO 100 and line 180 not NEXT CNT?

 

The purpose of the test is to measure performance of interfacing to the VDP. The purpose is not to move the sprite from point A to point B.

 

Using automotion defeats the purpose because XB is no longer moving the sprite.


5 hours ago, apersson850 said:

It's no longer the same thing in the implementation, though. If we only consider the looks, then it is.

To faithfully follow the original, you could select a speed which spends one interrupt per pixel. Which speed is that?

 

Why code SPEED*-1 when -SPEED most certainly is faster?

I forgot that I could do that. I haven't used XB since 1984 when I got my first PC clone. :)

 


3 hours ago, Tursi said:

The purpose of the test is to measure performance of interfacing to the VDP. The purpose is not to move the sprite from point A to point B.

 

Using automotion defeats the purpose because XB is no longer moving the sprite.

Well, wouldn't that rig the test for Assembly or other languages like that, and really slow down XB by making it move the sprite?

 

That is why I moved as many of the VDP screen routines as I could into ROM3 Assembly instead of GPL in RXB 2021 and RXB 2022.

I tried writing Assembly for sprites, but there was no noticeable improvement in speed.


Just now, RXB said:

Well, wouldn't that rig the test for Assembly or other languages like that, and really slow down XB by making it move the sprite?

That's the purpose of a benchmark - to measure performance of the same task across multiple environments.

 

But you can do whatever you want. I abandoned this whole experiment the first time it became controversial. 

 

If it were me, though, I'd re-run the benchmark in RXB 2022 to show how much the enhancements help.

 


8 hours ago, Tursi said:

The purpose of the test is to measure performance of interfacing to the VDP. The purpose is not to move the sprite from point A to point B.

But as we've seen it rather tests the implementation of sprite handling.

The functionality provided for that is most advanced in Pascal, and it's also the slowest. At least as long as you don't start using more intricate knowledge of how the system works.


13 hours ago, Tursi said:

That's the purpose of a benchmark - to measure performance of the same task across multiple environments.

 

But you can do whatever you want. I abandoned this whole experiment the first time it became controversial. 

 

If it were me, though, I'd re-run the benchmark in RXB 2022 to show how much the enhancements help.

 

Yeah. Since I could not find any way to speed up sprites in XB without compiling the code like 2.9 does, RXB has no advantage over XB in your original test.

On the other hand, RXB's HCHAR, VCHAR, CLEAR, and others are faster, and so is the RND function.


17 hours ago, apersson850 said:

But as we've seen it rather tests the implementation of sprite handling.

The functionality provided for that is most advanced in Pascal, and it's also the slowest. At least as long as you don't start using more intricate knowledge of how the system works.

I was explaining to Rich what I wrote it for. That people want to debate it instead of making something better is just baffling to me. ;)

 

9 hours ago, RXB said:

Yeah. Since I could not find any way to speed up sprites in XB without compiling the code like 2.9 does, RXB has no advantage over XB in your original test.

On the other hand, RXB's HCHAR, VCHAR, CLEAR, and others are faster, and so is the RND function.

Well, it'd be easy to change the test to moving an asterisk around with HCHAR. ;) Try that across the various systems! Assembly and C should be equivalent, the others would be new data.


On 1/22/2016 at 6:12 PM, lucien2 said:

GPL: 80 seconds

When we compared TF and GPL with the bricks demo 4 1/2 years ago they were closer.

 

 


 



	grom	>6000
	data	>aa00,>0100,>0000
	data	menu
	data	>0000,>0000,>0000,>0000
menu	data	>0000
	data	start
	stri	'BENCHMARK'

upcase	equ	>0018
x	equ	arg
y	equ	arg+1
xy	equ	arg
cnt	equ	arg+2

start
* magnify 2
	st	>e1,@arg
	move	1,@arg,#1
* load uppercase character set
	dst	>0900,@fac
	call	upcase
* copy asterisk pattern to sprite char 0
	move	8,v@42*8+>800,v@>400
* define sprite 0 to character 0, color black	
	dst	>8001,v@>302
* locate sprite 0 to 1,1
	dst	>0000,v@>300
	
	st	100,@cnt
	
L5	clr	@x
L1	st	@x,v@>301
	inc	@x
	ch	239,@x
	br	L1
	
	clr	@y
L2	st	@y,v@>300
	inc	@y
	ch	175,@y
	br	L2
	
	st	239,@x
L3	st	@x,v@>301
	dec	@x
	ceq	255,@x
	br	L3
	
	st	175,@y
L4	st	@y,v@>300
	dec	@y
	ceq	255,@y
	br	L4

	dec	@cnt
	cz	@cnt
	br	L5

	exit
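
For a sense of scale, the loop bounds in the listing above can be tallied in a few lines of Python. This is only a rough sketch and not part of Lucien's program: it assumes the four loops step x through 0..239 and y through 0..175, as the listing appears to do, and it uses the 80-second GPL timing quoted above. The names `WIDTH_STEPS`, `HEIGHT_STEPS`, and `updates_per_lap` are made up for illustration.

```python
# Rough tally of sprite-position writes in the benchmark loop.
# Assumption: x runs 0..239 and y runs 0..175 in each direction,
# as the GPL listing above appears to do.

WIDTH_STEPS = 240    # x = 0..239 inclusive
HEIGHT_STEPS = 176   # y = 0..175 inclusive
LAPS = 100           # outer loop count from the listing
GPL_SECONDS = 80     # timing quoted for the GPL version


def updates_per_lap() -> int:
    """Count position writes for one trip around the screen edge."""
    right = len(range(0, WIDTH_STEPS))          # first loop: x counts up
    down = len(range(0, HEIGHT_STEPS))          # second loop: y counts up
    left = len(range(WIDTH_STEPS - 1, -1, -1))  # third loop: x counts down
    up = len(range(HEIGHT_STEPS - 1, -1, -1))   # fourth loop: y counts down
    return right + down + left + up


total_updates = updates_per_lap() * LAPS
updates_per_second = total_updates / GPL_SECONDS

print(f"updates per lap:    {updates_per_lap()}")
print(f"total updates:      {total_updates}")
print(f"updates per second: {updates_per_second:.0f}")
```

On those assumptions the GPL version manages roughly a thousand position writes per second, which is a handy yardstick when comparing the other languages' times.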

 

@RXB Rich, Lucien wrote up this GPL version.


GPL is designed to save as much memory as possible while relying on Assembly in ROM, called through GPL XML routines, to do more of the work.

GPL commands are just ROM 0 Assembly routines, with GPL as the upper layer.

For example, the 2-byte GPL command ALL 32 is the whole CALL CLEAR command in BASIC or XB.

 

So, if I were honestly to make a program, I would make a new ROM to go with the GPL program to speed it up and make it more optimal.

This is how GPL is supposed to work, just like RXB 2022, which uses Assembly to speed up slower routines.

 


    grom    >6000
    data    >aa00,>0100,>0000
    data    menu
    data    >0000,>0000,>0000,>0000
menu    data    >0000
    data    start
    stri    'BENCHMARK'

upcase    equ    >0018
x    equ    arg
y    equ    arg+1
xy    equ    arg
cnt    equ    arg+2

start
* magnify 2
    st    >e1,@arg
    move    1,@arg,#1
* load uppercase character set
    dst    >0900,@fac
    call    upcase
* copy asterisk pattern to sprite char 0
    move    8,v@42*8+>800,v@>400
* define sprite 0 to character 0, color black    
    dst    >8001,v@>302
* locate sprite 0 to 1,1
    dst    >0000,v@>300
    
    st    100,@cnt
    
* RCOL, DROW, LCOL and UROW are XML routines proposed for the new ROM
L5    CLR    @X
    ST     239,@PAD
    XML    RCOL       * MOVES X INTO SPRITE 1 COLUMN
                      * INCREMENTS X AND COMPARES X
                      * TO 239 IF YES END XML

    CLR    @Y
    ST     175,@PAD
    XML    DROW       * MOVES Y INTO SPRITE 1 ROW
                      * INCREMENTS Y AND COMPARES Y
                      * TO 175 IF YES END XML

    ST     239,@X
    ST     255,@PAD
    XML    LCOL       * MOVES X INTO SPRITE 1 COLUMN
                      * DECREMENTS X AND COMPARES X
                      * TO 255 IF YES END XML

    ST     175,@Y
    ST     255,@PAD
    XML    UROW       * MOVES Y INTO SPRITE 1 ROW
                      * DECREMENTS Y AND COMPARES Y
                      * TO 255 IF YES END XML

    dec    @cnt
*    cz    @cnt * Not needed as 0 drops out loop
    br    L5   * loops as long as not 0

    exit

 


14 hours ago, Tursi said:

I was explaining to Rich what I wrote it for. That people want to debate it instead of making something better is just baffling to me. ;)

Yes, I got that. Still, if you want to develop it into something further, you have to debate the original intention to understand where to go with it.


3 hours ago, apersson850 said:

Yes, I got that. Still, if you want to develop it into something further, you have to debate the original intention to understand where to go with it.

The original intention is stated pretty clearly in post 1.

 



For benchmarking languages, really... just write comparable programs. Trying to compare languages and implementations was always a battle, even back in the day, since algorithm matters, what parts of the language you touch matters, what parts of the hardware you need to use matters, etc. But off the top of my head, a good quick one for the TI might be something like manually moving a sprite around the outer edge of the screen, one pixel at a time (no auto-motion). See how fast you can get it whipping around. ;) Make it loop 100 times and then exit, so that you can time the total runtime.

 

All the things that have been debated since counter one of the statements in that paragraph.

 

Particularly note the "off the top of my head" comment, indicating this was never proposed as a be-all-end-all test. I even touch on the point that any test is going to perform better in some environments than others.

 

There's no need nor /point/ to "improving" this one... any changes to the concept create a new benchmark and invalidate all previous tests anyway. All the results here mean is "this is the timing measured on this particular benchmark".

 

If you want a "better" benchmark, don't bother debating me, go write a better benchmark! It's fine, really! ;)

 


Your example proves nothing; there is obviously something wrong with your test.

First off, you show TI BASIC running faster than any of the XB's so that should tell you something is fishy.

Second, RXB 2022A does the assembly on the 8 bit bus, while other XB's use GPL ALL where the loop is on the 16 bit bus. Yet you show the same speed which is not possible.

The gold standard is this: "How does it perform on a real TI-99?" Reciprocating Bill tested this and got these results.

---------------------------------------------------------------------------------------------------------------

Running:

10 FOR I = 1 TO 1000          (I don't see any reason to wait around for 30+ minutes to get the basic ratios).

20 CALL CLEAR

30 NEXT I

...on real iron* running out of a FinalGrom, I get:

TI-BASIC 60 seconds

Extended Basic 23 seconds

RXB 2020  23 seconds

RXB 2022  52 seconds

*Console with RAM on 16-bit bus, which doesn't have significant impact on the speed of these BASICs.

(Just to check the assumption that FinalGrom has no impact on these numbers, I ran the code on an Extended Basic cartridge. No difference.) 

------------------------------------------------------------------------------------------------------------------

Your results are nothing like these. Therefore it follows that something is wrong with your tests or your equipment. I don't know which. Maybe you cannot run 6 copies of Classic99 at once and get accurate clock results.

Edited by senior_falcon

13 hours ago, senior_falcon said:

Your example proves nothing; there is obviously something wrong with your test.

First off, you show TI BASIC running faster than any of the XB's so that should tell you something is fishy.

Second, RXB 2022A does the assembly on the 8 bit bus, while other XB's use GPL ALL where the loop is on the 16 bit bus. Yet you show the same speed which is not possible.

The gold standard is this: "How does it perform on a real TI-99?" Reciprocating Bill tested this and got these results.

---------------------------------------------------------------------------------------------------------------

Running:

10 FOR I = 1 TO 1000          (I don't see any reason to wait around for 30+ minutes to get the basic ratios).

20 CALL CLEAR

30 NEXT I

...on real iron* running out of a FinalGrom, I get:

TI-BASIC 60 seconds

Extended Basic 23 seconds

RXB 2020  23 seconds

RXB 2022  52 seconds

*Console with RAM on 16-bit bus, which doesn't have significant impact on the speed of these BASICs.

(Just to check the assumption that FinalGrom has no impact on these numbers, I ran the code on an Extended Basic cartridge. No difference.) 

------------------------------------------------------------------------------------------------------------------

Your results are nothing like these. Therefore it follows that something is wrong with your tests or your equipment. I don't know which. Maybe you cannot run 6 copies of Classic99 at once and get accurate clock results.

LOL. A 4.3 GHz 12-core AMD with 32 GB of 3900 MHz RAM, a 2 TB M.2 drive for the OS, and an RTX 2070 Super for a video card. Slow? LOL!

 

And you are somewhat wrong: my Assembly runs from ROM, but it uses the scratchpad registers that GPL uses.

I have no clue how you get these numbers so wrong. As for Tursi's clock program in Classic99, if you have proof it does not work, show it!

And I have run up to 12 Classic99 instances on my computer with no issues and have never seen Tursi's clock get it wrong yet!

 

And your timing method is flawed at best. The video posted proves that!

Next I can run it on the TI-99/4A console behind me, but there is no clock on that computer, so the best test is a 100000-iteration loop to check times.

Unless you are going to claim GPL FOR/NEXT loops run at different speed on different versions of XB?


3 hours ago, RXB said:

LOL. A 4.3 GHz 12-core AMD with 32 GB of 3900 MHz RAM, a 2 TB M.2 drive for the OS, and an RTX 2070 Super for a video card. Slow? LOL!

I'm glad I could be amusing.

And you are somewhat wrong: my Assembly runs from ROM, but it uses the scratchpad registers that GPL uses.

This is not news to me. You can see all this with the Classic99 debugger. By the way, when you use the debugger to disassemble, you can see the number of clock cycles used by an instruction. For standard XB each iteration of the loop takes 50 cycles; for RXB 2022A it takes 58 cycles.

I have no clue how you get these numbers so wrong. As for Tursi's clock program in Classic99, if you have proof it does not work, show it!

Post #216 above by RXB has a video with all the proof I need.

And I have run up to 12 Classic99 instances on my computer with no issues and have never seen Tursi's clock get it wrong yet!

Then please explain how your test looping 100000 times only took twice as long as when you looped 10000 times, or why BASIC runs so much faster than XB in your video above.

And your timing method is flawed at best. The video posted proves that!

The video I posted shows what everyone else is seeing: That RXB 2022A is about half as fast as standard XB when doing CALL CLEAR.

Next I can run it on the TI-99/4A console behind me, but there is no clock on that computer, so the best test is a 100000-iteration loop to check times.

Let me get this straight. You have the ability to test this on a real TI99. Instead of doing that, you found that writing an angry diatribe was a more productive use of your time. Why don't you give it a try on real iron? You can loop it a million times or even a billion if that would make you feel better, but be sure to use 2022A.

Unless you are going to claim GPL FOR/NEXT loops run at different speed on different versions of XB?

I thought this was about CALL CLEAR.

Here's the thing. As far as I can tell, your video shows a test that should yield valid results. I'm as perplexed as you are about the results. (Actually, more so, since you cannot bring yourself to admit that there is anything wrong.)

This is a matter for Tursi, Microsoft, Intel, or some other entity higher up the food chain to resolve.
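
The scaling objection raised above can be checked with simple proportion. Here is a minimal sketch in Python, assuming run time grows linearly with the loop count and using the real-iron figure of 23 seconds per 1000 iterations of CALL CLEAR in Extended BASIC quoted earlier. `expected_seconds` is a hypothetical helper for illustration, not anything provided by Classic99.

```python
# Linear-scaling sanity check for a FOR/NEXT timing loop.
# Assumption: each iteration costs the same, so time is proportional
# to the iteration count.

BASE_ITERATIONS = 1000
BASE_SECONDS = 23.0  # Extended BASIC on real hardware, quoted earlier


def expected_seconds(iterations: int) -> float:
    """Extrapolate run time from the 1000-iteration measurement."""
    return BASE_SECONDS * iterations / BASE_ITERATIONS


for n in (10_000, 100_000):
    secs = expected_seconds(n)
    print(f"{n:>7} iterations: {secs:7.0f} s ({secs / 60:.1f} min)")

# Ten times the iterations should take about ten times as long.
ratio = expected_seconds(100_000) / expected_seconds(10_000)
print(f"expected 100000/10000 time ratio: {ratio:.0f}x")
```

Under that assumption a 100000-iteration run should take about ten times as long as a 10000-iteration run, so a measured ratio near two points at the timing setup and not at the BASIC being tested.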

 

Edited by senior_falcon

3 hours ago, RXB said:

LOL. A 4.3 GHz 12-core AMD with 32 GB of 3900 MHz RAM, a 2 TB M.2 drive for the OS, and an RTX 2070 Super for a video card. Slow? LOL!

 

And you are somewhat wrong: my Assembly runs from ROM, but it uses the scratchpad registers that GPL uses.

I have no clue how you get these numbers so wrong. As for Tursi's clock program in Classic99, if you have proof it does not work, show it!

And I have run up to 12 Classic99 instances on my computer with no issues and have never seen Tursi's clock get it wrong yet!

 

And your timing method is flawed at best. The video posted proves that!

Next I can run it on the TI-99/4A console behind me, but there is no clock on that computer, so the best test is a 100000-iteration loop to check times.

Unless you are going to claim GPL FOR/NEXT loops run at different speed on different versions of XB?

C'mon Rich. On real iron with a stop watch:

 

10 for I = 1 to 1000

20 call clear

30 next I

 

RXB 2020  22.6" (identical to XB) 

RXB 2022  51.9"

 

Of course, the accuracy is to within ~.25 seconds (namely, my reaction time). But you don't need an atomic clock, or 100,000 iterations, to see that something changed between RXB 2020 and RXB 2022. 
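
To put a number on that, here is a small Python sketch of the slowdown ratio with a worst-case allowance for the stated quarter-second reaction time. It is only an illustration built on the two stopwatch figures above; the function name `slowdown_bounds` is invented for this example.

```python
# Slowdown of RXB 2022 relative to RXB 2020 from stopwatch timings,
# with a simple worst-case bound for reaction-time error.

RXB_2020_SECONDS = 22.6
RXB_2022_SECONDS = 51.9
REACTION_ERROR = 0.25  # stated stopwatch accuracy, in seconds


def slowdown_bounds(fast: float, slow: float, err: float) -> tuple:
    """Return (lowest, nominal, highest) slow/fast ratio for +/- err on each time."""
    nominal = slow / fast
    lowest = (slow - err) / (fast + err)
    highest = (slow + err) / (fast - err)
    return lowest, nominal, highest


low, mid, high = slowdown_bounds(RXB_2020_SECONDS, RXB_2022_SECONDS, REACTION_ERROR)
print(f"slowdown: {mid:.2f}x (between {low:.2f}x and {high:.2f}x)")
```

Even at the pessimistic end of the range the slowdown stays well above 2x, so the stopwatch error cannot account for the difference.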


Slow?/Fast hardware vs. accurate clock results...

 

I don't see that the issue here is necessarily limited to hardware speed capabilities... but also O/S dynamics, such as priority/affinity.

 

I find it somewhat unlikely that multiple running apps, having the timing/hardware requirements of emulation, would receive the same threading provisions.

 

I'm almost certain that I conducted similar tests using Classic99 long ago on Win98, or maybe it was XP, relating to my AUTOMATION innovations,

and found, as expected, that whichever app was in FOCUS got the most processor time. I'm guessing that things like the availability of the video overlay come into play as well.


2 hours ago, senior_falcon said:

Here's the thing. As far as I can tell, your video shows a test that should yield valid results. I'm as perplexed as you are about the results. (Actually, more so, since you cannot bring yourself to admit that there is anything wrong.)

This is a matter for Tursi, Microsoft, Intel, or some other entity higher up the food chain to resolve.

 

If you have TIPI then there's a clock available.. just saying..


The way to improve the performance of CALL CLEAR relative to XB would be to make it an unrolled loop. I think just 2 times unrolled would beat XB even though it has the advantage of running from 16-bit ROM, and 4 times unrolled would be 20% faster. But there are probably more important places where the performance could be improved by assembly routines. 
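
One way to see where estimates like these might come from is a toy cost model. The Python sketch below is an assumption-laden illustration and not Asmusr's calculation: it takes the 50-cycle (XB) and 58-cycle (RXB 2022A) per-iteration figures quoted earlier from the Classic99 debugger, splits the 58 cycles into useful work plus loop overhead (counter decrement and branch), and lets unrolling spread that overhead over several writes. The value `LOOP_OVERHEAD = 22` is a guess chosen so that the model roughly reproduces the estimates; it is not a measured number.

```python
# Toy model of loop unrolling for a screen-clear loop.
# Assumptions (hypothetical, for illustration only):
#   - 58 cycles per iteration for the rolled loop on the 8-bit bus
#   - 50 cycles per iteration for the XB loop on the 16-bit bus
#   - LOOP_OVERHEAD cycles of each iteration are counter/branch overhead

XB_CYCLES = 50
ROLLED_CYCLES = 58
LOOP_OVERHEAD = 22  # guessed split, not measured


def cycles_per_write(unroll: int) -> float:
    """Average cycles per byte written when the body is repeated `unroll` times."""
    work = ROLLED_CYCLES - LOOP_OVERHEAD
    return work + LOOP_OVERHEAD / unroll


for factor in (1, 2, 4):
    cost = cycles_per_write(factor)
    speed_vs_xb = XB_CYCLES / cost
    print(f"unroll x{factor}: {cost:.1f} cycles/write, {speed_vs_xb:.2f}x XB speed")
```

With that guessed split, two-way unrolling edges under 50 cycles per write and four-way unrolling lands near 20% faster than XB. A different split between work and overhead would shift the numbers.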

 


2 hours ago, Asmusr said:

The way to improve the performance of CALL CLEAR relative to XB would be to make it an unrolled loop. I think just 2 times unrolled would beat XB even though it has the advantage of running from 16-bit ROM, and 4 times unrolled would be 20% faster. But there are probably more important places where the performance could be improved by assembly routines. 

 

Yes, CALL CLEAR is probably the least important thing to speed up. This is, after all, Extended BASIC, which is not exactly a speed demon.

There is more going on here than just the 8-bit vs 16-bit data bus. The loop should only be 16% slower on the 8-bit bus (58 vs 50 clock cycles), yet the end result is less than half as fast. I would guess that extra GPL instructions have been added, and a little GPL goes a long way when it comes to slowing things down.
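
The gap between those two figures can be worked out directly. This is a short Python sketch using only numbers already quoted in the thread (the debugger cycle counts and the real-iron stopwatch times, where RXB 2020 timed identically to XB); nothing here is re-measured, and the variable names are just for illustration.

```python
# Compare the slowdown predicted by cycle counts with the measured one.
# Figures are the ones quoted in the thread; nothing here is re-measured.

XB_LOOP_CYCLES = 50       # per iteration, standard XB
RXB_LOOP_CYCLES = 58      # per iteration, RXB 2022A
XB_SECONDS = 22.6         # 1000 x CALL CLEAR, real hardware (RXB 2020, same as XB)
RXB_SECONDS = 51.9        # same test, RXB 2022

predicted = RXB_LOOP_CYCLES / XB_LOOP_CYCLES
measured = RXB_SECONDS / XB_SECONDS
unexplained = measured / predicted

print(f"predicted slowdown from inner loop: {predicted:.2f}x")
print(f"measured slowdown:                  {measured:.2f}x")
print(f"factor not explained by the loop:   {unexplained:.2f}x")
```

The inner loop accounts for a factor of 1.16, which leaves a factor of nearly two that has to come from somewhere outside that loop.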

Edited by senior_falcon

6 hours ago, senior_falcon said:

Here's the thing. As far as I can tell, your video shows a test that should yield valid results. I'm as perplexed as you are about the results. (Actually, more so, since you cannot bring yourself to admit that there is anything wrong.)

This is a matter for Tursi, Microsoft, Intel, or some other entity higher up the food chain to resolve.

 

You are correct and I was wrong.

Turns out that when working on CALL COLLIDE I accidentally reverted a section of Assembly for CALL CLEAR and made it worse.

 

I have to update RXB 2022 to send out the correction, thanks for your help.

