Tursi (Author), posted April 23, 2022

6 hours ago, RXB said:
"Hmm, how come sprite auto-motion is not being used? Could you think of a worse example for XB to move sprites in a single direction? Also, why is line 130 not FOR CNT=1 TO 100 and line 180 not NEXT CNT?"

The purpose of the test is to measure the performance of interfacing to the VDP. The purpose is not to move the sprite from point A to point B. Using automotion defeats the purpose because XB is no longer moving the sprite.
+TheBF, posted April 23, 2022

5 hours ago, apersson850 said:
"It's no longer the same thing, though, in the implementation. If we only consider the looks, then it is. To truthfully follow the original you could select a speed which spends one interrupt per pixel. Which speed is that? Why code SPEED*-1 when -SPEED most certainly is faster?"

I forgot that I could do that. I haven't used XB since 1984, when I got my first PC clone.
RXB, posted April 23, 2022

3 hours ago, Tursi said:
"The purpose of the test is to measure performance of interfacing to the VDP. The purpose is not to move the sprite from point A to point B. Using automotion defeats the purpose because XB is no longer moving the sprite."

Well, wouldn't that rig the test in favor of Assembly and similar languages, and really slow down XB by making it move the sprite? That is why I wrote as many of the VDP screen routines as I could in ROM3 Assembly instead of GPL in RXB 2021 and RXB 2022. I tried an Assembly version for sprites, but there was no noticeable improvement in speed.
Tursi (Author), posted April 23, 2022

Just now, RXB said:
"Well that would rig the test for Assembly or some other languages like that and really slow down XB by it moving the sprite?"

That's the purpose of a benchmark: to measure performance of the same task across multiple environments. But you can do whatever you want. I abandoned this whole experiment the first time it became controversial. If it were me, though, I'd re-run the benchmark in RXB 2022 to show how much the enhancements help.
apersson850, posted April 23, 2022

8 hours ago, Tursi said:
"The purpose of the test is to measure performance of interfacing to the VDP. The purpose is not to move the sprite from point A to point B."

But as we've seen, it rather tests the implementation of sprite handling. The functionality provided for that is most advanced in Pascal, and it's also the slowest. At least as long as you don't start using more intricate knowledge of how the system works.
RXB, posted April 23, 2022

13 hours ago, Tursi said:
"That's the purpose of a benchmark - to measure performance of the same task across multiple environments. But you can do whatever you want. I abandoned this whole experiment the first time it became controversial. If it were me, though, I'd re-run the benchmark in RXB 2022 to show how much the enhancements help."

Yeah. Since I could not find any way to speed up sprites in XB without compiling the code, as 2.9 does, RXB has no advantage over XB in your original test. On the other hand, RXB's HCHAR, VCHAR, CLEAR, and others are faster, and so is the RND function.
+FarmerPotato, posted April 23, 2022

On 4/22/2022 at 7:31 AM, Vorticon said:
"(listing below). Erastosthenes sieve benchmark.pdf"

I love that you have this on 32-column thermal paper.
Tursi (Author), posted April 24, 2022

17 hours ago, apersson850 said:
"But as we've seen it rather tests the implementation of sprite handling. The functionality provided for that is most advanced in Pascal, and it's also the slowest. At least as long as you don't start using more intricate knowledge of how the system works."

I was explaining to Rich what I wrote it for. That people want to debate it instead of making something better is just baffling to me.

9 hours ago, RXB said:
"Yea as I could not find any way to speed up Sprites in XB without compiling the code like 2.9 does RXB has no advantage in your original test over XB in comparison. On the other hand RXB HCHAR, VCHAR, CLEAR, and others is faster, also the RND function too."

Well, it'd be easy to change the test to moving an asterisk around with HCHAR. Try that across the various systems! Assembly and C should be equivalent; the others would be new data.
lucien2, posted April 24, 2022

On 4/22/2022 at 7:41 PM, RXB said:
"Where is the GPL code for this as I think I could punch it up a little faster."

Here it is: https://atariage.com/forums/topic/248187-benchmarking-languages/?do=findComment&comment=3422021

You must click "Reveal hidden contents" to see the code.
+Vorticon, posted April 24, 2022

16 hours ago, FarmerPotato said:
"I love that you have this on 32-column thermal paper."

That was the only option available to get a sharable (is that a word?) listing from real hardware.
+TheBF, posted April 24, 2022

On 1/22/2016 at 6:12 PM, lucien2 said:
"GPL: 80 seconds. When we compared TF and GPL with the bricks demo 4 1/2 years ago they were closer."

       grom >6000
       data >aa00,>0100,>0000
       data menu
       data >0000,>0000,>0000,>0000
menu   data >0000
       data start
       stri 'BENCHMARK'

upcase equ >0018
x      equ arg
y      equ arg+1
xy     equ arg
cnt    equ arg+2

start
* magnify 2
       st >e1,@arg
       move 1,@arg,#1
* load uppercase character set
       dst >0900,@fac
       call upcase
* copy asterisk pattern to sprite char 0
       move 8,v@42*8+>800,v@>400
* define sprite 0 to character 0, color black
       dst >8001,v@>302
* locate sprite 0 to 1,1
       dst >0000,v@>300
       st 100,@cnt
L5     clr @x
L1     st @x,v@>301
       inc @x
       ch 239,@x
       br L1
       clr @y
L2     st @y,v@>300
       inc @y
       ch 175,@y
       br L2
       st 239,@x
L3     st @x,v@>301
       dec @x
       ceq 255,@x
       br L3
       st 175,@y
L4     st @y,v@>300
       dec @y
       ceq 255,@y
       br L4
       dec @cnt
       cz @cnt
       br L5
       exit

@RXB Lucien wrote up this GPL version, Rich.
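As a rough sanity check on the 80-second figure, the loop bounds in the GPL listing can be turned into a count of sprite-position writes. This is only a back-of-the-envelope sketch in Python: the constant and function names are made up for illustration, and it assumes each pass through L1 to L4 performs exactly one write to the sprite attribute table, as the listing suggests.

```python
# Counts derived from the loop bounds in the GPL listing (illustrative names).
MAX_COL = 239        # last column value stored to v@>301
MAX_ROW = 175        # last row value stored to v@>300
LAPS = 100           # initial value of cnt
GPL_SECONDS = 80.0   # runtime quoted for the GPL version

def writes_per_lap(max_col=MAX_COL, max_row=MAX_ROW):
    """Sprite-position writes for one trip around the screen edge."""
    across = max_col + 1   # columns 0..239, and 239..0 on the way back
    down = max_row + 1     # rows 0..175, and 175..0 on the way back
    return 2 * across + 2 * down

total_writes = writes_per_lap() * LAPS
writes_per_second = total_writes / GPL_SECONDS
print(writes_per_lap(), total_writes, writes_per_second)
```

Under those assumptions the benchmark issues 832 writes per lap and 83,200 in total, so the quoted GPL time corresponds to roughly a thousand position updates per second.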
RXB, posted April 24, 2022

GPL is designed around saving as much memory as possible while using Assembly in ROM to do more of the work through GPL XML routines. GPL commands are just ROM 0 Assembly routines, with GPL as the upper layer. For example, the 2-byte GPL command ALL 32 is the CALL CLEAR command of BASIC or XB. So, honestly, if I were to write this program, I would make a new ROM to go with the GPL program to speed it up and make it more optimal. This is how GPL is supposed to work, just like RXB 2022, which uses Assembly to speed up slower routines.

       grom >6000
       data >aa00,>0100,>0000
       data menu
       data >0000,>0000,>0000,>0000
menu   data >0000
       data start
       stri 'BENCHMARK'

upcase equ >0018
x      equ arg
y      equ arg+1
xy     equ arg
cnt    equ arg+2

start
* magnify 2
       st >e1,@arg
       move 1,@arg,#1
* load uppercase character set
       dst >0900,@fac
       call upcase
* copy asterisk pattern to sprite char 0
       move 8,v@42*8+>800,v@>400
* define sprite 0 to character 0, color black
       dst >8001,v@>302
* locate sprite 0 to 1,1
       dst >0000,v@>300
       st 100,@cnt
L5     CLR @X
       ST 239,@PAD
       XML RCOL
* MOVES X INTO SPRITE 1 COLUMN
* INCREMENTS X AND COMPARES X
* TO 239 IF YES END XML
       CLR @Y
       ST 175,@PAD
       XML DROW
* MOVES Y INTO SPRITE 1 ROW
* INCREMENTS Y AND COMPARES Y
* TO 239 IF YES END XML
       ST 239,@X
       ST 255,@PAD
       XML LCOL
* MOVES X INTO SPRITE 1 COLUMN
* DECREMENTS X AND COMPARES X
* TO 255 IF YES END XML
       ST 175,@Y
       ST 255,@Y
       XML UROW
* MOVES Y INTO SPRITE 1 ROW
* DECREMENTS Y AND COMPARES Y
* 255 IF YES END XML
       dec @cnt
*      cz @cnt
* Not needed as 0 drops out loop
       br L5
* loops as long as not 0
       exit
apersson850, posted April 24, 2022

Saving as much memory as possible is also one of the main priorities of the p-system. The other is portability across hardware platforms, which was irrelevant for GPL. Top execution speed was clearly not the main priority for either of them.
apersson850, posted April 24, 2022

14 hours ago, Tursi said:
"I was explaining to Rich what I wrote it for. That people want to debate it instead of making something better is just baffling to me."

Yes, I got that. Still, if you want to develop it into something further, you have to debate the original intention to understand where to go with it.
Tursi (Author), posted April 24, 2022

3 hours ago, apersson850 said:
"Yes, I got that. Still, if you want to develop it into something further, you have to debate the original intention to understand where to go with it."

The original intention is stated pretty clearly in post 1:

"For benchmarking languages, really... just write comparable programs. Trying to compare languages and implementations was always a battle, even back in the day, since algorithm matters, what parts of the language you touch matters, what parts of the hardware you need to use matters, etc. But off the top of my head, a good quick one for the TI might be something like manually moving a sprite around the outer edge of the screen, one pixel at a time (no auto-motion). See how fast you can get it whipping around. Make it loop 100 times and then exit, so that you can time the total runtime."

All the things that have been debated since counter one of the statements in that paragraph. Particularly note the "off the top of my head" comment, indicating this was never proposed as a be-all-end-all test. I even touch on the point that any test is going to perform better in some environments than others.

There's no need nor /point/ to "improving" this one... any changes to the concept create a new benchmark and invalidate all previous tests anyway. All the results here mean is "this is the timing measured on this particular benchmark". If you want a "better" benchmark, don't bother debating me, go write a better benchmark! It's fine, really!
RXB, posted April 28, 2022

I wanted to dispute that RXB was slowest in CALL CLEAR, so I made a test showing it was not. This is to show that FOR/NEXT is very consistent across any XB variant on the TI-99/4A: https://youtu.be/UIVs_wnKeck
senior_falcon, posted April 28, 2022 (edited)

Your example proves nothing; there is obviously something wrong with your test. First off, you show TI BASIC running faster than any of the XBs, so that should tell you something is fishy. Second, RXB 2022A does the assembly on the 8-bit bus, while other XBs use GPL ALL, where the loop is on the 16-bit bus. Yet you show the same speed, which is not possible.

The gold standard is this: "How does it perform on a real TI-99?" Reciprocating Bill tested this and got these results.

----------------------------------------
Running:

10 FOR I = 1 TO 1000
20 CALL CLEAR
30 NEXT I

(I don't see any reason to wait around for 30+ minutes to get the basic ratios.)

...on real iron* running out of a FinalGrom, I get:

TI-BASIC 60 seconds
Extended Basic 23 seconds
RXB 2020 23 seconds
RXB 2022 52 seconds

*Console with RAM on 16-bit bus, which doesn't have significant impact on the speed of these BASICs.

(Just to check the assumption that FinalGrom has no impact on these numbers, I ran the code on an Extended Basic cartridge. No difference.)
----------------------------------------

Your results are nothing like these. Therefore it follows that something is wrong with your tests or your equipment. I don't know which. Maybe you cannot run 6 copies of Classic99 at once and get accurate clock results.

Edited April 28, 2022 by senior_falcon
RXB, posted April 29, 2022

13 hours ago, senior_falcon said:
"Your example proves nothing; there is obviously something wrong with your test. First off, you show TI BASIC running faster than any of the XBs, so that should tell you something is fishy. Second, RXB 2022A does the assembly on the 8-bit bus, while other XBs use GPL ALL, where the loop is on the 16-bit bus. Yet you show the same speed, which is not possible. The gold standard is this: 'How does it perform on a real TI-99?' Reciprocating Bill tested this and got these results.

Running:

10 FOR I = 1 TO 1000
20 CALL CLEAR
30 NEXT I

(I don't see any reason to wait around for 30+ minutes to get the basic ratios.)

...on real iron* running out of a FinalGrom, I get:

TI-BASIC 60 seconds
Extended Basic 23 seconds
RXB 2020 23 seconds
RXB 2022 52 seconds

*Console with RAM on 16-bit bus, which doesn't have significant impact on the speed of these BASICs.

(Just to check the assumption that FinalGrom has no impact on these numbers, I ran the code on an Extended Basic cartridge. No difference.)

Your results are nothing like these. Therefore it follows that something is wrong with your tests or your equipment. I don't know which. Maybe you cannot run 6 copies of Classic99 at once and get accurate clock results."

LOL. A 4.3 GHz 12-core AMD with 32 GB of 3900 MHz RAM, a 2 TB M.2 drive for the OS, and an RTX 2070 Super video card. Slow? LOL!

And you are somewhat wrong: my Assembly runs from ROM but uses the scratchpad registers that GPL uses.

I have no clue how you got these numbers so wrong. As for Tursi's clock program in Classic99, if you have proof it does not work, show it!

And I have run up to 12 Classic99 instances on my computer with no issues, and have never seen Tursi's clock get it wrong yet!

And your timing method is flawed at best. The video posted proves that!

I can run it next on the TI-99/4A console behind me, but there is no clock on that computer, so the best test is a 100000-iteration loop to check times.

Unless you are going to claim GPL FOR/NEXT loops run at different speeds on different versions of XB?
senior_falcon, posted April 29, 2022 (edited)

3 hours ago, RXB said:

"LOL 4.3 Ghz 12 Core AMD with 32Gig of 3900mhz RAM, 2TB M2 drive for OS and a RTX2070 Super for video card. Slow? LOL!"
I'm glad I could be amusing.

"And you are wrong somewhat my Assembly is running from ROM but uses scratch PAD Registers used by GPL in Scratch Pad."
This is not news to me. You can see all this with the Classic99 debugger. By the way, when you use the debugger to disassemble, you can see the number of clock cycles used by an instruction. For standard XB each iteration of the loop takes 50 cycles; for RXB 2022A it takes 58 cycles.

"I have no clue how you get these number so wrong as for Tursi Clock program in Classic99 if you have proof it does not work show that!"
Post #216 above by RXB has a video with all the proof I need.

"And I have run up to 12 Classic99 routines in my computer with no issues and never seen Tursi clock get it wrong yet!"
Then please explain how your test looping 100000 times only took twice as long as when you looped 10000 times, or why BASIC runs so much faster than XB in your video above.

"And your timing method is flawed at best. The video posted proves that!"
The video I posted shows what everyone else is seeing: that RXB 2022A is about half as fast as standard XB when doing CALL CLEAR.

"I can next run it on TI99/4A console behind me but there is no clock on the computer so best test is 100000 loop to check times."
Let me get this straight. You have the ability to test this on a real TI-99. Instead of doing that, you found that writing an angry diatribe was a more productive use of your time. Why don't you give it a try on real iron? You can loop it a million times or even a billion if that would make you feel better, but be sure to use 2022A.

"Unless you are going to claim GPL FOR/NEXT loops run at different speed on different versions of XB?"
I thought this was about CALL CLEAR.

Here's the thing. As far as I can tell, your video shows a test that should yield valid results. I'm as perplexed as you are about the results. (Actually, more so, since you cannot bring yourself to admit that there is anything wrong.) This is a matter for Tursi, Microsoft, Intel, or some other entity higher up the food chain to resolve.

Edited April 29, 2022 by senior_falcon
Reciprocating Bill, posted April 29, 2022

3 hours ago, RXB said:
"LOL 4.3 Ghz 12 Core AMD with 32Gig of 3900mhz RAM, 2TB M2 drive for OS and a RTX2070 Super for video card. Slow? LOL! And you are wrong somewhat my Assembly is running from ROM but uses scratch PAD Registers used by GPL in Scratch Pad. I have no clue how you get these number so wrong as for Tursi Clock program in Classic99 if you have proof it does not work show that! And I have run up to 12 Classic99 routines in my computer with no issues and never seen Tursi clock get it wrong yet! And your timing method is flawed at best. The video posted proves that! I can next run it on TI99/4A console behind me but there is no clock on the computer so best test is 100000 loop to check times. Unless you are going to claim GPL FOR/NEXT loops run at different speed on different versions of XB?"

C'mon Rich. On real iron with a stopwatch:

10 FOR I = 1 TO 1000
20 CALL CLEAR
30 NEXT I

RXB 2020 22.6" (identical to XB)
RXB 2022 51.9"

Of course, the accuracy is to within ~.25 seconds (namely, my reaction time). But you don't need an atomic clock, or 100,000 iterations, to see that something changed between RXB 2020 and RXB 2022.
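The point about stopwatch accuracy can be made concrete. The sketch below (Python, with a hypothetical helper name) takes the two hand-timed results and the stated reaction-time error of about 0.25 seconds, and asks how far apart the two runs remain in the worst case. It assumes the error applies independently to each measurement, which is a simplification.

```python
def ratio_bounds(fast_s, slow_s, err_s):
    """Worst-case low and high slowdown ratios given +/- err_s on each timing."""
    low = (slow_s - err_s) / (fast_s + err_s)
    high = (slow_s + err_s) / (fast_s - err_s)
    return low, high

# Hand-timed results for 1000 iterations of CALL CLEAR, from the post above.
RXB_2020_SECONDS = 22.6
RXB_2022_SECONDS = 51.9
REACTION_ERROR_SECONDS = 0.25

low, high = ratio_bounds(RXB_2020_SECONDS, RXB_2022_SECONDS, REACTION_ERROR_SECONDS)
print(f"RXB 2022 is between {low:.2f}x and {high:.2f}x slower")
```

Even with the error stacked against it, the slowdown stays well above 2x, so a quarter-second of reaction time cannot account for the difference.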
HOME AUTOMATION, posted April 29, 2022

Slow/fast hardware vs. accurate clock results...

I don't see that the issue here is necessarily limited to hardware speed; O/S dynamics such as priority and affinity matter too. I find it somewhat unlikely that multiple running apps with the timing and hardware requirements of emulation would all receive the same threading provisions.

I'm almost certain that I conducted similar tests using Classic99 long ago on Win98, or maybe it was XP, relating to my AUTOMATION innovations, and found, as expected, that whichever app was in FOCUS got the most processor time.

I'm guessing that things like the availability of the video overlay come into play as well.
GDMike, posted April 29, 2022

2 hours ago, senior_falcon said:
"Here's the thing. As far as I can tell, your video shows a test that should yield valid results. I'm as perplexed as you are about the results. (Actually, more so, since you cannot bring yourself to admit that there is anything wrong.) This is a matter for Tursi, Microsoft, Intel, or some other entity higher up the food chain to resolve."

If you have TIPI then there's a clock available... just saying.
Asmusr, posted April 29, 2022

The way to improve the performance of CALL CLEAR relative to XB would be to make it an unrolled loop. I think just 2 times unrolled would beat XB even though it has the advantage of running from 16-bit ROM, and 4 times unrolled would be 20% faster. But there are probably more important places where the performance could be improved by assembly routines.
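For readers unfamiliar with the idea, loop unrolling can be illustrated in a neutral way. The sketch below is Python, not TMS9900 assembly, and it only counts how many times the loop body (and therefore the loop-control overhead) runs when a 768-cell screen buffer is cleared one cell per pass versus four cells per pass. The 32 x 24 screen size, the fill value, and the function names are assumptions for illustration; it says nothing about real cycle counts on the TI or about the 2x and 4x estimates above.

```python
SCREEN_CELLS = 32 * 24  # 768 character cells on the standard screen (assumed)
SPACE = 0x20            # illustrative fill value, not the actual XB screen code

def clear_rolled(buf, fill=SPACE):
    """One write per pass: loop control runs once per cell."""
    passes = 0
    i = 0
    while i < len(buf):
        buf[i] = fill
        i += 1
        passes += 1
    return passes

def clear_unrolled4(buf, fill=SPACE):
    """Four writes per pass: same result with a quarter of the passes."""
    assert len(buf) % 4 == 0, "buffer length must be a multiple of 4"
    passes = 0
    i = 0
    while i < len(buf):
        buf[i] = fill
        buf[i + 1] = fill
        buf[i + 2] = fill
        buf[i + 3] = fill
        i += 4
        passes += 1
    return passes

screen_a = bytearray(b"*" * SCREEN_CELLS)
screen_b = bytearray(b"*" * SCREEN_CELLS)
rolled_passes = clear_rolled(screen_a)
unrolled_passes = clear_unrolled4(screen_b)
print(rolled_passes, unrolled_passes, screen_a == screen_b)
```

Both versions leave the buffer in the same state; the unrolled one simply pays the counter update and branch 192 times instead of 768. Whether that saving outweighs the slower bus in a real ROM routine is exactly what would need measuring on hardware.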
senior_falcon, posted April 29, 2022 (edited)

2 hours ago, Asmusr said:
"The way to improve the performance of CALL CLEAR relative to XB would be to make it an unrolled loop. I think just 2 times unrolled would beat XB even though it has the advantage of running from 16-bit ROM, and 4 times unrolled would be 20% faster. But there are probably more important places where the performance could be improved by assembly routines."

Yes, CALL CLEAR is probably the least important thing to speed up. This is, after all, Extended BASIC, which is not exactly a speed demon.

There is more going on here than just the 8-bit vs. 16-bit data bus. The loop should only be 16% slower on the 8-bit bus (58 vs. 50 clock cycles), yet the end result is less than half as fast. I would guess that extra GPL instructions have been added, and a little GPL goes a long way when it comes to slowing things down.

Edited April 29, 2022 by senior_falcon
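The arithmetic behind that argument is short enough to spell out. This Python sketch uses only figures quoted in the thread (50 and 58 cycles per loop iteration, 23 and 52 seconds for 1000 CALL CLEARs on real hardware); the variable names are illustrative, and the comparison is a rough ratio check rather than a cycle-accurate model of either cartridge.

```python
# Cycle counts per inner-loop iteration, as read from the Classic99 debugger.
XB_LOOP_CYCLES = 50
RXB_2022_LOOP_CYCLES = 58

# Real-hardware timings for 1000 iterations of CALL CLEAR.
XB_SECONDS = 23.0
RXB_2022_SECONDS = 52.0

# Slowdown the inner loop alone would predict.
loop_slowdown = RXB_2022_LOOP_CYCLES / XB_LOOP_CYCLES

# Slowdown actually measured, and the same thing expressed as relative speed.
measured_slowdown = RXB_2022_SECONDS / XB_SECONDS
relative_speed = XB_SECONDS / RXB_2022_SECONDS

print(f"loop alone: {loop_slowdown:.2f}x slower")
print(f"measured:   {measured_slowdown:.2f}x slower "
      f"({relative_speed:.0%} of XB speed)")
```

The loop by itself predicts a 1.16x slowdown, while the measurement shows more than 2x, which is consistent with the suggestion that something outside the inner loop is responsible.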
RXB, posted April 29, 2022

6 hours ago, senior_falcon said:
"Here's the thing. As far as I can tell, your video shows a test that should yield valid results. I'm as perplexed as you are about the results. (Actually, more so, since you cannot bring yourself to admit that there is anything wrong.) This is a matter for Tursi, Microsoft, Intel, or some other entity higher up the food chain to resolve."

You are correct and I was wrong. It turns out that while working on CALL COLLIDE I accidentally reverted a section of the Assembly for CALL CLEAR and made it worse. I have to update RXB 2022 to send out the correction. Thanks for your help.