Top DOs and DON'Ts: IntyBASIC game performance

+cmadruga · January 2, 2019

Hi folks, looking for lessons learned from more experienced programmers here…

In your experience, what are the top 3 ways to improve performance of IntyBASIC games?

In other words, what should be observed so programs will run as fluidly as possible, without noticeable slowdowns caused by the most impactful bottlenecks?

Anything would be fair game, from general principles, to advice on program structure, to the sharing of specific tricks.

I imagine some advice will probably involve trade-offs or not work in all situations, so expanding on those aspects would be greatly appreciated.

On the other hand, what would be the top things to AVOID, or watch out for?

Thanks in advance!

intvnut · January 2, 2019

Some high-level suggestions:

Lookup tables can reduce many problems down to a single array access.
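A minimal sketch of the lookup-table idea, written in Python purely for illustration (in IntyBASIC the table would more likely be a constant DATA table read by index). The direction numbering and the DX/DY tables are hypothetical; the point is that one indexed read replaces a chain of comparisons.

```python
# Lookup-table sketch: one array access replaces a chain of IF tests.
# Hypothetical direction codes 0..7 = N, NE, E, SE, S, SW, W, NW (clockwise),
# with y growing downward, as on a TV screen.
DX = (0, 1, 1, 1, 0, -1, -1, -1)
DY = (-1, -1, 0, 1, 1, 1, 0, -1)


def move(x, y, direction):
    """Return the position after one step in `direction` (0..7)."""
    return x + DX[direction], y + DY[direction]


if __name__ == "__main__":
    print(move(10, 5, 2))  # east -> (11, 5)
    print(move(10, 5, 7))  # north-west -> (9, 4)
```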

Use ON .. GOTO (or ON .. GOSUB) instead of a stack of IF-statements. A stack of IF-statements has a penalty proportional to its depth, while an ON .. GOTO or ON .. GOSUB is a single level of lookup. In some cases, you can also replace the logic entirely with a lookup table (see the first point above).
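To make the cost difference concrete, here is a Python model (not IntyBASIC, and not what the compiler emits): the IF stack counts one comparison per case it tests, while the jump table reaches any handler in one lookup, which is the idea behind ON .. GOSUB. The state names and handlers are hypothetical.

```python
# Cost model: an IF stack tests cases one by one, so reaching case k costs
# k + 1 comparisons; a jump table indexes straight to the handler.
# State names and handlers are hypothetical.


def handle_idle():
    return "idle"


def handle_walk():
    return "walk"


def handle_jump():
    return "jump"


def handle_fall():
    return "fall"


HANDLERS = [handle_idle, handle_walk, handle_jump, handle_fall]


def dispatch_if_stack(state):
    """Return (result, comparisons made) for an IF-chain dispatch."""
    comparisons = 0
    for k, handler in enumerate(HANDLERS):
        comparisons += 1
        if state == k:
            return handler(), comparisons
    return None, comparisons


def dispatch_table(state):
    """Return (result, lookups made) for a jump-table dispatch."""
    return HANDLERS[state](), 1


if __name__ == "__main__":
    for s in range(len(HANDLERS)):
        print(s, dispatch_if_stack(s), dispatch_table(s))
```

The deeper the case sits in the IF stack, the more comparisons it costs; the table cost stays flat.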

Looping over all of BACKTAB is expensive. Consider other ways of tracking the information you need if you can. It's not always possible.
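One way to avoid rescanning the screen is to update your bookkeeping at the moment you write to it. The Python sketch below assumes a 20 x 12 card screen (BACKTAB's 240 entries) and a hypothetical BRICK card; the counter is adjusted on every write, so checking how many bricks remain never needs a full pass.

```python
# Incremental bookkeeping instead of rescanning the screen.
# Assumptions: a 20 x 12 card screen (240 entries, like BACKTAB) and
# hypothetical EMPTY/BRICK card values.
COLS, ROWS = 20, 12
EMPTY, BRICK = 0, 1


class Screen:
    def __init__(self):
        self.cards = [EMPTY] * (COLS * ROWS)
        self.bricks_left = 0  # kept up to date on every write

    def put(self, col, row, card):
        """Write one card and adjust the brick count by the change."""
        i = row * COLS + col
        self.bricks_left += int(card == BRICK) - int(self.cards[i] == BRICK)
        self.cards[i] = card

    def count_by_scan(self):
        """The expensive alternative: visit all 240 cards."""
        return sum(1 for c in self.cards if c == BRICK)


if __name__ == "__main__":
    s = Screen()
    for col in range(COLS):
        s.put(col, 0, BRICK)   # a full row of bricks
    s.put(3, 0, EMPTY)         # one brick destroyed
    print(s.bricks_left, s.count_by_scan())  # 19 19
```

The scan is kept only as a cross-check; in a game you would read `bricks_left` instead.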

Don't be afraid to read the compiler output, and at least get comfortable with reading CP1600 assembly at a high level. That way you can try different ways of writing an expression and see how each gets compiled. IntyBASIC does some local optimizations, but they have limited "reach", so even slight tweaks (A = A + B vs. A = B + A) can sometimes make a difference.

A single, complex expression often performs better than multiple smaller expressions evaluated in steps. This runs counter to good programming practices, IMHO, but it often results in faster code in IntyBASIC. This goes back to IntyBASIC's local optimization model, along with the fact that IntyBASIC will store each intermediate result to memory if you break a complex expression into multiple smaller steps. You can use macros (DEF FN) to keep the long, complex expressions from getting too unwieldy.

Consider learning a smidge of CP1600 assembly, and using it for the hotspots in your program. (Or, reach out to someone who's skilled in CP1600 for a hand.) I assisted one programmer with some key assembly code routines for certain code that really did need to loop over most of BACKTAB (among other things). It gained huge speedups over what was possible in IntyBASIC natively. And, since it was isolated to a few hotspots, most of the program remained in IntyBASIC. The program as a whole went from laggy to fluid with that targeted optimization.

+DZ-Jay · January 2, 2019


Here's a tip: Every WAIT statement is a busy-wait until the next TV frame. Try to have at most one WAIT statement per game engine loop iteration, if possible.

-dZ.
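A back-of-the-envelope model of why extra WAITs hurt, in Python: if each WAIT costs one frame, the number of WAITs per loop iteration divides the game's update rate. This assumes roughly 60 frames per second (NTSC) and that the work between WAITs fits inside a frame.

```python
# Simplified model: WAIT blocks until the next frame, so every WAIT executed
# in one game-loop iteration costs a whole frame. Assumes the work between
# WAITs fits inside a frame and a display rate of roughly 60 Hz (NTSC).
FRAMES_PER_SECOND = 60


def loop_rate(waits_per_iteration):
    """Game-loop iterations per second when each WAIT burns one frame."""
    if waits_per_iteration < 1:
        raise ValueError("this model assumes at least one WAIT per iteration")
    return FRAMES_PER_SECOND / waits_per_iteration


if __name__ == "__main__":
    print(loop_rate(1))  # one WAIT per iteration: 60.0
    print(loop_rate(3))  # a WAIT after each of three steps: 20.0
```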

+cmadruga · January 2, 2019


Very good tips, intvnut, thank you! Particularly the one about complex expressions being better than multiple simpler ones; I had no idea.

Is there anything you could say about using multiple statements like IF (FRAME AND ...) = 1 THEN GOSUB <procedure>?

I tend to hang many of those statements on the main loop, and I find that performance becomes very sensitive to the frequencies I set in them.

It then becomes an exercise in prioritization: for instance, some visual elements need to be very responsive, while others can afford to wait a little longer without impacting user experience. So I tend to keep juggling and iterating with those frequencies until I'm happy with the outcome.

Is this the right way to think about it?
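The frequency juggling described above can be modeled in Python as a sketch: each hypothetical task runs when `(frame AND mask) = phase`, and giving same-rate tasks different phases spreads their cost over different frames instead of stacking it on one. Costs are arbitrary units, not measured cycles.

```python
# Model of "IF (FRAME AND mask) = phase THEN GOSUB task" scheduling.
# mask 0 runs every frame; mask 3 runs every 4th frame, on the frame whose
# low two bits equal `phase`. Tasks and costs are hypothetical work units.


def load_per_frame(tasks, frames=8):
    """Total cost of the tasks that fire on each frame 0..frames-1."""
    loads = []
    for frame in range(frames):
        loads.append(sum(cost for mask, phase, cost in tasks
                         if (frame & mask) == phase))
    return loads


# (mask, phase, cost): a player task every frame, three slower tasks.
ALIGNED = [(0, 0, 5), (3, 0, 4), (3, 0, 4), (3, 0, 4)]    # all on phase 0
STAGGERED = [(0, 0, 5), (3, 1, 4), (3, 2, 4), (3, 3, 4)]  # one per phase


if __name__ == "__main__":
    print(load_per_frame(ALIGNED))    # [17, 5, 5, 5, 17, 5, 5, 5]
    print(load_per_frame(STAGGERED))  # [5, 9, 9, 9, 5, 9, 9, 9]
```

Both schedules do the same total work over eight frames; only the worst frame changes. Staggering works cleanly when the mask is one less than a power of two (1, 3, 7, ...), since the task then recurs at an even interval.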

+cmadruga · January 2, 2019

> Here's a tip: Every WAIT statement is a busy-wait until the next TV frame. Try to have at most one WAIT statement per game engine loop iteration, if possible.

The catch here is the "at most one" expression you have used, no?

I gather from this topic that there are important implications (to scrolling routines, etc) of choosing not to have any WAITs.

http://atariage.com/forums/topic/232875-intybasic-wait-less-ness/

Do those points made back in 2014/2015 still hold today?

artrag · January 2, 2019

You can live without WAIT: GRAM updates are buffered and will be performed by IntyBASIC during the vertical retrace.

BTW, building the code around a WAIT in the main loop can help keep the frame rate stable.

When different branches of your code differ widely in execution time, you can experience unwanted slowdowns or speedups, losing the smoothness of the result.

If you are able to balance the execution time of the main loop under all conditions, you can avoid using WAIT.

Edited January 2, 2019 by artrag

carlsson · January 2, 2019

Also, as mentioned in the manual, OR is not a native operation on the CPU, so it is inefficient. Sometimes you can add comparisons together, or use a different conditional construction.
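To see why OR costs extra, here is a Python sketch of building a 16-bit OR from AND, XOR and complement, plus the "add the comparisons" trick. It assumes those three operations are available and that each comparison yields 0 for false and one consistent nonzero value for true; the thread does not say exactly what code IntyBASIC emits for OR.

```python
# 16-bit arithmetic sketch. Assumption: AND, XOR and bitwise complement are
# cheap, OR is not. Identity: a | b == a ^ (b & ~a), because b & ~a holds
# exactly the bits of b that a does not already have.
MASK16 = 0xFFFF


def or16_from_and_xor(a, b):
    """16-bit OR built only from AND, XOR and complement."""
    return (a ^ (b & (~a & MASK16))) & MASK16


def either_true(c1, c2):
    """Logical OR of two comparison results by adding them.

    Assumption: each comparison is 0 (false) or the same nonzero flag
    (e.g. 1, or 0xFFFF for -1). Then the 16-bit sum is zero only when
    both are false.
    """
    return ((c1 + c2) & MASK16) != 0


if __name__ == "__main__":
    print(hex(or16_from_and_xor(0x00F0, 0x0F0F)))  # 0xfff
    print(either_true(0, 0), either_true(0xFFFF, 0))  # False True
```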

+DZ-Jay · January 3, 2019


The important point to understand is that the STIC (video chip) of the Intellivision "owns" the graphics bus, and only relinquishes it during a brief period during the vertical blanking (VBLANK) interrupt, which occurs every 60th of a second while the TV raster resets for a new frame. That's the key.

What this means is that neither the STIC registers (MOBs, scroll, border, screen mode, etc.) nor the Graphics RAM (GRAM) can be accessed outside this brief window.

The WAIT statement causes your program flow to wait until the next interrupt. Therefore, apart from maintaining timing of your game loop, the WAIT statement is critical to provide access to the graphics sub-systems. IntyBASIC handles that for you for the most part by buffering STIC updates until the next interrupt, but you still need to wait until they occur.
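A conceptual sketch of the buffering described above, in Python: game code writes to shadow copies at any time, and only the (simulated) vertical-blank handler copies them to the (simulated) video chip. The class and field names are hypothetical; this is a model, not IntyBASIC's runtime.

```python
# Conceptual model of buffered video updates. The game may write the shadow
# copy at any time; only vblank() (standing in for the vertical-blank
# interrupt) copies it to the simulated chip. All names are hypothetical.


class VideoChip:
    """Stands in for registers that are only writable during vertical blank."""

    def __init__(self):
        self.sprite_x = 0
        self.scroll_x = 0


class ShadowRegisters:
    def __init__(self, chip):
        self.chip = chip
        self.sprite_x = 0
        self.scroll_x = 0

    def vblank(self):
        """Called once per frame, when the chip can be accessed."""
        self.chip.sprite_x = self.sprite_x
        self.chip.scroll_x = self.scroll_x


if __name__ == "__main__":
    chip = VideoChip()
    shadow = ShadowRegisters(chip)
    shadow.sprite_x = 10
    shadow.sprite_x = 12      # replaces the previous value before it is shown
    shadow.scroll_x = 3
    print(chip.sprite_x)      # still 0: nothing reaches the chip yet
    shadow.vblank()           # the moment a WAIT lets you reach
    print(chip.sprite_x, chip.scroll_x)  # 12 3
```

Note that the second write to `sprite_x` simply replaces the first: only the value present when the frame's window opens ever reaches the hardware.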

What I was trying to suggest with my comment above is to avoid what many new programmers naively do: call WAIT after every few instructions, instead of building a game loop that prepares everything it needs beforehand, then WAITs only once until the next interrupt to post all necessary changes.

You may find a bit more detail on how this works in an older post I made on the subject:

http://atariage.com/forums/topic/285847-questions-from-a-newbie-simple-color-and-animation/?p=4173944

-dZ.

artrag · January 3, 2019

Another common problem is updating GRAM multiple times before the vertical retrace has occurred.

In this case, only the last update is performed. (Actually, IntyBASIC can update only two sets of tiles per frame, using DEFINE and DEFINE ALTERNATE, for about 16 tiles in total.)

Having a WAIT in the main loop helps you keep track of what has been updated and what has not.
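A simplified Python model of the lost-update problem, assuming a single pending GRAM slot (the real limit is described above as two sets per frame): a second DEFINE before the vertical retrace replaces the first, while checking a pending flag and waiting a frame keeps both. `GramBuffer`, `define`, and `vblank` are hypothetical names.

```python
# Simplified model: a GRAM update request is only a pending buffer entry
# until the next vertical blank, and a newer request replaces an older one.
# Single-slot buffer and all names are assumptions for illustration.


class GramBuffer:
    def __init__(self):
        self.pending = None
        self.gram = {}                # card number -> tile data actually loaded

    def define(self, card, tiles):
        self.pending = (card, tiles)  # later calls overwrite earlier ones

    def vblank(self):
        """Apply the pending request, if any, during the retrace."""
        if self.pending is not None:
            card, tiles = self.pending
            self.gram[card] = tiles
            self.pending = None


def run_frames(requests, check_pending):
    """Issue all requests, optionally waiting a frame while one is pending."""
    buf = GramBuffer()
    for card, tiles in requests:
        if check_pending and buf.pending is not None:
            buf.vblank()              # i.e. WAIT before the next DEFINE
        buf.define(card, tiles)
    buf.vblank()
    return buf.gram


if __name__ == "__main__":
    reqs = [(0, "hero"), (1, "enemy")]
    print(run_frames(reqs, check_pending=False))  # {1: 'enemy'}
    print(run_frames(reqs, check_pending=True))   # {0: 'hero', 1: 'enemy'}
```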

+DZ-Jay · January 4, 2019

> I gather from this topic that there are important implications (to scrolling routines, etc) of choosing not to have any WAITs.
> http://atariage.com/forums/topic/232875-intybasic-wait-less-ness/
> Do those points made back in 2014/2015 still hold today?

To answer your question directly, some of those points are still valid. However, a lot of the feedback in that thread has since been incorporated into the language; for example, the ON FRAME GOSUB statement is now included, which reduces the pressure to use WAIT for that purpose.

For scrolling, the example code provided in that thread was taken from one of the sample programs included with the compiler, showing how to scroll the screen. The thing is that the sample program does nothing but scroll, and thus gives the impression that the WAIT statement after SCROLL is needed at that precise moment.

If instead you consider a more holistic program, one that has to manage a more comprehensive game state, then you can picture the use of WAIT like this:

Compute new frame:
1. Update screen after scrolling (needs to be done very early at the start of a frame to keep ahead of the raster drawing).
2. Test for collisions
3. Check user input
4. Adjust player state
5. Adjust enemy state
6. Update SPRITE positions (buffered)
7. Update SCROLL state (buffered)
8. WAIT
9. Go back to step #1

The key complaint in that post about scrolling was what step #1 accomplishes: you need to redraw or clear the screen after crossing a tile boundary during scrolling, and you need to do it right at the beginning of a frame to avoid "tearing."

In the sample program, since there is no other game or enemy state, all you see is:

1. Update screen after scrolling
6. Update SPRITE positions
7. Update SCROLL state
8. WAIT
9. Go back to step #1

The only difference is that the loop starts on step #6, like this:

6. Update SPRITE positions
7. Update SCROLL state
8. WAIT
1. Update screen after scrolling
9. Go back to step #6

As you can see, it's essentially the same thing, but slightly rearranged for the purpose of illustrating an example.
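To double-check that the rearranged loop is equivalent, a small Python sketch can replay both orderings using the step numbers from the lists above. The important property is that, in both, step 1 (updating the screen after scrolling) always runs immediately after WAIT.

```python
from itertools import cycle, islice

# Step numbers from the lists above: 1 = update screen after scrolling,
# 6 = update SPRITE positions, 7 = update SCROLL state, 8 = WAIT.
CANONICAL = [1, 6, 7, 8]  # the sample's steps in the full loop's order
ROTATED = [6, 7, 8, 1]    # the order the sample program actually uses


def trace(loop_body, n):
    """The first n steps executed when loop_body repeats forever."""
    return list(islice(cycle(loop_body), n))


def steps_after_wait(steps):
    """Every step that ever runs immediately after a WAIT (step 8)."""
    return {steps[i + 1] for i in range(len(steps) - 1) if steps[i] == 8}


if __name__ == "__main__":
    a = trace(CANONICAL, 12)
    b = trace(ROTATED, 12)
    print(a[1:9] == b[:8])                           # True: same sequence, shifted
    print(steps_after_wait(a), steps_after_wait(b))  # {1} {1}
```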

Ultimately, the point is that in the Intellivision there are some very specific tasks that need to happen at the start of the frame, immediately following the vertical retrace interrupt. IntyBASIC buffers these for you automatically, so you don't have to handle the interrupt, but unless your program's timing is absolutely and perfectly in synchronization with the TV signal, then you'll have to WAIT for the interrupt in order for those buffers to be applied.

Alternatively, you can encapsulate all your frame-critical code in a subroutine and call it with ON FRAME GOSUB. However, you'll still want to have your regular game loop somehow synchronize with it.

You can find a simple to understand but comprehensive description of how to use WAIT and why in this old post of mine.

-dZ.

Edited January 4, 2019 by DZ-Jay

+cmadruga · January 4, 2019


Thanks for all the information on the WAIT statement, you explain things very clearly and make it all real easy to understand.

On that older post you mentioned, the following nugget called my attention:

> For the sake of comparison, consider that the original Mattel Intellivision titles used a game engine that used three frames for each game cycle, resulting in a game running at 20 Hz. If you ever felt that the old classic titles were a bit sluggish, this is why. In contrast, most home-brews run at 60 Hz or 30 Hz, and consequently feel much more snappy.

Now that you mention it, yes, the old titles sometimes do feel a little sluggish. What was the reason for that frame rate cap embedded into Mattel's engine? Was there a technical reason, or maybe they thought users somehow expected a slower, easier to follow experience...?

+cmadruga · January 4, 2019


That's a good one, thanks Carlsson! Was the CP1610 unique in its absence of OR? What could have been the rationale for not incorporating it?

+cmadruga · January 4, 2019


Thanks, artrag! I assume you are the same artrag that I see hanging around MSX.org? :-)

To your point, in practice, how do experienced programmers keep track (and balance) execution time from all the branches? Does it come down to trial and error?

+DZ-Jay · January 4, 2019

> Thanks for all the information on the WAIT statement, you explain things very clearly and make it all real easy to understand.

Sure thing! I'm glad it helped.

> Now that you mention it, yes, the old titles sometimes do feel a little sluggish. What was the reason for that frame rate cap embedded into Mattel's engine? Was there a technical reason, or maybe they thought users somehow expected a slower, easier to follow experience...?

Keep in mind that the Intellivision was designed between 1977 and 1979. The contemporary video game machines were the Atari VCS and the Magnavox Odyssey2, both of which required low-level management of hardware resources with very few (if any) primitive abstractions.

In that regard, Mattel engineers (or the APh consultants, who did most of the software back then) were blazing new ground and essentially inventing the future. There were few or no established rules, patterns, or practices in the industry, and every game was an engineering masterpiece.

In light of all that, when it came time to build the EXEC (the Intellivision's operating system, game engine runtime, and programming framework), there was a concern that the speed of the processor and the abstractions included would not fit within a single frame. There was also no notion of "60 fps" games on home consoles as a standard, nor any known pressure from the market to optimize performance for twitch reflexes. There was barely an industry or a market at all!

The main goal at Mattel was to reduce the cost of development and production, so it was considered adequate to reduce the granularity of the game engine to 20 Hz for the sake of doing everything that was necessary to accelerate development and reduce cartridge ROM size.

And you know what? They were right! The games may feel sluggish now, but most of them were perfectly acceptable back then, even the action games.

Of course, as the industry, the platform, and the programmers matured, the limitations of the original game engine became more and more of a hindrance, to the point that most programmers avoided it completely (like we do nowadays in the home-brew community).

It is important to note that when I and others speak of the limitations and fallibility of the Intellivision EXEC, we are doing so with over 30 years of collective game industry experience and perfect hindsight. What they did back then was not only adequate but downright impressive. The EXEC did many things that not even IntyBASIC does today: autonomous object management and displacement, automatic sprite animation sequencing, a primitive but versatile sound-effects scripting language, object-oriented and event-driven frameworks, internal timers, etc. All these features were ahead of their time in many ways, and light-years ahead of what the primitive VCS could do.

It was not perfect, and lots of the issues betray a lack of foresight or experience. However, it didn't need to be perfect; it needed to make it to market, and it did.

-dZ.

Edited January 4, 2019 by DZ-Jay

+cmadruga · January 4, 2019

> The main goal at Mattel was to reduce the cost of development and production, so it was considered adequate to reduce the granularity of the game engine to 20 Hz for the sake of doing everything that was necessary in order to accelerate development and reduce cartridge ROM size.

Sure, when we look back we should always make an effort to appreciate the context of things.

Just so I understand this point quoted above, the reduction in framerate leads to a reduction in development time/effort and ROM size because...

- you have more CPU time to make all abstractions work without having to worry so much about optimization?

- you can use less detailed animations, with fewer frames to store and manage?

- ...?

Comparing that situation with TODAY, when developing a homebrew nowadays, what would you say the minimum frame rate should be?

+DZ-Jay · January 4, 2019

P.S. For the purpose of illustration, below is a breakdown of what the EXEC did in each of its phases. This is off the top of my head, based on what I've seen. I could be wrong, so any corrections are welcome.

Phase 1:
- Update scrolling
- Update position data for objects 0-3
- Update GRAM for 2 objects
- Update sound processor
Phase 2:
- Update position data for objects 4-7
- Update GRAM for next 2 objects
- Update sound processor
Phase 3:
- Handle object collisions
- Update GRAM for next 2 objects
- Check user input
- Update PRNG
- Update sound processor
- Update timers
- Dispatch processes and event handlers

Notice that the GRAM animation engine updates only two objects per interrupt, so it takes 4 interrupts (more than a single game engine cycle) to complete all of them.

Also, I believe that the sound processor is updated on every interrupt, but I am not sure.

-dZ.

Edited January 4, 2019 by DZ-Jay
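As an illustration of the arithmetic in that breakdown, here is a Python model assuming, as recalled above, three phases driven by 60 Hz interrupts and a GRAM animator that services 2 of 8 objects per interrupt. It is a model of the described schedule, not EXEC code.

```python
# Rough model of the phase scheme recalled above: the engine rotates through
# 3 phases, one per ~60 Hz interrupt, and the GRAM animator services
# 2 of 8 objects per interrupt. Not actual EXEC code.
INTERRUPT_HZ = 60
PHASES = 3
OBJECTS = 8
GRAM_OBJECTS_PER_INTERRUPT = 2


def gram_schedule(interrupts):
    """For each interrupt: (phase number 1..3, objects whose GRAM is updated)."""
    schedule = []
    for n in range(interrupts):
        first = (n * GRAM_OBJECTS_PER_INTERRUPT) % OBJECTS
        objs = list(range(first, first + GRAM_OBJECTS_PER_INTERRUPT))
        schedule.append((n % PHASES + 1, objs))
    return schedule


if __name__ == "__main__":
    print(INTERRUPT_HZ / PHASES)                  # game-cycle rate: 20.0
    print(OBJECTS // GRAM_OBJECTS_PER_INTERRUPT)  # interrupts per GRAM sweep: 4
    for phase, objs in gram_schedule(6):
        print(phase, objs)
```

Because a GRAM sweep takes 4 interrupts and a game cycle takes 3, the two never line up on the same boundary, which is why the animation update spans more than one game-engine cycle.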

+DZ-Jay · January 4, 2019

> Just so I understand this point quoted above, the reduction in framerate leads to a reduction in development time/effort and ROM size because...
> - you have more CPU time to make all abstractions work without having to worry so much about optimization?
> - you can use less detailed animations, with fewer frames to store and manage?
> - ...?

Sorry, I guess that part didn't clearly address your question. The reduction in development time and ROM size comes not because of the reduction in frame-rate directly.

The production cost is reduced because the EXEC serves as the "runtime engine" of the game program, its "main" game loop; the cartridge program becomes a set of data, subroutines, and event handlers to be used by the EXEC. Thus, the main core of every EXEC program is already included with the console, so it doesn't need to be in the ROM.

The development cost is reduced because the EXEC already offers solutions to common problems and provides a convenient game framework and runtime engine to leverage. Thus, programmers do not have to re-invent the wheel every time they want to do something, and the hardest parts of "house-keeping" and "state-management" in a game are already addressed by the EXEC.

The reduction in frame-rate comes not as a cause but as a consequence of the above: in order for the EXEC to reduce development and production costs, it needed to do a heck-of-a-lot of stuff that game programs would normally do. It also needed to be general-purpose enough to support a wide variety of game programs, from simple betting games to complex sports simulations, and everything in between. Because of this, emphasis was put on creating a full-featured and comprehensive EXEC, with the highest priority placed on features and not on performance.

As a natural consequence of that priority, and due to pressures for time-to-market, the programmers decided it would be more advantageous and expeditious to split the game engine into multiple phases with the assumption that 20 Hz was still a sufficiently fast pace for most games.

Given more time and money, they probably would have hit on an even better EXEC that ran at 60Hz. However, it probably wouldn't have been cost-effective, and they probably would have missed their release window of 1980.

Does that make sense?

> Comparing that situation with TODAY, when developing a homebrew nowadays, what would you say the minimum frame rate should be?

It truly depends on the game. I think most games can get away with a frame-rate of 30Hz or 20Hz without consequence. Fast action games may want to try to keep up at 60Hz.

However, one advantage that we have now that the EXEC did not provide is a flexible engine. A 20Hz game does not necessarily mean that input will be read at 20Hz. You could still keep the game feeling snappy by updating all your graphics and player state as fast as possible, and just lowering the granularity of enemy AI.

Alternatively, you could update your graphics at 20Hz without consequence, and keep all your objects moving fast at 60Hz. There's more than one way to skin a cat, as they say.

The EXEC did not allow this: its phases were already baked into the ROM, so you had to deal with what it gave you.

-dZ.

Edited January 4, 2019 by DZ-Jay
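A Python sketch of the decoupling described above, with hypothetical task names: input and player updates run on every pass of a one-WAIT-per-frame loop, while enemy AI runs every third frame (20 Hz). A counter is used instead of a bit mask because 3 is not a power of two.

```python
# Multi-rate loop sketch: one WAIT per iteration (one iteration per frame),
# input and player every frame, enemy AI every third frame (20 Hz at 60 Hz).
# Task names are hypothetical; counters stand in for the real work.


def run(frames):
    """Return how many times each task ran over `frames` loop iterations."""
    counts = {"input": 0, "player": 0, "ai": 0}
    ai_timer = 0
    for _ in range(frames):
        counts["input"] += 1      # read the controller every frame
        counts["player"] += 1     # move the player every frame
        ai_timer += 1
        if ai_timer == 3:         # every third frame: 60 Hz / 3 = 20 Hz
            ai_timer = 0
            counts["ai"] += 1     # update enemy decisions
        # the loop's single WAIT would go here
    return counts


if __name__ == "__main__":
    print(run(60))  # {'input': 60, 'player': 60, 'ai': 20}
```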

artrag · January 4, 2019

> Thanks, artrag! I assume you are the same artrag that I see hanging around MSX.org? :-)
> To your point, in practice, how do experienced programmers keep track (and balance) execution time from all the branches? Does it come down to trial and error?

Hi cmadruga. Yes I'm the msx coder.

About timing of branches: in practice it mainly comes down to trial and error, but if you look at the ASM produced by the IntyBASIC compiler you can compute the execution time of each opcode.

Look at cp1600_summary.txt in your ..\jzintv\doc\programming directory to have a reference on clock cycles. For beginners, the simplest way is to use a WAIT.
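A sketch of the cycle-counting approach in Python. The operation names and cycle costs are placeholders, not real CP1600 timings; real numbers would come from cp1600_summary.txt. Given per-branch costs, it reports how many padding operations the cheaper branch needs to be at least as slow as the other.

```python
# Cycle-counting sketch for balancing two branches. The operation names and
# costs are PLACEHOLDERS, not real CP1600 timings; substitute real figures
# from a cycle reference before trusting the result.
COST = {"load": 8, "add": 6, "store": 9, "jump": 7, "pad": 6}


def branch_cost(ops):
    """Total cycles for a straight-line list of operations."""
    return sum(COST[op] for op in ops)


def pads_needed(short_ops, long_ops):
    """Padding ops to add to short_ops so it takes at least as long."""
    gap = branch_cost(long_ops) - branch_cost(short_ops)
    return max(0, -(-gap // COST["pad"]))  # ceiling division


if __name__ == "__main__":
    hit = ["load", "add", "store", "load", "add", "store", "jump"]
    miss = ["load", "jump"]
    print(branch_cost(hit), branch_cost(miss), pads_needed(miss, hit))  # 53 15 7
```

Padding only rounds up to whole operations, so the branches end up close rather than identical; as noted above, a single WAIT per loop is the simpler way to get a steady rate.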

+DZ-Jay · January 4, 2019


The debugger also includes a rudimentary profiler that will tell you the cost (in cycles) of every subroutine called (or any re-entered code block). It still works at the Assembly Language level, but IntyBASIC includes a means to map the IntyBASIC source to the Assembly Language listing file.

If you are using the IntyBASIC SDK, that's already set up for you, and all you need to do is use the "intydbug" command to run the debugger.

-dZ.
