
BallBlazer framerate


VladR


- The goal here is to have code that draws actual pixels and does not butcher the framerate by beam racing, which would halt the CPU for 50% of the frame ((2*48)/192 = 0.5)

- This means we cannot use HScroll (it implies beam racing), so our code has to handle the fine scrolling on the CPU

 

- I have a working generic C version that draws the playfield, so I have access to all scanlines, lengths and positions of each quad, i.e. real data at the target resolution.

- I have an 80% finished Asm port of the above, which surprisingly takes only 45,000-50,000 cycles. That is not bad, given that it's a bunch of nested loops that can handle any number and width of quads

 

Here are some of the easy-to-implement options with their pros/cons:

 

Option 1: 60 fps with 128 KB RAM

- Pre-rendered static frames of the field: 160*48/4 = 1,920 Bytes per frame

- 48 frames (for forward movement) would take 90 KB

- as bank switching is cycle-instant, "rendering" would take nothing

- We can use HScroll and still have ~11,000 cycles for input, audio, PMG and bank switching

- this would be less tight on PAL

- I think this is the most realistic scenario to get 50/60 fps on a stock machine via redrawing the playfield each frame
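
The memory figures for Option 1 can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 4 pixels per byte follow from the 160*48/4 formula above, while the 16 KB bank size is an assumption about a 130XE-style expansion and is not stated anywhere in the thread.

```python
# Back-of-the-envelope check of the Option 1 numbers.
WIDTH_PX = 160
FIELD_LINES = 48
PIXELS_PER_BYTE = 4          # 2 bits per pixel
FRAMES = 48                  # one pre-rendered frame per forward-movement step

bytes_per_frame = WIDTH_PX * FIELD_LINES // PIXELS_PER_BYTE
total_bytes = bytes_per_frame * FRAMES
total_kb = total_bytes / 1024

# ASSUMPTION (not from the thread): extended RAM is paged in 16 KB banks.
BANK_BYTES = 16 * 1024
frames_per_bank = BANK_BYTES // bytes_per_frame
banks_needed = -(-FRAMES // frames_per_bank)   # ceiling division

print(bytes_per_frame, total_kb, frames_per_bank, banks_needed)
```

Under that bank-size assumption a frame never has to straddle two banks, at the price of some unused bytes at the end of each bank.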

 

Option 2: 20 fps but with 512 KB RAM

- Unrolled code for the 160x48 playfield takes 9,600 Bytes and 11,520 cycles (a full screen of data in Excel, based on real data from the C prototype)

- 48 frames to handle forward movement then take 450 KB

- this would still need HScroll, which would cut the available CPU time to something around 11,000 cycles, so 2 frames for the two fields, and a third frame for audio/input/PMG/scoring

- so it's absurd to require half a meg and only get the same framerate as we have now

- but I needed to see the RAM/performance characteristics of unrolled code, if only to rule it out completely
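
The 9,600 Bytes / 11,520 cycles pair is consistent with one `LDA #imm` plus one `STA abs` per screen byte. A minimal sketch of that arithmetic follows; the one-pair-per-byte breakdown is an assumption (it is not spelled out in the thread), while the instruction sizes and timings are the standard 6502 ones.

```python
# Standard 6502 encodings as (bytes, cycles).
LDA_IMM = (2, 2)   # LDA #value
STA_ABS = (3, 4)   # STA address

SCREEN_BYTES = 160 * 48 // 4   # 1,920 bytes per field
FRAMES = 48

# ASSUMPTION: one LDA #imm + STA abs pair per screen byte.
code_bytes = SCREEN_BYTES * (LDA_IMM[0] + STA_ABS[0])
cycles = SCREEN_BYTES * (LDA_IMM[1] + STA_ABS[1])
all_frames_kb = code_bytes * FRAMES / 1024

print(code_bytes, cycles, all_frames_kb)
```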

 

 

I then went through designing and combining 4 different algorithms with various amounts of loop unrolling and tables. Most of the performance estimates were done in Excel; since I already had the basic substages of the rendering implemented, that helped me rule out scenarios that are too memory-intensive or too cycle-intensive.

 

Option 3:

- storing all variations of horizontal scrolling would require several MB of RAM

- so brute-forcing our way in this direction is obviously out of the question

 

Option 4:

- it's slightly more manageable to store each scanline's code only as many times as the segment is long (e.g. if we have 9 segments, each 15 pixels long, we'd store it 15 times).

- This results in 543 KB of unrolled code, which would not fit even into a 0.5 MB expansion, so it's still not an option, but good to rule out early

 

Option 5:

- unlike Option 4, which stores code, here we store the data of the segment

- this takes just 50 KB for all permutations of all scanlines

- so this is a reasonable candidate for further implementation work or a cycle estimate

 

Option 6:

- Heavy Compression

- It looks like yesterday I accidentally stumbled upon an acceleration structure that has very interesting RAM-saving properties

- Surprisingly, it takes only 21 KB to store all HScroll permutations

- I confirmed manually that it works on 7 scanlines randomly scattered across the field

- I first need to implement it in C to confirm it does indeed work on all scanlines

- this is, however, something that would work on a 64 KB machine, though I doubt decompression will be faster than the 11,000/2 = 5,500 cycles that Antic steals

- but it would be great to confirm or rule out whether, despite the heavy cycle stealing from Antic, the Antic route would still be the preferred method

 


Option 1: Yes, why not ?!?

 

Option 2: No, thanks !

 

Option 3: No ! (several MB ? even U1MB would not be enough)

 

Option 4: 64k base RAM + 512k XRAM = 576k total memory, this could maybe work, but I would rather say No!

 

Option 5: sounds good !

 

Option 6: also sounds good.

 

So, do it the "Bosconian" way and continue with a) a 128k version with 50/60 fps (fast) and b) a 64k version with less (slow)...


A lot of this goes over my head, but that video shows the 800 and 5200 in PAL for sure; the speed of the music clearly indicates this. (This is what I was used to... BITD.)

The 5200 at PAL speed is weird, as there never were any PAL 5200s, but I guess it's easy enough to run it like that in emulation...

I'm not sure, but the 7800 plays the music at the speed that I recognise as NTSC speed on the 800.

It all looks like emulation to me, plus don't forget that YouTube screws up video too...

Whatever the outcome of all this, there really is only one real version of B.B. :)

(Must say I'm rather impressed with the C64 version... wouldn't have expected it to be that fast on it)

Edited by Level42

Option 1: Yes, why not ?!?

Option 2: No, thanks !

Option 3: No ! (several MB ? even U1MB would not be enough)

Option 4: 64k base RAM + 512k XRAM = 576k total memory, this could maybe work, but I would rather say No!

Option 5: sounds good !

Option 6: also sounds good.

So, do it the "Bosconian" way and continue with a) a 128k version with 50/60 fps (fast) and b) a 64k version with less (slow)...

All these absurd 512+ KB options are however good for sparking ideas on alternatives.

 

Whatever the outcome of all this, there really is only one real version of B.B. :)

Sure, but this research can bring techniques that can be used for other games. Maybe even inspire somebody else to do some game based on this.

 

There may be little point in recreating BallBlazer, and being in the U.S., I'm pretty sure it's not exactly a brilliant idea to leech off the LucasArts trademark :)

 

But perhaps a quick minigame running at 60 fps over a checkerboard field could be made ? Say, something that the C64 doesn't have at that framerate ?


Look, Ballblazer on my real C64 doesn't look as good as in that video... I don't think they're using a real machine... or something's amiss in the YouTube encoding or whatever. Playing and looking at them side by side at the same time... the Atari is smoother and faster, and though the sound on the C64 is nice, I even sorta kinda like POKEY's rougher, grittier sound for this head-to-head game!

Edited by _The Doctor__

The problem with re-doing a Ballblazer like game - is it's going to be compared to the original - and will anyway end up being a lesser version of the game. Which can be alright for someone new to game design - and just want to experiment a bit with it.

Introducing any kind of new elements to it - seems highly unlikely - because there would be a lack of resources? to do so. And this will undoubtedly slow it down a great deal - as to make it unworkable?

 

You can compare such a project to what happened to Encounter! That someone did a PD game version of it - which was OK to play - looked a bit different and was a very good effort overall.

 

Harvey


I imagine they considered pre-rendered frames. But it all goes out the window when you have to consider catering for the borders adjacent to the checkerboard.

There does seem to be some optimization going on otherwise why bother with HScrolling?

 

My thought on the matter - forget pre-rendering. Do the playfield in character mode and use VScrol tricks to truncate their height where required.

But really, it's reinventing the wheel. You won't produce a better game than what's already been done. And it'd be a big chunk of work to replace what's there.

 

A conservative alternative IMO would be to optimize the existing rendering using unrolled loops in extended Ram. But even then it's not worth the effort unless a decent framerate improvement occurs.


Look, Ballblazer on my real C64 doesn't look as good as in that video... I don't think they're using a real machine... or something's amiss in the YouTube encoding or whatever. Playing and looking at them side by side at the same time... the Atari is smoother and faster, and though the sound on the C64 is nice, I even sorta kinda like POKEY's rougher, grittier sound for this head-to-head game!

Aha, so my expectations were correct :) the .79MHz extra has to have at least some effect on this I guess....

 

And yes....even though technically SID may be "better", I've never liked the muffled sound of it...it always sounds "forced" to me....POKEY sounds more open, direct....the high notes on the tune sound very clear , the echo sounds realistic on POKEY and (to me very important) bass sounds have always been sooooo much better on POKEY and they play a very important role in BB's music. What if we would have had Stereo POKEY already in those days and the Ballblazer music hadn't been done in stereo....wow !! Indeed, because of its "randomness" it never really seems to tire either....simply brilliant stuff...

 

 

Two thoughts:

Maybe instead of recreating the entire game it would be more fruitful to try and adapt it to play head-to-head online ? Because frankly, these kind of games are the best in two player and I can hardly ever find a worthy opponent to play against ??

 

The other is that although it's a brilliant game and a lot of fun all round, I always wondered why Lucasfilm didn't make some "spin-off" game using the same graphics "engine" but making it a 3D shooter. Imagine that the rotofoil had a gun mounted and static/moving targets/enemies to fire at... I figure that was probably the difficult thing, as aside from the ball and the two goal posts there really isn't anything else happening on the screen... but if they had come up with a 3D shooter like that, it would have been brilliant.

 

All in all... games like these (and lots of other, also non-game-related stuff) always made me glad I had an A8. Usually the A8 at least has the "runner-up" graphics, but very often the gameplay is simply the best; almost every arcade conversion is better on the A8 than on any other system. This is also why these kinds of comparison videos really don't tell the entire story, because watching an emulated and compressed video is not the same as actually playing the game. To borrow Atari's slogan: you don't feel it :)

Edited by Level42

The problem with re-doing a Ballblazer like game - is it's going to be compared to the original - and will anyway end up being a lesser version of the game.

That's OK with me. I'm more interested in making the engine/effect faster.

 

Besides, given the thousands of different games on Atari, the time for coming up with killer games passed almost half a century ago. And while with 1 MB expansions I have several ideas for such killer games, their scope is so big that I won't be able to pull them off unless I retire :lol:

 

But, given the memory expansions, we can now throw more memory at the problem and see how much more performance it will give us. That's the new element here that I'm exploring. Yes, I know - they've been around for over a decade.

 

Introducing any kind of new elements to it - seems highly unlikely - because there would be a lack of resources? to do so. And this will undoubtedly slow it down a great deal - as to make it unworkable?

Absolutely - bringing additional AI/gameplay elements on top of the existing ones will only worsen the framerate, as every cycle counts on the 6502.

 

But what if you use the engine to create a completely different game ?

How about we use display-list mirroring on the floor playfield and get the ceiling for free ? That could run at 60 fps much easier, as we just halved both the HScroll performance impact (only 48 scanlines instead of 96 engage in beam racing) AND the playfield rendering (only 48 scanlines instead of 96), yet we have on-screen 96 scanlines of the checkerboard.
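
A minimal sketch of what such a mirrored display list could look like, built as raw bytes in Python. The ANTIC opcode bits are the documented ones (mode $0E, LMS = $40, HSCROL enable = $10, JVB = $41); the addresses, the 40-byte stride and the function name are hypothetical. A real list would also need blank lines at the top, the remaining screen areas, and must not cross a 1 KB boundary.

```python
MODE_E = 0x0E      # ANTIC mode E: 160 pixels, 4 colors, 1 scanline per mode line
LMS = 0x40         # Load Memory Scan: next two bytes are the line's address
HSCROLL = 0x10     # enable horizontal fine scrolling for this line
JVB = 0x41         # jump and wait for vertical blank

def mirrored_display_list(field_addr, lines=48, stride=40, dl_addr=0x3000):
    """Ceiling = floor scanlines in reverse order, then the floor itself.

    ASSUMPTIONS: line 0 of the field is the horizon, stride is 40 bytes
    (with HSCROL active ANTIC fetches a wider line, so a real layout
    would probably need a larger stride), dl_addr is where the list lives.
    """
    dl = bytearray()
    order = list(reversed(range(lines))) + list(range(lines))
    for line in order:
        addr = field_addr + line * stride
        dl += bytes([MODE_E | LMS | HSCROLL, addr & 0xFF, addr >> 8])
    dl += bytes([JVB, dl_addr & 0xFF, dl_addr >> 8])
    return bytes(dl)

dl = mirrored_display_list(0x4000)
print(len(dl), hex(dl[0]))
```

One trade-off worth keeping in mind: every line with an LMS makes ANTIC fetch three display list bytes instead of one, so the mirroring itself is not entirely free in DMA terms.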

 

I could be wrong, but I don't think there are many 50/60 fps action games on Atari that have a dynamic checkerboard.


take a look at crazy maze race, while this is very different I can see a kind of understanding here, I am all for where VladR is thinking and trying and pushing.. I don't know how many times I've heard it's all been done, we can't get any better etc etc etc.... and every time we get our arses kicked.... yep look at that color, how did they get that sound, wtf?!? look at how smooth... holy chit VIdeo! nah it can't be a Movie with sound.... can't top it.... nah freakin kidding me, color video movie with sound!.... hot d*amn listen to that bass.... wow digitized sounds of quality with fast moving complex gameplay..... lord how'd they get all those colors in this game.... what the f*ck how many enemies are on that screen... I can keep going but you have got to understand that every time that limit gets pushed, it's broken, it's rebuilt, it exceeds what ever came before....


I imagine they considered pre-rendered frames.

Actually, I don't think they did. Was a 64 KB Atari even on the market when this game came out ? They didn't really have any choice.

 

Now we have 0.5 MB, 1 MB, 4 MB expansions, so from an engineering perspective, I'm very curious to see whether throwing more memory at the problem could get us better performance.

 

 

There does seem to be some optimization going on otherwise why bother with HScrolling?

HScroll over 48 scanlines eats up roughly ~5,000 cycles of CPU, or about 100 per scanline, right ?

Since we have 40 Bytes on the scanline -> 100 / 40 = 2.5 cycles per byte to rotate all the bits.

 

That's obviously impossible to do on the 6502, so HScroll totally makes sense. Plus, we can change a few colors per scanline, as we're doing STA WSYNC anyway.
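
To make the 2.5-cycle budget concrete, here is the per-byte work a naive CPU-side fine scroll implies: every byte is shifted and merged with bits carried over from its left neighbour. This is a minimal Python sketch, assuming 2 bits per pixel with the leftmost pixel in the top two bits of each byte; the function name and the `fill` parameter are made up for illustration.

```python
def scroll_row_right(row, pixels, fill=0x00):
    """Shift one scanline right by 0..3 pixels (2 bits per pixel)."""
    if not 0 <= pixels <= 3:
        raise ValueError("pixels must be 0..3; larger steps are whole-byte moves")
    shift = 2 * pixels
    if shift == 0:
        return bytes(row)
    out = bytearray(len(row))
    prev = fill                      # bits entering from the left edge
    for i, cur in enumerate(row):
        # low bits of the previous byte become the high bits of this one
        out[i] = (cur >> shift) | ((prev << (8 - shift)) & 0xFF)
        prev = cur
    return bytes(out)

# Pixels 3,2,1,0 | 0,1,2,3 scrolled right by one pixel.
scrolled = scroll_row_right(bytes([0b11100100, 0b00011011]), 1)

# Budget from the post: ~100 cycles per scanline spread over 40 bytes.
budget_per_byte = 100 / 40
print(scrolled.hex(), budget_per_byte)
```

Even a single shift-and-merge per byte is several 6502 instructions, which is why a straightforward bit rotation cannot fit into 2.5 cycles per byte.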

 

 

My thought on the matter - forget pre-rendering.

I haven't yet gone through all permutations. There's still a small chance something useable might pop up. But I've almost exhausted all of them already.

 

 

Do the playfield in character mode and use VScrol tricks to truncate their height where required.

The playfield is 40*6 = 240 characters.

How would you do fine scrolling ? Wouldn't Antic just scroll whole 8 scanlines instead of 1 ?

I don't think I follow what you mean by VScroll trick truncation.

 

 

But really, it's reinventing the wheel. You won't produce a better game than what's already been done. And it'd be a big chunk of work to replace what's there.

Yes, it's a lot of work :)

 

 

A conservative alternative IMO would be to optimize the existing rendering using unrolled loops in extended Ram.

Oh, you mean reverse-engineer the binary ?

 

 

But even then it's not worth the effort unless a decent framerate improvement occurs.

For me it's worth it, if I can get a double checkerboard running at 50/60 fps with enough cycles remaining for some simple gameplay, input and audio (without dropping to 25/30 fps).

 

Perhaps at the end of this exercise, we'll find we'd have to go for narrow mode to sustain 50/60 fps - or some other limitation - I don't really know yet.

 

Even in a worst-case scenario where 50/60 fps would be unsustainable, I will have some new routines and algorithms (I already noticed one improvement today) to try with my flatshader, so it's not going to be a total loss...

 

 

 

I really enjoy the collective brainstorming at the 8-bit section of AA. I am learning so many things...


The other is that although it's a brilliant game and a lot of fun all round, I always wondered why Lucasfilm didn't make some "spin-off" game using the same graphics "engine" but making it a 3D shooter. Imagine that the rotofoil had a gun mounted and static/moving targets/enemies to fire at... I figure that was probably the difficult thing, as aside from the ball and the two goal posts there really isn't anything else happening on the screen... but if they had come up with a 3D shooter like that, it would have been brilliant.

It wouldn't be very difficult to merge the checkerboards with the flatshader and throw some simple 3D objects into the scene, but would you then be OK with a framerate of 15-20 fps ?

Basically roughly the same as BallBlazer.


A slight rethink on pre-rendering - it could be done in a way to cater for borders, so a rough figure there would be ~ 70 bytes per scanline in bitmap.

 

Possibly they didn't go that way because they wanted the flexibility to also use the playfield for other things if needed.


I have worked through Option 2 again and realized that we could globally sort all STAs by LDA value:

 

- the field is rendered via 40 STAs per scanline (each taking only 4 cycles)

- but instead of rendering scanline by scanline, we group all STAs by the LDA value

- one field thus takes only 7,700 cycles (compared to earlier 11,520, which is 50% more)

 

- RAM consumption dropped accordingly down to 5,780 Bytes for the whole field, thus 48 frames would take 270 KB

- of those 270 KB, first 18-24 scanlines would repeat several times, so it would easily fit within 256 KB

- while certainly large, the amount of RAM is not as obscene as in other scenarios and I wouldn't feel embarrassed to ask for 256 KB

 

- I believe this is the fastest possible way on Atari to render the 160x48 checkerfield

- We would employ HScroll to get horizontal scrolling and lose an additional 5,500 cycles
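
The grouping idea can be sketched as a small code generator: collect all screen addresses per byte value, then emit one `LDA #imm` followed by every `STA abs` that needs that value. This is a hedged Python illustration, not the actual generator; the function names, the base address and the test pattern are hypothetical. Ten distinct byte values is an assumption too, chosen because it reproduces the 5,780 Bytes / 7,700 cycles quoted above.

```python
from collections import defaultdict

def emit_grouped_stores(framebuffer, base_addr):
    """Return 6502 source lines: one LDA per distinct value, then its STAs."""
    by_value = defaultdict(list)
    for offset, value in enumerate(framebuffer):
        by_value[value].append(base_addr + offset)
    lines = []
    for value in sorted(by_value):
        lines.append(f"    LDA #${value:02X}")
        for addr in by_value[value]:
            lines.append(f"    STA ${addr:04X}")
    return lines

def cost(lines):
    """(bytes, cycles) using LDA #imm = 2/2 and STA abs = 3/4."""
    loads = sum(1 for l in lines if l.strip().startswith("LDA"))
    stores = sum(1 for l in lines if l.strip().startswith("STA"))
    return loads * 2 + stores * 3, loads * 2 + stores * 4

# Hypothetical test pattern: 1,920 screen bytes drawn from 10 distinct values.
VALUES = [0x00, 0x55, 0xAA, 0xFF, 0x05, 0x50, 0x0A, 0xA0, 0x15, 0x54]
field = bytes(VALUES[i % len(VALUES)] for i in range(160 * 48 // 4))

code = emit_grouped_stores(field, base_addr=0x4000)
code_bytes, code_cycles = cost(code)
print(code_bytes, code_cycles)
```

The saving comes purely from dropping the redundant loads: the number of stores stays at one per screen byte.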

 

BallBlazer scenario: Two distinct playfields:

- 2 * (5500 + 7700) = 26,400 cycles

- on PAL, the rendering would be possible at 50 fps, but the audio+game logic would highly likely spill to next frame, hence resulting in 25 fps / 30 fps (NTSC)

 

Other game scenario: Two mirrored playfields (floor + ceiling):

- via Display List mirroring, we would get the ceiling rendered for free, just with the cost of HScroll

- 7,700 + 5,500 + 5,500 = 18,700 cycles

- on PAL, this would be enough cycles for some simple gameplay (e.g. endless runner or something with very simple AI) and to retain 50 fps, probably not on NTSC though


a gun to harm each other, not to kill opponents but to stun or knock opponent out of way... projectile is no different than the ball in some respect except it's not something you could catch... it just knocks the crap out of you!

Edited by _The Doctor__

A mate and I attempted this back in the 80s. We hacked a Pole Position cart to confirm how it worked.

Then thought we could produce Space Harrier :-D The clever thing about Ballblazer is the limited number of P/Ms it uses.

And as has been mentioned, you lose so much processing time, although thinking about it now, maybe you could get some back with POKEY interrupts for colour changes.

Our version used WSYNC (eek!), was struggling for processor time, and had 2 players and the missiles free after using 2 for the main player.

Shame I didn't know what double buffering was, but we did have 4 checkerboard colours.

The result is on the left.


It wouldn't be very difficult to merge the checkerboards with the flatshader and throw some simple 3D objects into the scene, but would you then be OK with a framerate of 15-20 fps ?

Basically roughly the same as BallBlazer.

Sure, why not? BB still looks smooth and fast enough to me :)
Edited by Level42

Paul Woakes did an incredible job with Encounter! I think Andrew Bradfield remarked to me - that he couldn't play it anymore - because it was such an intense game - meaning that the hyperwarp sequence between levels eventually got the better of him. Which I can agree to - is pretty manic - the sound effects do really get your heart pacing.

But if you were to combine elements of Encounter! with Ballblazer (even if this was possible... most will think not?) you will think it will take a significant hit in frame rate...

At a guess it's highly likely that the design team - would have thought about reusing the engine for other variants - and if they did have something workable - they would have released a different version of it... Because I think you can see it happening with Rescue from Fractalus with the other two newer/later games...

Which I thought was not that much better at all - in my own opinion and tastes.

 

I do think it is theoretically possible to come up with new (or at least newish-looking) game designs for these old Atari 8-bit machines that look somewhat fresh and different, and that this is legitimately an artistic medium of a new type. But because so much has been done already, it is very hard to produce something that looks refreshing and untried. No doubt there is some old game worth trying for? Maybe something like Speed n Jump, for someone new to programming who simply wants to do something worthwhile while learning game programming?

 

No doubt there are other arcade games out there - which can be picked up - for possible conversion. Black Widow? Or how about the SNES games of Kirby's Dream Course or Tetris Battle Gaiden?

 

Harvey

Edited by kiwilove

Black Widow would require two joysticks.....Not impossible, but just saying...

 

While talking about vector games, I'd "kill" for Gravitar... heck, even the 2600 got a decent port of that... why not the A8 ?

 

But I think this thread is about BB, it's frame rate and what could be done with it....so let's keep on track :)

Edited by Level42

Black Widow would require two joysticks.....Not impossible, but just saying...

 

While talking about vector games, I'd "kill" for Gravitar... heck, even the 2600 got a decent port of that... why not the A8 ?

 

But I think this thread is about BB, it's frame rate and what could be done with it....so let's keep on track :)

Please, by all means, keep suggesting examples of other games (ideally make a post with YT vid), especially from other platforms that might be a good fit for something like this.

 

After Atari 800XL, I became a PC-only gamer for over a decade (then PS1-PS4), so all the 16,32,64 bit consoles just ran past me (I am slowly collecting them now - right now I got 3DO, Jaguar, Saturn). On a weekly basis I keep discovering new games, just by watching Top 10 games on <insert random console here> on YT.

 

Never heard of Gravitar, gonna look it up.


Sure , why not? BB still looks smooth and fast enough to me :)

Well, a few things:

- framerate in 3D varies greatly, even with the same polycount between frames, due to the higher number of scanlines and more pixels to draw when you get closer to an object

- the checkerboard field, even during fast movement, does not jump across the screen too much between two frames, as we're talking single scanlines here

- but a 3D object, depending on the speed of movement, may jump more than 10-15 pixels, and that's something you don't even have to think about, because your brain will instantly notice it (the evolutionary advantage of humans)

 

- so 15 fps for checkerboard and 15 fps for 3D objects are unfortunately two very different things, perception-wise

 

On the other hand, it's looking more and more likely that it will be possible to render 2 checkerboards (via DL mirroring) during 1 frame. Since those two fields will take up the majority of the screen space, the 3D objects won't have to be large (to avoid the empty-screen feeling): we would already have a floor and ceiling covering a substantial portion of the screen. So we can spend the next 2 frames on simple objects (that shouldn't require too many cycles), and the fourth frame on everything else (input, audio, AI, gameplay, score, ...).


While I was staring at the 1,000+ numbers in my Excel sheet, a few numbers popped out and I realized something I should have realized at least 2 days ago, but oh well - better late than never...

 

If I'm interpreting the following quote from AnticTimings.txt properly, then the exact cost of WSYNC is 105 cycles per scanline:

WSYNC: When an STA WSYNC is executed, Antic takes 1 cycle to respond before halting the CPU. It releases WSYNC on cycle 105 on a scanline.
This has the appearance of the CPU restarting on cycle 104, but it's really that you get 1 cycle after STA WSYNC and restarting on 105.
Note that if Antic is doing DMA on the cycle immediately after STA WSYNC completes, the CPU misses that extra cycle (due to the DMA).

 

1 field (160x48) thus would burn (for the HScroll): 48*105 = 5,040 cycles.

2 fields : 5,040 * 2 = 10,080 cycles.

 

My first iteration of CPU HScroll was consuming 101 cycles per scanline, so I almost gave up. But 2 hours (and 3 optimization stages - page0, unrolling, splitting tables to remove INX, ...) later, it's taking 50 cycles per scanline for half of the field and 100 for another quarter of the field.

 

First 11 (out of 48) scanlines are still faster on ANTIC, but the remaining 37 are faster on CPU. Here's the current breakdown of the hybrid HScroll solution:

2,400 cycles : 37 scanlines done on CPU

1,155 cycles : 11 scanlines done on Antic (11*105)

--------------------

3,555 cycles : Total CPU cost for HScroll

 

This hybrid solution saves 5,040 - 3,555 = 1,485 cycles for one field compared to doing this just on ANTIC

The unrolled code takes only 884 Bytes, and uses 640 Bytes of tables and 96 Bytes of Page0.
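
The hybrid split boils down to picking, per scanline, whichever of the two methods is cheaper. A minimal Python sketch of that selection is below; the function name is hypothetical and the three-line input is a toy example, not measured data. The totals underneath simply re-add the figures quoted above.

```python
ANTIC_HSCROLL_COST = 105   # cycles per scanline, figure used in the post

def hybrid_hscroll_cost(cpu_costs, antic_cost=ANTIC_HSCROLL_COST):
    """Pick the cheaper method per scanline; return (total, lines_on_antic)."""
    total = 0
    on_antic = 0
    for cpu in cpu_costs:
        if cpu < antic_cost:
            total += cpu
        else:
            total += antic_cost
            on_antic += 1
    return total, on_antic

# Toy input, NOT measured data: three scanlines with made-up CPU costs.
toy_total, toy_on_antic = hybrid_hscroll_cost([50, 100, 130])

# Totals as quoted in the post.
FIELD_LINES = 48
cpu_part = 2400                               # 37 scanlines on the CPU
antic_part = 11 * ANTIC_HSCROLL_COST          # 11 scanlines on Antic
hybrid_total = cpu_part + antic_part
antic_only = FIELD_LINES * ANTIC_HSCROLL_COST
saving = antic_only - hybrid_total
print(hybrid_total, antic_only, saving)
```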

 

At 160x192x4, Antic steals:

201 cycles: Display List (192:lines + 9:head/tail )
7,680 cycles: Framebuffer
1,728 cycles: Refresh
---------------
9,609 cycles: Total

 

This leaves us (to the best of my current knowledge) with:

29,859 - 9,609 = 20,250 available on NTSC
35,568 - 9,609 = 25,959 available on PAL
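
For anyone who wants to re-check these totals, the same budget in Python. The per-line factors (1 display list byte, 40 framebuffer bytes and 9 refresh cycles per displayed line) are inferred from the totals above rather than quoted from a reference, and the NTSC frame length is taken from the post as is; only the PAL figure is derived here, as 312 lines of 114 cycles.

```python
CYCLES_PER_LINE = 114
DISPLAYED_LINES = 192
BYTES_PER_LINE = 40        # 160 pixels at 2 bits per pixel
REFRESH_PER_LINE = 9       # ASSUMPTION: counted over displayed lines only, as in the post
DL_HEAD_TAIL = 9

display_list = DISPLAYED_LINES + DL_HEAD_TAIL
framebuffer = DISPLAYED_LINES * BYTES_PER_LINE
refresh = DISPLAYED_LINES * REFRESH_PER_LINE
antic_total = display_list + framebuffer + refresh

FRAME_CYCLES = {
    "NTSC": 29_859,                    # figure as given in the post
    "PAL": 312 * CYCLES_PER_LINE,
}
available = {system: total - antic_total for system, total in FRAME_CYCLES.items()}
print(antic_total, available)
```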

 

Please let me know if you see some discrepancies in this calculation.

 

Let's revisit the two scenarios again (see few posts above):

BallBlazer scenario: Two distinct playfields:

- 2 * (3555 + 7700) = 22,510 cycles

- on PAL, rendering is possible at 50 fps

- we are left with ~3,450 cycles - input and PMG logic would fit there, but not audio

- so audio would highly likely spill to next frame, hence resulting in 25 fps / 30 fps (NTSC)

 

Other game scenario: Two mirrored playfields (floor + ceiling):

- via Display List mirroring, we would get the ceiling rendered for free, just with the cost of HScroll (the Antic framebuffer cost is already included in the framebuffer cycle cost of 7,680)

- 7,700 + 3,555 + 3,555 = 14,810 cycles

- on NTSC we'd have 5,440 cycles

 

- on PAL, we would still have 11,149 cycles in the frame: this would surely be enough cycles for some simple audio and gameplay (e.g. an endless runner or something with very simple AI) while retaining 50 fps
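
The two scenarios can be tabulated against both budgets in a few lines of Python. All inputs are the figures from this post; the dictionary layout and names are just for illustration.

```python
FIELD_RENDER = 7_700       # grouped-STA render of one 160x48 field
HSCROLL_HYBRID = 3_555     # hybrid CPU/Antic HScroll for one field
AVAILABLE = {"NTSC": 20_250, "PAL": 25_959}

scenarios = {
    "two distinct fields": 2 * (FIELD_RENDER + HSCROLL_HYBRID),
    "mirrored floor + ceiling": FIELD_RENDER + 2 * HSCROLL_HYBRID,
}

leftover = {}
for name, cost in scenarios.items():
    for system, budget in AVAILABLE.items():
        left = budget - cost
        leftover[(name, system)] = left
        verdict = "fits" if left >= 0 else "does not fit"
        print(f"{name:26s} {system}: {cost} used, {left} left ({verdict})")
```

A negative leftover means the rendering alone already overruns a single frame on that system, before any input, audio or game logic.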

 


For any hardware-related timings, please refer to Avery's brilliant Altirra Hardware Reference guide...

 

For 30 years I have suspected that WSYNC takes up to 105 cycles until the CPU is "in sync" with ANTIC, not that WSYNC always takes 105 cycles. That would make it a total waste to use, as we would have fewer cycles per raster interrupt than the VIC-II's 63...


For what it's worth... maybe.

Some cycles could be regained by turning PMGs off in DMACTL where they're not used. By the looks of the DLI structure within the game it could probably be done without need to add more.

That's 5 cycles per scanline saved for no PMGs where they're not needed at a cost of maybe 24 cycles. Potential cycles saved per frame could be about 300.

 

They already use narrow DMA mode for the text area, so it seems they did at least attempt some cycle saving measures.

Edited by Rybags