
BallBlazer framerate


VladR

It's been a few decades since I played it on real HW, but today I noticed a pretty low framerate on the Atari 800 while watching Let's Compare BallBlazer (skip to 12 min):

 

The 7800 version (skip to 15 min) is much smoother and appears to run at 60 fps (or 50; I'm not sure whether the video is PAL or NTSC).

 

On A800, even when the camera moves slowly, the framerate is pretty jerky, and I don't think it's 30 fps either; honestly it looks like around 15-20 fps (i.e. 3-4 vblanks per frame).

 

I did a search on this subforum, but only one thread with 464 pages popped up (C64 vs A800), so that didn't help.

 

Because if it's only 15 fps, then from a coding perspective:

- it should be entirely possible to fill the scanlines manually on CPU

- there's no need to clear the framebuffer, just redraw the 48 scanlines (each scanline has a fixed address, so unrolling could be used to maximize performance)

- 160x48/4 = 1,920 Bytes that need to be written for vertical movement

- Horizontal movement is just simple scrolling (which in itself should run at 60 fps, as there's no redrawing involved, unless we're at the arena edge where the framerate seems to drop on A800 even further).
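The redraw size in the list above is easy to re-check. A minimal Python sketch (Python is used for all the examples here only because the thread has no code of its own), assuming a 2-bits-per-pixel mode with 4 pixels per byte and one 4-cycle absolute STA per byte of fully unrolled code:

```python
PIXELS_PER_BYTE = 4     # 2 bits per pixel
STA_ABS_CYCLES = 4      # 6502 STA absolute

def redraw_bytes(width_px: int, lines: int) -> int:
    """Bytes written to fully redraw a width x lines area in a 2bpp mode."""
    return width_px // PIXELS_PER_BYTE * lines

def unrolled_store_cycles(n_bytes: int) -> int:
    """CPU cycles for one unrolled STA per byte (no loop overhead)."""
    return n_bytes * STA_ABS_CYCLES

print(redraw_bytes(160, 48), unrolled_store_cycles(redraw_bytes(160, 48)))  # 1920 7680
```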

 

 

 

Is it known what's the framerate of BallBlazer on A800?

 


The game is rock solid on real hardware. The Atari's scrolling is magically fast and requires negligible resources. The game is so smooth it could stand by itself as a demo.

 

The person doing the comparison video must have issues with the capture method or used an emulator that couldn't handle it.


The source video is 30 FPS.

By the looks of all the systems it's probably emulator output for all of them.

 

The Atari 8-bit version does look a bit sub-par. There can be various reasons. Possibly an external utility was used to capture the action rather than the facility within the emulator. Possibly the capture was at a low framerate. Maybe the video was inadvertently reduced in framerate during post-processing. Maybe they played it set to PAL, which can mean repeated frames if converted to 60 FPS.

 

It's been a while since I've played the real thing, so I'm not sure what draw strategy is used. Pretty sure that RoF, Eidolon and Koronis Rift use the modern-day method where action is realtime and the rendering just occurs at whatever rate the system can obtain.

Ballblazer is probably the least demanding of the Lucasfilm games on the system. Atari's advantage over virtually all others is thanks to features like per-scanline LMS.

Whether it attains full framerate all the time I'm not sure... and it might be another case of PAL having the advantage thanks to more cycles per frame than NTSC.


That's how it sounds in PAL... I should have realized that earlier, so that answers that question.

I think probably a case of "lost in translation" ie - frames have been discarded/repeated at some point which makes it seem worse than reality.

Edited by Rybags


Yeah, I was in PAL territory when I played it, so it certainly helped the framerate. But, I still wasn't as sensitive to framedrops, as I became 3-4 years later, on a PC.

 

There’s an interesting conversation on Ballblazer using self-modifying code to speed things up here 20:20 into the interview with Dave Comstock:

 

http://ataripodcast.libsyn.com/antic-interview-316b-dave-comstock-part-2

Interesting interview. No ground-breaking technical info there, just common sense really (unroll the loop).

 

Still very nice to hear some details on it from people who actually were there!


Ballblazer does not run at 60 fps on stock hardware. It runs at 3 frames/tick, so 20 fps on NTSC and 16.7 fps on PAL. The posted video is of the game running in PAL, so it will look choppy due to the lower frame rate. While the Atari hardware scrolling does help a lot, the game has to spend a substantial portion of the frame in a giant DLI that uses much of the CPU power. The PAL advantage is negated because the main loop is locked to vsync and partial frame time is lost.

 

It is, however, one of the few games that runs at variable frame rate and will gracefully scale up with a faster CPU. At 3.5MHz it runs considerably smoother and at 7MHz it can hit 60fps.
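The "3 frames/tick" figure translates to frame rate by dividing the field rate by the number of fields each game tick spans. A small sketch using the nominal 60 Hz (NTSC) and 50 Hz (PAL) rates; it also shows why 3-4 vblanks per frame looks like 15-20 fps:

```python
REFRESH_HZ = {"NTSC": 60.0, "PAL": 50.0}   # nominal field rates

def effective_fps(standard: str, frames_per_tick: int) -> float:
    """Frame rate when each game update spans a whole number of fields."""
    return REFRESH_HZ[standard] / frames_per_tick

for std in ("NTSC", "PAL"):
    print(std, [round(effective_fps(std, n), 1) for n in (1, 2, 3, 4)])
```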


The 7800 version has always looked smoother.

 

I believe Ballblazer's grid is done entirely in the DL. If it isn't, it certainly could be. Just by shifting each line left and right and possibly inverting the palette on certain lines you can get all the graphics needed. I also think it's possible to improve the rotofoils if you're willing to do more manipulation of the PMGs.

 

It would be cool if it could be sped up, but I imagine it's already one of the more optimized pieces of software out there.


Um, because we seem to want to throw away the extra 5 and 10 fps NTSC has instead of adjusting the code, that's an advantage, right... ugh, whatever... Next comes the splitting of hairs... it should always be damn near a wash when NTSC is properly converted to PAL, and the reverse should also be true... 25/30, 50/60, whatever.

 

Ball Blazer is discussed a bit here on AtariAge already; might want to link up those threads... A full-on proper conversion to PAL would be nice for the PAL world. About 21 FPS should theoretically be possible... but I am sure there will be some issues.

 

horror of horrors to see ball blazer so choppy!

Edited by _The Doctor__


So I was accidentally right when I said it's between 15-20 fps :lol:

 

- There's several DLI color changes that are instantly visible, so that's quite a few cpu-killer STA WSYNCs just for starters.

- Not sure if the two large single-color sky patches are also at 160x192 resolution (they wouldn't have to, till they reach the sky gradient, which might be a GTIA mode), but if yes, then that's also quite a few Antic cycles lost, right there.

- I suspect the DL has each scanline with LMS: a separate address (another 3 cycles lost per scanline)

- Has somebody run it on a 65816? I saw the Fractalus vid @ 7 MHz, and that was pretty cool.

 

 

It runs at 3 frames/tick.

This is actually great news. This is giving me ideas, what's possible, in terms of fillrate. 3 frames is a lot, if it's not wasted on color changes !

- On PAL, that's 35,568*3 = 106,704 cycles. Minus ANTIC highway robbery, of course.

- There's 2*48 = 96 scanlines of the field, giving us roughly 1,000 cycles per scanline.

- A scanline at that res is -what- 40 bytes long ? 32 in narrow mode

- each scanline only needs 4 versions of unrolled code, to account for all four positions (4 pixels per byte) of the boundary between 2 colors

- the single-color run of each quad would be just a couple of STAs
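Why exactly 4 variants: with 4 pixels per byte, a colour boundary at pixel x falls at position x mod 4 inside a byte. Position 0 needs no mixed byte; positions 1-3 each need one "edge" byte, and every other byte is a solid constant. A sketch, assuming the leftmost pixel of a byte sits in bits 7-6 and colour indices 0-3:

```python
def solid_byte(colour: int) -> int:
    """Byte whose four pixels all use the same 2-bit colour (0-3)."""
    return colour * 0b01010101

def edge_byte(left: int, right: int, split: int) -> int:
    """Byte whose first `split` pixels (1-3) are `left`, the rest `right`."""
    mask = (0xFF << (8 - 2 * split)) & 0xFF      # bits covering the left pixels
    return (solid_byte(left) & mask) | (solid_byte(right) & ~mask & 0xFF)

def pack_pixels(pixels: list[int]) -> bytes:
    """Reference packer: 4 pixels -> 1 byte, leftmost pixel in the high bits."""
    out = bytearray()
    for i in range(0, len(pixels), 4):
        b = 0
        for p in pixels[i:i + 4]:
            b = (b << 2) | p
        out.append(b)
    return bytes(out)

# Boundary at x = 10 on a 128-pixel line: byte 10 // 4 = 2 is the edge byte,
# using variant 10 % 4 = 2; all other bytes are solid.
line = pack_pixels([1] * 10 + [2] * 118)
print(hex(line[2]), hex(edge_byte(1, 2, 10 % 4)))  # 0x5a 0x5a
```

In unrolled 6502 code, each variant would just be a different constant loaded before the store at the edge byte.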

 

Horizontal scrolling would run at 60 fps, as it wouldn't require redrawing the playfield (just HSCROL)

Vertical scrolling would require redrawing the field, but those wouldn't be fake Antic pixels, but real physical computed pixels :lol:

To avoid color changes that would induce STA WSYNC, the sky and fog would be in GTIA modes, so during gameplay, we wouldn't need to change colors (only when a goal is scored, at which point it wouldn't matter if it dropped a frame, though it shouldn't happen)

 

Um, because we seem to want to throw away the extra 5 and 10 fps NTSC has instead of adjusting the code, that's an advantage, right... ugh, whatever... Next comes the splitting of hairs... it should always be damn near a wash when NTSC is properly converted to PAL, and the reverse should also be true... 25/30, 50/60, whatever.

 

But with PAL VSYNC there is nothing between 16.7 and 25 fps. It's either one or the other. You are 1 cycle over the threshold, and you drop from 25 to 16.7 fps.

 

If you have a framebuffer game, sure, you can switch LMS in the middle of the frame at the cost of some tearing (largely acceptable), so it's easy to get, say, 21 fps on PAL (or any framerate for that matter)
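The cliff described above can be modelled in a few lines. This sketch ignores ANTIC DMA and simply compares a vsync-locked loop (each tick rounded up to whole PAL frames) with a free-running framebuffer loop that flips LMS as soon as a frame is ready:

```python
import math

PAL_HZ = 50.0
PAL_CYCLES_PER_FRAME = 35_568   # 114 cycles x 312 scanlines

def vsync_locked_fps(work_cycles: int, cycles_per_frame: int = PAL_CYCLES_PER_FRAME) -> float:
    """Main loop waits for vsync, so a tick always lasts a whole number of frames."""
    frames = max(1, math.ceil(work_cycles / cycles_per_frame))
    return PAL_HZ / frames

def free_running_fps(work_cycles: int, cycles_per_frame: int = PAL_CYCLES_PER_FRAME) -> float:
    """Loop flips the LMS pointer as soon as a frame is done (tearing allowed)."""
    return PAL_HZ * cycles_per_frame / work_cycles

budget = 2 * PAL_CYCLES_PER_FRAME
for work in (budget, budget + 1):
    print(work, round(vsync_locked_fps(work), 1), round(free_running_fps(work), 1))
```

One cycle over two frames' worth drops the vsync-locked loop from 25 to about 16.7 fps, while the free-running loop stays at about 25 fps.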

 

 

A full-on proper conversion to PAL would be nice for the PAL world. About 21 FPS should theoretically be possible... but I am sure there will be some issues.

- I don't have an estimate on how much CPU the audio effects take during gameplay, but since on 6502 you only consume cycles when you change the audio registers, it doesn't look like every single frame a register change is happening (but, could be wrong on that - some sounds are indeed deceivingly complex), but I would hazard a guess it's not 1,000 cycles per frame.

- Music, of course, only plays outside of actual gameplay, so it doesn't matter how expensive music is (the playfield slows down considerably when it plays, most probably to offset the audio cost).

 

Scenario 1: Brute-force redraw (completely unrolled scanline drawing code, i.e. no loops)

- I am pretty sure, that at narrow resolution (128x192), you could get this to run at 25 fps on PAL, even with brute-force redraw

- 1 frame on PAL is about 35,568 cycles

- Antic would steal each frame: 96*9=864 (Refresh), 96*3 + 9 = 297 (DL), 128/4*96 = 3,072 (FrameBuffer) -> 864+297+3,072 = 4,233 cycles

- After Antic toll, we have 35,568 - 4,233 = 31,335 cycles available for the game

- The skybox via GTIA would take roughly additional 500 cycles, so we're down to 30,800 per frame

- Two frames (e.g. 25 fps) result in 2*30,800 = 61,600 cycles
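The ANTIC toll in the list above can be re-checked with a short script. Like the estimate, it counts DMA only on the 96 playfield scanlines. The extra 9 display-list cycles and the 500-cycle GTIA sky figure come from the post; they are not measurements:

```python
PAL_CYCLES_PER_FRAME = 35_568   # 114 cycles x 312 scanlines
REFRESH_PER_LINE = 9            # memory refresh cycles per scanline
LMS_DL_BYTES_PER_LINE = 3       # one LMS instruction: opcode + 2-byte address

def antic_toll(field_lines: int, bytes_per_line: int, other_dl_cycles: int = 9) -> int:
    """Cycles ANTIC steals per frame for refresh, display list and screen data."""
    refresh = field_lines * REFRESH_PER_LINE
    display_list = field_lines * LMS_DL_BYTES_PER_LINE + other_dl_cycles
    framebuffer = field_lines * bytes_per_line
    return refresh + display_list + framebuffer

toll = antic_toll(field_lines=96, bytes_per_line=128 // 4)
cpu_per_frame = PAL_CYCLES_PER_FRAME - toll
cpu_after_sky = cpu_per_frame - 500      # GTIA skybox estimate from the post
print(toll, cpu_per_frame, cpu_after_sky, 2 * cpu_after_sky)  # 4233 31335 30835 61670
```

The post rounds the per-frame figure down to 30,800, which is why it uses 61,600 for two frames.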

 

- there's no clearing of the framebuffer (as it always occupies same scanlines), which saves easily 50%-75% of a frame time

- since you have fixed screen position of both playfields, you can use fastest possible quad-pixel (1 cycle per pixel) drawing via 4-cycle STA $2000, STA $2001, STA $2002, ...

- Drawing 2 playfields (each 128x48 = 6,144 pixels, so 12,288 pixels in total) at 1 cycle per pixel results in 12,288 cycles

- But we need to change the color a few times on each scanline via a 2-cycle LDA #Color. How many times? 3x on the nearest scanline and 11x on the furthest, so on average (3+11)/2 = 7x per scanline. That means 7x 2 cycles = 14 cycles / scanline -> 2x48x14 = 1,344 cycles for the whole screen

- But there's one more thing to account for: the byte boundary, which happens as often as the color change, so on average 7x per scanline, and it's just another 2-cycle LDA, so for both playfields it takes 2x48x14 = 1,344 cycles for the whole screen

- There's also going to be logic (per scanline) which will choose which of the 4 codepaths to JSR into (depending on the starting X offset) - this should be under 25 cycles / scanline, so 25*2*48 = 2,400 cycles for the whole screen

 

So we have: 12,288 + 1,344 + 1,344 + 2,400 = 17,376 cycles, which is about 28% of our frame budget of 2*30,800 (61,600).

 

We still have roughly 44,000 cycles for input, audio, HSCROLL and handling PMGs. Easy peasy :)

 

I'd hazard a guess it should actually be possible to run at 50 fps, since after drawing the 2 fields, we still have about 13,000 cycles for input, audio, HSCROLL and PMGs...
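The brute-force drawing cost can be tallied the same way. This sketch assumes the per-item costs from the scenario: a 4-cycle STA per 4-pixel byte, a 2-cycle LDA for each colour change and for each byte-boundary mask, colour changes ramping linearly from 3 (nearest line) to 11 (furthest), and at most 25 cycles of codepath selection per scanline, over 2 x 48 = 96 field lines:

```python
FIELD_LINES = 96          # 2 playfields x 48 scanlines
PIXELS_PER_LINE = 128     # narrow mode
STA_ABS = 4               # cycles for STA absolute (writes 4 pixels)
LDA_IMM = 2               # cycles for LDA #imm
DISPATCH_PER_LINE = 25    # upper bound for choosing one of the 4 codepaths

def brute_force_cycles(changes_nearest: int = 3, changes_furthest: int = 11) -> int:
    """Cycles per redraw of both playfields with fully unrolled code."""
    avg_changes = (changes_nearest + changes_furthest) // 2    # linear ramp -> 7
    stores = FIELD_LINES * (PIXELS_PER_LINE // 4) * STA_ABS
    colour_loads = FIELD_LINES * avg_changes * LDA_IMM
    boundary_loads = FIELD_LINES * avg_changes * LDA_IMM
    dispatch = FIELD_LINES * DISPATCH_PER_LINE
    return stores + colour_loads + boundary_loads + dispatch

cost = brute_force_cycles()
budget_25fps = 2 * 30_800
print(cost, round(100 * cost / budget_25fps, 1), budget_25fps - cost, 30_800 - cost)
# 17376 28.2 44224 13424
```

The pixel stores dominate, so the remaining headroom depends mostly on the playfield width.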


I do suspect that it could be redone without need to do any rendering, ie as mentioned change the LMS and HScrol values on the fly and do colour reversals where necessary.

 

If it does render on the fly I suspect it's left that way because they wanted the flexibility to be able to add extra objects in if needed.
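A display-list-only approach boils down to a small per-row table: each ground row shifts horizontally by an amount scaled by its depth, split into a coarse byte offset (added to that row's LMS address) and a fine pixel offset (for HSCROL). This is a hypothetical sketch. The depth scales are made up, it assumes 4 pixels per byte, one HSCROL step per pixel and a non-negative camera position, and it ignores the real register's sign convention and the colour reversals:

```python
import math

def row_scroll_table(camera_x: float, row_scales: list[float]) -> list[tuple[int, int]]:
    """Per-row (coarse byte offset for LMS, fine pixel offset 0-3 for HSCROL)."""
    table = []
    for scale in row_scales:
        shift = math.floor(camera_x * scale)   # pixels this row moves
        table.append((shift // 4, shift % 4))
    return table

# Hypothetical depth scales: nearest row 1.0, shrinking to 0.25 at the horizon.
scales = [1.0 - 0.75 * r / 47 for r in range(48)]
table = row_scroll_table(10, scales)
print(table[0], table[-1])  # (2, 2) (0, 2)
```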


Option 2: Precomputed playfield (60 fps)

- on a stock 130XE, we have enough RAM to store the playfield's all animation frames, even at 160x192 screen resolution

- 1 frame is (160+8)/4*48 = 2,016 Bytes

- we need 48 of those frames (only for vertical movement, as horizontal is done via HSCROLL), hence 48*2,016 = 96,768 Bytes

 

- The run-time cost of "rendering" is at that point only updating the DisplayList pointers, i.e. 2*2*48 = 192 Bytes

 

- As for End of the playfield (the background color), the distant one, is merely another pointer in DL (pointing to empty background scanline)

- The Left/Right side would have to be drawn manually, but that's just a couple of bytes per scanline on average, without any OR or anything, so not a significant cost, especially compared to redrawing the whole field (which we can do in 10,000 cycles in brute-force)

 

- even on NTSC, where we would have ~25,000 cycles (after Antic takes its toll), there's no way we would screw up so bad, so as NOT to run at 60 fps

 

- the only questionable feature is the speed of bank switching: how many cycles does it take to switch the bank, as we need to switch it each frame? I suspect somebody here knows the answer to that
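The Option 2 storage numbers, plus how they map onto 16 KB banks, can be checked quickly. This sketch assumes the 130XE's extended RAM is seen through a 16 KB window at $4000-$7FFF, that there are 4 such banks (64 KB), and that no frame is split across a bank boundary:

```python
import math

BANK_SIZE = 16 * 1024       # 130XE extended RAM window ($4000-$7FFF)
EXTENDED_BANKS = 4          # 64 KB of extended RAM on a stock 130XE

def frame_bytes(width_px: int = 160, pad_px: int = 8, lines: int = 48) -> int:
    """One precomputed ground frame in a 2bpp mode (4 pixels per byte)."""
    return (width_px + pad_px) // 4 * lines

per_frame = frame_bytes()
total = 48 * per_frame                       # one frame per vertical step
frames_per_bank = BANK_SIZE // per_frame     # whole frames only
banks_needed = math.ceil(48 / frames_per_bank)
dl_bytes_per_update = 2 * 48 * 2             # 2 playfields x 48 LMS x 2 bytes
print(per_frame, total, frames_per_bank, banks_needed, dl_bytes_per_update)
# 2016 96768 8 6 192
```

Under these assumptions the frames need 6 bank-sized blocks, so 2 of them would have to live in main RAM alongside the 4 extended banks.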


Bank switching is instantaneous -- very next cycle after access cycle triggering switch can use new bank. There are 1050 enhancements that rely on this to bank switch in the middle of instructions.

 

If hardware horizontal scrolling is used (HSCROL), the scroll amounts have to be varied for each scanline due to perspective. That means the 6502 has to burn cycles following the beam, so you lose 96 scanlines of free CPU time. Otherwise you would need four times the precomputed scanlines, pre-scrolled.
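How much following the beam costs can be estimated from 114 CPU cycles per scanline. This is the worst case, where the CPU does nothing useful between WSYNC writes. In practice a few cycles per line remain usable, and DLI entry and exit overhead is not counted here:

```python
CYCLES_PER_LINE = 114
LINES_PER_FRAME = {"PAL": 312, "NTSC": 262}

def beam_racing_share(raced_lines: int, standard: str) -> float:
    """Fraction of a whole frame spent parked on WSYNC (worst case)."""
    return raced_lines / LINES_PER_FRAME[standard]

lost = 96 * CYCLES_PER_LINE
print(lost,
      round(100 * beam_racing_share(96, "PAL"), 1),
      round(100 * beam_racing_share(96, "NTSC"), 1),
      96 / 192)   # share of the 192 displayed lines
# 10944 30.8 36.6 0.5
```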

 

Ballblazer's frame management is interesting. It alternately updates the two viewports and appears to also split the update so that part of it is sequenced with the DLIs, probably to avoid having to double buffer data.

 

[Attached screenshot]


Option 3: 64 KB machine, 60 fps

- Use the fact that the distant quads wrap around during forward motion in that resolution

- We thus don't need to store full 48 scanlines for each frame, only the height equivalent to 2 front quads being wrapped

- That's about 32 frames but at 160x192 it won't fit

- So we go for narrow res: 128x192

- 1 frame is then (128+8)/4*48 = 1,632 Bytes

- 32 frames takes 52,224

- since everything is precomputed we do not need framebuffer, just display list

 

- this leaves us with 10 KB for the code+music+2D art (very tight, but not impossible)
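The Option 3 figures and the wrap-around idea are sketched below. The assumption is that forward motion repeats every 32 steps, so the displayed frame is just the forward position modulo 32. The raw remainder of a 64 KB machine is 13,312 bytes. Hardware registers, zero page, stack and the display list all come out of that, which puts the post's "about 10 KB" in the right range:

```python
WRAP_FRAMES = 32

def frame_bytes(width_px: int = 128, pad_px: int = 8, lines: int = 48) -> int:
    """One precomputed ground frame in narrow 2bpp mode (4 pixels per byte)."""
    return (width_px + pad_px) // 4 * lines

def anim_frame(forward_steps: int) -> int:
    """Forward motion wraps, so the frame index repeats every WRAP_FRAMES steps."""
    return forward_steps % WRAP_FRAMES

total = WRAP_FRAMES * frame_bytes()
remaining = 64 * 1024 - total
print(frame_bytes(), total, remaining, anim_frame(33))  # 1632 52224 13312 1
```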


Bank switching is instantaneous -- very next cycle after access cycle triggering switch can use new bank. There are 1050 enhancements that rely on this to bank switch in the middle of instructions.

That's really awesome to know ! Thanks !

 

This means, one can switch multiple banks during a game frame, for example even switch between the various tables (say, if one has 512 / 1024 KB).

 

 

If hardware horizontal scrolling is used (HSCROL), the scroll amounts have to be varied for each scanline due to perspective. That means the 6502 has to burn cycles following the beam, so you lose 96 scanlines of free CPU time. Otherwise you would need four times the precomputed scanlines, pre-scrolled.

Yes, that's the idea - but the table to handle the perspective is rather small - under 1 KB.

 

The moment we go the route of following the beam, all performance from Atari is sucked out and we're left with vblank's worth, basically...

 

What do you mean, prescrolled ? Isn't the whole purpose of HSCROLL the pixel scrolling ?

 

Please don't tell me that for HSCROLL one needs to race the beam per each scanline...

 

 

Ballblazer's frame management is interesting. It alternately updates the two viewports and appears to also split the update so that part of it is sequenced with the DLIs, probably to avoid having to double buffer data.

Thank you for the screenshot. It's very curious :)

 

 

EDIT: Aaaand for HSCROLL we do actually need to race the beam, as there's just one register, $D404 (just checked De Re Atari), so that's STA WSYNC for 96 scanlines.

 

We could store 4x more data at either 512 KB machine, or 320 KB (with slightly more complex wrapping logic), but there goes the 130XE 60 fps route...

Edited by VladR

The music is algorithmically generated. The melody is assembled from a predefined set of 32 eight-note melody fragments, which are put together randomly by the algorithm, which also makes choices about how fast to play, how loud to play it, when to omit or elide notes, when to insert a break, and when to trigger a riff. The accompanying bassline, drums and chords are also assembled on the fly by a simplified version of it. It's also supposed to adjust based on gameplay factors and cues... it's not the norm.
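To make the described approach concrete, here is a toy sketch of fragment-based melody assembly. It is purely illustrative and not Ballblazer's actual routine. The fragment data, the 1/2 note-length choice, the omit probability and the volume range (taken from POKEY's 4-bit volume) are all hypothetical:

```python
import random

# Hypothetical stand-in for the 32 predefined eight-note fragments
# (semitone offsets); the real fragment data is not given in the thread.
FRAGMENTS = [[(i * 7 + j * 5) % 24 for j in range(8)] for i in range(32)]

def compose(fragment_count: int, rng: random.Random, omit_chance: float = 0.1):
    """Chain randomly chosen fragments, picking length and volume per fragment.

    Returns (note, length, volume) tuples, or None where a note was omitted.
    """
    melody = []
    for _ in range(fragment_count):
        fragment = rng.choice(FRAGMENTS)
        length = rng.choice((1, 2))        # hypothetical: 1 = fast, 2 = slow
        volume = rng.randint(8, 15)        # within POKEY's 4-bit volume range
        for note in fragment:
            if rng.random() < omit_chance:
                melody.append(None)        # omitted note becomes a rest
            else:
                melody.append((note, length, volume))
    return melody

tune = compose(4, random.Random(1))
print(len(tune))  # 32
```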

 

The width of the board is exactly what is needed... if you play with it at all... it ain't good...

Edited by _The Doctor__

I don't know what in the hell I was thinking last night, not immediately realizing that HSCROLL means racing the beam (thanks for the reality check, Phaeron!), but that's going to kill basically 50% of the pre-VBLANK CPU time, so the available CPU time is roughly half of what I posted earlier, which won't make 60 fps exactly easy or realistic to obtain.

Granted, at that point we might just use it and do a couple of register changes per scanline (giving us more colors and PMG positioning).

 

I guess I just got carried away when I got excited about the idea.

 

 

As an apology, I'm going to go and code the BruteForce Unrolled approach in 6502 ASM.
