
Ultimate Planet


Vorticon


  • 7 months later...

Quick update: I finally returned to the game after a looong hiatus, and man is it a pain to re-familiarize myself with the code! In any case, I decided today to replace the VDP access utilities (VSBW, VMBW, VSBR, VMBR) with my own routines in the hope of speeding up the screen scrolling, only to be sorely disappointed with the final speed improvement, which was frankly minimal. I'm not even sure it was really worth doing, except maybe from an educational standpoint. I keep hearing people complaining about how slow the utilities are and that everybody should write their own routines using VDPWA, VDPRD and VDPWR. Sorry guys, but I'm definitely not impressed, and I would recommend this only for the most exacting applications.


GREAT TO HEAR!!!! This game has actually been on my mind lately... I've been contemplating doing a full bitmap demo.. just for myself... and I watched your demo at the Faire a few times on ustream. (quality is terrible, but it's all I got)....

 

Sorry you didn't get the speed difference you wanted... perhaps the speed difference isn't as obvious in full on GMODE2. I'm not much of an assembly programmer (yet) but it's REALLY fascinating how you create motion pixel by pixel in bitmap mode. That's some serious sh**. =)

There are several different reasons for wanting to write your own. Matthew gave a good rundown half a year ago or so, but I can't find the list. From 2004 on, I produced my output as cartridge binaries, and then of course, as with EA5, I had to supply my own.

 

Anyway, once you have your own routines, you will gain speed if you write a custom clear-screen/fill-memory routine (putting the same byte into a larger area) instead of calling the loaded VSBW utility many times. VSBW swaps workspaces back and forth with every call, among other things. Other solutions exist too (like calling VMBW 24 times).
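To see why a custom fill wins, here is a rough C model (not TI code) of the VDP's two CPU-facing ports. It relies on the fact that the 9918A auto-increments its VRAM address after each data write, so a fill only has to set the address once, while calling a VSBW-style routine per byte resends both address bytes every time. The names (`Vdp`, `fill_with_vsbw`, `fill_direct`) are made up for illustration, and the model only counts port writes; it ignores the BLWP/workspace overhead that makes the real VSBW slower still.

```c
#include <stdint.h>

/* Toy model of the address port (VDPWA, >8C02) and data port (VDPWD, >8C00).
 * Only VRAM write addressing is modeled; reads and register writes are not. */
typedef struct {
    uint8_t  vram[16384];
    uint16_t addr;        /* current VRAM address pointer */
    int      have_low;    /* 1 once the first (low) address byte has arrived */
    uint8_t  low;
    long     port_writes; /* total writes to either port */
} Vdp;

static void vdp_write_addr_port(Vdp *v, uint8_t b) {
    v->port_writes++;
    if (!v->have_low) {                 /* first byte: low 8 address bits */
        v->low = b;
        v->have_low = 1;
    } else {                            /* second byte: flag bits + high 6 bits */
        v->addr = (uint16_t)(((b & 0x3F) << 8) | v->low);
        v->have_low = 0;
    }
}

static void vdp_write_data_port(Vdp *v, uint8_t b) {
    v->port_writes++;
    v->vram[v->addr] = b;
    v->addr = (uint16_t)((v->addr + 1) & 0x3FFF); /* VDP auto-increments */
}

static void set_write_addr(Vdp *v, uint16_t a) {
    vdp_write_addr_port(v, (uint8_t)(a & 0xFF));
    vdp_write_addr_port(v, (uint8_t)(((a >> 8) & 0x3F) | 0x40));
}

/* Fill by calling a VSBW-like routine once per byte: 3 port writes each. */
static void fill_with_vsbw(Vdp *v, uint16_t a, uint8_t val, int n) {
    for (int i = 0; i < n; i++) {
        set_write_addr(v, (uint16_t)(a + i));
        vdp_write_data_port(v, val);
    }
}

/* Custom fill: set the address once, then let auto-increment do the rest. */
static void fill_direct(Vdp *v, uint16_t a, uint8_t val, int n) {
    set_write_addr(v, a);
    for (int i = 0; i < n; i++)
        vdp_write_data_port(v, val);
}
```

Clearing a 768-byte screen image table with spaces this way costs 2304 port writes through per-byte calls versus 770 with a single address setup, and both leave identical VRAM contents.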

 

Let's just assume you have written something like VMBW. A single call, or just a few, will not show much gain in performance.

 

Now, if you look at your routine, you have a loop. The loop itself probably executes DEC and JNE for every byte moved, and that takes time too. You could instead try a certain degree of "rollout" (loop unrolling), so instead of

 

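* Assumes R1 -> source bytes, VDP write address already set, VDPWD EQU >8C00
       LI   R0,400          * 400 bytes to move
LOOP1  MOVB *R1+,@VDPWD     * copy one byte to the VDP
       DEC  R0              * loop overhead on every byte...
       JNE  LOOP1           * ...including this jump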

 

it would be something like

 

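       LI   R0,100          * 400 bytes / 4 bytes per pass
LOOP2  MOVB *R1+,@VDPWD
       MOVB *R1+,@VDPWD
       MOVB *R1+,@VDPWD
       MOVB *R1+,@VDPWD
       DEC  R0              * one DEC/JNE pair per four bytes
       JNE  LOOP2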

 

You'll spend fewer cycles on loop overhead per byte moved (one DEC/JNE pair per four bytes instead of one per byte), though I'm sure Matthew can tell you all about it.
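A quick way to see what the rollout buys is to count loop-control steps. The C sketch below is a model, not 9900 code, and its function names are hypothetical: it copies bytes with a rolled loop and with a 4x unrolled loop (plus a tail for 0-3 leftover bytes) and counts how often the decrement-and-branch step runs. The number of byte moves stays the same, so the real-world gain depends on how large a share of each pass the DEC/JNE pair takes.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts executions of the decrement-and-branch step (the DEC/JNE pair). */
typedef struct { size_t loop_tests; } LoopStats;

/* Rolled: one loop test per byte. */
static void copy_rolled(uint8_t *dst, const uint8_t *src, size_t n, LoopStats *s) {
    while (n > 0) {
        *dst++ = *src++;   /* like MOVB */
        n--;               /* like DEC  */
        s->loop_tests++;   /* like JNE  */
    }
}

/* Unrolled by 4: one loop test per four bytes, then a short tail loop. */
static void copy_unrolled4(uint8_t *dst, const uint8_t *src, size_t n, LoopStats *s) {
    size_t passes = n / 4;
    while (passes > 0) {
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        passes--;
        s->loop_tests++;
    }
    for (size_t left = n % 4; left > 0; left--) {  /* leftover 0-3 bytes */
        *dst++ = *src++;
        s->loop_tests++;
    }
}
```

For 400 bytes the rolled loop runs its test 400 times and the unrolled one 100 times; for 402 bytes the unrolled version needs 102 (100 passes plus 2 tail bytes).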

 

:)

I'm waiting to be further along in the game before making a short video :)

I actually only clear the screen twice: when the game is launched and after the splash screen. Subsequently, the background hex field remains static, and I just update the positions of the screen elements based on the current local and global coordinates. Trying to scroll the entire bitmap is not feasible, because it would require disk access, as well as saves prior to scrolling to update the movable screen elements, and would be horribly slow. I think the issue here is that every time the screen is "scrolled", the program goes through a huge table containing the vital data for each unit and determines whether each unit is within the viewing window, alive, able to move, etc... I can't find a way around this, and the gamefield spans 4 bitmap screens with 4 different armies, a planet with features, asteroids and meteorites... So in my case I will likely gain little from writing my own VDP routines, which I will get rid of if memory gets tight.

 

 

Now, if you look at your routine, you have a loop. The loop itself probably executes DEC and JNE with every byte moved.

 

How did you guess? ;)

 

 

That's an interesting concept, although I'd like to know what kind of performance gain I can get from this "rollout" before I embark on a laborious code modification. Matthew?

I just ran a test comparing the console VSBW with an optimized version, all other things being equal. The optimized version was nearly twice as fast. Perhaps the logjam is somewhere else?

It depends entirely on their use. If you have heavy screen updates in a critical loop, then writing your own can have a serious impact. One example is my yin yang program:

 

http://digitalstratum.com/programming/yinyang_ti_asm

 

When I first wrote that in 1984, I used the VDP utilities and my education from Lottrup's book. The image took, literally, about 4 minutes to draw. You got to watch each line generate in slow painful detail. I revisited that code back in 2006, and now it generates the image in about 10 seconds. Of course there were a lot of optimizations, but new VDP routines were critical.

 

If you only need to update a few bytes here and there, or even a small block, then no, you probably won't see much improvement. However, if you start to run that code in loops, you will see a difference. Also, once you realize that writing to the VDP is not voodoo (it seems to freak a lot of people out), you'll find it is sometimes necessary to skip using a routine and just code the VDP access directly into the loop, or unroll a loop by a certain factor.

 

I can't remember the details of your game, but I think it is a turn-based strategy game, no? Thus you may not have a lot of real-time VDP updates going on, so you really don't have much chance to see the difference.

 

The other argument for using your own routines is you don't have to rely on the XB or EA cart. If you plan on making a cartridge based version, then you obviously don't have a choice, you have to use your own routines. Oh yeah, I forgot, you like to develop on the real hardware, so doing cartridge based development is not so convenient. However, using Classic99 and Asm994a on a PC, you can do cart based development with a very fast code, compile, test, debug cycle.

Sometimes99er did some pretty extensive loop unrolling testing a while ago. I can't remember what thread it was in though. Maybe the "assembly my way" thread, or the thread about over-running the VDP.

 

Since it seems everyone was posting while I was typing my reply, I see my suspicions were correct about you not really needing to do much VDP updating. That explains the lack of much perceived performance. Your critical loops are elsewhere. As for scrolling your screen, I started to get all "shakey" when you said you didn't think it was possible. That kind of challenge is like crack to me. Give us enough details and I'm sure this group will get your screen scrolling! Anyone who doubts what is possible with the 9918A needs to go re-watch the MSX1 "BOLD" demo.

I never said it was not possible, just slow for my purposes ;) There is no way to store 4 bitmap screens in memory, which will mean only a "slice" can be buffered from disk, actually 4 slices since scrolling can occur in 4 directions. Depending on how wide a slice will be, the memory requirements will likely be prohibitively expensive in bitmap mode. Furthermore, after just a few scrolls, the disk will need to be accessed and all 4 slices will need to be updated AND a disk save operation of the bitmap screen will need to be made to update any units that were moved. I think you'll agree that all these issues, while definitely relatively easily solvable, would result in a crawling scrolling process.

Nonetheless, I am very open to any suggestions that would speed things up in this regard. My solution of keeping the background hex field static was the best compromise I could come up with.
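For a sense of scale, the slice-buffer cost can be worked out from the bitmap-mode layout, where every 8x8 character cell needs 8 pattern bytes plus 8 color bytes. The sketch below assumes uncompressed slices that cover the full visible edge on all four sides of a 32x24 screen; `slice_buffer_bytes` and `full_screen_bytes` are hypothetical helpers, not anything from the game.

```c
/* Bitmap mode: 8 pattern bytes + 8 color bytes per 8x8 character cell. */
enum { CELL_BYTES = 16, SCREEN_COLS = 32, SCREEN_ROWS = 24 };

/* Bytes needed to buffer one slice on each of the four sides, each slice
 * `depth` character cells deep (raw pattern + color data, no compression). */
static long slice_buffer_bytes(int depth) {
    long left_right = 2L * depth * SCREEN_ROWS * CELL_BYTES;  /* two column slices */
    long top_bottom = 2L * depth * SCREEN_COLS * CELL_BYTES;  /* two row slices */
    return left_right + top_bottom;
}

/* One complete bitmap screen's worth of pattern + color data. */
static long full_screen_bytes(void) {
    return (long)SCREEN_COLS * SCREEN_ROWS * CELL_BYTES;
}
```

Each character of slice depth costs 1792 bytes (two 384-byte column slices plus two 512-byte row slices), so slices four characters deep already take 7168 bytes, against 12288 bytes for one whole bitmap screen.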

Are you considering a screen as the entire pattern table, or are you saving it in an encoded fashion?

A lot has already been said, but one of the biggest speed improvements in rolling your own comes from not using BLWP. If you are still using that, you won't see as much of a jump as you would with BL. For instance, VSBW can be reduced to:

 

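VSBW   SOCB @H40,R0       * Set the write-address flag; assumes a >40 byte at H40
       SWPB R0            * Low byte of the VDP address into the MSB
       MOVB R0,@VDPWA     * Send the low address byte
       SWPB R0            * High byte (now with >40) back into the MSB
       MOVB R0,@VDPWA     * Send it; the VDP is ready for a write
       MOVB R1,@VDPWD     * Write the data byte held in the MSB of R1
       B    *R11          * Return to the BL caller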

 

(or potentially tighter if you drop the SWPBs, I like it. ;) )

 

For my purposes, I've dropped even that sort of subroutine now. The only sub I tend to have is one to set the address, like so:

 

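VWAD   SWPB R0            * Low byte of the address into the MSB
       MOVB R0,@>8C02     * Send it to the VDP address port (VDPWA)
       SWPB R0            * High byte, control bits already set by the caller
       MOVB R0,@>8C02     * Send it; address (or register) is now set
       B    *R11          * Return to the BL caller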

 

And I do all my VDP access inline, although if I have block moves I may write a copy subroutine. They are just MOVBs, after all. Note that I don't add the control bits in VWAD; I do that inline too, so the same function also works for setting registers, just by putting the right bits in the value. (And I can get even more speed if I need it by pre-swapping the data and dropping the first SWPB, since I usually use immediate loads anyway.)
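The control bits are the top two bits of the second byte sent to the address port: 00 sets a VRAM read address, 01 (>40) a VRAM write address, and 10 (>80) marks a register write, with the value going first in that case. Here is a small C sketch (function names made up for illustration) that builds each byte pair and shows that handing the combined word to something shaped like VWAD, low byte first, produces the same pair.

```c
#include <stdint.h>

/* The two bytes sent, in order, to the VDP address port (>8C02 on the TI). */
typedef struct { uint8_t first, second; } VdpCmd;

static VdpCmd vdp_read_addr(uint16_t addr) {            /* flag bits 00 */
    VdpCmd c = { (uint8_t)(addr & 0xFF), (uint8_t)((addr >> 8) & 0x3F) };
    return c;
}

static VdpCmd vdp_write_addr(uint16_t addr) {           /* flag bits 01 (>40) */
    VdpCmd c = { (uint8_t)(addr & 0xFF), (uint8_t)(((addr >> 8) & 0x3F) | 0x40) };
    return c;
}

/* Register write: the value first, then >80 plus the register number (0-7). */
static VdpCmd vdp_reg_write(uint8_t reg, uint8_t value) {
    VdpCmd c = { value, (uint8_t)(0x80 | (reg & 0x07)) };
    return c;
}

/* Mirrors VWAD: R0 already holds the control bits in its high byte;
 * the low byte goes out first, then the high byte. */
static VdpCmd vwad_bytes(uint16_t r0) {
    VdpCmd c = { (uint8_t)(r0 & 0xFF), (uint8_t)(r0 >> 8) };
    return c;
}
```

For example, a write address of >1234 goes out as >34 then >52, which is exactly what `vwad_bytes(>5234)` sends, and setting register 1 to >E0 goes out as >E0 then >81, the same as `vwad_bytes(>81E0)`.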

 

That may help you get more speed, but if you need to scroll the 12k bitmap screen, I don't think it can be done perfectly smoothly -- the VDP itself has a limit of very roughly 2k of transfer per frame.
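The back-of-the-envelope version of that limit: a bitmap screen is a 6144-byte pattern table plus a 6144-byte color table, and at the very rough 2K-per-frame figure above, a full redraw spans several frames. The 60 Hz rate assumes an NTSC console (PAL runs at 50 Hz), and `frames_needed` and `seconds_for` are hypothetical helpers.

```c
/* Bitmap-mode table sizes in bytes. */
enum { PATTERN_TABLE = 6144, COLOR_TABLE = 6144 };

/* Whole frames needed to push `bytes` at `bytes_per_frame`, rounded up. */
static long frames_needed(long bytes, long bytes_per_frame) {
    return (bytes + bytes_per_frame - 1) / bytes_per_frame;
}

/* Time taken by that many frames at a given refresh rate. */
static double seconds_for(long frames, double frames_per_second) {
    return frames / frames_per_second;
}
```

That works out to 6 frames, about a tenth of a second at 60 Hz, for every full-screen update, which is why a perfectly smooth full-bitmap scroll is unlikely.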


Forgive my ignorance here, but if the screen stays static behind the characters, wouldn't it be possible to "fake" a scroll, simply by shifting the display data during the necessary scroll routine? You won't actually be scrolling, per se, but we could fake it with some math, couldn't we?

 

I know bitmap is quite different from GMODE1, so I'm asking really for my own knowledge rather than offering a viable solution. :)

I have found that the greatest speed gains in software come not from trying to make an existing solution faster, but rather rethinking the problem. I can only offer generalities here because I don't know the details of what you ultimately want to do with the game. However, decoupling the data and graphics from the screen can help speed things up and take much less memory.

 

For example, way back in the DOS days, I set about to write a "window" manager for some software I was writing. I took the approach of saving the screen area under where I was going to draw a new window. Then when the window would close, it would restore the area it destroyed. I quickly realized that I could consume a lot of memory as more and more "windows" were opened, but I didn't want a limit like that in my software. And these were character-based windows. I wondered how Microsoft did it with Windows (3.11 back then). There was no way they could be saving and restoring huge chunks of bitmap video memory as windows were opened, moved, and closed. I had no idea of the concept that a program could be asked to redraw its window at any time as necessary. I was stuck in a linear way of thinking and programming my apps, and once a window was drawn, my software had no way to redraw the window, probably because the window had painted a bunch of data fields and was waiting for user input.

 

Anyway, all that to say, maybe you could think of your game as a database of information used to play the game and track the units. At that point, the screen drawing simply becomes a matter of reading the data to "generate" the current display. There is no storing of bitmap screen data anywhere. If you make your data "sparse", then your game world can be very large, yet compact in data storage. You may also be able to pre-calculate intermediate data that could assist the screen drawing routines in scrolling. If you pre-calculate the data for each of the two top and bottom rows, and the two left and right columns, then you would be ready to scroll in any direction.
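A minimal C sketch of that idea, assuming a hypothetical unit record with world coordinates, a tile number and an alive flag: the only thing stored is the sparse list of units, and the visible window is regenerated from it for any scroll position by translating world coordinates into screen coordinates. The viewport size and all names are illustrative, not taken from the game.

```c
#include <stdint.h>
#include <string.h>

#define VIEW_W 16   /* hypothetical viewport size, in map tiles */
#define VIEW_H 12
#define EMPTY  0

typedef struct { int16_t x, y; uint8_t tile; uint8_t alive; } Unit;

/* Rebuild the visible view from the unit table alone, for a view whose
 * top-left world tile is (vx, vy). Returns the number of units drawn. */
static int build_view(uint8_t view[VIEW_H][VIEW_W], const Unit *units, int n,
                      int vx, int vy) {
    int drawn = 0;
    memset(view, EMPTY, VIEW_H * VIEW_W);
    for (int i = 0; i < n; i++) {
        if (!units[i].alive) continue;
        int sx = units[i].x - vx;               /* world -> screen */
        int sy = units[i].y - vy;
        if (sx < 0 || sy < 0 || sx >= VIEW_W || sy >= VIEW_H) continue;
        view[sy][sx] = units[i].tile;
        drawn++;
    }
    return drawn;
}
```

Scrolling then just means calling `build_view` again with a new `vx, vy`; nothing from the previous screen has to be saved. It still visits every unit, though, which is the cost of a full table scan.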

 

Anyway, just some thoughts. If you can define better what you need to be able to do, i.e. state the problem without consideration to your current solution, then we can help you come up with different possible solutions.

Edited by matthew180

 

 

That is exactly what I'm doing: the game is essentially a database :) Everything displayed on screen except the background hex field is in a single large table, which itself has pointers to a tiny character pattern table. I don't store any bitmaps at all. The main issue slowing things down is that with every scroll the entire table is scanned to see which items disappear and which ones become visible, depending on the current coordinates of the "viewing" window, which is a subset of the entire gamefield. Given that most items will eventually move during the game, in many cases very far from their initial starting positions, I could not figure out a way around scanning the entire table every time... Nonetheless, the whole process is quite playable, especially since we are dealing with a turn-based game and not an arcade one. I was recently playing a WWII wargame called Desert Rats on my IBM PCjr, and it struck me how slooow the scrolling was for a machine much faster than our TI and equipped with 640K of RAM. Now, there is a version for the Spectrum 128K, and I bet the bitmap maps are stored in memory and remain there for the entire game, with plenty of room left for the code. This is obviously not an option for the TI unless I make use of the SAMS card. I have considered that, but it would severely limit the accessibility of the game for TIers without one, unless they use emulation.

On the other hand, in this day and age, how many TIers are really using real hardware? Maybe I should reconsider my entire paradigm as to who the target audience is, and code for the most capable hardware available for the TI, as long as it is supported by Classic99, which is now the de facto emulation standard. Hmmmm.... If I go that route, I will have to rewrite quite a bit of the game, and it will likely set me back as much as a year. What do you guys think?


I say go with what your heart tells you. With LoBR, I'm maintaining a bare minimum standard of 32k... In other words, I've decided my target audience is PEB and CF7 users with 32k of memory. However, when the game is done, I fully intend on expanding the sh** out of it, adding every despicable trick in the book and releasing a "Classic99 version". :) Don't set yourself back a year; just make a second, ridiculous version once v1.0 is done. :) My two cents, anyway.

Edited by Opry99er

Faking a scroll like that, with the background kept static, is what Ultimate Planet does :D


 

I consider a screen to be the entire pattern table, plus the Screen Image table and the Color table. How would you encode this?

 

If you have lots of repeating patterns, then saving just the SIT (Screen Image Table) could speed you up (file-access-wise, that is). If colors could be associated with certain characters, then you could perhaps derive the color table from the characters instead of saving it.
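One way to read that suggestion, sketched in C: if the screen is built from a limited set of repeating 8x8 tiles, save only a 768-byte map of tile numbers (like the SIT) and expand it into the full pattern and color tables when loading. The `Tile` set, `MAX_TILES` and `expand_screen` are hypothetical, and the sketch assumes the usual bitmap-mode setup in which the SIT holds 0-255 three times, so screen cell i owns bytes i*8 to i*8+7 of each table.

```c
#include <stdint.h>

#define CELLS     768   /* 32 x 24 screen positions */
#define MAX_TILES 64    /* hypothetical tile set size */

typedef struct {
    uint8_t pattern[8]; /* one byte per pixel row */
    uint8_t color[8];   /* foreground/background nybbles per pixel row */
} Tile;

/* Expand a saved tile map into full bitmap-mode pattern and color tables.
 * Every map entry must be a valid index (< MAX_TILES). */
static void expand_screen(const uint8_t map[CELLS], const Tile tiles[MAX_TILES],
                          uint8_t pattern_table[CELLS * 8],
                          uint8_t color_table[CELLS * 8]) {
    for (int i = 0; i < CELLS; i++) {
        const Tile *t = &tiles[map[i]];
        for (int row = 0; row < 8; row++) {
            pattern_table[i * 8 + row] = t->pattern[row];
            color_table[i * 8 + row]   = t->color[row];
        }
    }
}
```

The saved data drops from 12288 bytes to 768 plus the tile set, at the cost of restricting each screen to those tiles.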


You don't have to scan the whole table, only the objects that have the potential to appear on the screen. This is what 3D game designers have to do. They spend a lot of time on structures like b-trees, octrees, etc., doing binary space partitioning to isolate only those objects that need to be drawn. Since you are dealing with a 2D space, these methods should be much simpler than in 3D. Also, depending on how fast units can move per turn, you don't need to scan the middle section of the screen. For example, if in 1 turn a unit can only move 1 tile, then any unit on the screen more than 2 tiles from every edge would still be on the screen after 1 turn, even if the screen also scrolled by 1 tile. You only need to scan for those units that might have come into, or gone out of, view.
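Here is a rough C sketch of one simple 2D partitioning scheme: a uniform grid of buckets, swapped in as something much simpler than the BSP structures mentioned. Each unit is linked into the bucket covering its tile, a move relinks it, and a viewport query only walks the buckets that overlap the view. `surely_stays_visible` encodes the margin argument: a unit at least `max_move + max_scroll` tiles inside every edge cannot leave the view in one turn. The map size, bucket size and all names are assumptions for illustration.

```c
#include <stdint.h>

#define WORLD_W   64           /* hypothetical map size, in tiles */
#define WORLD_H   48
#define CELL      8            /* bucket size, in tiles */
#define GW        (WORLD_W / CELL)
#define GH        (WORLD_H / CELL)
#define MAX_UNITS 256
#define NONE      (-1)

typedef struct {
    int16_t head[GH][GW];      /* first unit in each bucket, or NONE */
    int16_t next[MAX_UNITS];   /* next unit in the same bucket */
    int16_t x[MAX_UNITS], y[MAX_UNITS];
} Grid;

static void grid_init(Grid *g) {
    for (int r = 0; r < GH; r++)
        for (int c = 0; c < GW; c++) g->head[r][c] = NONE;
}

/* Link unit `id` at tile (x, y); coordinates must lie inside the map. */
static void grid_insert(Grid *g, int id, int x, int y) {
    int16_t *h = &g->head[y / CELL][x / CELL];
    g->x[id] = (int16_t)x;
    g->y[id] = (int16_t)y;
    g->next[id] = *h;
    *h = (int16_t)id;
}

/* Unlink unit `id` from the bucket of its current position. */
static void grid_remove(Grid *g, int id) {
    int16_t *p = &g->head[g->y[id] / CELL][g->x[id] / CELL];
    while (*p != id) p = &g->next[*p];
    *p = g->next[id];
}

static void grid_move(Grid *g, int id, int x, int y) {
    grid_remove(g, id);
    grid_insert(g, id, x, y);
}

/* Collect units inside [vx, vx+vw) x [vy, vy+vh), which must lie inside the
 * map, visiting only overlapping buckets. `examined` counts units looked at. */
static int grid_query(const Grid *g, int vx, int vy, int vw, int vh,
                      int *out, int *examined) {
    int n = 0;
    *examined = 0;
    for (int r = vy / CELL; r <= (vy + vh - 1) / CELL; r++)
        for (int c = vx / CELL; c <= (vx + vw - 1) / CELL; c++)
            for (int id = g->head[r][c]; id != NONE; id = g->next[id]) {
                (*examined)++;
                if (g->x[id] >= vx && g->x[id] < vx + vw &&
                    g->y[id] >= vy && g->y[id] < vy + vh)
                    out[n++] = id;
            }
    return n;
}

/* True if (x, y) is at least `margin` tiles (column/row difference) inside
 * every edge of the view, with margin = max_move + max_scroll per turn. */
static int surely_stays_visible(int x, int y, int vx, int vy, int vw, int vh,
                                int margin) {
    return x - vx >= margin && (vx + vw - 1) - x >= margin &&
           y - vy >= margin && (vy + vh - 1) - y >= margin;
}
```

With 8x8-tile buckets, a 16x12 view overlaps at most 9 buckets wherever it sits, so units far from the view are never touched; the margin test then lets you skip re-checking units safely inside the view.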

 

Cool. My second computer was a PCjr. Funny that it suffers from a design flaw similar to our 99/4A's, i.e. the 8088 is a 16-bit CPU with an 8-bit bus to the world, not unlike our beloved 16-to-8 multiplexer. The 8088 at 4.77MHz was only marginally faster, and a properly designed 9900 system could probably keep pace.

 

I'm not sure who still uses real hardware, but I suspect a lot of people do. I still have a full complement of stock gear (console, PEB, 32K, TI Disk Controller, TI RS232) and a speech synth. Nothing 3rd party though; I could never find anyone willing to sell for less than the price of gold. Also, Tursi has a good policy of not adding anything to Classic99 that does not exist in the real world (with a few exceptions), so my opinion would be to stick with an expanded (PEB, 32K, SSSD), yet real, system. If you make a cartridge-based game, you can get more ROM space with bank switching. But definitely don't do anything that will set you back a year.


Yeah, but like Matthew said, I probably won't (add more VRAM). ;) I actually looked at adding some power-ups to the VDP in the guise of a microcontroller and some dual-port SRAM, but Matthew's VDP will do enough for me. ;) That said, I found the old 9938 code that was contributed to Classic99 years ago, and when I rewrite the VDP I intend to add that code back in, more or less. I should be able to support it now. More future long-term thinking though.

 

While I don't run my real hardware often, I'm a fan of supporting the base system. If you do it by adding hardware, sure, but I don't like adding TOO many imaginary things to Classic99 (even the synth sound chip that I used to have in there, I specified its operation with the intent of building the real thing. It just wasn't worth it, really.)


Well, I think I'm just going to stick with my current code, which will allow any basic TI setup to run the game (32K and one disk drive). At some point in the future, though, I do want to have a project specifically targeted at the SAMS card, which is incredibly underused. Maybe my next wargame will feature true scrolling. But first, I really want to have Ultimate Planet finished in time for the Chicago Faire, so I'm going to focus on it exclusively for now.
