
New GUI for the Atari 8-bit


flashjazzcat


Started conversion to ROM today, and this is probably going to be the most critical part of the project (i.e. the part when I'll keep wanting to give up because it's such a bloody nightmare). :)

 

Some stats: there's about 24KB of code at the moment (not including fonts, icons, data, tables, etc: the XEX is currently 39KB with wallpaper and everything else embedded inline), and by far the largest "module" (which will fit nicely into an 8KB ROM bank) is the graphics segment - i.e. everything which writes directly to the screen. The next biggest module is the window manager (which is 5KB but will get its own bank), and then the code size drops off pretty sharply after that. I figured the best way to start this gargantuan task was to place a ".LOCAL" wrapper around the entire graphics source file, so that instead of JSR ROUTINE (in any other bank), JSR GFX.ROUTINE would be required. When editing the calls, I'm also changing JSR to LJSR (the latter being the inter-bank JSR macro call), so we now get LJSR GFX.ROUTINE. That's nicely self-documenting, in as much as it clearly shows what's an external call and what's not.
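
Just to illustrate the general shape of the thing - this is only a sketch, not the actual macro: BANKSEL, CURBANK and Vector are invented names, the helper has to live in the fixed (non-banked) region, and a real version would also need to preserve any return value in A:

	.local FarCall ; entry: A = target bank, X/Y = target address lo/hi
	stx Vector ; Vector is a word in low RAM or zero page
	sty Vector+1
	tax
	lda CURBANK
	pha ; remember the caller's bank
	stx CURBANK
	stx BANKSEL ; page the callee's bank in
	jsr DoCall
	pla
	sta CURBANK
	sta BANKSEL ; page the caller's bank back in
	rts
DoCall
	jmp (Vector)
	.endl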

 

This threw out 117 assembly errors which I had to pick through (most of which were calls which needed amending as described above, obviously), and I was reminded of one thing I'd forgotten: that any kind of jump table will no longer work unless the target routines are in the same bank as the table, so they'll have to be redesigned too. Of course I'm also moving all the variable space down into low RAM, which will obviously need initialising by the cartridge code, and encountering (as I go) some poor decisions regarding local variable space tacked onto the end of LOCAL ranges (which obviously won't work in ROM); these have to be renamed and moved. There's much scope for bug-creation here. :)
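
The variable initialisation itself is the easy bit - just a copy of an initialised image from the cart's init bank down to the low-RAM work area at start-up, something along these lines (all names invented for the sketch):

	.local InitVars
	ldx #0
Copy
	lda VarImage,x ; initial values held in the init bank of the cart
	sta VarBase,x ; run-time home in low RAM
	inx
	cpx #VarSize ; assumes a block of 256 bytes or fewer
	bne Copy
	rts
	.endl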

 

So - rinse and repeat on another dozen source files (albeit smaller ones), until the thing works with the LOCAL wrappers in place, at least. Then switch the assembler to BIN mode, add padding to the banks, relocate the (test) application to low RAM from the init bank of the cart, make sure the interrupt handlers and everything else which needs to be is out of the way of banked ROM... and no doubt watch the whole thing fail to work for a number of weeks.

 

Then - when stuff's just about working - I'll have to look out for any performance hits caused by slow inter-bank calls in critical areas, and move the offending code to a different bank. I'm pretty sure it'll be snowing while this goes on... at least if we have a long, arduous winter. :)

 

And when that's all done, we can pick up where we left off adding to the functionality, designing the API, etc, etc. :D

 


Sounds fun! ;) Do you plan to leave the system graphical elements (wallpaper, icons, fonts, etc.) embedded, or will they be loaded from a resource file?

 

Oh no - nothing will remain embedded in the ROM apart from a few small tables. Everything will be pulled in from external resources, which will mean you can change the system font, assign your own icons (from a considerable selection), etc. I might even put stuff like dithered scrollbar patterns and closer/fuller buttons in resources as well, so the system can be skinnable to a certain extent. The shell will also be an application, so if someone wants to write a "Symcommander" to use instead of the default file manager, that'll be possible too without changing the ROM.

 

I'll very shortly have to face the decision about which of MADS' two relocatable binary formats to use, too: SDX or proprietary. Proprietary would be better (since it allows lo/hi byte relocation, and has a slightly simpler layout), but it doesn't currently support multiple RELOC blocks in the same file, which I'll need (for drivers which - for example - install part of themselves in low conventional RAM and another part in extended memory), nor different RELOC segment types (for example - MAIN and EXT). The SDX format supports both of these, but not lo/hi byte relocation (which can be easily coded around, of course). The MADS format has a few niceties like long external symbol names, but I'll have to go with whatever's best when the time comes.

Edited by flashjazzcat

I'm clearly prematurely optimizing, but perhaps you could have common routines repeated in multiple banks to avoid the LJSR overhead. It would be great if there were an optimizing assembler that could do this sort of thing automatically. Other obvious optimizations could include inlining function calls, dead code removal, peephole optimization, etc. I could also imagine a tool that took a trace of the program and analyzed the function call sequence to automatically partition the code into tightly coupled pieces the size of one ROM bank. Anyhow, correctness should come first and then you can worry about optimization.



 

Not premature at all: I already found myself copying and pasting common setup routines which need to be accessed in different banks. This seems to me a sensible way to proceed, especially when we have 40KB of space still to fill in the smallest target platform (U1MB). Sometimes a big code-inventorying phase like this is a really good time to take stock of the positioning of various routines.
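
One obvious way of doing that without maintaining several copies of the source (and not necessarily how I'll end up doing it) is to keep the shared routine in its own include file and pull it into every bank that needs it; since each bank source is wrapped in its own .LOCAL, the labels won't clash and the call stays a plain JSR:

; common.asm - assembled once into every bank that wants a local copy
; (names invented for the sketch)
SetupPointers
	mwa #ScreenBase ScreenPtr
	mva #0 Flags
	rts

	.local GFX
	icl 'common.asm' ; this bank gets its own copy as GFX.SetupPointers
SomeRoutine
	jsr SetupPointers ; plain JSR - no LJSR required
	rts
	.endl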

 

Heh - if anyone writes a tool like the one you describe, I'm using it. :)



Like some kind of virtual memory manager? :) <- I'm sorry, I couldn't resist. It would be awesome if there were some way for your GUI to spare future programmers this hurdle by assigning an address range for the user's program. Also, if you choose the SDX method, could someone like drac030 add the functions of the proprietary one?

Edited by fibrewire

Also, if you choose the SDX method, could someone like drac030 add the functions of the proprietary one?

Tebe is responsible for any enhancements to the way MADS produces relocatable files. I sent him a PM three weeks ago regarding the single block limitation, etc. Presumably he's too busy.


MyIDE and others are good and fast, so page swapping MAY be practical. Any thoughts?

 

I wouldn't want to swap out whole applications since this would require storage access in the interrupt context (although it would be just fine for simple task switching), but there's no reason that indirectly accessed extended RAM can't be paged out to disk. RAM allocated from the extended pool is never directly addressed by applications anyway, so if it's not around when the application requests a piece of it, we can just pull it in from disk.


Crikey... nice surprise. Was just going to set up a second Pokey interrupt to test scheduling when I discovered I'd already set one up (at 50Hz) with quite a long dummy delay in it. Turning this off speeded up rendering yet further, and the fact is the scheduler won't usually be doing that much work when most processes are in a "not ready" state.
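
For anyone curious, the general shape of a POKEY timer tick is roughly this (a sketch rather than the actual scheduler code; it assumes the stock OS IRQ dispatcher, which pushes A before jumping through VTIMR1 and expects PLA/RTI at the end, and Ticks is just an invented low-RAM counter):

POKMSK	equ $10
VTIMR1	equ $0210
AUDF1	equ $d200
AUDC1	equ $d201
AUDCTL	equ $d208
STIMER	equ $d209
IRQEN	equ $d20e

	.local SetTick
	sei
	mwa #TickIRQ VTIMR1
	lda #$01
	sta AUDCTL ; 15 kHz base clock
	lda #$ff
	sta AUDF1 ; divisor 256: roughly 60 Hz (an exact 50 Hz needs channels 1+2 joined)
	lda #$00
	sta AUDC1 ; volume 0 - silent timer
	lda POKMSK
	ora #$01 ; enable the timer 1 IRQ
	sta POKMSK
	sta IRQEN
	sta STIMER ; restart the counters
	cli
	rts
TickIRQ
	inc Ticks ; scheduler work would go here
	pla
	rti
	.endl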


Hi Jon,

 

I don't know if this is any help, but back in the day, I remember having a little proggie that would context switch with a key combination. I forget the name of it, and the key combo, but I do remember it poked GTIA in the process of the switch, so I could hear a short buzz while it switched. I found it quite useful. Switch between a WP and DaisyDot, for example. Hopefully, you or someone else knows the name of that prog. It almost works like Windows Alt-Tab.

 

Hope this may help you in some way.

 

-K


Hopefully progress is smooth for the ROM conversion process. I know I've hit a rough spot when dishes/lawn/garage are welcome breaks from doing real work, so your perseverance in this project is greatly appreciated.

 

Shouldn't take as long as the initial estimate, and it's going OK so far. I've actually been subject to yet another distraction in the form of the multitasking kernel I've started writing. It'll actually be useful to get this working at this stage: it'll remove the need to code up any cart jump tables because the kernel facilitates inter-process messaging using the 6502 BRK instruction. I've got the scheduler interrupt running in the test build already and the core kernel's about half done, pending testing and debugging. The trickiest bit is probably the message queue, which is a linked list to allow only those messages dispatched to a particular addressee to be pulled out of it. This approach appears to follow the SymbOS model but I'm waiting for Prodatron to confirm this. In any case, the kernel is one of the more fascinating parts of the project.
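
To give an idea of what the message queue walk looks like, here's a sketch (emphatically not the kernel's actual code): it assumes a node layout of next pointer at offsets 0/1 and addressee PID at offset 2, with MsgHead holding the list head and Ptr/Prev being zero-page words - all names invented for the example.

	.local FindMsg ; entry: A = PID we want a message for
	sta Target
	mwa MsgHead Ptr
	mwa #0 Prev
Loop
	lda Ptr
	ora Ptr+1
	beq NotFound ; ran off the end of the list
	ldy #2
	lda (Ptr),y ; this node's addressee
	cmp Target
	beq Found
	mwa Ptr Prev ; remember the predecessor
	ldy #0
	lda (Ptr),y
	tax ; next pointer lo
	iny
	lda (Ptr),y ; next pointer hi
	sta Ptr+1
	stx Ptr
	jmp Loop
Found
	ldy #0 ; unlink the node, leaving Ptr pointing at it
	lda (Ptr),y
	tax ; detached node's next lo
	iny
	lda (Ptr),y ; detached node's next hi
	pha
	lda Prev
	ora Prev+1
	beq FixHead
	pla ; the predecessor takes over the next pointer
	ldy #1
	sta (Prev),y
	dey
	txa
	sta (Prev),y
	sec ; C=1: message found
	rts
FixHead
	pla ; the node was the head, so MsgHead takes the next pointer
	sta MsgHead+1
	stx MsgHead
	sec ; C=1: message found
	rts
NotFound
	clc ; C=0: nothing queued for this process
	rts
	.endl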

 


Yes Kyle, I know of it, and it may even have been mentioned earlier in this thread. Thanks though. I imagine the context switching overheads in that utility were pretty large, since it would have to handle all the hardware registers, the entirety of page zero, the whole stack, all of main RAM, etc. Fortunately the GUI only has to swap out a tiny bit of page zero and any section of the segmented stack which is currently cached.
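
For the sake of illustration, the page-zero part of that swap is nothing more than a short copy in and out of the process's save area (ZpBase, ZpSize and SaveArea are placeholder names, and the loop assumes a working set of 128 bytes or fewer):

	.local SaveZP ; copy the GUI's zero-page working set out
	ldx #ZpSize-1
Save
	lda ZpBase,x
	sta SaveArea,x
	dex
	bpl Save
	rts
RestoreZP ; and the mirror image on the way back in
	ldx #ZpSize-1
Restore
	lda SaveArea,x
	sta ZpBase,x
	dex
	bpl Restore
	rts
	.endl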

 

The thorny issue of binary relocation format still weighs on my mind...

Edited by flashjazzcat


Tom Hunt's Snapshot? I think that was the one; the first version, as I recall, kept each "machine" in RAM; a later version permitted you to swap out to an HD.



Thanks! That's the one I was thinking about, and I didn't know about the newer HD one. Now, I must go find it :)

 

-Kyle

 

 

P.S. Found it here http://atariage.com/forums/index.php?app=core&module=attach&section=attach&attach_id=58998

Edited by Kyle22

I noticed you were writing a kernel for your GUI. Here is some information for SOS (Sophisticated Operating System) for the Apple ///

 

http://en.wikipedia.org/wiki/Apple_SOS

 

And conveniently, a source listing for said OS

 

http://www.brutaldeluxe.fr/documentation/a3/apple3_SRC_SOS_DTC.pdf

 

What caught my eye is the simplicity of how the OS communicated with other devices, namely character and block devices.



Interesting - thanks. Here's another pertinent document: http://www.1000bit.it/support/manuali/apple/a3sosrm.pdf.

 

It's a single-tasking OS, but the sections on banked memory management and device drivers are especially relevant.


BTW: after a couple of restless nights worrying about redraw performance, I finally made a difficult but overdue decision and am abandoning my beloved RLE compressed window masks in favour of rectangle lists. The decision was made a little easier thanks to this article on the topic. SymbOS (like GEM) uses rectangle lists, but I had resisted change primarily because I considered the window masks such an elegant and novel solution. Another factor was that I simply couldn't get my head around rectangle lists and was going all out for simplicity as far as the client redraws were concerned. Now, clipping masks are wonderfully versatile but unfortunately they work ex post in the sense that by the time you extract the mask information and realize a particular byte is obscured, you're already in the middle of rendering the object. I think the classic Mac's QuickDraw used clipping masks (i.e. Regions) pre factum, since the 68000 is fast enough to AND an object's extent against a region prior to rendering. Very nice for rounded corner windows, etc, but I had to balance this against the fact that using the window masks, simply moving a window by a few pixels caused a calamitous amount of redrawing.

 

What we really want to happen when we - for example - move a window, is to create an update region and see if any of the background windows' rectangles intersect with it. About the only drawback with rectangle lists is that you have to call a render on everything in the window for each rectangle in the list, but coordinate clipping is much faster than actually rendering stuff, and anything outside of the update region is just discarded.
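
The overlap test itself is cheap - something like this (illustrative only, using the same Left/Right word and Top/Bottom byte naming as the split routine below, with all edges inclusive):

	.local RectOverlap
	cpw Right1 Left2 ; rect1 entirely to the left of rect2?
	bcc Outside
	cpw Right2 Left1 ; rect2 entirely to the left of rect1?
	bcc Outside
	lda Bottom1 ; rect1 entirely above rect2?
	cmp Top2
	bcc Outside
	lda Bottom2 ; rect2 entirely above rect1?
	cmp Top1
	bcc Outside
	sec ; C=1: the rectangles intersect
	rts
Outside
	clc ; C=0: no overlap
	rts
	.endl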

 

Anyway - all the methodology suddenly becomes wonderfully clear. :) Every time a window is opened, moved, resized or closed, the rectangle lists for every window (the desktop being window zero) are rebuilt. It's been quite interesting designing the code for all this. Here's my initial interpretation of the rectangle split algorithm described in the article above:

	.local RectSplit
	cpw Left2 Left1 ; split on left hand side?
	bcs NoLeftSplit
	mwa Left2 Left3
	sbw Left1 #1 Right3
	lda Top1 ; top3 = max (top1, top2)
	cmp Top2
	bcs @+
	lda Top2
@
	sta Top3
	lda Bottom1 ; bottom3 = min (bottom1, bottom2)
	cmp Bottom2
	bcc @+
	lda Bottom2
@
	sta Bottom3
	jsr AddWindowRect ; add Rect3 to the head of the rectangle list

NoLeftSplit
	cpw Right1 Right2
	bcs NoRightSplit
	adw Right1 #1 Left3
	mwa Right2 Right3
	lda Top1 ; top3 = max (top1, top2)
	cmp Top2
	bcs @+
	lda Top2
@
	sta Top3
	lda Bottom1 ; bottom3 = min (bottom1, bottom2)
	cmp Bottom2
	bcc @+
	lda Bottom2
@
	sta Bottom3
	jsr AddWindowRect

NoRightSplit
	lda Top2 ; split at the top?
	cmp Top1
	bcs NoTopSplit
	sta Top3 ; top3 = top2
	sbb Top1 #1 Bottom3 ; bottom3 = top1 - 1
	cpw Left1 Left2 ; left3 = max (left1, left2)
	bcc Less1
	mwa Left1 Left3
	jmp @+
Less1
	mwa Left2 Left3
@
	jsr AddWindowRect

NoTopSplit
	lda Bottom1 ; split at the bottom?
	cmp Bottom2
	bcs NoBottomSplit
	adc #1 ; top3 = bottom1 + 1 (carry is clear here)
	sta Top3
	mva Bottom2 Bottom3
	cpw Left1 Left2 ; left3 = max (left1, left2)
	bcc Less2
	mwa Left1 Left3
	jmp @+
Less2
	mwa Left2 Left3
@
	jsr AddWindowRect

NoBottomSplit
	rts
	.endl

AddWindowRect adds the newly created rectangle to the head of the current window's rect list. By placing the new rect at the head of the list, we can simultaneously iterate through the existing rects from the original list head without visiting the new additions. So: we start with the desktop, and place a 320x200 rectangle at the start of its rect list. Then we split that rect with the first window, and then split that window and all the resulting desktop rects with the next window and so on.
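
For what it's worth, the head insertion itself is only a handful of instructions (a sketch: RectHead is the window's list head and NewRect a zero-page pointer to the freshly built rect, with the next pointer at offsets 0/1 of each node - names invented here):

	.local AddHead
	ldy #0
	lda RectHead ; new node's next pointer = old head
	sta (NewRect),y
	iny
	lda RectHead+1
	sta (NewRect),y
	mwa NewRect RectHead ; the list head now points at the new rect
	rts
	.endl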

Stuff like background rendering can then be confined to the rectangle bounds, while larger objects which can't easily be "downsized" to fit the rects will be nicely clipped by the clipping routines. This all seems remarkably obvious now, but it's another fairly substantial change and thus rewrite...


So the rectangle lists will be faster? I thought the masking method was already speedy for a 6502!


Removing the requirement to load up the window masks for each rendered scanline and then mask the bytes through them has certainly sped things up already. Therefore anything which actually does get drawn will be drawn faster. What remains to be seen (and I'm keeping a backed-up version using the masks) is how long all the rectangle clipping takes, with potentially multiple passes through the renderer to exhaust all the rectangles in the list. On paper it should be quite a bit faster (especially with regard to the slight delay in responsiveness in the current version after moving a front window, when nothing is apparently being drawn but in fact stuff is being drawn through a mask).


Hey, I saw something in that article that might apply to your GUI if you're still snapping windows to byte boundaries.

 

"Another DRS algorithm, which is much simpler, is to divide the playing area into blocks, and mark each block that is dirty. When it is time to update the screen, blit all dirty blocks. By carefully chosing the size of blocks, an optimal speed may be achieved."

 

Also, I found the Apple III SOS Driver Writer's Reference. I don't know if it's of any use, but it's interesting that all the drivers are stored in a single file, which has its own configuration menu.

 

Apple III SOS Device Driver Writer's Guide.pdf - Asimov.net

 

Also, here's the Standard Device Drivers Manual for completeness.

 

Apple III Standard Device Drivers Manual - 1000 BiT


This raises the point that it might now be easier and more expedient to keep the window borders inside the byte boundaries, instead of outside as they currently are. This way, the split rectangles would always be whole byte extents. This would certainly save a bit of time when updating chunks of the desktop or clearing out sections of the display - not to mention making the window rect calculations 8-bit.
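
As a rough sketch of what that buys us: once everything sits on byte boundaries, a word-sized pixel X coordinate collapses to a single byte column index (XPos and Column are placeholder names; 0-319 maps to columns 0-39):

	.local PixelToColumn
	lda XPos ; low byte of the pixel X coordinate
	lsr @
	lsr @
	lsr @ ; low byte / 8 gives 0-31
	ldy XPos+1
	beq Store
	clc
	adc #32 ; X >= 256, so add 256/8
Store
	sta Column ; byte column 0-39: everything from here on is 8-bit
	rts
	.endl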

 


Good stuff - thanks. The material on driver structure is interesting, since this is something I'll need to design further down the line.

Edited by flashjazzcat
