flashjazzcat (Author), posted October 17, 2013

Started conversion to ROM today, and this is probably going to be the most critical part of the project (i.e. the part when I'll keep wanting to give up because it's such a bloody nightmare). Some stats: there's about 24KB of code at the moment (not including fonts, icons, data, tables, etc: the XEX is currently 39KB with wallpaper and everything else embedded inline), and by far the largest "module" (which will fit nicely into an 8KB ROM bank) is the graphics segment - i.e. everything which writes directly to the screen. The next biggest module is the window manager (which is 5KB but will get its own bank), and then the code size drops off pretty sharply after that.

I figured the best way to start this gargantuan task was to place a ".LOCAL" wrapper around the entire graphics source file, so that instead of JSR ROUTINE (in any other bank), JSR GFX.ROUTINE would be required. When editing the calls, I'm also changing JSR to LJSR (the latter being the inter-bank JSR macro call), so we now get LJSR GFX.ROUTINE. That's nicely self-documenting, inasmuch as it clearly shows what's an external call and what's not. This threw out 117 assembly errors which I had to pick through (most of which were calls which needed amending as described above, obviously), and I was reminded of one thing I'd forgotten: that any kind of jump table will no longer work unless the target routines are in the same bank as the table, so they'll have to be redesigned too.

Of course I'm also moving all the variable space down into low RAM, which will obviously need initialising by the cartridge code, and encountering (as I go) some poor decisions regarding local variable space tacked onto the end of LOCAL ranges (which obviously won't work in ROM); these have to be renamed and moved. There's much scope for bug-creation here.

So - rinse and repeat on another dozen source files (albeit smaller ones), until the thing works with the LOCAL wrappers in place, at least. Then switch the assembler to BIN mode, add padding to the banks, relocate the (test) application to low RAM from the init bank of the cart, make sure the interrupt handlers and everything else which needs to be is out of the way of banked ROM... and no doubt watch the whole thing fail to work for a number of weeks. Then - when stuff's just about working - I'll have to look out for any performance hits caused by slow inter-bank calls in critical areas, and move the offending code to a different bank.

I'm pretty sure it'll be snowing while this goes on... at least if we have a long, arduous winter. And when that's all done, we can pick up where we left off adding to the functionality, designing the API, etc, etc.
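To make the inter-bank call idea concrete, here is a minimal Python model of a banked cartridge. It is only a sketch, not the project's MADS code: the class `BankedRom`, its methods and the routine names `PLOT` and `OPEN` are all invented for illustration, and a real LJSR macro would go through a trampoline in unbanked memory rather than a dictionary lookup.

```python
class BankedRom:
    """Toy model of a cartridge where only one ROM bank is visible at a time."""

    def __init__(self):
        self.banks = {}      # bank name -> {routine name -> callable}
        self.current = None  # bank currently mapped into the cartridge window
        self.switches = 0    # number of bank switches performed so far

    def add_routine(self, bank, name, func):
        self.banks.setdefault(bank, {})[name] = func

    def select(self, bank):
        # Switching costs time on real hardware, so count every actual change.
        if bank != self.current:
            self.current = bank
            self.switches += 1

    def jsr(self, name, *args):
        """Plain JSR: only reaches routines in the bank that is mapped in."""
        routines = self.banks.get(self.current, {})
        if name not in routines:
            raise LookupError(f"{name} is not in bank {self.current}")
        return routines[name](*args)

    def ljsr(self, bank, name, *args):
        """Inter-bank call: switch bank, call, then restore the caller's bank."""
        caller_bank = self.current
        self.select(bank)
        try:
            return self.jsr(name, *args)
        finally:
            self.select(caller_bank)


rom = BankedRom()
rom.add_routine("GFX", "PLOT", lambda x, y: (x, y))
# A window-manager routine that needs a graphics routine in another bank.
rom.add_routine("WM", "OPEN", lambda: rom.ljsr("GFX", "PLOT", 1, 2))
rom.select("WM")
result = rom.jsr("OPEN")
```

The `switches` counter hints at why calls in critical loops are worth keeping inside one bank, and the `LookupError` in `jsr` mirrors the jump-table problem: a table of plain addresses only works if the targets live in the bank that is mapped in.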
danwinslow, posted October 17, 2013

Sounds fun! Do you plan to leave the system graphical elements (wallpaper, icons, fonts, etc.) embedded, or will they be loaded from a resource file?
TheNameOfTheGame, posted October 17, 2013

Wooooot! This has been a long time coming. I know this will be a long process, but every journey begins with the first step.
flashjazzcat (Author), posted October 17, 2013 (edited October 17, 2013)

danwinslow wrote: "Sounds fun! Do you plan to leave the system graphical elements (wallpaper, icons, fonts, etc.) embedded, or will they be loaded from a resource file?"

Oh no - nothing will remain embedded in the ROM apart from a few small tables. Everything will be pulled in from external resources, which will mean you can change the system font, assign your own icons (from a very sizeable selection), etc. I might even put stuff like dithered scrollbar patterns and closer/fuller buttons in resources as well, so the system can be skinnable to a certain extent. The shell will also be an application, so if someone wants to write a "Symcommander" to use instead of the default file manager, that'll be possible too without changing the ROM.

I'll very shortly have to face the decision about which of MADS' two relocatable binary formats to use, too: SDX or proprietary. Proprietary would be better (since it allows lo/hi byte relocation, and has a slightly simpler layout), but it doesn't currently support multiple RELOC blocks in the same file, which I'll need (for drivers which - for example - install part of themselves in low conventional RAM and another part in extended memory), nor different RELOC segment types (for example - MAIN and EXT). The SDX format supports both of these, but not lo/hi byte relocation (which can be easily coded around, of course). The MADS format has a few niceties like long external symbol names, but I'll have to go with whatever's best when the time comes.
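For readers unfamiliar with why lo/hi byte relocation is a distinct feature, the following Python sketch shows the arithmetic involved. It is a generic model and does not reproduce the actual MADS or SDX file layouts; the function `relocate` and its fixup lists are hypothetical. The key point is that a lone high byte (as in `LDX #>ADDR`) cannot be adjusted correctly unless the loader also knows the original low byte, because the addition may carry.

```python
def relocate(image, delta, word_fixups=(), lo_fixups=(), hi_fixups=()):
    """Return a copy of image moved by delta bytes.

    word_fixups: offsets of 16-bit little-endian addresses
    lo_fixups:   offsets of single bytes holding the low byte of an address
    hi_fixups:   (offset, original_low_byte) pairs for single high bytes
    """
    out = bytearray(image)
    for offset in word_fixups:
        addr = out[offset] | (out[offset + 1] << 8)
        addr = (addr + delta) & 0xFFFF
        out[offset] = addr & 0xFF
        out[offset + 1] = addr >> 8
    for offset in lo_fixups:
        out[offset] = (out[offset] + delta) & 0xFF
    for offset, original_lo in hi_fixups:
        # Rebuild the full address so a carry out of the low byte is not lost.
        addr = ((out[offset] << 8) | original_lo) + delta
        out[offset] = (addr >> 8) & 0xFF
    return bytes(out)


# Code assembled at $2000 that refers to $20F0, moved to $3120 (delta $1120):
#   LDA #<$20F0 / LDX #>$20F0 / JMP $20F0
image = bytes([0xA9, 0xF0, 0xA2, 0x20, 0x4C, 0xF0, 0x20])
moved = relocate(image, 0x1120,
                 word_fixups=[5], lo_fixups=[1], hi_fixups=[(3, 0xF0)])
```

Here the target moves from $20F0 to $3210, so the high byte must become $32; simply adding the high byte of the delta ($11) to $20 would give $31, which is the kind of error a format without lo/hi fixups forces the programmer to code around.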
Xuel, posted October 17, 2013

I'm clearly prematurely optimizing, but perhaps you could have common routines repeated in multiple banks to avoid the LJSR overhead. It would be great if there were an optimizing assembler that could do this sort of thing automatically. Other obvious optimizations could include inlining function calls, dead code removal, peephole optimization, etc. I could also imagine a tool that took a trace of the program and analyzed the function call sequence to automatically partition the code into tightly coupled pieces the size of one ROM bank. Anyhow, correctness should come first and then you can worry about optimization.
flashjazzcat (Author), posted October 17, 2013

Xuel wrote: "I'm clearly prematurely optimizing, but perhaps you could have common routines repeated in multiple banks to avoid the LJSR overhead. [...] I could also imagine a tool that took a trace of the program and analyzed the function call sequence to automatically partition the code into tightly coupled pieces the size of one ROM bank."

Not premature at all: I already found myself copying and pasting common setup routines which need to be accessed in different banks. This seems to me a sensible way to proceed, especially when we have 40KB of space still to fill in the smallest target platform (U1MB). Sometimes a big code-inventorying phase like this is a really good time to take stock of the positioning of various routines. Heh - if anyone writes a tool like the one you describe, I'm using it.
fibrewire, posted October 18, 2013 (edited October 18, 2013)

Xuel wrote: "I could also imagine a tool that took a trace of the program and analyzed the function call sequence to automatically partition the code into tightly coupled pieces the size of one ROM bank."

Like some kind of virtual memory manager? <- I'm sorry, I couldn't resist. It would be awesome if there were some way for future program writers to avoid this hurdle with your GUI, by assigning an address range for the user's program. Also, if you choose the SDX method, could someone like drac030 add the functions of the proprietary one?
ilmenit, posted October 18, 2013

flashjazzcat wrote: "And when that's all done, we can pick up where we left off adding to the functionality, designing the API, etc, etc."

We want API! We want API!
flashjazzcat (Author), posted October 18, 2013

fibrewire wrote: "Also, if you choose the SDX method, could someone like drac030 add the functions of the proprietary one?"

Tebe is responsible for any enhancements to the way MADS produces relocatable files. I sent him a PM three weeks ago regarding the single block limitation, etc. Presumably he's too busy.
Kyle22, posted October 19, 2013

fibrewire wrote: "Like some kind of virtual memory manager? ..."

MyIDE and others are good and fast, page swapping MAY be practical. Any thoughts?
flashjazzcat (Author), posted October 19, 2013

Kyle22 wrote: "MyIDE and others are good and fast, page swapping MAY be practical. Any thoughts?"

I wouldn't want to swap out whole applications since this would require storage access in the interrupt context (although it would be just fine for simple task switching), but there's no reason that indirectly accessed extended RAM can't be paged out to disk. RAM allocated from the extended pool is never directly addressed by applications anyway, so if it's not around when the application requests a piece of it, we can just pull it in from disk.
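The reason indirection makes paging feasible can be shown with a small Python sketch. Everything here is assumed for illustration: the class `PagedPool`, the handle scheme, the least-recently-used eviction policy and the dictionary standing in for a swap file are not taken from the project. Applications only ever hold a handle, so a block can be moved to backing store and fetched again on the next access without the application noticing.

```python
class PagedPool:
    """Handle-based block pool that keeps at most resident_limit blocks in RAM."""

    def __init__(self, resident_limit, block_size=256):
        self.resident_limit = resident_limit
        self.block_size = block_size
        self.resident = {}   # handle -> bytearray, oldest access first
        self.swapped = {}    # handle -> bytes, stands in for a swap file
        self.next_handle = 0
        self.page_ins = 0    # how many times a block was pulled back in

    def _make_room(self):
        # Evict least-recently-used blocks until a slot is free.
        while len(self.resident) >= self.resident_limit:
            victim = next(iter(self.resident))
            self.swapped[victim] = bytes(self.resident.pop(victim))

    def alloc(self):
        handle = self.next_handle
        self.next_handle += 1
        self._make_room()
        self.resident[handle] = bytearray(self.block_size)
        return handle

    def _touch(self, handle):
        if handle in self.resident:
            block = self.resident.pop(handle)
        else:
            self._make_room()
            block = bytearray(self.swapped.pop(handle))
            self.page_ins += 1
        self.resident[handle] = block  # re-insert as most recently used
        return block

    def read(self, handle, offset):
        return self._touch(handle)[offset]

    def write(self, handle, offset, value):
        self._touch(handle)[offset] = value & 0xFF


pool = PagedPool(resident_limit=2)
a = pool.alloc()
b = pool.alloc()
pool.write(a, 0, 0x11)
pool.write(b, 0, 0x22)
c = pool.alloc()          # forces the least recently used block (a) out
```

Reading `a` afterwards triggers a page-in, which is exactly the "pull it in from disk" step described above; on real hardware that step is slow, which is why it would need to stay out of the interrupt context.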
flashjazzcat (Author), posted October 20, 2013

Crikey... nice surprise. Was just going to set up a second Pokey interrupt to test scheduling when I discovered I'd already set one up (at 50Hz) with quite a long dummy delay in it. Turning this off speeded up rendering yet further, and the fact is the scheduler won't usually be doing that much work when most processes are in a "not ready" state.
fibrewire, posted October 26, 2013

Hopefully progress is smooth for the ROM conversion process. I know I've hit a rough spot when dishes/lawn/garage are welcome breaks from doing real work, so your perseverance in this project is greatly appreciated.
Kyle22, posted October 26, 2013

Hi Jon, I don't know if this is any help, but back in the day, I remember having a little proggie that would context switch with a key combination. I forget the name of it, and the key combo, but I do remember it poked GTIA in the process of the switch, so I could hear a short buzz while it switched. I found it quite useful. Switch between a WP and DaisyDot, for example. Hopefully, you or someone else knows the name of that prog. It almost works like Windows Alt-Tab. Hope this may help you in some way.

-K
flashjazzcat (Author), posted October 26, 2013 (edited October 26, 2013)

fibrewire wrote: "Hopefully progress is smooth for the ROM conversion process. [...]"

Shouldn't take as long as the initial estimate, and it's going OK so far. I've actually been subject to yet another distraction in the form of the multitasking kernel I've started writing. It'll actually be useful to get this working at this stage: it'll remove the need to code up any cart jump tables because the kernel facilitates inter-process messaging using the 6502 BRK instruction. I've got the scheduler interrupt running in the test build already and the core kernel's about half done, pending testing and debugging. The trickiest bit is probably the message queue, which is a linked list to allow only those messages dispatched to a particular addressee to be pulled out of it. This approach appears to follow the SymbOS model but I'm waiting for Prodatron to confirm this. In any case, the kernel is one of the more fascinating parts of the project.

Kyle22 wrote: "I remember having a little proggie that would context switch with a key combination. [...]"

Yes Kyle, I know of it, and it may even have been mentioned earlier in this thread. Thanks though. I imagine the context switching overheads in that utility were pretty large, since it would have to handle all the hardware registers, the entirety of page zero, the whole stack, all of main RAM, etc. Fortunately the GUI only has to swap out a tiny bit of page zero and any section of the segmented stack which is currently cached.

The thorny issue of binary relocation format still weighs on my mind...
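The message queue described above, a linked list from which only messages for one addressee are pulled, might be modelled like this in Python. This is a sketch under assumptions, not the kernel's actual data structure: the `Message` fields, the process IDs and the method names are invented, and the real implementation would work with fixed-size nodes in 6502 memory.

```python
class Message:
    def __init__(self, sender, receiver, payload):
        self.sender = sender
        self.receiver = receiver
        self.payload = payload
        self.next = None     # link to the next queued message


class MessageQueue:
    """Single shared FIFO; receivers unlink only the messages addressed to them."""

    def __init__(self):
        self.head = None
        self.tail = None

    def send(self, sender, receiver, payload):
        msg = Message(sender, receiver, payload)
        if self.tail is None:
            self.head = msg
        else:
            self.tail.next = msg
        self.tail = msg

    def receive(self, receiver):
        """Remove and return the oldest message for receiver, or None."""
        prev, node = None, self.head
        while node is not None:
            if node.receiver == receiver:
                # Unlink the node without disturbing the order of the others.
                if prev is None:
                    self.head = node.next
                else:
                    prev.next = node.next
                if node is self.tail:
                    self.tail = prev
                node.next = None
                return node
            prev, node = node, node.next
        return None


queue = MessageQueue()
queue.send(1, 2, "redraw")
queue.send(1, 3, "key")
queue.send(1, 2, "close")
```

A call such as `queue.receive(3)` skips past the two messages for process 2 and returns the "key" message, leaving the others queued in their original order. The fiddly part, as the post suggests, is keeping `head` and `tail` consistent when a node is removed from the middle or the end.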
+David_P, posted October 27, 2013

flashjazzcat wrote: "Yes Kyle, I know of it, and it may even have been mentioned earlier in this thread. [...]"

Tom Hunt's Snapshot? I think that was the one; first version, as I recall, kept each "machine" in RAM; later version permitted you to swap out to a HD.
Kyle22, posted October 27, 2013 (edited October 27, 2013)

+David_P wrote: "Tom Hunt's Snapshot? I think that was the one; first version, as I recall, kept each "machine" in RAM; later version permitted you to swap out to a HD."

Thanks! That's the one I was thinking about, and I didn't know about the newer HD one. Now, I must go find it.

-Kyle

P.S. Found it here: http://atariage.com/forums/index.php?app=core&module=attach&section=attach&attach_id=58998
fibrewire, posted November 1, 2013

I noticed you were writing a kernel for your GUI. Here is some information for SOS (Sophisticated Operating System) for the Apple ///: http://en.wikipedia.org/wiki/Apple_SOS

And conveniently, a source listing for said OS: http://www.brutaldeluxe.fr/documentation/a3/apple3_SRC_SOS_DTC.pdf

What caught my eye is the simplicity of how the OS communicated with other devices, namely character and block devices.
flashjazzcat (Author), posted November 1, 2013

fibrewire wrote: "I noticed you were writing a kernel for your GUI. Here is some information for SOS (Sophisticated Operating System) for the Apple ///"

Interesting - thanks. Here's another pertinent document: http://www.1000bit.it/support/manuali/apple/a3sosrm.pdf. It's a single-tasking OS, but the sections on banked memory management and device drivers are especially relevant.
flashjazzcat (Author), posted November 1, 2013

BTW: after a couple of restless nights worrying about redraw performance, I finally made a difficult but overdue decision and am abandoning my beloved RLE compressed window masks in favour of rectangle lists. The decision was made a little easier thanks to this article on the topic. SymbOS (like GEM) uses rectangle lists, but I had resisted change primarily because I considered the window masks such an elegant and novel solution. Another factor was that I simply couldn't get my head around rectangle lists and was going all out for simplicity as far as the client redraws were concerned.

Now, clipping masks are wonderfully versatile but unfortunately they work ex post in the sense that by the time you extract the mask information and realize a particular byte is obscured, you're already in the middle of rendering the object. I think the classic Mac's QuickDraw used clipping masks (i.e. Regions) pre factum, since the 68000 is fast enough to AND an object's extent against a region prior to rendering. Very nice for rounded corner windows, etc, but I had to balance this against the fact that using the window masks, simply moving a window by a few pixels caused a calamitous amount of redrawing. What we really want to happen when we - for example - move a window, is to create an update region and see if any of the background windows' rectangles intersect with it. About the only drawback with rectangle lists is that you have to call a render on everything in the window for each rectangle in the list, but coordinate clipping is much faster than actually rendering stuff, and anything outside of the update region is just discarded.

Anyway - all the methodology suddenly becomes wonderfully clear. Every time a window is opened, moved, resized or closed, the rectangle lists for every window (the desktop being window zero) are rebuilt. It's been quite interesting designing the code for all this.

Here's my initial interpretation of the rectangle split algorithm described in the article above:

    .local RectSplit
            cpw Left2 Left1         ; split on left hand side?
            bcs NoLeftSplit
            mwa Left2 Left3
            sbw Left1 #1 Right3
            lda Top1                ; top3 = max (top1, top2)
            cmp Top2
            bcs @+
            lda Top2
    @       sta Top3
            lda Bottom1             ; bottom3 = min (bottom1, bottom2)
            cmp Bottom2
            bcc @+
            lda Bottom2
    @       sta Bottom3
            jsr AddWindowRect       ; add Rect3 to the head of the rectangle list
    NoLeftSplit
            cpw Right1 Right2       ; split on right hand side?
            bcs NoRightSplit
            adw Right1 #1 Left3
            mwa Right2 Right3
            lda Top1                ; top3 = max (top1, top2)
            cmp Top2
            bcs @+
            lda Top2
    @       sta Top3
            lda Bottom1             ; bottom3 = min (bottom1, bottom2)
            cmp Bottom2
            bcc @+
            lda Bottom2
    @       sta Bottom3
            jsr AddWindowRect
    NoRightSplit
            lda Top2                ; split on top?
            cmp Top1
            bcs NoTopSplit
            sta Top3
            sbb Top1 #1 Bottom3
            cpw Left1 Left2         ; left3 = max (left1, left2)
            bcc Less1
            mwa Left1 Left3
            jmp @+
    Less1   mwa Left2 Left3
    @       jsr AddWindowRect
    NoTopSplit
            lda Bottom1             ; split on bottom?
            cmp Bottom2
            bcs NoBottomSplit
            adc #1                  ; carry is clear here, so this adds exactly 1
            sta Top3
            mva Bottom2 Bottom3
            cpw Left1 Left2         ; left3 = max (left1, left2)
            bcc Less2
            mwa Left1 Left3
            jmp @+
    Less2   mwa Left2 Left3
    @       jsr AddWindowRect
    NoBottomSplit
            rts
    .endl

AddWindowRect adds the newly created rectangle to the head of the current window's rect list. By placing the new rect at the head of the list, we can simultaneously iterate through the existing rects from the original list head without visiting the new additions. So: we start with the desktop, and place a 320x200 rectangle at the start of its rect list. Then we split that rect with the first window, and then split that window and all the resulting desktop rects with the next window and so on. Stuff like background rendering can then be confined to the rectangle bounds, while larger objects which can't easily be "downsized" to fit the rects will be nicely clipped by the clipping routines.

This all seems remarkably obvious now, but it's another fairly substantial change and thus rewrite...
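For comparison, the general rectangle-split technique can be sketched in a few lines of Python. This is not a translation of the MADS routine above and makes its own assumptions: coordinates are inclusive, the names `Rect`, `split_rect` and `area` are invented, the left and right strips are clipped to the vertical overlap, and the top and bottom strips span the full width of the rectangle being split.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int    # inclusive
    bottom: int   # inclusive


def area(r: Rect) -> int:
    return (r.right - r.left + 1) * (r.bottom - r.top + 1)


def intersects(a: Rect, b: Rect) -> bool:
    return (a.left <= b.right and b.left <= a.right and
            a.top <= b.bottom and b.top <= a.bottom)


def split_rect(visible: Rect, window: Rect) -> List[Rect]:
    """Return the parts of visible that are not covered by window."""
    if not intersects(visible, window):
        return [visible]
    pieces = []
    # Vertical extent shared by both rectangles, used for the side strips.
    top = max(visible.top, window.top)
    bottom = min(visible.bottom, window.bottom)
    if visible.left < window.left:       # strip to the left of the window
        pieces.append(Rect(visible.left, top, window.left - 1, bottom))
    if visible.right > window.right:     # strip to the right of the window
        pieces.append(Rect(window.right + 1, top, visible.right, bottom))
    if visible.top < window.top:         # full-width strip above the window
        pieces.append(Rect(visible.left, visible.top,
                           visible.right, window.top - 1))
    if visible.bottom > window.bottom:   # full-width strip below the window
        pieces.append(Rect(visible.left, window.bottom + 1,
                           visible.right, visible.bottom))
    return pieces


desktop = Rect(0, 0, 319, 199)
window = Rect(100, 50, 199, 149)
pieces = split_rect(desktop, window)
```

Splitting the 320x200 desktop with a single window yields four non-overlapping rectangles whose combined area equals the desktop minus the window. Repeating the split for each further window against every rectangle already in the list gives the per-window visible lists the post describes.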
TheNameOfTheGame, posted November 1, 2013

So the rectangle lists will be faster? I thought the masking method was already speedy for a 6502!
flashjazzcat (Author), posted November 1, 2013

TheNameOfTheGame wrote: "So the rectangle lists will be faster? I thought the masking method was already speedy for a 6502!"

Removing the requirement to load up the window masks for each rendered scanline and then mask the bytes through them has certainly sped things up already. Therefore anything which actually does get drawn will be drawn faster. What remains to be seen (and I'm keeping a backed-up version using the masks) is how long all the rectangle clipping takes, with potentially multiple passes through the renderer to exhaust all the rectangles in the list. On paper it should be quite a bit faster (especially with regard to the slight delay in responsiveness in the current version after moving a front window, when nothing is apparently being drawn but in fact stuff is being drawn through a mask).
fibrewire, posted November 2, 2013

Hey, I saw something in that article that might apply to your GUI if you're still snapping windows to byte boundaries:

"Another DRS algorithm, which is much simpler, is to divide the playing area into blocks, and mark each block that is dirty. When it is time to update the screen, blit all dirty blocks. By carefully chosing the size of blocks, an optimal speed may be achieved."

Also, found the Apple 3 SOS Driver Writer's Reference. I don't know if it's of any use, but it's interesting that all drivers are stored in a single file, and that it has its own configuration menu: Apple III SOS Device Driver Writer's Guide.pdf - Asimov.net

Also, here's the Standard Device Drivers Manual for completeness: Apple III Standard Device Drivers Manual - 1000 BiT
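The dirty-block scheme in the quoted passage is simple enough to sketch directly. The Python below is illustrative only: the function name `dirty_blocks`, the 8x8 block size and the 320x200 screen size are assumptions chosen to match the figures mentioned elsewhere in the thread, and rectangle coordinates are treated as inclusive.

```python
def dirty_blocks(rects, block_w=8, block_h=8, screen_w=320, screen_h=200):
    """Return the sorted (column, row) blocks touched by any changed rectangle."""
    dirty = set()
    for left, top, right, bottom in rects:
        # Clip the rectangle to the screen before working out which blocks it hits.
        left, top = max(left, 0), max(top, 0)
        right, bottom = min(right, screen_w - 1), min(bottom, screen_h - 1)
        if left > right or top > bottom:
            continue
        for row in range(top // block_h, bottom // block_h + 1):
            for col in range(left // block_w, right // block_w + 1):
                dirty.add((col, row))
    return sorted(dirty)


# A 2x2 pixel change that straddles a block corner dirties four blocks.
touched = dirty_blocks([(7, 7, 8, 8)])
```

The trade-off the article alludes to is visible here: small blocks mean more bookkeeping, while large blocks mean that a tiny change forces a large area to be redrawn.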
The Usotsuki, posted November 2, 2013

I do a lot of work with ProDOS, which is basically a stripped-down SOS that fits in 13K (12K for the OS itself, 256 bytes for its global variables, 768 bytes for its default shell).
flashjazzcat (Author), posted November 2, 2013 (edited November 2, 2013)

fibrewire wrote: "Hey, I saw something in that article that might apply to your GUI if you're still snapping windows to byte boundaries. [...]"

This raises the point that it might now be easier and more expedient to keep the window borders inside the byte boundaries, instead of outside as they currently are. This way, the split rectangles would always be whole byte extents. This would certainly save a bit of time when updating chunks of the desktop or clearing out sections of the display - not to mention making the window rect calculations 8-bit.

fibrewire wrote: "Also, found the Apple 3 SOS Driver Writer's Reference. [...] Also, here's the Standard Device Drivers Manual for completeness."

Good stuff - thanks. The stuff about driver structure is interesting, since this is something I'll need to design further down the line.
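The remark about 8-bit rect calculations follows from simple arithmetic, shown here as a short Python sketch. It assumes a 320-pixel-wide display at one bit per pixel (eight pixels per byte); that packing is an assumption on top of the 320x200 figure given earlier, and the helper names are made up for the example.

```python
PIXELS_PER_BYTE = 8    # assumption: 1 bit per pixel
SCREEN_WIDTH = 320     # pixels, from the 320x200 desktop mentioned earlier


def to_byte_columns(left_px, right_px):
    """Byte columns containing an inclusive pixel span."""
    return left_px // PIXELS_PER_BYTE, right_px // PIXELS_PER_BYTE


def to_pixels(left_col, right_col):
    """Inclusive pixel span covered by a run of whole byte columns."""
    return (left_col * PIXELS_PER_BYTE,
            right_col * PIXELS_PER_BYTE + PIXELS_PER_BYTE - 1)


# Highest byte column on the screen: small enough for a single 6502 byte.
LAST_COLUMN = (SCREEN_WIDTH - 1) // PIXELS_PER_BYTE

columns = to_byte_columns(13, 130)
span = to_pixels(*columns)
```

Pixel X coordinates run up to 319 and so need 16 bits, whereas byte columns only run from 0 to 39. Once every rectangle edge sits on a byte boundary, the horizontal comparisons in a split routine can use single-byte CMP instead of word compares.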