JohnPCAE (Author), posted December 28, 2015 (edited December 28, 2015):

It was ages ago, but I know I started with either a disassembly of something or an example program as a template. Maybe 4-tris? I'm not sure. I test this in Nostalgia, and the only way I could test it on hardware would be with my Intellicart, so it's not written to take advantage of any on-cart acceleration features.

intvnut, posted December 28, 2015:

> It was ages ago, but I know I started with either a disassembly of something or an example program as a template. Maybe 4-tris? I'm not sure. I test this in Nostalgia, and the only way I could test it on hardware would be with my Intellicart, so it's not written to take advantage of any on-cart acceleration features

Well, my tweaks don't require JLP; they only shift the RAM down to an address range that's JLP-friendly (and also avoid write aliasing with GRAM). So you don't test with your own emulator? I always wondered why IntvWin disappeared.

If you want to test with jzIntv (which does support JLP emulation), download the latest stable dev release: http://spatula-city.org/~im14u2c/intv/

JohnPCAE (Author), posted December 28, 2015:

I tried your version, and the on-screen results are different. It looks like two areas are stepping on one another: there are two regions that are both mapped to 0xC040.

JohnPCAE (Author), posted December 29, 2015:

In the meantime I redesigned how the maze works in colored-squares mode. Instead of defining all four walls of a block with a single bit, I now define each wall individually, using one word per wall. This gives me individual control over the color of each wall. The visual effect is much better. Rendering speed will be very slightly slower, at least for now.

Attachment: raycast_20151228.zip

intvnut, posted December 29, 2015:

> I tried your version, and the on-screen results are different. It looks like two areas are stepping on one another: there are two regions that are both mapped to 0xC040.

Oops. I missed deleting an ORG $C040. The second one is in error. Here's an update. I tested it and it seems the visuals are restored.

Attachment: raycast_20151227_jz.zip

intvnut, posted December 29, 2015:

Ok, so I applied the changes to the 20151228 version. (Attached.) I did test the attached version, and it appears the output is good. (Unlike that oopsie upthread.)

Here's the diff, if you're curious. It's very short.

    diff -wrbu raycast_20151228/raycast.src raycast_20151228_jz/raycast.src
    --- raycast_20151228/raycast.src 2015-12-28 23:16:40.000000000 -0600
    +++ raycast_20151228_jz/raycast.src 2015-12-29 01:57:34.000000000 -0600
    @@ -25,7 +25,7 @@
     ; 16-bit RAM from $BE00-$BFFF (512 words)
    -_CARTRAM ORG $BE00, $BE00, "+RW"
    +_CARTRAM ORG $8040, $8040, "+RW"
     CartRAM RMB 512
    @@ -6207,7 +6207,7 @@
     ; -------------------------------------------------------------
    - ORG $E000
    + ORG $A000
     ; -------------------------------------------------------------
     ; Signed fixed-point multiply (slow...using tables would be faster)
    @@ -15509,7 +15509,7 @@
     ENDP
    - ORG $9800
    + ORG $C040
     ; -------------------------------------------------------------
     ; R2 = R0 * R1, where R0 and R1 are unsigned 8-bit values
    @@ -19871,7 +19871,7 @@
     ; -------------------------------------------------------------
    - ORG $A000
    +; ORG $A000
     ; Trig tables
    @@ -22294,7 +22294,7 @@
     DCW $0000, $0000 ; $B1FE 0000 0000 [.. ]
    - ORG $B800
    +; ORG $B800
     Maze:

Attachment: raycast_20151228_jz.zip

JohnPCAE (Author), posted September 18, 2016:

Thanks, Joe! I incorporated your changes into my source. I also had an insight a week ago on how to speed up the algorithm quite a bit by using a special-case multiply in several places. Where I determine the distance to the wall, one multiplicand is always >= 1 and < 2. That lets me skip the multiply by the high byte and just add in the original value instead (x * y changes to x * y.lo + x). This is only being used in Colored Squares mode.

I've also added a little feature to the Colored Squares rendering: if you use the keypad to set the render distance to something less than the maximum, it will draw black squares wherever no wall was hit, to make it look like it's dark far away.

Attachment: raycast_20160918.zip

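The special-case multiply described above can be modeled in a few lines. This is only a sketch in Python, not the demo's CP1610 code, and it assumes an unsigned 8.8 fixed-point format (the post mentions a "high byte" but does not spell out the format); the names `fixmul` and `fixmul_one_to_two` are made up for illustration. When y is in [1, 2), its high byte is exactly 1, so x * y equals x * y.lo + x once the fractional shift is applied.

```python
# Assumption: unsigned 8.8 fixed point (8 integer bits, 8 fraction bits).
FRAC_BITS = 8
ONE = 1 << FRAC_BITS  # the value 1.0 in 8.8 format


def fixmul(x, y):
    """General fixed-point multiply: full product, then drop the extra fraction bits."""
    return (x * y) >> FRAC_BITS


def fixmul_one_to_two(x, y):
    """Multiply when 1.0 <= y < 2.0, so the high byte of y is exactly 1.

    x * (1 + y.lo) == x * y.lo + x, which skips the high-byte partial product.
    """
    assert ONE <= y < 2 * ONE, "y must be in [1.0, 2.0)"
    y_lo = y & (ONE - 1)  # fractional part of y (its low byte)
    return ((x * y_lo) >> FRAC_BITS) + x


# Usage: 3.0 * 1.5 in 8.8 format
x = 3 * ONE
y = ONE + ONE // 2
result = fixmul_one_to_two(x, y)  # same value as fixmul(x, y)
```

Under these assumptions the shortcut is exact rather than approximate, because x * 256 contributes nothing below the shift.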
JohnPCAE (Author), posted December 7, 2018:

I decided to take a fresh look at fixed-point multiplication (using the quarter-square method) and I managed to come up with a faster implementation. All of the multiplication routines were affected (basic + all special cases). Speedups vary, but for instance the kernel of the basic version now runs in less than 90% of the time the previous version took.

Attachment: raycast_20181206.zip

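For readers unfamiliar with the quarter-square method mentioned above, it replaces a multiply with two table lookups and a subtraction, using the identity a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4). The sketch below is in Python and handles unsigned 8-bit operands only; the table layout and the name `qs_mul8` are assumptions for illustration, not the layout used in raycast.src.

```python
# Table of floor(n*n / 4) for every possible sum of two 8-bit values (0..510).
QUARTER_SQUARES = [n * n // 4 for n in range(511)]


def qs_mul8(a, b):
    """Unsigned 8-bit multiply via quarter squares (two lookups, one subtract).

    (a + b) and (a - b) always have the same parity, so the two floor()
    truncations cancel and the result is exact.
    """
    assert 0 <= a <= 255 and 0 <= b <= 255
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]


product = qs_mul8(200, 123)  # equals 200 * 123
```

On a CPU without a hardware multiply, the appeal is that the per-multiply cost becomes an add, a subtract, and two indexed reads, at the price of ROM space for the table.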
CrazyChris, posted December 8, 2018:

Whoa! Runs fast!

JohnPCAE (Author), posted December 12, 2018:

One question I have for others here is whether using the EXEC could be holding it back. The demo runs within the EXEC's normal interrupt mechanism, but I don't know enough about programming the Inty to know if this is a problem or not, or even how to not use the EXEC.

DZ-Jay, posted December 12, 2018:

> One question I have for others here is whether using the EXEC could be holding it back. The demo runs within the EXEC's normal interrupt mechanism, but I don't know enough about programming the Inty to know if this is a problem or not, or even how to not use the EXEC.

Oh dear! When you say "within the EXEC's normal interrupt mechanism," do you mean that you have set it up as a "process" in the EXEC task list? If so, I believe it means it will be called at 20 Hz. It also means that hand-controller input is processed no faster than 20 Hz.

If you reeaaaaaaaaaaally want to get rid of the EXEC completely (which I strongly urge you to do), then it's rather easy. Just know that you will not be able to use anything from the EXEC after that, since you'll be breaking out of its framework. There is no turning back ...

Here's some simple, high-level information. The SDK-1600 includes three handy components that will make your life exceedingly easier:

CART.MAC - This is a macro library that allows you to automatically set up a ROM header appropriate for traditional home-brewing (i.e., skipping the EXEC completely). The library has many features for memory allocation and all that, but the most important one is the ROMSETUP directive, which will set up the ROM header in a way that will automatically call your MAIN routine directly.

TASKQ - This is a simple game-engine task scheduler and main loop. It sets up a task queue, similar to the EXEC's process table, except that it is dynamic and runs outside the ISR context, in real-time. The idea is that you set up tasks or events and they will be executed in the order they were queued.

SCANHAND - This is a general-purpose hand-controller decoder which, when included with TASKQ, will be automatically called during "idle" times.

The idea then is to set up a ROM header that points to your MAIN routine. Your MAIN routine will just set up your custom ISR (Interrupt Service Routine) and run the TASKQ scheduler, which serves as an engine main loop. You then prepare event handlers for user input that will update the state of your world, and your ISR will then update GRAM cards and STIC registers as necessary.

Let us know if you need help with this.

-dZ.

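The task-queue idea described in this post (events queue work, and a main loop outside the ISR runs it in the order it was queued) can be pictured with a small model. The Python below is a hypothetical illustration only: `TaskQueue`, `queue_task`, and `run_pending` are invented names and are not the TASKQ library's actual interface, which is written in CP1610 assembly.

```python
from collections import deque


class TaskQueue:
    """Toy FIFO task queue: handlers queue work, the main loop drains it."""

    def __init__(self, capacity=8):
        self.capacity = capacity  # hypothetical fixed size, like a small RAM buffer
        self.tasks = deque()

    def queue_task(self, func, arg=None):
        """Add a task; returns False (and drops it) if the queue is full."""
        if len(self.tasks) >= self.capacity:
            return False
        self.tasks.append((func, arg))
        return True

    def run_pending(self):
        """Run queued tasks in the order they were queued; returns how many ran."""
        ran = 0
        while self.tasks:
            func, arg = self.tasks.popleft()
            func(arg)
            ran += 1
        return ran


# Usage: an input handler queues work, the main loop executes it later.
log = []
queue = TaskQueue()
queue.queue_task(log.append, "move forward")
queue.queue_task(log.append, "turn left")
queue.run_pending()  # log is now ["move forward", "turn left"]
```

The point of the split is that the interrupt handler stays short: it only records that something should happen, and the heavier work runs in the main loop.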
artrag, posted December 12, 2018:

If the EXEC is slowing down your code now, I can barely imagine what you can do once you get rid of it... Doom on Intellivision is going to be a reality.

The engine in pixel square mode seems to be the best candidate for implementing an FPS, as you can use GRAM for sprites.

JohnPCAE (Author), posted December 13, 2018 (edited December 13, 2018):

I don't even have SDK-1600 on this laptop.

This is how the program sets itself up:

    ; Memory locations
    STIC            EQU     $0000
    STICInteraction EQU     $0018
    STICHandshake   EQU     $0020
    STICCardMode    EQU     $0021
    IntVecLo        EQU     $0100
    IntVecHi        EQU     $0101
    TaskQueueHead   EQU     $0102
    TaskQueueTail   EQU     $0103
    TaskQueue       EQU     $0104
    LeftCtrlData    EQU     $0115
    RightCtrlData   EQU     $0118
    DrawHeights     EQU     $011B
    RightController EQU     $01FE
    LeftController  EQU     $01FF
    BACKTAB         EQU     $0200
    . . .
    ; -------------------------------------------------------------
    ; EXEC-ROM HEADER
    ; -------------------------------------------------------------
            ORG     $5000
            BIDECLE InitTable
            BIDECLE InitTable
            BIDECLE Start
            ; ^ start program address
            BIDECLE InitTable
            BIDECLE InitTable2
            BIDECLE Title
            ; ^ date / title string
            DCW     $03C0
    InitTable:
            BIDECLE $0000
    InitTable2:
            DCW     $0003, $0005
            DCW     $0000, $0000
            DCW     $0000
    ; -------------------------------------------------------------
    Title   PROC
            DCW     113, 'Raycasting Demo', $0      ; 2013
            ; 'code after title' here (patches to the title screen, etc).
            PSHR    R5
            JSR     R5, PrintStr
            DCW     COLOR_WHT, $23D, '= JD = Productions', 0    ; Color, position, string, null terminator
            JSR     R5, PrintStr
            DCW     COLOR_WHT, $2D0, '2013 = JD =', 0           ; Color, position, string, null terminator
            PULR    R7
            ENDP
    ; -------------------------------------------------------------
    Start   PROC
            DIS
            MVII    #STACK, R6
            SUBR    R0, R0
            MVO     R0, TaskQueueHead
            MVO     R0, TaskQueueTail
            MVO     R0, UpdateAllowed
            MVO     R0, KeyWasPressed
            MVII    #STICSH, R4
            MVII    #$0020, R1
            JSR     R5, ZeroMemory
            ; Start with GRAM mode
            SUBR    R0, R0
            MVO     R0, Mode
            JSR     R5, SetGRAMMode
            ; Set the previous movement variables
            SUBR    R0, R0
            MVO     R0, OldDX
            MVO     R0, OldDY
            MVO     R0, OldDR
            MVO     R0, OldAngle
            ; Set the initial X position
            MVII    #InitialX, R0
            MVO     R0, XPos
            ; Set the initial Y position
            MVII    #InitialY, R0
            MVO     R0, YPos
            ; Set the initial heading
            MVII    #InitialA, R0
            MVO     R0, Angle
            ; Set the initial rotation speed index (0-15, $80 = no movement)
            MVII    #$0080, R0
            MVO     R0, RotSp
            ; Set the initial maximum rendering distance
            MVII    #InitialMaxDist, R0
            MVO     R0, MaxDist
            ; Zero the card copy count
            SUBR    R0, R0
            MVO     R0, CardCopyCount
            ; Set the interrupt vector
            MVII    #InterruptProc, R0
            MVO     R0, IntVecLo
            SWAP    R0
            MVO     R0, IntVecHi
            EIS
            JSR     R5, Render
            MVII    #Handlers, R0
            MVO     R0, W0338
            MVO     R7, UpdateAllowed
            JSR     R5, Scheduler
            JSR     R5, PrintStr
            DCW     COLOR_RED, $2DC, 'SCHEDEXIT WAS CALLED', 0  ; Color, position, string, null terminator
            ; Halt here
            DECR    R7
            ENDP

When I compile, I just use as1600:

    ..\..\jzintv-20180509-win32\bin\as1600 raycast.src -o raycast

DZ-Jay, posted December 13, 2018 (edited December 13, 2018):

> I don't even have SDK-1600 on this laptop
> This is how the program sets itself up:
> < SNIP >
> When I compile, I just use as1600:
> ..\..\jzintv-20180509-win32\bin\as1600 raycast.src -o raycast

The SDK-1600 is just the library files that Joe Z. includes with the tools. You can find it here: http://spatula-city.org/~im14u2c/intv/

Anyway, I don't think you need them, for I see in your code that you have implemented most of the same routines. Moreover, I don't see you using the EXEC in there. That setup does pretty much what we do, except in a long way: it sets the basic ROM header with useless process and animation lists, and then hijacks control during the start routine.

In fact, I'm scanning through your source code in "raycast.src" and I can't find where you use the EXEC. I guess when you said "within the EXEC's normal interrupt mechanism," you just meant that you are performing your updates during the ISR. That's fine; that's typically how it's done.

That said, I do see that your interrupt service routine does work that may not be necessary, like setting the video mode and clearing the collision registers on every call. Perhaps it could be tightened. However, everything else seems to run as fast as it can, in its own loop outside the ISR.

-dZ.

JohnPCAE (Author), posted December 15, 2018:

I cleaned up some of the source code and gave the colored-squares rendering routine (DrawRayCS2) an optimization pass. It runs about 9.5% faster now, taking 1217 fewer clock cycles (1.36 ms). Speed-wise it's not the long pole in the tent, but at least it's more efficient.

Attachment: raycast_20181214.zip

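The cycle-to-time conversion quoted above can be checked with a little arithmetic. The sketch below is in Python and assumes an NTSC console, where the CP1610 clock is the 3.579545 MHz colorburst frequency divided by 4; it also treats "about 9.5%" as exact, so the derived size of the old routine is only a rough estimate. The variable names are made up for illustration.

```python
# Assumption: NTSC Intellivision, CPU clock = colorburst / 4 (about 894,886 Hz).
NTSC_COLORBURST_HZ = 3_579_545
CPU_HZ = NTSC_COLORBURST_HZ / 4


def cycles_to_ms(cycles):
    """Convert CPU clock cycles to milliseconds at the assumed clock rate."""
    return cycles / CPU_HZ * 1000


saved_cycles = 1217
saved_ms = cycles_to_ms(saved_cycles)  # roughly 1.36 ms, matching the post

# If 1217 cycles is about 9.5% of the old routine, the old routine took
# roughly this many cycles per call (an estimate, not a measured figure).
estimated_old_cycles = saved_cycles / 0.095
```

By the same assumption, one 60 Hz frame is a little under 15,000 cycles, so a saving of this size is a noticeable fraction of a frame.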
DZ-Jay, posted December 15, 2018:

> I cleaned up some of the source code and gave the colored-squares rendering routine (DrawRayCS2) an optimization pass. It runs about 9.5% faster now, taking up 1217 less clock cycles (1.36ms). Speed-wise it's not the long pole in the tent, but at least it's more efficient.

1.2K cycles is something.

-dZ.

JohnPCAE (Author), posted December 15, 2018:

MORE SPEED! I crawled through the colored-squares ray-casting portion, which is (I believe) the long pole in the tent. It definitely runs smoother now. Depending on the maximum draw distance setting and where you are in the maze, it can save up to about 16k clock cycles (on average, though, probably much less).

Attachment: raycast_20181215.zip

artrag, posted December 16, 2018:

I would like to help in adding sprite support. Scaling sprites has to be done offline, while occlusion can be done online. I would use actual sprites to get an acceptable resolution, but when you get close to an enemy you need to be able to scale it up to sizes that hardware sprites cannot support. Probably the best solution is a mix of hardware sprites for objects at long range and software sprites at closer ranges. To simplify development, software sprites are the best choice to start with.

JohnPCAE (Author), posted December 16, 2018 (edited December 16, 2018):

I'm not sure that software sprites are possible, but feel free to play with it! My guess is that the raycaster would have to check against a list of object positions for each of the 40 vertical stripes in the loop. Testing whether a sprite object is partially occluded by a wall could get very tricky, though. I think a table of distances vs. sprite width could help speed things up by telling the raycaster that it can safely ignore an object if it's too far away or if it has already struck it on a previous stripe. I think it can be made to work, but making it fast will be the real trick. I really wish there was a way to profile the code so I could know where the CPU is spending most of its time.

I don't know if DOOM will be possible, but Wolfenstein 3D might be!

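One way to picture the per-stripe check and the distance-vs-width table discussed above is a per-column depth comparison: keep the wall distance found for each of the 40 stripes, and draw a sprite stripe only where the sprite is nearer than the wall. The Python below is a hypothetical sketch of that idea, not code from the demo; the table formula, the 16-unit cutoff, and every name in it are invented for illustration.

```python
NUM_COLUMNS = 40       # vertical stripes per frame, as described in the post
MAX_SPRITE_DIST = 16   # hypothetical cutoff: farther objects are ignored

# Hypothetical distance -> on-screen width table (index 0 is unused).
WIDTH_BY_DIST = [0] + [max(1, NUM_COLUMNS // (2 * d))
                       for d in range(1, MAX_SPRITE_DIST + 1)]


def visible_sprite_columns(center_col, dist, wall_dist):
    """Return the screen columns where the sprite is in front of the wall.

    wall_dist[col] is the wall distance the raycaster found for that stripe.
    """
    if dist < 1 or dist > MAX_SPRITE_DIST:
        return []  # too far away (or invalid): skip the object entirely
    width = WIDTH_BY_DIST[dist]
    first = center_col - width // 2
    columns = []
    for col in range(first, first + width):
        # Clip to the screen, then compare depths for occlusion.
        if 0 <= col < NUM_COLUMNS and dist < wall_dist[col]:
            columns.append(col)
    return columns


# Usage: walls 10 units away everywhere except a nearer wall at column 20.
walls = [10] * NUM_COLUMNS
walls[20] = 2
cols = visible_sprite_columns(center_col=20, dist=4, wall_dist=walls)
```

With this layout the occlusion test is a single compare per stripe, so most of the cost would be in finding each object's screen column and distance, which this sketch takes as given.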
artrag, posted December 16, 2018:

For sprites I followed this: https://lodev.org/cgtutor/raycasting3.html

JohnPCAE (Author), posted December 20, 2018:

I reworked the beginning part of the raycaster where it sets up values based on the direction of the ray. By breaking out all of the special cases there is an overall speedup across the board, mainly by eliminating a bunch of tests and branches. The effect is noticeable.

I'm not entirely sure now where the demo is spending most of its time -- in the ISR, in the raycaster, in multiplication or division routines, or in the renderer. One interesting thing is that lowering the draw distance doesn't speed things up all that much. I've tried disabling the division routine and using a constant height and there was no noticeable speedup, so I don't think that's the main bottleneck. Maybe my task scheduler? It was lifted from one of Joe's examples and to be honest I don't understand it all that well.

Attachment: raycast_20181220.zip

DZ-Jay, posted December 20, 2018:

> I reworked the beginning part of the raycaster where it sets up values based on the direction of the ray. By breaking out all of the special cases there is an overall speedup across the board, mainly by eliminating a bunch of tests and branches. The effect is noticeable. I'm not entirely sure now where the demo is spending most of its time -- in the ISR, in the raycaster, in multiplication or division routines, or in the renderer. One interesting thing is that lowering the draw distance doesn't speed things up all that much. I've tried disabling the division routine and using a constant height and there was no noticeable speedup, so I don't think that's the main bottleneck. Maybe my task scheduler? It was lifted from one of Joe's examples and to be honest I don't understand it all that well.

The TASKQ task scheduler is a good general-purpose library (it is the core of Christmas Carol) but a bit on the hefty side. It is quite expensive to queue and dequeue using that library, but perhaps that's not the bottleneck.

There's a very simple profiling function in the debugger that may help. From the debugger command line, enable "CPU History" with the "h" command, then run a few cycles and break into the debugger again. From the debugger command line again, dump the CPU history with the "d" command. The result is an "*.hst" file containing the last bit of CPU execution history, but at the bottom it includes a profile of the addresses most called and how much time is spent on each. It's very rudimentary, but it's something.

As for the TASKQ, should it prove to be the bottleneck, I have an improved version which Joe Z., Arnauld C., and I worked on. It is quite a lot faster. It is now the core of P-Machinery 2.0. The original discussion on the topic should be in the INTVProg mailing list archives. If you're interested, I can post a generalized version of the library in P-Machinery.

-dZ.

JohnPCAE (Author), posted December 22, 2018:

Lots more optimizations to the colored-squares raycaster. It's really cooking now. I'm curious as to how it compares to motion in Treasure of Tarmin. It's reaching the point where you have to drastically lower the draw distance to get a noticeable improvement in speed, which suggests to me that the CPU is spending most of its time elsewhere.

Attachment: raycast_20181222.zip

JohnPCAE (Author), posted December 31, 2018:

Fixed a little bug in the code that checks to see if a ray has gone outside the maze and made it a teensy bit faster. Happy New Year, everyone!

Attachment: raycast_20181231.zip

JohnPCAE (Author), posted January 6, 2019:

I made a ton of optimizations and it's quite a lot faster.

Attachment: raycast_20190106.zip