
Raycasting demo



It was ages ago, but I know I started with either a disassembly of something or an example program as a template. Maybe 4-tris? I'm not sure. I test this in Nostalgia, and the only way I could test it on hardware would be with my Intellicart, so it's not written to take advantage of any on-cart acceleration features :(

Edited by JohnPCAE
 

Well, my tweaks don't require JLP, they only shift the RAM down to an address range that's JLP-friendly (and also avoid write aliasing with GRAM).

 

So you don't test with your own emulator? I always wondered why IntvWin disappeared.

 

If you want to test with jzIntv (which does support JLP emulation), download the latest stable dev release: http://spatula-city.org/~im14u2c/intv/


In the meantime I redesigned how the maze works when in colored squares mode. Instead of defining all four walls for a block with a single bit, I define each of them individually using a word for each. This gives me individual control over the color of each wall. The visual effect is much better. Rendering speed will be very slightly slower, at least for now.

raycast_20151228.zip
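Here is a rough Python sketch of why a bit per block limits coloring. The encodings are assumptions for illustration, because the demo's actual maze format isn't shown in this thread. The old scheme is modeled as one bit per block. The new one is modeled as four words per block, one per wall, where 0 means open and anything else is that wall's color.

```python
# Hypothetical encodings for illustration; the demo's real maze format isn't shown here.
NORTH, EAST, SOUTH, WEST = range(4)

def old_wall_color(row_bits: int, x: int, block_color: int) -> int:
    """Old scheme (assumed): one bit per block, so all four walls share one color."""
    return block_color if (row_bits >> x) & 1 else 0

def new_wall_color(cell_walls: list, side: int) -> int:
    """New scheme (assumed): one word per wall, 0 = open, otherwise that wall's color."""
    return cell_walls[side]

# A block with a color-2 north wall and a color-1 south wall, open to the east and west
cell = [2, 0, 1, 0]
print(new_wall_color(cell, NORTH), new_wall_color(cell, SOUTH))  # 2 1
```

The cost is memory. Four words per block instead of one bit gives each wall its own color.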


I tried your version, and the on-screen results are different. It looks like two areas are stepping on one another: there are two regions that are both mapped to 0xC040.

 

Oops. I missed deleting an ORG $C040. The second one is in error. Here's an update. I tested it and it seems the visuals are restored.

raycast_20151227_jz.zip
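A mistake like the duplicate ORG $C040 is easy to catch mechanically once you know each segment's start address and size. Here is a minimal Python sketch. It doesn't parse as1600 output, so you would have to feed it the addresses yourself, and the segment names and sizes below are made up.

```python
def find_overlaps(segments):
    """segments: (name, start_address, length_in_words) tuples.
    Returns every pair of segments whose address ranges overlap."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    clashes = []
    for i, (name_a, start_a, len_a) in enumerate(ordered):
        end_a = start_a + len_a  # first address past segment a
        for name_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later segment can overlap a
            clashes.append((name_a, name_b))
    return clashes

# Hypothetical layout with the same ORG $C040 accidentally used twice
layout = [("code_a", 0xC040, 0x200), ("code_b", 0xC040, 0x100), ("maze", 0xD000, 0x40)]
print(find_overlaps(layout))  # [('code_a', 'code_b')]
```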


OK, so I applied the changes to the 20151228 version (attached). I did test the attached version, and the output appears to be good (unlike that oopsie upthread).

 

Here's the diff, if you're curious. It's very short.

diff -wrbu raycast_20151228/raycast.src raycast_20151228_jz/raycast.src
--- raycast_20151228/raycast.src	2015-12-28 23:16:40.000000000 -0600
+++ raycast_20151228_jz/raycast.src	2015-12-29 01:57:34.000000000 -0600
@@ -25,7 +25,7 @@
 
         ; 16-bit RAM from $BE00-$BFFF (512 words)
 
-_CARTRAM    ORG     $BE00, $BE00, "+RW"
+_CARTRAM    ORG     $8040, $8040, "+RW"
 CartRAM     RMB     512
 
 
@@ -6207,7 +6207,7 @@
 
 ; -------------------------------------------------------------
 
-        ORG     $E000
+        ORG     $A000
 
 ; -------------------------------------------------------------
 ; Signed fixed-point multiply (slow...using tables would be faster)
@@ -15509,7 +15509,7 @@
 
         ENDP
 
-        ORG     $9800
+        ORG     $C040
 
 ; -------------------------------------------------------------
 ; R2 = R0 * R1, where R0 and R1 are unsigned 8-bit values
@@ -19871,7 +19871,7 @@
 
 ; -------------------------------------------------------------
 
-        ORG     $A000
+;       ORG     $A000
 
         ; Trig tables
 
@@ -22294,7 +22294,7 @@
         DCW     $0000,  $0000                   ; $B1FE  0000 0000       [.. ]
 
 
-        ORG     $B800
+;       ORG     $B800
 
 Maze:

raycast_20151228_jz.zip
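Here is why the RAM had to move, based on my understanding of the commonly published Intellivision memory map. This isn't stated in the thread, so verify it before relying on it:

- The STIC registers are aliased at $4000, $8000 and $C000, covering the first $40 words of each.
- GRAM is aliased at $7800-$7FFF, $B800-$BFFF and $F800-$FFFF.

Under that assumption, the old CartRAM at $BE00 lands inside a GRAM alias. The new locations $8040 and $C040 start just past the STIC aliases. A quick Python check:

```python
# Alias windows as commonly documented for the Intellivision (assumption; verify
# against an authoritative memory map). Ranges are inclusive.
ALIAS_WINDOWS = [
    ("STIC alias", 0x4000, 0x403F),
    ("STIC alias", 0x8000, 0x803F),
    ("STIC alias", 0xC000, 0xC03F),
    ("GRAM alias", 0x7800, 0x7FFF),
    ("GRAM alias", 0xB800, 0xBFFF),
    ("GRAM alias", 0xF800, 0xFFFF),
]

def alias_conflicts(start: int, length: int) -> list:
    """Return the alias windows touched by a block of `length` words at `start`."""
    end = start + length - 1  # inclusive
    return [name for name, lo, hi in ALIAS_WINDOWS if start <= hi and end >= lo]

print(alias_conflicts(0xBE00, 512))  # ['GRAM alias']  (old CartRAM location)
print(alias_conflicts(0x8040, 512))  # []              (new CartRAM location)
```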


8 months later...

Thanks, Joe! I incorporated your changes into my source. A week ago I also had an insight on how to speed up the algorithm quite a bit by using a special-case multiply in several places. Where I determine the distance to the wall, one multiplicand is always >= 1 and < 2. That lets me skip multiplying by its high byte and simply add in the other value instead (x * y becomes x * y.lo + x). This is only being used in Colored Squares mode.
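The sketch below assumes 8.8 fixed point; the thread doesn't state the demo's exact format. If 1.0 <= y < 2.0, y's high byte is exactly 1. So x * y = x * (256 + y.lo) / 256 = x + x * y.lo / 256, and the high-byte partial product collapses into an add. Here is a Python sketch comparing the general and special-case multiplies:

```python
FRAC_BITS = 8  # assumed 8.8 fixed-point format

def mul_fixed(x: int, y: int) -> int:
    """General fixed-point multiply: (x * y) >> 8."""
    return (x * y) >> FRAC_BITS

def mul_fixed_one_to_two(x: int, y: int) -> int:
    """Special case for 1.0 <= y < 2.0: skip the high-byte product and add x back in."""
    assert y >> FRAC_BITS == 1, "y must be in [1.0, 2.0)"
    y_lo = y & 0xFF
    return ((x * y_lo) >> FRAC_BITS) + x

# 3.0 * 1.5 in 8.8 format
print(hex(mul_fixed_one_to_two(0x0300, 0x0180)))  # 0x480 (4.5)
```

The two functions agree exactly, because x * 256 is a whole multiple of 256. The right shift therefore distributes over the two terms without any rounding difference.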

 

I've also added a little feature to the Colored Squares rendering: if you use the keypad to set the maximum render distance to something less than the maximum, it will draw black squares if no wall was hit to make it look like it's dark far away.

raycast_20160918.zip
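Here is a minimal Python sketch of that "dark far away" behavior using a standard grid-stepping (DDA) ray march. The following are all assumptions for illustration:

- the grid format, where 0 is empty and nonzero is a wall color;
- the step-count cutoff standing in for the maximum render distance;
- the BLACK constant.

The demo's own raycaster is CP1600 assembly working in fixed point.

```python
import math

BLACK = 0  # hypothetical color index meaning "no wall within range"

def first_crossing(pos: float, cell: int, d: float) -> float:
    """Ray distance to the first grid line crossed along one axis (inf if parallel)."""
    if d == 0:
        return math.inf
    frac = (cell + 1 - pos) if d > 0 else (pos - cell)
    return frac / abs(d)

def cast_ray(grid, px, py, angle, max_steps):
    """Step cell by cell (DDA); return the wall color hit, or BLACK after max_steps."""
    dx, dy = math.cos(angle), math.sin(angle)
    cx, cy = int(px), int(py)
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    delta_x = 1 / abs(dx) if dx != 0 else math.inf  # ray length per column crossed
    delta_y = 1 / abs(dy) if dy != 0 else math.inf  # ray length per row crossed
    side_x = first_crossing(px, cx, dx)
    side_y = first_crossing(py, cy, dy)
    for _ in range(max_steps):
        if side_x < side_y:
            side_x += delta_x
            cx += step_x
        else:
            side_y += delta_y
            cy += step_y
        if not (0 <= cy < len(grid) and 0 <= cx < len(grid[0])):
            return BLACK  # left the map without hitting anything
        if grid[cy][cx]:
            return grid[cy][cx]
    return BLACK  # too far: draw a dark square instead of a wall
```

Lowering `max_steps` makes distant walls come back as BLACK. This mirrors how a reduced maximum render distance darkens the far squares.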


2 years later...

I decided to take a fresh look at fixed-point multiplication (using the quarter-square method), and I managed to come up with a faster implementation. All of the multiplication routines were affected (the basic one plus all the special cases). Speedups vary; for instance, the kernel of the basic version now runs in under 90% of the time the previous version took.

raycast_20181206.zip
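The quarter-square method rests on the identity a * b = floor((a + b)^2 / 4) - floor((a - b)^2 / 4). It is exact for integers because a + b and a - b always have the same parity. One table of floor(n^2 / 4) therefore turns a multiply into two lookups and a subtract. Below is a Python sketch for unsigned 8-bit operands. The operand width and table size are assumptions, and this isn't the demo's CP1600 code.

```python
# Table of floor(n^2 / 4) for n = 0..510, enough for a + b with a, b in 0..255
QSQ = [n * n // 4 for n in range(511)]

def qs_mul(a: int, b: int) -> int:
    """Unsigned 8x8 -> 16-bit multiply via quarter squares: two lookups, one subtract."""
    return QSQ[a + b] - QSQ[abs(a - b)]

print(qs_mul(200, 123))  # 24600
```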


One question I have for others here is whether using the EXEC could be holding it back. The demo runs within the EXEC's normal interrupt mechanism, but I don't know enough about programming the Inty to know if this is a problem or not, or even how to not use the EXEC.

 

Oh dear! When you say "within the EXEC's normal interrupt mechanism," do you mean that you have set it up as a "process" in the EXEC task list? If so, I believe it means it will be called at 20 Hz. It also means that hand-controller input is processed no faster than 20 Hz as well.

 

If you reeaaaaaaaaaaally want to get rid of the EXEC completely (which I strongly urge you to), then it's rather easy. Just know that you will not be able to use anything from the EXEC after that, since you'll be breaking out of its framework. There is no turning back ...

 

Here's some simple, high-level information. The SDK-1600 includes three handy components that will make your life exceedingly easier:

  • CART.MAC - This is a macro library that allows you to automatically set up a ROM header appropriate for traditional home-brewing (i.e., skipping the EXEC completely). The library has many features for memory allocation and all that, but the most important one is the ROMSETUP directive, which will set up the ROM header in a way that will automatically call your MAIN routine directly.
  • TASKQ - This is a simple game-engine task scheduler and main loop. It sets up a task queue, similar to the EXEC's process table, except that it is dynamic and runs outside the ISR context, in real-time. The idea is that you set up tasks or events and they will be executed in the order they were queued.
  • SCANHAND - This is a general-purpose hand-controller decoder which, when included with TASKQ, will be automatically called during "idle" times.

 

The idea, then, is to set up a ROM header that points to your MAIN routine. Your MAIN routine just sets up your custom ISR (Interrupt Service Routine) and runs the TASKQ scheduler, which serves as the engine's main loop. You then prepare event handlers for user input that update the state of your world, and your ISR updates the GRAM cards and STIC registers as necessary.
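Here is a rough Python model of that structure: the ISR only queues work, and the main loop drains the queue outside interrupt context. This is not TASKQ's actual API, just the general shape of a fixed-size task queue. The ring-buffer size, the (handler, argument) task format, and all names here are hypothetical.

```python
QUEUE_SIZE = 16  # hypothetical capacity; a cart would use a fixed ring buffer in RAM

class TaskQueue:
    """Fixed-size circular queue of (handler, arg) pairs, filled by the ISR."""

    def __init__(self, size: int = QUEUE_SIZE):
        self.slots = [None] * size
        self.head = 0  # next slot to dequeue
        self.tail = 0  # next slot to enqueue

    def put(self, handler, arg=None) -> bool:
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            return False  # full: drop the event rather than block inside the ISR
        self.slots[self.tail] = (handler, arg)
        self.tail = nxt
        return True

    def get(self):
        if self.head == self.tail:
            return None  # empty
        item = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        return item

def run_pending(queue: TaskQueue) -> int:
    """One pass of the main loop: run every queued task, return how many ran."""
    ran = 0
    while (item := queue.get()) is not None:
        handler, arg = item
        handler(arg)
        ran += 1
    return ran

log = []
q = TaskQueue()
q.put(log.append, "keypad 5")  # e.g. what an ISR might queue after decoding input
q.put(log.append, "render")
run_pending(q)
print(log)  # ['keypad 5', 'render']
```

One slot is always left empty so that head == tail can unambiguously mean "empty".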

 

Let us know if you need help with this.

 

-dZ.


I don't even have SDK-1600 on this laptop :? This is how the program sets itself up:

; Memory locations

STIC            EQU     $0000

STICInteraction EQU     $0018
STICHandshake   EQU     $0020
STICCardMode    EQU     $0021

IntVecLo        EQU     $0100
IntVecHi        EQU     $0101
TaskQueueHead   EQU     $0102
TaskQueueTail   EQU     $0103
TaskQueue       EQU     $0104

LeftCtrlData    EQU     $0115
RightCtrlData   EQU     $0118

DrawHeights     EQU     $011B


RightController EQU     $01FE
LeftController  EQU     $01FF

BACKTAB         EQU     $0200

.
.
.

; -------------------------------------------------------------
; EXEC-ROM HEADER
; -------------------------------------------------------------

        ORG     $5000

        BIDECLE InitTable
        BIDECLE InitTable
        BIDECLE Start                           ; ^ start program address
        BIDECLE InitTable
        BIDECLE InitTable2
        BIDECLE Title                           ; ^ date / title string

        DCW     $03C0

InitTable:

        BIDECLE $0000

InitTable2:

        DCW     $0003,  $0005
        DCW     $0000,  $0000
        DCW     $0000

; -------------------------------------------------------------

Title   PROC

        DCW     113, 'Raycasting Demo', $0      ; 2013

        ; 'code after title' here (patches to the title screen, etc).

        PSHR    R5
        JSR     R5,     PrintStr

        DCW     COLOR_WHT, $23D, '= JD = Productions', 0      ; Color, position, string, null terminator

        JSR     R5,     PrintStr

        DCW     COLOR_WHT, $2D0, '2013 = JD =', 0             ; Color, position, string, null terminator

        PULR    R7

        ENDP

; -------------------------------------------------------------

Start   PROC

        DIS
        MVII    #STACK, R6
        SUBR    R0,     R0
        MVO     R0,     TaskQueueHead
        MVO     R0,     TaskQueueTail
        MVO     R0,     UpdateAllowed
        MVO     R0,     KeyWasPressed
        MVII    #STICSH, R4
        MVII    #$0020, R1
        JSR     R5,     ZeroMemory

        ; Start with GRAM mode

        SUBR    R0,     R0
        MVO     R0,     Mode
        JSR     R5,     SetGRAMMode

        ; Set the previous movement variables

        SUBR    R0,     R0
        MVO     R0,     OldDX
        MVO     R0,     OldDY
        MVO     R0,     OldDR
        MVO     R0,     OldAngle

        ; Set the initial X position

        MVII    #InitialX, R0
        MVO     R0,     XPos

        ; Set the initial Y position

        MVII    #InitialY, R0
        MVO     R0,     YPos

        ; Set the initial heading

        MVII    #InitialA, R0
        MVO     R0,     Angle

        ; Set the initial rotation speed index (0-15, $80 = no movement)

        MVII    #$0080, R0
        MVO     R0,     RotSp

        ; Set the initial maximum rendering distance

        MVII    #InitialMaxDist, R0
        MVO     R0,     MaxDist

        ; Zero the card copy count

        SUBR    R0,     R0
        MVO     R0,     CardCopyCount

        ; Set the interrupt vector

        MVII    #InterruptProc, R0
        MVO     R0,     IntVecLo
        SWAP    R0
        MVO     R0,     IntVecHi

        EIS
        JSR     R5,     Render
        MVII    #Handlers, R0
        MVO     R0,     W0338
        MVO     R7,     UpdateAllowed
        JSR     R5,     Scheduler
        JSR     R5,     PrintStr

        DCW     COLOR_RED, $2DC, 'SCHEDEXIT WAS CALLED', 0    ; Color, position, string, null terminator

        ; Halt here

        DECR    R7

        ENDP

When I compile, I just use as1600:

 

..\..\jzintv-20180509-win32\bin\as1600 raycast.src -o raycast
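A note on the interrupt-vector setup in Start: the ISR address is stored to IntVecLo. Then SWAP exchanges R0's two bytes, and R0 is stored again to IntVecHi. As I understand the hardware (this isn't stated in the thread), $0100-$0101 are in the 8-bit scratchpad RAM, so each MVO keeps only the low byte. That is how the 16-bit address ends up split across two locations. Here is a Python model of the split, with a made-up InterruptProc address:

```python
def swap_bytes(word: int) -> int:
    """Model of the CP1600 SWAP instruction: exchange a 16-bit word's two bytes."""
    return ((word << 8) | (word >> 8)) & 0xFFFF

def store_to_8bit_ram(word: int) -> int:
    """Model of MVO into 8-bit RAM: only the low byte survives (assumed behavior)."""
    return word & 0xFF

isr_addr = 0x5A3C  # hypothetical address of InterruptProc
int_vec_lo = store_to_8bit_ram(isr_addr)              # MVO R0, IntVecLo
int_vec_hi = store_to_8bit_ram(swap_bytes(isr_addr))  # SWAP R0 / MVO R0, IntVecHi
vector = (int_vec_hi << 8) | int_vec_lo               # the two bytes recombined
print(hex(int_vec_lo), hex(int_vec_hi), hex(vector))  # 0x3c 0x5a 0x5a3c
```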

Edited by JohnPCAE

The SDK-1600 is just the library files that Joe Z. includes with the tools. You can find it here:

 

Anyway, I don't think you need them, for I see in your code that you have implemented most of the same routines.

 

Moreover, I don't see you using the EXEC in there. That setup does pretty much what we do, just the long way around. It sets up the basic ROM header with useless process and animation lists, and then hijacks control in the start routine.

 

In fact, I'm scanning through your source code in "raycast.src" and I can't find where you use the EXEC. I guess when you said "within the EXEC's normal interrupt mechanism," you just meant that you are performing your updates during the ISR. That's fine, that's typically how it's done.

 

That said, I do see that your interrupt service routine does too much work that may not be necessary, like setting the video mode and clearing the collision registers on every call. Perhaps it could be tightened. However, everything else seems to run as fast as it can, in its own loop outside the ISR.

 

-dZ.

Edited by DZ-Jay

I cleaned up some of the source code and gave the colored-squares rendering routine (DrawRayCS2) an optimization pass. It runs about 9.5% faster now, using 1,217 fewer clock cycles (1.36 ms). Speed-wise it's not the long pole in the tent, but at least it's more efficient.

 

1.2K cycles is something. :thumbsup:

 

-dZ.


MORE SPEED!

 

I crawled through the colored-squares ray-casting portion, which is (I believe) the long pole in the tent. It definitely runs smoother now. Depending on the maximum draw distance setting and where you are in the maze, it can save up to about 16k clock cycles (on average, though, probably much less).

raycast_20181215.zip
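For scale, the NTSC Intellivision's CPU clock is commonly given as the 3.579545 MHz colorburst divided by 4, about 894,886 Hz. That is a standard figure, not something measured here. Using it, here is a Python conversion of the cycle counts mentioned above into milliseconds and 60 Hz frames:

```python
CPU_HZ = 3_579_545 / 4  # NTSC CP1610 clock, about 894,886 Hz
FRAME_HZ = 60           # approximate NTSC frame rate

def cycles_to_ms(cycles: float) -> float:
    """Convert CPU cycles to milliseconds at the NTSC clock rate."""
    return cycles / CPU_HZ * 1000

cycles_per_frame = CPU_HZ / FRAME_HZ        # roughly 14,900 cycles per frame
print(round(cycles_to_ms(1217), 2))         # 1.36 (the DrawRayCS2 savings)
print(round(cycles_to_ms(16_000), 1))       # 17.9
print(round(16_000 / cycles_per_frame, 2))  # 1.07 frames
```

So the best-case 16k-cycle savings is slightly more than one whole 60 Hz frame of CPU time.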


I would like to help in adding sprite support.

Scaling sprites has to be done offline, while occlusion can be done online.

 

I would use actual hardware sprites to get acceptable resolution, but when you get close to an enemy, it should be able to scale up to sizes that hardware sprites cannot support.

 

Probably the best solution is a mix of hardware sprites for objects at long range and software sprites at closer range.

 

To simplify development, software sprites are the best choice to start with.


I'm not sure that software sprites are possible, but feel free to play with it! My guess is that the raytracer would have to check against a list of object positions for each of the 40 vertical stripes in the loop. Testing whether a sprite object is partially occluded by a wall could get very tricky, though. I think a table of distances vs. sprite width could help speed things up by telling the raytracer that it can safely ignore an object if it's too far or if it has already struck it on a previous stripe. I think it can be made to work, but making it fast will be the real trick. I really wish there was a way to profile the code so I could know where the CPU is spending most of its time.
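One standard way to handle the occlusion part comes from Wolfenstein-style raycasters in general, not from this demo. You remember the wall distance found for each of the 40 stripes. Then you draw a sprite's stripe only where the sprite is nearer than the wall. Below is a Python sketch. The linear 1/distance width model and the width_at_1 parameter are assumptions for illustration.

```python
NUM_COLUMNS = 40  # vertical stripes rendered by the demo

def visible_columns(wall_dist, center_col, sprite_dist, width_at_1):
    """Return the stripe indexes where a sprite is in front of the wall.

    wall_dist: per-column distance to the wall each ray hit (a per-stripe depth buffer).
    width_at_1: hypothetical on-screen width in stripes at distance 1.0.
    """
    width = max(1, round(width_at_1 / sprite_dist))  # apparent width shrinks with distance
    first = center_col - width // 2
    cols = range(max(0, first), min(len(wall_dist), first + width))
    return [c for c in cols if sprite_dist < wall_dist[c]]

walls = [5.0] * NUM_COLUMNS
walls[18] = walls[19] = 1.0  # a pillar close to the viewer
print(visible_columns(walls, 20, 2.0, 8))   # [20, 21]: partly hidden by the pillar
print(visible_columns(walls, 20, 10.0, 8))  # []: entirely behind the far wall
```

The width_at_1 / sprite_dist division could be replaced by a lookup table, along the lines of the distance-vs-width table suggested above.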

 

I don't know if DOOM will be possible, but Wolfenstein 3D might be!

Edited by JohnPCAE

I reworked the beginning part of the raycaster where it sets up values based on the direction of the ray. By breaking out all of the special cases there is an overall speedup across the board, mainly by eliminating a bunch of tests and branches. The effect is noticeable.
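Here is roughly what breaking out the direction special cases looks like. You decide the step signs and the distance to the first cell boundary once per ray, by quadrant. The stepping loop then uses those values and never tests the ray's direction. The Python sketch below assumes 8.8 fixed-point positions with 256 units per maze cell; the demo's actual setup isn't shown here.

```python
CELL = 256  # assumed: 8.8 fixed-point positions, one maze cell = 256 units

def ray_setup(px: int, py: int, dx: int, dy: int):
    """Pick per-quadrant constants once so the stepping loop needs no sign tests.

    Returns (step_x, step_y, dist_x, dist_y): the cell step on each axis and the
    axis distance (in position units) to the first cell boundary on that axis.
    """
    frac_x, frac_y = px % CELL, py % CELL
    if dx >= 0 and dy >= 0:
        return 1, 1, CELL - frac_x, CELL - frac_y
    if dx < 0 and dy >= 0:
        return -1, 1, frac_x, CELL - frac_y
    if dx < 0 and dy < 0:
        return -1, -1, frac_x, frac_y
    return 1, -1, CELL - frac_x, frac_y  # dx >= 0, dy < 0

print(ray_setup(0x180, 0x240, 5, -3))  # (1, -1, 128, 64)
```

In assembly, one way to exploit this is a separate copy of the stepping loop per quadrant with the signs baked in. That trades code size for fewer branches per step.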

 

I'm not entirely sure now where the demo is spending most of its time -- in the ISR, in the raycaster, in multiplication or division routines, or in the renderer. One interesting thing is that lowering the draw distance doesn't speed things up all that much. I've tried disabling the division routine and using a constant height and there was no noticeable speedup, so I don't think that's the main bottleneck. Maybe my task scheduler? It was lifted from one of Joe's examples and to be honest I don't understand it all that well.

raycast_20181220.zip


The TASKQ task scheduler is a good general-purpose library (it is the core of Christmas Carol) but a bit on the hefty side. It is quite expensive to queue and dequeue using that library, but perhaps that's not the bottleneck.

 

There's a very simple profiling function in the debugger that may help. From the debugger command line, enable "CPU History" with the "h" command, then run a few cycles and break into the debugger again. From the debugger command line again, dump the CPU history with the "d" command.

 

The result is an "*.hst" file containing the last bit of CPU execution history, but at the bottom it includes a profile of the addresses most called and how much time is spent on each. It's very rudimentary, but it's something. :)

 

As for the TASKQ, should it prove to be the bottleneck, I have an improved version which Joe Z., Arnauld C., and I worked on. It is quite a lot faster. It is now the core of P-Machinery 2.0.

 

The original discussion on the topic should be in the INTVProg mailing list archives. If you're interested, I can post a generalized version of the library in P-Machinery.

 

-dZ.


Lots more optimizations to the colored-squares raycaster. It's really cooking now. I'm curious as to how it compares to motion in Treasure of Tarmin. It's reaching the point where you have to drastically lower the draw distance to get a noticeable improvement in speed, which suggests to me that the CPU is spending most of its time elsewhere.

raycast_20181222.zip
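Lowering the draw distance barely helping is what Amdahl's law predicts when the raycaster is only part of each frame's work. Here is a worked Python example with made-up numbers, not measurements from the demo:

```python
def overall_speedup(fraction: float, part_speedup: float) -> float:
    """Amdahl's law: whole-frame speedup when only `fraction` of the time gets faster."""
    return 1 / ((1 - fraction) + fraction / part_speedup)

# Hypothetical: raycasting is 30% of the frame, and a shorter draw distance halves it
print(round(overall_speedup(0.30, 2.0), 3))  # 1.176, i.e. under 18% faster overall
```

If profiling shows a large share of the frame elsewhere (the ISR, the scheduler, or the renderer), that's where the next big win would be.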
