Slick Kernel


With my prior DPC+ kernels, I used an Event Datastream that would tell the kernel to jump out of its normal loop. After that, the 6507 would spend a lot of CPU time deciding if it was supposed to reposition player 0, reposition player 1, or if it was all done drawing the game display.

 

I got to thinking and came up with a way where the 6507 no longer has to make any kernel decisions - instead of having an Event Datastream, just have a Jump Datastream. Every entry in the Jump Datastream is initialized with the address of NormalKernel. When a player needs to be repositioned, the C code just changes the appropriate Jump Datastream entry to point to the proper reposition kernel.

NormalKernel:                   ;   19
        lda #<DS_NUSIZ_COLUPF   ; 2 21 ' resize players/missiles, color for ball
        lda #<DS_COLUP_HMMB     ; 2 23 ' use for missile/ball HMxx
        ldx DS_GRP1             ; 4 27
        lda #<DS_JUMP           ; 2 29
        sta NextKernel          ; 3 32
        lda #<DS_JUMP           ; 2 34
        sta NextKernel+1        ; 3 37
        sta HMCLR               ; 3 40 ' reset missile/ball HMOVE
        lda #<DS_HMP0           ; 2 42
        sta HMP0                ; 3 45
        lda #<DS_HMP1           ; 2 47
        sta HMP1                ; 3 50
        lda #<DS_GRP0           ; 2 52
        sta GRP0                ; 3 55 <- on VDEL, for next scanline
        lda #<DS_M1M0BL         ; 2 57 <- on VDEL, for next scanline
        sta ENABL               ; 3 60
        lsr                     ; 2 62
        sta WSYNC               ; 3 65/0
R76:                            ; A holds M0 and M1
                                ; X holds GRP1
                                ; GRP0 and BL on VDEL
        sta HMOVE               ; 3  3
        stx GRP1                ; 3  6
        sta ENAM0               ; 3  9
        lsr                     ; 2 11
        sta ENAM1               ; 3 14
        jmp (NextKernel)        ; 5 19
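
On the C side, the Jump Datastream idea can be sketched roughly as follows. This is only an illustration: the kernel addresses, the 192-line height and the names `jump_stream`, `set_jump`, `get_jump` and `init_jump_stream` are all made up for the example. The one detail taken from the listing is the byte order: the kernel fetches two bytes per scanline and stores them to NextKernel and NextKernel+1, so each entry is low byte first.

```c
#include <assert.h>
#include <stdint.h>

#define SCANLINES 192   /* hypothetical kernel height, not from the post */

/* Hypothetical 6507 addresses; real values would come from the assembler. */
enum {
    NORMAL_KERNEL   = 0xF200,
    RESP0_STROBE_23 = 0xF240
};

/* Two bytes per scanline: low byte, then high byte, matching the order in
   which the kernel stores them to NextKernel and NextKernel+1. */
static uint8_t jump_stream[SCANLINES * 2];

/* Point one scanline's entry at the given kernel. */
static void set_jump(unsigned line, uint16_t kernel)
{
    assert(line < SCANLINES);
    jump_stream[line * 2]     = (uint8_t)(kernel & 0xFF);
    jump_stream[line * 2 + 1] = (uint8_t)(kernel >> 8);
}

/* Read an entry back as a 16-bit address. */
static uint16_t get_jump(unsigned line)
{
    assert(line < SCANLINES);
    return (uint16_t)(jump_stream[line * 2] | (jump_stream[line * 2 + 1] << 8));
}

/* Every entry starts out pointing at NormalKernel. */
static void init_jump_stream(void)
{
    for (unsigned line = 0; line < SCANLINES; line++)
        set_jump(line, NORMAL_KERNEL);
}
```

Each frame would call `init_jump_stream()` and then `set_jump(line, ...)` once for every scanline on which an object needs repositioning; the 6507 never has to test anything.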
 

 

The prior kernels would then have special reposition routines for the players that would take multiple scanlines to process. In Frantic it takes 4 scanlines while in Space Rocks it takes 2. Without the extra "which event" logic, the time to reposition a player is now down to 1 scanline:

 

Here's one of the 11 reposition kernels for player 0:

Resp0Strobe23:                  ;   19
        sta.w RESP0             ; 4 23
        lda #<DS_NUSIZ_COLUPF   ; 2 25
        sta NUSIZ0              ; 3 28
        lda #<DS_COLUP_HMMB     ; 2 30
        sta COLUP0              ; 3 33
        ldx DS_GRP1             ; 4 37
        lda #<DS_JUMP           ; 2 39
        sta NextKernel          ; 3 42
        lda #<DS_JUMP           ; 2 44
        sta NextKernel+1        ; 3 47
        sta HMCLR               ; 3 50 ' reset missile/ball HMOVE
        lda #<DS_HMP0           ; 2 52
        sta HMP0                ; 3 55
        lda #<DS_HMP1           ; 2 57
        sta HMP1                ; 3 60
        lda #<DS_GRP0           ; 2 62
        sta GRP0                ; 3 65 <- on VDEL, for next scanline
        lda #<DS_M1M0BL         ; 2 67 <- on VDEL, for next scanline
        sta.w ENABL             ; 4 71
        lsr                     ; 2 73
        jmp R76                 ; 3 76
 

 

And here's one for player 1:

Resp1Strobe23:                  ;   19
        sta.w RESP1             ; 4 23
        lda #<DS_NUSIZ_COLUPF   ; 2 25
        sta NUSIZ1              ; 3 28
        lda #<DS_COLUP_HMMB     ; 2 30
        sta COLUP1              ; 3 33
        ldx DS_GRP1             ; 4 37
        lda #<DS_JUMP           ; 2 39
        sta NextKernel          ; 3 42
        lda #<DS_JUMP           ; 2 44
        sta NextKernel+1        ; 3 47
        sta HMCLR               ; 3 50 ' reset missile/ball HMOVE
        lda #<DS_HMP0           ; 2 52
        sta HMP0                ; 3 55
        lda #<DS_HMP1           ; 2 57
        sta HMP1                ; 3 60
        lda #<DS_GRP0           ; 2 62
        sta GRP0                ; 3 65 <- on VDEL, for next scanline
        lda #<DS_M1M0BL         ; 2 67 <- on VDEL, for next scanline
        sta.w ENABL             ; 4 71
        lsr                     ; 2 73
        jmp R76                 ; 3 76
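
Picking which of the 11 kernels to use for a given X position could look something like the sketch below. It assumes the kernels strobe RESPx 5 CPU cycles (15 pixels) apart, with the leftover handled by the HMxx value. That is the usual divide-by-15 approach, but the post doesn't spell out the spacing, so treat `PIXELS_PER_KERNEL`, `Reposition` and `reposition_for_x` as hypothetical. The HMxx encoding itself (signed value in the upper nibble, positive values move left) is standard TIA behavior.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_STROBE_KERNELS 11  /* from the post: 11 reposition kernels per object */
#define PIXELS_PER_KERNEL  15  /* ASSUMPTION: strobes are 5 CPU cycles apart */

typedef struct {
    uint8_t kernel;  /* which of the 11 strobe kernels to jump to (0..10) */
    uint8_t hm;      /* fine-adjust value for the object's HMxx datastream */
} Reposition;

/* Split a 0..159 X position into a coarse kernel index and a fine shift. */
static Reposition reposition_for_x(unsigned x)
{
    assert(x < 160);
    Reposition r;
    /* Centre the remainder so the fine shift is -7..+7 pixels. */
    int shift_right = (int)(x % PIXELS_PER_KERNEL) - 7;
    r.kernel = (uint8_t)(x / PIXELS_PER_KERNEL);
    /* HMxx holds a signed 4-bit value in the upper nibble; positive moves
       left, so a shift to the right is stored negated. */
    r.hm = (uint8_t)(((-shift_right) & 0x0F) << 4);
    return r;
}
```

The kernel index would select the Jump Datastream entry and `hm` would go in the matching HMxx datastream. The exact pixel each strobe lands on still has to be calibrated against the real kernels.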
 

 

The major benefit of not having the 6507 spend CPU time making decisions is we can now also do mid-screen repositioning of both missiles:

Resm0Strobe23:                  ;   19
        sta.w RESM0             ; 4 23
        lda #<DS_JUMP           ; 2 25
        sta NextKernel          ; 3 28
        lda #<DS_JUMP           ; 2 30
        sta NextKernel+1        ; 3 33
        lda #<DS_NUSIZ_COLUPF   ; 2 35
        sta NUSIZ0              ; 3 38
        sta HMCLR               ; 3 41 ' reset missile/ball HMOVE
        ldx DS_GRP1             ; 4 45
        lda #<DS_COLUP_HMMB     ; 2 47
        sta HMM0                ; 3 50
        lda #<DS_HMP0           ; 2 52
        sta HMP0                ; 3 55
        lda #<DS_HMP1           ; 2 57
        sta HMP1                ; 3 60
        lda #<DS_GRP0           ; 2 62
        sta GRP0                ; 3 65 <- on VDEL, for next scanline
        lda #<DS_M1M0BL         ; 2 67 <- on VDEL, for next scanline
        sta.w ENABL             ; 4 71
        lsr                     ; 2 73
        jmp R76                 ; 3 76
        
Resm1Strobe23:                  ;   19
        sta.w RESM1             ; 4 23
        lda #<DS_JUMP           ; 2 25
        sta NextKernel          ; 3 28
        lda #<DS_JUMP           ; 2 30
        sta NextKernel+1        ; 3 33
        lda #<DS_NUSIZ_COLUPF   ; 2 35
        sta NUSIZ1              ; 3 38
        sta HMCLR               ; 3 41 ' reset missile/ball HMOVE
        ldx DS_GRP1             ; 4 45
        lda #<DS_COLUP_HMMB     ; 2 47
        sta HMM1                ; 3 50
        lda #<DS_HMP0           ; 2 52
        sta HMP0                ; 3 55
        lda #<DS_HMP1           ; 2 57
        sta HMP1                ; 3 60
        lda #<DS_GRP0           ; 2 62
        sta GRP0                ; 3 65 <- on VDEL, for next scanline
        lda #<DS_M1M0BL         ; 2 67 <- on VDEL, for next scanline
        sta.w ENABL             ; 4 71
        lsr                     ; 2 73
        jmp R76                 ; 3 76
 

 

as well as the ball:

ResblStrobe23:                  ;   19
        sta.w RESBL             ; 4 23
        lda #<DS_JUMP           ; 2 25
        sta NextKernel          ; 3 28
        lda #<DS_JUMP           ; 2 30
        sta NextKernel+1        ; 3 33
        lda #<DS_NUSIZ_COLUPF   ; 2 35
        sta COLUPF              ; 3 38
        sta HMCLR               ; 3 41 ' reset missile/ball HMOVE
        ldx DS_GRP1             ; 4 45
        lda #<DS_COLUP_HMMB     ; 2 47
        sta HMBL                ; 3 50
        lda #<DS_HMP0           ; 2 52
        sta HMP0                ; 3 55
        lda #<DS_HMP1           ; 2 57
        sta HMP1                ; 3 60
        lda #<DS_GRP0           ; 2 62
        sta GRP0                ; 3 65 <- on VDEL, for next scanline
        lda #<DS_M1M0BL         ; 2 67 <- on VDEL, for next scanline
        sta.w ENABL             ; 4 71
        lsr                     ; 2 73
        jmp R76                 ; 3 76
 

It might take a bit of time to digest that code, so here's a summary of what it can do:

  • any object (player0, player1, missile0, missile1 or ball) can be repositioned in a single scanline. In the time Frantic takes to reposition 1 player we can now reposition 4 objects.
  • players can be set for any size when they're repositioned. In theory they can also be set for duplicate and triplicate, but we don't use that feature because it conflicts with missile usage (i.e., if a player is set for multiple copies, so is the corresponding missile).
  • players can be set for any color when they're repositioned. Single color sprites only, as seen in Space Rocks. It doesn't support line-by-line color changes like in Frantic.
  • players can be shifted right/left on any scanline, which creates the illusion that 2x and 4x sized sprites have more than 8 pixels of horizontal detail. See reply 16 in the Space Rocks homebrew topic if you're not sure what this means.
  • missiles can be set to any size (sizes are 1x, 2x, 4x and 8x) when the missile is repositioned.
  • ball color can be changed whenever it's repositioned.
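
For anyone preparing the datastream bytes in C, here's a small sketch of two of the values involved. The NUSIZx bit layout is standard TIA (player size in bits 0-2, missile width in bits 4-5). The DS_M1M0BL layout is read off the listings above: A is stored to ENABL, shifted right, stored to ENAM0, shifted again and stored to ENAM1, and the TIA enable bit is D1, so BL sits in bit 1, M0 in bit 2 and M1 in bit 3. The function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* NUSIZx player field (bits 0-2) for the three single-copy sizes. */
enum PlayerSize { PLAYER_1X = 0x00, PLAYER_2X = 0x05, PLAYER_4X = 0x07 };

/* Build a NUSIZx value; missile_width must be 1, 2, 4 or 8 pixels. */
static uint8_t nusiz_value(enum PlayerSize player, unsigned missile_width)
{
    uint8_t width_bits = 0;
    switch (missile_width) {
    case 1: width_bits = 0; break;
    case 2: width_bits = 1; break;
    case 4: width_bits = 2; break;
    case 8: width_bits = 3; break;
    default: assert(!"missile width must be 1, 2, 4 or 8");
    }
    return (uint8_t)((width_bits << 4) | (uint8_t)player);
}

/* Pack the three enable flags the way the kernel's LSR sequence expects:
   bit 1 -> ENABL, bit 2 -> ENAM0 (after one LSR), bit 3 -> ENAM1 (after two). */
static uint8_t pack_m1m0bl(int m1_on, int m0_on, int bl_on)
{
    return (uint8_t)(((m1_on != 0) << 3) | ((m0_on != 0) << 2) | ((bl_on != 0) << 1));
}
```

For example, a quad-size player with an 8-pixel missile is `nusiz_value(PLAYER_4X, 8)`, which gives $37.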

 

The above code has been tested and confirmed to work. A project that takes advantage of these routines will soon be announced.


23 Comments


Recommended Comments

Hey Darrell, amazing progress! This slick kernel you put together, with the list of what it can do, just got me thinking as well.

 

I know the use of two sprites pieced together for that fancy explosion was not practical to use in the new reboot of Frantic, and I must admit to myself that I do miss the novelty of that explosion effect. But the player now being able to shift right/left on any scanline in this new kernel may present a possibility to bring back the fancy explosion. I suppose, as was done to the rocks in Space Rocks, I can use that same trick to mimic the fancy explosion in a single sprite and use the shifting method to preserve the same "wide" look and retain most of the visual likeness and color.

 

I'm not sure how much processing time this would use up, and I do understand it may get in the way of how you're trying to free up as much time and space as possible for the in-game speech, which takes priority over all else, but what do you think of the revamped explosion idea?

Nice work - this has to be for a Robotron kernel :)

It feels like it might be possible to optimise further by removing the following two lines:

 

lda #<DS_JUMP ; 2 30
sta NextKernel+1 ; 3 33

 

If you could arrange all the kernel entry points to be in the same bank, and repurpose the cycles for the jmp R76, i.e.:

 

KernelEntry1:
   jmp ActualKernel1
 
ActualKernel1:
  ...
  jmp (NextKernel)

 

But those 5 extra cycles are probably not enough to update the PF or anything useful ...

 

Chris

Could also add some SLEEP to normal kernel so that this could be done:

NormalKernel:                   ;   19  
        lda #<DS_NUSIZ_COLUPF   ; 2 21 ' resize players/missiles, color for ball
        lda #<DS_COLUP_HMMB     ; 2 23 ' use for missile/ball HMxx
...
        lsr                     ; 2 62 
        SLEEP 11                ;11 73

        stx GRP1                ; 3 76/0
        sta HMOVE               ; 3  3
        sta ENAM0               ; 3  6
        lsr                     ; 2  8
        sta ENAM1               ; 3 11
        jmp (NextKernel)        ; 5 16
...
KernelEntry1:                   ;   16 
        jmp ActualKernel1       ; 3 19

ActualKernel1:                  ;   19
        sta.w RESP0             ; 4 23
...

Hmm, I suppose I could :ponder:

 

At the moment it wouldn't be feasible for the project I'm using this for. Digital music or SFX imposes a performance hit due to the interrupt driven ARM routines that update AUDV0 while the ARM code is running. Considering only the 6507 code that updates the audio:

LDA AMPLITUDE
STA AUDV0

that's 7 cycles of scanline time. While the ARM code is running the 6507 is fed a NOP, so if we assume the 6507 is in the middle of a 2 cycle NOP then we can estimate that 8 cycles of time will be used for the interrupt. So we're looking at a minimum performance penalty of roughly 10% (8/76).

 

Normally I set TIM64T to $2B at the start of Vertical Blank. For this project I've already scooted the start of the display down, by using $2B+5, in order to give a little more processing time to VB. With the work I did last night VB gets down to 2 left in INTIM. In Stella I can see how much time is available for ARM code because it doesn't emulate the run time - it shows $1d, 29 decimal. A 10% hit on that will use up the remaining time in VB. In the current build I don't have any missiles turned on, so there's very minimal calculations going on for the missile datastream preparations. As soon as I add missiles I expect that 2 to drop, and am prepared to shift the screen down some more to give even more CPU time to VB. I'm not concerned about that as even when using $2B+5 my display currently starts higher than Space Invaders, Circus Atari, Crackpots, Demon Attack and especially Dodge Em.
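
The arithmetic behind that estimate can be sketched like this. It's a rough model only: it assumes one AUDV0 update per scanline (which is what 8/76 implies) and applies the penalty as a straight fraction of the ARM run time measured in TIM64T ticks. The function names are made up for the example.

```c
#include <assert.h>

enum {
    CYCLES_PER_SCANLINE = 76,
    LDA_ABS_CYCLES      = 4,  /* LDA AMPLITUDE */
    STA_ZP_CYCLES       = 3,  /* STA AUDV0 */
    NOP_TAIL_CYCLES     = 1   /* ASSUMPTION: interrupt lands mid 2-cycle NOP */
};

/* 6507 cycles consumed by one interrupt-driven audio update. */
static int audio_update_cycles(void)
{
    return LDA_ABS_CYCLES + STA_ZP_CYCLES + NOP_TAIL_CYCLES;
}

/* Penalty in tenths of a percent, assuming one update per scanline. */
static int audio_penalty_permille(void)
{
    return audio_update_cycles() * 1000 / CYCLES_PER_SCANLINE;
}

/* Extra timer ticks (rounded up) if ARM code that used arm_ticks ticks
   slows down by that fraction. */
static int extra_timer_ticks(int arm_ticks)
{
    assert(arm_ticks >= 0);
    int cycles = audio_update_cycles();
    return (arm_ticks * cycles + CYCLES_PER_SCANLINE - 1) / CYCLES_PER_SCANLINE;
}
```

With the 29 ticks Stella reports, `extra_timer_ticks(29)` comes out above the 2 ticks currently left in INTIM, which is why the audio interrupt would eat the remaining Vertical Blank time.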

 

I divide my logic up so that OverScan does all the game calculations (joystick processing, enemy movement, etc), while Vertical Blank does everything needed to populate the datastreams with the information to draw the current frame. It might be possible to shift some of the VB routines to OS, but since basically nothing is going on in OS right now it's too early to tell.

I have fast fetchers turned on, though I don't believe having them turned on is a requirement to use interrupt driven AUDV0 updates during custom ARM code execution. That's why I assumed 7 cycle updates when estimating the ARM performance penalty for using the audio interrupt.

 

It's possible the DPC+ driver outputs LDA #actual_volume/STA AUDV0 to the 6507 for the interrupt driven updates as there's no need to use LDA AMPLITUDE/STA AUDV0 or even LDA #<AMPLITUDE/STA AUDV0.

 

When working on the Energy Field routines I debated doing this and using the ball object so the color of the Energy Field could be set. Turns out it only frees up 2 cycles - we forgot to count the 3 cycles used by the addition of jmp ActualKernel1.

Argh - and that's not quite right either.

 

The problem is there's not enough ROM space in bank 5 to remove the jmp R76 and duplicate the code down to the jmp (NextKernel) for each of the 55 reposition kernels. Leaving the jmp R76 in place means we'd end up with only 2 free cycles.

 

It might be possible to do it, but I'd need to revamp everything so 6507 code can also run in bank 4.

 

The other issue with using the ball for the energy field is the starfield display routine that runs in Vertical Blank would have to be rewritten, and wouldn't be nearly as efficient as it is now. And since we already get jitter due to VB running out of time, it would only make the jitter problem worse.
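
The cycle and ROM trade-off in the jump-table idea can be tallied with a quick sketch. The instruction sizes and timings are standard 6502 figures, and the tail is the R76 sequence from the listing (sta HMOVE, stx GRP1, sta ENAM0, lsr, sta ENAM1, jmp (NextKernel)). The function names are hypothetical, and this says nothing about how much ROM is actually free in bank 5.

```c
#include <assert.h>

enum {
    LDA_IMM_CYCLES = 2, STA_ZP_CYCLES = 3, JMP_ABS_CYCLES = 3,
    STORE_ZP_BYTES = 2, LSR_BYTES = 1, JMP_IND_BYTES = 3, JMP_ABS_BYTES = 3
};

/* Dropping the high-byte lda/sta pair saves 5 cycles, but the extra
   jmp ActualKernel in the entry table costs 3 of them back. */
static int net_cycles_freed(void)
{
    return (LDA_IMM_CYCLES + STA_ZP_CYCLES) - JMP_ABS_CYCLES;
}

/* Size of the shared tail at R76: four zero page stores, lsr, jmp (ind). */
static int tail_bytes(void)
{
    return 4 * STORE_ZP_BYTES + LSR_BYTES + JMP_IND_BYTES;
}

/* Extra ROM if every kernel inlines the tail instead of using jmp R76. */
static int extra_rom_bytes(int kernels)
{
    assert(kernels >= 0);
    return kernels * (tail_bytes() - JMP_ABS_BYTES);
}
```

With the 55 reposition kernels mentioned above, inlining the tail costs `extra_rom_bytes(55)` = 495 bytes to win back the 3 cycles of each jmp R76; leaving the jmp in place frees only the 2 cycles from `net_cycles_freed()`.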

Hi Darrell,

 

Richard sent me to you since you are the guy that created the Mac version of MenuMaker. Since the Catalina 10.15.2 upgrade on my Mac, the new OS has stopped me from using the MenuMaker application to build new menus for newly added ROMs. I still have my previous ROMs on the cart from late summer, before the Catalina upgrade, which is fine, and I currently use my VecFever as a workaround to add new ROMs. But I'm curious if you happen to have a newer version that might work around this Catalina OS upgrade? My MenuMaker is from 2014; see the attached screenshot. If you don't have a newer version I am OK with that as well, but figured you might have one. Please help if you can.

Screen Shot 2019-12-27 at 8.01.52 PM.png

On 12/28/2019 at 1:05 PM, eyelyft said:

New updated VecMulti menumaker for MACs

 

Forgot about this until yesterday. I installed the 64-bit version of Lazarus, then discovered the source for MenuMaker is no longer on my machine. I'd purchased a new SSD for my Mac Pro back in 2016 and had to do a clean install, so probably lost it then.

 

I tracked down the last MenuMaker blog entry from 2014, but discovered it doesn't contain everything. Specifically, it's missing the Pascal source code! Opening up the project file MenuMaker.lpi via TextEdit reveals the problem:

 

    <Units Count="2">
      <Unit0>
        <Filename Value="MenuMaker.lpr"/>
        <IsPartOfProject Value="True"/>
        <UnitName Value="MenuMaker"/>
      </Unit0>
      <Unit1>
        <Filename Value="../../unit1.pas"/>   <!-- THIS LINE REVEALS THE PROBLEM -->
        <IsPartOfProject Value="True"/>
        <ComponentName Value="Form1"/>
        <HasResources Value="True"/>
        <ResourceBaseClass Value="Form"/>
        <UnitName Value="Unit1"/>
      </Unit1>
    </Units>

 

unit1.pas is the Pascal source, and the ../../ means it was located 2 directories up from the MenuMaker project. So it wasn't inside the directory I zipped up.

 

 
