
Step 4 - 2 Line Kernel


SpiceWare


Let's review the TIA Timing diagram from last time:

blogentry-3056-0-85527300-1404428061_thumb.png

 

We used that to determine when we could safely update the playfield data in order to draw the score and timer. For movable objects (player0, player1, missile0, missile1 and ball), if you update their graphics during the Visible Screen (cycles 23-76), you run the risk of shearing. For something that's moving fast, like the snowball in Stay Frosty 2, shearing may be an acceptable design compromise:

blogentry-3056-0-24448000-1404427717_thumb.png

 

That snowball should be square, but the left edge has sheared due to the ball object being updated mid-scanline.

 

To prevent shearing we need to update the objects on cycles 0-22. There are a lot of calculations to be done in the kernel to draw just one player. For Collect I'm using DoDraw, which looks like this for drawing player0:

DoDraw0:
        lda #HUMAN_HEIGHT-1 ; 2  2 - height of the humanoid graphics, subtract 1 due to starting with 0
        dcp HumanDraw       ; 5  7 - Decrement HumanDraw and compare with height
        bcs DoDrawGrp0      ; 2  9 - (3 10) if Carry is Set, then humanoid is on current scanline
        lda #0              ; 2 11 - otherwise use 0 to turn off player0
        .byte $2C           ; 4 15 - $2C = BIT with absolute addressing, trick that
                            ;        causes the lda (HumanPtr),y to be skipped
DoDrawGrp0:                 ;   10 - from bcs DoDrawGrp0
        lda (HumanPtr),y    ; 5 15 - load the shape for player0
        sta GRP0            ; 3 18 - update player0 to draw Human
 

 

 

That's 18 cycles to draw a single player. One way to make it easier to fit all the code in is to use a 2 Line Kernel (2LK). In a 2LK we update TIA's registers over 2 scanlines in order to build the display. For Collect, the current routines are updating them like this:

  1. player0, playfield
  2. player1, playfield

 

The actual code looks like this:

        ldy #ARENA_HEIGHT   ; 2  7 - the arena will be 180 scanlines (from 0-89)*2        
        
ArenaLoop:                  ;   13 - from bpl ArenaLoop
    ; continuation of line 2 of the 2LK
    ; this precalculates data that's used on line 1 of the 2LK
        lda #HUMAN_HEIGHT-1 ; 2 15 - height of the humanoid graphics, subtract 1 due to starting with 0
        dcp HumanDraw       ; 5 20 - Decrement HumanDraw and compare with height
        bcs DoDrawGrp0      ; 2 22 - (3 23) if Carry is Set, then humanoid is on current scanline
        lda #0              ; 2 24 - otherwise use 0 to turn off player0
        .byte $2C           ; 4 28 - $2C = BIT with absolute addressing, trick that
                            ;        causes the lda (HumanPtr),y to be skipped
DoDrawGrp0:                 ;   23 - from bcs DoDrawGrp0
        lda (HumanPtr),y    ; 5 28 - load the shape for player0
        sta WSYNC           ; 3 31
;---------------------------------------
    ; start of line 1 of the 2LK
        sta GRP0            ; 3  3 - @ 0-22, update player0 to draw Human
        ldx #%11111111      ; 2  5 - playfield pattern for vertical alignment testing
        stx PF0             ; 3  8 - @ 0-22
    ; precalculate data that's needed for line 2 of the 2LK        
        lda #HUMAN_HEIGHT-1 ; 2 10 - height of the humanoid graphics, subtract 1 due to starting with 0
        dcp BoxDraw         ; 5 15 - Decrement BoxDraw and compare with height
        bcs DoDrawGrp1      ; 2 17 - (3 18) if Carry is Set, then box is on current scanline
        lda #0              ; 2 19 - otherwise use 0 to turn off player1
        .byte $2C           ; 4 23 - $2C = BIT with absolute addressing, trick that
                            ;        causes the lda (BoxPtr),y to be skipped
DoDrawGrp1:                 ;   18 - from bcs DoDrawGrp1
        lda (BoxPtr),y      ; 5 23 - load the shape for the box
        sta WSYNC           ; 3 26
;---------------------------------------
    ; start of line 2 of the 2LK
        sta GRP1            ; 3  3 - @0-22, update player1 to draw box
        ldx #0              ; 2  5 - PF pattern for alignment testing
        stx PF0             ; 3  8 - @0-22
        dey                 ; 2 10 - decrease the 2LK loop counter
        bpl ArenaLoop       ; 2 12 - (3 13) branch if there's more Arena to draw
 

 

 

If you look at that closely, you'll see I'm splitting DoDraw a bit so that this is how the 2LK works:

  1. updates player0, playfield, precalc player1 for line 2
  2. updates player1, playfield, precalc player0 for line 1

By pre-calculating data during the visible portion of the scanline, we'll have more time during the critical 0-22 cycles for when we add the other objects.

 

Since we're updating the players on every other scanline, each byte of graphic data is displayed twice (compare the thickness of the humanoid pixels with the red lines drawn with the playfield). Also, the players never line up as they're never updated on the same scanlines:

blogentry-3056-0-26338200-1404425799_thumb.png

closeup:

blogentry-3056-0-94654100-1404430427.png

 

The designers of TIA planned for this by adding a Vertical Delay feature to the players and ball (though sadly not the missiles). The TIA registers for this are VDELP0, VDELP1 and VDELBL. For this update to Collect, I've tied the Vertical Delay to the difficulty switches: putting a switch in position A turns on the delay for that player, so we can experiment with how that works. For the next update I'll set the Vertical Delay based on the Y position of the player (this also means the maximum Y value will be double that of this build).

 

Left Difficulty A, Right Difficulty B so VDELP0 = 1 and VDELP1 = 0. Sprites line up when both have the same Y:

blogentry-3056-0-66399400-1404425804_thumb.png

closeup:

blogentry-3056-0-76069900-1404430433.png

 

Left Difficulty B, Right Difficulty A so VDELP0 = 0 and VDELP1 = 1. Sprites line up when player1's Y = player0's Y + 1:

blogentry-3056-0-89607500-1404425809_thumb.png

Closeup:

blogentry-3056-0-85314900-1404430439.png

 

 

The code that preps the data used by DoDraw looks like this:

    ; HumanDraw = ARENA_HEIGHT + HUMAN_HEIGHT - Y position
        lda #(ARENA_HEIGHT + HUMAN_HEIGHT)
        sec
        sbc ObjectY
        sta HumanDraw 
    
    ; HumanPtr = HumanGfx + HUMAN_HEIGHT - 1 - Y position
        lda #<(HumanGfx + HUMAN_HEIGHT - 1)
        sec
        sbc ObjectY
        sta HumanPtr
        lda #>(HumanGfx + HUMAN_HEIGHT - 1)
        sbc #0
        sta HumanPtr+1
    
    ; BoxDraw = ARENA_HEIGHT + HUMAN_HEIGHT - Y position
        lda #(ARENA_HEIGHT + HUMAN_HEIGHT)
        sec
        sbc ObjectY+1
        sta BoxDraw
    
    ; BoxPtr = HumanGfx + HUMAN_HEIGHT - 1 - Y position
        lda #<(HumanGfx + HUMAN_HEIGHT - 1)
        sec
        sbc ObjectY+1
        sta BoxPtr
        lda #>(HumanGfx + HUMAN_HEIGHT - 1)
        sbc #0
        sta BoxPtr+1
    
    ...
    
HumanGfx:
        .byte %00011100
        .byte %00011000
        .byte %00011000
        .byte %00011000
        .byte %01011010
        .byte %01011010
        .byte %00111100
        .byte %00000000
        .byte %00011000
        .byte %00011000
HUMAN_HEIGHT = * - HumanGfx        
 

 

 

The graphics are much easier to see using my mode file for jEdit:

 

blogentry-3056-0-02400400-1404431888.png

 

I'm sure some of you are wondering why the human graphics are upside down. If you wanted to loop thru something 10 times, you'd normally think to write the code like this:

        ldy #0
Loop:
    ; do some work
        iny
        cpy #10
        bne Loop
 

 

 

But the 6507 does an automatic check for 0 (as well as positive/negative) which lets you save 2 cycles of processing time by eliminating the CPY command:

        ldy #10
Loop:
    ; do some work
        dey
        bne Loop
 

 

 

Alternatively, if your initial value is less than 128, you can use this:

        ldy #(10-1)
Loop:
    ; do some work
        dey
        bpl Loop
 

 

 

Making the loop count down instead of up saves 2 cycles, but doing so requires the graphics to be upside down. 2 cycles doesn't sound like much, but in a scanline that's 2.6% of your processing time and saving it might be what allows you to update everything you want. In Kernels I've written, I often use every cycle - and that includes eliminating the sta WSYNC to buy back 3 cycles of processing time. See the reposition kernels in this post about Draconian.
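To put numbers on that, here's a small Python sketch (not part of Collect) that tallies the per-iteration loop overhead for both styles. The cycle counts are the standard 6502 timings for iny, dey, cpy with an immediate operand, and a taken bne that stays within a page. The 76 is the number of cycles in one scanline.

```python
CYCLES_PER_SCANLINE = 76

# Standard 6502 timings for the loop-control instructions used above.
CYCLES = {"iny": 2, "dey": 2, "cpy #": 2, "bne taken": 3}


def overhead(instructions):
    """Total cycles spent on loop control for one pass through the loop."""
    return sum(CYCLES[name] for name in instructions)


count_up = overhead(["iny", "cpy #", "bne taken"])   # ldy #0 ... iny / cpy #10 / bne
count_down = overhead(["dey", "bne taken"])          # ldy #10 ... dey / bne
saved = count_up - count_down
share = 100 * saved / CYCLES_PER_SCANLINE

print(count_up, count_down, saved, round(share, 1))  # 7 5 2 2.6
```

The bpl variant costs the same as the bne one, since both are a 2 cycle dey followed by a taken branch.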

 

I've also added joystick support that will let you move around the players. Pressing FIRE will slow down the movement, making it easier to line things up. The score (on the left) is used to display player0's Y position, and the timer is used for player1. As an added bonus, I'm showing how you can save ROM space by creating graphics that only face in one direction by using REFP0 and REFP1 (REFlect Player) to make the graphics face the other way. The routine's fairly sizable, so I'm not posting it here. Download the source code and check it out!

 

ROM

collect_20140703.bin

 

Source

 

Collect_20140703.zip

 

COLLECT TUTORIAL NAVIGATION

<PREVIOUS> <INDEX> <NEXT>

18 Comments



Is there a different way to line up missiles since it doesn't use the vertical delay feature? If I wanted to say make a 2 player bomberman game and use the bombs as missiles, or maybe something like smash tv with the bullets being missiles what could be done?

Link to comment

There's only so many cycles per line, so if you want to update both missiles on every line you'll have to drop something else. As an example of such a compromise, in Stay Frosty I used a reflected playfield and dropped the updates to PF0. That's why the upper level ice blocks and platforms never go to the edges of the screen. You can see that in this blog post, where I compare Stay Frosty with Stay Frosty 2.

 

So how did I get Stay Frosty 2 to go the full width? By using an in-cartridge coprocessor like they did back in the day for Pitfall II. The coprocessor is known as DPC+, but that's beyond the level of a beginner course like Collect. After you've finished working thru the Collect blog entries, go check out the Harmony DPC+ programming topic.

Link to comment

Thanks. I'm taking my time making sure I grasp everything going on. So in arenaloop you use dcp followed by bcs. Could you clarify what flag is being set? I assume it's the carry flag, but that seems strange for an instruction to set comparing equality. Although if it's equal that just means humandraw is +1 greater than height so I guess that could make sense getting put into the carry flag. Also after that you use .byte $2C to skip a few bytes vs using a branch(to save those bytes). What would you need to change to the bit pattern to make it skip more or less bytes?

 

Thanks!

Link to comment

I'm going to do this as 2 replies. This reply is for DCP.

 

The 6502 was designed with 151 opcodes which are known by their mnemonics of LDA, STA, etc. There's 256 possible opcodes, so the remaining 105 were undefined.

 

Over time people figured out that some of the undefined opcodes did really useful things and assigned them names. Some that are commonly used with 2600 development are DCP, LAX and SAX. You can see a list of them here under the section titled Illegal opcodes. Do note that some of them are unstable, and thus shouldn't be used. If you'd like more information check this document, How MOS 6502 Illegal Opcodes really work. Also note that these opcodes are interchangeably known as illegal opcodes as well as undefined opcodes.

 

DCP is named as such because it's a merging of the DEC and CMP opcodes. Basically this bit of code, which takes 10 cycles to run:

        lda #HUMAN_HEIGHT-1 ; 2 15 - height of the humanoid graphics, subtract 1 due to starting with 0
        dec HumanDraw       ; 5 20 - Decrement HumanDraw by 1
        cmp HumanDraw       ; 3 23 - Compare HumanDraw with height

does exactly the same thing as this bit of code, which takes only 7 cycles to run:

        lda #HUMAN_HEIGHT-1 ; 2 15 - height of the humanoid graphics, subtract 1 due to starting with 0
        dcp HumanDraw       ; 5 20 - Decrement HumanDraw and compare with height

The 3 cycles savings is very handy when writing 2600 code.

Link to comment

This reply is for .byte $2C

 

This is a 6502 trick I learned back in the 80s on my VIC-20. It's a space savings trick to skip over a 2 byte instruction.

 

If you take a look at that opcode matrix again, you'll see that $2C is the BIT abs opcode that takes 4 cycles to execute. The abs means 2 bytes follow the opcode to specify an absolute address.

 

If the bcs is taken, the 6507 skips over the lda #0 and .byte $2c and runs the code like this:

        bcs DoDrawGrp0      ; 2 22 - (3 23) if Carry is Set, then humanoid is on current scanline
DoDrawGrp0:                 ;   23 - from bcs DoDrawGrp0
        lda (HumanPtr),y    ; 5 28 - load the shape for player0
        sta WSYNC           ; 3 31

If the bcs is not taken, the 6507 runs the code like this:

        bcs DoDrawGrp0      ; 2 22 - (3 23) if Carry is Set, then humanoid is on current scanline
        lda #0              ; 2 24 - otherwise use 0 to turn off player0
        bit $93b1           ; 4 28 - BIT absolute swallows the 2 bytes of lda (HumanPtr),y as its address
        sta WSYNC           ; 3 31

Looking at the listing created by DASM, you'll see the lda (HumanPtr),y instruction is assembled like this:

    324  f8d7		       b1 93		      lda	(HumanPtr),y	; 5 28 - load the shape for player0

The b1 93 is the $93b1 address after the BIT instruction.
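One way to see the two readings of the same bytes is to decode them from both entry points. The Python sketch below is only an illustration: it knows just the four opcodes involved, and it assumes the byte pair after the lda (HumanPtr),y is the sta WSYNC from the kernel (WSYNC lives at $02).

```python
# Just enough of the 6502 opcode table for this example: (format, size in bytes).
OPCODES = {
    0xA9: ("lda #${:02x}", 2),          # LDA immediate
    0x2C: ("bit ${1:02x}{0:02x}", 3),   # BIT absolute (operand is little-endian)
    0xB1: ("lda (${:02x}),y", 2),       # LDA (zero page),Y
    0x85: ("sta ${:02x}", 2),           # STA zero page
}

# lda #0 / .byte $2C / lda (HumanPtr),y / sta WSYNC as they sit in ROM.
CODE = bytes([0xA9, 0x00, 0x2C, 0xB1, 0x93, 0x85, 0x02])


def decode(code, start):
    """Decode instructions from 'start' to the end of 'code'."""
    out = []
    pc = start
    while pc < len(code):
        fmt, size = OPCODES[code[pc]]
        operands = code[pc + 1 : pc + size]
        out.append(fmt.format(*operands))
        pc += size
    return out


print(decode(CODE, 0))  # bcs not taken: ['lda #$00', 'bit $93b1', 'sta $02']
print(decode(CODE, 3))  # bcs taken:     ['lda ($93),y', 'sta $02']
```

Entering at offset 0 turns the b1 93 into the operand of BIT, while entering at offset 3 (the DoDrawGrp0 label) executes those same bytes as the load.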

 

There are two reasons for using the $2C trick. The first is that it saves ROM space. The second is that the code takes the same number of cycles to execute whether or not the branch is taken. When writing a kernel, having consistent execution time is often critical. For this kernel, due to the use of sta WSYNC, the timing is not critical, though we're happy to get the space savings.

 

This blog entry, 6502 Assembly - .BYTE $2C - Insane Coding Trick, by Johnny Star may also help explain the use of .byte $2C. Addendum: this site is gone, but you can find the contents via this archive at Wayback Machine. The link in the menu on the right doesn't work, but if you scroll down to the end you'll find the full blog entry text.

Link to comment

In case it's not clear how the delay works:

  • If VDELP0 is on, any updates to GRP0 are delayed until GRP1 is written to.
  • If VDELP1 is on, any updates to GRP1 are delayed until GRP0 is written to.
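A toy Python model of that latching can make the behaviour easier to experiment with. It's a simplification that only tracks the two player graphics registers (the ball's VDELBL is left out), and the class and method names are made up for this sketch.

```python
class PlayerLatches:
    """Toy model of the GRP0/GRP1 'new' and 'old' (delayed) copies in TIA."""

    def __init__(self):
        self.new = [0, 0]             # value most recently written to GRP0 / GRP1
        self.old = [0, 0]             # delayed copy, used when VDELPx is on
        self.vdel = [False, False]    # VDELP0 / VDELP1

    def write_grp(self, player, value):
        """Write GRPx; as a side effect the other player's delayed copy is updated."""
        self.new[player] = value
        other = 1 - player
        self.old[other] = self.new[other]

    def displayed(self, player):
        """Graphics TIA would draw for this player right now."""
        return self.old[player] if self.vdel[player] else self.new[player]


tia = PlayerLatches()
tia.vdel[0] = True                 # Left Difficulty A: VDELP0 on
tia.write_grp(0, 0b00011000)       # line 1 of the 2LK: sta GRP0
print(tia.displayed(0))            # 0, the update is still pending
tia.write_grp(1, 0b00111100)       # line 2 of the 2LK: sta GRP1
print(tia.displayed(0))            # 24, player0 changes together with player1
```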
Link to comment

Good Morning,

 

I'm having a difficult time distinguishing between each Human variable. Could you please explain?

 

Does Humandraw = the Y position of the player on the screen? Why decrement it in the scan loop?

Does HumanPtr = the line of the player's graphic to be drawn now?

HUMAN_HEIGHT = ???

 

Also I don't understand this syntax. The asterisk mainly:

HUMAN_HEIGHT = * - HumanGfx

 

Thanks!

Link to comment

Been out of town, so just a quick response on HUMAN_HEIGHT. Will follow up on the rest later.

 

 

I always compile with the -s option to have dasm generate a symbol file. Open up collect.sym from the zip and you'll find:

 

 

HUMAN_HEIGHT             000a

All the symbol values are in hex, so HUMAN_HEIGHT has a value of 10 in decimal.

 

I also compile with the -l option to generate the listing. Open up collect.lst from the zip and you'll find:

 

 

    722  fa50    HumanGfx
    723  fa50        1c       .byte.b %00011100
    724  fa51        18       .byte.b %00011000
    725  fa52        18       .byte.b %00011000
    726  fa53        18       .byte.b %00011000
    727  fa54        5a       .byte.b %01011010
    728  fa55        5a       .byte.b %01011010
    729  fa56        3c       .byte.b %00111100
    730  fa57        00       .byte.b %00000000
    731  fa58        18       .byte.b %00011000
    732  fa59        18       .byte.b %00011000
    733  fa59        00 0a    HUMAN_HEIGHT = * - HumanGfx
    734  fa5a

The * denotes the current program counter (location in ROM). It's used as a synonym for "." (a period). From dasm's instruction file dasm.txt:

. -current program counter (as of the beginning of the instruction).

 

* -synonym for ., when not confused as an operator.

The listing is a little deceptive - the last byte value of 18 is located at fa59, so the * in the equation has a value of fa5a, not fa59 as the listing suggests.

 

HUMAN_HEIGHT = * - HumanGfx
HUMAN_HEIGHT = fa5a - fa50
HUMAN_HEIGHT = 000a

Basically I'm letting dasm calculate the size of the image. I do that because I usually start projects with placeholder graphics that are later replaced with images created by graphic artists such as Nathan Strum and David Vazquez. Do note those lists of projects they've contributed to should be larger - for instance Nathan did the graphics for Space Rocks, but that game's not yet in the database.

Link to comment

Yes, HumanDraw is the Y position and then some. Remember that TIA is scanline based so we need a way to programmatically determine if the sprite is to be drawn over a number of consecutive scanlines. The DCP is a thrifty (fast) way to do that.

 

Splitting the DCP into its components, we're running this bit of code on every scanline to determine if the sprite is drawn:

  lda #10-1 ; the height of the sprite, minus 1 (matches lda #HUMAN_HEIGHT-1 in the kernel)
  dec HumanDraw
  cmp HumanDraw
  bcs DrawSprite ; if 'C'arry is set, the sprite is on this scanline
  lda #0 ; sprite not on this scanline, so use 0 to blank it out
  jmp UpdateTIA ; the .byte $2c trick is equivalent to this
DrawSprite:
  lda (HumanPtr),y ; fetch the shape for this particular scanline
UpdateTIA:
  sta GRP0

During the compare the Carry flag will be set if HumanDraw < 10. So for 10 lines, when HumanDraw has the value from 0 to 9, the sprite data will be loaded into A. All other times the 0 is loaded into A in order to blank out the sprite.

 

To have the sprite start drawing on the 10th scanline, we set HumanDraw to 19. Due to the decrement BEFORE the compare, the code run on each scanline will check the following values of HumanDraw:

  • line 1 - is 18 < 10 - nope, blank the sprite
  • line 2 - is 17 < 10 - nope, blank the sprite
  • ...
  • line 9 - is 10 < 10 - nope, blank the sprite
  • line 10 - is 9 < 10 - yep, draw the sprite
  • line 11 - is 8 < 10 - yep, draw the sprite
  • ...
  • line 18 - is 1 < 10 - yep, draw the sprite
  • line 19 - is 0 < 10 - yep, draw the sprite
  • line 20 - is 255 < 10 - nope, blank the sprite*
  • line 21 - is 254 < 10 - nope, blank the sprite
  • ...
* when dealing with unsigned 1 byte values, 0 - 1 = 255, or in Hex that'd be $00 - $01 = $ff.
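If you'd like to check that table without firing up an emulator, this Python sketch mimics the decrement and compare on each scanline. It models the logic only, not the 6507 itself, and the function names are made up for the sketch.

```python
HUMAN_HEIGHT = 10


def dcp(a, mem):
    """Decrement mem with 8-bit wraparound, then compare A against it.

    Returns the new memory value and the Carry flag (set when A >= mem).
    """
    mem = (mem - 1) & 0xFF
    return mem, a >= mem


def drawn_lines(human_draw, scanlines):
    """Scanline numbers (starting at 1) on which the sprite data gets loaded."""
    lines = []
    for line in range(1, scanlines + 1):
        human_draw, carry = dcp(HUMAN_HEIGHT - 1, human_draw)
        if carry:
            lines.append(line)
    return lines


print(drawn_lines(19, 30))  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```

Starting HumanDraw at 19 draws on lines 10 through 19, and the wrap to 255 on line 20 is what switches the sprite back off.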
Link to comment

And for the final part of the question: yes, HumanPtr is used to point to the line of graphics to be drawn now.

 

The tricky part is it's too time consuming to adjust HumanPtr in the Kernel like we do HumanDraw, so we have to use the Y register to select the proper line of graphics for each scanline. However, Y won't be 0-9 while the sprite is being drawn, it'll be another range of 10 numbers in sequence, so we need to adjust the value of HumanPtr to compensate.

 

In other words, we need HumanPtr + Y to be equal to HumanGfx + HUMAN_HEIGHT - 1 when we hit the very first scanline for "draw the sprite" (as denoted in the prior reply).

 

Since Y is counting downward in the kernel, HumanPtr + Y will equal HumanGfx on the last "draw the sprite" scanline.
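Here's a rough Python check of that pointer arithmetic. Two values are assumptions on my part: ARENA_HEIGHT is taken to be 89 (a guess based on the "from 0-89" comment in the kernel), and HumanGfx is placed at $fa50 as in the listing shown earlier. The function names are hypothetical.

```python
ARENA_HEIGHT = 89      # assumption, see the note above
HUMAN_HEIGHT = 10
HUMAN_GFX = 0xFA50     # address of HumanGfx in the listing


def prep(object_y):
    """Mirror of the prep code: returns (HumanDraw, HumanPtr)."""
    human_draw = (ARENA_HEIGHT + HUMAN_HEIGHT - object_y) & 0xFF
    human_ptr = (HUMAN_GFX + HUMAN_HEIGHT - 1 - object_y) & 0xFFFF
    return human_draw, human_ptr


def fetched_addresses(object_y):
    """Addresses read by lda (HumanPtr),y on the 'draw the sprite' passes."""
    human_draw, human_ptr = prep(object_y)
    addresses = []
    for y in range(ARENA_HEIGHT, -1, -1):          # Y counts down like the kernel
        human_draw = (human_draw - 1) & 0xFF       # the decrement half of DCP
        if HUMAN_HEIGHT - 1 >= human_draw:         # Carry set, sprite on this pass
            addresses.append(human_ptr + y)
    return addresses


print([hex(a) for a in fetched_addresses(50)])
```

For a Y position of 50 the first fetch lands on HumanGfx + HUMAN_HEIGHT - 1 ($fa59) and the last on HumanGfx ($fa50), which is also why the graphics are stored upside down.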

 

Hope these three replies clear it up for you!

Link to comment

Thanks! It took me quite a few reads to digest it.

 

I can follow it.. Don't ask me to recite it though!

 

So:

DASM knows that * - HumanGfx is the "distance" from where HumanGfx started, making HUMAN_HEIGHT equal to that number of positions-right?

 

One other thing.. I understand setting the carry flag, but why does DCP set the carry flag here? What's the logic of that? Is it just something you know or does it make sense somehow? I don't see what's carrying over..

Link to comment

That's exactly right on * - HumanGfx.

 

 

The DCP instruction is really DEC and CMP, we use it because it's a few cycles faster than using the "legal instructions". From 6502.org Tutorials: Compare Instructions

The CMP, CPX, and CPY instructions are used for comparisons as their mnemonics suggest. The way they work is that they perform a subtraction. In fact,

    CMP NUM

is very similar to:

    SEC
    SBC NUM

Both affect the N, Z, and C flags in exactly the same way. However, unlike SBC, (a) the CMP subtraction is not affected by the D (decimal) flag, (b) the accumulator is not affected by a CMP, and (c) the V flag is not affected by a CMP. A useful property of CMP is that it performs an equality comparison and an unsigned comparison. After a CMP, the Z flag contains the equality comparison result and the C flag contains the unsigned comparison result, specifically:

  • If the Z flag is 0, then A <> NUM and BNE will branch
  • If the Z flag is 1, then A = NUM and BEQ will branch
  • If the C flag is 0, then A (unsigned) < NUM (unsigned) and BCC will branch
  • If the C flag is 1, then A (unsigned) >= NUM (unsigned) and BCS will branch
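Those rules are short enough to model directly. The Python function below is a sketch of how CMP sets the three flags for 8-bit unsigned values; the name cmp_flags is made up for the example.

```python
def cmp_flags(a, num):
    """Flags after CMP: A and NUM are unsigned 8-bit values (0-255)."""
    diff = (a - num) & 0xFF            # the subtraction CMP performs internally
    return {
        "N": bool(diff & 0x80),        # bit 7 of the result
        "Z": a == num,                 # equality
        "C": a >= num,                 # unsigned greater-than-or-equal
    }


# In the kernel A holds HUMAN_HEIGHT-1 (9) and NUM is the decremented HumanDraw.
print(cmp_flags(9, 9))     # C set: first line of the sprite
print(cmp_flags(9, 10))    # C clear: one line above the sprite
print(cmp_flags(9, 255))   # C clear: HumanDraw has wrapped past 0
```

So nothing is really "carrying over" in the DoDraw case; the C flag is simply reporting that A was greater than or equal to the value it was compared with.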
Link to comment

Thanks for your explanation of DoDraw; now I finally get how this works.

 

One question though: when preparing the HumanPtr pointer and subtracting the y-position of the player, I think it is possible this ends up pointing to an address in the page prior to the page where the gfx is located. Which - depending on the y-position of the player - can sometimes result in an extra cycle when doing lda (HumanPtr),y.

 

So to be cycle-exact every time, you probably need to place the player gfx somewhere at the end of a page, right?

Link to comment

You are correct! Surprisingly I'd not run into any problems due to that, most likely due to there being enough slack in the kernel that an extra cycle didn't cause a problem.
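As a quick illustration of when the extra cycle shows up, the Python sketch below applies the usual 6502 rule for lda (zp),y: 5 cycles, plus 1 if adding Y carries into the next page. The $fa50 address comes from the listing. The $fa10 placement is purely hypothetical, picked so the adjusted pointer falls into the previous page.

```python
HUMAN_HEIGHT = 10


def human_ptr(gfx_addr, object_y):
    """HumanPtr = HumanGfx + HUMAN_HEIGHT - 1 - Y position, as a 16-bit value."""
    return (gfx_addr + HUMAN_HEIGHT - 1 - object_y) & 0xFFFF


def lda_indirect_y_cycles(ptr, y):
    """Cycles for lda (zp),y when the zero page pointer holds ptr."""
    crossed = (ptr & 0xFF00) != ((ptr + y) & 0xFF00)
    return 6 if crossed else 5


# Graphics at $fa50: the pointer is $fa27, still in page $fa.
print(lda_indirect_y_cycles(human_ptr(0xFA50, 50), 50))  # 5
# Hypothetical graphics at $fa10: the pointer is $f9e7, so adding Y crosses a page.
print(lda_indirect_y_cycles(human_ptr(0xFA10, 50), 50))  # 6
```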

Link to comment

Hey there. I am trying to figure out why the timer and score are changing with the players positioning now? I'm sure there's an obvious answer but I'm not too good at reading code yet, so I can't quite find it. Thanks in advance!

Link to comment

@Elijah - check the comment just after PrepScoreForDisplay:

 

PrepScoreForDisplay:
    ; for testing purposes, set Score to Humanoid Y and Timer to Box Y
        lda ObjectY
        sta Score
        lda ObjectY+1
        sta Timer
        

 

Link to comment

Hi, I have a question about the PosObject subroutine, specifically regarding this instruction:
 

sta.wx HMP0,X


What does sta.wx mean? I can't find any information about the .wx after the sta. I removed it to see if there was any difference, and player 0 appears in a wrong position, so it must mean something.


image.thumb.png.1af1ef7f4fda65fb2c4754a311c92b20.png

 

I also noticed there are a total of 263 scanlines instead of 262. I understand that this isn't a problem, but I don't know if it's an error or was intentional.

Link to comment