
Session 24: Some nice code


In tutorial 22, we learned that to horizontally position a sprite, we need to trigger the RESPx register at the appropriate position in the scanline, at which point the sprite will display immediately. To move to an arbitrary horizontal position, we need to trigger RESPx just before the TIA is displaying the appropriate colour clock. Our solution has been to use the desired X-position of the sprite as the basis for a delay loop which starts at the beginning of a scanline, delays until roughly the correct position, adjusts the HMPx fine-tune horizontal position register and then 'hits' RESPx to immediately position the sprite.



Since the minimal time for a single loop iteration is 5 cycles (involving a register decrement, and a branch), and 5 cycles corresponds to 15 TIA colour-clocks, it follows that our delay-loop approach can only position RESPx writes with an accuracy of 15 TIA colour-clocks. This is fine, though, as the hardware capability of fine-positioning sprites by -8 to +7 pixels perfectly allows the correct position of the sprite to be established.


The approach taken previously has been to effectively divide the position by 15 (either through a table-lookup, or 'clever' code which simulated a divide by 15 using a divide by 16 (quick) + adjustment) and use that value as the iteration counter in a delay loop. This approach works, and has been fairly standard for a number of years. This is the approach presented in our earlier tutorial.


A recent posting to the [stella] list of an independent discovery of a 'new' method much improves on this technique. In actual fact, the technique was already known and documented in the list... but for various reasons these things don't always become well-known. The 'new' technique of horizontal positioning rolls the divide-by-15 and the delay loop into a single entity.




.Div15   sbc #15      ; 2

        bcs .Div15   ; 3(2)



Now that may not look like much, but it's absolutely brilliant! Every iteration through the loop, the accumulator is decremented by 15. When the subtraction results in a borrow (which the 6502 signals by clearing the carry flag), the accumulator has gone 'past' 0, the bcs falls through, and our loop ends. Each iteration takes exactly 5 cycles (with an extra 2 cycles added for the initial 'sec' and one less for the final branch not taken). The real beauty of the code is that we also, 'for free', get the correct -8 to +7 adjustment for the fine-tuning of the position (which with a little bit of fine-tuning can be used for the HMP0 register)! Read the relevant post on [stella] here... http://www.biglist.com/lists/stella/archiv...3/msg00260.html
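
As a rough check of the cycle counts described above, the loop can be modelled in a few lines of Python. This is only a sketch: the function name `divide_by_15` is made up for illustration, and it assumes an 8-bit accumulator with the carry set by 'sec' before the first pass.

```python
def divide_by_15(a):
    """Model 'sec' followed by the 'sbc #15 / bcs' loop (hypothetical helper).

    Returns (passes through the loop, final 8-bit accumulator, CPU cycles used).
    """
    passes = 0
    while True:
        passes += 1
        a -= 15                  # sbc #15 with the carry set
        if a < 0:                # borrow: carry is cleared, bcs falls through
            a &= 0xFF            # the accumulator wraps to 241..255 (-15..-1)
            break
    # 2 cycles for sec, 5 per pass, minus 1 because the last branch is not taken
    cycles = 2 + 5 * passes - 1
    return passes, a, cycles


if __name__ == "__main__":
    for position in (0, 15, 66):
        print(position, divide_by_15(position))
```

For example, `divide_by_15(66)` reports 5 passes and leaves 247 in the accumulator, which is -9 in two's complement.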


For this brilliant bit of coding, our thanks go to R. Mundschau



; Positions an object horizontally
; Inputs: A = Desired position.
; X = Desired object to be positioned (0-5).
; scanlines: If control comes on or before cycle 73 then 1 scanline is consumed.
; If control comes after cycle 73 then 2 scanlines are consumed.
; Outputs: X = unchanged
; A = Fine Adjustment value.
; Y = the "remainder" of the division by 15 minus an additional 15.
; control is returned on cycle 6 of the next scanline.


            sta WSYNC                  ; 00     Sync to start of scanline.
            sec                        ; 02     Set the carry flag so no borrow will be applied during the division.
.divideby15 sbc #15                    ; 04     Waste the necessary amount of time dividing X-pos by 15!
            bcs .divideby15            ; 06/07  11/16/21/26/31/36/41/46/51/56/61/66

            tay                        ; 08     Copy the remainder (-1 to -15) into Y for the table lookup.
            lda fineAdjustTable,y      ; 13 -> Consume 5 cycles by guaranteeing we cross a page boundary
            sta HMP0,x                 ; 17
            sta RESP0,x                ; 21/ 26/31/36/41/46/51/56/61/66/71 - Set the rough position.


; This table converts the "remainder" of the division by 15 (-1 to -15) to the correct
; fine adjustment value. This table is on a page boundary to guarantee the processor
; will cross a page boundary and waste a cycle in order to be at the precise position
; for a RESP0,x write

            ORG $F000

fineAdjustBegin
            DC.B %01110000 ; Left 7
            DC.B %01100000 ; Left 6
            DC.B %01010000 ; Left 5
            DC.B %01000000 ; Left 4
            DC.B %00110000 ; Left 3
            DC.B %00100000 ; Left 2
            DC.B %00010000 ; Left 1
            DC.B %00000000 ; No movement.
            DC.B %11110000 ; Right 1
            DC.B %11100000 ; Right 2
            DC.B %11010000 ; Right 3
            DC.B %11000000 ; Right 4
            DC.B %10110000 ; Right 5
            DC.B %10100000 ; Right 6
            DC.B %10010000 ; Right 7

fineAdjustTable EQU fineAdjustBegin - %11110001 ; NOTE: %11110001 = -15



One interesting aspect of this code is the access to the table with a (conceptual) negative index (-1 to -15 inclusive). Negative numbers are represented in two's complement form, so -1 is %11111111 which is *exactly* the same as 255 (%11111111). So how can we use negative numbers as indexes? We can't! All indexing is considered to be with positive numbers. So if our index was -1, we would actually index 255 bytes past the beginning of our table. The neat bit of code at the bottom sets the conceptual start of our table to 241 bytes BEFORE the start of the actual data so that when we attempt to access the -15th element of the table, we ACTUALLY end up at the very first byte of the "fineAdjustBegin" table. Likewise, when accessing the -1th element, we ACTUALLY access the last element of the table. It's all very neat!


Finally, since we need to account for every cycle in this code very carefully (as the horizontal position depends on exactly where we write the RESP0 value), we need to take into account the possibility that an extra cycle is being thrown in when we access fineAdjustTable,y and that access crosses a page boundary. By positioning the table being accessed exactly on a page boundary, the code guarantees that every access incurs an extra cycle 'penalty' and is therefore consistent for all cases.
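
The address arithmetic behind the negative index and the page-boundary penalty can be checked with a small Python sketch. The constants mirror the listing above (`ORG $F000` and the `fineAdjustTable` offset); the helper names are made up for illustration.

```python
FINE_ADJUST_BEGIN = 0xF000                            # ORG $F000 in the listing
FINE_ADJUST_TABLE = FINE_ADJUST_BEGIN - 0b11110001    # conceptual start, 241 bytes earlier


def effective_address(y):
    """Address read by 'lda fineAdjustTable,y' for an 8-bit index y."""
    return FINE_ADJUST_TABLE + y


def crosses_page(y):
    """True when the indexed address is on a different page from the base address."""
    return (FINE_ADJUST_TABLE >> 8) != (effective_address(y) >> 8)


if __name__ == "__main__":
    for remainder in (-15, -8, -1):
        y = remainder & 0xFF                          # two's complement: -15 is 241
        print(remainder, y, hex(effective_address(y)), crosses_page(y))
```

With these values the base address is $EF0F, index 241 (-15) lands on $F000 and index 255 (-1) lands on $F00E, so every lookup crosses from page $EF into page $F0 and takes the same number of cycles.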


I don't take any credit for this, I just admire it. I consider this a BRILLIANT bit of coding, so hats-off to R. Mundschau and thanks for sharing!


Another "BRILLIANT" bit of code, but this time from yours truly, is the 8-byte system clear. We touched on this earlier in Session 12, but I thought I'd give a quick run-down on exactly how that code works...


       ldx #0
       txa
Clear   dex
       txs
       pha
       bne Clear



We assume that when this code starts, the system is in a totally unknown state. Firstly, X and A are set to 0, and we enter the loop.

The loop begins: the X register is decremented (to 255) and this value is placed in the stack pointer (now $FF).

The accumulator (0) is then pushed onto the stack, so memory/hardware location $FF is set to 0, and the stack pointer decrements to $FE.

Since the txs and pha don't affect the flags, the branch will be based on the decrement of the X register.

If non-zero, we repeat the loop. 0 will be written to 256 consecutive memory locations starting with $FF and ending with 0 (inclusive). The loop will terminate after 256 iterations.

On the final pass through, X is decremented to 0, and this is placed in the stack pointer. We then push the accumulator (0) onto the stack (which effectively writes it to memory (TIA) location 0) and as a consequence the stack pointer decrements (and wraps!) back to $FF.

At the conclusion of the above, X = 0, A = 0, SP = $FF, a near-perfect init!
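
The walkthrough above can be traced with a short Python model. This is a sketch only: `clear_system` is a made-up name, and the model assumes, as the description does, that a push with the stack pointer at SP writes to location SP in the 256-byte TIA/RAM space.

```python
def clear_system():
    """Trace ldx #0 / txa / Clear: dex / txs / pha / bne Clear (hypothetical model)."""
    memory = [0xAA] * 256        # stand-in for the unknown power-on state
    x = 0                        # ldx #0
    a = x                        # txa
    passes = 0
    while True:
        x = (x - 1) & 0xFF       # dex (sets the zero flag when X reaches 0)
        sp = x                   # txs (does not touch the flags)
        memory[sp] = a           # pha writes A at the location SP points to
        sp = (sp - 1) & 0xFF     # ...and then decrements SP, wrapping 0 -> $FF
        passes += 1
        if x == 0:               # bne Clear falls through once dex produced 0
            break
    return x, a, sp, memory, passes


if __name__ == "__main__":
    x, a, sp, memory, passes = clear_system()
    print(x, a, hex(sp), passes, all(byte == 0 for byte in memory))
```

The model ends with X = 0, A = 0 and SP = $FF after 256 passes, with all 256 locations written to zero.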


That could be the best 8-bytes ever written ;)


I must be missing something fundamental, because I just can't figure out how this horizontal positioning code works.


Take two examples, a horizontal position of 0 and 66 (relative to the left edge of the screen).


If A is 0, I would think that the sprite should display 68 TIA clocks into the scanline (i.e. after the horizontal blank).


However in the code above, it looks like the STA RESP0,X completes 21 cpu cycles after the STA WSYNC. This is 63 TIA clocks into the scanline. The Y value used for the table lookup is -15. After some research on the [stella] list, I also found out that there is a 5 pixel delay between when RESPx is set, and where the sprite is displayed. So it looks like the sprite will be displayed at 63+5-7 = 61 TIA clocks into the scanline. This is off the screen to the left.


If A is 66, I would think the sprite should be displayed at 134 TIA clocks into the scanline (66 clocks after the horizontal blank).


However in the code above, it looks like 5 subtractions are done leaving -9 as the table lookup value. The STA RESP0,X is done 41 cpu cycles after the STA WSYNC. So it looks like the sprite will start at (41*3)+5-1 = 127 TIA clocks into the scanline. A bit left of where I thought it should go.


So what am I missing here? :?


You are correct. The code does not have A=0 = pixel #0; a +7 offset is required.
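
The arithmetic in the question can be repeated for every position with a small Python model. It is only a sketch under the question's own assumptions: 3 TIA clocks per CPU cycle, a 5-clock delay after the RESPx write, the Left 7 to Right 7 table from the first post, and the fine adjustment actually being applied. The names `sprite_clock` and `visible_pixel` are made up for illustration.

```python
HBLANK_CLOCKS = 68    # TIA clocks before the visible part of the scanline
RESP_DELAY = 5        # delay quoted in the question (taken as an assumption here)


def sprite_clock(a):
    """TIA clock at which the sprite starts for input A, under the assumptions above."""
    passes = a // 15 + 1                  # passes through the sbc/bcs loop
    remainder = a - 15 * passes           # -15 .. -1, the table index
    resp_cycle = 21 + 5 * (passes - 1)    # cycle at which 'sta RESP0,x' completes
    fine = remainder + 8                  # table: -15 -> left 7 (-7) ... -1 -> right 7 (+7)
    return resp_cycle * 3 + RESP_DELAY + fine


def visible_pixel(a):
    """Pixel relative to the left edge of the visible screen."""
    return sprite_clock(a) - HBLANK_CLOCKS


if __name__ == "__main__":
    for a in (0, 7, 66):
        print(a, sprite_clock(a), visible_pixel(a))
```

Under these assumptions the sprite always starts at TIA clock A + 61, which is pixel A - 7, so the two examples in the question (61 and 127) and the +7 offset are consistent with each other.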


Some other possible modifications to the given code:

1. If fineAdjustBegin is located at $xxF1 and fineAdjustTable = $xx00, then the page boundary penalty cycle can be avoided.

2. The fineAdjust table could also convert -1 to -15 to Left 6 - Right 8; which may be useful in some situations.


There are other possible variations of this code (search [stella] for "Revolutionary Horizontal Positioning") most notable being variations which do not require the fineAdjust table (at the cost of some additional instructions and cycles) and ones optimized for positioning both player sprites next to one another (for 16 or 48 bit sprites).


Note: when developing a repositioning routine don't forget to cycle count the right edge too! It's very easy to slip past cycle 73 if you are ending the routine with a STA WSYNC/STA HMOVE combo and add an extra line.


Also don't forget that HMnn remains set even after an HMOVE. So do a STA HMCLR if you don't want sprites to move again.


Oh, one other "trick". If the SEC is a CLC, then the first subtraction will be A-16. The fineAdjust table will need to be expanded to contain Left 7 to Right 8. This can be used to get one more pixel of range out of the routine.


; X = Desired object to be positioned (0-5).  

To clarify looking at vcs.h and how X is used, that's

0 = Player0

1 = Player1

2 = Missile0

3 = Missile1

4 = Ball


(And I'm guessing that's a minor typo, it should be 0-4, else the routine hits AUDC0 and VDELP0 when X = 5)


One thing it took me awhile to realize is how you can take advantage of the way TIA addresses are laid out in logical orders, so for example


is P0 if X is 0 and P1 if X is 1...that kind of thing. I've seen it most often used in this kind of reusable positioning code, actually...




Sorry I didn't see this thread earlier. Thanks, Andrew, for highlighting my code. Here are some comments and corrections.


First, regarding the correlation between the value in A and the resulting position of the object on screen. It is true that assuming the pixels on screen are numbered 0 to 159, then placing the corresponding value in A and calling this routine will not position the object at the expected pixel. This is one of the quirks of horizontal positioning on the 2600. The resulting position is: Xreal = K + Xdesired. The value of K will vary in different implementations of the horizontal positioning algorithm. K is the extra overhead of the routine after the write to WSYNC on top of the division by 15 of Xdesired. By varying K in your code you can control the mapping of Xdesired to Xreal in your program. There is some entertaining math related to this problem, and if I ever get a free hour or two I will write it up and post it here.


Errata: The range of values in X is 0-4, not 0-5. Sorry, that's a typo. Good catch!


  • 9 months later...

I must be missing something trivial, because this horizontal positioning routine isn't working right for me. I have a variable that I use to store the position of player 1. I increment it every frame and store it to A before I call the routine. The problem is I still get the jerky leaps that I did when I simply used a loop. Are there any gotchas with this?


I don't have my code with me, but I'll post it up when I get a chance.


It is true that assuming the pixels on screen are numbered 0 to 159, then placing the corresponding value in A and calling this routine will not position object at the expected pixel.


My feeble mind had given up trying to figure out things like Xreal and K :lol:


Instead, I just use a table. 160 bytes of overhead, but simple enough to grasp. When you get involved with bankswitching, it's not like rom is scarce or anything ;)


Here is a subroutine that takes exactly 2 scanlines (plus change) every time, no matter the horizontal position.


Going in, A = horizontal position 0-160 (0==160 and, actually, I think you could use up to 164 as a horiz position, w/ 160-164 == 0-5).

and X = player number, as above.




       sec            ;doing this before so that I have more time
                      ;during the next scanline.
       sta WSYNC      ;begin line 1
DivideLoop
       sbc #15
       bcs DivideLoop                 ;+4/5   4/9.../54
       tay                            ;+2     6
       lda FineAdjustTableEnd,Y       ;+5     11
       nop
       nop            ;+4     15/20/etc.   - 4 free cycles!
       sta HMP0,X     ;+4     19/24/...
       sta RESP0,X    ;+4     23/28/33/38/43/48/53/58/63/68/73
       sta WSYNC      ;+3      0       begin line 2
       sta HMOVE      ;+3
       rts            ;+6      9


You need to call this subroutine with at least 11 cycles left in the scanline (time for the jsr, sec, and sta WSYNC) and it returns 9 cycles into the 3rd scanline.


And, you need the table:

	org $FF00

FineAdjustTableBegin
.byte %01100000	;left 6
.byte %01010000
.byte %01000000
.byte %00110000
.byte %00100000
.byte %00010000
.byte %00000000	;left/right 0
.byte %11110000
.byte %11100000
.byte %11010000
.byte %11000000
.byte %10110000
.byte %10100000
.byte %10010000
.byte %10000000	;right 8

FineAdjustTableEnd	=	FineAdjustTableBegin - 241

I attached a very simple .bin showing this in action; the binary calls the function about 100 scanlines down the visible screen so you can see that it takes the same # of scanlines for every horizontal position (run it in z26 with the -n flag to see that it is a constant 262 scanlines).



OK, I checked my code and it seems the problem was the routine I was using (the one at the top of this page) did not store anything in HMOVE. I added sta HMOVE at the end of the routine and it works good now. Is there a reason why it was not included in the code at top?


I wonder if the table is really necessary. I tried something similar in a recent project of mine and it indeed worked, with no table. It went something like this:

       sta WSYNC      ;begin line 1
DivideLoop
       sbc #15
       bcs DivideLoop                 ;+4/5   4/9.../54
;need 13 cycles
           EOR #$0F;2- convert negative to positive
           SBC #8; 2 -subtract 8
           ASL;2
           ASL;2
           ASL;2
           ASL;2 -Shift left 4 places
;there's 12 cycles above, do we need another cycle?
       sta HMP0,X     ;+4     19/24/...
       sta RESP0,X    ;+4     23/28/33/38/43/48/53/58/63/68/73
       sta WSYNC      ;+3      0       begin line 2
       sta HMOVE      ;+3


Is there any reason why the above won't work?


Wait, I forgot the EOR was outside the scanline. Also, they don't have proper ventilation in the computer lab at school so my brain gets fried after a while. I've also included cycle counts to make sure it might actually work.


      EOR #$FF;0-159 is now 255-96

      sta WSYNC      ;begin line 1
DivideLoop
       ADC #15
       BCC DivideLoop                 ;54 max
           SBC #7; 2
           ASL;2
           ASL;2
           ASL;2
           ASL;2 -Shift left 4 places
       sta HMP0,X     ;+4     68
       sta RESP0,X    ;+4     72
;maybe an extra cycle can be found somewhere?
       sta WSYNC
       sta HMOVE      ;


Another advantage of mine is that it doesn't use the Y register, which I found helpful because I used it throughout my kernel as a scanline counter.


I tried Nukey's assembly but ran both methods at the same time, and my method positioned the sprite exactly four pixels to the left of the other, three of them because mine does the STA HMP,x one cycle sooner. I can't think of an elegant way to burn up an extra cycle in the code. Maybe someone who is smarter than me knows a slick way to do this.


Regardless, I don't think there's any need to adjust this if it's used in a new kernel, but if someone wanted to use the code as a drop-in replacement in an existing kernel, one could add 3 to the accumulator before calling the procedure and change the SBC #7 to SBC #8.


I tried it, and for some reason it didn't work. Seems like it should, since the STA.w is supposed to take 5 cycles, but the sprites didn't change positions.


However, it's probably not a big deal that the two methods don't align perfectly, as neither would put zero at exactly the left edge of the screen and 159 at the right anyway, as I understand. You could almost as easily drop in this code as-is while changing a few other numbers around to get their sprites lined up right.


The battlezone method shown in that link saves 2 bytes over mine. The EOR #7 didn't occur to me. But at least I came close. I thought up my method last night in a brief stint of lucidity after reading through the Stella programming guide, so when I saw that table in this thread, I knew there was a better way.


Based on the date in that link, people here have known about battlezone for years, so I wonder why they presented the table method in the tutorial, since the battlezone method works just as well and saves 15 bytes.


Anyway, here's what I'm going to use:

       clc             ;2
       STA WSYNC       ;3 begin line 1
DivideLoop
       SBC #15        ;2
       BCS DivideLoop ;54 max
       EOR #7         ;2
       ASL            ;2
       ASL            ;2
       ASL            ;2
       ASL            ;2 -Shift left 4 places
       sta HMP0,X     ;4     68
       sta RESP0,X    ;4     72
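
To see how the EOR #7 and four ASLs stand in for the table, here is a small Python check. It is a sketch with a made-up helper name (`hm_from_remainder`); it assumes the loop exits with the remainder $F1..$FF (-15..-1) in the accumulator and compares the result with the Left 6 to Right 8 table posted earlier in the thread.

```python
# The Left 6 .. Right 8 table from the earlier post, indexed by remainder -15 .. -1.
TABLE = [0x60, 0x50, 0x40, 0x30, 0x20, 0x10, 0x00,
         0xF0, 0xE0, 0xD0, 0xC0, 0xB0, 0xA0, 0x90, 0x80]


def hm_from_remainder(a):
    """Model 'eor #7' followed by four 'asl' instructions on an 8-bit accumulator."""
    a ^= 0x07                 # eor #7 flips the low three bits
    for _ in range(4):
        a = (a << 1) & 0xFF   # asl: shift left, dropping the bit that leaves the byte
    return a


if __name__ == "__main__":
    computed = [hm_from_remainder(r) for r in range(0xF1, 0x100)]
    print([hex(value) for value in computed])
    print(computed == TABLE)
```

For remainders $F1 through $FF the computed values match the 15 table bytes one for one, so the shift-based version yields the same HMxx values as the table lookup.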


Based on the date in that link, people here have known about battlezone for years, so I wonder why they presented the table method in the tutorial...

Well, which method is best depends on the situation. Some are smaller but slower than others.


So, inside a kernel you may want to use the table method (though it requires an additional register), outside you will probably choose the shortest code.


Hi there!


So, inside a kernel you may want to use the table method (though it requires an additional register), outside you will probably choose the shortest code.


I originally added the table for Star Fire, because there I reposition two sprites at once per scanline:


-> http://www.biglist.com/lists/stella/archiv...1/msg00165.html


With 2 RESPs, the extra shift for the second RESP and the need to jump to the beginning of the next scanline, the ASL method just wasn't fast enough ;)



