Andrew Davie (Author), posted June 28, 2014

It's a bit like someone coming up to Yoda and thanking him for his advice and calling him "young master"... with no idea he's talking to the most powerful Jedi master of all.

bogax, posted August 8, 2014 (edited)

> Yes, that's a good 8-byte clear. It clears memory but doesn't set the stack pointer. You should look at Session 24 "Some nice code" for a better one...
>
>         ldx #0
>         txa
>     Clear
>         dex
>         txs
>         pha
>         bne Clear
>
> The above exits with stack pointer set to $FF, all memory zeroed, X and A zero.

How about

        ldx #$FF
        txs
        lda #$00
    LOOP
        tsx
        pha
        bne LOOP

Omegamatrix can stick in a few extra pha's at the beginning of the loop, as long as the total number of pha's per pass divides evenly into 256.

bogax, posted August 8, 2014 (edited)

> Yes, that is a brilliant, optimal solution. Just stick a CLD on there and you are done.
>
> I sometimes run into a situation where I need to re-boot or switch to an entirely new kernel (say, title screen to a playing screen). The main issue to avoid is scanline bounces. The optimal code takes about 36 scanlines to complete, which is too long. I made a routine that saves about 26 scanlines, with the trade-off of using many more bytes. I tried to balance the byte cost against the number of scanlines gained, and this was the best balance I could find:
>
>         cld
>         lda #0
>         ldx #$2C
>         txs
>     .loopClear:
>         pha
>         pha
>         pha
>         pha
>         pha
>         pha
>         tsx
>         cpx #$7E
>         bne .loopClear
>         ldx #$FF
>         txs

Like this (see above):

        ldx #$FF
        txs
        lda #$00
    LOOP
        pha
        pha
        pha
        pha
        pha
        pha
        pha
        tsx
        pha
        bne LOOP
        cld

Andrew Davie (Author), posted August 9, 2014

> How about
>
>         ldx #$FF
>         txs
>         lda #$00
>     LOOP
>         tsx
>         pha
>         bne LOOP

It's not better as it stands: 9 bytes instead of 8. However, in the special case where you need extra speed it's definitely quicker.

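The byte-versus-speed comparison above can be checked with a rough Python model of the two loops. This is only a sketch under stated assumptions: page zero is treated as a plain 256-byte array (the real 2600 maps TIA registers and mirrors into the lower half), the cycle counts are the standard 6502 timings with no page-crossing penalty on the branch, setup instructions before the loop are not counted, and the function names are made up for illustration.

```python
def clear_dex_txs():
    """Model of: ldx #0 / txa / Clear: dex / txs / pha / bne Clear"""
    mem = [0xAA] * 256              # junk-filled stand-in for $00-$FF
    x = 0                           # ldx #0
    a = x                           # txa
    loop_cycles = 0
    while True:
        x = (x - 1) & 0xFF          # dex (2 cycles, sets Z from X)
        sp = x                      # txs (2 cycles, no flags)
        mem[sp] = a                 # pha (3 cycles): store at SP...
        sp = (sp - 1) & 0xFF        # ...then decrement SP
        loop_cycles += 2 + 2 + 3
        if x != 0:                  # bne tests the Z flag left by dex
            loop_cycles += 3        # branch taken
        else:
            loop_cycles += 2        # branch not taken, loop ends
            break
    return mem, a, x, sp, loop_cycles


def clear_tsx_pha():
    """Model of: ldx #$FF / txs / lda #$00 / LOOP: tsx / pha / bne LOOP"""
    mem = [0xAA] * 256
    x = 0xFF                        # ldx #$FF
    sp = x                          # txs
    a = 0x00                        # lda #$00
    loop_cycles = 0
    while True:
        x = sp                      # tsx (2 cycles, sets Z from X)
        mem[sp] = a                 # pha (3 cycles)
        sp = (sp - 1) & 0xFF
        loop_cycles += 2 + 3
        if x != 0:                  # bne tests the Z flag left by tsx
            loop_cycles += 3
        else:
            loop_cycles += 2
            break
    return mem, a, x, sp, loop_cycles


for routine in (clear_dex_txs, clear_tsx_pha):
    mem, a, x, sp, cycles = routine()
    print(routine.__name__, "all zero:", not any(mem),
          "A=%02X X=%02X SP=%02X" % (a, x, sp), "loop cycles:", cycles)
```

In this model both loops leave every byte zeroed with A = X = 0 and SP = $FF; the second loop saves 2 cycles on each of its 256 passes.
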
LS_Dracon, posted August 9, 2014 (edited)

>         ldx #$FF
>         txs
>         lda #$00
>     LOOP
>         tsx
>         pha
>         bne LOOP

Hmm... would this work?

        lax #$00
        dex
        txs
    Loop
        tsx
        pha
        bne Loop

EDIT: or

        lax #$00
    Clear
        dex
        txs
        pha
        bne Clear

Nukey Shay, posted August 9, 2014

Although LAX #imm is supposedly always stable when the argument is zero, undocumented opcodes are not supported on all hardware.

Omegamatrix, posted August 9, 2014 (edited)

>         cld
>         lda #0
>         ldx #$2C
>         txs
>     .loopClear:
>         pha
>         pha
>         pha
>         pha
>         pha
>         pha
>         tsx
>         cpx #$7E
>         bne .loopClear
>         ldx #$FF
>         txs

I came up with a more optimized solution, which I posted on my blog a while ago:

        cld
        lda #0
        ldx #CXCLR
        txs
        ldx #28
    .loopClearFaster:
        pha
        pha
        pha
        pha
        pha
        pha
        dex
        bpl .loopClearFaster
        txs

@Bogax, the point of the above code is to balance speed against bytes used. By starting at CXCLR and working down, a lot more cycles are saved (plus fewer loop passes, from stuffing in multiple PHAs). The above code takes only 10 scanlines and 48 cycles. That's pretty good performance. The compact code takes 36 scanlines and 22 cycles to complete.

You typically only need speed if you are switching kernels, say from a title screen to a playing screen, and want to easily avoid scanline bounces.

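To see which addresses the faster routine actually touches, here is a small Python sketch of its push sequence. It assumes CXCLR is at $2C (the value the earlier version of the routine loads directly) and models only the stack pointer and X; `fast_clear` is a name invented for this illustration.

```python
CXCLR = 0x2C                        # assumed TIA address, as in "ldx #$2C" above


def fast_clear():
    """Model of the .loopClearFaster routine: returns the addresses written."""
    written = []
    sp = CXCLR                      # ldx #CXCLR / txs
    x = 28                          # ldx #28
    while True:
        for _ in range(6):          # six pha per pass
            written.append(sp)      # pha stores at SP...
            sp = (sp - 1) & 0xFF    # ...then decrements SP (wraps $00 -> $FF)
        x = (x - 1) & 0xFF          # dex
        if x & 0x80:                # bpl falls through once X wraps to $FF
            break
    sp = x                          # final txs leaves SP = $FF
    return written, sp


written, sp = fast_clear()
print("bytes written:", len(written))
print("first: $%02X  last: $%02X  final SP: $%02X" % (written[0], written[-1], sp))
```

In this model the 29 passes write downward from $2C to $00, wrap to $FF, and stop after $7F, so the upper TIA mirror range is skipped entirely, which is where the time saving comes from.
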
Andrew Davie (Author), posted August 9, 2014

>         lax #$00
>     Clear
>         dex
>         txs
>         pha
>         bne Clear

Very nice!

LS_Dracon, posted August 9, 2014

> Although LAX #imm is supposedly always stable when the argument is zero, undocumented opcodes are not supported on all hardware.

Yep. It's safe on the Atari 2600, I assume, as many homebrews use this opcode. This one and DCP, actually.

> Very nice!

Thanks, but it's just your code with lax.

Andrew Davie (Author), posted August 9, 2014

        lax #0
        txs
    loop
        pha
        tsx
        bne loop

LS_Dracon, posted August 9, 2014

Neat! Saves 2 cycles in the loop!

Omegamatrix, posted August 9, 2014

> Although LAX #imm is supposedly always stable when the argument is zero, undocumented opcodes are not supported on all hardware.

I use LAX all the time, but have never used LXA #IMM, as it was reportedly highly unstable. Looking at this page, it might be true that loading zero always works:

http://www.oxyron.de/html/opcodes02.html

> note to LAX: DO NOT USE!!! On my C128, this opcode is stable, but on my C64-II it loses bits so that the operation looks like this: ORA #? AND #{imm} TAX.

I'm writing this opcode as "LXA" because that is how DASM compiles it.

LS_Dracon, posted August 9, 2014 (edited)

LAX doesn't work with an immediate operand (at least in DASM). LXA, though, works fine in the emulator.

BTW, I've been testing these routines and having problems: they're not working. TSX has to come before PHA, but that doesn't make sense to me...

        cld
        lxa #0
        txs
    loop
        tsx
        pha
        bne loop

Andrew Davie (Author), posted August 9, 2014

> LAX doesn't work with an immediate operand (at least in DASM). LXA, though, works fine in the emulator.
>
> BTW, I've been testing these routines and having problems: they're not working. TSX has to come before PHA, but that doesn't make sense to me...
>
>         cld
>         lxa #0
>         txs
>     loop
>         tsx
>         pha
>         bne loop

The problem seems to be that your (and my) code exits with SP=0, whereas it should be $FF. Add another PHA at the end, like this...

        lxa #0
        txs
    loop
        pha
        tsx
        bne loop
        pha

It starts with X=0 and puts that into SP. The PHA writes 0 to location 0 and sets the SP to $FF, and we loop. When SP is 1, the PHA writes 0 to location 1 and SP becomes 0, which is then TSX'd, and the loop ends with SP=0. The final PHA resets the SP to $FF.

I haven't actually run this, but it looks reasonable. However, "LXA" is considered unstable and should probably not be used. And there's no LAX immediate, as you have pointed out. So...

        lda #0
        tax
        txs
    loop
        pha
        tsx
        bne loop
        pha

It's not so elegant anymore: 9 bytes, but it does have the advantage of a quicker clear (by 512 cycles) at the cost of an extra byte.

Omegamatrix, posted August 9, 2014 (edited)

>         lax #0
>         txs
>     loop
>         pha
>         tsx
>         bne loop

There is a problem with the above code in that the stack pointer is left pointing to 0 after completion.

> LAX doesn't work with an immediate operand (at least in DASM). LXA, though, works fine in the emulator.
>
> BTW, I've been testing these routines and having problems: they're not working. TSX has to come before PHA, but that doesn't make sense to me...
>
>         cld
>         lxa #0
>         txs
>     loop
>         tsx
>         pha
>         bne loop

The branch in the loop is never taken, as the very first time through, TSX brings a value of 0 to X. PHA does not affect any flags.

Edit: Andrew beat me to it.

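The two failure modes described above (the loop that never branches, and the loop that exits with SP=0) can be reproduced with a minimal Python model. It is a sketch only: page zero is treated as a flat 256-byte array, the setup is taken to be the documented `lda #0 / tax / txs` rather than LXA, and `run` and its parameters are names invented for this illustration.

```python
def run(loop_body, trailing_pha=False):
    """Run 'lda #0 / tax / txs', then loop_body followed by bne, in a toy model.

    loop_body is a tuple of "pha"/"tsx" in program order.
    Returns (number of bytes cleared, final SP).
    """
    mem = [0xAA] * 256                  # junk-filled stand-in for $00-$FF
    a = 0                               # lda #0
    x = a                               # tax
    sp = x                              # txs
    while True:
        for op in loop_body:
            if op == "pha":             # store A at SP, then SP -= 1; no flags
                mem[sp] = a
                sp = (sp - 1) & 0xFF
            elif op == "tsx":           # X = SP; Z flag follows X
                x = sp
        if x == 0:                      # bne falls through when Z is set
            break
    if trailing_pha:                    # the extra pha after the loop
        mem[sp] = a
        sp = (sp - 1) & 0xFF
    return mem.count(0), sp


for label, body, extra in (
    ("tsx / pha / bne", ("tsx", "pha"), False),
    ("pha / tsx / bne", ("pha", "tsx"), False),
    ("pha / tsx / bne / pha", ("pha", "tsx"), True),
):
    cleared, sp = run(body, extra)
    print("%-24s cleared %3d bytes, SP=$%02X" % (label, cleared, sp))
```

With TSX first, X is 0 on the first pass, so the branch falls through after a single push. With PHA first, everything is cleared but SP ends at 0 until the trailing PHA wraps it back to $FF.
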
LS_Dracon, posted August 9, 2014

That's why the dex is needed: so the stack pointer (via txs) enters the loop as $FF. But then the code gets bigger again. We're so close...

Omegamatrix, posted August 9, 2014 (edited)

Here's another one.

    ;25 scanlines + 18 cycles (1918 cycles total)
    ;A = 0
    ;X = 0
    ;Y = random
    ;SP = $FF
    ;zp ram location $FF = random
        cld
        lda #0
    .loopClear:
        ldx #$48            ; PHA opcode = $48
        txs
        inx
        bne .loopClear+1    ; jump between operator and operand to do PHA

This sets the stack correctly, but leaves RAM location $FF untouched. Not clearing $FF is okay for me. It can be used for a random seed, and programmers often use JSR with the stack aligned to $FF anyhow. Starting at $48 instead of 0 or $FF makes the routine quicker.

Edit: I just realized the mirror for the TIA registers starts at $40, so I don't actually clear:

    VSYNC
    VBLANK
    NUSIZ0
    NUSIZ1
    COLUP0
    COLUP1

The programmer will set up most of these registers during the program, so it's still not too bad as long as the user is aware that their initial state is unknown.

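The operand-as-opcode trick above is easier to follow when traced. The following Python sketch models only A, X and SP; it assumes, as the post states, that $40-$7F mirrors TIA registers $00-$3F, and `pha_operand_clear` is a name invented for this illustration.

```python
def pha_operand_clear():
    """Trace of: cld / lda #0 / ldx #$48 / txs / inx / bne .loopClear+1"""
    written = []
    x = 0x48                        # ldx #$48 (executed on the first pass only)
    sp = x                          # txs
    while True:
        x = (x + 1) & 0xFF          # inx (sets Z from X)
        if x == 0:                  # bne falls through once X wraps to 0
            break
        written.append(sp)          # re-entry at +1: the operand $48 runs as pha
        sp = (sp - 1) & 0xFF
        sp = x                      # txs
    return written, x, sp


written, x, sp = pha_operand_clear()
# TIA registers reached through the $40-$7F mirror (assumption from the post)
tia_hit = {addr & 0x3F for addr in written if addr < 0x80}
missed = sorted(set(range(0x40)) - tia_hit)

print("writes: $%02X..$%02X" % (written[0], written[-1]))
print("X=$%02X SP=$%02X" % (x, sp))
print("TIA registers never written:", ["$%02X" % r for r in missed])
```

In this model the writes run upward from $48 to $FE, location $FF is never touched, and the only TIA addresses never reached are $00-$07, consistent with the edit note in the post.
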
Omegamatrix, posted August 9, 2014 (edited)

I believe I have just come up with an 8-byte solution that includes CLD, and no illegal opcodes:

    ;39 scanlines + 65 cycles (3029 cycles total)
    ;A = 0
    ;X = 0
    ;Y = random
    ;SP = $FF
        cld
    .loopClear:
        ldx #$0A            ; ASL opcode = $0A
        inx
        txs
        pha
        bne .loopClear+1    ; jump between operator and operand to do ASL

It takes the most cycles of any solution, but clears all the TIA registers and RIOT RAM.

Andrew Davie (Author), posted August 10, 2014

> I believe I have just come up with an 8-byte solution that includes CLD, and no illegal opcodes:
>
>     ;39 scanlines + 65 cycles (3029 cycles total)
>     ;A = 0
>     ;X = 0
>     ;Y = random
>     ;SP = $FF
>         cld
>     .loopClear:
>         ldx #$0A            ; ASL opcode = $0A
>         inx
>         txs
>         pha
>         bne .loopClear+1    ; jump between operator and operand to do ASL
>
> It takes the most cycles of any solution, but clears all the TIA registers and RIOT RAM.

The branch into mid-instruction, which is an asl, is very clever. However, I'm struggling to understand this. X is effectively initialised at 11 (first time), so that's where the first "a" value goes. But "a" is undefined, effectively random. From the second time on you do an "asl" every loop, so after 8 loops "a" is guaranteed to be 0. And you branch until the Z flag is set (effectively when X gets to 0). So you never clear locations 0 to 10. And furthermore, locations 10 to 17 effectively have randomish data.

This code is bizarre, and this is my third attempt to analyse/respond.

Omegamatrix, posted August 10, 2014

> The branch into mid-instruction, which is a pha, is very clever. However, I'm struggling to understand this. X is effectively initialised at 11 (first time), so that's where the first "a" value goes. But "a" is undefined, effectively random. From the second time on you do an "asl" every loop, so after 8 loops "a" is guaranteed to be 0. And you branch until the Z flag is set (effectively when X gets to 0). So you never clear locations 0 to 10. And furthermore, locations 10 to 17 effectively have randomish data.
>
> This code is bizarre, and this is my third attempt to analyse/respond.

Hi Andrew,

A=0 by the time it hits the TIA mirrors at $40-$7F. It doesn't matter what value A starts with, as it will be zero for the "second time through", when it clears the mirrored addresses. As a bonus, you know the carry will also always end up being clear by the end of this routine.

Omegamatrix, posted August 10, 2014

Does the second routine make sense now?

    SP    REGISTER    VALUE (FROM ACCUMULATOR, which gets ASL'd)
    $0B   REFP0       %XXXXXXXX
    $0C   REFP1       %XXXXXXX0
    $0D   PF0         %XXXXXX00
    $0E   PF1         %XXXXX000
    $0F   PF2         %XXXX0000
    $10   RESP0       %XXX00000
    $11   RESP1       %XX000000
    $12   RESM0       %X0000000
    $13   RESM1       %00000000
    $14   RESBL       A=0 from now on

    ;writes continue to start of TIA mirrors

    SP    REGISTER    VALUE (FROM ACCUMULATOR)
    $40   VSYNC       0
    $41   VBLANK      0
    $42   WSYNC       0
    ...

    ;Writes continue through ZP $80-$FF clearing RIOT RAM

    ;At end of routine TIA registers and RIOT RAM cleared,
    ;A=X=0, SP = $FF

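The table above can be reproduced for any starting accumulator value with a short Python trace. This is a sketch under the thread's own assumptions: addresses below $80 reach the TIA, with $40-$7F mirroring $00-$3F, and $80-$FF is RIOT RAM. The function names are invented for this illustration.

```python
def asl_clear(a):
    """Trace of: cld / ldx #$0A / inx / txs / pha / bne .loopClear+1

    'a' is the unknown power-up value of the accumulator.
    Returns the list of (address, value) writes plus final A, X, SP.
    """
    writes = []
    x = 0x0A                        # ldx #$0A (executed on the first pass only)
    while True:
        x = (x + 1) & 0xFF          # inx (sets Z from X)
        sp = x                      # txs
        writes.append((sp, a))      # pha stores A at SP...
        sp = (sp - 1) & 0xFF        # ...then decrements SP
        if x == 0:                  # bne falls through once inx wraps X to 0
            break
        a = (a << 1) & 0xFF         # re-entry at +1: the operand $0A runs as asl a
    return writes, a, x, sp


def final_state(writes):
    """Apply the writes to a toy memory map: TIA (mirrored) and RIOT RAM."""
    tia, ram = {}, {}
    for addr, value in writes:
        if addr < 0x80:
            tia[addr & 0x3F] = value    # $40-$7F mirrors $00-$3F
        else:
            ram[addr] = value
    return tia, ram


writes, a, x, sp = asl_clear(0xFF)      # worst case: every bit of A set
for addr, value in writes[:10]:
    print("$%02X  %s" % (addr, format(value, "08b")))
tia, ram = final_state(writes)
print("TIA all zero:", len(tia) == 64 and not any(tia.values()))
print("RAM all zero:", len(ram) == 128 and not any(ram.values()))
print("A=$%02X X=$%02X SP=$%02X" % (a, x, sp))
```

The first pass through $0B-$12 may leave stray bits in those registers, but the second visit through the mirrors at $4B-$52 overwrites them with zero, which is the point Omegamatrix makes in reply to Andrew.
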
Nukey Shay, posted August 10, 2014

> You typically only need speed if you are switching kernels, say from a title screen to a playing screen, and want to easily avoid scanline bounces.

But why would you be clearing RAM and registers at that point anyway? Of all the games I've altered to have more than a single kernel, I've never had to do it. Powerup only "requires" it because everything is in an unknown state... but even that is too broad of a statement to be using (i.e. you really only need to clear the stuff your regular game init routine misses, or gfx/aud registers that you won't be using at all).

LS_Dracon, posted August 10, 2014 (edited)

Assuming LXA is as stable as LAX, and removing dex from the loop and setting the stack to $FF, should this work? We could test LXA on real hardware. I've been searching about it, and the people who said it's not stable misunderstood it, referring to it as LAX.

        lxa #0
        dex
        txs
    loop
        pha
        tsx
        bne loop

EDIT: Definitely unstable, and it's not "lax #imm": it ANDs A with X and loads the result into X. Since X doesn't start as 0, and neither does A, it's not useful.

Omegamatrix, posted August 10, 2014

> But why would you be clearing RAM and registers at that point anyway? Of all the games I've altered to have more than a single kernel, I've never had to do it. Powerup only "requires" it because everything is in an unknown state... but even that is too broad of a statement to be using (i.e. you really only need to clear the stuff your regular game init routine misses, or gfx/aud registers that you won't be using at all).

It's just much easier to clean it all. IMHO it also makes the game a lot easier to troubleshoot.

Omegamatrix, posted August 10, 2014

> EDIT: Definitely unstable, and it's not "lax #imm": it ANDs A with X and loads the result into X. Since X doesn't start as 0, and neither does A, it's not useful.

Although LXA is unstable, it is possible that using 0 for the immediate value could be stable, as Nukey described. My notes describe LXA as:

    AND byte with accumulator, then transfer accumulator to X register.

And the unstable behaviour is described as:

    ORA #? AND #{imm} TAX

In either case, the accumulator is AND'd with the immediate value right before TAX. As long as you are ANDing with 0, you should be okay. That being said, I'd still be a little iffy about implementing it. Who knows if the behaviour will be different on some consoles?

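The argument that a zero immediate masks out the instability can be checked exhaustively against the "ORA #? AND #{imm} TAX" description quoted above. This Python sketch models only that description, not real silicon; `magic` is a hypothetical name for the unknown "?" constant, and nothing here says how any particular console behaves.

```python
def lxa(a, imm, magic):
    """Toy model of the quoted behaviour: ORA #magic / AND #imm / TAX.

    Returns the resulting (A, X) pair.
    """
    result = (a | magic) & imm
    return result, result


# With imm = 0 the AND wipes out whatever the unknown OR contributed.
zero_is_safe = all(
    lxa(a, 0x00, magic) == (0, 0)
    for a in range(256)
    for magic in range(256)
)
print("imm=0 always gives A=X=0:", zero_is_safe)

# With a non-zero immediate the result depends on the unknown constant.
print("imm=$FF, magic=$00:", lxa(0x00, 0xFF, 0x00))
print("imm=$FF, magic=$EE:", lxa(0x00, 0xFF, 0xEE))
```

Under this model the zero case is independent of both A and the unknown constant, which matches the reasoning in the post; whether every console follows the model is the open question Omegamatrix raises.
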