johncl Posted December 29, 2008 (edited December 29, 2008)

> I like that version, nice use of LAX, but you gotta be brave and not use the temp
>
>     LAX word
>     INX
>     AND #$F0
>     STA word
>     TXA
>     AND #$0F
>     ORA word
>     STA word

A late reply to the starting hack of this thread. If you have a free byte of memory and can initialise it before the loop, this should be faster.

At an init stage, store the top 4 bits in a variable:

    lda word
    and #$f0
    sta hi

And in your loop, when you are iterating and need the wraparound increment on the lower 4 bits:

    ldx word    ; 3
    inx         ; 2
    txa         ; 2
    and #$0f    ; 2
    ora hi      ; 3
    sta word    ; 3

This uses a total of 15 cycles, 5 fewer than the 20 of the one quoted, assuming word and hi are in zero page.

Well, if you can assume that bit 4 of your "word" variable is always zero (so the counter starts from one of $00, $20, $40, $60, $80, $a0, $c0, $e0), you could also do this:

    ldx word    ; 3
    inx         ; 2
    txa         ; 2
    and #$ef    ; 2
    sta word    ; 3

Which is only 12 cycles!
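For anyone who wants to sanity-check the two variants above without firing up an emulator, here is a minimal Python sketch. The function names are made up for illustration, and it models only the 8-bit arithmetic of the instruction sequences, not their timing.

```python
def inc_low_nibble_with_hi(word, hi):
    """Models: ldx word / inx / txa / and #$0f / ora hi / sta word."""
    x = (word + 1) & 0xFF          # inx wraps at 8 bits
    return (x & 0x0F) | hi         # new low nibble, saved high nibble


def inc_low_nibble_bit4_clear(word):
    """Models: ldx word / inx / txa / and #$ef / sta word.

    Only valid when bit 4 of word is zero.
    """
    x = (word + 1) & 0xFF
    return x & 0xEF                # drop the carry that spilled into bit 4


def reference(word):
    """High nibble unchanged, low nibble incremented modulo 16."""
    return (word & 0xF0) | ((word + 1) & 0x0F)


def check_all():
    for word in range(256):
        hi = word & 0xF0           # what the init stage would have stored
        assert inc_low_nibble_with_hi(word, hi) == reference(word)
        if (word & 0x10) == 0:     # precondition of the shorter variant
            assert inc_low_nibble_bit4_clear(word) == reference(word)
    return True
```

Calling `check_all()` walks every byte value; the shorter variant is only checked where its bit-4 precondition holds.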
bogax Posted February 24, 2009

> As I was growing up, I kept a notebook full of cool code snippets and ideas. My notebook had been misplaced but I ran across it recently and here is one of the pages which is from a 1987 Dr. Dobbs article by Mark S. Ackerman. "6502 Killer Hacks". Post your own 6502 Killer Hacks and share them with the rest of us! . . . Well here is the killer hack. This one is to scrimp on RAM. Incrementing only the lower 4 bits of a byte (with wrap) . . . - David

Just joined these forums, so sorry if I'm a little late to this party. Here's a couple of my favorites.

First, the counter. EOR something with itself and you get 0; EOR something with 0 and you get itself.

    lda counter
    inc counter
    eor counter
    and #$F0
    eor counter
    sta counter

Of course you can insert bits from one byte into another byte (not just from a changed version of itself). Used e.g. for setting pixels.

=========

Parity is just an XORing of bits. A simple sum is just an XORing of bits, disregarding the carry obviously:

    0+0=0
    0+1=1
    1+0=1
    1+1=0

Carry is a way of propagating bits across a byte (sort of):

     000a
    +0111
    =a???

We can combine the two to get parity and collect "bits" across a byte:

    ;parity of A
    sta temp
    asl
    eor temp
    and #%10101010
    adc #%01100110
    and #%10001000
    adc #%01111000
    ;now the parity is in the sign bit

=========

Already posted this to a different thread. Rotate two bits left through the carry:

    asl
    adc #$80
    rol

Do it twice to swap nibbles.

============

Kernighan's method for counting set bits in a byte. This code is lifted directly from dclxvi in the 6502.org programming forum: http://forum.6502.org/viewtopic.php?p=6993...highlight=#6993

        TAX
        BEQ L2
        LDX #0
        SEC
    L1  INX
        STA SCRATCH
        SBC #1
        AND SCRATCH
        BNE L1
        TXA
    L2  RTS
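The parity and rotate sequences above are compact enough that it is hard to see at a glance why they work, so here is a rough Python model that steps through them with an explicit carry. The function names are hypothetical, and the model assumes binary (non-decimal) mode.

```python
def parity_bit7(x):
    """Steps through the parity sequence; returns A after the last ADC."""
    carry = x >> 7                       # asl: old bit 7 lands in carry
    a = ((x << 1) & 0xFF) ^ x            # asl / eor temp: adjacent-bit XORs
    a &= 0b10101010                      # keep the pair XORs in bits 1,3,5,7
    s = a + 0b01100110 + carry           # adc: carries ripple the pair sums up
    a, carry = s & 0xFF, s >> 8
    a &= 0b10001000                      # nibble parities now in bits 3 and 7
    s = a + 0b01111000 + carry           # adc: ripple bit 3 up into bit 7
    return s & 0xFF


def rol2(x):
    """Models asl / adc #$80 / rol: an 8-bit rotate left by two."""
    carry = x >> 7                       # asl
    a = (x << 1) & 0xFF
    s = a + 0x80 + carry                 # adc #$80: old bit 7 enters at bit 0,
    a, carry = s & 0xFF, s >> 8          # old bit 6 leaves through carry
    return ((a << 1) & 0xFF) | carry     # rol: carry enters at bit 0


def check_all():
    for x in range(256):
        assert parity_bit7(x) >> 7 == bin(x).count("1") % 2
        assert rol2(x) == ((x << 2) | (x >> 6)) & 0xFF
        assert rol2(rol2(x)) == ((x << 4) | (x >> 4)) & 0xFF  # nibble swap
    return True
```

In this model the stray carry left by the first ASL only ever lands in bit 0, which both AND masks discard, so the sequence does not need a CLC.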
+batari Posted February 24, 2009 (edited February 24, 2009)

> Kernighan's method for counting set bits in a byte. This code lifted directly from dclxvi in the 6502.org programming forum http://forum.6502.org/viewtopic.php?p=6993...highlight=#6993
>
>         TAX
>         BEQ L2
>         LDX #0
>         SEC
>     L1  INX
>         STA SCRATCH
>         SBC #1
>         AND SCRATCH
>         BNE L1
>         TXA
>     L2  RTS

A simple shifting approach is more efficient in terms of size (and in many cases, cycles) than any other routine I saw on that thread. They all start with the value passed in the accumulator and return the result in the accumulator, so I'll do the same:

        sta temp
        lda #0
    loop
        asl temp    ; shift the next bit into carry
        beq done    ; nothing left in temp
        adc #0      ; count the bit; leaves carry clear
        bcc loop    ; always taken
    done
        adc #0      ; count the last bit shifted out

I'm sure this can be improved somehow.
Nukey Shay Posted February 24, 2009

How about this? A bit smaller, using the X register to count:

        ;A=byte value
        ldx #-1
    Bump_Count
        inx
    Next_Bit
        lsr
        bcs Bump_Count
        bne Next_Bit
        ;X=number of set bits, A=0
Nukey Shay Posted February 24, 2009

Maybe precede the LSR with CLC, but it's still smaller.
+batari Posted February 24, 2009

> Maybe precede the LSR with CLC, but it's still smaller.

Why would you need to do that? But anyway, this further proves my point that the code linked in the thread above isn't ideal in terms of size and (with the probable exception of the 256-byte table) cycles.
bogax Posted February 24, 2009

> A simple shifting approach is more efficient in terms of size (and in many cases, cycles) than any other routine I saw on that thread.

Yes, I just think it's a clever hack (OK, so it's not a killer hack...). I presume it was originally in C.
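To compare the three bit-counting ideas from this exchange side by side, here is a small Python sketch under the assumption that only the 8-bit data flow matters (no cycle or byte counts). The function names are invented for the example.

```python
def popcount_kernighan(a):
    """SBC #1 / AND SCRATCH loop: each pass clears the lowest set bit."""
    count = 0
    while a:
        a &= a - 1
        count += 1
    return count


def popcount_shift_left(a):
    """Shift-left idea: add every bit that falls out of bit 7."""
    temp, count = a, 0
    while temp:
        count += temp >> 7               # the bit ASL would put in carry
        temp = (temp << 1) & 0xFF
    return count


def popcount_lsr(a):
    """Models ldx #-1 / inx / lsr / bcs Bump_Count / bne Next_Bit."""
    x = 0xFF                             # ldx #-1
    x = (x + 1) & 0xFF                   # first pass through inx
    while True:
        carry = a & 1
        a >>= 1                          # lsr always shifts a 0 into bit 7
        if carry:                        # bcs Bump_Count
            x = (x + 1) & 0xFF
        elif a == 0:                     # bne Next_Bit not taken
            return x


def check_all():
    for value in range(256):
        expected = bin(value).count("1")
        assert popcount_kernighan(value) == expected
        assert popcount_shift_left(value) == expected
        assert popcount_lsr(value) == expected
    return True
```

The comment on the `lsr` line is the short answer to the CLC question: LSR ignores the incoming carry, so there is nothing to clear.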
grafixbmp Posted March 2, 2009

Anyone have a slick hack for taking a byte and separating it into 5 bits on one side and the other 3 bits on the other? I saw where some were talking about swapping nibbles. That is also useful for taking 11111111 and producing 00001111 and 11110000, with the latter shifted down to be 00001111. The 5/3 one would be like taking 11111111 and producing 00011111 and 00000111 out of it with minimal cycles used. Just curious.
Nukey Shay Posted March 2, 2009

5 bits? Is this for audio frequency? If so, are you aware that the upper 3 bits are irrelevant for these registers? Likewise, the upper 4 bits are irrelevant for the distortion and volume registers. So you could use the original merged value to update frequency, then drop the upper 3 bits down (5xLSR) for one of the other registers.
grafixbmp Posted March 2, 2009 (edited March 2, 2009)

> 5 bits? Is this for audio frequency? If so, are you aware that the upper 3 bits are irrelevant for these registers? Likewise, the upper 4 bits are irrelevant for the distortion and volume registers. So you could use the original merged value to update frequency, then drop the upper 3 bits down (5xLSR) for one of the other registers.

Yes, but I was more interested in getting those last 3 bits ready ASAP for audio control. The other byte is used for sustain and rest duration, that is, how long the volume is held and how long it is off. I was going to organize those 3 bits to cover the most usable distortion settings on the audio control register.
+SpiceWare Posted March 2, 2009

> drop the upper 3 bits down (5xLSR) for one of the other registers.

4 ROLs
Nukey Shay Posted March 2, 2009

List 'em first: %sssssccc. It's only 2 cycles to AND off the upper bits... and by using LAX, you don't need to reload (the original value is still in X).

    LAX tablevalue
    AND #7
    STA AUDCn
    TXA
    LSR
    LSR
    LSR
grafixbmp Posted March 2, 2009

> List 'em first: %sssssccc. It's only 2 cycles to AND off the upper bits... and by using LAX, you don't need to reload (the original value is still in X).
>
>     LAX tablevalue
>     AND #7
>     STA AUDCn
>     TXA
>     LSR
>     LSR
>     LSR

How quick would it then be to do the others from X? Remove the low 3 bits and shift down, or somehow keep the carry at 0 while doing ROR 3 times?
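As a quick illustration of the two layouts discussed above, the following Python sketch models the %sssssccc split and the "4 ROLs" suggestion. The helper names are hypothetical; `rol` models a 9-bit rotate through carry, which is why the initial carry state turns out not to matter for the low three bits.

```python
def split_5_3(value):
    """Models LAX / AND #7 / STA AUDCn / TXA / LSR LSR LSR on %sssssccc."""
    control = value & 0x07               # ccc: the low 3 bits
    upper = value >> 3                   # sssss: three LSRs shift in zeros
    return upper, control


def rol(a, carry):
    """One ROL: 9-bit rotate left through carry; returns (A, carry_out)."""
    s = (a << 1) | carry
    return s & 0xFF, s >> 8


def top3_via_four_rols(value, carry_in):
    """Four ROLs bring bits 7..5 down to bits 2..0; upper bits hold leftovers."""
    a, c = value, carry_in
    for _ in range(4):
        a, c = rol(a, c)
    return a


def check_all():
    for value in range(256):
        upper, control = split_5_3(value)
        assert (upper << 3) | control == value
        for carry_in in (0, 1):
            # Same low 3 bits as five LSRs, whatever the carry was.
            assert top3_via_four_rols(value, carry_in) & 0x07 == value >> 5
    return True
```

The four-ROL result leaves other bits above bit 2, which only works out if, as noted earlier in the thread, the target register ignores its upper bits.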
fox Posted March 31, 2009

> After research, I came up with something really short (17 bytes):
>
>         ; Binary in A
>         sed
>         sta temp1
>         lda #0
>         ldx #8
>     loop
>         asl temp1
>         sta temp2
>         adc temp2
>         dex
>         bne loop
>         cld
>         ; BCD in A
>
> What's cool about this one is that it actually will do 8-bit binary -> 9-bit BCD, with the 9th bit contained in the carry! Can this be improved any more, though?

Faster, one byte shorter and not using X:

        sec
        rol @       ; @ denotes the accumulator (ROL A)
        sta bin
        lda #0
        sed
    do_bit
        sta bcd
        adc bcd
        asl bin
        bne do_bit
        cld
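Both conversions above double a BCD accumulator once per input bit, using decimal-mode ADC to do the digit correction. A minimal Python model is sketched below; `adc_decimal` is a simplified stand-in that is only meant to be accurate for valid BCD operands, and all names are assumptions made for the example.

```python
def to_bcd(n):
    """Packed BCD encoding of 0..99, used as the reference."""
    return ((n // 10) << 4) | (n % 10)


def adc_decimal(a, m, carry):
    """Decimal-mode ADC for valid BCD operands; returns (result, carry_out)."""
    lo = (a & 0x0F) + (m & 0x0F) + carry
    if lo > 9:
        lo += 6                          # decimal adjust the low digit
    hi = (a >> 4) + (m >> 4) + (lo >> 4)
    lo &= 0x0F
    if hi > 9:
        hi += 6                          # decimal adjust the high digit
    return ((hi & 0x0F) << 4) | lo, hi >> 4


def bin_to_bcd_counted(x):
    """Models the 17-byte loop: eight passes of asl temp1 / sta temp2 / adc temp2."""
    temp1, a, carry = x, 0, 0
    for _ in range(8):
        carry = temp1 >> 7               # asl temp1
        temp1 = (temp1 << 1) & 0xFF
        a, carry = adc_decimal(a, a, carry)   # A = 2*A + bit, in decimal
    return a, carry


def bin_to_bcd_sentinel(x):
    """Models the shorter loop: a 1 rotated in behind the data ends it."""
    carry = x >> 7                       # sec / rol: bit 7 out, sentinel in
    remaining = ((x << 1) & 0xFF) | 1
    a = 0
    while True:
        a, _ = adc_decimal(a, a, carry)  # sta bcd / adc bcd
        carry = remaining >> 7           # asl bin
        remaining = (remaining << 1) & 0xFF
        if remaining == 0:               # bne do_bit not taken
            return a


def check_all():
    for x in range(256):
        bcd, carry = bin_to_bcd_counted(x)
        assert bcd == to_bcd(x % 100)
        assert carry == (1 if 100 <= x <= 199 else 0)
        assert bin_to_bcd_sentinel(x) == bcd
    return True
```

In this model the two low digits are always right, the counted loop's final carry is set for inputs 100 to 199, and inputs of 200 and above simply wrap modulo 100. The sentinel version ends with the sentinel bit in the carry, so it would not report the hundreds digit that way.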
vdub_bobby Posted March 3, 2011

Just saw this today:

> Average of Integers
>
> This is actually an extension of the "well known" fact that for binary integer values x and y, (x+y) equals ((x&y)+(x|y)) equals ((x^y)+2*(x&y)). Given two integer values x and y, the (floor of the) average normally would be computed by (x+y)/2; unfortunately, this can yield incorrect results due to overflow. A very sneaky alternative is to use (x&y)+((x^y)/2). If we are aware of the potential non-portability due to the fact that C does not specify if shifts are signed, this can be simplified to (x&y)+((x^y)>>1). In either case, the benefit is that this code sequence cannot overflow.

http://aggregate.ee.engr.uky.edu/MAGIC/#Average%20of%20Integers

In 6502 assembly:

    lda a
    and b
    sta temp
    lda a
    eor b
    lsr
    clc
    adc temp

Next question: can this be extended to more than 2 integers, and is it possible to do without temp RAM?
Thomas Jentzsch Posted March 3, 2011

Why not

    clc
    lda a
    adc b
    ror

?
djmips Posted March 4, 2011 Author (edited March 4, 2011)

> > As I was growing up, I kept a notebook full of cool code snippets and ideas. My notebook had been misplaced but I ran across it recently and here is one of the pages which is from a 1987 Dr. Dobbs article by Mark S. Ackerman. "6502 Killer Hacks". Post your own 6502 Killer Hacks and share them with the rest of us! . . . Well here is the killer hack. This one is to scrimp on RAM. Incrementing only the lower 4 bits of a byte (with wrap) . . . - David
>
> Just joined these forums, so sorry if I'm a little late to this party. Here's a couple of my favorites.
>
> First, the counter. EOR something with itself and you get 0; EOR something with 0 and you get itself.
>
>     lda counter
>     inc counter
>     eor counter
>     and #$F0
>     eor counter
>     sta counter
>
> Of course you can insert bits from one byte into another byte (not just from a changed version of itself). Used e.g. for setting pixels.

I haven't read this thread for a while (thanks to vdub for resurrecting it so I would actually see some of the cool additions).

This is more likely the original Ackerman 'hack' for incrementing only the low 4 bits of a byte without requiring any additional memory. I think the other 'bad' version must have been my own idle mind playing around with other ideas. Thanks, bogax.
djmips Posted April 4, 2017 Author

> Why not
>
>     clc
>     lda a
>     adc b
>     ror
>
> ?

This won't work because you could overflow if the numbers were both > 128.
Thomas Jentzsch Posted April 4, 2017

The ROR will take care of that.
djmips Posted April 4, 2017 Author

Ah yes, I suspected I would be corrected. Of course, the carry is the ninth bit. I am rusty.
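The point settled above, that the carry acts as a ninth bit which ROR pulls back into the result, can be checked exhaustively. The sketch below is a plain Python model of both averaging routines, with hypothetical function names.

```python
def avg_and_xor(x, y):
    """Models lda a / and b / sta temp / lda a / eor b / lsr / clc / adc temp."""
    return (x & y) + ((x ^ y) >> 1)      # never exceeds 255, so no overflow


def avg_adc_ror(x, y):
    """Models clc / lda a / adc b / ror."""
    s = x + y                            # 9-bit sum: bit 8 is the carry flag
    carry, a = s >> 8, s & 0xFF
    return (carry << 7) | (a >> 1)       # ror moves the carry into bit 7


def check_all():
    for x in range(256):
        for y in range(256):
            expected = (x + y) // 2      # floor of the true average
            assert avg_and_xor(x, y) == expected
            assert avg_adc_ror(x, y) == expected
    return True
```

For example, `avg_adc_ror(200, 250)` has an 8-bit sum that wraps, yet the rotated-in carry restores the result to 225.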
c3po Posted February 24, 2021

In (timely) response to the initial post...

    lda work    ; or ldy work
    clc         ;    iny
    adc #1      ;    tya
    eor work
    and #$0f
    eor work
    sta work

And if you want any subset of contiguous bits, you can do it like this:

    lda work
    clc
    adc #4          ; value of the lsb of the increment group
    eor work
    and #%00011100  ; 3 bits (2...4)
    eor work
    sta work

This will cycle bits 2...4.
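The EOR / AND / EOR merge generalises to any contiguous bit field, which the following Python sketch tries to make explicit. The function name and the way the increment is derived from the mask are assumptions for illustration; in the assembly above both are just immediate constants.

```python
def inc_field(work, mask):
    """Increment the bit field selected by mask, leaving other bits alone.

    Models: lda work / clc / adc #lsb / eor work / and #mask / eor work.
    """
    lsb = mask & -mask                   # value of the field's lowest bit
    a = (work + lsb) & 0xFF              # clc / adc #lsb
    a ^= work                            # bits that changed
    a &= mask                            # keep only changes inside the field
    a ^= work                            # apply them to the original byte
    return a


def check_all():
    for mask in (0x0F, 0b00011100, 0xF0, 0b01100000):
        lsb = mask & -mask
        for work in range(256):
            result = inc_field(work, mask)
            assert result & ~mask & 0xFF == work & ~mask & 0xFF   # outside untouched
            assert result & mask == (work + lsb) & mask           # field wrapped
    return True
```

With mask `%00011100`, eight successive calls bring `work` back to its starting value, since the 3-bit field cycles through all its states.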
tokumaru Posted February 25, 2021

This is really good! Man, I missed this thread!
djmips Posted February 25, 2021 Author

> 16 hours ago, c3po said:
>
> In (timely) response to the initial post...
>
>     lda work    ; or ldy work
>     clc         ;    iny
>     adc #1      ;    tya
>     eor work
>     and #$0f
>     eor work
>     sta work
>
> And if you want any subset of contiguous bits, you can do it like this:
>
>     lda work
>     clc
>     adc #4          ; value of the lsb of the increment group
>     eor work
>     and #%00011100  ; 3 bits (2...4)
>     eor work
>     sta work
>
> This will cycle bits 2...4.

I'm pretty sure this surfaced later in this very long thread, but I appreciate your contribution!
CPUWIZ Posted February 25, 2021

Just pinned it.
c3po Posted February 25, 2021

Well, for the sake of hacks, I think in any long program it's a useful commodity to have a table like this:

    NumTab ; values 0-255
        byte 0, 1, 2, 3, 4, 5, 6, 7
        byte 8, 9, 10, 11, 12, 13, 14, 15
        ...
        byte 248, 249, 250, 251, 252, 253, 254, 255

This allows some interesting "new instructions":

    AND NumTab,X    ; A AND X
    ORA NumTab,X    ; A OR X
    EOR NumTab,X    ; A XOR X
    CMP NumTab,X    ; CMP A with X
    CLC
    ADC NumTab,X    ; A + X
    SEC
    SBC NumTab,X    ; A - X
    LDY NumTab,X    ; TXY
    LDX NumTab,Y    ; TYX