Thomas Jentzsch Posted May 22, 2007 Share Posted May 22, 2007 And here is the code for when you are doing more bits at once. 3 in this example. lsr lsr lsr adc #$F0 eor #$F0 Cool! Quote Link to comment Share on other sites More sharing options...
djmips Posted May 23, 2007 Author Share Posted May 23, 2007 Getting the 2's complement of the accumulator, rudimentary, but the extra here is two implementations. One version uses clc, the other sec so you could save the sec or clc if you know the prior state of the carry flag. ;2's complement x = x ^ $FF + 1 ;assumes carry is clear eor #$FF adc #01 ; 2's complement x = x - 1 ^ $FF ; assumes carry is set sbc #01 eor #$FF Quote Link to comment Share on other sites More sharing options...
Robert M Posted May 31, 2007 Share Posted May 31, 2007 Cool! Here's another one: Modulus. I need to find n mod 20. Well, to be specific, I need to find if n mod 20 is zero or otherwise. Here's how I do it: lda n sec Loop sbc #20 bcc Remainder bne Loop ;--remainder is zero Which is slow. Is there a better way? That division by 5 is nasty for optimization. Instead I recommend dividing by 40 so that the average number of times through the loop is cut in half plus 1 add instruction. Its slower for some cases, but I estimate about 45% faster on average. Modulo20 ; input: A = the unsigned number to test (0 to 255) tax ; Need to set Z-flag for special case that A is already 0. ( your original code did not do this so it may not be important.) sec loop beq done sbc #40 bcs loop ; C = 0 adc #20 done ; Z = 1 if the input number was a multiple of 20. ; X = the original number ; A = garbage. Quote Link to comment Share on other sites More sharing options...
supercat Posted May 31, 2007 Share Posted May 31, 2007 That division by 5 is nasty for optimization. Instead I recommend dividing by 40 so that the average number of times through the loop is cut in half plus 1 add instruction. Its slower for some cases, but I estimate about 45% faster on average. How about something like: sec u160: ; Test 160 from above sbc #160 bcc b80 u80: sbc #80 bcc b40 u40: sbc #40 bcc b20 u20: sbc #20 bcc bzero done: rts b80: ; Test 80 from below adc #80 bcs u40 b40: adc #40 bcs u20 b20: adc #20 bcs done bzero: adc #20 rts That should take 10 cycles for adc/sbc's, 8-12 for branches, and six for the RTS. Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted January 23, 2008 Share Posted January 23, 2008 Anyone have a killer hack for 8.8 multiplication? I need to do 8.8 * 8.8 and have the result be 8.8. I know that the integer part of the result will not exceed 8 bits and I don't need a 16-bit remainder. What I've got is this: ; a.b * c.d = e.fg lda #0 sta e sta f sta g ldx a ldy b clc MultiplyLoop lda g adc d sta g lda f adc d sta f lda e adc #0 sta e dey bne MultiplyLoop dex bne MultiplyLoop ;result in e.fg ... ignore g to get e.f as 8.8 result (This ignores some zero cases.) The main problem is that this is slower than dirt. Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted January 23, 2008 Share Posted January 23, 2008 Anyone have a killer hack for 8.8 multiplication? I need to do 8.8 * 8.8 and have the result be 8.8. How much space do you have? Check this link: http://www.ffd2.com/fridge/chacking/c=hacking9.txt This is one of my all time favourites. Quote Link to comment Share on other sites More sharing options...
Cybergoth Posted January 23, 2008 Share Posted January 23, 2008 C= Hacking was brilliant, too bad it's been a few years already since the last issue. Still joping for its return Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted January 23, 2008 Share Posted January 23, 2008 (edited) Anyone have a killer hack for 8.8 multiplication? I need to do 8.8 * 8.8 and have the result be 8.8. How much space do you have? Check this link: http://www.ffd2.com/fridge/chacking/c=hacking9.txt This is one of my all time favourites. Do you mean the f(x) = (x^2)/4 method? That requires a table of squares, which is fine for 8 bits, but how do I translate that to 16 bits? The table I would need is enormous! I don't have that much space. Edited January 23, 2008 by vdub_bobby Quote Link to comment Share on other sites More sharing options...
Ben_Larson Posted January 23, 2008 Share Posted January 23, 2008 Anyone have a killer hack for 8.8 multiplication? I need to do 8.8 * 8.8 and have the result be 8.8. well if you're using fixed point notation, then 8.8 is the same as 16.0, right? I mean, it's just a matter of how you interpret it. with that said I think you could use this method (scroll down to 'Binary Multiplication'): http://www.atariarchives.org/roots/chapter_10.php Basically the algorithm is long multiplication like you'd do by hand, except in binary form. Quote Link to comment Share on other sites More sharing options...
supercat Posted January 24, 2008 Share Posted January 24, 2008 (scroll down to 'Binary Multiplication'): Three useful optimizations I would suggest: Subtract 1 from the multiplicand stored in memory, to eliminate the in-loop CLC. Store the other multiplicand into the product low byte location. In that way, the ROR that stores the product low bits can also sample the next bit of the multiplicand. If code space permits, unroll the loop at least somewhat. Unrolling 2x will save about twenty cycles. Such approaches still won't be as fast as the table-lookup approach, but they'll be pretty good. Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted January 24, 2008 Share Posted January 24, 2008 (edited) Do you mean the f(x) = (x^2)/4 method? That requires a table of squares, which is fine for 8 bits, but how do I translate that to 16 bits? The table I would need is enormous! I don't have that much space. No, you only require those 8 bit tables. I did this back in the 80ies with my C64. IIRC I did use 5 256 bytes tables (some were used just to save a few cycles) for 32 bit FP math (Mandelbrot). Let's see if I can remember it... A8:8*B8:8 = A8:0*B8:0 + 2*A8:0*B0:8 + 2*A0:8*B8:0 + A0:8*B:08 So everything is reduced to 8 (or 9) bits. You can get away with just one 256 bytes table. Or you can do extra tables for the 9 bit results to save a few extra cycles. Edited January 24, 2008 by Thomas Jentzsch Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted January 24, 2008 Share Posted January 24, 2008 No, you only require those 8 bit tables. I did this back in the 80ies with my C64. IIRC I did use 5 256 bytes tables (some were used just to save a few cycles) for 32 bit FP math (Mandelbrot). Let's see if I can remember it... A8:8*B8:8 = A8:0*B8:0 + 2*A8:0*B0:8 + 2*A0:8*B8:0 + A0:8*B:08 So everything is reduced to 8 (or 9) bits. You can get away with just one 256 bytes table. Or you can do extra tables for the 9 bit results to save a few extra cycles. Thanks. It turns out what I really need is A0:8 * B8:8 and I don't need the least-significant byte. So I think I can do this... A0:8 * B0:8 + 256*(A0:8 * B8:0) and then just throw away the bottom byte. My next question! You need a table of the values (x^2)/4 for x=0 to 255. But what happens when (x^2)/4 > 255? I.e., for x>31? Don't you need two tables, one for the lower byte and one for the upper byte. And don't your tables need to be ~512 bytes long, for when x+y>255? Since xy=f(x+y)-f(x-y) Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted January 24, 2008 Share Posted January 24, 2008 A0:8 * B0:8 + 256*(A0:8 * B8:0)and then just throw away the bottom byte. You have to double A0:8 * B8:0 You need a table of the values (x^2)/4 for x=0 to 255. But what happens when (x^2)/4 > 255? I.e., for x>31? Don't you need two tables, one for the lower byte and one for the upper byte. Sure, 2 tables here. We are getting close to the 5 tables I remember. And don't your tables need to be ~512 bytes long, for when x+y>255? Since xy=f(x+y)-f(x-y) That would be another 2 tables, yes. Unless I miss something. Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted January 24, 2008 Share Posted January 24, 2008 A0:8 * B0:8 + 256*(A0:8 * B8:0)and then just throw away the bottom byte. You have to double A0:8 * B8:0 Nah, slightly different algorithm, I think. Use base-10, let each letter be a digit: a * bc = a * ( ( 10 * b ) + c ) = 10 * a * b + a * c Same thing, only where each "digit" is an 8-bit number, the base is 256. So a * bc = a * ( ( 256 * b ) + c ) = 256 * a * b + a * c In other words, I drop the "place" value of each digit and then add it back in at the end, kinda like when you do multiplication on paper: a b * c d ------ ad bd ac bc -------- Using a lot of large tables, I think I can get my A0:8 * B8:8 multiplication, throwing away the bottom byte of the result, in 77 cycles: ;--a * b:c lda a sta IntPtrPlus sta FractPtrPlus sta IntPtrMinus sta FractPtrMinus ;four tables so I don't have to figure -c ldy c lda (IntPtrPlus),Y sec sbc (IntPtrMinus),Y sta ResultInt1 ;--don't care about bottom byte of a * c since I'll drop it anyway ldy b lda (IntPtrPlus),Y sec sbc (IntPtrMinus),Y sta ResultInt2 lda (FractPtrPlus),Y sbc (FractPtrMinus),Y clc adc ResultInt1 sta ResultInt1 lda ResultInt2 adc #0 sta ResultInt2 I *think* that will work... Quote Link to comment Share on other sites More sharing options...
Robert M Posted January 24, 2008 Share Posted January 24, 2008 Anyone have a killer hack for 8.8 multiplication? I need to do 8.8 * 8.8 and have the result be 8.8. I know that the integer part of the result will not exceed 8 bits and I don't need a 16-bit remainder. So is this really a 4.8 * 4.8 multiply, or you can just guarantee that if the one value is bigger than 12 bits, then the other will be sufficiently small to prevent an 8-bit overflow? If it is really just 4.8 * 4.8 and you are discarding the fraction, then you can just do 4.4 * 4.4. Because the lower 4 bits of the fractions can never effect the integer portion of the result. 4.4 * 4.4 is just a bit shifted 8*8. I think that will simplify things. Cheers! Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted January 24, 2008 Share Posted January 24, 2008 (edited) So is this really a 4.8 * 4.8 multiply, or you can just guarantee that if the one value is bigger than 12 bits, then the other will be sufficiently small to prevent an 8-bit overflow? If it is really just 4.8 * 4.8 and you are discarding the fraction, then you can just do 4.4 * 4.4. Because the lower 4 bits of the fractions can never effect the integer portion of the result. 4.4 * 4.4 is just a bit shifted 8*8. I think that will simplify things. If you scroll/read down a bit, you'll see that it is really 0.8 * 8.8 and I want an 8.8 result. And I wrote a ~ 1-scanline multiply routine, just waiting for someone to tell me (a) how to make it faster or (b) how it won't work. Edited January 24, 2008 by vdub_bobby Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted January 24, 2008 Share Posted January 24, 2008 Nah, slightly different algorithm, I think. You are right. My mistake. Quote Link to comment Share on other sites More sharing options...
Robert M Posted January 24, 2008 Share Posted January 24, 2008 So is this really a 4.8 * 4.8 multiply, or you can just guarantee that if the one value is bigger than 12 bits, then the other will be sufficiently small to prevent an 8-bit overflow? If it is really just 4.8 * 4.8 and you are discarding the fraction, then you can just do 4.4 * 4.4. Because the lower 4 bits of the fractions can never effect the integer portion of the result. 4.4 * 4.4 is just a bit shifted 8*8. I think that will simplify things. If you scroll/read down a bit, you'll see that it is really 0.8 * 8.8 and I want an 8.8 result. And I wrote a ~ 1-scanline multiply routine, just waiting for someone to tell me (a) how to make it faster or (b) how it won't work. Since you are thowing away the bottom byte of the 8.16 result, doesn't the problem reduce to 0.4 * 8.4, and you can just use a lookup table with the 16 possible 8.8 results? Or is my thinking all screwy on this? Cheers! Quote Link to comment Share on other sites More sharing options...
Robert M Posted January 24, 2008 Share Posted January 24, 2008 Since you are thowing away the bottom byte of the 8.16 result, doesn't the problem reduce to 0.4 * 8.4, and you can just use a lookup table with the 16 possible 8.8 results? Or is my thinking all screwy on this? Cheers! My thinking is screwy. The table won't work, but the reduction to 0.4 * 8.8 is still correct I think. Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted January 24, 2008 Share Posted January 24, 2008 (edited) Since you are thowing away the bottom byte of the 8.16 result, doesn't the problem reduce to 0.4 * 8.4, and you can just use a lookup table with the 16 possible 8.8 results? Or is my thinking all screwy on this? Cheers! My thinking is screwy. The table won't work, but the reduction to 0.4 * 8.8 is still correct I think. I'm not sure if that is correct or not...it doesn't seem like it to me, but I have to think for a long time to wrap my head around these things But in any case, I don't think it will help my current routine, other than reducing the size of my tables, since multiplying a nibble by a byte will be the same as multiplying a byte by a byte. Or... Edited January 24, 2008 by vdub_bobby Quote Link to comment Share on other sites More sharing options...
Robert M Posted January 24, 2008 Share Posted January 24, 2008 Okay, I did a spreadsheet in Excel to prove this to myself. 0.8 * 8.8 = 8.16 but we are throwing the lower 8 bits of the answer away which makes the problem this: 0.8 * 8.8 = 8.16 / 256 We want to move the division by 256 to the left side of the equation, which because it is applied after the multiplication we must take the square root of 256. That is 16, so: (0.8 / 16) * (8.8 / 16) = 8.16 / 256 [Excel says I am right ] So the problem reduces to: 0.4 * 8.4 = 8.8 I don't know if that makes the code easier, but it should get the same answer as the original (0.8 * 8.8)/256 minus a small rounding error. Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted January 24, 2008 Share Posted January 24, 2008 (edited) Hmmm. Have to think about that... EDIT: I'm not exactly following this. Doesn't 8.16 / 256 = 0.24 ? EDIT: Actually, more generally, I think I follow your argument - but what is that rounding error? How big is it? After all, 2.95 * 0.73 = 2.1535 If you throw away the bottom two digits, you get 2.15 which is within 1/2% of the true answer. But if you truncate before multiplying you get 2.03, which is over 5% away from the actual answer. And the difference would be larger for truncating 4 bits, so I dunno...a pre-truncating routine would have to be smoking fast to be worth the error, I think. Edited January 24, 2008 by vdub_bobby Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted January 24, 2008 Share Posted January 24, 2008 0.4 * 8.4 = 8.8 Or e.g. 0.2 * 8.6? Quote Link to comment Share on other sites More sharing options...
Robert M Posted January 25, 2008 Share Posted January 25, 2008 0.4 * 8.4 = 8.8 Or e.g. 0.2 * 8.6? Okay my shorthand was too short 8.8 means 8 bits integer and 8 bits fraction. so the original equation was ([0].[8] * [8].[8]) >> 8 , where >> 8 is the same as /256 I am saying you can divide the arguments before you multiply and get the same answer with less multiplying and the same amount of dividing. (([0].[8] >> 4) * ([8].[8] >> 4)) == ([0].[8] * [8].[8]) >> 8 The original equation on the right is easier to round after the divide by bitshift since you can just add the .5 in the carry back into the answer. The new equation has two shifts and you would need to remember the carry from both and round up the answer by one only if both were set. The equation on the left after the shift is a nybble * 12 bits which I think should be easier than a byte times 16 bits. Just the rounding gets tougher. I will try to come up with some code shortly. Quote Link to comment Share on other sites More sharing options...
Hornpipe2 Posted April 11, 2008 Share Posted April 11, 2008 I personally like to use BRANCH instructions instead of JMP wherever possible. It saves a byte of ROM each time, and the cycle cost is the same. There are so many branch instructions you can usually find one that will execute 100% of the time for your given application. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.