SvOlli Posted February 26, 2021 Share Posted February 26, 2021 Not a killer hack, but still something I found useful. For easy notation: "A" means accumulator, "R" is a value stored in RAM. So, doing "A minus R" is easy SEC SBC R I was faced with the problem of doing "R minus A", remembering of something I learned at school called "2s complement", I came up with this: EOR #$FF SEC ADC R So, what I did there was "minus A plus R". And it worked like a charm with a "cost" of two bytes and two cycles. And also: On 2/25/2021 at 12:16 PM, c3po said: Well, for the sake of hacks, I think in any long program it's a useful commodity to have a table like this: [...] This allows some interesting "new instructions": That's a very nice idea. But "spending" 1/16 of a bank seems to be a bit expensive. But coding for coding for other 6502 based platforms (C128/C64 right now), I will definitely use this, as the table could be also created with a simple loop in RAM. (That would be a fun thing in SuperCharger.) Thanks! 1 Quote Link to comment Share on other sites More sharing options...
c3po Posted February 27, 2021 Share Posted February 27, 2021 19 hours ago, SvOlli said: I was faced with the problem of doing "R minus A", remembering of something I learned at school called "2s complement", I came up with this: EOR #$FF SEC ADC R I like it. You can still do a normal "A minus R" then complement it SEC SBC R EOR #$FF CLC ADC #01 simplified to: CLC SBC R EOR #$FF which is useful if the carry is known to be cleared already. 2 Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted February 27, 2021 Share Posted February 27, 2021 A simple hack: Use EOR BEQ/BNE instead of CMP BEQ/BNE. This has two advantages which might be useful: A is 0 if equal The C and V flags are not changed 3 Quote Link to comment Share on other sites More sharing options...
c3po Posted March 19, 2021 Share Posted March 19, 2021 If you need to do a modulo operation where the divisor is a power of two, you can use AND: M mod N = M AND (N-1). ; M mod 8 (8 is a power of 2...) LDA M AND #7 ; 8-1 Quote Link to comment Share on other sites More sharing options...
mzxrules Posted June 18, 2021 Share Posted June 18, 2021 not sure if it counts as a killer hack, and maybe this has been thought of before but I'll post it anyway. My game is structured such that I have a world composed of 256 rooms, and then the playfield of each room is composed of up to 64 page-aligned 8x16 sprites, which comes out to 4 pages of sprite data. Instead of storing an index from 0-63 in the world data, and then multiply it later by 16 to build the 2 byte room sprite pointer, I swap the first 4 least significant bits with the 4 most significant bits and store that in my world data. For example, let's say my sprite data is at F000, and say I want to store an index to sprite 0x2D, which is located at F2D0. If I store the index as 0xD2 instead, then I can generate the room sprite pointer without needing 2 byte multiplication: ldx roomId lda roomSpriteIdx, x pha and #$F0 ; a is now lower half of the pointer sta roomSprPtrL pla and #$0F clc ; These two lines can also just be adc #$F0 ; ora #$F0 sta roomSprPtrH 1 Quote Link to comment Share on other sites More sharing options...
+Andrew Davie Posted June 18, 2021 Share Posted June 18, 2021 11 hours ago, mzxrules said: not sure if it counts as a killer hack, and maybe this has been thought of before but I'll post it anyway. My game is structured such that I have a world composed of 256 rooms, and then the playfield of each room is composed of up to 64 page-aligned 8x16 sprites, which comes out to 4 pages of sprite data. Instead of storing an index from 0-63 in the world data, and then multiply it later by 16 to build the 2 byte room sprite pointer, I swap the first 4 least significant bits with the 4 most significant bits and store that in my world data. For example, let's say my sprite data is at F000, and say I want to store an index to sprite 0x2D, which is located at F2D0. If I store the index as 0xD2 instead, then I can generate the room sprite pointer without needing 2 byte multiplication: A neat observation. Just an aside: you can use the assembler to do the hard work for you in generating the swapped table - do not do it manually. Something like this... makes it visually easier (you see actual sprite # in the human-readable code) and also much easier to change later mac swapn .byte ({1}>>4)|(({1}&15)<<4) endm Table ; list of sprite indexes swapn 22 swapn 41 swapn 12 swapn 3 Quote Link to comment Share on other sites More sharing options...
+Omegamatrix Posted June 19, 2021 Share Posted June 19, 2021 14 hours ago, mzxrules said: not sure if it counts as a killer hack, and maybe this has been thought of before but I'll post it anyway. My game is structured such that I have a world composed of 256 rooms, and then the playfield of each room is composed of up to 64 page-aligned 8x16 sprites, which comes out to 4 pages of sprite data. Instead of storing an index from 0-63 in the world data, and then multiply it later by 16 to build the 2 byte room sprite pointer, I swap the first 4 least significant bits with the 4 most significant bits and store that in my world data. For example, let's say my sprite data is at F000, and say I want to store an index to sprite 0x2D, which is located at F2D0. If I store the index as 0xD2 instead, then I can generate the room sprite pointer without needing 2 byte multiplication: ldx roomId lda roomSpriteIdx, x pha and #$F0 ; a is now lower half of the pointer sta roomSprPtrL pla and #$0F clc ; These two lines can also just be adc #$F0 ; ora #$F0 sta roomSprPtrH I really like that one, but I think you meant three lines could be ORA #$F0 as: ldx roomId lda roomSpriteIdx, x pha and #$F0 ; a is now lower half of the pointer sta roomSprPtrL pla ora #$F0 sta roomSprPtrH If you don't need X=roomId after this bit of code than you can use TAX TXA to save 3 cycles too: ldx roomId lda roomSpriteIdx, x tax and #$F0 ; a is now lower half of the pointer sta roomSprPtrL txa ora #$F0 sta roomSprPtrH Using Y register instead for TAY TYA is also an option. Quote Link to comment Share on other sites More sharing options...
mzxrules Posted June 19, 2021 Share Posted June 19, 2021 4 hours ago, Andrew Davie said: A neat observation. Just an aside: you can use the assembler to do the hard work for you in generating the swapped table - do not do it manually. I'm definitely not doing it manually, that's for sure. But for my project, I'm using Python 3 instead of DASM to generate the final data that will be included in the rom. It offers me so much more flexibility with regards to how I manage my game's data than DASM does. 1 hour ago, Omegamatrix said: I really like that one, but I think you meant three lines could be ORA #$F0 as: ldx roomId lda roomSpriteIdx, x pha and #$F0 ; a is now lower half of the pointer sta roomSprPtrL pla ora #$F0 sta roomSprPtrH If you don't need X=roomId after this bit of code than you can use TAX TXA to save 3 cycles too: ldx roomId lda roomSpriteIdx, x tax and #$F0 ; a is now lower half of the pointer sta roomSprPtrL txa ora #$F0 sta roomSprPtrH Using Y register instead for TAY TYA is also an option. The first one is rather clever. I can't use it in my own game though because I use the extra bits not used by the index to store other data. For the second one, I already have it written as tax/txa in my own code, but you do bring up a good point; I hadn't realized that pha/pla take more cycles than tax/txa, presumably because pha/pla require memory access. Quote Link to comment Share on other sites More sharing options...
+Andrew Davie Posted June 19, 2021 Share Posted June 19, 2021 1 hour ago, mzxrules said: It offers me so much more flexibility with regards to how I manage my game's data than DASM does. Maybe so, but skilled use of macros does offer a lot of flexibility too. Quote Link to comment Share on other sites More sharing options...
+Omegamatrix Posted June 19, 2021 Share Posted June 19, 2021 12 hours ago, mzxrules said: The first one is rather clever. I can't use it in my own game though because I use the extra bits not used by the index to store other data. For the second one, I already have it written as tax/txa in my own code, but you do bring up a good point; I hadn't realized that pha/pla take more cycles than tax/txa, presumably because pha/pla require memory access. I also have thought about this since then, and while it won't work for your game I will mention that there is an illegal opcode option that is quicker, and uses less bytes: ldx roomId lda roomSpriteIdx, x ldx #$F0 sax roomSprPtrL ora #$F0 sta roomSprPtrH Quote Link to comment Share on other sites More sharing options...
azure Posted September 11, 2021 Share Posted September 11, 2021 (edited) On 6/12/2005 at 4:08 PM, Alex H said: Here's one I like: ; unsigned divide by 3 sta temp lsr lsr clc adc temp ror lsr clc adc temp ror lsr clc adc temp ror lsr clc adc temp ror lsr Sorry for bumping an old thread, but I think the accuracy of this divide by 3 algorithm can be improved a lot by a simple fix. As it is it produces an incorrect result for all multiples of 3 greater than 0. Just add 1 before each iteration, then the algorithm gives the correct answers up to 255/3. Only four iterations are actually necessary for 8-bit division, so there's a savings of 5 cycles. Five iterations will give accurate answers up to 1024/3 if extended into 16-bits. sta Temp ; 3 (3) ; iteration 1: a = a * 1/4 clc ; 2 (5) adc #1 ; 2 (7) ror ; 2 (9) lsr ; 2 (11) ; iteration 2: a = a * 5/16 sec ; 2 (13) adc Temp ; 3 (16) ror ; 2 (18) lsr ; 2 (20) ; iteration 3: a = a * 21/64 sec ; 2 (22) adc Temp ; 3 (25) ror ; 2 (27) lsr ; 2 (29) ; iteration 4: a = a * 85/256 sec ; 2 (31) adc Temp ; 3 (34) ror ; 2 (36) lsr ; 2 (38) ; iteration 5 is unnecessary for 8-bit division by 3 ; iteration 5: a = a * 341/1024 sec ; 2 (40) adc Temp ; 3 (43) ror ; 2 (45) lsr ; 2 (47) The assembly can be verified here: https://skilldrick.github.io/easy6502/ I've written a simple test program in C, which implements the algorithm in 16-bits. It flags all incorrect results. div3.c div3.txt.zip Edited September 11, 2021 by azure Added .txt output Quote Link to comment Share on other sites More sharing options...
+Omegamatrix Posted September 11, 2021 Share Posted September 11, 2021 I've made a whole bunch of 8 bit unsigned integer division routines. I've posted them in my blog, and on Nesdev. https://atariage.com/forums/blogs/entry/10805-unsigned-integer-division-routines/ For divide by 3 this is what I came up with: ;Divide by 3 ;18 bytes, 30 cycles sta temp lsr adc #21 lsr adc temp ror lsr adc temp ror lsr adc temp ror lsr It's good for all values 0-255. 3 Quote Link to comment Share on other sites More sharing options...
azure Posted September 12, 2021 Share Posted September 12, 2021 2 hours ago, Omegamatrix said: For divide by 3 this is what I came up with: It's good for all values 0-255. Hmm. That 21 addition is fascinating. I suppose you already knew, but 21 is the only number that works for that algorithm. +20 doesn't work for 252/3 and 255/3. And +22 doesn't work for 2/3 and 5/3. From what I can make of it, it's adding two adjustments to the summation series: +21/128 and it's also folding back in the carry. I'm not sure I can figure out what adjustment the carry is having, but it seems to make it all work for the full 8-bit range. Quote Link to comment Share on other sites More sharing options...
+batari Posted December 18, 2021 Share Posted December 18, 2021 On 9/11/2021 at 4:27 PM, Omegamatrix said: I've made a whole bunch of 8 bit unsigned integer division routines. I've posted them in my blog, and on Nesdev. https://atariage.com/forums/blogs/entry/10805-unsigned-integer-division-routines/ For divide by 3 this is what I came up with: ;Divide by 3 ;18 bytes, 30 cycles sta temp lsr adc #21 lsr adc temp ror lsr adc temp ror lsr adc temp ror lsr It's good for all values 0-255. While trying to figure out a good divide by 3 in 16 bits I stumbled upon some possible explanations to why this one works! First off, it is possible this is just doing a divide by 3 using the well known multiply and shift idea (don't divide by 3, multiply by 1/3!) that is, something like (x*(512/3))>>9 then mask out the lower 8 bits. I am not sure if this was the basis of the above code. However, I learned something else that I don't think is directly related: For any division by an odd number in binary, there is exactly ONE number in the range (whether 8 bits, 16 bits, 32 bits or any number of bits for that matter), where if you multiply by that number, you get the same result as division. That is, each odd divisor has exactly one "magic number" you can multiply by and it functionally performs a divide. For 8 bits, the magic number for 3 is 171 or $AB. And wouldn't you know it, the above looks a lot like a bitwise multiplication by $AB. Yeah, 512/3 is 171 so the number could have come from that. I am not sure what the general formula for calculating the magic number. I used brute-force C code to figure out that for 3 it is 171 ($AB, or 0b10101011). For 5, the number seems to be 205 ($CD). For 7, it seems to be 183 ($B7). Note that you cannot get some of these numbers using the multiply and shift method above so this doesn't appear to be that simple. I wrote a quick bit of code to figure these out. 3's magic number: 171 5's magic number: 205 7's magic number: 183 9's magic number: 57 11's magic number: 163 13's magic number: 197 15's magic number: 239 17's magic number: 241 19's magic number: 27 21's magic number: 61 23's magic number: 167 25's magic number: 41 For example, for 12/3, do 12*171, then just look at the lower 8 bits, and you get 4. It should work for all numbers evenly divisible by 3. Note that this algorithm, as I presented it, does not work correctly unless the numbers are evenly divisible. Perhaps the ADC #21 somehow corrects that problem. I am not exactly sure how that part works or how to tweak the other algorithms to correct that. That all said, this idea is easily expandable to 16 bits, which was how I learned about this. I ran a few numbers for that: 3's magic number: 43691 5's magic number: 52429 7's magic number: 28087 9's magic number: 36409 11's magic number: 35747 13's magic number: 20165 15's magic number: 61167 17's magic number: 61681 19's magic number: 51739 21's magic number: 53053 23's magic number: 14247 25's magic number: 23593 27's magic number: 55827 29's magic number: 49717 31's magic number: 31711 33's magic number: 33761 35's magic number: 44939 37's magic number: 7085 39's magic number: 28567 If you needed to do a fast general 16-bit divide, you could put these numbers in a table then run a bitwise multiplication of the magic number. As suggested this also does have the problem of not working right unless the numbers are evenly divisible. That was not a problem for what I was working on, as I didn't need to divide, but just a fast way to check if a number was evenly divisible on a CPU with a hardware multiplier but no divide. There may be a way to tweak the idea to work with numbers that are not evenly divisible, but I am not sure what it is. I don't understand exactly how this all works, I am just stating something interesting I learned recently and I remembered this post. Quote Link to comment Share on other sites More sharing options...
+Omegamatrix Posted December 18, 2021 Share Posted December 18, 2021 Hello @batari The math routines I wrote were all shift and add method with perbutations to make them shorter in bytes and execution if possible. What you stumbled upon is reciprocal multiplication. Jones has a good article on it. Although a little off topic I will say I started a math file for the ARM games to make division by a unsigned integer easier for everyone to do: // Example divide by 10: result = (dividend * DIV_BY_10) >> DIV_BY_10_SHIFT; Here is the file. You'd have to look at the particular value to see if it is valid for the entire 16 bit range. Math.h Quote Link to comment Share on other sites More sharing options...
+batari Posted December 18, 2021 Share Posted December 18, 2021 1 hour ago, Omegamatrix said: Hello @batari The math routines I wrote were all shift and add method with perbutations to make them shorter in bytes and execution if possible. What you stumbled upon is reciprocal multiplication. Jones has a good article on it. Although a little off topic I will say I started a math file for the ARM games to make division by a unsigned integer easier for everyone to do: // Example divide by 10: result = (dividend * DIV_BY_10) >> DIV_BY_10_SHIFT; Here is the file. You'd have to look at the particular value to see if it is valid for the entire 16 bit range. Math.h 14.71 kB · 1 download Nope, this is not reciprocal multiplication. The theory here is based on modular arithmetic. Although many of the factors strangely turned out to be the same, this does not involve any shifting at all. This is literally only multiplication and the result is present in the lower 8 or 16 bits with no shifting involved at all. As you found in math.h, reciprocal multiplication sometimes fails, and that is because it is an approximation. The theory I mentioned here never fails (for numbers evenly divisible) because according to theory, there is one and only one magic number for each odd divisor and this number is mathematically correct. Since it only works for evenly divisible numbers, it's not as useful as reciprocal multiplication for general division but for some things (like a modulo operator on a CPU with no division instructions) it is very useful. Quote Link to comment Share on other sites More sharing options...
+batari Posted December 18, 2021 Share Posted December 18, 2021 Here is the theory for those interested: https://en.wikipedia.org/wiki/Modular_multiplicative_inverse I don't really understand how to do the math but I get what it's saying and can write code to brute-force magic numbers. I calculated the magic number for 7 in 32 bits: 3067833783. That is, on a 32-bit machine, x/7 is the same as x*3067833783 for evenly divisible numbers. if 56/7=8, then 56*3067833783=8 (in mod 2^32). This number 3067833783 works for all evenly divisible values throughout the entire 32-bit range and it's mathematically correct and complete. Quote Link to comment Share on other sites More sharing options...
+Omegamatrix Posted December 20, 2021 Share Posted December 20, 2021 On 12/18/2021 at 2:53 AM, batari said: Here is the theory for those interested: https://en.wikipedia.org/wiki/Modular_multiplicative_inverse I don't really understand how to do the math but I get what it's saying and can write code to brute-force magic numbers. I calculated the magic number for 7 in 32 bits: 3067833783. That is, on a 32-bit machine, x/7 is the same as x*3067833783 for evenly divisible numbers. if 56/7=8, then 56*3067833783=8 (in mod 2^32). This number 3067833783 works for all evenly divisible values throughout the entire 32-bit range and it's mathematically correct and complete. Truthfully I don't understand the language either so I look around for examples. I made an excel sheet to play around with it. Mod Test.xlsx 1) Using 8 bit as example you can make a table of (n*(2^8))+1, where I used a range of n of 0 to (2^8)-1. Looking at the results this range can probably be trimmed (like throwing out zero), but this is just for illustration. 2) Take the odd number divisor and and divide each number of the table starting from the lowest. Check each result, and stop at the first instance of an integer. It will be the magic number for that divisor within the masking of (2^8). - An observation is that finding one magic number is actually finding 2. If 239 is the magic number for 15, then 15 is the magic number of 239. - In the above example I am assuming it can easily be expanded to different bit sizes, 2^16, etc... though I didn't try. - I ignored 1 as divisor, and (2^8)-1 = 255 as a divisor. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.