Tonight I was scrubbing my blogs to reformat all the mangled code block entries and thought about some code I made in @Karl G's topic about BCD to Binary Routines.
Here is the last routine I made:
; BCD value $0 - $99 is in A. Returns binary number 0-99 in A
BCDtoBin6:
sta Temp
and #$F0
lsr
sta Temp2
lsr
lsr
adc Temp
sec
sbc Temp2
rts
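To see why this works, write the BCD byte as tens*16 + ones. The routine forms tens*8 and tens*2 with shifts, adds the original byte to get tens*18 + ones, then subtracts tens*8 to leave tens*10 + ones. The Python below is only a sketch that mirrors the instructions one for one so the arithmetic can be checked on a PC; the function name is made up for illustration.

```python
def bcd_to_bin(a):
    """Mirror BCDtoBin6 step by step; 'a' plays the accumulator."""
    temp = a                  # sta Temp        (tens*16 + ones)
    a &= 0xF0                 # and #$F0        (tens*16)
    a >>= 1                   # lsr             (tens*8), carry out is 0
    temp2 = a                 # sta Temp2
    a >>= 2                   # lsr / lsr       (tens*2), carry still 0
    a = (a + temp) & 0xFF     # adc Temp        (tens*18 + ones)
    a = (a - temp2) & 0xFF    # sec / sbc Temp2 (tens*10 + ones)
    return a

# Check every valid BCD byte $00-$99.
for tens in range(10):
    for ones in range(10):
        assert bcd_to_bin(tens * 16 + ones) == tens * 10 + ones
```

The carry is clear going into the adc because the three shifts only ever push zero bits out of the masked value, which is why the routine does not need a clc.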
Here is @Andrew Davie's explanation of the routine:

I can't sleep. I'm consumed by this problem and I keep thinking about it...
Tonight I discovered some of Batari's solutions for hex to bcd conversion here and here. They are very compact solutions, and neat. I started to wonder if a constant (un-looped) cycle approach could be found without using tables. Then the insomnia crept in. I decided to have a go at translating hex values of 0-99 (BCD range for a byte).
The approach I took was to modify my divide by 10 routine so

I was watching The Story of Maths on Netflix. One of the episodes discussed a pattern that ancient people discovered, for example 4*6 = 5^2 - 1, or in general a*b = (a+1)^2 - 1 when b = a+2. The pattern here is that 4, 5, and 6 are consecutive integers. They also proved this pattern was valid for all integers. I found this interesting and made an Excel sheet to check it out. The natural question was whether this could be extended to numbers farther apart. I found that it could. I discovered patterns for a*b w
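The consecutive-integer case quoted above follows from expanding the square: a*(a+2) = a^2 + 2a = (a+1)^2 - 1. A small Python check of just that case (not the wider patterns, which are cut off here) might look like this; the function name is only for illustration.

```python
def product_via_square(a):
    """Compute a*(a+2) as (a+1)^2 - 1."""
    return (a + 1) ** 2 - 1

# Brute-force comparison against ordinary multiplication.
assert all(product_via_square(a) == a * (a + 2) for a in range(-1000, 1001))

print(product_via_square(4))  # 4*6, prints 24
```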

A while back I did an entry on Faster Initialization for the 2600. Today's entry is about squeezing the bytes out of the Initialization routine to the bare minimum. Here is a routine I came up with a while back:
;Economical 8 byte initialization routine
;By Omegamatrix
cld
.loopClear:
ldx #$0A ; ASL opcode = $0A
inx
txs
pha
bne .loopClear+1 ; jump between operator and operand to do ASL
; A=0, X=0, Y=random, S
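The trick is easier to trust after stepping through it. Below is a rough Python model of the loop, not a real 6502 emulator. It assumes zero page is laid out as on the 2600: $00-$7F reaches the TIA, which only decodes the low six address bits (so $40-$7F mirrors $00-$3F), and $80-$FF is RIOT ram. The function name and the "dirty" starting values are made up for the test.

```python
def run_init(a):
    """Simulate the 8 byte routine for a given power-up value of A."""
    tia = [0xFF] * 64          # pretend every TIA register starts dirty
    ram = [0xFF] * 128         # same for RIOT ram

    def write(addr, value):
        if addr < 0x80:
            tia[addr & 0x3F] = value   # assumed TIA mirroring
        else:
            ram[addr - 0x80] = value

    x = 0x0A                   # ldx #$0A, executed once only
    while True:
        x = (x + 1) & 0xFF     # inx (sets the Z flag that bne tests)
        s = x                  # txs
        write(s, a)            # pha stores A at S...
        s = (s - 1) & 0xFF     # ...then decrements S
        if x == 0:             # bne not taken once inx wraps to zero
            return a, x, s, tia, ram
        a = (a << 1) & 0xFF    # branch lands on the $0A operand: ASL A

for start in range(256):
    a, x, s, tia, ram = run_init(start)
    assert (a, x, s) == (0, 0, 0xFF)
    assert not any(tia) and not any(ram)
```

In this model the first eight writes ($0B-$12) can carry leftover bits of A, but A has been shifted down to zero by the time the loop reaches the $40-$7F mirrors, so every register ends up cleared.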

I've covered some Hex to Decimal conversions like 0-255 and 0-65535 already in my blog, and have done some previous solutions to 0-99 as well. Today I was thinking about converting a single byte from a hex value to BCD (0-99) once again. I came up with this routine which is the shortest, simplest, and fastest yet.
;Hex2Bcd (good 0-99)
;22 bytes, 26 cycles
tay ;2 @2
lsr ;2 @4
lsr ;2 @6
lsr ;2 @8
lsr ;

I saw a topic on 6502.org about converting a byte to a hex string. I remembered Lee Davison's code shorts about converting 0 to 15 to ASCII 0 to F by using decimal mode. I know Lee passed away this year so RIP Lee, I didn't know you but you seemed like a genius to me.
Using his basic code I came up with this for converting a byte:
;HEX to ASCII
; A = entry value
sed ;2 @2
tax ;2 @4
and #$0F ;2 @6
cmp #9+1 ;2 @8
adc #$30 ;2 @10
tay
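The cmp / adc pair is the heart of the decimal mode trick: the compare sets the carry for values 10-15, and the decimal adjust on the following adc then bumps the result from the digits up to the letters. Here is a small Python sketch of just those two instructions for one nibble. The adc model follows the usual description of NMOS 6502 decimal addition (accumulator result only, flags ignored), and both function names are made up for illustration.

```python
def adc_decimal(a, operand, carry):
    """ADC with the decimal flag set (accumulator result only)."""
    low = (a & 0x0F) + (operand & 0x0F) + carry
    if low >= 0x0A:
        low = ((low + 6) & 0x0F) + 0x10      # decimal adjust, carry into high nibble
    total = (a & 0xF0) + (operand & 0xF0) + low
    if total >= 0xA0:
        total += 0x60
    return total & 0xFF

def nibble_to_ascii(n):
    carry = 1 if n >= 10 else 0              # cmp #9+1 sets carry when n >= 10
    return adc_decimal(n, 0x30, carry)       # adc #$30 with decimal mode on

print("".join(chr(nibble_to_ascii(n)) for n in range(16)))  # prints 0123456789ABCDEF
```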

While posting my Unsigned Integer Division Routines on NesDev, a new member there said he was looking for a divide by 40 and mod 40. I wrote a few routines, but this one really sticks out as a neat idea.
;Divide by and Mod 40 combined
;38 bytes, 45 cycles
;Y = value to be divided
InterlacedMultiplyByFortyTable:
lda #0 ; dummy load, #0 used in LUT
lda #0
cpy #40
adc #0
cpy #80
adc #0
cpy #120
adc #0
cpy #160
adc #0
cpy #200
adc #0
cpy #24
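The quotient part of this idea is a chain of compares: each cpy sets the carry when Y is at or above a multiple of 40, and the following adc #0 adds that carry into A. A Python sketch of the five complete compare/add pairs shown above follows. The listing is cut off after that, so the sketch only claims to be right for inputs below 240, and it does not attempt the mod 40 half. The names are made up for illustration.

```python
VISIBLE_COMPARES = (40, 80, 120, 160, 200)   # the complete cpy operands in the listing

def quotient_by_40(y):
    a = 0                                    # lda #0
    for limit in VISIBLE_COMPARES:
        carry = 1 if y >= limit else 0       # cpy #limit
        a += carry                           # adc #0 adds only the carry
    return a

assert all(quotient_by_40(y) == y // 40 for y in range(240))
```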

Sometimes you end up with routines that use a lot of tables. While writing my (0-65535) Hex to Decimal routine I ended up with a lot of 16 byte tables (6 of them actually). I realized that an optimization could be made by interlacing the tables. Normally I would do something like this:
lda hexValue ;3 @3
lsr ;2 @5
lsr ;2 @7
lsr ;2 @9
lsr ;2 @11
tay

EOR is a useful function. I often use it in a situation where I already have "myNumber" in the accumulator, and I want to do a subtraction. Instead of doing this:
sta temp ; myNumber (0 to 255)
lda #$FF ; 255
sec
sbc temp ; 255 - myNumber
You can just do this:
eor #$FF ; 255 - myNumber
You save a lot of cycles and bytes. You can apply the same idea in other situations. Another common one is EOR $0F with a value of
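The eor #$FF shortcut works because 255 is all ones in binary, so subtracting from it never borrows and simply flips every bit. A quick Python check over all 256 byte values (the function name is only for illustration):

```python
def eor_ff(n):
    return n ^ 0xFF          # eor #$FF

# Same answer as the four instruction sec/sbc version for every byte.
assert all(eor_ff(n) == 255 - n for n in range(256))
```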

I had another go at a one byte 'hex to decimal' conversion routine. This time I kept it simple. It's a much better effort, and much quicker than before!
Boring stats:
;--------------------------------
;0-255 conversion stats
;--------------------------------
;cycles occurrences
;47 - 20
;48 - 0
;49 - 10
;50 - 36
;51 - 0
;52 - 40
;53 - 10
;54 - 10
;55 - 40
;56 - 0
;57 - 40
;58 - 10
;59 - 10
;60 - 20
;61 - 0
;62 - 10
;average execution is 54.10 cycles
It looks like a ni
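The quoted average can be recomputed from the table: the occurrences should add up to 256 (one per input value), and the average is the cycle counts weighted by those occurrences. A short Python check, with the numbers copied from the stats above:

```python
CYCLE_COUNTS = {47: 20, 48: 0, 49: 10, 50: 36, 51: 0, 52: 40, 53: 10, 54: 10,
                55: 40, 56: 0, 57: 40, 58: 10, 59: 10, 60: 20, 61: 0, 62: 10}

total_inputs = sum(CYCLE_COUNTS.values())
average = sum(cycles * count for cycles, count in CYCLE_COUNTS.items()) / total_inputs

print(total_inputs, f"{average:.2f}")  # prints 256 54.10
```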

Continuing with my Hex to Decimal routines, I have written one for 16 bit numbers.
This routine is really geared toward the NES. The NES has no decimal mode. I think most of the time programmers will just break their scores out into multiple bytes (one for each digit), and then handle rollover cases for greater than 9, and less than 0.
Here we are dealing with the case where they do want to use just 2 bytes of ram to hold a score. The idea is to take a 16 bit number and conv

I took a stab at making a 16 bit division routine tonight. I took the approach of dividing the high and low bytes separately, since I have already written a bunch of 8 bit division routines. I then corrected the result with a couple of small look up tables.
This is the 16 bit routine I came up with. At 111 cycles max, and 96 bytes total it's not too bad.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; UNSIGNED DIVIDE BY 10 (16 BIT)
; 111 cycles (max), 96 bytes
;;;;;;;;;;

This morning I decided to extend my Hex to BCD routines so that I can cover a whole byte.
This is what I came up with:
;---------------------------------------
lda hexValue ;3 @3 (0 - 255)
ldx #0 ;2 @5 hundreds digit = 0
;divide by 10, and multiply the result by 6
sta temp ;3 @8
lsr ;2 @10
adc #13 ;2 @12
adc temp ;3 @1
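The "divide by 10, multiply by 6" comment comes from how BCD is packed: every ten counts as 16 in the byte instead of 10, so for 0-99 the BCD value is the binary value plus 6 for each ten. The Python below only illustrates that identity, with the hundreds digit split off first. It uses ordinary division rather than the shift-and-add sequence in the routine, and the function name is made up.

```python
def to_bcd(n):
    """Return (hundreds digit, packed BCD tens/ones) for a byte 0-255."""
    hundreds, rest = divmod(n, 100)
    return hundreds, rest + 6 * (rest // 10)

# The packed byte, printed in hex, should read the same as the decimal digits.
for n in range(256):
    hundreds, bcd = to_bcd(n)
    assert f"{hundreds}{bcd:02X}" == f"{n:03d}"
```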

Most (not all) Atari 2600 games use an initialization routine that clears the RIOT ram and sets the TIA registers to a known state. The most compact code to do this is:
cld
ldx #0
txa
.loopClear:
dex
txs
pha
bne .loopClear
Andrew Davie wrote that, I think. Brilliant piece of code really. Clears everything and leaves A=0, X=0, and Stack Pointer = $FF. Takes only nine bytes too!
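For anyone who wants to convince themselves of the end state, here is a rough Python walk through the loop. It is only a sketch: page zero is treated as a flat 256 byte array that starts "dirty", and the function name is made up.

```python
def run_clear_loop():
    page_zero = [0xFF] * 256   # pretend everything starts dirty
    x = 0                      # ldx #0
    a = x                      # txa
    while True:
        x = (x - 1) & 0xFF     # dex (sets the Z flag that bne tests)
        s = x                  # txs
        page_zero[s] = a       # pha stores A at S...
        s = (s - 1) & 0xFF     # ...then decrements S
        if x == 0:             # bne not taken once dex reaches zero
            return a, x, s, page_zero

a, x, s, page_zero = run_clear_loop()
assert (a, x, s) == (0, 0, 0xFF)
assert not any(page_zero)      # all 256 locations were written with zero
```

The stack pointer ends at $FF because the final pha stores to $00 and then wraps S around.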
Sometimes when you are writing a game you switch bank

Normally paddles are read with a routine like this:
bit INPT0 ;3 @3
bmi .paddleDone-1 ;2³ @5/6
sty padZero ;3 @8
.paddleDone:
If the branch is taken it jumps between the opcode and operand of the 'sty padZero' instruction. The trick is to make the ram location of 'padZero' the same as an opcode that takes 2 cycles. That way the routine takes 8 cycles no matter what. There are several ram locations which could be chosen for

In general, multiplication in assembly is easy, and division is a bitch. There are three basic approaches to doing division. The first is to just do a loop in which the divisor is continually subtracted:
lda dividendValue
ldx #$FF ; start at -1 so X ends up holding the quotient
sec
.loopDivideBySeven:
inx
sbc #7
bcs .loopDivideBySeven
The advantage to this approach is that it takes very few bytes. On the other hand when the dividend is large lots of loops get taken, and each loop piles on the cycle
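To put numbers on that trade-off, here is a small Python sketch of the repeated subtraction approach that also counts how many times the loop runs. It is an illustration rather than a translation of the exact listing: the counter starts at -1 so that it ends on the quotient, and the names are made up.

```python
def divide_by_subtraction(dividend, divisor):
    """Return (quotient, loops taken) using repeated subtraction."""
    quotient = -1
    loops = 0
    remainder = dividend
    while True:
        quotient += 1              # inx
        loops += 1
        remainder -= divisor       # sbc #divisor
        if remainder < 0:          # borrow: carry clear, bcs not taken
            return quotient, loops

print(divide_by_subtraction(255, 7))  # prints (36, 37)
```

The loop always runs one more time than the quotient, so the worst case for a divide by 7 on a byte is 37 passes.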