ilmenit Posted January 2, 2022
Hi, I'm looking for a way to shorten (in size, not cycles) a procedure that multiplies an 8-bit value into a 16-bit result (where the result address does not overlap the value address). So far I have it down to 24 ($18) bytes. Any ideas how to shorten it further?
Attachment: word-mul16.asm
Eagle Posted January 2, 2022
https://codebase64.org/doku.php?id=base:short_8bit_multiplication_16bit_product
ilmenit Posted January 2, 2022
5 minutes ago, Eagle said: https://codebase64.org/doku.php?id=base:short_8bit_multiplication_16bit_product
This one is longer. By the way, with the variables moved to zero page, mul3 is the shortest so far:
* MUL1: $0015
* MUL2: $0015
* MUL3: $0014
rensoup Posted January 3, 2022
    .proc mul3
        ldx val
        ldy #4
    loop:
        ; ASL16
        txa
        ASL
        tax
        LDA result+1
        ROL
        STA result+1
        ; dec loop
        dey
        bne loop
        stx result
        rts
    .endp
Seems a little too simple?
Wrathchild Posted January 3, 2022 (edited)
Scrap that
Irgendwer Posted January 3, 2022 (edited)
Edit: Forget this, it's your "mul2 routine", but I wonder why you 'CLC' before 'ASL'ing...?
Original post: Does this qualify?
        lda val
        pha
        asl
        asl
        asl
        asl
        sta result
        pla
        lsr
        lsr
        lsr
        lsr
        sta result+1
        rts
(Extra candy: X and Y are not touched....)
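For readers who want to check the arithmetic away from the target machine, the nibble-split routine above can be modelled in a few lines of Python. This is only a sketch of what the 6502 code computes, not code from the thread; the helper name `mul16_split` is made up for illustration.

```python
def mul16_split(val):
    """Hypothetical model of: ASL x4 -> result, LSR x4 -> result+1."""
    assert 0 <= val <= 0xFF
    lo = (val << 4) & 0xFF   # four ASLs: low nibble moves up, high nibble falls off
    hi = val >> 4            # four LSRs: high nibble becomes the high byte
    return lo, hi

# Compare against plain multiplication for every 8-bit input.
for v in range(256):
    lo, hi = mul16_split(v)
    assert (hi << 8) | lo == v * 16
```

The two bytes never interact, which is why the routine needs no carry handling and leaves X and Y alone.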
TGB1718 Posted January 3, 2022 (edited)
Nice if you only want to multiply by $10, but neat anyway. Unless I read the intro wrong, I think he wants an 8-bit multiply with a 16-bit result.
flashjazzcat Posted January 3, 2022
Perhaps I'm missing something crucial, but it seems more concise to rotate the memory contents directly:
        lda #2
        sta val
        jsr mul4
        ...
    .proc mul4
        lda val
        sta result
        lda #0
        sta result+1
        ldy #3
    Loop:
        asl result
        rol result+1
        dey
        bpl Loop
        rts
    .endp
Note I'm initialising the upper 8 bits of the result; lda #0/sta result+1 can be removed if that's not needed. It makes sense to pass the value in A as well, which saves more space:
        lda #2
        jsr mul5
        ...
    .proc mul5
        sta result
        lda #0
        sta result+1
        ldy #3
    Loop:
        asl result
        rol result+1
        dey
        bpl Loop
        rts
    .endp
Down to 15 bytes using absolute addresses if you get rid of the upper 8-bit initialisation.
Irgendwer Posted January 3, 2022
3 minutes ago, flashjazzcat said: Down to 15 bytes using absolute addresses if you get rid of the upper 8 bit initialisation.
Compared to the version I posted, which would also be 15 bytes without 'LDA'ing first, yours needs the Y register, is bigger if result is non-ZP, and is slower too.
flashjazzcat Posted January 3, 2022
1 minute ago, Irgendwer said: Compared to the version I posted, which would also be 15 bytes without 'LDA'ing first, yours needs the Y register, is bigger if result is non-ZP, and is slower too.
Agreed. I think yours is the best.
mono Posted January 3, 2022
        ldy #0
        sty tmp
        ldy #4
    ?loop
        asl
        rol tmp
        dey
        bne ?loop
        rts
13 bytes when tmp is on page 0.
Irgendwer Posted January 3, 2022
18 minutes ago, mono said:
        ldy #0
        sty tmp
        ldy #4
    ?loop
        asl
        rol tmp
        dey
        bne ?loop
        rts
    13 bytes when tmp is on page 0.
Where is the 16-bit result?
ivop Posted January 3, 2022
15 minutes ago, Irgendwer said: Where is the 16 bit result?
Looks like LSB is in A and MSB is tmp?
ivop Posted January 3, 2022 (edited)
        ldx val
        lda lsbtab,x
        sta result
        lda msbtab,x
        sta result+1
12 bytes if val and result are on ZP. You said shortest code, but you need 512 bytes of LUT.
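The 512 bytes mentioned above are two 256-entry tables, one per result byte. As a rough sketch (in Python rather than assembler, and assuming the tables simply hold the low and high bytes of val*16), they could be generated like this; `build_tables` is a hypothetical helper, while `lsbtab` and `msbtab` follow the labels in the post.

```python
def build_tables():
    """Hypothetical generator for the two lookup tables used by the routine."""
    lsbtab = bytes((v * 16) & 0xFF for v in range(256))  # low byte of v*16
    msbtab = bytes((v * 16) >> 8 for v in range(256))    # high byte of v*16
    return lsbtab, msbtab

lsbtab, msbtab = build_tables()

# Two tables of 256 entries each: the 512 bytes of LUT.
assert len(lsbtab) + len(msbtab) == 512
```

The code shrinks to two indexed loads and two stores, but the total footprint grows once the tables are counted.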
ivop Posted January 3, 2022 (edited)
Or this one:
        asl
        rol result+1
        asl
        rol result+1
        asl
        rol result+1
        asl
        rol result+1
        sta result
Shorter, but trashes X:
        ldx #3
    loop
        asl
        rol result+1
        dex
        bpl loop
        sta result
Enter with value in A and result+1 set to 0.
And a way to set res+1 to 0 cheaply. Still trashes X.
        ldx #0
        stx res+1
        lda val
    loop
        asl
        rol res+1
        inx
        cpx #4
        bne loop
        sta res
ivop Posted January 3, 2022 (edited)
A different approach. Not smaller though, but it might help others with thinking about this problem.
        ; swap nibbles
        asl
        adc #$80
        rol
        asl
        adc #$80
        rol
        ; split and store result
        pha
        and #$f0
        sta res
        pla
        and #$0f
        sta res+1
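The asl / adc #$80 / rol sequence above is easy to misread, so here is a small Python model of it. It is only a sketch for checking the bit movement: the helpers `asl`, `adc`, `rol` and `swap_nibbles` are hypothetical names, and binary (non-decimal) mode is assumed for ADC.

```python
def asl(a):
    """ASL A: shift left, bit 7 goes to carry. Returns (a, carry)."""
    return (a << 1) & 0xFF, a >> 7

def adc(a, operand, carry):
    """ADC in binary mode. Returns (a, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

def rol(a, carry):
    """ROL A: shift left, carry enters at bit 0, bit 7 goes to carry."""
    return ((a << 1) & 0xFF) | carry, a >> 7

def swap_nibbles(a):
    for _ in range(2):            # each pass rotates A left by two bits
        a, c = asl(a)
        a, c = adc(a, 0x80, c)    # carry fills bit 0, old bit 6 leaves as carry
        a, c = rol(a, c)
    return a

for v in range(256):
    swapped = swap_nibbles(v)
    assert swapped == ((v << 4) & 0xF0) | (v >> 4)
    # Split as in the post: AND #$F0 -> res, AND #$0F -> res+1.
    assert ((swapped & 0x0F) << 8) | (swapped & 0xF0) == v * 16
```

After the swap, the old low nibble sits in the top half of A (the low result byte) and the old high nibble in the bottom half (the high result byte), so two AND masks finish the job.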
ilmenit Posted January 3, 2022
1 hour ago, ivop said: A different approach. Not smaller though, but it might help others with thinking about this problem
That's exactly one of my original attempts in my first post.
ivop Posted January 3, 2022 (edited)
4 minutes ago, ilmenit said: that's exactly one of my original attempts in my first post
Haha, sorry. Missed that somehow.
Edit: oh, I never looked at your asm file, only at the quoted code. It was mul1.
ivop Posted January 3, 2022 (edited)
Okay, how about this one? It trashes the value though. 16 bytes with val on ZP.
        lda #0
        asl val
        rol
        asl val
        rol
        asl val
        rol
        asl val
        rol
        sta val+1
Or trashing X, too:
        lda #0
        ldx #3
    loop
        asl val
        rol
        dex
        bpl loop
        sta val+1
12 bytes.
ilmenit Posted January 3, 2022
    .proc mul7
        ; by barrym95838, 14 bytes!
        ; result16 = factor8 * 16
        lda val
        sta result
        lda #$10
    loop:
        asl result
        rol
        bcc loop
        sta result+1
        rts
    .endp
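The loop in mul7 has no counter: the set bit in #$10 acts as a sentinel that reaches the carry on the fourth rol and ends the loop. A minimal Python trace of that idea, assuming the semantics of asl, rol and bcc as documented for the 6502 (`mul16_sentinel` is a hypothetical name, and the pass count is returned only for illustration):

```python
def mul16_sentinel(val):
    """Hypothetical model of mul7. Returns (result, result+1, loop passes)."""
    result = val                        # lda val / sta result
    a = 0x10                            # lda #$10: bit 4 is the sentinel
    passes = 0
    while True:
        carry = result >> 7             # asl result: bit 7 -> carry
        result = (result << 1) & 0xFF
        out = a >> 7                    # rol: bit 7 of A -> carry
        a = ((a << 1) & 0xFF) | carry   #      old carry -> bit 0 of A
        passes += 1
        if out:                         # bcc loop falls through once the sentinel is out
            break
    return result, a, passes            # sta result+1 stores A

for v in range(256):
    lo, hi, n = mul16_sentinel(v)
    assert n == 4
    assert (hi << 8) | lo == v * 16
```

Because the sentinel is shifted out of A entirely, A ends up holding only the four bits collected from the value, so the high byte needs no separate clearing.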
ilmenit Posted January 3, 2022
6 minutes ago, ivop said: Okay, how about this one? It trashes the value though.
Good ideas, but I will need the value.
ivop Posted January 3, 2022
2 minutes ago, ilmenit said: Good ideas, but I will need the value
Yeah, I guessed so. Combining that incredibly neat trick of rolling #$10 four times, which barrym95838 introduced, with the val-trashing code results in this:
        lda #$10
    loop
        asl val
        rol
        bcc loop
        sta val+1
9 bytes. Perhaps you can keep track of the original val somewhere else?
ilmenit Posted January 3, 2022
2 minutes ago, ivop said: 9 bytes. Perhaps you can keep track of the original val somewhere else?
With 9 bytes, there is now room to preserve the val.
ilmenit Posted January 3, 2022
@barrym95838 is on AtariAge, I see. Kudos!
ilmenit Posted January 4, 2022 (edited)
I made one that calls an OS routine and requires the variables to be placed at a specific location. It is 12 bytes and does not destroy the val:
        org $3
    result_hi .ds[1]
    result_lo .ds[1]
    val .byte 121
        org $2000
    .proc os_mul16
        lda val
        sta result_lo
        ldx #0
        stx result_hi
        jsr $DBED
        rts
    .endp
And one that destroys the val and is 8 bytes:
        org $3
    result_hi .ds[1]
    result_lo:
    val .byte 121
        org $2000
    .proc os_mul16_destr_val
        ldx #0
        stx result_hi
        jsr $DBED
        rts
    .endp
I think that's good enough compared with the initial 21-25 bytes.