SoulBlazer Posted May 2, 2012

"Hey, stop 'nerding' up the place with all this code talk. Just kidding! If it helps you guys figure out how to make more awesome Intellivision games... GO RIGHT AHEAD!!!"

Tell me about it. I don't even know what programming language the NES uses, much less how to program for it.

Carl Mueller Jr Posted May 2, 2012

"The Inty CPU is a ton of fun to write code for and that is a big draw for me."

"I'm glad I'm not the only person who thinks this way. I cut my teeth on the TMS9900 and later 6502, but of the three, I feel like I can crank out CP-1610 code the best. It just feels more straightforward, so I can focus on writing the program rather than figuring out, say, the best way to multiplex the accumulator. Don't get me wrong: I had a blast writing the 6502 code that I did. I just feel more productive on the CP-1610. I've poked at Z80 briefly, and while I can manage it, it's not my fave. I've been spoiled by 16 bit registers. :-) (TMS9900 wasn't too bad, actually. It had slightly better addressing modes, and more registers. It's been 20 years since I've written any TMS9900 assembly though, so I can't really compare it to anything. And, any machine that renames "logical OR" to "Set Ones Corresponding," doesn't even have a proper stack, and numbers its buses so that bit 0 is the most significant bit has gotta be a little wacky in my opinion.)"

Yeah, the best thing about the CP1610 is that it just seems to simplify programming. It may not be very fast, but it seems a lot easier to implement more complex algorithms. It would be interesting to do a comparison to see what operations the CP1610 could do faster at its lower clock rate than the 6502 or Z80, for example.

On the other hand, I like the 6502 because there simply aren't a lot of ways to do things, so you don't have to spend a lot of time juggling registers and figuring out how to optimize something. And it has that nifty zero page that shrinks the instruction size and speeds it up a bit.

The Z80 is overly complicated, but it does in fact have 16-bit registers and some 16-bit instructions. I particularly like the stack instructions since you can push and pull 16 bit values quite quickly – I use this technique for my Intellivision for Game Boy Color emulator to quickly grab new opcodes from the game ROMs.

The 8048 is a piece of junk, in my opinion. What, a 256 byte address space and something like a 16 byte stack? It does have two register sets, but this is a very, very limited processor. I can't believe they chose this as the main CPU for the Odyssey 2. Fortunately, it's kind of speedy… It was good enough to generate the waveform data for DK's sound.

Carl

Rev Posted May 2, 2012

I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking.

Carl Mueller Jr Posted May 2, 2012

"I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking."

A joke post, I gather. The 8048 doesn't have any 16 bit registers to my recollection. And the 6502's only 16 bit register is the program counter. Plus, both processors run faster than the CP1610, so I doubt its clock rate is necessarily to any programmer's liking. :-)

+DZ-Jay Posted May 2, 2012

"I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking."

Well, I think it needs a higher bit-rate in the cowbell waveform.

Fushek Posted May 2, 2012

"I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking."

"Well, I think it needs a higher bit-rate in the cowbell waveform."

I have a video game fever ... and the only prescription ... is more cowbell waveform.

intvnut Posted May 3, 2012 (edited)

"Yeah, the best thing about the CP1610 is that it just seems to simplify programming. It may not be very fast, but it seems a lot easier to implement more complex algorithms. It would be interesting to do a comparison to see what operations the CP1610 could do faster at its lower clock rate than the 6502 or Z80, for example."

I believe something as simple as a memory copy would be faster on the CP-1610, if you measured in bytes/sec. Here's two loops. The 6502 version below is limited to a maximum 128 byte copy, and I assume you can do the copy "backwards" for it, to merge your index register with the loop counter, which requires a "pre-decrement" and to terminate at -1 rather than 0 because LDA sets flags...

    ; CP-1610 version
    loop:   MVI@    R4, R0      ; 8 cycles
            MVO@    R0, R5      ; 9 cycles
            DECR    R1          ; 6 cycles
            BNEQ    loop        ; 9 cycles

vs.

    ; 6502 version.  Assume 'src' ptr is in ($10), 'dst' ptr is in ($12),
    ; and X is number of bytes to copy.
            DEX                 ; pre-adjust X so ($10),X points to last byte
    loop:   LDA     ($10), X    ; 5 cycles
            STA     ($12), X    ; 6 cycles
            DEX                 ; 2 cycles
            BMI     loop        ; 3 cycles

So, it takes 32 cycles for the CP-1610 to copy 2 bytes, and 16 cycles for the 6502 to copy 1 byte. At the same clock rate, they copy at the same rate in bytes per second. But, consider all the provisos that come with the 6502 version, such as being limited to 128 bytes, etc. And if you unroll even just one time, the advantage starts to tip toward the CP-1600 (24.5 cycles per 2 bytes vs. 27 cycles per 2 bytes).

(Ok, I expect a 6502 expert to tell me all the ways I screwed up in 3... 2... 1...)

"On the other hand, I like the 6502 because there simply aren't a lot of ways to do things, so you don't have to spend a lot of time juggling registers and figuring out how to optimize something. And it has that nifty zero page that shrinks the instruction size and speeds it up a bit."

The zero-page isn't just nifty. It's a necessity, since there aren't enough registers. :-) The ZP is your register set.

"The Z80 is overly complicated, but it does in fact have 16-bit registers and some 16-bit instructions. I particularly like the stack instructions since you can push and pull 16 bit values quite quickly – I use this technique for my Intellivision for Game Boy Color emulator to quickly grab new opcodes from the game ROMs."

I admit my brief brush with it stuck largely to the 8080 subset, which I believe only has HL (sometimes called "M").

"The 8048 is a piece of junk, in my opinion. What, a 256 byte address space and something like a 16 byte stack? It does have two register sets, but this is a very, very limited processor. I can't believe they chose this as the main CPU for the Odyssey 2. Fortunately, it's kind of speedy… It was good enough to generate the waveform data for DK's sound."

It's a reasonable microcontroller meant mainly for tasks such as scanning a keyboard or controlling simple children's toys. It was stretched beyond those limits in the O2. The 8051 is a bit nicer and even has an external fetch mode. (I don't remember if the 8048 does.)

Edited May 3, 2012 by intvnut

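The cycle arithmetic in the post above can be checked with a few lines of Python. This is only a sketch: the per-instruction cycle counts are the ones quoted in the post (not measured here), the helper name `cycles_per_byte` is made up for illustration, one CP-1610 word is counted as 2 bytes as the post does, and the unrolled 6502 case assumes a single DEX serves both copies.

```python
def cycles_per_byte(body_cycles, overhead_cycles, bytes_per_copy, unroll=1):
    """Cycles spent per byte copied by one pass of a copy loop.

    body_cycles     -- cycles for one load/store pair
    overhead_cycles -- cycles for the counter update plus the branch
    bytes_per_copy  -- bytes moved by one load/store pair
    unroll          -- how many load/store pairs share one loop overhead
    """
    total = unroll * body_cycles + overhead_cycles
    return total / (unroll * bytes_per_copy)


# Cycle counts as quoted in the post (assumption: taken at face value).
CP1610 = dict(body_cycles=8 + 9, overhead_cycles=6 + 9, bytes_per_copy=2)
MOS6502 = dict(body_cycles=5 + 6, overhead_cycles=2 + 3, bytes_per_copy=1)

for unroll in (1, 2):
    a = cycles_per_byte(unroll=unroll, **CP1610)
    b = cycles_per_byte(unroll=unroll, **MOS6502)
    print(f"unroll x{unroll}: CP-1610 {a:.2f} cycles/byte, 6502 {b:.2f} cycles/byte")
```

Under these assumptions the plain loops tie at 16 cycles per byte, and the once-unrolled loops come out at 12.25 vs. 13.5 cycles per byte, which matches the 24.5 and 27 figures in the post when they are counted per two bytes.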
Carl Mueller Jr Posted May 3, 2012

That's interesting that the CP1610 can match the 6502 for some operations. But you're using variable pointers… Supposing that you use the indexed immediate mode on the 6502, it would definitely win – particularly if your store was to the zero page.

And I am well, well aware of the necessity of the zero page. It's the only way to set up a variable pointer.

Also, it would be limited to 256 bytes, not 128. After which you would have to increment the high byte of your zero page pointer.

That's one criticism of the CP1610… No index registers.

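The point about the 256-byte limit and bumping the pointer's high byte can be modelled in Python. This is a rough sketch, not 6502 code: `copy_pages` is a hypothetical helper, memory is a plain `bytearray`, and the 8-bit index register is imitated by masking with `0xFF`.

```python
def copy_pages(mem, src, dst, pages):
    """Copy pages * 256 bytes, one 256-byte page at a time.

    Models an 8-bit index register that wraps after 255, followed by an
    increment of the high byte of each 16-bit pointer.
    """
    src_hi, src_lo = src >> 8, src & 0xFF
    dst_hi, dst_lo = dst >> 8, dst & 0xFF
    for _ in range(pages):
        y = 0                                   # index starts at 0
        while True:
            s = (((src_hi << 8) | src_lo) + y) & 0xFFFF
            d = (((dst_hi << 8) | dst_lo) + y) & 0xFFFF
            mem[d] = mem[s]                     # load then store one byte
            y = (y + 1) & 0xFF                  # 8-bit increment wraps to 0
            if y == 0:
                break                           # a full page has been copied
        src_hi = (src_hi + 1) & 0xFF            # bump source pointer high byte
        dst_hi = (dst_hi + 1) & 0xFF            # bump destination pointer high byte


mem = bytearray(0x10000)
mem[0x2000:0x2300] = bytes(i % 251 for i in range(0x300))
copy_pages(mem, 0x2000, 0x4000, 3)
```

The inner loop can never cover more than 256 bytes because the index is only 8 bits wide, so anything larger needs the outer step that adjusts the pointers.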
Rev Posted May 3, 2012

"I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking."

"A joke post, I gather. The 8048 doesn't have any 16 bit registers to my recollection. And the 6502's only 16 bit register is the program counter. Plus, both processors run faster than the CP1610, so I doubt its clock rate is necessarily to any programmer's liking. :-)"

LOL, did it almost sound like I knew what I was talking about?

+cmart604 Posted May 3, 2012

"I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking."

"A joke post, I gather. The 8048 doesn't have any 16 bit registers to my recollection. And the 6502's only 16 bit register is the program counter. Plus, both processors run faster than the CP1610, so I doubt its clock rate is necessarily to any programmer's liking. :-)"

"LOL, did it almost sound like I knew what I was talking about?"

Lol! I almost pissed myself laughing when I saw that. I thought "Rev doesn't know what he's talking about, he's just talking out of his ass, and yet it sounds pretty good"!

I love it when you guys talk Sanskrit, or whatever the hell language you're speaking. As Rev said, if it leads to more awesome games getting made then I'm all for it.

Rev Posted May 3, 2012

What was this thread about again? In B 4 lock!

intvnut Posted May 3, 2012 (edited)

"That's interesting that the CP1610 can match the 6502 for some operations. But you're using variable pointers… Supposing that you use the indexed immediate mode on the 6502, it would definitely win – particularly if your store was to the zero page. And I am well, well aware of the necessity of the zero page. It's the only way to set up a variable pointer. Also, it would be limited to 256 bytes, not 128. After which you would have to increment the high byte of your zero page pointer."

Fair enough -- if the "to" and "from" buffers are fixed, indexed immediate does shave a couple cycles. That's useful in some, but not all cases.

The peculiar limitation to 128 in my code above was kinda to illustrate a point. It could easily have been modified to allow copying 256 bytes, but only if you pre-decremented the two pointers, so you could terminate the loop at 0 rather than -1. The point is -- yes, the 6502 may be somewhat faster at certain things, but it's often less straightforward. At least to me, it can seem that way.

"That's one criticism of the CP1610… No index registers."

Yes, some sort of indexed addressing mode would be nice. A "@R3[5]" type of mode (i.e. access 5 words after R3) would be very helpful. It would make accessing data structures much more convenient.

Edited May 3, 2012 by intvnut

GroovyBee Posted May 3, 2012

    ; 6502 version.  Assume 'src' ptr is in ($10), 'dst' ptr is in ($12),
    ; and X is number of bytes to copy.
            DEX                 ; pre-adjust X so ($10),X points to last byte
    loop:   LDA     ($10), X    ; 5 cycles
            STA     ($12), X    ; 6 cycles
            DEX                 ; 2 cycles
            BMI     loop        ; 3 cycles

There's a couple of invalid instructions in there. You have to use the Y register for the LDA/STA using indirect addressing, and if you pass a value in X which is less than or equal to 127 you'll only copy one byte, because the value won't be negative the first time through the loop.

To copy a straight 256 bytes you'd use something like :-

    ; 6502 version.  Assume 'src' ptr is in ($10), 'dst' ptr is in ($12),
    ; Copy 256 bytes
            LDY     #0          ; 2 cycles
    loop:   LDA     ($10), Y    ; 5 cycles
            STA     ($12), Y    ; 6 cycles
            INY                 ; 2 cycles
            BNE     loop        ; 3 cycles if taken, 2 if not

If you used absolute,Y addressing for the source and destination addresses you'd save 2 cycles per loop, unless you crossed a page boundary.

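The control flow of the two 6502 loops above can be traced with a small Python model. It is a sketch under stated assumptions: only the 8-bit counter and the branch condition are modelled, the addressing-mode problem (X vs. Y) is ignored, the cycle counts are the ones given in the listing, and page-crossing penalties are left out. The function names are invented for illustration.

```python
def copy_dex_bmi(n):
    """Count bytes copied by the DEX / copy / DEX / BMI loop for a count of n."""
    x = (n - 1) & 0xFF              # initial DEX pre-adjusts the counter
    copied = 0
    while True:
        copied += 1                 # one LDA/STA pair
        x = (x - 1) & 0xFF          # DEX sets the N flag from bit 7 of the result
        if not (x & 0x80):          # BMI only branches while N is set
            break
    return copied


def copy_iny_bne():
    """Count bytes and cycles for the LDY #0 / copy / INY / BNE loop."""
    y = 0
    copied = 0
    cycles = 2                      # LDY #0
    while True:
        copied += 1
        cycles += 5 + 6 + 2         # LDA (zp),Y + STA (zp),Y + INY
        y = (y + 1) & 0xFF          # INY wraps from 255 back to 0
        if y == 0:
            cycles += 2             # BNE not taken
            break
        cycles += 3                 # BNE taken
    return copied, cycles


print(copy_dex_bmi(10))             # bytes copied for a requested count of 10
print(copy_iny_bne())               # (bytes copied, total cycles)
```

In this model a requested count of 10 copies a single byte, which is the behaviour described above, while the INY/BNE loop copies all 256 bytes in 4097 cycles.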
intvnut Posted May 3, 2012

"There's a couple of invalid instructions in there. You have to use the Y register for the LDA/STA using indirect addressing, and if you pass a value in X which is less than or equal to 127 you'll only copy one byte, because the value won't be negative the first time through the loop."

Which kinda underscores my point... I could never remember which of X and Y does indirect-indexed and indexed-indirect. :-) Ironic thing is, I had the answer in front of me when I looked up the cycle counts, but I guess I had Teflon-brain at that moment.

"To copy a straight 256 bytes you'd use something like :-"

    ; 6502 version.  Assume 'src' ptr is in ($10), 'dst' ptr is in ($12),
    ; Copy 256 bytes
            LDY     #0          ; 2 cycles
    loop:   LDA     ($10), Y    ; 5 cycles
            STA     ($12), Y    ; 6 cycles
            INY                 ; 2 cycles
            BNE     loop        ; 3 cycles if taken, 2 if not

... which only copies exactly 256 bytes. What about "up to 256 bytes?"

"If you used absolute,Y addressing for the source and destination addresses you'd save 2 cycles per loop, unless you crossed a page boundary."

I believe that's what Carl was calling "indexed immediate". I understood it to mean "LDA $1234, X" and "STA $1234, X". Works if source and destination are fixed, and all copies are less than or equal to 256 bytes.

Again, it kinda underscores my point: with the 6502, you can get there and often get faster code, but the path is generally never as straight or straightforward as it is for CP-1610. That's not to say you can't do tricky things on the CP-1610 -- you can do tricky things on any CPU -- but contrary to Carl's claim that there's fewer ways to do it on the 6502 and so you remain focused, I'd claim otherwise. :-)

Here's a fun one I think the 6502 might have more trouble with, esp. when you consider some values are larger than 8 bits: http://spatula-city....y/dist_fast.asm

Sure, you can use some zero-page variables and direct addressing to get there. Thankfully, most 6502 machines don't have to worry about reentrancy...

GroovyBee Posted May 3, 2012

"... which only copies exactly 256 bytes. What about "up to 256 bytes?""

Without adjusting the source and destination memory pointers to be off by one, I can't think of a quick way without using the X register to keep track in the loop. Adjusting the pointers isn't too bad because you'd just use something like (CC65 syntax) :-

            lda     #.lobyte(SrcAddress-1)
            sta     $10
            lda     #.hibyte(SrcAddress-1)
            sta     $10+1

Or make a macro to wrap around the function call and the assembler would do all the work.

"Here's a fun one I think the 6502 might have more trouble with, esp. when you consider some values are larger than 8 bits: http://spatula-city....y/dist_fast.asm Sure, you can use some zero-page variables and direct addressing to get there. Thankfully, most 6502 machines don't have to worry about reentrancy..."

Cool! I need a distance computation for a project. I might even convert it to 6502 at some point.

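The off-by-one pointer trick can be sketched in Python to show why it lets the loop count down and stop at zero. Everything here is illustrative: `copy_counted` is a hypothetical name, memory is a `bytearray`, and the loop is assumed to be a count-down on Y with a branch-if-not-zero at the bottom. With a plain 8-bit counter this particular model covers counts from 1 to 255.

```python
def copy_counted(mem, src, dst, n):
    """Copy n bytes (1..255) using pointers pre-adjusted to src-1 and dst-1.

    Because the pointers are one less than the real addresses, index values
    n, n-1, ..., 1 address bytes n-1, ..., 0 of each buffer, and the loop can
    end as soon as the index reaches 0.
    """
    if not 1 <= n <= 255:
        raise ValueError("this sketch handles counts from 1 to 255")
    src_ptr = (src - 1) & 0xFFFF    # pointer as stored in zero page, off by one
    dst_ptr = (dst - 1) & 0xFFFF
    y = n                           # index register loaded with the count
    while True:
        mem[(dst_ptr + y) & 0xFFFF] = mem[(src_ptr + y) & 0xFFFF]
        y = (y - 1) & 0xFF          # decrement sets the zero flag at 0
        if y == 0:
            break                   # branch-if-not-zero falls through


mem = bytearray(0x10000)
mem[0x3000:0x3005] = b"HELLO"
copy_counted(mem, 0x3000, 0x5000, 5)
```

The decrement sits directly before the branch, so nothing disturbs the zero flag in between, which is the reason for terminating at 0 rather than at -1.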
intvnut Posted May 3, 2012

"Here's a fun one I think the 6502 might have more trouble with, esp. when you consider some values are larger than 8 bits: http://spatula-city....y/dist_fast.asm Sure, you can use some zero-page variables and direct addressing to get there. Thankfully, most 6502 machines don't have to worry about reentrancy..."

"Cool! I need a distance computation for a project. I might even convert it to 6502 at some point."

Go for it. BTW, that particular copy of the source file says "GPL v2", but later I went and re-released pretty much all my library code in the public domain. The algorithm itself, as I said in the comments, comes from Graphics Gems, and is itself available to all I believe.

It does work pretty well. For amusement sometime, you might try plotting an "error map" for the function. It's actually rather interesting.

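For anyone curious about the "error map" idea, here is a rough Python sketch. It uses a common cheap distance estimate, max + min/2, purely as a stand-in: the thread does not say which formula dist_fast.asm actually implements, so treat the choice of approximation, and the names `dist_approx` and `error_map`, as assumptions.

```python
import math


def dist_approx(dx, dy):
    """Cheap distance estimate: larger magnitude plus half the smaller one."""
    hi = max(abs(dx), abs(dy))
    lo = min(abs(dx), abs(dy))
    return hi + lo / 2


def error_map(size):
    """Relative error of dist_approx for every (x, y) in a size x size grid."""
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            true = math.hypot(x, y)
            row.append(0.0 if true == 0 else dist_approx(x, y) / true - 1.0)
        rows.append(row)
    return rows


grid = error_map(64)
worst = max(max(row) for row in grid)
print(f"worst relative error on the grid: {worst:.4f}")
```

For this stand-in formula the estimate never falls below the true distance, and the error peaks where one coordinate is half the other. Feeding `grid` to any plotting tool gives the map.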
GroovyBee Posted May 3, 2012

The Graphics Gems are an interesting set of books. I also have several of the Game Programming Gems series.