senior_falcon Posted June 2, 2022
5 minutes ago, RXB said: "5%? Where did the 1% or 2% claim go? That is what got me started?"
Simple. We remembered that it was 1% to 2%. Testing showed that it averaged around 5%. Ergo, our memories were wrong, so we adjusted our beliefs.
RXB Posted June 2, 2022
Just now, senior_falcon said: "Simple. We remembered that it was 1% to 2%. Testing showed that it averaged around 5%. Ergo, our memories were wrong, so we adjusted our beliefs."
Well, 5% is 4 times larger than 1% and more than double 2%! You can see the difference in speed between TI BASIC and XB. Both use the same token code and both can run from VDP RAM, but from CPU RAM XB is much faster, and you can see it. A 1% or 2% difference would be hard to perceive. And saying there is no difference is just a lie.
HOME AUTOMATION Posted June 2, 2022
2 hours ago, RXB said: "If VDP/GROM/RAM are the same speed why even go with Assembly? (It is the worst one for wasting memory!)"
Assembly produces MACHINE CODE, the native language of the microprocessor, so it is the fastest, and it is the only code the microprocessor can execute directly from the address bus.
HOME AUTOMATION Posted June 2, 2022
On 6/1/2022 at 2:05 AM, apersson850 said: "But when an interpreter runs sequential code from memory, reading a byte from CPU RAM, GROM or from VDP RAM doesn't make much difference. When reloading the address it does, but you don't do that all the time."
I wonder how the top speed of the interpreter, running without any instruction/memory fetch, would compare to that of its normal operation, when expressed as a percentage...
HOME AUTOMATION Posted June 2, 2022
2 hours ago, RXB said: "And saying no difference is just a lie."
I'm assuming the second sense of the word "LIE". Noun: the way, direction, or position in which something lies. "He was familiarizing himself with the lie of the streets."
+TheBF Posted June 2, 2022
2 hours ago, RXB said: "5%? Where did the 1% or 2% claim go? That is what got me started?"
I think Bill had some tests where he saw a 1%..2% difference, but that was due to his hardware configuration. (It's in these threads somewhere.) I used my stock console and old PEB with my old XB cartridge and got about 4.5% in one test and 5% in another. 5% is faster, but not what one might expect. I always thought it was much faster using expansion RAM, until I proved myself wrong. (Ask my wife how often that happens.)
apersson850 Posted June 2, 2022
8 hours ago, RXB said: "Like CALL HCHAR for example is about 9 times faster proving what you said above is hogwash!"
It's only proving you don't have a clue about how this works. To interpret byte code you need instructions like these:

    MOVB *R13,R1
    MOVB *R13+,R1

The first one fetches one byte from VDP RAM or GROM, assuming R13 contains the VDP read address or GROM read address, which is what sensible interpreters would do. The second one fetches one byte from CPU RAM, assuming R13 contains the instruction pointer.

The first instruction is 18 cycles, the second 20. The first implies 5 memory accesses, the second 6. In the first case, four of the memory accesses are to CPU RAM; in the second, all of them. A memory access costs an additional four wait states in 8-bit expansion RAM, none extra in 16-bit RAM. That's 16 extra cycles for accessing VDP RAM, 24 for addressing CPU RAM only. So with instruction times of 34 cycles for the first (VDP RAM) and 44 for the second (CPU RAM), the little difference in accessing the byte to be interpreted isn't significant. Of course, everything runs with the workspace in RAM PAD, which implies that only the instruction fetch and the byte fetch have the wait states added. So it's 22 cycles for the first, 28 for the second. Any possible wait states when accessing the VDP RAM or GROM need to be added, but all in all, it should be obvious that the kind of memory used is insignificant.

It's actually a bit faster to read from VDP RAM, since you don't need to increment the address; that's done by the hardware. In the real world, an interpreter would have to keep track of the read address when running from VDP RAM too, which adds a separate INC. Without that tracking, subroutines can't be executed. Also, when a jump is carried out, the address reload is more cumbersome than when running from CPU RAM. Now add the instructions the interpreter has to execute to interpret the instructions.

In the p-system, the minimum is seven instructions of overhead, provided that the p-code to interpret can then be executed by one single assembly instruction. If it's more complex, several, sometimes hundreds, of instructions are run to interpret one p-code. Although I'm not that well versed in GPL, I know enough to say it's similar. All in all, there is a difference due to the different hardware, depending on whether it's VDP RAM, GROM, slow CPU RAM or fast CPU RAM. But compared to the rest of the things going on in these interpreters, the difference isn't noticeable unless you do accurate timing of the progress. A percent or so. As others have already explained, hopefully enough for you to understand it, it's not at all the same as comparing other languages to code written in assembly.
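The fetch-cost argument above can be sanity-checked with rough numbers. Only the 22 vs. 28 cycle fetch costs come from the post; the per-token dispatch figure is an assumption (the 116-cycle simplest p-code case quoted later in the thread), so treat this as a sketch, not a measurement:

```python
# Back-of-envelope model: how much does the byte-fetch difference matter
# relative to the interpreter's fixed per-token dispatch work?
vdp_fetch = 22    # MOVB *R13,R1 with workspace in RAM PAD (VDP RAM / GROM)
cpu_fetch = 28    # MOVB *R13+,R1 from 8-bit expansion RAM
dispatch = 116    # assumed fixed dispatch overhead per interpreted token

per_token_vdp = dispatch + vdp_fetch   # 138 cycles
per_token_cpu = dispatch + cpu_fetch   # 144 cycles
print(f"{per_token_cpu / per_token_vdp - 1:.1%}")  # 4.3%
```

Whichever way the small fetch difference goes once address tracking and reloads are counted, it amounts to a few percent of the per-token work, which is the same order as the small differences measured earlier in the thread.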
apersson850 Posted June 2, 2022
2 hours ago, RXB said: "Well 5% is 4 times larger than 1% and over double of 2%!"
We need some elementary math lesson here too. 1% faster is 0.99 of the original time. 5% faster is 0.95 of the original time. 99/95 is 1.04, so the difference is a factor of 1.04, not five (which you probably meant, although you wrote 4).
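The run-time ratio works out like this (a one-liner just to make the arithmetic explicit):

```python
# "1% faster" and "5% faster" expressed as fractions of the original run time T
t_1pct = 0.99   # 1% faster: the run takes 99% of T
t_5pct = 0.95   # 5% faster: the run takes 95% of T

# Comparing the two runs against each other, not against T:
print(round(t_1pct / t_5pct, 2))  # 1.04
```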
apersson850 Posted June 2, 2022
26 minutes ago, HOME AUTOMATION said: "I wonder how the top-speed of the interpreter, running w/o any instruction/memory fetch, would compare to that of its normal operation, when expressed as a percentage..."
I don't know the structure of the GPL interpreter, but the p-code interpreter executes seven instructions to figure out what to do when running from CPU RAM, eight if the code is in VDP RAM. The one extra is because when running from autoincrementing memory, the interpreter has to increment the instruction pointer with a separate instruction. It has to keep track of that to be able to return from a subroutine. When running from CPU RAM, the instruction pointer is autoincremented. I've shown it in detail in one of the Pascal threads here before.

1. Fetch code
2. Shift to make word index
3. Fetch address from interpreter's opcode table
4. Branch to interpreter code
5. Fetch next address to branch to in interpreter (this just points to the next word if there is no immediate data after the instruction; if there is, it leads to code to fetch that data)
6. Branch to next address (handle immediate(s) or actually run the instruction)
7. Run the instruction(s) that implement the instruction to be interpreted
8. Branch back to the interpreter

All the steps except 7 are the interpreter's own instructions; step 7 is the actual work to do. But the percentage is difficult to express, since it depends on the instruction to interpret. In the simplest case, like an ADI (add integer), which adds the top of stack to the next value on the stack and leaves the result on the top of stack, only one machine instruction is needed: A *SP+,*SP. At the other extreme is an inter-segment procedure call where a segment fault is issued, so the code containing the requested procedure first has to be loaded from disk. To make that possible, space must be created in one of the code pools, possibly by removing unused segments and packing the used ones.

In such a case, thousands of instructions will be executed to interpret a single p-code. GPL doesn't support such advanced memory management, but on the other hand it has several instructions that do quite a lot with a single instruction.
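The eight steps above are essentially a dispatch loop. A toy version in Python (hypothetical opcodes, not real p-code; the separate `pc` increment mirrors what an interpreter must do when the code stream sits in autoincrementing VDP/GROM memory):

```python
# Toy byte-code interpreter illustrating the fixed dispatch overhead
# described above (invented opcodes; nothing here is actual p-code).
def run(code):
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]      # fetch code
        pc += 1            # separate increment, as with VDP/GROM code streams
        if op == 0x01:     # PUSH: immediate byte follows the opcode
            stack.append(code[pc])
            pc += 1
        elif op == 0x02:   # ADI: add the two top-of-stack integers
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == 0x00:   # HALT
            break
    return stack

print(run([0x01, 2, 0x01, 3, 0x02, 0x00]))  # [5]
```

Every opcode, even a one-instruction ADI, pays the fetch-and-branch overhead of the loop itself, which is why the speed of the memory holding the byte stream contributes so little to the total.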
senior_falcon Posted June 2, 2022
2 hours ago, RXB said: "If there is no difference as you argue why is TI Basic slower than XB?"
That is not always the case. As you know, some programs run faster in BASIC, despite running from VDP RAM. The BASIC interpreter and the XB interpreter are different and do not use the same instructions to achieve their goals, so it is hardly surprising that they would run at different speeds. Plus XB has 12K of assembly to do some of the heavy lifting.
+TheBF Posted June 2, 2022
5 hours ago, Reciprocating Bill said: "There are some other interesting approaches that can help tease out the performance impact of utilizing VDP RAM in settings other than the running XB interpreter. For example, I'm tempted to re-code my assembly version of the Sieve of Eratosthenes, placing the 8192-byte array in VDP. A quick run of the Sieve in Chipmunk BASIC shows that the array is read and/or written 24,573 times during the course of a single iteration of the sieve (including initialization). What impact will the use of VDP RAM have on Sieve performance, relative to an array in RAM? Stay tuned."
I think we can show that with two ASM words in Forth. (For the Forth challenged: FILL fills a block of RAM with a character, given (addr,len,char); VFILL fills a block of VDP RAM with a character, given (VDPaddr,len,char).) I can use the 9901 timer to time short-duration code. It runs continuously in Camel99 Forth. (Thanks Tursi.) Here is the test code. Times are shown in 21.3 µs ticks of the 9901 timer.

\ TEST FILL versus VFILL
HEX
: TESTRAM
    TMR@                      \ read the 9901 timer
    B000 1000 [CHAR] # FILL   \ fill some memory
    TMR@ -                    \ read the timer again and subtract
    . ;                       \ print the result
: TESTVDP
    TMR@
    1000 1000 [CHAR] # VFILL
    TMR@ - . ;
DECIMAL

Here is a big shock: VDP filled slightly faster than CPU RAM in this case on Classic99. The code for FILL and VFILL is below, in RPN assembler.

CODE FILL ( addr cnt char -- )
    *SP+ R0 MOV,        \ pop cnt -> R0
    *SP+ R1 MOV,        \ pop addr -> R1
    TOS SWPB,
    BEGIN,
        TOS *R1+ MOVB,  \ char is in TOS
        R0 DEC,         \ decrement count
    EQ UNTIL,           \ loop until R0=0
    TOS POP,            \ refill the TOS register
    NEXT,
ENDCODE

CODE VFILL ( VDP-addr count char -- )
    TOS SWPB,           \ fix the TMS9900 byte order
    TOS W MOV,
    R0 POP,             \ R0=count
    TOS POP,            \ VDP-addr popped into TOS
    WMODE @@ BL,        \ set up VDP write address in TOS register
    R3 VDPWD LI,        \ VDP addr. in a register makes this 12.9% faster
    BEGIN,
        W *R3 MOVB,     \ write byte to VDP RAM
        R0 DEC,         \ decrement the byte counter
    EQ UNTIL,           \ jump back if not done
    2 LIMI,
    TOS POP,
    NEXT,
ENDCODE
+TheBF Posted June 2, 2022
Similar result on real iron.
RXB Posted June 2, 2022
1 hour ago, apersson850 said: "It's only proving you don't have a clue about how this works. To interpret byte code you need instructions like these: MOVB *R13,R1 / MOVB *R13+,R1 ..."
Hmmm, you totally ignored that in RAM you do not need a routine to SET THE VDP ADDRESS. That requires instructions that are not instantaneous. As for the VDP address, autoincrement works fine for a serial sequence of data, but most things are not sequential for screen info. I am not saying autoincrement is slow; I am saying you must use much more programming to change the address in VDP, and that takes much longer the more often it is used. A real test would be to write every 5th byte in VDP/GROM or RAM and time how long that takes. That would be a REAL WORLD TEST. That is the kind of realistic test of both RAM vs VDP/GROM.
RXB Posted June 2, 2022
2 hours ago, apersson850 said: "We need some elementary math lesson here too. 1% faster is 0.99 of the original time. 5% faster is 0.95 of the original time. 99/95 is 1.04, so the difference is a factor of 1.04, not five (which you probably meant, although you wrote 4)."
Last I checked, 100% minus 5% is not 1.04%. It is 95%, as 100% - 5% is 95%. What kind of math are you creating here? And 99/95 does not mean anything logical. Maybe 99 - 95 means something, like 4, which makes more sense. Also I am confused why you moved the decimal place from 100.00 to .99, since 100.00 - .99 equals 99.01, not 1%, as .01 is not 1% in any way in math. Thus 100 - 95 = 5, not .95; this is some odd math here. I know you take a value and divide by 100 for the decimal equivalent, but no way is 1.04 the difference between 99% and 95%. How did 4% become .04%? No human could see a difference of .04% in speed. I have never claimed to be a math wiz, but I cannot find anything to support this; even your factor comes out wrong.
apersson850 Posted June 2, 2022
24 minutes ago, RXB said: "Hmmm you totally ignored that in RAM you do not need a routine to SET THE VDP ADDRESS."
No, I did not. I wrote earlier that the address needs to be reloaded when you execute a jump. But, as it turns out, that's negated by the need for an extra instruction to increment the instruction pointer. Reloading the address is typically three instructions, and sequential runs are typically longer than that. I can clearly see that you are no math wiz. Let's just say that 4% is 0.04. You have to pay attention to whether there is a percent sign or not.
RXB Posted June 2, 2022
2 hours ago, apersson850 said: "I don't know the structure of the GPL interpreter, but the p-code interpreter executes seven instructions to figure out what to do when running from CPU RAM, eight if the code is in VDP RAM. ... GPL doesn't support such advanced memory management, but on the other hand has several instructions that do quite a lot with a single instruction."
LOL, GPL has the MOVE instruction! (Page 4.42, GPL Programming Guide.) It can move any type of memory to any other type of memory, any size you want. It can even move from any type of memory to the VDP registers, even for the F18A, which I support in RXB using only the GPL MOVE command, for up to 58 registers. GPL is my thing, so you just stepped into my world here.
apersson850 Posted June 2, 2022
1 minute ago, RXB said: "LOL GPL has the MOVE instruction! (Page 4.42 GPL Programing Guide)"
So what? I wrote that GPL doesn't have the advanced memory management you find in the p-system. Which it doesn't.
RXB Posted June 2, 2022
1 minute ago, apersson850 said: "No, I did not. I wrote earlier that the address needs to be reloaded when you execute a jump. ... Let's just say that 4% is 0.04. You have to pay attention to if there is a percent sign or not."
Right, thus 0.04 is still 4%, not 1% as you tried to say. And the original quote was 5% on the high side and the low side was 4%. So the original thing I disputed was the claim that 1% was the high side.
apersson850 Posted June 2, 2022 (edited)
6 minutes ago, RXB said: "Right thus .04 is still 4% not 1% as you tried to say."
It would be easier to teach you something if you cared to check what I write, and even better, check what you write yourself. You claimed that 5% is four times larger than 1%. It's not in this case, since it's the relation between whether it takes 99% or 95% of the time to execute the program. That's the difference between a ratio of 0.99 and 0.95. One divided by the other is 1.04, not 4.00. That's a pretty minute difference. Anyway, as I've already shown you, reading a byte stream from CPU RAM isn't much different compared to reading it from VDP RAM. And that's the main thing done. That's why you see so little difference. I don't see any real difference between running Extended BASIC with my fast CPU RAM or using the standard CPU RAM expansion. As could be expected.
Edited June 2, 2022 by apersson850
RXB Posted June 2, 2022
1 minute ago, apersson850 said: "It would be easier to teach you something if you cared to check what I write... I don't see any real difference between running Extended BASIC with my fast CPU RAM or using the standard CPU RAM expansion."
Your test may be badly written, as you did not even test VDP with a changed address each time, but instead only took advantage of autoincrementing, which is not a real-world factor. That would make the test biased at best. Even the TI OS has to save the previous address when it writes to the screen and reset the VDP screen pointer to the original address. RAM or VDP or GROM all have to do this in the real world, so this factor has to be within your test.
RXB Posted June 2, 2022
15 minutes ago, apersson850 said: "So what? I wrote that GPL doesn't have the advanced memory management you find in the p-system. Which it doesn't."
Dude, the P-code card is written with GPL GROM and assembly ROM. Now maybe your Pascal is not like that, but it stands as fact that the P-code card is built on GPL and ROM assembly. This is like claiming C sucks because it is not Windows (but Windows is created using C). That ignores so many factors. Yeah, GPL does not have a memory manager, but you can make one using it.
apersson850 Posted June 2, 2022 (edited)
Rich, cut the bloody "dude" and "hogwash" and "lies" bullshit so we can have a real discussion here!

Looking at the p-code interpreter, it runs a different number of instructions to decode a p-code, depending on whether the p-code is just a byte or has data inline. The simplest runs for 116 cycles. There are two memory access cycles with four wait states each, if the computer has the standard 8-bit RAM expansion. So the difference is 116 or 124 cycles between the standard and the 16-bit RAM expansion. One memory access is to either CPU RAM, VDP RAM or GROM, depending on where the code to interpret is. The longest interpretation overhead, to fetch not only the instruction but also the largest amount of data an instruction may have, is 335 cycles. Then there are five memory accesses to code memory and one more to memory expansion: 24 cycles more or less, added to 335. So even with the pretty efficient p-code interpreter, the difference in memory speed doesn't do too much.

The biggest gain in the p-system comes from the quite extensive amount of code that's transferred to the 8K RAM expansion and runs there. Some of that runs with the workspace in expansion RAM too, and in such cases the execution time is cut to less than half with 16-bit memory. But all instruction executions run on the p-code card, which has the standard 8-bit access of things in the expansion box. The hardware used by the p-system has nothing to do with what we are trying to discuss here. There's no GPL in the implementation of the p-system, just GROM and ROM chips to store code and data. The disadvantage the PME has is that all executive parts are on the p-code card, which has 8-bit access. The GPL interpreter runs in the monitor ROM, which is 16-bit. The PME mitigates this to some extent by transferring the core of the interpreter to RAM PAD; 27 instructions run from there.

Of course you can make a memory management system using GPL. But that's also not the point. That it's there for you to use from the beginning is also not the point. I mentioned it just to show the enormous difference in complexity between the simplest and most complex p-codes. Just like a simple add, the memory manager is invoked by one single p-code instruction.

By the way, a block move is useful regardless of the language. There is no p-code that corresponds to the MOVE instruction in GPL. Instead, the system provides a number of intrinsics for special tasks. Three of them are moveleft, moveright and fillchar, and they do similar things. The advantage of them in the p-system is that they are available to you at the Pascal level as well. The p-system, on the other hand, does have the advantage that it never stores variables in anything but CPU RAM. The stack is also in CPU RAM.

Besides, I've done no test now. I'm showing the theoretical background to why these interpreters are so little dependent on the memory the code is stored in. I did do tests a long time ago, especially when I implemented my 16-bit memory expansion, to see the difference in execution speed in Extended BASIC, assembly and Pascal, for example. That's why I know you hardly notice the difference with BASIC.
Edited June 2, 2022 by apersson850
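Taking the cycle figures in this post at face value, the wait-state penalty on the interpreter's dispatch overhead can be computed directly (a sketch; all numbers are the ones quoted above, not fresh measurements):

```python
# Dispatch overhead per p-code, from the figures quoted in the post
simple_16bit = 116        # simplest p-code, 16-bit RAM expansion
simple_8bit = 116 + 8     # two memory accesses x 4 wait states each
longest_16bit = 335
longest_8bit = 335 + 24   # six memory accesses x 4 wait states each

print(f"{simple_8bit / simple_16bit - 1:.1%}")    # 6.9%
print(f"{longest_8bit / longest_16bit - 1:.1%}")  # 7.2%
```

Even in the worst case the memory system changes the dispatch cost by well under 10%, before the work of the interpreted instruction itself is counted.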
RXB Posted June 3, 2022
18 hours ago, apersson850 said: "Rich, cut the bloody "dude" and "hogwash" and "lies" bullshit so we can have a real discussion here! ... There's no GPL in the implementation of the p-system, just GROM and ROM chips to store code and data. ..."
LOL, so the GROM in the P-code card is just data? You know this how? Can you show me how you know this, with some of the code? As a GPL programmer I would like to see it. And this is all based on your opinion, not any tests you have done, as you freely admitted: "Besides, I've done no test now. I'm showing the theoretical background to why these interpreters are so little dependent on the memory the code is stored in."
Asmusr Posted June 3, 2022
Does this summarize the facts?
- Reading from GROM/VDP is almost as fast as reading from CPU memory (ROM or RAM) if the read address has already been set up
- If you need to set up the read address first, reading from GROM/VDP is a lot slower than reading from CPU memory
- It's about 5% faster to execute an XB program from 32K RAM instead of from VDP RAM only
- An XB routine programmed in assembly is usually much faster than a similar routine written in GPL
- Reading from 16-bit memory is exactly 4 CPU cycles faster than reading from 8-bit memory
- Because console ROMs are 16-bit, they can be faster than similar assembly routines in cartridge ROM
Willsy Posted June 3, 2022
27 minutes ago, Asmusr said: "Does this summarize the facts? ..."
Yep. That about does it. I'm out!