+TheBF Posted August 24, 2017

Quote:
    Quick update: The version based on apersson850's array access method is fully functional now. Actually, it was never broken to begin with; rather, the PIO throughput was much lower than I had expected, and my connected equipment was not picking it up adequately. The culprit was the compare instruction when cycling through the array elements:

           CLR  R3
           MOVB @VERTEX,R3         GET NUMBER OF VERTICES IN ARRAY
           SWPB R3
    REDO   LI   R1,1
    LOOP   MOVB @VERTEX(R1),@PIO   SEND ARRAY BYTE TO PIO
           INC  R1
           C    R1,R3
           JLE  LOOP
           JMP  REDO

    At Lee's suggestion, I changed it to a decrementing loop instead, thus eliminating the compare instruction altogether:

    REDO   CLR  R3
           MOVB @VERTEX,R3         GET NUMBER OF VERTICES IN ARRAY
           SWPB R3
           LI   R2,VERTEX+1
    LOOP   MOVB *R2+,@PIO          SEND ARRAY BYTE TO PIO
           DEC  R3
           JNE  LOOP
           JMP  REDO

    and now the array is cycled through much faster. I had no idea Compare executed that slowly... At this point, I'm not even going to bother with the NUMREF utility function, given that I would have to issue a BLWP call for each array element, which is going to be even slower.

Fascinating... With a 3 MHz clock and 14 cycles or more to do almost anything, you get a VERY slow machine. I would estimate 120,000 instructions per second or less. State of the art circa the 1970s.

By comparison, the modern MSP430 from TI, which has a similar instruction set and register count to the TMS9900, does some instructions in 1 clock cycle but on average takes about 2.5. So with the same 3 MHz clock it would give you 1,200,000 instructions per second.

This is part of the fun of programming the old devices. You must be smarter <oldfartrant> than these kids today, programming multi-core GHz processors in Javascript.</oldfartrant>
+Vorticon (Author) Posted August 24, 2017

I'm going to measure the PIO throughput tonight with and without the compare instruction to get an objective sense of the difference, and for the advancement of human knowledge.
apersson850 Posted August 24, 2017 (edited August 24, 2017)

Quote:
    3 MHz clock, 14 cycles or more to do almost anything and you get a VERY slow machine. I would estimate 120,000 instructions per second or less.

Considering that a standard TI 99/4A runs most of your own assembly code in memory which has four wait-states per access, you can add roughly 10 more cycles per instruction, and then you land at almost exactly what you estimate. On the other hand, having 14 cycles per instruction gives you about 214,000 instructions/s, so a modification to have all 16-bit RAM is quite valuable when it comes to performance.

Due to a more efficient internal design, the TMS 9995 is capable of executing the same instructions using about 1/3 of the clock cycles. But it must always read memory byte by byte, except when using the internal scratch pad RAM inside the chip.

Vorticon, you know you can start counting the clock cycles if you want to, right? You don't have to measure.
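The arithmetic behind these two figures can be written out as a rough sketch. It assumes a 3 MHz clock and treats both the 14-cycle figure and the roughly 10 extra wait-state cycles as per-instruction averages taken from the posts above, not as measured values:

```latex
\frac{3\,000\,000\ \text{cycles/s}}{14\ \text{cycles/instruction}} \approx 214\,000\ \text{instructions/s}
\qquad
\frac{3\,000\,000\ \text{cycles/s}}{(14 + 10)\ \text{cycles/instruction}} = 125\,000\ \text{instructions/s}
```

The second result is what lands close to the 120,000 instructions per second estimated earlier.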
+Vorticon (Author) Posted August 24, 2017

Quote:
    Vorticon, you know you can start counting the clock cycles if you want to, right? You don't have to measure.

Yes I know, but I want to measure the actual pulse output from the PIO port with my oscilloscope, as there may be some variation related to hardware. But the main reason is that I get to play with the oscilloscope.
apersson850 Posted August 24, 2017

Being crystal controlled, the variation from the hardware should be negligible. But the playing with the scope thing I endorse fully, so I'll not object any more!
+InsaneMultitasker Posted August 25, 2017

Quote:
    So I'm back at it with another project, and I've run into a snag: I need to send an entire array from XB to a small assembly program to be put out to the PIO port. The problem is that while I can give the assembly program access to the array via the NUMREF utility, it seems that I can only access one element for each XB call.

Looks like you figured out NUMREF and the arrays. Neat stuff.

It isn't clear to me how many elements you intend to send or how they are 'calculated'. An alternative might be to use a string. If your values are 0<=x<=255 and the total number of elements is <256, you could build the data into a string, pass the string with STRREF, and use the resulting length byte to send the 'elements' to PIO in that fashion. For example,

    FOR I=1 TO 255::A$=A$&CHR$(I)::NEXT I::CALL LINK("SEND",A$)

would send 255 bytes.

I suppose it also depends on whether you are building the elements on the fly, retrieving from disk, etc. The PIO port being 8 bits wide lends itself to byte values.
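The string approach described above might look roughly like the fragment below on the assembly side. This is only a sketch under stated assumptions: `STRREF EQU >2014` is the vector commonly used for the Extended BASIC utility and should be checked against your own loader, `BUFFER` and `DONE` are hypothetical names, and `PIO` is the label from the loop posted earlier in the thread. Entry, workspace setup, enabling the RS232 card and the return to XB are all left out.

```asm
* SKETCH ONLY. STRREF ADDRESS, BUFFER AND DONE ARE ASSUMPTIONS.
* PIO IS THE LABEL DEFINED IN THE EARLIER ROUTINE.
STRREF EQU  >2014           XB STRING-FETCH VECTOR (ASSUMED)
* DATA AREA: KEEP IT OUTSIDE THE INSTRUCTION STREAM
BUFFER BSS  256             BYTE 0 = LENGTH, THEN UP TO 255 CHARACTERS

       LI   R0,>FF00
       MOVB R0,@BUFFER      TELL STRREF THE BUFFER HOLDS 255 BYTES
       CLR  R0              0 = SIMPLE VARIABLE, NOT AN ARRAY ELEMENT
       LI   R1,1            FIRST ARGUMENT AFTER THE NAME IN CALL LINK
       LI   R2,BUFFER       WHERE TO PUT LENGTH BYTE + STRING
       BLWP @STRREF         FETCH A$ FROM XB
       MOVB @BUFFER,R3      ACTUAL STRING LENGTH INTO HIGH BYTE
       SRL  R3,8            MOVE TO LOW BYTE, ZEROING THE REST
       JEQ  DONE            EMPTY STRING: NOTHING TO SEND
       LI   R2,BUFFER+1     POINT TO FIRST CHARACTER
LOOP   MOVB *R2+,@PIO       SEND STRING BYTE TO PIO
       DEC  R3
       JNE  LOOP
DONE   EQU  $               CONTINUE HERE (RETURN TO XB, ETC.)
```

The inner loop is the same decrementing loop as before; only the way the data and its length arrive from XB changes, with a single BLWP call for the whole string instead of one per element.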
+Vorticon (Author) Posted August 25, 2017

Quote:
    Looks like you figured out NUMREF and the arrays. Neat stuff.

    It isn't clear to me how many elements you intend to send or how they are 'calculated'. An alternative might be to use a string. If your values are 0<=x<=255 and the total number of elements is <256, you could build the data into a string, pass the string with STRREF, and use the resulting length byte to send the 'elements' to PIO in that fashion. For example,

        FOR I=1 TO 255::A$=A$&CHR$(I)::NEXT I::CALL LINK("SEND",A$)

    would send 255 bytes.

    I suppose it also depends on whether you are building the elements on the fly, retrieving from disk, etc. The PIO port being 8 bits wide lends itself to byte values.

The size of the array is limited to 256 bytes, and the number of actual elements is calculated prior to sending the data to the PIO port. Using string passing is definitely an option that I did not think about. Cool!
+Vorticon (Author) Posted August 25, 2017

So I tested the PIO throughput speed for my assembly routine.

Here's the result with the compare instruction: [oscilloscope image]

And here's the one without the compare instruction: [oscilloscope image]

Interestingly, the throughput from the PIO port was about the same at about 9.1 kHz, although the pulse was longer by approximately 9 microseconds with the compare instruction, because the data on the PIO port stays longer while the compare instruction is being processed. But then, when resetting the array to the beginning, there are fewer instructions to process in the program with the compare instruction. So in the end it turns out to be a wash... Interesting finding!

That said, for a 3 MHz computer, that throughput seems awfully low... Does it sound right?
apersson850 Posted August 25, 2017 (edited August 25, 2017)

What kind of data are you measuring? If you have data that alternates from on to off, then on again, on the same pin for each cycle, then the duty cycle of your signal should be roughly 50%. It's obviously not.

I get 120 cycles for the first method and 92 for the second, in the inner loop. This is assuming everything is in slow memory. With everything in fast memory, it's 64 vs. 48. 48 cycles is equivalent to 31.25 kHz, since you have to run through the loop twice to output first a "one", then a "zero". By pre-loading a register with PIO, you can change the MOVB to output to *R5 instead of @PIO, saving eight more cycles, and thus reaching 37.5 kHz.

If we assume workspace in fast memory and the rest in slow memory, I get 100 vs. 76 cycles. The latter implies 19.7 kHz.

An interesting observation is that by "simply" adding 16-bit wide zero wait-state RAM to the machine, you roughly double the speed of the computer, in the cases where you make things easy for yourself and let both code and workspace reside in expansion RAM. Especially when writing software that works together with a higher level language, like Extended BASIC or Pascal, it's valuable to be able to leave the RAM pad as it is, to avoid messing up things you shouldn't, and still have full speed from the CPU. For software that frequently accesses VDP RAM and such, the impact is of course smaller.
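The register pre-load mentioned above could be applied to the decrementing loop roughly as follows. This is a sketch, not tested on hardware: `VERTEX` and `PIO` are the labels from the routine posted earlier, and R5 is simply assumed to be a free register in the routine's workspace.

```asm
* SKETCH ONLY: DECREMENTING LOOP WITH THE PIO ADDRESS HELD IN R5.
* R5 IS ASSUMED TO BE UNUSED ELSEWHERE IN THE ROUTINE.
       LI   R5,PIO          PRE-LOAD PORT ADDRESS ONCE
REDO   CLR  R3
       MOVB @VERTEX,R3      GET NUMBER OF VERTICES IN ARRAY
       SWPB R3
       LI   R2,VERTEX+1     POINT TO FIRST DATA BYTE
LOOP   MOVB *R2+,*R5        SEND ARRAY BYTE TO PIO
       DEC  R3
       JNE  LOOP
       JMP  REDO
```

The only change inside the loop is the destination operand of MOVB: register indirect addressing avoids fetching the port address from the instruction stream on every pass.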
+Vorticon (Author) Posted August 25, 2017

Actually the frequency I got on the oscilloscope should be multiplied by 3 because there are 3 elements in my test array. The cycle is pin D7, D6, then D7, and my probe is on D6. So the PIO output frequency is more like 27.3 kHz...
+Vorticon (Author) Posted August 25, 2017

Quote:
    What kind of data are you measuring? If you have data that alternates from on to off, then on again, on the same pin for each cycle, then the duty cycle of your signal should be roughly 50%. It's obviously not.

    I get 120 cycles for the first method and 92 for the second, in the inner loop. This is assuming everything is in slow memory. With everything in fast memory, it's 64 vs. 48. 48 cycles is equivalent to 31.25 kHz, since you have to run through the loop twice to output first a "one", then a "zero". By pre-loading a register with PIO, you can change the MOVB to output to *R5 instead of @PIO, saving eight more cycles, and thus reaching 37.5 kHz.

    If we assume workspace in fast memory and the rest in slow memory, I get 100 vs. 76 cycles. The latter implies 19.7 kHz.

    An interesting observation is that by "simply" adding 16-bit wide zero wait-state RAM to the machine, you roughly double the speed of the computer, in the cases where you make things easy for yourself and let both code and workspace reside in expansion RAM. Especially when writing software that works together with a higher level language, like Extended BASIC or Pascal, it's valuable to be able to leave the RAM pad as it is, to avoid messing up things you shouldn't, and still have full speed from the CPU. For software that frequently accesses VDP RAM and such, the impact is of course smaller.

Great info. Not sure why I'm not seeing a faster frequency for the decrementing loop on the scope.
apersson850 Posted August 25, 2017 (edited August 25, 2017)

Well, frequency is how often a period is repeated. So you need two data elements to create a frequency. 1 Hz is one signal per second, but you need two data output operations, one to turn on and the other to turn off, to create that frequency. Thus it takes two updates per second to get 1 Hz.

Thus it would be correct to say that you can achieve 14 kHz by removing one of the three data elements. That also implies that you are reloading the array frequently. My calculations were based on the assumption that the array is at least 100 elements long, so that the time for the reload doesn't really matter. That's what gave me the 19.7 kHz figure.
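As a worked version of this reasoning, assuming a 3 MHz clock, one output byte per pass through the inner loop, and the cycle counts quoted earlier in the thread (with N the number of clock cycles per pass):

```latex
f_{\text{out}} = \frac{f_{\text{clk}}}{2N}
\qquad
\frac{3\ \text{MHz}}{2 \cdot 48} = 31.25\ \text{kHz}
\qquad
\frac{3\ \text{MHz}}{2 \cdot 40} = 37.5\ \text{kHz}
\qquad
\frac{3\ \text{MHz}}{2 \cdot 76} \approx 19.7\ \text{kHz}
```

The factor of 2 reflects the two writes (one "on", one "off") needed per period. These figures ignore the array reload between passes, so they only hold for long arrays.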
+Vorticon (Author) Posted August 26, 2017

So more tribulations... I set up the assembly program to scan for any key using KSCAN at the end of each array processing cycle, and if a key is detected it will return to XB. That works just fine and it dutifully returns to XB on demand.

The problem is that the key that was pressed is somehow retained, and when a CALL KEY is used subsequently in the XB program, it registers that key, which is undesirable. I tried placing >FF in location >8375 (the address of the key pressed with KSCAN) prior to returning to XB, with no effect. Clearing that location also does nothing. Clearing the GPL status byte before exiting also has no effect.

Is there some magical location specific to XB where the key from KSCAN is stored, by any chance? I tell you, my hair is thinning by the minute.
senior_falcon Posted August 26, 2017

Have you tried waiting until the finger is off the key before returning to XB?
+Lee Stewart Posted August 26, 2017

Check KSCAN for "no keystroke" before returning to XB to avoid key bounce.

...lee
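One way to sketch the "wait for no keystroke" check suggested above is shown below. It makes several assumptions: `KSCAN EQU >201C` is the vector commonly used for the Extended BASIC utility and should be verified for your setup, `KEYVAL`, `NOKEY` and `WAIT` are hypothetical names, and the keyboard unit byte at >8374 is taken to be set up already by the existing routine. The >8375 location and the >FF "no key" value are as described earlier in the thread.

```asm
* SKETCH ONLY. KSCAN ADDRESS AND LABEL NAMES ARE ASSUMPTIONS.
KSCAN  EQU  >201C           XB KEYBOARD-SCAN VECTOR (ASSUMED)
KEYVAL EQU  >8375           KEY CODE RETURNED BY KSCAN
* DATA: KEEP IT OUTSIDE THE INSTRUCTION STREAM
NOKEY  BYTE >FF             VALUE RETURNED WHEN NO KEY IS DOWN
       EVEN

* PLACE THIS JUST BEFORE THE RETURN TO XB
WAIT   BLWP @KSCAN          SCAN THE KEYBOARD AGAIN
       CB   @KEYVAL,@NOKEY  IS A KEY STILL DOWN?
       JNE  WAIT            YES: KEEP WAITING FOR RELEASE
* KEY RELEASED: CONTINUE WITH THE RETURN TO XB
```

Unlike a fixed delay loop, this waits exactly as long as the key is held, so a slow finger does not leak a keystroke into the next CALL KEY.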
+InsaneMultitasker Posted August 26, 2017

IIRC, you can stuff one of the bytes in the >83Cx range with the scanned key to trick the auto-repeat functionality into a debounce. Check Thierry's page, as I believe he goes into a bit of this in the keyboard or ROM section(s).

If the solutions posted by SeniorFalcon and Lee do not work for some reason, you could also opt to scan for the CTRL or SHIFT key with a simple CRU test. I recycled this from TIMXT's interpreter, with a few minor changes, as an example.

    * TEST FOR a pause
    NOPS1  LIMI 0            don't allow interrupts during CRU operation.
           LI   R12,>0024    set row
           CLR  R0           to 0
           LDCR R0,3
           LI   R12,6        set column
           TB   6            CTRL ?
           JNE  NOPSE        yes. exit
           TB   5            SHIFT?
           JNE  NOPSE        yes
    * (add a LIMI 2 here to trigger an interrupt before re-scanning)
           JMP  NOPS1        stay in loop
    NOPSE  LIMI 2            continue...
+Vorticon (Author) Posted August 26, 2017

Ah, debounce! I don't know why I did not think of it... A simple delay loop before returning to XB solved the problem, BUT with one notable exception: if I press the space bar instead of any other key to exit the assembly routine, when I am back in BASIC I get an uninterruptible stream of space characters coming in... It's the oddest thing! Any ideas here?
+Vorticon (Author) Posted August 27, 2017

OK, figured it out. Looks like if the RS232 card is still active upon return to XB, strange things happen with KSCAN. Properly turning off the card before exiting the assembly program solved the issue. Meshing hardware and software is tricky...