mizapf Posted April 7, 2013
@sometimes99er: Thanks for sharing the video!
Asmusr Posted April 8, 2013
Quote: "It looks like you're advanced enough to know the basics for speeding things up, but definitely unroll your clear loop (I find 4 times a good tradeoff if you can afford the space; the benefits taper off after 8). Likewise, it should be helpful to unroll your VDP copy loop 2-4 times. If you have enough space in scratchpad, those alone will give you back a noticeable amount of time. Also, replace @VDPWD in your VDP copy loop with a register; this saves a few cycles, both for reading the instruction and for processing it."
Ah yes, unrolling the loops, I forgot about that old trick. But a quick test shows that it doesn't help much, only about 20% faster. What is the best way to profile your code in Classic99? Is there some way to count how many cycles an entire subroutine call takes? Thanks for your advice.
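As a rough picture of what "unroll your clear loop 4 times" means, here is the same idea sketched in C. The thread's code is TMS9900 assembly, where each store would typically be a CLR *R0+ and the loop test a DEC/JNE pair; C is used here only to show the structure. The unrolled version pays for one loop test per four stores and clears the 0-3 leftover words first. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain clear loop: one store and one loop test per word. */
void clear_words(uint16_t *p, size_t n)
{
    while (n--) {
        *p++ = 0;
    }
}

/* Same loop unrolled 4x: one loop test per four stores.
   The 0-3 leftover words are cleared first, so the main loop
   always works in whole groups of four. */
void clear_words_unrolled4(uint16_t *p, size_t n)
{
    size_t rem = n & 3u;              /* n % 4 */
    while (rem--) {
        *p++ = 0;
    }
    for (n >>= 2; n > 0; n--) {       /* n / 4 groups */
        p[0] = 0;
        p[1] = 0;
        p[2] = 0;
        p[3] = 0;
        p += 4;
    }
}
```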
Opry99er Posted April 8, 2013
20% is quite a bit faster! =)
Tursi Posted April 8, 2013
Quoting Asmusr's reply above (only about 20% faster; how to count cycles in Classic99).
I would have expected better, unless you measured 20% overall and not just in the loop itself. I benchmarked unrolled copy loops on hardware a few years ago to see what could be done.
Classic99 has a timing mode in the debugger: set a start and an end point, and the debug log will emit the number of cycles executed between those two points. (In the help, it's described as T(0000-0001), where 0000 is the start address and 0001 is the end. More accurately, the first address resets the counter to zero, and the second dumps it to the debug log.)
matthew180 Posted April 9, 2013 (Author)
Quote: "... can you tell me how DIV and DIVS are implemented? In particular, I'm interested in the overflow detection of DIVS, because right now I have an ugly piece of code in MESS to predict an overflow ..."
DIV was one part of the CPU that worried me when I started, because I had no idea how it was done. After a lot of research I found a few methods:
* subtract and replace
* subtract and shift
* a complicated (but fast) prediction method with lots of math I don't understand
* turn the divisor into its reciprocal and multiply the dividend by it. This method requires a lookup table, but is also fast and can be done in a consistent time. This is the method used on the CRAY-1 computer. Check these links out:
http://bitsavers.informatik.uni-stuttgart.de/pdf/cray/2240004C_CRAY-1_Hardware_Reference_Nov77.pdf
http://bitsavers.informatik.uni-stuttgart.de/pdf/cray/
I chose a version of the subtract-and-shift method and modified it to be 32x16 (32-bit dividend, 16-bit divisor) instead of having a dividend and divisor of the same size. In looking at my HDL to reply to this post, I found a bug though! Grrr. When I modified the NxN divide example into an NxM divide, I failed to realize a subtle difference caused by the difference in dividend vs. divisor size. So, my DIV in the GPU is currently limited to a 15-bit divisor and a 31-bit dividend. I'm glad you asked the question, though, since it forced me to review the HDL and find the error. I have a fix, but of course that is another update on top of the one I *just* put out. Then again, I don't suppose the cycle of *yet another update* will ever end.
The 9900 does not have signed division (DIVS), so I did not have to deal with it, but I did go learn about it so I could reply with some sort of useful information.
Here is a link I found that explains how to perform signed division and handle the sign of the results: http://homepages.cae.wisc.edu/~ece352/fall00/project/pj2.pdf
Basically, signed division is done the same way as unsigned, by converting the inputs to unsigned values first. A few tests on the inputs determine the resulting signs of the quotient and remainder:
1. The remainder has the same sign as the dividend.
2. The quotient is positive if the divisor and dividend have the same sign, otherwise negative.
The minimal extra overhead of DIVS over DIV is probably checking and converting the inputs to unsigned as necessary, and setting two flags based on the original inputs so the output values can be made negative if necessary. Overflow can be checked the same way as for unsigned division once the inputs are converted to unsigned values, before doing the actual DIVS, just like DIV. (Keep in mind that a signed 16-bit quotient only ranges from -32768 to 32767, so the check has to be tighter than the unsigned one.)
For a lopsided divider, i.e. dividers with configurations of 32-bit x 16-bit, or 16-bit x 8-bit, or 8-bit x 4-bit, etc., the value of the divisor must be greater than the value of the most-significant half of the dividend. If not, overflow occurs, and this can be checked even before the division begins. I did an exercise to see this work using a 4x2 divider:
4-bit x 2-bit divider: the dividend is two 2-bit registers, and the divisor is one 2-bit register. The quotient and remainder must each be 2-bit, i.e. 0..3.
binary         decimal
-----------    ------------------
11:11 / 11  =  15 / 3 = 5r0   overflow, MS-half of dividend >= divisor
11:10 / 11  =  14 / 3 = 4r2   overflow, MS-half of dividend >= divisor
11:01 / 11  =  13 / 3 = 4r1   overflow, MS-half of dividend >= divisor
11:00 / 11  =  12 / 3 = 4r0   overflow, MS-half of dividend >= divisor
----
10:11 / 11  =  11 / 3 = 3r2
10:10 / 11  =  10 / 3 = 3r1
10:01 / 11  =   9 / 3 = 3r0
10:00 / 11  =   8 / 3 = 2r2
01:11 / 11  =   7 / 3 = 2r1
01:10 / 11  =   6 / 3 = 2r0
01:01 / 11  =   5 / 3 = 1r2
01:00 / 11  =   4 / 3 = 1r1
00:11 / 11  =   3 / 3 = 1r0
00:10 / 11  =   2 / 3 = 0r2
00:01 / 11  =   1 / 3 = 0r1
00:00 / 11  =   0 / 3 = 0r0

You can see that if the MS bits are >= the divisor, the result will not fit in the 2-bit quotient, and thus overflow is set. What also becomes clear by looking at this is the limitations within the range of numbers. For example, you can't divide 8 by 2, since the answer is 4r0, but 4 cannot be represented with 2 bits and would set the overflow flag. So, in the 32x16 division, there are actually many ranges of numbers that cannot be divided if the dividend is too large compared to the divisor. Luckily, this is easily determined by comparing the divisor to the MS word of the dividend, as noted above.
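Both rules in this post, the lopsided overflow test and the DIVS sign handling described earlier, fit in a short C sketch. This is an illustration, not the F18A HDL: C's unsigned / and % stand in for the unsigned DIV, the signed range is checked after dividing rather than before, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Unsigned "lopsided" 2n-bit / n-bit divider (n <= 16): the quotient
   fits in n bits only if the divisor is greater than the most-significant
   half of the dividend, so overflow is known before dividing.
   divisor must be non-zero. */
bool lopsided_div_overflows(uint32_t dividend, uint32_t divisor, unsigned n)
{
    return divisor <= (dividend >> n);
}

/* Print the 4-bit / 2-bit table for divisor 3 (binary 11). */
void print_4x2_table(void)
{
    const unsigned d = 3;
    for (int v = 15; v >= 0; v--) {
        unsigned ms = (unsigned)v >> 2, ls = (unsigned)v & 3u;
        printf("%u%u:%u%u / 11 = %2d / 3 = %ur%u%s\n",
               ms >> 1, ms & 1u, ls >> 1, ls & 1u, v,
               (unsigned)v / d, (unsigned)v % d,
               lopsided_div_overflows((uint32_t)v, d, 2) ? "   overflow" : "");
    }
}

/* Signed 32/16 division via an unsigned divide plus the two sign rules.
   Returns false on overflow (quotient outside -32768..32767).
   divisor must be non-zero. */
bool divs32x16(int32_t dividend, int16_t divisor, int16_t *q, int16_t *r)
{
    /* Magnitudes; unsigned negation is safe for INT32_MIN and -32768. */
    uint32_t n = dividend < 0 ? 0u - (uint32_t)dividend : (uint32_t)dividend;
    uint32_t d = divisor < 0 ? 0u - (uint32_t)divisor : (uint32_t)divisor;
    bool neg_q = (dividend < 0) != (divisor < 0);   /* rule 2 */
    bool neg_r = dividend < 0;                      /* rule 1 */
    uint32_t uq = n / d, ur = n % d;

    if (uq > (neg_q ? 32768u : 32767u))             /* signed 16-bit range */
        return false;
    *q = neg_q ? (int16_t)-(int32_t)uq : (int16_t)uq;
    *r = neg_r ? (int16_t)-(int32_t)ur : (int16_t)ur;
    return true;
}
```

Calling print_4x2_table() lists the same sixteen rows as the table above, with the four 11:xx rows marked as overflow.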
Some of the ranges for various NxM divisions (maximum allowable dividend with the maximum divisor):

4-bit / 2-bit
-------------
10:11 / 11 = 11 / 3 = 3r2
Max 4-bit dividend value - max allowable dividend: 15 - 11 = 4

8-bit / 4-bit
-------------
1110:1111 / 1111 = 239 / 15 = 15r14
Max 8-bit dividend value - max allowable dividend: 255 - 239 = 16

16-bit / 8-bit
--------------
11111110:11111111 / 11111111 = 65279 / 255 = 255r254
Max 16-bit dividend value - max allowable dividend: 65535 - 65279 = 256

32-bit / 16-bit
---------------
1111111111111110:1111111111111111 / 1111111111111111 = 4294901759 / 65535 = 65535r65534
Max 32-bit dividend value - max allowable dividend: 4294967295 - 4294901759 = 65536

It is interesting to note that the difference between the max value for a dividend (all bits '1') and the max allowable dividend is one greater than the max value of the divisor.

Here is a C program I wrote as an interpretation of the HDL divider (the corrected version) in the F18A:

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

enum { IDLE, OP, DONE, QUIT };

int main(int argc, char *argv[])
{
    uint32_t d;          // 16-bit divisor register
    uint32_t rl;         // 16-bit dividend ls half (becomes the quotient)
    uint32_t rh;         // 16-bit dividend ms half (becomes the remainder)
    uint32_t msb = 0;    // 1-bit ms bit shifting out of rh
    uint32_t diff = 0;   // 16-bit partial remainder
    uint32_t sub17;      // 17-bit subtraction
    uint32_t q_bit = 0;  // 1-bit quotient bit
    uint8_t  count;      // 4-bit 0 to 15 counter
    uint32_t dividend;   // 32-bit input dividend
    uint32_t divisor;    // 16-bit input divisor
    uint32_t q;          // 16-bit output quotient
    uint32_t r;          // 16-bit output remainder
    uint32_t state;

    if ( argc != 3 ) {
        printf("\nUsage: %s dividend divisor\n\n", argv[0]);
        return 1;
    }

    // strtoul (unlike atoi) accepts dividends above 2^31 - 1.
    dividend = (uint32_t)strtoul(argv[1], NULL, 10);
    divisor = (uint32_t)(strtoul(argv[2], NULL, 10) & 0xFFFF);
    state = IDLE;

    while ( state != QUIT ) {
        // FSM is synchronous with the main clock.
        switch ( state ) {
        case IDLE :
            d = divisor;
            rl = (dividend & 0xFFFF);
            rh = ((dividend >> 16) & 0xFFFF);
            msb = 0;
            count = 15;
            printf("%" PRIu32 " / %" PRIu32 "\n", (rh << 16) + rl, d);

            // Divisor must be greater than the dividend ms half.
            if ( d > rh )
                state = OP;
            else {
                printf("overflow\n");
                return 1;
            }
            break;

        case OP :
            // msb stores the shifted-out bit of the partial remainder for the 17-bit subtract.
            msb = ((diff << 1) & 0x10000);  // positioned at bit 16 for the 17-bit subtraction
            // rh shifts left and stores the difference and next dividend bit.
            rh = ((diff << 1) & 0xFFFE) + ((rl & 0x8000) >> 15);
            // rl shifts left and stores the quotient bits.
            rl = ((rl << 1) & 0xFFFE) + q_bit;
            if ( count == 0 )
                state = DONE;
            count--;
            break;

        case DONE :
            // Final iteration stores the quotient and remainder.
            rh = diff;
            rl = ((rl << 1) & 0xFFFE) + q_bit;
            state = QUIT;
            break;

        default :
            state = QUIT;
            break;
        }

        printf("diff:%" PRIu32 " q_bit:%" PRIu32 " msb:%" PRIu32 " rh:%" PRIu32 " rl:%" PRIu32 "\n",
               diff, q_bit, msb, rh, rl);

        // Combinatorial logic, always evaluated.
        // 17-bit subtraction
        sub17 = (msb + rh) - d;

        // If the partial remainder is greater than or equal to the divisor,
        // the result will be positive (bit 16 clear).
        if ( (sub17 & 0x10000) == 0 ) {
            // Partial remainder is the difference.
            diff = (sub17 & 0xFFFF);
            q_bit = 1;
        } else {
            // Partial remainder is still smaller than the divisor.
            diff = rh;
            q_bit = 0;
        }
    }

    q = rl;
    r = rh;
    printf("%" PRIu32 " R %" PRIu32 "\n", q, r);
    return 0;
} // main()

For anyone interested, below is the corrected VHDL 32x16 division implementation I wrote for the 9900 GPU. Note that overflow is tested outside of this module, before the DIV even starts, like this:

-- The divisor (src_oper) must be > the MS word of the dividend
if ws_dout >= src_oper then
   cpu_state <= st_cpu_status;
   div_overflow <= '1';
end if;

-- F18A V1.5
-- Matthew Hagerty, copyright 2013
--
-- Create Date: 20:09:36 04/12/2012
-- Module Name: f18a_div32x16 - rtl
--
-- Unsigned 32-bit dividend by 16-bit divisor division for the
-- TMS9900 compatible GPU. 16-clocks for the div op plus two
-- clocks state change overhead.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.std_logic_unsigned.all;

entity f18a_div32x16 is
   port (
      clk            : in  std_logic;
      reset          : in  std_logic;                   -- active high, forces divider idle
      start          : in  std_logic;                   -- '1' to load and trigger the divide
      ready          : out std_logic;                   -- '1' when ready, '0' while dividing
      done           : out std_logic;                   -- single done tick
      dividend_msb   : in  std_logic_vector(0 to 15);   -- MS Word of dividend 0 to FFFE
      dividend_lsb   : in  std_logic_vector(0 to 15);   -- LS Word of dividend 0 to FFFF
      divisor        : in  std_logic_vector(0 to 15);   -- divisor 0 to FFFF
      q              : out std_logic_vector(0 to 15);
      r              : out std_logic_vector(0 to 15)
   );
end f18a_div32x16;

architecture rtl of f18a_div32x16 is

   type div_state_t is (st_idle, st_op, st_done);
   signal div_state : div_state_t;

   signal rl      : std_logic_vector(0 to 15);  -- dividend lo 16-bits
   signal rh      : unsigned(0 to 15);          -- dividend hi 16-bits
   signal msb     : std_logic;                  -- shifted msb of dividend for 17-bit subtraction
   signal diff    : unsigned(0 to 15);          -- quotient - divisor difference
   signal sub17   : unsigned(0 to 16);          -- 17-bit subtraction
   signal q_bit   : std_logic;                  -- quotient bit
   signal d       : unsigned(0 to 15);          -- divisor register
   signal count   : integer range 0 to 15;      -- 0 to 15 counter
   signal rdy     : std_logic;
   signal dne     : std_logic;

begin

   -- Quotient and remainder will never be more than 16-bit.
   q <= rl;
   r <= std_logic_vector(rh);
   ready <= rdy;
   done <= dne;

   -- Compare and subtract to derive each quotient bit.
   sub17 <= (msb & rh) - ('0' & d);

   process (sub17, rh)
   begin
      -- If the partial result is greater than or equal to
      -- the divisor, subtract the divisor and set a '1'
      -- quotient bit for this round.
      if sub17(0) = '0' then
         diff <= sub17(1 to 16);
         q_bit <= '1';
      -- The partial result is smaller than the divisor
      -- so set a '0' quotient bit for this round.
      else
         diff <= rh;
         q_bit <= '0';
      end if;
   end process;

   -- Divide
   process (clk)
   begin
      if rising_edge(clk) then
         if reset = '1' then
            div_state <= st_idle;
         else
            rdy <= '1';
            dne <= '0';

            case div_state is

            when st_idle =>
               d <= unsigned(divisor);
               count <= 15;
               msb <= '0';
               -- Only change rl and rh when triggered so the registers
               -- retain their values after the division operation.
               if start = '1' then
                  div_state <= st_op;
                  rl <= dividend_lsb;
                  rh <= unsigned(dividend_msb);
                  rdy <= '0';
               end if;

            when st_op =>
               -- rl shifts left and stores the quotient bits.
               rl <= rl(1 to 15) & q_bit;
               -- rh shifts left and stores the difference and next dividend bit.
               rh <= diff(1 to 15) & rl(0);
               -- msb stores the shifted-out bit of rh for the 17-bit subtract.
               msb <= diff(0);
               rdy <= '0';
               -- Decrement only while count > 0: count is range 0 to 15,
               -- so assigning count - 1 at zero is out of range in simulation.
               if count = 0 then
                  div_state <= st_done;
               else
                  count <= count - 1;
               end if;

            when st_done =>
               -- Final iteration stores the quotient and remainder.
               rl <= rl(1 to 15) & q_bit;
               rh <= diff;
               dne <= '1';
               div_state <= st_idle;

            end case;
         end if;
      end if;
   end process;

end rtl;

If you are not familiar with HDL, note that the assignments ( <= ) are not done in a serial manner like in programming, but in parallel (this is hardware, not software). So, in something like:

count <= count - 1;
if count = 0 then ...

the value of "count" (a register) in the condition has not changed when the "if" is evaluated, and will not change to the new value until the next clock cycle. That is one of the things you have to wrap your head around when moving from programming to describing hardware. The inherent parallelism is very cool IMO.
Edited July 2, 2017 by matthew180
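The parallel-assignment point can be modelled in software by keeping a "current" and a "next" copy of the registers: every read inside one clock edge sees the current values, every <= writes the next values, and the copy is committed when the edge ends. A minimal C sketch under that assumption; regs_t and both function names are hypothetical.

```c
#include <stdbool.h>

/* A tiny register set: a down-counter and a done flag. */
typedef struct {
    int  count;
    bool done;
} regs_t;

/* Hardware view of: count <= count - 1; if count = 0 then done <= '1'.
   Reads use cur, writes go to next, and next is returned as the
   register contents after the clock edge. */
regs_t edge_registered(regs_t cur)
{
    regs_t next = cur;             /* registers hold their value unless assigned */
    next.count = cur.count - 1;    /* scheduled, not yet visible */
    if (cur.count == 0)            /* the test still sees the OLD count */
        next.done = true;
    return next;
}

/* Software view of the same two lines, executed one after the other. */
regs_t edge_sequential(regs_t r)
{
    r.count = r.count - 1;
    if (r.count == 0)              /* sees the value just written */
        r.done = true;
    return r;
}
```

Starting from count = 1, the sequential version sets done on its first call, while the registered version needs a second clock edge to see count = 0.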
mizapf Posted April 9, 2013
Matthew, thanks a lot for the elaborate answer! I'll have to save that message for some spare time to come during the next days ... hopefully ...
matthew180 Posted July 24, 2013 (Author)
On page 2 of this thread, around post #38, I talked about what the console ISR does and also how much I dislike the way the 99/4A only has a single external interrupt available. Don't get me wrong, I like interrupts. They are efficient. But the way the 99/4A was designed, they add a lot of unnecessary overhead. For me, polling the VDP was the way to go; however, in light of the new information in the "Smooth Scrolling" thread, I have to change my opinion on this.
It appears that polling the original 9918A VDP can be problematic. Apparently the original VDP does not handle this asynchronous hazard: if you read the status byte at the same moment the VDP is updating it, the flags are cleared and come back in the status byte as clear, i.e. you miss the VSYNC and/or collision flags. If you can tolerate a few VSYNC or collision misses every few seconds, then polling might still be an option. But for smooth scrolling, timing, or sound processing, the glitches can be noticeable.
Because of this new information, using the VDP interrupt, and therefore the console ISR, seems to be a more stable option. The only problem with the console ISR is all the baggage it brings along, but luckily most of it can be disabled. It still has overhead, since it makes various checks before getting to your code, so you will have to decide if that is acceptable.
Assuming you read page 2 from post #38 on, I'm only going to give a small example of disabling the ISR features and hooking your own routine. The control byte for the ISR is in the 16-bit scratchpad RAM at address >83C2. The first four bits control the various ISR services: auto sprite motion, the sound player, and the QUIT key. Yes, that is only three services; the fourth is "disable all" (and is actually the first bit, the MSb, in the control byte).
To be safe, I set the control byte to >FF, which ensures everything in the ISR is disabled.
One *nice* thing the ISR does is allow us to "hook" into it. When the ISR is done with its processing, it checks the 16-bit value at scratchpad address >83C4, and if it is not zero, it uses that value as an address to branch to. So, we can put the address of a subroutine we wrote in >83C4, and when the VDP generates an interrupt, the console ISR (after a little overhead) will call our code. It is actually very nice; I just wish the check for the ISR hook had been *first* in the console ISR instead of last.
The hook code we write should be small, since an ISR is supposed to "service" something and return quickly. In my example, I simply use a flag that can only be set by my ISR hook. That keeps things very small and neat.
One more thing about the ISR hook. Before you load your own ISR routine's address into >83C4, you are supposed to check whether the value has already been set. If it has, some other code has also hooked into the console ISR, and you are supposed to save that address before loading your own. Then, when your own ISR is done, instead of returning with B *R11, you branch to the address you found at >83C4. This allows multiple programs to hook the ISR, and all of them will be called in a chain. You will probably only see this if you are working on assembly code to be LINKed into XB or something similar. If your program is loaded via the E/A, or you expect to be the only program running, you can just ignore any previous value. :-)
So, on with the example. I like to use names in my code instead of remembering addresses and numbers, so I set up some equates (#define for those more familiar with other languages):
*Note: as of this post, code tags are still broken on the forum, so the spacing will not work out. Thus I'm not even going to try (much). You will have to reformat this yourself.
ISRCTL EQU  >83C2        * Four flags: disable all, skip sprite, skip sound, skip QUIT, XXXX
USRISR EQU  >83C4        * Interrupt service routine hook address

My ISR simply sets a flag and returns. This is the *only* place in the code where this flag can be set to something other than zero. Very important. The flag here is in 16-bit RAM, and is just an address I chose to use. It could also be allocated with a DATA statement:

FLGINT EQU  >8354        * Interrupt flag
- or -
FLGINT DATA 0
.
.  Somewhere in the code is the very simple ISR.
.
MYISR  INC  @FLGINT
       B    *R11

To hook into the console ISR, somewhere during your initialization and before you begin any game loop or such thing, do something like this:

       LI   R0,>FF00
       MOVB R0,@ISRCTL   * Disable everything in the ISR
       LI   R0,MYISR     * Address of my ISR
       MOV  R0,@USRISR   * Load my ISR into the console ISR hook address
       CLR  @FLGINT      * Initially clear my ISR flag

Keep in mind your code still runs mostly with LIMI 0, i.e. the CPU's interrupts disabled. Or not; it is up to you really. Now, in your game loop or wherever you need to wait for the VSYNC, use something like this:

VWAIT  CLR  @FLGINT          * Optional, depending on whether LIMI 0 is normal or you want to know you missed an interrupt
       LIMI 2                * Enable interrupts
VLOOP  MOV  @FLGINT,@FLGINT  * Cheap way to compare to zero
       JEQ  VLOOP            * If still zero, wait
       LIMI 0                * Optional, depending on how you want things to work
       CLR  @FLGINT          * Interrupt was set, so clear the flag and continue
.
.  Rest of the code.
.

How you manage things is up to you. You might not want to sit around in an idle loop waiting for the VSYNC, or maybe you do. You could just leave interrupts enabled (LIMI 2) and check the flag in your main loop, and if it is set, clear it and do something. If you want to know that you missed a VSYNC, you can leave interrupts enabled during your main processing and check the flag when you are done, etc. The only thing to remember is that you *MUST* disable interrupts when you communicate with the VDP, i.e. update the screen, set a VDP register, etc. This is because the console ISR reads the VDP status byte, which will mess up any VDP communication you had going on when the interrupt occurred. And if auto sprite motion is enabled, the console ISR is going to update the whole Sprite Attribute Table.
Edited July 24, 2013 by matthew180
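The hook chaining and the counting flag described in this post can be illustrated together in C. This is an analogy, not 99/4A code: user_isr_hook stands in for the word at >83C4, flgint for FLGINT, and other_program_isr for a hook that some other program installed first; all names are hypothetical. Because MYISR uses INC rather than storing a fixed value, the flag doubles as a frame counter, which is what frames_missed() relies on. In assembly, the chained call would be a branch to the saved address instead of B *R11.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*isr_hook_t)(void);

static isr_hook_t user_isr_hook = NULL;  /* the word at >83C4 */
static isr_hook_t previous_hook = NULL;  /* hook found there before ours */
static volatile uint16_t flgint = 0;     /* FLGINT: only the ISR increments it */
static unsigned other_calls = 0;         /* counts calls to the other hook */

/* A hook some other program (e.g. one LINKed into XB) installed first. */
static void other_program_isr(void) { other_calls++; }

/* Our hook: do the tiny bit of work, then pass control down the chain. */
static void my_isr(void)
{
    flgint++;                        /* INC @FLGINT */
    if (previous_hook != NULL)
        previous_hook();             /* assembly: branch to the saved address */
}

/* Install our hook, remembering whatever was already there. */
void install_hook(void)
{
    previous_hook = user_isr_hook;
    user_isr_hook = my_isr;
}

/* Last step of the console ISR: call the hook if the word is non-zero. */
void console_isr_tail(void)
{
    if (user_isr_hook != NULL)
        user_isr_hook();
}

/* Returns -1 if no interrupt has happened since the last call, otherwise
   the number of extra interrupts (missed frames), and clears the flag.
   On the 99/4A the read and clear would be done under LIMI 0 so an
   interrupt cannot land between them. */
int frames_missed(void)
{
    uint16_t n = flgint;
    if (n == 0)
        return -1;
    flgint = 0;
    return (int)n - 1;
}
```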
mizapf Posted August 7, 2013
Oh well, that was already some months ago... To relax, here is a little bit of math for you.
I have more time now, so I took a new look at the DIVS overflow detection, and now I think I've got it. I produced a heap of paper with wrong solutions; most of the attempts ended in sufficient but not necessary criteria (if the check indicates overflow, there will be an overflow, but if it says no, there could still be an overflow). But here is a solution that worked with all test cases.
Signed division, V = 32-bit signed dividend, D = 16-bit signed divisor, not 0.
Case V >= 0 and D > 0: overflow iff V > (D << 15) - 1.
Case V >= 0 and D < 0: overflow iff V > ((-D) << 15) + (-D) - 1.
Case V < 0 and D > 0: overflow iff (-V) > (D << 15) + D - 1.
Case V < 0 and D < 0: overflow iff (-V) > ((-D) << 15) - 1.
(Non-math people: "iff" = "if and only if". "D << 15" = shift D left by 15 bits, i.e. D * 2^15. The parentheses make the grouping explicit: the shift is done before the addition and subtraction.)
The symmetry in the solutions gives me some confidence that this should be correct, even catching the cases with that nasty >8000. My mistake in earlier attempts was that I divided the value by 2^15 and threw away the important bits in the lower positions.
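To sanity-check the four cases, here is a C sketch that encodes them and compares them against an actual 64-bit division. It assumes DIVS truncates toward zero with the remainder taking the dividend's sign (the rules from matthew180's post earlier) and must produce a quotient in -32768..32767; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four cases, with |V| and |D| computed in 64 bits so that
   V = -2^31 and D = >8000 (-32768) can be negated safely.
   d must be non-zero. */
bool divs_overflow_predicted(int32_t v, int16_t d)
{
    int64_t av = v < 0 ? -(int64_t)v : (int64_t)v;   /* |V| */
    int64_t ad = d < 0 ? -(int64_t)d : (int64_t)d;   /* |D| */
    int64_t limit = ad * 32768;                      /* |D| << 15 */

    if ((v < 0) != (d < 0))   /* signs differ: the "+ |D|" cases */
        limit += ad;
    return av > limit - 1;
}

/* Reference: divide for real (C truncates toward zero) and check
   whether the quotient fits in 16 bits signed. */
bool divs_overflow_actual(int32_t v, int16_t d)
{
    int64_t q = (int64_t)v / d;
    return q < INT16_MIN || q > INT16_MAX;
}
```

The two mixed-sign cases allow one extra multiple of |D| because a negative quotient may reach -32768, while a positive one stops at 32767.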
Vorticon Posted October 2, 2013
Quick question: The EA manual states that whenever the value in VR1 is being changed, the new value should first be saved in location >83D4 before effecting the change to the register. Is this truly necessary? I can't seem to see a difference regardless of whether I save the new value or not... Thanks.
matthew180 Posted October 2, 2013 (Author)
If interrupts (i.e. the console ISR) are enabled it might be necessary. It might be more of an "XB thing", but I'm not sure. Can't remember. Sorry for being so vague. If you are working with XB or some other language, plan to use the ISR, or call DSRs, then you probably should follow the rules. If you are taking over the console, i.e. a game that does not return to XB or anywhere else, do what you want with the scratchpad.
marc.hull Posted October 2, 2013
Quoting Vorticon's question above (saving VR1 at >83D4).
I "think".... The KSCAN routine (IIRC) writes the value found at >83D4 to VR1 when it is called. I guess it's part of the screen saver feature? I don't know if it is connected to the ISR or happens on every call. If you change screen modes and don't update >83D4, then a call to KSCAN will revert your screen mode back to what it was before you changed it.
Asmusr Posted October 2, 2013
This gives me an opportunity to post an old question that I was struggling with for Titanium and now again for Scramble. The code below is for a pause key toggle. It works fine to toggle the pause on, but when you try to toggle the pause off again, it is often turned back on immediately after. I assume the problem is lack of debouncing, but how can you prevent it except by adding long delays? Even if I could, I wouldn't want to call KSCAN.

* Test pause key P
       LI   R1,>0500      * Test column 5
       LI   R12,>0024     * Address for column selection
       LDCR R1,3          * Select column
       LI   R12,>000A     * P key
PAUSE1 TB   0
       JEQ  PAUSE1        * Wait for press
PAUSE2 TB   0
       JNE  PAUSE2        * Wait for release
*      Do pause stuff
PAUSE3 TB   0
       JEQ  PAUSE3        * Wait for press
PAUSE4 TB   0
       JNE  PAUSE4        * Wait for release
PAUSE5 ...
RXB Posted October 2, 2013
Why do you not save some memory and use the OS built-in keyscan? It does have a built-in debounce and would do the trick with no real cost in speed.
Tursi Posted October 2, 2013
Quoting Vorticon's question above (saving VR1 at >83D4).
It's necessary if you call KSCAN: when a key is pressed, KSCAN copies the value in >83D4 back into VR1 (to turn off any potential screen blanking).
Tursi Posted October 2, 2013
Quoting Asmusr's pause-key question and code above.
It likely is key bounce. The two ways around key bounce are long delays, or multiple samples over shorter delays: when you get the same result 3-4 times in a row, then you can accept it. Since it's a pause you are dealing with, I would just put the delay in the game loop, and ignore the pause key for 100ms or so after pausing (and vice versa), so the game runs immediately, but you can't pause again immediately.
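The "same result 3-4 times in a row" approach can be sketched in C as a small per-key state. This is an illustration, not code from the thread: DEBOUNCE_SAMPLES = 3, the struct, and the function name are assumptions, and raw_pressed would come from the CRU read (TB) once per sample, e.g. once per frame.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_SAMPLES 3   /* identical reads needed before accepting a change */

typedef struct {
    bool    stable;   /* last accepted (debounced) state */
    bool    last;     /* previous raw sample */
    uint8_t run;      /* how many times in a row `last` has been seen */
} debounce_t;

/* Feed one raw sample; returns true only on the sample where a new
   debounced *press* is accepted (releases are tracked but not reported). */
bool debounce_update(debounce_t *k, bool raw_pressed)
{
    if (raw_pressed == k->last) {
        if (k->run < 255) k->run++;
    } else {
        k->last = raw_pressed;   /* raw state changed: start a new run */
        k->run = 1;
    }
    if (k->run >= DEBOUNCE_SAMPLES && k->stable != k->last) {
        k->stable = k->last;
        return k->stable;
    }
    return false;
}
```

A single bounce restarts the run, so a toggle only happens after the key has read the same way for several consecutive samples.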
matthew180 Posted October 2, 2013 (Author)
Quoting marc.hull's post above (KSCAN restoring VR1 from >83D4).
Ah yes, exactly. The "screen saver" on the 99/4A blanks the screen by clearing the blank bit in VR1, so it has to know what to write for the other VR1 bits when blanking and unblanking.
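Why the copy at >83D4 matters can be sketched in C. Per the posts above, KSCAN rewrites VR1 from >83D4 when a key is pressed, and the screen saver blanks the display through the blank bit, which is >40 in VR1 on the 9918A (0 = blanked). vdp_shadow_vr1 stands in for >83D4 and the function names are hypothetical; the byte order in vdp_reg_write_bytes (value first, then >80 OR'ed with the register number) is the standard 9918A register-write sequence.

```c
#include <stdint.h>

#define VR1_BLANK 0x40u   /* 1 = display enabled, 0 = blanked */

static uint8_t vdp_shadow_vr1;   /* plays the role of >83D4 */

/* The two bytes a VDP register write sends to the write-address port. */
void vdp_reg_write_bytes(uint8_t reg, uint8_t value, uint8_t out[2])
{
    out[0] = value;
    out[1] = (uint8_t)(0x80u | (reg & 0x07u));
}

/* Set VR1 "by the rules": update the saved copy first, then the VDP. */
void set_vr1(uint8_t value, uint8_t out[2])
{
    vdp_shadow_vr1 = value;
    vdp_reg_write_bytes(1, value, out);
}

/* What blanking and unblanking would write, both derived from the copy. */
uint8_t vr1_blanked(void)   { return (uint8_t)(vdp_shadow_vr1 & ~VR1_BLANK); }
uint8_t vr1_unblanked(void) { return vdp_shadow_vr1; }
```

If the copy is stale (for example, still holding the old graphics-mode value after switching to bitmap mode), unblanking writes the stale mode bits back, which is the mode reversion marc.hull describes.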
matthew180 Posted October 2, 2013 (Author)
Quote: "I assume the problem is lack of debouncing, but how can you prevent it except by adding long delays"
Along the same lines as what Tursi already mentioned, you need to sample the key over a period of time. The console does this by imposing a long delay in the KSCAN routine, which is really bad for a game. Human reaction time is something on the order of 200ms (IIRC), so check the input for 5 consecutive frames and increment a counter if the key is pressed. Then, if the count is between 3 and 5, accept the key's state. You might have to play with the values a little, but that is the basic idea.
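A C sketch of the 5-frame counting idea: sample once per frame, count the pressed samples in each 5-frame window, and at the end of the window accept the key as pressed if 3 or more samples were pressed. The window size and threshold are the starting values from the post; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_FRAMES   5   /* frames per sampling window */
#define PRESS_THRESHOLD 3   /* pressed samples needed to accept "pressed" */

typedef struct {
    uint8_t frame;    /* frames sampled in the current window */
    uint8_t hits;     /* pressed samples in the current window */
    bool    pressed;  /* accepted state from the last complete window */
} window_t;

/* Call once per frame with the raw key state; returns the accepted state. */
bool window_update(window_t *w, bool raw_pressed)
{
    if (raw_pressed)
        w->hits++;
    if (++w->frame == WINDOW_FRAMES) {
        w->pressed = (w->hits >= PRESS_THRESHOLD);
        w->frame = 0;
        w->hits = 0;
    }
    return w->pressed;
}
```

At 60 frames per second, a 5-frame window is about 83 ms, comfortably below the ~200 ms reaction time mentioned above.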
matthew180 Posted October 2, 2013 (Author)
Quoting RXB's suggestion above to use the built-in KSCAN.
Because it does cost speed, to the tune of ~16ms per call, and that is *just* the delay loops used in KSCAN; it does not count the overhead of the rest of the routine. I don't think Rasmus can spare an extra frame just to read the key input and still maintain the scrolling.
Look in the TI-Intern at ROM addresses >0390 and again at >03B6. They branch to >0498, which is this code:

Time delay
0498  020C  LI   12,>04E2    Loop counter
049A  04E2
049C  060C  DEC  12
049E  16FE  JNE  >049C
04A0  045B  B    *11

>04E2 = 1250 decimal. DEC takes 10 clock cycles, and JNE takes 10 clock cycles (when the PC is changed, which it is inside the loop). The formula for instruction time is:

T = Tc * (C + W * M)

T  = instruction time in us
Tc = clock period, 0.333 us
C  = clock cycles, which is 20 in this case for both instructions
W  = wait states, which luckily is 0 because this is in 16-bit ROM
M  = memory accesses, which is 6 for both instructions

T = 0.333 * (20 + 0 * 6) = 6.66 us per loop iteration
6.66 us * 1250 loop iterations = 8.325 milliseconds

The delay is called twice in KSCAN. There are easily better ways, especially for games.
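The arithmetic above can be written as a small C helper, mainly to make it easy to redo the estimate for other loops. This is a sketch using the post's figures (0.333 us clock period, 20 clocks per DEC/JNE pass, no wait states); it ignores the LI and B *11 around the loop, and the function names are hypothetical.

```c
/* Instruction time as given in the post: T = Tc * (C + W * M), in microseconds. */
double instr_time_us(double tc_us, int clocks, int wait_states, int mem_accesses)
{
    return tc_us * (clocks + wait_states * mem_accesses);
}

/* One call of the delay routine at >0498: `iterations` passes through
   DEC 12 / JNE >049C. Returns milliseconds. */
double kscan_delay_ms(unsigned iterations)
{
    const double tc_us = 0.333;    /* 3 MHz clock period */
    /* 20 clocks per pass; M = 6 as quoted, irrelevant here since W = 0. */
    double per_pass_us = instr_time_us(tc_us, 20, 0, 6);
    return per_pass_us * iterations / 1000.0;
}
```

kscan_delay_ms(0x04E2) comes to about 8.3 ms, and two calls per KSCAN give the ~16 ms per call quoted above.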
Vorticon Posted October 3, 2013
Quoting matthew180's post above (the screen saver and the VR1 blank bit).
OK, this makes sense to me now. This issue came up as I continue work on Ultimate Planet: I noticed that VR1 was not saved prior to displaying the bitmap splash screen, with no ill effects. However, I did follow the rules in the main program, which also accesses KSCAN, so I should be good. Thanks for the insight, guys.
marc.hull Posted October 3, 2013
Quoting Asmusr's pause-key question and code above.
Perhaps your scheme could be to sample the key once per VDP interrupt (I assume that is your time slice). When you have 2 consecutive positive hits (during 2 consecutive frames), activate the pause/un-pause feature. After a positive press has been accepted, do not allow another "positive input" scan until 4 consecutive no-hits have been registered (on consecutive frames again). This would give you the de-bounce you need without any delay required, and the required key-down time should be no more than 1/15 of a second, which should work for a pause switch. That number (4 frames total) assumes 2 frames for settling with NTSC, which is most likely more than needed. Of course it is an arbitrary number and may need to be larger, but I would hazard that 2 would be more than enough for a solid press...
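This scheme maps onto a small two-state machine sampled once per frame. A C sketch, assuming the frame counts from the post (2 pressed frames to toggle, 4 released frames to re-arm); the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRESS_FRAMES   2   /* consecutive pressed frames needed to toggle */
#define RELEASE_FRAMES 4   /* consecutive released frames needed to re-arm */

typedef struct {
    bool    locked;   /* a toggle happened; waiting for a clean release */
    uint8_t run;      /* consecutive qualifying samples */
    bool    paused;   /* the toggled game state */
} pause_key_t;

/* Call once per frame with the raw key state; returns true on the
   frame where the pause state toggles. */
bool pause_key_update(pause_key_t *p, bool raw_pressed)
{
    if (!p->locked) {
        p->run = raw_pressed ? (uint8_t)(p->run + 1) : 0;
        if (p->run >= PRESS_FRAMES) {
            p->paused = !p->paused;
            p->locked = true;
            p->run = 0;
            return true;
        }
    } else {
        /* Any pressed sample (including a bounce) restarts the release count. */
        p->run = raw_pressed ? 0 : (uint8_t)(p->run + 1);
        if (p->run >= RELEASE_FRAMES) {
            p->locked = false;
            p->run = 0;
        }
    }
    return false;
}
```

Holding the key down never re-triggers the toggle, and a bounce during release only delays re-arming, which addresses the "turned back on immediately" symptom.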
Asmusr Posted October 3, 2013

Thank you for the replies. Perhaps a simple solution would be to execute the code only every 4 or 8 frames? That should prevent another pause from being triggered immediately after one is released. I will try that before moving on to the more advanced solutions.

If KSCAN is waiting an entire frame of 16 ms, that makes it useless for high-speed games. 16 ms is enough for scrolling, sprites, collision detection, sound, input reading, and game logic combined (actually Scramble only uses about 12 ms).

Edit: I just realized the sample code I posted was misleading. The first loop should not be there; instead of waiting, it should jump to PAUSE5 if the key is not pressed.
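Checking the key only every few frames, combined with the corrected control flow (skip ahead when the key is up instead of waiting), might look roughly like this. It is a hypothetical Python model of per-frame logic rather than TMS9900 code; the names are made up, and toggling on a press edge (key down now, up at the previous sample) is an assumption about how the toggle would be detected.

```python
class SparsePausePoller:
    """Hypothetical sketch: read the pause key only every `interval` frames."""

    def __init__(self, interval=4):
        self.interval = interval
        self.frame = 0
        self.was_down = False   # key state at the previous sample
        self.paused = False

    def on_frame(self, key_down):
        """Call once per frame; key_down is ignored on frames that are not sampled.

        Never busy-waits: if the key is up, the frame simply continues.
        """
        if self.frame % self.interval == 0:
            # Toggle only on a press edge; bounces between samples are never seen.
            if key_down and not self.was_down:
                self.paused = not self.paused
            self.was_down = key_down
        self.frame += 1
        return self.paused
```

With an interval of 4 or 8 frames, contact bounce that settles within the gap between samples cannot retrigger the toggle, at the cost of up to `interval - 1` frames of extra latency.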
jens-eike Posted October 3, 2013 (edited)

How about using different keys for (P)ause and (C)ontinue? This adds a compare, but avoids timers and counters (and comparing their results in turn).
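The two-key approach works because each key only ever drives the state in one direction, so a bouncing or held key simply repeats the same action. A minimal hypothetical sketch in Python (the function name is made up, and giving P priority when both keys are down is an arbitrary choice):

```python
def update_pause(paused, p_down, c_down):
    """Hypothetical sketch of separate Pause/Continue keys, evaluated once per frame.

    P only ever pauses and C only ever resumes, so repeated or bouncing
    readings of the same key cannot flip the state back.
    """
    if p_down:          # checked first: if both keys are down, P wins (arbitrary choice)
        return True
    if c_down:
        return False
    return paused       # neither key down: keep the current state
```

No timer or counter is involved; the only extra cost is testing a second key.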
retroclouds Posted October 3, 2013

Quote: Along the same lines as what Tursi already mentioned, you need to sample the key over a period of time. The console does this by imposing a long delay in the KSCAN routine, which is really bad for a game. Human reaction time is something on the order of 200 ms (IIRC), so check the input for 5 consecutive frames and increment a counter each frame the key is pressed. Then, if the counter is between 3 and 5, accept the key's state. You might have to play with the values a little, but that is the basic idea.

So I have to ask: is this KSCAN delay slowing down Basic / Extended Basic programs? Besides the GPL interpretation, I mean.
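The counter scheme in the quote could be modelled as follows. This is a hypothetical Python sketch of per-frame logic: the key is sampled once per frame, and after every block of 5 frames the number of "pressed" samples is compared against 3. Evaluating in fixed blocks of 5 (rather than a sliding window) is one possible reading of the description, and the names are made up. At 60 Hz, 5 frames take about 83 ms, comfortably below the ~200 ms reaction time mentioned.

```python
class FrameVoteKey:
    """Accept the key as pressed if it was down on at least `threshold` of `period` frames."""

    def __init__(self, period=5, threshold=3):
        self.period = period
        self.threshold = threshold
        self.frames = 0     # frames sampled in the current block
        self.count = 0      # frames in the current block with the key down
        self.state = False  # last accepted key state

    def sample(self, key_down):
        """Call once per frame; returns the last accepted key state."""
        self.frames += 1
        if key_down:
            self.count += 1
        if self.frames == self.period:
            # count is 0..period; 3 to 5 pressed frames out of 5 means "pressed"
            self.state = self.count >= self.threshold
            self.frames = 0
            self.count = 0
        return self.state
```

The accepted state only changes at block boundaries, so isolated bounce samples inside a block are outvoted.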
sometimes99er Posted October 3, 2013

Quote: So I have to ask: is this KSCAN delay slowing down Basic / Extended Basic programs? Besides the GPL interpretation, I mean.

KSCAN does have a delay after reading the keyboard: it counts down from 1250 (DEC, JNE, no interrupts and fast memory). I guess TI considered this value safe. I wonder if the 3 to 3.58 MHz CPU speedup hack makes it unsafe.

Waiting a frame before reading the keyboard directly should be safe too, since you can count to more than 1250 in one frame (unless something like speech stops you). Waiting a frame, on the other hand, might only be safe using the ISR (LIMI 2). I guess some kind of counting would be safe too (while reading the VDP status register), but it has to be proven (with screen blanking, speech, hardware attached, etc.).
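The cost of that countdown can be estimated from the 1250 figure. This back-of-the-envelope sketch assumes the published TMS9900 cycle counts for a register operand with zero wait states (DEC 10 cycles; JNE 10 cycles when taken, 8 when not); those counts and the loop shape are assumptions, not taken from the KSCAN listing, and the estimate covers the countdown alone, not the rest of KSCAN.

```python
# Back-of-the-envelope estimate of the KSCAN countdown (assumed cycle counts, no wait states).
COUNT = 1250          # countdown start value, from the post above
DEC_CYCLES = 10       # DEC Rn
JNE_TAKEN = 10        # JNE when the jump is taken
JNE_NOT_TAKEN = 8     # JNE on the final pass, when the count reaches zero


def countdown_cycles(count=COUNT):
    # The loop body runs `count` times; the jump is taken every time except the last.
    return count * DEC_CYCLES + (count - 1) * JNE_TAKEN + JNE_NOT_TAKEN


def cycles_to_ms(cycles, clock_hz):
    return cycles / clock_hz * 1000.0


if __name__ == "__main__":
    cycles = countdown_cycles()
    print(f"{cycles} cycles")
    for label, hz in (("stock 3.0 MHz", 3_000_000), ("3.58 MHz hack", 3_580_000)):
        print(f"{label}: {cycles_to_ms(cycles, hz):.2f} ms")
    # One 60 Hz frame is about 16.67 ms, i.e. 50,000 cycles at 3.0 MHz.
    print(f"cycles per 60 Hz frame at 3.0 MHz: {3_000_000 // 60}")
```

Under these assumptions the countdown takes roughly 8.3 ms at 3.0 MHz but only about 7.0 ms at 3.58 MHz, which is why the speedup hack shortens the debounce time, while a full 60 Hz frame (about 50,000 cycles) easily covers the count.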
Willsy Posted October 3, 2013

Quote: So I have to ask: is this KSCAN delay slowing down Basic / Extended Basic programs? Besides the GPL interpretation, I mean.

Yes. I can see it in TurboForth. TF uses the console's KSCAN routine, and if you put it in a short loop the delay is noticeable. Try this in TF; TEST1 calls KEY? (and drops its result) on every pass, while TEST2 is the same loop without it:

: TEST1 ( -- ) PAGE 500 0 DO I . KEY? DROP LOOP ;
: TEST2 ( -- ) PAGE 500 0 DO I . LOOP ;