jdgabbard Posted February 10, 2016

I need a delay function for a hardware interface. Essentially, I need a piece of modular code that I can use to wait when I cannot read the busy flag from a 16x2 LCD of the HD44780 variety, during initialization for example. Everything I have read suggests that this delay should be anywhere from 40us to 2.1ms. I have written a short delay function and think it will work for timing a 1.1ms delay. I'm basically just looking to make sure I'm not a long way off. Ports are not specified, as I am not entirely sure where I'm going to decode the LCD to. Here is the code; the syntax may be a little off...

    ;Routine for delaying approx 1.1ms with 6mhz clock
    ;
    ;My best guess is that this will generate about 6671 T-States
    ;at 166ns per cycle, for a total of about 1.108ms. Does this look
    ;correct?
    DELAY:  LD A, 0FFh      ;Load A with 11111111b
            CALL D_LOOP     ;Jump to Delay Loop
            RET
    D_LOOP: NOP             ;Do Nothing
            NOP
            NOP
            DEC A           ;Decrement A
            JP NZ, D_LOOP   ;If A>0 jump to Delay Loop
            RET             ;If A=0 Return
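As a cross-check on the estimate above, the T-states can be tallied with a few lines of Python. This is only a sketch: it assumes the loop is meant to repeat until A reaches zero (as the comments describe), uses the standard documented Z80 timings with no wait states, and leaves out the CALL that invokes DELAY. The names `T` and `original_delay_tstates` are made up for illustration.

```python
# Standard Z80 T-state costs, assuming no wait states.
T = {"LD A,n": 7, "CALL": 17, "NOP": 4, "DEC A": 4, "JP cc": 10, "RET": 10}


def original_delay_tstates(count=0xFF):
    """T-states for DELAY plus D_LOOP, excluding the caller's CALL DELAY."""
    # JP cc,nn costs 10 T-states whether or not the jump is taken.
    per_pass = 3 * T["NOP"] + T["DEC A"] + T["JP cc"]
    # One RET leaves D_LOOP, the second leaves DELAY.
    return T["LD A,n"] + T["CALL"] + count * per_pass + 2 * T["RET"]


CLOCK_MHZ = 6  # assumed clock, as stated in the post
total = original_delay_tstates()
print(total, "T-states =", round(total / CLOCK_MHZ, 1), "us")
```

Under those assumptions the tally comes to 6674 T-states, or roughly 1.11 ms at 6 MHz, which is in line with the estimate in the post.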
GroovyBee Posted February 10, 2016

First, a quick comment about commenting assembly language: don't explain what the instruction is doing. You can see that by looking at the opcode and its data. Instead, try to explain what you are trying to achieve. Something like this should work (I haven't coded in Z80 in decades):

    DELAY:
            ld b, 0FFh      ; Iterations required to achieve a time of ??.??ms
    DELAY_LOOP:
            nop             ; Burn some CPU cycles.
            nop
            nop
            djnz DELAY_LOOP ; Iterations finished?
            ret             ; Yes! All done...
JamesD Posted February 11, 2016

The problem with delay loops is they are dependent on CPU MHz and any wait states of a particular machine. If we don't know that, we can't tell you how many clock cycles to delay. From there it's just a matter of counting clock cycles.
jdgabbard (Author) Posted February 11, 2016

Looks pretty similar to what I came up with, except DJNZ is a relative jump. Is there any reason, other than saving the extra 5 or 6 bytes, for using DJNZ? Looks like the function would last only a uS or two longer without doing the math.
jdgabbard (Author) Posted February 11, 2016 (edited February 11, 2016)

JamesD wrote: "The problem with delay loops is they are dependent on CPU MHz and any wait states of a particular machine. If we don't know that, we can't tell you how many clock cycles to delay. From there it's just a matter of counting clock cycles."

The MHz were mentioned above, 6mhz. As for wait states, I'm not anticipating any. This is a homebrew system I'm building. A portable switch panel with LCD to be precise.
jdgabbard (Author) Posted February 11, 2016

Please excuse the size of the attachment...
JamesD Posted February 11, 2016 (edited February 11, 2016)

You have the right idea but GroovyBee's code is better. My math is a little rusty on this, but if we do the math to find out how many t states are in 40 uSec:

6MHz = 6 million cycles per second.
period in sec = 1 / 6,000,000
period * 1,000,000 = uSec
So one t state = .166666666... uSec
40 uSec = 6.666666... aka 7 t states?

Two NOPs will take you over that... if I did the math right.

As for the loop delay (GroovyBee's version):

CALL = 17 t states
LD B,FFh = 7 t states
NOP = 4 t states
DJNZ Branch taken = 13 t states, branch not taken = 8 t states
RET = 10 t states

17 + 7 + (254 * ((4 * 3) + 13)) + 8 + 10 = 6392 total t states
6392 * .1666666... = 1,065 uSec or 1.065 ms... so you are pretty close to 1.1 ms with that delay loop.

*edit* If you want the loop to be slightly longer, you can use LD B,00h. The decrement takes place before the test so that should add 1 more pass for 25 more t states.
jdgabbard (Author) Posted February 11, 2016

So, functionality is basically the same. GroovyBee's example achieves the desired outcome and is more elegant while saving a few bytes. But shouldn't there be a return opcode right after the call? Like this:

    DELAY:
            ld b, 0FFh      ; Iterations required to achieve a time of ??.??ms
            ret             ; Back from whence you came....
    DELAY_LOOP:
            nop             ; Burn some CPU cycles.
            nop
            nop
            djnz DELAY_LOOP ; Iterations finished?
            ret             ; Yes! All done...

I'm not really too knowledgeable on the programming end, but I understand the basic flow of things. And it appears that you'd end up in an unknown state without it.
GroovyBee Posted February 11, 2016

In your modification the code after the DELAY_LOOP label is never executed.
jdgabbard (Author) Posted February 11, 2016

In your example, let's say I had a piece of code that was being executed, and I perform a call to this delay routine. Would the return that you have at the end of DELAY_LOOP return to the original piece of code that was running?
GroovyBee Posted February 11, 2016

jdgabbard wrote: "In your example, let's say I had a piece of code that was being executed. I perform a call to this delay routine. The return that you have at the end of DELAY_LOOP would return back to the original piece of code that was running?"

It would continue execution at the instruction after the "call DELAY".
JamesD Posted February 11, 2016 (edited February 11, 2016)

CALL is the assembly equivalent of BASIC's GOSUB and RET is the equivalent of RETURN.

DJNZ is decrement jump not zero. So it decrements B and branches if B is not zero afterwards (it tests B itself rather than the zero flag, and leaves the flags unchanged). Execution continues with the next instruction once B has reached zero.
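The loop-exit behaviour of DJNZ can be modelled in a few lines of Python to see how many passes a given starting value of B produces. This is a sketch with a hypothetical helper name (`djnz_passes`); it models only the 8-bit decrement of B and the test on B, nothing else about the CPU.

```python
def djnz_passes(b):
    """Count passes through `loop: ... DJNZ loop` when entered with B = b."""
    passes = 0
    while True:
        passes += 1            # the loop body runs once per pass
        b = (b - 1) & 0xFF     # DJNZ decrements B as an 8-bit value
        if b == 0:             # B reached zero: fall through to the next instruction
            return passes


for start in (0x01, 0xFF, 0x00):
    print(f"B = {start:02X}h -> {djnz_passes(start)} passes")
```

In this model B = FFh gives 255 passes and B = 00h gives 256, because the first decrement wraps 00h around to FFh before the test.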
JamesD Posted February 11, 2016 (edited February 11, 2016)

Notice that I counted the number of t states that the call requires as well as the delay routine. You should count any instructions that are executed between when you want to start timing and where execution should continue after the delay. You can make a small change to the code to use a variable delay.

    ;
            ld b,00h        ; Iterations required to achieve a time of 1.0695ms
            call DELAY
            ;continue executing

    DELAY:  nop             ; Burn some CPU cycles.
            nop
            nop
            djnz DELAY      ; Iterations finished?
            ret             ; Yes! All done...

*edit* You might want to add t state info to the comments for the delay loop so if you have to tweak it or call it in many places you don't need to look up the info again.
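For the variable-delay version, the value to load into B for a target delay can be worked out by inverting the T-state count. The sketch below is hypothetical (the name `b_for_delay` and the 6 MHz default are assumptions); it assumes the caller runs LD B,n then CALL DELAY, a body of three NOPs, standard Z80 timings and no wait states.

```python
import math

# CALL + LD B,n + RET, minus the 5 T-states saved when the final DJNZ falls through.
FIXED = 17 + 7 + 10 - (13 - 8)
PER_PASS = 3 * 4 + 13          # three NOPs plus a taken DJNZ


def b_for_delay(target_us, clock_mhz=6):
    """Return (B value, actual T-states) for the shortest delay >= target_us."""
    target_t = target_us * clock_mhz
    passes = max(1, math.ceil((target_t - FIXED) / PER_PASS))
    if passes > 256:
        raise ValueError("target is longer than a single 8-bit DJNZ loop can cover")
    b = passes & 0xFF          # 256 passes is encoded as B = 00h
    return b, FIXED + PER_PASS * passes


print(b_for_delay(1000))       # a 1 ms target at 6 MHz
```

With these assumptions the longest delay available from one loop is 6429 T-states (B = 00h), a little under 1.1 ms at 6 MHz, so a longer wait would need more NOPs in the body or a nested loop.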
JamesD Posted February 12, 2016 (edited February 12, 2016)

JamesD wrote:
"... As for the loop delay (GroovyBee's version):
CALL = 17 t states
LD B,FFh = 7 t states
NOP = 4 t states
DJNZ Branch taken = 13 t states, branch not taken = 8 t states
RET = 10 t states
17 + 7 + (254 * ((4 * 3) + 13)) + 8 + 10 = 6392 total t states
6392 * .1666666... = 1,065 uSec or 1.065 ms... so you are pretty close to 1.1 ms with that delay loop. ..."

Actually, I screwed up the math. In the first version I added branch not taken when I had already included branch taken in the calculation. Here I'm subtracting the difference between branch taken and branch not taken to adjust for the last pass.

17 + 7 + (254 * ((4 * 3) + 13)) - (13 - 8) + 10

17 = CALL
7 = LD
254 = loop # of passes
4 * 3 = 3 NOP
13 = DJNZ
(13 - 8) = adjustment for branch not taken on last pass instead of branch taken
10 = RET
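One way to double-check the hand arithmetic is to count passes and branch outcomes separately. The sketch below assumes the standard Z80 timings quoted in the thread and no wait states; the function name `djnz_delay_tstates` is hypothetical.

```python
CALL, LD_B_N, NOP, DJNZ_TAKEN, DJNZ_NOT_TAKEN, RET = 17, 7, 4, 13, 8, 10


def djnz_delay_tstates(b=0xFF, nops=3):
    """T-states for CALL DELAY, LD B,n, the NOP/DJNZ loop and the final RET."""
    passes = b if b else 256           # B = 00h wraps around and gives 256 passes
    loop = (passes * nops * NOP        # the NOPs run on every pass
            + (passes - 1) * DJNZ_TAKEN  # every pass but the last branches back
            + DJNZ_NOT_TAKEN)            # the last pass falls through
    return CALL + LD_B_N + loop + RET


CLOCK_MHZ = 6  # assumed clock, as stated earlier in the thread
for b in (0xFF, 0x00):
    t = djnz_delay_tstates(b)
    print(f"B = {b:02X}h: {t} T-states, {t / CLOCK_MHZ:.1f} us")
```

By this count LD B,FFh gives 255 passes (254 taken branches plus one fall-through), for 6404 T-states or about 1.067 ms at 6 MHz, slightly more than the figures obtained with 254 passes. LD B,00h gives 256 passes and 6429 T-states.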
JamesD Posted February 12, 2016

That also brings up the fact that the LD is 7 t states which is 40 uSec. If you are even using a LD in your loop to write to the display you are over the minimum delay. Since you probably need to load a value and then store it to the display I'd say you may not require any additional delay.
George Phillips Posted February 13, 2016

JamesD wrote: "That also brings up the fact that the LD is 7 t states which is 40 uSec."

No, a reciprocal has snuck into your calculations here. 7 T-States is 7 * 0.16666 = 1.17 microseconds. 40 microseconds is exactly 240 T-States: 40 / (1 / 6) = 240.
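The conversion in both directions is easy to get backwards, so a pair of small helpers can be handy. This is a minimal sketch with hypothetical names (`us_to_tstates`, `tstates_to_us`), assuming a 6 MHz clock and no wait states.

```python
import math


def us_to_tstates(microseconds, clock_mhz=6):
    """T-states needed to cover at least `microseconds` (rounded up)."""
    return math.ceil(microseconds * clock_mhz)


def tstates_to_us(tstates, clock_mhz=6):
    """Duration of `tstates` clock cycles in microseconds."""
    return tstates / clock_mhz


print(us_to_tstates(40))             # the 40 us lower bound mentioned in the thread
print(round(tstates_to_us(7), 2))    # one 7 T-state LD
print(us_to_tstates(2100))           # the 2.1 ms upper bound mentioned in the thread
```

At 6 MHz one microsecond is 6 T-states, so multiplying by the clock in MHz goes from time to cycles and dividing goes back.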
George Phillips Posted February 13, 2016

While it is probably overkill for the problem at hand, a generic routine that delays a given number of T-States can save effort in the long run. Here's one such example:

http://members.shaw.ca/gp2000/beamhack3.html

Useful commentary there, but I'll copy the routine here for easy reference:

    ; wHL -- Waste HL + 100 T states.  Only uses A, HL.
    wHL256: dec   h                          ;<0>  | <4>
            ld    a,256-4-4-12-4-7-17-81     ; 81 is wA overhead ;<0> | <7>
            call  wA                         ;<0>  | <17+A>
    wHL:    inc   h                          ;<4>  | <4>
            dec   h                          ;<4>  | <4>
            jr    nz,wHL256                  ;<7>  | <12>
            ld    a,l                        ;<4>
    wA:     rrca                             ;<4>
            jr    c,wHL_0s                   ;<7>  | <12>  1 extra cycle if bit 0 set
            nop                              ;<4>  | <0>
    wHL_0s: rrca                             ;<4>
            jr    nc,wHL_1c                  ;<12> | <7>   2 extra cycles if bit 1 set
            jr    nc,wHL_1c                  ;<0>  | <7>
    wHL_1c: rrca                             ;<4>
            jr    nc,wHL_2c                  ;<12> | <7>   4 extra cycles if bit 2 set
            ret   nc                         ;<0>  | <5>
            nop                              ;<0>  | <4>
    wHL_2c: rrca                             ;<4>
            jr    nc,wHL_3c                  ;<12> | <7>   8 extra cycles if bit 3 set
            ld    (0),a                      ;<0>  | <13>
    wHL_3c: and   a,0fh                      ;<7>
            ret   z                          ;<11> | <5>   done if no other bits set
    wHL_16: dec   a                          ;<0>  | <4>   loop away 16 for remaining count
            jr    nz,wHL_16                  ;<0>  | <12>
            ret   z                          ;<0>  | <11>
    ; Last jr was 7, but the extra 5 from "ret z" keeps us at 16 * A.
    ; The "ret z" cost balances the previous "ret z" in the 0 case.

It runs for 100 + HL T-States, so it can give you a delay of 100 (HL=0) to 65635 (HL=65535) T-States. Hidden inside is the "wA" routine, which will run for 81 + A T-States.

I have a handy chart that helps when counting T-States:

http://members.shaw.ca/gp2000/T-states.txt

Finally, there's my version of the zmac assembler, which can automate a lot of the T-State counting:

http://members.shaw.ca/gp2000/zmac.html
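Given the stated running time of 100 + HL T-States, the HL value for a target delay follows directly. The helper below is a hypothetical sketch (`whl_argument` is not part of the linked code) and assumes a 6 MHz clock with no wait states. Exactly which instructions the 100 T-State overhead covers is not spelled out in the post, so check the linked page before relying on the last few cycles.

```python
def whl_argument(delay_us, clock_mhz=6, overhead=100):
    """HL value that makes a '100 + HL T-states' routine last about delay_us."""
    target_tstates = round(delay_us * clock_mhz)
    hl = target_tstates - overhead
    if not 0 <= hl <= 0xFFFF:
        raise ValueError(f"{delay_us} us is outside the range of a 16-bit HL")
    return hl


print(whl_argument(1100))   # HL for the 1.1 ms target discussed in the thread
print(whl_argument(40))     # HL for the 40 us lower bound
```

For the 1.1 ms target at 6 MHz this works out to 6600 T-States in total, so HL = 6500.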
JamesD Posted February 14, 2016

George Phillips wrote: "No, a reciprocal has snuck into your calculations here. 7 T-States is 7 * 0.16666 = 1.17 microseconds. 40 microseconds is exactly 240 T-States: 40 / (1 / 6) = 240."

Du-Oh! All I had to do is read my own comment. 1 t state = .1666666 uSec.

"... So one t state = .166666666... uSec 40 uSec = 6.666666... aka 7 t states? ..."

40 uSec / .1666666 = 240 t states

Told you my math was rusty.