z80 Assembly question

jdgabbard · February 10, 2016

I need a delay function for a hardware interface. Essentially, I need a piece of modular code that I can use to wait when I cannot read the busy flag from a 16x2 LCD of the HD44780 variety, for example during initialization. Everything I have read suggests that this delay should be anywhere from 40us to 2.1ms.

I have written a short delay function that I think will work for timing a 1.1ms delay, and I'm just looking to make sure I'm not a long way off. Ports are not specified, as I'm not entirely sure which address I'm going to decode the LCD to. Here is the code; the syntax may be a little off...

;Routine for delaying approx 1.1ms with a 6 MHz clock
;
;My best guess is that this will generate about 6671 T-States
;at 166ns per cycle, for a total of about 1.108ms. Does this look
;correct?


DELAY:

	LD A, 0FFh	;Load A with 11111111b
	CALL D_LOOP	;Jump to Delay Loop
	RET

D_LOOP:

	NOP		;Do Nothing
	NOP
	NOP
	DEC A		;Decrement A
	JP NZ, D_LOOP	;If A>0 jump to Delay Loop
	RET		;If A=0 Return

GroovyBee · February 10, 2016

First a quick comment about commenting assembly language. Don't explain what the instruction is doing. You can see that by looking at the opcode and its data. Instead, try to explain what you are trying to achieve.

Something like this should work (I haven't coded in Z80 in decades) :-

DELAY:
    ld b, 0FFh           ; Iterations required to achieve a time of ??.??ms
DELAY_LOOP:
    nop                  ; Burn some CPU cycles.
    nop
    nop
    djnz DELAY_LOOP      ; Iterations finished?
    ret                  ; Yes! All done...

JamesD · February 11, 2016

The problem with delay loops is they are dependent on CPU MHz and any wait states of a particular machine.
If we don't know that, we can't tell you how many clock cycles to delay.
From there it's just a matter of counting clock cycles.

jdgabbard · February 11, 2016

Looks pretty similar to what I came up with, except DJNZ is a relative jump. Is there any reason, other than saving the extra 5 or 6 bytes, for using DJNZ? Looks like the function would last only a uS or two longer without doing the math.

jdgabbard · February 11, 2016

JamesD wrote: "The problem with delay loops is they are dependent on CPU MHz and any wait states of a particular machine. If we don't know that, we can't tell you how many clock cycles to delay. From there it's just a matter of counting clock cycles."

The clock speed was mentioned above: 6 MHz. As for wait states, I'm not anticipating any. This is a homebrew system I'm building, a portable switch panel with an LCD to be precise.

Edited February 11, 2016 by jdgabbard

jdgabbard · February 11, 2016

Please excuse the size of the attachment...

JamesD · February 11, 2016

You have the right idea but GroovyBee's code is better.

My math is a little rusty on this, but if we do the math to find out how many t states are in 40 uSec:
6MHz = 6 million cycles per second.
period in sec = 1 / 6,000,000
period * 1,000,000 = uSec

So one t state = .166666666... uSec

40 uSec = 6.666666... aka 7 t states?

Two NOPs will take you over that... if I did the math right.

As for the loop delay (GroovyBee's version):
CALL = 17 t states

LD B,FFh = 7 t states

NOP = 4 t states

DJNZ Branch taken = 13 t states, branch not taken = 8 t states

RET = 10 t states

17 + 7 + (254 * ((4 * 3) + 13)) + 8 + 10 = 6392 total t states
6392 * 0.1666666... = 1,065 uSec or 1.065 ms... so you are pretty close to 1.1 ms with that delay loop.

*edit*
If you want the loop to be slightly longer, you can use LD B,00h. The decrement takes place before the test so that should add 1 more pass for 25 more t states.
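
The pass count for a given starting value of B is easy to check off-target. Here is a minimal sketch in Python (not Z80 code) that only models the 8-bit decrement-then-test behaviour of DJNZ; the function name djnz_passes is made up for illustration.

```python
def djnz_passes(b):
    """Count passes through a loop that ends in DJNZ, starting with B = b."""
    passes = 0
    while True:
        passes += 1
        b = (b - 1) & 0xFF   # DJNZ decrements B first, as an 8-bit value
        if b == 0:           # it falls through only once B has reached zero
            return passes

print(djnz_passes(0xFF))  # 255
print(djnz_passes(0x00))  # 256
```

So starting from 00h wraps B around to FFh on the first decrement and gives the longest possible single-register loop.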

Edited February 11, 2016 by JamesD

jdgabbard · February 11, 2016

So, functionality is basically the same. GroovyBee's example achieves desired outcome and is more elegant while saving a few bytes. But shouldn't there be a return opcode right after the call? Like this:

DELAY:
    ld b, 0FFh           ; Iterations required to achieve a time of ??.??ms
    ret                  ; Back from whence you came....
DELAY_LOOP:
    nop                  ; Burn some CPU cycles.
    nop
    nop
    djnz DELAY_LOOP      ; Iterations finished?
    ret                  ; Yes! All done...

I'm not really too knowledgeable on the programming end, but understand the basic flow of things. And it appears that you'd end up in an unknown state without it.

GroovyBee · February 11, 2016

In your modification the code after the DELAY_LOOP label is never executed.

jdgabbard · February 11, 2016

In your example, let's say I had a piece of code that was being executed and I perform a call to this delay routine. Would the return that you have at the end of DELAY_LOOP return back to the original piece of code that was running?

GroovyBee · February 11, 2016

jdgabbard wrote: "In your example, let's say I had a piece of code that was being executed. I perform a call to this delay routine. The return that you have at the end of DELAY_LOOP would return back to the original piece of code that was running?"

It would continue execution at the instruction after the "call DELAY".

JamesD · February 11, 2016

CALL is the assembly equivalent of BASIC's GOSUB and RET is the equivalent of RETURN

DJNZ is decrement and jump if not zero. It decrements B and branches if B is not zero afterwards; it does not test or change the zero flag. Execution continues with the next instruction once B reaches zero.

Edited February 11, 2016 by JamesD

JamesD · February 11, 2016

Notice that I counted the number of t states that the call requires as well as the delay routine.

You should count any instructions that are executed between when you want to start timing and where execution should continue after the delay.

You can make a small change to the code to use a variable delay.

    ;
    ld b,00h            ; 256 passes (B wraps from 0), roughly 1.07ms at 6 MHz including CALL and RET
    call DELAY
    ;continue executing


DELAY:
    nop                  ; Burn some CPU cycles.
    nop
    nop
    djnz DELAY           ; Iterations finished?
    ret                  ; Yes! All done...

*edit*

You might want to add t state info to the comments for the delay loop so if you have to tweak it or call it in many places you don't need to look up the info again.

Edited February 11, 2016 by JamesD

JamesD · February 12, 2016

...

As for the loop delay (GroovyBee's version):

CALL = 17 t states

LD B,FFh = 7 t states

NOP = 4 t states

DJNZ Branch taken = 13 t states, branch not taken = 8 t states

RET = 10 t states

17 + 7 + (254 * ((4 * 3) + 13)) + 8 + 10 = 6392 total t states

6392 * 0.1666666... = 1,065 uSec or 1.065 ms... so you are pretty close to 1.1 ms with that delay loop.

...

Actually, I screwed up the math.

The first version I added branch not taken when I had already included branch taken in the calculation.

Here I'm subtracting the difference between branch taken and branch not taken to adjust for the last pass.

17 + 7 + (255 * ((4 * 3) + 13)) - (13 - 8) + 10 = 6404 t states

17 = CALL
7 = LD
255 = number of passes through the loop (B starts at FFh)
4 * 3 = 3 NOPs
13 = DJNZ
13 - 8 = adjustment for branch not taken on last pass instead of branch taken
10 = RET
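
A tally like this is easy to get wrong by hand, so it can help to let a few lines of script do the counting. The sketch below is Python, not Z80 code. It uses the T-state costs quoted in this thread and assumes that LD B,FFh gives 255 passes, that the three NOPs run on every pass, and that only the final DJNZ falls through. The name delay_t_states is made up for illustration.

```python
# T-state costs as quoted in the thread
CALL, LD_B_N, NOP, DJNZ_TAKEN, DJNZ_NOT_TAKEN, RET = 17, 7, 4, 13, 8, 10

def delay_t_states(passes, nops=3):
    """Total T-states for CALL + LD B,n + (NOPs, DJNZ) loop + RET."""
    loop = passes * nops * NOP                            # NOPs run on every pass
    loop += (passes - 1) * DJNZ_TAKEN + DJNZ_NOT_TAKEN    # last DJNZ falls through
    return CALL + LD_B_N + loop + RET

total = delay_t_states(255)
microseconds = total / 6.0   # 6 T-states per microsecond at 6 MHz
print(total, round(microseconds, 1))   # 6404 1067.3
```

Under the same assumptions, 256 passes (LD B,00h) comes out at 6429 T-states.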

Edited February 12, 2016 by JamesD

JamesD · February 12, 2016

That also brings up the fact that the LD is 7 t states which is 40 uSec.
If you are even using a LD in your loop to write to the display you are over the minimum delay.

Since you probably need to load a value and then store it to the display I'd say you may not require any additional delay.

George Phillips · February 13, 2016

JamesD wrote: "That also brings up the fact that the LD is 7 t states which is 40 uSec."

No, a reciprocal has snuck into your calculations here. 7 T-States is 7 * 0.16666 = 1.17 microseconds.

40 microseconds is exactly 240 T-States: 40 / (1 / 6) = 240.
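
To keep the direction of the conversion straight, the two calculations can be written out as a pair of helpers. This is a small Python sketch assuming the 6 MHz clock given earlier in the thread; the function names are made up for illustration.

```python
CLOCK_HZ = 6_000_000  # 6 MHz clock, as stated earlier in the thread

def t_states_to_us(t_states, clock_hz=CLOCK_HZ):
    """Duration in microseconds of a given number of T-states."""
    return t_states * 1_000_000 / clock_hz

def us_to_t_states(us, clock_hz=CLOCK_HZ):
    """Number of T-states that fit in a given number of microseconds."""
    return us * clock_hz / 1_000_000

print(round(t_states_to_us(7), 2))   # 1.17
print(us_to_t_states(40))            # 240.0
```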

George Phillips · February 13, 2016

While it is probably overkill for the problem at hand, a generic routine that delays a given number of T-States can save effort in the long run. Here's one such example: http://members.shaw.ca/gp2000/beamhack3.html

Useful commentary there, but I'll copy the routine here for easy reference:

; wHL -- Waste HL + 100 T states. Only uses A, HL.

wHL256:
        dec     h               ;<0>  | <4>
        ld      a,256-4-4-12-4-7-17-81       ; 81 is wA overhead
                                ;<0>  | <7>
        call    wA              ;<0>  | <17+A>
wHL:    inc     h               ;<4>  | <4>
        dec     h               ;<4>  | <4>
        jr      nz,wHL256       ;<7>  | <12>
        ld      a,l             ;<4>
wA:     rrca                    ;<4>
        jr      c,wHL_0s        ;<7>  | <12> 1 extra cycle if bit 0 set
        nop                     ;<4>  | <0>
wHL_0s: rrca                    ;<4>
        jr      nc,wHL_1c       ;<12> | <7>  2 extra cycles if bit 1 set
        jr      nc,wHL_1c       ;<0>  | <7>
wHL_1c: rrca                    ;<4>
        jr      nc,wHL_2c       ;<12> | <7>  4 extra cycles if bit 2 set
        ret     nc              ;<0>  | <5>
        nop                     ;<0>  | <4>
wHL_2c: rrca                    ;<4>
        jr      nc,wHL_3c       ;<12> | <7>  8 extra cycles if bit 3 set
        ld      (0),a           ;<0>  | <13>
wHL_3c: and     a,0fh           ;<7>
        ret     z               ;<11> | <5>  done if no other bits set
wHL_16: dec     a               ;<0>  | <4>  loop away 16 for remaining count
        jr      nz,wHL_16       ;<0>  | <12>
        ret     z               ;<0>  | <11>
; Last jr was 7, but the extra 5 from "ret z" keeps us at 16 * A.
; The "ret z" cost balances the previous "ret z" in the 0 case.

It runs for 100 + HL T-States so can give you a delay of 100 (HL=0) to 65635 (HL=65535) T-States. Hidden inside is the "wA" routine which will run for 81 + A T-States.
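
Working out the value to load into HL for a particular delay is then a subtraction. Here is a rough Python sketch, taking the stated "100 + HL" at face value and assuming a 6 MHz clock. Whether that 100 already covers the CALL into wHL is not stated here, so treat the result as a starting point to verify. The helper name whl_argument is made up for illustration.

```python
def whl_argument(delay_us, clock_mhz=6.0, overhead=100):
    """HL value so that wHL wastes roughly delay_us microseconds."""
    t_states = round(delay_us * clock_mhz)   # total T-states wanted
    hl = t_states - overhead                 # wHL itself accounts for 100 of them
    if not 0 <= hl <= 0xFFFF:
        raise ValueError("delay out of range for a single wHL call")
    return hl

print(whl_argument(40))     # 140
print(whl_argument(1100))   # 6500
```

Delays shorter than 100 T-states (about 16.7us at 6 MHz) are out of range for wHL, which the range check reports.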

I have a handy chart that helps when counting T-States: http://members.shaw.ca/gp2000/T-states.txt

Finally, there's my version of the zmac assembler which can automate a lot of the T-State counting: http://members.shaw.ca/gp2000/zmac.html

JamesD · February 14, 2016

George Phillips wrote: "No, a reciprocal has snuck into your calculations here. 7 T-States is 7 * 0.16666 = 1.17 microseconds. 40 microseconds is exactly 240 T-States: 40 / (1 / 6) = 240."

Du-Oh! All I had to do is read my own comment. 1 t state = .1666666 uSec.

...

So one t state = .166666666... uSec

40 uSec = 6.666666... aka 7 t states?

...

40 uSec / .1666666 = 240 t states

Told you my math was rusty
