
z80 Assembly question


jdgabbard


I need a delay function for a hardware interface. Essentially, I need a piece of modular code that I can use to wait out the time when I cannot read the busy flag from a 16x2 LCD of the HD44780 variety, during initialization for example. Everything I have read suggests that this delay should be anywhere from 40us to 2.1ms.

 

I have written a short delay function, and think this will work for timing a 1.1ms delay. Just basically looking to ensure I'm not a long ways off. Ports are not specified, as I am not entirely sure where I'm going to decode the LCD to. But here is the code, syntax may be a little off...

;Routine for delaying approx 1.1ms with a 6MHz clock
;
;My best guess is that this will generate about 6671 T-States
;at 166ns per cycle, for a total of about 1.108ms. Does this look
;correct?


DELAY:

	LD A, 0FFh	;Load A with 11111111b
	CALL D_LOOP	;Jump to Delay Loop
	RET

D_LOOP:

	NOP		;Do Nothing
	NOP
	NOP
	DEC A		;Decrement A
	JP NZ, D_LOOP	;If A>0 jump to Delay Loop
	RET		;If A=0 Return

First a quick comment about commenting assembly language. Don't explain what the instruction is doing. You can see that by looking at the opcode and its data. Instead, try to explain what you are trying to achieve.

 

Something like this should work (I haven't coded in Z80 in decades) :-

DELAY:
    ld b, 0FFh           ; Iterations required to achieve a time of ??.??ms
DELAY_LOOP:
    nop                  ; Burn some CPU cycles.
    nop
    nop
    djnz DELAY_LOOP      ; Iterations finished?
    ret                  ; Yes! All done...

The problem with delay loops is they are dependent on CPU MHz and any wait states of a particular machine.

If we don't know that, we can't tell you how many clock cycles to delay.

From there it's just a matter of counting clock cycles.

The MHz were mentioned above: 6 MHz. As for wait states, I'm not anticipating any. This is a homebrew system I'm building: a portable switch panel with an LCD, to be precise.

 


Edited by jdgabbard

You have the right idea but GroovyBee's code is better.

My math is a little rusty on this, but if we do the math to find out how many t states are in 40 uSec:
6MHz = 6 million cycles per second.
period in sec = 1 / 6,000,000
period * 1,000,000 = uSec

So one t state = .166666666... uSec

40 uSec = 6.666666... aka 7 t states?

Two NOPs will take you over that... if I did the math right.


As for the loop delay (GroovyBee's version):
CALL = 17 t states

LD B,FFh = 7 t states

NOP = 4 t states

DJNZ Branch taken = 13 t states, branch not taken = 8 t states

RET = 10 t states

17 + 7 + (254 * ((4 * 3) + 13)) + 8 + 10 = 6392 total t states
6392 * 0.1666666... = 1,065 uSec or 1.065 ms... so you are pretty close to 1.1 ms with that delay loop.

*edit*
If you want the loop to be slightly longer, you can use LD B,00h. The decrement takes place before the test so that should add 1 more pass for 25 more t states.

Edited by JamesD

So, functionality is basically the same. GroovyBee's example achieves desired outcome and is more elegant while saving a few bytes. But shouldn't there be a return opcode right after the call? Like this:

DELAY:
    ld b, 0FFh           ; Iterations required to achieve a time of ??.??ms
    ret                  ; Back from whence you came....
DELAY_LOOP:
    nop                  ; Burn some CPU cycles.
    nop
    nop
    djnz DELAY_LOOP      ; Iterations finished?
    ret                  ; Yes! All done...

I'm not really too knowledgeable on the programming end, but understand the basic flow of things. And it appears that you'd end up in an unknown state without it.


In your example, let's say I had a piece of code that was being executed. I perform a call to this delay routine. The return that you have at the end of DELAY_LOOP would return back to the original piece of code that was running?

 

It would continue execution at the instruction after the "call DELAY".


CALL is the assembly equivalent of BASIC's GOSUB and RET is the equivalent of RETURN.

DJNZ is decrement and jump if not zero. It decrements B and branches if B is not zero afterwards; it tests B itself and does not touch the flags. Execution continues with the next instruction once B reaches zero.
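
A small Python model of what DJNZ does to B may make the pass count easier to see. This is only my own sketch (the function name is made up), and it assumes the documented behaviour: B is decremented as an 8-bit value and the jump is taken while the result is non-zero.

```python
def djnz_passes(b):
    """Count loop passes for a loop that ends in DJNZ, starting with B = b."""
    passes = 0
    while True:
        passes += 1            # the loop body runs, then DJNZ executes
        b = (b - 1) & 0xFF     # DJNZ decrements B with 8-bit wrap-around
        if b == 0:             # falls through once B reaches zero
            return passes


print(djnz_passes(0xFF))  # 255
print(djnz_passes(0x00))  # 256
```

So starting from 00h gives the longest loop, because the first decrement wraps B round to FFh before the test.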

Edited by JamesD

Notice that I counted the number of t states that the call requires as well as the delay routine.

You should count any instructions that are executed between when you want to start timing and where execution should continue after the delay.

 

You can make a small change to the code to use a variable delay.

 

 

    ;
    ld b,00h            ; 256 iterations, about 1.07ms at 6MHz
    call DELAY
    ;continue executing


DELAY:
    nop                  ; Burn some CPU cycles.
    nop
    nop
    djnz DELAY           ; Iterations finished?
    ret                  ; Yes! All done...
*edit*

You might want to add t state info to the comments for the delay loop so if you have to tweak it or call it in many places you don't need to look up the info again.
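
If the routine gets called with different values of B, a little Python helper can pick the value for a wanted delay. This is a rough sketch, not something from the thread: the helper name is hypothetical, and it assumes a 6 MHz clock, no wait states, and the LD B,n / CALL / three NOPs / DJNZ / RET sequence shown above.

```python
import math

CLOCK_MHZ = 6                      # assumed clock, no wait states
OVERHEAD = 17 + 7 + 10 - (13 - 8)  # CALL + LD B,n + RET, minus the cheaper final DJNZ
PER_PASS = 3 * 4 + 13              # three NOPs + DJNZ (branch taken)


def b_for_delay(target_us):
    """Value to load into B so the delay is at least target_us microseconds."""
    target_t = math.ceil(target_us * CLOCK_MHZ)
    passes = max(1, math.ceil((target_t - OVERHEAD) / PER_PASS))
    if passes > 256:
        raise ValueError("too long for a single 8-bit DJNZ loop")
    return passes & 0xFF           # 256 passes is encoded as B = 00h


print(b_for_delay(40))    # 9
print(b_for_delay(1000))  # 239
```

Anything longer than roughly 1.07 ms needs a second, outer loop or a 16-bit counter, which is why the helper raises an error instead of guessing.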

Edited by JamesD

...

As for the loop delay (GroovyBee's version):

CALL = 17 t states

LD B,FFh = 7 t states

NOP = 4 t states

DJNZ Branch taken = 13 t states, branch not taken = 8 t states

RET = 10 t states

17 + 7 + (254 * ((4 * 3) + 13)) + 8 + 10 = 6392 total t states

6392 * 0.1666666... = 1,065 uSec or 1.065 ms... so you are pretty close to 1.1 ms with that delay loop.

...

Actually, I screwed up the math.

In the first version I added the branch-not-taken cost when I had already included branch taken in the calculation.

Here I'm subtracting the difference between branch taken and branch not taken to adjust for the last pass.

17 + 7 + (254 * ((4 * 3) + 13)) - (13 - 8) + 10

17 = CALL
7 = LD
254 = loop # of passes
(4 * 3) = 3 NOP
13 = DJNZ
(13 - 8) = adjustment for branch not taken on last pass instead of branch taken
10 = RET
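
One way to sanity-check a sum like this is to walk the loop pass by pass instead of using a closed formula. The Python below is only a sketch of that idea: the function name is mine, the timing constants are the standard Z80 figures quoted above, and it assumes no wait states and that LD B,FFh gives 255 passes (254 taken DJNZs plus one that falls through).

```python
# Standard Z80 timings in T-states, as quoted above.
CALL, LD_B_N, NOP, RET = 17, 7, 4, 10
DJNZ_TAKEN, DJNZ_NOT_TAKEN = 13, 8


def delay_tstates(b, nops=3):
    """T-states for CALL + LD B,n + the NOP/DJNZ loop + RET (hypothetical helper)."""
    passes = b if b != 0 else 256  # B = 00h wraps round, giving 256 passes
    loop = 0
    for i in range(passes):
        last = (i == passes - 1)
        loop += nops * NOP + (DJNZ_NOT_TAKEN if last else DJNZ_TAKEN)
    return CALL + LD_B_N + loop + RET


for b in (0xFF, 0x00):
    t = delay_tstates(b)
    print(f"B={b:02X}h: {t} T-states, {t / 6:.1f} us at 6 MHz")
```

Counted this way FFh comes to 6404 T-states (about 1.067 ms) and 00h to 6429 (about 1.072 ms). Both hand sums land within 25 T-states of that, roughly 4 uSec, which should not matter for an LCD delay.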

Edited by JamesD

That also brings up the fact that the LD is 7 t states which is 40 uSec.
If you are even using a LD in your loop to write to the display you are over the minimum delay.

Since you probably need to load a value and then store it to the display I'd say you may not require any additional delay.


While it is probably overkill for the problem at hand, a generic routine that delays a given number of T-States can save effort in the long run. Here's one such example: http://members.shaw.ca/gp2000/beamhack3.html

Useful commentary there, but I'll copy the routine here for easy reference:

; wHL -- Waste HL + 100 T states. Only uses A, HL.

wHL256:
        dec     h               ;<0>  | <4>
        ld      a,256-4-4-12-4-7-17-81       ; 81 is wA overhead
                                ;<0>  | <7>
        call    wA              ;<0>  | <17+A>
wHL:    inc     h               ;<4>  | <4>
        dec     h               ;<4>  | <4>
        jr      nz,wHL256       ;<7>  | <12>
        ld      a,l             ;<4>
wA:     rrca                    ;<4>
        jr      c,wHL_0s        ;<7>  | <12> 1 extra cycle if bit 0 set
        nop                     ;<4>  | <0>
wHL_0s: rrca                    ;<4>
        jr      nc,wHL_1c       ;<12> | <7>  2 extra cycles if bit 1 set
        jr      nc,wHL_1c       ;<0>  | <7>
wHL_1c: rrca                    ;<4>
        jr      nc,wHL_2c       ;<12> | <7>  4 extra cycles if bit 2 set
        ret     nc              ;<0>  | <5>
        nop                     ;<0>  | <4>
wHL_2c: rrca                    ;<4>
        jr      nc,wHL_3c       ;<12> | <7>  8 extra cycles if bit 3 set
        ld      (0),a           ;<0>  | <13>
wHL_3c: and     a,0fh           ;<7>
        ret     z               ;<11> | <5>  done if no other bits set
wHL_16: dec     a               ;<0>  | <4>  loop away 16 for remaining count
        jr      nz,wHL_16       ;<0>  | <12>
        ret     z               ;<0>  | <11>
; Last jr was 7, but the extra 5 from "ret z" keeps us at 16 * A.
; The "ret z" cost balances the previous "ret z" in the 0 case.

It runs for 100 + HL T-States so can give you a delay of 100 (HL=0) to 65635 (HL=65535) T-States. Hidden inside is the "wA" routine which will run for 81 + A T-States.
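
For anyone who wants to convince themselves of those totals without an emulator, here is a rough Python model that adds up the per-instruction counts from the listing along each path. It is my own sketch with made-up function names, and as far as I can tell the 81 + A and HL + 100 figures cover the routine body including its RET but not the CALL that enters it, so that is what the model assumes.

```python
def wa_tstates(a):
    """T-states spent in wA, from its first RRCA to its RET (CALL not included)."""
    t = 0
    # bit 0: rrca, then jr c taken (12) or jr c not taken + nop (7 + 4)
    t += 4 + (12 if a & 1 else 7 + 4)
    # bit 1: rrca, then jr nc taken (12) or two untaken jr nc (7 + 7)
    t += 4 + (7 + 7 if a & 2 else 12)
    # bit 2: rrca, then jr nc taken (12) or jr nc + ret nc + nop (7 + 5 + 4)
    t += 4 + (7 + 5 + 4 if a & 4 else 12)
    # bit 3: rrca, then jr nc taken (12) or jr nc + ld (0),a (7 + 13)
    t += 4 + (7 + 13 if a & 8 else 12)
    n = a >> 4          # after four RRCAs, AND 0Fh leaves the original high nibble
    t += 7              # and a,0fh
    if n == 0:
        return t + 11   # ret z taken
    t += 5              # ret z not taken
    t += (n - 1) * (4 + 12) + (4 + 7)  # dec a / jr nz loop, last jr not taken
    return t + 11       # final ret z taken


def whl_tstates(hl):
    """T-states spent in wHL (CALL not included)."""
    h, l = hl >> 8, hl & 0xFF
    t = 0
    while True:
        t += 4 + 4                      # inc h / dec h
        if h == 0:
            t += 7                      # jr nz not taken
            break
        t += 12                         # jr nz,wHL256 taken
        h -= 1                          # dec h at wHL256
        t += 4 + 7 + 17 + wa_tstates(256 - 4 - 4 - 12 - 4 - 7 - 17 - 81)
    return t + 4 + wa_tstates(l)        # ld a,l, then fall into wA


print(all(wa_tstates(a) == 81 + a for a in range(256)))
print(whl_tstates(0), whl_tstates(65535))
```

Each trip through wHL256 comes to exactly 256 T-states in this model, which is what lets H count whole blocks of 256 while L supplies the remainder.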

 

I have a handy chart that helps when counting T-States: http://members.shaw.ca/gp2000/T-states.txt

 

Finally, there's my version of the zmac assembler which can automate a lot of the T-State counting: http://members.shaw.ca/gp2000/zmac.html


No, a reciprocal has snuck into your calculations here. 7 T-States is 7 * 0.16666 = 1.17 microseconds.

 

40 microseconds is exactly 240 T-States: 40 / (1 / 6) = 240.

Du-Oh! All I had to do is read my own comment. 1 t state = .1666666 uSec.

...

So one t state = .166666666... uSec

40 uSec = 6.666666... aka 7 t states?

...

40 uSec / .1666666 = 240 t states
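
Since the reciprocal is so easy to flip, a pair of tiny Python helpers can keep the direction straight. This is just a sketch with hypothetical names, assuming a 6 MHz clock by default and no wait states; it uses exact fractions so that 1/6 does not pick up rounding noise.

```python
from fractions import Fraction


def us_to_tstates(us, clock_mhz=6):
    """T-states that fit in `us` microseconds (clock in MHz, no wait states)."""
    return Fraction(us) * clock_mhz


def tstates_to_us(tstates, clock_mhz=6):
    """Microseconds taken by `tstates` T-states."""
    return Fraction(tstates) / clock_mhz


print(us_to_tstates(40))          # 240
print(float(tstates_to_us(7)))    # about 1.17
```

Multiplying by the clock in MHz goes from microseconds to T-states, and dividing goes the other way.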

 

Told you my math was rusty
