laoo Posted February 15, 2020 (edited)

Did anyone observe variance of CPU cycle timing? I'm referring to the fact that one CPU cycle can take 4 system ticks if the memory access is within the same page, or 5 system ticks otherwise. As I understand the documentation, a cycle should take 4 ticks (more or less, e.g. if we do not cross pages) while reading an opcode or operand, and 5 ticks while reading data pointed to by operands. I've worked out that each instruction in a sequence of repeated instructions should take:

    nop            4+4
    lda #$80       4+4
    lda $80        5+4+5
    lda $8080      5+4+4+5
    lda $8080,y    5+4+4+(5)+5
    lda ($80)      5+4+5+4+5
    lda ($80),y    5+4+5+4+(5)+5
    lda ($80,x)    5+4+4+5+4+5
    dec $80        5+4+5+4+4
    dec $80,x      5+4+4+5+4+4
    asl $8080,x    5+4+4+(5)+4+4+4
    dec $8080,x    5+4+4+5+4+4+4

The number in parentheses is an optional cycle on page crossing.

I tried to check whether I'm right by writing a program with an HBI handler that burns some cycles to position the CPU on the screen, changes the background color, executes a sequence of 32 identical instructions and clears the background color. I've actually written a program that performs all these instruction sequences in different display rows, and as a reference I've added a sequence of the 1-cycle NOP $x3 repeated 32*1, 32*2, 32*3 etc. times.

The results are in the image below. The white pattern on the right is the binary-encoded row number (the last row has number 0 and the last eight rows are the reference cycle-counting NOPs). I've annotated the screen grab with the instructions that are executed in the given rows. As a bonus I set the index registers to $00 on even rows and $ff on odd rows to observe page crossing (hence the jagged patterns on indexed addressing modes).

So as you can see... there are NO differences whatsoever. The timing of every instruction sequence is an exact multiple of the reference timing of a sequence of 1-cycle NOPs. I've tried the standard $EA NOP as well (with halved length) and the result was the same.

Am I missing something? Because it seems that each cycle of each and every tested instruction took the same number of system ticks. Obviously I can't tell whether it's 4 or 5.

I'm attaching the executable with source code, to be compiled with the mads assembler.

PS. As a side note, you can see that DEC $8080,x and ASL $8080,x take the same number of cycles, contrary to the documentation, which says that DEC and INC always take 7 cycles.

Attachment: test.zip
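To make it concrete what the test above could distinguish, here is a minimal sketch that sums a few of the tick sequences from the table and compares each instruction to NOP under two models: the 4/5-tick hypothesis from the post, and a uniform model where every cycle takes the same number of ticks. The dictionary and function names are illustrative, and the optional page-crossing ticks are left out.

```python
# Tick sequences copied from the table in the post (optional page-crossing
# ticks omitted). Names below are illustrative, not from the original source.
HYPOTHESIS = {
    "nop": [4, 4],
    "lda #$80": [4, 4],
    "lda $80": [5, 4, 5],
    "lda $8080": [5, 4, 4, 5],
    "lda ($80)": [5, 4, 5, 4, 5],
    "dec $80": [5, 4, 5, 4, 4],
}


def ratio_to_nop(name):
    """Duration relative to NOP if cycles really take 4 or 5 ticks."""
    return sum(HYPOTHESIS[name]) / sum(HYPOTHESIS["nop"])


def uniform_ratio(name):
    """Duration relative to NOP if every cycle takes the same time."""
    return len(HYPOTHESIS[name]) / len(HYPOTHESIS["nop"])


for name in HYPOTHESIS:
    print(f"{name:10s} hypothesis {ratio_to_nop(name):.3f}  "
          f"uniform {uniform_ratio(name):.3f}")
```

Under the hypothesis, a row of `lda $80` would be 1.75 times as wide as a row of `nop`; under the uniform model it would be exactly 1.5 times as wide. The screen described in the post matches the second case.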
42bs Posted February 16, 2020 (edited)

Does not work on my Lynx. It just returns to the BLL loader. See also:
Cyprian Posted February 16, 2020

22 hours ago, laoo said:
    Did anyone observe variance of CPU cycle timing?

Did you test it on a Lynx 1, a Lynx 2 or an emulator?
42bs Posted February 17, 2020

The doc says (http://www.monlynx.de/lynx/lynx4.html#TOP):

Quote
    Cycle                    Min   Max
    ----------------------------------
    Page Mode RAM (read)      4     4
    Normal RAM (r/w)          5     5

And:

Quote
    The requirement for using a page mode cycle is that the current access is in the same 256 address page of memory as the previous access.

So you won't be able to see the 5-tick cycles in the benchmark. It is only the instruction after skipping the page boundary which is a tick longer.
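The quoted rule can be turned into a toy model for reasoning about what a benchmark should show. The sketch below is only an assumption-laden reading of the two quotes: every CPU cycle is treated as one memory access, a read in the same 256-byte page as the previous access costs 4 ticks, anything else costs 5, and video fetches and refresh are ignored. The function names and the code address $0300 are hypothetical.

```python
def page_mode_ticks(accesses, prev_page=None):
    """Sum ticks for (address, is_write) accesses under the quoted rule."""
    total = 0
    for address, is_write in accesses:
        page = address >> 8
        # Page mode (4 ticks) only for reads in the same page as the
        # previous access; every other access is a normal 5-tick cycle.
        total += 4 if (not is_write and page == prev_page) else 5
        prev_page = page
    return total


def nop_run(start, count):
    """NOP: opcode fetch plus a dummy read of the following byte."""
    return [(start + i + k, False) for i in range(count) for k in range(2)]


def lda_abs_run(start, target, count):
    """LDA abs: three code reads, then one data read at `target`."""
    accesses = []
    for i in range(count):
        pc = start + 3 * i
        accesses += [(pc, False), (pc + 1, False), (pc + 2, False),
                     (target, False)]
    return accesses


print(page_mode_ticks(nop_run(0x0300, 32), prev_page=0x03))
print(page_mode_ticks(lda_abs_run(0x0300, 0x8080, 32), prev_page=0x03))
```

In this model a run of NOPs inside one page never leaves page mode, while each `lda $8080` pays 5 ticks for the data read and 5 more for the next opcode fetch, because that fetch follows an access to a different page. Whether the hardware actually behaves this way is what the rest of the thread tries to measure.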
laoo Posted February 19, 2020 (edited)

On 2/16/2020 at 11:55 AM, 42bs said:
    Does not work on my Lynx. Just returns to BLL loader.

I've put some effort into it and managed to prepare source code with an embedded loader, so that a straightforward simple program can be assembled directly to an LNX file. So try the attached file Timing.zip.

On 2/16/2020 at 7:31 PM, Cyprian_K said:
    did you test it on Lynx 1, Lynx 2 or emu?

Lynx II Hayato, so with the additional bit instructions. I would love it if someone could run it on a Lynx I or on the first version of the Lynx II. No emulator can do such stuff.

On 2/17/2020 at 6:33 AM, 42bs said:
    The doc says (http://www.monlynx.de/lynx/lynx4.html#TOP) And: So, you won't be able to see the 5-tick cycles in the benchmark. It is only the instruction after skipping the page boundary which is a tick longer.

I'm not convinced. The table you've pasted says that only reads can take 4 ticks. In my test I'm testing read and RMW instructions. Furthermore, if the instruction isn't immediate, it obviously reads or writes from another page. So there must be some 5-tick cycles. It's hard to do a test with code crossing a page boundary several times, but I could try to do it.

As a bonus I've attached another test with sequences of 453 CPU cycles each that alter the background color, filling (almost) the whole line.

Attachments: Timing.zip, Line.zip
42bs Posted February 19, 2020 (edited)

An .lnx file needs to be flashed, so if you have a single .o file it is easier to test.

Edit: The .lnx file crashes.
42bs Posted February 19, 2020

Out of curiosity, I did a small test doing this in HBL (aligned on a page boundary):

    REPT 16
    dec $FDA0
    stz $fda0
    ENDR

And I get this (McWill LCD, but the original looks the same):

As you can see, the number of cycles differs. The minimum width is 1 pixel (at 75 Hz it is 0.794 us).
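The 0.794 us figure can be reproduced with a short calculation. This is a sketch under two assumptions that are not stated in the post: a frame consists of 105 line times (102 visible lines plus 3 blank ones), and the system clock runs at 16 MHz, so one tick is 62.5 ns.

```python
REFRESH_HZ = 75                # from the post
PIXELS_PER_LINE = 160          # Lynx screen width
LINES_PER_FRAME = 105          # assumption: 102 visible + 3 blank lines
SYSTEM_CLOCK_HZ = 16_000_000   # assumption: 16 MHz system clock

pixels_per_second = REFRESH_HZ * LINES_PER_FRAME * PIXELS_PER_LINE

# Time the beam spends on one pixel, in microseconds.
pixel_time_us = 1e6 / pixels_per_second

# How many system ticks fit into one pixel time.
ticks_per_pixel = SYSTEM_CLOCK_HZ / pixels_per_second

print(f"pixel time: {pixel_time_us:.3f} us")
print(f"ticks per pixel: {ticks_per_pixel:.2f}")
```

With these assumptions one pixel spans a little under 13 system ticks, so a single 4-tick versus 5-tick cycle is well below one pixel and only becomes visible when it accumulates over many instructions.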
42bs Posted February 19, 2020 (edited)

And this is 32 NOP and 32 STA $FF, with this code:

    HBL:
        txa             ; A = current value of X
        inx             ; X is incremented on every HBL
        lsr
        lsr
        lsr             ; third shift moves bit 2 into carry
        bcc a           ; carry clear: NOP test
        bcs b           ; carry set: INC test
    a   dec $FDA0       ; mark start of the measured block
        REPT 32
        nop
        ENDR
        stz $fda0       ; mark end of the measured block
        END_IRQ
    b   dec $fda0
        rept 32
        inc $ff
        endr
        stz $fda0
        END_IRQ

So the offset at the beginning is due to the test selection.
42bs Posted February 19, 2020

One last (code, .o here: https://github.com/42Bastian/lynx_hacking/tree/master/cycle_check_hbl)
laoo Posted February 19, 2020 (edited)

This is weird. I've run your example without problems. I've changed my code to be similar to yours and I get different results. (I've changed to 75 Hz, and the only obvious difference is that I don't use VBL but dispatch code through VCOUNT. Furthermore, I don't return from the interrupt but jump to a long sequence of NOPs; maybe that's the reason why my output is more stable on the screen.) Below is a collage of the results of your code and mine, and my sequences seem to be taking longer:

Furthermore, counting pixels in your code, $1b takes 84 pixels, NOP takes 168 and ADC $ff takes a bit more - 284. It seems to be almost consistent with 21 pixels per tick ($1b - 4 ticks, NOP - 4+4, ADC $ff - 5+4+5, the rest is the same). Of course this is only approximate, as we should take into account the few ticks burned by dec $FDA0, stz $fda0 and by fetching video data.

On the other hand, my cycles seem to be taking more time, approximately proportionally for each tested instruction.

Could you add a reference pattern of 32*1, 32*2, 32*3 etc. cycles of $1b?

I'm running my code from LynxSD, so that may be the reason that it works on my Lynx. I'll try to prepare a 128 kB LYX image. Maybe this will be more compatible.

I currently have two machines: PAG-0400 and PAG-401. The output is identical.
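The pixel counts quoted above can be checked against both models with a few lines of arithmetic. This is only a sketch: the widths are the three numbers from the post, the tick counts are the hypothesized ones, and the overhead of the colour-change instructions and video fetches mentioned in the post is ignored.

```python
# Measured widths in screenshot pixels for 32 repetitions (from the post).
WIDTH = {"$1b": 84, "nop": 168, "adc $ff": 284}

# Hypothesized ticks per instruction, and plain CPU cycle counts.
TICKS = {"$1b": 4, "nop": 4 + 4, "adc $ff": 5 + 4 + 5}
CYCLES = {"$1b": 1, "nop": 2, "adc $ff": 3}

# If a model is right, width divided by its unit should be constant.
per_tick = {name: WIDTH[name] / TICKS[name] for name in WIDTH}
per_cycle = {name: WIDTH[name] / CYCLES[name] for name in WIDTH}

for name in WIDTH:
    print(f"{name:8s} px/tick {per_tick[name]:6.2f}   "
          f"px/cycle {per_cycle[name]:6.2f}")
```

The pixels-per-tick column stays close to 21 for all three rows, while pixels-per-cycle jumps from 84 to roughly 95 for `adc $ff`, which is why these measurements look more consistent with mixed 4-tick and 5-tick cycles than with uniform ones.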
laoo Posted February 19, 2020

4 hours ago, 42bs said:
    Out of curiosity, I did a small test doing this in HBL (aligned on a page boundary):

        REPT 16
        dec $FDA0
        stz $fda0
        ENDR

    And I get this (McWill LCD, but the original looks the same): As you can see, the number of cycles differs. The minimum width is 1 pixel (at 75 Hz it is 0.794 us).

Regarding this pattern, I believe that the irregularities are (mostly?) due to video data fetching taking place every few columns. There are 10 such places where 8 bytes are fetched in sequence. It can be clearly seen in my example from another thread:
42bs Posted February 19, 2020

Updated test:
laoo Posted February 19, 2020 (edited)

25 minutes ago, 42bs said:
    Updated test:

Clearly ADC $FF takes more than 3 and less than 4 times as long as $1b. The question is why I have different results. I didn't make everything up. I'm now suspecting that I'm not initializing the hardware properly.

EDIT: I've logged your initialization in HandyBug and made my initialization the same... no change.
42bs Posted February 19, 2020

@laoo something must be wrong, as the .o file crashes on a real Lynx. I suspect the "not returning" from the interrupt may be a problem.
42bs Posted February 19, 2020

Interesting: The 3-byte NOP $5c takes 9 cycles. So for tight CPU time burning loops the ideal opcode
laoo Posted February 19, 2020

59 minutes ago, 42bs said:
    @laoo something must be wrong, as the .o file crashes on a real Lynx. I suspect the "not returning" from the interrupt may be a problem.

Indeed. Something must be wrong. But my Lynx is real too: https://drive.google.com/file/d/1LqKpAEaH5zzq7eYd6SzuAikPKvFBoJR0/view

I suspect that in this form it runs only on LynxSD. Could someone with a LynxSD please try to run the code from my attachments?
42bs Posted February 19, 2020

4 minutes ago, laoo said:
    Indeed. Something must be wrong. But my Lynx is real too: https://drive.google.com/file/d/1LqKpAEaH5zzq7eYd6SzuAikPKvFBoJR0/view I suspect that in this form it runs only on LynxSD. Could someone with a LynxSD please try to run the code from my attachments?

Strange. So maybe LynxSD does some init which your code is missing when I load the .o file.
laoo Posted February 19, 2020

1 minute ago, 42bs said:
    Strange. So maybe LynxSD does some init which your code is missing when I load the .o file.

I've replicated the initializations from your .o file in mine, and both are currently (roughly) the same. Maybe LynxSD performs some other initializations than your runtime. I don't know what BLL does before launching an .o file. I think that the ideal comparison would involve making a 256k LYX image and running it on an AgaCart, as it doesn't initialize anything on its own.
laoo Posted February 19, 2020 (edited)

1 hour ago, 42bs said:
    I suspect the "not returning" from the interrupt may be a problem.

I've changed the test so that it does return from the HBI, to a loop of

    loop stz GREEN0
         jmp loop

Two bugs emerged in this scenario. I've fixed them, but the output is essentially the same:

Could you try this fixed version?

PS. The $5c NOP seems to be taking 8 cycles.

Attachments: crti.o, crti.lnx
42bs Posted February 19, 2020 (edited)

It returns directly to BLL after launch.

OK, it was the load address of $200, which does not work with the "standard" BLL.
42bs Posted February 19, 2020 (edited)

35 minutes ago, laoo said:
    PS. The $5c NOP seems to be taking 8 cycles.

Yes, that is the official value from the VLSI TECHNOLOGY, INC. data book:

Now the question is why some of your values are different from mine. I have a PAG-401 Lynx.
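For delay code, the interesting figure is how many cycles each NOP variant burns per byte of program memory. A small sketch using only the cycle counts mentioned in this thread (1 cycle for $1b, 2 for the standard $EA NOP, 8 for the 3-byte $5c); the table layout and helper name are illustrative.

```python
import math

# opcode: (size in bytes, CPU cycles), values as quoted in the thread.
NOPS = {
    "$1b": (1, 1),
    "$ea": (1, 2),
    "$5c": (3, 8),
}


def bytes_to_burn(opcode, cycles):
    """Code bytes needed to burn at least `cycles` cycles with one opcode."""
    size, cost = NOPS[opcode]
    return math.ceil(cycles / cost) * size


for opcode, (size, cost) in NOPS.items():
    print(f"{opcode}: {cost / size:.2f} cycles/byte, "
          f"{bytes_to_burn(opcode, 240)} bytes for 240 cycles")
```

By this measure $5c is the most compact way to waste time in straight-line code, at the price of coarser granularity: delays can only be adjusted in steps of 8 cycles unless other NOPs are mixed in.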
laoo Posted February 19, 2020

1 hour ago, 42bs said:
    It returns directly to BLL after launch. OK, it was the load address of $200, which does not work with the "standard" BLL.

So did you manage to run it, or should I reassemble it to a different address that does work with BLL? I'll try to rewrite the test as a LYX image to be run in a more controllable environment.
42bs Posted February 19, 2020

36 minutes ago, laoo said:
    So did you manage to run it, or should I reassemble it to a different address that does work with BLL? I'll try to rewrite the test as a LYX image to be run in a more controllable environment.

I could load it. In BLL I can add the downloader to the program, and this one is placed at the end of the RAM, so no clash.
42bs Posted February 19, 2020

Now to something completely different (RIP Terry Jones): I wanted to get rid of the HBL interrupt, and did this:

    w0: lda $fd00
    w1: cmp $fd02
        bne w1
    w2: cmp $fd02
        beq w2

That is, I wanted to wait until Timer 0 (HBL) gets reloaded and then start after the first tick. What I see is that every second line is skipped?! Any idea?
laoo Posted February 19, 2020

Polling timers is tricky. It seems that Timer 0 is faster than your w2 loop, and it gets decremented before you are able to check it. And then w2 loops until the next reload. It's not stated in the docs, but I suspect that the timer registers are implemented in the same manner as the audio registers: as a DPRAM processed in a cyclic manner. You could then observe a 1.25 μs latency in reading the counter.
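The general hazard described here, a counter value that is only visible for one timer tick being missed by a slower polling loop, can be illustrated with a toy simulation. Everything numeric below is an assumption for illustration: 16 system ticks per timer tick, 127 timer ticks per line, and a polling loop that samples the counter once every 30 system ticks. It does not model the DPRAM latency suggested above.

```python
def lines_caught(line_ticks, window_ticks, poll_ticks, n_lines, phase=0):
    """Count lines on which at least one poll sees the reload value.

    The reload value is visible during [start, start + window_ticks) of
    each line; polls happen at phase, phase + poll_ticks, and so on.
    All times are in system ticks.
    """
    caught = 0
    for line in range(n_lines):
        start = line * line_ticks
        # Index of the first poll at or after the start of this line.
        polls_before = max(0, -(-(start - phase) // poll_ticks))
        first_poll = phase + polls_before * poll_ticks
        if first_poll < start + window_ticks:
            caught += 1
    return caught


LINE = 127 * 16   # hypothetical line length in system ticks
WINDOW = 16       # reload value visible for one timer tick

print(lines_caught(LINE, WINDOW, 16, 15))   # loop as fast as the timer
print(lines_caught(LINE, WINDOW, 30, 15))   # loop slower than the timer
```

When the polling interval is no longer than the visibility window, every line is caught; once the loop is slower than one timer tick, a share of the lines is missed, and the loop then waits a whole extra line. The exact skip pattern on real hardware depends on the true loop timing and on how the timer registers are implemented.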