
Assembly on the 99/4A


matthew180


10 hours ago, TheBF said:

FYI: Saving R15 did not fix the problem but I think it changed how things crash which is progress. 

There is something else I am missing.  

I will try pushing R12 before calling DSRLINK and popping R12 when it's done.

I can't understand how that could be the problem, though, because I am in a different workspace from DSRLINK. ??

 

GPL workspace R13, R14, and R15 all need to be correct for routines that expect them, which includes pretty much all DSRs. R13 is the current GROM base, usually >9800. R14 has the VDP timer tick in the MSB and various flags in the LSB (per Thierry Nouspikel: >20 cassette operations, >10 cassette verify, >08 16K VDP mem, >02 multicolor mode, >01 sound table in VDP mem). R15 should always be >8C02, the VDP write address.

 

The standard DSRLNK routine does, indeed, have its own workspace, but, as noted by @InsaneMultitasker, it switches to the GPL workspace just before branching to a DSR.
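
As a minimal sketch of the point above, assuming your own code runs in a workspace other than the GPL workspace at >83E0, the two fixed values can be written into GPL R13 and R15 just before the DSRLNK call. FIXGPL is a hypothetical helper name, and >9800 is only the usual GROM base mentioned above; R14 is left untouched.

```asm
GPLWS  EQU  >83E0             GPL workspace in scratchpad RAM
*
* FIXGPL is a hypothetical helper; call it with BL @FIXGPL
* from a workspace other than GPLWS. It uses R0 as scratch.
FIXGPL LI   R0,>9800          Usual GROM base (assumption)
       MOV  R0,@GPLWS+26      GPL R13 is at >83FA
       LI   R0,>8C02          VDP write-address port
       MOV  R0,@GPLWS+30      GPL R15 is at >83FE
       RT                     Return to caller (B *R11)
```

Writing the values directly, rather than saving and restoring whatever was there, also covers the case where the saved value was already wrong.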

 

...lee


Thanks Lee. I don't consciously alter those things but I will look for problems. R15 save did fix the issue partially.

With the interrupt driven serial console I can now compile code from floppy disk and return to the console. Yeah!

 

It seems to die when I paste code into the terminal, and that code accesses the drive with an include statement.

The disk file compiles but when I return to sending data the communication is messed up.

It may be as simple as the text continues to come in during file activity but it wraps around in the queue.

Next step is to implement a hard handshake when the buffer is 1/2 full to try and prevent overflows.
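
A rough sketch of the half-full test, assuming a power-of-two queue with separate head and tail indexes. QSIZE, QHEAD, QTAIL and QCHECK are hypothetical names, and the handshake itself is left as a comment because the CRU bit involved depends on the serial card.

```asm
QSIZE  EQU  256               Assumed queue size (power of two)
QHEAD  DATA 0                 Index where the ISR stores the next byte
QTAIL  DATA 0                 Index where the reader takes the next byte
*
* Call with BL @QCHECK after the ISR queues a byte. Uses R0.
QCHECK MOV  @QHEAD,R0
       S    @QTAIL,R0         R0 = head - tail
       ANDI R0,QSIZE-1        Wrap the difference: bytes now queued
       CI   R0,QSIZE/2
       JL   QOK               Under half full: nothing to do
*      Half full or more: drop the handshake line here
QOK    RT
```

Masking the difference keeps the count correct after either index wraps, which is the case that otherwise makes an overflow look like an empty queue.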


10 hours ago, InsaneMultitasker said:

Standard DSRLNK flips to GPLWS before the call. Whether that matters or not, I am not sure. It is forth, so maybe you need to pop before you push?  :rolling:

 

lol. Nope. push before pop still applies.  :)

 

 


OK.  I made an error in my testing and had to do it again.

So for certain R15 not being saved was the problem. 

 

With the fixed DSRLINK, saving/restoring R15 I can paste a single line into the terminal that causes disk activity (INCLUDE DSK1.TOOLS) 

and I return to the console correctly. 

 

The problem is almost for sure that I am overrunning the serial queue during disk activity. 

Need some rigorous handshaking in the ISR.

 

Thanks everybody for your help. 

 

 


3 hours ago, TheBF said:

With the fixed DSRLINK, saving/restoring R15 I c

As @Lee Stewart and @Tursi point out, GPLWS R15 must also be set to the correct VDP address. My comments aren't clear in that regard; simply saving and restoring GPLWS R15 would have been insufficient in cases where the DSR does not set R15 itself.  Glad to hear things are working now! 

Edited by InsaneMultitasker

Here are two short videos.

First one shows pasting code to compile into the Forth terminal at 19,200 bps using polled serial port I/O.

I had to use a 1 ms delay per character because that's the minimum delay in TeraTerm. Sloooow.

 

Second video is ISR serial I/O at 19,200. No character delay, 10 ms delay after line feed. 

 

This is so fun now. :) 

Thanks again.

 

 


On 2/20/2024 at 6:12 AM, dhe said:

Big thanks to @HOME AUTOMATION @TheBF @mizapf  !!!

 

I still have this one hanging question:

 

Question:

Are you saying, that you can pick off anything in column 0 with a single tb - like

   tb 4 will check spacebar,

     but if you want K, you would need to do a ldcr and stcr?

 

Side Note:

I have about 3 pages worth of notes centering around the 5 CRU Base instructions. I will put them up here when they are a little further along for you good folks to verify and as a potential aid to other novices.

 

 

 

Sorry to get back so late... been busy, and confused as to why you are confused.

Looking the problem over, I can see that not everything needed has been covered.

Yes, I did try TB 4 for the SPACEBAR... I gave this a thumbs-up in a previous post.

For K, use TB 4 with SBO 19; this turns on BIT 1 of the keyboard encoder.

 

       IDT  'SOUND'
*
* EXTERNAL REFERENCES
       REF  SOUND        REFERENCE SOUND PORT
       DEF  START
*
* EQUATED VALUES
DEBNCE EQU  >2000        DELAY TIME TO WAIT ON BOUNCING KEY
ROW2   EQU  4            DISPLACEMENT FOR ROW 2
*
START  LWPI WS           INITIALIZE WORKSPACE POINTER
       CLR  R12          POINT TO KEYBOARD
       SBZ  18           CLEAR SELECT BIT 18 SO ONLY BIT 19 IS SET
       SBO  19           TURNS ON BIT 1 OF THE KEYBOARD ENCODER
       SBZ  20           CLEAR SELECT BIT 20
CHEKEY TB   ROW2         TEST KEY
       JEQ  $-2          WAIT TILL IT'S DOWN
       LI   R10,>9100    TURN ON
       MOVB R10,@SOUND    TONE
       LI   R2,DEBNCE    INIT R2 TO DEBOUNCE DELAY COUNT
       DEC  R2           WAIT FOR KEY
       JNE  $-2           TO STOP BOUNCING
       TB   ROW2         TEST KEY
       JNE  $-2          WAIT UNTIL IT'S UP
       LI   R10,>9F00    TURN OFF
       MOVB R10,@SOUND    TONE
       JMP  CHEKEY       GO WAIT FOR KEY TO BE PRESSED AGAIN
*
WS     BSS  32           WORKSPACE
       END

 

The ENCODER has a 3-bit-wide input and 8 individual outputs, which drive the columns.

P2, P3, and P4 are on CRU BITS 18, 19, and 20, which respectively activate BITS 2, 1, and 0 of the keyboard select lines, BIT 2 being the least significant.

So, putting 110 on CRU BITS 18, 19, 20 will activate column 3, whereas 011 activates column 6 (JOYSTICK PORT, PIN 7).

For the above example, K: 010.
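
The same column select can also be written in one instruction with LDCR, which is the other route asked about in the question. This is only a sketch: KTEST is a hypothetical label, and the CRU base >0024 is simply twice the bit number 18.

```asm
* KTEST is a hypothetical routine; call it with BL @KTEST.
* Uses R1 and R12. On return EQ is set if K is up.
KTEST  LI   R12,>0024         CRU base of select bit 18 (2 x 18)
       LI   R1,>0200          Column 2; LDCR sends the LSB of the high byte first
       LDCR R1,3              Bits 18,19,20 = 0,1,0
       CLR  R12               Back to CRU base 0 for TB
       TB   4                 Same row bit as the space bar test
       RT
```

LDCR writes all three select bits, so the result does not depend on what an earlier scan left in bits 18 and 20.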

 

I'm using PIN numbers from the QI's diagram; sometimes there are errors.

 

I added CRU BIT, and COLUMN, designations to U25...

[Image: U25 schematic with CRU bit and column designations added]

 

Hope this helps a BIT.:) 

Edited by HOME AUTOMATION
edited "ROW 2" into comments, added schematic.

  • 3 weeks later...

I love it when I make a connection.

 

I've always known we have VSBW and VMBW and just kind of went with the flow on that, but then I saw this little snippet in the VDP Programmer's Guide:

"The VDP is connected to VRAM via a 14-bit auto-incrementing Address Register. Once the address to read
from or write to is set up (two-byte data transfer), we can read or write a byte of data using a one-byte
transfer. Continuing to read or write to the VDP causes the address to increment automatically.
Therefore, reading or writing a sequential chunk of data can be performed very quickly."

 

Hence, why two utilities instead of one. I also double checked, yep, with 14 bits, you aren't getting more than 16K...

 


30 minutes ago, dhe said:

Hence, why two utilities instead of one. I also double checked, yep, with 14 bits, you aren't getting more than 16K...

And there are only 14 bits because the remaining two address bits have a special meaning. Bit 1 indicates that the address setting prepares a write operation when set, while bit 0 indicates a video register write when set. (Bit numbering in the TI order: 0 as the MSB, 15 as the LSB)
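
To make the two flag bits concrete, here is a small sketch that builds both kinds of address word. The port address >8C02 is the one quoted earlier in the thread; VWADDR, VWREG and the example value >01E0 are illustrative assumptions. Interrupts are assumed to be off (LIMI 0) so that nothing else touches the VDP between the two bytes.

```asm
VDPWA  EQU  >8C02             VDP write-address port
*
* Set up a VRAM write at the address in R0 (>0000 to >3FFF).
* TI bit 1 (>4000) marks the address as a write address.
VWADDR ORI  R0,>4000
       SWPB R0
       MOVB R0,@VDPWA         Low byte of the address first
       SWPB R0
       MOVB R0,@VDPWA         Then the high byte with the flag
       RT
*
* Write a VDP register: R0 = register number in the high byte,
* value in the low byte, e.g. LI R0,>01E0 for register 1.
* TI bit 0 (>8000) marks this as a register write.
VWREG  ORI  R0,>8000
       SWPB R0
       MOVB R0,@VDPWA         Value byte first
       SWPB R0
       MOVB R0,@VDPWA         Then >80 + register number
       RT
```

Both routines send two bytes to the same port; only the top two bits of the second byte tell the VDP what the pair means.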



Understanding that the VDP has an auto-incrementing address register is very important to using the VDP effectively.  However, it really has nothing to do with having VSBW and VMBW, since those are host CPU-side routines.  VMBW with a count of "1" is the same as VSBW, only slightly slower.  Actually, that is *almost* true, but as Home Automation pointed out (and I forgot), VMBW expects a pointer to the source data, and VSBW takes a single byte value.

 

If you look at a typical implementation of these two routines you can see they are very similar:

 

I actually don't know what a "typical implementation" is because I have not audited all 99/4A assembly code in the world that talks to the 9918A.  I should not have made up a statistic on the fly like that (and neither should anyone else).  Better to say: If you look at two functions that write bytes to the VDP, one for writing a single byte and one for writing multiple bytes, you will see they are very similar in how they talk to the VDP:

 

VSBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set read/write bits 14 and 15 to write (01)
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
       MOVB R1,@VDPWD         * Write byte to VDP RAM
       B    *R11
*// VSBW


VMBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set read/write bits 14 and 15 to write (01)
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
VMBWLP MOVB *R1+,@VDPWD       * Write byte to VDP RAM
       DEC  R2                * Byte counter
       JNE  VMBWLP            * Check if done
       B    *R11
*// VMBW

 

They both set the VDP's 14-bit address, then write the first byte.  For VMBW, it keeps writing until the specified number of bytes have been written to the VDP.  VSBW is really only useful if you need to write single bytes to VRAM, and each byte is going to some different address in VRAM.

 

Normally you want to try and batch your writes to VRAM such that you are writing to consecutive VRAM memory locations, thus avoiding having to repeatedly set up the VDP address register (which has to be done every time you need to write to VRAM at some address other than where the VDP address register is currently pointing).
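
As a usage sketch of the multi-byte routine shown above, assuming that BL-style VMBW is assembled into the same program: one address setup followed by five consecutive bytes. MSG and the VRAM address >0000 are illustrative.

```asm
       LI   R0,>0000          VRAM destination (illustrative)
       LI   R1,MSG            Source buffer in CPU RAM
       LI   R2,5              Byte count
       BL   @VMBW             One address setup, then 5 data writes
       JMP  $                 Stop here so we do not run into the data
*
MSG    TEXT 'HELLO'
       EVEN
```

Doing the same with five VSBW calls would send the two address bytes five times instead of once.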

 

Also, it is sometimes much easier to just in-line the VDP access depending on your situation.  The first two or three pages of this thread cover VDP access in detail.

Edited by matthew180
technical corrections, removed unfounded claims

27 minutes ago, HOME AUTOMATION said:

I believe the biggest difference between the two routines, is that VMBW, expects a buffer address, while VSBW, directly accommodates ASCII, values.

VSBW writes a byte to VDP memory. It has no checks or limits on the value of the byte. ( 0..255. So ASCII is ok but it doesn't know that  :) )


10 hours ago, matthew180 said:

If you look at a typical implementation...

The most typical is a BLWP version, since that's included in all standard support packages. The BL version you show is frequently used where top speed must be achieved.


 

8 hours ago, apersson850 said:

The most typical is a BLWP version, ...

 

The discussion was around talking to the VDP and its auto-incrementing address pointer, not the local CPU's subroutine calling convention, which is irrelevant in this case.

 

If someone likes BLWP, then great.  BL is good too, as is in-lining the code and not calling any subroutine at all.  There is no "best way" or "common way" or "right way".  We all have opinions, which should not be conflated with, or presented as, fact.  People have preferences and ways they like to do things, as they should; to each their own.  There is a large section of this thread where all manner of calling conventions have been discussed and beaten to death.  Time to move on and let people code the way they like.

 

I edited my previous post since I should not have made any statement about how a "typical" VDP routine might be implemented.  No one should claim to know what is "typical" or "common" without having audited the majority of all 99/4A code written.


Sure, you're right. When I saw "typical" I thought about the TI-supplied video utilities, and they are implemented with the BLWP concept, so I thought it could be worthwhile to point out for some. But perhaps not - most people still active with the machine are pretty much experts by now.


8 minutes ago, apersson850 said:

Sure, you're right.

 

So, I'm not trying to be right or wrong, I'm just trying to share technical information without being too pedantic.  I'm not here to argue either, just to enjoy the hobby.

 

10 minutes ago, apersson850 said:

When I saw "typical"

 

Yeah, I should not have written that (and I made a correction to my post).  I try really hard to not accidentally make up statistics when I'm posting.  It is easy to say "typically", yet how do I know that to be true?  Typical for me maybe, and the way I write my code, but it is misleading.

 

15 minutes ago, apersson850 said:

... and they are implemented with the BLWP concept, so I thought it could be worthwhile to point out for some. ...

 

Innocent enough, and my apologies for probably reading the reply wrong.

 

16 minutes ago, apersson850 said:

most people still active with the machine are pretty much experts by now.

 

There are actually a lot of people taking their first steps into assembly language around here.  The thread was started to help someone doing exactly that, and part way through (around 30 to 50) someone else picks it up starting from zero, and actually writes his first assembly game (Airshack, RIP).


Well, the most important thing is that we discuss. If we have different opinions it's in a way just better, because it puts light on the fact that depending on what you want to do, you can rate the usefulness of different tools differently.

I didn't have the same point of view when I created a sorting routine which can sort a thousand integers in less than half a second, compared to when I did support for Extended BASIC to store a large number of text strings using the memory image format on cassette tape.

 

So when (if) somebody writes AI 2, I'll remark that INCT is faster, but that AI 4 is better than INCT INCT. That's not to argue, it's just to show there is a different way.


C *R2+,*R2+, assuming it's running in 8 bit RAM, registers in scratchpad, and R2 points to memory that triggers the multiplexer (which is the majority), takes 14 cycles plus 4 cycles to read the instruction, plus 8 cycles plus 4 cycles for the multiplexer for each *R2+. So 14+4+8+4+8+4=42 cycles.

AI R2,4 with the same constraints takes 14 cycles plus 4 cycles to read the instruction, plus 8 + 4 for the immediate argument. So 14+4+8+4, or 30 cycles.

 

If registers are also in 8-bit RAM, then it's even worse since each register write adds another 4 cycles: 14+4+8+4+4+8+4+4=50 cycles vs 14+4+8+4+4=34 cycles.

 

If everything is in zero wait state memory, you still have 14+8+8=30 vs 14+8=22 cycles.

 

If you only have two bytes of code space, the trick works, but otherwise AI remains more efficient. 
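
For reference, these are the forms being compared, each adding 4 to R2. The cycle totals in the comments are simply the first set of figures from this post (code in 8-bit RAM, registers in scratchpad), not new measurements.

```asm
* Form 1: compare with double autoincrement
       C    *R2+,*R2+         One word of code, 42 cycles
*
* Form 2: add immediate
       AI   R2,4              Two words of code, 30 cycles
*
* Form 3: two increments by two (no figure given in this post)
       INCT R2
       INCT R2                Two words of code
```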

 


2 hours ago, apersson850 said:

It autoincrements R2 twice, so the same as AI R2,4. Or INCT R2 INCT R2. But in one word of memory. Then it generates status bits you don't use, provided the purpose is just to increment R2 by 4.

Clever! Thanks for the explanation. It's a rather obtuse way of doing things, so there had better be a good reason to do so. In my world clarity of code trumps maximum efficiency in most situations. 

