
F18A performance in VDP read, write and scrolling



I’ve developed a conio prototype that otherwise uses libti99 with gcc. It is relatively quick because VDP writes are aggregated, which eliminates setting the VDP write address for each byte. Screen scrolls are also faster because the parts of the display that have been written to are cached in RAM, which eliminates the reads from the VDP. I wrote this because most of my planned work is expected to be text based. But as one could imagine, there are several costs: extra RAM and about 2KB of code. The RAM is about 1KB and will be around 2KB in the 80x24 text mode of the F18A.
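A minimal sketch of the RAM-cached scroll described above, not the actual prototype code. It assumes a 40x24 text screen (960 bytes, which matches the "about 1KB" figure; 80x24 would be 1920 bytes). The names `shadow`, `vram_screen`, `put_at` and `scroll_up` are hypothetical, and the VDP name table is simulated with a plain array so the code can run on any host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed 40x24 text mode: 960 bytes of screen data. */
enum { COLS = 40, ROWS = 24, SCREEN_BYTES = COLS * ROWS };

/* Hypothetical stand-ins: `vram_screen` plays the VDP name table,
   `shadow` is the CPU-RAM copy of everything written to it. */
static uint8_t vram_screen[SCREEN_BYTES];
static uint8_t shadow[SCREEN_BYTES];
static unsigned vdp_writes;  /* bytes written to the simulated VDP */

/* Write-through: update the shadow copy and the VDP together. */
static void put_at(int row, int col, uint8_t ch)
{
    assert(row >= 0 && row < ROWS && col >= 0 && col < COLS);
    shadow[row * COLS + col] = ch;
    vram_screen[row * COLS + col] = ch;
    vdp_writes++;
}

/* Scroll up one row using only the shadow copy, then push the result
   to the VDP as one sequential stream. No VDP reads are needed. */
static void scroll_up(void)
{
    memmove(shadow, shadow + COLS, SCREEN_BYTES - COLS);
    memset(shadow + SCREEN_BYTES - COLS, ' ', COLS);
    /* Stands in for a block write after a single address setup. */
    memcpy(vram_screen, shadow, SCREEN_BYTES);
    vdp_writes += SCREEN_BYTES;
}
```

The cost is the one named in the post: the shadow buffer lives in CPU RAM for as long as the screen is in use.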

 

I understand VDP RAM access to be slow mostly because the stock VDP is busy updating the screen. Does the F18A eliminate much of the delay since it runs at 100MHz?

 

The trade-off of resources for performance in my implementation is significant, and I wonder if the gains will pretty much be lost when using the F18A. I thought I read it has some sort of hardware scroll as well.

 

Thoughts?


Yes, the F18A has hardware registers for pixel smooth scrolling. However, it requires that you keep track of offsets and pages, so perhaps it's not ideal for a C library? Alternatively, you can write a GPU program to move the data in VDP RAM. This will scroll the screen maybe 100 times faster than the CPU can do it and will run in parallel.


Libti99 has a GPU routine for scrolling extended attribute text, which works with its conio routines as well. You can compare performance, and cherry pick what you want out of there... 

 

I think it is defined in the routine for setting up 80 column color text mode.


The F18A can also run at bus speed, meaning the usual delays are not necessary when accessing it. BUT, there are only a few limited cases on the 99/4A where you can actually overrun a stock VDP anyway, so you won't gain much performance by removing those cases.

 

Though.. I already had conio in libti99? Why the new one?

 

 

 


21 hours ago, Tursi said:

The F18A can also run at bus speed, meaning the usual delays are not necessary when accessing it. BUT, there are only a few limited cases on the 99/4A where you can actually overrun a stock VDP anyway, so you won't gain much performance by removing those cases.

 

Though.. I already had conio in libti99? Why the new one?

 

 

 

I was attempting to increase performance and did so, primarily by no longer updating the VDP address for every character written. The resource utilization in my implementation is high though, and it reduces compatibility in an already small community. My current thinking is to mostly use your conio but augment it slightly to minimize the address writes. I think that could be done by not driving all output through the putc method and letting puts write chars sequentially to the VDP where possible.
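A rough sketch of the difference between the two output paths described above, under stated assumptions. It is not libti99's conio code: `vdp_set_write_addr`, `vdp_write_byte`, `puts_per_char` and `puts_sequential` are hypothetical names, and the VDP is simulated with an array and a counter so the address setups can be counted on any host. A real puts would also have to handle newlines, line wrap and scrolling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for the VDP: 16 KB of VRAM plus a
   counter. On real hardware each address setup costs two writes to the
   VDP address port before any data can be written. */
enum { VRAM_SIZE = 0x4000 };

static uint8_t  vram[VRAM_SIZE];
static uint16_t vdp_addr;     /* auto-incrementing write address */
static unsigned addr_setups;  /* how many times the address was set */

static void vdp_set_write_addr(uint16_t addr)
{
    assert(addr < VRAM_SIZE);
    vdp_addr = addr;
    addr_setups++;
}

static void vdp_write_byte(uint8_t b)
{
    vram[vdp_addr] = b;
    vdp_addr = (uint16_t)((vdp_addr + 1) % VRAM_SIZE);
}

/* putc-style path: every character sets the address first. */
static void puts_per_char(uint16_t addr, const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        vdp_set_write_addr((uint16_t)(addr + i));
        vdp_write_byte((uint8_t)s[i]);
    }
}

/* Aggregated path: set the address once, then stream the bytes and
   rely on the VDP's auto-increment. */
static void puts_sequential(uint16_t addr, const char *s)
{
    vdp_set_write_addr(addr);
    for (size_t i = 0; s[i] != '\0'; i++) {
        vdp_write_byte((uint8_t)s[i]);
    }
}
```

For an n-character string the per-character path performs n address setups and the sequential path performs one, which is a host-side saving that does not depend on which VDP is installed.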


On 7/7/2023 at 3:58 PM, jedimatt42 said:

Libti99 has a GPU routine for scrolling extended attribute text, which works with its conio routines as well. You can compare performance, and cherry pick what you want out of there... 

 

I think it is defined in the routine for setting up 80 column color text mode.

Good to know. I don’t know much about extended attributes yet. I would rather reuse what is present.


On 7/7/2023 at 12:45 PM, Asmusr said:

Yes, the F18A has hardware registers for pixel smooth scrolling. However, it requires that you keep track of offsets and pages, so perhaps it's not ideal for a C library? Alternatively, you can write a GPU program to move the data in VDP RAM. This will scroll the screen maybe 100 times faster than the CPU can do it and will run in parallel.

I have a lot of learning to do on this VDP. I just installed it on my eBay special TI-99 but don’t have any expansion RAM yet to try things out.


56 minutes ago, mrvan said:

I was attempting to increase performance and did so, primarily by no longer updating the VDP address for every character written. The resource utilization in my implementation is high though, and it reduces compatibility in an already small community. My current thinking is to mostly use your conio but augment it slightly to minimize the address writes. I think that could be done by not driving all output through the putc method and letting puts write chars sequentially to the VDP where possible.

Yeah, that's fair. I wrote my base library for performance. I wrote conio cause someone asked for it. ;)

 


On 7/7/2023 at 2:06 PM, mrvan said:

I understand VDP RAM access to be slow mostly because the stock VDP is busy updating the screen. Does the F18A eliminate much of the delay since it runs at 100MHz?

 

Mostly. As Tursi mentioned, it has been shown that the 99/4A can only overrun the VDP in limited cases anyway. The F18A does have limits though: it has to synchronize the /CSR and /CSW signals over a few clock cycles on either end of a read or write, plus a few cycles for the operation, so the max rate is about 80ns per operation, or 12.5MHz. However, there are no other restrictions (accesses per scan line, etc.) of the kind the 9918A must impose. The 9918A has a 200ns access time for /CSR and /CSW, which is 5MHz; however, it cannot sustain that theoretical 320 accesses per scan line. The F18A can sustain about 800 accesses per scan line (about 200K per frame, or 12.5MB/sec, hence the 12.5MHz stated above).
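The figures above can be checked with a little arithmetic. The sketch below assumes an NTSC scan line of roughly 63.7 microseconds and 262 lines per frame; neither number is stated in the post, and the function names are made up for illustration.

```c
#include <assert.h>

/* Assumptions, not from the post: one NTSC scan line lasts about
   63.7 microseconds and a frame has 262 lines. */
static const double LINE_NS = 63700.0;
static const int LINES_PER_FRAME = 262;

/* Peak accesses per second for a given access time in nanoseconds. */
static double accesses_per_second(double access_ns)
{
    assert(access_ns > 0.0);
    return 1e9 / access_ns;
}

/* Back-to-back accesses that fit in one scan line. */
static double accesses_per_line(double access_ns)
{
    assert(access_ns > 0.0);
    return LINE_NS / access_ns;
}

/* Back-to-back accesses that fit in one frame. */
static double accesses_per_frame(double access_ns)
{
    return accesses_per_line(access_ns) * LINES_PER_FRAME;
}
```

With these assumptions, 80ns gives 12.5 million accesses per second, just under 800 per scan line and a little over 200K per frame, while 200ns gives 5 million per second and just under 320 per scan line, in line with the rounded numbers quoted.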

 

The F18A GPU has fewer restrictions on timing and can access VRAM faster.

 

On 7/7/2023 at 2:06 PM, mrvan said:

The trade-off of resources for performance in my implementation is significant, and I wonder if the gains will pretty much be lost when using the F18A.

 

Your gains appear to be on the host-side, so those would be gains regardless of there being a 9918A or F18A in the system.  As long as you are overrunning the 9918A, then your gains are realized with either VDP.

 

On 7/7/2023 at 2:06 PM, mrvan said:

I thought I read it has some sort of hardware scroll as well.

 

As Asmusr pointed out, it does have scrolling registers, but there is a bit of software housekeeping to manage the data that is scrolling in and out of the page, etc. A GPU program (similar to a modern "shader") that your library loads could help make an interface for scrolling that is easier on the programmer and lib.

 

A GPU shader (see how I'm using modern terms with old-school tech? :P ) could also be used to help with your lib in some way, maybe.  Depends on what you are trying to do, and if you want to restrict yourself to F18A only code.  But, if you are using T80, then you are already F18A-bound.


On 7/13/2023 at 1:41 PM, matthew180 said:

 

Mostly. As Tursi mentioned, it has been shown that the 99/4A can only overrun the VDP in limited cases anyway. The F18A does have limits though: it has to synchronize the /CSR and /CSW signals over a few clock cycles on either end of a read or write, plus a few cycles for the operation, so the max rate is about 80ns per operation, or 12.5MHz. However, there are no other restrictions (accesses per scan line, etc.) of the kind the 9918A must impose. The 9918A has a 200ns access time for /CSR and /CSW, which is 5MHz; however, it cannot sustain that theoretical 320 accesses per scan line. The F18A can sustain about 800 accesses per scan line (about 200K per frame, or 12.5MB/sec, hence the 12.5MHz stated above).

 

The F18A GPU has fewer restrictions on timing and can access VRAM faster.

 

 

Your gains appear to be on the host-side, so those would be gains regardless of there being a 9918A or F18A in the system.  As long as you are overrunning the 9918A, then your gains are realized with either VDP.

 

 

As Asmusr pointed out, it does have scrolling registers, but there is a bit of software housekeeping to manage the data that is scrolling in and out of the page, etc. A GPU program (similar to a modern "shader") that your library loads could help make an interface for scrolling that is easier on the programmer and lib.

 

A GPU shader (see how I'm using modern terms with old-school tech? :P ) could also be used to help with your lib in some way, maybe.  Depends on what you are trying to do, and if you want to restrict yourself to F18A only code.  But, if you are using T80, then you are already F18A-bound.

Thanks Matthew180. Excellent feedback. I’m looking forward to trying out the F18A after I get some RAM. All the demos I have require the memory expansion, as does my personal code base.

