mrvan Posted July 7

I’ve developed a conio prototype that otherwise uses libti99 with gcc. It is relatively quick because VDP writes are aggregated, which eliminates setting the VDP write address for each byte. Screen scrolls are also faster, as the parts of the display that have been written to are cached in RAM, thus eliminating the reads from VDP. I wrote this because most of my planned work is expected to be text based. But as one could imagine, there are several costs: extra RAM and about 2KB of code. The RAM is about 1KB and will be around 2KB in the 80x24 text mode of the F18A.

I understand VDP RAM access to be slowed mostly because the stock VDP is busy updating the screen. Does the F18A eliminate much of the delay since it runs at 100MHz? The trade-off of resources for performance in my implementation is significant, and I wonder if the gains will pretty much be lost when using the F18A. I thought I read it has some sort of hardware scroll as well. Thoughts?
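A minimal sketch of the RAM-cached scroll described in the post above, not mrvan's actual code. It assumes the whole 40x24 screen is shadowed in RAM (the post caches only the parts that were written), and it uses a host-side mock in place of the real VDP ports. Every name here (`mock_vram`, `mock_set_write_addr`, `shadow_scroll_up`, and so on) is hypothetical and not part of libti99. A 40x24 shadow is 960 bytes and an 80x24 one is 1920 bytes, which lines up with the roughly 1KB and 2KB figures quoted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COLS 40                       /* 40x24 text mode */
#define ROWS 24
#define SCREEN_SIZE (COLS * ROWS)     /* 960 bytes */
#define VRAM_SIZE 0x4000              /* 16KB of VDP RAM */

/* Host-side mock of the VDP so the sketch runs anywhere (hypothetical). */
uint8_t  mock_vram[VRAM_SIZE];
uint16_t mock_addr;
unsigned addr_setups;                 /* times the write address was set */

void mock_set_write_addr(uint16_t addr)
{
    mock_addr = (uint16_t)(addr % VRAM_SIZE);
    addr_setups++;
}

void mock_write_byte(uint8_t b)
{
    mock_vram[mock_addr] = b;
    mock_addr = (uint16_t)((mock_addr + 1) % VRAM_SIZE);  /* auto-increment */
}

/* RAM copy of the screen: this is the extra-RAM cost of the approach. */
uint8_t shadow[SCREEN_SIZE];

/* Fill one row of the shadow copy with a character (helper for demos). */
void shadow_fill_row(int row, char ch)
{
    assert(row >= 0 && row < ROWS);
    memset(shadow + row * COLS, ch, COLS);
}

/* Scroll up one row entirely in RAM, then push the result to the VDP
 * with a single address setup and no reads back from VRAM. */
void shadow_scroll_up(uint16_t screen_base)
{
    int i;

    assert(screen_base + SCREEN_SIZE <= VRAM_SIZE);
    memmove(shadow, shadow + COLS, SCREEN_SIZE - COLS);
    memset(shadow + SCREEN_SIZE - COLS, ' ', COLS);   /* blank the last row */

    mock_set_write_addr(screen_base);
    for (i = 0; i < SCREEN_SIZE; i++)
        mock_write_byte(shadow[i]);
}
```

Calling `shadow_fill_row(1, 'B')` followed by `shadow_scroll_up(0)` moves the row of `B` characters to the top of the mock screen while `addr_setups` advances by one. The cost is the one named in the post: a full RAM copy of the name table.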
Asmusr Posted July 7

Yes, the F18A has hardware registers for pixel-smooth scrolling. However, it requires that you keep track of offsets and pages, so perhaps it's not ideal for a C library? Alternatively, you can write a GPU program to move the data in VDP RAM. This will scroll the screen maybe 100 times faster than the CPU can do it and will run in parallel.
jedimatt42 Posted July 7

Libti99 has a GPU routine for scrolling extended attribute text, which works with its conio routines as well. You can compare performance, and cherry-pick what you want out of there... I think it is defined in the routine for setting up 80 column color text mode.
Tursi Posted July 9

The F18A can also run at bus speed, meaning the usual delays are not necessary when accessing it. BUT, there are only a few limited cases on the 99/4A where you can actually overrun a stock VDP anyway, so you won't gain much performance by removing those cases.

Though... I already had conio in libti99? Why the new one?
mrvan Posted July 10

21 hours ago, Tursi said:
The F18A can also run at bus speed, meaning the usual delays are not necessary when accessing it. BUT, there are only a few limited cases on the 99/4A where you can actually overrun a stock VDP anyway, so you won't gain much performance by removing those cases. Though... I already had conio in libti99? Why the new one?

I was attempting to increase performance and did so, primarily by no longer updating the VDP address for every character written. The resource utilization in my implementation is high though, and it reduces compatibility in an already small community. My current thinking is to mostly use your conio but augment it slightly to minimize the address writes. I think that could be done by not driving all output through the putc method and instead allowing puts to write chars sequentially to the VDP where possible.
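The puts idea above can be sketched as follows. This is an illustration under stated assumptions, not libti99's conio: the VDP is replaced by a host-side mock, the name table is assumed to start at VRAM address 0 and to be laid out linearly (so the end of one row is adjacent to the start of the next), and the cursor wraps to the top instead of scrolling. All function and variable names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define COLS 40
#define ROWS 24
#define VRAM_SIZE 0x4000

/* Host-side mock of the VDP (hypothetical, for illustration only). */
uint8_t  vram[VRAM_SIZE];
uint16_t vdp_addr;
unsigned addr_setups;                 /* times the write address was set */

void vdp_set_write_addr(uint16_t a) { vdp_addr = (uint16_t)(a % VRAM_SIZE); addr_setups++; }
void vdp_write_byte(uint8_t b)      { vram[vdp_addr] = b; vdp_addr = (uint16_t)((vdp_addr + 1) % VRAM_SIZE); }

int cur_x, cur_y;                     /* cursor position */

static void newline(void) { cur_x = 0; cur_y = (cur_y + 1) % ROWS; }

/* Per-character path: every character pays for an address setup. */
void put_char_slow(char c)
{
    assert(cur_x < COLS && cur_y < ROWS);
    if (c == '\n') { newline(); return; }
    vdp_set_write_addr((uint16_t)(cur_y * COLS + cur_x));
    vdp_write_byte((uint8_t)c);
    if (++cur_x == COLS) newline();
}

void put_string_slow(const char *s) { while (*s) put_char_slow(*s++); }

/* Batched path: set the address once per contiguous run and rely on the
 * VDP's auto-increment for the following characters. */
void put_string_batched(const char *s)
{
    int need_addr = 1;

    assert(cur_x < COLS && cur_y < ROWS);
    while (*s) {
        char c = *s++;
        if (c == '\n') { newline(); need_addr = 1; continue; }
        if (need_addr) {
            vdp_set_write_addr((uint16_t)(cur_y * COLS + cur_x));
            need_addr = 0;
        }
        vdp_write_byte((uint8_t)c);
        if (++cur_x == COLS) {
            newline();
            need_addr = (cur_y == 0);  /* only a wrap to the top breaks the run */
        }
    }
}
```

For the string `"HELLO\nWORLD"` the per-character path sets the address ten times and the batched path twice, once for each line. On real hardware each address setup is two writes to the VDP control port, so this is the saving the post is after, without a RAM shadow of the screen.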
mrvan Posted July 10

On 7/7/2023 at 3:58 PM, jedimatt42 said:
Libti99 has a GPU routine for scrolling extended attribute text, which works with its conio routines as well. You can compare performance, and cherry pick what you want out of there... I think it is defined in the routine for setting up 80 column color text mode.

Good to know. I don’t know much about extended attributes yet. I would rather reuse what is present.
mrvan Posted July 10

On 7/7/2023 at 12:45 PM, Asmusr said:
Yes the F18A has hardware registers for pixel smooth scrolling. However, it requires that you keep track of offsets and pages, so perhaps it's not ideal for a c library? Alternatively you can write a GPU program to move the data in VDP RAM. This will scroll the screen maybe 100 times faster than the CPU can do it and will run in parallel.

I have a lot of learning to do on this VDP. I just installed it in my eBay special TI-99 but don’t have any expansion RAM yet to try things out.
Tursi Posted July 10

56 minutes ago, mrvan said:
I was attempting to increase performance and did so, primarily due to minimizing the updating of the vdp address for every character written. The resource utilization in my implementation is high though, and reduces compatibility in an already small community. My current thoughts are to mostly use your conio but augment slightly to minimize the address writes. I think that could be done by not driving all output through the putc method and allowing puts to write chars sequentially to vdp as possible.

Yeah, that's fair. I wrote my base library for performance. I wrote conio because someone asked for it.
matthew180 Posted July 13

On 7/7/2023 at 2:06 PM, mrvan said:
I understand vdp ram access to be slowed mostly due to the fact that the stock vdp is busy updating the screen. Does the F18A eliminate much of the delay since it runs at 100MHz?

Mostly. As Tursi mentioned, it has been shown that the 99/4A can only overrun the VDP in limited cases anyway. The F18A does have limits though: since it has to synchronize the /CSR and /CSW signals over a few clock cycles on either end of a read or write, plus a few cycles for the operation, the max rate is about 80ns per operation, or 12.5MHz. However, there are no other restrictions (accesses per scan line, etc.) of the kind that the 9918A must impose. The 9918A has a 200ns access time for /CSR and /CSW, which is 5MHz; however, it cannot sustain that theoretical 320 accesses per scan line. The F18A can sustain about 800 accesses per scan line (about 200K per frame, or 12.5MB/sec, hence the 12.5MHz stated above). The F18A GPU has fewer restrictions on timing and can access VRAM faster.

On 7/7/2023 at 2:06 PM, mrvan said:
The trade off of resources to performance in my implementation are significant and I wonder if the gains will pretty much be lost in using the F18A.

Your gains appear to be on the host side, so those would be gains regardless of there being a 9918A or F18A in the system. As long as you are overrunning the 9918A, then your gains are realized with either VDP.

On 7/7/2023 at 2:06 PM, mrvan said:
I thought I read it has some sort of hardware scroll as well.

As Asmusr pointed out, it does have scrolling registers, but there is a bit of software housekeeping to manage the data that is scrolling in and out of the page, etc. A GPU program (similar to a modern "shader") that your library loads could help make an interface for scrolling that is easier on the programmer and the lib. A GPU shader (see how I'm using modern terms with old-school tech?) could also be used to help with your lib in some way, maybe. It depends on what you are trying to do, and whether you want to restrict yourself to F18A-only code. But if you are using T80, then you are already F18A-bound.
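The access-rate figures in the post above can be checked with a few lines of arithmetic. This sketch takes the 80ns and 200ns access times from the post; the scan-line length (about 63.695 µs, from 342 pixel clocks at roughly 5.369 MHz) and the 262 lines per NTSC frame are assumptions added here, not figures from the thread. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed NTSC timing (not stated in the thread). */
#define NTSC_LINE_NS         63695.0   /* one scan line, in nanoseconds */
#define NTSC_LINES_PER_FRAME 262

/* Peak access rate in MHz for a given access time in nanoseconds. */
double max_rate_mhz(double access_ns)
{
    assert(access_ns > 0.0);
    return 1000.0 / access_ns;
}

/* Whole back-to-back accesses that fit in one scan line. */
unsigned accesses_per_line(double access_ns)
{
    assert(access_ns > 0.0);
    return (unsigned)(NTSC_LINE_NS / access_ns);
}

/* Whole accesses per frame if every scan line is fully used. */
unsigned long accesses_per_frame(double access_ns)
{
    return (unsigned long)accesses_per_line(access_ns) * NTSC_LINES_PER_FRAME;
}
```

Under these assumptions, 80ns gives 12.5MHz, 796 accesses per line and 208,552 per frame, close to the rounded "about 800" and "about 200K" in the post. 200ns gives 5MHz and 318 per line, close to the theoretical 320 quoted for the 9918A.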
mrvan Posted July 16

On 7/13/2023 at 1:41 PM, matthew180 said:
Mostly. As Tursi mentioned, it has been shown that the 99/4A can only overrun the VDP in limited cases anyway. The F18A does have limits though, since it has to synchronize the /CSR and /CSW signals over a few clock cycles on either end of a read or write, plus a few cycles for the operation, the max rate is about 80ns per operation, or 12.5MHz. However, there are no other restrictions, i.e. accesses per scan line, etc. that the 9918A must impose. The 9918A has a 200ns access time for the /CSR and /CSW, which is 5MHz, however it cannot sustain that theoretical 320 accesses per scan line. The F18A can sustain about 800 accesses per scan line (about 200K per frame, or 12.5MB/sec (hence the 12.5MHz stated above)). The F18A GPU has fewer restrictions on timing and can access VRAM faster. Your gains appear to be on the host-side, so those would be gains regardless of there being a 9918A or F18A in the system. As long as you are overrunning the 9918A, then your gains are realized with either VDP. As Asmusr pointed out, it does have scrolling registers, but there is a bit of software house-keeping to manage the data that is scrolling in/out of the page, etc. A GPU program (similar to a modern "shader") that your library loads could help make an interface for scrolling that is easier on the programmer and lib. A GPU shader (see how I'm using modern terms with old-school tech?) could also be used to help with your lib in some way, maybe. Depends on what you are trying to do, and if you want to restrict yourself to F18A only code. But, if you are using T80, then you are already F18A-bound.

Thanks, matthew180. Excellent feedback. I’m looking forward to trying out the F18A after I get some RAM. All the demos I have require the memory expansion, as does my personal code base.