
TMS9900 CPU core creation attempt


speccery


Update at: https://hackaday.io/project/20826-tms9900-compatible-cpu-core-in-vhdl/log/58301-cpu-was-synthesis-passed

 

I added the STCR and LDCR instructions. These are messy. DIV and MPY are still missing. But those things are beside the point: I did an ad-hoc, totally buggy integration of the CPU core into my TI-99/4A FPGA design, and it passed synthesis! On the very first try! Wow! Now I only have to make it work (sigh).

 

FPGA synthesis statistics are included in the blog entry I linked to above.


Wow! What a milestone. From what I have read, you have implemented a complete TMS9900 CPU, including the instructions that are not supported by the 99/4A.

To be honest, the write-ups are a bit above my skill level.

Good luck debugging it and getting it to run. :)


What prefetch? The one from the 9995?

If you are going to make a new processor and have already added commands that weren't on the 9900... why not add a prefetch?

 

The HD6309 was fully compatible with the 6809 on startup.

Then you flip a bit and it enables the prefetch and some other things.

Still compatible but faster when you need it.

Just a suggestion.


Wow! What a milestone. From what I have read, you have implemented a complete TMS9900 CPU, including the instructions that are not supported by the 99/4A.

To be honest, the write-ups are a bit above my skill level.

Good luck debugging it and getting it to run. :)

 

 

To be honest, it is not yet a complete TMS9900, since it lacks 1) interrupt support and 2) the MPY and DIV instructions. And of course number 3 is to make all of this actually work: the data path already works, but I am pretty sure the flags are still bogus. In other words, I have not yet added anything that a TMS9900 wouldn't have. From here the additional things should be easy, but I will first try to make my current processor do something on the FPGA chip.


If you are going to make a new processor and have already added commands that weren't on the 9900... why not add a prefetch?

 

The HD6309 was fully compatible with the 6809 on startup.

Then you flip a bit and it enables the prefetch and some other things.

Still compatible but faster when you need it.

Just a suggestion.

 

Just to say again: there is no added functionality (yet). I think that, before prefetch, a more interesting addition would be a memory cache.

It is a good idea to add extra features that can be enabled at will.


It depends on your goals. If you want to create an emulation of the 9900, prefetch does not belong to it (but to the 9995). So you could directly go for the 9995, or maybe for the 99000.

 

My first goal is to have a CPU that is instruction set compatible with the TMS9900. My instruction decoder is already stricter than the one in the TMS9900. For example, on the TMS9900 the RTWP instruction is >038X (where X can be anything), but on the TMS9995 RTWP is >0380. So my decoder only accepts >0380.

While I will hopefully have instruction set compatibility, I do not have a timing compatibility goal. I'm designing this to run at 100MHz, but as I have not run it on real hardware I don't know whether I will reach that goal. It should be possible.
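To illustrate the difference between the two decoding rules, here is a minimal Python sketch (not the VHDL from the project). The function names are made up for this example, and the "loose" mask simply follows the >038X description above rather than a datasheet.

```python
RTWP = 0x0380  # RTWP opcode, written >0380 in TI notation


def is_rtwp_loose(opcode: int) -> bool:
    """Match >038X: the low four bits are ignored (TMS9900 behaviour as described above)."""
    return (opcode & 0xFFF0) == RTWP


def is_rtwp_strict(opcode: int) -> bool:
    """Match only the exact word >0380 (TMS9995-style, and what the stricter decoder accepts)."""
    return (opcode & 0xFFFF) == RTWP


for word in (0x0380, 0x0385, 0x038F, 0x0390):
    print(f">{word:04X} loose={is_rtwp_loose(word)} strict={is_rtwp_strict(word)}")
```

With the strict rule, a word such as >0385 is no longer treated as RTWP, so software that relied on the don't-care bits would behave differently.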


I do not have a timing compatibility goal. I'm designing this to run at 100MHz

 

Hmm... actually, why?

 

I mean, I understand why the F18A GPU runs at 100 MHz. You seem to be targeting a CPU at that speed. Just as a challenge? Mind that your TI hardware will not cope with speeds noticeably higher than the current 3 MHz (timer loops, device timings etc.).


 

Just to say again: there is no added functionality (yet). I think that, before prefetch, a more interesting addition would be a memory cache.

It is a good idea to add extra features that can be enabled at will.

Even if you just cache the register file that would make a noticeable difference.

 

A cache will not make any difference when accessing VDP RAM unless you have a very smart cache just for that.

 


 

Hmm... actually, why?

 

I mean, I understand why the F18A GPU runs at 100 MHz. You seem to be targeting a CPU at that speed. Just as a challenge? Mind that your TI hardware will not cope with speeds noticeably higher than the current 3 MHz (timer loops, device timings etc.).

 

 

I should clarify that I am not trying, at least initially, to plug this CPU into the TI. It is meant to be used as an embedded core in the FPGA. I already have the rest of the TI implemented inside the FPGA. That other logic already runs at 100MHz, and my VDP can probably accept new bytes at 25MHz or more, which is way faster than my current core can deliver.


Even if you just cache the register file that would make a noticeable difference.

 

A cache will not make any difference when accessing VDP RAM unless you have a very smart cache just for that.

 

 

Yes, the register file cache would make a dramatic difference. But it is a little nasty to implement: workspace changes would need to invalidate the cache, many TI programs use byte wide instructions to update only one byte within the register file, etc. So a "simple" implementation that only handles 16-bit wide word operations on register operands would not work, and the register cache tag entries would in practice need to be 15 bits wide to capture all the scenarios.
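As a rough illustration of those constraints, here is a small Python model of a 16-entry, write-through register cache tagged with the 15-bit word address. It is only a sketch of one possible reading of the idea, not the project's design: the class name, the direct-mapped layout and the write-through policy are all assumptions made for this example. The workspace address >8300 is just the usual TI-99/4A scratchpad location.

```python
class RegisterCache:
    """Hypothetical direct-mapped, write-through cache: 16 words, 15-bit word-address tags."""

    def __init__(self, memory: bytearray):
        self.memory = memory          # 64 KiB backing store, big-endian words
        self.tags = [None] * 16       # 15-bit word address per line, None = invalid
        self.data = [0] * 16
        self.hits = 0
        self.misses = 0

    def read_word(self, addr: int) -> int:
        word_addr = (addr & 0xFFFE) >> 1      # 15-bit tag
        line = word_addr & 0xF                # R0..R15 of one workspace never collide
        if self.tags[line] == word_addr:
            self.hits += 1
            return self.data[line]
        self.misses += 1
        base = word_addr << 1
        value = (self.memory[base] << 8) | self.memory[base + 1]
        self.tags[line], self.data[line] = word_addr, value
        return value

    def write_byte(self, addr: int, value: int) -> None:
        addr &= 0xFFFF
        self.memory[addr] = value & 0xFF      # write through to memory
        word_addr = addr >> 1
        line = word_addr & 0xF
        if self.tags[line] == word_addr:      # keep the cached word coherent
            if addr & 1:                      # odd address = low byte
                self.data[line] = (self.data[line] & 0xFF00) | (value & 0xFF)
            else:                             # even address = high byte
                self.data[line] = (self.data[line] & 0x00FF) | ((value & 0xFF) << 8)


mem = bytearray(0x10000)
cache = RegisterCache(mem)
wp = 0x8300                                   # workspace pointer
mem[wp + 2], mem[wp + 3] = 0x12, 0x34         # R1 = >1234
print(hex(cache.read_word(wp + 2)))           # first access misses and fills the line
cache.write_byte(wp + 2, 0xAB)                # byte-wide update of R1's high byte
print(hex(cache.read_word(wp + 2)))           # hit, sees the updated byte
wp = 0x8320                                   # workspace change (e.g. BLWP)
print(hex(cache.read_word(wp + 2)))           # tag mismatch, so this is a miss
```

Because each line stores the full 15-bit word address, a workspace change shows up as a tag mismatch rather than requiring a separate flush, and byte writes patch only half of the cached word.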

Again my VDP inside the FPGA already runs at 100MHz, so it goes very fast.


Actually, a register cache would not make a lot of difference if an external memory was 10ns (talking about SRAM here). And if all the memory is block RAM, a cache won't help at all. It might help if an external SDRAM was being used.

For a register cache, it's small enough it should fit in the FPGA.


For a register cache, it's small enough it should fit in the FPGA.

 

It is not a matter of fitting in the FPGA. If the main RAM is being implemented with the FPGA's "Block RAM" or external 10ns SRAM (this design is running at 100MHz), then a cache will not speed things up since the RAM is just as fast as the small memory or flip-flops that would be used to implement a cache. There is zero speed gain, but a lot of extra complexity. However, if the main RAM was being implemented in a slower memory (any SDRAM, DDR2, DDR3, DDR4), then yes a cache might help some.
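The argument can be put into numbers with the usual average access time estimate (hit time plus miss rate times miss penalty). The following Python sketch uses assumed values: a 10ns cache, a 10% miss rate picked purely for illustration, and roughly 70ns for random SDRAM access, which is the figure quoted elsewhere in this thread.

```python
def average_access_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average access time of a cached memory: every access pays the hit time,
    and a fraction miss_rate additionally pays the main memory access."""
    return hit_ns + miss_rate * miss_penalty_ns


CACHE_NS = 10.0            # assumed cache access time (one 100MHz state)
ASSUMED_MISS_RATE = 0.10   # illustrative value, not a measurement

for name, main_ns in (("10ns SRAM or block RAM", 10.0),
                      ("SDRAM, ~70ns random access", 70.0)):
    cached = average_access_ns(CACHE_NS, ASSUMED_MISS_RATE, main_ns)
    print(f"{name}: uncached {main_ns:.1f} ns, cached {cached:.1f} ns")
```

Under these assumptions the cache cannot beat a main memory that is already as fast as the cache itself, while in front of the slower SDRAM it cuts the average access time substantially.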


  • 2 weeks later...

First actual run of the TMS9900 CPU core!!

 

https://hackaday.io/project/20826-tms9900-compatible-cpu-core-in-vhdl/log/59624-first-successful-run

 

There is also a (quick-and-dirty late-at-night) video of the beast in action. Resolution is not fantastic, as my upload speeds are not that fast - and it is late. So I went with 960x540.

 

https://youtu.be/vn1OICqWQSo

 

This is too late for the retro challenge, my real life has been very involved lately... It is also too little, as it does not fully implement the TMS9900 yet (missing interrupts, DIV/MPY - and a truckload of debugging). So I cannot run actual TI ROMs on it. But surprisingly it does run my test code, and at least it shows that the TMS9900 VHDL core can initialise and use my TMS9918 core. That may not look like much, but it actually is quite a lot of functionality.

 

The CPU is also completely unoptimised, and the memory cycles, for instance, are way too long, but it still seems to run a lot faster than my TMS99105 based TI-99/4A clone. This can be seen to an extent in the video. That should be the case, as on the TMS99105 I run the CPU at 20MHz (5MHz cycles but 20MHz states), while the FPGA CPU runs with 100MHz state transitions. The memory cycles now take 90ns, which is ridiculously slow given that the FPGA board has 10ns SRAMs. I just threw in enough wait states to make sure the memory interface is stable and that I can trigger peripheral accesses reliably. I think in the end memory cycles could be done in 20ns or 30ns, so there are a ton of opportunities for optimisation.
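For readers who want to check the arithmetic, here is a small Python sketch converting the figures above into clock states. It assumes that "5MHz cycles" means a 200ns memory cycle on the TMS99105 system, which is an interpretation of the text rather than a measured value, and the helper names are made up for this example.

```python
def state_ns(clock_mhz: int) -> float:
    """Length of one state (clock period) in nanoseconds."""
    return 1000 / clock_mhz


def states_needed(cycle_ns: int, clock_mhz: int) -> int:
    """Number of whole states needed to cover a memory cycle (integer ceiling)."""
    return -(-cycle_ns * clock_mhz // 1000)


print("FPGA core state:", state_ns(100), "ns")
print("Current 90ns memory cycle:", states_needed(90, 100), "states")
print("Target 20ns..30ns cycle:", states_needed(20, 100), "to", states_needed(30, 100), "states")
print("TMS99105 state:", state_ns(20), "ns;",
      "assumed 200ns cycle:", states_needed(200, 20), "states")
```

So the current memory interface spends nine 10ns states per access, and the 20ns to 30ns target corresponds to two or three states.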

 

The code is on GitHub.


Actually, if you are using external SDRAM then your memory access will never be better than about 70ns without a Block-RAM or SRAM-based cache (even then you may have a 70ns latency for a cache miss). I wonder if the Xilinx tools have an SDRAM with cache controller wizard?


Actually, if you are using external SDRAM then your memory access will never be better than about 70ns without a Block-RAM or SRAM-based cache (even then you may have a 70ns latency for a cache miss). I wonder if the Xilinx tools have an SDRAM with cache controller wizard?

 

 

The FPGA board I am currently using has static RAMs: two 256K x 16 bit chips forming a 32-bit data bus. The SRAMs have a 10ns access time. So with this board there is no reason to use that many wait states; other designs for this board run with a 50MHz memory clock.

 

My other primary FPGA board does have SDRAM, but using the Xilinx hardwired LPDDR SDRAM controller block it can run at crazy speeds. Basically it supports 64-bit transfers at 100MHz, or 800Mbytes per second, once the burst is set up. There is no cache in there as far as I know. Anyway, for now I am not planning to use this board; I'll stick to the SRAM based board.


Ah, well, using SRAM then you do have a lot of room to improve the memory access. :-)

 

I'll have to give the Xilinx SDRAM controller a try. It looked very confusing and complicated the first time I started messing with SDRAM, so I rolled my own controller based on a simpler design I found online (hamster works). The problem with the burst mode and 64-bit transfers is that the 9900 does not really have that kind of access pattern, and you need a memory controller and cache to even begin to take advantage of any memory access over 16-bits. For completely random access, the read access time is still about 70ns on even the fastest SDRAM.
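To put the two figures from this exchange side by side, here is a short Python calculation. It only uses numbers quoted in the thread (64-bit bursts at 100MHz, and about 70ns per random read) and assumes one 16-bit word is fetched per random access, as a 9900-style CPU without a cache would do.

```python
BURST_WIDTH_BITS = 64       # burst transfer width quoted above
BURST_CLOCK_HZ = 100_000_000
RANDOM_ACCESS_NS = 70       # approximate random read latency quoted above
WORD_BYTES = 2              # one 16-bit word per access (assumption for a 9900-style CPU)

# Peak burst bandwidth: 8 bytes on every 100MHz clock.
burst_bytes_per_s = BURST_WIDTH_BITS // 8 * BURST_CLOCK_HZ

# Random access: one word per 70ns, no overlap between accesses.
random_bytes_per_s = WORD_BYTES * 1_000_000_000 // RANDOM_ACCESS_NS

print(f"burst:  {burst_bytes_per_s / 1e6:.0f} MB/s")
print(f"random: {random_bytes_per_s / 1e6:.1f} MB/s")
print(f"ratio:  about {burst_bytes_per_s // random_bytes_per_s}x")
```

The burst figure matches the 800Mbytes per second mentioned earlier, while word-at-a-time random access stays below 30 MB/s, which is why a memory controller and cache would be needed to benefit from the wide bursts.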
