
Introducing picocart - it works


speccery


I got the board working today as an Extended BASIC cartridge. Rather embarrassingly, I wasted at least an hour wondering why the heck I could not get either address or data reads working properly. As I worked my way through eliminating various possible causes, I realised I had forgotten to solder a good part of the pins of the Raspberry Pi Pico. Let's not tell anyone that, such a stupid mistake. It does explain why data pins were not really working...

 

Once through with that, I started to play around with PSRAM. I have wired it to the SPI bus, chained to the micro-SD card slot. I have also connected it in a way that QPI mode (quad bit, i.e. four bits wide) can be used. I have been curious to see how this works in practice, so instead of using hardware SPI I decided to write a simple bit-banged SPI implementation and then extended it to also work in QPI mode.

 

As background, PSRAM stands for pseudo-static RAM. It is a dynamic RAM chip with built-in self refresh, so it looks like a static RAM. The benefits are low cost and high density. It comes with some restrictions: you cannot leave chip select active for extended periods of time, as that prevents the self refresh from working. PSRAMs come in many different packages, but this 8 pin package is very attractive for these hobby projects. The package is small and, in my opinion, easy to solder.

 

I wrote routines with bit-banged SPI to read the chip ID, read a data block, write a data block and change mode to QPI. I also wrote routines to read in QPI mode and to switch back from QPI mode to SPI. I got all of the aforementioned working. QPI mode works similarly to SPI mode: you issue a command such as read (8 bit command code), followed by a 24 bit address. These are sent 4 bits at a time. Once done, the direction of the 4-bit wide data bus is changed, and four more clocks are issued to give the PSRAM time to read its memory cells. After this, bytes are read 4 bits at a time, so two clocks per byte. The maximum clock rate is 66 MHz (or, using the fast read command, even 133 MHz). I am still very far away from that: after sending the command and address I am currently able to read a byte in about 224 nanoseconds. So the current data rate is just over four megabytes per second. This is much faster than the TI's cartridge bus, but the issue is the latency: in order to read a single byte, we pull chip select low, issue eight cycles delivering four bits at a time (8 bit command + 24 bit address), then another four dummy clocks for the memory to do its thing, and then start to receive data. For the first data byte this means another two clocks. So in total it's 14 clock cycles. If more bytes are needed we can just keep clocking and the chip provides data bytes from the following addresses (there are some constraints such as page boundaries, but otherwise it's a bit like reading GROM or the VDP).
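To make the clock counting above concrete, here is a minimal sketch in plain C that runs on a PC rather than on the Pico. The function names are hypothetical and are not taken from the picocart code; the command byte is left as a parameter because the exact opcode depends on the PSRAM datasheet.

```c
#include <stdint.h>

/* Dummy clocks between the address and the first data nibble, as described in the post. */
#define QPI_DUMMY_CLOCKS 4u

/* Split the 8-bit command and 24-bit address into the eight nibbles that
   go out on the 4-bit bus, most significant nibble first. */
void qpi_header_nibbles(uint8_t cmd, uint32_t addr, uint8_t out[8])
{
    uint32_t word = ((uint32_t)cmd << 24) | (addr & 0xFFFFFFu);
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)((word >> (28 - 4 * i)) & 0xFu);
    }
}

/* Clocks needed to read n_bytes within one chip-select window:
   8 for command + address, 4 dummy, then 2 per data byte. */
uint32_t qpi_read_clocks(uint32_t n_bytes)
{
    return 8u + QPI_DUMMY_CLOCKS + 2u * n_bytes;
}

/* Duration of that read in nanoseconds for a given clock period. */
uint32_t qpi_read_ns(uint32_t n_bytes, uint32_t clock_period_ns)
{
    return qpi_read_clocks(n_bytes) * clock_period_ns;
}
```

With the roughly 112 ns per clock implied by 224 ns per byte, `qpi_read_ns(1, 112)` gives 14 × 112 = 1568 ns for a standalone one-byte read, which shows why latency rather than throughput is the concern here.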

 

If I were able to run this at something like 66MHz this would go very fast, but currently, as I mentioned, I get a byte in about 220 nanoseconds, so the actual clock is something like 10MHz per nibble. I realise I did not choose optimal pins for this operation; it would have been better not to connect the PSRAM to the SPI lines at all if I am going to go with quad mode. Oh well. With the current pins I need to do a fair amount of masking and ORing of bits to build nibbles and bytes. Anyway, it's a start. I will continue to experiment with how fast I can make the software QPI go on the Pi Pico. I am still running the Pico at the stock clock of 133MHz. If I am not able to make this work, I don't plan to give up the idea of using the PSRAM chip, but would rather try overclocking to 250MHz, or use yet another processor or a small FPGA to help drive the PSRAM. It's a fun little exercise, hopefully going fast enough in the end.
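The cost of the pin choice can be illustrated with a small sketch. The pin numbers below are hypothetical (the post does not list the actual wiring), and the code works on a plain 32-bit value standing in for the GPIO input register, so it runs anywhere.

```c
#include <stdint.h>

/* Hypothetical scattered wiring: data bits 0..3 on GPIO 3, 4, 7 and 9. */
static const uint8_t scattered_pins[4] = { 3, 4, 7, 9 };

/* Scattered pins: one shift, mask and OR per data bit. */
uint8_t nibble_from_scattered(uint32_t gpio_in)
{
    uint8_t n = 0;
    for (int bit = 0; bit < 4; bit++) {
        n |= (uint8_t)(((gpio_in >> scattered_pins[bit]) & 1u) << bit);
    }
    return n;
}

/* Four consecutive pins starting at base_pin: a single shift and mask. */
uint8_t nibble_from_contiguous(uint32_t gpio_in, unsigned base_pin)
{
    return (uint8_t)((gpio_in >> base_pin) & 0xFu);
}

/* Two nibbles, high one first, make one byte (two clocks per byte in QPI). */
uint8_t byte_from_nibbles(uint8_t hi, uint8_t lo)
{
    return (uint8_t)(((hi & 0xFu) << 4) | (lo & 0xFu));
}
```

The contiguous version does the same job in two operations per nibble, which is the kind of saving that matters when every instruction counts in a bit-banged loop.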


Jumped into surgery and some code optimisation: I changed the wiring of the PSRAM chip to reduce the amount of bit mangling needed in the code. I first removed the chip, bent some legs and added Kapton tape to isolate the pins I was going to wire:

image.thumb.jpeg.18c51ba47badb332477a848e9e220609.jpeg

Then it was time to resolder the chip:

image.thumb.jpeg.57c678ad29bc44c2560a74678bb76d97.jpeg

And then, as the final step, I added wiring better suited to QPI mode:

image.thumb.jpeg.a040db24491845f0303b78702c33ad24.jpeg

Positive result: this worked just fine. Performance of the code improved by 60%, and is now close to 6 megs per second. I'm still not sure if this is enough to handle the reading of the first byte in real time. To test that, the PSRAM handling code needs to be interleaved with the code handling the cartridge bus. It will be a mess and almost certainly not fast enough: currently, reading a single byte as a standalone operation takes almost one microsecond. That's due to the address setup needed, but the function call overhead in my benchmark will also have an impact. I think it should be faster when the code is embedded into the real thing.


I'm not sure what is doing this (I think it's the Pico SDK), but because I overclock to 250MHz, clk_peri was being set to 48MHz (USB speed, I think), meaning my SPI was limited to 24MHz (half). Now that I've changed clk_peri back to 250MHz, I have tried running the PSRAM at 125MHz & 62.5MHz and they both fail. Maybe because I've got an SD card on the wires. The best I can get is a divide by 6 (41.6MHz). But that's actually nearly double the 24MHz I was being limited to before. My current paging is getting a lot better, but it's still 1.5 microseconds for a 1 byte access, and 1.9ms for a full 8KB page swap.
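The divider arithmetic behind those figures can be checked with a few lines of C. This is only a sketch of the numbers quoted in the post, with hypothetical function names; it does not touch the Pico SDK or the real clock configuration.

```c
#include <stdint.h>

/* SPI clock for a given peripheral clock and total divider. */
uint32_t spi_hz(uint32_t clk_peri_hz, uint32_t divider)
{
    return clk_peri_hz / divider;
}

/* Lower bound, in microseconds, for moving n_bytes over a 1-bit-wide bus:
   8 clocks per byte, ignoring command, address and software overhead.
   The result is rounded up. */
uint32_t transfer_floor_us(uint32_t n_bytes, uint32_t sck_hz)
{
    uint64_t bits = (uint64_t)n_bytes * 8u;
    return (uint32_t)((bits * 1000000u + sck_hz - 1u) / sck_hz);
}
```

A 48MHz clk_peri divided by 2 gives the 24MHz limit, and 250MHz divided by 6 gives about 41.67MHz. At that rate `transfer_floor_us(8192, spi_hz(250000000, 6))` comes to 1573 µs, so the measured 1.9ms for an 8KB page swap is within about 20% of the raw bus time.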

 

I reckon you're on a winner, the numbers you are getting should work.


I followed the guy's advice from here (a commentator, and if I knew more about PIO on the Pico, I'd implement this whole project - but it's all a mystery to me)...

 

https://github.com/polpo/rp2040-psram/issues/5

 

The trick is to start writing the address in one mode, but switch to a different mode when reading back a byte (the exact timing is documented in the PSRAM .pdf)

 

I'm now at 62.5MHz reliably, like he said:

 

image.thumb.png.65e8b17cf393fb20840a12cebecb1511.png

 

I reckon, if I could just get this Pico PIO project working in mine, that time would halve and would be a fighting chance.  I've also picked the wrong pins though, this PIO project requires some pins to be next to each other.  The owner also teases about a QPI version, but says he doesn't need it, so won't do it.

 

*Edit: Oh, I took out the start-before-A15 goes low code, because it was complicating things and really not at all needed.

Edited by JasonACT

6 hours ago, JasonACT said:

I followed the guy's advice from here (a commentator, and if I knew more about PIO on the Pico, I'd implement this whole project - but it's all a mystery to me)...

 

https://github.com/polpo/rp2040-psram/issues/5

Thanks for the link and comments! I think I will continue my software PSRAM adventure a bit more. I will pretty likely be out of town for a few days, so advances in the project have to wait.

But I did make an updated PCB design already. I am tempted to send it to manufacturing, so that I could perhaps get it as early as next week. I followed your "tip" and changed the 74LVC1G125 driver to a MOSFET to simplify. I was planning to use the BS170 MOSFET, which I used years ago, but then I realised it is almost the same as the one you used, the 2N7000, so I went with that instead :)


2N7000 is one of my favourites.

 

I spent some time studying the Pico PIO library, and I think I now get it...  So I rewrote his SPI lib to be much simpler: no DMA, but still optimised by knowing the PIO FIFO size and where I need to pause. I control the *CS pin myself, so there's no problem having that on any GPIO now. His GPIO drive & slew rate settings seem to be quite important, and luckily they don't affect the SD card access.  I'm now at 125MHz for single byte reads, though 8KB accesses still go through the 62MHz version when loading a cart.

 

image.thumb.png.0b7858000655a5b8828e7418f98de9ff.png

This is Parsec running, with me holding READY to slightly extend the read accesses.  I'm at 500ns now, but latency makes it fail without READY being controlled.  I did get to play a whole game though, with only 1 glitch happening.  That's A14 on the bottom too.  You can see it is sometimes very late to start the read, with an extra extended A15 high area.

 

Since I now understand the PIO stuff, I can tell you, you would like his QPI PIO code for the state-machine, it's nearly ready for use...  The only issue I see is, you would need to specify 1 less bit in the number of bits to read (he couldn't get an extra instruction in there to decrement the y counter, like he does the x counter).  It only would take a tiny bit of C code to get that all working, and you would be looking at some super fast PSRAM accesses.  As long as the 4 GPIOs are in order/sequential.

 

I just don't have the 2 extra GPIOs needed on my board.  I've got the peripheral interrupt GPIO, but I can't find another to complete this.


3.5 days later...  I've tried so many things to get this working, all the waveforms look good, but it was still crashing without me adding wait states with the READY signal.  Last night I gave up, content that I had tried everything, I would need 1 extra wait state on the first (low) byte access for cart ROM areas above 16KB.  I slept well, knowing that original carts would work exactly how they should, only new larger programs would be affected and only in a minor way.

 

New day today, I found myself cleaning up the Pico code, again finding a cycle here and there..  While there were no crashes, I could see it wasn't quite doing the right thing sometimes (I must have been so close).  FCMD's DIR command (which was crashing in previous days) would end the directory read early 1 in 40 times.  I looked at the logic analyser and I could see after a CRUCLK to turn off my DSRROM the timing was out a bit.  That part of the Pico code does practically nothing though, there was no room for improvement.  Or was there?  I was reading about GCC's computed gotos (&&label) and I replaced the switch with a goto into a dereferenced table of labels - I also put that table into scratch_x (CPU1's special memory)...  The damn thing works now.
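For readers who have not met GCC's "labels as values" extension, here is a minimal sketch of a dispatch through a table of label addresses. The event names and return values are hypothetical and have nothing to do with the actual bus handler; the code compiles with GCC or Clang on a PC. On the Pico, the table would additionally need a section attribute (the SDK's `__scratch_x(...)` macro is one candidate, stated here as an assumption) so that it lives in RAM rather than flash.

```c
/* Hypothetical event codes, for illustration only. */
enum { EV_READ = 0, EV_WRITE = 1, EV_CRU = 2, EV_COUNT };

int dispatch(unsigned ev)
{
    /* GNU C extension: &&label yields the address of a label, so the
       jump table is an ordinary array whose placement we control.
       With a switch, the compiler decides where its table goes. */
    static void *const table[EV_COUNT] = { &&on_read, &&on_write, &&on_cru };

    if (ev >= EV_COUNT) {
        return -1;              /* out-of-range guard, as a switch default would be */
    }
    goto *table[ev];            /* indirect jump through the table */

on_read:
    return 10;
on_write:
    return 20;
on_cru:
    return 30;
}
```

The point of the construct is less the jump itself than the fact that the table becomes a named object, so it can be moved out of slow memory.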

 

Signals:

0=PSRAM_EN

1=DATA-DIR

2=READ-LOW-ADX

3=A15

4=CRUOUT

 

Before: (my PSRAM getting the first byte ends only a tiny bit before A15 goes low - so most of the time it works, but not always)

image.thumb.png.1b57dcfa358875219e0750d4c8284c93.png

After: (I've somehow gained a huge advantage in my read start time, it looks the same as all the other good ones now)

image.thumb.png.116fde09aaa81ce7e369a48b33b8bebd.png

I decided to undo my changes and look at what GCC was doing with the switch: (Ghidra decompile)

image.thumb.png.2e188dc8a9f85366cdc726c0f417c32f.png

You can see, although I copy my function into scratch_x RAM, GCC had decided that its jump-table for the switch statement was ok to leave in (ever so slow) flash memory.  (0x2004.... [scratch_x] has a read reference into 0x1002.... [flash]  Oh, no!)

 

One other "massive" thing I had done in the days up to now, was to change the PSRAM read function, break it into two, you can see I've started it early (with the values I know at the time, read-command and two of three address bytes) and about 1/3 the way through, I get the Pico to sample the TI bus for the low-address byte, just in time for me to pass it to the Pico PIO state-machine implementing SPI to complete the PSRAM read.  Without that, it wouldn't have worked.
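The two-phase read can be sketched as follows. This is an illustration under stated assumptions, not the actual code: `fake_fifo_t` is a stand-in for the PIO TX FIFO so that the example runs on a PC, the function names are hypothetical, and the read opcode is passed in as a parameter.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for the PIO TX FIFO so the sketch runs without hardware. */
typedef struct {
    uint8_t bytes[8];
    size_t  count;
} fake_fifo_t;

static void fifo_push(fake_fifo_t *f, uint8_t b)
{
    if (f->count < sizeof f->bytes) {
        f->bytes[f->count++] = b;
    }
}

/* Phase 1: start the transfer with everything known early,
   i.e. the read command and the two upper address bytes. */
void psram_read_begin(fake_fifo_t *f, uint8_t read_cmd,
                      uint8_t addr_hi, uint8_t addr_mid)
{
    f->count = 0;
    fifo_push(f, read_cmd);
    fifo_push(f, addr_hi);
    fifo_push(f, addr_mid);
}

/* Phase 2: once the low address byte has been sampled from the bus,
   append it so the read can complete. */
void psram_read_finish(fake_fifo_t *f, uint8_t addr_lo)
{
    fifo_push(f, addr_lo);
}
```

On real hardware the first three bytes are already being shifted out while the CPU waits for the low address byte, which is where the time is gained.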


So, much testing has taken place tonight (ok, lots of games played) and...  Nothing to report, it's working without the crashes I've had in the past week.

 

Last thing tonight though, I've extended the DSR-ROM.  It now flips a CRU bit to turn everything off in my device other than the DSR (and it seems I've got cycles to spare for "if" tests now).  It then checks whether >2000 low-memory is writable without me being enabled, and sets a status bit saying "my" 32KB expansion RAM can be turned on when it isn't writable; if it is writable, I put back the old value.  Then it tests if GROM @>6000 has an "AA" value, so I can now plug in my only cartridge, the EA module.  The interesting thing was, I needed to save the current GROM address before the test, then set it back [-1] once I'm done inside the reset vector.  Finally, I see if the first 4 words of cartridge ROM are 0, and enable cart-roms in my device if they are.  I can't test this further, other than to say my EA module shows up with whatever ROM-only binary I've loaded now, so I think it's good: there's no ROM in that cart, my machine shows 0's when no real ROM is present, and I don't see any cart images where the first 8 bytes are all zero.

 

I think I've managed to get this thing "safe" now, considering it doesn't block the cartridge slot being used...  Or another 32KB memory expansion being plugged in...  Not really sure what to do if another speech synthesizer is there though?  Sadly, I'll never own another one of those now, there were a lot here in Australia I'm sure, but all are buried now.


On 7/9/2023 at 7:17 AM, JasonACT said:

3.5 days later...  I've tried so many things to get this working, all the waveforms look good, but it was still crashing without me adding wait states with the READY signal.  Last night I gave up, content that I had tried everything, I would need 1 extra wait state on the first (low) byte access for cart ROM areas above 16KB.  I slept well, knowing that original carts would work exactly how they should, only new larger programs would be affected and only in a minor way.

 

New day today, I found myself cleaning up the Pico code, again finding a cycle here and there..  While there were no crashes, I could see it wasn't quite doing the right thing sometimes (I must have been so close).  FCMD's DIR command (which was crashing in previous days) would end the directory read early 1 in 40 times.  I looked at the logic analyser and I could see after a CRUCLK to turn off my DSRROM the timing was out a bit.  That part of the Pico code does practically nothing though, there was no room for improvement.  Or was there?  I was reading about GCC's computed gotos (&&label) and I replaced the switch with a goto into a dereferenced table of labels - I also put that table into scratch_x (CPU1's special memory)...  The damn thing works now.

Congratulations! Must have been a great feeling to get this far after all the hard work. I have been out of network reach for a while. I ordered new PCBs before going out of town for a few days, and those should be arriving tomorrow. I'm looking forward to seeing how those turned out. I know I ordered them in haste, without testing everything with the still brand new previous boards, but since I am on vacation I wanted to maximise the chance of having some spare time to work on this project.


Anyway, great to see that you have gotten this far!


Thanks Speccery!  It was a good feeling, especially seeing the PSRAM outpacing A15 by quite the margin in my implementation.  I've been looking for 8MB PSRAMs too, but it looks like I'll have to pay double the usual (low) price and pick up an unknown brand, or import from China - they are just not available here at the moment (out of stock everywhere).  I'm currently using an ESP-PSRAM16H that I pulled with my hot air station, but the '64H is the one to have for sure.

 

Anyway, 2MB will do until I can get stock - it's rare to see a cart image that's greater than 1MB anyway...  So I've split my implementation in two: 1MB to match the FinalGROM cart & 1MB for SAMS:

 

image.thumb.jpeg.511c83d030ed58c9a884762c95600d0f.jpeg

 

Happy days!  Now to clean up the code.


12 hours ago, JasonACT said:

Are there any threads here (or ideas from others) on what else you can use SAMS memory for?

 

image.thumb.jpeg.2fd7341b01524fde0dcd5cc875bcda7a.jpeg

 

I can see I've got the full command history in FCMD now, so that's nice.

Well, you can give my programming editor a spin. Besides SAMS it does require an F18A (and a FinalGROM cart). It's a 64K bank-switched cart image.
 

 


16 hours ago, JasonACT said:

Are there any threads here (or ideas from others) on what else you can use SAMS memory for?

 

image.thumb.jpeg.2fd7341b01524fde0dcd5cc875bcda7a.jpeg

 

I can see I've got the full command history in FCMD now, so that's nice.

 

I also have a text editor. There is a 40 column version for standard VDP chip systems. It uses SAMS to hold multiple files.

 

I would be interested in hearing about projects people wish they had for SAMS cards.

I am always interested in seeing what I can do with my little system. 


19 hours ago, JasonACT said:

Are there any threads here (or ideas from others) on what else you can use SAMS memory for?

 

image.thumb.jpeg.2fd7341b01524fde0dcd5cc875bcda7a.jpeg

 

I can see I've got the full command history in FCMD now, so that's nice.

My Gemini browser uses SAMS. 

 

ForceCommand applications can use SAMS, or be stacked in SAMS, if you write to the ForceCommand API. An example is FCMENU.


On 7/4/2023 at 5:12 AM, speccery said:

I realised I had forgotten to solder a good part of the pins of the Raspberry Pi Pico. Let's not tell anyone that, such a stupid mistake.

As long as we're not telling people about silly things, I will say, I've spent nearly 2 days trying to work out what I did wrong while cleaning up my code...

 

I've got a special timing loop on *WE going high (where I use a certain number of NOPs to get me past the current memory access):

 

label1:
    do {} while (sample_pin (WE) == 0); // spin until *WE goes high
label2:
    pause84 (); // 84ns

 

Which works well for all my old code

 

But the new code I was writing was doing a "goto label2;" in the PSRAM routines which I was also first testing for *WE to go high...  Makes sense right?  However, the goto statement was throwing the timing out enough to cause random issues.  Why did I even have a label2 in the first place?  It was the original 32KB code (which had no timing issues, being Pico RAM) which was just poorly coded, and which I copied for use with the PSRAM!

 

It's all fixed now though, but, sssshhhhh.


On 7/11/2023 at 9:28 AM, JasonACT said:

Thanks Speccery!  It was a good feeling, especially seeing the PSRAM outpacing A15 by quite the margin in my implementation.  I've been looking for 8MB PSRAMs too, but it looks like I'll have to pay double the usual (low) price and pick up an unknown brand, or import from China - they are just not available here at the moment (out of stock everywhere).  I'm currently using an ESP-PSRAM16H that I pulled with my hot air station, but the '64H is the one to have for sure.

 

That is great progress! I haven't had enough time to advance my own project, even though I have made some progress. Hopefully I will get back on track in the coming days.

 

For PSRAM chips, I have been using these chips, which are readily available from Mouser. I have purchased most of the components for my projects from them.


 

Edited by speccery

These came the other day, but I don't yet feel the need to desolder my 2MB one - still working through stuff that fits in that size at the moment.  I did pay AU$8.50 each and another AU$14 for postage (AU$30 all up, instead of the usual AU$3.50 each and $5 postage - so it was a rip-off)...  But they are listed as working in that guy's github @ 140MHz (I didn't know what I was going to get):

 

image.thumb.jpeg.120683782d9679eb925f489ab210816a.jpeg

 

In other developments, I've got TI device "RS232" & "RS232/1" going to the Pico Serial 2 - blocking reads + keyboard break (ALT4) and non-blocking reads using the interrupt & scratch pointers into VDP RAM, TE-II uses the int.  But TE-II encodes chars so isn't terribly useful these days :(  Having it all working well though, I've also implemented "RS232/2" using a server socket on the Pico (Port 2322 :) ):

 

image.thumb.gif.c8d5696c182cfc94c7e41dc9e523496c.gif

I have to type on the TI keyboard, or my USB one on the Pico, of course.

Edited by JasonACT

Sooo, I spent another whole day (and it was a Saturday, so I mean a whole day, not a partial [week] day) working on my latest blunder...  I only found out today what that was though, and it will remain private.

 

Late Saturday night, I had a good work-around, so Sunday meant more progress in testing and cleaning up code.

 

Today (late Monday here) I disabled my work-around to test other things and finally worked out what I'd done wrong... Timing is everything, is all I'll say, but having my work-around will make the whole thing far more solid.

 

So, back to SAMS.  There's actually not a lot that uses it, but I can get most things working...  I can't yet get AMSTIOPOLY working though.  It is not a .dsk file that I can extract from, and they are not 0x07TIFILES either.  Classic99 loads it though, so my cunning plan was to use an empty 360KB .dsk and use FCMD to copy the files within Classic99 to a .dsk image.  It worked for most files - but I got '07'? errors on Tin_h/i/j/k/m (not _l or the earlier ones).  Most were saved, and then converted to TIFILES using TIImageTool.  I patched the broken files in a hex-editor by hand (certainly enough for my loader to know how to read them) but it still doesn't load (and my implementation is not case sensitive, on the Pico, being FAT32 - so no help there).  It may be a padding issue (making the files slightly larger than they should be).  The AMS loader seemed happy enough, but pressing a key to start the program crashes.

 

Unsure where to post a bug report, since, who knows where the issue may be?  Is there a proper .dsk download of AMSTIOPOLY somewhere for me to try?


2 hours ago, JasonACT said:

Sooo, I spent another whole day (and it was a Saturday, so I mean a whole day, not a partial [week] day) working on my latest blunder...  I only found out today what that was though, and it will remain private.

 

Late Saturday night, I had a good work-around, so Sunday meant more progress in testing and cleaning up code.

 

Today (late Monday here) I disabled my work-around to test other things and finally worked out what I'd done wrong... Timing is everything, is all I'll say, but having my work-around will make the whole thing far more solid.

 

So, back to SAMS.  There's actually not a lot that uses it, but I can get most things working...  I can't yet get AMSTIOPOLY working though.  It is not a .dsk file that I can extract from, and they are not 0x07TIFILES either.  Classic99 loads it though, so my cunning plan was to use an empty 360KB .dsk and use FCMD to copy the files within Classic99 to a .dsk image.  It worked for most files - but I got '07'? errors on Tin_h/i/j/k/m (not _l or the earlier ones).  Most were saved, and then converted to TIFILES using TIImageTool.  I patched the broken files in a hex-editor by hand (certainly enough for my loader to know how to read them) but it still doesn't load (and my implementation is not case sensitive, on the Pico, being FAT32 - so no help there).  It may be a padding issue (making the files slightly larger than they should be).  The AMS loader seemed happy enough, but pressing a key to start the program crashes.

 

Unsure where to post a bug report, since, who knows where the issue may be?  Is there a proper .dsk download of AMSTIOPOLY somewhere for me to try?

If you do go the SAMS route, will it be limited to 1MB or are you targeting 4MB as well? FWIW Classic99 supports up to 32MB.
EDIT: Oh, and just to add: I might be the exception, but I do run into the SAMS 1MB limit from time to time with my editor.

Edited by retroclouds

Short update...  I tested my hand edited "TIFILES" version of AMSTIOPOLY in Classic99 and they work..  So no need for a proper .dsk version.

 

Re-thinking the SAMS Pico C code though, I may be able to get better timing if I limit it to 1MB (one byte hardware register on a '612 chip).  I can't find anything I'd use that needs more than 1MB anyway.  So, with an 8MB chip, 7MB for Carts & 1MB for SAMS?
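The proposed 7MB + 1MB split can be sketched as an address translation. SAMS maps the 64KB address space in 4KB pages, so a one-byte page register per block reaches 256 × 4KB = 1MB. The base offset and the function name below are assumptions for illustration, not the actual layout.

```c
#include <stdint.h>

#define SAMS_PAGE_SHIFT 12u        /* 4KB pages */
#define SAMS_PSRAM_BASE 0x700000u  /* hypothetical: last 1MB of an 8MB chip */

/* regs[] holds one 8-bit page number for each 4KB block of the
   64KB CPU address space (16 blocks in total). */
uint32_t sams_to_psram(const uint8_t regs[16], uint16_t cpu_addr)
{
    uint32_t page   = regs[cpu_addr >> SAMS_PAGE_SHIFT];
    uint32_t offset = cpu_addr & 0x0FFFu;
    return SAMS_PSRAM_BASE + (page << SAMS_PAGE_SHIFT) + offset;
}
```

Because the page number fits in a byte, the translation is a table lookup, a shift and an add, and the highest reachable address (page 0xFF, offset 0xFFF) is 0x7FFFFF, the top of an 8MB part.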


1 hour ago, JasonACT said:

Short update...  I tested my hand edited "TIFILES" version of AMSTIOPOLY in Classic99 and they work..  So no need for a proper .dsk version.

 

Re-thinking the SAMS Pico C code though, I may be able to get better timing if I limit it to 1MB (one byte hardware register on a '612 chip).  I can't find anything I'd use that needs more than 1MB anyway.  So, with an 8MB chip, 7MB for Carts & 1MB for SAMS?

Nothing for SAMS uses more than 1MB, so that should be sufficient.

 

