Speed-Programming Karri's Flashcard on the Lynx

MichelS · January 19, 2022

I had some spare time recently to test an idea about programming Karri's 512K flashcard directly from the Lynx.
It's been known for some time that it can be done, but also that it's so damn slooooow...

With a (really) simple adapter cable, i.e. no electronics involved, programming time can be brought down significantly.

Spoiler

I ended up with 4min 30s for writing a 256K rom to flash

To build a prototype cable like mine (see attached image), you need a spare cartridge connector from a broken Lynx and a donor cartridge that gives up part of its PCB to solder some wires to.

A warning first: I initially wrote most of this for my own notes, so it's a really long post. Maybe i should start a blog instead...
And to tell the truth - the PiHat programmer and the upcoming BennVenn JoeyLynx cart flasher are faster programmers than the Lynx ever will be, but hey - where's the fun?

So, why is programming the flashcard from the Lynx so slow?
The reason is: the flash chip used and the Lynx cartridge interface do not fit together well when it comes to writing data to flash.
For writing a single byte, a "byte program" command must be sent to flash first... for every byte... again and again.

The byte program sequence is this:

1) write 0xAA to address 0x5555
2) write 0x55 to address 0x2AAA
3) write 0xA0 to address 0x5555
4) write data byte to its destination address
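
A minimal sketch of the sequence above, just listing the four bus writes per byte. The helper name `byte_program_writes` is made up for illustration; nothing here talks to real hardware.

```python
# Hypothetical helper: lists the bus writes for one "byte program" command.
# The three unlock writes are the ones given in the steps above.
UNLOCK_WRITES = [(0x5555, 0xAA), (0x2AAA, 0x55), (0x5555, 0xA0)]

def byte_program_writes(dest_addr, value):
    """Return the four (address, data) writes needed to program one byte."""
    return UNLOCK_WRITES + [(dest_addr, value)]

writes = byte_program_writes(0x01234, 0x42)
for addr, data in writes:
    print(f"write 0x{data:02X} to address 0x{addr:05X}")
```

Three of the four writes go to fixed addresses that have nothing to do with the destination, which is what makes the command expensive on a sequential interface.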

Since the Lynx cartridge interface is designed for sequential access, this random access sequence hurts writing performance a lot.

Even with optimisations like
- skipping 0xFF bytes (no need to write 0xFF into an empty flash)
- switching the display off (this not only saves ticks otherwise used for video DMA, but the "rest" bit in IODAT can be set to 0 when shifting bits into the address shift register, enabling very compact code)
- advancing the address counter by 4 with a single instruction wherever possible,
- doing the least possible number of shifts (see below),
- using 1Mbaud serial transfer rate to get the data into the Lynx,
- unrolling loops
- etc.
writing my 256K testrom image "Batman returns" to flashcard wouldn't get much faster than 18min 24s (=1104s).

The key idea for speeding up the writing process, and the reason for the cable, is to swap address lines so as to minimize the usage of the address counter.

To understand this, one needs to look at the default wiring of the address lines first:
Address bits 0-10 are connected to the counter, bits 11-18 to the shifter.

Step 1 of the sequence is writing to address 0x5555, which is

    shifter               counter 
1 1 1 1 1 1 1 1   1
8 7 6 5 4 3 2 1   0 9 8 7 6 5 4 3 2 1 0
---------------------------------------
x x x x 1 0 1 0   1 0 1 0 1 0 1 0 1 0 1
        |_____|   |___________________|
          =10             =1365
        |_____________________________|
                   =0x5555

When shifting bits into the shifter, previous content of the shifter is moved to the high order bits.
Fortunately, the flash chip ignores address bits 15-18 (marked 'x') for the byte program sequence, so the content of the four high order bits doesn't matter.
Thus, for step 1 of the byte program sequence it's enough to
- shift '1010' into the shifter (4 shifts)
- count to 1365
- write 0xAA
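
As a sketch of the default wiring (counter on address bits 0-10, shifter on bits 11-18), this splits an address into its shifter and counter parts. The function name `split_address` is an assumption for illustration.

```python
COUNTER_BITS = 11  # address bits 0-10 come from the counter (default wiring)

def split_address(addr):
    """Split a cart address into (shifter value, counter value)."""
    shifter = (addr >> COUNTER_BITS) & 0xFF
    counter = addr & ((1 << COUNTER_BITS) - 1)
    return shifter, counter

for addr in (0x5555, 0x2AAA):
    shifter, counter = split_address(addr)
    # Only the low 4 shifter bits (address bits 11-14) matter for the command.
    print(f"0x{addr:04X}: shifter low bits = {shifter & 0x0F:04b}, counter = {counter}")
```

For 0x5555 this gives shifter bits 1010 and a count of 1365; for 0x2AAA it gives 0101 and 682.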

Next, address 0x2AAA is

    shifter               counter 
1 1 1 1 1 1 1 1   1
8 7 6 5 4 3 2 1   0 9 8 7 6 5 4 3 2 1 0
---------------------------------------
x x x x 0 1 0 1   0 1 0 1 0 1 0 1 0 1 0 
        |_____|   |___________________|
          =05             =682
        |_____________________________|
                   =0x2AAA

It's an inverted bit pattern from the previous address 0x5555. With the contents of the shifter from the previous step, it's enough to
- shift '1' into the shifter (the '1' previously at bit position 14 shifts to bit position 15 and is ignored)
- count to 682
- write 0x55

And finally: writing to 0x5555 again is similar (inverted again)
- shift '0' into the shifter
- count to 1365
- write 0xA0

At last, the shifter and counter are set to the destination address and the byte is written to flash.
For this, another 8 shifts are needed to set the page index, and an average of ~512 counts to set the page offset of the destination address.
(For a 256K rom, the pagesize is 1024 bytes, i.e. the counter for the destination address can have any value between 0 and 1023 => ~512 on average)

Doing the math, it's clear why programming the flash chip from the Lynx is slow:
For writing a single byte of a 256K rom, 4+1+1+8 = 14 shifts and 1365+682+1365+512 = 3924 counts are required on average.
And this needs to be repeated 262144 times (minus the number of 0xFF bytes)...
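
The same arithmetic as a small sketch, assuming the default wiring and the average destination offset of half a page. `default_cost` is a hypothetical helper name.

```python
def default_cost(page_size):
    """Average (shifts, counts) per byte with the default address wiring."""
    shifts = 4 + 1 + 1 + 8                       # three unlock writes + page index
    counts = 1365 + 682 + 1365 + page_size // 2  # unlock counts + average page offset
    return shifts, counts

shifts, counts = default_cost(1024)              # 256K rom -> 1024-byte pages
total_counts = 262144 * counts                   # worst case: no 0xFF bytes skipped
print(shifts, counts, total_counts)
```

With no 0xFF bytes to skip, that is roughly a billion counter strobes for a 256K rom.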

Clearly, it's the counter that hits performance quite hard.
The shifter behaves well since 0x5555 and 0x2AAA have this inverted bit pattern.
Once the initial '1010' pattern is loaded into the shifter, a single bitshift is sufficient to toggle the entire (relevant) content for the next address of the sequence.

So the idea to speed up the writing process is to swap address lines such that, for the byte program sequence:
1) counter usage is minimized
2) utilization of the shifter is maximized

One possible pattern for address swapping is this:

       flash chip address bit
1 1 1 1 1 1 1 1   1 
8 7 6 5 4 3 2 1   0 9 8 7 6 5 4 3 2 1 0
| | | | | | | |   | | | | | | | | | | |
1 9 8 7 3 6 2 5   1 4 0 1 1 1 1 1 1 1 1
0                       8 7 6 5 4 3 2 1
 lynx cartridge connector address bit

The lowest 8 address bits (#0 to #7) for the flash chip are taken from the shifter: 11-18.
Four (remaining) address bits, which need to be '1' when setting address 0x5555 (#8, #10, #12, #14) are taken from the lowest counter bits 0-3.
Three interleaved bits, which are '1' when setting address 0x2AAA (#9, #11, #13) are mapped to the next free counter bits 4-6.
The remaining 4 bits, not used in the byte program command, are mapped in sequential order to counter bits 7-10.
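
The swap table can be written down as a lookup list. This is only a sketch of the mapping described above; the names `FLASH_FROM_LYNX` and `lynx_to_flash` are made up for illustration.

```python
# FLASH_FROM_LYNX[i] = Lynx connector address bit wired to flash address bit i
FLASH_FROM_LYNX = [
    11, 12, 13, 14, 15, 16, 17, 18,   # flash bits 0-7  <- shifter bits 11-18
    0, 4, 1, 5, 2, 6, 3,              # flash bits 8-14 <- counter bits 0-6, interleaved
    7, 8, 9, 10,                      # flash bits 15-18 <- counter bits 7-10
]

def lynx_to_flash(lynx_addr):
    """Translate the address set on the Lynx side into the address the flash sees."""
    flash_addr = 0
    for flash_bit, lynx_bit in enumerate(FLASH_FROM_LYNX):
        if lynx_addr & (1 << lynx_bit):
            flash_addr |= 1 << flash_bit
    return flash_addr

# Shifter = 01010101, counter = 15 should land on flash address 0x5555.
print(hex(lynx_to_flash((0b01010101 << 11) | 15)))
# Shifter = 10101010, counter = 112 should land on flash address 0x2AAA.
print(hex(lynx_to_flash((0b10101010 << 11) | 112)))
```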

With this wiring scheme, address 0x5555 at the flash chip is set through:

   shifter               counter 
1 1 1 1 1 1 1 1   1
8 7 6 5 4 3 2 1   0 9 8 7 6 5 4 3 2 1 0
--------------------------------------- 
0 1 0 1 0 1 0 1   0 0 0 0 0 0 0 1 1 1 1

Address 0x2AAA at the flash chip is set by:

   shifter               counter 
1 1 1 1 1 1 1 1   1
8 7 6 5 4 3 2 1   0 9 8 7 6 5 4 3 2 1 0
---------------------------------------
1 0 1 0 1 0 1 0   0 0 0 0 1 1 1 0 0 0 0

And the byte program sequence thus becomes:

- shift '01010101' into the shifter (8 shifts)
- count to '1111' = 15 (flash address is 0x5555 now)
- write 0xAA

- shift in '0'
- count to '1110000' = 112 (flash address is 0x2AAA now)
- write 0x55

- shift in '1'
- count to '1111' = 15 (flash address is 0x5555 now)
- write 0xA0

- set destination address and write byte

In total, this is 10 shifts and 142 counts for the byte program command.
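
A small simulation of the shifter for this sequence, as a sketch. It assumes an 8-bit shifter where each new bit enters at the low end and the oldest bit drops out at the top; `shift_in` is a hypothetical helper.

```python
def shift_in(shifter, bit):
    """Clock one bit into the 8-bit shifter; older bits move up, the top bit drops out."""
    return ((shifter << 1) | bit) & 0xFF

shifter = 0
shifts = 0
steps = []  # (shifter value, counter target, data byte) for each unlock write

for bits, count, data in (("01010101", 15, 0xAA), ("0", 112, 0x55), ("1", 15, 0xA0)):
    for b in bits:
        shifter = shift_in(shifter, int(b))
        shifts += 1
    steps.append((shifter, count, data))

total_counts = sum(count for _, count, _ in steps)
print([f"{s:08b}" for s, _, _ in steps], shifts, total_counts)
```

After the initial eight shifts, one more shifted bit flips the whole shifter between 01010101 and 10101010, so the command costs 10 shifts and 142 counts.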

There's a downside also.
Swapping address lines means: if the Lynx sets address X at the cartridge connector, the address at the flash chip is some Y.
Unfortunately, this also applies to the destination address for the data to be written.
Thus, if shifter and counter are set to address X and romdata[X] is written, it ends up in the wrong place Y.
To account for this, either the destination address must be adjusted in software on the Lynx to revert the hardware address line swapping, or data needs to be rearranged.
Rom rearrangement is the faster option: in the rearranged rom, romdata[Y] is simply located at address X (instead of Y).
Therefore, when setting a destination address X via shifter and counter, romdata[Y] is written.
The swapped address lines make romdata[Y] end up at its correct address Y.

Since the Lynx cannot hold the entire rom content in ram, data is sent to the Lynx in packets to be written to flash.
A program running on PC is needed to provide "data on demand" via serial connection.
So, rom rearrangement can be done on the PC and on the fly.

But there's one catch: The rearranged rom always covers the full 512K address space. This is due to the way the address lines are swapped.
If the original rom is smaller, it must be expanded to 512K first (using 0xFF for unused space) before being reordered.

Fortunately, expanding and reordering is just a few lines of code and takes only seconds on PC.
Transferring the full 512K to the Lynx also takes less than 10s at 1Mbaud.
And finally, the inserted bytes during expansion will be 0xFF and thus not be written but skipped.
All this doesn't hurt much.
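
Roughly, the expand-and-reorder step on the PC could look like the sketch below. It is not the actual tool; the names `FLASH_FROM_LYNX`, `lynx_to_flash` and `rearrange` are assumptions, and the mapping list is simply the swap table from above.

```python
FLASH_SIZE = 512 * 1024
# FLASH_FROM_LYNX[i] = Lynx connector address bit wired to flash address bit i
FLASH_FROM_LYNX = [11, 12, 13, 14, 15, 16, 17, 18, 0, 4, 1, 5, 2, 6, 3, 7, 8, 9, 10]

def lynx_to_flash(lynx_addr):
    """Address seen by the flash chip when the Lynx sets lynx_addr."""
    flash_addr = 0
    for flash_bit, lynx_bit in enumerate(FLASH_FROM_LYNX):
        if lynx_addr & (1 << lynx_bit):
            flash_addr |= 1 << flash_bit
    return flash_addr

# The swap is a pure bit permutation, so counter and shifter parts can be
# translated separately and OR-ed together (small lookup tables, fast loop).
COUNTER_PART = [lynx_to_flash(c) for c in range(2048)]
SHIFTER_PART = [lynx_to_flash(s << 11) for s in range(256)]

def rearrange(rom):
    """Pad rom to 512K with 0xFF and place romdata[Y] at the Lynx-side address X."""
    if len(rom) > FLASH_SIZE:
        raise ValueError("rom larger than 512K")
    padded = rom + b"\xff" * (FLASH_SIZE - len(rom))
    out = bytearray(FLASH_SIZE)
    x = 0
    for hi in SHIFTER_PART:          # x runs over (shifter << 11) | counter in order
        for lo in COUNTER_PART:
            out[x] = padded[hi | lo]
            x += 1
    return bytes(out)

# Tiny self-check with a dummy 256K image: one marker byte at flash address 0x5555.
rom = bytearray(256 * 1024)
rom[0x5555] = 0xA5
image = rearrange(bytes(rom))
print(hex(image[(0x55 << 11) | 15]))
```

The marker byte ends up at Lynx-side address shifter=0x55, counter=15, which the swapped wiring turns back into flash address 0x5555.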

What does hurt is that the original data is spread over the full 512K address range.
This means the pagesize for writing to flash is always 2048 and the average number of counts for setting the destination address is always 1024, no matter what size the original rom had.

So, the total balance for writing a single byte becomes:
shifts: 8+1+1+8 = 18 (against 14 without address swapping)
counts: 15+112+15+1024 = 1166 (against 3924 without address swapping)

From these numbers, the programming time can be expected to decrease significantly.
And indeed - the measured time for the 256K testrom "Batman returns" goes down from 18min 24s (=1104s) to 6min 44s (=404s).
Usable already... at least when considering the time it would take to dig out the Pi and PiHat from the basement.

Now the bottleneck is the count required to set the destination address, which takes 1024 of the 1166 counts on average.
Thinking about it now, it should be easy to bring this down to 512 (or even 256?)...

... where's my soldering iron?

Ok - that was simple.
Now, flash address pin 18 is wired to AUDIN. Counter bit #10, to which it was previously connected, is unused now.
This means the counter goes up to 1023 max. when setting the destination address, i.e. 512 on average.
For page offsets greater than 1023, audin acts as "pseudo counter bit 10" and is set in software.
Code isn't as clean as before, since audin has to be unset also whenever the shifter is triggered.
(Shifting bits resets the counter in hardware, so pseudo counter bit audin needs to follow.)
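
The split of the page offset into the AUDIN "pseudo bit" and the real counter value could be sketched like this. `split_offset` is a hypothetical name, and the real code of course has to drive AUDIN through the Lynx hardware registers.

```python
def split_offset(offset):
    """Split a 0..2047 page offset into (audin level, counter value 0..1023)."""
    audin = (offset >> 10) & 1      # pseudo counter bit 10, set in software
    counter = offset & 0x3FF        # what the hardware counter still has to reach
    return audin, counter

average = sum(split_offset(o)[1] for o in range(2048)) / 2048
print(split_offset(1023), split_offset(1024), average)
```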

But the result is nice.
The total balance for writing a single byte is finally:
shifts: 8+1+1+8 = 18 (14 without address swapping)
counts: 15+112+15+512 = 654 (3924 without address swapping)

... and the time to write "Batman returns" to flash comes down to: 4min 30s (=270s).

I guess that's as far as i want to go for now.
Maybe i'll wire SWVCC to flash address pin 17 one day. That would bring the counts further down to 398.
Plotting my measured times vs. the average counts from above shows an almost perfectly linear relationship, so the extrapolated time for this would be 3min 26s then.
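
The extrapolation can be reproduced with a plain least-squares line through the three measured points (average counts vs. seconds). This is a sketch of one way to do it, not necessarily how the number above was obtained.

```python
# (average counts per byte, measured seconds) from the three setups above
points = [(3924, 1104), (1166, 404), (654, 270)]

n = len(points)
sx = sum(x for x, _ in points)
sy = sum(y for _, y in points)
sxy = sum(x * y for x, y in points)
sxx = sum(x * x for x, _ in points)

slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # seconds per count
intercept = (sy - slope * sx) / n                   # fixed overhead in seconds

predicted = intercept + slope * 398                 # SWVCC variant: 398 counts
print(f"{predicted:.0f} s = {int(predicted // 60)}min {int(predicted % 60)}s")
```

The fit lands within a second or so of the 3min 26s quoted above.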
Is yet another 1 minute improvement worth the effort?
Code would definitely get ugly...

Edited January 19, 2022 by MichelS

42bs · January 19, 2022

Wow! Did not think about the $55 <> $2A trick.

+karri · January 19, 2022

Wow! Interesting reading.

Perhaps it is time to look into some better chips for self-programmable flash carts.

Or perhaps I just dig out the two SRAM carts with the 1F cap that I used years ago. With the high speed ComLynx loader and 256k it might be the fastest development cycle tool I have. I believe I bought them from Lars Baumstark or Bastian Schick. Anyway, they were really nice tools at the time. It is just that I had forgotten about them.

42bs · January 19, 2022

@karri the SRAM card is still in use here. My only problem is that the 1F GoldCap is somewhat damaged. It does not hold the SRAM long outside the Lynx.

FRAM would be nice, but I did not find 5v types and they are expensive.

I haven't checked yet the write speed of the GD.

Something like a SRAM-card with "boot-sector" would be cool.

But the 29F types do not need the 3 jump, so could be written much quicker. Karri, did you try the PROGRAM BYPASS mode with the PiHat?

+karri · January 19, 2022

1 hour ago, 42bs said:

@karri the SRAM card is still in use here. My only problem is that the 1F GoldCap is somewhat damaged. It does not hold the SRAM long outside the Lynx.

FRAM would be nice, but I did not find 5v types and they are expensive.

I haven't checked yet the write speed of the GD.

Something like a SRAM-card with "boot-sector" would be cool.

But the 29F types do not need the 3 jump, so could be written much quicker. Karri, did you try the PROGRAM BYPASS mode with the PiHat?

I will as soon as I get the proto batch to Finland. I asked them to use ST or Micron chips on it. It looks like an ST model.

Using FRAM's is like €12 for just 32k.

I did have an idea to create a cart with 2MB of SRAM and a Pi Zero 2. With a wireless mouse/kbd and a HDMI monitor I could develop stuff on the Lynx+Pi combo.
Basically I would just add the PiHAT chips on the cart. You could turn off power from the Lynx and let the Pi write the RAM. Then just power up the Lynx to run the cart image.

The SRAM chips for 1Mx8 are just €12. So two of these plus two SPI extenders and a place for a Pi Zero. And eeprom's of course. This should be a real time saver for working with larger than 512k games.

Edit: I just stumbled on MRAM's. Much cheaper and pretty cool. It would be interesting to make a 512k card with one of these. You could bootstrap it with some image with a downloader. After that just program it with your game (and keep the downloader as part of the game).

MichelS · January 19, 2022

Quote

But the 29F types do not need the 3 jump, so could be written much quicker.

Really? Which ones - Am29F040B? I haven't found anything on bypassing the JEDEC write protection in the datasheets. Am i missing something?

I've seen chips using addresses 0x55 and 0x2A, others with 0x555 and 0x2AA and the ones built into Karri's flashcard that use 0x5555 and 0x2AAA for the byte-program sequence, though.

The 0x55, 0x2A ones would be pretty close to the minimum number of shifts/counts required for setting a random access address, i'd think.

You can get faster only with sequential access, i.e. if no command is required at all and you can simply write out data, (auto-) incrementing the address on each strobe...

Is there a 29F type that can be put into a "global programming" mode?

Edited January 19, 2022 by MichelS

+karri · January 20, 2022

8 hours ago, MichelS said:

Really? Which ones - Am29F040B? I haven't found anything on bypassing the JEDEC write protection in the datasheets. Am i missing something?

I've seen chips using addresses 0x55 and 0x2A, others with 0x555 and 0x2AA and the ones built into Karri's flashcard that use 0x5555 and 0x2AAA for the byte-program sequence, though.

The 0x55, 0x2A ones would be pretty close to the minimum number of shifts/counts required for setting a random access address, i'd think.

You can get faster only with sequential access, i.e. if no command is required at all and you can simply write out data, (auto-) incrementing the address on each strobe...

Is there a 29F type that can be put into a "global programming" mode?

The current 2MB chip has another UNLOCK BYPASS mode (only Micron and ST chips) that allows a block to be programmed in 2 passes:

ADDR DATA
X A0

PA PD

So there needs to be no special addresses set. You still need to provide 0xA0 between every data byte you write. But of course you could first write all odd bytes, then all even bytes.
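
As a rough sketch of what that means on the bus: two writes per data byte, with a don't-care address for the 0xA0. The generator name `bypass_writes` is made up, `None` stands for the "X" address, skipping 0xFF bytes is borrowed from the optimisation list in the first post, and entering or leaving the bypass mode is not shown.

```python
def bypass_writes(start_addr, data, skip=0xFF):
    """Yield (address, data) bus writes for programming `data` in bypass mode."""
    for offset, value in enumerate(data):
        if value == skip:
            continue                       # erased flash already reads 0xFF
        yield (None, 0xA0)                 # "X A0": address is don't-care
        yield (start_addr + offset, value) # "PA PD": program address, program data

writes = list(bypass_writes(0x100, b"\x12\xff\x34"))
print(writes)
```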

It is just that in my 2MB design the WE# pin is not available to the Lynx. By lifting up pin 16 (A19) and soldering a wire to GND from it you could disable A19.

Soldering a wire from CART1 to the #WE logic the chip would be Lynx writable.
Perhaps this could be a desired configuration option for the next batch?

But if we create a magnetic SRAM cart then we need level shifters and a 3.6V regulator on the cart. By soldering one wire to the Lynx we can swap carts without turning off the Lynx. So boot the Lynx from a cart, swap in the MRAM cart, send content via ComLynx and let the Lynx program the whole memory sequentially.

MR2A08ACYS35 for around €25. The cart might be around €40. This is definitely the fastest Lynx programmable cart. Just single random writes anywhere.

The MRAM loses its content in 20 years. So it is not good as a cart.

If the cart image includes a downloader then you don't need to remove the cart at all.

MichelS · January 20, 2022

Thanks Karri,

this Unlock Bypass would be so cool!

The Lynx should be fast enough to do the two writes while the next byte is coming in over serial.

So no package requesting, buffering, etc. needed, which simplifies the downloader code.

Maybe some break handling when passing block boundaries...

Speed is only limited by the serial connection then, but we're talking about 3-4s for a 256K rom @ 1Mbaud here.
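
A back-of-the-envelope check of that figure, assuming 11 bits on the wire per byte (start bit, 8 data bits, parity, stop bit) and no protocol overhead. Both the frame size and the helper name are assumptions, so treat the result as a lower bound.

```python
BITS_PER_BYTE = 11          # assumed serial frame: start + 8 data + parity + stop
BAUD = 1_000_000            # 1Mbaud

def transfer_seconds(num_bytes):
    """Raw wire time for num_bytes, ignoring any handshaking or gaps."""
    return num_bytes * BITS_PER_BYTE / BAUD

t_256k = transfer_seconds(256 * 1024)
t_512k = transfer_seconds(512 * 1024)
print(f"256K: {t_256k:.2f} s, 512K: {t_512k:.2f} s")
```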

MRAM wouldn't give much benefit over this, except for writing data to cart that is already in Lynx ram.

Edited January 20, 2022 by MichelS

+karri · January 20, 2022

8 hours ago, MichelS said:

Thanks Karri,

this Unlock Bypass would be so cool!

The Lynx should be fast enough to do the two writes while the next byte is coming in over serial.

So no package requesting, buffering, etc. needed, which simplifies the downloader code.

Speed is only limited by the serial connection then, but we're talking about 3-4s for a 256K rom @ 1Mbaud here.

MRAM wouldn't give much benefit, except for writing data to cart that is already in Lynx ram.

Nice! Unfortunately I found out that I missed a VSS pin in the schematic symbol so I need to fix the design anyway. I will add solder jumpers for allowing the Lynx to control the write process. It will limit the flash to 2 MB. Or actually... Perhaps there could be an option for two programmable banks? Then you could program 1M with the bootloader, flip the switch on the cart and let the Lynx program the other half. In this way you don't need to extract the cart at all. The Lynx could of course program either bank. Here is a concept of the next version.

There are still practical things I don't know. When I get the proto series I need to find out:

- does it matter that the SWVCC line messes with the chip write logic line all the time if CART0 and CART1 are at 1? I guess it does not matter.
- using CART0 as A19 raises the question of whether the address logic can work correctly when OE# drops some nanoseconds later.

- is it ok to add a few pF to the CMOS logic output to delay OE# and WE# if we need to?

- I really need to write a cc65 driver for the 64k serial eeprom chip to see if it works with the Lynx at all.

Comments?

Well. Got to finish Wizzy and sell the first batch before I take the leap for this one. But I can modify the current boards to test these features.

I did add a solder switch for A20 (AUDIN). And all these switches could have some default connections so you need to cut the trace and solder the other side to change it. Perhaps the default could be PiHAT programmable 512k cart with 128 byte eeprom like the old cart.

~~I should really drag myself out of bed and get something to eat from the shop nearby...~~ Anyway. The next design is done.
Perhaps it needs a manual. Too many things to tweak?

[rant on]

All my shows are cancelled due to COVID restrictions. The same is true for rehearsals for future stuff. Bored...
Plus it is getting expensive to stay at home. I "invent" new stuff just to do something. Fedex just told me that my carts should be in customs real soon. The same is true for my new DMX lighting control boards. They should arrive a few days later. But what am I going to do with new lights? The already cancelled shows don't need them...
Looks like this goes on until Easter. At least.

[rant off]

Lynx / PiHAT:
Lynx = CART1 is the write strobe for the flash
PiHAT = SWVCC is the write strobe for the flash

64k/128:

64k = the new large eeprom in use
128 = the old 93C46 eeprom in use

none = no eeprom

A20:
AUDIN = AUDIN connected to A20
gnd = A20 connected to gnd

A19:

CART1 = CART1 is used for reading the second rom bank. It means Lynx does not have means to send a write strobe

gnd = A19 connected to gnd

none = A19 is dangling. You could connect a wire to the A19 pin and manually connect it to GND or 5V.

42bs · January 20, 2022

How about replacing the eeprom with a tiny ATmega? It sure could be programmed to act as eeprom plus if a gpio is free it can be used for bank switching.

Ok, strange idea ?

Edit: ATtiny comes in 8 pin with up to 512 bytes EEPROM inside.

Edited January 20, 2022 by 42bs

+karri · January 20, 2022

1 hour ago, 42bs said:

How about replacing the eeprom with a tiny ATmega? It sure could be programmed to act as eeprom plus if a gpio is free it can be used for bank switching.

Ok, strange idea ?

Edit: ATtiny comes in 8 pin with up to 512 bytes EEPROM inside.

Not a bad idea. It is really cheap and could replace the NAND gates also.

Edit: It has just 5 I/O pins. Not enough to replace the gates and eeprom.

Personally I like the idea that everyone knows what the chips are and how they are connected. An embedded AVR makes things bad. You don't know why things do not work the way you expect if someone has buggy code in the AVR.

And a full frontal image. Well, I did not have time for modeling the 3D versions of the chips. But I moved the flash chip down so that it does not obstruct the round cutout for the 3D cart shell.

42bs · January 26, 2022

Timing test with TurboUART and the SRAM Card, data sent block-wise, no direct writing of the received bytes: 50s

42bs · February 2, 2023

Increased speed: 5.8s / block: 25min per card (compared to previous 49min):

https://github.com/42Bastian/sram_up

Art · March 18

On 1/19/2022 at 7:19 PM, MichelS said:

I had some spare time recently...

It happens I'm in the middle of prototyping that address bus on protoboard, including the 74164, 4040, and the single 7404 inverter used to clock the counter.

The diode arrangement is interesting at the input of the 7404 inverter.

It occurs to me that the signal lines at both diode anodes must be high for the clock to remain high.

To toggle the clock one signal must remain high, while the other is toggled, and a diode will conduct.

Am I on the right track?

Edited March 18 by Art

MichelS · March 18

As i understand it, the diodes basically form a wired "AND".
The inverter input is pulled high by the resistor.
Any of the two CE/-lines going low will pull the inverter's input low via the corresponding diode.
I'm not an expert in electronics though...
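
Put as a truth table, the description above would look like this sketch. It only models the logic as described in this post (pull-up plus two diodes feeding an inverter), not the actual schematic, and the function names are made up.

```python
def inverter_input(ce0_n, ce1_n):
    """Pull-up plus two diodes: the node is high only if both CE/ lines are high."""
    return ce0_n & ce1_n

def counter_clock(ce0_n, ce1_n):
    """The 7404 inverts that node before it reaches the counter's clock input."""
    return 1 - inverter_input(ce0_n, ce1_n)

table = {(a, b): counter_clock(a, b) for a in (0, 1) for b in (0, 1)}
for (ce0_n, ce1_n), clk in table.items():
    print(f"CE0/={ce0_n} CE1/={ce1_n} -> clk={clk}")
```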

Art · March 18

27 minutes ago, MichelS said:

As i understand it, the diodes basically form a wired "AND".
The inverter input is pulled high by the resistor.
Any of the two CE/-lines going low will pull the inverter's input low via the corresponding diode.
I'm not an expert in electronics though...

I didn't realise it was a bump from a year ago lol.

That's exactly how things look to me.

I assumed you'd have to know this from the software side if you're loading the shift register and incrementing the counter.

MichelS · March 18

From the software side you're not bothered with this too much.
Suzy provides two registers for reading (and writing) data from cart.
Reading from one of the registers makes Suzy strobe CE0, reading from the other one strobes CE1.
This way only one of the two lines can go low at a time, and Suzy takes care of the timing.
After reading the data, the cart address is automatically incremented, because this diode circuitry feeds the clk-input of the counter.
This way you read data sequentially for (up to) a complete "page", i.e. until the counter wraps around (this depends on the cartridge's internal wiring, i.e. whether it uses 9, 10 or 11 bits of the counter).
The page is the upper 8 bits of the cartridge address and it's set with the shifter. Mikey provides another set of registers (or register bits) to set the data bit and the clock input of the shifter.
Clocking data into the shift register has to be done in software, so this is a bit of work for the coder.
But each time a bit is shifted in, the counter is reset automatically. You can see this in the schematics, as the clk-input of the shift register is wired to the rst-input of the counter.
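
A toy model of that behaviour, as a sketch only. The class name `CartAddress` and its methods are invented for illustration, and it assumes the 11-bit counter layout with the shifter directly above it.

```python
class CartAddress:
    """Toy model of the cart address logic: 8-bit shifter plus ripple counter."""

    def __init__(self, counter_bits=11):
        self.counter_bits = counter_bits
        self.shifter = 0
        self.counter = 0

    def shift_in(self, bit):
        """Clock one page bit in; the shifter clock also resets the counter."""
        self.shifter = ((self.shifter << 1) | (bit & 1)) & 0xFF
        self.counter = 0

    def strobe(self):
        """One CE0/CE1 access: return the address used, then auto-increment."""
        addr = self.address
        self.counter = (self.counter + 1) % (1 << self.counter_bits)
        return addr

    @property
    def address(self):
        return (self.shifter << self.counter_bits) | self.counter

cart = CartAddress()
for bit in (0, 0, 0, 0, 1, 0, 1, 0):     # select page 0x0A, MSB first
    cart.shift_in(bit)
accessed = [cart.strobe() for _ in range(3)]
print([hex(a) for a in accessed], hex(cart.address))
```

Setting a page costs eight software shifts, while stepping through the page is just repeated strobes, which is why sequential reads are cheap and random access is not.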

Edit: the programmer doesn't even have to care for the shifting - there's a function in rom for that which you can call if you're lazy.

Edited March 18 by MichelS

Art · March 18

Thanks 👍

This whole thread will be a good reference if there’s need for debugging the thing.

Igor · April 3

The JoeyLynx (https://k-retro.com/atari-lynx-cart-dumpers-programmers/65-78-bennvenn-joeylynx-lynx-cart-dumper.html#/28-kit_type-fully_assembled_kit) is able to program Karri's PCBs in about 20 seconds

42bs · April 4

5 hours ago, Igor said:

The JoeyLynx (https://k-retro.com/atari-lynx-cart-dumpers-programmers/65-78-bennvenn-joeylynx-lynx-cart-dumper.html#/28-kit_type-fully_assembled_kit) is able to program Karri's PCBs in about 20 seconds

Nice gadget. But adding taxes and customs, it becomes quite expensive (for EU folks).

Igor · April 4

3 minutes ago, 42bs said:

Nice gadget. But adding taxes and customs, it becomes quite expensive (for EU folks).

Unfortunately the case for all goods sent to EU now 😭

42bs · April 4

30 minutes ago, 42bs said:

Nice gadget. But adding taxes and customs, it becomes quite expensive (for EU folks).

I guess there is some uC inside. A version with ESP32 and WLAN would be cool. Thinking of this, a Flash card with ESP32 programmable inside the Lynx via WLAN would be cooler 🙂

Igor · April 5

19 hours ago, 42bs said:

I guess there is some uC inside. A version with ESP32 and WLAN would be cool. Thinking of this, a Flash card with ESP32 programmable inside the Lynx via WLAN would be cooler 🙂

That was a request for the 'dev kit' adapter for ElCheapoSD that BennVenn and us will be working on. The key functionality that would be required here is to reset the Lynx without resetting the flash cart, so the flash cart needs to be powered somehow.
