
I Found a Maria Silicon Bug


kevtris


I had kind of gotten off track in my other post, so I thought I'd post this here, since it's a programming issue.

 

What I found tonight, in short, is that there's some kind of nasty bug in Maria's DMA fetcher. The title screen of Realsports Baseball exhibits the problem (as does Xenophobe, though not on its title screen).

 

I did a bunch of research using three sources: MESS, my FPGA 7800, and a real, live Atari 7800 connected to a logic analyzer. Here's the sequence of images that goes with this post:

 

First, I soldered some headers onto the 6502 in the 7800.

 

7800_headers.JPG

 

Next, I connected up the logic analyzer:

 

7800_hookup.JPG

 

Then I had to make an EPROM cartridge:

 

7800_cart.JPG

 

Here's the logic analyzer showing the defective DMA fetch:

 

7800_LA.JPG

 

And finally, here's what it looks like if your DMA unit is *NOT* buggy:

 

rs_baseball_problem.jpg

 

 

The good news: I have confirmed that the Maria chip's basic DMA fetching cycle times are accurate.

 

The bad news: the Maria chip will fetch only 1 byte instead of the number specified in the DL header! I can't find much of a pattern to why it does this. At first I thought it was shortening the same DL entries on one zone each scanline, but it is not; it's more complicated than that. In RS Baseball, the zones are typically all 8 scanlines tall (at least on the title screen).

 

The way RS Baseball builds its title screen is that it draws the field and then overlays the title graphic on top of it. On the first affected zone (the middle of the logo, where the right half is missing), it attempts 14 DMAs. The first 10 DMAs fetch 4 bytes each and make up the baseball field; the last 4 fetch the title graphic. We're only interested in these last 4, since 2 of them get shortened from 20 bytes to 1 byte!

 

These 4 DMAs are as follows (the address on the left is where in RAM each DL entry sits; the DL starts at 1A6Ch):

 

1A94: 28 2C 99 00 - DMA 20 bytes from 9928h

1A98: 50 2C A9 00 - DMA 20 bytes from A950h

1A9C: 3C 2C 99 50 - DMA 20 bytes from 993Ch

1AA0: 64 2C A9 50 - DMA 20 bytes from A964h
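To make the byte layout explicit, here is a minimal Python sketch that decodes the four entries above. It follows the decoding given in the post (byte 0 = address low, byte 1 = width field, byte 2 = address high) and additionally assumes that byte 3 is the horizontal position; `DLHeader` and `decode_dl_header` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DLHeader:
    address: int  # graphics address: byte 2 is the high byte, byte 0 the low byte
    width: int    # number of bytes Maria should DMA for this entry
    hpos: int     # byte 3, assumed here to be the horizontal position


def decode_dl_header(b0: int, b1: int, b2: int, b3: int) -> DLHeader:
    """Decode a 4-byte DL header as laid out in the post (sketch only)."""
    width_field = b1 & 0x1F  # low 5 bits hold the width as a 5-bit two's complement
    if width_field == 0:
        # Zero is not a valid 4-byte width; other header types are not handled here.
        raise ValueError("not a 4-byte DL header")
    return DLHeader(address=(b2 << 8) | b0, width=32 - width_field, hpos=b3)


# The four title-screen entries from the post, keyed by their RAM address.
ENTRIES = {
    0x1A94: (0x28, 0x2C, 0x99, 0x00),
    0x1A98: (0x50, 0x2C, 0xA9, 0x00),
    0x1A9C: (0x3C, 0x2C, 0x99, 0x50),
    0x1AA0: (0x64, 0x2C, 0xA9, 0x50),
}

for ram_addr, raw in ENTRIES.items():
    h = decode_dl_header(*raw)
    print(f"{ram_addr:04X}: DMA {h.width} bytes from {h.address:04X}h, hpos {h.hpos:02X}h")
```

With byte 1 = 2Ch, the low five bits are 0Ch, and 32 - 12 = 20, which matches the 20-byte width listed for every entry.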

 

Here's the weird part: On the *first* scanline of the zone, the following occurs:

 

1A94: 28 2C 99 00 - DMA 20 bytes from 9928h <-- instead of 20 bytes, it only fetches 1.

1A98: 50 2C A9 00 - DMA 20 bytes from A950h

1A9C: 3C 2C 99 50 - DMA 20 bytes from 993Ch <-- instead of 20 bytes, it only fetches 1.

1AA0: 64 2C A9 50 - DMA 20 bytes from A964h

 

On the other 7 scanlines of the same zone, the following occurs:

 

1A94: 28 2C 99 00 - DMA 20 bytes from 9928h

1A98: 50 2C A9 00 - DMA 20 bytes from A950h <-- instead of 20 bytes, it only fetches 1.

1A9C: 3C 2C 99 50 - DMA 20 bytes from 993Ch

1AA0: 64 2C A9 50 - DMA 20 bytes from A964h <-- instead of 20 bytes, it only fetches 1.

 

To cap it all off, the programmer of RS Baseball must have had a clue what was going on, because he only put the title screen graphics in the correct places! The top scanline of the graphic is at 9928-994Fh, while the other 7 scanlines of the graphic are at A950-A977h.

 

In my logic analyzer picture above, you can see Maria read the display list entry at 1A9C-1A9F, then fetch only one byte instead of 20 before reading the next display list entry at 1AA0-1AA3. This was captured on the last scanline of the zone, so the fetched addresses have no offset.
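Reading the capture that way (count the graphics fetches between consecutive DL header reads) can be sketched in Python. This assumes the capture has already been reduced to one (address, HALT asserted) pair per bus cycle, and that HALT asserted means Maria owns the bus; the trace below is made up to mirror the description, not exported from the analyzer, and `dma_bytes_per_entry` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Iterable, Optional


def dma_bytes_per_entry(
    samples: Iterable[tuple[int, bool]], header_addrs: set[int]
) -> dict[int, int]:
    """Count graphics fetches that follow each 4-byte DL header read.

    `samples` holds one (address, halt_asserted) pair per bus cycle; only
    cycles with HALT asserted (Maria owns the bus) are considered.
    """
    counts: dict[int, int] = {}
    current: Optional[int] = None
    for addr, halted in samples:
        if not halted:
            continue  # CPU cycle, not a DMA access
        if addr in header_addrs:
            current = addr  # first byte of a new DL header
            counts[current] = 0
        elif current is not None and current < addr <= current + 3:
            continue  # remaining three header bytes
        elif current is not None:
            counts[current] += 1  # anything else is a graphics fetch
    return counts


# Hypothetical trace: header at 1A9C, one graphics fetch, then the next header.
trace = [(a, True) for a in (0x1A9C, 0x1A9D, 0x1A9E, 0x1A9F, 0x993C)]
trace.append((0xF000, False))  # a CPU cycle slipped in; it is ignored
trace += [(a, True) for a in (0x1AA0, 0x1AA1, 0x1AA2, 0x1AA3)]
example = dma_bytes_per_entry(trace, {0x1A9C, 0x1AA0})
print({f"{k:04X}": v for k, v in example.items()})
```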

 

I think this is pretty iron-clad proof that something very "funny" is going on in the Maria chip! One interesting use for this would be emulator protection: you could use deliberately buggy DMAs that work on a real system but overwrite and corrupt the on-screen graphics if an emulator executes the DMAs "properly" without emulating the DMA bug.

 

I am not sure how deep this particular rabbit hole goes, and I can't find any reasoning behind these broken DMA fetches. There is obviously SOME set of triggering conditions, but I am not sure what they are. Only two games exhibit this problem. At this point I am wondering whether it has something to do with the horizontal pixel position when the width is read.

 

Has anyone writing software hit this problem, where "random" display list entries do not work?


Whoa. Hang on there. A 7800 hooked up to a logic analyzer?

I may need to enlist some help with "Get Lost" and its mystical black screen. Maybe it's related?

 

Sometimes my screens go completely black, or some DLs go haywire. It works perfectly in emulation.

Would you be able to clarify what's going on a little more?

 

-John

Edited by Propane13


I had similar things happen during development of the FPGA 7800. The main cause was too many NMIs. Some games (Klax) did things like turn the DMA on before setting up the display lists, so Maria would grab random crap out of open bus / RAM / god knows where, and if bit 7 was set it would do a DLI. The timing was very close on this, and being off by a few cycles was enough that the game couldn't recover.
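The "bit 7" here is the DLI flag in the first byte of a display list list (DLL) entry. A small decoder shows why random bytes are dangerous: half of all possible byte values have bit 7 set and so request an interrupt. This sketch assumes the standard 3-byte DLL entry layout (flags and offset, then DL address high, then low); the H16/H8 bit positions are part of that assumption, and the names are hypothetical.

```python
from typing import NamedTuple


class DLLEntry(NamedTuple):
    dli: bool         # bit 7: request a display list interrupt (NMI)
    h16: bool         # bit 6: 16-line DMA holes (assumed position)
    h8: bool          # bit 5: 8-line DMA holes (assumed position)
    zone_height: int  # bits 3-0 hold the offset, i.e. zone height minus one
    dl_address: int   # bytes 1-2: display list address, high byte first


def decode_dll_entry(b0: int, b1: int, b2: int) -> DLLEntry:
    """Decode one 3-byte DLL entry (sketch under the assumptions above)."""
    return DLLEntry(
        dli=bool(b0 & 0x80),
        h16=bool(b0 & 0x40),
        h8=bool(b0 & 0x20),
        zone_height=(b0 & 0x0F) + 1,
        dl_address=(b1 << 8) | b2,
    )


# Garbage from open bus: any first byte with bit 7 set requests a DLI.
garbage = decode_dll_entry(0xFF, 0xFF, 0xFF)
# A hypothetical well-formed entry: H16 on, 8-line zone, DL at 1A6Ch.
title_zone = decode_dll_entry(0x47, 0x1A, 0x6C)
print(garbage)
print(title_zone)
```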

 

Make sure you are not accidentally setting the DMA mode to 0 or 1. The game Beef Drop does it (I am not sure whether all versions do, or just the dev version I had): it sets DMA mode 0 at some point when it changes from the title screen to the game screen. This puts the DMA state machine into a tizzy, and it will fetch one DL header from god knows where. If bit 7 is set, it will NMI and trash the game, leaving the in-game screen all corrupt. I fixed it by disallowing NMIs if the DMA mode is not 3.

 

The other possible explanation is that your code expects to execute more instructions than it can on a real system. The emulators aren't very good at deducting the cycles used by DMA yet (or if they do, they top out at some default value). The emulators also have no problem fetching more graphics than can be rendered; the issues raised in this thread are proof of that.

 

But if I had to guess, NMI is most likely what is causing your crashes. If you have the cycles, you could put a reentrancy guard in your NMI handler: check an "already in NMI" flag and RTI immediately if it's set. If not, set it, do your NMI work, then clear it and RTI. This will prevent multiple NMIs from stepping on each other.
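The flag logic is easy to get subtly wrong, so here is a Python model of it (not 6502 code): `in_nmi` stands in for the flag byte in RAM, and a nested call stands in for a second NMI arriving while the handler is still running. The class name and counters are hypothetical.

```python
from __future__ import annotations

from typing import Callable


class GuardedNmiHandler:
    """Python model of an NMI handler protected by an 'already in NMI' flag."""

    def __init__(self, body: Callable[[GuardedNmiHandler], None]) -> None:
        self.body = body      # the real NMI work
        self.in_nmi = False   # stands in for the flag byte in RAM
        self.serviced = 0     # NMIs whose work actually ran
        self.dropped = 0      # NMIs that hit the flag and returned at once

    def nmi(self) -> None:
        if self.in_nmi:
            self.dropped += 1  # flag already set: RTI immediately
            return
        self.in_nmi = True     # set the flag
        try:
            self.body(self)    # do the NMI work
            self.serviced += 1
        finally:
            self.in_nmi = False  # clear the flag, then RTI


def work(handler: GuardedNmiHandler) -> None:
    handler.nmi()  # simulate a second NMI arriving mid-handler


handler = GuardedNmiHandler(work)
handler.nmi()
print(handler.serviced, handler.dropped)
```

Note the trade-off: an NMI that arrives while the flag is set is simply dropped, so whatever that interrupt was supposed to do is skipped rather than corrupting the handler already in progress.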

 

I am not sure why Atari didn't use IRQ for this instead, seeing how it goes unused. It's maskable, and the CPU will not take more IRQs until the current one is finished. My only guess is they didn't want the overhead of clearing the IRQ flag on the chip when the IRQ was serviced.


I posted my thoughts in your other thread. I think what you are seeing with the logic analyzer is not a bug. The H16 bit is enabled for the title DLLs. In this holey DMA mode, Maria will not fetch from odd 4K blocks. In the first scanline of your example, only the top line at $a028 is fetched. For the next 7 scanlines, $a950 - $af50 are used and $b050 is skipped.

 

If your hardware is fetching from these odd 4k blocks, then it may have a bug.
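The arithmetic in this reply can be checked with a short Python sketch. It assumes an 8-line zone; that Maria adds the zone offset to the high byte of the DL address, counting down from height minus one on the first scanline to 0 on the last (consistent with the note that the last scanline has no offset); and that H16 treats addresses with A12 set (odd 4K blocks) as holes. Restricting holes to addresses with A15 set, and the H8 rule (odd 2K blocks, A11), are extra assumptions borrowed from emulator behaviour, not from this thread. All function names are hypothetical.

```python
from __future__ import annotations


def line_address(base: int, offset: int) -> int:
    """Graphics address for one scanline: DL address plus `offset` pages."""
    return (base + (offset << 8)) & 0xFFFF


def in_dma_hole(addr: int, h16: bool, h8: bool) -> bool:
    """True if holey DMA would skip this address (see assumptions above)."""
    if not addr & 0x8000:       # assumption: holes only in the upper 32K
        return False
    if h16 and addr & 0x1000:   # odd 4K block
        return True
    if h8 and addr & 0x0800:    # assumption: odd 2K block for H8
        return True
    return False


def zone_schedule(base: int, height: int, h16: bool = False, h8: bool = False):
    """(offset, address, is_hole) for each scanline of a zone, top to bottom."""
    rows = []
    for offset in range(height - 1, -1, -1):
        addr = line_address(base, offset)
        rows.append((offset, addr, in_dma_hole(addr, h16, h8)))
    return rows


for base in (0x9928, 0xA950):
    for offset, addr, hole in zone_schedule(base, 8, h16=True):
        print(f"{base:04X}h offset {offset}: {addr:04X}h {'hole' if hole else 'fetched'}")
```

For 9928h only the first scanline (A028h) is fetched; for A950h the first scanline (B050h) falls in a hole and the remaining seven (AF50h down to A950h) are fetched, which is the pattern described in the reply.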


If you haven't seen this yet, check out this tool that GroovyBee wrote. It will open ProSystem .sav files and decode the DLL data for you, which makes it easy to figure out how the display is set up. The RS Baseball title is actually made with a combination of 8- and 16-line zones.

 

http://atariage.com/forums/topic/161355-a78psd-prosystem-save-state-file-analyser/#entry1986861


I posted my thoughts in your other thread. I think what you are seeing with the logic analyzer is not a bug. The H16 bit is enabled for the title DLLs. In this holey DMA mode, Maria will not fetch from odd 4K blocks. In the first scanline of your example, only the top line at $a028 is fetched. For the next 7 scanlines, $a950 - $af50 are used and $b050 is skipped. If your hardware is fetching from these odd 4K blocks, then it may have a bug.

 

Yes, this was it. Thanks for the tip. I checked, and the schematic actually has nothing at all about the DMA memory holes. I read everything I could, and the documentation all said that when a DMA memory hole is hit, Maria will simply write 00h for all the bytes instead of whatever is read from memory, so that's what I did. Obviously the caveat is that you have to terminate the DMAs early too!
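A minimal Python model of that fix, assuming a flat 64K memory image and a hole flag computed beforehand (for example from an H16/H8 check). `dma_fetch` and its return convention are hypothetical, and real hardware timing is not modelled; the single bus read on a hole only mirrors the one-byte fetch seen in the analyzer capture.

```python
from __future__ import annotations


def dma_fetch(mem: bytes, address: int, width: int, hole: bool) -> tuple[list[int], int]:
    """Return (bytes handed to the line buffer, bus reads Maria performs)."""
    if hole:
        # Documented part: every byte reads as 00h.
        # Observed part: the DMA is cut short after a single access.
        return [0x00] * width, 1
    data = [mem[(address + i) & 0xFFFF] for i in range(width)]
    return data, width


# Hypothetical 64K image where each byte equals the low byte of its address.
mem = bytes(range(256)) * 256
hole_result = dma_fetch(mem, 0x9928, 20, hole=True)
normal_result = dma_fetch(mem, 0xA964, 4, hole=False)
print(hole_result[1], normal_result)
```

Returning both the data and the read count keeps the two halves of the fix separate: the zeros keep the picture right, while the shortened read count is what keeps the DMA timing, and therefore the CPU's remaining cycles, in line with the real console.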

 

The logic analyzer showed this. As for the memory ranges affected, I wrote out the addresses it was fetching and did notice that when it was fetching Axxxh on the RS Baseball screen it would only DMA 1 byte instead of all 20. I didn't make the connection between that and the DMA hole bit being set. Oh well, that saved me hours of debug time.

 

If you haven't seen this yet, check out this tool that GroovyBee wrote. It will open ProSystem .sav files and decode the DLL data for you, which makes it easy to figure out how the display is set up. The RS Baseball title is actually made with a combination of 8- and 16-line zones. http://atariage.com/forums/topic/161355-a78psd-prosystem-save-state-file-analyser/#entry1986861

 

Yeah, that's nice; I needed one of those :-) I ended up writing a simple one in QB64 to decode the DLs during debugging.

 

So I guess in the end there is no silicon bug, and I feel kinda silly now for thinking there was. In any event, the FPGA 7800 is about done. I added two-button support and tested a bunch of games without finding any new bugs. I will do a regression test tomorrow and run every game again to see if anything else broke, but I ran 20 or so and they were all fine. I played up to level 5 on Xenophobe and not a graphic disappeared. RS BB's title screen is perfect, too.

 

Still find it interesting that only 2 titles were affected out of all of them, though!

 

In a few days if I get around to it I will make a video of it running various games.


Aww... so there's no bug then?

I guess that means "Get Lost" has a different type of bug.

 

I would love to know what kind of logic analyzer setup you have (where you put pins, what you did to hook into the 7800), as I have been very interested to figure out what's broken in my game. Would you be willing to share?

 

-John



I forgot to mention: I ran it for over 24 hours on my doodad and it didn't crash. I played it for about half an hour last night and it seemed OK. Cool game, btw; I will probably be playing it more! I couldn't actually play it until last night, since it didn't support one-button controllers. I fixed that last night, so I could finally play it properly instead of just running around and falling off that first ladder as before. I got stuck between the two dragons on the ground because I couldn't jump, so I left it there all night, and it was still running the next day.

I still think it's some kind of weird NMI problem. WHEN does it crash? Between screens, or can just sitting on a screen without moving trigger it? If it's between screens, I'd suspect NMI. If it happens while the game just sits there, I don't know. Make sure you're not letting your display list pointers point somewhere invalid, like 0000h, if you're clearing the RAM out and writing a new list in. I doubt that's it, but I thought I'd throw it out there.
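A debug-time check along those lines can be sketched in Python: walk a DLL dumped from memory (for example from an emulator save state) and flag entries whose display list pointer lands outside the ranges where the game actually builds its lists, or whose bit 7 (DLI) is set when no DLIs are intended. The function name, the 3-byte entry layout (flags, DL high, DL low) and the 1800h-27FFh range in the example are all assumptions for illustration.

```python
from __future__ import annotations


def check_dll(
    dll: bytes, valid_ranges: list[tuple[int, int]], expect_dli: bool = False
) -> list[str]:
    """Report suspicious DLL entries; `valid_ranges` are inclusive (start, end)."""
    problems: list[str] = []
    for i in range(0, len(dll) - 2, 3):  # ignore any incomplete trailing entry
        flags, hi, lo = dll[i], dll[i + 1], dll[i + 2]
        dl = (hi << 8) | lo
        if not any(start <= dl <= end for start, end in valid_ranges):
            problems.append(f"entry {i // 3}: DL pointer {dl:04X}h outside valid ranges")
        if flags & 0x80 and not expect_dli:
            problems.append(f"entry {i // 3}: DLI bit set")
    return problems


# Hypothetical DLL: one good entry, one cleared to 0000h, one with a stray DLI bit.
dll = bytes([0x47, 0x1A, 0x6C,
             0x07, 0x00, 0x00,
             0x87, 0x1A, 0x80])
report = check_dll(dll, [(0x1800, 0x27FF)])
for line in report:
    print(line)
```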

 

The emulators tend to have well defined behaviour if you try to read from unimplemented ("open bus") areas of memory, vs. a real system.

 

How does Missing in Action play on your FPGA 7800? It tends to give the emus problems.

 

Mitch

 

I have not seen anything about this game. I will find it on the forum and play it.


I would love to take a peek at this schematic. Is it online? I did a quick search but came up empty.

 

Yeah I only have a paper copy. I will be getting it scanned and it will be posted eventually. It's 43 pages long and has 2 duplicate pages where some Atari engineers scribbled notes on it, making it 45 total. It's extremely buggy and the DMA hole logic is totally missing. I think it was very early in the design process when Atari reorg'd. This schematic is a "combo" TIA/Maria one so it would've been for another revision of the 7800 to reduce costs.

 

 

Aww... so there's no bug then?

I guess that means "Get Lost" has a different type of bug.

 

I would love to know what kind of logic analyzer setup you have (where you put pins, what you did to hook into the 7800), as I have been very interested to figure out what's broken in my game. Would you be willing to share?

 

-John

 

Sure. I am using an HP/Agilent 16700B logic analyzer I got off eBay for $200 (it came with three acquisition cards). I had to buy some probes for it too (naturally, also off eBay).

 

To connect it to the 7800, all I did was solder some .1" headers onto the pins of the CPU. 20 per side. I set the header pins on top of the chip legs where they exit the package, and soldered it to them. This was done for the 40 pins of the CPU which contained almost every signal I needed. I soldered a few pins to the Maria to get at its clock input line. The Maria clock is driving the logic analyzer's capturing, so it captures 1 sample each 14.318MHz clock cycle. This let me do easy cycle counting by just setting markers and reading off cycles directly. The probe pins just plug into the headers directly so there's no fuss or soldering other than putting the pins on the chip originally. Now that I'm done, I will probably just leave the pins on the CPU since they aren't hurting anything. The case will still close just fine and if I ever have to probe it again, it's ready to go.
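Since every sample is one Maria clock, a marker distance converts directly to time or to CPU cycles. A minimal Python sketch, assuming a 14,318,180 Hz sample clock, a 6502 running at Maria clock / 8 (about 1.79 MHz) for normal accesses, and Maria clock / 12 (about 1.19 MHz) for slow ones; the dividers and function names are assumptions, not taken from the post.

```python
MARIA_CLOCK_HZ = 14_318_180  # sample clock: Maria's 14.318 MHz input
FAST_DIVIDER = 8             # assumption: 6502 at Maria clock / 8 (~1.79 MHz)
SLOW_DIVIDER = 12            # assumption: Maria clock / 12 (~1.19 MHz) on slow accesses


def samples_to_ns(samples: int) -> float:
    """Time spanned by a marker distance measured in analyzer samples."""
    return samples * 1e9 / MARIA_CLOCK_HZ


def samples_to_cpu_cycles(samples: int, divider: int = FAST_DIVIDER) -> float:
    """Same distance expressed in 6502 cycles for the given clock divider."""
    return samples / divider


print(f"{samples_to_ns(160):.1f} ns, {samples_to_cpu_cycles(160)} CPU cycles")
```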

 

I set a trigger up on the display list address so it would start capturing data when the DL line in question on Realsports Baseball was accessed. The picture I took shows the signals I was monitoring- the address bus, data bus, HALT, R/W, and phi0 into the CPU. This analyzer as it sits with the 3 acquisition cards can monitor about 200 signals and has a depth of 2 million samples.

 

So far I have used it to totally reverse engineer timing of the Videobrain, Arcadia 2001, and now I used it to poke the 7800. It's strong medicine but it gives clear results in seconds to minutes instead of hours or days or more using any other means.

 

The next job for this logic analyzer will be unraveling the mysteries of the Gameboy's video rendering to the LCD. It's extremely complex and convoluted.


I forgot to mention I ran it for over 24 hours on my doodad and it didn't crash. I played it for about a half hour last night and it seemed OK. Cool game btw. I will probably be playing it more! I couldn't actually play it until last night since it didn't support one button controllers. I fixed that last night so then I could play it properly other than running around and falling off that first ladder as before. I got stuck between the two dragons on the ground 'cause I couldn't jump so I left it there all night and it was still running the next day. I still think it's some kind of weird NMI problem. WHEN does it crash? Is it between screens or just sitting there on a screen without moving can trigger it? If it's between screens I'd think it might be NMI. If it's while the game just sits there, I don't know. Make sure you're not letting them point somewhere invalid, like 0000h if you're clearing the RAM out and writing a new one in. I doubt it, but I thought I'd throw it out there.

 

The emulators tend to have well defined behaviour if you try to read from unimplemented ("open bus") areas of memory, vs. a real system.

 

Interesting. No crashes when just sitting there? Others have reported that it crashes at random when they just sit on the first screen after starting up.

For me, it typically crashes within an hour of playing. Calculation-wise, the DLLs aren't filled up with too much stuff, and I don't believe I'm actually using DLIs.

It just gets weirder and weirder. I'll have to look at my source code to confirm; I haven't done that in a while.

 

If you do end up playing it, and get a crash scenario, I would be in your debt for the feedback.

Maybe it has to do with the type of 7800 it runs on, though? If you've never seen a crash, that's very interesting.

 

Regards,

-John


*sorry to de-rail*

 

John:

 

Just a question - could it be something silly, like the bat? i.e. it looks like you generate a (fairly) random zone for the bat to come out. What happens if a number is generated that is *just* outside the screen coordinates and ends up 'modifying' a non-existent DL? I've done similar things in the past like that...

 

... or is the bat already on the screen when it goes black?

 

Bob
