
F18A programming, info, and resources


matthew180


8 hours ago, matthew180 said:

Not currently, no.  Sorry.  Although I think this could be added in the next firmware update.  The BML is only addressed during TL1 processing, so the BML's priority bit is between TL1 and the BML

(5 years ago, yikes) I tried  a BML to mask out the bottom 3-4 rows of the screen.

 

TL1 and TL2  were scrolling (different speeds) for a parallax effect. I wanted a fixed status display at the bottom.  (The old "scrolling window" feature was gone.) 

 

The BML would cover tile rows 21-24. Row 21 would be solid BML to hide partial tiles. Tiles would be erased after scrolling 8 pixels. I would draw the status display into the BML for rows 22-24.

With a 256x32 bitmap, 2 bits per pixel, I got a very confusing result. (Hardware F18A, recent firmware.) Probably my error, but to clarify--

 

 

Is there a way for the BML to mask both TL1 and TL2?

 

Sprites aren't an issue. 

 


1 hour ago, FarmerPotato said:

Row 21 would be solid BML  to hide partial tiles.

 

If you were doing parallax scrolling, that would be left-to-right, so I'm confused what masking you needed at the bottom?  Just put empty tiles on TL1 and TL2 for those rows.  If you are using the expanded name-tables, then you do not need any left or right edge masking (which TL2 was designed to do in cases where it was needed).

 

1 hour ago, FarmerPotato said:

Is there a way for the BML to mask both TL1 and TL2?

 

Not currently.  The BML and TL1 can swap which is on top, but TL2 will always be in front of both.

 

You could use TL1 and the BML for the scrolling, and leave TL2 for things like the score and fixed-place text, etc.  The BML is like a big sprite and can be moved around.  However, I would also think that some of the techniques Rasmus has come up with (like the driving games) could be used to create a parallax effect much easier and more efficiently.

 

Alternatively you can use a single tile layer and use the GPU to set the horizontal scroll for every pixel row.  Every 8 rows it could also shift the horizontal tiles in a row so you do not need the expanded name tables.

 

I'm working on a firmware update, and a possible new feature is support for horizontal and vertical scrolling without needing the extra name tables.  I think I'm calling it "border scroll mode", but I might change it to "window scroll mode" (although I don't like "modes" so I should probably pick a new word for that too).  Basically the name table becomes 34x26 (or 34x32 if ROW30 is enabled), but the displayed tiles will be the center 32x24.  This leaves a border of tiles all the way around the tile layer that is used to provide the edge data when scrolling takes place.

 

This does mean after scrolling 8 pixels in any direction you will have to reset the scroll and tile-shift the whole name table, but you eventually have to do this anyway.  With this technique the name table only needs to grow by 116 bytes, i.e. 768 to 884 (1088 in ROW30), which is way less memory than doubling or quadrupling the name table space for each layer.
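For anyone who wants to check the memory numbers, here is a small sketch of the arithmetic. It only assumes one byte per name-table entry and the table dimensions given in the post; the function name is made up for illustration.

```python
def name_table_bytes(cols, rows):
    """Size of a name table with one byte per tile entry."""
    return cols * rows

standard = name_table_bytes(32, 24)        # normal 32x24 table
bordered = name_table_bytes(34, 26)        # 32x24 plus a one-tile border
bordered_row30 = name_table_bytes(34, 32)  # ROW30: 30 rows plus the border
growth = bordered - standard               # extra bytes for the border

# For comparison: doubling or quadrupling the standard table.
doubled = 2 * standard
quadrupled = 4 * standard

print(standard, bordered, bordered_row30, growth, doubled, quadrupled)
```

The border costs 116 bytes per layer, against 768 or 2304 extra bytes for a doubled or quadrupled table.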



2 hours ago, matthew180 said:

If you were doing parallax scrolling, that would be left-to-right, so I'm confused what masking you needed at the bottom? 

Oh, it's both horizontal and vertical travel. Horizon isn't fixed. 
 


 

 

2 hours ago, matthew180 said:

Basically the name tables becomes 34x26 (or 34x32 if ROW30 is enabled), but the displayed tiles will be the center 32x24.

Would it have to be a contiguous row? Address calculation gets messier--

       MOV  R3,R0      R0 = row
       SLA  R0,5       R0 = row * 32
       A    R3,R0      R0 = row * 33
       A    R3,R0      R0 = row * 34 (offset of the row start)

 

But I guess you would be refreshing the whole table in one go, so VDPWA is set once. 

For scrolling, I made TL2 into four screens, so the refresh portion is always off-screen. TL1 scrolls half the speed so I allocated just 64x30 & refresh off screen. 
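The shift-and-add above can be checked with a few lines of Python. This is only a sketch of the same arithmetic for a table that is 34 entries wide; `cell_offset` is a hypothetical helper name, not anything from the F18A documentation.

```python
TABLE_COLS = 34  # width of the proposed bordered name table

def cell_offset(row, col=0):
    """Byte offset of (row, col), computed the same way as the 9900 snippet:
    row * 32 via a shift, then add row twice to reach row * 34."""
    offset = (row << 5) + row + row
    return offset + col

# The shift-and-add form agrees with a plain multiply for every row.
for r in range(32):
    assert cell_offset(r) == r * TABLE_COLS

print(cell_offset(3), cell_offset(25, 33))
```

The last cell of a 34x26 table lands at offset 883, which matches the 884-byte table size.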


3 hours ago, matthew180 said:

This does mean after scrolling 8 pixels in any direction you will have to reset the scroll and tile-shift the whole name table, but you eventually have to do this anyway.  With this technique the name table only needs to grow by 116 bytes, i.e. 768 to 884 (1088 in ROW30), which is way less memory than doubling or quadrupling the name table space for each layer.

Would it be possible to allow the scroll to wrap at the edges?  On the right side: 33 to 0, and at the bottom: 25 to 0 (or 31 to 0 w/ ROW30).

 

I also can think of a situation where I would want BM layer on top of both TL1 and TL2.


1 minute ago, PeteE said:

Would it be possible to allow the scroll to wrap at the edges?

 

That is the native functionality if you just start changing the scroll registers without any masking (from TL2, etc.) or setting up the additional name tables.

 

Here is a screenshot of some early testing of the BML and scrolling.  This is the stock Master Title Screen, with a GPU program running that is updating registers (like the BML control), and the horizontal scroll register being updated at certain locations.  The "READY-PRESS ANY KEY TO BEGIN" text looks blurry because it is horizontal scrolling, along with the color bars in places, etc.  The console has no idea this stuff is happening, and there is no 9900 code involved.

 

[Screenshot: Master Title Screen with GPU-driven BML and horizontal scroll effects]

 

 

11 minutes ago, PeteE said:

I also can think of a situation where I would want BM layer on top of both TL1 and TL2.

 

I can probably make that happen.


7 minutes ago, matthew180 said:

That is the native functionality if you just start changing the scroll registers without any masking (from TL2, etc.) or setting up the additional name tables.

I meant with future 34x26 border scroll functionality.  I would rather have the scroll wrap instead of needing to rewrite the whole name table every 8 pixels scroll.
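To make the request concrete, here is a rough sketch of what wrapping inside a 34x26 table would mean for tile lookups. This is purely an illustration of the idea being asked for, not how any firmware works; the function and parameter names are invented.

```python
COLS, ROWS = 34, 26  # proposed bordered name table (34x32 with ROW30)

def wrapped_cell(scroll_col, scroll_row, x, y, cols=COLS, rows=ROWS):
    """Name-table offset of the tile shown at screen tile (x, y) when the
    tile-level scroll position wraps at the table edges (33 -> 0, 25 -> 0)."""
    col = (scroll_col + x) % cols
    row = (scroll_row + y) % rows
    return row * cols + col

print(wrapped_cell(0, 0, 0, 0))    # top-left of the table
print(wrapped_cell(33, 25, 1, 1))  # wraps on both axes back to offset 0
```

With wrapping like this, only the column or row about to scroll into view would need to be rewritten, rather than the whole table every 8 pixels.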

Edited by PeteE

44 minutes ago, PeteE said:

I meant with future 34x26 border scroll functionality.  I would rather have the scroll wrap instead of needing to rewrite the whole name table every 8 pixels scroll.

How difficult would it be to use the GPU as a blitter to do that?


The GPU is more than fast enough to set multiple registers between scanlines. If you have a complicated scrolling situation that requires a fixed window, just use the GPU to set up the window mid-screen. The whole value of the GPU is that it lets you set up exactly the situation you need without each one having to be a specific feature. ;)

 


4 minutes ago, OLD CS1 said:

How difficult would it be to use the GPU as a blitter to do that?

That depends on how much GPU RAM is left over after both TL1 and TL2 name and attribute tables, ECM3 patterns and ECM3 sprites.  The level data needs to be stored somewhere.

Edit: oh... blitter as a block transfer, to move the data in the name+attr tables up/down/left/right.  Yeah, that would work. Good idea!
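As a rough picture of the block-transfer idea, here is a software model of shifting a name table one tile to the left, row by row. It is only a sketch on a Python `bytearray`; it does not model the GPU or any F18A registers, and `shift_left` and its `fill` parameter are made up for the example.

```python
def shift_left(table, cols, rows, fill=0):
    """Shift every row of a cols x rows table one entry to the left.
    The right-most entry of each row is set to `fill`, ready for new
    level data to be written into it."""
    for r in range(rows):
        start = r * cols
        # Move entries 1..cols-1 down to 0..cols-2 within this row.
        table[start:start + cols - 1] = table[start + 1:start + cols]
        table[start + cols - 1] = fill
    return table

demo = bytearray([1, 2, 3, 4,
                  5, 6, 7, 8])
print(list(shift_left(demo, cols=4, rows=2)))
```

The same loop applied to the attribute table keeps names and attributes in step.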

Edited by PeteE

17 hours ago, PeteE said:

Edit: oh... blitter as a block transfer, to move the data in the name+attr tables up/down/left right.  Yeah, that would work. Good idea!

 

When I finished the base 9918A functionality in the F18A, I had a lot of room left in the FPGA.  I always intended to have a DMA for exactly this, but defining all the DMA features eventually morphed into "Why not just have a CPU?"  And if you have a CPU in the VDP, why not have it be a 9900?  Thus the GPU.

 

There is, actually, also a DMA in the F18A.  I should probably put the GPU's memory map in the register use spreadsheet.  It is documented in this thread somewhere too.

 

   -- DMA
   -- 8xx0 - MSB src
   -- 8xx1 - LSB src
   -- 8xx2 - MSB dst
   -- 8xx3 - LSB dst
   -- 8xx4 - width
   -- 8xx5 - height
   -- 8xx6 - stride
   -- 8xx7 - 0..5 | !INC/DEC | !COPY/FILL
   -- 8xx8 - trigger
   --
   -- src, dst, width, height, stride are copied to dedicated counters when
   -- the DMA is triggered, thus the original values remain unchanged.

 

This will access VRAM at 10ns per byte, per read and write (but this will probably change a little in the future firmware, and will be slightly variable between 10ns to 30ns per byte).  So copying a byte will be 10ns read and 10ns write.  So, in 1us 50 bytes can be copied.  Clearing the screen can be done in about 16us.  Moving a 2K table takes about 40us.
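The timing figures can be reproduced with simple arithmetic. This sketch assumes the fixed 10ns read and 10ns write per byte quoted above (not the variable 10ns to 30ns mentioned for future firmware), and it assumes the "clear the screen" figure refers to a 768-byte name table costed at the copy rate; the names are illustrative only.

```python
READ_NS = 10   # per-byte VRAM read time quoted in the post
WRITE_NS = 10  # per-byte VRAM write time quoted in the post

def copy_time_us(num_bytes):
    """Microseconds to copy num_bytes at one read plus one write per byte."""
    return num_bytes * (READ_NS + WRITE_NS) / 1000

bytes_per_us = 1000 // (READ_NS + WRITE_NS)  # bytes copied per microsecond
screen_us = copy_time_us(768)                # 32x24 name table
table_2k_us = copy_time_us(2048)             # 2K table

print(bytes_per_us, screen_us, table_2k_us)
```

That gives 50 bytes per microsecond, roughly 15.4us for 768 bytes and about 41us for 2K, in line with the rounded numbers in the post.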

 

There is also the PIX instruction (replaces the XOP instruction) that is designed to read/write/update BML pixels, and can also calculate the GM2 byte to update from a pixel X,Y location.  This instruction should be documented in this thread, I hope, and I think there are some examples (I can dig some up as well).

 

         -- PIX XY,CMD
         -- Can only operate on 16K VRAM addresses.
         -- Can be written like this: XOP src,dst
         -- Uses XOP addressing modes for src (XY) and dst (CMD)
         -- SRC: XY is the pixel x,y location in 8:8 format.  Uses all source
         --      addressing modes.
         -- DST: MAxxRWCE xxOOxxPP
         -- M  - 1 = calculate the effective address for GM2 instead of the new bitmap layer,
         --          placing the VRAM address in the dst.
         --      0 = use the remainder of the bits for the new bitmap layer pixels
         -- A  - 1 = retrieve the pixel's BML effective address instead of setting a pixel,
         --          placing the VRAM address in the dst.
         --      0 = read or set a pixel according to the other bits
         -- R  - 1 = read current pixel into PP, only after possibly writing PP
         --      0 = do not read current pixel into PP
         -- W  - 1 = do not write PP
         --      0 = write PP to current pixel
         -- C  - 1 = compare OO with PP according to E, and write PP only if true
         --      0 = always write
         -- E  - 1 = only write PP if current pixel is equal to OO
         --      0 = only write PP if current pixel is not equal to OO
         -- OO   pixel to compare to existing pixel
         -- PP   new pixel to write, and previous pixel when reading
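As a reading aid for the bit layout above, here is a small helper that packs the PIX command word. It assumes the `MAxxRWCE xxOOxxPP` string is written most-significant bit first, so M is bit 15 and PP is bits 1..0; `pix_cmd` is a hypothetical name and the "x" bits are left as zero.

```python
def pix_cmd(m=0, a=0, r=0, w=0, c=0, e=0, oo=0, pp=0):
    """Pack the PIX CMD word, layout MAxxRWCE xxOOxxPP (MSB first, assumed).
    Note that W is inverted: w=0 writes PP, w=1 suppresses the write."""
    for flag in (m, a, r, w, c, e):
        if flag not in (0, 1):
            raise ValueError("flags must be 0 or 1")
    if not 0 <= oo <= 3 or not 0 <= pp <= 3:
        raise ValueError("OO and PP are 2-bit values")
    return ((m << 15) | (a << 14) | (r << 11) | (w << 10) |
            (c << 9) | (e << 8) | (oo << 4) | pp)

print(hex(pix_cmd(pp=3)))                    # plain write of pixel value 3
print(hex(pix_cmd(r=1, w=1)))                # read only, no write
print(hex(pix_cmd(c=1, e=1, oo=2, pp=1)))    # write 1 only where pixel == 2
```

With all flags left at zero the word simply writes PP to the pixel, which is the common case.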

 

 

 


  • 2 months later...

With the release of the MK1, and future MK2, I have inadvertently created a problem for myself related to firmware updates.  Specifically, the in-system updater now needs to be able to tell the difference between the original F18A (retroactively designated MK0), the MK1, and the future MK2.

 

For the last year the MK1 boards have been released without any way to be able to tell the difference between them and the MK0 from a host computer, so this is going to be a hurdle when the first firmware update is released.

 

I was looking at SR1 (Status Register 1) which holds the "VDP type" information, and I'm a little confused why I chose bits >E0 to indicate the F18A.  SR1 is used by the 9938/58 to indicate the VDP type, but there are also light-pen and other various bits in the register, and bits >C0 (used by the F18A ident) overlap with some of those.

 

L=light pen flag
F=light pen switch
V=version bits
B=blanking period (horz or vert)
H=horz interrupt flag

LFVVVVVH LF00000H 9938
LFVVVVVH LF00010H 9958
VVV000BH 111000BH F18A

 

Nothing like throwing common sense out the window on that one.  I don't recall WTF I was thinking, but I certainly did not consider future versions.  Technically the zeros in bits >1C are "don't care" for identifying the F18A, but in the register use spreadsheet they are shown as "0".  Thus it is possible existing software might use a mask of >FC on SR1 data, and expect to find >E0 for the F18A.

 

The 9938/58 could also report something close if the light pen inputs are active, with only bit >20 being clear in that case.

 

I don't know why I didn't stick to the five version bits >3E in the middle of the register...  I *could* change to that, but risk some compatibility problems with software written to detect the F18A.  Alternatively I could leave SR1 as-is, and use SR12 to indicate MK0, MK1, or MK2, but that is yet another register to implement, and more code to detect the version.

 

I think I would like to use something like this:

MK0 v19  1110000H   Original
MK1 v19  1110000H   Can't distinguish from MK0
MK0 v1a  xx10000H   Starting with v1a
MK1 v1a  xx10010H   Starting with v1a
MK2 v20  xx10100H   Initial release

 

This removes the conflict with the light pen flags of the 9938/58, and removes the combined (horz+vert) blanking flag (which was probably useless anyway since horz blanking happens much too fast for reading a status register).

 

A mask of >20 would indicate "F18A", and a mask of >1C would indicate 0, 4, or 8, for MK0, MK1, MK2 respectively (or 0, 1, 2 if you only consider the three masked bits).  But this would break any existing software looking for at least >E0 in SR1 for "F18A".  Alternatively I could leave bits >C0 set, and the new scheme would still work, if existing software is not expecting bits >1C to be zero.
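Here is a sketch of how host software might decode SR1 under the scheme proposed above. This reflects the proposal as described in this post only, not any released firmware, and it assumes the host is already talking to a VDP that has an SR1; `decode_sr1` and the table of names are invented for the example.

```python
F18A_ID_MASK = 0x20   # >20: set means "F18A" in the proposed scheme
BOARD_MASK = 0x1C     # >1C: three bits holding the board type
BOARDS = {0: "MK0", 1: "MK1", 2: "MK2"}

def decode_sr1(sr1):
    """Return (is_f18a, board_name) for an SR1 value under the proposal.
    board_name is None for values the proposal does not define."""
    is_f18a = bool(sr1 & F18A_ID_MASK)
    board = (sr1 & BOARD_MASK) >> 2
    return is_f18a, BOARDS.get(board)

print(decode_sr1(0x20))  # xx10000H pattern
print(decode_sr1(0x24))  # xx10010H pattern
print(decode_sr1(0x28))  # xx10100H pattern
print(decode_sr1(0xC2))  # 9958 ID with both light pen flags set
```

Because only bit >20 is tested for the ident, the 9938/58 light pen flags in >C0 no longer matter.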

 

If anyone has written F18A software that reads SR1 to check for the F18A, I would be interested in feedback.  I know @Tursi and @Asmusr have written such software, and any insight would be appreciated!

 

Edited by matthew180
technical corrections

1 hour ago, matthew180 said:

I was looking at SR2 (Status Register 2) which holds the "VDP type" information, and I'm a little confused why I chose bits >E0 to indicate the F18A.  SR2 is used by the 9938/58 to indicate the VDP type, but there are also light-pen and other various bits in the register, and bits >C0 (used by the F18A ident) overlap with some of those.

You mean SR1, right?


Yes, SR1 (the 2nd status register... :P ).  Thanks, I fixed the post.  I also checked the Mario Bros code, which was the software that brought the MK1 problem to my attention, and it does mask with >F0 and compare with >E0.  I'm testing (well, Arcadeshopper is testing) a version that uses >20 for the mask and compare, but I think I might have to just put back the >C0 bits in SR1.
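To see which existing checks would survive, here is a quick sketch that runs the legacy "mask then compare with >E0" test against candidate SR1 values, with and without the >C0 bits kept. The SR1 values are built from the bit patterns discussed in this thread with the H flag clear; the helper names are hypothetical.

```python
def legacy_check(sr1, mask):
    """Legacy detection: mask SR1, then compare against >E0."""
    return (sr1 & mask) == 0xE0

BOARD_BITS = {"MK0": 0x00, "MK1": 0x04, "MK2": 0x08}  # field in mask >1C

def survey(keep_c0):
    """Map mask -> {board: passes legacy check} for one SR1 layout."""
    base = 0xE0 if keep_c0 else 0x20
    return {
        mask: {name: legacy_check(base | bits, mask)
               for name, bits in BOARD_BITS.items()}
        for mask in (0xE0, 0xF0, 0xFC)
    }

with_c0 = survey(keep_c0=True)
without_c0 = survey(keep_c0=False)
print(with_c0)
print(without_c0)
```

With >C0 kept, masks >E0 and >F0 still pass on all three boards, while a >FC mask only passes on MK0. With >C0 dropped, every legacy check fails.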

Edited by matthew180

29 minutes ago, matthew180 said:

Yes, SR1 (the 2nd status register... :P )

I started by using the GPU detection method (TI Scramble), then switched to the SR1 method (Mario demo, raycaster demo, Power Strike demo are a few examples), and in the newer stuff (ZXQ-ONE and Karts demo) I haven't bothered to detect the F18A at all.

 

Edit: I never implemented the HF and BLANKING bits in JS99er, so I have never used those.

 

It appears to be more common to use SR4 for v9938/58 detection (https://forums.atariage.com/topic/207586-f18a-programming-info-and-resources/?do=findComment&comment=2837251).

 

Edited by Asmusr

GPU detection will not be affected, and neither will not checking at all. ;)  It is the SR1 tests that would be a problem.  But since the number of programs that use SR1 detection is n > 1, I'll probably have to put the >E0 bits back and just call it a day.

 

Yeah, I should probably re-read the early posts in this thread, since I might answer my own questions.  It probably has more to do with detecting the 9918A vs F18A, and what you would get back on a 9918A if you read the status register.  For 9918A detection, if you wait for an interrupt and read SR0 twice, then bits >E0 in SR0 will never be set on the 2nd read.  Maybe that's why I used those bits...


12 hours ago, matthew180 said:

If anyone has written F18A software that reads SR1 to check for the F18A, I would be interested in feedback.  I know @Tursi and @Asmusr have written such software, and any insight would be appreciated!

My process in Ghostbusters & Alex Kidd is as follows:

  1. Try unlocking the F18A
  2. Set r1 to blank the screen and disable interrupts
  3. Read sr0 once to ensure all flags are cleared
  4. Read sr14 into a variable
  5. Read sr0 again, compare against the variable. If it hasn't changed, no F18A is present.
  6. If it has changed, the variable contains the F18A FW version and I check if it's new enough (typically 1.6 or higher for the features I use).

 

*edit* Looking at js99er just now, the approach above will break if you ever release a version 1f... :-/. Not sure if this is what a real tms9918a does, but js99er apparently reports 0x1f as the "last sprite" value when there's no sprite overflow.
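The compare step and the 1f collision can be illustrated with a toy model. This is not a hardware or emulator model: `FakeVDP` is invented for the example, the idle SR0 value of 0x1F is taken from the js99er observation above, and the firmware version is assumed to come back from SR14 as a plain byte such as 0x19. Steps 1 and 2 (unlock, blank the screen) are left out.

```python
class FakeVDP:
    """Toy status port. With no F18A, every status read returns SR0."""
    IDLE_SR0 = 0x1F  # assumed idle value, per the js99er observation

    def __init__(self, f18a_version=None):
        self.f18a_version = f18a_version

    def read_status(self, reg):
        if self.f18a_version is not None and reg == 14:
            return self.f18a_version
        return self.IDLE_SR0

def detect_f18a(vdp):
    """Steps 3 to 6 of the list above: return the version byte, or None."""
    vdp.read_status(0)               # step 3: clear pending flags
    candidate = vdp.read_status(14)  # step 4: SR14 (or SR0 on a 9918A)
    baseline = vdp.read_status(0)    # step 5: SR0 again
    if candidate == baseline:
        return None                  # unchanged: treated as "no F18A"
    return candidate                 # step 6: caller checks the version

print(detect_f18a(FakeVDP()))
print(detect_f18a(FakeVDP(0x19)))
print(detect_f18a(FakeVDP(0x1F)))
```

In this model a version byte equal to the idle SR0 value is indistinguishable from a plain 9918A, which is the failure case described in the edit.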

Edited by TheMole
clarify this might not actually be a future proof way of detecting the F18A

14 hours ago, matthew180 said:

Yes, SR1 (the 2nd status register... :P ).  Thanks, I fixed the post.  I also checked the Mario Bros code, which was the software that brought the MK1 problem to my attention, and it does mask with >F0 and compare with >E0.  I'm testing (well, Arcadeshopper is testing) a version that uses >20 for the mask and compare, but I think I might have to just put back the >C0 bits in SR1.

new build works with the new firmware ..


7 hours ago, TheMole said:

Read sr14 into a variable

 

You should really use SR1 for this, since you don't know if you have an F18A yet.

 

I think I know why I used bits >E0 of SR1 for the F18A, so those will remain constant.  For any releases going forward the three zero bits >1C will contain the F18A type, i.e. MK0, MK1, MK2.

 

7 hours ago, TheMole said:

Looking at js99er just now, the approach above will break if you ever release a version 1f... :-/.

 

I cannot promise I won't go over V1F, but once the MK2 is ready the version for all F18As will change to V2.0.

 

7 hours ago, TheMole said:

Not sure if this is what a real tms9918a does, but js99er apparently reports 0x1f as the "last sprite" value when there's no sprite overflow.

 

For the sprite number in SR0 bits >1F, the real 9918A (and F18A) will retain whatever was the highest processed sprite on a line which will depend on:

 

1. If the >D0 byte is set in the sprite table to stop sprite processing at a certain value.

2. If five or more sprites are on a line during the frame.

 

Keep in mind that the sprite number is only valid if the 5S flag is set.  IIRC the sprite number has been characterized to follow the sprite counter unless the 5S flag gets set, at which point it always retains the 5th sprite number until SR0 is read.  So, it is possible that the value is always >1F depending on what happens during the frame, but I don't remember if it gets cleared when reading SR0 (I don't think it does).

 


I only use the GPU method to detect the F18A, since to date I've not needed to be worried about which one. 

 

I think I use the blanking bit for the scanline palette images, though -- at least, the GPU reads address >7000 to get the scanline number and the blanking bit. My comments note the blanking bit is active for both horizontal and vertical blank, so I assume it's the one you are talking about. (Mind you, if it stays available from the GPU that's all that matters to my code, but I wonder if that complicates things.)

 


On 5/18/2024 at 5:43 PM, matthew180 said:

So, it is possible that the value is always >1F depending on what happens during the frame, but I don't remember if it gets cleared when reading SR0 (I don't think it does).

I tested this a fair bit on hardware when we were trying to debug Miner 2049'er.

 

As you noted, it counts up every single scanline as sprites are being processed, and by the end of the scanline it represents the highest sprite processed. This means if >D0 is in the sprite list before the end, the maximum value is lower than >1F.

 

If 5 sprites are detected on a line, the 5S bit is set and the fifth sprite bits no longer track the processing. 5S is never set while the vertical blank bit is set.

 

This means if you randomly read the status register, you could conceivably see many values in the fifth sprite count. But by end of frame it will be locked at the last sprite processed (or if 5 sprites on a line occurs). 

 

(My notes say I didn't actually test what the fifth sprite value is when all 32 sprites are active - is it >1F or >00, and that there was some debate at whatever time we were doing this... sounds like people are seeing >1F).

 

When you read VDPS only the top three bits (Frame (interrupt), 5S and Collision) are cleared... but it's probably hard to prove the fifth sprite bits unless you test it during blank since they'll immediately begin counting again. I wrote this as tested on hardware and this is also what the manual says, but I didn't try this exact test (I don't think). Since the manual is vague on the operation of the fifth sprite bits when 5S is not set, it's not impossible the whole byte is cleared, but for Classic99 at least I assume it's not.
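The behaviour described above can be written down as a simplified model. It is a sketch of the description in this thread, not a verified hardware model: sprites are 8 lines tall, Y wrap-around and early clock are ignored, the counter is assumed to follow the sprite index being processed, and the all-32-sprites result is an assumption. `frame_status` is a made-up name.

```python
def frame_status(sprite_ys, height=8, lines=192):
    """Return (five_s, sprite_number) at end of frame for a list of sprite
    Y values in sprite-table order. A Y of 0xD0 stops processing."""
    five_s = False
    number = 0
    for line in range(lines):
        on_line = 0
        for index, y in enumerate(sprite_ys[:32]):
            if not five_s:
                number = index        # counter follows processing
            if y == 0xD0:
                break                 # terminator: stop this scanline
            if 0 <= line - (y + 1) < height:
                on_line += 1
                if on_line == 5 and not five_s:
                    five_s = True     # latch the fifth sprite number
                    number = index
    return five_s, number

print(frame_status([0xD0]))              # terminator in sprite 0
print(frame_status([10, 20, 30, 0xD0]))  # no overflow, stops at sprite 3
print(frame_status([10] * 5 + [0xD0]))   # five sprites on one line
```

In this model a >D0 in sprite 0 leaves the number at 0 with 5S clear, and an overflow latches the index of the fifth sprite on the first offending line.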

 


10 hours ago, Tursi said:

(Mind you, if it stays available from the GPU that's all that matters to my code, but I wonder if that complicates things.)

 

Yes, it would remain available to the GPU.  I'm mostly considering its usefulness to the host-side since the bit currently includes horz sync, which would make it too fast to be useful, I think.  There is a horz interrupt for that kind of thing on the host side.

 

6 hours ago, TheMole said:

So, do I understand it correctly that setting sprite #0's y location to 0xd0 would make the fifth sprite value predictably 0 (or perhaps 1)?

 

Yes, that should be correct, and I would expect it to be zero.  And the 5S flag would never be set.

