
Bird's nest...



11 minutes ago, carlsson said:

When you overlay video, does everything on the signal become visible, or does the STIC cut off (??) the rightmost pixel, just as it doesn't display its own rightmost pixel?

The full horizontal line will be displayed since the video circuitry counts pixels and only resets on a video blanking signal. Since the Inty displays an overscan border, blanking will come long after each pixel has been displayed.

 

That said, a different kind of chopping takes place: the STIC pulls SR1 low after the 191st scan line, midway through the last pair of scan lines. Since my video circuitry depends on using SR1 to detect a new frame, it means that I can't display more than 191 scan lines even though the STIC displays 192. This has a couple of implications. When you display 24 rows of text, for example, the bottom scan line of the bottom row won't get displayed. Also, in large-pixel graphics modes like the one shown above, even though the resolution is 160x96, the 96th row only has one scan line displayed instead of two.
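
To make the arithmetic concrete, here is a minimal C sketch of how many scan lines of a given row end up visible when only 191 lines can be shown. The constant and function names are made up for illustration and are not taken from the actual firmware.

```c
#define OVERLAY_SCAN_LINES 191  /* lines shown before SR1 goes low */

/* Number of visible scan lines for row `row` (0-based) when each row
 * is `lines_per_row` scan lines tall. */
int visible_lines_in_row(int row, int lines_per_row)
{
    int first = row * lines_per_row;     /* first scan line of the row */
    int last  = first + lines_per_row;   /* one past its last scan line */

    if (first >= OVERLAY_SCAN_LINES) return 0;
    if (last > OVERLAY_SCAN_LINES) last = OVERLAY_SCAN_LINES;
    return last - first;
}
```

With 8-line text rows, row 23 (the 24th) gets 7 lines; with 2-line pixels, row 95 (the 96th) gets 1.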

Edited by JohnPCAE

Assuming that I have enough space to implement them all, these are the screen modes I'm planning:

 

#define SCREEN_MODE_40_COLS      0
#define SCREEN_MODE_80_COLS      1
#define SCREEN_MODE_160_96_2     2
#define SCREEN_MODE_160_96_16    3
#define SCREEN_MODE_160_96_256   4
#define SCREEN_MODE_160_96_TRUE  5
#define SCREEN_MODE_160_191_2    6
#define SCREEN_MODE_160_191_16   7
#define SCREEN_MODE_160_191_256  8
#define SCREEN_MODE_160_191_TRUE 9
#define SCREEN_MODE_320_191_2    10
#define SCREEN_MODE_320_191_16   11
#define SCREEN_MODE_320_191_256  12
#define SCREEN_MODE_640_191_2    13
#define SCREEN_MODE_640_191_16   14
#define SCREEN_MODE_640_96_256   15

 

The first two modes are text modes and are implemented. Also, the 2-color modes are implemented, as well as the true-color modes. 160x96 true-color uses four bytes per pixel, where each byte contains an NTSC timeslice value from 0-255. 160x191 true-color mode uses two bytes per pixel, where the four NTSC timeslice values each take up four bits. It therefore has less color fidelity than the other true-color mode but still looks amazing.
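
As a rough illustration of the two true-color layouts, the C sketch below computes the frame size for each and packs four 4-bit timeslice values into two bytes. The function names are hypothetical, and the nibble order is an assumption, since the post doesn't say which slice goes where.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a frame with a whole number of bytes per pixel. */
size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Pack four 4-bit timeslice values into two bytes (160x191 true-color).
 * ASSUMPTION: slice 0 sits in the high nibble of byte 0, slice 1 in the
 * low nibble, and so on. The real ordering isn't stated. */
void pack_pixel_4bit(const uint8_t slice[4], uint8_t out[2])
{
    out[0] = (uint8_t)(((slice[0] & 0x0F) << 4) | (slice[1] & 0x0F));
    out[1] = (uint8_t)(((slice[2] & 0x0F) << 4) | (slice[3] & 0x0F));
}
```

160x96 at four bytes per pixel comes to 61,440 bytes and 160x191 at two bytes per pixel comes to 61,120 bytes, so both stay under 64k.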

 

Having enough space is an issue because all code needs to run in RAM for the Pi Pico to be able to keep up, and there is a lot less RAM available than there is flash memory, especially since I'm setting aside 64k bytes (32k words) for the video buffer/character RAM shared area. The buffer will be exposed at $D000 when the correct register bit is set and is available as a set of eight pages. The text-mode code uses a ton of space with giant switch statements because it's the only way to get the code to run fast enough to keep up with the raster beam, but for graphics modes I can use loop unrolling since there isn't multiple mode mixing like there is in the text modes. Hopefully there will be enough space for the 16-color and 256-color modes. All non-true-color modes will use the color palette for selecting colors: for example, the two-color modes use palette entries 0 and 1, the 16-color modes use the first sixteen palette entries, and so on.
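
One way to picture the paged window is the C sketch below. It assumes the 32k words are split evenly into eight 4k-word pages that appear at $D000-$DFFF and sit back to back in the buffer; the post only gives the base address and the page count, so the window size, the layout, and the names are all assumptions.

```c
#include <stdint.h>

#define WINDOW_BASE 0xD000u  /* from the post */
#define PAGE_WORDS  0x1000u  /* ASSUMPTION: 32k words / 8 pages */
#define PAGE_COUNT  8u

/* Translate an Inty address inside the window, plus the selected page,
 * into a word offset in the shared buffer. Returns -1 if out of range. */
int32_t buffer_offset(uint16_t inty_addr, uint8_t page)
{
    if (page >= PAGE_COUNT) return -1;
    if (inty_addr < WINDOW_BASE || inty_addr >= WINDOW_BASE + PAGE_WORDS)
        return -1;
    return (int32_t)(page * PAGE_WORDS + (inty_addr - WINDOW_BASE));
}
```

Under those assumptions, the last word of page 7 ($DFFF) lands at word offset $7FFF, the end of the 32k-word buffer.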

 

As for the test patterns in my screenshots, I'm generating those with temporary code in the Pico rather than in Inty code. It's just a lot easier for testing.

Edited by JohnPCAE

Hi, John,
 

Just for reference, since it's been a while since you started this thread, could you remind us how the final video module connects to the Master Component?  Is it via the cartridge port or directly to the main board inside?

 

What sort of modifications to a stock console does it require?

 

    dZ.


23 minutes ago, JohnPCAE said:

It plugs into the cartridge port. It doesn't require any modification of the console to work per se, but on an unmodified Inty I the overlay video will be noticeably dimmer. You'll get much better results after doing the System Changer mod.

Alright, that's what I sort of remember you mentioning before.  So, it works on a stock Inty II, and requires the System Changer mod for an Inty I.  ??
 

    dZ.


This is what I'm looking at for the (hopefully final) version of the board. Like the one I'm currently testing, everything is contained on only a single board, with the Pi Pico doing the heavy lifting. The major change here is the user port: instead of a nonstandard port I've changed it to a standard 25-pin bidirectional parallel port. It isn't ECP or EPP, it's just a standard port, though the eight data pins can be set to input instead of output. It has one nonstandard restriction in that the four control lines (nStrobe, nAuto-Linefeed, nInitialize, and nSelect-Printer) are write-only. So you can't use them for getting inputs but it isn't necessary since it supports reading from the data lines.

 

EDIT: Well, strictly speaking, there is a second board. It's the small board that plugs into the Master Component. It's special in that it has a PLL circuit that multiplies MCLK by 4 to generate the ~14MHz clock that the board here needs for color output.

 

OverlayBoard_1_1.png

IntyQuadClockTap.png

Edited by JohnPCAE

While waiting for my updated board design, I've been busy with the software:

 

1. Removed per-column blanking. Since the virtual width, height, and starting memory location are independent of what gets displayed, the feature isn't necessary. Removing it also alleviated some timing issues.

2. Increased the available memory size from 64k to 96k. That's pretty much as high as it can go since all of the code is taking up RAM. Running from flash is just way too slow.

3. Added something really crazy and highly experimental. While it appears to be working so far, I can't guarantee that it will work in all cases:

 

That really crazy something is a CP1600 CPU emulator that will run when the Inty is idle. Specifically, it runs under the following conditions:

 

- during NACT cycles if the Pi Pico didn't have to put something on the Inty's bus right beforehand (this is because the Pico needs to do something else during those NACT cycles)

- during DW, INTAK, and IAB
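
The two conditions above boil down to a small predicate. Here is a C sketch of it; the enum values are arbitrary labels rather than the real bus control encodings, and the function name is invented for illustration.

```c
#include <stdbool.h>

/* CP1600 bus phases. The numeric values are arbitrary labels here,
 * not the actual bus control line encodings. */
typedef enum {
    BUS_NACT, BUS_BAR, BUS_IAB, BUS_DWS,
    BUS_ADAR, BUS_DW, BUS_DTB, BUS_INTAK
} bus_phase_t;

/* May the emulator execute an instruction during this bus cycle? */
bool emulator_may_run(bus_phase_t phase, bool pico_drove_bus_last_cycle)
{
    switch (phase) {
    case BUS_NACT:
        /* Only if the Pico isn't still busy after driving the bus. */
        return !pico_drove_bus_last_cycle;
    case BUS_DW:
    case BUS_INTAK:
    case BUS_IAB:
        return true;
    default:
        return false;
    }
}
```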

 

It shares the 96k bytes used by the video memory, so it has an effective address space of 0000-BFFF (obviously it can't see anything on the Inty's hardware bus). Its advantages are that it runs parallel to the real CPU and that all of its instructions run in a single cycle (at least I hope so, or things can go haywire if the Pico falls behind on the next cycle). So far I've only tested it with a tiny looping program that repeatedly increments a single byte in its address space (which, since it's shared with video memory, can be seen on the screen).

 

It can be used for a number of things since it's emulating a general-purpose CPU, and if it turns out to work well I can see it being used for quite a few acceleration tasks. Also, it has the potential to be more powerful than a real CP1600 since I can pretty easily extend its opcodes past ten bits (though I'm limited by the Pico's overall memory constraints since adding code takes up RAM). It even supports BEXT branches and external interrupts since you can use the board's registers to specify EBCA0-3 pins and an interrupt vector, as well as trigger an interrupt. I'm not sure why one would need to do this, but it's possible. The only ignored instructions are SIN and TCI, since there are no actual hardware lines that they would toggle. Those are essentially no-ops.

 

I plan to do some more testing with it and if I run into timing overrun problems I'll investigate splitting some instructions across two cycles. I'm most concerned about instructions that set all four flags since those require extra code. To date I've only tested it with something simple that continually increments a character value on the screen:

 

$6000:

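MVI   $100, R0      ; load the word at $100
MOVR  R0, R1        ; copy it to R1
ANDI  #$00FF, R0    ; R0 = low byte
ANDI  #$FF00, R1    ; R1 = high byte (not merged back in this test)
INCR  R0            ; bump the character value
MVO   R0, $100      ; store it back
B     $6000         ; loop forever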

 

This would be a LOT easier if the Pi Pico had more than two cores. This works by having the core that is responsible for monitoring the Inty's bus run the emulator when it's idle. The other core is needed entirely for driving the display and it ***barely*** has enough horsepower to pull it off in certain modes (80-column mode and 4-color characters in 40-column mode really strain it).

 

That said, running the emulator in the same core as the bus monitor has one huge advantage: there are no issues with memory conflicts. If I ran it in a separate core there would always be the possibility that two cores might try to write to the same location at the same time, leading to unpredictable results. That said, anyone who had their real CP1600 try to write to the same location as a running emulated one pretty much deserves it ;)

 

On a side note, one thing that continually drives me crazy with respect to timing is the compiler and how it handles switch statements. It's highly inconsistent. Put your switch in one place and it runs within a certain time, move it elsewhere and it runs a lot slower. The way that gcc compiles switch statements frankly sucks. For my emulator I had to implement a custom dispatcher to achieve consistent results even though it's slower than the fastest possible implementation if I did it in straight assembly. I can't do this with the graphics output code however because the overhead is just too great. I have to rely on gcc to compile it efficiently instead. Finding the magic formula of getting efficient display code has been a real challenge. Now that it's working I'm considering that code frozen.
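
For readers wondering what a dispatcher that avoids switch statements can look like, here is a minimal C sketch using a table of function pointers indexed by the 10-bit opcode. This is one common technique and not necessarily the one used on the board; the struct, the handler names, and the opcode range chosen for INCR are placeholders.

```c
#include <stdint.h>

#define OPCODE_BITS  10
#define OPCODE_COUNT (1u << OPCODE_BITS)   /* 1024 table entries */

typedef struct {
    uint16_t r[8];   /* R0-R7 */
} cpu_t;

typedef void (*handler_t)(cpu_t *cpu, uint16_t opcode);

static void op_nop(cpu_t *cpu, uint16_t opcode)  { (void)cpu; (void)opcode; }
static void op_incr(cpu_t *cpu, uint16_t opcode) { cpu->r[opcode & 7]++; }

static handler_t dispatch[OPCODE_COUNT];

/* Fill the table once at startup. */
void dispatch_init(void)
{
    for (unsigned i = 0; i < OPCODE_COUNT; i++)
        dispatch[i] = op_nop;
    /* Placeholder opcode range for "INCR Rn": one entry per register. */
    for (unsigned reg = 0; reg < 8; reg++)
        dispatch[0x008 + reg] = op_incr;
}

/* One indexed load and one indirect call per instruction, so the cost
 * doesn't depend on where the compiler happened to place a switch. */
void execute(cpu_t *cpu, uint16_t opcode)
{
    dispatch[opcode & (OPCODE_COUNT - 1)](cpu, opcode);
}
```

A table like this costs 1024 pointers, which is 4 KB with 32-bit pointers, and that grows quickly if the opcode width is extended.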

Edited by JohnPCAE

To elaborate a little more on the emulator, here is how I see it being used:

 

Let's say you want to scroll a portion of the screen. Since the emulator effectively runs at roughly twice the speed of the real CPU, you can offload the scrolling routine to the emulated CPU and trigger it by writing to a register that tells it to execute at a particular location (the emulator doesn't run until you tell it to). You can put a HLT instruction at the end of the routine and poll one of the board registers until a bit is set telling you that execution has halted.

 

Now, there is a big caveat to this. Since the emulator cannot run on a NACT if the Pico had to put something on the bus right beforehand, polling its register will prevent the emulator from running on some NACT cycles. It will still run, since it can run on a NACT right after an instruction fetch or during the conditional branch in a polling loop, but it will be idle during the NACT cycle when its register is actually being polled by the bus. You can alleviate this somewhat by inserting some NOPs in your polling loop if timing isn't that critical.

 

Another way to use the emulator could be for accelerating calculations. Let's say you want to multiply some numbers. It's not implemented yet, but I plan on adding some extended opcodes like multiplies, inclusive ORs, and anything else I can add based on whether the additional code will fit in the Pico's remaining RAM and whether they can run within a single bus cycle. You could write your parameters either to registers or directly to memory, tell the emulated CPU to run a routine, and wait until it completes. You can either poll for a halt status, or if the runtime is deterministic, just wait with a series of NOP instructions until you know enough NACT cycles have transpired. Then just read the results from either memory or from the board's registers (reading from memory will always be faster).
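
The start-then-poll pattern can be sketched as follows. It is written in C purely for illustration (on the Inty it would be a CP-1600 loop), and the register layout, names, and bit position are entirely hypothetical, since the board's register map isn't given here.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL register block: names and bit positions are invented. */
typedef struct {
    volatile uint16_t start_addr;  /* write: emulator starts executing here */
    volatile uint16_t status;      /* read: STATUS_HALTED is set after HLT */
} emu_regs_t;

#define STATUS_HALTED 0x0001u

/* Tell the emulated CPU where to begin executing. */
void emu_start(emu_regs_t *regs, uint16_t addr)
{
    regs->start_addr = addr;
}

/* Poll until the halted bit is set, giving up after max_polls reads.
 * Each read is a bus access the Pico must answer, so every poll costs
 * the emulator a NACT cycle; fewer, more widely spaced polls are better. */
bool emu_wait_halted(const emu_regs_t *regs, unsigned max_polls)
{
    while (max_polls-- > 0) {
        if (regs->status & STATUS_HALTED)
            return true;
    }
    return false;
}
```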

Edited by JohnPCAE

Hey guys, I'm in a help-needed situation. My new board design has arrived and I'm in the process of assembling one for testing. The issue I'm having is right-angle card edge connectors. I have one for myself, but since I have five boards, if everything tests out fine I'd like to send out some free samples to some people to let you evaluate them. The problem is getting the connectors for the cartridge port. I need connectors that extend past the edge of the board somewhat the way they do in the Inty, because the cartridge shell needs to be able to go around the connector housing. Does anyone have any suggestions as to how we can source these parts?

 

EDIT: I think EDAC part 392-044-558-201 might do the trick, but Mouser's minimum order is 100 and Digi-Key's is 25. I only want up to five since that's all the boards I have and I don't even know if they will fit the bill.

 

EDIT #2: I found a place where I can order fewer and I ordered 5, but I won't get them until the end of March :(

Edited by JohnPCAE

On 12/24/2021 at 3:58 AM, JohnPCAE said:

To elaborate a little more on the emulator, here is how I see it being used:

 

Let's say you want to scroll a portion of the screen. Since the emulator effectively runs at roughly twice the speed of the real CPU, you can offload the scrolling routine to the emulated CPU and trigger it by writing to a register that tells it to execute at a particular location (the emulator doesn't run until you tell it to). You can put a HLT instruction at the end of the routine and poll one of the board registers until a bit is set telling you that execution has halted.


 

I'm not sure I understand what this would accomplish.  I thought you said that the emulator has no access to the bus, which I took to mean that it cannot read or write to the console's memory.

 

Or are you talking about offloading all dynamic computations (like shifting GRAM cards, etc.)?
 

On 12/24/2021 at 3:58 AM, JohnPCAE said:

Now, there is a big caveat to this. Since the emulator cannot run on a NACT if the Pico had to put something on the bus right beforehand, polling its register will prevent the emulator from running on some NACT cycles. It will still run, since it can run on a NACT right after an instruction fetch or during the conditional branch in a polling loop, but it will be idle during the NACT cycle when its register is actually being polled by the bus. You can alleviate this somewhat by inserting some NOPs in your polling loop if timing isn't that critical.

 

Another way to use the emulator could be for accelerating calculations. Let's say you want to multiply some numbers. It's not implemented yet, but I plan on adding some extended opcodes like multiplies, inclusive ORs, and anything else I can add based on whether the additional code will fit in the Pico's remaining RAM and whether they can run within a single bus cycle. You could write your parameters either to registers or directly to memory, tell the emulated CPU to run a routine, and wait until it completes. You can either poll for a halt status, or if the runtime is deterministic, just wait with a series of NOP instructions until you know enough NACT cycles have transpired. Then just read the results from either memory or from the board's registers (reading from memory will always be faster).

Accelerated calculations sound like a very useful feature, although I shudder at the thought of having to compose the expression evaluator in CP-1600 Assembly. :(  I think it would be more useful to offer complete pre-fab mathematical functions, like square roots, trigonometry, vector multiplies, matrix transforms, etc.  I think this is what the LTO or JLP firmware offers.

 

   dZ.


In the example I mentioned above, by scrolling the screen I mean scrolling the *overlaid* screen data. The emulator has access to its internal memory, so if you're showing 40x25 (or 80x25) text in the overlay, the emulator can manipulate that video buffer twice as fast as the Inty could. Or, if you're using bitmap graphics mode, the emulator can manipulate that frame buffer twice as fast as the Inty could.
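
To show the kind of data movement involved, here is a C sketch of scrolling a text buffer up by one row. The real routine would be CP-1600 code run by the emulator; C is used only to show the idea, and the row-major layout with one 16-bit word per character cell is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Scroll a cols x rows text buffer up one row and blank the bottom row.
 * ASSUMPTION: row-major layout, one 16-bit word per character cell. */
void scroll_up(uint16_t *buf, size_t cols, size_t rows, uint16_t blank)
{
    if (rows == 0 || cols == 0) return;

    /* Move rows 1..rows-1 up to rows 0..rows-2 (regions overlap). */
    memmove(buf, buf + cols, (rows - 1) * cols * sizeof buf[0]);

    /* Clear the newly exposed bottom row. */
    for (size_t c = 0; c < cols; c++)
        buf[(rows - 1) * cols + c] = blank;
}
```

For a 40x25 screen that is 960 words moved plus 40 words cleared per scroll.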

 

So far, I've implemented the following extended instructions:

 

OR      RD, ADDR        ; RD |= [ADDR]
OR@     RM, RD          ; RD |= [RM]
ORI     DATA, RD        ; RD |= DATA
ORR     RS, RD          ; RD |= RS
SLLR    RD, RA          ; RD = RD shl RA
SLRR    RD, RA          ; RD = RD shr RA
SARR    RD, RA          ; RD = RD sar RA (arithmetic shift right)
ROLR    RD, RA          ; RD = RD rol RA
RORR    RD, RA          ; RD = RD ror RA
MULR    RS, RD          ; RD = low word of unsigned multiplication of RS * RD
IMULR   RS, RD          ; RD = low word of signed multiplication of RS * RD
MULRW   RA, RB          ; unsigned product of RA and RB is placed in R1:R0 (R1 = high word, R0 = low word)
IMULRW  RA, RB          ; signed product of RA and RB is placed in R1:R0 (R1 = high word, R0 = low word)
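
As a reading aid for the wide multiplies, here is a C sketch of the semantics described in the comments above: a 16x16 multiply whose 32-bit result is split across R1 (high word) and R0 (low word). The function names and the register-array representation are illustrative only, and flag behavior is left out because it isn't described.

```c
#include <stdint.h>

/* MULRW-style: unsigned 16x16 -> 32, high word to R1, low word to R0. */
void mulrw(uint16_t r[8], unsigned ra, unsigned rb)
{
    uint32_t product = (uint32_t)r[ra] * (uint32_t)r[rb];
    r[1] = (uint16_t)(product >> 16);
    r[0] = (uint16_t)(product & 0xFFFFu);
}

/* IMULRW-style: signed 16x16 -> 32, same register split. */
void imulrw(uint16_t r[8], unsigned ra, unsigned rb)
{
    int32_t  product = (int32_t)(int16_t)r[ra] * (int32_t)(int16_t)r[rb];
    uint32_t bits    = (uint32_t)product;   /* two's-complement bit pattern */
    r[1] = (uint16_t)(bits >> 16);
    r[0] = (uint16_t)(bits & 0xFFFFu);
}
```

For example, $FFFF times $FFFF gives $FFFE:$0001 unsigned, while $FFFF times 2 gives $FFFF:$FFFE signed (that is, -2).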

 

 

Edited by JohnPCAE

I've managed to squeeze in some more instructions, but I've been hitting the RAM limit so I'm not sure how much more I can get in. So far the list is up to the following:

 

OR      RD, ADDR   ; RD |= [ADDR]
OR@     RM, RD     ; RD |= [RM]
ORI     DATA, RD   ; RD |= DATA
TST     RA, ADDR   ; performs a bitwise AND and sets the S and Z flags based on the result
TST@    RM, RD     ; performs a bitwise AND and sets the S and Z flags based on the result
TSTI    DATA, RD   ; performs a bitwise AND and sets the S and Z flags based on the result
SLL     RD, ADDR   ; RD = RD shl [ADDR]
SLL@    RD, RA     ; RD = RD shl RA
SLL     RD, DATA   ; RD = RD shl DATA
SLR     RD, ADDR   ; RD = RD shr [ADDR]
SLR@    RD, RA     ; RD = RD shr RA
SLR     RD, DATA   ; RD = RD shr DATA
SAR     RD, ADDR   ; RD = RD sar [ADDR] (arithmetic shift right)
SAR@    RD, RA     ; RD = RD sar RA (arithmetic shift right)
SAR     RD, DATA   ; RD = RD sar DATA (arithmetic shift right)
ROL     RD, ADDR   ; RD = RD rol [ADDR]
ROL@    RD, RA     ; RD = RD rol RA
ROL     RD, DATA   ; RD = RD rol DATA
ROR     RD, ADDR   ; RD = RD ror [ADDR]
ROR@    RD, RA     ; RD = RD ror RA
ROR     RD, DATA   ; RD = RD ror DATA
ORR     RS, RD     ; RD |= RS
TSTR    RA, RB     ; performs a bitwise AND and sets the S and Z flags based on the result
SLLR    RD, RA     ; RD = RD shl RA
SLRR    RD, RA     ; RD = RD shr RA
SARR    RD, RA     ; RD = RD sar RA (arithmetic shift right)
ROLR    RD, RA     ; RD = RD rol RA
RORR    RD, RA     ; RD = RD ror RA
MULR    RS, RD     ; RD = low word of unsigned multiplication of RS * RD
IMULR   RS, RD     ; RD = low word of signed multiplication of RS * RD
MULRW   RA, RB     ; unsigned product of RA and RB is placed in R1:R0 (R1 = high word, R0 = low word)
IMULRW  RA, RB     ; signed product of RA and RB is placed in R1:R0 (R1 = high word, R0 = low word)

The code doesn't take up all that much space, but the dispatch tables take up a lot and that's the driver of whether I run out of RAM.
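
For anyone unsure what the TST and SAR entries in the list mean in practice, here is a C sketch of the semantics as described in the comments: a bitwise AND that only sets S and Z, and a 16-bit arithmetic shift right. The names are illustrative, and clamping the shift count to 15 is an assumption, since the handling of large counts isn't specified.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool s;   /* sign: bit 15 of the result */
    bool z;   /* zero: result is 0 */
} flags_t;

/* TST-style: AND the operands, keep only the flags, discard the result. */
flags_t tst16(uint16_t a, uint16_t b)
{
    uint16_t result = a & b;
    flags_t f = { (result & 0x8000u) != 0, result == 0 };
    return f;
}

/* SAR-style: arithmetic shift right, copying the sign bit downward.
 * ASSUMPTION: counts above 15 behave like 15. */
uint16_t sar16(uint16_t value, unsigned count)
{
    if (count > 15) count = 15;
    uint16_t sign_fill =
        (value & 0x8000u) ? (uint16_t)~(0xFFFFu >> count) : 0;
    return (uint16_t)((value >> count) | sign_fill);
}
```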
