+DZ-Jay Posted December 13, 2021
What about the old Microsoft ANSI set? It would be awesome to be able to display some old school ANSI artwork.
dZ.
carlsson Posted December 13, 2021
If you mean Code page 437, it seems to be present as the first font?
+DZ-Jay Posted December 13, 2021
1 hour ago, carlsson said: "If you mean Code page 437, it seems to be present as the first font?"
Yes, that's it. I see it now. :)
JohnPCAE Posted December 15, 2021 Author
A wee bit of eye-candy: 160x96x32 true-color mode. Each pixel is represented by four bytes, specifying the four NTSC timeslice levels.
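A minimal sketch of the addressing this implies, assuming a row-major frame buffer with no padding between rows. The layout and every name below are assumptions for illustration, not taken from the Pico firmware. With four bytes per pixel, a full 160x96 frame comes to 61,440 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TC_WIDTH = 160, TC_HEIGHT = 96, TC_BYTES_PER_PIXEL = 4 };

/* Total size of one frame: 160 * 96 * 4 bytes. */
size_t tc_frame_bytes(void)
{
    return (size_t)TC_WIDTH * TC_HEIGHT * TC_BYTES_PER_PIXEL;
}

/* Byte offset of the first timeslice level of pixel (x, y).
   Row-major order with no row padding is an assumption. */
size_t tc_pixel_offset(unsigned x, unsigned y)
{
    assert(x < TC_WIDTH && y < TC_HEIGHT);
    return ((size_t)y * TC_WIDTH + x) * TC_BYTES_PER_PIXEL;
}

/* Store the four NTSC timeslice levels (0-255 each) for one pixel. */
void tc_put_pixel(uint8_t *frame, unsigned x, unsigned y,
                  const uint8_t levels[4])
{
    size_t off = tc_pixel_offset(x, y);
    for (int i = 0; i < 4; i++)
        frame[off + i] = levels[i];
}
```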
carlsson Posted December 15, 2021
When you overlay video, does everything on the signal get visible or does the STIC cut out (??) the rightmost pixel just like it doesn't display its own rightmost pixel?
JohnPCAE Posted December 15, 2021 Author (edited)
11 minutes ago, carlsson said: "When you overlay video, does everything on the signal get visible or does the STIC cut out (??) the rightmost pixel just like it doesn't display its own rightmost pixel?"
The full horizontal line will be displayed since the video circuitry counts pixels and only resets on a video blanking signal. Since the Inty displays an overscan border, blanking will come long after each pixel has been displayed.
That said, a different kind of chopping takes place: the STIC pulls SR1 low after the 191st scan line, midway through the last pair of scan lines. Since my video circuitry depends on using SR1 to detect a new frame, it means that I can't display more than 191 scan lines even though the STIC displays 192. This has a couple of implications. When you display 24 rows of text, for example, the bottom scan line of the bottom row won't get displayed. Also, in large-pixel graphics modes like the one shown above, even though the resolution is 160x96, the 96th row only has one scan line displayed instead of two.
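The two implications above can be checked with a little arithmetic. This is only a sketch: the function name is made up, and the figure of 8 scan lines per text row is an assumption derived from 192 / 24.

```c
#include <assert.h>

/* Only the first 191 of the STIC's 192 scan lines can be overlaid. */
enum { VISIBLE_SCANLINES = 191 };

/* Number of scan lines of `row` (0-based) that actually get displayed
   when every row is `lines_per_row` scan lines tall. */
unsigned visible_lines_in_row(unsigned row, unsigned lines_per_row)
{
    assert(lines_per_row > 0);
    unsigned first = row * lines_per_row;        /* first scan line of the row */
    if (first >= VISIBLE_SCANLINES)
        return 0;
    unsigned end = first + lines_per_row;        /* one past the last scan line */
    if (end > VISIBLE_SCANLINES)
        end = VISIBLE_SCANLINES;
    return end - first;
}
```

With 8-line text rows, the 24th row (index 23) keeps 7 of its 8 scan lines; with 2-line pixels, the 96th row (index 95) keeps 1 of its 2.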
JohnPCAE Posted December 15, 2021 Author (edited)
Assuming that I have enough space to implement them all, these are the screen modes I'm planning:
#define SCREEN_MODE_40_COLS 0
#define SCREEN_MODE_80_COLS 1
#define SCREEN_MODE_160_96_2 2
#define SCREEN_MODE_160_96_16 3
#define SCREEN_MODE_160_96_256 4
#define SCREEN_MODE_160_96_TRUE 5
#define SCREEN_MODE_160_191_2 6
#define SCREEN_MODE_160_191_16 7
#define SCREEN_MODE_160_191_256 8
#define SCREEN_MODE_160_191_TRUE 9
#define SCREEN_MODE_320_191_2 10
#define SCREEN_MODE_320_191_16 11
#define SCREEN_MODE_320_191_256 12
#define SCREEN_MODE_640_191_2 13
#define SCREEN_MODE_640_191_16 14
#define SCREEN_MODE_640_96_256 15
The first two modes are text modes and are implemented. Also, the 2-color modes are implemented, as well as the true-color modes. 160x96 true-color uses four bytes per pixel, where each byte contains an NTSC timeslice value from 0-255. 160x191 true-color mode uses two bytes per pixel, where the four NTSC timeslice values each take up four bits. It therefore has less color fidelity than the other true-color mode but still looks amazing.
Having enough space is an issue because all code needs to run in RAM for the Pi Pico to be able to keep up, and there is a lot less RAM available than there is flash memory, especially since I'm setting aside 64k bytes (32k words) for the video buffer/character RAM shared area. The buffer will be exposed at $D000 when the correct register bit is set and is available as a set of eight pages.
The text-mode code uses a ton of space with giant switch statements because it's the only way to get the code to run fast enough to keep up with the raster beam, but for graphics modes I can use loop unrolling since there isn't multiple mode mixing like there is in the text modes. Hopefully there will be enough space for the 16-color and 256-color modes.
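To make the difference between the two true-color encodings concrete, here is a hedged sketch of converting four 8-bit timeslice levels into the two-byte form and back. The nibble order (first level in the high nibble) and the function names are assumptions; the posts do not say how the firmware actually packs them.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce four 8-bit timeslice levels to two bytes by keeping the top
   four bits of each. Placing levels 0 and 2 in the high nibbles is an
   assumption about the layout. */
void tc_pack_4bit(const uint8_t levels[4], uint8_t out[2])
{
    out[0] = (uint8_t)((levels[0] & 0xF0) | (levels[1] >> 4));
    out[1] = (uint8_t)((levels[2] & 0xF0) | (levels[3] >> 4));
}

/* Expand the packed form back to 8-bit levels. Multiplying a nibble by
   17 maps 0x0..0xF onto 0x00..0xFF evenly. */
void tc_unpack_4bit(const uint8_t in[2], uint8_t levels[4])
{
    levels[0] = (uint8_t)((in[0] >> 4) * 17);
    levels[1] = (uint8_t)((in[0] & 0x0F) * 17);
    levels[2] = (uint8_t)((in[1] >> 4) * 17);
    levels[3] = (uint8_t)((in[1] & 0x0F) * 17);
}
```

By this arithmetic both true-color modes land just under the 64k-byte buffer: 160 x 96 x 4 = 61,440 bytes and 160 x 191 x 2 = 61,120 bytes.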
All non-true-color modes will use the color palette for selecting colors: for example, the two-color modes use palette entries 0 and 1, the 16-color modes use the first sixteen palette entries, and so on.
As for the test patterns in my screenshots, I'm generating those with temporary code in the Pico rather than in Inty code. It's just a lot easier for testing.
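A sketch of how a palette index might be pulled out of a packed row in the 2-, 16-, and 256-color modes, assuming 1, 4, and 8 bits per pixel with the leftmost pixel in the most significant bits. The bit order and the function names are assumptions, not something stated in the posts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Palette index of pixel `x` in a packed row.
   bpp is 1 (2 colors), 4 (16 colors), or 8 (256 colors).
   MSB-first pixel order within each byte is an assumption. */
unsigned palette_index(const uint8_t *row, unsigned x, unsigned bpp)
{
    assert(bpp == 1 || bpp == 4 || bpp == 8);
    unsigned per_byte = 8 / bpp;
    unsigned shift = (per_byte - 1 - x % per_byte) * bpp;
    unsigned mask = (1u << bpp) - 1u;
    return (row[x / per_byte] >> shift) & mask;
}

/* Bytes needed for one frame of a palette-indexed mode. */
size_t indexed_frame_bytes(unsigned width, unsigned height, unsigned bpp)
{
    return (size_t)width * height * bpp / 8;
}
```

Under these assumptions the largest listed modes (640x191 at 16 colors, 320x191 at 256 colors, 640x96 at 256 colors) come to 61,120 or 61,440 bytes, which fits the 64k-byte buffer mentioned above.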
JohnPCAE Posted December 16, 2021 Author
All graphics modes are now implemented and tested!
+DZ-Jay Posted December 16, 2021
Hi, John,
Just for reference, as it's been a while since you started this thread, could you remind us how the final video module connects to the Master Component? Is it via the cartridge port or directly to the main board inside? What sort of modifications to a stock console does it require?
dZ.
JohnPCAE Posted December 16, 2021 Author
It plugs into the cartridge port. It doesn't require any modification of the console to work per se, but on an unmodified Inty I the overlay video will be noticeably dimmer. You'll get much better results after doing the System Changer mod.
+DZ-Jay Posted December 16, 2021
23 minutes ago, JohnPCAE said: "It plugs into the cartridge port. It doesn't require any modification of the console to work per se, but on an unmodified Inty I the overlay video will be noticeably dimmer. You'll get much better results after doing the System Changer mod."
Alright, that's what I sort of remember you mentioning before. So, it works on a stock Inty II, and requires the System Changer mod for an Inty I?
dZ.
JohnPCAE Posted December 17, 2021 Author (edited)
This is what I'm looking at for the (hopefully final) version of the board. Like the one I'm currently testing, everything is contained on only a single board, with the Pi Pico doing the heavy lifting. The major change here is the user port: instead of a nonstandard port I've changed it to a standard 25-pin bidirectional parallel port. It isn't ECP or EPP, it's just a standard port, though the eight data pins can be set to input instead of output. It has one nonstandard restriction in that the four control lines (nStrobe, nAuto-Linefeed, nInitialize, and nSelect-Printer) are write-only. So you can't use them for getting inputs, but that isn't necessary since the port supports reading from the data lines.
EDIT: Well, strictly speaking, there is a second board. It's the small board that plugs into the Master Component. It's special in that it has a PLL circuit that multiplies MCLK by 4 to generate the ~14MHz clock that the board here needs for color output.
+Lathe26 Posted December 17, 2021
Just curious: what CAD program are you using?
JohnPCAE Posted December 17, 2021 Author
Eagle 7.7
JohnPCAE Posted December 24, 2021 Author (edited)
While waiting for my updated board design, I've been busy with the software:
1. Removed per-column blanking. Since the virtual width, height, and starting memory location are independent of what gets displayed, the feature isn't necessary. Removing it also alleviated some timing issues.
2. Increased the available memory size from 64k to 96k. That's pretty much as high as it can go since all of the code is taking up RAM. Running from flash is just way too slow.
3. Added something really crazy and highly experimental. While it appears to be working so far, I can't guarantee that it will work in all cases.
That really crazy something is a CP1600 CPU emulator that will run when the Inty is idle. Specifically, it runs under the following conditions:
- during NACT cycles if the Pi Pico didn't have to put something on the Inty's bus right beforehand (this is because the Pico needs to do something else during those NACT cycles)
- during DW, INTAK, and IAB
It shares the 96k bytes used by the video memory, so it has an effective address space of 0000-BFFF (obviously it can't see anything on the Inty's hardware bus). Its advantages are that it runs parallel to the real CPU and that all of its instructions run in a single cycle (at least I hope so, or things can go haywire if the Pico falls behind on the next cycle). So far I've only tested it with a tiny looping program that repeatedly increments a single byte in its address space (which, since it's shared with video memory, can be seen on the screen). It can be used for a number of things since it's emulating a general-purpose CPU, and if it turns out to work well I can see it being used for quite a few acceleration tasks. Also, it has the potential to be more powerful than a real CP1600 since I can pretty easily extend its opcodes past ten bits (though I'm limited by the Pico's overall memory constraints since adding code takes up RAM).
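The run conditions listed above can be written as a single predicate. This is a sketch of the rule as described, not the firmware's code: the enum, its ordering, and the function name are made up for illustration, and only the bus phase names come from the CP1600.

```c
#include <assert.h>
#include <stdbool.h>

/* CP1600 bus phases. The enum values here are arbitrary labels, not the
   actual BDIR/BC1/BC2 encodings. */
typedef enum {
    BUS_NACT, BUS_BAR, BUS_IAB, BUS_DWS,
    BUS_ADAR, BUS_DW, BUS_DTB, BUS_INTAK
} bus_phase_t;

/* True when the emulated CPU may execute one instruction in this cycle:
   on NACT only if the Pico did not drive the bus in the previous cycle,
   and always on DW, INTAK, and IAB. */
bool emulator_may_run(bus_phase_t phase, bool pico_drove_bus_last_cycle)
{
    switch (phase) {
    case BUS_NACT:
        return !pico_drove_bus_last_cycle;
    case BUS_DW:
    case BUS_INTAK:
    case BUS_IAB:
        return true;
    default:
        return false;
    }
}
```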
It even supports BEXT branches and external interrupts since you can use the board's registers to specify EBCA0-3 pins and an interrupt vector, as well as trigger an interrupt. I'm not sure why one would need to do this, but it's possible. The only ignored instructions are SIN and TCI, since there are no actual hardware lines that they would toggle. Those are essentially no-ops.
I plan to do some more testing with it and if I run into timing overrun problems I'll investigate splitting some instructions across two cycles. I'm most concerned about instructions that set all four flags since those require extra code. To date I've only tested it with something simple that continually increments a character value on the screen:
$6000: MVI R0, $100
       MOVR R0, R1
       ANDI R0, #$00FF
       ANDI R1, #$FF00
       INCR R0
       MVO R0, $100
       B $6000
This would be a LOT easier if the Pi Pico had more than two cores. This works by having the core that is responsible for monitoring the Inty's bus run the emulator when it's idle. The other core is needed entirely for driving the display and it ***barely*** has enough horsepower to pull it off in certain modes (80-column mode and 4-color characters in 40-column mode really strain it). That said, running the emulator in the same core as the bus monitor has one huge advantage: there are no issues with memory conflicts. If I ran it in a separate core there would always be the possibility that two cores might try to write to the same location at the same time, leading to unpredictable results. That said, anyone who had their real CP1600 try to write to the same location as a running emulated one pretty much deserves it.
On a side note, one thing that continually drives me crazy with respect to timing is the compiler and how it handles switch statements. It's highly inconsistent. Put your switch in one place and it runs within a certain time, move it elsewhere and it runs a lot slower. The way that gcc compiles switch statements frankly sucks.
For my emulator I had to implement a custom dispatcher to achieve consistent results even though it's slower than the fastest possible implementation if I did it in straight assembly. I can't do this with the graphics output code however because the overhead is just too great. I have to rely on gcc to compile it efficiently instead. Finding the magic formula of getting efficient display code has been a real challenge. Now that it's working I'm considering that code frozen.
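The post doesn't say how the custom dispatcher works. One common way to get the same cost for every opcode is a table of function pointers indexed by the 10-bit opcode, sketched below. This is a generic technique offered as an assumption, not a description of the actual firmware; the type names, handler names, and opcode values are all illustrative and have not been checked against the CP1600 opcode map.

```c
#include <assert.h>
#include <stdint.h>

enum { OPCODE_BITS = 10, OPCODE_COUNT = 1 << OPCODE_BITS };

/* Illustrative opcode values only. */
enum { OP_HALT = 0x000, OP_INCR_R0 = 0x008 };

typedef struct {
    uint16_t r[8];   /* emulated registers R0-R7 */
    int halted;
} cpu_t;

typedef void (*op_handler_t)(cpu_t *cpu, uint16_t opcode);

static void op_unimplemented(cpu_t *cpu, uint16_t opcode)
{
    (void)opcode;
    cpu->halted = 1;         /* stop rather than run off into the weeds */
}

static void op_halt(cpu_t *cpu, uint16_t opcode)
{
    (void)opcode;
    cpu->halted = 1;
}

static void op_incr_r0(cpu_t *cpu, uint16_t opcode)
{
    (void)opcode;
    cpu->r[0] = (uint16_t)(cpu->r[0] + 1);
}

/* One entry per possible 10-bit opcode. */
op_handler_t dispatch[OPCODE_COUNT];

void dispatch_init(void)
{
    for (int i = 0; i < OPCODE_COUNT; i++)
        dispatch[i] = op_unimplemented;
    dispatch[OP_HALT] = op_halt;
    dispatch[OP_INCR_R0] = op_incr_r0;
}

/* Every opcode costs one table load plus one indirect call, regardless
   of where the compiler places the handlers. */
void cpu_step(cpu_t *cpu, uint16_t opcode)
{
    dispatch[opcode & (OPCODE_COUNT - 1)](cpu, opcode);
}
```

With 4-byte pointers on the Pico's 32-bit cores, a 1024-entry table like this costs 4 KB of RAM before any handler code is counted.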
JohnPCAE Posted December 24, 2021 Author (edited)
To elaborate a little more on the emulator, here is how I see it being used:
Let's say you want to scroll a portion of the screen. Since the emulator effectively runs at roughly twice the speed of the real CPU, you can offload the scrolling routine to the emulated CPU and trigger it by writing to a register that tells it to execute at a particular location (the emulator doesn't run until you tell it to). You can put a HLT instruction at the end of the routine and poll one of the board registers until a bit is set telling you that execution has halted.
Now, there is a big caveat to this. Since the emulator cannot run on a NACT if the Pico had to put something on the bus right beforehand, polling its register will prevent the emulator from running on some NACT cycles. It will still run, since it can run on a NACT right after an instruction fetch or during the conditional branch in a polling loop, but it will be idle during the NACT cycle when its register is actually being polled by the bus. You can alleviate this somewhat by inserting some NOPs in your polling loop if timing isn't that critical.
Another way to use the emulator could be for accelerating calculations. Let's say you want to multiply some numbers. It's not implemented yet, but I plan on adding some extended opcodes like multiplies, inclusive ORs, and anything else I can add based on whether the additional code will fit in the Pico's remaining RAM and whether they can run within a single bus cycle. You could write your parameters either to registers or directly to memory, tell the emulated CPU to run a routine, and wait until it completes. You can either poll for a halt status, or if the runtime is deterministic, just wait with a series of NOP instructions until you know enough NACT cycles have transpired. Then just read the results from either memory or from the board's registers (reading from memory will always be faster).
JohnPCAE Posted December 26, 2021 Author (edited)
Hey guys, I'm in a help-needed situation. My new board design has arrived and I'm in the process of assembling one for testing. The issue I'm having is right-angle card edge connectors. I have one for myself, but since I have five boards, if everything tests out fine I'd like to send out some free samples to some people to let you evaluate them. The problem is getting the connectors for the cartridge port. I need connectors that extend past the edge of the board somewhat the way they do in the Inty, because the cartridge shell needs to be able to go around the connector housing. Does anyone have any suggestions as to how we can source these parts?
EDIT: I think EDAC part 392-044-558-201 might do the trick, but Mouser's minimum order is 100 and Digi-Key's is 25. I only want up to five since that's all the boards I have and I don't even know if they will fit the bill.
EDIT #2: I found a place where I can order fewer and I ordered 5, but I won't get them until the end of March.
+DZ-Jay Posted December 26, 2021
On 12/24/2021 at 3:58 AM, JohnPCAE said:
"To elaborate a little more on the emulator, here is how I see it being used: Let's say you want to scroll a portion of the screen. Since the emulator effectively runs at roughly twice the speed of the real CPU, you can offload the scrolling routine to the emulated CPU and trigger it by writing to a register that tells it to execute at a particular location (the emulator doesn't run until you tell it to). You can put a HLT instruction at the end of the routine and poll one of the board registers until a bit is set telling you that execution has halted."
I'm not sure I understand what this would accomplish. I thought you said that the emulator has no access to the bus, which I took to mean that it cannot read or write to the console's memory. Or are you talking about offloading all dynamic computations (like shifting GRAM cards, etc.)?
On 12/24/2021 at 3:58 AM, JohnPCAE said:
"Now, there is a big caveat to this. Since the emulator cannot run on a NACT if the Pico had to put something on the bus right beforehand, polling its register will prevent the emulator from running on some NACT cycles. It will still run, since it can run on a NACT right after an instruction fetch or during the conditional branch in a polling loop, but it will be idle during the NACT cycle when its register is actually being polled by the bus. You can alleviate this somewhat by inserting some NOPs in your polling loop if timing isn't that critical.
"Another way to use the emulator could be for accelerating calculations. Let's say you want to multiply some numbers. It's not implemented yet, but I plan on adding some extended opcodes like multiplies, inclusive ORs, and anything else I can add based on whether the additional code will fit in the Pico's remaining RAM and whether they can run within a single bus cycle.
"You could write your parameters either to registers or directly to memory, tell the emulated CPU to run a routine, and wait until it completes. You can either poll for a halt status, or if the runtime is deterministic, just wait with a series of NOP instructions until you know enough NACT cycles have transpired. Then just read the results from either memory or from the board's registers (reading from memory will always be faster)."
Accelerated calculations sounds like a very useful feature, although I shudder at the thought of having to compose the expression evaluator in CP-1600 Assembly. I think it would be more useful to offer complete pre-fab mathematical functions, like square roots, trigonometry, vector multiplies, matrix transforms, etc. I think this is what the LTO or JLP firmware offers.
dZ.
JohnPCAE Posted December 27, 2021 Author (edited)
In the example I mentioned above, by scrolling the screen I mean scrolling the *overlayed* screen data. It has access to its internal memory, so if you're showing 40x25 (or 80x25) text in the overlay, the emulator can manipulate that video buffer twice as fast as the Inty could. Or, if you're using bitmap graphics mode, the emulator can manipulate that frame buffer twice as fast as the Inty could.
So far, I've implemented the following extended instructions:
OR RD, ADDR ; RD |= [ADDR]
OR@ RM, RD ; RD |= [RM]
ORI DATA, RD ; RD |= DATA
ORR RS, RD ; RD |= RS
SLLR RD, RA ; RD = RD shl RA
SLRR RD, RA ; RD = RD shr RA
SARR RD, RA ; RD = RD sar RA (arithmetic shift right)
ROLR RD, RA ; RD = RD rol RA
RORR RD, RA ; RD = RD ror RA
MULR RS, RD ; RD = low word of unsigned multiplication of RS * RD
IMULR RS, RD ; RD = low word of signed multiplication of RS * RD
MULRW RA, RB ; unsigned product of RA and RB is placed in R1:R0 (R1 = high word, R0 = low word)
IMULRW RA, RB ; signed product of RA and RB is placed in R1:R0 (R1 = high word, R0 = low word)
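For readers unfamiliar with the notation, here is a sketch of what a few of these operations compute on 16-bit words, written as plain C functions. The function names mirror the mnemonics for readability only. Masking the shift count to 0-15 is an assumption, and flag updates are not modeled because the post doesn't describe them.

```c
#include <assert.h>
#include <stdint.h>

/* MULRW: unsigned 16x16 -> 32-bit product, split into high and low words. */
void mulrw(uint16_t a, uint16_t b, uint16_t *hi, uint16_t *lo)
{
    uint32_t p = (uint32_t)a * b;
    *hi = (uint16_t)(p >> 16);
    *lo = (uint16_t)p;
}

/* IMULRW: signed 16x16 -> 32-bit product, split the same way. */
void imulrw(uint16_t a, uint16_t b, uint16_t *hi, uint16_t *lo)
{
    int32_t p = (int32_t)(int16_t)a * (int16_t)b;
    uint32_t u = (uint32_t)p;
    *hi = (uint16_t)(u >> 16);
    *lo = (uint16_t)u;
}

/* SARR: arithmetic shift right, replicating bit 15 into vacated bits.
   The count is masked to 0..15 (an assumption). */
uint16_t sarr(uint16_t v, unsigned n)
{
    n &= 15;
    uint16_t fill = (v & 0x8000) ? (uint16_t)~(0xFFFFu >> n) : 0;
    return (uint16_t)((v >> n) | fill);
}

/* ROLR: rotate left within 16 bits. */
uint16_t rolr(uint16_t v, unsigned n)
{
    n &= 15;
    return n ? (uint16_t)((v << n) | (v >> (16 - n))) : v;
}
```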
+DZ-Jay Posted December 27, 2021
6 hours ago, JohnPCAE said: "In the example I mentioned above, by scrolling the screen I mean scrolling the *overlayed* screen data."
Ah, that is cool!
Mik's Arcade Posted December 27, 2021
holy crap
I have nothing of intelligence to add to this thread, but it never ceases to amaze me how many super talented people there are on this forum
JohnPCAE Posted December 27, 2021 Author
I've managed to squeeze in some more instructions, but I've been hitting the RAM limit so I'm not sure how much more I can get in. So far the list is up to the following:
OR RD, ADDR ; RD |= [ADDR]
OR@ RM, RD ; RD |= [RM]
ORI DATA, RD ; RD |= DATA
TST RA, ADDR ; performs a bitwise AND and sets the S and Z flags based on the result
TST@ RM, RD ; performs a bitwise AND and sets the S and Z flags based on the result
TSTI DATA, RD ; performs a bitwise AND and sets the S and Z flags based on the result
SLL RD, ADDR ; RD = RD shl [ADDR]
SLL@ RD, RA ; RD = RD shl RA
SLL RD, DATA ; RD = RD shl DATA
SLR RD, ADDR ; RD = RD shr [ADDR]
SLR@ RD, RA ; RD = RD shr RA
SLR RD, DATA ; RD = RD shr DATA
SAR RD, ADDR ; RD = RD sar [ADDR] (arithmetic shift right)
SAR@ RD, RA ; RD = RD sar RA (arithmetic shift right)
SAR RD, DATA ; RD = RD sar DATA (arithmetic shift right)
ROL RD, ADDR ; RD = RD rol [ADDR]
ROL@ RD, RA ; RD = RD rol RA
ROL RD, DATA ; RD = RD rol DATA
ROR RD, ADDR ; RD = RD ror [ADDR]
ROR@ RD, RA ; RD = RD ror RA
ROR RD, DATA ; RD = RD ror DATA
ORR RS, RD ; RD |= RS
TSTR RA, RB ; performs a bitwise AND and sets the S and Z flags based on the result
SLLR RD, RA ; RD = RD shl RA
SLRR RD, RA ; RD = RD shr RA
SARR RD, RA ; RD = RD sar RA (arithmetic shift right)
ROLR RD, RA ; RD = RD rol RA
RORR RD, RA ; RD = RD ror RA
MULR RS, RD ; RD = low word of unsigned multiplication of RS * RD
IMULR RS, RD ; RD = low word of signed multiplication of RS * RD
MULRW RA, RB ; unsigned product of RA and RB is placed in R1:R0 (R1 = high word, R0 = low word)
IMULRW RA, RB ; signed product of RA and RB is placed in R1:R0 (R1 = high word, R0 = low word)
The code doesn't take up all that much space, but the dispatch tables take up a lot and that's the driver of whether I run out of RAM.
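The TST family is described as an AND that only affects flags. A minimal sketch of that behavior on 16-bit words follows; the struct and function names are made up, and treating S as bit 15 of the result is an assumption based on the 16-bit word size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool s;   /* sign: bit 15 of the AND result (assumed) */
    bool z;   /* zero: the AND result is 0 */
} sz_flags_t;

/* Bitwise AND of the two operands; the result is discarded and only the
   S and Z flags are reported, so neither operand is modified. */
sz_flags_t tst(uint16_t a, uint16_t b)
{
    uint16_t r = a & b;
    sz_flags_t f = { (r & 0x8000) != 0, r == 0 };
    return f;
}
```

For example, testing a value against the mask $8000 sets S exactly when the top bit is set, without destroying the register the way ANDI would.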
+DZ-Jay Posted December 27, 2021
Any chance for arithmetic+branch instructions? Like, add/incr+branch on carry, subtract/decr+branch on zero.
JohnPCAE Posted December 28, 2021 Author
On 12/27/2021 at 4:42 PM, DZ-Jay said: "Any chance for arithmetic+branch instructions? Like, add/incr+branch on carry, subtract/decr+branch on zero."
Do you mean like the x86 LOOP instruction? Decrement a register and branch if not zero?
+DZ-Jay Posted December 28, 2021
30 minutes ago, JohnPCAE said: "Do you mean like the x86 LOOP instruction? Decrement a register and branch if not zero?"
Yeah, that sort of thing.
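For reference, the "decrement a register and branch if not zero" idea could behave roughly like the sketch below. No such instruction exists in the list above, so everything here is hypothetical: the names, the operand form, and the choice to leave flags alone. The only CP1600 fact relied on is that R7 is the program counter.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t r[8];   /* R0-R7; R7 is the program counter on the CP1600 */
} loop_cpu_t;

/* Hypothetical fused instruction: decrement `reg`, then jump to `target`
   if the result is nonzero. Returns 1 when the branch is taken.
   On the not-taken path a real implementation would advance the PC past
   the instruction; this sketch simply leaves R7 unchanged. */
int dec_branch_nz(loop_cpu_t *cpu, unsigned reg, uint16_t target)
{
    assert(reg < 7);                       /* R7 itself is not a valid counter */
    cpu->r[reg] = (uint16_t)(cpu->r[reg] - 1);
    if (cpu->r[reg] != 0) {
        cpu->r[7] = target;
        return 1;
    }
    return 0;
}
```

A counted loop then needs one instruction at the bottom instead of a DECR followed by a conditional branch, which would matter if each emulated instruction is meant to fit in a single bus cycle.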