Everything posted by ThomH
-
Composite video is a one-dimensional signal. It contains horizontal and vertical synchronisation marks: horizontal marks tell the beam to move back to the left, vertical marks tell it to move back to the top. The TIA automatically handles horizontal duties; vertical sync is the responsibility of the programmer. To communicate a vertical sync you need to enable a particular signal, wait an appropriate amount of time, then disable it. The NTSC standard says* you should leave the signal active for three lines. This game appears not to be leaving the proper amount of time, so the monitor doesn't realise what it's trying to communicate.

* more or less.
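For reference, the conventional frame-start sequence is something like the following. A minimal sketch in C-style pseudocode: poke() stands in for the plain stores to the TIA's memory-mapped registers that real 6502 code would make, and VSYNC/WSYNC are the standard TIA register names:

    poke(VSYNC, 2);   /* set bit 1: assert the vertical sync signal */
    poke(WSYNC, 0);   /* any store to WSYNC halts the CPU to the end of the line... */
    poke(WSYNC, 0);   /* ...so three of them hold sync for the three lines NTSC wants */
    poke(WSYNC, 0);
    poke(VSYNC, 0);   /* deassert vertical sync */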
-
Based on a quick check, it asserts the sync line for a really short period: cumulatively, only slightly more than a line and a half — slightly more than half the NTSC specification. I'll wager the 1702's tolerances require a little more. Of the other titles I tried at random: Combat, Pitfall, Night Driver, Pengo and River Raid all use identical periods of around 2.8 lines; Solaris is more like 2.6 lines; Pac-Man is just one colour cycle longer than Combat et al; Miner 2049er is around 2.4 lines; Joust is around 2.7 lines; Video Life is 4.6 lines (!). So they're mostly close to spec or over it.

I think, despite being an analogue dunce (so cut me down if you have to), that a common technique for discerning sync in the analogue days was allowing the sync level to charge up a capacitor and triggering a vertical retrace when it fills past a threshold, thereupon discharging. So over-extending sync likely wouldn't be as problematic as cutting it short.
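If my recollection of the analogue approach is right, the logic would be roughly as below. This is a toy model of my assumption, not anything taken from a real schematic, and every constant is a hypothetical placeholder:

    #define CHARGE_RATE    5000.0f    /* hypothetical: fill rate while at sync level */
    #define DISCHARGE_RATE 20000.0f   /* hypothetical: leak rate otherwise */
    #define THRESHOLD      1.0f

    typedef struct { float charge; } SyncCapacitor;

    /* Charge while the input sits at sync level, leak otherwise; the short
       horizontal pulses never accumulate enough charge, the long vertical
       pulses do. Returns 1 when a vertical retrace should be triggered. */
    int vertical_sync_detected(SyncCapacitor *cap, int is_sync_level, float dt) {
        if(is_sync_level) cap->charge += dt * CHARGE_RATE;
        else              cap->charge -= dt * DISCHARGE_RATE;
        if(cap->charge < 0.0f) cap->charge = 0.0f;
        if(cap->charge >= THRESHOLD) {
            cap->charge = 0.0f;
            return 1;
        }
        return 0;
    }

On that model a too-short pulse simply never reaches the threshold, while an over-long one merely triggers a little early.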
-
The quickest summary here would probably be: I don't know what I'm talking about. Naturally I have some knee-jerk responses along the lines of "if there are N possible mappings, use the fact that computers are fast to run all of them simultaneously and keep whichever looks to be doing the most probable things re: painting the screen, scanning the inputs, etc". If I were writing an Intellivision emulator, perhaps I would quickly find out why the existing examples don't do that. If the problem could be solved by somebody with no prior knowledge in an off-the-cuff forum post, it would have been solved a long time ago.
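To make the hand-waving concrete, a sketch in which everything is hypothetical (machine_create(), run_frame() and plausibility() most of all):

    typedef struct Machine Machine;
    extern Machine *machine_create(int mapping);   /* hypothetical */
    extern void run_frame(Machine *);              /* hypothetical */
    extern float plausibility(const Machine *);    /* hypothetical scoring: painted a frame? scanned inputs? */

    #define MAX_MAPPINGS 16
    #define TRIAL_FRAMES 600

    /* Instantiate one machine per candidate mapping, run them all in
       lockstep for a while, keep whichever behaved most like a real program. */
    int guess_mapping(int mapping_count) {
        Machine *machines[MAX_MAPPINGS];
        for(int c = 0; c < mapping_count; c++) machines[c] = machine_create(c);

        for(int frame = 0; frame < TRIAL_FRAMES; frame++)
            for(int c = 0; c < mapping_count; c++)
                run_frame(machines[c]);

        int best = 0;
        for(int c = 1; c < mapping_count; c++)
            if(plausibility(machines[c]) > plausibility(machines[best]))
                best = c;
        return best;
    }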
-
If we're looking for a common root, surely it's the idea of a Turing machine? If you know that two different machines are Turing complete then you know that, storage limitations aside, either can run the software of the other. Otherwise I don't think there's a single reason why; each case is distinct. A couple of obvious and prominent examples that predate the '90s video game emulation explosion closely enough that they may have more widely seeded the idea: the 68000 emulator that shipped with PowerPC Macs, and the BBC emulator that shipped with the Archimedes; both are examples of running code for a different processor at sufficient speed to closely recreate the experience of last year's machine.
-
If it helps to comment further: I have a unit test for my detection routines that runs through every ROM available on AtariAge and compares results against a hand-collated list; this test doesn't throw up any false positives or false negatives amongst that set.
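In outline the test is nothing fancier than the below; detect_paging() and the expected[] table are hypothetical names standing in for the real ones:

    #include <stdio.h>
    #include <string.h>

    struct Expectation { const char *rom_path; const char *paging; };
    extern struct Expectation expected[];            /* the hand-collated list */
    extern size_t expected_count;
    extern const char *detect_paging(const char *);  /* the routine under test */

    int main(void) {
        int failures = 0;
        for(size_t c = 0; c < expected_count; c++) {
            const char *result = detect_paging(expected[c].rom_path);
            if(strcmp(result, expected[c].paging)) {
                printf("FAIL %s: got %s, expected %s\n",
                       expected[c].rom_path, result, expected[c].paging);
                failures++;
            }
        }
        return failures ? 1 : 0;
    }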
-
I've tried not to look too much at Stella, but I guess we've ended up at almost the same tests. The only differences are that I'm introducing an intermediate form into my byte signature search, and using different signatures. I guess in theory I'd avoid some potential false positives by keeping everything PC aligned, but miss some genuine positives through failing statically to spot complicated branching strategies. There's at least one title that naively disassembles to only about ten instructions, because its reset program just pushes a couple of values to the stack and then performs an RTS, and my disassembler isn't that smart.

Quick tip on the RAM test though, as it could help you to eliminate the special case you mentioned for Dig Dug: I initially had more or less exactly that, but found it failed on the Dig Dug here on AtariAge. I switched to "do the first 128 bytes mirror the second 128?". That works correctly for all ROM images here, including Dig Dug: no false positives, no false negatives.

My working theory is that somebody read the cartridge without physical disassembly, in ascending address order: the first 128 bytes got whatever happened to be on the bus of whatever device they constructed, but then the same values were reported back because the read ports come after the write ports, and the write ports also captured whatever was on the bus. If you considered that as a strategy but discounted it for some reason, the warning would be appreciated.
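For concreteness, the test amounts to the following sketch, under my own naming rather than Stella's or my emulator's:

    #include <stdint.h>
    #include <string.h>

    /* Superchip-style RAM is inferred if the first 128 bytes of the image
       repeat as the second 128: per the dumping theory above, the write
       ports latched whatever was on the bus, and the read ports that
       follow echoed those same values back. */
    int has_superchip_ram(const uint8_t *rom, size_t length) {
        if(length < 256) return 0;
        return memcmp(rom, rom + 128, 128) == 0;
    }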
-
By coincidence I was working on this problem in the last week or so and found that Atari bank switching schemes can be discerned automatically ahead of time through the combination of disassembly + file size. It'd be interesting to see whether the same thing works for Intellivision titles, but it sounds like quite possibly not?
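As a sketch of the file-size half of that combination (the signature-based disassembly that disambiguates schemes of equal size is elided):

    #include <stddef.h>

    /* File size alone narrows the candidate Atari paging schemes considerably. */
    const char *candidate_schemes(size_t length) {
        switch(length) {
            case 2048:  return "unpaged (2k)";
            case 4096:  return "unpaged (4k)";
            case 8192:  return "F8, E0, 3F, etc; disassemble to pick one";
            case 12288: return "FA (CBS RAM Plus)";
            case 16384: return "F6, E7, etc";
            case 32768: return "F4, etc";
            default:    return "unknown";
        }
    }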
-
It's clearly likely to be a super-unreliable tip, but on my Mac I found that using the Finder to move all the files off a FAT drive and then move them back on had the effect of sorting them into name order*. So try that if FATsort ever becomes unmaintained or is otherwise unavailable.

* also the order I had the temporary folder they rested in set to display within the Finder, which may or may not be a coincidence.
-
Per the Digital Press list of scan lines generated per game, the NTSC version of RealSports Basketball generates a whopping 290 lines per frame. So it's probably just incompatible with your television; exactly on-spec NTSC video would be 262.5 lines, and most Atari games produce 262 or 263 as a result. That's the fault of the original developers, not the Harmony. The list doesn't know any Pac-Man other than the Atari original, and gives an ordinary 262 lines for Centipede, but they may have missed a distinction between title screen and game; or, in that case, possibly you do have a TV system mismatch, as others have suggested?
-
I'm at a loss to explain that; there's definitely at least one full play-through on YouTube, so maybe there's just some inessential part of the tunnels that crashes, which I for some reason have never visited? EDIT: the YouTube video is an emulated play-through though, so things randomly vanish, the raster effects flicker upward and downward, etc. Maybe the reason I'm the first to suggest it is that Handy gives it a bad rep?
-
It never used to crash for me; I've completed it several times. Where does it crash for you?
-
Shadow of the Beast perhaps has a bit too much learning by rote, but is nevertheless an excellent title.
-
Making Programming the Lynx a little Easier
ThomH replied to Turbo Laser Lynx's topic in Atari Lynx Programming
I was referring more to the absence of Handybug or anything like it anywhere other than on Windows but, in any case, I'd be surprised if you could plug it into anything. If I were writing C, I'd probably knock up my own Lynx-on-computer simulation library to cover whatever I'd put on top of the cc65 libraries.
-
Making Programming the Lynx a little Easier
ThomH replied to Turbo Laser Lynx's topic in Atari Lynx Programming
Job one would actually be to produce any debugger at all that runs away from Windows. But, at the cost of a more complicated workflow, the approach does lend itself to a CPU+traps unit-testing route, which would give 70% of the benefit for about 10% of the work.
-
Making Programming the Lynx a little Easier
ThomH replied to Turbo Laser Lynx's topic in Atari Lynx Programming
I'm a million years late to this, but the main obstacle for me isn't writing code, it's testing code — especially with Handybug being limited to Windows. A couple of ideas:

1) a Lynx-on-your-computer library. It exposes all the same functions as cc65 ordinarily does, but targets Windows and/or Linux and/or the Mac. So I write my code, complete with whatever tests I like, then can run with my normal debugger attached.

2) an assembler that allows one to apply unit tests. I've considered a few ways of going about this, and I'm starting to think the best solution is to kill the traditional assembler pipeline: convert the assembling logic from something you call from the command line to something you call from your high-level language of choice, and throw in a 65SC02 emulator.

So e.g. instead of:

    ; I intend that this method does task X when given inputs M, N, O.
    routine:
        ...
        RTS

and then invoking my assembler, loading the result up and testing it, I'd have something more like this, in C-ish pseudocode:

    Fragment *fragment = assemble(STRINGIFY(
    routine:
        ...
        RTS
    ));

    for(m in possible Ms) {
        for(n in possible Ns) {
            for(o in possible Os) {
                Processor *processor = new_processor();
                set_state(processor, m, n, o);
                perform_fragment(processor, fragment);
                if(get_state(processor) != expected_state) {
                    // throw error
                }
            }
        }
    }

    ...

    Linker *linker = new_linker();
    add_fragment(linker, fragment1);
    (etc)

    ...

    Cartridge *output = new_cartridge(target_name);
    write_standard_booter(output);
    write_object_code(get_output(linker));

Which probably means: I don't need to visit an emulator at all until I've already got a large proportion of my code working; my testing is not limited to the scenarios I manually produce; and I can go back much later and optimise parts while being confident I'm not introducing edge-case breakages to the whole. I can wait to hit the emulator until I've got actual gameplay to experiment with, unless somebody here knows how to write a test case for "is this fun?".

EDIT: I guess the other advantage of having such an assembler-as-a-library would be that you'd get a whole bunch of higher-level functionality for free, just by doing the computations in your high-level language. I'm thinking of things like building a trig table or performing a PNG -> native image conversion. The assembler itself could be relatively simple, probably just keeping track of labels and offering basic expression support for label manipulation (like LDA label+2).
-
It'd be interesting to know what tests you applied for the 1 x y case; can you provide any more information on that? How many different values of y were tested? Was your CPU routine drawing masked sprites, with a full read/modify/write? Were you scaling? Did you compare speeds if also using a collision buffer?
-
Haha, accidental capture. And so quiet that I didn't even realise it was there. Let this be a snapshot also of at least one thing I was listening to c.2010.
-
With yet more very slight thought, I think my Mode 7 demo wittingly-or-otherwise shows that the cycle timings given in the documentation are against the 16MHz bus. I'm confident the per-pixel loop was something almost exactly like:

    LDA abs    ; load high byte of pixel address from Suzy's accumulation register
    LDY abs    ; load low byte of pixel address from Suzy's accumulation register
    STX abs    ; store to Suzy to trigger the next multiply with accumulate
    STA abs    ; dynamically modify the LDA below by storing the high byte of the pixel address to it
    LDA abs,y  ; load the next pixel of floor colour
    PHA        ; store the next pixel of floor colour

I'm questioning now whether I unrolled it*, but by my calculation adding the above up gives 4 + 4 + 4 + 4 + 4 (no page crossings, since everything is page aligned) + 3 = 23 cycles at 4MHz. The documentation states that multiply with accumulate takes "54 ticks". Even if I didn't unroll, there's no way I spent 21 cycles on decrementing an 8-bit loop counter and jumping. Therefore I'm going to say that the fact that the code above works** offers strong evidence that 54 ticks means 54 cycles at 16MHz. So 13.5 is the number to beat if you want to do it entirely in software.

* if I were writing now, I'd probably put eight copies of that, plus a decrement and a jump back to the start, onto the zero page, to save a cycle in the dynamic reprogramming bit, then just jump in at the right position a la Duff's device.

** tested on real hardware:
-
My last post on the topic! I promise! But another good example of overlapping work returned to my conscious mind: my Mode 7 demo.

When drawing one of those perspective floors one iterates from left to right, maintaining a current texture map location and a vector: look up the colour at the current location, put that into the pixel, add the vector, move to the next pixel. In my version, the current location was held in Suzy's accumulation vector, which was updated by triggering a multiply with accumulate; the numbers being multiplied worked out to the vector. So the implementation over on the CPU was: read current location; write trigger byte to begin addition; look up colour at current location; push to stack. Repeat until done.

Then, at the end of each line, the next line of output has been assembled at one byte per pixel on the stack. Get Suzy to move it to the right place, scaling it down so as to repack to 4bpp.

(And, in a later optimisation: don't do 160 pixels per line, do only as many as there are unique texels at that scale, and have Suzy scale based on that. It creates raggedy edges from precision loss, but it cut out something like half the work at my particular scale. I could have ameliorated that a little, without cost, through better rounding, but didn't. I could possibly have saved more by assembling multiple lines on the stack at once, when drawing some that are short enough, but didn't.)

Summary: performing 32-bit addition on Suzy by supplying one number as two of its factors and allowing a multiplication to occur is almost certainly not faster than just doing it on the CPU. But overlapping the work means that suddenly all the CPU is spending on it is the four cycles of a store absolute. Which is faster.
-
Actually, it strikes me that there's not even a need to be consistent. Suppose you define that each object is rendered according to exactly three inputs — its geometry, a model matrix and the camera matrix. Then you're going to compose the two matrices, then apply them to the geometry. If each matrix is in 2:14 then you can multiply them together using Suzy to get a 4:28 result; keep only the top two bytes and you're at 4:12. Suppose your model geometry is also 4:12; then when you apply the composed matrix to the geometry you'll end up at 8:24. Keep just the top two bytes and you're at 8:8 just before your clipping and perspective projection (or perspective projection and clipping, if you prefer doing the clipping in pixels). Which is a comfortable place to be. So: trig tables at 2:14, geometry at 4:12, perspective and clipping code working in 8:8. No shifting required.
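Expressed as plain C, under hypothetical type names of my own (on the Lynx the 16x16 multiplies would of course be Suzy's, not the CPU's):

    #include <stdint.h>

    typedef int16_t fx2_14;  /* 2 integer bits : 14 fractional; trig tables */
    typedef int16_t fx4_12;  /* 4:12; geometry and composed matrices */
    typedef int16_t fx8_8;   /* 8:8; perspective and clipping */

    /* 2:14 x 2:14 = 4:28; keeping only the top two bytes gives 4:12, no shifting. */
    fx4_12 compose(fx2_14 a, fx2_14 b) {
        return (fx4_12)(((int32_t)a * (int32_t)b) >> 16);
    }

    /* 4:12 x 4:12 = 8:24; top two bytes again, giving 8:8. */
    fx8_8 transform(fx4_12 m, fx4_12 v) {
        return (fx8_8)(((int32_t)m * (int32_t)v) >> 16);
    }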
-
A million years late, but further to add: Suzy doesn't just multiply, she multiplies with accumulation. So that's two steps at a time for vector and matrix computation, not one. Even if the timings are 4MHz rather than 16 (and, like the author, I have no idea), I think you're still doing better than on the CPU. The original Elite uses [-1, 1] range numbers in all its matrices, but with only a single byte of precision. On a Lynx you could use 14 bits (to include the ability to store 1.0 and -1.0), or 15 bits (conveniently introducing a rounding error and storing 1.0s as just less than their true value), but probably not 16, because you can't take the signs out, put them elsewhere, and still get Suzy to accumulate. Also I guess you'd read two bytes askew and then shift by only one bit, rather than actually performing 15 steps, but that's probably obvious. I used a 14-bit scheme on a Z80 project; the precision is pretty good.
-
I also would love to be on the list. It's been probably a decade since I wrote Lynx homebrew; it'd be fun to try again.
-
I've now implemented full composite colour — which means I'm imagining a lossless modulation and demodulation, but I'm doing fairly well on that front. Many other emulation issues remain. Regular monochrome KC still isn't doing much of anything at all; does it require emulation of the Supercharger hardware? ET screenshot attached. A very slow video is at http://www.youtube.com/watch?v=Dpu9htFCQn8 — my pixel shader is currently atrocious: no attempt at parallelism at all, really bogging everything down.

--- Additional edited-in notes:

You'll see some stray random noise pixels in the ET image. That's where successive raster beams overlap, since they're painted additively. It's not intended to be some sort of simulation of signal noise; it's a genuine precision problem. Possibly I need to emulate a shadow mask.

Because the YIQ encode and decode is written and performed fully functionally, with no intermediate buffer, it should produce real pixel-correct results no matter what your display resolution. It is a genuine YIQ decode in which Y can pollute I and Q, and vice versa, though the large solid blocks of colour on display make that largely irrelevant.

A decision not to bother storing anything or processing output while the beam is at the blanking level produces a bug whereby pixels from one edge of the active area think they're next to pixels from the previous edge. I will fix that.

It's a full-frame, free-running composite CRT emulation, which only knowingly smudges the details on phase detection (i.e. there is none; it's always magically perfectly synchronised to your Atari). However it's also still buggy, so a bunch of games don't synchronise properly in various ways or show evidence of doubly scanned areas. Also I've yet to implement phosphor decay; I'm excited because switching to a formal phosphor model might allow me to break the need for any connection between emulated machine output rate and host machine output rate, but we'll see.

Additionally attached is an image from the second Dukes of Hazzard, showing something going wrong with horizontal sync detection.
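To give a flavour of the decode, here's a minimal sketch of synchronous demodulation in general, not my emulator's actual code; the low-pass filtering that would separate Y from I and Q is elided, and its absence is exactly the mutual pollution described above:

    #include <math.h>

    /* Multiply the raw composite sample by the subcarrier's two phases;
       after a (not shown) low-pass, the products are I and Q, and what
       remains of the unmultiplied signal is Y. */
    void decode_sample(float composite, float subcarrier_phase,
                       float *y, float *i, float *q) {
        *y = composite;                                    /* low-pass to taste */
        *i = 2.0f * composite * cosf(subcarrier_phase);    /* low-pass to taste */
        *q = 2.0f * composite * sinf(subcarrier_phase);    /* low-pass to taste */
    }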
-
Okay, I've figured out the premise on which my data above was flawed: failure to take clamping into account. So the source RGBs I was working with do not convert directly back to YIQs — the relevant YIQs often convert to RGB values outside of the [0, 1] range, which are then clamped. Just having a go at providing a chrominance signal that is a fixed-height sine wave, phase-offset by colour/17 of a full wave, with colour 0 being a special case that adds no chrominance, seems to produce an approximately correct display, though an oddly dull one. I'll update my emulator thread with screenshots.
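I.e. approximately the following, with AMPLITUDE a placeholder for whatever fixed height turns out to be right:

    #include <math.h>

    #define AMPLITUDE 0.5f   /* hypothetical fixed chroma height */

    /* Fixed-height sine, phase-advanced by colour/17 of a full wave;
       colour 0 contributes no chrominance at all. */
    float chrominance(int colour, float subcarrier_phase) {
        if(colour == 0) return 0.0f;
        const float phase = ((float)colour / 17.0f) * 6.2831853f;
        return AMPLITUDE * sinf(subcarrier_phase + phase);
    }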
-
I've been using CodeRunner, but will definitely give this a go. Props, collinp!
