Jac!s Sillyventure Contribution

Heaven/TQA · December 13, 2010

I guess Peter will publish the sources anyway...

+JAC! · December 13, 2010

So now that Rybags has entered the debugger, I think I can talk freely about the effect ;-)

Yes, the split is a fat linear kernel plus 33 split subroutines with a delay between 0 and 33 cycles. Most of the presentation effects (even the sine wave, as long as the frequency is that low) could be accomplished by simple DL LMS repositioning or by copying memory. But not the fast rotator, as it can show any angle in any frame. That would mean copying 8k in 1/50s and hence is not possible.
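To put rough numbers on the 8k claim, here is a back-of-the-envelope sketch in Python. All figures are assumptions, not measurements from the demo: a PAL machine with 312 scanlines of 114 CPU cycles each, and a fully unrolled LDA/STA pair with absolute addressing at 8 cycles per byte, ignoring every cycle lost to ANTIC DMA and refresh. A 1k copy is included for comparison.

```python
# Rough copy budget on a PAL Atari (assumed figures, see the text above).
CYCLES_PER_LINE = 114      # CPU cycles per scanline
LINES_PER_FRAME = 312      # PAL scanlines per frame (1/50 s)
CYCLES_PER_BYTE = 8        # unrolled LDA abs (4) + STA abs (4), best case


def copy_cycles(num_bytes: int) -> int:
    """Cycles needed to copy num_bytes with the assumed unrolled copy."""
    return num_bytes * CYCLES_PER_BYTE


frame_cycles = CYCLES_PER_LINE * LINES_PER_FRAME

for size in (8 * 1024, 1024):
    need = copy_cycles(size)
    share = need / frame_cycles
    print(f"{size} bytes: {need} cycles = {share:.0%} of one frame")
```

Even under these optimistic assumptions the 8k copy needs well over one frame, while 1k takes roughly a quarter of a frame before any DMA is subtracted.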

Here is some more technical information:

This gif shows the accuracy you can reach with splits that are exact to a single CPU cycle. This works properly on both Atari800 Win 4.0 and Altirra, which was helpful. But I also tested on the real hardware all the time, because I know Murphy very well.

As you can see, the accuracy is 4 color clocks at the beginning and the end of a line, but only 8 color clocks in the middle. That's why there is a player placed over it.

As mentioned before, my self-imposed restriction to 64k was the major issue. With extended memory I could have split PORTB/$d301 in any mode, but with 64k the only way to change graphics on the fly is to split CHBASE/$d409. I had tried this many times before, but the bad lines spoilt it. Then, one fine day, Rybags posted "Yes, you can defeat bad lines using VSCROL", and finally, when Tezz wrote that the working version of G2F already uses this, it became possible. Still, it was very tricky to get the timing right. The screen memory consists of 80 bytes only. The second 40 bytes of them are repeated for the rest of the screen. The first few lines cannot be used, because the timing is not yet balanced there. So there is 1k per char line, i.e. two pictures spread over 48k. The rest of the code (splits, stores to update the kernel) is interleaved therein.
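The memory figures can be checked with a small Python sketch. It assumes 1K character sets (128 characters of 8 bytes each) and 24 char lines per picture, which is what "1k per char line" and "48k" imply together. The base address `0x1000` and the ordering of the sets are purely hypothetical; the demo's real layout is not given here.

```python
# Memory layout sketch: one 1K character set per char line, two pictures.
CHARSET_SIZE = 1024        # 128 characters * 8 bytes
CHAR_LINES = 24            # inferred: 48K / 1K / 2 pictures
PICTURES = 2


def chbase_value(address: int) -> int:
    """High byte to store in CHBASE ($D409) for a 1K-aligned character set."""
    if address % CHARSET_SIZE != 0:
        raise ValueError("character set must start on a 1K boundary")
    return address >> 8


def charset_address(base: int, picture: int, line: int) -> int:
    """Hypothetical layout: all sets of picture 0 first, then picture 1."""
    return base + (picture * CHAR_LINES + line) * CHARSET_SIZE


total = CHARSET_SIZE * CHAR_LINES * PICTURES
print(total // 1024, "KB for both pictures")
print(hex(chbase_value(charset_address(0x1000, 1, 0))))
```

Switching pictures mid-line then amounts to storing a different high byte into CHBASE at the right cycle; the data itself never moves.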

>Can you change charsets midline with pixel resolution? inside a char? or are you limited to a char resolution and use the black line to cover some parts?

See above, 1 char or 2 chars, depending on the x-position.

>Can you apply that to changing colors midline, inclusive in bad lines?

Yes, though of course you also lose time for defeating the bad lines. But the timing gets 100% stable.

>Looks like I get a thanks, Heaven gets a greet, also saw xxl, Fox, Raster, PG.

Sure. Without your "it is possible" I would not even have tried (again, after so many failures).

BTW: If you have seen the demo >3 times you can remember when the scroller turns.

That made reading easier for me after all that testing ;-)

>Will it be possible to change screen content at the same time?

Yes. I do this at the end, when "The End / Silly Venture / 2010" is printed onto the pictures. But there are not so many cycles left to really change memory. The version I sent to the party has a glitch at 6:59 when the "Sillyventure" is drawn, exactly because of that. The version on Pouet (V1.2) is fixed.

>But what about doing this with some different animations on both pictures?

>Possibly using gr. 10 (charmode)?

Yes. But as mentioned above, you will have to use extended RAM and PORTB instead of CHBASE. And I'm not sure what the delay is when changing PORTB. It might look different on different memory upgrades. That's one reason why I did it like I did.

Silly-Split-Test.xex

Edited December 13, 2010 by peter.dell

Rybags · December 13, 2010

But of course... bank-switching can give similar effects, but rules out 64K.

Those pesky Refresh cycles... really, they must have put some thought into them, but the reality of the situation is that they put them in the worst possible place during the scanline.

Putting them right near the end, after WSYNC, or spreading them out much more would have made things so much easier.

Edited December 13, 2010 by Rybags

Heaven/TQA · December 13, 2010

thx Peter for explaining.

analmux · December 13, 2010

>...Putting them right near the end, after WSYNC, or spreading them out much more would have made things so much easier.

Well, for GED-alike kernel techniques it's a disadvantage, but I suppose the refresh cycles were put there for leaving more free cycles during hblank. Otherwise a DLI can't use that many cycles, at least for the purpose of changing regs outside visible area.

analmux · December 13, 2010

>Without the PM, it just looks like the program is doing data moves.

>Have to wonder... just how fast a "brute force" approach might work, where we try to accomplish the same effect just by copying data across.

Well, brute force movement is not really the thing I'm thinking of.

>...But not the fast rotator as it can show any angle in any frame. That would mean copying 8k in 1/50s and hence is not possible.

Yes, I see it's a powerful technique to obtain a maximum of freedom :thumbsup: , but when does a coder really need random angle-transitions? At least not in two consecutive frames.

So, my approach would be using 24 charlines (antic 4) and each line its own font. Then every line only shows 40 chars. No GED-alike kernels. PM overlay masks can be used but wouldn't be really needed in every case.

Now use chars 0-39 for the first screen, and chars 64-103 for the second screen, so flipping from one to the other screen only needs setting/resetting bit 6 of the char#. Indeed, changing 8kB during one cycle is nonsense, but changing 1kB during one cycle is more realistic.

But, we'd only need to look at the difference between two consecutive frames. Thus, search for the boundary of changes. Then copy and paste (with pixel-exact boundary) data to the free areas in each font, see char# 40-63 and 104-127. We could even put precalculated stuff there. Then the only thing needed is doing change of char#codes in screenmemory at some selected positions.
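A minimal Python sketch of the char-code flip described above, assuming codes 0-39 for the first picture and 64-103 for the second. Which picture sits left of the boundary, and the helper names `row_codes` and `flip`, are illustrative assumptions.

```python
# Sketch of the char-code flip: picture A uses codes 0-39, picture B 64-103.
PICTURE_BIT = 0x40  # bit 6 of the character code


def row_codes(split_column: int, width: int = 40) -> list[int]:
    """Char codes for one row: picture A left of split_column, picture B from it on."""
    if not 0 <= split_column <= width:
        raise ValueError("split_column out of range")
    return [col if col < split_column else col | PICTURE_BIT
            for col in range(width)]


def flip(code: int) -> int:
    """Switch a single screen cell between the two pictures."""
    return code ^ PICTURE_BIT


row = row_codes(split_column=10)
print(row[8:12])  # [8, 9, 74, 75]
```

Moving the boundary by one column in one row is then a single-byte change in screen memory, which is why only the difference between consecutive frames has to be touched.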

Heaven/TQA · December 13, 2010

Mclaneinc · December 13, 2010

Peter, amazing demo, really nice on the eyes..

Deserved a top spot, well done..

Mclaneinc · December 13, 2010

Was the mini shoot 'em up in there ever fleshed out into a full game?

Tezz · December 13, 2010

cool, reminds me of the Martech game W.A.R. except with a pcb map http://www.worldofspectrum.org/infoseekid.cgi?id=0005620

Tezz · December 13, 2010

Interesting to read the explanation, Peter. I spent that entire week playing around with VSCROL tests to gain some more insight into it, although my intention was to look more into mid-line colour changes. That is limited by LMS not being read after row 0, which is basically what causes the bad lines in the first place, so it pretty much negates the use in that respect. I came to the conclusion that it would be useful if the 5th colour was used exclusively on one side of the screen.

I've used cycle counting within my DLIs for the PM manipulation in MM. I noticed from trial and error the 4 color clocks accuracy at the beginning and the end of each line and 8 color clocks in the middle. Why is that the case?

MaPa · December 13, 2010

>I've used cycle counting within my DLIs for the PM manipulation in MM. I noticed from trial and error the 4 color clocks accuracy at the beginning and the end of each line and 8 color clocks in the middle. Why is that the case?

Because of RAM refresh cycles? One CPU cycle is 2 color clocks and in one scanline you have 40 DMA reads and 40 free cpu cycles, so when you delay by one cpu cycle you delay it to DMA read thus it will delay one more cycle so 2 cpu cycles delay which is 4 color clocks. But in the left part of the screen there are 9 RAM refresh cycles which occupy the "free" cycles for cpu, so delay by one cpu cycle in code will result in 4 cpu cycles delay... DMA read, RAM refresh, DMA read, free cycle or something like that.
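The reasoning can be illustrated with a toy bus model in Python. This is a deliberately simplified sketch of the explanation above, not exact ANTIC timing: each CPU cycle is one slot, a store can only land in a free slot, and the slot patterns `plain` and `refresh` are assumed for illustration.

```python
# Toy bus model: 'D' = ANTIC DMA read, 'R' = RAM refresh, '.' = free for the CPU.
def first_free_cycle(slots: str, wanted: int) -> int:
    """Return the first CPU-usable cycle at or after `wanted`."""
    for cycle in range(wanted, len(slots)):
        if slots[cycle] == ".":
            return cycle
    raise ValueError("no free cycle left on this line")


def granularity_color_clocks(slots: str) -> int:
    """Largest gap between consecutive free cycles, in color clocks (2 per cycle)."""
    free = [i for i, s in enumerate(slots) if s == "."]
    return 2 * max(b - a for a, b in zip(free, free[1:]))


plain = "D." * 8        # DMA read, free cycle, repeated
refresh = "DRD." * 4    # every other free cycle is taken by refresh

print(granularity_color_clocks(plain))    # 4
print(granularity_color_clocks(refresh))  # 8
```

In the model, delaying a store by one cycle pushes it to the next free slot, so the achievable positions are 2 cycles apart without refresh and 4 cycles apart where refresh takes every other free slot.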

Tezz · December 13, 2010

>Because of RAM refresh cycles? One CPU cycle is 2 color clocks and in one scanline you have 40 DMA reads and 40 free cpu cycles, so when you delay by one cpu cycle you delay it to DMA read thus it will delay one more cycle so 2 cpu cycles delay which is 4 color clocks. But in the left part of the screen there are 9 RAM refresh cycles which occupy the "free" cycles for cpu, so delay by one cpu cycle in code will result in 4 cpu cycles delay... DMA read, RAM refresh, DMA read, free cycle or something like that.

Great, understood! Thanks MaPa

xxl · December 13, 2010

> cool, reminds me of the Martech game W.A.R. except with a pcb map http://www.worldofsp....cgi?id=0005620

yes, this is W.A.R. game part2 from bbc micro

Heaven/TQA · December 13, 2010

btw. look at the Ooz pic at the bottom... not bad at all...

http://atarionline.pl/v01/index.php?subaction=showfull&id=1292176402&archive=&start_from=0&ucat=1&ct=nowinki

analmux · December 14, 2010

Ehm I think coding a screen-splitter this way, using the mid-scanline font-change trick, is a real challenge. It's a nasty trick :thumbsup: , but it can be simpler, using less memory, using less cpu-time...

...or am I just f***ing mad? My approach won't work??? It will take far more cpu-time?

[...]

>So, my approach would be using 24 charlines (antic 4) and each line its own font. Then every line only shows 40 chars. No GED-alike kernels. PM overlay masks can be used but wouldn't be really needed in every case.

>Now use chars 0-39 for the first screen, and chars 64-103 for the second screen, so flipping from one to the other screen only needs setting/resetting bit 6 of the char#. Indeed, changing 8kB during one cycle is nonsense, but changing 1kB during one cycle is more realistic.

>But, we'd only need to look at the difference between two consecutive frames. Thus, search for the boundary of changes. Then copy and paste (with pixel-exact boundary) data to the free areas in each font, see char# 40-63 and 104-127. We could even put precalculated stuff there. Then the only thing needed is doing change of char#codes in screenmemory at some selected positions.

Edited December 14, 2010 by analmux

Rybags · December 14, 2010

Sounds workable.

Best method might be a pair of loops - first loop will set characters to 0-39, second loop does 40-79, have a table which controls where the split occurs on each row of characters.

So then you might potentially just have something like this. One copy of the code required for each row involved in the split. Should run quicker if the "split_table" compare is replaced with a self-modified immediate mode instruction. Would need special case handling though, if no split occurs and the line must have chars 40-79, this won't work as it is.

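; On entry X contains row number, e.g. 0-23
; split_table holds, for each row, the column number where the split occurs
; (use 40 for no split, so that loop1 ends at the right edge)
        ldy #0
loop1:
        tya
        cmp split_table,x     ; reached the split column for this row?
        beq loop1_exit
        sta screen_line1,y    ; column Y gets char Y (first picture)
        iny
        bne loop1             ; always taken, Y never reaches 0 here
loop1_exit:
        tya
        clc
        adc #40
        tay                   ; Y = column + 40 = char code of second picture
loop2:
        cpy #80
        bcs exit              ; check early in case there's no split
        tya
        sta screen_line1-40,y ; store char Y at column Y-40, not at offset Y
        iny
        bne loop2             ; always taken
exit:                         ; continue with the copy for the next row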
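As a cross-check, here is a small Python reference model of what the two loops are meant to produce for one row: chars 0-39 left of the split and chars 40-79 from the split onwards, each stored at its own column. It models the intended result only, not cycle counts, and the `split_table` values are made up.

```python
# Reference model of the two-loop row fill (intended result, not cycle-exact).
WIDTH = 40


def fill_row(split_column: int) -> list[int]:
    """Codes 0..split-1 left of the split, column+40 from the split onwards."""
    screen_line = [0] * WIDTH
    y = 0
    while y != split_column and y < WIDTH:   # loop1
        screen_line[y] = y
        y += 1
    y += WIDTH                                # Y now doubles as the char code
    while y < 2 * WIDTH:                      # loop2
        screen_line[y - WIDTH] = y            # store at the column, not at Y
        y += 1
    return screen_line


split_table = [40, 30, 20, 10, 0]             # hypothetical per-row split columns
for row, split in enumerate(split_table):
    line = fill_row(split)
    print(row, line[:3], line[-3:])
```

Comparing an assembly version's screen memory against `fill_row` for each row is a quick way to catch off-by-40 store addresses.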

Kaz atarionline.pl · December 14, 2010

Congratulation to JAC! for 1st place!

I just put voting results from all categories (8-bit/ST/Falcon):

http://atarionline.pl/v01/index.php?subaction=showfull&id=1292292594&archive=&start_from=0&ucat=1&ct=nowinki

Some productions are amazing! You can download stuff here (direct link):

http://atarionline.pl/demoscena/cp/Silly%20Venture%202010/sillyventure2010all.7z

Edited December 14, 2010 by Kaz atarionline.pl

MaPa · December 14, 2010

>Ehm I think coding a screen-splitter this way, using the mid-scanline font-change trick, is a real challenge. It's a nasty trick , but it can be simpler, using less memory, using less cpu-time...

>...or am I just f***ing mad? My approach won't work??? It will take far more cpu-time?

>[...]

>So, my approach would be using 24 charlines (antic 4) and each line its own font. Then every line only shows 40 chars. No GED-alike kernels. PM overlay masks can be used but wouldn't be really needed in every case.

>Now use chars 0-39 for the first screen, and chars 64-103 for the second screen, so flipping from one to the other screen only needs setting/resetting bit 6 of the char#. Indeed, changing 8kB during one cycle is nonsense, but changing 1kB during one cycle is more realistic.

>But, we'd only need to look at the difference between two consecutive frames. Thus, search for the boundary of changes. Then copy and paste (with pixel-exact boundary) data to the free areas in each font, see char# 40-63 and 104-127. We could even put precalculated stuff there. Then the only thing needed is doing change of char#codes in screenmemory at some selected positions.

But this way you will change only on a char line basis, not per scanline as JAC! does. Or did I misunderstand?

Rybags · December 14, 2010

That approach would only do at a character basis.

So maybe not so great for many angles.

An advantage might be had in that you don't have the 100% CPU use during the visible portion, the routine could run starting near the bottom of screen and probably take less time overall.

Possibly it could be used in conjunction with DLIs to create shorter characters. That would allow more finesse, e.g. instead of 24 graduations vertically you could have 48 or more.

Edited December 14, 2010 by Rybags

analmux · December 14, 2010

>[...]
>But this way you will change only on a char line basis, not per scanline as JAC! does. Or did I misunderstand?

Yes, you are correct. It will only increase the font# every 8 scanlines, not on a scanline basis, and thus also not on a mid-scanline basis. The DLIs will increase the font#, but this will only be a static routine. To change screen memory, which only means changing char#s at selected places, we would need a dynamic routine.

>That approach would only do at a character basis.

>So maybe not so great for many angles

>[...]

?

There's no problem with angles. Remember, we could let the cpu compute the corrections of a pixel-detailed boundary. So, we won't need to change 8kB. We only need to change font data, at some selected places.

Compare with software sprites in charmode: We need 80 chars of background data. This data itself won't be changed. This leaves 48 chars, in each of the 24 fonts, for doing 'software sprites'. The difference is that we're not dealing with char-clusters, but with "lines & curves of chars".
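The character budget behind this comparison is simple arithmetic, sketched here in Python. It assumes 128 usable character codes per font (codes 0-127), 8 bytes per character and one font per char line; these constants follow from the figures in the posts rather than from any measured layout.

```python
# Font budget sketch for the "one font per char line" idea.
CHARS_PER_FONT = 128       # character codes 0-127
BYTES_PER_CHAR = 8
FONTS = 24                 # one per char line
BACKGROUND_CHARS = 2 * 40  # 40 per picture

spare_chars = CHARS_PER_FONT - BACKGROUND_CHARS
font_bytes = CHARS_PER_FONT * BYTES_PER_CHAR

print("spare chars per font:", spare_chars)
print("spare bytes per font:", spare_chars * BYTES_PER_CHAR)
print("total font memory:", FONTS * font_bytes // 1024, "KB")
```

So each char line has 48 characters (384 bytes) available for boundary cells that mix pixels from both pictures.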

OK, doing software sprites also has its limits. Doing a charcluster of f.e. 8*8 chars will eat all cpu-time in one frame. But, f.e. the fast rotation part in the demo could be precomputed. The 48 chars should be enough for that.

Thinking about it, this approach would even give more freedom. When choosing the cut boundary between the 2 pics, we could have a curved line with many loops. The Jac! kernel & PM overlay approach will have its limits....but, possibly might fit better into 64kB, compared to my approach.

(OK, sorry for the big edit)

Edited December 14, 2010 by analmux

Shamus · December 16, 2010

Cool demo and nice music. :thumbsup: Great work JAC! I love how the transitions are smooth and flow very well everywhere--that's something you don't see much in demos these days.

Just thought I'd post a little note here on emu compatibility: It works perfectly in Atari800 1.2.0 but for some reason, it doesn't work in Atari++ 1.58©. When it goes into the demo proper ("dots") the "SILLY THINGS" logo doubles in size, and the colored letters are a solid color. Also, the picture section shows garbage characters.

Of course you all know how it works on real hardware.

Creature XL · December 16, 2010

>There's no problem with angles. Remember, we could let the cpu compute the corrections of a pixel-detailed boundary. So, we won't need to change 8kB. We only need to change font data, at some selected places.

>Compare with software sprites in charmode: We need 80 chars of background data. This data itself won't be changed. This leaves 48 chars, in each of the 24 fonts, for doing 'software sprites'. The difference is that we're not dealing with char-clusters, but with "lines & curves of chars".

>OK, doing software sprites also has its limits. Doing a charcluster of f.e. 8*8 chars will eat all cpu-time in one frame. But, f.e. the fast rotation part in the demo could be precomputed. The 48 chars should be enough for that.

>Thinking about it, this approach would even give more freedom. When choosing the cut boundary between the 2 pics, we could have a curved line with many loops. The Jac! kernel & PM overlay approach will have its limits....but, possibly might fit better into 64kB, compared to my approach.

Back in the day, if someone was of the opinion that he could do a better routine for a specific effect, he would just code it up, release it, and shout out in a scroller that his routine is better (less memory / fewer cycles / whatever). But as it seems, today we have (as someone said months ago) more talkers than doers here.

PS:

I posted my congrats to JAC in this thread days ago; however, I can't see the post. So I'll just say it again: Congrats, JAC, for finishing a demo. I have to smile thinking back to the ABBUC JHV 2010, when the proposal of 505 was discussed.

analmux · December 16, 2010

>Back in the day, if someone was of the opinion that he could do a better routine for a specific effect, he would just code it up, release it, and shout out in a scroller that his routine is better (less memory / fewer cycles / whatever). But as it seems, today we have (as someone said months ago) more talkers than doers here....

?

I already said thumbsup to JAC! two times before. So, what's the problem with just talking here? Just exchanging some ideas. Is this really a problem? That's what a forum is for, ain't it?

I'm not talking about my opinion here. I just expressed my wondering why JAC!'s routine should take so much CPU time, as I don't understand why a faster routine wasn't chosen. Maybe the faster routine takes more memory, I'm not sure, I'll admit... so I'm not really claiming anything here.

People here are free to say whether I'm talking nonsense or not. But no one here did. At least no one gave a proper explanation of why I might be wrong.

The only claim about my approach, POSSIBLY being better in ONE ASPECT, is that we have somewhat more freedom to choose our picture boundary. When PMs are needed, and also a kernel to reposition them, then I doubt something like the following pic will be possible using JAC!'s approach:

Sheddy · December 16, 2010

OK, I have no opinion on best techniques, but the results are impressive however it's done :thumbsup:

(Actually, for some reason I was particularly drawn to the way the colours flow out of the logo at the top - don't know if it's really hard to do or not, but just looks really nice to me)
