ivop Posted June 26, 2015

Hi all,

A little over three years ago I released Atari Sid III. A few days ago, just when I wanted to get some sleep, I got an idea about how to improve its player routine. I got out of bed, started coding, and here's the result:

- 13kB of tables have been compressed to circa 512 bytes, including the decompression and noise generation routines. This improves load times tremendously.
- The time spent in the timer IRQ handler has been reduced from 98 cycles to 84 cycles (per scan line).
- Added multiple song support; you can switch songs by pressing one of the three console keys.
- Instead of using three Pokey channels, it now uses just one. That means the per-channel dynamic range has decreased slightly compared to the previous version, but it sounds a little more balanced and it saves some precious cycles.
- Currently I waste 1248 cycles by visualizing the current waveform, but that's just to differentiate the "play" screen from version 3.

Because only one Pokey channel/timer is used now, the other three are free for some Pokey fun! And because there's a lot more CPU time left, one could have a 3 channel Pokey tune combined with a 3 channel Sid tune. Ninja/Goattracker + RMT :-)

Attached you'll find the full source code and a zip with a few sample songs (Cybernoid, Cybernoid II, Commando, Metal Warrior 2, Nintendo Metal).

There's still room for improvement though. The noise sounds a bit metallic at times. This could be reduced by refilling (parts of) the noise tables every frame, but this is not implemented yet, as it would possibly also eliminate the ability to combine Pokey channels with a softsynth SID emulation, in which case you have Pokey do the drums.

As for emulating the emulator, Altirra should work (I cannot test it as my machine is way too slow). atari800 only works with a patch I recently posted to its mailing list, which implements Read-Modify-Write instructions for Pokey registers. Anyway, it sounds best on real hardware of course.

The source is still in my weird shasm65 format, as I based this on my previous code, but it should be fairly readable.

Regards,
Ivo

Attachments: atarisid4-src.zip, atarisid4-xex.zip

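To put the 98 and 84 cycle figures from the post above into perspective, here is a rough sketch of how much of each scan line the timer IRQ occupies. It assumes the usual 114 machine cycles per scan line on the Atari 8-bit and ignores cycles lost to ANTIC DMA and memory refresh, so the real share is somewhat higher. The function name is made up for this illustration.

```python
# Rough CPU budget per scan line, assuming 114 machine cycles per line
# and no cycles stolen by ANTIC DMA or memory refresh (an idealisation).
CYCLES_PER_SCANLINE = 114


def irq_share(irq_cycles, line_cycles=CYCLES_PER_SCANLINE):
    """Fraction of one scan line spent inside the timer IRQ handler."""
    return irq_cycles / line_cycles


if __name__ == "__main__":
    # 98 and 84 are the per-scan-line figures quoted in the post.
    for name, cycles in (("v3", 98), ("v4", 84)):
        left = CYCLES_PER_SCANLINE - cycles
        print(f"{name}: {irq_share(cycles):.1%} of the line in the IRQ, "
              f"{left} cycles left")
```

Under these assumptions the handler drops from roughly 86% to roughly 74% of each line.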
Rybags Posted June 27, 2015

Nice. I'm going on memory, but I think the earlier 3-voice version had somewhat better sound quality. Do you reckon that, with the freed-up cycles, it'd now be possible to have an active display? Even though it might mean something like a narrow OS mode 3, plus you'd probably need multiple versions of the playback loop to cater for the varying DMA loads.

emkay Posted June 27, 2015

Actually, I don't get why "just put a value to a register / 3 Pokey channels" takes more CPU cycles than software mixing of 3 channels and calculating the resulting value before writing it to one register...

If there is much CPU time left, and POKEY channels are free... How about using approx. "50%" of CPU time, and putting the player together with the SIO loader routines?

Rybags Posted June 27, 2015

I like the integrate with SIO idea... in fact it might actually work but likely in 19.2 k speed only with a poll-driven loader.

+Philsan Posted June 27, 2015

You can compare the two versions. If the 2015 version leaves more CPU time, I think it's acceptable.

Attachments: SID - Commando (2012) (Ivo van Poorten).xex, SID - Commando (2015) (Ivo van Poorten).xex

Heaven/TQA Posted June 27, 2015

SID plus SIO loader... Yeah... Gimme ASAP! Finally we would at least catch up a little bit with the C64.

Rybags Posted June 27, 2015

I reckon it could be done - having the playback bumped by a scanline for every incoming byte wouldn't be too bad.

Heaven/TQA Posted June 27, 2015

Now a dumb question... Is the 19200 somehow related to the scanline frequency? (Maybe I'm missing some info, but it seems so when looking at possible digital playback and 19200 baud.)

MARIO130XE Posted June 27, 2015

wow, WOOOAAAHHH awesome!!!!

Rybags Posted June 27, 2015

The 19.2 k isn't related to scanline frequency - I was only toying with that idea as it's the default SIO rate (actually it's a bit less?) and IIRC the SID emulation is oriented towards one sample every 2 scanlines (?)

There's every chance higher rates might be possible - probably a case of sacrificing sound fidelity in doing so.

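The relation between the SIO rate and the scan line rate discussed above can be estimated with a little arithmetic. The sketch below assumes a nominal 19200 baud, 10 bits on the wire per byte (1 start, 8 data, 1 stop), 114 machine cycles per scan line, and the commonly quoted PAL and NTSC machine clocks. The function name is hypothetical.

```python
# How many scan lines pass per incoming SIO byte? All constants below are
# nominal values and are assumptions of this sketch, not from the thread.
PAL_CLOCK_HZ = 1_773_447
NTSC_CLOCK_HZ = 1_789_773      # rounded
CYCLES_PER_SCANLINE = 114
BITS_PER_SIO_BYTE = 10         # 1 start + 8 data + 1 stop bit


def scanlines_per_byte(baud, clock_hz):
    """Scan lines that elapse while one serial byte arrives."""
    line_rate_hz = clock_hz / CYCLES_PER_SCANLINE
    bytes_per_second = baud / BITS_PER_SIO_BYTE
    return line_rate_hz / bytes_per_second


if __name__ == "__main__":
    for name, clock in (("PAL", PAL_CLOCK_HZ), ("NTSC", NTSC_CLOCK_HZ)):
        print(f"{name}: {scanlines_per_byte(19200, clock):.2f} scan lines per byte")
```

With these numbers a byte arrives roughly once every 8 scan lines, so a loader that delays playback by one scan line per byte would disturb about one sample in eight at a one-sample-per-line replay rate.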
Heaven/TQA Posted June 27, 2015

I thought of converting the source to MADS, but it seems more difficult than I thought... strange assembler format.

Mclaneinc Posted June 27, 2015

Wow, is that my unmodified XL doing that? That is awesome...

Totally well done Ivo... And thanks for the updated Commando, Philsan... (edit: oops, it was in the Ivo file)

Ta muchly to all...

ivop Posted June 27, 2015

Some more info I probably should have put in the first post:

Replay rate is 15.6 kHz, just like version 3 (version 2 was 7.8 kHz). The extra cycles were saved by doing a single INC IRQEN (an RMW instruction) to clear and reset the timer 1 interrupt bit. Also, I went back to a single channel, which indeed does slightly degrade the sound quality, but as Philsan said, imho that's acceptable if it leaves more CPU time for other things (like a Pokey player, a PMG-based scroller, or perhaps a SIO loader).

To reply to emkay on why it actually saves time to add the channels instead of storing them to Pokey directly:

version 3:

    lda $1234
    sta audc1
    lda $5678
    sta audc2
    lda $9abc
    sta audc3

24 cycles

version 4:

    lda $1234
    clc
    adc $5678
    adc $9abc
    sta audc1

18 cycles

Sadly, the clc cannot be skipped. It'll start playing a 7.8kHz beep if you omit it.

The tables in v4 are slightly adapted. Their range is now 0-5 instead of 0-7, which is why the quality is a little less. Luckily, the SID chip has three channels, which means that adding three values in the range of $10-$15 gives a result in the range of $30-$3f, which is still volume-only.

As for the funny assembler format: basically, the source is a Unix shell script (works with zsh, bash, ksh).

Thanks for the feedback,
Ivo

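The range argument in the post above can be checked with a few lines of arithmetic. This is only a sketch: it models the LDA/ADC/ADC sequence as a plain 8-bit sum, treats bit 4 ($10) of AUDCx as the volume-only bit, and uses a `carry_in` parameter (a name made up here) to stand in for whatever the carry flag happens to hold when the clc is left out. It does not try to reproduce the beep itself.

```python
# Why table values of $10-$15 still mix to a volume-only AUDC value.
VOLUME_ONLY = 0x10  # AUDCx bit 4: volume-only (forced output) mode


def mix(a, b, c, carry_in=0):
    """8-bit model of LDA a / ADC b / ADC c with an optional stray carry."""
    return (a + b + c + carry_in) & 0xFF


def mix_range(max_level, carry_in=0):
    """Smallest and largest mixed value for table entries $10..$10+max_level."""
    values = range(VOLUME_ONLY, VOLUME_ONLY + max_level + 1)
    sums = [mix(a, b, c, carry_in) for a in values for b in values for c in values]
    return min(sums), max(sums)


if __name__ == "__main__":
    for label, args in (("0-5 with clc", (5, 0)),
                        ("0-7 with clc", (7, 0)),
                        ("0-5, stray carry", (5, 1))):
        lo, hi = mix_range(*args)
        still_volume_only = bool(hi & VOLUME_ONLY)
        print(f"{label}: ${lo:02X}-${hi:02X}, "
              f"volume-only at maximum: {still_volume_only}")
```

With levels 0-5 the sum stays within $30-$3F, so bit 4 is always set. With levels 0-7 the maximum would be $45, and with a stray carry the 0-5 tables can reach $40; in both cases bit 4 is cleared at the top of the range.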
emkay Posted June 27, 2015

> Sadly, the clc cannot be skipped. It'll start playing a 7.8kHz beep if you omit it.
> The tables in v4 are slightly adapted. Its range is now 0-5 instead of 0-7, which is why the quality is a little less. Luckily, the SID chip has three channels, which means that adding three values in the range of $10-$15 gives a result in the range of $30-$3f which is still volume-only

Well, sometimes doing less is more. The results show it's useful.

Are you interested in plugging the emulation into the SIO loader? Stuff like that is exactly what's missing.

Heaven/TQA Posted June 27, 2015

I would die for a SIO SID loader, even if people say there's no need for track loaders anymore...

Ivop, any chance of MADS format? Or should I try to convert it myself? But I guess each Sid needs not be run through the converter...

ivop Posted June 27, 2015

I have been thinking of changing the source format to a more reasonable format, but have been putting it off every time because of the work involved. This whole project started out as a testcase for shasm65, which in itself was just a fun project to see if it could be done (i.e. an assembler as a shell script).

Heaven, if you want to convert it yourself, go ahead. It'll probably help if you have an editor which has syntax highlighting for (ba)sh. Suddenly it becomes a lot more readable.

I remember that Tezz wanted to do something similar. Perhaps some work has already been done in that direction? I have never written any polled SIO related code, so I'm not sure if I'm the right person to try combining the two. Also, I'm working on two "new graphics mode" projects at the moment.

Edit: a short "manual" on how to get a SID converted is in the sid2gumby thread here on AtariAge. Once converted, the resulting binary works with v3, v4 and sid2gumby.

Irgendwer Posted June 27, 2015

> version 4:
>
>     lda $1234
>     clc
>     adc $5678
>     adc $9abc
>     sta audc1
>
> 18 cycles

Thanks for the insight. Do you really use absolute, non-ZP addressing here? Could you easily change this routine to use self-modifying code like this:

    lda #BYTELOC1234
    clc
    adc #BYTELOC5678
    adc #BYTELOC9ABC
    sta audc1

to go down to 12 cycles?

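The three cycle totals mentioned so far (24, 18 and 12) can be tallied from the standard 6502 timings: 4 cycles for LDA/ADC/STA with absolute addressing, 2 for LDA/ADC immediate, and 2 for CLC. The sketch below simply sums them; the sequence names are labels invented for this illustration.

```python
# Cycle tally for the three mixing variants discussed in the thread,
# using standard 6502 timings (no indexing, so no page-cross penalties).
CYCLES = {
    ("lda", "abs"): 4,
    ("lda", "imm"): 2,
    ("adc", "abs"): 4,
    ("adc", "imm"): 2,
    ("sta", "abs"): 4,
    ("clc", "imp"): 2,
}


def total_cycles(sequence):
    """Sum the cycle cost of a list of (mnemonic, addressing mode) pairs."""
    return sum(CYCLES[instruction] for instruction in sequence)


# v3: three load/store pairs, one per Pokey channel.
V3_THREE_CHANNELS = [("lda", "abs"), ("sta", "abs")] * 3
# v4: add the three table values and store once.
V4_ABSOLUTE = [("lda", "abs"), ("clc", "imp"), ("adc", "abs"),
               ("adc", "abs"), ("sta", "abs")]
# Proposed variant with immediate operands patched by self-modifying code.
V4_IMMEDIATE = [("lda", "imm"), ("clc", "imp"), ("adc", "imm"),
                ("adc", "imm"), ("sta", "abs")]

if __name__ == "__main__":
    for name, seq in (("v3", V3_THREE_CHANNELS),
                      ("v4 absolute", V4_ABSOLUTE),
                      ("v4 immediate", V4_IMMEDIATE)):
        print(f"{name}: {total_cycles(seq)} cycles")
```

The immediate variant is only cheaper if something else patches the operands for every sample, which is the cost the tally above leaves out.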
ivop Posted June 27, 2015

Here's the core of the irq routine (skipped code duplication for clarity):

        .org 0x0000 $tempzp

    L irq
        sta.z $saveA                # 6 + 3 = 9

    count_lsb_v1=$(($_here+1))
        lda. 0                      # 2
    freq_lsb_v1=$(($_here+1))
        adc. 0                      # 2
        sta.z $count_lsb_v1         # 3
        lda.z $count_msb_v1         # 3
    freq_msb_v1=$(($_here+1))
        adc. 0                      # 2
        sta.z $count_msb_v1         # 3
                                    # ---> 15

    ### REPEAT THE ABOVE TWO TIMES FOR SECOND AND THIRD CHANNEL

    count_msb_v1=$(($_here+1))
    table_msb_v1=$(($_here+2))
        lda $silence                # 4
        clc                         # 2
    count_msb_v2=$(($_here+1))
    table_msb_v2=$(($_here+2))
        adc $silence                # 4
    count_msb_v3=$(($_here+1))
    table_msb_v3=$(($_here+2))
        adc $silence                # 4
        sta $AUDC1                  # 4
                                    # ---> 18

        inc $IRQEN                  # 4
    saveA=$(($_here+1))
        lda. 0                      # 2
        rti                         # 6
                                    # ---> 8

    # total: 9+3*15+18+4+8 = 84

Stuff starting with a $ are labels, not hex values. Hex values look like 0x...., similar to the C programming language. Mnemonics with a . (dot/period) added are immediate, with .z are zero page. The whole routine runs in zero page. freq_msb_* and table_msb_* are modified by the sid emulation/softsynth that runs once per frame. All the _here+1 stuff is similar to *+1 in other assemblers. It's all self-modifying code.

I don't see how I could use immediate loads, as those values have to come from tables at some point. But perhaps you can think of a way to speed this up even more?

Side note: saving and restoring the accumulator could be omitted if one was to write a player routine directly for the softsynth engine, removing the need for sid register emulation, and only use the X and Y registers.

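For readers who don't want to decode the self-modifying routine above, here is a small model of what one timer IRQ appears to compute: each voice adds its 16-bit frequency to a 16-bit counter, the counter's high byte indexes a 256-byte table page, and the three table values are summed into a single AUDC1 value. The class, the helper names, and the square-wave page are illustrative assumptions; carry handling between voices is not modelled.

```python
# Simplified model of one IRQ of the single-channel player: three phase
# accumulators, three table lookups, one summed value for AUDC1.
class Voice:
    def __init__(self, freq, table):
        self.count = 0              # 16-bit counter (count_lsb / count_msb)
        self.freq = freq & 0xFFFF   # 16-bit increment (freq_lsb / freq_msb)
        self.table = table          # 256-entry page selected via table_msb

    def step(self):
        """Advance the counter and return the table entry at its high byte."""
        self.count = (self.count + self.freq) & 0xFFFF
        return self.table[self.count >> 8]


def irq_sample(voices):
    """Value that would be written to AUDC1 for this sample."""
    return sum(voice.step() for voice in voices) & 0xFF


# Hypothetical table pages: entries are $10 (volume-only, level 0) up to $15.
SILENCE = [0x10] * 256
SQUARE = [0x15 if i < 128 else 0x10 for i in range(256)]

if __name__ == "__main__":
    voices = [Voice(0x4000, SQUARE), Voice(0x2000, SQUARE), Voice(0, SILENCE)]
    print([f"${irq_sample(voices):02X}" for _ in range(8)])
```

In this reading, the once-per-frame softsynth only has to change each voice's increment and table page, which matches the note that freq_msb_* and table_msb_* are patched outside the IRQ.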
phaeron Posted June 27, 2015

I was able to squeeze 4 channels into an IRQ-based player at 15.7 kHz once by only updating phase for one channel at a time in round-robin fashion, i.e. 1 -> 2 -> 3 -> 4 -> 1. In the other three phases, the MSB of the phase was projected by a multiple of the MSB of the increment. It gives up to 3/256 phase error for 3/4 samples, but that's not too audible with 4-bit samples. This is one of the IRQ routines:

    .proc irq1
        sta asave           ;3
        ldy phase1hi        ;3
        lda wavtab,y        ;4+1
    voltab1 = *-1
        sta audc1           ;4
        ldy phase2hi        ;3
        lda wavtab,y        ;4+1
    voltab2 = *-1
        sta audc2           ;4
        ldy phase3hi        ;3
        lda wavtab,y        ;4+1
    voltab3 = *-1
        sta audc3           ;4
        ldy phase4hi        ;3
        lda wavtab,y        ;4+1
    voltab4 = *-1
        sta audc4           ;4
        asl irqen           ;6
        lda #0              ;2
    phase1lo = *-1
        adc #0              ;2
    freq1lo = *-1
        sta phase1lo        ;3
        lda #0              ;2
    phase1hi = *-1
        adc #0              ;2
    freq1hi = *-1
        sta phase1hi        ;3
        mva #irq2 $fffe     ;6
        lda #0              ;2
    asave = *-1
        rti                 ;6
    .endp

The main downside is that it requires a lot of zero page and twice as much storage for the samples, since they need two pages instead of one per volume level.

Cost including interrupt overhead for 4 channels is 88-92 cycles. Going down to 3 channels would reduce it to 77-80 cycles, and accumulating to just AUDC1 instead of AUDC1-3 would bring it down further to 69-71. Note that in this player the main routine was constrained not to use the Y register, so adding save/restore for that would cost 5 cycles.

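The "up to 3/256 phase error" figure above can be checked by brute force under one plausible reading of the scheme: the exact sample index is the high byte of `phase + k * inc`, while the projected index adds `k` times the high byte of the increment to the high byte of the phase, for `k` = 1..3 samples after the last full update. That reading, and all names below, are assumptions of this sketch and are not taken from phaeron's player.

```python
# Worst-case table index error when only the high bytes are projected
# forward for the three samples between full 16-bit phase updates.
def exact_index(phase, inc, k):
    """High byte of the true 16-bit phase after k more samples."""
    return ((phase + k * inc) >> 8) & 0xFF


def projected_index(phase, inc, k):
    """High byte of phase plus k times the high byte of the increment."""
    return ((phase >> 8) + k * (inc >> 8)) & 0xFF


def max_projection_error(k_max=3):
    """Largest index error; it only depends on the two low bytes."""
    worst = 0
    for phase_lo in range(256):
        for inc_lo in range(256):
            for k in range(1, k_max + 1):
                error = (exact_index(phase_lo, inc_lo, k)
                         - projected_index(phase_lo, inc_lo, k)) & 0xFF
                worst = max(worst, error)
    return worst


if __name__ == "__main__":
    print("worst-case index error:", max_projection_error())
```

The error is just the carry out of `phase_lo + k * inc_lo`, which is at most (255 + 3 * 255) // 256 = 3 table entries, consistent with the 3/256 quoted in the post.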
Heaven/TQA Posted June 28, 2015

Ivop... basically, what is the way to get a self-composed SID onto the A8 then?

snicklin Posted June 28, 2015

> Ta muchly to all...

Just remember that this is an international website!

emkay Posted June 28, 2015

> Ivop... basicly what is the way to go to get a own composed SID into A8 then?

Use a PC tracker, Goat Tracker for example, then import the tune into the player format? In a way, it's the missing "digi MOD tracker" for the A8 then.

It would be interesting to check whether the replay could use 15.6 kHz during SIO. 7.8 kHz will work for sure with DMA on. When DMA is off, SIO gets even more time. As SIO is using POKEY, which takes care of the timer handling, it shouldn't interfere...

pirx Posted June 28, 2015

> The source is still in my weird shasm65 format, as I based this on my previous code, but it should be fairly readable

Oh man, this is fokking brilliant - it is almost like an assembler that assembles assemblies. You Prince of assemblages! Me bows in awe.

Mclaneinc Posted June 28, 2015

> Just remember that this is an international website!

Yes indeed. Changed to: Thank you all.

Heaven/TQA Posted June 28, 2015

I still don't get the workflow for getting a SID file converted.