Ericde45 Posted January 28, 2022
Hello there,
This week I converted the LSP player from Arnaud Carré (Leonard/Oxygene) to the Jaguar. LSP is a streaming format of Amiga audio hardware register values, recorded on a PC while a module is playing: https://github.com/arnaud-carre/LSPlayer
I made a simple version, then a slightly more optimised version, both reading samples byte by byte. Then I did a version that reads samples a long word at a time and stores 4 bytes for each channel in the DSP SRAM.
Currently, with no 68000 code displaying any debug information, it reaches 50 kHz replay on real hardware.
Do you have any advice to improve it?
Is transferring 2 long words (8 bytes) each time faster? For example, accessing the same DRAM for 8 bytes each time.
Is using the blitter to store some small sample buffers a good idea to go beyond 50 kHz? Something like 8 x 512 bytes of buffers in the DSP RAM.
And is using the blitter from DSP code a good idea if I also want to use it in the GPU code?
Putting the samples entirely in the DSP SRAM makes it possible to reach 83 kHz (some modules use only 4 KB of samples).
Source code is available here: https://github.com/ericde45/LSP_Jaguar
Video of the first working version is here:
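The long-word approach can be modelled in C: one 32-bit read fills a per-channel cache, and the next four samples are taken from it byte by byte, most significant byte first because the Jaguar is big-endian. This is a minimal sketch, not the code from the LSP_Jaguar repository; `channel_t`, `next_sample` and the `fetch_count` counter are hypothetical names, and the sample data is assumed to be padded to a multiple of 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel state (not the LSP_Jaguar layout): the last
   long word read from sample memory is kept, so 4 samples cost 1 read. */
typedef struct {
    const uint8_t *base;  /* start of this channel's sample data */
    uint32_t pos;         /* byte offset of the next sample */
    uint32_t cache;       /* last 32-bit word read */
    uint32_t cache_addr;  /* byte offset of that word, UINT32_MAX = empty */
} channel_t;

unsigned fetch_count = 0;  /* counts simulated 32-bit bus reads */

/* Big-endian 32-bit read, matching how a 68000/Jaguar long word is laid out. */
static uint32_t read_long_be(const uint8_t *p) {
    fetch_count++;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns the next signed 8-bit sample, refilling the cache only when the
   position crosses into a new long word. Assumes the data is padded to a
   multiple of 4 bytes so the last read stays in bounds. */
static int8_t next_sample(channel_t *ch) {
    assert(ch != NULL && ch->base != NULL);
    uint32_t word_addr = ch->pos & ~3u;           /* containing long word */
    if (word_addr != ch->cache_addr) {
        ch->cache = read_long_be(ch->base + word_addr);
        ch->cache_addr = word_addr;
    }
    unsigned shift = (3u - (ch->pos & 3u)) * 8u;  /* byte 0 is the top byte */
    ch->pos++;
    return (int8_t)(uint8_t)(ch->cache >> shift);
}
```

In this model the number of memory reads drops to a quarter of the byte-by-byte version; the cost moves to the shift that picks each byte out of the cached word.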
ggn Posted January 28, 2022
Good work!
1 hour ago, Ericde45 said: Do you have any advice to improve it?
You probably mean performance here, but for me it'd be a great improvement if one could also trigger samples playing in parallel (for example, sound effects in games). As for performance, it's definitely better to grab more than 1 byte at a time. As for what is optimal, I'll defer to people who actually have some experience with bus performance.
42bs Posted January 28, 2022
2 hours ago, Ericde45 said: Do you have any advice to improve it?
For example, do not always push all registers, only those which are really destroyed.
Ericde45 Posted January 28, 2022
Just now, 42bs said: For example, do not always push all registers, only those which are really destroyed.
I don't care about the 68000 part; it was written to be safe, not optimized, to avoid any issues with the 68000 code. It is only used for init and to display debug information. You can put a stop #$2700 once the DSP is started.
Ericde45 Posted January 28, 2022
15 minutes ago, ggn said: [...] for me it'd be a great improvement if one could also trigger samples playing in parallel (for example, sound effects in games).
I can write a version with 4 additional sample sources, each with its own replay frequency. This will take some DSP CPU time, of course, but it will surely be useful.
Zerosquare Posted January 28, 2022
2 hours ago, Ericde45 said: Is transferring 2 long words (8 bytes) each time faster? For example, accessing the same DRAM for 8 bytes each time.
Unfortunately, the DSP (unlike the GPU) cannot load 64 bits at once, so reading 2 long words will cause two bus transactions. I'm not sure doing two 32-bit loads in a row is worth it. I think the second will stall because the first one is not complete, and this will waste DSP cycles that could have been used to do something else. But the best way to know is probably to try it and see whether it helps.
Edited January 28, 2022 by Zerosquare
42bs Posted January 28, 2022
A quick look shows there is very little interleaving in the DSP code, which adds a lot of stalls.
Ericde45 Posted January 28, 2022
Do you have examples of non-interleaved code? I might not have understood exactly what interleaving is, as I only tried to interleave register usage.
42bs Posted January 28, 2022
Currently looking around on a tablet, so a bit handicapped. But for example, here the jump slot is not used:
    store R3,(R7)
    store R4,(R8)    ; store the repeat sample pointer in LSP_DSP_PAULA_AUD3L
    jump (R12)       ; jump to DSP_LSP_Timer1_skip3
    nop
Or here:
    shlq #nb_bits_virgule_offset,R4
    store R4,(R7)    ; repeat sample pointer, fixed-point
    ; repeat length
    movei #LSP_DSP_repeat_length2,R7
    loadw (R5),R8    ; .w = R8 = sample length
    shlq #nb_bits_virgule_offset,R8    ; in 16:16
    add R4,R8
    store R8,(R7)
There is a stall after the loads. But of course, after interleaving, the code will look ugly.
Sign extending:
    shlq #16,rn
    sharq #16,rn
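The `shlq #16` / `sharq #16` pair works because the left shift moves bit 15 into bit 31, and the arithmetic right shift then copies it back down through bits 16..31. A small C sketch (the function names are illustrative only) shows the same idea next to a fully portable variant, since in C the shift version relies on implementation-defined behaviour for negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Same idea as "shlq #16,rn / sharq #16,rn": push bit 15 up to bit 31,
   then shift back down arithmetically so it fills bits 16..31.
   In C, converting a uint32_t >= 2^31 to int32_t and right-shifting a
   negative int are implementation-defined; GCC, Clang and MSVC give the
   two's-complement / arithmetic-shift result used here. */
static int32_t sext16_shift(uint32_t rn) {
    int32_t top = (int32_t)(rn << 16);
    return top >> 16;
}

/* Portable alternative: keep the low 16 bits, flip the sign bit, then
   subtract its weight (0x8000). Upper input bits are ignored, as above. */
static int32_t sext16_portable(uint32_t rn) {
    return (int32_t)((rn & 0xFFFFu) ^ 0x8000u) - 0x8000;
}
```

On the DSP the two-instruction form is the cheap one; the portable C version is only there to make the bit-level intent explicit.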
+Stephen Posted January 28, 2022
Wow - cool project! Thanks for sharing. I've never done any Jag coding so can't help, just wanted to pop in with a quick thanks.
Ericde45 Posted January 28, 2022
1 hour ago, 42bs said: [...] Sign extending: shlq #16,rn sharq #16,rn
Nice trick, the sign-extending tip! I will work on optimisation; Timer1 currently has none. I was aiming for working code, with identical parts for each channel. Thanks again for the advice.
+CyranoJ Posted January 28, 2022
Again, amazing! If possible, it would be nice to add another 4 channels to the replayer so that some sound effects could also be played. Awesome work!
42bs Posted January 28, 2022
    shlq #16,R6
    loadw (R5),R8
    or R8,R6
    movei #LSP_DSP_PAULA_AUD3LEN,R8
    shlq #nb_bits_virgule_offset,R6
    store R6,(R7)    ; store the fixed-point sample pointer in LSP_DSP_PAULA_AUD3L
    addq #2,R5
    loadw (R5),R9    ; .w = R9 = sample length
    shlq #nb_bits_virgule_offset
If you move the movei after the load, you can fill the stall. Of course, you then need to movei to R9 and not R8.
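The `nb_bits_virgule_offset` shifts in these excerpts turn sample pointers into fixed-point values (the comments say 16:16). A rough C model of such a pointer follows. It assumes 16 fractional bits, the PAL Paula clock of 3546895 Hz for turning an Amiga period into a step, byte positions, and that the repeat "length" ends up holding repeat pointer + length (as the `add R4,R8` in the earlier excerpt suggests). `voice_t`, `step_from_period` and `voice_advance` are hypothetical names, not taken from LSP.

```c
#include <assert.h>
#include <stdint.h>

#define PAULA_CLOCK_PAL 3546895u /* Amiga PAL Paula clock in Hz */
#define FRAC_BITS 16u            /* assumed value of nb_bits_virgule_offset */

/* Hypothetical voice state; every field is a 16:16 byte position, so this
   model is limited to 64 KB of sample data (16 integer bits). */
typedef struct {
    uint32_t pos;      /* current position */
    uint32_t end;      /* end of the block being played */
    uint32_t rep_pos;  /* repeat (loop) start */
    uint32_t rep_end;  /* repeat start + repeat length */
} voice_t;

/* Source bytes advanced per output sample, in 16:16:
   (PAULA_CLOCK_PAL / period) / replay_rate. */
static uint32_t step_from_period(uint32_t period, uint32_t replay_rate) {
    assert(period != 0 && replay_rate != 0);
    uint64_t num = (uint64_t)PAULA_CLOCK_PAL << FRAC_BITS;
    return (uint32_t)(num / ((uint64_t)period * replay_rate));
}

/* Returns the integer byte offset of the current sample, then advances.
   On reaching the end, playback continues in the repeat block
   (assumes step is smaller than the repeat length). */
static uint32_t voice_advance(voice_t *v, uint32_t step) {
    uint32_t byte = v->pos >> FRAC_BITS;  /* integer part selects the sample */
    v->pos += step;
    if (v->pos >= v->end) {
        v->pos = v->rep_pos + (v->pos - v->end); /* keep the fractional overshoot */
        v->end = v->rep_end;
    }
    return byte;
}
```

For example, period 428 at 51935 Hz gives a step of 10457, i.e. about 0.16 source bytes per output sample, which is why the fractional part matters.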
+BitJag Posted January 28, 2022
This is all over my head, but @SebRmv has ported a ProTracker routine for use in the Removers Library (a C library for the Jaguar platform). Based on my very surface-level understanding of PT and LSP, these seem like very different ways of accomplishing the same thing, but I thought I would bring it up in case there is something in the Removers Library that could inform your efforts.
There is a brief description of what Seb implemented on the Removers' page (Sound Manager section, line 142): https://github.com/theRemovers/rmvlib/blob/master/main.h
Here is Seb's repository for the latest version of the code: https://github.com/theRemovers/rmvlib
+CyranoJ Posted January 29, 2022
45 minutes ago, BitJag said: This is all over my head, but @SebRmv has ported a ProTracker routine for use in the Removers Library (a C library for the Jaguar platform). [...]
LSP is optimised for speed by doing all the 'grunt work' with an external pre-processor. It would be an awesome addition to the toolset for games if we could get additional channels for FX.
+BitJag Posted January 29, 2022
43 minutes ago, CyranoJ said: LSP is optimised for speed by doing all the 'grunt work' with an external pre-processor. It would be an awesome addition to the toolset for games if we could get additional channels for FX.
I can see how my previous comment could imply that we didn't need another tool for MOD file playback. My intention was to support your suggestion of adding more channels for playback. The Removers library supports up to 8 channels for MOD file playback; I was suggesting that there might be something in that code that could help @Ericde45 enable similar functionality in his code.
+CyranoJ Posted January 29, 2022
21 minutes ago, BitJag said: [...] My intention was to support your suggestion of adding more channels for playback. [...]
I didn't read it that way; I was just trying to explain why LSP is 'better' for games (less CPU = good!)
Ericde45 Posted January 29, 2022
Currently the LSP PC converter (exe) does not handle 8-voice modules (I mean 8 voices of music tracks). It is possible to split an 8-voice module into two 4-voice modules using OpenMPT and then play it (I did an 8-voice LSP player on the Archimedes).
Is an 8-voice module interesting? Or is it better to have a 4-voice module plus 4 voices for game noises/sounds, also with variable replay frequencies for the sounds?
42bs Posted January 29, 2022
More to optimize:
    ; test period, channel 3
    btst #3,R2
    jr eq,DSP_LSP_Timer1_noPd
    nop
    loadw (R0),R4
    movei #LSP_DSP_PAULA_AUD3PER,R5
    addq #2,R0
    store R4,(R5)
DSP_LSP_Timer1_noPd:
    ; test period, channel 2
    btst #2,R2
    jr eq,DSP_LSP_Timer1_noPc
    nop
If you use addqt, then you can move the btst of the next bit into the jump slot (addqt leaves the flags untouched, so the result of that btst survives until the next jr).
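Read as C, the excerpt above tests one bit of the control word in R2 per channel and, when it is set, consumes the next 16-bit value from the stream as that channel's new period. The sketch below models only what the quoted code shows (channel order 3 down to 0, one word per set bit); the real LSP stream carries more than periods, and `apply_periods` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the "test period, channel N" blocks quoted above:
   bit N of the control word (R2) says a new period word for channel N
   follows in the stream (R0). Returns the advanced stream pointer. */
static const uint16_t *apply_periods(uint32_t mask, const uint16_t *stream,
                                     uint16_t period[4]) {
    assert(stream != NULL && period != NULL);
    for (int ch = 3; ch >= 0; ch--) {   /* channel 3 first, like the DSP code */
        if (mask & (1u << ch)) {        /* btst #ch,R2 / jr eq,... */
            period[ch] = *stream++;     /* loadw (R0),R4 ; addq #2,R0 ; store */
        }
    }
    return stream;
}
```

Channels whose bit is clear keep their previous period, which is what the branch around the load/store does on the DSP.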
42bs Posted January 29, 2022
It seems you are using only one register bank, so you can move all/some of the movei constants into the other bank and use movefa.
Ericde45 Posted January 29, 2022 Author Share Posted January 29, 2022 (edited) just added pad1 & pad2 reading in timer 2, U235 format : xxxxxxCx xxBx2580 147*oxAP 369#RLDU optimised a little bit the I2S part Edited January 29, 2022 by Ericde45 1 Quote Link to comment Share on other sites More sharing options...
Ericde45 Posted September 15, 2022
Today another step was reached: working LSP replay v1.5 @ 51935 Hz.
At last, Jaguar sound is better than the STE!
https://github.com/ericde45/LSP_Jaguar
The I2S code uses registers only: no memory access, no load, no store, except the writes to the DAC.
Samples are read 4 bytes at a time.
Only 2 DSP registers are left out of the 64 available.
Edited September 15, 2022 by Ericde45
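The 51935 Hz figure matches the Jaguar NTSC system clock (26590906 Hz) divided by 512. Assuming the usual I2S timing, where an SCLK value n gives a serial clock of clock / (2 x (n + 1)) and one 16-bit stereo frame takes 32 serial clocks, that corresponds to SCLK = 7, and the 83 kHz mentioned at the start of the thread would be consistent with SCLK = 4. The sketch below does that arithmetic and shows the budget it implies, about 512 DSP cycles per output sample; the formula, the assumption that the DSP runs at the system clock, and the names are all to be checked against the hardware documentation.

```c
#include <assert.h>

#define JAG_CLOCK_NTSC 26590906.0 /* Jaguar NTSC system clock, Hz */
#define JAG_CLOCK_PAL  26593900.0 /* Jaguar PAL system clock, Hz */

/* Assumed I2S timing: SCLK value n -> serial clock = clk / (2 * (n + 1)),
   and one stereo frame (2 x 16 bits) = 32 serial clocks. */
static double i2s_rate(double clk, unsigned sclk) {
    assert(clk > 0.0);
    return clk / (2.0 * (double)(sclk + 1u) * 32.0);
}

/* DSP cycles per output sample, assuming the DSP runs at the system clock. */
static double cycles_per_sample(double clk, double rate) {
    assert(rate > 0.0);
    return clk / rate;
}
```

Under the same assumptions, a PAL machine with SCLK = 7 would replay at about 51941 Hz.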
42bs Posted September 15, 2022
Is the 5 KB of RAM shown as available really free? Or is it just free of code, but used for buffers?
42bs Posted September 15, 2022
It would be nice if you split the player and the demo code. That makes it easier to use in another project.