
LSP player conversion to Atari Jaguar


Ericde45


Hello there,

 

This week I converted the LSP player by Arnaud Carré (Leonard/Oxygene) to the Jaguar.

LSP is a streaming format of Amiga audio hardware register values, recorded while playing a module on a PC.

https://github.com/arnaud-carre/LSPlayer

I made a simple version, then a slightly more optimised version, both reading samples byte by byte.

Then I did a version that reads samples one long word (4 bytes) at a time and stores those 4 bytes for each channel in the SRAM of the DSP.
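To make the long-word approach concrete, here is a rough, untested sketch of how one signed 8-bit sample could be pulled out of a long word that was fetched earlier and kept per channel. It is not taken from the LSP_Jaguar source: the register roles below are assumptions for illustration, and it relies on the Jaguar being big-endian, so the byte at the lowest address sits in bits 31..24.

```asm
; Hypothetical register roles (not from the LSP_Jaguar source):
;   R10 = fixed-point sample pointer (byte address << nb_bits_virgule_offset)
;   R11 = long word already loaded from the long-aligned address holding that byte
	move		R10,R1
	shrq		#nb_bits_virgule_offset,R1	; R1 = integer part = byte address
	moveq		#3,R2
	and			R1,R2				; R2 = byte index inside the long word (0..3)
	shlq		#3,R2				; R2 = 0, 8, 16 or 24 bits
	neg			R2					; sh shifts left when the count is negative
	move		R11,R3
	sh			R2,R3				; wanted byte is now in bits 31..24
	sharq		#24,R3				; R3 = sample, sign-extended to 32 bits
```

A new long word only has to be fetched from DRAM when the byte address moves into the next long word, which is where the saving over byte-by-byte reads comes from.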

 

Currently, with the 68000 not displaying any debug information, it reaches 50 kHz replay on real hardware.

Do you have any advice to improve it?

Is transferring 2 long words (8 bytes) each time faster? For example, accessing the same DRAM for 8 bytes each time.

Is using the blitter to fill some small sample buffers a good idea to go beyond 50 kHz? Something like 8 x 512 bytes of buffers in the DSP RAM.

And is using the blitter from DSP code a good idea if I also want to use it from the GPU code?

Putting the samples entirely in the DSP SRAM makes it possible to reach 83 kHz (some modules use only 4 KB of samples).

 

Source code is available here:

https://github.com/ericde45/LSP_Jaguar

A video of the first working version is here:

 


Good work!

1 hour ago, Ericde45 said:

Do you have any advice to improve it?

You probably mean performance here, but for me it'd be a great improvement if one could also trigger samples playing in parallel (for example, sound effects in games).

As for performance, it's definitely better to grab more than 1 byte at a time. As for what is optimal, I'll defer that to people who actually have some experience with bus performance.


 

Just now, 42bs said:

For example, do not always push all registers, only those that are really destroyed.

I don't care about the 68000 part; it was written to be safe rather than optimized, to avoid any issues with the 68000 code.

It is only used for init and to display debug information.

You can put a stop #$2700 once the DSP is started.

 


15 minutes ago, ggn said:

Good work!

You probably mean performance here, but for me it'd be a great improvement if one could also trigger samples playing in parallel (for example, sound effects in games).

As for performance, it's definitely better to grab more than 1 byte at a time. As for what is optimal, I'll defer that to people who actually have some experience with bus performance.

I can write a version with 4 additional sample sources, each with its own frequency.

This will take some DSP CPU time of course, but it will surely be useful.

 


  

2 hours ago, Ericde45 said:

Is transferring 2 long words (8 bytes) each time faster? For example, accessing the same DRAM for 8 bytes each time.

Unfortunately, the DSP (unlike the GPU) cannot load 64 bits at once. So reading 2 long words will cause two bus transactions.

I'm not sure if doing two 32-bit loads in a row is worth it. I think the second will stall because the first one is not complete, and that this will waste DSP cycles that could have been used to do something else. But the best way to know is probably to try it and see if it helps or not.

Edited by Zerosquare

I'm currently looking at this on a tablet, so I'm a bit handicapped.

But for example, here the jump-slot is not used:

	store		R3,(R7)
	store		R4,(R8)					; store the repeat sample pointer in LSP_DSP_PAULA_AUD3L
	jump		(R12)				; jump to DSP_LSP_Timer1_skip3
	nop

Or here:

	shlq		#nb_bits_virgule_offset,R4
	store		R4,(R7)				; repeat sample pointer, fixed-point
; repeat length
	movei		#LSP_DSP_repeat_length2,R7
	loadw		(R5),R8				; .w = R8 = sample size
	shlq		#nb_bits_virgule_offset,R8				; to 16:16
	add			R4,R8
	store		R8,(R7)

There is a stall after the loads.

But of course, after interleaving, the code will look ugly.
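As a rough sketch of both suggestions (untested, and assuming nothing else in the surrounding routine depends on the original instruction order): the second store can move into the slot after the jump, because the instruction following a jump is executed before the jump takes effect, and the loadw can be issued earlier so that independent instructions run while the data arrives.

```asm
; 1) Fill the jump-slot: the nop is replaced by the second store.
	store		R3,(R7)
	jump		(R12)				; jump to DSP_LSP_Timer1_skip3
	store		R4,(R8)				; executed in the jump-slot

; 2) Start the load first, then do work that does not need R8.
	loadw		(R5),R8				; .w = R8 = sample size
	shlq		#nb_bits_virgule_offset,R4
	store		R4,(R7)				; repeat sample pointer, fixed-point
	movei		#LSP_DSP_repeat_length2,R7
	shlq		#nb_bits_virgule_offset,R8	; R8 has had more time to arrive
	add			R4,R8
	store		R8,(R7)
```

The second part assumes R5 and R8 are not touched by the instructions the load moved past, which holds for the snippet as quoted. How much of the wait is really hidden is best measured on hardware.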

Sign extending:

	shlq		#16,rn
	sharq		#16,rn
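A small worked example of the sign-extending trick from 42bs' post, with a value chosen purely for illustration: shifting left by 16 moves bit 15 into bit 31, and the arithmetic shift right then copies that bit back down through the upper half.

```asm
	movei		#$0000FFFE,R8		; 16-bit value $FFFE in the low word, upper word clear
	shlq		#16,R8				; R8 = $FFFE0000
	sharq		#16,R8				; R8 = $FFFFFFFE, i.e. -2 as a signed 32-bit value
```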

1 hour ago, 42bs said:

I'm currently looking at this on a tablet, so I'm a bit handicapped.

But for example, here the jump-slot is not used:

	store		R3,(R7)
	store		R4,(R8)					; store the repeat sample pointer in LSP_DSP_PAULA_AUD3L
	jump		(R12)				; jump to DSP_LSP_Timer1_skip3
	nop

Or here:

	shlq		#nb_bits_virgule_offset,R4
	store		R4,(R7)				; repeat sample pointer, fixed-point
; repeat length
	movei		#LSP_DSP_repeat_length2,R7
	loadw		(R5),R8				; .w = R8 = sample size
	shlq		#nb_bits_virgule_offset,R8				; to 16:16
	add			R4,R8
	store		R8,(R7)

There is a stall after the loads.

But of course, after interleaving, the code will look ugly.

Sign extending:

	shlq		#16,rn
	sharq		#16,rn

Nice trick for the sign extending!

I will work on optimisation. Timer1 has none so far.

I was aiming for working code first, with identical parts for each channel.

Thanks again for the advice.

 


	shlq		#16,R6
	loadw		(R5),R8
	or			R8,R6
	movei		#LSP_DSP_PAULA_AUD3LEN,R8
	shlq		#nb_bits_virgule_offset,R6
	store		R6,(R7)				; store the fixed-point sample pointer in LSP_DSP_PAULA_AUD3L
	addq		#2,R5
	loadw		(R5),R9				; .w = R9 = sample size
	shlq		#nb_bits_virgule_offset

If you move the movei after the load, you can fill the stall. Of course, you need to movei to R9 and not R8.
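A sketch of that reordering, untested. Swapping the roles of R8 and R9 for the rest of the sequence is an assumption made here so that the address in R9 is not overwritten by the second loadw; the code after this point would have to follow the same swap.

```asm
	shlq		#16,R6
	loadw		(R5),R8
	movei		#LSP_DSP_PAULA_AUD3LEN,R9	; independent of R8, runs while the load completes
	or			R8,R6
	shlq		#nb_bits_virgule_offset,R6
	store		R6,(R7)				; store the fixed-point sample pointer in LSP_DSP_PAULA_AUD3L
	addq		#2,R5
	loadw		(R5),R8				; assumption: sample size now goes to R8, R9 holds the address
```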


This is all over my head, but @SebRmv has ported a ProTracker routine for use in the Removers Library (a C code library for the Jaguar platform). Based on my very surface-level understanding of PT and LSP, these seem like very different ways of accomplishing the same thing, but I thought I would bring this up in case there is something in the Removers Library that is useful for your efforts.

There is a brief description of what Seb implemented on the Removers' webpage (Sound Manager section, line 142): https://github.com/theRemovers/rmvlib/blob/master/main.h

Here is Seb's repository for the latest version of the code: https://github.com/theRemovers/rmvlib


45 minutes ago, BitJag said:

This is all over my head, but @SebRmv has ported a ProTracker routine for use in the Removers Library (a C code library for the Jaguar platform). Based on my very surface-level understanding of PT and LSP, these seem like very different ways of accomplishing the same thing, but I thought I would bring this up in case there is something in the Removers Library that is useful for your efforts.

There is a brief description of what Seb implemented on the Removers' webpage (Sound Manager section, line 142): https://github.com/theRemovers/rmvlib/blob/master/main.h

Here is Seb's repository for the latest version of the code: https://github.com/theRemovers/rmvlib

LSP is optimised for speed by doing all the 'grunt work' with an external pre-processor.  It would be an awesome addition to the toolset for games if we can get additional channels for FX.


43 minutes ago, CyranoJ said:

LSP is optimised for speed by doing all the 'grunt work' with an external pre-processor.  It would be an awesome addition to the toolset for games if we can get additional channels for FX.

I can see how my previous comment could imply that I was saying we didn't need another tool for MOD file playback. My intention was to support your suggestion of adding additional channels for playback. The Removers Library supports up to 8 channels for MOD file playback, and I was suggesting that there might be something in that code that could help @Ericde45 enable similar functionality in his code.


21 minutes ago, BitJag said:

I can see how my previous comment could imply that I was saying we didn't need another tool for MOD file playback. My intention was to support your suggestion of adding additional channels for playback. The Removers Library supports up to 8 channels for MOD file playback, and I was suggesting that there might be something in that code that could help @Ericde45 enable similar functionality in his code.

I didn't read it that way. I was just trying to explain why LSP is 'better' for games (Less CPU=Good!) :)


Currently the PC converter executable for LSP does not handle 8-voice modules (I mean 8 voices of music tracks).

It is possible to split an 8-voice module into two 4-voice modules using OpenMPT, and then play it (I did an 8-voice LSP player on the Archimedes).

Is an 8-voice module interesting?

Or would a 4-voice module plus 4 voices for game sound effects, with variable replay frequencies for the sounds as well, be more useful?

 


More to optimize:

; test period channel 3
	btst		#3,R2
	jr			eq,DSP_LSP_Timer1_noPd
	nop
	loadw		(R0),R4
	movei		#LSP_DSP_PAULA_AUD3PER,R5
	addq		#2,R0
	store		R4,(R5)
DSP_LSP_Timer1_noPd:
; test period channel 2
	btst		#2,R2
	jr			eq,DSP_LSP_Timer1_noPc
	nop

If you use addqt, then you can move the btst of the next bit into the jump-slot.
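Sketched out (untested), that could look like the following. It relies on loadw, movei and store leaving the flags untouched, and on addqt being the flag-preserving form of addq, so the result of the btst placed in the jump-slot is still valid when the next jr is reached.

```asm
; test period channel 3
	btst		#3,R2
	jr			eq,DSP_LSP_Timer1_noPd
	btst		#2,R2				; jump-slot: already test the bit for channel 2
	loadw		(R0),R4
	movei		#LSP_DSP_PAULA_AUD3PER,R5
	addqt		#2,R0				; addqt does not change the flags
	store		R4,(R5)
DSP_LSP_Timer1_noPd:
; test period channel 2 (flags come from the btst in the jump-slot above)
	jr			eq,DSP_LSP_Timer1_noPc
	nop
```

The same pattern can be repeated down the chain, so each remaining nop becomes the btst for the following channel.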


  • 7 months later...

Today another milestone was reached:

working LSP replay v1.5 @ 51935 Hz

At last, Jaguar sound is better than the STE!

https://github.com/ericde45/LSP_Jaguar

The I2S code uses registers only: no memory access, no load, no store, except to the DAC.

It reads samples 4 bytes at a time.

Only 2 DSP registers are left out of the 64 available.

 


Edited by Ericde45