New speech synthesis for Atari

R0ger · March 20, 2023

In 2019, when the topic for the 2020 Forever party was announced to be "Robot", I came up with an idea to somehow improve the speech synthesis situation on Atari. Good old SAM can't talk with the video on, is mostly only good for English, is not very good for singing, and nobody really understands how it works. And singing was something I was especially interested in.

I ended up writing a new speech synth from scratch.

Well, 2020 Forever didn't happen. Nor did Forever 2021, nor 2022. But it did finally happen this year. So after all those years, I barely managed to finish the demo in the last two weeks, and here is the result:

I plan to give a lecture about it in Czech at this year's Atariada, and I will make a video or a megapost about it here in English soon after. That would be early May.

So at the moment I don't want to go too deep into explaining how it works. But here are a few points:

- it can do 2 voices with low DMA modes, like Antic D, or low res text mode. It can do 1 voice with full-screen hires. It doesn't like badlines though, so no hires text modes. And with a bit of work, it can also do other things in between, like play a POKEY song, and do slight animation.

- this demo plays the 2 voices in separate channels in stereo, but it is also possible to mix them by CPU and only use 1 channel for 2 voices of speech.

- of course it can also talk. It was supposed to be part of the demo, I just didn't manage to make that part work before the deadline.

- at the moment it sucks at English, as I didn't need it for the demo. But I'm working on it, as well as some other languages.

- compared to SAM (which I somewhat understand now, as some good soul rewrote it in C) it needs more memory (about 15k for the sample bank) but way less CPU. Features and output quality are similar.

- mine has some special features for singing, like 16-bit frequency control, vibrato, and frame-perfect timing.

- POKEY music uses LZSS. The code is based on @rensoup's modified LZSSP, and I use his RMT2LZSS.

- speech output is every 2 lines, so about 8 kHz, 4 bits. I can also do 8-bit output, but it doesn't help much, and usually it's not worth the extra channel.
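
To put rough numbers on the timing and pitch points above, here is a small sketch. The clock and cycles-per-line figures are the standard Atari 8-bit values. Reading "16 bit frequency control" as a 16-bit phase-accumulator increment is only an assumption made for illustration, and all function names are hypothetical.

```python
# Back-of-the-envelope numbers for "one sample every 2 scanlines".
# Clock and cycles-per-line values are standard Atari 8-bit figures.
# ASSUMPTION: "16 bit frequency control" is modelled as a phase accumulator
# with a 16-bit increment; the post does not say how it is implemented.

CYCLES_PER_LINE = 114  # machine cycles per scanline
CPU_CLOCK_HZ = {"PAL": 1_773_447, "NTSC": 1_789_773}


def sample_rate_hz(system: str, lines_per_sample: int = 2) -> float:
    """Output rate when one sample is written every `lines_per_sample` lines."""
    line_rate = CPU_CLOCK_HZ[system] / CYCLES_PER_LINE
    return line_rate / lines_per_sample


def cycles_per_sample(lines_per_sample: int = 2) -> int:
    """Machine cycles between two samples, before ANTIC DMA steals any."""
    return CYCLES_PER_LINE * lines_per_sample


def pitch_resolution_hz(rate_hz: float, increment_bits: int) -> float:
    """Smallest pitch step a phase accumulator of this width can express."""
    return rate_hz / (1 << increment_bits)


def pitch_increment(freq_hz: float, rate_hz: float, bits: int = 16) -> int:
    """Per-sample accumulator increment that produces `freq_hz`."""
    return round(freq_hz * (1 << bits) / rate_hz)


if __name__ == "__main__":
    for system in CPU_CLOCK_HZ:
        rate = sample_rate_hz(system)
        print(f"{system}: {rate:.0f} Hz output, "
              f"{cycles_per_sample()} cycles per sample")
        print(f"  8-bit pitch step:  {pitch_resolution_hz(rate, 8):.2f} Hz")
        print(f"  16-bit pitch step: {pitch_resolution_hz(rate, 16):.3f} Hz")
        print(f"  increment for 440 Hz: {pitch_increment(440.0, rate)}")
```

Under these assumptions the output rate comes out a little under 8 kHz on both PAL and NTSC, and a 16-bit increment gives pitch steps well below 1 Hz, where an 8-bit one would only allow steps of roughly 30 Hz. That difference is what makes fine tuning and vibrato practical for singing.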

Stay tuned for more demos, and one day, hopefully, something other people can use too.

PS. you need stereo for the demo !

zvonky.xex

Irgendwer · March 20, 2023

Nice!

Here's how the German version should sound 😁:

Edited March 20, 2023 by Irgendwer

R0ger · March 20, 2023

Of course I expected the Germans to be the first to react 😁 Oh wait, the harmonies are different from the Czech & Slovak version!

Rybags · March 20, 2023

Cool. Would be interesting to hear normal speech as well though.

SAM - would benefit hugely if it was changed to use Pokey Timers instead of delay loops.

rensoup · March 21, 2023

Nice! Even though it's difficult to assess the speech quality because of the Czech language.

How about a downgraded mono version with just 2 channels for music? Stereo-only seems to imply some kind of cheat, even though there obviously isn't one.

Waiting for that paper too!

Heaven/TQA · March 21, 2023

I really love that intro… same with the lovely PM animations (mouth). Ace.

rdefabri · March 21, 2023

Very cool! It always bothered me when S.A.M. blacked out the screen, or when Berzerk pauses when it speaks.

Looking forward to seeing this evolve - your persistence has paid off and is to be admired!!!

thorfdbg · March 21, 2023

This is great news! Can you provide some details on the technology that is behind your code? SAM is not sample based, BTW. It is based on modelling the human vocal tract, which adds a lot of complexity, but brings quite some flexibility. At least in theory, as nobody (except SoftVoice) knows how to adapt it to other languages. The latter would be possible, in principle, but it's unfortunately undocumented.

Rybags · March 21, 2023

I'd think SAM should do other languages mostly well since it takes phonetics as input.

R0ger · March 21, 2023

26 minutes ago, Rybags said:

I'd think SAM should do other languages mostly well since it takes phonetics as input.

Indeed it does. The issue is that every language has different phonetics. For example, the Czech "R" is completely different from the English one. What's worse, even the vowels are shifted a bit. You can't get clean Czech vowels out of SAM, which makes SAM sound like an uncle from America. And it's exactly the same the other way around. At the moment I don't have an English "R", I have poor support for diphthongs, and I'm missing a few other sounds .. which makes for really bad English. On the other hand, I can do Japanese or Italian just about perfectly, as they are phonetically very similar to Czech.

But don't worry, English is certainly on the list. I will probably make a mono version of the demo, and will probably try a German one, even without proper German sounds; I can always improve it later. And I'm thinking about Bad Apple with actual singing (but obviously without the video, or with a very simple animation, something like this demo). For those I just need to reuse the tech I have now, without improving it.

After that, English is on the table. That will require some more research and experimenting.

rcamp48 · March 21, 2023

I may be able to help you with the English part. What language are you coding it in?

Russ

+Philsan · March 21, 2023

Cool, I am at your disposal for the Italian language!

Ppyo · March 21, 2023

As far as pronunciation goes, Spanish could be the easiest. Not many weird noises.

Beeblebrox · March 21, 2023

I keep running the demo. Ha - Brilliant! Love the mouth animations. Definitely be up for seeing some English songs with these two. Perhaps enter them in for Eurovision heh heh!

Last time I was pleasantly impressed with A8 speech synthesis was in the Cyberpunk demo:

Edited March 21, 2023 by Beeblebrox

R0ger · March 21, 2023

4 hours ago, Beeblebrox said:

Last time I was pleasantly impressed with A8 speech synthesis was in the Cyberpunk demo:

That sounds cool, but it seems to be just a really low base frequency for the speech. I encountered this effect during my tests; as soon as I get my talking working again, I'll post some examples.

Anyway .. here is the mono version with software mixing. Quite mediocre, I must say. I ran it like this for months, and only switched to stereo a week ago, but now I can see (I mean hear) how superior hardware mixing is. Also, there is only one oscilloscope. It's a cheap effect, but only if I just reuse the value I'm sending to POKEY, which in this version is the mix.

It's also quieter, as I need extra room in the amplitude range to prevent overflow. And it's also bigger, as I didn't bother to pack it ;-)
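
The trade-off described above can be sketched as follows. The volume-only bit of POKEY's AUDCx registers is a hardware fact, but the halving scheme in `mix_software` is only an assumed way of reserving headroom, and the function names are hypothetical; the demo's actual mixing code is not shown in this thread.

```python
# Hardware vs. software mixing of two 4-bit voices.
# ASSUMPTION: the software mix halves each voice to make room for the sum;
# the post only says extra room in the amplitude range is needed.
from typing import Tuple

VOLUME_ONLY = 0x10  # AUDCx bit 4: volume-only mode, low 4 bits are the level
MAX_LEVEL = 15      # largest value a 4-bit output can take


def audc_value(level: int) -> int:
    """Build the AUDCx byte that outputs `level` directly."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError(f"level {level} does not fit in 4 bits")
    return VOLUME_ONLY | level


def mix_hardware(voice_a: int, voice_b: int) -> Tuple[int, int]:
    """One channel per voice: each keeps its full 0..15 range."""
    return audc_value(voice_a), audc_value(voice_b)


def mix_software(voice_a: int, voice_b: int) -> int:
    """One channel for both voices: halve each so the sum cannot overflow."""
    mixed = (voice_a >> 1) + (voice_b >> 1)  # at most 7 + 7 = 14
    return audc_value(mixed)


if __name__ == "__main__":
    # A lone voice at full level peaks at 15 in hardware but only 7 in software.
    print(mix_hardware(15, 0))
    print(mix_software(15, 0) & 0x0F)
```

With this scheme each voice loses one bit of resolution and half of its peak level, which is consistent with the mono version sounding quieter and rougher than the stereo one.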

zvonky.mono.xex

+MrFish · March 21, 2023

40 minutes ago, R0ger said:

Anyway .. here is the mono version with software mixing. Quite mediocre, I must say.

It still sounds great in mono.

Havok69 · March 21, 2023

You HAVE to do this as a demo!

:-D

Edited March 22, 2023 by Havok69

thorfdbg · March 22, 2023

11 hours ago, R0ger said:

Indeed it does. The issue is that every language has different phonetics. For example, the Czech "R" is completely different from the English one. What's worse, even the vowels are shifted a bit. You can't get clean Czech vowels out of SAM, which makes SAM sound like an uncle from America. And it's exactly the same the other way around. At the moment I don't have an English "R", I have poor support for diphthongs, and I'm missing a few other sounds .. which makes for really bad English. On the other hand, I can do Japanese or Italian just about perfectly, as they are phonetically very similar to Czech.

But don't worry, English is certainly on the list. I will probably make a mono version of the demo, and will probably try a German one, even without proper German sounds; I can always improve it later. And I'm thinking about Bad Apple with actual singing (but obviously without the video, or with a very simple animation, something like this demo). For those I just need to reuse the tech I have now, without improving it.

After that, English is on the table. That will require some more research and experimenting.

Probably you mean "indeed it does not". As you say, completely correctly, the main issue is that SAM (and its related products, such as the Amiga narrator.device, also from SoftVoice) does not support the phonemes of any language other than English. There is no German "R", no German "Ü", no German "CH" (actually, we have two of them). Even with these phonemes present, the result would still sound like an American trying to speak German, as the "melody" of a sentence is not right.

_The Doctor__ · March 22, 2023

Word order is different as well, and current trend is to spay and neuter everything. Could be refreshing if old English were used.

R0ger · March 22, 2023

41 minutes ago, Havok69 said:

You HAVE to do this as a demo!

Haha, wouldn't be easy, but could be hilarious. Keep the ideas coming guys, but I don't promise anything :-D

rensoup · March 22, 2023

2 hours ago, R0ger said:

Anyway .. here is the mono version with software mixing. Quite mediocre, I must say.

Why not just use another audio channel for the 2nd singer (but only when both are singing) ?

Irgendwer · March 22, 2023

20 minutes ago, R0ger said:

Keep the ideas coming guys, but I don't promise anything

Go for it!

Sandor / HARD · March 22, 2023

Very cool demo, huge thumbs up!

Your tech sounds even better than the one in my favorite VIC-20 demo "Robotic Liberation".

I hope to see Atari demos from various teams with your stuff in them, something like here:

Edited March 22, 2023 by Sandor / HARD

+MrFish · March 22, 2023

19 hours ago, R0ger said:

Keep the ideas coming guys...

Anything by Kraftwerk

+MrFish · March 22, 2023

BTW... Kraftwerk used the TI Language Translator for various audio portions on the Computer World album.
