R0ger Posted March 20, 2023

In 2019, when the topic for the Forever 2020 party was announced to be "Robot", I came up with an idea to somehow improve the speech synthesis situation on Atari. Good old SAM can't talk with video on, is mostly only good for English, is not very good for singing, and nobody really understands how it works. And singing was something I was especially interested in. I ended up writing a new speech synth from scratch.

Well, Forever 2020 didn't happen. Nor did Forever 2021, nor 2022. But it did finally happen this year. So after all those years, I barely managed to finish the demo in the last two weeks, and here is the result:

I plan to give a lecture about it in Czech at this year's Atariada, and I will make a video or a megapost about it here in English soon after. That would be early May. So at the moment I don't want to go too deep into explaining how it works. But here are a few points:

- it can do 2 voices with low DMA modes, like Antic D or low-res text mode. It can do 1 voice with full-screen hires. It doesn't like badlines though, so no hires text modes. And with a bit of work, it can also do other things in between, like play a POKEY song and do slight animation.
- this demo plays the 2 voices in separate channels in stereo, but it is also possible to mix them on the CPU and use only 1 channel for 2 voices of speech.
- of course it can also talk. That was supposed to be part of the demo, I just didn't manage to make that part work before the deadline.
- at the moment it sucks at English, as I didn't need it for the demo. But I'm working on it, as well as on some other languages.
- compared to SAM (which I somewhat understand now, as some good soul rewrote it in C) it needs more memory (about 15k for the sample bank) but way less CPU. Features and output quality are similar.
- mine has some special features for singing, like 16-bit frequency control, vibrato, and frame-perfect timing.
- POKEY music uses LZSS. The code is based on @rensoup's modified LZSSP, and I use his RMT2LZSS.
- speech output is every 2 lines, so about 8 kHz, 4 bits. I can also do 8-bit output, but it doesn't help much, and usually it's not worth the extra channel.

Stay tuned for more demos, and one day, hopefully, something other people can use too.

PS. You need stereo for the demo!

zvonky.xex
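As a rough check of the "every 2 lines, so about 8 kHz" figure above, here is a minimal Python sketch. The clock frequencies and the 114 machine cycles per scanline are the usual nominal Atari 8-bit values and are assumptions added here, not numbers from the post; `sample_rate_hz` is a hypothetical helper name.

```python
# Assumed nominal values (not stated in the post):
# machine clock per TV system and 114 machine cycles per scanline.
CYCLES_PER_SCANLINE = 114
MACHINE_CLOCK_HZ = {"PAL": 1_773_447, "NTSC": 1_789_773}


def sample_rate_hz(system: str, lines_per_sample: int = 2) -> float:
    """Rate at which samples are output when one sample is written
    every `lines_per_sample` scanlines."""
    line_rate = MACHINE_CLOCK_HZ[system] / CYCLES_PER_SCANLINE
    return line_rate / lines_per_sample


if __name__ == "__main__":
    for system in ("PAL", "NTSC"):
        print(f"{system}: {sample_rate_hz(system):.0f} Hz")
```

Under these assumptions both systems land a little below 8 kHz, which is consistent with the "about 8 kHz" in the post.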
Irgendwer Posted March 20, 2023 (edited March 20, 2023)

Nice! Here is how the German version should sound 😁:
R0ger Posted March 20, 2023

Of course I expected the Germans to be the first to react 😁 Oh wait, the harmonies are different from the Czech & Slovak version!
Rybags Posted March 20, 2023

Cool. It would be interesting to hear normal speech as well though.

SAM would benefit hugely if it was changed to use POKEY timers instead of delay loops.
rensoup Posted March 21, 2023

Nice! Even though it's difficult to assess the speech quality because of the Czech language.

How about a downgraded mono version with just 2 channels for music? Stereo-only seems to imply some kind of cheat, even though there obviously isn't one.

Waiting for that paper too!
Heaven/TQA Posted March 21, 2023

I really love that intro… same with the lovely PM animations (mouth). Ace.
rdefabri Posted March 21, 2023

Very cool! It always bothered me when S.A.M. blacked out the screen, or when Berzerk pauses when it speaks. Looking forward to seeing this evolve - your persistence has paid off and is to be admired!!!
thorfdbg Posted March 21, 2023

This is great news! Can you provide some details on the technology behind your code? SAM is not sample based, BTW. It is based on modelling the human vocal tract, which adds a lot of complexity but brings quite some flexibility. At least in theory, as nobody (except SoftVoice) knows how to adapt it to other languages. The latter would be possible in principle, but it's unfortunately undocumented.
Rybags Posted March 21, 2023

I'd think SAM should do other languages mostly well since it takes phonetics as input.
R0ger Posted March 21, 2023

26 minutes ago, Rybags said: I'd think SAM should do other languages mostly well since it takes phonetics as input.

Indeed it does. The issue is that every language has different phonetics. For example, the Czech "R" is completely different from the English one. What's worse, even the vowels are shifted a bit. You can't get clean Czech vowels out of SAM, which makes SAM sound like an uncle from America. And it's exactly the same the other way around. At the moment I don't have the English "R", I have poor support for diphthongs, and I'm missing a few other sounds, which makes for really bad English. On the other hand, I can do Japanese or Italian just about perfectly, as they are phonetically very similar to Czech. But don't worry, English is certainly on the list.

I will probably make a mono version of the demo, and I will probably try a German one, even without proper German sounds; I can always improve it later. And I'm thinking about Bad Apple with actual singing (but obviously without the video, or with a very simple animation, something like this demo). For those I just need to reuse the tech I have now, without improving it. After that, English is on the table. That will require some more research and experimenting.
rcamp48 Posted March 21, 2023

I may be able to help you with the English part. What language are you coding it in?

Russ
+Philsan Posted March 21, 2023

Cool, I am at your disposal for the Italian language!
Ppyo Posted March 21, 2023

As far as pronunciation goes, Spanish could be the easiest. Not many weird noises.
Beeblebrox Posted March 21, 2023 (edited March 21, 2023)

I keep running the demo. Ha - brilliant! Love the mouth animations. I'd definitely be up for seeing some English songs with these two. Perhaps enter them for Eurovision, heh heh!

The last time I was pleasantly impressed with A8 speech synthesis was in the Cyberpunk demo:
R0ger Posted March 21, 2023

4 hours ago, Beeblebrox said: Last time I was pleasantly impressed with A8 speech synthesis was in the Cyberpunk demo:

That sounds cool, but it seems to be just a really low base frequency for the speech. I encountered this effect during my tests; as soon as I get my talking working again, I'll post some examples.

Anyway, here is the mono version with software mixing. Quite mediocre, I must say. I ran it like this for months and only switched to stereo a week ago, but now I can see (I mean hear) how superior hardware mixing is. There is also only one oscilloscope: it's a cheap effect, but only if I just reuse the value I'm sending to POKEY, which in this version is the mix. It's also quieter, as I need extra room in the amplitude range to prevent overflow. And it's also bigger, as I didn't bother to pack it.

zvonky.mono.xex
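To illustrate why software mixing needs "extra room in the amplitude range" and therefore sounds quieter, here is a minimal Python sketch. It is not R0ger's actual mixer: it assumes two 4-bit voices that must share one 4-bit POKEY volume value, and `mix_4bit` and `audc_volume_only` are hypothetical helper names. The only hardware detail relied on is that bit 4 of a POKEY AUDCx register selects volume-only output, with the level in the low 4 bits.

```python
def mix_4bit(voice_a: int, voice_b: int) -> int:
    """Mix two 4-bit samples (0..15) into one 4-bit value.

    Each voice is halved first, so the sum stays in 0..14 and can
    never overflow the 4-bit range. The price is that each voice
    only gets half of the amplitude range, i.e. it is quieter.
    """
    for v in (voice_a, voice_b):
        if not 0 <= v <= 15:
            raise ValueError("samples must be 4-bit (0..15)")
    return (voice_a >> 1) + (voice_b >> 1)


def audc_volume_only(level: int) -> int:
    """Byte for a POKEY AUDCx register in volume-only mode
    (bit 4 set, 4-bit level in the low nibble)."""
    if not 0 <= level <= 15:
        raise ValueError("level must be 4-bit (0..15)")
    return 0x10 | level


if __name__ == "__main__":
    # Software mixing: one channel carries both voices at half amplitude.
    print(hex(audc_volume_only(mix_4bit(15, 9))))
    # Hardware mixing: each voice keeps the full 0..15 range on its own channel.
    print(hex(audc_volume_only(15)), hex(audc_volume_only(9)))
```

With hardware mixing, as in the stereo version, each voice can use the full range on its own channel, which matches the observation in the post that the stereo version sounds better.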
+MrFish Posted March 21, 2023

40 minutes ago, R0ger said: Anyway .. here is the mono version with software mixing. Quite mediocre must say.

It still sounds great in mono.
Havok69 Posted March 21, 2023 (edited March 22, 2023)

You HAVE to do this as a demo!
thorfdbg Posted March 22, 2023

11 hours ago, R0ger said: Indeed it does. The issue is every language has different phonetics. For example Czech "R" is completely different to English one. What's worse, even vowels are shifted a bit. You can't get clean Czech vowels out of Sam. Which makes Sam sound like uncle from America. And it's exactly the same the other way around. At the moment I don't have English "R", and I have poor support for diphthongs, and I miss few other sounds .. which makes for really bad English. On the other hand I can do Japanese or Italian just about perfectly, as they are very similar to Czech phonetically. But don't worry, English is certainly on the list. I will probably make mono version of the demo, probably will try German one, even without proper German sounds, I can always improve it later. And I'm thinking about Bad Apple with actual singing (but obviously, without the video, or with very simpler animation, something like this demo). For those I need just to reuse the tech I have now, without improving it. After that, English is on the table. That will require some more research and experimenting.

You probably mean "indeed it does not". As you say, completely correctly, the main issue is that SAM (and its related products, such as the Amiga narrator.device, also from SoftVoice) does not support the phonemes of any language other than English. There is no German "R", no German "Ü", no German "CH" (actually, we have two of them). Even with these phonemes present, the result would still sound like an American trying to speak German, as the "melody" of the sentence is not right.
_The Doctor__ Posted March 22, 2023

Word order is different as well, and the current trend is to spay and neuter everything. It could be refreshing if Old English were used.
R0ger Posted March 22, 2023

41 minutes ago, Havok69 said: You HAVE to do this as a demo!

Haha, it wouldn't be easy, but it could be hilarious. Keep the ideas coming guys, but I don't promise anything.
rensoup Posted March 22, 2023

2 hours ago, R0ger said: Anyway .. here is the mono version with software mixing. Quite mediocre must say

Why not just use another audio channel for the 2nd singer (but only when both are singing)?
Irgendwer Posted March 22, 2023

20 minutes ago, R0ger said: Keep the ideas coming guys, but I don't promise anything

Go for it!
Sandor / HARD Posted March 22, 2023 (edited March 22, 2023)

Very cool demo, huge thumbs up! Your tech sounds even better than the one in my favorite VIC-20 demo "Robotic Liberation". I hope to see Atari demos from various teams with your stuff in them, something like here:
+MrFish Posted March 22, 2023

19 hours ago, R0ger said: Keep the ideas coming guys...

Anything by Kraftwerk
+MrFish Posted March 22, 2023

BTW... Kraftwerk used the TI Language Translator for various audio portions on the Computer World album.