Tursi (posted August 13, 2016):

Some time ago artrag mentioned a voice converter he had created that ran on the MSX and produced really nice voice at 60 Hz. He was kind enough to adapt it for the PSG in the TI-99/4A and ColecoVision, and after lots of work-induced delays, I've put together a small package for those interested in using it.

With only three voices, the results are not quite as nice as on the MSX, but most voice is still quite intelligible. I've posted a sample video on YouTube (it's just a quick and dirty test, but it demonstrates both good samples and not-so-good samples, so you can hear the range): https://www.youtube.com/watch?v=wkBShy-EFkI

Playback takes very little CPU (just unpacking 6 bytes per frame) and a fair amount of memory (360 bytes per second). It's quite good for adding short voice samples! On the TI side it may not be quite as handy, since the Speech Synthesizer can do as well or better with some tuning, but it's still a nice option to have and can enable speech without the Speech Synthesizer.

The actual converter runs under Matlab, so it requires the Matlab runtime and 64-bit Windows to execute. Alternatively, the Matlab script is included, so you can run it on your choice of platform if you are able to run Matlab scripts. I experimented with Octave, and although it didn't run out of the box, I eventually got an early version processing there.

For playback, I've included assembly playback code for both the TI-99/4A and the ColecoVision (the ColecoVision code is hand-optimized from SDCC output and runs fine linked into C programs). There's also a VGM converter with C source in case you need VGM audio files (for instance, for my VGM compressor).

Anyway, hope you enjoy! The archive is posted on my site: http://harmlesslion.com/software/artvoice
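To put the memory figure in perspective, here is a minimal sketch of the arithmetic behind it. It assumes only the two numbers given in the post (6 bytes per frame, 60 frames per second); the function names are made up for illustration and are not part of the package.

```python
FRAME_RATE_HZ = 60      # one update per video frame, as stated in the post
BYTES_PER_FRAME = 6     # packed data consumed each frame, as stated in the post


def clip_size_bytes(seconds: float) -> int:
    """Approximate storage needed for a voice clip of the given length."""
    frames = round(seconds * FRAME_RATE_HZ)
    return frames * BYTES_PER_FRAME


def seconds_that_fit(budget_bytes: int) -> float:
    """How many seconds of voice fit into a given memory budget."""
    return budget_bytes / (FRAME_RATE_HZ * BYTES_PER_FRAME)


if __name__ == "__main__":
    print(clip_size_bytes(1.0))       # one second of voice
    print(seconds_that_fit(8 * 1024)) # seconds of voice in an 8 KB budget
```

One second costs 360 bytes, so an 8 KB budget holds a little under 23 seconds of voice, which is why this suits short samples best.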
ti99iuc (posted August 13, 2016): Oh my... I find this fantastic!! :D The laugh is incredible, ahahah.
+OLD CS1 (posted August 13, 2016): Loud female voices seem to work the best. Plenty of those around these parts, if you believe another thread. It is quite an interesting demo, and I wonder if the algorithms can be improved. Did anybody ever use SAMS on other platforms?
ti99iuc (posted August 13, 2016): Yes, right, it is very similar to SAMS. I used it on the Commodore years ago, and I also had a speech utility for GW-BASIC on my old DOS PC.
JamesD (posted August 13, 2016): It sounds very 80s. The sample rate is a bit low to make it easy to understand. "Beware I Live" is barely recognizable. I'm sure it's a frequency issue. But some other sounds are pretty good. The higher sounds, like the laugh, are really good!
Tursi (posted August 14, 2016):

There's no sample rate in the traditional sense; that's how it works at 60 Hz. Instead, every frame it changes the tones being played on the sound chip.

The muffling you hear comes from the fact that low-frequency voices have more harmonics, which makes choosing the "best" frequency more difficult. The more voices you have, the more accurately you can reproduce the sound, but we only have three. The same goes for low-volume sounds: they are harder to distinguish from the background noise.

The samples that come from speech chips, specifically Sinistar's "Beware I Live" and "I am the Texas Instruments Home Computer", suffer from double compression: first the original WAV file to LPC, then LPC to the limited frequency selection of the sound chip. Likewise, Megabyte in the middle ("Yes yes yes") and Fluttershy at the end ("whatever you want to do") suffer from being too quiet. If you play with it a bit, you can quickly predict which samples will work better than others.

The issues are understood, but solving them is more difficult. Again, with more voices to throw at the problem, we can worry less about the "best" harmonic by just playing more of them, and it makes an audible difference. (It would be interesting to try a version that plays back on the FourTI card.)

SAMS: do you guys mean SAM (Software Automatic Mouth)? It uses a similar concept. I started porting a port of it to the TI some years ago, but at the time GCC wasn't up to the task. It may be able to do something with it now. Here's the page of the group that ported the original 6502 code to C: http://hitmen.c02.at/html/tools_sam.html
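The "limited frequency selection of the sound chip" can be made concrete with a small sketch. It assumes the usual SN76489-style tone generator (a 10-bit divider, output frequency = clock / (32 × divider)) and an NTSC clock of 3.579545 MHz. The function names are illustrative only, and the 6-byte packed frame format used by the actual package is not shown here.

```python
PSG_CLOCK_HZ = 3_579_545  # assumed NTSC clock for the sound chip


def freq_to_divider(freq_hz: float, clock_hz: int = PSG_CLOCK_HZ) -> int:
    """Nearest 10-bit tone divider for a target frequency (clamped to 1..1023)."""
    divider = round(clock_hz / (32 * freq_hz))
    return max(1, min(1023, divider))


def divider_to_freq(divider: int, clock_hz: int = PSG_CLOCK_HZ) -> float:
    """Frequency the chip actually produces for a given divider."""
    return clock_hz / (32 * divider)


def tone_bytes(channel: int, divider: int) -> bytes:
    """Two-byte tone command: latch byte with the low 4 bits, then the high 6 bits."""
    first = 0x80 | (channel << 5) | (divider & 0x0F)
    second = (divider >> 4) & 0x3F
    return bytes([first, second])


def volume_byte(channel: int, attenuation: int) -> int:
    """One-byte attenuation command (0 = loudest, 15 = silent)."""
    return 0x90 | (channel << 5) | (attenuation & 0x0F)


if __name__ == "__main__":
    d = freq_to_divider(440.0)
    print(d, divider_to_freq(d), tone_bytes(0, d).hex())
    print(divider_to_freq(1023))  # lowest tone the divider can reach
```

Under these assumptions the lowest reachable tone is roughly 109 Hz, and the steps between adjacent dividers get coarser as the frequency rises, so each detected peak has to be snapped to the nearest available pitch.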
+OLD CS1 (posted August 14, 2016): Yeah, SAM. I fat-fingered the extra 'S'.
artrag (posted August 14, 2016): This is how it works on MSX (using a dedicated SCC chip with 5 channels).
Asmusr (posted August 14, 2016): A bit off topic, but among the VGM files for the music to the Sega Master System version of After Burner I noticed the attached file that contains speech (it can be played using VGMPlay). I didn't think VGM files could contain speech, so how does this work? (Attachment: After Burner - 02 - Get Ready.vgm)
artrag (posted August 14, 2016): The algorithm is in the .m file. The WAV file is resampled at 8 kHz and segmented into chunks of 1/60 of a second. Each chunk is converted to the frequency domain via a DFT on a number of points sufficient to guarantee a frequency resolution of 1 Hz. The algorithm looks for the 3 highest peaks in the segment and records their frequencies and amplitudes. This information is encoded as SN76489-family parameters and saved in the output file.
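The steps described above can be sketched roughly as follows. This is not the .m file: it is a minimal pure-Python illustration that assumes the frame has already been resampled to 8 kHz (about 133 samples per 1/60 s), evaluates the DFT directly at 1 Hz steps instead of zero-padding an FFT, and treats a "peak" as a local maximum of the magnitude spectrum. The name `frame_peaks` and its parameters are made up for this example.

```python
import cmath
import math


def frame_peaks(samples, sample_rate=8000, f_min=110, f_max=3999, count=3):
    """Return up to `count` (frequency_hz, magnitude) pairs, strongest first.

    The spectrum is evaluated at every integer frequency from f_min to f_max,
    which gives the 1 Hz resolution mentioned in the post.
    """
    freqs = range(f_min, f_max + 1)
    mags = []
    for f in freqs:
        step = -2j * math.pi * f / sample_rate
        acc = sum(x * cmath.exp(step * n) for n, x in enumerate(samples))
        mags.append(abs(acc))

    # Keep only local maxima, so one strong tone does not fill all slots
    # with its immediate neighbours (an assumption about what "peak" means).
    peaks = [
        (freqs[i], mags[i])
        for i in range(1, len(mags) - 1)
        if mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]
    ]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:count]


if __name__ == "__main__":
    # One 1/60 s frame of a 440 Hz test tone at 8 kHz.
    frame = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(133)]
    for freq, mag in frame_peaks(frame, f_min=200, f_max=800):
        print(freq, round(mag, 2))
```

With such a short frame, each tone smears across a wide main lobe with side lobes around it, which hints at why closely spaced harmonics in low voices are hard to separate with only three channels.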
Tursi (posted August 15, 2016), replying to Asmusr's question about how a VGM file can contain speech: VGM files can contain cycle-accurate audio; that's how this one works. If you compressed it using my vgmcomp tool, you'd get a warning about lost resolution and not hear the speech in the resulting file. The files compressed by this new tool are trivially converted to VGM (I included the tool) with no loss of resolution.
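As a rough illustration of why the conversion is straightforward: with one batch of chip writes per 1/60 s frame, each frame maps onto a handful of VGM commands followed by a fixed wait. The sketch below builds only the command stream, using the VGM commands 0x50 (SN76489 write), 0x62 (wait 735 samples, i.e. 1/60 s at 44100 Hz) and 0x66 (end of sound data). The file header is omitted, the function name is hypothetical, and the included C converter may well work differently.

```python
def frames_to_vgm_commands(frames):
    """Build a VGM command stream from per-frame lists of SN76489 data bytes.

    `frames` is an iterable where each item holds the bytes written to the
    sound chip during one 1/60 s frame. No VGM header is produced here.
    """
    out = bytearray()
    for writes in frames:
        for value in writes:
            out += bytes([0x50, value & 0xFF])  # 0x50 dd: write dd to the SN76489
        out.append(0x62)                        # 0x62: wait 735 samples (one 60 Hz frame)
    out.append(0x66)                            # 0x66: end of sound data
    return bytes(out)


if __name__ == "__main__":
    # Frame 1 sets channel 0 to divider 254 at full volume; frame 2 changes nothing.
    stream = frames_to_vgm_commands([[0x8E, 0x0F, 0x90], []])
    print(stream.hex(" "))
```

Sampled speech like the After Burner file relies on much finer-grained waits between writes, whereas this per-frame data only ever needs the fixed one-frame wait.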
JamesD (posted August 15, 2016): The TI really needs a programmable interrupt timer similar to the one on the CoCo 3. Then you could select an appropriate playback rate for individual samples. Twice the playback rate would offer pretty decent quality without slowing down the machine significantly. Maybe a future RAM / drive interface board will add such a thing. It's not that difficult: I implemented the CoCo 3 timer in Verilog based on the specs in a few hours, and I'm just learning Verilog.
mikiex (posted October 1, 2016): Nice work. I wonder if you would get any further improvement from updating twice a frame, at 120 Hz?