Intellivoice is How Hard?

+DZ-Jay · May 30, 2018

Could you add an additional syntax for VOICE command, VOICE PLAY (string)? It would (at compile time) convert the string into English phonemes so you can put in simple English phrases easily.

That's not trivial and is the text-to-speech feature that intvnut was talking about.

The Navy algorithm is available, and the source of SAM for the Apple ][ is available for free too, for whoever wishes to implement it for the Intellivision.

carlsson · May 30, 2018

If you're going to spend some quality time making programs utilizing the IntelliVoice, surely you can put aside 20 minutes of the development time to come up with phrases. There is a manual for the SP0256 chip somewhere that lists all the phonemes, along with typical words you'd hear them in, to guide you on which ones to use where.

I'm not saying that text-to-speech would be useless, but that you can get very far through your own trial and error if you read the manual first.

intvnut · May 30, 2018

That's not trivial and is the text-to-speech feature that intvnut was talking about.

The Navy algorithm is available, and the source of SAM for the Apple ][ is available for free too, for whoever wishes to implement it for the Intellivision.

There seems to be a lot of confusion on this point, so I will explain it again.

Allophones are pre-recorded generic fragments of words for sounds like "TH" and "UH" that can be strung together to pronounce "THE". These give you the generic "robot voice." There is an allophone library from the SP0256-AL2 that comes with IntyBASIC today, thanks to a generous arrangement from Microchip. You can string together allophones and speak words in "robot voice." Some allophone libraries allow you to tweak parameters such as pitch or speed; however, the allophone library that comes from the SP0256-AL2 does not.
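To make "stringing allophones together" concrete, here is a minimal Python sketch. It only models the bookkeeping: it is not IntyBASIC syntax and it produces no audio. The name set is a partial, illustrative subset of SP0256-AL2 allophone names as I recall them (check the chip manual for the full list), and `build_phrase` is a hypothetical helper.

```python
# Partial, illustrative subset of SP0256-AL2 allophone names.
# Assumption: consult the chip manual for the complete, authoritative list.
KNOWN_ALLOPHONES = {
    "PA1", "PA2", "PA3",          # pauses of increasing length
    "TH", "UH", "DH1", "AX",      # sounds useful for "the"
    "HH1", "EH", "LL", "OW",      # sounds useful for "hello"
}


def build_phrase(*names):
    """Return the allophone names as a list, rejecting unknown names."""
    unknown = [name for name in names if name not in KNOWN_ALLOPHONES]
    if unknown:
        raise ValueError("unknown allophones: " + ", ".join(unknown))
    return list(names)


# "THE" from the two fragments mentioned above, then a pause, then "hello".
the = build_phrase("TH", "UH")
hello = build_phrase("HH1", "EH", "LL", "OW")
phrase = the + ["PA2"] + hello
print(" ".join(phrase))
```

Note that nothing in the list carries pitch or speed: with this library, the only choices are which allophone comes next and how long the pauses are.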

Text-to-speech converts written text (not spoken voice, but rather ASCII or other computer readable text) to allophones. It is the process a screen reader might use to read to you what's written on the screen. It is an automated process for generating robot voice.

The Navy algorithm is a text-to-speech algorithm. It's great if you want someone to type a word or a sentence and have the computer guess how to pronounce it.
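As a rough illustration of what "text in, allophones out" means, here is a toy Python sketch. It is not the Navy algorithm: the exception dictionary and the letter rules below are invented for this example and are far too small to be useful. A real rule set also looks at the letters surrounding each match.

```python
# Toy text-to-allophone converter. Everything here is a hypothetical,
# deliberately tiny rule set, NOT the Navy letter-to-sound rules.
EXCEPTIONS = {
    "the": ["DH1", "AX"],
    "hello": ["HH1", "EH", "LL", "OW"],
}

# Longer patterns come first so "sh" wins over "s".
LETTER_RULES = [
    ("th", ["TH"]), ("sh", ["SH"]), ("ee", ["IY"]),
    ("a", ["AE"]), ("e", ["EH"]), ("i", ["IH"]), ("o", ["AA"]), ("u", ["AX"]),
    ("b", ["BB1"]), ("d", ["DD1"]), ("l", ["LL"]), ("m", ["MM"]),
    ("n", ["NN1"]), ("p", ["PP"]), ("s", ["SS"]), ("t", ["TT1"]),
]


def word_to_allophones(word):
    """Look the word up first; otherwise guess letter by letter."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    out = []
    i = 0
    while i < len(word):
        for pattern, allophones in LETTER_RULES:
            if word.startswith(pattern, i):
                out.extend(allophones)
                i += len(pattern)
                break
        else:
            i += 1  # no rule for this character: skip it silently
    return out


def text_to_allophones(text):
    """Convert each word and put a pause after it."""
    out = []
    for word in text.split():
        out.extend(word_to_allophones(word))
        out.append("PA3")
    return out


print(text_to_allophones("the ship"))
```

The input is ASCII text and the output is a list of allophone names. No audio is analyzed at any point, which is the key difference from LPC encoding.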

LPC encoding takes an audio sample (such as a WAV file or other PCM data) and converts it to speech parameters. It never processes text. It knows nothing of allophone samples. It has nothing to do with the Navy text-to-speech algorithm (which works with written text, not audio samples). The PCM samples are analyzed through a number of algorithms to arrive at Vocal Tract Model (VTM) parameters. LPC encoding is what allows me to talk into a microphone and ultimately have the speech chip sound like me.

The output of LPC encoding is not "allophones with different parameters." The output of LPC encoding, rather, is a set of VTM parameters, including excitation pitch and IIR filter coefficients.
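For contrast with text-to-speech, here is a minimal Python sketch of the core of a generic LPC analysis: PCM samples go in, filter coefficients come out. It uses the textbook autocorrelation method with the Levinson-Durbin recursion. It is not the encoder described in this thread, it does no pitch estimation, and it does not produce the coefficient format the SP0256 expects. The test signal and the function names are made up for the example.

```python
def autocorrelation(samples, max_lag):
    """Return r[0..max_lag], the autocorrelation of one analysis frame."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order], where x[n] is
    approximated by sum(a[k] * x[n - k]). Also returns the residual energy."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this stage
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


def synth_frame(length=400):
    """Stand-in for recorded PCM: impulse response of a known 2-pole filter,
    x[n] = 1.3 * x[n-1] - 0.8 * x[n-2] + impulse."""
    x = [0.0] * length
    for n in range(length):
        x[n] = 1.0 if n == 0 else 0.0
        if n >= 1:
            x[n] += 1.3 * x[n - 1]
        if n >= 2:
            x[n] -= 0.8 * x[n - 2]
    return x


frame = synth_frame()
coeffs, residual = levinson_durbin(autocorrelation(frame, 2), 2)
print(coeffs)  # should come out close to [1.3, -0.8]
```

The analysis recovers the filter that shaped the signal, which is the sense in which the output is vocal tract parameters rather than a list of allophones.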

The SP0256-AL2 allophone sample library was likely created from PCM samples and LPC encoding, with lots of heavy editing and chopping to make them uniform and easy to string together. The AL2 sample library does not appear inside the Intellivoice (SP0256-012). It was dumped from a related, compatible chip (SP0256-AL2) and then reformatted so we could feed it to the Intellivoice via the speech FIFO. I obtained permission from Microchip to redistribute the AL2 samples, which is why they're in IntyBASIC today. If they were embedded in the Intellivoice (and thus involved in how the Intellivoice produces audio), that would not have been necessary.

If you start with allophones and the Navy algorithm, you will never arrive at "my voice playing in the Intellivoice," since that process has nothing to do with the LPC encoding required to process your voice.

I have worked on the voice encoding problem. I have some tooling that doesn't work very well and may have intellectual property issues, therefore I am not redistributing it. My eventual goal is to have something I can redistribute. It's a task I pick up from time to time, spend a few intense weeks with, and then move on to something else when life gets in the way.

+DZ-Jay · May 30, 2018

I appreciate your thorough explanation, as I'm sure everyone does. That is very helpful.

I am just curious as to why your comment quoted mine. Just to be sure I didn't misunderstand the question, do you interpret the following as different from a request for "text-to-speech"?

It would (at compile time) convert the string into English phonemes so you can put in simple English phrases easily.

In any case, thanks again for the great thorough description of the options.

dZ

intvnut · May 30, 2018

Sorry, I had my wires crossed a little bit, since I had been reading another thread recently where the LPC process was described in terms of "allophones with different parameters." I haven't had my coffee yet.

Yes. If you start with written text and convert that to allophones, that's text to speech.

+DZ-Jay · May 30, 2018

Sorry, I had my wires crossed a little bit, since I had been reading another thread recently where the LPC process was described in terms of "allophones with different parameters." I haven't had my coffee yet.

Yes. If you start with written text and convert that to allophones, that's text to speech.

No worries!

Like I mentioned above, the source of SAM for the Apple ][ is available for free from the author. It includes documentation on the algorithms as well. I found it once and saved the bookmark; I'll try to find it when I get home and share it.

It may be useful to someone more industrious and smarter than me.

dZ.

intvnut · May 31, 2018

If someone does put together a text-to-speech tool, my suggestion would be to make it a separate tool, so you can tweak the allophone list that's generated. My experience is that text-to-speech is "ok" for a screen reader, where the listener can figure out what was meant from context, but that the algorithms get things laughably wrong quite regularly. At least, that's how it was with the text-to-speech algo in TI's Terminal Emulator II cartridge (TE2), as well as the tool that came with Apple ]['s Mockingboard. A transient error is easy to ignore, while you might want more control to clean things up if you're baking fixed strings into a game.

Some words are impossible to get correct, because they're homographs of each other, but pronounced differently. I'm talking about words like: "live", "bow", "bass", "perfect", "permit", "polish", "record", "row", "sow", "tear", "wind", "wound", ... And then there's the words smart people disagree on, such as "route". Does it rhyme with boot or out, or does it vary by circumstance?

In some text-to-speech systems, you can augment the input text with inflection markers to distinguish homographs, and guide interpretation of tough words. TE2 used underscores and carets for that, if memory serves. That helps with words like "per-MIT" (as in allow) and "PERM-it" (as in a licence), and words that have common prefixes that are pronounced rather differently (com-POS-ite vs. COMP-en-sate). I don't know if SAM does, though, as I haven't used it.
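Here is one way such markers could work in a standalone tool, as a minimal Python sketch. The `word/tag` notation is invented for this example and is not the TE2 or SAM syntax. The allophone spellings are my best guess and would need checking by ear.

```python
# Hypothetical pronunciation table for a few homographs. The tags after the
# slash are an invented notation, and the allophone choices are assumptions.
PRONUNCIATIONS = {
    "live": {"verb": ["LL", "IH", "VV"], "adj": ["LL", "AY", "VV"]},
    "bow": {"bend": ["BB2", "AW"], "ribbon": ["BB2", "OW"]},
}


def resolve(token):
    """Turn 'word' or 'word/tag' into a list of allophone names."""
    word, _, tag = token.lower().partition("/")
    variants = PRONUNCIATIONS.get(word)
    if variants is None:
        raise KeyError("no entry for " + word)
    if tag:
        return list(variants[tag])
    if len(variants) > 1:
        # Refuse to guess: the author has to say which one is meant.
        raise ValueError(word + " is ambiguous; use one of: " + ", ".join(sorted(variants)))
    return list(next(iter(variants.values())))


print(resolve("live/adj"))
print(resolve("bow/bend"))
```

Raising an error on an untagged homograph fits the "separate tool" idea: for fixed strings baked into a game, it is better to be told about the ambiguity at build time than to ship the wrong guess.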

+DZ-Jay · May 31, 2018

I don't know if SAM does, though, as I haven't used it.

As far as I recall (I had it for the C=64), SAM did have a special notation that allowed you to guide inflections at the phrase level (e.g., for questions or exclamations), as well as at the syllable level.

It had a demo where SAM "sang" the Star-Spangled Banner, and other choice tunes, to show how to alter the speech to change intonation.

I never played too much with it, but as far as I remember, it was very versatile; and the very impressive demos were in BASIC, so you could see how to do everything.

And then there's the words smart people disagree on, such as "route".

That's easy: A route ("path") rhymes with root. Everybody knows that!!

*ducks*

:grin:

carlsson · May 31, 2018

SAM on the C64 (and I suppose Apple II and Atari as well) has two modes:

In the Reciter mode, you give it actual words and it tries to pronounce those according to rules, like the text-to-speech algorithm described above. It has a built-in library of common English words, but beyond that the results can be very strange.

In the Sam (?) mode, you build strings of allophones much like you do with IntelliVoice. You can, however, adjust the pitch for each vowel, extend some with symbols, and make the voice go up or down with a question mark or dot. Any string that doesn't match a set of correct allophones will give you a warning beep-beep.

SAY"PERMIT" in Reciter mode yields "PER-mit" like the license. Actually, I don't know a way to make it say per-MIT; adding hyphens or spaces makes no difference.
