
Interpretation of LPC


Willsy


I'm interested in (eventually) replacing the QBOX app with a modern Java application. In the meantime, I need to understand what LPC *is*. Articles on the web are just full of maths mumbo-jumbo, and they probably aren't particularly relevant to the method used on the TI speech chip.

 

Does anyone have any understanding of the LPC used on the speech synth? I’d like to present my understanding of it here, and if anyone knows the *facts*, please do correct me.

 

The sound generators in the TI speech chip consist of a white noise generator (for 's' sounds, 'sh' sounds, etc.), a buzzer for pitch, and a modulator.
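A minimal Python sketch of the two excitation sources, assuming a plain impulse train for the "buzzer" and random +1/-1 values for the noise. This is a simplification: the real chips play a stored chirp waveform as the voiced stimulus (the "Chirp ROM" mentioned later in the thread) and have their own noise generator. The function names are hypothetical.

```python
import random


def voiced_excitation(n_samples: int, pitch_period: int) -> list[float]:
    """'Buzzer': one unit pulse every pitch_period samples (simplified; the chip uses a chirp)."""
    if pitch_period <= 0:
        raise ValueError("pitch_period must be positive")
    return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n_samples)]


def unvoiced_excitation(n_samples: int, seed: int = 0) -> list[float]:
    """'Hiss' for s/sh sounds: pseudo-random +1/-1 values (not the chip's actual generator)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n_samples)]


# At an 8 kHz sample rate, a pitch period of 80 samples is a 100 Hz buzz.
buzz = voiced_excitation(200, pitch_period=80)
hiss = unvoiced_excitation(200)
```

Either sequence is then what the filter shapes into a speech sound; the pitch only decides how often the buzzer pulses.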

 

My understanding of LPC is that it is simply a compressed version of 8 kHz samples? So I think what QBOX does is take a few frames of an 8 kHz sample, probably interpolate them in some way (averaging?), then look up the closest match in the LPC tables for pitch, modulation, etc., and use that value from the LPC table as a byte of speech data. However, this doesn't explain what is "predictive" about it.

 

My questions:

 

* For an 8-bit byte of speech data, does anyone know what the bits in each byte mean?
* Is the byte stream that is sent to the chip “stateful”? By that, I mean: can we, for example, send commands to the speech chip (outside of the normal start, stop and reset) to do things like select pitch? If so, then QBOX is obviously taking this into account.

 

I find the entire subject absolutely fascinating, and would love to learn more about it. If anyone is interested in this, I’d be happy to team up on something. My problem is that, as things currently stand, it seems like a big, super-complex mathematical conundrum, and maths ain't my bag, man. Heck, Lee had to teach me what floored division was; I’d never heard of it. When my class was doing maths, I was skipping class and playing Beatles songs on the guitar, in the hope of getting my hands down girls' pants (I had quite a good success rate at that, back in the day, but I’m paying the price now ;-)


This is quite interesting: http://en.wikipedia.org/wiki/Linear_predictive_analysis

 

 


Linear predictive analysis is a simple form of first-order extrapolation: if it has been changing at this rate then it will probably continue to change at approximately the same rate, at least in the short term. This is equivalent to fitting a tangent to the graph and extending the line.
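The "fit a tangent and extend the line" idea can be written as predicting each sample as x[n-1] + (x[n-1] - x[n-2]) and storing only the prediction error (the residual). A minimal Python sketch, assuming integer samples; the function names are illustrative. LPC proper uses a higher-order predictor fitted per frame (a 10th-order one, described by the K1-K10 coefficients mentioned later in the thread) rather than this fixed first-order rule.

```python
def predict(prev2: int, prev1: int) -> int:
    """First-order extrapolation: assume the last step repeats."""
    return prev1 + (prev1 - prev2)


def residual(samples: list[int]) -> list[int]:
    """Keep the first two samples as-is, then store only the prediction errors."""
    out = list(samples[:2])
    for n in range(2, len(samples)):
        out.append(samples[n] - predict(samples[n - 2], samples[n - 1]))
    return out


def reconstruct(res: list[int]) -> list[int]:
    """Invert residual(): each sample is the prediction plus the stored error."""
    out = list(res[:2])
    for n in range(2, len(res)):
        out.append(res[n] + predict(out[n - 2], out[n - 1]))
    return out


ramp = [0, 3, 6, 9, 12, 14]
print(residual(ramp))  # [0, 3, 0, 0, 0, -1]
```

While the signal keeps changing at a steady rate the residual stays near zero, and small numbers need fewer bits than the samples themselves. That is the "predictive" part.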

One use of this is in Linear predictive coding which can be used as a method of reducing the amount of data needed to approximately encode a series. Suppose it is desired to store or transmit a series of values representing voice. The value at each sampling point could be transmitted (if 256 values are possible then 8 bits of data for each point are required, if the precision of 65536 levels are desired then 16 bits per sample are required). If it is known that the value rarely changes more than +/- 15 values between successive samples (-15 to +15 is 31 steps, counting the zero) then we could encode the change in 5 bits. As long as the change is less than +/- 15 values in successive steps the value will exactly reproduce the desired sequence. When the rate of change exceeds +/-15 then the reconstructed values will temporarily differ from the desired value; provided fast changes that exceed the limit are rare it may be acceptable to use the approximation in order to attain the improved coding density.
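The 5-bit scheme in the quote can be sketched as delta coding with each step clamped to +/-15. Two assumptions of this sketch, not stated in the quote: both sides start from 0, and the encoder tracks the decoder's reconstructed value, so a clamped step is caught up on later samples instead of leaving a permanent offset.

```python
LIMIT = 15  # -15..+15 is 31 steps, so each code (step + 15) fits in 5 bits


def delta_encode(samples: list[int]) -> list[int]:
    """Encode each sample as a clamped change from the decoder's current value."""
    codes = []
    decoded = 0  # mirrors what the decoder will have reconstructed so far
    for s in samples:
        step = max(-LIMIT, min(LIMIT, s - decoded))
        codes.append(step)
        decoded += step
    return codes


def delta_decode(codes: list[int]) -> list[int]:
    """Rebuild the series by accumulating the steps."""
    out, value = [], 0
    for step in codes:
        value += step
        out.append(value)
    return out


signal = [0, 5, 10, 40, 42, 42]
codes = delta_encode(signal)
print(codes)                # [0, 5, 5, 15, 15, 2]
print(delta_decode(codes))  # [0, 5, 10, 25, 40, 42]
```

The jump from 10 to 40 exceeds the limit, so the next two outputs are 25 and 40 before the output catches up at 42: the temporary difference the quote describes.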

 

Makes me think that the TMS5200 implementation is more of a stateful bit-stream, than chunks of pre-compressed waveform data that can be looked up in table form.


The speech synthesiser is based around a model of the human vocal tract, and much of the data is for controlling a lattice filter that shapes the sound (as an analogy, I think you'd be right if you thought about the data defining how your vocal cords are set, the shape of your mouth and the position of your tongue as you speak).
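For the lattice filter, here is a minimal Python sketch of a textbook all-pole lattice synthesis filter driven by reflection coefficients k (the role the K1-K10 values play). It uses floating point and one common sign convention; the real chips use fixed-point arithmetic and their own conventions, so treat it as an illustration, not an emulation. Keeping every |k| below 1 keeps the filter stable.

```python
def lattice_synth(excitation: list[float], k: list[float]) -> list[float]:
    """All-pole lattice synthesis filter with reflection coefficients k[0..N-1]."""
    n_stages = len(k)
    # Backward values from the previous sample; the last entry is never read.
    b = [0.0] * (n_stages + 1)
    out = []
    for e in excitation:
        f = e  # forward value entering the top stage
        b_next = [0.0] * (n_stages + 1)
        for i in range(n_stages, 0, -1):
            f -= k[i - 1] * b[i - 1]             # forward path through stage i
            b_next[i] = b[i - 1] + k[i - 1] * f  # backward value for the next sample
        b_next[0] = f
        b = b_next
        out.append(f)
    return out


# Shape a 100 Hz pulse train (8 kHz sampling) with two made-up coefficients.
pulses = [1.0 if i % 80 == 0 else 0.0 for i in range(200)]
shaped = lattice_synth(pulses, [0.5, -0.3])
```

With a single stage this reduces to y[n] = e[n] - k*y[n-1], so an impulse with k = 0.5 rings out as 1, -0.5, 0.25, -0.125, and so on. Changing the coefficients frame by frame is what changes the "shape of the mouth".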

 

If you go to my page [http://www.avjd51.dsl.pipex.com/ti_portable_speech_lab/ti_portable_speech_lab.htm#theory_of_operation] and scroll down a little to the section "A noise by any other name" (below figure 3), there is quite a good description of LPC and the coding process, plus sample LPC data for the word "Help".

"The resulting LPC data is then coded to further reduce the bit-rate in accordance with the coding tables stored in ROM in the selected target speech synthesis device". So I don't think the data you see being sent to the speech synth is actual LPC data; it's just lookup data into the coding tables embedded in the speech chip. During their recording sessions, TI were able to edit the LPC data. Thierry, on the other hand, says on one of his web pages that he tried making simple changes to some of the speech data but couldn't get anywhere with it. So editing the raw LPC data is possible, but editing the coded lookup data looks tricky, if not impossible.

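To make the "lookup data" point concrete: the stream carries small indices, and the chip expands each one through a table in its ROM. A minimal Python sketch with tiny made-up tables (the real tables are chip-specific and larger); it shows why editing the coded data is awkward: changing an index by one jumps to whatever value the table holds at the next entry.

```python
# Made-up example tables for illustration only; NOT the real TMS52xx ROM values.
HYPOTHETICAL_TABLES = {
    "energy": [0, 2, 5, 9, 14, 20, 27, 35],
    "k1": [-0.95, -0.80, -0.60, -0.30, 0.00, 0.30, 0.60, 0.80],
}


def expand(indices: dict[str, int], tables: dict[str, list]) -> dict:
    """Turn coded field indices into parameter values via per-field lookup tables."""
    values = {}
    for field, index in indices.items():
        table = tables[field]
        if not 0 <= index < len(table):
            raise ValueError(f"index {index} out of range for field {field!r}")
        values[field] = table[index]
    return values


frame = {"energy": 5, "k1": 2}
print(expand(frame, HYPOTHETICAL_TABLES))  # {'energy': 20, 'k1': -0.6}
```

Without the matching table, an index on its own says little about the sound, which is why the coding tables are the key to editing the data.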
 

I was approached by some MESS guys a couple of months ago for copies of the ROMs and scans of the speech boards in my TI speech lab, as they wanted to try to reproduce the system in MESS. I gave them what they wanted, but the Speech Analysis board has a ground plane over much of both sides of the board, so there is no way to see how the components are interconnected and hence derive the circuit. [I gave them your name, Michael, as the expert on the TI/99 in MESS; not sure if they ever got in touch ...?]


Mark asked me if it might be possible to derive coding tables for the other speech synth chips from the TI speech lab code, which supports the TMS5100, 5110, 5200 and 5220 ...

 

In the attached zip file are two files: QV5220.COD from QBox, and SPCLAB2.SRC (a text file), which is part of the speech lab software that Harald in Germany reverse engineered. The last three parameters at the end of the .COD file are 10298, 14979 and 19660, which in hex are >283A, >3A83 and >4CCC. If you search for these in the .SRC file, you'll find them in the coding tables, at the end of the line below the label LN6C3C. It looks like other values match up as well. So you should be able to work out how the formats of the tables in the two files are linked, and hence derive a QBox coding table for the 5200.
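The decimal-to-hex check and the search through the .SRC listing can be scripted. A minimal Python sketch; it assumes the listing writes hex constants in TI's ">" notation as shown above, and the function names are hypothetical.

```python
def ti_hex(value: int) -> str:
    """Format a 16-bit value in TI assembler hex notation, e.g. 10298 -> '>283A'."""
    return ">" + format(value & 0xFFFF, "04X")


def find_constants(source_text: str, values: list[int]) -> dict[int, list[int]]:
    """For each value, list the 1-based line numbers containing its '>XXXX' form."""
    hits = {v: [] for v in values}
    for line_no, line in enumerate(source_text.splitlines(), start=1):
        upper = line.upper()  # tolerate lowercase hex digits in the listing
        for v in values:
            if ti_hex(v) in upper:
                hits[v].append(line_no)
    return hits


cod_tail = [10298, 14979, 19660]  # last three parameters of QV5220.COD
print([ti_hex(v) for v in cod_tail])  # ['>283A', '>3A83', '>4CCC']

# Usage against the extracted listing:
# with open("SPCLAB2.SRC", errors="replace") as f:
#     print(find_constants(f.read(), cod_tail))
```

Running the same search for the other .COD values is a quick way to see which of them "match up as well" and where each table starts.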

 

Stuart.

 

Coding Tables.zip


  • 8 years later...
On 2/10/2022 at 11:35 AM, Willsy said:

Bumping in case anyone is interested in generating 5220 tables :thumbsup:

Thanks, I missed this thread. 

Try SPEECODER from @mizapf.  Search the forum here and the FAQ. Threads here and here

The LPC is a bitstream where each frame holds the parameters for the next 25 ms.

A frame has a variable number of fields of different sizes (such as flag bits, a 6-bit pitch, and coefficients K1-K10 of varying precision), so you have to unpack fields until you reach the end of the frame. Then a new one starts.
 

SPEECODER can show you that. 
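A minimal Python sketch of that unpacking loop. The field layout is my understanding of the TMS5220 format and should be checked against the data manual: a 4-bit energy (0 = silence frame, 15 = stop), then a repeat bit and a 6-bit pitch; a repeat frame reuses the previous coefficients, an unvoiced frame (pitch 0) carries K1-K4, and a voiced frame carries K1-K10. The bit order (each byte consumed least-significant bit first, each field assembled most-significant bit first) is also an assumption, and other chips in the family may use different widths. At 8 kHz, one 25 ms frame covers 200 samples.

```python
ENERGY_BITS, REPEAT_BITS, PITCH_BITS = 4, 1, 6
K_BITS = [5, 5, 4, 4, 4, 4, 4, 3, 3, 3]  # widths of K1..K10 (assumed TMS5220 layout)
ENERGY_SILENCE, ENERGY_STOP = 0, 15


class BitReader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position

    def bits_left(self) -> int:
        return len(self.data) * 8 - self.pos

    def read(self, width: int) -> int:
        if width > self.bits_left():
            raise EOFError("speech data ended in the middle of a frame")
        value = 0
        for _ in range(width):
            byte = self.data[self.pos // 8]
            bit = (byte >> (self.pos % 8)) & 1  # take each byte's LSB first (assumption)
            value = (value << 1) | bit          # build the field MSB first
            self.pos += 1
        return value


def parse_frames(data: bytes) -> list[dict]:
    """Unpack frames until a stop frame or the end of the data."""
    reader = BitReader(data)
    frames = []
    while reader.bits_left() >= ENERGY_BITS:
        energy = reader.read(ENERGY_BITS)
        if energy == ENERGY_SILENCE:
            frames.append({"type": "silence"})
            continue
        if energy == ENERGY_STOP:
            frames.append({"type": "stop"})
            break
        repeat = reader.read(REPEAT_BITS)
        pitch = reader.read(PITCH_BITS)
        if repeat:
            kind, n_k = "repeat", 0
        elif pitch == 0:
            kind, n_k = "unvoiced", 4
        else:
            kind, n_k = "voiced", 10
        k = [reader.read(w) for w in K_BITS[:n_k]]
        frames.append({"type": kind, "energy": energy, "pitch": pitch, "k": k})
    return frames
```

This also answers the "stateful" question in part: how many bits the next field takes depends on the flag and pitch values already read, so the stream can only be decoded from the start of a frame.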

 

There are great explanations of LPC in the later chips’ data books. You can get many of these from Bitsavers. For instance, the TSP50C40 manual is excellent.

“TSP50C40A Speech Products Data Manual 1987
Principles of Operation and Electrical Specifications (formerly designated TMS50C40A)“

 

“TSP6100 Speech Products Data Manual 1987
Principles of Operation and Electrical Specifications (formerly designated TMS6100)” 

 

“TSP5110A Voice Synthesis Processor Data Manual 1987
Principles of Operation and Electrical Specifications (formerly designated TMS5110A)”


Which I browsed at:


—— 

TI Archive at Southern Methodist University, Dallas TX. DeGolyer Library. 

 

(I read a ton of new, but not publishable, information. Message me with questions about it.)

 

The TI archive at SMU has invaluable speech papers, in the personal files of Gene Frantz and Larry Brantingham. 


In these are the mask ROM for the 0285 coding table, complete schematics for the 0285, and printouts of various coding tables. There is a 30” x 17” blueprint of the 0285 silicon.
 

I saw Gene Frantz’ printouts and handwritten tables for the Chirp ROM (stimulus) and K1-10 values in the coding table. They are dated 3-19-78.

 

They are in a file called “Spelling Bee”, which was a working name for Speak and Spell. (Curiously, it was archived in a box with Laser Guided Missile.)

 

Another treasure trove is the SPEECH EDUCATION MODULE USER'S GUIDE, MAY 1982. I want to ask permission to publish this! I photographed the whole thing, but OCR is impossible.

 

 

This was a single-board computer with a TMS7000 and a 5200. The manual has schematics, plus ROM and FORTH source code. The FORTH command vocabulary lets you unpack/pack frames of the LPC and step through them. You can alter the pitch or any of the K1-10 coefficients of that frame.
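In the same spirit as that vocabulary, here is a minimal Python sketch of the packing direction plus a pitch edit. It assumes a TMS5220-style layout (4-bit energy with 0 = silence and 15 = stop, a repeat bit, a 6-bit pitch, K1-K4 for unvoiced and K1-K10 for voiced frames) and LSB-first bit order within each byte; both are assumptions to check against the data manual, and the frame dictionaries and function names are hypothetical, not the FORTH words from the guide.

```python
ENERGY_BITS, REPEAT_BITS, PITCH_BITS = 4, 1, 6
K_BITS = [5, 5, 4, 4, 4, 4, 4, 3, 3, 3]  # widths of K1..K10 (assumed TMS5220 layout)


class BitWriter:
    def __init__(self):
        self.out = bytearray()
        self.pos = 0  # absolute bit position

    def write(self, value: int, width: int) -> None:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        for shift in range(width - 1, -1, -1):  # field MSB first
            if self.pos % 8 == 0:
                self.out.append(0)
            if (value >> shift) & 1:
                self.out[-1] |= 1 << (self.pos % 8)  # fill each byte from its LSB (assumption)
            self.pos += 1


def pack_frames(frames: list[dict]) -> bytes:
    """Pack frame dictionaries back into a byte stream; the last byte is zero-padded."""
    w = BitWriter()
    for frame in frames:
        kind = frame["type"]
        if kind == "silence":
            w.write(0, ENERGY_BITS)
            continue
        if kind == "stop":
            w.write(15, ENERGY_BITS)
            continue
        if frame["energy"] in (0, 15):
            raise ValueError("energy 0 and 15 are reserved for silence and stop")
        w.write(frame["energy"], ENERGY_BITS)
        w.write(1 if kind == "repeat" else 0, REPEAT_BITS)
        w.write(frame["pitch"], PITCH_BITS)
        n_k = 0 if kind == "repeat" else (10 if frame["pitch"] else 4)
        if len(frame["k"]) < n_k:
            raise ValueError(f"frame needs {n_k} coefficient indices")
        for value, width in zip(frame["k"], K_BITS[:n_k]):
            w.write(value, width)
    return bytes(w.out)


def with_pitch(frame: dict, pitch: int) -> dict:
    """Copy of a frame with a new pitch index (pitch 0 would make it unvoiced)."""
    return {**frame, "pitch": pitch}


voiced = {"type": "voiced", "energy": 9, "pitch": 40, "k": [3] * 10}
data = pack_frames([with_pitch(voiced, 30), {"type": "stop"}])
print(len(data))  # 7: a 50-bit voiced frame plus a 4-bit stop code, padded to bytes
```

Editing at this level changes table indices, not LPC values directly, so the audible effect of a given edit still depends on the chip's coding tables.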
 

I wonder if any of that code is in Stuart’s machine?
 

Privately
 

Last June, in my dad’s (Al Olson’s) papers, I found printouts of many experimental coding tables, with the same phrase coded against each. They are undoubtedly from the speech lab at TI Lubbock, or later the West Building, where he worked. There is a 3-page memo about the choices made after considering the resulting quality.

Visit!

 

I don’t have time to go through all this material. 
 

I would welcome anyone who wants to come to Austin and help!

 

Particularly to compare all the coding tables, and to type in the LPC dumps (if they are unique) and that FORTH code! (Special place in my heart there.)

 

If anyone makes a trip to Dallas to the Archive, I will do what I can to help. Would love to visit it together.
 

(Amanda is an excellent help with the archive, but she doesn’t care to geek out about what we find.)

 

I have to ask permission from the library (and that goes to TI?) to publish any excerpts, but knowing what to ask for is half the battle.
 

 
