coolio Posted June 13
I am trying to create a project that interfaces with a TMS5220 speech synthesizer chip at the hardware level, and I am not having much success. I am able to generate a lot of noise, just not the expected speech-like sounds. I need some help figuring out what I might be doing wrong with the circuit interfacing with the TMS5220, or even with the software. I know this isn't strictly a TI-99/4A question, but I honestly don't know where else to ask. Before I dump my questions here, I just want to double-check that this would be a good place to ask. Thanks!
JasonACT Posted June 13
https://www.unige.ch/medecine/nouspikel/ti99/speech.htm
Careful with the ready line: the first thing the chip will do is try to slow you down with this signal.
+FarmerPotato Posted June 13
It would be tough to separate the TMS5220 from the TI-99/4A! You are most welcome to post questions here.
Are you using a Speech ROM, or sending raw data to the TMS5220?
Here are some resources to consult:
Thierry Nouspikel's TI Tech Pages: http://www.unige.ch/medecine/nouspikel/ti99/speech.htm
Peripherals Theory of Operation, section 13, Speech Synthesizer: ftp://ftp.whtech.com/datasheets and manuals/Hardware/Texas Instruments/PHP1200 Peripheral Expansion System/Peripheral Expansion System Theory Of Operation and Technical Traning Guide 1982-09-03.pdf
Bunyard Hardware Manual, with much of the same material (same author!): https://archive.org/details/tibook_hardware-manual-for-the-texas-instruments-994a
Some speech code and documents which I collected from my father: https://gitlab.com/FarmerPotato/speech
Datasheets in https://ftp.whtech.com/datasheets and manuals/Datasheets - TI/
Of relevance:
TMS5200 Databook
TMS5220C: https://ftp.whtech.com/datasheets and manuals/Datasheets - TI/TMS5220C (like 5200 in 99-4 speech) manual.pdf
Early TMC0285 (rough; somehow the file is empty on ftp.whtech.com): https://usermanual.wiki/Manual/TMC0285SpeechSynthesisProcessor.1384298344/view
The TMS5200s are functionally similar. 0285 was the early part number for the "Home Computer Chip" as the Speak & Spell's TMC0281 evolved.
It appears in the Speech Synthesizer peripheral with various markings. The differences among 5200s won't prevent you from getting recognizable speech. LPC strings coded for the Home Computer, 0285 or 5200, will still be recognizable on a 5220 or 5220C.
coolio Posted June 13 (edited)
Thanks. Yes, I have studied Thierry Nouspikel's TI Tech Pages very closely over the years. My question gets into a bit more detail.
For context, I am trying to operate the TMS5220 from my custom homebrewed breadboard CPU (learn more about that here). The only thing I am trying to do is use the Speak External command, so there is no integration with a Speech ROM. When I try to write external LPC speech data to the TMS5220, sound is produced, but it is very garbled. If I vary the OSC input between the 8 kHz and 10 kHz sampling speeds, I can just make out some of the words I would expect, but it's too full of static and too garbled.
In order to satisfy the timing requirements of the "Write Cycle for External Speech Data" identified in the TMS5220 data sheet, I have built circuitry that does the following when I want to write data to the command register or FIFO buffer:
1. Bring /WS low.
2. Immediately present the byte to write on the data bus (with D0 being MSB and D7 being LSB).
3. Wait for the /READY line to go high, then continue to hold /WS low and the data bus value valid.
4. When the /READY line returns low, release /WS (bring it high) and return the data bus to a high-Z state.
5. Rinse and repeat.
I have verified with a logic analyzer that my circuit is in fact doing that, so that's not the question here, unless this sequence is wrong, in which case that's the answer ;-). However, I am noting peculiar behavior on the /READY line that I don't readily understand.
First, I note in Thierry Nouspikel's notes on the TMS5220 that the /READY line can be used as a sort of flow control. That is, after writing the first 16 bytes into the FIFO buffer, when I write the 17th, the /READY line will be held high until the FIFO buffer has consumed some data and can make room for more. I in fact see this happening in the logic analyzer.
The first 16 bytes get written with the /READY line held high for about 20 µs each time, which is in line with the data sheet's t_w(R). On the 17th byte, the /READY line is held high for about 9.5 ms; then it returns low, and my circuitry releases /WS and the data bus. During that time, the talking begins (the sound is not intelligible, but I can't tell whether it is the first 16 bytes that are causing the problem).
This leads to my first question: when the /READY line is held high for so long on that 17th byte, is the data that I was trying to write accepted just before the /READY line returns low? Or did that write attempt fail, so that when the /READY line returns low I need to write that 17th byte again?
Then, after that 17th byte, the byte writes never return to a t_w(R) (/READY pulse width) of less than 23 µs as indicated in the data sheet. All subsequent write cycles see the /READY line held high for about 200 µs each for a few bytes, followed by a long pause of 10-20 ms. Basically, it looks like the speech data is consumed a few bytes at a time (3-8 bytes, it varies), each byte takes about 200 µs to consume, and then that packet causes a longer pause measured in tens of milliseconds, during which I assume speech sound is being generated. My second question is whether this timing pattern sounds about right.
Finally, I have sample LPC speech data that is about 300 bytes long. The /INT line is held high (inactive) until about halfway through that data, at which point it gets brought low periodically. I take this as a sign that something is not going as I expect, because it would indicate that either the talking has stopped or the FIFO buffer is low or empty. But the first time /INT asserts, the /READY line is still being held high, so I am a bit confused by this.
I suspect this means my data writing process above, and its dependence on the /READY line for flow control, might not be right, but I cannot find what might be wrong per the data sheet and Thierry Nouspikel's tech page. Any advice here would be much appreciated.
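The write sequence listed in this post can be modelled as a small state machine, which makes it easier to compare against a logic analyzer trace. This is only a sketch: the type and function names are hypothetical, and the polarity follows the post's own description (the ready line rises after /WS goes low, and the cycle ends when it falls again).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* States of one write cycle, following the steps listed in the post. */
typedef enum {
    WR_IDLE,             /* nothing in progress */
    WR_WAIT_READY_HIGH,  /* /WS low, data driven, waiting for the line to rise */
    WR_WAIT_READY_LOW,   /* still holding /WS and data, waiting for it to fall */
    WR_DONE              /* /WS released, data bus back to high-Z */
} wr_state;

typedef struct {
    wr_state state;
    bool ws_low;      /* true while /WS is asserted (driven low) */
    bool bus_driven;  /* true while the byte is held on the data bus */
} wr_cycle;

/* Steps 1 and 2: assert /WS and present the byte at the same time. */
static void wr_begin(wr_cycle *c)
{
    assert(c != NULL);
    c->state = WR_WAIT_READY_HIGH;
    c->ws_low = true;
    c->bus_driven = true;
}

/* Steps 3 and 4: call once per sample of the ready line (true = high). */
static void wr_step(wr_cycle *c, bool ready_high)
{
    assert(c != NULL);
    switch (c->state) {
    case WR_WAIT_READY_HIGH:
        if (ready_high)
            c->state = WR_WAIT_READY_LOW;  /* keep /WS and data held */
        break;
    case WR_WAIT_READY_LOW:
        if (!ready_high) {
            c->ws_low = false;      /* release /WS */
            c->bus_driven = false;  /* data bus back to high-Z */
            c->state = WR_DONE;
        }
        break;
    case WR_IDLE:
    case WR_DONE:
        break;                      /* nothing to do outside a cycle */
    }
}
```

Feeding the sampled ready line from a capture into `wr_step` shows exactly when /WS and the data bus would be released under this model, including the long hold on a blocked 17th byte.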
+OLD CS1 Posted June 13
For additional reference, pull up schematics of 80s pinball games, too. One of the chips the talking pinball games of the era used was the 5220.
Stuart Posted June 13
This (https://chrisacorns.computinghistory.org.uk/docs/Acorn/Manuals/Acorn_SpeechSystemUG.pdf) might be helpful. It describes the 5220 speech synth in the BBC Micro, with lots of lovely low-level detail about loading data in "Section Seven". The 5220 speech synth is interfaced through a 6522, which presumably enables the processor to carry on processing while waiting for the 5220 to do its stuff.
JasonACT Posted June 13
This is how it's documented by TI: send in 16 bytes (as READY allows), then poll the status register (again as READY allows) and watch for the Buffer Low bit. When you see it, the buffer is half empty, so send in another 8 bytes and continue polling (or, if you have fewer than 8 left, send whatever remains and you're done).
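For polling, it helps to decode the status byte into named flags. The sketch below assumes the chip's D0 (TI's most significant bit) is wired to the CPU's bit 7, so that Talk Status, Buffer Low and Buffer Empty land in the top three bits; the names are hypothetical, and the bit positions should be checked against the data sheet and your own wiring.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions: chip D0 wired to the CPU's most significant bit. */
enum {
    TMS_STATUS_TS = 0x80,  /* Talk Status: chip is speaking */
    TMS_STATUS_BL = 0x40,  /* Buffer Low: FIFO is at most half full */
    TMS_STATUS_BE = 0x20   /* Buffer Empty: FIFO ran dry */
};

typedef struct {
    bool talking;
    bool buffer_low;
    bool buffer_empty;
} tms_status;

/* Split a raw status read into its three flags. */
static tms_status tms_decode_status(uint8_t raw)
{
    tms_status s;
    s.talking      = (raw & TMS_STATUS_TS) != 0;
    s.buffer_low   = (raw & TMS_STATUS_BL) != 0;
    s.buffer_empty = (raw & TMS_STATUS_BE) != 0;
    return s;
}
```

If the data bus is wired the other way round, the same flags would appear in the low three bits instead, which is one quick way to spot a reversed bus.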
+FarmerPotato Posted June 14
Timing: clocked at 40 speech frames/sec, each 25 ms period consumes from 5 to 51 bits. Your observed 10-20 ms long pause seems about right.
300 bytes is a big sample, but OK. Have you inspected it frame by frame? (I provide one such decoder, in C, in my gitlab above.) That will tell you what bitrate to expect.
I've only driven the 5200 from a 4A, using the ISR code by John Philips (also in my directory). This works like the others say above: the speech interrupt occurs, the handler code checks status, feeds in 8 bytes, done. Repeat.
My wild guess is that your setup garbles a byte in the FIFO on the 17th write, the one that is blocked on READY. Try loading 16, then wait for the chip to say it wants more and send another 8, either on interrupt or by polling status.
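The numbers in this post translate directly into a data rate. The sketch below only does that arithmetic, using the 40 frames per second and 5 to 51 bits per frame quoted above plus the 16-byte FIFO mentioned earlier in the thread; the function and macro names are made up for illustration.

```c
#include <assert.h>

#define TMS_FRAMES_PER_SECOND 40  /* from the post: one frame every 25 ms */
#define TMS_FIFO_BYTES        16  /* FIFO depth mentioned in the thread */

/* Average byte rate the chip consumes for a given frame size in bits. */
static double lpc_bytes_per_second(unsigned bits_per_frame)
{
    return (double)bits_per_frame * TMS_FRAMES_PER_SECOND / 8.0;
}

/* How long a completely full FIFO lasts at that frame size, in ms. */
static double fifo_lifetime_ms(unsigned bits_per_frame)
{
    assert(bits_per_frame > 0);
    double bits_per_second = (double)bits_per_frame * TMS_FRAMES_PER_SECOND;
    return TMS_FIFO_BYTES * 8.0 * 1000.0 / bits_per_second;
}
```

With those figures the chip consumes between 25 and 255 bytes per second, and a full FIFO lasts roughly 63 ms when every frame is the largest size, so the feeding code has only a few frame periods of slack in the worst case.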
Eric Lafortune Posted June 14
Some general thoughts, based on my experience simulating the TMS5200 in software (Java code to be released, derived from the canonical code in Mame). It seems you have implemented and probably checked most of them, and some probably don't apply.
On the input side:
- You mustn't forget the initial SPEAK EXTERNAL command (0x40).
- The input is a contiguous stream of bits, packed in a stream of bytes.
- The bit order is easy to get wrong, although it would consistently result in pure noise.
- The LPC coding tables are different for the TMS5200 and the TMS5220, but not dramatically.
- You probably want to end with a STOP frame (0xf) to leave the chip in a clean state.
- I also get garbled speech when underflowing the LPC buffer. You could start with a simple vowel sample of 16 bytes, e.g. stuffing it with compact REPEAT frames.
- I assume the TMS9900 doesn't resend the 17th byte either when it is held up by the /READY line.
On the output side:
- In software, you can use the analog output or the more accurate digital output.
- In my experience, getting the order of the bits in the digital output wrong can result in vaguely recognizable speech.
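Since bit order is named above as an easy thing to get wrong, a bit-reversal helper is handy for experiments. The sketch below is a generic byte reversal, not something taken from any tool in this thread; whether your data actually needs it depends on how your encoder packs bits and how D0..D7 are wired.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of one byte (bit 7 <-> bit 0, and so on). */
static uint8_t reverse_bits8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));  /* swap nibbles */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));  /* swap bit pairs */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));  /* swap neighbours */
    return b;
}

/* Reverse every byte of a buffer in place. */
static void reverse_bits_buffer(uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = reverse_bits8(buf[i]);
}
```

Running a known-good sample through the chip both as-is and reversed is a cheap test: per the post above, the wrong order should give consistent noise rather than garbled but rhythmic speech.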
coolio Posted June 15
Thanks! So I changed the control flow of my setup to this (after sending the speak external command):
1. Set N to 16.
2. Repeat for N bytes or until the input buffer is done:
   a. Bring /WS low.
   b. Immediately present the byte to write on the data bus (with D0 being MSB and D7 being LSB).
   c. Wait for the /READY line to go high, then continue to hold /WS low and the data bus value valid.
   d. When the /READY line returns low, release /WS (bring it high) and return the data bus to a high-Z state.
3. Wait for the /INT line to go low.
4. Check the status register on the TMS5220.
5. If the status register indicates talking has stopped:
   - If the input buffer is not done yet, send the speak external command and set N to 16. Go to step 2.
   - If the input buffer is done, set the stop frame and go to END.
6. If the status register indicates the buffer is low, set N to 8 and go to step 2.
This certainly produces better results, as the rhythm of the noise resembles the speech rhythm of the sound sample I converted to LPC data, but it is still very garbled and not intelligible. That now makes me wonder whether I have good LPC data at all. Does anybody have LPC data designed to be sent with the speak external command on the TMS5220?
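The decision part of this control flow (steps 5 and 6) can be written as a pure function of the status byte and the number of bytes still to send. This is a sketch under assumptions: the names are hypothetical, the status bit positions assume D0 is wired as the most significant bit, and the initial 16-byte fill after the speak external command happens before this function is first called.

```c
#include <stddef.h>
#include <stdint.h>

#define TMS_STATUS_TS 0x80u  /* Talk Status, assumed bit position */
#define TMS_STATUS_BL 0x40u  /* Buffer Low, assumed bit position */

typedef enum {
    FEED_WAIT,     /* talking and FIFO not low (or nothing left): do nothing */
    FEED_SEND,     /* write `count` more bytes to the FIFO */
    FEED_RESTART,  /* talking stopped early: reissue Speak External, send `count` */
    FEED_FINISH    /* talking stopped and all data has been sent */
} feed_action;

typedef struct {
    feed_action action;
    size_t count;  /* bytes to write for FEED_SEND / FEED_RESTART */
} feed_step;

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Decide what to do after a status read, following steps 5 and 6 above. */
static feed_step next_feed_step(uint8_t status, size_t remaining)
{
    feed_step step = { FEED_WAIT, 0 };

    if (!(status & TMS_STATUS_TS)) {
        /* Step 5: talking has stopped. */
        if (remaining > 0) {
            step.action = FEED_RESTART;
            step.count = min_size(16, remaining);
        } else {
            step.action = FEED_FINISH;
        }
    } else if ((status & TMS_STATUS_BL) && remaining > 0) {
        /* Step 6: buffer is low, top it up with up to 8 bytes. */
        step.action = FEED_SEND;
        step.count = min_size(8, remaining);
    }
    return step;
}
```

Keeping the decision separate from the bus handshake means the same function works whether the status read is triggered by /INT or by polling.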
coolio Posted June 15
8 hours ago, Eric Lafortune said: You mustn't forget the initial SPEAK EXTERNAL command (0x40).
I thought the speak external command was x110xxxx, which would be 0x60. Isn't 0x40 the "load address" command for an address of 0x0?
+FarmerPotato Posted June 15
There are a bunch of sample LPC strings in my gitlab, mostly from game cartridges (released and unreleased).
How did you make your LPC data?
Dumb question: have you verified that the actual bytes are on the data bus? And yeah, TI numbered from D0 as most significant.
What impedance do you present to the (analog) output? Crazy idea: run the output to the input of a 76489, as in the 99/4A. Apart from the 4A, TI always had filter and amp stages in the output.
You'll see a pre-emphasis filter option on the front end of LPC coding, like in Blue Wizard. This gives more attention to the higher, more essential speech frequencies. (Starting with Speak & Spell) TI would recommend an analog output stage to amplify the low end.
JasonACT Posted June 15
static unsigned char press_fire[] = { // From Parsec
    0x10, 0x80, 0x58, 0x43, 0x9B, 0x6A, 0x8A, 0x67,
    0x46, 0xC8, 0xD9, 0xEA, 0xD1, 0x4C, 0x8E, 0x8C,
    0x13, 0x96, 0x5B, 0x6B, 0xAA, 0x76, 0xD9, 0x5A,
    0xC2, 0x24, 0x69, 0xD3, 0x85, 0x3A, 0x85, 0x1C,
    0x03, 0x74, 0x59, 0x66, 0x01, 0x0B, 0x44, 0xC0,
    0x03, 0x14, 0x60, 0x40, 0x56, 0xAE, 0x43, 0xD5,
    0xD3, 0x23, 0xB4, 0x18, 0x2D, 0xCD, 0xEC, 0x88,
    0xF0, 0x64, 0x7C, 0x74, 0xBB, 0x22, 0x22, 0x52,
    0xCE, 0xD1, 0x5D, 0x4F, 0x6B, 0x2F, 0xD9, 0x47,
    0xF7, 0xCD, 0xBD, 0xBC, 0xE8, 0x1C, 0xDD, 0x27,
    0xF7, 0xB6, 0x92, 0x73, 0x74, 0x9F, 0xCC, 0xD7,
    0x52, 0xEE, 0xD2, 0x7D, 0x91, 0x1C, 0x4F, 0x39,
    0x43, 0xF7, 0x85, 0x73, 0xB4, 0xE4, 0x0C, 0xDD,
    0x57, 0xAE, 0x96, 0x92, 0xDC, 0xF4, 0xD0, 0xA8,
    0x93, 0x57, 0x6A, 0xD3, 0x6D, 0x92, 0x6C, 0x76,
    0xC9, 0x55, 0x4F, 0xBA, 0xB7, 0x0E, 0xD9, 0x1A,
    0xDB, 0x1B, 0x02, 0x88, 0x3A, 0x84, 0x03, 0x14,
    0x00, 0x40, 0x80, 0x54, 0xE1, 0x00, 0x01, 0xF8,
    0xCC, 0x35, 0x00, 0xDF, 0x8F, 0x26, 0xD9, 0x1B,
    0xB5, 0x8E, 0xB7, 0x3C, 0xB5, 0xA9, 0x65, 0x9D,
    0x98, 0xC8, 0x4E, 0x65, 0xB8, 0xAD, 0x60, 0x14,
    0x6B, 0xE8, 0x54, 0x4E, 0x9B, 0x1E, 0x0D, 0x57,
    0xCD, 0xDB, 0x9E, 0x7A, 0xD1, 0x93, 0x86, 0xDE,
    0x4E, 0x4A, 0xDE, 0x20, 0x86, 0x21, 0x96, 0x63,
    0xE9, 0x84, 0x01, 0xD3, 0xB0, 0x35, 0x33, 0x4A,
    0xF5, 0xAE, 0x61, 0xB2, 0xCD, 0x69, 0x35, 0x2B,
    0x18, 0x0B, 0xF7, 0x44, 0x9D, 0xED, 0xEA, 0x18,
    0x57, 0x0B, 0x7B, 0x55, 0x57, 0xA5, 0x5C, 0x66,
    0xF2, 0xC5, 0x9C, 0xB5, 0xF2, 0x14, 0xD3, 0x0F,
    0x73, 0xC5, 0x0F
};
Eric Lafortune Posted June 15
12 minutes ago, coolio said: I thought the speak external command was x110xxxx, which would be 0x60. Isn't 0x40 the "load address" command for an address of 0x0?
You're right: it's 0x60 indeed. This byte doesn't go into the LPC FIFO buffer.
The bit order of the commands and the LPC data remains tricky. On the TI-99, you can send the command byte as-is, and LPC software will already reverse the LPC bits for you, so you can send the resulting LPC bytes unchanged.
If it helps, my VideoTools project has ConvertLpcToText and ConvertTextToLpc to convert between binary and readable/editable LPC files.
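To avoid mixing up command bytes like this, a small classifier can check a byte before it is written. Only the x110xxxx (speak external) and x100xxxx (load address) patterns are confirmed in this thread; the other entries below are assumptions quoted from memory of the TMS5220 command table and should be verified against the data sheet. All names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    TMS_CMD_READ_BYTE,        /* x001xxxx (assumption, check the data sheet) */
    TMS_CMD_READ_AND_BRANCH,  /* x011xxxx (assumption, check the data sheet) */
    TMS_CMD_LOAD_ADDRESS,     /* x100aaaa, e.g. 0x40 for address nibble 0 */
    TMS_CMD_SPEAK,            /* x101xxxx (assumption, check the data sheet) */
    TMS_CMD_SPEAK_EXTERNAL,   /* x110xxxx, e.g. 0x60 */
    TMS_CMD_RESET,            /* x111xxxx (assumption, check the data sheet) */
    TMS_CMD_OTHER             /* anything not covered above */
} tms_command;

/* Classify a command byte by bits 6..4; bit 7 and the low nibble are ignored. */
static tms_command tms_classify_command(uint8_t byte)
{
    switch (byte & 0x70) {
    case 0x10: return TMS_CMD_READ_BYTE;
    case 0x30: return TMS_CMD_READ_AND_BRANCH;
    case 0x40: return TMS_CMD_LOAD_ADDRESS;
    case 0x50: return TMS_CMD_SPEAK;
    case 0x60: return TMS_CMD_SPEAK_EXTERNAL;
    case 0x70: return TMS_CMD_RESET;
    default:   return TMS_CMD_OTHER;
    }
}
```

The byte values assume the same bit numbering as the 0x60 and 0x40 figures in the posts above; on a bus wired in the opposite order the command byte would need reversing first.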
coolio Posted June 16
20 hours ago, JasonACT said: static unsigned char press_fire [] = { // From Parsec 0x10, 0x80, 0x58, 0x43, 0x9B, 0x6A, 0x8A, 0x67, ...
Thanks! I used this bit stream and it seemed to work, but the speech still sounds poor. Listen to the audio here: IMG_6002.MOV
It was the Parsec "Press Fire to Begin" sound, I assume, but the audio quality is very poor. I wonder if the audio quality is caused by the audio amplifier I am using. It's just a simple LM386 circuit similar to the second one documented here. As a comparison, here is an audio file of my own that I ran through TMS-Express: IMG_6003.MOV
Both samples have static noise problems.
For clarity, I am using the audio filtering circuit in Thierry Nouspikel's documentation of the TI Speech Synthesizer circuit. Specifically, the SPEAKER line from the TMS5220 is connected to a 0.22 µF capacitor and a 1.8 kΩ resistor to ground, with a 1 µF capacitor inline with the audio line. This then feeds the LM386 audio amp, which is driving an 8 ohm, 0.5 W speaker.
Also, since making the recommended change to use the interrupt line as the cue for more data, the logic analyzer picture looks much better (the /READY line goes high for only 20 µs at a time). So, like I said, I wonder how much of this is my audio circuit now.
Any pointers would be much appreciated. Thanks!
coolio Posted June 18 (edited)
OK, progress here. I moved away from the LM386 audio amp to some computer speakers with their own amplifier, and that fixed the audio quality issue. The Parsec sample @JasonACT provided sounds just like it did on my TI-99/4A, with one exception: the speech synthesis gets about 3/4 of the way through the data and then just halts. What I hear is "Press fire t<bzzzt>". About the time the sound goes south, the TS (talk status) bit on the TMS5220 goes to 0. But the thing is, I don't get an interrupt prior to this indicating that the buffer is low on data. The process always stops at the same point in the data stream, meaning the talking stops before all the data is sent. I am not sure how to debug this. Has anyone experienced anything like this before?
I looked at @Asmusr's TMS5220 handling code for some of the games he published the source to. I see that in his code, instead of waiting on the TMS5220 interrupt pin to send 8 more bytes to the FIFO buffer, he polls the BL (buffer low) status bit. I changed my code to do that and get the exact same results (about 3/4 of the data stream is successful, and then talking stops).
I also tried restarting: when I get a premature TS=0, I reissue the speak external command and pick up the data transmission where it previously left off. This doesn't help.
Any thoughts or ideas on this? Thanks!
coolio Posted September 9 (edited)
OK, I have made progress here. After literally months of trying (not every day, just during "hobby time"), I came to the conclusion that my problem was the breadboard on which I was building my system (too much noise and capacitance). So I moved my hardware design to a PCB and got it working. Yay!
The software approach for the speak external command was to send the first 16 bytes, then poll the TMS5220 status register for the buffer low status, sending 8 more bytes each time it appears. Works perfectly. And the beauty of my hardware design is that I don't have to put the CPU into a wait state while the TMS5220 latches onto the data (as indicated by the READY line). I'll share more on the details of this design later (I plan to make a video about it).
Now I have a new problem: what are the best practices for encoding custom audio into an LPC bitstream? I am using TMS Express to do the conversion from a WAV file to an LPC bitstream, and that in general works, but frequently the audio from the TMS5220 is just awful, notably during times like inter-word silences. Here is an example:
Note that in the second sample (the classic Daisy song) there is a large amount of static in between words and at the "tail" of words.
Any advice on best practices when LPC encoding custom audio would be appreciated! Thanks!
+FarmerPotato Posted September 10
From documents I've seen recently from the 1980s: LPC was always cleaned up by a human. The human editor would step through the frames (i.e. play from 0-20, 0-21, etc.) to find syllable boundaries and word boundaries. They would then chop out noise frames. Another necessity was smoothing discontinuities in pitch or other parameters.
I don't know anything about how it was done after 1985. I haven't done this myself.
Stuart Posted September 10
The "hello world" sounds considerably better than the "Daisy, Daisy" song. From what I understand, the speech synth works better with some voices than with others. It may be that the synth can't handle singing like this, where each word is very elongated. I'd be trying to optimise the quality for 'normal' speech, which is what the synth was designed for.
Eric Lafortune Posted September 10
For audio-to-speech, it's important to start from clean speech audio: avoid poor recordings, background noise, background instruments, echoes, overdubbing, and so on. LPC speech encoding is based on a simple model of a vocal tract, so it is best suited for simple voices. High-pitched voices and songs are more difficult because of the sample frequency and the limited number of available LPC frequencies.
It then helps to play with the options of the tool that you are using (QBox Pro, Blue Wizard, python_wizard, TMS Express, Praat): notably the frequency range of voiced frames, stability of frequencies, threshold between voiced and unvoiced frames, pre-emphasis, and frame sizes. In my experience, this requires a lot of trial and error because of the combinatorial explosion of possible settings.
I still need to document and release my own audio-to-speech tool, which I created in the hopes of solving this problem. It includes an exact simulation of the TMS5200/TMS5220, to automatically tune the results and clean up clicks and pops resulting from quirks/bugs in the processor. It turned out to be a rabbit hole with months of research in academic papers and technical implementations. The problem remains non-trivial. It did help to create much better results for the vocals of the Bad Apple demo, also to be released (as version 2). Let me aim for October.
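Of the options listed above, pre-emphasis is the simplest to try by hand before encoding. The sketch below is the textbook first-order filter y[n] = x[n] - alpha * x[n-1]; it is not the implementation used by any of the tools named in this thread, and `alpha` is a tuning parameter you would have to choose by ear (values somewhat below 1 are typical for speech work).

```c
#include <assert.h>
#include <stddef.h>

/*
 * First-order pre-emphasis: out[i] = in[i] - alpha * in[i-1],
 * with the sample before the start taken as 0.
 * `in` and `out` may point to the same buffer.
 */
static void pre_emphasis(const float *in, float *out, size_t n, float alpha)
{
    assert((in != NULL && out != NULL) || n == 0);
    float prev = 0.0f;              /* previous input sample */
    for (size_t i = 0; i < n; i++) {
        float cur = in[i];          /* read before writing, for in-place use */
        out[i] = cur - alpha * prev;
        prev = cur;
    }
}
```

The filter attenuates slowly varying content and leaves sharp changes mostly intact, which is why it shifts the encoder's attention toward the higher speech frequencies mentioned earlier in the thread.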
coolio Posted September 11
Thanks all. I played with the encoding options as suggested. I also tried the different encoding software. I found that I could get the singing bit to sound good (no static or noise) with python_wizard. So, yay!
+mizapf Posted September 11
Just to remind, we discussed experiences with python_wizard two years ago: