Eric Lafortune
Posted August 8, 2022

For converting sound files to speech data for the speech synthesizer, we have a number of options:

- QBox Pro by TI's speech engineers (Windows 3.1, TMS5220)
- BlueWizard by @patrick99e99 (Mac OS X, TMS5220)
- python_wizard (supports TMS5200, which we actually need)
- Implement your own brute-force optimization, like @Kurt_Woloch did

After struggling with and further exploring the conversion of vocals for my Bad Apple demo, I can add a new option:

- Praat by the Phonetic Sciences group of the University of Amsterdam, combined with my conversion tool ConvertPraatToLpc

Praat (which is Dutch for "talk") is a powerful phonetics program with many techniques and algorithms to analyze and synthesize speech. You can operate it from a GUI or from scripts. It's available for the major platforms (e.g. on Debian/Ubuntu: sudo apt install praat).

Two features in Praat are relevant: a choice of algorithms to extract the pitch and a choice of algorithms to compute LPC coefficients. I have created:

- A Praat script lpc.praat that reads a specified WAV file, analyzes the speech, and writes a Praat pitch file and a Praat LPC file. The script is a single file in my Bad Apple demo.
- A command-line tool ConvertPraatToLpc that reads a pair of these files and writes a binary LPC file that is suitable for our TMS5200 speech synthesizer. The tool is one of my video tools, written in Java.

The Bad Apple build script illustrates the flow. For example:

    praat --run lpc.praat \
      /tmp/input.wav \
      /tmp/output.PraatPitch \
      /tmp/output.PraatLPC \
      250 550 0.02 0.40 0.20 0.20 0.03

    java ConvertPraatToLpc \
      -addstopframe \
      /tmp/output.PraatPitch \
      /tmp/output.PraatLPC \
      0.4 0.6 \
      /tmp/output.lpc

The header of the script and the documentation of the tool explain the parameters. You can also follow the Praat script interactively and explore the results.

I feel like we can still push speech extraction further.
For example, converting WAV files of the high-quality speech from the speech dictionary, with any of these programs, doesn't produce anything close to the original speech, even though it could/should. I'm curious about your experiences and thoughts.
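
If you want to run the two-step flow on more than one file, a small wrapper can build the command lines for you. The following is only a sketch in Python, not part of the Bad Apple build script. It assumes that praat and java are on your PATH, that ConvertPraatToLpc is on the Java classpath, and that lpc.praat is in the current directory. The helper names build_commands and render are made up for this illustration, and so is the convention of naming the intermediate and output files after the input file. The numeric parameters are copied verbatim from the example and passed through untouched.

```python
from pathlib import PurePosixPath
import shlex

# Analysis parameters copied verbatim from the example in the post.
# Their meaning is explained in the header of lpc.praat, not here.
PRAAT_PARAMS = ["250", "550", "0.02", "0.40", "0.20", "0.20", "0.03"]

# Conversion parameters copied verbatim from the example in the post.
# See the ConvertPraatToLpc documentation for what they control.
CONVERT_PARAMS = ["0.4", "0.6"]


def build_commands(wav, script="lpc.praat"):
    """Return the two argument lists that convert one WAV file.

    Assumption for this sketch: all output files are named after the
    input file, e.g. /tmp/input.wav -> /tmp/input.lpc.
    """
    wav = PurePosixPath(wav)
    pitch = str(wav.with_suffix(".PraatPitch"))
    praat_lpc = str(wav.with_suffix(".PraatLPC"))
    binary_lpc = str(wav.with_suffix(".lpc"))

    # Step 1: Praat analyzes the speech and writes pitch and LPC files.
    praat_cmd = ["praat", "--run", script, str(wav), pitch, praat_lpc,
                 *PRAAT_PARAMS]
    # Step 2: the Java tool merges both files into a binary LPC file.
    java_cmd = ["java", "ConvertPraatToLpc", "-addstopframe",
                pitch, praat_lpc, *CONVERT_PARAMS, binary_lpc]
    return [praat_cmd, java_cmd]


def render(commands):
    """Format argument lists as shell command lines, one per line."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    print(render(build_commands("/tmp/input.wav")))
```

To actually run the conversion, each argument list can be passed to subprocess.run(cmd, check=True), in order, so that a failing Praat step stops the flow before the Java step.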