The voice–computer interface offers many opportunities for exploration because it is not yet a mature technology; many issues must still be solved before the interface works seamlessly. This website uses a series of exercises to investigate the successes and shortcomings of Text to Speech (TTS) and Automatic Speech Recognition (ASR) software. The exercises are designed to help non-science majors explore concepts in much the same way that a scientist does. This work is made possible through a 2006 Technology Fellowship from the Associated Colleges of the South.
Text to Speech (TTS)
Download and install the latest version of the free Natural Reader tool. This software reads highlighted text aloud, sending the output through the sound card to the computer speakers.
The formant map shows the vowel loops that are used in vowel recognition. These loops are defined by the first and second formant frequencies (F1 and F2).
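The formant map suggests a simple recognition rule: a measured (F1, F2) pair belongs to whichever vowel it lies closest to. The sketch below implements that rule as a nearest-centre classifier. It is a simplification, because real vowel loops have different sizes and overlap. The reference points in `VOWEL_CENTRES` are approximate placeholder values for an adult male speaker. They are an assumption, so replace them with the loop centres read off your own formant map. The name `classify_vowel` is hypothetical.

```python
import math

# Hypothetical placeholder (F1, F2) centres in Hz for a few vowels of an
# adult male speaker; replace with the loop centres from your formant map.
VOWEL_CENTRES = {
    "i (beet)": (270, 2290),
    "ae (bat)": (660, 1720),
    "a (father)": (730, 1090),
    "u (boot)": (300, 870),
}


def classify_vowel(f1, f2, centres=VOWEL_CENTRES):
    """Return the label of the reference vowel nearest to (f1, f2).

    Distance is plain Euclidean distance in Hz, which weights F2 differences
    as heavily as F1 differences (a simplifying assumption).
    """
    return min(centres, key=lambda vowel: math.dist((f1, f2), centres[vowel]))


print(classify_vowel(300, 2200))
```

Perceptual scales such as Bark or mel are often used instead of raw Hz so that F1 differences are not swamped by the larger F2 values. Plain Hz keeps this sketch short.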
Download the latest version of the free audio software Audacity. The pull-down box next to the microphone icon selects the recording source. Choosing "wave out mix" lets you record sounds that are being played through your sound card. This way you can record the voices that Natural Reader uses.
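Once a recording is exported from Audacity as a .wav file, you can also load the samples into a script for your own analysis. The following is a minimal sketch using only Python's standard `wave` and `array` modules. It assumes the file was saved as 16-bit PCM, and it keeps only the first channel of a stereo file. The function name `read_wav_mono` is hypothetical.

```python
import array
import sys
import wave


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    Samples are floats in [-1.0, 1.0); for multi-channel files only the
    first channel is kept (an assumption that suits a mono voice recording).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    data = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":
        data.byteswap()  # WAV sample data is stored little-endian
    # Frames are interleaved (L, R, L, R, ...); take every channels-th value.
    return [s / 32768.0 for s in data[::channels]], rate
```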
TTS Exercises
1. Record Michael, Michelle, and Sam reading the same text. Select
the whole recording and plot the spectrum (in Audacity, Analyze | Plot
Spectrum). What are the frequency ranges of the different voices? How
do they compare with the frequency range of the professional software
voices?
2. Type 10 vowel sounds from the formant map into MS Word. Record a
person reading the list and Natural Reader reading it. Capture the sound
with Audacity and save it in .wav format. Analyze the formant structure
of the different vowel sounds in SFSWIN. Plot the positions of the first
and second formants on the formant map. How well does your data fit
the vowel loops? What observations can you make comparing the two voices?
Speech to Text (ASR)
Read the AccessScience article on Speech Recognition and study the block diagram of the speech recognition model (Figure 1).
ASR Exercises
Open MS Word. On the Tools menu, select Speech.
(If Speech is not listed, you will need to activate the Speech Tools in MS Office. The easiest way is to open the MS Word Help by pressing F1. Enter "speech recognition" in the search field and press Enter. Find the entry "use speech recognition" and follow the directions in that help document. You must be logged in as a computer administrator to do this.)
Follow the microphone setup and ASR training procedure. Once you have trained the software to recognize your voice, continue with the following exercises.
1. Read the "Fundamental Statements of Frequency Analysis" aloud.
Calculate the error rate.
2. Read the statements a second time and calculate the error rate. How
does it compare with your first reading?
3. Have another person read the same statements and calculate their error
rate. Comment on the differences between your two voices.
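The exercises leave "error rate" open. One common choice in speech recognition is the word error rate (WER). WER is the number of substituted, deleted, and inserted words needed to turn the recognized text into the text you actually read, divided by the number of words read. The sketch below computes WER with a word-level Levenshtein (edit) distance. Lowercasing and stripping punctuation before comparing are assumptions, so that "Analysis." and "analysis" count as a match. The function name `word_error_rate` is hypothetical.

```python
import string

# Strip punctuation so "Analysis." and "analysis" count as the same word.
_PUNCT = str.maketrans("", "", string.punctuation)


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words,
    using a word-level Levenshtein (edit) distance."""
    ref = reference.lower().translate(_PUNCT).split()
    hyp = hypothesis.lower().translate(_PUNCT).split()
    if not ref:
        raise ValueError("reference text is empty")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: a read word was not recognized
                curr[j - 1] + 1,         # insertion: an extra word was recognized
                prev[j - 1] + (r != h),  # substitution, or no cost if words match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, if you read "The cat sat on the mat." and the software typed "the cat sat on mat", one word was deleted out of six, so `word_error_rate` returns 1/6. Note that WER can exceed 1.0 when the software inserts many extra words.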
Using SFSWIN
Download the latest version of SFSWIN, a software package used in speech research.
Synthesize speech using SFS
Record a speech signal, then choose:
1. Tools | Speech | Analysis | Fundamental Frequency | Fundamental Frequency Track
2. Tools | Speech | Analysis | Formant Estimates Track, and select synthesizer control data output
This should give you a basic set of data for formant synthesis. Then choose:
3. Tools | Synthesis Data | Synthesize speech
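The steps above combine a fundamental frequency track (the voice source) with formant estimates (the vocal-tract filter). The toy sketch below illustrates that source-filter idea. It is a simple cascade formant synthesizer, not the synthesizer SFS uses. An impulse train at `f0` is passed through one second-order digital resonator per formant, using the standard Klatt-style resonator difference equation. The simplifying assumptions are a constant F0 and fixed formants (SFS's control data varies over time) and a single 80 Hz bandwidth for every formant. The names `resonator` and `synthesize_vowel` are hypothetical, and the example formant values are illustrative only.

```python
import math


def resonator(signal, freq, bandwidth, rate):
    """Second-order digital resonator centred on freq Hz (Klatt-style):
    y[n] = a*x[n] + b*y[n-1] + c*y[n-2], with unity gain at 0 Hz."""
    t = 1.0 / rate
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(f0, formants, duration=0.5, rate=16000, bandwidth=80.0):
    """Cascade formant synthesis: an impulse train at f0 Hz filtered by one
    resonator per formant frequency in `formants`; output scaled to [-1, 1]."""
    n = int(duration * rate)
    period = int(round(rate / f0))  # samples between glottal pulses
    out = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq in formants:
        out = resonator(out, freq, bandwidth, rate)
    peak = max(abs(v) for v in out) or 1.0
    return [v / peak for v in out]


# Illustrative values: a 120 Hz voice with formants roughly like "a" in "father".
samples = synthesize_vowel(120, [730, 1090, 2440])
```

The resulting samples can be scaled to 16-bit integers and written with Python's `wave` module to listen to them. Comparing the result with SFS's output shows how much the time-varying control data adds.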