Friday, August 31, 2007

Speech Technology Basics, part one: Speech Recognition

I asked our director of product engineering to give me a few words for the blog on how speech technology can be used in the call center. He ended up giving me a whole lot more, and I thought it would be something that others might like to read:

There are many different flavors of technology that can be categorized under the term “Speech Technology”. You have speech recognition, speech analytics, speech to text, interactive voice, phonetics, speech XML and so on down the line. Some of these technologies relate to call centers and some do not.

"Speech Recognition": This form of speech technology allows you to talk to your computer: “Open Microsoft Word”, “Start new document”, “Hello Mom . . . .”, “Save and Print”, "Save the World", and so on. It is a simple way to bypass your keyboard and mouse by voice. If you want to read and write email while you’re on the treadmill, then this is the way to go. This technology uses the concept of “Speech Training”.

"Speech Training": This is the method by which you train the software to recognize your words. You must read a story or a list of sentences into the computer before you use the software, so it can learn how you say various words. You are basically “training” the system to your voice.

IVR ("Interactive Voice Response") systems also use speech recognition, but they do not rely on "speech training", since callers never train the system. For example, an IVR vendor has speech capabilities in their products that will prompt the user for a response and expect a specific answer, such as “Would you like to be transferred to the sales department? Please say ‘Yes’ or ‘No’”. This technology is very basic and simple and will work effectively across many different dialects, accents and slang, because the IVR knows you will say some variation of ‘Yes’ or ‘No’. Another example is an IVR at a bank that can read your account in their database and determine that your balance is $100. Speech XML capabilities within the IVR will have the IVR attendant’s voice actually say out loud “Your balance is one hundred dollars and zero cents”.
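To make the bank example a little more concrete, here is a minimal sketch of how a balance stored as a number might be turned into the sentence the IVR speaks. It is only an illustration: the function names `number_to_words` and `balance_prompt` are made up for this post, the balance is assumed to be stored in whole cents, and a real IVR would hand the resulting text to its own speech engine instead of printing it.

```python
# Hypothetical helper names; not taken from any IVR product.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999,999 in English words."""
    if not 0 <= n <= 999_999:
        raise ValueError("this sketch only handles 0 to 999,999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = f"{ONES[hundreds]} hundred"
        return head if rest == 0 else f"{head} {number_to_words(rest)}"
    thousands, rest = divmod(n, 1000)
    head = f"{number_to_words(thousands)} thousand"
    return head if rest == 0 else f"{head} {number_to_words(rest)}"


def balance_prompt(balance_cents: int) -> str:
    """Build the sentence an IVR could speak for a balance given in cents."""
    dollars, cents = divmod(balance_cents, 100)
    dollar_unit = "dollar" if dollars == 1 else "dollars"
    cent_unit = "cent" if cents == 1 else "cents"
    return (
        f"Your balance is {number_to_words(dollars)} {dollar_unit} "
        f"and {number_to_words(cents)} {cent_unit}"
    )


if __name__ == "__main__":
    # A $100.00 balance, stored as 10000 cents.
    print(balance_prompt(10000))
```

Keeping the balance in whole cents avoids rounding surprises when splitting it into dollars and cents.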

Call recording allows you to record these IVR interactions to determine what your customers were inquiring about. These would be recorded and saved as audio. IVR speech technology uses the concept of “Expected Responses”: you will say some variation of ‘Yes’ or ‘No’, or you will say a number, or the system will read a number back to you, or you will say one of the employees’ names in an employee directory. There is a finite number of words or phrases you will say, and therefore the system can expect a certain range of responses. Statistics on those responses could be considered Speech Analytics, but they are only a fraction of what Speech Analytics can offer the enterprise.
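The idea of “Expected Responses” can be sketched in a few lines. This is a simplified illustration, not how any particular IVR is built: a real system matches the caller's audio against a small set of allowed phrases, while this sketch assumes the audio has already been turned into text. The word lists and the names `EXPECTED`, `match_expected` and `tally_responses` are invented for the example.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical list of variations the system is prepared to hear.
EXPECTED = {
    "yes": {"yes", "yeah", "yep", "sure", "ok", "okay"},
    "no": {"no", "nope", "nah"},
}


def match_expected(transcript: str) -> Optional[str]:
    """Return 'yes' or 'no' for the first expected word heard, else None."""
    words = re.findall(r"[a-z']+", transcript.lower())
    for word in words:
        for label, variants in EXPECTED.items():
            if word in variants:
                return label
    return None


def tally_responses(transcripts: Iterable[str]) -> Counter:
    """Count how often callers gave each expected response."""
    return Counter(match_expected(t) or "unrecognized" for t in transcripts)


if __name__ == "__main__":
    calls = ["Yeah, please", "no thanks", "Nope", "um, what?"]
    for label, count in tally_responses(calls).items():
        print(label, count)
```

Because the list of acceptable answers is so short, the system only has to decide which of a handful of words it heard, and the simple counts at the end are the kind of basic statistics mentioned above.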

From Wikipedia:
Speech Recognition
