dacs.doc electric

September Meeting Review

Voice Xpress: Talk to Your Computer

By Jack Corcoran

 

The DACS general meeting in September was all about speech recognition software. Our presenter, Chuck Runquist, was a good speaker with excellent command of the technology, and his demos all worked. The middling-size audience was friendly and mildly receptive. And we all went home after a pleasant evening.

"...ay, there's the rub."

Voice recognition by the computer is man/machine bonding, a medium of human expression comparable to written language, and the entry to a machine-participatory society we can't even imagine. It should be the most dramatic, exciting, and disruptive technology ever to hit the technology scene. But it isn't. So let's review the September presentation and then get back to that enigma.

Chuck Runquist is Technical Project Manager at the Lernout & Hauspie branch office in Howell, NJ. L&H is a company of 1,700 employees worldwide that was founded in 1987 and is headquartered in the Flanders region of Belgium. The company produces linguistics-based software products: language translation, speech recognition, etc. They certainly have the location and credentials for it.

The Big (and only) Four in voice recognition software are L&H with Voice Xpress, Dragon Systems' NaturallySpeaking, IBM's ViaVoice, and Philips' FreeSpeech. At the present time, Dragon Systems is probably the most popular, but all are contenders.

Chuck opened his presentation with an overview of L&H and the basic functions of a voice recognition system. He then gave us a frank assessment of the major technical challenges of processing continuous speech: ambiguities and context dependence. He emphasized that training is the key element in developing the accuracy of the system, with microphone calibration and placement crucial to system performance.

After establishing his technical base, Chuck told us about a number of special features of L&H's Voice Xpress 4 that make it stand out from the competition. Most impressive was the greatly reduced training time to get a new user started. While other systems require up to an hour of reading in training text, Voice Xpress gets a user up and going after only seven minutes of reading and about six minutes more of touch-up. This is a major factor in our instant-everything society and is sufficient in itself to set Voice Xpress apart from its competitors.

In his demos, Chuck proved that voice recognition is here to stay. True, his demo was practiced, polished, and perfected, but it worked. If he could talk to Word, Excel, and other apps, so could we if we really wanted to. He made believers of the audience.

Throughout his presentation, however, Chuck frequently mentioned the ever greater features and ever greater accuracy to be expected in future products coming soon. It was almost as if he sensed that his audience was holding back, had some sort of barrier or mental reservation. They were not buying.

Now let's consider the enigma. Chuck proved by his demo that Voice Xpress can take input at over 100 words per minute with accuracy in the high 90s. He claimed that up to 140 words per minute at 90% accuracy is possible for anyone. Now compare that to what you and I do, perhaps 20 to 40 words per minute with an accuracy (if that's the word) of awful. So here we have an input device up to five times faster and much more accurate, with no carpal tunnel syndrome, and it doesn't even take up any desk space. So why haven't we all converted already?

Now the first introduction for most of us to computer speech processing was in the movie "2001" where we met HAL. He scared the hell out of us. Did you ever see a "HAL" T-shirt? It's a possibility that it's all HAL's fault.

Another consideration is that natural language is so integrated into our subliminal nature that we feel uncomfortable with the strange-looking text that sometimes comes out. At the meeting, Chuck ran a machine translation of some Lewis Carroll poetry from English to French. The audience then insisted he run the French back into English. The idiomatic correlation was not perfect, and the audience gleefully pounced on the awkward result. No credit for how close it came, just laughter at the mistakes.

The surprising bottom line is that we are demanding perfection in this particular computer product that we do not expect in any of the others that we use. Voice Xpress, or one of the others, will eventually and inevitably become our primary method of computer interaction. It will improve and our ten little fingers will not. The current list of excuses ("I can't use it in the office," "It makes too many mistakes," "It takes too much time to learn," etc.) will go away.

Chuck did everything a technical presenter could do to show us the way. The product will improve if people buy and support it, but it is now up to the marketeers, evangelists, hypesters, industry pundits, technology prophets, and all the other image makers to bring it to the people.

No one saw the Internet coming even though we all were familiar with networks both large and small. Perhaps the audience at our September meeting will some day look back on what Chuck showed us that evening.




Jack Corcoran is an old retired computer programmer who pleads mea culpa to everything he criticized in this review.
