Friday, May 23, 2008

Computer, Take a Letter: A Speech Recognition Update

Written by Grant Fairley

Most of us are used to just shouting at our computers when they misbehave. We wish an axe came as standard equipment along with the keyboard and mouse. But what if you had something nice to say to your computer? Imagine if your computer actually listened to you and did what you told it to do. If you have not heard about it yet - it's called speech recognition software.

If you've never seen speech recognition demonstrated by someone who is trained to use it - you're in for a surprise. A person speaks into a microphone in front of their computer and you see the words they are speaking pop up on the screen in real time. You might be tempted to look around the corner for someone operating a keyboard. It is amazing to watch and to use. You can speak at your normal speaking speed (about 120 words per minute) and the computer "guesses," using mathematical algorithms, which word you mean in that context, based on what it knows about the English language. (Models for some other languages are also available.)
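To make that "guessing from context" idea concrete, here is a deliberately tiny Python sketch. The bigram counts and function names are invented for the example; real engines combine acoustic scores with far larger statistical language models. The principle, though, is the same: pick the sound-alike word that the surrounding context makes most probable.

# Toy illustration only: choose among words that sound alike by asking
# which one the previous word makes most probable. The counts below are
# made up; real recognizers use acoustic models plus huge language models.
BIGRAM_COUNTS = {
    ("will", "write"): 50, ("will", "right"): 2, ("will", "wright"): 1,
    ("mr.", "wright"): 40, ("mr.", "write"): 1,
    ("letter", "right"): 30, ("letter", "write"): 1,
}

def pick_sound_alike(previous_word, candidates):
    """Return the candidate most likely to follow previous_word."""
    def score(candidate):
        return BIGRAM_COUNTS.get((previous_word.lower(), candidate.lower()), 0)
    return max(candidates, key=score)

if __name__ == "__main__":
    sound_alikes = ["write", "right", "Wright"]
    print(pick_sound_alike("will", sound_alikes))    # -> write
    print(pick_sound_alike("letter", sound_alikes))  # -> right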

Speech recognition has been useful for trained users for the past seven years. Training still makes a difference, as it does for any major piece of software. We train people to use Word®, Excel® or PowerPoint®, and speech recognition should come with training as well. (Tip - good training resources are available at Crown International http://www.crown1.com including training manuals and DVDs for ViaVoice and Dragon products...) The high degree of speed and accuracy for trained users is a combination of improved software and, most important, better hardware. Faster, more powerful personal computers, with operating systems and multimedia components that focus on high sound quality, have made all the difference.

It is like a Star Trek future when you think that a person can speak and a computer program can listen, interpret and respond with the correct words. One of our favorite sentences to demonstrate this is "Mr. Wright will write you a letter right now." The speech software guesses, based on context, which "write" is the right one for that place. Homophones are tricky for any of us. You can also say, "I would like my next paycheck to be two-thousand, one hundred and sixty-two dollars and eight cents," and the computer will write $2,162.08 on the screen. The same is true for dates and times. We say them as we normally would and the software formats them for us. When the software makes a mistake - you correct it and it learns. It becomes more accurate as you continue to use it.
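For a rough sense of the number-formatting step, here is a small Python sketch that turns spoken number words into a dollar figure. It only handles simple phrases like the one above, and the commercial products use far more robust formatting rules, but it shows the kind of bookkeeping involved.

# Rough sketch: turn spoken number words into "$2,162.08".
# Handles only simple dollars-and-cents phrases; real engines do much more.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"thousand": 1000, "million": 1000000}

def words_to_number(words):
    """Convert a list of spoken number words into an integer."""
    total, current = 0, 0
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        # filler words such as "and" are ignored
    return total + current

def spoken_currency(phrase):
    """Format a spoken dollar amount the way it might appear on screen."""
    words = phrase.lower().replace("-", " ").replace(",", "").split()
    dollars = words_to_number(words[:words.index("dollars")])
    cents = words_to_number(words[words.index("dollars") + 1:-1])
    return "${:,}.{:02d}".format(dollars, cents)

print(spoken_currency(
    "two-thousand, one hundred and sixty-two dollars and eight cents"))
# -> $2,162.08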

Speech recognition software for personal computers has been around and improving since the early 1990s, with product names like Kurzweil, Lernout & Hauspie, Kolvox and Philips, and the dominant products, IBM's ViaVoice® and Nuance's Dragon NaturallySpeaking®. Now, with Microsoft's Vista Speech® coming in the next release of Windows®, speech recognition will change the future. It is already changing the present.

Speech recognition began with what was called "discrete speech," where you had to pause between each word or phrase: "Today…is…a…beautiful…day…to…play…tennis…period" It was slow, at about 40 words per minute, but still an amazing scientific breakthrough. In the late 1990s we finally had "continuous speech," where we could speak at our normal speaking speed. We still include the punctuation, just as people do when dictating a letter to an assistant.

For those of us who used typewriters with white-out or eraser ribbons - we can only dream of what our past might have been! All those 30-page papers I had to do at college would not have seemed so daunting if I had had speech software back then. But I suspect that the long history of speech recognition software is still news to most people (and professors) today. I had an interesting discussion with an English teacher who watched a demonstration of speech. Just as the calculator has been blamed for ruining arithmetic, this teacher was sure that speech recognition would ruin the written language. Perhaps. Or maybe it is just a return to a more ancient form - the oral tradition.

Surprisingly - it was really only Star Trek that nailed how natural speech recognition would become. As someone who uses speech, I am still surprised to see how many futuristic commercials and movies still have people typing. Speech as the more natural interface makes sense - we learn to speak before we learn to type.

Of the many types of users of speech recognition today - most are in the words business. They are people who use an extensive number of words in their profession. So it is lawyers, physicians, judges and educators who tend to be the early adopters. Most of them were already used to dictation, so the idea of speaking their thoughts was already comfortable. Another category is executives who want to dictate their own email. People with disabilities that limit their ability to use the computer keyboard or mouse have also found speech to be their way to surf the web, play games, send email or do their work. It is liberating.

One of the quiet epidemics of our time is carpal tunnel syndrome (CTS) and the repetitive strain injuries (RSI) associated with typing. It is estimated that this currently costs anywhere from hundreds of millions of dollars to billions of dollars a year. It is difficult to calculate since the reporting and diagnosis of these conditions are still uneven. But it is safe to say that the keystrokes accumulated by many users over years of typing are showing up as the classic symptoms of tingling and burning in the fingers, hands and wrists, as well as stiffness and pain up the arms, shoulders and neck. RSIs can also be associated with headaches, migraines and a number of other pain conditions.

Speech recognition offers an alternative to all that typing. With the aging boomers and Gen-Xers who have accumulated RSI and Carpal Tunnel injuries over years of typing and playing - speech will be the only game in town.

The use of speech is divided into two types of applications. One is command and control. The other is dictation. The software is better at command and control since it is only recognizing one word or phrase at a time. Normal dictation is extremely complex, as anyone knows who has studied how language works or tried to translate what someone is saying from one language to another. You have to hear them well and guess correctly what they meant based on what you know of the two languages. Most speech products are still "speaker dependent" for dictation – trained to one user's voice at a time – but "speaker independent" for command and control, where the software does not need to know your voice to be accurate for most users. The holy grail of speech recognition is speaker independence for dictation – where it doesn't matter who is speaking, the computer will interpret you correctly. Just like on Star Trek.
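For a sense of why command and control is the easier problem, here is a small Python sketch (the command phrases and action names are invented for the example). A command grammar only has to pick the best match from a short, fixed list of phrases, which is why it can work for any speaker with little or no training; open dictation has to consider the whole language.

# Sketch of a command-and-control grammar: match loosely recognized
# speech against a short, fixed list of phrases. The commands and
# action names here are invented for illustration.
import difflib

COMMAND_GRAMMAR = {
    "open mail": "launch_email",
    "next slide": "advance_slide",
    "turn up the heat": "increase_temperature",
    "save this": "save_document",
}

def interpret_command(recognized_text, cutoff=0.6):
    """Map recognized speech onto the closest known command, or None."""
    matches = difflib.get_close_matches(
        recognized_text.lower(), list(COMMAND_GRAMMAR), n=1, cutoff=cutoff)
    return COMMAND_GRAMMAR[matches[0]] if matches else None

print(interpret_command("next slide please"))   # -> advance_slide
print(interpret_command("write me a letter"))   # -> None: that's dictation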

But speech recognition for the PC is only a small part of the speech story. Speech now runs on servers as well and is used, for example, when an automated attendant chats with you when you call a company on the phone. Command and control speech is also in things like automobiles, where you can adjust the heat and the position of your seat. From large to small, you can find speech in your PDA, cell phone and even toys. It has been predicted that it is in these large and small applications that we will really see and hear speech in the future, as the desktop computer becomes a thing of the past.

It will be interesting to see how the next generation of speech recognition from Microsoft will change the speech landscape. The operating system to follow Windows XP® was code-named Longhorn® and is now called Vista®. Like XP (did you know a basic speech recognition program was included in XP?), it is expected to include speech recognition software for both command and control and dictation. The early reports are that this new release will be very accurate and require very little training. If true, that's going to take us a step closer to talking to our computers every day.

Mr. Scott on Star Trek once traveled back in time and was confronted by a computer with a keyboard. He commented, "A keyboard - how quaint." He knew that the keyboard and mouse were things of the past. Like the telegraph, they were useful tools in their time but definitely part of the past. We may someday have to explain what a keyboard is - just as we have to explain what a typewriter is and how you used white-out to correct your college papers. Those were the days...

COMPUTER-SAVE-THIS

P.S. If you're one of those already in pain with RSIs and carpal tunnel syndrome - you should check out some of the information from Dr. Blair Lamb, MD, a pain specialist. He argues that most carpal tunnel syndromes and RSIs are conditions primarily due to injuries in the neck caused by typing and by poor posture while typing. The pain we feel elsewhere is referred pain or the result of shortened muscles in our arms and wrists. To resolve these RSIs, Dr. Lamb offers a number of treatments, including stretching exercises that work the neck as well as other areas to lengthen the shortened muscles that are pulling and pinching our nerves and causing the pain.

To learn more about this, visit his website http://www.drlamb.com He also has DVDs available that explain pain conditions like RSIs, as well as a multi-level stretching program, also on DVD, at http://www.stretch-doctor.com

Grant D. Fairley is a graduate of Wheaton College, Wheaton, IL. He is an IBM Business Partner and a principal presenter with Strategic Seminars http://www.strategic-seminars.com He is the author of several books available through http://www.palantir-publishing.com
