Wednesday, December 17, 2008

Talking Computers

Written by Kadence Buchanan

Just a few decades ago, the possibility of creating talking computers was considered strictly in the realm of science fiction. But today, talking desktop computers are so commonplace that they hardly elicit any response from modern and sophisticated consumers.

It's really pretty simple, as any of the computer school-trained friends and acquaintances we have are wont to tell us. Computers talk simply because they are running software that converts text to speech, allowing the computer to speak out loud through speakers or a headset. In short, they have speech synthesis capability.

Hearing a friend explain all that to me the other day almost made me fall asleep. But flipping through channels with the TV remote this morning, I came across this very thing and suddenly found it fascinating.

The entire process of artificially producing speech is called speech synthesis, and the software system that achieves this is called text-to-speech, or TTS. It has two parts: a front end and a back end. The front end takes input (in the form of text), converts it into linguistic symbols, and sends them to the back end, which converts these symbols into a speech waveform played through the computer's speakers.

The front end has two basic functions. First, it identifies the numbers and abbreviations in the raw text and expands them into their spelled-out word equivalents. Then it divides the text into phrases and sentences and assigns each word a "phonetic transcription," complete with pauses and intonation.
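That first front-end step, text normalization, is easy to picture in code. Here is a minimal toy sketch (not a real TTS front end); the abbreviation table and the digit-by-digit number reading are simplifying assumptions for illustration:

```python
# Toy illustration of a TTS front end's text normalization step:
# expanding digits and a few abbreviations into spelled-out words.
# A real system handles far more cases (dates, currency, context).
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def spell_number(token):
    # Read a number digit by digit, e.g. "42" -> "four two".
    return " ".join(DIGIT_WORDS[int(d)] for d in token)

def normalize(text):
    words = []
    for token in text.split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isdigit():
            words.append(spell_number(token))
        else:
            words.append(token)
    return " ".join(words)

print(normalize("Dr. Smith lives at 42 Elm St."))
# -> Doctor Smith lives at four two Elm Street
```

Only after the text is fully spelled out this way does the front end move on to assigning phonetic transcriptions.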

The other part, the back end, is often referred to as the synthesizer. It takes this symbolic linguistic representation and converts it into actual sound output, most commonly by stitching together fragments of recorded speech (concatenative synthesis) or by generating the waveform from an acoustic model (formant synthesis).

Today, the challenge is no longer in reproducing speech, but in improving the quality of speech synthesis. In this regard, there are two key concerns: naturalness and intelligibility. Naturalness refers to how close the sound output is to human speech, while intelligibility refers to how clearly the sound output is understood. The ideal speech synthesizer is both.

Kadence Buchanan writes articles on many topics including Computers, Science, and Education
