Technical Musings: March 2007

Hi, I'm back. It's finally time for the long-promised discussion on speech recognition.

What speech recognition is and is not.

Gene Roddenberry first introduced the general public to the notion of talking to computers in September of 1966. The crew of the Starship Enterprise enjoyed flawless voice-driven interaction with the ship's computer. The system never made mistakes, never required correction during the dictation of a log entry, and never seemed to need special maintenance of any kind. It could be used by voice by many different people simultaneously. Those of you who know me know that I'm into science fiction. You may not, however, understand exactly why.

Growing up as someone with cerebral palsy, I had to face many limitations. Although I can paint and draw fairly well, my brain injury makes writing physically difficult, laborious, and time-consuming, and the result is illegible. My vision problem, which is also caused by my CP, makes driving a car both impractical and unsafe. The welfare system and social structures in place in our society today mean that our government neither expects nor demands much from its disabled citizens. "As long as you just want to sit home and play with your Xbox and don't demand much out of life, we'll keep sending you money, but don't expect much else from us in the way of help." This seems to be the general attitude of those in power. I assure you, though, that this is only due to a lack of knowledge.

Star Trek and other shows like it show us the way things must become. In Star Trek, for example, Geordi La Forge may be blind, but he has been provided with the appropriate technology to overcome his limitations. He wasn't expected to pay for it. The appropriate piece of technology was simply provided to him so that he might become a more productive member of society. Moreover, as chief engineer of the USS Enterprise-D, La Forge is expected to perform his duties regardless of any limitations he might have had at birth. His crewmates expect this of him and he in turn demands this of himself. In the world of Star Trek, no one drives. They simply have a way to transport everyone safely and conveniently. In Doctor Who, if the Doctor meets someone with three heads and six arms, he's not going to say, "Oh god, you have three heads and six arms!" He's more likely to simply say, "Lovely to see you again. I haven't seen you in so long!" And that's why I watch science fiction. I'm not always interested in who's attacking our heroes this week. Although I do enjoy a good story, I'm interested in the technological and sociological messages behind these things. I don't mind telling you that I believe in miracles, and I wait expectantly for the manifestation of life-altering, liberating technologies.

When I was young, speech recognition was just such a technology; something to be reached for; a potential solution to my problems, but nothing more.

Today, we have speech recognition technology, but what is it really like? Does it match up to the vision held out by Star Trek? We are getting there. Right now, though, all speech recognition technology works by analyzing sound waves. It also looks at the surrounding sentences and judges the statistical likelihood that certain words will appear next to each other. The effect is as if the system is playing charades. Let's say that you're dictating a letter and you want to write a sentence that says, "John, I'll meet you there." The minute you start speaking, the speech recognition software is attempting to judge exactly what it was you said. It has fractions of a second to do this because you're obviously going to want to dictate something else, go to another program, etc. So the first thing the software does is say to itself, "I noticed the name John several times in this letter, so I'd have to guess that's who this is for. And then it sounds like he dictated something like 'I'll eat new where.'" When it guesses wrong like that, the only way it finds out is if you tell it.

Speech recognition systems only learn to improve by proper correction. It is, therefore, extremely important to follow the appropriate correction method so that the software learns from its mistakes. Otherwise, it will never improve! This isn't Star Trek! It is not teaching the software anything for you to simply highlight the misunderstood text and type in the corrections by hand. Although most software packages can learn from document analysis, the best way to teach speech recognition software is to make corrections by voice. This is, after all, speech recognition software, and it needs to hear the appropriate things to tell it that it has made mistakes! Only in this way will the software ever improve! I cannot stress this enough!

When you buy or use speech recognition software, ask for training on how to use it appropriately. Take any tutorials that come with the software until you feel comfortable using it. Learn the appropriate correction commands! Most importantly, understand that working with speech recognition is a little like raising a child. Children are born with a tremendous amount of intelligence; however, they need continual nurturing and guidance from their parents and family to learn right from wrong. So it is with speech recognition.

When I was working as a certified consultant for Dragon Systems, I would get a lot of casual users who were intrigued by speech recognition. They would go to their local computer superstore, pick up the entry-level version, bring it home and, without any prior experience or knowledge of how the software truly works, expect to be dictating at 100% accuracy all the time without putting in any real effort. I would then get calls and emails saying, "I bought this and it doesn't work." I would inevitably write back asking if they made corrections when the software made mistakes. Almost invariably, I would discover that they either weren't correcting at all or were simply dictating or typing over mistakes. This is not the way to run the railroad!
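To make the charades idea from earlier a little more concrete, here is a rough Python sketch of how a recognizer might combine "what it sounded like" with "which words tend to appear next to each other." This is only an illustration under my own assumptions, not how Dragon or Microsoft actually do it: the word-pair counts, the acoustic scores, the vocabulary size, and the function names are all made up for the example.

```python
import math

# Hypothetical word-pair (bigram) counts, as if gathered from the user's
# earlier documents. "<s>" marks the start of a sentence.
BIGRAM_COUNTS = {
    ("<s>", "john"): 5,
    ("john", "i'll"): 3,
    ("i'll", "meet"): 4,
    ("meet", "you"): 6,
    ("you", "there"): 5,
    ("i'll", "eat"): 1,
}
VOCAB_SIZE = 1000  # assumed vocabulary size, used for add-one smoothing


def first_word_count(word):
    """How many times `word` appears as the first word of a known pair."""
    return sum(c for (first, _), c in BIGRAM_COUNTS.items() if first == word)


def language_score(words):
    """Log probability of a word sequence under the smoothed bigram model."""
    score = 0.0
    previous = "<s>"
    for word in words:
        pair_count = BIGRAM_COUNTS.get((previous, word), 0)
        # Add-one smoothing so unseen pairs get a small, non-zero probability.
        score += math.log((pair_count + 1) / (first_word_count(previous) + VOCAB_SIZE))
        previous = word
    return score


def best_guess(candidates):
    """Pick the sentence with the highest acoustic + language score.

    `candidates` maps each candidate sentence to a made-up acoustic log
    score (how well the sounds matched).
    """
    def total(item):
        sentence, acoustic = item
        return acoustic + language_score(sentence.lower().split())

    return max(candidates.items(), key=total)[0]


candidates = {
    "John I'll eat new where": -4.0,   # slightly better match on sound alone
    "John I'll meet you there": -4.5,
}
print(best_guess(candidates))
```

With these made-up numbers, the word-pair statistics outweigh the slightly better sound match, so the sketch settles on "John I'll meet you there". Correcting properly is, in effect, how counts like these would get adjusted for your own voice and vocabulary.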

So now that that tirade is over (for the moment :-)), let's discuss speech recognition alternatives. In the world of speech recognition software for PCs, there are basically two alternatives. You can get Windows Vista with its built-in speech software, or you can get Dragon NaturallySpeaking. If you're on a Mac and you want more than the command and control abilities that can be found in OS X itself, you'll want to check out a company called MacSpeech. I have not had a chance to review their software firsthand as of yet, but I hope to sometime in the near future. When I do, I will post my observations here.

Windows Speech Recognition, as previously stated, is speech recognition built into Windows Vista. It works fairly well provided you have enough RAM. If you're going to be doing any kind of speech recognition, you want at least 1 GB of RAM on Windows Vista! Vista's speech recognition is not just for dictation: there are also fairly decent command and control capabilities. For those of you unaware of what command and control is, it's the ability to bring up programs by speech and then interact with them in the same way (without having to use the mouse or keyboard). As good as Windows Speech Recognition is, it does lack some of the more useful features that are found in Dragon NaturallySpeaking.

NaturallySpeaking Preferred includes text-to-speech technology to allow documents to be read aloud. It also includes the ability to create simple macros to enter text. For example, in NaturallySpeaking, I have a macro called "write my e-mail address". So whenever I say that command, guess what happens? There are no such commands in Windows Speech Recognition. If I want to dictate my e-mail address, I have to go into "typing mode" by saying "start typing". I can then spell out my e-mail address. Not quite as elegant as in NaturallySpeaking, is it? Windows Speech Recognition is analogous to Dragon NaturallySpeaking Standard Edition. It is possible to move the mouse by speech in Windows Speech Recognition; however, it's not always as seamless or as accurate for me as in NaturallySpeaking. This doesn't mean that mouse control by speech doesn't work in Windows Speech Recognition. It most certainly does. The process is just a little more elegant in Dragon NaturallySpeaking. Microsoft also doesn't currently offer any add-ons for their speech product. They do offer development tools for application designers, but that's not quite the same thing.
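For the curious, the idea behind a simple text macro like the one described above can be sketched in a few lines of Python. This is just my own illustration of the concept, not NaturallySpeaking's actual implementation: the table, the function name, and the placeholder address are all hypothetical.

```python
# Hypothetical macro table: spoken command -> text to be typed.
TEXT_MACROS = {
    "write my e-mail address": "someone@example.com",  # placeholder address
}


def handle_utterance(utterance, macros=TEXT_MACROS):
    """Return the macro text if the utterance is a known command.

    Anything that is not a known command is treated as ordinary
    dictation and returned unchanged.
    """
    key = utterance.strip().lower()
    return macros.get(key, utterance)


print(handle_utterance("Write my e-mail address"))
print(handle_utterance("John, I'll meet you there."))
```

The first call is recognized as a command and replaced by the stored text; the second is passed through as normal dictation.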

What if you're a medical professional and need an extensive medical vocabulary? Windows Speech Recognition offers no way to create such a vocabulary. Yes, you can add individual words to your vocabulary, but for a large-scale vocabulary of specialized terminology, this would be extremely time-consuming. Dragon NaturallySpeaking Medical Solutions is the only way to go in that case, and the same goes for the legal profession. Windows Speech Recognition also offers no way to create advanced scripting commands to perform complex tasks. This functionality can only be found in Dragon NaturallySpeaking Professional edition. If you need the ability to automate tasks beyond a simple text macro, you really need NaturallySpeaking Professional at this point. Having said that, no one really knows what Microsoft might add to Windows Speech Recognition in the future. The product is built into Windows, and that could be a tremendous advantage for the company. Windows Speech Recognition is very much a first-generation product, so we can expect further development from here on out. NaturallySpeaking 9 is compatible with Windows Vista for most editions via a free downloadable patch (9.5). (Scroll down to the NaturallySpeaking section.) Patches for the Professional, Legal, and Medical versions of NaturallySpeaking are coming.

20 March, 2007

Windows Vista and Speech Recognition
