So what’s the state of speech recognition and dictation software?
Linda asked me about the state of desktop dictation technology. In particular, she asked me whether there was much difference between the latest version and earlier, cheaper ones. My knowledge of the area is out of date, so I thought I’d throw both the specific question and the broader subject of speech recognition out there for general discussion.
Here’s much of what I know or believe about speech recognition:
- Most major independent commercial speech recognition efforts have wound up being merged into Nuance Communications. That goes for both desktop and server-side stuff. None was doing particularly well before its respective merger.
- A folk dance buddy (Jonathan Young, once of Dragon Systems) taught me the essential principle of developing speech recognition systems, which probably applies more broadly to other language-understanding technologies as well: “How do you make a good speech recognition product? You start with a bad one and keep incrementally improving it.”
- Linda tells me that a lot of novelists use dictation software, to reduce repetitive strain from typing. However, this often leads to repetitive-use strain on their throats. I don’t know whether it makes a difference if one uses better microphones, talks more softly, and/or has access to software that is less demanding of carefully enunciated gaps between words.
- Perhaps due to accuracy concerns, and perhaps also due to concern about noise pollution in the workplace, ordinary computer control via voice is rare. Most applications focus on specialized-circumstance dictation (hands-free, disabled users, users who are being harmed by typing, etc.) or telephone interaction.
- Rich semantic technology isn’t yet used in speech recognition to nearly the extent it is in text search/mining/analytics. The grammar in speech recognition systems is primitive at best. And while there may be some hand-built semantic networks with small numbers of nodes, à la Sybase AnswersAnywhere, nobody’s ever hooked up (say) a WordNet equivalent or a good entity-extraction engine as part of a mainstream commercial speech recognition product. (Please correct me if I’m wrong about this part!)
- There are real challenges in voice recognition via remote microphones in small enclosed places (e.g., automobiles), especially when noisy. But wearing headsets while driving is generally frowned on by the traffic police. EDIT: It seems that those challenges are being overcome.
- Overall, I can’t think of anything wrong in this Wikipedia article on Dragon NaturallySpeaking. That said, the article is a bit sloppy, so I’d encourage people to see if they can edit it a bit and spruce it up.
Any thoughts? In particular, what version of Dragon NaturallySpeaking or a competitive product should Linda use, and why?
December 3rd, 2007 at 4:34 am
Hi Curt,
Speech recognition in general is gaining ground as a ubiquitous technology almost daily.
And Windows Vista offers dictation and Command and Control of a kind that was previously unheard of!
Here’s an article that may help you get a better picture:
http://wirelessspeech.blogspot.com/2007/12/speech-recognition-top-10-flop-says.html
Bill Burke
http://wirelessspeech.blogspot.com
December 3rd, 2007 at 11:21 am
If you have Vista you don’t need to get Dragon. Just go to the accessibility menu and turn on the speech recognition that is included in the OS. It is very good, and is both free and immediate, just a few clicks and a training session away…
Thanks!
steveh
December 3rd, 2007 at 3:53 pm
Thanks, Steve.
I’ve chickened out and haven’t run Vista so far, despite Microsoft’s blandishments.
CAM
December 3rd, 2007 at 5:23 pm
The MS product is also downloadable for XP / Word 2003, or you may already have it.
It’s only available for US English, Chinese, and Japanese.
December 4th, 2007 at 1:02 am
Unfortunately, I don’t have Vista.
I used the Word program a few years ago, and found it pretty annoying. Despite long hours of “training,” the errors when I dictated were legion. I know writers who use Dragon, and love it, but the version they’re using is several years old. Does anyone know if Dragon has a recent update?
Thanks.
–Linda
December 4th, 2007 at 1:21 am
Per Wikipedia, Dragon NaturallySpeaking 9 came out in mid-2006, and doesn’t require training. Does anybody know whether there are other significant enhancements in Version 9? And is the no-training claim really true?
CAM
December 20th, 2007 at 2:02 pm
Hi. I’ve used the open source Java research software Sphinx-4, which performs automatic speech recognition. I get about a 5% to 10% error rate on my large vocabulary evaluations. It does not have a facility for training. And it’s not really an end-user product, but it can be incorporated easily into Java applications.
See: http://cmusphinx.sourceforge.net/html/cmusphinx.php
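For readers wondering what an error rate like the one quoted above usually measures: the comment doesn't say, but the common metric is word error rate (WER), i.e. the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in the correct transcript. Here is a minimal Python sketch under that assumption. The function name and the sample sentences are made up for illustration and are not part of Sphinx-4 or any other product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper only (hypothetical name, not a Sphinx-4 API).
    Words are split on whitespace and compared case-sensitively.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one wrong word in a ten-word sentence.
    spoken = "please open the file and read the first paragraph aloud"
    heard = "please open the vial and read the first paragraph aloud"
    print(word_error_rate(spoken, heard))
```

By this measure, a 5% to 10% rate would mean roughly one wrong word in every ten to twenty dictated, though real evaluations typically also normalize case and punctuation before comparing.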
-Steve
Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860
May 15th, 2008 at 10:00 pm
During the past few days, I’ve been increasing my usage of SR in Vista, and the results are encouraging. A boom microphone is essential (a Bluetooth-connected earset won’t work) and a reasonably quiet environment is needed (loud noises from outside such as bird songs (!) don’t help).
Anyway, because the Vista SR is part of the OS, it seems to have knowledge of all the special words, names, etc., in one’s documents, contacts, and so forth. This radically reduces training. For ordinary conversation recognition, it does very well, and it sure beats typing. If it misidentifies a word, the correct word is almost always found on the pop-up menu of alternates.
Not perfect, but impressive. Training is quite short, perhaps 15 minutes. And it also gives good control over the desktop, once again, not perfectly, but it’s a whole lot better than typing.