Linda asked me about the state of desktop dictation technology. In particular, she asked me whether there was much difference between the latest version and earlier, cheaper ones. My knowledge of the area is out of date, so I thought I’d throw both the specific question and the broader subject of speech recognition out there for general discussion.
Here’s much of what I know or believe about speech recognition:
- Most major independent commercial speech recognition efforts have wound up being merged into Nuance Communications. That goes for both desktop and server-side stuff. None was doing particularly well before its respective merger.
- A folk dance buddy (Jonathan Young, once of Dragon Systems) taught me the essential principle of developing speech recognition systems, which probably applies more broadly to other language-understanding technologies as well: “How do you make a good speech recognition product? You start with a bad one and keep incrementally improving it.”
- Linda tells me that a lot of novelists use dictation software to reduce repetitive strain from typing. However, this often just shifts the repetitive strain to their throats. I don’t know whether it helps to use a better microphone, talk more softly, and/or use software that is less demanding of carefully enunciated gaps between words.
- Perhaps because of accuracy concerns, and perhaps also because of noise pollution in the workplace, ordinary computer control via voice is rare. Most applications focus on specialized-circumstance dictation (hands-free operation, disabled users, users who are being harmed by typing, etc.) or telephone interaction.
- Rich semantic technology isn’t yet used in speech recognition to nearly the extent it is in text search/mining/analytics. The grammar in speech recognition systems is primitive at best. And while there may be some hand-built semantic networks with small numbers of nodes, a la Sybase AnswersAnywhere, nobody’s ever hooked up (say) a WordNet equivalent or a good entity-extraction engine as part of a mainstream commercial speech recognition product. (Please correct me if I’m wrong about this part!)
- There are real challenges in voice recognition via remote microphones in small enclosed places (e.g., automobiles), especially when noisy. But wearing headsets while driving is generally frowned on by the traffic police. EDIT: It seems that those challenges are being overcome.
- Overall, I can’t think of anything wrong in this Wikipedia article on Dragon NaturallySpeaking. That said, the article is a bit sloppy, so I’d encourage people to see if they can edit it a bit and spruce it up.
Any thoughts? In particular, what version of Dragon NaturallySpeaking or a competitive product should Linda use, and why?