Analysis of technologies that recognize and/or respond directly to voice and speech.
I believe there are two ways search will improve significantly in the future. First, since talking is easier than typing, speech recognition will allow longer and more accurate input strings. Second, search will be informed by much more persistent user information, with search companies having very detailed understanding of searchers. Based on that, I expect:
- A small oligopoly dominating the conjoined businesses of mobile device software and search. The companies most obviously positioned for membership are Google and Apple.
- The continued and growing combination of search, advertisement/recommendation, and alerting. The same user-specific data will be needed for all three.
- A whole lot of privacy concerns.
My reasoning starts from several observations:
- Enterprise search is greatly disappointing. My main reason for saying that is anecdotal evidence — I don’t notice users being much happier with search than they were 15 years ago. But business results are suggestive too:
  - HP just disclosed serious problems with Autonomy.
  - Microsoft’s acquisition of FAST was a similar debacle.
  - Lesser enterprise search outfits never prospered much. (E.g., when’s the last time you heard mention of Coveo?)
  - My favorable impressions of the e-commerce site search business turned out to be overdone. (E.g., Mercado’s assets were sold for a pittance soon after I wrote that, while Endeca and Inquira were absorbed into Oracle.)
  - Lucene/Solr’s recent stirrings aren’t really in the area of search.
- Web search, while superior to the enterprise kind, is disappointing people as well. Are Google’s results any better than they were 8 years ago? Google’s ongoing hard work notwithstanding, are they even as good?
- Consumer computer usage is swinging toward mobile devices. I hope I don’t have to convince you about that one.
In principle, there are two main ways to make search better:
- Understand more about the documents being searched over. But Google’s travails, combined with the rather dismal history of enterprise search, suggest we’re well into the diminishing-returns part of that project.
- Understand more about what the searcher wants.
The latter, I think, is where significant future improvement will be found.
|Categories: Autonomy, Coveo, Endeca, Enterprise search, FAST, Google, Lucene, Mercado, Microsoft, Search engines, Speech recognition, Structured search||4 Comments|
The newsletter/column excerpted below was originally published in 1998. Some of the specific references are obviously very dated. But the general points about the requirements for successful natural language computer interfaces still hold true. Less progress has been made in the intervening decade-plus than I would have hoped, but some recent efforts — especially in the area of search-over-business-intelligence — are at least mildly encouraging. Emphasis added.
Natural language computer interfaces were introduced commercially about 15 years ago*. They failed miserably.
*I.e., the early 1980s
For example, Artificial Intelligence Corporation’s Intellect was a natural language DBMS query/reporting/charting tool. It was actually a pretty good product. But it’s infamous among industry insiders as the product for which IBM, in one of its first software licensing deals, got about 1700 trial installations — and less than a 1% sales close rate. Even its successor, Linguistic Technologies’ English Wizard*, doesn’t seem to be attracting many customers, despite consistently good product reviews.
*These days (i.e., in 2009) it’s owned by Progress and called EasyAsk. It still doesn’t seem to be selling well.
Another example was HAL, the natural language command interface to 1-2-3. HAL is the product that first made Bill Gross (subsequently the founder of Knowledge Adventure and idealab!) and his brother Larry famous. However, it achieved no success*, and was quickly dropped from Lotus’ product line.
*I loved the product personally. But I was sadly alone.
In retrospect, it’s obvious why natural language interfaces failed. First of all, they offered little advantage over the forms-and-menus paradigm that dominated enterprise computing in both the online-character-based and client-server-GUI eras. If you couldn’t meet an application need with forms and menus, you couldn’t meet it with natural language either.
|Categories: BI integration, IBM and UIMA, Language recognition, Natural language processing (NLP), Progress and EasyAsk, Search engines, Speech recognition||3 Comments|
Stephen Shankland reviewed Yahoo’s mobile voice search, which works by taking voice input and returning results onscreen (in his case on his Blackberry Pearl). He found:
- There are plenty of times when voice is a more convenient form of input than typing.
- Voice recognition was good but far from perfect.
- Editing search strings was annoyingly difficult.
- Search results themselves aren’t 100% perfect.
No big surprises there.
|Categories: Language recognition, Search engines, Specialized search, Speech recognition, Yahoo|
TechCrunchIT ranted yesterday against voice recognition. Parts of the argument are valid, but I think the overall case was overstated.
Key points included:
1. Microsoft and Bill Gates have been overoptimistic about voice recognition.
2. Who needs voice when you have keyboards big and small?
3. The office environment is too noisy for voice recognition to work.
|Categories: Language recognition, Natural language processing (NLP), Spam and antispam, Speech recognition||2 Comments|
As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are off-shoots of what used to be called “information retrieval”:
1. Web search
2. Public-facing site search
3. Enterprise search and knowledge management
4. Custom publishing
5. Text mining and extraction
Three are more standalone:
6. Spam filtering
7. Voice recognition
8. Machine translation
The Reg passes along a Reuters story that Hungarian scientists have built a system to automatically understand canine vocalizations. I’d like to say it’s a woof-to-Magyar translator, but apparently all it does is recognize the doggies’ emotional states. The story reports that the system has 43% accuracy, vs. 40% for humans.
I must confess, however, to being somewhat puzzled about how they measure success. Does the pooch fill out a survey form afterwards? Do they conclude that the beast wasn’t angry if the experimenter doesn’t get bitten?
I need to know a bit more about the research protocol before I know what to think about this.
EDIT: The CBC has a little more detail. The underlying research paper is appearing in Animal Cognition.
Linda asked me about the state of desktop dictation technology. In particular, she asked me whether there was much difference between the latest version and earlier, cheaper ones. My knowledge of the area is out of date, so I thought I’d throw both the specific question and the broader subject of speech recognition out there for general discussion.
Here’s much of what I know or believe about speech recognition:
- Most major independent commercial speech recognition efforts have wound up being merged into Nuance Communications. That goes for both desktop and server-side stuff. None was doing particularly well before its respective merger.
|Categories: Language recognition, Natural language processing (NLP), Nuance, Speech recognition, Sybase||12 Comments|
NEC announced research-level technology that lets a cellphone automatically translate from Japanese into English. The key idea is that they are generating text output, not speech, which lets them sidestep pesky problems about accuracy. I.e. (emphasis mine):
One second after the phone hears speech in Japanese, the cellphone with the new technology shows the text on the screen. One second later, an English version appears. …
“We would need to study how to recognise [sic] voices on the phone precisely. Another problem would be how the person on the other side of the line could know if his or her words are being translated correctly,” he said.
|Categories: Language recognition, Natural language processing (NLP), Speech recognition|
I dropped by Progress a couple of weeks ago for back-to-back briefings on Apama and EasyAsk. EasyAsk is Larry Harris’ second try at natural language query, after the Intellect product fell by the wayside at Trinzic, the company Artificial Intelligence Corporation grew into.* After a friendly divorce from the company he founded, if my memory is correct, Larry was able to build EasyAsk very directly on top of the Intellect intellectual property.
*Other company or product names in the mix at various times include AI Corp and English Wizard. Not inappropriately, it seems that Larry has quite an affinity for synonyms …
EasyAsk is still a small business. The bulk of it is still in enterprise query, but new activity is concentrated on e-commerce applications. While Larry thinks that they’ve solved most of the other technical problems that have bedeviled him over the past three decades, the system still takes too long to implement.
|Categories: BI integration, Language recognition, Mercado, Natural language processing (NLP), Progress and EasyAsk, Speech recognition||1 Comment|