Natural language processing (NLP)
Analysis of natural language processing (NLP) technologies.
MEN ARE FROM EARTH, COMPUTERS ARE FROM VULCAN
The newsletter/column excerpted below was originally published in 1998. Some of the specific references are obviously very dated, but the general points about the requirements for successful natural language computer interfaces still hold true. Less progress has been made in the intervening decade-plus than I would have hoped, but some recent efforts, especially in the area of search-over-business-intelligence, are at least mildly encouraging. Emphasis added.
Natural language computer interfaces were introduced commercially about 15 years ago*. They failed miserably.
*I.e., the early 1980s
For example, Artificial Intelligence Corporation’s Intellect was a natural language DBMS query/reporting/charting tool. It was actually a pretty good product. But it’s infamous among industry insiders as the product for which IBM, in one of its first software licensing deals, got about 1700 trial installations — and less than a 1% sales close rate. Even its successor, Linguistic Technologies’ English Wizard*, doesn’t seem to be attracting many customers, despite consistently good product reviews.
*These days (i.e., in 2009) it’s owned by Progress and called EasyAsk. It still doesn’t seem to be selling well.
Another example was HAL, the natural language command interface to 1-2-3. HAL is the product that first made Bill Gross (subsequently the founder of Knowledge Adventure and idealab!) and his brother Larry famous. However, it achieved no success*, and was quickly dropped from Lotus’ product line.
*I loved the product personally. But I was sadly alone.
In retrospect, it’s obvious why natural language interfaces failed. First of all, they offered little advantage over the forms-and-menus paradigm that dominated enterprise computing in both the online-character-based and client-server-GUI eras. If you couldn’t meet an application need with forms and menus, you couldn’t meet it with natural language either.
Categories: BI integration, IBM and UIMA, Language recognition, Natural language processing (NLP), Progress and EasyAsk, Search engines, Speech recognition
Google Wave — finally a Microsoft killer?
Google held a superbly received preview of a new technology called Google Wave, which promises to “reinvent communication.” In simplest terms, Google Wave is a software platform that:
- Offers the possibility of improving upon a broad range of communication, collaboration, and/or text-based product categories, such as:
  - Search
  - Word processing
  - Instant messaging
  - Microblogging
  - Blogging
  - Mini-portals (Facebook-style)
  - Mini-portals (SharePoint-style)
- In particular, allows these applications to be both much more integrated and much more interactive than they now are.
- Will have open developer APIs.
- Will be open-sourced.
If this all works out, Google Wave could play merry hell with Microsoft Outlook, Microsoft Word, Microsoft Exchange, Microsoft SharePoint, and more.
I suspect it will.
And by the way, there’s a cool “natural language” angle as well.
Categories: Google, Language recognition, Microblogging, Microsoft, Natural language processing (NLP), Search engines, Social software and online media, Software as a Service (SaaS)
More on Languageware
Marie Wallace of IBM wrote back in response to my post on Languageware. In particular, it seems I got the Languageware/UIMA relationship wrong. Marie’s email was long and thoughtful enough that, rather than just pointing her at the comment thread, I asked for permission to repost it. Here goes:
Thanks for the mention of LanguageWare on your blog, albeit a skeptical one.
I totally understand your scepticism, as there is so much talk about text analytics these days and everyone believes they have solved the problem. I guess I can only hope that our approach will indeed prove to be different and offer some new and interesting perspectives.
The key differentiator in our approach is that we have completely decoupled the language model from the code that runs the analysis. The analysis has been generalized into a set of data-driven algorithms that apply across many languages, which makes the solution hugely and rapidly customizable (without having to change code). It is this flexibility that we believe is core to realizing multi-lingual and multi-domain text analysis applications in a real-world scenario. This customization environment is available for download from Alphaworks, http://www.alphaworks.ibm.com/tech/lrw, and we would love to get feedback from your community.
On your point about performance, we actually consider UIMA one of our greatest performance optimizations and core to our design. The point about one-pass is that we never go back over the same piece of text twice at the same “level” and take a very careful approach when defining our UIMA Annotators. Certain layers of language processing just don’t make sense to split up due to their interconnectedness and therefore we create our UIMA annotators according to where they sit in the overall processing layers. That’s the key point.
Anyway those are my thoughts, and thanks again for the mention. It’s really great to see these topics being discussed in an open and challenging forum.
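To make Marie’s two points concrete, here is a toy sketch in Python. It assumes nothing about LanguageWare’s or UIMA’s real interfaces: every name below (`Annotation`, `Document`, `analyze`, `ENGLISH`, `GERMAN`, and so on) is hypothetical. The analysis code is generic, while the per-language “model” is plain data (a token pattern and a lexicon), so switching languages or domains means swapping data rather than editing code. Each layer makes exactly one pass over its input (raw text for the tokenizer, the token layer for the lexicon tagger), loosely echoing the one-pass, layered annotator design she describes.

```python
import re
from dataclasses import dataclass, field

# Hypothetical annotation record; NOT UIMA's actual type system.
@dataclass
class Annotation:
    layer: str   # which processing layer produced it
    start: int   # character offsets into the document text
    end: int
    label: str

@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def layer(self, name):
        # Returns a new list, so callers may append while iterating over it.
        return [a for a in self.annotations if a.layer == name]

def tokenize(doc, model):
    # Layer 1: a single pass over the raw text; the pattern comes from model data.
    for m in re.finditer(model["token_pattern"], doc.text):
        doc.annotations.append(Annotation("token", m.start(), m.end(), "TOKEN"))

def tag_lexicon(doc, model):
    # Layer 2: a single pass over the token layer; labels come from the lexicon data.
    for tok in doc.layer("token"):
        word = doc.text[tok.start:tok.end].lower()
        label = model["lexicon"].get(word)
        if label:
            doc.annotations.append(Annotation("entity", tok.start, tok.end, label))

# Layers run in order, each exactly once per document.
PIPELINE = [tokenize, tag_lexicon]

def analyze(text, model):
    doc = Document(text)
    for layer_fn in PIPELINE:
        layer_fn(doc, model)
    return doc

# Hypothetical "language models" as pure data: same code, different languages.
ENGLISH = {"token_pattern": r"\w+", "lexicon": {"ibm": "ORG", "ireland": "PLACE"}}
GERMAN = {"token_pattern": r"\w+", "lexicon": {"ibm": "ORG", "irland": "PLACE"}}
```

For example, `analyze("IBM opened a lab in Ireland.", ENGLISH)` and `analyze("IBM forscht in Irland.", GERMAN)` run the identical code path and differ only in the model data passed in.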
Languageware — IBM takes another try at natural language processing
Marie Wallace of IBM wrote in from Ireland to call my attention to Languageware, IBM’s latest try at natural language processing (NLP). Obviously, IBM has been down this road multiple times before, from ViaVoice (dictation software that got beaten out by Dragon NaturallySpeaking) to Penelope (a research project that seemingly went on for as long as Odysseus was away from Ithaca; rumor has it that the principals eventually decamped to Microsoft, and continued to not produce commercial technology there).
Chatbot game — Digg meets Eliza?
I forget how I got the URL, but something called the Chatbot Game purports to be a combination of Eliza and Digg. That is, it’s a chatbot with a lot of rules; anybody can submit rules; rules are voted up and down.
I don’t think I’ll get to play with it for a while (I’m heading off on vacation), so I thought I’d post it here to see if anybody else had any thoughts about or familiarity with it.
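The post doesn’t say how the Chatbot Game actually ranks rules, so what follows is only a guess at one plausible mechanism, sketched in Python with hypothetical names (`Rule`, `VotedChatbot`, `submit`, `vote`, `reply`). Eliza-style rules pair a regular-expression pattern with a canned response; anyone can submit a rule; votes adjust each rule’s score; and when several rules match an input, the highest-voted one answers.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: str    # regular expression matched against the user's input
    response: str   # canned reply, Eliza-style
    votes: int = 0  # net score from up/down votes

class VotedChatbot:
    """Hypothetical crowd-ranked chatbot; not the Chatbot Game's real design."""

    def __init__(self, fallback="Tell me more."):
        self.rules = []
        self.fallback = fallback

    def submit(self, pattern, response):
        # Anyone may add a rule; it starts with zero votes.
        rule = Rule(pattern, response)
        self.rules.append(rule)
        return rule

    def vote(self, rule, delta):
        # +1 for an upvote, -1 for a downvote.
        rule.votes += delta

    def reply(self, text):
        # Collect every rule whose pattern matches, case-insensitively.
        matches = [r for r in self.rules if re.search(r.pattern, text, re.IGNORECASE)]
        if not matches:
            return self.fallback
        # Highest net votes wins; max() keeps the first (earliest-submitted) rule on ties.
        return max(matches, key=lambda r: r.votes).response
```

Ties go to the earliest-submitted rule because Python’s `max` returns the first maximal element; a real system would presumably want something smarter, such as randomizing among the top-ranked rules.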
3 specialized markets for text analytics
In the previous post, I offered a list of eight linguistics-based market segments, and a slide deck surveying them. And I promised a series of follow-up posts based on the slides.
Categories: Language recognition, Natural language processing (NLP), Spam and antispam, Speech recognition
The Text Analytics Marketplace: Competitive landscape and trends
As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are offshoots of what used to be called “information retrieval”:
1. Web search
2. Public-facing site search
3. Enterprise search and knowledge management
4. Custom publishing
5. Text mining and extraction
Three are more standalone:
6. Spam filtering
7. Voice recognition
8. Machine translation
So what’s the state of speech recognition and dictation software?
Linda asked me about the state of desktop dictation technology. In particular, she asked me whether there was much difference between the latest version and earlier, cheaper ones. My knowledge of the area is out of date, so I thought I’d throw both the specific question and the broader subject of speech recognition out there for general discussion.
Here’s much of what I know or believe about speech recognition:
- Most major independent commercial speech recognition efforts have wound up being merged into Nuance Communications. That goes for both desktop and server-side stuff. None was doing particularly well before its respective merger.
Categories: Language recognition, Natural language processing (NLP), Nuance, Speech recognition, Sybase
NEC simplifies the voice translation problem
NEC announced research-level technology that lets a cellphone automatically translate from Japanese into English. The key idea is that they are generating text output, not speech, which lets them sidestep pesky problems about accuracy. I.e. (emphasis mine):
One second after the phone hears speech in Japanese, the cellphone with the new technology shows the text on the screen. One second later, an English version appears. …
“We would need to study how to recognise [sic] voices on the phone precisely. Another problem would be how the person on the other side of the line could know if his or her words are being translated correctly,” he said.
Categories: Language recognition, Natural language processing (NLP), Speech recognition
Progress EasyAsk
I dropped by Progress a couple of weeks ago for back-to-back briefings on Apama and EasyAsk. EasyAsk is Larry Harris’ second try at natural language query, after the Intellect product fell by the wayside at Trinzic, the company Artificial Intelligence Corporation grew into.* After a friendly divorce from the company he founded, if my memory is correct, Larry was able to build EasyAsk very directly on top of the Intellect intellectual property.
*Other company or product names in the mix at various times include AI Corp and English Wizard. Not inappropriately, it seems that Larry has quite an affinity for synonyms …
EasyAsk is still a small business. The bulk of it is still in enterprise query, but new activity is concentrated on e-commerce applications. While Larry thinks that they’ve solved most of the other technical problems that have bedeviled him over the past three decades, the system still takes too long to implement.
