Language recognition – Text Technologies

The future of search

Curt Monash — Mon, 26 Nov 2012 03:07:34 +0000

I believe there are two ways search will improve significantly in the future. First, since talking is easier than typing, speech recognition will allow longer and more accurate input strings. Second, search will be informed by much more persistent user information, with search companies having very detailed understanding of searchers. Based on that, I expect:

A small oligopoly dominating the conjoined businesses of mobile device software and search. The companies most obviously positioned for membership are Google and Apple.
The continued and growing combination of search, advertisement/recommendation, and alerting. The same user-specific data will be needed for all three.
A whole lot of privacy concerns.

My reasoning starts from several observations:

Enterprise search is greatly disappointing. My main reason for saying that is anecdotal evidence — I don’t notice users being much happier with search than they were 15 years ago. But business results are suggestive too:
- HP just disclosed serious problems with Autonomy.
- Microsoft’s acquisition of FAST was a similar debacle.
- Lesser enterprise search outfits never prospered much. (E.g., when’s the last time you heard mention of Coveo?)
- My favorable impressions of the e-commerce site search business turned out to be overdone. (E.g., Mercado’s assets were sold for a pittance soon after I wrote that, while Endeca and Inquira were absorbed into Oracle.)
- Lucene/Solr’s recent stirrings aren’t really in the area of search.
Web search, while superior to the enterprise kind, is disappointing people as well. Are Google’s results any better than they were 8 years ago? Google’s ongoing hard work notwithstanding, are they even as good?
Consumer computer usage is swinging toward mobile devices. I hope I don’t have to convince you about that one.

In principle, there are two main ways to make search better:

Understand more about the documents being searched over. But Google’s travails, combined with the rather dismal history of enterprise search, suggest we’re well into the diminishing-returns part of that project.
Understand more about what the searcher wants.

The latter, I think, is where significant future improvement will be found.

So how does a search engine understand what you want? It can listen to you directly, parsing your search string. It can ask for more clarity, through some kind of disambiguation interface. Or it can make inferences, based on — well, based on just about any kind of information that might exist about you and your online behavior.

Search strings are short, typically four words or less. That doesn’t leave room for a lot of innovative parsing. Not a lot of progress can be made until search strings get a lot longer, and that is unlikely except perhaps through the convenience of speech recognition.

Faceted/parameterized selection has its place. For example, when I search on Amazon.com, the site encourages me to also select a department from its dropdown menu; otherwise, it refuses to rank the search results. And when I buy shirts from Lands’ End, I just click through and never search at all. Still, Google’s been around for 15 years, and its successes in searcher-does-the-work disambiguation boil down to:

A list of a few major subcategories to search (News, YouTube, etc.).
Spelling correction.
A desultory list of related/more specific searches, perhaps just longer search strings other people have recently entered.
Well-hidden “Advanced Search” features, which look much like AltaVista’s and AllTheWeb’s similar features did late in the 20th Century.

Whatever the user attitudes and behaviors are that constrain Google’s or its competitors’ success in this area, I can’t imagine them changing much — except, once again, in the event that speech recognition leads to richer human-computer conversations.

I’ve now highlighted two different ways in which there’s a search-interface challenge that will be tough to beat without turning to speech recognition. But the case for speech recognition is even stronger than that. We’re moving to small, mobile devices, and:

Traditional search interfaces work worse on mobile devices than on desktop computers. Typing is harder. So is dealing with picky forms.
Speech may work as well or better on mobile devices than at your desk. If you have upgraded your Apple device to iOS 6, you have both a microphone and Siri. The same may not be true of your desktop gear.

And so I conclude that speech recognition is a big part of the future of search.

What will that allow? Since talking is easier than typing, speech is a way to get longer text strings as search inputs, or more of them. It’s plausible that people might speak queries as complex as:

“I want to buy a recharger for an iPad 3 with delivery this week.”
“Where is 10gen’s Northern California office?” … “Which nearby restaurants have good Yelp reviews?”
“Tell me about the David Reed who went to the Kennedy School of Government around 1977, went to Dartmouth before that, and worked for the Federal Communications Commission.”

Getting search engines to the point that they can handle such queries will be difficult but straightforward — but even more progress is needed. Search results for various queries will be greatly improved if the search engine “knows” things like:

The location of your home and office, and the distance you’re willing to go from them to eat or shop.
Your tastes in food, clothing, and gadgetry.
The level of sophistication at which you like to read about medicine, finance, or electronics.
Which people are or might be in your extended social network.

And that will cement internet search squarely in the world of — for once I approve of the term — big data.

MEN ARE FROM EARTH, COMPUTERS ARE FROM VULCAN

Curt Monash — Sat, 30 May 2009 06:15:44 +0000

The newsletter/column excerpted below was originally published in 1998. Some of the specific references are obviously very dated. But the general points about the requirements for successful natural language computer interfaces still hold true. Less progress has been made in the intervening decade-plus than I would have hoped, but some recent efforts — especially in the area of search-over-business-intelligence — are at least mildly encouraging. Emphasis added.

Natural language computer interfaces were introduced commercially about 15 years ago*. They failed miserably.

*I.e., the early 1980s

For example, Artificial Intelligence Corporation’s Intellect was a natural language DBMS query/reporting/charting tool. It was actually a pretty good product. But it’s infamous among industry insiders as the product for which IBM, in one of its first software licensing deals, got about 1700 trial installations — and less than a 1% sales close rate. Even its successor, Linguistic Technologies’ English Wizard*, doesn’t seem to be attracting many customers, despite consistently good product reviews.

*These days (i.e., in 2009) it’s owned by Progress and called EasyAsk. It still doesn’t seem to be selling well.

Another example was HAL, the natural language command interface to 1-2-3. HAL is the product that first made Bill Gross (subsequently the founder of Knowledge Adventure and idealab!) and his brother Larry famous. However, it achieved no success*, and was quickly dropped from Lotus’ product line.

*I loved the product personally. But I was sadly alone.

In retrospect, it’s obvious why natural language interfaces failed. First of all, they offered little advantage over the forms-and-menus paradigm that dominated enterprise computing in both the online-character-based and client-server-GUI eras. If you couldn’t meet an application need with forms and menus, you couldn’t meet it with natural language either.

Even worse, NL actually had a couple of clear disadvantages versus traditional interfaces. First of all, it required (ick!) typing, often more typing than the forms and menus did. Second, forms and menus tell the user exactly what he can do. Natural language, however, lets him give orders the computer doesn’t know how to follow. This is inefficient, not to mention frustrating.

However, even in 1983, it was obvious that the typing objection would go away some day, because of speech recognition, once desktop computers reached 100 MIPS or so. (Effective keyboard-replacement speech recognition, as opposed to true natural language understanding, is mainly a matter of processing power.) 15 years later, standard PCs exceed 100 MIPS (assuming that 1 MIPS = a couple of megahertz for these purposes), and speech recognition is indeed getting practical.

In fact, as has become increasingly evident recently, speech recognition is now a hot technology. Bill Gates has been talking it up for a couple of years. Increasingly, the press has swung to believing him … And my parents just bought a PC with two speech recognition products on it.

That said, speech recognition is as misunderstood (no pun intended) as most artificial intelligence technologies. Yes, it beats typing, in a number of circumstances:

On the telephone (duh!)
“Busy hands” and/or “busy eyes” applications and locales (doctors’ offices, trading floors, warehouses, etc., and, some day in the future, your kitchen and car)
People simply reluctant to type (e.g., anybody with sufficient wrist or back problems, and many males over the age of 45)

But before our computers talk back and forth with us in the voice of Majel Barrett Roddenberry, applications are going to have to add several important elements required for truly functional natural-language interfaces:

Intuitively clear names for everything on (or just behind) the screen
Application-specific disambiguation logic

For most practical purposes, the latter requirement equates to

A new generation of document selection technology

THE RULE OF NAMES

According to legend, knowing something’s name gives you power over it. When that “something” is a button or menu choice on a speech-enabled computer, the legend is literally true. But when a feature doesn’t have an obvious name, you can’t easily invoke it.

When applications consisted mainly of forms and menus, this was rarely a problem. Everything had a clear role and label. But web pages are less organized. Hyperlinks can be scattered all over the place, with little rhyme or reason.

Frankly, I don’t think this is a hard problem to solve. It wouldn’t take a lot of XML to divide the page into clear regions, so that commands like “Show me article #3” (on a search results list) could be interpreted in the obvious way. But it does take at least some discipline; random web pages will not necessarily be easy to “talk” to.
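As a rough illustration of the point above, here is a minimal Python sketch of how a page divided into named regions could let a spoken command like “Show me article #3” be resolved. The markup (the `page`, `region`, and `item` elements and their attributes) and the `resolve_article` helper are invented for this example; they are assumptions, not any real standard.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page markup: element and attribute names are invented for illustration.
PAGE = """
<page>
  <region name="search results">
    <item href="/articles/a">Speech recognition basics</item>
    <item href="/articles/b">Natural language interfaces</item>
    <item href="/articles/c">Text search comeback</item>
  </region>
  <region name="navigation">
    <item href="/home">Home</item>
  </region>
</page>
"""

def resolve_article(command, page_xml, region_name="search results"):
    """Map a command like 'Show me article #N' to the href of the Nth item
    in the named region, or return None if it cannot be resolved."""
    match = re.search(r"article\s*#?\s*(\d+)", command, re.IGNORECASE)
    if match is None:
        return None
    index = int(match.group(1))
    root = ET.fromstring(page_xml)
    for region in root.findall("region"):
        if region.get("name") == region_name:
            items = region.findall("item")
            if 1 <= index <= len(items):
                return items[index - 1].get("href")
    return None

print(resolve_article("Show me article #3", PAGE))
```

The discipline lies in the markup, not the code: without a clearly labeled results region, there is nothing unambiguous for “#3” to refer to.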

CYBERNETIC LISTENING SKILLS

The bigger challenge is to make sure that the application can respond in some useful way, no matter what command it’s given. This is even more difficult than it was 15 years ago, because of the radical increase in “casual” computer usage. In the old days, we could assume the user had some clear business reason for using the application, and if necessary that s/he had time to be trained (even if people rarely sat still for as much training as they really needed). Therefore, we could assume that the users had at least a general idea of what the application did, and hence of which commands the computer could obey. From an NL standpoint, we could assume that what they actually “said” (which in those days meant “typed”) was at least reasonably close to what they were “supposed” to say.

Now, however, some of the most important applications are internet e-commerce and portals, competing and begging for the user’s attention. The user is there strictly on a voluntary basis, and if he doesn’t get immediate gratification, he’s gone, history, hasta la bye-bye. Site-specific training isn’t even a consideration. And even if somebody did actually take a class on “How to use Excite,” the knowledge would be obsolete in six months. So applications, if they are to have natural language interfaces that please and respond to users, have to be able to respond pretty much to any command.

Ideally, voice-enabled systems would be like the computers on Star Trek, which can return information from vast archives, brew a pot of Earl Grey tea, play three parts of a quartet, create self-aware life forms, or answer questions like “Computer, what is the nature of the universe?” More realistically, they should be able, for example, to respond to a command like “Tell me about flights to Miami” by automatically giving the user a travel-reservation application or web page, and entering Miami in the appropriate form field.

If one thinks about the complications in such a system, it becomes clear that there are only two possible ways an application system can be designed to respond meaningfully to an enormous range of reasonable possible requests.

1. It can do the equivalent of saying “I’m sorry, I didn’t understand that,” “I’m sorry, I can’t do that,” and so on.

2. It can interpret many commands as text-search strings, and return appropriate results.

The first strategy — application-specific disambiguation logic, clear responses to “errors,” etc. — is absolutely necessary. No software is perfectly intelligent; the user will have to be asked for disambiguation help from time to time (just as clerks today ask customers to repeat their requests!). I’m not going to go into much detail about how that works because, frankly, it’s a tricky thing to get right. Users hate unnecessary disambiguation steps. They also hate the incorrect responses that result from ambiguity, and do tolerate being asked for help when it’s truly needed. In short, whatever you build the first time around will probably be wrong. So build something fast; then run, don’t walk, to the nearest usability lab, find out how you screwed up, and redo your system until you get it right.

I’m convinced that the second strategy — heavy reliance on text search technology — is a requirement as well. Just try to name a major web site that doesn’t use text search. True, text search has gotten a bad rap recently, mainly because a whole generation of search engines didn’t really work. But it will stage a comeback.
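To make the two strategies concrete, here is a minimal Python sketch of a dispatcher that first tries application-specific commands, falls back to treating the input as a text-search string, and asks for help only when it has nothing to work with. The command patterns, application names, and the `respond` function are all hypothetical, chosen to mirror the “flights to Miami” example rather than to describe any real system.

```python
import re

# Hypothetical application commands: each pattern maps to an app name and a form field.
COMMANDS = [
    (re.compile(r"flights? to (?P<value>[\w ]+)", re.IGNORECASE), "travel", "destination"),
    (re.compile(r"weather in (?P<value>[\w ]+)", re.IGNORECASE), "weather", "city"),
]

def respond(utterance):
    """Return a description of what the system would do with the utterance."""
    text = utterance.strip()
    if not text:
        # Strategy 1: a clear response when nothing usable was said.
        return {"action": "clarify", "message": "I'm sorry, I didn't understand that."}
    for pattern, app, field in COMMANDS:
        match = pattern.search(text)
        if match:
            # Known command: open the application and pre-fill the form field.
            return {"action": "open_app", "app": app,
                    "fields": {field: match.group("value").strip()}}
    # Strategy 2: interpret everything else as a text-search string.
    return {"action": "search", "query": text}

print(respond("Tell me about flights to Miami"))
```

In this sketch the search fallback is what lets the system respond to “pretty much any command”; the clarification branch is reserved for cases where even a search would be meaningless.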

Google Wave — finally a Microsoft killer?

Curt Monash — Fri, 29 May 2009 09:49:24 +0000

Google held a superbly-received preview of a new technology called Google Wave, which promises to “reinvent communication.” In simplest terms, Google Wave is a software platform that:

Offers the possibility to improve upon a broad range of communication, collaboration, and/or text-based product categories, such as:
- Search
- Word processing
- E-mail
- Instant messaging
- Microblogging
- Blogging
- Mini-portals (Facebook-style)
- Mini-portals (Sharepoint-style)
In particular, allows these applications to be both much more integrated and interactive than they now are.
Will have open developer APIs.
Will be open-sourced.

If this all works out, Google Wave could play merry hell with Microsoft Outlook, Microsoft Word, Microsoft Exchange, Microsoft SharePoint, and more.

I suspect it will.

And by the way, there’s a cool “natural language” angle as well.

For starters, here are some basic links:

Google has naturally set up a home page for the Google Wave project.
Featured on that page but also separately available is an 80-minute video introducing Google Wave.
TechCrunch has two highly detailed posts on Google Wave, one summarizing what’s in the main Google Wave video and one reporting on a Google Wave Q&A.

Here are some reasons I think Google Wave could actually live up to its promise:

The email problem Google Wave purports to solve is real and critical. The email paradigm assumes linear conversations, and what actually happens is that they branch. Google Wave’s message-board-like paradigm is simply better, and more flexible (e.g., not limited to a single enterprise!) than Microsoft Exchange or Lotus Notes.
The instant messaging problems Google Wave purports to solve are also major. Instant messaging is slow, tedious, disjointed, and ephemeral. Fully integrating IM with email solves most of those problems. And Google Wave’s UI interactivity solves most of the rest.
Twitter needs to be integrated with other forms of communication. What’s more, Twitter’s functionality needs to be drastically extended. Google Wave is the best hope I know of to meet those needs. Enterprise Twitter is just a special case of that.
Workgroups (enterprise or otherwise) need light-weight mini-portals that can be created on the fly by non-technical users, to ease collaboration. Microsoft SharePoint, SAP Rooms, et al. don’t really meet that need. Google Wave could.
In particular, collaboration on documents, presentations and so on needs to be more cloud-based and generally easier than is the case in Microsoft Office. Google Wave has the potential to provide that.
Google + open source is a potentially potent combination, especially versus Microsoft.

One note: Google of course needs to improve the reliability and customer service of its cloud-based offerings to make a huge dent in Microsoft’s market. But even with its flaws Google has already been a good alternative for a while.

As for the “natural language” angle: At the 44:30 mark of the main Google Wave video is a demo of some cool, very grammar-sensitive spell-checking technology. Google’s spell-checking technology is further discussed in a separate, short video. The basic idea is that Google uses its vast library of web pages (and email and chat?) to model not just intended word usage but also kinds of misspelling behavior.
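For readers who want a feel for how a usage model and a misspelling model can be combined, here is a toy Python sketch. It is not Google’s method, which the videos don’t spell out in detail; it is a generic context-sensitive corrector, and every count, probability, and name in it is invented for illustration.

```python
import math

# Toy usage model: how often a word is followed by another. Counts are invented;
# a real system would learn them from a very large corpus.
CONTEXT_COUNTS = {
    ("bean", "soup"): 40,
    ("been", "soup"): 1,
    ("been", "there"): 60,
}

# Toy misspelling model: probability of typing the first word when the second
# was intended. Values are invented.
ERROR_PROB = {
    ("been", "bean"): 0.05,
    ("bean", "been"): 0.05,
}
P_TYPED_CORRECTLY = 0.9  # assumed probability that the typed word was intended

def correct(typed, next_word):
    """Pick the intended word by combining the misspelling model with context."""
    candidates = {typed} | {intended for (t, intended) in ERROR_PROB if t == typed}

    def score(candidate):
        if candidate == typed:
            channel = P_TYPED_CORRECTLY
        else:
            channel = ERROR_PROB[(typed, candidate)]
        # Add 1 so that unseen word pairs do not produce log(0).
        context = CONTEXT_COUNTS.get((candidate, next_word), 0) + 1
        return math.log(channel) + math.log(context)

    return max(candidates, key=score)

print(correct("been", "soup"))
```

The point of the sketch is that a correctly spelled word can still be flagged: “been” is a real word, but in front of “soup” the context evidence outweighs the assumption that the user typed what they meant.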

Lukewarm review of Yahoo mobile search

Curt Monash — Tue, 11 Nov 2008 23:01:36 +0000

Stephen Shankland reviewed Yahoo’s mobile voice search, which works by taking voice input and returning results onscreen (in his case on his Blackberry Pearl). He found:

There are plenty of times when voice is a more convenient form of input than typing.
Voice recognition was good but far from perfect.
Editing search strings was annoyingly difficult.
Search results themselves aren’t 100% perfect.

No big surprises there.

More on Languageware

Curt Monash — Fri, 10 Oct 2008 10:38:29 +0000

Marie Wallace of IBM wrote back in response to my post on Languageware. In particular, it seems I got the Languageware/UIMA relationship wrong. Marie’s email was long and thoughtful enough that, rather than just pointing her at the comment thread, I asked for permission to repost it. Here goes:

Thanks for your mention of LanguageWare on your blog, albeit a skeptical one. I totally understand your scepticism as there is so much talk about text analytics these days and everyone believes they have solved the problem. I guess I can only hope that our approach will indeed prove to be different and offers some new and interesting perspectives.

The key differentiation in our approach is that we have completely decoupled the language model from the code that runs the analysis. This has been generalized to a set of data-driven algorithms that apply across many languages so that you can have an approach that makes the solution hugely and rapidly customizable (without having to change code). It is this flexibility that we believe is core to realizing multi-lingual and multi-domain text analysis applications in a real-world scenario. This customization environment is available for download from Alphaworks, http://www.alphaworks.ibm.com/tech/lrw, and we would love to get feedback from your community.

On your point about performance, we actually consider UIMA one of our greatest performance optimizations and core to our design. The point about one-pass is that we never go back over the same piece of text twice at the same “level” and take a very careful approach when defining our UIMA Annotators. Certain layers of language processing just don’t make sense to split up due to their interconnectedness and therefore we create our UIMA annotators according to where they sit in the overall processing layers. That’s the key point.

Anyway those are my thoughts, and thanks again for the mention. It’s really great to see these topics being discussed in an open and challenging forum.

Languageware — IBM takes another try at natural language processing

Curt Monash — Tue, 07 Oct 2008 15:51:26 +0000

Marie Wallace of IBM wrote in from Ireland to call my attention to Languageware, IBM’s latest try at natural language processing (NLP). Obviously, IBM has been down this road multiple times before, from ViaVoice (dictation software that got beat out by Dragon NaturallySpeaking) to Penelope (research project that seemingly went on for as long as Odysseus was away from Ithaca — rumor has it that the principals eventually decamped to Microsoft, and continued to not produce commercial technology there).

By the way — I by no means want to single out IBM’s natural language efforts for especial bashing. The AI industry’s unit of bogosity has long been the “microlenat,” and Doug Lenat’s life work is, approximately, solving natural language access. I sat next to Doug at dinner at an IJCAI/AAAI conference in 1985 or so. So far as I can tell, what he told me about then still hasn’t been delivered in real life. I’m not aware of any connection between Lenat and IBM.

What’s different this time, apparently, is a rigorous focus on performance. Marie and her team seem to believe that what has held natural language processing back in the past has been poor performance. That’s not as crazy as it sounds, since natural language may be one of those artificial intelligence problems in which brute force outperforms sophisticated heuristics (Lenatesque or otherwise). Still, I have to wonder if performance has really been the main problem.

One interesting side note is that a reason given for this great performance is that processing is done in one pass rather than several. This seems to directly contradict the philosophy of UIMA, IBM’s proposed general-purpose text analytic industry standard. And it’s tough to see how that architectural choice alone can produce enough of a performance advantage to be a game-changer.

The link I gave above already has quite a bit of material. Marie tells me that more and/or fresher material is coming soon.

Chatbot game — Digg meets Eliza?

Curt Monash — Thu, 10 Jul 2008 08:41:59 +0000

I forget how I got the URL, but something called the Chatbot Game purports to be a combination of Eliza and Digg. That is, it’s a chatbot with a lot of rules; anybody can submit rules; rules are voted up and down.
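I haven’t looked at how the Chatbot Game is actually built, so the following is only a guess at the general shape of such a thing: an Eliza-style set of pattern/reply rules, where anybody can submit a rule and votes decide which matching rule answers. The `Rule` and `VotedChatbot` names and the default reply are invented for this sketch.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: str    # regular expression matched against the user's message
    reply: str      # reply template; may reference captured groups such as \1
    votes: int = 0  # net score from up and down votes

class VotedChatbot:
    def __init__(self):
        self.rules = []

    def submit(self, pattern, reply):
        """Anybody can add a rule; it starts with zero votes."""
        rule = Rule(pattern, reply)
        self.rules.append(rule)
        return rule

    def vote(self, rule, delta):
        """Vote a rule up (positive delta) or down (negative delta)."""
        rule.votes += delta

    def respond(self, message):
        """Answer with the highest-voted rule whose pattern matches."""
        matching = [r for r in self.rules
                    if re.search(r.pattern, message, re.IGNORECASE)]
        if not matching:
            return "Tell me more."
        best = max(matching, key=lambda r: r.votes)
        match = re.search(best.pattern, message, re.IGNORECASE)
        return match.expand(best.reply)

bot = VotedChatbot()
specific = bot.submit(r"i feel (.+)", r"Why do you feel \1?")
generic = bot.submit(r"feel", "Feelings are interesting.")
bot.vote(specific, 2)
bot.vote(generic, 1)
print(bot.respond("I feel tired"))
```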

I don’t think I’ll get to play with it for a while (I’m heading off on vacation), so I thought I’d post it here to see if anybody else had any thoughts about or familiarity with it.

Related link

Russian chatbot apparently passes Turing Test

TechCrunchIT rants against voice recognition

Curt Monash — Mon, 07 Jul 2008 08:17:08 +0000

TechCrunchIT ranted yesterday against voice recognition. Parts of the argument have validity, but I think the overall argument was overstated.

Key points included:

1. Microsoft and Bill Gates have been overoptimistic about voice recognition.

2. Who needs voice when you have keyboards big and small?

3. The office environment is too noisy for voice recognition to work.

In particular, TechCrunchIT wrote:

In a real-world enterprise environment, it is impossible to imagine a room full of people all using voice dictation at their computers. The background noise is difficult to filter out, and the modern office environment is full of interruptions with phones ringing, instant messages, new emails and more.

That part of the argument can be refuted in one word — headphones — but other parts carry a bit more weight. For example, so long as it is true that:

When typing at a keyboard, you can easily multi-task and stop/start easily while switching between programs. With voice recognition, you need to pause or stop recording and specifically tell the application when you are actually speaking to it by pressing a button.

voice recognition won’t grow beyond niche status. But it will remain true until computers have effective command-line interfaces that work seamlessly among multiple applications. And I’m not aware that such interfaces have shown much progress to date.

3 specialized markets for text analytics

Curt Monash — Thu, 19 Jun 2008 07:44:09 +0000

In the previous post, I offered a list of eight linguistics-based market segments, and a slide deck surveying them. And I promised a series of follow-up posts based on the slides.

Let me begin by explaining what I mean by some of that list (taken from Slide 2), starting from the bottom.

Machine translation is a small business, with small specialized vendors. Lernout & Hauspie attempted to combine it with voice recognition in a complex financial play, but that collapsed in a miasma of stock fraud. The remnants turned into what became Nuance Communications.
Nuance is a roll-up of most of the important independent voice recognition vendors. So far voice recognition has worked best in two areas: “Hands-free” computer use/dictation, and IVR (interactive voice response). While both are important, neither is exactly a mainstream enterprise computer software business. So voice recognition is not closely integrated with the other market segments.
“Natural language processing” other than voice recognition isn’t much of a business at this time (with apologies to Progress EasyAsk). It doesn’t make the list at all.
Spam filtering is obviously a major business, whether or not it is getting combined into more general security and/or messaging product suites. Antispam vendors actually perform a lot of machine learning, much like text miners do. But the types of rules they wind up with are quite different. And their hardest problems aren’t linguistic ones, usually, as the spammers have gone beyond text to, e.g., words depicted in graphical images. Besides, even where linguistics are involved, it’s a very different problem to identify words used by bad guys trying to spoof you (and the rest of the world) than it is to understand your particular users.

Why and to what extent I see the other five as separate markets was explained in connection with the subsequent 17 slides.

The Text Analytics Marketplace: Competitive landscape and trends

Curt Monash — Thu, 19 Jun 2008 07:35:39 +0000

As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are off-shoots of what used to be called “information retrieval”:

1. Web search

2. Public-facing site search

3. Enterprise search and knowledge management

4. Custom publishing

5. Text mining and extraction

Three are more standalone:

6. Spam filtering

7. Voice recognition

8. Machine translation

This list comes from a talk I gave Monday at the Text Analytics Summit called The Text Analytics Marketplace: Competitive landscape and trends. In half an hour, I covered the first five areas (in Sue Feldman’s word, at a “gallop”). The slide deck has been uploaded to the link below. I plan to break out the material from the talk into a series of blog posts over the next few (or perhaps not-so-few) weeks.

Slides:

The Text Analytics Marketplace: Competitive landscape and trends

Other posts based on those slides:

Three specialized markets for text analytics (based on Slide 2)
6 trends that could shake up the text analytics market (based on Slide 19)
Why search technologies are going to recombine (in A World of Bytes, based on Slide 19)