November 25, 2012

The future of search

I believe there are two ways search will improve significantly in the future. First, since talking is easier than typing, speech recognition will allow longer and more accurate input strings. Second, search will be informed by much more persistent user information, with search companies having a very detailed understanding of searchers. Based on that, I expect:

A small oligopoly dominating the conjoined businesses of mobile device software and search. The companies most obviously positioned for membership are Google and Apple.
The continued and growing combination of search, advertisement/recommendation, and alerting. The same user-specific data will be needed for all three.
A whole lot of privacy concerns.

My reasoning starts from several observations:

Enterprise search is greatly disappointing. My main reason for saying that is anecdotal evidence — I don’t notice users being much happier with search than they were 15 years ago. But business results are suggestive too:
- HP just disclosed serious problems with Autonomy.
- Microsoft’s acquisition of FAST was a similar debacle.
- Lesser enterprise search outfits never prospered much. (E.g., when’s the last time you heard mention of Coveo?)
- My favorable impressions of the e-commerce site search business turned out to be overdone. (E.g., Mercado’s assets were sold for a pittance soon after I wrote that, while Endeca and Inquira were absorbed into Oracle.)
- Lucene/Solr’s recent stirrings aren’t really in the area of search.
Web search, while superior to the enterprise kind, is disappointing people as well. Are Google’s results any better than they were 8 years ago? Google’s ongoing hard work notwithstanding, are they even as good?
Consumer computer usage is swinging toward mobile devices. I hope I don’t have to convince you about that one. 🙂

In principle, there are two main ways to make search better:

Understand more about the documents being searched over. But Google’s travails, combined with the rather dismal history of enterprise search, suggest we’re well into the diminishing-returns part of that project.
Understand more about what the searcher wants.

The latter, I think, is where significant future improvement will be found.

Categories: Autonomy, Coveo, Endeca, Enterprise search, FAST, Google, Lucene, Mercado, Microsoft, Search engines, Speech recognition, Structured search


December 1, 2010

The state of the art in text analytics applications

Text analytics application areas typically fall into one or more of three broad, often overlapping domains:

Understanding the opinions of customers, prospects, or other groups. This can be based on any combination of documents the user organization controls (email, surveys, warranty reports, call center logs, etc.) or public-domain documents such as blogs, forum posts, and tweets. The former is usually called Voice of the Customer (VotC), while the latter is Voice of the Market (VotM).
Detecting and identifying problems. This can happen across many domains — VotC, VotM, diagnosing equipment malfunctions, identifying bad guys (from terrorists to fraudsters), or even getting early warnings of infectious disease outbreaks.
Aiding text search, custom publishing, and other electronic document-shuffling use cases, often via document augmentation.

For several years, I’ve been distressed at the lack of progress in text analytics or, as it used to be called, text mining. Yes, the rise of sentiment analysis has been impressive, and higher volumes of text data are being processed than were before. But otherwise, there’s been a lot of the same old, same old. Most actual deployed applications of text analytics or text mining go something like this:

A bunch of documents are analyzed to ascertain the ideas expressed in them.
A count is made as to how many times each idea turns up.
The application user notices any surprisingly large numbers, and as a result pays attention to the corresponding ideas.
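The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular vendor's product works: the `IDEA_PHRASES` lexicon, the function names, and the `ratio` and `min_count` thresholds are all hypothetical, and real products use far richer linguistic extraction than substring matching.

```python
from collections import Counter

# Hypothetical lexicon: each "idea" is triggered by any of a few phrases.
IDEA_PHRASES = {
    "battery life": ["battery", "charge"],
    "shipping delay": ["shipping delay", "arrived late", "late delivery"],
    "price": ["price", "too expensive"],
}


def extract_ideas(document):
    """Step 1: return the set of ideas mentioned in one document.

    Naive case-insensitive substring matching, for illustration only.
    """
    text = document.lower()
    return {
        idea
        for idea, phrases in IDEA_PHRASES.items()
        if any(phrase in text for phrase in phrases)
    }


def count_ideas(documents):
    """Step 2: count how many documents mention each idea."""
    counts = Counter()
    for document in documents:
        counts.update(extract_ideas(document))
    return counts


def surprising_ideas(current, baseline, ratio=3.0, min_count=2):
    """Step 3: list ideas whose count jumped versus a baseline period.

    An idea is flagged when it appears at least `min_count` times and at
    least `ratio` times as often as in the baseline (baseline floored at 1).
    """
    return sorted(
        idea
        for idea, n in current.items()
        if n >= min_count and n >= ratio * max(baseline.get(idea, 0), 1)
    )
```

In use, one would run `count_ideas` over last period's documents and this period's documents, then pass both counters to `surprising_ideas`. Everything after that, including deciding whether a flagged idea matters, is still left to the human user.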

Often, it seems desirable to integrate text analytics with business intelligence and/or predictive analytics tools that operate on tabular data. Even so, such integration is most commonly weak or nonexistent. Apart from the usual reasons for silos of automation, I blame this lack on a mismatch in precision, among other reasons. A 500% increase in mentions of a subject could be simple coincidence, or the result of a single identifiable press article. In comparison, a 5% increase in a conventional business metric might be much more important.

But in fairness, the text analytics innovation picture hasn’t been quite as bleak as what I’ve been painting so far.

Categories: Attensity, BI integration, Investment research and trading, SPSS, Text mining, Voice of the Customer


April 4, 2010

Ike Pigott on the future of reporting

Ike Pigott argues that, as the number of conventional journalists plummets, corporations will have to hire their own “embedded” journalists to fill the void.

Categories: Blogosphere, Mark Logic, Online media, Social software and online media


April 1, 2010

Google’s funniest joke of the year (that I’ve noticed so far)

I just noticed a subtle and really funny Google joke. Look at where on the search results page it tells you how long the search took. They’re screwing around with the units of time (and in some cases substituting actual measures of speed). So far I’ve noticed figures in units of:

Centibeats
Microfortnights
Microweeks
Nanocenturies
“The velocity of an unladen swallow”
Planck times
Shakes of a lamb’s tail
Warp (Star Trek, of course)
Centons (Battlestar Galactica)
Parsecs (a unit of time in Star Wars Episode IV 🙂 )
Jiffies
Skidoo (23.00 skidoo, to be precise)
Gigawatts (pretty hard to explain how that’s a unit of time or velocity)
Epochs (one precise figure was 1.25e-15 epochs)
Hertz
Femtogalactic years

I haven’t tried to check or estimate the conversion factors used.
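For anyone who does want to check, here is a rough sketch of how a search time in seconds could be converted into a few of these units. The factors below come from the usual definitions of the units, not from anything Google has published, so whether Google used the same ones is purely an assumption. In particular, reading “shakes of a lamb’s tail” as the 10-nanosecond physics “shake” is my guess, and `SECONDS_PER_UNIT` and `seconds_to_unit` are names made up for this example.

```python
# Length of each unit in seconds, from its conventional definition.
# These are NOT confirmed to be the factors Google used.
SECONDS_PER_UNIT = {
    "microfortnights": 14 * 86400 * 1e-6,          # fortnight = 14 days
    "microweeks": 7 * 86400 * 1e-6,                # week = 7 days
    "nanocenturies": 100 * 365.25 * 86400 * 1e-9,  # century of Julian years
    "centibeats": 86.4 * 1e-2,                     # Swatch .beat = 1/1000 day
    "shakes": 1e-8,                                # physics "shake" = 10 ns (assumed)
    "planck times": 5.391247e-44,                  # CODATA value, approximate
}


def seconds_to_unit(seconds, unit):
    """Express a duration given in seconds in one of the units above."""
    return seconds / SECONDS_PER_UNIT[unit]


if __name__ == "__main__":
    search_time = 0.25  # a made-up search time, in seconds
    for unit in SECONDS_PER_UNIT:
        print(f"{seconds_to_unit(search_time, unit):.4g} {unit}")
```

Units such as jiffies, epochs, and skidoo have no single agreed definition, and warp and gigawatts are not durations at all, so they are left out.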

Related links

2010 April Fool’s Day highlights
My recent roundup of past years’ April Fool’s highlights
A companion roundup of other, even funnier pranks
My alternative to pranks: April No-Fooling Day
Google Operating System with more on Google’s 2010 April Fool’s jokes.

Categories: Fun stuff, Google, Humor

April Fool’s Day highlights

It’s April 1, and hence time for jests, online or otherwise. Highlights this year include:

In a charming blog post, Google announced the new Android Translate For Animals feature.
Reddit has apparently made every user an administrator, throwing the whole site, or at least the Reddit hot stories list, into chaos.
A video depicts icons falling off of an iPhone onto a table.
Firetoys, Ltd., whoever they are, are promoting a Back To The Future style hoverboard. I want one!

Edit: And more being added as I find them:

Supernatural Season 6
A crucial computer gaming accessory

Related links

My recent roundup of past years’ April Fool’s highlights
A companion roundup of other, even funnier pranks
My alternative to pranks: April No-Fooling Day

Categories: Fun stuff, Google, Humor, Social software and online media


March 29, 2010

Google’s version of an old joke

Search Google for “recursion” and it helpfully offers a link to let you search on — you guessed it — “recursion.” The joke has been implemented in German as well.

This idea is not, to put it mildly, new. I first saw the definition

Recursion: See recursion

in the glossary to Intellicorp’s KEE documentation, in 1984 or so. And I’d guess the joke is actually a lot older than that.

For another variation of the same idea, see this link.

Categories: Fun stuff, Google, Humor, Search engines

Google declares total war on Microsoft

Google blogged Tuesday night about a new project, the Google Chrome Operating System. Highlights include:

Open source
Targeted to appear in netbooks in the second half of 2010
Google Chrome browser + new windowing system + Linux kernel
Minimal user interface
Data stored or at least backed up in the cloud, and hence available on any computer
Hardware compatibility hassles allegedly eliminated
Ditto for software update hassles
Ditto for security problems
Apps apparently assumed to run inside the browser. (Not clear if this is required or just recommended.)

Obviously, Google Chrome OS is a direct attack on Microsoft — even more so than Google Wave, which I’ve predicted will “play merry hell with Microsoft Outlook, Microsoft Word, Microsoft Exchange, Microsoft SharePoint, and more,” or for that matter than Google Mail and the rest of Google Apps. Taken together, Google’s initiatives suggest that an all-out Google-Microsoft war is coming, in a conflict that many people have been expecting — and analyzing — for years.

So how will this all shake out? Well, let’s start with some basic points:

Google Chrome OS Release 1 is expected over a year from now, and then only on a limited subset of PCs, namely netbooks.
Google Chrome OS Release 1 is supposed to have great performance and be bullet-proof. Hmm …
Google is evidently assuming that the apps people want to run will either be browser-based, or else be new ones written for Chrome OS. Hmm …
Google is signaling that Chrome OS will be very limited in features. That makes sense for Release 1 — but what will be missing?
Consumers have proven their willingness to buy non-Microsoft computers, especially Apple ones, specifically in the Mac and iPhone/iTouch product lines.
A lot of people would have compatibility issues replacing Microsoft Excel or PowerPoint with partially-compatible alternatives. I’m not so sure about Microsoft Word, however. Other than those three, Outlook, and the Windows family itself, I’m not aware of any Microsoft client products that have much lock-in. (Well, maybe Xbox, but that’s not in the main stack.)
Open source software often gets most of its community support in a couple of areas, namely compatibilities and language translation. Google probably doesn’t need the help in languages, but letting other people fix Chrome OS compatibility issues whose importance it didn’t recognize is potentially valuable.
Google probably won’t make any direct revenue from Chrome OS. So how much will it invest in the project?
Notwithstanding Danny Sullivan’s concern, there isn’t much of an antitrust issue here. Google’s search can’t easily be used to favor Chrome, Chrome OS, or Google Apps. And the other way around — e.g., using Chrome OS to favor search — Google clearly isn’t a monopolist.

Categories: Google, Microsoft, Software as a Service (SaaS)


May 30, 2009

MEN ARE FROM EARTH, COMPUTERS ARE FROM VULCAN

The newsletter/column excerpted below was originally published in 1998. Some of the specific references are obviously very dated. But the general points about the requirements for successful natural language computer interfaces still hold true. Less progress has been made in the intervening decade-plus than I would have hoped, but some recent efforts — especially in the area of search-over-business-intelligence — are at least mildly encouraging. Emphasis added.

Natural language computer interfaces were introduced commercially about 15 years ago*. They failed miserably.

*I.e., the early 1980s

For example, Artificial Intelligence Corporation’s Intellect was a natural language DBMS query/reporting/charting tool. It was actually a pretty good product. But it’s infamous among industry insiders as the product for which IBM, in one of its first software licensing deals, got about 1700 trial installations — and less than a 1% sales close rate. Even its successor, Linguistic Technologies’ English Wizard*, doesn’t seem to be attracting many customers, despite consistently good product reviews.

*These days (i.e., in 2009) it’s owned by Progress and called EasyAsk. It still doesn’t seem to be selling well.

Another example was HAL, the natural language command interface to 1-2-3. HAL is the product that first made Bill Gross (subsequently the founder of Knowledge Adventure and idealab!) and his brother Larry famous. However, it achieved no success*, and was quickly dropped from Lotus’ product line.

*I loved the product personally. But I was sadly alone.

In retrospect, it’s obvious why natural language interfaces failed. First of all, they offered little advantage over the forms-and-menus paradigm that dominated enterprise computing in both the online-character-based and client-server-GUI eras. If you couldn’t meet an application need with forms and menus, you couldn’t meet it with natural language either.

Categories: BI integration, IBM and UIMA, Language recognition, Natural language processing (NLP), Progress and EasyAsk, Search engines, Speech recognition


May 29, 2009

Google Wave — finally a Microsoft killer?

Google held a superbly-received preview of a new technology called Google Wave, which promises to “reinvent communication.” In simplest terms, Google Wave is a software platform that:

Offers the possibility to improve upon a broad range of communication, collaboration, and/or text-based product categories, such as:
- Search
- Word processing
- E-mail
- Instant messaging
- Microblogging
- Blogging
- Mini-portals (Facebook-style)
- Mini-portals (Sharepoint-style)
In particular, allows these applications to be both much more integrated and interactive than they now are.
Will have open developer APIs.
Will be open-sourced.

If this all works out, Google Wave could play merry hell with Microsoft Outlook, Microsoft Word, Microsoft Exchange, Microsoft SharePoint, and more.

I suspect it will.

And by the way, there’s a cool “natural language” angle as well.

Categories: Google, Language recognition, Microblogging, Microsoft, Natural language processing (NLP), Search engines, Social software and online media, Software as a Service (SaaS)


April 20, 2009

The new Attensity — deal overview

A new Attensity Group has been created in a complex set of maneuvers. So far as I understand or guess, elements of the deal include:

The Attensity Group is being formed by the merger of three companies: Attensity, empolis, and Living-e. Frankly, I’d never heard of either empolis or Living-e until this merger. (In case you ever have to resort to the Wayback Machine, empolis’ URL was http://www.empolis.com/home.html and Living-e’s was http://www.living-e.com/us/index.php)
Existing investors (employees aside) have largely been bought out. Most of the stock is owned by Aeris, an investment vehicle for SAP co-founder Klaus Tschira. Living-e already was a Tschira investment.
Inxight managers have been brought in to run the whole thing. Specifically, Ian Bonner will be CEO, and Ian Hersey will be EVP of Products and Technology.
The former CEOs of Attensity and empolis will run the Americas and EMEA regions, under the Attensity and empolis names respectively, apparently with their prior sales organizations more or less intact.
A former CEO of Living-e will be their boss, but also run “Special Projects”, which adds up to a very odd title indeed: “Senior Vice President of Operations and Strategic Projects, Attensity Group”
The former CTOs of Attensity and empolis are CTOs of system software (“Natural Language Processing”) and application software respectively. This gets Attensity’s total CTO count up to 3, a level I’ve previously seen only at Teradata. I haven’t talked with David Bean yet, but his colleagues insist that he’s excited about his new role.
This whole deal has been underway since at least late last year. For example, Ian Bonner has been involved for that long. empolis and Living-e announced the pooling of their sales forces back in February.
Technically, the merger isn’t complete, as Living-e is a public company and not all of its shares have been acquired yet. (But they will be Real Soon Now.)
Attensity, of course, was a venture-backed private company, with tired investors. empolis was owned by Bertelsmann, and was itself a roll-up of several smaller text analytics companies.

I was told on the phone that empolis was doing something like €30-40 million in revenue. Attensity and Living-e were under $10 million each. That surprises me a bit, as I thought Attensity was in that range on commercial business alone, and was doing more than $10 million counting its government accounts.

It turns out that if I had been paying attention to the news filters I could have seen this coming.

Categories: Attensity


Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
