Analysis of enterprise-specific search technology (as opposed to general web search).
I believe there are two ways search will improve significantly in the future. First, since talking is easier than typing, speech recognition will allow longer and more accurate input strings. Second, search will be informed by much more persistent user information, with search companies gaining a very detailed understanding of searchers. Based on that, I expect:
- A small oligopoly dominating the conjoined businesses of mobile device software and search. The companies most obviously positioned for membership are Google and Apple.
- The continued and growing combination of search, advertisement/recommendation, and alerting. The same user-specific data will be needed for all three.
- A whole lot of privacy concerns.
My reasoning starts from several observations:
- Enterprise search is deeply disappointing. My main evidence is anecdotal: I don’t notice users being much happier with search than they were 15 years ago. But business results are suggestive too:
  - HP just disclosed serious problems with Autonomy.
  - Microsoft’s acquisition of FAST was a similar debacle.
  - Lesser enterprise search outfits never prospered much. (E.g., when’s the last time you heard anyone mention Coveo?)
  - My favorable impressions of the e-commerce site search business turned out to be overdone. (E.g., Mercado’s assets were sold for a pittance soon after I wrote that, while Endeca and InQuira were absorbed into Oracle.)
  - Lucene/Solr’s recent stirrings aren’t really in the area of search.
- Web search, while superior to the enterprise kind, is disappointing people as well. Are Google’s results any better than they were 8 years ago? Google’s ongoing hard work notwithstanding, are they even as good?
- Consumer computer usage is swinging toward mobile devices. I hope I don’t have to convince you about that one.
In principle, there are two main ways to make search better:
- Understand more about the documents being searched over. But Google’s travails, combined with the rather dismal history of enterprise search, suggest we’re well into the diminishing-returns part of that project.
- Understand more about what the searcher wants.
The latter, I think, is where significant future improvement will be found.
Categories: Autonomy, Coveo, Endeca, Enterprise search, FAST, Google, Lucene, Mercado, Microsoft, Search engines, Speech recognition, Structured search
CMS (Content Management System) and search expert Alan Pelz-Sharpe recently decried “Shadow IT”, by which he seems to mean the departmental proliferation of data stores outside the control of the IT department. In other words, he’s talking about data marts, only for documents rather than tabular data.
Notwithstanding the manifest virtues of centralization, there are numerous reasons you might want data marts, in the tabular and document worlds alike. For example:
- Price/performance. Your main/central data manager might be too expensive to support additional large specialized databases. Or different databases and applications might have profiles different enough that each gets the best price/performance from a different kind of data manager. This is particularly prevalent in the relational world, where column stores, sequentially-oriented row stores, and random I/O-oriented row stores each have compelling use cases.
- Different SLAs (Service-Level Agreements). Similarly, different applications may have very different requirements for uptime, response time, and the like. (In the relational world, think of operational data stores.)
- Different security requirements. Different subsets of the data may need different levels of security. This is particularly prevalent in the document world, where security problems are not as well solved as in the tabular arena, and where it’s common for a search engine to index across different corpora with radically different levels of sensitivity.
- Integrated application and user interfaces. In the relational world, there’s a pretty clean separation between data management and interface logic; most serious business intelligence tools can talk to most DBMS. The document world is quite different. Some search engines bundle, for example, various kinds of faceted or parameterized search interfaces. What’s more, in public-facing search, a major differentiator is the set of facilities a product offers for skewing search results.
- Different text applications require different thesauruses or taxonomy management systems. Ideally, those should all be integrated — but the requisite technology still doesn’t exist.
Bottom line: Text data marts, much like relational data marts, are almost surely here to stay.
Categories: Enterprise search, Ontologies, Search engines, Specialized search, Structured search
At Lynda Moulton’s behest, I spoke a couple of times recently on where “semantic” technology is or isn’t likely to be important. One of those talks was at the Gilbane conference in early December. The slides were based on my previously posted deck from a June talk giving an overview of the text analytics market. The actual Gilbane slides may be found here.
My opinions about the applicability of semantic technology include:
- The big bucks in web search are for “transactional” web search, and semantics isn’t the issue there. (Slides 3-4)
- When UIs finally go beyond the simple search box — e.g. to clusters/facets or to voice — semantics should have a role to play. (Slide 5)
- Public-facing site search depends — more than any other area of text analytics — on hand-tagging. (Slide 7)
- “Enterprise” search that searches specialized external databases could benefit from semantic technologies. (Slide 8)
- True enterprise search could benefit from semantic technologies in multiple ways, but has other problems as well. (Slides 10-11)
- Semantics — specifically extraction — is central to custom publishing. (Slide 12 — upon review I regret using the word “sophisticated”)
- Semantics is central to text mining. (Slide 18)
- Semantics could play a big role in all sorts of exciting future developments. (Slide 19)
So what would your list look like?
Categories: Enterprise search, Ontologies, Search engines, Specialized search, Structured search
Lynda Moulton, to put it mildly, disagrees with the Gartner Magic Quadrant analysis of enterprise search. Her preferred approach is captured in:
Coveo, Exalead, ISYS, Recommind, Vivisimo, and X1 are a few of a select group that are making a mark in their respective niches, as products ready for action with a short implementation cycle (weeks or months not years).
By way of contrast, Lynda opines:
Autonomy and Endeca continue to bring value to very large projects in large companies but are not plug-and-play solutions, by any means. Oracle, IBM, and Microsoft offer search solutions of a very different type with a heavy vendor or third-party service requirement. Google Search Appliance has a much larger installed base than any of these but needs serious tuning and customization to make it suitable to enterprise needs.
In particular, her views about FAST (now Microsoft) are scathing.
I talked with Andrew McKay of Attivio for 2 ½ hours on Thursday. I’ve also been working with some Attivio engineers on a blog search engine. I think it’s time to post about Attivio.
I just found an almost year-old blog post from EMC executive Andrew Cohen that succinctly lays out his view of e-discovery, which he believes to be mainly a consensus stance. Cohen is evidently both a lawyer and a honcho in the Compliance Division of document management system vendor EMC, which is probably relevant to interpreting his outlook, in the spirit of the old Kennedy School dictum that “Where you stand depends upon where you sit.” His main points are:
- Information management is central to e-discovery.
- In particular, auditability (my word) is central if you want electronic documents to hold up as evidence in court.
- Search is good enough, but it’s not the biggest issue in e-discovery.
- E-mail archiving has reached the tipping point, and is increasingly a must-have, largely for its e-discovery benefits.
Two years ago, CEO Mike Lynch of Autonomy tried to persuade me that Autonomy was and would remain dominant in the e-discovery search market because:
Attivio CEO Ali Riaz was previously CFO and COO of FAST. He tried to avoid involvement in the recent exposé of his former employer. For his troubles he got a parking lot ambush, a big photograph, and some unflattering coverage.
A Norwegian newspaper ran an exposé on FAST, dated June 28. Helpful search industry participants quickly distributed English translations to a variety of commentators, including me. TechCrunch posted a scan of part of the article.
The gist is that FAST followed a pattern very common in the packaged enterprise software industry:
Text analytics vendors participate in the same trends as other software and technology vendors. For example, relational business intelligence and data warehousing products are increasingly being sold to departmental buyers. Those buyers place particularly high value on ease of installation. And golly gee whiz, both parts of that are also true in text mining.
But beyond such general trends, I’ve identified six developments that I think could radically transform the text analytics market landscape. Indeed, they could invalidate the neat little eight-bucket categorization I laid out in the prior post. Each is highly likely to occur, although in some cases the timing remains very much in doubt.
These six market-transforming trends are:
- Web/enterprise/messaging integration
- BI integration
- Universal message retention
- Portable personal profiles
- Electronic health records
- Voice command & control
Categories: BI integration, Enterprise search, Google, Microsoft, Search engines, Social software and online media, Text mining