Analysis of companies or products focused on structured or faceted search.
I believe there are two ways search will improve significantly in the future. First, since talking is easier than typing, speech recognition will allow longer and more accurate input strings. Second, search will be informed by much more persistent user information, with search companies having a very detailed understanding of searchers. Based on that, I expect:
- A small oligopoly dominating the conjoined businesses of mobile device software and search. The companies most obviously positioned for membership are Google and Apple.
- The continued and growing combination of search, advertisement/recommendation, and alerting. The same user-specific data will be needed for all three.
- A whole lot of privacy concerns.
My reasoning starts from several observations:
- Enterprise search is greatly disappointing. My main reason for saying that is anecdotal evidence — I don’t notice users being much happier with search than they were 15 years ago. But business results are suggestive too:
- HP just disclosed serious problems with Autonomy.
- Microsoft’s acquisition of FAST was a similar debacle.
- Lesser enterprise search outfits never prospered much. (E.g., when’s the last time you heard mention of Coveo?)
- My favorable impressions of the e-commerce site search business turned out to be overdone. (E.g., Mercado’s assets were sold for a pittance soon after I wrote that, while Endeca and Inquira were absorbed into Oracle.)
- Lucene/Solr’s recent stirrings aren’t really in the area of search.
- Web search, while superior to the enterprise kind, is disappointing people as well. Are Google’s results any better than they were 8 years ago? Google’s ongoing hard work notwithstanding, are they even as good?
- Consumer computer usage is swinging toward mobile devices. I hope I don’t have to convince you about that one.
In principle, there are two main ways to make search better:
- Understand more about the documents being searched over. But Google’s travails, combined with the rather dismal history of enterprise search, suggest we’re well into the diminishing-returns part of that project.
- Understand more about what the searcher wants.
The latter, I think, is where significant future improvement will be found.
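One way to picture "understanding more about what the searcher wants" is re-ranking: blend each document's query relevance with its affinity to a persistent user profile. The sketch below is hypothetical and minimal. The `Doc` type, the topic-weight profile, the `alpha` blending parameter, and all scores are assumptions for illustration, not any search company's actual method.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    relevance: float          # base query-document score (assumed 0..1)
    topics: dict[str, float]  # hypothetical topic weights for the document


def personalize(docs: list[Doc], profile: dict[str, float], alpha: float = 0.7) -> list[str]:
    """Re-rank docs by blending base relevance with user-profile affinity.

    alpha weights the base relevance; (1 - alpha) weights the affinity,
    computed as a dot product of document topics and the user's
    (hypothetical) persistent topic interests.
    """
    def score(d: Doc) -> float:
        affinity = sum(w * profile.get(t, 0.0) for t, w in d.topics.items())
        return alpha * d.relevance + (1 - alpha) * affinity

    # sorted() is stable, so ties keep the engine's original order.
    return [d.doc_id for d in sorted(docs, key=score, reverse=True)]


# An ambiguous query ("jaguar"): equal base relevance, different topics.
docs = [
    Doc("jaguar-car-review", 0.80, {"cars": 1.0}),
    Doc("jaguar-habitat", 0.80, {"wildlife": 1.0}),
]
print(personalize(docs, {"wildlife": 0.9}))  # ['jaguar-habitat', 'jaguar-car-review']
print(personalize(docs, {"cars": 0.9}))      # ['jaguar-car-review', 'jaguar-habitat']
```

The same query yields different orderings for different users, which is the point: the documents didn't change, only what is known about the searcher. The same profile data would feed advertising, recommendation, and alerting too.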
Categories: Autonomy, Coveo, Endeca, Enterprise search, FAST, Google, Lucene, Mercado, Microsoft, Search engines, Speech recognition, Structured search
Content management system (CMS) and search expert Alan Pelz-Sharpe recently decried “Shadow IT”, by which he seems to mean departmental proliferation of data stores outside the control of the IT department. In other words, he’s talking about data marts, only for documents rather than tabular data.
Notwithstanding the manifest virtues of centralization, there are numerous reasons you might want data marts, in the tabular and document worlds alike. For example:
- Price/performance. Your main/central data manager might be too expensive to support additional large specialized databases. Or different databases and applications might have profiles different enough that each gets its best price/performance from a different kind of data manager. This is particularly prevalent in the relational world, where column stores, sequentially-oriented row stores, and random I/O-oriented row stores each have compelling use cases.
- Different SLAs (Service-Level Agreements). Similarly, different applications may have very different requirements for uptime, response time, and the like. (In the relational world, think of operational data stores.)
- Different security requirements. Different subsets of the data may need different levels of security. This is particularly prevalent in the document world, where security problems are not as well solved as in the tabular arena, and where it’s common for a search engine to index across different corpuses with radically different levels of sensitivity.
- Integrated application and user interfaces. In the relational world, there’s a pretty clean separation between data management and interface logic; most serious business intelligence tools can talk to most DBMS. The document world is quite different. Some search engines bundle, for example, various kinds of faceted or parameterized search interfaces. What’s more, in public-facing search, a major differentiator is the facilities that the product offers for skewing search results.
- Different text applications require different thesauruses or taxonomy management systems. Ideally, those should all be integrated — but the requisite technology still doesn’t exist.
Bottom line: Text data marts, much like relational data marts, are almost surely here to stay.
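The security point is the one most specific to documents. When a single engine indexes corpora of differing sensitivity, every result has to be checked against the searcher's clearance before it is shown. A minimal sketch, assuming a hypothetical three-level clearance scheme and naive substring matching in place of a real search engine; the corpus names and documents are invented.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical sensitivity levels; higher means more restricted.
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


@dataclass
class Document:
    doc_id: str
    corpus: str
    text: str


def search(index: list[Document], corpus_level: dict[str, Level],
           query: str, clearance: Level) -> list[str]:
    """Return ids of matching docs whose corpus the user is cleared to see.

    Corpora missing from corpus_level default to RESTRICTED, so a
    misconfigured corpus fails closed rather than leaking.
    """
    q = query.lower()
    return [
        d.doc_id for d in index
        if corpus_level.get(d.corpus, Level.RESTRICTED) <= clearance
        and q in d.text.lower()
    ]


index = [
    Document("faq-1", "support-site", "How to reset your password"),
    Document("hr-7", "hr-files", "Password policy exceptions for executives"),
    Document("x-1", "unlabeled-share", "password list"),
]
levels = {"support-site": Level.PUBLIC, "hr-files": Level.RESTRICTED}
print(search(index, levels, "password", Level.PUBLIC))      # ['faq-1']
print(search(index, levels, "password", Level.RESTRICTED))  # ['faq-1', 'hr-7', 'x-1']
```

Query-time filtering like this is the centralized approach; the data-mart alternative is simply to keep the restricted corpus in its own, separately secured index.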
Categories: Enterprise search, Ontologies, Search engines, Specialized search, Structured search
At Lynda Moulton’s behest, I spoke a couple of times recently on the subject of where “semantic” technology is or isn’t likely to be important. One was at the Gilbane conference in early December. The slides were based on my previously posted deck for a June talk I gave on a text analytics market overview. The actual Gilbane slides may be found here.
My opinions about the applicability of semantic technology include:
- The big bucks in web search are for “transactional” web search, and semantics isn’t the issue there. (Slides 3-4)
- When UIs finally go beyond the simple search box — e.g. to clusters/facets or to voice — semantics should have a role to play. (Slide 5)
- Public-facing site search depends — more than any other area of text analytics — on hand-tagging. (Slide 7)
- “Enterprise” search that searches specialized external databases could benefit from semantic technologies. (Slide 8)
- True enterprise search could benefit from semantic technologies in multiple ways, but has other problems as well. (Slides 10-11)
- Semantics — specifically extraction — is central to custom publishing. (Slide 12 — upon review I regret using the word “sophisticated”)
- Semantics is central to text mining. (Slide 18)
- Semantics could play a big role in all sorts of exciting future developments. (Slide 19)
So what would your list be like?
Categories: Enterprise search, Ontologies, Search engines, Specialized search, Structured search
On the whole, the Barack Obama campaign has been very internet-savvy. Maybe their web site JohnMcCainRecord.com is yet another example of same. But to my eyes, it has such an appallingly bad search interface that people going to the site are apt to be annoyed. To wit:
- There’s a huge search box in the center of the screen.
- All the search box ever does is take you to one of the 13 categories listed right below it.
- Usually, it doesn’t even do that. Instead, it just fails. For example, I entered terrorism and hit “Go”, and got no response. Ditto nuclear energy.
- When it does give you an answer, it’s apt not to be what you were looking for. For example, entering Iran takes you to the Foreign Policy page, which contains nothing about Iran.
And then, of course, there’s the funny stuff. For example, if you search on foo, you are taken to Rural Issues.
In general terms, I like the idea of the site. But absent some serious changes, JohnMcCainRecord.com should not have a search interface.
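For contrast, here is a minimal sketch of the least a category-jumping search box should do: route known keywords to a category page, fall back to full-text matching otherwise, and say so explicitly when nothing matches. This is a hypothetical illustration, not the site's actual code (which is unknown). Only the "Foreign Policy" and "Rural Issues" category names come from the post; the keyword table and page text are invented placeholders.

```python
def site_search(query: str,
                keyword_to_category: dict[str, str],
                pages: dict[str, str]) -> tuple[str, list[str]]:
    """Hypothetical site search with an explicit fallback chain.

    1. If the query is a known keyword, jump to its category page.
    2. Otherwise, do a naive full-text match over page bodies.
    3. If nothing matches, return an explicit empty result so the UI
       can say "no results" instead of silently doing nothing.

    Returns (mode, page_titles); mode is 'category', 'fulltext', or 'none'.
    """
    q = query.strip().lower()
    if q in keyword_to_category:
        return "category", [keyword_to_category[q]]
    hits = [title for title, body in pages.items() if q in body.lower()]
    if hits:
        return "fulltext", hits
    return "none", []


# Invented placeholder content; only the two category names come from the post.
pages = {
    "Foreign Policy": "Placeholder page text mentioning Iraq and troop levels.",
    "Rural Issues": "Placeholder page text mentioning farm subsidies and ethanol.",
}
keywords = {"iraq": "Foreign Policy", "farm": "Rural Issues"}

print(site_search("Iraq", keywords, pages))     # ('category', ['Foreign Policy'])
print(site_search("ethanol", keywords, pages))  # ('fulltext', ['Rural Issues'])
print(site_search("Iran", keywords, pages))     # ('none', [])
print(site_search("foo", keywords, pages))      # ('none', [])
```

An honest "no results" is better than silence (the terrorism case) and better than a wrong page (the Iran and foo cases).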
Edit: More here in my post on The Obama campaign’s Search Engine to Nowhere
I talked w/ Andrew McKay of Attivio for 2 ½ hours Thursday. I’ve also been working with some Attivio engineers on a blog search engine. I think it’s time to post about Attivio.
As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are off-shoots of what used to be called “information retrieval”:
1. Web search
2. Public-facing site search
3. Enterprise search and knowledge management
4. Custom publishing
5. Text mining and extraction
Three are more standalone:
6. Spam filtering
7. Voice recognition
8. Machine translation
I just stumbled across a brilliant summary of evolution in text search technology, written four years ago. It’s equally valid today (which in itself says something). I found it on the Prism Legal blog, but the actual author is Sharon Flank. My own comments are interspersed in bold.
Powerset has done a great job of generating buzz for its version of smart search. That said, its current demo is mediocre — and that’s being polite. Powerset currently indexes little more than Wikipedia, and the quality of its search results is about comparable to that of Wikipedia’s justly reviled internal search engine. To reach that conclusion, I ran the same five searches on both sites. Wikipedia typically ranked more total junk higher, but it also put the very best hits of all higher than Powerset did. The strings were:
- Drosophila research
- Bill Clinton foreign policy
- Home run hitters
- Innocents on death row
- Text data mining
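That mixed verdict (more junk ranked high, yet the very best hits ranked higher) is exactly the case where two standard ranking measures disagree: precision at k rewards a clean top of the list, while reciprocal rank rewards getting the single best hit near the top. A sketch with invented relevance grades for one hypothetical query (0 = junk, 1 = relevant, 2 = best hit). Engine A mimics the pattern described for Wikipedia and Engine B the pattern for Powerset, but these are not the actual judgments from the comparison above.

```python
def precision_at_k(grades: list[int], k: int) -> float:
    """Share of the top-k results that are not junk (grade > 0)."""
    return sum(1 for g in grades[:k] if g > 0) / k


def reciprocal_rank(grades: list[int], target: int = 2) -> float:
    """1/rank of the first result at or above the target grade; 0.0 if none."""
    for rank, g in enumerate(grades, start=1):
        if g >= target:
            return 1.0 / rank
    return 0.0


# Invented grades for one hypothetical query.
engine_a = [2, 0, 0, 1, 0]  # best hit first, but lots of junk
engine_b = [1, 1, 2, 1, 0]  # little junk, best hit at rank 3

for name, g in [("A", engine_a), ("B", engine_b)]:
    print(name, f"P@5={precision_at_k(g, 5):.2f}", f"RR={reciprocal_rank(g):.2f}")
# A P@5=0.40 RR=1.00
# B P@5=0.80 RR=0.33
```

Which engine "wins" depends on which measure you pick, so a single-number comparison of search quality should be read with care.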
As I write this, Microsoft has just announced an offer to acquire Yahoo. Early responses from the likes of Danny Sullivan, Henry Blodget, the Download Squad, TechCrunch, Raven SEO, Mashable, and others seem to boil down to:
- Both sides needed it.
- Yahoo wasn’t going anywhere fast on its own.
- Microsoft wasn’t going anywhere fast in search on its own.
- This may be enough critical mass to matter.
- Conference call at 8:30 am
I’ll try to be a bit more analytical than that, but this is still going to be quick. Assuming the deal goes through:
- Microsoft will recombine both parts of the old FAST/alltheweb.com. Therefore, Microsoft will be able to use the same technology for web and enterprise search, to the extent that such commonality makes sense.
- I’d expect Microsoft to try to differentiate its technology via faceted/structured search. That’s a FAST strength.
- The old FAST search-as-BI dream might become pretty appealing to Microsoft/Yahoo.
- In a non-search point, Microsoft is strong in games and Yahoo is strong in fantasy sports. Look for some synergies.
- There sure would be a whole lot of non-Windows technology inside Microsoft.
Basically, Microsoft is a company that’s a lot more sophisticated in its thinking about user interfaces and experiences than Yahoo is. That’s where the really interesting competitive innovation would be most likely to occur.
Following up on my prior posts about Microsoft’s impending acquisition of FAST, they’ve now had the conference call. By custom and indeed antitrust law, such calls are very light on content. But here are a few tidbits and takeaways, all from Jeff Raikes of Microsoft:
- Jeff talked solely about FAST as adding to enterprise search, and rightly contrasted that with web search.
- However, he deflected questions about web search with “We aren’t talking about that much detail right now” rather than with a firm “Well, we aren’t allowed to use FAST that way.”
- Specifically, enterprise search is all about integration with SharePoint (portal).
- Jeff said Microsoft’s current search could handle millions or maybe tens of millions of documents, but thought there was demand for FAST’s ability to handle billions.
- He positioned FAST as an application development platform, giving an example of structured search (the actual word was “pivot”) in consumer electronics. … Well, at least he’s looking in the right direction.
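The "pivot" idea amounts to faceted navigation: at each step, show counts for the remaining attribute values, and let the user narrow the result set by choosing one. A minimal in-memory sketch, assuming a toy consumer-electronics catalog in which every product name, brand, and price band is invented; nothing was said on the call about how FAST actually implements this.

```python
from collections import Counter

# Hypothetical catalog; all values are invented for illustration.
PRODUCTS = [
    {"name": "tv-1", "category": "TV", "brand": "Acme", "price_band": "$500-999"},
    {"name": "tv-2", "category": "TV", "brand": "Zenix", "price_band": "$1000+"},
    {"name": "tv-3", "category": "TV", "brand": "Acme", "price_band": "$1000+"},
    {"name": "cam-1", "category": "Camera", "brand": "Acme", "price_band": "$500-999"},
]


def drill_down(items: list[dict], **filters: str) -> list[dict]:
    """Keep only items matching every selected facet value."""
    return [i for i in items if all(i.get(k) == v for k, v in filters.items())]


def facet_counts(items: list[dict], facets: list[str]) -> dict[str, Counter]:
    """Count items per value of each facet, for display beside the results."""
    return {f: Counter(i[f] for i in items if f in i) for f in facets}


# Step 1: the user picks category=TV; show brand and price counts for TVs only.
tvs = drill_down(PRODUCTS, category="TV")
print(facet_counts(tvs, ["brand", "price_band"]))

# Step 2: the user pivots further on brand and price band.
print([p["name"] for p in drill_down(tvs, brand="Acme", price_band="$1000+")])  # ['tv-3']
```

Note that the facet counts are just group-by aggregates over the current result set, which is one way to see why the old FAST search-as-BI idea is not far-fetched.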