Enterprise-specific web search: High-end web search/mining appliances?
OK. I have a vision of one way search could evolve, which I think deserves consideration on at least a “concept-car” basis. This is all speculative; I haven’t discussed it at length with the vendors who’d need to make it happen, nor checked the technical assumptions carefully myself. So I could well be wrong. Indeed, I’ve at least half-changed my mind multiple times this weekend, just in the drafting of this post. Oh yeah, I’m also mixing several subjects together here. All in all, this is not my crispest post …
Anyhow, the core idea is that large enterprises spider and index a subset of the Web, and use that for most of their employees’ web search needs. Key benefits would include:
- Filtering out spam hits. This is obviously important for search, and in some cases could help with public-web text mining as well. It should be OK to be more aggressive on spam-site filtering in an enterprise-specific index than it is in general web search.
- Filtering out malicious/undesirable downloads of various sorts. I’m thinking mainly of malware/spyware here, but of course the same filtering can also be used for net-nanny-style porn blocking and the like. Again, this is more easily done for the enterprise market than for the search world at large. (I think Google could blow Websense out of the water any time it wanted to, except for the not-so-small matter of not wanting to be seen as participating in the censorship business. But that’s a separate discussion.)
- Capturing employees’ search strings. This could be useful for various purposes, including discerning their interests, and building the corporate ontology for internal web search.
- Freshness control. If there’s a site you really care about, you can make sure it’s re-indexed frequently.
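To make the list above a bit more concrete, here is a minimal Python sketch of what a per-enterprise crawl policy might look like. It is purely illustrative: the `CrawlPolicy` class, its field names, the `.example` domains, and the weekly default are all my own assumptions, not anything a vendor actually ships. It covers three of the four benefits (a domain blocklist for spam/malware sites, per-site re-index intervals for freshness, and a log of search strings) and leaves the actual spidering and indexing out entirely.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CrawlPolicy:
    """Hypothetical per-enterprise crawl policy; every name here is an assumption."""
    blocked_domains: set = field(default_factory=set)   # spam/malware/undesirable sites
    reindex_hours: dict = field(default_factory=dict)   # host -> max age before re-crawl
    default_reindex_hours: float = 168.0                # assumed default: weekly
    query_log: list = field(default_factory=list)       # captured (user, query) pairs

    @staticmethod
    def _host(url):
        # urlparse(...).hostname is None for malformed URLs, so fall back to "".
        return (urlparse(url).hostname or "").lower()

    def allows(self, url):
        """True unless the host is a blocked domain or a subdomain of one."""
        host = self._host(url)
        return not any(host == d or host.endswith("." + d)
                       for d in self.blocked_domains)

    def is_stale(self, url, hours_since_crawl):
        """True if the page is due for re-indexing under this site's interval."""
        limit = self.reindex_hours.get(self._host(url), self.default_reindex_hours)
        return hours_since_crawl >= limit

    def record_query(self, user, query):
        """Keep the search string for later analysis (interests, ontology building)."""
        self.query_log.append((user, query))


# Example: block one spam domain, and re-index a site we care about hourly.
policy = CrawlPolicy(
    blocked_domains={"spam.example"},
    reindex_hours={"news.example": 1.0},
)
policy.record_query("alice", "competitor pricing")
```

With this policy, `policy.allows("http://ads.spam.example/page")` is false because the host is a subdomain of a blocked domain, and `policy.is_stale("http://news.example/a", 2.0)` is true because that site is on an hourly schedule. An enterprise can afford to make the blocklist far more aggressive than a general web search engine could.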
| Categories: Categorization and filtering, Convera, Enterprise search, FAST, Google, IBM and UIMA, Search engines, Spam and antispam, Specialized search, Text mining, Website filtering | 1 Comment |
Danny Sullivan and Yahoo on the past and future of search
Danny Sullivan argues that search interfaces haven’t changed significantly in a decade, and that this suggests the approaches people have already tried for changing them aren’t likely to work when tried yet again. He backs his thesis up with lots of historical screenshots, some of which actually made me a bit nostalgic. In particular, he suggests that topic/cluster-based query refinement is a non-starter.
If he’s wrong, it will probably be because people today are satisfied with search only some of the time. Here, in a Business Week article, is a pretty good cut at where search so far has and hasn’t worked:
“Web searching can be frustrating for a lot of people,” says Tomi Poutanen, Yahoo’s director of product management for social search. “Search does a very good job if you are searching for something factual or doing research. It is not as good when searching for experiential knowledge—such as what is a good sushi restaurant in New York—where a person’s experience would count in having that answer.”
| Categories: Search engines | Leave a Comment |
KXEN is getting into text mining
Data mining challenger KXEN is getting into text mining, and they’re writing all their own technology. Not even any Inxight filters. Weird. It will be interesting to see if they stick with that plan.
EDIT: Actually, upon reviewing an e-mail I see that their text mining features are in beta already. So I guess they stuck with the plan, at least for Release 1.
| Categories: BI integration, Text mining | Leave a Comment |
Two own-dogfood text-based bug-tracking applications
Last July I wrote about Google’s text-based project management system. Dave Kellogg of Mark Logic offers links to discussion of a related Google project, and adds news of his own — Mark Logic built a text-based bug tracking system in its own MarkLogic technology.
| Categories: Enterprise search, Google, Mark Logic, Search engines, Specialized search | Leave a Comment |
