Most people in the text analytics market realize that text mining and search are somewhat related. But I don’t think they often stop to contemplate just how close the relationship already is, could be, or probably someday will be. Here’s part of what I mean:
- Text mining powers search. The biggest text mining outfits in the world, possibly excepting the US intelligence community, are surely Google, Yahoo, and perhaps Microsoft.
- Search powers text mining. Restricting the corpus of documents to mine, even via a keyword search, makes tons of sense. That’s one of the good ideas in Attensity 4.
- Text mining and search are powered by the same underlying technologies. For starters, there’s all the tokenization, extraction, etc. that vendors in both areas license from Inxight and its competitors. Beyond that, I think there’s a future play in integrated taxonomy management that will rearrange the text analytics market landscape.
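
To make the first three points concrete, here is a toy sketch in Python. It is not how Attensity, Inxight, or any other vendor actually does it; the function names, the sample documents, and the stopword list are all made up for illustration. The point is only that one shared tokenizer can feed both a keyword index (search) and a term count (mining), and that the search result can restrict what gets mined.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Shared step: lowercase word tokens, used by both search and mining."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Search side: inverted index mapping each token to the doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Return ids of docs containing every query token (simple AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

def mine_terms(docs, doc_ids, stopwords=frozenset()):
    """Mining side: count term frequencies over the restricted subset only."""
    counts = Counter()
    for doc_id in doc_ids:
        counts.update(t for t in tokenize(docs[doc_id]) if t not in stopwords)
    return counts

# Hypothetical sample corpus, invented for this example.
docs = {
    1: "The brake pedal failed and the dealer replaced the brake line.",
    2: "Great mileage, but the radio stopped working.",
    3: "Brake noise returned after the dealer visit.",
}

index = build_index(docs)
subset = keyword_search(index, "brake")  # search narrows the corpus
top_terms = mine_terms(docs, subset, stopwords={"the", "and", "after"})
print(sorted(subset), top_terms.most_common(2))
```

Real systems swap in far richer pieces at every step (linguistic extraction instead of a regex, a relevance-ranked engine instead of a set intersection), but the plumbing has the same shape.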
So who does “get it” about the search/text mining connection? The UIMA folks at IBM probably do. Inxight surely does. Attensity seemingly does, and so do most large search engine vendors (FAST and the public guys for sure; I’m not so certain about Autonomy and Convera). A small company whose CEO just called me yesterday does. I think I do.
But I’m not sure that the smaller text mining and search outfits, or the small text-oriented parts of large enterprise software vendors, have gotten the message at all yet …